Technologies dedicated to pedestrian crowd traffic management are emerging. The tracking of human crowd motion is a key problem in this field. Despite many recent advances, it is still difficult to accurately track pedestrians in real-world scenarios, especially as the crowd density increases. The problem is hard for the following reasons: inter-pedestrian occlusion (one pedestrian blocking another), changes in lighting and pedestrian appearance, and the difficulty of modeling human behavior or the intent of each pedestrian. In this context, our objective is to improve the accuracy of tracking algorithms.
In this chapter, we restrict ourselves to online and realtime trackers (Li et al., 2008b; Breitenstein et al., 2011; Li et al., 2008a; Khan et al., 2004; Comaniciu et al., 2000; SanMiguel et al., 2012), which compute the trajectories based on current and prior frames. Many of these trackers use motion priors to update the trajectories of the pedestrians between successive frames and to propagate the search space from one frame to the next. The simplest algorithms to model the motion are based on constant velocity or constant acceleration formulations. However, these techniques fail to capture the interactions between pedestrians as the crowd density increases.
The simpler motion models assume that pedestrians ignore any interactions with one another and instead follow “constant-speed” or “constant-acceleration” paths to their immediate destinations. However, the accuracy of this assumption decreases as crowd density in the environment increases (e.g. to 2-4 pedestrians per square meter). More sophisticated pedestrian motion models take into account interactions between pedestrians, formulated in terms of attraction or repulsion forces or as collision-avoidance constraints.
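The two simple motion priors mentioned above can be written down in a few lines. The following is a minimal sketch, assuming 2D ground-plane positions and a fixed time step; the function names and the `Vec2` alias are illustrative and are not taken from any of the cited trackers.

```python
from typing import Tuple

Vec2 = Tuple[float, float]  # (x, y) on the ground plane; an assumed representation


def predict_constant_velocity(pos: Vec2, vel: Vec2, dt: float) -> Vec2:
    """Predict the next position assuming the velocity stays constant."""
    return (pos[0] + vel[0] * dt, pos[1] + vel[1] * dt)


def predict_constant_acceleration(pos: Vec2, vel: Vec2, acc: Vec2, dt: float) -> Vec2:
    """Predict the next position assuming the acceleration stays constant."""
    return (
        pos[0] + vel[0] * dt + 0.5 * acc[0] * dt * dt,
        pos[1] + vel[1] * dt + 0.5 * acc[1] * dt * dt,
    )
```

Neither function takes the state of any other pedestrian as input, which is exactly why such priors degrade in dense crowds.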
In real-world scenarios, the trajectory of each pedestrian is governed by its intermediate goal location, intrinsic behaviors, as well as local interactions with other pedestrians and obstacles in the scene. In a dense crowd setting, the behavior of each pedestrian changes in response to the environment, the overall crowd density and flow, and the behavior of other pedestrians. It may not be possible, therefore, to model the overall behavior of each pedestrian with a single, homogeneous motion model. Furthermore, each of these homogeneous models is described using some parameters that may correspond to the size, speed, anticipation period, or local navigation constraints of each pedestrian. The accuracy of each motion model is governed by the choice of these parameters. As the behavior of each pedestrian responds to changes in a dynamic environment, these model parameters should be recomputed or updated to improve the resulting motion model’s accuracy. Overall, we need efficient techniques that can take into account heterogeneous behaviors based on constantly changing models and underlying parameters.
Figure 2.1: Improved Realtime Tracking. The results of our approach on some challenging datasets. We highlight the performance of our algorithm for realtime tracking of pedestrians in indoor and outdoor scenes (shown above) with many tens of pedestrians. In such challenging scenarios, our algorithm can track up to 79% of pedestrians in a frame at 26 fps (on average). We observe up to 20% improvement in the accuracy over prior interactive methods.
Main Results:
We present a hybrid formulation that combines discrete (microscopic) and continuum (macroscopic) pedestrian motion models. The discrete model is used to predict the local interactions and collision avoidance behaviors of each pedestrian, whereas the continuum method is used to model the flow of homogeneous clusters within a crowd. Our primary contributions include:
• We cluster pedestrians in a crowd based on different characteristics, including their positions, velocities, inter-pedestrian distances, orientations, etc.
• For each large cluster, we model its trajectory using a continuum flow model.
• For small clusters and individual pedestrians, we model their motion using an adaptive microscopic mixture motion model algorithm.
• We combine the discrete and continuum models with particle filters to track the pedestrians at interactive rates.
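As an illustration of the first three steps, the sketch below groups pedestrians whose positions and velocities are similar and then separates large clusters (candidates for a continuum model) from small ones (candidates for discrete models). It is a simplified stand-in, not the clustering procedure used in this chapter: the union-find grouping, the thresholds `max_dist` and `max_vel_diff`, and the size cutoff `min_continuum_size` are all assumptions chosen for the example.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pedestrian = Tuple[float, float, float, float]  # (x, y, vx, vy); assumed state layout


def cluster_pedestrians(
    peds: Sequence[Pedestrian],
    max_dist: float = 1.5,      # hypothetical distance threshold (meters)
    max_vel_diff: float = 0.5,  # hypothetical velocity threshold (m/s)
) -> List[List[int]]:
    """Group pedestrian indices that are close together and move alike."""
    parent = list(range(len(peds)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(peds)):
        for j in range(i + 1, len(peds)):
            xi, yi, vxi, vyi = peds[i]
            xj, yj, vxj, vyj = peds[j]
            close = math.hypot(xi - xj, yi - yj) <= max_dist
            similar = math.hypot(vxi - vxj, vyi - vyj) <= max_vel_diff
            if close and similar:
                parent[find(i)] = find(j)  # merge the two groups

    groups: Dict[int, List[int]] = {}
    for i in range(len(peds)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def split_by_size(
    clusters: List[List[int]], min_continuum_size: int = 3
) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (large clusters, small clusters) using an assumed size cutoff."""
    large = [c for c in clusters if len(c) >= min_continuum_size]
    small = [c for c in clusters if len(c) < min_continuum_size]
    return large, small
```

With this split, each large cluster would be handed to a continuum flow model and each remaining pedestrian to a discrete model; the pairwise loop is quadratic in the number of pedestrians, which is acceptable only for the tens of pedestrians considered here.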
Figure 2.2: The left image highlights the tracked trajectories based on discrete motion models. The image on the right demonstrates the use of a hybrid motion model, using the continuum method for a cluster of pedestrians as well as discrete motion models for individuals. These clusters are computed in realtime based on frame coherence and pedestrian flow. The hybrid motion model can improve the tracking accuracy in these dense scenarios by 20% over prior methods.
Parameter estimation for the microscopic motion models is formulated as an optimization problem. We solve this combinatorial optimization problem in a model-independent manner, so the approach can be extended to include any multi-agent pedestrian motion model. Our formulation computes the best-fit microscopic mixture motion model for each pedestrian based on prior tracked data. Our approach can be viewed as a feedback pipeline. In order to characterize the heterogeneous, dynamic behavior of each agent, we use an optimization-based scheme to perform the following steps:
• Choosing, every few frames, the new motion model that best describes the local behavior of each pedestrian based on tracked data.
• Computing the optimal set of parameters for that motion model that best fits this tracked data.
• Computing the adaptive number of particles for each pedestrian based on a combination of metrics for optimizing performance.
Figure 2.3: Our microscopic mixture motion model can accurately compute the trajectories in real time. We highlight different motion models (Boids, Social-Forces, or reciprocal velocity obstacles) used for the same pedestrian (marked in red) over different frames. We believe that it is not possible to model the trajectory of all pedestrians based on a single, uniform model. Instead, we adaptively choose the best-fit model for every pedestrian in the scene that can be adapted to the environment or the crowd conditions.
The resulting mixture model is used to predict the state of the pedestrian in the next frame. This predicted state is used as the motion prior for the tracker; it is also combined with a confidence estimate to dynamically compute the number of particles. As a final step, the tracker’s final estimate of the next state is fed back into the loop, becoming the most recent agent state. Our approach can track the positions of tens of pedestrians in around 40-50 milliseconds over long intervals. Furthermore, we demonstrate its benefits over prior real-time prediction algorithms.
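The model-selection part of this loop can be sketched as a small search over candidate models and parameter values, scored against the recently tracked positions. The sketch below is illustrative only: the two candidate models are deliberately trivial placeholders rather than Boids, Social-Forces, or reciprocal velocity obstacles, the exhaustive grid search stands in for the optimization scheme, and the mapping from fit error to particle count (`particle_count`, with `n_min`, `n_max`, and `error_scale`) is an assumed heuristic, not the metric combination used in this chapter.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vec2 = Tuple[float, float]
# A candidate model maps (previous position, current position, parameter)
# to a predicted next position. This signature is an assumption for the sketch.
Model = Callable[[Vec2, Vec2, float], Vec2]


def stationary(prev: Vec2, curr: Vec2, _param: float) -> Vec2:
    """Placeholder model: the pedestrian stays where it is."""
    return curr


def damped_velocity(prev: Vec2, curr: Vec2, damping: float) -> Vec2:
    """Placeholder model: keep moving with a scaled copy of the last displacement."""
    return (curr[0] + damping * (curr[0] - prev[0]),
            curr[1] + damping * (curr[1] - prev[1]))


def fit_error(model: Model, param: float, track: Sequence[Vec2]) -> float:
    """Mean distance between one-step predictions and the tracked positions."""
    if len(track) < 3:
        raise ValueError("need at least three tracked positions")
    total = 0.0
    for k in range(2, len(track)):
        px, py = model(track[k - 2], track[k - 1], param)
        total += math.hypot(px - track[k][0], py - track[k][1])
    return total / (len(track) - 2)


def select_best_model(
    track: Sequence[Vec2],
    candidates: Dict[str, Tuple[Model, List[float]]],
) -> Tuple[str, float, float]:
    """Return (model name, parameter, error) with the lowest fit error."""
    best = None
    for name, (model, params) in candidates.items():
        for p in params:
            err = fit_error(model, p, track)
            if best is None or err < best[2]:
                best = (name, p, err)
    if best is None:
        raise ValueError("no candidate models or parameters were given")
    return best


def particle_count(error: float, n_min: int = 50, n_max: int = 500,
                   error_scale: float = 1.0) -> int:
    """Assumed heuristic: use fewer particles when the model fits well."""
    confidence = 1.0 / (1.0 + error / error_scale)
    return round(n_max - confidence * (n_max - n_min))
```

In a tracker, `select_best_model` would be rerun every few frames on a sliding window of tracked positions, the chosen model would supply the motion prior, and the tracker's corrected estimate would be appended to the window before the next round.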
The rest of this chapter is organized as follows. Section II gives a brief overview of prior work in tracking and motion models. We present our algorithm in Section III. In Section IV, we highlight its performance on different crowd video datasets and compare it with prior methods.