Using Big Data and Efficient Methods to
Capture Stochasticity for Calibration of
Macroscopic Traffic Simulation Models
1 Department of Civil and Environmental Engineering,
Rutgers University, New Jersey
2 Center for Urban Science + Progress (CUSP);
Department of Civil & Urban Engineering, New York University, New York
Sandeep Mudigonda (1), Kaan Ozbay (2)
Simulation & Calibration
Traffic Simulation Model $S$: $O_{sim} \mid I_s, C_s$
• Inputs $I_s$: travel demand, geometry, operational rules
• Calibration parameters $C_s$: user- and traffic-related parameters
• Simulated outputs $O_{sim}$: given inputs and parameters
• Observed field data $O_{obs}$
Calibration:
\[ \min_{C_s}\; \varepsilon = U\big(O_{obs},\, O_{sim}(I_s, C_s)\big) \]
\[ S:\; f(I_s, C_s) \rightarrow O_{sim} \mid I_s, C_s, \qquad O_{obs} = O_{sim} + \varepsilon \]
where
• $f(I_s, C_s)$ = functional form of the internal models in a simulation system,
• $O_{sim}$ = simulation output data given the input data and calibration parameters,
• $\varepsilon$ = margin of error between simulation output and observed data, and
• $O_{obs}$ = observed field data.
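As a minimal sketch of the calibration problem above, the toy example below searches for the parameter value that minimizes an error measure U between observed and simulated outputs. Everything in it is an assumption for illustration: the Greenshields speed-density relation stands in for the simulation model, RMSE stands in for U, and the "observed" speeds are synthetic.

```python
import math

def simulate_speeds(densities, free_flow_speed, jam_density=150.0):
    """Toy stand-in for O_sim | I_s, C_s: Greenshields speeds (km/h) for input densities (veh/km)."""
    return [free_flow_speed * (1.0 - rho / jam_density) for rho in densities]

def rmse(observed, simulated):
    """Illustrative error measure U(O_obs, O_sim): root-mean-square error."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / len(observed))

def calibrate(densities, observed_speeds, candidates):
    """Return the candidate free-flow speed (the calibration parameter) with the smallest error."""
    return min(candidates,
               key=lambda vf: rmse(observed_speeds, simulate_speeds(densities, vf)))

# Hypothetical inputs I_s and synthetic "field" data generated with a known parameter.
densities = [10.0, 30.0, 60.0, 90.0, 120.0]
observed_speeds = simulate_speeds(densities, 100.0)
candidates = [80.0, 90.0, 100.0, 110.0, 120.0]
best_vf = calibrate(densities, observed_speeds, candidates)
```

A grid search is used only to keep the example short; any optimizer can take its place.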
Model Inputs:
• Driver Characteristics Data
• Vehicle Composition Data
• Travel Demand Data
• Ped./Bike Data
Model Parameters:
• Link (capacity, speed limit, …)
• Path (route choice, tolls, …)
• Infrastructure (signal timings, VMS, work zones, …)
• Weather
• Driver behavior data
• Activity data
Observed Outputs:
• Flows & Speeds
• Queue data
• Trajectories
• Accidents (?)
• Emissions
• Other
Study: Data
• Hourdakis et al. (2003): 5-min. data; 21 detector stations; 12-mile freeway section; PM peak; 3 days
• Jha et al. (2004): Detector data; 15 days; AM and PM peaks; large urban network
• Toledo et al. (2004): 68 detector stations; 3 freeways; 5 weekdays
• Qin and Mahmassani (2004): 7 detector stations; 3 freeways; AM peak; 5 weekdays
• Kim et al. (2005): Travel time data for 1 hr.; AM peak; 1.1 km freeway section
• Balakrishna et al. (2007): 15-min. data; 33 detector stations
• Zhang et al. (2008): 5-min. detector count; PM peak; 7 days
• Mudigonda et al. (2009): ETC data for AM and PM peaks
• Lee and Ozbay (2009): 5-min. detector count; AM & PM peaks; 16 days
Data used for simulation calibration:
• spans 3-16 days,
• is limited to a few specific conditions, or
• is a diluted sample of different conditions
Data used in Previous Calibration Studies
Distribution of traffic data: Typical day?
• Illustration of demand
clusters: No “typical” day
• Big Data’s large spatial
and temporal extent can help calibrate and
validate traffic simulation models.
– RFIDs,
– GPS-equipped devices,
– Traffic sensors and
Incorporating Stochasticity in Macroscopic Model
• A macroscopic model is adopted to simulate traffic flow under different conditions.
• Stochastic version of the first-order model, for the t-th time period:
\[ \frac{\partial \rho(x,t,\omega)}{\partial t} + \frac{\partial q(x,t,\omega)}{\partial x} = 0, \qquad (x,t,\omega) \in D \times \Omega, \qquad q = \rho v \]
\[ \rho(x,0) = \rho_I(x,\omega) \quad \text{: stochastic initial condition} \]
\[ \rho(x_i,t) = \rho_B(t,\omega) \quad \text{: stochastic demand} \]
• Hence simulation parameters and output are obtained as
\[ \rho_{tx} \triangleq f(t,x) \in (\Omega, \mathcal{A}, P), \qquad \Theta_{tx} = g(t,x) \in (\Omega', \mathcal{A}', P') \]
for t (time of day, season, weather, etc.) and x (distance, changing geometry or pavement condition in different parts of the network).
METHODOLOGY
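For one realization of the stochastic inputs, the first-order model reduces to a deterministic conservation law. The sketch below shows one way to solve it numerically. The slides do not name a fundamental diagram or a discretization, so the triangular flow-density relation, the cell-based demand/supply (Godunov-type) update, and all parameter values here are assumptions.

```python
def ctm_step(rho, inflow, dt, dx, vf, w, rho_jam):
    """Advance cell densities (veh/km) by one time step dt (h) on cells of length dx (km)."""
    q_max = vf * w * rho_jam / (vf + w)                      # capacity of the triangular diagram (veh/h)
    demand = [min(vf * r, q_max) for r in rho]               # flow each cell can send downstream
    supply = [min(w * (rho_jam - r), q_max) for r in rho]    # flow each cell can receive
    n = len(rho)
    flux = [0.0] * (n + 1)
    flux[0] = min(inflow, supply[0])                         # upstream boundary: entering demand
    for i in range(1, n):
        flux[i] = min(demand[i - 1], supply[i])              # interior cell boundaries
    flux[n] = demand[n - 1]                                  # downstream boundary: free outflow
    new_rho = [rho[i] + dt / dx * (flux[i] - flux[i + 1]) for i in range(n)]
    return new_rho, flux

def run(inflow, steps, n_cells=10, dt=0.005, dx=0.5, vf=100.0, w=20.0, rho_jam=150.0):
    """Simulate an initially empty section fed by a constant inflow (veh/h)."""
    if vf * dt > dx + 1e-12:
        raise ValueError("CFL condition violated: need vf * dt <= dx")
    rho = [0.0] * n_cells
    for _ in range(steps):
        rho, _ = ctm_step(rho, inflow, dt, dx, vf, w, rho_jam)
    return rho

densities = run(inflow=1800.0, steps=50)  # hypothetical demand below capacity
```

With an uncongested inflow the section settles at the density `inflow / vf`. A stochastic run repeats this kernel with demand or parameters drawn from their distributions.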
Solving Stochastic Traffic Simulation Models
• Computational complexity is an important factor in the choice of
numerical solution methods
• The simplest and most common solution method is Monte Carlo-type independent sampling of n simulation runs for various traffic conditions.
• No. of replications for a level of precision γ:
\[ n(\gamma) \ge \left( \frac{t_{n-1,\,1-\alpha/2}\; S(n)}{\gamma \cdot \Xi(n)} \right)^{2} \]
• Convergence rate for MC-type methods is slow: O(1/√n)
• Depending on the size of the network and the no. of stochastic dimensions, this approach can become prohibitive in terms of computational time.
• Also, not every point in the stochastic space of simulation output has corresponding observed data.
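The replication formula can be illustrated with a short calculation. This is a sketch under stated assumptions: S(n) and Ξ(n) are read as the sample standard deviation and sample mean of n pilot runs, the Student-t quantile is replaced by a normal quantile (a stdlib approximation that slightly underestimates the count for small n), and the pilot flows are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def required_replications(pilot_outputs, gamma, alpha=0.05):
    """Estimate the number of runs needed for relative precision gamma.

    Uses a normal quantile in place of t_{n-1, 1-alpha/2} (an approximation).
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    s = stdev(pilot_outputs)      # sample standard deviation of the pilot runs
    xbar = mean(pilot_outputs)    # sample mean of the pilot runs
    return math.ceil((z * s / (gamma * xbar)) ** 2)

pilot_flows = [1800.0, 1850.0, 1750.0, 1900.0, 1700.0]  # hypothetical pilot outputs (veh/h)
n_1pct = required_replications(pilot_flows, gamma=0.01)
n_half_pct = required_replications(pilot_flows, gamma=0.005)
```

Halving γ roughly quadruples the required number of runs, which is the O(1/√n) convergence seen from the other side.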
Stochastic Collocation
• The stochasticity is treated as another dimension and the stochastic solution space Ω is approximated (Γ) using
– a set of prescribed support nodes $\{\Theta_j\}_{j=1}^{Q} \in \Gamma$ with basis functions (stochastic collocation)
• The multi-dimensional stochastic solution is approximated by an interpolation function built using deterministic solutions evaluated at each of a set of prescribed nodes (collocation points):
\[ \rho(x,t,\xi) \approx \hat{\rho}(x,t,\xi) = \sum_{j=1}^{Q} \tilde{\rho}(x,t,\Theta_j)\, L_j(\xi) \]
\[ \int_{\Gamma} \rho(x,t,\xi)\, p(\xi)\, d\xi \approx \sum_{j=1}^{Q} \tilde{\rho}(x,t,\Theta_j)\, \alpha_j, \qquad \alpha_j = \int_{\Gamma} L_j(\xi)\, p(\xi)\, d\xi \]
where $\tilde{\rho}(x,t,\Theta_j)$ is the deterministic solution at node $\Theta_j$, $L_j$ is the j-th interpolation basis function, and $\alpha_j$ is the pdf/weight of the j-th interpolation basis function.
• For higher dimensions of stochasticity, computationally efficient schemes are required to reduce the number of collocation points.
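A one-dimensional sketch of the collocation idea: run a deterministic model only at a few prescribed nodes, build an interpolant from those runs, then evaluate the cheap interpolant many times to recover the output distribution. The model (capacity of a triangular fundamental diagram as a function of an uncertain free-flow speed), the uniform distribution of the random variable, the Lagrange basis, and all numbers are assumptions for illustration.

```python
import math
import random

def cc_nodes(m):
    """Clenshaw-Curtis-type nodes on [-1, 1] (extrema of a Chebyshev polynomial)."""
    if m == 1:
        return [0.0]
    return [-math.cos(math.pi * j / (m - 1)) for j in range(m)]

def lagrange_basis(nodes, j, xi):
    """Value at xi of the j-th Lagrange basis polynomial on the given nodes."""
    value = 1.0
    for k, node in enumerate(nodes):
        if k != j:
            value *= (xi - node) / (nodes[j] - node)
    return value

def build_interpolant(model, nodes):
    """Run the deterministic model once per node and return the interpolating function."""
    values = [model(node) for node in nodes]
    def interpolant(xi):
        return sum(v * lagrange_basis(nodes, j, xi) for j, v in enumerate(values))
    return interpolant

def capacity(xi, w=20.0, rho_jam=150.0):
    """Toy deterministic model: capacity (veh/h) for free-flow speed 100 + 10*xi km/h."""
    vf = 100.0 + 10.0 * xi
    return vf * w * rho_jam / (vf + w)

nodes = cc_nodes(5)                        # only 5 deterministic runs
surrogate = build_interpolant(capacity, nodes)
rng = random.Random(0)
samples = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
surrogate_outputs = [surrogate(xi) for xi in samples]   # cheap output distribution
max_error = max(abs(s - capacity(xi)) for s, xi in zip(surrogate_outputs, samples))
```

Here 1000 output samples cost five model runs; in a real application each run would be a full traffic simulation.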
Smolyak Algorithm
• Developed originally for multi-dimensional integration
• 1-D interpolant:
\[ U^{i}(f) = \sum_{j=1}^{m_i} f(\Theta_j^{i})\, L_j, \qquad m_i = \#\ \text{nodes at level } i \]
• In N dimensions the full tensor interpolant is approximated by the sparse grid interpolant
• Error: $O\big(Q^{-2}\,|\log Q|^{3(N-1)}\big)$ (piecewise linear basis)
• $O\big(Q^{-k}\,|\log Q|^{(k+2)(N-1)}\big)$ (k-polynomial basis)
• Convergence rate can be controlled by the polynomial order k: O(Smolyak) > O(MC), O(LHS)
[Figure: collocation points at which the deterministic simulation is performed. Axes: distribution of parameter 1 (free flow speed, etc.) and distribution of parameter 2 (jam density, etc.). Multiple replications at each point for variance reduction due to stochastic demands.]
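To show why the sparse grid reduces the number of collocation points, the sketch below builds a Smolyak-type grid from nested Clenshaw-Curtis nodes and compares its size with the full tensor grid at the same 1-D resolution. The node-growth rule (1 node at level 1, 2^(i-1)+1 afterwards) and the function names are assumptions; the slides do not give these details.

```python
import math
from itertools import product

def cc_nodes(level):
    """Nested Clenshaw-Curtis nodes on [-1, 1]; rounded so shared nodes compare equal."""
    if level == 1:
        return [0.0]
    m = 2 ** (level - 1) + 1
    return [round(-math.cos(math.pi * j / (m - 1)), 12) for j in range(m)]

def sparse_grid(dim, k):
    """Union of tensor grids whose 1-D level indices sum to at most dim + k."""
    points = set()
    for levels in product(range(1, k + 2), repeat=dim):
        if sum(levels) <= dim + k:
            points.update(product(*(cc_nodes(level) for level in levels)))
    return points

def full_grid_size(dim, k):
    """Number of points in the full tensor grid using the finest 1-D rule (level k + 1)."""
    return (1 if k == 0 else 2 ** k + 1) ** dim

sizes_2d = [len(sparse_grid(2, k)) for k in range(4)]   # sparse grid sizes in 2 dimensions
full_2d = [full_grid_size(2, k) for k in range(4)]      # matching full tensor grid sizes
```

The gap widens quickly with dimension, since the full grid grows as a power of the number of stochastic dimensions.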
Parameter Optimization
• From each realization of the parameter set, using the demand
distribution as an input, the simulation output distribution (e.g., flow or density distribution) is generated.
• This distribution is compared with the observed output distribution
and using a test statistic (such as the test statistic from the KS test), the error is estimated.
• This error is used as an objective function and is minimized as part
of the multi-objective parameter optimization using the simultaneous perturbation stochastic approximation (SPSA) algorithm.
\[ \min_{\Theta_t^k}\; \sum_{i=1}^{N} \Big\{ w_1\, U_1\big(q_i^{Ob},\, q_i^{S}(\Theta_t^k)\big) + w_2\, U_2\big(\rho_i^{Ob},\, \rho_i^{S}(\Theta_t^k)\big) \Big\} \]
where,
• $q_i^{Ob}, q_i^{S}$ - observed and simulated flows at location i
• $\rho_i^{Ob}, \rho_i^{S}$ - observed and simulated densities for location i
• $\Theta_t^k$ - parameter set for time period t and iteration k
• $w_1, w_2$ - weights for the error measures
• $U_1, U_2$ - functions representing the error in flow and density
The weight parameter w signifies the variance of each output measure in the data.
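The objective can be sketched in code as follows, taking both error functions to be the two-sample KS statistic (the largest gap between two empirical distribution functions). The equal weights, the function names, and the sample values are assumptions for illustration.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum distance between the empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    distance = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)   # fraction of sample a that is <= x
        cdf_b = bisect_right(b, x) / len(b)   # fraction of sample b that is <= x
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance

def calibration_objective(obs_flows, sim_flows, obs_dens, sim_dens, w1=0.5, w2=0.5):
    """Sum over locations of w1 * U1(flows) + w2 * U2(densities), with U1 = U2 = KS statistic."""
    total = 0.0
    for q_ob, q_s, rho_ob, rho_s in zip(obs_flows, sim_flows, obs_dens, sim_dens):
        total += w1 * ks_statistic(q_ob, q_s) + w2 * ks_statistic(rho_ob, rho_s)
    return total

# One hypothetical location: flows match exactly, densities do not overlap at all.
objective = calibration_objective(
    obs_flows=[[1500.0, 1600.0, 1700.0]], sim_flows=[[1500.0, 1600.0, 1700.0]],
    obs_dens=[[20.0, 22.0, 24.0]], sim_dens=[[30.0, 32.0, 34.0]],
)
```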
Flowchart of Calibration using Stochastic Collocation
• Input: demand / parameter distribution; set j = 0.
• Generate the collocation points $t_j$, j = 1, …, Q.
• At each collocation point, solve the deterministic 1st-order PDE to obtain the output $\rho(x, t_j)$. Any existing simulation or legacy codes can alternatively be used, and this step is parallelizable.
• If j = Q, build the output distribution $\rho^{*}(x, t, Z)$; otherwise set j = j + 1 and repeat.
• If error ≤ allowed error: END. Otherwise, SPSA optimization yields a new parameter set $\Theta_t^k$, j is reset to 0, and the loop repeats.
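The SPSA step in the loop above can be sketched as below. SPSA estimates the gradient from only two loss evaluations per iteration, whatever the number of parameters. The gain-sequence constants, the seed, and the toy quadratic loss (standing in for the KS-based calibration error) are assumptions, not values from the study.

```python
import random

def spsa_minimize(loss, theta0, iterations=200, a=0.2, c=0.1, A=10.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Minimize loss(theta) with simultaneous perturbation stochastic approximation."""
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(iterations):
        a_k = a / (k + 1 + A) ** alpha          # step-size gain sequence
        c_k = c / (k + 1) ** gamma              # perturbation-size gain sequence
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]   # random +/-1 perturbation
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        diff = loss(plus) - loss(minus)         # two evaluations per iteration
        # Gradient estimate for component i is diff / (2 * c_k * delta_i).
        theta = [t - a_k * diff / (2.0 * c_k * d) for t, d in zip(theta, delta)]
    return theta

def toy_loss(theta):
    """Hypothetical error surface with minimum at free-flow speed 100, jam density 150."""
    return (theta[0] - 100.0) ** 2 + (theta[1] - 150.0) ** 2

start = [97.0, 149.0]
theta_hat = spsa_minimize(toy_loss, start)
initial_loss = toy_loss(start)
final_loss = toy_loss(theta_hat)
```

In the calibration loop, `loss` would run the collocation-based simulation for a parameter set and return the weighted error.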
Study section
• Section of NJTPK at interchange 7 with
a single on- and off-ramp with stochastic demand
• Big Data: ETC Data
– Vehicle-by-vehicle entry and exit time, lane,
transaction type, vehicle type, number of axles.
– Available in NJ for 150 miles of NJTPK and
170 miles of GSP.
• The variation in demand at this section
is captured using the ETC data for every 5 minutes between January 1, 2011 and August 31, 2011.
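A minimal sketch of turning vehicle-by-vehicle ETC entry records into 5-minute demand counts. The record layout is an assumption: only an entry time in seconds after midnight is used, and the sample times are hypothetical.

```python
from collections import Counter

def five_minute_counts(entry_times_s, bin_s=300):
    """Count vehicles per 5-minute bin; keys are bin indices since midnight."""
    counts = Counter(int(t // bin_s) for t in entry_times_s)
    return dict(sorted(counts.items()))

entry_times_s = [0.0, 10.5, 299.9, 300.0, 905.0]   # hypothetical entry times (s)
demand = five_minute_counts(entry_times_s)
```

Bins with no vehicles are absent from the result; a caller that needs a full time series can fill them with zeros.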
Study Data
• The demand is divided into clusters using the k-means algorithm.
• For each cluster, the distribution of demand during each 5-minute time period is generated.
• The simulation is performed for
weekday
– AM peak (7-9AM)
– off-peak (10AM-12PM)
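The clustering step can be sketched with a small one-dimensional k-means. This is a simplification: the study presumably clusters demand profiles, whereas the example clusters a single hypothetical demand value per day, and the deterministic initialization is an assumption made to keep the result reproducible.

```python
def kmeans_1d(values, k, iterations=50):
    """Cluster scalar values into k groups; returns (centers, labels)."""
    ordered = sorted(values)
    # Deterministic initialization: spread the initial centers across the sorted data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Recompute each center as the mean of its group (keep it if the group is empty).
        new_centers = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
    return centers, labels

# Hypothetical 5-minute demands (veh) observed on eight different days.
daily_demand = [410.0, 395.0, 420.0, 405.0, 880.0, 905.0, 890.0, 915.0]
centers, labels = kmeans_1d(daily_demand, k=2)
```

Each cluster then supplies its own demand distribution, so low-demand and high-demand days are not averaged into one "typical" day.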
Implementation of Proposed Approach to Study
section
• With the demand distribution as an input, for each realization
of the parameter set, the simulation output flow distribution is generated.
• The Clenshaw-Curtis grid is the appropriate sparse grid to discretize the stochastic demand.
• Sparse grid interpolation is performed using the output of the
simulation at each collocation node.
• Distribution of simulated flows is obtained by repeated
evaluation of the Smolyak interpolation function.
• This distribution is compared with the sensor data flow distribution; the error is estimated using the test statistic from the KS test at the 90% significance level and is minimized using the SPSA algorithm.
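For the comparison in the last step, a KS statistic is usually judged against a critical value. The sketch below uses the standard large-sample approximation for the two-sample test and reads the 90% level as α = 0.10; that reading, the sample sizes, and the function names are assumptions.

```python
import math

def ks_critical_value(n, m, alpha=0.10):
    """Large-sample critical value of the two-sample KS test for sample sizes n and m."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha * math.sqrt((n + m) / (n * m))

def distributions_match(ks_stat, n, m, alpha=0.10):
    """True if the KS statistic does not reject equality of the two distributions."""
    return ks_stat <= ks_critical_value(n, m, alpha)

critical = ks_critical_value(100, 100)   # e.g. 100 observed and 100 simulated flows
```

A statistic below the critical value means the simulated and observed flow distributions are not distinguishable at that level.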
Results
• To achieve the flow
distribution for ,
• AM peak:
– SC approach required 2433
evaluations
– MC-type sampling required
Results
• To achieve the flow
distribution for ,
• Off-peak
– SC approach required 441
evaluations
– MC-type sampling required
Results
• Weekend peak
– SC approach required 441
evaluations
– MC-type sampling required
Results
• To illustrate the drawback
of using limited data, we compare the distribution of flow for high and low weekend demands with the case where only three weekend days of flow
and demand are used to calibrate the weekend model.
Conclusions
• Calibrating for various traffic conditions requires large datasets.
• Big Data, such as RFID data from ETC systems, is useful.
• However, Big Data poses a computational problem when calibrating for all conditions.
• Traditional MC-type sampling needs heavy computational resources.
• We propose a methodology to capture stochasticity using
stochastic collocation by defining each stochastic factor as a dimension.
• Computationally efficient sparse grids are used to sample the
stochastic space and build an interpolant using the deterministic output at each support node of the grid.
Conclusions & Future Work
• Using 5-min. 8-month demand data, we calibrate AM peak,
off-peak and weekend peak macroscopic traffic models.
• Distribution of flows is obtained from the interpolant and used
with the observed dist. to build a KS test stat. for calibration using SPSA.
• Proposed methodology:
– Applicable to any type of simulation model
– More efficient than MC-type methods
– Can be parallelized to increase speed
• Use stochastic parameters for jam density and wave speed
for the traffic flow fundamental diagram for a larger freeway section.
• Apply methodology for a larger network with higher
dimensions of stochasticity