Using Big Data and Efficient Methods to Capture Stochasticity for Calibration of Macroscopic Traffic Simulation Models

(1)


Sandeep Mudigonda¹, Kaan Ozbay²

¹ Department of Civil and Environmental Engineering, Rutgers University, New Jersey

² Center for Urban Science + Progress (CUSP); Department of Civil & Urban Engineering, New York University, New York

(2)

Simulation & Calibration

Traffic Simulation Model S

•  Inputs $I_s$: travel demand, geometry, operational rules
•  Calibration parameters $C_s$: user- and traffic-related parameters
•  Simulated outputs $O_{sim} \mid I_s, C_s$: given inputs and parameters
•  Observed field data $O_{obs}$
•  Error $\varepsilon$

Calibration

$$\min_{C_s} \{\, \varepsilon : U(O_{obs}, O_{sim}(I_s, C_s)) \,\}$$

$$O_{obs} \rightarrow f(I_s, C_s) + \varepsilon, \qquad O_{sim} \mid I_s, C_s = f(I_s, C_s)$$

where
$f(I_s, C_s)$ = functional form of the internal models in a simulation system,
$O_{sim}$ = simulation output data given the input data and calibration parameters,
$\varepsilon$ = margin of error between simulation output and observed data, and
$O_{obs}$ = observed field data.

(3)

Model Inputs:
•  Driver Characteristics Data
•  Vehicle Composition Data
•  Travel Demand Data
•  Ped./Bike Data

Model Parameters:
•  Link (capacity, speed limit,…)
•  Path (route choice, tolls,…)
•  Infrastructure (signal timings, VMS, work zones, …)
•  Weather
•  Driver behavior data
•  Activity data

Observed Outputs:
•  Flows & Speeds
•  Queue data
•  Trajectories
•  Accidents (?)
•  Emissions
•  Other

(4)

Data used in Previous Calibration Studies

Study: Data
Hourdakis et al. (2003): 5-min. data; 21 detector stations; 12-mile freeway section; PM peak; 3 days
Jha et al. (2004): Detector data; 15 days; AM and PM peaks; large urban network
Toledo et al. (2004): 68 detector stations; 3 freeways; 5 weekdays
Qin and Mahmassani (2004): 7 detector stations; 3 freeways; AM peak; 5 weekdays
Kim et al. (2005): Travel time data for 1 hr.; AM peak; 1.1 km freeway section
Balakrishna et al. (2007): 15 min. data; 33 detector stations
Zhang et al. (2008): 5-min detector count; PM peak; 7 days
Mudigonda et al. (2009): ETC data for AM and PM peaks
Lee and Ozbay (2009): 5-min detector count; AM & PM peaks; 16 days

Data used for simulation calibration
•  spans 3-16 days,
•  is limited to a few specific conditions, or
•  is a diluted sample of different conditions.

(5)

Distribution of traffic data: Typical day?

•  Illustration of demand clusters: No “typical” day
•  Big Data’s large spatial and temporal extent can help calibrate and validate traffic simulation models.
–  RFIDs,
–  GPS-equipped devices,

–  Traffic sensors and

(6)

Incorporating Stochasticity in Macroscopic Model

•  A macroscopic model is adopted to simulate traffic flow during different conditions.
•  Stochastic version of first-order model:

$$\partial_t \rho(x,t,\omega) + \partial_x q(x,t,\omega) = 0, \qquad (x,t,\omega) \in D \times \Omega$$
$$q = \rho v \triangleq f(\rho : x, t, \omega) \quad \text{: } q\text{-}\rho \text{ relationship for the } t\text{-th time period}$$
$$\rho(x, 0, \omega) = \rho_I(x, \omega) \quad \text{: stochastic initial condition}$$
$$\rho(x_i, t, \omega) = \rho_B(x_i, t, \omega) \quad \text{: stochastic demand}$$

•  Hence simulation parameters and output are obtained as functions of t (time of day, season, weather, etc.) and x (distance, changing geometry or pavement condition in different parts of the network):

$$\rho_{tx} = f(t, x) \in (\Omega, A, P), \qquad \Theta_{tx} = g(t, x) \in (\Omega', A', P')$$

METHODOLOGY
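As a rough illustration of what one deterministic run of the first-order model could look like for a single demand realization, here is a minimal Python sketch. It assumes a triangular fundamental diagram and a Godunov (cell transmission) discretization, neither of which is stated on the slide; the function name `simulate_lwr` and all numerical values are hypothetical.

```python
import random

def simulate_lwr(demand, v_free=100.0, wave_speed=20.0, rho_jam=150.0,
                 n_cells=20, dx=0.5, dt=0.004, n_steps=500):
    """Godunov / cell-transmission solve of rho_t + q_x = 0 on one link.

    demand: upstream inflow demand in veh/h (one realization).
    Units: speeds in km/h, densities in veh/km, dx in km, dt in h.
    Returns the list of cell densities after n_steps.
    """
    # Capacity of the assumed triangular fundamental diagram.
    q_max = v_free * wave_speed * rho_jam / (v_free + wave_speed)
    assert v_free * dt <= dx, "CFL condition violated"
    rho = [0.0] * n_cells
    for _ in range(n_steps):
        # Sending (demand) and receiving (supply) flow of every cell.
        send = [min(v_free * r, q_max) for r in rho]
        recv = [min(wave_speed * (rho_jam - r), q_max) for r in rho]
        # flux[i] enters cell i; flux[n_cells] leaves the link freely.
        flux = [min(demand, recv[0])]
        flux += [min(send[i - 1], recv[i]) for i in range(1, n_cells)]
        flux.append(send[-1])
        rho = [rho[i] + dt / dx * (flux[i] - flux[i + 1])
               for i in range(n_cells)]
    return rho

# Stochastic demand: each sampled demand gives one deterministic run.
rng = random.Random(0)
demands = [max(0.0, rng.gauss(1800.0, 200.0)) for _ in range(5)]
last_cell_densities = [simulate_lwr(d)[-1] for d in demands]
```

For an uncongested demand the densities settle at demand / v_free, which gives a quick sanity check on the discretization.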

(7)

Solving Stochastic Traffic Simulation Models

• Computational complexity is an important factor in the choice of numerical solution methods.
• The simplest and most common solution method is Monte Carlo-type independent sampling of n simulation runs for various traffic conditions.
• No. of replications for a level of precision γ:

$$n(\gamma) \ge \left( \frac{t_{n-1,\,1-\alpha/2}\; S(n)}{\gamma \cdot \Xi(n)} \right)^{2}$$

• Convergence rate for MC-type method is slow: O(1/√n)
• Depending on the size of the network and no. of stochastic dimensions, this approach can become prohibitive in terms of computational time requirements.
• Also, not all points in the stochastic space of simulation output may have corresponding observed data.
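The replication formula above can be evaluated from a small pilot set of runs. The sketch below is only an approximation under stated assumptions: it takes Ξ(n) to be the sample mean and S(n) the sample standard deviation of the pilot outputs (the slide does not define them), and it substitutes the normal quantile for the Student-t quantile because the Python standard library has no t distribution, which understates n for very small pilot samples. The function name is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def required_replications(pilot_outputs, gamma=0.05, alpha=0.05):
    """Replications needed for relative precision gamma at level alpha.

    Assumption: the normal quantile replaces t_{n-1, 1-alpha/2}.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    s = stdev(pilot_outputs)        # S(n), sample standard deviation
    m = abs(mean(pilot_outputs))    # assumed meaning of Xi(n): sample mean
    return math.ceil((z * s / (gamma * m)) ** 2)

# Hypothetical pilot flows (veh/h) from five independent runs.
pilot = [1800.0, 1900.0, 2000.0, 2100.0, 2200.0]
n_needed = required_replications(pilot, gamma=0.05)
```

Halving γ roughly quadruples the required number of runs, which is the O(1/√n) convergence rate seen from the other side.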

(8)

Stochastic Collocation

•  The stochasticity is treated as another dimension and the stochastic solution space Ω is approximated (Γ) using
–  a set of prescribed support nodes $\{\xi_j\}_{j=1}^{Q} \in \Gamma$ with basis functions $\Theta_j$ (stochastic collocation)
•  The multi-dimensional stochastic solution is approximated by an interpolation function built using deterministic solutions evaluated at each of a set of prescribed nodes (collocation points):

$$\rho(x,t,\xi) \approx \tilde{\rho}(x,t,\xi) = \sum_{j=1}^{Q} \hat{\rho}(\xi_j)\, \Theta_j(\xi)$$

$$\int_{\Gamma} \rho(x,t,\xi)\, p(\xi)\, d\xi \approx \sum_{j=1}^{Q} \hat{\rho}(\xi_j)\, \alpha_j$$

where $\Theta_j$ = $j$-th interpolation basis function, $p(\xi)$ = pdf of $\xi$, and $\alpha_j$ = weight of the $j$-th node.

•  For higher dimensions of stochasticity, computationally efficient schemes are required to reduce the number of collocation points.
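To make the interpolation idea concrete, the following sketch builds a one-dimensional collocation interpolant with Lagrange basis polynomials on five Chebyshev-extrema nodes and then samples it in place of the solver. The choice of Lagrange polynomials, the node count, and `toy_solver` (a Greenshields flow at a ξ-dependent density, standing in for one deterministic simulation run) are illustrative assumptions, not the setup used in the study.

```python
import math
import random

def lagrange_basis(nodes, j, xi):
    """Value at xi of the j-th Lagrange basis polynomial on the nodes."""
    value = 1.0
    for m, node in enumerate(nodes):
        if m != j:
            value *= (xi - node) / (nodes[j] - node)
    return value

def build_interpolant(solver, nodes):
    """Run the deterministic solver once per node; return the interpolant."""
    values = [solver(node) for node in nodes]   # Q deterministic runs
    def interpolant(xi):
        return sum(v * lagrange_basis(nodes, j, xi)
                   for j, v in enumerate(values))
    return interpolant

def toy_solver(xi):
    """Hypothetical stand-in for one deterministic simulation run."""
    rho = 40.0 + 10.0 * xi                     # density (veh/km), xi in [-1, 1]
    return 100.0 * rho * (1.0 - rho / 150.0)   # Greenshields flow (veh/h)

# Five collocation nodes on [-1, 1] (Chebyshev extrema).
nodes = [-math.cos(math.pi * k / 4) for k in range(5)]
interp = build_interpolant(toy_solver, nodes)

# Sampling the interpolant is cheap: no further solver runs are needed.
rng = random.Random(0)
flow_samples = [interp(rng.uniform(-1.0, 1.0)) for _ in range(10000)]
```

Only five solver runs are spent here, yet ten thousand output samples are drawn; that trade is the source of the savings over Monte Carlo-type sampling.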

(9)

Smolyak Algorithm

•  Developed originally for multi-dimensional integration
•  1-D interpolant:

$$U^{i}(f) = \sum_{j=1}^{m_i} f(L_j^{i})\, \Theta_j^{i}, \qquad m_i = \text{\# nodes at level } i$$

•  In N dimensions the full tensor interpolant is approximated by the sparse grid interpolant
•  Error: $O(Q^{-2} |\log Q|^{3(N-1)})$ (piecewise linear basis)
•  $O(Q^{-k} |\log Q|^{(k+2)(N-1)})$ (k-polynomial basis)
•  Can be controlled by poly. order k: O(Smolyak) > O(MC), O(LHS)
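The saving from a sparse grid can be seen by simply counting nodes. The sketch below forms the Smolyak node set as the union of small tensor grids of nested Clenshaw-Curtis nodes whose level indices sum to at most the chosen level, and compares its size with the full tensor grid. The level-to-node-count rule (1 node at level 0, 2^i + 1 afterwards) is a common convention assumed here, and the function names are hypothetical.

```python
import math
from itertools import product

def cc_nodes(level):
    """Nested Clenshaw-Curtis nodes on [-1, 1].

    Assumed growth rule: 1 node at level 0, 2**level + 1 nodes otherwise.
    Values are rounded so that shared nodes compare equal across levels.
    """
    if level == 0:
        return [0.0]
    m = 2 ** level + 1
    return [round(-math.cos(math.pi * k / (m - 1)), 12) for k in range(m)]

def sparse_grid(dim, level):
    """Smolyak node set: union of tensor grids with sum(levels) <= level."""
    points = set()
    for levels in product(range(level + 1), repeat=dim):
        if sum(levels) <= level:
            points.update(product(*(cc_nodes(l) for l in levels)))
    return points

# Two stochastic dimensions, level 3.
n_sparse = len(sparse_grid(2, 3))
n_full = len(cc_nodes(3)) ** 2
```

In two dimensions at level 3 the sparse grid has 29 nodes against 81 for the full tensor grid, and the gap widens quickly as dimensions are added.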

(10)

[Figure: collocation points at which the deterministic simulation is performed, plotted over the distribution of parameter 1 (free flow speed, etc.) and the distribution of parameter 2 (jam density, etc.), with multiple replications at each point for variance reduction due to stochastic demands.]

(11)

Parameter Optimization

•  From each realization of the parameter set, using the demand distribution as an input, the simulation output distribution (e.g., flow or density distribution) is generated.
•  This distribution is compared with the observed output distribution and, using a test statistic (such as the test statistic from the KS test), the error is estimated.
•  This error is used as an objective function and is minimized as part of the multi-objective parameter optimization using the simultaneous perturbation stochastic approximation (SPSA) algorithm.

$$\min_{\Theta_t^k} \sum_{i=1}^{N} \left\{ w_1 U_1\!\left(q_i^{Ob}, q_i^{S}(\Theta_t^k)\right) + w_2 U_2\!\left(\rho_i^{Ob}, \rho_i^{S}(\Theta_t^k)\right) \right\}$$

where,
$q_i^{Ob}, q_i^{S}$ - observed and simulated flows at location $i$
$\rho_i^{Ob}, \rho_i^{S}$ - observed and simulated densities for location $i$
$\Theta_t^k$ - parameter set for time period $t$ and iteration $k$
$w_1, w_2$ - weights for the error measures
$U_1, U_2$ - functions representing the error in flow and density

Weight parameter $w$ signifies the variance of each output measure in the data.
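A minimal SPSA loop for an objective of this weighted form might look like the sketch below. The gain-sequence exponents (0.602 and 0.101) are commonly used SPSA defaults rather than values reported on the slides, and `weighted_error` is a toy quadratic standing in for the weighted flow and density error terms, with hypothetical target values.

```python
import random

def spsa_minimize(loss, theta0, n_iter=500, a=0.5, c=0.1, A=10.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Simultaneous perturbation stochastic approximation (sketch).

    Each iteration needs only two loss evaluations, whatever the
    number of parameters in theta.
    """
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(n_iter):
        a_k = a / (k + 1 + A) ** alpha      # step-size gain
        c_k = c / (k + 1) ** gamma          # perturbation size
        # Random +/-1 perturbation applied to all parameters at once.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        diff = loss(plus) - loss(minus)
        # Gradient estimate for component i is diff / (2 * c_k * delta_i).
        theta = [t - a_k * diff / (2.0 * c_k * d)
                 for t, d in zip(theta, delta)]
    return theta

def weighted_error(theta, w1=0.5, w2=0.5):
    """Toy stand-in for w1*U1 + w2*U2 with a known minimum at (1, -2)."""
    u1 = (theta[0] - 1.0) ** 2
    u2 = (theta[1] + 2.0) ** 2
    return w1 * u1 + w2 * u2

theta_hat = spsa_minimize(weighted_error, [0.0, 0.0])
```

In a calibration run, `weighted_error` would be replaced by a function that evaluates the simulated output distribution for the candidate parameter set and returns the weighted test-statistic error.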

(12)

Flowchart of Calibration using Stochastic Collocation

[Flowchart: (1) set j = 0 and take the input demand / parameter distribution; (2) generate the collocation points; (3) run the deterministic 1st-order PDE at collocation point j to obtain the output ρ(x, t_j) (any existing simulation or legacy codes can alternatively be used; this step is parallelizable); (4) if j ≠ Q, set j = j + 1 and repeat step 3; (5) otherwise form the output distribution ρ*(x, t, Z); (6) if Error ≤ Allowed Error, END; otherwise run the SPSA optimization to obtain a new parameter set Θ_t^k, reset j = 0, and repeat.]

(13)

Study section

•  Section of NJTPK at interchange 7 with a single on- and off-ramp with stochastic demand
•  Big Data: ETC Data
–  Vehicle-by-vehicle entry and exit time, lane, transaction type, vehicle type, number of axles.
–  Available in NJ for 150 miles of NJTPK and 170 miles of GSP.
•  The variation in demand at this section is captured using the ETC data for every 5 minutes between January 1, 2011 and August 31, 2011.

(14)

Study Data

•  The demand is divided into clusters using the k-means algorithm.
•  For each cluster, the distribution of demand during each 5-minute time period is generated.
•  The simulation is performed for weekday
–  AM peak (7-9AM)
–  off-peak (10AM-12PM)
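The clustering step can be sketched with a plain k-means loop. The example below clusters one scalar per day (a hypothetical total demand in veh/h); the study may well cluster richer demand profiles, so the one-dimensional feature, the function name, and the sample values are all assumptions.

```python
import random

def kmeans_1d(values, k, n_iter=50, seed=0):
    """Plain k-means on scalar values; returns the sorted cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)          # initial centers from the data
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each value to its nearest center.
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Hypothetical daily demands with a low and a high regime.
daily_demand = [1000.0, 1010.0, 990.0, 2000.0, 2010.0, 1990.0]
centers = kmeans_1d(daily_demand, k=2)
```

Once days are assigned to clusters, the 5-minute demand distribution for each cluster can be estimated from the days that fall in it.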

(15)

Implementation of Proposed Approach to Study Section

•  With the demand distribution as an input, for each realization of the parameter set, the simulation output flow distribution is generated.
•  The Clenshaw-Curtis grid is the appropriate sparse grid to discretize the stochastic demand.
•  Sparse grid interpolation is performed using the output of the simulation at each collocation node.
•  Distribution of simulated flows is obtained by repeated evaluation of the Smolyak interpolation function.
•  This distribution is compared with the sensor data flow distribution; the error is estimated using the test statistic from the KS test at 90% significance and is minimized using the SPSA algorithm.
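The comparison of the simulated and observed flow distributions can be sketched with a two-sample Kolmogorov-Smirnov statistic, as below. The critical value uses the standard large-sample approximation, and reading "90%" as α = 0.10 is an assumption; the function names and sample values are hypothetical.

```python
import math

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every observation <= x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def ks_critical(n, m, alpha=0.10):
    """Large-sample critical value for the two-sample KS test."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha * math.sqrt((n + m) / (n * m))

# Hypothetical observed and simulated flows (veh/h).
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [3.0, 4.0, 5.0, 6.0]
error = ks_statistic(observed, simulated)
```

The returned statistic can serve directly as the error passed to the optimizer, with `ks_critical` giving a possible stopping threshold.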

(16)

Results

•  To achieve the flow distribution for ,
•  AM peak:
–  SC approach required 2433 evaluations
–  MC-type sampling required

(17)

Results

•  To achieve the flow

distribution for ,

•  Off-peak

evaluations

(18)

Results

•  Weekend peak

evaluations

(19)

Results

•  To illustrate the drawback of using limited data, we compare the distribution of flow for high and low weekend demands with the case where only three weekend days of flow and demand are used to calibrate the weekend model.

(20)

Conclusions

•  Calibrating for various traffic conditions requires large datasets.
•  Big Data, such as RFID-based ETC data, is useful.
•  However, Big Data poses a computational problem in calibrating for all conditions.
•  Traditional MC-type sampling needs heavy computational resources.
•  We propose a methodology to capture stochasticity using stochastic collocation by defining each stochastic factor as a dimension.
•  Computationally efficient sparse grids are used to sample the stochastic space and build an interpolant using the deterministic output at each support node of the grid.

(21)

Conclusions & Future Work

•  Using 5-min., 8-month demand data, we calibrate AM peak, off-peak and weekend peak macroscopic traffic models.
•  The distribution of flows is obtained from the interpolant and used with the observed distribution to build a KS test statistic for calibration using SPSA.
•  Proposed methodology:
–  Applies to any type of simulation model
–  Is more efficient than MC-type methods
–  Can be parallelized to increase speed
•  Use stochastic parameters for jam density and wave speed for the traffic flow fundamental diagram for a larger freeway section.
•  Apply the methodology to a larger network with higher dimensions of stochasticity.

(22)