Working Paper Series

Mathematical characterization of congestion based on speed distribution: A case study of greater Toronto and Hamilton area, Ontario, Canada

Natalia Kyriakopoulou, Pavlos Kanaroglou and Yorgos N. Photis

GCWP - 002

Paper presented at the 13th International Conference on Environmental Science and Technology (CEST2013), Athens, Greece, 5-7 September 2013

July 2013

GeoCHOROS

Geospatial Analysis & GIS Research Group, National Technical University of Athens

MATHEMATICAL CHARACTERIZATION OF CONGESTION BASED ON SPEED DISTRIBUTION: A CASE STUDY OF GREATER TORONTO AND HAMILTON AREA, ONTARIO, CANADA

KYRIAKOPOULOU NATALIA¹, PAVLOS KANAROGLOU², YORGOS PHOTIS³

¹ Kyriakopoulou A. Natalia, School of Rural and Survey Engineering, National Technical University of Athens, 9 Iroon Polutechneiou Street, Zografou, Attiki, Greece, 15780.

Email address: [email protected]

² Pavlos S. Kanaroglou, Professor, Center for Spatial Analysis (CSpA), School of Geography and Earth Sciences, McMaster University, 1280 Main Street West, Hamilton, Ontario, Canada, L8S-4K1. Email address: [email protected]

³ Yorgos N. Photis, Professor, School of Planning and Regional Development, University of Thessaly, Pedion Areos, Volos, Magnesia, Greece, 383 34. Email address: [email protected]

EXTENDED ABSTRACT

This study formulates a comprehensive methodology for quantifying and identifying congestion characteristics based on speed distribution. For our analyses, we use speed data collected by INRIX Inc. from vehicles traveling through the Greater Toronto and Hamilton Area in 2011. A mathematical approach is applied to characterize roadway segments in terms of travel reliability as well as congestion severity and duration. We argue that the Gaussian mixture model and the combination of its parameters constitute a useful tool for obtaining quantitative congestion measures and ranking roadway performance.

A plethora of measures have been developed to evaluate traffic congestion levels on urban roadways. Two aspects that have been investigated are the duration of congestion on a roadway segment (Stathopoulos and Karlaftis, 2002) and the bimodality of the speed distribution curve under mixed traffic conditions (Partha and Satish, 2006).

In a similar context, our methodology is based on assumptions regarding mixture components and the speed distribution. Processing starts with estimating the Gaussian mixture model parameters. Next, an investigation of the bimodality of the distributions categorizes every roadway link as unreliable, reliable slow or reliable fast. A ranking process then prioritizes all segments from worst to best using the Analytical Hierarchy Process. Finally, GIS mapping provides spatiotemporal information on congestion characteristics by identifying hot spots according to congestion level, severity and duration.

We conclude that the Gaussian mixture model can be a useful tool for quantifying congestion. Moreover, our methodological framework can be applied to large databases. Results indicate that speed patterns in the study area differ both between counties and between days of the week.

Keywords: Traffic congestion, Gaussian mixture model, speed distribution, EM algorithm, bimodal distribution.


1. INTRODUCTION

Traffic congestion is known to exacerbate emissions from mobile sources in urban areas, thus contributing to air quality deterioration with significant health, environmental and economic impacts (Smit et al., 2008). A comprehensive selection of mitigation policies requires a sound understanding of congestion characteristics, which vary significantly with the time of day and the day of the week. Congestion is also not uniform over space; it varies significantly between sections of the transport network. This paper proposes a mathematical approach that uses vehicle speed data and a Gaussian mixture model to identify and quantify congestion characteristics. The effectiveness of the method is demonstrated using link-level speed data for the Greater Toronto and Hamilton Area (GTHA), Canada.

2. BACKGROUND

A review of the literature reveals many different definitions and analytical expressions for congestion. A widely used definition is: ‘congestion is the time or the delay in excess of that normally incurred under light or free flow traffic condition’ (Turner et al., 1996). Selecting congestion measures is not an easy task, and each study adopts a methodological framework suited to its purpose. Although the aforementioned definition points to travel delay, or the amount of extra time, as the basic measure (Shrank and Lomax, 2011), many studies approach the problem with other methods, such as fuzzy logic (Hamad and Kikuchi, 2002) or mathematical models.

A growing body of research has evaluated traffic patterns on congested highway systems using mathematical distributions (Junkwood, 2009). With respect to the duration of congestion, Stathopoulos and Karlaftis (2002) argue that it is best described by the log-logistic functional form, while Vlachogianni et al. (2011) apply a multiregime nonlinear autoregressive conditional duration model. Addressing bimodality, Ko and Guesnier (2004) identify congested and uncongested components in order to quantify the characteristics of congestion, while Partha and Satish (2006) examine speed distributions under mixed traffic conditions. Finally, Junkwood (2009) focuses on the variability of speed patterns due to holiday traffic, using a Gaussian mixture distribution estimated by the Expectation-Maximization (EM) algorithm.

A Gaussian Mixture Model (GMM) is a parametric probability density function represented as a weighted sum of M Gaussian component densities, as follows:

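$$p(\mathbf{x}) = \sum_{i=1}^{M} w_i\,\phi(\mathbf{x}\mid\boldsymbol{\mu}_i,\boldsymbol{\Sigma}_i),$$

where x is a D-dimensional continuous-valued data vector; w_i, i = 1, …, M, are the mixture weights, which are non-negative and sum to one; and φ(x | μ_i, Σ_i), i = 1, …, M, are the component Gaussian densities with mean vectors μ_i and covariance matrices Σ_i.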
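Because the analysis works with a single variable (speed), the mixture is used with D = 1. The sketch below evaluates the univariate mixture density in Python; the function names are illustrative, and the Wednesday network averages from Table 1 are borrowed only as example parameters, not as a fitted model for any particular segment.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Univariate Gaussian density phi(x | mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def gmm_pdf(x: float, weights, means, sigmas) -> float:
    """Weighted sum of M univariate Gaussian component densities."""
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("mixture weights must sum to 1")
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas))


# Example parameters: Wednesday averages from Table 1 (congested, uncongested)
WEDNESDAY = dict(weights=[0.46, 0.54], means=[32.94, 38.07], sigmas=[2.0903, 0.8095])

density_at_35 = gmm_pdf(35.0, **WEDNESDAY)
```

Checking that the weights sum to one guards against passing an unnormalized mixture, which would no longer integrate to one.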

This paper builds on this body of research and aims to provide a mathematical characterization of congestion in the GTHA road network using average speed data obtained from INRIX Inc.


3. METHODOLOGICAL FRAMEWORK

Our methodology is based on three assumptions: first, that the speed distribution has a mixed form with congested and uncongested periods; second, that the speed distribution reveals the traffic characteristics without a need to account for roadway capacity and traffic volume; and third, that the speed distribution over a given time period is normal.

The methodology consists of four stages. First, the Gaussian mixture parameters are estimated with the EM algorithm, which, for robustness, is initialized with multiple runs of the k-means algorithm (Marakakis et al., 2006). A mixture of two normal densities must be either unimodal or bimodal. To test for modality, we employ the approach of Schilling et al. (2002). If

$$\mu_2 - \mu_1 \geq \sigma_1 + \sigma_2 \quad\text{and}\quad \mu_1 \leq 0.75\,V,$$

then the distribution is bimodal. Here μ_1 and μ_2 are the means of the lower-speed (congested) and higher-speed (uncongested) distributions, σ_1 and σ_2 are their standard deviations (with a common standard deviation σ the first condition becomes μ_2 − μ_1 ≥ 2σ), and V is the free flow speed, computed as the average of the night-time speed data.
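A minimal sketch of the modality test follows. It assumes the separation condition reads μ_2 − μ_1 ≥ σ_1 + σ_2 (our reading of the extracted formula) together with μ_1 ≤ 0.75·V, with V taken as the mean of the night-time speeds. The function names `free_flow_speed` and `is_bimodal` are hypothetical.

```python
from statistics import mean


def free_flow_speed(night_speeds):
    """V: average of the night-time speed observations for one segment."""
    return mean(night_speeds)


def is_bimodal(mu1, sigma1, mu2, sigma2, v_free, threshold=0.75):
    """Assumed bimodality rule; component 1 is the lower-speed (congested) one."""
    if mu1 > mu2:  # reorder so that component 1 is always the slower component
        mu1, sigma1, mu2, sigma2 = mu2, sigma2, mu1, sigma1
    separated = (mu2 - mu1) >= (sigma1 + sigma2)  # means far apart relative to spread
    congested = mu1 <= threshold * v_free         # slow mode below the congestion threshold
    return separated and congested
```

Both conditions must hold: well-separated components alone do not make a segment bimodal in this sense unless the slower mode actually falls below the congestion threshold.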

The modality of the density functions is related to the travel conditions, especially to the reliability of roadway performance. The result of this step is the characterization of every roadway link as unreliable, reliable slow or reliable fast. The last step entails a ranking process that prioritizes all segments from worst to best and identifies the hot-spot segments based on their congestion severity and duration. This ranking process requires an indicator composed of a set of weighted variables that characterize congestion. One of these variables is the Travel Time Index, as used by Schrank et al. (2012), which expresses the congestion level by comparing peak-period travel time with free-flow travel time. To compute the relative weights of the selected variables, we propose the Analytical Hierarchy Process as described by Saaty (1990), a structured technique for deriving such weights from pairwise comparisons.
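The Travel Time Index can be computed from speeds alone, because segment length cancels out of the ratio. The sketch below assumes a list of speeds observed during the peak period on one segment; the paper does not specify which intervals make up the peak, so that choice is left to the caller, and `travel_time_index` is a hypothetical name.

```python
def travel_time_index(peak_speeds, v_free):
    """Travel Time Index for one segment from peak-period speeds.

    Travel time over a segment of length L is L / v, so L cancels:
    TTI = mean(L / v_k) / (L / v_free) = v_free * mean(1 / v_k).
    """
    if not peak_speeds or min(peak_speeds) <= 0:
        raise ValueError("peak-period speeds must be non-empty and positive")
    mean_inverse_speed = sum(1.0 / v for v in peak_speeds) / len(peak_speeds)
    return v_free * mean_inverse_speed
```

Averaging travel times (equivalently, taking the harmonic mean of speeds) rather than averaging speeds matters: speeds of 20 and 40 with a free flow speed of 40 give a TTI of 1.5, whereas the arithmetic mean speed of 30 would suggest about 1.33.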

4. CASE STUDY

4.1 Study Area

The GTHA is located in Southern Ontario and has a population of 6,574,140 (Canadian Census Analyzer, 2011). It is Canada’s largest and fastest-growing urban region. It comprises two single-tier municipalities (Hamilton and Toronto) and four regional municipalities (Durham, Halton, Peel and York). Congestion in the GTHA is presently a serious problem and is expected to become worse as the region grows.

4.2 Data

The data used in this work were collected by INRIX Inc. from vehicles traveling on roads throughout the Greater Toronto and Hamilton Area during 2011. The data set includes the average speed on 7,879 roadway segments, with lengths in the range [3.15m – 13.45m], at 15-minute intervals for each day of the week, together with the segment IDs and the associated roadway attributes.
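Before fitting, the 15-minute records have to be split into the daytime window used for the mixture fit (5am to 10pm, per Section 4.3) and the night-time observations used for the free flow speed. The sketch assumes a hypothetical record layout of `(segment_id, weekday, time_of_day, speed)` tuples, treats the window as half-open, and counts everything outside it as night-time.

```python
from collections import defaultdict
from datetime import time

# Analysis window of Section 4.3 (05:00-22:00), treated here as half-open.
DAY_START, DAY_END = time(5, 0), time(22, 0)


def split_day_night(records):
    """Group 15-minute records into daytime and night-time speed lists.

    records: iterable of (segment_id, weekday, time_of_day, speed) tuples
    (a hypothetical layout). Returns two dicts keyed by (segment_id, weekday):
    daytime speeds for the mixture fit, night-time speeds for free flow speed.
    """
    day, night = defaultdict(list), defaultdict(list)
    for segment_id, weekday, time_of_day, speed in records:
        bucket = day if DAY_START <= time_of_day < DAY_END else night
        bucket[(segment_id, weekday)].append(speed)
    return day, night
```

Keying by segment and weekday yields one speed sample per segment-day pair, matching the per-day-of-week results in Tables 1 and 2.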


4.3 Gaussian Mixture Analysis: Results

Using the speed data from 5am to 10pm and following the methodology described in the previous section, a two-component Gaussian Mixture Model is estimated for every roadway segment, with the parameters obtained by the Expectation-Maximization algorithm. Night-time data are excluded from this analysis because night-time speeds are consistently high, with low standard deviation; they are used instead to compute the free flow speed.
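A from-scratch sketch of this estimation step for one segment's daytime speeds is shown below: EM for a univariate two-component mixture, with the means initialized from the best (lowest within-cluster error) of several 1-D k-means runs. It is an illustration under these assumptions, not the authors' code; the function names, iteration limits and the standard-deviation floor are our choices. In practice, scikit-learn's `sklearn.mixture.GaussianMixture(n_components=2, init_params="kmeans", n_init=...)` provides a tested equivalent.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Univariate Gaussian density."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def kmeans_1d(xs, k, rng, iters=50):
    """One run of 1-D k-means from random data points; returns (centres, SSE)."""
    centres = rng.sample(xs, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda c: (x - centres[c]) ** 2)
            clusters[nearest].append(x)
        updated = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
        if updated == centres:
            break
        centres = updated
    sse = sum(min((x - c) ** 2 for c in centres) for x in xs)
    return sorted(centres), sse


def fit_two_component_gmm(xs, n_init=5, max_iter=200, tol=1e-8, seed=0):
    """EM for a 1-D two-component Gaussian mixture.

    Means are initialised from the best (lowest-SSE) of n_init k-means runs.
    Returns (weights, means, sigmas), ordered so component 0 is the slower one.
    """
    rng = random.Random(seed)
    centres, _ = min((kmeans_1d(xs, 2, rng) for _ in range(n_init)), key=lambda r: r[1])
    n = len(xs)
    overall_mean = sum(xs) / n
    spread = math.sqrt(sum((x - overall_mean) ** 2 for x in xs) / n) or 1.0
    w, mu, sd = [0.5, 0.5], list(centres), [spread, spread]
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: responsibility of each component for each observation
        resp, ll = [], 0.0
        for x in xs:
            p = [w[k] * normal_pdf(x, mu[k], sd[k]) for k in range(2)]
            total = sum(p) or 1e-300
            ll += math.log(total)
            resp.append([pk / total for pk in p])
        # M-step: update weights, means and standard deviations
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-12
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sd[k] = max(math.sqrt(var), 1e-3)  # floor guards against collapse
        if ll - prev_ll < tol:  # stop when the log-likelihood stops improving
            break
        prev_ll = ll
    order = sorted(range(2), key=lambda k: mu[k])
    return [w[k] for k in order], [mu[k] for k in order], [sd[k] for k in order]
```

Ordering the output by mean makes component 0 the congested one, which is the convention the later bimodality and ranking steps rely on.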

The results of the Gaussian Mixture Model analysis comprise three parameters (mean, standard deviation and mixture weight) for each of the two distributions, per roadway segment and day of the week. Wednesday (Table 1) had the greatest difference in average mean speed between the congested and uncongested distributions, which indicates the greatest congestion severity. Saturday and Sunday followed a different pattern from the weekdays, with higher congested-component means and smaller speed differences.

Table 1 Results of the Gaussian Mixture distribution by day of the week.

| Day | Component 1 (congested): mean speed | Component 1: standard deviation | Component 1: mixing proportion | Component 2 (uncongested): mean speed | Component 2: standard deviation | Component 2: mixing proportion | Speed difference |
|---|---|---|---|---|---|---|---|
| Monday | 34.24 | 1.9322 | 0.44 | 38.65 | 0.5735 | 0.56 | 4.40 |
| Tuesday | 33.74 | 1.9721 | 0.45 | 38.47 | 0.6574 | 0.55 | 4.73 |
| Wednesday | 32.94 | 2.0903 | 0.46 | 38.07 | 0.8095 | 0.54 | 5.13 |
| Thursday | 33.36 | 1.9964 | 0.45 | 38.27 | 0.7621 | 0.55 | 4.91 |
| Friday | 33.97 | 1.8803 | 0.46 | 38.54 | 0.6450 | 0.54 | 4.57 |
| Saturday | 36.19 | 1.2462 | 0.43 | 39.06 | 0.3601 | 0.57 | 2.87 |
| Sunday | 37.11 | 1.0168 | 0.38 | 39.33 | 0.2039 | 0.62 | 2.22 |

A mathematical description of the speed distribution curve is needed at the roadway segment level in order to locate hot spots and to compare and analyze speed profiles spatially and temporally.

Special caution is required when interpreting the two speed components estimated by the EM algorithm in terms of congestion. After estimating the two-component mixtures, it is necessary to investigate the bimodality of the distributions. Using the estimated parameters, a set of rules determines whether a distribution is bimodal, as assumed, or unimodal.

A bimodal distribution (Figure 1) shows that the roadway segment experiences both congested and uncongested conditions, while a unimodal distribution (Figure 2) indicates either congested or free flow conditions. Figure 2 depicts a reliable slow roadway segment, because the weighted average speed of the two components is below the congestion threshold of 0.75 × free flow speed (Schrank et al., 2012), and the segment is experiencing serious traffic congestion.
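The three-way characterization can be sketched as follows, assuming the bimodality decision has already been made and that unimodal segments are split on the weighted average speed of the two components against 0.75 × free flow speed, as described for Figure 2. The function name `classify_segment` is hypothetical.

```python
def classify_segment(weights, means, bimodal, v_free, threshold=0.75):
    """Label a segment 'unreliable', 'reliable slow' or 'reliable fast'.

    bimodal: outcome of the modality test (True means both congested and
    uncongested regimes are present, so travel time is unpredictable).
    Unimodal segments are split on the weighted average speed of the two
    components versus the congestion threshold threshold * v_free.
    """
    if bimodal:
        return "unreliable"
    weighted_speed = sum(w * m for w, m in zip(weights, means))
    return "reliable slow" if weighted_speed < threshold * v_free else "reliable fast"
```

For example, with a free flow speed of 50 the threshold is 37.5, so a unimodal segment whose components average 32 is labeled reliable slow.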


The performance of each roadway segment per day of the week is evaluated using the algorithms presented above. Table 2 summarizes the number of unimodal and bimodal roadway segments per day of the week. Wednesday had the most bimodal segments (1,194), indicating serious congestion, while Sunday had the fewest (88).

Table 2 Counts of unimodal and bimodal roadway segments by day of the week.

| Day | Unimodal: reliable fast | Unimodal: reliable slow | Bimodal |
|---|---|---|---|
| Monday | 6,970 | 27 | 882 |
| Tuesday | 6,860 | 2 | 1,017 |
| Wednesday | 6,683 | 2 | 1,194 |
| Thursday | 6,744 | 2 | 1,133 |
| Friday | 6,971 | 4 | 904 |
| Saturday | 7,673 | 2 | 204 |
| Sunday | 7,790 | 1 | 88 |

Map 1 shows the classification of the roadway segments into three categories (unreliable, reliable slow and reliable fast) for Wednesday. The unreliable segments are mainly concentrated in the centre of Toronto. The average speed difference for the unreliable segments is 13 mph and the average Travel Time Index is 1.37. Unreliable travel conditions over these links mean that travel time on these sections is unpredictable, which could cost commuters an average of 72 minutes of extra travel time over the Wednesdays of 2011. The locations of the bimodal links identify the regions with serious traffic problems, where further analysis is needed.
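Extra travel time follows from the Travel Time Index through the identity: peak travel time = TTI × free-flow travel time, so the delay per traversal is (TTI − 1) × free-flow travel time. How the 72-minute figure was aggregated is not stated, so the trip count in this sketch is a hypothetical parameter, as is the function name.

```python
def extra_travel_minutes(tti, length_miles, v_free_mph, trips=1):
    """Delay implied by a Travel Time Index.

    Peak travel time = TTI * free-flow travel time, so the extra time per
    traversal is (TTI - 1) * free-flow time; `trips` is a hypothetical count.
    """
    free_flow_minutes = 60.0 * length_miles / v_free_mph
    return (tti - 1.0) * free_flow_minutes * trips
```

With a TTI of 1.37, for instance, each traversal takes 37% longer than at free flow.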

Figure 1 Bimodal estimated Gaussian mixtures of travel speed. Roadway segment’s ID: 2659

Figure 2 Unimodal estimated Gaussian mixtures of travel speed. Roadway segment’s ID: 6969


The literature shows that the estimated model parameters (mean, standard deviation and mixture weight) represent traffic conditions quantitatively. The mean of the lower-speed distribution approximates the severity of congestion; the mean of the higher-speed distribution defines the acceptable speed range; the variance represents the reliability of roadway performance; and the mixture weight is an index of the duration of congestion (Junkwood, 2009; Ko and Guesnier, 2004). Although each parameter has its own implications, combining them gives more comprehensive insight into congestion. By combining the parameters of the Gaussian mixture analysis with the Travel Time Index and the free flow speed, a composite index can be formed.

The Analytical Hierarchy Process (Saaty, 1990) is applied using four main criteria:

• The ratio of the mean speed of the high-speed distribution to that of the low-speed distribution (variable 1). This ratio measures the severity of congestion. We propose it because it is unitless and comparable across all roadway links.

• The mixture weight of the low-speed distribution (variable 2). The two weights of the Gaussian Mixture Model can be adopted as a congestion index on a scale of 0 to 1 in terms of severity and duration. We use the weight of the lower-speed distribution because that component captures congested conditions; lower values of this weight indicate better traffic conditions.

• The ratio of the free flow speed to the mean speed of the distribution with the greatest weight (variable 3). It measures the severity of congestion by relating the dominant distribution to the free flow speed.

• The Travel Time Index (variable 4). The Travel Time Index compares peak-period travel time to free-flow travel time. It incorporates delay, which is a basic performance measure, and, being unitless, allows comparison of roadway segments with different characteristics such as length and number of lanes.

To calculate the weights (Table 4), we apply the Analytical Hierarchy Process (AHP) as recommended by Saaty (1990), who also defines the numerical scale indicating how many times more important one element is than another. Table 3 shows the relative importance of the variables obtained by pairwise comparison (PC).

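Table 3 Pairwise comparison of the variables

| | Variable 1 | Variable 2 | Variable 3 | Variable 4 |
|---|---|---|---|---|
| Variable 1 | 1.00 | 0.33 | 4.00 | 3.00 |
| Variable 2 | 3.00 | 1.00 | 3.00 | 6.00 |
| Variable 3 | 0.25 | 0.33 | 1.00 | 2.00 |
| Variable 4 | 0.33 | 0.17 | 0.50 | 1.00 |

Table 4 Weights

| Variable | Weight |
|---|---|
| Variable 1 | 0.5 |
| Variable 2 | 0.3 |
| Variable 3 | 0.1 |
| Variable 4 | 0.1 |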
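In Saaty's method, the weights are the normalized principal eigenvector of the pairwise comparison matrix, and the principal eigenvalue λ_max gives the consistency index CI = (λ_max − n)/(n − 1). The sketch below estimates both by power iteration, a choice made here for brevity (any eigen-solver would do); `ahp_weights` is a hypothetical name. Applied to the Table 3 matrix as extracted, it assigns the largest weight to Variable 2, which readers may wish to compare with the rounded values in Table 4.

```python
def ahp_weights(matrix, iters=200):
    """Normalised principal eigenvector of a pairwise comparison matrix.

    Uses power iteration; also returns the estimate of lambda_max, which
    feeds Saaty's consistency index CI = (lambda_max - n) / (n - 1).
    """
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iters):
        product = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        w = [v / total for v in product]  # renormalise so the weights sum to 1
    product = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(product[i] / w[i] for i in range(n)) / n
    return w, lambda_max


# Pairwise comparison matrix of Table 3 (rows and columns: variables 1-4)
TABLE_3 = [
    [1.00, 0.33, 4.00, 3.00],
    [3.00, 1.00, 3.00, 6.00],
    [0.25, 0.33, 1.00, 2.00],
    [0.33, 0.17, 0.50, 1.00],
]

weights, lambda_max = ahp_weights(TABLE_3)
consistency_index = (lambda_max - len(TABLE_3)) / (len(TABLE_3) - 1)
```

A perfectly consistent matrix would give λ_max = n and CI = 0; larger values indicate less consistent pairwise judgements.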

The spatial variation of the indicator across the study area constitutes the last step of the analysis and is visualized in Map 2. Table 5 presents the indicator statistics per county. We include results only for Wednesday, since it is the most problematic day of the week. With the highest mean (0.90) and the highest coefficient of variation (CV, 28.9%), Toronto is the most congested area. Similarly, the 10 worst-performing segments are presented in Table 6. For these, the average weekly delay was about 500 minutes, while vehicle speed was less than half of the free flow speed.
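The four criteria can be computed from a segment's fitted parameters. How the weighted variables are combined into the indicator is not spelled out (for example, whether they are normalized first), so the plain weighted sum below, using the Table 4 weights, is an assumption and is not expected to reproduce the indicator values in Tables 5 and 6. Reading variable 1 as the ratio of the two component means is also our interpretation; the function names are hypothetical.

```python
def congestion_variables(weights, means, v_free, tti):
    """Variables 1-4 for one segment; component 0 is the low-speed one."""
    v1 = means[1] / means[0]                            # high-speed mean / low-speed mean
    v2 = weights[0]                                     # weight of the low-speed component
    dominant = max(range(2), key=lambda k: weights[k])  # component with the greatest weight
    v3 = v_free / means[dominant]                       # free flow speed / dominant mean
    v4 = tti                                            # Travel Time Index
    return [v1, v2, v3, v4]


def weighted_indicator(variables, ahp_weights):
    """Hypothetical composite: plain weighted sum of the four variables."""
    return sum(a * v for a, v in zip(ahp_weights, variables))


TABLE_4_WEIGHTS = [0.5, 0.3, 0.1, 0.1]
```

All four variables grow as conditions worsen, so under this assumption a higher indicator marks a worse-performing segment, consistent with the ranking in Table 6.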


Table 5 Indicator’s statistics for the counties in the GTHA

| County | Min | Max | Mean | St. Deviation | CV % |
|---|---|---|---|---|---|
| Durham | 0.52 | 1.28 | 0.79 | 0.15 | 19.0 |
| Halton | 0.53 | 1.16 | 0.81 | 0.14 | 17.3 |
| Hamilton | 0.52 | 1.70 | 0.76 | 0.15 | 19.7 |
| Peel | 0.53 | 1.44 | 0.80 | 0.16 | 20.0 |
| Toronto | 0.52 | 2.41 | 0.90 | 0.26 | 28.9 |
| York | 0.52 | 1.21 | 0.79 | 0.15 | 19.0 |

Table 6 The top 10 worst-performing segments in the GTHA

| Rank | Code | Route name | County | Length (miles) | Mean 1 (mph) | Mean 2 (mph) | Indicator |
|---|---|---|---|---|---|---|---|
| 1 | 4939 | ON-401 Southbound | Toronto | 0.28 | 7.22 | 41.47 | 2.41 |
| 2 | 7763 | ON-401 Collector | Toronto | 0.42 | 18.86 | 51.05 | 1.79 |
| 3 | 7820 | ON-401 Collector | Toronto | 0.23 | 21.20 | 54.24 | 1.77 |
| 4 | 4915 | James Street Eastbound | Hamilton | 0.47 | 7.24 | 22.01 | 1.70 |
| 5 | 6530 | Eglinton Avenue Northbound | Toronto | 0.40 | 17.16 | 46.90 | 1.67 |
| 6 | 7765 | ON-401 Collector Westbound | Toronto | 0.26 | 12.06 | 28.40 | 1.63 |
| 7 | 4544 | Park Lawn Road Southbound | Toronto | 0.29 | 5.00 | 21.40 | 1.59 |
| 8 | 7732 | ON-401 Collector Eastbound | Toronto | 0.15 | 13.37 | 45.86 | 1.50 |
| 9 | 6357 | ON-401 Express Eastbound | Toronto | 0.31 | 13.38 | 44.83 | 1.48 |
| 10 | 7805 | Don Valley Pkwy Southbound | Toronto | 0.24 | 11.16 | 32.51 | 1.48 |
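The CV column of Table 5 is the standard deviation expressed as a percentage of the mean. Recomputing it from the rounded means and standard deviations, as in this short sketch, reproduces the reported values and confirms that Toronto has both the highest mean and the most variable indicator.

```python
# County: (mean, standard deviation) of the Wednesday indicator, from Table 5
TABLE_5 = {
    "Durham": (0.79, 0.15), "Halton": (0.81, 0.14), "Hamilton": (0.76, 0.15),
    "Peel": (0.80, 0.16), "Toronto": (0.90, 0.26), "York": (0.79, 0.15),
}


def coefficient_of_variation(mean, sd):
    """CV in percent: 100 * standard deviation / mean."""
    return 100.0 * sd / mean


cv = {county: round(coefficient_of_variation(*stats), 1) for county, stats in TABLE_5.items()}
```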

Map 2 locates hot spots around central Toronto, where high values of the indicator are concentrated. Commuters traveling to Toronto on an average Wednesday thus experience poor and unstable traffic conditions in terms of both trip duration and severity: recorded vehicle speeds in these areas were less than half of free flow speed for approximately 100 minutes weekly.

Map 2 Congestion hot spots

5. CONCLUSIONS

This paper proposes an approach to identify and quantify congestion based on the speed distribution. A case study is carried out using speed data for the Greater Toronto and Hamilton Area. The results show that Gaussian Mixture Model analysis is a useful tool for describing the characteristics of congestion. Investigating and understanding historical trends in speed patterns and congestion characteristics is beneficial because congestion has significant health, environmental and economic impacts. The suggested methodological framework is also relatively easy to apply to large databases. Its main advantages are that only speed data are used and that it combines a mathematical approach with fundamental measures of congestion in order to identify the reliability of roadway segments and rank their level of congestion. This methodology provides an efficient tool for decision makers to select an appropriate congestion mitigation strategy. Future research will extend the framework to estimate traffic emissions, especially in the identified congestion hot spots. The results could also be incorporated into route selection, for example in GPS navigation devices, allowing drivers to avoid unreliable segments in the same way they can avoid toll highways.

REFERENCES

1. Assimakopoulos V., Moussiopoulos N. and Apsimon H.M. (2000), Effects of Street Canyon Geometry on the Dispersion Characteristics in Urban Areas, 16th IMACS World Congress 2000, August 21-25 2000 Lausanne, Switzerland.

2. Hamad K., Kikuchi S. (2002), Developing a measure of traffic congestion: fuzzy inference approach, Transportation Research Record: Journal of the Transportation Research Board, No. 1802, Transportation Research Board of the National Academies, Washington, D.C., pp. 77-85.

3. Junkwood J. (2009), Understanding the variability of speed distributions under mixed traffic conditions caused by holiday traffic, Transportation Research Part C, 18, pp. 499-610.

4. Ko J., Guesnier R. (2004), Characterization of congestion based on speed distribution: A statistical approach using Gaussian mixture model, Transportation Research Board, Washington, D.C.

5. Marakakis A., Galatsanos N., Likas A., Stafylopatis A. (2006), A relevance feedback approach for content based image retrieval using Gaussian mixture models, Proc. International Conference on Artificial Neural Networks, Athens, pp. 84-93.

6. Partha P., Satish C. (2006), Speed distribution curves under mixed traffic conditions, Journal of Transportation Engineering 132, pp. 475-481.

7. Saaty T.L. (1990), How to make a decision: the analytic hierarchy process, European Journal of Operational Research 48(1), pp. 9-26.

8. Schilling M., Watkins A., Watkins W. (2002), Is human height bimodal?, The American Statistician, Vol. 56, pp. 223-229.

9. Schrank D., Eisele B., Lomax T. (2012), TTI’s 2012 Urban Mobility Report, Texas A&M Transportation Institute.

10. Shrank D., Lomax T. (2011), The 2011 urban mobility report, Texas Transportation Institute.

11. Smit R., Brown A.L., Chan Y.C. (2008), Do air pollution emissions and fuel consumption models for roadways include the effects of congestion in the roadway traffic flow?, Environmental Modelling and Software 23, pp. 1262-1270.

12. Stathopoulos A., Karlaftis M. (2002), Modeling duration of urban traffic congestion, Journal of Transportation Engineering 128, pp. 587-590.

13. Turner S.M., Lomax T.J. and Levinson H.S. (1996), Measuring and estimating congestion using travel time-based procedures, Transportation Research Record 1564, pp. 11-19.

14. Vlachogianni E., Karlaftis M., Kepatsoglou K. (2011), Nonlinear autoregressive conditional duration models for traffic congestion estimation, Journal of Probability and Statistics.