D3.2 Prototype development of a fully integrated data-driven simulator


DATA science for SIMulating the era of electric vehicles

www.datasim-fp7.eu

Project details

Project reference: 270833

Status: Execution

Programme acronym: FP7-ICT (FET Open)

Subprogramme area: ICT-2009.8.0 Future and Emerging Technologies

Contract type: Collaborative project (generic)

Consortium details

Coordinator: 1. (UHasselt) Universiteit Hasselt

Partners: 2. (CNR) Consiglio Nazionale delle Ricerche

3. (BME) Budapesti Muszaki es Gazdasagtudomanyi Egyetem

4. (Fraunhofer) Fraunhofer-Gesellschaft zur Foerderung der Angewandten Forschung E.V.

5. (UPM) Universidad Politecnica de Madrid

6. (VITO) Vlaamse Instelling voor Technologisch Onderzoek N.V.

7. (IIT) Technion – Israel Institute of Technology

8. (UPRC) University of Piraeus Research Center

9. (HU) University of Haifa

Contact details

Prof. dr. Davy Janssens

Universiteit Hasselt – Transportation Research Institute (IMOB)

Function in DATA SIM: Person in charge of scientific and technical/technological aspects

Address: Wetenschapspark 5 bus 6 | 3590 Diepenbeek | Belgium

Tel.: +32 (0)11 26 91 28

Fax: +32 (0)11 26 91 99

E-mail: [email protected]

URL: www.imob.uhasselt.be

Deliverable details

Work Package: WP3

Deliverable: D3.2 Prototype development of a fully integrated data-driven simulator

Dissemination level: PU

Nature: R

Contractual Date of Delivery: 31.08.2014

Actual Date of Delivery: 31.08.2014

Total number of pages: 40 (incl. 1 title page)

Authors: Luk Knapen, Davy Janssens, Ansar Yasar

Abstract

This document presents the work pursued in the third year of the project by the participants of WP3. It presents the scientific results that constitute the foundations for the development of the new simulator.


List of Figures

2.1 Diary alignment GUI command panel.

2.2 Diary alignment tool display showing two cases before alignment.

2.3 Diary alignment tool display showing two cases after alignment.

2.4 Histogram for activity start time due to diary alignment to GPS recording.

2.5 Activity Duration - Corrected vs. Original.

2.6 Part of a larger route showing a large number of splitVertexSets.

2.7 Detail view of Figure 2.6 showing splitVertexSets in and near the city center.

2.8 Route showing splitVertexSets that correspond to traffic lights.

2.9 Detail of Figure 2.8 (rightmost part of the route).

2.10 Frequency distributions for the number of basicComponents per trip.

2.11 Scatter plot: given and predicted flow values (gravitation law).

2.12 Scatter plot: given and predicted flow values (radiation law).

2.13 Probability density for predicted commuting distance (gravitation law).

2.14 Probability density for predicted commuting distance (radiation law).

2.15 cosα as a function of rejection probability.

2.16 Scatter plot: given and predicted flow values (FEATHERS).

2.17 Probability density for predicted commuting distance (FEATHERS).

2.18 Evolution of number of individuals involved in carpooling.


Chapter 1

General Information

1.1 Management Summary

This document constitutes DATASIM deliverable D3.2, which has been defined in [Jan11] (page 30/75) as

Behaviorally-sensitive simulator design ready for the calculation of Mobility-EV scenarios.

In the context of WP3, a simulator is an activity-based model for a specific area. Travel demand is considered to be caused by the need to perform activities of several kinds at suitable times and locations. Activity-based schedulers generate a daily activity plan for every member of a synthetic population. The travel demand is derived from those schedules and hence depends directly on the decisions taken by the individuals. WP3 focuses on the data required to feed the simulator and on the behavioral models for schedule adaptation and coordination between individuals.

This deliverable reports on work that integrates with WP2 (information extraction from big data to feed travel demand predictor tools) and feeds with realistic data both the scalability research performed in WP5 and the EV related electric power demand research in WP6.

The deliverable describes several simulator components that make it possible to estimate the effects of EV introduction both on traveler behavior and on the electric power grid. The work was performed along the following research directions:

• Efforts to use big data to feed (data hungry) activity-based models: this part describes (1) a method to quantify the errors that occur when collecting diaries, (2) a technique to expand a small survey using a large set of car traces, and (3) three efforts to integrate laws detected from big data in activity-based simulation.

• Prediction of travel times between regions from big data: while planning their agenda for the next day, people make use of expected travel times. Synthetic individuals in a simulator have a similar need and extract the information from transportation network data. Currently, travel times on a road network are calculated using functions that relate the traffic intensity on a road segment to the time required to travel that road segment; the functions depend on the road category. The research result makes it possible to feed modeling techniques with area specific travel times between zones instead of relying on generic parameters for road categories. The results of big data analysis allow for better travel duration estimates at specific stages of the schedule generation.

• Schedule adaptation: when people decide to cooperate or need to adapt to technical constraints, they need to adapt their daily activity plans. Coordination and adaptation are investigated in the context of carpooling and electric vehicle use. Schedule adaptation research focuses on individual behavior on the one hand and on the overall effects and their feedback on the other hand. Specifically, the feedback induced by time dependent electric energy tariffs due to power peak shaving efforts in smart grids has been investigated.

• Cooperating individuals: during schedule execution, coordination between people and/or with service providers in the environment might be required. The planned behavior specified by the generated individual schedules is used to estimate the characteristics of the network of candidates to carpool for daily commuting. The generated data served to feed the scalability research. Predicted schedules were used to set up an agent-based model to evaluate the effects, at the level of society, of the behavior shown while negotiating to carpool.

• FEATHERS schedule generator sensitivity analysis: continuation of the model adaptation to operate on finer grained traffic analysis zoning.

Source code (OSGi bundles) will be made available as open source software. This includes the code for schedule (daily planning) adaptation and the code to determine the optimal electric energy charging.

1.2 List of DATASIM WP3 Related Publications for Year 3

Published

[KYC+13] Exploiting Graph-theoretic Tools for Matching in Carpooling Applications

[KUB+13] Within Day Rescheduling Microsimulation combined with Macrosimulated Traffic

[KKY+13] Estimating Scalability Issues while Finding an Optimal Assignment for Carpooling

[KBAHK+14] Scalability issues in optimal assignment for carpooling

[BKJ+14] Geographical Extension of the Activity-based Modeling Framework FEATHERS

[KBJW14a] Canonic Route Splitting

Internal Reports

[KG13] Design Note : Agent-based Carpooling Model. Specification note for the carpooling agent-based model development.

[Kna13] Data used to feed schedule generators (data4simulation). Initial design for a method to combine information rich survey data with large sets of car traces.

[KU13] Notes on EV-Charging and ReScheduling Simulation. Discussion note used to plan model design.

[KYCB14] Agent-based modeling for carpooling. Chapter in book written as a DATASIM dissemination effort.

[KKBJ14] Notes on the use of radiation laws in simulation. Technical proposal to include saturation effect in the radiation law (serves as a basis for IMOB/BME research cooperation).

[Sim14] A non-parametric method to estimate weights and deterrence function in singly constrained gravity models. Technical proposal to develop a doubly constrained gravitation law for use in activity based models (serves as a basis for BME/IMOB research cooperation).

Submitted

[RKD+15] Diary Survey Quality Assessment Using GPS Traces

[HVK+15] Choosing an electric vehicle as a travel mode: Travel Diary Case Study in a Belgian Living Lab context

[UKY+14] A framework for electric vehicle charging strategy optimization tested for travel demand generated by an activity-based model

[HKB+15] An Agent-based Negotiation Model for Carpooling: A Case Study for Flanders (Belgium)

Papers in preparation

[KBAHS+14] Determining Structural Route Components from GPS Traces

[UKB+14] Effect of electrical vehicles charging cost optimization over charging cost and travel timings

[KBJW14b] Map matching GPS traces by sub-network selection (being prepared, provisional title)


Chapter 2

Research Results

2.1 The Use of Big Data to feed Activity-Based Models

In order to find out how automatically recorded and big data can be used in activity-based modeling of traffic demand, two research projects have been executed. The problem of consistency between surveyed and automatically collected data has been investigated using data collected in the iMOVE project (http://www.livinglab-ev.be/content/imove-platform). The iMOVE project ran simultaneously with the DATASIM project and IMOB was not a consortium member. However, we were allowed to organize a data collection campaign among iMOVE test users. Since there are only very few opportunities to collect behavioral data from EV drivers, we decided to align DATASIM work with the iMOVE project. Hence we invested in software to collect data from EV drivers in order to feed DATASIM research.

In a second research effort, we investigated whether the location choice component in activity-based models can be replaced by the radiation law that predicts home-work travel flows based on population densities. This law was verified using big data.
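As a point of reference, the radiation law in its commonly cited parameter-free form gives the flow from zone i to zone j as T_ij = T_i · m_i · n_j / ((m_i + s_ij)(m_i + n_j + s_ij)), where T_i is the number of commuters leaving i, m_i and n_j are the populations of origin and destination, and s_ij is the population living within the circle centered at i with radius equal to the distance from i to j (excluding i and j). The Python sketch below evaluates that form on toy data. The zone names, coordinates, populations and the function name `radiation_flows` are illustrative assumptions and are not taken from the DATASIM software; the variant actually used in the project may differ (for instance by the saturation effect proposed in [KKBJ14]).

```python
from math import hypot


def radiation_flows(zones, commuters):
    """Predict flows with the parameter-free radiation law.

    zones:     dict zone -> (x, y, population)   (hypothetical toy input)
    commuters: dict zone -> number of commuters leaving that zone (T_i)
    Returns a dict (origin, destination) -> predicted flow T_ij.
    """
    flows = {}
    for i, (xi, yi, m) in zones.items():
        for j, (xj, yj, n) in zones.items():
            if i == j:
                continue
            radius = hypot(xj - xi, yj - yi)
            # s_ij: population within the circle around i, excluding i and j
            s = sum(
                pop
                for k, (xk, yk, pop) in zones.items()
                if k not in (i, j) and hypot(xk - xi, yk - yi) <= radius
            )
            flows[(i, j)] = commuters[i] * m * n / ((m + s) * (m + n + s))
    return flows


# Three equally populated zones on a line, one distance unit apart.
zones = {"A": (0.0, 0.0, 100.0), "B": (1.0, 0.0, 100.0), "C": (2.0, 0.0, 100.0)}
commuters = {"A": 10.0, "B": 10.0, "C": 10.0}
flows = radiation_flows(zones, commuters)
print(flows[("A", "B")], flows[("A", "C")])
```

In this toy case the flow from A to the nearby zone B is three times the flow to the more distant zone C, because the population of B lies between A and C and absorbs part of the commuters.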

2.1.1 iMOVE Diaries and GPS Traces

The following description was taken from the iMOVE website.

• The consortium of 17 Flemish companies and research institutions, coordinated by Umicore, strives through iMOVE for a breakthrough of electric vehicles and sustainable mobility. In this pilot project, a large group of employees and private persons will use 175 electric cars and 300 charging points, spread all over Flanders, every day during a period of 3 years. The research focus is on 3 themes playing a crucial part: renewable energy, new battery and car technology and mobility conduct.

• The test group consists both of private persons and employees of companies that will use the cars as company, professional or carpool cars. This very diverse user group, spread all over Flanders, will enable us to make an assessment of various (family) profiles, their purchasing behavior and the driving behavior in different weather and road conditions.

The use of electric vehicles as a transport mode is a new phenomenon. The overall impact of the use of electric vehicles on the travel behavior of individuals is a focus of the DATASIM project. The purpose of our research was to find out whether or not people adapt their travel behavior when they switch from an internal combustion engine car to an electric car. Behavior change can be expected due to the limited vehicle range, the long charging duration and the lack of sufficient charging points. To that end, a survey among the test users was organized and GPS traces were collected. Both the survey and the GPS trace collection were carried out using a smartphone kept by the user.

2.1.1.1 The iMOVE Data Analysis in the DATASIM Project

Surveying diaries to collect evidence about travel behavior is known to be susceptible to erroneous reporting. Many respondents do not exactly remember details about executed trips and activities. This problem has been reported frequently but it was not studied in a quantitative way.


This section describes a diary collection project aimed at tracking travel behavior in the iMOVE electric vehicle pilot project. The participants were provided with an electric vehicle. Each participant was asked to collect her/his diary using a smartphone application known as SPARROW. The respondents filled in the details of their activities and trips in the application (including type of activity performed, transport mode used and start/end times of activities and trips). A diary consists of a sequence of activities and trips; multiple activities can be conducted at a given location (without intermediate trip) and multiple trips can be chained (without intermediate activity). The users were also asked to keep the smartphone with them in order to collect the GPS traces corresponding to the reported diaries.

An interactive software tool was developed to check the reported activity and trip timing against the GPS recordings and to align them where necessary. This was required because most people report activity timing using a quarter-of-an-hour resolution, which is insufficiently fine grained to observe behavior change (if that would occur). The difference between each original diary and the corresponding corrected version has been quantified. The number of modifications as well as the distribution of their magnitudes have been analyzed. The time needed for correction has been recorded. Those results can serve to plan future data collection efforts and lead to specific recommendations to avoid errors and to reduce data cleaning time.

Participants of course still had their own car available and could select a car for each tour they traveled. The analysis of the data in order to detect a relationship between the activity type at the target location of a trip and the choice for the EV is described in [HVK+15].

2.1.1.2 Work Process Details and Results

1. The SPARROW tool described in [KJY13] was used. This tool notifies users when erroneous data are entered but does not force the user to correct any error immediately. This option was deliberately chosen at design time because it was thought to result in an ergonomic tool, which was expected to motivate users to use it. Participants cannot be forced to enter data at all and hence completeness and consistency of collected data cannot be expected. Users need to be convinced to supply correct data; technical measures by themselves cannot solve the problem.

SPARROW accepts planned activities (in the future) and not only history data (past activities), and it also allows any data entered before to be edited.

The SPARROW tool registers the time of the operation for each interactive data entry and assigns a unique identifier to each activity/trip. As a result, it is possible to know the time difference between the start of an activity/trip and the moment at which it was reported. This is an interesting feature to feed the analysis of the data quality.

2. An interactive tool to align diaries with GPS recordings was developed as a Qgis (http://www.qgis.org/en/site/) plugin. The tool is described in [KJY13] and in [RKD+15]. The interactive user sees the trace panel (window) displaying the recorded GPS points as well as a representation of the original and the adjusted diary: see Figure 2.1. The software can show the trace on a map or satellite image for easy identification of locations. This facility was turned off in the figures in order to enhance the visibility of the GPS point colors. Figure 2.2 shows two cases ’a’ and ’b’ before any alignment was carried out. The upper part is the trace panel, the lower part shows the diary alignment panel (the two time lines). The GPS points have the same color as the time block they belong to. Home activity time blocks are green, trip (travel) blocks are pink. Note that case ’b’ shows a cluster of points at a given location, some of which are colored pink. This is due to mis-reporting of trip start time (or equivalently of activity end time). Figure 2.3 shows the same cases after alignment. Note the GPS point colors and the differences between the corresponding time lines. The consistency of the travel diaries was checked with the help of GPS traces after the end of the data collection period. The errors in the data were removed with the help of the schedule alignment software (a Qgis (Quantum GIS) plugin).

Figure 2.1: Command panel to interactively align a diary with the corresponding GPS recording. Both versions of the diary are shown on a time line. Activities and trips are shown by colored blocks. Colors distinguish between activity types for activity time blocks and between modes for trip time blocks. The blocks on the lower time line can be adjusted by the interactive user. This panel is shown together with the trace panel.

3. Diaries were selected using specific quality criteria. This was done because it turned out that the recorded data were of low quality. This is explained by the combination of the following facts:

• the participating users were not sufficiently notified and motivated in advance about the diary collection

• people had to record data for several weeks and got tired of doing so

Many people did not correct the errors in the data although the SPARROW software clearly indicates error detection, specifies what is wrong and asks for correction. People recorded their data for several weeks (between 3 and 6). Quality criteria were applied to complete datasets: i.e. we did not extract sub-periods for which the data fulfill the criteria. A diary was selected if and only if

(a) there was a minimum number of activities reported (at least one for each day)

(b) the maximum total amount of non annotated time periods was sufficiently small

(c) all data were entered at most 24[h] in advance and at most 48[h] after the start of the activity or trip

(d) the diary does not contain stacks of more than two activities reported to take place simultaneously (some people reported up to 6 simultaneous activities, probably because they entered wrong hours or dates)

The set of diaries that met the quality requirements was extremely small. 117 people participated in the project, 83 of whom participated in the batches (pilot project phases) in which the final version of the SPARROW software was used. Of those 83 people, 33 met the quality criteria.

4. Two people cleaned the diaries that met the quality requirements by means of the diary alignment tool described before. This took 2[h] and 22[min] per diary, which makes the procedure economically infeasible on a large scale.

5. Original and aligned (corrected) diaries were kept in XML files. Properly formatted XML and the unique identification of activities and trips made it possible to generate a list of modifications (diff file) that specifies how to generate the cleaned version from the original one. The diff files specify additions, deletions, and modifications for start-time, end-time and type for the activities as well as start-time, end-time and transportation mode for the trips. The set of applied corrections (modifications, additions, deletions) was linked to some basic socio-demographic variables (gender, age-class, profession and diploma). Descriptive statistics (histograms, scatter plots) have been reported and discussed in [RKD+15]. The main results are summarized as follows:

(a) Number of modifications, additions and deletions per person per day

We first looked at the number of corrections per type (modification, addition, deletion) according to gender, age-class, profession, diploma and batch. There were between three and four corrections per person per day, almost one addition per person per day and between two and three deletions per person per day. There were slightly fewer corrections for men (3.4) than for women (4.3), about the same number of additions (0.8 and 0.7 respectively) and roughly the same number of deletions (2.1 and 2.5 respectively). Figure 2.4 shows the histogram for the difference in activity start time induced by diary alignment with the GPS recordings. The range was cut off to the interval [-120,120] minutes. By doing so we dropped 1.08% of the observations on the lower side and 0.14% of the observations on the higher side.

Figure 2.2: Displays showing the GPS trace panel and the diaries panel for two cases ’a’ and ’b’ before any alignment was performed. The colored blocks in both time lines are identical. Note the color of the GPS points.

Figure 2.3: Diary alignment tool display showing two cases ’a’ and ’b’ after alignment.

Figure 2.4: Histogram for difference in activity start time induced by diary alignment with the GPS recordings (range cut off at 120[min])

(b) Distributions of the differences in start-time, end-time and duration

A very useful plot is the one for the corrected duration versus the original duration (Figure 2.5).

There is a clear concentration of points on the line y = x, meaning no change in duration, but there seems to be a kind of parallel line above this line. Further investigation of these observations showed that they mainly came from persons who claimed to have made trips while the GPS recordings did not show any trips, so that they seem to have stayed at home all day. In these cases the home activities were prolonged, which explains the parallel line (shifted over about 700[min]) in the graph in Figure 2.5.

The graph was refined by using different colors for gender, age-class, profession, diploma, batch and activity-type. None of these revealed a clear trend, except for the one by activity-type, which confirmed that the large changes in duration showing the parallel line were mainly for home activities.
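The diff file idea from item 5 can be illustrated with a minimal Python sketch. It assumes that each diary has already been parsed from XML into a dictionary keyed by the unique activity/trip identifier; the field names (`start`, `end`, `label`), the identifiers and the function name `diff_diaries` are hypothetical and do not reflect the actual SPARROW file format.

```python
# Fields compared for each episode; "label" stands for the activity type
# or, for trips, the transportation mode (hypothetical field names).
FIELDS = ("start", "end", "label")


def diff_diaries(original, corrected):
    """Return (additions, deletions, modifications) between two diaries.

    Each diary maps a unique episode identifier to a dict holding the keys
    in FIELDS; times are expressed in minutes since midnight.
    """
    additions = sorted(set(corrected) - set(original))
    deletions = sorted(set(original) - set(corrected))
    modifications = {}
    for episode_id in sorted(set(original) & set(corrected)):
        before, after = original[episode_id], corrected[episode_id]
        changed = {f: (before[f], after[f]) for f in FIELDS if before[f] != after[f]}
        if changed:
            modifications[episode_id] = changed
    return additions, deletions, modifications


original = {
    "a1": {"start": 0, "end": 480, "label": "home"},
    "t1": {"start": 480, "end": 510, "label": "car"},
    "a2": {"start": 510, "end": 1020, "label": "work"},
    "a3": {"start": 1020, "end": 1050, "label": "shopping"},
}
corrected = {
    "a1": {"start": 0, "end": 472, "label": "home"},
    "t1": {"start": 472, "end": 503, "label": "car"},
    "a2": {"start": 503, "end": 1020, "label": "work"},
    "t2": {"start": 1020, "end": 1050, "label": "car"},
}
additions, deletions, modifications = diff_diaries(original, corrected)
print(additions, deletions, modifications)
```

Because every episode keeps its identifier across both versions, the magnitude of each timing shift (here 8 and 7 minutes) can be read directly from the modification records and aggregated per person per day.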

2.1.1.3 Conclusion

With the help of this analysis, various problems in the active data collection techniques could be highlighted. The main conclusion is that, if survey data are required, a prompted recall technique is to be used. A prototype for an Android app has been developed and briefly described in [RKD+15]. It logs GPS recordings and sends GPS traces to a back-end server where STOP detection is performed. The detected stops are sent back to the smartphone where they are shown on a map to be annotated interactively by the user (in chronological order).
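STOP detection is commonly done by finding runs of GPS recordings that stay within an area of restricted size for a minimum duration. The Python sketch below shows one simple variant of that idea, assuming planar coordinates in meters and time stamps in seconds. The function name `detect_stops`, the thresholds and the sample points are illustrative assumptions; the sketch does not describe the back-end implementation mentioned above.

```python
from math import hypot


def detect_stops(points, max_radius, min_duration):
    """Detect stops in a chronologically ordered list of (t, x, y) tuples.

    A stop is a maximal run of consecutive points that all lie within
    max_radius of the first point of the run and that spans at least
    min_duration. Returns a list of (start_time, end_time) pairs.
    """
    stops = []
    i = 0
    while i < len(points):
        t0, x0, y0 = points[i]
        j = i
        # extend the run while the next point stays close to the anchor point
        while j + 1 < len(points) and hypot(
            points[j + 1][1] - x0, points[j + 1][2] - y0
        ) <= max_radius:
            j += 1
        if points[j][0] - t0 >= min_duration:
            stops.append((t0, points[j][0]))
            i = j + 1  # continue after the detected stop
        else:
            i += 1  # no stop anchored here; try the next point
    return stops


# Toy trace: three minutes near the origin, then driving away.
points = [
    (0, 0.0, 0.0), (60, 5.0, 0.0), (120, 3.0, 4.0), (180, 0.0, 2.0),
    (240, 500.0, 0.0), (300, 1000.0, 0.0), (360, 1500.0, 0.0),
]
stops = detect_stops(points, max_radius=50.0, min_duration=120)
print(stops)
```

The two thresholds determine which brief visits are reported as stops, so the user would be prompted to annotate more or fewer locations depending on how they are chosen.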


Figure 2.5: Activity Duration - Corrected vs. Original

2.1.2 Route Decomposition : Route Selection Behavior from Big Data

The first step in traffic demand modeling is calculating the expected traffic flows between locations. Activity-based schedule generation is one of the techniques to achieve this. The second step is to determine which route people will use when private vehicles are used. Detailed route information is required to analyze the suitability of planned locations for infrastructure that generates large amounts of traffic (shopping centers, office buildings, hospitals, production facilities, university campuses etc).

A large body of research has been devoted to the route choice problem. Several discrete choice theory based methods have been proposed. Problems faced include: the large choice set and the mutual dependency of alternatives. Most studies focus on aggregate route characteristics like the total travel time or distance.

In this research we focus on the structure of the selected route. We formulate a hypothesis about the structural aspects of routes used by individuals and plan to investigate the hypothesis using big data. If the hypothesis turns out to hold, it can be used to construct much smaller route choice sets. A generalized constant (hence time independent) non-negative link traversal cost is used. Either time to drive, distance or monetary cost can be used. The basic hypothesis HYP-LOW-NR-MINCOST-COMP is that for utilitarian trips, people use a small number of concatenated least cost paths. Details are explained in the following items.
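To make the notion of concatenated least cost paths concrete, the Python sketch below splits a route into a minimal number of components, where a component is either a least cost path or a single edge that is not a least cost path between its vertices. It uses a plain greedy forward scan with one Dijkstra run per component. This is a simplified illustration under stated assumptions and not the algorithm of [KBJW14a]; in particular it returns only one decomposition and does not compute split vertex sets. The graph, the routes and the function names are hypothetical.

```python
import heapq


def dijkstra(graph, source):
    """Least cost from source to every reachable vertex (non-negative costs)."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if d > dist.get(u, float("inf")):
            continue  # outdated queue entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(queue, (nd, v))
    return dist


def split_route(graph, path, eps=1e-9):
    """Greedily split a path (list of vertices) into a minimal number of parts.

    Each part is either a least cost path or a single edge that is not a
    least cost path between its end vertices.
    """
    components = []
    i = 0
    while i < len(path) - 1:
        dist = dijkstra(graph, path[i])
        cost = 0.0
        j = i
        # extend while the sub-path path[i..j+1] is still a least cost path
        while j < len(path) - 1:
            extended = cost + graph[path[j]][path[j + 1]]
            if extended > dist[path[j + 1]] + eps:
                break
            cost = extended
            j += 1
        if j == i:
            j = i + 1  # single edge that is not a least cost path
        components.append(path[i:j + 1])
        i = j
    return components


# Toy network given as a directed adjacency dictionary with link costs.
graph = {
    "A": {"B": 1.0, "C": 1.5},
    "B": {"A": 1.0, "C": 1.0, "D": 5.0},
    "C": {"A": 1.5, "B": 1.0, "D": 1.0},
    "D": {"B": 5.0, "C": 1.0},
}
print(split_route(graph, ["A", "B", "C", "D"]))
print(split_route(graph, ["A", "B", "D"]))
```

The greedy scan yields a minimal number of components because every sub-path of a least cost path is itself a least cost path, so extending each component as far as possible never forces an extra split later on.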

1. The hypothesis is relevant in the context of the route choice process. After travel demand has been predicted by activity-based models, a set of trips is available for which start time, origin and destination locations and mode (car, bus, train, walk, bike, . . . ) are known. This set is used as input for the traffic assignment (network loading) procedure to calculate the actual use of each network link as a function of time. Network loading requires the determination of a route for each trip. Therefore a route choice model is required. The number of possible routes between given origin and destination nodes in a network can be very large, even if the cost for the actual route shall not exceed the least cost by more than a pre-specified factor (called detour factor below). Furthermore, multinomial logit models cannot simply be applied because alternative routes are not independent. Several models have been proposed, some of which are based on constrained route choice sets. The method investigated in this research aims to support automatic creation of route choice sets.

(a) Integration of big data in the simulator: route choice models: generate routes with (1) enough split points and (2) a plausible detour multiplier

(b) Validation of routes generated by traffic assignment software

3. Several sets of GPS recordings have been processed using the following multi-step procedure:

(a) trip detection, STOP detection: splitting a sequence of GPS points into subsequences each of which corresponds to a trip. This is typically done by finding sequences of recordings for which all coordinates identify points in an area of restricted size during a period of minimum duration.

(b) map matching: transforming the sequence of GPS recordings into a sequence of visited network links (edges). This phase only retains simple paths (not every walk) because utilitarian trips serving a single purpose are assumed to visit each node at most once (because of rational utility maximization).

(c) route splitting: a path in a graph can be split into basic components where a basic component is either a least cost path or a single edge that does not constitute a least cost path between its vertices. Such path splits in general are not unique. In [KBJW14a] and [KBAHS+14] an algorithm has been presented that calculates two specific path decompositions (having the minimal number of basic components) and delivers sets of split vertices that can be used to construct other decompositions of minimal size by taking one split vertex from each set.

4. Split vertices (or their incident network links) are assumed to have a special meaning to the traveler because those vertices are connected by least cost paths and thus the split vertices in a route essentially define the detour. As a consequence, route splitting allows for

(a) detection of short duration activities (like bring/get (pick/drop)) that can go undetected by the STOP detection phase because of the particular spatial and temporal thresholds used

(b) detection of locations in the network that attract flows by deviating effective routes from the least cost routes (traffic lights, bridges, . . . )

5. The following software tools have been used:

(a) Fraunhofer software that splits trajectories into trips and then matches the trips to the Navteq map. During this process, spatial gaps in trips are bridged by the shortest path when the gap corresponds to a time period not larger than a threshold ∆. Different experiments used ∆ values 1.0, 2.5 and 5.0[min] respectively. This software has been applied to both the Flemish and Italian data sets.

(b) IMOB wrote two separate tools for trip detection and map matching. Trip detection is based on evaluating speed and acceleration; GPS sequences for which the recording device was turned on/off while moving are discarded. Specifically for this project, a map matcher for high density GPS recordings was written. A recording is qualified as high density if and only if the recording frequency is sufficiently high so that at most N links on a path can go undetected. For this project N = 1 was used so that missing more than one road network link is flagged as an error. This tool is described in [KBJW14b].

(c) IMOB wrote the route decomposition software that finds the minimal number of components in each route (path) and that reports the sets of split vertices that are potential boundaries between components. This tool is described in [KBJW14a] and [KBAHS+14]. Since for a given path, a shortest path calculation is to be performed for each vertex, a derivative of the Dijkstra algorithm was used where the queue is adapted, reused and extended in each step.

6. Procedures executed: the following cases have been handled. The differences between them are caused only by limited access rights to data sets and hence not by technical reasons.

(a) Milano car traces have been map matched to the Navteq network using the Fraunhofer software for 1.0, 2.5 and 5.0[min] thresholds respectively

Figure 2.6: Part of a larger route showing a large number of splitVertexSets.

(b) SBO2 person traces have been

i. map matched to the Navteq network using the Fraunhofer software for 1.0, 2.5 and 5.0[min] thresholds respectively

ii. processed by the IMOB trip detector and map matcher using the Open Street Map (OSM) network

7. The following figures show some of the detected split vertex sets (a unique color is used for each split vertex set). The figures apply to trajectories from the IMOB SBO2 traces map-matched onto the Navteq network. Figure 2.6 shows part of a route starting at the right hand side, visiting the center of the city of Geel, then moving around the city in clockwise direction, heading to the north and finally arriving near the center of the city of Mol. The partial route shows 11 splitVertexSets. Figure 2.7 shows the lower-left part of the same route. The lower-right splitVertexSet in Figure 2.7 suggests that a particular street in the city center was an intermediate destination. The lower-left splitVertexSet suggests the intentional use of the ring way and/or a specific junction. Figure 2.8 shows a route of about 4 kilometers having 7 splitVertexSets, first visiting something special at the first splitVertexSet and then avoiding the narrow streets in a residential area up to the 4-th splitVertexSet, which represents a location equipped with traffic lights. The arterial road is used up to the 5-th splitVertexSet, which is also equipped with traffic lights. The trip ends near the parking of a shopping center. The 7-th splitVertexSet is the upper-right one in Figure 2.9. It is an artifact caused by the fact that the Van Groesbeekstraat (south of the splitVertexSet), constructed in 2012-2013, did not exist at the time the trajectories were recorded (between 2006 and 2008).

8. Figure 2.10 shows the absolute and relative distributions for the number of basicComponents per trip for all cases.

(a) The relative frequency distributions suggest that the hypothesis HYP-LOW-NR-MINCOST-COMP holds. The distributions depend on the methods used for trip detection and map matching. The distributions labeled Belgium Navteq x.y[min] differ only in the value of the delay threshold parameter used in stop detection. The smaller the threshold value, the more stops are detected and hence the smaller the size of the trips (expressed as the number of links they contain). For a given sequence of GPS recordings, the more subsequences are flagged as stops, the lower the number of detected basicPathComponents; this occurs because some briefly visited locations will be flagged as a stop when using a small delay threshold parameter value whereas they are detected to be a splitVertexSet in the opposite case. This is reflected in the relative frequency distribution diagram. The probability (relative frequency) to find routes having 1 basicPathComponent decreases with increasing value of the delay parameter (1.0, 2.5, 5.0). For the case of 2 basicPathComponents the phenomenon is largely attenuated. Starting at 3 basicPathComponents per trip, the effect is reversed as expected.

Figure 2.7: Detail view of Figure 2.6 showing splitVertexSets in and near the city center.

Figure 2.8: Route showing splitVertexSets that correspond to traffic lights.

Figure 2.9: Detail of Figure 2.8 (rightmost part of the route).

(b) For Belgium, the OSM and Navteq cases seem to differ slightly. This may have been caused by the use of different map matching software tools, based on different methods and concepts. The Fraunhofer IAIS map matcher closes small gaps by assuming that the traveler used the shortest path. The IMOB map matcher does not use this procedure. This conclusion is not final because the cases differ both in the network and in the map matcher used.

(c) The difference between the Belgian and Italian cases, however, is much larger. For the Italian data, the relative frequency of trips consisting of a single basicPathComponent is much larger than for the trajectories registered in Belgium. For routes having more than one component, the relative frequency is lower than in any Belgian case. Detailed analysis is required to find out whether this phenomenon occurs because the pre- and post-car-trip components are missing in the Italian data set.

9. Table 2.1 summarizes characteristics of the runs. The poor performance of the calc4 server was explained by an operating system mis-configuration and seems not to be specific to this application.


Absolute frequency distribution for the number of basicPathComponents for the 4 runs applied to the Belgian trips.

Relative frequency distribution for the number of basicPathComponents for the 4 runs applied to the Belgian trips.

Figure 2.10: Frequency distributions for the number of basicComponents per trip. The number of trips in each set depends on the map matcher used.


| Region | Belgium | Belgium | Belgium | Belgium | Italy |
|---|---|---|---|---|---|
| Case | OSM | Navteq 1.0[min] | Navteq 2.5[min] | Navteq 5.0[min] | Navteq 1.0[min] |
| Runtime[sec] | 5191 | 17058 | 25822 | 36424 | 498850 |
| Machine | calc2 | calc2 | lucp2364 | linux1 | calc4 |
| OS | Linux Debian wheezy | Linux Debian wheezy | Linux Debian wheezy | Linux Debian wheezy | Windows Server 2008 |
| CPU | Xeon | Xeon | i5 | Core2 | Xeon X5670 |
| Memory | 3[GB] | 3[GB] | 4[GB] | 2[GB] | 48[GB] |
| ClockFreq | 2.8[GHz] | 2.8[GHz] | 2.4[GHz] | 2.4[GHz] | 2.93[GHz] |
| Cores used | 7 | 7 | 3 | 2 | 20 |
| Trips scanned | 6632 | 12429 | 9408 | 8020 | 34308 |
| Trips dropped | 694 | 2066 | 2426 | 2508 | 3427 |
| Net number of trips | 5938 | 10363 | 6982 | 5512 | 30881 |
| Number of basic components | 12687 | 22921 | 17429 | 14412 | 56346 |
| Number of basic components per trip | 2.14 | 2.21 | 2.50 | 2.61 | 1.82 |
| Average number of nodes per trip | 21.68 | 82.47 | 75.43 | 62.20 | 55.37 |
| Number of least cost path calculations | 128736 | 854637 | 526652 | 342846 | 1772569 |
| Least cost calculations per second | 24.8 | 50.10 | 20.40 | 9.41 | 3.55 |
| Number of trips per second | 1.14 | 0.61 | 0.27 | 0.15 | 0.06 |


2.1.3 Radiation Model in FEATHERS

The location choice model for work locations in FEATHERS has been replaced by the radiation law described in [SGMB12]. In this section we first compare the original location choice model used in FEATHERS and the radiation model. Then, simulation results for both models are compared.

2.1.3.1 Radiation model

The radiation model described in [SGMB12] applies to work locations (commuting). The number of job opportunities in a region (city) is assumed to be proportional to the population. The model assumes that a job request emitted from a given home location $L_0$ is intercepted by a job opportunity in the same or a different region. A request emitted from $L_0$ is possibly intercepted by $L_1 \neq L_0$ if and only if it was not intercepted by $L_0$ or by an intermediate location $L_i$ with $L_i \neq L_0 \wedge L_i \neq L_1 \wedge d(L_0, L_i) \le d(L_0, L_1)$.

Hence, the model is not based on distance but on a ranking of the opportunity locations using their distance to the home location.

2.1.3.2 Location choice in FEATHERS

The location choice model in FEATHERS is used for all activity types and hence applies to both fixed (mandatory) and flexible (discretionary) activities.

1. The model is used in two consecutive steps. At a coarse spatial level it is first used to determine the municipality; at a finer-grained level it is used to select a FEATHERS subzone within the municipality. The location model has been described in paper [AHT03] and elaborated further in the book [AT04a]. Note that the zoning terminology differs slightly from the one used in the original paper [AHT03].

2. The method uses the following concepts:

(a) order

(b) distance band

(c) distance ranking

3. In all cases, the order of an area is a rank number in a list where areas are sorted in ascending order by a size attribute. Several different size attributes are used.

(a) For municipality order the size is the number of households residing in the area (not the number of individuals, see [AHT03]) which acts as a proxy for the population size. The rank is assumed to define a functional hierarchy of the cities/towns so that the highest ranked cities/towns expose a supra-regional function in terms of work, leisure and social activities. Lower ranked cities/towns have a local function only.

(b) For a zone, the size attribute to use depends on the activity type for which a location is to be selected. For work activities, the relative employment opportunity level of the zone within the municipality is used as the size.

$$s_w(z) = \frac{E(z)}{\sum_{z' \in Z(M)} E(z')} \qquad (2.1)$$

In (2.1), $s_w(z)$ denotes the value of the size attribute of zone $z$ when it comes to work location selection, $Z(M)$ denotes the set of zones in municipality $M$ and $E(z)$ denotes the number of employments in zone $z$.

For all activity types, the size values are subdivided into 4 categories by specifying the boundaries between the classes in terms of relative occurrence frequencies:

| order | class |
|---|---|
| 4 | 5.0% largest zones with size > 0 |
| 3 | 10.5% second largest zones with size > 0 |
| 2 | 28.0% third largest zones with size > 0 |
| 1 | 56.5% remaining zones with size > 0 |
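The size attribute of (2.1) and the four order classes can be illustrated with a minimal sketch. It is not the FEATHERS implementation: the function names, the rounding rule used to turn the class shares into zone counts, and the example employment figures are all assumptions made for illustration.

```python
def relative_sizes(employment):
    """Return s_w(z) = E(z) / (sum of E over the municipality), as in (2.1)."""
    total = sum(employment.values())
    if total <= 0:
        raise ValueError("municipality has no employment")
    return {zone: e / total for zone, e in employment.items()}


# (order, share of the zones with size > 0), largest zones first.
# The remaining 56.5% of the zones get order 1.
CLASS_SHARES = [(4, 0.050), (3, 0.105), (2, 0.280)]


def assign_orders(sizes):
    """Assign an order class to every zone with size > 0."""
    ranked = sorted((z for z, s in sizes.items() if s > 0),
                    key=lambda z: sizes[z], reverse=True)
    orders = {}
    start, cumulative = 0, 0.0
    for order, share in CLASS_SHARES:
        cumulative += share
        # Assumption: class boundaries are rounded to whole zones.
        end = round(cumulative * len(ranked))
        for zone in ranked[start:end]:
            orders[zone] = order
        start = end
    for zone in ranked[start:]:
        orders[zone] = 1
    return orders


# Hypothetical municipality: zone z0 has no jobs, z1..z200 have 1..200 jobs.
employment = {f"z{i}": i for i in range(0, 201)}
sizes = relative_sizes(employment)
orders = assign_orders(sizes)
```

Zones with size 0 receive no order in this sketch; how the original model treats them is not stated in this section.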


2.1.3.3 Models comparison for work location choice

The radiation model described in [SGMB12] is a single step model. In this model

• population size is used as a proxy for the number of job opportunities

• candidate locations are ranked according to their distance to the home location of the individual selecting a work location

The FEATHERS (Albatross) model described in [AHT03] and in [AT04b] is a two-phase model. Each phase consists of the following steps:

• decision tree based selection by order

• decision tree based selection by distance band

• random selection by distance ranking

For the municipality level, the order is determined by ranking the regions using the number of resident households. For the subzone level, the relative size of employment is used to rank the alternative locations.

Algorithm 2.1 has been derived from figure 2 in paper [AHT03] and shows the location choice procedure. Names starting with DT denote methods that apply a decision tree based procedure to select a result. The decision trees have been trained using survey data.

Algorithm 2.1 Location choice algorithm in FEATHERS (Albatross)

 1: mun ← currentMunicip
 2: if DTotherMun() then
 3:     if DThomeMunicip() then
 4:         mun ← homeMun
 5:     else
 6:         nextLocMunOrder ← DTselectMunOrder()
 7:         if DTnearestMunOfGivenOrder(nextLocMunOrder) then
 8:             mun ← munNearestToCurrentLocation(nextLocMunOrder)
 9:         else
10:             munDistBand ← DTselectMunDistBand(nextLocMunOrder)
11:             munSet ← selectMunSetByDistBand(currentMun, munDistBand, nextLocMunOrder)
12:             mun ← selectFromMunSet(munSet)
13:         end if
14:     end if
15: end if
16: zoneOrderInMun ← DTgetOrderOfZoneToSelect()
17: zoneDistBand ← DTselectZoneDistBand(zoneOrderInMun)
18: zoneSet ← selectZoneSetByDistBand(mun, currentMun, currentZone, zoneDistBand, zoneOrderInMun)
19: zone ← selectFromZoneSet(zoneSet)

• Line 2 determines whether or not to execute the next activity in the municipality that contains the current location.

• Line 3 finds out whether the next activity is to be performed in the individual's home municipality.

• Line 6 determines the order for the next municipality to select.

• Line 7 determines whether the nearest municipality of the given order is to be selected.

• Line 8 selects the municipality of the given order that is nearest to the current location.

• Line 11 selects the municipalities of the given order within the distance band.

• Line 12 selects one municipality from the set by random selection based on the distance ranking mechanism.

• Line 16 determines the order of the zone to select within the municipality.

• Line 17 selects a distance band using a decision tree.

• Line 18 selects the set of zones of the given order within the distance band.

• Line 19 selects one zone from the set by random selection based on the distance ranking mechanism.
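The control flow of Algorithm 2.1 can be sketched in Python as follows. This is only an illustration of the branching structure: the decision trees and spatial selections are passed in as callables on a model object `m`, and the stub implementations and zone names below are hypothetical placeholders, not the trained FEATHERS components.

```python
from types import SimpleNamespace


def choose_location(m, current_mun, current_zone, home_mun):
    """Follow Algorithm 2.1; comments give the corresponding line numbers."""
    mun = current_mun                                           # line 1
    if m.dt_other_mun():                                        # line 2
        if m.dt_home_municip():                                 # line 3
            mun = home_mun                                      # line 4
        else:
            order = m.dt_select_mun_order()                     # line 6
            if m.dt_nearest_mun_of_given_order(order):          # line 7
                mun = m.mun_nearest_to_current_location(order)  # line 8
            else:
                band = m.dt_select_mun_dist_band(order)         # line 10
                mun_set = m.select_mun_set_by_dist_band(        # line 11
                    current_mun, band, order)
                mun = m.select_from_mun_set(mun_set)            # line 12
    zone_order = m.dt_get_order_of_zone_to_select()             # line 16
    zone_band = m.dt_select_zone_dist_band(zone_order)          # line 17
    zone_set = m.select_zone_set_by_dist_band(                  # line 18
        mun, current_mun, current_zone, zone_band, zone_order)
    zone = m.select_from_zone_set(zone_set)                     # line 19
    return mun, zone


# Hypothetical stubs standing in for the decision trees and spatial queries.
stub = SimpleNamespace(
    dt_other_mun=lambda: True,
    dt_home_municip=lambda: False,
    dt_select_mun_order=lambda: 3,
    dt_nearest_mun_of_given_order=lambda order: False,
    dt_select_mun_dist_band=lambda order: 2,
    mun_nearest_to_current_location=lambda order: "M_near",
    select_mun_set_by_dist_band=lambda cur, band, order: ["M_a", "M_b"],
    select_from_mun_set=lambda mun_set: mun_set[0],
    dt_get_order_of_zone_to_select=lambda: 1,
    dt_select_zone_dist_band=lambda order: 1,
    select_zone_set_by_dist_band=(
        lambda mun, cur_mun, cur_zone, band, order: [mun + "/z1", mun + "/z2"]),
    select_from_zone_set=lambda zone_set: zone_set[-1],
)
result = choose_location(stub, "M_cur", "M_cur/z9", "M_home")
```

Passing the decision trees as callables keeps the two-phase structure (municipality first, zone second) visible without committing to any particular tree representation.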

2.1.3.4 Replacement of the FEATHERS location choice model by radiation models

In order to compare the FEATHERS (Albatross, see section 2.1.3.3) and radiation models, the location selection at municipality level has been replaced in FEATHERS. Details and experimental results have been reported in DLV4.2, section "Radiation Model in FEATHERS".

2.1.4 Doubly Constrained Location Choice Models

This section presents work on location choice models, based on radiation and gravitation laws respectively, that take saturation effects into account. In a constellation of requesters (the ones looking for a job) and providers (job opportunities) having a limited capacity, the state of each provider shall be taken into account. This induces the effect of saturation, which means that simple state-ignoring models (whether gravitation or radiation) are not adequate. Therefore, doubly constrained models are proposed in this section. For a given pair (requester, provider) those models are based on (1) the magnitude of the demand, (2) the magnitude of the supply at a given moment in time (or equivalently at a given system state) and (3) a function based on attributes of the pair that characterize the attraction between requester and provider. In the work location choice model case, those methods are based on the knowledge of the number of job seekers and of the number of remaining open job opportunities at a given moment in time in each location. The state variable "remaining open job opportunities" is ignored in the simpler models.

When considering doubly constrained models, the incoming and outgoing commuter flows are given. We started from a given commuter home-work OD matrix derived from census data for Flanders. The aim is to reconstruct the OD matrix from the given row and column sums (numbers of outgoing and incoming commuters respectively). If this turns out to be possible, a location choice model for activity based schedule generators can be built from the given marginals. Filling all cells in an OD matrix when only the row and column sums are given is a highly under-determined problem. The idea is to find out whether reconstruction is feasible by making use of the corresponding known OD-impedance (distance or travel duration) matrix.

In order to evaluate the usability of the radiation model in the location choice module of an activity-based schedule predictor, experiments using data for Flanders have been executed. The current location choice model used in FEATHERS is described in section 2.1.3.3. The aim is to find out whether it is possible to reconstruct the origin-destination (OD) flow matrix from the row and column sums by using the information contained in the radiation and gravity laws. If this is the case, the reconstructed OD flow matrix can serve as a basis to sample (work) activity locations in the Monte Carlo simulation for an activity-based model. The research is summarized in the following sub-sections.

2.1.4.1 Data used

A home-work flow matrix was derived from the SEE (Socio-Economische Enquete) dataset for Belgium. The SEE makes use of building blocks (BB) or statistical sectors. The average area of the building blocks in the Flanders region is 1.2[km²]. The dataset is based on the following subdivisions of the study area:

| Number | Zoning | Level |
|---|---|---|
| 20759 | building blocks | 1 |
| 2815 | subZones | 2 |
| 1314 | zones | 3 |
| 332 | superZones, municipalities | 4 |

Each entity at a given level is completely contained in exactly one entity at the next higher level. The area of an entity is completely covered by all the areas at the next lower level it contains.

The experiments have been carried out at superZone level for the following reasons: (i) the matrices for the BB level are too large to run simulations on a 4GB laptop; (ii) the simulations took too much run-time. Building blocks are more or less homogeneous with respect to socio-demographic characteristics and land-use (residential, business area, industry zoning, recreational, agricultural, etc.). This does not hold at municipality level.

Distance was used as a measure for travel impedance. The given distance matrix contains the length of the shortest path between building blocks measured along the road network. This matrix was condensed to a distance matrix at superZone level. The distance between two superZones $sz_A$ and $sz_B$, consisting of the sets of building blocks $bb_A$ and $bb_B$ respectively, is defined by $\frac{1}{N} \sum_{a \in bb_A,\, b \in bb_B} d(a, b)$ with $N = |bb_A| \cdot |bb_B|$ and $d(a, b)$ the impedance (in this case: distance) to travel from the centroid of $a$ to the centroid of $b$ along the network.

All home-work location pairs were extracted from the SEE data (and hence have been reported by individuals).
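The condensation of the building block distance matrix to superZone level amounts to averaging over all building block pairs. A minimal sketch, assuming the building block distances are available as a nested dictionary `d[a][b]` (the data layout and the example values are assumptions):

```python
def superzone_distance(bb_a, bb_b, d):
    """Mean network distance over all pairs (a, b) with a in bb_a, b in bb_b."""
    n = len(bb_a) * len(bb_b)  # N = |bb_A| * |bb_B|
    if n == 0:
        raise ValueError("both superZones need at least one building block")
    return sum(d[a][b] for a in bb_a for b in bb_b) / n


# Hypothetical example: two superZones with two building blocks each.
d = {
    "a1": {"b1": 2.0, "b2": 4.0},
    "a2": {"b1": 6.0, "b2": 8.0},
}
dist_ab = superzone_distance(["a1", "a2"], ["b1", "b2"], d)  # (2+4+6+8)/4
```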

2.1.4.2 Software: Principle of Operation

1. Only home-work travel between different zones is considered. Intra-zonal travel is ignored. No other individual travel or activity is considered.

2. Software was developed to micro-simulate the location choice for each individual. During the simulation, an individual is selected for processing by first sampling a superZone TAZ from the set of non-exhausted TAZ using a uniform distribution. A non-exhausted TAZ is a TAZ for which at least one more outgoing home-work trip needs to be assigned to a destination TAZ. This technique is used because individuals are neither identified nor represented in the software. Sampling individuals from the population using a uniform distribution would require two steps: (i) sampling a TAZ from a distribution defined by the population size of the TAZ; (ii) sampling an individual from the TAZ using a uniform distribution. This technique was expected to take too much run-time. Since the TAZ are more or less equally populated, it is assumed that uniformly sampling the TAZ is sufficiently accurate (the sampling order affects the result due to saturation effects caused by job opportunities being consumed).

3. As soon as an individual is sampled, the target (work) location is sampled using the technique described below in sections 2.1.4.3 and 2.1.4.4; then the number of job requesters in the home location Fout[h] and the number of remaining job opportunities Fin[t] in the target location are decremented. As a consequence, no individual can be drawn from an exhausted TAZ t (where Fout[t] = 0), nor be assigned to a saturated TAZ t (where Fin[t] = 0). The actual sizes of the remaining outgoing and incoming flows in each TAZ are used to feed the functions that determine the assignment probabilities. Finally, each time a request is absorbed by a location, the home-work flow matrix is adapted. This is summarized by:

h ← sampleHome()

t ← sampleTarget()

Fout[h] ← Fout[h] − 1

Fin[t] ← Fin[t] − 1
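The sampling loop described above can be sketched as follows. This is an illustrative reading of the principle of operation, not the project software: `uniform_target` is a placeholder for the gravity and radiation samplers, and dropping a request when only intra-zonal capacity is left is an assumption made to guarantee termination.

```python
import random


def uniform_target(home, f_in, rng):
    """Placeholder sampler: any non-saturated TAZ other than the home TAZ."""
    candidates = [t for t, n in f_in.items() if n > 0 and t != home]
    return rng.choice(candidates) if candidates else None


def simulate(f_out, f_in, sample_target, rng):
    """Assign outgoing commuters to destinations until all TAZ are exhausted."""
    f_out, f_in = dict(f_out), dict(f_in)  # work on copies
    flows = {}                             # (home, target) -> number of trips
    while True:
        open_homes = [h for h, m in f_out.items() if m > 0]
        if not open_homes:
            break
        h = rng.choice(open_homes)         # uniform over non-exhausted TAZ
        t = sample_target(h, f_in, rng)
        if t is None:
            # Assumption: requests that can only be served intra-zonally
            # are dropped, since intra-zonal travel is ignored.
            f_out[h] = 0
            continue
        f_out[h] -= 1                      # Fout[h] <- Fout[h] - 1
        f_in[t] -= 1                       # Fin[t]  <- Fin[t]  - 1
        flows[(h, t)] = flows.get((h, t), 0) + 1
    return flows


# Hypothetical marginals for three TAZ.
f_out = {"A": 2, "B": 1, "C": 1}
f_in = {"A": 1, "B": 2, "C": 1}
flows = simulate(f_out, f_in, uniform_target, random.Random(0))
```

Because both marginals are decremented on every assignment, the reconstructed flows can never exceed the given row or column sums.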


2.1.4.3 Gravity Model

A gravity model is used with the deterrence function $f(d) = d^{-2}$, where $d$ is the impedance between the TAZ involved. The probability weight function for trips leaving TAZ $t_{orig}$ is given by

$$p(t_{dest}) = \alpha \cdot \frac{N(t_{dest})}{d^2(t_{orig}, t_{dest})}, \qquad \alpha = \left( \sum_{t_i \in T} \frac{N(t_i)}{d^2(t_{orig}, t_i)} \right)^{-1} \qquad (2.2)$$

where $\alpha$ is a normalization factor, $d(t_{orig}, t_i)$ is the impedance to travel from $t_{orig}$ to $t_i$ and $N(t)$ is the number of remaining open job opportunities in $t$. A probability matrix $P$ with $\forall i \in \{1, \dots, |T|\}: \sum_{j=1}^{|T|} p[i][j] = 1$ is maintained.

The probability matrix is not recalculated for each individual assignment. Instead, the population is subdivided in chunks and the matrix is recalculated each time a TAZ becomes saturated or a chunk has been processed. For the experiments, 200 chunks were used.
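One row of the probability matrix can be computed as in the sketch below, assuming the inverse-square deterrence function and excluding the origin TAZ because intra-zonal travel is ignored. The dictionary-based data layout and the function names are assumptions; the chunk-based recalculation is not shown.

```python
import random


def gravity_row(orig, open_jobs, dist):
    """Normalized destination probabilities for trips leaving `orig`.

    open_jobs[t] is the number of remaining open job opportunities N(t);
    dist[orig][t] is the impedance d(orig, t).
    """
    weights = {t: n / dist[orig][t] ** 2
               for t, n in open_jobs.items() if t != orig and n > 0}
    total = sum(weights.values())  # 1 / alpha
    return {t: w / total for t, w in weights.items()} if total > 0 else {}


def sample_gravity_target(orig, open_jobs, dist, rng):
    """Draw one destination TAZ according to the row probabilities."""
    row = gravity_row(orig, open_jobs, dist)
    if not row:
        return None
    zones = list(row)
    return rng.choices(zones, weights=[row[z] for z in zones])[0]


# Hypothetical example: B has 4 open jobs at distance 2, C has 9 at distance 3.
dist = {"A": {"B": 2.0, "C": 3.0, "D": 5.0}}
open_jobs = {"A": 5, "B": 4, "C": 9, "D": 0}
row = gravity_row("A", open_jobs, dist)
target = sample_gravity_target("A", open_jobs, dist, random.Random(1))
```

In this example both candidates have weight 1 (4/2² and 9/3²), so each gets probability 0.5; the saturated TAZ D is never selected.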

2.1.4.4 Radiation Model

The radiation model is simulated by executing for each individual the emission-absorption process discussed in [SGMB12] page 23.

Consider the level $z$ of the emitted job request. It has a probability density $f(z)$ that does not depend on the individual. For each request (outgoing commuter) the level is assumed to equal the maximum value $z_{max,m}$ found after $m$ extractions, where $m$ is the number of remaining outgoing commuters in the origin location who are still looking for a job opportunity. In the same way, for a possible destination location, the maximum value of the request level that can be absorbed is determined as the maximum value in a set of $n$ extractions from $f(z)$, where $n$ is the number of open job opportunities in the candidate destination location.

Note that for every probability density function $f(x)$ with (cumulative) distribution $F(x) = \int f(x)\,dx$, the cumulative distribution for the maximum of $k$ extractions is given by $F^k(x)$. A value for the maximum is sampled by sampling $u \sim uniform(0,1)$ and calculating $x$ from

$$u = F^k(x) \qquad (2.3)$$

$$x = F^{-1}(u^{1/k}) \qquad (2.4)$$

One needs to sample the maximum level for the emitter location and for the absorber location and to determine whether the level for the emitter is lower than the one for the absorber.

$$x_e = F^{-1}(u_e^{1/m}) \qquad (2.6)$$

$$x_a = F^{-1}(u_a^{1/n}) \qquad (2.7)$$

$$x_e \le x_a \Leftrightarrow u_e^{1/m} \le u_a^{1/n} \qquad (2.8)$$

$$\Leftrightarrow \frac{\log(u_e)}{m} \le \frac{\log(u_a)}{n} \qquad (2.9)$$

since $F^{-1}(x)$ is monotonically increasing in $[0,1]$. See also [SGMB12].

A value $u_e \sim uniform(0,1)$ is sampled for the emitter (origin) location. Then the potential destinations are scanned in order of increasing travel impedance. A value $u_a \sim uniform(0,1)$ is sampled for each candidate absorbing (destination) location and the first one for which the condition mentioned in (2.9) holds is selected to absorb the request.

It is possible that a request is not absorbed because the number of absorbers is finite. Therefore, the software provides a fallback: the request is recycled after doubling the $\frac{\log(u_e)}{m}$ value, which means that the maximum value after $\frac{m}{2}$ extractions is considered. The number of times this was required for the 986931 commuters in the experiments reported below is shown in the table.


Figure 2.11: Scatter plot showing the correspondence between the given flow values and the flows calculated using the gravitation law. The value for each off-diagonal cell in the home-work matrix is shown.

| Number of people adapted | Number of adaptations | Adaptations per adapted person | Relative number of people adapted |
|---|---|---|---|
| 23437 | 45253 | 1.930 | 0.024 |
| 23478 | 45046 | 1.918 | 0.024 |
| 23242 | 44752 | 1.925 | 0.024 |
| 23310 | 44830 | 1.923 | 0.024 |
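The emission-absorption step with the recycling fallback can be sketched as below. The sketch uses the logarithmic form of the comparison, so the distribution $F$ itself is never needed. The function names, the data layout and the `max_recycles` safety limit are assumptions added for illustration; they are not taken from the project software.

```python
import math
import random


def absorbs(u_e, m, u_a, n):
    """True if u_e**(1/m) <= u_a**(1/n), written with logarithms."""
    return math.log(u_e) / m <= math.log(u_a) / n


def radiation_target(orig, m, open_jobs, dist, rng, max_recycles=64):
    """Return (target, number of recycles) for one request emitted by `orig`.

    m is the number of remaining outgoing commuters in `orig`;
    open_jobs[t] is the number of open job opportunities in t.
    """
    candidates = sorted(
        (t for t, n in open_jobs.items() if t != orig and n > 0),
        key=lambda t: dist[orig][t])          # increasing travel impedance
    if m <= 0 or not candidates:
        return None, 0
    # 1 - random() lies in (0, 1], which keeps log() defined.
    level = math.log(1.0 - rng.random()) / m  # log(u_e) / m
    for recycles in range(max_recycles + 1):
        for t in candidates:
            u_a = 1.0 - rng.random()
            if level <= math.log(u_a) / open_jobs[t]:
                return t, recycles
        level *= 2.0                          # fallback: recycle the request
    return None, max_recycles                 # safety limit (assumption)


# Hypothetical example with three candidate destinations, one saturated.
dist = {"A": {"B": 1.0, "C": 2.0, "D": 3.0}}
open_jobs = {"A": 3, "B": 2, "C": 5, "D": 0}
rng = random.Random(1)
draws = [radiation_target("A", 4, open_jobs, dist, rng) for _ in range(200)]
```

Doubling the level makes it more negative, so each recycle lowers the emitted request level and makes absorption more likely on the next scan.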

2.1.4.5 Results

1. Radiation and gravitation models.

Given and calculated OD-flow matrices are compared. For each OD-flow matrix, the diagonal elements specify the number of intra-TAZ trips; those numbers are given and hence identical in all matrices (given and calculated ones). For comparisons, off-diagonal elements only are considered because diagonal elements shall have no influence on the quality measure.

Figure 2.11 shows the scatter plot for tuples $(HW_{calc}[o][d], HW_{given}[o][d])$ for all $(o, d) \mid o \neq d$ pairs. $HW_{given}$ designates the given home-work flow matrix and $HW_{calc}$ designates the matrix calculated (reconstructed) using the gravity law. Figure 2.12 shows the similar scatter plot for the matrix calculated (reconstructed) using the radiation law. The correlation between given and reconstructed values (not their logarithms) $\forall (o, d) \mid o \neq d$ is low in both cases, as shown in the following table. In a perfect reconstruction, all points should be on the line $y = x$.

| Case | R (Pearson) | R² |
|---|---|---|
| Gravity | 0.75585 | 0.57130 |
| Radiation | 0.60983 | 0.37189 |

Figure 2.13 shows the frequency distribution for the distances found in the given and the calculated home-work OD flow matrices for a gravity model simulation. The red line represents the distribution for the distances in the reconstructed OD flow matrix; the green line corresponds to the given matrix and the blue line shows the difference. The number of short distance trips is underestimated. Figure 2.14 shows the frequency distribution for the distances found in the given and the calculated home-work OD flow matrices for a radiation model simulation. The red line represents the distribution for the distances in the reconstructed OD flow matrix; the green line corresponds to the given matrix and the blue line shows the difference. The number of short distance trips is overestimated.

Figure 2.12: Scatter plot showing the correspondence between the given flow values and the flows calculated using the radiation law. The value for each off-diagonal cell in the home-work matrix is shown.

Figure 2.13: Probability density functions for the distance found in the given (green line) and calculated (red line) OD home-work flow matrices for the gravitation law case. The blue line shows the difference.

Figure 2.14: Probability density functions for the distance found in the given (green line) and calculated (red line) OD home-work flow matrices for the radiation law case. The blue line shows the difference.

In order to further compare the methods, four runs for the gravity case and four runs for the radiation case have been executed. The off-diagonal elements are arranged in an $N^2 - N = N \cdot (N - 1)$ dimensional vector. Using the scalar product, the cosine of the angle between vectors is used to evaluate the quality of the matrix reconstruction. This is similar to the Pearson correlation coefficient but does not apply a translation to the expected value. Results have been summarized in the table below. Each row and each column in the table corresponds to a simulation run (using either the gravitation law or the radiation law). Each cell corresponds to two cases and hence is associated with two vectors (matrices). The cell value is $\cos(\alpha)$ where $\alpha$ is the angle between the vectors. The table is symmetric. The first column/row corresponds to the given OD flow matrix.

| | Given | Rad.1 | Rad.2 | Rad.3 | Rad.4 | Grav.1 | Grav.2 | Grav.3 | Grav.4 |
|---|---|---|---|---|---|---|---|---|---|
| Given | 1.000000 | 0.612213 | 0.611238 | 0.612454 | 0.611430 | 0.760153 | 0.761436 | 0.759589 | 0.760234 |
| Rad.1 | 0.612213 | 1.000000 | 0.999428 | 0.999435 | 0.999414 | 0.573575 | 0.577157 | 0.574327 | 0.574533 |
| Rad.2 | 0.611238 | 0.999428 | 1.000000 | 0.999327 | 0.999405 | 0.573921 | 0.577609 | 0.574655 | 0.574953 |
| Rad.3 | 0.612454 | 0.999435 | 0.999327 | 1.000000 | 0.999425 | 0.573447 | 0.577071 | 0.574202 | 0.574446 |
| Rad.4 | 0.611430 | 0.999414 | 0.999405 | 0.999425 | 1.000000 | 0.574676 | 0.578347 | 0.575413 | 0.575733 |
| Grav.1 | 0.760153 | 0.573575 | 0.573921 | 0.573447 | 0.574676 | 1.000000 | 0.997460 | 0.997592 | 0.997496 |
| Grav.2 | 0.761436 | 0.577157 | 0.577609 | 0.577071 | 0.578347 | 0.997460 | 1.000000 | 0.997390 | 0.997452 |
| Grav.3 | 0.759589 | 0.574327 | 0.574655 | 0.574202 | 0.575413 | 0.997592 | 0.997390 | 1.000000 | 0.997468 |
| Grav.4 | 0.760234 | 0.574533 | 0.574953 | 0.574446 | 0.575733 | 0.997496 | 0.997452 | 0.997468 | 1.000000 |

The cosine values in the first row show that the angle between each vector constructed using the gravitation law and the vector corresponding to the given flow matrix is nearly the same for all simulations. The same is observed for the vectors generated using the radiation law. The cosine values, however, are well below 1 (about 0.76 and 0.61, corresponding to angles of roughly 40° and 52°), indicating that the vectors for the reconstructed flow matrices deviate substantially from the vector corresponding to the given flow matrix.

The next rows also show the cosine values for the angles between the vectors for all pairs of solutions. It is easily observed that a given law (gravitation or radiation) produces solutions corresponding to vectors having almost the same direction. This means that the magnitude of stochastic effects is much smaller than the systematic error. On average, the angle $\alpha$ between radiation and gravitation solutions is larger ($\cos\alpha = 0.577$) than between the solutions and the given vector ($\cos\alpha = 0.612$ and $\cos\alpha = 0.760$ for the radiation and gravitation law respectively).

2. Radiation model with personal decision to reject opportunity.

Since the small distance cases seem to be overestimated, an additional experiment was performed. The radiation law assumes that a TAZ is chosen as soon as a region is found for which the sampled maximum required job level does not exceed the sampled maximum level for the available opportunities. In such a case, the TAZ is chosen as the destination location. The model was adapted so that there is a given probability to reject the offered opportunity. The probability is identical for everyone. The software was run for values [0.02 ... 0.30] using a 0.02 increment. The values for $\cos\alpha$ and $r$ are given in the table below and in Figure 2.15. Clearly, the values remain well below those found for the gravitation model.

| Probability to reject | cos α | r (Pearson) |
|---|---|---|
| 0.00 | 0.61477 | 0.61241 |
| 0.02 | 0.61589 | 0.61350 |
| 0.04 | 0.61992 | 0.61752 |
| 0.06 | 0.61891 | 0.61646 |
| 0.08 | 0.62665 | 0.62422 |
| 0.10 | 0.62884 | 0.62637 |
| 0.12 | 0.62894 | 0.62644 |
| 0.14 | 0.63227 | 0.62972 |
| 0.16 | 0.63760 | 0.63506 |
| 0.18 | 0.63942 | 0.63684 |
| 0.20 | 0.64050 | 0.63789 |
| 0.22 | 0.64111 | 0.63845 |
| 0.24 | 0.64166 | 0.63895 |
| 0.26 | 0.64410 | 0.64136 |
| 0.28 | 0.64413 | 0.64134 |
| 0.30 | 0.63941 | 0.63654 |

3. Predictions by FEATHERS

A set of schedules was predicted by a FEATHERS run for half of the Flemish population (FRAC2). A home-work flow matrix was derived by considering every schedule that contains a work activity. The home location and the first trip arriving at the work location have been extracted to determine the required matrix. This method is used because, in many cases, people travel to another location before going to work. The values in the matrix have been doubled to get a prediction for the full population. Figure 2.16 shows the scatter plot where each point represents a tuple of a predicted and the corresponding observed value. Please note that the given (observed) home-work flow matrix was derived from the SEE census data for the year 2001 whereas the FEATHERS prediction was produced using data for 2010. The size of the FEATHERS predicted vector is 1.1613 times the length of the vector for the observed data; this may be caused by the difference between the reference years (2001 for the SEE data, 2010 for the FEATHERS prediction). The vector size does not matter; only the direction is important. On the other hand, there is no reason to believe that the distribution of the trips over the OD-pairs has changed a lot during that period.

The value $\cos\alpha = 0.637957$ nearly equals the value for the radiation model with saturation and with 18% probability for people not accepting the proposed opportunity (first or any subsequent proposal), but it shows that the OD-matrix reconstruction is less accurate than for the gravity model case. Figure 2.17 shows the probability density for the home-work distance. Clearly, the FEATHERS prediction for the distance is better than the one for any other method.
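The two quality measures used in these results, the cosine of the angle between the off-diagonal vectors and the Pearson correlation coefficient, can be computed as in the following sketch. The small matrices are made-up examples; the helper names are assumptions.

```python
import math


def off_diagonal(matrix):
    """Arrange the off-diagonal cells of an N x N matrix in an N*(N-1) vector."""
    n = len(matrix)
    return [matrix[o][d] for o in range(n) for d in range(n) if o != d]


def cosine(u, v):
    """Cosine of the angle between two vectors (no translation applied)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def pearson(u, v):
    """Pearson r: the cosine after subtracting the mean from each vector."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine([a - mean_u for a in u], [b - mean_v for b in v])


# Hypothetical 3 x 3 home-work matrices; the diagonal is ignored.
given = [[9, 1, 2], [3, 9, 4], [5, 6, 9]]
calc = [[9, 2, 4], [6, 9, 8], [10, 12, 9]]  # off-diagonal cells doubled
u, v = off_diagonal(given), off_diagonal(calc)
cos_alpha = cosine(u, v)
r = pearson(u, v)
```

Scaling a vector does not change either measure, which is why doubling the FEATHERS half-population flows affects the vector length but not the reported cosine.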


Figure 2.15: Evolution of cosα as a function of the probability to reject an opportunity.

Figure 2.16: Scatter plot showing the correspondence between the given flow values and the flows calculated using FEATHERS law. The value for each off-diagonal cell in the home-work matrix is shown.


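Figure 2.17: Probability density functions for the distance found in the given (green line) and calculated (red line) OD home-work flow matrices for the FEATHERS prediction case. The blue line shows the difference.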

2.1.4.6 Discussion - Conclusion

1. Please note that the results about the integration of the radiation law in FEATHERS that have been reported in section 2.1.3.4 and in deliverable D4.2 do not apply to flow matrices. In that section, the original radiation law (not the doubly constrained variant) was integrated in FEATHERS. The results presented in that section are numbers of jobs assigned to locations. Because this section covers doubly constrained models, the number of jobs assigned to each zone (i.e. the size of the incoming flow) is given. This section focuses on the reconstruction of the flows (OD matrix elements). The problem is largely under-determined since, for the case of $n$ locations, there are $n^2$ unknowns and only $2 \cdot n$ equations. Location choice models essentially try to use some law in order to compensate for the lack of equations.

2. Both the gravitation and radiation laws seem to be able to select locations so that the distributions for the home-work distance approximate the one found in the original data. However, neither succeeds in reproducing the given OD flow matrix accurately, which means that neither of the laws by itself is sufficient to serve as a location choice model in an activity-based micro-simulator. Although the distance distributions seem to be realistic, none of the methods can be expected to generate realistic flows. Clearly, some essential information is not captured by the models.

This is explained by the following factors:

(a) the homogeneity of the area. Municipalities cover the entire area of Flanders (i.e. there is not much empty space between municipality centers) and their population sizes are similar.

(b) the laws apply to the selection of a work location as a function of the travel impedance and fail to discriminate between locations having similar impedance values. In particular, any information about job kinds is ignored since a single density function $f(z)$ is used for the level of requests and opportunities.

(c) the impedance values (travel distance, travel duration) are calculated between municipality centroids. In densely populated regions like Flanders, the distance between centroids is of the same size as the radius of the TAZ.

(d) in cases where the impedance values and number of job opportunities are nearly equal, the candidate TAZ in the gravity model have nearly equal probability to become selected. In such cases, rank based models (like the radiation model) assign a higher priority to particular zones in a set where all zones have nearly equal impedance and number of job opportunities.

3. The OD-matrix reconstruction by the FEATHERS location choice model performs as well as the radiation model but is not as accurate as the gravitation model.

Conclusion: impedance values combined with one of the investigated laws are not sufficient to reconstruct the OD flow matrix when the numbers of outgoing and incoming commuters for each TAZ are given. Please note that the methods have been evaluated using one given dataset of incoming and outgoing home-work trips for the municipality level zoning of Flanders. The gravity based model using a quadratic deterrence function delivers the most accurate prediction for the OD-matrix and the FEATHERS location choice model delivers the best distance distribution.

2.1.5 Estimating the deterrence function for singly constrained gravity models

A non-parametric method to simultaneously estimate the weights (populations) and the deterrence function in singly constrained gravity models is presented. This section reports ongoing research for which only preliminary results are available at the time of writing this DATASIM deliverable DLV3.2. The principle of the method is presented here; results will be presented in the coming months.

2.1.5.1 Concept - Aim

The gravity model is the typical modeling framework used to estimate commuting trips between cities or municipalities, and it has found wide application in the estimation of various kinds of spatial flows. Although there are many different formulations of gravity models, most of them estimate the average number of trips from location $i$ to $j$ as

$$T_{ij} \propto w_i w_j f(r_{ij}, \gamma) \qquad (2.10)$$

where $w$ is the weight, a local variable usually proportional to the resident population or the number of jobs in a location, and $r$ is the distance, which depends on the pair of locations considered and can be the geographic distance, the road distance, the travel time, the travel cost or another impedance measure. $f(r, \gamma)$ is the deterrence function that depends on the parameter(s) $\gamma$ and describes how flows decay with distance.

Gravity models can be further classified based on the type of constraints that are imposed on their estimates. In a singly constrained model the total number of estimated departures from any location $i$, $T_i$, is constrained to be equal to the total number of observed departures from $i$, $T^*_i$: $T_i = \sum_j T_{ij} = \sum_j T^*_{ij} = T^*_i$. In a doubly constrained model there is an additional constraint, ensuring that the total number of estimated arrivals in each location is also equal to the total number of observed arrivals.

2.1.5.2 Method

Here we propose a general method to find the deterrence function and the values of the weights that provide an optimal estimate of the observed OD matrix, $T_{ij}^*$, using a singly constrained gravity model. We compare the estimates of commuting trips in Flanders obtained with our method with those obtained using a doubly constrained model, and comment on the performance of the two methods in the discussion. In singly constrained gravity models the average number of trips from location $i$ to location $j$ is given by

$$T_{ij} = T_i\, \frac{w_j f(r_{ij}, \gamma)}{\sum_{j'} w_{j'} f(r_{ij'}, \gamma)} = T_i\, p_{ij}(w, f_\gamma) \tag{2.11}$$

where $p_{ij}(w, f_\gamma)$ is the estimated probability of a trip from location $i$ to location $j$, and depends on the weights $w$ and the deterrence function $f_\gamma$. The probability to observe a given sequence of trips from location $i$ to all other locations, $\{t_{ij}\}_j$, is given by the multinomial distribution

$$P(\{t_{ij}\} \mid T_i, \{p_{ij}\}) = T_i! \prod_{j \neq i} \frac{p_{ij}^{t_{ij}}}{t_{ij}!} \tag{2.12}$$

and the average number of trips between any two locations is indeed $T_{ij} = T_i\, p_{ij}$.
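The singly constrained estimate of Eq. (2.11) can be illustrated with a minimal sketch. The function names, the three-zone toy data and the exponential deterrence function with scale 5.0 are assumptions made for illustration only; they are not taken from the DATASIM implementation. Intrazonal trips are excluded from the normalization, in line with the product over $j \neq i$ in Eq. (2.12).

```python
import math

def trip_probabilities(weights, dist, deterrence):
    """Return p[i][j] = w_j f(r_ij) / sum_{j' != i} w_j' f(r_ij'), with p[i][i] = 0."""
    n = len(weights)
    p = []
    for i in range(n):
        # Unnormalized attractiveness of every destination j seen from origin i.
        scores = [0.0 if j == i else weights[j] * deterrence(dist[i][j])
                  for j in range(n)]
        norm = sum(scores)
        p.append([s / norm for s in scores])
    return p

def expected_trips(departures, p):
    """Return T[i][j] = T_i * p[i][j] (average trips under the multinomial model)."""
    return [[t_i * p_ij for p_ij in row] for t_i, row in zip(departures, p)]

# Hypothetical toy data: 3 zones, weights, symmetric distances, observed departures.
weights = [10.0, 20.0, 5.0]
dist = [[0.0, 4.0, 9.0],
        [4.0, 0.0, 6.0],
        [9.0, 6.0, 0.0]]
departures = [100.0, 250.0, 40.0]

# Assumed deterrence function: exponential decay with an arbitrary scale.
p = trip_probabilities(weights, dist, lambda r: math.exp(-r / 5.0))
trips = expected_trips(departures, p)

for origin, row in enumerate(trips):
    print(origin, [round(t, 1) for t in row], "row sum:", round(sum(row), 1))
```

Because each row of `p` sums to one, every row of `trips` sums to the corresponding number of departures, which is exactly the single constraint; nothing forces the column sums (arrivals) to match observations.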

Usually the weight is assumed to be proportional to the resident population or the number of jobs, and in other cases it is assumed to be a function (usually a power law) of these variables. The deterrence function is usually assumed to be a power law, an exponential, a stretched exponential or a more complex function. Our method seeks the singly constrained gravity model that provides the best estimate of the observed flows, i.e. the model with the shortest $L_1$ distance from the data, $\sum_{ij} |T_{ij}^* - T_{ij}|$.

The difference between our approach and other methods in the literature is that we do not make any a priori assumption on the functional form of the weights and the deterrence function. In particular, the weights do not explicitly depend on population or any other socio-economic variable, but are treated as free parameters. Similarly, the deterrence function is estimated using a non-parametric regression, i.e. without pre-imposing a particular functional form but expressing it as a sum of 10 Gaussians whose parameters (amplitude, mean, and variance) are free. Our method thus consists in looking for the values of the $n$ weights and the parameters of the 10 Gaussians (i.e. $n + 30$ parameters in total) that minimize the sum of the absolute differences between model and data trips.
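As a small illustration of the two ingredients named above, the following sketch evaluates a deterrence function written as a sum of Gaussians and the $L_1$ distance between two OD matrices. The function names and the example numbers are hypothetical; the exponent uses the form $-(r - \mu_k)^2 / \sigma_k$, with $\sigma_k$ playing the role of the variance-like free parameter.

```python
import math

def gaussian_sum(r, params):
    """Deterrence value f(r) = sum_k A_k * exp(-(r - mu_k)^2 / sigma_k).

    params is a list of (A_k, mu_k, sigma_k) triples; the text uses 10 of them.
    """
    return sum(a * math.exp(-(r - mu) ** 2 / sigma) for a, mu, sigma in params)

def l1_distance(observed, estimated):
    """Sum of absolute differences over all off-diagonal OD pairs."""
    n = len(observed)
    return sum(abs(observed[i][j] - estimated[i][j])
               for i in range(n) for j in range(n) if i != j)

def n_free_parameters(n_zones, n_gaussians=10):
    """One weight per zone plus (amplitude, mean, variance) per Gaussian."""
    return n_zones + 3 * n_gaussians

# Hypothetical example: two Gaussians, one peaked at short and one at medium range.
params = [(1.0, 0.0, 25.0), (0.3, 15.0, 100.0)]
print(gaussian_sum(5.0, params))
print(l1_distance([[0, 5], [7, 0]], [[0, 4], [9, 0]]))
print(n_free_parameters(5))
```

Treating every weight and every Gaussian parameter as free makes the parameter count grow linearly with the number of zones, which is why a derivative-free search over blocks of parameters is a natural fit.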

Starting from a uniform set of values for the free parameters, we iterate over the following steps:

1. Optimize the deterrence function: Keeping the weights fixed, the downhill simplex algorithm is used to find the parameters of the 10 Gaussians $\{A_k, \mu_k, \sigma_k\}_{k=1}^{10}$ that minimize the cost function $\sum_{ij} |T_{ij}^* - T_{ij}|$:

$$\min_{\gamma} \sum_{i \neq j} \left| T_{ij}^* - T_i\, p_{ij}(w, f_\gamma) \right| = \tag{2.13}$$

$$\min_{\{A_k, \mu_k, \sigma_k\}_{k=1}^{10}} \sum_{i \neq j} \left| T_{ij}^* - T_i\, p_{ij}\!\left(w, \sum_{k=1}^{10} A_k\, e^{-(r_{ij} - \mu_k)^2 / \sigma_k}\right) \right| \tag{2.14}$$

2. Optimize the weights: Keeping the deterrence function's parameters fixed, the downhill simplex algorithm is used to find the weights $w$ that minimize the cost function $\sum_{ij} |T_{ij}^* - T_{ij}|$:

$$\min_{w} \sum_{i \neq j} \left| T_{ij}^* - T_i\, p_{ij}(w, f_\gamma) \right| \tag{2.15}$$

Steps 1 and 2 are repeated until the parameter values at the end of an iteration no longer differ significantly from those at its start (typically when they have changed by no more than 1%).
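The alternating scheme of steps 1 and 2 can be sketched as follows. This is a toy illustration under stated assumptions, not the DATASIM code: the four-zone distance and trip matrices are made up, only 2 Gaussians are used instead of 10 to keep the run short, `nelder_mead` is a simplified hand-written downhill simplex (reflection, expansion, inside contraction, shrink), and parameter sets with non-positive weights, variances or deterrence values are rejected by returning an infinite cost.

```python
import math

def deterrence(r, gamma):
    """Sum of Gaussians; gamma is the flat list [A_1, mu_1, sigma_1, A_2, ...]."""
    return sum(gamma[k] * math.exp(-(r - gamma[k + 1]) * (r - gamma[k + 1]) / gamma[k + 2])
               for k in range(0, len(gamma), 3))

def l1_cost(weights, gamma, dist, observed):
    """L1 distance between observed trips and the singly constrained estimate."""
    if min(weights) <= 0 or min(gamma[2::3]) <= 0:
        return math.inf  # assumption: reject non-positive weights or variances
    n, total = len(weights), 0.0
    for i in range(n):
        scores = [0.0 if j == i else weights[j] * deterrence(dist[i][j], gamma)
                  for j in range(n)]
        norm = sum(scores)
        if min(scores) < 0 or norm <= 0:
            return math.inf  # assumption: deterrence must stay positive
        t_i = sum(observed[i])  # observed departures from zone i
        total += sum(abs(observed[i][j] - t_i * scores[j] / norm)
                     for j in range(n) if j != i)
    return total

def nelder_mead(f, x0, step=0.5, iters=100):
    """Simplified downhill simplex search starting from x0."""
    n = len(x0)
    simplex = [list(x0)] + [[x + (step if i == k else 0.0) for i, x in enumerate(x0)]
                            for k in range(n)]
    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]

        def move(t):
            # Point on the line through the centroid, away from the worst vertex.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = move(1.0)
        if f(reflected) < f(best):
            expanded = move(2.0)
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = move(-0.5)
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:  # shrink every vertex towards the best one
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, v)]
                                    for v in simplex[1:]]
    return min(simplex, key=f)

def alternate(cost, weights, gamma, tol=0.01, max_rounds=10):
    """Repeat steps 1 and 2 until no parameter changes by more than tol (1%)."""
    for _ in range(max_rounds):
        old = weights + gamma
        gamma = nelder_mead(lambda g: cost(weights, g), gamma)    # step 1
        weights = nelder_mead(lambda w: cost(w, gamma), weights)  # step 2
        new = weights + gamma
        if all(abs(a - b) <= tol * abs(b) for a, b in zip(new, old)):
            break
    return weights, gamma

# Hypothetical toy data: 4 zones, distances and observed OD trips.
dist = [[0, 5, 12, 20], [5, 0, 8, 15], [12, 8, 0, 6], [20, 15, 6, 0]]
observed = [[0, 60, 30, 10], [80, 0, 50, 20], [25, 45, 0, 40], [5, 15, 35, 0]]

def cost(w, g):
    return l1_cost(w, g, dist, observed)

w0 = [1.0] * 4                           # uniform initial weights
g0 = [1.0, 0.0, 50.0, 1.0, 10.0, 50.0]   # two Gaussians: (A, mu, sigma) each
initial_cost = cost(w0, g0)
w_fit, g_fit = alternate(cost, w0, g0)
final_cost = cost(w_fit, g_fit)
print(initial_cost, final_cost)
```

Each inner search starts from the current parameter values and keeps its best vertex, so the cost cannot increase from one step to the next. The sketch makes no claim about reaching a global minimum; the cost is also invariant under a common rescaling of all weights or of all amplitudes, so the fitted values are only determined up to such a scale.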

2.2 Prediction of Travel Time from Big Data

The travel time and travel volume between regions were derived from recorded GPS traces. This technique allows for direct observation of so-called volume-delay functions (VDFs) for use in the traffic assignment (network loading) procedure.

Details about this work have been reported in WP4 because, in a first stage, the results serve to validate the predicted travel demand on arterial relations. Nevertheless, the results can be used in the prediction phase too. Preparing a road network for use in simulations is expensive because a volume-delay function (VDF) needs to be assigned to each link. Several such functions are in use, all of them require parameters, and the choice among them is the subject of lengthy discussion. The VDF cannot be derived solely from the characteristics of the cross-section of the road. Longitudinal characteristics,
