Emergency Vehicle Trip Analysis using GPS AVL Data: A Dynamic Program for Map Matching


A.J. Mason

Department of Engineering Science

University of Auckland, New Zealand

www.esc.auckland.ac.nz/Mason

[email protected]

Abstract

We outline a new dynamic programming approach for determining the most likely route taken by a vehicle given a set of GPS locations and times recorded by an AVL (automatic vehicle location) system. By using optimisation approaches, this map matching algorithm can solve a complex maximum likelihood problem that takes into account data such as the vehicle’s location, heading, and speed. The optimisation approach allows routes to be determined even when the GPS data points are widely spaced in time and/or distance. Preliminary results are reported using data collected from ambulances.

1 Introduction

As automatic vehicle location (AVL) systems become more common, a wealth of vehicle tracking data is being collected and stored. For example, many ambulance organisations have now equipped each vehicle with a GPS device that continuously tracks the location of the vehicle. This location data is transmitted back to the headquarters where it is plotted and stored, and typically never looked at again. As part of our work in developing an ambulance simulator [3], we wish to extract as much information as possible from this vehicle GPS data. In particular, we would like to determine road speeds for both standard and ‘lights and sirens’ travel, and also to extract accurate trip timings from the data. In this paper, we discuss a new dynamic programming approach to analyse the GPS data and extract from it detailed route and timing information that can help address these requirements.

In the literature, the problem we face is referred to as ‘map matching’ [2]. The basic map matching problem is to translate a sequence of GPS locations into a sequence of arcs that define a route on some underlying road network. If each GPS location were known exactly, and the arc locations defining the road network were also known exactly, then this would be a simple problem of determining which arc each GPS point is located on. As long as the GPS points were sufficiently close, this would then define a sequence of arcs that would, in turn, define a route on the road network.

Unfortunately, the problem is typically not so simple. The GPS system relies on triangulating positions from satellite locations, and thus small errors in a reported location can occur if the number of visible satellites is low, or if atmospheric disturbances introduce transmission timing errors. Large errors or gaps in the data can occur if the signal is blocked or reflected, such as occurs in tunnels or in the ‘urban canyons’ created by tall buildings and narrow streets. Gaps in the data can also occur at the start of a trip when the GPS unit spends time searching for and then locking onto the satellites. Furthermore, the data used to define the road network will typically contain errors arising from geometry simplification or digitisation difficulties. This combination of uncertainties can complicate the mapping of GPS points to arcs. For example, in Figure 1, it is not clear from the GPS data whether the vehicle turned left or right at the intersection.

Figure 1: A possible sequence of GPS points (circles) generated by a vehicle travelling on a road network.

A further issue that complicates the problem is the frequency of GPS data transmissions. In most of the previous work, GPS data has been provided at intervals of around 10 seconds or less (e.g. the German and Swedish data used by [1] and [4] was mostly collected at 1 second intervals). This means that there will typically be a number of GPS points available to determine and then confirm that a vehicle is travelling along some proposed road segment. In this case, one can use a back-tracking approach in which some candidate route is constructed by heuristically choosing a ‘most likely’ next arc at each intersection, and then back-tracking and altering this choice if the successive GPS data points stray too far from the proposed route. However, this approach becomes more problematic if, as in our case, the GPS data points are widely spaced, and thus large segments of the route have to be determined before the next GPS data point is encountered.

1.1 Data Available

In this work, we assume we have a set of GPS data points generated by a vehicle. (In our ambulance data, these points are typically generated using rules that result in a new GPS data point being recorded ‘every 500m or 5 minutes’ or ‘every 333m or 15 minutes’.) Each data point has a time and a location, and, depending on the particular system being used, may also contain information on the vehicle’s speed, heading, and the distance travelled since the time the last data point was recorded. We also assume that we have some detailed road network available represented as a set of nodes and directed arcs with associated travel speeds. Each arc is defined by a sequence of 2 or more locations joined by straight line segments, thus allowing the arcs to contain bends. Each node in the network represents the location of a traffic intersection (where the driver has a choice of next arc), a merging of the traffic flows on two or more arcs, or the end of a dead-end road. We assume we have some efficient algorithm for finding fastest paths in this network.

2 Model Description

We are fortunate in that, as well as having GPS data records available, we also have vehicle status information that tells us when a vehicle begins or finishes a trip. The first step in our data processing is to use this status information to break the sequence of GPS data points into a sequence of trips. Each trip is defined by a set of GPS data points that represent the vehicle travelling from some start point (e.g. an ambulance base) to some destination (e.g. the scene of an accident). As well as using this status information, the trip processing algorithm also constructs new trips whenever it finds a long sequence of data points at the same location. This can occur, for example, when a vehicle makes a lunch stop on the way back from a hospital.
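The trip-splitting step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the record layout, the `on_trip` status flag, and the thresholds `stop_radius` and `max_stop_points` are hypothetical and are not taken from our system.

```python
from dataclasses import dataclass
from math import hypot
from typing import List


@dataclass
class GpsPoint:
    t: float       # time stamp (seconds)
    x: float       # easting (metres)
    y: float       # northing (metres)
    on_trip: bool  # hypothetical status flag, True while the vehicle is on a trip


def split_into_trips(points: List[GpsPoint],
                     stop_radius: float = 30.0,
                     max_stop_points: int = 3) -> List[List[GpsPoint]]:
    """Split a time-ordered GPS stream into trips.

    A trip ends when the status flag turns off, or when `max_stop_points`
    consecutive points lie within `stop_radius` of their predecessor
    (a long stop). Both thresholds are illustrative assumptions.
    """
    trips: List[List[GpsPoint]] = []
    current: List[GpsPoint] = []
    stationary = 0  # consecutive points that repeat the previous location

    def close(trip: List[GpsPoint]) -> None:
        if len(trip) > 1:  # a single point does not define a trip
            trips.append(trip)

    for p in points:
        if not p.on_trip:
            close(current)
            current, stationary = [], 0
            continue
        if current and hypot(p.x - current[-1].x, p.y - current[-1].y) <= stop_radius:
            stationary += 1
        else:
            stationary = 0
        current.append(p)
        if stationary >= max_stop_points - 1:
            # Long stop: end the trip at the first point of the stop and
            # start the next trip from the latest stationary point.
            close(current[:len(current) - stationary])
            current, stationary = [p], 0
    close(current)
    return trips
```

In this sketch a stop is detected purely from repeated locations; a production version would probably also check the elapsed time at the stop.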

We consider a trip $T$ to be defined as a sequence of $n$ GPS data points, $T = \{g_1, g_2, \ldots, g_n\}$, where a GPS data point $g_c$, $c = 1, 2, \ldots, n$, is represented by a tuple $g_c = (t^{gps}_c, p^{gps}_c, v^{gps}_c, d^{gps}_c)$ giving the vehicle’s position $p^{gps}_c = (x^{gps}_c, y^{gps}_c)$ at time $t^{gps}_c > t^{gps}_{c-1}$, along with its velocity $v^{gps}_c$ (given as a speed and heading $(s^{gps}_c, h^{gps}_c)$) and the distance travelled $d^{gps}_c$ since the last data point was recorded.

In our algorithm, we will be mapping each GPS data point $g_c$ to a corresponding network position $p^{net}_c$, being a point defined on the road network by the pair $p^{net}_c = (a^{net}_c, o^{net}_c)$. This position, also denoted $(x^{net}_c, y^{net}_c)$, is defined by an arc $a^{net}_c$ and an offset $0 \le o^{net}_c \le 1$, where $o^{net}_c$ defines a fraction along arc $a^{net}_c$ measured from the start of arc $a^{net}_c$. A network position $p^{net}_c$ is typically said to be a ‘directed location’ in the sense that the vehicle is travelling in the direction of arc $a^{net}_c$. However, the position may also be an undirected position if the direction of $a^{net}_c$ is not relevant, and thus the vehicle may either be travelling in the direction of $a^{net}_c$ or in the opposite direction, i.e. in the direction of arc $a^{net}_c$’s reverse arc (if such an arc exists). Network positions will be directed unless stated otherwise.

Our algorithm starts by defining, for each GPS data position $p^{gps}_c = (x^{gps}_c, y^{gps}_c)$, an associated set of discretised candidate network positions $P^{net}_c = \{p^{net}_{c,i},\ i = 1, 2, \ldots, |P^{net}_c|\}$, where $p^{net}_{c,i} = (a_{c,i}, o_{c,i})$ represents a possible position for the vehicle at time $t^{gps}_c$ when the $c$th GPS data point $g_c$ was generated. Figure 2 shows one such set of positions that could be generated for a GPS data point. Heuristics can be used to limit the set $P^{net}_c$ of candidate network positions by, for example, limiting the maximum distance allowed between the network position $p^{net}_{c,i}$ and the GPS point $p^{gps}_c$. Another useful simplification is to restrict the candidate positions to no more than one per arc, thereby converting the generation of candidate positions to the problem of constructing a set of nearby arcs and then finding that point on each arc that is closest to the GPS location.
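To make the ‘one candidate per arc’ simplification concrete, the following sketch projects a GPS location onto each polyline arc and keeps the closest point as that arc’s candidate. It is a minimal illustration, assuming planar coordinates in metres and arcs stored as lists of points keyed by a hypothetical arc identifier; the function names and the `max_dist` cut-off are ours for this example only.

```python
from math import hypot
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def closest_position_on_arc(arc: List[Point], gps: Point) -> Tuple[float, float]:
    """Return (offset, distance) for the point on a polyline arc closest to `gps`.

    `offset` is the fraction of the arc length (0 to 1) measured from the
    start of the arc; `distance` is the straight-line distance to `gps`.
    """
    seg_lengths = [hypot(bx - ax, by - ay)
                   for (ax, ay), (bx, by) in zip(arc, arc[1:])]
    total = sum(seg_lengths)
    best_offset, best_dist, travelled = 0.0, float("inf"), 0.0
    for ((ax, ay), (bx, by)), length in zip(zip(arc, arc[1:]), seg_lengths):
        if length > 0.0:
            # Project the GPS point onto the segment, clamped to its end points.
            u = ((gps[0] - ax) * (bx - ax) + (gps[1] - ay) * (by - ay)) / length ** 2
            u = min(1.0, max(0.0, u))
        else:
            u = 0.0
        px, py = ax + u * (bx - ax), ay + u * (by - ay)
        dist = hypot(gps[0] - px, gps[1] - py)
        if dist < best_dist:
            best_dist = dist
            best_offset = (travelled + u * length) / total if total > 0.0 else 0.0
        travelled += length
    return best_offset, best_dist


def candidate_positions(arcs: Dict[str, List[Point]], gps: Point,
                        max_dist: float) -> List[Tuple[str, float, float]]:
    """One candidate (arc id, offset, distance) per arc within `max_dist` of `gps`."""
    out = []
    for arc_id, shape in arcs.items():
        offset, dist = closest_position_on_arc(shape, gps)
        if dist <= max_dist:
            out.append((arc_id, offset, dist))
    return sorted(out, key=lambda cand: cand[2])  # nearest candidates first
```

Each returned pair of arc identifier and offset plays the role of a candidate $(a_{c,i}, o_{c,i})$; the distance is kept only because it is convenient when evaluating the location probability later.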

Figure 2: Discretised candidate vehicle locations (network positions) for a GPS data point.

Figure 3: A solution (vehicle route) can be represented as a path through a state space of candidate network positions, where no more than 1 candidate position is chosen for each GPS data point.

2.1 Solution Representation

We can define a possible solution to the map matching problem by choosing a sequence of positions from our sets of candidate network positions. Figure 3 shows a representation of the candidate network positions $p^{net}_{c,i}$ defined for each GPS data point $g_c$, $c = 1, 2, \ldots, n$. We define a solution $S$ as a sequence of candidate positions formed by choosing (up to) one candidate position for each GPS data point, along with the fastest paths between each consecutive pair of chosen candidate positions. A possible solution is shown by the path in Figure 3, while Figure 4 shows a segment of the route that might be specified by a consecutive pair of candidate positions. Note that in our model, we allow some of the $n$ GPS data points to be rejected because they have very large errors, and thus we assume that the solution $S$ contains $n_S \le n$ network positions, denoted $p^{net}_{c_1,i_1}, p^{net}_{c_2,i_2}, \ldots, p^{net}_{c_{n_S},i_{n_S}}$. In this example solution, we have $c_1 = 1, i_1 = 5$; $c_2 = 2, i_2 = 3$; $c_3 = 4, i_3 = 2$ (skipping data point $g_3$), ..., and $c_{n_S} = n, i_{n_S} = 4$. For convenience, we will also denote a solution by $p^{net}_{[1]}, p^{net}_{[2]}, \ldots, p^{net}_{[n_S]}$ (where $p^{net}_{[j]} = p^{net}_{c_j,i_j}$, $j = 1, 2, \ldots, n_S$) with associated GPS data points $g_{[c]} = (t^{gps}_{[c]}, p^{gps}_{[c]}, v^{gps}_{[c]}, d^{gps}_{[c]})$.

The solution $S$ states that the vehicle was at location $p^{net}_{[1]}$ at time $t^{gps}_{[1]} = t^{gps}_{c_1}$ (when it generated a GPS data point at location $p^{gps}_{[1]} = p^{gps}_{c_1}$), and then at location $p^{net}_{[2]}$ at time $t^{gps}_{[2]}$, and so on.

Figure 4: A partial vehicle route (shown in bold) is defined by a pair of candidate network positions (shown as solid circles).

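The measure that we use to distinguish solutions is the probability of the vehicle having been at the locations $p^{net}_{[1]}, p^{net}_{[2]}, \ldots, p^{net}_{[n_S]}$ given the GPS data points we have observed. Thus, we wish to know the probability (denoted by $P(p^{net}_{[1]}, \ldots, p^{net}_{[n_S]} \mid g_1, \ldots, g_n)$) of these vehicle positions having occurred given the recorded GPS data points. We can estimate this as follows:

$$P(p^{net}_{[1]}, p^{net}_{[2]}, \ldots, p^{net}_{[n_S]} \mid g_1, g_2, \ldots, g_n) \approx P_{skip}(g_1, g_2, \ldots, g_n, g_{[1]}, g_{[2]}, \ldots, g_{[n_S]}) \times P(p^{net}_{[1]} \mid g_{[1]}) \times \prod_{c=2}^{n_S} P(p^{net}_{[c]} \mid p^{net}_{[c-1]}, g_{[c]}) \tag{1}$$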

In this expression we have decomposed the probability function into a product of probabilities $P(p^{net}_{[c]} \mid p^{net}_{[c-1]}, g_{[c]})$ that each depends only on successive pairs of vehicle locations/data points, and a term $P_{skip}(g_1, g_2, \ldots, g_n, g_{[1]}, g_{[2]}, \ldots, g_{[n_S]})$ that incorporates the impact of skipping some GPS data points. The main assumption in this simplification is that, at each step, we can ignore any impact on the route of the destination that is implied by the later GPS data points.

The general term $P(p^{net}_{[c]} \mid p^{net}_{[c-1]}, g_{[c]})$ in our product is the probability that the vehicle was at location $p^{net}_{[c]}$ at time $t^{gps}_{[c]}$ given that the GPS data point $g_{[c]}$ was generated at this time and that the vehicle was at location $p^{net}_{[c-1]}$ at time $t^{gps}_{[c-1]}$. We need to estimate this probability, and the $P_{skip}(g_1, g_2, \ldots, g_n, g_{[1]}, g_{[2]}, \ldots, g_{[n_S]})$ factor, by taking a number of considerations into account.

1. Normally Distributed GPS Errors: We need to develop an estimate for the probability $P_{location}(p^{net}_{[c]}, p^{gps}_{[c]})$ of the vehicle being at location $p^{net}_{[c]}$ given the reported GPS location of $p^{gps}_{[c]}$. Now, the position errors in the GPS system are generally assumed to be normally and independently distributed in the $x$ and $y$ directions. Thus, if $p^{net}_{[c]}$ is a network position at coordinates $(x^{net}_{[c]}, y^{net}_{[c]})$ that corresponds with the GPS data point at $p^{gps}_{[c]} = (x^{gps}_{[c]}, y^{gps}_{[c]})$, then

$$P_{location}(p^{net}_{[c]}, p^{gps}_{[c]}) = P(p^{net}_{[c]} \mid p^{gps}_{[c]}) = \frac{P(p^{gps}_{[c]} \mid p^{net}_{[c]})\, P(p^{net}_{[c]})}{P(p^{gps}_{[c]})} \propto P(z_x = x^{net}_{[c]} - x^{gps}_{[c]}) \times P(z_y = y^{net}_{[c]} - y^{gps}_{[c]}),$$

where we have assumed all network positions $p^{net}_{[c]}$ are equally likely, $P(p^{gps}_{[c]}) = 1$, and $z_x \sim N(0, \sigma)$, $z_y \sim N(0, \sigma)$, with $\sigma$ being provided by the GPS equipment operators.

2. GPS Speed & Heading: A calculation of the probability $P_{speed}(p^{net}_{[c]}, v^{gps}_{[c]})$ of the vehicle having a particular speed given by $v^{gps}_{[c]}$ at location $p^{net}_{[c]}$ can be made if speed information is available in the GPS data. A similar calculation can be made for the probability $P_{heading}(p^{net}_{[c]}, v^{gps}_{[c]})$ of obtaining the observed vehicle heading.

3. Observed Travel Time and Distance: We assume that the vehicle travels by the fastest path between points $p^{net}_{[c-1]}$ and $p^{net}_{[c]}$. We can then define $P_{time}(p^{net}_{[c-1]}, t^{gps}_{[c-1]}, p^{net}_{[c]}, t^{gps}_{[c]})$ to be the probability that a fastest-path trip that starts at location $p^{net}_{[c-1]}$ at time $t^{gps}_{[c-1]}$ will arrive at location $p^{net}_{[c]}$ at time $t^{gps}_{[c]}$. (This calculation would have to assume some distribution for travel times; this is an area of on-going research.) Similarly, we let $P_{distance}(p^{net}_{[c-1]}, t^{gps}_{[c-1]}, p^{net}_{[c]}, d^{gps}_{[c]})$ be the probability that this trip will have length $d^{gps}_{[c]}$.

4. Directness: Drivers typically have a preference for faster, more direct routes. To recognise this, we wish to give faster routes a higher probability score than slower routes. This is important because, in our travel model, we assume that U-turns occur only at intersections, and thus the directions associated with network locations $p^{net}_{[c-1]}$ and $p^{net}_{[c]}$ can significantly impact the travel time $T(p^{net}_{[c-1]}, t^{gps}_{[c-1]}, p^{net}_{[c]})$ between these locations. Clearly, some choices of direction at these locations will give much longer travel times, and so we wish to penalise these ‘unlikely’ choices of direction. To achieve this, we introduce a penalty ‘probability’ factor $P_{directness}(p^{net}_{[c-1]}, t^{gps}_{[c-1]}, p^{net}_{[c]}) \propto \exp(-k \, T(p^{net}_{[c-1]}, t^{gps}_{[c-1]}, p^{net}_{[c]}))$ which decreases with increasing travel time. (Increasing the parameter $k$ makes short routes look increasingly more favourable.) This function behaves correctly in the sense that if a trip is broken into three sub-trips, the product of the penalties for the 3 sub-trips is the same as the penalty computed for the original full length trip.

5. Skipped Points: A solution $S$ comprised of $n_S$ network positions for a trip with $n$ GPS data points will have skipped $n - n_S$ of the GPS data points. Clearly, skipping data points is not a good use of the data available, but is required to determine routes in the presence of large GPS errors. Therefore, we assume, with some low probability, $p_{invalid}$, that any GPS data point is invalid, and incorporate $P_{skip}(g_1, g_2, \ldots, g_n, g_{[1]}, g_{[2]}, \ldots, g_{[n_S]})$ by introducing a term $P_{skip}(n, n_S) = (p_{invalid})^{n - n_S}$ that allows data points to be skipped. (To see this more formally, we assume the probability distribution $P(p^{gps}_{[c]} \mid p^{net}_{[c]})$ is a weighted sum of a normal distribution centred on $p^{net}_{[c]}$ and a distribution that is uniformly distributed over all possible $(x, y)$ locations in the geographical area of interest.)
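Three of the terms above have simple closed forms, and working with their logarithms avoids numerical underflow when many factors are multiplied. The sketch below is illustrative only: it assumes that $\sigma$ is a standard deviation in metres and that travel times are in seconds, and the function names are ours. The speed, heading, time and distance terms are omitted because their distributions are not specified here.

```python
from math import log, pi


def log_p_location(net_xy, gps_xy, sigma):
    """Log of the product of two independent N(0, sigma) error densities.

    Assumes `sigma` is the standard deviation of the GPS error in each axis.
    """
    zx = net_xy[0] - gps_xy[0]
    zy = net_xy[1] - gps_xy[1]
    return -(zx * zx + zy * zy) / (2.0 * sigma ** 2) - log(2.0 * pi * sigma ** 2)


def log_p_directness(travel_time, k):
    """Log of the directness penalty exp(-k * T), up to an additive constant."""
    return -k * travel_time


def log_p_skip(n, n_s, p_invalid):
    """Log of (p_invalid)^(n - n_s) for a solution that uses n_s of n points."""
    return (n - n_s) * log(p_invalid)
```

Because the directness term is linear in travel time on the log scale, the penalties for consecutive sub-trips add up to the penalty for the whole trip, which is the consistency property noted in item 4.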

Taking these terms together, we see that we can write our objective function (1) in the form

$$P(p^{net}_{[1]}, p^{net}_{[2]}, \ldots, p^{net}_{[n_S]} \mid g_1, g_2, \ldots, g_n) \propto P_{skip}(n, n_S) \times P_{location}(p^{net}_{[1]}, p^{gps}_{[1]}) \times P_{heading}(p^{net}_{[1]}, v^{gps}_{[1]}) \times P_{speed}(p^{net}_{[1]}, v^{gps}_{[1]}) \times \prod_{c=2}^{n_S} P(p^{net}_{[c]} \mid p^{net}_{[c-1]}, g_{[c]}) \tag{2}$$

where

$$P(p^{net}_{[c]} \mid p^{net}_{[c-1]}, g_{[c]}) = P_{location}(p^{net}_{[c]}, p^{gps}_{[c]}) \times P_{heading}(p^{net}_{[c]}, v^{gps}_{[c]}) \times P_{speed}(p^{net}_{[c]}, v^{gps}_{[c]}) \times P_{time}(p^{net}_{[c-1]}, t^{gps}_{[c-1]}, p^{net}_{[c]}, t^{gps}_{[c]}) \times P_{distance}(p^{net}_{[c-1]}, t^{gps}_{[c-1]}, p^{net}_{[c]}, d^{gps}_{[c]}) \times P_{directness}(p^{net}_{[c-1]}, t^{gps}_{[c-1]}, p^{net}_{[c]})$$

and, as before, $g_{[c]} = (t^{gps}_{[c]}, p^{gps}_{[c]}, v^{gps}_{[c]}, d^{gps}_{[c]})$. Note that the first term $P(p^{net}_{[1]} \mid g_{[1]})$ in (1) has been handled differently as it has no dependence on an earlier point.

A fundamental assumption of this model is the use of fastest paths to interpolate between GPS data points. This gives rise to an unavoidable inconsistency in our model in that we assume vehicles travel by fastest paths between data points, but that the trip as a whole does not necessarily follow the fastest path. This conflict is made explicit in our directness penalty. If a large weight is placed on this, then paths that are short but skip many GPS data points will score more highly than those paths that faithfully follow the GPS data, even when the data contains errors. Careful parameter selection can help avoid this.

Our probability model provides a flexible framework that can be extended in a number of ways. For example, we note that the probability of a GPS unit generating an invalid data point is much greater in some areas than in others. For instance, points generated upon entering or leaving a tunnel often give locations many kilometres away from the correct location. Thus, the handling of skipped points could be improved by computing an ‘invalid data point’ probability that would depend on the location of the vehicle when the GPS data point was generated. This would involve calculating a path-specific ‘skipped points’ probability for each of the route segments $p^{net}_{c_j,i_j} \to p^{net}_{c_{j+1},i_{j+1}}$ that contain skipped points (i.e. have $c_{j+1} > c_j + 1$). This calculation would require predicting a vehicle location for each of the times $t^{gps}_{c_j+1}, t^{gps}_{c_j+2}, \ldots, t^{gps}_{c_{j+1}-1}$ associated with the skipped GPS data points, and then using these locations to compute location-specific probabilities of erroneous GPS locations being generated.

2.2 Dynamic Program

Consider now the problem of constructing a solution $S$ that best matches the given GPS data. Once the candidate network positions have been constructed for a trip, the problem becomes that of choosing one candidate position for each GPS data point (or deciding to skip that data point) so that these positions together give the maximum likelihood locations for the vehicle during the trip based on the probability measure given in (2). The product form of this equation allows us to develop a dynamic programming recursion to solve this problem. Let $f(p^{net}_{c,j})$ be the probability (objective function value) associated with the most likely partial route that ends at network location $p^{net}_{c,j}$ taking into account GPS data points $g_1, g_2, \ldots, g_c$. Then, for $c = 2, 3, \ldots, n$,

$$f(p^{net}_{c,j}) = \max_{b = 1, 2, \ldots, c-1} (p_{invalid})^{c-b-1} \max_{i = 1, 2, \ldots, |P^{net}_b|} f(p^{net}_{b,i}) \, P(p^{net}_{c,j} \mid p^{net}_{b,i}, g_c).$$

The initial value is defined by $f(p^{net}_{1,j}) = P_{location}(p^{net}_{1,j}, p^{gps}_1) \times P_{heading}(p^{net}_{1,j}, v^{gps}_1) \times P_{speed}(p^{net}_{1,j}, v^{gps}_1)$, $j = 1, 2, \ldots, |P^{net}_1|$. The optimal objective function is then given by $\max_{j = 1, 2, \ldots, |P^{net}_n|} f(p^{net}_{n,j})$. If we take logarithms of the terms in the objective function, we see that this problem is equivalent to that of finding a shortest path through the state space network illustrated in Figure 3.
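The recursion can be sketched in log space as follows. This is a minimal illustration rather than our production implementation: the callbacks `log_first` and `log_step` stand in for the logarithms of the initial term and of $P(p^{net}_{c,j} \mid p^{net}_{b,i}, g_c)$, the `max_skip` limit is an assumed parameter, and indices are 0-based. The toy data at the end (positions along a single straight road, with a hypothetical weight `K`) is invented purely to exercise the code.

```python
from math import log
from typing import Callable, List, Optional, Sequence, Tuple

NEG_INF = float("-inf")


def match_trip(candidates: Sequence[Sequence[object]],
               log_first: Callable[[int], float],
               log_step: Callable[[int, int, int, int], float],
               p_invalid: float,
               max_skip: int = 2) -> Tuple[float, List[Tuple[int, int]]]:
    """Most likely sequence of (GPS index, candidate index) pairs.

    candidates[c] lists the candidate positions for GPS point c (0-based).
    log_first(j) is the log of the initial term for candidate j of point 0.
    log_step(b, i, c, j) is the log-probability of moving from candidate i
    of point b to candidate j of point c. The first and last GPS points
    are assumed to have at least one candidate each.
    """
    n = len(candidates)
    f = [[NEG_INF] * len(cands) for cands in candidates]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * len(cands)
                                                   for cands in candidates]
    for j in range(len(candidates[0])):
        f[0][j] = log_first(j)
    for c in range(1, n):
        for j in range(len(candidates[c])):
            for b in range(max(0, c - 1 - max_skip), c):
                skip_cost = (c - b - 1) * log(p_invalid)  # points b+1..c-1 skipped
                for i in range(len(candidates[b])):
                    if f[b][i] == NEG_INF:
                        continue
                    value = f[b][i] + skip_cost + log_step(b, i, c, j)
                    if value > f[c][j]:
                        f[c][j] = value
                        back[c][j] = (b, i)
    c = n - 1
    j = max(range(len(candidates[c])), key=lambda idx: f[n - 1][idx])
    best = f[c][j]
    path = [(c, j)]
    while back[c][j] is not None:  # walk the back-pointers to the first point
        c, j = back[c][j]
        path.append((c, j))
    path.reverse()
    return best, path


# Toy example: positions are distances (metres) along one straight road,
# and the third GPS point is an outlier that should be skipped.
road_candidates = [[0.0], [500.0], [5000.0], [1500.0]]
K = 0.01  # hypothetical directness weight per metre travelled


def toy_first(j: int) -> float:
    return 0.0


def toy_step(b: int, i: int, c: int, j: int) -> float:
    return -K * abs(road_candidates[c][j] - road_candidates[b][i])


best, path = match_trip(road_candidates, toy_first, toy_step, p_invalid=0.01)
```

In the toy example the detour through the outlier is so long that paying the skip penalty once is cheaper, so the returned path omits the third GPS point.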

3 Implementation and Results

The dynamic program has been implemented and tested within the Siren ambulance simulation system [3] using data for Perth, Australia. The GPS data lacked heading, speed and distance information, and so the terms associated with these in the objective function were dropped.

There are a number of algorithmic efficiencies to consider when implementing this algorithm. The main computational burdens are (1) generating the candidate network positions, and (2) constructing the fastest path routes between each pair of candidate network positions. The first of these steps requires identifying arcs that are close to the GPS data point. To avoid searching through all arcs, we break the area into a rectangular grid of cells, and pre-compute all the arcs that intersect each cell. Given the limited objective function we are using, we have found that recording only one candidate network position for each arc still gives good results. The other expensive step is generating the fastest paths between pairs of candidate network positions. In practice, we do not consider skipping more than 2 successive GPS data points, and thereby restrict the number of fastest paths we need to construct.
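The grid pre-computation mentioned above might look like the following sketch. It is an approximation under stated assumptions: each straight segment is registered in every cell touched by its bounding box, which can include a few cells the segment does not actually cross, and the names `build_grid_index`, `nearby_arcs` and `cell_size` are ours for this example.

```python
from collections import defaultdict
from math import floor
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def build_grid_index(arcs: Dict[str, List[Point]],
                     cell_size: float) -> Dict[Cell, Set[str]]:
    """Map each grid cell to the ids of arcs with a segment whose
    bounding box overlaps that cell (a superset of true intersections)."""
    index: Dict[Cell, Set[str]] = defaultdict(set)
    for arc_id, shape in arcs.items():
        for (ax, ay), (bx, by) in zip(shape, shape[1:]):
            ix0, ix1 = sorted((floor(ax / cell_size), floor(bx / cell_size)))
            iy0, iy1 = sorted((floor(ay / cell_size), floor(by / cell_size)))
            for ix in range(ix0, ix1 + 1):
                for iy in range(iy0, iy1 + 1):
                    index[(ix, iy)].add(arc_id)
    return index


def nearby_arcs(index: Dict[Cell, Set[str]], gps: Point,
                cell_size: float, radius: float) -> Set[str]:
    """Ids of arcs registered in any cell overlapping the square of
    half-width `radius` centred on the GPS point."""
    ix0 = floor((gps[0] - radius) / cell_size)
    ix1 = floor((gps[0] + radius) / cell_size)
    iy0 = floor((gps[1] - radius) / cell_size)
    iy1 = floor((gps[1] + radius) / cell_size)
    found: Set[str] = set()
    for ix in range(ix0, ix1 + 1):
        for iy in range(iy0, iy1 + 1):
            found |= index.get((ix, iy), set())
    return found
```

The arcs returned by `nearby_arcs` are only a shortlist; each would still need an exact distance check before being accepted as a candidate.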

Figure 5 shows a sample of the GPS data points collected from the vehicles. (In this plot, the points have been coloured using a simple straight-line estimate of vehicle speed.) The area shown has a tunnel located near the centre of the figure, which is responsible for the scattering of erroneous data points in this area. Figure 6 shows what can go wrong if the GPS data is followed too faithfully, even if the GPS errors are very small, while Figure 7 shows a more typical route that has been successfully re-constructed from the GPS data.

Figure 5: GPS Data Points collected by St John Ambulance in Perth

Figure 6: This figure shows an example of a badly map-matched route with a loop caused by matching to the closest (but incorrect) arc where lanes in each direction are physically separated. This problem was fixed by increasing the importance of ‘directness’ and increasing the variance in the GPS position estimates.

Figure 7: An example of a map-matched route. The GPS data points are labelled with their time stamps.

There are a number of efficiency improvements currently being implemented to our system. For example, we note that we can use a one-to-many shortest path algorithm to efficiently compute the fastest paths from a candidate network position to all other next candidate positions. (Only one such tree would be required for all those starting candidate positions that share the same arc.) We could also use such a tree to compute the candidate network positions, perhaps limiting the tree’s growth by noting the time when the next GPS data point will be encountered. This last change would make our algorithm a formal generalisation of the algorithms found in the literature (e.g. [2]) that rely on there being only a short distance between successive points.
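A one-to-many search of this kind can be sketched with Dijkstra’s algorithm, stopping once all required targets are reached or a time budget is exceeded. This is a simplified illustration: it works on network nodes rather than on positions part-way along arcs, and the graph layout, the `targets` set and the `max_time` budget are assumptions made for the example.

```python
import heapq
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float]]]


def fastest_times(graph: Graph, source: Hashable,
                  targets: Optional[Iterable[Hashable]] = None,
                  max_time: float = float("inf")) -> Dict[Hashable, float]:
    """Fastest travel times from `source` to every node settled by the search.

    graph[u] lists (v, travel_time) pairs for the arcs leaving node u.
    The search stops early once every node in `targets` is settled, or once
    the next node would take longer than `max_time` to reach.
    """
    tentative = {source: 0.0}
    settled: Dict[Hashable, float] = {}
    remaining = set(targets) if targets is not None else None
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale heap entry
        if d > max_time:
            break     # everything still in the heap is at least this far away
        settled[u] = d
        if remaining is not None:
            remaining.discard(u)
            if not remaining:
                break
        for v, w in graph.get(u, ()):
            nd = d + w
            if nd < tentative.get(v, float("inf")):
                tentative[v] = nd
                heapq.heappush(heap, (nd, v))
    return settled
```

Setting `max_time` from the interval to the next GPS data point corresponds to the idea of limiting the tree’s growth described above, although a practical bound would need some slack for vehicles travelling faster than the network speeds.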

Conclusions

We have developed a formal dynamic programming approach to the map matching problem that is well suited to the problems faced when analysing the sparse GPS data typically generated by ambulance services. We have validated this approach using data from St John Ambulance in Perth and are currently implementing algorithmic improvements aimed at reducing run times.

Acknowledgments

The author would like to acknowledge his colleagues at Optimal Decision Technologies Ltd (ODT) who supported the research and testing required for this work. The author also thanks Perth’s St John Ambulance for providing the data, and both Otto Nielsen and René Jørgensen from the Technical University of Denmark (DTU), and Shane Henderson from Cornell University for their valuable personal communications regarding this problem.

References

[1] F. Marchal, J. Hackney, and K.W. Axhausen. Efficient Map-Matching of Large GPS Data Sets - Tests on a Speed Monitoring Experiment in Zurich, volume 244 of Arbeitsbericht Verkehrs und Raumplanung. ETH Zürich, Zürich, 2004.

[2] Otto Anker Nielsen. Map-matching algorithms for GPS data - methodology and test on data from the AKTA roadpricing experiment in Copenhagen. Technical report, Centre for Traffic and Transport (CTT), Technical University of Denmark (DTU), 2004.

[3] Optimal Decision Technologies Ltd. Simulation for Improving Responses in Emergency Networks (Siren) Software, 2005. www.Optimal-Decision.com.

[4] K.W. Axhausen, S. Schönfelder, J. Wolf, M. Oliveira, and U. Samaga. Eighty weeks of GPS traces: Approaches to enriching trip information. Technical report, IVT / ETH, CH 8093 Zürich, 2003. Presented at the 83rd Transportation Research Board Meeting, Washington, D.C., Jan. 11-15, 2004.