Path Inference Filter: Map Matching of Probe Vehicle Data
Algorithm 4: Lagged smoothing algorithm
2. Estimation of the gradient
Chapter 4 Path Inference Filter: Map Matching of Probe Vehicle Data
Hybrid Traffic Data Collection Roadmap: Objectives and Methods 79
This partial summation can also be defined recursively:
πππ(π) = ππβ πππ οΏ½ πππβ1
πβοΏ½1...πΎπβ1οΏ½:βποΏ½π§π,π§ποΏ½=1 (π) (13)
The start of the recursion is for all π β {1 β― πΎ1}:
ππ1(π) = ππβ ππ1
and the complete partition function is the summation of the auxiliary values:
π(π) = οΏ½ πππΏ
πΎπΏ
π=1
(π)
Computing the partition function can be done in polynomial time complexity by a simple application of dynamic programming. By using sparse data structures to implement β, some additional savings in computations can be made8.
2. Estimation of the gradient
The estimation of the gradient for the first part of the log likelihood function is straightforward.
The gradient of the partition function can also be computed using Equation (13):
π»ππππ(π) = πππ(π)πππ+ ππβ πππ οΏ½ π»π
π:βποΏ½π§π,π§ποΏ½=1
πππβ1(π) The Hessian matrix can be evaluated in similar fashion:
8In particular, care should be taken to implement all the relevant computations in log-domain due to the limited precision of floating point arithmetic on computers. The reference implementation [59] shows one way to do it.
Chapter 4 Path Inference Filter: Map Matching of Probe Vehicle Data
Hybrid Traffic Data Collection Roadmap: Objectives and Methods 80
β³πππππ(π) = πππ(π)οΏ½ππποΏ½οΏ½ππποΏ½ β²
+ππβ ππποΏ½ οΏ½ π»π
π:βποΏ½π§π,π§ποΏ½=1
πππβ1(π)οΏ½ οΏ½ππποΏ½ β²
+ππβ ππποΏ½ππποΏ½ οΏ½ οΏ½ π»π
π:βποΏ½π§π,π§ποΏ½=1
πππβ1(π)οΏ½
β²
+ππβ πππ οΏ½ β³ππ
π:βποΏ½π§π,π§ποΏ½=1
πππβ1(π)
EXPONENTIAL FAMILY MODELS 4.2
We now express our formulation of Conditional Random Fields to a form compatible with Equation (10).
Consider π = πβ2 and π the stacked vector of the desired parameters:
π = οΏ½π ποΏ½
There is a direct correspondence between the path and state variables with the π variables introduced above. Let us pose πΏ = 2π β 1, then for all π β [1, πΏ] we have:
π§2π‘= ππ‘ π§2π‘β1 = ππ‘
and the feature vectors are simply the alternating values of π and d, completed by some zero values:
ππ2π‘ = οΏ½ 0 ποΏ½πππ‘οΏ½οΏ½
ππ2π‘β1= οΏ½β1
2 dοΏ½π, π₯ππ‘οΏ½2
π οΏ½
These formulas establish how we can transform our learning problem that involves paths and states into a more abstract problem that considers a single set of variables.
Chapter 4 Path Inference Filter: Map Matching of Probe Vehicle Data
Hybrid Traffic Data Collection Roadmap: Objectives and Methods 81
SUPERVISED LEARNING WITH KNOWN TRAJECTORIES 4.3
The most straightforward way to learn π and π, or equivalently to learn the joint vector π, is to maximize the likelihood of some GPS observations π1:π, knowing the complete trajectory followed by the vehicle. For all time π‘, we also know which path πobservedπ‘ was taken and which state π₯observedπ‘
produced the GPS observation ππ‘. We make the assumption that the observed path πobservedπ‘ is one of the possible path amongst the set of candidate paths οΏ½πππ‘οΏ½π:
βπ β [1 β― π½π‘]β:βπobservedπ‘ = πππ‘
and similarly, that the observed state π₯observedπ‘ is one of the possible states:
βπ β [1 β― πΌπ‘]β:βπ₯observedπ‘ = π₯ππ‘
In this case, the values of ππ‘ and ππ‘ are known (they are the matching indexes), and the optimization problem of Equation (11) can be solved using methods outlined in Section 4.1.
UNSUPERVISED LEARNING WITH INCOMPLETE OBSERVATIONS:
4.4
EXPECTATION MAXIMIZATION
Usually, only the GPS observations π1:π are available; the values of π1:πβ1 and π1:π (and thus π§1:πΏ) are hidden to us. In this case, we estimate the expected likelihood β, which is the expected value of the likelihood under the distribution over the assignment variables ππ:π³:
β(π) = πΌπβΌπ (β |π½)οΏ½log οΏ½π(π§; π)οΏ½οΏ½ (14)
= οΏ½ π
π§
(π§; π) log οΏ½π(π§; π)οΏ½ (15)
The intuition behind this expression is quite natural: since we do not know the value of the assignment variable π§, we consider the expectation of the likelihood over this variable. This expectation is done with respect to the distribution π(π§; π). The challenge lies in the dependency in π of the very distribution used to take the expectation. Computing the expected likelihood becomes much more complicated than simply solving the optimization problem of (11).
One strategy is to find some βfill inβ values for π§ that would correspond to our guesses of which path was taken, and which point made the observation. However, such a guess would likely involve our model for the data, which we are currently trying to learn. A solution to this chicken and egg problem is the Expectation Maximization (EM) algorithm [62]. This algorithm performs an iterative projection ascent by assigning some distributions (rather than singular values) to every π§π, and uses these
distributions to updates the parameters π and π using the procedures seen in Section 4.3. This iterative procedure performs two steps:
Chapter 4 Path Inference Filter: Map Matching of Probe Vehicle Data
Hybrid Traffic Data Collection Roadmap: Objectives and Methods 82
1. Fixing some value for π, it computes a distribution ποΏ½(π§) = π(π§; π)
2. It then uses this distribution ποΏ½(π§) to compute some new value of π by solving the approximate problem in which the expectation is fixed with respect to π:
maxπ πΌπ§βΌποΏ½(β )οΏ½log οΏ½π(π§; π)οΏ½οΏ½ (16)
This problem is significantly simpler than the optimization problem in Equation (14) since the expectation itself does not depend on π and thus is not part of the optimization problem.
Under this procedure, the expected likelihood is shown to converge to a local maximum [54]. It can be shown that good values for the plug-in distribution ποΏ½ are simply the values of the posterior distributions π(ππ‘|π1:π) and π(π₯π‘|π1:π), i.e. the values ππ‘ and πΜπ‘. Furthermore, owing to the particular shape of the distribution π(π§), taking the expectation is a simple task: we simply replace the value of the feature vector by its expected value under the distribution ποΏ½(π§). More practically, we simply have to consider:
π2π‘(π§2π‘) = πΌπβΌποΏ½β |π,π/1:ποΏ½οΏ½οΏ½ 0π(πππ‘)οΏ½οΏ½
These values of feature vectors plug directly into the supervised learning problem in Equation (11) and produce updated parameters π and π, which are then used in turn for updating the values of πΜ and πΜ
and so on.
Chapter 4 Path Inference Filter: Map Matching of Probe Vehicle Data
Hybrid Traffic Data Collection Roadmap: Objectives and Methods 83
Algorithm 5: Expectation maximization algorithm for learning