
4.5 Simulation results

4.5.1 Experiments with Known Constant Tracking Probabilities

We performed a series of numerical tests using a small eight-node network. The first two nodes are origins of travel and nodes 7 and 8 are destinations. Origin and destination nodes are highlighted as darker circles in the depiction of the network in Figure 4.6.

Figure 4.6: Test network. [Diagram omitted: eight nodes numbered 1 to 8, ten links numbered 1 to 10.]

We order the 12 routes in blocks by OD pair, and then lexicographically by link sequence within each block. We assume that vehicle counts were measured on all links except for 3, 4, 7 and 8, which is sufficient to produce a 6×12 link-path routing matrix A with linearly independent rows.

We begin by comparing the performance of the full and simplified maximum likelihood estimators under varying levels of demand and availability of routing data. The mean route flows are λ = (21, 6, 3, 18, 12, 6, 4, 34, 6, 24, 3, 3)^T for the lowest level of demand. We also consider medium and high demand scenarios, where λ is multiplied by factors of 5 and 25 respectively. In these simulation experiments we take the tracking probabilities p to be a known constant p0 across all routes. We examine values of p0 of 0.05, 0.1, 0.2, ..., 0.8, 1.0.

For each combination of demand level and p0 we generate 100 sets of link counts and tracked routing data using the Poisson model described in Section 4.2. We then compute estimates of λ for each data set and calculate the scaled root mean squared error, defined as

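\[
\mathrm{SRMSE} = \frac{1}{\|\lambda\|_1} \sqrt{\frac{1}{100} \sum_{n=1}^{100} \sum_{r=1}^{12} \bigl(\hat{\lambda}_r^{(n)} - \lambda_r\bigr)^2}, \tag{4.21}
\]

over each block of 100 datasets, where $\hat{\lambda}_r^{(n)}$ is the estimate of the average flow rate for route r obtained from the nth dataset.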

This error criterion scales the root mean squared error by the total demand ||λ||₁, which gives a measure that can be compared directly across different levels of demand.
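As a minimal sketch of how the criterion in (4.21) might be computed, the following Python function takes a block of estimate vectors and the true mean flows. The function name `srmse` and the toy numbers are illustrative assumptions, not part of the original experiments.

```python
import math

def srmse(estimates, lam):
    """Scaled root mean squared error of route-flow estimates.

    estimates: list of per-dataset estimate vectors (one list per dataset)
    lam:       list of true mean route flows
    """
    n_datasets = len(estimates)
    # Sum of squared errors over all datasets and all routes.
    total_sq = sum(
        (est_r - lam_r) ** 2
        for est in estimates
        for est_r, lam_r in zip(est, lam)
    )
    # Total demand, i.e. the 1-norm of the mean flow vector.
    l1_norm = sum(abs(v) for v in lam)
    return math.sqrt(total_sq / n_datasets) / l1_norm

# Toy check with two routes and two datasets (illustrative numbers only).
toy_lam = [3, 4]
toy_estimates = [[3, 4], [6, 8]]
toy_value = srmse(toy_estimates, toy_lam)  # equals sqrt(25 / 2) / 7
```

In the experiments described here the block would contain 100 estimate vectors of length 12.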

As a point of reference we also present results obtained using the routing data alone, and investigate how the accuracy of these estimates compares with that achieved when the available link count data are used as well. Our simulations assume that p is known, so when using the tracking data alone we estimate the mean route flows by λ̃ = y_gps/p, where the division is applied elementwise. In this experiment the rate of penetration is assumed to be constant across all routes, so λ̃ = y_gps/p0. The results for all methods are given in Table 4.1.
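The tracking-only benchmark can be sketched in a few lines. The sketch below assumes that the tracked count on route r is Poisson with mean p0 λ_r, which is how the mean number of tracked vehicles is described in this section; the full data-generating model of Section 4.2 is not reproduced. The helper names `poisson` and `gps_only_srmse` are hypothetical.

```python
import math
import random

# Low-demand mean route flows quoted in the text.
LAM = [21, 6, 3, 18, 12, 6, 4, 34, 6, 24, 3, 3]

def poisson(mean, rng):
    """Draw a Poisson variate by counting unit-rate exponential arrivals in [0, mean]."""
    count = 0
    t = rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def gps_only_srmse(lam, p0, n_datasets=100, seed=0):
    """Simulate tracked counts and return the SRMSE of the estimator y_gps / p0."""
    rng = random.Random(seed)
    total_sq = 0.0
    for _ in range(n_datasets):
        for lam_r in lam:
            y = poisson(p0 * lam_r, rng)  # assumed Poisson(p0 * lam_r) tracked count
            total_sq += (y / p0 - lam_r) ** 2
    return math.sqrt(total_sq / n_datasets) / sum(lam)

for p0 in (0.05, 0.2, 1.0):
    print(p0, round(gps_only_srmse(LAM, p0), 3))
```

Under this assumption the expected squared error of y_gps,r/p0 is λ_r/p0, so the SRMSE should be roughly 1/sqrt(p0 ||λ||₁), which is about 0.085 at p0 = 1 for the low-demand case.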

                          Tracking probability, p0
Method    0.05   0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    1

Low demand
NOR       0.253  0.198  0.152  0.127  0.113  0.097  0.093  0.087  0.086  0.083
SIM       0.195  0.155  0.123  0.115  0.106  0.097  0.089  0.085  0.084  0.082
GPS       0.346  0.246  0.171  0.144  0.125  0.114  0.099  0.092  0.087  0.082

Medium demand
NOR       0.127  0.085  0.061  0.051  0.046  0.044  0.041  0.037  0.036  0.036
SIM       0.108  0.081  0.063  0.052  0.047  0.044  0.042  0.038  0.037  0.036
GPS       0.166  0.116  0.083  0.064  0.055  0.051  0.047  0.041  0.039  0.036

High demand
NOR       0.049  0.036  0.026  0.023  0.020  0.019  0.018  0.018  0.017  0.016
SIM       0.048  0.037  0.028  0.024  0.021  0.020  0.019  0.018  0.017  0.016
GPS       0.069  0.051  0.036  0.029  0.025  0.023  0.021  0.019  0.018  0.016

Table 4.1: Scaled root mean squared errors of maximum likelihood estimators using NOR (normal model), SIM (simplified model), and GPS (tracked vehicle data only). The results are computed for varying levels of demand and global tracking probability.

One striking result in this table is that the simplified model returns a lower error than the normal model when the demand is low. The explanation for this phenomenon lies in the failure of the normal approximation to the Poisson distribution for very low mean flows.

For small tracking probabilities (p0 ≤ 0.3) and low demand (λr = 3 for some r), there are some routes where the mean number of tracked vehicles (λgps,r = p0 λr) is less than one. In such cases, the normal likelihood, whose variance depends on the mean rates, becomes unbounded as λgps,r → 0 (we are essentially trying to divide by zero). The likelihood of the simplified model, on the other hand, remains finite because its variance is a fixed value that is not influenced by the true flow rates.
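The unboundedness can be seen in a one-line calculation. As an assumption made purely for illustration (the exact form of the normal model is given elsewhere in the thesis), suppose a tracked count y is modelled as normal with mean and variance both equal to μ = λgps,r. Evaluating the log-density at an observed zero count gives

\[
\log f(0;\mu) = -\tfrac{1}{2}\log(2\pi\mu) - \frac{(0-\mu)^2}{2\mu} = -\tfrac{1}{2}\log(2\pi\mu) - \frac{\mu}{2} \;\to\; +\infty \quad \text{as } \mu \to 0^{+},
\]

whereas with a fixed variance σ², as in the simplified model,

\[
\log f(0;\mu) = -\tfrac{1}{2}\log(2\pi\sigma^2) - \frac{\mu^2}{2\sigma^2} \;\to\; -\tfrac{1}{2}\log(2\pi\sigma^2),
\]

which stays finite.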

A closer analysis of the results confirms that, for data sets where the normal model returns interior (i.e. non-boundary) estimates, it enjoys slightly smaller errors than the simplified model.

It is only in those cases in which the normal method produces boundary estimates (coupled with infinite log-likelihood values) that the results are particularly imprecise. In both cases, where we have small p0 in combination with low demand, we find that we often generate zero counts ygps,r, which leads to both the normal and the simplified model producing estimates that lie on the boundary of the parameter space. The normal model, however, will never produce a zero estimate of an element of λ if the corresponding route flow is zero, as we saw in our discussion of the theoretical properties of the normal likelihood. The simplified model, in contrast, continues to produce fairly reliable estimates due to the relative robustness of GLS estimation, but can sometimes return negative estimates of average route flows, which are nonsensical.
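To give a feel for how often zero tracked counts arise, the sketch below computes the probability of a zero count on a single route and on at least one of the 12 routes. It assumes, for illustration, that tracked counts are independent Poisson variables with means p0 λ_r; the function names are hypothetical.

```python
import math

# Low-demand mean route flows quoted in the text.
LOW_DEMAND = [21, 6, 3, 18, 12, 6, 4, 34, 6, 24, 3, 3]

def zero_count_probability(lam_r, p0):
    """P(y_gps,r = 0) when the tracked count is Poisson with mean p0 * lam_r."""
    return math.exp(-p0 * lam_r)

def any_zero_probability(lam, p0):
    """P(at least one route has a zero tracked count), assuming independent routes."""
    prob_no_zero = 1.0
    for lam_r in lam:
        prob_no_zero *= 1.0 - zero_count_probability(lam_r, p0)
    return 1.0 - prob_no_zero

# Routes with lam_r = 3: mean tracked count and chance of observing no tracked vehicle.
for p0 in (0.05, 0.1, 0.2, 0.3):
    print(p0, round(p0 * 3, 2), round(zero_count_probability(3, p0), 3))
```

For a route with λ_r = 3 the zero-count probability is exp(-3 p0), which is about 0.86 at p0 = 0.05 and still about 0.41 at p0 = 0.3, so zero counts are the norm rather than the exception on the lightest routes.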

For moderate demand the differences between the normal and simplified model estimators are minuscule. There is a suggestion that the latter is preferable for the smallest tracking probabilities. This is again explicable in terms of problems with the normal approximation to the Poisson at low demands, and the consequent likelihood of boundary estimates, an effect we still occasionally see even with moderate demand if the tracked vehicle flow rates λgps,r are sufficiently small. There is still a slight hint of the expected theoretical (asymptotic) advantage of the normal method over the simplified one when the demand is high, but the differences in the results are very small.

We note that in all cases with p0 < 1 the benchmark results based on the routing data alone are less accurate than those from the models which incorporate the link counts as well. Especially in the case of lower penetration rates, p0 ≤ 0.2, which are more likely to be encountered in practice in the near future, we often see a loss of around 50% in estimation efficiency when the link data are ignored.
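The efficiency comparison can be checked against Table 4.1. The sketch below assumes that relative efficiency is measured as a ratio of mean squared errors, which is one common convention and is not stated explicitly in the text; because both errors are scaled by the same total demand, this equals the squared ratio of SRMSE values. The high-demand NOR and GPS entries for p0 ≤ 0.2 are copied from the table.

```python
# High-demand entries of Table 4.1 for p0 = 0.05, 0.1, 0.2.
P0 = [0.05, 0.1, 0.2]
HIGH_NOR = [0.049, 0.036, 0.026]
HIGH_GPS = [0.069, 0.051, 0.036]

def relative_efficiency(srmse_ref, srmse_other):
    """MSE of the reference method divided by MSE of the other (assumed definition)."""
    return (srmse_ref / srmse_other) ** 2

# Efficiency of the tracking-only estimator relative to the normal model.
gps_efficiency = [
    relative_efficiency(nor, gps) for nor, gps in zip(HIGH_NOR, HIGH_GPS)
]
for p0, eff in zip(P0, gps_efficiency):
    print(p0, round(eff, 2))
```

On these entries the tracking-only estimator attains roughly half the efficiency of the normal model, in line with the figure of about 50% quoted above.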

In document Statistical modelling and inference for traffic networks : a thesis submitted for the degree of Doctor of Philosophy (Page 114-116)