
4.5 Simulation results

4.5.1 Experiments with Known Constant Tracking Probabilities

We performed a series of numerical tests using a small eight-node network. Nodes 1 and 2 are origins of travel and nodes 7 and 8 are destinations. Origin and destination nodes are highlighted as darker circles in the depiction of the network in Figure 4.6.

Figure 4.6: Test network (nodes labelled 1 to 8, links labelled 1 to 10).

We order the total of 12 routes in blocks by OD pair, and then lexicographically by link sequence within each block. We assume that vehicle counts were measured on all links except for 3, 4, 7 and 8, which is sufficient to produce a 6×12 link-path routing matrix A with linearly independent rows.

We begin by comparing the performance of the full and simplified maximum likelihood estimators under varying levels of demand and availability of routing data. The mean route flows are λ = (21, 6, 3, 18, 12, 6, 4, 34, 6, 24, 3, 3)^T for the lowest level of demand. We also consider medium and high demand scenarios, where λ is multiplied by factors of 5 and 25 respectively. In these simulation experiments we take the tracking probabilities p to be a known constant p0 across all routes. We examine values of p0 of 0.05, 0.1, 0.2, ..., 0.8, 1.0.

For each combination of demand level and p0 we generate 100 sets of link counts and tracked routing data using the Poisson model described in section 4.2. We then compute estimates of λ for each data set, and then calculate the scaled root mean squared error, defined as

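\[
\mathrm{SRMSE} = \frac{1}{\|\lambda\|_1} \sqrt{\frac{1}{100} \sum_{n=1}^{100} \sum_{r=1}^{12} \left( \hat{\lambda}_r^{(n)} - \lambda_r \right)^2}, \tag{4.21}
\]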

over each block of 100 datasets, where λ̂_r^(n) is the estimate of the average flow rate for route r obtained from the nth dataset.

This error criterion standardises the squared errors by the total demand, which provides us with a measure that can be used in a straightforward manner for making comparisons in estimation accuracy across different levels of demand.
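The error criterion can be sketched in a few lines of NumPy. This is an illustration only: the function name `srmse` and the array layout (one row per dataset, one column per route) are assumptions made here, and the square root is assumed to be taken over the whole average, as in a conventional root mean squared error.

```python
import numpy as np

def srmse(estimates, lam):
    """Scaled RMSE for estimates of shape (n_datasets, n_routes).

    Hypothetical helper: assumes the root is taken after averaging
    the summed squared errors over all datasets.
    """
    estimates = np.asarray(estimates, dtype=float)
    lam = np.asarray(lam, dtype=float)
    sq_err = (estimates - lam) ** 2          # squared error per dataset and route
    mse = sq_err.sum(axis=1).mean()          # sum over routes, average over datasets
    return np.sqrt(mse) / np.abs(lam).sum()  # scale by the L1 norm of lambda

# Low-demand mean route flows quoted in the text.
lam = np.array([21, 6, 3, 18, 12, 6, 4, 34, 6, 24, 3, 3])

# Toy input: every estimate is off by exactly +1 on every route,
# so the result is sqrt(12) / 140.
toy = np.tile(lam + 1, (100, 1))
print(srmse(toy, lam))
```

Because the denominator is the total demand (140, 700 and 3500 in the three scenarios), the same absolute error counts for less as demand grows, which is what makes the measure comparable across demand levels.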

As a point of reference, we also present results obtained using the routing data alone, and investigate how their accuracy compares with the results achieved when the available link count data are used as well. Our simulations assume that p is known, so when using the tracking data alone we estimate the mean route flows by λ̃ = y_gps / p, where the division is applied elementwise. In this experiment the rate of penetration is constant across all routes, so λ̃ = y_gps / p0. The results for all methods are given in Table 4.1.
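A minimal sketch of the tracking-data-only benchmark follows. It assumes that the tracked route counts are independent Poisson variables with means p0 λ_r (the model of Section 4.2 is not reproduced here, so this sampling step is an assumption), and the function name, seed and printed grid are illustrative choices. It does not attempt to reproduce the NOR or SIM estimators.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only

# Low-demand mean route flows quoted in the text.
lam_low = np.array([21, 6, 3, 18, 12, 6, 4, 34, 6, 24, 3, 3], dtype=float)

def gps_only_srmse(lam, p0, n_datasets=100):
    """SRMSE of the estimator y_gps / p0 under the assumed Poisson model."""
    # Assumption: y_gps,r ~ Poisson(p0 * lambda_r), independently over routes.
    y_gps = rng.poisson(p0 * lam, size=(n_datasets, lam.size))
    lam_tilde = y_gps / p0                        # elementwise scaling by known p0
    mse = ((lam_tilde - lam) ** 2).sum(axis=1).mean()
    return np.sqrt(mse) / lam.sum()

for factor in (1, 5, 25):                         # low, medium, high demand
    for p0 in (0.05, 0.2, 1.0):
        print(factor, p0, round(gps_only_srmse(factor * lam_low, p0), 3))
```

Under these assumptions the expected summed squared error is ||λ||_1 / p0, so the SRMSE should be close to 1 / sqrt(p0 ||λ||_1), which is about 0.085 for low demand at p0 = 1. That is of the same order as the corresponding GPS entry in Table 4.1, although exact agreement should not be expected from a different set of random draws.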

Method   Tracking probability, p0
         0.05   0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    1

Low demand
NOR      0.253  0.198  0.152  0.127  0.113  0.097  0.093  0.087  0.086  0.083
SIM      0.195  0.155  0.123  0.115  0.106  0.097  0.089  0.085  0.084  0.082
GPS      0.346  0.246  0.171  0.144  0.125  0.114  0.099  0.092  0.087  0.082

Medium demand
NOR      0.127  0.085  0.061  0.051  0.046  0.044  0.041  0.037  0.036  0.036
SIM      0.108  0.081  0.063  0.052  0.047  0.044  0.042  0.038  0.037  0.036
GPS      0.166  0.116  0.083  0.064  0.055  0.051  0.047  0.041  0.039  0.036

High demand
NOR      0.049  0.036  0.026  0.023  0.020  0.019  0.018  0.018  0.017  0.016
SIM      0.048  0.037  0.028  0.024  0.021  0.020  0.019  0.018  0.017  0.016
GPS      0.069  0.051  0.036  0.029  0.025  0.023  0.021  0.019  0.018  0.016

Table 4.1: Scaled root mean squared errors of maximum likelihood estimators using NOR (normal model), SIM (simplified model), and GPS (tracked vehicle data only). The results are computed for varying levels of demand and global tracking probability.

One striking result in this table is that the simplified model returns a lower error than the normal model when the demand is low. The reason for this phenomenon lies in the failure of the normal approximation to the Poisson distribution for very low mean flows.

For small tracking probabilities (p0 ≤ 0.3) and low demand (λ_r = 3 for some r), there are some routes where the mean number of tracked vehicles (λ_gps,r = p0 λ_r) is less than one. In such cases, the probability mass functions whose variances depend on the mean rates become unbounded as λ_gps,r → 0 (as we are essentially trying to divide by 0). The mass function of the simplified model, on the other hand, remains finite because its variance is a fixed value which is not influenced by the values of the true flow rates.
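To make the scale of the problem concrete, the following sketch lists, for each small p0, the low-demand routes whose mean tracked count p0 λ_r falls below one, together with the probability of observing no tracked vehicles at all on a route with λ_r = 3. The zero-count probability exp(-p0 λ_r) assumes Poisson tracked counts, and the helper names are hypothetical.

```python
import math

# Low-demand mean route flows quoted in the text (routes numbered from 1).
lam_low = [21, 6, 3, 18, 12, 6, 4, 34, 6, 24, 3, 3]

def sparse_routes(p0, lam=lam_low):
    """Routes whose mean tracked count p0 * lambda_r is below one."""
    return [r + 1 for r, lam_r in enumerate(lam) if p0 * lam_r < 1]

def prob_zero(p0, lam_r):
    """P(y_gps,r = 0), assuming y_gps,r ~ Poisson(p0 * lambda_r)."""
    return math.exp(-p0 * lam_r)

for p0 in (0.05, 0.1, 0.2, 0.3, 0.4):
    routes = sparse_routes(p0)
    print(f"p0={p0}: {len(routes)} sparse routes {routes}, "
          f"P(zero count | lambda_r=3) = {prob_zero(p0, 3):.3f}")
```

At p0 = 0.3 only the three routes with λ_r = 3 are affected, and at p0 = 0.4 none are, which matches the threshold quoted above. At p0 = 0.05 a route with λ_r = 3 yields a zero tracked count in roughly 86% of datasets under this assumption.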

A closer analysis of the results confirms that, for data sets where the normal model returns interior (i.e. non-boundary) estimates, it enjoys slightly smaller errors than the simplified model.

It is only in those cases in which the normal method produces boundary estimates (coupled with infinite log-likelihood values) that the results are particularly imprecise. When small p0 is combined with low demand, we often generate zero counts y_gps,r, which leads both the normal and the simplified model to produce estimates that lie on the boundary of the parameter space. The normal model, however, will never produce a zero estimate of an element of λ if the corresponding route flow is zero, as we saw in our discussion of the theoretical properties of the normal likelihood. The simplified model, in contrast, continues to produce fairly reliable estimates due to the relative robustness of GLS estimation, but can sometimes return negative estimates of average route flows, which are nonsensical.

For moderate demand the differences between the normal and simplified model estimators are minuscule. There is a suggestion that the latter is preferable for the smallest tracking probabilities. This is again explicable in terms of problems with the normal approximation to the Poisson at low demands, and the consequent likelihood of boundary estimates, an effect we still occasionally see, even with moderate demand, if the tracked vehicle flow rates λ_gps,r are sufficiently small. There is still a slight hint of the expected theoretical (asymptotic) advantage of the normal method over the simplified one when the demand is high, but the differences in the results are very small.

We note that in all cases the benchmark results based on the routing data alone are markedly less accurate than those of the models which incorporate the link counts as well. Especially for lower penetration rates, p0 ≤ 0.2, which are more likely to be encountered in practice in the near future, we often see a 50% loss in estimation efficiency when the link data are ignored.
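The size of this loss can be read off Table 4.1 with a short calculation. The sketch below assumes that relative efficiency is measured as a ratio of mean squared errors, that is, the squared ratio of the tabulated SRMSE values; this definition is an assumption made here for illustration and is not stated in the text. Only the NOR and GPS rows for p0 ≤ 0.2 are used.

```python
# SRMSE values copied from Table 4.1 for p0 = 0.05, 0.1, 0.2.
p0_values = [0.05, 0.1, 0.2]
table = {
    "low":    {"NOR": [0.253, 0.198, 0.152], "GPS": [0.346, 0.246, 0.171]},
    "medium": {"NOR": [0.127, 0.085, 0.061], "GPS": [0.166, 0.116, 0.083]},
    "high":   {"NOR": [0.049, 0.036, 0.026], "GPS": [0.069, 0.051, 0.036]},
}

def relative_efficiency(err_gps, err_ref):
    """Efficiency of the GPS-only estimator relative to a reference one.

    Assumed definition: ratio of mean squared errors, i.e. the squared
    ratio of the (equally scaled) root mean squared errors.
    """
    return (err_ref / err_gps) ** 2

efficiency = {
    demand: [relative_efficiency(g, n) for g, n in zip(rows["GPS"], rows["NOR"])]
    for demand, rows in table.items()
}

for demand, values in efficiency.items():
    print(demand, [round(v, 2) for v in values])
```

Under this definition the GPS-only estimator has a relative efficiency of about 0.5 to 0.6 in the medium and high demand scenarios, and between roughly 0.5 and 0.8 under low demand.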