1.3 Statistical methods of OD matrix estimation

1.3.3 Maximum likelihood estimation

The general likelihood function for a random sample of counts x is L(λ) = f(x|λ). Maximum likelihood (ML) estimation involves finding the model parameters under which the observed data are most probable, that is, the parameters that maximise the likelihood function.

The expression for the likelihood function, as well as its calculation, is simple if the level of demand in the network is sufficiently large, that is, all mean route flows are high (Cao et al., 2000; Vardi, 1996). We can then use a normal approximation in which a random sample of counts x collected on M links follows a multivariate normal distribution with mean Aλ and variance-covariance matrix AΛAᵀ, so that the probability density function f(x|λ) is as seen in Equation 1.3:

$$
f(x \mid \lambda) = (2\pi)^{-M/2} \, \lvert A \Lambda A^{T} \rvert^{-1/2} \exp\left\{ -\tfrac{1}{2} (x - A\lambda)^{T} (A \Lambda A^{T})^{-1} (x - A\lambda) \right\}. \tag{1.3}
$$
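As a minimal sketch of how Equation 1.3 can be evaluated, the following plain-Python function computes its logarithm for a single vector of link counts, taking Λ to be the diagonal matrix of mean route flows. The function names, the toy routing matrix and the numbers are hypothetical and chosen only for illustration; a Cholesky factorisation is used here purely as one convenient way to obtain the determinant and the quadratic form, and a linear algebra library would normally replace the hand-written loops.

```python
import math

def cholesky(S):
    """Lower-triangular L with L L^T = S, for a symmetric positive definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

def normal_loglik(x, A, lam):
    """Log of Equation 1.3 for a single vector x of M link counts."""
    M, R = len(A), len(lam)
    mean = [sum(A[i][r] * lam[r] for r in range(R)) for i in range(M)]
    # Covariance A Lambda A^T with Lambda = diag(lam).
    cov = [[sum(A[i][r] * lam[r] * A[j][r] for r in range(R)) for j in range(M)]
           for i in range(M)]
    L = cholesky(cov)
    # Forward substitution: solve L z = x - mean, so z^T z is the quadratic form.
    z = []
    for i in range(M):
        z.append((x[i] - mean[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(M))
    quad = sum(v * v for v in z)
    return -0.5 * (M * math.log(2.0 * math.pi) + logdet + quad)

# Hypothetical toy network: 2 monitored links, 3 routes (route 2 uses both links).
A = [[1, 1, 0],
     [0, 1, 1]]
lam = [100.0, 50.0, 80.0]
x = [155.0, 128.0]
print(normal_loglik(x, A, lam))
```

For a sample of several independent count vectors, the log-likelihood would be the sum of this quantity over the vectors. Because both the log-determinant and the quadratic form depend on λ, maximising it generally requires a numerical optimiser.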

Note that the dispersion matrix Λ for the route flows is a diagonal matrix with the elements of the mean vector λ on its diagonal. The functional relationship between the mean and the variance enables second-order properties of the link counts, such as the covariances between link counts, to be incorporated into the mean route flow estimation. A covariance arises because links that consistently carry very similar numbers of vehicles are likely to have the same vehicles on them (Hazelton, 2003).
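A small worked example may make the source of the covariance concrete. Assume, purely for illustration, a network with two monitored links and three routes, where route 1 uses link 1 only, route 2 uses both links, and route 3 uses link 2 only. Then

$$
A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}, \qquad
\Lambda = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix}, \qquad
A \Lambda A^{T} = \begin{pmatrix} \lambda_1 + \lambda_2 & \lambda_2 \\ \lambda_2 & \lambda_2 + \lambda_3 \end{pmatrix}.
$$

The covariance between the two link counts equals λ₂, the mean flow on the route the links share, while the mean vector Aλ only reveals the sums λ₁ + λ₂ and λ₂ + λ₃. In this hypothetical network the off-diagonal term therefore carries information about λ₂ that the means alone do not.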

This connection is beneficial in countering the identifiability problem when determining the underlying mean route flows. It provides additional information that can be used to identify the realised route flows in situations where the actual route flows alone would otherwise be ambiguous (Cao et al., 2000; Hazelton, 2003).

The likelihood takes on a significantly simpler form when the link between the mean and the variance is severed, which can be achieved in two ways.

The first is to replace the dispersion matrix of the route flows Λ with dispersion matrix Σ whose diagonal elements are fixed values instead of elements of the mean route flow vector λ.

The second approach is to substitute the dispersion matrix of the link counts AΛAᵀ in Equation 1.3 by the sample variance-covariance matrix of the link counts S so that the dispersions do not depend on the parameter vector λ. The main advantage of simplifying the likelihood is the improvement in the computational efficiency in the estimation process.

The GLS estimation method is strongly associated with the ML estimation procedure (Nakayama et al., 2009), provided the likelihood is in quadratic form. While the likelihood in Equation 1.3 is not in a quadratic form because the variance-covariance matrix is a function of λ, both the simplified likelihoods are. In particular, Hazelton (2000) shows that ML estimation on the log-likelihood version of Equation 1.3 with S instead of AΛAᵀ is a special case of GLS estimation where the target matrix is considered completely redundant (η = 0).
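The quadratic form can be made explicit with a short sketch, assuming a single vector of link counts x and a fixed, invertible matrix S in place of the λ-dependent dispersion matrix in Equation 1.3. Taking logarithms gives

$$
\ell(\lambda) = -\tfrac{M}{2} \log(2\pi) - \tfrac{1}{2} \log \lvert S \rvert - \tfrac{1}{2} (x - A\lambda)^{T} S^{-1} (x - A\lambda).
$$

The first two terms do not involve λ, so maximising ℓ(λ) is the same as minimising

$$
(x - A\lambda)^{T} S^{-1} (x - A\lambda),
$$

which is a generalised least squares criterion on the link counts with weight matrix S⁻¹. In the unsimplified likelihood, by contrast, both the log-determinant term and the weight matrix depend on λ, which is why that likelihood is not quadratic in λ.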

This second approach to simplifying the likelihood is problematic if the link counts are not reliable, for example because few counts are available or the counts contain substantial measurement error. As the dispersion matrix is a function of the link counts only, inaccuracies in the counts will lead to an erroneous matrix S, which in turn can produce misleading ML estimates. The ML estimator does not suffer from this issue if the variance-covariance matrix of the route flows depends on the parameter λ.

Another major drawback of a fixed dispersion matrix for the route flows, as illustrated in Hazelton (2000), is that only first order properties are considered. That is, GLS estimation (or ML estimation on the simplified likelihoods) focuses exclusively on reproducing the mean traffic flows. ML estimation methods on the other hand are capable of considering second order properties as well through the functional relationship of the variance and the mean in the matrix Λ.

The popularity of the ML estimator lies in its sound statistical properties, such as asymptotic unbiasedness, normality and efficiency. However, one must bear in mind that these are asymptotic properties, which means they are only valid for networks with a high traffic demand. Furthermore, efficiency is only given if the model assumptions are valid, which implies careful validation of any beliefs held (Watling, 2002a).

The form of the likelihood and the ML estimation procedure are considerably less straightforward when dealing with networks that do not have high levels of demand. In these cases we are dealing with discrete distributions, typically Poisson.

In the case of discrete route flow models, the evaluation of the likelihood L(λ) involves summing over all terms in the set of feasible route flows, defined as

$$
Y(x) = \{\, y : x = Ay,\; y \ge 0 \,\}, \tag{1.4}
$$

that is, all route flows consistent with the observed link counts according to the system of linear equations 1.1. The set Y(x) is typically of substantial size which makes summation over all the elements problematic. This can become a computational black hole even for moderately sized networks. Another point at issue is that even when calculating a numerical approximation the need to enumerate possible elements in the set of feasible route flows typically presents itself as a challenging computational task for larger networks (Hazelton, 2000; Watling, 2002a,b).
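The summation over Y(x) can be illustrated on a tiny network. The sketch below assumes that route flows are independent Poisson variables with means λ, that flows are non-negative integers, and that the routing matrix A has non-negative integer entries; the function names and the toy network are hypothetical. It enumerates Y(x) by brute force, which is exactly the step that becomes infeasible as the network and the counts grow.

```python
import math
from itertools import product

def feasible_route_flows(x, A):
    """Enumerate Y(x) = {y : x = A y, y >= 0, y integer} by brute force."""
    M, R = len(A), len(A[0])
    upper = []
    for r in range(R):
        # A route cannot carry more vehicles than any monitored link it uses.
        bounds = [x[i] // A[i][r] for i in range(M) if A[i][r] > 0]
        if not bounds:
            raise ValueError("route %d uses no monitored link; Y(x) is unbounded" % r)
        upper.append(min(bounds))
    for y in product(*(range(u + 1) for u in upper)):
        if all(sum(A[i][r] * y[r] for r in range(R)) == x[i] for i in range(M)):
            yield y

def poisson_likelihood(x, A, lam):
    """L(lambda) = sum over y in Y(x) of the product of Poisson pmfs (lam > 0)."""
    total = 0.0
    for y in feasible_route_flows(x, A):
        # Log of the joint Poisson pmf of the route flows, assumed independent.
        logp = sum(-l + k * math.log(l) - math.lgamma(k + 1) for k, l in zip(y, lam))
        total += math.exp(logp)
    return total

# Hypothetical toy network: 2 monitored links, 3 routes (route 2 uses both links).
A = [[1, 1, 0],
     [0, 1, 1]]
x = [3, 2]
print(sorted(feasible_route_flows(x, A)))
print(poisson_likelihood(x, A, [1.0, 1.0, 1.0]))
```

In this example Y(x) has only three elements, one for each possible flow on the shared route. The number of candidate vectors that the brute-force loop must check is the product of the per-route upper bounds plus one, so it grows multiplicatively with the number of routes and the size of the counts.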

In document Statistical modelling and inference for traffic networks : a thesis submitted for the degree of Doctor of Philosophy (Page 32-34)