CHAPTER 2 LITERATURE REVIEW
2.3 Statistical Methods for Traffic Reliability Analysis
2.3.1 Non-Parametric Method for Confidence Interval Estimation
Due to data availability, traffic parameter statistics (e.g., mean and median) used for traffic system evaluation are generally estimated from samples rather than the whole population. The sample-based estimates, however, might not be exactly equal to the true population parameters, resulting in uncertainties for performance evaluation. Standard error is one indicator of such uncertainty. Naik (2010) applied ordinary bootstrap, block bootstrap, and gap bootstrap to estimate the uncertainty of the travel time prediction model. In this dissertation, the confidence interval of traffic parameter estimates will be used to evaluate the uncertainty of traffic system performance. In this section, various bootstrap methods for interval estimation are reviewed.
2.3.1.1 Standard Error Based Confidence Interval
Assuming that the estimator ($\hat{\theta}$) of the true parameter ($\theta$) follows a normal distribution, the $(1 - 2\alpha)$ confidence interval can be approximated as $\hat{\theta} \pm z^{(1-\alpha)} \cdot \widehat{se}$, where $\hat{\theta}$ is the point estimate of $\theta$ and $\widehat{se}$ is the estimated standard error. When the sample size ($n$) is not large enough for the normality assumption to hold, $\hat{\theta} \pm t_{n-1}^{(1-\alpha)} \cdot \widehat{se}$ gives a better approximation. These two methods are known as the standard confidence interval and the Student's t interval. Both yield equal-tail intervals that are symmetric about $\hat{\theta}$, so they cannot represent skewness in the sampling distribution or other errors that arise when $\hat{\theta}$ represents a statistic other than the mean (e.g., the median).
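As a minimal sketch of the standard confidence interval for a sample mean, the snippet below uses only the Python standard library. The function name `standard_interval` and the travel-time values are hypothetical and chosen purely for illustration. The Student's t variant is not shown because the standard library has no t quantile function; it would differ only in replacing the normal percentile with $t_{n-1}^{(1-\alpha)}$.

```python
import math
import statistics
from statistics import NormalDist


def standard_interval(sample, alpha=0.05):
    """Return the (1 - 2*alpha) standard confidence interval for the mean."""
    n = len(sample)
    theta_hat = statistics.fmean(sample)              # point estimate
    se_hat = statistics.stdev(sample) / math.sqrt(n)  # sqrt(s^2 / n)
    z = NormalDist().inv_cdf(1 - alpha)               # z^(1 - alpha)
    return theta_hat - z * se_hat, theta_hat + z * se_hat


# Hypothetical travel times in minutes (made-up values, not from the source).
travel_times = [12.1, 13.4, 11.8, 15.2, 12.9, 14.1, 13.0, 19.5, 12.4, 13.7]
low, high = standard_interval(travel_times, alpha=0.05)
print(f"90% standard interval for the mean: [{low:.2f}, {high:.2f}]")
```

The interval is symmetric about the sample mean by construction, which is the limitation discussed above.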
To relax the normal-theory assumption and allow for unequal tails, the bootstrap-t interval estimates the distribution of the studentized statistic, $\hat{t}$, directly from the data instead of assuming a normal or t distribution. The resulting interval has the form $[\hat{\theta} - \hat{t}^{(1-\alpha)} \cdot \widehat{se},\ \hat{\theta} - \hat{t}^{(\alpha)} \cdot \widehat{se}]$. It is important to note that, unlike the normal and t percentiles, $\hat{t}^{(\alpha)}$ is generally not equal to $-\hat{t}^{(1-\alpha)}$, which allows the interval to reflect skewness.
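The bootstrap-t construction can be sketched as follows for the sample mean, where the standard error of each resample has a closed form. This is an illustrative sketch under stated assumptions, not the procedure used in the dissertation: the names `bootstrap_t_interval` and `ordered_value`, the default `B=1000`, and the travel-time data are all hypothetical.

```python
import math
import random
import statistics


def ordered_value(sorted_values, q):
    """Return the ceil(q * m)-th value of an ascending list of m values."""
    m = len(sorted_values)
    idx = min(m - 1, max(0, math.ceil(q * m) - 1))
    return sorted_values[idx]


def bootstrap_t_interval(sample, alpha=0.05, B=1000, seed=0):
    """Return the (1 - 2*alpha) bootstrap-t interval for the mean."""
    rng = random.Random(seed)
    n = len(sample)
    theta_hat = statistics.fmean(sample)
    se_hat = statistics.stdev(sample) / math.sqrt(n)
    t_stars = []
    for _ in range(B):
        resample = rng.choices(sample, k=n)  # draw n values with replacement
        se_b = statistics.stdev(resample) / math.sqrt(n)
        if se_b == 0:                        # skip degenerate resamples
            continue
        t_stars.append((statistics.fmean(resample) - theta_hat) / se_b)
    t_stars.sort()
    t_low = ordered_value(t_stars, alpha)        # estimate of t^(alpha)
    t_high = ordered_value(t_stars, 1 - alpha)   # estimate of t^(1 - alpha)
    return theta_hat - t_high * se_hat, theta_hat - t_low * se_hat


# Hypothetical travel times in minutes (made-up values, not from the source).
travel_times = [12.1, 13.4, 11.8, 15.2, 12.9, 14.1, 13.0, 19.5, 12.4, 13.7]
low, high = bootstrap_t_interval(travel_times)
print(f"90% bootstrap-t interval for the mean: [{low:.2f}, {high:.2f}]")
```

Because the two percentiles of the studentized replicates are estimated separately, the interval need not be symmetric about the sample mean.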
To apply this method, an efficient standard error estimator is necessary for datasets with a dependent structure. It is well established that the standard error of the sample mean can be estimated using $\sqrt{s^2/n}$, where $s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)$. However, there is no such equation for most statistical estimators (e.g., the median). In these instances, the bootstrap estimate of standard error, first proposed by Efron in 1979, can be used. It is illustrated here using the median as an example. The basic bootstrap algorithm starts by generating a large number of independent bootstrap samples $x^{*1}, x^{*2}, \ldots, x^{*B}$, each of size $n$. The number of samples ($B$) generally ranges from 50 to 200 for standard error estimation. A bootstrap median replicate, $s(x^{*1}), s(x^{*2}), \ldots, s(x^{*B})$, is calculated for each sample. The standard deviation of these replicates is the standard error estimator of the median $s(x)$, as shown in equation 2.8.

$$\widehat{se}_{boot} = \left\{ \sum_{b=1}^{B} \left[ s(x^{*b}) - s(\cdot) \right]^2 / (B - 1) \right\}^{1/2} \qquad (2.8a)$$

$$s(\cdot) = \sum_{b=1}^{B} s(x^{*b}) / B \qquad (2.8b)$$

where:
$\widehat{se}_{boot}$ = the bootstrap estimate of the standard error of the median,
$B$ = the number of bootstrap samples, and
$s(x^{*b})$ = the $b$th bootstrap median replicate.
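The bootstrap standard error in equation 2.8 can be sketched in a few lines of standard-library Python. The function name `bootstrap_se`, the fixed seed, and the travel-time values are assumptions made for illustration only.

```python
import random
import statistics


def bootstrap_se(sample, statistic=statistics.median, B=200, seed=0):
    """Bootstrap estimate of the standard error of `statistic` (eq. 2.8)."""
    rng = random.Random(seed)
    n = len(sample)
    # One replicate s(x*b) per bootstrap sample of size n drawn with replacement.
    replicates = [statistic(rng.choices(sample, k=n)) for _ in range(B)]
    # statistics.stdev centers on the replicate mean s(.) and divides by B - 1.
    return statistics.stdev(replicates)


# Hypothetical travel times in minutes (made-up values, not from the source).
travel_times = [12.1, 13.4, 11.8, 15.2, 12.9, 14.1, 13.0, 19.5, 12.4, 13.7]
se_median = bootstrap_se(travel_times, statistics.median, B=200)
print(f"bootstrap standard error of the median: {se_median:.3f}")
```

Passing a different function as `statistic` (for example `statistics.fmean`) gives the corresponding bootstrap standard error without any change to the algorithm.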
Unlike the standard and Student's t intervals, which are symmetric about the point estimate, the bootstrap-t interval can be asymmetric, and this asymmetry represents an improvement in coverage. The bootstrap-t method is particularly applicable to location statistics like the sample mean, median, and other percentiles, but is not trustworthy for more general problems such as setting a confidence interval for a correlation coefficient. An overall assessment of the three standard-error based confidence intervals is quoted from Efron and Tibshirani (1993):
"The increase in accuracy of estimation for Bootstrap-t approximation is at the price of generality. The standard confidence interval applies to all samples, and all sample sizes; the student-t table applies to all samples of a fixed size n; the bootstrap-t table applies only to the given sample."
2.3.1.2 Percentile Based Confidence Interval
Although the bootstrap-t method can theoretically account for skewness and yield good theoretical coverage probabilities, it can yield somewhat erratic results in practice. Improved methods use percentiles instead of the standard error of bootstrapped estimates to identify the confidence limits.
If the bootstrap distribution of $\hat{\theta}^* = s(x^*)$ is roughly normal, then the standard normal and percentile intervals will nearly agree. According to the central limit theorem, the bootstrap distribution approaches a normal distribution as the sample size $n$ approaches infinity. However, this might not hold for small samples, in which case the percentile interval is superior to the standard normal interval. A percentile interval is also transformation-respecting and range-preserving. Being range-preserving means that a percentile interval always falls within the allowable range of its estimator. Although percentile intervals are less erratic in practice than bootstrap-t intervals, they have less satisfactory coverage properties.
Given independent bootstrap samples $x^{*1}, x^{*2}, \ldots, x^{*B}$, each of size $n$, compute the bootstrap replicates $\hat{\theta}^*(b) = s(x^{*b})$, $b = 1, 2, \ldots, B$. Denote $\hat{\theta}_B^{*(\alpha)}$ as the $100\alpha$th empirical percentile of the replicates (i.e., the $B\alpha$th value in the ordered list of the $B$ replications of $\hat{\theta}^*$). The $(1 - 2\alpha)$ percentile interval is then $[\hat{\theta}_B^{*(\alpha)},\ \hat{\theta}_B^{*(1-\alpha)}]$. Percentile estimation needs more bootstrap samples than standard error estimation: $B$ should be greater than 500 or 1000 to make the variability of the percentile estimators acceptably low.
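The percentile interval described above can be sketched as follows. The function name `percentile_interval`, the default `B=2000`, and the data are hypothetical; the index rule (taking the $\lceil B\alpha \rceil$th ordered value) is one common convention and is an assumption here.

```python
import math
import random
import statistics


def percentile_interval(sample, statistic=statistics.median,
                        alpha=0.05, B=2000, seed=0):
    """Return the (1 - 2*alpha) bootstrap percentile interval."""
    rng = random.Random(seed)
    n = len(sample)
    # Ordered list of the B bootstrap replicates of the statistic.
    reps = sorted(statistic(rng.choices(sample, k=n)) for _ in range(B))
    lo_idx = min(B - 1, max(0, math.ceil(alpha * B) - 1))
    hi_idx = min(B - 1, max(0, math.ceil((1 - alpha) * B) - 1))
    return reps[lo_idx], reps[hi_idx]


# Hypothetical travel times in minutes (made-up values, not from the source).
travel_times = [12.1, 13.4, 11.8, 15.2, 12.9, 14.1, 13.0, 19.5, 12.4, 13.7]
low, high = percentile_interval(travel_times)
print(f"90% percentile interval for the median: [{low:.2f}, {high:.2f}]")
```

Since both end points are themselves replicates of the statistic, the interval cannot leave the range the statistic can take, which illustrates the range-preserving property.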
2.3.1.3 Bias-Corrected and Accelerated (BCa) Interval
The BCa interval is an improved version of the percentile method in both theory and practice. Given a sufficiently large sample, the resulting interval closely matches the exact confidence interval in the special situations where an exact interval is available from statistical theory, and it gives dependably accurate coverage probabilities in all situations. The BCa method is also transformation-respecting.
Considering both accuracy and flexibility, Efron and Tibshirani (1993) recommend the BCa method for general use.
The end points of the BCa interval are modified by the acceleration ($\hat{a}$) and the bias-correction ($\hat{z}_0$), as shown in equation 2.9.

$$\text{BCa}: \left( \hat{\theta}_B^{*(\alpha_1)},\ \hat{\theta}_B^{*(\alpha_2)} \right) \qquad (2.9a)$$

$$\alpha_1 = \Phi\left( \hat{z}_0 + \frac{\hat{z}_0 + z^{(\alpha)}}{1 - \hat{a}(\hat{z}_0 + z^{(\alpha)})} \right) \qquad (2.9b)$$

$$\alpha_2 = \Phi\left( \hat{z}_0 + \frac{\hat{z}_0 + z^{(1-\alpha)}}{1 - \hat{a}(\hat{z}_0 + z^{(1-\alpha)})} \right) \qquad (2.9c)$$

where:
$\hat{a}$ = acceleration,
$\hat{z}_0$ = bias-correction,
$\Phi(\cdot)$ = the standard normal cumulative distribution function, and
$z^{(\alpha)}$ = the $100\alpha$th percentile point of a standard normal distribution.
For example, $\Phi(1.645) = 0.95$ and $z^{(0.95)} = 1.645$. It can be seen from equation 2.9 that if $\hat{a}$ and $\hat{z}_0$ are zero, then $\alpha_1 = \alpha$ and $\alpha_2 = 1 - \alpha$, so the interval is equal to the percentile interval. Non-zero $\hat{a}$ and $\hat{z}_0$ correct deficiencies of the standard and percentile methods.
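A hedged sketch of the BCa calculation is given below. The text above does not state how $\hat{a}$ and $\hat{z}_0$ are estimated, so the sketch assumes the usual estimators from Efron and Tibshirani (1993): $\hat{z}_0$ from the proportion of bootstrap replicates below $\hat{\theta}$, and $\hat{a}$ from jackknife values of the statistic. The function names `bca_levels` and `bca_interval`, the default statistic (the mean), and the data are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist

NORM = NormalDist()


def bca_levels(alpha, z0, a):
    """Adjusted percentile levels (alpha_1, alpha_2) of eq. 2.9b and 2.9c."""
    def adjust(q):
        z = NORM.inv_cdf(q)
        return NORM.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))
    return adjust(alpha), adjust(1 - alpha)


def bca_interval(sample, statistic=statistics.fmean,
                 alpha=0.05, B=2000, seed=0):
    """Return the (1 - 2*alpha) BCa interval (assumed estimators, see lead-in)."""
    sample = list(sample)
    rng = random.Random(seed)
    n = len(sample)
    theta_hat = statistic(sample)
    reps = sorted(statistic(rng.choices(sample, k=n)) for _ in range(B))

    # Bias-correction: normal quantile of the share of replicates below theta_hat.
    share = sum(r < theta_hat for r in reps) / B
    share = min(max(share, 1 / (B + 1)), B / (B + 1))  # keep inv_cdf finite
    z0 = NORM.inv_cdf(share)

    # Acceleration: skewness-type ratio of the leave-one-out (jackknife) values.
    jack = [statistic(sample[:i] + sample[i + 1:]) for i in range(n)]
    jack_mean = statistics.fmean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6.0 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    a1, a2 = bca_levels(alpha, z0, a)
    lo_idx = min(B - 1, max(0, math.ceil(a1 * B) - 1))
    hi_idx = min(B - 1, max(0, math.ceil(a2 * B) - 1))
    return reps[lo_idx], reps[hi_idx]


# Hypothetical travel times in minutes (made-up values, not from the source).
travel_times = [12.1, 13.4, 11.8, 15.2, 12.9, 14.1, 13.0, 19.5, 12.4, 13.7]
low, high = bca_interval(travel_times)
print(f"90% BCa interval for the mean: [{low:.2f}, {high:.2f}]")
```

Calling `bca_levels(alpha, 0.0, 0.0)` returns levels equal to `alpha` and `1 - alpha` up to floating-point error, which is the reduction to the percentile interval noted above.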
2.3.1.4 Modified Bootstrap
When the dataset is not composed of independent observations, the standard bootstrap, which resamples observations as if they were independent, is not adequate, and a modified bootstrap (e.g., block bootstrap) is needed. For traffic datasets with dependent observations within one day, Lahiri et al. (2012) applied the gap bootstrap to generate consistent and asymptotically unbiased estimates of standard error for a massive dataset with a certain dependence structure.
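To illustrate the idea of resampling blocks rather than individual observations, the sketch below implements a simple moving block bootstrap for the standard error of a statistic. It is not the gap bootstrap of Lahiri et al. (2012); the function name `moving_block_bootstrap_se`, the block length, and the series values are hypothetical choices made for illustration.

```python
import random
import statistics


def moving_block_bootstrap_se(series, block_len, statistic=statistics.fmean,
                              B=200, seed=0):
    """Moving block bootstrap estimate of the standard error of `statistic`."""
    series = list(series)
    n = len(series)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(series)")
    rng = random.Random(seed)
    # All overlapping blocks of consecutive observations.
    blocks = [series[i:i + block_len] for i in range(n - block_len + 1)]
    n_blocks = -(-n // block_len)  # ceil(n / block_len)
    replicates = []
    for _ in range(B):
        resample = []
        for _ in range(n_blocks):
            resample.extend(rng.choice(blocks))  # keep within-block order
        replicates.append(statistic(resample[:n]))  # trim to original length
    return statistics.stdev(replicates)


# Hypothetical consecutive 5-minute travel times (made-up values).
series = [12.1, 12.4, 12.9, 13.6, 14.8, 15.9, 16.3, 15.7, 14.2, 13.5, 12.8, 12.3]
se_block = moving_block_bootstrap_se(series, block_len=3)
print(f"moving block bootstrap SE of the mean: {se_block:.3f}")
```

With `block_len=1` the procedure reduces to the ordinary bootstrap of independent observations; longer blocks preserve more of the short-range dependence within each resample.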
2.3.1.5 Summary
This dissertation compares the coverage of the standard error based confidence interval and the BCa confidence interval. The BCa method is selected to calculate the confidence interval of individual traffic parameters.