
2   Chapter 2: Copula–based joint discrete–continuous model of road vehicle type

2.3   Methods

2.3.1   Model formulation

The joint decision of vehicle type and shipment size is a multidimensional problem for which a copula–based approach offers several advantages over the currently used methods.

Firstly, copulas determine the dependency by joining marginal distributions to form a new joint distribution, without the need for using any specific distribution family or transforming the marginal distributions when they are not normal. Copulas have proved useful for discrete or joint discrete–continuous models, where non–normality and non–linearity frequently arise (Trivedi and Zimmer, 2007). Secondly, multivariate correlation measures (e.g., Pearson, Kendall’s tau, Spearman’s rho) capture the central dependence and fail to properly estimate dependence near the boundaries. Copulas allow estimating the tail dependence in both symmetric and asymmetric forms (Frey et al., 2001). Thirdly, copulas can handle complex joint distributions with univariate marginal distributions of any form, particularly as the number of dimensions (i.e., the number of jointly modelled variables) increases.

In this study, we express the choice of vehicle type via a multinomial logit (MNL) model, where U_in represents the random utility of vehicle type i for shipper (carrier) n, V_in is the deterministic part of that utility, X_in is a K–dimensional vector of attributes x_kin of vehicle type i for shipper (carrier) n, and ε_in is an error term that is assumed to be independently and identically Gumbel distributed:
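A minimal reconstruction of this utility specification in standard MNL notation, assuming a linear-in-parameters form; the coefficient vector β (with elements β_k) is notation assumed here and does not appear in the text above:

```latex
U_{in} = V_{in} + \varepsilon_{in},
\qquad
V_{in} = \beta' X_{in} = \sum_{k=1}^{K} \beta_k \, x_{kin}
```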

The maximum utility U_jn of the unchosen alternatives j for individual n is decomposed into a known part V_jn and an unknown part ε_jn, which according to Ben-Akiva and Lerman (1985) is Gumbel distributed with parameters (μ, (1/μ) ln ∑_{j≠i} exp(μ V_jn)), so we can write:


Since the difference of two random terms with the same mean μ has itself a mean of zero, we can write:

The error terms resulting from discrete choice models follow the generalized extreme value type I distribution; thus the difference of two Gumbel–distributed random variables, ε* = (ε_jn − ε_in), has a logistic distribution with the following cumulative distribution function (Pourabdollahi et al., 2013):
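In standard notation, the logistic CDF with scale μ evaluated at a threshold t is 1 / (1 + exp(−μ t)). The sketch below checks numerically that evaluating it at t = V_in − (1/μ) ln ∑_{j≠i} exp(μ V_jn) reproduces the familiar MNL choice probability. The function names and the example utilities are hypothetical and serve only as an illustration.

```python
import math


def mnl_probability(v, i, mu=1.0):
    """Standard MNL probability of alternative i given systematic utilities v."""
    exp_v = [math.exp(mu * vj) for vj in v]
    return exp_v[i] / sum(exp_v)


def logistic_difference_probability(v, i, mu=1.0):
    """Same probability via the logistic CDF of the error difference.

    The unchosen alternatives are collapsed into their maximum utility,
    whose systematic part is the log-sum (1/mu) * ln(sum_{j != i} exp(mu * V_j)).
    """
    logsum = math.log(
        sum(math.exp(mu * vj) for j, vj in enumerate(v) if j != i)
    ) / mu
    threshold = v[i] - logsum
    return 1.0 / (1.0 + math.exp(-mu * threshold))


# Hypothetical systematic utilities for three vehicle types.
utilities = [0.4, -0.2, 1.1]
p_softmax = mnl_probability(utilities, i=0, mu=1.5)
p_logistic = logistic_difference_probability(utilities, i=0, mu=1.5)
```

Both routes give the same value up to floating-point error, which is why the vehicle type margin can be treated as a single logistic CDF when it is passed to a copula.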

   

Although the initial assumption is that all variables are utility terms, in this study we test the possibility that some variables are regret terms. Namely, we test the possibility that a hybrid utility–regret formulation is more suitable to represent the choice behaviour of shippers and carriers. Generally, this hybrid specification of V_in is (Chorus et al., 2013):
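As a hedged illustration, the sketch below computes a hybrid systematic term in the form commonly attributed to Chorus et al. (2013): attributes treated as utility terms enter linearly, while attributes treated as regret terms enter as the negative sum over competing alternatives of ln(1 + exp(β (x_j − x_i))). The function name, attribute names, and coefficient values are hypothetical, and the exact specification used in this chapter may differ.

```python
import math


def hybrid_systematic_term(x, i, beta_utility, beta_regret):
    """Hybrid utility-regret systematic term of alternative i.

    x            : list of dicts, one per alternative, mapping attribute -> value
    beta_utility : dict attribute -> coefficient for linear utility terms
    beta_regret  : dict attribute -> coefficient for regret terms
    """
    # Linear-in-parameters utility part.
    utility = sum(b * x[i][k] for k, b in beta_utility.items())
    # Regret part: pairwise comparison with every competing alternative j != i.
    regret = sum(
        math.log(1.0 + math.exp(b * (x[j][k] - x[i][k])))
        for k, b in beta_regret.items()
        for j in range(len(x))
        if j != i
    )
    return utility - regret


# Two hypothetical vehicle types described by cost and time.
alternatives = [
    {"cost": 1.0, "time": 2.0},
    {"cost": 2.0, "time": 1.0},
]
v0 = hybrid_systematic_term(
    alternatives, 0, beta_utility={"cost": -0.5}, beta_regret={"time": -1.0}
)
```

Passing an empty `beta_regret` recovers the purely linear utility, so the same function covers the all-utility baseline against which a hybrid specification can be compared.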

 

The shipment size model takes the form of a log–linear regression that guarantees non–negative shipment sizes, where y_n represents the logarithm of the shipment size chosen by shipper (carrier) n as a function of a vector Z_kn of shipment attributes and a vector of unobserved factors τ_n that are assumed to be normally distributed.


Let G(y) represent the probability that shipper (carrier) n chooses a shipment size smaller than y. The probability that the shipment size lies in a small neighbourhood of the observed shipment size y is calculated as G(y + δ) – G(y – δ), as follows:

where δ is a very small value. Accordingly, the joint probability that vehicle type i and shipment size y are chosen by shipper (carrier) n is expressed in (2.9), where τ* = τ′ – τ, and τ′ is the disturbance term of unchosen shipment sizes:

 

The use of the copula allows us to join the separate one–dimensional distribution functions to form a multivariate distribution. Copula–based models capture the dependency between the unobserved terms of the vehicle type model (ε*) and of the shipment size model. Based on Sklar’s theorem, there exists a unique copula that connects these two variables (Sklar, 1973). For a review of copula models, see Trivedi and Zimmer (2007).

In this study, we aimed for comprehensive copulas that are not restricted to specific multivariate distributions, allow for both positive and negative dependence, and define a symmetric dependence structure, since we assumed that the unobserved factors have the same effect on increasing as on decreasing the probability of choosing a certain vehicle type and shipment size. Accordingly, we excluded from consideration copulas that are constructed from multivariate normal distributions (e.g., Gaussian), copulas that cannot handle negative dependence (e.g., Joe, Clayton), and copulas that model strong correlation in either higher or lower values with one–tail dependence (e.g., Gumbel). We therefore investigated Archimedean copulas that satisfy our needs and are easier to derive (Trivedi and Zimmer, 2007). From these, we used the Frank copula (Frank, 1978; Charpentier et al., 2007), with the cumulative distribution function (CDF) in (2.10) and the probability density function (PDF) in (2.11).
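For reference, the sketch below implements the textbook Frank copula CDF and PDF for uniform margins u and v. The symbol theta for the dependence parameter and the function names are assumptions made for this illustration; the expressions follow the standard form of the Frank copula rather than a transcription of (2.10) and (2.11).

```python
import math


def frank_cdf(u, v, theta):
    """Frank copula CDF:
    C(u, v) = -(1/theta) * ln(1 + (e^{-theta u} - 1)(e^{-theta v} - 1) / (e^{-theta} - 1)).
    """
    if abs(theta) < 1e-10:
        return u * v  # theta -> 0 is the independence copula
    numerator = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(numerator / math.expm1(-theta)) / theta


def frank_pdf(u, v, theta):
    """Frank copula density:
    c(u, v) = theta (1 - e^{-theta}) e^{-theta (u + v)}
              / [(1 - e^{-theta}) - (1 - e^{-theta u})(1 - e^{-theta v})]^2.
    """
    if abs(theta) < 1e-10:
        return 1.0
    a = -math.expm1(-theta)        # 1 - e^{-theta}
    bu = -math.expm1(-theta * u)   # 1 - e^{-theta u}
    bv = -math.expm1(-theta * v)   # 1 - e^{-theta v}
    return theta * a * math.exp(-theta * (u + v)) / (a - bu * bv) ** 2


# Positive theta gives positive dependence, negative theta negative dependence.
density_positive = frank_pdf(0.3, 0.7, theta=4.0)
density_negative = frank_pdf(0.3, 0.7, theta=-4.0)
```

The density is unchanged when (u, v) is replaced by (1 − u, 1 − v), which reflects the symmetric dependence structure that motivates the choice of this copula.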


Assuming the independence of the joint choice observations over the decision makers (shippers/carriers), the log–likelihood function LL is expressed in (2.12). Alternatively, if we assume the independence of vehicle type and shipment size choices, the log–likelihood function follows (2.13).

The estimation of the joint copula–based discrete–continuous model involves estimating the marginal CDFs F(x)_n and G(y)_n and the joint CDF C(F(x)_n, G(y)_n).

Depending on the available information on the marginal distributions, the copula parameter is usually estimated in one of three ways: (i) a fully parametric maximum likelihood (ML) method, (ii) a stepwise parametric method, also known as inference functions for margins (IFM) (Joe and Xu, 1996), or (iii) a semiparametric pseudo–maximum likelihood approach. The first estimation method requires assumptions about the type of distribution for the copula parameter.

The second approach decomposes the problem into a multistep estimation procedure where, in the first step, the parameters of the margins are estimated under an independence assumption using individual likelihood functions. Then, the dependency parameter of the copula is estimated by maximizing the copula log–likelihood function with the marginals replaced by their estimated values. However, when empirical marginals are available, the third approach is preferred.
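The second step of the stepwise procedure can be sketched as follows, under simplifying assumptions: both margins are treated as continuous and already transformed to uniform values u and v, the data are synthetic, and the Frank copula is used with a dependence parameter named theta. This is a sketch of the general technique, not the estimation code used in this study; the function names, the bounds, and the starting value are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def frank_log_density(u, v, theta):
    """Log of the Frank copula density, evaluated elementwise."""
    if abs(theta) < 1e-8:
        return np.zeros_like(u)  # independence copula has density 1
    a = -np.expm1(-theta)  # 1 - e^{-theta}
    core = a - np.expm1(-theta * u) * np.expm1(-theta * v)
    return np.log(theta * a) - theta * (u + v) - 2.0 * np.log(np.abs(core))


def sample_frank(n, theta, rng):
    """Draw n pairs from a Frank copula by conditional inversion."""
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)
    a = np.exp(-theta * u)
    d = np.expm1(-theta)
    v = -np.log1p(w * d / (w + a * (1.0 - w))) / theta
    return u, v


def fit_theta(u, v, start=1.0, bounds=(-30.0, 30.0)):
    """Maximise the copula log-likelihood over theta with L-BFGS-B."""
    def negative_log_likelihood(params):
        return -np.sum(frank_log_density(u, v, float(params[0])))

    result = minimize(
        negative_log_likelihood, x0=[start], method="L-BFGS-B", bounds=[bounds]
    )
    return float(result.x[0])


# Synthetic check: simulate with a known theta and try to recover it.
rng = np.random.default_rng(0)
u_sim, v_sim = sample_frank(5000, theta=5.0, rng=rng)
theta_hat = fit_theta(u_sim, v_sim)
```

In a fully parametric ML setting, the same optimiser would instead search over the marginal coefficients and the dependence parameter jointly, with the marginal CDFs recomputed at every evaluation of the likelihood.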

Having an initial assumption about the type of marginal distributions, we used the fully parametric ML method, which enables us to estimate the dependency parameter and the coefficients of the two choices simultaneously. To solve the likelihood maximisation problem, we applied the L–BFGS–B (limited–memory Broyden–Fletcher–Goldfarb–Shanno with bound constraints) algorithm, as it is one of the commonly used algorithms for