Chapter 4 : Random utility models: regression vs forecasting

4.3 Experimental analysis

Experiment setting

As mentioned in the previous sections, and following a well-established and common approach in the literature, the different RUMs mentioned in the first section were estimated using synthetic datasets, allowing estimation results to be compared in a fair and controlled experimental setting.

These datasets were generated analogously to Papola (2016), as described below, in terms of observations, choice context and definition of:

- observable components of the utilities (systematic utilities);
- unobservable components of the utilities (random residuals);
- choice.

Observation and choice context. The synthetic datasets encompass a variable number of observations (200, 1,000, 5,000, 10,000, 100,000, 1,000,000) related to hypothetical three-alternative and four-alternative choice contexts; each observation is associated with the full set of alternatives.

Definition of the systematic utilities. The systematic utility of each alternative is given by a linear combination of two specific attributes. The parameters β of these linear combinations do not vary across observations, while the values of the attributes for each observation are generated through independent random draws from a Normal variable with mean and standard deviation defined as in Table 4.1, corresponding to a coefficient of variation equal to 0.1.
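The generation of attributes and systematic utilities described above can be sketched as follows. This is only a minimal sketch, not the code used for the experiments: the array names, the function name `draw_systematic_utilities` and the seed are assumptions introduced for illustration, while the means, standard deviations and parameters are those read from Table 4.1.

```python
import numpy as np

# Values read from Table 4.1 (rows: alternatives 1-4; columns: the two
# attributes of each alternative). Names are illustrative assumptions.
MEANS = np.array([[8.0, 2.0], [5.0, 2.0], [4.0, 2.0], [2.0, 5.0]])
STDS = np.array([[0.8, 0.2], [0.5, 0.2], [0.4, 0.2], [0.2, 0.5]])
BETAS = np.array([[1.0, 6.0], [2.0, 5.0], [3.0, 4.0], [5.0, 2.0]])


def draw_systematic_utilities(n_obs, n_alt=3, seed=0):
    """Draw attributes and compute systematic utilities for n_alt alternatives."""
    rng = np.random.default_rng(seed)
    # Independent Normal draws, one per observation, alternative and attribute.
    x = rng.normal(MEANS[:n_alt], STDS[:n_alt], size=(n_obs, n_alt, 2))
    # Linear-in-parameters utility with betas constant across observations.
    v = (x * BETAS[:n_alt]).sum(axis=2)
    return x, v


x, v = draw_systematic_utilities(1000, n_alt=3)
```

With these values, every alternative has the same expected systematic utility (for instance 1·8 + 6·2 = 20 for Alternative 1).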

Definition of the random residuals. Random residuals are assumed to follow a multivariate normal distribution with zero mean vector and a predefined homoscedastic covariance matrix. Operationally, given a covariance matrix, the random residuals of the alternatives can be obtained through its Cholesky factorization:

$\varepsilon = F z$ (4.3)

in which F is the lower triangular matrix defined by the Cholesky factorization of the variance-covariance matrix $\Sigma$ (i.e. $\Sigma = F F^{\top}$) and z is a vector of independent standard normal draws.
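Equation (4.3) can be illustrated with a short NumPy sketch. The function name `draw_residuals`, the seed and the example covariance matrix (unit variances with a single correlation of 0.95) are assumptions made for illustration; the text does not state the variance level used in the experiments.

```python
import numpy as np


def draw_residuals(sigma, n_obs, seed=0):
    """Draw n_obs residual vectors with covariance sigma via eps = F z."""
    rng = np.random.default_rng(seed)
    f = np.linalg.cholesky(sigma)  # lower triangular F, with sigma = F F^T
    z = rng.standard_normal((n_obs, sigma.shape[0]))  # independent N(0, 1)
    return z @ f.T  # each row equals (F z)^T for one observation


# Hypothetical homoscedastic covariance matrix (unit variances assumed).
sigma = np.array([[1.00, 0.95, 0.00],
                  [0.95, 1.00, 0.00],
                  [0.00, 0.00, 1.00]])
eps = draw_residuals(sigma, 200_000, seed=1)
```

The sample covariance of `eps` should approach `sigma` as the number of draws grows.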

Definition of the choice. The alternative chosen for each observation is the one with maximal utility.
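The choice rule can be written in a couple of lines. This is a minimal sketch with hypothetical function names (`simulate_choices`, `market_shares`) and made-up utility values, shown only to fix ideas.

```python
import numpy as np


def simulate_choices(v, eps):
    """Index of the maximal-utility alternative for each observation (row)."""
    u = np.asarray(v, dtype=float) + np.asarray(eps, dtype=float)
    return np.argmax(u, axis=1)


def market_shares(choices, n_alt):
    """Fraction of observations choosing each alternative."""
    return np.bincount(choices, minlength=n_alt) / len(choices)


# Two illustrative observations with three alternatives each.
v = [[20.0, 19.0, 18.0], [18.0, 20.0, 19.0]]
eps = [[0.0, 0.0, 0.0], [0.0, 0.0, 2.0]]
choices = simulate_choices(v, eps)
shares = market_shares(choices, n_alt=3)
```

In the second observation the residual of the third alternative is large enough to overturn the ranking given by the systematic utilities alone.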

Knowledge of the ground truth behind the estimation sample allows estimated quantities to be contrasted with the true ones. This applies to the taste parameters β, to the correlation matrices (whose estimate can be computed from the model structural parameters) and, especially, to the market share elasticities.

Specifically, as mentioned in the previous section, the true market share elasticities can be computed as the ratio between the percentage variation in the total number of choices of a given alternative and the percentage variation of the value of a given attribute that induces it. Concerning correlations, it must be highlighted that, while for NL and CoNL the estimated correlations can be computed through existing closed-form formulas, CNL and Fin-Mix correlations must be computed through integral calculation, as suggested for instance by Marzano et al. (2013) and Marzano (2014).
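A possible way to compute such "true" elasticities on a synthetic sample is sketched below. Several elements are assumptions not stated in the text: the size of the attribute perturbation (`pct=0.10`), the reuse of the same residual draws before and after the perturbation, the independent unit-variance residuals of the example, and all function and variable names.

```python
import numpy as np


def choice_counts(x, betas, eps):
    """Number of observations choosing each alternative (max-utility rule)."""
    v = (x * betas).sum(axis=2)  # systematic utilities, shape (n_obs, n_alt)
    return np.bincount(np.argmax(v + eps, axis=1), minlength=x.shape[1])


def true_elasticities(x, betas, eps, alt, attr, pct=0.10):
    """Elasticities of all choice counts to attribute `attr` of alternative `alt`."""
    base = choice_counts(x, betas, eps)
    x_new = x.copy()
    x_new[:, alt, attr] *= 1.0 + pct  # perturb one attribute by pct
    new = choice_counts(x_new, betas, eps)  # same residuals as in the base case
    # Assumes every alternative is chosen at least once in the base case.
    return ((new - base) / base) / pct


# Illustrative three-alternative sample (means and betas as in Table 4.1).
rng = np.random.default_rng(0)
means = np.array([[8.0, 2.0], [5.0, 2.0], [4.0, 2.0]])
betas = np.array([[1.0, 6.0], [2.0, 5.0], [3.0, 4.0]])
x = rng.normal(means, 0.1 * means, size=(10_000, 3, 2))
eps = rng.standard_normal((10_000, 3))  # independent residuals, for brevity
e = true_elasticities(x, betas, eps, alt=0, attr=0)
```

The first entry of `e` is the direct elasticity of the first alternative; the remaining entries are cross elasticities.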

Several types of correlation contexts are assumed, with an increasing number of non-zero correlations, hence requiring increasing model flexibility in terms of reproducible correlation matrices.

Specification of the structure of the tested models

The compared models are MNL, NL, CNL, CoNL and Fin-Mix. For the sake of brevity, their nesting structures are depicted in Figure 4.1 and Figure 4.2. Both CoNL and Fin-Mix were specified by mixing binary nests. Conversely, the CNL was specified by using “full nests”, i.e. nests including all the alternatives, as suggested by Marzano and Papola (2008).

Experimental results

The estimation of the different models in the different choice contexts, with variable sample size, was carried out with Matlab. For validation purposes, some estimation experiments were also carried out with different software: R, MS Excel and BioGEME (Bierlaire, 2003). The main outputs are taste parameters, correlations and market share elasticities, which have been contrasted with the corresponding true values.

For the sake of brevity, only the estimation results concerning the market share elasticities are shown, with the aid of a set of synthetic plots. Indeed, as mentioned in the introduction, this indicator represents the real forecasting capability of a model and hence the main interest of the analyst when applying a model.

In these plots, in particular, a synthetic elasticity indicator (Ie) is reported, representing the mean square error between true and modelled market share elasticities (both direct and cross). In more detail, for each model, identified by a specific colour, the elasticity performance is plotted as a function of the sample size, on a semi-logarithmic scale. The same kind of plot is presented for several correlation contexts, related to the “three-alternative” (A, B, C) and the “four-alternative” (A, B, C, D) choice contexts.
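As an illustration, the indicator Ie can be sketched as a plain mean of squared differences over all direct and cross elasticities. The exact definition used for the plots (for instance any weighting or normalisation) is not given in the text, so this form, the function name and the example values are assumptions.

```python
import numpy as np


def elasticity_indicator(e_true, e_model):
    """Mean square error between true and modelled elasticities."""
    e_true = np.asarray(e_true, dtype=float)
    e_model = np.asarray(e_model, dtype=float)
    return float(np.mean((e_model - e_true) ** 2))


# Made-up 2 x 2 elasticity matrices (direct terms on the diagonal).
e_true = [[1.0, -0.5], [-0.5, 1.0]]
e_model = [[1.1, -0.4], [-0.4, 1.1]]
ie = elasticity_indicator(e_true, e_model)
```

A model that reproduces every elasticity exactly yields Ie = 0.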

Alongside, the trend of the goodness-of-fit measures with respect to the sample size is shown, in particular the adjusted r2 and the ratio between the optimum log-likelihood value and the sample size (called normalized log-likelihood). The objective of the comparison is to contrast the errors of the analysed models in terms of forecasting with the goodness of the estimated models in terms of fitting.
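The two goodness-of-fit measures can be sketched as follows. The normalized log-likelihood follows the definition given in the text; the formula of the adjusted r2 is not stated there, so the sketch assumes the adjusted rho-squared commonly used in discrete choice analysis, computed against an equal-shares null model. Function and argument names are hypothetical.

```python
import numpy as np


def fit_measures(p_chosen, n_alt, n_params):
    """Normalized log-likelihood and (assumed) adjusted rho-squared.

    p_chosen: model probability of the chosen alternative, one per observation.
    """
    p = np.asarray(p_chosen, dtype=float)
    n_obs = p.size
    ll = np.log(p).sum()  # log-likelihood at the optimum
    ll_null = n_obs * np.log(1.0 / n_alt)  # equal-shares null model (assumption)
    normalized_ll = ll / n_obs
    adj_rho2 = 1.0 - (ll - n_params) / ll_null
    return normalized_ll, adj_rho2


# Made-up example: 10 observations, 3 alternatives, 2 estimated parameters.
nll, rho2 = fit_measures([0.9] * 10, n_alt=3, n_params=2)
```

Dividing the log-likelihood by the sample size makes the measure comparable across the very different sample sizes considered.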

A first comment refers to the great importance of the sample size, which helps significantly in reaching better performance whatever the model. The main differences are observed when passing from hundreds to thousands of observations.

A second main comment is the general capability of reproducing almost perfectly the true market elasticities in all correlation contexts, if using the “correct” model, i.e. a model with an underlying correlation pattern compatible with the correlation context assumed as true.

        Alternative 1           Alternative 2           Alternative 3           Alternative 4
        Attributes  Parameters  Attributes  Parameters  Attributes  Parameters  Attributes  Parameters
        X1    X2    β1    β2    X3    X4    β3    β4    X5    X6    β5    β6    X7    X8    β7    β8
Av.     8     2     1     6     5     2     2     5     4     2     3     4     2     5     5     2
St.dv.  0.8   0.2   -     -     0.5   0.2   -     -     0.4   0.2   -     -     0.2   0.5   -     -

Table 4.1: Experimental setting.

On the other hand, and even more importantly, the performance of the “wrong” models can be significantly worse than that of the correct models.

The first experiment to be shown is the one of Figure 4.3 (three alternatives). This experiment is shown as a check experiment, because its simplicity allows the reader to form a precise expectation of the results. This is a typical one-level nested correlation scenario, wherein only one true correlation is set to be different from 0. In particular, the experiment is meant to show a boundary case of (almost total) nested correlation, so the true value of ρ_AB is fixed to 0.95 (the reader can refer to the well-known Daganzo and Sheffi network, cited in Chapter 2 and depicted in Figure 5.5). The NL is the natural candidate to reproduce such a situation, while the MNL fails to reproduce it, due to the limitations of the already mentioned I.I.A. property. In fact, in this case, the MNL goodness-of-fit measures are clearly worse than those of the other models and its error in reproducing the true elasticities is substantial. Conversely, the NL reproduces the true elasticities very well (perfectly for sample sizes greater than 5,000 observations). More complex models collapse to a NL and their performances practically coincide with one another and with those of the NL.

The second and third correlation experiments (again with three alternatives) are instead incompatible with a NL model. In this case, for large sample sizes, not only the MNL but even the NL error in reproducing the true elasticities is significant. Looking at the small sample sizes (200), however, there is evidence of a contrasting behaviour: the models with better fitting, i.e. those with a higher absolute value of the goodness-of-fit measures, perform worse in terms of forecasting, i.e. their mean square error indicator is higher. See, for instance, the CNL, CoNL and Fin-Mix in Figure 4.4 and, particularly, the Fin-Mix in Figure 4.5.

This “small sample effect”, probably due to overfitting, is clearly evident in all the subsequent figures (see Fin-Mix and CoNL in Figure 4.6 and Figure 4.7, all the complex models in Figure 4.8 and Figure 4.9, the CNL in Figure 4.10). In other words, to express an unbiased forecast, it is necessary to work with a big enough sample of data. Figure 4.11 and Figure 4.12 highlight the same effect in the four-alternative context, particularly with reference to the CNL performance.

When the sample size increases beyond a thousand observations, the forecasting capability of the appropriate models, i.e. the models which structurally allow the considered correlation scenario to be reproduced, becomes stable.

Thus, with an appropriate sample size, more complex models generally perform significantly better, even if their more complex expression may generate some algorithmic problems in finding the optimal solution: see for instance the performance of the Fin-Mix in Figure 4.10.

In terms of estimation time, referring to the maximum sample size (10^6 observations), MNL and NL are generally very efficient, with estimation times of around a few minutes. CNL and CoNL estimation times are a few dozen minutes, while Fin-Mix generally requires a few hours.

BC C A B BC     C A B C A B CN L MNL NL CoNL / FinMix C  NL2 B A C   NL1 C A B BC  A B C NL3 CoNL / FinMix  CD    D A B C A B D C BCD D A B BCD C MNL NL CN L BD C  D A C B NL2 BC D  C A D B NL3 CD   D A B C NL1

Figure 4.2: Model’s specification for four alternatives-context. Figure 4.1: Model’s specification for three alternatives-context.

Figure 4.3: 0.95-0-0 correlation scenario – synthetic performance indicator plots for forecasting (column 1) and regression (columns 2 and 3).

Figure 4.5: 0.7-0.7-0 correlation scenario – synthetic performance indicator plots for forecasting (column 1) and regression (columns 2 and 3).

Figure 4.7: 0.5-0.3-0.3 correlation scenario – synthetic performance indicator plots for forecasting (column 1) and regression (columns 2 and 3).

Figure 4.9: 0.8-0.4-0.2 correlation scenario – synthetic performance indicator plots for forecasting (column 1) and regression (columns 2 and 3).

Figure 4.11: 0.95-0-0-0.3-0-0 correlation scenario – synthetic performance indicator plots for forecasting (column 1) and regression (columns 2 and 3).

In document Investigating the potential of the combination of random utility models (CoRUM) for discrete choice modelling and travel demand analysis (Page 105-114)