Simplified estimation - Logit method - Motivation for the predictive choice model

Search intensification using the predictive choice model

5.2 Motivation for the predictive choice model

5.3.3 Logit method

5.3.3.4 Simplified estimation

Until now, the parameters

β

₁ and

β

₂ have been estimated by the maximum likelihood method. Unfortunately, this method requires a significant computational effort and needs to be applied many times during the search.

In this section, we introduce a simplified estimation of the utility’s parameter values. Since the number of occurrences of choice values in X^* will be treated separately by the proportional method, the logit method only accounts for the total constraint violation.

However, in the logit method, we may have to set the choice specific parameter

β

₁ to a small value, e.g.

β

₁ = 0.05, so that the utility of one alternative is preferred to the other. This

is because an equal utility lies outside the assumption of the choice theory (Ben-Akiva and Lerman, 1985).

Instead of considering h₀ and h₁ separately, the absolute difference between h₀ and h₁,

∆h is used in order to characterise the value choice selection. ∆h is defined as follows:

1 0

h h

h h h

= −

∆ (5.13)

where: h₀ and h₁ are the average total violations when a variable is trial flipped or assigned a value 0 and 1 respectively.

From (5.13), when the value of ∆h is large, the probabilities of two alternatives (value 0 and 1) would be significantly different, and when ∆h = 0, the probabilities of the two alternatives would tend to be equal. ∆h is shown in a proportional scale so that the formulation could be generalised for the problem in which the total violation and the number of flip trials can be varied. Then we present a simplified estimation of the violation parameter

β

₂ as follows:

β

₂ =−∆h (5.14)

Now finding the utility’s parameter values only requires a very little computational effort. It is noted that in this study

β

₁ could either be set to any small positive or negative value,

whilst

β

₂ is always set to be negative because the constraint violation represents disutility, i.e. value 0 is preferred to 1 when h₀ is less than h₁.

Table 5.7 show the results obtained by the predictive choice model using the simplified estimation for logit method. The experiment uses the same test data as shown in Table 5.6.

For all test variables, the number of flip trials N = 20, the choice specific parameter

β

₁ is set to 0.05, and the decision parameter D is set to 70 (more experimental results are shown in Appendix B)

Table 5.7 shows that the predicted value for the test variable is close to its known optimal value with PC = 75%. This PC value is sufficiently high to allow us to have some confidence in the predictive choice model, and indicates that the result obtained by the simplified estimation for logit method serves almost as well as the result from the more complicated mathematical one used in Table 5.6.

In Tables 5.6 – 5.7 (and Appendix B), we observe that the predictions when P is between ₀ 0.45 – 0.55 are not very accurate. Therefore, the predictive choice model discards those predictions in order to increase the accuracy of forecast for all variables.

Var Violation ℵ^* Probability Φ

no. h0 h₁ ^{0 1}P₀ P₁

1 14.35 16.15 7 13 0.54 0.46 1

2 11.80 22.25 17 3 0.85 0.15 0 3 10.75 21.00 9 11 0.97 0.03 0 4 19.85 16.75 17 3 0.85 0.15 0 5 12.60 19.35 15 5 0.75 0.25 0 6 12.30 18.50 18 2 0.90 0.10 0 7 9.95 21.00 12 8 0.98 0.02 0

8 11.70 13.70 8 12 0.55 0.45 1

9 8.25 19.00 11 9 0.99 0.01 0 10 9.90 20.65 12 8 0.98 0.02 0 11 9.60 15.50 19 1 0.95 0.05 0

12 14.70 13.05 8 12 0.49 0.51 1

13 10.55 18.25 20 0 1.00 0.00 0

14 15.50 14.45 11 9 0.50 0.50 1

15 10.15 17.40 20 0 1.00 0.00 0

16 15.20 12.25 2 18 0.10 0.90 1

17 14.30 12.41 12 8 0.48 0.52 0

18 12.20 12.10 14 6 0.70 0.30 0

19 13.40 11.50 6 14 0.30 0.70 1

20 12.98 14.40 11 9 0.53 0.47 1

Table 5.7: Value choice prediction by the simplified estimation for logit method.

As an outcome of the predictive choice model is a probability of choosing value for a variable, the variable will be fixed at its predicted value for a number of iterations

determined by the magnitude of the probability. During these iterations other variables may become fixed. When the fixing iteration for a variable is reached, it is freed and its violation history is refreshed. Before we describe how the predictive choice model is used to improve the container rail schedule obtained by CLS, related work on the construction and use of probabilistic models in the search algorithm will be discussed.

5. 3.3.5 Related work

The predictive choice model has some general similarities with both adaptive techniques used in local search and population-based search techniques. For example, algorithms based on these techniques are: adaptive Tabu search (Glover and Laguna, 1997), variable neighbourhood search (Hansen and Mladenovic, 2001), ant colony optimisation (Blum et al, 2001), estimation of distribution algorithms (EDAs) (Pelikan et al, 1999), etc. These algorithms do not rely on problem specific designs of heuristics and do not need many tuning parameters. Instead they build probabilistic models to keep fixed some variables during the search or to predict the movements of populations in a probabilistic way. Such probabilistic models attempt to draw inferences specific to the complex combinatorial problem being solved and therefore may be regarded as adaptive processes that learn domain knowledge implicitly.

In most local search algorithms based on probabilistic models, the problem specific interactions amongst the decision variables in the combinatorial optimisation model are kept in mind implicitly, whereas in our predictive choice learning algorithm, as well as EDAs, the interactions are treated explicitly through the probability distribution associated with the variables selected at each sample. In EDAs, often incorporated in genetic algorithm

(Goldberg, 1989), a probabilistic model for selecting promising solutions is constructed, based on the observed probability distribution. The new solutions are generated by the probabilistic model. The observed distributions of the interactions amongst variables are estimated and are often displayed as complex chains or networks.

We consider the deterministic and non-deterministic (random) interactions amongst the variables separately. The logistic probability distribution of the non-deterministic behaviour in choosing a good value for a variable is assumed in order to derive a specific probabilistic model; thereby it makes our model easy to develop and convenient to use. The model learns from the search history, the outcome in terms of a probability is used to influence the search.

In general, the probability distribution in EDAs is categorised into three types according to the interactions amongst the variables (Larraaga and Lozano, 2002). These are: no interactions, pairwise interactions, and multivariate interactions. No interactions assumes that all variables in a problem are independent, i.e. the search algorithm only looks at the values of each variable regardless of the remaining variables. The second type assumes that the variables in the combinatorial optimisation model are largely independent except for some pairwise interactions. In this type, the joint probability distribution of the interactions between pairwise variables is constructed. The last is the most complex and requires significant computational effort, but in return for a potentially better result. It assumes multivariate interactions amongst the variables. The variables may be divided into a number of independent clusters, and the conditional probabilities are estimated within each cluster.

As the search history for our predictive choice model is based on a two-variables selection scheme, it is closely related to the pairwise interactions of EDAs. In our model, the joint

probability distribution of the interaction between variables is estimated by observing the constraint violations resulting from assigning values for two variables.

In addition, the application of our predictive choice model and EDAs is different. In EDAs, the model is used to predict the movements of the solution (populations) in the search space.

For instance, when applied to genetic algorithms (Pelikan et al, 1999), EDA generates new solutions by sampling the constructed probability distribution of the promising solutions (i.e.

EDAs do not use crossover and mutation). In our application, the probabilistic model is used to predict a choice of good value for an individual binary variable. With sufficient trial history, the model predicts likely optimal values for variables.

The next section describes how the predictive choice model is used to improve the solution of the container rail scheduling problem.

In document Constraint-Based Local Search for Container Rail Scheduling (Page 141-147)