Time-Linkage Free Formulation - The Anticipatory Stochastic Multi-Objective Optimization (AS-MO

5.2 The Anticipatory Stochastic Multi-Objective Optimization (AS-MOO) Model

5.2.10 Time-Linkage Free Formulation

The TL-AS-MOO formulation assumed that one decision is taken per period. There are, however, scenarios in which a fixed decision is intended to operate for multiple periods. This is the case, e.g., when the cost of modifying the solution in execution is prohibitive, which leads to lower-frequency re-optimizations. There may also be time constraints that make it impossible to keep pace with environmental change when optimally adapting current decisions [52]. In such scenarios, a fixed decision should therefore be made robust against all possible disruptive future variations in the operational environment.

Figure 5.6: Outline of the AS-MOO methodology for online MCDM under uncertainty.

The TLF-AS-MOO model thus conveys a concept similar to that of the recently proposed single-objective optimization model known as Robust Optimization Over Time (ROOT) [90,125], whose aim is to find solutions requiring minimal to no change to maintain good performance in future environments. The intuitive concept of ROOT was described in Jin et al. [125] as the ability to find solutions “(...) whose quality is acceptable over a certain time interval, although they [may] not be the global optima at any time instant”.

The task of obtaining fixed decisions is conceptually easier when the dynamics of the problem are not affected by the DM. The TLF-AS-MOO model therefore does not prescribe a decision-making strategy, since that is irrelevant to the dynamics of the problem. Nevertheless, a DM solving a TLF-AS-MOO problem is interested in obtaining a fixed finite approximation to the changing Pareto Set, composed of N mutually non-dominated solutions, that is robust against predicted changes in the optimization environment over time. The model is defined as:

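\[
U_t^{N\star} = \arg\max_{U_t^N \subset \Omega_t} E\left( S\left( \lambda_t F_t^N(U_t^N) + \sum_{h=1}^{H-1} \lambda_{t+h} F_{t+h}^N(U_t^N) \right) \right), \tag{5.16}
\]

\[
\text{s.t.} \quad
\begin{cases}
x_t = T(x_{t-1}, \xi_{t-1}), \\
u_t \nprec v_t \,\wedge\, v_t \nprec u_t \quad (\text{for } u_t, v_t \in U_t^N \text{ and } u_t \neq v_t), \\
\Pr\{ z_{t+h} \mid u_{t+h-1} \in Z_{\gamma_{t+h}} \} \geq \epsilon, \quad \forall\, (0 \leq h \leq H-1), \\
\text{where } z_{t+h} = f(u_{t+h}, x_{t+h}) \in F_{t+h}^N(U_{t+h}^N).
\end{cases}
\tag{5.17}
\]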
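The two set-level constraints in (5.17), mutual non-dominance and the chance constraint on the preferred region, can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes minimization of all objectives, applies dominance to point estimates of the objective vectors, and approximates the preferred region by a hypothetical axis-aligned box. The function names and the parameters `lower`, `upper`, and `epsilon` are illustrative only.

```python
from itertools import combinations


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization assumed)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def mutually_non_dominated(objective_vectors):
    """True if no vector in the set dominates any other vector in the set."""
    return not any(dominates(a, b) or dominates(b, a)
                   for a, b in combinations(objective_vectors, 2))


def chance_feasible(samples, lower, upper, epsilon):
    """Monte Carlo check of Pr{z in region} >= epsilon.

    The region is a hypothetical box [lower, upper]; `samples` are sampled
    objective vectors for one candidate solution.
    """
    inside = sum(
        all(lo <= z_j <= hi for z_j, lo, hi in zip(z, lower, upper))
        for z in samples
    )
    return inside / len(samples) >= epsilon
```

For example, the set {(1, 3), (2, 2), (3, 1)} passes the non-dominance check, whereas {(1, 1), (2, 2)} does not, because the first vector dominates the second.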

Contrasting the two AS-MOO models, it is clear from Eqs. (5.16) and (5.17) that not only does the vector-valued objective function f in the TLF variant not depend on past decisions, but the future random objective vectors inside the summation on the right-hand side are also computed over the fixed candidate trade-off set approximation, $U_t^N$. Note that the trade-off solutions in $U_t^N$ are evaluated over the upcoming environments, i.e.,

\[
F_{t+h}^N(U_t^N) = \{ f(u_{t,1}, x_{t+h}), \cdots, f(u_{t,N}, x_{t+h}) \}. \tag{5.18}
\]

Figure 5.7: Interrelations between the five components of the proposed AS-MOO method.

For those reasons, the TLF-AS-MOO model does not imply a recurrence equation, and the only way to assess the future objective values of a current candidate decision is by means of prediction. As also indicated in section 5.2.4, this thesis proposes tracking the evolution of a given objective vector over time with Bayesian models, taking advantage of the observed historical data stream $\{x_0, \cdots, x_{t-1}\}$.
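To make the role of prediction concrete, the sketch below estimates the TLF objective for one fixed candidate set from sampled future objective vectors. It reads Eq. (5.16) as the expected hypervolume of the discounted sum of objective vectors, and it assumes two objectives, minimization, and a user-supplied reference point `ref`. The nested-list layout of `scenarios` and both function names are hypothetical choices made for illustration.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by `ref` (2 objectives, minimization)."""
    # Only points that improve on the reference point in both objectives count.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area, best_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < best_f2:  # skip points dominated by an earlier one
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area


def expected_discounted_hypervolume(scenarios, lambdas, ref):
    """Average hypervolume of the discounted objective vectors over scenarios.

    scenarios[s][h][i] is the sampled objective vector of solution i at
    period t + h in scenario s; lambdas[h] is the weight of period t + h.
    """
    total = 0.0
    for scenario in scenarios:
        n_solutions = len(scenario[0])
        aggregated = []
        for i in range(n_solutions):
            f1 = sum(lam * step[i][0] for lam, step in zip(lambdas, scenario))
            f2 = sum(lam * step[i][1] for lam, step in zip(lambdas, scenario))
            aggregated.append((f1, f2))
        total += hypervolume_2d(aggregated, ref)
    return total / len(scenarios)
```

A candidate set whose sampled objective vectors stay close to the frontier across all periods scores higher than one that is excellent now but degrades later, which is the kind of robustness the fixed set is meant to have.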

Figure 5.6 outlines the information flow in the two proposed AS-MOO models. It emphasizes that an AS-MOO solver is unaware of the exact value of the current state vector $x_t$, whose outcome is only revealed after an anticipatory decision is taken (in the TL regime) and/or a finite approximation to the Pareto set is obtained. Also noteworthy is how the available historical data accumulates over time, allowing the AS-MOO solver to track more accurately the distributions of the objective vectors, the decision vectors, and the bounds of the preferred feasible regions.

Remark: The AS-MOO problem is repeatedly solved for each decision period upon the input of new data from the optimization environment.

A general sketch of the AS-MOO solver for approximating the dynamic PF and selecting a trade-off solution from the anticipatory SPF is shown in Fig. 5.7, assuming a one-step-ahead prediction (anticipation horizon of H = 2). The result of one application of an AS-MOO solver at a decision period t is a set of mutually non-dominated, ε-feasible candidate solutions ($U_t^{N\star}$) maximizing the expected hypervolume for H − 1 steps ahead (for both the TL and TLF variants), together with the indication of the AMFC $u_t^\star \in U_t^{N\star}$ that is expected to maximally preserve the DM's partial preference specification for the periods t + h (1 ≤ h ≤ H), for the TL variant. There are five possible modules that can be integrated into AS-MOO, namely: (1) a predictor for the future preference specification of the DM; (2) a predictor for the next state of the optimization environment; (3) a Bayesian estimation module for tracking the decision and objective vectors over time and computing the corresponding predictive distributions; (4) an S-Metric MOO maximizer; and (5) a procedure for identifying the AMFC that maximizes the expected hypervolume for the future decision periods.

The light gray blocks (1), (2), and (5) are optional. Module (1) is optional because the (partial) preferences of the DM can be given a priori. For instance, when multiple DMs with a vast range of preferences use the proposed anticipatory MCDM system, the canonical PFRs can be assumed throughout. Module (2) is optional because it can be very costly and challenging to identify a proper dynamical model for the environment, which requires the AS-MOO methodology to anticipate robust decisions by relying on its own internal prediction models, which operate over the decision and objective spaces. That is exactly what module (3) is designed for. The methods used in module (3) are described in chapter 6, section 6.1, whereas the anticipatory methods for approximately performing the tasks in modules (4) and (5) are described in section 6.5.

5.3 Summary of the Contributions

This chapter’s contributions to the thesis are as follows:

1. It argued for anticipation as a means to handle uncertainty in Multi-Criteria Decision-Making (MCDM);

2. It presented novel and useful definitions such as Stochastic Pareto Frontier (SPF) and Preferable Feasible Region (PFR) to allow for a better and more comprehensible modeling of Stochastic Multi-Objective Problems (SMOOPs);

3. Two novel Anticipatory SMOO (AS-MOO) models have been formulated in terms of the maximization of hypervolume over time;

4. The concept of preference for flexibility has been explicitly incorporated into the AS-MOO model by suggesting an automated online decision-making strategy for choosing a solution that is foreseen to lead to future stochastic Pareto Frontiers of maximal hypervolume;

5. Prediction has been suggested to reduce the exponential computational costs of traversing the decision tree that is implicit when solving the time-linkage AS-MOO model.

In the next chapter, we provide Bayesian anticipatory learning models and anticipatory multi-objective meta-heuristics for evolving a population of candidate trade-off choices approximating the anticipatory stochastic Pareto Frontier. Moreover, a procedure to identify the Anticipated Maximal Flexible Choice (AMFC) is proposed.

Chapter 6. Learning to Anticipate Flexible Trade-off Choices

Humility is the only true wisdom by which we prepare our minds for all the possible changes of life.

– George Arliss

The important thing is this: to be able at any moment to sacrifice what we are for what we could become.

– Charles Du Bos

While future hypervolume maximization can postpone the assignment of relative importances over the decision criteria, the DM's hesitation about time performance can be handled via Online Anticipatory Learning (OAL). OAL refers to methods for (i) self-adjusting the DM's preference for near-term performance; and (ii) incorporating predictive knowledge into Anticipatory Stochastic Multi-Objective Optimization (AS-MOO) solvers, mediated by the perceived temporal predictability, which is based on both historical trajectory errors and predictive knowledge.

The assumption in OAL is that the more unpredictable the future, the more eager for immediate performance the DM should be, since, by construction, optimizing solutions for early performance is safer in this case. Conversely, the more certain the future, the more willing the DM is to capitalize on foreseen opportunities. OAL thus makes use of Bayesian tracking in both the objective and the search spaces to approximate AS-MOO solutions for which improved performance on the targeted environments is predicted, in terms of PD.

In AS-MOO, the set $\Lambda_{t:t+H-1} = \{\lambda_t, \cdots, \lambda_{t+H-1}\}$ encodes time preferences. In dynamic models, such discount factors are often heuristically determined a priori as a monotonically decreasing function of time (often decaying at an exponential rate), in order to model a DM that prefers near-term optimized performance. As previously discussed in chapter 3, section 3.3.1, the parameterization of an optimization model can drastically alter the obtained sequence of decisions satisfying the chosen parameters. Furthermore, eliciting $\Lambda_{t:t+H-1}$ when N incomparable solutions are to be simultaneously searched for is challenging: the future outcomes of some alternatives may be more predictable than those of others. Hence, instead of pursuing a global set, we propose self-adjusting N independent sets, $\Lambda^{(1)}_{t:t+H-1}, \cdots, \Lambda^{(N)}_{t:t+H-1}$, so that portions of the stochastic Pareto frontier can be approximated under varying time preferences.
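As a point of comparison for the self-adjusted sets, the a priori exponentially decaying discount factors mentioned above can be sketched as follows. The normalization to unit sum and the `decay` parameter are assumptions made for illustration. The per-solution decay values are hypothetical and set by hand, since the self-adjustment rule itself is not reproduced here.

```python
import math


def exponential_anticipation_rates(horizon, decay):
    """Return lambda_t, ..., lambda_{t+H-1} decaying exponentially with h.

    The weights are normalized to sum to one (an illustrative choice);
    decay = 0 yields uniform weights over the horizon.
    """
    weights = [math.exp(-decay * h) for h in range(horizon)]
    total = sum(weights)
    return [w / total for w in weights]


# One independent set per candidate solution (N = 3, H = 4), each with its
# own hand-picked decay: a larger decay puts more weight on near-term periods.
per_solution_decay = [0.1, 0.7, 2.0]
rate_sets = [exponential_anticipation_rates(4, d) for d in per_solution_decay]
```

Under the OAL assumption stated earlier, a solution whose future outcomes are hard to predict would sit toward the large-decay end, and a highly predictable one toward the small-decay end.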

In the following, we present the proposed OAL methods for incorporating predictive knowledge into (a) the objective (or performance) space; and (b) the search space of an AS-MOO solver. We hereafter refer to any λ ∈ Λt:t+H−1 as the anticipation rate.

6.1 Online Anticipatory Learning in the Objective Space

Since the current environment $x_t$ is unknown, applying OAL in the objective space requires a way to estimate the current objective values (performance levels) $z_t = (f_1(u_t, x_t) \cdots f_m(u_t, x_t))^\top$ for a fixed trade-off solution vector, based on the observed trajectory $Z_{t-k:t-1} = \{z_{t-k}, \cdots, z_{t-1}\}$ (see Fig. 6.1). The underlying stochastic uncertainty regarding the true value of $z_t$ is modeled as a multivariate Gaussian distribution over the $j = 1, \cdots, m$ values received from each of the m objective functions, i.e., $z_t \sim \mathcal{N}(m_{z_t}, \Sigma_{z_t})$, for any decision period $t-k, \cdots, t, \cdots, t+H-1$. For a given candidate trade-off solution, we thus propose predicting the unknown objective vector $z_t$ using the Kalman Filter (KF) estimation presented in chapter 2, section 2.3.4.
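A minimal sketch of this kind of prediction is given below. It is not the KF of section 2.3.4, whose model is not restated here. It assumes a random-walk state model with identity transition and observation matrices and isotropic noise, with hypothetical variances `process_var` and `obs_var`. It filters the observed trajectory and returns a Gaussian predictive mean and covariance for the latent objective vector at period t.

```python
import numpy as np


def kalman_predict_next(trajectory, process_var=1e-2, obs_var=1e-1):
    """Filter z_{t-k}, ..., z_{t-1} and predict the objective vector z_t.

    Assumed model (illustrative): z_t = z_{t-1} + w, w ~ N(0, Q), observed
    with additive noise v ~ N(0, R). Returns the predictive mean and
    covariance of the latent objective vector at period t.
    """
    Z = np.asarray(trajectory, dtype=float)
    m = Z.shape[1]
    Q = process_var * np.eye(m)
    R = obs_var * np.eye(m)

    # Initialize the belief at the first observation.
    mean, cov = Z[0].copy(), R.copy()
    for z in Z[1:]:
        # Predict step: under a random walk only the uncertainty grows.
        cov_pred = cov + Q
        # Update step: blend the prediction with the new observation.
        gain = cov_pred @ np.linalg.inv(cov_pred + R)
        mean = mean + gain @ (z - mean)
        cov = (np.eye(m) - gain) @ cov_pred

    # One-step-ahead prediction for the unobserved period t.
    return mean, cov + Q
```

Calling `kalman_predict_next` on the trajectory of one candidate solution gives the mean and covariance that play the role of $m_{z_t}$ and $\Sigma_{z_t}$ in this simplified setting. A richer transition model, for instance one that captures trends, would replace the identity matrices.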
