5.2 The Anticipatory Stochastic Multi-Objective Optimization (AS-MOO) Model

5.2.1 Notation and Definitions

There are two possible formulations for AS-MOO: (a) the Time-Linkage (TL) formulation and (b) the Time-Linkage Free (TLF) formulation. They differ in whether the objective functions at time $t$ depend on the decisions taken at time $t-k$ ($t \geq k > 0$). Before stating the two formulations, some notation is needed. We propose modeling the dynamics of the objective functions by means of a hidden random state vector $x_t$, subject to stochastic dynamics over time. For instance, in the TL regime, one can assume that the state vector depends linearly on both its previous state and the decision taken at time $t-1$, with the stochastic part modeled as additive noise represented by a vector of random disturbances $\xi_t$ (and, e.g., $x_0 \sim \mathcal{N}(m, \Sigma)$):

$$x_{t+1} \mid u_t = A x_t + B u_t + \xi_t, \tag{5.1}$$

whereas, for the TLF regime, one may simply take B = 0 and write:

$$x_{t+1} = A x_t + \xi_t. \tag{5.2}$$
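The transitions (5.1) and (5.2) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the use of plain lists for vectors and matrices, and the choice of independent zero-mean Gaussian disturbances with a common standard deviation `noise_std` are assumptions made here, since the text does not fix the distribution of the disturbance vector.

```python
import random


def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def step_state(x, A, B=None, u=None, noise_std=0.1, rng=random):
    """One transition of the hidden state.

    With B and u given, this follows (5.1): x_{t+1} = A x_t + B u_t + xi_t.
    With B omitted (i.e., B = 0), it reduces to the TLF case (5.2).
    The Gaussian noise model is an assumption of this sketch.
    """
    nxt = mat_vec(A, x)
    if B is not None and u is not None:
        nxt = [a + b for a, b in zip(nxt, mat_vec(B, u))]
    # Additive disturbance xi_t, drawn independently per component.
    return [a + rng.gauss(0.0, noise_std) for a in nxt]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    A = [[0.9, 0.0], [0.0, 0.9]]
    B = [[0.1, 0.0], [0.0, 0.1]]
    x = [1.0, 2.0]
    for t in range(3):
        x = step_state(x, A, B, u=[0.5, 0.5], rng=rng)
        print(t + 1, x)
```

Passing `B=None` is a convenient way to switch between the two regimes in simulation without changing the rest of the code.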

Definition 5.1 (Time-Linkage regime). A TL regime is one wherein at least one of the following holds:

a) the temporal evolution of the state vector $x_t$ is influenced by external actions or decisions $u_{t-k}$ taken at previous periods; or

b) the evaluation of the objective functions $f$ for a given decision $u_t$ depends on the decisions $u_{t-k}$ taken at previous periods.

Definition 5.2 (Time-Linkage Free regime). A TLF regime is one wherein both of the following hold:

a) the temporal evolution of the state vector $x_t$ is independent of external actions or decisions $u_{t-k}$ taken at previous periods; and

b) the evaluation of the objective functions $f$ for a given decision $u_t$ is independent of the decisions $u_{t-k}$ taken at previous periods.

The classification of a temporal regime given in Definitions 5.1 and 5.2 has two parts: (a) how past decisions affect the evolution of the state vector; and (b) how they affect the evaluation of the objective functions. In the TLF regime, both the state vector and the objective functions are independent of previous decisions. In the TL regime, two situations can arise. First, if past decisions influence the evolution of the state vector (i.e., if $B \neq 0$), then they necessarily also influence the current evaluation of the objective functions, since $f$ is parameterized by the state vector $x_t$. Second, even when the evolution of the state vector is independent of past decisions, the evaluation of the objective functions may still be influenced by them. This is the case, for instance, when there are resource costs for adapting previous decisions towards new ones [179]. Accounting for such costs may alter the optimal anticipatory trajectories in the search space over time, compared to the scenario in which the costs are ignored.

The TL regime can be identified by the notation $x_t \mid u_{t-1:t-k}$, for $0 < k < t$, which denotes the current hidden state vector given a sequence of past decisions $u_{t-1}, \dots, u_{t-k}$. However, the state may be independent of past decisions in the TL regime as well as in the TLF regime. Hence, when $x_t \mid u_{t-1:t-k} \equiv x_t$, the TL regime is distinguished from the TLF one by the conditional objective vector notation $z_t \mid u_{t-1:t-k}$, which denotes that the evaluation of the objective functions for a current candidate decision $u_t$ depends on a sequence of decisions taken at previous periods of the decision-making process. In this thesis, we consider the Markov property:

$$
\begin{aligned}
x_t \mid u_{t-1:t-k} &\equiv x_t \mid u_{t-1}, \\
z_t \mid u_{t-1:t-k} &\equiv z_t \mid u_{t-1},
\end{aligned} \tag{5.3}
$$

that is, the current state vector and objective vector conditioned on the latest $k$ decisions depend only on the immediately preceding decision. Moreover, we consider that the past state vectors encode all historical data that is relevant for computing the value of the stochastic objective functions at any given decision period. In general, $x_{t-k}$ encodes the parameters of the underlying probability distributions generating the observed data. For instance, in the case of portfolio selection, $x_{t-k}$ represents the mean vector of a multivariate Gaussian modeling the returns distribution. On the other hand, the current state vector $x_t$ may represent a distribution over the parameters of the data-generating process because, generally, a decision must be taken before the current environment can be known with certainty.

Given a fixed decision $u_t$, the vector of objective functions can be written in four different ways, depending on the temporal regime being addressed:

1. In the TLF regime as $z_t = f(u_t, x_t)$, wherein neither the state vector nor the objective vector depends on previous decisions;

2. In the TL regime as $z_t = f(u_t, x_t \mid u_{t-1})$, to account for the dependency of the state vector on the previous decision;

3. In the TL regime as $z_t \mid u_{t-1} = f(u_t, x_t, u_{t-1})$, to account for the dependency of the objective vector on the previous decision, in which case the objective functions are parameterized by both the current candidate decision being evaluated, $u_t$, and the decision taken at the preceding period, $u_{t-1}$; or

4. In the TL regime as $z_t \mid u_{t-1} = f(u_t, x_t \mid u_{t-1}, u_{t-1})$, to account for the dependency of both the state and objective vectors on the previous decision. The objective vector notation in this case is the same as in scenario 3.

Note: When referring to the TL regime hereafter, we assume scenario 3, which best describes the portfolio selection application that will be presented in Chapter 7.

The vector-valued objective function $f$ can thus be decomposed in scenario 3 as follows:

$$z_t \mid u_{t-1} = f(u_t, x_t, u_{t-1}) = g(u_t, x_t) + h(u_t, u_{t-1}), \tag{5.4}$$

where $g$ encodes the original $m$ conflicting objective functions of the MCDM problem and $h$ denotes the costs incurred on each optimization criterion when evaluating the decision $u_t$ given that the decision $u_{t-1}$ was taken.
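To make the decomposition (5.4) concrete, the following Python sketch evaluates a two-objective portfolio-style example. Everything specific in it is a hypothetical choice made for illustration and is not taken from the thesis: the base objectives in `g` (negated mean return and a concentration-based risk proxy, both minimized), the proportional turnover cost in `h`, and the parameter `cost_rate`. The state `x` is taken to be the vector of mean asset returns.

```python
def g(u, x):
    """Hypothetical base objectives for weights u and mean returns x.

    Both components are to be minimized:
    [negated expected return, sum of squared weights as a risk proxy].
    """
    neg_return = -sum(w * mu for w, mu in zip(u, x))
    concentration = sum(w * w for w in u)
    return [neg_return, concentration]


def h(u, u_prev, cost_rate=0.01):
    """Hypothetical adaptation cost of moving from u_prev to u.

    The cost is proportional to turnover and charged to the first
    objective only; the second objective carries no cost here.
    """
    turnover = sum(abs(a - b) for a, b in zip(u, u_prev))
    return [cost_rate * turnover, 0.0]


def f(u, x, u_prev):
    """Scenario 3 objective vector: f(u_t, x_t, u_{t-1}) = g + h."""
    return [gi + hi for gi, hi in zip(g(u, x), h(u, u_prev))]


if __name__ == "__main__":
    x = [0.1, 0.2]       # assumed mean returns
    u_prev = [1.0, 0.0]  # decision implemented at t - 1
    u = [0.5, 0.5]       # candidate decision at t
    print(f(u, x, u_prev))
```

When `u` equals `u_prev`, the cost term vanishes and `f` coincides with `g`, which corresponds to evaluating the candidate as if no adaptation were needed.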

Remark: While the functional form of $f$ is static, it is parameterized by the hidden random state vector, which implies that the statistics of repeated evaluations of $f$ for a fixed decision vector $u_t$ may evolve over subsequent decision periods.

We denote the Stochastic Pareto Frontier (SPF) in the TLF regime as

$$Z_t^\star \mid u_{t-1} = F_t(\Omega_t^\star, x_t) = \{ f(u_t^\star, x_t) : u_t^\star \in \Omega_t^\star \}, \tag{5.5}$$

where $\Omega_t^\star$ is the Pareto Set (PS) at time $t$ and $f$ is evaluated for all $u_t^\star \in \Omega_t^\star$. Our SPF definition is based on Pareto dominance applied to the objective mean vectors:

Definition 5.3 (Stochastic Pareto Frontier). The Stochastic Pareto Frontier (SPF) is the set $Z_t^\star = F(\Omega^\star)$ composed of all random objective vectors satisfying the property

$$\forall\, z^\star \in Z^\star,\ \nexists\, z' \in F(\Omega) \ \text{such that}\ \mathbb{E}[z'] \prec \mathbb{E}[z^\star]. \tag{5.6}$$

Put differently, the definition implies that, although the Pareto set is deterministically formed by all non-inferior solutions with respect to the mean objective vectors, the SPF is a random set whose support consists of subsets of the objective space $Z = F(\Omega)$. An example of an SPF approximation with four mutually non-dominated random objective vectors is shown in Fig. 5.2.
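As an illustration of Definition 5.3, the following Python sketch estimates the mean objective vector of each candidate from samples and keeps the candidates whose estimated mean is not Pareto-dominated by any other. It assumes that all objectives are minimized and that sample averages stand in for the expectations in (5.6). The function names are hypothetical and not part of the proposed model.

```python
from statistics import fmean


def dominates(a, b):
    """True if mean vector a Pareto-dominates b (minimization assumed)."""
    no_worse = all(ai <= bi for ai, bi in zip(a, b))
    strictly_better = any(ai < bi for ai, bi in zip(a, b))
    return no_worse and strictly_better


def mean_vector(samples):
    """Component-wise sample mean of a list of objective vectors."""
    return [fmean(column) for column in zip(*samples)]


def stochastic_pareto_indices(sample_sets):
    """Indices of candidates whose estimated mean is non-dominated.

    sample_sets[i] holds sampled objective vectors of candidate i.
    """
    means = [mean_vector(s) for s in sample_sets]
    return [
        i
        for i, m_i in enumerate(means)
        if not any(dominates(m_j, m_i) for j, m_j in enumerate(means) if j != i)
    ]


if __name__ == "__main__":
    sample_sets = [
        [[0.0, 4.0], [2.0, 4.0]],  # estimated mean [1, 4]
        [[2.0, 1.0], [2.0, 3.0]],  # estimated mean [2, 2]
        [[3.0, 3.0], [3.0, 3.0]],  # estimated mean [3, 3]
    ]
    print(stochastic_pareto_indices(sample_sets))
```

Only the mean vectors enter the dominance test, so two candidates with equal means but different dispersions are both retained. This mirrors the remark that the Pareto set is determined by the means while the frontier itself remains random.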

Moreover, if $U_t^N = \{u_{t,1}, \dots, u_{t,N}\}$ is a finite candidate set of $N$ mutually non-dominated solutions at time $t$, then $Z_t^N = F_t^N(U_t^N, x_t)$ can be written in short as the set¹ $Z_t^N = \{z_{t,i}\}_{i=1}^N$. The notation used for the TL case is analogous: $Z_t^N \mid u_{t-1} = \{z_{t,i} \mid u_{t-1}\}_{i=1}^N$, for a decision $u_{t-1}$ implemented at time $t-1$.

Before presenting the proposed AS-MOO models, we demonstrate a novel way of performing partial preference elicitation requiring minimal accountability and involvement for the DM.