2.2 Time and MDPs

2.2.5 Concurrency as the origin of complexity

With the TMDP model, we have introduced continuous observable time in MDPs and are now able to represent time-dependent stochastic decision problems under uncertainty. However, writing the transition and reward models of real-world examples as TMDPs can quickly become very complicated. The first reason is that the overall stochastic behaviour of temporal Markov decision problems often results from the concurrent influence of several separate stochastic processes (as in the subway, airport or coordination problems). In addition, when several actions may be undertaken simultaneously, the branching factor of the policy search explodes. Both difficulties stem from the fact that we allowed concurrency in two different ways: concurrency of events and concurrency of actions.

In the CoMDP model ([Mausam and Weld, 2005]), Mausam and Weld tackled the problem of allowing several different actions to be undertaken at the same time. However, the CoMDP framework remains in a discrete-time setting with fixed time steps. Since our focus here is on time-dependency and temporal complexity, we will not describe the CoMDP model in detail. We remain in the framework of sequential decision theory and thus do not consider the combinatorial complexity induced by concurrent actions. However, our final conclusions will show how our results extend to the case of concurrent actions.

The complexity of our problems comes from the fact that, in the subway problem for example, different simple stochastic processes affect the same common state space. Predicting the next state of the system requires the transition function to account for the probability that the first event to trigger is the arrival of a passenger at station 1, or the arrival of a passenger at station 2, or a train movement between stations 5 and 6, etc. In addition to the concurrency of events, which introduces a first modeling difficulty, the individual processes are themselves time-dependent, which adds to the complexity of the global process' behaviour. This simple example gives both an idea of the origin of our problems' modeling complexity and a hint as to how to get around this difficulty.
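As a hedged illustration of the "first event to trigger" probability mentioned above, consider the simplified case where all competing events start their clocks at the same instant and their durations are independent. These are assumptions made for this sketch only; the text does not state them. Writing $E_s$ for the set of events competing in state $s$, and $F_e(\tau \mid s)$ and $f_e(\tau \mid s)$ for the distribution function and density of event $e$'s duration, the probability that event $e$ triggers first is:

```latex
P(e^* = e \mid s) = \int_0^{\infty} f_e(\tau \mid s)
    \prod_{e' \in E_s \setminus \{e\}} \bigl(1 - F_{e'}(\tau \mid s)\bigr) \, d\tau
```

In the special case of exponential durations with rates $\lambda_e$, this integral reduces to $\lambda_e / \sum_{e' \in E_s} \lambda_{e'}$. As soon as some events have been running since an earlier state, their remaining durations replace the fresh ones in the product, which is what makes the global transition model hard to write by hand.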

Considering concurrent continuous-time stochastic processes is a framework known in the stochastic processes literature as generalized processes. It does not really make sense to consider Generalized Markov Processes, since the component processes would all be synchronous and would result in a trivial global Markov process. However, as soon as we allow for real-valued stochastic transition times, having several concurrent processes induces a new kind of non-trivial stochastic process. The concurrent execution of several semi-Markov processes (SMPs) affecting the same state space results in a global stochastic process called a Generalized Semi-Markov Process (GSMP). GSMPs were first introduced in [Glynn, 1989] and have been extensively studied in the stochastic processes and discrete event systems literature (as in [Nielsen, 1998] for example).

Chapter 2. Temporal Markov Decision Problems — Modeling

Chapter 11 will present GSMPs in more detail and will highlight their general relation with the global discrete event systems (DEVS, [Zeigler, 1976]) theory. Formally, a GSMP (see [Glynn, 1989] for further details) is described by a set $S$ of states and a set $E$ of events. At any time, the process is in a state $s$ and there exists a subset $E_s$ of events that are called active or enabled. These events represent the different concurrent processes that compete for the next transition. To each active event $e$, we associate a clock $c_e$ representing the duration before this event triggers a transition, as presented in Figure 2.6. This duration would be the sojourn time in state $s$ if event $e$ were the only active event. The event $e^*$ with the smallest clock $c_{e^*}$ (the first to trigger) is the one that takes the process to a new state. The transition is then described by the transition model of the triggering event: the next state $s'$ is picked according to the probability distribution $P_{e^*}(s' \mid s)$. In the new state $s'$, events that are not in $E_{s'}$ are disabled (which actually implies setting their clocks to $+\infty$). For the events of $E_{s'}$, clocks are updated the following way:

• If $e \in E_s \setminus \{e^*\}$, then $c_e \leftarrow c_e - c_{e^*}$

• If $e \notin E_s$ or if $e = e^*$, pick $c_e$ according to $F_e(\tau \mid s')$

The first active event to trigger then takes the process to a new state where the above operations are repeated. The framework of GSMPs can be compared with the (deterministic) framework of Timed Automata ([Alur and Dill, 1994]).
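The transition rule and the two clock-update cases above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in this work: the function name `gsmp_step` and the three callables `active`, `next_state` and `draw_clock` (standing for $E_s$, $P_e(\cdot \mid s)$ and $F_e(\cdot \mid s')$) are hypothetical names introduced for the example.

```python
import random
from typing import Callable, Dict, Hashable, Set, Tuple

State = Hashable
Event = Hashable


def gsmp_step(
    state: State,
    clocks: Dict[Event, float],
    active: Callable[[State], Set[Event]],
    next_state: Callable[[Event, State, random.Random], State],
    draw_clock: Callable[[Event, State, random.Random], float],
    rng: random.Random,
) -> Tuple[State, Dict[Event, float], Event, float]:
    """Perform one GSMP transition.

    Returns (s', clocks in s', triggering event e*, elapsed time c_{e*}).
    All parameter names are assumptions made for this sketch.
    """
    enabled = active(state)
    # e* is the active event with the smallest clock.
    winner = min(enabled, key=lambda e: clocks[e])
    elapsed = clocks[winner]
    # The next state is drawn from the model of the triggering event.
    new_state = next_state(winner, state, rng)

    new_clocks: Dict[Event, float] = {}
    for e in active(new_state):
        if e in enabled and e != winner:
            # Event was already running: c_e <- c_e - c_{e*}.
            new_clocks[e] = clocks[e] - elapsed
        else:
            # Newly enabled event, or e* itself: fresh draw from F_e(. | s').
            new_clocks[e] = draw_clock(e, new_state, rng)
    # Events outside E_{s'} are left out of new_clocks, i.e. their clock is +inf.
    return new_state, new_clocks, winner, elapsed
```

Calling `gsmp_step` repeatedly, feeding each returned state and clock dictionary into the next call, simulates one trajectory of the process. Representing disabled events by their absence from the dictionary is a design choice of this sketch; storing `math.inf` for them would be equivalent.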

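[Figure 2.6: Illustration of a GSMP. State $s_1$ has active events $E_{s_1}$: $e_2$, $e_4$, $e_5$, $e_7$ and transition model $P_{e_4}(s' \mid s_1)$; state $s_2$ has active events $E_{s_2}$: $e_2$, $e_3$, $e_7$ and transition model $P_{e_7}(s' \mid s_2)$.]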

One first important remark concerning GSMPs is that the overall process no longer retains the Markov property: knowing the current state $s$ is not sufficient to predict the distribution of the next state of the process, since this distribution also depends on the remaining clock values of the events that were already active. [Nielsen, 1998] showed that by augmenting the state space with the events' clocks, one could retain the semi-Markov behaviour for a GSMP. Introducing action choice in a GSMP yields a GSMDP as defined by [Younes and Simmons, 2004]. In a GSMDP, we identify a subset $A$ of controllable events or actions; the remaining ones are called uncontrollable or exogenous events. Actions can be enabled or disabled at will and the subset $A_s = A \cap E_s$ of activable actions is never empty since it always contains at least the “idle” action $a_\infty$ (whose clock is always set to $+\infty$) which, in fact, does nothing and lets the first exogenous event take the process to a new state. As in the MDP case, searching for control strategies on GSMDPs implies defining rewards $r(s, e)$ or $r(s, e, s')$ associated with transitions and introducing policies and criteria.
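The role of the idle action in a GSMDP can be illustrated with a short Python sketch of a single decision step. It is only a sketch under stated assumptions: the names `race_with_action`, `IDLE`, `policy` and `draw_clock` are hypothetical, and the policy is assumed to observe the clock-augmented state, which is one possible reading of the augmentation discussed above and not a requirement of the GSMDP definition.

```python
import math
import random
from typing import Callable, Dict, Set, Tuple

IDLE = "a_inf"  # hypothetical label for the idle action, whose clock is +inf


def race_with_action(
    state: str,
    exo_clocks: Dict[str, float],
    enabled_actions: Set[str],
    policy: Callable[[str, Dict[str, float]], str],
    draw_clock: Callable[[str, str, random.Random], float],
    rng: random.Random,
) -> Tuple[str, float]:
    """Return (triggering event, elapsed time) for one GSMDP decision step."""
    # Assumption of this sketch: the policy sees (s, clocks), not s alone.
    action = policy(state, exo_clocks)
    if action != IDLE and action not in enabled_actions:
        raise ValueError(f"action {action!r} is not enabled in state {state!r}")

    clocks = dict(exo_clocks)
    # Idle never triggers; any other action draws its duration from F_a(. | s).
    clocks[action] = math.inf if action == IDLE else draw_clock(action, state, rng)

    # The chosen action competes with the exogenous events for the transition.
    winner = min(clocks, key=lambda e: clocks[e])
    return winner, clocks[winner]
```

When the policy returns `IDLE`, the winner is always the exogenous event with the smallest clock, which matches the description of the idle action letting the first exogenous event take the process to a new state.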

The GSMDP framework, with and without continuous observable time, will be developed in chapters 11 and 13. In chapter 13 we will especially focus on designing efficient algorithms for solving time-dependent GSMDPs.

2.3. Similarities and differences with “classical” MDP problems