• No results found

Macrostate-Dependent Network Reward Model

Chapter 4 Approximate Approach to Online Resource Allocation

4.2. Macrostate-Dependent Network Reward Model

simplification of the state-dependent link model in Equation (4.14a). The purpose is to calculate the link reward rate 𝑅𝑙(Ξ ) as a function of the link macrostates. Since the macrostate-space Ω

𝐧

𝑙 has a cardinality |Ω𝐧𝑙| β‰ͺ |Ω𝑙𝐱|, the computational complexity of the state-dependent model (for large networks) is avoided, which makes the reward maximization problem, under the link independence assumption, solvable.

Any link state transition 𝐱𝑙→ 𝐲𝑙 is caused either by the admission or departure of a connection. These two events manifest through changes in the traffic carried by the link. This traffic was defined in Chapter 3 as the link macrostate 𝐧𝑙= (𝑛1𝑙, … , 𝑛

𝑗𝑙, … , 𝑛𝐽𝑙), where 𝑛𝑗𝑙 is the number of carried class-j connections. Let 𝛅𝑗𝑙 be a 𝐽-dimensional vector with a one in position 𝑗 and zeros in the other positions, and let us assume that the link state 𝐱𝑙 defines a carried traffic 𝐧𝑙. If the transition 𝐱𝑙→ 𝐲𝑙 is caused by a class-j arrival, the carried traffic changes as 𝐧𝑙→ 𝐧𝑙+ 𝛅

Figure 4.3: State and macrostate spaces for a single-link network that serves two connection classes.

from a class-j departure, the carried traffic varies as 𝐧𝑙→ π§π‘™βˆ’ 𝛅

𝑗𝑙= (𝑛1𝑙, … , π‘›π‘—π‘™βˆ’ 1, … , 𝑛𝐽𝑙). In both cases, the macrostates 𝐧𝑙+ 𝛅

𝑗𝑙 and π§π‘™βˆ’ 𝛅𝑗𝑙 are defined by their corresponding destination states 𝐲𝑙. In Chapter 3, it was also shown that any link macrostate 𝐧𝑙 is explicitly defined by a sub-set of network states 𝐱, i.e. there can be different spectrum configurations 𝐱 that define the same macrostate 𝐧𝑙, and thereby different link states 𝐱𝑙 may represent the same traffic load 𝐧𝑙. Let X̂𝐧𝑙 be the set of link states 𝐱𝑙 that yield the carried traffic 𝐧𝑙. Then we have that the macrostate transitions 𝐧𝑙→ 𝐧𝑙+ 𝛅

𝑗𝑙 and 𝐧𝑙→ π§π‘™βˆ’ 𝛅𝑗𝑙 need not be caused by a unique transition 𝐱𝑙→ 𝐲𝑙. In general, the transition 𝐧𝑙→ 𝐧𝑙+ 𝛅

𝑗𝑙 can be caused by any of the possible link state transitions 𝐱𝑙→ 𝐲𝑙, such that π±π‘™βˆˆ X̂𝐧𝑙 and π²π‘™βˆˆ XΜ‚

𝐧+𝛅𝑗𝑙

𝑙 . Similarly, 𝐧𝑙→ π§π‘™βˆ’ 𝛅

𝑗𝑙 can be caused by any of the possible link state transitions 𝐱𝑙→ 𝐲𝑙, such that π±π‘™βˆˆ XΜ‚

𝐧𝑙 and π²π‘™βˆˆ XΜ‚π§βˆ’π›…

𝑗 𝑙

𝑙 .

Example 4.3 Consider the single-link network shown in Fig. 4.3. It has a capacity of 𝐢𝑙= 6 slots and serves two connection classes with bandwidths 𝑏1= 2 and 𝑏2= 4. The link state-space Ω𝑙𝐱 consists of 18 states 𝐱𝑙 denoted as 𝐱𝑙= (π‘₯

class-1 arrival occurs in state π›˜2= (1, ∞, 0,0,0,0), the policy in Fig. 4.3 says that the connection must be admitted in slots 3-4. This decision causes the transition π›˜2β†’ π›˜7, with π›˜7= (1, ∞, 1, ∞, 0,0), and thereby the traffic load (2,0) is now carried . By following this reasoning, the remaining aforementioned four state transitions are obtained. Similarly, we have that the transition (1,0) β†’ (1,1), due to a class-2 arrival, can only be caused by any of the two transitions π›˜2β†’ π›˜18 or π›˜6β†’ π›˜17, with π›˜17, π›˜18∈ XΜ‚(1,1)𝑙 . Furthermore, a class-1 departure causes the transition (1,0) β†’ (0,0), which may originate from the five transitions π›˜2β†’ π›˜1, π›˜3β†’ π›˜1, π›˜4β†’ π›˜1, π›˜5β†’ π›˜1 and π›˜6β†’ π›˜1, with π›˜1∈ XΜ‚(0,0)𝑙 . This example evinces that, in general, macrostate transitions 𝐧𝑙→ 𝐧𝑙± 𝛅

𝑗𝑙 are not necessarily caused by a unique transition 𝐱𝑙→ 𝐲𝑙.

For a given state 𝐱𝑙, Equation (4.14a) calculates 𝑅𝑙(Ξ ) as a linear function of all transitions 𝐱𝑙→ 𝐲𝑙 caused by arrivals and departures when the link is in state 𝐱𝑙. To simplify this approach, let us use the fact that different link states 𝐱𝑙 may represent the same traffic 𝐧𝑙. Then, for every 𝐧𝑙, we define an equation that calculates 𝑅𝑙(Ξ ) as a linear function of all transitions 𝐧𝑙→ 𝐧𝑙± 𝛅

𝑗𝑙. By doing so, the link states that may define the macrostate 𝐧𝑙, i.e. all π±π‘™βˆˆ X̂𝐧𝑙, are no longer described by separate equations, but by a single one that implicitly takes into account all the transitions 𝐱𝑙→ 𝐲𝑙 that cause 𝐧𝑙→ 𝐧𝑙± 𝛅

𝑗𝑙. Therefore, let {𝐍𝑑𝑙,Ο€, 𝑑 β‰₯ 0} be a continuous-time stochastic process that models the time evolution of the macrostate of a link 𝑙 under a policy Ξ . The random variable 𝐍𝑑𝑙,Ο€ takes its values from the macrostate-space Ω𝐧𝑙 and thereby, 𝐍𝑑𝑙,Ο€ is the traffic carried by the link at time 𝑑. By following the same analysis made for the exact network reward model in Section 3.3.2 - Chapter 3, we have that for a single-link network, the process {𝐍𝑑𝑙,Ο€, 𝑑 β‰₯ 0} yields reward at a rate 𝑅𝑙(Ξ ) given by [RB16a, RB16b, RB17a]:

𝑅𝑙(Ξ ) = π‘ž(𝐧𝑙) + βˆ‘ πœ†

𝑗𝑙(𝐧𝑙, Ξ ) βˆ™ [𝑣(𝐧𝑙+ 𝛅𝑗𝑙, Ξ ) βˆ’ 𝑣(𝐧𝑙, Ξ )] 𝐽

𝑗=1 +

βˆ‘π½π‘—=1π‘›π‘—π‘™βˆ™ Β΅π‘—βˆ™ [𝑣(π§π‘™βˆ’ 𝛅𝑗𝑙, Ξ ) βˆ’ 𝑣(𝐧𝑙, Ξ )] , π§π‘™βˆˆ Ω𝐧𝑙 (4.16a) The first summation is the contribution to 𝑅𝑙(Ξ ) owing to connection requests which arrive at the link in macrostate 𝐧𝑙. Connections arrive at a macrostate-dependent rate πœ†

𝑗𝑙(𝐧𝑙, Ξ ), which is implicitly defined (as it will be shown in Section 4.3) by the arrival rates πœ†π‘—π‘™(𝐱𝑙, Ξ ) and πœ†π‘—π‘™(Ξ ). The second summation is the contribution to the link reward rate due to departures in macrostate 𝐧𝑙. Furthermore, π‘ž(𝐧𝑙) is the rate at which the link yields reward in macrostate 𝐧𝑙 and is given by [RB16a, RB16b, RB17a]:

π‘ž(𝐧𝑙) = βˆ‘ π‘Ÿ

π‘—π‘™βˆ™ π‘›π‘—π‘™βˆ™ ¡𝑗 𝐽

𝑗=1 (4.16b) where π‘›π‘—π‘™βˆ™ ¡𝑗 is the termination rate of class-j traffic in macrostate 𝐧𝑙. Moreover, in Equation (4.16a), 𝑣(𝐧𝑙, Ξ ) is the transient reward value for having set the link in macrostate 𝐧𝑙 at time 𝑑o. Under the macrostate-dependent description, we obtain two equations analogous to Equations (4.14d)-(4.14e):

𝑉(𝐧𝑙, Ξ , 𝑑) = 𝑅𝑙(Ξ ) βˆ™ 𝑑 + 𝑣(𝐧𝑙, Ξ ) , π§π‘™βˆˆ Ω𝐧𝑙 (4.16c) 𝑔𝑗𝑙(𝐧𝑙, Ξ ) = 𝑉(𝐧𝑙+ 𝛅𝑗𝑙, Ξ , 𝑑) βˆ’ 𝑉(𝐧𝑙, Ξ , 𝑑) = 𝑣(𝐧𝑙+ 𝛅𝑗𝑙, Ξ ) βˆ’ 𝑣(𝐧𝑙, Ξ ) (4.16d) with 𝑉(𝐧𝑙, Ξ , 𝑑) being the reward expected at 𝑑 ≫ 𝑑0, if the link was in macrostate 𝐧𝑙 at 𝑑0. Furthermore, Equation (4.16d) defines 𝑔𝑗𝑙(𝐧𝑙, Ξ ) as the macrostate-dependent link reward gain which is the long-term

Figure 4.4: System of linear equations in the state-dependent model for the link in Fig. 4.3.

reward that a class-j connection brings to the link if it gets admission in macrostate 𝐧𝑙. From this gain, we approximate the state-dependent network reward gain 𝑔𝑗(𝐲, 𝐱, Ξ ) as:

𝑔𝑗(𝐲, 𝐱, Ξ ) β‰ˆ βˆ‘π‘™βˆˆπœŒπ‘”π‘—π‘™(𝐧𝑙, Ξ ) (4.16e) i.e. to calculate 𝑔𝑗(𝐲, 𝐱, Ξ ), it is only necessary to know the macrostate-dependent gains of the links in the path 𝜌 used by the connection.

Equation (4.16a) defines a linear system with |Ω𝐧𝑙| equations. For all macrostates 𝐧𝑙, the parameters π‘ž(𝐧𝑙), πœ†

𝑗𝑙(𝐧𝑙, Ξ ) and πœ‡π‘— are known. Therefore, under the macrostate-dependent description of 𝑅𝑙(Ξ ), the calculation of the network reward rate 𝑅(Ξ ) is performed as follows. First, the parameters (π‘œ, 𝑑)𝑗, πœ†π‘—, Β΅π‘—βˆ’1, 𝑏𝑗, Γ𝑗, and π‘Ÿπ‘— are defined along with the policy Ξ . Second, for each link, these parameters and the link capacity 𝐢𝑙 are used to calculate the macrostate-space Ω𝐧𝑙 (every macrostate must fulfil the capacity constraint βˆ‘π½π‘—=1π‘π‘—βˆ™ 𝑛𝑗𝑙≀ 𝐢𝑙). Third, for every link, the linear system is defined from Equation (4.16a) based on the decisions of the policy Ξ . Then each system is solved for 𝑅𝑙(Ξ ) and the values 𝑣(𝐧𝑙, Ξ ). Finally, Equation (4.6) is used to calculate 𝑅(Ξ ) as the sum of the link rates 𝑅𝑙(Ξ ).

Example 4.5 For the flex-grid link in Fig. 4.3, the policy Ξ 1 uses first-fit (FF) as spectrum allocation policy [TAK+14]. In an ordered spectrum grid with 𝐢𝑙 slots indexed as 1,2, … , 𝐢𝑙, first-fit assigns the

available 𝑏𝑗 adjacent slots with the lowest indices in the grid. Consider the link state π›˜4= (0,0,1, ∞, 0,0). The policy decision Ξ 1(π›˜4, 1) says that if a class-1 arrival occurs in state π›˜4, it must be admitted by allocating the slots 1-2, thereby causing the transition π›˜4β†’ π›˜7= (1, ∞, 1, ∞, 0,0). In this case, the slots 1-2 are the two available adjacent slots with the lowest indices in the grid. For this spectrum allocation policy, we present in Fig. 4.4 the system of 18 linear equations defined by Equation (4.14a). Let us check the equation for π›˜4. In this state, the link only carries a class-1 connection, i.e. 𝐧𝑙= (1,0). Then the link earns reward at a rate π‘ž(π›˜4) = π‘Ÿ1π‘™βˆ™ πœ‡1 (ru/uot). Only two transitions are possible in π›˜4, namely, π›˜4β†’ π›˜1 which occurs when the carried connection departs, and π›˜4β†’ π›˜7, due to a class-1 arrival. As seen in Fig.

Figure 4.5: System of linear equations in the macrostate-dependent model for the link in Fig. 4.3.

4.3, class-2 arrivals are rejected (i.e. Ξ 1(π›˜4, 2) = π·π‘œ π‘›π‘œπ‘‘ π‘Žπ‘‘π‘šπ‘–π‘‘) as in π›˜4 there are no four available slots that fulfil the contiguity constraint. Since only two transitions are possible, in Equation (4.14a) we have that Ξ“π›˜1+4 = {π›˜7} and Ξ“π›˜2+4 = {πœ™} for the first double summation. Similarly, Ξ“π›˜1βˆ’4 = {π›˜1} and Ξ“π›˜1βˆ’4 = {πœ™} for

the second one. Hence, for π›˜4, we have that 𝑅𝑙(Ξ 1) = π‘Ÿ1π‘™βˆ™ πœ‡1+ πœ†π‘™1(π›˜4, Ξ 1) βˆ™ [𝑣(π›˜7, Ξ 1) βˆ’ 𝑣(π›˜4, Ξ 1)] + πœ‡1βˆ™ [𝑣(π›˜1, Ξ 1) βˆ’ 𝑣(π›˜4, Ξ 1)]. To calculate 𝑅𝑙(Ξ 1), the linear system in Fig. 4.4 is solved for 𝑅𝑙(Ξ 1) and the 18 values 𝑣(π›˜π‘–, Ξ 1).

Example 4.6 For the flex-grid link in Fig. 4.3, the macrostate-dependent reward model is defined by

a linear system with six equations. That system is defined in Fig. 4.5, where the macrostates are indexed as 𝐧𝑖, 𝑖 = 1, … ,6. For each macrostate 𝐧𝑖, the corresponding equation describes the effect that the states in X̂𝐧𝑙𝑖 have on 𝑅𝑙(Ξ 

1). To illustrate this, let us consider the macrostate 𝐧2= (1,0). In the state-dependent model, the five states that configure this macrostate (i.e. those in XΜ‚(1,0)𝑙 = {𝝌2, 𝝌3, 𝝌4, 𝝌5, 𝝌6}) define five equations in Fig. 4.4. In the macrostate-dependent model, these five equations are simplified by the equation for 𝐧2 shown in Fig. 4.5, namely 𝑅𝑙(Ξ 1) = π‘Ÿ1π‘™βˆ™ Β΅1+ πœ†π‘™1(𝐧2, Ξ 1) βˆ™ [𝑣(𝐧3, Ξ 1) βˆ’ 𝑣(𝐧2, Ξ 1)] + πœ†2𝑙(𝐧2, Ξ 1) βˆ™ [𝑣(𝐧6, Ξ 1) βˆ’ 𝑣(𝐧2, Ξ 1)] + Β΅

1βˆ™ [𝑣(𝐧1, Ξ 1) βˆ’ 𝑣(𝐧2, Ξ 1)]. Observe that the five equations in Fig. 4.4, for the states in XΜ‚(1,0)𝑙 , define π‘ž(π›˜π‘–) = π‘Ÿ1π‘™βˆ™ πœ‡1 which coincides with π‘ž(𝐧2) = π‘Ÿ1π‘™βˆ™ πœ‡1. Furthermore, notice that the simplified equation is a linear function of the transitions 𝐧2β†’ 𝐧2Β± 𝛅𝑗𝑙. These transitions were defined in Example 4.4, and they are: 𝐧2= (1,0) β†’ 𝐧3= (2,0), 𝐧2= (1,0) β†’ 𝐧6= (1,1) and 𝐧2= (1,0) β†’ 𝐧1= (0,0). From this, the reward change caused by the transition 𝐧2β†’ 𝐧3, i.e. by the admission of a class-1 connection, is given in the macrostate equation by 𝑣(𝐧3, Ξ 1) βˆ’ 𝑣(𝐧2, Ξ 1). This reward change can be caused by any of the five state transitions studied in Example 4.4, namely, π›˜2β†’ π›˜7, π›˜3β†’ π›˜10, π›˜4β†’ π›˜7, π›˜5β†’ π›˜8 and π›˜6β†’ π›˜9. These transitions originate from class-1 requests admitted when the link is in macrostate 𝐧2= (1,0). A similar interpretation follows for the reward changes 𝑣(𝐧6, Ξ 1) βˆ’ 𝑣(𝐧2, Ξ 1) and 𝑣(𝐧1, Ξ 1) βˆ’ 𝑣(𝐧2, Ξ 1). With the macrostate-dependent description, the link reward rate 𝑅𝑙(Ξ 

1) is calculated by solving the linear system in Fig. 4.5 for 𝑅𝑙(Ξ 1) and the six values 𝑣(𝐧𝑖, Ξ 1). This example shows the reduction that the macrostate-dependent description has on the size of the linear system: from 18 equations in the state-dependent model to six equations defined in the set of macrostates Ω𝐧𝑙.

4.2.1 Properties of the Macrostate-Dependent Reward Model

The macrostate-dependent model is an approximation to the state-dependent description. The accuracy of the approximation depends on the estimation of the rates πœ†π‘—π‘™(𝐧𝑙, Ξ ), as they determine the transition rates between the macrostates in Ω𝐧𝑙. This issue will be studied in Section 4.3, where it will be shown that πœ†π‘—π‘™(𝐧𝑙, Ξ ) implicitly quantifies the effect that the contiguity constraint has on the process {𝐍𝑑𝑙,Ο€, 𝑑 β‰₯ 0}. This process defines the following properties for the macrostate-dependent model described by Equations (4.16):

1. Every macrostate π§π‘™βˆˆ Ω𝐧𝑙 is defined by a sub-set of link states π±π‘™βˆˆ X̂𝐧𝑙 that fulfil the contiguity and capacity constraints.

2. For all π±π‘™βˆˆ X̂𝐧𝑙, it cannot be stated that the equality 𝑣(𝐱𝑙, Ξ ) = 𝑣(𝐧𝑙, Ξ ) always holds. Although the states in X̂𝐧𝑙 represent the same traffic load 𝐧𝑙, they define different spectrum configurations, and thus, they may have different effects on the link reward process. Therefore, for any 𝐱𝑙, π³π‘™βˆˆ X̂𝐧𝑙 , it is possible that 𝑣(𝐱𝑙, Ξ ) β‰  𝑣(𝐳𝑙, Ξ ), and thereby 𝑣(𝐧𝑙, Ξ ) β‰  𝑣(𝐱𝑙, Ξ ) β‰  𝑣(𝐳𝑙, Ξ ). 3. The purpose of the macrostate-dependent reward model is to circumvent the complexity of the

state-dependent description. This is accomplished by calculating a unique value 𝑣(𝐧𝑙, Ξ ) that represents all the states in X̂𝐧𝑙. The advantage of this strategy is that it makes the linear system in Equation (4.16a) solvable. The drawback is that from the solution to the system, the values 𝑣(𝐱𝑙, Ξ ) of the states in XΜ‚

𝐧𝑙 remain unknown. The transient values 𝑣(𝐧𝑙, Ξ ) are then used as an approximation to the values 𝑣(𝐱𝑙, Ξ ) of the states in X̂𝐧𝑙.

4. If a class-j connection is admitted to link 𝑙 in state 𝐱𝑙, such that it causes the transition 𝐱𝑙→ 𝐲𝑙, it brings to the link a short-term reward π‘Ÿπ‘—π‘™ and a long-term reward 𝑔𝑗𝑙(𝐲𝑙, 𝐱𝑙, Ξ ) = 𝑣(𝐲𝑙, Ξ ) βˆ’ 𝑣(𝐱𝑙, Ξ ). Under the macrostate-dependent reward model, the actual reward 𝑔

𝑗𝑙(𝐲𝑙, 𝐱𝑙, Ξ ) cannot be calculated, as the values 𝑣(𝐱𝑙, Ξ ) and 𝑣(𝐲𝑙, Ξ ) are unknown. Instead, it is approximated as 𝑔𝑗𝑙(𝐲𝑙, 𝐱𝑙, Ξ ) β‰ˆ 𝑔𝑗𝑙(𝐧𝑙, Ξ ) = 𝑣(𝐧𝑙+ 𝛅𝑗𝑙, Ξ ) βˆ’ 𝑣(𝐧𝑙, Ξ ). Therefore, the assumption is made that all the possible link state transitions 𝐱𝑙→ 𝐲𝑙 that cause the transition 𝐧𝑙→ 𝐧𝑙+ 𝛅

𝑗𝑙 yield, on average, the same long-term reward gain 𝑔𝑗𝑙(𝐲𝑙, 𝐱𝑙, Ξ ) β‰ˆ 𝑔𝑗𝑙(𝐧𝑙, Ξ ).

5. From the link independence assumption, it follows that the state-dependent network reward gain now approximates as 𝑔𝑗(𝐲, 𝐱, Ξ ) β‰ˆ βˆ‘π‘™βˆˆπœŒπ‘”π‘—π‘™(𝐧𝑙, Ξ ). It is interpreted as follows: if at time 𝑑0 the network is in state 𝐱 and a class-j connection is admitted to a path 𝜌 ∈ Γ𝑗, it causes the transition 𝐱 β†’ 𝐲, which implies that every link 𝑙 in 𝜌 undergoes a macrostate transition 𝐧𝑙→ 𝐧𝑙+ 𝛅

𝑗𝑙 that is implicitly caused by a link state move 𝐱𝑙→ 𝐲𝑙. As a result, the network earns an immediate (or short-term) reward π‘Ÿπ‘—, from which π‘Ÿπ‘—π‘™ (ru) are earned through the link 𝑙 in the path. Besides π‘Ÿπ‘—, the connection brings a long-term reward 𝑔𝑗(𝐲, 𝐱, Ξ ), which is the sum of the long-term reward contributions on each link in 𝜌. As with the approximate state-dependent model, it is assumed that a class-j connection only affects the network reward through the links of the path on which it is routed.

4.2.2 The Value Iteration Algorithm

As with the exact and the approximate state-dependent network reward models, the linear system defined by Equation (4.16a) can be solved by setting one of the transient values 𝑣(𝐧𝑙, Ξ ) to zero. Then a system with |Ω𝐧𝑙| equations and |Ω

𝐧

𝑙| unknowns is obtained that can be solved by standard linear algebra methods. An alternative solution method, known as the value iteration algorithm (VIA), has successfully been applied to reward-based routing in telephone and packet-switched systems [DM92, DM94, Dzi97]. The VIA and its properties are thoroughly studied in [Tij86], and a comprehensive formulation of this method for Markov decision processes is presented in [Dzi97] - appendix B. The idea behind the VIA is to avoid a direct solution of the linear system by using a numerical iteration procedure. The procedure reduces the computation time, thereby facilitating the implementation of the macrostate-dependent model for online resource allocation. To apply the VIA, the first step is to discretize Equation (4.16a). For this, observe that from Equation (4.16c) we have:

𝑉(𝐧𝑙, Ξ , 𝑑) βˆ’ 𝑉(𝐧𝑙, Ξ , 𝑑 βˆ’ 1) = 𝑅𝑙(Ξ ) , π§π‘™βˆˆ Ω 𝐧

𝑙 (4.17) Similarly, from Equation (4.16d):

𝑉(𝐧𝑙+ 𝛅

𝑗𝑙, Ξ , 𝑑 βˆ’ 1) βˆ’ 𝑉(𝐧𝑙, Ξ , 𝑑 βˆ’ 1) = 𝑣(𝐧𝑙+ 𝛅𝑗𝑙, Ξ ) βˆ’ 𝑣(𝐧𝑙, Ξ ) (4.18) Therefore, Equation (4.16a) can be re-written from these two equations as:

𝑉(𝐧𝑙, Ξ , 𝑑) βˆ’ 𝑉(𝐧𝑙, Ξ , 𝑑 βˆ’ 1) = π‘ž(𝐧𝑙) + βˆ‘ πœ† 𝑗𝑙(𝐧𝑙, Ξ ) βˆ™ [𝑉(𝐧𝑙+ 𝛅𝑗𝑙, Ξ , 𝑑 βˆ’ 1) βˆ’ 𝑉(𝐧𝑙, Ξ , 𝑑 βˆ’ 1)] 𝐽 𝑗=1 + βˆ‘π½ π‘›π‘—π‘™βˆ™ Β΅π‘—βˆ™ [𝑉(π§π‘™βˆ’ 𝛅𝑗𝑙, Ξ , 𝑑 βˆ’ 1) βˆ’ 𝑉(𝐧𝑙, Ξ , 𝑑 βˆ’ 1)] 𝑗=1 , π§π‘™βˆˆ Ω𝐧𝑙 (4.19)

βˆ‘π½π‘—=1πœ†π‘—π‘™(𝐧𝑙, Ξ ) βˆ™ 𝜏 βˆ™ [𝑉(𝐧𝑙+ 𝛅𝑗𝑙, Ξ , π‘˜ βˆ’ 1) βˆ’ 𝑉(𝐧𝑙, Ξ , π‘˜ βˆ’ 1)]+

βˆ‘π½π‘—=1π‘›π‘—π‘™βˆ™ Β΅π‘—βˆ™ 𝜏 βˆ™ [𝑉(π§π‘™βˆ’ 𝛅𝑗𝑙, Ξ , π‘˜ βˆ’ 1) βˆ’ 𝑉(𝐧𝑙, Ξ , π‘˜ βˆ’ 1)], π§π‘™βˆˆ Ω𝐧 𝑙 (4.21) which defines a recurrence relation, with π‘˜ being the iteration index. Notice that from Equation (4.17) [Dzi97]:

lim π‘˜β†’βˆž[𝑉(𝐧

𝑙, Ξ , π‘˜) βˆ’ 𝑉(𝐧𝑙, Ξ , π‘˜ βˆ’ 1)] β†’ 𝑅𝑙(Ξ ) βˆ™ Ο„ (4.22) The reward 𝑉(𝐧𝑙, Ξ , π‘˜) is then calculated from Equation (4.21) as:

𝑉(𝐧𝑙, Ξ , π‘˜) = π‘ž(𝐧𝑙) βˆ™ Ο„ + βˆ‘ πœ†

𝑗𝑙(𝐧𝑙, Ξ ) βˆ™ Ο„ βˆ™ [𝑉(𝐧𝑙+ 𝛅𝑗𝑙, Ξ , π‘˜ βˆ’ 1) βˆ’ 𝑉(𝐧𝑙, Ξ , π‘˜ βˆ’ 1)] 𝐽

𝑗=1 +

βˆ‘π½π‘—=1π‘›π‘—π‘™βˆ™ Β΅π‘—βˆ™ 𝜏 βˆ™ [𝑉(π§π‘™βˆ’ 𝛅𝑗𝑙, Ξ , π‘˜ βˆ’ 1) βˆ’ 𝑉(𝐧𝑙, Ξ , π‘˜ βˆ’ 1)] + 𝑉(𝐧𝑙, Ξ , π‘˜ βˆ’ 1), π§π‘™βˆˆ Ω𝐧 𝑙 (4.23) Based on this, for a link 𝑙, the VIA is implemented to solve the linear system for 𝑅𝑙(Ξ ) and the transient reward values 𝑣(𝐧𝑙, Ξ ) as follows [RB16b]:

1. Define the parameter Ο΅ β‰₯ 0 as a scalar that denotes the relative accuracy with which 𝑅𝑙(Ξ ) and the values 𝑣(𝐧𝑙, Ξ ) need to be estimated.

2. Set π‘˜ = 0. For each macrostate π§π‘™βˆˆ Ω𝐧𝑙 assign an arbitrary value to the reward 𝑉(𝐧𝑙, Ξ , 0). 3. Set π‘˜ = π‘˜ + 1. For each macrostate 𝐧𝑙, determine 𝑉(𝐧𝑙, Ξ , π‘˜) by calculating the right hand side

of Equation (4.23).

4. With the results obtained in the previous step, calculate the upper estimate of 𝑅𝑙(Ξ ) βˆ™ Ο„ as: π‘€π‘˜ = max

π§π‘™βˆˆβ„¦π§π‘™{𝑉(𝐧

𝑙, Ξ , π‘˜) βˆ’ 𝑉(𝐧𝑙, Ξ , π‘˜ βˆ’ 1)} (4.24)

and the lower estimate of 𝑅𝑙(Ξ ) βˆ™ Ο„ as: mk= min

π§βˆˆβ„¦π§π‘™{𝑉(𝐧

𝑙, Ξ , π‘˜) βˆ’ 𝑉(𝐧𝑙, Ξ , π‘˜ βˆ’ 1)} (4.25)

5. If the upper and lower estimates obtained for 𝑅𝑙(Ξ ) βˆ™ Ο„ fulfil the inequality:

(Mkβˆ’ mk)/mk≀ Ο΅ (4.26) the iteration process is stopped. In this case, the approximate solution is:

𝑅𝑙(Ξ ) βˆ™ Ο„ β‰…π‘€π‘˜+mk

and the transient reward values are calculated from: 𝑣(𝐧𝑙+ 𝛅

𝑗𝑙, Ξ ) βˆ’ 𝑣(𝐧𝑙, Ξ ) β‰… 𝑉(𝐧𝑙+ 𝛅𝑗𝑙, Ξ , π‘˜) βˆ’ 𝑉(𝐧𝑙, Ξ , π‘˜) (4.28) If the inequality is not fulfilled, then go back to step tree, i.e. start a new iteration, and continue until (Mkβˆ’ mk)/mk≀ Ο΅.

The VIA calculates an approximate solution for the linear system whose accuracy depends on the parameter Ο΅. Independently of the reward values 𝑉(𝐧𝑙, Ξ , 0) used for π‘˜ = 0, the VIA always converges, which is a consequence of the existence of the limit in Equation (4.22). The computational complexity of the VIA per iteration is quadratic in the number of macrostates, i.e. O(∝ |Ω𝐧𝑙|2). Moreover, the number of iterations required is polynomial in |Ω𝐧𝑙| (see [Tij86]). The VIA avoids the solution of the linear system by linear algebra methods which may be too slow under an online resource allocation scenario.

4.2.3 Simplified Macrostate-Dependent Link Model

The cardinality of Ω𝐧𝑙 grows with the size (i.e. the number of components) of the link macrostates. That size equals the number of classes 𝐽 served by the network. The components 𝑛𝑗𝑙 of 𝐧𝑙= (𝑛1𝑙, … , 𝑛𝑗𝑙, … , 𝑛𝐽𝑙) have two properties. First, 𝑛𝑗𝑙= 0 for all classes which can never be carried by the link, i.e. for all 𝑗 such that 𝑗 βˆ‰ J𝑙. Otherwise, 𝑛

𝑗𝑙β‰₯ 0. Secondly, there can be two or more classes in J𝑙 with the same bandwidth 𝑏𝑗 and holding time Β΅π‘—βˆ’1. These two properties can be used to define a new macrostate-space Ω𝐧̂𝑙 with