Chapter 4 Approximate Approach to Online Resource Allocation
4.1. Approximate Reward Maximization Problem
the computational complexity involved in exact (state-dependent) network models. For loss or circuit- switched networks, an approach which has been proved to be suitable under certain limiting regimes is the reduced load approximation [Kel91, KY14, CR93]. It is based on the link independence assumption which supposes statistical independence of the link state distributions. In this section we derive the state- dependent network reward model that this assumption defines. Then in the subsequent sections we use the model to formulate an approximate approach to online resource allocation.
4.1.1 The Link Independence Assumption
A stochastic optical network is said to fulfil the link independence property if the there are no correlations (i.e. no mutual relationships) among the state distributions of the network links. In this case, each link blocks independently, and thus, the probability π΅ππthat a path π β Ξπ blocks a class-j connection is:
π΅π π
= 1 β
β (
πβπ 1 β π΅ππ)
(4.1a) where π΅ππ is the probability that a class-j connection gets blocked by the link π in π. Since the class-j traffic is offered to all paths in Ξπ, the probability that the network blocks a class-j connection is:Table 4.1: Notation for relevant variables and parameters defined in Chapter 4. Symbol Units Description
π π(Ξ ) ru/uot Mean rate at which link π earns reward from carried connections πππ ru/con Reward parameter of class-j connections on link π
πππ
(
Ξ)
con/uot Mean rate at which class-j connections arrive at link π π΄ππ Erlangs Mean class-j traffic offered to link ππ΄cππ Erlangs Mean class-j traffic carried over link π
π΅ππ --- Blocking probability of class-j connections on link π πππ
(
π±π, Ξ)
con/uotMean rate at which class-j connection arrivals cause the link state transition π±π β π²π
π£(π±π, Ξ ) ru Transient reward earned by having set link π in state π±π at π‘0
π(π±π, Ξ , π‘) ru Reward expected at π‘ β« π‘0, if link π were set in state π±π at π‘0. This reward is earned at a constant link reward rate π π(Ξ )
Ξπ±π+π , Ξπ±πβπ ---
Sets of link states reachable due to class-j connection admissions and departures in link state π±π, respectively
πππ(π²π, π±π, Ξ ) ru
State-dependent link reward gain πππ(π²π, π±π, Ξ ) = π£
(
π²π, Ξ)
β π£(
π±π, Ξ)
. It is the long-term reward earned by link π at π‘ β« π‘o from a class-j connection arriving in state π±π that causes the link state transition π±π β π²ππππ
(
π§π, Ξ)
con/uotMean rate at which class-j connection arrivals cause the link macrostate transition π§π
β
π§π+ ππ π
π£(π§π, Ξ ) ru Transient reward earned by having set link π in macrostate π§π at π‘0
π(π§π, Ξ , π‘) ru Reward expected at π‘ β« π‘0, if link π were set in macrostate π§π at π‘0, this reward is earned at a constant link reward rate π π(Ξ )
πππ(π§π, Ξ ) ππ
π(π§π, Ξ ) = π£
(
π§π+ π ππ, Ξ
)
β π£(
π§π, Ξ).
Macrostate-dependent link reward gain. It is the long-term reward earned by link π from a class-j connection arriving in macrostate π§π that causes the transition π§πβ
π§π+ ππ π
ππ(π², π±, Ξ ) ru
State-dependent network reward gain. It is the long-term reward brought to the network by a class-j connection arriving in state x. Upon admission, the connection causes the network state transition π± β π² . This reward is estimated as follows:
- Exact state-dependent model: ππ(π², π±, Ξ ) = π£(π², Ξ ) β π£(π±, Ξ ) - Approximate state-dependent model: ππ
(
π², π±, Ξ)
ββ
πππ(
π²π, π±π, Ξ)
πβπ
- Approximate macrostate-dependent model: ππ(π², π±, Ξ ) β
β
πππ(π§π, Ξ ) πβπwhere π is the path on which the connection is routed X
Μ
π§π --- Set of link states π±π that yield the carried traffic (or macrostate) π§πππ§,ππ ---
Probability of observing the macrostate π§π defined by a non-blocking state π±π in X
Μ
π§ π
πππ(Ξ ) = ππβ πΜππ(Ξ )
βπβΞππΜππ(Ξ ) (4.4) where π
Μ
ππ
(
Ξ)
is the acceptance rate of class- j traffic on path π. This rate can be determined from online traffic measurements.Equation (4.2) states that the traffic offered to link π is the traffic offered to the path π reduced at every link π β π in π by a factor (1 β π΅ππ ). As a result, πππ
(
Ξ)
describes a Poisson process from which the link performance, defined as the link reward rate π π(Ξ ), can be evaluated from a performance function that depends on πππ(Ξ ), the link capacity πΆπ, and the parameters Β΅πβ1, ππ and ππ of the π½ connection classes:π π(Ξ ) = π(π
ππ(Ξ ), Β΅πβ1, ππ, ππ,πΆπ) (4.5) Therefore, if the network fulfils the link independence property - as defined by Equations (4.1)-(4.4), two equivalent approaches may evaluate the network reward rate π (Ξ ). First, the exact state-dependent model derived in Chapter 3. And secondly, a link-based approach whereby π (Ξ ) is determined by the performance functions of the network links. Such functions are given by Equation (4.5) which, as it will be shown in Section 4.1.3, defines a linear system similar to that obtained for the exact network model. From a computational perspective, the link based-approach is more advantageous as the performance functions are defined over link state-spaces which have smaller cardinalities than the network state-space. By this approach exact results are only obtained when the link independence property is satisfied. This occurs in networks that mainly route traffic on direct link paths or in networks that, besides operating at low traffic loads, use multi-link paths with a few number of links [Dzi97,CR93]. Although not all real network scenarios comply with these conditions, the link-based approach can still be applied (owing to its reduced computational complexity) as an approximate method to estimate the network reward rate π (Ξ ). In most real networks correlations exist and they can be classified into two types, namely, link and path correlations. The former ensue from connections routed over multi-link paths (the larger the number of links in the path, the larger the correlations). The latter stem from the routing decisions that assign traffic flows to the paths in Ξπ. If the link-based approach estimates π (Ξ ) in networks where correlations of either type exist, there will be an estimation error that grows with the strength of the correlations. In a specific network scenario, it is difficult to quantify how strong or weak these correlations are. The reason is that they are not only influenced by the definition of the set Ξπ (i.e. the definition of the number of routes and the number of links in each route) but are also determined by the traffic characteristics and the policy in use. In spite of this, the accuracy of the approach can be improved by employing simple network design rules that counteract correlations. One of them is to calculate sets Ξπ with a few number of paths, where each path has a short length w.r.t. the number of links. Another strategy is to implement admission control rules that avoid accepting connections on large multi-link paths (especially in high traffic load conditions).
The term βlink independence assumptionβ has widely been used in the literature to refer to network models that assume statistical independence of the link state distributions β and thus, those models rely on link-based approaches that estimate the network performance. For example, the performance evaluation results in [Kel88, Kel11, Kel91, KY14, CR93, DPKW88, DM89, DM92, DM94, Kri91, HKT00, Hwa93, Dzi97, Nor02] show that, for telephone and packet-switched data networks, the assumption is a valid approach to designing network control mechanisms. Inspired by these results, in the following we use the assumption to formulate an approximate state-dependent stochastic model that estimates the network reward rate π (Ξ ). A comprehensive study of the link independence assumption, and its implications, can be found in [Dzi97].
Figure 4.1: Flex-grid optical network with four nodes, five links and 12 classes [RB16d].
4.1.2 Reformulation of the Network Reward Maximization Problem
The link independence assumption has successfully been applied to reward-based routing in packet- switched networks, see for example the approaches proposed in [DPKW88, DM89, DM92, DM94, Kri91, HKT00, Hwa93, Dzi97, Nor02]. By using this approximation, we have that the rate π (Ξ ) at which the network earns reward can be calculated as the sum of the link reward rates π π(Ξ ):
π (Ξ ) = β π π(Ξ )
π (4.6) and therefore, the network reward maximization problem is now defined as:
π β= max Ξ β π
π(Ξ )
π (4.7) Since Equation (4.6) assumes that all network links block independently, then the assumption is made that any connection modifies π (Ξ ) by solely changing the rates π π(Ξ ) of the links on which it is carried. This implies that for a class-j arrival in state π±, the network reward rate attainable by the policy decision Ξ (π±, π) is calculated from Equation (4.6) as:
π (Ξ (π±, π)) = β π π(Ξ (π±, π))
πβπ (4.8) where the sum is over the links in the path π on which the connection is routed. Notice that, as seen in Chapter 3, the selected path π is implicitly defined by the state π² β Ξπ±π+, which results from the transition π± β π² caused by the decision Ξ (π±, π). The striking consequence of this is that to calculate a decision that outperforms Ξ (π±, π), it is only necessary to know the reward rates of the links that make up the paths in Ξπ. The applicability of this result will be explored in Section 4.4. In the meantime, let us formulate the state-dependent network reward model that results from the link independence assumption.
Example 4.1 Consider the optical network in Fig. 4.1, which has four nodes, five links and serves 12
connection classes (two classes per node-pair). Under the link independence assumption, the network reward rate is given by π (Ξ ) = π 1(Ξ ) + π 2(Ξ ) + π 3(Ξ ) + π 4(Ξ ) + π 5(Ξ ). Let us consider class-7 connections, which can only be routed over the paths in the set Ξ7= {(A, B), (A, C, B)}. Under the link independence assumption, it is assumed that class-7 can only modify π (Ξ ) by changing the rates of the links in Ξ7. For instance, if the policy Ξ has a decision Ξ (π±, 7), such that a class-7 connection is admitted in state π± on the path π = (A, C, B), then from Equation (4.8) we have that π (Ξ (π±, 7)) = π 1(Ξ (π±, 7)) + π 3(Ξ (π±, 7)), with π = 1 and π = 3 denoting link (A, C) and link (C, B), respectively.
Every link π has a capacity of πΆπ spectrum slots. The exact state-dependent reward model for this network is given by the set of Equations (3.29) in Chapter 3, where π (Ξ ) is calculated by solving a linear system defined in the state-space β¦π±. Before applying the link independence assumption to the derivation of a simplified network reward model, let us first calculate the rate at which a network link π earns reward.
The network stochastic process {ππ‘Ο, π‘ β₯ 0} implicitly defines for every link π, a continuous-time stochastic process {ππ‘π,Ο, π‘ β₯ 0}. This process describes the time evolution of the link state when a policy Ξ is used. The random variable ππ‘π,Ο takes its values from the link state-space β¦π±π, and thereby, ππ‘π,Ο is the link state at time π‘. The stochastic properties of this process stem for the parameters (π, π)π, ππ, Β΅πβ1, ππ, Ξπ, and ππ that define each connection class. In Fig. 4.2, we present a simplified view of the most relevant parameters and variables that describe the stochastic behaviour of a link π.
Class-j connections arrive at link π at a rate πππ(Ξ ) (con/uot), which is policy-dependent as it originates from the decisions defined in Ξ , and thus, we have that according to Equation (4.2), πππ(Ξ ) is not equal to the arrival rate ππ. Then class-j offers to the link a traffic load (in Erlangs) given by:
π΄ππ=πππ(Ξ )
ππ (4.9) From this load, the link carries a traffic:
π΄πcπ= π΄ππ β (1 β π΅ππ) (4.10)
where π΅ππ is the probability that a class-j connection request is blocked by the link. If the network is in state π± = (π±1, β¦ , π±π, β¦ , π±πΏ), and the policy Ξ admits a class-j connection on a path π, such that π β π, then the link π makes the state transition π±πβ π²π,where π±π, π²πβ β¦
π±
π. As a result of this event, the connection seizes ππ adjacent slots in the link, and the network earns an immediate reward of ππ (ru). From this reward, πππ (ru) are earned over link π. We denote this reward as the link reward parameter which is calculated as:
πππ= { ππ
|π|, if π β Jπ
0, if π β Jπ (4.11) Recall that Jπ is the set of class-j connections which can be carried on link π (see Table 3.1 in Chapter 3). Thus, for class-j connections which can never be routed through link π, we have that πππ= 0, i.e. the link π is not within the paths in Ξπ. Otherwise, πππ is ππ divided by the number of links |π| in the path π, i.e. the reward parameter ππ is evenly distributed among all links in π. This reward division rule has been studied along with other rules in [DM89, DM92, DM94, Dzi97] for routing in telephone and packet-switched networks. The performance results therein presented show that Equation (4.11) gives the simplest and most effective rule to apportion the reward parameters ππ among network links. We then interpret πππ as the immediate or short-term reward earned by a link π from a class-j connection.
Example 4.2 In order to illustrate the reward division rule, consider the link π = 5 in Fig. 4.1, i.e. that interconnecting the nodes B β D. This link is in the path π = (A, B, D) β Ξ3, Ξ9, which has |π| = 2 links, it is also in the path π = (B, D) β Ξ5, Ξ11, which has |π| = 1, and it is in the path π = (C, B, D) β Ξ6, Ξ12, which has |π| = 2 (see Fig. 4.1). Therefore, the classes carried by this link are J5= {3,5,6,9,11,12}. From
Equation (4.11), these six classes define on the link B β D the reward parameters: π35= 0.5, π55= 1.0, π65= 0.5, π95= 0.5, π115 = 1.0 and π125 = 0.5 (ru). For the remaining six classes ππ5= 0, as they are not defined in J5. If a class-12 connection is routed over the path (C,B,D), it brings an immediate reward of r12= 1 (ru) to the network, where from this reward r125 = 0.5 (ru) are earned over the link B β D.
Since π΄cππ is the mean number of class-j connections simultaneously carried on link π, class-j traffic is expected to yield reward at a mean rate πππβ Β΅πβ π΄cππ. Therefore, link π earns reward at a mean rate π π(Ξ ) =
β πππβ Β΅πβ π΄c
π
π
j (ru/uot), from which we have that the link performance function in Equation (4.5) is: π π(Ξ ) = β π
ππβ πππ(Ξ ) β (1 β π΅ππ) π½
π=1 (4.12) Let us now apply the link independence assumption to calculate the dependence of Equation (4.12) on the link states. If at time π‘π the network is in state π± = (π±1, β¦ , π±π, β¦ , π±πΏ), the probability of observing at time π‘π+1 the link π in a state π²π (as a result of a transition π± β π² which causes the link state transition π±πβ π²π) depends only on the link state π±π at time π‘π:
π{ππ‘π+1 π,Ο = π²π|π π‘π 1,Ο= π±1, β¦ , π π‘π π,Ο= π±π, β¦ , π π‘π πΏ,Ο= π±πΏ} = π{π π‘π+1 π,Ο = π²π|π π‘π π,Ο= π±π} (4.13) i.e. the states of the remaining links at time π‘π have no influence on the link state transition (and thus, the correlations among links are negligible). Based on this, we have that each network link can be treated independently from one another, i.e. the optical network behaves as a compound of πΏ independent single- link networks. Therefore, the dependence of π π(Ξ ) on the link state π±π is obtained by applying the set of Equations (3.29) in Chapter 3 to the case of a single-link network [RB16b]:
π π(Ξ ) = π(π±π) + β β π ππ(π±π, Ξ ) β [π£(π²π, Ξ ) β π£(π±π, Ξ )] π²πβΞ π±π π+ + π½ π=1 β β ππβ [π£(π²π, Ξ ) β π£(π±π, Ξ )] π²πβΞ π±π πβ π½ π=1 , π±πβ β¦π±π (4.14a) where Ξπ±π+π and Ξπ±πβπ are the sets of link states which are reachable due to class-j connection admissions
and departures in state π±π, respectively. The first double summation is the contribution to π π(Ξ ) owing to connection requests that arrive at the link in state π±π. Connections arrive at a state-dependent rate πππ(π±π, Ξ ) - the actual relationship between πππ(π±π, Ξ ) and πππ(Ξ ) will be studied in Section 4.3. The second double summation is the contribution to the link reward rate due to departures. Moreover, π(π±π) is the rate at which the link yields reward in state π±π, and is given by:
π(π±π) = β π
ππβ ππ(π±π) π½
π=1 (4.14b) with ππ(π±π) being the termination rate of carried class-j connections, which is calculated as:
ππ(π±π) = π
πβ ππ(π±π) (4.14c) where nπ(π±π) β₯ 0 is the number of class-j connections carried by the link in state π±π. In addition to Equation (4.14a), link π fulfils the following two equations:
π(π±π, Ξ , π‘) = π π(Ξ ) β π‘ + π£(π±π, Ξ ) , π±πβ β¦ π±
π (4.14d) πππ(π²π, π±π, Ξ ) = π(π²π, Ξ , π‘) β π(π±π, Ξ , π‘) = π£(π²π, Ξ ) β π£(π±π, Ξ ) (4.14e) with π(π±π, Ξ , π‘) being the reward expected at π‘ β« π‘0, if the link was in state π±π at π‘0. Equation (4.14d) states that in steady state, the link earns reward at a rate π π(Ξ ) regardless of the link state π±π at π‘0. Furthermore, Equation (4.14e) defines πππ(π²π, π±π, Ξ ) as the state-dependent link reward gain. It is the long-term reward that a class-j connection brings to the link if it gets admission in state π±π.
Equation (4.14a) defines a linear system over all link states in β¦π±π. Given that for all π±π the parameters π(π±π), π
ππ(π±π, Ξ ) and ππ are known, the linear system is solved for π π(Ξ ) and the values π£(π±π, Ξ ). This is accomplished by arbitrarily setting one of the values π£(π±π, Ξ ) to zero. Thus, under the link independence assumption, Equations (4.14) define an approximate, state-dependent network reward model whereby the
πππ(π²π, π±π, Ξ ) = π£(π²π, Ξ ) β π£(π±π, Ξ )
Table 4.2: Comparison between the exact and the approximate state-dependent reward models.
the calculation of the rate π (Ξ ) is performed as follows. First, besides the policy Ξ , the parameters (π, π)π, ππ, Β΅πβ1, ππ, Ξπ, and ππ are defined. Second, for each link, these parameters and the link capacity πΆπ are used to calculate the state-space β¦π±π (every state in β¦π±π must fulfil the spectrum contiguity constraint). Third, the linear system for every link is defined from Equation (4.14a) based on the decisions of the policy Ξ . Then each system is solved for π π(Ξ ) and the values π£(π±π, Ξ ). Finally, Equation (4.6) is used to calculate π (Ξ ) as the sum of the πΏ reward rates π π(Ξ ). Compared to the exact model, which only requires the solution of a linear system, the approximate approach involves πΏ independent linear systems. However, those systems have cardinalities |β¦π±π| βͺ |β¦
π±|, which to some extent alleviate the computational requirements