Reinforcement Learning for Resource Allocation and Time Series Tools for Mobility Prediction

Academic year: 2021

Baptiste Lefebvre (1,2), Stephane Senecal (2) and Jean-Marc Kelif (2)

(1) École Normale Supérieure (ENS), Paris, France

(2) Orange Labs, Issy-les-Moulineaux, France

First GdR MaDICS Workshop on Big Data for the 5G RAN 25 November 2015 @ Huawei FRC

Agenda

1 Context
2 Current Controller
3 Proposed Controller
4 Mobility Prediction
5 Conclusion

1 Context

Wireless Networks

UE = User Equipment

Radio Resource Management (RRM)

[Figure: time-frequency resource grid; a PRB (see note 1) spans one slot (0.5 ms) and 12 subcarriers (180 kHz)]

Allocation: sharing of joint time slots and frequency bands

Load:

$$\rho = \sum_{c=1}^{C} \frac{T}{\frac{r}{R}\, D_c}\; n_c$$

Quality of Service (QoS)

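$$\mathrm{QoS} = 1 - \left(\frac{1}{2}\right)^{\rho}$$

Energy/Power Consumption

$$P = P_{BS} + r\left(P_{RS} + \tilde{\rho}\, P_{AP}\right), \qquad \tilde{\rho} = \min(\rho, 1)$$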

1. Physical Resource Block
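The load and power formulas of this slide can be evaluated numerically. The sketch below is only an illustration under stated assumptions: the load is read as the sum over radio conditions of n_c T / ((r/R) D_c), the power as P_BS + r (P_RS + min(ρ, 1) P_AP), T is taken to be the target throughput, D_c the achievable rate in radio condition c, and r the number of active resources out of R. The function names and all numeric values are hypothetical.

```python
def cell_load(n, D, T, r, R):
    """Load rho = sum over radio conditions c of n_c * T / ((r / R) * D_c)."""
    return sum(n_c * T / ((r / R) * D_c) for n_c, D_c in zip(n, D))


def power(rho, r, P_BS, P_RS, P_AP):
    """Power P = P_BS + r * (P_RS + min(rho, 1) * P_AP)."""
    rho_tilde = min(rho, 1.0)  # the load is capped at 1 in the power model
    return P_BS + r * (P_RS + rho_tilde * P_AP)


# Hypothetical numbers, chosen only for illustration.
n = [2, 1]            # UEs per radio condition
D = [1000.0, 500.0]   # rate per radio condition (same unit as T)
rho = cell_load(n, D, T=50.0, r=2, R=4)
print(rho, power(rho, r=2, P_BS=100.0, P_RS=10.0, P_AP=20.0))
```

Doubling the number of active resources r halves the load in this model, while the power grows linearly with r; this is the trade-off the controller has to arbitrate.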

Goal: optimization of the energy consumption under QoS constraints

Formal framework considered: reinforcement learning [SB98], more specifically Markov Decision Processes (MDP) [Put94]:

• A system state enumerates the UEs in each radio condition and the active resources

• An action is either null, a deactivation, or an activation of a resource

• A policy associates an action to perform with every state

• To save energy, one needs to compute or estimate an optimal policy, i.e. a policy that implements a good trade-off between energy (electrical power) consumption and the targeted QoS level
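The state, action and policy notions listed above can be written down as simple data structures. This is a minimal sketch, not the authors' implementation: the names `State`, `feasible_actions` and `toy_policy` are hypothetical, the number of active resources is assumed to stay within 1..R, and the toy rule is only a placeholder for an optimal policy.

```python
import math
from typing import NamedTuple, Tuple


class State(NamedTuple):
    n: Tuple[int, ...]  # number of UEs in each radio condition
    r: int              # number of active resources


ACTIONS = (-1, 0, +1)  # deactivate one resource, do nothing, activate one


def feasible_actions(s: State, R: int) -> Tuple[int, ...]:
    """Actions that keep the number of active resources within 1..R."""
    return tuple(a for a in ACTIONS if 1 <= s.r + a <= R)


def toy_policy(s: State, ues_per_resource: int, R: int) -> int:
    """Hypothetical rule: move r one step towards ceil(total UEs / ues_per_resource)."""
    target = min(max(math.ceil(sum(s.n) / ues_per_resource), 1), R)
    return (target > s.r) - (target < s.r)


s = State(n=(3, 2), r=1)
print(feasible_actions(s, R=4), toy_policy(s, ues_per_resource=2, R=4))
```

A policy is then any function (or lookup table) from `State` to one of the three actions.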

2 Current Controller

MDP Controller

• A controller executes a policy Π which, for a given traffic amount, aims at maximizing an objective function (QoS, power)

• Transition probability operator: P(s,a,s')

• Instantaneous reward function: R(s,a)

• Searching for an optimal policy for a fully known MDP model can be performed by dynamic programming

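Controller for Geometric Criterion

$$\max_{\Pi} \; \mathbb{E}\left[ \sum_{t=0}^{\infty} \varphi^{t} R(s_t, \Pi(s_t)) \,\middle|\, s_0 = s \right]$$

• Solving a system of equations by iterating until reaching a fixed point (geometric criterion):

$$\Pi(s) = \arg\max_{a \in A} \sum_{s' \in S} P(s,a,s') \left( R(s,a) + \varphi V(s') \right)$$

$$V(s) = \sum_{s' \in S} P(s,\Pi(s),s') \left( R(s,\Pi(s)) + \varphi V(s') \right)$$

Parameter $\varphi \in [0, 1)$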
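The fixed-point iteration for the geometric (discounted) criterion can be sketched as a plain value iteration. This is a generic textbook scheme under the assumption of a small, fully known MDP stored in nested lists; the function name and the two-state example are hypothetical and unrelated to the radio model.

```python
def value_iteration(P, R, phi, tol=1e-10, max_iter=100_000):
    """P[s][a][s2]: transition probabilities, R[s][a]: rewards, 0 <= phi < 1.

    Returns (V, policy) where policy[s] is the index of a greedy action.
    """
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(max_iter):
        # Q[s][a] = sum_{s2} P(s,a,s2) * (R(s,a) + phi * V(s2))
        Q = [[R[s][a] + phi * sum(p * V[s2] for s2, p in enumerate(P[s][a]))
              for a in range(len(P[s]))]
             for s in range(n_states)]
        V_new = [max(q) for q in Q]
        converged = max(abs(x - y) for x, y in zip(V_new, V)) < tol
        V = V_new
        if converged:
            break
    policy = [max(range(len(q)), key=q.__getitem__) for q in Q]
    return V, policy


# Toy MDP: in state 0, action 1 moves to state 1 (reward 1);
# in state 1, action 0 stays there (reward 2).
P = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
R = [[0, 1], [2, 0]]
V, policy = value_iteration(P, R, phi=0.5)
print(V, policy)
```

Because the discount factor is strictly below 1, the update is a contraction and the loop terminates for any tolerance.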

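Controller for Average Criterion

$$\max_{\Pi} \; \lim_{T \to \infty} \mathbb{E}\left[ \frac{1}{T} \sum_{t=0}^{T} R(s_t, \Pi(s_t)) \,\middle|\, s_0 = s \right]$$

• Solving a system of equations by iterating until reaching a fixed point (average criterion):

$$\Pi(s) = \arg\max_{a \in A} \sum_{s' \in S} P(s,a,s') \left( R(s,a) + V(s') \right)$$

$$V(s) = \sum_{s' \in S} P(s,\Pi(s),s') \left( R(s,\Pi(s)) + V(s') \right)$$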
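Without a discount factor, iterating these equations directly lets V grow without bound. A standard remedy is relative value iteration, which subtracts the value of a reference state at every sweep. The sketch below shows that textbook variant; it is not necessarily the scheme used by the authors, it assumes a unichain and aperiodic model, and the function name and toy example are hypothetical.

```python
def relative_value_iteration(P, R, ref=0, tol=1e-10, max_iter=100_000):
    """P[s][a][s2]: transition probabilities, R[s][a]: rewards.

    Returns (gain, h, policy): average reward, relative values, greedy policy.
    """
    n_states = len(P)
    h = [0.0] * n_states
    for _ in range(max_iter):
        Q = [[R[s][a] + sum(p * h[s2] for s2, p in enumerate(P[s][a]))
              for a in range(len(P[s]))]
             for s in range(n_states)]
        backup = [max(q) for q in Q]
        gain = backup[ref]                  # estimate of the average reward
        h_new = [v - gain for v in backup]  # keeps h[ref] pinned at 0
        converged = max(abs(x - y) for x, y in zip(h_new, h)) < tol
        h = h_new
        if converged:
            break
    policy = [max(range(len(q)), key=q.__getitem__) for q in Q]
    return gain, h, policy


# Same toy MDP as for a discounted problem: staying in state 1 pays 2 per step.
P = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
R = [[0, 1], [2, 0]]
gain, h, policy = relative_value_iteration(P, R)
print(gain, h, policy)
```

The returned gain is the long-run average reward of the greedy policy, and h plays the role of V up to an additive constant.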

State Transitions and Rewards

• The system evolves in continuous time, not in discrete time

• A continuous-time MDP can be turned into a discrete-time MDP by means of uniformization and discretization schemes

• P(s,a,s') is replaced by Q(s,a,s'), which denotes the transition rate (i.e. the Poisson process parameter)
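Uniformization can be illustrated on the transition part of the model. The sketch below assumes rates are stored as `Q[s][a] = {s2: rate}` (a hypothetical layout) and uses the standard construction: pick a constant Λ at least as large as every total outgoing rate, set P(s,a,s') = Q(s,a,s')/Λ for s' ≠ s, and put the remaining probability mass on a self-loop. Rescaling of the rewards is left out.

```python
def uniformize(Q, Lambda=None):
    """Turn transition rates into transition probabilities.

    Q[s][a] is a dict {s2: rate} of outgoing rates with s2 != s.
    Returns (P, Lambda) with P[s][a][s2] a probability.
    """
    n_states = len(Q)
    out = [[sum(rates.values()) for rates in Q[s]] for s in range(n_states)]
    max_rate = max(max(row) for row in out)
    if Lambda is None:
        Lambda = max_rate
    if Lambda <= 0 or Lambda < max_rate:
        raise ValueError("Lambda must be positive and >= every outgoing rate")
    P = []
    for s in range(n_states):
        P.append([])
        for a, rates in enumerate(Q[s]):
            row = [0.0] * n_states
            for s2, q in rates.items():
                row[s2] = q / Lambda
            row[s] += 1.0 - out[s][a] / Lambda  # fictitious self-transition
            P[s].append(row)
    return P, Lambda


# Two states, one action: rate 2 from state 0 to 1, rate 1 from state 1 to 0.
P, Lambda = uniformize([[{1: 2.0}], [{0: 1.0}]])
print(P, Lambda)
```

Every row of the resulting P sums to 1, so the discrete-time dynamic programming schemes apply unchanged.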

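State Transitions Modeling

$$Q\big((n,r),a,(n',r')\big) = \begin{cases} \lambda_i & \text{if } B_{\lambda_i}(s,a,s') \\ n_i \, \frac{1}{n} \, \frac{r}{R} \, \frac{D_i}{F_i} & \text{if } B_{\mu_i}(s,a,s') \\ 0 & \text{else} \end{cases}$$

$$B_{\lambda_i}(s,a,s') = \big(n' = n + e(i)\big) \wedge \big(r' = r + a\big)$$

$$B_{\mu_i}(s,a,s') = \big(n' = n - e(i)\big) \wedge \big(r' = r + a\big)$$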
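The transition model can be made concrete by enumerating the outgoing rates of one state. The sketch below is a hedged reading of the slide: arrivals of class i occur at rate λ_i, departures at rate n_i (1/n) (r/R) D_i / F_i, the scalar n is assumed to be the total number of UEs, and F_i (not defined on the slide) is assumed to be a mean flow size. The function name and the numbers are hypothetical.

```python
def transition_rates(n, r, a, lam, D, F, R):
    """Return {(n2, r2): rate} for state (n, r) under action a.

    n: tuple of UE counts per radio condition, r: active resources,
    lam[i]: arrival rate of class i, D[i] / F[i]: full-cell service rate.
    """
    total = sum(n)  # assumption: the scalar n of the slide is the total UE count
    r2 = r + a
    rates = {}
    for i in range(len(n)):
        # Arrival of one UE in radio condition i: n' = n + e(i)
        up = tuple(x + (j == i) for j, x in enumerate(n))
        rates[(up, r2)] = lam[i]
        # Departure of one UE in radio condition i: n' = n - e(i)
        if n[i] > 0:
            down = tuple(x - (j == i) for j, x in enumerate(n))
            rates[(down, r2)] = n[i] / total * (r / R) * D[i] / F[i]
    return rates


rates = transition_rates(n=(2, 0), r=2, a=0, lam=(0.5, 0.25),
                         D=(1000.0, 500.0), F=(4000.0, 4000.0), R=4)
for target, rate in rates.items():
    print(target, rate)
```

All other target states have rate 0 and are simply absent from the dictionary.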

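Rewards - Cost Functions

$$C(s,a) = \sum_{\substack{s' \in S \\ Q(s,a,s') \neq 0}} \Big( \gamma\, E(n', r+a) + (1-\gamma)\, F(n', r+a) \Big)$$

$$E(n,r) = \begin{cases} \dfrac{P_{BS} + r P_{RS}}{P_{BS} + R (P_{RS} + P_{AP})} & \text{if } n = 0 \\[2ex] \dfrac{P_{BS} + r (P_{RS} + P_{AP})}{P_{BS} + R (P_{RS} + P_{AP})} & \text{else} \end{cases}$$

$$F(n,r) = 1 - \exp\left( -\log(2)\, \frac{T}{\frac{r}{R\,n}}\, \frac{\sum_{i=1}^{C} n_i}{\sum_{i=1}^{C} D_i n_i} \right)$$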

Current Results

• The optimal policy is a threshold policy

• The optimal policy depends on the traffic volume, on the target throughput and on the cell capacity

• Executing the optimal policy enables energy savings on the order of 40%

• Proposal: take the activation time into account by adding a timer

Optimization under Congestion

The controller does not activate all of the resources in order to reduce congestion as fast as possible

Unused Resources

Excessive QoS

The controller can grant an effective QoS level much greater than the initially targeted QoS level (e.g. 50 kbps → 400 kbps)

3 Proposed Controller

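State Transitions Modeling

$$Q(s,a,s') = \begin{cases} \lambda_i & \text{if } B_{\lambda_i}(s,a,s') \\ n_i \, \frac{1}{n} \, \frac{r+a}{R} \, \frac{D_i}{F_i} & \text{if } B_{\mu_i}(s,a,s') \wedge \neg B(s,a,s') \\ n_i \, \frac{1}{n} \, \frac{r}{R} \, \frac{D_i}{F_i} & \text{if } B_{\mu_i}(s,a,s') \wedge B(s,a,s') \\ 0 & \text{else} \end{cases}$$

$$B_{\lambda_i}\big((n,r),a,(n',r')\big) = \big(n' = n + e(i) \vee (n' = n \wedge n = N)\big) \wedge \big(r' = r + a \vee B(r',a,r)\big)$$

$$B_{\mu_i}\big((n,r),a,(n',r')\big) = \big(n' = n - e(i)\big) \wedge \big(r' = r + a \vee B(r',a,r)\big)$$

$$B(r',a,r) = (r' = r = 1 \wedge a = -1) \vee (r' = r = R \wedge a = 1)$$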

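Ideal and Effective Power Consumption

• Ideal Power Consumption:

$$P^{*}(n) = \begin{cases} P_{BS} + P_{RS} & \text{if } \alpha(n) = 0 \\ P_{BS} + \lceil \alpha(n) \rceil P_{RS} + \alpha(n) P_{AP} & \text{else} \end{cases}$$

• Ideal Number of Resources (2):

$$\alpha(n) = \min\left( \sum_{i=1}^{C} n_i \frac{T}{D_i} R,\; R \right)$$

• Effective Power Consumption:

$$\hat{P}(n,r) = \begin{cases} P_{BS} + r P_{RS} & \text{if } n = 0 \\ P_{BS} + r P_{RS} + r P_{AP} & \text{else} \end{cases}$$

2. Solving equation $F(n,r) = \frac{1}{2}$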

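Power Consumption Error Modeling

• Normalized Regret:

$$E\big((n,r),a\big) = \begin{cases} \dfrac{\hat{P}(n,r) - P^{*}(n)}{R (P_{RS} + P_{AP})} & \text{if } B(r,a) \\[2ex] \dfrac{\hat{P}(n,r+a) - P^{*}(n)}{R (P_{RS} + P_{AP})} & \text{else} \end{cases}$$

$$B(r,a) = (r = 1 \wedge a = -1) \vee (r = R \wedge a = 1)$$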
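The ideal power, effective power and normalized regret can be computed together. The sketch below is an illustration under assumptions rather than the authors' code: the condition "n = 0" is read as "no UE in the cell", T and D_i are read as target throughput and per-condition rate, and all function names and numbers are hypothetical. The symmetrical reward is taken as the negative absolute regret.

```python
import math


def alpha(n, D, T, R):
    """Ideal (fractional) number of resources, capped at R."""
    return min(sum(n_i * T / D_i for n_i, D_i in zip(n, D)) * R, R)


def ideal_power(n, D, T, R, P_BS, P_RS, P_AP):
    a = alpha(n, D, T, R)
    if a == 0:
        return P_BS + P_RS
    return P_BS + math.ceil(a) * P_RS + a * P_AP


def effective_power(n, r, P_BS, P_RS, P_AP):
    if sum(n) == 0:  # empty cell: active resources draw no amplifier power
        return P_BS + r * P_RS
    return P_BS + r * P_RS + r * P_AP


def normalized_regret(n, r, a, D, T, R, P_BS, P_RS, P_AP):
    # A boundary action (deactivate at r = 1, activate at r = R) leaves r unchanged.
    blocked = (r == 1 and a == -1) or (r == R and a == 1)
    r_next = r if blocked else r + a
    gap = (effective_power(n, r_next, P_BS, P_RS, P_AP)
           - ideal_power(n, D, T, R, P_BS, P_RS, P_AP))
    return gap / (R * (P_RS + P_AP))


def symmetric_reward(*args, **kwargs):
    return -abs(normalized_regret(*args, **kwargs))


params = dict(D=(1000.0, 500.0), T=50.0, R=4, P_BS=100.0, P_RS=10.0, P_AP=20.0)
print(normalized_regret((2, 1), 2, 0, **params))
print(symmetric_reward((2, 1), 1, -1, **params))
```

A positive regret means more power is spent than the ideal allocation would need; a negative one means the cell is under-provisioned.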

Rewards - Cost Functions

• Symmetrical Instantaneous Reward:

R(s,a) =−|E(s,a)|

• Asymmetrical Instantaneous Reward:

Overall Performance

             current controller                  proposed controller
β       γ       q̂0.01   q̂0.5    q̂0.99      θ       q̂0.01   q̂0.5    q̂0.99
1/2     0.604   −0.98   +0.40   +0.80      1       −0.02   +0.00   +0.02
        0.5     +0.21   +0.52   +0.84      1e−4    −0.02   +0.00   +0.02
        0.4     +0.23   +0.54   +0.85      1e−8    −0.02   +0.00   +0.02
3/4     0.604   −0.98   +0.33   +0.60      1       −0.08   +0.02   +0.04
        0.5     +0.21   +0.44   +0.66      1e−2    −0.04   +0.02   +0.08
        0.4     +0.22   +0.47   +0.70      1e−4    +0.00   +0.06   +0.12
9/10    0.604   −0.98   +0.03   +0.41      1       −0.34   −0.02   +0.32
        0.5     −0.15   0.21    +0.50      55e−3   −0.13   +0.21   +0.44
        0.4     −0.09   +0.27   +0.54      3e−3    +0.00   +0.35   +0.54

4 Mobility Prediction

Mobility

• Traffic due to arrivals and departures of UEs in the coverage zone of the BS, modeled by Poisson processes

• Movements of UEs inducing propagation losses, shadowing and

Problem Statement

• The activation/deactivation time frame of a physical resource is not taken into account in the modeling

• Idea: predict the states to be visited in the next few seconds

• This approach makes it possible to consider mobile users

• Given the SINR traces of users who crossed the cell and the SINR trace of a user currently crossing the cell, we aim at estimating the SINR to be measured in the near future

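Problem Modeling

• Let $\mathcal{T} = \{T_1, \dots, T_K\}$ denote a set of time series

• Let $T_1 = \langle t_{1,1}, \dots, t_{1,N_1} \rangle$ denote a time series

• ...

• Let $T_K = \langle t_{K,1}, \dots, t_{K,N_K} \rangle$ denote a time series

• Let $T = \langle t_1, \dots, t_N \rangle$ denote a time series to be completed

$$\hat{t}_{N+1} = f(T)$$

$$\hat{t}_{N+1} = g(T), \qquad T_k \sim D$$

$$\hat{t}_{N+1} = h(T), \qquad T_k \sim D = \{D_1, \dots, D_M\}$$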

Dynamic Time Warping (DTW)

• Let $T = \langle t_1, \dots, t_N \rangle$ denote a time series

• Let $T' = \langle t'_1, \dots, t'_{N'} \rangle$ denote another time series

• Let $d$ denote a distance measure between elements of these time series

$$D(t_i, t'_j) = d(t_i, t'_j) + \min\big\{ D(t_{i-1}, t'_{j-1}),\; D(t_{i-1}, t'_j),\; D(t_i, t'_{j-1}) \big\}$$

$$\mathrm{DTW}(T, T') = D(t_N, t'_{N'})$$
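The recurrence translates directly into a dynamic program over an (N+1) x (N'+1) table. The sketch below assumes the usual boundary conditions, which the slide does not state: the empty-prefix cell is 0 and every other border cell is infinite. The element distance defaults to the absolute difference, a hypothetical choice suitable for scalar samples such as SINR values.

```python
def dtw(T1, T2, d=lambda x, y: abs(x - y)):
    """Dynamic time warping cost between two sequences.

    D[i][j] = d(T1[i-1], T2[j-1]) + min(D[i-1][j-1], D[i-1][j], D[i][j-1])
    """
    N, M = len(T1), len(T2)
    INF = float("inf")
    # Row/column 0 stand for empty prefixes (assumed boundary conditions).
    D = [[INF] * (M + 1) for _ in range(N + 1)]
    D[0][0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            D[i][j] = d(T1[i - 1], T2[j - 1]) + min(
                D[i - 1][j - 1],  # advance in both series
                D[i - 1][j],      # advance in T1 only
                D[i][j - 1],      # advance in T2 only
            )
    return D[N][M]


print(dtw([1, 2, 3], [1, 2, 2, 3]))  # same shape, different speed
print(dtw([1, 2, 3], [2, 3, 4]))     # shifted values
```

Both time and memory grow as N x N' here, which is the cost that multi-level approximations aim to reduce.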

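Barycentric Averaging DTW

• Let $\mathcal{T} = \{T_1, \dots, T_K\}$ denote a set of time series

• Let $T_1 = \langle t_{1,1}, \dots, t_{1,N_1} \rangle$ denote a time series

• ...

• Let $T_K = \langle t_{K,1}, \dots, t_{K,N_K} \rangle$ denote a time series

The barycentric averaging DTW $\bar{T}$ satisfies (cf. [PKG11]):

$$\forall N \in \mathbb{N}^{*},\; \forall T = \langle t_1, \dots, t_N \rangle: \quad \sum_{k=1}^{K} \mathrm{DTW}(\bar{T}, T_k)^2 \;\leq\; \sum_{k=1}^{K} \mathrm{DTW}(T, T_k)^2$$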
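The inequality says that the barycentric average minimizes the sum of squared DTW costs to the series of the set. The sketch below only evaluates that objective and picks the best candidate among the input series themselves (a medoid). This is a cruder stand-in used for illustration, not the iterative averaging procedure of [PKG11]; the function names are hypothetical and the compact DTW helper assumes the usual boundary conditions.

```python
def dtw(T1, T2):
    """Compact DTW with absolute-difference element distance."""
    INF = float("inf")
    prev = [0.0] + [INF] * len(T2)
    for x in T1:
        cur = [INF]
        for j, y in enumerate(T2, start=1):
            cur.append(abs(x - y) + min(prev[j - 1], prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def sum_squared_dtw(candidate, series):
    """Objective minimized by the barycentric average."""
    return sum(dtw(candidate, T) ** 2 for T in series)


def medoid(series):
    """Best candidate restricted to the input set (not a true barycenter)."""
    return min(series, key=lambda c: sum_squared_dtw(c, series))


series = [[1, 2, 3], [1, 2, 2, 3], [2, 3, 4]]
for T in series:
    print(T, sum_squared_dtw(T, series))
print(medoid(series))
```

A true barycenter may have a strictly lower objective than any series of the set, since it is free to take values and lengths that no input series has.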

Fast Dynamic Time Warping (FastDTW)

• Multi-level approach for the computation of the dynamic time warping, cf. [SC04]

• Linear space complexity

• Linear time complexity

• Approximation method with good accuracy (tuned via the radius parameter r)

Preliminary Results

Estimates are obtained with a precision on the order of a dB for time horizons on the order of 1 s

Conclusion

Summary:

• Review of state-of-the-art controllers

• Proposal of a modified and improved controller

• Proposal of a mobility prediction mechanism (different from those proposed for inter-cell handover management)

Work in progress/Perspectives:

• Integration of the mobility prediction module into the controller

• Enhancement of the mobility prediction mechanism

• Design of a higher-level control system for many cells, even for an entire network

References

[PKG11] François Petitjean, Alain Ketterlin, and Pierre Gançarski. A global averaging method for dynamic time warping, with applications to clustering. Pattern Recognition, 44(3):678–693, 2011.

[Put94] Martin Puterman. Markov decision processes: discrete stochastic dynamic programming. Wiley-Interscience, 1994.

[SB98] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, 1998.

[SC78] Hiroaki Sakoe and Seibi Chiba. Dynamic Programming Algorithm Optimization for Spoken Word Recognition. Transactions on Acoustics, Speech and Signal Processing, 26(1):43–49, 1978.

[SC04] Stan Salvador and Philip Chan. FastDTW: Toward accurate dynamic time warping in linear time and space.

Thank you!

Thanks for your attention!

Questions?

This research is funded by Orange and supported by the collaborative research project ANR NETLEARN (ANR-13-INFR-0004)

Appendix: example of an MDP-based controller

[Figure: grid of states labelled by pairs from (0, 1) to (5, 4); each state is marked with the action selected by the controller: 0, +1 or −1]
