• No results found

Application of Reinforcement Learning on High Speed Rail Cognitive Radio

N/A
N/A
Protected

Academic year: 2020

Share "Application of Reinforcement Learning on High Speed Rail Cognitive Radio"

Copied!
5
0
0

Loading.... (view fulltext now)

Full text

(1)

2016 International Conference on Artificial Intelligence: Techniques and Applications (AITA 2016) ISBN: 978-1-60595-389-2

Application of Reinforcement Learning on

High-Speed Rail Cognitive Radio

Qing-ting WU, Cheng WU

*

and Yi-ming WANG

School of Urban Rail Transportation, Soochow University, Suzhou, China

*Corresponding author

Keywords: High-speed rail, Cognitive base station, Reinforcement learning.

Abstract. We all know that wireless communication plays a crucial role in the success of

high-speed rail operation. If we apply cognitive radio (CR) technology to individual in high-speed rail, it may raise some problems such as blind learning and frequent channel switch. In this article, we propose to embed the CR technology into base station which called cognitive base station (CBS). Each time, the CBS is qualified to sense all spectrums and choose the best one for individual. The proposed CBS implements reinforcement learning to adapt to environment condition and learn from experience. Finally we prove this model can significantly reduce the frequency hopping.

Introduction

Nowadays, the development of high-speed rail puts forward higher requirement for wireless communication. Especially when the speed is up to 350km/h, the communication channel of the train needs to be switched quickly [1]. Under these circumstances, the state of wireless communication often shows unstable. On the other hand, today’s wireless networks are characterized by a fix spectrum assignment policy which lead to the inefficiency in spectrum usage [2]. The instability in the vehicle wireless communication and low resource utilization foresee the CR technology being applied to high-speed rail environment.

The ultimate aim of CR is to achieve the best available spectrum without interfering with the licensed user (also called primary user) [3].There are few publications about applying CR to high-speed rail. Berbineau presents the first results obtained in the ANR project CORRIDOR (Cognitive Radio for High Speed Railway through Dynamic and Opportunistic spectrum Reuse) which is the first research project devoted to CR systems for Railway in France. But there is no project with real implementation of CR in the railway domain [4]. Wu proposes a wireless cognitive model for high-speed individuals’ spectrum management and shows a small performance improvement in wireless communication [5]. But there still exists some problems. The individual who senses the same environment, may possibly obtain the same results which leads to blind learning. If they compete for the same channel, it will lead to network congestion. On the other hand, too many users not only means the cost of time consuming, but also the interference to neighbors and primary users (PUs). And last but not least is the spectrum handoff resulting from switching cell. Based on these viewpoints, we propose to embed CR technology into base station. The basic idea is that the CBS senses spectrum holes and assigns the appropriate spectrum for individual according to its experience. Compared to WU, it can extremely reduce the computation. At the same time, we enable the CBS to cooperate with each other to further reduce frequency hopping.

The Global System for Mobile Communications Railway (GSM-R) implemented today is based on GSM which is designed for the need of special railway features [6]. The GSM system is dedicated to provide the bidirectional radio bearer for the train signaling systems in a 4MHz band which has a high priority level [7]. We assume to use these authorized bands for wireless communication without interfering with railway control system. In this way, we can improve the spectrum utilization and achieve great quality of service (QoS).

(2)

Cognitive Base Station Model

Our proposed CBS is capable of CR technology, which has four spectrum management function: spectrum sensing, spectrum mobility, spectrum decision and spectrum sharing [8,9]. Each time, it senses all spectrums and obtains the corresponding states. Then it chooses the most available spectrum hole based on a certain criteria for individuals. Once it has detected PUs, it will take actions. Figure 1 shows the CBS architecture. The cognitive engine is the intelligent part which performs CR technology. The policy engine receives the information from cognitive engine and the learning results as return.

Operational CBS Platform

Cognitive Engine Policy

Engine

Communication System

Application

Transport

Network

Link/Mac

PHY

External environment CR Users

Figure 1. CBS architecture.

Initial state and reward

Is PU on?

Yes

Assign -15 reward

Change state

No Assign +5 reward

[image:2.595.81.424.197.357.2]

Figure 2. The Q-Learning process on CBS mode.

We deploy the CBS along the railway. Each CBS agent has the same spectrum bands and its own PUs. The spectrum band is indexed as𝑓 ∈ 𝑭, 𝑭 = {1,2, … … , 𝑀}. 𝑀is the number of spectrum. We denote individuals as CR agents. The CBS agents assign spectrums independently to the CR agents in the range. A choice of spectrum by the CBS agent 𝑖 is essentially the choice of the frequency represented by 𝑓𝑖. In each slot time, CBS monitors all spectrums especially the chosen one [10].

Spectrum Management Based Cognitive Base Station

The Reinforcement Learning

Reinforcement learning (RL) [11] is the problem faced by an agent that must learn behavior through trial-and error interactions with a dynamic environment [12]. Q-Learning is a popular technique in RL which uses an iteration update method to build a policy. A learning agent observes states and rewards from external environment, learns, decides and conducts actions based on these observation [13,14]. The formula of Q-learning is as follow:

Q(st, at) ← Q(st, at) + α[rt+ γmaxQ(st+1, a) − Q(st, at)] (1)

where

(1) 𝑠𝑡 ∈ 𝑆 represents state which is observed by the agent from its operating environment, (2) 𝑎𝑡 ∈ 𝐴 represents action which may affect the state,

(3) 𝑟𝑡∈ 𝑅 is the reward which is the positive or negative effects of an action, (4) 0 ≤ γ ≤ 1 represents discount factor,

(5) 0 ≤ α ≤ 1 represents learning rate.

Application to Cognitive Base Station

Cognitive base station is a learning agent, denoted 𝐶𝐵𝑆. As we mentioned above, the CR network consists of CBS agents and CR users. Each CBS has its own primary users and spectrum bands, denoted 𝑃𝑈𝑠 and 𝑆𝑃 respectively. For each CBS, we define:

(3)

(2) 𝐴 = {𝑠𝑤𝑖𝑡𝑐ℎ𝑡𝑜𝑐ℎ𝑎𝑛𝑛𝑒𝑙1, 𝑠𝑤𝑖𝑡𝑐ℎ𝑡𝑜𝑐ℎ𝑎𝑛𝑛𝑒𝑙2, . . . , 𝑠𝑤𝑖𝑡𝑐ℎ_𝑡𝑜_𝑐ℎ𝑎𝑛𝑛𝑒𝑙𝑀 , 𝑡𝑟𝑎𝑛𝑠𝑚𝑖𝑡_𝑑𝑎𝑡𝑎} , as a

set of actions. At a particular time and a particular state, the CBS will take action according to learning results to either switch channel or transmit,

(3) 𝑅⃗ = {5, −15} as a set of rewards which is the estimate for spectrum usage availability on a CBS agent. If a PU’s activity occurs in the spectrum shared by any CR user, and in the slot same selected for transmission, then a high penalty of -15 is assigned. If a packet is successfully transmitted from the sender to receiver, and a reward of +5 is assigned, which is found experimentally to give the best results.

At time t, the state of CBS i is 𝑠𝑡, 𝑠𝑡 ∈ 𝑆𝑃⃗⃗⃗⃗ . In the next time slot, it first senses all spectrums. Then renews the state-action value according to reinforcement learning scheme. Finally it takes action according to the situation of 𝑠𝑡. Figure 2 illustrates the proposed process.

Performance Evaluation

In this section, we evaluate the performance of our CBS with reinforcement learning (CR-RL) scheme with a comparison to CBS with Round-Robin (CR-RR) scheme. The Round-robin (RR) scheme is a typical way in GSM-R system which employs the principle that once a spectrum is not available, the agent switches to next channel in equal portions and in circular order. Figure 3 shows the simulation topology. Below, we report results of our experiment that is performed in our NS-2 platform. Table 1 lists the simulation parameters and their values.

CBS agent

CBS agent

CBS agent

CBS agent PU agents

PU agents

PU agents PU agents

CR agent

[image:3.595.59.538.373.543.2]

Figure 3. The cognitive base station within high-speed-rail transportation.

Table 1. Simulation parameter.

(4)

(a)An example about the distribution of spectrums occupancy on CBS with 5 spectrums.

(b) Average rewards for 5 spectrum bands (c) Average rewards for 10 spectrum bands

(d) Cumulative number of channel switching for 20 spectrum bands

[image:4.595.140.454.68.231.2]
(5)

Conclusion

Applying CR to railway is cutting edge of the world. To improve the performance of wireless communication in high-speed rail, we propose to embed CR technology into base station. Our experiment shows that CBS with reinforcement learning scheme can reduce frequency hopping.

Acknowledgement

We thank the financial support from National Science Foundation (NNSF No. 61471252).

References

[1] Sun T, Zhou K, Luo X, et al. Research on the Fast Handover Algorithms of GSM-R for High-Speed Railway[C]//Network and Information Systems for Computers (ICNISC), 2015 International Conference on. IEEE, 2015: 213-218.

[2] Akyildiz I F, Lee W Y, Vuran M C, et al. Next generation/dynamic spectrum access/cognitive radio wireless networks: a survey[J]. Computer networks, 2006, 50(13): 2127-2159.

[3] Haykin S. Cognitive radio: brain-empowered wireless communications[J]. IEEE journal on selected areas in communications, 2005, 23(2): 201-220.

[4] Berbineau M, Masson E, Cocheril Y, et al. Cognitive radio for high speed railway through dynamic and opportunistic spectrum reuse[J]. Transport Research Arena, Paris (April 2014), 2014.

[5] Wu C, Wang Y M, Qiang X, et al. Adaptive Spectrum Management of Cognitive Radio in Intelligent Transportation System[C]//Applied Mechanics and Materials. Trans Tech Publications, 2015, 743: 765-773.

[6] Tingting G, Bin S. A high-speed railway mobile communication system based on LTE[C]//Electronics and Information Engineering (ICEIE), 2010 International Conference On. IEEE, 2010, 1: V1-414-V1-417.

[7] Sniady A, Soler J. An overview of GSM-R technology and its shortcomings[C]//ITS Telecommunications (ITST), 2012 12th International Conference on. IEEE, 2012: 626-629.

[8] Chkirbene Z, Hamdi N. A survey on spectrum management in cognitive radio networks[J]. International Journal of Wireless and Mobile Computing, 2015, 8(2): 153-165.

[9] Lee W Y, Akyildiz I F. Spectrum-aware mobility management in cognitive radio cellular networks[J]. IEEE Transactions on Mobile Computing, 2012, 11(4): 529-542.

[10]Wu C, Chowdhury K, Di Felice M, et al. Spectrum management of cognitive radio using multi-agent reinforcement learning[C]//Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: Industry track. International Foundation for Autonomous Agents and Multiagent Systems, 2010: 1705-1712.

[11]Sutton R S, Barto A G. Reinforcement learning: An introduction[M]. Cambridge: MIT press, 1998.

[12]Kaelbling L P, Littman M L, Moore A W. Reinforcement learning: A survey[J]. Journal of artificial intelligence research, 1996, 4: 237-285.

[13]Ozekin E E, Demirci F C, Alagoz F. Self-evaluating reinforcement learning based spectrum management for cognitive Ad Hoc networks[C]//The International Conference on Information Networking 2013 (ICOIN). IEEE, 2013: 444-449.

Figure

Figure 2. The Q-Learning process on CBS mode.
Table 1. Simulation parameter.
Figure 4. CBS simulations with RL and RR schemes.

References

Related documents

An execution context contains a collection of (key, value) associations representing the local variables and functions, a prototype whose members can be accessed through the this

Customer Information- processing Customer-related digital real-time data use Customer knowledge process Customer responsiveness Competitor responsiveness Technological

13 R & D Development of the Rodin platform supporting Event-B Automatic refinement, as developed by Siemens TS in Atelier B Deployment of Event-B in industry Development of

The high-frequency variation in the risk-neutral 10-year yield that is allowed for (but not assumed) by the RW model is arguably one of its strengths—indeed, Gürkaynak, Sack,

Thus, changeability and diagnosticity should represent boundary conditions for self-affirmation, such that negative feedback should be changeable and diagnostic for self-

The comparison is done to determine the schedule system that would be optimal for powering base stations at the given loads range through: (1) levelised cost of energy (LCOE);

(2004): International labour mobility and uneven regional development in Europe, European Urban and Regional Studies, 11(1): 27-46... (2002): International petty trading:

Efforts to Prevent Errors and Fraud The Department of Labor & Industries (L&I) wants to help employers, workers and medical providers avoid making mistakes that are costly