
3.3 Heuristic Schemes for Baseline Comparison

3.3.2 Spectrum Sensing

The opportunistic spectrum sensing scheme described by the flowchart in Figure 3.6 represents a typical cognitive radio approach to DSA, of the kind introduced in Subsection 2.2.2. In this scheme, a cognitive eNB can sense the interference levels on the subchannels of interest before making spectrum assignment decisions. When a file arrives, the eNB chooses an available subchannel at random and senses the interference level on it. If the interference level is below an admission threshold, the subchannel is assigned; otherwise, the subchannel is marked as unavailable and the interference level is sensed on another randomly selected subchannel. If no subchannels remain available, the file is retransmitted later.

Figure 3.6: Flow diagram of the spectrum sensing based opportunistic spectrum access scheme used for baseline comparison
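The decision loop in Figure 3.6 can be sketched in a few lines of Python. This is a minimal illustration, not the simulator used in this thesis: the function name and the `sense_interference_db` callable (returning the sensed interference on a subchannel in dB relative to the receiver noise floor) are hypothetical, and the 11 dB default is simply the threshold value selected later in this subsection.

```python
import random
from typing import Callable, Iterable, Optional


def sensing_based_assignment(
    available: Iterable[int],
    sense_interference_db: Callable[[int], float],
    threshold_db: float = 11.0,
    rng: Optional[random.Random] = None,
) -> Optional[int]:
    """Return the assigned subchannel, or None if the file must be retransmitted later.

    sense_interference_db is a hypothetical callable returning the interference
    sensed on a subchannel, in dB relative to the receiver noise floor.
    """
    if rng is None:
        rng = random.Random()
    candidates = list(available)
    while candidates:  # "Any subchannels available?"
        subchannel = rng.choice(candidates)  # sense a random available subchannel
        if sense_interference_db(subchannel) < threshold_db:
            return subchannel  # "Assign the subchannel"
        candidates.remove(subchannel)  # "Mark subchannel as unavailable"
    return None  # "Retransmit later"


# Hypothetical sensed levels (dB above the noise floor) for subchannels 0, 1 and 2.
levels = {0: 15.2, 1: 9.8, 2: 13.0}
print(sensing_based_assignment(levels, levels.__getitem__))
print(sensing_based_assignment(levels, levels.__getitem__, threshold_db=5.0))
```

With these made-up levels only subchannel 1 passes the 11 dB test, so it is assigned regardless of the random search order; with a 5 dB threshold every subchannel is rejected and `None` (retransmit later) is returned.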

The key parameter in this scheme is the admission threshold, i.e. the maximum amount of interference allowed on a subchannel for it to be deemed safe and eligible for assignment. Figure 3.7 shows how the probability of retransmission in the stadium network varies with the traffic load and with the interference threshold, measured in dB relative to the receiver noise floor. Every data point represents the mean result of 50 simulations using identical parameters but different random seeds, with the error bars showing the minimum and maximum of the corresponding 50 values. As with the dynamic ICIC parameters investigated in Subsection 3.3.1, a trade-off has to be struck between system performance at low and high traffic loads. The plot shows that the optimal interference threshold increases significantly as the offered traffic increases. As with the MNRSS and RNTP thresholds for dynamic ICIC, low interference threshold values in spectrum sensing impose greater restrictions on subchannel selection, resulting in higher quality links. However, as the traffic load increases, such selectivity becomes less feasible, because inter-cell interference rises and high quality links become scarce. In those cases, relaxing the subchannel assignment constraints by raising the interference threshold improves the QoS. All further experiments that use the spectrum sensing scheme depicted in Figure 3.6 for baseline comparison therefore use an 11 dB interference threshold, which is low enough to ensure good QoS at low and medium traffic loads, yet high enough to avoid excessive performance degradation at higher traffic loads.

Figure 3.7: Probability of retransmission in the stadium network using the spectrum sensing based DSA scheme with different interference detection thresholds (x-axis: interference detection threshold, 6 to 20 dB; y-axis: probability of retransmission, logarithmic scale from 10^-2 to 10^0; curves for 0.6, 0.9 and 1.2 Gbps offered traffic)
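The way each data point and error bar in Figure 3.7 is formed from repeated runs can be sketched as follows. This assumes the per-seed retransmission probabilities have already been collected for every (offered traffic, threshold) pair; the function names and the dictionary layout are hypothetical and are not taken from the simulator.

```python
from statistics import fmean
from typing import Dict, Sequence, Tuple

# Hypothetical key layout: (offered traffic in Gbps, interference threshold in dB).
Key = Tuple[float, float]


def error_bar_point(per_seed: Sequence[float]) -> Tuple[float, float, float]:
    """Mean, minimum and maximum of one metric over runs that differ only in random seed."""
    if not per_seed:
        raise ValueError("at least one run is required")
    return fmean(per_seed), min(per_seed), max(per_seed)


def summarise_sweep(results: Dict[Key, Sequence[float]]) -> Dict[Key, Tuple[float, float, float]]:
    """Collapse a load/threshold sweep into one (mean, min, max) point per setting."""
    return {key: error_bar_point(runs) for key, runs in sorted(results.items())}
```

When plotting, the asymmetric error bars correspond to the offsets (mean - min) below and (max - mean) above each point.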

3.4 Conclusion

This chapter described the methodology used for the empirical evaluation of the intelligent DSA methods proposed in this thesis. A stadium temporary event scenario, which involves a heterogeneous cognitive cellular system and an incumbent LTE network, is used as the basis for a detailed system-level simulation model of the wireless environment. The key metrics used to assess the performance of the simulated DSA algorithms are the probability of retransmission, the mean and 5th percentile user throughput, and the overall system throughput density. A standard LTE interference management solution and a spectrum sensing based DSA scheme typical of CR networks are used for baseline comparison in the simulation experiments discussed in the rest of this thesis.

Chapter 4

Distributed Q-Learning Based Dynamic Spectrum Access

Contents

4.1 Intelligent Dynamic Spectrum Access
4.1.1 Reinforcement Learning
4.1.2 Distributed Stateless Q-Learning
4.2 Choice of the Learning Rate
4.2.1 Win-or-Learn-Fast Variable Learning Rate
4.2.2 Performance Comparison Using Different Learning Rates
4.2.3 Temporal Performance
4.2.4 Comparison with Heuristic Schemes
4.3 Q-Learning Based Dynamic Spectrum Sharing
4.3.1 Spectrum Occupancy Analysis
4.3.2 Spatial Distribution of User Throughput
4.3.3 Primary and Secondary User Quality of Service
4.4 Conclusion

4.1 Intelligent Dynamic Spectrum Access

An emerging state-of-the-art technique for intelligent DSA is reinforcement learning (RL), a machine learning technique aimed at building up solutions to decision problems only through trial-and-error, discussed in detail in Section 2.3. It has been successfully applied to a range of DSA problems and scenarios, such as cognitive radio networks [43], small cell networks [7] and cognitive wireless mesh networks [18]. The most widely used RL algorithm in both the artificial intelligence and wireless communications domains is Q-learning [94]. Consequently, most of the literature on RL based DSA focuses on Q-learning and its variations, e.g. [18][102].

This thesis investigates distributed Q-learning based DSA. The distributed approach has advantages over centralised methods in that no communication overhead is incurred to achieve the learning objective, and network operation does not rely on a single computing unit. It also makes it easier to insert base stations into, or remove them from, the network when necessary. Such flexible opportunistic protocols are therefore well suited to disaster relief and temporary event networks, where rapidly deployable architectures with variable topologies are required to supplement any local wireless infrastructure, such as the cognitive wireless network introduced in Section 3.1. In purely distributed RL based DSA, every base station (BS) learns to prioritise among the available subchannels only through trial-and-error, with no frequency planning and no information exchange with other BSs. In this way, frequency reuse patterns emerge autonomously through distributed artificial intelligence, without any prior knowledge of the environment. The rest of this section revisits the main principle behind RL and introduces the distributed Q-learning algorithm used as the basis for all work presented in this thesis.
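The claim that reuse patterns can emerge without any message exchange can be illustrated with a deliberately simplified toy model. The sketch below rests on hypothetical assumptions that are not the simulation model of this thesis: every BS interferes with every other BS, a BS receives a reward of +1 when no other BS uses the same subchannel and -1 otherwise, and each BS uses ε-greedy selection with the stateless update Q(a) ← Q(a) + α(r − Q(a)). Each BS keeps a private Q-table and never reads another BS's table.

```python
import random
from typing import List


def distributed_toy(n_bs: int = 3, n_sub: int = 3, episodes: int = 2000,
                    alpha: float = 0.1, epsilon: float = 0.1, seed: int = 0) -> List[int]:
    """Toy model of independent BSs learning subchannel preferences (hypothetical)."""
    rng = random.Random(seed)
    # One private Q-table per BS; tables are never shared or exchanged.
    q = [[0.0] * n_sub for _ in range(n_bs)]
    for _ in range(episodes):
        picks = []
        for table in q:
            if rng.random() < epsilon:
                picks.append(rng.randrange(n_sub))  # explore a random subchannel
            else:
                best = max(table)
                # Exploit the best-valued subchannel, breaking ties at random.
                picks.append(rng.choice([a for a, v in enumerate(table) if v == best]))
        for table, a in zip(q, picks):
            # The environment resolves collisions; each BS only sees its own reward.
            reward = 1.0 if picks.count(a) == 1 else -1.0
            table[a] += alpha * (reward - table[a])
    # Subchannel each BS would choose greedily after learning.
    return [max(range(n_sub), key=table.__getitem__) for table in q]


print(distributed_toy())
```

With three BSs and three subchannels, the learned greedy choices typically settle on three different subchannels, i.e. an orthogonal reuse pattern found purely by trial-and-error; which BS ends up on which subchannel depends on the random seed.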

4.1.1 Reinforcement Learning

RL is a model-free type of machine learning aimed at establishing the desirability of taking any available action in any state of the environment only through trial-and-error [87]. This desirability is represented by a numerical value known as the Q-value, i.e. the expected cumulative discounted reward for taking a particular action in a particular state, as given by:

$$Q(s, a) = \mathbb{E}\left[\sum_{t=0}^{T} \gamma^{t} r_{t}\right] \tag{4.1}$$

where $Q(s, a)$ is the Q-value of action $a$ in state $s$, $r_t$ is the numerical reward received $t$ time steps after action $a$ is taken in state $s$, $T$ is the total number of time steps until the end of the learning process or episode, and $\gamma \in [0, 1]$ is a discount factor.
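Eq. (4.1) can be made concrete with a Monte Carlo estimate: average the discounted return over several recorded episodes that all begin by taking action $a$ in state $s$. This is a minimal sketch for illustration only, with hypothetical function names; Q-learning instead updates its estimates incrementally, without waiting for complete episodes.

```python
from statistics import fmean
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**t * r_t for t = 0..T: the quantity inside the expectation in (4.1)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def monte_carlo_q(episodes: Sequence[Sequence[float]], gamma: float) -> float:
    """Sample-mean estimate of Q(s, a) from reward sequences that all start with a in s."""
    return fmean(discounted_return(ep, gamma) for ep in episodes)
```

For example, the reward sequence [1, 1, 1] with γ = 0.5 gives a return of 1 + 0.5 + 0.25 = 1.75; setting γ = 0 reduces the return to the immediate reward r_0 alone.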

The task of an RL algorithm is to estimate $Q(s, a)$ for every action in every state; these estimates are stored in an array known as the Q-table. In some cases, where the environment does not have to be represented by states, only the action space and a one-dimensional Q-table $Q(a)$ need to be considered [21]. The task of the RL algorithm then becomes simpler: it aims to estimate the expected value of a single reward for each action available to the learning agent:

$$Q(a) = \mathbb{E}\left[r_{t}\right] \tag{4.2}$$
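A one-dimensional Q-table of the kind in Eq. (4.2) can be sketched as below, assuming rewards arrive one at a time. The class name is hypothetical, and the incremental update with step size 1/n is the standard sample-average rule, shown here only to make Eq. (4.2) concrete; the learning rate actually used in this thesis is the subject of Section 4.2.

```python
from typing import List, Optional


class StatelessQTable:
    """One-dimensional Q-table Q(a): one running estimate of E[r_t] per action."""

    def __init__(self, n_actions: int) -> None:
        self.q: List[float] = [0.0] * n_actions  # Q(a) estimates
        self.visits: List[int] = [0] * n_actions  # times each action has been taken

    def update(self, action: int, reward: float, alpha: Optional[float] = None) -> None:
        """Move Q(action) towards the observed reward.

        With alpha=None the step size is 1/n, so Q(a) equals the exact sample mean
        of the rewards observed for that action; a constant alpha instead gives
        more weight to recent rewards.
        """
        self.visits[action] += 1
        step = 1.0 / self.visits[action] if alpha is None else alpha
        self.q[action] += step * (reward - self.q[action])

    def greedy_action(self) -> int:
        """Action with the highest current estimate (lowest index on ties)."""
        return max(range(len(self.q)), key=self.q.__getitem__)
```

After observing rewards 1, 0, 1, 1 for action 0, `q[0]` equals their mean of 0.75, which is exactly the sample estimate of $\mathbb{E}[r_t]$ in Eq. (4.2).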