Initial and Final Performance

6.5 Simulation Results

6.5.2 Initial and Final Performance

Figure 6.7 shows the difference in initial and final P(re-tx) performance of these schemes over a wide range of traffic loads. P(re-tx) is plotted against the system throughput density so that both the QoS and the system capacity can be evaluated in the same graphs. The initial P(re-tx) in Figure 6.7a is calculated from the first 20,000 file transmissions, and the final P(re-tx) in Figure 6.7b from the last 20,000 file transmissions. The overall simulation length is 1,000,000 file transmissions. Every data point represents the mean result of 50 different simulations at a given traffic load, with the error bars showing the minimum and maximum P(re-tx) across those simulations.
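The windowed estimates and error bars described above can be sketched as follows. This is a minimal illustration, not the simulator used in the thesis: the function names and the `toy_run` generator (a linearly decaying retransmission probability) are hypothetical, and the demo is scaled down to 5 runs of 50,000 transmissions instead of 50 runs of 1,000,000.

```python
import random
from statistics import mean

WINDOW = 20_000  # averaging window quoted in the text


def window_p_retx(retx_flags, window=WINDOW):
    """Return (initial, final) P(re-tx) estimated from the first and
    last `window` entries of a per-transmission retransmission log."""
    if len(retx_flags) < window:
        raise ValueError("simulation shorter than the averaging window")
    initial = sum(retx_flags[:window]) / window
    final = sum(retx_flags[-window:]) / window
    return initial, final


def summarise(values):
    """Mean across independent runs, with min/max as error bars."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def toy_run(n, p_start, p_end, rng):
    """Hypothetical stand-in for one simulation run: the retransmission
    probability decays linearly from p_start to p_end over n files."""
    return [rng.random() < p_start + (p_end - p_start) * i / (n - 1)
            for i in range(n)]


# Scaled-down demo at a single (hypothetical) traffic load.
rng = random.Random(0)
results = [window_p_retx(toy_run(50_000, 0.2, 0.02, rng)) for _ in range(5)]
print("initial:", summarise([r[0] for r in results]))
print("final:  ", summarise([r[1] for r in results]))
```

Repeating this at each traffic load would give one data point per load for each panel of Figure 6.7, under the stated assumptions.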

[Figure 6.7, panels (a) Initial probability of retransmission and (b) Final probability of retransmission. Each panel plots the probability of retransmission (logarithmic axis, 10^-3 to 10^0) against the system throughput density (10 to 60 Gbps/km²) for dynamic ICIC, Q-learning and DIAQ.]

Figure 6.7: Initial and final probability of retransmission using pure ICIC, pure Q-learning and distributed ICIC accelerated Q-learning (DIAQ) at different system throughput densities

Figure 6.7a shows that the dramatic improvement in initial performance using DIAQ instead of the classical Q-learning approach is consistent at most traffic loads. DIAQ introduces a 29-69% reduction in the initial probability of retransmission at system throughput densities below 45 Gbps/km². Only at ultra-high system throughput densities does the difference in their performance become negligible. DIAQ also achieves a significantly lower initial and final probability of retransmission than the dynamic ICIC scheme. Furthermore, the latter only supports system throughput densities of up to 48 Gbps/km², whereas DIAQ and Q-learning are significantly more robust at extremely high offered traffic densities: they both support system throughput densities of up to 58 Gbps/km². This demonstrates that, at such loads, it is better to take opportunistic spectrum assignment decisions based on reinforcement learning than to block transmissions based on ICIC signalling, since the probability of a subchannel not being occupied by any of the neighbouring eNBs tends to zero. In these cases, the heuristic ICIC approach “blindly” blocks most file transmissions, whereas Q-learning is still capable of providing some insight into which subchannels could result in successful transmissions.
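The 29-69% reduction in the initial probability of retransmission reported for DIAQ can be read as a relative difference with respect to classical Q-learning. The expression below is a sketch of that reading; the symbols and the numerical values in the worked line are illustrative assumptions, not values taken from Figure 6.7, and the text does not state the formula explicitly.

```latex
% Relative reduction of DIAQ with respect to classical Q-learning
% (assumed definition, both probabilities measured at the same traffic load)
\[
  \Delta_{\mathrm{rel}}
  = \frac{P_{\mathrm{Q}}(\text{re-tx}) - P_{\mathrm{DIAQ}}(\text{re-tx})}
         {P_{\mathrm{Q}}(\text{re-tx})}
\]
% Worked example with hypothetical values:
\[
  P_{\mathrm{Q}} = 0.10,\quad P_{\mathrm{DIAQ}} = 0.05
  \;\Longrightarrow\;
  \Delta_{\mathrm{rel}} = \frac{0.10 - 0.05}{0.10} = 0.5 = 50\%
\]
```

Under this reading, a negligible difference at ultra-high throughput densities corresponds to a relative reduction close to zero.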

6.6 Conclusion

The novel DIAQ scheme proposed in this chapter combines distributed RL and standardized ICIC signalling in the LTE downlink, using the framework of HARL. It is theoretically evaluated using a novel extension of the Bayesian network model proposed in Chapter 5, which explains a predicted improvement in convergence behaviour achieved by DIAQ, compared to classical distributed RL. Large scale simulation experiments of a stadium small cell network show that it provides superior QoS compared to a typical heuristic ICIC approach and a state-of-the-art distributed RL based approach. It achieves a significantly lower probability of retransmission and supports higher system throughput densities of up to 58 Gbps/km². A comparison of the probability of retransmission time response characteristics of DIAQ and pure distributed Q-learning reveals a dramatic improvement in performance at the initial stage of learning, of 29% to 69% at all but ultra-high traffic loads, due to the use of heuristics for guiding the exploration process. This result confirms the theoretical predictions made using the Bayesian network model of the algorithm. DIAQ also exhibits excellent final performance and convergence speed. The dramatic improvements in the initial performance and convergence speed achieved by the heuristic acceleration of the learning process significantly increase the adaptability of the distributed RL based approach to DSA, since the cognitive eNBs are able to adapt to each other’s dynamically changing policies considerably faster. Finally, the DIAQ scheme is designed to comply with the current LTE standards. Therefore, it allows easy implementation of robust distributed machine intelligence for full self-organisation in existing commercial networks.

Chapter 7. Robust Intelligent Dynamic Spectrum Sharing

7.1 Motivation
7.2 HARL for Dynamic Spectrum Sharing
  7.2.1 Spectrum Monitoring
  7.2.2 Spectrum Occupancy Estimation
  7.2.3 REM Based Heuristic Function
  7.2.4 Superimposed Heuristic Functions
  7.2.5 Q-Value Based Admission Control
  7.2.6 HARL Algorithms for Spectrum Sharing
  7.2.7 Choice of Parameters
7.3 Simulation Results
  7.3.1 Spectrum Occupancy Analysis
  7.3.2 Primary User Quality of Service
  7.3.3 Statistical Analysis
  7.3.4 Temporal Performance
7.4 Conclusion

7.1 Motivation

The key feature of the novel distributed ICIC accelerated Q-learning (DIAQ) scheme proposed in Chapter 6 is the use of heuristic spectrum awareness information to significantly increase the adaptability and robustness of distributed RL based DSA in terms of the QoS convergence behaviour. The purpose of this chapter is to report on the novel application of the HARL framework to a more complex DSA problem, where the cognitive cellular system shares spectrum with other independent primary and secondary wireless networks. The dynamic spectrum sharing (DSS) scenario described in Subsection 3.1.1 represents a relevant and realistic context for this problem and is used for the development and evaluation of the novel algorithms described in this chapter. The heuristic acceleration for the RL based DSA algorithms developed in this chapter is provided by a dynamically updated spectrum usage database, also known as the radio environment map (REM), which is a commonly used component in secondary cognitive wireless networks [60]. In previous work on combining RL and dynamic spectrum databases, such as REMs, researchers have considered employing RL algorithms solely for obtaining information that can be stored in these databases, e.g. [17][56]. There appears to be no previous work in the literature on using REMs to enhance the performance of RL based DSA and DSS algorithms.

In document Accelerating Reinforcement Learning for Dynamic Spectrum Access in Cognitive Wireless Networks (Page 104-108)
