
7.2 Q-Learning Channel Selection Scheme Performance

7.2.3 PU Interference, I_j

The average analytical and experimental PU interference is plotted in Figure 7.16. Each point in the graph is the mean of the PU interference in channels 1 and 2 during the experiment, plotted against the average PU utilization across all data channels. The PU interference in channel 3 is not included because, as mentioned in Section 5.5.1, equipment constraints meant no receiver could be implemented to log the PU. Since the results comprise running the scenario for each permutation of channel utilizations, and all channels and users are homogeneous, it is justified to interpret the figures as also showing the average interference caused to each PU, or to all PUs.

Outside of a mean utilization of 0.1, the interference caused by Q-learning channel selection is significantly greater than that caused by random channel selection. The interference rate is on average 46% greater, and the absolute value decreases with increased mean utilization for random, rule-based and Q-learning channel selection. This is consistent with our interpretation of the results regarding Listen-before-Talking sensing in Section 7.2.1. To repeat, Listen-before-Talking sensing cannot prevent the SU from interfering with a PU packet that is broadcast while the SU is already transmitting. However, subsequent packets in a packet train, which are more likely to occur at higher utilizations, will be avoided. The Q-learning scheme preferentially selects channels with a history of successful packet transmission and is therefore more likely to use the channels with the least utilization. Isolated packets are more likely to be transmitted in these channels, resulting in higher interference than under random channel selection, where lower-utilized channels are selected as often as any other channel.

Figure 7.16: Mean PU interference against mean of the PU channel utilizations. [Plot: x-axis, Channel Utilisation by Primary Users (Mean), 0.2 to 0.8; y-axis, % PU Packet Errors, 0 to 16; series: Q-Learning (Expt), Q-Learning (Theor), Random (Expt), Random (Theor), Rule-Based (Expt), Ideal (Non-Deferred), Ideal (Deferred).]

Figure 7.17 graphs the average experimental interference. Additionally, the interference caused to each PU in each Q-learning run is plotted individually. Points are colored by the PU utilization and show that interference is greater in channels with lower utilization, the same conclusion as for Figure 7.16. The interference at points 0.1 and 0.9 on the x-axis, which aggregate the performance when the channel utilization is [0.1,0.1,0.1] and [0.9,0.9,0.9], differs from the interference elsewhere, where a channel's PU utilization is identical but the utilization among all channels is heterogeneous. The per-channel interference when the utilization is [0.1,0.1,0.1] is 11.4±0.9%, but the interference to a channel with the same 0.1 utilization can reach up to 39.09%, which occurs when the nominal mean utilization is 0.6 in the figure. Similarly, when the utilization is [0.9,0.9,0.9] the interference is 2.2±0.3%, but this drops to 0.10% for an identical channel with 0.9 utilization at mean utilization 0.7. This shows that Q-learning channel selection increases interference to the channel with the least utilization in the network but reduces the harmful effects caused to other channels. This is expected, since Q-learning favors selection of the least utilized channels, thus increasing the number of PU packets lost in those channels due to greater usage. The maximum interference caused to a channel with utilization 0.1 is 343% of (about 3.4 times) the value at [0.1,0.1,0.1]. The top channel, ranked by Q-value, is selected with probability 0.93, whereas if all channels have equal Q-values this is 0.33, so its usage rises to about 280% of (2.8 times) the equal-Q level. Channels with lower Q-values are selected only for exploration, with probability 0.033, and thus experience less SU interference.
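The selection probabilities quoted above (0.93, 0.33 and 0.033) are what an epsilon-greedy rule over three channels would give with an exploration rate of 0.1. The sketch below reproduces them under that assumption; the function name `epsilon_greedy_probs`, the value of `epsilon` and the example Q-values are illustrative and not taken from the implementation described in this chapter.

```python
def epsilon_greedy_probs(q_values, epsilon):
    """Per-channel selection probabilities under an assumed epsilon-greedy rule.

    With probability epsilon a channel is drawn uniformly at random
    (exploration); otherwise the channel with the highest Q-value is used,
    with ties shared equally among the tied channels.
    """
    n = len(q_values)
    best = max(q_values)
    tied = [i for i, q in enumerate(q_values) if q == best]
    probs = [epsilon / n] * n              # exploration share for every channel
    for i in tied:
        probs[i] += (1.0 - epsilon) / len(tied)  # exploitation share
    return probs


# Hypothetical Q-values: one clearly preferred channel out of three.
ranked = epsilon_greedy_probs([0.9, 0.5, 0.2], epsilon=0.1)
# Hypothetical Q-values: all channels equally ranked.
equal = epsilon_greedy_probs([0.5, 0.5, 0.5], epsilon=0.1)

print([round(p, 3) for p in ranked])   # top channel near 0.93, others near 0.033
print([round(p, 3) for p in equal])    # each channel near 0.33
print(round(ranked[0] / equal[0], 2))  # usage ratio of the top channel, near 2.8
```

Under these assumptions the top-ranked channel is used about 2.8 times as often as it would be with equal Q-values, which is the scale of the usage change discussed above.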

Figure 7.17: PU interference against mean of the PU channel utilizations. [Plot: x-axis, Channel Utilisation by Primary Users (Mean), 0.2 to 0.8; y-axis, % PU Packet Errors, 0 to 40; series: Q-Learning (Expt), Random (Expt), Rule-Based (Expt), Ideal (Non-Deferred), Ideal (Deferred); further labels: PU Util, Variance, scale 0.1 to 0.9; annotations: ← 0.1, ← 0.3, ← 0.6, ← 0.9.]

The analytical results in Figure 7.16 significantly deviate from the observed interference. The R² correlation between the analytical and observed random channel selection results is 0.5381, while it is -0.1449 for the Q-learning scheme. The R² values indicate the analysis does not fit the experimental results. Theory correctly predicts that interference will trend downwards with increased mean utilization and that the Q-learning DCS scheme will have raised interference over random, both trivial results. However, the analysis poorly models the trend shape and significantly overestimates the packets that will be lost. The steady-state model cannot account for the Q-learning scheme's dynamic adaptation to local variations in channel occupancy, which would lessen the impact the SU has on the PU, but this does not explain the discrepancy between the analytical and experimental results for random channel selection.
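A negative R² is possible if R² is computed as the coefficient of determination, 1 - SS_res/SS_tot, of the analytical predictions against the observations; whether the values above were computed exactly this way is an assumption. The sketch below uses made-up numbers (not the experimental data) to show how a model that follows the right downward trend but overestimates every point can still score below zero.

```python
def r_squared(observed, predicted):
    """Coefficient of determination of `predicted` against `observed`.

    Returns 1 - SS_res / SS_tot. The result is 1.0 for a perfect fit and
    becomes negative when the predictions fit worse than simply using the
    mean of the observations.
    """
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical interference percentages, for illustration only.
observed = [10.0, 8.0, 6.0, 4.0, 2.0]
overestimate = [20.0, 18.0, 16.0, 14.0, 12.0]  # same trend, constant offset

print(r_squared(observed, observed))      # 1.0
print(r_squared(observed, overestimate))  # -11.5
```

In this toy case the predicted trend is perfectly parallel to the observations, yet the constant overestimate makes the residuals larger than the spread of the data, so the score is negative.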

The analytical goodput accurately models the downward shape of the experimental goodput with utilization in Figure 7.15 but underestimates the experimental values. The R² correlation between the analytical and observed results is 0.9798 for random channel selection and 0.8935 for the Q-learning scheme, which is lower than the values for packet transmission success correlation in Figure 7.10.

The interference and goodput are derived from the analytical SU transmission outcome probabilities. It is likely that the increased discrepancy is due to error propagated from using the analytical result for the SU transmission outcome probability, added to the simplifications of the scenario made by the model. As discussed in Section 6.4, the model's assumptions that PU traffic is perfectly M/D/1 and that timing is distribution-less, which in particular are used in finding the goodput and interference, are not realizable in the GNU Radio implementation. The accurate fit in the successful transmission probability in Figure 7.10 is achieved only after substituting into the model the empirical channel transmission outcome probabilities measured at different PU utilizations.
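For reference, the idealized M/D/1 traffic assumed by the model (Poisson packet arrivals, a fixed packet duration, a single channel) can be sketched as below. This is an illustrative simulation only, not the thesis model or the GNU Radio implementation; the function name, parameters and values are hypothetical. It assumes that channel utilization is read as the long-run fraction of time the channel is busy, which for an ideal M/D/1 source approaches the arrival rate multiplied by the packet duration.

```python
import random


def md1_busy_fraction(arrival_rate, packet_duration, n_packets, seed=0):
    """Simulate an ideal M/D/1 source and return the channel busy fraction.

    Packets arrive as a Poisson process (exponential inter-arrival times),
    each occupies the channel for exactly `packet_duration`, and a packet
    that arrives while the channel is busy waits until it is free.
    """
    rng = random.Random(seed)
    arrival_time = 0.0
    channel_free_at = 0.0
    busy_time = 0.0
    for _ in range(n_packets):
        arrival_time += rng.expovariate(arrival_rate)
        start = max(arrival_time, channel_free_at)  # queue behind earlier packets
        channel_free_at = start + packet_duration
        busy_time += packet_duration
    return busy_time / channel_free_at


# Hypothetical parameters: 30 packets/s of 10 ms each, nominal utilization 0.3.
fraction = md1_busy_fraction(arrival_rate=30.0, packet_duration=0.01,
                             n_packets=50000)
print(round(fraction, 2))
```

A real transmitter departs from this ideal whenever packet timing is not exactly Poisson or packet durations vary, which is the kind of mismatch the paragraph above refers to.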

In document Analysis and Implementation of Reinforcement Learning on a GNU Radio Cognitive Radio Platform (Page 143-147)