5. ACCURACY VS LEARNING RATE OF MULTI-LEVEL CACHING AL-

5.3 A-LRU Algorithm

5.3.2 Hit Probability and Permutation Distance

Since the τ-distance characterizes how accurately an algorithm learns the popularity distribution, a smaller τ-distance should correspond to a larger hit probability. Computing the τ-distance is expensive, since it is a function of all possible permutations of the content items, but we can illustrate how different algorithms perform using a content library of size n = 20. Figure 5.18 compares the τ-distance and hit probabilities of various caching algorithms. The points on each curve correspond to cache sizes of 2, 3, 4, and 5, from left to right. Since cache sizes must be integers, we partition the cache for A-LRU such that the size of cache 1 is always 1 and the remaining capacity is allocated to cache 2. Figure 5.18 shows that the τ-distance and hit probability behave as expected: a smaller τ-distance corresponds to a larger hit probability.
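As a rough illustration of the pairwise-disagreement idea behind a permutation distance, the sketch below computes the classical Kendall tau distance between two rankings. This is an assumption made for illustration: the τ-distance used in this chapter is defined earlier in the thesis and may be a generalized or weighted variant, so the function names and the normalization here are hypothetical and are not the author's definition.

```python
from itertools import combinations


def kendall_tau_distance(ranking_a, ranking_b):
    """Count the item pairs that the two rankings order differently.

    Both arguments are sequences containing the same items, listed from
    most popular to least popular.
    """
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    discordant = 0
    for x, y in combinations(ranking_a, 2):
        # A pair is discordant when its relative order differs.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0:
            discordant += 1
    return discordant


def normalized_kendall_tau(ranking_a, ranking_b):
    """Scale the distance to [0, 1] by the number of item pairs."""
    n = len(ranking_a)
    return kendall_tau_distance(ranking_a, ranking_b) / (n * (n - 1) / 2)


true_order = [1, 2, 3, 4]
cache_order = [2, 1, 3, 4]  # one adjacent swap relative to true_order
print(kendall_tau_distance(true_order, cache_order))
```

The loop visits all n(n − 1)/2 pairs for a single pair of rankings; a quantity defined over all n! permutations grows much faster, which is consistent with the small library size (n = 20) used in the text.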

Figure 5.19: Learning error of various caching algorithms under the IRM arrival process.

Figure 5.20: Hit probability of various caching algorithms under the IRM arrival process. (Axes: number of requests vs. hit probability. Curves: RANDOM, LRU, FIFO, CLIMB, 2-LRU, ARC, A-LRU.)

5.3.3 Learning Error

We use the learning error defined in Equation (4.23) of Section 4.7 to compare the performance of different caching algorithms. We use a small cache size for these simulations, since computing over all permutations quickly becomes prohibitively complex; however, (n = 20, m = 4) suffices to illustrate the main insights. Figure 5.19 shows the learning error of the various algorithms as a function of the number of requests received, with the y-axis on a logarithmic scale. Note that the A-LRU result in Figure 5.19 uses a version of the algorithm with a time-dependent selection of cache divisions, discussed fully under the heading “Dynamic A-LRU” below. The error of LRU decreases quickly at first and then levels off, whereas the error of 2-LRU decays more slowly but eventually falls below that of LRU. This corresponds to LRU mixing faster but having poorer eventual accuracy (τ-distance) than 2-LRU. The ARC algorithm performs well initially, but it too levels off at an error larger than that of 2-LRU. The A-LRU algorithm with cache partition (1, 3) combines accurate learning with fast mixing, and attains a low learning error quickly.

The effects seen in Figure 5.19 are also visible in the evolution of hit probabilities shown in Figure 5.20, where the x-axis is on a logarithmic scale. Here, we choose (n = 150, m = 30) in order to explore a range of cache partitions for A-LRU, from (0, 30) to (30, 0). We compare the upper envelope of the hit probability achievable by A-LRU with that of various other caching algorithms. We find that for any given learning time (number of requests), there is a cache partition such that A-LRU attains a higher hit probability after learning for that time. These effects become more pronounced as the partition space (cache size) available to A-LRU increases.
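To make the notion of an empirical hit probability under IRM concrete, here is a minimal simulation sketch for plain LRU only. It assumes a Zipf popularity law with exponent `alpha`, which is an illustrative choice and is not stated in this section; the function names `zipf_popularity` and `simulate_lru_hits` are hypothetical, and the sketch does not reproduce A-LRU or the other algorithms compared in Figure 5.20.

```python
import random
from collections import OrderedDict


def zipf_popularity(n, alpha=0.8):
    """Return a Zipf(alpha) probability vector over n items (assumed law)."""
    weights = [1.0 / (i ** alpha) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_lru_hits(n, m, num_requests, alpha=0.8, seed=0):
    """Fraction of requests served from an LRU cache of size m under IRM."""
    rng = random.Random(seed)
    popularity = zipf_popularity(n, alpha)
    # IRM: every request is drawn independently from a fixed distribution.
    requests = rng.choices(range(n), weights=popularity, k=num_requests)

    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for item in requests:
        if item in cache:
            hits += 1
            cache.move_to_end(item)  # refresh recency on a hit
        else:
            if len(cache) >= m:
                cache.popitem(last=False)  # evict the least recently used
            cache[item] = True
    return hits / num_requests


# Setting from the text: n = 150 items, cache size m = 30.
for horizon in (100, 1000, 10000):
    print(horizon, simulate_lru_hits(150, 30, horizon))
```

Running the loop over increasing horizons gives a crude view of how the hit ratio evolves with the number of requests, which is the quantity on the x-axis of Figure 5.20.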

Dynamic A-LRU: Whereas our description of A-LRU uses a fixed partitioning parameter β, the algorithm (and an implementation of it) can easily accommodate time-varying β values. For the sake of argument, consider a k-level A-LRU with sequences $\chi_1(t), \chi_2(t), \dots, \chi_k(t)$ that each tend to 0 as the number of requests goes to infinity, satisfying (i) $\sum_t \chi_i(t) \to \infty$; (ii) $\sum_t \chi_i^2(t) < \infty$; and (iii) $\chi_i(t)/\chi_{i+1}(t) \to 0$. Here, $\chi_1$ stands for the proportion of LRU to the rest, $\chi_2$ stands for the proportion between 2-LRU and 3-LRU to the rest, etc. A typical choice of sequences is $\chi_i(t) = m/\bigl(m + t^{\frac{i+1}{2i}}/c_i\bigr)$, where t counts the number of requests and $c_i > 0$ is a parameter to be varied. Under this setting, the βs in the previous definition of A-LRU are (i) at level $i \le k-1$, $(1-\chi_1(t))(1-\chi_2(t))\cdots(1-\chi_{i-1}(t))\,\chi_i(t)$, and (ii) at level k, $(1-\chi_1(t))(1-\chi_2(t))\cdots(1-\chi_k(t))$.
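As a short check, the following derivation sketches why the typical choice satisfies conditions (i) to (iii). It assumes the exponent of t reads as (i+1)/(2i), which is our reading of the typeset formula; the shorthand $a_i$ is introduced here for convenience and does not appear in the original.

```latex
\begin{align*}
  a_i &:= \frac{i+1}{2i} \in \left(\tfrac{1}{2},\, 1\right],
  \qquad
  \chi_i(t) = \frac{m}{m + t^{a_i}/c_i} \sim m\,c_i\,t^{-a_i}
  \quad (t \to \infty). \\[4pt]
  \text{(i)}\quad & \sum_t \chi_i(t) = \infty
     \quad\text{because } a_i \le 1. \\
  \text{(ii)}\quad & \sum_t \chi_i^2(t) < \infty
     \quad\text{because } 2a_i = 1 + \tfrac{1}{i} > 1. \\
  \text{(iii)}\quad & \frac{\chi_i(t)}{\chi_{i+1}(t)}
     \sim \frac{c_i}{c_{i+1}}\, t^{-(a_i - a_{i+1})} \to 0,
     \quad\text{because } a_i - a_{i+1} = \frac{1}{2i(i+1)} > 0.
\end{align*}
```

For example, i = 1 gives $a_1 = 1$ and i = 2 gives $a_2 = 3/4$, so $\chi_1$ decays like $1/t$ while $\chi_2$ decays like $t^{-3/4}$.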

In particular, we consider the 2-level A-LRU shown in Figure 5.17. Here, the βs are β1(t) = m/(m + t/c) and β2(t) = 1 − β1(t), where we take a common constant c for simplicity. With this sequence, β1 starts at 1 (LRU) and slowly decreases to 0 (2-LRU). Under the setting (n = 150, m = 30), we choose the values c = 3, 10, 15 for illustration, as shown in Figure 5.21. We observe [...] to learning accurately. Finally, we note that the results for A-LRU presented in Figure 5.19 used c = 600.
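The two-level schedule is simple enough to tabulate directly. The sketch below evaluates β1(t) = m/(m + t/c) and β2(t) = 1 − β1(t) for the constants mentioned in the text; the helper names `beta_schedule` and `requests_until_half` are hypothetical. The second helper uses the fact that β1(t) = 1/2 exactly when t/c = m, that is, at t = m·c.

```python
def beta_schedule(t, m, c):
    """Return (beta1, beta2) after t requests for cache size m and constant c."""
    beta1 = m / (m + t / c)
    return beta1, 1.0 - beta1


def requests_until_half(m, c):
    """Number of requests t at which beta1(t) equals 1/2 (t / c == m)."""
    return m * c


m = 30  # cache size used in the text for Figure 5.21
for c in (3, 10, 15, 600):
    beta1_at_1000, _ = beta_schedule(1000, m, c)
    print(f"c={c}: beta1(1000)={beta1_at_1000:.3f}, "
          f"beta1 reaches 1/2 at t={requests_until_half(m, c)}")
```

A larger c keeps β1 close to 1 for longer, so the schedule stays near LRU for more requests before drifting toward 2-LRU.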

If the popularity distribution changes with time (as considered in the next section), we should only consider constant-β algorithms. The distinction between these two settings follows from stochastic approximation ideas: decreasing step-size algorithms can converge to optimal solutions in stationary settings, whereas constant step-size algorithms provide good tracking performance in non-stationary settings.
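The step-size trade-off can be seen in a toy problem that is unrelated to caching: tracking the mean of a noisy signal whose true mean jumps partway through. This is only an analogy under assumed parameters (Gaussian noise, a jump from 0 to 5, constant step 0.05), and `track_mean` is a hypothetical helper, not part of A-LRU.

```python
import random


def track_mean(samples, step_size):
    """Run the recursion x <- x + a_t * (sample - x) and return all iterates."""
    estimate = 0.0
    history = []
    for t, sample in enumerate(samples, start=1):
        estimate += step_size(t) * (sample - estimate)
        history.append(estimate)
    return history


rng = random.Random(0)
# Stationary phase (mean 0) followed by a shifted phase (mean 5).
samples = [rng.gauss(0.0, 1.0) for _ in range(5000)]
samples += [rng.gauss(5.0, 1.0) for _ in range(500)]

decreasing = track_mean(samples, lambda t: 1.0 / t)  # running average
constant = track_mean(samples, lambda t: 0.05)       # exponential forgetting

print("before shift:", decreasing[4999], constant[4999])
print("after shift: ", decreasing[-1], constant[-1])
```

With step size 1/t the iterate is the running average, which is accurate in the stationary phase but slow to move after the jump; the constant step is noisier in the stationary phase but follows the jump quickly.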

Figure 5.21: Hit probability for A-LRU with time-varying β under the IRM arrival process. (Axes: number of requests vs. hit probability. Curves: LRU, 2-LRU, A-LRU, A-LRU-c3, A-LRU-c10, A-LRU-c15.)

Figure 5.22: Hit probabilities under Markov-modulated arrivals with ξ = 0.1. (x-axis: ratio = cache size / total number of unique items.)