Creating Analysis-Friendly Data
Guideline 6.2 Design your experiments to maximize the information content in
the data: aim for clear views of simple relationships.
These options are not available in every experiment, of course. There is usually a practical limit on the number of trials and on the largest problem size that can be measured. If the data will not cooperate despite your best design efforts, consider the techniques surveyed in the next two sections.
6.1 Variance Reduction Techniques
A variance reduction technique (VRT) modifies the test program in a way that reduces variance in the measured outcomes, on the theory that less variance yields better views of average-case costs. These techniques really do reduce variance in
data: if the goal of the experiment is to understand variance as it occurs naturally, do not apply VRTs.
To illustrate how these techniques work, Section 6.1.1 presents a case study that compares two algorithms for a problem in dynamic searching. The first experiment uses straightforward implementations of the two algorithms and reports both mean costs and some statistics to assess variance. Next, three variance reduction techniques, called common random variates, control variates, and conditional expectation, are applied, and the outcomes are compared to those from the first experiment. Some tips on applying these VRTs to general algorithms are also presented. Section 6.1.2 discusses some additional VRTs and their general application.
6.1.1 Case Study: VRTs for self-organizing sequential search rules
The self-organizing sequential search problem is to maintain a list of n distinct keys under a series of m requests for keys. The cost of each request is equal to the position of the key in the list, which is the cost of searching for the key using a linear search from the front. Figure 6.3 shows an example list containing keys 1 ... 6 in positions 1 through 6: a request for key 3 in this list would have cost 5.
The list is allowed to reorder itself by some rule that tries to keep frequently requested keys near the front to reduce total search cost, but the rule must work without any information about future requests. Two popular rules, illustrated in Figure 6.3, are:
• Move-to-Front (MTF): After key k is requested, move it to the front of the list, and shift every key that was ahead of it back one position.
• Transpose (TR): After key k is requested, move it one position closer to the front by transposing with its predecessor.
Nothing happens if k is already at the front of the list. Transpose is more conservative, since keys change position incrementally, while MTF is more aggressive in moving a requested key all the way to the front. Which rule does a better job?
Position:  1 2 3 4 5 6
List:      5 2 1 6 3 4    (Request = 3, Cost = 5)
MTF:       3 5 2 1 6 4
TR:        5 2 1 3 6 4
Figure 6.3. Self-organizing sequential search. The request for key 3 has cost 5 because that key is in position 5 in the list. The MTF rule moves the requested key to the front of the list; the TR rule transposes the key with its predecessor.
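The two rules can be sketched in a few lines of Python. This is a minimal illustration, not the AlgLab C code: the function names `move_to_front`, `transpose`, and `request` are hypothetical, and positions are 1-based to match the text.

```python
def move_to_front(lst, pos):
    """MTF: move the key at 1-based position pos to the front.

    Every key that was ahead of it shifts back one position.
    """
    key = lst.pop(pos - 1)
    lst.insert(0, key)


def transpose(lst, pos):
    """TR: swap the key at 1-based position pos with its predecessor.

    Nothing happens if the key is already at the front.
    """
    if pos > 1:
        lst[pos - 2], lst[pos - 1] = lst[pos - 1], lst[pos - 2]


def request(lst, key, rule):
    """Search for key from the front, reorder with rule, return the cost."""
    cost = lst.index(key) + 1  # request cost = 1-based position of the key
    rule(lst, cost)
    return cost


# The example of Figure 6.3: a request for key 3 in the list 5 2 1 6 3 4.
mtf_list = [5, 2, 1, 6, 3, 4]
tr_list = [5, 2, 1, 6, 3, 4]
print(request(mtf_list, 3, move_to_front), mtf_list)  # 5 [3, 5, 2, 1, 6, 4]
print(request(tr_list, 3, transpose), tr_list)        # 5 [5, 2, 1, 3, 6, 4]
```

Both calls report cost 5, and the resulting lists match the MTF and TR rows of the figure.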
Average-Case Experimental Analysis
We develop some notation to analyze these rules. Suppose requests are generated independently at random according to a probability distribution $P(n) = p_1, p_2, \ldots, p_n$ defined on $n$ keys. The request cost for key $k$ stored at $L[i]$ in a given list $L$ is equal to its position $i$. The average list cost for list $L$ depends on the request costs and request probabilities for each key:

$$C(L) = \sum_{i=1}^{n} i \cdot p_{L[i]}. \tag{6.1}$$
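As a small worked illustration of formula (6.1), the following Python sketch computes the average list cost for a fixed list. The function name `list_cost` and the three-key distribution are hypothetical, chosen only to make the arithmetic easy to check by hand.

```python
def list_cost(lst, p):
    """Average list cost C(L): sum over positions i of i * p[L[i]].

    lst is the list of keys in order; p maps each key to its request
    probability. Positions are 1-based, as in the text.
    """
    return sum(i * p[key] for i, key in enumerate(lst, start=1))


# Hypothetical distribution on three keys (not taken from the text).
p = {1: 0.5, 2: 0.3, 3: 0.2}

# Most frequent key first: 1(0.5) + 2(0.3) + 3(0.2), about 1.7
print(list_cost([1, 2, 3], p))
# Most frequent key last:  1(0.2) + 2(0.3) + 3(0.5), about 2.3
print(list_cost([3, 2, 1], p))
```

The list that places frequently requested keys near the front has the lower average cost, which is the ordering that both reordering rules try to approach.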
The average cost of a rule is the expected cost of the $m$th request, assuming that $L$ is initially in random order (each permutation equally likely) and that the rule is applied to a sequence of random requests $R_i$, $i = 1 \ldots m-1$, generated according to distribution $P(n)$. Let $\mu(n, m)$ denote the average cost of MTF and let $\tau(n, m)$ denote the average cost of TR in this model.
The experiments described here measure costs for requests drawn from Zipf’s distribution, which has been used to model naturally occurring phenomena such as frequencies of words or letters in texts. This distribution, denoted $Z(n)$, is defined over the integers $1 \ldots n$. The probability that key $k \in 1 \ldots n$ is requested next is given by

$$p(k) = \frac{1}{k H_n}, \tag{6.2}$$

where $H_n$ is the $n$th harmonic number, defined by $H_n = \sum_{j=1}^{n} 1/j$. (Dividing by $H_n$ scales the probabilities so they sum to 1.) A picture of Zipf’s distribution for $n = 9$ is shown in the following.
[Plot: Zipf’s distribution for n = 9. Horizontal axis: keys 1 through 9. Vertical axis: probability, from 0 to .3535.]
Key 1 is generated with probability $p_1 = 0.3535$. Key 2 appears half as often as key 1, key 3 appears one-third as often as key 1, and so on.
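The probabilities in formula (6.2) are easy to tabulate. The Python sketch below computes them and draws random keys; the names `zipf_probabilities` and `random_zipf` are hypothetical, and the sampler relies on the standard library's `random.choices` as a stand-in rather than on either generation method of Section 5.2.4.

```python
import random


def zipf_probabilities(n):
    """Return [p(1), ..., p(n)] with p(k) = 1 / (k * H_n)."""
    h_n = sum(1.0 / j for j in range(1, n + 1))  # nth harmonic number
    return [1.0 / (k * h_n) for k in range(1, n + 1)]


def random_zipf(n, rng=random):
    """Draw one key from 1..n according to Z(n).

    Assumption: weighted sampling via random.choices, which is simple
    but recomputes the weights on every call.
    """
    return rng.choices(range(1, n + 1), weights=zipf_probabilities(n))[0]


probs = zipf_probabilities(9)
print(round(probs[0], 4))     # 0.3535
print(round(sum(probs), 10))  # 1.0
print(random_zipf(9))         # a key in 1..9; key 1 is the most likely
```

For n = 9 the first probability rounds to 0.3535, the value quoted in the text, and each p(k) equals p(1)/k.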
    SequentialSearchTest (n, m, R, trials) {
        for (t=1; t<=trials; t++) {
            L = randomPermutation(n);
            for (i=1; i<=m; i++) {
                k = randomZipf(n);
                for (j=1; L[j] != k; j++);   // linear search from the front
                cost = j;
                reorder(R, L, j);            // R = MTF or TR
            }
            printCost(R, t, n, m, cost);
        }
    }
Figure 6.4. Sequential Search test code. This test program prints the cost of the $m$th request in each trial, assuming keys are generated according to $Z(n)$ and the list is reordered by either MTF or TR.
There are no known formulas for calculating $\mu(n, m)$ and $\tau(n, m)$ under this distribution, so we develop experiments to study these average costs. A test program for this purpose is sketched in Figure 6.4. In each random trial, the code generates an initial list $L$ that contains a random permutation of the keys. Then it generates a random sequence of keys according to Zipf’s distribution: for each key, it looks up the request in the list, records the cost, and reorders the list according to the rule. At the end the program reports the cost of the $m$th request.
Code to generate random permutations may be found in Section 5.2.2 of this text, and two methods for generating random variates according to Zipf’s distribution are described in Section 5.2.4. C language test programs for both MTF and TR can be downloaded from AlgLab.
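For readers who want to experiment without the C programs, here is a minimal runnable Python sketch of the test program in Figure 6.4. It is an illustration under stated assumptions, not the AlgLab code: the function name `sequential_search_test`, the `seed` parameter, and the string rule names are hypothetical, and `random.shuffle` and `random.choices` stand in for the generators of Sections 5.2.2 and 5.2.4. The function returns the costs instead of printing them.

```python
import random


def sequential_search_test(n, m, rule, trials, seed=None):
    """Return the cost of the m-th request in each of `trials` trials.

    rule is "MTF" or "TR". Requests follow Zipf's distribution Z(n).
    """
    if rule not in ("MTF", "TR"):
        raise ValueError("rule must be 'MTF' or 'TR'")
    if m < 1:
        raise ValueError("m must be at least 1")
    rng = random.Random(seed)
    keys = list(range(1, n + 1))
    weights = [1.0 / k for k in keys]  # proportional to 1/(k * H_n)
    costs = []
    for _ in range(trials):
        lst = keys[:]
        rng.shuffle(lst)  # random initial permutation
        cost = 0
        for k in rng.choices(keys, weights=weights, k=m):
            j = lst.index(k)  # 0-based position found by linear search
            cost = j + 1      # request cost is the 1-based position
            if rule == "MTF":
                lst.insert(0, lst.pop(j))
            elif j > 0:       # TR: swap with predecessor
                lst[j - 1], lst[j] = lst[j], lst[j - 1]
        costs.append(cost)    # only the m-th cost is reported
    return costs


# One design point of the first experiment: n = 50, m = 1001, 25 trials.
print(sequential_search_test(50, 1001, "MTF", trials=25, seed=1))
print(sequential_search_test(50, 1001, "TR", trials=25, seed=1))
```

Passing the same seed to both calls gives the two rules the same initial lists and request sequences, which is convenient for side-by-side inspection; omit the seed for independent runs.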
The experiment runs $t$ random trials of this program at each design point $(n, m)$. The random variate $M_i(n, m)$, which can take any value in $1 \ldots n$, denotes the cost of MTF reported in the $i$th trial at this design point. The sample mean at a design point, denoted $M(n, m)$, is the average of $t$ outcomes:

$$M(n, m) = \frac{1}{t} \sum_{i=1}^{t} M_i(n, m). \tag{6.3}$$
The expectation $E[M_i(n, m)]$ of a random variate such as $M_i(n, m)$ is the weighted average of all possible outcomes, with each outcome weighted by its (unknown) probability of occurring. Since we assume these variates are generated according to some distribution with mean $\mu(n, m)$, it must hold that

$$E[M_i(n, m)] = \mu(n, m). \tag{6.4}$$
We say that variate $M_i(n, m)$ is an estimator of $\mu(n, m)$, because its expectation equals $\mu(n, m)$. The sample mean $M(n, m)$ from an experiment is likely to be close to $\mu(n, m)$, and it is sometimes possible to quantify how close.
We will also be interested in the sample variance, a statistic that describes the dispersion of points away from their mean, defined by

$$\mathrm{Var}(M(n, m)) = \frac{1}{t} \sum_{i=1}^{t} \left(M_i(n, m) - M(n, m)\right)^2. \tag{6.5}$$
Let $T_i(n, m)$, $T(n, m)$, and $\mathrm{Var}(T(n, m))$ denote the analogous quantities for the Transpose rule.
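Formulas (6.3) and (6.5) translate directly into code. The sketch below uses hypothetical function names and a made-up list of five trial costs, chosen only so the arithmetic can be checked by hand. Note that (6.5) divides by t, not by t - 1.

```python
def sample_mean(costs):
    """Formula (6.3): the average of the t trial outcomes."""
    return sum(costs) / len(costs)


def sample_variance(costs):
    """Formula (6.5): mean squared deviation from the sample mean.

    Divides by t, following the text, rather than by t - 1.
    """
    mean = sample_mean(costs)
    return sum((x - mean) ** 2 for x in costs) / len(costs)


# Hypothetical costs from t = 5 trials (not data from the experiment).
costs = [3, 1, 12, 7, 2]
print(sample_mean(costs))      # 5.0
print(sample_variance(costs))  # 16.4, from (4 + 16 + 49 + 4 + 9) / 5
```

Python's standard library offers the same divide-by-t statistic as `statistics.pvariance`, while `statistics.variance` divides by t - 1 and would give a slightly larger value.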
Although experiments are developed here to study theoretical questions, it is worth pointing out that self-organizing search rules, especially MTF, are of interest in many practical contexts. For example, most caching and paging algorithms keep track of elements in least-recently-used (LRU) order, which is identical to MTF order. When used in applications that require lookups in linearly organized data, these algorithms are sometimes more efficient than even binary search, for example, when the key distribution is skewed toward a small number of frequent requests, or when the request sequence exhibits temporal locality. Most of the variance reduction techniques illustrated here apply equally well to theory-driven or application-driven experiments.
The First Experiment
Rivest [15] showed that for any nonuniform request distribution such as Zipf’s distribution, Transpose has lower asymptotic cost, but Move-to-Front reaches its asymptote more quickly. We know that $\mu(n, 1) = \tau(n, 1)$ because the initial list is randomly ordered; furthermore, Rivest’s result implies that there is a crossover point $m_c$ such that

$$\mu(n, m) < \tau(n, m) \quad \text{when } 1 < m < m_c,$$
$$\mu(n, m) > \tau(n, m) \quad \text{when } m_c \le m.$$

Our first experimental goal is to locate the crossover point $m_c$.
Figure 6.5 shows the outcome of the first experiment, measuring $M_i(50, m)$ and $T_i(50, m)$ in 25 random trials at each design point $n = 50$ and $m = 1, 101, 201, \ldots, 1001$. The lines connect the sample means $M(50, m)$ and $T(50, m)$.
[Plots: panel (a) shows M against m and panel (b) shows T against m. Both horizontal axes run from 0 to 1000; both vertical axes run from 0 to 50.]
Figure 6.5. The cost of the $m$th request. Panel (a) shows measurements of request costs $M_i(50, m)$ for Move-to-Front; panel (b) shows measurements of request costs $T_i(50, m)$ for Transpose. Both experiments take 25 random trials each at $m = 1, 101, 201, \ldots, 1001$, with requests generated by Zipf’s distribution on $n = 50$ keys. The lines connect sample means in each column of data.
The sample means and sample variances for the rightmost data columns in each panel ($n = 50$, $m = 1001$) appear in the following table. These statistics are calculated according to formulas (6.3) and (6.5).
Statistical methods for expressing our confidence in how well the sample means estimate the distribution means $\mu(n, m)$ and $\tau(n, m)$ are described in Section 7.1.2. One method is to calculate 95-percent confidence intervals for $M(n, m)$ and $T(n, m)$, which are also shown in the table. If certain assumptions about the data sample hold, the confidence intervals will contain the true means, $\mu(50, 1001)$ and $\tau(50, 1001)$, in 95 out of 100 experiments.
        Mean    Var       95% Conf.
MTF     15.6    159.75    [10.64, 20.56]
TR      11.4    195.83    [5.91, 16.89]
The huge variance in the data creates wide confidence intervals with ranges near 10, which indicate that $M(n, m)$ and $T(n, m)$ might be as far as $\pm 5$ from their true means. Since the confidence intervals overlap, we cannot say with any certainty whether $\mu(50, 1001) < \tau(50, 1001)$ at this point.
It should be pointed out that the “certain assumptions about the data” mentioned previously and explained in detail in Section 7.1.2 only partially hold in these experiments. Confidence intervals are reported here for comparison purposes, to illustrate how variance reduction yields stronger results. Do not place too much confidence in these confidence intervals for estimation purposes.
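To see where interval widths of this size come from, the following Python sketch recomputes the intervals from the reported means and variances. It assumes the common normal-approximation formula, mean ± 1.96 · sqrt(Var / t) with t = 25; that formula is an assumption made here for illustration, and Section 7.1.2 remains the authority on how the intervals are actually defined. The function name `normal_ci_95` is hypothetical.

```python
import math


def normal_ci_95(mean, variance, t):
    """Approximate 95% confidence interval for a mean of t outcomes.

    Assumption: normal approximation with critical value 1.96.
    """
    half_width = 1.96 * math.sqrt(variance / t)
    return mean - half_width, mean + half_width


# Means and variances reported for n = 50, m = 1001, with t = 25 trials.
for rule, mean, var in [("MTF", 15.6, 159.75), ("TR", 11.4, 195.83)]:
    lo, hi = normal_ci_95(mean, var, 25)
    print(f"{rule}: [{lo:.2f}, {hi:.2f}]")
```

Under this assumption the computed endpoints agree with the reported intervals to within about 0.01, the size of difference that rounding of intermediate values would explain. Because the half-width scales with sqrt(Var / t), shrinking the variance narrows the interval without adding trials.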