
Creating Analysis-Friendly Data

Guideline 6.2 Design your experiments to maximize the information content in the data: aim for clear views of simple relationships.

These options are not available in every experiment, of course. There is usually a practical limit on the number of trials and on the largest problem size that can be measured. If the data will not cooperate despite your best design efforts, consider the techniques surveyed in the next two sections.

6.1 Variance Reduction Techniques

A variance reduction technique (VRT) modifies the test program in a way that reduces variance in the measured outcomes, on the theory that less variance yields better views of average-case costs. These techniques really do reduce variance in data: if the goal of the experiment is to understand variance as it occurs naturally, do not apply VRTs.

To illustrate how these techniques work, Section 6.1.1 presents a case study to compare two algorithms for a problem in dynamic searching. The first experiment utilizes two straightforward implementations of the algorithms and reports both mean costs and some statistics to assess variance. Next, three variance reduction techniques, called common random variates, control variates, and conditional expectation, are applied, and the outcomes are compared to those from the first experiment. Some tips on applying these VRTs to general algorithms are also presented. Section 6.1.2 discusses some additional VRTs and their general application.

6.1.1 Case Study: VRTs for self-organizing sequential search rules

The self-organizing sequential search problem is to maintain a list of n distinct keys under a series of m requests for keys. The cost of each request is equal to the position of the key in the list, which is the cost of searching for the key using a linear search from the front. Figure 6.3 shows an example list containing keys 1 ... 6 in positions 1 through 6: a request for key 3 in this list would have cost 5.

The list is allowed to reorder itself by some rule that tries to keep frequently requested keys near the front to reduce total search cost, but the rule must work without any information about future requests. Two popular rules, illustrated in Figure 6.3, are:

• Move-to-Front (MTF): After key k is requested, move it to the front of the list, and shift every key that was ahead of it one position back.

• Transpose (TR): After key k is requested, move it one position closer to the front by transposing with its predecessor.

Nothing happens if k is already at the front of the list. Transpose is more conservative since keys change position incrementally, while MTF is more aggressive in moving a requested key all the way to the front. Which rule does a better job?

Position:  1 2 3 4 5 6
List:      5 2 1 6 3 4      Request = 3, Cost = 5
MTF:       3 5 2 1 6 4
TR:        5 2 1 3 6 4

Figure 6.3. Self-organizing sequential search. The request for key 3 has cost 5 because that key is in position 5 in the list. The MTF rule moves the requested key to the front of the list; the TR rule transposes the key with its predecessor.
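The two rules can be sketched in a few lines of Python. This is a minimal illustration, not the AlgLab code: the function names `move_to_front`, `transpose`, and `request` are hypothetical, and positions are 1-based to match the text. The example replays the request for key 3 from Figure 6.3.

```python
def move_to_front(lst, pos):
    """Move the key at 1-based position pos to the front of lst (MTF)."""
    if pos > 1:
        key = lst.pop(pos - 1)   # remove the requested key
        lst.insert(0, key)       # keys that were ahead of it shift back by one


def transpose(lst, pos):
    """Swap the key at 1-based position pos with its predecessor (TR)."""
    if pos > 1:
        lst[pos - 2], lst[pos - 1] = lst[pos - 1], lst[pos - 2]


def request(lst, key, rule):
    """Linear search from the front; return the cost, then reorder by rule."""
    cost = lst.index(key) + 1    # cost = 1-based position of the key
    rule(lst, cost)
    return cost


# The example list from Figure 6.3, with a request for key 3.
mtf_list = [5, 2, 1, 6, 3, 4]
tr_list = [5, 2, 1, 6, 3, 4]
mtf_cost = request(mtf_list, 3, move_to_front)
tr_cost = request(tr_list, 3, transpose)
print(mtf_cost, mtf_list)  # 5 [3, 5, 2, 1, 6, 4]
print(tr_cost, tr_list)    # 5 [5, 2, 1, 3, 6, 4]
```

Both rules pay the same cost for this request; they differ only in how the list is arranged for the requests that follow.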

Average-Case Experimental Analysis

We develop some notation to analyze these rules. Suppose requests are generated independently at random according to a probability distribution $P(n) = p_1, p_2, \ldots, p_n$ defined on n keys. The request cost for the key $k = L[i]$ at position i of a given list L is equal to its position i. The average list cost for list L depends on the request costs and request probabilities for each key:

$$C(L) = \sum_{i=1}^{n} i \cdot p_{L[i]}. \tag{6.1}$$

The average cost of a rule is the expected cost of the mth request, assuming that L is initially in random order (each permutation equally likely) and that the rule is applied to a sequence of random requests $R_i$, $i = 1 \ldots m-1$, generated according to distribution $P(n)$. Let $\mu(n, m)$ denote the average cost of MTF and let $\tau(n, m)$ denote the average cost of TR in this model.
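As a small worked illustration of formula (6.1), the following sketch computes the average list cost for a given ordering. The function name `average_list_cost` and the three-key distribution are made up for this example and do not come from the experiments.

```python
def average_list_cost(lst, p):
    """Formula (6.1): sum over 1-based positions i of i * p[L[i]].

    lst is the list of keys in order; p maps each key to its
    request probability.
    """
    return sum(i * p[key] for i, key in enumerate(lst, start=1))


# Hypothetical distribution on three keys (illustration only).
p = {1: 0.5, 2: 0.3, 3: 0.2}

best = average_list_cost([1, 2, 3], p)    # most likely key in front
worst = average_list_cost([3, 2, 1], p)   # most likely key at the back
print(f"{best:.2f} {worst:.2f}")          # 1.70 2.30
```

The ordering by decreasing request probability gives the smaller cost, which is the arrangement a self-organizing rule tries to approach without knowing the probabilities.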

The experiments described here measure costs for requests drawn from Zipf’s distribution, which has been used to model naturally occurring phenomena such as frequencies of words or letters in texts. This distribution, denoted $Z(n)$, is defined over the integers 1 ... n. The probability that key $k \in 1 \ldots n$ is requested next is given by

$$p(k) = \frac{1}{k H_n}, \tag{6.2}$$

where $H_n$ is the nth harmonic number, defined by $H_n = \sum_{j=1}^{n} 1/j$. (Dividing by $H_n$ scales the probabilities so they sum to 1.) A picture of Zipf’s distribution for n = 9 is shown in the following.

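[Figure: plot of Zipf’s distribution for n = 9. Horizontal axis: keys 1 through 9. The probability for key 1 is marked .3535.]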

Key 1 is generated with probability $p_1 = 0.3535$. Key 2 appears half as often as key 1, key 3 appears one-third as often as key 1, and so on.
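The probabilities in formula (6.2) are easy to check numerically. The sketch below uses hypothetical function names, and its sampler relies on the weighted `choices` method from Python's standard `random` module. That sampler is a generic stand-in and is not claimed to be either of the generation methods of Section 5.2.4.

```python
import random


def zipf_probabilities(n):
    """Return [p(1), ..., p(n)] with p(k) = 1 / (k * H_n), as in (6.2)."""
    h_n = sum(1.0 / j for j in range(1, n + 1))   # nth harmonic number
    return [1.0 / (k * h_n) for k in range(1, n + 1)]


def random_zipf(n, rng):
    """Draw one key from 1..n according to Z(n) (simple weighted draw)."""
    return rng.choices(range(1, n + 1), weights=zipf_probabilities(n))[0]


p = zipf_probabilities(9)
print(round(p[0], 4))      # 0.3535, the probability of key 1 when n = 9
print(round(p[1] / p[0], 4))  # 0.5, key 2 appears half as often as key 1

rng = random.Random(2012)  # fixed seed so a run can be repeated
sample = [random_zipf(9, rng) for _ in range(10)]
print(sample)
```

Recomputing the probabilities on every draw is wasteful; a test program that draws many variates would compute them once and reuse them.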

SequentialSearchTest (n, m, R, trials) {
    for (t=1; t<=trials; t++) {
        L = randomPermutation (n);        // random initial list
        for (i=1; i<=m; i++) {
            k = randomZipf(n);            // next requested key
            for (j=1; L[j] != k; j++);    // linear search from the front
            cost = j;
            reorder(R, L, j);             // R = MTF or TR
        }
        printCost(R, t, n, m, cost);      // cost of the mth request
    }
}

Figure 6.4. Sequential Search test code. This test program prints the cost of the mth request in each trial, assuming keys are generated according to $Z(n)$ and the list is reordered by either MTF or TR.

There are no known formulas for calculating $\mu(n, m)$ and $\tau(n, m)$ under this distribution, so we develop experiments to study these average costs. A test program for this purpose is sketched in Figure 6.4. In each random trial, the code generates an initial list L that contains a random permutation of the keys. Then it generates a random sequence of keys according to Zipf’s distribution: for each key, it looks up the request in the list, records the cost, and reorders the list according to the rule. At the end the program reports the cost of the mth request.

Code to generate random permutations may be found in Section 5.2.2 of this text, and two methods for generating random variates according to Zipf’s distribution are described in Section 5.2.4. C language test programs for both MTF and TR can be downloaded from AlgLab.
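For readers who want to experiment before downloading the C programs, the test program of Figure 6.4 might be sketched in Python as follows. This is an illustrative translation under stated assumptions, not the AlgLab code: the function name, the `seed` parameter, and the use of `random.shuffle` and `random.choices` for the permutation and the Zipf variates are all choices made here.

```python
import random


def sequential_search_test(n, m, rule, trials, seed=0):
    """Return the cost of the mth request in each of `trials` random trials.

    rule is "MTF" or "TR"; requests follow Zipf's distribution Z(n).
    """
    if rule not in ("MTF", "TR"):
        raise ValueError("rule must be 'MTF' or 'TR'")
    rng = random.Random(seed)
    keys = list(range(1, n + 1))
    weights = [1.0 / k for k in keys]   # proportional to 1/(k*H_n)
    costs = []
    for _ in range(trials):
        lst = keys[:]
        rng.shuffle(lst)                # random initial permutation
        cost = 0
        for k in rng.choices(keys, weights=weights, k=m):
            pos = lst.index(k)          # 0-based position of the key
            cost = pos + 1              # request cost is the 1-based position
            if pos > 0:
                if rule == "MTF":
                    lst.insert(0, lst.pop(pos))
                else:                   # TR: swap with predecessor
                    lst[pos - 1], lst[pos] = lst[pos], lst[pos - 1]
        costs.append(cost)              # cost of the mth (last) request
    return costs


mtf_costs = sequential_search_test(50, 1001, "MTF", trials=25, seed=1)
tr_costs = sequential_search_test(50, 1001, "TR", trials=25, seed=1)
print(mtf_costs)
print(tr_costs)
```

As in Figure 6.4, the list is reordered after the final request as well; this does not affect the reported cost.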

The experiment runs t random trials of this program at each design point (n, m). The random variate $M_i(n, m)$, which can take any value in 1 ... n, denotes the cost of MTF reported in the ith trial at this design point. The sample mean at a design point, denoted $M(n, m)$, is the average of t outcomes:

$$M(n, m) = \frac{1}{t} \sum_{i=1}^{t} M_i(n, m). \tag{6.3}$$

The expectation $E[M_i(n, m)]$ of a random variate such as $M_i(n, m)$ is the weighted average of all possible outcomes, with each outcome weighted by its (unknown) probability of occurring. Since we assume these variates are generated according to some distribution with mean $\mu(n, m)$, it must hold that

$$E[M_i(n, m)] = \mu(n, m). \tag{6.4}$$

We say that variate $M_i(n, m)$ is an estimator of $\mu(n, m)$, because its expectation equals $\mu(n, m)$. The sample mean $M(n, m)$ from an experiment is likely to be close to $\mu(n, m)$, and it is sometimes possible to quantify how close.

We will also be interested in the sample variance, a statistic that describes the dispersion of points away from their mean, defined by

$$\mathrm{Var}(M(n, m)) = \frac{1}{t} \sum_{i=1}^{t} \left( M_i(n, m) - M(n, m) \right)^2. \tag{6.5}$$

Let $T_i(n, m)$, $T(n, m)$, and $\mathrm{Var}(T(n, m))$ denote the analogous quantities for the Transpose rule.
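Formulas (6.3) and (6.5) translate directly into code. The sketch below uses hypothetical function names and a made-up list of four trial outcomes. Note that it divides by t, as formula (6.5) does, rather than by t − 1.

```python
def sample_mean(outcomes):
    """Formula (6.3): the average of the t trial outcomes."""
    return sum(outcomes) / len(outcomes)


def sample_variance(outcomes):
    """Formula (6.5): mean squared deviation from the sample mean (divisor t)."""
    mean = sample_mean(outcomes)
    return sum((x - mean) ** 2 for x in outcomes) / len(outcomes)


# Made-up request costs from t = 4 trials (illustration only).
costs = [3, 1, 8, 4]
print(sample_mean(costs))      # 4.0
print(sample_variance(costs))  # 6.5
```

In Python's standard `statistics` module, `pvariance` uses the same divisor t, while `variance` divides by t − 1.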

Although experiments are developed here to study theoretical questions, it is worth pointing out that self-organizing search rules, especially MTF, are of interest in many practical contexts. For example, most caching and paging algorithms keep track of elements in least-recently-used (LRU) order, which is identical to MTF order. When used in applications that require lookups in linearly organized data, these algorithms are sometimes more efficient than even binary search, for example, when the key distribution is skewed toward a small number of frequent requests, or when the request sequence exhibits temporal locality. Most of the variance reduction techniques illustrated here apply equally well to theory-driven or application-driven experiments.

The First Experiment

Rivest [15] showed that for any nonuniform request distribution such as Zipf’s distribution, Transpose has lower asymptotic cost, but Move-to-Front reaches its asymptote more quickly. We know that $\mu(n, 1) = \tau(n, 1)$ because the initial list is randomly ordered; furthermore, Rivest’s result implies that there is a crossover point $m_c$ such that

$$\mu(n, m) < \tau(n, m) \quad \text{when } 1 < m < m_c,$$
$$\mu(n, m) > \tau(n, m) \quad \text{when } m_c \le m.$$

Our first experimental goal is to locate the crossover point $m_c$.

Figure 6.5 shows the outcome of the first experiment, measuring $M_i(50, m)$ and $T_i(50, m)$ in 25 random trials at each design point n = 50 and m = 1, 101, 201, ..., 1001. The lines connect the sample means $M(50, m)$ and $T(50, m)$.

[Figure 6.5: plot with two panels, (a) and (b). Horizontal axis: m, from 0 to 1000. Vertical axis: M in panel (a) and T in panel (b), from 0 to 50.]

Figure 6.5. The cost of the mth request. Panel (a) shows measurements of request costs $M_i(50, m)$ for Move-to-Front; panel (b) shows measurements of request costs $T_i(50, m)$ for Transpose. Both experiments take 25 random trials each at m = 1, 101, 201, ..., 1001, with requests generated by Zipf’s distribution on n = 50 keys. The lines connect sample means in each column of data.

The sample means and sample variances for the rightmost data columns in each panel (n = 50, m = 1001) appear in the following table. These statistics are calculated according to formulas (6.3) and (6.5).

Statistical methods for expressing our confidence in how well the sample means estimate the distribution means $\mu(n, m)$ and $\tau(n, m)$ are described in Section 7.1.2. One method is to calculate 95-percent confidence intervals for $M(n, m)$ and $T(n, m)$, which are also shown in the table. If certain assumptions about the data sample hold, the confidence intervals will contain the true means, $\mu(50, 1001)$ and $\tau(50, 1001)$, in 95 out of 100 experiments.

       Mean    Var       95% Conf.
MTF    15.6    159.75    [10.64, 20.56]
TR     11.4    195.83    [5.91, 16.89]

The huge variance in the data creates wide confidence intervals with ranges near 10, which indicate that $M(n, m)$ and $T(n, m)$ might be as far as ±5 from their true means. Since the confidence intervals overlap, we cannot say with any certainty whether $\mu(50, 1001) < \tau(50, 1001)$ at this point.
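To see where intervals of this width come from, the following sketch applies the common normal-approximation formula, mean ± 1.96 · sqrt(Var / t), to the statistics in the table with t = 25. The choice of formula is an assumption made here (the method actually used is deferred to Section 7.1.2), although the results agree with the tabulated intervals to within about 0.01. The function names are hypothetical.

```python
import math


def confidence_interval_95(mean, var, t, z=1.96):
    """Normal-approximation interval: mean +/- z * sqrt(var / t).

    This formula is an assumption of this sketch, not necessarily the
    method of Section 7.1.2.
    """
    half_width = z * math.sqrt(var / t)
    return (mean - half_width, mean + half_width)


def intervals_overlap(a, b):
    """True if closed intervals a = (lo, hi) and b = (lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


# Sample statistics from the table (n = 50, m = 1001, t = 25 trials).
mtf_ci = confidence_interval_95(15.6, 159.75, 25)
tr_ci = confidence_interval_95(11.4, 195.83, 25)
print("MTF: [%.2f, %.2f]" % mtf_ci)
print("TR:  [%.2f, %.2f]" % tr_ci)
print("overlap:", intervals_overlap(mtf_ci, tr_ci))
```

Because the half-width shrinks with the square root of Var / t, halving the interval width by brute force would take four times as many trials, which is one motivation for reducing the variance instead.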

It should be pointed out that the “certain assumptions about the data” mentioned previously and explained in detail in Section 7.1.2 only partially hold in these experiments. Confidence intervals are reported here for comparison purposes, to illustrate how variance reduction yields stronger results. Do not place too much confidence in these confidence intervals for estimation purposes.
