Comparing Outcomes from Two Experiments - Testing Hypotheses about Outcomes of Experiments


2.4 Testing Hypotheses about Outcomes of Experiments

2.4.5 Comparing Outcomes from Two Experiments

Suppose that we want to test the hypothesis that two samples are drawn from different populations. To fix ideas, consider that we are comparing two systems (a system currently in use and a system that incorporates a new algorithm) on the basis of a particular performance metric. We assume that we can collect performance metrics from each system multiple times to obtain two samples. If the systems do not differ to a statistically significant degree, both samples would be drawn from the same underlying population, with the same population mean, and therefore would have similar statistics, such as the sample mean. However, if the statistics are significantly different, we infer that the two samples are likely to be drawn from different populations and that the new algorithm does indeed affect the performance of the system.

The null hypothesis is the statement

H₀: the two systems are identical

We reject H₀ only if it is sufficiently unlikely that the two samples are drawn from the same population. Retaining H₀ otherwise is the conservative position that the new algorithm does not affect the performance of the system.

Suppose that we collect n sets of samples from the first system, labeled A, to get sample means a₁, a₂, ..., a_n and collect m sets of samples from the second system, labeled B, to get sample means b₁, b₂, ..., b_m. Let the means of these means be denoted ā and b̄, with corresponding variances m₂(a) and m₂(b).

If n=m, we define an auxiliary random variable C = A – B, which takes values c_i = a_i – b_i. Then, we redefine the hypothesis as

H₀: the population mean of C is zero

This can be easily tested using the approach described in Section 2.4.4.

EXAMPLE 2.9: COMPARING TWO SAMPLES

Suppose that you are using simulations to study the effect of buffer size at some network queue on packet loss rate. You would like to see whether increasing the buffer size from 5 packets to 100 packets has a significant effect on loss rate. To do so, suppose that you run ten simulations for each buffer size, resulting in loss rates as follows:

Loss rate with 5 buffers:    1.20%  2.30%  1.90%  2.40%  3.00%  1.80%  2.10%  3.20%  4.50%  2.20%
Loss rate with 100 buffers:  0.10%  0.60%  1.10%  0.80%  1.20%  0.30%  0.70%  1.90%  0.20%  1.20%


Does the buffer size have a significant influence on the packet loss rate?

Solution:

Note that each loss rate measurement in each simulation is itself a sample mean. Therefore, these loss rates can be assumed to be distributed approximately normally. Denoting by a_i the loss rate with a buffer of size 5 packets and by b_i the loss rate with a buffer size of 100 packets, we define the auxiliary variable C that takes values c_i = a_i – b_i, that is, (1.2 – 0.1), (2.3 – 0.6), ..., (2.2 – 1.2), so that c is given by

c = {1.1, 1.7, 0.8, 1.6, 1.8, 1.5, 1.4, 1.3, 4.3, 1.0}

We compute the sample mean as 1.65 and the sample variance m₂ as 0.87, so that the unbiased estimator for the population variance is given by (n/(n – 1))m₂ = 0.97.

The variance of the sample mean is estimated by dividing this unbiased estimate by n, giving 0.97/n = 0.097, corresponding to a standard deviation of 0.31. Because the number of values is smaller than 20, we use the t distribution with 9 degrees of freedom to compute the confidence interval at the 95% level as 1.65 ± 0.70. This interval does not include 0. Thus, we conclude that the change in the buffer size does significantly affect the loss rate.
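The arithmetic of this example can be checked with a short Python sketch. It is illustrative only: the function name `paired_confidence_interval` is hypothetical, and the critical value 2.262 is an assumption taken from a standard t table (two-sided, 95% level, 9 degrees of freedom) rather than computed.

```python
import math
import statistics

# Loss rates (%) from Example 2.9, one value per simulation run.
loss_5 = [1.2, 2.3, 1.9, 2.4, 3.0, 1.8, 2.1, 3.2, 4.5, 2.2]
loss_100 = [0.1, 0.6, 1.1, 0.8, 1.2, 0.3, 0.7, 1.9, 0.2, 1.2]

# Assumption: two-sided 95% critical value of the t distribution with
# 9 degrees of freedom, read from a standard t table.
T_CRIT_95_DF9 = 2.262


def paired_confidence_interval(a, b, t_crit):
    """Return (mean, half_width) of the interval for the mean of C = A - B."""
    if len(a) != len(b):
        raise ValueError("the paired comparison needs equal sample sizes")
    c = [x - y for x, y in zip(a, b)]
    n = len(c)
    mean_c = statistics.fmean(c)
    unbiased_var = statistics.variance(c)   # divides by n - 1
    std_err = math.sqrt(unbiased_var / n)   # std. deviation of the sample mean
    return mean_c, t_crit * std_err


mean_c, half_width = paired_confidence_interval(loss_5, loss_100, T_CRIT_95_DF9)
print(f"mean difference = {mean_c:.2f} +/- {half_width:.2f}")

# H0 (population mean of C is zero) is rejected if 0 lies outside the interval.
reject_h0 = not (mean_c - half_width <= 0.0 <= mean_c + half_width)
print("reject H0:", reject_h0)
```

A hard-coded table value keeps the sketch free of third-party dependencies; a statistics library could supply the critical value instead.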

If n ≠ m, the situation is somewhat more complex. We first use m₂(a) to compute the confidence interval for A’s performance metric around its sample mean ā and similarly use m₂(b) to compute the confidence interval for B’s performance metric around its sample mean b̄, using the normal or t distribution, as appropriate.

Now, one of the following two cases holds.

1. The confidence intervals do not overlap. Recall that with 95% (or 99%) confidence, A’s and B’s population means lie within the computed confidence intervals. If the null hypothesis were true and the population means coincided, it must be the case that either A’s population mean or B’s population mean lies outside its computed confidence interval. However, this has a probability lower than 5% (1%). Therefore, we reject the hypothesis.

2. The confidence intervals overlap. In this case, there is some chance that the samples are drawn from the same population. The next steps depend on whether we can make one of two assumptions: (a) the population variances are the same, or (b) n and m are both large.
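The overlap check that separates these two cases can be sketched in Python. This is an illustration under stated assumptions, not a prescribed procedure: the helper names `confidence_interval` and `intervals_overlap` are hypothetical, the critical value 2.262 is assumed from a standard t table (two-sided, 95% level, 9 degrees of freedom), and the data of Example 2.9 are treated here as two independent samples purely for the sake of the example.

```python
import math
import statistics


def confidence_interval(sample, t_crit):
    """Return (low, high) for the population mean, given a t critical value."""
    mean = statistics.fmean(sample)
    # statistics.variance divides by n - 1 (unbiased estimate).
    std_err = math.sqrt(statistics.variance(sample) / len(sample))
    return mean - t_crit * std_err, mean + t_crit * std_err


def intervals_overlap(ci_a, ci_b):
    """True if the closed intervals ci_a and ci_b share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# Loss rates (%) from Example 2.9, treated as independent samples
# (an assumption made only for this illustration).
loss_5 = [1.2, 2.3, 1.9, 2.4, 3.0, 1.8, 2.1, 3.2, 4.5, 2.2]
loss_100 = [0.1, 0.6, 1.1, 0.8, 1.2, 0.3, 0.7, 1.9, 0.2, 1.2]
T_CRIT_95_DF9 = 2.262  # assumed from a standard t table

ci_a = confidence_interval(loss_5, T_CRIT_95_DF9)
ci_b = confidence_interval(loss_100, T_CRIT_95_DF9)

if intervals_overlap(ci_a, ci_b):
    print("case 2: intervals overlap, further testing is needed")
else:
    print("case 1: intervals do not overlap, reject H0")
```

With these data the two intervals are disjoint, so the sketch lands in case 1.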

If the population variances are the same, we define the auxiliary variable s by

$$s^2 = \frac{\sum_{i=1}^{n} (a_i - \bar{a})^2 + \sum_{i=1}^{m} (b_i - \bar{b})^2}{m + n - 2}$$ (EQ 2.21)

where ā and b̄ are the means of the two samples. Then, it can be shown that if the two samples are drawn from the same population, the variable c defined by

$$c = \frac{\bar{a} - \bar{b}}{s\sqrt{\frac{1}{m} + \frac{1}{n}}}$$ (EQ 2.22)

is a standard t variable (i.e., with zero mean and unit variance) with m + n – 2 degrees of freedom. Therefore, we can use a t test to determine whether c has a zero mean, using the approach in Section 2.4.4.
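A minimal Python sketch of this statistic follows. It assumes the standard pooled-variance form (the sum of squared deviations of both samples divided by m + n – 2); the function name `pooled_t_statistic` and the sample values are hypothetical and serve only as an illustration.

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Return (c, degrees_of_freedom) for two samples assumed to share
    the same population variance."""
    n, m = len(a), len(b)
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    # Sums of squared deviations of each sample about its own mean.
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    s = math.sqrt((ss_a + ss_b) / (m + n - 2))   # pooled standard deviation
    c = (mean_a - mean_b) / (s * math.sqrt(1 / m + 1 / n))
    return c, m + n - 2


# Hypothetical measurements, for illustration only.
sample_a = [1.0, 2.0, 3.0]
sample_b = [2.0, 3.0, 4.0, 5.0]

c, dof = pooled_t_statistic(sample_a, sample_b)
print(f"c = {c:.3f} with {dof} degrees of freedom")
```

The returned value would then be compared against the critical value of the t distribution with the returned number of degrees of freedom.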

EXAMPLE 2.10: COMPARING TWO SAMPLES OF DIFFERENT SAMPLE SIZES

Continuing with Example 2.9, assume that we have additional data points for the simulation runs with five buffers, as follows. Can we still claim that the buffer size plays a role in determining the loss rate?

Loss rate with 5 buffers:    0.20%  0.30%  0.90%  1.40%  1.00%  0.80%  1.10%  0.20%  1.50%  0.20%  0.50%  1.20%  0.70%  1.30%  0.90%
Loss rate with 100 buffers:  0.10%  0.60%  1.10%  0.80%  1.20%  0.30%  0.70%  1.90%  0.20%  1.20%

Solution:

Here, n = 15 and m = 10, so we cannot use the approach of Example 2.9.

Instead, we will first compute the mean and confidence intervals of both samples to see whether the intervals overlap. It is easily found that, at the 95% level, using a t distribution with 14 and 9 degrees of freedom, respectively, the confidence intervals are 0.81 ± 0.25 and 0.81 ± 0.40, which overlap. However, the sample variances are not the same, and there is a good chance that the population variances also differ. Nevertheless, for the purpose of this example, we will make the assumption that the population variances are the same. The sums of squared deviations about the sample means are 2.84 for the 15 values and 2.77 for the 10 values. Therefore, we compute s² using Equation 2.21 as (2.84 + 2.77)/23 = 0.24, so that s = 0.49. We then use Equation 2.22 to compute the standard t variate c = (ā – b̄)/(s(1/m + 1/n)^(1/2)) as 0.0033/(0.49(1/15 + 1/10)^(1/2)) ≈ 0.017. Since this has unit variance, it is easy to see using the t test with 23 degrees of freedom that 0 lies in the confidence interval for c, which implies that, with this data set, buffer size has no statistically significant effect on packet loss rate.

If population variances differ, but m and n are both large, it can be shown that the variable c defined by

$$c = \frac{\bar{a} - \bar{b}}{\sqrt{\frac{m_2(a)}{n} + \frac{m_2(b)}{m}}}$$ (EQ 2.23)

is a standard normal variable (i.e., with a zero mean and unit variance). Therefore, we can use a standard normal test to determine whether c has a zero mean, using the approach discussed in Section 2.4.4.
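The large-sample statistic can be sketched in Python as follows. The sketch assumes the usual form in which each sample variance is divided by its own sample size before the two are added; the function name `large_sample_statistic` is hypothetical, and 1.96 is the standard two-sided 95% critical value of the normal distribution. The data of Example 2.10 are reused only to show the arithmetic: with 15 and 10 values the samples are not large, so the normal approximation is not actually justified for them.

```python
import math
import statistics


def large_sample_statistic(a, b):
    """Return c for two samples whose population variances may differ.

    c is approximately standard normal only when both samples are large.
    """
    # statistics.pvariance divides by the sample size, matching m2.
    var_of_mean_a = statistics.pvariance(a) / len(a)
    var_of_mean_b = statistics.pvariance(b) / len(b)
    diff = statistics.fmean(a) - statistics.fmean(b)
    return diff / math.sqrt(var_of_mean_a + var_of_mean_b)


# Loss rates (%) from Example 2.10 (too few values for the approximation
# to hold; used here for arithmetic illustration only).
loss_5 = [0.2, 0.3, 0.9, 1.4, 1.0, 0.8, 1.1, 0.2, 1.5, 0.2,
          0.5, 1.2, 0.7, 1.3, 0.9]
loss_100 = [0.1, 0.6, 1.1, 0.8, 1.2, 0.3, 0.7, 1.9, 0.2, 1.2]

Z_CRIT_95 = 1.96  # two-sided 95% critical value of the standard normal

c = large_sample_statistic(loss_5, loss_100)
reject_h0 = abs(c) > Z_CRIT_95
print(f"c = {c:.4f}, reject H0: {reject_h0}")
```

Because the two sample means are nearly equal, c is close to zero and the null hypothesis is not rejected, in agreement with the conclusion of Example 2.10.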

If neither assumption can be made, it is difficult to draw meaningful comparisons, other than by using nonparametric tests, such as the Mann-Whitney U test, which is beyond the scope of this text.

2.4.6 Testing Hypotheses about Quantities Measured on

In document Mathematical Foundations of Computer Networking (Page 95-98)