How To Test For A Null Hypothesis

(1)

TESTING THE ROULETTE WHEEL

Mihael Perman University of Ljubljana

Osijek, June 4th, 2009

(2)

SOME QUOTATIONS

The generation of random numbers is too important to be left to chance.

Robert R. Coveyou, Oak Ridge National Laboratory, USA

The only way to win at roulette is to steal chips when the croupier looks the other way.

Albert Einstein

With this roulette betting system, you will bet on the almost even money bets. You will bet on black or red or even or odd. You will not bet on the high-odds choices.

All you need to win with this system is to win 7 out of 20 times.

Internet, 1995

Roulette is the most glamorous of all the casino games.

An air of elegance surrounds the roulette table, and its spinning wheel seems to be a perfect agent by which the goddess of fortune may intervene in the affairs of mortal men. How much superior is this unapproachable mechanistic device to those games like dice and cards, where human hands may tamper with fate.

But more than glamour, the game presents to me a certain irresistible challenge. The roulette is intended to be a symmetrical gambling device, the odds for which always favour the house. In the long run, it would appear that a player must inevitably lose. But due to a certain degree of asymmetry in the wheel’s production, or due to its later wear, the odds may shift enough to favor a player on certain bets. The shrewd observer may spot such a case and actually be able to play a winning game. Herein lies the challenge.

Allan N. Wilson, The Casino Gambler’s Guide.

(4)

Unless we inconvenience ourselves by staying a long time in the casino to increase the sample of spins, how may we distinguish statistically a true weak biased number from the false random winners that eventually fluctuate all over the place and through which we lose? This question I have put to two professors of mathematics, experts in roulette theory and play, whom I quote elsewhere in this book, and they both declare sadly that they have thus far no statistical method to offer as a practical solution. For the greater success of biased-wheel play, let’s hope that some day a solution may be found.

Russell T. Barnhart, Beating the Wheel

(5)

THE PROBLEM

The roulette wheel in principle generates random numbers uniformly distributed on the set {0, 1, 2, . . . , 36}.

Mechanical imperfections or wilful manipulation can lead to deviations from uniformity in various ways. Gambling houses are interested in statistics that would detect such deviations as soon as possible with the smallest probability of false alarms.

The reasons why “quality control” is desirable are the following:

• The odds offered by the house should be those advertised.

• Skilled enough groups could take advantage of deviations if they notice them before the supervisors of the house. Relatively small deviations can “nudge” the expected gain into the positive.

• Quality control should include the “human factor”. Croupiers are human and could potentially cheat in collusion with gamblers.

(6)

MATHEMATICAL FORMULATION

In statistical terms the problem is formulated as follows:

• We have observations $X_1, X_2, \ldots$ from the roulette wheel taking values in {0, 1, 2, . . . , 36}. We will assume that the observations are independent.

There are two main objectives:

• We would like to test the hypothesis (possibly sequentially)

$$H_0 : X_1, X_2, \ldots \sim \mathrm{Uniform}\{0, 1, \ldots, 36\} \quad \text{against} \quad H_1 : X_1, X_2, \ldots \not\sim \mathrm{Uniform}\{0, 1, \ldots, 36\}.$$

The question is which test statistics to choose and how to decide whether to reject or accept the null-hypothesis.

• We would like to detect a “change point”. The observations can start out as uniform but change to another distribution as we are collecting the observations. How does one detect such a change?

(7)

THE CLASSICAL χ² TEST

The usual χ² test is the first idea to try.

Notation:

• $N_n^k$ is the frequency of outcome k after n spins of the wheel.

• $p_0 = 1/m$ is the probability of each outcome under the null-hypothesis.

• m = 37.

We compute

$$\chi^2 = \sum_{k=0}^{m-1} \frac{(N_n^k - n p_0)^2}{n p_0}.$$
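The statistic above can be computed directly from the pocket counts. The following is a minimal sketch using only the Python standard library; the function names `chi_square` and `spin_counts`, the seed and the sample size are illustrative choices, not part of the original slides. For a fixed number of spins the statistic is approximately χ²(36) under the null-hypothesis, whose 95% quantile is about 51.0.

```python
import random

M = 37                 # number of pockets, labelled 0..36
P0 = 1.0 / M           # probability of each pocket under the null-hypothesis

def chi_square(counts):
    """Pearson chi-square statistic for pocket counts N_n^0, ..., N_n^(m-1)."""
    n = sum(counts)
    expected = n * P0
    return sum((c - expected) ** 2 for c in counts) / expected

def spin_counts(n, weights=None, rng=random):
    """Counts per pocket after n simulated spins (uniform unless weights are given)."""
    counts = [0] * M
    for k in rng.choices(range(M), weights=weights, k=n):
        counts[k] += 1
    return counts

rng = random.Random(2009)          # fixed seed so that the run is repeatable
fair = spin_counts(5000, rng=rng)
print("chi-square, fair wheel:", round(chi_square(fair), 2))
```

Passing a list of 37 weights to `spin_counts` simulates a biased wheel, which makes it easy to see how slowly the statistic reacts to small deviations.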

(8)

[Plot: probability distribution over cells]

Fig. 1 Probability distribution for a “hanging” wheel

[Plot: trajectory of the CHI statistic against the number of spins]

Fig. 1a Behaviour of the χ²-statistic for a “hanging” wheel

(9)

[Plot: probability distribution over cells]

Fig. 2 Probability distribution for a “dented” wheel

[Plot: trajectories of the CHI statistic and the likelihood ratio statistic against the number of spins]

Fig. 2a Behaviour of the χ²-statistic for a “dented” wheel

(10)

[Plot: probability distribution over cells]

Fig. 3 Probability distribution for a “nicked” wheel

[Plot: trajectories of the CHI statistic and the likelihood ratio statistic against the number of spins]

Fig. 3a Behaviour of the χ²-statistic for a “nicked” wheel

(11)

[Plot: perfect probability distribution over cells]

Fig. 4 Probability distribution for a perfect wheel

[Plot: trajectories of the CHI statistic and the likelihood ratio statistic against the number of spins]

Fig. 4a Behaviour of the χ²-statistic and the likelihood ratio statistic for a perfect wheel

(12)

CENTRAL LIMIT THEOREM

The distributions of the statistics to be used are all derived from a simple observation based on the central limit theorem.

The vector

$$\frac{1}{\sqrt{n p_0}}\left(N_n^1 - n p_0,\; N_n^2 - n p_0,\; \ldots,\; N_n^m - n p_0\right) \xrightarrow{d} N_m(0, \Sigma)$$

where $\Sigma = I - \mathbf{1}\mathbf{1}^T/m$.

Remark: Multivariate normal vectors with the above distribution are easy to simulate on the computer. One only needs to simulate

$$(Z_1 - \bar Z,\; Z_2 - \bar Z,\; \ldots,\; Z_m - \bar Z)$$

where $Z_1, Z_2, \ldots, Z_m$ are independent standard normals.
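The remark can be checked numerically. The sketch below draws centred normal vectors and compares the empirical variance of one coordinate and the covariance of two coordinates with the entries of Σ, namely 1 − 1/m on the diagonal and −1/m off it. The function name `centred_normals`, the seed and the number of draws are illustrative assumptions.

```python
import random

M = 37

def centred_normals(rng):
    """One draw of (Z_1 - Zbar, ..., Z_m - Zbar) for independent standard normals Z_i."""
    z = [rng.gauss(0.0, 1.0) for _ in range(M)]
    zbar = sum(z) / M
    return [zi - zbar for zi in z]

rng = random.Random(1)
draws = [centred_normals(rng) for _ in range(20000)]

# Empirical variance of the first coordinate and covariance of the first two;
# the theoretical values are 1 - 1/m and -1/m respectively.
var0 = sum(d[0] ** 2 for d in draws) / len(draws)
cov01 = sum(d[0] * d[1] for d in draws) / len(draws)
print(round(var0, 3), "vs", round(1 - 1 / M, 3))
print(round(cov01, 3), "vs", round(-1 / M, 3))
```

The sum of squares of such a vector has the χ²(m − 1) distribution, so the same draws can be fed into any of the statistics to approximate their limiting distributions.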

(13)

MAIN ALTERNATIVE HYPOTHESES

The alternative hypotheses we will consider:

[Plot: probability distribution over cells; x-axis: cells]

Fig. 5 Dented wheel. The payoff for betting on triplets is 1:11. In the case shown betting on the first three cells gives an expected payoff of 0.294.

(14)

[Plot: probability distribution over cells; x-axis: cells]

Fig. 6 Hanging wheel. The payoff for betting on triplets is 1:11. In the case shown betting on the best three cells gives an expected payoff of 0.044.
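The payoff figures in the two captions can be related to sector probabilities with a short calculation. This sketch assumes that “expected payoff” means the expected net gain per unit staked, and that a winning triplet bet returns 11 units net; both function names are hypothetical. Under that reading a fair wheel gives 12 · 3/37 − 1 = −1/37, and the quoted values 0.294 and 0.044 correspond to triplet probabilities of roughly 0.108 and 0.087, compared with 3/37 ≈ 0.081 for a perfect wheel.

```python
def triplet_expected_gain(p_sector, payoff=11):
    """Expected net gain per unit staked on a three-pocket sector.

    p_sector is the total probability of the three pockets; payoff is the
    net win per unit staked (11, as stated in the figure captions).
    """
    return p_sector * payoff - (1.0 - p_sector)

def sector_probability_for_gain(gain, payoff=11):
    """Invert the formula: the sector probability that yields a given expected gain."""
    return (gain + 1.0) / (payoff + 1.0)

print(round(triplet_expected_gain(3 / 37), 4))          # perfect wheel
print(round(sector_probability_for_gain(0.294), 4))     # value quoted for Fig. 5
print(round(sector_probability_for_gain(0.044), 4))     # value quoted for Fig. 6
```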

(15)

CHOICE OF STATISTICS

We would like to devise statistics which would be better traps for the given types of faults.

The χ²-test for the multinomial is based on the expression

$$\chi^2 = \sum_i \frac{(\mathrm{Observed}_i - \mathrm{Expected}_i)^2}{\mathrm{Expected}_i}$$

where the index i refers to cell number i. Normally the cells are non-overlapping, but only because that is the case in which one can obtain analytical expressions for the limit distribution.

• We want to monitor sectors of three, five or seven. A plausible choice to monitor, say, triplets would be

$$\mathrm{CHI3} = \sum_{k=0}^{m-1} \frac{\left(N_n^k + N_n^{k+1} + N_n^{k+2} - 3 n p_0\right)^2}{3 n p_0}.$$

Here we interpret k + 1 and k + 2 modulo m, so that the sectors wrap around the wheel.
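A sliding-sector statistic of this kind is straightforward to compute. The sketch below assumes that the counts are listed in the physical order of the pockets around the wheel, so that neighbouring list entries are neighbouring pockets; the names `sector_sums` and `chi_sector` are illustrative.

```python
M = 37
P0 = 1.0 / M

def sector_sums(counts, width):
    """Sum of counts over each sector of `width` adjacent pockets, wrapping round the wheel."""
    m = len(counts)
    return [sum(counts[(k + j) % m] for j in range(width)) for k in range(m)]

def chi_sector(counts, width=3):
    """CHI-type statistic over all m overlapping sectors of the given width."""
    n = sum(counts)
    expected = width * n * P0
    return sum((s - expected) ** 2 for s in sector_sums(counts, width)) / expected

# Hypothetical counts: three neighbouring pockets turn up slightly too often.
counts = [100] * M
counts[0] = counts[1] = counts[2] = 115
print("CHI1:", round(chi_sector(counts, 1), 2))
print("CHI3:", round(chi_sector(counts, 3), 2))
```

With `width=1` the function reduces to the classical χ² statistic, which makes the two directly comparable on the same data.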

(16)

• Another plausible statistic is the maximum deviation from the expected frequency of a “cell” which can also be a sector of three, five or seven adjacent pockets on the wheel.

$$\mathrm{MAX3} = \max_{0 \le k < m} \frac{N_n^k + N_n^{k+1} + N_n^{k+2} - 3 n p_0}{\sqrt{n p_0}}.$$

• Yet another alternative is the likelihood ratio test. One looks at the quantity

$$\mathrm{LRATIO} = \log\left(\frac{\sup_p P_p(\text{Observed values})}{P_{p_0}(\text{Observed values})}\right).$$

As we observe more and more outcomes Wilks’ theorem asserts that 2 × LRATIO converges in distribution to a χ²(36).

REMARK: As we do everything sequentially we actually observe stochastic processes of various statistics and need to keep that in mind. So is there a “process” version of Wilks’ theorem?
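For a fixed number of spins, the maximum-deviation and likelihood ratio statistics can be sketched as follows. The supremum in LRATIO is attained at the observed frequencies, so LRATIO equals the sum over cells of N_k log(N_k / (n p_0)). The function names are illustrative, and the counts are again assumed to be listed in wheel order.

```python
import math

M = 37
P0 = 1.0 / M

def max_sector(counts, width=3):
    """Largest deviation of a sector count above its expectation, divided by sqrt(n * p0)."""
    m = len(counts)
    n = sum(counts)
    sums = [sum(counts[(k + j) % m] for j in range(width)) for k in range(m)]
    return max(s - width * n * P0 for s in sums) / math.sqrt(n * P0)

def log_likelihood_ratio(counts):
    """LRATIO: log of the maximised multinomial likelihood over the null likelihood."""
    n = sum(counts)
    # Cells with zero count contribute nothing (0 * log 0 is taken as 0).
    return sum(c * math.log(c / (n * P0)) for c in counts if c > 0)

counts = [100] * M
counts[0] = counts[1] = counts[2] = 115      # hypothetical biased sector
print("MAX3:", round(max_sector(counts, 3), 3))
print("2 x LRATIO:", round(2 * log_likelihood_ratio(counts), 3))
```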

(17)

DISTRIBUTIONS OF STATISTICS

The distributions were obtained by simulation. Here are the distributions of a sample of test statistics.

• CHI3 is the χ²–like statistic for triplets.

• MAX1 is a suitably standardised maximal positive deviation from the expected frequencies of cells.

• MAX3 is a suitably standardised maximal positive deviation from the expected frequencies for triplets.

[Plot: distribution of CHI3; x-axis: CHI3, y-axis: density]

Fig. 7 Distribution of the CHI3 statistic

(18)

[Plot: distribution of MAX1; x-axis: MAX1, y-axis: density]

Fig. 8 Distribution of the MAX1 statistic

[Plot: distribution of MAX3; x-axis: MAX3, y-axis: density]

Fig. 9 Distribution of the MAX3 statistic
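Null distributions like those in Figs. 7 to 9 can be approximated by repeatedly simulating a perfect wheel. The sketch below does this for CHI3 and reads off an empirical 95% threshold; the number of spins, the number of repetitions, the seed and all function names are illustrative assumptions, not the settings used for the figures.

```python
import random

M = 37
P0 = 1.0 / M

def chi_sector(counts, width):
    """CHI-type statistic over all overlapping sectors of the given width."""
    m = len(counts)
    n = sum(counts)
    expected = width * n * P0
    sums = (sum(counts[(k + j) % m] for j in range(width)) for k in range(m))
    return sum((s - expected) ** 2 for s in sums) / expected

def null_sample(n_spins, width, n_rep, rng):
    """Sorted simulated values of the statistic for a perfectly uniform wheel."""
    values = []
    for _ in range(n_rep):
        counts = [0] * M
        for _spin in range(n_spins):
            counts[rng.randrange(M)] += 1
        values.append(chi_sector(counts, width))
    return sorted(values)

def empirical_quantile(sorted_values, q):
    """Simple order-statistic estimate of the q-quantile."""
    index = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[index]

rng = random.Random(7)
sample = null_sample(n_spins=1000, width=3, n_rep=400, rng=rng)
print("simulated 95% threshold for CHI3:", round(empirical_quantile(sample, 0.95), 1))
```

Replacing `chi_sector` by a maximum-deviation statistic gives thresholds for MAX1 or MAX3 in the same way.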

(19)

TRAJECTORIES FOR THE CHOSEN STATISTICS

The next few slides show various trajectories of CHIx and MAXx statistics:

• Trajectories for an honest wheel.

• Trajectories for a dented wheel.

• Trajectories for a hanging wheel.

• Trajectories in two cases of real data.

LEGEND:

• CHI1 or MAX1

• CHI3 or MAX3

• CHI5 or MAX5

• CHI7 or MAX7

(20)

Perfect wheel

[Plot: trajectories of CHIx statistics with 95% thresholds; x-axis: number of spins]

Fig. 10 Trajectories of CHIx statistics for a perfect wheel.

[Plot: trajectories of MAXx statistics with 95% thresholds; x-axis: number of spins]

Fig. 10a Trajectories of MAXx statistics for a perfect wheel.

(21)

Dented wheel

[Plot: trajectories of CHIx statistics; x-axis: number of spins]

Fig. 11 Trajectories of CHIx statistics for a dented wheel.

[Plot: trajectories of MAXx statistics; x-axis: number of spins]

Fig. 11a Trajectories of MAXx statistics for a dented wheel.

(22)

Hanging wheel

[Plot: trajectories of CHIx statistics; x-axis: number of spins]

Fig. 12 Trajectories of CHIx statistics for a hanging wheel.

[Plot: trajectories of MAXx statistics; x-axis: number of spins]

Fig. 12a Trajectories of MAXx statistics for a hanging wheel.

(23)

Wheel AR04

[Plot: trajectories of CHI statistics; x-axis: number of spins]

Fig. 13 Trajectories of CHIx statistics for wheel AR04.

[Plot: trajectories of MAX statistics; x-axis: number of spins]

Fig. 13a Trajectories of MAXx statistics for wheel AR04.

(24)

Wheel HISPAR02

[Plot: CHIx statistics for wheel HISPAR02 with 95% threshold]

Fig. 14 Trajectories of CHIx statistics for wheel HISPAR02.

[Plot: MAXx statistics for wheel HISPAR02 with 95% threshold]

Fig. 14a Trajectories of MAXx statistics for wheel HISPAR02.

(25)

[Plot: empirical probability distribution over cells]

Fig. 14c Empirical probability distribution over cells for wheel HISPAR02.

(26)

SEQUENTIAL TESTS

It is a luxury to assume a fixed number of observations.

There are several reasons for that:

• We never know why data collection has stopped. Was that independent of the outcomes? Usually that is not the case.

• The house wants to stop a table as soon as there is enough evidence that something is wrong.

(27)

MARTINGALES

The idea of a sequential test is that we reject the null-hypothesis as soon as possible given the significance level α. But when is as soon as possible?

• One possible solution is to observe that the transforms

$$\hat\chi^2_k = k\left(\chi^2_k - m x (1 - x/m)\right),$$

where x is the width of the sector, are MARTINGALES under the null-hypothesis.

• For martingales one has MAXIMAL INEQUALITIES.

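Under the null-hypothesis we can say

$$P\left(\max_{1 \le k \le n} \frac{k}{n}\left(\hat\chi^2_k - m x (1 - x/m)\right)^+ \ge a\right) \le \frac{E\left[\left((\hat\chi^2_n)^+\right)^q\right]}{a^q},$$

where $(\cdot)^+$ denotes the positive part, and q ≥ 1.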
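The martingale property can be illustrated for single cells (x = 1), where the centring constant m x (1 − x/m) equals m − 1 = 36. The sketch below computes the path of k(χ²_k − 36) along a sequence of spins and checks by simulation that its average at a fixed time is close to zero for a perfect wheel. The function name, seed and sample sizes are illustrative assumptions, and only the case x = 1 is covered.

```python
import random

M = 37
P0 = 1.0 / M

def transformed_chi_path(outcomes):
    """Path of T_k = k * (chi2_k - (m - 1)) for single-pocket cells (x = 1).

    Uses k * chi2_k = sum_j (N_k^j - k * p0)^2 / p0, so no division by k is needed.
    """
    counts = [0] * M
    path = []
    for k, outcome in enumerate(outcomes, start=1):
        counts[outcome] += 1
        k_chi2 = sum((c - k * P0) ** 2 for c in counts) / P0
        path.append(k_chi2 - k * (M - 1))
    return path

rng = random.Random(42)
finals = []
for _ in range(500):
    spins = [rng.randrange(M) for _spin in range(200)]
    finals.append(transformed_chi_path(spins)[-1])
print("average of T_200 over 500 perfect wheels:", round(sum(finals) / len(finals), 1))
```

Individual paths fluctuate widely, which is why a maximal inequality rather than a fixed-time threshold is needed to control the whole trajectory.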

(28)

TESTS

The test now does the following: choose an appropriate q ≥ 1 and a > 0, and reject the null as soon as the maximal inequality is violated.

One needs to calibrate the tests. For example, with α = 0.01 one gets:

- For sectors of width x = 1 choose q = 8 and a = 29.97.

The constant q is chosen in such a way that it minimizes a.

(29)

EXAMPLES

Hanging wheel

[Plot: trajectories of the transformed TCHIx statistics for a hanging wheel; x-axis: number of games]

Wheel AR04

[Plot: trajectories of the transformed TCHIx statistics; x-axis: number of games]

(30)

THE KLOTZ STRATEGY

If the wheel is biassed there may be winning strategies.

One possible way is to maximize the expected logarithm of your winnings. This idea from economics produces an interesting strategy called the Klotz strategy.

The Klotz strategy is then combined with a Bayesian estimate of the probabilities of the outcomes, namely

$$\hat p_i = \frac{n_i + \alpha}{n + m\alpha},$$

where $n_i$ is the number of times outcome i has occurred in n spins and m = 37, so that the estimates sum to one. The parameter α may be interpreted as “caution”. The higher it is, the less we are inclined to get excited by seemingly more probable outcomes.
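The effect of the caution parameter can be seen in a small example. The sketch below assumes that the estimate is normalised over the m = 37 pockets, and it replaces the full Klotz strategy, which is not spelled out here, by a much simpler stand-in: the log-optimal fraction for a single straight-up bet, assuming a net payoff of 35 to 1. The counts and both function names are hypothetical.

```python
M = 37

def smoothed_probabilities(counts, alpha):
    """Estimates (n_i + alpha) / (n + m * alpha) for each pocket."""
    n = sum(counts)
    m = len(counts)
    return [(c + alpha) / (n + m * alpha) for c in counts]

def log_optimal_fraction(p, payoff=35):
    """Fraction of capital maximising E[log capital] for ONE bet with net payoff `payoff`.

    Simplification: a single straight-up bet, not the multi-outcome Klotz strategy.
    """
    return max(0.0, (p * (payoff + 1) - 1) / payoff)

counts = [100] * M
counts[7] = 160            # hypothetical data: pocket 7 turns up more often
for alpha in (100, 200):
    p_hat = smoothed_probabilities(counts, alpha)
    print(alpha, round(p_hat[7], 4), round(log_optimal_fraction(p_hat[7]), 4))
```

A larger α pulls every estimate towards 1/37 and therefore shrinks the stake, which matches the reading of α as caution.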

(31)

EXAMPLES

Here are some simulated and some real examples. In all cases we take α = 100 and α = 200.

Slightly biassed wheel

[Plot: capital trajectory at caution 100; x-axis: game, y-axis: capital]

[Plot: capital trajectory at caution 200; x-axis: game, y-axis: capital]

More seriously biassed wheel

[Plot: capital trajectory at caution 100; x-axis: game, y-axis: capital]

[Plot: capital trajectory at caution 200; x-axis: game, y-axis: capital]

(32)

Real wheel AR04

[Plot: capital trajectory on wheel 0, caution = 100; x-axis: game, y-axis: capital]

[Plot: capital trajectory on wheel 0, caution = 200; x-axis: game, y-axis: capital]

Real wheel HISPAR02

[Plot: capital trajectory on wheel 1, caution = 100; x-axis: game, y-axis: capital]

[Plot: second capital trajectory for the same wheel; x-axis: game, y-axis: capital]

(33)

Real wheel Cammegh

[Two plots: capital trajectories; y-axis: capital]

Real wheel HISPAR04

[Plot: capital trajectory; x-axis: game, y-axis: capital]

[Plot: capital trajectory on wheel 4, caution = 200; x-axis: game, y-axis: capital]

(34)

TESTING

One possible idea is to use the Klotz strategy as a test statistic. If the optimal player starts winning too much we reject the null-hypothesis. But what is too much?

Again we observe a few facts:

• Under the null-hypothesis the current capital of the player is a non-negative supermartingale so it converges to a finite limit.

• The supremum of the entire capital trajectory is a finite random variable.

• One can either try to find an analytic estimate of the distribution of the maximum or simulate.

• Here is the simulated distribution.

• The advantage is that the p-values have a meaning in terms of money. It is not easy to get simple statistical ideas across to the end-user.
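The behaviour of the maximum under the null-hypothesis can be explored by simulation. For a non-negative supermartingale started at capital 1, the probability that the supremum ever reaches c is at most 1/c, which gives a crude analytic bound to compare against. The sketch below uses a fixed-fraction straight-up bet as a stand-in for the Klotz strategy and assumes a net payoff of 35 to 1; the fraction, horizon, seed and function name are illustrative.

```python
import random

M = 37
PAYOFF = 35      # assumed net payoff of a straight-up bet

def max_capital(n_spins, fraction, rng, target=0):
    """Largest capital reached when a fixed fraction is staked on one pocket each spin."""
    capital = peak = 1.0
    for _ in range(n_spins):
        stake = fraction * capital
        if rng.randrange(M) == target:
            capital += PAYOFF * stake
        else:
            capital -= stake
        peak = max(peak, capital)
    return peak

rng = random.Random(11)
peaks = [max_capital(2000, 0.01, rng) for _ in range(1000)]
for c in (2, 5, 10):
    share = sum(p >= c for p in peaks) / len(peaks)
    print(f"P(max >= {c}) ~ {share:.3f}   bound 1/c = {1 / c:.3f}")
```

Reading the test this way, rejecting when the player's capital has multiplied by c corresponds to a false-alarm probability of at most 1/c.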

(35)

The distribution of the maximum

[Histogram: simulated distribution of the maximum; x-axis: maximum]

(36)

CONCLUDING REMARKS

Main points?

• One has to focus on certain types of alternative hypotheses. The entire space is just too big.

• The classical χ²–test does a poor job.

• If one takes marginal distributions of trajectories as approximations to the “right” critical values one has to proceed by simulation.

Remaining questions?

• Are the statistics chosen the right ones?

• What are the rules for deciding? In particular, what are the right critical values for individual statistics? Is it correct to just look at the marginal distribution? Or does one have to consider the entire trajectory?

• If one were to test sequentially what is the right decision rule?

• Is there a “process version” of Wilks’ theorem?

• What can one say about the asymptotic behaviour of the test statistics? Do they converge under the null-hypothesis?