Statistical inference: example 1. Inferential Statistics

(1)

Inferential Statistics

POPULATION

SAMPLE

Statistical inference is the branch of statistics concerned with drawing conclusions and/or making decisions concerning a population based only on sample data.

187

Statistical inference: example 1

• A clothing store chain regularly buys from a supplier large quantities of a certain piece of clothing. Each item can be classified either as good quality or top quality. The agree- ments require that the delivered goods comply with standards predetermined quality. In particular, the proportion of good quality items must not exceed 25% of the total.

• From a consignment 40 items are extracted and 29 of these are of top quality whereas the remaining 11 are of good quality.

INFERENTIAL PROBLEMS:

1. provide an estimate of π and quantify the uncertainty associated with such estimate;

2. provide an interval of “reasonable” values for π;

3. decide whether the delivered goods should be returned to the supplier.

188

Statistical inference: example 1 Formalization of the problem:

• POPULATION:

all the pieces of clothes of the consignment;

• VARIABLE OF INTEREST:

good/top quality of the good (binary variable);

• PARAMETER OF INTEREST:

proportion of good quality items π;

• SAMPLE:

40 items extracted from the consignment.

The value of the parameter π is unknown, but it affects the sampling values. Sampling evidence provides information on the parameter value.

Statistical inference: example 2

• A machine in an industrial plant of a bottling company fills one-liter bottles. When the machine is operating normally the quantity of liquid inserted in a bottle has mean µ = 1 liter and standard deviation σ =0.01 liters.

• Every working day 10 bottles are checked and, today, the average amount of liquid in the bottles is ¯x = 1.0065 with s = 0.0095.

INFERENTIAL PROBLEMS:

1. provide an estimate of µ and quantify the uncertainty associated with such estimate;

2. provide an interval of “reasonable” values for µ;

3. decide whether the machine should be stopped and revised.

(2)

Statistical inference: example 2 Formalization of the problem:

• POPULATION:

“all” the bottles filled by the machine;

• VARIABLE OF INTEREST:

amount of liquid in the bottles (continuous variable);

• PARAMETERS OF INTEREST:

mean µ and standard deviation σ of the amount of liquid in the bottles;

• SAMPLE:

10 bottles.

The values of the parameters µ and σ are unknown, but they affect the sampling values. Sampling evidence provides information on the parameter values.

191

The sample

• Census survey: attempt to gather information from each and every unit of the population of interest;

• sample survey: gathers information from only a subset of the units of the population of interest.

Why using a sample?

1. Less time consuming than a census;

2. less costly to administer than a census;

3. measuring the variable of interest may involve the destruc- tion of the population unit;

4. a population may be infinite.

192

Probability sampling

A probability sampling scheme is one in which every unit in the population has a chance (greater than zero) of being selected in the sample, and this probability can be accurately determined.

SIMPLE RANDOM SAMPLING: every unit has an equal probability of being selected and the selection of a unit does not change the probability of selecting any other unit. For instance:

• extraction with replacement;

• extraction without replacement.

For large populations (compared to the sample size) the difference between these two sampling techniques is negligible. In the following we will always assume that samples are extracted with replacement from the population of interest.

Probabilistic description of a population

• Units of the population;

• variable X measured on the population units;

• sometimes the distribution of X is known, for instance (i) X ∼ N(µ, σ²);

(ii) X ∼Bernoulli(π).

(3)

Probabilistic description of a sample

• The observed sampling values are x₁, x₂, . . . , x_n;

• BEFORE the sample is observed the sampling values are unknown and the sample can be written as a sequence of random variables

X₁, X₂, . . . , X_n

• for simple random samples (with replacement):

1. X₁, X₂, . . . , X_n are i.i.d.;

2. the distribution of X_i is the same as that of X for every i = 1, . . . , n.

195

Sampling distribution of a statistic (1)

• Suppose that the sample is used to compute a given statistic, for instance

(i) the sample mean ¯X;

(ii) the sample variance S²;

(iii) the proportion P of units with a given feature;

• generically, we consider an arbitrary statistic T = g(X₁, . . . , X_n) where g(·) is a given function.

196

Sampling distribution of a statistic (2)

• Once the sample is observed, the observed value of the statistic is given by

t = g(x₁, . . . , x_n);

• suppose that we draw all possible samples of size n from the given population and that we compute the statistic T for each sample;

• the sampling distribution of T is the distribution of the population of the values t of all possible samples.

Normal population

• Suppose that X ∼ N(µ, σ²)

• in this case the statistics of interest are:

(i) the sample mean ¯X =1 n

Xn i=1

X_i

(ii) the sample variance S²= 1 n − 1

Xn i=1

(X_i− ¯X)²

• the corresponding observed values are

¯ x =1

n Xn i=1

x_i and s²= 1 n − 1

Xn i=1

(xi− ¯x)², respectively.

(4)

The sample mean

The sample mean is a linear combination of the variables forming the sample and this property can be exploited in the computation of

• the expected value of ¯X, that is E( ¯X);

• the variance of ¯X, that is VarX¯;

• the probability distribution of ¯X.

199

Expected value of the sample mean

For a simple random sample X₁, . . . , X_n, the expected value of X is¯

E( ¯X) = E

X₁+ X₂+ · · · + Xn

n

= 1

nE (X₁+ X₂+ · · · + Xn)

= 1

n[E(X₁) + E(X₂) + · · · + E(Xn)]

= 1

n(n × µ)

= µ

200

Variance of the sample mean

For a simple random sample X₁, . . . , X_n, the variance of ¯X is VarX¯ = Var

X₁+ X₂+ · · · + Xn

n

= 1

n²Var(X₁+ X₂+ · · · + Xn)

= 1

n²[Var(X₁) + Var(X₂) +· · · + Var(Xn)]

= 1

n²(n × σ²)

= σ² n

Sampling distribution of the mean

For a simple random sample X₁, X₂, . . . , Xn, the sample mean ¯X has

• expected value µ and variance σ²/n;

• if the distribution of X is normal, then

X ∼ N¯ µ,σ² n

!

• more generally, the central limit theorem can be applied to state that the distribution of ¯X is APPROXIMATIVELY normal.

(5)

The sample variance

• The sample variance is defined as

S² = 1 n − 1

Xn i=1

(X_i− ¯X)²

• if Xi∼ N(µ, σ²) then

(n − 1)S²

σ² ∼ χ²_n−1

• ¯X and S² are independent.

203

The chi-squared distribution

• Let Z1, . . . , Z_r be i.i.d. random variables with distribution N (0; 1);

• the random variable

X = Z₁²+ · · · + Zr²

is said to follow a CHI-SQUARED distribution with r degrees of freedom (d.f.);

• we write X ∼ χ²r;

• E(X) = r and Var(X) = 2r.

204

Counting problems

• The variable X is binary, i.e. it takes only two possible values; for instance “success” and “failure”;

• the random variable X takes values “1” (success) and “0”

(failure);

• the parameter of interest is π, the proportion of units in the population with value “1”;

• Formally, X ∼ Bernoulli(π) so that

E(X) = π and Var(X) = π(1 − π).

The sample proportion (1)

• Simple random sample X1, . . . , X_n;

• the variable X takes two values: 0 and 1, and the sample proportion is a special case of sample mean

P = P_n

i=1X_i n

• the observed value of P is p.

(6)

The sample proportion (2)

• The sample proportion is such that

E(P ) = π and Var(P ) =π(1 − π) n

• for the central limit theorem, the distribution of ¯X is approximatively normal;

• sometimes the following empirical rules are used to decide if the normal approximation is satisfying:

1. n × π > 5 and n × (1 − π) > 5.

2. np(1 − p) > 9.

207

Estimation

• Parameters are specific numerical characteristics of a population, for instance:

– a proportion π;

– a mean µ;

– a variance σ².

• When the value of a parameter is unknown it can be estimated on the basis of a random sample.

208

Point estimation

A point estimate is an estimate that consists of a single value or point, for instance one can estimate

• a mean µ with the sample mean ¯x;

• a proportion π with a sample proportion p;

a point estimate is always provided with its standard error that is a measure of the uncertainty associated with the estimation process.

Estimator vs estimate

• An estimator of a population parameter is

– a random variable that depends on sample information, – whose value provides an approximation to this unknown

parameter.

• A specific value of that random variable is called an estimate.

(7)

Estimation and uncertainty

• Parameter θ;

• the sampling statistics T = g(X1, . . . , Xn) on which estimation is based is called the estimator of θ and we write

θ = Tˆ

• the observed value of the estimator, t, is called an estimate of θ and we write ˆθ = t;

• it is fundamental to assess the uncertainty of ˆθ;

• a measure of uncertainty is the standard deviation of the estimator, that is SD(T ) = SD(ˆθ). This quantity is called the

STANDARD ERROR of ˆθ and denoted by SE(ˆθ).

211

Point estimation of a mean (σ² known)

• Consider the case where X1, . . . , X_n is a simple random sample from X ∼ N(µ, σ²);

• Parameters:

– µ, unknown;

– assume that the value of σ² is known.

• the sample mean can be used as estimator of µ: µ = ¯b X;

• the distribution of the estimator is normal with E(µ) = µb and Var(µ) =b σ²

n

• STANDARD ERROR µb

SE(µ) =_b σ

√n

212

Point estimation of a mean with σ² known: example In the “bottling company” example, assume that the quantity of liquid in the bottles is normally distributed. Then a point estimate of µ is

• µ = 1.0065b

• and the standard error of this estimate is

SE(ˆµ) = σ

√10=0.01

√10 = 0.0032

Point estimation of a mean (σ² unknown)

• Typically the value of σ² is not known;

• in this case we estimate it as ˆσ²= s²;

• this can be used, for instance, to estimate the standard error of ˆµ

SE(ˆd µ) = ˆσ

√n.

• In the “bottling company” example, if σ is unknown it can be estimated as

SE(ˆd µ) =0.0095

√10 = 0.0030

(8)

Point estimation for the mean of a non-normal population

• X1, . . . , Xn i.i.d. with E(X_i) = µ and Var(X_i) =σ²;

• the distribution of Xi is not normal;

• for the central limit theorem the distribution of ¯X is approximatively normal.

215

Point estimation of a proportion

• Parameter: π;

• the sample proportion P is used as an estimator of π

π = Pb

• this estimator is approximately normally distributed with E(π) = πb and Var(π) =_b π(1 − π)

n

• the STANDARD ERROR of the estimator is SE(π) =_b

sπ(1 − π) n

and in this case the value of standard error is never known.

216

Estimation of a proportion: example

For the “clothing store chain” example the estimate of the proportion π of good quality items is

• π =b 11

40= 0.275

• and an ESTIMATE of the standard error is

SE(ˆd π) =

s0.275(1 − 0.275)

40 = 0.07

Properties of estimators: unbiasedness

A point estimator ˆθ is said to be an unbiased estimator of the parameter θ if the expected value, or mean, of the sampling distribution of ˆθ is θ, formally if

E(ˆθ) = θ

Interpretation of unbiasedness: if the sampling process was re- peated, independently, an infinite number of times, obtaining in this way an infinite number of estimates of θ, the arithmetic mean of such estimates would be equal to θ. However, unbiasedness does not guarantees that the estimate based on one single sample coincides with the value of θ.

(9)

Point estimator of the variance

The sample variance S² is an unbiased estimator of the variance σ² of a normally distributed random variable

E(S²) = σ².

On the other hand ˜S² is a biased estimator of σ²

E( ˜S²) =(n − 1) n σ².

219

Bias of an estimator

Let ˆθ be an estimator of θ. The bias of ˆθ, Bias(ˆθ), is defined as the difference between the expected value of ˆθ and θ

Bias(ˆθ) = E(ˆθ) − θ

The bias of an unbiased estimator is 0.

220

Properties of estimators: Mean Squared Error (MSE) For an estimator ˆθ of θ the (unknown) estimation “error” is given by

|θ − ˆθ|

The Mean Squared Error (MSE) is the expected value of the square of the “error”

M SE(ˆθ) = E[(θ − ˆθ)²]

= Varθˆ+ [θ − E(ˆθ)]²

= Varθˆ+ Bias(ˆθ)²

Hence, for an unbiased estimator, the MSE is equal to the variance.

Most Efficient Estimator

• Let ˆθ₁ and ˆθ₂ be two estimator of θ, then the MSE can be use to compare the two estimators;

• if both ˆθ₁ and ˆθ₂ are unbiased then ˆθ₁ is said to be more efficient than ˆθ₂ if

Varθˆ₁< Varθˆ₂

• note that if ˆθ₁ is more efficient than ˆθ₂ then also M SE(ˆθ₁) <

M SE(ˆθ₂) and SE(ˆθ₁) < SE(ˆθ₂);

• the most efficient estimator or the minimum variance unbiased estimator of θ is the unbiased estimator with the small- est variance.

(10)

Interval estimation

• A point estimate consists of a single value, so that – if ¯X is a point estimator of µ then it holds that

P ( ¯X = µ) = 0

– more generally, P (ˆθ = θ) = 0.

• Interval estimation is the use of sample data to calculate an interval of possible (or probable) values of an unknown population parameter.

223

Confidence interval for the mean of a normal population (σ known)

• X1, . . . , X_n simple random sample with X_i∼ N(µ, σ²);

• assume σ known;

• a point estimator of µ is ˆ

µ = ¯X ∼ N µ,σ² n

!

• the standard error of the estimator is SE(ˆµ) = σ

√n

224

Before the sample is extracted...

• The sample distribution of the estimator is completely known but for the value of µ;

• the uncertainty associated with the estimate depends on the size of the standard error. For instance, the probability that ˆ

µ = ¯X takes a value in the interval µ ± 1.96× SE is 0.95 (that is 95%).

µ µ− 1 SE µ− 2 SE

µ− 3 SE µ + 1 SE µ + 2 SE µ + 3 SE

area 95%

P µ − 1.96 SE ≤ ¯X ≤ µ + 1.96 SE= 0.95

Confidence interval for µ

• The probability that ¯X belongs to the interval (µ − 1.96 SE, µ + 1.96 SE) is 95%;

• this can be also stated as: the probability that the interval ( ¯X − 1.96 SE, ¯X + 1.96 SE)

contains the parameter µ is 95%

✛ ✲

-1.96 SE +1.96 SE

µ µ_b

(11)

Formal derivation of the 95% confidence interval for µ (σ known) It holds that

X − µ¯

SE ∼ N (0, 1) where SE = σ

√n so that

0.95 = P −1.96 ≤X − µ¯

SE ≤ 1.96

!

= P^"−1.96 SE ≤ ¯X − µ ≤ 1.96 SE

= P^"− ¯X − 1.96 SE ≤ −µ ≤ − ¯X + 1.96 SE

= P^"X − 1.96 SE ≤ µ ≤ ¯¯ X + 1.96 SE

227

Confidence interval for µ with σ known: example

In the bottling company example, if one assumes σ = 0.01 known, a 95% confidence interval for µ is

1.0065 − 1.960.01

√10; 1.0065 + 1.960.01

√10

!

that is

(1.0065 − 0.0062; 1.0065 + 0.0062) so that

(1.0003; 1.0126)

228

After the sample is extracted...

On the basis of the sample values the observed value of µ = ¯b x is computed. ¯x may belong to the interval µ ± 1.96 × SE or not.

For instance

µ− 3 SE µ + 1 SE µ + 2 SE µ + 3 SE

area 95%

¯ x

and in this case ¯x belongs to the interval µ ± 1.96 × SE and, as a consequence, also the interval (¯x − 1.96 × SE; ¯x + 1.96 × SE) will contain µ.

a different sample...

A different sample may lead to a sample mean ¯x that, as in the example below, does not belong to the interval µ±1.96×SE and, as a consequence, also the interval (¯x − 1.96 × SE; ¯x + 1.96 × SE) will not contain µ.

µ− 1 SE µ µ− 2 SE

µ− 3 SE µ + 1 SE µ + 2 SE µ + 3 SE

area 95% x¯

The interval (¯x − 1.96 × SE; ¯x + 1.96 × SE) will contain µ for the 95% of all possible samples.

(12)

Interpretation of confidence intervals

Probability is associated with the procedure that leads to the derivation of a confidence interval, not with the interval itself.

A specific interval either will contain or will not contain the true parameter, and no probability involved in a specific interval.

Confidence intervals for five different samples of size n = 25, extracted from a normal population with µ = 368 and σ = 15.

231

Confidence interval: definition

A confidence interval for a parameter is an interval constructed using a procedure that will contain the parameter a specified proportion of the times, typically 95% of the times.

A confidence interval estimate is made up of two quantities:

interval: set of scores that represent the estimate for the parameter;

confidence level: percentage of the intervals that will include the unknown population parameter.

232

A wider confidence interval for µ

• Since it also holds that

P µ − 2.58 SE ≤ ¯X ≤ µ + 2.58 SE= 0.99

µ− 3 SE µ + 1 SE µ + 2 SE µ + 3 SE

area 99%

¯ x

µ− 3 SE µ + 1 SE µ + 2 SE µ + 3 SE

area 99% ¯x

• then the probability that ( ¯X −2.58 SE, ¯X +2.58 SE) contains µ is 99%.

Confidence level

The confidence level is the percentage associated with the interval. A larger value of the confidence level will typically lead to an increase of the interval width. The most commonly used confidence levels are

• 68% associated with the interval ¯X ± 1 SE;

• 95% associated with the interval ¯X ± 1.96 SE;

• 99% associated with the interval ¯X ± 2.58 SE.

Where the values 1, 1.96 and 2.58 are derived from the standard normal distribution tables.

(13)

Notation: standard normal distribution tables

• Z ∼ N(0, 1);

• α value between zero and one;

• zα value such that the area under the Z pdf between zα and +∞ is equal to α;

• formally

P (Z > z_α) = α and P (Z < z_α) = 1 − α

• furthermore

P (−z_α/2< Z < z_α/2) = 1 − α

235

Confidence interval for µ with σ known: formal derivation (1) It holds that

X − µ¯

SE ∼ N (0, 1) where SE = σ

√n

so that

Pµ − z_α/2SE ≤ ¯X ≤ µ + z_α/2SE= 1 − α

or, equivalently,

P −z_α/2≤X − µ¯

SE ≤ z_α/2

!

= 1 − α

236

Confidence interval for µ with σ known: formal derivation (2)

1 − α = P −z_α/2≤X − µ¯ SE ≤ z_α/2

!

= P−z_α/2SE ≤ ¯X − µ ≤ z_α/2SE

= P− ¯X − z_α/2SE ≤ −µ ≤ − ¯X + z_α/2SE

= PX − z¯ α/2SE ≤ µ ≤ ¯X + z_α/2SE

Confidence interval at the level 1 − α for µ with σ known A confidence interval at the (confidence) level 1 −α, or (1−α)%, for µ is given by

X − z¯ α/2SE; X + z¯ _α/2SE

Since SE =√^σn then

X − z¯ _α/2 σ

√n; X + z¯ _α/2 σ

√n

!

(14)

Margin of error

• The confidence interval

¯x ± z_α/2× σ

√n

• can also be written as ¯x ± ME where M E = z_α/2× σ

√n is called the margin of error.

• the interval width is equal to twice the margin of error.

239

Reducing the margin of error

M E = z_α/2× σ

√n

The margin of error can be reduced, without changing the ac- curacy of the estimate, by increasing the sample size (n ↑).

240

Confidence interval for µ with σ unknown

• ¯X ∼ N µ, σ²

√n

!

;

• X − µ¯

SE ∼ N(0, 1);

• in this case the standard error is unknown and needs to be estimated.

SE(ˆµ) = σ

√n is estimated by SE(ˆ^d µ) = ˆσ

√n where ˆσ = S

• and it holds that

X − µ¯

SEd ∼ t_n−1

The Student’s t distribution (1)

• For Z ∼ N(0; 1) and X ∼ χ²r, independent;

• the random variable

T = Z qX/r

is said to follow a Student’s t distribution with r degrees of freedom;

• the pdf of the t distribution differs from that of the standard normal distribution because it has “heavier tails”.

−4 −3 −2 −1 0 1 2 3 4

Student’s t normal

t₁ and N (0; 1) comparison.

(15)

The Student’s t distribution (2)

• For r −→ +∞ the Student’s t distribution converges to the standard normal distribution.

0.1 0.2 0.3 0.4

−5 −4 −3 −2 −1 0 1 2 3 4 5

...

t₂₅ and N (0; 1) comparison

243

Confidence interval for µ with σ unknown

1 − α = P −t_n−1,α/2≤X − µ¯

SEd ≤ t_n−1,α/2

!

= P−t_n−1,α/2SE ≤ ¯^d X − µ ≤ t_n−1,α/2SE^d

= P− ¯X − t_n−1,α/2SE ≤ −µ ≤ − ¯^d X + t_n−1,α/2SE^d

= PX − t¯ _n−1,α/2SE ≤ µ ≤ ¯^d X + t_n−1,α/2SE^d

where t_n−1,α/2 is the value such that the area under the t pdf, with n − 1 d.f. between t_n−1,α/2 and +∞ is equal to α/2.

Hence, a confidence interval at the level (1 − α) for µ is X − t¯ _n−1,α/2 S

√n; X + t¯ _n−1,α/2 S

√n

!

244

Confidence interval for µ with σ unknown: example For the bottling company example, if the value of σ is not known, then

s = 0.0095 e t_9;0.025= 2.2622

and a 95% confidence interval for µ is

1.0065 − 2.26220.0095

√10 ; 1.0065 + 2.26220.0095

√10

!

that is

(1.0065 − 0.0068; 1.0065 + 0.0068) so that

(0.9997; 1.0133)

Confidence interval for the mean of a non-normal population

• X1, . . . , X_n i.i.d. with E(X_i) = µ and Var(X_i) =σ²;

• the distribution of Xi is not normal;

• for the central limit theorem the distribution of ¯X is approximatively normal;

• if one uses the procedures described above to construct a confidence interval for µ the nominal confidence level of the interval is only an approximation of the true confidence level.

(16)

Confidence interval for π For the central limit theorem

P − π

SE ≈ N (0, 1) where SE =

sπ(1 − π) n so that

1 − α ≈ P

−z_α/2≤P − π

SE ≤ z_α/2

= P−z_α/2SE ≤ P − π ≤ z_α/2SE

= P−P − zα/2SE ≤ −π ≤ −P + zα/2SE

= PP − z_α/2SE ≤ π ≤ P + z_α/2SE

Since π is always unknown, it is always necessary to estimate the standard error.

247

Confidence interval for π: example

For the clothing store chain example, a 95% confidence interval for π is

11

40− 1.96 SE(ˆπ); 11

40+ 1.96 SE(ˆπ)

so that SE(ˆπ) =

sπ(1 − π)

n is estimated by SE(ˆ^d π) =

sˆπ(1 − ˆπ)

n where ˆπ = ¯x =11 40 and one obtains

11

40− 1.96 × 0.07 ; 11

40+ 1.96 × 0.07

so that

(0.137; 0.413)

248

Example of decision problem

Problem: in the example of the bottling company, the quality control department has to decide whether to stop the pro- duction in order to revise the machine.

Hypothesis: the expected (mean) quantity of liquid in the bottles is equal to one liter.

The standard deviation is assumed known and equal to σ = 0.01.

The decision is based on a simple random sample of n = 10 bottles.

Statistical hypotheses

A decisional problem in expressed by means of two statistical hypotheses:

• the null hypothesis H0

• the alternative hypothesis H1

the two hypotheses concern the value of an unknown population parameter, for instance µ,

( H₀: µ = µ₀ H₁: µ 6= µ0

(17)

Distribution of ¯X under H₀

If H₀ is true (that is under H₀) the distribution of the sample mean ¯X

• has expected value equal to µ0= 1;

• has standard error equal to SE = σ/√

10 = 0.00316.

• if X1, . . . , X₁₀is a normally distributed i.i.d. sample than also X follows a normal distribution, otherwise the distribution¯ of ¯X is only approximatively normal (by the central limit theorem).

0.9905 0.9937 0.9968 1.0000 1.0032 1.0063 1.0095

251

Observed value of the sample mean The observed value of the sample mean is ¯x.

¯

x is almost surely different form µ₀= 1.

under H0, the expected value of ¯X is equal to µ₀ and the difference between µ₀ and ¯x is uniquely due to the sampling error.

HENCE THE SAMPLING ERROR IS

x − µ ¯ 0

that is

observed value “minus” expected value

252

Decision rule

The space of all possible sample means is partitioned into a

• rejection region also said critical region;

• nonrejection region.

Outcomes and probabilities

There are two possible “states of the world” and two possible decisions. This leads to four possible outcomes.

H₀ TRUE H₀ FALSE H₀

IS REJECTED

Type I error

α

OK

H₀ IS NOT REJECTED

OK

Type II error

β

The probability of the type I error is said significance level of the test and can be arbitrarily fixed (typically 5%).

(18)

Test statistic

A test statistic is a function of the sample, that can be used to perform a hypothesis test.

• for the example considered, ¯X is a valid test statistics, which is equivalent to the, more common, “z” test statistic

• Z =X − µ¯ 0

σ/√

n ∼ N(0, 1)

255

Hypothesis testing: example

• 5% significance level (arbitrarily fixed);

• Z = X − 1¯

0.00316∼ N(0, 1)

• the observed value of Z is z =1.0065 − 1

0.00316 = 2.055;

• the empirical evidence leads to the rejection of H0.

256

p-value approach to testing

• the p-value, also called observed level of significance is the probability of obtaining a value of the test statistic more ex- treme than the observed sample value, under H₀.

• decision rule: compare the p-value with α:

– p-value < α =⇒ reject H₀ – p-value ≥ α =⇒ nonreject H₀

• for the example considered

p-value=P (Z ≤ −2.055) + P (Z ≥ 2.055) = 0.04;

• p-value ≤ 5% = statistically significant result.

• p-value ≤ 1% = highly significant result.

z test for µ with σ known

• X1, . . . , X_n i.i.d. with distribution N (µ, σ²);

• Hypotheses:

( H₀ : µ = µ₀ H₁ : . . .

• test statistic:

Z =X − µ¯ 0

σ/√ n

• under H0 the test statistic Z has distribution N (0; 1)

(19)

z test: two-sided hypothesis

H₁: µ 6= µ0

in this case

p − value = P (Z > |z|)

−3.5 −1.0 0.0 1.0 3.0

−3.5 −|z|−|z| −1.0 0.0 1.0 |z||z| 3.0 3.5

259

z test: one-sided hypothesis (right)

H₁: µ > µ₀

in this case

p − value = P (Z > z)

−3.0 −2.0 −1.0 0.0 1.0 z 3.0 3.5

260

z test: one-sided hypothesis (left)

H₁: µ < µ₀

in this case

p − value = P (Z < z)

−3.5 z −1.0 0.0 1.0 2.0 3.0

z test for µ with σ unknown

• Hypotheses:

( H₀ : µ = µ₀ H₁ : µ 6= µ0

• test statistic:

t =X − µ¯ 0

S/√ n

• p-value: P (T_n−1> |t|) where T_n−1 follows a Student’s t distribution with n − 1 degrees of freedom.

(20)

Test for a proportion

• Null hypotheses: H0: π = π₀;

• Under H0 the sampling distribution of P is approximately normal with expected value E(P ) = π₀ and standard error

SE(P ) =

sπ₀(1 − π0) n

Note that under H₀ there are no unknown parameters.

263

z test for π

• Hypotheses:

( H₀: π = π₀ H₁: π 6= π0

• test statistic:

Z = P − π0 qπ₀(1 − π0)/n

• P -value: P (Z > |z|)

264

z test for π: example

For the clothing store chain example, the hypotheses are

( H₀ : π = 0.25 H₁ : π > 0.25

Hence, under H₀ the standard error is SE =

s0.25(1 − 0.25)

40 = 0.068

so that

z =0.275 − 0.25

0.068 = 0.37

and the p-value is P (Z ≥ 0.37) = 0.36 and the null hypothesis cannot be rejected.