CHAPTER 3 : SAMPLING DISTRIBUTIONS

Sub-Topic

• Sampling error.
• Introduction to sampling distribution.
• Sampling distribution of single mean.
• Sampling distribution of the difference between two means.
• Sampling distribution test : t-distribution, $\chi^2$-distribution and F-distribution.

Chapter Learning Outcome

• Solve problems involving the sampling distributions for single and two population means.

Learning Objective

By the end of this chapter, students should be able to

• Understand the concept of sampling error.
• Determine the mean and standard deviation for the sampling distribution of the sample mean.
• Understand the importance of the Central Limit Theorem.
• Apply the sampling distributions for the mean and the difference between two means.

Key Term (English to Bahasa Melayu)

English → Bahasa Melayu

1. Parameter → Parameter
2. Statistic → Statistik
3. Sampling error → Ralat pensampelan
4. Central Limit Theorem → Teorem had memusat
5. Sampling distribution → Taburan pensampelan
6. Simple random sample → Sampel rawak mudah

3.1 Sampling error

Definition 1

Sampling error of a single mean is the difference between a value (a statistic) computed from a sample and the corresponding value (a parameter) computed from the population.

Theory 1

Formula for the sampling error of a single mean :

$$e = \bar{x} - \mu$$

where
$\bar{x}$ = sample mean
$\mu$ = population mean.

Definition 2

A parameter is a measure computed from the entire population.

Definition 3

A statistic is a measure computed from a sample that has been selected from a population.

Example 1

Given that the population mean is 158972 square feet and that a sample of five shopping centres yields a sample mean of 155072 square feet, find the sampling error.

Answer Example 1

We know that $\mu = 158972$ and $\bar{x} = 155072$.

$$e = \bar{x} - \mu = 155072 - 158972 = -3900 \text{ square feet}$$

The sample mean underestimates the population mean by 3900 square feet.

Theory 2

Formula for population mean,

$$\mu = \frac{\sum x}{N}$$

where
$\mu$ = population mean
$x$ = values in the population
$N$ = population size.

Theory 3

Fundamental statistical concepts are

• The size of the sampling error depends on which sample is taken.
• The sampling error may be positive or negative.
• There is potentially a different value for each possible sample mean.
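As a minimal sketch of these three points, the Python snippet below enumerates every sample of size two from a small population and computes the sampling error $e = \bar{x} - \mu$ for each one. The five population values and the function name `sampling_errors` are invented for illustration and do not come from this chapter.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population, invented for illustration only.
population = [2, 4, 6, 8, 10]

def sampling_errors(values, n):
    """Return the sampling error (sample mean - population mean) of every sample of size n."""
    mu = mean(values)  # population mean (a parameter)
    return [mean(sample) - mu for sample in combinations(values, n)]

errors = sampling_errors(population, 2)
print(errors)  # one error per possible sample; some negative, some positive
```

Each of the ten possible samples gives its own sample mean, so the error changes from sample to sample and takes both signs.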

Definition 4

A simple random sample is a sample selected in such a manner that each possible sample of a given size has an equal chance of being selected.

Theory 4

Formula for sample mean,

$$\bar{x} = \frac{\sum x}{n}$$

where
$\bar{x}$ = sample mean
$x$ = sample values selected from the population
$n$ = sample size.

3.2 Introduction to sampling distribution

In the inferential statistics process, a researcher selects a random sample from the population, computes a statistic on the sample and reaches conclusions about the population parameter from that statistic. In this chapter, we will explore the sample mean, $\bar{x}$, as the statistic. The sample mean is one of the more common statistics used in the inferential process. To compute and assign the probability of occurrence of a particular value of a sample mean, the researcher must know the distribution of the sample means. One way to examine the possible distributions is to take a population with a particular distribution, randomly select samples of a given size, compute the sample means and attempt to determine how the means are distributed.

Suppose 23 people are selected randomly from the population of women in Ayer Hitam Pahat between the ages of 20 and 40 years and we compute the mean height of the sample. We would not expect our sample mean to be equal to the mean of all women in Ayer Hitam. It might be somewhat lower or somewhat higher, but it would not equal the population mean exactly. Similarly, if we took a second sample of 23 people from the same population, we would not expect the mean of this second sample to equal the mean of the first sample.

Inferential statistics concerns generalizing from sample to population. A critical part of inferential statistics involves determining how far sample statistics are likely to vary from each other and from the population parameter.

Why do we sample the population ? Why not study the whole population ? The reasons are the physical impossibility of checking all items in the population, the cost of studying all the items in a population, the fact that sample results are usually adequate, the time that contacting the whole population would often take and, lastly, the destructive nature of certain tests such as a study of light bulb life.

Definition 5

If samples of size $n$ are drawn randomly from a population that has a mean of $\mu$ and a standard deviation of $\sigma$, the sample means, $\bar{x}$, are approximately normally distributed for sufficiently large sample sizes ($n \ge 30$) regardless of the shape of the population distribution. If the population is normally distributed, the sample means are normally distributed for any sample size. It can be shown that the mean of the sample means is the population mean, which is

$$\mu_{\bar{x}} = \mu$$

and the standard deviation of the sample means (called the standard error of the mean) is the standard deviation of the population divided by the square root of the sample size, which is

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}.$$

Definition 6

A sampling distribution is a distribution of the possible values of a statistic for a given sample size selected from a population.

Definition 7

Sampling distribution of the mean : for random samples of $n$ observations taken from a population with mean $\mu$ and standard deviation $\sigma$, regardless of the population's distribution, provided the sample size is sufficiently large, the distribution of the sample mean $\bar{x}$ will be approximately normal with a mean equal to the population mean, $\mu_{\bar{x}} = \mu$. Further, the standard deviation will equal the population standard deviation divided by the square root of the sample size, $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$. The larger the sample size, the better the approximation to the normal distribution.
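The two results $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma / \sqrt{n}$ can be checked with a short simulation. The sketch below is only an illustration under assumptions that are not part of this chapter: the population is taken to be exponential with rate 1 (so $\mu = 1$ and $\sigma = 1$, and the shape is strongly skewed), and the sample size, number of repetitions, seed and function name `simulate_sample_means` are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, stdev

def simulate_sample_means(draw, n, repetitions, seed=0):
    """Draw `repetitions` samples of size n using draw(rng) and return their means."""
    rng = random.Random(seed)
    return [mean(draw(rng) for _ in range(n)) for _ in range(repetitions)]

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
mu, sigma, n = 1.0, 1.0, 36
means = simulate_sample_means(lambda rng: rng.expovariate(1.0), n, 2000)

print("mean of sample means:", mean(means), "theory:", mu)
print("sd of sample means:  ", stdev(means), "theory:", sigma / sqrt(n))
```

The simulated values should land close to the theoretical ones, although they will not match exactly because the simulation itself is subject to sampling error.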

3.3 Sampling distribution of the single mean

Theory 5

The Z-value for the sampling distribution of $\bar{x}$ is

$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

where
$\bar{x}$ = sample mean
$\mu$ = population mean
$\sigma$ = population standard deviation
$n$ = sample size

Theory 6 ( Calculating the probability of a single mean )

To calculate a probability for a single mean, follow the four steps below.

Step 1 : Write the mean of the sample mean $\bar{x}$, which is $\mu_{\bar{x}} = \mu$.

Step 2 : Write the standard deviation of the sample mean $\bar{x}$, which is $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$.

Step 3 : Write the distribution in normal distribution form, which is $\bar{x} \sim N\left(\mu_{\bar{x}}, \sigma_{\bar{x}}^2\right)$.

Step 4 : Find the probability of the sample mean,

$$P(\bar{x} < r) = P\left(Z < \frac{r - \mu_{\bar{x}}}{\sigma_{\bar{x}}}\right).$$

Example 2

What is the probability that a sample of 100 automobile insurance claim files will yield an average claim of RM4527.77 or less if the average claim for the population is RM4560 with standard deviation of RM600 ?

Answer Example 2

Given $n = 100$, $\bar{x} = 4527.77$, $\mu = 4560$ and $\sigma = 600$.

Step 1 : $\mu_{\bar{x}} = \mu = 4560$

Step 2 : $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{600}{\sqrt{100}} = 60$

Step 3 : $\bar{x} \sim N(4560, 60^2)$

Step 4 :

$$P(\bar{x} \le 4527.77) = P\left(Z \le \frac{4527.77 - 4560}{60}\right) = P(Z \le -0.54) = P(Z \ge 0.54) = 0.2946$$
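The four steps of Theory 6 can also be sketched in Python, using `statistics.NormalDist` from the standard library in place of the printed normal table. The function name `prob_sample_mean_at_most` is a hypothetical helper introduced here for illustration only.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_at_most(r, mu, sigma, n):
    """P(sample mean <= r) for samples of size n, following Steps 1 to 4."""
    mean_xbar = mu                     # Step 1: mean of the sample mean
    sd_xbar = sigma / sqrt(n)          # Step 2: standard error of the mean
    z = (r - mean_xbar) / sd_xbar      # Step 4: standardise r under N(mean_xbar, sd_xbar^2)
    return NormalDist().cdf(z)         # area to the left of z

# Example 2: n = 100, mu = 4560, sigma = 600, r = 4527.77
p = prob_sample_mean_at_most(4527.77, mu=4560, sigma=600, n=100)
print(round(p, 4))
```

The result is slightly different from the table answer 0.2946, because the code does not round $z$ to two decimal places before looking up the area.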

Example 3

The random variable $X$, which represents the number of boxes in a container, has the following probability distribution.

x      4     5     6     7
P(x)   0.2   0.4   0.3   0.1

(a) Find the population mean and variance.

(b) Find the mean and variance of the sample mean for random samples of size 36.

(c) Calculate the probability that the average number of boxes in 36 containers will be less than 5.5.

Answer Example 3

(a) $E(X) = \sum x\,P(x) = 4(0.2) + 5(0.4) + 6(0.3) + 7(0.1) = 0.8 + 2.0 + 1.8 + 0.7 = 5.3$

$E(X^2) = 4^2(0.2) + 5^2(0.4) + 6^2(0.3) + 7^2(0.1) = 3.2 + 10 + 10.8 + 4.9 = 28.9$

$Var(X) = E(X^2) - [E(X)]^2 = 28.9 - (5.3)^2 = 0.81$

(b) Mean of the sample mean, $\mu_{\bar{x}} = \mu = 5.3$

Variance of the sample mean, $\sigma_{\bar{x}}^2 = \dfrac{\sigma^2}{n} = \dfrac{0.81}{36} = 0.0225$

(c) Step 1 : $\mu_{\bar{x}} = \mu = 5.3$

Step 2 : $\sigma_{\bar{x}} = \sqrt{\dfrac{\sigma^2}{n}} = \sqrt{\dfrac{0.81}{36}} = \sqrt{0.0225} = 0.15$

Step 3 : $\bar{x} \sim N(5.3, 0.15^2)$

Step 4 :

$$P(\bar{x} < 5.5) = P\left(Z < \frac{5.5 - 5.3}{0.15}\right) = P(Z < 1.33) = 1 - P(Z > 1.33) = 1 - 0.09176 = 0.90824$$

Example 4

An electrical firm manufactures light bulbs that have a length of life that is approximately normally distributed, with mean equal to 800 hours and a standard deviation of 40 hours. Find the probability that a random sample of 16 bulbs will have an average life of less than 775 hours.

Answer Example 4

Step 1 : $\mu_{\bar{x}} = \mu = 800$

Step 2 : $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{40}{\sqrt{16}} = 10$

Step 3 : $\bar{x} \sim N(800, 100)$

Step 4 :

$$P(\bar{x} < 775) = P\left(Z < \frac{775 - 800}{10}\right) = P(Z < -2.5) = 0.0062$$

Example 5

At a large university, the mean age of the students is 22.3 years and the standard deviation is 4 years. A random sample of 64 students is drawn. What is the probability that the average age of these students is greater than 23 years ?

Answer Example 5

Step 1 : $\mu_{\bar{x}} = \mu = 22.3$

Step 2 : $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{4}{\sqrt{64}} = 0.5$

Step 3 : $\bar{x} \sim N(22.3, 0.5^2)$

Step 4 :

$$P(\bar{x} > 23) = P\left(Z > \frac{23 - 22.3}{0.5}\right) = P(Z > 1.4) = 0.0808$$

Example 6

The breaking strength (in kg/mm) for a certain type of fabric has mean 1.86 and standard deviation 0.27. A random sample of 80 pieces of fabric is drawn. What is the probability that the sample mean breaking strength is less than 1.8 kg/mm ?

Answer Example 6

Given : $\mu = 1.86$, $\sigma = 0.27$ and $n = 80$

Step 1 : $\mu_{\bar{x}} = \mu = 1.86$

Step 2 : $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{0.27}{\sqrt{80}} = 0.03018$

Step 3 : $\bar{x} \sim N(1.86, 0.03018^2)$

Step 4 :

$$P(\bar{x} < 1.8) = P\left(Z < \frac{1.8 - 1.86}{0.03018}\right) = P(Z < -1.99) = 0.0233$$

Example 7

Taking random samples of size $n$ from an infinite population that has a standard deviation of two, show that $\bar{x}$ would be a more precise estimator of $\mu$ if the sample size were increased from four to sixteen. Interpret the result.

Answer Example 7

The precision of $\bar{x}$ as an estimator of $\mu$ is measured by its standard deviation, $\sigma_{\bar{x}}$.

For sampling from an infinite population, $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$.

Therefore, for $\sigma = 2$ and $n = 4$ : $\sigma_{\bar{x}} = \dfrac{2}{\sqrt{4}} = 1$

Increasing $n$ from 4 to 16 : $\sigma_{\bar{x}} = \dfrac{2}{\sqrt{16}} = 0.5$

Thus, with a fourfold increase in sample size and a constant $\sigma$, $\sigma_{\bar{x}}$ decreases by 50%.
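The same calculation can be repeated for any sample size. The short sketch below uses a hypothetical helper `standard_error` (a name chosen here, not taken from the chapter) and adds $n = 64$ as an extra illustrative case.

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard deviation of the sample mean when sampling from an infinite population."""
    return sigma / sqrt(n)

for n in (4, 16, 64):
    print(n, standard_error(2, n))
```

Each fourfold increase in $n$ halves the standard error, so precision improves with the square root of the sample size rather than with the sample size itself.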

Exercise 3.3

1. Labelled bags of concrete mix have a population mean weight of 100 kg and a population standard deviation of 0.5 kg.

(a) What is the probability that the mean weight of a random sample of 50 bags is less than 99.9 kg ?

(b) If the population mean weight is increased to 100.15 kg, what is the probability that the mean weight of a sample of size 50 will be less than 100 kg ?

2. A report stated that the average time spent watching movies per week by children aged between two and six years is 22 hours. Assume the variable is normally distributed and the standard deviation is five hours. A sample of 33 children aged between two and six years is randomly selected. Find the probability that the average time they spend watching movies per week will be greater than 23.5 hours.

3. For women aged 18 to 24, systolic blood pressures (in mm Hg) are normally distributed with a mean of 114.4 and a standard deviation of 13.1.

(a) If five women between the ages of 18 and 24 are randomly selected, find the probability that their mean systolic blood pressure is greater than 120.

(b) If twelve women between the ages of 18 and 24 are randomly selected, find the probability that the average of their systolic blood pressures is less than 115.

4. A random sample of one hundred is taken from a normally distributed population with mean 20 and standard deviation equal to one. What is the probability that $\bar{x}$ will take on a value between 20 and 20.2 inclusive ?

5. Engineers must consider the breadths of male heads when designing motorcycle helmets. Men have head breadths that are normally distributed with a mean of 15.24 cm and a standard deviation of 2.54 cm.

(a) If one male is randomly selected, find the probability that his head breadth is less than 15.75 cm.

(b) Find the probability that 100 randomly selected men have a mean head breadth of at least 16.00 cm.

6. The lifetime of a particular type of battery is normally distributed with a mean of 1100 days and a standard deviation of 80 days. The manufacturer randomly selects 400 batteries of this type and ships them to a departmental store.

(a) What are the mean and standard deviation of the sampling distribution of $\bar{x}$ ?

(b) What is the probability that the average lifetime of these 400 batteries is between 1097 and 1104 days ?

7. The time required to assemble an electronic component is normally distributed with a mean of 25 minutes and a standard deviation of 3.5 minutes. Find the probability that the average time required to assemble all 19 components is at least 23 minutes.

8. The amount of sulfur in the daily emissions from a power plant has a normal distribution with a mean of 94 and a standard deviation of 22. For a random sample of 5 days, find the probability that the average amount of sulfur emissions will exceed 80.

9. According to the growth chart that doctors use as a reference, the heights of two-year-old boys are normally distributed with mean 34.5 inches and standard deviation 1.3 inches. If six two-year-old boys are selected, what is the probability that their average height will be between 34.1 and 35.2 inches ?

10. Casual workers in a certain industry are paid on average RM5.10 per hour, and the pay is normally distributed with a standard deviation of RM2.20. A sample of 35 casual workers from the industry was selected as respondents for a questionnaire on underpayment. Find the probability that the average payment for those casual workers is

(a) at least RM6.00 per hour.
(b) greater than RM4.80 per hour.

11. Intelligence Quotients (IQ) in the general population are normally distributed with a mean of 100 and a standard deviation of 15. A random sample of 40 students was taken in a certain university. Find the probability that the mean IQ of the sample is

(a) greater than 105 and less than 107.
(b) not more than 109.

12. Given a random sample $X_1, X_2, X_3, \ldots, X_{40}$ drawn from a population with a Poisson distribution with mean 3.5, find the probability that the sample mean is between 3.4 and 4.3.

13. PVC pipe is manufactured with a mean diameter of 3.2 cm and a standard deviation of 1.6 cm. The distribution of the diameter is normal. Find the probability that a random sample of 64 pipes will have a sample mean diameter of less than three centimetres.

14. Consider the PVC pipe in the previous question. How does the standard deviation of the sample mean change when the sample size is decreased from 64 to 9 ? Explain.

Answer Exercise 3.3

1. (a) 0.0793 (b) 0.0170
2. 0.0427
3. (a) 0.1685 (b) 0.5636
4. 0.1359
5. (a) 0.5793 (b) 0.0014
6. (a) 1100, 4 (b) 0.6147
7. 0.9936
8. 0.9222
9. 0.6799
10. (a) 0.0078 (b) 0.7910
11. (a) 0.0159 (b) 0.9826
12. 0.6296
13. 0.1587

3.4 Sampling distribution of the difference between two means

Theory 7

Statistical analyses are very often concerned with the difference between means. A typical example is an experiment designed to compare the mean of a control group with the mean of an experimental group. Inferential statistics used in the analysis of this type of experiment depend on the sampling distribution of the difference between means.

Theory 8 ( Calculating the probability of two means )

To calculate a probability for the difference between two means, follow the four steps below.

Step 1 : Write the mean of the sampling distribution, which is $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$.

Step 2 : Write the standard deviation of the sampling distribution, which is $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$.

Step 3 : Write the distribution in normal form, $\bar{x}_1 - \bar{x}_2 \sim N\left(\mu_{\bar{x}_1 - \bar{x}_2}, \sigma_{\bar{x}_1 - \bar{x}_2}^2\right)$.

Step 4 : Find the probability of the difference between the sample means,

$$P(\bar{x}_1 - \bar{x}_2 < r) = P\left(Z < \frac{r - \mu_{\bar{x}_1 - \bar{x}_2}}{\sigma_{\bar{x}_1 - \bar{x}_2}}\right).$$

Example 8

The mature citrus trees of type A have a mean height of 14.8 feet with a standard deviation of 1.2 feet. The mature citrus trees of type B have a mean height of 12.9 feet with a standard deviation of 1.5 feet. Two samples of size 12 and 15 are randomly selected from mature citrus trees of type A and B respectively. Find the probability that

(a) the mean of type A is more than 14 feet.
(b) the mean of type B is between 12 and 14 feet.
(c) the mean of type A is more than two feet greater than the mean of type B.

Answer Example 8

                            A      B
Sample mean                 14.8   12.9
Sample standard deviation   1.2    1.5
Sample size                 12     15

(a) Step 1 : $\mu_{\bar{x}_A} = \mu_A = 14.8$

Step 2 : $\sigma_{\bar{x}_A} = \dfrac{\sigma_A}{\sqrt{n_A}} = \dfrac{1.2}{\sqrt{12}} = 0.34641$

Step 3 : $\bar{x}_A \sim N(14.8, 0.34641^2)$

Step 4 :

$$P(\bar{x}_A > 14) = P\left(Z > \frac{14 - 14.8}{0.34641}\right) = P(Z > -2.31) = 1 - P(Z > 2.31) = 1 - 0.01044 = 0.9896$$

(b) Step 1 : $\mu_{\bar{x}_B} = \mu_B = 12.9$

Step 2 : $\sigma_{\bar{x}_B} = \dfrac{\sigma_B}{\sqrt{n_B}} = \dfrac{1.5}{\sqrt{15}} = 0.38729$

Step 3 : $\bar{x}_B \sim N(12.9, 0.38729^2)$

Step 4 :

$$P(12 < \bar{x}_B < 14) = P\left(\frac{12 - 12.9}{0.38729} < Z < \frac{14 - 12.9}{0.38729}\right) = P(-2.32 < Z < 2.84)$$

$$= 1 - P(Z > 2.32) - P(Z > 2.84) = 1 - 0.01017 - 0.00226 = 0.9876$$

(c) Step 1 : $\mu_{\bar{x}_A - \bar{x}_B} = \mu_A - \mu_B = 14.8 - 12.9 = 1.9$

Step 2 : $\sigma_{\bar{x}_A - \bar{x}_B} = \sqrt{\dfrac{\sigma_A^2}{n_A} + \dfrac{\sigma_B^2}{n_B}} = \sqrt{\dfrac{1.2^2}{12} + \dfrac{1.5^2}{15}} = 0.51961$

Step 3 : $\bar{x}_A - \bar{x}_B \sim N(1.9, 0.51961^2)$

Step 4 :

$$P(\bar{x}_A - \bar{x}_B > 2) = P\left(Z > \frac{2 - 1.9}{0.51961}\right) = P(Z > 0.19) = 0.4247$$

Example 9

The results of Statistics Test 1 for two groups of management students, Section 1 and Section 2, are normally distributed with $N(60, 4^2)$ and $N(64, 2^2)$ respectively. Two samples of size 9 and 12 are randomly selected from Section 1 and Section 2 respectively. Find the probability that the mean of Section 2 is lower than the mean of Section 1.

Answer Example 9

                  Section 1   Section 2
Sample mean       60          64
Sample variance   16          4
Sample size       9           12

Step 1 : $\mu_{\bar{x}_2 - \bar{x}_1} = \mu_2 - \mu_1 = 64 - 60 = 4$

Step 2 : $\sigma_{\bar{x}_2 - \bar{x}_1} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} = \sqrt{\dfrac{16}{9} + \dfrac{4}{12}} = 1.4529$

Step 3 : $\bar{x}_2 - \bar{x}_1 \sim N(4, 1.4529^2)$

Step 4 :

$$P(\bar{x}_2 < \bar{x}_1) = P(\bar{x}_2 - \bar{x}_1 < 0) = P\left(Z < \frac{0 - 4}{1.4529}\right) = P(Z < -2.75) = P(Z > 2.75) = 0.0030$$

Example 10

Consider two populations of students who participate in a reading programme prior to taking a Japanese course. The populations are those who earn an A grade and those who earn a B grade. Let $X$ be the number of books read by the students who participate in the programme. Find the probability that the mean number of books read by the students who earn an A grade is greater than that of the students who earn a B grade, given the data below.

                            Grade A   Grade B
Sample mean                 37        25
Sample standard deviation   8.7014    8.5264
Sample size                 8         6

Answer Example 10

Step 1 : $\mu_{\bar{x}_A - \bar{x}_B} = \mu_A - \mu_B = 37 - 25 = 12$

Step 2 : $\sigma_{\bar{x}_A - \bar{x}_B} = \sqrt{\dfrac{\sigma_A^2}{n_A} + \dfrac{\sigma_B^2}{n_B}} = \sqrt{\dfrac{8.7014^2}{8} + \dfrac{8.5264^2}{6}} = 4.6455$

Step 3 : $\bar{x}_A - \bar{x}_B \sim N(12, 4.6455^2)$

Step 4 :

$$P(\bar{x}_A > \bar{x}_B) = P(\bar{x}_A - \bar{x}_B > 0) = P\left(Z > \frac{0 - 12}{4.6455}\right) = P(Z > -2.58) = 1 - P(Z > 2.58) = 1 - 0.00494 = 0.99506$$

Example 11

The length of a computer desk is approximately normally distributed. Two factories produce this kind of desk. The summary statistics are given below.

                            Factory A   Factory B
Sample mean                 60.5        58.3
Sample standard deviation   3           4
Sample size                 35          40

Find the probability that the sample mean length of computer desks produced by Factory B is greater than the sample mean length of computer desks produced by Factory A.

Answer Example 11

Step 1 : $\mu_{\bar{x}_B - \bar{x}_A} = \mu_B - \mu_A = 58.3 - 60.5 = -2.2$

Step 2 : $\sigma_{\bar{x}_B - \bar{x}_A} = \sqrt{\dfrac{\sigma_A^2}{n_A} + \dfrac{\sigma_B^2}{n_B}} = \sqrt{\dfrac{3^2}{35} + \dfrac{4^2}{40}} = 0.81064$

Step 3 : $\bar{x}_B - \bar{x}_A \sim N(-2.2, 0.81064^2)$

Step 4 :

$$P(\bar{x}_B > \bar{x}_A) = P(\bar{x}_B - \bar{x}_A > 0) = P\left(Z > \frac{0 - (-2.2)}{0.81064}\right) = 0.0034$$

Exercise 3.4

1. The usage of electricity at residential area A is normally distributed with mean of 156 kilowatt per hour and standard deviation of 43 kilowatt per hour. Meanwhile the usage of electricity at residential area B is also normally distributed with mean of 161 kilowatt per hour and its standard deviation is 48 kilowatt per hour. Two samples of size 20 and 25 residences are randomly selected from residential area A and residential area B, respectively. Find the probability that the mean of usage at residential area A is lower than the mean of usage at residential area B.

2. A company manufactures two types of cables, brand A and brand B that have mean breaking strengths of 4000 kg and 4500 kg and standard deviations of 300 kg and 200 kg, respectively. If 100 cables of brand A and 50 cables of brand B are tested, what is the probability that the mean breaking strengths of brand B will be at least 600 kg more than brand A ?

3. The effective life of a component used in a jet-turbine aircraft engine is a random variable with mean 3465 hours and standard deviation 25 hours. The distribution of effective life is fairly close to a normal distribution. The engine manufacturer introduces an improvement into the manufacturing process for this component that increases the mean life to 4050 hours and decreases the standard deviation to 15 hours. Suppose a random sample of 20 components is selected from the old process and a random sample of 35 components is selected from the improved process. What is the probability that the difference between the two sample means (improved process minus old process) is at most 23 hours ?

4. The average running time of films produced by Company A is 98.4 minutes with a standard deviation of 7.8 minutes. Films produced by Company B have a mean running time of 110.7 minutes with a standard deviation of 29.8 minutes. Assume the populations are approximately normally distributed. What is the probability that a random sample of 36 films from Company B will have a mean running time that is at least 13 minutes more than the mean running time of a random sample of 49 films from Company A ?

5. The elasticity of a polymer is affected by the concentration of a reactant. When a low concentration is used, the true mean elasticity is 55 and when a high concentration is used the mean elasticity is 60. The standard deviation of elasticity is 4, regardless of concentration. The elasticity is normally distributed. Two random samples of size 16 are taken. Find the probability that the difference between the sample mean elasticity at high concentration and at low concentration is more than two.

6. Consider two populations of UTHM students who participate in a French language debate competition. The populations are those who get Score A and those who get Score B. Let $X$ be the number of questions answered by the students who participate in the competition. Find the probability that the mean number of questions answered by the students who get Score B is at most that of the students who get Score A, given the data below.

Score A Score B

Sample mean 83 91

Sample standard deviation 12 8

Sample size 15 14

7. The mean age at death in Malaysia is 55.5 years and in Singapore is 57 years. The standard deviations are approximately 4.6 years and 5 years respectively. Samples of 130 deaths from the Malaysia Hospital and 120 from Singapore Hospital were selected. Find the probability that

(a) the mean age at death in Malaysia is greater than the mean age at death in Singapore.

(b) the mean age at death in Singapore is three less than the mean age at death in Malaysia.

8. The average life of a hand phone is 8 years for a female and 6 years for a male, with standard deviations of 1 and 2 years respectively. Assuming that the lives of these hand phones follow approximately a normal distribution, find the probability that the mean life of a random

(a) male hand phone falls between 6.6 and 7.7 years.

(b) sample of 44 female hand phones is at least 2.5 years more than that of a sample of 55 male hand phones.

9. A company manufactures two types of polystyrenes, type A and type B that have mean breaking strengths of 400 g and 450 g and standard deviations of 30 g and 20 g, respectively. If 80 polystyrenes of type A and 45 polystyrenes of type B are tested, what is the probability that the mean breaking strengths of type B will be at most 53 g more than type A ?

10. The average running times of disks produced by Company X is 88.1 minutes and a standard deviation of 6.1 minutes, while those of Company Y have a mean running times of 99.3 minutes with standard deviation of 13.6 minutes. Assume the populations are approximately normally distributed. What is the probability that a random sample of 41 disks from Company Y will have mean running times that at most 15 minutes more than the mean running times of a random sample of 32 disks from Company X ?

11. A random sample of size sixteen (sample A) is selected from a normal population with a mean of 75 and a standard deviation of eight. A second sample of size nine (sample B) is selected from another normal population with a mean of 70 and a standard deviation of twelve. Let $\bar{X}_A$ and $\bar{X}_B$ be the two sample means. Find

(a) the probability that the mean difference between sample A and sample B will exceed four.

(b) the probability that the mean difference between sample A and sample B will be between 3.5 and 5.5.

12. The elasticity of polymer is affected by the concentration of a reactant. When low concentration is used, the true mean elasticity is 55 and when high concentration is used the mean elasticity is 60. The standard deviation of elasticity is 4, regardless of concentration. The distribution of elasticity is normally distributed. Two random samples of size 16 are taken. Find the probability that the difference mean between high concentration and low concentration is more than two.

13. The average running times of films produced by Company A is 98.4 minutes and a standard deviation of 7.8 minutes, while those of Company B have a mean running times of 110.7 minutes with standard deviation of 29.8 minutes. Assume the populations are approximately normally distributed. What is the probability that a random sample of 36 films from Company B will have mean running times that at least 13 minutes more than the mean running times of a random sample of 49 films from Company A.

14. The weight of a computer chair is approximately normally distributed. Two companies produce this kind of chair. The data are shown in the table below.

Company 1             Company 2
Sample mean = 20.1    Sample mean = 23.1
Sample size = 38      Sample size = 29

Find the probability that the mean weight of computer chairs produced by Company 2 is greater than the mean weight of computer chairs produced by Company 1.

15. A study was designed to estimate the difference in diastolic blood pressure readings between men and women. The mean and standard deviation for sixteen men are 77.37 and 8.35, while for thirteen women they are 71.08 and 9.22 respectively. Assuming that the readings are normally distributed, find

(a) the sampling distribution of the difference between the diastolic blood pressure readings for men and women.

(b) the probability that the difference between diastolic blood pressure readings for men is greater than women.

(c) the probability that the difference between diastolic blood pressure readings for men is five less than women.

Answer Exercise 3.4

1. 0.6443
2. 0.0076
3. 0.0073
4. 0.4443
5. 0.9830
6. 0.0166
7. (a) 0.0070 (b) 0.9931
8. (a) 0.1844 (b) 0.0526
9. 0.7486
10. 0.9452
11. (a) 0.5871 (b) 0.1769
12. 0.983
13. 0.44433
14. 0.9993
15. (b) 0.9719 (c) 0.3483
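The four steps of Theory 8 can be sketched in Python with `statistics.NormalDist` from the standard library, which may be useful for checking answers in this section. The helper name `diff_of_means_distribution` is hypothetical and introduced only for this illustration, which reworks Example 8(c).

```python
from math import sqrt
from statistics import NormalDist

def diff_of_means_distribution(mu1, sigma1, n1, mu2, sigma2, n2):
    """Return the normal distribution of (sample mean 1 - sample mean 2)."""
    mean_diff = mu1 - mu2                              # Step 1
    sd_diff = sqrt(sigma1**2 / n1 + sigma2**2 / n2)    # Step 2
    return NormalDist(mean_diff, sd_diff)              # Step 3

# Example 8(c): type A minus type B, probability that the difference exceeds 2 feet.
dist = diff_of_means_distribution(14.8, 1.2, 12, 12.9, 1.5, 15)
p = 1 - dist.cdf(2)                                    # Step 4
print(round(dist.stdev, 5), round(p, 4))
```

The probability differs slightly from the table-based answer 0.4247 because the code does not round $z$ to two decimal places.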


Theory 9

The t-distribution was introduced by W. S. Gosset (1876 - 1937). He adopted the pen name "Student". Therefore, the distribution is known as "Student's t-distribution". It is used to establish confidence limits and to test hypotheses when the population variance is not known and the sample size is small (less than 30).

If a random sample $x_1, x_2, \ldots, x_n$ of $n$ values is drawn from a normal population with mean $\mu$ and standard deviation $\sigma$, then the mean of the sample is

$$\bar{x} = \frac{\sum x_i}{n}.$$

For estimation of the variance, let $s^2$ be the estimate of the variance from the sample. Then $s^2$ is given by

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

with $(n - 1)$ as the denominator in place of $n$. The statistic $t$ is defined as

$$t = \frac{\bar{x} - \mu}{\sqrt{s^2 / n}} = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

where $\bar{x}$ = sample mean, $\mu$ = actual mean of the population, $n$ = sample size and $s$ = standard deviation of the sample. The formula for $s$ is

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}.$$

Note :

• $t$ is distributed as the Student distribution with $(n - 1)$ degrees of freedom (df).
• The variable $t$ ranges from minus infinity to plus infinity.
• Like the standard normal distribution, it is symmetrical and has mean zero.
• $\sigma^2$ of the t-distribution is greater than 1, but approaches 1 as df increases, that is, as the sample size becomes large.
• The t-distribution is lower at the mean and higher at the tails than the normal distribution.
• The t-distribution has proportionally greater area in its tails than the normal distribution.
• The t-distribution is similar in shape to the standard normal distribution : both are symmetric about zero, unimodal and bell-shaped.
• The spread of a t-distribution is larger than that of a standard normal distribution. That is, there is more probability in the tails of a t-distribution. This makes sense because the t statistic should have more variability than the test statistic $Z$ that we used before. There is added variability in the t statistic since it uses $s$, an estimate of $\sigma$, rather than a known, fixed value of $\sigma$.
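To make the definition of $t$ concrete, the sketch below computes $\bar{x}$, $s$ (with the $(n - 1)$ denominator) and $t$ for a small sample. The nine measurements, the hypothesised mean of 12.0 and the function name `t_statistic` are invented for illustration and are not taken from this chapter.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu):
    """Return (t, degrees of freedom) for a sample and a population mean mu."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation, (n - 1) denominator
    return (x_bar - mu) / (s / sqrt(n)), n - 1

# Hypothetical measurements, invented for illustration only.
sample = [12.1, 11.8, 12.4, 12.0, 11.7, 12.3, 12.2, 11.9, 12.5]
t, df = t_statistic(sample, mu=12.0)
print(t, df)
```

Note that `statistics.stdev` already divides by $(n - 1)$, whereas `statistics.pstdev` divides by $n$ and would not match the formula for $s$ above.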

Theory 10 (Finding areas under the t-distributions)

We use the t-distribution table to find areas under the t-distributions. The table gives the value of $t_{\alpha, v}$, which is the $100\alpha$ percentage point of the t-distribution with $v$ degrees of freedom. The numbers in the middle of the table are values from t-distributions. Each row corresponds to a t-distribution with the degrees of freedom given at the beginning of the row. The numbers in the top row are right tail areas.

Example 12

Find the value of t-distribution, if given nine degrees of freedom with alpha equal to 0.05.

Referring to the t-distribution table, if we go across the row for nine degrees of freedom and down the column for an area of 0.05, we get the t value of 1.833. That means, for the $t_9$ distribution, the area under the curve to the right of 1.833 is 0.05.

Example 13

Find the value of t-distribution, if given twenty degrees of freedom with alpha equal to 0.001.

Referring to the t-distribution table, if we go across the row for twenty degrees of freedom and down the column for an area of 0.001, we get the t value of 3.552. That means, for the $t_{20}$ distribution, the area under the curve to the right of 3.552 is 0.001.
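The two table look-ups in Examples 12 and 13 can be checked numerically. The sketch below is one possible approach, not the method used to build the printed table: it integrates the standard density of Student's t-distribution with Simpson's rule and uses symmetry about zero to obtain the right tail area. The function names `t_pdf` and `t_right_tail` and the number of integration steps are choices made here for illustration.

```python
from math import gamma, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_right_tail(t, df, steps=2000):
    """Area to the right of t >= 0: Simpson's rule on [0, t], then symmetry about zero."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

print(t_right_tail(1.833, 9))    # Example 12: should be close to 0.05
print(t_right_tail(3.552, 20))   # Example 13: should be close to 0.001
```

The areas agree with the table only approximately, since the tabulated t values are rounded to three decimal places.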

Example 14

By using the statistical table, find the value of $t_{\alpha, v}$.

(a) $P(T > t_{\alpha, 14}) = 0.025$
(b) $P(T < -t_{\alpha, 24}) = 0.005$

Answer Example 14

(a) Given : $P(T > t_{\alpha, 14}) = 0.025$

$t_{\alpha, v} = t_{0.025, 14} = 2.145$, then $P(T > 2.145) = 0.025$

(b) Given : $P(T < -t_{\alpha, 24}) = 0.005$

$t_{\alpha, v} = t_{0.005, 24} = 2.797$, then $P(T < -t_{\alpha, 24}) = P(T > t_{\alpha, 24}) = 0.005$

Theory 11

Tests such as the Z, t and F tests are based on the assumption that the samples were drawn from normally distributed populations or, more accurately, that the sample means are normally distributed. Because these tests require assumptions about the type of population or its parameters, they are known as 'parametric tests'. There are many situations in which it is impossible to make any rigid assumption about the distribution of the population from which the samples are drawn. This limitation led to the search for non-parametric tests. The chi-square (read as 'ki-square') test of independence and goodness of fit is a prominent example of a non-parametric test. The chi-square, $\chi^2$, test can be used to evaluate a relationship between two nominal or ordinal variables.

$\chi^2$ is a measure of the actual divergence between the observed and expected frequencies. In sampling studies we never expect a perfect coincidence between expected and observed frequencies, and the question we have to tackle is the degree to which the difference between them can be ignored as arising from fluctuations of sampling. If there is no difference between expected and observed frequencies, then $\chi^2 = 0$. If there is a difference, then $\chi^2$ is greater than 0. But the difference may also be due to sampling fluctuation, in which case the value of $\chi^2$ should be ignored in drawing the inference. The values of $\chi^2$ that can arise from sampling fluctuation under different conditions are given in the form of tables. If the calculated value is greater than the table value, the difference is not solely due to sampling fluctuation and there is some other reason for it. On the other hand, if the calculated $\chi^2$ is less than the table value, the difference may have arisen from chance fluctuations and can be ignored.

Thus the $\chi^2$ test enables us to find out whether the divergence between theory and fact, or between expected and observed frequencies, is significant or not. If the calculated value of $\chi^2$ is very small compared to the table value, the divergence between the expected and observed frequencies is very small and the fit is good. If the calculated value of $\chi^2$ is very large compared to the table value, the divergence between the expected and observed frequencies is very large and the fit is poor.
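As an illustration of the comparison described above, the following sketch computes the standard Pearson statistic $\chi^2 = \sum (O - E)^2 / E$. The formula is the usual textbook definition rather than one stated in this section, and the die-roll counts and the function name `chi_square_statistic` are made up for the example.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for a six-sided die rolled 60 times.
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6  # a fair die: 60 / 6 rolls per face

chi2 = chi_square_statistic(observed, expected)
table_value = 11.070  # chi-square table, 5 degrees of freedom, alpha = 0.05

print(chi2, chi2 < table_value)
```

Here the calculated value (about 4.2) is below the table value, so under these assumed counts the divergence could be put down to chance fluctuation.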

Theory 12 (Finding areas under the $\chi^2$-distributions)

We use the $\chi^2$-distribution table to find areas under the $\chi^2$-distributions. The table gives the value of $\chi^2_{\alpha,v}$, which is the $100\alpha$ percentage point of the $\chi^2$-distribution with $v$ degrees of freedom. The numbers in the middle of the table are values from the $\chi^2$-distributions. Each row corresponds to a $\chi^2$-distribution with the degrees of freedom given at the beginning of the row. The numbers in the top row are right-tail areas.

Example 15

Find the value of the $\chi^2$-distribution with seventeen degrees of freedom and $\alpha = 0.95$.

Referring to the $\chi^2$-distribution table, if we go across the row for seventeen degrees of freedom and down the column for an area of 0.95, we get the $\chi^2$ value 8.672. That means that, for the $\chi^2_{17}$ distribution, the area under the curve to the right of 8.672 is 0.95.

Example 16

Find the value of the $\chi^2$-distribution with twelve degrees of freedom and $\alpha = 0.02$.

Referring to the $\chi^2$-distribution table, if we go across the row for twelve degrees of freedom and down the column for an area of 0.02, we get the $\chi^2$ value 24.054. That means that, for the $\chi^2_{12}$ distribution, the area under the curve to the right of 24.054 is 0.02.
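The two table look-ups above can be checked with a short numerical sketch. It is not part of the original notes: the function names `chi2_pdf` and `chi2_right_tail` are hypothetical, the integration uses Simpson's rule, and the sketch assumes at least two degrees of freedom so that the density is finite at zero.

```python
import math


def chi2_pdf(x: float, df: int) -> float:
    """Density of the chi-square distribution with df degrees of freedom."""
    k = df / 2
    return x ** (k - 1) * math.exp(-x / 2) / (2 ** k * math.gamma(k))


def chi2_right_tail(x: float, df: int, steps: int = 4000) -> float:
    """Approximate P(chi-square > x) with Simpson's rule on [0, x]."""
    if df < 2 or x < 0:
        raise ValueError("this sketch assumes df >= 2 and x >= 0")
    h = x / steps  # steps must be even for Simpson's rule
    total = chi2_pdf(0.0, df) + chi2_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(i * h, df)
    return 1.0 - total * h / 3  # right tail = 1 - area from 0 to x


print(round(chi2_right_tail(8.672, 17), 3))   # seventeen degrees of freedom
print(round(chi2_right_tail(24.054, 12), 3))  # twelve degrees of freedom
```

The two printed areas should be close to the tabulated right-tail areas 0.95 and 0.02.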

Theory 13

In probability theory and statistics, the F-distribution is a continuous probability distribution. It is also known as Snedecor's F distribution or the Fisher-Snedecor distribution (after R.A. Fisher and George W. Snedecor). The F-distribution becomes relevant when we calculate ratios of variances of normally distributed statistics. Suppose we have two independent samples with $n_1$ and $n_2$ observations drawn from normal populations with equal variances. The ratio

$F = \dfrac{s_1^2}{s_2^2}$

is distributed according to an F-distribution with $v_1 = n_1 - 1$ numerator degrees of freedom and $v_2 = n_2 - 1$ denominator degrees of freedom. The F-distribution is skewed to the right, and F-values can only be positive.
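To make the degrees-of-freedom bookkeeping concrete, the sketch below integrates the F density numerically for sample sizes $n_1 = 25$ and $n_2 = 13$ (illustrative values). It is a rough check under stated assumptions, not the method of the notes: the names `f_pdf` and `f_cdf` are hypothetical, and the sketch assumes $v_1 > 2$ so that the density is zero at the origin.

```python
import math


def f_pdf(x: float, v1: int, v2: int) -> float:
    """Density of the F-distribution with v1 and v2 degrees of freedom."""
    if x <= 0.0:
        return 0.0  # correct at x = 0 only when v1 > 2 (assumed in this sketch)
    log_beta = (math.lgamma(v1 / 2) + math.lgamma(v2 / 2)
                - math.lgamma((v1 + v2) / 2))
    log_pdf = (0.5 * (v1 * math.log(v1 * x) + v2 * math.log(v2)
                      - (v1 + v2) * math.log(v1 * x + v2))
               - math.log(x) - log_beta)
    return math.exp(log_pdf)


def f_cdf(x: float, v1: int, v2: int, steps: int = 4000) -> float:
    """Approximate P(F < x) with Simpson's rule on [0, x]."""
    if v1 <= 2:
        raise ValueError("this sketch assumes v1 > 2")
    if x <= 0.0:
        return 0.0
    h = x / steps  # steps must be even for Simpson's rule
    total = f_pdf(0.0, v1, v2) + f_pdf(x, v1, v2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, v1, v2)
    return total * h / 3


n1, n2 = 25, 13              # illustrative sample sizes
v1, v2 = n1 - 1, n2 - 1      # numerator and denominator degrees of freedom
print(v1, v2, round(f_cdf(6.25, v1, v2), 3))
```

With these sizes the ratio has 24 and 12 degrees of freedom, and almost all of the area lies below 6.25, which reflects the long right tail of the distribution.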

Theory 14 (Finding areas under the F - distributions)

We use the F-distribution table to find areas under the F-distributions. The table gives the values of $F_{\alpha,v_1,v_2}$, which is the $100\alpha$ percentage point of the F-distribution having $v_1$ degrees of freedom in the numerator and $v_2$ degrees of freedom in the denominator. For each pair of values of $v_1$ and $v_2$, $F_{\alpha,v_1,v_2}$ is tabulated for $\alpha = 0.05, 0.025, 0.01, 0.001$. The corresponding lower percentage points of the distribution may be obtained from the relation

$F_{1-\alpha,v_1,v_2} = \dfrac{1}{F_{\alpha,v_2,v_1}}$

for example, $F_{0.95,12,8} = \dfrac{1}{F_{0.05,8,12}} = 0.351$.

Example 17

If $s_1^2$ and $s_2^2$ are the variances of independent random samples of sizes $n_1 = 25$ and $n_2 = 13$ from normal populations with equal variances, find $P\left(\dfrac{s_1^2}{s_2^2} < 6.25\right)$.

Answer Example 17

The variances of the two normal populations are equal, $\sigma_1^2 = \sigma_2^2 = \sigma^2$.

$n_1 = 25$, $v_1 = n_1 - 1 = 25 - 1 = 24$

$n_2 = 13$, $v_2 = n_2 - 1 = 13 - 1 = 12$

From the statistical table, $F_{0.001,24,12} = 6.25$, so

$P\left(\dfrac{s_1^2}{s_2^2} < 6.25\right) = P(F < 6.25) = 1 - P(F > 6.25) = 1 - 0.001 = 0.999$

Exercise 3.4

By using the statistical table, find the probability.

1. $P(T > 2.898)$, $v = 17$

2. $P(T < 1.415)$, $v = 7$

3. $P(T < 2.042)$, $v = 30$

By using the statistical table, find the value of $T$ for the given degrees of freedom and probability.

4. P(T > _____) = 0.005, $v = 26$

5. P(T > _____) = 0.995, $v = 21$

6. P(T < _____) = 0.9995, $v = 14$

In each of the following parts, find $\chi^2_{0.95,v}$. Assume a chi-square distribution with

7. 14 degrees of freedom

8. 29 degrees of freedom

Assume a chi-square distribution with 17 degrees of freedom. Fill in the blanks.

9. P($\chi^2$ > __________) = 0.05

10. P($\chi^2$ < __________) = 0.005

11. Assume a $\chi^2$ distribution with 7 degrees of freedom, find $P(2.167 < \chi^2 < 6.346)$.

12. Assume a $\chi^2$ distribution with 17 degrees of freedom, find $P(\chi^2 < 19.511)$.

13. If $S_1^2$ and $S_2^2$ are the variances of independent random samples of sizes $n_1 = 9$ and $n_2 = 11$ from normal populations with equal variances, find $P\left(\dfrac{S_1^2}{S_2^2} > 5.06\right)$.

14. If $s_1^2$ and $s_2^2$ are the variances of independent random samples of sizes $n_1 = 9$ and $n_2 = 13$ from normal populations with equal variances, find $P\left(\dfrac{s_1^2}{s_2^2} < 4.50\right)$.

15. If $s_1^2$ and $s_2^2$ are the variances of independent random samples of sizes $n_1 = 7$ and $n_2$ from normal populations with equal variances, find $P\left(\dfrac{s_1^2}{s_2^2} < 4.28\right)$.

Answer Exercise 3.4

1. 0.005   2. 0.90   3. 0.975   4. 2.779   5. -2.831

6. 4.140   7. 6.571   8. 17.708   9. 27.587   10. 5.697

11. 0.45   12. 0.7   13. 0.01   14. 0.99   15. 0.95

EXERCISE CHAPTER 3

1. A simple random sample of 100 men is chosen from a population with a mean height of 70 inches and a standard deviation of 2.5 inches. What is the probability that the average height of the men in the sample is greater than 69.5 inches?

2. A group of ball bearings has a mean weight of 5.02 grams and a standard deviation of 0.30 grams. If a random sample of 100 ball bearings is chosen from this group, find

(a) the probability that the average weight of the ball bearings in the sample is between 4.96 and 5.00 grams.

(b) the probability that the average weight of the ball bearings in the sample is more than 5.10 grams.

3. Given the population 5, 5, 5, 7, 7, 8, 8, 8, 9, 9, find the mean and standard deviation of the sampling distribution of the sample mean

(a) if a random sample of size 40 is drawn with replacement from that population.

population.

4. The mean height of 250 UPM staff members is 158 m and the standard deviation is 5 m. Find the mean and standard deviation of the sampling distribution of the mean height for a sample of 38 staff members.

5. A chemical engineer calculates that the population mean yield of a batch is 518 grams per milliliter with a standard deviation of 40 grams. Assume that the distribution of the yield is approximately normal. What is the probability that, in a certain month, the mean yield of 36 batches is less than 515 grams?

6. The viscosity of a fluid can be measured in an experiment by dropping a small ball into a calibrated tube containing the fluid and observing the random variable X, the time it takes for the ball to drop the measured distance. Assume that X is normally distributed with a mean of 20 seconds and a standard deviation of 0.5 seconds for a particular type of liquid.

(a) What is the standard deviation of the average time of 40 experiments?

(b) What is the probability that the average time of 40 experiments will exceed 20.1 seconds?

(c) Suppose the experiment is repeated only 20 times. What is the probability that the average value of X is less than 20.1 seconds?

7. Two independent experiments are being run in which two different types of paint are compared. Twenty specimens are painted using type A and the drying time in hours is recorded for each. The same is done with type B. Assume that the drying times of the two populations are normal, $X_A \sim N(\mu, 1)$ and $X_B \sim N(\mu, 1)$, for the two types of paint.

(a) Write down the distribution of the mean $\bar{X}_A$.

(b) Calculate $P(\bar{X}_A - \bar{X}_B > 0.3)$.

8. The photoresist thickness in semiconductor manufacturing has a mean of 10 micrometers and a standard deviation of 3 micrometers. Assume that the thickness is normally distributed and that the thicknesses of different photoresists are independent.

(a) Determine the probability that the average thickness of 10 photoresists is either greater than 11 or less than 9 micrometers.

(b) Determine the number of photoresists that need to be measured such that the probability that the average thickness exceeds 11 micrometers is 0.01.

9. The populations of the usage per sheet of paper for the old and new versions of a certain product are distributed $N_1(2000, 60)$ and $N_2(2500, 40)$ respectively. Random samples of sizes $n_1$ and $n_2$ are taken from the respective populations.

(a) Write down the sampling distribution of the difference between the means of the new and old products.

(b) For samples of size $n_1 = n_2 = 30$, find the probability that the mean usage of the new product is at least 500.5 sheets more than that of the old product.

(c) For samples of size $n_1 = 20$ and $n_2 = 25$, find the probability that the mean usage of the new product is at most 502 sheets more than that of the old product.

10. Line Clear Manufacturing Sdn. Bhd. manufactures two types of cable, A and B, that have mean breaking strengths of 2500 lb and 2400 lb with standard deviations of 150 lb and 100 lb respectively. If 50 cables of brand A and 25 cables of brand B are tested, what is the probability that the mean breaking strength of A will be

(a) at least 150 lb more than brand B?

(b) at least 110 lb more than brand B?

mean of 54 months and a standard deviation of 6 months. Your consumer advocacy group tests 50 of them. What is the probability that it finds a mean lifetime of less than 52 months ?

12. A manufacturer of video display units is testing two microcircuit designs to determine whether they produce equivalent mean current flow. The current flow for each design is normally distributed with the mean, variance and sample size given below. Find the probability that the mean of design A is lower than the mean of design B.

               Design A    Design B
Mean           24.2        23.9
Variance       10          20
Sample size    15          10

13. The amount of time that a drive-through restaurant counter spends on a customer is normally distributed with a mean of 3.2 minutes and a standard deviation of 1.6 minutes. If a random sample of 64 customers is observed, find the probability that the mean time at the counter is

(a) less than 2.7 minutes.

(b) more than 3 minutes.

(c) at least 3.2 minutes but less than 3.4 minutes.

14. Students may choose between a 3-semester course in physics without labs and a 4-semester course with labs. The final written examination is the same for each section. The section with labs made an average examination grade of 84 with a standard deviation of 4, and the section without labs made an average grade of 77 with a standard deviation of 6. Assume the populations are approximately normally distributed. Find the probability that the sample mean for a random sample of scores of 12 students with labs exceeds the sample mean for a random sample of scores of 18 students without labs by at most 5.

15. Casual workers in a certain industry are paid on average RM5.10 per hour which is normally distributed with standard deviation of RM2.20. A sample of 35 casual workers from the industry was selected to be respondents for the underpaid issue questionnaires. Find the probability that the average payment for those casual workers is

(a) at least RM6.00 per hour. (b) greater than RM4.80 per hour.

16. The mean age at death is 55.5 years in Malaysia and 57 years in Singapore. The standard deviations are approximately 4.6 years and 5 years respectively. Samples of 130 deaths from the Malaysia Hospital and 120 deaths from the Singapore Hospital were selected. Find the probability that

(a) the mean age at death in Malaysia is greater than the mean age at death in Singapore.

(b) the mean age at death in Singapore exceeds the mean age at death in Malaysia by less than three years.

17. Intelligence quotients (IQ) in the general population are normally distributed with a mean of 100 and a standard deviation of 15. A random sample of 40 students was taken in a certain university. Find the probability that

(a) the mean IQ of the sample is greater than 105 and less than 107.

(b) the mean IQ of the sample is not more than 109.

18. The average life of a hand phone is 8 years for a female owner and 6 years for a male owner, with standard deviations of 1 and 2 years respectively. Assuming that the lives of these hand phones follow approximately a normal distribution, find

(a) the probability that the life of a randomly chosen male's hand phone falls between 6.6 and 7.7 years.

(b) the probability that the mean life of a random sample of 44 females' hand phones is at least 2.5 years more than the mean life of a random sample of 55 males' hand phones.

19. A company manufactures two types of polystyrene, type A and type B, that have mean breaking strengths of 400 g and 450 g with standard deviations of 30 g and 20 g respectively. If 80 polystyrenes of type A and 45 polystyrenes of type B are tested, what is the probability that the mean breaking strength of type B will be at most 53 g more than that of type A?

20. The running times of disks produced by Company X have a mean of 88.1 minutes and a standard deviation of 6.1 minutes, while those of Company Y have a mean of 99.3 minutes and a standard deviation of 13.6 minutes. Assume the populations are approximately normally distributed. What is the probability that a random sample of 41 disks from Company Y will have a mean running time that is at most 15 minutes more than the mean running time of a random sample of 32 disks from Company X?

ANSWER EXERCISE CHAPTER 3

1. 0.97725

2. (a) 0.22867 (b) 0.00379

3. (a) 7.1, 0.2392 (b) 7.1, 0.87368

4. 158m, 0.8111

5. 0.32636

6. (a) 0.0791 (b) 0.1038 (c) 0.8133

7. (b) 0.00135

8. (a) 0.2937 (b) 49

9. (b) 0.3936 (c) 0.82381

10. (a) 0.0432 (b) 0.3658

11. 0.00914

12. 0.45620

13. (a) 0.00621 (b) 0.84134 (c) 0.34134

14. 0.13567

15. (a) 0.00776 (b) 0.791

16. (a) 0.00695 (b) 0.99305

17. (a) 0.01593 (b) 0.98257

18. (a) 0.18443 (b) 0.0526

19. 0.7486

20. 0.9452

SUMMARY CHAPTER 3

Sampling error of single mean: $e = \bar{x} - \mu$.

Population mean: $\mu = \dfrac{\sum x}{N}$.

Sample mean: $\bar{x} = \dfrac{\sum x}{n}$.

Z-value for sampling distribution of $\bar{x}$: $Z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$.
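The Z-value formula above can be tried out with Python's standard library. This is a minimal sketch, assuming the sample mean is normally distributed. The function names `z_value` and `prob_sample_mean_greater` are hypothetical, and the figures are those of question 1 in the chapter exercise (100 men, mean 70 inches, standard deviation 2.5 inches).

```python
from math import sqrt
from statistics import NormalDist


def z_value(x_bar: float, mu: float, sigma: float, n: int) -> float:
    """Z = (x_bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu) / (sigma / sqrt(n))


def prob_sample_mean_greater(r: float, mu: float, sigma: float, n: int) -> float:
    """P(sample mean > r), assuming the sample mean is normally distributed."""
    z = z_value(r, mu, sigma, n)
    return 1.0 - NormalDist().cdf(z)  # right-tail area of the standard normal


z = z_value(69.5, mu=70, sigma=2.5, n=100)
p = prob_sample_mean_greater(69.5, mu=70, sigma=2.5, n=100)
print(z, round(p, 5))
```

For a left-tail probability $P(\bar{x} < r)$, the same Z-value is used and the area to its left, `NormalDist().cdf(z)`, is taken instead.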

Calculate the probability of single mean :

Step 1 : Mean of sample mean, $\mu_{\bar{x}}$, which is $\mu_{\bar{x}} = \mu$.

Step 2 : Standard deviation of sample mean, $\sigma_{\bar{x}}$, which is $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$.

Step 3 : Distribution in normal distribution form, which is $\bar{x} \sim N\left(\mu_{\bar{x}}, \sigma_{\bar{x}}^2\right)$.

Step 4 : Probability of sample mean, $P(\bar{x} > r) = P\left(Z > \dfrac{r - \mu_{\bar{x}}}{\sigma_{\bar{x}}}\right)$.

Calculate the probability of two means :

Step 1 : Mean of sampling distribution, which is $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$.

Step 2 : Standard deviation of sampling distribution, which is $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$.

Step 3 : Distribution in normal form, $\bar{x}_1 - \bar{x}_2 \sim N\left(\mu_{\bar{x}_1 - \bar{x}_2}, \sigma_{\bar{x}_1 - \bar{x}_2}^2\right)$.

Step 4 : Probability of the difference between sample means, $P(\bar{x}_1 - \bar{x}_2 > r) = P\left(Z > \dfrac{r - \mu_{\bar{x}_1 - \bar{x}_2}}{\sigma_{\bar{x}_1 - \bar{x}_2}}\right)$.
