Chapter 3 – Sample Statistics


Academic year: 2021


• By the end of the chapter, you will be able to:

1) Define, interpret and evaluate sample statistics.

–Mean
–Variance and Standard Deviation
–Covariance and Correlation
–Median, Mode, Percentiles

3. POPULATION VS. SAMPLE DATA

Population Data – Full information on the ENTIRE population.

-Includes population probability (pdf)
-Uses formulas containing probability
-ex) data on an ENTIRE class

Sample Data – Partial information from a RANDOM SAMPLE of the population

-Individual data points (no pdf)
-Uses the following formulas
-ex) Study of 2,000 random students

3.1 Estimators

Population Expected Value:

\[ \mu = E(Y) = \sum y \, f(y) \]

Sample Mean:

\[ \bar{Y} = \frac{\sum Y_i}{N} \]

Note: From this point on, \( \bar{Y} \) may be written as Ybar (and likewise for any other variable, e.g. Xbar).

3.1 Sample Mean Example

Tom and Rodney both go to a 4-day gaming convention. How much they spend (S) each day is listed below:

Day 1 2 3 4

Tom 110 85 90 135

Rodney 190 20 10 200

\[ \bar{S}_T = \frac{\sum S_{Ti}}{N} = \frac{110 + 85 + 90 + 135}{4} = 105 \]

\[ \bar{S}_R = \frac{\sum S_{Ri}}{N} = \frac{190 + 20 + 10 + 200}{4} = 105 \]
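The sample mean calculation can be checked with a few lines of Python. This is a minimal sketch; the names `tom`, `rodney` and `sample_mean` are illustrative choices, not part of the original slides.

```python
from statistics import mean

# Daily spending from the example table
tom = [110, 85, 90, 135]
rodney = [190, 20, 10, 200]


def sample_mean(values):
    """Sample mean: the sum of the observations divided by N."""
    return sum(values) / len(values)


print(sample_mean(tom), sample_mean(rodney))  # 105.0 105.0

# The standard library gives the same answer
print(mean(tom) == sample_mean(tom), mean(rodney) == sample_mean(rodney))
```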

3.2 Estimators

Population Variance:

\[ \sigma_Y^2 = Var(Y) = \sum [y - E(y)]^2 f(y) \]

Sample Variance:

\[ S_y^2 = \frac{\sum (Y_i - \bar{Y})^2}{N - 1} \]

3.2 Sample Variance Example

Day 1 2 3 4

Tom 110 85 90 135

Rodney 190 20 10 200

\[ S_y^2 = \frac{\sum (Y_i - \bar{Y})^2}{N - 1} \]

Tom:

\[ S_{S_T}^2 = \frac{\sum (S_{Ti} - \bar{S}_T)^2}{N - 1} = \frac{(110 - 105)^2 + (85 - 105)^2 + (90 - 105)^2 + (135 - 105)^2}{4 - 1} = \frac{25 + 400 + 225 + 900}{3} \approx 517 \]

3.2 Sample Variance Example

Day 1 2 3 4

Tom 110 85 90 135

Rodney 190 20 10 200

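\[ S_y^2 = \frac{\sum (Y_i - \bar{Y})^2}{N - 1} \]

Rodney:

\[ S_{S_R}^2 = \frac{\sum (S_{Ri} - \bar{S}_R)^2}{N - 1} = \frac{(190 - 105)^2 + (20 - 105)^2 + (10 - 105)^2 + (200 - 105)^2}{4 - 1} = \frac{7225 + 7225 + 9025 + 9025}{3} \approx 10{,}833 \]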

3.2 Sample Variance Example

Day 1 2 3 4

Tom 110 85 90 135

Rodney 190 20 10 200

Even though Tom and Rodney spent the same average amount per day ($105), Rodney’s spending was MUCH more spread out (it changed more from day to day), as seen by the variance.

Tom: \( \bar{S}_T = 105 \), \( S_{S_T}^2 \approx 517 \)

Rodney: \( \bar{S}_R = 105 \), \( S_{S_R}^2 \approx 10{,}833 \)
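The two variances can be reproduced with a short Python sketch. The helper name `sample_variance` is an illustrative choice; it divides by N - 1, matching the sample variance formula used in this chapter.

```python
from statistics import variance

tom = [110, 85, 90, 135]
rodney = [190, 20, 10, 200]


def sample_variance(values):
    """Sample variance: squared deviations from the mean, divided by N - 1."""
    n = len(values)
    ybar = sum(values) / n
    return sum((y - ybar) ** 2 for y in values) / (n - 1)


print(round(sample_variance(tom)), round(sample_variance(rodney)))  # 517 10833

# statistics.variance also uses the N - 1 divisor
print(variance(tom), variance(rodney))
```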

3.2 Estimators

Population Standard Deviation:

\[ \sigma_Y = (\sigma_Y^2)^{1/2} \]

Sample Standard Deviation:

\[ S_y = (S_y^2)^{1/2} \]

3.2 Sample Standard Deviation Example

Tom: \( S_{S_T}^2 \approx 517 \), so \( S_{S_T} = \sqrt{517} \approx 22.7 \)

Rodney: \( S_{S_R}^2 \approx 10{,}833 \), so \( S_{S_R} = \sqrt{10{,}833} \approx 104 \)
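As a quick check, the sample standard deviation is just the square root of the sample variance. The sketch below uses Python's standard library; the variable names are illustrative.

```python
import math
from statistics import stdev, variance

tom = [110, 85, 90, 135]
rodney = [190, 20, 10, 200]

# Square root of the sample variance (N - 1 divisor)
sd_tom = math.sqrt(variance(tom))
sd_rodney = math.sqrt(variance(rodney))

print(round(sd_tom, 1), round(sd_rodney, 1))  # 22.7 104.1

# statistics.stdev computes the same quantity directly
print(stdev(tom), stdev(rodney))
```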

3.5 Estimators

Population Covariance:

\[ Cov(V,W) = \sum \sum (v - E(v))(w - E(w)) f(v,w) \]

Sample Covariance:

\[ Cov(V,W) = S_{vw} = \frac{\sum (V_i - \bar{V})(W_i - \bar{W})}{N - 1} \]

3.5 Covariance Example

Day 1 2 3 4

Tom 110 85 90 135

Rodney 190 20 10 200

\[ Cov(S_T, S_R) = \frac{\sum (S_{Ti} - \bar{S}_T)(S_{Ri} - \bar{S}_R)}{N - 1} \]

\[ Cov(S_T, S_R) = \frac{(110 - 105)(190 - 105) + (85 - 105)(20 - 105) + (90 - 105)(10 - 105) + (135 - 105)(200 - 105)}{4 - 1} \]

\[ Cov(S_T, S_R) = \frac{425 + 1700 + 1425 + 2850}{3} \approx 2133 \]

There is a POSITIVE covariance between Tom and Rodney’s spending. They seem to go up and down at the same time.
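The covariance calculation can be sketched in Python as follows. The function name `sample_covariance` is an illustrative choice, and the formula is written out by hand so that each product term is visible.

```python
tom = [110, 85, 90, 135]
rodney = [190, 20, 10, 200]


def sample_covariance(v, w):
    """Sample covariance: products of paired deviations, divided by N - 1."""
    n = len(v)
    vbar = sum(v) / n
    wbar = sum(w) / n
    return sum((vi - vbar) * (wi - wbar) for vi, wi in zip(v, w)) / (n - 1)


# The individual product terms from the worked example
print([(t - 105) * (r - 105) for t, r in zip(tom, rodney)])  # [425, 1700, 1425, 2850]

cov_tr = sample_covariance(tom, rodney)
print(round(cov_tr))  # 2133
```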

3.5 Estimators

Population Correlation:

\[ \rho_{vw} = corr(V,W) = \frac{Cov(V,W)}{\sigma_v \sigma_w} \]

Sample Correlation:

\[ r_{vw} = corr(V,W) = \frac{Cov(V,W)}{S_v S_w} \]

3.5 Correlation Example

Day 1 2 3 4

Tom 110 85 90 135

Rodney 190 20 10 200

\[ Cor(S_T, S_R) = \frac{Cov(S_T, S_R)}{S_{S_T} S_{S_R}} = \frac{2133}{(22.7)(104)} = 0.904 \]

There is a VERY STRONG positive correlation between Tom and Rodney’s spending.
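A short Python sketch of the sample correlation is given below; the function name `sample_correlation` is an illustrative choice. Note that the code works with unrounded intermediate values, so it returns roughly 0.902, whereas the hand calculation with the rounded inputs 2133, 22.7 and 104 gives 0.904.

```python
import math

tom = [110, 85, 90, 135]
rodney = [190, 20, 10, 200]


def sample_correlation(v, w):
    """Sample correlation: sample covariance over the product of sample SDs."""
    n = len(v)
    vbar = sum(v) / n
    wbar = sum(w) / n
    cov = sum((vi - vbar) * (wi - wbar) for vi, wi in zip(v, w)) / (n - 1)
    sd_v = math.sqrt(sum((vi - vbar) ** 2 for vi in v) / (n - 1))
    sd_w = math.sqrt(sum((wi - wbar) ** 2 for wi in w) / (n - 1))
    return cov / (sd_v * sd_w)


r = sample_correlation(tom, rodney)
print(round(r, 3))

# Hand calculation with rounded inputs, as in the worked example
print(round(2133 / (22.7 * 104), 3))
```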

3.1 Median

Median – the value in the middle of the data set

-(Data must be arranged in ascending order)

Calculating:

Odd number of observations = middle data point

\[ \text{Middle position} = \frac{N + 1}{2} \]

For example, with 3 data points, the median is the 2nd data point: \( (3 + 1)/2 = 2 \)

3.1 Median

Calculating:

Even number of observations = average of middle 2 data points

For example, if you have 42 data points:

\[ \text{Middle position} = \frac{N + 1}{2} = \frac{42 + 1}{2} = 21.5 \]

The median would be the average of the 21st and 22nd data points.
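The position rule for odd and even sample sizes can be sketched in Python. The function name `median_by_position` is an illustrative choice; the 42-point data set is simply the numbers 1 to 42, used here as a hypothetical example.

```python
from statistics import median


def median_by_position(values):
    """Median via the (N + 1) / 2 position rule on sorted data."""
    ordered = sorted(values)          # data must be in ascending order
    n = len(ordered)
    if n % 2 == 1:
        # Odd N: (N + 1) / 2 is a whole number, take that data point
        position = (n + 1) // 2
        return ordered[position - 1]  # convert 1-based position to 0-based index
    # Even N: (N + 1) / 2 ends in .5, average the two neighbouring points
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


print(median_by_position([90, 110, 85]))          # 90
print(median_by_position(list(range(1, 43))))     # 21.5
print(median(list(range(1, 43))))                 # 21.5
```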

3.1 Median

Usage:

-The mean is usually used as an “average” or measurement of “central location”

-If there are strong outliers (values way above or way below most others), that could influence the mean, and the median may be a better measure

Example:

At the end of term, 6/60 students were enrolled but


3.1 Percentiles

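-Percentiles are cut-off values that divide the data set so that, when arranged from smallest to largest,

p% of the observations are below the pth percentile

(100 - p)% of the observations are above the pth percentile

For example, 80% of the observations are below the 80th percentile and 20% are above it.

Note: The median is the 50th percentile.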

3.1 Quartile

-Quartiles are specific percentiles that divide the data into four sections

1st quartile = 25th percentile
2nd quartile = 50th percentile (the median)
3rd quartile = 75th percentile

Technically there is a 4th quartile, but it is above 100% of the data.
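Quartiles can be computed with Python's standard library, as sketched below. The data set (Tom's and Rodney's eight daily amounts pooled together) is an assumption made for illustration. Different software packages interpolate percentiles in slightly different ways, so the first and third quartiles may not match a hand calculation exactly; the second quartile always equals the median.

```python
from statistics import median, quantiles

# Hypothetical pooled data set: all eight daily spending amounts
spending = [110, 85, 90, 135, 190, 20, 10, 200]

# n=4 asks for the three cut points that split the data into four sections
q1, q2, q3 = quantiles(spending, n=4)

print(q1, q2, q3)
print(q2 == median(spending))  # True
```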

3.2 Max, Min, Range and Mode

Min = minimum = lowest value in the data set

Max = maximum = highest value in the data set

Range = max – min

Mode = the value(s) that show up most often
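These four measures are easy to compute directly. The sketch below uses a small hypothetical data set (not from the slides) chosen so that two values tie for the mode.

```python
from statistics import multimode

# Hypothetical data set with two modes
data = [2, 3, 3, 5, 7, 7, 9]

lowest = min(data)
highest = max(data)
data_range = highest - lowest
modes = multimode(data)  # returns every value tied for most frequent

print(lowest, highest, data_range, modes)  # 2 9 7 [3, 7]
```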

3.6 Degrees of Freedom

Some distributions (such as the t-distribution) depend on DEGREES OF FREEDOM

Degrees of Freedom are generally dependent on two things:

-Sample size (as sample size rises, so do degrees of freedom)

-Complication of test (more complicated statistical tests reduce degrees of freedom)

3.6 t-distribution

-The t-distribution is similar in shape to the normal distribution (bell curve) and statistically related to it

-The t-distribution is symmetric

-50% of the probability lies on each side of the center

-Statistical analysis often requires us to find critical t-values (t*) on one or both sides of the central mean of zero

-These are sometimes referred to as one-tailed or two-tailed values

3.6 t-distribution

[Figure: t-distribution with 2 tails, centered at 0. The same percentage lies in each tail, below -t* and above t*.]

3.6 t-distribution

Example 1:

Find the critical t-values (t*) with 1% in two tails with 27 df

(Note: 1% in both tails = 0.5% in each tail)

For p=0.495 (the area between 0 and t*), df 27 gives t* = 2.77 and -2.77
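Instead of reading a t-table, the critical value can be approximated numerically. The sketch below is one possible approach using only the standard library, not the method used in the course: it integrates the t density with Simpson's rule and searches for t* by bisection. The function names `t_pdf`, `area_zero_to` and `critical_t` are illustrative.

```python
import math


def t_pdf(t, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log(1 + t * t / df))


def area_zero_to(t, df, steps=2000):
    """Area under the density between 0 and t (Simpson's rule, even steps)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return total * h / 3


def critical_t(tail_area, df):
    """Find t* so that the area to the right of t* equals tail_area."""
    target = 0.5 - tail_area  # area between 0 and t*
    low, high = 0.0, 50.0
    for _ in range(60):       # bisection on the increasing area function
        mid = (low + high) / 2
        if area_zero_to(mid, df) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2


# 1% in two tails = 0.5% in each tail, 27 df
t_star = critical_t(0.005, 27)
print(round(t_star, 2), round(-t_star, 2))  # 2.77 -2.77
```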

3.6 Example 1

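[Figure: t-distribution with 27 df, centered at 0. 49.5% of the area lies on each side between 0 and the critical values -2.77 and 2.77, leaving 1% in the two tails.]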

3.6 t-distribution

Example 2:

Find the critical t-value (t*) that cuts off 1% of the right tail with 35 df

For a one-tail probability (1T) of 0.01:

df 30 gives t*=2.46

df 40 gives t*=2.42

Since 35 is halfway between 30 and 40, a good approximation for df 35 would be:

t*=(2.46+2.42)/2 = 2.44
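The averaging step is a special case of linear interpolation between two table rows. A minimal sketch, with the hypothetical helper name `interpolate_t`, shows how the same idea extends to degrees of freedom that are not exactly halfway between the rows:

```python
def interpolate_t(df, df_low, t_low, df_high, t_high):
    """Linear interpolation between two rows of a t-table."""
    weight = (df - df_low) / (df_high - df_low)
    return t_low + weight * (t_high - t_low)


# Table values from the example: df 30 -> 2.46, df 40 -> 2.42
t_star = interpolate_t(35, 30, 2.46, 40, 2.42)
print(round(t_star, 2))  # 2.44
```

Linear interpolation is only an approximation, since critical values do not change linearly with degrees of freedom, but for nearby table rows the error is small.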

3.6 Example 2

[Figure: t-distribution with 35 df, centered at 0. 49% of the area lies between 0 and t* = 2.44, leaving 1% in the right tail.]

3.6 t-distribution

Typically, the following variable (similar to the normal Z variable seen earlier) will have a t-distribution (we will see examples later):

\[ t = \frac{\text{Estimator} - E(\text{Estimator})}{\text{Sample } sd(\text{Estimator})} \]

7.5 Estimators as random variables

Each of these estimators will give us a result based upon the data available.

Therefore, two different data sets can yield two different point estimates.

Therefore the value of the point estimate can be seen as being the result of a chance experiment – obtaining a data set.

Therefore each point estimate is a random variable,

7.5 What distribution to use?

(when examining a sample mean)

IF:

A) The population has a normal distribution (this is a reasonable assumption for many populations)

And

B) You know the population variance

Then

The sample mean follows a NORMAL DISTRIBUTION

7.5 What distribution to use?

If the population doesn’t have a normal distribution:

The central limit theorem states that: “In selecting random samples of size n from a population, the sampling distribution of the sample mean can be approximated by a normal distribution as the sample size becomes large.”

General statistical practice assumes that a sample size of 30 or more is “large” enough

If outliers are an issue, 50 may be a better goal
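The central limit theorem can be illustrated with a small simulation. The sketch below is an illustration under stated assumptions rather than part of the original material: the population is taken to be exponential with mean 1 and variance 1 (strongly skewed, so clearly not normal), the sample size is 30, and the experiment is repeated 5,000 times. The sample means should cluster around the population mean, with variance close to the population variance divided by n.

```python
import random
from statistics import mean, variance

random.seed(0)  # fixed seed so the run is repeatable


def draw_sample_means(sample_size, repetitions):
    """Draw many samples from an exponential population; return each sample mean."""
    means = []
    for _ in range(repetitions):
        sample = [random.expovariate(1.0) for _ in range(sample_size)]
        means.append(sum(sample) / sample_size)
    return means


sample_means = draw_sample_means(sample_size=30, repetitions=5000)

# Compare with the population mean (1) and population variance / n (1/30)
print(round(mean(sample_means), 3), round(variance(sample_means), 4))
```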

7.5 What distribution to use?

If you don’t know the population variance:

A t-distribution can be used instead of a normal distribution.

For this course, we will always assume:

a) A normal distribution is appropriate BUT

b) We don’t have the population variance, so the t-distribution will be used

7.5 Estimators Distribution

Since the sample mean is a random variable, we can easily apply expectation and summation rules to find the expected value of the sample mean:

\[ E(\bar{Y}) = E\left(\frac{1}{N}\sum Y_i\right) = \frac{1}{N} E\left(\sum Y_i\right) = \frac{1}{N}\sum E(Y_i) = \frac{1}{N} N \mu_Y = \mu_Y \]

7.5 Estimators Distribution

If we make the simplifying assumption that there is no covariance between data points (i.e. one person’s consumption is unaffected by the next person’s consumption), we can easily calculate the variance of the sample mean:

\[ Var(\bar{Y}) = Var\left(\frac{1}{N}\sum Y_i\right) = \frac{1}{N^2} Var\left(\sum Y_i\right) = \frac{1}{N^2}\sum Var(Y_i) = \frac{1}{N^2} N \sigma_Y^2 = \frac{\sigma_Y^2}{N} \]
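Both results, that the expected value of Ybar equals the population mean and that its variance equals the population variance divided by N, can be verified exactly on a small population. The sketch below assumes a hypothetical population (a fair six-sided die) and independent samples of size N = 2, and enumerates every possible sample using exact fractions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population: a fair die, f(y) = 1/6 for each face
faces = [1, 2, 3, 4, 5, 6]
f = Fraction(1, 6)

# Population formulas: mu = sum of y f(y), sigma^2 = sum of (y - mu)^2 f(y)
mu = sum(y * f for y in faces)
sigma2 = sum((y - mu) ** 2 * f for y in faces)

# Every possible independent sample of size N, each with probability f^N
N = 2
p_sample = f ** N
ybars = [Fraction(sum(sample), N) for sample in product(faces, repeat=N)]

e_ybar = sum(yb * p_sample for yb in ybars)
var_ybar = sum((yb - e_ybar) ** 2 * p_sample for yb in ybars)

print(e_ybar == mu)              # True
print(var_ybar == sigma2 / N)    # True
```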

7.5 Estimators Distribution

If we don’t know the population variance of Ybar, we can calculate its sample variance. Therefore,

\[ Var(\bar{Y}) = \frac{\sigma_Y^2}{N} \quad \Rightarrow \quad SampleVar(\bar{Y}) = \frac{S_Y^2}{N} \]

7.5 Estimators Distribution

The STANDARD DEVIATION of a point estimate (such as the sample mean) is often referred to as its STANDARD ERROR:

\[ se(\bar{Y}) = \sqrt{SampleVar(\bar{Y})} = \frac{S_Y}{\sqrt{N}} \]
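As a sketch of how the standard error is used, the Python below computes it for Tom's spending and then forms a t-value of the form (Estimator - E(Estimator)) / sd(Estimator). The hypothesized mean of 100 is a made-up number for illustration only; it does not come from the slides.

```python
import math
from statistics import mean, stdev

tom = [110, 85, 90, 135]


def standard_error(values):
    """Standard error of the sample mean: sample SD divided by sqrt(N)."""
    return stdev(values) / math.sqrt(len(values))


se_tom = standard_error(tom)
print(round(se_tom, 2))  # 11.37

# Hypothetical value for E(Ybar), chosen only to illustrate the t formula
hypothesized_mean = 100
t_value = (mean(tom) - hypothesized_mean) / se_tom
print(round(t_value, 2))  # 0.44
```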

 
