Stat 231 Coursenotes

(1)

STAT 231 (Winter 2012 - 1121)

Honours Statistics

Prof. R. Metzger Prof. R. Metzger University of Waterloo University of Waterloo L LAA_T_TE

EXer: W. KongXer: W. Kong <

<http://stochasticseeker.wordpress.com/http://stochasticseeker.wordpress.com/>> Last Revision: April 3, 2012

Last Revision: April 3, 2012

1 1 PPPPDDAACC 11 1. 1.1 1 PrProboblelemm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 1 1..2 2 PPllaann . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 1 1..3 3 DDaattaa . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 1. 1.4 4 AnAnalalysysisis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 1. 1.5 5 ConConclclususioionn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 2

2 MeaMeasursuremeement nt AnalAnalysiysiss 88 2.1

2.1 MeasMeasureuremenments ts of of SpSpreareadd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 2.2

2.2 MeasMeasureuremenments ts of of AssAssociaociatiotionn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 3

3 ProProbabibabilitlity y TheTheororyy 99 4

4 StaStatististictical al ModelModelss 1010

4.

4.1 1 GeGeneneraralilititieses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1100 4.

4.2 2 TTypypes es of of MoModedelsls. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1100 5

5 EstEstimaimates tes and and EstEstimimatoatorsrs 1111 5.

5.1 1 MoMotivtivatiationon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1111 5.2

5.2 MaximuMaximum Likm Likelihooelihood Esd Estimatiotimation (Mn (MLE) LE) AlgoAlgorithmrithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1122 5.

5.3 3 EsEstitimatmatororss . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1144 5.

(2)

6

6 DisDistritributibution on TheTheororyy 1717 6.1

6.1 StudStudent ent t-Dt-Dististribributiutionon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1188 6.2

6.2 LeasLeast t SquSquarares es MethMethodod . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1199 7

7 InIntetervrvalalss 1919

7.1

7.1 ConConﬁdeﬁdence nce IntIntervervalsals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2200 7.2

7.2 PrePredicdictinting g IntIntervervalsals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2200 7.3

7.3 LikLikelyhelyhood ood IntIntervervalsals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2200 8

8 HypoHypothethesis sis TTestestinging 2020 9

9 ComCompaparatrative ive ModelModelss 2121

10

10 ExperimExperimental Desiental Designgn 2222

11

11 Model AsseModel Assessmentssment 2323

12

12 Chi-SqChi-Squareuared d TTestest 2323

References

References 2525

Appendix A

Appendix A 2626

These notes are currently a work in progress, and as such may be incomplete or contain errors. [Source: These notes are currently a work in progress, and as such may be incomplete or contain errors. [Source: lambertw.com]

(3)

Winter 2012

Winter 2012 LIST OF FIGURES LIST OF FIGURES

List of Figures

1.1

1.1 PoPopulpulatiation on RelRelatioationshnshipsips . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 1.2

(4)

Acknowledgments:

Special thanks to

Special thanks to Michael Baker Michael Baker and his Land his LAA_T_TE

EX X forformatted notesmatted notes. . They werThey were the inspiratie the inspiration for theon for the structure of these notes.

(5)

Winter 2012

Winter 2012 ABSTRACT ABSTRACT

Abstract Abstract

The purpose of these notes is to provide a guide to the second year honours statistics course. The purpose of these notes is to provide a guide to the second year honours statistics course. The contents of this course are designed to satisfy the following objectives:

The contents of this course are designed to satisfy the following objectives:

••

TTo o proprovide studentvide students s with a with a basibasic c undeunderstarstanding of nding of proprobabibabilitylity, , the the role of role of varvariatioiation n in in empiempiricalrical problem solving and statistical concepts (to be able to critically evaluate, understand and interpret problem solving and statistical concepts (to be able to critically evaluate, understand and interpret statistical studies reported in newspapers, internet and scientiﬁc articles).

statistical studies reported in newspapers, internet and scientiﬁc articles).

••

To provide students with the statistical concepts and techniques necessary to carry out an empiricalTo provide students with the statistical concepts and techniques necessary to carry out an empirical study to answer relevant questions in any given area of interest.

study to answer relevant questions in any given area of interest.

The recommended prerequisites are Calculus I, II (one of Math 137,147 and one of Math 138, The recommended prerequisites are Calculus I, II (one of Math 137,147 and one of Math 138, 148), and Probability (Stat 230). Readers should have a basic understanding of single-variable 148), and Probability (Stat 230). Readers should have a basic understanding of single-variable diﬀerential and integral calculus as well as a good understanding of probability theory.

(6)

1 1 P

PP

PD

DA

AC

C

PPDAC

PPDAC is a process or recipe used to solve statistical problems. It stands for:is a process or recipe used to solve statistical problems. It stands for: P

Problem /roblem / PPlan /lan / DData /ata / AAnalysis /nalysis / CConclusiononclusion

1.

1.1 1 Pr

Prob

oble

lem

m

The problem step’s job is to clearly deﬁne the The problem step’s job is to clearly deﬁne the

1. Goal or Aspect of the study 1. Goal or Aspect of the study 2.

2. TTargarget Populatiet Population and on and UnitsUnits 3.

3. Unit’s VUnit’s Variariatesates 4.

4. AttrAttributeibutes and s and PaParameterametersrs Deﬁnition 1.1.

Deﬁnition 1.1. TheThe target populationtarget population is the set of animals, people or things about which you wish tois the set of animals, people or things about which you wish to draw conclusions. A

draw conclusions. A unitunit is a singleton of the target population.is a singleton of the target population. Deﬁnition 1.2.

Definition 1.2. TheThe sample sample populatipopulationon is a is a spespecificified subsed subset of et of the tarthe target popget populaulatiotion. n. AA samplesample is is aa singleton of the sample population and a unit of the study population.

singleton of the sample population and a unit of the study population. Example 1.1.

Example 1.1. If I were interested in the average age of all students taking STAT 231, then:If I were interested in the average age of all students taking STAT 231, then:

Unit = student of STAT 231 Unit = student of STAT 231

Target Population (T.P.)=students of STAT 231 Target Population (T.P.)=students of STAT 231

Deﬁnition 1.3.

Deﬁnition 1.3. AA variatevariate is a characteristic of a single unit in a target population and is usually one of theis a characteristic of a single unit in a target population and is usually one of the following:

following: 1.

1. Response Response varivariatesates - interest in the study- interest in the study 2.

2. Explanatory variateExplanatory variate - why responses vary from unit to unit- why responses vary from unit to unit (a)

(a) KnoKnown - wn - varivariates that are know to cause the ates that are know to cause the resporesponsesnses i.

i. FocFocal - al - knowknown variates that divide the n variates that divide the targtarget population into subsetset population into subsets (b)

(b) UnknoUnknown - wn - varvariates that cannot be explained in the that iates that cannot be explained in the that cause respocause responsesnses Using Ex.

Using Ex. 1.1. 1.1. as a guide, we can think of the respas a guide, we can think of the response varonse variate as the age of of a iate as the age of of a studestudent and the explanant and the explanatorytory variates as factors such as the age of the student, when and where they were born, their familial situations, variates as factors such as the age of the student, when and where they were born, their familial situations, educa

educational backgrtional background and intelligound and intelligence. ence. FoFor an r an examplexample e of a of a focal variatfocal variate, e, we could think of we could think of somethsomethinging along the lines of domestic vs. international students or male vs. female.

along the lines of domestic vs. international students or male vs. female. Deﬁnition 1.4.

Deﬁnition 1.4. AnAn attributeattribute is a characteristic of a population which is usually denoted by a function of is a characteristic of a population which is usually denoted by a function of the response variate. It can have two other names, depending on the population studied:

(7)

Winter 2012

Winter 2012 1 1 PPPPDDAAC C

••

ParameterParameter is used when studying populationsis used when studying populations

••

StatisticStatistic is used when studying samplesis used when studying samples

••

Attribute can be used interchangeably with the aboveAttribute can be used interchangeably with the above Deﬁnition 1.5.

Deﬁnition 1.5. TheThe aspectaspect is the goal of the study and is generally one of the followingis the goal of the study and is generally one of the following 1.

1. DescrDescriptive - describiiptive - describing or determinng or determining the value of ing the value of an attributean attribute 2.

2. CompaComparative - rative - compacomparing the attribute of ring the attribute of twtwo o (or more(or more) groups) groups 3.

3. CausatiCausative - ve - trying to determine whethetrying to determine whether a r a parparticulaticular explanator explanatory variate causes a ry variate causes a resporesponse to nse to changchangee 4.

4. PrediPredictive - ctive - to predict the value of to predict the value of a response variaa response variate using your explanatote using your explanatory variatery variate

1 1..2

2 P

Plla

an

n

The job of the plan step is to accomplish the following: The job of the plan step is to accomplish the following:

1. Deﬁne the

1. Deﬁne the Study ProtocolStudy Protocol - this is the population that is actually studied and is NOT always a subset- this is the population that is actually studied and is NOT always a subset of the T.P.

of the T.P. 2. Deﬁne the

2. Deﬁne the Sampling ProtocolSampling Protocol - the sampling protocol is used to draw a sample from the study- the sampling protocol is used to draw a sample from the study population

population (a)

(a) Some types of the Some types of the samplisampling protocng protocol includeol include i.

i. RandoRandom sampling (ad verbam sampling (ad verbatim)tim) ii.

ii. JudgJudgment sampliment sampling (e.g. ng (e.g. gendgender ratios)er ratios) iii.

iii. VolVolunteer samplunteer sampling (ad ing (ad verbaverbatim)tim) iv.

iv. RepRepresenresentative samplingtative sampling A.

A. a a sample that matches the sample that matches the samplsample e populpopulation (S.Pation (S.P.) in .) in all importanall important t chacharacterracteristics (i.e.istics (i.e. the proportion of a certain important characteristic in the S.P. is the same in the sample) the proportion of a certain important characteristic in the S.P. is the same in the sample) (b)

(b) GenerGenerallyally, , statisstatisticianticians s preprefer fer Random samplingRandom sampling 3. Deﬁne the Sample - ad verbatim

3. Deﬁne the Sample - ad verbatim 4. Deﬁne the

4. Define the measurement systemmeasurement system - this defines the tools/methods used- this defines the tools/methods used A

A visvisualualizaizatiotion n of of the relatithe relationsonship betwhip between the een the poppopulaulationtions s is is belbelowow. . In In the diagrthe diagram, am, T.PT.P. . is is the the tatargergett population and S.P. is the sample population.

population and S.P. is the sample population. Example 1.2.

Example 1.2. Suppose that we wanted to compare the common sense between maths and arts students atSuppose that we wanted to compare the common sense between maths and arts students at the Univers

the University of Waterloo. ity of Waterloo. An experimeAn experiment that nt that we could do is we could do is take 50 arts students and 33 take 50 arts students and 33 maths studenmaths studentsts from the University of Waterloo taking an introductory statistics course this term and have them write a from the University of Waterloo taking an introductory statistics course this term and have them write a statistics test (non-mathematical) and average the results for each group. In this situation we have:

(8)

T.P. T.P. S.P. S.P. Study Error Study Error Sample Sample Sample Error Sample Error Figure 1.1:

Figure 1.1: Population RelationshipsPopulation Relationships

••

T.P. = All arts and maths studentsT.P. = All arts and maths students

••

S.P. = Arts and maths students from the University of Waterloo taking an introductory statisticsS.P. = Arts and maths students from the University of Waterloo taking an introductory statistics course this term

course this term

••

Sample = 50 arts students and 33 maths students from the University of Waterloo taking an intro-Sample = 50 arts students and 33 maths students from the University of Waterloo taking an intro-ductory statistics course this term

ductory statistics course this term

••

Aspect = Comparative studyAspect = Comparative study

••

Attribute(s): Average grade of arts and maths students (for T.P., S.P., and sample)Attribute(s): Average grade of arts and maths students (for T.P., S.P., and sample) –

– Parameter for T.P. and S.P.Parameter for T.P. and S.P. –

– Statistic for SampleStatistic for Sample There are, however some issues: There are, however some issues:

1.

1. Sample unitSample units diﬀer from S.Ps diﬀer from S.P.. 2.

2. S.PS.P. units diﬀer from . units diﬀer from T.PT.P. units. units 3.

3. We want to measure common sense, but the test is We want to measure common sense, but the test is measumeasuring statisring statistical knowtical knowledgeledge 4. Is it fair to put arts students in a maths course?

4. Is it fair to put arts students in a maths course? Example 1.3.

Example 1.3. Suppose that we want to investigate the eﬀect of cigarettes on the incidence of lung cancerSuppose that we want to investigate the eﬀect of cigarettes on the incidence of lung cancer in humans. We can do this by purchasing online mice, randomly selecting 43 mice and letting them smoke in humans. We can do this by purchasing online mice, randomly selecting 43 mice and letting them smoke 20 ciga

20 cigaretrettes a tes a dadayy. . We then condWe then conduct an uct an autautopsopsy y at at the time of the time of deadeath to th to checheck if ck if thethey y havhave e lunlung g cancancercer.. In this situation, we have:

In this situation, we have:

••

T.P. = All people who smokeT.P. = All people who smoke

••

S.P. = Mice bought onlineS.P. = Mice bought online

••

Sample = 43 selected online miceSample = 43 selected online mice

••

Aspect = Not ComparativeAspect = Not Comparative

(9)

Winter 2012

••

Attribute:Attribute: –

– T.P. - proportion of smoking people with lung cancerT.P. - proportion of smoking people with lung cancer –

– S.P. - proportion of mice bought online with lung cancerS.P. - proportion of mice bought online with lung cancer –

– Sample - proportion of sample with lung cancerSample - proportion of sample with lung cancer Like the previous example, there are some issues:

Like the previous example, there are some issues: 1.

1. Mice and humans may react diﬀerenMice and humans may react diﬀerently to tly to cigacigarettesrettes 2. We do not have a baseline (i.e. what does X% mean?) 2. We do not have a baseline (i.e. what does X% mean?) 3.

3. Is have the mice smoke 20 cigaretIs have the mice smoke 20 cigarettes a day realistictes a day realistic?? Deﬁnition 1.6.

Definition 1.6. LetLet aa((x x )) be defined as an attribute as a function of some population or samplebe defined as an attribute as a function of some population or sample x x . . WeWe define the

deﬁne the study errorstudy error asas

aa((T.P.T.P.))

₋

aa((S.P.S.P.))..

Unfortunately, there is no way to directly calculate this error and so its value must be argued. In our previous Unfortunately, there is no way to directly calculate this error and so its value must be argued. In our previous examples, one could say that the study error in Ex. 1.3. is higher than that of Ex. 1.2. due to the S.P. in examples, one could say that the study error in Ex. 1.3. is higher than that of Ex. 1.2. due to the S.P. in 1.3. being drastically diﬀerent than the T.P..

1.3. being drastically diﬀerent than the T.P.. Deﬁnition 1.7.

Definition 1.7. Similar to above, we define theSimilar to above, we define the sample errorsample error asas aa((S.P.S.P.))

₋

aa((sample sample )).. Although we may be able to calculate

Although we may be able to calculate aa((sample sample )),, aa((S.P.S.P.)) is not computable and like the above, this valueis not computable and like the above, this value must be argued.

must be argued. Remark

Remark 1.11.1.. Note that if we use a random sample, we hope it is representative and that the above errorsNote that if we use a random sample, we hope it is representative and that the above errors are minimized.

are minimized.

1 1..3

3 D

Da

atta

a

This step involves the collecting and organizing of data and statistics. This step involves the collecting and organizing of data and statistics. Data Types

Data Types

••

Discrete DataDiscrete Data:: Simply put, there are “holes” between the numbersSimply put, there are “holes” between the numbers

••

ContinuousContinuous (CTS)(CTS) DataData:: WeWe assumeassume that there are no “holes”that there are no “holes”

••

Nominal DataNominal Data:: No order in the dataNo order in the data

••

Ordinal DataOrdinal Data:: There is some order in the dataThere is some order in the data

••

Binary DataBinary Data:: e.g. Success/failure, true/false, yes/noe.g. Success/failure, true/false, yes/no

••

Counting DataCounting Data:: Used for counting the number of eventsUsed for counting the number of events

(10)

1.

1.4 4 An

Anal

alys

ysis

is

This step involves analyzing our data set and making well-informed observations and analyses. This step involves analyzing our data set and making well-informed observations and analyses. Data Quality

Data Quality

There are 3 factors that we look at: There are 3 factors that we look at:

1.

1. OutliersOutliers: Data that is more extreme than their counterparts.: Data that is more extreme than their counterparts. (a)

(a) ReasonReasons for outliers for outliers?s? i.

i. TTypos or data ypos or data recorecording errording errorsrs ii.

ii. MeasuMeasuremenrement errorst errors iii.

iii. ValValid outlieid outliersrs (b)

(b) WithouWithout having been involvet having been involved in the d in the study from the starstudy from the start, it is t, it is diﬃcudiﬃcult to tell lt to tell which is whichwhich is which 2.

2. MissiMissing Data Poinng Data Pointsts 3.

3. MeasurMeasurement Issement Issuesues Characteristic of a Data Set Characteristic of a Data Set

Outliers could be found in any data set but these 3 always are: Outliers could be found in any data set but these 3 always are:

1.

1. ShapeShape (a)

(a) NumerNumerical methodical methods:s: SkewnessSkewness andand KurtosisKurtosis (not covered in this course)(not covered in this course) (b)

(b) EmpirEmpirical methodsical methods:: i.

i. Bell-Bell-shapshaped: ed: SymmetrSymmetrical abouical about a meant a mean ii.

ii. SkeSkewed left (negawed left (negative): tive): densdensest area on the rightest area on the right iii.

iii. SkeSkewed righwed right (positive)t (positive): : densedensest area on the leftst area on the left iv.

iv. UnifoUniform: rm: even all arouneven all around; straighd; straight linet line 2.

2. CenterCenter (location)(location) (a)

(a) The “miThe “middlddle” e” of our datof our dataa i.

i. ModeMode: statistic that asks which value occurs the most frequently: statistic that asks which value occurs the most frequently ii.

ii. MedianMedian ((QQ22): The middle data value): The middle data value A.

A. The deﬁniThe deﬁnition in this course is an algorition in this course is an algorithmthm B.

B. We deWe denotnotee nn data values bydata values by x x 11,, x x 22,...,x ,...,x nn and the sorted data byand the sorted data by x x (1)(1),, x x (2)(2),...,x ,...,x ((nn)) wherewhere

x

x (1)(1)

≤

x x (2)(2)

≤

...

≤

x x ((nn))

If

If nn is odd thenis odd then QQ2 2 == x _x ((nn+1+1 2

2 )) and if and if nn is even, thenis even, then QQ2 2 ==

x x ((nn 2 2))

−

x x ((nn+1+1 2 2 )) 2 2

(11)

Winter 2012

iii.

iii. MeanMean: The sample mean is: The sample mean is ¯x x =¯=

n n



i i =1=1 x x i i n n A.

A. The sample mean moves in the the direThe sample mean moves in the the direction of the outlier if an outlier is addection of the outlier if an outlier is added (mediand (median as well but less of an eﬀect)

as well but less of an eﬀect) (b)

(b) RobustnessRobustness: The median is less aﬀected by outliers and is thus: The median is less aﬀected by outliers and is thus robust.robust. 3.

3. SpreadSpread (variability)(variability) (a)

(a) RangeRange: By deﬁnition, this is: By deﬁnition, this is x x ((nn))

−

x x (1)(1)

(b)

(b) IQRIQR (Interquartile range): The middle half of your data(Interquartile range): The middle half of your data i. We ﬁrst deﬁne

i. We ﬁrst deﬁne posnposn((aa)) as the function which returns the index of the statisticas the function which returns the index of the statistic aa in a datain a data set. The value of

set. The value of QQ11 is deﬁned as the median in the data setis deﬁned as the median in the data set x

x (1)(1),, x x (2)(2),...,x ,...,x ((posnposn((QQ2)2)

₋

1)1)

and

and QQ22 is the median of the data setis the median of the data set x

x ((posnposn((QQ2)+1)2)+1),, x x ((posnposn((QQ2)+2)2)+2),...,x ,...,x ((nn))

ii. The IQR of set is deﬁned to be the diﬀerence between

ii. The IQR of set is deﬁned to be the diﬀerence between QQ33 andand QQ11. That is,. That is, IIQRQR == QQ33

₋

QQ11

iii. A

iii. A box plotbox plot is visual repis visual represresententatiation on of of thithis. s. The whisThe whiskekers of rs of a a bobox x ploplot t reprepresresent valueent valuess that are some value below or above the data set, according to the upper and lower fences, that are some value below or above the data set, according to the upper and lower fences, which are the

which are the upper and lower bounds of upper and lower bounds of the whiskerthe whiskers s resperespectivelyctively. . The lower fence,The lower fence, LLLL, , isis bounded by a value of

bounded by a value of

LL

LL == QQ11

−

11..5(5(IIQRQR))

and the upper fence,

and the upper fence, UULL, a value of , a value of U

ULL== QQ33 + 1+ 1..5(5(IIQRQR))

We say that any value that is less than

(12)

Here, we have a visualization of a box plot, courtesy of Wikipedia: Here, we have a visualization of a box plot, courtesy of Wikipedia:

Figure 1.2:

Figure 1.2: Box Plots (from Wikipedia)Box Plots (from Wikipedia) (c)

(c) Variance Variance : For a sample population, the variance is deﬁned to be: For a sample population, the variance is deﬁned to be

s s 22 == n n



i i =1=1 ((x x i i

−

x x ¯¯))22 n n

₋

11 which is called the

which is called the sample variancesample variance.. (d)

(d) Standard DeviationStandard Deviation: For a sample population, the standard deviation is just the square root of : For a sample population, the standard deviation is just the square root of the variance: the variance: s s ==







n n



i i =1=1 ((x x i i

−

x x ¯¯))22 n n

₋

11 which is called the

which is called the sample sample standastandard rd deviatdeviationion..

1.

1.5 5 Co

Conc

nclu

lusi

sion

on

In the conclusion, there are only two aspects of the study that you need to be concerned about: In the conclusion, there are only two aspects of the study that you need to be concerned about:

1.

1. Did you answDid you answer your prober your problemlem 2.

(13)

Winter 2012

Winter 2012 2 2 MEAMEASURSUREMEEMENT NT ANAANALLYSIYSIS S

2 2 Me

Meas

asure

ureme

ment

nt An

Anal

alys

ysis

is

The goal of measures is to explain how far our data is spread out and the relationship of data points. The goal of measures is to explain how far our data is spread out and the relationship of data points.

2.

2.1 1 Me

Meas

asure

ureme

ment

nts

s of

of Sp

Sprea

read

d

The goal of the standard deviation is to approximate the average distance a point is from the mean. Here The goal of the standard deviation is to approximate the average distance a point is from the mean. Here are some other methods that we could use for standard deviation and why they fail:

are some other methods that we could use for standard deviation and why they fail:

1. 1. n n



i i =1=1 ((x x i i

−

x x ¯¯)) n

n does not work because it is always equal to 0does not work because it is always equal to 0

2. 2. n n



i i =1=1

||

x x _i i

₋

¯¯x x

_||

n

n works but we cannot do anything mathematically signiﬁcant to itworks but we cannot do anything mathematically signiﬁcant to it

3. 3.



nn



i i =1=1 ((x x i i

−

x x ¯¯))22 n

n is a good guess, but note thatis a good guess, but note that n n



i i =1=1 ((x x i i

−

x x ))¯¯ 22

≤

n n



i i =1=1

||

x x i i

−

x x ¯¯

||

(a)

(a) TTo ﬁx o ﬁx thithis, we uses, we use nn

₋

11 instead of instead of nn (proof comes later on)(proof comes later on) Proposition 2.1.

Proposition 2.1. An interesting identity is the following:An interesting identity is the following:

s s ==







n n



i i =1=1 ((x x i i

−

x x ¯¯))22 n n

₋

11 ==







n n



i i =1=1 ((x x 22 i i

−

nnx x ¯¯22)) n n

₋

11 Proof.

Proof. Exercise for the reader.Exercise for the reader.

Deﬁnition 2.1. Coeﬃcient of Variation

Deﬁnition 2.1. Coeﬃcient of Variation (CV)(CV)

This measure provides a unit-less measurement of spread: This measure provides a unit-less measurement of spread:

CV CV == s s ¯ ¯ x x

×

100%100%

2.2 2.2 Meas

Measurem

urement

ents

s of

of Ass

Associat

ociation

ion

Here, we examine a few interesting measures which test the relationship between two random variables Here, we examine a few interesting measures which test the relationship between two random variables X X any

any Y Y .. 1.

1. CovarianceCovariance: In theory (a population), the covariance is deﬁned as: In theory (a population), the covariance is deﬁned as Cov

(14)

but in practice (in samples) it is deﬁned as but in practice (in samples) it is deﬁned as

s s XY XY == n n



i i =1=1

((x x i i

−

x x )(¯¯)( y y i i

−

y y ))¯¯

n

₋

11 .. Note that Cov

Note that Cov((X,X, Y Y )),, s s XY XY

∈∈

RR and both give us an idea of the direction of the relationship but notand both give us an idea of the direction of the relationship but not

the

the magnimagnitude.tude. 2.

2. CorrelationCorrelation:: In theory (a population), the correlation is deﬁned asIn theory (a population), the correlation is deﬁned as ρ

ρXY XY ==

Cov

Cov((X,X, Y Y )) σ

σX X σσY Y

but in practice (in samples) it is deﬁned as but in practice (in samples) it is deﬁned as

r r XY XY == s s XY XY s s X X s s Y Y .. Note that

Note that

₋

11

_≤

ρρXY XY ,, r r XY XY

≤

11 and both give us an idea of the direction of the relationship AND theand both give us an idea of the direction of the relationship AND the

magnitude. magnitude.

(a)

(a) An interpAn interpretatiretation of on of the values is as the values is as follofollows:ws: i.

i.

_||

r r XY XY

|| ≈

≈

1 1 ==

⇒

strong relationshipstrong relationship

ii.

_||

r r XY XY

||

= = 1 1 ==

⇒

perfectly linear relationshipperfectly linear relationship

iii.

_||

r r XY XY

||

>> 1 1 ==

⇒

positive relationshippositive relationship

iv.

_||

r r XY XY

||

<< 1 1 ==

⇒

negative relationshipnegative relationship

v.

_||

r r XY XY

|| ≈

≈

0 0 ==

⇒

weak relationshipweak relationship

3.

3. Relative-riskRelative-risk: From: From STAT STAT 230230, this the probability of something happening under a condition relative, this the probability of something happening under a condition relative to

to thithis s samsame e thithing ng haphappenpening if ing if the condithe conditiotion n is note is note met. met. FoFormarmallylly, , fofor r twtwo o eveneventsts AA andand BB, it is, it is deﬁned as deﬁned as RR RR == P P ((AA

||

BB)) P P ((AA

_||

BB))¯¯ .. An interesting property is that if

An interesting property is that if RRRR = = 11 thenthen AA

_⊥

BB and vice versa.and vice versa. 4.

4. SlopeSlope: This will be covered later on.: This will be covered later on.

3 3 Pro

Proba

babi

bili

litty

y Th

Theo

eory

ry

All of this content was covered in

All of this content was covered in STAT STAT 230230 so I so I wilwill l not be typesnot be typesettietting it. ng it. CheCheck out ck out any proany probabbabiliilityty text

textboobook k jusjust t to to revreview the iew the conconcepcepts ts and proand properperties of ties of expexpectaectation and tion and vavariariancence. . The only The only impimporortantantt change was a notational one, speciﬁcally that instead of writing

change was a notational one, speciﬁcally that instead of writing X X

_∼

BiBi nn((n,n, p p )), we write, we write X X

_∼

BiBi nn((n,n,Π)Π) where

(15)

Winter 2012

Winter 2012 4 ST4 STAATISTISTICTICAL AL MODMODELS ELS

4 4 St

Stat

atis

isti

tica

cal l Mod

Model

elss

Recall that the goal of statistics is to guess the value of a population parameter on the basis of a (or more) Recall that the goal of statistics is to guess the value of a population parameter on the basis of a (or more) sample statistic.

sample statistic.

4.

4.1 1 Ge

Gene

nera

rali

liti

ties

es

We make our measurements on our sample units. The data values that are collected are: We make our measurements on our sample units. The data values that are collected are:

••

The response variateThe response variate

••

The explanatory variate(s)The explanatory variate(s)

The response variate is a characteristic of the unit that helps us answer the problem. It will be denoted by The response variate is a characteristic of the unit that helps us answer the problem. It will be denoted by Y

Y and will be assumed to be random with a random componentand will be assumed to be random with a random component .. Every model is relating the population parameter (

Every model is relating the population parameter (µ,σ,π,ρ,...µ,σ,π,ρ,...) ) to the to the samsample valuple values (unites (units). s). We will useWe will use at least one subscript representing the value of unit

at least one subscript representing the value of unit i i . Note that a realization,. Note that a realization, y y i i , is the response that is a, is the response that is a

achieved by a response variate achieved by a response variate Y Y i i ..

Example 4.1.

Example 4.1. Here is a situation involving coin ﬂips:Here is a situation involving coin ﬂips:

Y

Y i i == ﬂipping a coin that has yet to landﬂipping a coin that has yet to land

y

y i i == coin lands and realizes its potential (H/T)coin lands and realizes its potential (H/T)

In every

In every modmodel el we assumwe assume e thathat t samsamplipling was ng was dondone e ranrandomdomlyly. . ThiThis s allallowows s us to us to assassume thatume that i i

⊥

 j j forfor

i i

_

== j j ..

4.

4.2 2 T

Type

ypes

s of

of Mod

Model

elss

Goal of

Goal of statistical modelsstatistical models: explain the relationship between a parameter and a response variate.: explain the relationship between a parameter and a response variate. The following are the diﬀerent types of statistical models that we will be examining :

The following are the diﬀerent types of statistical models that we will be examining : 1.

1. DiscreteDiscrete (Binary)(Binary) Model -Model - either the population data is within parameters or it is not.either the population data is within parameters or it is not. (a)

(a) BinomiBinomial:al: Y Y i i == i i ,, i i

∼

BiBi nn(1(1,, Π)Π)

(b)

(b) PoisPoisson:son: Y Y i i == i i ,, i i

∼

P o i s P o i s ((µµ))

2.

2. Response Model -Response Model - these model the response andthese model the response and at most at most use the explanatory variate implicitly as ause the explanatory variate implicitly as a focal explanatory variate.

focal explanatory variate. (a)

(a) Y Y i i == µµ ++ i i , , i i

∼

N N (0(0,, σσ22))

(b)

(16)

(c)

(c) Y Y ii j j == µµ++ τ τ i i + + ii j j , , ii j j

∼

N N (0(0,, σσ ))

(d)

(d) Y Y i i == µµ++ δδ ++ i i

Where

Where Y Y j j ,, Y Y ii j j are the responses of unitare the responses of unit j j [in [in grougroupp i i ],], µµ the overall average,the overall average, µµi i the averagethe average

in the

in the i i thth _group,_{group, τ}_τ

i i the diﬀerence between the overall average and the average in thethe diﬀerence between the overall average and the average in the i i thth group,group,

and

and δδ equal to some bias.equal to some bias. 3.

3. Regression Model -Regression Model - these create a function that relates the response and the explanatory variatethese create a function that relates the response and the explanatory variate (attribute or parameter); note here that we assume

(attribute or parameter); note here that we assume Y Y i i == Y Y i i

||

X X ..

(a)

(a) Y Y i i == αα ++ βx βx i i ++ i i , , i i

∼

N N (0(0,, σσ22))

(b)

(b) Y Y i i == αα ++ ββ((x x i i

−

x ) +x ¯¯) + i i , , i i

∼

N N (0(0,, σσ22))

(c)

(c) Y Y i i == βx βx i i ++ i i , , i i

∼

N N (0(0,, σσ22))

(d)

(d) Y Y ii j j == ααi i ++ βx βx ii j j ++ ii j j , , ii j j

∼

N N (0(0,, σσ22))

(e)

(e) Y Y i i == ββ00 ++ ββ11x x 11i i ++ ββ22x x 22i i ++ i i , , i i

∼

N N (0(0,, σσ22))

Where

Where Y Y j j ,, Y Y ij ij are the response of unitare the response of unit j j [in group[in group i i ],], α,α, ββ00 are the intercepts, βare the intercepts, β is the slope,is the slope,

x

x j j ,, x x ij ij are the explanatory variates of unitare the explanatory variates of unit j j [in group[in group i i ], and], and ββ11,, β β22 the slopes for two explanatorythe slopes for two explanatory

variates (indicating that

variates (indicating that Y Y i i inin ((e e )) is a is a funfunctiction of on of twtwo o expexplanlanatoatory variry variatesates). ). NotNote e thathat t in thein the

above models, we assume that we can control our explanatory variate and so we treat the above models, we assume that we can control our explanatory variate and so we treat the x x _i i



_s_s

constant. constant. Theorem 4.1.

Theorem 4.1. Linear Combinations of Normal Random Variables Linear Combinations of Normal Random Variables Let

Let X X i i

∼

N N ((µµi i ,, σσ22i i )),, X X j j

⊥

X X i i for all for all i i = j j ,, k =



k i i

∈∈

RR. Then,. Then,

T T == k k 00 ++ n n



i i =1=1 k k i i X X i i ==

⇒

T T

∼

N N



k k 00 ++ n n



i i =1=1 k k i i µµi i ,, n n



i i =1=1 k k _i i22σσ_i i22



Proof.

Proof. Do as an exercise.Do as an exercise.

5 5 Es

Esti

tima

mate

tes

s an

and

d Es

Esti

tima

mato

tors

rs

In this section, we continue to develop the relationship between our population and sample. In this section, we continue to develop the relationship between our population and sample.

5.

5.1 1 Mo

Moti

tiva

vati

tion

on

Suppose that I ﬂip a coin 5 times, with the number of heads

Suppose that I ﬂip a coin 5 times, with the number of heads Y Y

_∼

BinBin(5(5,, Π Π == ₂1₂1)). . NoNote tte thahatt ΠΠ is given.is given. What’s the value of

What’s the value of y y that maximizesthat maximizes f f (( y y ))? ? UnfortunatelyUnfortunately, since, since y y is discrete, the best that we can do isis discrete, the best that we can do is draw a histogram and look for the maximum. This is a very boring problem.

draw a histogram and look for the maximum. This is a very boring problem. The

The maximmaximum um liklikelyhood elyhood testtest, , hohowewevever, r, asasks the ks the ququesestiotion n in revein reversrse. e. ThThat at isis, , wwe e ﬁnﬁnd d ththe e opoptitimamall parameter for

parameter for ΠΠ , given, given y y , such that, such that f f (( y y )) is at its maximum. From the formula of is at its maximum. From the formula of f f , given by, given by f f (( y y ) ) ==



55 y y



ΠΠ y y ₍₁₍₁

−

Π)Π)55

−

y y

(17)

Winter 2012

Winter 2012 5 5 ESTIMESTIMAATES TES AND AND ESTIMAESTIMATORS TORS

we can see that if we consider

we can see that if we consider f f as a functionas a function ΠΠ, , it makeit makes it s it a a concontintinuouuous functs functionion. . TTo o comcomputpute thee the maximum, it is just a matter of using logarithmic diﬀerentiation,

maximum, it is just a matter of using logarithmic diﬀerentiation, ln

ln f f (( y y ) = ln) = ln



55 y

y



++ y y ln Π + lnΠ + (5(5

−

y y )) lnln(1(1

−

Π)Π) ﬁnding the partials,

ﬁnding the partials,

∂ ∂ lnln f f (( y y )) ∂ ∂ ΠΠ == y y Π Π

−

5 5

₋

y y 1 1

₋

ΠΠ and setting them to zero to ﬁnd the maximum

and setting them to zero to ﬁnd the maximum ∂ ∂ lnln f f (( y y )) ∂ ∂ ΠΠ = = 0 0 ==

⇒

0 0 == y y Π Π

−

5 5

₋

y y 1 1

₋

ΠΠ = =

_⇒

y y = = 5Π5Π = =

_⇒

Π Π == 55 y y and so if we take

and so if we take Π Π == y y ₅₅, then, then f f (( y y )) is is maximizmaximized.ed.

5.2 5.2 Max

Maximu

imum Lik

m Likeli

elihood Est

hood Estima

imatio

tion (ML

n (MLE) Al

E) Algo

gorith

rithm

m

Here, we will formally describe the

Here, we will formally describe the maximum likelihood estimationmaximum likelihood estimation (MLE) algorithm whose main goal is(MLE) algorithm whose main goal is to know what estimate for a study population parameter

to know what estimate for a study population parameter θθ should be used, given a set of data points suchshould be used, given a set of data points such that probability of the data set being chosen is at its maximum.

that probability of the data set being chosen is at its maximum.

Suppose that we posit that a certain population follows a given statistical model with unknown parameters Suppose that we posit that a certain population follows a given statistical model with unknown parameters θθ11,, θθ22,...,θ,...,θmm. We then draw n. We then draw n data points,data points, y y 11,, y y 22,...,y ,...,y nn randomly (this is important) and using the data, werandomly (this is important) and using the data, we

try to estimate the unknown parameters. The algorithm goes as follows: try to estimate the unknown parameters. The algorithm goes as follows:

1. Deﬁne

1. Deﬁne LL == f f (( y y 11,, y y 22,...,y ,...,y nn) ) == n n



i i =1=1 f

f (( y y i i )) where we callwhere we call LL aa liklikelihood elihood functifunctionon. . SimSimpliplify if fy if pospossibsible.le.

Note that

Note that f f (( y y 11,, y y 22,...,y ,...,y nn) ) == n n



i i =1=1 f

f (( y y i i )) because we are assuming random sampling, implying thatbecause we are assuming random sampling, implying that y y i i

⊥

y y j j ,,

∀∀

i i

_

== j j .. 2. Deﬁne

2. Deﬁne l l = ln(= ln(LL)). Simplify. Simplify l l using logarithmic laws.using logarithmic laws. 3. Find 3. Find ∂l ∂l ∂θ ∂θ11,, ∂l ∂l ∂θ ∂θ22,, ...,., ∂l ∂l ∂θ

∂θnn, set each of the partials to zero, and solve for each, set each of the partials to zero, and solve for each θθi i ,, i i = = 11,...,n,...,n. . ThThe sole solvedved

θθ



_i i_s_{s are called the}_{are called the estimates}_{estimates of}_{of f}_{f and we add a hat,}_{and we add a hat, ˆ}_θθˆ_i i_{, to indicate this.}_{, to indicate this.}

To illustrate the algorithm, we give two examples. To illustrate the algorithm, we give two examples. Example 5.1.

Example 5.1. SupposeSuppose Y Y i i == i i , , i i

∼

ExpExp((θθ)) andand θθ is a rate. What is the optimalis a rate. What is the optimal ˆθθ according to MLE?ˆaccording to MLE?

Solution. Solution. 1. 1. LL == n n



i i =1=1

θθexp(exp(

₋

y y i i θθ) ) == θθnnexp(exp(

−

n n



i i =1=1 y y i i θθ))

(18)

2. 2. l l = ln(= ln(LL) ) == nnlnln θθ

₋



i i =1=1 y y i i θθ 3. 3. _∂θ_∂θ∂l ∂l == nn_θ_θ

₋

n n



i i =1=1 y

y i i == nn_θ_θ

−

nn ¯ y ,, y ¯ _∂θ_∂θ∂l ∂l = = 0 0 ==

⇒

nn_θ_θ_ˆ_ˆ

−

nn ¯ y y = ¯= 0 0 ==

⇒

θθ =ˆˆ=_y_y11_¯_¯..

So the maximum

So the maximum l l , and consequently maximum, and consequently maximum LL is obtained, whenis obtained, when θθ == _y_y11_¯_¯.. Example 5.2.

Example 5.2. SupposeSuppose Y Y i i == αα ++ βx βx i i ++ i i , , i i

∼

N N (0(0,, σσ22)). Use MLE to estimate. Use MLE to estimate αα andand ββ..

Solution. Solution.

First, take note that

First, take note that Y Y i i

∼

N N ((αα++ βx βx i i ,, σσ22))

1.

1. We ﬁrst simplifyWe ﬁrst simplify LL as follows.as follows.

L L == n n



i i =1=1 1 1

√

₂_2πσ_πσ exp(exp(

₋

(( y y i i

−

αα

−

βx βx i i ))22//22σσ22))

= = 11 (2 (2ππ))nn22

··

1 1 σ σnn exp(exp( n n



i i =1=1

−

(( y y i i

−

αα

−

βx βx i i ))22//22σσ22))

= = K K

_··

11 σ σnn exp(exp( n n



i i =1=1

−

(( y y i i

−

αα

−

βx βx i i ))22//22σσ22))

where

where K K == 11

(2 (2ππ))nn22..

2.

2. By direct evaBy direct evaluatioluation,n,

l l = = lnln K K

₋

nn lnln σσ

₋

n n



i i =1=1

(( y y i i

−

αα

−

βx βx i i ))22//22σσ22

3.

3. ComputComputing the partiing the partials, we getals, we get

∂l ∂l ∂α ∂α == n n



i i =1=1

(( y y i i

−

αα

−

βx βx i i ))/σ/σ22

and and ∂l ∂l ∂α ∂α == n n



i i =1=1

((x x i i )()( y y i i

−

αα

−

βx βx i i ))/σ/σ22..

So ﬁrst solving for

So ﬁrst solving for αα we getwe get

∂l ∂l ∂α ∂α = 0 = 0 ==

⇒

n n



i i =1=1

(( y y i i

−

ααˆˆ

−

βx βx ˆˆ i i ) ) = = 00

=

_⇒

nn y y ¯¯

₋

nnααˆˆ

₋

nn ˆ β βˆx x ¯¯= = 00 =

(19)

Winter 2012

and extending to

and extending to ββ we getwe get ∂l ∂l ∂β ∂β = = 0 0 ==

⇒

n n



i i =1=1

((x x i i )()( y y i i

−

ααˆˆ

−

βx βx ˆˆ i i ) ) = = 00

= =

_⇒

n n



i i =1=1 x

x i i y y i i

−

nn¯x ˆx ¯ααˆ

−

β βˆˆ n n



i i =1=1 x x _i i22 ((5.15.1)) = =

_⇒

n n



i i =1=1 x

x i i y y i i

−

nn¯x x ((¯¯ y y ¯

−

β β ¯ˆˆx x ))¯

−

β βˆˆ n n



i i =1=1 x x _i i22 = = 00 = =

_⇒

n n



i i =1=1 x

x i i y y i i

−

nn¯x ¯x ¯ y y ¯

−

β βˆˆ



nn



i i =1=1 x x _i i22

₋

nn¯x x ¯22



= = 00 = =

_⇒

n n



i i =1=1 x

x i i y y i i

−

nn¯x x ¯¯ y y ¯

n n

₋

11 == ˆ ˆ β β



n n



i i =1=1 x x 22 i i

−

nn¯x x ¯22



n n

₋

11 = =

_⇒

βs βs ˆˆ _x_x22 == s s xy xy = =

_⇒

β β =ˆˆ= s s xx y y s s 22 x x

Before we head oﬀ into the next section, we should recall a very important theorem from

Before we head oﬀ into the next section, we should recall a very important theorem from STAT STAT 230230.. Theorem 5.1.

Theorem 5.1. (Central Limit Theorem)(Central Limit Theorem) Let

Let X X 11,, X X 22,...,X ,...,X nn be i.i.d.be i.i.d.11 random variables with distributionrandom variables with distribution F F ,, E E ((X X i i ) ) == µµ and and V ar V ar ((X X i i ) ) == σσ22,,

∀∀

i i , and , and

X

X i i

⊥

X X j j ,,

∀∀

i i



== j j . Then as . Then as nn

→

→ ∞

∞

,, ¯X X ¯

∼

N N ((µ,µ, σσ

2 2 n n )),, n n



i i =1=1 X X i i

∼

N N ((nµ,σnµ,σ22nn)).. Proof.

Proof. Beyond the scope of this course.Beyond the scope of this course.

5.

5.3 3 Es

Esti

tima

mato

tors

rs

One of the pitfalls that you may notice with the MLE Algorithm is that the estimate is relative to the sample One of the pitfalls that you may notice with the MLE Algorithm is that the estimate is relative to the sample dat

data a thathat t we collewe collect ct frofrom m the study populthe study populatioation. n. FoFor r examexampleple, , supsuppospose e thathat t we have we have twtwo o samsampleples s dradrawnwn from our population,

from our population, SS11 ==

{{

y y i i 11

}}

and Sand S22 ==

{{

y y i i 22

}}

, of the same size, taken randomly with replacement and, of the same size, taken randomly with replacement and

suppose that we model the data in our population with the model

suppose that we model the data in our population with the model Y Y i i == αα++ βx βx i i ++ i i ,, i i

∼

N N (0(0,, σσ22)). It could. It could

be the case that when we apply MLE to both of them to get the following two models,

be the case that when we apply MLE to both of them to get the following two models, y y i i 11 = = ˆααˆ11++ ˆ β βˆ11x x i i ++ i i

and

and y y i i 22 = = ˆααˆ22++ ˆ β βˆ22x x i i ++ i i , that the estimates may diﬀer. That is, it is possible that, that the estimates may diﬀer. That is, it is possible that ˆααˆ11



= = ˆααˆ22 and/orand/or ˆ β βˆ11



== ˆ β βˆ22..

This is because we are doing random sampling, which motivates us to believe that if we take

This is because we are doing random sampling, which motivates us to believe that if we take nn randomrandom samples,

samples, SS j j ==

{{

y y ij ij

}}

, from the study population of equal size, then the estimates,, from the study population of equal size, then the estimates, θθ j j , for a parameter, for a parameter θθ of of

the samples should follow a distribution. the samples should follow a distribution. We

We calcall l the the ranrandom dom vavariariable ble reprepresresententing ing disdistritributbution ion anan estimatorestimator and denote it as the parameter inand denote it as the parameter in question with a tilde on the top,

question with a tilde on the top, ˜θθ. Note that˜. Note that ˜θθ describes the distribution of ˜describes the distribution of θθ for the population and notfor the population and not the sampl

the sample. e. In other worIn other words, the ds, the relrelatioationshnship is ip is thatthat ˆθθ is a speciﬁc realization of ˆ is a speciﬁc realization of ˜θθ through the sampled˜ through the sampled data points. Now the obvious question to ask here is what exactly the distribution of an estimator is, given data points. Now the obvious question to ask here is what exactly the distribution of an estimator is, given

1

(20)

the mode

the model l that we are usinthat we are using g to model the to model the poppopulaulatiotion. n. ThiThis s is actuais actually depelly dependendent on nt on whawhat t we get as we get as thethe estimate, which we will see in the following examples.

estimate, which we will see in the following examples. Example 5.3.

Example 5.3. Given a modelGiven a model Y Y i i == µµ ++ i i ,, i i

∼

N N (0(0,, σσ22)), if one were to compute the best estimate for, if one were to compute the best estimate for

the parameter

the parameter µµ using MLE, one would obtainusing MLE, one would obtain ˆµµ = ˆ = ¯ y y . ¯. BecBecausause e the estithe estimatomator that we r that we wawant to nt to examexamineine describes the population and not the sample, and because we are doing random sampling, we capitalize the describes the population and not the sample, and because we are doing random sampling, we capitalize the y

y to to shoshow w thithis s (w(we e movmove e frofrom m the samplthe sample, e, up a up a levlevel el of of absabstratractioction, into n, into the study poputhe study populatlationion). ). ThaThat t is,is, ˜

˜ µ

µ == ¯Y Y . ¯. So what is the distrSo what is the distribuibutiotion of n of ˜µ in this caseµ˜ in this case? ? RecRecall from Thm 4.1. all from Thm 4.1. thathat t a a linlineaear combinr combinatiation of on of normally distributed random variables is normal. Since

normally distributed random variables is normal. Since ¯ ¯ Y Y == n n



i i =1=1 1 1 n nY Y i i ˜

˜ α

α is normal withis normal with E E ((¯Y Y ) ¯) == µµ andand V ar V ar (( ¯Y Y ) ¯) == σσ_n_n22. . SoSo ˜αα˜

_∼

N ((µ,N µ, σσ_n_n22)).. Example 5.4.

Example 5.4. Let’Let’s s try a try a slislightghtly ly momore difficure difficult lt examexampleple. . SupSuppospose e we want to we want to find the find the disdistritributbution of ion of αα˜˜ and

and ˜ β for the model β˜ for the model Y Y i i == αα ++ ββ((x x i i

−

x x ) +¯¯) + i i , , i i

∼

N N (0(0,, σσ22)). . ThrouThrough similagh similar methods shr methods shown in Ex. own in Ex. 5.2.,5.2.,

using MLE, one should be able to obtain the estimates

using MLE, one should be able to obtain the estimates ˆαα = ˆ = ¯ y y and¯ and ˆ β β =ˆ= s s xy xy

s s 22

x

x . Now, from Ex. 5.3., we already. Now, from Ex. 5.3., we already

computed the distribution for

computed the distribution for αα as˜˜ as αα˜˜

_∼

N N ((µ,µ, σσ_n_n22)), so what we are more interested in is, so what we are more interested in is ˜ β β. Through a simple˜. Through a simple re-arrangement of the deﬁnition of

re-arrangement of the deﬁnition of ˜ β β,,˜

˜ ˜ B B == n n



i i =1=1

((x x i i

−

x x )(¯¯)(Y Y i i

−

Y Y ¯¯)) n n



i i =1=1 ((x x i i

−

x x ))¯¯ 22 = = n n



i i =1=1

((x x i i

−

x x ¯¯))Y Y i i n n



i i =1=1 ((x x i i

−

x x ¯¯))22 = = n n



i i =1=1 ((x x i i

−

x x ))¯¯ ((x x i i

−

x ))x ¯¯ 22 Y Y i i since since ¯Y Y ¯

n n



i i =1=1

((x x i i

−

x x ¯¯) ) = = 00. . SoSo ˜ β β is actually a linear combination of normal random variables, making it normal˜ is actually a linear combination of normal random variables, making it normal

as well. Computing the expectation, we get as well. Computing the expectation, we get

E E (( ˜BB) ˜) == n n



i i =1=1 ((x x i i

−

x x ¯¯)) ((x x i i

−

x x ¯¯))22 E E ((Y Y i i ))

= = n n



i i =1=1



11 ((x x i i

−

x x ))¯¯ 22



_((x_x i i

−

x x )(¯¯)(αα ++ ββ((x x i i

−

x x ))¯¯)) = = 11 ((nn

₋

1)1)s s 22 x x







α α =0 =0



 

nn

 





i i =1=1 ((x x i i

−

x x )) +¯¯ + β β = =s s _x_x22((nn

₋

1)1)



 

nn

 





i i =1=1 ((x x i i

−

x x ))¯¯ 22







= = ββ



s s 2 2 x x s s 22 x x



= = β.β. We can

We can simsimilailarly compurly compute te the varithe variancance e usiusing ng thithis s methmethod, althouod, although this gh this wawas s not done not done in in leclecturtures. es. I I wilwilll leave it to the reader as an exercise (I posit that variance should be 0). Thus,

(21)

Winter 2012

5.

5.4 4 Bi

Bias

ases

es in

in St

Stati

atist

stic

icss

In this section we take a look at one of the problems of using estimators. In this section we take a look at one of the problems of using estimators. Deﬁnition 5.1.

Deﬁnition 5.1. We say that for a given estimator,We say that for a given estimator, ˜θθ, of an estimate for a model is˜, of an estimate for a model is unbiasedunbiased if the followingif the following holds

holds

E

E ((˜θθ) ˜) == θ.θ. Otherwise, we say that our estimator is

Otherwise, we say that our estimator is biasedbiased..

One way to intuitively look at this deﬁnition is that we say an estimator is unbiased if we take

One way to intuitively look at this deﬁnition is that we say an estimator is unbiased if we take nn samplsamples es of of equal size with estimators

equal size with estimators ˜θθ˜i i for each sample, and on averagefor each sample, and on average

n n



i i =1=1 ˜ ˜ θ θi i n

n

≈

θθ for large enoughfor large enough nn, or as, or as nn

→ ∞

,,

n n



i i =1=1 ˜ ˜ θ θi i n

n

→

θθ. Since all the the examples that we’ve seen were so far unbiased, let’s take a look at an unbiased. Since all the the examples that we’ve seen were so far unbiased, let’s take a look at an unbiased

one. one.

Example 5.5.

Example 5.5. Consider the modelConsider the model Y Y i i == µµ ++ i i , , i i

∼

N N ((µ,µ, σσ22)). . If one werIf one were to e to compcompute the besute the best estimat estimatete

for

for σσ22 _{using MLE, the result would be}_{using MLE, the result would be}

ˆ ˆ σ σ22 == n n



i i =1=1

(( y y i i

−

y y ¯¯))22

n n ==

⇒

σσ˜˜ 2 2 ₌₌ n n



i i =1=1

((Y Y i i

−

Y Y ¯¯))22

n n ==



nn



i i =1=1 Y

Y _i i22



₋

nn ¯Y Y ¯22 n

n ..

Fr

From om herhere, e, let’let’s s try to try to compcompute the ute the expexpectectatioation. n. HoHoweweverver, , befbeforore e thathat, t, it it loolooks ks thathat t ﬁndﬁndinging E E (( ¯Y Y ¯22_{)) and}_and

E

E ((Y Y 22_{))would be helpful in this situation. We can ﬁnd it directly through the variance of}_{would be helpful in this situation. We can ﬁnd it directly through the variance of ¯}_Y_{Y and}_¯ _{and Y}_Y ..

V ar

V ar ((¯Y Y ¯) ) == E E ((¯Y Y ¯22))

₋



E E ((¯Y Y ¯))



22 ==

_⇒

σσ

2 2

n

n == E E ((¯Y Y ¯

2 2₎₎

−

µµ22 =

=

_⇒

E E ((¯Y Y ¯22) ) == σσ

2 2 n n ++ µµ 2 2 and similarly and similarly E

Stat 231 Coursenotes

STAT 231 (Winter 2012 - 1121)

STAT 231 (Winter 2012 - 1121)

Honours Statistics

Honours Statistics

Table of Contents

Table of Contents

List of Figures

List of Figures

Acknowledgments:

Acknowledgments:

••

••

1

1 P

PP

PD

DA

AC

C

1.

1.1

1 Pr

Prob

oble

lem

m

••

••

••

1

1..2

2 P

Plla

an

n

••

••

••

••

••

••

••

••

••

••

−

−

−

−

1

1..3

3 D

Da

atta

a

••

••

••

••

••

••

1.

1.4

4 An

Anal

alys

ysis

is

≤

≤

≤

≤

≤

≤

−

−





−

₋

₋

₋

₋

₋

₋

₋

₋

₋

₋

₋

₋

₋

₋