STAT 231 (Winter 2012 - 1121)
STAT 231 (Winter 2012 - 1121)
Honours Statistics
Honours Statistics
Prof. R. Metzger Prof. R. Metzger University of Waterloo University of Waterloo L LAATTEEXer: W. KongXer: W. Kong <
<http://stochasticseeker.wordpress.com/http://stochasticseeker.wordpress.com/>> Last Revision: April 3, 2012
Last Revision: April 3, 2012
Table of Contents
Table of Contents
1 1 PPPPDDAACC 11 1. 1.1 1 PrProboblelemm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 1 1..2 2 PPllaann . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 1 1..3 3 DDaattaa . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 1. 1.4 4 AnAnalalysysisis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 1. 1.5 5 ConConclclususioionn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 22 MeaMeasursuremeement nt AnalAnalysiysiss 88 2.1
2.1 MeasMeasureuremenments ts of of SpSpreareadd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 2.2
2.2 MeasMeasureuremenments ts of of AssAssociaociatiotionn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 3
3 ProProbabibabilitlity y TheTheororyy 99 4
4 StaStatististictical al ModelModelss 1010
4.
4.1 1 GeGeneneraralilititieses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1100 4.
4.2 2 TTypypes es of of MoModedelsls. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1100 5
5 EstEstimaimates tes and and EstEstimimatoatorsrs 1111 5.
5.1 1 MoMotivtivatiationon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1111 5.2
5.2 MaximuMaximum Likm Likelihooelihood Esd Estimatiotimation (Mn (MLE) LE) AlgoAlgorithmrithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1122 5.
5.3 3 EsEstitimatmatororss . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1144 5.
6
6 DisDistritributibution on TheTheororyy 1717 6.1
6.1 StudStudent ent t-Dt-Dististribributiutionon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1188 6.2
6.2 LeasLeast t SquSquarares es MethMethodod . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1199 7
7 InIntetervrvalalss 1919
7.1
7.1 ConConfidefidence nce IntIntervervalsals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2200 7.2
7.2 PrePredicdictinting g IntIntervervalsals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2200 7.3
7.3 LikLikelyhelyhood ood IntIntervervalsals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2200 8
8 HypoHypothethesis sis TTestestinging 2020 9
9 ComCompaparatrative ive ModelModelss 2121
10
10 ExperimExperimental Desiental Designgn 2222
11
11 Model AsseModel Assessmentssment 2323
12
12 Chi-SqChi-Squareuared d TTestest 2323
References
References 2525
Appendix A
Appendix A 2626
These notes are currently a work in progress, and as such may be incomplete or contain errors. [Source: These notes are currently a work in progress, and as such may be incomplete or contain errors. [Source: lambertw.com]
Winter 2012
Winter 2012 LIST OF FIGURES LIST OF FIGURES
List of Figures
List of Figures
1.1
1.1 PoPopulpulatiation on RelRelatioationshnshipsips . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 1.2
Acknowledgments:
Acknowledgments:
Special thanks to
Special thanks to Michael Baker Michael Baker and his Land his LAATTE
EX X forformatted notesmatted notes. . They werThey were the inspiratie the inspiration for theon for the structure of these notes.
Winter 2012
Winter 2012 ABSTRACT ABSTRACT
Abstract Abstract
The purpose of these notes is to provide a guide to the second year honours statistics course. The purpose of these notes is to provide a guide to the second year honours statistics course. The contents of this course are designed to satisfy the following objectives:
The contents of this course are designed to satisfy the following objectives:
••
TTo o proprovide studentvide students s with a with a basibasic c undeunderstarstanding of nding of proprobabibabilitylity, , the the role of role of varvariatioiation n in in empiempiricalrical problem solving and statistical concepts (to be able to critically evaluate, understand and interpret problem solving and statistical concepts (to be able to critically evaluate, understand and interpret statistical studies reported in newspapers, internet and scientific articles).statistical studies reported in newspapers, internet and scientific articles).
••
To provide students with the statistical concepts and techniques necessary to carry out an empiricalTo provide students with the statistical concepts and techniques necessary to carry out an empirical study to answer relevant questions in any given area of interest.study to answer relevant questions in any given area of interest.
The recommended prerequisites are Calculus I, II (one of Math 137,147 and one of Math 138, The recommended prerequisites are Calculus I, II (one of Math 137,147 and one of Math 138, 148), and Probability (Stat 230). Readers should have a basic understanding of single-variable 148), and Probability (Stat 230). Readers should have a basic understanding of single-variable differential and integral calculus as well as a good understanding of probability theory.
1
1 P
PP
PD
DA
AC
C
PPDAC
PPDAC is a process or recipe used to solve statistical problems. It stands for:is a process or recipe used to solve statistical problems. It stands for: P
Problem /roblem / PPlan /lan / DData /ata / AAnalysis /nalysis / CConclusiononclusion
1.
1.1
1 Pr
Prob
oble
lem
m
The problem step’s job is to clearly define the The problem step’s job is to clearly define the
1. Goal or Aspect of the study 1. Goal or Aspect of the study 2.
2. TTargarget Populatiet Population and on and UnitsUnits 3.
3. Unit’s VUnit’s Variariatesates 4.
4. AttrAttributeibutes and s and PaParameterametersrs Definition 1.1.
Definition 1.1. TheThe target populationtarget population is the set of animals, people or things about which you wish tois the set of animals, people or things about which you wish to draw conclusions. A
draw conclusions. A unitunit is a singleton of the target population.is a singleton of the target population. Definition 1.2.
Definition 1.2. TheThe sample sample populatipopulationon is a is a spespecificified subsed subset of et of the tarthe target popget populaulatiotion. n. AA samplesample is is aa singleton of the sample population and a unit of the study population.
singleton of the sample population and a unit of the study population. Example 1.1.
Example 1.1. If I were interested in the average age of all students taking STAT 231, then:If I were interested in the average age of all students taking STAT 231, then:
Unit = student of STAT 231 Unit = student of STAT 231
Target Population (T.P.)=students of STAT 231 Target Population (T.P.)=students of STAT 231
Definition 1.3.
Definition 1.3. AA variatevariate is a characteristic of a single unit in a target population and is usually one of theis a characteristic of a single unit in a target population and is usually one of the following:
following: 1.
1. Response Response varivariatesates - interest in the study- interest in the study 2.
2. Explanatory variateExplanatory variate - why responses vary from unit to unit- why responses vary from unit to unit (a)
(a) KnoKnown - wn - varivariates that are know to cause the ates that are know to cause the resporesponsesnses i.
i. FocFocal - al - knowknown variates that divide the n variates that divide the targtarget population into subsetset population into subsets (b)
(b) UnknoUnknown - wn - varvariates that cannot be explained in the that iates that cannot be explained in the that cause respocause responsesnses Using Ex.
Using Ex. 1.1. 1.1. as a guide, we can think of the respas a guide, we can think of the response varonse variate as the age of of a iate as the age of of a studestudent and the explanant and the explanatorytory variates as factors such as the age of the student, when and where they were born, their familial situations, variates as factors such as the age of the student, when and where they were born, their familial situations, educa
educational backgrtional background and intelligound and intelligence. ence. FoFor an r an examplexample e of a of a focal variatfocal variate, e, we could think of we could think of somethsomethinging along the lines of domestic vs. international students or male vs. female.
along the lines of domestic vs. international students or male vs. female. Definition 1.4.
Definition 1.4. AnAn attributeattribute is a characteristic of a population which is usually denoted by a function of is a characteristic of a population which is usually denoted by a function of the response variate. It can have two other names, depending on the population studied:
Winter 2012
Winter 2012 1 1 PPPPDDAAC C
••
ParameterParameter is used when studying populationsis used when studying populations••
StatisticStatistic is used when studying samplesis used when studying samples••
Attribute can be used interchangeably with the aboveAttribute can be used interchangeably with the above Definition 1.5.Definition 1.5. TheThe aspectaspect is the goal of the study and is generally one of the followingis the goal of the study and is generally one of the following 1.
1. DescrDescriptive - describiiptive - describing or determinng or determining the value of ing the value of an attributean attribute 2.
2. CompaComparative - rative - compacomparing the attribute of ring the attribute of twtwo o (or more(or more) groups) groups 3.
3. CausatiCausative - ve - trying to determine whethetrying to determine whether a r a parparticulaticular explanator explanatory variate causes a ry variate causes a resporesponse to nse to changchangee 4.
4. PrediPredictive - ctive - to predict the value of to predict the value of a response variaa response variate using your explanatote using your explanatory variatery variate
1
1..2
2 P
Plla
an
n
The job of the plan step is to accomplish the following: The job of the plan step is to accomplish the following:
1. Define the
1. Define the Study ProtocolStudy Protocol - this is the population that is actually studied and is NOT always a subset- this is the population that is actually studied and is NOT always a subset of the T.P.
of the T.P. 2. Define the
2. Define the Sampling ProtocolSampling Protocol - the sampling protocol is used to draw a sample from the study- the sampling protocol is used to draw a sample from the study population
population (a)
(a) Some types of the Some types of the samplisampling protocng protocol includeol include i.
i. RandoRandom sampling (ad verbam sampling (ad verbatim)tim) ii.
ii. JudgJudgment sampliment sampling (e.g. ng (e.g. gendgender ratios)er ratios) iii.
iii. VolVolunteer samplunteer sampling (ad ing (ad verbaverbatim)tim) iv.
iv. RepRepresenresentative samplingtative sampling A.
A. a a sample that matches the sample that matches the samplsample e populpopulation (S.Pation (S.P.) in .) in all importanall important t chacharacterracteristics (i.e.istics (i.e. the proportion of a certain important characteristic in the S.P. is the same in the sample) the proportion of a certain important characteristic in the S.P. is the same in the sample) (b)
(b) GenerGenerallyally, , statisstatisticianticians s preprefer fer Random samplingRandom sampling 3. Define the Sample - ad verbatim
3. Define the Sample - ad verbatim 4. Define the
4. Define the measurement systemmeasurement system - this defines the tools/methods used- this defines the tools/methods used A
A visvisualualizaizatiotion n of of the relatithe relationsonship betwhip between the een the poppopulaulationtions s is is belbelowow. . In In the diagrthe diagram, am, T.PT.P. . is is the the tatargergett population and S.P. is the sample population.
population and S.P. is the sample population. Example 1.2.
Example 1.2. Suppose that we wanted to compare the common sense between maths and arts students atSuppose that we wanted to compare the common sense between maths and arts students at the Univers
the University of Waterloo. ity of Waterloo. An experimeAn experiment that nt that we could do is we could do is take 50 arts students and 33 take 50 arts students and 33 maths studenmaths studentsts from the University of Waterloo taking an introductory statistics course this term and have them write a from the University of Waterloo taking an introductory statistics course this term and have them write a statistics test (non-mathematical) and average the results for each group. In this situation we have:
T.P. T.P. S.P. S.P. Study Error Study Error Sample Sample Sample Error Sample Error Figure 1.1:
Figure 1.1: Population RelationshipsPopulation Relationships
••
T.P. = All arts and maths studentsT.P. = All arts and maths students••
S.P. = Arts and maths students from the University of Waterloo taking an introductory statisticsS.P. = Arts and maths students from the University of Waterloo taking an introductory statistics course this termcourse this term
••
Sample = 50 arts students and 33 maths students from the University of Waterloo taking an intro-Sample = 50 arts students and 33 maths students from the University of Waterloo taking an intro-ductory statistics course this termductory statistics course this term
••
Aspect = Comparative studyAspect = Comparative study••
Attribute(s): Average grade of arts and maths students (for T.P., S.P., and sample)Attribute(s): Average grade of arts and maths students (for T.P., S.P., and sample) –– Parameter for T.P. and S.P.Parameter for T.P. and S.P. –
– Statistic for SampleStatistic for Sample There are, however some issues: There are, however some issues:
1.
1. Sample unitSample units differ from S.Ps differ from S.P.. 2.
2. S.PS.P. units differ from . units differ from T.PT.P. units. units 3.
3. We want to measure common sense, but the test is We want to measure common sense, but the test is measumeasuring statisring statistical knowtical knowledgeledge 4. Is it fair to put arts students in a maths course?
4. Is it fair to put arts students in a maths course? Example 1.3.
Example 1.3. Suppose that we want to investigate the effect of cigarettes on the incidence of lung cancerSuppose that we want to investigate the effect of cigarettes on the incidence of lung cancer in humans. We can do this by purchasing online mice, randomly selecting 43 mice and letting them smoke in humans. We can do this by purchasing online mice, randomly selecting 43 mice and letting them smoke 20 ciga
20 cigaretrettes a tes a dadayy. . We then condWe then conduct an uct an autautopsopsy y at at the time of the time of deadeath to th to checheck if ck if thethey y havhave e lunlung g cancancercer.. In this situation, we have:
In this situation, we have:
••
T.P. = All people who smokeT.P. = All people who smoke••
S.P. = Mice bought onlineS.P. = Mice bought online••
Sample = 43 selected online miceSample = 43 selected online mice••
Aspect = Not ComparativeAspect = Not ComparativeWinter 2012
Winter 2012 1 1 PPPPDDAAC C
••
Attribute:Attribute: –– T.P. - proportion of smoking people with lung cancerT.P. - proportion of smoking people with lung cancer –
– S.P. - proportion of mice bought online with lung cancerS.P. - proportion of mice bought online with lung cancer –
– Sample - proportion of sample with lung cancerSample - proportion of sample with lung cancer Like the previous example, there are some issues:
Like the previous example, there are some issues: 1.
1. Mice and humans may react differenMice and humans may react differently to tly to cigacigarettesrettes 2. We do not have a baseline (i.e. what does X% mean?) 2. We do not have a baseline (i.e. what does X% mean?) 3.
3. Is have the mice smoke 20 cigaretIs have the mice smoke 20 cigarettes a day realistictes a day realistic?? Definition 1.6.
Definition 1.6. LetLet aa((x x )) be defined as an attribute as a function of some population or samplebe defined as an attribute as a function of some population or sample x x . . WeWe define the
define the study errorstudy error asas
aa((T.P.T.P.))
−
−
aa((S.P.S.P.))..Unfortunately, there is no way to directly calculate this error and so its value must be argued. In our previous Unfortunately, there is no way to directly calculate this error and so its value must be argued. In our previous examples, one could say that the study error in Ex. 1.3. is higher than that of Ex. 1.2. due to the S.P. in examples, one could say that the study error in Ex. 1.3. is higher than that of Ex. 1.2. due to the S.P. in 1.3. being drastically different than the T.P..
1.3. being drastically different than the T.P.. Definition 1.7.
Definition 1.7. Similar to above, we define theSimilar to above, we define the sample errorsample error asas aa((S.P.S.P.))
−
−
aa((sample sample )).. Although we may be able to calculateAlthough we may be able to calculate aa((sample sample )),, aa((S.P.S.P.)) is not computable and like the above, this valueis not computable and like the above, this value must be argued.
must be argued. Remark
Remark 1.11.1.. Note that if we use a random sample, we hope it is representative and that the above errorsNote that if we use a random sample, we hope it is representative and that the above errors are minimized.
are minimized.
1
1..3
3 D
Da
atta
a
This step involves the collecting and organizing of data and statistics. This step involves the collecting and organizing of data and statistics. Data Types
Data Types
••
Discrete DataDiscrete Data:: Simply put, there are “holes” between the numbersSimply put, there are “holes” between the numbers••
ContinuousContinuous (CTS)(CTS) DataData:: WeWe assumeassume that there are no “holes”that there are no “holes”••
Nominal DataNominal Data:: No order in the dataNo order in the data••
Ordinal DataOrdinal Data:: There is some order in the dataThere is some order in the data••
Binary DataBinary Data:: e.g. Success/failure, true/false, yes/noe.g. Success/failure, true/false, yes/no••
Counting DataCounting Data:: Used for counting the number of eventsUsed for counting the number of events1.
1.4
4 An
Anal
alys
ysis
is
This step involves analyzing our data set and making well-informed observations and analyses. This step involves analyzing our data set and making well-informed observations and analyses. Data Quality
Data Quality
There are 3 factors that we look at: There are 3 factors that we look at:
1.
1. OutliersOutliers: Data that is more extreme than their counterparts.: Data that is more extreme than their counterparts. (a)
(a) ReasonReasons for outliers for outliers?s? i.
i. TTypos or data ypos or data recorecording errording errorsrs ii.
ii. MeasuMeasuremenrement errorst errors iii.
iii. ValValid outlieid outliersrs (b)
(b) WithouWithout having been involvet having been involved in the d in the study from the starstudy from the start, it is t, it is difficudifficult to tell lt to tell which is whichwhich is which 2.
2. MissiMissing Data Poinng Data Pointsts 3.
3. MeasurMeasurement Issement Issuesues Characteristic of a Data Set Characteristic of a Data Set
Outliers could be found in any data set but these 3 always are: Outliers could be found in any data set but these 3 always are:
1.
1. ShapeShape (a)
(a) NumerNumerical methodical methods:s: SkewnessSkewness andand KurtosisKurtosis (not covered in this course)(not covered in this course) (b)
(b) EmpirEmpirical methodsical methods:: i.
i. Bell-Bell-shapshaped: ed: SymmetrSymmetrical abouical about a meant a mean ii.
ii. SkeSkewed left (negawed left (negative): tive): densdensest area on the rightest area on the right iii.
iii. SkeSkewed righwed right (positive)t (positive): : densedensest area on the leftst area on the left iv.
iv. UnifoUniform: rm: even all arouneven all around; straighd; straight linet line 2.
2. CenterCenter (location)(location) (a)
(a) The “miThe “middlddle” e” of our datof our dataa i.
i. ModeMode: statistic that asks which value occurs the most frequently: statistic that asks which value occurs the most frequently ii.
ii. MedianMedian ((QQ22): The middle data value): The middle data value A.
A. The definiThe definition in this course is an algorition in this course is an algorithmthm B.
B. We deWe denotnotee nn data values bydata values by x x 11,, x x 22,...,x ,...,x nn and the sorted data byand the sorted data by x x (1)(1),, x x (2)(2),...,x ,...,x ((nn)) wherewhere
x
x (1)(1)
≤
≤
x x (2)(2)≤
≤
...≤
≤
x x ((nn))If
If nn is odd thenis odd then QQ2 2 == x x ((nn+1+1 2
2 )) and if and if nn is even, thenis even, then QQ2 2 ==
x x ((nn 2 2))
−
−
x x ((nn+1+1 2 2 )) 2 2Winter 2012
Winter 2012 1 1 PPPPDDAAC C
iii.
iii. MeanMean: The sample mean is: The sample mean is ¯x x =¯=
n n
i i =1=1 x x i i n n A.A. The sample mean moves in the the direThe sample mean moves in the the direction of the outlier if an outlier is addection of the outlier if an outlier is added (mediand (median as well but less of an effect)
as well but less of an effect) (b)
(b) RobustnessRobustness: The median is less affected by outliers and is thus: The median is less affected by outliers and is thus robust.robust. 3.
3. SpreadSpread (variability)(variability) (a)
(a) RangeRange: By definition, this is: By definition, this is x x ((nn))
−
−
x x (1)(1)(b)
(b) IQRIQR (Interquartile range): The middle half of your data(Interquartile range): The middle half of your data i. We first define
i. We first define posnposn((aa)) as the function which returns the index of the statisticas the function which returns the index of the statistic aa in a datain a data set. The value of
set. The value of QQ11 is defined as the median in the data setis defined as the median in the data set x
x (1)(1),, x x (2)(2),...,x ,...,x ((posnposn((QQ2)2)
−
−
1)1)and
and QQ22 is the median of the data setis the median of the data set x
x ((posnposn((QQ2)+1)2)+1),, x x ((posnposn((QQ2)+2)2)+2),...,x ,...,x ((nn))
ii. The IQR of set is defined to be the difference between
ii. The IQR of set is defined to be the difference between QQ33 andand QQ11. That is,. That is, IIQRQR == QQ33
−
−
QQ11iii. A
iii. A box plotbox plot is visual repis visual represresententatiation on of of thithis. s. The whisThe whiskekers of rs of a a bobox x ploplot t reprepresresent valueent valuess that are some value below or above the data set, according to the upper and lower fences, that are some value below or above the data set, according to the upper and lower fences, which are the
which are the upper and lower bounds of upper and lower bounds of the whiskerthe whiskers s resperespectivelyctively. . The lower fence,The lower fence, LLLL, , isis bounded by a value of
bounded by a value of
LL
LL == QQ11
−
−
11..5(5(IIQRQR))and the upper fence,
and the upper fence, UULL, a value of , a value of U
ULL== QQ33 + 1+ 1..5(5(IIQRQR))
We say that any value that is less than
Here, we have a visualization of a box plot, courtesy of Wikipedia: Here, we have a visualization of a box plot, courtesy of Wikipedia:
Figure 1.2:
Figure 1.2: Box Plots (from Wikipedia)Box Plots (from Wikipedia) (c)
(c) Variance Variance : For a sample population, the variance is defined to be: For a sample population, the variance is defined to be
s s 22 == n n
i i =1=1 ((x x i i−
−
x x ¯¯))22 n n−
−
11 which is called thewhich is called the sample variancesample variance.. (d)
(d) Standard DeviationStandard Deviation: For a sample population, the standard deviation is just the square root of : For a sample population, the standard deviation is just the square root of the variance: the variance: s s ==
n n
i i =1=1 ((x x i i−
−
x x ¯¯))22 n n−
−
11 which is called thewhich is called the sample sample standastandard rd deviatdeviationion..
1.
1.5
5 Co
Conc
nclu
lusi
sion
on
In the conclusion, there are only two aspects of the study that you need to be concerned about: In the conclusion, there are only two aspects of the study that you need to be concerned about:
1.
1. Did you answDid you answer your prober your problemlem 2.
Winter 2012
Winter 2012 2 2 MEAMEASURSUREMEEMENT NT ANAANALLYSIYSIS S
2
2 Me
Meas
asure
ureme
ment
nt An
Anal
alys
ysis
is
The goal of measures is to explain how far our data is spread out and the relationship of data points. The goal of measures is to explain how far our data is spread out and the relationship of data points.
2.
2.1
1 Me
Meas
asure
ureme
ment
nts
s of
of Sp
Sprea
read
d
The goal of the standard deviation is to approximate the average distance a point is from the mean. Here The goal of the standard deviation is to approximate the average distance a point is from the mean. Here are some other methods that we could use for standard deviation and why they fail:
are some other methods that we could use for standard deviation and why they fail:
1. 1. n n
i i =1=1 ((x x i i−
−
x x ¯¯)) nn does not work because it is always equal to 0does not work because it is always equal to 0
2. 2. n n
i i =1=1||
x x i i−
−
¯¯x x||
nn works but we cannot do anything mathematically significant to itworks but we cannot do anything mathematically significant to it
3. 3.
nn
i i =1=1 ((x x i i−
−
x x ¯¯))22 nn is a good guess, but note thatis a good guess, but note that n n
i i =1=1 ((x x i i−
−
x x ))¯¯ 22≤
≤
n n
i i =1=1||
x x i i−
−
x x ¯¯||
(a)(a) TTo fix o fix thithis, we uses, we use nn
−
−
11 instead of instead of nn (proof comes later on)(proof comes later on) Proposition 2.1.Proposition 2.1. An interesting identity is the following:An interesting identity is the following:
s s ==
n n
i i =1=1 ((x x i i−
−
x x ¯¯))22 n n−
−
11 ==
n n
i i =1=1 ((x x 22 i i−
−
nnx x ¯¯22)) n n−
−
11 Proof.Proof. Exercise for the reader.Exercise for the reader.
Definition 2.1. Coefficient of Variation
Definition 2.1. Coefficient of Variation (CV)(CV)
This measure provides a unit-less measurement of spread: This measure provides a unit-less measurement of spread:
CV CV == s s ¯ ¯ x x
×
×
100%100%2.2
2.2 Meas
Measurem
urement
ents
s of
of Ass
Associat
ociation
ion
Here, we examine a few interesting measures which test the relationship between two random variables Here, we examine a few interesting measures which test the relationship between two random variables X X any
any Y Y .. 1.
1. CovarianceCovariance: In theory (a population), the covariance is defined as: In theory (a population), the covariance is defined as Cov
but in practice (in samples) it is defined as but in practice (in samples) it is defined as
s s XY XY == n n
i i =1=1((x x i i
−
−
x x )(¯¯)( y y i i−
−
y y ))¯¯n
n
−
−
11 .. Note that CovNote that Cov((X,X, Y Y )),, s s XY XY
∈∈
RR and both give us an idea of the direction of the relationship but notand both give us an idea of the direction of the relationship but notthe
the magnimagnitude.tude. 2.
2. CorrelationCorrelation:: In theory (a population), the correlation is defined asIn theory (a population), the correlation is defined as ρ
ρXY XY ==
Cov
Cov((X,X, Y Y )) σ
σX X σσY Y
but in practice (in samples) it is defined as but in practice (in samples) it is defined as
r r XY XY == s s XY XY s s X X s s Y Y .. Note that
Note that
−
−
11≤
≤
ρρXY XY ,, r r XY XY≤
≤
11 and both give us an idea of the direction of the relationship AND theand both give us an idea of the direction of the relationship AND themagnitude. magnitude.
(a)
(a) An interpAn interpretatiretation of on of the values is as the values is as follofollows:ws: i.
i.
||
r r XY XY|| ≈
≈
1 1 ==⇒
⇒
strong relationshipstrong relationshipii.
ii.
||
r r XY XY||
= = 1 1 ==⇒
⇒
perfectly linear relationshipperfectly linear relationshipiii.
iii.
||
r r XY XY||
>> 1 1 ==⇒
⇒
positive relationshippositive relationshipiv.
iv.
||
r r XY XY||
<< 1 1 ==⇒
⇒
negative relationshipnegative relationshipv.
v.
||
r r XY XY|| ≈
≈
0 0 ==⇒
⇒
weak relationshipweak relationship3.
3. Relative-riskRelative-risk: From: From STAT STAT 230230, this the probability of something happening under a condition relative, this the probability of something happening under a condition relative to
to thithis s samsame e thithing ng haphappenpening if ing if the condithe conditiotion n is note is note met. met. FoFormarmallylly, , fofor r twtwo o eveneventsts AA andand BB, it is, it is defined as defined as RR RR == P P ((AA
||
BB)) P P ((AA||
BB))¯¯ .. An interesting property is that ifAn interesting property is that if RRRR = = 11 thenthen AA
⊥
⊥
BB and vice versa.and vice versa. 4.4. SlopeSlope: This will be covered later on.: This will be covered later on.
3
3 Pro
Proba
babi
bili
litty
y Th
Theo
eory
ry
All of this content was covered in
All of this content was covered in STAT STAT 230230 so I so I wilwill l not be typesnot be typesettietting it. ng it. CheCheck out ck out any proany probabbabiliilityty text
textboobook k jusjust t to to revreview the iew the conconcepcepts ts and proand properperties of ties of expexpectaectation and tion and vavariariancence. . The only The only impimporortantantt change was a notational one, specifically that instead of writing
change was a notational one, specifically that instead of writing X X
∼
∼
BiBi nn((n,n, p p )), we write, we write X X∼
∼
BiBi nn((n,n,Π)Π) whereWinter 2012
Winter 2012 4 ST4 STAATISTISTICTICAL AL MODMODELS ELS
4
4 St
Stat
atis
isti
tica
cal l Mod
Model
elss
Recall that the goal of statistics is to guess the value of a population parameter on the basis of a (or more) Recall that the goal of statistics is to guess the value of a population parameter on the basis of a (or more) sample statistic.
sample statistic.
4.
4.1
1 Ge
Gene
nera
rali
liti
ties
es
We make our measurements on our sample units. The data values that are collected are: We make our measurements on our sample units. The data values that are collected are:
••
The response variateThe response variate••
The explanatory variate(s)The explanatory variate(s)The response variate is a characteristic of the unit that helps us answer the problem. It will be denoted by The response variate is a characteristic of the unit that helps us answer the problem. It will be denoted by Y
Y and will be assumed to be random with a random componentand will be assumed to be random with a random component .. Every model is relating the population parameter (
Every model is relating the population parameter (µ,σ,π,ρ,...µ,σ,π,ρ,...) ) to the to the samsample valuple values (unites (units). s). We will useWe will use at least one subscript representing the value of unit
at least one subscript representing the value of unit i i . Note that a realization,. Note that a realization, y y i i , is the response that is a, is the response that is a
achieved by a response variate achieved by a response variate Y Y i i ..
Example 4.1.
Example 4.1. Here is a situation involving coin flips:Here is a situation involving coin flips:
Y
Y i i == flipping a coin that has yet to landflipping a coin that has yet to land
y
y i i == coin lands and realizes its potential (H/T)coin lands and realizes its potential (H/T)
In every
In every modmodel el we assumwe assume e thathat t samsamplipling was ng was dondone e ranrandomdomlyly. . ThiThis s allallowows s us to us to assassume thatume that i i
⊥
⊥
j j forfori i
== j j ..4.
4.2
2 T
Type
ypes
s of
of Mod
Model
elss
Goal of
Goal of statistical modelsstatistical models: explain the relationship between a parameter and a response variate.: explain the relationship between a parameter and a response variate. The following are the different types of statistical models that we will be examining :
The following are the different types of statistical models that we will be examining : 1.
1. DiscreteDiscrete (Binary)(Binary) Model -Model - either the population data is within parameters or it is not.either the population data is within parameters or it is not. (a)
(a) BinomiBinomial:al: Y Y i i == i i ,, i i
∼
∼
BiBi nn(1(1,, Π)Π)(b)
(b) PoisPoisson:son: Y Y i i == i i ,, i i
∼
∼
P o i s P o i s ((µµ))2.
2. Response Model -Response Model - these model the response andthese model the response and at most at most use the explanatory variate implicitly as ause the explanatory variate implicitly as a focal explanatory variate.
focal explanatory variate. (a)
(a) Y Y i i == µµ ++ i i , , i i
∼
∼
N N (0(0,, σσ22))(b)
(c)
(c) Y Y ii j j == µµ++ τ τ i i + + ii j j , , ii j j
∼
∼
N N (0(0,, σσ ))(d)
(d) Y Y i i == µµ++ δδ ++ i i
Where
Where Y Y j j ,, Y Y ii j j are the responses of unitare the responses of unit j j [in [in grougroupp i i ],], µµ the overall average,the overall average, µµi i the averagethe average
in the
in the i i thth group,group, τ τ
i i the difference between the overall average and the average in thethe difference between the overall average and the average in the i i thth group,group,
and
and δδ equal to some bias.equal to some bias. 3.
3. Regression Model -Regression Model - these create a function that relates the response and the explanatory variatethese create a function that relates the response and the explanatory variate (attribute or parameter); note here that we assume
(attribute or parameter); note here that we assume Y Y i i == Y Y i i
||
X X ..(a)
(a) Y Y i i == αα ++ βx βx i i ++ i i , , i i
∼
∼
N N (0(0,, σσ22))(b)
(b) Y Y i i == αα ++ ββ((x x i i
−
−
x ) +x ¯¯) + i i , , i i∼
∼
N N (0(0,, σσ22))(c)
(c) Y Y i i == βx βx i i ++ i i , , i i
∼
∼
N N (0(0,, σσ22))(d)
(d) Y Y ii j j == ααi i ++ βx βx ii j j ++ ii j j , , ii j j
∼
∼
N N (0(0,, σσ22))(e)
(e) Y Y i i == ββ00 ++ ββ11x x 11i i ++ ββ22x x 22i i ++ i i , , i i
∼
∼
N N (0(0,, σσ22))Where
Where Y Y j j ,, Y Y ij ij are the response of unitare the response of unit j j [in group[in group i i ],], α,α, ββ00 are the intercepts, βare the intercepts, β is the slope,is the slope,
x
x j j ,, x x ij ij are the explanatory variates of unitare the explanatory variates of unit j j [in group[in group i i ], and], and ββ11,, β β22 the slopes for two explanatorythe slopes for two explanatory
variates (indicating that
variates (indicating that Y Y i i inin ((e e )) is a is a funfunctiction of on of twtwo o expexplanlanatoatory variry variatesates). ). NotNote e thathat t in thein the
above models, we assume that we can control our explanatory variate and so we treat the above models, we assume that we can control our explanatory variate and so we treat the x x i i
s sconstant. constant. Theorem 4.1.
Theorem 4.1. Linear Combinations of Normal Random Variables Linear Combinations of Normal Random Variables Let
Let X X i i
∼
∼
N N ((µµi i ,, σσ22i i )),, X X j j⊥
⊥
X X i i for all for all i i = j j ,, k =
k i i∈∈
RR. Then,. Then,T T == k k 00 ++ n n
i i =1=1 k k i i X X i i ==⇒
⇒
T T∼
∼
N N
k k 00 ++ n n
i i =1=1 k k i i µµi i ,, n n
i i =1=1 k k i i 22σσi i 22
Proof.Proof. Do as an exercise.Do as an exercise.
5
5 Es
Esti
tima
mate
tes
s an
and
d Es
Esti
tima
mato
tors
rs
In this section, we continue to develop the relationship between our population and sample. In this section, we continue to develop the relationship between our population and sample.
5.
5.1
1 Mo
Moti
tiva
vati
tion
on
Suppose that I flip a coin 5 times, with the number of heads
Suppose that I flip a coin 5 times, with the number of heads Y Y
∼
∼
BinBin(5(5,, Π Π == 2121)). . NoNote tte thahatt ΠΠ is given.is given. What’s the value ofWhat’s the value of y y that maximizesthat maximizes f f (( y y ))? ? UnfortunatelyUnfortunately, since, since y y is discrete, the best that we can do isis discrete, the best that we can do is draw a histogram and look for the maximum. This is a very boring problem.
draw a histogram and look for the maximum. This is a very boring problem. The
The maximmaximum um liklikelyhood elyhood testtest, , hohowewevever, r, asasks the ks the ququesestiotion n in revein reversrse. e. ThThat at isis, , wwe e finfind d ththe e opoptitimamall parameter for
parameter for ΠΠ , given, given y y , such that, such that f f (( y y )) is at its maximum. From the formula of is at its maximum. From the formula of f f , given by, given by f f (( y y ) ) ==
55 y y
ΠΠ y y (1(1−
−
Π)Π)55−
−
y yWinter 2012
Winter 2012 5 5 ESTIMESTIMAATES TES AND AND ESTIMAESTIMATORS TORS
we can see that if we consider
we can see that if we consider f f as a functionas a function ΠΠ, , it makeit makes it s it a a concontintinuouuous functs functionion. . TTo o comcomputpute thee the maximum, it is just a matter of using logarithmic differentiation,
maximum, it is just a matter of using logarithmic differentiation, ln
ln f f (( y y ) = ln) = ln
55 yy
++ y y ln Π + lnΠ + (5(5−
−
y y )) lnln(1(1−
−
Π)Π) finding the partials,finding the partials,
∂ ∂ lnln f f (( y y )) ∂ ∂ ΠΠ == y y Π Π
−
−
5 5−
−
y y 1 1−
−
ΠΠ and setting them to zero to find the maximumand setting them to zero to find the maximum ∂ ∂ lnln f f (( y y )) ∂ ∂ ΠΠ = = 0 0 ==
⇒
⇒
0 0 == y y Π Π−
−
5 5−
−
y y 1 1−
−
ΠΠ = =⇒
⇒
y y = = 5Π5Π = =⇒
⇒
Π Π == 55 y y and so if we takeand so if we take Π Π == y y 55, then, then f f (( y y )) is is maximizmaximized.ed.
5.2
5.2 Max
Maximu
imum Lik
m Likeli
elihood Est
hood Estima
imatio
tion (ML
n (MLE) Al
E) Algo
gorith
rithm
m
Here, we will formally describe the
Here, we will formally describe the maximum likelihood estimationmaximum likelihood estimation (MLE) algorithm whose main goal is(MLE) algorithm whose main goal is to know what estimate for a study population parameter
to know what estimate for a study population parameter θθ should be used, given a set of data points suchshould be used, given a set of data points such that probability of the data set being chosen is at its maximum.
that probability of the data set being chosen is at its maximum.
Suppose that we posit that a certain population follows a given statistical model with unknown parameters Suppose that we posit that a certain population follows a given statistical model with unknown parameters θθ11,, θθ22,...,θ,...,θmm. We then draw n. We then draw n data points,data points, y y 11,, y y 22,...,y ,...,y nn randomly (this is important) and using the data, werandomly (this is important) and using the data, we
try to estimate the unknown parameters. The algorithm goes as follows: try to estimate the unknown parameters. The algorithm goes as follows:
1. Define
1. Define LL == f f (( y y 11,, y y 22,...,y ,...,y nn) ) == n n
i i =1=1 ff (( y y i i )) where we callwhere we call LL aa liklikelihood elihood functifunctionon. . SimSimpliplify if fy if pospossibsible.le.
Note that
Note that f f (( y y 11,, y y 22,...,y ,...,y nn) ) == n n
i i =1=1 ff (( y y i i )) because we are assuming random sampling, implying thatbecause we are assuming random sampling, implying that y y i i
⊥
⊥
y y j j ,,∀∀
i i
== j j .. 2. Define2. Define l l = ln(= ln(LL)). Simplify. Simplify l l using logarithmic laws.using logarithmic laws. 3. Find 3. Find ∂l ∂l ∂θ ∂θ11,, ∂l ∂l ∂θ ∂θ22,, ...,., ∂l ∂l ∂θ
∂θnn, set each of the partials to zero, and solve for each, set each of the partials to zero, and solve for each θθi i ,, i i = = 11,...,n,...,n. . ThThe sole solvedved
θθ
i i s s are called theare called the estimatesestimates of of f f and we add a hat,and we add a hat, ˆθθˆi i , to indicate this., to indicate this.To illustrate the algorithm, we give two examples. To illustrate the algorithm, we give two examples. Example 5.1.
Example 5.1. SupposeSuppose Y Y i i == i i , , i i
∼
∼
ExpExp((θθ)) andand θθ is a rate. What is the optimalis a rate. What is the optimal ˆθθ according to MLE?ˆaccording to MLE?Solution. Solution. 1. 1. LL == n n
i i =1=1θθexp(exp(
−
−
y y i i θθ) ) == θθnnexp(exp(−
−
n n
i i =1=1 y y i i θθ))2. 2. l l = ln(= ln(LL) ) == nnlnln θθ
−
−
i i =1=1 y y i i θθ 3. 3. ∂θ∂θ∂l ∂l == nnθθ−
−
n n
i i =1=1 yy i i == nnθθ
−
−
nn ¯ y ,, y ¯ ∂θ∂θ∂l ∂l = = 0 0 ==⇒
⇒
nnθθˆˆ−
−
nn ¯ y y = ¯= 0 0 ==⇒
⇒
θθ =ˆˆ= y y 11¯¯..So the maximum
So the maximum l l , and consequently maximum, and consequently maximum LL is obtained, whenis obtained, when θθ == y y 11¯¯.. Example 5.2.
Example 5.2. SupposeSuppose Y Y i i == αα ++ βx βx i i ++ i i , , i i
∼
∼
N N (0(0,, σσ22)). Use MLE to estimate. Use MLE to estimate αα andand ββ..Solution. Solution.
First, take note that
First, take note that Y Y i i
∼
∼
N N ((αα++ βx βx i i ,, σσ22))1.
1. We first simplifyWe first simplify LL as follows.as follows.
L L == n n
i i =1=1 1 1√
√
22πσπσ exp(exp(−
−
(( y y i i−
−
αα−
−
βx βx i i ))22//22σσ22))= = 11 (2 (2ππ))nn22
··
1 1 σ σnn exp(exp( n n
i i =1=1−
−
(( y y i i
−
−
αα−
−
βx βx i i ))22//22σσ22))= = K K
··
11 σ σnn exp(exp( n n
i i =1=1−
−
(( y y i i
−
−
αα−
−
βx βx i i ))22//22σσ22))where
where K K == 11
(2 (2ππ))nn22..
2.
2. By direct evaBy direct evaluatioluation,n,
l l = = lnln K K
−
−
nn lnln σσ−
−
n n
i i =1=1(( y y i i
−
−
αα−
−
βx βx i i ))22//22σσ223.
3. ComputComputing the partiing the partials, we getals, we get
∂l ∂l ∂α ∂α == n n
i i =1=1(( y y i i
−
−
αα−
−
βx βx i i ))/σ/σ22and and ∂l ∂l ∂α ∂α == n n
i i =1=1((x x i i )()( y y i i
−
−
αα−
−
βx βx i i ))/σ/σ22..So first solving for
So first solving for αα we getwe get
∂l ∂l ∂α ∂α = 0 = 0 ==
⇒
⇒
n n
i i =1=1(( y y i i
−
−
ααˆˆ−
−
βx βx ˆˆ i i ) ) = = 00=
=
⇒
⇒
nn y y ¯¯−
−
nnααˆˆ−
−
nn ˆ β βˆx x ¯¯= = 00 =Winter 2012
Winter 2012 5 5 ESTIMESTIMAATES TES AND AND ESTIMAESTIMATORS TORS
and extending to
and extending to ββ we getwe get ∂l ∂l ∂β ∂β = = 0 0 ==
⇒
⇒
n n
i i =1=1((x x i i )()( y y i i
−
−
ααˆˆ−
−
βx βx ˆˆ i i ) ) = = 00= =
⇒
⇒
n n
i i =1=1 xx i i y y i i
−
−
nn¯x ˆx ¯ααˆ−
−
β βˆˆ n n
i i =1=1 x x i i 22 ((5.15.1)) = =⇒
⇒
n n
i i =1=1 xx i i y y i i
−
−
nn¯x x ((¯¯ y y ¯−
−
β β ¯ˆˆx x ))¯−
−
β βˆˆ n n
i i =1=1 x x i i 22 = = 00 = =⇒
⇒
n n
i i =1=1 xx i i y y i i
−
−
nn¯x ¯x ¯ y y ¯−
−
β βˆˆ
nn
i i =1=1 x x i i 22−
−
nn¯x x ¯22
= = 00 = =⇒
⇒
n n
i i =1=1 xx i i y y i i
−
−
nn¯x x ¯¯ y y ¯n n
−
−
11 == ˆ ˆ β β
n n
i i =1=1 x x 22 i i−
−
nn¯x x ¯22
n n−
−
11 = =⇒
⇒
βs βs ˆˆ x x 22 == s s xy xy = =⇒
⇒
β β =ˆˆ= s s xx y y s s 22 x xBefore we head off into the next section, we should recall a very important theorem from
Before we head off into the next section, we should recall a very important theorem from STAT STAT 230230.. Theorem 5.1.
Theorem 5.1. (Central Limit Theorem)(Central Limit Theorem) Let
Let X X 11,, X X 22,...,X ,...,X nn be i.i.d.be i.i.d.11 random variables with distributionrandom variables with distribution F F ,, E E ((X X i i ) ) == µµ and and V ar V ar ((X X i i ) ) == σσ22,,
∀∀
i i , and , andX
X i i
⊥
⊥
X X j j ,,∀∀
i i
== j j . Then as . Then as nn→
→ ∞
∞
,, ¯X X ¯∼
∼
N N ((µ,µ, σσ2 2 n n )),, n n
i i =1=1 X X i i∼
∼
N N ((nµ,σnµ,σ22nn)).. Proof.Proof. Beyond the scope of this course.Beyond the scope of this course.
5.
5.3
3 Es
Esti
tima
mato
tors
rs
One of the pitfalls that you may notice with the MLE Algorithm is that the estimate is relative to the sample One of the pitfalls that you may notice with the MLE Algorithm is that the estimate is relative to the sample dat
data a thathat t we collewe collect ct frofrom m the study populthe study populatioation. n. FoFor r examexampleple, , supsuppospose e thathat t we have we have twtwo o samsampleples s dradrawnwn from our population,
from our population, SS11 ==
{{
y y i i 11}}
and Sand S22 =={{
y y i i 22}}
, of the same size, taken randomly with replacement and, of the same size, taken randomly with replacement andsuppose that we model the data in our population with the model
suppose that we model the data in our population with the model Y Y i i == αα++ βx βx i i ++ i i ,, i i
∼
∼
N N (0(0,, σσ22)). It could. It couldbe the case that when we apply MLE to both of them to get the following two models,
be the case that when we apply MLE to both of them to get the following two models, y y i i 11 = = ˆααˆ11++ ˆ β βˆ11x x i i ++ i i
and
and y y i i 22 = = ˆααˆ22++ ˆ β βˆ22x x i i ++ i i , that the estimates may differ. That is, it is possible that, that the estimates may differ. That is, it is possible that ˆααˆ11
= = ˆααˆ22 and/orand/or ˆ β βˆ11
== ˆ β βˆ22..This is because we are doing random sampling, which motivates us to believe that if we take
This is because we are doing random sampling, which motivates us to believe that if we take nn randomrandom samples,
samples, SS j j ==
{{
y y ij ij}}
, from the study population of equal size, then the estimates,, from the study population of equal size, then the estimates, θθ j j , for a parameter, for a parameter θθ of ofthe samples should follow a distribution. the samples should follow a distribution. We
We calcall l the the ranrandom dom vavariariable ble reprepresresententing ing disdistritributbution ion anan estimatorestimator and denote it as the parameter inand denote it as the parameter in question with a tilde on the top,
question with a tilde on the top, ˜θθ. Note that˜. Note that ˜θθ describes the distribution of ˜describes the distribution of θθ for the population and notfor the population and not the sampl
the sample. e. In other worIn other words, the ds, the relrelatioationshnship is ip is thatthat ˆθθ is a specific realization of ˆ is a specific realization of ˜θθ through the sampled˜ through the sampled data points. Now the obvious question to ask here is what exactly the distribution of an estimator is, given data points. Now the obvious question to ask here is what exactly the distribution of an estimator is, given
1
the mode
the model l that we are usinthat we are using g to model the to model the poppopulaulatiotion. n. ThiThis s is actuais actually depelly dependendent on nt on whawhat t we get as we get as thethe estimate, which we will see in the following examples.
estimate, which we will see in the following examples. Example 5.3.
Example 5.3. Given a modelGiven a model Y Y i i == µµ ++ i i ,, i i
∼
∼
N N (0(0,, σσ22)), if one were to compute the best estimate for, if one were to compute the best estimate forthe parameter
the parameter µµ using MLE, one would obtainusing MLE, one would obtain ˆµµ = ˆ = ¯ y y . ¯. BecBecausause e the estithe estimatomator that we r that we wawant to nt to examexamineine describes the population and not the sample, and because we are doing random sampling, we capitalize the describes the population and not the sample, and because we are doing random sampling, we capitalize the y
y to to shoshow w thithis s (w(we e movmove e frofrom m the samplthe sample, e, up a up a levlevel el of of absabstratractioction, into n, into the study poputhe study populatlationion). ). ThaThat t is,is, ˜
˜ µ
µ == ¯Y Y . ¯. So what is the distrSo what is the distribuibutiotion of n of ˜µ in this caseµ˜ in this case? ? RecRecall from Thm 4.1. all from Thm 4.1. thathat t a a linlineaear combinr combinatiation of on of normally distributed random variables is normal. Since
normally distributed random variables is normal. Since ¯ ¯ Y Y == n n
i i =1=1 1 1 n nY Y i i ˜˜ α
α is normal withis normal with E E ((¯Y Y ) ¯) == µµ andand V ar V ar (( ¯Y Y ) ¯) == σσnn22. . SoSo ˜αα˜
∼
∼
N ((µ,N µ, σσnn22)).. Example 5.4.Example 5.4. Let’Let’s s try a try a slislightghtly ly momore difficure difficult lt examexampleple. . SupSuppospose e we want to we want to find the find the disdistritributbution of ion of αα˜˜ and
and ˜ β for the model β˜ for the model Y Y i i == αα ++ ββ((x x i i
−
−
x x ) +¯¯) + i i , , i i∼
∼
N N (0(0,, σσ22)). . ThrouThrough similagh similar methods shr methods shown in Ex. own in Ex. 5.2.,5.2.,using MLE, one should be able to obtain the estimates
using MLE, one should be able to obtain the estimates ˆαα = ˆ = ¯ y y and¯ and ˆ β β =ˆ= s s xy xy
s s 22
x
x . Now, from Ex. 5.3., we already. Now, from Ex. 5.3., we already
computed the distribution for
computed the distribution for αα as˜˜ as αα˜˜
∼
∼
N N ((µ,µ, σσnn22)), so what we are more interested in is, so what we are more interested in is ˜ β β. Through a simple˜. Through a simple re-arrangement of the definition ofre-arrangement of the definition of ˜ β β,,˜
˜ ˜ B B == n n
i i =1=1((x x i i
−
−
x x )(¯¯)(Y Y i i−
−
Y Y ¯¯)) n n
i i =1=1 ((x x i i−
−
x x ))¯¯ 22 = = n n
i i =1=1((x x i i
−
−
x x ¯¯))Y Y i i n n
i i =1=1 ((x x i i−
−
x x ¯¯))22 = = n n
i i =1=1 ((x x i i−
−
x x ))¯¯ ((x x i i−
−
x ))x ¯¯ 22 Y Y i i since since ¯Y Y ¯n n
i i =1=1
((x x i i
−
−
x x ¯¯) ) = = 00. . SoSo ˜ β β is actually a linear combination of normal random variables, making it normal˜ is actually a linear combination of normal random variables, making it normalas well. Computing the expectation, we get as well. Computing the expectation, we get
E E (( ˜BB) ˜) == n n
i i =1=1 ((x x i i−
−
x x ¯¯)) ((x x i i−
−
x x ¯¯))22 E E ((Y Y i i ))= = n n
i i =1=1
11 ((x x i i−
−
x x ))¯¯ 22
((x x i i−
−
x x )(¯¯)(αα ++ ββ((x x i i−
−
x x ))¯¯)) = = 11 ((nn−
−
1)1)s s 22 x x
α α =0 =0
nn
i i =1=1 ((x x i i−
−
x x )) +¯¯ + β β = =s s x x 22((nn−
−
1)1)
nn
i i =1=1 ((x x i i−
−
x x ))¯¯ 22
= = ββ
s s 2 2 x x s s 22 x x
= = β.β. We canWe can simsimilailarly compurly compute te the varithe variancance e usiusing ng thithis s methmethod, althouod, although this gh this wawas s not done not done in in leclecturtures. es. I I wilwilll leave it to the reader as an exercise (I posit that variance should be 0). Thus,
Winter 2012
Winter 2012 5 5 ESTIMESTIMAATES TES AND AND ESTIMAESTIMATORS TORS
5.
5.4
4 Bi
Bias
ases
es in
in St
Stati
atist
stic
icss
In this section we take a look at one of the problems of using estimators. In this section we take a look at one of the problems of using estimators. Definition 5.1.
Definition 5.1. We say that for a given estimator,We say that for a given estimator, ˜θθ, of an estimate for a model is˜, of an estimate for a model is unbiasedunbiased if the followingif the following holds
holds
E
E ((˜θθ) ˜) == θ.θ. Otherwise, we say that our estimator is
Otherwise, we say that our estimator is biasedbiased..
One way to intuitively look at this definition is that we say an estimator is unbiased if we take
One way to intuitively look at this definition is that we say an estimator is unbiased if we take nn samplsamples es of of equal size with estimators
equal size with estimators ˜θθ˜i i for each sample, and on averagefor each sample, and on average
n n
i i =1=1 ˜ ˜ θ θi i nn
≈
≈
θθ for large enoughfor large enough nn, or as, or as nn→ ∞
→ ∞
,,n n
i i =1=1 ˜ ˜ θ θi i nn
→
→
θθ. Since all the the examples that we’ve seen were so far unbiased, let’s take a look at an unbiased. Since all the the examples that we’ve seen were so far unbiased, let’s take a look at an unbiasedone. one.
Example 5.5.
Example 5.5. Consider the modelConsider the model Y Y i i == µµ ++ i i , , i i
∼
∼
N N ((µ,µ, σσ22)). . If one werIf one were to e to compcompute the besute the best estimat estimatetefor
for σσ22 using MLE, the result would beusing MLE, the result would be
ˆ ˆ σ σ22 == n n
i i =1=1(( y y i i
−
−
y y ¯¯))22n n ==
⇒
⇒
σσ˜˜ 2 2 == n n
i i =1=1((Y Y i i
−
−
Y Y ¯¯))22n n ==
nn
i i =1=1 YY i i 22
−
−
nn ¯Y Y ¯22 nn ..
Fr
From om herhere, e, let’let’s s try to try to compcompute the ute the expexpectectatioation. n. HoHoweweverver, , befbeforore e thathat, t, it it loolooks ks thathat t findfindinging E E (( ¯Y Y ¯22)) andand
E
E ((Y Y 22))would be helpful in this situation. We can find it directly through the variance of would be helpful in this situation. We can find it directly through the variance of ¯Y Y and¯ and Y Y ..
V ar
V ar ((¯Y Y ¯) ) == E E ((¯Y Y ¯22))
−
−
E E ((¯Y Y ¯))
22 ==⇒
⇒
σσ2 2
n
n == E E ((¯Y Y ¯
2 2))
−
−
µµ22 ==
⇒
⇒
E E ((¯Y Y ¯22) ) == σσ2 2 n n ++ µµ 2 2 and similarly and similarly E