Key Statistical and Mathematical Prerequisites
NOTE TO THE STUDENT:
This document is designed to review previously acquired statistics knowledge. It is not
designed to teach this material from scratch. If you aren't familiar with the basic concepts,
please consult your basic stats book. The purpose of this document is to review and
introduce notation consistent with our course.
A. Random Sample
The basic building block of statistics is the random variable. A random variable is
the quantifier of our world; it associates a number with each event in the world. A classic
example of this is the number of heads in a sequence of 3 coin tosses. The events are the
possible outcomes (e.g. HHH, HTH, TTH ...); the variable X, defined as the number of
heads in the sequence, associates a real number with each event. Another example is the
random variable H which represents the height of a person. Of course, a person is a
complicated set of events (genetic inheritance, nutritional background ...). The H variable
measures or quantifies only one aspect of this complicated set of outcomes that make up
the person. The height random variable has a probability distribution which is the
distribution of all heights in the population.
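As a small illustration of the coin-toss example, here is a sketch in R that lists every outcome and applies the random variable X to it. The coin is assumed to be fair (the text does not say so), and the names outcomes, X and pX are made up for this sketch.

```r
# Enumerate all 2^3 outcomes of three tosses (assumed equally likely: fair coin).
outcomes <- expand.grid(toss1 = c("H", "T"),
                        toss2 = c("H", "T"),
                        toss3 = c("H", "T"),
                        stringsAsFactors = FALSE)

# The random variable X maps each outcome to its number of heads.
X <- rowSums(outcomes == "H")

# Probability distribution of X: the share of outcomes giving each value.
pX <- table(X) / length(X)
print(pX)
```

Each row of outcomes is one event; X attaches a number to it, and pX is the resulting probability distribution.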
A random sample is a collection of independent and identically distributed
random variables. {X1, X2, ..., XN} are N iid r.v.s. The important idea to remember here is
that a random sample describes a mechanism for obtaining the data. The actual data we
analyze is a realization or draw from the random sample.
For example, if we draw one person at random from each of N rooms in the building
and measure their heights, we have constructed a
random sample {H1, H2, ..., HN}. These observations are independent in the sense that
the height of the person drawn from room 302 is unrelated to the height of someone drawn
students as the distribution from which we are making draws. Unless certain classes
attract unusually short or tall students, we can make the assumption that each time we
B. Normal Distribution
The normal distribution is a central building block of statistics. Nothing is ever exactly
normally distributed, but the normal distribution can serve as a good approximation,
especially in regression analysis.
There are many different normal distributions. The normal distribution has two parameters: mean (µ) and variance (σ², or equivalently the std dev. σ). If you tell me µ
and σ, then I know everything about this distribution. Thus, I only have to carry around two
numbers to summarize the complete distribution rather than a large table like a histogram.
Key features of the normal distribution: 1. symmetric about the mean (which equals the mode and median), 2. thin tails (Prob[ |Y - µ| > 3σ ] ≈ .0027).
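The thin-tail probability can be checked directly with pnorm. This is only a sketch; the function name tail_prob and the values mu = 10, sigma = 4 are chosen here for illustration.

```r
# Two-sided probability of landing more than k std devs from the mean.
# It does not depend on mu or sigma, only on k.
tail_prob <- function(k) 2 * pnorm(-k)
tail_prob(3)   # about 0.0027

# Same calculation done explicitly for one particular normal distribution.
mu <- 10
sigma <- 4
p_direct <- pnorm(mu - 3 * sigma, mean = mu, sd = sigma) +
  pnorm(mu + 3 * sigma, mean = mu, sd = sigma, lower.tail = FALSE)
```

Both calculations give the same number, which is one way to see that µ and σ only shift and rescale the distribution.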
Let's look at some random samples from various normal distributions:
The R commands to generate various distributions are: rbeta,
rbinom, rcauchy, rchisq, rexp, rf, rgamma, rgeom, rhyper,
rlogis, rlnorm, rmultinom, rnbinom, rnorm, rpois, rsignrank, rt,
runif, rweibull, and rwilcox.
Suppose you want to generate 50 observations from a binomial distribution with 10 trials and probability of success .3.
> x=rbinom(50,10,.3)
Suppose we wish to simulate a vector of 100 independent,
standard (i.e., mean 0 and standard deviation 1) normal random
variables. We use the rnorm function for this purpose, and its
defaults are mean=0 and standard deviation=1. Thus, we may simply type
> x=rnorm(100)
If the mean is 10 and std dev is 4, then the command is
> x=rnorm(100,mean=10,sd=4)
> hist(x)
Population Moments and Sample Moments
The population moments are expected (average) values taken with respect to the
population distribution rather than the sample.
For example, if Yi ~ iid N(µ, σ²), then
pop. mean: $E[Y_i] = \mu$    sample mean: $\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$
pop. var.: $E[(Y_i - \mu)^2] = \sigma^2$    sample var.: $s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2$
pop. std dev: $\sqrt{E[(Y_i - \mu)^2]} = \sigma$    sample std dev: $s = \sqrt{s^2}$
You can think of the population moments (using the E[ ] notation) as simply a weighted
average of values with weights given by the population probabilities (i.e. $\mu = \sum_i p_i y_i$).
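To make the population/sample distinction concrete, here is a sketch in R. The seed, the sample size of 1000 and the parameter values are arbitrary choices for illustration; the discrete example reuses the number of heads in 3 tosses of a coin assumed to be fair.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
y <- rnorm(1000, mean = 10, sd = 4)  # a draw from the random sample

ybar <- mean(y)   # sample mean, estimates mu = 10
s2   <- var(y)    # sample variance (R divides by N - 1), estimates sigma^2 = 16
s    <- sd(y)     # sample std dev, estimates sigma = 4

# A population mean as a probability-weighted average of values:
vals  <- 0:3                 # number of heads in 3 tosses
probs <- c(1, 3, 3, 1) / 8   # population probabilities for a fair coin
mu    <- sum(probs * vals)   # 1.5
```

The sample quantities change every time you redraw y; the population quantities (10, 16, 4, and 1.5) do not.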
"Z" scores
Suppose Y ~ N(µ, σ²). Then
(Y - µ)/σ ~ N(0,1)
A normal (0,1) random variable is sometimes denoted "Z". For example, consider a normal
mean problem with σ known.
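The standardization (Y - µ)/σ can be tried out in R. This is a sketch with made-up values (mu = 10, sigma = 4, and the cutoff 14); none of them come from the course material.

```r
set.seed(2)                # arbitrary seed
mu <- 10
sigma <- 4
y <- rnorm(10000, mean = mu, sd = sigma)

# Z score: subtract the mean, divide by the std dev.
z <- (y - mu) / sigma
c(mean(z), sd(z))          # close to 0 and 1

# A probability about Y can be restated as a probability about Z:
# Pr[Y <= 14] = Pr[Z <= (14 - mu)/sigma] = Pr[Z <= 1]
p_y <- pnorm(14, mean = mu, sd = sigma)
p_z <- pnorm((14 - mu) / sigma)
```

This is why a single N(0,1) table is enough for every normal distribution.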
C. T-test on Means and Confidence Intervals
The so-called "normal mean problem":
You have a random sample from a N(µ, σ²) distribution:
{Y1, ..., YN} ~ iid N(µ, σ²)
The goal is to test
$H_0: \mu = \mu_0$ (null hypothesis) vs.
$H_a: \mu \neq \mu_0$ (alternative hypothesis)
Idea: compute $\bar{Y}$ (the sample average) and compare this quantity to $\mu_0$.
Problem: how precisely do you measure µ using the sample average?
Solution: compute the standard error of the sample average. The standard error is simply
the estimated standard deviation of the sample average.
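Fact: $\mathrm{Var}(\bar{Y}) = \sigma^2/N$ (remember this: averaging reduces variance)
estimate $\sigma^2$ by using $s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2$
std error of $\bar{Y}$: $\mathrm{se}(\bar{Y}) = s/\sqrt{N}$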
Basic idea in all t-tests: use the sample estimate of the parameter of interest (in this case we use
$\bar{Y}$ as an estimate of µ) and gauge how different this is from the hypothesized value by
forming the ratio of that difference to the standard error: $t = (\bar{Y} - \mu_0)/(s/\sqrt{N})$.
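The ratio described above can be computed by hand and compared with R's built-in t.test. This is a sketch on simulated data; the sample size of 25, the seed and the hypothesized value mu0 = 10 are assumptions made for illustration.

```r
set.seed(3)                          # arbitrary seed
y   <- rnorm(25, mean = 10, sd = 4)  # simulated data standing in for a real sample
mu0 <- 10                            # hypothesized mean under H0

N      <- length(y)
ybar   <- mean(y)                    # sample estimate of mu
se     <- sd(y) / sqrt(N)            # standard error of the sample average
t_stat <- (ybar - mu0) / se          # difference relative to its standard error

# The built-in one-sample t-test reports the same statistic.
fit <- t.test(y, mu = mu0)
fit$statistic
```

Doing it by hand once is worthwhile: every t-statistic later in the course has this same "estimate minus hypothesized value, over standard error" form.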
P-values and α-values
The significance level or α-value is used to set up a hypothesis test.
α = Pr[Type I error] = Pr[reject the null when the null is true]
For the t-test, we decide what is an acceptable α-level. For the given α-level, we
determine a rejection region, i.e. determine t* such that Pr[|t| > t*] = α if H0 holds.
The p-value is used to summarize the results of a particular hypothesis test.
p = Pr[ |t| > |realized value of the t-statistic| ] given that H0 is true
e.g. If you get a value of t = 8.0, then p = Pr[|t| > 8.0] given H0 true.
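Both quantities can be read off the t distribution in R. The sketch below assumes the one-sample setting with N = 25, so the statistic has N - 1 = 24 degrees of freedom under H0; the realized value t_obs = 2.5 is hypothetical.

```r
alpha <- 0.05
df    <- 24                       # N - 1 for a hypothetical sample of N = 25

# Critical value t*: Pr[|t| > t_star] = alpha when H0 holds.
t_star <- qt(1 - alpha / 2, df)

# p-value for a hypothetical realized t-statistic.
t_obs   <- 2.5
p_value <- 2 * pt(-abs(t_obs), df)

# The two decision rules agree: reject when |t| > t*, i.e. when p < alpha.
reject <- abs(t_obs) > t_star
```

Note the division of labor: α and t* are fixed before looking at the data, while the p-value is computed from the data.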
Confidence Intervals
A confidence interval is constructed around a point estimate (here, the sample mean) so that it
will contain the true parameter value with a specified probability. For example, a 95 per cent
confidence interval will cover the true population mean in about 95 out of 100 repeated samples.
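The "95 out of 100" statement is about repeated sampling, which is easy to simulate. The sketch below assumes the usual t-based interval, sample mean plus or minus a t critical value times the standard error; the parameter values, the seed and the number of replications are arbitrary.

```r
set.seed(4)                 # arbitrary seed
mu <- 10
sigma <- 4
N <- 25
reps <- 2000

# For each simulated sample, build a 95% interval and record whether it covers mu.
covered <- replicate(reps, {
  y    <- rnorm(N, mean = mu, sd = sigma)
  half <- qt(0.975, df = N - 1) * sd(y) / sqrt(N)   # half-width of the interval
  (mean(y) - half <= mu) && (mu <= mean(y) + half)
})

coverage <- mean(covered)   # share of intervals containing mu, near 0.95
```

The true mean never moves; it is the interval that is random and lands around µ in roughly 95 per cent of the samples.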
D. Covariance and Correlation.
The covariance between two random variables (X,Y) is defined to be
$\mathrm{cov}(X,Y) \equiv E[(X-\mu_x)(Y-\mu_y)] = E[(X-\mu_x)Y] \equiv \sigma_{xy}$
(You only have to take the mean away from one of them.)
Covariance is a measure of linear dependence between X and Y. To estimate σxy, we use
the sample covariance, $s_{xy} = \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})(Y_i - \bar{Y})$.
Suppose covariance is positive. This means that X and Y are both on the same side of the
mean more frequently than on opposite sides of the mean.
In E[(X - µx)(Y - µy)], the products with matching signs, (+)(+) and (-)(-),
outweigh the products with opposite signs, (-)(+) and (+)(-).
Note that the covariance is in units of X times the units of Y. Also, it is difficult to say
what a "large" value of covariance is. For these reasons, the correlation coefficient was
introduced.
$\rho \equiv \mathrm{cov}(X,Y)/(\sigma_x \sigma_y)$, with $-1 \le \rho \le 1$
The sample correlation coefficient is computed by inserting the sample estimates of
covariance and standard deviations: $r = s_{xy}/(s_x s_y)$.
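Here is a sketch of the sample covariance and correlation in R, on simulated data where y is built to depend linearly on x. The data-generating choices (200 observations, slope 2, the seed) are assumptions for illustration only.

```r
set.seed(5)                      # arbitrary seed
x <- rnorm(200)
y <- 2 * x + rnorm(200)          # y depends linearly on x, plus noise
n <- length(x)

# Sample covariance by hand, demeaning both variables ...
sxy <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)
# ... and demeaning only one of them: the result is the same.
sxy_one <- sum((x - mean(x)) * y) / (n - 1)

# Sample correlation: covariance scaled by the two std devs.
r <- sxy / (sd(x) * sd(y))

# Changing the units of x rescales the covariance but not the correlation.
cov_rescaled <- cov(100 * x, y)  # 100 times sxy
cor_rescaled <- cor(100 * x, y)  # equal to r
```

The last two lines show why the correlation is easier to interpret: it is unit-free and always lies between -1 and 1.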
E. Mean and Variance of A Linear Combination of Random Variables
Useful Facts:
E[aX] = aE[X]
$\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$
$\mathrm{Var}(X) \equiv E[(X-\mu_x)^2] = E[(X-\mu_x)X]$
$\mathrm{Cov}(X,Y) \equiv E[(X-\mu_x)(Y-\mu_y)] = E[(X-\mu_x)Y] = E[X(Y-\mu_y)]$
1. Bivariate Case
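$E[aX + bY] = aE[X] + bE[Y]$
(by linearity of expectations: the expectation of a sum is the sum of the expectations)
$\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{cov}(X,Y)$
proof: $\mathrm{Var}(aX + bY) = E[(aX + bY - a\mu_x - b\mu_y)^2] = E[(a(X-\mu_x) + b(Y-\mu_y))^2]$
$= E[a^2(X-\mu_x)^2 + b^2(Y-\mu_y)^2 + 2ab(X-\mu_x)(Y-\mu_y)]$
$= a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{cov}(X,Y)$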
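The bivariate variance formula holds exactly for sample variances and covariances too, so it can be checked numerically. This is a sketch on simulated data; the constants a = 3 and b = -2 and the way x and y are generated are arbitrary choices.

```r
set.seed(6)                       # arbitrary seed
x <- rnorm(500, mean = 1, sd = 2)
y <- 0.5 * x + rnorm(500)         # correlated with x, so the covariance term matters
a <- 3
b <- -2

# Left side: variance of the linear combination computed directly.
lhs <- var(a * x + b * y)

# Right side: built from the pieces in the formula.
rhs <- a^2 * var(x) + b^2 * var(y) + 2 * a * b * cov(x, y)

c(lhs, rhs)                       # identical up to rounding error
```

Dropping the covariance term from rhs breaks the equality, which is a quick reminder that variances only add when the variables are uncorrelated.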
2. Extension to N-variate Case
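$E[\sum_i w_i Y_i] = \sum_i w_i E[Y_i] = \mu \sum_i w_i$ (under the assumption of a sample from identical distributions)
$\mathrm{Var}(\sum_i w_i Y_i) = \sum_i w_i^2\,\mathrm{Var}(Y_i) = \sigma^2 \sum_i w_i^2$ (under the additional assumption of independence, so that all covariances are zero)
Example:
Variance of the sample mean. $\bar{Y} = \frac{1}{N}Y_1 + \frac{1}{N}Y_2 + \dots + \frac{1}{N}Y_N$.
Here $w_i = 1/N$, so $\mathrm{Var}(\bar{Y}) = N \cdot \sigma^2/N^2 = \sigma^2/N$.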
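The fact that averaging reduces variance, Var of the sample mean = σ²/N, can be seen in a simulation. The sketch below assumes independent normal draws with sigma = 4 and N = 25; the seed and the 20000 replications are arbitrary.

```r
set.seed(7)                  # arbitrary seed
sigma <- 4
N <- 25
reps <- 20000

# Draw many samples of size N and keep each sample's average.
ybars <- replicate(reps, mean(rnorm(N, mean = 10, sd = sigma)))

var_sim    <- var(ybars)     # variance of the sample mean across simulations
var_theory <- sigma^2 / N    # 16 / 25 = 0.64

# Same number from the weights: sigma^2 times the sum of squared weights.
w <- rep(1 / N, N)
var_weights <- sigma^2 * sum(w^2)
```

A single observation has variance 16 here, while the average of 25 of them has variance 0.64.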
Symbol Glossary
The following symbols have been used and introduced in this chapter
Random Variables
In general, any capital letter can be used to represent a random variable. In this chapter, we used X, Y, W, and Z.
Population Quantities
µ (mean)
σ and σ² (std dev and variance), e.g. $\sigma_Y$ is the pop. std dev of r.v. Y.
σxy (covariance between X, Y)
ρ (correlation between two r.v.s)
Sample Quantities
$\bar{Y}$ (sample average or mean)
$s_y$ or s (sample std dev.); $s_y^2$ or $s^2$ (sample variance)
sxy (sample covariance)
r (sample correlation)
Distributions
X ~ N(µ, σ²) refers to the normal distribution
Z often refers to the N(0,1) distribution.
t ~ Student t (or sometimes just t); here t refers to the value of the test statistic as well as the distribution
Mathematical Symbols
≡ (defined as), e.g. cov(x,y) ≡ σxy
Appendix A: Mathematical Prerequisites
NOTE TO THE STUDENT: This material is required and must be mastered and committed to memory. We will not have time to "relearn" these simple rules as we use them in the rest of the course.
Exponential Functions
A function having a constant base and a variable exponent is an exponential function. For example,
$y = f(x) = b^x$
where b is the base and x is the exponent.
Some rules of exponents:
1. $b^u b^v = b^{u+v}$
2. $(b^u)^v = b^{uv}$
3. $(abc)^u = a^u b^u c^u$
4. $(a/b)^u = a^u / b^u$
5. $b^{-u} = 1/b^u$
6. $b^u / b^v = b^{u-v} = 1/b^{v-u}$
7. $b^{1/u} = \sqrt[u]{b}$ where u > 0
9. $b^0 = 1$ if b ≠ 0
A frequently used exponential function is $y = e^x$, where e = 2.718… . The base e follows all the rules of exponents listed above.
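If you want to convince yourself of the rules of exponents, R makes a quick numeric check easy. This is only a sketch with arbitrary values b = 2, u = 3, v = 5; a check at one set of numbers is of course not a proof.

```r
b <- 2
u <- 3
v <- 5

# Each entry is TRUE when the two sides of the rule agree numerically.
checks <- c(
  product  = isTRUE(all.equal(b^u * b^v, b^(u + v))),
  power    = isTRUE(all.equal((b^u)^v, b^(u * v))),
  negative = isTRUE(all.equal(b^(-u), 1 / b^u)),
  quotient = isTRUE(all.equal(b^u / b^v, b^(u - v))),
  root     = isTRUE(all.equal((b^(1 / u))^u, b)),   # b^(1/u) is the u-th root of b
  zero     = b^0 == 1,
  base_e   = isTRUE(all.equal(exp(u) * exp(v), exp(u + v)))
)
print(checks)
```

In R, exp(x) is e raised to the power x, so the last line is the product rule with base e.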
Logarithmic Function
If $y = b^x$
where b > 0, b ≠ 1, then x is the logarithm of y to the base b. Therefore,
$x = f(y) = \log_b y$
is a logarithmic function. Note that exponential and logarithmic functions are inverse functions.
Some rules of logarithms:
1. $\log_b(uv) = \log_b u + \log_b v$
2. $\log_b(u/v) = \log_b u - \log_b v$
3. $\log_b(u^n) = n \log_b u$
4. $\log_b b = 1$
5. $\log_b 1 = 0$
Two frequently used bases in logarithms are 10 (common logarithms) and e (natural logarithms). Both obey the above rules.
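The rules of logarithms can be checked the same way. In R, log(x, base = b) is the logarithm to base b, log(x) is the natural logarithm and log10(x) is the common logarithm. The values b = 10, u = 8, v = 2, n = 3 are arbitrary choices for this sketch.

```r
b <- 10
u <- 8
v <- 2
n <- 3

# Each entry is TRUE when the two sides of the rule agree numerically.
log_checks <- c(
  product  = isTRUE(all.equal(log(u * v, base = b), log(u, base = b) + log(v, base = b))),
  quotient = isTRUE(all.equal(log(u / v, base = b), log(u, base = b) - log(v, base = b))),
  power    = isTRUE(all.equal(log(u^n, base = b), n * log(u, base = b))),
  base     = isTRUE(all.equal(log(b, base = b), 1)),
  one      = log(1, base = b) == 0,
  inverse  = isTRUE(all.equal(b^log(u, base = b), u)),   # exponential undoes the log
  natural  = isTRUE(all.equal(log(exp(u)), u))           # same with base e
)
print(log_checks)
```

The last two lines illustrate the inverse-function relationship between exponentials and logarithms.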
Summation Notation
The summation notation, Σ, is a shorthand notation for summing over a group of terms. Thus, for i = 1, 2, ..., n,
$\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n$,
where i is the summation index. Note that Σ is a linear operator. Hence, for a constant c, we have
$\sum c\,x_i = c \sum x_i$ (we can move c out of the Σ-sign),
$\sum c = c \sum 1 = cn$,
and $\sum (x_i - y_i) = \sum x_i - \sum y_i$.
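In R the Σ operator is the sum function, and the three linearity rules can be tried on small vectors. The numbers below are made up for this sketch, and the constant is called cc to avoid masking R's c() function.

```r
x  <- c(2, 4, 6, 8)
y  <- c(1, 3, 5, 7)
cc <- 3                 # the constant
n  <- length(x)

# Rule 1: a constant can be moved outside the sum.
rule1 <- sum(cc * x) == cc * sum(x)          # both sides are 60

# Rule 2: summing a constant n times gives c times n.
rule2 <- sum(rep(cc, n)) == cc * n           # both sides are 12

# Rule 3: the sum of differences is the difference of sums.
rule3 <- sum(x - y) == sum(x) - sum(y)       # both sides are 4
```

All three comparisons are exact here because the vectors hold small whole numbers.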
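Definitions of sample moments:
sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Example: One "interesting" property of the sample average is that the sum of deviations from the sample average is zero, i.e.
$\sum_{i=1}^{n}(x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = 0$.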
This is a very useful fact in what we will have to do in the chapters below.
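The zero-sum property of deviations is easy to see on a small made-up data set. The five numbers below are arbitrary; with real-valued data the sum may differ from zero by a tiny rounding error.

```r
x <- c(2, 4, 6, 8, 15)

xbar <- mean(x)        # sample average, 7
dev  <- x - xbar       # deviations from the sample average: -5 -3 -1 1 8
sum(dev)               # 0
```

The negative deviations (-5, -3, -1) exactly offset the positive ones (1, 8), whatever the data are.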