Key Statistical and Mathematical Prerequisites

NOTE TO THE STUDENT:

This document is designed to review previously acquired statistics knowledge. It is not

designed to teach this material from scratch. If you aren't familiar with the basic concepts,

please consult your basic stats book. The purpose of this document is to review and

introduce notation consistent with our course.

A. Random Sample

The basic building block of statistics is the random variable. A random variable is

the quantifier of our world; it associates a number with each event in the world. A classic

example of this is the number of heads in a sequence of 3 coin tosses. The events are the

possible outcomes (e.g. HHH, HTH, TTH, ...); the variable X, defined as the number of

heads in the sequence, associates a real number with each event. Another example is the

random variable H which represents the height of a person. Of course, a person is a

complicated set of events (genetic inheritance, nutritional background ...). The H variable

measures or quantifies only one aspect of this complicated set of outcomes that make up

the person. The height random variable has a probability distribution which is the

distribution of all heights in the population.

A random sample is a collection of independent and identically distributed

random variables: {X1, X2, ..., XN} are N iid r.v.s. The important idea to remember here is

that a random sample describes a mechanism for obtaining the data. The actual data we

analyze is a realization or draw from the random sample.
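
The distinction between the sampling mechanism and a realization can be sketched in R. This is only an illustration: the fair coin, the sample size N = 5, and the seed are assumptions made for the example, not part of the course material.

```r
# Mechanism: toss a fair coin 3 times and let X be the number of heads.
# A random sample {X1, ..., XN} repeats this mechanism N times independently.
set.seed(1)   # arbitrary seed, only so the example is reproducible
N <- 5        # hypothetical sample size

# One realization (draw) from the random sample:
x <- rbinom(N, size = 3, prob = 0.5)

# Running the same mechanism again gives another realization:
x_again <- rbinom(N, size = 3, prob = 0.5)

print(x)
print(x_again)
```

The mechanism is the same in both calls; only the realized numbers differ.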

For example, if we draw one person from each of N rooms in the building by a

random sampling method and measure their heights, we have constructed a

random sample {H1, H2, ..., HN}. These observations are independent in the sense that

the height of the person drawn from room 302 is unrelated to the height of someone drawn


students as the distribution from which we are making draws. Unless certain classes

attract unusually short or tall students, we can make the assumption that each time we


B. Normal Distribution

The normal distribution is a basic building block of statistics. Nothing is ever exactly

normally distributed, but the normal distribution can serve as a good approximation,

especially in regression analysis.

There are many different normal distributions. The normal distribution has two parameters: the mean ($\mu$) and the variance ($\sigma^2$, or equivalently the standard deviation $\sigma$). If you tell me $\mu$

and $\sigma$, then I know everything about this distribution. Thus, I only have to carry around two

numbers to summarize the complete distribution rather than a large table like a histogram.

Key features of the normal distribution:

1. symmetric about the mean (which equals the mode and the median),

2. thin tails ($\Pr[\,|Y - \mu| > 3\sigma\,] \approx .0027$).
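
Both features can be checked numerically with the normal cdf function pnorm. The sketch below is illustrative; the values mu = 10, sigma = 4 and the offset 5 are assumptions chosen for the example.

```r
# Thin tails: probability of falling more than 3 standard deviations
# from the mean. By standardization this is the same for every normal.
tail3 <- 2 * pnorm(-3)
print(round(tail3, 4))

# Symmetry: equal probability below mu - c and above mu + c.
mu <- 10      # hypothetical mean
sigma <- 4    # hypothetical standard deviation
left  <- pnorm(mu - 5, mean = mu, sd = sigma)
right <- pnorm(mu + 5, mean = mu, sd = sigma, lower.tail = FALSE)
print(c(left, right))
```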

Let's look at some random samples from various normal distributions:

The R commands to generate various distributions are: rbeta,

rbinom, rcauchy, rchisq, rexp, rf, rgamma, rgeom, rhyper,

rlogis, rlnorm, rmultinom, rnbinom, rnorm, rpois, rsignrank, rt,

runif, rweibull, and rwilcox.

Suppose you want to generate 50 observations from a binomial distribution with 10 trials and a .3 probability of success.

> x=rbinom(50,10,.3)

Suppose we wish to simulate a vector of 100 independent,

standard (i.e., mean 0 and standard deviation 1) normal random

variables. We use the rnorm function for this purpose, and its

defaults are mean=0 and standard deviation=1. Thus, we may simply type

> x=rnorm(100)


If the mean is 10 and the std dev is 4, then the commands to draw the sample and plot its histogram are

> x=rnorm(100,mean=10,sd=4)

> hist(x)

Population Moments and Sample Moments

The population moments are expected (average) values taken with respect to the

population distribution rather than the sample distribution.

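For example, if $Y_i \sim \text{iid } N(\mu, \sigma^2)$, then

pop. mean: $E[Y_i] = \mu$          sample mean: $\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$

pop. var.: $E[(Y_i - \mu)^2] = \sigma^2$          sample var.: $s^2 = \frac{1}{N-1}\sum_{i=1}^{N} (Y_i - \bar{Y})^2$

pop. std dev.: $\sqrt{E[(Y_i - \mu)^2]} = \sigma$          sample std dev.: $s = \sqrt{s^2}$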

You can think of the population moments (using the E[ ] notation) as simply a weighted

average of values with weights from the population distribution (i.e. $\mu = \sum_i p_i y_i$).
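
As a rough illustration, the sample moments can be computed in R and compared with the population values. The sample size, seed, and parameter values below are assumptions for the example. Note that R's var() and sd() use the N - 1 divisor.

```r
set.seed(42)   # arbitrary seed for reproducibility
N <- 1000      # hypothetical sample size
y <- rnorm(N, mean = 10, sd = 4)   # population: mu = 10, sigma^2 = 16

ybar <- mean(y)   # sample mean, estimates mu
s2   <- var(y)    # sample variance (N - 1 divisor), estimates sigma^2
s    <- sd(y)     # sample std dev, estimates sigma

# The same sample variance written out by hand:
s2_by_hand <- sum((y - ybar)^2) / (N - 1)

# A population mean as a weighted average, mu = sum of p_i * y_i,
# for the number of heads in 3 fair coin tosses:
vals <- 0:3
p <- dbinom(vals, size = 3, prob = 0.5)
mu_heads <- sum(p * vals)

print(c(ybar = ybar, s2 = s2, s = s, mu_heads = mu_heads))
```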

"Z" scores

Suppose $Y \sim N(\mu, \sigma^2)$. Then

$(Y - \mu)/\sigma \sim N(0,1)$

A normal (0,1) random variable is sometimes denoted "Z". For example, consider a normal

mean problem with σ known.
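
The standardization can be tried out in R. In this sketch the values mu = 10, sigma = 4, the cutoff 14, and the seed are assumptions for illustration.

```r
set.seed(7)   # arbitrary seed for reproducibility
mu <- 10      # hypothetical population mean
sigma <- 4    # hypothetical population std dev

y <- rnorm(10000, mean = mu, sd = sigma)
z <- (y - mu) / sigma   # Z scores

# The Z scores should have mean near 0 and std dev near 1.
print(c(mean(z), sd(z)))

# A probability about Y can be computed from the N(0,1) distribution:
p_direct <- pnorm(14, mean = mu, sd = sigma)   # Pr[Y <= 14]
p_via_z  <- pnorm((14 - mu) / sigma)           # Pr[Z <= 1]
print(c(p_direct, p_via_z))
```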


C. T-test on Means and Confidence Intervals

The so-called "normal mean problem":

You have a random sample from the $N(\mu, \sigma^2)$ distribution:

$\{Y_1, \ldots, Y_N\} \sim \text{iid } N(\mu, \sigma^2)$

The goal is to test

$H_0: \mu = \mu_0$ (null hypothesis)

vs

$H_a: \mu \neq \mu_0$ (alternative hypothesis)

Idea: compute $\bar{Y}$ (the sample average) and compare this quantity to $\mu_0$.

Problem: how precisely do you measure µ using the sample average?

Solution: compute the standard error of the sample average. The standard error is simply

the estimated standard deviation of the sample average.

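Fact: $\mathrm{Var}(\bar{Y}) = \sigma^2/N$ (remember this: averaging reduces variance)

estimate $\sigma^2$ by using $s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2$

std error of $\bar{Y}$: $\mathrm{se}(\bar{Y}) = s/\sqrt{N}$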

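Basic idea in all t-tests: use the sample estimate of the parameter of interest (in this case we use $\bar{Y}$ as an estimate of $\mu$) and gauge how different this is from the hypothesized value by forming the ratio of that difference to the standard error:

$t = (\bar{Y} - \mu_0)/\mathrm{se}(\bar{Y})$, where $\mathrm{se}(\bar{Y})$ is the standard error of the sample average.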
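
A minimal sketch of this calculation in R, compared against the built-in t.test function. The data are simulated, and the values of N, mu0, the true mean, and the seed are assumptions for illustration. The standard error here uses s computed with the N - 1 divisor, as R's sd() does.

```r
set.seed(123)   # arbitrary seed for reproducibility
N   <- 25       # hypothetical sample size
mu0 <- 10       # hypothesized mean under H0
y   <- rnorm(N, mean = 11, sd = 4)   # simulated data (true mean 11)

ybar   <- mean(y)             # sample average
se     <- sd(y) / sqrt(N)     # standard error of the sample average
t_stat <- (ybar - mu0) / se   # ratio of the difference to the std error

# R's one-sample t-test should report the same statistic.
fit <- t.test(y, mu = mu0)
print(c(by_hand = t_stat, t_test = unname(fit$statistic)))
```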

P-values and α-values

The significance level or α-value is used to set up a hypothesis test.

$\alpha = \Pr[\text{Type I error}] = \Pr[\text{reject the null when the null is true}]$

For the t-test, we decide what is an acceptable α-level. For the given α-level, we determine a rejection region, i.e. determine $t^*$ such that $\Pr[\,|t| > t^*\,] = \alpha$ if $H_0$ holds.

The p-value is used to summarize the results of a particular hypothesis test.

$p = \Pr[\,|t| > |\text{realized value of the t-statistic}|\,]$, computed assuming $H_0$ is true.

e.g. If you get a value of t = 8.0, then $p = \Pr[\,|t| > 8.0\,]$ given $H_0$ true.
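
In R, the critical value and the p-value for a two-sided t-test can be obtained from the Student t quantile and cdf functions. The sketch below assumes a hypothetical sample of N = 25 (so 24 degrees of freedom) and a hypothetical realized statistic of 2.5.

```r
alpha <- 0.05   # chosen significance level
df    <- 24     # degrees of freedom, N - 1 for hypothetical N = 25

# Critical value t* with Pr[|t| > t*] = alpha under H0:
t_crit <- qt(1 - alpha / 2, df)

# p-value for a hypothetical realized t-statistic:
t_obs   <- 2.5
p_value <- 2 * pt(-abs(t_obs), df)

# Rejecting when |t| > t* is the same decision as rejecting when p < alpha.
reject <- abs(t_obs) > t_crit
print(c(t_crit = t_crit, p_value = p_value))
print(reject)
```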

Confidence Intervals

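A confidence interval is constructed around a point estimate (here the sample mean) so that, in repeated sampling, a stated proportion of the intervals so constructed will contain the true parameter value. For example, a 95 per cent confidence interval will cover the true population mean in about 95 out of every 100 repeated samples.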
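
The repeated-sampling interpretation can be illustrated by simulation. The sketch below assumes the usual t-based interval, sample mean plus or minus the t critical value times the standard error, which is consistent with the t-test above; the population values, sample size, number of replications, and seed are assumptions for the example.

```r
set.seed(2020)   # arbitrary seed for reproducibility
mu <- 10; sigma <- 4   # hypothetical population values
N <- 25                # hypothetical sample size
reps <- 2000           # number of repeated samples

covered <- replicate(reps, {
  y <- rnorm(N, mean = mu, sd = sigma)
  half_width <- qt(0.975, df = N - 1) * sd(y) / sqrt(N)
  # Does this sample's 95% interval contain the true mean?
  (mean(y) - half_width <= mu) && (mu <= mean(y) + half_width)
})

coverage <- mean(covered)   # fraction of intervals covering mu
print(coverage)
```

The printed fraction should be close to 0.95, though it will vary somewhat with the seed.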


D. Covariance and Correlation

The covariance between two random variables (X,Y) is defined to be

$\mathrm{cov}(X,Y) \equiv E[(X-\mu_x)(Y-\mu_y)] = E[(X-\mu_x)Y] \equiv \sigma_{xy}$

(You only have to take the mean away from one of them.)

Covariance is a measure of linear dependence between X and Y. To estimate $\sigma_{xy}$, we use

the sample covariance, $s_{xy}$.

Suppose covariance is positive. This means that X and Y are both on the same side of the

mean more frequently than on opposite sides of the mean.

$E[(X - \mu_x)(Y - \mu_y)] > 0$ when the products with matching signs, (+)(+) and (-)(-), outweigh the products with opposite signs, (-)(+) and (+)(-).

Note that the covariance is in units of X times the units of Y. Also, it is difficult to say

what a "large" value of covariance is. For these reasons, the correlation coefficient was

introduced.

$\rho \equiv \mathrm{cov}(X,Y)/(\sigma_x \sigma_y)$, with $-1 \le \rho \le 1$


The sample correlation coefficient, $r$, is computed by inserting the sample estimates of

the covariance and standard deviations: $r = s_{xy}/(s_x s_y)$.
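
A short R sketch of these sample quantities follows. The simulated data, the seed, and the N - 1 divisor in the hand-written sample covariance (the convention R's cov() uses) are assumptions for illustration.

```r
set.seed(11)   # arbitrary seed for reproducibility
N <- 200       # hypothetical sample size
x <- rnorm(N)
y <- 2 * x + rnorm(N)   # y depends linearly on x, plus noise

sxy      <- cov(x, y)   # sample covariance
sxy_hand <- sum((x - mean(x)) * (y - mean(y))) / (N - 1)
# Taking the mean away from only one of the variables gives the same value:
sxy_one  <- sum((x - mean(x)) * y) / (N - 1)

r      <- cor(x, y)                  # sample correlation
r_hand <- sxy / (sd(x) * sd(y))      # covariance over product of std devs

# Changing the units of x rescales the covariance but not the correlation.
cov_rescaled <- cov(100 * x, y)
cor_rescaled <- cor(100 * x, y)
print(c(sxy = sxy, r = r, cov_rescaled = cov_rescaled, cor_rescaled = cor_rescaled))
```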


E. Mean and Variance of a Linear Combination of Random Variables

Useful Facts:

$E[aX] = aE[X]$

$\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$

$\mathrm{Var}(X) \equiv E[(X-\mu_x)^2] = E[(X-\mu_x)X]$

$\mathrm{Cov}(X,Y) \equiv E[(X-\mu_x)(Y-\mu_y)] = E[(X-\mu_x)Y] = E[X(Y-\mu_y)]$

1. Bivariate Case

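$E[aX + bY] = aE[X] + bE[Y]$ (by linearity of expectations: the expectation of a sum is the sum of the expectations)

$\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{cov}(X,Y)$

proof: $E[(aX + bY - a\mu_x - b\mu_y)^2] = E[(a(X-\mu_x) + b(Y-\mu_y))^2]$

$= E[a^2(X-\mu_x)^2 + b^2(Y-\mu_y)^2 + 2ab(X-\mu_x)(Y-\mu_y)]$

$= a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{cov}(X,Y)$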
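
The bivariate formulas also hold exactly for the sample mean, variance, and covariance, which gives a quick numerical check in R. The simulated data, the constants a and b, and the seed are assumptions for illustration.

```r
set.seed(5)   # arbitrary seed for reproducibility
x <- rnorm(500)
y <- 0.5 * x + rnorm(500)   # correlated with x
a <- 2; b <- -3             # hypothetical constants

# Mean of a linear combination:
mean_lhs <- mean(a * x + b * y)
mean_rhs <- a * mean(x) + b * mean(y)

# Variance of a linear combination:
var_lhs <- var(a * x + b * y)
var_rhs <- a^2 * var(x) + b^2 * var(y) + 2 * a * b * cov(x, y)

print(c(mean_lhs = mean_lhs, mean_rhs = mean_rhs))
print(c(var_lhs = var_lhs, var_rhs = var_rhs))
```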

2. Extension to N-variate Case

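$E[\sum w_i Y_i] = \sum w_i E[Y_i] = \mu \sum w_i$ (under the assumption of a sample from identical distributions)

$\mathrm{Var}(\sum w_i Y_i) = \sum w_i^2\,\mathrm{Var}(Y_i) = \sigma^2 \sum w_i^2$ (under the assumption of an iid sample, so that all covariances are zero)

Example:

Variance of the sample mean. $\bar{Y} = \frac{1}{N} Y_1 + \frac{1}{N} Y_2 + \cdots + \frac{1}{N} Y_N$.

Here $w_i = 1/N$, so $\mathrm{Var}(\bar{Y}) = \sigma^2 \sum_{i=1}^{N} (1/N)^2 = \sigma^2/N$.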
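
The fact that the sample mean of N iid draws has variance $\sigma^2/N$ can be checked by simulation. In this sketch the values N = 20, sigma = 4, the number of replications, and the seed are assumptions for illustration.

```r
set.seed(99)   # arbitrary seed for reproducibility
N <- 20        # hypothetical sample size
sigma <- 4     # hypothetical population std dev

# Weights of the sample mean and the implied variance sigma^2 * sum(w^2):
w <- rep(1 / N, N)
var_formula <- sigma^2 * sum(w^2)   # equals sigma^2 / N

# Simulate many sample means and look at their variance.
means <- replicate(5000, mean(rnorm(N, mean = 10, sd = sigma)))
var_simulated <- var(means)

print(c(formula = var_formula, simulated = var_simulated))
```

The two numbers should be close, with the simulated value varying slightly from run to run if the seed is changed.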

Symbol Glossary

The following symbols have been used and introduced in this chapter.

Random Variables

In general, any capital letter can be used to represent a random variable. In this chapter, we used X, Y, W, and Z.

Population Quantities

$\mu$ (mean)

$\sigma$ and $\sigma^2$ (std dev and variance), e.g. $\sigma_Y$ is the pop. std dev of r.v. Y.

$\sigma_{xy}$ (covariance between X, Y)

$\rho$ (correlation between two r.v.s)

Sample Quantities

$\bar{Y}$ (sample average or mean)

$s_y$ or $s$ (sample std dev); $s_y^2$ or $s^2$ (sample variance)

$s_{xy}$ (sample covariance)

$r$ (sample correlation)

Distributions

$X \sim N(\mu, \sigma^2)$ means that X follows the normal distribution with mean $\mu$ and variance $\sigma^2$

often Z refers to the N(0,1) distribution.

$t \sim$ Student t (or sometimes just t); here t refers to the value of the test statistic as well as the distribution

Mathematical Symbols

$\equiv$ (defined as), e.g. $\mathrm{cov}(X,Y) \equiv \sigma_{xy}$


Appendix A: Mathematical Prerequisites

Note to the student: this material is required and must be mastered and committed to memory. We will not have time to "relearn" these simple rules as we use them in the rest of the course.

Exponential Functions

A function having a constant base and a variable exponent is an exponential function. For example,

$y = f(x) = b^x$

where b is the base and x is the exponent.

Some rules of exponents:

1. $b^u b^v = b^{u+v}$

2. $(b^u)^v = b^{uv}$

3. $(abc)^u = a^u b^u c^u$

4. $(a/b)^u = a^u / b^u$

5. $b^{-u} = 1/b^u$

6. $b^u / b^v = b^{u-v} = 1/b^{v-u}$

7. $b^{1/u} = \sqrt[u]{b}$ where u > 0

9. $b^0 = 1$ if $b \neq 0$

A frequently used exponential function is $y = e^x$, where e = 2.718… . The base e follows all the rules of exponents listed above.
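
The rules of exponents can be spot-checked numerically in R. This is only a sanity check at one set of values, not a proof; the particular numbers are arbitrary assumptions. The comparisons use all.equal because floating-point results may differ in the last digits.

```r
# Arbitrary positive values for the check
a <- 1.5; b <- 2; k <- 3
u <- 4; v <- 2.5

checks <- c(
  product  = isTRUE(all.equal(b^u * b^v, b^(u + v))),
  power    = isTRUE(all.equal((b^u)^v, b^(u * v))),
  factors  = isTRUE(all.equal((a * b * k)^u, a^u * b^u * k^u)),
  quotient = isTRUE(all.equal((a / b)^u, a^u / b^u)),
  negative = isTRUE(all.equal(b^(-u), 1 / b^u)),
  ratio    = isTRUE(all.equal(b^u / b^v, b^(u - v))),
  root     = isTRUE(all.equal((b^(1 / u))^u, b)),
  zero     = b^0 == 1,
  base_e   = isTRUE(all.equal(exp(u) * exp(v), exp(u + v)))
)
print(checks)
```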

Logarithmic Function

If

$y = b^x$

where b > 0, b ≠ 1, then x is the logarithm of y to the base b. Therefore,

$x = f(y) = \log_b y$

is a logarithmic function. Note that exponential and logarithmic functions are inverse functions.

Some rules of logarithms:

1. $\log_b(uv) = \log_b u + \log_b v$

2. $\log_b(u/v) = \log_b u - \log_b v$

3. $\log_b u^n = n \log_b u$

4. $\log_b b = 1$

5. $\log_b 1 = 0$

Two frequently used bases in logarithms are 10 (common logarithms) and e (natural logarithms). Both obey the above rules.
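
The rules of logarithms can be spot-checked in R in the same way. The values below are arbitrary assumptions. In R, log(x, base = b) gives the logarithm to base b, log(x) is the natural logarithm, and log10(x) is the common logarithm.

```r
# Arbitrary values for the check
b <- 10; u <- 8; v <- 2; n <- 3

checks <- c(
  product  = isTRUE(all.equal(log(u * v, base = b),
                              log(u, base = b) + log(v, base = b))),
  quotient = isTRUE(all.equal(log(u / v, base = b),
                              log(u, base = b) - log(v, base = b))),
  power    = isTRUE(all.equal(log(u^n, base = b), n * log(u, base = b))),
  base     = isTRUE(all.equal(log(b, base = b), 1)),
  one      = log(1, base = b) == 0,
  # Exponential and logarithmic functions are inverses:
  inverse  = isTRUE(all.equal(b^log(u, base = b), u)),
  natural  = isTRUE(all.equal(log(exp(1)), 1)),
  common   = isTRUE(all.equal(log10(100), 2))
)
print(checks)
```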

Summation Notation

The summation notation, Σ, is a shorthand notation for summing over a group of terms. Thus, for i = 1, 2, ..., n,

$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$

where i is the summation index. Note that Σ is a linear operator. Hence, for c a constant, we have

$\sum c x_i = c \sum x_i$ (we can move c out of the Σ-sign),

$\sum c = c \sum 1 = cn$,

and $\sum (x_i - y_i) = \sum x_i - \sum y_i$.
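
In R the sum() function plays the role of Σ, so the three rules can be verified on a small example. The vectors and the constant below are arbitrary assumptions chosen so the arithmetic is exact.

```r
x <- c(2, 4, 6, 8)   # hypothetical x_1, ..., x_n
y <- c(1, 3, 5, 7)   # hypothetical y_1, ..., y_n
k <- 5               # a constant (the 'c' in the rules)
n <- length(x)

rule1 <- sum(k * x) == k * sum(x)         # constant moves out of the sum
rule2 <- sum(rep(k, n)) == k * n          # summing a constant n times
rule3 <- sum(x - y) == sum(x) - sum(y)    # sum of differences
print(c(rule1, rule2, rule3))
```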

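Summation notation appears in the definitions of the sample moments:

sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$

Example: one "interesting" property of the sample average is that the sum of deviations from the sample average is zero, i.e.

$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$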

This is a very useful fact in what we will have to do in the chapters below.
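
The zero-sum property of deviations from the sample average follows directly from the summation rules. A short derivation, using only the linearity of Σ and the definition $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ (so that $\sum_{i=1}^{n} x_i = n\bar{x}$):

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})
  &= \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} \bar{x}
     && \text{(sum of differences)} \\
  &= \sum_{i=1}^{n} x_i - n\bar{x}
     && \text{($\bar{x}$ is constant in $i$)} \\
  &= n\bar{x} - n\bar{x} = 0 .
\end{align*}
```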
