• No results found

Basic statistics formulas

N/A
N/A
Protected

Academic year: 2021

Share "Basic statistics formulas"

Copied!
5
0
0

Loading.... (view fulltext now)

Full text

(1)

Basic statistics formulas

Sets

De Morgan’s Law (AUB)c = Ac∩ Bc & (A∩B)c = Ac U Bc Commutativity A U B = B U A and A ∩ B = B ∩ A Associativity (A U B) U C = A U (B U C) (A ∩ B) ∩ C = A ∩ (B ∩ C) Distributivity (A U B) ∩ C = (A ∩ C) U (B ∩ C) (A ∩ B) U C = (A U C) ∩ (B U C)

Probability

• p(A) = 1 - p(Ac)

• p(A U B) = p(A) + p(B) - p(A ∩ B)

• If events A and B are mutually independent then p(A ∩ B) = p(A)p(B)

• p(A|B) = p(A ∩ B)/p(B) so long as p(B)>0 • p(A ∩ B) = p(A|B)p(B) = p(B|A)p(A) so

long as p(A)>0 and p(B)>0

• If {B1, B2...,Bk} is a set of mutually

excusive and exhaustive events, then

p(A) = p(A|B1)p(B1)+ p(A|B2)p(B2)+...+p(A|Bk)p(Bk)

Measures of Location

Sample mean x x n =

Median (for raw data i.e list of numbers ungrouped)

List the numbers in ascending order. Median is: (n+1)/2 value if n is odd;

Mean of n/2th and (n+1)/2th values if n is even.

Measures of spread

Sample variance

(

)

2 2 1 i x x s n − = −

Sample standard deviation,s

var

s= iance

Range

Largest value – smallest value Interquartile range (IQR) Upper quartile – lower quartile Coefficient of variation

/ s x

(2)

Expectation, variance, and

covariance

If random variable X is discrete: ( ) i ( )i

E X =

x p x

which is calculated over all possible values of X. Let g(X) denote a function of discrete X, then:

( ( )) ( ) ( )i i E g X =

g x p x Expection rules

Let X, Y, Z denote random variables; a, b denote constants

E1. E(a) = a

E2. E(aX) = a E(X)

E3. E(X+Y) = E(X) + E(Y) Variance rules

V1. var(a) = 0

V2. var(aX) = a2var(X)

V3. var(X Y) = var(X) + var(Y) 2*Cov(X,Y)

Covariance rules C1. Cov(a,X) = 0

C2. Cov(aX,bY) = ab*Cov(X,Y)

C3. Cov(X+Y,Z) = Cov(X,Z) + Cov(Y,Z)

Normal density function

The normal density function (aka normal distribution) has 2 parameters: mean and variance. For the normal distribution: 90% of data falls within ±1.65σ 95% of data falls within ±1.95σ Standardizing (Z-score) x z µ σ − =

Sampling distribution of

sample mean

Suppose XN( ,µ σ2), then the distribution of the sample mean X is

2 ( , ) X N n σ µ ∼

The standard error of X is 2

n

σ

This result is true if X does not follow the normal distribution but n is large (and then the result follows because of the Central Limit Theorem)

(3)

Confidence intervals for the mean

Parameter

Assumptions

Formula

Data normally distributed or n is large (n>30); 2 σ known / 2 x z n α σ ± Mean µ Data normally distributed; n small; 2 σ unknown / 2,n1 s x t n α − ±

Data are normally distributed; 2 2 , X Y σ σ are known 2 2 / 2 ( ) X Y X Y x y z n n α σ σ   − ±  +    Variances unknown; Large samples 2 2 / 2 ( ) X X X X s s x y z n n α   − ±  +    Difference in means X Y µ −µ Case of 2 independent distributions

Data are normally distributed; 2 2 , X Y σ σ are unknown but σX2 =σY2

(

)

2 / 2, 2 1 1 X Y n n p X Y x y t s n n α + −   − ±  +   

Where the estimate of the pooled variance is 2 2 2 ( 1) ( 1) 2 X X Y Y p X Y n s n s s n n − + − = + −

(4)

Hypothesis test for the mean

Hypothesis

Assumption

Test equation

Data normally distributed,

or large sample; 2 σ known / x a n σ −

Z-table for critical value Testing a single mean

equals a value “a” :

o

H µ=a

Data normally distributed 2

σ unknown /

x a

s n

t-table with df = n-1 for critical value

Data normally distributed, or large sample; Independent samples; 2 2 , X Y σ σ are known 2 2 X Y X Y x y a n n σ σ − −   +    

Z-table for critical value Data normally distributed,

or large sample; Independent samples; 2 2 , X Y σ σ are unknown 2 2 X Y X Y x y a s s n n − −   +    

Z-table for critical value Testing the difference

between 2 means equals a number “a” (which

includes the case of a=0 which is a test for no difference between means)

:

o X Y

H µ −µ =a Case of 2 independent distributions

Data are normally distributed;

2 2 ,

X Y

σ σ are unknown but 2 2 X Y σ =σ 2 1 1 p X Y x y a s n n − −   +    

Where the estimate of the pooled variance is

2 2

(n −1)s +(n −1)s =

(5)

Covariance and correlation

Parameter Population formula Sample formula

Covariance between variables X and Y

COV(X,Y) = E(XY) – E(X)E(Y) ( )( )

1 i i x x y y n − − −

Pearson’s correlation ( , ) X Y Cov X Y σ σ X Y xy nxy s s

Simple linear regression

Model: for i = 1,2,…,n i i i y = +α βx +u OLS estimators Intercept Slope

(

)

2 2 2 2 var( ) i i y x x n x nx α β σ α = − = −

⌢ 2 2 2 2 2 var( ) i i xy nxy x nx x nx β σ β − = − = −

⌢ ⌢

Estimator for variance of error term u is

(

)

2 2 i i y y n − −

References

Related documents

The testimony 1 by United States General Accounting Office (GAO), 2003 revealed that GPOs didn’t lead to cost savings consistently across different healthcare products. The

• Identify Strategic Drivers • Evaluate the Full Range of Options • Assess Internal Capabilities • Determine Scope and Logic • Create the Project Team • Link Buyer’s Needs to

The March indictment of Maduro and other senior officials on charges of narcotrafficking and support for terrorism against the US ratchets up the pressure on the Venezuelan

If you miss any services or take your vehicle to an uncertified repair facility other then where you purchased your vehicle, you might void your warranty agreement..

>= If the value of left operand is greater than or equal to the value of right operand, then condition becomes true.. (a >= b) is not

This research proposes an alternative restoration technique to visualize architectural heritage information at the historical site by employing MARR application and

The objective of the current study was to assess serum level of prolactin (PRL) and tumor necrosis factor alpha (TNF α ) in patients with relapsing remitting multiple sclerosis

You are a good worker Usted es un buen trabajador You do good work Usted hace un buen trabajo You have to fill out this Tiene que llenar esta solicitud You will need ...