FRM 2


P1.T2.

Quantitative Analysis

FRM 2012 QUANTITATIVE ANALYSIS, www.bionicturtle.com

Stock, Chapter 2: Review of Probability

Stock, Chapter 3: Review of Statistics

Stock, Chapter 4: Linear Regression with One Regressor

Stock, Chapter 5: Single Regression: Hypothesis Tests and Confidence Intervals

Stock, Chapter 6: Linear Regression with Multiple Regressors

Stock, Chapter 7: Hypothesis Tests and Confidence Intervals in Multiple Regression

Rachev, Menn, and Fabozzi, Chapter 2: Discrete Probability Distributions

Rachev, Menn, and Fabozzi, Chapter 3: Continuous Probability Distributions

Jorion, Chapter 12: Monte Carlo Methods

Hull, Chapter 22: Estimating Volatilities and Correlations


Stock, Chapter 2:

Review of Probability

In this chapter…

 Define random variables, and distinguish between continuous and discrete random variables.

 Define the probability of an event.

 Define, calculate, and interpret the mean, standard deviation, and variance of a random variable.

 Define, calculate, and interpret the skewness, and kurtosis of a distribution.

 Describe joint, marginal, and conditional probability functions.

 Explain the difference between statistical independence and statistical dependence.

 Calculate the mean and variance of sums of random variables.

 Describe the key properties of the normal, standard normal, multivariate normal, Chi-squared, Student t, and F distributions.

 Define and describe random sampling and what is meant by i.i.d.

 Define, calculate, and interpret the mean and variance of the sample average.

 Describe, interpret, and apply the Law of Large Numbers and the Central Limit Theorem.

Define random variables, and distinguish between continuous and discrete random variables.

We characterize (describe) a random variable with a probability distribution. The random variable can be discrete or continuous; and in either the discrete or continuous case, the probability can be local (PMF, PDF) or cumulative (CDF).

A random variable is a variable whose value is determined by the outcome of an experiment (a.k.a., stochastic variable). “A random variable is a numerical summary of a random outcome. The number of times your computer crashes while you are writing a term paper is random and takes on a numerical value, so it is a random variable.” (S&W)

Continuous random variable

A continuous random variable (X) has an infinite number of values within an interval:

$$P(a \le X \le b) = \int_a^b f(x)\,dx$$

Probability function versus cumulative distribution function:

|            | Probability function (pdf, pmf)   | Cumulative distribution function (CDF) |
| Continuous | Pr(c1 ≤ Z ≤ c2) = Φ(c2) − Φ(c1)   | Pr(Z ≤ c) = Φ(c)                       |
| Discrete   | Pr(X = 3)                         | Pr(X ≤ 3)                              |


Discrete random variable

A discrete random variable (X) assumes a value among a finite set including x1, x2, x3 and so on. The probability function is expressed by:

$$P(X = x_k) = f(x_k)$$

Notes on continuous versus discrete random variables

 Discrete random variables can be counted. Continuous random variables must be measured.

 Examples of a discrete random variable include: coin toss (head or tails, nothing in between); roll of the dice (1, 2, 3, 4, 5, 6); and “did the fund beat the benchmark?”(yes, no). In risk, common discrete random variables are default/no default (0/1) and loss frequency.

 Examples of continuous random variables include: distance and time. A common example of a continuous variable, in risk, is loss severity.

 Note the similarity between the summation (∑ ) under the discrete variable and the integral (∫) under the continuous variable. The summation (∑) of all discrete outcomes must equal one. Similarly, the integral (∫) captures the area under the continuous

distribution function. The total area “under this curve,” from (-∞) to (∞), must equal one.

 All four of the so-called sampling distributions—that each converge to the normal—are continuous: normal, student’s t, chi-square, and F distribution.

Summary

|                       | Continuous                                      | Discrete                                 |
|                       | Are measured                                    | Are counted                              |
|                       | Infinite                                        | Finite                                   |
| Examples in finance   | Distance, time (e.g.)                           | Default (1,0) (e.g.)                     |
|                       | Severity of loss (e.g.)                         | Frequency of loss (e.g.)                 |
|                       | Asset returns (e.g.)                            |                                          |
| Example distributions | Normal                                          | Bernoulli (0/1)                          |
|                       | Student’s t                                     | Binomial (series of i.i.d. Bernoullis)   |
|                       | Chi-square                                      | Poisson                                  |
|                       | F distribution                                  | Logarithmic                              |
|                       | Lognormal                                       |                                          |
|                       | Exponential                                     |                                          |
|                       | Gamma, Beta                                     |                                          |
|                       | EVT distributions (GPD, GEV)                    |                                          |

Define the probability of an event.

Probability: Classical or “a priori” definition

The probability of outcome (A) is given by:

$$P(A) = \frac{\text{Number of outcomes favorable to } A}{\text{Total number of outcomes}}$$



For example, consider a craps roll of two six-sided dice. What is the probability of rolling a seven; i.e., P[X=7]? There are six outcomes that generate a roll of seven: 1+6, 2+5, 3+4, 4+3, 5+2, and 6+1. Further, there are 36 total outcomes. Therefore, the probability is 6/36.

In this case, the outcomes need to be mutually exclusive, equally likely, and

“cumulatively exhaustive” (i.e., all possible outcomes included in total). A key property of a probability is that the sum of the probabilities for all (discrete) outcomes is 1.0.
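The classical definition can be checked by brute-force enumeration. The following is a minimal sketch (the variable names are illustrative, not from the source) that lists all 36 equally likely outcomes of two dice and counts those that sum to seven:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Outcomes favorable to the event "the dice sum to seven".
favorable = [o for o in outcomes if sum(o) == 7]

# Classical probability: favorable outcomes / total outcomes.
p_seven = Fraction(len(favorable), len(outcomes))
print(len(favorable), len(outcomes), p_seven)
```

The fraction 6/36 reduces to 1/6, which matches the craps example above.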


Probability: Relative frequency or empirical definition

Relative frequency is based on an actual number of historical observations (or Monte Carlo simulations). For example, here is a simulation (produced in Excel) of one hundred (100) rolls of a single six-sided die:

Empirical Distribution

| Roll  | Freq. | %    |
| 1     | 11    | 11%  |
| 2     | 17    | 17%  |
| 3     | 18    | 18%  |
| 4     | 21    | 21%  |
| 5     | 18    | 18%  |
| 6     | 15    | 15%  |
| Total | 100   | 100% |

Note the difference between an a priori probability and an empirical probability:

 The a priori (classical) probability of rolling a three (3) is 1/6.

 But the empirical frequency, based on this sample, is 18%. If we generate another sample, we will produce a different empirical frequency.

This relates also to sampling variation. The a priori probability is based on population properties; in this case, the a priori probability of rolling any number is clearly 1/6. However, a sample of 100 trials will exhibit sampling variation: the number of threes (3s) rolled above varies from the parametric probability of 1/6. We do not expect the sample frequency to equal the a priori probability exactly.
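Sampling variation is easy to see by simulation. The sketch below is an assumption-laden illustration (it uses Python's `random` module rather than the Excel simulation in the source, and the function name `empirical_frequencies` is hypothetical); it rolls a die 100 times under two different seeds and reports the relative frequency of each face:

```python
import random
from collections import Counter

def empirical_frequencies(n_rolls, seed=None):
    """Simulate n_rolls of a fair die; return the relative frequency of each face."""
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}

# Two samples of 100 rolls generally give different empirical frequencies.
sample_a = empirical_frequencies(100, seed=1)
sample_b = empirical_frequencies(100, seed=2)

# A much larger sample should sit close to the a priori probability of 1/6.
large_sample = empirical_frequencies(200_000, seed=3)
print(sample_a[3], sample_b[3], large_sample[3])
```

Each sample's frequencies sum to one, but the frequency of any single face differs from sample to sample and only settles near 1/6 as the number of rolls grows.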

Define, calculate, and interpret the mean, standard deviation, and variance of a random variable.

If we can characterize a random variable (e.g., if we know all outcomes and that each outcome is equally likely, as is the case when you roll a single die), we can compute its expectation. The expectation of the random variable is often called the mean or arithmetic mean.

Mean (expected value)

Expected value is the weighted average of possible values. In the case of a discrete random variable, expected value is given by:



$$E(Y) = y_1 p_1 + y_2 p_2 + \cdots + y_k p_k = \sum_{i=1}^{k} y_i p_i$$

In the case of a continuous random variable, expected value is given by:

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$$

Variance

Variance and standard deviation are the second moment measures of dispersion. The variance of a discrete random variable Y is given by:





$$\sigma_Y^2 = \text{variance}(Y) = E[(Y - \mu_Y)^2] = \sum_{i=1}^{k} (y_i - \mu_Y)^2 p_i$$



Variance is also expressed as the difference between the expected value of Y^2 and the square of the expected value of Y. This is the more useful variance formula:

$$\sigma_Y^2 = E[(Y - \mu_Y)^2] = E(Y^2) - [E(Y)]^2$$













Please memorize the variance formula above: it comes in handy! For example, if the probability of loan default (PD) is a Bernoulli trial, what is the variance of PD?

We can solve with E[PD^2] – (E[PD])^2. Because a Bernoulli variable takes only the values 0 and 1, E[PD^2] = p and E[PD] = p, so E[PD^2] – (E[PD])^2 = p – p^2 = p*(1-p).


Example: Variance of a single six-sided die

For example, what is the variance of a single six-sided die? First, we need to solve for the expected value of X-squared, E[X^2]. This is given by:

$$E[X^2] = (1^2)\tfrac{1}{6} + (2^2)\tfrac{1}{6} + (3^2)\tfrac{1}{6} + (4^2)\tfrac{1}{6} + (5^2)\tfrac{1}{6} + (6^2)\tfrac{1}{6} = \tfrac{91}{6}$$

Then, we need to square the expected value of X, [E(X)]^2. The expected value of a single six-sided die is 3.5 (the average outcome). So, the variance of a single six-sided die is given by:

$$\text{Variance}(X) = E(X^2) - [E(X)]^2 = \tfrac{91}{6} - (3.5)^2 = 2.92$$







Here is the same derivation of the variance of a single six-sided die (which has a uniform distribution) in tabular format:

What is the variance of the total of two six-sided dice cast together? It is simply the Variance (X) plus the Variance (Y), or about 5.83. The reason we can simply add them together is that they are independent random variables.
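The die calculation can be laid out row by row in code. This is a minimal sketch using exact fractions (the variable names are illustrative); it prints one row per face and then applies E[X^2] − (E[X])^2:

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # each face of a fair die is equally likely

# One row per outcome: x, p, x*p, x^2*p
for x in faces:
    print(x, p, x * p, x**2 * p)

mean = sum(x * p for x in faces)        # E[X]   = 7/2
mean_sq = sum(x**2 * p for x in faces)  # E[X^2] = 91/6
var_one_die = mean_sq - mean**2         # 35/12, about 2.92

# Independent dice: the variance of the sum is the sum of the variances.
var_two_dice = 2 * var_one_die          # 35/6, about 5.83
print(float(var_one_die), float(var_two_dice))
```

Exact fractions avoid rounding: 35/12 rounds to 2.92 and 35/6 rounds to 5.83, matching the figures in the text.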

Sample Variance:

The sample variance, which is an unbiased estimator of the population variance, is given by:

$$s_x^2 = \frac{1}{k-1}\sum_{i=1}^{k} (y_i - \bar{Y})^2$$











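Properties of variance

$$
\begin{aligned}
&1.\quad \sigma^2_{\text{constant}} = 0 \\
&2a.\quad \sigma^2_{X+Y} = \sigma^2_X + \sigma^2_Y \quad \text{only if independent} \\
&2b.\quad \sigma^2_{X-Y} = \sigma^2_X + \sigma^2_Y \quad \text{only if independent} \\
&3.\quad \sigma^2_{X+b} = \sigma^2_X \\
&4.\quad \sigma^2_{aX} = a^2\sigma^2_X \\
&5.\quad \sigma^2_{aX+b} = a^2\sigma^2_X \\
&6.\quad \sigma^2_{aX+bY} = a^2\sigma^2_X + b^2\sigma^2_Y \quad \text{only if independent} \\
&7.\quad \sigma^2_X = E(X^2) - [E(X)]^2
\end{aligned}
$$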

Standard deviation:

Standard deviation is given by:





$$\sigma_Y = \sqrt{\text{var}(Y)} = \sqrt{E[(Y - \mu_Y)^2]} = \sqrt{\sum_{i} (y_i - \mu_Y)^2 p_i}$$







As variance = standard deviation^2, standard deviation = Square Root[variance]

Sample Standard Deviation:

The sample standard deviation is given by:

$$s_X = \sqrt{\frac{1}{k-1}\sum_{i=1}^{k} (y_i - \bar{Y})^2}$$











This is merely the square root of the sample variance. This formula is important because this is the technically precise way to calculate volatility.
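As a quick illustration of the (k − 1) divisor, the sketch below computes the sample variance and sample standard deviation of a short return series. The return values are hypothetical, made up purely for illustration; `statistics.variance` and `statistics.stdev` from the Python standard library both use the (k − 1) divisor:

```python
import statistics

# Hypothetical daily returns, for illustration only.
returns = [0.012, -0.008, 0.005, -0.015, 0.010]

k = len(returns)
mean_return = sum(returns) / k

# Manual calculation with the (k - 1) divisor.
manual_variance = sum((r - mean_return) ** 2 for r in returns) / (k - 1)
manual_volatility = manual_variance ** 0.5

# The standard library gives the same sample statistics.
sample_variance = statistics.variance(returns)
sample_volatility = statistics.stdev(returns)
print(sample_variance, sample_volatility)
```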

Define, calculate, and interpret the skewness and kurtosis of a distribution.

Skewness (asymmetry)

Skewness refers to whether a distribution is symmetrical. An asymmetrical distribution is skewed, either positively (to the right) or negatively (to the left) skewed. The measure of “relative skewness” is given by the equation below, where zero indicates symmetry (no skewness):

$$\text{Skewness} = \frac{E[(X - \mu)^3]}{\sigma^3}$$









For example, the gamma distribution has positive skew (skew > 0):

Skewness is a measure of asymmetry

If a distribution is symmetrical, mean = median = mode. If a distribution has positive skew, the mean > median > mode. If a distribution has negative skew, the mean < median < mode.

Kurtosis

Kurtosis measures the degree of “peakedness” of the distribution, and consequently of “heaviness of the tails.” A value of three (3) indicates normal peakedness. The normal distribution has kurtosis of 3, such that “excess kurtosis” equals (kurtosis – 3).

$$\text{Kurtosis} = \frac{E[(X - \mu)^4]}{\sigma^4}$$









[Figure: Gamma distribution, positive (right) skew; curves for alpha=1, beta=1; alpha=2, beta=.5; alpha=4, beta=.25]

A normal distribution has relative skewness of zero and kurtosis of three (or the same idea put another way: excess kurtosis of zero). Relative skewness > 0 indicates positive skewness (a longer right tail) and relative skewness < 0 indicates negative skewness (a longer left tail). Kurtosis greater than three (>3), which is the same thing as saying “excess kurtosis > 0,” indicates high peaks and fat tails (leptokurtic). Kurtosis less than three (<3), which is the same thing as saying “excess kurtosis < 0,” indicates lower peaks. Kurtosis is a measure of tail weight (heavy, normal, or light-tailed) and “peakedness”: kurtosis > 3.0 (or excess kurtosis > 0) implies heavy-tails.
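The skewness and kurtosis formulas can be applied directly to any discrete distribution. The following is a minimal sketch (the helper `standardized_moment` and the 10% default probability are assumptions for illustration) that computes both measures for a fair die and for a Bernoulli default indicator:

```python
def standardized_moment(values, probs, k):
    """Return E[(X - mu)^k] / sigma^k for a discrete distribution."""
    mu = sum(v * p for v, p in zip(values, probs))
    variance = sum((v - mu) ** 2 * p for v, p in zip(values, probs))
    central_moment = sum((v - mu) ** k * p for v, p in zip(values, probs))
    return central_moment / variance ** (k / 2)

# Fair six-sided die: symmetric, and flatter than the normal.
die_values = [1, 2, 3, 4, 5, 6]
die_probs = [1 / 6] * 6
die_skew = standardized_moment(die_values, die_probs, 3)
die_kurtosis = standardized_moment(die_values, die_probs, 4)

# Bernoulli default indicator with a hypothetical PD of 10%.
pd_values = [0, 1]
pd_probs = [0.9, 0.1]
pd_skew = standardized_moment(pd_values, pd_probs, 3)
pd_kurtosis = standardized_moment(pd_values, pd_probs, 4)
print(die_skew, die_kurtosis, pd_skew, pd_kurtosis)
```

The die has zero skew and kurtosis below three (light tails), while the rare-default Bernoulli has positive skew and kurtosis above three.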

Financial asset returns are typically considered leptokurtic (i.e., heavy or fat-tailed). For example, the logistic distribution exhibits leptokurtosis (heavy-tails; kurtosis > 3.0):

Univariate versus multivariate probability density functions

A single variable (univariate) probability distribution is concerned with only a single random variable; e.g., roll of a die, default of a single obligor. A multivariate probability density function concerns the outcome of an experiment with more than one random variable. This includes the simplest case of two variables (i.e., a bivariate distribution).

|            | Density                    | Cumulative                  |
| Univariate | f(x) = P(X = x)            | F(x) = P(X ≤ x)             |
| Bivariate  | f(x,y) = P(X = x, Y = y)   | F(x,y) = P(X ≤ x, Y ≤ y)    |

[Figure: Logistic distribution, heavy tails (excess kurtosis > 0); curves for alpha=0, beta=1; alpha=2, beta=1; alpha=0, beta=3; and N(0,1)]

Describe joint, marginal, and conditional probability functions.

Stock & Watson illustrate with two variables:

 The age of the computer (A), a Bernoulli such that the computer is old (0) or new (1)

 The number of times the computer crashes (M)

Marginal probability functions

A marginal (or unconditional) probability is the simple case: it is the probability that does not depend on a prior event or prior information.

In the following table, please note that ten joint outcomes are possible because the age variable (A) has two outcomes and the “number of crashes” variable (M) has five outcomes. Each of the ten outcomes is mutually exclusive and the sum of their probabilities is 1.0 or 100%. For example, the probability that a new computer crashes once is 0.035 or 3.5%.

The marginal (unconditional) probability that a computer is new (A = 1) is the sum of joint probabilities in the second row:





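$$\Pr(Y = y) = \sum_{i=1}^{l} \Pr(X = x_i, Y = y)$$

$$\Pr(A = 1) = 0.50$$

Joint distribution of computer age (A) and number of crashes (M):

|             | M = 0 | M = 1 | M = 2 | M = 3 | M = 4 | Tot  |
| A = 0 (Old) | 0.35  | 0.065 | 0.05  | 0.025 | 0.01  | 0.50 |
| A = 1 (New) | 0.45  | 0.035 | 0.01  | 0.005 | 0.00  | 0.50 |
| Tot         | 0.80  | 0.100 | 0.06  | 0.030 | 0.01  | 1.00 |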

“The marginal probability distribution of a random variable Y is just another name for its probability distribution. This term distinguishes the distribution of Y alone (marginal distribution) from the joint distribution of Y and another random variable. The marginal distribution of Y can be computed from the joint distribution of X and Y by adding up the probabilities of all possible outcomes for which Y takes on a specified value”—S&W


Joint probability functions

The joint probability is the probability that the random variables (in this case, both random variables) take on certain values simultaneously.

$$\Pr(X = x, Y = y)$$

For example, the joint probability that a computer is old and does not crash is the Old, M = 0 cell of the joint probability table above:

$$\Pr(A = 0, M = 0) = 0.35$$

“The joint probability distribution of two discrete random variables, say X and Y, is the probability that the random variables simultaneously take on certain values, say x and y. The probabilities of all possible ( x, y) combinations sum to 1. The joint probability distribution can be written as the function Pr(X = x, Y = y).” —S&W

Conditional probability functions

A conditional probability is the probability of an outcome given (conditional on) another outcome:

$$\Pr(Y = y \mid X = x) = \frac{\Pr(X = x, Y = y)}{\Pr(X = x)}$$

For example, using the Old row of the joint probability table above, the probability of zero crashes given that the computer is old is:

$$\Pr(M = 0 \mid A = 0) = \frac{0.35}{0.50} = 0.70$$

“The distribution of a random variable Y conditional on another random variable X taking on a specific value is called the conditional distribution of Y given X. The conditional probability that Y takes on the value y when X takes on the value x is written Pr(Y = y | X = x).” (S&W)

Conditional probability = Joint Probability/Marginal Probability
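The Stock & Watson computer example can be worked through in a few lines. This is a minimal sketch (the dictionary layout and variable names are illustrative) that stores the joint probabilities and derives the marginal and conditional probabilities from them, along with the conditional expectation of crashes given computer age:

```python
# Joint probabilities Pr(A = a, M = m); the list index is the number of crashes m.
joint = {
    0: [0.35, 0.065, 0.05, 0.025, 0.01],  # A = 0 (old computer)
    1: [0.45, 0.035, 0.01, 0.005, 0.00],  # A = 1 (new computer)
}

# Marginal probabilities: sum the joint probabilities over the other variable.
p_new = sum(joint[1])
p_crashes = [joint[0][m] + joint[1][m] for m in range(5)]

# Conditional probability = joint probability / marginal probability.
p_old = sum(joint[0])
p_m0_given_old = joint[0][0] / p_old

# Conditional expectation of the number of crashes, given computer age.
e_m_given_old = sum(m * joint[0][m] / p_old for m in range(5))
e_m_given_new = sum(m * joint[1][m] / p_new for m in range(5))
print(p_new, p_m0_given_old, e_m_given_old, e_m_given_new)
```

The marginal Pr(A = 1) is 0.50 and Pr(M = 0 | A = 0) is 0.70, as in the text; the conditional means show that old computers are expected to crash more often than new ones.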

What is the probability of B occurring, given that A has already occurred?

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}, \qquad P(A \cap B) = P(A)\,P(B \mid A)$$











Conditional and unconditional expectation

An unconditional expectation is the expected value of the variable without any restrictions (or lacking any prior information).

A conditional expectation is an expected value for the variable conditional on prior information or some restriction (e.g., the value of a correlated variable). The conditional expectation of Y, conditional on X = x, is given by:

$$E(Y \mid X = x)$$

The conditional variance of Y, conditional on X=x, is given by:

$$\text{var}(Y \mid X = x)$$

The two-variable regression is an important conditional expectation. In this case, we say the expected Y is conditional on X:

$$E(Y \mid X_i) = B_1 + B_2 X_i$$

For Example: Two Stocks (S) and (T)

For example, consider two stocks. Assume that both Stock (S) and Stock (T) can each only reach three price levels. Stock (S) can achieve: $10, $15, or $20. Stock (T) can achieve: $15, $20, or $30. Historically, assume we witnessed 26 outcomes and they were distributed as follows.

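In the table, columns give the price of Stock S ($10, $15, or $20) and rows give the price of Stock T ($15, $20, or $30). The count for S = $20, T = $15 is implied by the row and column totals:

|         | S = $10 | S = $15 | S = $20 | Total |
| T = $15 | 0       | 2       | 2       | 4     |
| T = $20 | 3       | 4       | 3       | 10    |
| T = $30 | 3       | 6       | 3       | 12    |
| Total   | 6       | 12      | 8       | 26    |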

What is the joint probability?

A joint probability is the probability that both random variables will have a certain outcome. Here the joint probability P(S=$20, T=$30) = 3/26.

What is the marginal (unconditional) probability?

The unconditional probability of the outcome where S=$20 is 8/26 because there are eight events out of 26 total events that produce S=$20; i.e., P(S=20) = 8/26.

What is the conditional probability?

Instead we can ask a conditional probability question: “What is the probability that S=$20 given that T=$20?” The probability that S=$20 conditional on the knowledge that T=$20 is 3/10 because among the 10 events that produce T=$20, three are S=$20.

$$P(S = \$20 \mid T = \$20) = \frac{P(S = \$20, T = \$20)}{P(T = \$20)} = \frac{3}{10}$$



In summary:

 The unconditional probability P(S=20) = 8/26

 The conditional probability P(S=20 | T=20) = 3/10

 The joint probability P(S=20,T=30) = 3/26
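The three probabilities in the summary can be recomputed from the 26 observed outcomes. This is a minimal sketch (function names are illustrative); note that the count of 2 for S = $20, T = $15 is an assumption inferred from the row and column totals, since that cell is not legible in the source table:

```python
from fractions import Fraction

# counts[t][s] = number of observations with Stock T at price t and Stock S at price s.
# The (T=15, S=20) count of 2 is inferred from the table totals.
counts = {
    15: {10: 0, 15: 2, 20: 2},
    20: {10: 3, 15: 4, 20: 3},
    30: {10: 3, 15: 6, 20: 3},
}
total = sum(sum(row.values()) for row in counts.values())

def joint(s, t):
    """P(S = s, T = t)."""
    return Fraction(counts[t][s], total)

def marginal_s(s):
    """P(S = s): sum the joint probabilities over all values of T."""
    return sum(joint(s, t) for t in counts)

def marginal_t(t):
    """P(T = t): sum the joint probabilities over all values of S."""
    return sum(joint(s, t) for s in counts[t])

def conditional_s_given_t(s, t):
    """P(S = s | T = t) = joint / marginal."""
    return joint(s, t) / marginal_t(t)

print(marginal_s(20), conditional_s_given_t(20, 20), joint(20, 30))
```

Python prints the fractions in lowest terms, so 8/26 appears as 4/13.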

Explain the difference between statistical independence and statistical dependence.

X and Y are independent if the conditional distribution of Y given X equals the marginal distribution of Y; that is, independence implies Pr(Y = y | X = x) = Pr(Y = y), where:

$$\Pr(Y = y \mid X = x) = \frac{\Pr(X = x, Y = y)}{\Pr(X = x)}$$



The most useful test of statistical independence is given by:

$$\Pr(X = x, Y = y) = \Pr(X = x)\,\Pr(Y = y)$$

X and Y are independent if their joint distribution is equal to the product of their marginal distributions.

Statistical independence is when the value taken by one variable has no effect on the value taken by the other variable. If the variables are independent, their joint probability will equal the product of their marginal probabilities. If they are not independent, they are dependent.

For example, when rolling two dice, the second will be independent of the first.

This independence implies that the probability of rolling double-sixes is equal to the product of P(rolling one six) and P(rolling one six). If two dice are independent, then P(first roll = 6, second roll = 6) = P(rolling a six) * P(rolling a six). And, indeed: 1/36 = (1/6)*(1/6)
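The product test can be applied to both examples in this chapter. The sketch below (variable names are illustrative) confirms that the two dice pass the test, while the two stocks from the earlier table do not, so the stocks are statistically dependent:

```python
from fractions import Fraction
from itertools import product

# Two dice: enumerate all 36 equally likely outcomes.
rolls = list(product(range(1, 7), repeat=2))
p_double_six = Fraction(sum(1 for r in rolls if r == (6, 6)), len(rolls))
p_first_six = Fraction(sum(1 for r in rolls if r[0] == 6), len(rolls))
p_second_six = Fraction(sum(1 for r in rolls if r[1] == 6), len(rolls))
dice_independent = p_double_six == p_first_six * p_second_six

# Two stocks: joint and marginal probabilities taken from the 26 observations.
p_joint = Fraction(3, 26)   # P(S=20, T=30)
p_s20 = Fraction(8, 26)     # P(S=20)
p_t30 = Fraction(12, 26)    # P(T=30)
stocks_independent = p_joint == p_s20 * p_t30

print(dice_independent, stocks_independent)
```

Strictly, independence requires the product rule to hold for every pair of outcomes; a single failing pair, as with the stocks here, is enough to establish dependence.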

Calculate the mean and variance of sums of random variables.

Mean

$$E(a + bX + cY) = a + b\mu_X + c\mu_Y$$



Variance

In regard to the sum of correlated variables, the variance is given by the following. Note the two expressions: the second merely substitutes the covariance with the product of correlation and volatilities. Please make sure you are comfortable with this substitution.

$$\sigma^2_{X+Y} = \sigma^2_X + \sigma^2_Y + 2\sigma_{XY}, \text{ and given that } \sigma_{XY} = \rho_{XY}\sigma_X\sigma_Y:$$

$$\sigma^2_{X+Y} = \sigma^2_X + \sigma^2_Y + 2\rho_{XY}\sigma_X\sigma_Y$$



 



 

 









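In regard to the difference between correlated variables, the variance is given by:

$$\sigma^2_{X-Y} = \sigma^2_X + \sigma^2_Y - 2\sigma_{XY}, \text{ and given that } \sigma_{XY} = \rho_{XY}\sigma_X\sigma_Y:$$

$$\sigma^2_{X-Y} = \sigma^2_X + \sigma^2_Y - 2\rho_{XY}\sigma_X\sigma_Y$$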



 



 

 













Variance with constants (a) and (b)

Variance of sum includes covariance (X,Y):

$$\text{variance}(aX + bY) = a^2\sigma_X^2 + 2ab\,\sigma_{XY} + b^2\sigma_Y^2$$

If X and Y are independent, the covariance term drops out and the variance simply adds:

$$\text{variance}(aX + bY) = a^2\sigma_X^2 + b^2\sigma_Y^2$$
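The variance formulas above reduce to one small function. This is a minimal sketch (the function name and the volatility and correlation inputs are hypothetical, chosen only for illustration) that substitutes the covariance with the product of correlation and volatilities:

```python
import math

def variance_of_linear_combination(a, b, sigma_x, sigma_y, rho):
    """variance(aX + bY) = a^2 var(X) + 2ab cov(X, Y) + b^2 var(Y)."""
    covariance = rho * sigma_x * sigma_y
    return a**2 * sigma_x**2 + 2 * a * b * covariance + b**2 * sigma_y**2

# Hypothetical inputs: volatilities of 20% and 30%, correlation of 0.5.
sigma_x, sigma_y, rho = 0.20, 0.30, 0.5

var_sum = variance_of_linear_combination(1, 1, sigma_x, sigma_y, rho)    # X + Y
var_diff = variance_of_linear_combination(1, -1, sigma_x, sigma_y, rho)  # X - Y
var_uncorrelated = variance_of_linear_combination(1, 1, sigma_x, sigma_y, 0.0)
print(var_sum, var_diff, var_uncorrelated, math.sqrt(var_sum))
```

Setting b = −1 reproduces the formula for the difference, and setting the correlation to zero drops the covariance term so the variances simply add.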

Describe the key properties of the normal, standard normal, multivariate normal, Chi-squared, Student t, and F distributions.

Normal distribution

Key properties of the normal:

 Symmetrical around mean; skew = 0

 Parsimony: Only requires (is fully described by) two parameters: mean and variance

 Summation stability: a linear combination (function) of two independent (or, more generally, jointly normal) normally distributed random variables is itself normally distributed

 Kurtosis = 3 (excess kurtosis = 0)

The normal distribution is commonplace for at least three reasons:

 The central limit theorem (CLT) says that the sampling distribution of sample means tends to be normal (i.e., converges toward a normally shaped distribution) regardless of the shape of the underlying distribution; this explains much of the “popularity” of the normal distribution.

 The normal is economical (elegant) because it only requires two parameters (mean and variance). The standard normal is even more economical: it requires no parameters.

 The normal is tractable: it is easy to manipulate (especially in regard to closed-form equations like the Black-Scholes)

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

[Figure: normal probability density function, plotted from -4.0 to 4.0]

Standard normal distribution

A normal distribution is fully specified by two parameters, mean and variance (or standard deviation). We can transform a normal into a unit or standardized variable:

 Standard normal has mean = 0 and variance = 1

 No parameters required!

This unit or standardized variable is normally distributed with zero mean and variance of one (1.0). Its standard deviation is also one (variance = 1.0 and standard deviation = 1.0). This is written as Z ~ N(0,1); i.e., variable Z is normally distributed with mean zero and variance one.

Standard normal distribution: Critical Z values:

Key locations on the normal distribution are noted below. In the FRM curriculum, the choice of one-tailed 5% significance and 1% significance (i.e., 95% and 99% confidence) is common, so please pay particular attention to the 1.645 and 2.326 rows:

| Critical z value | Two-sided confidence | One-sided significance |
| 1.00             | ~ 68%                | ~ 15.87%               |
| 1.645 (~1.65)    | ~ 90%                | ~ 5.0%                 |
| 1.96             | ~ 95%                | ~ 2.5%                 |
| 2.326 (~2.33)    | ~ 98%                | ~ 1.0%                 |
| 2.58             | ~ 99%                | ~ 0.5%                 |

Memorize two common critical values: 1.65 and 2.33. These correspond to confidence levels, respectively, of 95% and 99% for a one-tailed test. For VAR, the one-tailed test is relevant because we are concerned only about losses (left-tail) not gains (right-tail).
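The critical values can be recovered from the standard normal CDF rather than a lookup table. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library (available in Python 3.8 and later):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# One-tailed critical values: the z with 95% (or 99%) of the area to its left.
z_95_one_tailed = standard_normal.inv_cdf(0.95)
z_99_one_tailed = standard_normal.inv_cdf(0.99)

# Two-sided confidence covered by +/- 1.96 standard deviations.
two_sided_at_196 = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

# One-sided significance (tail area) beyond one standard deviation.
tail_beyond_one = 1 - standard_normal.cdf(1.0)

print(z_95_one_tailed, z_99_one_tailed, two_sided_at_196, tail_beyond_one)
```

Rounded to two decimals, the one-tailed values are the 1.65 and 2.33 quoted above.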

Multivariate normal distributions

The normal can be generalized to a joint distribution of several normal variables; e.g., the bivariate normal distribution. Properties include:

1. If X and Y are bivariate normal, then aX + bY is normal; any linear combination is normal

2. If a set of variables has a multivariate normal distribution, the marginal distribution of each is normal


Chi-squared distribution

For the chi-square distribution, we observe a sample variance and compare to hypothetical population variance. This variable has a chi-square distribution with (n-1) d.f.:

$$(n-1)\frac{s^2}{\sigma^2} \sim \chi^2_{(n-1)}$$





















A chi-squared variable with k degrees of freedom is the sum of k squared independent standard normal random variables. Properties of the chi-squared distribution include:

 Nonnegative (≥ 0)

 Skewed right, but as d.f. increases it approaches normal

 Expected value (mean) = k, where k = degrees of freedom

 Variance = 2k, where k = degrees of freedom

 The sum of two independent chi-square variables is also a chi-squared variable

Chi-squared distribution: For example (Google’s stock return variance)

Google’s sample variance over 30 days is 0.0263%. We can test the hypothesis that the population variance (Google’s “true” variance) is 0.02%. The chi-square variable = 38.14:

| Sample variance (30 days)          | 0.0263%                       |
| Degrees of freedom (d.f.)          | 29                            |
| Population variance (hypothesized) | 0.0200%                       |
| Chi-square variable                | 38.14 (= 0.0263%/0.02% × 29)  |
| p value, =CHIDIST()                | 11.93%                        |
| Critical value @ 29 d.f., Pr[.1]   | 39.0875                       |
| Area under curve (1 − p value)     | 88.07%                        |

With 29 degrees of freedom (d.f.), 38.14 is just below the 10% critical value of 39.0875 on the lookup table, so the p value is slightly above 10% (11.93%). Therefore, we could reject the null with only 88% confidence; i.e., we fail to reject the null hypothesis that the true variance is 0.02%.
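The test statistic in the Google example is a one-line calculation. This is a minimal sketch (variable names are illustrative) that computes the chi-square variable and compares it with the 10% critical value of 39.0875 quoted above; the critical value is taken from the text as a lookup, not computed:

```python
# Inputs from the example (variances expressed in percent).
sample_variance = 0.0263
hypothesized_variance = 0.0200
n = 30
degrees_of_freedom = n - 1

# Chi-square variable: (n - 1) * s^2 / sigma^2
chi_square = degrees_of_freedom * sample_variance / hypothesized_variance

# Lookup value from the text: 10% upper-tail critical value at 29 d.f.
critical_value_10pct = 39.0875

reject_null_at_10pct = chi_square > critical_value_10pct
print(round(chi_square, 2), reject_null_at_10pct)
```

Because the statistic falls below the critical value, the null hypothesis is not rejected at the 10% significance level.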

[Figure: Chi-square distribution; curves for k = 2, k = 5, and k = 29]

Student’s t distribution

The student’s t distribution (t distribution) is among the most commonly used distributions. As the degrees of freedom (d.f.) increases, the t-distribution converges with the normal distribution. It is similar to the normal, except it exhibits slightly heavier tails (the lower the d.f., the heavier the tails). The student’s t variable is given by:

$$t = \frac{\bar{X} - \mu_X}{S_X/\sqrt{n}}$$







Properties of the t-distribution:

 Like the normal, it is symmetrical

 Like the standard normal, it has mean of zero (mean = 0)

 Its variance = k/(k-2) where k = degrees of freedom. Note, as k increases, the variance approaches 1.0. Therefore, as k increases, the t-distribution approximates the standard normal distribution.

 Always slightly heavy-tail (kurtosis>3.0) but converges to normal. But the student’s t is not considered a really heavy-tailed distribution

In practice, the student’s t is the most commonly used distribution. When we test the significance of regression coefficients, the central limit theorem (CLT) justifies the normal distribution (because the coefficients are effectively sample means). But we rarely know the population variance, such that the student’s t is the appropriate distribution.

When the d.f. is large (e.g., sample over ~30), as the student’s t approximates the normal, we can use the normal as a proxy. In the assigned Stock & Watson, the sample sizes are large (e.g., 420 students), so they tend to use the normal.

[Figure: t distribution vs. Normal; curves for 2 d.f., 20 d.f., and the normal]

Student’s t distribution: For example

For example, Google’s average periodic return over a ten-day sample period was +0.02% with sample standard deviation of 1.54%. Here are the statistics:

| Sample mean                   | 0.02%  |
| Sample std dev                | 1.54%  |
| Days (n)                      | 10     |
| Confidence                    | 95%    |
| Significance (1 − confidence) | 5%     |
| Critical t                    | 2.262  |
| Lower limit                   | -1.08% |
| Upper limit                   | 1.12%  |

The sample mean is a random variable. If we know the population variance, we assume the sample mean is normally distributed. But if we do not know the population variance (typically the case!), the sample mean is a random variable following a student’s t distribution. In the Google example above, we can use this to construct a confidence (random) interval:

$$\bar{X} \pm t\,\frac{s}{\sqrt{n}}$$





We need the critical (lookup) t value. The critical t value is a function of:

 Degrees of freedom (d.f.); e.g., 10-1 = 9 in this example, and

 Significance; e.g., 1-95% confidence = 5% in this example

The 95% confidence interval can be computed. The upper limit is given by:

$$\bar{X} + t\,\frac{s}{\sqrt{n}} = 0.02\% + (2.262)\frac{1.54\%}{\sqrt{10}} = 1.12\%$$





And the lower limit is given by:

$$\bar{X} - t\,\frac{s}{\sqrt{n}} = 0.02\% - (2.262)\frac{1.54\%}{\sqrt{10}} = -1.08\%$$



 

Please make sure you can take a sample standard deviation, compute the critical t value and construct the confidence interval.
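The interval in the Google example can be reproduced as follows. This is a minimal sketch (variable names are illustrative); the critical t of 2.262 is taken from the text as a lookup value at 9 degrees of freedom and two-tailed 5% significance, not computed:

```python
import math

# Inputs from the example, in percent.
sample_mean = 0.02
sample_std_dev = 1.54
n = 10
critical_t = 2.262  # lookup value: 9 d.f., 5% two-tailed significance

# Standard error of the sample mean, using the sample standard deviation.
standard_error = sample_std_dev / math.sqrt(n)

lower_limit = sample_mean - critical_t * standard_error
upper_limit = sample_mean + critical_t * standard_error
print(round(lower_limit, 2), round(upper_limit, 2))
```

Rounded to two decimals, the limits are -1.08% and 1.12%, matching the table above.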

Both the normal (Z) and student’s t (t) distribution characterize the sampling distribution of the sample mean. The difference is that the normal is used when we know the population variance; the student’s t is used when we must rely on the sample variance. In practice, we don’t know the population variance, so the student’s t is typically appropriate.









$$Z = \frac{\bar{X} - \mu_X}{\sigma_X/\sqrt{n}}, \qquad t = \frac{\bar{X} - \mu_X}{S_X/\sqrt{n}}$$









F-Distribution

The F distribution is also called the variance ratio distribution (it may be helpful to think of it as the variance ratio!). The F ratio is the ratio of sample variances, with the greater sample variance in the numerator:

$$F = \frac{s_x^2}{s_y^2}$$



Properties of F distribution:

 Nonnegative (≥ 0)

 Skewed right

 Like the chi-square distribution, as d.f. increases, approaches normal

 The square of t-distributed r.v. with k d.f. has an F distribution with 1,k d.f.

 As the denominator d.f. (n) grows large, m * F(m,n) approaches a χ² variable with m d.f.

[Figure: F distribution; curves for (19,19) and (9,9) degrees of freedom]

F-Distribution: For example

For example, based on two 10-day samples, we calculated the sample variance of Google and Yahoo. Google’s variance was 0.0237% and Yahoo’s was 0.0084%. The F ratio, therefore, is 2.82 (divide higher variance by lower variance; the F ratio must be greater than, or equal to, 1.0).

|                         | GOOG    | YHOO    |
| Sample variance, =VAR() | 0.0237% | 0.0084% |
| Observations, =COUNT()  | 10      | 10      |
| F ratio                 | 2.82    |         |
| Confidence              | 90%     |         |
| Significance            | 10%     |         |
| Critical F, =FINV()     | 2.44    |         |

At 10% significance, with (10-1) and (10-1) degrees of freedom, the critical F value is 2.44. Because our F ratio of 2.82 is greater than (>) 2.44, we reject the null (i.e., that the population variances are the same). We conclude the population variances are different.
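The F test in this example amounts to a ratio and a comparison. This is a minimal sketch (variable names are illustrative); the critical value of 2.44 is taken from the text as a lookup at (9, 9) degrees of freedom, not computed:

```python
# Sample variances from the example, in percent.
variance_goog = 0.0237
variance_yhoo = 0.0084

# Place the larger sample variance in the numerator so that F >= 1.
f_ratio = max(variance_goog, variance_yhoo) / min(variance_goog, variance_yhoo)

# Lookup value from the text: critical F at 10% significance with (9, 9) d.f.
critical_f = 2.44

reject_equal_variances = f_ratio > critical_f
print(round(f_ratio, 2), reject_equal_variances)
```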

Moments of a distribution

The k-th moment about the mean (μ) is given by:

$$k\text{-th moment} = \frac{\sum_{i=1}^{n} (x_i - \mu)^k}{n}$$











In this way, the difference of each data point from the mean is raised to a power (k=1, k=2, k=3, and k=4). These are the four moments of the distribution:

 If k=1, refers to the first moment about zero: the mean.

 If k=2, refers to the second moment about the mean: the variance.

 If k=3, refers to the third moment about the mean: skewness

 If k=4, refers to the fourth moment about the mean: kurtosis

Define and describe random sampling and what is meant by i.i.d.

A random sample is a sample of random variables that are independent and identically distributed (i.i.d.)

Independent and identically distributed (i.i.d.) variables:

 Each random variable has the same (identical) probability distribution (PDF/PMF, CDF)

 Each random variable is drawn independently of the others; no serial- or auto-correlation

The concept of independent and identically distributed (i.i.d.) variables is a key assumption we often encounter: to scale volatility by the square root of time requires i.i.d. returns. If returns are not i.i.d., then scaling volatility by the square root of time will give an incorrect answer.

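Define, calculate, and interpret the mean and variance of the sample average.

The sample mean is the average of the n observations, and its expected value is given by:

$$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i, \qquad E(\bar{Y}) = \frac{1}{n}\sum_{i=1}^{n} E(Y_i) = \mu_Y$$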











The variance of the sample mean is given by:

$$\text{variance}(\bar{Y}) = \frac{\sigma_Y^2}{n}, \qquad \text{Std Dev}(\bar{Y}) = \sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{n}}$$



_





Independent: not (auto)correlated

Identical: same mean, same variance

We expect the sample mean to equal the population mean

The sample mean is denoted by $\bar{Y}$. The expected value of the sample mean is, as you might expect, the population mean:

$$E(\bar{Y}) = \mu_Y$$









This formula says, “we expect the average of our sample will equal the average of the population” (the over-bar signifies the sample mean; the Greek mu signifies the population mean).

Sampling distribution of the sample mean

If either (i) the population is infinite and sampling is random, or (ii) the population is finite and sampling is with replacement, the variance of the sampling distribution of means is:

$$\sigma^2_{\bar{Y}} = E[(\bar{Y} - \mu_Y)^2] = \frac{\sigma_Y^2}{n}$$











This says, “The variance of the sample mean is equal to the population variance divided by the sample size.” For example, the (population) variance of a single six-sided die is 2.92. If we roll three dice (i.e., sampling “with replacement”), then the variance of the sampling distribution = (2.92 ÷ 3) = 0.97.
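The three-dice figure can be verified exactly by enumerating every possible sample. This is a minimal sketch (variable names are illustrative) that computes the variance of the sample mean over all 216 equally likely samples of three rolls and compares it with the population variance divided by n:

```python
from fractions import Fraction
from itertools import product

n = 3
population_variance = Fraction(35, 12)  # variance of one die, about 2.92

# Every equally likely sample of three rolls, and the mean of each sample.
samples = list(product(range(1, 7), repeat=n))
sample_means = [Fraction(sum(s), n) for s in samples]

# Mean and variance of the sampling distribution of the sample mean.
mean_of_means = sum(sample_means) / len(sample_means)
variance_of_means = sum((m - mean_of_means) ** 2 for m in sample_means) / len(sample_means)

print(float(mean_of_means), float(variance_of_means), float(population_variance / n))
```

The enumerated variance equals 35/36, which is the population variance of 35/12 divided by three, or about 0.97.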

If the population is size (N), if the sample size n ≤ N, and if sampling is conducted “without replacement,” then the variance of the sampling distribution of means is given by:

$$\sigma^2_{\bar{Y}} = \frac{\sigma_Y^2}{n}\left(\frac{N - n}{N - 1}\right)$$







_





_







Standard error is the standard deviation of the sample mean

The standard error is the standard deviation of the sampling distribution of the estimator, and the sampling distribution of an estimator is a probability (frequency distribution) of the estimator (i.e., a distribution of the set of values of the estimator obtained from all possible same-size samples from a given population). For a sample mean (per the central limit theorem!), the variance of the estimator is the population variance divided by sample size. The standard error is the square root of this variance; the standard error is a standard deviation:

$$se = \sqrt{\frac{\sigma_Y^2}{n}} = \frac{\sigma_Y}{\sqrt{n}}$$

If the population is distributed with mean μ and variance σ² but the distribution is not a normal distribution, then the standardized variable given by Z below is “asymptotically normal”; i.e., as (n) approaches infinity (∞) the distribution becomes normal.

$$Z = \frac{\bar{Y} - \mu_Y}{se} = \frac{\bar{Y} - \mu_Y}{\sigma_Y / \sqrt{n}} \sim N(0,1)$$









The denominator is the standard error, which is simply the name for the standard deviation of the sampling distribution.

Describe, interpret, and apply the Law of Large Numbers and the Central Limit Theorem.

In brief:

 Law of large numbers: under general conditions, the sample mean (Ӯ) will be near the population mean with high probability when the sample size is large.

 Central limit theorem (CLT): as the sample size increases, regardless of the underlying distribution, the sampling distribution of the sample mean approximates (tends toward) a normal distribution.

Central limit theorem (CLT)

We assume a population with a known mean and finite variance, but not necessarily a normal distribution (we may not know the distribution!). Random samples of size (n) are then drawn from the population, and the mean of each sample is itself a random variable. The expected value of this sample mean is the population’s mean. Further, the variance of the sample mean is equal to the population’s variance divided by n (note: this is equivalent to saying the standard deviation of the sample mean is equal to the population’s standard deviation divided by the square root of n).

The central limit theorem says that this random variable (i.e., the mean of a sample of size n drawn from the population) is itself approximately normally distributed, regardless of the shape of the underlying population. Given a population described by any probability distribution having mean (μ) and finite variance (σ²), the distribution of the sample mean computed from samples (where each sample equals size n) will be approximately normal. Generally, if the size of the sample is at least 30 (n ≥ 30), then we can assume the sample mean is approximately normal!

Each sample has a sample mean. There are many sample means. The sample means have variation: a sampling distribution. The central limit theorem (CLT) says the sampling distribution of sample means is asymptotically normal.

Summary of central limit theorem (CLT):

 We assume a population with a known mean and finite variance, but not necessarily a normal distribution.

 Random samples (size n) are drawn from the population.

 The expected value of the sample mean is the population mean.

 The distribution of the sample mean computed from samples (where each sample equals size n) will be approximately (asymptotically) normal.

 The variance of the sample mean is equal to the population variance divided by n (equivalently, its standard deviation is equal to the population standard deviation divided by the square root of n).
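The summary above can be illustrated with a small simulation. This is a sketch under stated assumptions, not part of the reading: the population is taken to be exponential (clearly not normal) with mean 1 and variance 1, each sample has n = 30 draws, and the seed and number of repetitions are arbitrary choices.

```python
import random
import statistics

random.seed(42)        # fixed seed so the sketch is repeatable

RATE = 1.0             # exponential population: mean 1/RATE, variance 1/RATE**2
N_PER_SAMPLE = 30      # the n >= 30 rule of thumb
N_SAMPLES = 20_000     # number of repeated samples (arbitrary)

def sample_mean(n):
    """Mean of n i.i.d. draws from the (non-normal) exponential population."""
    return statistics.fmean([random.expovariate(RATE) for _ in range(n)])

means = [sample_mean(N_PER_SAMPLE) for _ in range(N_SAMPLES)]

mean_of_means = statistics.fmean(means)     # should be near the population mean
var_of_means = statistics.pvariance(means)  # should be near variance / n

# Share of standardized sample means inside +/- 1.96; near 95% if roughly normal.
se = (1 / RATE) / N_PER_SAMPLE ** 0.5
share_inside = sum(abs((m - 1 / RATE) / se) <= 1.96 for m in means) / N_SAMPLES

print(mean_of_means, var_of_means, share_inside)
```

The first number speaks to the law of large numbers (the sample means cluster around the population mean); the last speaks to the CLT (roughly 95% of standardized means fall within ±1.96 even though individual draws are skewed).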

Sample Statistics and Sampling Distributions

When we draw from (or take) a sample, the sample is a random variable with its own characteristics. The “standard deviation of a sampling distribution” is called the standard error. The mean of the sample, or the sample mean, is a random variable defined by:

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

[Figure: the individual draws are not normal, but the sample mean (and sum) tends toward a normal distribution.]

Stock, Chapter 3:

Review of Statistics

In this chapter…

 Describe and interpret estimators of the sample mean and their properties.

 Describe and interpret the least squares estimator.

 Define, interpret and calculate the critical t‐values.

 Define, calculate and interpret a confidence interval.

 Describe the properties of point estimators:

 Distinguish between unbiased and biased estimators

 Define an efficient estimator and consistent estimator

 Explain and apply the process of hypothesis testing:

 Define and interpret the null hypothesis and the alternative hypothesis

 Distinguish between one‐sided and two‐sided hypotheses

 Describe the confidence interval approach to hypothesis testing

 Describe the test of significance approach to hypothesis testing

 Define, calculate and interpret type I and type II errors

 Define and interpret the p value

 Define, calculate, and interpret the sample variance, sample standard deviation, and standard error.

 Define, calculate, and interpret confidence intervals for the population mean.

 Perform and interpret hypothesis tests for the difference between two means.

 Define, describe, apply, and interpret the t-statistic when the sample size is small.

 Interpret scatterplots.

 Define, describe, and interpret the sample covariance and correlation.

Describe and interpret estimators of the sample mean and their properties.

An estimator is a function of a sample of data to be drawn randomly from a population.

An estimate is the numerical value of the estimator when it is actually computed using data from a specific sample. An estimator is a random variable because of randomness in selecting the sample, while an estimate is a nonrandom number.


The sample mean, Ӯ, is the best linear unbiased estimator (BLUE). In the Stock & Watson example, the average (mean) wage among 200 people is $22.64:

Sample Mean                  $22.64
Sample Standard Deviation    $18.14
Sample size (n)              200
Standard Error               1.28
H0: Population Mean =        $20.00
Test t statistic             2.06
p value                      4.09%

Please note:

 The average wage of (n = ) 200 observations is $22.64

 The standard deviation of this sample is $18.14

 The standard error of the sample mean is $1.28 because $18.14/SQRT(200) = $1.28

 The degrees of freedom (d.f.) in this case are 199 = 200 – 1
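The arithmetic behind the wage table can be sketched in a few lines. Note one assumption: to stay within Python's standard library, the p value below uses the normal approximation, so it comes out slightly below the table's 4.09%, which is consistent with the table using the Student's t distribution with 199 d.f.

```python
from math import sqrt
from statistics import NormalDist

sample_mean = 22.64   # average wage in the sample
sample_sd = 18.14     # sample standard deviation
n = 200               # sample size
null_mean = 20.00     # H0: population mean = $20.00

# Standard error = sample standard deviation / sqrt(sample size).
standard_error = sample_sd / sqrt(n)

# Test statistic = (sample mean - hypothesized mean) / standard error.
t_stat = (sample_mean - null_mean) / standard_error

# Two-tailed p value under the large-sample normal approximation (an assumption).
p_value_normal = 2 * (1 - NormalDist().cdf(abs(t_stat)))

print(round(standard_error, 2), round(t_stat, 2), round(p_value_normal, 4))
```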

“An estimator is a recipe for obtaining an estimate of a population parameter. A simple analogy explains the core idea: An estimator is like a recipe in a cook book; an estimate is like a cake baked according to the recipe.” Barreto & Howland, Introductory Econometrics

In the above example, the sample mean is an estimator of the unknown, true population mean (in this case, the sample mean estimator gives an estimate of $22.64).

What makes one estimator superior to another?

 Unbiased: the mean of the sampling distribution is the population mean (mu).

 Consistent: when the sample size is large, the uncertainty about the value of the population mean arising from random variations in the sample is very small.

 Variance and efficiency: among all unbiased estimators, the estimator that has the smallest variance is “efficient.”

If the sample is random (i.i.d.), the sample mean is the Best Linear Unbiased Estimator (BLUE). The sample mean is:

 Consistent, AND

 The most efficient among linear unbiased estimators

Describe and interpret the least squares estimator.

Consider the sum of squared gaps (Yi – m) between each observation and a candidate estimator (m):

$$\sum_{i=1}^{n} (Y_i - m)^2$$









The estimator (m) that minimizes the sum of squared gaps in the formula above is called the least squares estimator.
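A short derivation shows which value of m the least squares criterion selects. It treats the observations as fixed numbers, differentiates the sum of squared gaps with respect to m, and sets the result to zero; the minimizer turns out to be the sample mean.

```latex
\frac{d}{dm}\sum_{i=1}^{n}(Y_i - m)^2 = -2\sum_{i=1}^{n}(Y_i - m) = 0
\;\Longrightarrow\; \sum_{i=1}^{n} Y_i = n\,m
\;\Longrightarrow\; m = \frac{1}{n}\sum_{i=1}^{n} Y_i = \bar{Y}
```

The second derivative is 2n > 0, so this stationary point is a minimum rather than a maximum.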

Define, interpret and calculate critical t‐values.

The t-statistic or t-ratio is given by:

$$t = \frac{\bar{Y} - \mu_{Y,0}}{SE(\bar{Y})}$$







The critical t-value or “lookup” t-value is the t-value for which the test just rejects the null hypothesis at a given significance level. For example:

 95% two-tailed (2T) critical t-value with 20 d.f. is 2.086

 Significance test: is t-statistic > critical (lookup) t?

The critical t-values bound a region within the Student’s t distribution that is a specific percentage (90%? 95%? 99%?) of the total area under the curve. With (n-1) degrees of freedom (d.f.), the confidence interval is given by:

$$\bar{Y} - t\,\frac{S_Y}{\sqrt{n}} < \mu_Y < \bar{Y} + t\,\frac{S_Y}{\sqrt{n}}$$





 

For example: critical t

If the (small) sample size is 20, then the 95% two-tailed critical t is 2.093. That is because the degrees of freedom are 19 (d.f. = n - 1) and if we review the lookup table on the following page (corresponds to Gujarati A-2) under the column = 0.025/0.05 (i.e., 1-tail = 0.025, 2-tail = 0.05) and row = 19, then we find the cell value = 2.093. Therefore, given 19 d.f., 95% of the area under the Student’s t distribution is bounded by +/- 2.093. Specifically, P(-2.093 ≤ t ≤ 2.093) = 95%.


Student’s t Lookup Table

Excel function: = TINV(two-tailed probability [larger #], d.f.)

     1-tail:     0.25      0.1     0.05    0.025     0.01    0.005    0.001
d.f. 2-tail:     0.50      0.2      0.1     0.05     0.02     0.01    0.002
1               1.000    3.078    6.314   12.706   31.821   63.657  318.309
2               0.816    1.886    2.920    4.303    6.965    9.925   22.327
3               0.765    1.638    2.353    3.182    4.541    5.841   10.215
4               0.741    1.533    2.132    2.776    3.747    4.604    7.173
5               0.727    1.476    2.015    2.571    3.365    4.032    5.893
6               0.718    1.440    1.943    2.447    3.143    3.707    5.208
7               0.711    1.415    1.895    2.365    2.998    3.499    4.785
8               0.706    1.397    1.860    2.306    2.896    3.355    4.501
9               0.703    1.383    1.833    2.262    2.821    3.250    4.297
10              0.700    1.372    1.812    2.228    2.764    3.169    4.144
11              0.697    1.363    1.796    2.201    2.718    3.106    4.025
12              0.695    1.356    1.782    2.179    2.681    3.055    3.930
13              0.694    1.350    1.771    2.160    2.650    3.012    3.852
14              0.692    1.345    1.761    2.145    2.624    2.977    3.787
15              0.691    1.341    1.753    2.131    2.602    2.947    3.733
16              0.690    1.337    1.746    2.120    2.583    2.921    3.686
17              0.689    1.333    1.740    2.110    2.567    2.898    3.646
18              0.688    1.330    1.734    2.101    2.552    2.878    3.610
19              0.688    1.328    1.729    2.093    2.539    2.861    3.579
20              0.687    1.325    1.725    2.086    2.528    2.845    3.552
21              0.686    1.323    1.721    2.080    2.518    2.831    3.527
22              0.686    1.321    1.717    2.074    2.508    2.819    3.505
23              0.685    1.319    1.714    2.069    2.500    2.807    3.485
24              0.685    1.318    1.711    2.064    2.492    2.797    3.467
25              0.684    1.316    1.708    2.060    2.485    2.787    3.450
26              0.684    1.315    1.706    2.056    2.479    2.779    3.435
27              0.684    1.314    1.703    2.052    2.473    2.771    3.421
28              0.683    1.313    1.701    2.048    2.467    2.763    3.408
29              0.683    1.311    1.699    2.045    2.462    2.756    3.396
30              0.683    1.310    1.697    2.042    2.457    2.750    3.385

The green shaded area represents values less than three (< 3.0). Think of it as the “sweet spot.” For confidences less than 99% and d.f. > 13, the critical t is always less than 3.0. So, for example, a computed t of 7 or 13 will generally be significant. Keep this in mind because in many cases, you do not need to refer to the lookup table if the computed t is large; you can simply reject the null.
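The significance test (“is the t-statistic greater than the critical t?”) can be sketched as a simple lookup. The dictionary below copies only a few entries from the 5% two-tailed column of the table; the names `CRITICAL_T_95_TWO_TAILED` and `is_significant` are hypothetical, chosen for this illustration.

```python
# Excerpt of the 5% two-tailed (2.5% one-tailed) column: d.f. -> critical t.
CRITICAL_T_95_TWO_TAILED = {19: 2.093, 20: 2.086, 27: 2.052, 30: 2.042}

def is_significant(t_stat, df):
    """Return True when |t| exceeds the 95% two-tailed critical t for df."""
    return abs(t_stat) > CRITICAL_T_95_TWO_TAILED[df]

# A large computed t is significant without much need for the table...
print(is_significant(7.0, 20))
# ...while a small one is not.
print(is_significant(1.5, 20))
```

The absolute value is used because a two-tailed test rejects for large positive or large negative t.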


Define, calculate and interpret a confidence interval.

The confidence interval uses the product of [standard error × critical “lookup” t]. In the Stock & Watson example, the large-sample confidence interval is given by 22.64 +/- (1.28)(1.96) because 1.28 is the standard error and 1.96 is the critical Z value associated with 95% two-tailed confidence. The table below instead uses the critical t for 199 d.f. (1.972), which gives a slightly wider interval:

Sample Mean             $22.64
Sample Std Deviation    $18.14
Sample size (n)         200
Standard Error          1.28
Confidence              95%
Critical t              1.972
Lower limit             $20.11
Upper limit             $25.17

Confidence Intervals: Another example with a sample of 28 P/E ratios

Assume we have price-to-earnings ratios (P/E ratios) of 28 NYSE companies:

Mean                 23.25
Variance             90.13
Std Dev              9.49
Count                28
d.f.                 27
Confidence (1-α)     95%
Significance (α)     5%
Critical t           2.052
Standard error       1.794
Lower limit          19.6     = 23.25 - (2.052)*(1.794)
Upper limit          26.9     = 23.25 + (2.052)*(1.794)
Hypothesis           18.5
t value              2.65     = (23.25 - 18.5) / (1.794)
p value              1.3%
Reject null with     98.7%

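 The confidence coefficient is selected by the user; e.g., 95% (0.95) or 99% (0.99).

 The significance = 1 – confidence coefficient.

For the Stock & Watson wage example:

$$95\%\ \text{CI for } \mu_Y = \bar{Y} \pm 1.96\,SE(\bar{Y}); \qquad \text{with the exact critical t: } 22.64 \pm (1.28)(1.972)$$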

To construct a confidence interval with the dataset above:

 Determine degrees of freedom (d.f.). d.f. = sample size – 1. In this case, 28 – 1 = 27 d.f.

 Select confidence. In this case, confidence coefficient = 0.95 = 95%

 We are constructing an interval, so we need the critical t value for 5% significance with two-tails.

 The critical t value is equal to 2.052. That’s the value with 27 d.f. and either 2.5% one-tailed significance or 5% two-tailed significance (see how they are the same provided the distribution is symmetrical?)

 The standard error is equal to the sample standard deviation divided by the square root of the sample size (not d.f.!). In this case, 9.49/SQRT(28) ≈ 1.794.

 The lower limit of the confidence interval is given by: the sample mean minus the critical t (2.052) multiplied by the standard error (9.49/SQRT[28]).

 The upper limit of the confidence interval is given by: the sample mean plus the critical t (2.052) multiplied by the standard error (9.49/SQRT[28]).

$$\bar{X} - t\,\frac{S_x}{\sqrt{n}} < \mu_X < \bar{X} + t\,\frac{S_x}{\sqrt{n}} \;\Longrightarrow\; 23.25 - 2.052\left(\frac{9.49}{\sqrt{28}}\right) < \mu_X < 23.25 + 2.052\left(\frac{9.49}{\sqrt{28}}\right)$$















 This confidence interval is a random interval. Why? Because it will vary randomly with each sample, whereas we assume the population mean is static.

We don’t say the probability is 95% that the “true” population mean lies within this interval. That implies the true mean is variable. Instead, we say the probability is 95% that the random interval contains the true mean. See how the population mean is treated as static and the interval varies?
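The steps for the P/E example can be sketched as follows. The critical t (2.052) is taken from the lookup table rather than computed, and the variable names are illustrative only.

```python
from math import sqrt

mean = 23.25          # sample mean of the 28 P/E ratios
variance = 90.13      # sample variance
n = 28                # sample size (count)
critical_t = 2.052    # lookup value: 27 d.f., 5% two-tailed
hypothesis = 18.5     # null hypothesis for the population mean

std_dev = sqrt(variance)
standard_error = std_dev / sqrt(n)      # divide by sqrt(n), not sqrt(d.f.)

lower = mean - critical_t * standard_error
upper = mean + critical_t * standard_error

t_value = (mean - hypothesis) / standard_error
reject_null = abs(t_value) > critical_t  # test of significance at 5%

print(round(lower, 1), round(upper, 1), round(t_value, 2), reject_null)
```

Notice that the two approaches agree: the hypothesized value of 18.5 falls outside the confidence interval, and the computed t exceeds the critical t.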


Describe the properties of point estimators:

 An estimator is a function of a sample of data to be drawn randomly from a population.

 An estimate is the numerical value of the estimator when it is actually computed using data from a specific sample.

The key properties of point estimators include:

 Linearity: estimator is a linear function of sample observations. For example, the sample mean is a linear function of the observations.

 Unbiasedness: the average or expected value of the estimator is equal to the true value of the parameter.

 Minimum variance: the variance of the estimator is smaller than that of any “competing” estimator. Note: an estimator can have minimum variance yet be biased.

 Efficiency: Among the set of unbiased estimators, the estimator with the minimum variance is the efficient estimator (i.e., it has the smallest variance among unbiased estimators)

 Best linear unbiased estimator (BLUE): the estimator that combines three properties: (i) linear, (ii) unbiased, and (iii) minimum variance

 Consistency: an estimator is consistent if, as the sample size increases, it approaches (converges on) the true value of the parameter

Distinguish between unbiased and biased estimators

An estimator $\hat{\mu}_Y$ of the population mean is unbiased if:

$$E(\hat{\mu}_Y) = \mu_Y$$







Otherwise the estimator is biased.

If the expected value of the estimator is the population parameter, the estimator is unbiased. If, in repeated applications of a method, the mean value of the estimates coincides with the true parameter value, that estimator is called an unbiased estimator. Unbiasedness is a repeated sampling property: if we draw several samples of size (n) from a population and compute the unbiased sample statistic for each sample, the average of these statistics will tend to approach (converge on) the population parameter.
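The repeated-sampling idea can be made exact with a tiny population. The sketch below is our own illustration (not from the reading): it enumerates every equally likely sample of two rolls of a fair die and averages three statistics over all samples, namely the sample mean, the sample variance with divisor n − 1, and the variance with divisor n.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
mu = Fraction(7, 2)        # population mean of a fair die
sigma2 = Fraction(35, 12)  # population variance of a fair die

n = 2
samples = list(product(faces, repeat=n))  # 36 equally likely ordered samples

def mean(xs):
    return Fraction(sum(xs), len(xs))

def var(xs, ddof):
    """Variance with divisor (n - ddof): ddof=1 is the usual sample variance."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - ddof)

def expected(stat):
    """Average of a statistic over every possible sample."""
    return sum(stat(s) for s in samples) / len(samples)

e_mean = expected(mean)
e_var_unbiased = expected(lambda s: var(s, 1))
e_var_biased = expected(lambda s: var(s, 0))

print(e_mean, e_var_unbiased, e_var_biased)
```

The first two averages land exactly on the population parameters (unbiased), while the divide-by-n variance averages to only half the population variance when n = 2 (biased).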


Define an efficient estimator and consistent estimator

An efficient estimator is both unbiased (i.e., the mean or expectation of the statistic is equal to the parameter) and has a variance smaller than the alternatives (i.e., all other things being equal, we would prefer a smaller variance). A statement of the error or precision of an estimate is often called its reliability.

 Efficient: among unbiased estimators, the estimator with the smallest variance

 “Consistent” is about the estimator’s behavior as sample size increases

Efficient

• Unbiased

• Smallest variance: $\text{variance}(\hat{\mu}_Y) < \text{variance}(\tilde{\mu}_Y)$

Consistent

• As sample size increases, estimator approaches true parameter value: $\bar{Y} \xrightarrow{p} \mu_Y$

• As n→∞, E[estimator] = parameter

Explain and apply the process of hypothesis testing:

 Define and interpret the null hypothesis and the alternative hypothesis

 Distinguish between one‐sided and two‐sided hypotheses

 Describe the confidence interval approach to hypothesis testing

 Describe the test of significance approach to hypothesis testing

 Define, calculate and interpret type I and type II errors

 Define and interpret the p value

Define and interpret the null hypothesis and the alternative hypothesis

Please note the null must contain the equal sign (“=”):

$$H_0: E(Y) = \mu_{Y,0} \qquad H_1: E(Y) \neq \mu_{Y,0}$$







The null hypothesis, denoted by H0, is tested against the alternative hypothesis, which is denoted by H1 or sometimes HA.

Often, we test for the significance of the intercept or a partial slope coefficient in a linear regression. Typically, in this case, our null hypothesis is: “the slope is zero” or “there is no correlation between X and Y” or “the regression coefficients jointly are not significant.” In which case, if we reject the null, we are finding the statistic to be significant which, in this case, means “significantly different from zero.”

Statistical significance implies our null hypothesis (i.e., the parameter equals zero) was rejected. We concluded the parameter is nonzero. For example, a “significant” slope estimate means we rejected the null hypothesis that the true slope is zero.

$$H_0: E(Y) = \$20 \qquad H_1: E(Y) \neq \$20$$






Stock, Chapter 2: Review of Probability

Define random variables, and distinguish between continuous and discrete random variables.

$$P(a \le X \le b) = \int_a^b f(x)\,dx$$

$$\Pr(c_1 \le Z \le c_2) = \Phi(c_2) - \Phi(c_1)$$

$$\Pr(Z \le c) = \Phi(c)$$

Pr (X = 3)

Pr (X ≤ 3)

$$P(X = x) = f(x)$$
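A small sketch may help separate the “local” probability (PMF) from the cumulative probability (CDF). It assumes a fair six-sided die for the discrete case and the standard normal for the continuous case; the die and the ±1.96 bounds are illustrative choices.

```python
from fractions import Fraction
from statistics import NormalDist

# Discrete: PMF of a fair die, P(X = x) = f(x) = 1/6 for each face.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def cdf(x):
    """Cumulative probability Pr(X <= x): sum the PMF up to and including x."""
    return sum(p for face, p in pmf.items() if face <= x)

p_equals_3 = pmf[3]      # Pr(X = 3), a PMF value
p_at_most_3 = cdf(3)     # Pr(X <= 3), a CDF value

# Continuous: Pr(c1 <= Z <= c2) is a difference of two standard normal CDF values.
Z = NormalDist()         # mean 0, standard deviation 1
p_interval = Z.cdf(1.96) - Z.cdf(-1.96)

print(p_equals_3, p_at_most_3, round(p_interval, 4))
```

For a continuous variable the probability of any single point is zero, which is why the continuous case is always stated over an interval.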

Summary

                       Continuous                       Discrete
                       Are measured                     Are counted
                       Infinite                         Finite
Examples in Finance    Distance, Time (e.g.)            Default (1,0) (e.g.)
                       Severity of loss (e.g.)          Frequency of loss (e.g.)
                       Asset returns (e.g.)
For example            Normal                           Bernoulli (0/1)
                       Student’s t                      Binomial (series i.i.d. Bernoullis)
                       Chi-square                       Poisson
                       F distribution                   Logarithmic
                       Lognormal
                       Exponential
                       Gamma, Beta
                       EVT Distributions (GPD, GEV)

Define the probability of an event.

$$P(A) = \frac{\text{Number of outcomes favorable to A}}{\text{Total number of outcomes}}$$