Introduction to Probability


EE 179, Lecture 15, Handout #24

◮ Probability theory gives a mathematical characterization for experiments with random outcomes.

    ◮ coin toss

    ◮ life of lightbulb

    ◮ binary data sequence

    ◮ Brownian motion

◮ An event is a set of outcomes belonging to a sample space.

◮ Experiments must be repeatable and show statistical regularity, i.e., a large number of repetitions yields regular patterns in the outcomes.

◮ We define the probability of an event A as the relative frequency with which the outcome belongs to A, in the limit of a large number of trials n:

$$P(A) = \lim_{n \to \infty} \frac{\text{number of times outcome is in } A}{n}$$

◮ Examples:

    ◮ $P(\text{roulette wheel outcome is red}) = \frac{18}{38}$

    ◮ $P(\text{rain tomorrow}) \approx \frac{57}{365}$ (bogus)
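The relative-frequency definition can be illustrated with a short simulation. This is a minimal sketch, assuming a wheel with 38 equally likely pockets of which 18 are red (consistent with the 18/38 figure); the function name `relative_frequency` and the seed are illustrative choices, not part of the handout.

```python
import random

def relative_frequency(n_spins: int, seed: int = 0) -> float:
    """Fraction of n_spins simulated spins that land on a red pocket."""
    rng = random.Random(seed)   # seeded so the run is repeatable
    pockets = 38                # assumption: 38 equally likely pockets
    red = 18                    # assumption: pockets 0..17 stand for red
    hits = sum(1 for _ in range(n_spins) if rng.randrange(pockets) < red)
    return hits / n_spins

# The fraction of red outcomes should settle down as n grows.
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(n))
print("exact", 18 / 38)
```

As n grows, the printed fractions should settle near 18/38, which is the statistical regularity the definition relies on.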

EE 179, May 2, 2014 Lecture 15, Page 1

(2)

Mathematics of Probability: Axiomatic Approach

◮ Random events are defined on a probability space:

    ◮ sample space $S$ of possible outcomes (finite or infinite)

    ◮ family (set) of events $\{A_i\}$ that are subsets of $S$

    ◮ probability measure $P(\cdot)$ on events

◮ The probability measure has three properties:

    ◮ $P(A) \ge 0$

    ◮ $P(S) = 1$

    ◮ if $A_i \cap A_j$ is empty for all $i \ne j$, then $P(\cup_i A_i) = \sum_i P(A_i)$

◮ We formally write a probability space as the triple $(S, \{A_i\}, P(\cdot))$.

◮ A very common notation for the sample space is Ω, and a generic outcome is ω

◮ This axiomatic approach was introduced by Kolmogorov around 1930.
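For a finite sample space the three properties can be checked exhaustively. The sketch below assumes one fair die with equally likely outcomes and takes every subset of S as an event; the names `P` and `all_events` are illustrative, not from the handout.

```python
from fractions import Fraction
from itertools import chain, combinations

S = frozenset(range(1, 7))   # sample space of one die (assumed example)

def P(event) -> Fraction:
    """Probability measure, assuming equally likely outcomes."""
    event = frozenset(event)
    assert event <= S, "events must be subsets of S"
    return Fraction(len(event), len(S))

def all_events(space):
    """Every subset of a finite sample space (the power set)."""
    items = sorted(space)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

events = all_events(S)
nonnegative = all(P(A) >= 0 for A in events)
normalized = P(S) == 1
# Additivity, checked here for every pair of disjoint events.
additive = all(
    P(A | B) == P(A) + P(B)
    for A in events for B in events if not (A & B)
)
print(nonnegative, normalized, additive)
```

Exact rational arithmetic is used so that the additivity check is an equality test rather than a floating-point comparison.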

Probability has been used for thousands of years. Proverbs 16:33: “We may throw the dice, but the Lord determines how they fall.”

(3)

Random Variables

◮ Practical definition of random variable: the numerical output of a probabilistic experiment.

    ◮ coin flip (tails=0, heads=1) or sum of two dice (2, 3, . . . , 12)

    ◮ amount of snowfall at a location over a duration

    ◮ noise voltage at an instant or integral of noise over an interval

◮ Mathematical definition: a real-valued function defined on the sample space of a probability space.

$$X : \Omega \to \mathbb{R} \;\Rightarrow\; X(\omega) \in \mathbb{R} \text{ for every } \omega \in \Omega$$

Examples:

    ◮ Sample space for toss of two dice is $\{(i, j) : 1 \le i, j \le 6\}$. The sum is the r.v. $i + j$.

    ◮ For the BSC, the input is the r.v. $X = x$ and the output is $Y = y$. We have derived the joint probability distribution for $X$ and $Y$.

◮ If the values of a r.v. are discrete, the r.v. is called discrete. Otherwise the r.v. is continuous or mixed.
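The two-dice example can be written out directly: the random variable is an ordinary function on the sample space, and the probability of each value follows by counting outcomes. This is a minimal sketch assuming fair dice, so all 36 outcomes are equally likely; the names `omega`, `X`, and `pmf_sum` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs (i, j) with 1 <= i, j <= 6.
omega = list(product(range(1, 7), repeat=2))

def X(outcome) -> int:
    """Random variable: maps the outcome (i, j) to the real number i + j."""
    i, j = outcome
    return i + j

# Count how many outcomes map to each value, then normalize.
counts = Counter(X(w) for w in omega)
pmf_sum = {value: Fraction(count, len(omega))
           for value, count in sorted(counts.items())}

for value, prob in pmf_sum.items():
    print(value, prob)
```

The values of X are discrete (2 through 12), so X is a discrete random variable in the sense of the last bullet.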

(4)

Probability Mass Function

The probability distribution of a discrete random variable is completely described by its probability mass function (pmf or PMF).

$$P\{X = x_k\} = p_X(x_k), \quad \text{where the values of } X \text{ are } \{x_k\}$$

In this special case of the axioms of probability,

$$p_X(x_k) \ge 0, \qquad \sum_k p_X(x_k) = 1$$

Important discrete random variables:

◮ Bernoulli: Ω = {0, 1}, p(1) = p, p(0) = 1 − p.

◮ Binomial: $S_n = \sum_{k=1}^{n} X_k$, where the $X_k$ are independent Bernoulli.

◮ Geometric: $p(n) = (1 - p)^{n-1} p$ for $n = 1, 2, \ldots$ and $0 \le p \le 1$.

◮ Poisson: $p(n) = \frac{\lambda^n}{n!} e^{-\lambda}$, where $n \ge 0$ and $\lambda \ge 0$.

Solving problems about discrete r.v.s usually requires manipulating sums (combinatorics).
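The pmfs above can be evaluated and checked against the normalization condition with a few sums. This is a minimal sketch: the closed-form binomial pmf $\binom{n}{k} p^k (1-p)^{n-k}$ is the standard result for a sum of independent Bernoulli variables and is not written out in the handout, and all function names and parameter values are illustrative.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P{S_n = k}, standard closed form (not stated in the handout)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_by_convolution(n: int, p: float) -> list:
    """pmf of S_n built by adding one independent Bernoulli(p) at a time."""
    pmf = [1.0]                          # S_0 = 0 with probability 1
    for _ in range(n):
        nxt = [0.0] * (len(pmf) + 1)
        for k, prob in enumerate(pmf):
            nxt[k] += prob * (1 - p)     # next Bernoulli equals 0
            nxt[k + 1] += prob * p       # next Bernoulli equals 1
        pmf = nxt
    return pmf

def geometric_pmf(n: int, p: float) -> float:
    return (1 - p) ** (n - 1) * p

def poisson_pmf(n: int, lam: float) -> float:
    return lam**n / math.factorial(n) * math.exp(-lam)

# Each pmf should sum to 1 (the infinite sums are truncated).
print(sum(binomial_pmf(k, 5, 0.3) for k in range(6)))
print(sum(geometric_pmf(n, 0.25) for n in range(1, 200)))
print(sum(poisson_pmf(n, 3.0) for n in range(60)))
```

The convolution routine builds the binomial pmf from the definition $S_n = \sum_k X_k$, so comparing it with the closed form is a check on both.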

(5)

Cumulative Distribution Function

For continuous random variables $P\{X = x\} = 0$ for all $x$, so we cannot use a pmf.

The cumulative distribution function (cdf or CDF) can describe both discrete and continuous r.v.s.

The CDF of a real-valued r.v. X is defined by

$$F_X(x) = P\{X \le x\}, \quad -\infty \le x \le \infty$$

Properties of CDF:

◮ Monotone: if $x_1 \le x_2$ then $F(x_1) \le F(x_2)$

◮ Limits: $F(-\infty) = \lim_{x \to -\infty} F(x) = 0$, $F(\infty) = \lim_{x \to \infty} F(x) = 1$

◮ Interval: $P\{a < X \le b\} = P\{X \le b\} - P\{X \le a\} = F(b) - F(a)$

◮ Point: $P\{X = x\} = P\{X \le x\} - P\{X < x\} = F(x) - F(x^-)$, where $F(x^-) = \lim_{u \uparrow x} F(u)$.

A random variable is continuous if its cdf F (x) is continuous for every x.

An alternative definition of the CDF is $F(x) = P\{X < x\}$; this convention is used by Russian mathematicians.
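The interval and point properties are easy to check on a discrete example. The sketch below assumes a fair die and uses exact fractions; `F` implements $P\{X \le x\}$ and `F_left` the left limit $P\{X < x\}$, both illustrative names.

```python
from fractions import Fraction

pmf = {k: Fraction(1, 6) for k in range(1, 7)}   # fair die (assumed example)

def F(x) -> Fraction:
    """CDF: P{X <= x}, the total pmf mass at or below x."""
    return sum((p for k, p in pmf.items() if k <= x), Fraction(0))

def F_left(x) -> Fraction:
    """Left limit of the CDF at x, equal to P{X < x}."""
    return sum((p for k, p in pmf.items() if k < x), Fraction(0))

print(F(0), F(3.5), F(6))     # 0 1/2 1
print(F(5) - F(2))            # P{2 < X <= 5} = 1/2
print(F(4) - F_left(4))       # P{X = 4} = 1/6
```

The difference between `F` and `F_left` is exactly the difference between the two conventions for defining the CDF: they agree except at the jump points.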

(6)

Types of CDFs

The CDF of any discrete r.v. is an increasing staircase function.

The CDF of a continuous r.v. is a smooth nondecreasing function.

The CDF of a mixed r.v. is continuous between jumps; $P\{X = x\} > 0$ for some $x$.

Here “nondecreasing” means “increasing, but not necessarily strictly increasing”.

(7)

Probability Density Function

If X is a continuous r.v., then

$$P([x_1, x_2]) = P\{x_1 \le X \le x_2\} = F_X(x_2) - F_X(x_1).$$

If $F_X(x)$ is differentiable, then

$$F_X(x_2) - F_X(x_1) = \int_{x_1}^{x_2} p_X(u)\,du, \quad \text{where } p_X(x) = \frac{dF_X}{dx}$$

We call $p_X(x)$ the probability density function (pdf, PDF) of $X$; $p_X(x)$ is the probability per unit width of a narrow interval around $x$.
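The relation between the CDF and the density can be checked numerically. This is a minimal sketch, assuming the example CDF $F(x) = x^2$ on $[0, 1]$ (so the density is $2x$); the central-difference derivative and the midpoint-rule integrator are illustrative helpers, not methods from the handout.

```python
def F(x: float) -> float:
    """Assumed example CDF: F(x) = x^2 on [0, 1]."""
    if x < 0:
        return 0.0
    if x > 1:
        return 1.0
    return x * x

def pdf(x: float, h: float = 1e-6) -> float:
    """Central-difference estimate of dF/dx."""
    return (F(x + h) - F(x - h)) / (2 * h)

def integrate(f, a: float, b: float, steps: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

x1, x2 = 0.2, 0.7
print(F(x2) - F(x1))              # interval probability from the CDF
print(integrate(pdf, x1, x2))     # same probability from the density
# Density times a narrow width approximates the probability of that interval.
print(pdf(0.5) * 0.01, F(0.505) - F(0.495))
```

The last line illustrates the "probability per unit width" reading: for a narrow interval around x, the probability is close to the density at x times the interval width.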

(8)

Properties of PDF

◮ Nonnegative: $p(x) \ge 0$, since $F(x)$ is nondecreasing.

◮ CDF is the antiderivative of the PDF:

$$F(x) = \int_{-\infty}^{x} p(u)\,du$$

◮ Impulses: if $P\{X = x_0\} = p_0 > 0$ then $p_X(x)$ contains the impulse $p_0\,\delta(x - x_0)$.

◮ Mixed r.v.: if $F(x)$ is differentiable except at discrete points $\{x_k\}$, then

$$p(x) = \tilde{p}(x) + \sum_k p_k\,\delta(x - x_k)$$

where $\tilde{p}(x)$ is a nonnegative continuous function and

$$\int_{-\infty}^{\infty} \tilde{p}(x)\,dx = 1 - \sum_k p_k$$

Most books use $f_X(x)$ for pdf and $p_X(x)$ for pmf.
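The mixed case can be made concrete with a clipped variable. The sketch below assumes $Y = \min(X, 1)$ with $X$ uniform on $[0, 2]$, so the continuous part of the density is $1/2$ on $[0, 1)$ (piecewise constant rather than continuous everywhere) and there is a single impulse of weight $1/2$ at $y = 1$. The names `p_tilde`, `impulses`, and `integrate` are illustrative.

```python
def p_tilde(x: float) -> float:
    """Continuous part of the density: 1/2 on [0, 1), zero elsewhere."""
    return 0.5 if 0.0 <= x < 1.0 else 0.0

impulses = {1.0: 0.5}   # impulse location x_k -> weight p_k

def integrate(f, a: float, b: float, steps: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

def F(x: float) -> float:
    """CDF: continuous part integrated up to x, plus impulse weights at or below x."""
    upper = min(max(x, 0.0), 1.0)   # p_tilde vanishes outside [0, 1)
    continuous = integrate(p_tilde, 0.0, upper) if upper > 0.0 else 0.0
    jumps = sum(w for xk, w in impulses.items() if xk <= x)
    return continuous + jumps

continuous_mass = integrate(p_tilde, 0.0, 1.0)
print(continuous_mass, 1 - sum(impulses.values()))   # two sides of the normalization
print(F(0.5), F(0.999), F(1.0))                      # the CDF jumps at x = 1
```

The jump in F at x = 1 equals the impulse weight, which is the point property of the CDF applied to a mixed random variable.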

(9)

Statistics of Random Variables

The complete description of a random variable is its CDF, which specifies probabilities of all intervals, e.g., $X > x_0$.

To compare two r.v.s we often need single numbers (statistics) associated with each random variable. The most common statistics are:

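◮ Mean (average, expected value):

$$\bar{X} = E(X) = \int_{-\infty}^{\infty} x\,p(x)\,dx \quad \text{or} \quad \sum_{n=-\infty}^{\infty} x_n\,p(x_n)$$

◮ Second moment:

$$E(X^2) = \int_{-\infty}^{\infty} x^2\,p(x)\,dx \quad \text{or} \quad \sum_{n=-\infty}^{\infty} x_n^2\,p(x_n)$$

◮ Variance:

$$\mathrm{Var}(X) = E((X - \bar{X})^2) = E(X^2) - (E(X))^2$$

◮ Median: The median $X_{\text{med}}$ is the value satisfying $P\{X < X_{\text{med}}\} = P\{X > X_{\text{med}}\}$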
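For a discrete random variable these statistics reduce to finite sums. The sketch below assumes a fair die and uses exact fractions so that the two variance formulas can be compared with an equality test; the helper `is_median` is an illustrative name.

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}   # fair die (assumed example)

mean = sum(x * p for x, p in pmf.items())
second_moment = sum(x**2 * p for x, p in pmf.items())
variance_direct = sum((x - mean) ** 2 * p for x, p in pmf.items())
variance_shortcut = second_moment - mean**2

def is_median(m) -> bool:
    """True if the probability below m equals the probability above m."""
    below = sum(p for x, p in pmf.items() if x < m)
    above = sum(p for x, p in pmf.items() if x > m)
    return below == above

print(mean, second_moment)                   # 7/2 91/6
print(variance_direct, variance_shortcut)    # 35/12 35/12
print(is_median(Fraction(7, 2)), is_median(3))
```

For this pmf the median is not unique: any value strictly between 3 and 4 satisfies the defining condition.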

(10)

Examples of Continuous Random Variables

Uniform random variable has a constant density on an interval.

We write $X \sim \mathrm{Unif}[a, b]$ if $p_X(x)$ is constant on $[a, b]$ and 0 elsewhere.

$$p_X(x; a, b) = \begin{cases} \dfrac{1}{b - a} & a \le x \le b \\ 0 & x < a \text{ or } x > b \end{cases}$$

Examples of uniform random variables are the final position of a roulette wheel and quantization error.

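$$E(X) = \int_a^b \frac{x}{b - a}\,dx = \left. \frac{x^2}{2(b - a)} \right|_a^b = \frac{b^2 - a^2}{2(b - a)} = \frac{b + a}{2}$$

$$E(X^2) = \int_a^b \frac{x^2}{b - a}\,dx = \left. \frac{x^3}{3(b - a)} \right|_a^b = \frac{b^3 - a^3}{3(b - a)} = \frac{b^2 + ba + a^2}{3}$$

$$\mathrm{Var}(X) = \frac{b^2 + ba + a^2}{3} - \frac{b^2 + 2ba + a^2}{4} = \frac{b^2 - 2ba + a^2}{12} = \frac{(b - a)^2}{12}$$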
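The closed-form moments of the uniform distribution can be compared against sample averages. This is a rough Monte Carlo sketch with arbitrary endpoints a = 2 and b = 5 and a fixed seed; the agreement is only approximate and improves with the number of samples.

```python
import random

a, b = 2.0, 5.0                  # arbitrary illustrative endpoints
rng = random.Random(179)         # seeded so the run is repeatable
samples = [rng.uniform(a, b) for _ in range(200_000)]

n = len(samples)
sample_mean = sum(samples) / n
sample_second_moment = sum(x * x for x in samples) / n
sample_variance = sample_second_moment - sample_mean**2

# Each line prints the estimate next to the closed-form value.
print(sample_mean, (b + a) / 2)
print(sample_second_moment, (b * b + b * a + a * a) / 3)
print(sample_variance, (b - a) ** 2 / 12)
```

With these endpoints the closed forms give a mean of 3.5, a second moment of 13, and a variance of 0.75.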

(11)

Examples of Continuous Random Variables (cont.)

Exponential random variable has one parameter λ.

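$$f(x; \lambda) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases}$$

The CDF for $x \ge 0$ is

$$F(x; \lambda) = \int_{-\infty}^{x} f(u; \lambda)\,du = \int_0^x \lambda e^{-\lambda u}\,du = \left. -e^{-\lambda u} \right|_0^x = 1 - e^{-\lambda x}$$

The mean is

$$\int_0^\infty x \lambda e^{-\lambda x}\,dx = \left. -x e^{-\lambda x} \right|_0^\infty - \int_0^\infty \left(-e^{-\lambda x}\right) dx = \frac{1}{\lambda}$$

The variance (integrating by parts twice) is

$$\int_0^\infty \left(x - \frac{1}{\lambda}\right)^2 \lambda e^{-\lambda x}\,dx = \frac{1}{\lambda^2}$$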
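The exponential CDF $1 - e^{-\lambda x}$ can be inverted in closed form, which gives a simple way to generate samples and check the mean $1/\lambda$ and variance $1/\lambda^2$. Inverse-CDF sampling is a standard technique that the handout does not cover; the rate λ = 2, the seed, and the function names below are illustrative assumptions.

```python
import math
import random

lam = 2.0                        # arbitrary illustrative rate
rng = random.Random(179)         # seeded so the run is repeatable

def sample_exponential() -> float:
    """Inverse-CDF sampling: solve u = 1 - exp(-lam * x) for x."""
    u = rng.random()             # uniform on [0, 1)
    return -math.log(1.0 - u) / lam

samples = [sample_exponential() for _ in range(200_000)]
n = len(samples)
sample_mean = sum(samples) / n
sample_variance = sum((x - sample_mean) ** 2 for x in samples) / n

def empirical_cdf(x: float) -> float:
    """Fraction of samples at or below x."""
    return sum(1 for s in samples if s <= x) / n

# Each line prints the estimate next to the closed-form value.
print(sample_mean, 1 / lam)
print(sample_variance, 1 / lam**2)
print(empirical_cdf(1.0), 1 - math.exp(-lam * 1.0))
```

The estimates are approximate; they should approach the closed-form values as the number of samples grows.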

(12)

Examples of Continuous Random Variables (cont.)

Gaussian random variable has two parameters $\mu$ and $\sigma$. Its pdf is

$$N(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

The Gaussian PDF is centered at $x = \mu$, where it attains its maximum value.

The mean is $\mu$ (obvious by symmetry) and the variance is $\sigma^2$. The inflection points of the density graph are at $\mu \pm \sigma$.

The density decreases faster than exponentially as x → ±∞.
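The stated properties of the Gaussian density can be checked numerically. This is a minimal sketch using the standard form of the density with $2\sigma^2$ in the exponent, arbitrary parameters µ = 1 and σ = 2, and a midpoint-rule integrator over µ ± 10σ; all helper names are illustrative.

```python
import math

def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Gaussian density with mean mu and variance sigma**2."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

def integrate(f, a: float, b: float, steps: int = 20_000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

mu, sigma = 1.0, 2.0                          # arbitrary illustrative parameters
lo, hi = mu - 10 * sigma, mu + 10 * sigma     # tails beyond 10 sigma are negligible

def p(x: float) -> float:
    return gaussian_pdf(x, mu, sigma)

total = integrate(p, lo, hi)
mean = integrate(lambda x: x * p(x), lo, hi)
variance = integrate(lambda x: (x - mu) ** 2 * p(x), lo, hi)

def curvature(x: float, h: float = 1e-3) -> float:
    """Second-difference estimate of the second derivative of the density."""
    return (p(x + h) - 2 * p(x) + p(x - h)) / h**2

print(total, mean, variance)
# The curvature changes sign across the inflection point at mu + sigma.
print(curvature(mu + sigma - 0.1), curvature(mu + sigma + 0.1))
```

The truncation to µ ± 10σ is harmless precisely because the density decays faster than exponentially in the tails.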

(13)

Joint Random Variables

In communication systems we usually have two random signals defined on the same sample space:

◮ Transmitted signal x(t)

◮ Received signal y(t).

For times $t_1$ and $t_2$, the values $x(t_1)$ and $y(t_2)$ are joint random variables.

Joint r.v.s are characterized by a joint CDF:

$$F_{XY}(x, y) = P\{X \le x, Y \le y\} = P\{\text{lower left quadrant bounded by } (x, y)\}$$

If X and Y are jointly continuous, their joint PDF is given by p _XY (x, y) = ∂ ²
