Introduction to Probability
EE 179, Lecture 15, Handout #24
◮ Probability theory gives a mathematical characterization for experiments with random outcomes.
  ◮ coin toss
  ◮ life of lightbulb
  ◮ binary data sequence
  ◮ Brownian motion
◮ An event is a set of outcomes belonging to a sample space.
◮ Events must be repeatable and have statistical regularity, i.e. a large number of experiments have regularity in outcome patterns.
◮ We define the probability of an event A as the fraction of n repetitions of the experiment in which the outcome belongs to A, in the limit of large n:
$$P(A) = \lim_{n \to \infty} \frac{\text{number of times outcome is in } A}{n}$$
◮ Examples:
  ◮ $P(\text{roulette wheel outcome is red}) = 18/38$
  ◮ $P(\text{rain tomorrow}) \approx 57/365$ (bogus)
EE 179, May 2, 2014 Lecture 15, Page 1
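The relative-frequency definition can be illustrated with a short simulation. The sketch below assumes an American roulette wheel with 38 equally likely pockets, 18 of them red; the function name `relative_frequency` and the pocket numbering are illustrative choices, not part of the lecture.

```python
import random

def relative_frequency(n, seed=0):
    """Fraction of n simulated spins that land on red."""
    rng = random.Random(seed)
    # Assumption: pockets are numbered 0..37 and pockets 0..17 stand in for the 18 red ones.
    hits = sum(1 for _ in range(n) if rng.randrange(38) < 18)
    return hits / n

# The estimate should settle near 18/38 as n grows.
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(n), 18 / 38)
```

The estimate fluctuates for small n and tightens around 18/38 as n increases, which is the statistical regularity the definition relies on.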
Mathematics of Probability: Axiomatic Approach
◮ Random events are defined on a probability space:
  ◮ sample space $S$ of possible outcomes (finite or infinite)
  ◮ family (set) of events $\{A_i\}$ that are subsets of $S$
  ◮ probability measure $P(\cdot)$ on events
◮ The probability measure has three properties:
  ◮ $P(A) \ge 0$
  ◮ $P(S) = 1$
  ◮ if $A_i \cap A_j$ is empty for all $i \ne j$, then $P(\cup_i A_i) = \sum_i P(A_i)$
◮ We formally write a probability space as the triple $(S, \{A_i\}, P(\cdot))$.
◮ A very common notation for the sample space is Ω, and a generic outcome is ω
◮ This axiomatic approach was introduced by Kolmogorov around 1930.
Probability has been used for thousands of years. Proverbs 16:33: “We may throw the dice, but the Lord determines how they fall.”
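On a finite sample space the three properties can be checked exhaustively. The sketch below assumes a fair six-sided die with every subset of outcomes treated as an event; the names `S`, `P`, and `all_events` are chosen here for illustration.

```python
from fractions import Fraction
from itertools import combinations

S = frozenset(range(1, 7))  # assumed sample space: one fair die

def P(event):
    """Probability measure: each outcome carries weight 1/6."""
    return Fraction(len(event & S), len(S))

def all_events(space):
    """Every subset of the sample space, i.e. the full family of events."""
    items = sorted(space)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

events = all_events(S)
nonnegative = all(P(A) >= 0 for A in events)
normalized = P(S) == 1
# Additivity is checked for every pair of disjoint events.
additive = all(P(A | B) == P(A) + P(B)
               for A in events for B in events if not (A & B))
print(nonnegative, normalized, additive)
```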
Random Variables
◮ Practical definition of random variable: the numerical output of a probabilistic experiment.
  ◮ coin flip (tails=0, heads=1) or sum of two dice (2, 3, ..., 12)
  ◮ amount of snowfall at a location over a duration
  ◮ noise voltage at an instant or integral of noise over an interval
◮ Mathematical definition: a real-valued function defined on the sample space of a probability space.
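$$X : \Omega \to \mathbb{R}, \quad \text{i.e., } X(\omega) \in \mathbb{R} \text{ for every } \omega \in \Omega$$
Examples:
  ◮ The sample space for a toss of two dice is $\{(i, j) : 1 \le i, j \le 6\}$. The sum is the r.v. $X((i, j)) = i + j$.
  ◮ For the BSC (binary symmetric channel), the input is the r.v. $X$ and the output is the r.v. $Y$. We have derived the joint probability distribution for $X$ and $Y$.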
◮ If the values of a r.v. are discrete, the r.v. is called discrete. Otherwise the r.v. is continuous or mixed.
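The two-dice example can be written out directly: the random variable is an ordinary function on the sample space, and its distribution follows by counting outcomes. This is a minimal sketch assuming fair dice, so all 36 outcomes are equally likely; the names `omega`, `X`, and `pmf` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs (i, j) with 1 <= i, j <= 6.
omega = list(product(range(1, 7), repeat=2))

def X(outcome):
    """The random variable: maps an outcome (i, j) to the real number i + j."""
    i, j = outcome
    return i + j

# Distribution of X, assuming each outcome has probability 1/36.
counts = Counter(X(w) for w in omega)
pmf = {x: Fraction(c, len(omega)) for x, c in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)
```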
Probability Mass Function
The probability distribution of a discrete random variable is completely described by its probability mass function (pmf or PMF).
$$P\{X = x_k\} = p_X(x_k), \quad \text{where the values of } X \text{ are } \{x_k\}$$
In this special case of the axioms of probability,
$$p_X(x_k) \ge 0, \qquad \sum_k p_X(x_k) = 1$$
Important discrete random variables:
◮ Bernoulli: Ω = {0, 1}, p(1) = p, p(0) = 1 − p.
◮ Binomial: $S_n = \sum_{k=1}^{n} X_k$, where the $X_k$ are independent Bernoulli r.v.s with the same $p$.
◮ Geometric: $p(n) = (1 - p)^{n-1} p$ for $n = 1, 2, \ldots$ and $0 < p \le 1$.
◮ Poisson: $p(n) = \frac{\lambda^n}{n!} e^{-\lambda}$, where $n \ge 0$ and $\lambda \ge 0$.
Solving problems about discrete r.v.s usually requires manipulating sums (combinatorics).
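As a sanity check on these definitions, the sketch below builds the binomial pmf by summing independent Bernoulli variables (a discrete convolution) and confirms numerically that each pmf sums to 1. The parameter values and the truncation points of the infinite sums are arbitrary choices made for illustration.

```python
from math import comb, exp, factorial

def bernoulli_pmf(p):
    return {0: 1 - p, 1: p}

def binomial_pmf(n, p):
    """pmf of S_n = X_1 + ... + X_n, built by convolving n Bernoulli pmfs."""
    pmf = {0: 1.0}
    for _ in range(n):
        nxt = {}
        for s, ps in pmf.items():
            for x, px in bernoulli_pmf(p).items():
                nxt[s + x] = nxt.get(s + x, 0.0) + ps * px
        pmf = nxt
    return pmf

def geometric_pmf(n, p):
    return (1 - p) ** (n - 1) * p

def poisson_pmf(n, lam):
    return lam ** n / factorial(n) * exp(-lam)

p, lam = 0.3, 2.5
binom = binomial_pmf(10, p)
print(sum(binom.values()))                               # close to 1
print(sum(geometric_pmf(n, p) for n in range(1, 200)))   # truncated sum, close to 1
print(sum(poisson_pmf(n, lam) for n in range(60)))       # truncated sum, close to 1
```

The convolution result agrees with the familiar closed form $\binom{n}{k} p^k (1-p)^{n-k}$, which is the kind of sum manipulation these problems usually require.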
Cumulative Distribution Function
For continuous random variables $P\{X = x\} = 0$ for all $x$, so we cannot use a pmf.
The cumulative distribution function (cdf or CDF) can describe both discrete and continuous r.v.s.
The CDF of a real-valued r.v. X is defined by
$$F_X(x) = P\{X \le x\}, \quad -\infty \le x \le \infty$$
Properties of the CDF:
◮ Monotone: if $x_1 \le x_2$ then $F(x_1) \le F(x_2)$
◮ Limits: $F(-\infty) = \lim_{x \to -\infty} F(x) = 0$, $F(\infty) = \lim_{x \to \infty} F(x) = 1$
◮ Interval: $P\{a < X \le b\} = P\{X \le b\} - P\{X \le a\} = F(b) - F(a)$
◮ Point: $P\{X = x\} = P\{X \le x\} - P\{X < x\} = F(x) - F(x^-)$, where $F(x^-) = \lim_{u \uparrow x} F(u)$.
A random variable is continuous if its cdf F (x) is continuous for every x.
Another definition of the CDF is $P\{X < x\}$; this is used by Russian mathematicians.
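The interval and point properties are easy to see on a discrete example. The sketch below assumes a fair die; `F` is the CDF as defined above and `F_left` is the left limit $F(x^-)$, both named here for illustration.

```python
from fractions import Fraction

pmf = {k: Fraction(1, 6) for k in range(1, 7)}  # assumed example: fair die

def F(x):
    """CDF: P{X <= x}."""
    return sum(p for k, p in pmf.items() if k <= x)

def F_left(x):
    """Left limit F(x-) = P{X < x}."""
    return sum(p for k, p in pmf.items() if k < x)

print(F(4) - F(2))        # interval property: P{2 < X <= 4}
print(F(3) - F_left(3))   # point property: P{X = 3}
print(F(2.5), F(0), F(6)) # staircase value between jumps, and the two limits
```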
Types of CDFs
The CDF of any discrete r.v. is an increasing staircase function.
The CDF of a continuous r.v. is a smooth nondecreasing function.
The CDF of a mixed r.v. is continuous between jumps; p(x) > 0 for some x.
“nondecreasing” = “increasing, but not necessarily strictly increasing”.
Probability Density Function
If $X$ is a continuous r.v., then
$$P([x_1, x_2]) = P\{x_1 \le X \le x_2\} = F_X(x_2) - F_X(x_1).$$
If $F_X(x)$ is differentiable, then
$$F_X(x_2) - F_X(x_1) = \int_{x_1}^{x_2} p_X(u)\,du, \quad \text{where } p_X(x) = \frac{dF_X}{dx}.$$
We call $p_X(x)$ the probability density function (pdf or PDF) of $X$; $p_X(x)$ is the probability per unit width of a narrow interval around $x$.
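The "probability per unit width" reading can be checked numerically. The sketch below uses the logistic CDF purely as an assumed example of a smooth CDF (it does not appear in the lecture) and estimates the density with a central difference.

```python
from math import exp

def F(x):
    """Assumed example CDF (logistic): smooth and nondecreasing."""
    return 1.0 / (1.0 + exp(-x))

def pdf(x, h=1e-5):
    """Central-difference estimate of dF/dx."""
    return (F(x + h) - F(x - h)) / (2 * h)

x, width = 0.7, 1e-3
prob_interval = F(x + width / 2) - F(x - width / 2)  # exact probability of a narrow interval
print(prob_interval, pdf(x) * width)                  # the two values nearly coincide
```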
Properties of PDF
◮ Nonnegative: $p(x) \ge 0$, since $F(x)$ is nondecreasing.
◮ The CDF is the antiderivative of the PDF:
$$F(x) = \int_{-\infty}^{x} p(u)\,du$$
◮ Impulses: if $P\{X = x_0\} = p_0 > 0$ then $p_X(x)$ contains the impulse $p_0\,\delta(x - x_0)$.
◮ Mixed r.v.: if $F(x)$ is differentiable except at discrete points $\{x_k\}$, then
$$p(x) = \tilde{p}(x) + \sum_k p_k\,\delta(x - x_k)$$
where $\tilde{p}(x)$ is a nonnegative continuous function and
$$\int_{-\infty}^{\infty} \tilde{p}(x)\,dx = 1 - \sum_k p_k$$
Most books use $f_X(x)$ for pdf and $p_X(x)$ for pmf.
Statistics of Random Variables
The complete description of a random variable is its CDF, which specifies probabilities of all intervals, e.g., $X > x_0$.
To compare two r.v.s we often need single numbers (statistics) associated with each random variable. The most common statistics are:
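◮ Mean (average, expected value):
$$\bar{X} = E(X) = \int_{-\infty}^{\infty} x\,p(x)\,dx \quad \text{or} \quad \sum_{n=-\infty}^{\infty} x_n\,p(x_n)$$
◮ Second moment:
$$E(X^2) = \int_{-\infty}^{\infty} x^2\,p(x)\,dx \quad \text{or} \quad \sum_{n=-\infty}^{\infty} x_n^2\,p(x_n)$$
◮ Variance:
$$\mathrm{Var}(X) = E((X - \bar{X})^2) = E(X^2) - (E(X))^2$$
◮ Median: the median $X_{\text{med}}$ is the value satisfying $P\{X < X_{\text{med}}\} = P\{X > X_{\text{med}}\}$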
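For a discrete r.v. these statistics reduce to finite sums. The sketch below assumes a fair die as the example distribution and computes the variance both from its definition and from the second-moment shortcut; the candidate median 7/2 is picked by hand.

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # assumed example: fair die

mean = sum(x * p for x, p in pmf.items())
second_moment = sum(x ** 2 * p for x, p in pmf.items())
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())
var_shortcut = second_moment - mean ** 2

# Any value strictly between 3 and 4 balances the two tails; 7/2 is one such median.
median = Fraction(7, 2)
below = sum(p for x, p in pmf.items() if x < median)
above = sum(p for x, p in pmf.items() if x > median)

print(mean, second_moment, var_definition, var_shortcut)
print(below, above)
```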
Examples of Continuous Random Variables
A uniform random variable has a constant density on an interval.
We write $X \sim \text{Unif}[a, b]$ if $p_X(x)$ is constant on $[a, b]$ and 0 elsewhere.
$$p_X(x; a, b) = \begin{cases} \dfrac{1}{b - a} & a \le x \le b \\ 0 & x < a \text{ or } x > b \end{cases}$$
Examples of uniform random variables are the final position of a roulette wheel and quantization error.
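$$E(X) = \int_a^b \frac{x}{b - a}\,dx = \left.\frac{x^2}{2(b - a)}\right|_a^b = \frac{b^2 - a^2}{2(b - a)} = \frac{b + a}{2}$$
$$E(X^2) = \int_a^b \frac{x^2}{b - a}\,dx = \left.\frac{x^3}{3(b - a)}\right|_a^b = \frac{b^3 - a^3}{3(b - a)} = \frac{b^2 + ba + a^2}{3}$$
$$\mathrm{Var}(X) = \frac{b^2 + ba + a^2}{3} - \frac{b^2 + 2ba + a^2}{4} = \frac{b^2 - 2ba + a^2}{12} = \frac{(b - a)^2}{12}$$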
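The closed forms can be cross-checked by numerical integration. The sketch below uses a simple midpoint rule; the endpoints a = 2 and b = 5 and the helper name `integrate` are arbitrary choices for illustration.

```python
def integrate(g, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (k + 0.5) * h) for k in range(steps))

a, b = 2.0, 5.0  # assumed example interval

def pdf(x):
    return 1.0 / (b - a) if a <= x <= b else 0.0

mean = integrate(lambda x: x * pdf(x), a, b)
second_moment = integrate(lambda x: x ** 2 * pdf(x), a, b)
variance = second_moment - mean ** 2

print(mean, (a + b) / 2)
print(second_moment, (b ** 2 + b * a + a ** 2) / 3)
print(variance, (b - a) ** 2 / 12)
```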
Examples of Continuous Random Variables (cont.)
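An exponential random variable has one parameter $\lambda$.
$$f(x; \lambda) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases}$$
The CDF for $x \ge 0$ is
$$F(x; \lambda) = \int_{-\infty}^{x} f(u; \lambda)\,du = \int_0^x \lambda e^{-\lambda u}\,du = \left.-e^{-\lambda u}\right|_0^x = 1 - e^{-\lambda x}$$
The mean is
$$\int_0^\infty x \lambda e^{-\lambda x}\,dx = \left.-x e^{-\lambda x}\right|_0^\infty - \int_0^\infty \left(-e^{-\lambda x}\right) dx = \frac{1}{\lambda}$$
The variance (integrating by parts twice) is
$$\int_0^\infty \left(x - \frac{1}{\lambda}\right)^2 \lambda e^{-\lambda x}\,dx = \frac{1}{\lambda^2}$$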
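The mean and variance can also be checked by simulation. The sketch below draws exponential samples by inverting the CDF $F(x) = 1 - e^{-\lambda x}$, which is one standard sampling technique and not something the lecture prescribes; the rate, seed, and sample size are arbitrary.

```python
import random
from math import log

def sample_exponential(lam, rng):
    """Inverse-CDF sampling: solve u = 1 - exp(-lam * x) for x."""
    u = rng.random()  # uniform on [0, 1), so 1 - u is never zero
    return -log(1.0 - u) / lam

lam = 2.0
rng = random.Random(0)
n = 200_000
xs = [sample_exponential(lam, rng) for _ in range(n)]

mean = sum(xs) / n
variance = sum((x - mean) ** 2 for x in xs) / n

print(mean, 1 / lam)            # sample mean vs. 1/lambda
print(variance, 1 / lam ** 2)   # sample variance vs. 1/lambda^2
```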
Examples of Continuous Random Variables (cont.)
A Gaussian random variable has two parameters $\mu$ and $\sigma$. Its pdf is
$$N(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
The Gaussian PDF is centered at and has maximum value at $x = \mu$.
The mean is $\mu$ (obvious) and the variance is $\sigma^2$. The inflection points of the density graph are at $x = \mu \pm \sigma$.
The density decreases faster than exponentially as $x \to \pm\infty$.
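These properties can be verified numerically. The sketch below integrates the Gaussian density with a midpoint rule over a range of ten standard deviations on each side (an assumed truncation) and uses a second difference to locate the change in curvature one standard deviation from the mean.

```python
from math import exp, pi, sqrt

def gaussian_pdf(x, mu, sigma):
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / sqrt(2 * pi * sigma ** 2)

def integrate(g, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (k + 0.5) * h) for k in range(steps))

mu, sigma = 1.0, 2.0                      # arbitrary example parameters
lo, hi = mu - 10 * sigma, mu + 10 * sigma  # tails beyond this are negligible

total = integrate(lambda x: gaussian_pdf(x, mu, sigma), lo, hi)
mean = integrate(lambda x: x * gaussian_pdf(x, mu, sigma), lo, hi)
variance = integrate(lambda x: (x - mu) ** 2 * gaussian_pdf(x, mu, sigma), lo, hi)

def curvature(x, h=1e-3):
    """Second-difference estimate of the second derivative of the pdf."""
    f = lambda t: gaussian_pdf(t, mu, sigma)
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

print(total, mean, variance)
# Curvature is negative inside mu +/- sigma, near zero at mu + sigma, positive outside.
print(curvature(mu + 0.5 * sigma), curvature(mu + sigma), curvature(mu + 1.5 * sigma))
```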
Joint Random Variables
In communication systems we usually have two random signals defined on the same sample space:
◮ Transmitted signal x(t)
◮ Received signal y(t).
For times $t_1$ and $t_2$, the values $x(t_1)$ and $y(t_2)$ are joint random variables.
Joint r.v.s are characterized by a joint CDF:
$$F_{XY}(x, y) = P\{X \le x, Y \le y\} = P\{\text{lower left quadrant bounded by } (x, y)\}$$
If $X$ and $Y$ are jointly continuous, their joint PDF is given by
$$p_{XY}(x, y) = \frac{\partial^2}{\partial x\,\partial y} F_{XY}(x, y)$$
Properties of Joint PDF
◮ $P\{(X, Y) \in [a, b] \times [c, d]\} \ge 0$, that is,
$$F(b, d) - F(a, d) - F(b, c) + F(a, c) \ge 0$$
◮ $p_{XY}(x, y) \ge 0$, and
$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p_{XY}(x, y)\,dx\,dy = 1$$
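The rectangle probability can be computed two ways: from the joint CDF at the four corners, $F(b, d) - F(a, d) - F(b, c) + F(a, c)$, or by integrating the joint PDF over the rectangle. The sketch below assumes, purely for illustration, that X and Y are independent exponentials with rates 1 and 2, so the joint CDF factors into a product; the rectangle and grid size are arbitrary.

```python
from math import exp

def F(x, y):
    """Joint CDF of the assumed example: independent exponentials, rates 1 and 2."""
    if x < 0 or y < 0:
        return 0.0
    return (1 - exp(-x)) * (1 - exp(-2 * y))

def p(x, y):
    """Joint pdf of the same example, the mixed partial derivative of F."""
    if x < 0 or y < 0:
        return 0.0
    return 2 * exp(-x - 2 * y)

def rect_prob_cdf(a, b, c, d):
    """P{(X, Y) in [a, b] x [c, d]} from the CDF at the four corners."""
    return F(b, d) - F(a, d) - F(b, c) + F(a, c)

def rect_prob_pdf(a, b, c, d, steps=300):
    """Same probability via a midpoint-rule double integral of the pdf."""
    hx, hy = (b - a) / steps, (d - c) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * hx
        for j in range(steps):
            total += p(x, c + (j + 0.5) * hy)
    return total * hx * hy

print(rect_prob_cdf(0.5, 1.5, 0.2, 1.0), rect_prob_pdf(0.5, 1.5, 0.2, 1.0))
print(rect_prob_cdf(0.0, 50.0, 0.0, 50.0))  # nearly all of the probability mass
```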