Let (Ω, F, P) be a probability measure space and (R, B) a measurable space. A random variable X is an F/B measurable function X : Ω → R. That is, X(ω) induces an inverse mapping from B to F such that X−1(B) ∈ F for every B ∈ B, where B is the linear Borel field. The symbol µ will denote a probability measure on the real line, while P is used for the probability measure on the underlying space Ω. The following theorem relates P and µ.
Theorem 3.1 (Theorem 3.1.3 in [31]) Each random variable on the probability space (Ω, F, P) induces a probability space (R, B, µ) by means of the following correspondence:

µ(B) = PX−1(B) = P(X−1(B)) = P(ω : X(ω) ∈ B), ∀B ∈ B. □
The measure µ, induced by X, is called the probability distribution or law, and has an associated distribution function FX given by
FX(x) =µ((−∞, x]) =P(ω :X(ω)≤x).
If X is a r.v. on (Ω, F, P) which induces the space (R, B, µ) and g : R → R is a Borel function, then g ◦ X(ω) = g(X(ω)) is a random variable, which induces the probability space (R, B, µg−1). The distribution of g(X) is µg−1, with

µg−1(A) = µ(g−1A) = P(ω : g(X(ω)) ∈ A) = P(ω : X(ω) ∈ g−1A).
We now define the integral of a measurable function and present some properties of integrals which are essential to define the expectation of functions of random variables. Let φ denote a real measurable function on the probability space (Ω, F, P). If φ is nonnegative, the integral of φ with respect to the measure P is defined as follows:

∫_Ω φ(ω) dP(ω) = sup Σi [ inf{φ(ω) : ω ∈ Λi} ] P(Λi),
where the supremum extends over all finite decompositions {Λi} of Ω into F-sets. For a general function φ, define its positive part, φ+, and negative part, φ−, as follows:

φ+(ω) = φ(ω) for 0 ≤ φ(ω) ≤ ∞ and φ+(ω) = 0 for −∞ ≤ φ(ω) ≤ 0,
φ−(ω) = −φ(ω) for −∞ ≤ φ(ω) ≤ 0 and φ−(ω) = 0 for 0 ≤ φ(ω) ≤ ∞,
so that φ = φ+ − φ−. The general integral is defined by

∫_Ω φ(ω) dP(ω) = ∫_Ω φ+(ω) dP(ω) − ∫_Ω φ−(ω) dP(ω).
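The supremum over finite decompositions in the definition of the integral of a nonnegative φ can be explored numerically. The following is a minimal sketch, under the illustrative assumption that P is the uniform (Lebesgue) measure on Ω = [0, 1] and φ(x) = x²; partitions into equal intervals are a particular family of finite decompositions, so the resulting lower sums approach the integral (here 1/3) from below as the partition refines.

```python
# Illustrative sketch only: P is the uniform measure on [0, 1] and
# phi(x) = x**2, so the integral of phi with respect to P equals 1/3.
# Each lower sum uses the decomposition of [0, 1] into n equal intervals
# Lambda_i and adds inf_{Lambda_i} phi * P(Lambda_i).

def lower_sum(phi, n):
    """Lower sum over the decomposition of [0, 1] into n equal intervals."""
    total = 0.0
    for i in range(n):
        a, b = i / n, (i + 1) / n
        inf_val = min(phi(a), phi(b))   # valid inf when phi is monotone on [a, b]
        total += inf_val * (b - a)      # P(Lambda_i) = b - a
    return total

approximations = [lower_sum(lambda x: x ** 2, n) for n in (4, 16, 64, 256)]
print(approximations)  # increasing toward 1/3
```

Refining the decomposition can only increase the lower sum, which is exactly why the definition takes a supremum.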
For a set Λ ∈ F, the integral of φ over Λ is defined by

∫_Λ φ(ω) dP(ω) = ∫_Ω 1ω∈Λ · φ(ω) dP(ω),

where 1ω∈Λ is the indicator function of the set Λ. Given a nonnegative measurable function δ on the measure space (Ω, F, P), the measure ν defined by

ν(Λ) = ∫_Λ δ(ω) dP(ω), Λ ∈ F,
is said to have density δ with respect to P. A random variable X on (Ω, F, P) and its distribution µ have density f with respect to the Lebesgue measure λ if f is a nonnegative Borel function on R and

P(ω : X(ω) ∈ A) = µ(A) = ∫_A f(x) dx, A ∈ B.
For any random variable the density is assumed to be with respect to the Lebesgue measure λ if no other measure is specified. The density f and distribution function FX of a random variable X are related by the following Lebesgue integral:

FX(x) = ∫_{−∞}^{x} f(t) dt.
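The density-to-distribution relation can be checked numerically. The following is an illustrative sketch, assuming X is standard normal (this choice, and the truncation of the lower limit to −10, are assumptions for the example): the trapezoidal approximation of the Lebesgue integral of the density matches the closed-form distribution function (1 + erf(x/√2))/2.

```python
import math

# Illustrative sketch: f is the standard normal density and we approximate
# F_X(x) = integral of f(t) dt over (-inf, x] by a trapezoidal sum on
# [-10, x] (the tail below -10 is negligible for this density).

def normal_pdf(t):
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)

def cdf_from_density(x, lower=-10.0, n=20_000):
    """Trapezoidal approximation of the integral of the density up to x."""
    h = (x - lower) / n
    total = 0.5 * (normal_pdf(lower) + normal_pdf(x))
    for i in range(1, n):
        total += normal_pdf(lower + i * h)
    return total * h

exact = 0.5 * (1.0 + math.erf(1.0 / math.sqrt(2.0)))
print(cdf_from_density(1.0), exact)  # both approximately 0.8413
```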
The following theorem presents important relations involving integration and the density of a measure.
Theorem 3.2 (Theorem 16.11 in [19]) If ν has density δ with respect to P, then

∫_Ω φ(ω) dν(ω) = ∫_Ω φ(ω)δ(ω) dP(ω), (3.2.1)

holds for nonnegative φ. Moreover, φ, not necessarily nonnegative, is integrable with respect to ν if and only if φδ is integrable with respect to P, in which case (3.2.1) and

∫_Λ φ(ω) dν(ω) = ∫_Λ φ(ω)δ(ω) dP(ω), Λ ∈ F,

both hold. □
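Identity (3.2.1) can be illustrated by Monte Carlo. The setup below is an assumption chosen for the example, not part of the theorem: P is the uniform law on [0, 1] and δ(x) = 2x, so ν(dx) = 2x dx is the Beta(2, 1) law; for φ(x) = x both sides of (3.2.1) equal 2/3.

```python
import random

# Illustrative sketch of Theorem 3.2: P uniform on [0, 1], delta(x) = 2x,
# so nu has distribution function x**2 on [0, 1]; phi(x) = x.

random.seed(0)
n = 200_000

phi = lambda x: x
delta = lambda x: 2.0 * x

# Left side: integrate phi against nu directly.  Since nu has distribution
# function x**2, inverse-transform sampling gives X = sqrt(U).
lhs = sum(phi(random.random() ** 0.5) for _ in range(n)) / n

# Right side: integrate phi * delta against P (plain uniform samples).
rhs = sum(phi(u) * delta(u) for u in (random.random() for _ in range(n))) / n

print(lhs, rhs)  # both approximately 2/3
```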
We now address change of variables by a mapping and integration. Let (Ω, F) and (Ω′, F′) be measurable spaces and T : Ω → Ω′ an F/F′ measurable mapping. For a measure P on F, PT−1 defines a measure on F′ given by PT−1(Λ′) = P(T−1Λ′), for Λ′ ∈ F′. The
following theorem gives change of variable formulas for integration.

Theorem 3.3 (Theorem 16.13 in [19]) If φ is nonnegative, then

∫_Ω φ(Tω) P(dω) = ∫_Ω′ φ(ω′) PT−1(dω′). (3.2.2)

A function φ, not necessarily nonnegative, is integrable with respect to PT−1 if and only if φT is integrable with respect to P, in which case (3.2.2) and

∫_{T−1Λ′} φ(Tω) P(dω) = ∫_Λ′ φ(ω′) PT−1(dω′), Λ′ ∈ F′,

hold. □
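Formula (3.2.2) can also be checked numerically. The concrete choices below are assumptions for the example: P is the uniform law on Ω = [0, 1] and T(ω) = ω², in which case the image measure PT−1 has density 1/(2√y) on [0, 1]; both sides are evaluated with a midpoint rule.

```python
# Illustrative sketch of (3.2.2): P uniform on [0, 1], T(w) = w**2, so the
# pushforward PT^{-1} has density 1 / (2 * sqrt(y)) on (0, 1].

def lhs(phi, n=100_000):
    """Midpoint-rule approximation of the integral of phi(T(w)) dP(w)."""
    return sum(phi(((i + 0.5) / n) ** 2) for i in range(n)) / n

def rhs(phi, n=100_000):
    """Midpoint-rule integral of phi(y) against the pushforward density."""
    total = 0.0
    for i in range(n):
        y = (i + 0.5) / n
        total += phi(y) * (1.0 / (2.0 * y ** 0.5))
    return total / n

phi = lambda y: y + 1.0
print(lhs(phi), rhs(phi))  # both approximately 4/3
```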
We can now use all the concepts of integration to define expectation. The expected value of a random variable X on (Ω, F, P) is the integral of X with respect to the measure P:

E[X] = ∫_Ω X(ω) dP(ω).

For each Λ in F, the truncated expectation is given by

E[X(ω) · 1ω∈Λ] = ∫_Λ X(ω) dP(ω). (3.2.3)
The following assumptions are made in the theorem that follows, which shows different representations of the expectation.

Assumption 3.1 The r.v. X on (Ω, F, P) induces the probability space (R, B, µ).

Assumption 3.2 g : R → R is a Borel function so that g(X) is a r.v. on (R, B, µg−1).
The following theorem shows the dual characterization of the expectation of a function.

Theorem 3.4 (Theorem 3.2.2 in [31]) Under Assumptions 3.1 and 3.2,

E[g(X)] = ∫_Ω g(X(ω)) dP(ω) = ∫_R g(x) dµ(x). (3.2.4)

□
(3.2.4) follows directly from Theorem 3.3, replacing T : Ω → Ω′ with X : Ω → R, φ by g, setting ω′ = x, and noting PX−1(dω′) = µ(dx) = dµ(x). Furthermore, under Assumptions 3.1 and 3.2, if X has density f with respect to the Lebesgue measure, we have

E[g(X)] = ∫_R g(x) dµ(x) = ∫_R g(x)f(x) dλ = ∫_{−∞}^{∞} g(x)f(x) dx. (3.2.5)

(3.2.5) follows from Theorem 3.2 by replacing ν with µ, P with λ, ω with x, φ with g, Ω with R, and δ with f. If X has distribution function FX with a continuous derivative, we have dFX(x) = f(x) dx and

E[g(X)] = ∫_{−∞}^{∞} g(x)f(x) dx = ∫_{−∞}^{∞} g(x) dFX(x).
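The two representations in (3.2.4)–(3.2.5) can be compared numerically. The choices below are assumptions for the example: X standard normal with density f(x) = exp(−x²/2)/√(2π) and g(x) = x², so E[g(X)] = 1; the Ω-side integral is approximated by averaging g over draws of X, the real-line side by quadrature of g(x)f(x).

```python
import math
import random

# Illustrative sketch of (3.2.5): X standard normal, g(x) = x**2, E[g(X)] = 1.

random.seed(0)

# Omega side: Monte Carlo average of g(X(omega)) over draws of X.
n = 200_000
mc = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n)) / n

# Real-line side: quadrature of g(x) f(x) over [-10, 10] (tails negligible).
def integrand(x):
    return x * x * math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

m = 20_000
h = 20.0 / m
quad = sum(integrand(-10.0 + i * h) for i in range(m + 1)) * h

print(mc, quad)  # both approximately 1
```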
We now extend the results and definitions to multiple random variables. In Rk, the k-dimensional Borel field Bk is σ(Rk), where Rk denotes the class of measurable rectangles B1 × B2 × · · · × Bk, with Bi ∈ B for i = 1, . . . , k, of Rk. We call a measurable mapping X into Rk, X : Ω → Rk, a random vector on the space (Ω, F, P) and write X(ω) = (X1(ω), . . . , Xk(ω))⊤. X is measurable F if and only if each component mapping Xi is measurable F. For a k-dimensional random vector X = (X1, . . . , Xk)⊤, the distribution µ, which is a probability measure on Bk, and the distribution function are given by

µ(A) = P(ω : (X1(ω), . . . , Xk(ω)) ∈ A), A ∈ Bk,

F(x1, . . . , xk) = P(ω : X1(ω) ≤ x1, . . . , Xk(ω) ≤ xk) = µ(Sx),

where Sx = {y : yi ≤ xi, i = 1, . . . , k}. A random vector X and its distribution µ have density f with respect to the k-dimensional Lebesgue measure λ if f is a nonnegative Borel function on Rk and

P(ω : X(ω) ∈ A) = µ(A) = ∫_A f(x) dx, A ∈ Bk.
If X is a k-dimensional random vector with distribution µ and g : Rk → Ri is measurable, then g(X) is an i-dimensional random vector with distribution µg−1. If gj : Rk → R is defined by gj(x1, . . . , xk) = xj, it follows that gj(X) = Xj has distribution µj = µgj−1, given by µj(A) = µ[(x1, . . . , xk) : xj ∈ A] = P(ω : Xj(ω) ∈ A), for A ∈ B. The µj are referred to as the marginal distributions of µ. If µ has density f with respect to the k-dimensional Lebesgue measure, µj has density fj with respect to the one-dimensional Lebesgue measure, given by

fj(x) = ∫_{Rk−1} f(x1, . . . , xj−1, x, xj+1, . . . , xk) dx1 · · · dxj−1 dxj+1 · · · dxk.
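The marginal-density formula can be verified numerically in a simple case. The concrete density below is an assumption for the example: k = 2 with f(x1, x2) = exp(−x1 − x2) on the positive quadrant (independent Exp(1) coordinates), so integrating out x2 should recover the marginal f1(x) = exp(−x).

```python
import math

# Illustrative sketch of the marginal-density formula for k = 2 with
# f(x1, x2) = exp(-x1 - x2) on the positive quadrant.

def f(x1, x2):
    return math.exp(-x1 - x2) if x1 >= 0.0 and x2 >= 0.0 else 0.0

def marginal_f1(x, upper=40.0, n=20_000):
    """Midpoint-rule approximation of the integral of f(x, x2) over x2;
    the tail above `upper` is negligible for this density."""
    h = upper / n
    return sum(f(x, (i + 0.5) * h) for i in range(n)) * h

for x in (0.5, 1.0, 2.0):
    print(x, marginal_f1(x), math.exp(-x))  # the last two columns agree closely
```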
The random variables X1, . . . , Xk are defined to be independent if the σ-fields they generate, σ(X1), . . . , σ(Xk), are independent. X1, . . . , Xk are independent if and only if

P(X1 ∈ H1, . . . , Xk ∈ Hk) = P(X1 ∈ H1) · · · P(Xk ∈ Hk) for all H1, . . . , Hk ∈ B,

and if and only if

P(X1 ≤ x1, . . . , Xk ≤ xk) = P(X1 ≤ x1) · · · P(Xk ≤ xk) for all x1, . . . , xk.
Given the random vector (X1, . . . , Xk) with distribution µ having density f and distribution function F, and each Xi with marginal distribution µi having density fi and marginal distribution function Fi, X1, . . . , Xk are independent if and only if µ is the product measure µ = µ1 × · · · × µk, if and only if F(x1, . . . , xk) = F1(x1) · · · Fk(xk), and if and only if f(x) = f1(x1) · · · fk(xk). For a Borel measurable function g : Rk → R with g−1(B) ∈ Bk for every B ∈ B, h(ω) = g(X1(ω), . . . , Xk(ω)) is an F/B measurable r.v. and we have the expectation

E[g(X1(ω), . . . , Xk(ω))] = ∫_Ω h(ω) dP(ω).

Similarly, applying Theorem 3.3,

E[g(X1(ω), . . . , Xk(ω))] = ∫_{Rk} g(x1, . . . , xk) dµ(x1, . . . , xk),

and if µ has density f with respect to the k-dimensional Lebesgue measure λ, then by Theorem 3.2 and Fubini's theorem,

E[g(X1(ω), . . . , Xk(ω))] = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} g(x1, . . . , xk) f(x1, . . . , xk) dx1 · · · dxk.
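The multivariate expectation formulas can be compared numerically. The setting below is an assumption for the example: k = 2 with independent standard normal coordinates, so f(x1, x2) = f1(x1)f2(x2), and g(x1, x2) = x1·x2 + x1², for which independence and Fubini give E[g] = E[X1]E[X2] + E[X1²] = 0 + 1 = 1.

```python
import math
import random

# Illustrative sketch: k = 2, independent standard normal coordinates,
# g(x1, x2) = x1 * x2 + x1**2, so E[g(X1, X2)] = 1.

random.seed(0)

def g(x1, x2):
    return x1 * x2 + x1 * x1

# Omega side: Monte Carlo average over draws of the random vector.
n = 200_000
mc = sum(g(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)) / n

# Real-plane side: iterated (Fubini) integral of g * f with a midpoint rule
# on [-8, 8] x [-8, 8]; the product density reflects independence.
m = 400
h = 16.0 / m
quad = 0.0
for i in range(m):
    x1 = -8.0 + (i + 0.5) * h
    for j in range(m):
        x2 = -8.0 + (j + 0.5) * h
        pdf = math.exp(-(x1 * x1 + x2 * x2) / 2.0) / (2.0 * math.pi)
        quad += g(x1, x2) * pdf
quad *= h * h

print(mc, quad)  # both approximately 1
```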