
1.1 Some Important Probability Definitions and Facts

1.1.10 Transformations of Random Variables

We often need to determine the distribution of some transformation of a given random variable or a set of random variables. In the simplest case, we have a random variable $X$ with known distribution and we want to determine the distribution of $Y = h(X)$, where $h$ is a full-rank transformation; that is, there is a function $h^{-1}$ such that $X = h^{-1}(Y)$. In other cases, the function may not be full-rank; for example, $X$ may be an $n$-vector, and $Y = \sum_{i=1}^n X_i$.

There are three general approaches to the problem: the method of CDFs; the method of direct change of variables, including convolutions; and the method of CFs or MGFs. Which method works best depends on the particular transformation and distribution.

Method of CDFs

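Given $X$ with known CDF $F_X$, and $Y = h(X)$ with $h$ invertible as above and increasing, we can write the CDF $F_Y$ of $Y$ as

$$F_Y(y) = \Pr(Y \le y) = \Pr(h(X) \le y) = \Pr(X \le h^{-1}(y)) = F_X(h^{-1}(y)).$$

If $h$ is decreasing, the inequality reverses and $F_Y(y) = \Pr(X \ge h^{-1}(y))$.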
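The sketch below illustrates the method of CDFs in Python, using only the standard library. The helper `cdf_of_increasing_transform` is a hypothetical name, and it builds $F_Y = F_X \circ h^{-1}$ for an increasing $h$. The example takes $X$ to be exponential with scale $\theta = 2$, which is an arbitrary illustrative choice, and sets $Y = \log X$. It then compares the formula with a seeded Monte Carlo estimate.

```python
import math
import random

def cdf_of_increasing_transform(cdf_x, h_inv):
    """Return F_Y for Y = h(X), assuming h is strictly increasing with inverse h_inv."""
    def cdf_y(y):
        # F_Y(y) = Pr(X <= h^{-1}(y)) = F_X(h^{-1}(y))
        return cdf_x(h_inv(y))
    return cdf_y

THETA = 2.0  # scale of the illustrative exponential distribution

def exp_cdf(x):
    # CDF of the one-parameter exponential with scale THETA
    return 1.0 - math.exp(-x / THETA) if x > 0 else 0.0

# Y = log(X) is strictly increasing on ]0, inf[, with inverse h^{-1}(y) = e^y
cdf_y = cdf_of_increasing_transform(exp_cdf, math.exp)

# Monte Carlo comparison (random.expovariate takes the rate 1/THETA)
rng = random.Random(0)
draws = [math.log(rng.expovariate(1.0 / THETA)) for _ in range(20000)]
y0 = 0.5
empirical = sum(d <= y0 for d in draws) / len(draws)
print(cdf_y(y0), empirical)
```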

Example 1.11 distribution of the minimum order statistic in a two-parameter exponential distribution

Consider a shifted version of the exponential family of distributions, called the two-parameter exponential with parameter $(\alpha, \theta)$. Suppose the random variables $X_1, \ldots, X_n$ are iid with Lebesgue PDF

$$p_{\alpha,\theta}(x) = \theta^{-1} e^{-(x-\alpha)/\theta} I_{]\alpha,\infty[}(x).$$

We want to find the distribution of the minimum of $\{X_1, \ldots, X_n\}$. Let us denote that minimum by $Y$. (This is an order statistic, for which we use a general notation, $Y = X_{(1)}$. We discuss distributions of order statistics more fully in Section 1.1.12.) We have

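$$\begin{aligned}
\Pr(Y \le t) &= 1 - \Pr(Y > t) \\
&= 1 - \Pr(X_i > t \text{ for } i = 1, \ldots, n) \\
&= 1 - \left(\Pr(X_1 > t)\right)^n \\
&= 1 - \left(1 - \Pr(X_1 \le t)\right)^n \\
&= 1 - \left(e^{-(t-\alpha)/\theta}\right)^n \\
&= 1 - e^{-n(t-\alpha)/\theta},
\end{aligned}$$

for $t > \alpha$. The third equality uses the fact that the $X_i$ are iid.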

This is the CDF for a two-parameter exponential distribution with parameters $\alpha$ and $\theta/n$. If instead of a two-parameter exponential distribution we began with the more common one-parameter exponential distribution with parameter $\theta$, the distribution of $Y$ would be the one-parameter exponential distribution with parameter $\theta/n$.
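The following sketch checks Example 1.11 by simulation. It uses the illustrative values $\alpha = 1$, $\theta = 3$, and $n = 5$, which do not come from the text. Each $X_i$ is drawn as $\alpha$ plus an exponential with mean $\theta$, and the empirical CDF of the minimum at one point is compared with $1 - e^{-n(t-\alpha)/\theta}$. The helper names are hypothetical.

```python
import math
import random

def min_cdf_formula(t, alpha, theta, n):
    # Pr(X_(1) <= t) = 1 - exp(-n (t - alpha) / theta) for t > alpha, else 0
    return 1.0 - math.exp(-n * (t - alpha) / theta) if t > alpha else 0.0

def simulate_min(alpha, theta, n, reps, seed=0):
    rng = random.Random(seed)
    # X_i = alpha + exponential with mean theta (expovariate takes the rate 1/theta)
    return [min(alpha + rng.expovariate(1.0 / theta) for _ in range(n)) for _ in range(reps)]

alpha, theta, n = 1.0, 3.0, 5
mins = simulate_min(alpha, theta, n, reps=20000)
t = 1.5
empirical = sum(m <= t for m in mins) / len(mins)
print(empirical, min_cdf_formula(t, alpha, theta, n))
```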

Example 1.12 distribution of the square of a continuous random variable

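Suppose $X$ has CDF $F_X$ and Lebesgue PDF $f_X$, and let $Y = X^2$. For $x < 0$, $Y^{-1}[\,]-\infty, x]\,] = \emptyset$, and $Y^{-1}[\,]-\infty, x]\,] = X^{-1}[\,[-\sqrt{x}, \sqrt{x}]\,]$ otherwise. Therefore the CDF $F_Y$ of $Y$ is

$$F_Y(x) = \Pr\left(Y^{-1}[\,]-\infty, x]\,]\right) = \Pr\left(X^{-1}[\,[-\sqrt{x}, \sqrt{x}]\,]\right) = \left(F_X(\sqrt{x}) - F_X(-\sqrt{x})\right) I_{\bar{\mathbb{R}}_+}(x).$$

Differentiating, we have the Lebesgue PDF of $Y$:

$$f_Y(x) = \frac{1}{2\sqrt{x}} \left(f_X(\sqrt{x}) + f_X(-\sqrt{x})\right) I_{\bar{\mathbb{R}}_+}(x). \qquad (1.123)$$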
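The last formula can be checked numerically with the Python sketch below. The helpers `cdf_square` and `pdf_square` are hypothetical names that implement $F_Y$ and $f_Y$ for $Y = X^2$. A central difference of $F_Y$ should match $f_Y$. We take $X \sim \mathrm{N}(1, 1)$, an arbitrary non-symmetric choice, so that both $f_X(\sqrt{x})$ and $f_X(-\sqrt{x})$ contribute different amounts.

```python
import math

def normal_pdf(x, mu=1.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu=1.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def cdf_square(x, cdf_x):
    # F_Y(x) = F_X(sqrt x) - F_X(-sqrt x) for x >= 0, and 0 otherwise
    if x < 0:
        return 0.0
    r = math.sqrt(x)
    return cdf_x(r) - cdf_x(-r)

def pdf_square(x, pdf_x):
    # f_Y(x) = (f_X(sqrt x) + f_X(-sqrt x)) / (2 sqrt x) for x > 0
    if x <= 0:
        return 0.0
    r = math.sqrt(x)
    return (pdf_x(r) + pdf_x(-r)) / (2.0 * r)

x, h = 2.0, 1e-5
numeric = (cdf_square(x + h, normal_cdf) - cdf_square(x - h, normal_cdf)) / (2 * h)
print(numeric, pdf_square(x, normal_pdf))
```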

Method of Change of Variables

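If $X$ has density $p_X(x \mid \alpha, \theta)$ and $Y = h(X)$, where $h$ is a full-rank transformation (that is, there is a function $h^{-1}$ such that $X = h^{-1}(Y)$), then the density of $Y$ is

$$p_Y(y \mid \alpha, \theta) = p_X\left(h^{-1}(y) \mid \alpha, \theta\right) \left|J_{h^{-1}}(y)\right|, \qquad (1.124)$$

where $J_{h^{-1}}(y)$ is the Jacobian of the inverse transformation, and $|\cdot|$ denotes the absolute value of its determinant.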

Constant linear transformations are particularly simple. If $X$ is an $n$-vector random variable with PDF $f_X$ and $A$ is an $n \times n$ constant matrix of full rank, the PDF of $Y = AX$ is $f_Y(y) = f_X(A^{-1}y)\,|\det(A^{-1})|$.
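The linear case is illustrated by the sketch below, written in plain Python for $2 \times 2$ matrices only and with hypothetical helper names. It evaluates $f_X(A^{-1}y)\,|\det(A^{-1})|$ for $X$ with two independent $\mathrm{N}(0,1)$ components. It then compares the result with the $\mathrm{N}(0, AA^T)$ density, which is the known distribution of $AX$ in that case. The matrix $A$ is an arbitrary full-rank example.

```python
import math

def inv2(m):
    # Inverse and determinant of a 2x2 matrix given as nested lists
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]], det

def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def std_normal2_pdf(x):
    # PDF of two independent N(0, 1) components
    return math.exp(-0.5 * (x[0] ** 2 + x[1] ** 2)) / (2 * math.pi)

def pdf_linear(y, A, pdf_x):
    # f_Y(y) = f_X(A^{-1} y) |det(A^{-1})|, with det(A^{-1}) = 1 / det(A)
    A_inv, det_A = inv2(A)
    return pdf_x(matvec(A_inv, y)) / abs(det_A)

def normal2_pdf(y, cov):
    # Bivariate N(0, cov) density, used only as an independent check
    cov_inv, det_c = inv2(cov)
    q = (y[0] * (cov_inv[0][0] * y[0] + cov_inv[0][1] * y[1])
         + y[1] * (cov_inv[1][0] * y[0] + cov_inv[1][1] * y[1]))
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det_c))

A = [[2.0, 1.0], [0.5, 1.0]]
cov = [[5.0, 2.0], [2.0, 1.25]]  # A A^T, the covariance of Y = AX
y = [0.7, -0.3]
print(pdf_linear(y, A, std_normal2_pdf), normal2_pdf(y, cov))
```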

In the change of variable method, we think of $h$ as a mapping of the range $\mathcal{X}$ of the random variable $X$ to the range $\mathcal{Y}$ of the random variable $Y$. The method works by expressing the probability content of small regions in $\mathcal{Y}$ in terms of the probability content of the pre-image of those regions in $\mathcal{X}$.

For a given function $h$, we often must decompose $\mathcal{X}$ into disjoint sets over each of which $h$ is one-to-one.

Example 1.13 distribution of the square of a standard normal random variable

Suppose $X \sim \mathrm{N}(0,1)$, and let $Y = h(X) = X^2$. The function $h$ is one-to-one over $]-\infty, 0[$ and, separately, one-to-one over $[0, \infty[$. We could, of course, determine the PDF of $Y$ using equation (1.123), but we will use the change-of-variables technique.

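On the two regions the inverse is $h^{-1}(y) = -\sqrt{y}$ and $h^{-1}(y) = \sqrt{y}$, respectively. Over each region, the absolute value of the Jacobian of the inverse is $\frac{1}{2}y^{-1/2}$. The Lebesgue PDF of $X$ is $f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$. Hence, adding the contributions of the two regions, the Lebesgue PDF of $Y$ is

$$f_Y(y) = \frac{1}{\sqrt{2\pi}}\, y^{-1/2} e^{-y/2} I_{\bar{\mathbb{R}}_+}(y). \qquad (1.125)$$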

This is the PDF of a chi-squared random variable with one degree of freedom, $\chi^2_1$ (see Table A.3).
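The two-region argument can be written as a sum over branches, $f_Y(y) = \sum_k f_X(g_k(y))\,|g_k'(y)|$, where $g_k$ is the inverse of $h$ on the $k$-th region. The Python sketch below applies this to $Y = X^2$ with $X \sim \mathrm{N}(0,1)$ and compares the result with equation (1.125). The helper `pdf_by_branches` is a hypothetical name.

```python
import math

def pdf_by_branches(y, pdf_x, branches):
    """Change of variables when h is one-to-one on disjoint pieces.

    branches: list of (inverse, abs_jacobian) pairs, one per piece, so that
    f_Y(y) = sum_k f_X(g_k(y)) |g_k'(y)|.
    """
    return sum(pdf_x(g(y)) * jac(y) for g, jac in branches)

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

# Y = X^2: on ]-inf, 0[ the inverse is -sqrt(y); on [0, inf[ it is +sqrt(y).
# Each branch has |d g / dy| = y^(-1/2) / 2.
half_jac = lambda y: 0.5 / math.sqrt(y)
branches = [(lambda y: -math.sqrt(y), half_jac), (lambda y: math.sqrt(y), half_jac)]

def chi2_1_pdf(y):
    # Equation (1.125) for y > 0
    return y ** -0.5 * math.exp(-y / 2) / math.sqrt(2 * math.pi)

for y in (0.1, 1.0, 4.0):
    print(y, pdf_by_branches(y, std_normal_pdf, branches), chi2_1_pdf(y))
```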

Sums

A simple application of the change of variables method is in the common situation of finding the distribution of the sum of two scalar random variables that are independent but not necessarily identically distributed.

Suppose $X$ is a $d$-variate random variable with PDF $f_X$, $Y$ is a $d$-variate random variable with PDF $f_Y$, and $X$ and $Y$ are independent. We want the density of $U = X + Y$. We form another variable $V = Y$ and the matrix

$$A = \begin{bmatrix} I_d & I_d \\ 0 & I_d \end{bmatrix},$$

so that we have a full-rank transformation, $(U, V)^T = A(X, Y)^T$. The inverse of the transformation matrix is

$$A^{-1} = \begin{bmatrix} I_d & -I_d \\ 0 & I_d \end{bmatrix},$$

and the Jacobian is 1. Because $X$ and $Y$ are independent, their joint PDF is $f_{XY}(x, y) = f_X(x) f_Y(y)$, and the joint PDF of $U$ and $V$ is $f_{UV}(u, v) = f_X(u - v) f_Y(v)$. Hence, the PDF of $U$ is

$$f_U(u) = \int_{\mathbb{R}^d} f_X(u - v) f_Y(v)\,dv = \int_{\mathbb{R}^d} f_Y(u - v) f_X(v)\,dv. \qquad (1.126)$$

We call $f_U$ the convolution of $f_X$ and $f_Y$. The commutative operation of convolution occurs often in applied mathematics, and we denote it by $f_U = f_X \star f_Y$. We often denote the convolution of a function $f$ with itself by $f^{(2)}$. Hence, the PDF of $X_1 + X_2$, where $X_1, X_2$ are iid with PDF $f_X$, is $f_X^{(2)}$. From equation (1.126), we see that the CDF of $U$ is the convolution of the CDF of one of the summands with the PDF of the other:

$$F_U = F_X \star f_Y = F_Y \star f_X. \qquad (1.127)$$

In the literature, this operation is often referred to as the convolution of the two CDFs. Instead of being written as in equation (1.127), it may be written as

$$F_U = F_X \star F_Y.$$

Note the inconsistency in notation: the symbol "$\star$" is overloaded. Following the latter notation, we also denote the convolution of the CDF $F$ with itself as $F^{(2)}$.
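Here is a minimal numerical sketch of (1.126) and (1.127). It assumes that $X$ and $Y$ are independent $\mathrm{U}(0,1)$, which is an illustrative choice. The hypothetical helper `convolve_at` approximates $\int g(u-v) f_Y(v)\,dv$ by the midpoint rule, with $g = f_X$ for the PDF and $g = F_X$ for the CDF. In this case $f_U$ is the triangular density on $[0, 2]$, so $f_U(0.5) = f_U(1.5) = 0.5$ and $F_U(1.5) = 0.875$.

```python
def convolve_at(u, g, f_y, lo, hi, n=20000):
    # Midpoint-rule approximation of the integral of g(u - v) f_Y(v) dv over [lo, hi],
    # the support of f_Y.  g = f_X gives (1.126); g = F_X gives (1.127).
    step = (hi - lo) / n
    total = 0.0
    for k in range(n):
        v = lo + (k + 0.5) * step
        total += g(u - v) * f_y(v)
    return total * step

def unif_pdf(x):
    # PDF of U(0, 1)
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def unif_cdf(x):
    # CDF of U(0, 1)
    return min(max(x, 0.0), 1.0)

for u in (0.5, 1.0, 1.5):
    print(u, convolve_at(u, unif_pdf, unif_pdf, 0.0, 1.0),
          convolve_at(u, unif_cdf, unif_pdf, 0.0, 1.0))
```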

Example 1.14 sum of two independent Poissons

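Suppose $X_1$ is distributed as Poisson($\theta_1$) and $X_2$ is distributed independently as Poisson($\theta_2$). We apply equation (1.126) with the integral taken with respect to counting measure, that is, as a sum. Together with the binomial theorem, this gives the probability function of the sum $U = X_1 + X_2$:

$$f_U(u) = \sum_{v=0}^{u} \frac{\theta_1^{u-v} e^{-\theta_1}}{(u-v)!} \, \frac{\theta_2^{v} e^{-\theta_2}}{v!} = \frac{1}{u!} e^{-(\theta_1+\theta_2)} (\theta_1 + \theta_2)^u,$$

which is the probability function of a Poisson($\theta_1 + \theta_2$) random variable.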
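The discrete convolution in Example 1.14 is easy to check numerically. In this Python sketch the parameter values $\theta_1 = 1.3$ and $\theta_2 = 2.2$ are arbitrary, and `pmf_of_sum` is a hypothetical helper that implements the sum over $v$.

```python
import math

def poisson_pmf(k, theta):
    return theta ** k * math.exp(-theta) / math.factorial(k)

def pmf_of_sum(u, pmf1, pmf2):
    # Discrete version of (1.126): the integral becomes a sum over v = 0, ..., u
    return sum(pmf1(u - v) * pmf2(v) for v in range(u + 1))

theta1, theta2 = 1.3, 2.2
for u in range(6):
    conv = pmf_of_sum(u, lambda k: poisson_pmf(k, theta1), lambda k: poisson_pmf(k, theta2))
    print(u, conv, poisson_pmf(u, theta1 + theta2))
```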

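Table 1.1. Distributions of the Sums of Independent Random Variables

| Distribution of $X_i$ for $i = 1, \ldots, k$ | Distribution of $\sum X_i$ |
|---|---|
| Poisson($\theta_i$) | Poisson($\sum \theta_i$) |
| Bernoulli($\pi$) | binomial($k, \pi$) |
| binomial($n_i, \pi$) | binomial($\sum n_i, \pi$) |
| geometric($\pi$) | negative binomial($k, \pi$) |
| negative binomial($n_i, \pi$) | negative binomial($\sum n_i, \pi$) |
| normal($\mu_i, \sigma_i^2$) | normal($\sum \mu_i, \sum \sigma_i^2$) |
| exponential($\beta$) | gamma($k, \beta$) |
| gamma($\alpha_i, \beta$) | gamma($\sum \alpha_i, \beta$) |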

The property shown in Example 1.14 obviously extends to $k$ independent Poissons. Other common distributions also have this kind of property for sums, as shown in Table 1.1. For some families of distributions, such as the binomial, negative binomial, and gamma, the general case is the sum of special cases.
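Several rows of Table 1.1 can be verified exactly for small cases by convolving probability functions. The sketch below is in Python, uses hypothetical helper names, and takes an arbitrary $\pi = 0.3$. It convolves six Bernoulli($\pi$) probability functions and, separately, convolves binomial($2, \pi$) with binomial($4, \pi$). Both results are compared with binomial($6, \pi$).

```python
import math

def convolve_pmfs(p, q):
    # PMF of the sum of two independent variables supported on 0, 1, 2, ...
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def binomial_pmf(n, prob):
    return [math.comb(n, x) * prob ** x * (1 - prob) ** (n - x) for x in range(n + 1)]

prob, k = 0.3, 6
bernoulli = [1 - prob, prob]
total = [1.0]  # PMF of the constant 0
for _ in range(k):
    total = convolve_pmfs(total, bernoulli)

# binomial(2, pi) + binomial(4, pi) should be binomial(6, pi)
mixed = convolve_pmfs(binomial_pmf(2, prob), binomial_pmf(4, prob))
print(total)
print(binomial_pmf(k, prob))
```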

The additive property of the gamma distribution carries over to the special cases. The sum of $k$ iid exponentials with parameter $\theta$ is gamma($k, \theta$), and the sum of independent chi-squared variates with $\nu_1, \ldots, \nu_k$ degrees of freedom is distributed as $\chi^2_{\sum \nu_i}$.
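The following simulation sketch covers the exponential case. It assumes that $\theta$ is a scale parameter (mean $\theta$). It uses the closed-form gamma CDF for integer shape $k$, $F(x) = 1 - e^{-x/\theta}\sum_{j=0}^{k-1} (x/\theta)^j/j!$. The values $k = 4$ and $\theta = 2$, as well as the seed, are illustrative.

```python
import math
import random

def erlang_cdf(x, k, theta):
    # CDF of gamma(k, theta) for integer shape k and scale theta
    if x <= 0:
        return 0.0
    z = x / theta
    return 1.0 - math.exp(-z) * sum(z ** j / math.factorial(j) for j in range(k))

rng = random.Random(1)
k, theta, reps = 4, 2.0, 20000
# Each replicate is the sum of k iid exponentials with mean theta
sums = [sum(rng.expovariate(1.0 / theta) for _ in range(k)) for _ in range(reps)]
for x in (4.0, 8.0, 12.0):
    empirical = sum(s <= x for s in sums) / reps
    print(x, empirical, erlang_cdf(x, k, theta))
```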

There are other distributions that could be included in Table 1.1 if the parameters met certain restrictions, such as being equal; that is, if the random variables in the sum are iid.

In the case of the inverse Gaussian($\mu, \lambda$) distribution, a slightly weaker restriction than iid allows a useful result on the distribution of the sum, so long as the parameters have a fixed relationship. If $X_1, \ldots, X_k$ are independent and $X_i$ is distributed as inverse Gaussian($\mu_0 \alpha_i, \lambda_0 \alpha_i^2$), then $\sum X_i$ is distributed as inverse Gaussian($\mu_0 \sum \alpha_i, \lambda_0 (\sum \alpha_i)^2$).
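The inverse Gaussian result can be checked by simulation. This Python sketch generates inverse Gaussian variates with the transformation method of Michael, Schucany, and Haas. That standard technique is substituted here only because the text does not specify a generator. The sketch compares the empirical CDF of the sum with the inverse Gaussian CDF $\Phi\big(\sqrt{\lambda/x}(x/\mu - 1)\big) + e^{2\lambda/\mu}\Phi\big(-\sqrt{\lambda/x}(x/\mu + 1)\big)$. The values $\mu_0 = 1$, $\lambda_0 = 2$, and $\alpha = (1, 2, 0.5)$ are arbitrary, and the helper names are hypothetical.

```python
import math
import random

def rinvgauss(rng, mu, lam):
    # Michael-Schucany-Haas transformation method for inverse Gaussian(mu, lam)
    nu = rng.gauss(0.0, 1.0)
    y = nu * nu
    x = mu + mu * mu * y / (2 * lam) - (mu / (2 * lam)) * math.sqrt(4 * mu * lam * y + mu * mu * y * y)
    # Keep the smaller root x with probability mu / (mu + x), else use mu^2 / x
    return x if rng.random() <= mu / (mu + x) else mu * mu / x

def phi(z):
    # Standard normal CDF via erfc (accurate in the lower tail)
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def invgauss_cdf(x, mu, lam):
    r = math.sqrt(lam / x)
    return phi(r * (x / mu - 1)) + math.exp(2 * lam / mu) * phi(-r * (x / mu + 1))

mu0, lam0, alphas = 1.0, 2.0, [1.0, 2.0, 0.5]
rng = random.Random(2)
reps = 20000
# X_i ~ inverse Gaussian(mu0 * a_i, lam0 * a_i^2), independent
sums = [sum(rinvgauss(rng, mu0 * a, lam0 * a * a) for a in alphas) for _ in range(reps)]
s = sum(alphas)
for x in (2.5, 3.5, 5.0):
    empirical = sum(v <= x for v in sums) / reps
    print(x, empirical, invgauss_cdf(x, mu0 * s, lam0 * s * s))
```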

Products

Another simple application of the change of variables method is for finding the distribution of the product or the quotient of two scalar random variables that are independent but not necessarily identically distributed.

Suppose $X$ is a random variable with PDF $f_X$, $Y$ is a random variable with PDF $f_Y$, and $X$ and $Y$ are independent. We want the density of the product $U = XY$. As for the case with sums, we form another variable $V = Y$, form the joint distribution of $U$ and $V$ using the Jacobian of the inverse transformation, and finally integrate out $V$. Analogous to equation (1.126), we have

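$$f_U(u) = \int_{-\infty}^{\infty} f_X(u/v)\, f_Y(v)\, |v|^{-1}\,dv, \qquad (1.128)$$

and for the quotient $W = X/Y$, we have

$$f_W(w) = \int_{-\infty}^{\infty} f_X(wv)\, f_Y(v)\, |v|\,dv. \qquad (1.129)$$

The absolute values come from the Jacobians of the inverse transformations $x = u/v$ and $x = wv$ (with $y = v$), whose determinants are $v^{-1}$ and $v$, respectively.

Example 1.15 the F family of distributions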

Suppose $Y_1$ and $Y_2$ are independent chi-squared random variables with $\nu_1$ and $\nu_2$ degrees of freedom, respectively. We want to find the distribution of $W = Y_1/Y_2$. Along with the PDFs of chi-squared random variables, equation (1.129) yields

$$f_W(w) \propto w^{\nu_1/2 - 1} \int_0^\infty v^{(\nu_1+\nu_2)/2 - 1} e^{-(w+1)v/2}\,dv.$$

This integral can be evaluated by making the change of variables $z = (w+1)v$. After separating out the factors involving $w$, the remaining integrand is proportional to the PDF of a chi-squared random variable with $\nu_1 + \nu_2$ degrees of freedom, so that $f_W(w) \propto w^{\nu_1/2-1}(w+1)^{-(\nu_1+\nu_2)/2}$. Finally, we make one more change of variables: $F = W\nu_2/\nu_1$. This yields

$$f_F(f) \propto \frac{f^{\nu_1/2 - 1}}{(\nu_2 + \nu_1 f)^{(\nu_1+\nu_2)/2}}. \qquad (1.130)$$

This is the PDF of an F random variable with $\nu_1$ and $\nu_2$ degrees of freedom (see Table A.3). It is interesting to note that the mean of such a random variable depends only on $\nu_2$: it is $\nu_2/(\nu_2 - 2)$ for $\nu_2 > 2$.
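The derivation in Example 1.15 can be checked numerically. The sketch below evaluates the quotient formula (1.129) with chi-squared PDFs by the midpoint rule and rescales from $W$ to $F = W\nu_2/\nu_1$. It then compares the result with (1.130), normalized by $\nu_1^{\nu_1/2}\nu_2^{\nu_2/2}/B(\nu_1/2, \nu_2/2)$. The degrees of freedom $\nu_1 = 4$ and $\nu_2 = 6$, the truncation of the integral at 80, and the helper names are illustrative assumptions.

```python
import math

def chi2_pdf(x, nu):
    # Lebesgue PDF of a chi-squared variable with nu degrees of freedom
    if x <= 0:
        return 0.0
    return x ** (nu / 2 - 1) * math.exp(-x / 2) / (2 ** (nu / 2) * math.gamma(nu / 2))

def quotient_pdf(w, f_x, f_y, lo, hi, n=20000):
    # Midpoint rule for (1.129): integral of f_X(w v) f_Y(v) |v| dv over [lo, hi]
    step = (hi - lo) / n
    total = 0.0
    for k in range(n):
        v = lo + (k + 0.5) * step
        total += f_x(w * v) * f_y(v) * abs(v)
    return total * step

def f_pdf_via_quotient(f, nu1, nu2):
    # f_Y is zero for v <= 0, so integrate over [0, 80]; then F = W nu2 / nu1,
    # which gives f_F(f) = f_W(f nu1 / nu2) * nu1 / nu2
    w = f * nu1 / nu2
    f_w = quotient_pdf(w, lambda x: chi2_pdf(x, nu1), lambda y: chi2_pdf(y, nu2), 0.0, 80.0)
    return f_w * nu1 / nu2

def f_pdf(f, nu1, nu2):
    # (1.130) with normalizing constant nu1^(nu1/2) nu2^(nu2/2) / B(nu1/2, nu2/2)
    beta = math.gamma(nu1 / 2) * math.gamma(nu2 / 2) / math.gamma((nu1 + nu2) / 2)
    return (nu1 ** (nu1 / 2) * nu2 ** (nu2 / 2) * f ** (nu1 / 2 - 1)
            / (beta * (nu2 + nu1 * f) ** ((nu1 + nu2) / 2)))

nu1, nu2 = 4, 6
for f in (0.5, 1.0, 2.0):
    print(f, f_pdf_via_quotient(f, nu1, nu2), f_pdf(f, nu1, nu2))
```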

Method of MGFs or CFs

In this method, for the transformation $Y = h(X)$ we write the MGF of $Y$ as $\mathrm{E}(e^{t^T Y}) = \mathrm{E}(e^{t^T h(X)})$, or we write the CF in a similar way. If we can work out the expectation (with respect to the known distribution of $X$), we have the MGF or CF of $Y$, which determines its distribution.

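The MGF or CF technique is particularly useful in the case when $Y$ is the sum from a simple random sample. If

$$Y = X_1 + \cdots + X_n,$$

where $X_1, \ldots, X_n$ are iid with CF $\varphi_X(t)$, then the CF of $Y$ follows from independence, because the expectation of a product of independent random variables is the product of their expectations. The CF of $Y$ is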

$$\varphi_Y(t) = \left(\varphi_X(t)\right)^n. \qquad (1.131)$$

We use this approach in the proof of the simple CLT, Theorem 1.38 on page 87.

The MGF or CF for a linear transformation of a random variable has a simple relationship to the MGF or CF of the random variable itself, as we can easily see from the definition. Let $X$ be a random variable in $\mathbb{R}^d$, $A$ be a $d \times m$ matrix of constants, and $b$ be a constant $m$-vector. Now let

$$Y = A^T X + b.$$

Then

$$\varphi_Y(t) = e^{i b^T t}\, \varphi_X(At).$$
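The following Python check covers (1.131) and the linear rule in the scalar case ($d = m = 1$), using complex arithmetic from `cmath`. For iid Bernoulli($p$) summands the sum is binomial($n, p$), so $(\varphi_X(t))^n$ should equal the binomial CF computed directly from its probability function. The values of $n$, $p$, $a$, $b$, and $t$ are arbitrary, and the helper names are hypothetical.

```python
import cmath
import math

def bernoulli_cf(t, p):
    # phi_X(t) = E(e^{itX}) = 1 - p + p e^{it}
    return 1 - p + p * cmath.exp(1j * t)

def binomial_cf_direct(t, n, p):
    # E(e^{itY}) computed directly from the binomial(n, p) probability function
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) * cmath.exp(1j * t * k)
               for k in range(n + 1))

n, p = 5, 0.4
for t in (0.3, 1.0, 2.5):
    print(t, bernoulli_cf(t, p) ** n, binomial_cf_direct(t, n, p))

# Scalar linear rule: Y = a X + b has phi_Y(t) = e^{ibt} phi_X(a t); here X is binomial(n, p)
a, b, t = 3.0, -1.0, 0.7
lhs = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) * cmath.exp(1j * t * (a * k + b))
          for k in range(n + 1))
rhs = cmath.exp(1j * b * t) * binomial_cf_direct(a * t, n, p)
print(abs(lhs - rhs))
```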

Example 1.16 distribution of the sum of squares of independent standard normal random variables

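Suppose $X_1, \ldots, X_n \stackrel{\text{iid}}{\sim} \mathrm{N}(0, 1)$, and let $Y = \sum X_i^2$. In Example 1.13, we saw that $Y_i = X_i^2 \stackrel{d}{=} \chi^2_1$. Because the $X_i$ are iid, the $Y_i$ are iid. Now the MGF of a $\chi^2_1$ is

$$\mathrm{E}\left(e^{tY_i}\right) = \int_0^\infty \frac{1}{\sqrt{2\pi}}\, y^{-1/2} e^{-y(1-2t)/2}\,dy = (1 - 2t)^{-1/2} \quad \text{for } t < \tfrac{1}{2}.$$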

Because the $Y_i$ are independent, the MGF of $Y$ is the product of their MGFs, $(1 - 2t)^{-n/2}$ for $t < 1/2$. This is the MGF of a chi-squared random variable with $n$ degrees of freedom.

This is a very important result for applications in statistics.
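The following seeded Monte Carlo sketch illustrates Example 1.16 in Python, using the standard library, with illustrative values $n = 3$ and $t = 0.1$. It estimates $\mathrm{E}(e^{tY})$ for $Y = \sum X_i^2$ and compares the estimate with $(1 - 2t)^{-n/2}$. The sample mean should also be close to $n$, the mean of a $\chi^2_n$.

```python
import math
import random

rng = random.Random(3)
n, t, reps = 3, 0.1, 100000
# Y = X_1^2 + ... + X_n^2 with X_i iid N(0, 1)
samples = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(reps)]
# The sample average of e^{tY} estimates the MGF at t (finite because t < 1/2)
mgf_estimate = sum(math.exp(t * y) for y in samples) / reps
print(mgf_estimate, (1 - 2 * t) ** (-n / 2))
print(sum(samples) / reps)  # should be near n
```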