Relations among Random Variables - Some Important Probability Definitions and Facts

1.1 Some Important Probability Definitions and Facts

1.1.4 Relations among Random Variables

In many applications there are two or more random variables of interest. For a given probability space (Ω, F, P), there may be a collection of random variables, W. If the random variables have some common properties, for example, if either all are discrete or all are continuous and if all have the same structure, we may identify the collection as a “space”.

If the random variables have some relationship to each other, that is, if they are not independent, we seek useful measures of their dependence. The appropriate measure depends on the nature of their dependence. If they are quadratically related, a measure that is appropriate for linear relationships may be inappropriate, and vice versa.

We will consider two ways of studying the relationships among random variables. The first is based on second-degree moments, called covariances, between pairs of variables, and the other is based on functions that relate the CDF of one variable or one set of variables to the CDFs of other variables. These functions, called copulas, can involve more than single pairs of variables.

Random Variable Spaces

As with any function space, W may have interesting and useful properties. For example, W may be a linear space; that is, for X, Y ∈ W and a ∈ IR, aX + Y ∈ W.

The concept of $L^p$ random variable spaces follows immediately from the general property of function spaces, discussed on page 741. For random variables in an $L^p$ random variable space, the $p$th absolute moment is finite; that is, $\mathrm{E}(|X|^p) < \infty$.

The closure of random variable spaces is often of interest. We define various forms of closure depending on types of convergence of a sequence $X_n$ in the space. For example, given any sequence $X_n \in W$, if $X_n \xrightarrow{L_r} X$ implies $X \in W$, then W is closed for the $r$th moment.

Expectations

We may take expectations of functions of random variables in terms of their joint distribution or in terms of their marginal distributions. To indicate the distribution used in an expectation, we may use notation for the expectation operator similar to that we use on the individual distribution, as described on page 23. Given the random variables $X_1$ and $X_2$, we use the notation $\mathrm{E}_{X_1}$ to indicate an expectation taken with respect to the marginal distribution of $X_1$.

We often denote the expectation taken with respect to the joint distribution as simply E, but for emphasis, we may use the notation $\mathrm{E}_{X_1,X_2}$.

We also use notation of the form $\mathrm{E}_P$, where P denotes the relevant probability distribution of whatever form, or $\mathrm{E}_\theta$ in a parametric family of probability distributions.

Expectations of PDFs and of Likelihoods

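If the marginal PDFs of the random variables $X_1$ and $X_2$ are $f_{X_1}$ and $f_{X_2}$, we have the equalities

$$\mathrm{E}_{X_1}\!\left(\frac{f_{X_2}(X_1)}{f_{X_1}(X_1)}\right) = \mathrm{E}_{X_2}\!\left(\frac{f_{X_1}(X_2)}{f_{X_2}(X_2)}\right) = 1. \tag{1.63}$$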

On the other hand,

$$\mathrm{E}_{X_1}\big(-\log(f_{X_1}(X_1))\big) \le \mathrm{E}_{X_1}\big(-\log(f_{X_2}(X_1))\big), \tag{1.64}$$

with equality only if $f_{X_1}(x) = f_{X_2}(x)$ a.e. (see page 41).
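A small numerical check can make (1.63) and (1.64) concrete. The sketch below assumes two hypothetical discrete densities (with respect to counting measure) on a common three-point support; the names `f1`, `f2`, and `expectation` are illustrative and not part of the text.

```python
import math

# Hypothetical densities (w.r.t. counting measure) on the support {0, 1, 2}.
f1 = [0.5, 0.3, 0.2]  # plays the role of the density of X1
f2 = [0.2, 0.5, 0.3]  # plays the role of the density of X2


def expectation(density, g):
    """Return E(g(X)) for X having the given discrete density."""
    return sum(p * g(x) for x, p in enumerate(density))


# Equation (1.63): each expected density ratio equals 1.
ratio_under_f1 = expectation(f1, lambda x: f2[x] / f1[x])
ratio_under_f2 = expectation(f2, lambda x: f1[x] / f2[x])

# Inequality (1.64): the left side is the entropy of f1, the right side
# is the cross-entropy of f1 against f2.
entropy = expectation(f1, lambda x: -math.log(f1[x]))
cross_entropy = expectation(f1, lambda x: -math.log(f2[x]))

print(ratio_under_f1, ratio_under_f2)
print(entropy, cross_entropy)
```

The ratio identity relies on the two densities sharing the same support, which is assumed here.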

When the distributions are in the same parametric family, we may write $f_\theta$ with different values of θ instead of $f_{X_1}$ and $f_{X_2}$. In that case, it is more natural to think of the functions as likelihoods since the parameter is the variable. From equation (1.63), for example, we have for the likelihood ratio,

$$\mathrm{E}_{\theta_1}\!\left(\frac{L(\theta_2; X)}{L(\theta_1; X)}\right) = 1. \tag{1.65}$$

Covariance and Correlation

Expectations are also used to define relationships among random variables. We will first consider expectations of scalar random variables, and then discuss expectations of vector and matrix random variables.

For two scalar random variables, X and Y, useful measures of a linear relationship between them are the covariance and correlation. The covariance of X and Y, if it exists, is denoted by Cov(X, Y), and is defined as

$$\mathrm{Cov}(X, Y) = \mathrm{E}\big((X - \mathrm{E}(X))(Y - \mathrm{E}(Y))\big). \tag{1.66}$$

From the Cauchy-Schwarz inequality (B.21) (see page 853), we see that

$$(\mathrm{Cov}(X, Y))^2 \le \mathrm{V}(X)\,\mathrm{V}(Y). \tag{1.67}$$

The correlation of X and Y, written Cor(X, Y), is defined as

$$\mathrm{Cor}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{V}(X)\,\mathrm{V}(Y)}}. \tag{1.68}$$

The correlation is also called the correlation coefficient and is often written as $\rho_{X,Y}$.

From inequality (1.67), we see that the correlation coefficient is in [−1, 1]. If X and Y are independent, then Cov(X, Y) = Cor(X, Y) = 0 (exercise).
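As an illustration of these definitions, the following sketch computes the covariance and correlation exactly for two hypothetical joint distributions in which X is uniform on {−1, 0, 1}: one with Y = 2X + 1 and one with Y = X². The helper names `moments` and `cor` are assumptions made for this example. The second case shows that a zero covariance does not imply independence, which is why a linear measure can be uninformative for a quadratic relationship.

```python
import math
from fractions import Fraction


def moments(joint):
    """Return (Cov(X, Y), V(X), V(Y)) for a joint pmf {(x, y): prob}."""
    ex = sum(p * x for (x, y), p in joint.items())
    ey = sum(p * y for (x, y), p in joint.items())
    cov = sum(p * (x - ex) * (y - ey) for (x, y), p in joint.items())
    vx = sum(p * (x - ex) ** 2 for (x, y), p in joint.items())
    vy = sum(p * (y - ey) ** 2 for (x, y), p in joint.items())
    return cov, vx, vy


def cor(joint):
    """Correlation as in (1.68): Cov(X, Y) / sqrt(V(X) V(Y))."""
    cov, vx, vy = moments(joint)
    return float(cov) / math.sqrt(vx * vy)


third = Fraction(1, 3)
# X uniform on {-1, 0, 1}; Y is a deterministic function of X in both cases.
linear = {(x, 2 * x + 1): third for x in (-1, 0, 1)}
quadratic = {(x, x * x): third for x in (-1, 0, 1)}

print(cor(linear))     # perfectly linear relationship
print(cor(quadratic))  # Y is fully determined by X, yet uncorrelated with it
```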

Structure of Random Variables

Random variables may consist of individual IR elements arrayed in some structure, such as a vector so that the random variable itself is in $\mathrm{IR}^d$ or as a matrix so that the random variable is in $\mathrm{IR}^{d\times m}$. Many of the properties of random variables are essentially the same whatever their structure, except of course those properties may have structures dependent on that of the random variable.

Multiplication is an operation that depends very strongly on the structure of the operand. If $x$ is a scalar, $x^2$ is a scalar. If $x$ is a vector, however, there are various operations that could be interpreted as extensions of a squaring operation. First, of course, is elementwise squaring. In this interpretation $x^2$ has the same structure as $x$. Salient relationships among the individual elements of $x$ may be lost by this operation, however. Other interpretations are $x^{\mathrm T}x$, which preserves none of the structure of $x$, and $xx^{\mathrm T}$, which is in $\mathrm{IR}^{d\times d}$. The point of this is that what can reasonably be done in the analysis of random variables depends on the structure of the random variables, and such relatively simple concepts as moments require some careful consideration. In many cases, a third-order or higher-order moment is not useful because of its complexity.
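The three interpretations of "squaring" a vector can be seen side by side in a minimal sketch. The vector `x` and the variable names are hypothetical, chosen only for illustration.

```python
# A hypothetical vector in IR^3.
x = [1, 2, 3]

# Elementwise squaring: same structure as x.
elementwise = [xi * xi for xi in x]            # [1, 4, 9]

# Inner product x^T x: a scalar, so the vector structure is gone.
inner = sum(xi * xi for xi in x)               # 14

# Outer product x x^T: a d x d matrix (here 3 x 3).
outer = [[xi * xj for xj in x] for xi in x]

print(elementwise)
print(inner)
for row in outer:
    print(row)
```

Only the outer product retains the pairwise products of distinct elements, which is the form used for second-order moments of a random vector.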

Structural Moments

For random variables that have a structure such as a vector or matrix, the elementwise moments are the same as those for a scalar-valued random variable as described above, and hence, the first moment, the mean, has the same structure as the random variable itself.

Higher order moments of vectors and matrices present some problems because the number of individual scalar moments is greater than the number of elements in the random object itself. For multivariate distributions, the higher-order marginal moments are generally more useful than the higher-order joint moments. We define the second-order moments (variances and covariances) for random vectors and for random matrices below.

Definition 1.22 (variance-covariance of a random vector)

The variance-covariance of a vector-valued random variable X is the expectation of the outer product,

$$\mathrm{V}(X) = \mathrm{E}\big((X - \mathrm{E}(X))(X - \mathrm{E}(X))^{\mathrm T}\big), \tag{1.69}$$

if it exists.

For a constant vector, the rank of an outer product is no greater than 1, but unless $X \overset{\text{a.s.}}{=} \mathrm{E}(X)$, V(X) is nonnegative definite. We see this by forming the scalar random variable $Y = c^{\mathrm T}X$ for any $c \ne 0$, and writing

$$0 \le \mathrm{V}(Y) = \mathrm{E}\big((c^{\mathrm T}X - c^{\mathrm T}\mathrm{E}(X))^2\big) = \mathrm{E}\big(c^{\mathrm T}(X - \mathrm{E}(X))(X - \mathrm{E}(X))^{\mathrm T}c\big) = c^{\mathrm T}\,\mathrm{V}(X)\,c.$$

(If $X \overset{\text{a.s.}}{=} \mathrm{E}(X)$, then V(X) = 0, and while it is true that $c^{\mathrm T}0c = 0 \ge 0$, we do not say that the 0 matrix is nonnegative definite. Recall further that whenever I write a term such as V(X), I am implicitly assuming its existence.)
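The identity V(cᵀX) = cᵀV(X)c used in this argument can be checked exactly on a small example. The sketch below assumes a hypothetical two-dimensional random vector taking four equally likely values; the function names are illustrative only.

```python
from fractions import Fraction

# Hypothetical pmf of a random vector X in IR^2: four equally likely points.
quarter = Fraction(1, 4)
pmf = {(0, 0): quarter, (1, 0): quarter, (0, 1): quarter, (1, 2): quarter}


def mean(pmf):
    """Elementwise mean E(X)."""
    d = len(next(iter(pmf)))
    return [sum(p * x[i] for x, p in pmf.items()) for i in range(d)]


def variance_covariance(pmf):
    """V(X) = E((X - E(X))(X - E(X))^T), as in Definition 1.22."""
    mu = mean(pmf)
    d = len(mu)
    return [[sum(p * (x[i] - mu[i]) * (x[j] - mu[j]) for x, p in pmf.items())
             for j in range(d)] for i in range(d)]


def quadratic_form(c, v):
    """c^T V c for a vector c and a square matrix v."""
    d = len(c)
    return sum(c[i] * v[i][j] * c[j] for i in range(d) for j in range(d))


def variance_of_projection(c, pmf):
    """V(Y) for the scalar random variable Y = c^T X, computed directly."""
    values = {x: sum(ci * xi for ci, xi in zip(c, x)) for x in pmf}
    ey = sum(p * values[x] for x, p in pmf.items())
    return sum(p * (values[x] - ey) ** 2 for x, p in pmf.items())


c = (2, -1)
v = variance_covariance(pmf)
print(quadratic_form(c, v), variance_of_projection(c, pmf))
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the two sides can be compared with `==` rather than a tolerance.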

Furthermore, if it is not the case that $X \overset{\text{a.s.}}{=} \mathrm{E}(X)$, unless some element $X_i$ of a vector X is such that

$$X_i \overset{\text{a.s.}}{=} \sum_{j \ne i} (a_j + b_j X_j),$$

then V(X) is positive definite a.s. To show this, we show that V(X) is full rank a.s. (exercise).

The elements of V(X) are the bivariate moments of the respective elements of X; the (i, j) element of V(X) is the covariance of $X_i$ and $X_j$, $\mathrm{Cov}(X_i, X_j)$. If V(X) is nonsingular, then the correlation matrix of X, written Cor(X), is

$$\mathrm{Cor}(X) = D^{-1/2}\,\mathrm{V}(X)\,D^{-1/2}, \tag{1.70}$$

where $D = \mathrm{diag}(\mathrm{V}(X))$ is the diagonal matrix of the variances of the elements of X. The (i, j) element of Cor(X) is the correlation of $X_i$ and $X_j$, and so the diagonal elements are all 1.
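Since the (i, j) element of Cor(X) is the correlation of Xᵢ and Xⱼ, the correlation matrix can be obtained from V(X) by scaling each covariance by the two corresponding standard deviations, following (1.68). The sketch below assumes a hypothetical 3×3 variance-covariance matrix `V`; the numbers are made up for illustration.

```python
import math

# Hypothetical variance-covariance matrix of a random 3-vector.
V = [[4.0, 1.2, -0.8],
     [1.2, 1.0, 0.3],
     [-0.8, 0.3, 2.25]]


def correlation_matrix(v):
    """Return the matrix whose (i, j) element is v[i][j] / sqrt(v[i][i] v[j][j])."""
    d = len(v)
    sd = [math.sqrt(v[i][i]) for i in range(d)]  # standard deviations
    return [[v[i][j] / (sd[i] * sd[j]) for j in range(d)] for i in range(d)]


R = correlation_matrix(V)
for row in R:
    print(row)
```

The scaling requires every diagonal element of V(X) to be positive, which holds whenever V(X) is nonsingular.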

Definition 1.23 (variance-covariance of a random matrix)

The variance-covariance of a matrix random variable X is defined as the variance-covariance of vec(X):

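$$\mathrm{V}(X) = \mathrm{V}(\mathrm{vec}(X)) = \mathrm{E}\big(\mathrm{vec}(X - \mathrm{E}(X))\,(\mathrm{vec}(X - \mathrm{E}(X)))^{\mathrm T}\big), \tag{1.71}$$

if it exists.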

Linearity of Moments

The linearity property of the expectation operator yields some simple linearity properties for moments of first or second degree of random variables over the same probability space.

For random variables X, Y, and Z with finite variances and constants a, b, and c, we have

$$\mathrm{V}(aX + Y + c) = a^2\,\mathrm{V}(X) + \mathrm{V}(Y) + 2a\,\mathrm{Cov}(X, Y); \tag{1.72}$$

that is, V(·) is not a linear operator (but it simplifies nicely), and

$$\mathrm{Cov}(aX + bY + c,\ X + Z) = a\,\mathrm{V}(X) + a\,\mathrm{Cov}(X, Z) + b\,\mathrm{Cov}(X, Y) + b\,\mathrm{Cov}(Y, Z); \tag{1.73}$$

that is, Cov(·, ·) is a bilinear operator. Proofs of these two facts are left as exercises.
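Equations (1.72) and (1.73) can be verified exactly on a finite example before attempting the proofs. The sketch below assumes a hypothetical joint pmf of (X, Y, Z) on four equally likely points and arbitrary constants `a`, `b`, `c`; all names are illustrative.

```python
from fractions import Fraction

# Hypothetical joint pmf of (X, Y, Z): four equally likely outcomes.
quarter = Fraction(1, 4)
pmf = {(0, 1, 2): quarter, (1, 0, 1): quarter,
       (2, 2, 0): quarter, (1, 3, 3): quarter}


def E(g):
    """Expectation of g(X, Y, Z) under the joint pmf."""
    return sum(p * g(x, y, z) for (x, y, z), p in pmf.items())


def cov(g, h):
    """Cov(g, h) = E(gh) - E(g)E(h); cov(g, g) is the variance of g."""
    return E(lambda x, y, z: g(x, y, z) * h(x, y, z)) - E(g) * E(h)


def X(x, y, z):
    return x


def Y(x, y, z):
    return y


def Z(x, y, z):
    return z


a, b, c = Fraction(3), Fraction(-2), Fraction(5)

# Equation (1.72).
lhs_172 = cov(lambda x, y, z: a * x + y + c, lambda x, y, z: a * x + y + c)
rhs_172 = a * a * cov(X, X) + cov(Y, Y) + 2 * a * cov(X, Y)

# Equation (1.73).
lhs_173 = cov(lambda x, y, z: a * x + b * y + c, lambda x, y, z: x + z)
rhs_173 = a * cov(X, X) + a * cov(X, Z) + b * cov(X, Y) + b * cov(Y, Z)

print(lhs_172, rhs_172)
print(lhs_173, rhs_173)
```

A check on one distribution is of course not a proof, but it is a quick way to catch a sign or coefficient error.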

Copulas

A copula is a function that relates a multivariate CDF to lower dimensional marginal CDFs. The basic ideas of copulas can all be explored in the context of a bivariate distribution and the two associated univariate distributions, and the ideas extend in a natural way to higher dimensions.

Definition 1.24 (two-dimensional copula)

A two-dimensional copula is a function C that maps $[0,1]^2$ onto $[0,1]$ with the following properties:

1. for every $u \in [0,1]$,

$$C(0, u) = C(u, 0) = 0, \tag{1.74}$$

and

$$C(1, u) = C(u, 1) = u, \tag{1.75}$$

2. for every $(u_1, u_2), (v_1, v_2) \in [0,1]^2$ with $u_1 \le v_1$ and $u_2 \le v_2$,

$$C(u_1, u_2) - C(u_1, v_2) - C(v_1, u_2) + C(v_1, v_2) \ge 0. \tag{1.76}$$

A two-dimensional copula is also called a 2-copula.

The arguments to a copula C are often taken to be CDFs, which of course take values in [0, 1].
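The conditions in Definition 1.24 can be checked mechanically on a grid. The sketch below tests (1.74), (1.75), and (1.76) at grid points for two standard copulas, the product C(u, v) = uv and C(u, v) = min(u, v), and for a hypothetical function that fails (1.75). The function name `satisfies_copula_conditions` is an assumption for this example. Passing a grid check is necessary but not sufficient for being a copula.

```python
from fractions import Fraction
from itertools import product


def satisfies_copula_conditions(C, n=10):
    """Check (1.74), (1.75) and (1.76) at the grid points k/n, k = 0..n."""
    grid = [Fraction(k, n) for k in range(n + 1)]
    for u in grid:
        if C(0, u) != 0 or C(u, 0) != 0:   # equation (1.74)
            return False
        if C(1, u) != u or C(u, 1) != u:   # equation (1.75)
            return False
    for u1, v1 in product(grid, repeat=2):
        if u1 > v1:
            continue
        for u2, v2 in product(grid, repeat=2):
            if u2 > v2:
                continue
            # inequality (1.76)
            if C(u1, u2) - C(u1, v2) - C(v1, u2) + C(v1, v2) < 0:
                return False
    return True


def product_copula(u, v):
    return u * v


def min_copula(u, v):
    return min(u, v)


def not_a_copula(u, v):
    # Hypothetical counterexample: C(1, v) = v**2, which violates (1.75).
    return u * v * v


print(satisfies_copula_conditions(product_copula))
print(satisfies_copula_conditions(min_copula))
print(satisfies_copula_conditions(not_a_copula))
```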

The usefulness of copulas derives from Sklar’s theorem, which we state without proof.

Theorem 1.19 (Sklar’s theorem)

Let $P_{XY}$ be a bivariate CDF with marginal CDFs $P_X$ and $P_Y$. Then there exists a copula C such that for every $x, y \in \mathrm{IR}$,

$$P_{XY}(x, y) = C(P_X(x), P_Y(y)). \tag{1.77}$$

If $P_X$ and $P_Y$ are continuous everywhere, then C is unique; otherwise C is unique over the support of the distributions defined by $P_X$ and $P_Y$.

Conversely, if C is a copula and $P_X$ and $P_Y$ are CDFs, then the function $P_{XY}(x, y)$ defined by equation (1.77) is a CDF with marginal CDFs $P_X(x)$ and $P_Y(y)$.

Thus, a copula is a joint CDF of random variables with U(0,1) marginals. The proof of the first part of the theorem is given in Nelsen (2006), among other places. The proof of the converse portion is straightforward and is left as an exercise.
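The converse part of Sklar's theorem is easy to try out numerically. The sketch below is one possible illustration, not a construction taken from the text: it assumes the Farlie-Gumbel-Morgenstern family C(u, v) = uv(1 + θ(1 − u)(1 − v)), which is a copula for |θ| ≤ 1, together with two hypothetical exponential marginals, and checks that the joint CDF built from equation (1.77) has those marginals.

```python
import math


def fgm_copula(u, v, theta=0.5):
    """Farlie-Gumbel-Morgenstern copula; a valid copula for |theta| <= 1."""
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))


def exp_cdf(x, rate):
    """CDF of an exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def joint_cdf(x, y):
    """Joint CDF from equation (1.77), with assumed rates 1 and 2."""
    return fgm_copula(exp_cdf(x, 1.0), exp_cdf(y, 2.0))


# Letting one argument go to infinity recovers the other marginal CDF.
print(joint_cdf(0.7, math.inf), exp_cdf(0.7, 1.0))
print(joint_cdf(math.inf, 0.4), exp_cdf(0.4, 2.0))

# Probability of the rectangle (0.2, 1.0] x (0.1, 0.5]; it cannot be negative.
rect = (joint_cdf(1.0, 0.5) - joint_cdf(0.2, 0.5)
        - joint_cdf(1.0, 0.1) + joint_cdf(0.2, 0.1))
print(rect)
```

Changing `theta` alters the association between the two variables while leaving both marginals untouched, which is the separation of roles that the copula provides.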

For many bivariate distributions the copula is the most useful way to relate the joint distribution to the marginals, because it provides a separate description of the individual distributions and their association with each other.

One of the most important uses of copulas is to combine two marginal distributions to form a joint distribution with known bivariate characteristics. Certain standard copulas are useful in specific applications. The copula that corresponds to a bivariate normal distribution with correlation coefficient ρ is

$$C_{N\rho}(u, v) = \int_{-\infty}^{\Phi^{-1}(u)} \int_{-\infty}^{\Phi^{-1}(v)} \phi_\rho(t_1, t_2)\, \mathrm{d}t_2\, \mathrm{d}t_1, \tag{1.78}$$

where Φ(·) is the standard (univariate) normal CDF, and $\phi_\rho(\cdot, \cdot)$ is the bivariate normal PDF with means 0, variances 1, and correlation coefficient ρ. This copula is usually called the Gaussian copula and has been widely used in financial applications.
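A rough numerical evaluation of the Gaussian copula is sketched below. It assumes |ρ| < 1 and 0 < u, v < 1, and it reduces the double integral in (1.78) to a single integral by using the fact that, for a standard bivariate normal pair, the conditional distribution of the second variable given that the first equals t is N(ρt, 1 − ρ²). The midpoint rule, the truncation point `lower`, and the function name `gaussian_copula` are choices made for this example only.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def gaussian_copula(u, v, rho, n=4000, lower=-8.0):
    """Approximate the Gaussian copula at (u, v) by a midpoint rule.

    Uses C(u, v) = integral over t <= a of phi(t) * Phi((b - rho t) / s),
    where a = Phi^{-1}(u), b = Phi^{-1}(v), and s = sqrt(1 - rho^2).
    """
    if not (0.0 < u < 1.0 and 0.0 < v < 1.0):
        raise ValueError("u and v must lie strictly between 0 and 1")
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    a = STD_NORMAL.inv_cdf(u)
    b = STD_NORMAL.inv_cdf(v)
    if a <= lower:
        return 0.0  # the mass below the truncation point is negligible
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / n
    total = 0.0
    for k in range(n):
        t = lower + (k + 0.5) * h  # midpoint of the k-th subinterval
        total += STD_NORMAL.pdf(t) * STD_NORMAL.cdf((b - rho * t) / s)
    return total * h


print(gaussian_copula(0.3, 0.7, 0.0))  # rho = 0 gives the product copula
print(gaussian_copula(0.5, 0.5, 0.5))
```

Two sanity checks are available: with ρ = 0 the value should be close to uv, and at u = v = 1/2 the value should be close to 1/4 + arcsin(ρ)/(2π), the quadrant probability of a standard bivariate normal.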

The association determined by a copula is not the same as that determined by a correlation; that is, two pairs of random variables may have the same copula but different correlations.
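A concrete instance of this distinction, offered as an illustration rather than as an example from the text: let U be U(0,1) and compare the pairs (U, U) and (U, U³). Because a copula is unchanged by strictly increasing transformations of the individual variables, both pairs have the copula C(u, v) = min(u, v), yet their correlations differ. The sketch computes both correlations from the moments E(Uᵏ) = 1/(k + 1); the function names are hypothetical.

```python
import math
from fractions import Fraction


def uniform_moment(k):
    """E(U**k) for U ~ U(0, 1)."""
    return Fraction(1, k + 1)


def cor_of_powers(r, s):
    """Cor(U**r, U**s) for U ~ U(0, 1) and positive integers r, s."""
    cov = uniform_moment(r + s) - uniform_moment(r) * uniform_moment(s)
    var_r = uniform_moment(2 * r) - uniform_moment(r) ** 2
    var_s = uniform_moment(2 * s) - uniform_moment(s) ** 2
    return float(cov) / math.sqrt(var_r * var_s)


print(cor_of_powers(1, 1))  # the pair (U, U)
print(cor_of_powers(1, 3))  # the pair (U, U**3): same copula, smaller correlation
```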
