
1.1 Some Important Probability Definitions and Facts

1.1.4 Relations among Random Variables

In many applications there are two or more random variables of interest. For a given probability space (Ω, F, P), there may be a collection of random variables, W. If the random variables have some common properties, for example, if either all are discrete or all are continuous and if all have the same structure, we may identify the collection as a “space”.

If the random variables have some relationship to each other, that is, if they are not independent, we seek useful measures of their dependence. The appropriate measure depends on the nature of their dependence. If they are quadratically related, a measure that is appropriate for linear relationships may be inappropriate, and vice versa.

We will consider two ways of studying the relationships among random variables. The first is based on second-degree moments, called covariances, between pairs of variables, and the other is based on functions that relate the CDF of one variable or one set of variables to the CDFs of other variables. These functions, called copulas, can involve more than single pairs of variables.

Random Variable Spaces

As with any function space, W may have interesting and useful properties. For example, W may be a linear space; that is, for X, Y ∈ W and a ∈ IR, aX+Y ∈ W.

The concept of $L_p$ random variable spaces follows immediately from the general property of function spaces, discussed on page 741. For random variables in an $L_p$ random variable space, the $p$th absolute moment is finite.

The closure of random variable spaces is often of interest. We define various forms of closure depending on types of convergence of a sequence $X_n$ in the space. For example, given any sequence $X_n \in W$, if $X_n \overset{L_r}{\to} X$ implies $X \in W$, then W is closed for the $r$th moment.

Expectations

We may take expectations of functions of random variables in terms of their joint distribution or in terms of their marginal distributions. To indicate the distribution used in an expectation, we may use notation for the expectation operator similar to that we use on the individual distribution, as described on page 23. Given the random variables $X_1$ and $X_2$, we use the notation $\mathrm{E}_{X_1}$ to indicate an expectation taken with respect to the marginal distribution of $X_1$.

We often denote the expectation taken with respect to the joint distribution as simply E, but for emphasis, we may use the notation $\mathrm{E}_{X_1,X_2}$.

We also use notation of the form $\mathrm{E}_P$, where $P$ denotes the relevant probability distribution of whatever form, or $\mathrm{E}_\theta$ in a parametric family of probability distributions.

Expectations of PDFs and of Likelihoods

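If the marginal PDFs of the random variables $X_1$ and $X_2$ are $f_{X_1}$ and $f_{X_2}$, we have the equalities

\[
\mathrm{E}_{X_1}\!\left(\frac{f_{X_2}(X_1)}{f_{X_1}(X_1)}\right) = \mathrm{E}_{X_2}\!\left(\frac{f_{X_1}(X_2)}{f_{X_2}(X_2)}\right) = 1. \tag{1.63}
\]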

On the other hand,

\[
\mathrm{E}_{X_1}\big(-\log(f_{X_1}(X_1))\big) \le \mathrm{E}_{X_1}\big(-\log(f_{X_2}(X_1))\big), \tag{1.64}
\]

with equality only if $f_{X_1}(x) = f_{X_2}(x)$ a.e. (see page 41).
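As a quick numerical illustration of (1.63) and (1.64), the following sketch uses two hypothetical probability mass functions on a common three-point support, so the expectations are finite sums rather than integrals. The particular probabilities are assumptions chosen only for illustration, and both are strictly positive so that the ratios are well defined.

```python
import math

# Two hypothetical PMFs on the same support {0, 1, 2} (illustrative values).
f1 = [0.2, 0.5, 0.3]   # plays the role of f_{X1}
f2 = [0.4, 0.4, 0.2]   # plays the role of f_{X2}

# E_{X1}( f2(X1) / f1(X1) ): the ratio is weighted by f1, the law of X1.
ratio_mean = sum(p1 * (p2 / p1) for p1, p2 in zip(f1, f2))

# E_{X1}( -log f1(X1) ) and E_{X1}( -log f2(X1) ), the two sides of (1.64).
lhs = -sum(p1 * math.log(p1) for p1 in f1)
rhs = -sum(p1 * math.log(p2) for p1, p2 in zip(f1, f2))

print(ratio_mean, lhs, rhs)
```

The ratio averages to 1 because the weights f1 cancel, leaving the sum of f2 over the support; the second comparison is strict here because the two PMFs differ.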

When the distributions are in the same parametric family, we may write $f_\theta$ with different values of $\theta$ instead of $f_{X_1}$ and $f_{X_2}$. In that case, it is more natural to think of the functions as likelihoods since the parameter is the variable. From equation (1.63), for example, we have for the likelihood ratio,

\[
\mathrm{E}_{\theta_1}\!\left(\frac{L(\theta_2; X)}{L(\theta_1; X)}\right) = 1. \tag{1.65}
\]

Covariance and Correlation

Expectations are also used to define relationships among random variables. We will first consider expectations of scalar random variables, and then discuss expectations of vector and matrix random variables.

For two scalar random variables, X and Y, useful measures of a linear relationship between them are the covariance and correlation. The covariance of X and Y, if it exists, is denoted by Cov(X, Y), and is defined as

\[
\mathrm{Cov}(X, Y) = \mathrm{E}\big((X - \mathrm{E}(X))(Y - \mathrm{E}(Y))\big). \tag{1.66}
\]

From the Cauchy-Schwarz inequality (B.21) (see page 853), we see that

\[
(\mathrm{Cov}(X, Y))^2 \le \mathrm{V}(X)\,\mathrm{V}(Y). \tag{1.67}
\]

The correlation of X and Y, written Cor(X, Y), is defined as

\[
\mathrm{Cor}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{V}(X)\,\mathrm{V}(Y)}}. \tag{1.68}
\]

The correlation is also called the correlation coefficient and is often written as $\rho_{X,Y}$.

From inequality (1.67), we see that the correlation coefficient is in [−1, 1]. If X and Y are independent, then Cov(X, Y) = Cor(X, Y) = 0 (exercise).
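The definitions of covariance and correlation can be checked exactly on a small discrete example. The sketch below assumes two hypothetical joint PMFs for (X, Y) with the same marginals: one with dependence, and one built as the product of the marginals. The helper name `cov_and_cor` is illustrative only.

```python
import math

def cov_and_cor(joint):
    """Return (Cov, V(X), V(Y), Cor) for a joint PMF {(x, y): prob}."""
    def expect(g):
        return sum(p * g(x, y) for (x, y), p in joint.items())
    ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
    cov = expect(lambda x, y: (x - ex) * (y - ey))
    vx = expect(lambda x, y: (x - ex) ** 2)
    vy = expect(lambda x, y: (y - ey) ** 2)
    return cov, vx, vy, cov / math.sqrt(vx * vy)

# Hypothetical dependent joint PMF on {0, 1} x {0, 1}.
dependent = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}

# Independent case: the joint PMF is the product of the same marginals.
independent = {(x, y): px * py
               for x, px in [(0, 0.4), (1, 0.6)]
               for y, py in [(0, 0.5), (1, 0.5)]}

cov_d, vx_d, vy_d, cor_d = cov_and_cor(dependent)
cov_i, vx_i, vy_i, cor_i = cov_and_cor(independent)
print(cov_d, cor_d, cov_i, cor_i)
```

In the dependent case the squared covariance stays below the product of the variances, so the correlation falls inside [−1, 1]; in the product case both measures vanish up to rounding.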

Structure of Random Variables

Random variables may consist of individual $\mathrm{I\!R}$ elements arrayed in some structure, such as a vector so that the random variable itself is in $\mathrm{I\!R}^d$ or as a matrix so that the random variable is in $\mathrm{I\!R}^{d\times m}$. Many of the properties of random variables are essentially the same whatever their structure, except of course those properties may have structures dependent on that of the random variable.

Multiplication is an operation that depends very strongly on the structure of the operand. If $x$ is a scalar, $x^2$ is a scalar. If $x$ is a vector, however, there are various operations that could be interpreted as extensions of a squaring operation. First, of course, is elementwise squaring. In this interpretation $x^2$ has the same structure as $x$. Salient relationships among the individual elements of $x$ may be lost by this operation, however. Other interpretations are $x^{\mathrm{T}}x$, which preserves none of the structure of $x$, and $xx^{\mathrm{T}}$, which is in $\mathrm{I\!R}^{d\times d}$. The point of this is that what can reasonably be done in the analysis of random variables depends on the structure of the random variables, and such relatively simple concepts as moments require some careful consideration. In many cases, a third-order or higher-order moment is not useful because of its complexity.
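The three interpretations of "squaring" a vector can be made concrete with a short sketch. The vector values are arbitrary and chosen only for illustration.

```python
x = [1.0, 2.0, 3.0]  # an arbitrary vector in R^3

# Elementwise squaring: the result has the same structure as x.
elementwise = [xi * xi for xi in x]

# Inner product x^T x: a scalar, so the vector structure is lost.
inner = sum(xi * xi for xi in x)

# Outer product x x^T: a d-by-d matrix holding every pairwise product.
outer = [[xi * xj for xj in x] for xi in x]

print(elementwise, inner, outer)
```

Only the outer product retains the cross terms between distinct elements, which is why it is the form used for second-order moments of vectors.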

Structural Moments

For random variables that have a structure such as a vector or matrix, the elementwise moments are the same as those for a scalar-valued random variable as described above, and hence, the first moment, the mean, has the same structure as the random variable itself.

Higher order moments of vectors and matrices present some problems because the number of individual scalar moments is greater than the number of elements in the random object itself. For multivariate distributions, the higher-order marginal moments are generally more useful than the higher-order joint moments. We define the second-order moments (variances and covariances) for random vectors and for random matrices below.

Definition 1.22 (variance-covariance of a random vector)

The variance-covariance of a vector-valued random variable X is the expectation of the outer product,

\[
\mathrm{V}(X) = \mathrm{E}\big((X - \mathrm{E}(X))(X - \mathrm{E}(X))^{\mathrm{T}}\big), \tag{1.69}
\]

if it exists.

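For a constant vector, the rank of an outer product is no greater than 1, but unless $X \overset{\text{a.s.}}{=} \mathrm{E}(X)$, V(X) is nonnegative definite. We see this by forming the scalar random variable $Y = c^{\mathrm{T}}X$ for any $c \ne 0$, and writing

\[
\begin{aligned}
0 \le \mathrm{V}(Y) &= \mathrm{E}\big((c^{\mathrm{T}}X - c^{\mathrm{T}}\mathrm{E}(X))^2\big) \\
&= \mathrm{E}\big(c^{\mathrm{T}}(X - \mathrm{E}(X))(X - \mathrm{E}(X))^{\mathrm{T}}c\big) \\
&= c^{\mathrm{T}}\,\mathrm{V}(X)\,c.
\end{aligned}
\]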

(If $X \overset{\text{a.s.}}{=} \mathrm{E}(X)$, then V(X) = 0, and while it is true that $c^{\mathrm{T}}0c = 0 \ge 0$, we do not say that the 0 matrix is nonnegative definite. Recall further that whenever I write a term such as V(X), I am implicitly assuming its existence.)
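The identity $\mathrm{V}(c^{\mathrm{T}}X) = c^{\mathrm{T}}\mathrm{V}(X)c$ used in this argument can be verified exactly on a small example. The sketch below assumes a hypothetical random 2-vector taking three values with the stated probabilities; the names `quad_form` and `var_linear` are illustrative.

```python
# Hypothetical distribution of a random 2-vector: (value, probability) pairs.
dist = [((0.0, 1.0), 0.25), ((1.0, 3.0), 0.5), ((2.0, 2.0), 0.25)]
d = 2

# Mean vector and variance-covariance matrix as in (1.69).
mean = [sum(p * x[i] for x, p in dist) for i in range(d)]
V = [[sum(p * (x[i] - mean[i]) * (x[j] - mean[j]) for x, p in dist)
      for j in range(d)] for i in range(d)]

def quad_form(c):
    """c^T V c."""
    return sum(c[i] * V[i][j] * c[j] for i in range(d) for j in range(d))

def var_linear(c):
    """V(c^T X), computed directly from the distribution of Y = c^T X."""
    y = [(sum(ci * xi for ci, xi in zip(c, x)), p) for x, p in dist]
    ey = sum(p * v for v, p in y)
    return sum(p * (v - ey) ** 2 for v, p in y)

# Compare the two computations for a few nonzero vectors c.
checks = [(c, quad_form(c), var_linear(c))
          for c in [(1.0, 0.0), (1.0, -1.0), (-2.0, 0.5)]]
print(checks)
```

For each choice of c the two numbers agree up to rounding and are nonnegative, since the second is a variance of a scalar random variable.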

Furthermore, if it is not the case that $X \overset{\text{a.s.}}{=} \mathrm{E}(X)$, unless some element $X_i$ of a vector X is such that

\[
X_i \overset{\text{a.s.}}{=} \sum_{j \ne i} (a_j + b_j X_j),
\]

then V(X) is positive definite a.s. To show this, we show that V(X) is full rank a.s. (exercise).

The elements of V(X) are the bivariate moments of the respective elements of X; the (i, j) element of V(X) is the covariance of $X_i$ and $X_j$, $\mathrm{Cov}(X_i, X_j)$. If V(X) is nonsingular, then the correlation matrix of X, written Cor(X), is

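\[
\mathrm{Cor}(X) = D^{-1/2}\,\mathrm{V}(X)\,D^{-1/2}, \tag{1.70}
\]

where $D$ is the diagonal matrix whose diagonal elements are those of V(X), that is, the variances of the elements of X. The (i, j) element of Cor(X) is the correlation of $X_i$ and $X_j$, and so the diagonal elements are all 1.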
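Elementwise, the (i, j) entry of Cor(X) is $\mathrm{Cov}(X_i, X_j)$ divided by the product of the two standard deviations, as in (1.68). The sketch below applies that scaling to a hypothetical 3 × 3 variance-covariance matrix; the numerical values are assumptions chosen so that the matrix is positive definite.

```python
import math

# Hypothetical variance-covariance matrix (symmetric, positive definite).
V = [[4.0, 1.2, -0.8],
     [1.2, 1.0, 0.3],
     [-0.8, 0.3, 2.25]]
d = len(V)

# Standard deviations are the square roots of the diagonal of V.
sd = [math.sqrt(V[i][i]) for i in range(d)]

# Scale each covariance by the two corresponding standard deviations.
cor = [[V[i][j] / (sd[i] * sd[j]) for j in range(d)] for i in range(d)]

for row in cor:
    print(row)
```

The scaling leaves the matrix symmetric, places ones on the diagonal, and keeps every off-diagonal entry in [−1, 1].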

Definition 1.23 (variance-covariance of a random matrix)

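The variance-covariance of a matrix random variable X is defined as the variance-covariance of vec(X):

\[
\mathrm{V}(X) = \mathrm{V}(\mathrm{vec}(X)) = \mathrm{E}\big(\mathrm{vec}(X - \mathrm{E}(X))\,(\mathrm{vec}(X - \mathrm{E}(X)))^{\mathrm{T}}\big), \tag{1.71}
\]

if it exists.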
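A small sketch may help fix the idea. It assumes the usual convention that vec stacks the columns of a matrix into a single vector, and it uses a hypothetical random 2 × 2 matrix that takes two values with equal probability, so V(X) is a 4 × 4 matrix.

```python
def vec(M):
    """Stack the columns of a matrix (given as a list of rows) into a list."""
    rows, cols = len(M), len(M[0])
    return [M[i][j] for j in range(cols) for i in range(rows)]

# Hypothetical random 2x2 matrix: (value, probability) pairs.
dist = [([[1.0, 0.0], [0.0, 1.0]], 0.5),
        ([[0.0, 2.0], [1.0, 0.0]], 0.5)]

vecs = [(vec(M), p) for M, p in dist]
k = len(vecs[0][0])  # number of elements of the matrix, here 4

# Mean of vec(X) and the variance-covariance of vec(X), as in (1.71).
mean = [sum(p * v[i] for v, p in vecs) for i in range(k)]
V = [[sum(p * (v[i] - mean[i]) * (v[j] - mean[j]) for v, p in vecs)
      for j in range(k)] for i in range(k)]

for row in V:
    print(row)
```

Each entry of the result is the covariance between two scalar elements of the random matrix, indexed by their positions in the stacked vector.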

Linearity of Moments

The linearity property of the expectation operator yields some simple linearity properties for moments of first or second degree of random variables over the same probability space.

For random variables X, Y, and Z with finite variances and constants a, b, and c, we have

\[
\mathrm{V}(aX + Y + c) = a^2\,\mathrm{V}(X) + \mathrm{V}(Y) + 2a\,\mathrm{Cov}(X, Y); \tag{1.72}
\]

that is, V(·) is not a linear operator (but it simplifies nicely), and

\[
\mathrm{Cov}(aX + bY + c,\; X + Z) = a\,\mathrm{V}(X) + a\,\mathrm{Cov}(X, Z) + b\,\mathrm{Cov}(X, Y) + b\,\mathrm{Cov}(Y, Z); \tag{1.73}
\]

that is, Cov(·,·) is a bilinear operator. Proofs of these two facts are left as exercises.
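Before attempting the proofs, it can be reassuring to confirm (1.72) and (1.73) numerically. The sketch below assumes a hypothetical joint distribution for (X, Y, Z) that puts equal mass on five points, and arbitrary constants a, b, and c; the helper names `E`, `cov`, and `var` are illustrative.

```python
# Hypothetical joint distribution of (X, Y, Z): five equally likely points.
points = [(0, 1, 2), (1, 3, 0), (2, 2, 5), (4, 0, 1), (3, 5, 3)]

def E(g):
    """Expectation of g(X, Y, Z) under the uniform distribution on points."""
    return sum(g(*pt) for pt in points) / len(points)

def cov(g, h):
    mg, mh = E(g), E(h)
    return E(lambda x, y, z: (g(x, y, z) - mg) * (h(x, y, z) - mh))

def var(g):
    return cov(g, g)

X = lambda x, y, z: x
Y = lambda x, y, z: y
Z = lambda x, y, z: z
a, b, c = 2.0, -3.0, 7.0  # arbitrary constants

# Equation (1.72).
lhs72 = var(lambda x, y, z: a * x + y + c)
rhs72 = a ** 2 * var(X) + var(Y) + 2 * a * cov(X, Y)

# Equation (1.73).
lhs73 = cov(lambda x, y, z: a * x + b * y + c, lambda x, y, z: x + z)
rhs73 = a * var(X) + a * cov(X, Z) + b * cov(X, Y) + b * cov(Y, Z)

print(lhs72, rhs72, lhs73, rhs73)
```

The additive constant c drops out of both sides, as the formulas indicate.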

Copulas

A copula is a function that relates a multivariate CDF to lower dimensional marginal CDFs. The basic ideas of copulas can all be explored in the context of a bivariate distribution and the two associated univariate distributions, and the ideas extend in a natural way to higher dimensions.

Definition 1.24 (two-dimensional copula)

A two-dimensional copula is a function $C$ that maps $[0,1]^2$ onto $[0,1]$ with the following properties:

1. for every $u \in [0,1]$,

\[
C(0, u) = C(u, 0) = 0, \tag{1.74}
\]

and

\[
C(1, u) = C(u, 1) = u, \tag{1.75}
\]

2. for every $(u_1, u_2), (v_1, v_2) \in [0,1]^2$ with $u_1 \le v_1$ and $u_2 \le v_2$,

\[
C(u_1, u_2) - C(u_1, v_2) - C(v_1, u_2) + C(v_1, v_2) \ge 0. \tag{1.76}
\]

A two-dimensional copula is also called a 2-copula.

The arguments to a copula $C$ are often taken to be CDFs, which of course take values in [0,1].
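The two defining properties can be checked numerically for a candidate function. The sketch below tests the boundary conditions (1.74) and (1.75) and the rectangle inequality (1.76) on a finite grid, which is a necessary check only and not a proof. The function name `is_copula_on_grid`, the grid size, and the tolerance are assumptions; the candidates are the product $uv$, the minimum $\min(u, v)$, and, as a function that fails, the maximum $\max(u, v)$.

```python
def is_copula_on_grid(C, n=20, tol=1e-12):
    """Check properties (1.74)-(1.76) of C at the grid points i/n."""
    grid = [i / n for i in range(n + 1)]
    for u in grid:
        # Boundary conditions (1.74) and (1.75).
        if abs(C(0.0, u)) > tol or abs(C(u, 0.0)) > tol:
            return False
        if abs(C(1.0, u) - u) > tol or abs(C(u, 1.0) - u) > tol:
            return False
    # Rectangle inequality (1.76) on each grid cell; larger grid-aligned
    # rectangles are sums of cells, so they then satisfy it as well.
    for i in range(n):
        for j in range(n):
            u1, v1 = grid[i], grid[i + 1]
            u2, v2 = grid[j], grid[j + 1]
            vol = C(u1, u2) - C(u1, v2) - C(v1, u2) + C(v1, v2)
            if vol < -tol:
                return False
    return True

product_ok = is_copula_on_grid(lambda u, v: u * v)
minimum_ok = is_copula_on_grid(lambda u, v: min(u, v))
maximum_ok = is_copula_on_grid(lambda u, v: max(u, v))
print(product_ok, minimum_ok, maximum_ok)
```

The maximum fails at the first step because $\max(0, u) = u$ rather than 0.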

The usefulness of copulas derives from Sklar’s theorem, which we state without proof.

Theorem 1.19 (Sklar’s theorem)

Let $P_{XY}$ be a bivariate CDF with marginal CDFs $P_X$ and $P_Y$. Then there exists a copula $C$ such that for every $x, y \in \mathrm{I\!R}$,

\[
P_{XY}(x, y) = C(P_X(x), P_Y(y)). \tag{1.77}
\]

If $P_X$ and $P_Y$ are continuous everywhere, then $C$ is unique; otherwise $C$ is unique over the support of the distributions defined by $P_X$ and $P_Y$.

Conversely, if $C$ is a copula and $P_X$ and $P_Y$ are CDFs, then the function $P_{XY}(x, y)$ defined by equation (1.77) is a CDF with marginal CDFs $P_X(x)$ and $P_Y(y)$.

Thus, a copula is a joint CDF of random variables with U(0,1) marginals. The proof of the first part of the theorem is given in Nelsen (2006), among other places. The proof of the converse portion is straightforward and is left as an exercise.
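The converse part of the theorem can be illustrated by building a joint CDF from a copula and two marginals and then recovering the marginals. The sketch below assumes, purely for illustration, an exponential(1) marginal for X, a U(0,1) marginal for Y, and two simple copulas, the product and the minimum; all function names are hypothetical.

```python
import math

def F_exp(x):
    """CDF of the exponential distribution with rate 1."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def F_unif(y):
    """CDF of the U(0,1) distribution."""
    return min(max(y, 0.0), 1.0)

def C_product(u, v):
    return u * v

def C_min(u, v):
    return min(u, v)

def joint_cdf(C, x, y):
    """Joint CDF built as in (1.77): C applied to the marginal CDFs."""
    return C(F_exp(x), F_unif(y))

xs = [0.1, 0.5, 1.0, 3.0]
ys = [0.2, 0.5, 0.9]

# Letting the other argument go to +infinity should return each marginal.
marginal_x_ok = all(abs(joint_cdf(C, x, math.inf) - F_exp(x)) < 1e-12
                    for C in (C_product, C_min) for x in xs)
marginal_y_ok = all(abs(joint_cdf(C, math.inf, y) - F_unif(y)) < 1e-12
                    for C in (C_product, C_min) for y in ys)
print(marginal_x_ok, marginal_y_ok)
```

Both copulas give the same marginals but different joint CDFs, which reflects the separation between the marginal distributions and their association.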

For many bivariate distributions the copula is the most useful way to relate the joint distribution to the marginals, because it provides a separate description of the individual distributions and their association with each other.

One of the most important uses of copulas is to combine two marginal distributions to form a joint distribution with known bivariate characteristics. Certain standard copulas are useful in specific applications. The copula that corresponds to a bivariate normal distribution with correlation coefficient $\rho$ is

\[
C_{N\rho}(u, v) = \int_{-\infty}^{\Phi^{-1}(u)} \int_{-\infty}^{\Phi^{-1}(v)} \phi_\rho(t_1, t_2)\, \mathrm{d}t_2\, \mathrm{d}t_1, \tag{1.78}
\]

where $\Phi(\cdot)$ is the standard (univariate) normal CDF, and $\phi_\rho(\cdot,\cdot)$ is the bivariate normal PDF with means 0, variances 1, and correlation coefficient $\rho$. This copula is usually called the Gaussian copula and has been widely used in financial applications.
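Rather than evaluating the double integral in (1.78), one common way to work with the Gaussian copula is to sample from it: draw a standard bivariate normal pair with correlation $\rho$ and apply $\Phi$ to each coordinate. The sketch below does this with the Python standard library and then applies inverse marginal CDFs to obtain a joint sample. The value of $\rho$, the sample size, the seed, and the two marginals (exponential and normal) are assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal; .cdf plays the role of Phi

def gaussian_copula_sample(rho, n, seed=0):
    """Draw n pairs (u, v) whose joint CDF is the Gaussian copula."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        t1 = z1
        t2 = rho * z1 + math.sqrt(1.0 - rho * rho) * z2  # Cor(t1, t2) = rho
        pairs.append((STD_NORMAL.cdf(t1), STD_NORMAL.cdf(t2)))
    return pairs

def pearson(pairs):
    """Sample correlation coefficient of a list of pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

uv = gaussian_copula_sample(rho=0.7, n=2000)

# Hypothetical marginals: exponential(1) for X and normal(10, 2) for Y,
# imposed through their inverse CDFs.
target_y = NormalDist(10.0, 2.0)
xy = [(-math.log(1.0 - u), target_y.inv_cdf(v)) for u, v in uv]

print(pearson(uv), pearson(xy))
```

The pairs in `uv` have approximately uniform marginals, while the pairs in `xy` have the chosen marginals and inherit their association from the copula.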

The association determined by a copula is not the same as that determined by a correlation; that is, two pairs of random variables may have the same copula but different correlations.
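A simple worked case makes this concrete. Let U be a U(0,1) random variable and consider the pairs $(U, U^k)$ for different powers $k \ge 1$. Because $u \mapsto u^k$ is strictly increasing on [0,1], every such pair has the same copula, $\min(u, v)$, yet the correlations differ. The sketch below computes them exactly from the moments $\mathrm{E}(U^m) = 1/(m+1)$; the function names are illustrative.

```python
import math

def moment(m):
    """E(U^m) for U ~ U(0,1)."""
    return 1.0 / (m + 1)

def cor_u_power(k):
    """Cor(U, U^k) computed from the moments of U."""
    cov = moment(k + 1) - moment(1) * moment(k)
    var_u = moment(2) - moment(1) ** 2
    var_uk = moment(2 * k) - moment(k) ** 2
    return cov / math.sqrt(var_u * var_uk)

cor1, cor2, cor5 = cor_u_power(1), cor_u_power(2), cor_u_power(5)
print(cor1, cor2, cor5)
```

The correlation equals 1 only for k = 1 and decreases as k grows, even though the dependence between the two variables is perfectly monotone in every case.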