Lecture 11

Systems of random variables

Plan of the lecture:

1. Joint PMFs of Multiple Random Variables
1.1 Joint PMF of two random variables
1.2 Functions of Multiple Random Variables
1.3 More than Two Random Variables
1.4 Conditioning one Random Variable on Another
2. Multiple Continuous Random Variables
2.1 Joint PDF of two random variables
2.2 Conditioning One Random Variable on Another
2.3 Inference and the Continuous Bayes’ Rule
3. Independence
3.1 Summary of Facts About Independent Discrete Random Variables
3.2 Independence of Continuous Random Variables
4. Joint CDFs
5. Covariance and Correlation


1 Joint PMFs of Multiple Random Variables

1.1 Joint PMF of two random variables

Probabilistic models often involve several random variables of interest. For example, in a medical diagnosis context, the results of several tests may be significant, or in a networking context, the workloads of several gateways may be of interest. All of these random variables are associated with the same experiment, sample space, and probability law, and their values may relate in interesting ways. This motivates us to consider probabilities involving simultaneously the numerical values of several random variables and to investigate their mutual couplings. We will extend the concepts of PMF and expectation developed so far to multiple random variables.

Consider two discrete random variables 𝑋 and 𝑌 associated with the same experiment. The joint PMF of 𝑋 and 𝑌 is defined by

𝑝_𝑋,𝑌(𝑥, 𝑦) = 𝑃(𝑋 = 𝑥, 𝑌 = 𝑦)

for all pairs of numerical values (𝑥, 𝑦) that 𝑋 and 𝑌 can take. Here and elsewhere, we will use the abbreviated notation 𝑃(𝑋 = 𝑥, 𝑌 = 𝑦) instead of the more precise notations 𝑃({𝑋 = 𝑥} ∩ {𝑌 = 𝑦}) or 𝑃(𝑋 = 𝑥 and 𝑌 = 𝑦).

The joint PMF determines the probability of any event that can be specified in terms of the random variables 𝑋 and 𝑌. For example if 𝐴 is the set of all pairs (𝑥, 𝑦) that have a certain property, then

$$P((X, Y) \in A) = \sum_{(x,y) \in A} p_{X,Y}(x, y).$$

In fact, we can calculate the PMFs of 𝑋 and 𝑌 by using the formulas

$$p_X(x) = \sum_y p_{X,Y}(x, y), \qquad p_Y(y) = \sum_x p_{X,Y}(x, y).$$

The formula for 𝑝_𝑋(𝑥) can be verified using the calculation

$$p_X(x) = P(X = x) = \sum_y P(X = x, Y = y) = \sum_y p_{X,Y}(x, y),$$

where the second equality follows by noting that the event {𝑋 = 𝑥} is the union of the disjoint events {𝑋 = 𝑥, 𝑌 = 𝑦} as 𝑦 ranges over all the different values of 𝑌. The formula for 𝑝_𝑌(𝑦) is verified similarly. We sometimes refer to 𝑝_𝑋 and 𝑝_𝑌 as the marginal PMFs, to distinguish them from the joint PMF.

The example of Fig. 1 illustrates the calculation of the marginal PMFs from the joint PMF by using the tabular method. Here, the joint PMF of 𝑋 and 𝑌 is arranged in a two-dimensional table, and the marginal PMF of 𝑋 or 𝑌 at a given value is obtained by adding the table entries along a corresponding column or row, respectively.

Figure 1: Illustration of the tabular method for calculating marginal PMFs from joint PMFs. The joint PMF is represented by a table, where the number in each square (𝑥, 𝑦) gives the value of 𝑝_𝑋,𝑌(𝑥, 𝑦). To calculate the marginal PMF 𝑝_𝑋(𝑥) for a given value of 𝑥, we add the numbers in the column corresponding to 𝑥. For example, 𝑝_𝑋(2) = 6/20. Similarly, to calculate the marginal PMF 𝑝_𝑌(𝑦) for a given value of 𝑦, we add the numbers in the row corresponding to 𝑦. For example, 𝑝_𝑌(2) = 7/20.
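The tabular method can be sketched in a few lines of Python. The table below is hypothetical: it is not copied from Fig. 1, and was only chosen so that its column and row sums agree with the two values quoted in the caption, 𝑝_𝑋(2) = 6/20 and 𝑝_𝑌(2) = 7/20. The names `counts`, `joint`, and `marginals` are illustrative.

```python
from fractions import Fraction

# Hypothetical joint PMF table in units of 1/20 (an assumption, not the
# figure's actual data). Keys are (x, y) pairs.
counts = {
    (1, 1): 1, (2, 1): 1, (3, 1): 1, (4, 1): 0,
    (1, 2): 1, (2, 2): 2, (3, 2): 3, (4, 2): 1,
    (1, 3): 1, (2, 3): 2, (3, 3): 3, (4, 3): 1,
    (1, 4): 0, (2, 4): 1, (3, 4): 1, (4, 4): 1,
}
joint = {xy: Fraction(c, 20) for xy, c in counts.items()}


def marginals(joint_pmf):
    """Return (p_X, p_Y) by summing the joint PMF over the other variable."""
    p_x, p_y = {}, {}
    for (x, y), p in joint_pmf.items():
        p_x[x] = p_x.get(x, Fraction(0)) + p  # add along the column of x
        p_y[y] = p_y.get(y, Fraction(0)) + p  # add along the row of y
    return p_x, p_y


p_x, p_y = marginals(joint)
print(p_x[2], p_y[2])  # 6/20 is printed in lowest terms as 3/10
```

Exact fractions are used so that the sums can be compared with the caption without rounding error.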

1.2 Functions of Multiple Random Variables

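A function 𝑍 = 𝑔(𝑋, 𝑌) of the random variables 𝑋 and 𝑌 defines another random variable. Its PMF can be calculated from the joint PMF 𝑝_𝑋,𝑌 according to

$$p_Z(z) = \sum_{\{(x,y) \,\mid\, g(x,y) = z\}} p_{X,Y}(x, y).$$

Furthermore, the expected value rule for functions naturally extends and takes the form

$$E[g(X, Y)] = \sum_{x,y} g(x, y)\, p_{X,Y}(x, y).$$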

The verification of this is very similar to the earlier case of a function of a single random variable. In the special case where 𝑔 is linear and of the form 𝑎𝑋 + 𝑏𝑌 + 𝑐, where 𝑎, 𝑏, and 𝑐 are given scalars, we have

𝐸[𝑎𝑋 + 𝑏𝑌 + 𝑐] = 𝑎𝐸[𝑋] + 𝑏𝐸[𝑌] + 𝑐.
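As an illustration of the two formulas above, the following sketch computes the PMF of 𝑍 = 𝑋 + 𝑌 and checks the linearity property on a small joint PMF. The joint PMF, the choice of 𝑔, and the scalars `a`, `b`, `c` are all assumptions made for the example.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint PMF of (X, Y); the numbers are illustrative assumptions.
joint = {
    (1, 1): Fraction(1, 4), (1, 2): Fraction(1, 4),
    (2, 1): Fraction(1, 8), (2, 2): Fraction(3, 8),
}


def pmf_of_function(joint_pmf, g):
    """PMF of Z = g(X, Y): add p_{X,Y}(x, y) over all pairs with g(x, y) = z."""
    p_z = defaultdict(Fraction)  # Fraction() is exactly zero
    for (x, y), p in joint_pmf.items():
        p_z[g(x, y)] += p
    return dict(p_z)


def expectation(joint_pmf, g):
    """Expected value rule: sum of g(x, y) * p_{X,Y}(x, y) over all pairs."""
    return sum(g(x, y) * p for (x, y), p in joint_pmf.items())


p_z = pmf_of_function(joint, lambda x, y: x + y)
e_x = expectation(joint, lambda x, y: x)
e_y = expectation(joint, lambda x, y: y)

a, b, c = 2, 3, 1  # arbitrary scalars for the linearity check
lhs = expectation(joint, lambda x, y: a * x + b * y + c)
rhs = a * e_x + b * e_y + c
print(p_z, lhs == rhs)
```

Note that the linearity check does not require 𝑋 and 𝑌 to be independent; in this table they are not.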

1.3 More than Two Random Variables

The joint PMF of three random variables 𝑋, 𝑌, and 𝑍 is defined in analogy with the above as

𝑝_𝑋,𝑌,𝑍(𝑥, 𝑦, 𝑧) = 𝑃(𝑋 = 𝑥, 𝑌 = 𝑦, 𝑍 = 𝑧),

for all possible triplets of numerical values (𝑥, 𝑦, 𝑧). Corresponding marginal PMFs are analogously obtained by equations such as

$$p_{X,Y}(x, y) = \sum_z p_{X,Y,Z}(x, y, z) \qquad \text{and} \qquad p_X(x) = \sum_y \sum_z p_{X,Y,Z}(x, y, z).$$

The expected value rule for functions takes the form

$$E[g(X, Y, Z)] = \sum_{x,y,z} g(x, y, z)\, p_{X,Y,Z}(x, y, z),$$

and if 𝑔 is linear and of the form 𝑎𝑋 + 𝑏𝑌 + 𝑐𝑍 + 𝑑, then

$$E[aX + bY + cZ + d] = aE[X] + bE[Y] + cE[Z] + d.$$

Furthermore, there are obvious generalizations of the above to more than three random variables. For example, for any random variables 𝑋₁, 𝑋₂, … , 𝑋_𝑛 and any scalars 𝑎₁, 𝑎₂, … , 𝑎_𝑛, we have

$$E[a_1 X_1 + a_2 X_2 + \cdots + a_n X_n] = a_1 E[X_1] + a_2 E[X_2] + \cdots + a_n E[X_n].$$

Summary of Facts About Joint PMFs

Let 𝑋 and 𝑌 be random variables associated with the same experiment.

– The joint PMF of 𝑋 and 𝑌 is defined by

𝑝_𝑋,𝑌(𝑥, 𝑦) = 𝑃(𝑋 = 𝑥, 𝑌 = 𝑦).

– The marginal PMFs of 𝑋 and 𝑌 can be obtained from the joint PMF, using the formulas

$$p_X(x) = \sum_y p_{X,Y}(x, y), \qquad p_Y(y) = \sum_x p_{X,Y}(x, y).$$

– A function 𝑔(𝑋, 𝑌) of 𝑋 and 𝑌 defines another random variable, and

$$E[g(X, Y)] = \sum_{x,y} g(x, y)\, p_{X,Y}(x, y).$$

– If 𝑔 is linear, of the form 𝑎𝑋 + 𝑏𝑌 + 𝑐, we have

𝐸[𝑎𝑋 + 𝑏𝑌 + 𝑐] = 𝑎𝐸[𝑋] + 𝑏𝐸[𝑌] + 𝑐.

– The above have natural extensions to the case where more than two random variables are involved.

1.4 Conditioning one Random Variable on Another

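Let 𝑋 and 𝑌 be two random variables associated with the same experiment. The conditional PMF of 𝑋 given 𝑌 = 𝑦, for any 𝑦 with 𝑝_𝑌(𝑦) > 0, is defined by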

𝑝_𝑋|𝑌(𝑥|𝑦) = 𝑃(𝑋 = 𝑥|𝑌 = 𝑦).

Using the definition of conditional probabilities, we have

$$p_{X|Y}(x|y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{p_{X,Y}(x, y)}{p_Y(y)}.$$

Let us fix some 𝑦 with 𝑝_𝑌(𝑦) > 0 and consider 𝑝_𝑋|𝑌(𝑥|𝑦) as a function of 𝑥. This function is a valid PMF for 𝑋: it assigns nonnegative values to each possible 𝑥, and these values add to 1. Furthermore, this function of 𝑥 has the same shape as 𝑝_𝑋,𝑌(𝑥, 𝑦), except that it is normalized by dividing by 𝑝_𝑌(𝑦), which enforces the normalization property

$$\sum_x p_{X|Y}(x|y) = 1.$$

Figure 2 provides a visualization of the conditional PMF.

Figure 2: Visualization of the conditional PMF 𝑝_𝑋|𝑌(𝑥|𝑦). For each 𝑦, we view the joint PMF along the slice 𝑌 = 𝑦 and renormalize so that $\sum_x p_{X|Y}(x|y) = 1$.

The conditional PMF is often convenient for the calculation of the joint PMF, using a sequential approach and the formula

$$p_{X,Y}(x, y) = p_Y(y)\, p_{X|Y}(x|y),$$

or its counterpart

𝑝_𝑋,𝑌(𝑥, 𝑦) = 𝑝_𝑋(𝑥)𝑝_𝑌|𝑋(𝑦|𝑥).

This method is entirely similar to the use of the multiplication rule.

The conditional PMF can also be used to calculate the marginal PMFs. In particular, we have by using the definitions,

$$p_X(x) = \sum_y p_{X,Y}(x, y) = \sum_y p_Y(y)\, p_{X|Y}(x|y).$$

This formula provides a divide-and-conquer method for calculating marginal PMFs. It is in essence identical to the total probability theorem, but cast in different notation.

Note finally that one can define conditional PMFs involving more than two random variables, as in 𝑝_𝑋,𝑌|𝑍(𝑥, 𝑦|𝑧) or 𝑝_𝑋|𝑌,𝑍(𝑥|𝑦, 𝑧). The concepts and methods described above generalize easily.
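The conditioning formulas of this section can be sketched as follows. The joint PMF is a hypothetical example, and the helper names `marginal_y` and `conditional_x_given_y` are illustrative. The sketch builds each conditional PMF by slicing and renormalizing, then recovers the marginal PMF of 𝑋 with the divide-and-conquer (total probability) formula.

```python
from fractions import Fraction

# Hypothetical joint PMF of (X, Y); the numbers are illustrative assumptions.
joint = {
    (1, 1): Fraction(1, 4), (1, 2): Fraction(1, 4),
    (2, 1): Fraction(1, 8), (2, 2): Fraction(3, 8),
}


def marginal_y(joint_pmf):
    """p_Y(y) = sum over x of p_{X,Y}(x, y)."""
    p_y = {}
    for (_, y), p in joint_pmf.items():
        p_y[y] = p_y.get(y, Fraction(0)) + p
    return p_y


def conditional_x_given_y(joint_pmf, y):
    """p_{X|Y}(x|y) = p_{X,Y}(x, y) / p_Y(y), defined only when p_Y(y) > 0."""
    p_y = marginal_y(joint_pmf).get(y, Fraction(0))
    if p_y == 0:
        raise ValueError("conditional PMF is undefined when p_Y(y) = 0")
    return {x: p / p_y for (x, yy), p in joint_pmf.items() if yy == y}


p_y = marginal_y(joint)
cond = {y: conditional_x_given_y(joint, y) for y in p_y}

# Total probability: p_X(x) = sum over y of p_Y(y) * p_{X|Y}(x|y).
p_x = {}
for y, slice_pmf in cond.items():
    for x, p in slice_pmf.items():
        p_x[x] = p_x.get(x, Fraction(0)) + p_y[y] * p

print(cond[1], p_x)
```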

2 Multiple Continuous Random Variables

2.1 Joint PDF of two random variables

We will now extend the notion of a PDF to the case of multiple random variables. In complete analogy with discrete random variables, we introduce joint, marginal, and conditional PDFs. Their intuitive interpretation as well as their main properties parallel the discrete case.

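We say that two continuous random variables associated with a common experiment are jointly continuous and can be described in terms of a joint PDF 𝑓_𝑋,𝑌 if 𝑓_𝑋,𝑌 is a nonnegative function that satisfies

$$P((X, Y) \in B) = \iint_{(x,y) \in B} f_{X,Y}(x, y)\, dx\, dy$$

for every subset 𝐵 of the two-dimensional plane. In the particular case where 𝐵 is a rectangle of the form 𝐵 = {(𝑥, 𝑦) | 𝑎 ≤ 𝑥 ≤ 𝑏, 𝑐 ≤ 𝑦 ≤ 𝑑}, we have

$$P(a \le X \le b,\ c \le Y \le d) = \int_c^d \int_a^b f_{X,Y}(x, y)\, dx\, dy.$$

Furthermore, by letting 𝐵 be the entire two-dimensional plane, we obtain the normalization property

$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx\, dy = 1.$$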

To interpret the PDF, we let 𝛿 be very small and consider the probability of a small rectangle. We have

$$P(a \le X \le a + \delta,\ c \le Y \le c + \delta) = \int_c^{c+\delta} \int_a^{a+\delta} f_{X,Y}(x, y)\, dx\, dy \approx f_{X,Y}(a, c) \cdot \delta^2,$$

so we can view 𝑓_𝑋,𝑌(𝑎, 𝑐) as the “probability per unit area” in the vicinity of (𝑎, 𝑐).

The joint PDF contains all conceivable probabilistic information on the random variables 𝑋 and 𝑌, as well as their dependencies. It allows us to calculate the probability of any event that can be defined in terms of these two random variables. As a special case, it can be used to calculate the probability of an event involving only one of them. For example, let 𝐴 be a subset of the real line and consider the event {𝑋 ∈ 𝐴}. We have

$$P(X \in A) = P(X \in A \text{ and } Y \in (-\infty, \infty)) = \int_A \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy\, dx.$$

Comparing with the formula

$$P(X \in A) = \int_A f_X(x)\, dx,$$

we see that the marginal PDF 𝑓_𝑋 of 𝑋 is given by

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy.$$

Similarly,

$$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx.$$

Expectation

If 𝑋 and 𝑌 are jointly continuous random variables, and 𝑔 is some function, then 𝑍 = 𝑔(𝑋, 𝑌) is also a random variable. Let us note that the expected value rule is still applicable and

$$E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\, f_{X,Y}(x, y)\, dx\, dy.$$

As an important special case, for any scalars 𝑎, 𝑏, we have

𝐸[𝑎𝑋 + 𝑏𝑌] = 𝑎𝐸[𝑋] + 𝑏𝐸[𝑌].
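The marginal and expectation formulas can be checked numerically. The sketch below assumes a hypothetical joint PDF, 𝑓(𝑥, 𝑦) = 𝑥 + 𝑦 on the unit square and zero elsewhere, and approximates the integrals with a midpoint rule. For this PDF the exact answers are 𝑓_𝑋(𝑥) = 𝑥 + 1/2 and 𝐸[𝑋] = 𝐸[𝑌] = 7/12. The function names and the grid size `n` are illustrative choices.

```python
def f_xy(x, y):
    """Hypothetical joint PDF: x + y on the unit square, zero elsewhere."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0


def midpoints(a, b, n):
    """Midpoints of n equal subintervals of [a, b], and the subinterval width."""
    h = (b - a) / n
    return [a + (i + 0.5) * h for i in range(n)], h


def marginal_x(x, n=200):
    """f_X(x): integrate the joint PDF over y (midpoint rule on [0, 1])."""
    ys, h = midpoints(0.0, 1.0, n)
    return sum(f_xy(x, y) for y in ys) * h


def expectation(g, n=200):
    """E[g(X, Y)]: double integral of g * f_{X,Y} over the unit square."""
    xs, hx = midpoints(0.0, 1.0, n)
    ys, hy = midpoints(0.0, 1.0, n)
    return sum(g(x, y) * f_xy(x, y) for x in xs for y in ys) * hx * hy


total = expectation(lambda x, y: 1.0)  # normalization, should be close to 1
e_x = expectation(lambda x, y: x)
e_y = expectation(lambda x, y: y)
e_lin = expectation(lambda x, y: 2.0 * x + 3.0 * y)
print(total, marginal_x(0.3), e_x, e_lin - (2.0 * e_x + 3.0 * e_y))
```

The integration is restricted to the unit square because the assumed PDF vanishes outside it; a PDF with unbounded support would need a truncated range or a different quadrature.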

2.2 Conditioning One Random Variable on Another

Let 𝑋 and 𝑌 be continuous random variables with joint PDF 𝑓_𝑋,𝑌. For any fixed 𝑦 with 𝑓_𝑌(𝑦) > 0, the conditional PDF of 𝑋 given that 𝑌 = 𝑦 is defined by

$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}.$$

This definition is analogous to the formula 𝑝_𝑋|𝑌 = 𝑝_𝑋,𝑌/𝑝_𝑌 for the discrete case.

When thinking about the conditional PDF, it is best to view 𝑦 as a fixed number and consider 𝑓_𝑋|𝑌(𝑥|𝑦) as a function of the single variable 𝑥. As a function of 𝑥, the conditional PDF 𝑓_𝑋|𝑌(𝑥|𝑦) has the same shape as the joint PDF 𝑓_𝑋,𝑌(𝑥, 𝑦), because the normalizing factor 𝑓_𝑌(𝑦) does not depend on 𝑥; see Fig. 3. Note that the normalization ensures that

$$\int_{-\infty}^{\infty} f_{X|Y}(x|y)\, dx = 1,$$

so for any fixed 𝑦, 𝑓_𝑋|𝑌(𝑥|𝑦) is a legitimate PDF.

Figure 3: Visualization of the conditional PDF 𝑓_𝑋|𝑌(𝑥|𝑦). Let 𝑋, 𝑌 have a joint PDF which is uniform on the set 𝑆. For each fixed 𝑦, we consider the joint PDF along the slice 𝑌 = 𝑦 and normalize it so that it integrates to 1.

To interpret the conditional PDF, let us fix some small positive numbers 𝛿₁ and 𝛿₂, and condition on the event 𝐵 = {𝑦 ≤ 𝑌 ≤ 𝑦 + 𝛿₂}. We have

$$P(x \le X \le x + \delta_1 \mid y \le Y \le y + \delta_2) = \frac{P(x \le X \le x + \delta_1 \text{ and } y \le Y \le y + \delta_2)}{P(y \le Y \le y + \delta_2)} \approx \frac{f_{X,Y}(x, y)\, \delta_1 \delta_2}{f_Y(y)\, \delta_2} = f_{X|Y}(x|y)\, \delta_1.$$

In words, 𝑓_𝑋|𝑌(𝑥|𝑦)𝛿₁ provides us with the probability that 𝑋 belongs to a small interval [𝑥, 𝑥 + 𝛿₁], given that 𝑌 belongs to a small interval [𝑦, 𝑦 + 𝛿₂]. Since 𝑓_𝑋|𝑌(𝑥|𝑦)𝛿₁ does not depend on 𝛿₂, we can think of the limiting case where 𝛿₂ decreases to zero and write

$$P(x \le X \le x + \delta_1 \mid Y = y) \approx f_{X|Y}(x|y)\, \delta_1, \qquad (\delta_1 \text{ small}),$$

and, more generally,

$$P(X \in A \mid Y = y) = \int_A f_{X|Y}(x|y)\, dx.$$

Conditional probabilities, given the zero probability event {𝑌 = 𝑦}, were left undefined in earlier lectures. But the above formula provides a natural way of defining such conditional probabilities in the present context. In addition, it allows us to view the conditional PDF 𝑓_𝑋|𝑌(𝑥|𝑦) (as a function of 𝑥) as a description of the probability law of 𝑋, given that the event {𝑌 = 𝑦} has occurred.

As in the discrete case, the conditional PDF 𝑓_𝑋|𝑌, together with the marginal PDF 𝑓_𝑌, can be used to calculate the joint PDF. This approach can also be used for modeling: instead of directly specifying 𝑓_𝑋,𝑌, it is often natural to provide a probability law for 𝑌, in terms of a PDF 𝑓_𝑌, and then provide a conditional probability law 𝑓_𝑋|𝑌(𝑥|𝑦) for 𝑋, given any possible value 𝑦 of 𝑌.

Having defined a conditional probability law, we can also define a corresponding conditional expectation by letting

$$E[X \mid Y = y] = \int_{-\infty}^{\infty} x\, f_{X|Y}(x|y)\, dx.$$

The properties of (unconditional) expectation carry through, with the obvious modifications, to conditional expectation. For example, the conditional version of the expected value rule

$$E[g(X) \mid Y = y] = \int_{-\infty}^{\infty} g(x)\, f_{X|Y}(x|y)\, dx$$

remains valid.

Summary of Facts About Multiple Continuous Random Variables

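Let 𝑋 and 𝑌 be jointly continuous random variables with joint PDF 𝑓_𝑋,𝑌.

– The joint, marginal, and conditional PDFs are related to each other by the formulas

$$f_{X,Y}(x, y) = f_Y(y)\, f_{X|Y}(x|y),$$

$$f_X(x) = \int_{-\infty}^{\infty} f_Y(y)\, f_{X|Y}(x|y)\, dy.$$

The conditional PDF 𝑓_𝑋|𝑌(𝑥|𝑦) is defined only for those 𝑦 for which 𝑓_𝑌(𝑦) > 0.

– They can be used to calculate probabilities:

$$P((X, Y) \in B) = \iint_{(x,y) \in B} f_{X,Y}(x, y)\, dx\, dy,$$

$$P(X \in A) = \int_A f_X(x)\, dx,$$

$$P(X \in A \mid Y = y) = \int_A f_{X|Y}(x|y)\, dx.$$

– They can also be used to calculate expectations:

$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx,$$

$$E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\, f_{X,Y}(x, y)\, dx\, dy,$$

$$E[g(X) \mid Y = y] = \int_{-\infty}^{\infty} g(x)\, f_{X|Y}(x|y)\, dx,$$

$$E[g(X, Y) \mid Y = y] = \int_{-\infty}^{\infty} g(x, y)\, f_{X|Y}(x|y)\, dx.$$

– We have the following versions of the total expectation theorem:

$$E[X] = \int_{-\infty}^{\infty} E[X \mid Y = y]\, f_Y(y)\, dy,$$

$$E[g(X)] = \int_{-\infty}^{\infty} E[g(X) \mid Y = y]\, f_Y(y)\, dy,$$

$$E[g(X, Y)] = \int_{-\infty}^{\infty} E[g(X, Y) \mid Y = y]\, f_Y(y)\, dy.$$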

2.3 Inference and the Continuous Bayes’ Rule

In many situations, we have a model of an underlying but unobserved phenomenon, represented by a random variable 𝑋 with PDF 𝑓_𝑋, and we make noisy measurements 𝑌. The measurements are supposed to provide information about 𝑋 and are modeled in terms of a conditional PDF 𝑓_𝑌|𝑋. For example, if 𝑌 is the same as 𝑋, but corrupted by zero-mean normally distributed noise, one would let the conditional PDF 𝑓_𝑌|𝑋(𝑦|𝑥) of 𝑌, given that 𝑋 = 𝑥, be normal with mean equal to 𝑥. Once the experimental value of 𝑌 is measured, what information does this provide on the unknown value of 𝑋?

This setting is similar to the one in which we introduced Bayes’ rule and used it to solve inference problems. The only difference is that we are now dealing with continuous random variables.

Note that the information provided by the event {𝑌 = 𝑦} is described by the conditional PDF 𝑓_𝑋|𝑌(𝑥|𝑦). It thus suffices to evaluate the latter PDF. A calculation analogous to the original derivation of Bayes’ rule, based on the formulas 𝑓_𝑋 𝑓_𝑌|𝑋 = 𝑓_𝑋,𝑌 = 𝑓_𝑌 𝑓_𝑋|𝑌, yields

$$f_{X|Y}(x|y) = \frac{f_X(x)\, f_{Y|X}(y|x)}{f_Y(y)} = \frac{f_X(x)\, f_{Y|X}(y|x)}{\int_{-\infty}^{\infty} f_X(t)\, f_{Y|X}(y|t)\, dt},$$

which is the desired formula.
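A minimal numerical sketch of the continuous Bayes’ rule follows. It assumes a specific model that is not fixed by the text: 𝑋 is standard normal, and 𝑌 given 𝑋 = 𝑥 is normal with mean 𝑥 and standard deviation 0.5. The posterior is evaluated on a grid, with the denominator integral approximated by a midpoint rule. The grid limits, the grid size, and the observed value 𝑦 = 1 are arbitrary choices. For this particular normal model the posterior mean is known in closed form, 𝑦 · 1/(1 + 0.25) = 0.8, which gives a check on the computation.

```python
import math


def normal_pdf(z, mean, std):
    """PDF of a normal random variable with the given mean and standard deviation."""
    return math.exp(-0.5 * ((z - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def prior(x):
    """Assumed prior f_X: standard normal."""
    return normal_pdf(x, 0.0, 1.0)


def likelihood(y, x):
    """Assumed measurement model f_{Y|X}(y|x): normal with mean x, std 0.5."""
    return normal_pdf(y, x, 0.5)


def posterior_on_grid(y, lo=-8.0, hi=8.0, n=4000):
    """Return grid points, posterior values f_{X|Y}(x|y), and the grid spacing."""
    h = (hi - lo) / n
    xs = [lo + (i + 0.5) * h for i in range(n)]
    numer = [prior(x) * likelihood(y, x) for x in xs]
    denom = sum(numer) * h  # approximates the integral of f_X(t) f_{Y|X}(y|t) dt
    return xs, [v / denom for v in numer], h


xs, post, h = posterior_on_grid(1.0)
post_mean = sum(x * p for x, p in zip(xs, post)) * h
print(post_mean)  # close to 0.8 for this model
```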

3 Independence

3.1 Summary of Facts About Independent Discrete Random Variables

Let 𝐴 be an event, with 𝑃(𝐴) > 0, and let 𝑋 and 𝑌 be random variables associated with the same experiment.

– 𝑋 is independent of the event 𝐴 if

𝑝_𝑋|𝐴(𝑥) = 𝑝_𝑋(𝑥), for all 𝑥,

that is, if for all 𝑥, the events {𝑋 = 𝑥} and 𝐴 are independent.

– 𝑋 and 𝑌 are independent if for all possible pairs (𝑥, 𝑦), the events {𝑋 = 𝑥} and {𝑌 = 𝑦} are independent, or equivalently

𝑝_𝑋,𝑌(𝑥, 𝑦) = 𝑝_𝑋(𝑥)𝑝_𝑌(𝑦), for all 𝑥, 𝑦.

– If 𝑋 and 𝑌 are independent random variables, then

𝐸[𝑋𝑌] = 𝐸[𝑋]𝐸[𝑌].

Furthermore, for any functions 𝑔 and ℎ, the random variables 𝑔(𝑋) and ℎ(𝑌) are independent, and we have

$$E[g(X)\, h(Y)] = E[g(X)]\, E[h(Y)].$$

– If 𝑋 and 𝑌 are independent, then

var(𝑋 + 𝑌) = var(𝑋) + var(𝑌).
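The facts listed above can be verified on a small example. The sketch below builds a joint PMF as the product of two hypothetical marginal PMFs (so 𝑋 and 𝑌 are independent by construction) and checks the product rule for expectations and the additivity of variances with exact arithmetic. The marginals and helper names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical marginal PMFs; independence is imposed by taking their product.
p_x = {1: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(1, 4)}
p_y = {0: Fraction(1, 3), 5: Fraction(2, 3)}
joint = {(x, y): p_x[x] * p_y[y] for x, y in product(p_x, p_y)}


def expectation(g):
    """E[g(X, Y)] under the joint PMF above."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


def variance(g):
    """var(g(X, Y)) = E[(g(X, Y) - E[g(X, Y)])^2]."""
    mean = expectation(g)
    return expectation(lambda x, y: (g(x, y) - mean) ** 2)


e_x = expectation(lambda x, y: x)
e_y = expectation(lambda x, y: y)
e_xy = expectation(lambda x, y: x * y)

var_x = variance(lambda x, y: x)
var_y = variance(lambda x, y: y)
var_sum = variance(lambda x, y: x + y)

print(e_xy == e_x * e_y, var_sum == var_x + var_y)
```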

3.2 Independence of Continuous Random Variables

Suppose that 𝑋 and 𝑌 are independent, that is,

𝑓_𝑋,𝑌(𝑥, 𝑦) = 𝑓_𝑋(𝑥)𝑓_𝑌(𝑦), for all 𝑥, 𝑦.

We then have the following properties.


– We have

𝐸[𝑋𝑌] = 𝐸[𝑋]𝐸[𝑌],

and, more generally,

$$E[g(X)\, h(Y)] = E[g(X)]\, E[h(Y)].$$

– We have

var(𝑋 + 𝑌) = var(𝑋) + var(𝑌).

4 Joint CDFs

If 𝑋 and 𝑌 are two random variables associated with the same experiment, we define their joint CDF by

𝐹_𝑋,𝑌(𝑥, 𝑦) = 𝑃(𝑋 ≤ 𝑥, 𝑌 ≤ 𝑦).

As in the case of one random variable, the advantage of working with the CDF is that it applies equally well to discrete and continuous random variables. In particular, if 𝑋 and 𝑌 are described by a joint PDF 𝑓_𝑋,𝑌, then

$$F_{X,Y}(x, y) = P(X \le x, Y \le y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{X,Y}(s, t)\, dt\, ds.$$

Conversely, the PDF can be recovered from the CDF by differentiating:

$$f_{X,Y}(x, y) = \frac{\partial^2 F_{X,Y}}{\partial x\, \partial y}(x, y).$$
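The relation between the joint CDF and the joint PDF can be illustrated numerically. The sketch assumes a hypothetical joint PDF 𝑓(𝑥, 𝑦) = 𝑥 + 𝑦 on the unit square, whose joint CDF on that square works out to (𝑥²𝑦 + 𝑥𝑦²)/2, and approximates the mixed partial derivative with a finite difference. The step size `h` and the evaluation point are arbitrary choices.

```python
def joint_cdf(x, y):
    """Joint CDF for the hypothetical PDF f(x, y) = x + y on the unit square."""
    # Clamping to [0, 1] handles points outside the square: no probability
    # mass lies below 0 or above 1 in either coordinate.
    x = min(max(x, 0.0), 1.0)
    y = min(max(y, 0.0), 1.0)
    return 0.5 * (x * x * y + x * y * y)


def mixed_partial(F, x, y, h=1e-4):
    """Finite-difference approximation of the mixed partial d^2 F / (dx dy)."""
    return (F(x + h, y + h) - F(x + h, y) - F(x, y + h) + F(x, y)) / (h * h)


approx_pdf = mixed_partial(joint_cdf, 0.3, 0.6)
print(approx_pdf)  # close to f(0.3, 0.6) = 0.9
```

The numerator of the finite difference is the probability of the small square with corners (𝑥, 𝑦) and (𝑥 + ℎ, 𝑦 + ℎ), so the quotient is the "probability per unit area" interpretation of the PDF in discrete form.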

More than Two Random Variables

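The joint PDF of three random variables 𝑋, 𝑌, and 𝑍 is defined in analogy with the case of two random variables. For example, we have

$$P((X, Y, Z) \in B) = \iiint_{(x,y,z) \in B} f_{X,Y,Z}(x, y, z)\, dx\, dy\, dz,$$

for any set 𝐵. We also have relations such as

$$f_{X,Y}(x, y) = \int_{-\infty}^{\infty} f_{X,Y,Z}(x, y, z)\, dz,$$

and

$$f_X(x) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y,Z}(x, y, z)\, dy\, dz.$$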

One can also define conditional PDFs by formulas such as

$$f_{X,Y|Z}(x, y|z) = \frac{f_{X,Y,Z}(x, y, z)}{f_Z(z)}, \qquad \text{for } f_Z(z) > 0,$$

$$f_{X|Y,Z}(x|y, z) = \frac{f_{X,Y,Z}(x, y, z)}{f_{Y,Z}(y, z)}, \qquad \text{for } f_{Y,Z}(y, z) > 0.$$

There is an analog of the multiplication rule:

𝑓_{𝑋,𝑌,𝑍}(𝑥, 𝑦, 𝑧) = 𝑓_{𝑋|𝑌,𝑍}(𝑥|𝑦, 𝑧)𝑓_𝑌|𝑍(𝑦|𝑧)𝑓_𝑍(𝑧).

Finally, we say that the three random variables 𝑋, 𝑌, and 𝑍 are independent if

𝑓_{𝑋,𝑌,𝑍}(𝑥, 𝑦, 𝑧) = 𝑓_𝑋(𝑥)𝑓_𝑌(𝑦)𝑓_𝑍(𝑧), for all 𝑥, 𝑦, 𝑧.

The expected value rule for functions takes the form

$$E[g(X, Y, Z)] = \iiint g(x, y, z)\, f_{X,Y,Z}(x, y, z)\, dx\, dy\, dz,$$

and if 𝑔 is linear and of the form 𝑎𝑋 + 𝑏𝑌 + 𝑐𝑍, then

$$E[aX + bY + cZ] = aE[X] + bE[Y] + cE[Z].$$

Furthermore, there are obvious generalizations of the above to the case of more than three random variables. For example, for any random variables 𝑋₁, 𝑋₂, … , 𝑋_𝑛 and any scalars 𝑎₁, 𝑎₂, … , 𝑎_𝑛, we have

$$E[a_1 X_1 + a_2 X_2 + \cdots + a_n X_n] = a_1 E[X_1] + a_2 E[X_2] + \cdots + a_n E[X_n].$$

5 Covariance and Correlation

The covariance of two random variables 𝑋 and 𝑌 is denoted by cov(𝑋, 𝑌), and is defined by

$$\operatorname{cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big].$$

When cov(𝑋, 𝑌) = 0, we say that 𝑋 and 𝑌 are uncorrelated.

Roughly speaking, a positive or negative covariance indicates that the values of 𝑋 − 𝐸[𝑋] and 𝑌 − 𝐸[𝑌] obtained in a single experiment “tend” to have the same or the opposite sign, respectively (see Fig. 4). Thus the sign of the covariance provides an important qualitative indicator of the relation between 𝑋 and 𝑌.

If 𝑋 and 𝑌 are independent, then

$$\operatorname{cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big] = E\big[X - E[X]\big]\, E\big[Y - E[Y]\big] = 0.$$

Thus if 𝑋 and 𝑌 are independent, they are also uncorrelated. However, the reverse is not true.
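One standard counterexample, sketched here under the assumption that 𝑋 is uniform on {−1, 0, 1} and 𝑌 = 𝑋², shows a pair that is uncorrelated but not independent. This particular example is an illustration chosen for the sketch and is not taken from the lecture.

```python
from fractions import Fraction

# Assumed example: X uniform on {-1, 0, 1} and Y = X^2, so Y is a function of X.
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}


def expectation(g):
    """E[g(X, Y)] under the joint PMF above."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


e_x = expectation(lambda x, y: x)
e_y = expectation(lambda x, y: y)
cov_xy = expectation(lambda x, y: (x - e_x) * (y - e_y))

# Marginal probabilities of the values X = 0 and Y = 0.
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)

print(cov_xy)                       # 0: X and Y are uncorrelated
print(joint[(0, 0)], p_x0 * p_y0)   # 1/3 versus 1/9: not independent
```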


The correlation coefficient 𝜌 of two random variables 𝑋 and 𝑌 that have nonzero variances is defined as

$$\rho = \frac{\operatorname{cov}(X, Y)}{\sqrt{\operatorname{var}(X)\, \operatorname{var}(Y)}}.$$

It may be viewed as a normalized version of the covariance cov(𝑋, 𝑌), and in fact it can be shown that 𝜌 ranges from −1 to 1.

If 𝜌 > 0 (or 𝜌 < 0), then the values of 𝑥 − 𝐸[𝑋] and 𝑦 − 𝐸[𝑌] “tend” to have the same (or opposite, respectively) sign, and the size of |𝜌| provides a normalized measure of the extent to which this is true. In fact, always assuming that 𝑋 and 𝑌 have positive variances, it can be shown that 𝜌 = 1 (or 𝜌 = −1) if and only if there exists a positive (or negative, respectively) constant 𝑐 such that

$$y - E[Y] = c\,(x - E[X]), \qquad \text{for all possible numerical values } (x, y).$$
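The extreme cases 𝜌 = ±1 can be illustrated with a short sketch. It assumes 𝑋 is uniform on {1, 2, 3, 4} and 𝑌 is an exact linear function of 𝑋; the slopes and intercepts are arbitrary choices, and `correlation` is an illustrative helper.

```python
import math


def correlation(joint_pmf):
    """Correlation coefficient rho = cov(X, Y) / sqrt(var(X) var(Y))."""
    e_x = sum(x * p for (x, _), p in joint_pmf.items())
    e_y = sum(y * p for (_, y), p in joint_pmf.items())
    cov = sum((x - e_x) * (y - e_y) * p for (x, y), p in joint_pmf.items())
    var_x = sum((x - e_x) ** 2 * p for (x, _), p in joint_pmf.items())
    var_y = sum((y - e_y) ** 2 * p for (_, y), p in joint_pmf.items())
    return cov / math.sqrt(var_x * var_y)


# Assumed examples: Y is a linear function of X with negative or positive slope.
decreasing = {(x, -2 * x + 3): 0.25 for x in (1, 2, 3, 4)}
increasing = {(x, 5 * x - 1): 0.25 for x in (1, 2, 3, 4)}

rho_neg = correlation(decreasing)
rho_pos = correlation(increasing)
print(rho_neg, rho_pos)  # approximately -1 and +1
```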

The covariance can be used to obtain a formula for the variance of the sum of several (not necessarily independent) random variables. In particular, if 𝑋₁, 𝑋₂, … , 𝑋_𝑛 are random variables with finite variance, we have

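$$\operatorname{var}\Big(\sum_{i=1}^{n} X_i\Big) = \sum_{i=1}^{n} \operatorname{var}(X_i) + 2 \sum_{i<j} \operatorname{cov}(X_i, X_j).$$

This can be seen from the following calculation, where for brevity, we denote $\tilde{X}_i = X_i - E[X_i]$:

$$\operatorname{var}\Big(\sum_{i=1}^{n} X_i\Big) = E\Big[\Big(\sum_{i=1}^{n} \tilde{X}_i\Big)^2\Big] = E\Big[\sum_{i=1}^{n} \sum_{j=1}^{n} \tilde{X}_i \tilde{X}_j\Big] = \sum_{i=1}^{n} \sum_{j=1}^{n} E[\tilde{X}_i \tilde{X}_j]$$

$$= \sum_{i=1}^{n} E[\tilde{X}_i^2] + 2 \sum_{i<j} E[\tilde{X}_i \tilde{X}_j] = \sum_{i=1}^{n} \operatorname{var}(X_i) + 2 \sum_{i<j} \operatorname{cov}(X_i, X_j).$$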
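The variance-of-a-sum formula can be checked exactly on a small example with dependent variables. The joint PMF of (𝑋₁, 𝑋₂, 𝑋₃) below is hypothetical, and the helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical joint PMF of (X1, X2, X3); the variables are not independent.
joint = {
    (0, 0, 0): Fraction(1, 4),
    (1, 0, 1): Fraction(1, 4),
    (1, 1, 0): Fraction(1, 4),
    (1, 1, 1): Fraction(1, 4),
}
n = 3


def expectation(g):
    """E[g(X1, X2, X3)], where g takes the outcome as a tuple."""
    return sum(g(outcome) * p for outcome, p in joint.items())


means = [expectation(lambda o, i=i: o[i]) for i in range(n)]


def cov(i, j):
    """cov(X_i, X_j) = E[(X_i - E[X_i]) (X_j - E[X_j])]; cov(i, i) is var(X_i)."""
    return expectation(lambda o: (o[i] - means[i]) * (o[j] - means[j]))


# Left-hand side: variance of the sum computed directly from its definition.
mean_sum = expectation(sum)
var_of_sum = expectation(lambda o: (sum(o) - mean_sum) ** 2)

# Right-hand side: sum of variances plus twice the covariances over pairs i < j.
formula = sum(cov(i, i) for i in range(n)) + 2 * sum(
    cov(i, j) for i, j in combinations(range(n), 2)
)

print(var_of_sum, formula)
```

Here `combinations(range(n), 2)` enumerates exactly the index pairs with 𝑖 < 𝑗, which matches the restricted double sum in the formula.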