
CONTINUOUS VARIABLE INFERENCE 183

More Inference Algorithms


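[Figure 4.2 diagram: node X with density \(\rho_X(x) = N(x; 40, 5^2)\), and an edge from X to node Y with conditional density \(\rho_Y(y|x) = N(y; 10x, 30^2)\).]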

Figure 4.2: A Bayesian network containing continuous random variables.

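Theorem 4.1 These equalities hold for the normal density function:

\[N(x; \mu, \sigma^2) = N(\mu; x, \sigma^2) \tag{4.2}\]

\[N(ax; \mu, \sigma^2) = \frac{1}{a}\, N\!\left(x; \frac{\mu}{a}, \frac{\sigma^2}{a^2}\right) \tag{4.3}\]

\[N(x; \mu_1, \sigma_1^2)\, N(x; \mu_2, \sigma_2^2) = k\, N\!\left(x; \frac{\sigma_2^2 \mu_1 + \sigma_1^2 \mu_2}{\sigma_1^2 + \sigma_2^2}, \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}\right) \tag{4.4}\]

where k does not depend on x.

\[\int_x N(x; \mu_1, \sigma_1^2)\, N(x; y, \sigma_2^2)\, dx = N(y; \mu_1, \sigma_1^2 + \sigma_2^2). \tag{4.5}\]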

Proof. The proof is left as an exercise.
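The equalities in Theorem 4.1 can be spot-checked numerically before attempting the proof. The following is a minimal sketch, not a proof: the helper names `normal_pdf` and `integrate` and all test values are assumptions chosen for illustration, and Equality 4.3 is checked only for a positive constant a.

```python
import math

def normal_pdf(x, mu, var):
    """Normal density N(x; mu, var), parameterised by variance."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def integrate(f, lo, hi, n=20000):
    """Composite trapezoidal rule on [lo, hi] with n panels."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

# Arbitrary test values (assumptions, not taken from the text); a > 0.
x, mu, var, a = 1.3, 0.7, 2.0, 4.0
mu1, var1, mu2, var2, y = 30.0, 9.0, 40.0, 25.0, 35.0

# Equality 4.2: the argument and the mean can be swapped.
lhs_42 = normal_pdf(x, mu, var)
rhs_42 = normal_pdf(mu, x, var)

# Equality 4.3: rescaling the argument by a.
lhs_43 = normal_pdf(a * x, mu, var)
rhs_43 = normal_pdf(x, mu / a, var / a**2) / a

# Equality 4.4: the ratio of the product to the combined density is the
# constant k, so it should take the same value at every x.
def ratio_44(t):
    m = (var2 * mu1 + var1 * mu2) / (var1 + var2)
    v = var1 * var2 / (var1 + var2)
    return normal_pdf(t, mu1, var1) * normal_pdf(t, mu2, var2) / normal_pdf(t, m, v)

k_values = [ratio_44(t) for t in (28.0, 33.0, 37.0)]

# Equality 4.5: integrating the product over x; the range is wide enough
# that the tails outside it are negligible.
lhs_45 = integrate(lambda t: normal_pdf(t, mu1, var1) * normal_pdf(t, y, var2), -50.0, 120.0)
rhs_45 = normal_pdf(y, mu1, var1 + var2)

print(lhs_42, rhs_42)
print(lhs_43, rhs_43)
print(k_values)
print(lhs_45, rhs_45)
```

Each printed pair should agree to within floating-point error, and the three entries of `k_values` should coincide.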

4.1.2 An Example Concerning Continuous Variables

Next we present an example of Bayesian inference with continuous random variables.

Example 4.1 Suppose you are considering taking a job that pays $10 an hour and you expect to work 40 hours per week. However, you are not guaranteed 40 hours, and you estimate the number of hours actually worked in a week to be normally distributed with mean 40 and standard deviation 5. You have not yet fully investigated the benefits, such as bonus pay, or the nontaxable deductions, such as contributions to a retirement program. However, you estimate these other influences on your gross taxable weekly income to also be normally distributed, with mean 0 (that is, you feel they roughly offset one another) and standard deviation 30. Furthermore, you assume that these other influences are independent of your hours worked.

First let’s determine your expected gross taxable weekly income and its standard deviation. The number of hours worked X is normally distributed with density function \(\rho_X(x) = N(x; 40, 5^2)\), the other influences W on your pay are normally distributed with density function \(\rho_W(w) = N(w; 0, 30^2)\), and X and W are independent. Your gross taxable weekly income Y is given by

y = w + 10x.

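Let \(\rho_Y(y|x)\) denote the conditional density function of Y given X = x. The results just obtained imply that \(\rho_Y(y|x)\) is a normal density with expected value and variance as follows:

\[E(Y|x) = E(W + 10x \mid x) = E(W) + 10x = 0 + 10x = 10x\]

\[V(Y|x) = V(W + 10x \mid x) = V(W) = 30^2.\]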

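The second equality in both cases is due to the fact that X and W are independent. We have shown that \(\rho_Y(y|x) = N(y; 10x, 30^2)\). The Bayesian network in Figure 4.2 summarizes these results. Note that W is not shown in the network. Rather, W is represented implicitly in the probabilistic relationship between X and Y. Were it not for W, Y would be a deterministic function of X. We compute the density function \(\rho_Y(y)\) for your weekly income from the values in that network as follows:

\[
\begin{aligned}
\rho_Y(y) &= \int_x \rho_Y(y|x)\, \rho_X(x)\, dx \\
&= \int_x N(y; 10x, 30^2)\, N(x; 40, 5^2)\, dx \\
&= \int_x N(10x; y, 30^2)\, N(x; 40, 5^2)\, dx \\
&= \frac{1}{10} \int_x N\!\left(x; \frac{y}{10}, \frac{30^2}{10^2}\right) N(x; 40, 5^2)\, dx \\
&= \frac{1}{10}\, N\!\left(\frac{y}{10}; 40, 5^2 + \frac{30^2}{10^2}\right) \\
&= \frac{10}{10}\, N\!\left(y; 400, 10^2\left(5^2 + \frac{30^2}{10^2}\right)\right) \\
&= N(y; 400, 3400).
\end{aligned}
\]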

The 3rd through 6th equalities above are due to Equalities 4.2, 4.3, 4.5, and 4.3 respectively. We conclude that the expected value of your gross taxable weekly income is $400 and the standard deviation is \(\sqrt{3400} \approx 58\).
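The result of Example 4.1 can be cross-checked in two ways: with the closed-form moments of a linear function of independent normal variables, and with a small simulation. This is a rough sketch; the function names, the sample size, and the seed are assumptions made for illustration.

```python
import math
import random

def linear_gaussian_marginal(b, mu_x, var_x, var_w):
    """Mean and variance of Y = W + b*X, where X ~ N(mu_x, var_x),
    W ~ N(0, var_w), and X and W are independent."""
    return b * mu_x, b**2 * var_x + var_w

def simulate_income(n, seed=0):
    """Draw n weekly incomes y = w + 10x using the densities of Example 4.1."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x = rng.gauss(40.0, 5.0)   # hours worked, standard deviation 5
        w = rng.gauss(0.0, 30.0)   # other influences, standard deviation 30
        samples.append(w + 10.0 * x)
    return samples

mean_y, var_y = linear_gaussian_marginal(10.0, 40.0, 5.0**2, 30.0**2)
sd_y = math.sqrt(var_y)

samples = simulate_income(100_000)
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / (len(samples) - 1)

print(mean_y, var_y, round(sd_y, 2))  # 400.0 3400.0 58.31
print(sample_mean, sample_var)        # should be close to 400 and 3400
```

The closed-form values match N(y; 400, 3400), and the simulated moments should land near them, up to sampling error.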

Example 4.2 Suppose next that your first check turns out to be for $300, and this seems low to you. That is, you don’t recall exactly how many hours you worked, but you feel that it should have been enough to make your income exceed $300. To investigate the matter, you can determine the distribution of your weekly hours given that the income has this value, and decide whether this distribution seems reasonable. Toward that end, we have

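\[
\begin{aligned}
\rho_X(x \mid Y = 300) &= \frac{\rho_Y(300|x)\, \rho_X(x)}{\rho_Y(300)} \\
&= \frac{N(300; 10x, 30^2)\, N(x; 40, 5^2)}{\rho_Y(300)} \\
&= \frac{N(10x; 300, 30^2)\, N(x; 40, 5^2)}{\rho_Y(300)} \\
&= \frac{\frac{1}{10}\, N(x; 30, 3^2)\, N(x; 40, 5^2)}{\rho_Y(300)} \\
&= \frac{1}{10\rho_Y(300)}\, N(x; 30, 3^2)\, N(x; 40, 5^2) \\
&= \frac{k}{10\rho_Y(300)}\, N\!\left(x; \frac{5^2 \cdot 30 + 3^2 \cdot 40}{3^2 + 5^2}, \frac{3^2 \cdot 5^2}{3^2 + 5^2}\right) \\
&= N(x; 32.65, 6.62).
\end{aligned}
\]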

The 3rd equality is due to Equality 4.2, the 4th is due to Equality 4.3, the 6th is due to Equality 4.4, and the last is due to the fact that \(\rho_X(x|Y = 300)\) and \(N(x; 32.65, 6.62)\) are both density functions, which means their integrals over x must both equal 1, and therefore \(k / (10\rho_Y(300)) = 1\). So the expected value of the number of hours you worked is 32.65 and the standard deviation is \(\sqrt{6.62} \approx 2.57\).
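The posterior in Example 4.2 follows mechanically from Equalities 4.2 through 4.4, so the arithmetic is easy to reproduce. The sketch below assumes the model of Example 4.1; the function names `combine_normals` and `posterior_hours` are hypothetical, introduced only for this illustration.

```python
import math

def combine_normals(mu1, var1, mu2, var2):
    """Mean and variance of the normal density proportional to
    N(x; mu1, var1) * N(x; mu2, var2), as given by Equality 4.4."""
    mean = (var2 * mu1 + var1 * mu2) / (var1 + var2)
    var = var1 * var2 / (var1 + var2)
    return mean, var

def posterior_hours(y, b, mu_x, var_x, var_w):
    """Posterior mean and variance of X given Y = y when Y = W + b*X."""
    # Equalities 4.2 and 4.3 rewrite N(y; b*x, var_w) as a density in x
    # with mean y/b and variance var_w/b**2 (up to a constant factor).
    like_mean, like_var = y / b, var_w / b**2
    # Equality 4.4 combines that density with the prior on X.
    return combine_normals(like_mean, like_var, mu_x, var_x)

post_mean, post_var = posterior_hours(y=300.0, b=10.0, mu_x=40.0, var_x=5.0**2, var_w=30.0**2)
post_sd = math.sqrt(post_var)
print(round(post_mean, 2), round(post_var, 2), round(post_sd, 2))  # 32.65 6.62 2.57
```

The constant k and the factor 1/10 never need to be computed, because the result must integrate to 1.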

4.1.3 An Algorithm for Continuous Variables

We will show an algorithm for inference with continuous variables in singly-connected Bayesian networks in which the value of each variable is a linear function of the values of its parents. That is, if \(\mathrm{PA}_X\) is the set of parents of X, then

\[x = w_X + \sum_{Z \in \mathrm{PA}_X} b_{XZ}\, z, \tag{4.6}\]

where \(W_X\) has density function \(N(w; 0, \sigma^2_{W_X})\), and \(W_X\) is independent of each Z. The variable \(W_X\) represents the uncertainty in X’s value given values of X’s parents. For each root X, we specify its density function \(N(x; \mu_X, \sigma^2_X)\). A density function equal to \(N(x; \mu_X, 0)\) means we know the root’s value, while a density function equal to \(N(x; 0, \infty)\) means complete uncertainty as to the root’s value. Note that \(\sigma^2_{W_X}\) is the variance of X conditional on values of its parents. So the conditional density function of X is

\[\rho(x \mid \mathrm{pa}_X) = N\!\left(x; \sum_{Z \in \mathrm{PA}_X} b_{XZ}\, z, \sigma^2_{W_X}\right).\]
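To make the conditional density concrete, here is a minimal sketch of how a linear Gaussian node might be evaluated in code. The dictionary-based representation, the function names, and the two-parent coefficients are all assumptions made for illustration; only the node Y of Figure 4.2 comes from the text.

```python
import math

def normal_pdf(x, mu, var):
    """Normal density N(x; mu, var), parameterised by variance."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def conditional_mean(parent_values, coeffs):
    """Sum of b_XZ * z over the parents Z of X (dicts keyed by parent name)."""
    return sum(coeffs[name] * z for name, z in parent_values.items())

def conditional_density(x, parent_values, coeffs, var_w):
    """rho(x | pa_X) = N(x; sum of b_XZ * z, var_w) for a linear Gaussian node."""
    return normal_pdf(x, conditional_mean(parent_values, coeffs), var_w)

# Node Y of Figure 4.2: one parent X with coefficient 10 and variance 30^2.
density = conditional_density(300.0, {"X": 30.0}, {"X": 10.0}, 30.0**2)
print(density)

# Hypothetical node with two parents Z1 and Z2 (coefficients are made up).
mean_two_parents = conditional_mean({"Z1": 3.0, "Z2": 4.0}, {"Z1": 2.0, "Z2": -1.0})
print(mean_two_parents)  # 2.0
```

Because the first call evaluates the density at its own mean (10 × 30 = 300), it returns the peak value of a normal density with variance 900.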

When an infinite variance is used in an expression, we take the limit of the expression containing the infinite variance. For example, if \(\sigma^2 = \infty\) and \(\sigma^2\) appears in an expression, we take the limit of the expression as \(\sigma^2\) approaches \(\infty\). Examples of this appear after we give the algorithm. All infinite variances represent the same limit. That is, if we specify \(N(x; 0, \infty)\) and \(N(y; 0, \infty)\), in both cases \(\infty\) represents a variable t in an expression for which we take the limit as \(t \to \infty\) of the expression. The assumption is that our uncertainty as to the value of X is exactly the same as our uncertainty as to the value of Y. Given this, if we wanted to represent a large but not infinite variance for both variables, we would not use a variance of, say, 1,000,000 to represent our uncertainty as to the value of X and a variance of ln(1,000,000) to represent our uncertainty as to the value of Y. Rather, we would use 1,000,000 in both cases. In the same way, our limits are assumed to be the same. Of course, if it better models the problem, the calculations could be done using different limits, and we would sometimes get different results.
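The limit convention can be illustrated with the combination formula of Equality 4.4. The sketch below is not one of the book's own examples: it assumes the density N(x; 30, 3^2) is combined with N(x; 40, t), where the growing variance t is an illustrative stand-in for an infinite variance.

```python
import math

def combine_normals(mu1, var1, mu2, var2):
    """Mean and variance of the normal density proportional to
    N(x; mu1, var1) * N(x; mu2, var2), as given by Equality 4.4."""
    mean = (var2 * mu1 + var1 * mu2) / (var1 + var2)
    var = var1 * var2 / (var1 + var2)
    return mean, var

# Let the second variance t grow; the result approaches N(x; 30, 9),
# meaning the infinitely uncertain density stops influencing the outcome.
for t in (1e2, 1e4, 1e6, 1e12):
    mean, var = combine_normals(30.0, 9.0, 40.0, t)
    print(f"t = {t:.0e}: mean = {mean:.6f}, variance = {var:.6f}")

# Substituting infinity directly gives inf/inf, which floating-point
# arithmetic reports as nan; this is why the limit is taken instead.
nan_mean, nan_var = combine_normals(30.0, 9.0, 40.0, math.inf)
print(nan_mean, nan_var)  # nan nan
```

Dividing numerator and denominator by t shows the same thing analytically: the mean tends to 30 and the variance to 9 as t grows.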

A Bayesian network of the type just described is called a Gaussian Bayesian network. The linear relationship (Equality 4.6) used in Gaussian Bayesian networks has been used in causal models in economics [Joereskog, 1982], in structural equations in psychology [Bentler, 1980], and in path analysis in sociology and genetics [Kenny, 1979], [Wright, 1921].

Before giving the algorithm, we show the formulas used in it. To avoid clutter, in the following formulas we use σ to represent a variance rather than a standard deviation.

The formula for X is as follows:

x = wX + X
