CONTINUOUS VARIABLE INFERENCE 183 - More Inference Algorithms

More Inference Algorithms


[Figure: network X → Y, with ρ_X(x) = N(x; 40, 5²) and ρ_Y(y|x) = N(y; 10x, 30²)]

Figure 4.2: A Bayesian network containing continuous random variables.

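Theorem 4.1 These equalities hold for the normal density function:

\[ N(x; \mu, \sigma^2) = N(\mu; x, \sigma^2) \tag{4.2} \]

\[ N(ax; \mu, \sigma^2) = \frac{1}{a}\, N\!\left(x; \frac{\mu}{a}, \frac{\sigma^2}{a^2}\right) \tag{4.3} \]

\[ N(x; \mu_1, \sigma_1^2)\, N(x; \mu_2, \sigma_2^2) = k\, N\!\left(x; \frac{\sigma_2^2 \mu_1 + \sigma_1^2 \mu_2}{\sigma_1^2 + \sigma_2^2}, \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}\right) \tag{4.4} \]

where k does not depend on x.

\[ \int_{-\infty}^{\infty} N(x; \mu_1, \sigma_1^2)\, N(x; y, \sigma_2^2)\, dx = N(y; \mu_1, \sigma_1^2 + \sigma_2^2) \tag{4.5} \]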

Proof. The proof is left as an exercise.
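Although the proof is left as an exercise, the four equalities can be spot-checked numerically. The following is only a sanity check under stated assumptions, not a proof: the helper names `normal_pdf` and `integrate` and all test values are chosen here for illustration, and Equality 4.3 is checked only for a > 0.

```python
import math

def normal_pdf(x, mu, var):
    """Density N(x; mu, var), where var is the variance."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def integrate(f, lo, hi, n=20000):
    """Composite trapezoid rule on [lo, hi] with n subintervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

# Arbitrary test values (assumptions for illustration only).
x, mu, var, a = 1.3, 0.7, 2.0, 4.0
mu1, var1, mu2, var2 = 1.0, 2.0, 3.0, 0.5

# Equality 4.2: the argument and the mean can be swapped.
lhs_42 = normal_pdf(x, mu, var)
rhs_42 = normal_pdf(mu, x, var)

# Equality 4.3: rescaling the argument (a > 0).
lhs_43 = normal_pdf(a * x, mu, var)
rhs_43 = normal_pdf(x, mu / a, var / a ** 2) / a

# Equality 4.4: the ratio of the product to the combined density
# should be the same constant k at every x.
def ratio_44(t):
    m = (var2 * mu1 + var1 * mu2) / (var1 + var2)
    v = var1 * var2 / (var1 + var2)
    return normal_pdf(t, mu1, var1) * normal_pdf(t, mu2, var2) / normal_pdf(t, m, v)

k_at_0 = ratio_44(0.0)
k_at_2 = ratio_44(2.0)

# Equality 4.5: integrate the product over x (a wide finite range
# stands in for the whole real line).
y = 2.5
lhs_45 = integrate(lambda t: normal_pdf(t, mu1, var1) * normal_pdf(t, y, var2), -40.0, 40.0)
rhs_45 = normal_pdf(y, mu1, var1 + var2)

print(lhs_42, rhs_42)
print(lhs_43, rhs_43)
print(k_at_0, k_at_2)
print(lhs_45, rhs_45)
```

Each printed pair should agree up to floating-point and quadrature error.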

4.1.2 An Example Concerning Continuous Variables

Next we present an example of Bayesian inference with continuous random variables.

Example 4.1 Suppose you are considering taking a job that pays $10 an hour and you expect to work 40 hours per week. However, you are not guaranteed 40 hours, and you estimate the number of hours actually worked in a week to be normally distributed with mean 40 and standard deviation 5. You have not yet fully investigated the benefits, such as bonus pay, or the nontaxable deductions, such as contributions to a retirement program. However, you estimate these other influences on your gross taxable weekly income to also be normally distributed with mean 0 (that is, you feel they roughly offset one another) and standard deviation 30.

Furthermore, you assume that these other influences are independent of your hours worked.

First let’s determine your expected gross taxable weekly income and its standard deviation. The number of hours worked X is normally distributed with density function ρ_X(x) = N(x; 40, 5²), the other influences W on your pay are normally distributed with density function ρ_W(w) = N(w; 0, 30²), and X and W are independent. Your gross taxable weekly income Y is given by

y = w + 10x.

Let ρ_Y(y|x) denote the conditional density function of Y given X = x. The results just obtained imply ρ_Y(y|x) is normally distributed with expected value and variance as follows:

\[ E(Y \mid x) = E(W + 10x \mid X = x) = E(W) + 10x = 10x \]

\[ V(Y \mid x) = V(W + 10x \mid X = x) = V(W) = 30^2 \]

The second equality in both cases is due to the fact that X and W are independent. We have shown that ρ_Y(y|x) = N(y; 10x, 30²). The Bayesian network in Figure 4.2 summarizes these results. Note that W is not shown in the network.

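Rather, W is represented implicitly in the probabilistic relationship between X and Y. Were it not for W, Y would be a deterministic function of X. We compute the density function ρ_Y(y) for your weekly income from the values in that network as follows:

\[
\begin{aligned}
\rho_Y(y) &= \int \rho_Y(y \mid x)\, \rho_X(x)\, dx \\
&= \int N(y; 10x, 30^2)\, N(x; 40, 5^2)\, dx \\
&= \int N(10x; y, 30^2)\, N(x; 40, 5^2)\, dx \\
&= \frac{1}{10} \int N\!\left(x; \frac{y}{10}, \frac{30^2}{10^2}\right) N(x; 40, 5^2)\, dx \\
&= \frac{1}{10}\, N\!\left(\frac{y}{10}; 40, 5^2 + \frac{30^2}{10^2}\right) \\
&= \frac{10}{10}\, N\!\left(y; 400, 10^2 (5^2 + 3^2)\right) \\
&= N(y; 400, 3400).
\end{aligned}
\]

The 3rd through 6th equalities above are due to Equalities 4.2, 4.3, 4.5, and 4.3, respectively. We conclude that the expected value of your gross taxable weekly income is $400 and the standard deviation is √3400 ≈ 58.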
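The figures in Example 4.1 can be reproduced numerically. The sketch below computes the mean and variance of Y = W + 10X from the standard rule for a linear combination of independent normal variables, then compares them with a Monte Carlo estimate. The constant names, the seed, and the sample size are assumptions made for illustration.

```python
import math
import random

HOURLY_RATE = 10.0          # coefficient of x in y = w + 10x
MU_X, SD_X = 40.0, 5.0      # hours worked: X ~ N(40, 5^2)
MU_W, SD_W = 0.0, 30.0      # other influences: W ~ N(0, 30^2)

# Closed form: means add, and variances add because X and W are independent.
mean_y = MU_W + HOURLY_RATE * MU_X
var_y = SD_W ** 2 + HOURLY_RATE ** 2 * SD_X ** 2
sd_y = math.sqrt(var_y)

# Monte Carlo check with a fixed seed (sample size is an arbitrary choice).
rng = random.Random(0)
n = 200_000
samples = [rng.gauss(MU_W, SD_W) + HOURLY_RATE * rng.gauss(MU_X, SD_X)
           for _ in range(n)]
sample_mean = sum(samples) / n
sample_sd = math.sqrt(sum((s - sample_mean) ** 2 for s in samples) / n)

print(f"closed form: mean={mean_y:.1f}, variance={var_y:.1f}, sd={sd_y:.2f}")
print(f"simulation:  mean={sample_mean:.1f}, sd={sample_sd:.2f}")
```

The simulated values should land close to the closed-form mean of 400 and standard deviation of about 58.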

Example 4.2 Suppose next that your first check turns out to be for $300, and this seems low to you. That is, you don’t recall exactly how many hours you worked, but you feel that it should have been enough to make your income exceed $300. To investigate the matter, you can determine the distribution of your weekly hours given that the income has this value, and decide whether this distribution seems reasonable. Towards that end, we have

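\[
\begin{aligned}
\rho_X(x \mid Y = 300) &= \frac{\rho_Y(300 \mid x)\, \rho_X(x)}{\rho_Y(300)} \\
&= \frac{N(300; 10x, 30^2)\, N(x; 40, 5^2)}{\rho_Y(300)} \\
&= \frac{N(10x; 300, 30^2)\, N(x; 40, 5^2)}{\rho_Y(300)} \\
&= \frac{\frac{1}{10} N(x; 30, 3^2)\, N(x; 40, 5^2)}{\rho_Y(300)} \\
&= \frac{1}{10 \rho_Y(300)}\, N(x; 30, 3^2)\, N(x; 40, 5^2) \\
&= \frac{k}{10 \rho_Y(300)}\, N\!\left(x; \frac{5^2 \cdot 30 + 3^2 \cdot 40}{3^2 + 5^2}, \frac{3^2 \cdot 5^2}{3^2 + 5^2}\right) \\
&= \frac{k}{10 \rho_Y(300)}\, N(x; 32.65, 6.62) \\
&= N(x; 32.65, 6.62).
\end{aligned}
\]

The 3rd equality is due to Equality 4.2, the 4th is due to Equality 4.3, the 6th is due to Equality 4.4, and the last is due to the fact that ρ_X(x|Y = 300) and N(x; 32.65, 6.62) are both density functions, which means their integrals over x must both equal 1, and therefore k/(10ρ_Y(300)) = 1. So the expected value of the number of hours you worked is 32.65 and the standard deviation is √6.62 ≈ 2.57.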
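The posterior in Example 4.2 follows mechanically from Equalities 4.2 through 4.4, so it is easy to compute. The function below is a minimal sketch of that calculation for a single parent X and child Y = W + rate·X; the function name `posterior_given_income` and its parameter names are hypothetical.

```python
import math

def posterior_given_income(y_obs, rate, var_w, mu_x, var_x):
    """Posterior (mean, variance) of X given Y = y_obs, where Y = W + rate * X."""
    # Equalities 4.2 and 4.3: viewed as a function of x, N(y; rate*x, var_w)
    # is proportional to N(x; y/rate, var_w/rate^2).
    mu_like = y_obs / rate
    var_like = var_w / rate ** 2
    # Equality 4.4: multiply by the prior N(x; mu_x, var_x) and read off
    # the mean and variance of the resulting normal density.
    total = var_like + var_x
    post_mean = (var_x * mu_like + var_like * mu_x) / total
    post_var = var_like * var_x / total
    return post_mean, post_var

post_mean, post_var = posterior_given_income(
    y_obs=300.0, rate=10.0, var_w=30.0 ** 2, mu_x=40.0, var_x=5.0 ** 2)
post_sd = math.sqrt(post_var)
print(f"{post_mean:.2f} {post_var:.2f} {post_sd:.2f}")  # prints: 32.65 6.62 2.57
```

The normalizing constant never has to be computed, because the result is known to be a density.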

4.1.3 An Algorithm for Continuous Variables

We will show an algorithm for inference with continuous variables in singly-connected Bayesian networks in which the value of each variable is a linear function of the values of its parents. That is, if PA_X is the set of parents of X, then

\[ x = w_X + \sum_{Z \in \mathrm{PA}_X} b_{XZ}\, z, \tag{4.6} \]

where W_X has density function N (w; 0, σ²_W_X), and W_X is independent of each Z. The variable W_X represents the uncertainty in X’s value given values of X’s parents. For each root X, we specify its density function N (x; µ_X, σ²_X).

A density function equal to N (x; µ_X, 0) means we know the root’s value, while a density function equal to N (x; 0, ∞) means complete uncertainty as to the root’s value. Note that σ²_W_X is the variance of X conditional on values of its parents. So the conditional density function of X is

\[ \rho(x \mid \mathrm{pa}_X) = N\!\left(x; \sum_{Z \in \mathrm{PA}_X} b_{XZ}\, z,\ \sigma^2_{W_X}\right). \]
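To make the linear relationship concrete, the sketch below draws one joint sample from such a network by visiting the variables in an order where parents come before children. The list-of-tuples representation and the name `sample_gaussian_network` are assumptions made for this illustration. A root carries its own mean µ_X and variance σ²_X, while a non-root has mean term 0 and variance σ²_W_X.

```python
import random

def sample_gaussian_network(nodes, rng):
    """Draw one joint sample from a linear Gaussian network.

    nodes: list of (name, mean, variance, parents) in topological order,
           where parents maps each parent name Z to the coefficient b_XZ.
    """
    values = {}
    for name, mean, variance, parents in nodes:
        # Linear contribution of the parents: sum of b_XZ * z.
        linear_part = sum(b * values[z] for z, b in parents.items())
        # Add normally distributed noise with the given variance.
        values[name] = rng.gauss(mean + linear_part, variance ** 0.5)
    return values

# The network of Figure 4.2: X is a root, and Y = W + 10X.
FIGURE_4_2 = [
    ("X", 40.0, 5.0 ** 2, {}),
    ("Y", 0.0, 30.0 ** 2, {"X": 10.0}),
]

rng = random.Random(1)
draw = sample_gaussian_network(FIGURE_4_2, rng)
print(draw)
```

Setting every variance to 0 makes each variable a deterministic function of its parents, which matches the remark that a zero variance means the value is known.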

When an infinite variance is used in an expression, we take the limit of the expression containing the infinite variance. For example, if σ² = ∞ and σ² appears in an expression, we take the limit as σ² approaches ∞ of the expression. Examples of this appear after we give the algorithm. All infinite variances represent the same limit. That is, if we specify N(x; 0, ∞) and N(y; 0, ∞), in both cases ∞ represents a variable t in an expression for which we take the limit as t → ∞ of the expression. The assumption is that our uncertainty as to the value of X is exactly the same as our uncertainty as to the value of Y. Given this, if we wanted to represent a large but not infinite variance for both variables, we would not use a variance of, say, 1,000,000 to represent our uncertainty as to the value of X and a variance of ln(1,000,000) to represent our uncertainty as to the value of Y. Rather, we would use 1,000,000 in both cases. In the same way, our limits are assumed to be the same. Of course, if it better models the problem, the calculations could be done using different limits, and we would sometimes get different results.
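As a small illustration of treating ∞ as a limit, consider the mean and variance on the right-hand side of Equality 4.4 when the first variance is a variable t that grows without bound. The sketch below evaluates them for increasing finite t; the function name `combine` and the numbers used (a second density N(x; 30, 3²)) are assumptions chosen for illustration.

```python
import math

def combine(mu1, var1, mu2, var2):
    """Mean and variance of the normal density on the right of Equality 4.4."""
    total = var1 + var2
    return (var2 * mu1 + var1 * mu2) / total, var1 * var2 / total

# Let the first variance t grow; the result approaches (mu2, var2) = (30, 9).
limits = [combine(0.0, t, 30.0, 9.0) for t in (1e2, 1e4, 1e6, 1e8)]
for mean, var in limits:
    print(f"mean={mean:.6f}, variance={var:.6f}")

# Substituting infinity directly gives inf/inf, which floating-point
# arithmetic evaluates as nan, so the limit has to be taken explicitly.
naive_mean, naive_var = combine(0.0, math.inf, 30.0, 9.0)
print(naive_mean, naive_var)
```

As t increases, the completely uncertain density stops influencing the result, and the combined mean and variance tend to those of the other density.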

A Bayesian network of the type just described is called a Gaussian Bayesian network. The linear relationship (Equality 4.6) used in Gaussian Bayesian networks has been used in causal models in economics [Joereskog, 1982], in structural equations in psychology [Bentler, 1980], and in path analysis in sociology and genetics [Kenny, 1979], [Wright, 1921].

Before giving the algorithm, we show the formulas used in it. To avoid clutter, in the following formulas we use σ to represent a variance rather than a standard deviation.

The formula for X is as follows:

\[ x = w_X + \sum_{Z \in \mathrm{PA}_X} b_{XZ}\, z \]


In document Learning Bayesian Networks(Neapolitan, Richard) (Page 194-198)