
Factor graphs and inference

• We have a set of variables $X = (X_1, \dots, X_n)$, where each variable $X_i$ can be assigned a value in its domain $\text{Domain}_i$. A full assignment $x = (x_1, \dots, x_n)$ gives a value to each variable, and a partial assignment gives a value to a subset of the variables.

• We have a set of factors (constraints) $f_1, \dots, f_m$. Each factor $f_j$ depends on a subset of the variables $D_j \subseteq X$ and maps each partial assignment to the variables $D_j$ to a non-negative real number representing how "good" that partial assignment is. This is how (local) dependencies between variables are expressed.

• Define a weight for each assignment, $\text{Weight}(x) = \prod_{j=1}^{m} f_j(x)$, which gives a global notion of goodness for the assignment $x$ (notice that this takes into account all the factors). A Markov network further defines a probability distribution over assignments by normalizing the weights:

$$P(X_1 = x_1, \dots, X_n = x_n) = \frac{\text{Weight}(x)}{\sum_{x'} \text{Weight}(x')}.$$

• Given a factor graph (defined by the variables and factors) which contains all the information, we'd like to ask queries about these variables (as an analogy, think about performing SQL queries on databases). There are two primary types of inference:

– MAP inference: compute the maximum weight (full) assignment, $\arg\max_x \text{Weight}(x)$. Note that we don't need probabilities here. This gives us the best global assignment to all the variables, taking into account all the factors (sources of information).

– Marginal inference: suppose we are only interested in a subset of variables $A \subseteq X$. Call these query variables. For each partial assignment $A = a$, compute the weighted fraction of full assignments that are consistent with $A = a$:

$$P(A = a) = \frac{\sum_{x \text{ consistent with } a} \text{Weight}(x)}{\sum_{x'} \text{Weight}(x')}.$$
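To make these definitions concrete, here is a minimal brute-force sketch in JavaScript. JavaScript is used because the demo specifies factors as JavaScript functions. The sketch enumerates every full assignment of a tiny chain A, B, C and computes Weight(x), the MAP assignment, and a marginal. The representation is hypothetical, chosen for illustration rather than taken from the demo's internals: a `domains` object, plus factors of the form `{ vars, fn }`. The same goes for the helper names. Enumeration is exponential in n, so this only illustrates the definitions; it is not a practical algorithm.

```javascript
// Brute-force inference by enumerating all full assignments (exponential in n).
// Hypothetical representation: `domains` maps each variable to its list of values;
// each factor is { vars, fn }, where fn takes the values of `vars` in order.
const domains = { A: [0, 1], B: [0, 1], C: [0, 1] };
const factors = [
  { vars: ['A'], fn: (a) => (a === 1 ? 2 : 1) },
  { vars: ['A', 'B'], fn: (a, b) => (a === b ? 3 : 1) },
  { vars: ['B', 'C'], fn: (b, c) => (b === c ? 3 : 1) },
];

// Yield every full assignment as an object { varName: value }.
function* allAssignments(domains) {
  const names = Object.keys(domains);
  function* rec(i, partial) {
    if (i === names.length) {
      yield { ...partial };
      return;
    }
    for (const value of domains[names[i]]) {
      partial[names[i]] = value;
      yield* rec(i + 1, partial);
    }
  }
  yield* rec(0, {});
}

// Weight(x) = product of f_j(x) over all factors (Number() maps true/false to 1/0).
function weight(factors, x) {
  return factors.reduce((w, f) => w * Number(f.fn(...f.vars.map((v) => x[v]))), 1);
}

// MAP inference: arg max_x Weight(x); no normalization is needed.
function mapInference(domains, factors) {
  let best = { assignment: null, weight: -Infinity };
  for (const x of allAssignments(domains)) {
    const w = weight(factors, x);
    if (w > best.weight) best = { assignment: x, weight: w };
  }
  return best;
}

// Marginal inference: P(A = a) = (total weight of x consistent with a) / (total weight).
function marginalInference(domains, factors, queryVars) {
  const totals = new Map();
  let z = 0;
  for (const x of allAssignments(domains)) {
    const w = weight(factors, x);
    const key = queryVars.map((v) => x[v]).join(' ');
    totals.set(key, (totals.get(key) ?? 0) + w);
    z += w;
  }
  for (const [key, w] of totals) totals.set(key, w / z);
  return totals;
}

console.log(mapInference(domains, factors)); // assignment { A: 1, B: 1, C: 1 } with weight 18
console.log(marginalInference(domains, factors, ['C'])); // '0' -> 22/48, '1' -> 26/48
```

Here the eight weights sum to 48, and the C = 1 assignments carry 26 of that, so P(C = 1) = 26/48.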


Inference algorithms

We will consider three types of inference algorithms. Each type consists of two variants, one for MAP inference and one for marginal inference. Each algorithm has a set of knobs which can affect both speed and accuracy.

Type                  | MAP inference                    | Marginal inference
Variable elimination  | max variable elimination         | sum variable elimination
Incremental           | beam search                      | particle filtering
Local                 | iterated conditional modes (ICM) | Gibbs sampling


Exact inference

• Variable elimination: the idea is to iteratively eliminate one variable at a time. Eliminating a variable $X_i$ replaces all the factors that depend on $X_i$ with one new factor. That factor is obtained by multiplying those factors together and then either summing or maxing over the possible values of $X_i$.

– The new factor depends on every variable that currently shares a factor with $X_i$ (its Markov blanket), so its arity can be as large as $n - 1$. Storing the factor as a table takes space exponential in its arity, so in practice, variable elimination is mostly applied to tree-structured factor graphs or graphs with low tree-width.

– How do we answer our queries?

∗ For marginal inference, we eliminate all the non-query variables X − A, producing a new factor graph with variables A. Then just enumerate over each possible assignment A = a and evaluate its weight under the new factor graph. We can normalize these weights to get P(A = a).

∗ For MAP inference, we eliminate all the variables. This might seem mysterious since we want a full assignment to all the variables, but they're all gone now. The trick is to store, for each entry of each new factor, the value of the eliminated variable that achieved the max (this essentially provides a back pointer). Then we go through the created factors in reverse elimination order, setting each eliminated variable to the value that achieved the max given the variables already assigned.

– The only knob in variable elimination is the elimination order. There can be a huge difference in time/space complexity depending on the order. For trees, always eliminate leaves. For general graphs, choosing the variable that will create the smallest new factor is a reasonable heuristic. Variable elimination always provides the exact answer.

• Backtracking: the idea is to recursively enumerate all possible assignments to compute the desired max or sum. We can employ variable/value ordering heuristics, and we can further prune zero-weight assignments using forward checking or AC-3.
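Below is a compact sketch of sum and max variable elimination on the same tiny chain A, B, C. It is written in JavaScript and uses a hypothetical tabular representation: each factor is a `Map` keyed by space-separated values. The function names `sumElimination` and `maxElimination` are illustrative; they are not the demo's `sumVariableElimination`/`maxVariableElimination` code.

`eliminate` multiplies the factors that mention a variable and then sums or maxes that variable out. In max mode it also records, for each entry of the new factor, which value achieved the max. Following those back pointers in reverse elimination order then yields the MAP assignment. The caller supplies the elimination order: it must list every non-query variable for marginals, and every variable for MAP.

```javascript
// Variable elimination over tabular factors (hypothetical representation for this sketch):
// a factor is { vars: ['A', 'B'], table: Map from 'a b' (space-separated values) to a number }.
const domains = { A: [0, 1], B: [0, 1], C: [0, 1] };
const keyOf = (vars, x) => vars.map((v) => x[v]).join(' ');

// Yield every joint assignment to `vars` as an object { varName: value }.
function* joint(vars, i = 0, partial = {}) {
  if (i === vars.length) {
    yield { ...partial };
    return;
  }
  for (const value of domains[vars[i]]) {
    partial[vars[i]] = value;
    yield* joint(vars, i + 1, partial);
  }
}

// Build a table from a function of the factor's variable values.
function tabulate(vars, fn) {
  const table = new Map();
  for (const x of joint(vars)) table.set(keyOf(vars, x), fn(...vars.map((v) => x[v])));
  return { vars, table };
}

const factors = [
  tabulate(['A'], (a) => (a === 1 ? 2 : 1)),
  tabulate(['A', 'B'], (a, b) => (a === b ? 3 : 1)),
  tabulate(['B', 'C'], (b, c) => (b === c ? 3 : 1)),
];

// Replace every factor mentioning `target` with one new factor over its neighbors,
// summing ('sum') or maxing ('max') over target's values. In 'max' mode, `back`
// records the value of target that achieved the max for each entry of the new factor.
function eliminate(factors, target, mode) {
  const touching = factors.filter((f) => f.vars.includes(target));
  const rest = factors.filter((f) => !f.vars.includes(target));
  const scope = [...new Set(touching.flatMap((f) => f.vars))].filter((v) => v !== target);
  const table = new Map();
  const back = new Map();
  for (const x of joint(scope)) {
    let best = mode === 'sum' ? 0 : -Infinity;
    let arg;
    for (const t of domains[target]) {
      const y = { ...x, [target]: t };
      const p = touching.reduce((w, f) => w * f.table.get(keyOf(f.vars, y)), 1);
      if (mode === 'sum') {
        best += p;
      } else if (p > best) {
        best = p;
        arg = t;
      }
    }
    table.set(keyOf(scope, x), best);
    if (mode === 'max') back.set(keyOf(scope, x), arg);
  }
  return { factors: [...rest, { vars: scope, table }], record: { target, scope, back } };
}

// Marginal inference: eliminate every non-query variable in `order`, then normalize.
function sumElimination(factors, order, queryVars) {
  for (const v of order) factors = eliminate(factors, v, 'sum').factors;
  const probs = new Map();
  let z = 0;
  for (const x of joint(queryVars)) {
    const w = factors.reduce((acc, f) => acc * f.table.get(keyOf(f.vars, x)), 1);
    probs.set(keyOf(queryVars, x), w);
    z += w;
  }
  for (const [k, w] of probs) probs.set(k, w / z);
  return probs;
}

// MAP inference: eliminate all variables, then follow back pointers in reverse order.
function maxElimination(factors, order) {
  const records = [];
  for (const v of order) {
    const step = eliminate(factors, v, 'max');
    factors = step.factors;
    records.push(step.record);
  }
  // Every remaining factor has an empty scope; their product is the max weight.
  const weight = factors.reduce((acc, f) => acc * f.table.get(''), 1);
  const assignment = {};
  for (const { target, scope, back } of records.reverse()) {
    assignment[target] = back.get(keyOf(scope, assignment));
  }
  return { assignment, weight };
}

console.log(sumElimination(factors, ['A', 'B'], ['C'])); // '0' -> 22/48, '1' -> 26/48
console.log(maxElimination(factors, ['A', 'B', 'C'])); // C = 1, B = 1, A = 1 with weight 18
```

On a chain, eliminating from a leaf (A first) keeps every intermediate factor unary. This is the "always eliminate leaves" advice for trees.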


Approximate inference

• Incremental inference: the idea is to maintain a set of partial assignments. In each iteration, we extend every partial assignment to one new variable and then prune the resulting assignments so that at most K remain.

– The output of beam search and particle filtering is a set of weighted full assignments. For MAP inference, take the highest-weight assignment. For marginal inference, treat these particles as an approximation of the joint distribution over all variables. Any query P(A = a) can then be estimated by computing the weighted fraction of particles that satisfy A = a.

– The main knob to set is K, the beam size / number of particles. For particle filtering, we also choose a proposal distribution for extending partial assignments, which for HMMs is usually taken to be the transition distribution.

– Both beam search and particle filtering are approximate algorithms that become more exact as K increases. Their success relies on having strong local information (from the factors included so far). If there are long-range dependencies, then all bets are off: an assignment pruned early cannot be recovered when a later factor reveals that it was good.

• Local inference: the idea is to maintain a full assignment and iteratively reassign one variable at a time conditioned on all others, either by sampling or maximizing.

– Both ICM and Gibbs sampling are approximate. Their success relies on having fairly weak dependencies between the variables. The main knob to set is the initial assignment, which can be crucial for good performance (especially for ICM).

– In practice, Gibbs sampling (or MCMC algorithms in general) can be repurposed for MAP inference (these algorithms are known as simulated annealing or stochastic hill-climbing).
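The sketch below illustrates one incremental and one local method on the same three-variable chain. It is written in JavaScript with hypothetical `{ vars, fn }` factors; the function names are illustrative and not the demo's code.

- `beamSearch` extends partial assignments one variable at a time and keeps the K heaviest. It multiplies in each factor as soon as all of that factor's variables are assigned.
- `localSearch` sweeps over the variables and recomputes each variable's conditional weights from only the factors that touch it. It then either takes the argmax (ICM) or samples (Gibbs), and in Gibbs mode it tallies visit frequencies as marginal estimates.

Particle filtering and simulated annealing are not shown.

```javascript
// Hypothetical factor representation: { vars, fn }, where fn takes the values of `vars` in order.
const domains = { A: [0, 1], B: [0, 1], C: [0, 1] };
const factors = [
  { vars: ['A'], fn: (a) => (a === 1 ? 2 : 1) },
  { vars: ['A', 'B'], fn: (a, b) => (a === b ? 3 : 1) },
  { vars: ['B', 'C'], fn: (b, c) => (b === c ? 3 : 1) },
];
const evalFactor = (f, x) => Number(f.fn(...f.vars.map((v) => x[v])));

// Beam search: extend each partial assignment to the next variable, multiply in the
// factors whose variables are now all assigned, and keep only the K heaviest.
function beamSearch(order, K) {
  let beam = [{ x: {}, w: 1 }];
  for (const v of order) {
    const extended = [];
    for (const { x, w } of beam) {
      for (const value of domains[v]) {
        const y = { ...x, [v]: value };
        const completed = factors.filter((f) => f.vars.includes(v) && f.vars.every((u) => u in y));
        extended.push({ x: y, w: completed.reduce((acc, f) => acc * evalFactor(f, y), w) });
      }
    }
    extended.sort((p, q) => q.w - p.w);
    beam = extended.slice(0, K);
  }
  return beam; // weighted full assignments, heaviest first
}

// Local search: repeatedly reassign each variable given all the others.
// mode 'icm' takes the best value; mode 'gibbs' samples it and tallies visit frequencies.
function localSearch(init, steps, mode, rand = Math.random) {
  const x = { ...init };
  const names = Object.keys(domains);
  const counts = Object.fromEntries(names.map((v) => [v, new Map(domains[v].map((d) => [d, 0]))]));
  for (let s = 0; s < steps; s++) {
    for (const v of names) {
      // Only factors touching v affect its conditional distribution given the rest.
      const local = factors.filter((f) => f.vars.includes(v));
      const ws = domains[v].map((value) =>
        local.reduce((acc, f) => acc * evalFactor(f, { ...x, [v]: value }), 1),
      );
      let i = 0;
      if (mode === 'icm') {
        i = ws.indexOf(Math.max(...ws));
      } else {
        // Sample index i with probability ws[i] / sum(ws).
        let r = rand() * ws.reduce((a, b) => a + b, 0);
        while (i < ws.length - 1 && r >= ws[i]) r -= ws[i++];
      }
      x[v] = domains[v][i];
    }
    for (const v of names) counts[v].set(x[v], counts[v].get(x[v]) + 1);
  }
  const marginals = Object.fromEntries(
    names.map((v) => [v, new Map([...counts[v]].map(([d, c]) => [d, c / steps]))]),
  );
  return { assignment: x, marginals };
}

console.log(beamSearch(['A', 'B', 'C'], 2)[0]); // { x: { A: 1, B: 1, C: 1 }, w: 18 }
console.log(localSearch({ A: 0, B: 0, C: 0 }, 10, 'icm').assignment); // stays at A = B = C = 0
console.log(localSearch({ A: 0, B: 0, C: 0 }, 100000, 'gibbs').marginals.C); // P(C = 1) near 26/48
```

Started from all zeros, ICM never leaves A = B = C = 0 (weight 9), even though the MAP assignment has weight 18. This illustrates why the initial assignment matters so much for ICM. Gibbs sampling from the same start still reaches the high-weight region, because it moves to worse values with nonzero probability.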


Documentation

This demo allows you to programmatically construct your own factor graph and step through various inference algorithms.

• Problem definition

– variable(variableName, domain): adds a variable to the factor graph, where domain is a list of possible values (e.g., variable('A', [0, 1])).

– factor(factorName, variables, spec): adds a factor to the factor graph, where variables is a string representing a space-separated list of variable names. spec is either a function that takes the values of those variables (one argument per variable, in the listed order) and returns the factor value, or a map from a partial assignment (represented as a space-separated string of values) to the factor value. Example: factor('f1', 'A B', function(a, b) {return a < b;})

– condition(variableName, value): assigns value to variableName (conditioning on this observation).

– query(variables): sets the query variables to variables, which is a string representing a space-separated list of variable names. Example: query('A B').

• Inference algorithms

– maxVariableElimination(opts), sumVariableElimination(opts): run variable elimination (opts.order: space-separated list of variables to eliminate)

– beamSearch(opts), particleFiltering(opts): run incremental inference algorithms (opts.K: number of candidates)

– iteratedConditionalModes(opts) and Gibbs sampling: run local inference algorithms (opts.steps: number of steps of the algorithm to take between renderings)
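To illustrate the two spec forms for factor(), consider the documented example function(a, b) {return a < b;} over A, B ∈ {0, 1}. It should correspond to the map {'0 0': 0, '0 1': 1, '1 0': 0, '1 1': 0}, assuming the demo treats a returned true/false as 1/0. The standalone JavaScript below runs outside the demo and only compares the two forms; `specsAgree` is a hypothetical helper, not part of the demo's API.

```javascript
// Two equivalent specs for a factor over A, B in {0, 1} (assumes true/false count as 1/0).
const domain = [0, 1];
const funcSpec = function (a, b) { return a < b; };
// Map keys are the variables' values as a space-separated string, in the order 'A B'.
const mapSpec = { '0 0': 0, '0 1': 1, '1 0': 0, '1 1': 0 };

// Hypothetical helper: do the two specs give the same value on every joint assignment?
function specsAgree(fn, map, domA, domB) {
  for (const a of domA) {
    for (const b of domB) {
      if (Number(fn(a, b)) !== map[`${a} ${b}`]) return false;
    }
  }
  return true;
}

console.log(specsAgree(funcSpec, mapSpec, domain, domain)); // true
```

Inside the demo, either form would be passed as the spec argument, e.g. factor('f1', 'A B', {'0 0': 0, '0 1': 1, '1 0': 0, '1 1': 0}).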