
2.2 Object segmentation in computer vision

2.2.3 Inference in Markov Random Fields

Inference in MRF-based object segmentation is the process of predicting the label values by combining cues from the different energy terms or, equivalently, by minimizing the energy defined by the energy function. In a probabilistic framework, the possible label configurations are fully described by the posterior distribution of the label variables given the input. In practice, we usually want a point estimate of this distribution, such as its mean or mode, as the labeling output. Each estimator has an associated loss function that quantifies the discrepancy between the estimated configuration and the “ideal” configuration, and the estimator minimizes the expected value of that loss under the posterior. The MAP estimate and the MPM estimate are widely used:

• MAP estimate: The Maximum A Posteriori (MAP) estimate of the labeling D given the image I is the mode of the posterior distribution,

$$D^{*} = \arg\max_{D} P(D \mid I), \qquad (2.24)$$

where the loss function is the 0-1 loss, $L(D, \hat{D}) = 1 - \delta(D, \hat{D})$, with $\delta$ the Kronecker delta.

• MPM estimate: The Marginal Posterior Mode (MPM) estimate is the per-node mode of the marginal posterior distribution,

$$d_i^{*} = \arg\max_{d_i} P(d_i \mid I), \quad \forall i, \qquad (2.25)$$

where the loss function is the Hamming loss, $L(D, \hat{D}) = |\{i : d_i \neq \hat{d}_i\}|$.
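To make the difference between the two estimators concrete, the following minimal sketch computes both from a small, made-up posterior table over two binary labels. The table values and the function names `map_estimate` and `mpm_estimate` are illustrative assumptions, not part of the original formulation; enumeration of the full posterior is only possible for toy problems like this one.

```python
# Hypothetical posterior P(D | I) over two binary labels (d1, d2).
# The probabilities are made up for illustration and sum to 1.
posterior = {(0, 0): 0.4, (0, 1): 0.0, (1, 0): 0.3, (1, 1): 0.3}


def map_estimate(posterior):
    """Return the mode of the joint posterior (Eq. 2.24)."""
    return max(posterior, key=posterior.get)


def mpm_estimate(posterior, num_labels=2):
    """Return the per-node modes of the marginal posteriors (Eq. 2.25)."""
    num_nodes = len(next(iter(posterior)))
    estimate = []
    for i in range(num_nodes):
        # Marginalize the joint table over all nodes except node i.
        marginal = [0.0] * num_labels
        for labeling, prob in posterior.items():
            marginal[labeling[i]] += prob
        estimate.append(max(range(num_labels), key=marginal.__getitem__))
    return tuple(estimate)


print(map_estimate(posterior))  # (0, 0)
print(mpm_estimate(posterior))  # (1, 0)
```

With this table the MAP labeling is (0, 0), while the MPM labeling is (1, 0), so the two estimators need not agree.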

Exact computation of these estimators is feasible only for probabilistic models with special structure. For other model structures we have to use approximate algorithms, since exact inference is NP-hard in general. We discuss four types of inference algorithms below; the latter three are all approximate.

Exact inference. In certain restricted situations, it is possible to efficiently compute the MAP labeling in MRFs by constructing a specialized graph. In particular, [62] presents the graph cut algorithm, or the minimum cut/maximum flow algorithm, for binary image segmentation. In the case of a tree-structured graph, the Belief Propagation (BP) algorithm [156] is able to compute the marginals or modes of the model distribution. The BP algorithm propagates a set of messages carrying the interaction information through the tree model until they achieve consistency.
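As an illustration of exact message passing, the sketch below runs the min-sum variant of BP (equivalent to max-product in the log domain) on a chain, the simplest tree-structured graph. The energy layout (`unary[i][k]` for node costs, one shared `pairwise[k][l]` table for neighboring labels) and the function name `chain_map` are assumptions made for this example only.

```python
def chain_map(unary, pairwise):
    """Exact MAP labeling of a chain MRF by min-sum message passing.

    unary[i][k]    : cost of assigning label k to node i
    pairwise[k][l] : cost of neighbouring nodes taking labels (k, l)
    Returns (labels, minimum energy).
    """
    n, num_labels = len(unary), len(unary[0])
    cost = list(unary[0])  # best energy of a prefix ending in each label
    backpointers = []
    for i in range(1, n):
        new_cost, pointers = [], []
        for l in range(num_labels):
            # Forward message: best predecessor label for label l at node i.
            best_k = min(range(num_labels),
                         key=lambda k: cost[k] + pairwise[k][l])
            pointers.append(best_k)
            new_cost.append(cost[best_k] + pairwise[best_k][l] + unary[i][l])
        backpointers.append(pointers)
        cost = new_cost
    # Backtrack from the best final label to recover the full labeling.
    label = min(range(num_labels), key=cost.__getitem__)
    labels = [label]
    for pointers in reversed(backpointers):
        label = pointers[label]
        labels.append(label)
    labels.reverse()
    return labels, min(cost)
```

For example, with `unary = [[0.0, 2.0], [1.5, 1.0], [2.0, 0.0]]` and the Potts table `pairwise = [[0.0, 1.0], [1.0, 0.0]]`, the call returns the labeling `[0, 1, 1]` with energy `2.0`.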

Approximate deterministic inference. In the context of image labeling, computing the MAP estimate is essentially a combinatorial optimization problem: an energy minimization over a discrete domain. Two approximate approaches are commonly used for this minimization-based labeling: heuristic local search and relaxation-based methods.

Heuristic local search-based methods start from an initial estimate and search for a local minimum of the energy function within a neighborhood in the state space. The quality of the solution therefore usually depends on the initial estimate and the size of the neighborhood. The neighborhood is defined with respect to certain transformations of the state configuration. For example, the Iterated Conditional Modes (ICM) [112] approach defines the transformation as changing the label of a single node. Boykov et al. propose an effective local search method with a large neighborhood [24]. The algorithm defines two transformations (or moves), the α-expansion and the α-β-swap, which generate a much larger neighborhood in the state space. It greedily searches for a local minimum from the current estimate, and in each step finds the locally optimal transformation that gives the largest decrease in energy. As a result, the local search avoids many poor local minima, and the α-expansion solution can be shown to lie within a known factor of the minimum energy (a factor of 2 for the Potts model). Each local move can be formulated as a graph cut problem that can be solved efficiently.
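The single-node move used by ICM can be sketched as follows. This is a minimal illustration that assumes a 4-connected grid and a Potts smoothness term with weight `smooth`; the function name `icm` and the data layout are hypothetical choices for this example, and the α-expansion and α-β-swap moves are not shown because they require a graph cut solver.

```python
def icm(unary, labels, smooth=1.0, max_sweeps=20):
    """Iterated Conditional Modes on a 4-connected grid with a Potts prior.

    unary[r][c][k] : cost of label k at pixel (r, c)
    labels[r][c]   : initial estimate (left unmodified; a copy is returned)
    """
    height, width = len(unary), len(unary[0])
    num_labels = len(unary[0][0])
    labels = [row[:] for row in labels]

    def local_energy(r, c, k):
        # Energy terms that involve pixel (r, c) when it takes label k.
        energy = unary[r][c][k]
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < height and 0 <= cc < width and labels[rr][cc] != k:
                energy += smooth
        return energy

    for _ in range(max_sweeps):
        changed = False
        for r in range(height):
            for c in range(width):
                current = local_energy(r, c, labels[r][c])
                best = min(range(num_labels),
                           key=lambda k: local_energy(r, c, k))
                # Accept the move only if it strictly lowers the energy.
                if local_energy(r, c, best) < current:
                    labels[r][c] = best
                    changed = True
        if not changed:
            break  # local minimum with respect to single-node moves
    return labels
```

Because only one node changes at a time, the result depends strongly on the initial labeling, which is the limitation that the larger move spaces are designed to reduce.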

General discrete energy minimization can be viewed as an integer programming problem. In relaxation-based methods, linear programming relaxations have been adopted to solve approximately for the MAP solution in MRFs [217, 224]. First, the MAP problem is formulated as an Integer Linear Program (ILP). By relaxing the integer constraints, the problem is converted into a Linear Program (LP) that can be solved more efficiently. An integer solution can then be recovered from the fractional solution of the LP [85].
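For a pairwise MRF, the formulation described above is commonly written as follows. The notation is an assumption introduced here for illustration: $\theta_i(k)$ and $\theta_{ij}(k,l)$ denote the unary and pairwise energies, $\mu_i(k)$ and $\mu_{ij}(k,l)$ are indicator variables for node and edge labelings, and $\mathcal{E}$ is the edge set.

```latex
\begin{aligned}
\min_{\mu}\quad
  & \sum_{i}\sum_{k}\theta_i(k)\,\mu_i(k)
    + \sum_{(i,j)\in\mathcal{E}}\sum_{k,l}\theta_{ij}(k,l)\,\mu_{ij}(k,l) \\
\text{s.t.}\quad
  & \sum_{k}\mu_i(k) = 1 \quad \forall i, \\
  & \sum_{l}\mu_{ij}(k,l) = \mu_i(k), \qquad
    \sum_{k}\mu_{ij}(k,l) = \mu_j(l)
    \quad \forall (i,j)\in\mathcal{E},\ \forall k,l, \\
  & \mu_i(k),\ \mu_{ij}(k,l) \in \{0,1\} \quad \text{(ILP)}
    \;\longrightarrow\;
    \mu_i(k),\ \mu_{ij}(k,l) \ge 0 \quad \text{(LP relaxation)}.
\end{aligned}
```

With the integrality constraints in place, every feasible $\mu$ corresponds to exactly one labeling and the objective equals its energy. Replacing them with non-negativity enlarges the feasible set, so the LP optimum is a lower bound on the minimum energy and may be fractional.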

Variational inference. In variational approximation, we use an approximating family of label probability distributions that are simpler than the original distribution and in which the inference is tractable. During inference, we choose a specific distribution from the approximating family to match the original distribution. The marginals or modes of the approximating distribution are used as substitutes for the original ones.

The simplest variational method, called mean field approximation, was originally a method for approximating the mean of an MRF. Originating in statistical mechanics, mean field approximation uses an approximating family with a fully factorized form [229]. In general, mean field approximation only yields a good-quality result when the nodes do not fluctuate much around their mean values. The algorithm can be thought of as a parallel message-passing algorithm in which each node sends an identical message to each of its neighbors at a particular time step. That message is, in turn, based on the messages the node received from its neighbors. The mean field approximation can be improved by using factorial distributions in which each component is a larger but still tractable subgraph of the original factor graph, leading to the structured mean field approach [176]. The fully factorized mean field algorithm is sometimes referred to as naive mean field in comparison.
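A minimal sketch of the naive mean field update is given below, assuming a pairwise MRF with a Potts smoothness term of weight `smooth`, so that the posterior is proportional to the exponential of the negative energy. The function name `mean_field`, the adjacency-list input `neighbors`, and the parallel update schedule are choices made for this example.

```python
import math


def mean_field(unary, neighbors, smooth=1.0, iters=50):
    """Naive (fully factorized) mean field for a Potts MRF.

    unary[i][k]  : cost of label k at node i
    neighbors[i] : list of nodes adjacent to node i
    Returns q, where q[i][k] approximates the marginal P(d_i = k | I).
    """
    n, num_labels = len(unary), len(unary[0])
    q = [[1.0 / num_labels] * num_labels for _ in range(n)]
    for _ in range(iters):
        new_q = []
        for i in range(n):
            # Expected energy of label k under the neighbours' current
            # beliefs: each neighbour j disagrees with probability 1 - q[j][k].
            logits = [
                -unary[i][k]
                - smooth * sum(1.0 - q[j][k] for j in neighbors[i])
                for k in range(num_labels)
            ]
            top = max(logits)  # subtract the max for numerical stability
            weights = [math.exp(v - top) for v in logits]
            total = sum(weights)
            new_q.append([w / total for w in weights])
        q = new_q  # parallel update: all nodes use the previous beliefs
    return q
```

Each node's update depends on its neighbors only through their current beliefs `q[j]`, which is the sense in which a node broadcasts the same message to all of its neighbors.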

A more sophisticated approximation based on BP, called Loopy Belief Propagation (Loopy BP), uses a more complicated approximating family that includes pairwise marginals. In particular, unlike in mean field, the messages a node sends to its different neighbors at a given time step are different. See [215] for a comparison between the mean field and Loopy BP algorithms.
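The per-neighbor messages can be sketched with a sum-product implementation such as the one below. It assumes a pairwise MRF with Potts interactions of weight `smooth` and a fixed number of parallel iterations; the function name `loopy_bp` and the edge-list input are hypothetical. On a tree the same updates reduce to exact BP, while on a graph with cycles the beliefs are only approximations and convergence is not guaranteed.

```python
import math


def loopy_bp(unary, edges, smooth=1.0, iters=30):
    """Sum-product loopy BP for a pairwise Potts MRF.

    unary[i][k] : cost of label k at node i
    edges       : list of undirected edges (i, j)
    Returns beliefs[i][k], an approximation of P(d_i = k | I).
    """
    n, num_labels = len(unary), len(unary[0])
    node_pot = [[math.exp(-u) for u in row] for row in unary]
    disagree = math.exp(-smooth)  # Potts potential for unequal labels
    nbrs = {i: [] for i in range(n)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    # One message per directed edge, initialised to uniform.
    msg = {(i, j): [1.0 / num_labels] * num_labels
           for i in nbrs for j in nbrs[i]}
    for _ in range(iters):
        new_msg = {}
        for (i, j) in msg:
            out = []
            for l in range(num_labels):
                total = 0.0
                for k in range(num_labels):
                    term = node_pot[i][k] * (1.0 if k == l else disagree)
                    # Combine incoming messages from all neighbours except j,
                    # so the message to each neighbour is different.
                    for h in nbrs[i]:
                        if h != j:
                            term *= msg[(h, i)][k]
                    total += term
                out.append(total)
            norm = sum(out)
            new_msg[(i, j)] = [v / norm for v in out]
        msg = new_msg
    beliefs = []
    for i in range(n):
        b = list(node_pot[i])
        for h in nbrs[i]:
            b = [b[k] * msg[(h, i)][k] for k in range(num_labels)]
        norm = sum(b)
        beliefs.append([v / norm for v in b])
    return beliefs
```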

Sampling-based inference. Sampling methods are a general approach to approximate inference, commonly used to handle intractable posterior distributions in MRFs. Markov Chain Monte Carlo (MCMC) sampling methods, including Gibbs sampling [56] and Metropolis-Hastings sampling [216], are widely used in practice. The basic idea behind MCMC is to define a Markov chain whose stationary distribution is the target distribution. After drawing samples from the Markov chain, we can estimate the distribution or its statistics from those samples. In contrast to deterministic approximations, MCMC estimates are asymptotically unbiased: they converge to the true values in the limit.

In Gibbs sampling, the algorithm repeatedly sweeps through the MRF, updating one node at a time. At each step, a node is updated with a random draw from its conditional distribution, holding all other nodes fixed (in an MRF, this conditional depends only on the neighboring nodes). The Metropolis-Hastings algorithm provides a more general approach: at each iteration it uses a proposal distribution to sample a candidate labeling given the current configuration, and replaces the current labeling only with a certain acceptance probability.
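The Gibbs sweep can be sketched as follows, assuming a pairwise MRF with a Potts smoothness term and a posterior proportional to the exponential of the negative energy. The function name `gibbs_marginals`, the burn-in length, and the fixed seed are illustrative assumptions; the returned label frequencies can be used as estimates of the marginals needed for the MPM labeling.

```python
import math
import random


def gibbs_marginals(unary, neighbors, smooth=1.0,
                    sweeps=2000, burn_in=200, seed=0):
    """Estimate node marginals of a Potts MRF by Gibbs sampling.

    unary[i][k]  : cost of label k at node i
    neighbors[i] : list of nodes adjacent to node i
    """
    rng = random.Random(seed)
    n, num_labels = len(unary), len(unary[0])
    labels = [rng.randrange(num_labels) for _ in range(n)]
    counts = [[0] * num_labels for _ in range(n)]
    for sweep in range(sweeps):
        for i in range(n):
            # Conditional of node i given its neighbours' current labels.
            weights = [
                math.exp(-unary[i][k]
                         - smooth * sum(labels[j] != k for j in neighbors[i]))
                for k in range(num_labels)
            ]
            labels[i] = rng.choices(range(num_labels), weights=weights)[0]
        if sweep >= burn_in:  # discard early samples taken before mixing
            for i in range(n):
                counts[i][labels[i]] += 1
    kept = sweeps - burn_in
    return [[c / kept for c in row] for row in counts]
```

Consecutive samples from the chain are correlated, so the number of sweeps needed for a given accuracy is typically much larger than it would be for independent draws.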

Theoretically, the estimates provided by sampling become exact in the limit as the sample size grows to infinity. In practice, however, sampling-based methods are computationally expensive, as many samples are needed to obtain a good estimate. Methods have been proposed to improve sampling efficiency, particularly in graphical models with special structures [71].

Simulated Annealing (SA) [205] is another sampling-based algorithm that can be used for MAP inference. It draws samples from the annealed posterior distribution as the temperature decreases. When the temperature gets close to zero, only MAP states have significant probability mass. In theory SA therefore yields the global MAP estimate, but only if the annealing takes place in infinitesimal steps, and Gibbs sampling has to be run each time the temperature is reduced.
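A minimal sketch of this procedure is shown below. It assumes a Potts MRF, one Gibbs sweep per temperature, and a geometric cooling schedule; the schedule, its parameters `t_start` and `cooling`, and the function name `simulated_annealing` are illustrative assumptions. A finite schedule like this one carries no guarantee of reaching the global MAP estimate.

```python
import math
import random


def simulated_annealing(unary, neighbors, smooth=1.0, t_start=5.0,
                        cooling=0.95, steps=200, seed=0):
    """Approximate MAP labeling of a Potts MRF by annealed Gibbs sampling.

    unary[i][k]  : cost of label k at node i
    neighbors[i] : list of nodes adjacent to node i
    """
    rng = random.Random(seed)
    n, num_labels = len(unary), len(unary[0])
    labels = [rng.randrange(num_labels) for _ in range(n)]
    temperature = t_start
    for _ in range(steps):
        for i in range(n):
            energies = [
                unary[i][k] + smooth * sum(labels[j] != k for j in neighbors[i])
                for k in range(num_labels)
            ]
            # Sample from the conditional raised to the power 1/temperature.
            # Shifting by the minimum keeps at least one weight equal to 1.
            e_min = min(energies)
            weights = [math.exp(-(e - e_min) / temperature) for e in energies]
            labels[i] = rng.choices(range(num_labels), weights=weights)[0]
        temperature *= cooling  # geometric cooling (an assumed schedule)
    return labels
```

At high temperature the sampler moves almost freely between labelings; as the temperature approaches zero the update concentrates on the lowest-energy label of each node, so the final sweeps behave like a greedy single-node search.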
