
3.2 Multinomial Pattern Matching

3.2.1 MPM Statistical Model

The underlying statistical model used by MPM is a Dirichlet-Multinomial (DM) model in which each quantized pixel is assumed to be a realization of an independent but not necessarily identically distributed (INID) Multinomial random variable (RV) whose underlying probabilities are Dirichlet distributed. This model results from using a Bayesian estimate, computed from a set of in-class training images, of the probability that a pixel realizes a given quantization level [40]. In the special case Nq = 2, this model reduces to the Beta-Bernoulli model discussed previously.
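To make the INID modeling assumption concrete, the following is a minimal forward-sampling sketch, not the MPM training procedure itself: each pixel gets its own probability vector drawn from a Dirichlet, and each training image then draws every pixel's level independently from that pixel's categorical (single-draw multinomial) distribution. The function name `sample_training_labels`, the per-pixel parameter list `alphas`, and the fixed seed are hypothetical choices for illustration.

```python
import random

def sample_training_labels(alphas, n_images, seed=0):
    """Draw quantized, flattened images from a hypothetical DM model.

    alphas: K Dirichlet parameter vectors (one per pixel), each of length Nq.
    Returns n_images rows of K labels in {1, ..., Nq}.
    """
    rng = random.Random(seed)
    columns = []
    for alpha in alphas:
        # Dirichlet draw via normalized Gamma(alpha_q, 1) variates.
        g = [rng.gammavariate(a, 1.0) for a in alpha]
        total = sum(g)
        p = [x / total for x in g]
        levels = list(range(1, len(alpha) + 1))
        # N IID single-draw (categorical) realizations for this pixel.
        columns.append(rng.choices(levels, weights=p, k=n_images))
    # Pixels may differ in p (INID); transpose so each row is one image.
    return [list(row) for row in zip(*columns)]

labels = sample_training_labels([[1.0, 1.0, 1.0]] * 6, n_images=5)
print(len(labels), len(labels[0]))   # 5 6
```

With Nq = 2, each Dirichlet draw is a Beta draw and each categorical draw is a Bernoulli draw, which is the Beta-Bernoulli reduction mentioned above.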

To illustrate, consider the MPM training procedure shown in Figure 3.1. The training dataset consists of N training images sharing the same type/pose class label. Each row is one quantized and flattened training image, indexed by pixel location i ∈ [1 . . . K]. The images are quantized to Nq values, yielding labels in the set {1, . . . , Nq}.

MPM then assumes that each column i is composed of N IID realizations of the i-th of the K INID underlying multinomial RVs.
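The training layout described above (one row per flattened image, one column per pixel) can be summarized into the per-pixel counts of each quantization level that parametrize the model. The sketch below assumes the labels are stored as a plain N x K list of lists with 1-based levels; the function name `count_levels` and the small `train` array are hypothetical.

```python
def count_levels(labels, n_levels):
    """Return a K x Nq table of counts N_iq from an N x K label matrix.

    labels: N rows (flattened quantized images) with values in {1..Nq}.
    """
    n_pixels = len(labels[0])
    counts = [[0] * n_levels for _ in range(n_pixels)]
    for image in labels:               # each row is one training image
        for i, q in enumerate(image):  # column i holds pixel i's realizations
            counts[i][q - 1] += 1      # levels are 1-based in the text
    return counts

# Hypothetical data: N = 4 images, K = 3 pixels, Nq = 3 levels.
train = [[1, 2, 2],
         [1, 3, 2],
         [2, 3, 2],
         [1, 3, 1]]
print(count_levels(train, 3))   # [[3, 1, 0], [0, 1, 3], [1, 3, 0]]
```

Each row of the result sums to N, since every image contributes exactly one realization per pixel.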

If the underlying probabilities \vec{p}_i, where \vec{p}_i is an [Nq x 1] vector, were known, the likelihood of the quantization levels observed at a pixel across the training images could be written as

\Pr(I \mid \vec{p}) = \prod_{q=1}^{N_q} p_q^{N_q}    (3.17)

where the dependence on pixel i has been suppressed and the exponent N_q = \sum_{n=1}^{N} \mathbb{1}[I_n = q], not to be confused with the number of levels N_q in the upper limit, is the count of quantile realization q at pixel location i across the N training images (I_n denoting the quantized value of that pixel in training image n). As these probabilities are unknown, they must first be estimated, and MPM uses a Bayesian approach. A Dirichlet prior is chosen because it is the conjugate prior for the multinomial likelihood, so Bayes' rule yields the posterior \Pr(\vec{p} \mid I) in closed form. The prior Dirichlet distribution can be written as

\Pr(\vec{p} \mid \vec{\alpha}) \propto \prod_{q=1}^{N_q} p_q^{\alpha_q - 1}, \qquad \vec{p} \sim \mathrm{Dirichlet}\{\vec{\alpha}\}    (3.18)

where the omitted normalizing constant is a ratio of gamma functions. The parameter \vec{\alpha} can be interpreted as virtual counts of quantile realizations, capable of encoding any a priori information on the underlying probabilities. The available reference on MPM leaves \vec{\alpha} as a general tuning parameter that can be set to any positive value [3]; however, choosing \vec{\alpha} = \vec{1}, again an [Nq x 1] vector and commonly known as Laplace's prior, has been found to be an effective default. It can then be shown that, by conditioning on the set of N training images [37], the posterior distribution on the underlying probabilities becomes

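\Pr(\vec{p} \mid I) \propto \prod_{q=1}^{N_q} p_q^{N_q + \alpha_q - 1}    (3.19)

This is recognized as the kernel of a Dirichlet distribution, yielding the Dirichlet-Multinomial (DM) model

\vec{X}_i \sim \mathrm{Multinomial}\{\vec{p}_i\}, \qquad \vec{p}_i \sim \mathrm{Dirichlet}\{\overrightarrow{N_{iq} + \alpha}\}    (3.20)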

where \vec{X}_i is a vector-valued RV representing the pixel realization at index i ∈ [1 . . . K] and \vec{N}_{iq} is the vector of counts of each of the Nq quantile realizations at pixel i, calculated over the N training images. Therefore, the independent Multinomial RVs representing the class-conditional template distributions are fully parametrized by the per-pixel counts of quantile realizations across the N training images and by the prior parameter α. Alternatively, the equivalent empirical probabilities \hat{\vec{p}}_i can be used, calculated as

\hat{p}_{iq} = \frac{N_{iq}}{N}, \quad i \in [1 \ldots K], \; q \in [1 \ldots N_q]    (3.21)
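The conjugate update in (3.17) through (3.21) amounts to adding the prior parameter to the counts. The sketch below computes the posterior Dirichlet parameters and the empirical probabilities from a count table, then checks numerically at one pixel that the log of the likelihood kernel (3.17) plus the log of the prior kernel (3.18) equals the log of the posterior kernel (3.19). The function names, the hypothetical count table, the test point `p`, and the scalar (symmetric) `alpha` are illustrative assumptions; normalizing constants are deliberately omitted, as in the text.

```python
import math

def log_kernel(p, exponents):
    """log of prod_q p_q**e_q with the normalizing constant omitted.

    Assumes p_q > 0 wherever e_q != 0 (math.log rejects zero).
    """
    return sum(e * math.log(p_q) for p_q, e in zip(p, exponents) if e != 0)

def posterior_parameters(counts, alpha=1.0):
    """Dirichlet posterior parameters N_iq + alpha for every pixel, as in (3.20).

    A scalar alpha assumes a symmetric prior; alpha = 1.0 is Laplace's prior.
    """
    return [[n + alpha for n in row] for row in counts]

def empirical_probabilities(counts, n_images):
    """Empirical probabilities p_hat_iq = N_iq / N, as in (3.21)."""
    return [[n / n_images for n in row] for row in counts]

# Hypothetical counts: K = 3 pixels, Nq = 3 levels, N = 4 training images.
counts = [[3, 1, 0], [0, 1, 3], [1, 3, 0]]
print(posterior_parameters(counts))        # [[4.0, 2.0, 1.0], [1.0, 2.0, 4.0], [2.0, 4.0, 1.0]]
print(empirical_probabilities(counts, 4))  # [[0.75, 0.25, 0.0], [0.0, 0.25, 0.75], [0.25, 0.75, 0.0]]

# Conjugacy check at pixel 0 with a non-Laplace prior alpha = 0.5:
# log (3.17) + log (3.18) equals log (3.19) because the exponents add.
alpha = 0.5
p = [0.5, 0.3, 0.2]                        # any strictly positive test point
lik = log_kernel(p, counts[0])
prior = log_kernel(p, [alpha - 1] * len(p))
post = posterior_parameters([counts[0]], alpha)[0]
posterior = log_kernel(p, [a - 1 for a in post])
print(math.isclose(lik + prior, posterior))  # True
```

Working with log kernels avoids underflow when counts are large, which is one reason a practical implementation might prefer them over the raw products.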

As discussed in the following section, MPM requires the calculation of two normalization terms, namely the mean and variance of a quadratic penalty function, which in turn requires calculating the moments of the \vec{X}_i term in (3.20). Although the multinomial distribution depends on the number of draws, the MPM hypothesis test is designed to determine whether or not a single image originated from a given class-conditional DM template; therefore, we assume a single draw. The mean of \vec{X}_i can be written as [22]

\mathrm{E}[\vec{X}_i] = \frac{\overrightarrow{N_{iq} + \alpha}}{N + N_q\alpha} = \frac{N\hat{\vec{p}}_i + \alpha}{N + N_q\alpha} = \tilde{\vec{p}}_i    (3.22)

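and the variance of \vec{X}_i, component by component, as

\mathrm{Var}[X_{iq}] = \frac{N_{iq} + \alpha}{N + N_q\alpha}\left(1 - \frac{N_{iq} + \alpha}{N + N_q\alpha}\right) = \tilde{p}_{iq}\,(1 - \tilde{p}_{iq}), \quad q \in [1 \ldots N_q]    (3.23)

where the denominator of these expressions results from \sum_{q=1}^{N_q} (N_{iq} + \alpha) = N + N_q\alpha. It is noted that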

(3.22) is the minimum mean-squared error (MMSE) estimate of the underlying probabilities conditioned on the observed counts, or empirical probabilities, in the training dataset; this estimate will be used later in Section 3.2.3. Note also that this notation assumes the vector, or 1-of-K, form of the Multinomial distribution for a single draw, sometimes referred to as a categorical distribution, so (3.22) and (3.23) are [Nq x 1] vectors [21]. In the more general multinomial case, these vectors describe the moments of a distribution of counts across an arbitrary number of trials. Assuming only a single trial simplifies matters greatly, as will be seen in Section 3.2.3: because each component of a one-hot vector is 0 or 1 and at most one component is nonzero, the cross-terms can be disregarded and the higher-order moments are equal to the first-order moment.
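The single-draw moments can be checked by direct enumeration. For one draw, the DM marginal of \vec{X}_i is categorical with probabilities \tilde{\vec{p}}_i, so the sketch below computes (3.22) and (3.23) for one pixel with exact fractions, then enumerates the one-hot outcomes to confirm that higher-order moments equal the mean, that E[X_q^2] - E[X_q]^2 reproduces (3.23), and that cross-terms vanish. The function names, the hypothetical counts, and the restriction of `alpha` to an int or `Fraction` (to keep arithmetic exact) are assumptions of this sketch.

```python
from fractions import Fraction

def single_draw_moments(counts_i, alpha=1):
    """Mean (3.22) and variance (3.23) of the one-hot X_i at one pixel.

    counts_i: the counts N_iq for q = 1..Nq. alpha must be an int or Fraction
    so that Fraction arithmetic stays exact (symmetric prior assumed).
    """
    n_levels = len(counts_i)
    denom = sum(counts_i) + n_levels * alpha      # N + Nq * alpha
    mean = [Fraction(n + alpha, denom) for n in counts_i]
    var = [m * (1 - m) for m in mean]             # p~ (1 - p~) per level
    return mean, var

def expect(p, f):
    """E[f(X)] for a one-hot X with Pr(X = e_q) = p_q, by direct enumeration."""
    total = Fraction(0)
    for q, p_q in enumerate(p):
        e_q = [1 if r == q else 0 for r in range(len(p))]
        total += p_q * f(e_q)
    return total

mean, var = single_draw_moments([3, 1, 0])   # hypothetical counts, N = 4
print(mean)   # [Fraction(4, 7), Fraction(2, 7), Fraction(1, 7)]
print(var)    # [Fraction(12, 49), Fraction(10, 49), Fraction(6, 49)]

for q in range(len(mean)):
    # Components are 0 or 1, so every higher-order moment equals the mean.
    assert expect(mean, lambda x: x[q] ** 3) == mean[q]
    # Enumerated E[X_q^2] - E[X_q]^2 reproduces (3.23).
    assert expect(mean, lambda x: x[q] ** 2) - mean[q] ** 2 == var[q]

# At most one component of a one-hot vector is nonzero, so cross-terms vanish.
assert expect(mean, lambda x: x[0] * x[1]) == 0
```

Note that even the level never observed in training (count 0) receives a nonzero mean of 1/7 under Laplace's prior, which is the practical effect of the virtual counts.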

and \hat{C}, which was defined in (3.12). It is again noted that certain implementations assume \hat{C} = 0 and do not include it in the calculation [3].

In document Performance Prediction of Quantization Based Automatic Target Recognition Algorithms (Page 41-44)