Algebraic modelling of category distinguishability

6.3 Analysis of the models and inference

We use the subsets C_1, ..., C_k in Definition 6.1 to define constraints on the raw probabilities p_{i,j} in terms of quadratic binomial equations. For all r = 1, ..., k, let n_r be the cardinality of C_r and let C_r = {i_r, ..., i_r + n_r − 1}. Then we define the constraints:

p_{i,j} p_{i+1,j+1} − p_{i,j+1} p_{i+1,j} = 0        (6.2)

for all i, j ∈ {i_r, ..., i_r + n_r − 2}. If n_r = 1, then no equation is defined. In particular, notice that, for each r, the constraints are equivalent to the independence model for the sub-table with rows and columns labelled {i_r, ..., i_r + n_r − 1}. For each subset C_r, Equation (6.2) states that (n_r − 1)² adjacent minors vanish.
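The index bookkeeping behind Equation (6.2) can be sketched in Python. The function name `adjacent_minors` and the encoding of a minor as a 4-tuple of cells are illustrative choices, not notation from the text; each subset is assumed to be a run of consecutive integers, as in the description of C_r above. The two calls at the end use the subsets of Example 6.1.

```python
from typing import List, Tuple

Cell = Tuple[int, int]
Minor = Tuple[Cell, Cell, Cell, Cell]


def adjacent_minors(subsets: List[List[int]]) -> List[Minor]:
    """List the binomials of Equation (6.2), one per adjacent 2 x 2 minor.

    The minor p[i,j]*p[i+1,j+1] - p[i,j+1]*p[i+1,j] is encoded as the cells
    ((i, j), (i+1, j+1), (i, j+1), (i+1, j)), with 1-based indices.
    """
    minors: List[Minor] = []
    for subset in subsets:
        first, size = min(subset), len(subset)
        if sorted(subset) != list(range(first, first + size)):
            raise ValueError("each subset must be a run of consecutive integers")
        # i and j range over {i_r, ..., i_r + n_r - 2}; a singleton adds nothing.
        for i in range(first, first + size - 1):
            for j in range(first, first + size - 1):
                minors.append(((i, j), (i + 1, j + 1), (i, j + 1), (i + 1, j)))
    # Overlapping subsets could repeat a minor; keep each one once, in order.
    return list(dict.fromkeys(minors))


first_model = adjacent_minors([[1, 2], [2, 3], [4, 5]])
second_model = adjacent_minors([[1, 2, 3], [4], [5]])
print(len(first_model), len(second_model))
```

Each subset C_r contributes (n_r − 1)² tuples, so the two lists should have lengths 3 and 4 respectively.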

Definition 6.2 The statistical model associated to C_1, ..., C_k is defined through the set of binomials B in Equation (6.2). Therefore, the probability model assumes the form

M = {p_{i,j} : B = 0} ∩ ∆_>.

We restrict our analysis to the open simplex ∆_>. However, algebraic statistics allows us to consider structural zeros, i.e., statistical models in the closed simplex ∆_≥ with p_{i,j} ≥ 0. In this setting, the statistical models become non-exponential and some of the properties we discuss below no longer hold. The interested reader can refer to (Rapallo 2007), where the behaviour of the statistical models on the boundary is studied.

Fig. 6.1 2 × 2 minors for the first model (left) and for the second model (right) in Example 6.1.

In case of distinguishability of all categories, i.e.

C_1 = {1}, ..., C_I = {I},

we do not define any binomial equation and the corresponding probability model is saturated. Let us analyse some non-trivial examples.

Example 6.1 Suppose we have a set of five categories, C = {1, 2, 3, 4, 5}, and consider the following subsets: C_1 = {1, 2}, C_2 = {2, 3}, C_3 = {4, 5}. The corresponding probability model is defined through three binomials: p_{1,1} p_{2,2} − p_{1,2} p_{2,1}, p_{2,2} p_{3,3} − p_{2,3} p_{3,2}, p_{4,4} p_{5,5} − p_{4,5} p_{5,4}. On the other hand, if we consider the subsets C_1 = {1, 2, 3}, C_2 = {4}, C_3 = {5}, the binomials to define the model are:

p_{1,1} p_{2,2} − p_{1,2} p_{2,1}, p_{1,2} p_{2,3} − p_{1,3} p_{2,2}, p_{2,1} p_{3,2} − p_{2,2} p_{3,1}, p_{2,2} p_{3,3} − p_{2,3} p_{3,2}. In Figure 6.1 the relevant 2 × 2 adjacent minors for these two models are illustrated.

One can also define binomial equations using the τ_{i,j}. The most natural way to do this is to define

M_τ = {p_{i,j} : τ_{h,k} = 1 for h, k ∈ C_r for some r} ∩ ∆_>.

Notice that the equations of M_τ are not adjacent minors, but they are functions of some adjacent minors defining M. Hence, it is immediate to see that M ⊆ M_τ. As M is defined only by adjacent minors, we can provide an elementary characterization of the sufficient statistic. The case of M_τ is more involved and its study is currently in progress.

Note that in our modelling the notion of indistinguishability is clearly symmetric and reflexive, but it fails to be transitive. As a counterexample, simply consider I = 3 and the subsets C_1 = {1, 2} and C_2 = {2, 3}. The categories 1 and 2 are indistinguishable, as are the categories 2 and 3, but the categories 1 and 3 are not.

In terms of the τ_{i,j}, adding the transitivity property means adding more complicated binomial equations to the model. In our example, under the hypotheses τ_{1,2} = 1 and τ_{2,3} = 1, simple computations show that τ_{1,3} = 1 is equivalent to the binomial constraint

p_{1,2} p_{1,3} p_{2,1} p_{3,1} − p_{1,1}^2 p_{2,3} p_{3,2} = 0.

This equation does not have an immediate meaning in terms of the probability model.
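The "simple computations" can be sketched as follows. The sketch assumes the form τ_{i,j} = p_{i,j} p_{j,i} / (p_{i,i} p_{j,j}), which is not restated in this section; it is used here because it reproduces the constraint given above, and all probabilities are taken to be strictly positive.

```latex
% Assumption (not restated in this section):
%   \tau_{i,j} = \frac{p_{i,j}\,p_{j,i}}{p_{i,i}\,p_{j,j}}, \qquad p_{i,j} > 0 .
\begin{align*}
\tau_{1,2} = 1 &\iff p_{2,2} = \frac{p_{1,2}\,p_{2,1}}{p_{1,1}}, \\
\tau_{2,3} = 1 &\iff p_{3,3} = \frac{p_{2,3}\,p_{3,2}}{p_{2,2}}
   = \frac{p_{1,1}\,p_{2,3}\,p_{3,2}}{p_{1,2}\,p_{2,1}}
   \quad \text{(substituting the first line)}, \\
\tau_{1,3} = 1 &\iff p_{1,3}\,p_{3,1} = p_{1,1}\,p_{3,3}
   = \frac{p_{1,1}^{2}\,p_{2,3}\,p_{3,2}}{p_{1,2}\,p_{2,1}} \\
 &\iff p_{1,2}\,p_{1,3}\,p_{2,1}\,p_{3,1} - p_{1,1}^{2}\,p_{2,3}\,p_{3,2} = 0 .
\end{align*}
```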

Now, we follow the theory in (Pistone et al. 2001) to compute the sufficient statistic for our models. As a reference in Polynomial Algebra, see (Cox et al. 1992).

Using a vector notation, let

p = (p_{1,1}, ..., p_{1,I}, ..., p_{I,1}, ..., p_{I,I})^t

be the column vector of the raw probabilities. Let R[p] be the polynomial ring in the indeterminates p_{i,j} with real coefficients. Moreover, for any binomial m = p^a − p^b ∈ B, we define its log-vector as (a − b). The log-vectors of the binomials define a sub-vector space of R^{I×I}.

The sufficient statistic is a linear map T from the sample space X = {1, ..., I}² to R^s for some integer s. The function T can be extended to a homomorphism from R^{I×I} to R^s and we denote by A_T its matrix representation.

As we require the raw probabilities to be strictly positive, a binomial equation of the form p^a − p^b = 0 is equivalent to ⟨(a − b), log(p)⟩ = 0, where log(p) = (log(p_{1,1}), ..., log(p_{I,I}))^t and ⟨·, ·⟩ is the inner product in R^{I×I}. Therefore, taking the log-probabilities, the binomials in B define a linear system of equations and we denote this system by

log(p)^t Z_B = 0.        (6.3)
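As a small worked instance of this step, take I = 2 and the single adjacent minor of the 2 × 2 table. The table size is chosen only for illustration.

```latex
\begin{align*}
p &= (p_{1,1},\, p_{1,2},\, p_{2,1},\, p_{2,2})^{t}, \qquad
a = (1, 0, 0, 1)^{t}, \quad b = (0, 1, 1, 0)^{t}, \\
m &= p^{a} - p^{b} = p_{1,1}\,p_{2,2} - p_{1,2}\,p_{2,1}, \qquad
a - b = (1, -1, -1, 1)^{t}, \\
\langle a - b, \log(p) \rangle
  &= \log p_{1,1} - \log p_{1,2} - \log p_{2,1} + \log p_{2,2}
   = \log \frac{p_{1,1}\,p_{2,2}}{p_{1,2}\,p_{2,1}} = 0 .
\end{align*}
```

In this case Z_B has the single column a − b.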

The columns of Z_B are the log-vectors of the binomials in B. If A_T is such that its columns are a basis of the orthogonal complement of the column space of Z_B in R^{I×I}, then the solutions of the system in Equation (6.3) form the column space of A_T, i.e.

log(p) = A_T ζ        (6.4)

for a vector ζ of unrestricted parameters.
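The passage from Equation (6.3) to Equation (6.4) can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `log_vectors` and `orthogonal_complement` are hypothetical, and the orthonormal basis obtained from the SVD is only one of many valid choices for the columns of A_T.

```python
import numpy as np


def log_vectors(I, subsets):
    """Columns of Z_B: one log-vector (a - b) per adjacent minor in B."""
    cols = []
    for subset in subsets:
        lo, hi = min(subset), max(subset)
        for i in range(lo, hi):          # 1-based i, j as in Equation (6.2)
            for j in range(lo, hi):
                z = np.zeros((I, I))
                z[i - 1, j - 1] = z[i, j] = 1      # p[i,j] and p[i+1,j+1]
                z[i - 1, j] = z[i, j - 1] = -1     # p[i,j+1] and p[i+1,j]
                cols.append(z.ravel())             # row-major, like the vector p
    return np.array(cols).T                        # shape (I*I, #B)


def orthogonal_complement(Z):
    """Orthonormal basis (as columns) of the complement of the column space of Z."""
    _, s, vt = np.linalg.svd(Z.T)                  # vt is (I*I, I*I)
    rank = int(np.sum(s > 1e-10))
    return vt[rank:].T


I = 3
Z_B = log_vectors(I, [[1, 2, 3]])                  # four minors of the 3 x 3 table
A_T = orthogonal_complement(Z_B)                   # shape (9, 9 - 4)

rng = np.random.default_rng(0)
zeta = rng.normal(size=A_T.shape[1])               # unrestricted parameters
p = np.exp(A_T @ zeta).reshape(I, I)               # Equation (6.4)
p /= p.sum()                                       # rescale into the open simplex

residual = np.log(p).ravel() @ Z_B                 # left-hand side of (6.3)
print(np.allclose(residual, 0.0))
```

The rescaling does not break Equation (6.3): every log-vector sums to zero, so adding a constant to log(p) leaves the system satisfied.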

Now, let #B be the cardinality of B. It is easy to show that the log-vectors of the elements in B are linearly independent, see (Haberman 1974), Chapter 5. Hence, to compute the sufficient statistic for our statistical models, we need to produce (I² − #B) linearly independent vectors.

In order to make it easier to find these vectors, the following notion is useful: we say that a cell is a free cell if the corresponding indeterminate does not belong to any minor in B. Now, a system of generators of the orthogonal to Z_B can be found using the following.

Proposition 6.1 Let C_1, ..., C_k ⊂ {1, ..., I} be as in Definition 6.1 and consider the corresponding set B of binomials defined in Equation (6.2). A system of generators of the orthogonal space to Z_B is given by the indicator vectors of the rows, of the columns and of the free cells.
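The statement can be checked numerically on the first model of Example 6.1. This is a sketch assuming NumPy, not part of the proof; the helper `cell_indicator` and the variable names are illustrative. It verifies that the generators are orthogonal to every log-vector and that their rank equals I² − #B = 25 − 3 = 22.

```python
import numpy as np

I = 5
subsets = [[1, 2], [2, 3], [4, 5]]       # first model of Example 6.1


def cell_indicator(cells):
    """Indicator vector in R^(I x I) of a list of 1-based cells, flattened."""
    v = np.zeros((I, I), dtype=int)
    for i, j in cells:
        v[i - 1, j - 1] = 1
    return v.ravel()


# Adjacent minors of Equation (6.2), each as (cells with +1, cells with -1).
minors = [
    ([(i, j), (i + 1, j + 1)], [(i, j + 1), (i + 1, j)])
    for s in subsets
    for i in range(min(s), max(s))
    for j in range(min(s), max(s))
]
Z_B = np.array(
    [cell_indicator(plus) - cell_indicator(minus) for plus, minus in minors]
).T

# Free cells: cells whose indeterminate appears in no minor of B.
used = {cell for plus, minus in minors for cell in plus + minus}
all_cells = [(i, j) for i in range(1, I + 1) for j in range(1, I + 1)]
free_cells = [cell for cell in all_cells if cell not in used]

rows = [cell_indicator([(i, j) for j in range(1, I + 1)]) for i in range(1, I + 1)]
cols = [cell_indicator([(i, j) for i in range(1, I + 1)]) for j in range(1, I + 1)]
free = [cell_indicator([cell]) for cell in free_cells]
G = np.array(rows + cols + free).T       # generators as columns

orthogonal = not np.any(G.T @ Z_B)       # every generator is orthogonal to Z_B
rank_G = np.linalg.matrix_rank(G)
rank_Z = np.linalg.matrix_rank(Z_B)
print(len(free_cells), rank_Z, rank_G, orthogonal)
```

The generators are not a basis: the row and column indicators alone already satisfy one linear relation, which is why the proposition speaks of a system of generators.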

Proof Let Z_B be the column matrix of the log-vectors of minors in B and let C_B be its column space in R^{I×I}. We also let L be the vector space generated by the indicator functions of the rows, of the columns and of the free cells. In the case where B = B_0 is the set of all adjacent minors, we have the following:

(C_B)^⊥ = L.

To build B from B_0 we have to remove minors m_1, ..., m_t and n_1, ..., n_t which can be chosen in such a way that:

– m_i and n_i are symmetric with respect to the diagonal (if m_i is on the main diagonal, then m_i = n_i);

– the minors m_i are ordered in such a way that the difference of the indices of the topmost-rightmost variable is decreasing.

Now we proceed by induction. Let B_i be obtained from B_0 by removing the minors m_1, ..., m_i and define as above Z_{B_i}, C_{B_i} and L_i. Now we assume that

(C_{B_i})^⊥ = L_i.

When the minor m_{i+1} is removed we create at least one new free cell. Each new free cell has indicator vector not in L_i, as it is not orthogonal to the log-vector of m_{i+1}, but it is in (C_{B_{i+1}})^⊥. Pick one of the free cells and let v_{i+1} be its indicator vector.

We conclude that (C_{B_{i+1}})^⊥ ⊃ L_{i+1} = L_i + ⟨v_{i+1}⟩ and, as dim L_{i+1} + dim C_{B_{i+1}} = I², we have that (C_{B_{i+1}})^⊥ = L_{i+1}. Repeating this process we obtain the proof.

Equation (6.4) allows us to consider our models as log-linear models. Thus, maximum likelihood estimates of the cell probabilities can be found through numerical algorithms, such as Fisher scoring or Iterative Proportional Fitting. The R package gllm (Generalized Log-Linear Models) is an easy tool to compute the maximum likelihood estimates of the cell probabilities. The input is formed by the observed cell counts and the design matrix A_T, see (Duffy 2006). Asymptotic chi-square p-values are then easy to compute. Non-asymptotic inference can be made through Algebraic Statistics, as extensively described for two-way tables in (Rapallo 2005). Moreover, Chapter 8 in (Sturmfels 2002) highlights connections between the maximum likelihood problem for contingency tables and the theory of systems of polynomial equations.
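To make the fitting step concrete, here is a minimal Fisher scoring sketch in Python with NumPy. It is not the gllm package and makes no claim about how gllm is implemented; it is a plain iteratively reweighted least squares loop for a Poisson log-linear model with design matrix A_T. The counts are made up for illustration, and the model is the hypothetical choice I = 3 with C_1 = {1, 2}, C_2 = {3}, so that B contains the single minor p_{1,1} p_{2,2} − p_{1,2} p_{2,1}.

```python
import numpy as np

# Made-up counts for a 3 x 3 table (illustration only, not data from the text).
n = np.array([[10.0, 5.0, 3.0],
              [6.0, 12.0, 4.0],
              [2.0, 7.0, 9.0]])
I = n.shape[0]


def indicator(cells):
    """Indicator vector of a list of 0-based cells, flattened row-major."""
    v = np.zeros((I, I))
    for i, j in cells:
        v[i, j] = 1.0
    return v.ravel()


# Design matrix built as in Proposition 6.1: rows, columns and free cells.
free_cells = [(0, 2), (1, 2), (2, 0), (2, 1), (2, 2)]
columns = (
    [indicator([(i, j) for j in range(I)]) for i in range(I)]
    + [indicator([(i, j) for i in range(I)]) for j in range(I)]
    + [indicator([cell]) for cell in free_cells]
)
A_T = np.array(columns).T                 # shape (9, 11), rank 8

y = n.ravel()
mu = y.copy()                             # start from the observed (positive) counts
for _ in range(50):
    eta = np.log(mu)
    z = eta + (y - mu) / mu               # working response
    w = np.sqrt(mu)                       # square roots of the Fisher weights
    # Weighted least squares step; lstsq copes with the rank-deficient A_T,
    # and the fitted values A_T @ zeta do not depend on the solution picked.
    zeta, *_ = np.linalg.lstsq(A_T * w[:, None], z * w, rcond=None)
    mu = np.exp(A_T @ zeta)

fitted = mu.reshape(I, I)
pearson_x2 = float(np.sum((y - mu) ** 2 / mu))   # compare with chi-square, 1 d.f.
print(np.round(fitted, 3), round(pearson_x2, 3))
```

For this model the fit can be checked by hand: the free cells are reproduced exactly, and the 2 × 2 block on categories 1 and 2 is fitted by independence on its own margins. The degrees of freedom, I² minus the rank of A_T, equal the number of binomials in B.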

In document Algebraic and Geometric Methods in Statistics (Page 132-135)