

Algebraic modelling of category distinguishability

6.3 Analysis of the models and inference

We use the subsets $C_1, \dots, C_k$ in Definition 6.1 to define constraints on the raw probabilities $p_{i,j}$ in terms of quadratic binomial equations. For all $r = 1, \dots, k$, let $n_r$ be the cardinality of $C_r$ and let $C_r = \{i_r, \dots, i_r + n_r - 1\}$. Then we define the constraints:

$$p_{i,j}\,p_{i+1,j+1} - p_{i,j+1}\,p_{i+1,j} = 0 \tag{6.2}$$

for all $i, j \in \{i_r, \dots, i_r + n_r - 2\}$. If $n_r = 1$, then no equation is defined. In particular, notice that, for each $r$, the constraints are equivalent to the independence model for the sub-table with rows and columns labelled $\{i_r, \dots, i_r + n_r - 1\}$. For each subset $C_r$, Equation (6.2) states that $(n_r - 1)^2$ adjacent minors vanish.

Definition 6.2 The statistical model associated to $C_1, \dots, C_k$ is defined through the set of binomials $\mathcal{B}$ in Equation (6.2). Therefore, the probability model assumes the form

$$M = \{p_{i,j} : \mathcal{B} = 0\} \cap \Delta_{>}.$$

We restrict our analysis to the open simplex $\Delta_{>}$. However, algebraic statistics allows us to consider structural zeros, i.e., statistical models in the closed simplex with $p_{i,j} \geq 0$. In this setting, the statistical models become non-exponential and some of the properties we discuss below no longer hold. The interested reader can refer to (Rapallo 2007), where the behaviour of the statistical models on the boundary is studied.

Fig. 6.1 $2 \times 2$ minors for the first model (left) and for the second model (right) in Example 6.1.

In the case of distinguishability of all categories, i.e.

$$C_1 = \{1\}, \dots, C_I = \{I\},$$

we do not define any binomial equation and the corresponding probability model is saturated. Let us analyse some non-trivial examples.

Example 6.1 Suppose we have a set of five categories, $C = \{1, 2, 3, 4, 5\}$, and consider the following subsets: $C_1 = \{1, 2\}$, $C_2 = \{2, 3\}$, $C_3 = \{4, 5\}$. The corresponding probability model is defined through three binomials:

$$p_{1,1}p_{2,2} - p_{1,2}p_{2,1}, \quad p_{2,2}p_{3,3} - p_{2,3}p_{3,2}, \quad p_{4,4}p_{5,5} - p_{4,5}p_{5,4}.$$

On the other hand, if we consider the subsets $C_1 = \{1, 2, 3\}$, $C_2 = \{4\}$, $C_3 = \{5\}$, the binomials defining the model are:

$$p_{1,1}p_{2,2} - p_{1,2}p_{2,1}, \quad p_{1,2}p_{2,3} - p_{1,3}p_{2,2}, \quad p_{2,1}p_{3,2} - p_{2,2}p_{3,1}, \quad p_{2,2}p_{3,3} - p_{2,3}p_{3,2}.$$

In Figure 6.1 the relevant $2 \times 2$ adjacent minors for these two models are illustrated.
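The enumeration in Example 6.1 can be reproduced mechanically from Equation (6.2). The following is a minimal sketch, assuming each subset is given as a list of consecutive category labels; the names `adjacent_minors` and `show` are hypothetical helpers introduced only for this illustration.

```python
from typing import List, Tuple

Cell = Tuple[int, int]
# A binomial p[a]*p[b] - p[c]*p[d] is stored as the tuple (a, b, c, d).
Binomial = Tuple[Cell, Cell, Cell, Cell]


def adjacent_minors(subsets: List[List[int]]) -> List[Binomial]:
    """Return the binomials of Equation (6.2) for subsets of consecutive categories."""
    binomials = []
    for subset in subsets:
        first, size = min(subset), len(subset)
        # i and j run over {i_r, ..., i_r + n_r - 2}; the range is empty if n_r = 1.
        for i in range(first, first + size - 1):
            for j in range(first, first + size - 1):
                binomials.append(((i, j), (i + 1, j + 1), (i, j + 1), (i + 1, j)))
    return binomials


def show(binomial: Binomial) -> str:
    """Format a binomial in the notation of the text."""
    a, b, c, d = (f"p{i},{j}" for i, j in binomial)
    return f"{a}*{b} - {c}*{d}"


# The two choices of subsets from Example 6.1.
first_model = adjacent_minors([[1, 2], [2, 3], [4, 5]])
second_model = adjacent_minors([[1, 2, 3], [4], [5]])

for name, model in (("first", first_model), ("second", second_model)):
    print(name, "model:", ", ".join(show(m) for m in model))
```

The first model yields three binomials and the second yields $(3 - 1)^2 = 4$, in agreement with the count of adjacent minors stated after Equation (6.2).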

One can also define binomial equations using the $\tau_{i,j}$. The most natural way to do this is to define

$$M_\tau = \{p_{i,j} : \tau_{h,k} = 1 \text{ for } (h, k) \in C_r \text{ for some } r\} \cap \Delta_{>}.$$

Notice that the equations of $M_\tau$ are not adjacent minors, but they are functions of some adjacent minors defining $M$. Hence, it is immediate to see that $M \subseteq M_\tau$. As $M$ is defined only by adjacent minors, we can provide an elementary characterization of the sufficient statistic. The case of $M_\tau$ is more involved and its study is currently in progress.

Note that in our modelling the notion of indistinguishability is clearly symmetric and reflexive, but it is not transitive. As a counterexample, simply consider $I = 3$ and the subsets $C_1 = \{1, 2\}$ and $C_2 = \{2, 3\}$. The categories 1 and 2 are indistinguishable, as are the categories 2 and 3, but the categories 1 and 3 are not.

In terms of the $\tau_{i,j}$, adding the transitivity property means adding more complicated binomial equations to the model. In our example, under the hypotheses $\tau_{1,2} = 1$ and $\tau_{2,3} = 1$, simple computations show that $\tau_{1,3} = 1$ is equivalent to the binomial constraint

$$p_{1,2}\,p_{1,3}\,p_{2,1}\,p_{3,1} - p_{1,1}^2\,p_{2,3}\,p_{3,2} = 0.$$

This equation does not have an immediate meaning in terms of the probability model.
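The "simple computations" can be spelled out as follows. This is a sketch under an assumption not restated in this section: that $\tau_{i,j}$ is the odds-ratio-type quantity $\tau_{i,j} = p_{i,i}\,p_{j,j} / (p_{i,j}\,p_{j,i})$. If the definition given earlier in the chapter differs, the steps below need to be adapted.

```latex
% Assumption: \tau_{i,j} = p_{i,i} p_{j,j} / (p_{i,j} p_{j,i}), all p_{i,j} > 0.
\begin{align*}
\tau_{1,2} = 1 &\iff p_{1,1}\,p_{2,2} = p_{1,2}\,p_{2,1}, \\
\tau_{2,3} = 1 &\iff p_{2,2}\,p_{3,3} = p_{2,3}\,p_{3,2}, \\
\tau_{1,3} = 1 &\iff p_{1,1}\,p_{3,3} = p_{1,3}\,p_{3,1}.
\end{align*}
% Solve the first two equations for p_{2,2} and p_{3,3} (allowed on the open simplex):
\[
p_{2,2} = \frac{p_{1,2}\,p_{2,1}}{p_{1,1}}, \qquad
p_{3,3} = \frac{p_{2,3}\,p_{3,2}}{p_{2,2}}
        = \frac{p_{1,1}\,p_{2,3}\,p_{3,2}}{p_{1,2}\,p_{2,1}}.
\]
% Substitute p_{3,3} into the third equation and clear denominators:
\[
p_{1,1}^{2}\,p_{2,3}\,p_{3,2} = p_{1,2}\,p_{1,3}\,p_{2,1}\,p_{3,1}.
\]
```

The resulting binomial has degree four and involves cells from all three rows and columns, which is why it has no direct reading as an independence statement on a sub-table.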

Now, we follow the theory in (Pistone et al. 2001) to compute the sufficient statistic for our models. For a reference on polynomial algebra, see (Cox et al. 1992).

Using a vector notation, let

$$p = (p_{1,1}, \dots, p_{1,I}, \dots, p_{I,1}, \dots, p_{I,I})^t$$

be the column vector of the raw probabilities. Let $\mathbb{R}[p]$ be the polynomial ring in the indeterminates $p_{i,j}$ with real coefficients. Moreover, for any binomial $m = p^a - p^b \in \mathcal{B}$, we define its log-vector as $(a - b)$. The log-vectors of the binomials define a vector subspace of $\mathbb{R}^{I \times I}$.

The sufficient statistic is a linear map $T$ from the sample space $\mathcal{X} = \{1, \dots, I\}^2$ to $\mathbb{R}^s$ for some integer $s$. The function $T$ can be extended to a homomorphism from $\mathbb{R}^{I \times I}$ to $\mathbb{R}^s$ and we denote by $A_T$ its matrix representation.

As we require the raw probabilities to be strictly positive, a binomial equation of the form $p^a - p^b = 0$ is equivalent to $\langle a - b, \log(p) \rangle = 0$, where $\log(p) = (\log(p_{1,1}), \dots, \log(p_{I,I}))^t$ and $\langle \cdot, \cdot \rangle$ is the inner product in $\mathbb{R}^{I \times I}$. Therefore, taking the log-probabilities, the binomials in $\mathcal{B}$ define a linear system of equations and we denote this system by

$$\log(p)^t Z_{\mathcal{B}} = 0. \tag{6.3}$$
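The passage from a binomial to a linear equation in the log-probabilities can be checked numerically. The sketch below assumes a hypothetical $3 \times 3$ table, cells listed in row-major order as in the vector $p$, and a rank-one (independence) table chosen only because its $2 \times 2$ minors are guaranteed to vanish; `log_vector` is an illustrative helper, not notation from the text.

```python
import math

SIZE = 3  # hypothetical number of categories I


def log_vector(plus, minus, size):
    """Exponent vector a - b of the binomial p^a - p^b, cells in row-major order."""
    v = [0] * (size * size)
    for i, j in plus:
        v[(i - 1) * size + (j - 1)] += 1
    for i, j in minus:
        v[(i - 1) * size + (j - 1)] -= 1
    return v


# Log-vector of the adjacent minor p11*p22 - p12*p21.
v = log_vector([(1, 1), (2, 2)], [(1, 2), (2, 1)], SIZE)

# A strictly positive table p[i][j] = row[i] * col[j]: every 2x2 minor vanishes.
row = [0.2, 0.3, 0.5]
col = [0.5, 0.1, 0.4]
p = [row[i] * col[j] for i in range(SIZE) for j in range(SIZE)]

minor = p[0] * p[4] - p[1] * p[3]                    # p11*p22 - p12*p21
inner = sum(a * math.log(x) for a, x in zip(v, p))   # <a - b, log(p)>

print("log-vector:", v)
print("minor:", minor, "inner product:", inner)
```

Both quantities are zero up to floating-point rounding, which is one column of the system (6.3) written out explicitly.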

The columns of $Z_{\mathcal{B}}$ are the log-vectors of the binomials in $\mathcal{B}$. If $A_T$ is such that its columns are a basis of the orthogonal complement of the column space of $Z_{\mathcal{B}}$ in $\mathbb{R}^{I \times I}$, then the solutions of the system in Equation (6.3) form the column space of $A_T$, i.e.

$$\log(p) = A_T \zeta \tag{6.4}$$

for a vector ζ of unrestricted parameters.

Now, let $\#\mathcal{B}$ be the cardinality of $\mathcal{B}$. It is easy to show that the log-vectors of the elements in $\mathcal{B}$ are linearly independent, see (Haberman 1974), Chapter 5. Hence, to compute the sufficient statistic for our statistical models, we need to produce $(I^2 - \#\mathcal{B})$ linearly independent vectors.
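For the two models of Example 6.1 the linear independence of the log-vectors, and hence the number of vectors still to be found, can be verified directly. This is a minimal sketch using exact rational arithmetic; `log_vectors` and `rank` are hypothetical helper names, and the rank is computed by plain Gaussian elimination rather than by any method from the cited references.

```python
from fractions import Fraction


def log_vectors(subsets, size):
    """Log-vectors of the adjacent minors of Equation (6.2), row-major cell order."""
    vectors = []
    for subset in subsets:
        first, n = min(subset), len(subset)
        for i in range(first, first + n - 1):
            for j in range(first, first + n - 1):
                v = [0] * (size * size)
                for (r, c), sign in (((i, j), 1), ((i + 1, j + 1), 1),
                                     ((i, j + 1), -1), ((i + 1, j), -1)):
                    v[(r - 1) * size + (c - 1)] = sign
                vectors.append(v)
    return vectors


def rank(vectors):
    """Rank of a list of integer vectors, by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        pivot = next((k for k in range(r, len(rows)) if rows[k][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for k in range(len(rows)):
            if k != r and rows[k][c] != 0:
                factor = rows[k][c] / rows[r][c]
                rows[k] = [x - factor * y for x, y in zip(rows[k], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


SIZE = 5  # I = 5 as in Example 6.1
for subsets in ([[1, 2], [2, 3], [4, 5]], [[1, 2, 3], [4], [5]]):
    vectors = log_vectors(subsets, SIZE)
    print("#B =", len(vectors), "rank =", rank(vectors),
          "vectors needed =", SIZE * SIZE - len(vectors))
```

In both cases the rank equals $\#\mathcal{B}$, so $25 - 3 = 22$ and $25 - 4 = 21$ linearly independent vectors are required, respectively.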

In order to make it easier to find these vectors, the following notion is useful: we say that a cell is a free cell if the corresponding indeterminate does not belong to any minor in $\mathcal{B}$. Now, a system of generators of the orthogonal space to $Z_{\mathcal{B}}$ can be found using the following.

Proposition 6.1 Let $C_1, \dots, C_k \subset \{1, \dots, I\}$ be as in Definition 6.1 and consider the corresponding set $\mathcal{B}$ of binomials defined in Equation (6.2). A system of generators of the orthogonal space to $Z_{\mathcal{B}}$ is given by the indicator vectors of the rows, of the columns and of the free cells.

Proof Let $Z_{\mathcal{B}}$ be the matrix whose columns are the log-vectors of the minors in $\mathcal{B}$ and let $C_{\mathcal{B}}$ be its column space in $\mathbb{R}^{I \times I}$. We also let $L$ be the vector space generated by the indicator functions of the rows, of the columns and of the free cells. In the case where $\mathcal{B} = \mathcal{B}_0$ is the set of all adjacent minors, we have the following:

$$(C_{\mathcal{B}})^{\perp} = L.$$

To build $\mathcal{B}$ from $\mathcal{B}_0$ we have to remove minors $m_1, \dots, m_t$ and $n_1, \dots, n_t$ which can be chosen in such a way that:

– $m_i$ and $n_i$ are symmetric with respect to the diagonal (if $m_i$ is on the main diagonal, then $m_i = n_i$);

– the minors $m_i$ are ordered in such a way that the difference of the indices of the topmost-rightmost variable is decreasing.

Now we proceed by induction. Let $\mathcal{B}_i$ be obtained from $\mathcal{B}_0$ by removing the minors $m_1, \dots, m_i$ and define as above $Z_{\mathcal{B}_i}$, $C_{\mathcal{B}_i}$ and $L_i$. Now we assume that

$$(C_{\mathcal{B}_i})^{\perp} = L_i.$$

When the minor $m_{i+1}$ is removed we create at least one new free cell. Each new free cell has indicator vector not in $L_i$, as it is not orthogonal to the log-vector of $m_{i+1}$, but it is in $(C_{\mathcal{B}_{i+1}})^{\perp}$. Pick one of the free cells and let $v_{i+1}$ be its indicator vector.

We conclude that $(C_{\mathcal{B}_{i+1}})^{\perp} \supset L_{i+1} = L_i + \langle v_{i+1} \rangle$ and, as $\dim L_{i+1} + \dim C_{\mathcal{B}_{i+1}} = I^2$, we have that $(C_{\mathcal{B}_{i+1}})^{\perp} = L_{i+1}$. Repeating this process we obtain the proof.
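Proposition 6.1 can be checked numerically on the two models of Example 6.1. The sketch below is only a sanity check under stated assumptions, not a substitute for the proof: cells are listed in row-major order, a cell is treated as non-free exactly when it lies in $C_r \times C_r$ for some subset with $n_r \geq 2$, and `log_vectors`, `generators` and `rank` are hypothetical helper names.

```python
from fractions import Fraction


def log_vectors(subsets, size):
    """Log-vectors of the adjacent minors of Equation (6.2), row-major cell order."""
    vectors = []
    for subset in subsets:
        first, n = min(subset), len(subset)
        for i in range(first, first + n - 1):
            for j in range(first, first + n - 1):
                v = [0] * (size * size)
                for (r, c), sign in (((i, j), 1), ((i + 1, j + 1), 1),
                                     ((i, j + 1), -1), ((i + 1, j), -1)):
                    v[(r - 1) * size + (c - 1)] = sign
                vectors.append(v)
    return vectors


def generators(subsets, size):
    """Indicator vectors of the rows, of the columns and of the free cells."""
    cells = [(r, c) for r in range(1, size + 1) for c in range(1, size + 1)]
    in_minor = {(r, c) for s in subsets if len(s) > 1 for r in s for c in s}
    rows = [[int(r == k) for r, _ in cells] for k in range(1, size + 1)]
    cols = [[int(c == k) for _, c in cells] for k in range(1, size + 1)]
    free = [[int(cell == f) for cell in cells] for f in cells if f not in in_minor]
    return rows + cols + free


def rank(vectors):
    """Rank of a list of integer vectors, by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((k for k in range(r, len(rows)) if rows[k][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for k in range(len(rows)):
            if k != r and rows[k][c] != 0:
                factor = rows[k][c] / rows[r][c]
                rows[k] = [x - factor * y for x, y in zip(rows[k], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def check(subsets, size):
    """True if the generators are orthogonal to Z_B and span a space of dimension I^2 - #B."""
    z = log_vectors(subsets, size)
    g = generators(subsets, size)
    orthogonal = all(sum(a * b for a, b in zip(u, v)) == 0 for u in g for v in z)
    return orthogonal and rank(g) == size * size - len(z)


print(check([[1, 2], [2, 3], [4, 5]], 5), check([[1, 2, 3], [4], [5]], 5))
```

The generators are in general not linearly independent (for instance, the row indicators and the column indicators have the same sum), so a basis for the columns of $A_T$ is obtained by discarding redundant vectors.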

Equation (6.4) allows us to consider our models as log-linear models. Thus, maximum likelihood estimates of the cell probabilities can be found through numerical algorithms, such as Fisher scoring or Iterative Proportional Fitting. The R package gllm (Generalized Log-Linear Models) is an easy tool to compute the maximum likelihood estimates of the cell probabilities. The input is formed by the observed cell counts and the design matrix $A_T$, see (Duffy 2006). Asymptotic chi-square p-values are then easy to compute. Non-asymptotic inference can be made through Algebraic Statistics, as extensively described for two-way tables in (Rapallo 2005). Moreover, Chapter 8 in (Sturmfels 2002) highlights connections between the maximum likelihood problem for contingency tables and the theory of systems of polynomial equations.
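To illustrate the fitting step without relying on any particular package, here is a minimal sketch of Iterative Proportional Fitting for a model of this type. It is not the algorithm used by gllm, and it rests on several assumptions: all observed counts are positive, the sufficient statistic consists of the row totals, the column totals and the free-cell counts (as in Proposition 6.1), each free cell is handled through the two-block partition "cell versus all other cells", and the data and the function name `fit_ipf` are hypothetical.

```python
def fit_ipf(counts, subsets, cycles=2000):
    """Fit the adjacent-minor model to a square table of positive counts by IPF."""
    size = len(counts)
    cells = [(i, j) for i in range(size) for j in range(size)]
    # Cells lying in some minor: C_r x C_r for each subset with n_r >= 2 (0-based).
    in_minor = {(i - 1, j - 1) for s in subsets if len(s) > 1 for i in s for j in s}

    # Each partition is a list of disjoint blocks of cells covering the table.
    partitions = [
        [[(i, j) for j in range(size)] for i in range(size)],  # rows
        [[(i, j) for i in range(size)] for j in range(size)],  # columns
    ]
    for cell in cells:
        if cell not in in_minor:  # free cell versus the rest of the table
            partitions.append([[cell], [c for c in cells if c != cell]])

    total = sum(sum(row) for row in counts)
    fitted = {cell: total / len(cells) for cell in cells}  # uniform start
    for _ in range(cycles):
        for partition in partitions:
            for block in partition:
                observed = sum(counts[i][j] for i, j in block)
                current = sum(fitted[c] for c in block)
                for c in block:
                    fitted[c] *= observed / current  # match the block total
    return [[fitted[(i, j)] for j in range(size)] for i in range(size)]


# Hypothetical 3 x 3 table with C1 = {1, 2} and C2 = {3}.
counts = [[10, 5, 3], [4, 12, 6], [2, 7, 9]]
fitted = fit_ipf(counts, [[1, 2], [3]])
for row in fitted:
    print([round(x, 3) for x in row])
```

In this small example the free cells are reproduced exactly and the $2 \times 2$ block for categories 1 and 2 is fitted as an independence table, so its adjacent minor vanishes in the fitted counts. Dividing the fitted counts by the sample size gives the estimated cell probabilities.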