A New Method for Computing Elimination Ideals of Likelihood Equations

Xiaoxian Tang∗

School of Mathematics and Systems Science,

Beihang University

Beijing, China

[email protected]

Timo de Wolff

Department für Mathematik,

Technische Universität Braunschweig

Braunschweig, Germany

t.de-wolff@tu-braunschweig.de

Rukai Zhao

Computer Science & Engineering,

Texas A&M University

College Station, Texas

[email protected]

ABSTRACT

We develop a probabilistic algorithm for computing elimination ideals of likelihood equations. We show experimentally that, for medium to large size models, it is far more efficient than directly computing Gröbner bases or the interpolation method proposed in [39, 40]. Furthermore, we deduce discriminants of the elimination ideals, which play a central role in real root classification. In particular, we can compute the discriminant of one Jukes-Cantor model in phylogenetics (stored as a text file of size 8 GB).

CCS CONCEPTS

• Computing methodologies → Symbolic and algebraic manipulation; Algebraic algorithms.

KEYWORDS

Maximum likelihood estimation, Likelihood equation, Real root classification, Discriminant, Elimination ideal

ACM Reference Format:

Xiaoxian Tang, Timo de Wolff, and Rukai Zhao. 2019. A New Method for Computing Elimination Ideals of Likelihood Equations. In International Symposium on Symbolic and Algebraic Computation (ISSAC ’19), July 15–18, 2019, Beijing, China. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3326229.3326241

1 INTRODUCTION

This work is motivated by the maximum likelihood estimation problem in statistics:

Which probability distribution describes a given data set optimally for a chosen statistical model?

A standard way to answer this question is to determine a point in the model that maximizes a likelihood function; see (1). When the model is algebraic (see Definition 2.2) and the data is discrete, one finds all critical points of the likelihood function by solving a system of likelihood equations (2), obtained by applying Lagrange multipliers.

∗ Corresponding author


Solving likelihood equations motivates an important branch in algebraic statistics [1, 13, 14, 24–27, 31].

Likelihood equations form an algebraic system in probability variables $p_0, \dots, p_n$, Lagrange multipliers $\lambda_1, \dots, \lambda_{s+1}$, and parameters $u_0, \dots, u_n$ representing the data obtained from statistical experiments. Given such a system with a generically chosen data vector $(u_0, \dots, u_n)$, the number of complex solutions is a finite non-negative constant, called the maximum-likelihood-degree (ML-degree); see Definition 2.5 and [27, 31]. Since the variables $p_i$ represent probabilities, one is especially interested in a real solution classification [15, 33, 47] of likelihood equations. Unfortunately, this classification is very challenging, since it is a specific real quantifier elimination problem [2–5, 7–11, 16, 17, 23, 28–30, 34–38, 41, 42, 45], a fundamental problem in computational real algebraic geometry. The number of real solutions only changes when the parameters (data) pass a set called the discriminant variety; see [33, Definition 1]. Hence, the discriminant varieties of likelihood equations play a core role in real solution classification.

In [40], Rodriguez and the first author studied how to compute discriminant varieties for a likelihood equation system efficiently. Experiments [40, Tables 2–3] suggest that Gröbner bases [12, 21, 22] cannot be computed directly for medium to large size models. Rodriguez and the first author proposed in [40, Algorithm 2] a probabilistic algorithm based on evaluation/interpolation techniques, which, in theory, works for arbitrary general zero-dimensional systems. In practice, however, it is limited to small algebraic models with ML-degrees not greater than 6. The main bottleneck is that the discriminants we are trying to compute are huge; for instance, a model with ML-degree 6 has a discriminant with total degree 12 and thousands of terms.

The key idea of this article is to exploit the special structure of likelihood equations, both to improve the computational efficiency and to compute generators of elimination ideals instead of computing discriminants directly. We summarize the entire challenge in Figure 1. Thus, the task of this paper is to efficiently compute the elimination ideal with respect to all parameters (data) and one variable for a given system of likelihood equations.

More precisely, we have the following problem statement:

Input: Likelihood equations $f_0, \dots, f_{n+s+1} \in \mathbb{Q}[u_0, \dots, u_n, p_0, \dots, p_n, \lambda_1, \dots, \lambda_{s+1}]$.

Output: A generator of the elimination ideal with respect to one probability variable $p_i$ for some $i$ between $0$ and $n$.

[Figure 1 (diagram): the nodes LE (Algebraic Statistics) and RSC, DV, Goal: EI (Computational Algebraic Geometry), connected by arrows labeled "fundamental" and "motivates".]

LE: Solving Likelihood Equations; RSC: Real Solution Classification; DV: Computing Discriminant Varieties; EI: Computing Elimination Ideals.

Figure 1: A visualization of our motivation.

Model  #p_i  MLD  Standard  Interpolation  Algorithm 1
1      4     3    0.046 s   1.831 s        0.525 s
2      6     2    0.524 s   24.983 s       2.310 s
3      6     4    3.211 s   282.425 s      16.174 s
4      6     6    ∞         7933.230 s     782.676 s
5      5     12   ∞         10726.268 s    761.257 s
6      9     10   ∞         > 1583 d       14 d
7      5     23   ∞         9919.260 s     4624.575 s
8      8     14   ∞         > 4667 d       > 15 d
9      8     9    ∞         > 39 d         2 d

Table 1: Runtimes for computing elimination ideals (s: seconds; d: days). The models are listed in the same order as in [40, Table 3]. The column “Standard” contains the runtimes of a regular FGb Gröbner basis computation, the column “Interpolation” contains the runtimes of [40, Algorithm 2], and the last column contains the runtimes of our Algorithm 1.

Our main contribution is the development of a probabilistic algorithm, Algorithm 1, for computing elimination ideals of Lagrange likelihood equations. We implemented the algorithm in Maple. Our experiments show that, for statistical models beyond small size, Algorithm 1 is significantly more efficient than the standard approach of directly computing Gröbner bases and than the evaluation/interpolation method in [40]; see Table 1 and Section 6 for further details. Our crucial idea for saving computational time is to exploit the observation that the elimination ideals have a special structure; see (3).

In particular, we are able to compute the discriminant of one Jukes-Cantor model [27,P_comb, Example 15] in phylogenetics [44, Chapter 15], which was impossible before. This discriminant is a gigantic polynomial: its total degree is 176, and it takes several GB of memory when stored in a text file; see Table 2 (Model 9) for further details.

The article is organized as follows. Section 2 contains preliminaries. In Section 3, we discuss the specialization properties of (radical) elimination ideals and multivariate factorization. In Section 4, we introduce general zero-dimensional systems. In Section 5, we introduce Algorithm 1 together with a list of sub-algorithms for computing elimination ideals of likelihood equations. In Section 6, we explain the implementation details and compare the efficiency of our code with existing methods.

2 PRELIMINARIES

We assume that the reader is familiar with the fundamental concepts of computational algebraic geometry. For a general overview, we refer the reader to [18] and [43].

2.1 Notation

Throughout the paper, we use bold letters for vectors or finite sets of polynomials, e.g., $\mathbf{z} = (z_1, \dots, z_n)$ and $\mathbf{H} = \{h_1, \dots, h_m\}$.

For $h \in \mathbb{Q}[z]$, we denote the total degree of $h$ by $\deg(h)$ and the degree of $h$ with respect to a particular variable $z_j$ by $\deg(h, z_j)$. We denote by $\mathrm{coeff}(h, z_j^i)$ the coefficient of $h$ with respect to the monomial $z_j^i$. If $N = \deg(h, z_j)$, then we simply denote $\mathrm{coeff}(h, z_j^N)$ by $\mathrm{lcoeff}(h, z_j)$. For $H \subseteq \mathbb{Q}[z]$, we denote by $\langle H \rangle$ the ideal generated by $H$ in $\mathbb{Q}[z]$, and by $V(H)$ the affine variety $\{z \in \mathbb{C}^n \mid h(z) = 0 \text{ for all } h \in H\}$. For any ideal $I \subset \mathbb{Q}[z]$, we denote by $\sqrt{I}$ the radical ideal of $I$, and by $V(I)$ the affine variety defined by the generators of $I$. For any subset $S \subseteq \mathbb{C}^n$, we denote by $I(S)$ the ideal of polynomials vanishing on $S$, i.e., $I(S) = \{h \in \mathbb{Q}[z] \mid h(z^*) = 0 \text{ for all } z^* \in S\}$, and by $\overline{S}$ the Zariski closure $V(I(S))$ of $S$ in $\mathbb{C}^n$. For a positive integer $n$ and any $1 \le i \le n$, we denote the canonical projection by $\mathrm{proj}_i : \mathbb{C}^n \to \mathbb{C}^i$. Finally, we denote by $\gcd(a, b)$ the greatest common divisor of two integers $a, b$.

2.2 Algebraic Statistics

In this section we recall the basic notions from algebraic statistics, which we need in this article.

Definition 2.1 (Probability Simplex). We define the $n$-dimensional probability simplex as $\Delta_n = \{(p_0, \dots, p_n) \in \mathbb{R}^{n+1} \mid p_0 > 0, \dots, p_n > 0,\ p_0 + \dots + p_n = 1\}$.

Definition 2.2 (Algebraic Statistical Model and Model Invariant). Given homogeneous polynomials $g_1, \dots, g_s \in \mathbb{Q}[p_0, \dots, p_n]$ such that $V(g_1, \dots, g_s) \subsetneq \mathbb{C}^{n+1}$ is irreducible and generically reduced, we define an algebraic statistical model as $M = V(g_1, \dots, g_s) \cap \Delta_n$. Each $g_i$ is called a model invariant of $M$. If $V(g_1, \dots, g_s)$ has codimension $s$, then we say that $\{g_1, \dots, g_s\}$ is a set of independent model invariants.

Given an algebraic statistical model $M$ and a data vector $u = (u_0, \dots, u_n) \in \mathbb{R}^{n+1}_{\ge 0}$, the maximum likelihood estimation (MLE) problem is the optimization problem
\[
\max \prod_{k=0}^{n} p_k^{u_k} \quad \text{subject to } p \in M, \tag{1}
\]
which is fundamental in statistics [20, Chapter 2]. One way to solve the MLE problem is to solve a system of likelihood equations [27] formulated by the Lagrange multiplier method. We give the explicit formulation of such a system:

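Definition 2.3 (Lagrange Likelihood Equations). Given an algebraic statistical model $M$ with a set of independent model invariants $\{g_1, \dots, g_s\}$, the polynomial set $\{f_0, \dots, f_{n+s+1}\}$ below is said to be the system of Lagrange likelihood equations of $M$ when set equal to zero:
\[
\begin{aligned}
f_0(u, p, \lambda) &= p_0\left(\frac{\partial g_1}{\partial p_0}\lambda_1 + \dots + \frac{\partial g_s}{\partial p_0}\lambda_s + \lambda_{s+1}\right) - u_0,\\
&\ \ \vdots\\
f_n(u, p, \lambda) &= p_n\left(\frac{\partial g_1}{\partial p_n}\lambda_1 + \dots + \frac{\partial g_s}{\partial p_n}\lambda_s + \lambda_{s+1}\right) - u_n,\\
f_{n+1}(u, p, \lambda) &= g_1(p_0, \dots, p_n),\\
&\ \ \vdots\\
f_{n+s}(u, p, \lambda) &= g_s(p_0, \dots, p_n),\\
f_{n+s+1}(u, p, \lambda) &= g_{s+1}(p_0, \dots, p_n) := p_0 + \dots + p_n - 1,
\end{aligned}
\tag{2}
\]
with indeterminates $u = (u_0, \dots, u_n)$, $p = (p_0, \dots, p_n)$, and $\lambda = (\lambda_1, \dots, \lambda_{s+1})$. More specifically, $u_0, \dots, u_n$ are parameters, and $p_0, \dots, p_n, \lambda_1, \dots, \lambda_{s+1}$ are variables.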

Theorem 2.4. [27] Given a system of Lagrange likelihood equations $f_0, \dots, f_{n+s+1}$ defined in (2), there exist an affine variety $V \subsetneq \mathbb{C}^{n+1}$ and a non-negative integer $N$ such that for any $b \in \mathbb{C}^{n+1} \setminus V$, the equations
\[
f_0(b, p, \lambda) = \dots = f_{n+s+1}(b, p, \lambda) = 0
\]
have $N$ common complex solutions in $\mathbb{C}^{n+1} \times \mathbb{C}^{s+1}$.

Definition 2.5 (Maximum-Likelihood-Degree). [27] Given an algebraic statistical model $M$ with a system of Lagrange likelihood equations defined as in (2), the non-negative integer $N$ stated in Theorem 2.4 is called the maximum-likelihood-degree, ML-degree for short, of $M$.

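Definition 2.6 (Mixed Discriminant). [40, Definition 4] Given an algebraic statistical model $M$ with a system of Lagrange likelihood equations $f = \{f_0, \dots, f_{n+s+1}\}$ defined in (2), we define
\[
L_{MJ} = \mathrm{proj}_{n+1}(V(f) \cap V(J)),
\]
where $J$ denotes the determinant of the Jacobian matrix of $f$ with respect to $(p, \lambda)$:
\[
\det \begin{pmatrix}
\dfrac{\partial f_0}{\partial p_0} & \cdots & \dfrac{\partial f_0}{\partial p_n} & \dfrac{\partial f_0}{\partial \lambda_1} & \cdots & \dfrac{\partial f_0}{\partial \lambda_{s+1}}\\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots\\
\dfrac{\partial f_{n+s+1}}{\partial p_0} & \cdots & \dfrac{\partial f_{n+s+1}}{\partial p_n} & \dfrac{\partial f_{n+s+1}}{\partial \lambda_1} & \cdots & \dfrac{\partial f_{n+s+1}}{\partial \lambda_{s+1}}
\end{pmatrix}.
\]
If $I(L_{MJ})$ is principal, then a generator polynomial of $I(L_{MJ})$, denoted by $D_{MJ}$, is said to be a mixed discriminant of $f$.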

It is important to highlight that $L_{MJ}$ is a component of the discriminant variety [33] of the Lagrange likelihood equations, which plays a central role in real root classification; see e.g., [40, Theorem 2]. Notice that if $I(L_{MJ})$ is principal, then the mixed discriminant $D_{MJ}$ is homogeneous in $\mathbb{Q}[u]$; see [40, Proposition 2]. By Definition 2.6, $D_{MJ}$ can be computed by computing the elimination ideal $\langle f, J \rangle \cap \mathbb{Q}[u]$. This requires computing a Gröbner basis of $\langle f, J \rangle$ with respect to a lexicographic order, where the determinant $J$ of the Jacobian matrix, defined as in Definition 2.6, can be huge. So, computing $D_{MJ}$ is usually challenging in practice; see [40, Tables 2–3].

According to [40, Algorithm 2, Strategy 3], one way to improve the efficiency of computing $D_{MJ}$ is to first compute the elimination ideal with respect to one probability variable; for instance, $\langle f \rangle \cap \mathbb{Q}[u, p_0]$. If $\sqrt{\langle f \rangle \cap \mathbb{Q}[u, p_0]} = \langle E_f \rangle$, then it is well known that $D_{MJ}$ is a factor of the discriminant of $E_f$ with respect to $p_0$; see [40, Lemma 3]. However, $E_f$ is hard to obtain by directly computing Gröbner bases; see the column “standard” in Table 1. Therefore, the goal of the rest of the paper is to compute $E_f$ more efficiently.

3 SPECIALIZATION PROPERTIES

In this section, we discuss a selection of specialization properties, which guarantee the correctness of our Algorithm 1 in Section 5. Roughly speaking, elimination ideals (and their radicals) “specialize well” over a Zariski open set (see Proposition 3.2), and the factorization of a multivariate polynomial is preserved under specialization at infinitely many rational points (see Proposition 3.3).

In what follows, we consider polynomial rings with at least two variables, i.e., $\mathbb{Q}[z_1, \dots, z_n]$ with $n \ge 2$. Given $h \in \mathbb{Q}[z_1, \dots, z_n]$, we denote, for every $1 \le i < n$, by $\mathrm{lm}_i(h)$ and $\mathrm{lcoeff}_i(h)$ the leading monomial and leading coefficient of $h$ with respect to $z_{i+1}, \dots, z_n$, when $h$ is considered in $\mathbb{Q}(z_1, \dots, z_i)[z_{i+1}, \dots, z_n]$ with the lexicographic order $z_{i+1} < \dots < z_n$. For every $b = (b_1, \dots, b_i) \in \mathbb{C}^i$, we define the polynomial $h(b) = h|_{z_1 = b_1, \dots, z_i = b_i} \in \mathbb{C}[z_{i+1}, \dots, z_n]$. For every polynomial set $H \subseteq \mathbb{Q}[z_1, \dots, z_n]$, we define $H(b) = \{h(b) \in \mathbb{C}[z_{i+1}, \dots, z_n] \mid h \in H\}$.

Definition 3.1. [32, Definition 4.1] Given $H \subseteq \mathbb{Q}[z_1, \dots, z_n]$, for any $1 \le i < n$, a subset $\mathbf{g}$ of $H$ is a noncomparable subset of $H$ with respect to $z_{i+1}, \dots, z_n$ if

(1) for every $h \in H$, there exists a $g \in \mathbf{g}$ such that $\mathrm{lm}_i(h)$ is a multiple of $\mathrm{lm}_i(g)$, and

(2) for any $g_1, g_2 \in \mathbf{g}$ with $g_1 \ne g_2$, the leading monomial $\mathrm{lm}_i(g_1)$ is not a multiple of $\mathrm{lm}_i(g_2)$, and $\mathrm{lm}_i(g_2)$ is not a multiple of $\mathrm{lm}_i(g_1)$.

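Proposition 3.2. Given $H \subseteq \mathbb{Q}[z_1, \dots, z_n]$, for any $1 \le i < n$, if the elimination ideal
\[
\langle H \rangle \cap \mathbb{Q}[z_1, \dots, z_i, z_{i+1}] = \langle q \rangle \quad \text{with } \deg(q, z_{i+1}) > 0,
\]
and if $\sqrt{\langle q \rangle} = \langle g \rangle$, then

(1) there exists an affine variety $V \subsetneq \mathbb{C}^i$ such that for any $b \in \mathbb{C}^i \setminus V$, $\langle H(b) \rangle \cap \mathbb{C}[z_{i+1}] = \langle q(b) \rangle$, and

(2) there exists an affine variety $W \subsetneq \mathbb{C}^i$ such that for any $b \in \mathbb{C}^i \setminus W$, $\sqrt{\langle H(b) \rangle \cap \mathbb{C}[z_{i+1}]} = \langle g(b) \rangle$.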

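Proof. Let $G$ be a Gröbner basis of $\langle H \rangle$ with respect to the lexicographic order $z_1 < \dots < z_n$. For any $1 \le i < n$, let $N$ be a noncomparable subset of $G$ with respect to $z_{i+1}, \dots, z_n$.

Part (1): If $\langle H \rangle \cap \mathbb{Q}[z_1, \dots, z_{i+1}] = \langle q \rangle$, then by [18, page 121, Theorem 2], $G \cap \mathbb{Q}[z_1, \dots, z_{i+1}]$ is a Gröbner basis of $\langle q \rangle$. So $G \cap \mathbb{Q}[z_1, \dots, z_{i+1}]$ contains only one element, say $h$, and hence $h = c \cdot q$, where $c \in \mathbb{Q}$. Also, $G_i = G \cap \mathbb{Q}[z_1, \dots, z_i] = \emptyset$ since $\deg(h, z_{i+1}) = \deg(q, z_{i+1}) > 0$, and hence $V(G_i) = \mathbb{C}^i$. By [32, Theorem 4.3], there exists an affine variety $V \subsetneq \mathbb{C}^i$ such that for any $b \in \mathbb{C}^i \setminus V$, $N(b)$ is a Gröbner basis of $\langle H(b) \rangle$. By [18, page 121, Theorem 2], $N(b) \cap \mathbb{C}[z_{i+1}]$ is a Gröbner basis of $\langle H(b) \rangle \cap \mathbb{C}[z_{i+1}]$. Notice that $N(b) \cap \mathbb{C}[z_{i+1}] = \{h(b)\}$. So, we have $\langle H(b) \rangle \cap \mathbb{C}[z_{i+1}] = \langle h(b) \rangle = \langle q(b) \rangle$.

Part (2): If $\sqrt{\langle q \rangle} = \langle g \rangle$, then $V(q) = V(g)$. So, for any $b \in \mathbb{C}^i$, $V(q(b)) = V(g(b))$. Hence, we have $\sqrt{\langle q(b) \rangle} = \sqrt{\langle g(b) \rangle}$. Note that $\langle g \rangle$ is a radical ideal. Then it is a basic fact that there exists an affine variety $V_1 \subsetneq \mathbb{C}^i$ such that for any $b \in \mathbb{C}^i \setminus V_1$, $\langle g(b) \rangle$ is still radical, and hence $\sqrt{\langle q(b) \rangle} = \sqrt{\langle g(b) \rangle} = \langle g(b) \rangle$. By part (1), there exists an affine variety $V_2 \subsetneq \mathbb{C}^i$ such that for any $b \in \mathbb{C}^i \setminus V_2$, we have the equality in part (1). Let $W = V_1 \cup V_2$. Then for any $b \in \mathbb{C}^i \setminus W$, we have $\sqrt{\langle H(b) \rangle \cap \mathbb{C}[z_{i+1}]} = \sqrt{\langle q(b) \rangle} = \langle g(b) \rangle$. □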

Proposition 3.3. Let $g \in \mathbb{Q}[z_1, \dots, z_n]$. If $g = \prod_{k=1}^{r} g_k^{m_k}$, where every $g_k$ is irreducible in $\mathbb{Q}[z_1, \dots, z_n]$ and $g_j \ne g_k$ for any $j \ne k$, then for any $1 \le i < n$ there exists an infinite subset $\Gamma \subseteq \mathbb{Q}^i$ such that for any $b \in \Gamma$ we have $g(b) = \prod_{k=1}^{r} g_k(b)^{m_k}$, where $g_k(b)$ is irreducible in $\mathbb{Q}[z_{i+1}, \dots, z_n]$ and $g_j(b) \ne g_k(b)$ for any $j \ne k$.

Proof. Given that $g_1, \dots, g_r$ are irreducible, by Hilbert’s irreducibility theorem, see e.g., [46, Theorem 1], for any $1 \le i < n$ there exists an infinite subset $\Theta \subseteq \mathbb{Q}^i$ such that for any $b \in \Theta$, $g_1(b), \dots, g_r(b)$ are irreducible in $\mathbb{Q}[z_{i+1}, \dots, z_n]$. Now consider an arbitrary pair $g_j, g_k$ with $j \ne k$, and thus $g_j \ne g_k$. Let
\[
W_{j,k} = \{ b \in \mathbb{C}^i \mid g_j(b) - g_k(b) = 0 \}.
\]
Obviously, $W_{j,k}$ is an affine variety, which does not equal $\mathbb{C}^i$. Then let
\[
\Gamma = \Theta \cap \bigcap_{k=1}^{r} \bigcap_{j=1}^{k-1} \left( \mathbb{Q}^i \setminus W_{j,k} \right),
\]
and we are done. □

4 GENERAL ZERO-DIMENSIONAL SYSTEMS

Throughout the rest of the paper, we always assume that a system of Lagrange likelihood equations is general zero-dimensional; see Definition 4.1. A general zero-dimensional system has a nice structure (see [19, Theorem 6.10]), which leads us to analyze its elimination ideals further. The relation between the Shape Lemma [6] and [19, Theorem 6.10] was discussed in [19].

Definition 4.1 (General Zero-Dimensional System). A polynomial set $H = \{h_1, \dots, h_m\} \subseteq \mathbb{Q}[a_1, \dots, a_k, y_1, \dots, y_m]$ is called a general zero-dimensional system if there exists an affine variety $V \subsetneq \mathbb{C}^k$ such that for any $b = (b_1, \dots, b_k) \in \mathbb{C}^k \setminus V$, the equations $h_1(b) = \dots = h_m(b) = 0$ satisfy:

(1) the number of complex solutions is a positive constant, denoted by $N(H)$;

(2) all complex solutions are distinct;

(3) for every pair of distinct complex solutions $y^* = (y_1^*, \dots, y_m^*)$ and $z^* = (z_1^*, \dots, z_m^*)$, it holds that $y_1^* \ne z_1^*$.

Proposition 4.2. Consider a general zero-dimensional system $H \subset \mathbb{Q}[a_1, \dots, a_k, y_1, \dots, y_m]$. If the elimination ideal
\[
\langle H \rangle \cap \mathbb{Q}[a_1, \dots, a_k, y_1]
\]
is principal, then its radical ideal is generated by a polynomial $g \in \mathbb{Q}[a_1, \dots, a_k, y_1]$ such that $\deg(g, y_1) = N(H)$.

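Proof. Let $G$ be a Gröbner basis of $\langle H \rangle$ with respect to the lexicographic order $a_1 < \dots < a_k < y_1 < \dots < y_m$. Since $H$ is general zero-dimensional, by [19, Theorem 6.10], there exists
\[
T_1 = C_N y_1^N + C_{N-1} y_1^{N-1} + \dots + C_1 y_1 + C_0 \in G,
\]
where $N = N(H)$ and $C_i \in \mathbb{Q}[a_1, \dots, a_k]$. By [18, page 121, Theorem 2], $G \cap \mathbb{Q}[a_1, \dots, a_k, y_1]$ is a Gröbner basis of $\langle H \rangle \cap \mathbb{Q}[a_1, \dots, a_k, y_1]$. By the hypothesis that $\langle H \rangle \cap \mathbb{Q}[a_1, \dots, a_k, y_1]$ is principal, $G \cap \mathbb{Q}[a_1, \dots, a_k, y_1]$ contains only one element. Notice that $T_1 \in G \cap \mathbb{Q}[a_1, \dots, a_k, y_1]$. So, we know $\{T_1\} = G \cap \mathbb{Q}[a_1, \dots, a_k, y_1]$, and hence $\langle H \rangle \cap \mathbb{Q}[a_1, \dots, a_k, y_1] = \langle T_1 \rangle$. By [18, page 187, Proposition 12], $\sqrt{\langle T_1 \rangle} = \langle g \rangle$, where
\[
g = \frac{T_1}{\gcd\left(T_1, \dfrac{\partial T_1}{\partial a_1}, \dots, \dfrac{\partial T_1}{\partial a_k}, \dfrac{\partial T_1}{\partial y_1}\right)}.
\]
Hence, $\deg(g, y_1) \le \deg(T_1, y_1) = N(H)$.

Below, we prove $\deg(g, y_1) \ge N(H)$. By Definition 4.1, there exists an affine variety $V_1 \subsetneq \mathbb{C}^k$ such that for any $b \in \mathbb{C}^k \setminus V_1$, $V(H(b))$ has $N(H)$ distinct complex points with distinct $y_1$-coordinates. By Proposition 3.2 (2), there exists an affine variety $V_2 \subsetneq \mathbb{C}^k$ such that for any $b \in \mathbb{C}^k \setminus V_2$,
\[
\sqrt{\langle H(b) \rangle \cap \mathbb{C}[y_1]} = \langle g(b) \rangle.
\]
Let $b \in \mathbb{C}^k \setminus (V_1 \cup V_2)$. Then $g(b) = 0$ has $N(H)$ distinct complex solutions, which are the $y_1$-coordinates of the points in $V(H(b))$. So $\deg(g, y_1) \ge \deg(g(b), y_1) \ge N(H)$. □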
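The quotient of $T_1$ by the gcd of $T_1$ and its partial derivatives is the square-free part of $T_1$. As a minimal sketch of that step, the Python code below computes the square-free part of a univariate polynomial over $\mathbb{Q}$. The restriction to a single variable and the helper names (`trim`, `derivative`, `divmod_poly`, `gcd_poly`, `squarefree_part`) are assumptions made for illustration; this is not the paper's implementation.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients (coefficients are lowest degree first)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def derivative(p):
    """Formal derivative of a univariate polynomial."""
    return trim([k * c for k, c in enumerate(p)][1:])

def divmod_poly(a, b):
    """Exact polynomial division over Q: returns (quotient, remainder)."""
    a, b = trim(a), trim(b)
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 0)
    r = [Fraction(c) for c in a]
    while len(r) >= len(b):
        shift = len(r) - len(b)
        factor = r[-1] / b[-1]
        q[shift] = factor
        for k, c in enumerate(b):
            r[shift + k] -= factor * c
        r = trim(r)  # the leading term cancels exactly
    return trim(q), r

def gcd_poly(a, b):
    """Monic gcd via the Euclidean algorithm."""
    a, b = trim(a), trim(b)
    while b:
        a, b = b, divmod_poly(a, b)[1]
    return [Fraction(c) / a[-1] for c in a]

def squarefree_part(t):
    """T / gcd(T, T'): same roots as T, each with multiplicity one."""
    quotient, remainder = divmod_poly(t, gcd_poly(t, derivative(t)))
    assert not remainder
    return quotient

# T = (y - 1)^2 (y + 2) = y^3 - 3y + 2, stored lowest degree first.
T = [2, -3, 0, 1]
print(squarefree_part(T))  # coefficients of y^2 + y - 2 = (y - 1)(y + 2)
```

In the toy example the degree drops from 3 to 2 because of the repeated root. In the proposition no such drop occurs, since the $N(H)$ roots in $y_1$ are distinct for generic parameters.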

5 ALGORITHM

Given an algebraic model $M$, let its Lagrange likelihood equation system be $f = \{f_0, \dots, f_{n+s+1}\} \subseteq \mathbb{Q}[u, p, \lambda]$. Assuming that $\langle f \rangle \cap \mathbb{Q}[u, p_0]$ is principal, we propose a probabilistic algorithm for computing the polynomial $E_f$ generating $\sqrt{\langle f \rangle \cap \mathbb{Q}[u, p_0]}$. We simply denote $\mathrm{coeff}(E_f, p_0^i)$ by $A_i(u)$. Then $E_f = \sum_{i=0}^{N} A_i(u)\, p_0^i$, where by Theorem 2.4, Definition 4.1 and Proposition 4.2, $N$ is the ML-degree of $M$. First, we highlight a fact:

(F1) $E_f$ is homogeneous with respect to $u$, and hence each $A_i$ is homogeneous with the same total degree in $\mathbb{Q}[u]$.

We omit the proof of (F1) since the argument is similar to [40, Proposition 2], which is based on a basic fact implied by (2): for every $(u, p_0) \in \mathrm{proj}_{n+2}(V(f))$ and for any complex scalar $\gamma \ne 0$, $(\gamma u, p_0)$ is also in $\mathrm{proj}_{n+2}(V(f))$.

Besides observing (F1), we make the following assumptions to simplify our algorithm:

(A1) Assume $\deg(A_N, u_0) = \deg(A_N)$, i.e., $A_N$ contains a term $u_0^{\deg(A_N)} \in \mathbb{Q}[u_0]$.

(A2) Assume $A_N$ is monic with respect to $u_0$, which makes our output polynomial $E_f$ unique.

If (A1) does not hold, then we apply an invertible linear change of the parameters $u_j$ such that (A1) holds for the new parameters (similar to [40, Algorithm 4]). For instance, we obtain new parameters $v_j$ as
\[
v_0 = u_0, \quad \text{and} \quad v_j = b_j u_j + u_0 \ \text{ for } j = 1, \dots, n,
\]
where the $b_j$ are randomly chosen rational numbers. By [40, Lemmas 1–2], $\deg(A_N(v), v_0)$ will be equal to $\deg(A_N(v))$.

Let
\[
S(u) = \sum_{i=0}^{n} u_i.
\]
The key idea of our algorithm is an observation from experiments: $S(u)$ appears as a factor in some coefficients of $E_f$ with respect to $p_0$. So, we further write
\[
E_f(u, p_0) = \sum_{i=0}^{N} A_i(u)\, p_0^i = \sum_{i=0}^{N} S(u)^{\alpha_i} R_i(u)\, p_0^i, \tag{3}
\]
where $R_i \in \mathbb{Q}[u] \setminus \langle S(u) \rangle$. In a separate paper, we intend to prove for a general model that at least one $\alpha_i$ in (3) is nonzero. The main algorithm has three steps; see Algorithm 1:

Step 1 Compute $N$, $(\alpha_0, \dots, \alpha_N)$, and the degree of every $u_j$ in every $A_i(u)$.

Algorithm 1: (Main Algorithm)

Input: Lagrange likelihood equations $f_0, \dots, f_{n+s+1} \in \mathbb{Q}[u, p, \lambda]$
Output: A generator of $\sqrt{\langle f_0, \dots, f_{n+s+1} \rangle \cap \mathbb{Q}[u, p_0]}$: $E_f(u, p_0) = \sum_{i=0}^{N} A_i(u)\, p_0^i$

1  $N, (\alpha_0, \dots, \alpha_N), L, \Omega \leftarrow$ Degrees$(f_0, \dots, f_{n+s+1})$
2  $A_N(u) \leftarrow$ LeadingCoefficient$(f_0, \dots, f_{n+s+1}, \alpha_N, L)$
3  $A_0(u), \dots, A_{N-1}(u) \leftarrow$ Coefficients$(f_0, \dots, f_{n+s+1}, A_N(u), \alpha_0, \dots, \alpha_{N-1}, \Omega)$
4  Return $\sum_{i=0}^{N} A_i(u)\, p_0^i$

Algorithm 2: (Sub-Algorithm of Algorithm 1) Degrees

Input: Lagrange likelihood equations $f_0, \dots, f_{n+s+1} \in \mathbb{Q}[u, p, \lambda]$
Output: $N, (\alpha_0, \dots, \alpha_N), L, \Omega$, where
  • $N = \deg(E_f, p_0)$,
  • $\alpha_i$ is the multiplicity of the factor $\sum_{k=0}^{n} u_k$ appearing in $\mathrm{coeff}(E_f, p_0^i)$,
  • $L$ is a list of length $n + 1$, whose $(j+1)$-th entry is $\deg(\mathrm{lcoeff}(E_f, p_0), u_j)$ for $j = 0, \dots, n$,
  • $\Omega$ is an $N \times (n+1)$ matrix, whose $(i+1, j+1)$-entry is $\deg(\mathrm{coeff}(E_f, p_0^i), u_j)$ for $i = 0, \dots, N-1$ and for $j = 0, \dots, n$.

1   $f_0^*, \dots, f_{n+s+1}^* \leftarrow$ replace $u_1, \dots, u_n$ in $f_0, \dots, f_{n+s+1}$ with some rational numbers $b_1, \dots, b_n$
2   $g(u_0, p_0) \leftarrow$ generator of $\sqrt{\langle f_0^*, \dots, f_{n+s+1}^* \rangle \cap \mathbb{Q}[u_0, p_0]}$
3   $N \leftarrow \deg(g, p_0)$
4   for $i$ from $0$ to $N$ do
5       $\alpha_i \leftarrow$ multiplicity of the factor $u_0 + \sum_{k=1}^{n} b_k$ in $\mathrm{coeff}(g, p_0^i)$
6   $L(1) \leftarrow \deg(\mathrm{lcoeff}(g, p_0), u_0)$
7   for $i$ from $0$ to $N-1$ do
8       $\Omega(i+1, 1) \leftarrow \deg(\mathrm{coeff}(g, p_0^i), u_0)$
9   for $j$ from $1$ to $n$ do
10      $f_0^*, \dots, f_{n+s+1}^* \leftarrow$ replace $u_0, \dots, u_{j-1}, u_{j+1}, \dots, u_n$ in $f_0, \dots, f_{n+s+1}$ with some rational numbers
11      $g(u_j, p_0) \leftarrow$ generator of $\sqrt{\langle f_0^*, \dots, f_{n+s+1}^* \rangle \cap \mathbb{Q}[u_j, p_0]}$
12      $L(j+1) \leftarrow \deg(\mathrm{lcoeff}(g, p_0), u_j)$
13      for $i$ from $0$ to $N-1$ do
14          $\Omega(i+1, j+1) \leftarrow \deg(\mathrm{coeff}(g, p_0^i), u_j)$
15  return $N, (\alpha_0, \dots, \alpha_N), L, \Omega$

Step 2 Compute the leading coefficient $A_N(u)$ by interpolating $R_N(u)$.

Step 3 Compute the coefficients $A_i(u)$ by interpolating $R_i(u)$ for $i = 0, \dots, N-1$.

We present the pseudocode in Algorithm 1 and its sub-algorithms, Algorithms 2–6, and a running example in Section 5.1. Algorithm 1 is guaranteed to terminate since all of its loops are finite. The algorithm is probabilistic, but Propositions 3.2–3.3 guarantee that it provides the correct output for a generic choice of random rational numbers.

5.1 Running Example

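In this subsection, we illustrate how Algorithm 1 works on the linear model $M$ below, given by a weighted four-sided die [40, Example 1], for which we know the ML-degree is 3:
\[
M = V(p_0 + 2p_1 + 3p_2 - 4p_3) \cap \Delta_3,
\]
where $\Delta_3 = \{(p_0, p_1, p_2, p_3) \in \mathbb{R}^4_{>0} \mid p_0 + p_1 + p_2 + p_3 = 1\}$. The input Lagrange likelihood equations (2) are
\[
\begin{aligned}
f_0 &= p_0\lambda_1 + p_0\lambda_2 - u_0, & f_1 &= p_1\lambda_1 + 2p_1\lambda_2 - u_1,\\
f_2 &= p_2\lambda_1 + 3p_2\lambda_2 - u_2, & f_3 &= p_3\lambda_1 - 4p_3\lambda_2 - u_3,\\
f_4 &= p_0 + 2p_1 + 3p_2 - 4p_3, & f_5 &= p_0 + p_1 + p_2 + p_3 - 1.
\end{aligned}
\]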

Algorithm 3: (Sub-Algorithm of Algorithm 1) LeadingCoefficient

Input: Lagrange likelihood equations $f_0, \dots, f_{n+s+1}$, and $\alpha_N$, $L$, where
  • $\alpha_N$ is the multiplicity of the factor $\sum_{k=0}^{n} u_k$ appearing in $\mathrm{lcoeff}(E_f, p_0)$,
  • $L$ is a list, whose $(j+1)$-th entry is $\deg(\mathrm{lcoeff}(E_f, p_0), u_j)$ for $j = 0, \dots, n$.
Output: $\mathrm{lcoeff}(E_f, p_0)$: $A_N(u)$

1   $d \leftarrow L(1) - \alpha_N$   # Here, $d = \deg(R_N, u_0)$, and by (A1), $\deg(R_N, u_0) = \deg(R_N)$
2   for $i$ from $0$ to $d-1$ do
3       Enumerate all the monomials in the set $\{u_1^{\beta_1} \cdots u_n^{\beta_n} \mid \sum_{j=1}^{n} \beta_j = d - i,\ 0 \le \beta_j \le L(j+1) - \alpha_N\}$ as $U_{i,1}, \dots, U_{i,t_i}$
4   $t \leftarrow \max(t_0, \dots, t_{d-1})$
5   for $k$ from $1$ to $t$ do
6       $b_{k,1}, \dots, b_{k,n} \leftarrow$ some rational numbers
7       $q(u_0) \leftarrow$ IntersectForLC$(f_0, \dots, f_{n+s+1}, b_{k,1}, \dots, b_{k,n}, \alpha_N)$
8       $C_{0,k}^*, \dots, C_{d-1,k}^* \leftarrow$ the coefficients of $q(u_0)$ with respect to $u_0^0, \dots, u_0^{d-1}$
9   for $i$ from $0$ to $d-1$ do
10      $M_i \leftarrow$ the $t_i \times t_i$ matrix whose $(k, r)$-entry is $U_{i,r}|_{u_1 = b_{k,1}, \dots, u_n = b_{k,n}}$
11      $C_i(u_1, \dots, u_n) \leftarrow (U_{i,1}, \dots, U_{i,t_i})\, M_i^{-1}\, (C_{i,1}^*, \dots, C_{i,t_i}^*)^T$
12  Return $\left(\sum_{k=0}^{n} u_k\right)^{\alpha_N} \left(u_0^d + \sum_{i=0}^{d-1} C_i(u_1, \dots, u_n)\, u_0^i\right)$

Algorithm 4: (Sub-Algorithm of Algorithm 3) IntersectForLC

Input: Lagrange likelihood equations $f_0, \dots, f_{n+s+1}$, some rational numbers $b_1, \dots, b_n$, and $\alpha_N$, where $\alpha_N$ is the multiplicity of the factor $\sum_{k=0}^{n} u_k$ appearing in $\mathrm{lcoeff}(E_f, p_0)$.
Output: $\left. \dfrac{\mathrm{lcoeff}(E_f, p_0)}{\left(\sum_{k=0}^{n} u_k\right)^{\alpha_N}} \right|_{u_1 = b_1, \dots, u_n = b_n}$

1  $f_0^*, \dots, f_{n+s+1}^* \leftarrow$ replace $u_1, \dots, u_n$ in $f_0, \dots, f_{n+s+1}$ with $b_1, \dots, b_n$, respectively
2  $g(u_0, p_0) \leftarrow$ generator of the radical of the elimination ideal $\langle f_0^*, \dots, f_{n+s+1}^* \rangle \cap \mathbb{Q}[u_0, p_0]$
3  $q(u_0) \leftarrow$ divide $\mathrm{lcoeff}(g, p_0)$ by $\left(u_0 + \sum_{i=1}^{n} b_i\right)^{\alpha_N}$
4  Make $q(u_0)$ monic with respect to $u_0$, and return $q(u_0)$

Algorithm 5: (Sub-Algorithm of Algorithm 1) Coefficients

Input: Lagrange likelihood equations $f_0, \dots, f_{n+s+1}$, and $A_N(u), \alpha_0, \dots, \alpha_{N-1}, \Omega$, where
  • $A_N(u) = \mathrm{lcoeff}(E_f, p_0)$,
  • $\alpha_i$ is the multiplicity of the factor $\sum_{k=0}^{n} u_k$ appearing in $\mathrm{coeff}(E_f, p_0^i)$,
  • $\Omega$ is an $N \times (n+1)$ matrix, whose $(i+1, j+1)$-entry is $\deg(\mathrm{coeff}(E_f, p_0^i), u_j)$.
Output: $\mathrm{coeff}(E_f, p_0^i)$ for $i = 0, \dots, N-1$: $A_0(u), \dots, A_{N-1}(u)$

1   $d \leftarrow \deg(A_N(u))$
2   for $i$ from $0$ to $N-1$ do
3       Enumerate all the monomials in $\{u_0^{\beta_0} \cdots u_n^{\beta_n} \mid \sum_{j=0}^{n} \beta_j = d - \alpha_i,\ 0 \le \beta_j \le \Omega(i+1, j+1) - \alpha_i\}$ as $U_{i,1}, \dots, U_{i,t_i}$
4   $t \leftarrow \max(t_0, \dots, t_{N-1})$
5   for $k$ from $1$ to $t$ do
6       $b_{k,0}, \dots, b_{k,n} \leftarrow$ some rational numbers
7       $g(p_0) \leftarrow$ Intersect$(f_0, \dots, f_{n+s+1}, b_{k,0}, \dots, b_{k,n})$
8       $C_{0,k}^*, \dots, C_{N-1,k}^* \leftarrow$ the coefficients of $g(p_0)$ with respect to $p_0^0, \dots, p_0^{N-1}$
9   for $i$ from $0$ to $N-1$ do
10      $M_i \leftarrow$ the $t_i \times t_i$ matrix whose $(k, r)$-entry is $\left. \dfrac{U_{i,r}}{A_N(u)} \right|_{u_0 = b_{k,0}, \dots, u_n = b_{k,n}}$
11      $R_i(u) \leftarrow (U_{i,1}, \dots, U_{i,t_i})\, M_i^{-1}\, (C_{i,1}^*, \dots, C_{i,t_i}^*)^T$
12  Return $\left(\sum_{k=0}^{n} u_k\right)^{\alpha_0} R_0(u), \dots, \left(\sum_{k=0}^{n} u_k\right)^{\alpha_{N-1}} R_{N-1}(u)$

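Here $p_0, p_1, p_2, p_3, \lambda_1, \lambda_2$ are the variables and $u_0, u_1, u_2, u_3$ are the parameters of the system $f_0, \dots, f_5$. The output is a generator of $\sqrt{\langle f_0, \dots, f_5 \rangle \cap \mathbb{Q}[u, p_0]}$. We write the generator in the form (3).

Step 1. First, we compute $N$, $(\alpha_0, \dots, \alpha_N)$, and $\deg(A_i, u_j)$ for $j = 0, \dots, 3$ and for $i = 0, \dots, N$. For each $u_j \ne u_0$, we substitute $u_j = b_j$, where $b_j$ is a random rational number.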

Algorithm 6: (Sub-Algorithm of Algorithm 5) Intersect

Input: Lagrange likelihood equations $f_0, \dots, f_{n+s+1}$, and some rational numbers $b_0, b_1, \dots, b_n$
Output: $\left. \dfrac{E_f(u, p_0)}{\mathrm{lcoeff}(E_f, p_0)} \right|_{u_0 = b_0, \dots, u_n = b_n}$

1  $\tilde{f}_0, \dots, \tilde{f}_{n+s+1} \leftarrow$ replace $u_0, \dots, u_n$ in $f_0, \dots, f_{n+s+1}$ with $b_0, \dots, b_n$, respectively
2  $g(p_0) \leftarrow$ generator of the radical of the elimination ideal $\langle \tilde{f}_0, \dots, \tilde{f}_{n+s+1} \rangle \cap \mathbb{Q}[p_0]$
3  Make $g(p_0)$ monic with respect to $p_0$, and return $g(p_0)$

For instance, we choose $b = (b_1, b_2, b_3) = (2, 12, 7)$. We substitute $u_j = b_j$ and rename the resulting polynomials as $f_0^*, \dots, f_5^*$. Note that $f_k^* = f_k(u_0, b, p, \lambda)$. We obtain a generator of $\sqrt{\langle f_0^*, \dots, f_5^* \rangle \cap \mathbb{Q}[u_0, p_0]}$ by computing a Gröbner basis:
\[
g^*(u_0, p_0) = 10(u_0 + 21)^2 p_0^3 - (u_0 + 21)(43u_0 + 276) p_0^2 + 2u_0(29u_0 + 396) p_0 - 24u_0^2.
\]
If $b$ is generic in the parameter space $\mathbb{C}^3$, then by Proposition 3.2 (2), $g^*(u_0, p_0) = E_f(u_0, b, p_0)$ up to a nonzero constant factor. So, we have
\[
N = \deg(E_f(u, p_0), p_0) = \deg(E_f(u_0, b, p_0), p_0) = \deg(g^*, p_0) = 3.
\]
And, for $i = 0, \dots, N$ ($= 3$), we have
\[
\deg(A_i(u), u_0) = \deg(A_i(u_0, b), u_0) = \deg(\mathrm{coeff}(g^*, p_0^i), u_0) = 2.
\]
So, we record $L(1) = \deg(A_3, u_0) = 2$ and $\Omega(i+1, 1) = \deg(A_i, u_0) = 2$ for $i = 0, 1, 2$. Similarly, we compute the degrees of the other parameters, and have
\[
L = [2, 2, 2, 2], \quad \text{and} \quad \Omega = \begin{pmatrix} 2 & 0 & 0 & 0\\ 2 & 1 & 1 & 1\\ 2 & 2 & 2 & 2 \end{pmatrix},
\]
where $L(j+1)$ records $\deg(A_3, u_j)$, and $\Omega(i+1, j+1)$ records $\deg(A_i, u_j)$ for $i = 0, 1, 2$. Notice that $S(u_0, b) = u_0 + 21$. By Proposition 3.3, checking the multiplicity of the factor $u_0 + 21$ in each $\mathrm{coeff}(g^*, p_0^i)$ for $i = 0, \dots, 3$, we have $\alpha_0 = \alpha_1 = 0$, $\alpha_2 = 1$, and $\alpha_3 = 2$.
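The bookkeeping of Step 1 can be replayed on the specialized generator $g^*(u_0, p_0)$ from the running example. The sketch below is only an illustration in Python, not the authors' Maple code: the coefficients of $g^*$ are expanded by hand, and the helper name `multiplicity_of_linear_factor` is an assumption. It recovers the multiplicities of the factor $u_0 + 21$ and the degrees in $u_0$.

```python
from fractions import Fraction

def multiplicity_of_linear_factor(coeffs, root):
    """Multiplicity of (x - root) in a polynomial given by its coefficients
    (lowest degree first), using repeated synthetic division."""
    coeffs = [Fraction(c) for c in coeffs]
    m = 0
    while any(coeffs):
        quotient, acc = [], Fraction(0)
        for c in reversed(coeffs):      # Horner scheme, highest degree first
            acc = acc * root + c
            quotient.append(acc)
        remainder = quotient.pop()      # the last Horner value is the remainder
        if remainder != 0:
            break
        coeffs = quotient[::-1]
        m += 1
    return m

# coeff(g*, p0^i) for i = 0, 1, 2, 3 as polynomials in u0 (lowest degree first).
g_star = [
    [0, 0, -24],          # -24 u0^2
    [0, 792, 58],         # 2 u0 (29 u0 + 396)
    [-5796, -1179, -43],  # -(u0 + 21)(43 u0 + 276)
    [4410, 420, 10],      # 10 (u0 + 21)^2
]

# The factor u0 + 21 corresponds to the root u0 = -21.
alphas = [multiplicity_of_linear_factor(c, -21) for c in g_star]
degrees_u0 = [len(c) - 1 for c in g_star]

print(alphas)      # [0, 0, 1, 2]
print(degrees_u0)  # [2, 2, 2, 2]
```

The first list gives $(\alpha_0, \alpha_1, \alpha_2, \alpha_3)$. The second gives the first column of $\Omega$ (rows $i = 0, 1, 2$) followed by $L(1)$.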

Step 2. The second step is to recover the leading coefficient $A_N(u)$. By Step 1, we know $N = 3$ and $\alpha_3 = 2$. We write $A_N$ as $A_3(u) = S(u)^2 R_3(u)$. By the degrees recorded in $L$, we know the degrees of $u_0, u_1, u_2, u_3$ in $A_3(u)$ are all 2. So, $R_3(u) \in \mathbb{Q}$. According to assumption (A2), $A_3(u)$ is monic with respect to $u_0$. Hence, $R_3(u) = 1$ and $A_3(u) = S(u)^2$.

Step 3. The last step is to interpolate $A_0(u)$, $A_1(u)$ and $A_2(u)$. As an example, we show how to interpolate $A_2(u)$ in detail. By Step 1, we have $\alpha_2 = 1$. So we write $A_2(u) = S(u) R_2(u)$. By the last row of $\Omega$, the degrees of $u_0, u_1, u_2, u_3$ in $A_2(u)$ are $2, 2, 2, 2$. Thus, the degrees of $u_0, u_1, u_2, u_3$ in $R_2(u)$ are $1, 1, 1, 1$, respectively. By (F1), $A_2$ is homogeneous, and we have $\deg(A_2) = \deg(A_3) = 2$. So $R_2$ is also homogeneous, and $\deg(R_2) = \deg(A_2) - \deg(S(u)) = 1$. Then we can assume $R_2(u) = \sum_{k=0}^{3} C_k u_k$, where $C_k \in \mathbb{Q}$. In order to determine the four coefficients $C_k$, we establish four linear equations by sampling four times. The correctness of this sampling step is guaranteed by Proposition 3.2 (2). We show below how to do the sampling and establish the first linear equation (4) in detail. The other equations in (5) are obtained similarly.
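The ansatz for each $R_i$ comes from enumerating the monomials allowed by the recorded degrees, as in Line 3 of Algorithm 5. The following Python sketch reproduces that count for the running example. The function name `monomial_exponents` is hypothetical, and the degree bounds are assumed to be read from the rows of $\Omega$.

```python
from itertools import product

def monomial_exponents(total_degree, upper_bounds):
    """Exponent vectors (beta_0, ..., beta_n) with sum(beta) == total_degree
    and 0 <= beta_j <= upper_bounds[j]."""
    ranges = (range(bound + 1) for bound in upper_bounds)
    return [e for e in product(*ranges) if sum(e) == total_degree]

# Data recorded in Steps 1 and 2 of the running example.
d = 2                      # deg(A_3) = deg(S(u)^2)
alpha = [0, 0, 1]          # alpha_0, alpha_1, alpha_2
Omega = [[2, 0, 0, 0],
         [2, 1, 1, 1],
         [2, 2, 2, 2]]

# Support of R_i: total degree d - alpha_i, degree in u_j at most Omega[i][j] - alpha_i.
supports = [
    monomial_exponents(d - alpha[i], [w - alpha[i] for w in Omega[i]])
    for i in range(3)
]
counts = [len(s) for s in supports]

print(counts)        # [1, 7, 4]
print(supports[2])   # the four monomials u3, u2, u1, u0 of R_2
print(max(counts))   # number of sample points t
```

For $i = 2$ this yields exactly the four unknowns $C_0, \dots, C_3$, which is why four samples suffice for $A_2$. Under these bounds, the largest support belongs to $R_1$, which has seven candidate monomials.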

For every $u_j$, we substitute $u_j = b_j$ into $f_0, \dots, f_5$, where $b_j$ is a random rational number. For instance, we choose $b = (b_0, b_1, b_2, b_3) = (5, 6, 11, 32)$. We substitute $u_j = b_j$ and rename the resulting polynomials as $\tilde{f}_0, \dots, \tilde{f}_5$. Note that $\tilde{f}_k = f_k(b, p, \lambda)$. We compute a generator of $\sqrt{\langle \tilde{f}_0, \dots, \tilde{f}_5 \rangle \cap \mathbb{Q}[p_0]}$ and make it monic:
\[
\tilde{g}^{(1)}(p_0) = p_0^3 - \frac{7}{5} p_0^2 + \frac{481}{1458} p_0 - \frac{5}{243}.
\]
By Proposition 3.2 (2), if $b$ is generic in $\mathbb{C}^4$, then $\tilde{g}^{(1)}(p_0) = \dfrac{E_f(b, p_0)}{A_3(b)}$. So $\mathrm{coeff}(\tilde{g}^{(1)}, p_0^2) = \dfrac{A_2(b)}{A_3(b)}$. By the discussion above, we have $A_2(u) = S(u) \sum_{k=0}^{3} C_k u_k$, and by Step 2, $A_3(u) = S(u)^2$. So, we obtain
\[
-\frac{7}{5} = \mathrm{coeff}(\tilde{g}^{(1)}, p_0^2) = \frac{A_2(b)}{A_3(b)} = \frac{5C_0 + 6C_1 + 11C_2 + 32C_3}{54}. \tag{4}
\]
Similarly, we obtain the other linear equations by sampling:
\[
\begin{aligned}
-\frac{311}{120} &= \frac{11C_0 + 2C_1 + 3C_2 + 8C_3}{24},\\
-\frac{244}{115} &= \frac{7C_0 + 2C_1 + 5C_2 + 9C_3}{23},\\
-\frac{181}{110} &= \frac{7C_0 + 3C_1 + 13C_2 + 21C_3}{44}.
\end{aligned}
\tag{5}
\]
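The four sampled equations (4)–(5) form a $4 \times 4$ linear system in $C_0, \dots, C_3$. As a small sketch (plain Gauss-Jordan elimination over exact rationals, standing in for the matrix inversion in Line 11 of Algorithm 5), it can be solved as follows. The helper `solve_exact` is hypothetical. The sample points are read off from the numerators of (4)–(5), and each denominator is the value of $S$ at that point.

```python
from fractions import Fraction

def solve_exact(A, rhs):
    """Solve A x = rhs over the rationals by Gauss-Jordan elimination.
    Assumes A is square and invertible."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n] for row in M]

# Each sample: (point b, sampled value of coeff(g~, p0^2) = R_2(b) / S(b)).
samples = [
    ((5, 6, 11, 32), Fraction(-7, 5)),
    ((11, 2, 3, 8), Fraction(-311, 120)),
    ((7, 2, 5, 9), Fraction(-244, 115)),
    ((7, 3, 13, 21), Fraction(-181, 110)),
]

# R_2(b) = C0*b0 + C1*b1 + C2*b2 + C3*b3 = value * S(b), with S(b) = sum(b).
A = [list(b) for b, _ in samples]
rhs = [value * sum(b) for b, value in samples]

C = solve_exact(A, rhs)
print(C)  # [Fraction(-43, 10), Fraction(-2, 1), Fraction(-3, 2), Fraction(-4, 5)]
```

Exact rational arithmetic matters here: the interpolated coefficients are meant to be exact rational numbers, so floating-point linear algebra would not be a faithful substitute.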

Solving for $C_0, \dots, C_3$ from the four equations (4)–(5), we have
\[
C_0 = -\frac{43}{10}, \quad C_1 = -2, \quad C_2 = -\frac{3}{2}, \quad C_3 = -\frac{4}{5},
\]
and hence $A_2(u) = -S(u)\left(\frac{43}{10}u_0 + 2u_1 + \frac{3}{2}u_2 + \frac{4}{5}u_3\right)$. One can similarly interpolate $A_0(u)$ and $A_1(u)$. Finally, the output is
\[
E_f = S(u)^2 p_0^3 - S(u)\left(\frac{43}{10}u_0 + 2u_1 + \frac{3}{2}u_2 + \frac{4}{5}u_3\right) p_0^2 + \frac{1}{5}u_0(29u_0 + 23u_1 + 21u_2 + 14u_3)\, p_0 - \frac{12}{5}u_0^2.
\]
Also, it is straightforward to check that the discriminant of $E_f$ with respect to $p_0$ is
\[
\begin{aligned}
4u_0^2(u_0 + u_1 + u_2 + u_3)^2 \big(
& 441u_0^4 + 4998u_0^3u_1 + 20041u_0^2u_1^2 + 33320u_0u_1^3 + 19600u_1^4 - 756u_0^3u_2\\
& + 20034u_0^2u_1u_2 + 83370u_0u_1^2u_2 + 79800u_1^3u_2 - 5346u_0^2u_2^2 + 55890u_0u_1u_2^2\\
& + 119025u_1^2u_2^2 + 4860u_0u_2^3 + 76950u_1u_2^3 + 18225u_2^4 - 1596u_0^3u_3 - 11116u_0^2u_1u_3\\
& - 17808u_0u_1^2u_3 + 4480u_1^3u_3 + 7452u_0^2u_2u_3 - 7752u_0u_1u_2u_3 + 49680u_1^2u_2u_3\\
& - 17172u_0u_2^2u_3 + 71460u_1u_2^2u_3 + 27540u_2^3u_3 + 2116u_0^2u_3^2 + 6624u_0u_1u_3^2\\
& - 4224u_1^2u_3^2 - 9528u_0u_2u_3^2 + 15264u_1u_2u_3^2 + 14724u_2^2u_3^2 - 1216u_0u_3^3\\
& - 512u_1u_3^3 + 3264u_2u_3^3 + 256u_3^4 \big),
\end{aligned}
\]
where the last factor is the mixed discriminant of $f_0, \dots, f_5$.
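Two inexpensive consistency checks on the output $E_f$ of the running example can be sketched in Python with exact rationals. Check (a) normalizes $E_f(b, p_0)$ at the sample point $b = (5, 6, 11, 32)$ and compares it with the monic polynomial $\tilde{g}^{(1)}$. Check (b) builds a solution of the likelihood equations $f_0, \dots, f_5$ by hand and confirms that its $p_0$-coordinate is a root of $E_f$. The point on the model and the multiplier values in (b) are arbitrary illustrative choices, not data from the paper.

```python
from fractions import Fraction as F

def Ef_coeffs(u):
    """Coefficients A_0(u), ..., A_3(u) of the output E_f of the running example."""
    u0, u1, u2, u3 = map(F, u)
    S = u0 + u1 + u2 + u3
    return [
        -F(12, 5) * u0**2,
        F(1, 5) * u0 * (29*u0 + 23*u1 + 21*u2 + 14*u3),
        -S * (F(43, 10)*u0 + 2*u1 + F(3, 2)*u2 + F(4, 5)*u3),
        S**2,
    ]

# (a) Normalize E_f(b, p0) by its leading coefficient A_3(b).
A = Ef_coeffs((5, 6, 11, 32))
monic = [a / A[3] for a in A]
print(monic)  # coefficients of p0^0, ..., p0^3 of the sampled polynomial

# (b) A point p on the model: p0 + 2 p1 + 3 p2 - 4 p3 = 0 and p0 + ... + p3 = 1.
c = (1, 2, 3, -4)
p = (F(1, 5), F(1, 6), F(2, 7), F(73, 210))
on_model = sum(p) == 1 and sum(ci * pi for ci, pi in zip(c, p)) == 0

# Pick multipliers (arbitrary) and define the data u so that
# f_i = p_i*lam1 + c_i*p_i*lam2 - u_i vanishes for i = 0, ..., 3.
lam1, lam2 = F(3), F(1, 2)
u = [pi * (lam1 + ci * lam2) for pi, ci in zip(p, c)]

# p0 must then be a root of E_f(u, .).
value = sum(a * p[0]**k for k, a in enumerate(Ef_coeffs(u)))
print(on_model, u, value)
```

Check (b) relies only on the shape of $f_0, \dots, f_5$ as printed in Section 5.1, so it is independent of the Gröbner basis computations used to obtain $E_f$.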

6 COMPUTATIONAL RESULTS

In this section, we explain the implementation details, and compare the timings of Algorithm 1 and existing methods by testing a list of interesting algebraic models.

6.1 Implementation

First, we explain the implementation and experimental details. Testing models, Maple code and computational results are available online via:

https://sites.google.com/site/rootclassification/publications/supplementary-materials/lle2018

Software. We implemented Algorithm 1 in Maple 2018, where we use the FGb command fgb_gbasis_elim for computing elimination ideals, for instance, in Algorithm 2 (Lines 2 and 11), Algorithm 4 (Line 2) and Algorithm 6 (Line 2).

Hardware and System. We used a 3.2 GHz Intel Core i5 processor (8 GB of RAM) under OS X 10.9.3.

Testing Models. The testing models are chosen from the literature [20, 27] and have been tested with both the standard elimination method and Algorithm 1.


6.2 Computing elimination ideals

We have computed the radical elimination ideals $E_f$ for the testing models by standard elimination, [40, Algorithm 2] and Algorithm 1. Table 1 compares the timings of the three methods.

Conclusion from Table 1: For smaller models with ML-degree less than 5, computing Gröbner bases directly (standard elimination) is the fastest method; for larger models with ML-degree greater than 5, Algorithm 1 is the fastest. In particular, comparing the columns “Interpolation” and “Algorithm 1”, we see that exploiting the structure of elimination ideals indeed improves the efficiency significantly.

Instructions for Table 1:

(1) The columns “#$p_i$” and “ML-Degree” give the number of probability variables $n$ and the ML-degree $N$, respectively.

(2) In the column “standard” we record the time to compute the elimination ideal $\langle f \rangle \cap \mathbb{Q}[u, p_0]$. When FGb returned no output before running out of memory, we record “∞”.

(3) We record the timings of [40, Algorithm 2] and Algorithm 1 in the columns “Interpolation” and “Algorithm 1”. A timing in italics means that the computation did not finish within two weeks; in that case we estimate the sampling time, which provides a lower bound; see Example 6.1.

Example 6.1. In Table 1, the estimated total timings for sampling (see Step 3 in Section 5.1) are displayed in italics. We explain how to estimate these timings using the running example in Section 5.1. There are 4 parameters $u_i$ ($i = 0, 1, 2, 3$). We know that the degrees $\deg(A_2, u_i)$ are all 2. Also, by (F1) and (A1), $A_2$ is homogeneous and $\deg(A_2) = \deg(A_2, u_0) = 2$. So $A_2$ is a linear combination of 10 monomials. [40, Algorithm 2] interpolates $A_2(u)$ directly without using any structure, so we need to sample 10 times. However, Algorithm 1 interpolates $R_2(u)$, which is a factor of $A_2(u)$ as shown in the running example, so we only need to sample 4 times since there are 4 possible monomials in $R_2(u)$. Using Maple, we measured that a single sample in Step 3 takes 0.02 seconds. We then estimate the sampling times of [40, Algorithm 2] and Algorithm 1 to be 0.02 × 10 = 0.2 seconds and 0.02 × 4 = 0.08 seconds, respectively.
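The estimate in Example 6.1 is a monomial count multiplied by the time per sample. A minimal sketch in Python, assuming (as in the running example) that $A_2$ is homogeneous of degree 2 and that $R_2$ is a linear form in $u_0, \ldots, u_3$. The function name `num_monomials` is hypothetical and is not part of the authors' code.

```python
from fractions import Fraction
from math import comb

def num_monomials(num_vars, degree):
    """Number of monomials of total degree exactly `degree` in `num_vars` variables."""
    return comb(num_vars + degree - 1, degree)

# 0.02 s per sample, the value measured in Example 6.1.
TIME_PER_SAMPLE = Fraction(2, 100)

# [40, Algorithm 2] interpolates A_2 (homogeneous, degree 2, 4 parameters).
samples_interpolation = num_monomials(4, 2)
# Algorithm 1 interpolates the factor R_2 (assumed linear in the 4 parameters).
samples_algorithm1 = num_monomials(4, 1)

est_interpolation = TIME_PER_SAMPLE * samples_interpolation
est_algorithm1 = TIME_PER_SAMPLE * samples_algorithm1
print(samples_interpolation, float(est_interpolation))  # 10 samples, 0.2 s
print(samples_algorithm1, float(est_algorithm1))        # 4 samples, 0.08 s
```

Because the count grows binomially with the degree, interpolating a lower-degree factor instead of the full coefficient reduces the number of samples quickly as models get larger.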

6.3 Computing discriminants

Given $E_f(u, p_0)$, we can directly compute the discriminant of $E_f$ with respect to $p_0$, denoted by $\mathrm{discr}(E_f; p_0)$, by eliminating $p_0$ from $E_f$ and $\frac{\partial E_f}{\partial p_0}$. We summarize the timings of this method for computing discriminants in Table 2. We make the following remarks on Table 2:

(1) We could not save the large result for Model 6 to a text file. The temporary file had reached 32 GB when we interrupted the saving process.

(2) For Model 6 and Model 9, the estimated timings for [40, Algorithm 2] to compute $D_{MJ}$ are 13374 and 454833 days, respectively. Note that $D_{MJ}$ is a factor of $\mathrm{discr}(E_f; p_0)$.
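For a cubic in $p_0$, eliminating $p_0$ from $E_f$ and $\partial E_f / \partial p_0$ yields, up to sign and a power of the leading coefficient, the classical closed-form discriminant. The sketch below is in Python and is only a numerical spot check, not the FGb-based elimination used in the paper. It evaluates that closed form for the running example of Section 5.1 at two parameter points. As an assumption made here, $E_f$ is multiplied by 10 to clear denominators, and the function names are hypothetical.

```python
from fractions import Fraction as F

def cubic_discriminant(a, b, c, d):
    """Discriminant of a*p^3 + b*p^2 + c*p + d with respect to p."""
    return (b * b * c * c - 4 * a * c**3 - 4 * b**3 * d
            - 27 * a * a * d * d + 18 * a * b * c * d)

def scaled_Ef_coeffs(u):
    """Coefficients (p_0^3, p_0^2, p_0, 1) of 10*E_f(u, p_0) from Section 5.1.

    The factor 10 is an assumption of this sketch: it clears the
    denominators 10, 2 and 5 appearing in E_f.
    """
    u0, u1, u2, u3 = (F(x) for x in u)
    S = u0 + u1 + u2 + u3
    a3 = 10 * S**2
    a2 = -S * (43 * u0 + 20 * u1 + 15 * u2 + 8 * u3)
    a1 = 2 * u0 * (29 * u0 + 23 * u1 + 21 * u2 + 14 * u3)
    a0 = -24 * u0**2
    return a3, a2, a1, a0

d1 = cubic_discriminant(*scaled_Ef_coeffs((1, 0, 0, 0)))
d2 = cubic_discriminant(*scaled_Ef_coeffs((1, 1, 0, 0)))
print(d1, d2)
```

At $u = (1, 0, 0, 0)$ the value is $1764 = 4 \cdot 441$, and at $u = (1, 1, 0, 0)$ it is $1254400 = 4 \cdot 2^2 \cdot 78400$, where $78400 = 441 + 4998 + 20041 + 33320 + 19600$. Both agree with the factored discriminant stated in Section 5.1 at these two points. This suggests, but does not prove, that the stated expression corresponds to the denominator-free form of $E_f$.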

ACKNOWLEDGMENTS

We thank David A. Cox, Hoon Hong, Anne Shiu, Frank Sottile, and Bernd Sturmfels for their support and advice. TdW was partially supported by the DFG grant WO 2206/1-1. XT was partially supported by the NSF (DMS-1513364, DMS-1752672, and CCF-1708884).

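| Models | Degree | Size | Our Method: $E_f$ | Our Method: $\mathrm{discr}(E_f; p_0)$ | Our Method: Total |
|---|---|---|---|---|---|
| Model 4 | 110 | 7.5 MB | 782.676 s | 0.027 s | 783 s |
| Model 6 | 342 | >32 GB | 14 d | 11.379 s | 14 d |
| Model 9 | 176 | 8.68 GB | 2 d | 81.015 s | 2 d |

Table 2: Runtimes for computing discriminants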

REFERENCES

[1] C. Amendola, N. Bliss, I. Burke, C. R. Gibbons, M. Helmer, S. Hoşten, E. D. Nash, J. I. Rodriguez, and D. Smolkin. 2018. The maximum likelihood degree of toric varieties. (2018). Accepted by J. Symb. Comput. arXiv:1703.02251.

[2] D. S. Arnon and S. Dennis. 1988. A cluster-based cylindrical algebraic decomposition algorithm. J. Symb. Comput. 5, 1 (1988), 189–212.

[3] S. Basu, R. Pollack, and M. F. Roy. 1996. On the combinatorial and algebraic complexity of quantifier elimination. Journal of ACM 43, 6 (1996), 1002–1045.

[4] S. Basu, R. Pollack, and M. F. Roy. 1999. Computing roadmaps of semi-algebraic sets on a variety. Journal of the AMS 3, 1 (1999), 55–82.

[5] S. Basu, R. Pollack, and M. F. Roy. 2006. Algorithms in Real Algebraic Geometry. Springer-Verlag.

[6] E. Becker, M. G. Marinari, T. Mora, and C. Traverso. 1994. The shape of the Shape Lemma. In Proceedings of ISSAC’94. ACM New York, 129–133.

[7] C. W. Brown. 2001. Improved projection for cylindrical algebraic decomposition. J. Symb. Comput. 32, 5 (2001), 447–465.

[8] C. W. Brown. 2001. Simple CAD construction and its applications. J. Symb. Comput. 31, 5 (2001), 521–547.

[9] C. W. Brown. 2003. QEPCAD B: a program for computing with semi-algebraic sets using CADs. ACM SIGSAM Bulletin 37, 4 (2003), 97–108.

[10] C. W. Brown. 2012. Fast simplifications for Tarski formulas based on monomial inequalities. J. Symb. Comput. 47, 7 (2012), 859–882.

[11] C. W. Brown. 2013. Constructing a single open cell in a cylindrical algebraic decomposition. In ISSAC Proceedings of the International Symposium on Symbolic and Algebraic Computation. ACM, 133–140.

[12] B. Buchberger. 1965. An algorithm for finding the basis elements of the residue class ring of a zero dimensional polynomial ideal. J. Symb. Comput. 41, 3 (1965), 475–511.

[13] M.-L. G. Buot, S. Hoşten, and D. Richards. 2007. Counting and locating the solutions of polynomial systems of maximum likelihood equations, II: The Behrens-Fisher problem. Statistica Sinica 17 (2007), 1343–1354.

[14] F. Catanese, S. Hoşten, A. Khetan, and B. Sturmfels. 2006. The maximum likelihood degree. Amer. J. Math. 128, 3 (2006), 671–697.

[15] C. Chen, J. H. Davenport, J. P. May, M. M. Maza, B. Xia, and R. Xiao. 2010. Triangular decomposition of semi-algebraic systems. In ISSAC’10 Proceedings of the 35th International Symposium on Symbolic and Algebraic Computation. ACM New York, 187–194.

[16] G. E. Collins. 1975. Quantifier Elimination for the Elementary Theory of Real Closed Fields by Cylindrical Algebraic Decomposition. In Lecture Notes In Computer Science, Vol. 33. Springer-Verlag, Berlin, 134–183.

[17] G. E. Collins and H. Hong. 1991. Cylindrical algebraic decomposition for quantifier elimination. J. Symb. Comput. 12, 3 (1991), 299–328.

[18] D. A. Cox, J. Little, and D. O'Shea. 2015. Ideals, varieties, and algorithms: an introduction to computational algebraic geometry and commutative algebra. Springer.

[19] A. Dickenstein, M. P. Millán, A. Shiu, and X. Tang. 2019. Multistationarity in structured reaction networks. Bulletin of Mathematical Biology 81, 5 (2019), 1527–1581.

[20] M. Drton, B. Sturmfels, and S. Sullivant. 2009. Lectures on algebraic statistics. Springer.

[21] J. C. Faugère. 1999. A new efficient algorithm for computing Gröbner bases (F4). Journal of Pure and Applied Algebra 139, 1 (1999), 61–88.

[22] J. C. Faugère, P. Gianni, D. Lazard, and T. Mora. 1993. Efficient computation of zero-dimensional Gröbner bases by change of ordering. J. Symb. Comput. 16, 4 (1993), 329–344.

[23] D. Grigoriev. 1988. Complexity of deciding Tarski algebra. J. Symb. Comput. 5, 1–2 (1988), 65–108.

[24] E. Gross, M. Drton, and S. Petrović. 2012. Maximum likelihood degree of variance component models. Electronic Journal of Statistics 6 (2012), 993–1016.

[25] E. Gross and J. I. Rodriguez. 2014. Maximum likelihood geometry in the presence of data zeros. In ISSAC’14 Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation. ACM New York, 232–239.

[26] J. Hauenstein, J. I. Rodriguez, and B. Sturmfels. 2012. Maximum likelihood for matrices with rank constraints. Journal of Algebraic Statistics (2012). To Appear.

[27] S. Hoşten, A. Khetan, and B. Sturmfels. 2005. Solving the likelihood equations.


[28] H. Hong. 1990. An improvement of the projection operator in cylindrical algebraic decomposition. In ISSAC Proceedings of the International Symposium on Symbolic and Algebraic Computation. ACM, 261–264.

[29] H. Hong. 1992. Simple solution formula construction in cylindrical algebraic decomposition based quantifier elimination. In ISSAC Proceedings of the International Symposium on Symbolic and Algebraic Computation. ACM, 177–188.

[30] H. Hong and M. Safey El Din. 2012. Variant Quantifier Elimination. J. Symb. Comput. 47, 7 (2012), 883–901.

[31] J. Huh and B. Sturmfels. 2014. Likelihood geometry. Number 63–117. Springer International Publishing.

[32] D. Kapur, Y. Sun, and D. K. Wang. 2010. A new algorithm for computing comprehensive Gröbner systems. In ISSAC’10 Proceedings of the 35th International Symposium on Symbolic and Algebraic Computation. 29–36.

[33] D. Lazard and F. Rouillier. 2005. Solving parametric polynomial systems. Journal of Symbolic Computation 42, 6 (2005), 636–667.

[34] S. McCallum. 1988. An improved projection operation for cylindrical algebraic decomposition of three-dimensional space. J. Symb. Comput. 5, 1 (1988), 141–161.

[35] S. McCallum. 1999. On projection in CAD-based quantifier elimination with equational constraints. In ISSAC Proceedings of the International Symposium on Symbolic and Algebraic Computation. ACM, 145–149.

[36] J. Renegar. 1992. On the computational complexity and geometry of the first-order theory of the reals. Part I. J. Symb. Comput. 13, 3 (1992), 255–299.

[37] J. Renegar. 1992. On the computational complexity and geometry of the first-order theory of the reals. Part II. J. Symb. Comput. 13, 3 (1992), 301–327.

[38] J. Renegar. 1992. On the computational complexity and geometry of the first-order theory of the reals. Part III. J. Symb. Comput. 13, 3 (1992), 329–352.

[39] J. I. Rodriguez and X. Tang. 2015. Data-Discriminants of Likelihood Equations. In ISSAC’15 Proceedings of the 40th International Symposium on Symbolic and Algebraic Computation. ACM New York, 307–314.

[40] J. I. Rodriguez and X. Tang. 2017. A Probabilistic Algorithm for Computing Data-Discriminants of Likelihood Equations. Journal of Symbolic Computation 83 (2017), 342–364.

[41] M. Safey El Din and E. Schost. 2003. Polar varieties and computation of one point in each connected component of a smooth real algebraic set. In ISSAC’03 Proceedings of International Symposium on Symbolic and Algebraic Computation. 224–231.

[42] M. Safey El Din and E. Schost. 2004. Properness defects of projections and computation of at least one point in each connected component of a real algebraic set. Discrete and Computational Geometry 32, 3 (2004), 417–430.

[43] B. Sturmfels. 2002. Solving systems of polynomial equations. In Regional conference series in mathematics. Vol. 97. American Mathematical Society, Providence, R. I.

[44] S. Sullivant. 2018. Algebraic Statistics. Graduate Studies in Mathematics, Vol. 194. American Mathematical Society.

[45] A. Tarski. 1951. A decision method for elementary algebra and geometry. University of California Press.

[46] M. B. Villarino, W. Gasarch, and K. W. Regan. 2018. Hilbert’s proof of his irreducibility theorem. The American Mathematical Monthly 125, 6 (2018), 513–530.

[47] L. Yang, X. Hou, and B. Xia. 2001. A complete algorithm for automated discovering of a class of inequality-type theorems. Science in China Series F: Information Sciences 44, 1 (2001), 33–49.