Besides the flexibility of the models introduced by Laird and Ware (1982), the success of mixed effects models is due to their great numerical tractability and to the joint development of efficient algorithms and powerful computers. In particular, the Expectation-Maximization (EM) algorithm (Dempster et al., 1977) is a very popular algorithm for maximizing the likelihood of latent data models in very general frameworks.
The EM Algorithm and its Variants. Linear mixed effects models are an ideal application framework for the EM algorithm. Already in the seminal paper of Dempster et al. (1977), specific attention was given to them, and the trend continued thereafter (Foulley, 2002; Laird et al., 1987; Laird and Ware, 1982; Meng and Van Dyk, 1997).
Convergence of the EM algorithm toward a local maximum of the observed likelihood was proved by Dempster et al. (1977), with revisions by Wu (1983). However, the assumptions initially required were difficult to check, and the framework introduced by Delyon et al. (1999) provides more reasonable ones. The algorithm iterates two steps until convergence: an expectation step, the E-step, in which we compute the conditional expectation of the complete log-likelihood given the observed variables and the current parameter estimate, and a maximization step, the M-step, in which we update the parameters by maximizing the expected log-likelihood computed in the E-step.
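As an illustration of the two steps, here is a minimal sketch of EM on a toy random-intercept model, y_ij = mu + b_i + e_ij with b_i ∼ N(0, tau2) and e_ij ∼ N(0, sigma2). The model, the function names (`simulate`, `em_step`, `run_em`) and the simulation settings are assumptions chosen for this example and are not taken from the cited papers. In this model both steps have closed forms, and the observed log-likelihood can be evaluated exactly to check that it does not decrease.

```python
import math
import random


def simulate(n_groups, n_per_group, mu, tau2, sigma2, seed=0):
    """Draw toy data y_ij = mu + b_i + e_ij (hypothetical example model)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_groups):
        b = rng.gauss(0.0, math.sqrt(tau2))
        data.append([mu + b + rng.gauss(0.0, math.sqrt(sigma2))
                     for _ in range(n_per_group)])
    return data


def log_likelihood(data, mu, tau2, sigma2):
    """Observed log-likelihood: each group is Gaussian with covariance
    sigma2 * I + tau2 * 1 1^T, whose determinant and inverse are explicit."""
    total = 0.0
    for y in data:
        n = len(y)
        resid = [x - mu for x in y]
        s1 = sum(resid)
        s2 = sum(r * r for r in resid)
        quad = (s2 - tau2 / (sigma2 + n * tau2) * s1 * s1) / sigma2
        logdet = (n - 1) * math.log(sigma2) + math.log(sigma2 + n * tau2)
        total += -0.5 * (n * math.log(2.0 * math.pi) + logdet + quad)
    return total


def em_step(data, mu, tau2, sigma2):
    # E-step: b_i | y_i is Gaussian with mean m_i and variance v_i.
    post = []
    for y in data:
        v = 1.0 / (1.0 / tau2 + len(y) / sigma2)
        m = v * sum(x - mu for x in y) / sigma2
        post.append((m, v))
    # M-step: closed-form maximizers of the expected complete log-likelihood.
    n_total = sum(len(y) for y in data)
    new_mu = sum(x - m for y, (m, _) in zip(data, post) for x in y) / n_total
    new_tau2 = sum(m * m + v for m, v in post) / len(data)
    new_sigma2 = sum((x - new_mu - m) ** 2 + v
                     for y, (m, v) in zip(data, post) for x in y) / n_total
    return new_mu, new_tau2, new_sigma2


def run_em(data, n_iter=50, init=(0.0, 1.0, 1.0)):
    theta = init
    history = [log_likelihood(data, *theta)]
    for _ in range(n_iter):
        theta = em_step(data, *theta)
        history.append(log_likelihood(data, *theta))
    return theta, history


data = simulate(n_groups=30, n_per_group=5, mu=5.0, tau2=2.0, sigma2=1.0)
theta_hat, loglik_history = run_em(data)
print("estimated (mu, tau2, sigma2):", theta_hat)
```

The list `loglik_history` should be non-decreasing up to rounding error, which is the monotonicity property underlying the convergence results mentioned above.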
Besides improvements to the speed of convergence (McLachlan and Krishnan, 2007), many other variants of the EM algorithm have been proposed. We can distinguish two types of variants: those concerning the expectation step and those concerning the maximization step. For the latter, the Generalized EM (Delyon et al., 1999) no longer requires a maximization of the expectation at each step but only an increase in it. This way, one can apply the EM algorithm even when the M-step has no analytic solution. In the version of Lange (1995), for instance, the maximization step is performed with a Newton-Raphson method.
Alternatives to the computation of the expectation involve the introduction of stochasticity in the estimation procedure. With the stochastic EM algorithm (SEM), Celeux and Diebolt (1985) proposed to replace the computation of the expectation by a numerical estimate obtained by simulating the latent data. Wei and Tanner (1990) generalized this idea by replacing the computation of the expectation by a Monte-Carlo approximation, leading to the Monte-Carlo EM or MCEM. By adjusting the number of random draws in the Monte-Carlo sum, one can mimic the behavior of a simulated annealing algorithm (Celeux et al., 1995). An alternative approach developed by Delyon et al. (1999) consists in replacing the computation of the expectation by an approximation of Robbins-Monro type (Robbins and Monro, 1951), which is known to converge toward the expectation under appropriate hypotheses. This procedure is referred to as the stochastic approximation EM algorithm or SAEM algorithm. Finally, unlike their deterministic counterparts, these stochastic variants of the EM algorithm are able to escape local maxima, which favors convergence toward global maxima.
Markov Chain Monte Carlo Methods. When exact simulation of the latent variables is not tractable, relying on approximate sampling through Markov chain Monte Carlo (MCMC) methods (Andrieu et al., 2003; Brooks et al., 2011; Robert and Casella, 1999) has proved successful. The idea of MCMC methods is to generate a Markov chain whose distribution converges toward the law we want to sample from. More specifically, we replace the simulation of one sample from this complicated law by the generation of a potentially large number of samples from (hopefully) simpler distributions. Among these samplers, the most frequently used is probably the Metropolis-Hastings algorithm. First introduced in the particular case of the Boltzmann distribution (Metropolis and Ulam, 1949; Metropolis et al., 1953), the Metropolis-Hastings algorithm was generalized to any distribution by Hastings (1970). A strength of this algorithm is that it only requires knowledge of the target distribution up to a multiplicative constant. It thus avoids computing the normalization constant, which is often intractable. The Metropolis-Hastings algorithm can be seen as a generalization of rejection sampling: at each iteration, given the current state of the Markov chain, we propose a move, accept it whenever it improves the “likelihood”, and otherwise accept it only with some probability. More precisely, given a target law π, a proposal distribution q(·; xk) and the current state xk of the chain at iteration k, we accept the proposal x∗ ∼ q(·; xk) with probability
\[
\alpha(x_k, x^*) = \min\left(1,\; \frac{\pi(x^*)\, q(x_k; x^*)}{\pi(x_k)\, q(x^*; x_k)}\right).
\]
Moreover, this sampler can be incorporated into a Gibbs sampler and is thus particularly suitable for high-dimensional data.
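The acceptance rule can be sketched in a few lines. The example below is a minimal illustration, not a reference implementation: the target (a Gaussian centered at 3, known only up to its normalizing constant), the Gaussian random-walk proposal, the step size `STEP`, and all function names are assumptions made for the example. The computation is done on the log scale, and `log_q(a, b)` stands for the log density of proposing `a` from state `b`, again up to a constant.

```python
import math
import random


def metropolis_hastings(log_pi, propose, log_q, x0, n_steps, rng):
    """Generic Metropolis-Hastings sampler.

    log_pi(x): log of the target density, up to an additive constant.
    propose(x, rng): draws a proposal x_star from q(.; x).
    log_q(a, b): log density of proposing a when the chain is at b.
    """
    x, chain, n_accepted = x0, [], 0
    for _ in range(n_steps):
        x_star = propose(x, rng)
        # log of pi(x*) q(x_k; x*) / (pi(x_k) q(x*; x_k))
        log_ratio = (log_pi(x_star) + log_q(x, x_star)) - (log_pi(x) + log_q(x_star, x))
        log_alpha = min(0.0, log_ratio)  # acceptance probability is min(1, ratio)
        # 1 - rng.random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) <= log_alpha:
            x = x_star
            n_accepted += 1
        chain.append(x)  # on rejection the current state is repeated
    return chain, n_accepted / n_steps


STEP = 1.0  # hypothetical proposal standard deviation


def log_pi(x):
    # Unnormalized log density of N(3, 1); the constant is deliberately omitted.
    return -0.5 * (x - 3.0) ** 2


def propose(x, rng):
    return rng.gauss(x, STEP)


def log_q(a, b):
    # Symmetric Gaussian random walk, so the two q terms cancel in the ratio.
    return -0.5 * ((a - b) / STEP) ** 2


rng = random.Random(42)
chain, acceptance_rate = metropolis_hastings(log_pi, propose, log_q, 0.0, 20000, rng)
kept = chain[2000:]  # discard an initial burn-in segment
sample_mean = sum(kept) / len(kept)
print("sample mean:", sample_mean, "acceptance rate:", acceptance_rate)
```

Because only differences of `log_pi` enter the ratio, the normalizing constant of the target never has to be computed. With a non-symmetric proposal, the `log_q` terms no longer cancel and must be kept.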
Building on the work of Kuhn and Lavielle (2004), which proves the convergence of the MCMC-SAEM algorithm in the case where the variables generated along the procedure remain bounded, Allassonnière et al. (2010) proved the convergence of the MCMC-SAEM algorithm in greater generality. Note that the convergence of this algorithm only requires a single step of MCMC per iteration, which makes it computationally very competitive. This algorithm is used in the Monolix software and has proved to be widely applicable, especially for pharmacokinetic models (Chan et al., 2011; Lavielle and Mentré, 2007).
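To make the structure of an MCMC-SAEM iteration concrete, the following sketch applies it to a toy model chosen for this example only: y_ij = b_i + e_ij with b_i ∼ N(mu, tau2) and e_ij ∼ N(0, sigma2), with sigma2 assumed known. The model, the step-size schedule (1 during a burn-in phase, then 1/(k − n_burn)), the random-walk step, and all names are assumptions; this is not the implementation used in Monolix or in the cited papers. Each iteration performs a single Metropolis-Hastings move per latent variable, a Robbins-Monro update of the sufficient statistics, and a closed-form maximization.

```python
import math
import random


def simulate(n_groups, n_obs, mu, tau2, sigma2, rng):
    """Return the group means of toy data y_ij = b_i + e_ij (hypothetical model)."""
    ybar = []
    for _ in range(n_groups):
        b = rng.gauss(mu, math.sqrt(tau2))
        obs = [b + rng.gauss(0.0, math.sqrt(sigma2)) for _ in range(n_obs)]
        ybar.append(sum(obs) / n_obs)
    return ybar


def log_posterior(b, ybar_i, n_obs, sigma2, mu, tau2):
    """log p(b | y_i; mu, tau2) up to a constant; y_i enters only through its mean."""
    return (-n_obs * (ybar_i - b) ** 2 / (2.0 * sigma2)
            - (b - mu) ** 2 / (2.0 * tau2))


def mcmc_saem(ybar, n_obs, sigma2, n_iter=1500, n_burn=300, step=0.5, seed=1):
    rng = random.Random(seed)
    mu, tau2 = 0.0, 1.0          # initial parameter guess
    b = [0.0] * len(ybar)        # current state of the latent variables
    s1, s2 = 0.0, 0.0            # approximated sufficient statistics
    for k in range(1, n_iter + 1):
        # Simulation: one random-walk Metropolis-Hastings move per latent variable.
        for i, yb in enumerate(ybar):
            prop = b[i] + rng.gauss(0.0, step)
            log_ratio = (log_posterior(prop, yb, n_obs, sigma2, mu, tau2)
                         - log_posterior(b[i], yb, n_obs, sigma2, mu, tau2))
            if math.log(1.0 - rng.random()) <= min(0.0, log_ratio):
                b[i] = prop
        # Stochastic approximation of Robbins-Monro type.
        gamma = 1.0 if k <= n_burn else 1.0 / (k - n_burn)
        s1 += gamma * (sum(b) / len(b) - s1)
        s2 += gamma * (sum(x * x for x in b) / len(b) - s2)
        # Maximization: closed form given the sufficient statistics.
        mu = s1
        tau2 = max(s2 - s1 * s1, 1e-8)  # small floor keeps the variance positive
    return mu, tau2


N_OBS, SIGMA2 = 5, 1.0
ybar = simulate(100, N_OBS, mu=2.0, tau2=1.5, sigma2=SIGMA2, rng=random.Random(0))
mu_hat, tau2_hat = mcmc_saem(ybar, N_OBS, SIGMA2)

# In this balanced toy model the maximum likelihood estimate is explicit,
# which gives a reference point for the stochastic estimate.
mu_mle = sum(ybar) / len(ybar)
tau2_mle = sum((y - mu_mle) ** 2 for y in ybar) / len(ybar) - SIGMA2 / N_OBS
print("MCMC-SAEM:", (mu_hat, tau2_hat), "closed form:", (mu_mle, tau2_mle))
```

The toy model is deliberately simple enough that the latent variables could be sampled exactly. The Metropolis-Hastings move is kept only to show where the single MCMC step enters each iteration.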
Despite the flexibility of the models we described (linear models in I.1 and nonlinear models in I.2), and as a result of how they are written, they can be applied only to scalar data. But the applications we are interested in, especially in the medical sciences, involve highly structured data such as scans, images, tensors or 3D anatomical shapes. It is thus necessary to propose a statistical framework suited to these data, which are both massive and heterogeneous.
II. The Use of Riemannian Geometry for the Study of Longitudinal Data
In this section, and unless otherwise stated, we will refer generically to any structured data as a shape. Thus, a shape may denote an image as well as a mesh, a tensor, a submanifold, etc.
Riemannian geometry is a particularly suitable tool for the mathematical modeling of shapes. Indeed, rather than analyzing shapes individually, it seems more efficient to consider sets or populations of shapes and to try to understand them as spaces in the mathematical sense (Trouvé and Younes, 2015). By construction, these spaces will naturally inherit a Riemannian manifold structure.