In this context, therefore, the convolved covariance can either explain the data through a simpler model or converge to the LMC, if needed.
2.4 Summary
In this chapter we have presented different alternatives for constructing valid covariance functions to be used in a multivariate Gaussian process framework. We introduced the convolved multiple output covariance and saw that it contains the Linear Model of Coregionalization and the Process Convolutions as particular cases. We have also specified the elements that we will use in the following chapters, namely, parameter estimation by maximum likelihood and predictive posterior distribution.
The linear model of coregionalization can be interpreted as an instantaneous mixing of latent functions, in contrast to the convolved multiple output framework, where the mixing is not necessarily instantaneous. Experimental results presented in publication v showed that this non-instantaneous mixing improves predictive precision. The improvement was more noticeable in systems that exhibit some dynamics.
One important question in the process convolution framework and in the convolved multiple output covariance is how to choose the kernel smoothing functions $\{G_{d,q}\}_{d=1,q=1}^{D,Q}$. Although there are non-parametric alternatives (Ver Hoef and Barry, 1998) as well as plenty of parametric ones (Higdon, 2002), in this thesis we are interested in dynamical systems, for which these alternatives are not suitable. In the next chapter, we will study moving-average functions obtained from linear differential equations, which will allow us to encode prior knowledge of the system's dynamics in the covariance function.
Linear Latent force models
In chapter 2 we established a general framework to develop covariance functions for multivariate regression in a Gaussian processes context. We proposed the convolved multiple output covariance as a method that generalizes several other alternatives in the literature and that is parameterized in terms of a set of moving-average functions $G^{i}_{d,q}(x)$ and a set of covariances $k_q(x, x')$. One important question for making the approach practical is how to specify the moving-average functions and the covariances of the latent functions.
It is well known, from the theory of dynamical systems, that there exists a correspondence between a linear differential equation and a convolution transform, and that this correspondence is established through what is called the impulse response of the system. From a mathematical point of view, the impulse response is better known as the Green's function, and it is a standard tool for solving differential equations (Griffel, 2002; Rynne and Youngson, 2008).
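The correspondence between a linear differential equation and a convolution can be illustrated with a minimal sketch. The example below is not taken from the thesis: it assumes the first order equation dy/dt + a y(t) = u(t) with y(0) = 0, whose Green's function is G(t) = exp(-a t) for t ≥ 0, and evaluates the solution as the convolution of G with the input. The function names, the parameter `decay` and the trapezoidal quadrature are illustrative choices.

```python
import numpy as np


def green_first_order(t, decay):
    """Impulse response of dy/dt + decay * y = u(t); zero for t < 0 (causal)."""
    return np.where(t >= 0.0, np.exp(-decay * t), 0.0)


def solve_by_convolution(u, t, decay):
    """Approximate y(t_n) = int_0^{t_n} G(t_n - tau) u(tau) dtau on the grid t.

    Assumes y(0) = 0 and uses a simple trapezoidal rule (an illustrative choice).
    """
    y = np.zeros_like(t, dtype=float)
    for n in range(1, len(t)):
        tau = t[: n + 1]
        integrand = green_first_order(t[n] - tau, decay) * u[: n + 1]
        # Trapezoidal rule written out explicitly to avoid version-specific helpers.
        y[n] = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(tau))
    return y


# Usage: a unit step input, for which the closed-form solution is
# y(t) = (1 - exp(-decay * t)) / decay.
t_grid = np.linspace(0.0, 5.0, 501)
y_num = solve_by_convolution(np.ones_like(t_grid), t_grid, decay=2.0)
y_exact = (1.0 - np.exp(-2.0 * t_grid)) / 2.0
max_abs_error = np.max(np.abs(y_num - y_exact))
```

For a unit step input the numerical convolution can be compared against the closed-form solution, which is what `max_abs_error` measures. Replacing the exponential with a different Green's function would correspond to a different differential equation.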
In this chapter, we motivate the use of Green's functions as alternatives to smoothing kernels by introducing a generative model of the noisy outputs, which we call linear latent force models or simply latent force models (LFM). A latent force model introduces basic mechanistic principles in the formulation of a traditional latent variable model. Our motivation is to augment a latent variable model with the ability to incorporate salient characteristics of the data (for example, inertia or resonance in a mechanical system), even when the differential equation from which it is derived does not reflect the real dynamics of the system. For example, for a human motion capture dataset, we develop a mechanistic
model of motion capture that does not exactly replicate the physics of human movement, but nevertheless captures important features of the movement. The linear latent force model is a generalization of the work of Lawrence et al. (2007) and Gao et al. (2008), who encoded first order differential equations in the covariance function of a multivariate Gaussian process.
In section 3.1, we introduce the linear latent force model as a latent variable model. We then see how the latent force model translates into a multivariate Gaussian process with a convolved multiple output covariance in section 3.2. In section 3.3, we present a second order latent force model for motion capture data. Finally, section 3.4 presents related work.
Remark. Sections 3.1 and 3.3 were originally presented in publication ii. Section 3.2, which connects the latent force model with chapter 2, is new. The section on related work is also new.
3.1 From latent variables to latent forces
From the perspective of machine learning, the linear latent force model can be seen as a type of latent variable model. In a latent variable model we may summarize a high dimensional data set with a reduced dimensional representation. For example, if our data consists of $N$ points in a $D$ dimensional space we might seek a linear relationship between the data, $Y = [y_1, \ldots, y_D] \in \mathbb{R}^{N \times D}$ with $y_d \in \mathbb{R}^{N \times 1}$, and a reduced dimensional representation, $U = [u_1, \ldots, u_Q] \in \mathbb{R}^{N \times Q}$ with $u_q \in \mathbb{R}^{N \times 1}$, where $Q < D$. From a probabilistic perspective this involves an assumption that we can represent the data as
$$Y = U W^{\top} + E, \qquad (3.1)$$
where $E = [e_1, \ldots, e_D]$ is a matrix-variate Gaussian noise: each column, $e_d \in \mathbb{R}^{N \times 1}$ ($1 \le d \le D$), is a multivariate Gaussian with zero mean and covariance $\Sigma_d$, that is, $e_d \sim \mathcal{N}(0, \Sigma_d)$. The usual approach to dealing with the unknowns in this model, as undertaken in factor analysis (FA) and principal component analysis (PCA), is to integrate out $U$ under a Gaussian prior and optimize with respect to $W \in \mathbb{R}^{D \times Q}$ (for a non-linear variant of the model it can be convenient to do this the other way around, that is, integrate out $W$ and optimize $U$; see for example Lawrence (2005)). If the data has a temporal nature, then the Gaussian prior in the latent space could express a relationship between the rows of $U$, $u_{t_n} = \Gamma u_{t_{n-1}} + \eta$, where $\Gamma$ is a transformation matrix, $\eta$ is a general noise process, usually Gaussian, and $u_{t_n}$ is the $n$-th row of $U$, which we associate with time $t_n$. This is known as the Kalman filter/smoother. Normally the times $t_n$ are taken to be equally spaced, but more generally we can consider a joint distribution $p(U|t)$, with $t = [t_1, \ldots, t_N]^{\top}$, which has the form of a Gaussian process,
$$p(U|t) = \prod_{q=1}^{Q} \mathcal{N}\left(u_q \,|\, 0, K_{u_q,u_q}\right),$$
where we have assumed zero mean and independence across the $Q$ dimensions of the latent space. The GP makes explicit the fact that the latent variables are functions, $\{u_q(t)\}_{q=1}^{Q}$, and we have now described them with a process prior. The elements of the vector $u_q = [u_q(t_1), \ldots, u_q(t_N)]^{\top}$ represent the values of the function for the $q$-th dimension at the times given by $t$. The matrix $K_{u_q,u_q}$ is the covariance function associated with $u_q(t)$ computed at the times given in $t$.
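As a rough illustration of this generative view, the sketch below draws each latent function from an independent zero-mean GP prior and then forms the outputs according to equation (3.1). Several ingredients are assumptions made only for the example and are not specified in the text: a squared exponential covariance for every latent function, isotropic noise (each noise covariance equal to `noise_std**2` times the identity), and a small jitter term added for numerical stability.

```python
import numpy as np


def rbf_kernel(t, lengthscale):
    """Squared exponential covariance on a 1-D time grid (assumed kernel)."""
    diff = t[:, None] - t[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def sample_latent_variable_model(t, W, lengthscales, noise_std, rng):
    """Draw U with independent GP columns, then return Y = U W^T + E and U."""
    N = t.shape[0]
    D, Q = W.shape
    U = np.empty((N, Q))
    for q in range(Q):
        # Jitter keeps the Cholesky factorization numerically stable.
        K_q = rbf_kernel(t, lengthscales[q]) + 1e-6 * np.eye(N)
        L_q = np.linalg.cholesky(K_q)
        U[:, q] = L_q @ rng.standard_normal(N)  # u_q ~ N(0, K_q)
    # Assumption: isotropic noise, Sigma_d = noise_std**2 * I for every d.
    E = noise_std * rng.standard_normal((N, D))
    return U @ W.T + E, U


# Usage: Q = 2 latent functions mixed into D = 3 outputs at N = 50 times.
rng = np.random.default_rng(0)
t_grid = np.linspace(0.0, 5.0, 50)
W_demo = rng.standard_normal((3, 2))
Y_demo, U_demo = sample_latent_variable_model(t_grid, W_demo, [1.0, 0.5], 0.1, rng)
```

Each column of `U_demo` is one sampled latent function evaluated at the times in `t_grid`, and each column of `Y_demo` is a noisy linear combination of those functions.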
Such a GP can be readily implemented. Given the covariance functions for $\{u_q(t)\}_{q=1}^{Q}$, the implied covariance functions for $\{y_d(t)\}_{d=1}^{D}$ are straightforward to derive. In Teh et al. (2005) this is known as a semiparametric latent factor model. If the latent functions $u_q(t)$ share the same covariance, but are sampled independently, this is known as the multi-task Gaussian process prediction model (Bonilla et al., 2008) with a similar model introduced in Osborne et al. (2008). Both models were introduced in chapter 2 as particular cases of the linear model of coregionalization. Historically the Kalman filter approach has been preferred, perhaps because of its linear computational complexity in $N$. However, recent advances in sparse approximations have made the general GP framework practical (see Quiñonero-Candela and Rasmussen, 2005b, for a review).
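One way to see the implied output covariance is to build it explicitly. Under equation (3.1) with independent latent functions, the covariance between outputs d and d' is the sum over q of w_{d,q} w_{d',q} K_{u_q,u_q}, plus the noise covariance when d = d'. The sketch below assembles this matrix for the outputs stacked as [y_1; ...; y_D]. The squared exponential kernel, the isotropic noise variance `noise_var` and all function names are assumptions made for illustration only.

```python
import numpy as np


def rbf_kernel(t, lengthscale):
    """Squared exponential covariance on a 1-D time grid (assumed kernel)."""
    diff = t[:, None] - t[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def output_covariance(t, W, lengthscales, noise_var):
    """Covariance of the stacked outputs [y_1; ...; y_D] implied by Y = U W^T + E.

    Block (d, d') equals sum_q W[d, q] * W[d', q] * K_q; isotropic noise
    (an assumption of this sketch) is added on the diagonal.
    """
    N = t.shape[0]
    D, Q = W.shape
    K = np.zeros((D * N, D * N))
    for q in range(Q):
        B_q = np.outer(W[:, q], W[:, q])  # rank-one mixing matrix for latent q
        K += np.kron(B_q, rbf_kernel(t, lengthscales[q]))
    return K + noise_var * np.eye(D * N)


# Usage: D = 3 outputs, Q = 2 latent functions, N = 5 time points.
t_grid = np.linspace(0.0, 1.0, 5)
W_demo = np.array([[1.0, 0.5], [2.0, -1.0], [0.0, 3.0]])
K_demo = output_covariance(t_grid, W_demo, [0.3, 1.0], noise_var=0.1)
```

Writing the sum with Kronecker products makes the structure visible: each latent function contributes a rank-one matrix over the outputs multiplied by its own covariance over time.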
So far the model described relies on the latent variables to provide the dynamic information. The novelty here is that we include a further dynamical system with a mechanistic inspiration. We now use a mechanical analogy to introduce it. Consider the following physical interpretation of equation (3.1): the latent functions, $u_q(t)$, are $Q$ forces and we observe the displacement of $D$ springs, $y_d(t)$, to the forces. Then we can reinterpret (3.1) as the force balance equation, $Y\kappa = U S^{\top} + \tilde{E}$. We have assumed that the forces are acting, for example, through levers, so that we have a matrix of sensitivities, $S \in \mathbb{R}^{D \times Q}$, and a diagonal matrix of spring constants, $\kappa \in \mathbb{R}^{D \times D}$, with elements {κ