
Towards a spatio-temporal framework

4.3 Spatio-temporal model development

4.3.1 Preliminaries on parameterization & estimation

The preceding sections laid the foundations for developing the spatio-temporal model. The following paragraphs present two more specific aspects of the DSTM framework: the widely used approaches to parameterizing the data and process models, and a specific algorithm used to estimate a spatio-temporal random effect (STRE) model. Both are directly linked to the spatio-temporal model proposed for sparse remote-sensing image time series.

Preliminary 1: parameterizing the data/process models. Spatio-temporal processes are usually high-dimensional, often with missing observations across space and time. As a result, estimating the model components can be problematic, especially if the model parameters are unconstrained. A helpful solution is to parameterize the model components based on 'prior scientific knowledge and/or common spatial models' (Xu & Wikle, 2007). Some frequently used parameterizations of the data model covariance matrix $G$ and the process model covariance matrix $H$ are summarised in Xu & Wikle (2007):

(a) Assume the residuals of the data model are i.i.d. random noise, i.e. $G = \sigma^2 I$.

(b) Use an empirical orthogonal function (EOF) expansion to parameterize the data model residual covariance matrix, i.e. $G = \sigma^2 I + \sum_{p=a+1}^{P} \lambda_p \xi_p \xi_p^\top$, where $\xi_p$ are the EOFs and $\lambda_p$ are the corresponding eigenvalues.

(c) Specify an exponential covariance function for the process model covariance matrix, i.e. $H = \sigma^2 V(h; d)$, where the elements of the matrix $V(h; d)$ are determined by the correlation function $\rho(h) = \exp(-h/d)$, with $d$ being the parameter.

(d) Use a conditional auto-regressive (CAR) model for the residuals of the process model, i.e. $u(s_i) \mid u(s_j) \sim N\big(b \sum_{j \neq i} c_{ij} u(s_j),\ \sigma_i^2\big)$, where $b$ is the CAR model parameter and $c_{ij}$ describes the adjacency of $u(s_i)$ and $u(s_j)$.

Note that the last parameterization only applies to a process model describing the evolution of the actual spatial process. It does not apply to the dimension-reduced state space model as defined in equations (4.1) and (4.3).
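As an illustration of parameterization (c), the following minimal sketch builds $H = \sigma^2 V(h; d)$ from a handful of locations. The function name `exponential_covariance`, the use of Euclidean distance for $h$, and the toy coordinates and parameter values are assumptions made for this example only; they are not taken from Xu & Wikle (2007).

```python
import math


def exponential_covariance(coords, sigma2, d):
    """Return H = sigma2 * V, where V[i][j] = exp(-h_ij / d).

    h_ij is taken to be the Euclidean distance between locations i and j
    (an assumption of this sketch); d is the range parameter.
    """
    n = len(coords)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            h = math.dist(coords[i], coords[j])
            H[i][j] = sigma2 * math.exp(-h / d)
    return H


# Toy locations and parameter values, chosen only for illustration.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
H = exponential_covariance(coords, sigma2=1.5, d=2.0)
```

The whole matrix is controlled by two scalars, $\sigma^2$ and $d$, which is what makes this parameterization attractive when the dimension is high.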

Among the four parameterizations above, the EOF expansion is the most interesting for the aims of this thesis. The concept of EOF in atmospheric and meteorological science is similar to the concept of principal component in statistics. Therefore, parameterizing the residual covariance matrix G using EOFs is associated with the PCA based on the covariance matrix G. Furthermore, if a basis representation is used for the data Zt and the EOFs, then this parameterization could be linked to a FPCA in that

Zt=

where $\lambda_p$ is the eigenvalue and $\xi_p$ is the vector of the evaluated eigenfunction $\xi_p(\cdot)$. Both can be extracted from the FPCA. Unfortunately, the MLE of $\sigma^2$ has no analytical solution, so numerical methods are required for the estimation. Specifically, the estimate is obtained by numerically equating the score function of $\sigma^2$ to zero. Xu & Wikle (2007) derived the score function. The R function uniroot can be used to find its root.
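The root-finding step can be sketched in a simplified setting. The example below is not the score function of Xu & Wikle (2007). It assumes, purely for illustration, $T$ independent zero-mean residual vectors with covariance $\sigma^2 I + \sum_p \lambda_p \xi_p \xi_p^\top$ and known orthonormal $\xi_p$, so that the Gaussian score reduces to $-\tfrac{1}{2} \sum_j \big[ T/(\sigma^2 + \lambda_j) - s_j/(\sigma^2 + \lambda_j)^2 \big]$, where $s_j$ is the sum of squared projections of the residuals onto direction $j$ and $\lambda_j = 0$ for directions outside the EOFs. A plain bisection stands in for uniroot; all names and values are hypothetical.

```python
def score_sigma2(sigma2, lam, s, T):
    """Score (derivative of the Gaussian log-likelihood) with respect to sigma2.

    lam[j] is the eigenvalue attached to direction j (0 outside the EOFs),
    s[j] is the sum over time of squared residual projections on direction j.
    """
    total = 0.0
    for lam_j, s_j in zip(lam, s):
        v = sigma2 + lam_j
        total += -0.5 * (T / v - s_j / (v * v))
    return total


def bisect_root(f, lo, hi, tol=1e-10, max_iter=200):
    """Find a root of f in [lo, hi] by bisection (a stand-in for uniroot)."""
    f_lo, f_hi = f(lo), f(hi)
    if f_lo * f_hi > 0:
        raise ValueError("root is not bracketed by [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_mid == 0.0 or hi - lo < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


# Toy inputs: projections constructed so that the score vanishes at sigma2 = 0.5.
T = 50
lam = [2.0, 1.0, 0.0, 0.0]
true_sigma2 = 0.5
s = [T * (true_sigma2 + lam_j) for lam_j in lam]

sigma2_hat = bisect_root(lambda v: score_sigma2(v, lam, s, T), 1e-6, 10.0)
```

The bracketing interval must contain a sign change of the score, which is also what uniroot requires of its search interval.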

The parameterization of the residual covariance matrix $G$ and the estimation of the parameter $\sigma^2$ given the eigenvalues and eigenfunctions are important to the development and estimation of the spatio-temporal model in this thesis. The disadvantage of this method, however, is that the EOFs are estimated before fitting the state space model, i.e. the estimation is based on potentially correlated data, whereas ideally it should use independent data. Therefore, this approach does not solve the problem put forward at the end of Chapter 3. The leading EOFs have also been criticised as potentially inadequate for explaining the dominant system dynamics, despite their power in describing the variation in the data (Cressie & Wikle, 2011). The dynamics might instead be governed by a component that accounts for only a small proportion of the variance.

Preliminary 2: the STRE model and FRF. One type of STRE model for very large spatio-temporal data sets has a data model for dimension reduction,

\[
\begin{aligned}
Z(s; t) &= Y(s; t) + \varepsilon(s; t) \\
Y(s; t) &= \mu_t(s) + S_t(s)^\top \beta_t + \zeta(s; t) .
\end{aligned}
\tag{4.25}
\]

Here $Z(s; t)$ is the observation and $Y(s; t)$ is the true underlying process, which is further decomposed into a spatial mean function $\mu_t(s)$, a spatio-temporal dynamic component $S_t(s)^\top \beta_t$ and an additional random component $\zeta(s; t)$. Dimension reduction comes from the basis representation of the dynamic component as a spatial basis $S_t(s)$ multiplied by the time-varying basis coefficient vector $\beta_t$. The process model of $\beta_t$ is specified as

\[
\beta_t = M_t \beta_{t-1} + u_t ,
\]

with propagator matrix $M_t$ and residual $u_t$. It is assumed that the series $\{\zeta_t\}_{t=1}^{T}$, where $\zeta_t = (\zeta(s_1; t), \cdots, \zeta(s_n; t))^\top$, is not temporally correlated and only depends on the observations at time $t$. It is also assumed that the series $\{\zeta_t\}_{t=1}^{T}$ is independent of the series $\{\beta_t\}_{t=1}^{T}$. Both $\{\beta_t\}_{t=1}^{T}$ and $\{\zeta_t\}_{t=1}^{T}$ are independent of the measurement error process $\{\varepsilon_t\}_{t=1}^{T}$. The model components $\beta_t$ and $\zeta_t$ are estimated using fixed rank filtering (FRF), where 'rank' refers to the dimension of the basis matrix $S_t(s)$. FRF was proposed by Cressie et al. (2010). It builds on the fixed rank kriging method (Cressie & Johannesson, 2008) by incorporating the temporal component through a process model estimated using the Kalman filter/smoother. One thing worth pointing out is that, although $\zeta_t$ is independent of $\beta_t$, its estimation is accomplished by a filter based on the conditional distribution of $(\zeta_t, \beta_t)$ given data $Z_{1:T}$ as

\[
\zeta_{t|t} = C_t^\top \left( S_t B_{t|t-1} S_t^\top + D_t \right)^{-1} \left( Z_t - \mu_t - S_t \beta_{t|t-1} \right) . \tag{4.26}
\]

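Here $C_t = \mathrm{Cov}[Z_t, \zeta_t]$ is the covariance matrix between the data and $\zeta_t$, $D_t = \sigma_\zeta^2 I_t + \sigma^2 W_t$ is the covariance matrix of $\zeta_t + \varepsilon_t$, and $\beta_{t|t-1}$, $B_{t|t-1}$ come from the Kalman filtering of $\beta_t$. This shows that $\zeta_t$ and $\beta_t$ are no longer independent after conditioning on the data $Z_{1:T}$.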

What is instructive about this method are the dependence/independence assumptions on the model components and its estimation using FRF embedded in an EM algorithm (Katzfuss & Cressie, 2011). To some extent, the conditional dependence of the two random components $\beta_t$ and $\zeta_t$ is crucial for model estimation. However, the random component $\zeta_t$, while accounting for the variation not covered by the system dynamic, cannot provide a conclusive summary of the spatial variation. Unlike the eigenfunctions and scores from a FPCA, $\zeta_t$ can hardly be used as a measure of the spatial variation patterns in the data or their evolution. Therefore, it is not the optimal solution for the analysis in this thesis, where the spatial variation patterns are also of interest.

4.3.2 The proposed state space FPCA model (SS-FPCA)

Based on the above two preliminaries and all the basic elements introduced in previous sections, a spatio-temporal model with a system dynamic component and a FPCA component was proposed. The same notation as in Chapter 3 is used here, with subscript t indicating the time point and (x, y) indicating the spatial coordinates. The same hierarchies as in the STRE model (4.25) are used here, giving the following three levels.

(a) At the top level is a data model, which involves a dimension reduction of the underlying process through a basis representation,

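\[
\begin{aligned}
Z_t(x, y) &= Y_t(x, y) + \varepsilon_t(x, y) \\
Y_t(x, y) &= \mu_t(x, y) + \Phi(x, y)\beta_t + \sum_{p=1}^{P} \Phi(x, y)\theta_p \alpha_{pt} \\
&= \mu_t(x, y) + \Phi(x, y)\beta_t + \Phi(x, y)\Theta\alpha_t ,
\end{aligned}
\]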

where $\mu_t(x, y)$ is a fixed mean component, $\Phi(x, y)\beta_t$ is the system dynamic component (also referred to as the state space, or SS, component) and $\Phi(x, y)\Theta\alpha_t$ is a K-L expansion of order $P$ with orthonormal $\Phi(x, y)\Theta$ (referred to as the FPCA component), accounting for the remaining spatial variation in the data.

(b) In the middle level is a process model, which assumes a random walk (or local level model) for the system dynamic,

\[
\beta_t = \beta_{t-1} + u_t .
\]

The motivation is that, after appropriate detrending, this first-order dependence structure should be adequate for most of the remote-sensing image time series considered in this thesis. Recall that the exploratory analysis in section 2.1 suggested that an AR(1) structure is appropriate for the majority of the LSWT time series after accounting for the seasonal structure. In addition, even though the above model means that the elements of $\beta_t$ follow separate temporal evolutions, the spatio-temporal dependence can be incorporated through the covariance structure of $u_t$. More details on this issue are given in the paragraphs explaining the model assumptions.

(c) At the bottom level, the following distributions are assigned to the data and process model components. The measurement errors $\varepsilon_t(x, y)$ are assumed to be i.i.d. normally distributed as $N(0, \sigma^2)$. The residuals of the process model $u_t$ are assumed to be normally distributed as $N(0, H)$, where $H$ is symmetric and positive definite, but not necessarily diagonal. Finally, the random coefficient vector $\alpha_t$ is required to satisfy the assumptions of the PC scores as defined in Chapter 3. That is, $\alpha_t \sim N(0, \Lambda)$ with $\Lambda = \mathrm{diag}\{\lambda_1, \cdots, \lambda_P\}$. In particular, $\lambda_p$, $p = 1, \cdots, P$, are arranged in decreasing order.

Putting (a), (b) and (c) together, the proposed model (using matrix notation) is

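\[
\begin{aligned}
Z_t &= \mu_t + \Phi_t\beta_t + \Phi_t\Theta\alpha_t + \varepsilon_t \\
\beta_t &= \beta_{t-1} + u_t
\end{aligned}
\tag{4.27}
\]

where

\[
\begin{gathered}
\Phi^\top\Phi = I, \qquad \Theta^\top\Theta = I \\
\alpha_t \sim N(0, \Lambda), \qquad \Lambda = \mathrm{diag}\{\lambda_1, \cdots, \lambda_P\} \\
\varepsilon_t \sim N(0, \sigma^2 I), \qquad u_t \sim N(0, H) .
\end{gathered}
\]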

In model (4.27), $Z_t$ is the data vector, $\Phi_t$ is a (bivariate) basis matrix, and $\varepsilon_t$ and $u_t$ are the residual vectors of the data and process models. Model (4.27) is referred to as the state space functional principal component analysis and is abbreviated as the SS-FPCA model. Note that the subscript $t$ in $\Phi_t$ reflects the influence of the missing data at time point $t$. The same notation was used in Chapter 3 for the MM-FPCA. In the following sections, the subscript $t$ is dropped only when referring to the fitted results or to a general case where the sparsity is not emphasized.
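To make the structure of model (4.27) concrete, the following minimal sketch simulates from it in a toy setting. Several simplifications are assumptions of this example rather than part of the thesis: the mean $\mu_t$ is set to zero, the initial state is $\beta_0 = 0$, there are no missing data (so $\Phi_t = \Phi$ for all $t$), and $H$ is supplied through a lower-triangular factor `chol_H` with $H = L L^\top$. The function name and all numerical values are hypothetical.

```python
import random


def simulate_ss_fpca(T, Phi, Theta, lam, chol_H, sigma, seed=0):
    """Simulate Z_t = Phi beta_t + Phi Theta alpha_t + eps_t with a random-walk beta_t.

    Phi: n x K basis matrix, Theta: K x P matrix, lam: the P eigenvalues,
    chol_H: lower-triangular L with H = L L^T, sigma: measurement error sd.
    """
    rng = random.Random(seed)
    n, K, P = len(Phi), len(Phi[0]), len(lam)
    # Loadings of the FPCA component, Phi @ Theta (n x P).
    phi_theta = [[sum(Phi[i][k] * Theta[k][p] for k in range(K)) for p in range(P)]
                 for i in range(n)]
    beta = [0.0] * K  # beta_0 = 0 (assumption of this sketch)
    Z, betas = [], []
    for _ in range(T):
        # u_t ~ N(0, H) via u_t = L e_t with e_t standard normal.
        e = [rng.gauss(0.0, 1.0) for _ in range(K)]
        u = [sum(chol_H[k][j] * e[j] for j in range(k + 1)) for k in range(K)]
        beta = [b + du for b, du in zip(beta, u)]  # random walk step
        # alpha_t ~ N(0, Lambda), drawn independently at each time point.
        alpha = [rng.gauss(0.0, lam_p ** 0.5) for lam_p in lam]
        z = [sum(Phi[i][k] * beta[k] for k in range(K))
             + sum(phi_theta[i][p] * alpha[p] for p in range(P))
             + rng.gauss(0.0, sigma)
             for i in range(n)]
        betas.append(beta)
        Z.append(z)
    return Z, betas


# Toy inputs: 4 locations, K = 2 basis functions, P = 1 principal component.
Phi = [[0.5, 0.5], [0.5, -0.5], [0.5, 0.5], [0.5, -0.5]]  # orthonormal columns
Theta = [[0.6], [0.8]]                                     # unit-norm column
chol_H = [[1.0, 0.0], [0.5, 0.8]]                          # H is not diagonal
Z, betas = simulate_ss_fpca(T=5, Phi=Phi, Theta=Theta, lam=[2.0],
                            chol_H=chol_H, sigma=0.1, seed=1)
```

In the simulation, $\beta_t$ carries information forward through time, while $\alpha_t$ is redrawn at each step, which mirrors the separation between the SS component and the FPCA component.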

The SS-FPCA model extends the MM-FPCA in James et al. (2000) by incorporating the temporal dependence through a hierarchical design. The time-invariant mean function $\Phi_t\beta$ in the MM-FPCA is replaced by a time-dependent mean function $\Phi_t\beta_t$. The dynamic of this function is governed by a first-order random walk process in a lower hierarchy. With the system dynamic component accounting for the temporal correlation, the FPCA component would be estimated based on (nearly) temporally independent data.

The SS-FPCA model also modifies the STRE model in Cressie et al. (2010) by allowing more than one non-dynamic random component (the counterpart of $\zeta_t$ in model (4.25)). In addition, the SS-FPCA imposes structure on these non-dynamic random components so that they can provide a summary of the spatial variation patterns. As the constraints follow the assumptions of the MM-FPCA, the resulting random components consist of the spatial variation patterns of the corresponding PCs. Consequently, the random components in the SS-FPCA model are more informative than their counterpart in the STRE model (4.25) and fit the problem in this thesis better. Finally, dimension reduction is achieved through the functional representation of the random components and the truncation of the number of functional PCs, and the mixed effect nature of the model suggests that missing observations can be accommodated in a straightforward way. Both are desirable properties for the application to high-dimensional, sparse remote-sensing data.
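The structure imposed on the non-dynamic components can be read off from the conditional covariance of the data. The short derivation below is a sketch for the complete-data case ($\Phi_t = \Phi$), using only model (4.27) and the independence of $\alpha_t$ and $\varepsilon_t$; the symbol $\xi_p = \Phi\theta_p$ is introduced here for convenience.

```latex
\begin{aligned}
\mathrm{Var}(Z_t \mid \beta_t)
  &= \mathrm{Var}(\Phi\Theta\alpha_t) + \mathrm{Var}(\varepsilon_t) \\
  &= \Phi\Theta\Lambda\Theta^\top\Phi^\top + \sigma^2 I \\
  &= \sigma^2 I + \sum_{p=1}^{P} \lambda_p \xi_p \xi_p^\top ,
  \qquad \xi_p = \Phi\theta_p , \\
\xi_p^\top \xi_q
  &= \theta_p^\top \Phi^\top \Phi\, \theta_q = \theta_p^\top \theta_q
   = \begin{cases} 1, & p = q \\ 0, & p \neq q . \end{cases}
\end{aligned}
```

Given the system dynamic, the residual covariance therefore has the same form as the EOF parameterization (b) in section 4.3.1, with the $\xi_p$ playing the role of orthonormal EOFs.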

The details of the model specifications are listed below.

(a) The local level model is used for the system transition equation in model (4.27) for reasons of computational simplicity. It is possible to assume $\beta_t = M\beta_{t-1} + u_t$ for $M \neq I$ (Cressie et al., 2010; Katzfuss & Cressie, 2011), such as $M = \mathrm{diag}\{m_1, \cdots, m_K\}$. However, estimating such a propagator matrix $M$ can be difficult and computationally intensive. It usually requires prior information to obtain a suitable design of $M$ and sensible estimation results (Cressie & Wikle, 2011). This can be hard for image time series, especially when $\beta_t$ is a vector of basis coefficients. On the other hand, the local level model, though non-stationary, can be appropriate for remote-sensing environmental measurements, as many of them are indeed non-stationary in reality.

(b) No special structure is imposed on the residual covariance matrix of the process model, $H$. The only requirement is that it is positive definite. It is possible to parameterize the $H$ matrix, as suggested in section 4.3.1. This typically involves imposing a certain spatial structure on $H$, such as a covariogram model or a CAR model. In this way, the spatio-temporal dynamic of the process can be modelled. In particular, imposing a diagonal structure on $H$ would imply separate evolution of the elements of $\beta_t$. This can significantly simplify the estimation, but is often unrealistic in practice, because most basis functions are not spatially 'separable', in the sense that their compact supports often overlap in space. Due to this overlap, the elements of the basis coefficient vector $\beta_t$ would not be independent. There might be some cases where the diagonal assumption is adequate, but this relies on a strong assumption of space-time separability and a spatially non-overlapping basis. To avoid setting too many impractical constraints, the $H$ matrix is left unstructured in the SS-FPCA model, so that the residual process $u_t$ can be used to account for the (unknown) spatio-temporal dependence.

(c) It is required that $\Phi(x, y)$ is an orthonormal basis and $\Theta$ is a column-orthonormal matrix. This ensures that the estimated results are valid eigenvalues and eigenfunctions from a FPCA. The rationale for these assumptions has already been explained in Chapter 3. Depending on the estimation method, a final orthonormalization might be applied to the estimated $\hat{\Theta}$, as in James et al. (2000).

(d) To ensure the identifiability of the model, further assumptions are made on the random components $\beta_t$, $\alpha_t$ and the model residuals $\varepsilon_t$, $u_t$. It is assumed that $\{\beta_t\}_{t=1}^{T}$ and $\{\alpha_t\}_{t=1}^{T}$ are independent; $\{\beta_t\}_{t=1}^{T}$ is independent of $\{\varepsilon_t\}_{t=1}^{T}$; and $\{\alpha_t\}_{t=1}^{T}$ is independent of $\{u_t\}_{t=1}^{T}$ and $\{\varepsilon_t\}_{t=1}^{T}$. In addition, it is assumed that the estimation of $\beta_t$ at time point $t$ relies on information from all the observed data $\{Z_t\}_{t=1}^{T}$, whereas the prediction of $\alpha_t$ at time point $t$ requires only the information from $Z_t$, as in a FPCA. This assumption is similar to that on $\zeta_t$ in the STRE model (4.25) in Cressie et al. (2010). The difference between the two models is that, while $\beta_t$ and $\zeta_t$ are independent but not conditionally independent given $Z_{1:T}$ in the STRE model (4.25), $\beta_t$ and $\alpha_t$ are assumed to be also conditionally independent given $Z_{1:T}$ in the SS-FPCA.

It should be acknowledged at this stage that it is preferable to take the conditional dependence of $\alpha_t$ and $\beta_t$ into account wherever possible. However, the conditional independence assumption can be justified by the fact that the $\alpha_t$ are essentially PC scores. In a FPCA computed using matrix decomposition, the PC scores are obtained after the eigen-decomposition of the covariance matrix. In other words, they are not directly related to the extraction of the eigenfunctions and eigenvalues. Although the estimation of the SS-FPCA model inevitably involves iterative steps, the conditional correlation between $\alpha_t$ and $\beta_t$ is not expected to have a large influence on model estimation if the algorithm is designed sensibly. Moreover, evaluating the conditional distribution of $\alpha_t, \beta_t \mid Z_{1:T}$ is extremely difficult due to the different temporal dependence structures of $\beta_t$ and $\alpha_t$. To be specific, $\beta_t$ is governed by a first-order Markov structure through $\beta_t \mid \beta_{t-1}$, which means that the distribution $f(\beta_t)$ for each time point $t$ cannot be separated from the joint distribution $f(\beta_{1:T})$. In contrast, $\alpha_t$ does not depend on its temporal neighbours and relies solely on the information at time $t$. Considering the complexity of determining the joint distribution $f(\alpha_t, \beta_t \mid Z_{1:T})$, this thesis assumes that $\alpha_t$ and $\beta_t$ are conditionally independent.

(e) As mentioned in section 4.1, it is sometimes sensible to use different bases for the state space and the FPCA components, such as

\[
Z_t(x, y) = \mu_t(x, y) + \Phi_\beta(x, y)\beta_t + \Phi_\xi(x, y)\Theta\alpha_t + \varepsilon_t(x, y) .
\]

This would offer more flexibility in describing the spatial/temporal variations. For example, the basis $\Phi_\beta(x, y)$ may be designed to capture the large-scale temporal variation, whereas $\Phi_\xi(x, y)$ is intended to explain the smaller-scale spatial variation via the FPCA. However, this could complicate the estimation of the model, as some simplifications (e.g. the matrix identity used in inverting a high-dimensional matrix) may no longer be applicable if two different bases are used. As far as the problem in this thesis is concerned, the gain from specifying two different bases may not compensate for the additional computational cost. Therefore, it is assumed that $\Phi_\beta(x, y) = \Phi_\xi(x, y)$.