According to the Karhunen-Loève (KL) orthogonal expansion (Wahba, 1990), the centred
random function Xc(t) = X(t) − µ(t) can be decomposed into
$$X_c(t) = \sum_{j=1}^{\infty} \phi_j(t)\,\xi_j, \tag{2.4}$$
where ξj are uncorrelated random variables and φj(·) are the eigenfunctions of the
covariance operator of X. That is, φj(·) are the solutions to the eigenequations
$$\int k(t, t')\,\phi_j(t')\,dt' = \lambda_j\,\phi_j(t), \qquad \int \phi_i(t)\,\phi_j(t)\,dt = \delta_{ij}, \tag{2.5}$$
Chapter 2. Statistical concepts for function-valued processes
where λ1 ≥ λ2 ≥ · · · ≥ 0 are the eigenvalues of k(·, ·) and δij is the Kronecker delta. Each
function φj(·) is called a functional principal component (FPC) and the random variable ξj
is called its corresponding score.
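When the curves are observed on a common, equally spaced grid, the eigenequations (2.5) can be approximated by replacing the integrals with Riemann sums. The following is a minimal sketch under that assumption; the function name `fpca_on_grid`, the simulated data and the rectangle-rule quadrature are illustrative choices, not part of the original text.

```python
import numpy as np

def fpca_on_grid(X, t):
    """Approximate eigenvalues, FPCs and scores for curves on an equally spaced grid.

    X : (n, T) array with one curve per row; t : (T,) equally spaced grid.
    """
    dt = t[1] - t[0]                       # quadrature weight (rectangle rule)
    mu = X.mean(axis=0)                    # pointwise mean function
    Xc = X - mu                            # centred curves
    K = Xc.T @ Xc / X.shape[0]             # sample covariance k(t, t') on the grid
    evals, evecs = np.linalg.eigh(K * dt)  # discretised covariance operator
    order = np.argsort(evals)[::-1]        # sort eigenvalues in nonincreasing order
    lam = np.clip(evals[order], 0.0, None) # remove tiny negative rounding errors
    phi = evecs[:, order] / np.sqrt(dt)    # rescale so that sum(phi_j**2) * dt = 1
    scores = Xc @ phi * dt                 # xi_ij ~ integral of Xc_i(t) * phi_j(t) dt
    return lam, phi, scores

# Simulated example: two orthogonal components with variances 4 and 1.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
phi1 = np.sqrt(2.0) * np.sin(2.0 * np.pi * t)
phi2 = np.sqrt(2.0) * np.cos(2.0 * np.pi * t)
xi = rng.normal(size=(200, 2)) * np.sqrt([4.0, 1.0])
X = xi[:, :1] * phi1 + xi[:, 1:] * phi2

lam, phi, scores = fpca_on_grid(X, t)
```

In this discretised setting the empirical variance of the j-th score column equals the j-th estimated eigenvalue, which mirrors the identity Var ξj = λj.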
As each deterministic function φj is normalised, the variance of X in the principal
direction φj is simply Var(ξj) = λj. In other words, λj quantifies how much variation of
X is explained by φj. We can also show that
$$\mathrm{E}\,\|X_c\|^2 = \sum_{j=1}^{\infty} \lambda_j,$$
which means that the variance of X is equal to the sum of the variances of the projections
of X onto φj’s.
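A short derivation of this identity is sketched below. It assumes that X is square-integrable, so that expectation, integration and summation can be interchanged, and it uses E(ξj) = 0, which holds because Xc is centred.

```latex
\begin{align*}
\mathrm{E}\,\|X_c\|^2
  &= \mathrm{E} \int \Big( \sum_{j=1}^{\infty} \phi_j(t)\,\xi_j \Big)^2 dt
   = \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} \mathrm{E}(\xi_i \xi_j) \int \phi_i(t)\,\phi_j(t)\,dt \\
  &= \sum_{i=1}^{\infty} \sum_{j=1}^{\infty} \mathrm{E}(\xi_i \xi_j)\,\delta_{ij}
   = \sum_{j=1}^{\infty} \mathrm{E}(\xi_j^2)
   = \sum_{j=1}^{\infty} \operatorname{Var}(\xi_j)
   = \sum_{j=1}^{\infty} \lambda_j .
\end{align*}
```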
Since λj’s are arranged in nonincreasing order, we can calculate the cumulative fraction
of variance explained (CFVE) by the first J eigenfunctions via
$$\mathrm{CFVE}_J = \frac{\sum_{j=1}^{J} \lambda_j}{\sum_{j=1}^{\infty} \lambda_j}. \tag{2.6}$$
In practice, as we can only estimate a finite number of FPCs, the denominator of (2.6) is
replaced by $\sum_{j=1}^{J^*} \lambda_j$, where J∗ is large.
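As an illustration, the truncated version of (2.6) can be computed directly from a vector of estimated eigenvalues. The sketch below is a minimal example; the function names `cfve` and `n_components`, the threshold argument and the eigenvalues are hypothetical and chosen only for illustration.

```python
import numpy as np

def cfve(lam):
    """Cumulative fraction of variance explained, with the sum truncated at len(lam)."""
    lam = np.asarray(lam, dtype=float)
    return np.cumsum(lam) / lam.sum()

def n_components(lam, threshold=0.95):
    """Smallest J such that CFVE_J >= threshold (threshold assumed to be at most 1)."""
    c = cfve(lam)
    J = np.searchsorted(c, threshold) + 1  # first position where c >= threshold, 1-based
    return int(min(J, c.size))             # guard against rounding in the last entry

# Hypothetical eigenvalues in nonincreasing order (J* = 5).
lam = np.array([4.0, 2.0, 1.0, 0.5, 0.5])
fractions = cfve(lam)
J90 = n_components(lam, threshold=0.9)
```

A common use of such a rule is to retain the first J components whose CFVE exceeds a pre-specified level, although the appropriate level depends on the application.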
One strategy to reduce eigenequation (2.5) to a matrix form is to represent the observed data as a linear combination of J fixed (known) basis functions (Ramsay & Silverman, 2005, Section 8.4). This provides estimated eigenvalues ˆλj and the corresponding
FPCs ˆφj, where the maximum number of FPCs is J, the dimension of the basis. Some
drawbacks of this strategy are clear: the estimated FPCs are sensitive to the basis functions used to represent the data and to the sparsity level of the observed data. Another strategy is to estimate the covariance function nonparametrically and take its eigenfunctions as the FPCs (see e.g. principal component analysis through conditional expectation (PACE) in Yao et al. (2005)).
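The reduction to matrix form under the basis representation can be sketched as follows. The notation is an assumption introduced here for illustration: b(t) is the vector of the J basis functions, C is the n × J matrix of basis coefficients of the centred curves, W is the J × J matrix of inner products of the basis functions, and u is the coefficient vector of an eigenfunction.

```latex
\begin{align*}
X_{c,i}(t) &= \sum_{k=1}^{J} c_{ik}\, b_k(t) = \mathbf{c}_i^{\top} \mathbf{b}(t),
  \qquad
  \hat{k}(t, t') = n^{-1}\, \mathbf{b}(t)^{\top} C^{\top} C\, \mathbf{b}(t'), \\
\phi(t) &= \mathbf{b}(t)^{\top} \mathbf{u},
  \qquad
  W = \int \mathbf{b}(t)\, \mathbf{b}(t)^{\top} dt, \\
\int \hat{k}(t, t')\, \phi(t')\, dt' &= n^{-1}\, \mathbf{b}(t)^{\top} C^{\top} C\, W \mathbf{u}
  = \lambda\, \mathbf{b}(t)^{\top} \mathbf{u}
  \quad \text{for all } t
  \;\Longrightarrow\;
  n^{-1}\, C^{\top} C\, W \mathbf{u} = \lambda\, \mathbf{u},
  \qquad \mathbf{u}^{\top} W \mathbf{u} = 1 .
\end{align*}
```

Setting v = W^{1/2} u turns this into the symmetric eigenproblem n^{-1} W^{1/2} C^T C W^{1/2} v = λ v, which has at most J nonzero eigenvalues, in line with the bound on the number of FPCs stated above.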
2.3.1 Multivariate FPCA
Ramsay & Silverman (2005) propose the bivariate FPCA, which can be straightforwardly extended to the multivariate functional principal component analysis (MFPCA). In this
approach, the functions Xl(t), l = 1, . . . , M, are concatenated into a single long curve for
each individual of the sample. Next, FPCA is performed on the concatenated curves. As in FPCA, one obtains a vector of scores for each individual. However, this method might not work well, as we often encounter different degrees of variability in different functional variables, which means that each functional variable may require a different number of
components.
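The concatenation approach can be sketched as follows, assuming that all M functional variables are observed on equally spaced grids with the same spacing. The function name `concatenated_fpca` and the simulated data are illustrative assumptions. The example also shows how a variable with much larger variance tends to dominate the leading component, which relates to the drawback mentioned above.

```python
import numpy as np

def concatenated_fpca(curves, dt):
    """FPCA on concatenated curves.

    curves : list of M arrays of shape (n, T_l), each on a grid with spacing dt.
    Returns eigenvalues, the eigenfunctions split into M pieces, and the scores.
    """
    Z = np.hstack(curves)                  # one long curve per individual
    Zc = Z - Z.mean(axis=0)                # centre the concatenated curves
    K = Zc.T @ Zc / Z.shape[0]             # covariance of the concatenated curves
    evals, evecs = np.linalg.eigh(K * dt)  # discretised eigenproblem
    order = np.argsort(evals)[::-1]        # nonincreasing eigenvalues
    lam = np.clip(evals[order], 0.0, None)
    psi = evecs[:, order] / np.sqrt(dt)    # normalise over the concatenated domain
    scores = Zc @ psi * dt                 # one score vector per individual
    # Split each concatenated eigenfunction back into its M pieces.
    splits = np.cumsum([c.shape[1] for c in curves])[:-1]
    psi_parts = np.split(psi, splits, axis=0)
    return lam, psi_parts, scores

# Two simulated variables with very different variability (hypothetical data).
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 101)
dt = t[1] - t[0]
a = rng.normal(size=(200, 1))
b = rng.normal(size=(200, 1))
X1 = 10.0 * a * np.sqrt(2.0) * np.sin(2.0 * np.pi * t)  # high-variance variable
X2 = b * np.sqrt(2.0) * np.cos(2.0 * np.pi * t)         # low-variance variable

lam, psi_parts, scores = concatenated_fpca([X1, X2], dt)
# Share of the squared norm of the first eigenfunction carried by the first variable.
share_first = np.sum(psi_parts[0][:, 0] ** 2) * dt
```

In this simulated setting almost all of the squared norm of the leading eigenfunction falls on the first variable, so the second variable contributes very little to the first score.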
Chiou et al. (2014) propose an approach to cope with these problems: they consider that each of the M functions may have different variation and extend FPCA to the multivariate case by using cross-covariance functions estimated nonparametrically through a local linear plane, so that their approach takes into account the dependence among the functions. Nevertheless, the nonparametric estimation may suffer from the curse of dimensionality.
Berrendero et al. (2011) suggest an alternative way to reduce the dimension of multivariate functional data. The main aim is to summarise the vector of functions for each individual using a very small number of functions which retain most of the information from the M original functions. These curves are obtained by finding principal components based on the M × M covariance matrix of the M functional variables at each time t independently. As they show in their first simulation study, the curves that summarise the data tend to be quite rough when the random functions are weakly correlated, a feature which is difficult to interpret. Therefore, time dependence should be considered or some regularisation should be applied in order to obtain smooth summary curves. In practice, this is important as we often encounter weakly correlated functions in settings with a large M.
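The pointwise idea described above can be illustrated with the following minimal sketch. It is not the authors' implementation: the function name `pointwise_pca`, the sign convention for the eigenvectors and the simulated data are assumptions made here for illustration only.

```python
import numpy as np

def pointwise_pca(X):
    """Leading pointwise principal component of multivariate functional data.

    X : (n, M, T) array of n individuals, M functional variables, T time points.
    Returns the leading summary curves (n, T), their pointwise variance (T,)
    and the pointwise fraction of variance explained (T,).
    """
    n, M, T = X.shape
    Xc = X - X.mean(axis=0)                  # centre each variable at each time point
    summary = np.empty((n, T))
    lead_var = np.empty(T)
    explained = np.empty(T)
    for k in range(T):
        S = Xc[:, :, k].T @ Xc[:, :, k] / n  # M x M covariance matrix at time t_k
        evals, evecs = np.linalg.eigh(S)     # eigenvalues in ascending order
        w = evecs[:, -1]                     # leading eigenvector
        if w[np.argmax(np.abs(w))] < 0:      # assumed sign convention: the largest
            w = -w                           # loading in absolute value is positive
        summary[:, k] = Xc[:, :, k] @ w      # value of the summary curve at t_k
        lead_var[k] = evals[-1]
        explained[k] = evals[-1] / evals.sum()
    return summary, lead_var, explained

# Hypothetical data: a shared smooth component plus independent noise.
rng = np.random.default_rng(2)
n, M, T = 100, 3, 50
t = np.linspace(0.0, 1.0, T)
common = rng.normal(size=(n, 1)) * np.sin(2.0 * np.pi * t)  # shape (n, T)
X = common[:, None, :] + 0.5 * rng.normal(size=(n, M, T))

summary, lead_var, explained = pointwise_pca(X)
```

Because each eigen-decomposition is carried out independently, nothing in this sketch links the loadings at neighbouring time points, which is one way to see why the resulting curves may be rough without additional smoothing or regularisation.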
Chiou & Müller (2014) propose a linear manifold model which identifies linear combinations of the components of multivariate functional data and is determined by varying-coefficient functions that describe time-varying relationships between those components. However, as in Berrendero et al. (2011), they obtain curves that summarise the data, rather than scores as in the FPCA approach of Ramsay & Silverman (2005).
A recent approach called MFPCA (Happ & Greven, 2018) can be applied to a more general case, allowing for functional variables that are irregularly sampled and observed on domains of different dimensions. In this approach, each observation consists of M > 2
functions X1, . . . , XM, where each one may be defined on a different domain, T1, . . . , TM,
with possibly different dimensions. The article suggests an estimation strategy to calculate multivariate FPCs and scores based on their univariate counterparts.
In Chapter 5, we use a convolution-based approach where cross-covariance functions are explicitly modelled. This is achieved by assuming that multiple functional variables are constructed from the same source. Each functional variable, though, can also have independent features. This framework is especially important to guarantee positive definiteness of the covariance function of the multivariate response.