We adopt a parsimonious basis representation to transform the FMM to the basis space, which enables divide-and-conquer computation. For the general FMM in (1.2), we assume that the responses $\{Y_i(t), i = 1, \ldots, N\}$ take values in $L^2(T)$, where $T$ is a closed subset of $\mathbb{R}^d$, $d \ge 1$. Let $\{\phi_j\}_{j=1}^{\infty}$ denote a compactly supported orthonormal basis of $L^2(T)$.
We can expand $Y_i(t)$ as
$$Y_i(t) = \sum_{j=1}^{\infty} d_{ij}\,\phi_j(t), \qquad d_{ij} = \langle Y_i, \phi_j \rangle = \int_T Y_i(t)\,\phi_j(t)\,dt.$$
The coefficient sequence $(d_{i1}, d_{i2}, \ldots)$ lies in the space of square-summable sequences, denoted by $\ell^2 = \{(d_j)_{j \ge 1} : \sum_{j=1}^{\infty} d_j^2 < \infty\}$. Since $Y(t)$ is written as a linear combination of $B(t)$, $U(t)$ and $E(t)$, it is natural to assume that these unobserved functional components also take values in the same $L^2(T)$ space. Under this assumption, all functional objects in the FMM can be represented in a common basis. Specifically, let $\Phi = (\phi_1, \phi_2, \ldots)^T$; then all basis expansions can be written as linear operations: $Y = D\Phi$, $B = B^*\Phi$, $U = U^*\Phi$, and $E = E^*\Phi$. Model (1.2) becomes $D\Phi = XB^*\Phi + ZU^*\Phi + E^*\Phi$. Because the basis functions are orthonormal, and hence linearly independent, matching coefficients shows that this model is equivalent to the dual space model $D = XB^* + ZU^* + E^*$.
This transforms the FMM from the function space $L^2(T)$ to the dual space $\ell^2$. Since the dual space model consists of discrete sequences only, estimation is much easier to perform.
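As an illustration on a discretized grid, the following Python sketch builds an orthonormal discrete Haar matrix and checks numerically that a simulated data-space model maps to the dual space model. The grid size, the dimensions, the random inputs, and the helper name `haar_matrix` are assumptions made for this example only; with sampled curves the coefficients are obtained by a matrix product rather than by integration.

```python
import numpy as np

def haar_matrix(n):
    """Return the n x n orthonormal discrete Haar matrix (n a power of two)."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    coarse = np.kron(h, [1.0, 1.0])               # coarser scales, repeated over pairs
    fine = np.kron(np.eye(n // 2), [1.0, -1.0])   # finest-scale differences
    return np.vstack([coarse, fine]) / np.sqrt(2.0)

rng = np.random.default_rng(0)
N, p, m, T = 6, 2, 3, 8                  # hypothetical sizes
Phi = haar_matrix(T)                     # row j holds basis function phi_j on the grid
X, Z = rng.normal(size=(N, p)), rng.normal(size=(N, m))
B, U, E = rng.normal(size=(p, T)), rng.normal(size=(m, T)), rng.normal(size=(N, T))
Y = X @ B + Z @ U + E                    # data-space model evaluated on the grid

D = Y @ Phi.T                            # d_ij = <Y_i, phi_j>
B_star, U_star, E_star = B @ Phi.T, U @ Phi.T, E @ Phi.T

# Both checks rely only on the orthonormality of Phi.
dual_model_holds = np.allclose(D, X @ B_star + Z @ U_star + E_star)
reconstruction_holds = np.allclose(D @ Phi, Y)
print(dual_model_holds, reconstruction_holds)
```

Each column of `D` can then be treated as the response of a separate coefficient-level model.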
Furthermore, as the correlations between basis coefficients are often substantially reduced, one can make independence approximations between columns of D, B∗, U∗, and E∗ in the dual space model following the idea of Morris and Carroll [46]. This further divides the dual space model into many independent regular mixed effect models. We denote the jth model by
$$d_j = Xb^*_j + Zu^*_j + e^*_j, \qquad j = 1, \ldots, n, \tag{2.1}$$
where $d_j$, $b^*_j$, $u^*_j$ and $e^*_j$ denote the jth columns of $D$, $B^*$, $U^*$ and $E^*$, respectively. While the above divide-and-conquer strategy works with any orthonormal basis, we focus only on compactly supported orthonormal bases such as Haar wavelets, Daubechies wavelets, and spherical wavelets. These bases capture local features of functional data, yield parsimonious representations that allow further compression, and have discrete transforms that are fast to compute. They are generally applicable to curves, images, surfaces, etc. As we will explain in Section 2.2.2, using a compactly supported orthonormal basis allows us to identify interesting local regions by performing testing in the basis space only, avoiding the need to inverse-transform many posterior samples back to the data domain. For the jth model in (2.1), we slightly modify the priors and the random effect/residual distributions proposed by Morris and Carroll [46] to enable efficient variational Bayes computation. In particular, denote $d_j = (d_{1j}, \ldots, d_{Nj})^T$, $b^*_j = (b^*_{1j}, \ldots, b^*_{pj})^T$, $u^*_j = (u^*_{1j}, \ldots, u^*_{mj})^T$, and $e^*_j = (e^*_{1j}, \ldots, e^*_{Nj})^T$. Our model can be written as
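$$b^*_{i,j} \sim \gamma^*_{i,j}\, N(0, q^*_j \tau_{i,j}) + (1 - \gamma^*_{i,j})\,\delta_0, \qquad \gamma^*_{i,j} \sim \mathrm{Bernoulli}(\pi_j), \qquad q^*_j \sim \mathrm{IG}(a_j, b_j),$$
$$u^*_j \sim N(0, q^*_j I), \qquad e^*_j \sim N(0, q^*_j \zeta_j I).$$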
2.2. Model 25
Here, we have factored out the random effect variance q∗j from the prior variance of fixed effect b∗i,j and the residual variance. This new parameterization allows for convenient update of the approximate distribution of qj∗. Based on the above model setup, the joint posterior distribution of {b∗i,j}, {γi,j∗ } and q∗j can be written as
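$$p(\{b^*_{i,j}\}, \{\gamma^*_{i,j}\}, q^*_j \mid d_j, \zeta_j, \tau_{i,j}, \pi_j) \propto p(d_j \mid \{b^*_{i,j}\}, q^*_j, \zeta_j)\, p(q^*_j) \prod_{i=1}^{p} p(b^*_{i,j} \mid \gamma^*_{i,j}, \tau_{i,j})\, p(\gamma^*_{i,j} \mid \pi_j). \tag{2.2}$$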
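To make the hierarchy behind (2.2) concrete, the following Python sketch draws one coefficient-level data set from the model. It is a minimal illustration, not the estimation procedure: the function name, the argument names, and the shape/scale parameterization of the inverse-gamma distribution are assumptions of this example.

```python
import numpy as np

def simulate_coefficient_model(X, Z, pi_j, tau, zeta_j, a_j, b_j, rng):
    """Draw (d_j, b_j, gamma_j, u_j, q_j) from the spike-and-slab mixed model.

    tau may be a scalar or a length-p array of slab variance multipliers.
    IG(a_j, b_j) is taken as shape a_j and scale b_j (an assumption).
    """
    N, p = X.shape
    m = Z.shape[1]
    q = 1.0 / rng.gamma(shape=a_j, scale=1.0 / b_j)      # q_j ~ IG(a_j, b_j)
    gamma = rng.random(p) < pi_j                          # gamma_ij ~ Bernoulli(pi_j)
    slab = rng.normal(0.0, np.sqrt(q * tau), size=p)      # N(0, q_j * tau_ij) draws
    b = np.where(gamma, slab, 0.0)                        # point mass at 0 if excluded
    u = rng.normal(0.0, np.sqrt(q), size=m)               # u_j ~ N(0, q_j I)
    e = rng.normal(0.0, np.sqrt(q * zeta_j), size=N)      # e_j ~ N(0, q_j zeta_j I)
    return {"d": X @ b + Z @ u + e, "b": b, "gamma": gamma, "u": u, "q": q}

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 3))   # hypothetical fixed effect design
Z = rng.normal(size=(10, 4))   # hypothetical random effect design
draw = simulate_coefficient_model(X, Z, pi_j=0.5, tau=2.0, zeta_j=0.3,
                                  a_j=3.0, b_j=2.0, rng=rng)
print(draw["gamma"], draw["b"])
```

Because $q^*_j$ scales the slab, the random effects, and the residuals alike, a single draw of `q` sets the overall scale of the simulated coefficients.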
We treat $\pi_j$, $\tau_{i,j}$, $\zeta_j$, and $(a_j, b_j)$ as hyperparameters and make mean-field assumptions for the approximate distributions of $\{b^*_{i,j}\}$, $\{\gamma^*_{i,j}\}$, and $q^*_j$. This enables an efficient variational EM algorithm. In particular, we assume that the approximate distribution can be factored as follows:
$$q(\{b^*_{i,j}\}, \{\gamma^*_{i,j}\}, q^*_j) = q(q^*_j) \prod_{i=1}^{p} q(b^*_{i,j} \mid \gamma^*_{i,j})\, q(\gamma^*_{i,j}). \tag{2.3}$$
As the conditional posteriors of $\{b^*_{i,j}\}$, $\{\gamma^*_{i,j}\}$, and $q^*_j$ all belong to the exponential family, we follow Blei [4] and assume that each factor of the above approximate distribution belongs to the same exponential family. Therefore, in the E-step, estimating the approximate distribution reduces to estimating the natural parameters of the exponential family.
This facilitates fast calculation of the approximate distributions.
In the M-step, conditional on the estimated approximate distributions, we update the hyperparameters $\pi_j$, $\tau_{i,j}$ and $\zeta_j$ by directly maximizing the ELBO. Specifically, we can solve for $\pi_j$ and $\tau_{i,j}$ analytically by setting the first derivative of the ELBO to zero. The value of $\zeta_j$, however, must be found with a numerical optimization algorithm. In general, the ELBO is neither a convex nor a concave function of $\zeta_j$, because $\zeta_j$ enters through the inverse of a covariance matrix. Thus, the optimization algorithm may converge to a local optimum. The values of the hyperparameters $(a_j, b_j)$ are determined by matching the mean of the inverse-gamma prior to the initial estimate of $q^*_j$ while setting the prior variance to be fairly large (e.g., $10^3$).
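The moment matching for $(a_j, b_j)$ can be sketched as follows. The sketch assumes the shape/scale parameterization of the inverse-gamma distribution, for which the mean is $b/(a-1)$ when $a > 1$ and the variance is $b^2/\{(a-1)^2(a-2)\}$ when $a > 2$; the function name and the numerical inputs are hypothetical.

```python
def inverse_gamma_from_moments(mean, variance):
    """Return (a, b) of an inverse-gamma law with the given mean and variance.

    Solving variance = mean**2 / (a - 2) gives a, and mean = b / (a - 1) gives b.
    """
    if mean <= 0 or variance <= 0:
        raise ValueError("mean and variance must be positive")
    a = 2.0 + mean ** 2 / variance
    b = mean * (a - 1.0)
    return a, b

# Hypothetical initial estimate of q_j and a large prior variance.
q_init, prior_var = 2.0, 1e3
a_j, b_j = inverse_gamma_from_moments(q_init, prior_var)
print(a_j, b_j)
```

A large prior variance pushes the shape parameter toward 2, so the resulting prior is only weakly informative about $q^*_j$.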
Based on the full conditional distribution in (2.2), the conditional distribution of dj is:
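$$d_j \mid (\{b^*_{\setminus i,j}\}, \gamma^*_{\setminus i,j}, \gamma^*_{i,j} = 1) \sim N\big(X_{(-i)} b^*_{(-i,j)},\; \Sigma_j + X_{(i)} X_{(i)}^T \tau_{i,j}\big), \tag{2.4}$$
$$d_j \mid (\{b^*_{\setminus i,j}\}, \gamma^*_{\setminus i,j}, \gamma^*_{i,j} = 0) \sim N\big(X_{(-i)} b^*_{(-i,j)},\; \Sigma_j\big). \tag{2.5}$$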
Instead of modeling the distributions of $\gamma^*_{i,j}$ and $b^*_{i,j}$ separately, it is more reasonable to estimate the joint distribution of $(\gamma^*_{i,j}, b^*_{i,j})$. We model this joint distribution through $q(\gamma^*_{i,j})$ and $q(b^*_{i,j} \mid \gamma^*_{i,j})$. Based on the distributions in (2.4) and (2.5) and the prior distribution of $\gamma^*_{i,j}$, the approximate distribution of $\gamma^*_{i,j}$ is a Bernoulli distribution, i.e.,
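$$q(\gamma^*_{i,j}) = \mathrm{Bernoulli}(\tilde{\pi}_{i,j}), \tag{2.6}$$
where
$$\tilde{\pi}_{i,j} = \frac{1}{1 + \exp(-O_{i,j})}, \tag{2.7}$$
$$O_{i,j} = \log\left\{\frac{\pi_j}{1 - \pi_j}\right\} - \frac{1}{2}\log\big(1 + X_{(i)}^T \Sigma_j^{-1} X_{(i)} \tau_{i,j}\big) + E_q\left[\frac{1}{2}\, d_j^{*T}\, \frac{\Sigma_j^{-1} X_{(i)} X_{(i)}^T \Sigma_j^{-1} \tau_{i,j}}{q^*_j \big(1 + \tau_{i,j} X_{(i)}^T \Sigma_j^{-1} X_{(i)}\big)}\, d^*_j\right], \tag{2.8}$$
where $d^*_j = d_j - \sum_{l \ne i} X_{(l)} b^*_{l,j}$ and $E_q$ denotes the expectation with respect to the approximate distribution $q$.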
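The log odds in (2.8) can be evaluated without forming the rank-one updated covariance explicitly. The Python sketch below is a simplified illustration under stated assumptions: the expectation $E_q$ is replaced by plug-in values of $d^*_j$ and $q^*_j$, the matrix $\Sigma_j$ is supplied by the caller, and the example builds it as $ZZ^T + \zeta_j I$, which is an assumed form. The function and variable names are hypothetical.

```python
import numpy as np

def inclusion_probability(d_star, x_i, Sigma, pi_j, tau_ij, q_j):
    """Plug-in version of (2.7)-(2.8) for a single fixed effect i.

    d_star : residual d_j minus the contribution of the other fixed effects
    x_i    : column X_(i) of the design matrix
    """
    sinv_x = np.linalg.solve(Sigma, x_i)      # Sigma^{-1} X_(i)
    s = x_i @ sinv_x                          # X_(i)^T Sigma^{-1} X_(i)
    proj = d_star @ sinv_x                    # d*^T Sigma^{-1} X_(i) (Sigma symmetric)
    log_odds = (np.log(pi_j / (1.0 - pi_j))
                - 0.5 * np.log1p(tau_ij * s)
                + 0.5 * tau_ij * proj ** 2 / (q_j * (1.0 + tau_ij * s)))
    return 1.0 / (1.0 + np.exp(-log_odds)), log_odds

rng = np.random.default_rng(3)
N, m = 8, 3                                   # hypothetical sizes
Z = rng.normal(size=(N, m))
x = rng.normal(size=N)
d = rng.normal(size=N)
Sigma = Z @ Z.T + 0.7 * np.eye(N)             # assumed form of Sigma_j
pi_j, tau, q = 0.3, 1.5, 2.0                  # hypothetical plug-in values
prob, log_odds = inclusion_probability(d, x, Sigma, pi_j, tau, q)
print(prob)
```

The last two terms of `log_odds` equal the log ratio of two zero-mean Gaussian densities with covariances $q^*_j(\Sigma_j + \tau_{i,j} X_{(i)} X_{(i)}^T)$ and $q^*_j \Sigma_j$, which follows from the Sherman-Morrison formula and the matrix determinant lemma.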
Using (2.2) and taking expectations with respect to the distributions in (2.6) and (2.9), the approximate distribution of $q^*_j$ can be written as
The above steps constitute the E-step, which updates the approximate distributions of the parameters. To complete one iteration of the algorithm, we still need to update the hyperparameters $\pi_j$, $\tau_{i,j}$ and $\zeta_j$. Specifically, we use optimization techniques to update these hyperparameters. For example, maximizing $\mathrm{ELBO} = E_q[\log p(d_j, \{b^*_{i,j}\}, \{\gamma^*_{i,j}\}, q^*_j)] - E_q[\log q(\{b^*_{i,j}\}, \{\gamma^*_{i,j}\}, q^*_j)]$ yields explicit solutions for $\pi_j$ and $\tau_{i,j}$, given by
πj =
After integrating and substituting (2.11) and (2.12) into the ELBO, the remaining hyperparameter $\zeta_j$ can be obtained by maximizing the following objective function:
− log | 2πΣj | −aej
Since there is no explicit solution for (2.13), we use a hill-climbing algorithm in MATLAB to perform the optimization. Because computing the inverse matrix in (2.13) is computationally expensive and can make convergence unstable, we show in Appendix A.4 that the matrix inversion can be avoided through simplification, which speeds up the computation. The steps of the VFMM algorithm are listed in Algorithm 2; more technical details can be found in Appendix A.4. To ensure fast convergence, we use Henderson's mixed model equations [56, pages 275-286] to initialize the parameters. While options for additional parameter settings and detailed tuning are available, the only required inputs are the observed data Y and the design matrices X and Z. As the calculation is performed independently across the index j, Algorithm 2 can either be implemented with vectorized calculations or be distributed across multiple cores.
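For reference, Henderson's mixed model equations for a single coefficient-level model can be solved as in the Python sketch below. This is an illustration of the standard equations, not the initialization code used by the authors: the function name is hypothetical, and the variance ratio `lam` (residual variance over random effect variance, which corresponds to $\zeta_j$ under the parameterization above) is assumed to be supplied by the user.

```python
import numpy as np

def henderson_mme(y, X, Z, lam):
    """Solve Henderson's mixed model equations for y = X b + Z u + e.

    lam is the ratio of residual variance to random effect variance.
    Returns the fixed effect estimate and the predicted random effects.
    """
    p, m = X.shape[1], Z.shape[1]
    lhs = np.block([[X.T @ X, X.T @ Z],
                    [Z.T @ X, Z.T @ Z + lam * np.eye(m)]])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    sol = np.linalg.solve(lhs, rhs)
    return sol[:p], sol[p:]

rng = np.random.default_rng(4)
N, p, m = 20, 2, 4                         # hypothetical sizes
X = rng.normal(size=(N, p))
Z = rng.normal(size=(N, m))
y = rng.normal(size=N)                     # stands in for one column d_j
lam = 0.5                                  # hypothetical variance ratio
b_hat, u_hat = henderson_mme(y, X, Z, lam)
print(b_hat, u_hat)
```

The solution coincides with the generalized least squares estimate of the fixed effects and the best linear unbiased predictor of the random effects, while only requiring a linear solve of size $p + m$ rather than an $N \times N$ inverse.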
Algorithm 2: The VFMM algorithm
Initialize all parameters;
while ELBO(t) − ELBO(t−1) > δ do
    for all jk do
        Update q(γ∗i,jk, B∗i,jk) in (2.6) and (2.9);
        Update q(q∗jk) in (2.10);
        Update ELBO(t);
        Update πjk, τi,jk in (2.11) and (2.12);
        Update ζjk in (2.13) by hill-climbing algorithm;
    end
    Update ELBO(t);
end
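The hill-climbing update of the scalar hyperparameter in Algorithm 2 can be illustrated with a simple derivative-free search. The Python sketch below is a generic step-halving hill climb, not the MATLAB routine used in this work; the objective passed to it is a toy stand-in, since the actual objective is (2.13), and all names are hypothetical.

```python
import math

def hill_climb(f, x0, step=1.0, tol=1e-8, lower=1e-8):
    """Maximize a scalar function f over x > lower by step-halving hill climbing.

    A move is accepted only if it strictly increases f, so the search stops
    at a local maximum, which need not be the global one.
    """
    x, fx = x0, f(x0)
    while step > tol:
        moved = False
        for cand in (x + step, x - step):
            if cand > lower:                  # keep the candidate positive
                fc = f(cand)
                if fc > fx:
                    x, fx, moved = cand, fc, True
                    break
        if not moved:
            step *= 0.5                       # no improvement: refine the step
    return x, fx

# Toy objective with a single maximum at exp(0.5); it only stands in for (2.13).
toy_objective = lambda z: -(math.log(z) - 0.5) ** 2
zeta_hat, value = hill_climb(toy_objective, x0=1.0)
print(zeta_hat)
```

Because the search only accepts improving moves, restarting it from several initial values is one way to guard against the local optima mentioned above.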