Original contributions - On reproducing kernel methods in functional statistics

In each chapter of this thesis we address a different statistical problem with functional data from an RKHS perspective. We summarize here the main theoretical contributions derived for each of them. In addition, we have coded algorithms in the programming language R (see R Core Team (2013)) for each of the proposed techniques.

In Chapter 2 we introduce a functional extension of the classical Mahalanobis distance. The expression of the Mahalanobis distance for multivariate data involves the inverse of the covariance matrix. The functional counterpart of the covariance matrix is the covariance operator K defined in Equation (1.4). As mentioned, this operator is not invertible in L²[0, 1], so a direct extension of the distance is not possible. A few interesting proposals already try to circumvent this problem. We propose a rather different perspective, motivated in RKHS terms and with a full mathematical foundation. Our proposed definition relies on the fact that the original Mahalanobis distance from x to m in the finite-dimensional case coincides with the norm of x − m in the RKHS. The classical distance can thus be expressed as a simple sum involving the inverse eigenvalues of the covariance matrix of the data. As pointed out in Lemma 2.1, this expression can be extended to the infinite-dimensional case, becoming an infinite sum. One could naively define the functional Mahalanobis distance in terms of this series. However, as is apparent from the proof of the mentioned lemma, the series is convergent only when applied to a function in the RKHS. Usually one wants to compute distances between the trajectories of the process, which do not belong to the RKHS. Our approach is therefore based on replacing each trajectory with a “smoother” version that does belong to the RKHS. We propose to define the functional Mahalanobis-type distance as the distance in the RKHS norm between these smoothed trajectories, which are obtained as the solution of the minimization problem defined in Equation (2.11). Both this solution and the Mahalanobis distance thus defined have simple explicit expressions, derived in Proposition 2.2. The minimization introduces a real smoothing parameter α. In Proposition 2.8 we prove that the choice of this parameter is not critical, in the sense that our distance is continuous in α. Another benefit of this result is that it implies the pointwise convergence of the quantile functions of the distance, which is useful if one uses it to define a depth measure.
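The construction can be sketched in formulas. The first line is the standard spectral form of the classical squared distance, with (λ_j, e_j) the eigenpairs of the covariance matrix Σ. The remaining lines are only a hedged illustration: they assume that the minimization problem in Equation (2.11) is a ridge-type penalized least squares problem, which is not stated in this summary, and show what the smoothed trajectory x_α and the resulting squared distance would look like under that assumption, with (λ_j, e_j) now the eigenpairs of K.

```latex
% Finite-dimensional case: \Sigma has eigenpairs (\lambda_j, e_j), j = 1, ..., d
M^2(x, m) = (x - m)^\top \Sigma^{-1} (x - m)
          = \sum_{j=1}^{d} \frac{\langle x - m, e_j \rangle^2}{\lambda_j}.

% Assumed form of the smoothing problem (ridge-type penalty, \alpha > 0)
x_\alpha = \arg\min_{f \in \mathcal{H}(K)} \; \|x - f\|_{L^2}^2 + \alpha \|f\|_K^2
         = \sum_{j=1}^{\infty} \frac{\lambda_j}{\lambda_j + \alpha} \langle x, e_j \rangle \, e_j,

% Resulting squared distance between smoothed trajectories
M_\alpha^2(x, m) = \|x_\alpha - m_\alpha\|_K^2
                 = \sum_{j=1}^{\infty} \frac{\lambda_j}{(\lambda_j + \alpha)^2} \langle x - m, e_j \rangle^2 .
```

Under this assumption the weights λ_j/(λ_j + α)² are bounded, so the last series converges for every square-integrable x − m, whereas the unsmoothed series with weights 1/λ_j does not in general.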

We prove that our proposal shares some interesting properties with the classical definition. For instance:

• The Mahalanobis distance is invariant when an invertible transformation is applied to the point cloud. The proposed extension is invariant when an isometry in L²[0, 1] is applied to the curves (Theorem 2.4).

• It is a well-known fact that the Mahalanobis distance for Gaussian data in ℝᵈ is distributed as a sum of d independent chi-squared variables. We prove (Theorem 2.5) that, for Gaussian processes, our distance is distributed as an infinite sum of independent chi-squared variables. We also derive explicit expressions for the mean and variance of this distribution.
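The distributional statements in the last item can be written out for the squared distance. The first line is the classical fact. The other lines are a sketch under an assumption that is not stated in this summary, namely that the squared functional distance has the spectral form Σ_j λ_j (λ_j + α)⁻² ⟨x − m, e_j⟩², with (λ_j, e_j) the eigenpairs of K and α > 0 the smoothing parameter. The weights w_j are notation introduced here for illustration.

```latex
% Classical case: X ~ N_d(m, \Sigma), with Z_1, ..., Z_d i.i.d. N(0, 1)
M^2(X, m) = (X - m)^\top \Sigma^{-1} (X - m)
          \overset{d}{=} \sum_{j=1}^{d} Z_j^2 \sim \chi^2_d .

% Functional case under the assumed spectral form:
% for a Gaussian process, \langle X - m, e_j \rangle \sim N(0, \lambda_j), independent in j
M_\alpha^2(X, m) \overset{d}{=} \sum_{j=1}^{\infty} w_j Z_j^2,
\qquad w_j = \Big( \frac{\lambda_j}{\lambda_j + \alpha} \Big)^2,

\mathbb{E}\, M_\alpha^2(X, m) = \sum_{j=1}^{\infty} w_j,
\qquad
\operatorname{Var}\, M_\alpha^2(X, m) = 2 \sum_{j=1}^{\infty} w_j^2 .
```

Both series are finite because w_j ≤ λ_j²/α² and the eigenvalues of a covariance operator are summable.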

The obvious way to estimate the proposed distance in practice is to replace the eigenvalues and eigenfunctions of K in the expression of Proposition 2.2 with their sample counterparts. Under very general conditions this estimator is proved to be almost surely consistent (Theorem 2.10). We have also obtained the asymptotic behavior of the Mahalanobis distance between the mean function and its sample counterpart (Theorem 2.13).

In order to check the practical relevance of the proposal, we apply it to three problems in which Mahalanobis distance is classically used: exploratory analysis, functional binary classification and mean inference.

Chapter 3 has two parts. In the first part we address the problem of functional regression with scalar response. We propose to replace the inner product in L²[0, 1] of the standard functional regression model with the inverse of Loève’s isometry (Eq. (1.3)) applied to a slope function in H(K). This model, defined in Equation (3.3), is especially suitable for variable selection. In fact, it reduces to the classical finite-dimensional linear model when the slope function belongs to H₀(K) (Eq. (1.1)). The points t₁, . . . , tₚ ∈ [0, 1] that define this sparse slope function are sometimes called “impact points”. Note that a finite-dimensional model such as (3.7) cannot be obtained with the standard L² model. This would require the evaluation functionals x ↦ x(t₀) to be continuous in L²[0, 1], which is not the case.
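The reduction to a finite-dimensional model can be sketched as follows. The notation is an assumption made for illustration: X is taken to be centered, Ψ_X denotes Loève’s isometry sending X(t) to K(t, ·), ε is the error term, and β₀, β₁, . . . , βₚ are generic coefficients. The exact statements are those of Equations (3.3) and (3.7), which are not reproduced in this summary.

```latex
% RKHS-based model with slope function \beta \in \mathcal{H}(K)
Y = \beta_0 + \Psi_X^{-1}(\beta) + \varepsilon .

% Sparse slope function in \mathcal{H}_0(K), determined by the impact points t_1, ..., t_p
\beta(\cdot) = \sum_{j=1}^{p} \beta_j K(t_j, \cdot)
\;\Longrightarrow\;
\Psi_X^{-1}(\beta) = \sum_{j=1}^{p} \beta_j \, \Psi_X^{-1}\big( K(t_j, \cdot) \big)
                   = \sum_{j=1}^{p} \beta_j X(t_j),

% so the model becomes a classical linear model in the selected variables
Y = \beta_0 + \sum_{j=1}^{p} \beta_j X(t_j) + \varepsilon .
```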

An important advantage of this model is that it gives a theoretical ground to variable selection. We derive three suitable optimization criteria to obtain meaningful points t₁, . . . , tₚ for prediction. These criteria are proved to be equivalent in Proposition 3.1. One of them is easy to implement in practice and can be rewritten in an iterative way that directly suggests a greedy implementation of the proposal (Proposition 3.3). Moreover, this criterion only depends on the covariances between the variables {X(t₁), . . . , X(tₚ)} and the response, which are simple to estimate, making the algorithm fast even for large data sets.
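A greedy, covariance-based selection of this kind can be sketched as follows. This is a minimal illustration and not the thesis implementation (which is written in R): it assumes that the criterion is the variance explained by the selected variables, Q(T) = c_Tᵀ Σ_T⁻¹ c_T, where Σ_T is the covariance matrix of the selected variables and c_T their covariances with the response. The function names, the Brownian-type kernel min(s, t) and the two impact points are all hypothetical choices made for the example, and population covariances are used in place of sample estimates.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def criterion(idx, cov_x, cov_xy):
    """Assumed criterion Q(T) = c_T' Sigma_T^{-1} c_T for the index set idx."""
    sub = [[cov_x[i][j] for j in idx] for i in idx]
    c = [cov_xy[i] for i in idx]
    w = solve(sub, c)
    return sum(ci * wi for ci, wi in zip(c, w))


def greedy_select(cov_x, cov_xy, p):
    """Add, one at a time, the grid index that most increases the criterion."""
    selected = []
    for _ in range(p):
        candidates = [j for j in range(len(cov_xy)) if j not in selected]
        best = max(candidates,
                   key=lambda j: criterion(selected + [j], cov_x, cov_xy))
        selected.append(best)
    return selected


# Hypothetical example: Brownian-type covariance K(s, t) = min(s, t) on a grid,
# and a response Y = X(0.3) + X(0.8) + noise, so that cov(X(t), Y) = K(0.3, t) + K(0.8, t).
grid = [(i + 1) / 10 for i in range(10)]
cov_x = [[min(s, t) for t in grid] for s in grid]
beta = {2: 1.0, 7: 1.0}  # grid indices of the assumed impact points 0.3 and 0.8
cov_xy = [sum(b * cov_x[j][i] for j, b in beta.items()) for i in range(len(grid))]

selected = greedy_select(cov_x, cov_xy, 2)
print([grid[j] for j in selected])
```

In this toy setting the two greedy steps recover both assumed impact points. With real data the covariances would be replaced by their sample versions, and a greedy search is not guaranteed in general to reach the global optimum of the criterion.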

We propose to select the points that optimize the sample version of this criterion. Whenever the true slope function belongs to H₀(K) and depends on p∗ points, we are able to estimate these points consistently. If we assume that the true number of points p∗ is known, our estimator of the points converges almost surely and in quadratic mean to the true ones (Theorem 3.8). Once the points are selected, one can easily predict the scalar response. This prediction is also proved to be almost surely consistent and to converge in quadratic mean (Theorem 3.9). However, for most applications one does not know the optimal number of variables to select. In Equation (3.27) we propose a suitable estimator for this quantity, which converges almost surely to the true value p∗ (Theorem 3.11). We also analyze how our estimator of the points behaves when one prefers to select a conservatively large number of variables p (i.e. p > p∗). In Theorem 3.12 we prove that the selected points are eventually “close” to the true points as the sample size increases.

In the second part of Chapter 3 we extend the previous methodology to functional regression with functional response. The definition of the model should be adapted to select a set of points that is meaningful to predict the whole response curves. In this case the cross-covariance function between the regressor and response processes plays an important role. This function is an essential part of the adapted optimality criteria, so its estimator must be uniformly almost surely consistent in order to perform variable selection. This introduces some restrictions on the slope function β(s, t). Specifically, it is required that β(s, ·) ∈ H(K) for every s ∈ [0, 1] and that the stochastic process U(s) = Ψ_X⁻¹(β(s, ·)) satisfies E[sup_s U(s)²] < ∞. In this setting, we are able to extend the consistency results developed in the first part of the chapter: Theorem 3.16 for the optimal points, Theorem 3.17 for the estimated response curves and Theorem 3.18 for the number of selected points.

This extended model is adapted to perform variable selection in functional time series. The model with finite-dimensional kernel (i.e. β(s, ·) ∈ H₀(K)) is proved to be an autoregressive (AR) process in C[0, 1] with a unique strictly stationary solution (Proposition 3.21). The estimated lagged-covariance functions for AR processes are strongly consistent (Lemma 3.23), so all the previous results hold in this case. The proposal is shown to be competitive both in prediction accuracy and computational efficiency.

Chapter 4 is devoted to functional logistic regression. The most common approach to functional logistic regression is defined in terms of the inner product in L²[0, 1]. However, we prove in Theorem 4.1 that, when the distributions of X given Y = 0 and given Y = 1 are Gaussian and homoscedastic, the induced logistic model involves the inner product in the RKHS. The derivation of this result requires Radon-Nikodym derivatives as in Equation (1.9). As in Chapter 3, this model is especially suited for variable selection. Whenever the slope function belongs to H₀(K) and depends on the points t₁, . . . , tₚ, it reduces to the finite-dimensional logistic model with regressors {X(t₁), . . . , X(tₚ)}.
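The reduction to a finite-dimensional logistic model can be sketched as follows. The notation is an assumption made for illustration: X is taken to be centered, Ψ_X denotes Loève’s isometry sending X(t) to K(t, ·), and β₀, β₁, . . . , βₚ are generic coefficients. The precise form of the model is the one given in Chapter 4.

```latex
% RKHS-based logistic model with slope function \beta \in \mathcal{H}(K)
\mathbb{P}(Y = 1 \mid X) = \frac{1}{1 + \exp\{ -\beta_0 - \Psi_X^{-1}(\beta) \}} .

% Sparse slope function \beta(\cdot) = \sum_{j=1}^{p} \beta_j K(t_j, \cdot) \in \mathcal{H}_0(K)
\mathbb{P}(Y = 1 \mid X) = \frac{1}{1 + \exp\{ -\beta_0 - \sum_{j=1}^{p} \beta_j X(t_j) \}} .
```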

We propose to select the points and the coefficients of the slope function using the maximum likelihood (ML) criterion. We carefully analyze whether the ML estimator exists (Theorems 4.4 and 4.6). It turns out that the non-existence problems of the ML estimator in the finite-dimensional case are drastically worsened in the functional setting. For instance, the probability that the ML estimator does not exist tends to one as the sample size goes to infinity. In order to circumvent these problems in practice we use Firth’s estimator, which exists for every sample, and select a small number of points (always fewer than 10).
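The finite-dimensional non-existence problem mentioned above can be illustrated with a small sketch. It only shows the classical phenomenon of complete separation in a one-variable logistic model. It does not reproduce Theorems 4.4 and 4.6, and it does not implement Firth’s estimator. The toy data and the function names are hypothetical.

```python
import math


def log_sigmoid(z):
    """Numerically stable evaluation of log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_likelihood(b0, b, xs, ys):
    """Log-likelihood of the logistic model P(Y = 1 | x) = sigmoid(b0 + b * x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = b0 + b * x
        total += log_sigmoid(eta) if y == 1 else log_sigmoid(-eta)
    return total


# Toy sample with complete separation: y = 1 exactly when x > 0.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

# Scaling the slope up always increases the log-likelihood, which approaches
# its supremum 0 without attaining it, so no finite maximizer exists.
scales = [1.0, 5.0, 25.0]
values = [log_likelihood(0.0, s, xs, ys) for s in scales]
for s, v in zip(scales, values):
    print(f"slope = {s:5.1f}   log-likelihood = {v:.3e}")
```

A penalized likelihood such as Firth’s avoids this behavior because the penalty prevents the coefficients from diverging, which is why it yields an estimate for every sample.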
