Parameter estimation - Contributions to Functional Data Analysis With a Focus on Points of Impa

In the following it is assumed that the labels for the points of impact are ordered such that $\tau_r = \arg\min_{s=1,\dots,S} |\hat\tau_r - \tau_s|$. Moreover, we assume that $S$ has been consistently estimated by $\hat S$ and that $\max_{r=1,\dots,S} |\hat\tau_r - \tau_r| = O_P(n^{-1/\kappa})$. For estimating the parameters $\alpha, \beta_1, \dots, \beta_S$ we impose the following additional assumptions for model (2.2): in addition to $E(\varepsilon_i \mid X_i(t), t \in [a,b]) = 0$ we assume that $V(\varepsilon_i \mid X_i(t), t \in [a,b]) = \sigma^2(g(\eta_i)) < \infty$, where the variance function $\sigma^2$ is defined over the range of $g$ and is strictly positive. For simplicity the function $g$ is assumed to be a known, strictly monotone (and hence invertible), smooth function with bounded first and second order derivatives. Model (2.2) then implies $E(Y_i \mid X_i) = g(\eta_i)$ and can be seen as a generalization of the generalized linear model framework (cf. McCullagh and Nelder, 1989, Ch. 9). The following result shows that our model is uniquely identified:

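Theorem 2.2. Let $g(\cdot)$ be invertible and assume that $X_i$ satisfies Assumption 2.1. Then for all $S^* \ge S$, all $\alpha^*, \beta_1^*, \dots, \beta_{S^*}^* \in \mathbb{R}$, and all $\tau_1, \dots, \tau_{S^*} \in (a,b)$ with $\tau_k \notin \{\tau_1, \dots, \tau_S\}$, $k = S+1, \dots, S^*$, we obtain

$$E\Big[\Big(g\Big(\alpha + \sum_{r=1}^{S} \beta_r X_i(\tau_r)\Big) - g\Big(\alpha^* + \sum_{r=1}^{S^*} \beta_r^* X_i(\tau_r)\Big)\Big)^2\Big] > 0, \tag{2.8}$$

whenever $|\alpha - \alpha^*| > 0$, or $\sup_{r=1,\dots,S} |\beta_r - \beta_r^*| > 0$, or $\sup_{r=S+1,\dots,S^*} |\beta_r^*| > 0$.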

Note that it already follows from Theorem 2.1 that all points of impact $\tau_r$ are uniquely identifiable under the assumptions of the theorem. Invertibility of $g$ additionally ensures that the coefficients $\alpha, \beta_1, \dots, \beta_S$ are uniquely identified. Furthermore, it follows from the proof of Theorem 2.2 that, under Assumption 2.1, $E(\mathbf{X}_i(\tau)\mathbf{X}_i(\tau)^T)$ is invertible, where $\mathbf{X}_i(\tau) = (1, X_i(\tau_1), \dots, X_i(\tau_S))^T$.

Estimation of $\beta_0 = (\alpha, \beta_1, \dots, \beta_S)^T$ is performed by quasi-maximum likelihood. Define $\mathbf{X}_i(\hat\tau) = (1, X_i(\hat\tau_1), \dots, X_i(\hat\tau_S))^T$ and denote the $j$th element of this vector as $\hat X_{ij}$. For $\beta \in \mathbb{R}^{S+1}$ let $\hat\eta_i(\beta) = \mathbf{X}_i(\hat\tau)^T\beta$ and $\hat\mu_n(\beta) = (g(\hat\eta_1(\beta)), \dots, g(\hat\eta_n(\beta)))^T$, let $\hat D_n(\beta)$ be the $n \times (S+1)$ matrix with entries $g'(\hat\eta_i(\beta))\hat X_{ij}$, and let $\hat V_n(\beta)$ be the $n \times n$ diagonal matrix with elements $\sigma^2(g(\hat\eta_i(\beta)))$. Furthermore, denote the corresponding objects evaluated at the true points of impact $\tau_r$ by $\mathbf{X}_i(\tau)$, $X_{ij}$, $\eta_i(\beta)$, $\mu_n(\beta)$, $D_n(\beta)$, and $V_n(\beta)$; this convention also applies to the objects defined below.

Then our estimator $\hat\beta$ for $\beta_0 = (\alpha, \beta_1, \dots, \beta_S)^T$ is defined as the solution of the $S+1$ score equations $\hat U_n(\hat\beta) = 0$, where

$$\hat U_n(\beta) = \hat D_n(\beta)^T \hat V_n(\beta)^{-1} (Y_n - \hat\mu_n(\beta)). \tag{2.9}$$

Note that the score equations are evaluated at the estimates $\hat\tau_r$ instead of $\tau_r$.
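The score equations (2.9) have no closed-form solution in general, so they are usually solved iteratively. The following is a minimal sketch of one common approach, Fisher scoring; it is not taken from the source. It assumes a logistic mean function $g$ with Bernoulli variance function, treats $Y_n$ as the vector of responses, and uses simulated Brownian-motion-type trajectories with hypothetical grid indices `tau_hat_idx` standing in for the estimated points of impact. All function and variable names are illustrative.

```python
import numpy as np

def g(eta):
    """Logistic mean function; an assumed example of the known link g."""
    return 1.0 / (1.0 + np.exp(-eta))

def g_prime(eta):
    """First derivative g'(eta) of the logistic function."""
    p = g(eta)
    return p * (1.0 - p)

def sigma2(mu):
    """Bernoulli variance function sigma^2(mu) = mu (1 - mu) (assumed)."""
    return mu * (1.0 - mu)

def score_and_information(beta, X_hat, y):
    """Return the score D^T V^{-1} (y - mu), cf. (2.9), and D^T V^{-1} D."""
    eta = X_hat @ beta
    mu = g(eta)
    D = g_prime(eta)[:, None] * X_hat   # n x (S+1), entries g'(eta_i) X_ij
    v = sigma2(mu)                      # diagonal elements of V_n(beta)
    U = D.T @ ((y - mu) / v)
    F = D.T @ (D / v[:, None])
    return U, F

def fisher_scoring(X_hat, y, max_iter=50, tol=1e-10):
    """Iterate beta <- beta + F^{-1} U until the update is negligible."""
    beta = np.zeros(X_hat.shape[1])
    for _ in range(max_iter):
        U, F = score_and_information(beta, X_hat, y)
        step = np.linalg.solve(F, U)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical simulated data: n trajectories on an equidistant grid of [0, 1].
rng = np.random.default_rng(0)
n, m = 5000, 101
dt = 1.0 / (m - 1)
increments = rng.normal(scale=np.sqrt(dt), size=(n, m - 1))
X = np.concatenate([np.zeros((n, 1)), np.cumsum(increments, axis=1)], axis=1)

tau_hat_idx = [30, 70]  # assumed grid indices of the estimated points of impact
X_hat = np.column_stack([np.ones(n), X[:, tau_hat_idx]])
beta_true = np.array([0.5, 2.0, -1.5])
y = rng.binomial(1, g(X_hat @ beta_true)).astype(float)

beta_hat = fisher_scoring(X_hat, y)
U_at_solution, _ = score_and_information(beta_hat, X_hat, y)
```

At convergence, `U_at_solution` should be numerically zero, which is exactly the defining property $\hat U_n(\hat\beta) = 0$. In this sketch the points of impact are taken as given; in the setting of the text they would be the estimates $\hat\tau_r$.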

In the following, it will be convenient to define

$$F_n(\beta) = D_n(\beta)^T V_n(\beta)^{-1} D_n(\beta) \quad\text{and}\quad \hat F_n(\beta) = \hat D_n(\beta)^T \hat V_n(\beta)^{-1} \hat D_n(\beta).$$

Observe that the $(S+1) \times (S+1)$ matrix $E(n^{-1}F_n(\beta))$ can be represented as

$$E(n^{-1}F_n(\beta)) = \Big[E\Big(\frac{g'(\eta_i(\beta))^2}{\sigma^2(g(\eta_i(\beta)))} X_{ik}X_{il}\Big)\Big]_{k,l},$$

where $k = 1, \dots, S+1$ and $l = 1, \dots, S+1$. Let $\eta(\beta)$ and $X_j$ be generic copies of $\eta_i(\beta)$ and the $j$th component of $\mathbf{X}_i$, respectively. This allows us to write $E(n^{-1}F_n(\beta)) = E(F(\beta))$ with

$$E(F(\beta)) = \Big[E\Big(\frac{g'(\eta(\beta))^2}{\sigma^2(g(\eta(\beta)))} X_kX_l\Big)\Big]_{k,l},$$

where we point out that $E(F(\beta))$ is, for all $\beta \in \mathbb{R}^{S+1}$, a symmetric and strictly positive definite matrix with inverse $E(F(\beta))^{-1}$. Indeed, if $E(F(\beta))$ were not strictly positive definite, one would derive the contradiction

$$E\Big(\Big(\sum_{j=1}^{S+1} a_j X_j \frac{g'(\eta(\beta))}{\sigma(g(\eta(\beta)))}\Big)^2\Big) = 0$$

for nonzero constants $a_1, \dots, a_{S+1}$. A similar argument can be used to show that $E(\hat F(\beta))$ is strictly positive definite, where

$$E(\hat F(\beta)) = \Big[E\Big(\frac{g'(\hat\eta(\beta))^2}{\sigma^2(g(\hat\eta(\beta)))} \hat X_k\hat X_l\Big)\Big]_{k,l}.$$

In the rest of this section we assume the $X_i$ to be i.i.d. Gaussian processes with covariance function $\sigma(s,t)$ satisfying Assumption 2.1. The following additional set of assumptions is used to derive more precise theoretical statements:

Assumption 2.3.

a) There exists a constant $0 < M_\varepsilon < \infty$ such that $E(\varepsilon_i^p \mid X_i) \le M_\varepsilon$ for some even $p$ with $p \ge \max\{2/\kappa + \epsilon, 4\}$ for some $\epsilon > 0$.

b) The link function $g$ is monotone and invertible with two bounded derivatives, $|g'(\cdot)| \le c_g$ and $|g''(\cdot)| \le c_g$, for some constant $0 \le c_g < \infty$.

c) $h(\cdot) := g'(\cdot)/\sigma^2(g(\cdot))$ is a bounded function with two bounded derivatives.

Assumption 2.3 a) states that some higher moments of $\varepsilon_i$ exist. While the conditions that $p \ge 4$ and that $p$ is even merely simplify the proofs, the condition $p > 2/\kappa$ is more crucial and is used in the proof of Proposition C.1 in Appendix C. Assumptions 2.3 a), b), and c) hold, for example, in the important case of a functional logistic regression with points of impact. Assumption 2.3 c) is satisfied, for instance, in the special case of generalized linear models with natural link functions. For the latter case, we have $\sigma^2(g(x)) = g'(x)$ such that $h(x) = 1$. The boundedness conditions in b) and c) constitute a set of sufficient conditions needed to obtain our theoretical results.
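The identity $h(x) = 1$ for natural links can be checked numerically. The sketch below is an illustration only and assumes the logistic case, where $g(x) = 1/(1+e^{-x})$ and the Bernoulli variance function is $\sigma^2(\mu) = \mu(1-\mu)$; the derivative $g'$ is approximated by central differences so that the check does not presuppose the identity it verifies.

```python
import numpy as np

def g(x):
    """Logistic function, the assumed natural-link example."""
    return 1.0 / (1.0 + np.exp(-x))

def sigma2(mu):
    """Bernoulli variance function sigma^2(mu) = mu (1 - mu)."""
    return mu * (1.0 - mu)

x = np.linspace(-5.0, 5.0, 201)
step = 1e-6

# Central-difference approximation of g'(x).
g_prime_numeric = (g(x + step) - g(x - step)) / (2.0 * step)

# h(x) = g'(x) / sigma^2(g(x)) should be identically 1 for the logistic link.
h = g_prime_numeric / sigma2(g(x))
max_dev = np.max(np.abs(h - 1.0))
```

Here `max_dev` is the largest deviation of $h$ from 1 on the grid; it should be at the level of the finite-difference error.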

Theorem 2.3. Let $\hat S = S$, $\max_{r=1,\dots,S} |\hat\tau_r - \tau_r| = O_P(n^{-1/\kappa})$, and let $X_i$ be a Gaussian process satisfying Assumption 2.1. Under Assumption 2.3 we then obtain

$$\sqrt{n}\,(\hat\beta - \beta_0) \xrightarrow{d} N\big(0, (E(F(\beta_0)))^{-1}\big). \tag{2.10}$$

This result is remarkable: our estimator based on $\hat\tau_r$ enjoys the same asymptotic efficiency properties as the infeasible estimator based on the true points of impact $\tau_r$. It achieves the same asymptotic efficiency properties as in classical multivariate setups (cf. McCullagh, 1983). In practice one might then replace $E(F(\beta_0))$ by its consistent estimator $n^{-1}\hat F_n(\hat\beta)$ in order to derive approximate results. This is a direct consequence of (C.24) and (C.50) in the supplementary Appendix C.
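Replacing $E(F(\beta_0))$ in (2.10) by $n^{-1}\hat F_n(\hat\beta)$ gives the approximate covariance $\hat F_n(\hat\beta)^{-1}$ for $\hat\beta$, from which Wald-type standard errors follow. The sketch below shows this plug-in step under stated assumptions: the function name `wald_standard_errors` is hypothetical, the logistic link is only an example, and the small design matrix and the value of `beta_hat` are made up for illustration rather than fitted.

```python
import numpy as np

def wald_standard_errors(beta_hat, X_hat, g, g_prime, sigma2):
    """Approximate covariance and standard errors of beta_hat based on (2.10).

    F_hat = D^T V^{-1} D has entries sum_i g'(eta_i)^2 / sigma^2(g(eta_i)) X_ik X_il.
    Since sqrt(n)(beta_hat - beta_0) has asymptotic covariance (E F)^{-1} and
    E F is estimated by F_hat / n, the covariance of beta_hat is approx. F_hat^{-1}.
    """
    eta = X_hat @ beta_hat
    w = g_prime(eta) ** 2 / sigma2(g(eta))
    F_hat = X_hat.T @ (w[:, None] * X_hat)
    cov = np.linalg.inv(F_hat)
    return np.sqrt(np.diag(cov)), cov

def logistic(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def logistic_prime(eta):
    p = logistic(eta)
    return p * (1.0 - p)

def bernoulli_variance(mu):
    return mu * (1.0 - mu)

# Tiny made-up design: intercept plus one covariate X_i(tau_hat_1).
X_hat = np.array([[1.0, 0.0],
                  [1.0, 0.0],
                  [1.0, 1.0],
                  [1.0, 1.0]])
beta_hat = np.zeros(2)  # hypothetical value, not a fitted estimate

se, cov = wald_standard_errors(beta_hat, X_hat,
                               logistic, logistic_prime, bernoulli_variance)
```

With $\beta = 0$ every weight equals $0.25$, so $\hat F_n = 0.25\,X^TX$ and the inverse can be verified by hand, which makes this a convenient sanity check before applying the function to real estimates.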

Parameter estimation under misspecified variance functions

So far, we have considered the case where $\sigma^2(g(\eta_i(\beta)))$ is specified using a (correct) model assumption. In the following, we consider situations where only the mean function $g(\eta_i(\beta))$ can be specified, but where the functional form of $\sigma^2(\cdot)$ is unknown. By Theorem 2.2, an estimator $\tilde\beta$ for $\beta_0$ can be obtained by minimizing the squared error,

$$\tilde\beta = \arg\min_{\beta \in \mathbb{R}^{S+1}} \frac{1}{2n} \sum_{i=1}^{n} \big(y_i - g(\hat\eta_i(\beta))\big)^2.$$

The estimator $\tilde\beta$ can then be described as the solution of the score equations $\tilde U_n(\beta) = 0$, where

$$\tilde U_n(\beta) = \hat D_n(\beta)^T (Y_n - \hat\mu_n(\beta)). \tag{2.11}$$

Provided $|g'''(x)| \le M_g$, we get the following corollary by following the same arguments as in the proof of Theorem 2.3:

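Corollary 2.1. Under the assumptions of Theorem 2.3, but with Assumption 2.3 c) replaced by the assumption that $|g'''(x)| \le M_g$ for some $0 \le M_g < \infty$, we have

$$\sqrt{n}\,(\tilde\beta - \beta_0) \xrightarrow{d} N(0, A^{-1}BA^{-1}), \tag{2.12}$$

where $A = E\big(g'(\eta(\beta_0))^2\, \mathbf{X}\mathbf{X}^T\big)$ and $B = \operatorname{cov}\big(g'(\eta(\beta_0))\, \mathbf{X}\varepsilon\big) = E\big(g'(\eta(\beta_0))^2 \sigma^2(g(\eta(\beta_0)))\, \mathbf{X}\mathbf{X}^T\big)$.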

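In practice one might replace the components of the sandwich matrix in (2.12) by their estimators, i.e., replace $E\big(g'(\eta(\beta_0))^2\, \mathbf{X}\mathbf{X}^T\big)$ by $n^{-1}\sum_{i=1}^{n} g'(\hat\eta_i(\tilde\beta))^2\, \hat{\mathbf{X}}_i\hat{\mathbf{X}}_i^T$ and $\operatorname{cov}\big(g'(\eta(\beta_0))\, \mathbf{X}\varepsilon\big)$ by $n^{-1}\sum_{i=1}^{n} g'(\hat\eta_i(\tilde\beta))^2 \big(y_i - g(\hat\eta_i(\tilde\beta))\big)^2\, \hat{\mathbf{X}}_i\hat{\mathbf{X}}_i^T$, where $\hat{\mathbf{X}}_i = \mathbf{X}_i(\hat\tau)$.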

The above case corresponds to situations where $\sigma^2(g(\eta_i(\beta)))$ is incorrectly specified by $\tilde\sigma^2(g(\eta_i(\beta)))$ with $\tilde\sigma^2(g(\eta_i(\beta))) = 1$. More general misspecifications lead to similar sandwich estimators as in (2.12) provided $\tilde h(\cdot) = g'(\cdot)/\tilde\sigma^2(g(\cdot))$ is a bounded function with two bounded derivatives.
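The plug-in version of the sandwich matrix $A^{-1}BA^{-1}$ in (2.12) can be sketched as follows. This is an illustrative implementation under stated assumptions, not code from the source: the function name `sandwich_covariance` is hypothetical, the logistic mean function is only an example, and the two-observation data set and the value of `beta_tilde` are made up so that the result can be verified by hand.

```python
import numpy as np

def sandwich_covariance(beta_tilde, X_hat, y, g, g_prime):
    """Plug-in estimate of A^{-1} B A^{-1} from (2.12).

    A_hat = n^{-1} sum_i g'(eta_i)^2 X_i X_i^T
    B_hat = n^{-1} sum_i g'(eta_i)^2 (y_i - g(eta_i))^2 X_i X_i^T
    Returns the sandwich matrix and standard errors sqrt(diag(sandwich) / n).
    """
    n = X_hat.shape[0]
    eta = X_hat @ beta_tilde
    gp2 = g_prime(eta) ** 2
    resid2 = (y - g(eta)) ** 2
    A_hat = X_hat.T @ (gp2[:, None] * X_hat) / n
    B_hat = X_hat.T @ ((gp2 * resid2)[:, None] * X_hat) / n
    A_inv = np.linalg.inv(A_hat)
    sandwich = A_inv @ B_hat @ A_inv
    se = np.sqrt(np.diag(sandwich) / n)
    return sandwich, se

def logistic(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def logistic_prime(eta):
    p = logistic(eta)
    return p * (1.0 - p)

# Made-up toy data: intercept plus one covariate X_i(tau_hat_1).
X_hat = np.array([[1.0, 0.0],
                  [1.0, 1.0]])
y = np.array([1.0, 0.0])
beta_tilde = np.zeros(2)  # hypothetical value, not a fitted estimate

sandwich, se = sandwich_covariance(beta_tilde, X_hat, y,
                                   logistic, logistic_prime)
```

In this toy case all squared residuals equal $0.25$, so $\hat B = 0.25\,\hat A$ and the sandwich reduces to $0.25\,\hat A^{-1}$. With heteroscedastic residuals the two matrices no longer cancel, which is exactly the situation the sandwich form is meant to handle.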

In document Contributions to Functional Data Analysis With a Focus on Points of Impact in Functional Regression (Page 72-75)