Two-step Methods for Differential Equation Models.

(1)

ABSTRACT

TAN, QIANWEN. Two-step Methods for Diﬀerential Equation Models. (Under the direc-tion of Subhashis Ghoshal.)

In engineering, physics, biomedical sciences, pharmacokinetics and pharmacodynam-ics (PKPD) and many other fields the regression function is often specified as solution of a system of ordinary diﬀerential equations (ODEs) given by

dfθ(t)

dt =F(t,fθ(t),θ), t∈[0,1]

here F is a known appropriately smooth vector valued function. Our interest lies in estimating θ from the noisy data.

A two-step approach to solve this problem consists of the first step fitting the data nonparametrically, and the second step estimating the parameter by minimizing the distance between the nonparametrically estimated derivative and the derivative suggested by the system of ODEs. In Chapter 2 we consider a Bayesian analog of the two step approach by putting a finite random series prior on the regression function using B-spline basis. By allowing the parameters to be subject-specific, we further make inference on fixed-eﬀect parameters which characterize how elements of subject-specific parameters vary among individuals due to systematic association with individual attributes.

(2)

re-gression driven by a system of ODEs. We study the asymptotic properties of the two-step method estimator based on quantile regression of θ and establish the n−1/2 _convergence rate. In Chapter 4 we further explore the Bayesian method for nonparametric quantile regression. We first estimate the whole quantile functions nonparametrically; then for a given τ-level quantile, we apply the two-step approach to estimate the paramter θ

(3)

(4)

Two-step Methods for Diﬀerential Equation Models

by Qianwen Tan

A dissertation submitted to the Graduate Faculty of North Carolina State University

in partial fulfillment of the requirements for the Degree of

Doctor of Philosophy

Statistics

Raleigh, North Carolina

2017

APPROVED BY:

Anastasios Tsiatis Wenbin Lu

Charles Smith Subhashis Ghoshal

(5)

DEDICATION

(6)

BIOGRAPHY

(7)

ACKNOWLEDGEMENTS

I would like to express my gratitude to my advisor Dr. Subhashis Ghoshal. He provided me great guidance, motivation and encouragement throughout the years on interesting research topics. Without his expertise and illuminating guidance, I could not complete the research work. Besides research, he set a great example to us with his dedication to work, self-motivation, time management skills and perfect balance of life and work. I could not have imagined having a better advisor and mentor for my Ph.D. study.

I wish to thank my committee members, Dr. Anastasios Tsiatis, Dr. Charles Smith, Dr. Wenbin Lu, and Dr. Tao Pang for taking precious time to review my dissertation and share their helpful comments and suggestions to my research.

I am also grateful to all faculty members in the department, who oﬀered a variety of interesting and useful lectures. I also want to thank all the staﬀ members in the department, who provide a friendly and comfortable environment.

Special thanks to all my good friends who shared laughs and tears with me in all these five years.

(8)

LIST OF TABLES

Table 2.1 Coverage and average length of the Bayesian credible intervals using two-step method and confidence interval obtained from nonlinear mixed eﬀects model for Gaussian error for one-compartment model of Theo-phylline. . . 22 Table 2.2 Coverage and average length of the Bayesian credible intervals using the

two-step method and confidence interval obtained from Varah-Brunel method for Gaussian error for Bergman’s minimal model. . . 29 Table 2.3 Coverage and average length of the Bayesian credible intervals using the

two-step method and confidence interval obtained from Varah-Brunel method for the scaled t6 error for Bergman’s minimal model. . . 30 Table 3.1 Monte Carlo bias and MSE of the two-step approach estimators for

quantile regression and mean regression when gamma data with chosen shape function for Lotka-Volterra equations are considered. . . 65 Table 3.2 Monte Carlo bias and MSE of the two-step approach estimators for

quantile regression and mean regression when gamma data with chosen scale function for Lotka-Volterra equations are considered. . . 66 Table 3.3 Monte Carlo bias and MSE of the two-step approach estimators for

quantile regression and mean regression when Weibull data with chosen shape function for Lotka-Volterra equations are considered. . . 67

(11)

LIST OF FIGURES

Figure 1.1 Viral load profiles for 10 subjects from the ACTG 315 study. . . 3

Figure 2.1 Theophylline concentrations for 12 subjects following a single oral dose. 11 Figure 2.2 Glucose and Insulin concentrations repeatedly measured over 75

min-utes for 20 healthy individuals in a regular intravenous glucose toler-ance test (IVGTT). . . 13 Figure 2.3 Observed data and individual posterior predictive profiles of glucose

concentration Gθk and population predictive profiles of glucose con-centration Gγ. . . 31

Figure 2.4 Observed data and individual posterior predictive profiles of insulin concentration Iθk and population predictive profiles of insulin concen-tration Iγ. . . 32

Figure 3.1 True derivative of loss function ρand the approximated ψn function. 40

Figure 3.2 Simulation results for gamma data with chosen shape function α(t) = 10t(1−t) + 0.5. The solid black curve is the true quantile function, the dashed blue curve is the estimated quantile function recovered based on two-step quantile regression estimator and the dashed red curve is the estimated quantile function recovered based on two-step mean regression estimator. . . 45 Figure 3.3 Simulation results of gamma data with chosen scale function α(t) =

10t(1−t) + 0.5. The solid black curve is the true quantile function, the dashed blue curve is the estimated quantile function recovered based on two-step quantile regression estimator and the dashed red curve is the estimated quantile function recovered based on two-step mean regression estimator. . . 47 Figure 3.4 Simulation results of Weibull data with chosen shape function α(t) =

1.6t(1−t) + 0.8. The solid black curve is the true quantile function, the dashed blue curve is the estimated quantile function recovered based on two-step quantile regression estimator and the dashed red curve is the estimated quantile function recovered based on two-step mean regression estimator. . . 48 Figure 3.5 Observed values and the predictive profiles of fθ,1 and fθ,2 over the 10

(12)

Figure 4.1 Simulation study results for prey population: (a) True quantiles (b) Es-timated quantiles using nonparametric simultaneous quantile regres-sion method at τ = {0.05,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,0.95} for n = 500 with the data points generated from gamma distribution. 80 Figure 4.2 Simulation study results for predator population: (a) True quantiles (b)

Estimated quantiles using nonparametric simultaneous quantile regres-sion method at τ = {0.05,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,0.95} for n = 500 with the data points generated from gamma distribution. 81 Figure 4.3 Simulation study results: given τ = 0.5, true quantile function,

(13)

LIST OF SYMBOLS

((Aij)) : a matrix Awith (i, j)-th element being Ai,j. AT : transpose of the matrix A.

Ai,: the i-th row of the matrix A. A,j: the j-th column of the matrix A.

rowss

r(A) : the sub-matrix of Aconsisting of r-th to s-th rows of Awith r < s.

colss_r(A) : the sub-matrix of Aconsisting of r-th tos-th columns ofA with r < s.

xr:s: the sub-vector consisting of r-th to s-th elements of a vector x.

vec(A) : the vector obtained by stacking the columns of Aone over another.

Ip: the identity matrix of order p.

diag(A1, . . . ,Ap) : block diagonal matrix in which the diagonal elements are square

ma-trices of any size (possibly even 1×1), and the oﬀ-diagonal elements are 0. maxeig(A) : the maximum eigenvalue of A.

mineig(A) : the minimum eigenvalue of A. ∥x∥: (∑p_i₌₁x_i2)1/2, the L2 norm of the vector x. f′(t) : _dtdf(t), the derivative of the function f(·). f(r)_{(t) :} dr

dtf(t), the r-th derivative of the function f(·). f(·) : a vector valued function.

∥f∥w: (

∫1

0 ∥f(t)∥

2_w(t)dt)1/2 _{for a vector valued function}_f _{: [0,}_1]→Rp _and_w_{: [0,}_1]→

[0,∞).

f(x) : (f(x1), . . . , f(xp))T for a real-valued functionf : [0,1]→R and a vectorx∈Rp.

⟨·,·⟩: inner product.

1A(·) : indicator function of the set A.

an=o(bn) : an/bn→0 as n → ∞for numerical sequences an and bn.

(14)

an≍bn: an =O(bn) and bn =O(an).

an≲bn: an =O(bn).

an≪bn: an =o(bn).

oP(1) : a sequence of random variables which converges in probability to zero.

OP(1) : a sequence of random variables bounded in probabiliry.

E(·) the mean vector of a random vector.

(15)

Chapter 1 Introduction

Diﬀerential equations are encountered in various branches of science such as environmen-tal system, viral dynamics of infectious disease and pharmacokinetics and pharmacody-namics (PKPD). One popular example is the Lotka-Volterra equations, also known as prey-predator equations. At time t ∈ [0,1] the prey and predator populations change according to the equations

dfθ,1(t)

dt = θ1fθ,1(t)−θ2fθ,1(t)fθ,2(t), dfθ,2(t)

dt = −θ3fθ,2(t) +θ4fθ,1(t)fθ,2(t),

where θ = (θ1, θ2, θ3, θ4)T and fθ,1 and fθ,2 denote the prey and predator populations at time t respectively. Another interesting example is HIV dynamics. A set of ordinary diﬀerential equations that describes the interaction between HIV virus and human body cells is given by

dX

dt = (1−ϵ)kV T −δX, dV

(16)

which has been proven useful for understanding the pathogenesis of HIV infection and de-veloping treatment strategies. In this HIV dynamics model,X andT are size of infected and uninfected immune cell populations, V is the size of the viral population. The pa-rameters in HIV dynamics model is viral clearance ratec, infected cell death rateδ, viral production rate p, probability of infectionk, and treatment eﬃcacyϵ. These parameters characterize intra-subject mechanisms related to interaction between virus and immune system. Note that only the viral load V(t) of the system of ODEs has been measured. Figure 1.1 shows viral load-time profiles for 10 participants in AIDS Clinical Trial Group (ACTG) protocol 315 [49] following initiation of potent antoretrovial therapy. Charac-terizing mechanisms of virus-immune system interaction that leads to such patterns of viral decay (and eventual rebound for many subjects) enhances understanding of the progression of HIV disease.

These models can be put in a regression model, either mean regression or quantile regression model with mean function orτ-th quantile function denoted byfθ : [0,1]→Rd

and fθ satisfies ODE

dfθ(t)

dt =F(t,fθ(t),θ), t∈[0,1], (1.1) here F is a known appropriately smooth vector valued function and θ is a parameter vector controlling the regression function.

1.1 Literature review

(17)

(18)

1.1.1 Non-linear least squares approach

If the ODEs can be solved analytically, then the usual nonlinear least squares (NLS) technique [29, 31] can be used to estimate the unknown parameters. Thus

ˆ

θ= argmin

θ∈Θ

n

∑

i=1

∥Yi−fθ(xi)∥2,

in many contexts, such closed form solutions are not available as evidenced in some of the previous examples. NLS was modified for this purpose by Brad [1] and Domselaar and Hemker [15]. Hairer et al. [24] and Mattheij and Molenaar [32] used the 4-stage Runge-Kutta algorithm as an alternative approach. The statistical properties of the cor-responding estimator have been studied by Xue et al. [52]. The strong consistency, √ n-consistency and asymptotic normality of the estimator were established in their work.

1.1.2 Two-step approach

Varah [47] used an approach of two-step procedure. In the first step each of the state variables is approximated by a cubic spline using the least squares technique. Let us denote the approximation by ˆf(·). In the second step, the parameter is estimated as

ˆ

θ = argmin

θ∈Θ

n

∑

i=1

_fˆ′_(t

i)−F(ti,fˆ(ti),θ)

2 .

(19)

of the squared deviation, that is

ˆ

θ = argmin

θ∈Θ

∫ 1 0

_fˆ′_(t)₋_F_(t,_fˆ_(t),_θ₎2_w(t)dt

and proved √n-consistency as well as asymptotic normality of ˆθ. The order of the B-spline basis is determined by the smoothness ofF(·,·,·) with respect to its first two argu-ments. Gugushvili and Klaassen [22] used the same approach but used kernel smoothing instead of spline. They also established √n-consistency of the estimator. Another mod-ification has been made in the work of Wu et al. [50]. They used penalized smoothing spline in the first step and numerical derivatives instead of actual derivatives of the non-parametrically estimated functions. In another work Brunel et al. [7] used nonparametric approximation of the true solution to (1.1) and then used a set of orthogonality condi-tions to estimate the parameters. The √n-consistency as well as asymptotic normality of the estimator were also established in their work.

1.1.3 Bayesian estimation techniques

ODE models in the Bayesian framework were considered in the works of Gelman et al. [19], Rogers et al. [40] and Girolami [21]. They obtained an approximate likelihood by solving the ODEs numerically. Using the prior assigned on parameters, MCMC technique was used to generate samples from the posterior. This method may suﬀer from computational complexity as well.

Campbell and Steele [9] proposed the smooth function tempering approach which is a population MCMC technique and it utilizes the generalized profiling approach [36] and the parallel tempering algorithm.

(20)

Gaussian process prior on the solution of the ODE and its derivative. The posterior distribution of the solution is used to draw the posterior sample of the parameter of interest. This method is computationally expensive and the likelihood is required to be known for this approach. The theoretical aspects of Bayesian estimation methods have not been yet explored in the literature.

Bhaumik and Ghosal [3] considered the Bayesian analog of the two-step method sug-gested by Brunel [6], putting a prior on the coeﬃcients of the B-spline basis functions and induced a posterior on parameter space. Bhaumik and Ghosal [3] further developed an eﬃcient two-step method Bayesian estiamation by directly considering the distance between the function in the nonparametric model and that obtained from RK4 method. They established a Bernstein-von Mises theorem for the posterior distribution of param-eters with n−1/2 _{contraction rate.}

1.1.4 Quantile regression

(21)

Yu and Moyeed [55] first introduced the quantile regression using Bayesian methods. Later Kottas and Gelfand [27], Gelfand and Kottas [18], Geraci Bottai [20] and Kottas and Krnjajic [28] proposed a few methods on generalization and extension of single level quantile regression under diﬀerent possible scenarios.

The main disadvantage of considering separate quantile regression using single level quantile regression is that the natural ordering among diﬀerent quantiles can not be en-sured. Addressing the non-crossing issue, He [25] proposed a quantile regression method assuming the response variable to be heteroskedastic. Neocleous and Portnoy [33] pro-posed a method to estimate the quantile curve using linear interpolation from an esti-mated gird of quantile curves. Takeuchi [43] and Takeuchi et al [44] proposed non-crossing quantile regression methods using support vector machine (SVM) [46]. Later Shim et al. [42] used doubly penalized kernel machine (DPKM) for estimating non-crossing quantile curves.

Dunson and Taylor [16] and Liu and Wu [30] proposed quantile regression methods for a grid of quantiles addressing the monotonicity constraint. Later Wu and Liu [51], Reich [37], Reich et al. [38], Reich and Smith [39] proposed linear quantile regression methods addressing the non-crossing issues. Recently, Tokdar and Kadane [45] and Das and Ghosal [13] proposed simultaneous linear quantile regression methods for univariate explanatory variables. Yang and Tokdar [53] extended that simultaneous linear quantile regression method to handle multivariate predictor case.

(22)

(23)

Chapter 2 Two-step Bayesian inference for

diﬀerential equation models on

longitudinal data

2.1 Introduction

(24)

oral dose D(mg/kg) of the anti-asthmatic agent theophylline, and blood samples drawn at several times following administration were assayed for theophylline concentration. As ordinarily observed in this context, the concentration profiles have a similar shape for all subjects; however peak concentration achieved, rise and decay vary substantially. These diﬀerences are believed to be attributable to inter-subject variation in the un-derlying pharmacokinetic process, understanding which is critical for developing dosing guidelines. To characterize these processes formally, it is a routine practice to represent the body by a simple compartment model [41]. For theophylline, the pharmacokinetics is modeled using a one-compartment model shown in (2.1) with first order absorption and elimination

dfθ,1(t)

dt =−kafθ,1(t), dfθ,2(t)

dt =kafθ,1(t)−kefθ,2(t),

(2.1)

where the initial status are fθ,1(0) =D and fθ,2(0) = 0. There are two diﬀerential

equa-tions in the one-compartment model. Since it is only theophylline serum concentration C(t) which is measured, we do not have measurements for compartment 1 while that of compartment 2, C(t) is equal to fθ,2(t) divided by the serum volume V = Cl/ke. The parameters in the one-compartment model are the first-order absorption rateka, the

first-order elimination rateke, and the clearance rateCl. For theophylline, a one-compartment

model is standard and has solution of the corresponding diﬀerential equations which yields

C(t) = Dkake Cl(ka−ke)

{

exp(−ket)−exp(−kat)

}

. (2.2)

(25)

disap-Figure 2.1: Theophylline concentrations for 12 subjects following a single oral dose.

(26)

muscles, liver and tissue is raised by the remote insulin in action. This lowers the glu-cose concentration in plasma, implying the β-cells to secrete less insulin, from which a feedback eﬀect arises. This integrated glucose-insulin system can be described by the following non-linearly coupled system of diﬀerential equations [17]

dG(t)

dt =−p1(G(t)−Gb)−X(t)G(t), G(0) =G0, dX(t)

dt =−p2X(t) +p3(I(t)−Ib), X(0) = 0, dI(t)

dt =−p4(I(t)−Ib) +p5(G(t)−p6)t, I(0) =I0,

(2.3)

where t = 0 is the glucose injection time. Diabetes is associated with a large number of abnormalities in insulin metabolism, ranging from an absolute deficiency to a combination of deficiency and resistance, causing inability to dispose glucose from the blood stream. Three factors, referred to as the metabolic portrait [34] and playing important roles for glucose disposal, are glucose eﬀectiveness SG = p1, insulin sensitivity SI = p3/p2, and pancreatic responsiveness ϕ1 =

Imax−Ib

p4(G0−Gb)

and ϕ2 =p5×104.

(27)

Figure 2.2: Glucose and Insulin concentrations repeatedly measured over 75 minutes for 20 healthy individuals in a regular intravenous glucose tolerance test (IVGTT).

individuals are needed so that estimation of the subject-specific parametersθk is feasible and the large sample approximation to the distribution of ˆθk may be applied. In this

chapter, we consider the case when number of repeated measurements n is suficiently large to support fitting the individual model separately by individual to obtain reason-able subject-specific estimators for θk. In the first stage, we consider the individual-level

model. For each individual k, 1 ≤ k ≤ l, we carried out a Bayesian analog of two-step appoach of Brunel [6] fitting a nonparametric regression model using the B-spline basis. We assign priors on the coeﬃcients of the basis function. A posterior is then induced on

θk using the posterior of the coeﬃcients of the basis functions. In the second stage, we

consider a population model that we fit the subject-specific parameters θk by individual

(28)

parameters and individual characteristics.

The chapter is organized as follows. Section 2.2 contains the model assumptions and prior specifications. The main results are given in Section 2.3. In Section 2.4 we carry out simulation studies under diﬀerent settings. We analyze a real life data in Section 2.5. Proofs of the main theorems are given in Section 2.6.

2.2 Model assumptions and prior specifications

We have a system of d ordinary diﬀerential equations (ODEs) given by

dfθ,j(t)

dt =Fj(t,fθ(t),θ), t∈[0,1], j = 1, . . . , d, (2.4)

where fθ(·) = (fθ,1(·), . . . , fθ,d(·))T and θ ∈ Θ, a compact subset of Rp. Let us denote F(·,·,·) = (F1(·,·,·), . . . , Fd(·,·,·))T. We also assume that for a fixed θ, F ∈ Cm−1((0,

1),Rd) for some integerm >1. Then, by successive diﬀerentiation of the right hand side of (2.4), it follows that fθ ∈Cm((0,1)). By the implied uniform continuity, the function and its several derivatives uniquely extend to continuous functions on [0,1].

For each individual, we consider an n×d matrix of observations ofYk for 1≤k ≤l

with Yij denoting the measurement taken on the j-th response at the time point ti,

0 ≤ ti ≤ 1, i = 1, . . . , n and j = 1, . . . , d. For the k-th individual, denoting εk =

((εijk))1≤i≤n,1≤j≤d as the corresponding error matrix. The proposed model is given by

(29)

while the data is generated by the model

Yijk =fk,j0 (ti) +εijk, i= 1, . . . , n, j = 1, . . . , d, k = 1, . . . , d,

where f_k0(·) = (f_k,0₁(·), . . . , f_k,d0 (·))T denotes the true mean vector for the k-th individual which does not necessarily belong to the {fθ : θ ∈ Θ}. We also denote the true mean

matrix by f0 _{= ((f}0

k,j))1≤k≤l,1≤j≤d. We assume that f0 ∈Cm([0,1]). Let εijk iid

∼P0, which is a probability distribution with mean zero and finite varianceσ2

0 fori= 1, . . . , n,j = 1, . . . , dand k= 1, . . . , l.

In the first stage we consider the individual-level model focusing on the subject-specific parameter θk for k = 1, . . . , l. Since the expression for fθ is usually not available, the

proposed model is embedded in the nonparametric regression model fork-th individual

Yk =XnBn,k+εk, (2.5)

where Xn = ((Ns(ti)))1≤i≤n,1≤s≤Jn, {Ns(·)}

Jn

s=1 being the B-spline basis functions with orderm and kn−1 interior knots with increasing dimension Jn =kn+m−1. We choose

interior knots 0< ξ1 < ξ2 <· · ·< ξkn−1 <1 to satify the pseudo-uniformity criterion:

max

1≤i≤kn−1|ξi+1−2ξi+ξi+1|=o(k −1

n ),

max 1≤i≤kn−1|

ξi−ξi−1|

min

1≤i≤kn−1|ξi−ξi−1| ≤M,

(2.6)

for some constant M. Here ξ0 and ξkn are defined as 0 and 1 respectively. The criterion (2.6) is required to apply the asymptotic results obtained in Zhou et al. [56] where they mention the similar criterion in equation (3) of that paper. Here we denoteBn,k =

(

βJn×1

(30)

. . . ,βJn×1

k,d

)

the matrix containing the coeﬃcients of the basis functions for d responses. We considerP0 to be unknown and useN(0, σk2) as the working distribution for the error

of the k-th individual where σk should be treated as another set of unknown parameters

for k = 1, . . . , l. Denoting by Qn, the empirical distribution funtion of ti, i = 1, . . . , n,

we assume that for some probability measure Q on [0,1] with positive and continuous density

sup

t∈[0,1]

|Qn(t)−Q(t)|=o(kn−1). (2.7)

For each individual k, we put an inverse gamma prior on σ2

k which is given by

σ_k2 ∼InvGamma(a0k, b0k).

Conditional on σ2

k, let the prior distribution on the coeﬃcients be given by

βk,j|σ2k∼ N

(

0, n2k_n−1σ_k2IJn

)

.

Simple calculation yields the posterior distribution for βk,j as

βk,j|(Yk),j ∼ N

((

X_nTXn+

knIJn n2

)₋1

X_nT(Yk),j, σk2

(

X_nTXn+

knIJn n2

)−1)

(2.8)

and the posterior distributions of βk,j and βk,j′ are mutually independent for j ̸= j′; j, j′ = 1, . . . , d. In the model of (2.5), the expected response vector at a point t ∈[0,1] is given by BT

n,kN(t), where N(·) = (N1(·), . . . , NJn(·))

T_.

(31)

(0,1). We define

Rf(η) =

{ ∫ 1 0

∥f′(t)−F(t,f(t),η)∥2w(t)dt

}1/2

ψ(f) = argmin

η∈Θ

Rf(η).

It is easy to check that ψ(fη) =η for all η∈Θ. Thus the map ψextends the definition

of the parameter θk beyond the model. Let us define θ0,k = ψ(fk0). Thus θ0,k describes

the projection of the true regression function on the parametric model. We assume that

θ0,k lies in the interior of Θ. A posterior of θk is induced on Θthrough the mapping ψ

acting on f(·) =BT

n,kN(·) and the posterior of Bn,k given by (2.8).

Remark 2.1. Note thatf0(·) need not be a solution of the ODE. In real life it is almost impossible to accurately describe a data generating mechanism in terms of a mathemati-cal model. ODE is a useful tool to model many dynamic systems within a margin of error. This justifies the study of misspecified regression function in the context of ODE model. Often we are more interested in inferring on the parameters rather than the regression function described by the ODE model. Then the role of the true parameter is played by the parameter value which brings the ODE model closest to the true regression function. The definition of θ0 reinforces this intuition.

At the second stage, we consider the population-level model

θk =d(xk,γ,ηk) (2.9)

where d is a p-dimensional function depending on an r ×1 vector of fixed eﬀects γ,

ηk is a p×1 vector of random eﬀects associated with individual k, and xk is vector

(32)

θk vary among individuals, due to systematic association with individual attributes in xk and to unexplained variations in the population of individuals, for example, natural,

biological variation, represented by ηk. Sometimes we are interested in certain function h(·) of θk instead of θk itself. In this case, population model can be easily modified to

be h(θk) = d(xk,γ,ηk). A common special case of (2.9) is that of a linear relationship

between θk and fixed and random eﬀects as in usual, i.e., θk = Xkγ +ηk where Xk is

a design matrix depending on elements of xk. Define θ = (θT1, . . . ,θTl )T and X = (X1T, . . . ,X_lT)T to be the design matrix for the fixed eﬀect. Then the second step of two-step procedure is given by the optimization as follows:

argmin

γ,η1,...,ηl

l

∑

k=1

∫ 1 0

∥f_β′_k(t)−F(t,fβ_k(t),Xkγ+ηk)∥2w(t)dt.

Motivated by the linear population model, the induced posterior of fixed eﬀect γ is obtained as (XT_X₎−1_XT_θ _{and the induced posterior of random eﬀects} _η

k = θk − Xk(XTX)−1XTθ.

2.3 Main results

Our objective is to study the asymptotic behavior of the posterior distribution of√n(γ− γ0). The asymptotic representation of

√

n(γ −γ0) is given by the next theorem under the assumption that for all ϵ > 0, inf

η:∥η−θ0,k∥≥ϵRf 0

k(η) > Rf

0

k(θ0,k) for k = 1, . . . , l. We denote Dl,r,sF(t,f,θ) = ∂l+r+s/∂θs∂fr∂tlF(t,f(t),θ). Let the matrix

Jθ0,k =

∫ 1 0

[

(D0,0,1F(t,fk0(t),θ0,k))TD0,0,1F(t,fk0(t),θ0,k)−D0,0,1Sk(t,fk0(t),θ0,k)

]

(33)

be nonsingular, whereSk(t,f(t),θ) = (D0,0,1(F(t,f(t),θ)))T((fk0(t))′−F(t,fk0(t),θ0,k)).

Letm ≥5 and n1/2m _≪_k

n≪n1/8. Also define

Γk(z) =

∫ 1 0

([

D0,1,0Sk(t,fk0(t),θ0,k)−(D0,0,1F(t,fk0(t),θ0,k))TD0,1,0F(t,fk0(t),θ0,k)

]

w(t)

−d

dt[(D0,0,1F(t,f 0

k(t),θ0,k))Tw(t)

)

z(t)dt.

LetAk(t) stands for thep×d matrix

J_θ0−1

,k

([

D0,0,1Sk(t,fk0(t),θ0,k)−(D0,0,1F(t,fk0(t),θ0,k))TD0,0,1F(t,fk0(t),θ0,k)

]

w(t)

−d dt

[

(D0,0,1F(t,fk0(t),θ0,k))Tw(t)

])

.

Then we have, with (Ak),j standing for thejth column ofAk(t),

J_θ0−1

,kΓk(fk) =

d

∑

j=1

∫ 1 0

(Ak),j(t)NT(t)βk,jdt= d

∑

j=1

GT_k,jβk,j, (2.10)

where GT k,j =

∫1

0(Ak),j(t)N

T_(t)dt _{which is a} _p_×_J

n matrix for j = 1, . . . , d and k = 1,

. . . , l.

Theorem 2.1. Define

µk =

√ n

d

∑

j=1

GT_k,j(X_nTXn)−1XnT(Yk),j−

√ nJ_θ−1

0,kΓ(f 0

k),

Σk = n d

∑

j=1

Gk,j(X_nTXn)−1GT_k,j

(34)

. . . , l. Define µ= (µT

1, . . . ,µTk)T. Let

µγ = (XTX)−1XTµ,

Σγ = (XTX)−1XT diag(Σ1, . . . ,Σl)X(XTX)−1.

(2.11)

IfHk,jis nonsingular for allj = 1, . . . , d,k = 1, . . . , l, andD0,2,1F(t,y,θ)andD0,0,2F(t,

y,θ)are continuous in their arguments, then on a set in the sample space with high true

probability, then the posterior distribution of √n(γ−γ0) assigns most of its mass inside

a large compact set, which implies that the posterior distribution of γ − γ0 contracts

at 0 at the rate of n−1/2 with high probability under the truth. Moreover, the posterior

distribution of √n(γ−γ0) is approximated by a normal distribution as follows

∥Π(√n(γ−γ0)∈ ·|Y)− N(µγ, σ02Σγ)∥TV =oP0(1). (2.12)

2.4 Simulation study

2.4.1 Theophylline

We consider the one-compartment model shown in (2.1) to study the posterior distri-bution of γ. This set of ODEs describes the pharmacokinetics process of absorption, distribution and elimination governing concentration of drug theophylline achieved. We consider the case that the true regression function is the solution of the ODEs. We are interested in three parameters, the first-order absorption rate ka, the first-order

elimi-nation rate ke and the clearance rate Cl. For the population model, we reparameterize

(35)

model is assumed to be given by

k∗_a,k=γ1+η1k, k∗e,k =γ2 +η2k, Cl∗k=γ3+η3k

is a linear population model. This model enforces positivity of the pharmacokinetics parameters for each k. Understanding the typical values of the parameters in f, how they vary across individuals in the population, and whether some of this variation is associated with individual characteristics may be addressed through inference on the parameters γ. The components of γ = (γ1, γ2, γ3)T describe both the typical values and the strength of systematic relationships between elements of subject-specific parameters

θk.

We consider l = 12 individuals in the simulation study. For each individual and sample size n, the ti’s are observed time points, which are chosen to be equally-spaced

(36)

Table 2.1: Coverage and average length of the Bayesian credible intervals using two-step method and confidence interval obtained from nonlinear mixed eﬀects model for Gaussian error for one-compartment model of Theophylline.

n Two-step Bayesian NLMM

Coverage (se) Length (se) Coverage (se) Length (se) 50 ka

89.7 (0.05) 1.322 (0.310) 91.3 (0.04) 0.773 (0.185) ke 90.1 (0.06) 0.514 (0.109) 91.8 (0.02) 0.199 (0.047) Cl 89.7 (0.05) 0.679 (0.154) 91.5 (0.04) 0.234 (0.045) 100 ka

91.7 (0.03) 0.871 (0.238) 95.3 (0.02) 0.612 (0.167) ke 92.1 (0.03) 0.264 (0.063) 95.0 (0.02) 0.134 (0.043) Cl 90.6 (0.03) 0.542 (0.092) 92.5 (0.02) 0.187 (0.044) 500 ka

95.3 (0.01) 0.608 (0.102) 97.8 (0.01) 0.406 (0.081) ke 96.3 (0.01) 0.147 (0.054) 97.4 (0.01) 0.099 (0.023) Cl 95.1 (0.01) 0.153 (0.041) 96.7 (0.01) 0.108 (0.033)

we have p = 3, d = 1 and the ODEs are given by (2.1) for t ∈ [0,25] with initial conditions fθ,1(0) = 4.02 and fθ,2(0) = 0. Then take γ0 = (0.466,−2.455,−3.228)T. We generate the data from Yijk = fθ0,k,j(ti) +εijk, where fθ0,k,j(·) representing the j-th compartment mean function, satisfies the corresponding ordinary diﬀerential equation with true subject-specific parameters being θ0,k.

The true distribution of error εijk among the repeated measurements is taken to be

(37)

σ2

k, we put Gaussian priors on the coeﬃcients βk|σ2k with mean vector 0 and dispersion

matrixn2_k

nσk2IJn for two-step Bayesian method. We choosekn−1 equally-spaced interior knots. As far as choosingkn is concerned, we take m= 3 and kn = 10,15,20 for n = 50,

100 and 500 respectively guided by Theorem 2.1 and cross validation. The simulation results are summarized in the Tables 2.1. Not surprisingly the NLMM based confidence intervals obtained from the NLS method are shorter and with an higher coverage rate. When there is an explicit solution to the ODEs system, we definitely will choose the NLS method over the two-step method. Nevertheless the two-step method gives the coverage even if it is not as eﬃcient as the NLS method when solution is explicit.

2.4.2 Bergman’s minimal model

We consider Bergman’s mininal model of glucose disappearance shown in (2.3) to study the posterior distribution of γ. We consider the case that the true regression function is the solution of the ODE. We are interested in the following three factors, glucose eﬀectiveness SG = p1, insulin sensitivity SI =p3/p2 and pancreatic responsivenessϕ1 =

Imax−Ib

p4(G0−Gb)

and ϕ2 = p5 ×104. Consider these factors as functons h(·) of the original parameters θk. For the population model, a single parameter is associated with each

fixed eﬀect, with independent random eﬀect, i.e. h(θk) = γ +ηk for k = 1, . . . , l. We consider l = 20 individuals in the simulation study. For each individual and sample size n, the ti’s are observed time point, which are chosen to be equally-spaced and spread

(38)

the coverage and the average length of the corresponding credible interval over these 500 replicates. The estimated standard error of the interval length and coverage are given inside the parentheses in the tables. Our method is abbreviated as “Two-step Bayesian” in the tables. We also consider 500 replications to construct the 95% equal tailed confidence interval based on asymptotic normality as obtained from the estimation method introduced by Varah [47] and modified and studied by Brunel [6]. We abbreviated this method as “Varah-Brunel” in the tables. The estimated standard errors of the interval length and coverage are given inside the parentheses in the tables.

In this example, not all states of the system of ODEs have been measured; the remote insulin compartment, represented byX(t) has not been measured. In this case, we replace X(t) by the solution of the diﬀerential equation given by

X(t) = p3e−p2t

∫ t

0

ep2sI(s)ds−p3Ibe−p2t

(_ep2t₋₁

p2

)

.

Note that the optimization step for the third compartmentI(t) is linear in the parameters if we reparameterizep∗₆ =p5p6. Hence the optimization step is equivalent to obtaining an ordinary least square solution. For the first compartment, after we replace X(t) by the above equation, it is linear in the parameters p1 and p3. We first treat p2 as known and solve for p1 and p3 as functions of p2 which we optimize using standard tools.

(39)

diﬀerential equation with true subject-specific parameters being θ0,k.

The true distribution of errorεijk in the repeated measurements is taken eitherN(0,

52_{) or a scaled} _{t-distribution with 6 degrees of freedom, where scaling is done in order} to make the standard deviation 5, where i = 1, . . . , n, j = 1, . . . , d and k = 1, . . . , l. We put an inverse gamma prior on σ2

k with shape and scale parameters being 100 and

0.01 respectively. Conditional on σ_k2, we put Gaussian priors on the coeﬃcients βk,1|σk2

and βk,2|σk2 with mean vector 0 and dispersion matrix n2knσk2IJn for two-step Bayesian method. We choosekn−1 equally-spaced interior knots. We choose m= 6 and kn = 16,

17,20 for n = 50,100 and 500 respectively. The simulation results are summarized in the Tables 2.2 and 2.3. Not surprisingly asymptotic normality based confidence intervals obtained from the “Varah-Brunel” method are shorter but too optimistic, failing to give adequate coverage for finite sample sizes since the delta method is known underestimate variation.

2.5 Real data Analysis

(40)

in a 30% solution) is administered intravenously over a 60 seconds period to 20 healthy overnight-fasted subjects, and subsequently the glucose and insulin concentrations in plasma are repeatedly measured at 0, 2, 4, 6, 8, 10, 20, 30, 40, 50, 60 and 75 minutes. The individual glucose and insulin profiles are shown in Figure 2.2. We have two main interests in this real data analysis: (i) recover the true individual profiles of glucose and insulin concentrations using our two-step Bayesian approach; (ii) study the impact of body mass index (BMI) and gender to insulin sensitivity for healthy subjects.

We use B-spline basis of order 7 with kn= 2. We put an inverse gamma prior on σ2

with shape and scale parameters 100 and 0.01 respectively and usew(t) =t0.5₍₁₋_t)0.5_,_t_∈ [0,1]. Conditional onσ2

k we put a Gaussian prior onβwith mean vector0and dispersion

matrix n2_k

nσ2kIJn. Samples of size 500 are drawn from the posterior distributions of subject-specific parametersθ = (p1, p2, p3, n, γ, h, G0, I0)T. For the population model, we introduce two subject covariates which are body mass index (BMI) as x1 and gender as x2 (x2 = 1 for female andx2 = 0 for male), hence the population model can be written as θk =γ0+γ1x1k+γ2x2k+ηk fork = 1, . . . , l. We recover the individual profiles Gθk(·) and Iθk(·) by using the posterior of subject-specific parameters θk. We set the random eﬀect to be 0 to obtain population profilesGγ(·) andIγ(·). The individual posterior predictive

profiles of Gθk(·) and Iθk(·) are superimposed on the data in Figure 2.3 and 2.4 by solid red curve; and the population posterior predictive profiles of Gγ(·) and Iγ(·) are

(41)

12, 13 and 18; as for other subjects, the glucose concentrations decays at a stable rate from 0 minute to 75 minutes, i.e., subject 1, 8, 9, 10 and 16. Also, we note that for subject 14, 16 and 17 the individual profile is quite diﬀerent from the population profile, it can be interpreted as that the variations among the subject-specific paramters are not fully explained by BMI and gender. There are some unexpained variations remain which are attributible to random eﬀects. This phenomenon is also shown in Figure 2.4 for subject 2, 7, 8, 9 and 17.

(42)

IVGTT study.

2.6 Proofs

Proof of Theorem 2.1. Following Bhaumik and Ghosal [2], we know that the posterior

distribution of each subject-specific parameter θk for k = 1, . . . , l,

√

n(θk −θ0,k) is

ap-proximated by a normal distribution with mean vector µk and dispersion matrix σ02Σk.

This distribution matches with the frequentist distribution of the estimator in Brunel [6]. Note thatγ = (XT_X₎−1_XT_θ_{, is a linear function of}_θ _where_θ _{= (}_θT

1, . . . ,θlT)T. So it is

straightforward to state that the posterior distribution of√n(γ−γ0) is approximated by a normal distribution with mean vector µγ and dispersion matrix σ02Σγ given by (2.11).

Since the asymptotic mean µk of the posterior distribution of each θk matches with the centerized and normalized distribution of the estimator in Brunel [6], it follows that

µk is stochastically bounded. Hence µγ is stochastically bounded as the dimension of

design matrix X is pl×r and fixed.

In the proof of Lemma 4 of Bhaumik and Ghosal [2], they have already shown that the eigenvalues of Gk,j(XnTXn)−1Gk,j are of the order of n−1. The eigenvalues of asymptotic

(43)

Table 2.2: Coverage and average length of the Bayesian credible intervals using the two-step method and confidence interval obtained from Varah-Brunel method for Gaussian error for Bergman’s minimal model.

n Two-step Bayesian Varah-Brunel

Coverage (se) Length (se) Coverage (se) Length (se)

50 p1 89.1

(0.04) 0.023 (0.011) 83.5 (0.05) 0.018 (0.008) SI 95.3 (0.02) 0.037 (0.009) 85.4 (0.05) 0.021 (0.010) ϕ1 90.7 (0.03) 8.863 (1.865) 84.1 (0.05) 7.713 (1.192) ϕ2 93.3 (0.03) 8.671 (2.185) 85.3 (0.05) 6.933 (1.412) 100 p1

91.0 (0.02) 0.015 (0.008) 86.2 (0.03) 0.013 (0.005) SI 95.3 (0.02) 0.017 (0.009) 89.7 (0.03) 0.015 (0.007) ϕ1 91.3 (0.02) 8.524 (1.776) 86.9 (0.03) 7.299 (1.155) ϕ2 95.3 (0.02) 7.832 (1.698) 88.2 (0.03) 6.512 (1.203) 500 p1

(44)

Table 2.3: Coverage and average length of the Bayesian credible intervals using the two-step method and confidence interval obtained from Varah-Brunel method for the scaled t6 error for Bergman’s minimal model.

n Two-step Bayesian Varah-Brunel

Coverage (se) Length (se) Coverage (se) Length (se)

50 p1 87.3

(0.04) 0.022 (0.010) 82.1 (0.05) 0.020 (0.007) SI 93.2 (0.02) 0.041 (0.009) 83.8 (0.06) 0.032 (0.010) ϕ1 89.1 (0.04) 8.741 (1.565) 82.4 (0.05) 7.013 (0.792) ϕ2 91.5 (0.03) 8.817 (1.815) 84.6 (0.05) 7.339 (1.542) 100 p1

89.0 (0.02) 0.018 (0.008) 84.7 (0.03) 0.012 (0.005) SI 95.1 (0.02) 0.026 (0.009) 87.6 (0.03) 0.014 (0.007) ϕ1 90.6 (0.02) 8.542 (1.676) 85.5 (0.03) 6.992 (1.034) ϕ2 94.3 (0.02) 7.522 (1.843) 87.6 (0.03) 6.821 (1.230) 500 p1

(45)

(46)

(47)

Chapter 3 Two-step approach for quantile

regression driven by diﬀerential

equations

3.1 Introduction

(48)

functions. Koenker and Bassett [26] discussed advantages of quantile regression in some non-Gaussian settings.

In this chapter we consider a two step approach for quantile regression driven by a system of ODEs. For a fixed 0 < τ < 1, let the τth quantile of the response variable Y be denoted by Qτ(Y|t). The ODE model assumes that Qτ(Y|t) is described by a set

of ODEs in t containing a set of unknown parameters. In the two-step approach, we first estimate the quantile function and its derivatives nonparametrically using B-spline basis expansion. We then minimize the distance between the nonparametric estimate of the derivative and that obtained from the ODE with the nonparametric estimate of the quantile function plugged in the equations defining the model. We study the asymp-totic properties of the two-step approach estimator based on quantile regression of θ

(49)

error needs to be accounted in the estimates. Secondly, their approach is based on a suﬃciently smooth criterion function, but the criterion function for the quantile function is even discontinuous. We therefore modify our approach using a sequence of smooth cri-terion functions approximating the one for the quantile function. Thus two modifications of the results of Yohai and Maronna [54] are needed: allowing a sequence of criterion functions instead of a fixed one and letting the model to be slightly misspecified in that the true distribution does not belong to the model but the approximation error decays to zero suﬃciently fast. Both modification requires carefully dealing with the terms arising due to the approximations to the criterion function and the truth.

The chapter is organized as follows. Section 3.2 contains the model assumptions. The main results are given in Section 3.3. In Section 3.4 we carry out simulation studies under diﬀerent settings. We analyze a real life data in Section 3.5. Proofs of the main theorems are given in Section 3.6 and those of some auxiliary lemmas in Section 3.7.

3.2 Model assumptions

Consider a matrix of observations Y with Yij as entries denoting the ith measurement

taken on thejth response variable at the time point ti where i= 1, . . . , nand j = 1, . . . ,

d. The observations can be represented by Y = ((Yij))1≤i≤n,1≤j≤d = (Y1(n×1), . . . ,Y

n×1

d ).

The unknown quantile function Qτ(Y|t) = (Qτ(Yj|t) : j = 1, . . . , d) of observations of Y as a function oft with a set of unknown parameters θ is denoted by fθ(t) where the

dynamics of the unknown function fθ(t) can be described by the following system of

ordinary diﬀerential equations

dfθ(t)

(50)

where fθ(·) = (fθ,1(·), . . . , fθ,d(·))T, F(·,·,·) = (F1(·,·,·), . . . , Fd(·,·,·))Tand θ ∈ Θ. We

assume that Θis a compact subset of Rp_.

The first step of the two-step approach is to fit the nonparametric quantile regression model at the given quantile levelτ. Here we use B-spline basis functions to fit the quantile regression Qτ(Y|t) =XnBn where Qτ(Y|t) is theτth quantile function of observations Y evaluated at the given time pointst = (t1, . . . , tn)T; Xn = ((Nl(ti)))1≤i≤n,1≤l≤Jn = (x1, . . . ,xn)T ∈Rn×Jn,{Nl(·)}Jl=1n being the B-spline basis functions with order mand kn−1

interior knots with Jn=m+kn−1, andxi being theith row vector of matrix Xn with

increasing dimension Jn for i = 1, . . . , n, and Bn =

(

βJn×1

1 , . . . ,β

Jn×1

d

)

is the matrix containing the coeﬃcients of the basis functions for d responses. The loss function of quantile regression for a given τ using the B-spline basis expansion can be written as

ρ

(Y −XnBn) whereρ(u) =ρτ(u) = τ u−1{u≤0}u.

In the first step, we carry out the nonparametric estimation of Qτ(Y|t) using the

B-spline basis expansion. We minimize the loss function

ρ

(Y−XnBn) to obtain the quantile

regression coeﬃcient Bnb =

( b

βJ₁n×1, . . . ,βb_dJn×1

)

. In the second step, we estimate the parameterθby minimizing a weighted distance between the nonparametrically estimated derivative and the derivative suggested by the system of ODEs. Letw(·) be a continuous weight function withw(0) =w(1) = 0 and be positive on (0,1). We define

Rf(η) =

{ ∫ 1 0

∥f′(t)−F(t,f(t),η)∥2w(t)dt

}1/2

ψ(f) = argmin

η∈Θ

Rf(η). (3.2)

It is easy to check that ψ(fη) = η. Thus the map ψ extends the definition of the parameter θ beyond the model. Let us define f0 = (f0,1, . . . , f0,d)T the true quantile

(51)

θ0 describes the projection of the true regression function on the parametric model. We assume that θ0 lies in the interior of Θ. From now on, we shall write θ for ψ(f) and treat it as the parameter of interest. The estimator of θ is induced on Θ through the mapping ψ acting on f(·) = BbT

nN(·).

3.3 Main results

Our objective is to study the asymptotic behavior of the two-step method estimator and obtain an asymptotic representation of √n( ˆθ−θ0). We make the assumption that

for all ϵ >0, inf

η:∥η−θ0∥≥ϵRf0(η)> Rf0(θ0). (3.3)

We denote mixed derivatives as Dl,r,sF(t,f,θ) = ∂l+r+s/∂θs∂fr∂tlF(t,f,θ). The

re-sponse variables Yj’s are considered independent, and hence βˆj are mutually

indepen-dent forj = 1, . . . , d. This allows treating regression for each response variable separately. Thus we can assume that d= 1 in Theorem 1 for the sake of simplicity in notation and write ˆf(·), f0(·), F(·,·,·), βˆ instead of fˆ(·), f0(·), F(·,·,·) and Bnb respectively. Exten-sion to d-dimensional case is straightforward as shown in Remark 4 after the statement of Theorem 1.

Remark 3.1. Condition (3.3) implies that θ0 is the unique point of minimum of Rf0(·)

and θ0 should be a well-separated point of minimum.

Theorem 3.1. Let the matrix

Jθ0 =

∫ 1 0

[

(D0,0,1F(t, f0(t),θ0))TD0,0,1F(t, f0(t),θ0)−D0,0,1S(t, f0(t),θ0)

]

(52)

be nonsingular, where S(t, f(t),θ) = (D0,0,1(F(t, f(t),θ)))T(f0′(t)−F(t, f0(t),θ0)). Let m ≥ 5 and n1/2m _≪ _J

n ≪ n1/8. If D0,2,1F(t, y,θ) and D0,0,2F(t, y,θ) are continuous in

their arguments, then under the assumption (3.3), we have

∥√n(θˆ−θ0)−Jθ0−1

√

n(Γ( ˆf)−Γ(f0))∥ →P0 0 (3.4)

as n → ∞, where

Γ(z) =

∫ 1 0

([

D0,1,0S(t, f0(t),θ0)−(D0,0,1F(t, f0(t),θ0))TD0,1,0F(t, f0(t),θ0)

]

w(t)

−d

dt[(D0,0,1F(t, f0(t),θ0))

T_w(t))_z(t)dt.

Remark 3.2. In Lemma 3.3, we show thatΓ( ˆf)−Γ(f0) = Op(n−1/2). Henceθˆconverges θ0 at the rate n−1/2 in probability under the truth.

Remark 3.3. We note that fifth order smoothness of the true mean function is suﬃcient to ensure that the convergence rate is n−1/2_{. We do not gain by assuming a higher order} of smoothness. For m = 5, the required condition becomes n1/10 ≪ Jn ≪ n1/8. Also,

deterministically chosen equally spaced knots are suﬃcient to derive the parametric rate.

Remark 3.4. When the response is ad-dimensional vector, (3.4) holds with the scalars being replaced by the correspondingd-dimensional vectors. Let A(t) stands for the p×d matrix

J_θ0−1

([

D0,0,1S(t,f0(t),θ0)−(D0,0,1F(t,f0(t),θ0))TD0,0,1F(t,f0(t),θ0)

]

w(t)

−d dt

[

(D0,0,1F(t,f0(t),θ0))Tw(t)

])

(53)

Then we have, with A,j standing for thejth column of A(t),

J_θ0−1Γ(fβ) = d

∑

j=1

∫ 1 0

A,j(t)NT(t)βjdt= d

∑

j=1

GT_n,jβj, (3.5)

where GT_n,j =∫₀1A,j(t)NT(t)dt which is ap×Jn matrix for j = 1, . . . , d. Then in order

to approximate the distribution of ˆθ, it suﬃces to study the asymptotic distribution of the linear combination of ˆβj given by (3.5).

Theorem 3.2. Let τ2

n = EFψn2/(EFψn′)2 and define Σn = n

∑d j=1G

T

n,j(XnTXn)−1Gn,j.

Then under the conditions of Theorem 3.1, √nτ−1

n Σ

−1/2

n ( ˆθ−θ0)converges in distribution

to Np(0,I).

To estimate quantile function Qτ(Y|t) nonparametrically using the B-spline basis

expansion, we estimate the coeﬃcient of quantile regression β of the basis functions by

ˆ

β = argmin

β n

∑

i=1

ρ(Yi−x′iβ) (3.6)

where the loss function is defined as ρ(u) = ρτ(u) = τ u−1{u ≤ 0}u. If ρ were convex

with derivativeψ, then (3.6) would be equivalent to the solution of

n

∑

i=1

ψ(Yi−x′iβ)xi = 0.

(54)

technical obstacles. Further, as the B-spline expansion is only an approximation, the truth does not belong to the model and the model is mildly misspecified. Figure 3.1 shows the derivative function and its approximation by a sequence of smooth functions given by

ψn(un) = τ −1 +

(

1 + τ 1−τe

−Snun

)₋1

whereSn → ∞. The diﬀerentiability ofψn function will be seen to play an important role

in the asymptotic properties of the estimation. This motivates us to define the estimator ˆ

β as any of the solution of

n

∑

i=1

ψn(Yi−x′iβ)xi = 0. (3.7)

(55)

According to the property of B-splines there always exist β0 and constantcsuch that

sup

t∈[0,1]

|fβ0(t)−f0(t)|2 ≤cJn−2m, sup t∈[0,1]

|f_β0′ (t)−f₀′(t)|2 ≤cJ_n−2(m−1). (3.8)

Note that for such β0, we could not define β0 to be a solution of

E0(ψn(Y −fβ(t))) = 0

because ψn is not the same as ψ for the chosen τ and fβ0(t) is not the τth quantile ofY

for a given t. Instead, for such β0 satisfying (3.8), we could only have the following such that for some ηn>0,

|E0(ψn(Y −fβ0(t)))| ≤ηn.

Suppose there exist positive sequencesbn,cnanddnsuch thatD(u, z) =

ψn(u+z)−ψn(u)

z ≥

Two-step Methods for Differential Equation Models.

ABSTRACT

DEDICATION

BIOGRAPHY

ACKNOWLEDGEMENTS

TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

LIST OF SYMBOLS

Chapter 1

Introduction

1.1

Literature review

1.1.1

Non-linear least squares approach

1.1.2

Two-step approach

1.1.3

Bayesian estimation techniques

1.1.4

Quantile regression

Chapter 2

Two-step Bayesian inference for

diﬀerential equation models on

longitudinal data

2.1

Introduction

2.2

Model assumptions and prior specifications

2.3

Main results

2.4

Simulation study

2.4.1

Theophylline

2.4.2

Bergman’s minimal model

2.5

Real data Analysis

2.6

Proofs

Chapter 3

Two-step approach for quantile

regression driven by diﬀerential

equations

3.1

Introduction

3.2

Model assumptions

ρ

ρ

3.3

Main results