Quantile based estimation of treatment effects in censored data


QUANTILE BASED ESTIMATION OF TREATMENT EFFECTS IN CENSORED DATA

by Nicholas Paul Crotty

DISSERTATION submitted in fulfilment of the requirements for the degree MAGISTER SCIENTIAE in MATHEMATICAL STATISTICS in the FACULTY OF SCIENCE at the University of Johannesburg.

SUPERVISOR: PROF. F. LOMBARD

December 2012

I hereby declare that this dissertation has not been submitted to any higher education institution, other than the University of Johannesburg, for any educational qualification whatsoever.

NP Crotty

Abstract

Two distributions are compared by means of the quantile comparison function, with particular attention to possibly censored data. A semi-parametric method which assumes linearity of the quantile comparison function is examined thoroughly for non-censored data and then extended to incorporate censored data. A fully nonparametric method to construct confidence bands for the quantile comparison function is set out. The performance of all methods examined is tested using Monte Carlo simulation.

Acknowledgements

I would like to thank my supervisor, Prof Lombard, for his guidance, patience and invaluable help throughout this endeavour. I am indebted to him for the amount of knowledge he has passed on to me. Thank you to my parents, Jim and Penny Crotty, for their unwavering support and belief in me. I would also like to thank my brothers, Tom and Peter Crotty, not only for their encouragement but also for giving me the opportunity to relax and helping to take my mind off work. Thank you to Keilauren De Vries for providing motivation and support for the last part of this undertaking. Thank you to Summer and Kimi for providing companionship when I was working late into the night. Lastly I would like to thank the National Research Fund (NRF) for the financial aid given to me.

Contents

1 Introduction and literature review
2 Technical preliminaries
    2.1 The empirical distribution function
    2.2 The quantile comparison function
    2.3 The survival function and censoring
        2.3.1 Definitions
        2.3.2 The Kaplan-Meier estimator
        2.3.3 The Nelson-Aalen estimator
    2.4 Some stochastic processes
        2.4.1 Brownian motion
        2.4.2 Brownian bridge
        2.4.3 Two parameter Brownian motion
        2.4.4 Kiefer process (two parameter Brownian bridge)
3 A semi-parametric regression method for complete data
    3.1 Regression setup
    3.2 Comparison of variances
    3.3 Simulation results
        3.3.1 Bias of the estimators
        3.3.2 Variance of the estimators
        3.3.3 The choice of k
    3.4 The location only model
    3.5 Change of response variable
4 A semi-parametric regression method for censored data
    4.1 The regression setup
    4.2 Simulation results
        4.2.1 Bias of the estimators
        4.2.2 Variance of the estimators
5 Nonparametric confidence bands for the quantile comparison function with censored data
    5.1 Asymptotic representation of the quantile comparison function
    5.2 Computational issues
    5.3 Simulation results
Bibliography
Appendices
A Approximations for the empirical quantile function
    A.1 Uncensored data
    A.2 Censored data
B Derivation of (5.2)

Chapter 1

Introduction and literature review

Consider comparing the effects that two nominally different treatments have on a population of experimental units. A treatment could be a medical treatment, such as a pill, injection or operation. Merely glancing at the data to check whether the response of the experimental units differs between treatments is not enough; a statistically significant difference is required. In this dissertation we will be interested in whether a treatment has an effect on the response. The particular type of treatment effect that we will consider is discussed below.

At this point we define some notation. We have two independent treatment groups yielding observations $X_1 = \{X_{1,1}, \ldots, X_{1,n_1}\}$ and $X_2 = \{X_{2,1}, \ldots, X_{2,n_2}\}$ respectively. Here $n_1$ and $n_2$ denote the respective sample sizes. The distributions of the $X_1$'s and $X_2$'s are denoted by $F_1$ and $F_2$ respectively. The most general question of interest is whether $F_1$ and $F_2$ are identical. The Kolmogorov-Smirnov test is possibly the best known test of the equality of $F_1$ and $F_2$. This is a fully nonparametric test. However, we will be interested in cases where some known structural relationship exists between $F_1$ and $F_2$. In particular we will consider the case where $X_2$ has the same distribution as $\mu + \sigma X_1$, where $\sigma > 0$ and $\mu$ are unknown constants. This implies that

$$F_2(x) = F_1\left(\frac{x - \mu}{\sigma}\right) \tag{1.1}$$

for all $x$, or equivalently

$$F_2^{-1}(u) = \mu + \sigma F_1^{-1}(u) \tag{1.2}$$

for all $0 < u < 1$. In this instance we are dealing with a semi-parametric model, and it is to be expected that tests for a treatment effect, i.e. $\mu \neq 0$ and/or $\sigma \neq 1$, will be more powerful than fully nonparametric tests. In such a semi-parametric model the meaning of the treatment effect is clear: it is defined by the numerical value of the vector $[\mu\ \ \sigma]$. In the fully nonparametric case it is not immediately obvious how to quantify the term "treatment effect". Lehmann [12] defined the so-called quantile comparison function (QCF)

$$q(x) = F_2^{-1}(F_1(x)) \tag{1.3}$$

as a measure of the treatment effect. When there is no treatment effect, i.e. when $F_1 \equiv F_2$, we have $q(x) \equiv x$. On the other hand, in the semi-parametric case (1.2) we have $q(F_1^{-1}(u)) = F_2^{-1}(u) = \mu + \sigma F_1^{-1}(u)$, i.e.

$$q(x) \equiv \mu + \sigma x, \tag{1.4}$$

so that the QCF is a straight line in this case.

Doksum [3] considers a slight variation on the QCF, $\Delta(x) = q(x) - x$, which he calls the shift function. Doksum [3] states and proves various properties of this function and creates confidence bands based on a two sample Kolmogorov-Smirnov statistic. Doksum and Sievers [4] consider three different confidence bands for the shift function, one of which (named the S band) is a revised version of the one proposed by Doksum [3]. Extension of the shift function to $k \geq 3$ samples is examined by Doksum [5] and by Einmahl and McKeague [6]. In the latter case the simultaneous confidence bands are distribution free for all values of $k$ and, in particular, for $k = 2$ they are asymptotically equivalent to one of the bands given by Doksum and Sievers [4]. Lombard [14] created fully nonparametric confidence bands for $q$ based on a

Kolmogorov-Smirnov type statistic for matched pairs data, i.e. where the data occur naturally as correlated pairs $(X_{1,i}, X_{2,i})$, $i = 1, \ldots, n$ $(= n_1 = n_2)$.

In the semi-parametric case, Hsieh [8] sets out his empirical process approach, which is based on the relation (1.2). In essence his idea boils down to the following. Choose $k \geq 3$ points $0 < u_1 < \ldots < u_k < 1$ and write the relation (1.2) at each $u_i$:

$$\begin{aligned} F_2^{-1}(u_1) &= \mu + \sigma F_1^{-1}(u_1) \\ &\ \ \vdots \\ F_2^{-1}(u_k) &= \mu + \sigma F_1^{-1}(u_k). \end{aligned} \tag{1.5}$$

Replacing $F_1$ and $F_2$ by consistent estimates $\hat F_1$ and $\hat F_2$ we see that

$$\begin{aligned} \hat F_2^{-1}(u_1) &= \mu + \sigma \hat F_1^{-1}(u_1) + \epsilon(u_1) \\ &\ \ \vdots \\ \hat F_2^{-1}(u_k) &= \mu + \sigma \hat F_1^{-1}(u_k) + \epsilon(u_k), \end{aligned} \tag{1.6}$$

where the $\epsilon(u_i)$, $i = 1, \ldots, k$, represent the errors in estimating $F_1$ and $F_2$ by $\hat F_1$ and $\hat F_2$. Now notice that (1.6) resembles a simple linear regression setup with regression parameters $\mu$ (intercept) and $\sigma$ (slope), explanatory variable values $\hat F_1^{-1}(u_i)$, response variables $\hat F_2^{-1}(u_i)$ and (heteroscedastic) errors $\epsilon(u_i)$. Hsieh's [8] idea was to estimate $\mu$ and $\sigma$ by weighted least squares via the system (1.6). Hsieh [8] shows that if $k$ increases suitably slowly with $n$ then the resulting weighted least squares estimates are asymptotically efficient. That is, the asymptotic variances are the same as those of the maximum likelihood estimates of $\mu$ and $\sigma$ when $F_1$ is known up to a location and scale parameter. The relationship between $k$ and $n$ ($k = o(n^{1/3})$) is not specified exactly and it is not at all clear how $k$ should be chosen in practical situations. Thus, it is legitimate to ask how well ordinary (unweighted) least squares estimates would fare compared to the weighted least squares estimates, and also to what extent the asymptotic results of Hsieh [8] are applicable in finite samples. We investigate these questions in Chapter 3 using theoretical results supported by Monte Carlo

simulations. Some other work along these lines has been reported by Potgieter and Lombard [17]. Lombard [14], Potgieter [16] and Hall, Potgieter and Lombard [7] have proposed various tests of the null hypothesis that the QCF is a straight line, i.e. that (1.4) holds. The justification for such tests is that if (1.4) can be shown to be a reasonable assumption, much more powerful inferences, for example narrower confidence bands and intervals, are obtained compared to the fully nonparametric approach.

In this dissertation our interest will focus primarily on cases where the data may be censored, i.e. $X_{1,i}$ and/or $X_{2,i}$ are only partially observed. Let $C_{1,i}$ be a random variable independent of $X_{1,i}$, $i = 1, \ldots, n_1$, and define the new random variables

$$\tilde X_{1,i} = X_{1,i} \wedge C_{1,i}.$$

The observed times therefore have the distribution function

$$\begin{aligned} \tilde F_1(x) &= 1 - P(\tilde X_1 > x) \\ &= 1 - P(X_1 > x,\ C_1 > x) \\ &= 1 - P(X_1 > x)P(C_1 > x) \\ &= 1 - (1 - F_1(x))K_1(x), \end{aligned} \tag{1.7}$$

where $K_1(x) = P(C_1 > x)$. The same discussion holds in relation to the $X_{2,i}$ data, namely

$$\tilde X_{2,i} = X_{2,i} \wedge C_{2,i},$$

where $C_{1,i}$ and $C_{2,i}$ are assumed to be independent. Essentially, the problem that we now face is to estimate the QCF $q$ given the data $\tilde X_{1,i}$, $i = 1, \ldots, n_1$, which have the distribution function $\tilde F_1$, along with the data $\tilde X_{2,i}$, $i = 1, \ldots, n_2$, which have the distribution function $\tilde F_2$. The problem of estimating $q$ and constructing

simultaneous confidence bands has not been extensively researched. In the semi-parametric case Hsieh [10] examines the extension of his empirical process approach to the censored case and considers both an ordinary and a weighted least squares method of estimating $\mu$ and $\sigma$ in (1.4). Once again it is shown that the weighted regression method yields asymptotically efficient estimates under appropriate conditions. Hsieh [10] provides simulation results, some of which we have been unable to reproduce. In Chapter 4 we consider this methodology, implement some simplifications and conduct extensive Monte Carlo simulations to compare the weighted and ordinary least squares estimates. Lu, Wells and Tiwari [15] use a bootstrap method to construct simultaneous confidence bands for the QCF. In Chapter 5 we propose and evaluate by Monte Carlo simulations an alternative approach based on a method developed by Lombard [14]. The necessary technical results and notation needed in Chapters 3 to 5 are presented in Chapter 2.

In conclusion we point out that there is a parallel development regarding estimation of the ordinal dominance function (also known as the probability integral transform function) $p(u) = F_2(F_1^{-1}(u))$. While we do not consider this function in this dissertation, we mention for completeness the papers by Li, Tiwari, and Wells [13] and Hsieh [9], which consider nonparametric and semi-parametric estimation of $p$ from censored data.

Chapter 2

Technical preliminaries

2.1 The empirical distribution function

Let $X = \{X_1, \ldots, X_n\}$ be independent identically distributed (i.i.d.) with distribution function (CDF) $F$. The order statistics are denoted by $X_{(1)} \leq \ldots \leq X_{(n)}$. The empirical distribution function (EDF)

$$\hat F(x) = \frac{1}{n}\sum_{i=1}^{n} I\{X_i \leq x\}, \quad -\infty < x < \infty, \tag{2.1}$$

is the natural estimator of the underlying distribution $F$. We will assume that $F$ is continuous and strictly increasing. Thus its inverse, $F^{-1}(u)$, $u \in [0, 1]$, is the unique real number $x$ such that $F(x) = u$ holds. The EDF, on the other hand, is not strictly increasing; hence it does not possess a unique inverse. The function

$$\hat F^{-1}(u) = \inf\{x : \hat F(x) \geq u\} \tag{2.2}$$

is known as the left continuous inverse. We also define the continuous EDF as

$$\tilde F(x) = \frac{k}{n} + \frac{1}{n}\,\frac{x - X_{(k)}}{X_{(k+1)} - X_{(k)}}, \tag{2.3}$$

for $X_{(k)} \leq x \leq X_{(k+1)}$ with $k = 1, \ldots, n - 1$. The continuous EDF is thus defined by linear interpolation between the points $X_{(k)}$ and $X_{(k+1)}$.
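The definitions (2.1) to (2.3) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the sample is assumed to contain no ties, and the values returned by `continuous_edf` outside the range of the data are an assumption, since (2.3) does not define the function there.

```python
import bisect
import math


def edf(sample, x):
    """Empirical distribution function (2.1): fraction of observations <= x."""
    ordered = sorted(sample)
    return bisect.bisect_right(ordered, x) / len(ordered)


def edf_inverse(sample, u):
    """Left continuous inverse (2.2): inf{x : F_hat(x) >= u}, for 0 < u <= 1."""
    if not 0.0 < u <= 1.0:
        raise ValueError("u must lie in (0, 1]")
    ordered = sorted(sample)
    n = len(ordered)
    # F_hat(X_(k)) = k/n, so the infimum is X_(k) with k = ceil(n * u).
    # The small tolerance guards against floating point round-up of n * u.
    k = max(math.ceil(n * u - 1e-12), 1)
    return ordered[k - 1]


def continuous_edf(sample, x):
    """Continuous EDF (2.3): linear interpolation between order statistics."""
    ordered = sorted(sample)
    n = len(ordered)
    if x < ordered[0]:
        return 0.0  # assumption: (2.3) is not defined below X_(1)
    if x >= ordered[-1]:
        return 1.0
    k = bisect.bisect_right(ordered, x)  # X_(k) <= x < X_(k+1)
    lower, upper = ordered[k - 1], ordered[k]
    return k / n + (x - lower) / (upper - lower) / n
```

For the sample `[3.0, 1.0, 4.0, 2.0]`, `edf(sample, 2.5)` counts two of the four observations, `edf_inverse(sample, 0.5)` returns the second order statistic, and `continuous_edf(sample, 2.5)` lies halfway between the EDF values at 2 and 3.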

From (2.1) we have

$$n\hat F(x) = \sum_{i=1}^{n} I\{X_i \leq x\},$$

so that $n\hat F(x)$ is the sum of independent Bernoulli random variables with success probability $F(x)$ and hence is Binomially distributed with parameters $n$ and $F(x)$. By the Glivenko-Cantelli theorem, van der Vaart [18] page 266, $\hat F$ is a uniformly consistent estimator of $F$, i.e.

$$\sup_x |\hat F(x) - F(x)| \to 0 \quad \text{as } n \to \infty,$$

and by the central limit theorem the process $\sqrt{n}(\hat F(x) - F(x))$ converges in distribution to a Gaussian process $Y(x)$ with covariance function

$$\operatorname{cov}(Y(x), Y(y)) = F(x \wedge y) - F(x)F(y), \tag{2.4}$$

for all $-\infty \leq x, y \leq \infty$.

2.2 The quantile comparison function

Let $F_1$ and $F_2$ be two continuous and strictly increasing CDFs. The quantile comparison function (QCF), defined by

$$q(x) = F_2^{-1}(F_1(x)) \tag{2.5}$$

for all $x$, can be used to compare $F_1$ and $F_2$. A plot of the graph $\{(x, q(x)),\ -\infty < x < \infty\}$ is known as a (population) Q-Q plot. If $F_1 = F_2$, then clearly $q(x) = x$ and the plot will be a straight line through the origin at a 45 degree angle. Alternatively, when $F_1$ and $F_2$ are not identical, then $q(x) \not\equiv x$. This is demonstrated in Figure 2.1, which plots the quantiles of the t distribution with 5 degrees of freedom (d.f.) against those of the standard normal distribution.

Figure 2.1: Q-Q plot of a t distribution with 5 d.f. versus a standard normal distribution. [Plot of $q(x)$ together with the 45 degree line through the origin. Horizontal axis: quantiles of a standard normal distribution. Vertical axis: quantiles of a t distribution with 5 d.f.]

The most natural estimator of $q$ is

$$\hat q(x) = \hat F_2^{-1}(\hat F_1(x)). \tag{2.6}$$

Figure 2.2 shows the Q-Q plot for two samples, both of size 100, from a standard normal distribution and a t distribution with 5 degrees of freedom respectively.

Figure 2.2: Q-Q plot of two samples, from a t distribution with 5 d.f. and a standard normal distribution. [Sample Q-Q plot together with the 45 degree line through the origin. Horizontal axis: quantiles of a sample from a standard normal distribution. Vertical axis: quantiles of a sample from a t distribution with 5 d.f.]

An alternative graphical representation is in terms of the shift function, which is defined as

$$\Delta(x) = q(x) - x. \tag{2.7}$$
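As a minimal sketch of the plug-in estimator (2.6) and the corresponding shift function estimate, the following Python code evaluates $\hat q$ at a given point. The function names are hypothetical, and raising an error when $x$ lies below the smallest observation of the first sample is an assumption made here because the left continuous inverse is not defined at $u = 0$.

```python
import math


def empirical_quantile(ordered, u):
    """Left continuous inverse of the EDF of an already sorted sample."""
    k = max(math.ceil(len(ordered) * u - 1e-12), 1)
    return ordered[k - 1]


def qcf_estimate(sample1, sample2, x):
    """Plug-in estimate q_hat(x) = F2_hat^{-1}(F1_hat(x)) from (2.6)."""
    s1, s2 = sorted(sample1), sorted(sample2)
    u = sum(1 for v in s1 if v <= x) / len(s1)  # F1_hat(x)
    if u == 0.0:
        # assumption: the estimate is left undefined below the first sample
        raise ValueError("x lies below the smallest observation in sample 1")
    return empirical_quantile(s2, u)


def shift_estimate(sample1, sample2, x):
    """Estimated shift function: q_hat(x) - x."""
    return qcf_estimate(sample1, sample2, x) - x
```

When the second sample is an exact location-scale transformation of the first, say `sample2 = [mu + sigma * v for v in sample1]` with `sigma > 0`, the estimate at every observed `x` in `sample1` reproduces `mu + sigma * x`, in line with (1.4).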

The same data as used in Figure 2.2, represented in this form, are plotted in Figure 2.3.

Figure 2.3: Shift function for the two samples from Figure 2.2. [Plot of $\Delta(x)$ against $x$.]

2.3 The survival function and censoring

2.3.1 Definitions

Survival analysis is concerned with the times, $X = \{X_1, \ldots, X_n\}$ $(> 0)$, that elapse until a pre-defined event of interest takes place. While it is common for this event

to be a death (hence the name survival analysis) this is not necessary. Other events could be, for example, the failure of a mechanical part, the completion of a task or the learning of a new skill. The survival function is defined as

$$S(t) = P(X > t) = 1 - F(t), \tag{2.8}$$

where $F$ is the CDF of $X$ and $t \geq 0$. The natural nonparametric estimate of $S$ is the empirical survival function (ESF)

$$\hat S(t) = \frac{1}{n}\sum_{i=1}^{n} I\{X_i > t\}. \tag{2.9}$$

The hazard rate, also known as the hazard function, is defined as

$$\lambda(t) = \lim_{h \to 0} \frac{P(t \leq X < t + h \mid X \geq t)}{h}. \tag{2.10}$$

Heuristically, for small $h$, $\lambda(t)h$ is approximately equal to the probability that the event occurs in the time interval $(t, t + h)$ given that it has not occurred up to time $t$. The density function of the Weibull distribution is

$$f(t) = \alpha\beta t^{\alpha - 1}\exp(-\beta t^{\alpha}) \tag{2.11}$$

and the hazard rate is $\lambda(t) = \alpha\beta t^{\alpha - 1}$, from Klein and Moeschberger [11] page 38, where $\alpha, \beta > 0$ are parameters. Figure 2.4 shows the hazard rates for three sets of parameter values. The horizontal line ($\alpha = 1$, $\beta = 0.5$) corresponds to the exponential distribution, which thus has a constant hazard rate. The other two cases ($\alpha = 0.5$, $\beta = 1$ and $\alpha = 3$, $\beta = 0.002$) correspond to a decreasing and an increasing hazard rate respectively. A related quantity is the cumulative hazard function, given by

$$\Lambda(t) = \int_0^t \lambda(s)\,ds. \tag{2.12}$$
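The Weibull case gives a simple numerical check of (2.12): integrating the hazard rate $\alpha\beta t^{\alpha-1}$ yields the closed form $\Lambda(t) = \beta t^{\alpha}$. The Python sketch below compares this closed form with a midpoint-rule approximation of the integral. The function names and the choice of integration rule are illustrative assumptions, not part of the source.

```python
import math


def weibull_hazard(t, alpha, beta):
    """Weibull hazard rate: alpha * beta * t**(alpha - 1)."""
    return alpha * beta * t ** (alpha - 1)


def weibull_cumulative_hazard(t, alpha, beta):
    """Closed form of (2.12) for the Weibull hazard: beta * t**alpha."""
    return beta * t ** alpha


def weibull_survival(t, alpha, beta):
    """Weibull survival function: exp(-beta * t**alpha)."""
    return math.exp(-beta * t ** alpha)


def integrate_hazard(t, alpha, beta, steps=20000):
    """Midpoint-rule approximation of the integral in (2.12)."""
    h = t / steps
    total = sum(weibull_hazard((i + 0.5) * h, alpha, beta) for i in range(steps))
    return total * h
```

With $\alpha = 1$ the hazard is constant, with $\alpha = 0.5$ it decreases in $t$ and with $\alpha = 3$ it increases, matching the three parameter sets used for Figure 2.4.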

Figure 2.4: Hazard rates of a Weibull distribution with various parameter values. [Plot of the hazard rate against time for ($\alpha = 1$, $\beta = 0.5$), ($\alpha = 0.5$, $\beta = 1$) and ($\alpha = 3$, $\beta = 0.002$).]

The relationship between the cumulative hazard function and the survival function is, from (2.10) and (2.12),

$$S(t) = \exp(-\Lambda(t)). \tag{2.13}$$

We wish to consider situations in which censoring occurs, that is, when an outside occurrence prevents the event of interest from being observed. Various forms of censoring can occur. Our interest focuses on random censoring, which is described as follows. Let $C$ be a random variable, independent of $X$, with continuous survival function $K$. Let $C_1, \ldots, C_n$ be independent observations on $C$ and define the random variables

$$\tilde X_i = X_i \wedge C_i \quad \text{and} \quad \delta_i = I\{X_i \leq C_i\}, \quad i = 1, \ldots, n.$$

The observational data now consist of the pairs $(\tilde X_i, \delta_i)$, $i = 1, \ldots, n$. If $\delta_i = 0$, we do not observe the full survival time $X_i$ but rather the censored value $\tilde X_i = C_i$. The observed times therefore have the survival function

$$\begin{aligned} H(t) &= P(\tilde X > t) \\ &= P(X > t,\ C > t) \\ &= P(X > t)P(C > t) \\ &= S(t)K(t), \end{aligned} \tag{2.14}$$

where the third equality follows from the assumed independence of $X$ and $C$. Consequently, since the problem is to construct an estimator of $S$, the ESF of the $\tilde X$ data cannot be used directly for this purpose. This observation led to the development of the Kaplan-Meier and Nelson-Aalen estimators, which we consider next.
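The random censoring mechanism can be sketched as follows. The example uses standard exponential lifetimes and censoring times that are uniform on $[0, c]$, the configuration the text adopts for its Kaplan-Meier illustrations; the function names, the seed and the sample size are illustrative assumptions. For this configuration the probability of censoring is $P(C < X) = E[e^{-C}] = (1 - e^{-c})/c$.

```python
import math
import random


def censor(lifetimes, censoring_times):
    """Return the observed pairs (X_tilde_i, delta_i), X_tilde_i = min(X_i, C_i)."""
    return [(min(x, c), 1 if x <= c else 0)
            for x, c in zip(lifetimes, censoring_times)]


def expected_censoring_fraction(c_max):
    """P(C < X) for X ~ Exp(1) and C ~ Uniform[0, c_max]."""
    # P(X > C) = E[exp(-C)] = (1 - exp(-c_max)) / c_max
    return (1.0 - math.exp(-c_max)) / c_max


rng = random.Random(2012)  # fixed seed so the sketch is reproducible
n = 20000
lifetimes = [rng.expovariate(1.0) for _ in range(n)]
censoring_times = [rng.uniform(0.0, 2.2) for _ in range(n)]
observed = censor(lifetimes, censoring_times)
censored_fraction = 1.0 - sum(delta for _, delta in observed) / n
```

With `c_max = 2.2` the formula gives roughly 40% censoring and with `c_max = 20` roughly 5%, and the simulated `censored_fraction` should be close to the former.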

2.3.2 The Kaplan-Meier estimator

Denote the observed data, ordered in their first component, by

$$(\tilde X_{(i)}, \delta_{(i)}), \quad i = 1, \ldots, n,$$

with $\tilde X_{(1)} < \ldots < \tilde X_{(n)}$. The Kaplan-Meier estimate of the survival function $S$ is

$$\hat S^{(1)}(t) = \prod_{\{i : \tilde X_{(i)} \leq t\}} \left(\frac{n - i}{n - i + 1}\right)^{\delta_{(i)}}. \tag{2.15}$$

For an alternative expression, let

$$l = \sum_{i=1}^{n} I\{\delta_i = 1\}$$

denote the total number of uncensored observations and let $R_j$, $j = 1, \ldots, l$, denote the ranks of these uncensored observations in the ranking of all the observed $\tilde X_i$, $i = 1, \ldots, n$:

$$R_j = \sum_{i=1}^{n} I\{\tilde X_i \leq X_j,\ \delta_j = 1\}.$$

Then

$$\hat S^{(1)}(t) = \prod_{\{j : \tilde X_{R_j} \leq t\}} \left(\frac{n - R_j}{n - R_j + 1}\right).$$

It should be noted that neither of these two representations takes tied data into account. We will not concern ourselves with this aspect except to note that it is simple to accommodate ties in the numerical computations; see Klein and Moeschberger [11] page 92. For notational simplicity, we denote $\hat S^{(1)}$ by $\hat S$ for the remainder of this section. The Kaplan-Meier estimator is equal to the ESF (2.9) when no censoring is present.

To see this, set $\delta_{(i)} = 1$ for all $i$ in (2.15). Then, when $X_{(j)} \leq t < X_{(j+1)}$,

$$\begin{aligned} \hat S(t) &= \frac{n-1}{n} \times \frac{n-2}{n-1} \times \cdots \times \frac{n-j}{n-j+1} \\ &= 1 - \frac{j}{n} \\ &= 1 - \frac{1}{n}\sum_{i=1}^{n} I\{X_i \leq t\} \\ &= \frac{1}{n}\sum_{i=1}^{n} I\{X_i > t\}. \end{aligned}$$

Figure 2.5 shows the Kaplan-Meier estimator for a sample of 25 observations from a standard exponential distribution, censored by random variables that are uniformly distributed on the interval $[0, 2.2]$. The censoring distribution was chosen to censor an expected 40% of the observations. The censored observations are represented by the stars. Notice that the graph jumps down at each uncensored event time and that the size of the jump there equals (1 + number of censored event times since the last uncensored time)/$n$. That is, if there are no censored observations between two uncensored observations then the Kaplan-Meier estimator will jump down by $1/n$. If there was one censored observation then the jump down will be $2/n$, and so on. Figure 2.6 shows how a smaller amount of censoring changes the Kaplan-Meier estimator. It shows the same data as in Figure 2.5, but this time the censoring variables were distributed uniformly on the interval $[0, 20]$. This corresponds to censoring an expected 5% of the observations.

As for the EDF, we will use a continuous version of the Kaplan-Meier estimator. The continuous Kaplan-Meier estimator is

$$\tilde S(x) = \frac{n - R_j}{n} - \frac{x - X_{(j)}}{X_{(j+1)} - X_{(j)}}\,\frac{R_{j+1} - R_j}{n} \tag{2.16}$$

for $X_{(j)} \leq x < X_{(j+1)}$ with $j = 1, \ldots, l$. The continuous Kaplan-Meier estimator is thus defined by linear interpolation between the points $X_{(j)}$ and $X_{(j+1)}$. Notice that the points

Figure 2.5: The Kaplan-Meier estimator for a sample from a standard exponential distribution with 40% censoring. [Plot of $S(t)$ against $t$.]

Figure 2.6: The Kaplan-Meier estimator for a sample from a standard exponential distribution with 5% censoring. [Plot of $S(t)$ against $t$.]

given are uncensored observations. This is because the Kaplan-Meier estimator only jumps down at uncensored observations.

Breslow and Crowley [1] showed that

$$\hat Z(t) = \sqrt{n}(\hat S(t) - S(t)), \quad t \geq 0, \tag{2.17}$$

converges in distribution to a zero mean Gaussian process, $\hat Z$, with covariance function

$$\operatorname{cov}(\hat Z(t_1), \hat Z(t_2)) = S(t_1)S(t_2)\int_0^{t_1 \wedge t_2} \frac{dF^{\star}(s)}{H^2(s)}, \tag{2.18}$$

where

$$F^{\star}(s) = P(X \leq s,\ \delta = 1) = -\int_0^s K(x)\,dS(x).$$

Therefore

$$\frac{dF^{\star}(s)}{H^2(s)} = \frac{-K(s)\,dS(s)}{K^2(s)S^2(s)} = -\frac{dS(s)}{S^2(s)K(s)},$$

so that

$$\operatorname{cov}(\hat Z(t_1), \hat Z(t_2)) = -S(t_1)S(t_2)\int_0^{t_1 \wedge t_2} \frac{dS(s)}{S^2(s)K(s)}. \tag{2.19}$$

For later use we need an expression for the covariance function of the process

$$\bar Z(u) = \hat Z(S^{-1}(u)), \quad 0 \leq u \leq 1. \tag{2.20}$$

Making the substitution $S(s) = w$ in the integral in (2.19), so that the limits $s = 0$ and $s = t_1 \wedge t_2$ become $w = 1$ and $w = u_1 \vee u_2$ with $u_i = S(t_i)$, results in

$$\operatorname{cov}(\bar Z(u_1), \bar Z(u_2)) = u_1 u_2 \int_{u_1 \vee u_2}^{1} \frac{dw}{w^2 K(S^{-1}(w))}. \tag{2.21}$$

We now show that when no censoring is present, we can recover (2.4) from (2.19). Setting $K \equiv 1$ in (2.19) gives

$$\begin{aligned} \operatorname{cov}(\hat Z(t_1), \hat Z(t_2)) &= -S(t_1)S(t_2)\int_0^{t_1 \wedge t_2} \frac{dS(s)}{S(s)^2} \\ &= S(t_1)S(t_2)\left(\frac{1}{S(t_1 \wedge t_2)} - 1\right) \\ &= S(t_1 \wedge t_2)S(t_1 \vee t_2)\left(\frac{1 - S(t_1 \wedge t_2)}{S(t_1 \wedge t_2)}\right) \\ &= S(t_1 \vee t_2) - S(t_1)S(t_2), \end{aligned} \tag{2.22}$$

where the third equality follows from the fact that $S(t_1)S(t_2) = S(t_1 \vee t_2)S(t_1 \wedge t_2)$. Hence using (2.8) we see that

$$\begin{aligned} \operatorname{cov}(\hat Z(t_1), \hat Z(t_2)) &= (1 - F(t_1 \vee t_2)) - (1 - F(t_1))(1 - F(t_2)) \\ &= F(t_1) + F(t_2) - F(t_1 \vee t_2) - F(t_1)F(t_2) \\ &= F(t_1 \wedge t_2) - F(t_1)F(t_2) \\ &= \operatorname{cov}(Y(t_1), Y(t_2)). \end{aligned} \tag{2.23}$$

2.3.3 The Nelson-Aalen estimator

The Kaplan-Meier estimator in (2.15) can alternatively be expressed in the form

$$\hat S(t) = \exp\left(\sum_{\{i : \tilde X_{(i)} \leq t\}} \delta_{(i)} \times \log\left(1 - \frac{1}{n - i + 1}\right)\right).$$

Replacing the logarithmic term by its first order approximation $-1/(n - i + 1)$ leads us to the Nelson-Aalen estimator

$$\hat S^{(2)}(t) = \exp\left(-\sum_{\{i : \tilde X_{(i)} \leq t\}} \frac{\delta_{(i)}}{n - i + 1}\right). \tag{2.24}$$
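The two estimators (2.15) and (2.24) can be computed directly from the ordered pairs $(\tilde X_{(i)}, \delta_{(i)})$. The Python sketch below assumes, as the text does, that there are no tied observation times; the function names are hypothetical. Since $\log(1 - x) \leq -x$, the Nelson-Aalen survival estimate is never smaller than the Kaplan-Meier estimate.

```python
import math


def kaplan_meier(observed, t):
    """Kaplan-Meier estimate (2.15) at time t; assumes no tied times."""
    ordered = sorted(observed)  # sort the (time, delta) pairs by time
    n = len(ordered)
    estimate = 1.0
    for i, (time, delta) in enumerate(ordered, start=1):
        if time > t:
            break
        if delta == 1:  # only uncensored observations contribute a factor
            estimate *= (n - i) / (n - i + 1)
    return estimate


def nelson_aalen_cumulative_hazard(observed, t):
    """Sum of delta_(i) / (n - i + 1) over observed times <= t."""
    ordered = sorted(observed)
    n = len(ordered)
    return sum(delta / (n - i + 1)
               for i, (time, delta) in enumerate(ordered, start=1)
               if time <= t)


def nelson_aalen_survival(observed, t):
    """Nelson-Aalen survival estimate (2.24)."""
    return math.exp(-nelson_aalen_cumulative_hazard(observed, t))
```

For the toy data `[(1.0, 1), (2.0, 0), (3.0, 1), (4.0, 1)]`, the Kaplan-Meier estimate at `t = 3.5` is the product of the factors 3/4 and 1/2, while the censored observation at time 2 contributes no factor.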

Comparing (2.24) with (2.13) we see that

$$\hat\Lambda(t) = \sum_{\{i : \tilde X_{(i)} \leq t\}} \frac{\delta_{(i)}}{n - i + 1}$$

estimates the cumulative hazard function $\Lambda(t)$ in (2.12).

2.4 Some stochastic processes

Below we consider four stochastic processes that will be of use to us.

2.4.1 Brownian motion

From Csörgő and Révész [2], page 21, we have the definition of Brownian motion:

Definition 2.1. A stochastic process $\{W(t), t \geq 0\}$ is said to be a standard Brownian motion if

1. $W(0) = 0$;
2. $\{W(t), t \geq 0\}$ has stationary and independent increments;
3. for every $t > 0$, $W(t)$ is normally distributed with mean 0 and variance $t$.

The covariance function of standard Brownian motion is, from Csörgő and Révész [2] page 22,

$$\operatorname{cov}(W(s), W(t)) = s \wedge t \tag{2.25}$$

for all $s, t > 0$.

2.4.2 Brownian bridge

We have the definition of a Brownian bridge, from Csörgő and Révész [2] page 41:

Definition 2.2. If $\{W(t), t \geq 0\}$ is a standard Brownian motion, then $B(t) = W(t) - tW(1)$, $0 \leq t \leq 1$, is a Brownian bridge process.

Using (2.25) we find the covariance function as

$$\begin{aligned} \operatorname{cov}(B(t), B(s)) &= \operatorname{cov}(W(t) - tW(1),\ W(s) - sW(1)) \\ &= \operatorname{cov}(W(t), W(s)) + \operatorname{cov}(W(t), -sW(1)) + \operatorname{cov}(-tW(1), W(s)) + \operatorname{cov}(-tW(1), -sW(1)) \\ &= t \wedge s - st - ts + ts \\ &= t \wedge s - ts \end{aligned} \tag{2.26}$$

for all $0 \leq s, t \leq 1$.

2.4.3 Two parameter Brownian motion

Extending Brownian motion to two parameters changes the definition to the following, from Csörgő and Révész [2] page 58:

Definition 2.3. A stochastic process $\{W(s, t), s, t \geq 0\}$ is said to be a two parameter standard Brownian motion process if

1. $W(0, t) = W(s, 0) = 0$ for $0 \leq s, t < \infty$;
2. $\{W(z), z = (s, t) \in \mathbb{R}^2\}$ has stationary and independent increments;
3. for every $s, t > 0$, $W(s, t)$ is normally distributed with mean 0 and variance $st$.
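The covariance function (2.26) of the Brownian bridge can be checked by a small Monte Carlo experiment. The sketch below builds $W(s)$, $W(t)$ and $W(1)$ from independent Gaussian increments, forms $B(u) = W(u) - uW(1)$, and compares the sample covariance with $s \wedge t - st$. The function names, the seed and the number of replications are illustrative assumptions.

```python
import random


def simulate_bridge_pair(rng, s, t):
    """Draw (B(s), B(t)) for 0 < s < t < 1 via B(u) = W(u) - u * W(1)."""
    w_s = rng.gauss(0.0, s ** 0.5)                   # W(s) ~ N(0, s)
    w_t = w_s + rng.gauss(0.0, (t - s) ** 0.5)       # independent increment
    w_1 = w_t + rng.gauss(0.0, (1.0 - t) ** 0.5)     # independent increment
    return w_s - s * w_1, w_t - t * w_1


def monte_carlo_covariance(s, t, replications=20000, seed=1):
    """Sample covariance of (B(s), B(t)) over simulated paths."""
    rng = random.Random(seed)
    pairs = [simulate_bridge_pair(rng, s, t) for _ in range(replications)]
    mean_s = sum(p[0] for p in pairs) / replications
    mean_t = sum(p[1] for p in pairs) / replications
    cross = sum((p[0] - mean_s) * (p[1] - mean_t) for p in pairs)
    return cross / (replications - 1)


estimate = monte_carlo_covariance(0.25, 0.5)
theoretical = min(0.25, 0.5) - 0.25 * 0.5  # s ^ t - st from (2.26)
```

The simulated `estimate` should agree with `theoretical` up to Monte Carlo error, which shrinks at the rate one over the square root of the number of replications.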

The covariance function of two parameter standard Brownian motion is, from Csörgő and Révész [2] page 58,

$$\operatorname{cov}(W(s, t), W(s', t')) = (s \wedge s')(t \wedge t') \tag{2.27}$$

for all $s, s', t, t' \geq 0$.

2.4.4 Kiefer process (two parameter Brownian bridge)

A Kiefer process can be regarded as a two parameter Brownian bridge. From Csörgő and Révész [2] page 80 we have

Definition 2.4. If $\{W(s, t), 0 \leq s \leq 1, t \geq 0\}$ is a two parameter standard Brownian motion, then $\kappa(s, t) = W(s, t) - sW(1, t)$, $0 \leq s \leq 1$, $t \geq 0$, is a Kiefer process (two parameter Brownian bridge process).

The following holds: $\kappa(0, t) = \kappa(1, t) = \kappa(s, 0) = 0$. The expected value of a Kiefer process is zero, $E(\kappa(s, t)) = 0$. The covariance of a Kiefer process is

$$\operatorname{cov}(\kappa(s, t), \kappa(s', t')) = (t \wedge t')(s \wedge s' - ss').$$

A generalized Kiefer process is defined similarly. The covariance of a generalized Kiefer process is

$$\operatorname{cov}(\kappa(s, t), \kappa(s', t')) = (t \wedge t')\,\Gamma(s, s'),$$

where $\Gamma$ is the covariance function of a stationary Gaussian process, $0 \leq s, s' \leq 1$ and $t, t' \geq 0$. For instance, in Chapter 4 we will have

$$\Gamma(s, s') = (1 - s)(1 - s')\int_0^{s \wedge s'} \frac{du}{(1 - u)^2 K(F^{-1}(u))}. \tag{2.28}$$

Chapter 3

A semi-parametric regression method for complete data

Consider comparing the distributions of two random variables, $X_1$ and $X_2$, having CDFs $F_1$ and $F_2$ respectively. For example, $X_1$ can be a generic observation from a control group while $X_2$ is from a treatment group. If the difference between the CDFs $F_1$ and $F_2$ is known up to a number of parameters, then maximum likelihood estimation of these parameters provides the best results. The method outlined here is a semi-parametric method, that is, there is no assumption regarding the type of distribution. However, we do assume that the two distributions differ only in location and scale. The relationship between the two distributions is assumed to be

$$F_2^{-1}(u) = \mu + \sigma F_1^{-1}(u), \tag{3.1}$$

where $0 < u < 1$. Setting $u = F_1(x)$ in (3.1) gives the equivalent statement $F_2^{-1}(F_1(x)) = \mu + \sigma x$, that is,

$$P(X_1 \leq x) = P(X_2 \leq \mu + \sigma x) = P\left(\frac{X_2 - \mu}{\sigma} \leq x\right). \tag{3.2}$$

This says that

$$X_2 = \mu + \sigma X_1 \quad \text{in distribution.} \tag{3.3}$$

It is the presence of the scale parameter $\sigma$ that sets this model apart from the usual location shift model, which assumes that $F_1$ and $F_2$ differ only in location. Now, assume we have independent observations $X_{1,1}, \ldots, X_{1,n_1}$ on $X_1$ along with $X_{2,1}, \ldots, X_{2,n_2}$ on $X_2$. Estimates of $F_1^{-1}(u)$ and $F_2^{-1}(u)$ will be obtained for a number of points $u_1, \ldots, u_k$ and a linear regression setup based on (3.1) will then be implemented to estimate $\mu$ and $\sigma$. We will follow the methodology outlined in Hsieh [8]. The natural estimators of $F_1^{-1}$ and $F_2^{-1}$ are the empirical quantile functions $\hat F_1^{-1}$ and $\hat F_2^{-1}$. Replacing $F_1$ and $F_2$ in (3.1) by these estimates gives

$$\hat F_2^{-1}(u) = \mu + \sigma \hat F_1^{-1}(u) + \epsilon(u), \tag{3.4}$$

where $\epsilon(u)$ is an error term.

3.1 Regression setup

If (3.4) is written out for $0 < u_1 < \ldots < u_k < 1$, we have

$$\begin{aligned} \hat F_2^{-1}(u_1) &= \mu + \sigma \hat F_1^{-1}(u_1) + \epsilon(u_1) \\ &\ \ \vdots \\ \hat F_2^{-1}(u_k) &= \mu + \sigma \hat F_1^{-1}(u_k) + \epsilon(u_k), \end{aligned} \tag{3.5}$$

which looks like an ordinary simple regression setup with response variable $\hat F_2^{-1}(u)$ and predictor variable $\hat F_1^{-1}(u)$. We will now proceed to estimate $\mu$ and $\sigma$ by ordinary least squares (OLS). We will show below that the error terms are heteroscedastic and also derive asymptotic distributions of the OLS estimators $\tilde\mu$ and $\tilde\sigma$. Hsieh (1995) [8] accommodates the heteroscedasticity of the error terms by using a generalized least squares (GLS) approach. He shows that his GLS estimators of $\mu$ and $\sigma$ are

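asymptotically efficient. This implies that the OLS estimators will not be asymptotically efficient. Nonetheless the question remains whether the OLS estimators perform markedly worse in finite sample situations than the GLS estimators.

Hsieh [8], page 738, shows that the covariance matrix of $(\epsilon(u_1), \ldots, \epsilon(u_k))$ can be well approximated by

$$\operatorname{cov}(\epsilon(u_i), \epsilon(u_j)) = \frac{\sigma^2}{n_\star}\Sigma_{ij}, \quad \text{where} \quad \Sigma_{ij} = \frac{u_i \wedge u_j - u_i u_j}{f_1(F_1^{-1}(u_i))\,f_1(F_1^{-1}(u_j))} \tag{3.6}$$

and

$$\frac{1}{n_\star} = \frac{1}{n_1} + \frac{1}{n_2}. \tag{3.7}$$

The calculations leading to (3.6) are detailed in Appendix A.1. Set $e = \Sigma^{-\frac{1}{2}}\epsilon$. Then $E(e) = 0$ and

$$\begin{aligned} \operatorname{cov}(e) &= \Sigma^{-\frac{1}{2}}\operatorname{cov}(\epsilon)\Sigma^{-\frac{1}{2}} \\ &= \frac{\sigma^2}{n_\star}\Sigma^{-\frac{1}{2}}\Sigma\Sigma^{-\frac{1}{2}} \\ &= \frac{\sigma^2}{n_\star}I. \end{aligned}$$

Now with

$$X = \begin{bmatrix} 1 & \hat F_1^{-1}(u_1) \\ \vdots & \vdots \\ 1 & \hat F_1^{-1}(u_k) \end{bmatrix} \tag{3.8}$$

and

$$Y = \begin{bmatrix} \hat F_2^{-1}(u_1) \\ \vdots \\ \hat F_2^{-1}(u_k) \end{bmatrix}, \tag{3.9}$$

we have a regression setup

$$Y = X\beta + \Sigma^{\frac{1}{2}}e, \quad e \sim \left(0, \frac{\sigma^2}{n_\star}I\right), \tag{3.10}$$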

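where $\beta = [\mu\ \ \sigma]^T$, which leads us to

$$\Sigma^{-\frac{1}{2}}Y = \Sigma^{-\frac{1}{2}}X\beta + e. \tag{3.11}$$

This conforms to a homoscedastic OLS model. One problem that arises is that $\Sigma^{-\frac{1}{2}}$ depends on $f_1(F_1^{-1}(u))$, $0 < u < 1$. The quantity $f_1(F_1^{-1}(u))$ can be estimated by a Gaussian kernel density estimator

$$f_1^{\star}(\hat F_1^{-1}(u)) = \frac{1}{mh}\sum_{i=1}^{m} \phi\left(\frac{\hat F_1^{-1}(u) - X_{1,i}}{h}\right), \tag{3.12}$$

where $\phi$ is the standard normal density and $h > 0$ is the bandwidth. Plugging this estimator into $\Sigma$ gives a computable estimator $\Sigma_\star$ of $\Sigma$. The GLS estimator of $\beta$ is now given by

$$\hat\beta = (X^T\Sigma_\star^{-1}X)^{-1}X^T\Sigma_\star^{-1}Y,$$

where $\hat\beta = [\hat\mu\ \ \hat\sigma]^T$. The covariance matrix of the GLS estimator $\hat\beta$ is given by

$$\begin{aligned} \operatorname{cov}(\hat\beta) &= \operatorname{cov}\left((X^T\Sigma_\star^{-1}X)^{-1}X^T\Sigma_\star^{-\frac{1}{2}}\Sigma_\star^{-\frac{1}{2}}Y\right) \\ &= \operatorname{cov}\left((X^T\Sigma_\star^{-1}X)^{-1}X^T\Sigma_\star^{-\frac{1}{2}}(\Sigma_\star^{-\frac{1}{2}}X\beta + e)\right) \\ &= \operatorname{cov}\left(\beta + (X^T\Sigma_\star^{-1}X)^{-1}X^T\Sigma_\star^{-\frac{1}{2}}e\right) \\ &\approx (X^T\Sigma_\star^{-1}X)^{-1}X^T\Sigma_\star^{-\frac{1}{2}}\operatorname{cov}(e)\,\Sigma_\star^{-\frac{1}{2}}X(X^T\Sigma_\star^{-1}X)^{-1} \\ &= \frac{\sigma^2}{n_\star}(X^T\Sigma_\star^{-1}X)^{-1}X^T\Sigma_\star^{-1}X(X^T\Sigma_\star^{-1}X)^{-1} \\ &= \frac{\sigma^2}{n_\star}(X^T\Sigma_\star^{-1}X)^{-1}. \end{aligned} \tag{3.13}$$

The OLS estimator of $\beta$, from (3.5), is

$$\tilde\beta = (X^TX)^{-1}X^TY$$

with covariance matrix, from (3.6),

$$\operatorname{cov}(\tilde\beta) = \frac{\sigma^2}{n_\star}(X^TX)^{-1}X^T\Sigma_\star X(X^TX)^{-1}. \tag{3.14}$$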
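The OLS and GLS estimators above can be sketched with the Python standard library alone. This is an illustrative sketch rather than the implementation used for the simulations: the function names, the default bandwidth `h = 0.5` and the small Gaussian elimination routine are assumptions, and the points are taken as $u_i = i/(k+1)$. The weights $\Sigma_\star^{-1}$ are never formed explicitly; instead the linear systems $\Sigma_\star z = \mathbf{1}$ and $\Sigma_\star z = x$ are solved.

```python
import math
from statistics import NormalDist


def quantile(ordered, u):
    """Empirical quantile: the ceil(n*u)-th order statistic."""
    k = max(math.ceil(len(ordered) * u - 1e-12), 1)
    return ordered[k - 1]


def kernel_density(sample, x, h):
    """Gaussian kernel density estimate, as in (3.12)."""
    phi = NormalDist().pdf
    return sum(phi((x - v) / h) for v in sample) / (len(sample) * h)


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))


def fit_location_scale(sample1, sample2, k, h=0.5, weighted=True):
    """Return (mu, sigma) estimated by GLS (weighted) or OLS (unweighted)."""
    s1, s2 = sorted(sample1), sorted(sample2)
    u = [i / (k + 1) for i in range(1, k + 1)]
    ones = [1.0] * k
    x = [quantile(s1, ui) for ui in u]  # second column of X in (3.8)
    y = [quantile(s2, ui) for ui in u]  # Y in (3.9)
    if weighted:
        f = [kernel_density(s1, xi, h) for xi in x]
        cov = [[(min(u[i], u[j]) - u[i] * u[j]) / (f[i] * f[j])
                for j in range(k)] for i in range(k)]  # Sigma_star from (3.6)
        w_ones, w_x = solve(cov, ones), solve(cov, x)  # Sigma_star^{-1} X
    else:
        w_ones, w_x = ones, x  # identity weights give OLS
    normal_matrix = [[dot(w_ones, ones), dot(w_ones, x)],
                     [dot(w_x, ones), dot(w_x, x)]]
    rhs = [dot(w_ones, y), dot(w_x, y)]
    mu, scale = solve(normal_matrix, rhs)
    return mu, scale
```

If the second sample is an exact transformation `1.5 + 2.0 * v` of the first, both the OLS and the GLS fit recover the intercept 1.5 and the slope 2.0, because the system (3.5) then holds without error.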

3.2 Comparison of variances

In this section we will compare the variances of the OLS and GLS estimators. For ease of computation and since we consider only large $n$, we will replace $\hat F_1^{-1}(u)$ in (3.8) by $F_1^{-1}(u)$. From (3.8) we see that

$$X^TX = \begin{bmatrix} k & \sum_{j=1}^{k} F_1^{-1}(u_j) \\ \sum_{j=1}^{k} F_1^{-1}(u_j) & \sum_{j=1}^{k} (F_1^{-1}(u_j))^2 \end{bmatrix}$$

with inverse

$$\begin{aligned} (X^TX)^{-1} &= \begin{bmatrix} \sum_{j=1}^{k} (F_1^{-1}(u_j))^2 & -\sum_{j=1}^{k} F_1^{-1}(u_j) \\ -\sum_{j=1}^{k} F_1^{-1}(u_j) & k \end{bmatrix} \times \left(k\sum_{j=1}^{k} (F_1^{-1}(u_j))^2 - \Big(\sum_{j=1}^{k} F_1^{-1}(u_j)\Big)^2\right)^{-1} \\ &= \frac{1}{k}\begin{bmatrix} \frac{1}{k}\sum_{j=1}^{k} (F_1^{-1}(u_j))^2 & -\frac{1}{k}\sum_{j=1}^{k} F_1^{-1}(u_j) \\ -\frac{1}{k}\sum_{j=1}^{k} F_1^{-1}(u_j) & 1 \end{bmatrix} \times \left(\frac{1}{k}\sum_{j=1}^{k} (F_1^{-1}(u_j))^2 - \Big(\frac{1}{k}\sum_{j=1}^{k} F_1^{-1}(u_j)\Big)^2\right)^{-1} \\ &:= \frac{1}{k}\begin{bmatrix} A^{\dagger} & -B^{\dagger} \\ -B^{\dagger} & 1 \end{bmatrix} \Big/ \left(A^{\dagger} - (B^{\dagger})^2\right). \end{aligned}$$

In the calculations that follow, we take

$$u_i = \frac{i}{k+1}, \quad i = 1, \ldots, k.$$

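Letting $k \to \infty$, we see that

$$A^{\dagger} \to \int_0^1 (F_1^{-1}(u))^2\,du = \int_{-\infty}^{\infty} x^2\,dF_1(x)$$

and

$$B^{\dagger} \to \int_0^1 F_1^{-1}(u)\,du = \int_{-\infty}^{\infty} x\,dF_1(x).$$

We may assume, without loss of generality, that

$$\int_{-\infty}^{\infty} x\,dF_1(x) = 0 \quad \text{and} \quad \int_{-\infty}^{\infty} x^2\,dF_1(x) = 1.$$

In the limit we then have $A^{\dagger} = 1$ and $B^{\dagger} = 0$, hence

$$(X^TX)^{-1} \sim \frac{1}{k}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$

Similarly, using (3.6) we find that

$$X^T\Sigma X = k^2\begin{bmatrix} A_k^{\ddagger} & B_k^{\ddagger} \\ B_k^{\ddagger} & C_k^{\ddagger} \end{bmatrix},$$

where

$$\begin{aligned} A_k^{\ddagger} &= k^{-2}\sum_{i=1}^{k}\sum_{j=1}^{k} \frac{u_i \wedge u_j - u_i u_j}{f_{1,i}\,f_{1,j}}, \\ B_k^{\ddagger} &= k^{-2}\sum_{i=1}^{k}\sum_{j=1}^{k} F_1^{-1}(u_j)\,\frac{u_i \wedge u_j - u_i u_j}{f_{1,i}\,f_{1,j}}, \\ C_k^{\ddagger} &= k^{-2}\sum_{i=1}^{k}\sum_{j=1}^{k} F_1^{-1}(u_i)F_1^{-1}(u_j)\,\frac{u_i \wedge u_j - u_i u_j}{f_{1,i}\,f_{1,j}}, \\ f_{1,i} &= f_1(F_1^{-1}(u_i)). \end{aligned} \tag{3.15}$$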

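Then letting $k \to \infty$ we find

$$\begin{aligned}
A_k^\ddagger &= k^{-2} \sum_{i=1}^{k} \sum_{j=1}^{k} \frac{\frac{i \wedge j}{k} - \frac{i}{k}\frac{j}{k}}{f_1(F_1^{-1}(\frac{i}{k})) \, f_1(F_1^{-1}(\frac{j}{k}))} \\
&\to \int_0^1 \int_0^1 \frac{u \wedge v - uv}{f_1(F_1^{-1}(u)) \, f_1(F_1^{-1}(v))} \, du \, dv \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \left( F_1(x \wedge y) - F_1(x) F_1(y) \right) dx \, dy := A^\ddagger,
\end{aligned} \tag{3.16a}$$

the last equality following after making the substitutions $F_1(x) = v$ and $F_1(y) = u$. Similarly

$$B_k^\ddagger \to \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x \left( F_1(x \wedge y) - F_1(x) F_1(y) \right) dx \, dy := B^\ddagger \tag{3.16b}$$

and

$$C_k^\ddagger \to \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} xy \left( F_1(x \wedge y) - F_1(x) F_1(y) \right) dx \, dy := C^\ddagger. \tag{3.16c}$$

These results together with (3.15) and (3.14) show that

$$\operatorname{cov}(\tilde\beta) \approx \frac{\sigma^2}{n_\star} \begin{pmatrix} A^\ddagger & B^\ddagger \\ B^\ddagger & C^\ddagger \end{pmatrix}. \tag{3.17}$$

It is vital that we include the assumption that $\int_{-\infty}^{\infty} x \, dF_1(x)$ and $\int_{-\infty}^{\infty} x^2 \, dF_1(x)$ are both finite, otherwise $X^T X$ does not exist. This rules out OLS if the underlying distribution is, for example, Cauchy or a t distribution with 2 degrees of freedom.

Hsieh [8], Theorem 1, shows that for the GLS estimator $\hat\beta$, the covariance matrix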

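is

$$\operatorname{cov}(\hat\beta) \approx \frac{\sigma^2}{n_\star} \begin{pmatrix} \displaystyle\int_0^1 \left( \frac{\partial f_1(F_1^{-1}(u))}{\partial u} \right)^2 du & \displaystyle\int_0^1 \frac{\partial f_1(F_1^{-1}(u))}{\partial u} \, \frac{\partial \left[ f_1(F_1^{-1}(u)) F_1^{-1}(u) \right]}{\partial u} \, du \\[2ex] \displaystyle\int_0^1 \frac{\partial f_1(F_1^{-1}(u))}{\partial u} \, \frac{\partial \left[ f_1(F_1^{-1}(u)) F_1^{-1}(u) \right]}{\partial u} \, du & \displaystyle\int_0^1 \left( \frac{\partial \left[ f_1(F_1^{-1}(u)) F_1^{-1}(u) \right]}{\partial u} \right)^2 du \end{pmatrix}^{-1}. \tag{3.18}$$

The asymptotic efficiency of the OLS estimator $\tilde\mu$ relative to the GLS estimator $\hat\mu$ is defined by

$$e(\tilde\mu : \hat\mu) = \frac{\operatorname{avar}(\hat\mu)}{\operatorname{avar}(\tilde\mu)} \tag{3.19}$$

where avar stands for "asymptotic variance". The asymptotic efficiency of $\tilde\sigma$ with respect to $\hat\sigma$ is defined analogously. Recall that $u_i = i/(k+1)$. We shall set $k$ equal to 8, 16 and 50.

Table 3.1 gives the asymptotic efficiency results computed numerically from the expressions for $\operatorname{cov}(\hat\beta)$ and $\operatorname{cov}(\tilde\beta)$ given in (3.13) and (3.14). It is clear that the efficiency of the OLS estimators decreases as the tail thickness of the underlying distribution increases. In the next section we will use Monte Carlo simulation to see how the two methods compare in finite sample situations.

Table 3.1: Efficiency ratios defined in (3.19), for $\mu$ and $\sigma$, with $k$ = 8, 16 and 50.

| Distribution | $k=8$: $e(\tilde\mu:\hat\mu)$ | $k=8$: $e(\tilde\sigma:\hat\sigma)$ | $k=16$: $e(\tilde\mu:\hat\mu)$ | $k=16$: $e(\tilde\sigma:\hat\sigma)$ | $k=50$: $e(\tilde\mu:\hat\mu)$ | $k=50$: $e(\tilde\sigma:\hat\sigma)$ |
|---|---|---|---|---|---|---|
| Normal distribution | 1.00 | 0.99 | 1.00 | 0.99 | 1.00 | 0.99 |
| t distribution, 5 d.f. | 0.96 | 0.99 | 0.93 | 0.97 | 0.88 | 0.93 |
| t distribution, 3 d.f. | 0.88 | 0.97 | 0.80 | 0.90 | 0.69 | 0.69 |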
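A calculation of this kind can be sketched for the normal case as below. The sketch assumes that the true $\Sigma$, with entries $(u_i \wedge u_j - u_i u_j)/(f_{1,i} f_{1,j})$, is substituted into (3.13) and (3.14); whether Table 3.1 was produced in exactly this way is not stated, and the function name `efficiency_normal` is hypothetical.

```python
import numpy as np
from statistics import NormalDist

def efficiency_normal(k):
    """Efficiency of OLS relative to GLS, [e_mu, e_sigma], for standard normal F1.

    Assumption: the true Sigma is plugged into (3.13) and (3.14); the common
    factor sigma^2 / n_star cancels in the ratio (3.19).
    """
    nd = NormalDist()
    u = np.arange(1, k + 1) / (k + 1)                  # u_i = i/(k+1)
    q = np.array([nd.inv_cdf(p) for p in u])           # F1^{-1}(u_i)
    f = np.array([nd.pdf(x) for x in q])               # f1(F1^{-1}(u_i))
    Sigma = (np.minimum.outer(u, u) - np.outer(u, u)) / np.outer(f, f)
    X = np.column_stack([np.ones(k), q])
    cov_gls = np.linalg.inv(X.T @ np.linalg.inv(Sigma) @ X)       # shape of (3.13)
    XtX_inv = np.linalg.inv(X.T @ X)
    cov_ols = XtX_inv @ X.T @ Sigma @ X @ XtX_inv                 # shape of (3.14)
    return np.diag(cov_gls) / np.diag(cov_ols)
```

Because GLS with the correct $\Sigma$ is the best linear unbiased estimator, both ratios returned by this function cannot exceed one.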

3.3. Simulation results

Monte Carlo simulations were conducted to compare the GLS method to the OLS method in finite samples. We shall look first at the bias of the estimators, secondly at the efficiency ratios, as defined in (3.19), and finally at the effect of the choice of $k$.

3.3.1. Bias of the estimators

Table 3.2 lists the bias results when $F_1$ and $F_2$ are normal distributions, and Tables 3.3 and 3.4 when $F_1$ and $F_2$ are t distributions with 3 and 5 degrees of freedom respectively. $F_1$ was a standard normal distribution and $F_2$ was the distribution of $\mu + \sigma X_1$, where $X_1$ has distribution $F_1$, and analogously when $F_1$ was a t distribution. For ease of computation the continuous EDF (2.3) was used to estimate $F_1(x)$ and $F_2(x)$ and hence $F_1^{-1}(u)$ and $F_2^{-1}(u)$. The bias is calculated simply as the arithmetic mean of the estimates over the simulations ($\bar{\hat\mu}$ and $\bar{\hat\sigma}$ along with $\bar{\tilde\mu}$ and $\bar{\tilde\sigma}$) minus the true value, that is,

$$\hat\mu_{bias} = \bar{\hat\mu} - \mu \quad \text{and} \quad \hat\sigma_{bias} = \bar{\hat\sigma} - \sigma$$

for GLS and

$$\tilde\mu_{bias} = \bar{\tilde\mu} - \mu \quad \text{and} \quad \tilde\sigma_{bias} = \bar{\tilde\sigma} - \sigma$$

for OLS. There were 1000 simulations run, with $k = 8$ regression points evenly spaced over $[0, 1]$, so

$$u = \left[ \tfrac{1}{9} \;\; \tfrac{2}{9} \;\; \ldots \;\; \tfrac{8}{9} \right].$$

The bandwidth used in (3.12) was

$$h = \frac{1.059\,s}{n2}, \qquad s = \alpha \wedge \left( \frac{\beta}{1.349} \right) \tag{3.20}$$

where $\alpha$ is the standard deviation of the observed data and $\beta$ is the interquartile range.

The results show that the bias is negligible for both OLS and GLS estimators, with slightly larger bias as the tail thickness of the distribution increases. Interestingly, the OLS

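Table 3.2: Bias of estimators when $F_1$ and $F_2$ are normally distributed

| $\mu$ | $\sigma$ | $n$ | $\hat\mu_{bias}$ | $\tilde\mu_{bias}$ | $\hat\sigma_{bias}$ | $\tilde\sigma_{bias}$ |
|---|---|---|---|---|---|---|
| 0 | 1 | 50 | 0.00 | 0.00 | 0.00 | 0.00 |
| 1 | 2 | 50 | 0.01 | 0.01 | -0.04 | -0.04 |
| 0 | 1 | 100 | 0.00 | 0.00 | 0.00 | 0.00 |
| 1 | 2 | 100 | 0.00 | 0.00 | 0.01 | 0.00 |

Table 3.3: Bias of estimators when $F_1$ and $F_2$ are t distributions with 3 d.f.

| $\mu$ | $\sigma$ | $n$ | $\hat\mu_{bias}$ | $\tilde\mu_{bias}$ | $\hat\sigma_{bias}$ | $\tilde\sigma_{bias}$ |
|---|---|---|---|---|---|---|
| 0 | 1 | 50 | 0.02 | 0.01 | 0.03 | 0.01 |
| 1 | 2 | 50 | 0.00 | 0.00 | 0.04 | 0.00 |
| 0 | 1 | 100 | 0.01 | 0.01 | 0.01 | 0.00 |
| 1 | 2 | 100 | 0.00 | 0.00 | 0.03 | -0.01 |

Table 3.4: Bias of estimators when $F_1$ and $F_2$ are t distributions with 5 d.f.

| $\mu$ | $\sigma$ | $n$ | $\hat\mu_{bias}$ | $\tilde\mu_{bias}$ | $\hat\sigma_{bias}$ | $\tilde\sigma_{bias}$ |
|---|---|---|---|---|---|---|
| 0 | 1 | 50 | 0.00 | 0.00 | 0.01 | 0.00 |
| 1 | 2 | 50 | -0.02 | -0.02 | 0.04 | 0.03 |
| 0 | 1 | 100 | 0.00 | 0.00 | 0.02 | 0.01 |
| 1 | 2 | 100 | 0.00 | 0.00 | 0.02 | 0.00 |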

estimate is less biased for $\sigma$ than the GLS estimate when the tail thickness of the distribution increases.

3.3.2. Variance of the estimators

$F_1$ and $F_2$ are normal distributions in Table 3.5 and t distributions with 3 and 5 degrees of freedom in Tables 3.6 and 3.7 respectively. $F_1$ was a standard normal distribution and $F_2$ was the distribution of $\mu + \sigma X_1$, where $X_1$ has distribution $F_1$, and analogously when $F_1$ was a t distribution. The sample sizes are equal ($n_1 = n_2 = n$) with $k = 8$. One thousand simulations were performed. The same setup as in (3.20) was used to estimate the density function. As before, the continuous EDF (2.3) was used to estimate $F_1(x)$ and $F_2(x)$ and hence $F_1^{-1}(u)$ and $F_2^{-1}(u)$. The tables show the finite sample efficiencies, found by Monte Carlo simulation, as well as the theoretical asymptotic efficiencies from Table 3.1. The difference between the mean squared error and the variance of the estimator was less than 0.1% in all cases and the bias was negligible, so the variances of the estimators were used in the efficiency calculations. Define the finite sample efficiencies by

$$e(\tilde\mu : \hat\mu) = \frac{\operatorname{var}(\hat\mu)}{\operatorname{var}(\tilde\mu)} \quad \text{and} \quad e(\tilde\sigma : \hat\sigma) = \frac{\operatorname{var}(\hat\sigma)}{\operatorname{var}(\tilde\sigma)}.$$

Table 3.5 shows the finite sample efficiencies when $F_1$ and $F_2$ are normal distributions. The asymptotic efficiencies in Table 3.1 show that for a normal distribution the GLS and OLS methods should be equally efficient, since the ratio is almost one. However, Table 3.5 suggests that OLS performs better at the smaller sample sizes and that, as expected, the difference between the two diminishes as the sample size increases.

Table 3.6 lists the efficiency ratios when $F_1$ and $F_2$ are t distributions with 3 degrees of freedom. At small sample sizes the efficiency ratio favours OLS, but as the sample size increases it changes in favour of GLS in the estimation of $\mu$. The effect is not as pronounced in the estimation of $\sigma$.

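Table 3.5: Finite sample efficiencies for a normal distribution, $k = 8$.

| $\mu$ | $\sigma$ | $n$ | Finite sample efficiency, $\mu$ | Finite sample efficiency, $\sigma$ | Asymptotic efficiency, $\mu$ | Asymptotic efficiency, $\sigma$ |
|---|---|---|---|---|---|---|
| 0 | 1 | 50 | 1.07 | 1.03 | 1.00 | 0.99 |
| 1 | 1 | 50 | 1.04 | 1.02 | 1.00 | 0.99 |
| 1 | 2 | 50 | 1.08 | 1.08 | 1.00 | 0.99 |
| 0 | 1 | 250 | 1.03 | 1.02 | 1.00 | 0.99 |
| 1 | 1 | 250 | 1.03 | 1.02 | 1.00 | 0.99 |
| 1 | 2 | 250 | 1.02 | 1.03 | 1.00 | 0.99 |

Table 3.6: Finite sample efficiencies for a t distribution with 3 d.f., $k = 8$.

| $\mu$ | $\sigma$ | $n$ | Finite sample efficiency, $\mu$ | Finite sample efficiency, $\sigma$ | Asymptotic efficiency, $\mu$ | Asymptotic efficiency, $\sigma$ |
|---|---|---|---|---|---|---|
| 0 | 1 | 50 | 1.17 | 1.17 | 0.88 | 0.97 |
| 1 | 1 | 50 | 1.14 | 1.18 | 0.88 | 0.97 |
| 1 | 2 | 50 | 1.12 | 1.13 | 0.88 | 0.97 |
| 0 | 1 | 250 | 0.97 | 0.99 | 0.88 | 0.97 |
| 1 | 1 | 250 | 0.95 | 1.04 | 0.88 | 0.97 |
| 1 | 2 | 250 | 0.94 | 1.03 | 0.88 | 0.97 |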

The second t distribution examined had 5 degrees of freedom and the results are given in Table 3.7. As with the previous results, the finite sample efficiency does not reflect the asymptotic efficiency for the sample sizes considered.

Table 3.7: Finite sample efficiencies for a t distribution with 5 d.f., $k = 8$.

| $\mu$ | $\sigma$ | $n$ | Finite sample efficiency, $\mu$ | Finite sample efficiency, $\sigma$ | Asymptotic efficiency, $\mu$ | Asymptotic efficiency, $\sigma$ |
|---|---|---|---|---|---|---|
| 0 | 1 | 50 | 1.15 | 1.08 | 0.96 | 0.99 |
| 1 | 1 | 50 | 1.16 | 1.08 | 0.96 | 0.99 |
| 1 | 2 | 50 | 1.08 | 1.11 | 0.96 | 0.99 |
| 0 | 1 | 250 | 1.05 | 1.05 | 0.96 | 0.99 |
| 1 | 1 | 250 | 1.03 | 1.03 | 0.96 | 0.99 |
| 1 | 2 | 250 | 1.03 | 1.04 | 0.96 | 0.99 |

In conclusion, GLS does not perform as well as OLS at small sample sizes but improves as the sample size increases. Further simulations were run to determine at what sample size the simulated efficiencies come close to the asymptotic efficiencies. In all cases a sample size of $n_1 = n_2 = 2500$ was required to bring the finite sample efficiency to within 0.03 of the asymptotic efficiency.

In practice we cannot obtain variance estimates by simulation and we have to plug our estimates into (3.13) and (3.14) to get an estimate of the variance. The plug-in estimates of the covariance matrices of the estimators are

$$\text{plug-in cov(GLS)} = \frac{\hat\sigma^2}{n} (X^T \Sigma_\star^{-1} X)^{-1} = \begin{pmatrix} A_1 & A_3 \\ A_3 & A_2 \end{pmatrix}$$

where $X$ is defined in (3.8) and $\Sigma_\star$ was defined following (3.12). Similarly

$$\text{plug-in cov(OLS)} = \frac{\hat\sigma^2}{n} \left( (X^T X)^{-1} (X^T \Sigma_\star X) (X^T X)^{-1} \right) = \begin{pmatrix} B_1 & B_3 \\ B_3 & B_2 \end{pmatrix}.$$

$A_1$ and $B_1$ are the variances for $\hat\mu$ and $\tilde\mu$, and $A_2$ and $B_2$ are the variances for $\hat\sigma$ and $\tilde\sigma$ respectively. These will be compared to the true variances, i.e. those found by simulation,

to determine the reliability of the plug-in estimate. Define

$$\hat R_{\hat\mu} = \frac{A_1}{\operatorname{var}(\hat\mu)} \quad \text{and} \quad \hat R_{\hat\sigma} = \frac{A_2}{\operatorname{var}(\hat\sigma)}$$

as the reliability of the plug-in estimate of the variance for GLS. Similarly define

$$\hat R_{\tilde\mu} = \frac{B_1}{\operatorname{var}(\tilde\mu)} \quad \text{and} \quad \hat R_{\tilde\sigma} = \frac{B_2}{\operatorname{var}(\tilde\sigma)}$$

for OLS. The variances in the denominators of these expressions are the simulated (true) variances. Table 3.8 lists the reliability of the plug-in estimate for a normal distribution, and Tables 3.9 and 3.10 do the same for t distributions with 3 and 5 degrees of freedom respectively. The simulation setup was the same as for the finite sample efficiency simulations.

Table 3.8: Reliability of the plug-in variance estimates for a normal distribution, $k = 8$.

| $\mu$ | $\sigma$ | $n$ | $\hat R_{\hat\mu}$ | $\hat R_{\hat\sigma}$ | $\hat R_{\tilde\mu}$ | $\hat R_{\tilde\sigma}$ |
|---|---|---|---|---|---|---|
| 0 | 1 | 50 | 0.90 | 0.86 | 1.08 | 0.97 |
| 1 | 1 | 50 | 0.92 | 0.92 | 1.07 | 1.02 |
| 1 | 2 | 50 | 0.92 | 0.89 | 1.12 | 1.05 |
| 0 | 1 | 250 | 0.92 | 0.98 | 0.99 | 1.03 |
| 1 | 1 | 250 | 0.97 | 1.03 | 1.05 | 1.08 |
| 1 | 2 | 250 | 0.95 | 0.93 | 1.02 | 1.00 |

The reliability of the plug-in estimate of the variance when $F_1$ and $F_2$ are normal distributions seems to be about the same for GLS and OLS. Table 3.9 shows that OLS provides a more reliable estimate of the variance than GLS. The GLS plug-in estimate also tends to underestimate the variance. Table 3.10 also shows the OLS plug-in estimate performing better than GLS, specifically at the smaller sample sizes. The next section will look at how the choice of $k$ affects the estimation.

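Table 3.9: Reliability of the plug-in variance estimates for a t distribution with 3 d.f., $k = 8$.

| $\mu$ | $\sigma$ | $n$ | $\hat R_{\hat\mu}$ | $\hat R_{\hat\sigma}$ | $\hat R_{\tilde\mu}$ | $\hat R_{\tilde\sigma}$ |
|---|---|---|---|---|---|---|
| 0 | 1 | 50 | 0.83 | 0.74 | 1.07 | 1.10 |
| 1 | 1 | 50 | 0.85 | 0.81 | 1.04 | 1.06 |
| 1 | 2 | 50 | 0.84 | 0.81 | 1.02 | 1.04 |
| 0 | 1 | 250 | 0.91 | 0.89 | 0.95 | 0.91 |
| 1 | 1 | 250 | 1.01 | 0.87 | 1.03 | 0.94 |
| 1 | 2 | 250 | 0.94 | 0.85 | 0.95 | 0.91 |

Table 3.10: Reliability of the plug-in variance estimates for a t distribution with 5 d.f., $k = 8$.

| $\mu$ | $\sigma$ | $n$ | $\hat R_{\hat\mu}$ | $\hat R_{\hat\sigma}$ | $\hat R_{\tilde\mu}$ | $\hat R_{\tilde\sigma}$ |
|---|---|---|---|---|---|---|
| 0 | 1 | 50 | 0.82 | 0.80 | 1.02 | 0.94 |
| 1 | 1 | 50 | 0.78 | 0.79 | 0.98 | 0.94 |
| 1 | 2 | 50 | 0.86 | 0.79 | 1.01 | 0.96 |
| 0 | 1 | 250 | 0.97 | 0.87 | 1.05 | 0.94 |
| 1 | 1 | 250 | 0.93 | 0.94 | 0.99 | 0.99 |
| 1 | 2 | 250 | 0.96 | 0.91 | 1.03 | 0.97 |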

3.3.3. The choice of k

The choice of $k = 8$ in the previous results seems rather arbitrary, so we considered a range of values for $k$ between $k = 2$ and $k = 50$. The sample sizes were $n_1 = n_2 = 100$ and the parameters were $\mu = 1$ and $\sigma = 2$. As before, the continuous EDF (2.3) was used to estimate $F_1(x)$ and $F_2(x)$ and hence $F_1^{-1}(u)$ and $F_2^{-1}(u)$. The regression points were evenly spaced over $[0, 1]$, giving us

$$u_i = \frac{i}{k+1}, \quad i = 1, \ldots, k.$$

One thousand simulations were performed at every $k$. Figure 3.1 shows the effect of $k$ on the estimation when $F_1$ is a t distribution with 3 d.f. In this setup the choice of $k$ has an obvious effect on the estimation when using the GLS method. As $k$ increases, the variance increases, showing that the additional information contained in the added regression points decreases the performance of the GLS method, rather than improving it as intuition would suggest. This agrees with Theorem 1 in Hsieh [8], which requires $k = o(n^{1/6})$ for asymptotic efficiency. The same pattern was observed when the degrees of freedom were increased to five (Figure 3.2) as well as when $F_1$ was a normal distribution (Figure 3.3). Taken together, these results suggest that a relatively low value of $k$, of around 10, would perform best across various types of distributions.
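The simulation loop over $k$ can be sketched as follows for the OLS estimator only. This is an illustrative sketch rather than the code behind Figures 3.1 to 3.3: `numpy.quantile` stands in for the continuous EDF (2.3), and the function names, the default of 200 repetitions and the seed are hypothetical.

```python
import numpy as np

def regression_points(k):
    """Evenly spaced regression points u_i = i/(k+1), i = 1, ..., k."""
    return np.arange(1, k + 1) / (k + 1)

def ols_mse(k, n=100, mu=1.0, sigma=2.0, reps=200, seed=0):
    """Monte Carlo MSE of the OLS estimates of (mu, sigma) for t(3) data.

    Assumption: sample quantiles from numpy replace the continuous EDF quantiles.
    """
    rng = np.random.default_rng(seed)
    u = regression_points(k)
    est = np.empty((reps, 2))
    for r in range(reps):
        x1 = rng.standard_t(3, size=n)
        x2 = mu + sigma * rng.standard_t(3, size=n)
        X = np.column_stack([np.ones(k), np.quantile(x1, u)])
        est[r] = np.linalg.lstsq(X, np.quantile(x2, u), rcond=None)[0]
    return ((est - np.array([mu, sigma])) ** 2).mean(axis=0)
```

Calling `ols_mse(k)` for a grid of values of `k` gives the kind of curve plotted for OLS; a GLS curve would additionally need the plug-in $\Sigma_\star$ from (3.12).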

Figure 3.1: Effect of the choice of k for a t distribution with 3 d.f. [Two panels, "MSE for µ" and "MSE for σ", each plotting MSE against the number of regression points, k, for GLS and OLS.]

Figure 3.2: Effect of the choice of k for a t distribution with 5 d.f. [Two panels, "MSE for µ" and "MSE for σ", each plotting MSE against the number of regression points, k, for GLS and OLS.]

Figure 3.3: Effect of the choice of k for a standard normal distribution. [Two panels, "MSE for µ" and "MSE for σ", each plotting MSE against the number of regression points, k, for GLS and OLS.]

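3.4. The location only model

The special case where $\sigma = 1$ will now be considered. The assumed relationship between the two distributions is then $X_2 = \mu + X_1$ in distribution, and (3.4) reduces to

$$\hat F_2^{-1}(u) = \mu + \hat F_1^{-1}(u) + \epsilon_\star(u)$$

with

$$\operatorname{cov}(\epsilon_\star(u_i), \epsilon_\star(u_j)) = \frac{u_i \wedge u_j - u_i u_j}{n_\star f_1(F_1^{-1}(u_i)) f_1(F_1^{-1}(u_j))}.$$

Thus, we have a simple two-sample location problem and the OLS estimator of $\mu$ is therefore

$$\tilde\mu = \frac{1}{k} \sum_{j=1}^{k} \left( \hat F_2^{-1}(u_j) - \hat F_1^{-1}(u_j) \right),$$

which is the mean of the differences between the estimates $\hat F_2^{-1}$ and $\hat F_1^{-1}$ at the regression points. The variance of $\tilde\mu$ is

$$\operatorname{var}(\tilde\mu) \approx \frac{\sigma^2}{n_\star} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \left( F_1(x \wedge y) - F_1(x) F_1(y) \right) dx \, dy.$$

The estimation of $\mu$ with GLS is, however, more complicated. We have

$$\Sigma_\star^{-1/2} Y = \Sigma_\star^{-1/2} X \beta + e$$

where

$$Y = \begin{pmatrix} \hat F_2^{-1}(u_1) - \hat F_1^{-1}(u_1) \\ \vdots \\ \hat F_2^{-1}(u_k) - \hat F_1^{-1}(u_k) \end{pmatrix}; \quad X = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}; \quad \beta = \mu; \quad e = \Sigma_\star^{-1/2} \epsilon_\star.$$

The estimator for $\mu$ is then

$$\hat\mu = \left( X^T \Sigma_\star^{-1} X \right)^{-1} X^T \Sigma_\star^{-1} Y. \tag{3.21}$$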

If we write

$$\Sigma_\star^{-1} = \begin{pmatrix} \alpha_{1,1} & \ldots & \alpha_{1,k} \\ \alpha_{2,1} & \ldots & \alpha_{2,k} \\ \vdots & \ddots & \vdots \\ \alpha_{k,1} & \ldots & \alpha_{k,k} \end{pmatrix},$$

then using (3.21) and defining $Y_j$ to be the $j$th element of $Y$, we have

$$\hat\mu = \frac{\sum_{j=1}^{k} Y_j \sum_{i=1}^{k} \alpha_{i,j}}{\sum_{j=1}^{k} \sum_{i=1}^{k} \alpha_{i,j}}.$$

The variance of $\hat\mu$ is, by specialising (3.18) to the location only case,

$$\operatorname{var}(\hat\mu) \approx \frac{\sigma^2}{n_\star} \left( \int_0^1 \left( \frac{\partial f_1(F_1^{-1}(u))}{\partial u} \right)^2 du \right)^{-1}.$$

Table 3.11 shows results for the location only model when $F_1$ and $F_2$ are normal distributions. One thousand simulations were used, with $k = 8$ and $n_1 = n_2 = n$. The same setup as in (3.20) was used to estimate the density function. The continuous EDF (2.3) was used to estimate $F_1(x)$ and $F_2(x)$ and hence $F_1^{-1}(u)$ and $F_2^{-1}(u)$. Table 3.11 gives the finite sample efficiency,

$$e(\tilde\mu : \hat\mu) = \frac{\operatorname{var}(\hat\mu)}{\operatorname{var}(\tilde\mu)}.$$

Tables 3.12 and 3.13 show the same setup when $F_1$ is t distributed with 3 and 5 degrees of freedom respectively. The finite sample efficiencies of OLS relative to GLS show that OLS is more efficient at the sample sizes considered. As in the location and scale case, GLS becomes more efficient relative to OLS as the sample size increases.
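The weighted-average form of $\hat\mu$ can be sketched directly from the entries $\alpha_{i,j}$. The function name `location_gls` is hypothetical and the inverse matrix is assumed to be supplied by the caller.

```python
def location_gls(y, sigma_inv):
    """GLS location estimate: sum_j Y_j * (column sum j) / (sum of all entries).

    y         : list of differences Y_j between the two estimated quantiles
    sigma_inv : k x k nested list holding the entries alpha_{i,j} of the
                inverse of Sigma_star (assumed to be precomputed)
    """
    k = len(y)
    col_sums = [sum(row[j] for row in sigma_inv) for j in range(k)]
    return sum(yj * cj for yj, cj in zip(y, col_sums)) / sum(col_sums)
```

When the inverse matrix is the identity, every column sum equals one and the estimate reduces to the plain mean of the $Y_j$, which is the OLS estimator $\tilde\mu$.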

Table 3.11: Finite sample efficiencies for the location only model for a normal distribution

| $\mu$ | $n$ | $e(\tilde\mu:\hat\mu)$ |
|---|---|---|
| 0 | 50 | 1.10 |
| 1 | 50 | 1.08 |
| 2 | 50 | 1.12 |
| 0 | 100 | 1.04 |
| 1 | 100 | 1.07 |
| 2 | 100 | 1.08 |

Table 3.12: Finite sample efficiencies for the location only model for a t distribution with 3 d.f.

| $\mu$ | $n$ | $e(\tilde\mu:\hat\mu)$ |
|---|---|---|
| 0 | 50 | 1.15 |
| 1 | 50 | 1.11 |
| 2 | 50 | 1.12 |
| 0 | 100 | 1.02 |
| 1 | 100 | 1.05 |
| 2 | 100 | 1.06 |

Table 3.13: Finite sample efficiencies for the location only model for a t distribution with 5 d.f.

| $\mu$ | $n$ | $e(\tilde\mu:\hat\mu)$ |
|---|---|---|
| 0 | 50 | 1.16 |
| 1 | 50 | 1.18 |
| 2 | 50 | 1.15 |
| 0 | 100 | 1.09 |
| 1 | 100 | 1.12 |
| 2 | 100 | 1.10 |

3.5. Change of response variable

Thus far we have estimated $\mu$ and $\sigma$ by regressing $\hat F_2^{-1}(u)$ on $\hat F_1^{-1}(u)$, by analogy with the relation

$$Y \overset{d}{=} \mu + \sigma X.$$

However, if we reparameterize the model as

$$X \overset{d}{=} \nu + \tau Y$$

then we would estimate $\nu$ and $\tau$ by regressing $\hat F_1^{-1}(u)$ on $\hat F_2^{-1}(u)$. Since $\tau = \frac{1}{\sigma}$ and $\nu = -\frac{\mu}{\sigma}$, logical consistency would require that

$$\hat\tau = \frac{1}{\hat\sigma} \quad \text{and} \quad \hat\nu = -\frac{\hat\mu}{\hat\sigma}. \tag{3.22}$$

However, from regression theory we know that (3.22) does not hold in general, because the regression of $Y$ on $X$ and the regression of $X$ on $Y$ generally produce different fitted lines. We consider the following two cases:

$$\text{Case 1: } F_2^{-1} = \mu + \sigma F_1^{-1}, \qquad \text{Case 2: } F_1^{-1} = \nu + \tau F_2^{-1}. \tag{3.23}$$

In case 1, $\mu$ and $\sigma$ were estimated by GLS. In case 2, $\nu$ and $\tau$ were estimated by GLS and then converted to estimates of $\mu$ and $\sigma$ using (3.22). The sample sizes were 50, 250 and 500, with 1000 repetitions. $F_1$ was a standard normal distribution, $F_2$ was normal($\mu$, $\sigma$) and $k$ was set to 8. The same setup as in (3.20) was used to estimate the density function. The continuous EDF (2.3) was used to estimate $F_1(x)$ and $F_2(x)$ and hence $F_1^{-1}(u)$ and $F_2^{-1}(u)$. The mean of the estimates over the 1000 repetitions is given in Table 3.14. As can be seen, the difference between the two sets of estimates is very minor and thus of little or no practical consequence.
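The two cases in (3.23) can be compared with a small sketch. Note that it uses ordinary least squares in both directions, not the GLS fit used for Table 3.14, purely to keep the example short; the function names are hypothetical.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def both_directions(q1, q2):
    """Estimates of (mu, sigma) from Case 1 and from Case 2 of (3.23).

    q1, q2 : estimated quantiles of groups 1 and 2 at the regression points.
    Case 2 regresses q1 on q2 and converts (nu, tau) via mu = -nu/tau, sigma = 1/tau.
    """
    mu1, sigma1 = simple_ols(q1, q2)          # Case 1: regress q2 on q1
    nu, tau = simple_ols(q2, q1)              # Case 2: regress q1 on q2
    return (mu1, sigma1), (-nu / tau, 1.0 / tau)
```

For least squares with a positive slope, the converted Case 2 slope $1/\hat\tau$ is never smaller than the Case 1 slope $\hat\sigma$, with equality only when the points lie exactly on a line, which is why (3.22) holds only approximately.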

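Table 3.14: Estimates of $\mu$ and $\sigma$ in cases 1 and 2 from (3.23).

| $\mu$ | $\sigma$ | $n$ | Case 1: $\hat\mu$ | Case 1: $\hat\sigma$ | Case 2: $\hat\mu$ | Case 2: $\hat\sigma$ |
|---|---|---|---|---|---|---|
| 0 | 1 | 50 | 0.00 | 1.00 | 0.00 | 1.00 |
| 0 | 5 | 50 | -0.05 | 4.97 | -0.03 | 4.99 |
| 5 | 1 | 50 | 5.01 | 1.00 | 4.99 | 1.00 |
| 5 | 5 | 50 | 5.05 | 4.99 | 5.06 | 4.99 |
| 0 | 1 | 250 | 0.00 | 1.00 | 0.00 | 1.00 |
| 0 | 5 | 250 | -0.01 | 5.00 | -0.01 | 5.00 |
| 5 | 1 | 250 | 5.00 | 1.00 | 5.01 | 1.00 |
| 5 | 5 | 250 | 4.99 | 4.99 | 5.00 | 4.99 |
| 0 | 1 | 500 | 0.00 | 1.00 | 0.00 | 1.00 |
| 0 | 5 | 500 | 0.01 | 5.01 | 0.01 | 5.01 |
| 5 | 1 | 500 | 5.00 | 1.00 | 5.00 | 1.00 |
| 5 | 5 | 500 | 5.00 | 4.99 | 5.01 | 4.99 |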

Chapter 4

A semi-parametric regression method for censored data

The scenario investigated in Chapter 3 will now be extended to the case where censoring of the observations can occur. This is discussed in Hsieh [10] and we will follow the methodology set out there. The following notation will be used when the data are censored.

The data from the two groups will still be denoted by $X_1 = \{X_{1,1}, \ldots, X_{1,n_1}\}$ and $X_2 = \{X_{2,1}, \ldots, X_{2,n_2}\}$ with CDFs $F_1$ and $F_2$ respectively. The censoring observations will be $C_1 = \{C_{1,1}, \ldots, C_{1,n_1}\}$ and $C_2 = \{C_{2,1}, \ldots, C_{2,n_2}\}$ with survival functions $K_1$ and $K_2$ respectively. The observed data will then be denoted by $\tilde X_1 = \{\tilde X_{1,1}, \ldots, \tilde X_{1,n_1}\}$ and $\tilde X_2 = \{\tilde X_{2,1}, \ldots, \tilde X_{2,n_2}\}$, where $\tilde X_{1,i} = X_{1,i} \wedge C_{1,i}$ and $\tilde X_{2,i} = X_{2,i} \wedge C_{2,i}$. The observed censoring indicators will be $\delta_{1,i} = I(X_{1,i} \le C_{1,i})$ and $\delta_{2,i} = I(X_{2,i} \le C_{2,i})$. Denote the observed data, ordered in their first component, by

$$(\tilde X_{1,(i)}, \delta_{1,(i)}), \; i = 1, \ldots, n_1 \quad \text{and} \quad (\tilde X_{2,(j)}, \delta_{2,(j)}), \; j = 1, \ldots, n_2$$

with $\tilde X_{1,(1)} < \ldots < \tilde X_{1,(n_1)}$ and $\tilde X_{2,(1)} < \ldots < \tilde X_{2,(n_2)}$.

The extension to the censored case is based on the following model

$$T_2 = T_1^{1/\gamma} \lambda, \tag{4.1}$$

where $T_1$ and $T_2$ are random variables representing the lifetimes in the two treatment

groups and $\gamma, \lambda > 0$. Taking logarithms in (4.1) gives

$$\log T_2 = \frac{1}{\gamma} \log T_1 + \log \lambda. \tag{4.2}$$

Set $\mu = \log \lambda$, $\sigma = \frac{1}{\gamma}$, $X_1 = \log T_1$ and $X_2 = \log T_2$. Then (4.2) becomes

$$X_2 = \mu + \sigma X_1,$$

which is the model considered in Chapter 3. Following the methodology set out there, we postulate a heteroscedastic regression model

$$\hat F_2^{-1}(u) = \mu + \sigma \hat F_1^{-1}(u) + \epsilon(u), \tag{4.3}$$

where now $\hat F_1$ and $\hat F_2$ are Kaplan-Meier estimators of $F_1$ and $F_2$, i.e. for $j = 1, 2$

$$\hat F_j(t) = 1 - \prod_{\{i : \tilde X_{j,(i)} \le t\}} \left( \frac{n_j - i}{n_j - i + 1} \right)^{\delta_{j,(i)}}, \tag{4.4}$$

where $t \ge 0$. Monte Carlo simulations will be undertaken to investigate the relative effectiveness of the GLS and OLS estimators of $\mu$ and $\sigma$.

4.1. The regression setup

Expanding (4.3) with $0 \le u_1 \le \ldots \le u_k \le 1$, we have

$$\begin{aligned}
\hat F_2^{-1}(u_1) &= \mu + \sigma \hat F_1^{-1}(u_1) + \epsilon(u_1) \\
&\;\;\vdots \\
\hat F_2^{-1}(u_k) &= \mu + \sigma \hat F_1^{-1}(u_k) + \epsilon(u_k),
\end{aligned} \tag{4.5}$$

an ordinary simple regression setup with response variable $\hat F_2^{-1}(u)$ and predictor variable $\hat F_1^{-1}(u)$. Hsieh [10] looks at both the OLS and GLS cases. He also shows that the GLS method is asymptotically efficient. The expressions for the asymptotic variances are complicated and will not be dealt with here. We will be focusing on the simulation results.
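The product in (4.4) can be evaluated with a short sketch. It assumes there are no ties among the observed times, as in the strict ordering above, and it returns the step-function version of the estimator, not the continuous version (2.16) used in the simulations; the function name is hypothetical.

```python
def kaplan_meier_cdf(times, deltas):
    """Evaluate the Kaplan-Meier estimate (4.4) at each ordered observed time.

    times  : observed values min(X, C), assumed to have no ties
    deltas : 1 if the observation is uncensored, 0 if censored
    Returns a list of (time, F_hat(time)) pairs in increasing order of time.
    """
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    surv = 1.0
    out = []
    for rank, idx in enumerate(order, start=1):
        if deltas[idx] == 1:
            # Factor (n - i) / (n - i + 1) for the i-th ordered observation.
            surv *= (n - rank) / (n - rank + 1)
        out.append((times[idx], 1.0 - surv))
    return out
```

With no censoring the estimate reduces to the ordinary EDF, jumping by $1/n$ at each observation; a censored observation leaves the estimate unchanged at that time.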

From (9) and (10) of Hsieh [10], $\hat F_1^{-1}$ and $\hat F_2^{-1}$ can be represented in terms of two generalized Kiefer processes with covariance functions

$$\Lambda_i = D_{(1-u)} \, C^{-1} D_{(C\gamma^{(i)})} \, (C^T)^{-1} D_{(1-u)}. \tag{4.6}$$

In (4.6), $D_g$ represents a diagonal matrix with main diagonal vector $g$, the matrix $C$ is the linear operator such that $Cu = (u_1, u_2 - u_1, \ldots, u_k - u_{k-1})^T$, and $\gamma^{(i)} = (\gamma_1^{(i)}, \ldots, \gamma_k^{(i)})^T$ with

$$\gamma_j^{(i)} = \int_0^{u_j} \frac{du}{(1-u)^2 \, K_i(F_i^{-1}(u))}, \quad 1 \le j \le k$$

(see Lemma 2.1 of Hsieh [10]; our $\gamma_j^{(i)}$ corresponds to Hsieh's [10] $t_j$). To estimate $\gamma_j^{(i)}$, first set $\hat S_i(\tilde X_{i,(l)}) = \eta_{i,l}$, $l = 1, \ldots, n_i$. These $\eta_{i,l}$'s are ordered but are not all distinct. Denote the distinct $\eta_{i,l}$'s by $\xi_{i,l}$ with $l = 1, \ldots, n_i'$, where $n_i'$ is the number of distinct values. Set $\xi_{i,0} = 1$. The frequencies of these distinct $\xi_{i,l}$'s are $f_{i,1}, \ldots, f_{i,n_i'}$ and the cumulated frequencies are $F_{i,1}, \ldots, F_{i,n_i'}$. It is shown in Section 5.2 that $\gamma_j^{(i)}$ can be estimated by

$$\hat\gamma_j^{(i)} = \sum_{l=0}^{n_i' - 1} \frac{n_i}{n_i - F_{i,l}} \log \left( \frac{1 - \xi_{i,l} \wedge u_j}{1 - \xi_{i,l+1} \wedge u_j} \right).$$

Using this estimate of $\gamma_j^{(i)}$ in (4.6) we get an estimate $\hat\Lambda_i$ of $\Lambda_i$. Then the GLS estimator is

$$\hat\beta = (X^T \Sigma_\star^{-1} X)^{-1} X^T \Sigma_\star^{-1} Y$$

where

$$Y = [\hat F_2^{-1}(u_1) \; \ldots \; \hat F_2^{-1}(u_k)]^T, \qquad X = \begin{pmatrix} 1 & \ldots & 1 \\ \hat F_1^{-1}(u_1) & \ldots & \hat F_1^{-1}(u_k) \end{pmatrix}^T,$$

and

$$\Sigma_\star = D^{-1}_{f_1^\star(\hat F_1^{-1}(u))} \left[ \frac{1}{n_1} \hat\Lambda_1 + \frac{1}{n_2} \hat\Lambda_2 \right] D^{-1}_{f_1^\star(\hat F_1^{-1}(u))}. \tag{4.7}$$

(Recall the definition of $f_1^\star(\hat F_1^{-1}(u))$ from (3.12).) The covariance matrix is

$$\operatorname{cov}(\hat\beta) = \sigma^2 (X^T \Sigma_\star^{-1} X)^{-1}.$$
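The matrix product in (4.6) can be sketched as follows, taking the vector $\gamma^{(i)}$ as given. The function name is hypothetical. As a sanity check on the construction, when there is no censoring ($K_i \equiv 1$) the integral gives $\gamma_j = u_j/(1-u_j)$ and the product reduces to the Brownian bridge covariance $u_i \wedge u_j - u_i u_j$ used in Chapter 3.

```python
import numpy as np

def kiefer_covariance(u, gamma):
    """Evaluate Lambda = D_(1-u) C^{-1} D_(C gamma) (C^T)^{-1} D_(1-u), as in (4.6).

    u     : regression points u_1 <= ... <= u_k
    gamma : vector (gamma_1, ..., gamma_k), assumed already computed or estimated
    """
    u = np.asarray(u, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    k = len(u)
    # C maps v to (v_1, v_2 - v_1, ..., v_k - v_{k-1}).
    C = np.eye(k) - np.eye(k, k=-1)
    C_inv = np.linalg.inv(C)
    D_u = np.diag(1.0 - u)
    D_g = np.diag(C @ gamma)
    return D_u @ C_inv @ D_g @ C_inv.T @ D_u
```

The $(i, j)$ entry works out to $(1-u_i)(1-u_j)\,\gamma_{i \wedge j}$, so censoring enters only through the values of $\gamma$.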

The asymptotic covariance matrix of $\hat\beta$ can be found in [10], page 2713. The OLS estimator is

$$\tilde\beta = (X^T X)^{-1} X^T Y$$

with covariance matrix

$$\operatorname{cov}(\tilde\beta) = \sigma^2 \left( (X^T X)^{-1} (X^T \Sigma_\star X) (X^T X)^{-1} \right).$$

4.2. Simulation results

A number of cases were investigated by Monte Carlo simulation. Table 4.1 lists the three cases considered and the distributions that were used. $F_1$ and $K_1$ denote the CDF of $X_1$ and the survival function of its associated censoring variable $C_1$ respectively. The $X_2$ data are distributed as $\mu + \sigma X_1$ with censoring variable $C_2$, which has survival function $K_2$. The exponential distribution, denoted by exp($\lambda$), has the density function

$$f(x; \lambda) = \frac{e^{-x/\lambda}}{\lambda}.$$

The lognormal distribution, denoted by lognormal($a$, $b$), has the density function

$$f(x; a, b) = \frac{1}{x b \sqrt{2\pi}} \exp\left( \frac{-(\ln x - a)^2}{2b^2} \right).$$

In addition we define

$$\varphi_1 = \frac{1}{n_1} \sum_{i=1}^{n_1} \delta_i^{(1)} \quad \text{and} \quad \varphi_2 = \frac{1}{n_2} \sum_{i=1}^{n_2} \delta_i^{(2)},$$

which are the observed proportions of uncensored observations. Case 1 uses the same simulation setup as Hsieh [10]. Our censoring proportions matched his for the first group but not for the second. We could not resolve this discrepancy and, as such, a direct comparison with his results was not possible.

Table 4.1: Distributions of variables $X_1$, $C_1$ and $C_2$

| Case | $X_1$ | $C_1$ | $C_2$ |
|---|---|---|---|
| 1 | exp(1) | exp(4) | exp(8) |
| 2 | log(exp(1)) | exp(2) | exp(4) |
| 3 | lognormal(1, 1) | lognormal(1.5, 1) | lognormal(2, 1) |

4.2.1. Bias of the estimators

Table 4.2 gives the bias results for the cases outlined in Table 4.1 for various values of $\mu$ and $\sigma$. There were 8 evenly spaced regression points, from 0.1 to 0.8. The simulations were run 1000 times with the sample size set to 50 for both groups. The continuous version of the Kaplan-Meier estimator (2.16) was used to estimate $F_1(t)$ and $F_2(t)$ and hence $F_1^{-1}(u)$ and $F_2^{-1}(u)$. The bias is calculated simply as the arithmetic mean of the estimates over the simulations ($\bar{\hat\mu}$ and $\bar{\hat\sigma}$ along with $\bar{\tilde\mu}$ and $\bar{\tilde\sigma}$) minus the true value, that is,

$$\hat\mu_{bias} = \bar{\hat\mu} - \mu \quad \text{and} \quad \hat\sigma_{bias} = \bar{\hat\sigma} - \sigma$$

for GLS and

$$\tilde\mu_{bias} = \bar{\tilde\mu} - \mu \quad \text{and} \quad \tilde\sigma_{bias} = \bar{\tilde\sigma} - \sigma$$

for OLS. There is hardly any bias in any of the cases. The OLS method has less bias for $\mu$ than the GLS method, though $\hat\mu_{bias}$ is still negligible.

4.2.2. Variance of the estimators

Table 4.3 gives the variance of the estimators for case 1, using the same simulation setup as in the bias simulations. The variance was used rather than the mean squared error because the bias is negligible. The sample sizes were equal ($n_1 = n_2 = n$). The finite sample efficiencies are given by

$$e(\tilde\mu : \hat\mu) = \frac{\operatorname{var}(\hat\mu)}{\operatorname{var}(\tilde\mu)} \quad \text{and} \quad e(\tilde\sigma : \hat\sigma) = \frac{\operatorname{var}(\hat\sigma)}{\operatorname{var}(\tilde\sigma)}.$$

Table 4.2: Bias of the estimators, for the cases outlined in Table 4.1

| Case | $\mu$ | $\sigma$ | $\varphi_1$ | $\varphi_2$ | $\hat\mu_{bias}$ | $\tilde\mu_{bias}$ | $\hat\sigma_{bias}$ | $\tilde\sigma_{bias}$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.5 | 0.5 | 0.80 | 0.88 | 0.01 | 0.00 | 0.02 | 0.02 |
| 1 | 1 | 1 | 0.80 | 0.78 | 0.01 | 0.01 | 0.08 | 0.09 |
| 1 | 2 | 1.5 | 0.80 | 0.66 | 0.02 | 0.00 | 0.20 | 0.23 |
| 2 | 0.5 | 0.5 | 0.91 | 0.91 | -0.03 | 0.01 | 0.00 | 0.00 |
| 2 | 1 | 1 | 0.91 | 0.84 | -0.06 | -0.02 | 0.00 | 0.00 |
| 2 | 2 | 1.5 | 0.91 | 0.72 | -0.09 | 0.03 | 0.00 | -0.01 |
| 3 | 0.5 | 0.5 | 0.85 | 0.95 | 0.00 | 0.00 | 0.02 | 0.02 |
| 3 | 1 | 1 | 0.85 | 0.86 | 0.02 | 0.00 | 0.02 | 0.03 |
| 3 | 2 | 1.5 | 0.85 | 0.72 | 0.02 | 0.00 | 0.03 | 0.04 |

The results show that the GLS method outperforms the OLS method by a large margin. The variance is low for the estimation of both $\mu$ and $\sigma$ for both the OLS and GLS methods. The only notably higher variance was in the estimation of $\sigma$ when it is greater than 1 and the sample size is small.

Table 4.3: Variance of estimators for Case 1 from Table 4.1

| $\mu$ | $\sigma$ | $n$ | $\varphi_1$ | $\varphi_2$ | var($\hat\mu$) | var($\tilde\mu$) | var($\hat\sigma$) | var($\tilde\sigma$) | $\hat e(\tilde\mu:\hat\mu)$ | $\hat e(\tilde\sigma:\hat\sigma)$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 0.5 | 50 | 0.80 | 0.89 | 0.00 | 0.00 | 0.02 | 0.02 | 0.38 | 0.86 |
| 1 | 1 | 50 | 0.80 | 0.78 | 0.01 | 0.02 | 0.08 | 0.09 | 0.38 | 0.91 |
| 2 | 1.5 | 50 | 0.80 | 0.66 | 0.02 | 0.04 | 0.21 | 0.23 | 0.39 | 0.90 |
| 0.5 | 0.5 | 100 | 0.80 | 0.88 | 0.00 | 0.00 | 0.01 | 0.01 | 0.40 | 0.95 |
| 1 | 1 | 100 | 0.80 | 0.78 | 0.00 | 0.01 | 0.04 | 0.05 | 0.39 | 0.87 |
| 2 | 1.5 | 100 | 0.80 | 0.66 | 0.01 | 0.02 | 0.11 | 0.12 | 0.37 | 0.87 |

Table 4.4 provides the variance of the estimators for case 2. The variances are larger than in case 1 but are still fairly low. There is little difference between OLS and GLS, OLS giving a better estimation of $\mu$ and GLS giving a better estimation of

$\sigma$.

Table 4.4: Variance of estimators for Case 2 from Table 4.1

| $\mu$ | $\sigma$ | $n$ | $\varphi_1$ | $\varphi_2$ | var($\hat\mu$) | var($\tilde\mu$) | var($\hat\sigma$) | var($\tilde\sigma$) | $\hat e(\tilde\mu:\hat\mu)$ | $\hat e(\tilde\sigma:\hat\sigma)$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 0.5 | 50 | 0.91 | 0.91 | 0.02 | 0.02 | 0.01 | 0.01 | 1.05 | 0.96 |
| 1 | 1 | 50 | 0.91 | 0.84 | 0.06 | 0.06 | 0.05 | 0.06 | 1.06 | 0.95 |
| 2 | 1.5 | 50 | 0.91 | 0.72 | 0.15 | 0.13 | 0.13 | 0.13 | 1.10 | 1.03 |
| 0.5 | 0.5 | 100 | 0.91 | 0.92 | 0.01 | 0.01 | 0.01 | 0.01 | 1.00 | 0.94 |
| 1 | 1 | 100 | 0.91 | 0.84 | 0.03 | 0.03 | 0.03 | 0.03 | 1.03 | 0.93 |
| 2 | 1.5 | 100 | 0.91 | 0.72 | 0.07 | 0.06 | 0.06 | 0.06 | 1.05 | 0.94 |

The variance results for case 3 are given in Table 4.5. They are similar to case 1 in that GLS gives a much more accurate estimation than OLS. The variances were all low except for values of $\sigma$ greater than 1 with small sample sizes.

Table 4.5: Variance of estimators for Case 3 from Table 4.1

| $\mu$ | $\sigma$ | $n$ | $\varphi_1$ | $\varphi_2$ | var($\hat\mu$) | var($\tilde\mu$) | var($\hat\sigma$) | var($\tilde\sigma$) | $\hat e(\tilde\mu:\hat\mu)$ | $\hat e(\tilde\sigma:\hat\sigma)$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 0.5 | 50 | 0.85 | 0.95 | 0.01 | 0.01 | 0.03 | 0.03 | 0.45 | 0.83 |
| 1 | 1 | 50 | 0.85 | 0.86 | 0.02 | 0.04 | 0.09 | 0.11 | 0.42 | 0.77 |
| 2 | 1.5 | 50 | 0.85 | 0.72 | 0.04 | 0.10 | 0.22 | 0.28 | 0.39 | 0.79 |
| 0.5 | 0.5 | 100 | 0.85 | 0.95 | 0.00 | 0.01 | 0.01 | 0.01 | 0.40 | 0.75 |
| 1 | 1 | 100 | 0.85 | 0.86 | 0.01 | 0.02 | 0.05 | 0.06 | 0.40 | 0.77 |
| 2 | 1.5 | 100 | 0.85 | 0.72 | 0.02 | 0.05 | 0.11 | 0.13 | 0.37 | 0.79 |

There was a problem with the estimation due to the method of choosing the regression points, $u$. The Kaplan-Meier estimator is only defined on a certain interval, namely between 0 and the last uncensored observation. Beyond the last uncensored observation there is no more information that can be used by the product-limit estimator, as it has its jumps at the uncensored values. Thus, when there is heavy censoring, the value of the estimated CDF at the last uncensored observation may be lower than the highest regression point. Figure 4.1 illustrates this point. In

this case $F_1(x)$ is a standard exponential distribution and $C_1$ is uniformly distributed on $[0, 2.2]$, chosen so that approximately 40 percent of the data are censored, with $n = 100$. $\hat F_1(x)$ only reaches 0.89. If there were a regression point at 0.9, for instance, the corresponding quantile estimate from the product-limit estimator would be undefined. To counteract this, it is suggested that the largest regression point be less than or equal to the value of the estimated CDF at the last uncensored observation.

Figure 4.1: The estimated CDF for a standard exponential distribution with 40% censoring. [Plot of $\hat F_1(x)$ against $x$, for $x$ from 0 to 2.]
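The suggested restriction on the regression points can be sketched as a small filter. It assumes no ties among the observed times and uses the step-function product-limit estimate, not the continuous version (2.16); the function name is hypothetical.

```python
def usable_regression_points(u, times, deltas):
    """Keep only regression points not exceeding the largest value reached by
    the product-limit estimate of the CDF.

    u      : candidate regression points
    times  : observed values min(X, C), assumed to have no ties
    deltas : 1 if uncensored, 0 if censored
    """
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    surv = 1.0
    for rank, idx in enumerate(order, start=1):
        if deltas[idx] == 1:
            surv *= (n - rank) / (n - rank + 1)
    f_max = 1.0 - surv  # estimated CDF at the last uncensored observation
    return [p for p in u if p <= f_max]
```

If the largest observation is uncensored the estimate reaches one and every candidate point is kept; otherwise the upper points are dropped.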

Chapter 5

Nonparametric confidence bands for the quantile comparison function with censored data

This chapter is concerned with a fully nonparametric model in which two independent data sets, $X_1 = \{X_{1,1}, \ldots, X_{1,n_1}\}$ ($> 0$) and $X_2 = \{X_{2,1}, \ldots, X_{2,n_2}\}$ ($> 0$), come from distributions with unspecified survival functions $S_1$ and $S_2$ respectively. Our objective is to construct a confidence band for the quantile comparison function, $q = S_2^{-1}(S_1)$, and our only assumptions are that both $S_1$ and $S_2$ are continuous and strictly decreasing. Doksum and Sievers [4] considered this type of problem for complete data. The additional factor that we wish to incorporate into the analysis is incomplete data. Lu, Wells and Tiwari [15] used a bootstrap method that allowed for censored data. This chapter will look at a method that does not require the use of a bootstrap.

The data are censored and we do not observe $X_1$ and $X_2$ but rather $\tilde X_1 = \{X_{1,i} \wedge C_{1,i}, \; i = 1, \ldots, n_1\}$ and $\tilde X_2 = \{X_{2,i} \wedge C_{2,i}, \; i = 1, \ldots, n_2\}$, where the $C_{1,i}$ and $C_{2,i}$ are independent observations from continuous distributions with survival functions $K_1$ and $K_2$ respectively. The observed data therefore come from distributions with survival functions, with $t \ge 0$,

$$H(t) = P[\tilde X_{1,i} > t] = S_1(t) K_1(t)$$

and

$$J(t) = P[\tilde X_{2,i} > t] = S_2(t) K_2(t).$$

5.1. Asymptotic representation of the quantile comparison function

First, suppose there is no censoring present. Set

$$n_\star = \frac{n_1 n_2}{n_1 + n_2} = \left( \frac{1}{n_1} + \frac{1}{n_2} \right)^{-1}. \tag{5.1}$$

From Potgieter [16], Section 2.3, we see that

$$\sqrt{n_\star} \, (\hat q(t) - q(t)) = \frac{\sqrt{n_\star} \, (\hat S_1(t) - \hat S_2(q(t)))}{f_2(q(t))} + o_p(1). \tag{5.2}$$

(Since Potgieter [16] is possibly not freely available, we reproduce his derivation in Appendix B.) In principle, therefore, confidence bands for $q$ could be obtained using the asymptotic distribution of the first term on the right hand side of (5.2). However, this would require the estimation of $f_2(q(t))$, which is extremely variable where $f_2(q(t))$ is close to zero. The situation here is analogous to the estimation of a single quantile discussed in Section 21.8, page 309, of van der Vaart [18]. As pointed out there, it is simpler to base the estimation on the numerator in (5.2) alone. Our confidence band for $q$ will be

$$I := \left\{ q(t) : \sqrt{n_\star} \, \left| \hat S_1(t) - \hat S_2(q(t)) \right| \le C_\alpha \right\} \tag{5.3}$$

where $C_\alpha$ is chosen so that $P(q(t) \in I \mid S_1, S_2) = 1 - \alpha$. Notice that

$$\begin{aligned}
\hat S_2(q(t)) &= 1 - \frac{1}{n_2} \sum_{i=1}^{n_2} I(X_{2,i} \le F_2^{-1}(F_1(t))) \\
&= 1 - \frac{1}{n_2} \sum_{i=1}^{n_2} I(F_2(X_{2,i}) \le F_1(t)) \\
&= 1 - \frac{1}{n_2} \sum_{i=1}^{n_2} I(U_{2,i} \le F_1(t))
\end{aligned}$$

and

$$\begin{aligned}
\hat S_1(t) &= 1 - \frac{1}{n_1} \sum_{i=1}^{n_1} I(F_1(X_{1,i}) \le F_1(t)) \\
&= 1 - \frac{1}{n_1} \sum_{i=1}^{n_1} I(U_{1,i} \le F_1(t))
\end{aligned}$$

where the $U_{1,i}$ and $U_{2,i}$ are uniformly distributed between zero and one. Therefore the distributions of $\hat S_2(q(t))$ and $\hat S_1(t)$ are independent of $F_2$. Thus, we may assume that $S_1 \equiv S_2$. Set

$$T = \sqrt{n_\star} \, \sup_t \left| \hat S_1(t) - \hat S_2(q(t)) \right|.$$

Then with $S_1 = S_2$ ($= S$), i.e. $q(t) = t$,

$$\begin{aligned}
T &= \sqrt{n_\star} \, \sup_t \left| \hat S_1(t) - \hat S_2(t) \right| \\
&= \sqrt{n_\star} \, \sup_t \left| (\hat S_1(t) - S(t)) - (\hat S_2(t) - S(t)) \right| \\
&= \sup_t \left| \sqrt{\frac{n_\star}{n_1}} \sqrt{n_1} \, (\hat S_1(t) - S_1(t)) - \sqrt{\frac{n_\star}{n_2}} \sqrt{n_2} \, (\hat S_2(t) - S_2(t)) \right| \\
&= \sup_{0 \le u \le 1} \left| \sqrt{\frac{n_\star}{n_1}} \sqrt{n_1} \, (\hat S_1(S_1^{-1}(u)) - u) - \sqrt{\frac{n_\star}{n_2}} \sqrt{n_2} \, (\hat S_2(S_2^{-1}(u)) - u) \right| \\
&= \sup_{0 \le u \le 1} \left| \sqrt{\frac{n_\star}{n_1}} \, \bar Z_1(u) - \sqrt{\frac{n_\star}{n_2}} \, \bar Z_2(u) \right|,
\end{aligned} \tag{5.4}$$

where $\bar Z_i(u)$, $i = 1, 2$, was defined in (2.20). Thus,

$$P(q(t) \in I \mid S_1, S_2) = P(T \le C_\alpha \mid S_1 \equiv S_2). \tag{5.5}$$
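For uncensored data and $q(t) = t$, the statistic $T$ is the scaled two-sample Kolmogorov-Smirnov distance between the empirical survival functions. A minimal sketch of its computation is given below; the function name is hypothetical and the sketch does not address how $C_\alpha$ is obtained.

```python
import math

def sup_statistic(x1, x2):
    """Compute T = sqrt(n_star) * sup_t |S1_hat(t) - S2_hat(t)| for complete data.

    The empirical survival functions are step functions that change only at
    data points, so it suffices to evaluate the difference at the pooled
    observations.
    """
    n1, n2 = len(x1), len(x2)
    n_star = n1 * n2 / (n1 + n2)
    sup = 0.0
    for t in sorted(set(x1) | set(x2)):
        s1 = sum(x > t for x in x1) / n1   # empirical survival function, sample 1
        s2 = sum(x > t for x in x2) / n2   # empirical survival function, sample 2
        sup = max(sup, abs(s1 - s2))
    return math.sqrt(n_star) * sup
```

Two completely separated samples give the largest possible supremum of one, so that $T = \sqrt{n_\star}$, while identical samples give $T = 0$.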
