Computational issues - Quantile based estimation of treatment effects in censored data

An empirical estimate of this probability using the continuous version of the Kaplan-Meier estimator (2.16) is

The integrals in (5.12) can be evaluated using numerical integration. An alternative method is as follows. Set $\tilde S_1(\tilde X_{1,(i)}) = \eta_{1,(i)}$, $i = 1, \dots, n_1$. These $\eta_{1,(i)}$'s are not all distinct due to possibly tied data. Denote the distinct $\eta_{1,(i)}$'s by $\xi_{1,(j)}$ with $j = 1, \dots, n'_1$, where $n'_1$ is the number of distinct values. Set $\xi_{1,(0)} = 1$. The frequencies of these distinct $\xi_{1,(j)}$'s are $f_{1,(1)}, \dots, f_{1,(n'_1)}$ and the accumulated frequencies are denoted by $F_{1,(1)}, \dots, F_{1,(n'_1)}$.

Hence

This gives us an expression for ˆθ(u) which we can use in (5.10). The next step is to solve (5.10) for b.
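The bookkeeping of distinct values, frequencies and accumulated frequencies can be sketched in a few lines of Python. This is only an illustration: the function name and the sample values are hypothetical, the input is assumed to be listed in the order of the ordered observations (so tied values are adjacent), and the accumulated frequency is assumed to be the running sum of the frequencies.

```python
from itertools import accumulate, groupby

def distinct_with_frequencies(eta):
    """Collapse runs of tied values in eta into distinct values xi,
    their frequencies f, and accumulated frequencies F.

    Assumes eta is given in the order of the ordered observations,
    so that tied values sit next to each other.
    """
    groups = [(value, len(list(run))) for value, run in groupby(eta)]
    xi = [value for value, _ in groups]   # distinct values, in order of appearance
    f = [count for _, count in groups]    # how often each distinct value occurs
    F = list(accumulate(f))               # assumed definition: running sum of f
    return xi, f, F

# Made-up survival-function values at six ordered observations.
eta_1 = [0.9, 0.9, 0.7, 0.4, 0.4, 0.4]
xi_1, f_1, F_1 = distinct_with_frequencies(eta_1)
print(xi_1, f_1, F_1)
```

The same function would be applied to the second sample to obtain its distinct values and frequencies.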

We mentioned that appropriate values of $u_\star$ and $u^\star$ must be chosen. The obvious choices would be 0 and 1 respectively. However, if $u_\star$ is less than both $\xi_{1,(1)}$ and $\xi_{2,(1)}$ then (5.11) will be zero. Thus $u_\star$ should be chosen so that it is larger than $\xi_{1,(1)}$ or $\xi_{2,(1)}$. The choice of $u^\star$ is restricted by the estimator $\tilde S(\cdot)$ because it is undefined after the last uncensored observation. Thus $u^\star$ should not exceed the minimum of $\xi_{1,(n'_1)}$ and $\xi_{2,(n'_2)}$. That is,

$$u_\star \ge \max(\xi_{1,(1)}, \xi_{2,(1)}) \quad\text{and}\quad u^\star \le \min(\xi_{1,(n'_1)}, \xi_{2,(n'_2)}). \tag{5.14}$$

The integral in (5.10) can be evaluated by numerical integration, using (5.13) and the bounds given in (5.14). The solution in b of the equation can then be found numerically using, for instance, the Matlab function fzero.m.
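For readers without Matlab, the same two-step computation (numerical integration inside a root search) can be sketched in plain Python. Equation (5.10) is not reproduced here, so `toy_equation` below is a stand-in with the same shape, an integral over $[u_\star, u^\star]$ of an integrand depending on b set equal to a target value; the integrand, bounds and target are all hypothetical. Bisection is used in place of `fzero`, which is a different (hybrid) algorithm.

```python
import math

def trapezoid(h, lower, upper, n=2000):
    """Composite trapezoidal rule for the integral of h over [lower, upper]."""
    step = (upper - lower) / n
    total = 0.5 * (h(lower) + h(upper))
    for k in range(1, n):
        total += h(lower + k * step)
    return total * step

def bisect_root(g, lo, hi, tol=1e-10, max_iter=200):
    """Find b in [lo, hi] with g(b) = 0, assuming g changes sign on [lo, hi]."""
    g_lo, g_hi = g(lo), g(hi)
    if g_lo * g_hi > 0:
        raise ValueError("g(lo) and g(hi) must have opposite signs")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_mid == 0 or hi - lo < tol:
            return mid
        if g_lo * g_mid < 0:
            hi = mid                  # root lies in the left half
        else:
            lo, g_lo = mid, g_mid     # root lies in the right half
    return 0.5 * (lo + hi)

# Hypothetical stand-in for (5.10): NOT the equation from the text.
u_lower, u_upper, target = 0.0, 1.0, 0.5

def toy_equation(b):
    return trapezoid(lambda u: math.exp(-b * u), u_lower, u_upper) - target

b_hat = bisect_root(toy_equation, 0.1, 10.0)
print(b_hat)
```

In an actual implementation the integrand would be built from (5.13) and the bounds from (5.14); bisection only needs a bracketing interval on which the left-hand side minus the target changes sign.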

Table 5.1: Uniform censoring distributions

Distribution of X₁    Desired censoring %    Censoring distribution
Weibull(1,0.5)        20%                    U(0,7.6)
                      40%                    U(0,2.2)
Weibull(1,1)          20%                    U(0,5.0)
                      40%                    U(0,2.2)
Weibull(1,1.5)        20%                    U(0,4.5)
                      40%                    U(0,2.2)

5.3 Simulation results

To test the accuracy of our approximations, Monte Carlo simulations were run. We used the same simulation setup as in Lu, Wells and Tiwari [15] and provide their results in Table 5.3 for comparison. Three Weibull distributions were considered for the distribution of X₁: Weibull(1, 0.5), Weibull(1, 1.5) and Weibull(1, 1), the last of which corresponds to a standard exponential distribution. A total of 2500 simulations were run for each setup. The censoring distribution was uniform on an interval chosen to censor either 40% or 20% of the data; Table 5.1 lists the intervals used. The density function for the Weibull, denoted Weibull(a, b), is given by

$$f(x; a, b) = b\,a^{-b}\,x^{b-1}\exp\left(-\left(\frac{x}{a}\right)^{b}\right)$$

where x > 0.
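The censoring intervals in Table 5.1 can be checked by a quick simulation. The sketch below assumes that a is the scale and b the shape parameter (as the density above suggests), that the censoring variable C is independent of X, and that an observation counts as censored when C < X; the function name and the number of draws are illustrative choices, not part of the original study.

```python
import random

def censoring_fraction(scale, shape, upper, n=50_000, seed=1):
    """Monte Carlo estimate of P(C < X) for X ~ Weibull(scale, shape)
    and an independent censoring time C ~ U(0, upper)."""
    rng = random.Random(seed)
    censored = 0
    for _ in range(n):
        x = rng.weibullvariate(scale, shape)  # stdlib: first argument scale, second shape
        c = rng.uniform(0.0, upper)
        if c < x:                             # assumed rule: censored when C precedes X
            censored += 1
    return censored / n

# (shape, upper end of the uniform interval) pairs taken from Table 5.1, scale a = 1.
for shape, upper in [(0.5, 7.6), (0.5, 2.2), (1.0, 5.0), (1.0, 2.2), (1.5, 4.5), (1.5, 2.2)]:
    print(shape, upper, round(censoring_fraction(1.0, shape, upper), 3))
```

For the standard exponential case the probability is also available in closed form, $(1 - e^{-c})/c$ for censoring on $U(0, c)$, which gives about 0.199 for c = 5.0 and about 0.404 for c = 2.2.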

The following test statistic was used,

$$T^\star = \sqrt{\frac{n_1 n_2}{n_1 + n_2}}\,\max_{W}\left|\hat S_1(W) - \hat S_2(W)\right|, \tag{5.15}$$

where W is the combined set of the uncensored values from X₁ and X₂. As mentioned previously (see Section 4.2.2), the Kaplan-Meier estimator is not defined past the last uncensored observation, so W should not go past these bounds for either $\hat S_1(W)$ or $\hat S_2(W)$. Hence

$$W = \{X_{1,(1)}, \dots, X_{1,(\bar n_1)}, X_{2,(1)}, \dots, X_{2,(\bar n_2)}\},$$

where $\bar n_1$ and $\bar n_2$ were chosen as large as possible such that

$$X_{1,(\bar n_1)} \le \max(X_{1,(\bar n_1)}, X_{2,(\bar n_2)}) \quad\text{and}\quad X_{2,(\bar n_2)} \le \max(X_{1,(\bar n_1)}, X_{2,(\bar n_2)})$$

held.
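A minimal sketch of how (5.15) might be computed is given below. It is an illustration under stated assumptions rather than the code used for the simulations: the Kaplan-Meier estimate is the usual product-limit step function, and the restriction on W is read as "keep the uncensored values that do not exceed the smaller of the two samples' largest uncensored observations". Function names are hypothetical, and each sample is assumed to contain at least one uncensored observation.

```python
import math

def kaplan_meier(times, events):
    """Product-limit estimate as a callable S(t).
    events[i] is 1 for an uncensored time and 0 for a censored one."""
    data = sorted(zip(times, events))
    n = len(data)
    steps = []                    # (time, value of S from that time onwards)
    surv, i = 1.0, 0
    while i < n:
        t = data[i][0]
        at_risk = n - i           # everything observed at or after t
        deaths, j = 0, i
        while j < n and data[j][0] == t:
            deaths += data[j][1]
            j += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            steps.append((t, surv))
        i = j

    def S(x):
        value = 1.0
        for t, s in steps:
            if t <= x:
                value = s
            else:
                break
        return value

    return S

def t_star(times1, events1, times2, events2):
    """Scaled maximum absolute difference between two Kaplan-Meier curves."""
    S1 = kaplan_meier(times1, events1)
    S2 = kaplan_meier(times2, events2)
    unc1 = [t for t, d in zip(times1, events1) if d == 1]
    unc2 = [t for t, d in zip(times2, events2) if d == 1]
    # Assumed reading of the restriction on W (see lead-in).
    upper = min(max(unc1), max(unc2))
    W = [w for w in unc1 + unc2 if w <= upper]
    n1, n2 = len(times1), len(times2)
    return math.sqrt(n1 * n2 / (n1 + n2)) * max(abs(S1(w) - S2(w)) for w in W)

# Small made-up example with no censoring.
print(t_star([1, 2, 3, 4], [1, 1, 1, 1], [1.5, 2.5, 3.5, 4.5], [1, 1, 1, 1]))
```

Because both curves are step functions that only jump at uncensored times, evaluating the difference on W is enough to find its maximum over the range considered.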

Critical values, denoted $b^\star$, were obtained by plugging the simulated data into (5.10) and computing b such that (5.10) held for a specific value of α, the required level of the test. This value was then compared against (5.15) to get our simulated significance level, $\alpha^\star$, that is,

$$\alpha^\star = \frac{1}{N}\sum_{i=1}^{N} I(T_i^\star > b_i^\star),$$

with N being the total number of simulations run and the subscript i referring to the individual simulations.
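The simulated level is simply the proportion of simulations in which the statistic exceeds its critical value. The sketch below also computes the binomial Monte Carlo standard error, which is not discussed in the text but helps to judge how large a gap between simulated and nominal levels can be attributed to simulation noise. Function names and the small input lists are hypothetical.

```python
import math

def simulated_level(t_stats, crit_values):
    """Proportion of simulations in which T*_i exceeds its critical value b*_i."""
    N = len(t_stats)
    return sum(1 for t, b in zip(t_stats, crit_values) if t > b) / N

def mc_standard_error(alpha, N):
    """Binomial standard error of an estimated rejection rate near alpha."""
    return math.sqrt(alpha * (1.0 - alpha) / N)

# Made-up statistics and critical values for four simulated data sets.
print(simulated_level([1.2, 0.8, 1.5, 0.3], [1.0, 1.0, 1.0, 1.0]))

# Simulation noise for N = 2500 at the three nominal levels used here.
for alpha in (0.1, 0.05, 0.01):
    print(alpha, mc_standard_error(alpha, 2500))
```

With N = 2500 the standard error is 0.006 at the 0.1 level and about 0.004 at the 0.05 level, so a discrepancy of 0.01 is roughly two standard errors.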

There was one issue that came up with the smallest sample sizes of only 10 observations. When the censoring was chosen to be 40%, that is, uniformly distributed on the interval [0, 2.2], there were occasions when the entire data set was censored. Lu, Wells and Tiwari [15, p. 1018] did not specifically mention this problem, but they dealt with it by defining the Kaplan-Meier estimate to be zero past the largest observation regardless of whether it was censored or not. That is, they defined

$$\hat S_i(t) = 0 \quad\text{for } t \ge X_{i,n_i} \text{ when } \delta_{i,n_i} = 0,$$

with i = 1, 2. We also use this modified version for these small-sample simulations.
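This convention can be layered on top of any survival-function estimate. The wrapper below is a hypothetical helper, not code from the original study: it takes a callable S together with the observed times and censoring indicators (1 for uncensored, 0 for censored) and forces the estimate to zero from the largest observation onwards when that observation is censored.

```python
def with_zero_tail(S, times, events):
    """Return a version of S that equals 0 for t >= the largest observation
    whenever that largest observation is censored (indicator 0)."""
    # Ties at the largest time are resolved in favour of an uncensored value.
    t_max, d_max = max(zip(times, events))
    if d_max == 1:
        return S                  # largest observation is uncensored: leave S unchanged

    def S_mod(t):
        return 0.0 if t >= t_max else S(t)

    return S_mod

# A fully censored sample: the unmodified estimate never drops below 1.
S_flat = lambda t: 1.0
S_fixed = with_zero_tail(S_flat, [0.4, 1.1, 1.9], [0, 0, 0])
print(S_fixed(1.0), S_fixed(1.9), S_fixed(3.0))
```

In the fully censored case the modified estimate is 1 before the largest observation and 0 from it onwards, so the estimate is defined on the whole range needed by the test statistic.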

Our results, given in Table 5.2, are fairly good for both the Weibull(1,1) and Weibull(1,1.5) survival distributions considering the extremely small sample sizes and high censoring percentages. The simulated values are close to the nominal values for the Weibull(1,1.5) for all sample sizes and censoring percentages, the difference being in general 0.01. For the Weibull(1,1) with 40% censoring for both groups the simulated values were noticeably higher than the nominal values. The discrepancy was about 0.03 and 0.02 for nominal values of 0.1 and 0.05 respectively. When the first group's censoring percentage was 20%, the simulated values were much closer to the nominal values, with the discrepancies now being around 0.01 for nominal values of 0.1 and 0.05.

Table 5.2: Simulation results, with uniform censoring

Survival distribution   Censoring %      n₁   n₂    Level of test
                        K₁     K₂                   0.1    0.05   0.01
Weibull(1,0.5)          0.40   0.40      10   15    0.18   0.10   0.02
                        0.40   0.40      15   20    0.21   0.11   0.02
                        0.40   0.40      20   25    0.25   0.14   0.04
                        0.20   0.40      10   15    0.13   0.06   0.01
                        0.20   0.40      15   20    0.14   0.08   0.01
                        0.20   0.40      20   25    0.17   0.09   0.02
Weibull(1,1)            0.40   0.40      10   15    0.13   0.07   0.01
                        0.40   0.40      15   20    0.14   0.07   0.01
                        0.40   0.40      20   25    0.13   0.06   0.01
                        0.20   0.40      10   15    0.10   0.05   0.01
                        0.20   0.40      15   20    0.11   0.05   0.01
                        0.20   0.40      20   25    0.09   0.04   0.01
Weibull(1,1.5)          0.40   0.40      10   15    0.11   0.06   0.01
                        0.40   0.40      15   20    0.11   0.05   0.01
                        0.40   0.40      20   25    0.09   0.04   0.01
                        0.20   0.40      10   15    0.09   0.04   0.01
                        0.20   0.40      15   20    0.09   0.05   0.01
                        0.20   0.40      20   25    0.08   0.04   0.01

Table 5.3: Simulation results for a bootstrap method, with uniform censoring

Survival distribution   Censoring %      n₁   n₂    Level of test
                        K₁     K₂                   0.1    0.05   0.01
Weibull(1,0.5)          0.40   0.40      10   15    0.11   0.05   0.01
                        0.40   0.40      15   20    0.09   0.05   0.01
                        0.40   0.40      20   25    0.10   0.05   0.01
                        0.20   0.40      10   15    0.09   0.05   0.01
                        0.20   0.40      15   20    0.11   0.05   0.01
                        0.20   0.40      20   25    0.10   0.05   0.01
Weibull(1,1)            0.40   0.40      10   15    0.09   0.05   0.01
                        0.40   0.40      15   20    0.10   0.05   0.01
                        0.40   0.40      20   25    0.10   0.05   0.01
                        0.20   0.40      10   15    0.09   0.05   0.01
                        0.20   0.40      15   20    0.10   0.05   0.01
                        0.20   0.40      20   25    0.10   0.05   0.01
Weibull(1,1.5)          0.40   0.40      10   15    0.09   0.05   0.01
                        0.40   0.40      15   20    0.10   0.05   0.01
                        0.40   0.40      20   25    0.10   0.05   0.01
                        0.20   0.40      10   15    0.09   0.05   0.01
                        0.20   0.40      15   20    0.10   0.05   0.01
                        0.20   0.40      20   25    0.10   0.05   0.01

The results for the Weibull(1,0.5) were not good, with the simulated values being much larger than the nominal values. This is most severe with 40% censoring for both groups, where the simulated values are roughly double the nominal values or more. There is a slight improvement when one group's censoring was 20%, but the discrepancies are still large, specifically for nominal values of 0.05 and 0.1. This was due to the high censoring: when the simulations were done with more moderate censoring of 20% and 10%, the results were vastly improved. This can be seen in Table 5.4. The simulated values are much closer to the nominal values, with the discrepancies generally being 0.01.

Lu, Wells and Tiwari's [15] simulated values are close to the nominal values, with the difference not exceeding 0.01 across all cases considered. Our simulation results only performed this well for the Weibull(1,1.5), and for the Weibull(1,1) when one censoring percentage was 20%.

In addition, we looked at how good the approximations would be when the censoring came from an exponential distribution rather than a uniform distribution. This case was not considered in Lu, Wells and Tiwari [15]. The setup for the various censoring percentages and survival distributions is shown in Table 5.5 and the results are shown in Table 5.6. These results are an improvement over those obtained with uniform censoring. There is still an issue with the Weibull(1,0.5), though for exponential censoring this occurred only when there was 40% censoring for both groups. For all other cases, the difference between simulated and nominal value did not exceed 0.02.

Table 5.4: Simulation results for Weibull(1,0.5) with uniform censoring

Survival distribution   Censoring %      n₁   n₂    Level of test
                        K₁     K₂                   0.1    0.05   0.01
Weibull(1,0.5)          0.20   0.20      10   15    0.11   0.06   0.01
                        0.20   0.20      15   20    0.09   0.04   0.01
                        0.20   0.20      20   25    0.09   0.04   0.01
                        0.10   0.20      10   15    0.11   0.05   0.01
                        0.10   0.20      15   20    0.10   0.05   0.01
                        0.10   0.20      20   25    0.08   0.04   0.01

Table 5.5: Exponential censoring distributions

Distribution of X     Desired censoring %    Censoring distribution
Weibull(1,0.5)        20%                    exp(5.5)
                      40%                    exp(1.4)
Weibull(1,1)          20%                    exp(4)
                      40%                    exp(1.5)
Weibull(1,1.5)        20%                    exp(3.9)
                      40%                    exp(8.4)
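As a quick consistency check on the Weibull(1,1) rows of Table 5.5, suppose that exp(m) denotes an exponential censoring variable C with mean m. The parametrisation is not stated in the text, so this is an assumption. With X standard exponential and independent of C, the censoring probability has a closed form:

```latex
\begin{align*}
P(C < X) &= \int_0^\infty P(X > c)\,\frac{1}{m}e^{-c/m}\,dc
          = \frac{1}{m}\int_0^\infty e^{-c\,(1 + 1/m)}\,dc
          = \frac{1}{1+m},\\
m = 4 &:\ P(C < X) = 0.2, \qquad m = 1.5:\ P(C < X) = 0.4 .
\end{align*}
```

Under this reading the entries exp(4) and exp(1.5) reproduce the desired 20% and 40% censoring exactly.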

Table 5.6: Simulation results, with exponential censoring (2500 simulations)

Survival distribution   Censoring %      n₁   n₂    Level of test
                        K₁     K₂                   0.1    0.05   0.01
Weibull(1,0.5)          0.40   0.40      10   15    0.14   0.07   0.01
                        0.40   0.40      15   20    0.14   0.06   0.01
                        0.40   0.40      20   25    0.15   0.07   0.01
                        0.20   0.40      10   15    0.11   0.05   0.01
                        0.20   0.40      15   20    0.10   0.05   0.01
                        0.20   0.40      20   25    0.11   0.05   0.01
Weibull(1,1)            0.40   0.40      10   15    0.12   0.06   0.01
                        0.40   0.40      15   20    0.11   0.06   0.01
                        0.40   0.40      20   25    0.11   0.05   0.01
                        0.20   0.40      10   15    0.08   0.04   0.01
                        0.20   0.40      15   20    0.09   0.04   0.01
                        0.20   0.40      20   25    0.09   0.05   0.01
Weibull(1,1.5)          0.40   0.40      10   15    0.11   0.05   0.01
                        0.40   0.40      15   20    0.10   0.04   0.01
                        0.40   0.40      20   25    0.09   0.04   0.01
                        0.20   0.40      10   15    0.09   0.03   0.00
                        0.20   0.40      15   20    0.08   0.04   0.01
                        0.20   0.40      20   25    0.08   0.04   0.01

Epilogue

In this dissertation we have considered estimation of location and scale differences between two populations based on independent samples obtained from them. In the case of uncensored data we have compared ordinary least squares estimation and a generalised least squares method. The same has been done in the case where the data may have been right censored. The comparisons made are based on theoretical calculations supported by extensive Monte Carlo simulations. We have also considered an analytic method of estimating a quantile comparison function that does not involve the use of a bootstrap methodology. Further work on these problems could centre on analyses for matched pair data. A difficulty in this respect is the complication involved in constructing bivariate analogues of the Kaplan-Meier estimator.

Bibliography

[1] Breslow, N., Crowley, J. A Large Sample Study of the Life Table and Product Limit Estimates Under Random Censorship. The Annals of Statistics, Vol. 2, No. 3 (1974), pp 437-453

[2] Csörgő, M., Révész, P. Strong Approximations in Probability and Statistics. New York: Academic Press, 1981

[3] Doksum, K. Empirical Probability Plots and Statistical Inference for Nonlinear Models in the Two-sample Case. The Annals of Statistics, Vol. 2, No. 2 (1974), pp 267-277

[4] Doksum, K.A., Sievers, G.L. Plotting with confidence: Graphical comparisons of two populations. Biometrika, Vol. 63, No. 3 (1976), pp 421-434

[5] Doksum, K.A. Some graphical methods in statistics. A review and some extensions. Statistica Neerlandica, Vol. 31 (1977), pp 53-68

[6] Einmahl, J.H.J., McKeague, I.W. Confidence tubes for Multiple Quantile Plots via Empirical Likelihood. The Annals of Statistics, Vol. 27, No. 4 (1999), pp 1348-1367

[7] Hall, P., Lombard, F., Potgieter, C.J. A new approach to function-based hypothesis testing in location-scale families. To appear in Technometrics

[8] Hsieh, F. The Empirical Process Approach for Semiparametric Two-Sample Models with Heterogeneous Treatment Effect. Journal of the Royal Statistical Society, Series B (Methodological), Vol. 57, No. 4 (1995), pp 735-748

[9] Hsieh, F. A Transformation Model for Two Survival Curves: An Empirical Process Approach. Biometrika, Vol. 83, No. 3 (1996), pp 519-528

[10] Hsieh, F. Empirical Process Approach in a Two-Sample Location-Scale model with Censored Data. The Annals of Statistics, Vol. 24, No. 6 (1996), pp 2705-2719

[11] Klein, J.P., Moeschberger, M.L. Survival Analysis: techniques for censored and truncated data. Second edition. Springer, New York, (2005)

[12] Lehmann, E.L. Statistical Methods based on Ranks. Holden Day, San Francisco, (1974)

[13] Li, G., Tiwari, R.C., Wells, M.T. Quantile Comparison Functions in Two-Sample Problems, With Application to Comparisons of Diagnostic Markers. Journal of the American Statistical Association, Vol. 91, No. 434 (1996), pp 689-698

[14] Lombard, F. Nonparametric Confidence Bands for a Quantile Comparison Function. Technometrics, Vol. 47, No. 3 (2005), pp 364-369

[15] Lu, H.H.S., Wells, M.T., Tiwari, R.C. Inference for Shift Functions in the Two-Sample Problem with Right-Censored Data: With Applications. Journal of the American Statistical Association, Vol. 89, No. 427 (1994), pp 1017-1026

[16] Potgieter, C.J. Estimation and Testing of Linear Treatment Effects from Matched Pair Data. Masters of Science Dissertation, University of Johannesburg, January (2007)

[17] Potgieter, C.J., Lombard, F. Nonparametric estimation of location and scale parameters. Computational Statistics and Data Analysis, 56, pp 4327-4337

[18] van der Vaart, A.W. Asymptotic Statistics. Cambridge University Press, New York, (1998)

Appendices

Appendix A

Approximations for the empirical quantile function

A.1 Uncensored data

From Hsieh [8], (5) and (6), we obtain a strong approximation for the empirical quantile function, and substituting these results into (3.1) gives

$$\hat F_2^{-1}(u) = \mu + \sigma \hat F_1^{-1}(u) - \frac{\sigma}{\sqrt{n_1}}\,\frac{B_{1,n_1}(u)}{f_1(F_1^{-1}(u))} + \frac{1}{\sqrt{n_2}}\,\frac{B_{2,n_2}(u)}{f_2(F_2^{-1}(u))},$$

where $B_{1,n_1}(u)$ and $B_{2,n_2}(u)$ are independent Brownian bridges as set out in Section 2.4.1. A change from $f_2(F_2^{-1}(u))$ is needed because, when the estimation is conducted using $f_2(F_2^{-1}(u))$, there is a bias in the estimation of σ. This is shown in Table A.1. Differentiating (3.1) with respect to u gives

$$\frac{1}{f_2(F_2^{-1}(u))} = \frac{\sigma}{f_1(F_1^{-1}(u))}, \qquad\text{that is,}\qquad f_2(F_2^{-1}(u)) = \frac{1}{\sigma}\,f_1(F_1^{-1}(u)).$$

Using this substitution in the above leads to

$$\hat F_2^{-1}(u) = \mu + \sigma \hat F_1^{-1}(u) - \sigma\left(\frac{1}{\sqrt{n_1}}\,\frac{B_{1,n_1}(u)}{f_1(F_1^{-1}(u))} - \frac{1}{\sqrt{n_2}}\,\frac{B_{2,n_2}(u)}{f_1(F_1^{-1}(u))}\right). \tag{A.1}$$

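Defining ϵ(u) as

$$\epsilon(u) = -\sigma\left(\frac{1}{\sqrt{n_1}}\,\frac{B_{1,n_1}(u)}{f_1(F_1^{-1}(u))} - \frac{1}{\sqrt{n_2}}\,\frac{B_{2,n_2}(u)}{f_1(F_1^{-1}(u))}\right),$$

we then get (A.1) to be simply

$$\hat F_2^{-1}(u) = \mu + \sigma \hat F_1^{-1}(u) + \epsilon(u),$$

which holds for $0 \le u \le 1$.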

The covariance for the error term ϵ(u) can be easily found. We have cov(ϵ(u), ϵ(v))

We have the covariance for a Brownian bridge from (2.26) and since the covariance of two independent Brownian bridges is zero then,

cov(ϵ(u), ϵ(v)) = σ²

where the elements in Σ equal the covariance between the errors at time u and v.
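The covariance computation can be sketched as follows. The sketch assumes that (2.26) is the standard Brownian bridge covariance $\min(u,v) - uv$ and takes ϵ(u) to be the last term of (A.1); it is a reconstruction under these assumptions and may differ in presentation from the original display.

```latex
\begin{align*}
\operatorname{cov}(\epsilon(u), \epsilon(v))
 &= \frac{\sigma^2}{f_1(F_1^{-1}(u))\, f_1(F_1^{-1}(v))}
    \left[\frac{1}{n_1}\operatorname{cov}\bigl(B_{1,n_1}(u), B_{1,n_1}(v)\bigr)
        + \frac{1}{n_2}\operatorname{cov}\bigl(B_{2,n_2}(u), B_{2,n_2}(v)\bigr)\right] \\
 &= \sigma^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)
    \frac{\min(u,v) - uv}{f_1(F_1^{-1}(u))\, f_1(F_1^{-1}(v))}.
\end{align*}
```

The cross terms vanish because the two Brownian bridges are independent, which is why only the two within-sample covariances appear in the first line.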

As stated previously, there is a bias when the estimation is conducted using $f_2(F_2^{-1}(u))$ in $\Sigma_\star$ rather than $\frac{1}{\sigma} f_1(F_1^{-1}(u))$. Table A.1 shows the bias for σ, given by

$$\sigma_{\text{bias}} = \sigma - \bar{\hat\sigma},$$

when the estimation is carried out using both $f_2(F_2^{-1}(u))$ and $f_1(F_1^{-1}(u))$ in $\Sigma_\star$. The sample size is taken to be 250, and F₁ and F₂ both come from a normal distribution, with (µ, σ) equal to (0, 1) for the F₁ distribution and (µ, σ) for the F₂ distribution as shown in the table.

Table A.1: Bias results for σ

(µ, σ)    f₁(F₁⁻¹(u))    f₂(F₂⁻¹(u))
(0,1)      0.0022         -0.0238
(1,1)      0.0016         -0.0242
(2,1)     -0.0058         -0.0326
(0,2)     -0.0041         -0.0514
(1,2)      0.0022         -0.0514
(2,2)      0.0071         -0.0457

The bias using f₂(F₂⁻¹(u)) is always negative and roughly ten times larger in magnitude than the bias when using f₁(F₁⁻¹(u)). This arises because f₂(F₂⁻¹(u)) must itself be estimated while the response variable in the regression setup is F₂⁻¹(u). This can be seen almost as a double estimation, which causes the bias.

A.2 Censored data

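The following approximations for the product limit estimators are taken from Hsieh [10], (6) and (7):

$$\hat F_1^{-1}(u) = F_1^{-1}(u) + \frac{1}{n_1 f_1(F_1^{-1}(u))}\,K_1(u, n_1) + \epsilon_{n_1}, \tag{A.3}$$

and

$$\hat F_2^{-1}(u) = F_2^{-1}(u) + \frac{1}{n_2 f_2(F_2^{-1}(u))}\,K_2(u, n_2) + \epsilon_{n_2}. \tag{A.4}$$

$K_1(u, n_1)$ and $K_2(v, n_2)$ denote Kiefer processes, as discussed in Section 2.4.4. Combining (A.3) with (A.4) gives

$$\hat F_2^{-1}(u) = \mu + \sigma\left[\hat F_1^{-1}(u) - \frac{K_1(u, n_1)}{n_1 f_1(F_1^{-1}(u))}\right] + \sigma\,\frac{K_2(u, n_2)}{n_2 f_1(F_1^{-1}(u))}.$$

As in the complete case, $f_2(F_2^{-1}(u))$ has been replaced by $\frac{1}{\sigma} f_1(F_1^{-1}(u))$ to avoid a bias in the results. This leads to

$$\hat F_2^{-1}(u) = \mu + \sigma \hat F_1^{-1}(u) + \sigma\left(\frac{K_2(u, n_2)}{n_2 f_1(F_1^{-1}(u))} - \frac{K_1(u, n_1)}{n_1 f_1(F_1^{-1}(u))}\right),$$

so that we have our regression setup as before with error term ϵ(t):

$$\tilde F_2^{-1}(t) = \mu + \sigma \tilde F_1^{-1}(t) + \epsilon(t). \tag{A.5}$$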

The covariance matrix for the error terms is

cov(ϵ(s), ϵ(t)) = cov

Due to the two Kiefer processes being independent this leads to

cov(ϵ(s), ϵ(t)) = σ²

Using our definition of the covariance function for the Kiefer process in (2.28) gives

cov(ϵ(s), ϵ(t)) =

where the elements in Σ equal the covariance between the errors at time s and t.

Appendix B

Derivation of (5.2)

In Section 2.2 we deﬁned the function

$$q(x) = F_2^{-1}(F_1(x)) := \phi(F_1, F_2)$$

and its estimator

$$\hat q(x) = \hat F_2^{-1}(\hat F_1(x)).$$

We wish to develop an expression for the process

$$\sqrt{n_\star}\,\bigl(\hat q(x) - q(x)\bigr), \quad x \in \mathbb{R}^1,$$

in terms of sums of independent random variables.

In order to avoid notational complications we define here, and only here,

$$F \equiv F_1 \quad\text{and}\quad G \equiv F_2,$$

which is the notation used by Potgieter [16]. We apply the functional delta method (see van der Vaart [18], Theorem 20.8). For this we need to find

$$\phi'_{(F,G)} = \frac{d}{dt}\,\phi\bigl((1-t)F + t\delta_x,\ (1-t)G + t\delta_y\bigr)\Big|_{t=0}.$$

Towards this, let

$$F_t = (1-t)F + t\delta_x \tag{B.1}$$

and

$$G_t = (1-t)G + t\delta_y \tag{B.2}$$

and consider the identity

$$G_t\bigl(G_t^{-1}F_t\bigr) = F_t. \tag{B.3}$$

Substituting (B.1) and (B.2) into (B.3) gives

$$(1-t)\,G\bigl(G_t^{-1}F_t\bigr) + t\,\delta_y\bigl(G_t^{-1}F_t\bigr) = (1-t)F + t\delta_x$$

and differentiation with respect to t gives

−F + δx = −G(

Setting t = 0 and rearranging terms in (B.4) gives

d

which leads to the first order expansion

$$\sqrt{n_\star}\,\bigl(\hat q(x) - q(x)\bigr) = n^{-1/2}\sum_{i}\frac{I\{X_i \le x\} - I\{Y_i \le q(x)\}}{g(q(x))} + o_p(1).$$
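The differentiation step can be sketched as follows. This is a formal reconstruction from the substituted form of (B.3), not a copy of the original displays: g denotes the density of G, the chain rule is applied termwise, and the derivative of the step function $\delta_y$ is treated heuristically (it is multiplied by t and drops out at t = 0).

```latex
\begin{align*}
-F + \delta_x
 &= -G\bigl(G_t^{-1}F_t\bigr)
    + (1-t)\,g\bigl(G_t^{-1}F_t\bigr)\,\frac{d}{dt}\bigl(G_t^{-1}F_t\bigr)
    + \delta_y\bigl(G_t^{-1}F_t\bigr)
    + t\,\frac{d}{dt}\,\delta_y\bigl(G_t^{-1}F_t\bigr), \\
\intertext{and at $t = 0$, using $G(G^{-1}F) = F$,}
\frac{d}{dt}\bigl(G_t^{-1}F_t\bigr)\Big|_{t=0}
 &= \frac{\delta_x - \delta_y\bigl(G^{-1}F\bigr)}{g\bigl(G^{-1}F\bigr)}.
\end{align*}
```

Placing the point masses at the observations $X_i$ and $Y_i$ and evaluating at x, where $G^{-1}F(x) = q(x)$, gives $\delta_{X_i}(x) = I\{X_i \le x\}$ and $\delta_{Y_i}(q(x)) = I\{Y_i \le q(x)\}$, which is the summand in the first order expansion.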
