An empirical estimate of this probability using the continuous version of the Kaplan-Meier estimator (2.16) is
The integrals in (5.12) can be evaluated using numerical integration. An alternative method is as follows. Set $\tilde S_1(\tilde X_{1,(i)}) = \eta_{1,(i)}$, $i = 1, \ldots, n_1$. These $\eta_{1,(i)}$'s are not all distinct due to possibly tied data. Denote the distinct $\eta_{1,(i)}$'s by $\xi_{1,(j)}$ with $j = 1, \ldots, n'_1$, where $n'_1$ is the number of distinct values. Set $\xi_{1,(0)} = 1$. The frequencies of these distinct $\xi_{1,(j)}$'s are $f_{1,(1)}, \ldots, f_{1,(n'_1)}$ and the accumulated frequencies are denoted by $F_{1,(1)}, \ldots, F_{1,(n'_1)}$.
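The bookkeeping described above (distinct values, their frequencies and the accumulated frequencies) can be sketched in a few lines. The function name and the sample values are hypothetical, and the sketch assumes the input is already ordered so that tied values sit next to each other.

```python
from itertools import accumulate, groupby

def distinct_with_frequencies(eta):
    """Collapse an ordered sequence of (possibly tied) values.

    Returns (xi, f, F): the distinct values in order of appearance,
    their frequencies, and the accumulated frequencies.
    """
    xi, f = [], []
    # groupby merges runs of equal neighbours, so eta must already be
    # ordered with ties adjacent (an assumption of this sketch).
    for value, run in groupby(eta):
        xi.append(value)
        f.append(sum(1 for _ in run))
    F = list(accumulate(f))
    return xi, f, F

# Hypothetical survival-estimate values with ties, for illustration only.
eta = [0.9, 0.9, 0.7, 0.5, 0.5, 0.5, 0.2]
xi, f, F = distinct_with_frequencies(eta)
print(xi)  # [0.9, 0.7, 0.5, 0.2]
print(f)   # [2, 1, 3, 1]
print(F)   # [2, 3, 6, 7]
```

The last accumulated frequency equals the number of values supplied, which gives a quick consistency check.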
Hence
This gives us an expression for ˆθ(u) which we can use in (5.10). The next step is to solve (5.10) for b.
We mentioned that appropriate values of $u_\star$ and $u^\star$ must be chosen. The obvious choices would be 0 and 1 respectively. However, if $u_\star$ is less than both $\xi_{1,(1)}$ and $\xi_{2,(1)}$ then (5.11) will be zero. Thus $u_\star$ should be chosen so that it is larger than $\xi_{1,(1)}$ or $\xi_{2,(1)}$. The choice of $u^\star$ is restricted by the estimator $\tilde S(\cdot)$ because it is undefined after the last uncensored observation. Thus $u^\star$ should not exceed the minimum of $\xi_{1,(n'_1)}$ and $\xi_{2,(n'_2)}$. That is,
\[ u_\star \ge \max(\xi_{1,(1)}, \xi_{2,(1)}) \quad\text{and}\quad u^\star \le \min(\xi_{1,(n'_1)}, \xi_{2,(n'_2)}). \tag{5.14} \]
The integral in (5.10) can be evaluated by numerical integration, using (5.13) and the bounds given in (5.14). The solution in b of the equation can then be found numerically using, for instance, the Matlab function fzero.m.
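As an illustration of the root-finding step only, the sketch below solves an equation of the form "tail probability at b equals α" by bisection, a simple stand-in for what fzero does. Equation (5.10) is not reproduced here, so the target function is an assumption: the classical Kolmogorov series for the supremum of the absolute value of a Brownian bridge. The function names are hypothetical.

```python
import math

def kolmogorov_tail(b, terms=100):
    """P(sup |Brownian bridge| > b) via the classical alternating series.

    Used here only as a stand-in for the left-hand side of (5.10).
    """
    return 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * b * b)
                     for k in range(1, terms + 1))

def solve_bisection(func, lo, hi, tol=1e-10):
    """Find a root of func in [lo, hi]; func(lo) and func(hi) must differ in sign."""
    f_lo = func(lo)
    if f_lo * func(hi) > 0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if f_lo * f_mid <= 0:
            hi = mid                 # sign change lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid    # sign change lies in [mid, hi]
    return 0.5 * (lo + hi)

alpha = 0.05
b = solve_bisection(lambda x: kolmogorov_tail(x) - alpha, 0.5, 3.0)
print(round(b, 3))  # 1.358
```

In the setting of the text, the lambda would be replaced by the numerically integrated left-hand side of (5.10) minus α, evaluated between the bounds in (5.14).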
Table 5.1: Uniform censoring distributions

Distribution of X1    Desired censoring %    Censoring distribution
Weibull(1,0.5)        20%                    U(0,7.6)
                      40%                    U(0,2.2)
Weibull(1,1)          20%                    U(0,5.0)
                      40%                    U(0,2.2)
Weibull(1,1.5)        20%                    U(0,4.5)
                      40%                    U(0,2.2)
5.3 Simulation results
To test the accuracy of our approximations, Monte Carlo simulations were run. We used the same simulation setup as in Lu, Wells and Tiwari [15] and provide their results in Table 5.3 for comparison. Three Weibull distributions were considered for the distribution of X1: those with parameters (1, 0.5), (1, 1.5) and (1, 1), the last of which corresponds to a standard exponential distribution. A total of 2500 simulations were run for each setup. The censoring distribution was uniform on an interval chosen to censor either 40% or 20% of the data; Table 5.1 gives the intervals used. The density function for the Weibull, denoted Weibull(a, b), is given by
\[ f(x; a, b) = b\,a^{-b}\,x^{b-1}\exp\left(-\left(\frac{x}{a}\right)^{b}\right), \]
where x > 0.
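The censoring intervals can be sanity-checked numerically. Assuming the censoring time C is U(0, c) and independent of X, the expected censored fraction is P(C < X) = (1/c) ∫₀ᶜ S(x) dx with S(x) = exp(−(x/a)^b). The sketch below evaluates this with a midpoint rule; the function names are hypothetical and this is a check, not necessarily the way the intervals in Table 5.1 were originally chosen.

```python
import math

def weibull_survival(x, a, b):
    """S(x) = exp(-(x/a)^b) for the Weibull(a, b) density given above."""
    return math.exp(-((x / a) ** b))

def uniform_censoring_fraction(a, b, upper, steps=20_000):
    """Expected censored fraction P(C < X) for C ~ U(0, upper), independent of X.

    Computes (1/upper) * integral_0^upper S(c) dc with the midpoint rule.
    """
    h = upper / steps
    total = sum(weibull_survival((k + 0.5) * h, a, b) for k in range(steps))
    return total * h / upper

# Intervals taken from Table 5.1; each fraction should be near 20% or 40%.
for shape, upper in [(0.5, 7.6), (1.0, 5.0), (1.5, 4.5), (1.0, 2.2)]:
    print(shape, upper, round(uniform_censoring_fraction(1.0, shape, upper), 2))
```

For the exponential case the integral has the closed form (1 − e^{−c})/c, which gives about 0.199 for c = 5.0 and about 0.404 for c = 2.2.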
The following test statistic was used,
\[ T^\star = \sqrt{\frac{n_1 n_2}{n_1 + n_2}}\; \max_{W} \left| \hat S_1(W) - \hat S_2(W) \right|, \tag{5.15} \]
where W is the combined set of the uncensored values from X1 and X2. As mentioned previously (see Section 4.2.2), the Kaplan-Meier estimator is not defined past the last uncensored observation, so W should not go past these bounds for either $\hat S_1(W)$ or $\hat S_2(W)$. Hence
\[ W = \{X_{1,(1)}, \ldots, X_{1,(\bar n_1)}, X_{2,(1)}, \ldots, X_{2,(\bar n_2)}\}, \]
where $\bar n_1$ and $\bar n_2$ were chosen as large as possible such that
\[ X_{1,(\bar n_1)} \le \max(X_{1,(\bar n_1)}, X_{2,(\bar n_2)}) \quad\text{and}\quad X_{2,(\bar n_2)} \le \max(X_{1,(\bar n_1)}, X_{2,(\bar n_2)}) \]
held.
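A minimal sketch of how a statistic of the form (5.15) might be computed is given below. It uses the standard product-limit form of the Kaplan-Meier estimator. The function names and data are hypothetical, and the truncation of W at the smaller of the two last uncensored times is one reading of the restriction described above, adopted here as an assumption.

```python
import math

def kaplan_meier(times, events):
    """Return the Kaplan-Meier survival estimate as a step function.

    times  : observed times (event or censoring)
    events : 1 if the time is an uncensored observation, 0 if censored
    """
    data = sorted(zip(times, events))
    n = len(data)
    steps = []                     # (time, survival value from that time on)
    surv, i = 1.0, 0
    while i < n:
        t = data[i][0]
        at_risk = n - i
        deaths = 0
        while i < n and data[i][0] == t:   # pool tied observations at t
            deaths += data[i][1]
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            steps.append((t, surv))

    def S(w):
        value = 1.0
        for t, s in steps:
            if t <= w:
                value = s
            else:
                break
        return value
    return S

def two_sample_statistic(t1, d1, t2, d2):
    """sqrt(n1*n2/(n1+n2)) * max over W of |S1 - S2|, in the form of (5.15)."""
    S1, S2 = kaplan_meier(t1, d1), kaplan_meier(t2, d2)
    unc1 = [t for t, d in zip(t1, d1) if d]
    unc2 = [t for t, d in zip(t2, d2) if d]
    # Assumption: W stops at the smaller of the two last uncensored times.
    # Both samples must contain at least one uncensored observation.
    cap = min(max(unc1), max(unc2))
    W = [w for w in unc1 + unc2 if w <= cap]
    n1, n2 = len(t1), len(t2)
    return math.sqrt(n1 * n2 / (n1 + n2)) * max(abs(S1(w) - S2(w)) for w in W)

# Hypothetical data: sample 2 has one censored time (2.5).
T = two_sample_statistic([1, 2, 3, 4], [1, 1, 1, 1],
                         [1.5, 2.5, 3.5, 4.5], [1, 0, 1, 1])
print(round(T, 4))  # 0.7071
```

In this toy example the largest gap between the two estimates is 0.5 (at w = 3), and the scaling factor is sqrt(16/8).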
Critical values, denoted $b^\star$, were obtained by plugging the simulated data into (5.10) and computing b such that (5.10) held for a specific value of α, the required level of the test. This value was then compared against (5.15) to get our simulated significance level, $\alpha^\star$, that is
\[ \alpha^\star = \frac{1}{N} \sum_{i=1}^{N} I(T_i^\star > b_i^\star), \]
with N being the total number of simulations run and the subscript i indexing the individual simulations.
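The simulated significance level is just a rejection frequency, as the following short sketch shows. The function name and the four pairs of values are hypothetical and serve only to illustrate the indicator sum.

```python
def empirical_level(statistics, critical_values):
    """Fraction of simulations in which T_i exceeds its own critical value b_i."""
    if len(statistics) != len(critical_values):
        raise ValueError("need one critical value per simulated statistic")
    rejections = sum(t > b for t, b in zip(statistics, critical_values))
    return rejections / len(statistics)

# Hypothetical values for four simulation runs, for illustration only.
T_sim = [1.40, 0.90, 1.10, 1.70]
b_sim = [1.30, 1.35, 1.20, 1.36]
print(empirical_level(T_sim, b_sim))  # 0.5
```

Note that each run is compared against its own critical value, since the critical value is recomputed from the simulated data in every run.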
One issue arose for the smallest sample sizes of only 10 observations. When the censoring was chosen to be 40%, that is, uniformly distributed on the interval [0, 2.2], there were occasions when the entire data set was censored. Lu, Wells and Tiwari [15] (page 1018) did not specifically mention this problem, but they dealt with it by defining the Kaplan-Meier estimate to be zero past the largest observation regardless of whether it was censored or not. That is, they defined
\[ \hat S_i(t) = 0 \quad \text{for } t \ge X_{i,n_i} \text{ when } \delta_{i,n_i} = 0, \]
with i = 1, 2. We will also use this modified version for these small-sample simulations.
Our results, given in Table 5.2, are fairly good for both the Weibull(1,1) and Weibull(1,1.5) survival distributions considering the extremely small sample sizes and high censoring percentages. The simulated values are close to the nominal values for the Weibull(1,1.5) for all sample sizes and censoring percentages, the difference being in general 0.01. For the Weibull(1,1) with 40% censoring for both groups the simulated values were noticeably higher than the nominal values. The discrepancy was about 0.03 and 0.02 for nominal values of 0.1 and 0.05 respectively. When the first group's censoring percentage was 20%, the simulated values were much closer to the nominal values, with the discrepancies now being around 0.01 for nominal values of 0.1 and 0.05.

Table 5.2: Simulation Results, with uniform censoring

                         Censoring %                 Level of test
Survival Distribution    K1     K2     n1    n2      0.1    0.05   0.01
Weibull(1,0.5)           0.40   0.40   10    15      0.18   0.10   0.02
                         0.40   0.40   15    20      0.21   0.11   0.02
                         0.40   0.40   20    25      0.25   0.14   0.04
                         0.20   0.40   10    15      0.13   0.06   0.01
                         0.20   0.40   15    20      0.14   0.08   0.01
                         0.20   0.40   20    25      0.17   0.09   0.02
Weibull(1,1)             0.40   0.40   10    15      0.13   0.07   0.01
                         0.40   0.40   15    20      0.14   0.07   0.01
                         0.40   0.40   20    25      0.13   0.06   0.01
                         0.20   0.40   10    15      0.10   0.05   0.01
                         0.20   0.40   15    20      0.11   0.05   0.01
                         0.20   0.40   20    25      0.09   0.04   0.01
Weibull(1,1.5)           0.40   0.40   10    15      0.11   0.06   0.01
                         0.40   0.40   15    20      0.11   0.05   0.01
                         0.40   0.40   20    25      0.09   0.04   0.01
                         0.20   0.40   10    15      0.09   0.04   0.01
                         0.20   0.40   15    20      0.09   0.05   0.01
                         0.20   0.40   20    25      0.08   0.04   0.01

Table 5.3: Simulation Results for a bootstrap method, with uniform censoring

                         Censoring %                 Level of test
Survival Distribution    K1     K2     n1    n2      0.1    0.05   0.01
Weibull(1,0.5)           0.40   0.40   10    15      0.11   0.05   0.01
                         0.40   0.40   15    20      0.09   0.05   0.01
                         0.40   0.40   20    25      0.10   0.05   0.01
                         0.20   0.40   10    15      0.09   0.05   0.01
                         0.20   0.40   15    20      0.11   0.05   0.01
                         0.20   0.40   20    25      0.10   0.05   0.01
Weibull(1,1)             0.40   0.40   10    15      0.09   0.05   0.01
                         0.40   0.40   15    20      0.10   0.05   0.01
                         0.40   0.40   20    25      0.10   0.05   0.01
                         0.20   0.40   10    15      0.09   0.05   0.01
                         0.20   0.40   15    20      0.10   0.05   0.01
                         0.20   0.40   20    25      0.10   0.05   0.01
Weibull(1,1.5)           0.40   0.40   10    15      0.09   0.05   0.01
                         0.40   0.40   15    20      0.10   0.05   0.01
                         0.40   0.40   20    25      0.10   0.05   0.01
                         0.20   0.40   10    15      0.09   0.05   0.01
                         0.20   0.40   15    20      0.10   0.05   0.01
                         0.20   0.40   20    25      0.10   0.05   0.01
The results for the Weibull(1,0.5) were not good with the simulated values being much larger than the nominal values. This is most severe with 40% censoring for both groups, with the simulated values being more than double the nominal values. There is a slight improvement when one group’s censoring was 20% but still the discrepancies are large, specifically for nominal values of 0.05 and 0.1. This was due to the high censoring, and when the simulations were done with more moderate censoring of 20%
and 10% they were vastly improved. This can be seen in Table 5.4. The simulated values are much closer to the nominal values, with the discrepancies generally being 0.01.
Lu, Wells and Tiwari’s [15] simulated values are close to the nominal values, with the difference not being more than 0.01 across all cases considered. Our simulation results only performed this well for a Weibull(1,1.5), and for a Weibull(1,1) when one censoring percentage was 20%.
In addition we looked at how good the approximations would be when the censoring came from an exponential distribution rather than a uniform distribution. This case was not considered in Lu, Wells and Tiwari [15]. The setup for the various censoring percentages and survival distributions is shown in Table 5.5 and the results are shown in Table 5.6. These results are an improvement over those obtained when the censoring was uniformly distributed. There is still an issue with the Weibull(1,0.5), though for exponential censoring this arose only when there was 40% censoring for both groups. For all other cases, the difference between simulated and nominal value did not exceed 0.02.
Table 5.4: Simulation Results for Weibull(1,0.5) with uniform censoring

                         Censoring %                 Level of test
Survival Distribution    K1     K2     n1    n2      0.1    0.05   0.01
Weibull(1,0.5)           0.20   0.20   10    15      0.11   0.06   0.01
                         0.20   0.20   15    20      0.09   0.04   0.01
                         0.20   0.20   20    25      0.09   0.04   0.01
                         0.10   0.20   10    15      0.11   0.05   0.01
                         0.10   0.20   15    20      0.10   0.05   0.01
                         0.10   0.20   20    25      0.08   0.04   0.01
Table 5.5: Exponential censoring distributions

Distribution of X     Desired censoring %    Censoring distribution
Weibull(1,0.5)        20%                    exp(5.5)
                      40%                    exp(1.4)
Weibull(1,1)          20%                    exp(4)
                      40%                    exp(1.5)
Weibull(1,1.5)        20%                    exp(3.9)
                      40%                    exp(8.4)
Table 5.6: Simulation Results, with exponential censoring (2500 simulations)

                         Censoring %                 Level of test
Survival Distribution    K1     K2     n1    n2      0.1    0.05   0.01
Weibull(1,0.5)           0.40   0.40   10    15      0.14   0.07   0.01
                         0.40   0.40   15    20      0.14   0.06   0.01
                         0.40   0.40   20    25      0.15   0.07   0.01
                         0.20   0.40   10    15      0.11   0.05   0.01
                         0.20   0.40   15    20      0.10   0.05   0.01
                         0.20   0.40   20    25      0.11   0.05   0.01
Weibull(1,1)             0.40   0.40   10    15      0.12   0.06   0.01
                         0.40   0.40   15    20      0.11   0.06   0.01
                         0.40   0.40   20    25      0.11   0.05   0.01
                         0.20   0.40   10    15      0.08   0.04   0.01
                         0.20   0.40   15    20      0.09   0.04   0.01
                         0.20   0.40   20    25      0.09   0.05   0.01
Weibull(1,1.5)           0.40   0.40   10    15      0.11   0.05   0.01
                         0.40   0.40   15    20      0.10   0.04   0.01
                         0.40   0.40   20    25      0.09   0.04   0.01
                         0.20   0.40   10    15      0.09   0.03   0.00
                         0.20   0.40   15    20      0.08   0.04   0.01
                         0.20   0.40   20    25      0.08   0.04   0.01
Epilogue
In this dissertation we have considered estimation of location and scale differences between two populations based on independent samples obtained from them. In the case of uncensored data we have compared ordinary least squares estimation and a generalised least squares method. The same has been done in the case where the data may have been right censored. The comparisons made are based on theoretical calculations supported by extensive Monte Carlo simulations. We have also considered an analytic method of estimating a quantile comparison function that does not involve use of a bootstrap methodology. Further work on these problems could center on analyses for matched pair data. A difficulty in this last respect is the complication involved in constructing bivariate analogues of the Kaplan-Meier estimator.
Bibliography
[1] Breslow, N., Crowley, J. A Large Sample Study of the Life Table and Product Limit Estimates Under Random Censorship. The Annals of Statistics, Vol. 2, No. 3 (1974), pp 437-453
[2] Csörgő, M., Révész, P. Strong Approximations in Probability and Statistics. New York: Academic Press, 1981
[3] Doksum, K. Empirical Probability Plots and Statistical Inference for Nonlinear Models in the Two-sample Case. The Annals of Statistics, Vol. 2, No. 2 (1974), pp 267-277
[4] Doksum, K.A., Sievers, G.L. Plotting with confidence: Graphical comparisons of two populations. Biometrika, Vol. 63, No. 3 (1976), pp 421-434
[5] Doksum, K.A. Some graphical methods in statistics. A review and some extensions. Statistica Neerlandica, Vol. 31 (1977), pp 53-68
[6] Einmahl, J.H.J., McKeague, I.W. Confidence tubes for Multiple Quantile Plots via Empirical Likelihood. The Annals of Statistics, Vol. 27, No. 4 (1999), pp 1348-1367
[7] Hall, P., Lombard, F., Potgieter, C.J. A new approach to function-based hypothesis testing in location-scale families. To appear in Technometrics
[8] Hsieh, F. The Empirical Process Approach for Semiparametric Two-Sample Models with Heterogeneous Treatment Effect. Journal of the Royal Statistical Society, Series B (Methodological), Vol. 57, No. 4 (1995), pp 735-748
[9] Hsieh, F. A Transformation Model for Two Survival Curves: An Empirical Process Approach. Biometrika, Vol. 83, No. 3 (1996), pp 519-528
[10] Hsieh, F. Empirical Process Approach in a Two-Sample Location-Scale model with Censored Data. The Annals of Statistics, Vol. 24, No. 6 (1996) , pp 2705-2719
[11] Klein, J.P., Moeschberger, M.L. Survival Analysis: techniques for censored and truncated data. Second edition. Springer, New York, (2005)
[12] Lehmann, E.L. Statistical Methods based on Ranks. Holden Day, San Francisco, (1974)
[13] Li, G., Tiwari, R.C., Wells, M.T. Quantile Comparison Functions in Two-Sample Problems, With Application to Comparisons of Diagnostic Markers. Journal of the American Statistical Association, Vol. 91, No. 434 (1996), pp 689-698
[14] Lombard, F. Nonparametric Confidence Bands for a Quantile Comparison Function. Technometrics, Vol. 47, No. 3 (2005), pp 364-369
[15] Lu, H.H.S., Wells, M.T., Tiwari, R.C. Inference for Shift Functions in the Two-Sample Problem with Right-Censored Data: With Applications. Journal of the American Statistical Association, Vol. 89, No. 427 (1994), pp 1017-1026
[16] Potgieter, C.J. Estimation and Testing of Linear Treatment Effects from Matched Pair Data. Masters of Science Dissertation, University of Johannesburg, January (2007)
[17] Potgieter, C.J., Lombard, F. Nonparametric estimation of location and scale parameters. Computational Statistics and Data Analysis, 56, pp 4327-4337
[18] van der Vaart, A.W. Asymptotic Statistics. Cambridge University Press, New York, (1998)
Appendices
Appendix A
Approximations for the empirical quantile function
A.1 Uncensored data
From Hsieh [8], (5) and (6), we obtain a strong approximation for the empirical quantile function and substituting these results into (3.1) gives
\[ \hat F_2^{-1}(u) = \mu + \sigma \hat F_1^{-1}(u) - \frac{\sigma}{\sqrt{n_1}} \frac{B_{1,n_1}(u)}{f_1(F_1^{-1}(u))} + \frac{1}{\sqrt{n_2}} \frac{B_{2,n_2}(u)}{f_2(F_2^{-1}(u))}, \]
where $B_{2,n_2}(u)$ and $B_{1,n_1}(u)$ are independent Brownian bridges as set out in Section 2.4.1. The term $f_2(F_2^{-1}(u))$ needs to be replaced, because when the estimation is conducted using $f_2(F_2^{-1}(u))$ there is a bias in the estimation of σ. This will be shown later in Table A.1. Differentiating (3.1) with respect to u gives
\[ \frac{1}{f_2(F_2^{-1}(u))} = \frac{\sigma}{f_1(F_1^{-1}(u))}, \qquad\text{that is,}\qquad f_2(F_2^{-1}(u)) = \frac{1}{\sigma} f_1(F_1^{-1}(u)). \]
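Using this substitution in the above leads to
\[ \hat F_2^{-1}(u) = \mu + \sigma \hat F_1^{-1}(u) - \sigma \left( \frac{1}{\sqrt{n_1}} \frac{B_{1,n_1}(u)}{f_1(F_1^{-1}(u))} - \frac{1}{\sqrt{n_2}} \frac{B_{2,n_2}(u)}{f_1(F_1^{-1}(u))} \right). \tag{A.1} \]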
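Defining $\epsilon(u)$ as
\[ \epsilon(u) = -\sigma \left( \frac{1}{\sqrt{n_1}} \frac{B_{1,n_1}(u)}{f_1(F_1^{-1}(u))} - \frac{1}{\sqrt{n_2}} \frac{B_{2,n_2}(u)}{f_1(F_1^{-1}(u))} \right), \]
we then get (A.1) to be simply
\[ \hat F_2^{-1}(u) = \mu + \sigma \hat F_1^{-1}(u) + \epsilon(u), \]
which holds for $0 \le u \le 1$.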
The covariance for the error term ϵ(u) can be easily found. We have cov(ϵ(u), ϵ(v))
We have the covariance for a Brownian bridge from (2.26) and since the covariance of two independent Brownian bridges is zero then,
cov(ϵ(u), ϵ(v)) = σ2
where the elements in Σ equal the covariance between the errors at time u and v.
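The covariance calculation can be sketched as follows, taking $\epsilon(u)$ to be the last term of (A.1) and assuming that (2.26) is the standard Brownian bridge covariance $\operatorname{cov}(B(u), B(v)) = \min(u,v) - uv$. The cross terms vanish because the two bridges are independent. This is a worked check under those assumptions.

```latex
\begin{align*}
\operatorname{cov}(\epsilon(u),\epsilon(v))
 &= \frac{\sigma^2}{f_1(F_1^{-1}(u))\,f_1(F_1^{-1}(v))}
    \left[\frac{1}{n_1}\operatorname{cov}\bigl(B_{1,n_1}(u),B_{1,n_1}(v)\bigr)
        + \frac{1}{n_2}\operatorname{cov}\bigl(B_{2,n_2}(u),B_{2,n_2}(v)\bigr)\right] \\
 &= \sigma^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)
    \frac{\min(u,v)-uv}{f_1(F_1^{-1}(u))\,f_1(F_1^{-1}(v))}.
\end{align*}
```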
As stated previously, there is a bias when the estimation of σ is conducted using $f_2(F_2^{-1}(u))$ rather than $\frac{1}{\sigma} f_1(F_1^{-1}(u))$. Table A.1 shows the bias for σ, given by
\[ \hat\sigma_{\text{bias}} = \sigma - \bar{\hat\sigma}, \]
when the estimation is carried out using both $f_2(F_2^{-1}(u))$ and $f_1(F_1^{-1}(u))$ in $\Sigma^\star$. The sample size is taken to be 250, and F1 and F2 both come from a normal distribution, with (µ, σ) equal to (0, 1) for the F1 distribution and (µ, σ) for the F2 distribution as shown in the table.

Table A.1: Bias results for σ

(µ, σ)    f1(F1^{-1}(u))    f2(F2^{-1}(u))
(0,1)      0.0022           -0.0238
(1,1)      0.0016           -0.0242
(2,1)     -0.0058           -0.0326
(0,2)     -0.0041           -0.0514
(1,2)      0.0022           -0.0514
(2,2)      0.0071           -0.0457
The bias using $f_2(F_2^{-1}(u))$ is always negative and approximately 10 times larger in magnitude than the bias when using $f_1(F_1^{-1}(u))$. This is due to trying to estimate $f_2(F_2^{-1}(u))$ when the response variable in the regression setup is $F_2^{-1}(u)$. This can be seen almost as a double estimation, which causes the bias.
A.2 Censored data
The following approximations for the product limit estimators are taken from Hsieh [10], (6) and (7):
\[ \hat F_1^{-1}(u) = F_1^{-1}(u) + \frac{K_1(u, n_1)}{n_1 f_1(F_1^{-1}(u))} + \epsilon_{n_1}, \tag{A.3} \]
and
\[ \hat F_2^{-1}(u) = F_2^{-1}(u) + \frac{K_2(u, n_2)}{n_2 f_2(F_2^{-1}(u))} + \epsilon_{n_2}. \tag{A.4} \]
Here $K_1(u, n_1)$ and $K_2(v, n_2)$ denote Kiefer processes, as discussed in Section 2.4.4. Combining (A.3) with (A.4) gives
\[ \hat F_2^{-1}(u) = \mu + \sigma \left[ \hat F_1^{-1}(u) - \frac{K_1(u, n_1)}{n_1 f_1(F_1^{-1}(u))} \right] + \sigma \frac{K_2(u, n_2)}{n_2 f_1(F_1^{-1}(u))}. \]
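As in the complete case, $f_2(F_2^{-1}(u))$ has been replaced by $\frac{1}{\sigma} f_1(F_1^{-1}(u))$ to avoid a bias in the results. This leads to
\[ \hat F_2^{-1}(u) = \mu + \sigma \hat F_1^{-1}(u) + \sigma \left[ \frac{K_2(u, n_2)}{n_2 f_1(F_1^{-1}(u))} - \frac{K_1(u, n_1)}{n_1 f_1(F_1^{-1}(u))} \right], \]
so that we have our regression setup as before with error term $\epsilon(t)$;
\[ \tilde F_2^{-1}(t) = \mu + \sigma \tilde F_1^{-1}(t) + \epsilon(t). \tag{A.5} \]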
The covariance matrix for the error terms is cov(ϵ(s), ϵ(t)) = cov Due to the two Kiefer processes being independent this leads to
cov(ϵ(s), ϵ(t)) = σ2
Using our definition of the covariance function for the Kiefer process in (2.28) gives
cov(ϵ(s), ϵ(t)) =
where the elements in Σ equal the covariance between the errors at time s and t.
Appendix B
Derivation of (5.2)
In Section 2.2 we defined the function
\[ q(x) = F_2^{-1}(F_1(x)) := \phi(F_1, F_2) \]
and its estimator
\[ \hat q(x) = \hat F_2^{-1}(\hat F_1(x)). \]
We wish to develop an expression for the process
\[ \sqrt{n^\star}\,(\hat q(x) - q(x)), \quad x \in \mathbb{R}^1, \]
in terms of sums of independent random variables.
In order to avoid notational complications we define here, and only here,
\[ F \equiv F_1 \quad\text{and}\quad G \equiv F_2, \]
which is the notation used by Potgieter [16]. We apply the functional delta method (see van der Vaart [18], Theorem 20.8). For this we need to find
\[ \phi'(F, G) = \frac{d}{dt}\, \phi\bigl((1-t)F + t\delta_x,\ (1-t)G + t\delta_y\bigr) \Big|_{t=0}. \]
Towards this, let
\[ F_t = (1-t)F + t\delta_x \tag{B.1} \]
and
\[ G_t = (1-t)G + t\delta_y \tag{B.2} \]
and consider the identity
\[ G_t\bigl(G_t^{-1} F_t\bigr) = F_t. \tag{B.3} \]
Substituting (B.1) and (B.2) into (B.3) gives
\[ (1-t)\,G\bigl(G_t^{-1} F_t\bigr) + t\,\delta_y\bigl(G_t^{-1} F_t\bigr) = (1-t)F + t\delta_x \]
and differentiation with respect to t gives
−F + δx=− G( Setting t = 0 and rearranging terms in (B.4) gives
d which leads to the first order expansion
\[ \sqrt{n^\star}\,(\hat q(x) - q(x)) = n^{-1/2} \sum_i \frac{I\{X_i \le x\} - I\{Y_i \le q(x)\}}{g(q(x))} + o_p(1). \]
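The step from (B.3) to this expansion can be sketched as follows. The sketch assumes that $\delta_z$ denotes the distribution function of a point mass at z, that $g = G'$ is the density of G, and that the term carrying the factor t is treated formally so that it drops out at $t = 0$. Differentiating the substituted identity with respect to t and then setting $t = 0$, where $G_0^{-1}F_0 = G^{-1}F = q$ and $G(G^{-1}F) = F$, gives

```latex
\begin{align*}
-F + \delta_x
 &= -G\bigl(G^{-1}F\bigr)
    + g\bigl(G^{-1}F\bigr)\,\frac{d}{dt}\bigl(G_t^{-1}F_t\bigr)\Big|_{t=0}
    + \delta_y\bigl(G^{-1}F\bigr), \\
\frac{d}{dt}\bigl(G_t^{-1}F_t\bigr)\Big|_{t=0}
 &= \frac{\delta_x - \delta_y\bigl(G^{-1}F\bigr)}{g\bigl(G^{-1}F\bigr)}.
\end{align*}
```

Evaluated at a point x with point masses at the observations $X_i$ and $Y_i$, the numerator becomes $I\{X_i \le x\} - I\{Y_i \le q(x)\}$ and the denominator becomes $g(q(x))$, which matches the summand in the expansion above.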