interval-censored time-to-event data
2.4 Computation
Use of standard methods to compute $(\hat\theta_n, \hat\Lambda_n)$ is complicated by the size of the parameter space and the constraints on $\Lambda$. The latter cannot be eliminated through transformation, but can be expressed as a linear inequality. From Proposition 2.6, $\log \mathrm{lik}_n(\theta, \Lambda)$ is concave, so computation of $\hat\Lambda_n$ reduces to quadratic programming (QP). Cheng et al. (2011) recently applied QP to obtain Wellner and Y. Zhang’s (2007) semiparametric estimators from panel count data. They proposed jointly updating estimates for $\theta$ and $\Lambda$ using Pan’s (1999) extension of the iterative convex minorant algorithm (Jongbloed 1998). The approach proposed here is similar, but the quadratic approximation is based on the relatively flexible Lagrangian framework of Dümbgen et al. (2006).

2.4.1 Parameter estimates
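Let $\lambda_j = \Lambda(t_j)$, where $t_j$ is the right endpoint of the $j$th maximal intersection from Definition 2.4. By A2 the almost-sure constraints $W^\top \lambda_j \ge 0$ and $W^\top \Lambda(t_j) \le W^\top \Lambda(t_k)$, $j < k$, amount to the inequality $A\lambda \ge 0$, where $\lambda = (\lambda_1^\top, \ldots, \lambda_d^\top)^\top$ and $A$ is the block diagonal matrix
\[
A = \begin{bmatrix}
w & 0 & 0 & 0 & \cdots & 0 \\
-w & w & 0 & 0 & \cdots & 0 \\
0 & -w & w & 0 & \cdots & 0 \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & \cdots & 0 & -w & w
\end{bmatrix},
\]
with $w$ as described in Remark 2.2. In practice the minimum $w_0$ and maximum $w_1$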
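For brevity put $\phi = (\theta^\top, \lambda^\top)^\top$ and let $\log \mathrm{lik}_n(\phi) \equiv \log \mathrm{lik}_n(\theta, \lambda)$. Following the results of Section 1.2.4, we specify a computational algorithm by an initial value $\phi^{(0)}$, a candidate step $\eta^{(r)} = (\eta_\theta^\top, \eta_\lambda^\top)^\top$, a line search finding $\phi^{(r+1)} \in \mathrm{seg}(\phi^{(r)}, \phi^{(r)} + \eta^{(r)})$ such that $\log \mathrm{lik}_n(\phi^{(r+1)}) \ge \log \mathrm{lik}_n(\phi^{(r)})$, and a stopping rule $d(\phi^{(r)}, \phi^{(r+1)}) < \varepsilon$.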
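Applying the framework of Dümbgen et al. (2006, Section 3), the candidate step for $\lambda^{(r)}$, $\eta_\lambda^{(r)}$, is based on a quadratic approximation. In particular
\[
\begin{aligned}
\eta_\lambda^{(r)} &= \arg\max_{\eta_\lambda : A(\eta_\lambda + \lambda^{(r)}) \ge 0} \; \nabla_\lambda \log \mathrm{lik}_n(\phi^{(r)})^\top \eta_\lambda + \tfrac{1}{2}\, \eta_\lambda^\top \nabla_\lambda^2 \log \mathrm{lik}_n(\phi^{(r)})\, \eta_\lambda \\
&\approx \arg\max_{\lambda : A\lambda \ge 0} \bigl\{ \log \mathrm{lik}_n(\theta^{(r)}, \lambda) - \log \mathrm{lik}_n(\phi^{(r)}) \bigr\} - \lambda^{(r)}.
\end{aligned}
\tag{2.13}
\]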
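The approximation in (2.13) can be read as a second-order Taylor expansion of the log-likelihood in $\lambda$ about $\lambda^{(r)}$ with $\theta^{(r)}$ held fixed. As a sketch, assuming the log-likelihood is twice differentiable in $\lambda$ and writing $\lambda = \lambda^{(r)} + \eta_\lambda$:

```latex
\[
\log \mathrm{lik}_n(\theta^{(r)}, \lambda^{(r)} + \eta_\lambda) - \log \mathrm{lik}_n(\phi^{(r)})
\approx \nabla_\lambda \log \mathrm{lik}_n(\phi^{(r)})^\top \eta_\lambda
+ \tfrac{1}{2}\, \eta_\lambda^\top \nabla_\lambda^2 \log \mathrm{lik}_n(\phi^{(r)})\, \eta_\lambda .
\]
```

Under this substitution the constraint $A(\eta_\lambda + \lambda^{(r)}) \ge 0$ on the step is the constraint $A\lambda \ge 0$ on the updated value, and subtracting $\lambda^{(r)}$ from the maximizing $\lambda$ recovers the step.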
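The estimate $\theta^{(r)}$ is updated via the Newton-Raphson step
\[
\eta_\theta^{(r)} = -\bigl[\nabla_\theta^2 \log \mathrm{lik}_n(\phi^{(r)})\bigr]^{-1} \nabla_\theta \log \mathrm{lik}_n(\phi^{(r)}). \tag{2.14}
\]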
Following Jongbloed (1998), overshoot is avoided using a step-halving line search based on a variant of Armijo’s (1966) rule. It is given by
\[
\phi^{(r+1)} = \phi^{(r)} + \eta^{(r)}/2^j, \tag{2.15}
\]
where $j$ is the smallest nonnegative integer satisfying
\[
\log \mathrm{lik}_n(\phi^{(r)} + \eta^{(r)}/2^j) - \log \mathrm{lik}_n(\phi^{(r)}) \ge \alpha\, \nabla_\phi \log \mathrm{lik}_n(\phi^{(r)})^\top \eta^{(r)}/2^j.
\]
Here $\alpha$ is a fixed parameter set to some positive value less than the step factor: $0 < \alpha < 1/2$. Its value can affect the number of iterations needed to achieve the stopping rule, but is otherwise inconsequential (Fletcher 1987, p. 30).

Algorithm 2.15. Set $r := 0$, $\theta^{(0)} = 0$ and $\lambda_j^{(0)} = (t_j/\tau, 0_{d_w - 1}^\top)^\top$. Let $\eta^{(r)}$ be the candidate step with components given by (2.14) and (2.13) and $\phi^{(r+1)}$ be the result of the line search (2.15). If
\[
\|\phi^{(r+1)} - \phi^{(r)}\|_\infty \le \varepsilon, \tag{2.16}
\]
for small positive value $\varepsilon$, then stop. Otherwise, put $r := r + 1$. ◽
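The control flow of the algorithm can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `armijo_halving` and `fit` are hypothetical, the line search is written as a sufficient-increase (Armijo-type) condition for a maximization problem, and the toy objective is an unconstrained concave quadratic. The constrained QP step (2.13) and the constraint $A\lambda \ge 0$ are not implemented; any candidate step can be supplied through `candidate_step`.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def armijo_halving(
    loglik: Callable[[Vector], float],
    grad: Callable[[Vector], Vector],
    phi: Vector,
    eta: Vector,
    alpha: float = 0.25,
    max_halvings: int = 50,
) -> Vector:
    """Return phi + eta / 2**j for the smallest j giving sufficient increase."""
    f0 = loglik(phi)
    # Directional derivative of the log-likelihood along the candidate step.
    slope = sum(g * e for g, e in zip(grad(phi), eta))
    for j in range(max_halvings + 1):
        scale = 0.5 ** j
        trial = [p + scale * e for p, e in zip(phi, eta)]
        if loglik(trial) - f0 >= alpha * scale * slope:
            return trial
    return list(phi)  # no acceptable step found; keep the current value


def fit(
    loglik: Callable[[Vector], float],
    grad: Callable[[Vector], Vector],
    candidate_step: Callable[[Vector], Vector],
    phi0: Sequence[float],
    eps: float = 1e-8,
    max_iter: int = 500,
) -> Vector:
    """Iterate candidate step + line search until the sup-norm change <= eps."""
    phi = list(phi0)
    for _ in range(max_iter):
        eta = candidate_step(phi)
        phi_next = armijo_halving(loglik, grad, phi, eta)
        if max(abs(a - b) for a, b in zip(phi_next, phi)) <= eps:
            return phi_next
        phi = phi_next
    raise RuntimeError("no convergence within max_iter iterations")


# Toy concave objective with maximizer (1, -0.5); purely illustrative.
def toy_loglik(phi: Vector) -> float:
    return -(phi[0] - 1.0) ** 2 - 2.0 * (phi[1] + 0.5) ** 2


def toy_grad(phi: Vector) -> Vector:
    return [-2.0 * (phi[0] - 1.0), -4.0 * (phi[1] + 0.5)]


def toy_step(phi: Vector) -> Vector:
    # Four times the exact Newton step, so the halving is actually exercised.
    return [4.0 * (1.0 - phi[0]), 4.0 * (-0.5 - phi[1])]


phi_hat = fit(toy_loglik, toy_grad, toy_step, [0.0, 0.0])
```

In the toy run the first candidate step overshoots, so the line search halves it until the increase condition holds. The second pass changes nothing and the sup-norm rule stops the loop.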
Convergence of Algorithm 2.15 to the maximum likelihood estimator follows from Propositions 1.27 and 2.6. Alternative convergence criteria to (2.16) can be based on the characterization of the SPMLE implied by Proposition 1.27:
\[
\bigl| \nabla_\phi \log \mathrm{lik}_n(\phi^{(r)})^\top \phi^{(r)} \bigr| \le \varepsilon. \tag{2.17}
\]
Constrained Newton methods generally require many more iterations than the standard Newton-Raphson algorithm. Computing time is largely determined by processing power and the software used to carry out QP. The C routines available with IBM’s (2012) CPLEX Optimization Studio offer a reasonably fast solution.
2.4.2 Variance estimates
The variance estimator for $\hat\theta_n$ given by (2.12) is based on the curvature of the profile log-likelihood. This requires repeated evaluation of the profile log-likelihood
\[
\log \mathrm{plik}_n(\theta) = \sup_{\lambda : A\lambda \ge 0} \log \mathrm{lik}_n(\theta, \lambda),
\]
by fixing $\theta^{(r)}$ at $\theta$ in Algorithm 2.15. Since we need to approximate only the value of the profile likelihood and not the profile maximizer, the stopping rule (2.16) is replaced by
\[
\left| 1 - \frac{\log \mathrm{lik}_n(\theta, \lambda^{(r+1)})}{\log \mathrm{lik}_n(\theta, \lambda^{(r)})} \right| \le \varepsilon.
\]
This can reduce the computation time considerably since the log-likelihood often converges faster than $\lambda^{(r)}$.
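A minimal sketch of this relative-change rule, assuming successive log-likelihood values are available as plain floats. The function name and the guard for a zero denominator are illustrative additions, not part of the stated rule.

```python
def loglik_converged(prev: float, new: float, eps: float = 1e-8) -> bool:
    """Relative-change test |1 - new / prev| <= eps on successive log-likelihoods."""
    if prev == 0.0:
        # The ratio is undefined; treat only an exact repeat as converged.
        return new == 0.0
    return abs(1.0 - new / prev) <= eps


# Two nearly identical log-likelihood values pass at eps = 1e-6,
# even though the underlying lambda iterates may still be moving.
done = loglik_converged(-1234.5678, -1234.5677, eps=1e-6)
```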
The tuning parameter $\rho_n$ in (2.12) determines the values around $\hat\theta_n$ used to assess the curvature of the profile log-likelihood. Standard practice calls for a scalar value $\rho_n \propto n^{-1/2}$ with proportionality constant chosen empirically. Some informal experimentation suggests that variance estimates are not highly sensitive to the choice of $\rho_n$, particularly with larger sample sizes and frequent inspections. This also seems apparent in numerical studies from Zeng et al. (2006). However, for the sake of convenience, a data-driven selection method is desirable. Borrowing methods from numerical differentiation, we adopt the matrix form of $\rho_n$ and reduce the choice to specifying broad parameters describing the magnitude of $\theta$.
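As a rough illustration of the kind of step-size rule used in numerical differentiation, the sketch below scales a forward-difference step by the square root of the evaluation error times a magnitude for the argument. The function names, the default `typ_x = 1.0`, the use of machine epsilon as the evaluation error, and the treatment of `x == 0` as positive are assumptions made for illustration; this is a scalar example, not the matrix form of $\rho_n$ in (2.12).

```python
import math
import sys
from typing import Callable


def fd_step(x: float, typ_x: float = 1.0, err: float = sys.float_info.epsilon) -> float:
    """Step sqrt(err) * sign(x) * max(|x|, typ_x), with x == 0 treated as positive."""
    sign = -1.0 if x < 0 else 1.0
    return math.sqrt(err) * sign * max(abs(x), typ_x)


def forward_difference(f: Callable[[float], float], x: float, typ_x: float = 1.0) -> float:
    """First-order forward-difference approximation to f'(x)."""
    rho = fd_step(x, typ_x)
    return (f(x + rho) - f(x)) / rho


# The derivative of exp at 0 is 1; the approximation should be close to it.
approx = forward_difference(math.exp, 0.0)
```

Using `max(|x|, typ_x)` keeps the step away from zero when `x` itself is near zero, which is the case the plain rule proportional to `x` handles poorly.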
Let $f : \mathbb{R} \to \mathbb{R}$ be a continuously differentiable function. In the finite-difference approximation
\[
f'(x) \approx \frac{f(x + \rho) - f(x)}{\rho},
\]
it is standard practice to select $\rho \sim \sqrt{\epsilon}\, \mathrm{curv}(x)$, where $\epsilon$ is the error in evaluating $f$ and $\mathrm{curv} = \sqrt{f/f''}$ is the “curvature scale” of $f$. This choice is a minimizer of the truncation error $\rho f''$ in the above first-order approximation, plus the “round-off” error $\epsilon\, |f(x)/\rho|$ (Press et al. 2007, Section 5.7). When little is known about $f''$ one can simply set $\rho \sim \sqrt{\epsilon}\, x$ or, for $x$ close to zero,
\[
\rho \sim \sqrt{\epsilon}\, \mathrm{sign}(x) \max(|x|, \mathrm{typ}\, x),
\]
where $\mathrm{typ}\, x$ is a typical absolute value for $x$ (Dennis and Schnabel 1996, p. 98). In (2.12) the curvature of the profile log-likelihood is evaluated with a second-order finite difference approximation. The corresponding curvature scale is based on the ratio of the profile log-likelihood and its third derivative, which can be evaluated