An essential part of treatment plan assessment is to check whether the given dose distribution exceeds the tolerance doses of the affected OARs, so that severe side effects can be avoided (see section 2.2.5). These tolerance doses result from clinical experience and from information about the dose to different organs, first summarised by Emami et al. (1991). This seminal work has been extended by numerous studies investigating the relationship between dose distribution and side effects. These dose-response relationships can be expressed using various NTCP models, some types of which are presented below.
2.4.1 Types of NTCP models
NTCP models aim to predict the probability of complications based on the dose distribution in associated irradiated organs. For this, the real three-dimensional dose distribution is reduced to a few simple metrics. Different methods for modelling clinical outcome data of retrospective patient cohorts and their dose distributions are described in the following (Gulliford, 2015).
DVH-reduction models. Based on the data published by Emami et al. (1991), the empirical Lyman-Kutcher-Burman (LKB) model was developed. It describes the dose-response as a function of the irradiated volume, reduces the DVH to a single metric and estimates model parameters for specific OARs (Lyman, 1985; Kutcher and Burman, 1989; Burman et al., 1991; Kutcher et al., 1991). The model parameters are $\mathrm{TD}_{50}$, $m$ and $n$. The parameter $\mathrm{TD}_{50}(V)$ is the tolerance dose for uniform irradiation of a partial volume $V$ of an OAR at which 50 % of patients are likely to experience toxicity. The parameter $m$ determines the slope at the steepest part of the dose-response curve (a smaller $m$ corresponds to a steeper curve), see figure 2.6. The parameter $n$ indicates the volume effect of the investigated OAR (Gulliford et al., 2012b). Serially structured organs such as the spinal cord are described by $n \approx 0$, while parallel organs are characterised by $n \approx 1$. Taking fractional irradiation into account, the normal tissue complication probability of the LKB model, $\mathrm{NTCP}_{\mathrm{LKB}}$, for a uniform dose $D$ to a volume $V$ of an OAR is given by

\[ \mathrm{NTCP}_{\mathrm{LKB}} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} \exp\left(-\frac{u^{2}}{2}\right) \mathrm{d}u, \tag{2.9} \]

where

\[ t = \frac{D - \mathrm{TD}_{50}(V)}{m \cdot \mathrm{TD}_{50}(V)}, \tag{2.10} \]

\[ \mathrm{TD}_{50}(V) = \frac{\mathrm{TD}_{50}(V_{\mathrm{OAR}})}{V^{n}}, \tag{2.11} \]

where $V_{\mathrm{OAR}}$ represents the entire volume of the considered OAR.
However, dose distributions in OARs are in practice non-uniform. An inhomogeneous dose distribution can be reduced to a single metric that produces the same probability of a given side effect as the corresponding uniform dose distribution. Such a metric is the widely used generalised equivalent uniform dose (gEUD), given by

\[ \mathrm{gEUD} = \left( \sum_{i} V_i D_i^{a} \right)^{1/a}, \tag{2.12} \]

where $D_i$ is the dose of bin $i$ in a differential DVH, see figure 2.5B, $V_i$ is the volume in dose bin $i$, expressed as a fraction of the total OAR volume, and $a$ is a volume parameter equal to $1/n$. This uniform dose can then be inserted as $D = \mathrm{gEUD}$ into the LKB model in equation (2.10).
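As an illustration of equations (2.9) to (2.12), the following minimal Python sketch evaluates the LKB model for a differential DVH. It assumes that $\mathrm{TD}_{50}$ refers to uniform irradiation of the whole organ and that $n > 0$; the bin volumes are normalised internally so that they sum to one. The function names `geud` and `ntcp_lkb` are hypothetical and chosen for this example only.

```python
import math


def geud(doses, volumes, a):
    """Generalised equivalent uniform dose of a differential DVH, eq. (2.12).

    doses   -- dose D_i of each DVH bin (e.g. in Gy)
    volumes -- volume V_i of each bin; normalised here to fractions of the OAR
    a       -- volume parameter, a = 1/n
    """
    total = sum(volumes)
    return sum((v / total) * d ** a for d, v in zip(doses, volumes)) ** (1.0 / a)


def ntcp_lkb(doses, volumes, td50, m, n):
    """LKB complication probability, eqs. (2.9) and (2.10) with D = gEUD.

    Assumption: td50 is the tolerance dose for whole-organ irradiation
    and n > 0 (n = 0 would require the limit gEUD -> maximum dose).
    """
    d_eff = geud(doses, volumes, 1.0 / n)
    t = (d_eff - td50) / (m * td50)
    # The standard normal integral in eq. (2.9) written with the error function.
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
```

For $n = 1$ (that is, $a = 1$) the gEUD reduces to the mean dose, and a uniform dose equal to $\mathrm{TD}_{50}$ yields a complication probability of 50 %, as expected from the definition of $\mathrm{TD}_{50}$.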
Tissue-architecture models. Models have been developed that consider the functional architecture of the tissue by introducing functional subunits of an OAR. These can be anatomical substructures, such as the nephrons of the kidney, or the largest cell group that still functions as long as it contains a surviving clonogen (Gulliford, 2015). Functional subunits can be arranged in series, in parallel, or in a combination of both. In parallel organs (e.g. liver, lung or kidney), the functional subunits work largely independently, so that side effects occur only once the irradiated volume exceeds a critical value. Side effects in these organs therefore depend mainly on the mean dose deposited in the organ. An LKB model describes this behaviour with a parameter $n \approx 1$. Serially structured organs, such as the spinal cord, can lose their entire function when even a small region is irradiated with a high dose. An LKB model characterises such an OAR with $n \approx 0$.
Källman et al. (1992) suggested the relative seriality model and Niemierko and Goitein (1993) presented a critical volume model based on the assumption that NTCP is fully determined by the number of surviving functional subunits of an OAR.
2 Theoretical background
Multiple-metric models. The models described above predict the complication probability for one specific side effect based on the dose to a single corresponding OAR. However, some complications are affected by the irradiation of several OARs, e.g. swallowing dysfunction following irradiation of the superior pharyngeal constrictor muscle and the supraglottic larynx (Christianen et al., 2012), or heart valvular dysfunction following irradiation of heart and lung (Cella et al., 2014). To account for this in LKB models, an interaction gEUD variable for both OARs can be introduced (Cella et al., 2014). Moreover, side effects may also be modified by dose-independent clinical parameters, such as age or radiation technique (Christianen et al., 2012). Multivariable logistic regression models are suited to including both clinical and dosimetric parameters. They are defined by

\[ \mathrm{NTCP}_{\mathrm{Logistic}} = \left(1 + e^{-g(x)}\right)^{-1}, \tag{2.13} \]

with

\[ g(x) = \beta_0 + \sum_{i=1}^{p} \beta_i x_i, \tag{2.14} \]

where $\beta_i$ denote the model coefficients and $x_i$ are the $p$ individual explanatory variables.
The considered side effect occurs with a probability of 50 % if the condition

\[ g(x) = \beta_0 + \sum_{i=1}^{p} \beta_i x_i = 0 \tag{2.15} \]

is met. For univariable models with a single dosimetric parameter as predictor, $\mathrm{TD}_{50}$ is therefore given by

\[ \mathrm{TD}_{50} = -\frac{\beta_0}{\beta_1}. \tag{2.16} \]
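Equations (2.13) to (2.16) can be sketched in a few lines of Python. The function names and the coefficient values mentioned afterwards are hypothetical and serve only as an illustration; they are not taken from a fitted model.

```python
import math


def ntcp_logistic(x, beta):
    """Logistic NTCP, eqs. (2.13) and (2.14).

    x    -- list of the p explanatory variables x_1 ... x_p
    beta -- list of p + 1 coefficients; beta[0] is the intercept beta_0
    """
    if len(beta) != len(x) + 1:
        raise ValueError("beta must contain one coefficient more than x")
    g = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-g))


def td50_univariable(beta0, beta1):
    """Dose at which a univariable model predicts 50 %, eq. (2.16)."""
    return -beta0 / beta1
```

With the made-up coefficients $\beta_0 = -4$ and $\beta_1 = 0.1\,\mathrm{Gy}^{-1}$, for example, `td50_univariable` returns a dose of about 40 Gy, and `ntcp_logistic` evaluated at this dose returns approximately 0.5, which is consistent with condition (2.15).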
Models considering the development over time. For the analysis of longitudinal toxicity data with repeated measures, as is usual in patient follow-up, a generalised estimating equation (GEE) approach can be applied (Liang and Zeger, 1986). In a cohort of $N$ patients, each patient is seen at $T$ follow-up visits. The number of visits $T$ may differ between patients (Agresti, 2002). An observation for patient $i \in [1, N]$ at follow-up visit $t$ is denoted by $y_{it}$. Thus, all observations of this patient over time form the vector $Y_i = (y_{i1}, \ldots, y_{it}, \ldots, y_{iT})^{\top}$, $t \in [1, T]$. The $K$ corresponding explanatory variables for patient $i$ at time $t$ are combined in the vector $x_{it}$. The value of variable $k$ is denoted by $x_{itk}$, $k \in [0, K]$. For $k = 0$, $x_{it0} = 1$ for all patients, which corresponds to the intercept. Because the side effects are measured repeatedly for each patient at different follow-up visits, the observations are assumed to be correlated within a patient but independent across patients (Agresti, 2002; Wang, 2014). An appropriate link function $g(\cdot)$ relates the explanatory variables to the expected value $E(y_{it}) = \mu_{it}$ via

\[ g(\mu_{it}) = x_{it}^{\top} \beta, \tag{2.17} \]
where $\beta$ is an unknown vector containing all $K$ regression coefficients (Samur et al., 2014). For the evaluation of side effects, the severity grades are commonly dichotomised, e.g. grade < 2 vs grade ≥ 2. For such binary outcome data, $y_{it}$ is either 0 or 1. In this case, the logit function may be an appropriate link function (Samur et al., 2014).
The regression coefficients $\beta_k$ can be estimated by solving the GEE (e.g. using a maximum likelihood approach)

\[ \sum_{i=1}^{N} \left( \frac{\partial \mu_i}{\partial \beta_k} \right)^{\top} V_i^{-1} \left( Y_i - \mu_i \right) = 0, \quad k \in [0, K], \tag{2.18} \]

where $\mu_i$ is the vector of the expected values $\mu_{it}$ of patient $i$ and $V_i$ denotes the variance-covariance matrix of $Y_i$. It is expressed by

\[ V_i = \phi A_i^{1/2} R_i(\alpha) A_i^{1/2}, \tag{2.19} \]

where $A_i$ is a $T \times T$ diagonal matrix with the variance of $y_{it}$ as the $t$-th diagonal element (Liang and Zeger, 1986), $\phi$ is a scale parameter that depends on the distribution of the outcomes, and $R_i(\alpha)$ is the $T \times T$ working correlation matrix, indexed by a vector of association parameters $\alpha$. The working correlation matrix models the dependence between the observations of the same patient; its elements are the correlations between the longitudinal observations within a patient. Commonly used working correlation structures for GEE models are, for example, independent, autoregressive or unstructured (Wang, 2014). An advantage of GEE models is that consistent parameter estimates can be obtained even if the working correlation matrix is incorrectly specified (Samur et al., 2014).
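To make the structure of equation (2.19) more concrete, the following Python sketch builds the working covariance matrix $V_i$ of one patient. It rests on two assumptions that are not prescribed by the text: the outcomes are binary, so that the variance of $y_{it}$ is $\mu_{it}(1 - \mu_{it})$, and the working correlation is a first-order autoregressive structure with a single association parameter $\alpha$, i.e. $R_{st} = \alpha^{|s - t|}$. The function names are hypothetical.

```python
import math


def ar1_correlation(n_visits, alpha):
    """Autoregressive working correlation matrix with R[s][t] = alpha**|s - t|."""
    return [[alpha ** abs(s - t) for t in range(n_visits)]
            for s in range(n_visits)]


def working_covariance(mu, corr, phi=1.0):
    """V_i = phi * A^(1/2) * R * A^(1/2), eq. (2.19), for binary outcomes.

    mu   -- expected values mu_it of one patient, one entry per visit
    corr -- working correlation matrix R_i(alpha) as a nested list
    phi  -- scale parameter
    """
    # A is diagonal, so A^(1/2) only contains the standard deviations.
    sd = [math.sqrt(m * (1.0 - m)) for m in mu]
    n_visits = len(mu)
    return [[phi * sd[s] * corr[s][t] * sd[t] for t in range(n_visits)]
            for s in range(n_visits)]
```

An independent working correlation corresponds to $\alpha = 0$ in this sketch, in which case $V_i$ becomes diagonal and the repeated observations of a patient are treated as uncorrelated.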
2.4.2 Endpoint definition and parameter fitting
Data on side effects should be collected prospectively in clinical trials according to standardised tests or grading systems for a defined follow-up duration, see section 2.3. For modelling, these grades or scores may then be dichotomised according to severity (e.g. grade < 2 vs grade ≥ 2). During univariable modelling, the eligible parameters are preselected. A common method for fitting the model parameters is maximum likelihood estimation (Cella et al., 2014; Gulliford, 2015): the optimal coefficients are those that maximise the agreement between predicted and observed outcome, i.e. the likelihood of the observed outcomes under the model. For multivariable logistic regression models, the individual variables should not be strongly correlated, which should be considered when selecting the model predictors. DVH parameters are often correlated within a given cohort, which may lead to problems with multicollinearity (Bentzen et al., 2010). Principal component analysis may be a method to overcome this problem (Dawson et al., 2005). However, the principal components are combinations of various dosimetric parameters, which makes them appear abstract and therefore less practical in clinical routine (e.g. in treatment planning). Cross-validation and bootstrapping can be used to assess the generalisability of a model (Gulliford, 2015).
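Maximum likelihood estimation for a univariable logistic model can be sketched as follows. The example uses Newton-Raphson iterations, which is one possible optimisation scheme and not necessarily the one used in the cited works; the function names are hypothetical, and the sketch assumes a dichotomised outcome (0 or 1) that is not perfectly separated by the dose.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(beta0, beta1, doses, outcomes):
    """Bernoulli log-likelihood of the observed outcomes under the model."""
    total = 0.0
    for d, y in zip(doses, outcomes):
        p = sigmoid(beta0 + beta1 * d)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit_logistic(doses, outcomes, iterations=25):
    """Estimate (beta0, beta1) by Newton-Raphson steps on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for d, y in zip(doses, outcomes):
            p = sigmoid(b0 + b1 * d)
            w = p * (1.0 - p)
            g0 += y - p            # gradient with respect to beta0
            g1 += (y - p) * d      # gradient with respect to beta1
            h00 += w               # entries of the information matrix
            h01 += w * d
            h11 += w * d * d
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system (information matrix) * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1
```

The fitted coefficients can then be converted into a tolerance dose via $\mathrm{TD}_{50} = -\beta_0 / \beta_1$. In practice, an established statistics package would normally be preferred over such a hand-written optimiser, since it also provides standard errors and convergence diagnostics.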
2.4.3 Assessment of model performance
Model performance can be assessed in terms of discrimination and calibration. Discrimination refers to the ability to separate patients who do or do not develop a given side effect. Calibration compares the predicted and observed outcome (Bentzen et al., 2010).
The discriminating ability of NTCP models can be assessed using receiver operating characteristic (ROC) analysis based on sensitivity and specificity (Gulliford, 2015). Sensitivity and specificity are calculated from the contingency table of predicted and observed outcome for each possible cut-off of the continuous model variable. The ROC curve is a plot of sensitivity against 1 − specificity. The area under the ROC curve (AUC) is a metric that ranges from 0 to 1. A value of 0.5 represents a random prediction, a value of 1 a perfect prediction. In the medical field, AUC values rarely exceed 0.8 (Gulliford, 2015).
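The quantities used in ROC analysis can be illustrated with a short Python sketch. It computes sensitivity and specificity for a single cut-off and obtains the AUC from its rank interpretation, namely the probability that a randomly chosen patient with the side effect receives a higher predicted NTCP than a randomly chosen patient without it (ties count one half). This is equivalent to the area under the empirical ROC curve; the function names are hypothetical.

```python
def sensitivity_specificity(predictions, outcomes, cutoff):
    """Sensitivity and specificity when 'prediction >= cutoff' counts as positive."""
    tp = sum(1 for p, y in zip(predictions, outcomes) if y == 1 and p >= cutoff)
    fn = sum(1 for p, y in zip(predictions, outcomes) if y == 1 and p < cutoff)
    tn = sum(1 for p, y in zip(predictions, outcomes) if y == 0 and p < cutoff)
    fp = sum(1 for p, y in zip(predictions, outcomes) if y == 0 and p >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auc(predictions, outcomes):
    """Area under the ROC curve via pairwise comparison of events and non-events."""
    events = [p for p, y in zip(predictions, outcomes) if y == 1]
    non_events = [p for p, y in zip(predictions, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("both outcome classes must be present")
    score = sum(1.0 if e > n else 0.5 if e == n else 0.0
                for e in events for n in non_events)
    return score / (len(events) * len(non_events))
```

The pairwise formulation is quadratic in the number of patients, which is unproblematic for typical cohort sizes but would be replaced by a rank-based implementation for very large datasets.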
In a calibration plot, the observed outcome is plotted against the predicted NTCP values. For a perfectly calibrated model, the data points lie on the bisector of the first quadrant, i.e. the identity line. Calibration is important if the models are to predict exact complication probabilities, e.g. for the comparison of different treatment plans.
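The data points of a calibration plot can be obtained, for example, by grouping patients according to their predicted NTCP and comparing the mean prediction of each group with the observed complication rate. The following Python sketch uses equal-width bins on the interval from 0 to 1; the binning scheme, the number of bins and the function name are assumptions made for illustration, and other groupings (e.g. quantiles) are equally possible.

```python
def calibration_points(predictions, outcomes, n_bins=4):
    """Return (mean predicted NTCP, observed rate) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(predictions, outcomes):
        # A prediction of exactly 1.0 is assigned to the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, y))
    points = []
    for members in bins:
        if members:
            mean_predicted = sum(p for p, _ in members) / len(members)
            observed_rate = sum(y for _, y in members) / len(members)
            points.append((mean_predicted, observed_rate))
    return points
```

For a well-calibrated model, the returned pairs scatter around the identity line; systematic deviations above or below it indicate under- or overestimation of the complication risk.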
2.4.4 Model validation
In this thesis, NTCP models are developed on a training cohort and internally validated using cross-validation. Model performance then has to be checked by applying the final NTCP model with unchanged model coefficients to an external patient cohort and evaluating the AUC and the calibration plot. The Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) Initiative published recommendations for the reporting of the development and validation of prediction models (Moons et al., 2015). The classification of different analysis types is presented in appendix A, table A.3. Testing on an independent dataset from another institution (external validation) may reduce model performance due to differences in the scoring of side effects, patient demographics and comorbidities, or treatment strategies (Bentzen et al., 2010).