Survival Analysis - Statistical Consideration and Data Analysis

Chapter 2 – Methods

2.2 Cohort Study

2.2.7 Statistical Consideration and Data Analysis

2.2.7.3 Survival Analysis

Survival analyses were performed to assess the primary and secondary endpoints. Survival analysis focuses on time-to-event data (31). Subjects are usually followed over a specified period, and the focus is on the time at which the event of interest occurs (31-34). An observation is called censored when the information about its survival time is incomplete. Three common reasons for censoring are that a person does not experience the event before the study ends, is lost to follow-up, or withdraws from the study (35). Censoring must be non-informative (random), that is, unrelated to the subject's prognosis, to avoid bias in a survival analysis. Survival data have the following features: 1) the outcome variable, consisting of the time to a well-defined event and the event status (event observed or censored); 2) censored observations, for subjects in whom the event of interest has not occurred at the time of data analysis; and 3) the predictors or explanatory variables that could potentially influence the outcome (31).
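As a minimal sketch of how the outcome variable is assembled, the Python function below converts follow-up dates into a (time, status) pair, where status is 1 for an observed event and 0 for a censored observation. The function name, its arguments and the example dates are hypothetical; `end_of_followup` stands for whichever censoring date applies (study end, last contact before loss to follow-up, or withdrawal).

```python
from datetime import date
from typing import Optional, Tuple

def time_and_status(start: date, end_of_followup: date,
                    event_date: Optional[date]) -> Tuple[int, int]:
    """Return (follow-up time in days, status).

    status = 1 if the event was observed on or before end_of_followup,
    status = 0 if the observation is censored at end_of_followup.
    """
    if event_date is not None and event_date <= end_of_followup:
        return (event_date - start).days, 1
    # No event observed during follow-up: censored at the last known date
    return (end_of_followup - start).days, 0

# Hypothetical examples: one observed event, one censored subject
observed = time_and_status(date(2010, 1, 1), date(2012, 1, 1), date(2011, 1, 1))
censored = time_and_status(date(2010, 1, 1), date(2010, 3, 1), None)
```

Storing the time and the status together is what allows censored subjects to contribute their follow-up time to the analysis instead of being discarded.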

The survival and hazard functions are the key elements for describing the distribution of event times. The survival function S(t) is fundamental to a survival analysis: it gives the probability that a person survives longer than a specified time t, that is, the probability that the random variable T (the time to the event) exceeds t (31-35). The hazard function h(t) gives the instantaneous potential per unit time for the event to occur, given that the individual has survived up to time t, whereas the hazard ratio is an estimate of the ratio of the hazard rate in the treated group to that in the control group (36). The hazard rate is the probability that, if the event in question has not already occurred, it will occur in the next time interval, divided by the length of that interval. Because the interval is made very short, the hazard rate in effect represents an instantaneous rate.
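In standard notation, with T the time to the event, the two functions described above can be written as follows. The last identity, linking the survival function to the cumulative hazard, is a textbook result for a continuous event time and is added here only for reference.

```latex
S(t) = P(T > t), \qquad
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}, \qquad
S(t) = \exp\!\left(-\int_0^t h(u)\,du\right)
```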

The hazard function h(t) can be estimated using the following equation:

$$\hat{h}(t) = \frac{\text{number of individuals experiencing an event in the interval beginning at } t}{(\text{number of individuals surviving at time } t) \times (\text{interval width})}$$
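The estimator above is simple arithmetic; the sketch below applies it to one interval. The function name and the numbers in the example (8 events among 200 patients alive at the start of a 6-month interval) are hypothetical and chosen only to show the calculation.

```python
def interval_hazard(events_in_interval: int, at_risk_at_start: int, width: float) -> float:
    """Hazard estimate for one interval: events / (number at risk at t x interval width)."""
    if at_risk_at_start <= 0 or width <= 0:
        raise ValueError("need a positive number at risk and a positive interval width")
    return events_in_interval / (at_risk_at_start * width)

# Hypothetical example: 8 events among 200 patients, interval width of 6 months
rate = interval_hazard(8, 200, 6.0)  # events per patient-month
```

Here the estimate is 8 / (200 × 6) ≈ 0.0067 events per patient-month; shrinking the interval width moves this quantity towards the instantaneous hazard.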

Survival analysis has three primary goals: to estimate and interpret survival and/or hazard functions from the survival data; to compare survival and/or hazard functions between groups; and to assess the relationship of explanatory variables to survival time (31).

Unlike ordinary regression models, survival methods correctly incorporate information from both censored and uncensored observations when estimating model parameters. There are three main approaches to analyzing the relationship between a set of predictor variables and survival time: nonparametric, parametric, and semi-parametric (31,33). Nonparametric methods provide a simple, quick description of the survival experience without assuming any distribution for the event times. The Kaplan-Meier method, a nonparametric estimator of the survival function, is widely used to estimate and graph survival probabilities as a function of time (37). Parametric methods assume that the survival times follow a known probability distribution; commonly used choices include the exponential, Weibull, and lognormal distributions (31). The Cox regression model is a semi-parametric model that, unlike parametric models, makes no assumptions about the shape of the so-called baseline hazard function. The Cox proportional hazards regression model remains the dominant method for testing differences in survival between two or more groups of interest while adjusting for covariates.
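To illustrate how a parametric model uses both censored and uncensored observations, the sketch below fits the simplest case, the exponential distribution, which assumes a constant hazard λ and S(t) = exp(−λt). Its maximum likelihood estimate is the number of observed events divided by the total follow-up time, so censored subjects contribute follow-up time but no event. The function names and data are hypothetical; this is an illustration of the exponential model only, not the method used in this study.

```python
import math
from typing import List

def exponential_rate_mle(times: List[float], events: List[int]) -> float:
    """MLE of the constant hazard of an exponential model with right censoring.

    Log-likelihood: d * ln(lambda) - lambda * sum(times), maximized at d / sum(times),
    where d is the number of observed events. Censored times enter only the denominator.
    """
    return sum(events) / sum(times)

def exponential_survival(t: float, rate: float) -> float:
    """S(t) = exp(-lambda * t) under the fitted exponential model."""
    return math.exp(-rate * t)

# Hypothetical data: four subjects, the second one censored at t = 3
rate = exponential_rate_mle([2, 3, 5, 7], [1, 0, 1, 1])
```

For these data the estimate is 3 events / 17 time units ≈ 0.18; simply dropping the censored subject would instead give 3 / 14 and overstate the hazard.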

2.2.7.3.1 The Kaplan–Meier Method

Survival of the study cohorts was estimated using the Kaplan-Meier (KM) method. The KM survival curve is defined as “the probability of surviving in a given length of time while considering time in many small intervals” (37). The KM estimate is also called the product-limit estimate. In the KM method, time is not divided into intervals of fixed length; instead, each interval runs from one observed event time to the next, so that the interval boundaries are the distinct times at which events occur. For each interval, the probability of surviving to its end is calculated as the product of the probability of having survived to the start of the interval and the conditional probability of surviving through the interval, given that the subject was alive at its start (31,33,37). The KM method makes three assumptions: at any time, subjects who are censored have the same survival prospects as those who continue to be followed; survival probabilities are the same for subjects recruited early and late in the study; and events occur at the times recorded.
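The product-limit calculation described above can be sketched in a few lines of Python. This is a minimal illustration assuming right-censored data held in two plain lists; the function name and data are hypothetical. Subjects censored at an event time are counted as still at risk at that time, which is the usual convention.

```python
from typing import List, Tuple

def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit (Kaplan-Meier) estimate of S(t).

    times  : follow-up time for each subject
    events : 1 if the event was observed, 0 if the subject was censored
    Returns one (t, S(t)) pair per distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect every subject whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # S(end of interval) = S(start of interval) x P(survive t | alive just before t)
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Subjects who had the event or were censored at t leave the risk set
        n_at_risk -= leaving
    return curve

# Hypothetical data: six subjects, two of them censored (at t = 2 and t = 4)
curve = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
```

For these data the estimate steps down to 5/6, 2/3, 4/9 and finally 0 at t = 1, 2, 3 and 5; the censored subjects reduce the number at risk without producing a step.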

2.2.7.3.2 The Log-Rank Test

The survival distributions of the different groups were compared using the log-rank test. The log-rank test is a form of chi-square test and is used to test the null hypothesis that there is no difference between the populations in the probability of an event at any time point (31,35,38). The analysis is based on the times of events. The log-rank test relies on the same assumptions as the Kaplan-Meier survival curve: censoring is unrelated to prognosis, survival probabilities are the same for subjects recruited early and late in the study, and events occur at the times recorded. The test is most likely to detect a difference between groups when the risk of an event is consistently greater in one group than in the other. The log-rank test is purely a test of significance and cannot estimate the size of the difference between the groups (38). Furthermore, it cannot be used to explore or adjust for the effects of prognostic variables, such as age and disease duration, that are known to affect survival.
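A minimal two-group version of the log-rank test is sketched below. At each distinct event time it compares the observed number of events in group 1 with the number expected if both groups shared the same hazard, sums the hypergeometric variances, and refers the resulting statistic to a chi-square distribution with 1 degree of freedom. The function name and example data are hypothetical; in this study the test was run in the statistical packages named in the methods.

```python
import math
from typing import List, Tuple

def log_rank_test(times: List[float], events: List[int],
                  groups: List[int]) -> Tuple[float, float]:
    """Two-group log-rank test.

    times  : follow-up time per subject
    events : 1 = event observed, 0 = censored
    groups : 0 or 1 group label per subject
    Returns (chi-square statistic with 1 df, p-value).
    """
    subjects = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in subjects if e == 1})
    o_minus_e = 0.0  # observed minus expected events in group 1
    variance = 0.0
    for t in event_times:
        at_risk = [(ti, e, g) for ti, e, g in subjects if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for _, _, g in at_risk if g == 1)
        d = sum(1 for ti, e, _ in at_risk if ti == t and e == 1)
        d1 = sum(1 for ti, e, g in at_risk if ti == t and e == 1 and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 given the margins at time t
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical data: group 1 has events at t = 1, 2, 3 and group 0 at t = 4, 5, 6
chi2, p = log_rank_test([1, 2, 3, 4, 5, 6], [1] * 6, [1, 1, 1, 0, 0, 0])
```

Note that the function returns only a statistic and a p-value, consistent with the point above that the log-rank test does not estimate the size of the difference between groups.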

2.2.7.3.3 The Cox Proportional Hazards (PH) Model

Several known variables can affect the survival of patients with Stage IV colorectal cancer (CRC), including age, comorbid illnesses, performance status, extent of cancer, and systemic therapy (39-41). Furthermore, pretreatment hematologic abnormalities have been reported to have prognostic value in patients with solid tumors (42,43). We performed multivariate analyses to determine the prognostic significance of primary tumor resection in patients with Stage IV CRC. A Cox proportional hazards model was used to estimate hazard ratios and their 95% confidence intervals.

The Cox proportional hazards model is a popular mathematical model that is both powerful and flexible for the analysis of survival data (31,33,44,45). Cox regression is considered a semi-parametric procedure because the baseline hazard function, h0(t), does not have to be specified: the effect of the covariates on the hazard (the relative risk) is modeled parametrically, whereas the baseline hazard is left unspecified (non-parametric). The model simultaneously explores the effects of several variables on survival and allows researchers to isolate the effects of treatment from those of other variables. It provides an estimate of the hazard ratio and its confidence interval, and adjusting for prognostic variables may improve the precision of the treatment-effect estimate by narrowing the confidence interval.

The Cox proportional hazards model assumes that the hazard functions of any two individuals are proportional at every point in time, so that their hazard ratio does not depend on time; in its basic form, the model is therefore valid only for time-independent covariates (31). In other words, if one person's risk of death at some initial time point is twice that of another person, then at all later times the risk of death remains twice as high.

Cox's method is similar to multiple regression analysis, except that the dependent (Y) variable is the hazard function at a given time. If there are several explanatory (X) variables of interest, such as age, gender, and interventions, the hazard, or risk of dying, at time t can be expressed as (31):

$$h(t) = h_0(t)\,\exp\left(\beta_{\mathrm{age}} \times \mathrm{age} + \beta_{\mathrm{gender}} \times \mathrm{gender} + \cdots + \beta_{\mathrm{group}} \times \mathrm{group}\right)$$

Taking natural logarithms of both sides:

$$\ln h(t) = \ln h_0(t) + \beta_{\mathrm{age}} \times \mathrm{age} + \beta_{\mathrm{gender}} \times \mathrm{gender} + \cdots + \beta_{\mathrm{group}} \times \mathrm{group}$$

The quantity h0(t) is the baseline or underlying hazard function and corresponds to the hazard when all the explanatory variables are zero. The regression coefficients βage to βgroup give the change in the log hazard associated with a one-unit change in the corresponding explanatory variable; equivalently, exp(β) is the factor by which the hazard is multiplied (the hazard ratio) for each one-unit increase.
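The proportional hazards property can be checked numerically from the model equation. In the sketch below, two hypothetical patients differ only in the group variable, and the coefficients are made-up values for illustration, not estimates from this study. Whatever the baseline hazard h0(t) is at a given time, it cancels in the ratio, leaving exp(βgroup).

```python
import math
from typing import Dict

def cox_hazard(baseline_hazard: float, coefs: Dict[str, float], x: Dict[str, float]) -> float:
    """h(t) = h0(t) * exp(sum of beta_k * x_k), evaluated at a single time t."""
    linear_predictor = sum(coefs[k] * x[k] for k in coefs)
    return baseline_hazard * math.exp(linear_predictor)

# Hypothetical coefficients for illustration only, not estimates from this study
coefs = {"age": 0.02, "gender": 0.3, "group": -0.5}
patient_a = {"age": 70, "gender": 1, "group": 1}
patient_b = {"age": 70, "gender": 1, "group": 0}

# Evaluate the hazard ratio at three different baseline hazards (i.e., three time points)
ratios = [cox_hazard(h0, coefs, patient_a) / cox_hazard(h0, coefs, patient_b)
          for h0 in (0.01, 0.05, 0.2)]
print([round(r, 4) for r in ratios])  # [0.6065, 0.6065, 0.6065]
```

The ratio is exp(−0.5) ≈ 0.61 at every baseline value, which is exactly the constant hazard ratio that the proportional hazards assumption requires.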

The proportional hazards assumption can be tested using graphical methods, goodness-of-fit tests, and time-dependent covariates (31). For example, in a complementary log-log plot, the logarithm of the negative logarithm of the estimated survivor function, ln[−ln S(t)], is plotted against the logarithm of survival time; if the hazards are proportional across groups, the curves will be parallel. Parameter (β) estimates in the Cox PH model are obtained by maximizing the partial likelihood (45). Cox and others have shown that this partial log-likelihood can be treated as an ordinary log-likelihood to derive valid (partial) maximum likelihood estimates of β (44,45).
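The partial likelihood idea can be sketched for a single covariate. At each event time, the subject who had the event contributes exp(βx) divided by the sum of exp(βx) over everyone still at risk; the baseline hazard never appears. The sketch below assumes no tied event times (ties require an approximation such as Breslow's or Efron's, which is omitted here) and maximizes the partial log-likelihood by Newton-Raphson. Function names and data are hypothetical, and this is not the routine used by SPSS or Stata.

```python
import math
from typing import List

def partial_log_likelihood(beta: float, times: List[float], events: List[int],
                           x: List[float]) -> float:
    """Cox partial log-likelihood for one covariate, assuming no tied event times."""
    ll = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if e_i == 1:
            risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
            ll += beta * x[i] - math.log(sum(math.exp(beta * x[j]) for j in risk_set))
    return ll

def fit_cox_one_covariate(times: List[float], events: List[int], x: List[float],
                          max_iter: int = 50, tol: float = 1e-10) -> float:
    """Maximize the partial log-likelihood with Newton-Raphson, starting at beta = 0."""
    beta = 0.0
    for _ in range(max_iter):
        score, information = 0.0, 0.0
        for i, (t_i, e_i) in enumerate(zip(times, events)):
            if e_i == 1:
                risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
                w = [math.exp(beta * x[j]) for j in risk_set]
                total = sum(w)
                # Risk-set weighted mean and second moment of the covariate
                mean_x = sum(wj * x[j] for wj, j in zip(w, risk_set)) / total
                mean_x2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk_set)) / total
                score += x[i] - mean_x
                information += mean_x2 - mean_x ** 2
        if information == 0:
            break
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    return beta

# Hypothetical data: six subjects, all with observed events, binary covariate x
times, events, x = [1, 2, 3, 4, 5, 6], [1] * 6, [1, 0, 1, 0, 1, 0]
beta_hat = fit_cox_one_covariate(times, events, x)
```

exp(beta_hat) is then the estimated hazard ratio for x = 1 versus x = 0; a full implementation would also report its standard error from the inverse of the information.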

In our cohort studies, we examined the following variables for their prognostic significance. Interventions: resection of the primary tumor, metastasectomy, use of chemotherapy, second-generation chemotherapy, second-line therapy, third-line therapy, and radiation therapy; clinical and demographic variables: age, gender, major comorbid illness, secondary cancer, ECOG performance status, cancer center, and active smoking; laboratory values: albumin, bilirubin, alkaline phosphatase, sodium level, serum creatinine, blood urea nitrogen (BUN), hemoglobin, white blood cell (WBC) count, platelet count, and carcinoembryonic antigen (CEA); disease characteristics: site, grade, mucinous tumor, symptomatic disease, extra-hepatic metastases, and stage.

The following cutoffs were used to categorize continuous variables: age (<65 vs. ≥65, or <70 vs. ≥70), albumin (≥36 vs. <36 g/l), bilirubin (≥26 vs. <26 µmol/l), alkaline phosphatase (≥120 vs. <120 U/l), sodium (≤135 vs. >135 mEq/l), serum creatinine (≥120 vs. <120 µmol/l), BUN (≥8 vs. <8 mmol/l), hemoglobin (≥120 vs. <120 g/l), WBC count (≥11 vs. <11 × 10⁹/l), platelet count (≥450 vs. <450 × 10⁹/l), and CEA (≥6 vs. <6 µg/l). The categorical or ordinal variables were categorized as: site (colon vs. rectum), grade (3 vs. <3), and stage (Stage IVa vs. Stage IVb disease).
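A minimal sketch of applying these cutoffs is shown below. The variable keys are hypothetical names, the cutoffs are transcribed from the text, and the 0/1 coding direction (1 for the ≥ side of the cutoff, or the ≤ side for sodium) is an assumption for illustration; the text does not specify how the categories were coded in the models.

```python
from typing import Callable, Dict

# Hypothetical variable keys; cutoffs as listed in the text.
# Each rule returns True for the ">= cutoff" category (the "<= 135" category for sodium).
CUTOFF_RULES: Dict[str, Callable[[float], bool]] = {
    "age": lambda v: v >= 65,                    # alternative cutoff in the text: 70
    "albumin": lambda v: v >= 36,                # g/l
    "bilirubin": lambda v: v >= 26,              # umol/l
    "alkaline_phosphatase": lambda v: v >= 120,
    "sodium": lambda v: v <= 135,                # mEq/l
    "creatinine": lambda v: v >= 120,            # umol/l
    "bun": lambda v: v >= 8,                     # mmol/l
    "hemoglobin": lambda v: v >= 120,            # g/l
    "wbc": lambda v: v >= 11,                    # x 10^9/l
    "platelets": lambda v: v >= 450,             # x 10^9/l
    "cea": lambda v: v >= 6,                     # ug/l
}

def categorize(patient: Dict[str, float]) -> Dict[str, int]:
    """Convert the continuous values available for one patient into 0/1 indicators."""
    return {name: int(rule(patient[name]))
            for name, rule in CUTOFF_RULES.items() if name in patient}

# Hypothetical patient record
indicators = categorize({"age": 70, "albumin": 35, "sodium": 135})
```

Keeping the cutoffs in one table makes it easy to rerun the analysis with the alternative age cutoff of 70.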

For the cohort of patients who underwent surgical resection of the primary tumor, tumor-related characteristics, including nodal status, T status, the ratio of metastatic to examined lymph nodes (LNR; the median was used as the cutoff value), and the number of lymph nodes removed (≥12 vs. <12), were examined in a multivariate analysis. For the Cox proportional hazards model, the proportional hazards assumption was assessed for each variable using log-log survival curves. All variables that were significant on univariate analysis (P < 0.05) were examined in multivariate models. The likelihood ratio test and the t test were used to determine whether a variable was correlated with survival in the model. Tests for interaction were performed between surgery and the other prognostic variables that were correlated with survival. In addition to the tests for interaction, secondary analyses were performed in subgroups of patients with asymptomatic or minimally symptomatic disease, patients who did not undergo metastasectomy, and patients who were treated with combination chemotherapy. A two-sided P-value of <0.05 was considered statistically significant. Missing data were handled with an imputation technique. Statistical analyses were performed with SPSS versions 21-22 (SPSS Inc., Chicago, IL) and Stata/MP version 13.1 (StataCorp, College Station, TX).
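The log-log check of the proportional hazards assumption can be illustrated with a short sketch. It transforms (t, S(t)) points into (ln t, ln[−ln S(t)]) and compares two groups. In practice the inputs would be Kaplan-Meier estimates for each group; here, as an assumption for illustration, two hypothetical exponential survival curves with hazards 0.1 and 0.2 are used, for which the hazards are exactly proportional.

```python
import math
from typing import List, Tuple

def log_minus_log(curve: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Map (t, S(t)) to (ln t, ln(-ln S(t))); points with S = 0 or 1, or t <= 0, are skipped."""
    return [(math.log(t), math.log(-math.log(s))) for t, s in curve if t > 0 and 0 < s < 1]

# Hypothetical survival curves with constant hazards 0.1 and 0.2
times = [1.0, 2.0, 5.0, 10.0]
group_a = [(t, math.exp(-0.1 * t)) for t in times]
group_b = [(t, math.exp(-0.2 * t)) for t in times]

# Vertical gap between the transformed curves at each time point
gaps = [yb - ya for (_, ya), (_, yb) in zip(log_minus_log(group_a), log_minus_log(group_b))]
```

Because ln[−ln S(t)] = ln h0-related terms + β for every t under proportional hazards, the gap is constant (here ln 2, the log of the hazard ratio), so the plotted curves are parallel; curves that converge or cross suggest the assumption does not hold.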

In document Management of the Primary Tumor in Metastatic Colorectal Cancer (Page 43-46)