Weights Estimation - Application of novel statistical methods for biomarker selection to HIV in

4.3. Application

4.4.1. Weights Estimation

To estimate the weights, we fitted four different pooled logistic regression models. From the first two models, we generated the predicted probabilities required to compute the stabilized weights associated with the biomarker exposure. The numerator of the weights was obtained from an intercept-only regression model with the biomarker (A) as the dependent variable, while estimates for the denominator came from a multivariate regression model with the same covariates listed above plus a smooth function of study duration represented by natural cubic splines with 4 knots at the 5th, 35th, 65th, and 95th percentiles (Harrell, 2001). The use of cubic splines in lieu of a linear term relaxed the dependency on the strong linearity assumption with regard to duration of follow-up while allowing for time-varying hazards.
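The spline term for study duration can be illustrated with a minimal sketch. It assumes the restricted (natural) cubic spline basis in the truncated-power form described by Harrell, with knots placed at the stated percentiles; the function name `rcs_basis`, the simulated `duration` values, and the use of NumPy are illustrative assumptions and not the software used in the study.

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis (truncated-power form).

    Returns a matrix with len(knots) - 1 columns: the linear term x
    followed by len(knots) - 2 nonlinear terms that are constrained
    to be linear beyond the outer knots.
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    k = len(t)
    last_gap = t[-1] - t[-2]
    scale = (t[-1] - t[0]) ** 2  # common normalisation of the nonlinear terms

    def pos3(u):
        # (u)_+^3: cube of the positive part
        return np.clip(u, 0.0, None) ** 3

    cols = [x]
    for j in range(k - 2):
        term = (pos3(x - t[j])
                - pos3(x - t[-2]) * (t[-1] - t[j]) / last_gap
                + pos3(x - t[-1]) * (t[-2] - t[j]) / last_gap)
        cols.append(term / scale)
    return np.column_stack(cols)

# Hypothetical follow-up durations in months (simulated, not study data).
rng = np.random.default_rng(0)
duration = rng.uniform(0, 60, size=500)

# Four knots at the 5th, 35th, 65th, and 95th percentiles, as in the text.
knots = np.percentile(duration, [5, 35, 65, 95])
basis = rcs_basis(duration, knots)
```

With four knots the basis has three columns, which would enter the denominator model alongside the other covariates in place of a single linear duration term.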

Two additional pooled logistic models were used for estimating the censoring weights. All the covariates listed above, including the cubic splines, were used in the estimation of the numerator. For the denominator, the biomarker exposure variable of interest was included as an additional regressor.
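As a rough illustration of how predicted probabilities from the numerator and denominator models turn into stabilized weights, the sketch below takes the ratio of the two probabilities for each person-period record and accumulates it over time within each subject. The cumulative product is the usual convention for pooled logistic weight models and is an assumption here, as are the function name `stabilized_weights` and all of the probability values, which are made up for the example.

```python
import numpy as np

def stabilized_weights(subject, p_num, p_den):
    """Cumulative product of p_num / p_den within each subject.

    Rows are assumed to be sorted by subject and then by time point.
    p_num and p_den are the predicted probabilities of the observed
    outcome (exposure value, or remaining uncensored) from the
    numerator and denominator models.
    """
    subject = np.asarray(subject)
    ratio = np.asarray(p_num, dtype=float) / np.asarray(p_den, dtype=float)
    weights = np.empty_like(ratio)
    running = 1.0
    for row in range(len(ratio)):
        if row == 0 or subject[row] != subject[row - 1]:
            running = 1.0  # new subject: restart the product
        running *= ratio[row]
        weights[row] = running
    return weights

# Hypothetical person-period data: subject 1 has three records, subject 2 has two.
subject = np.array([1, 1, 1, 2, 2])

# Made-up predicted probabilities of the observed biomarker exposure.
p_exp_num = np.array([0.5, 0.5, 0.5, 0.5, 0.5])
p_exp_den = np.array([0.5, 0.4, 0.5, 0.625, 0.5])

# Made-up predicted probabilities of remaining uncensored.
p_cen_num = np.array([0.9, 0.9, 0.9, 0.9, 0.9])
p_cen_den = np.array([0.9, 0.9, 0.75, 0.9, 0.9])

sw_exposure = stabilized_weights(subject, p_exp_num, p_exp_den)
sw_censor = stabilized_weights(subject, p_cen_num, p_cen_den)

# Combined weight per record: product of the exposure and censoring weights.
overall = sw_exposure * sw_censor
```

Weights built this way tend to average close to 1 when the models are reasonably specified, which is a common informal diagnostic.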

For each subject i at each time point j, we subsequently computed an overall weight that was the product of the stabilized weights obtained from the biomarker exposure model and the censoring mechanism. These weights were then entered into the Cox model to generate the adjusted estimates of the variable importance measure for each biomarker. This process was followed separately for each biomarker. There were then k estimates of the variable importance measure and k p-values, denoted P1, P2, …, Pk. We adjusted the p-values for multiple testing to control the false discovery rate. The Benjamini and Yekutieli (2001) False Discovery Rate (FDR) controlling procedure was used to account for the dependence of the test statistics. Significance for each biomarker was assessed by comparing its adjusted p-value to the 0.05 alpha level. A lower p-value denoted a better measure of importance.
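The multiplicity adjustment can be sketched as follows. This is a minimal NumPy version of the Benjamini and Yekutieli step-up adjustment, in which each ordered p-value is multiplied by m·c(m)/rank with c(m) = 1 + 1/2 + … + 1/m; the function name and the four raw p-values are hypothetical and are not results from the study.

```python
import numpy as np

def benjamini_yekutieli(pvalues):
    """Benjamini-Yekutieli adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranks = np.arange(1, m + 1)
    c_m = np.sum(1.0 / ranks)  # harmonic-sum correction for dependence
    scaled = p[order] * m * c_m / ranks
    # Step-up: each adjusted value is the minimum over all larger ranks.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted_sorted = np.minimum(adjusted_sorted, 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted

# Hypothetical raw p-values for four biomarkers (illustration only).
p_raw = np.array([0.001, 0.010, 0.030, 0.200])
p_adj = benjamini_yekutieli(p_raw)

# A biomarker is flagged when its adjusted p-value falls below 0.05.
significant = p_adj < 0.05
```

In this toy example the first two adjusted p-values stay below 0.05 and the last two do not. Established implementations exist in standard statistical packages and would normally be preferred over a hand-written version.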

4.4.2. Results

In the GS study, the median age at enrollment was 27 years. Women from Zimbabwe accounted for 58.5% of the population while those from Uganda made up the remaining 41.5%. About 8% of all study subjects had at least two sex partners while 14% had an STI history, 8% were breastfeeding, and 45% displayed STI symptoms. These women averaged 11.2 (Standard Deviation [SD] = 15.5) sex acts per month, but only 35% of them reported consistent condom use. On average, the partners of these women spent 10 (SD = 15.2) nights away from home, and 75% of those partners had been reported to have had sex with another woman in the three months prior to enrollment in the study. Finally, 59% of the subjects’ partners met the study definition for primary partner risk, a composite variable that included having a partner with HIV, urethral discharge, weight loss, nights spent away from home, or a history of sex with female sex workers.

Table 7 presents the distribution of the stabilized weights. Overall, the mean for the weights computed for each biomarker was clustered around 1, which is a desired result. All biomarkers displayed small variability in the weights.

For each biomarker, we derived estimates of the 5-year cumulative probability of survival with corresponding 2-sided 95% confidence intervals, using the Kaplan-Meier estimator (Kaplan and Meier, 1958). In this context, survival was defined as the probability of not experiencing the event of interest in the first 5 years since the estimated infection date. For the purpose of survival curve estimation, all biomarkers measured on a continuous scale were dichotomized based on meaningful clinical values suggested in the literature (Table 8). In short, the following biomarker characteristics were associated with a 5-year cumulative probability of survival ≤ 40%: CD4 at baseline ≤ 500 cells/mm³ (35% survival rate), Lymphocyte count < 1200 cells/mm³ (35%), and CD4 Percentage ≤ 20% (30%).
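For readers unfamiliar with the estimator, the sketch below shows the product-limit calculation behind a cumulative survival probability at a fixed horizon. The function name `kaplan_meier`, the toy follow-up times, and the event indicators are all hypothetical, and the confidence intervals reported in the text are not reproduced here.

```python
import numpy as np

def kaplan_meier(time, event, horizon):
    """Kaplan-Meier estimate of the survival probability at `horizon`.

    time  : follow-up time for each subject
    event : 1 if the event was observed at `time`, 0 if censored
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    survival = 1.0
    # np.unique returns the distinct event times in increasing order.
    for t in np.unique(time[event]):
        if t > horizon:
            break
        at_risk = np.sum(time >= t)              # still under observation at t
        events_at_t = np.sum((time == t) & event)
        survival *= 1.0 - events_at_t / at_risk
    return survival

# Toy data in years since the estimated infection date (not study data).
time = [1, 2, 2, 3, 4, 6, 7]
event = [1, 1, 0, 1, 0, 1, 0]
surv_5y = kaplan_meier(time, event, horizon=5)
```

Applying the same function separately to the two groups defined by a dichotomized biomarker gives the kind of group-specific 5-year estimates summarized above.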

We computed both weighted and unweighted estimates of the importance of each of the 11 biomarkers under consideration (Tables 9 and 10). The unweighted estimates were obtained from a standard Cox proportional hazards model with the following covariates: age, country, primary partner risk, STI history, having more than one sex partner, number of coital acts in previous 3 months, frequency of nights away from home by study subject’s partner, condom use consistency, study subject’s partner’s sexual behavior and risk, and breastfeeding. The results in Table 9 suggest that, when the standard Cox model was used, the following biomarker variables had a significant impact on CD4 cell counts: Baseline CD4 Cell count, CD4/CD8 T-cell Ratio, Plasma Viral Load, and Lymphocyte Count.

In the weighted analysis (Table 10), the same four biomarkers were found to exert a significant impact on the time to the second successive drop of CD4 cell counts below the threshold of 350 cells/mm³. Based on the magnitude of the p-values, the most important biomarkers were baseline CD4 Cell count, CD4 Percentage, CD4/CD8 Ratio, Lymphocyte Count and HIV Subtype A. Note that, in the pseudo-population created by the weights, HIV subtype A only reached borderline statistical significance.

Evidence from both Tables 10 and 11 suggests that a lower hazard (better survival) was associated with increases in Baseline CD4 Cell count, CD4 Percentage, CD4/CD8 Ratio, Lymphocyte Count, and hemoglobin levels, and with being of HIV Subtype A. For these biomarkers, the log hazard ratio is negative. Conversely, a higher hazard (lower survival) was linked to increases in Plasma Viral Load (Log10/mL) and in CD8 cell counts, and to being HSV-2 positive or of HIV subtype C or D (positive log hazard ratio).


In document Application of novel statistical methods for biomarker selection to HIV infection data (Page 110-114)