We analyzed the ARIC Study data using the proposed model. As potential risk
factors, we considered baseline characteristics of participants such as race, gender,
hypertension, parents diabetes history, age, BMI, FPG, HDL cholesterol, and total
cholesterol. These variables are regarded as major factors associated with diabetes
(Duncan et al. 2003).
Both the College of American Pathologists (CAP) and the Clinical Laboratory Im-
provement Amendments of 1988 determined the total allowable error, 10% for glucose,
and generally laboratories are well within the total allowable error (Schwartz et al.
2005). Hence we chose
σ
2=
0.3
2, corresponding to 0.09 ratio of measurement error
variance to the standardized FPG variance.
In order to facilitate calculations, we standardized FPG values as well as baseline
continuous covariates to have mean zero and unit variance. Also, the observation time
was scaled to (0,1]. FPG values below 75mg/dLwere winsorized to reduce the influence
of outliers in the lower tail of the distribution because our interest is in crossing a
threshold towards the upper end of the distribution. The number of subjects having
FPG values below 75mg/dLis 204 (1.7%). In winsorization of FPG at 70 or 65mg/dL,
estimates of parameters for continuous covariates are practically unchanged, whereas
estimates for those of discrete covariates changed slightly. However, the statistical
significance of the risk factors for diabetes remained unchanged.
Although we proposed the variance estimator in Section3.3.3, we adopted the profile
likelihood function as described in Section3.3.3
to estimate the asymptotic variance of
the estimators. For each regression parameter, we computed the log-likelihood function
as its profile likelihood function in the way described in section
3.3.3. The estimated
profile likelihood functions appeared to be unimodal, and we obtained the MLE from
the profile likelihood for age, which has the maximum value among all the profile
likelihood functions (Table
3.3). Then we numerically calculated the inverse of the
negative curvature of the profile likelihood function for each parameter at the MLE.
For comparison, we considered two semiparametric models using the fixed threshold
of 126mg/dL, a naive method and Pan (1999)’s method, both ignoring the measurement
error in the observed FPG value; for the naive method, we treated the interval-censored
data as right-censored data and applied the Cox proportional hazard model. When the
FPG value is above the threshold at the next visit after baseline, the event time is
set to be the mid-point between baseline and the next visit. Otherwise, data are
regarded as right-censored at the visit after baseline. Pan (1999)’s method modified
the iterative convex minorant algorithm as a generalized gradient projection method,
and the algorithm can be implemented using the R-package, INTCOX, developed by
Henschel et al. (2007)(referred to as the ICM method hereafter). In the ICM method,
we treat the visit times as current status data for time to diabetes. These two methods
ignore measurement error in FPG values and use the fixed threshold of 126
mg/dLfor
the counting processes. Moreover, the naive method improperly accounts for interval
censoring. The ICM method does not provide standard errors, so we used the simple
bootstrap sampling method with 200 replications to estimate the standard errors of the
regression parameters.
The ICM method yielded similar results to the naive method for most risk factors,
but the effect size and significance of gender and age are different; from the ICM
method, it is found that African-Americans, people with parental history of diabetes,
older age, higher BMI and FPG, and lower HDL cholesterol have significantly higher risk
of diabetes than people with the opposite characteristics. Our method found additional
significant factors for diabetes such as hypertension and higher total cholesterol. The
factors associated with diabetes found from the proposed model agree with the generally
known factors (Mokdad et al. 2003). Total cholesterol consists of HDL cholesterol, LDL
cholesterol, and triglycerides. It is known that higher LDL cholesterol and triglycerides
and lower HDL cholesterol increase diabetes risk, and high total cholesterol plays a
more critical role as a diabetes risk indicator than low HDL cholesterol. We gain less
biased and more precise risk estimates from the proposed model.
In the four US communities, African-Americans, males, people with hypertension,
and people with parents diabetes history have 1.20, 1.45, 1.31, and 1.42 times greater
hazard of diabetes than whites, females, people with hypotension or normal blood
pressure, and people without parental diabetes history, respectively. When baseline
BMI, FPG, and total cholesterol increase by 1 unit, and HDL cholesterol decreases by
1 unit, where the unit is on the original measurement scale, then the hazard of diabetes
increases by a factor of 0.024, 0.052, 0.001, and 0.002, respectively. To investigate
the goodness-of-fit of our model, we used the log-likelihood ratio test to compare the
model in (3.4) with the model with no measurement error, and the test based on
the mixture chi-square distribution shows that there is significant measurement error
(p<0.001). In addition, we generated the predicted glucose values and the empirical
marginal distribution for the predicted values using the density formula in (3.10) and
the formula in (3.4) based on the estimates, respectively. We compared the histogram
of the observed FPG values with the empirical marginal distribution for the model-fit
(left in Figure
3.2). The marginal distribution in Figure
3.2
shows better fit to the
observed distribution of FPG values than the generalized extreme value distribution in
the mode of the distribution. Using the predicted values, we suggest another graphical
method for model diagnosis, a residual plot, subtracting the predicted means from the
real observed glucose values (right in Figure3.2). The residual plot of Figure3.2shows
a fairly good fit and the residuals are randomly scattered around 0.
However, in the residual plot, we observe that there are 52 observations with rel-
atively large residuals. The observations with large residuals are above the 99.7%
quantiles of FPG values (inter-quartile ranges: 179–246
mg/dL) and their observation
times are relatively early. As a sensitivity analysis, we reanalyzed the data excluding
those observations to investigate the influence of these observations on the result. After
excluding the observations, the estimated hazard ratios barely changed and significance
of the factors remains unchanged.
In document
Hyun_unc_0153D_14679.pdf
(Page 60-63)