Statistical analyses

Overview of analytical approach

All analyses were performed using SAS version 9.3 (SAS Institute, Inc., Cary, North Carolina). We primarily used a complete case analysis for the outcome variables and assessed the frequency and pattern of missing values in the independent variables. Candidate variables were eliminated if their distributions were too narrow to be meaningfully predictive or if they had a substantial proportion of missing values (> 20%). Initial descriptive statistics and plots were generated. Means and percentages for each outcome variable were assessed in bivariate and stratified analyses.
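The missing-value screen described above can be illustrated with a minimal Python sketch. This is not the SAS code used in the study; the function names, the toy variable names, and the use of `None` to mark a missing value are assumptions made for illustration. Only the 20% threshold comes from the text.

```python
from typing import Dict, List, Optional


def missing_fraction(values: List[Optional[float]]) -> float:
    """Fraction of entries that are missing (represented here as None)."""
    if not values:
        raise ValueError("variable has no observations")
    return sum(v is None for v in values) / len(values)


def screen_candidates(
    data: Dict[str, List[Optional[float]]], threshold: float = 0.20
) -> Dict[str, bool]:
    """Return True for variables kept, False for those above the threshold."""
    return {name: missing_fraction(col) <= threshold for name, col in data.items()}


# Hypothetical toy data; the variable names are illustrative only.
toy = {
    "marker_a": [1.2, 0.9, None, 2.1, 1.7, 1.1],    # 1 of 6 missing (about 17%)
    "marker_b": [None, None, 3.0, 2.2, None, 1.9],  # 3 of 6 missing (50%)
}
kept = screen_candidates(toy)
print(kept)
```

Under these assumptions, `marker_a` is retained and `marker_b` is eliminated because more than 20% of its values are missing.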

Study hypotheses were evaluated using generalized linear models (GLMs) fit with the SAS GENMOD procedure (PROC GENMOD) to examine associations between oral health measures and cognitive function. GLMs are flexible and well suited to a wide range of situations, including traditional general linear regression. For a normally distributed continuous outcome (e.g., cognitive test scores), we used least-squares linear regression, for which the “identity” link function was specified. In addition, we used an extension of GLMs, marginal generalized estimating equations (GEE) models, for the repeated outcome (cognitive function scores), so that the within-subject correlation was taken into account. For this purpose, the GENMOD procedure offers a “REPEATED” statement to analyze such correlated outcome data.

¹ Friedewald formula: LDL = TC − HDL − TG/5 (mg/dL)

Directed acyclic graphs (DAGs) and the change-in-estimate procedure were used for adjustment-variable selection. We selected explanatory variables based on published studies and the biological plausibility of the relationships. The DAGs depicted the selected covariates and their relationships with the exposures and outcomes along the causal paths. We also used the results of bivariate, stratified, and collinearity analyses to guide the selection of variables for model building.

Descriptive statistics

Descriptive analyses were initially performed on selected variables to study data distributions and to identify missing values, impossible values, and outliers for outcomes, exposures, and covariates. We used PROC UNIVARIATE to examine the mean, median, mode, standard deviation, range, skewness, and kurtosis of each continuous variable (e.g., cognitive score, number of teeth, and levels of inflammatory markers). In addition, data distributions were visualized using graphical displays such as box plots and scatterplots. Frequency distributions were evaluated using PROC FREQ for ordinal and categorical variables (e.g., periodontal disease status, CHD, and stroke). The number, percentage, and type of missing data were recorded for each variable.
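As a rough illustration of the summary measures listed above, the following Python sketch computes them for one continuous variable. It is not the PROC UNIVARIATE output: the function name `describe` is hypothetical, and skewness and kurtosis here use simple moment-based definitions (divisor n), which is an assumption and may differ from the bias-adjusted estimators that statistical packages report.

```python
import statistics
from typing import Dict, Sequence


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Summary statistics for one continuous variable (no missing values)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(values)
    # Central moments with divisor n (moment-based definitions, an assumption).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("variable is constant; skewness and kurtosis undefined")
    return {
        "n": float(n),
        "mean": mean,
        "median": float(statistics.median(values)),
        "sd": statistics.stdev(values),  # sample standard deviation (divisor n - 1)
        "range": float(max(values) - min(values)),
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }


# Hypothetical toy scores, symmetric around 3.
summary = describe([1, 2, 3, 4, 5])
print(summary)
```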

Bivariate and multivariate analyses

We conducted bivariate analyses to investigate the nature and strength of the following associations: (a) between explanatory variables and both outcomes, and (b) among explanatory variables. These analyses helped to identify potential confounders, effect modifiers, and collinearity for regression model building. First, we calculated a crude estimate of the association between the exposure and the outcome. Second, we examined stratum-specific estimates for the presumed effect modifier (diabetes mellitus). If the stratified estimates were substantively different from each other and a likelihood ratio test for homogeneity suggested strong effect modification (p < 0.10), we included an interaction term between the effect modifier and the exposure in the regression model. Next, we determined whether a covariate was a potential confounder by assessing the magnitude, 95% confidence interval, and p-value of its association with the exposure or outcome. Then, we compared the magnitude and precision of the estimates for the exposure-outcome association with and without adjustment for that covariate. If the estimate adjusted for a selected variable differed from the crude estimate by more than 10%,² we considered that covariate to be a potential confounder.
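The 10% change-in-estimate rule can be sketched in a few lines of Python. This follows the definition given in footnote 2 (absolute difference between the crude and adjusted estimates, divided by the adjusted estimate). The function names and coefficient values are hypothetical, and taking the absolute value of the denominator is an added assumption so that negative coefficients give a positive percentage.

```python
def percent_change_in_estimate(beta_crude: float, beta_adjusted: float) -> float:
    """Percent change between crude and adjusted estimates, relative to adjusted."""
    if beta_adjusted == 0:
        raise ValueError("adjusted estimate is zero; relative change is undefined")
    # abs() on the denominator is an assumption for negative coefficients.
    return abs(beta_crude - beta_adjusted) / abs(beta_adjusted) * 100.0


def is_potential_confounder(
    beta_crude: float, beta_adjusted: float, threshold_pct: float = 10.0
) -> bool:
    """Apply the a priori criterion: change in estimate greater than 10%."""
    return percent_change_in_estimate(beta_crude, beta_adjusted) > threshold_pct


# Hypothetical coefficients for one exposure, before and after adjustment.
change_large = percent_change_in_estimate(-2.0, -1.6)    # about 25%
change_small = percent_change_in_estimate(-1.65, -1.6)   # about 3%
print(is_potential_confounder(-2.0, -1.6), is_potential_confounder(-1.65, -1.6))
```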

Collinearity and multicollinearity among continuous and ordinal explanatory variables were investigated using PROC REG with the VIF, TOL, and COLLIN options in SAS. The same approach was applied to nominal or categorical independent variables: we created dummy variables for each category and used our outcome of interest as the dependent variable in the linear regression. We examined the tolerance³ and variance inflation factor (VIF) for each variable. The VIF is 1/tolerance; it is the factor by which the variance of the corresponding parameter estimate is inflated by multicollinearity, relative to what it would be if there were no multicollinearity. VIF values exceeding 10 suggest multicollinearity. If covariates were highly correlated, coding choices were reconsidered or one of the variables was dropped from the analyses.
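The relationship between R-squared, tolerance, and VIF can be shown with a short Python sketch. It is not a substitute for the PROC REG diagnostics: the function names and data are hypothetical, and the example covers only the two-predictor special case, where the R-squared from regressing one predictor on the other equals their squared Pearson correlation.

```python
import math
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def tolerance_and_vif(r_squared: float) -> Tuple[float, float]:
    """Tolerance = 1 - R^2 and VIF = 1 / tolerance for one predictor."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R-squared must be in [0, 1)")
    tolerance = 1.0 - r_squared
    return tolerance, 1.0 / tolerance


# Two hypothetical predictors; with only two, R^2 is the squared correlation.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
tol, vif = tolerance_and_vif(pearson_r(x1, x2) ** 2)

# A predictor with R^2 = 0.95 against the others would exceed the VIF > 10 rule.
tol_high, vif_high = tolerance_and_vif(0.95)
print(round(vif, 3), vif_high > 10)
```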

Analysis plan for specific aim 1

We estimated associations of the following oral health measures with cognitive function:

a. Dental status

b. Tooth loss

c. Periodontal disease classified by BGI

² The change in estimate is the absolute difference between the crude estimate and the adjusted estimate divided by the adjusted estimate: $\left(|\beta_{\text{crude}} - \beta_{\text{adjusted}}| \,/\, \beta_{\text{adjusted}}\right) \times 100$. The a priori criterion for a potential confounder is that the change in estimate is greater than 10%.

³ For each independent variable, tolerance = 1 − Rsq, where Rsq is the coefficient of determination for the regression of that variable on all other independent variables.

The conceptual basis for these analyses was that poor oral health can be predictive of low cognitive performance. The main hypothesis was that persons who are edentulous, have lost most of their natural teeth, or have severe periodontal disease have lower cognitive scores. We also investigated whether diabetes mellitus was an important modifier of the relationship between each predictor and cognitive function. Separate regression models were fit for each outcome-exposure association. Independent variables included an ordinal measure of periodontal disease (five levels of the BGI index), a continuous measure of the number of remaining teeth, and a binary measure of dental status (edentulous or dentate). Dependent variables were continuous measures of three cognitive functions; thus, linear regression was used to analyze the data. The linear trend across the periodontal status categories was tested.

First, we evaluated the primary effect for each main outcome, adjusting for age, gender, race-center, education, and income. Second, we developed a fully adjusted model by including all potential confounders and effect measure modifiers based on DAGs and bivariate analyses. A reduced model was also created using the minimally sufficient set of covariates for adjustment, which included age, race, gender, study site, education, income, smoking, alcohol use, and diabetes. Covariates in the fully adjusted models consisted of the variables from the reduced model plus variables that were significantly associated with exposures or outcomes, namely body mass index (BMI), hyperlipidemia, hypertension, and APOE ε4. The significance of effect modification by diabetes was assessed with likelihood ratio tests comparing the −2 log-likelihood of the fully adjusted model (i.e., the model that included the interaction term between the exposure and diabetes) with that of the nested model (i.e., the model that excluded the interaction term). The interaction term was removed if its p-value was greater than 0.10. Third, the change-in-estimate approach was used to select the final model.

Regression coefficients for the oral health measures (i.e., dental status, number of teeth, and BGI) from the reduced model and those from the fully adjusted model were compared. If the change in estimate was less than 10% or ±0.1, the more parsimonious (reduced) model was selected.
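The likelihood ratio test used to decide whether to keep the exposure-by-diabetes interaction can be sketched as follows. This is an illustrative Python example, not the SAS code used in the study. It assumes the interaction adds a single parameter (1 degree of freedom), which would not hold if the exposure were entered as several indicator variables. The function name and the −2 log-likelihood values are hypothetical.

```python
import math
from typing import Tuple


def lrt_df1(neg2ll_nested: float, neg2ll_full: float) -> Tuple[float, float]:
    """Likelihood ratio statistic and p-value for one added parameter.

    The statistic is the drop in -2 log-likelihood when the interaction term
    is added. For 1 degree of freedom, the chi-square upper tail probability
    equals erfc(sqrt(statistic / 2)).
    """
    statistic = neg2ll_nested - neg2ll_full
    if statistic < 0:
        raise ValueError("the full model should not fit worse than the nested model")
    return statistic, math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical -2 log-likelihoods for models without and with the interaction.
stat, p_value = lrt_df1(neg2ll_nested=5230.4, neg2ll_full=5226.1)

# Decision rule from the text: drop the interaction if p > 0.10.
keep_interaction = p_value <= 0.10
print(round(stat, 2), keep_interaction)
```

With these made-up values the statistic is about 4.3 and the interaction term would be retained under the p ≤ 0.10 rule.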

Analysis plan for specific aim 2

We used data from two ARIC sites to estimate associations of oral health status measures (e.g., BGI index, tooth loss, and complete tooth loss) with the eight-year change in cognitive function between ARIC Visit 4 in 1996-1998 ("baseline") and 2004-2006 ("follow-up"). A longitudinal analysis using the GEE method was applied to the cognitive function measures. The dependent variables were continuous measures of three cognitive function scores (DSS, DWR, and WF). We created an indicator variable, time (t), to identify whether a score represented a baseline or follow-up measurement. We assumed an unstructured correlation for all pairs of within-subject outcomes (i.e., cognitive scores); therefore, an “unstructured” working correlation matrix was specified for the correlated responses in the analyses. In the model, the main effect of time indicated whether the cognitive scores changed significantly over the 8-year follow-up. The interaction term between time and the exposure was added to determine whether the change in cognitive scores differed significantly between participants who had poor oral health and those who did not.

The fully adjusted model was

$$Y_{it} = \beta_0 + \beta_1 x + \beta_2 t + \beta_3 (x \times t) + \sum_{m=1}^{M} \beta_{4m} G_{im} + \varepsilon_{it}$$

where $Y_{it}$ is an observation for subject $i$ at time $t$; $\beta_0$ is the intercept; $x$ is the oral health measure; $\beta_1$ is the regression coefficient for the oral health measure $x$; $\beta_2$ is the regression coefficient for time; $x \times t$ is an interaction term between the Visit 4 oral health measure and time; $\beta_3$ is the regression coefficient for the interaction term; $G_{im}$ are the values of the $m$ time-independent covariates for subject $i$; $\beta_{4m}$ are the regression coefficients for the time-independent covariates; $M$ is the number of time-independent covariates; and $\varepsilon_{it}$ is the error for subject $i$ at time $t$.
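To make the roles of the time and interaction coefficients concrete, the sketch below evaluates the linear predictor built from the terms defined above (intercept, oral health measure, time, their interaction, and time-independent covariates). It assumes time is coded 0 at baseline and 1 at follow-up, which the text does not state explicitly. All coefficient and covariate values are hypothetical, and nothing here fits a GEE model.

```python
from typing import Sequence


def expected_score(
    b0: float, b1: float, b2: float, b3: float,
    b4: Sequence[float], x: float, t: int, g: Sequence[float],
) -> float:
    """Linear predictor: b0 + b1*x + b2*t + b3*(x*t) + sum_m b4[m]*g[m]."""
    if len(b4) != len(g):
        raise ValueError("need one coefficient per time-independent covariate")
    return b0 + b1 * x + b2 * t + b3 * x * t + sum(b * v for b, v in zip(b4, g))


def expected_change(
    b0: float, b1: float, b2: float, b3: float,
    b4: Sequence[float], x: float, g: Sequence[float],
) -> float:
    """Follow-up (t=1) minus baseline (t=0); algebraically b2 + b3*x."""
    return (expected_score(b0, b1, b2, b3, b4, x, 1, g)
            - expected_score(b0, b1, b2, b3, b4, x, 0, g))


# Hypothetical coefficients and covariates (for example, age and a site flag).
coef = dict(b0=50.0, b1=-2.0, b2=-3.0, b3=-1.5, b4=[-0.25, 1.0])
covariates = [60.0, 1.0]

change_unexposed = expected_change(x=0.0, g=covariates, **coef)  # equals b2
change_exposed = expected_change(x=1.0, g=covariates, **coef)    # equals b2 + b3
print(change_unexposed, change_exposed)
```

Under this coding, the time coefficient is the expected change when the oral health measure is zero, and the interaction coefficient is the additional change per unit of the oral health measure. The covariate terms cancel in the difference.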

Because GEE fits models using a quasi-likelihood method rather than maximum likelihood, the “quasi-likelihood under the independence model information criterion” (QIC) was used to assess model fit. A smaller QIC suggests a better fit of the model.
