Methodological Decisions to Derive Dietary Patterns

CHAPTER 2 Estimating Dietary Patterns: Comparing Factor Analysis

One goal of deriving dietary patterns (DP) is to study the effects of overall diet on health outcomes, as opposed to the effects of individual nutrients or foods, which may be highly correlated (Hu, FB, 2002). DP are latent variables: although not directly observed, they can be measured with a dietary intake instrument and empirically derived using statistical methods. The predominant methods to derive them are factor and cluster analysis (Newby, P.K., 2004). Fahey MT et al. (2007) showed empirically that generalized latent class models (LCM) could be more appropriate for deriving DP than traditional cluster analysis because they allow different outcome distributions, correlated measurement errors, and adjustment for energy intake and other covariates. Few studies (Bailey, R.L. 2006; Costacou, T. 2003; McCann, S.E. 2001; Newby, P.K. 2004; Velie, E.M. 2005) have compared different methodologies using the same data. Only one other study (Padmadas, S.S. 2006) has used latent class models to derive DP, and none has compared them to traditional methods. The aim of this paper is to compare the classification of subjects into DP under factor analysis and latent class analysis. The methods are illustrated using data from the third cohort of the Pregnancy, Infection and Nutrition (PIN) Study.

2.3 Methodological Decisions to Derive Dietary Patterns

Nutritional epidemiologists have considered DP on both continuous and categorical scales. Principal components and exploratory factor analysis (EFA) are the predominant methods for deriving them on a continuous scale. Both methods group food items according to the degree to which they are correlated with each other, and each subject receives a score for each DP. By contrast, cluster and latent class analysis classify individuals into mutually exclusive groups (unknown a priori) such that diets are similar within groups. Even when DP are derived as continuous, investigators usually classify subjects based on the joint classification of the factors in order to estimate the risk of the outcome for each group compared to a referent. In practice, factors are categorized by quantiles, and subjects are classified according to their cross-tabulation (Hu, F.B. 2002; Knudsen, V.K. 2007). Newby and Tucker (2004) review 93 studies that used principal components, factor, or cluster analysis to derive dietary patterns. Here we summarize important statistical decisions relevant to factor analysis (FA) and briefly describe latent class analysis (LCA).
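The quantile-and-cross-tabulation step described above can be illustrated with a minimal sketch. It assumes two factor scores per subject and tertile cut points; the simulated scores, the sample size, and the helper name `quantile_group` are hypothetical and are not taken from the PIN data.

```python
from collections import Counter
import random


def quantile_group(scores, n_groups=3):
    """Assign each score to a rank-based quantile group 0..n_groups-1."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    groups = [0] * len(scores)
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // len(scores)
    return groups


# Hypothetical factor scores for two dietary patterns (simulated, not real data).
random.seed(0)
n = 300
factor1 = [random.gauss(0, 1) for _ in range(n)]
factor2 = [random.gauss(0, 1) for _ in range(n)]

# Categorize each factor into tertiles, then cross-tabulate the two.
t1 = quantile_group(factor1)
t2 = quantile_group(factor2)
cross_tab = Counter(zip(t1, t2))

# Rows: tertile of factor 1; columns: tertile of factor 2.
for a in range(3):
    print([cross_tab[(a, b)] for b in range(3)])
```

Each cell of the resulting 3 x 3 table defines one joint group; one cell (for example, the lowest tertile of both factors) would typically serve as the referent.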

2.3.1 Factor analysis

Factor analysis postulates a statistical model that explains the correlations among many observed variables in terms of a few underlying but unobservable (latent) variables called factors (Bollen K.A. 1989). In exploratory factor analysis the relationship between the observed variables and the latent factors is not specified in advance, whereas in confirmatory factor analysis (CFA) the model is specified a priori. Some advantages of using CFA are the ability to account for correlated errors, test whether factors are uncorrelated, adjust for covariates, and assess goodness-of-fit. In practice, EFA is conducted first to suggest the number and characterization of the DP, and CFA is then used to test hypotheses and assess goodness-of-fit.
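For reference, the common factor model can be written as follows. The notation is an assumption introduced here for illustration and does not appear in the original text: x is the vector of p observed food-group intakes (centered), f the vector of m latent factors, Λ the p × m matrix of factor loadings, and ε the vector of unique errors.

```latex
\begin{aligned}
x &= \Lambda f + \varepsilon, \qquad
\operatorname{Cov}(f) = \Phi, \quad
\operatorname{Cov}(\varepsilon) = \Psi, \quad
\operatorname{Cov}(f, \varepsilon) = 0, \\
\Sigma &= \operatorname{Cov}(x) = \Lambda \Phi \Lambda^{\top} + \Psi .
\end{aligned}
```

Under this notation, uncorrelated factors correspond to Φ = I, and uncorrelated errors correspond to a diagonal Ψ; CFA allows either restriction to be relaxed and tested.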

Most empirical DP are derived using EFA with the principal components method of estimation, which provides a unique factor score solution, followed by the Varimax method of orthogonal rotation, which facilitates interpretation by pushing factor loadings toward 0 or ±1 rather than intermediate values. Orthogonal rotation also simplifies subsequent analyses, for example by avoiding collinearity when factor scores are used as covariates in regression models or by allowing the factors to be analyzed as independent outcomes. CFA for ordinal outcomes can be estimated using the user-written Stata program GLLAMM (Generalized Linear Latent and Mixed Models) (Rabe-Hesketh, S. 2004) or Mplus (Muthen L.K. and Muthen B.O. 1998-2006).

2.3.2 Latent class analysis

Latent class models (Rabe-Hesketh, S. 2007) classify subjects into classes that are unknown a priori, such that subjects within a class are similar. LCM are specified as a finite mixture model (McLachlan G.J. 2000) of conditional densities given the class and are usually estimated by maximum likelihood via the Expectation-Maximization algorithm (Dempster A.P. 1977). An LCM for categorical outcomes and covariates estimates two sets of parameters: regression coefficients predicting class membership and conditional probabilities of the observed responses given the class. In contrast to cluster analysis, in LCM each subject has a predicted probability of belonging to each class. The most common way to classify subjects is to assign each one to the class with the highest probability of membership.
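The estimation and classification steps can be sketched as follows. This is a simplified illustration, not the model fitted in this chapter: it assumes binary items, no covariates (so class membership is described by class weights rather than regression coefficients), and simulated data. The function name `fit_lca` and all parameter values are hypothetical.

```python
import math
import random


def fit_lca(data, n_classes, n_iter=200, seed=0):
    """EM for a latent class model with binary items and no covariates.

    Returns class weights, P(item = 1 | class), the posterior class
    probabilities, and the log-likelihood from the last E-step.
    """
    rng = random.Random(seed)
    n_items = len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # Random starting values for P(item = 1 | class), away from 0 and 1.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(n_classes)]
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each subject.
        posteriors = []
        loglik = 0.0
        for row in data:
            joint = []
            for k in range(n_classes):
                like = weights[k]
                for j, x in enumerate(row):
                    like *= probs[k][j] if x else 1.0 - probs[k][j]
                joint.append(like)
            total = sum(joint)
            loglik += math.log(total)
            posteriors.append([v / total for v in joint])
        # M-step: update class weights and conditional item probabilities.
        for k in range(n_classes):
            nk = sum(p[k] for p in posteriors)
            weights[k] = nk / len(data)
            for j in range(n_items):
                num = sum(p[k] * row[j] for p, row in zip(posteriors, data))
                # Clamp so that no probability is exactly 0 or 1.
                probs[k][j] = min(max(num / nk, 1e-6), 1 - 1e-6)
    return weights, probs, posteriors, loglik


# Simulated data: two hypothetical classes with opposite item profiles.
random.seed(1)
true_probs = [[0.9, 0.9, 0.9, 0.1, 0.1, 0.1],
              [0.1, 0.1, 0.1, 0.9, 0.9, 0.9]]
data = []
for _ in range(400):
    k = 0 if random.random() < 0.5 else 1
    data.append([1 if random.random() < p else 0 for p in true_probs[k]])

weights, probs, posteriors, loglik = fit_lca(data, n_classes=2)

# Modal assignment: each subject goes to the class with the highest posterior.
assigned = [max(range(2), key=lambda c: post[c]) for post in posteriors]
print(weights, loglik)
```

Because EM can converge to a local maximum, it is common to refit from several random starting values (here, different values of `seed`) and retain the solution with the highest log-likelihood.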

2.3.3 Number of dietary patterns

There is no single best way to select the number of DP. When using EFA, the DP literature most often retains the meaningful factors by visually inspecting the loadings in combination with keeping eigenvalues above one (Kaiser’s rule), or keeping the factors that precede the point at which Cattell’s scree plot (eigenvalues vs. number of factors) starts to flatten, which indicates that little explained variance is gained by adding another factor. Similarly, in LCM there is no single accepted statistical test or fit statistic to determine the number of latent classes. The usual likelihood ratio test (LRT) cannot be used to compare nested latent class models because the regularity conditions required in classical maximum likelihood theory are violated, and hence the test statistic does not follow a chi-square distribution. Two approximations to the LRT are the Lo-Mendell-Rubin LRT (LMR-LRT) (Lo et al, 2001) and the bootstrap LRT (B-LRT) (McLachlan, GJ. 2000). Another way to compare models with different numbers of classes is the Bayesian Information Criterion (BIC). A recent simulation study (Nylund, K.L. 2007) showed that the B-LRT identified the correct number of classes more often than the LMR-LRT and the BIC. However, the B-LRT has some disadvantages: it requires large sample sizes, increases computation time, and lacks robustness to model misspecification. Deciding on the number of classes requires care, because spurious latent classes may merely accommodate non-normality rather than reveal true subpopulations (Bauer DJ and Curran PJ, 2004).
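For reference, the BIC for a model with K classes can be written as below. The symbols are assumptions introduced for illustration: L̂_K is the maximized likelihood, p_K the number of free parameters, and n the sample size. The parameter count shown applies only to the simplest case of J binary items and no covariates.

```latex
\begin{aligned}
\mathrm{BIC}_K &= -2 \ln \hat{L}_K + p_K \ln n, \\
p_K &= (K - 1) + K J \quad \text{(binary items, no covariates)} .
\end{aligned}
```

Among candidate models, the one with the smallest BIC is preferred; the penalty term p_K ln n grows with each added class, so an extra class is favored only if it improves the log-likelihood enough to offset it.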

2.4 Analysis of Dietary Patterns for Women in PIN
