CHAPTER 4: MANUSCRIPT 1 — SOCIODEMOGRAPHIC PREDICTORS
4.3 Methods
4.3.1 Study sample
The data in this study are drawn from the Santiago Longitudinal Study (SLS), a longitudinal cohort study in Santiago, Chile. Infants were recruited from 1991 to 1996 for either an infancy iron deficiency anemia preventive trial or a neuromaturation study, the latter enrolling infants with anemia and the next nonanemic control (Lozoff et al. 2003; Lozoff 2012; Gahagan et al. 2009). We characterized the growth period prior to treatment randomization, which occurred at six months, partly to reduce confounding from randomization status and partly to include as early a period as possible with enough repeated observations to adequately characterize the nonlinear growth curve. Inclusion criteria for the preventive trial were full-term birth, birthweight ≥ 3.0 kg, vaginal delivery, no major health problems for the infant, and no iron deficiency anemia at 5 to 6 months. The infancy sample comprised 1,645 infants who completed the preventive trial and 153 in the neuromaturation study. This analysis included 1,412 participants with anthropometric measures for at least two time points.
4.3.2 Outcome and sociodemographic measures
Three anthropometric measures were assessed: weight (kg), length (cm), and weight-for-length (WFL) (g/cm). Weight was measured to the nearest 0.01 kg on an electronic scale at local public health clinics. Length was measured on a recumbent board to the nearest 0.1 cm. Gestational age (GA) was included in models as a covariate.
Sociodemographic measures were self-reported by the mother and included maternal age (years), total years of education, and the modified Graffar index (Graffar 1956), an index of socioeconomic position (SEP) within lower-income countries (Alvarez, Muzzo, and Ivanović 1985). The modified Graffar index is the sum of 10 measures, including education, expenditures, and housing characteristics, with higher values indicating lower social class (Appendix Table A1).
4.3.3 Statistical analyses
Summary statistics were the median and interquartile range for continuous variables and percentages with counts for categorical variables. All summary statistics were stratified by sex of the child.
The outcomes, infant weight, length, and weight-for-length (WFL) growth from birth to five months, were assessed using the SITAR approach (Cole, Donaldson, and Ben-Shlomo 2010). In this approach, a nonlinear mixed effects model was fit (Beath 2007) using the R nlme package (Pinheiro et al. 2017). The model can produce up to three measures of growth for each individual, named “size”, “tempo”, and “velocity” (Cole, Donaldson, and Ben-Shlomo 2010). The “size” parameter shifts the growth curve of an individual up or down relative to the average growth curve. The “tempo” parameter shifts the growth curve left or right on the age scale relative to the average growth curve. Lastly, the “velocity” parameter transforms the age scale in the nonlinear model, shrinking or stretching it for an individual relative to the average growth curve. These three parameters have biologically meaningful interpretations, which are difficult to obtain with complex growth models (Beath 2007), and this is a primary reason for our use of the method. Unless otherwise noted, references to “size”, “tempo”, and “velocity” herein refer to these SITAR parameters as applied to early infant growth.
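For reference, the three parameters can be written in the form commonly given for SITAR (Cole, Donaldson, and Ben-Shlomo 2010). The symbols below are notation introduced here for illustration; this section does not state the exact parameterization that was fitted (for example, the form of the mean curve).

```latex
y_{it} = \alpha_i + h\!\left( (t - \beta_i)\, e^{\gamma_i} \right) + \varepsilon_{it}
```

Here $y_{it}$ is the anthropometric measure for individual $i$ at age $t$, $h(\cdot)$ is the average growth curve, $\alpha_i$ is the size random effect (vertical shift), $\beta_i$ is the tempo random effect (shift along the age axis), $\gamma_i$ is the velocity random effect (scaling of the age axis), and $\varepsilon_{it}$ is the residual. A model with fewer than three parameters omits the corresponding random effect.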
Before fitting models with the predictors, we identified the best-fitting SITAR model for each anthropometric measure, independent of any covariates, using the lowest Bayesian Information Criterion (BIC). We evaluated all combinations of one to three SITAR parameters for each of the three anthropometric measures; the best fit (Appendix Table B1) entailed: 1) all three growth parameters for weight (BIC = -22941), 2) sex-specific growth trajectories with tempo and velocity parameters for length (BIC = -38001), and 3) sex-specific growth trajectories with size and tempo parameters for WFL (BIC = -22809).
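As a minimal sketch of selecting a model by lowest BIC, the Python snippet below applies the standard definition BIC = k ln(n) - 2 ln(L) to three candidate models. The log-likelihoods, parameter counts, and number of observations are hypothetical values chosen for illustration; they are not results from this study, whose models were fit in R.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical candidates: name -> (maximized log-likelihood, number of parameters).
candidates = {
    "size only": (1000.0, 5),
    "size + tempo": (1040.0, 7),
    "size + tempo + velocity": (1043.0, 10),
}
n_obs = 500  # hypothetical number of observations

scores = {name: bic(ll, k, n_obs) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)  # candidate with the lowest BIC

for name, score in scores.items():
    print(f"{name}: BIC = {score:.1f}")
print("Selected:", best)
```

In this made-up example the three-parameter model has the highest likelihood but is not selected, because the penalty for its extra parameters outweighs the small gain in fit.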
After selecting the relevant growth parameters for each anthropometric measure, we used them as outcomes in separate linear regression models. For example, the coefficient in the model for the weight size parameter is interpreted as the percentage change in weight, with weight on the log(kg) scale, for a one-unit change in the predictor (Cole and Altman 2017; Cole 2000). Similarly, a one-unit change in the predictor corresponds to a shift in the time scale in days for the tempo growth parameter and a percentage change for the velocity growth parameter.
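The percentage interpretation of a coefficient on a log scale can be checked numerically. The sketch below assumes the outcome is on the natural log scale and uses a hypothetical coefficient of 0.02; neither the value nor the function names come from the study.

```python
import math


def pct_change_exact(coef):
    """Exact percentage change in the original-scale outcome per one-unit change."""
    return 100.0 * math.expm1(coef)  # 100 * (exp(coef) - 1)


def pct_change_approx(coef):
    """Small-coefficient approximation: 100 times the log-scale coefficient."""
    return 100.0 * coef


coef = 0.02  # hypothetical regression coefficient on the log scale
exact = pct_change_exact(coef)
approx = pct_change_approx(coef)
print(f"exact: {exact:.4f}%, approximation: {approx:.4f}%")
```

For coefficients this small the two values are nearly identical, which is why 100 times a log-scale coefficient is often read directly as a percentage.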
separately. The second set, identified as the “adjusted model”, started with all four covariates: gestational age, maternal age, maternal total years of education, and the Graffar index (Graffar 1956). We removed covariates from the model using a least absolute shrinkage and selection operator (lasso) approach (Tibshirani 2011; Walter and Tiemeier 2009; Franklin et al. 2015; Pavlou et al. 2016). This approach has been shown to perform better than conventional model selection methods with a univariate approach (Greenland 2007), such as stepwise methods (Harrell 2015; Ratner 2010). The lasso helps select the predictors with the strongest effects (Hastie, Tibshirani, and Friedman 2017) while balancing bias and variance in the model. We used the glmnet package (Friedman, Hastie, and Tibshirani 2010) in R to obtain shrunken parameter estimates and the selectiveInference package (Tibshirani et al. 2017) to provide inference via statistical tests and confidence intervals. Each set of comparisons by outcome, i.e. weight, length, or weight-for-length, was considered separately when controlling for multiple comparisons with a Bonferroni correction at an alpha level of 0.05.
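To illustrate how the lasso both shrinks coefficients and removes weak predictors, the sketch below minimizes (1/(2n)) Σ(y − Xβ)² + λ Σ|β| by coordinate descent with soft-thresholding. It is a toy Python implementation for illustration only, not the glmnet code used in the analysis. The data, the penalty value, and the function names are hypothetical, and it assumes centered predictors and outcome (so no intercept). Post-selection inference, as provided by selectiveInference, is not shown.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=50):
    """Lasso by cyclic coordinate descent (assumes centered X and y)."""
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of predictor j with the partial residual
            for i in range(n):
                fit_others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_others)
            rho /= n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / scale
    return beta


# Hypothetical centered data: y = 1.0 * x1 + 0.1 * x2 (strong and weak predictor).
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [1.0, -1.0, 0.0, -1.0, 1.0]
X = [[a, b] for a, b in zip(x1, x2)]
y = [1.0 * a + 0.1 * b for a, b in zip(x1, x2)]

beta_ols = lasso_cd(X, y, lam=0.0)    # no penalty: ordinary least squares
beta_lasso = lasso_cd(X, y, lam=0.2)  # penalty shrinks x1 and drops x2
print("no penalty:", beta_ols)
print("lasso:     ", beta_lasso)
```

With the penalty applied, the coefficient of the weak predictor is set exactly to zero, which is the sense in which the lasso removes covariates from a model, while the coefficient of the strong predictor is shrunk toward zero.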
We used a complete case data set, i.e. all participants with non-missing covariates, because less than one percent of values were missing for every covariate except the Graffar index, for which less than three percent were missing. In the sample, the median number of non-missing outcome (anthropometric) values was six out of six monthly measures (birth to five months). The percentage of missing outcome values at each time point ranged from 0.2% at birth to 9% at months 1 and 2.