2.4. Statistical analysis
The exposure variable, migration, was generated from three categories: rural, migrant (people who had migrated from rural to urban areas) and urban, as indicated in the previous section. The rural group was used as the reference category for the main analysis. The migration variable was subsequently sub-divided by length of residence in an urban environment, lifetime exposure to an urban area as a proportion of current age, and age at first migration, using the lowest category of each as the baseline for comparisons. Only data from individuals who completed the study were used in the analysis. Participants who completed the study were defined as those with completed questionnaires, clinical measurements and laboratory analyses (see section 5.1.1, Definitions used at each stage of study).
Reporting of all data derived from this study follows the recent STROBE guidelines for observational studies [169]. Statistical analyses were carried out using Stata software version 10 (StataCorp LP, College Station, Texas, USA). To minimise transcription errors, estimation results were converted to output datasets, with one observation per estimated statistical parameter, using the parmby command within Stata [170].
2.4.1. Descriptive analysis
For the general description of data, results are presented as number (percentage), mean (± standard deviation (SD)) or median (interquartile range), as appropriate. Because the study design was matched by age group and sex, no difference between calculations with and without adjustment for age and sex was expected in univariate analyses of categorical data, e.g. prevalence rates. This assumption was verified in all calculations using direct standardisation against the whole sample studied, and such adjustment was therefore not pursued for the reporting of categorical data.
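The check described above can be sketched as follows. This is a minimal illustration, not the Stata code used in the study: the strata, counts and the function name `directly_standardised_rate` are hypothetical. It shows that when every age-sex stratum has the same size, as in a matched design, the crude and directly standardised prevalences coincide.

```python
def directly_standardised_rate(stratum_rates, standard_weights):
    """Weighted average of stratum-specific rates, weighted by the standard population."""
    total = sum(standard_weights.values())
    return sum(stratum_rates[s] * standard_weights[s] for s in stratum_rates) / total

# Hypothetical data for one exposure group: (cases, participants) per age-sex stratum.
group = {
    ("30-39", "F"): (4, 40),
    ("30-39", "M"): (6, 40),
    ("40-49", "F"): (8, 40),
    ("40-49", "M"): (10, 40),
}

# Assumption: the standard population is the whole sample (three matched groups of
# 40 per stratum), so every stratum carries the same weight.
standard = {stratum: 120 for stratum in group}

rates = {stratum: cases / n for stratum, (cases, n) in group.items()}
crude = sum(cases for cases, _ in group.values()) / sum(n for _, n in group.values())
standardised = directly_standardised_rate(rates, standard)

print(f"crude prevalence:        {crude:.3f}")
print(f"standardised prevalence: {standardised:.3f}")
```

With unequal stratum sizes the two figures would diverge, which is the situation in which adjustment would matter.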
Continuous non-normally distributed variables (triglycerides, CRP, fibrinogen, fasting glucose, HbA1c, fasting insulin and HOMA insulin resistance) were log-transformed. This logarithmic transformation led to normal or near-normal distributions. Age- and sex-adjusted arithmetic means (± SD) or geometric means (ratios) [171, 172] are presented. Because the study design only included participants aged 30 years or more, age was centred so that 45 years was used as the baseline in all age adjustments.
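The link between log transformation, geometric means and ratios can be illustrated with a short, unadjusted sketch. The values and function names below are hypothetical; the study itself reports age- and sex-adjusted estimates obtained in Stata.

```python
import math

def geometric_mean(values):
    """Exponentiated arithmetic mean of the natural logs."""
    logs = [math.log(v) for v in values]
    return math.exp(sum(logs) / len(logs))

def centre_age(age, baseline=45):
    """Centre age so that the baseline age (45 years here) maps to zero."""
    return age - baseline

# Hypothetical triglyceride values (mmol/L) for two groups.
rural = [0.9, 1.1, 1.4, 2.0]
migrant = [1.2, 1.5, 1.9, 2.6]

gm_rural = geometric_mean(rural)
gm_migrant = geometric_mean(migrant)

# A difference in means on the log scale back-transforms to a ratio of geometric means.
mean_log_rural = sum(math.log(v) for v in rural) / len(rural)
mean_log_migrant = sum(math.log(v) for v in migrant) / len(migrant)
ratio = math.exp(mean_log_migrant - mean_log_rural)

print(f"geometric means: rural {gm_rural:.2f}, migrant {gm_migrant:.2f}")
print(f"ratio of geometric means (migrant vs rural): {ratio:.2f}")
```

Centring age at 45 means that the intercept of an age-adjusted model refers to a 45-year-old participant rather than to an age of zero, which lies outside the sampled range.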
2.4.2. Multivariable analysis
Multivariable logistic regression and linear regression were used for categorical and continuous outcomes respectively. Adjustment for treatment effects in specific continuous outcomes, e.g. antihypertensive therapy on blood pressure outcomes, was done using censored normal regression [173].
In logistic regression results, odds ratios (OR) compare each exposure category against the baseline exposure group of interest. For linear regression, dummy variables were created for the main exposure variable and, when appropriate, for confounders.
Interpretations of categorical exposures for a continuous outcome were based on the β coefficients. β coefficients represent the average change of the outcome of interest, maintaining the units of measurement, in each category of exposure compared to the baseline group for that exposure.
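As an illustration of this interpretation, the sketch below fits an unadjusted linear regression with dummy variables for a three-category exposure. The data are hypothetical, and the helper functions `dummy_code` and `ols` are written for this example only; they are not the routines used in the study. With no other covariates in the model, each β equals the difference between that category's mean and the baseline mean, in the outcome's own units.

```python
def dummy_code(groups, baseline, levels):
    """Build rows of [intercept, indicator per non-baseline level]."""
    others = [lv for lv in levels if lv != baseline]
    rows = [[1.0] + [1.0 if g == lv else 0.0 for lv in others] for g in groups]
    return rows, others

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Hypothetical systolic blood pressure (mmHg) in the three exposure groups.
groups = ["rural"] * 3 + ["migrant"] * 3 + ["urban"] * 3
sbp = [118, 120, 122, 123, 125, 127, 126, 128, 130]

X, others = dummy_code(groups, baseline="rural", levels=["rural", "migrant", "urban"])
intercept, b_migrant, b_urban = ols(X, sbp)

print(f"baseline (rural) mean: {intercept:.1f} mmHg")
print(f"beta migrant vs rural: {b_migrant:.1f} mmHg")
print(f"beta urban vs rural:   {b_urban:.1f} mmHg")
```

Once confounders are added as further columns of the design matrix, the β coefficients become adjusted differences rather than raw differences in means.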
Adjustment for potential confounding was done in a step-wise approach. A conceptual discussion of potential confounding related to the study is presented in this chapter (see section 2.7.3). The distribution of the measured socioeconomic indicators and their aggregation for use as confounders are discussed later, in section 5.4.
R² for linear regression is also provided in each results table that reports outputs derived from multivariable analyses. R² is the proportion of variance in the outcome variable explained by the predictors. Of note, R² is an overall measure of the strength of association, and does not reflect the extent to which any particular independent variable is associated with the dependent variable [174].
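The definition above corresponds to one minus the ratio of the residual to the total sum of squares. A minimal sketch with hypothetical data, in which the fitted values are simply the group means of an unadjusted three-group model:

```python
def r_squared(observed, fitted):
    """Proportion of outcome variance explained: 1 - SS_residual / SS_total."""
    mean_y = sum(observed) / len(observed)
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    ss_residual = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    return 1.0 - ss_residual / ss_total

# Hypothetical systolic blood pressure (mmHg) for three groups of three participants.
sbp = [118, 120, 122, 123, 125, 127, 126, 128, 130]
# Fitted values from a model containing only the exposure dummies (the group means).
fitted = [120.0] * 3 + [125.0] * 3 + [128.0] * 3

r2 = r_squared(sbp, fitted)
print(f"R-squared: {r2:.3f}")
```

The single figure summarises the whole model, so it cannot show how much of the explained variance is attributable to the exposure rather than to the other predictors.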
Adjustment for treatment effects could potentially be used with glucose as an outcome, censoring those taking antidiabetic medication, or similarly with lipid traits, censoring those on statins. In the case of lipids, none of the participants in this study reported being on any lipid-lowering medication, so no adjustment for lipid treatment was needed. However, censored normal regression, one of the approaches recommended for such treatment-effect adjustments [173], has only been described for blood pressure. Such models make two key assumptions. First, they assume that the underlying blood pressure (that is, unaffected by medication) is at least as high as the observed or measured blood pressure. Second, they assume that the distribution of the underlying blood pressure above any given threshold in treated subjects is the same as the corresponding blood pressure distribution in untreated subjects, implying non-informative censoring [173].
The second assumption can be criticised, and it is because of this assumption that the censoring approach was not used for glucose and antidiabetic medication: glucose presented a skewed distribution and the proportion of participants on medication was very low. Thus, censored normal regression was only used in the case of blood pressure, as previously described [173].
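The two assumptions translate directly into the likelihood of a censored normal model: untreated participants contribute a normal density at their observed value, while treated participants contribute the probability that their underlying value lies at or above the observed one. The sketch below is an intercept-only illustration with hypothetical data and a crude grid search. It is not the procedure of [173] nor the Stata routine used in the study, and all function names are assumptions made for this example.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def normal_logsf(x, mu, sigma):
    """log P(Y > x) for Y ~ Normal(mu, sigma)."""
    z = (x - mu) / sigma
    return math.log(0.5 * math.erfc(z / math.sqrt(2)))

def censored_loglik(mu, sigma, observed, treated):
    """Treated values are right-censored: the underlying value is at least the observed one."""
    total = 0.0
    for y, on_treatment in zip(observed, treated):
        if on_treatment:
            total += normal_logsf(y, mu, sigma)
        else:
            total += normal_logpdf(y, mu, sigma)
    return total

# Hypothetical systolic blood pressure (mmHg); the last two participants are on treatment.
sbp = [112, 118, 121, 125, 130, 134, 138, 142]
treated = [False, False, False, False, False, False, True, True]

# Crude grid search for the maximum-likelihood mean and SD (illustration only).
mus = [100 + 0.1 * i for i in range(601)]
sigmas = [5 + 0.5 * j for j in range(51)]
_, mu_hat, sigma_hat = max(
    (censored_loglik(m, s, sbp, treated), m, s) for m in mus for s in sigmas
)

naive_mean = sum(sbp) / len(sbp)
print(f"naive mean: {naive_mean:.1f} mmHg")
print(f"censored-normal mean: {mu_hat:.1f} mmHg (SD {sigma_hat:.1f})")
```

Because treated values are only treated as lower bounds, the estimated mean is higher than the naive mean of the observed values. The estimate depends on normality of the underlying outcome, which is one reason the approach suits blood pressure better than a skewed outcome such as glucose.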
2.4.3. Standardised mean differences
To answer the study’s main and specific research questions, which evaluate whether specific CVD risk factors differ in the rural-to-urban migrant group compared with those who did not migrate, odds ratios (OR) and standardised mean differences (SMD) were calculated. For categorical outcomes, OR and 95% confidence intervals (CI) were calculated using logistic regression. For continuous outcomes, SMD were chosen because they allow results for continuous data measured on different scales or units to be interpreted together, thereby facilitating comparisons of the size of differences across individual measures. The Cochrane Collaboration has defined SMD as “the difference in means between two groups, divided by the pooled standard deviation of the measurements” and suggests that “the value of a SMD thus depends on both the size of the effect (the difference between means) and the standard deviation of the outcomes (the inherent variability among participants)” [175]. In this study, a slight variation on the standard SMD calculation was made so that more than two groups could be compared.
SMD were calculated through a regression to a within-group SD scale, which took into account the variation in all three groups (rural, migrant and urban). As such, the denominator is not a pooled SD of the measurements in two groups only, but is very similar to the SD of the whole population studied. SMD regressions were carried out in fully adjusted models taking into account age, sex, the individual’s socioeconomic indicators and parental education.
In this sense, all comparisons are expressed as differences in SD units for normally distributed variables, or as differences in SD units on the log scale for transformed variables. Because they are unitless, these SMD allow the magnitude of differences to be compared across risk factors. For the interpretation of SMD, the Cochrane Collaboration indicates that “rules of thumb exist for interpreting SMD (or ‘effect sizes’)… 0.2 represents a small effect, 0.5 a moderate effect, and 0.8 a large effect” [175, 176].
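A three-group SMD of this kind can be sketched as below. The example is unadjusted and uses hypothetical data, whereas the study derived SMD from fully adjusted regressions. The function names, and the use of the rule-of-thumb values as lower cut-offs in `describe_effect`, are assumptions made for illustration.

```python
import math

def within_group_sd(groups):
    """SD pooled over all groups: sqrt(within-group sum of squares / (N - number of groups))."""
    ss_within = 0.0
    n_total = 0
    for values in groups.values():
        mean = sum(values) / len(values)
        ss_within += sum((v - mean) ** 2 for v in values)
        n_total += len(values)
    return math.sqrt(ss_within / (n_total - len(groups)))

def smd_vs_baseline(groups, baseline):
    """Mean difference from the baseline group, divided by the within-group SD."""
    sd = within_group_sd(groups)
    base_mean = sum(groups[baseline]) / len(groups[baseline])
    return {
        name: (sum(values) / len(values) - base_mean) / sd
        for name, values in groups.items() if name != baseline
    }

def describe_effect(smd):
    """Label an SMD using 0.2, 0.5 and 0.8 as lower cut-offs (an assumption of this sketch)."""
    size = abs(smd)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "below small"

# Hypothetical systolic blood pressure (mmHg) in the three exposure groups.
data = {
    "rural": [110, 120, 130],
    "migrant": [115, 125, 135],
    "urban": [118, 128, 138],
}

smds = smd_vs_baseline(data, baseline="rural")
for name, value in smds.items():
    print(f"{name} vs rural: SMD = {value:.2f} ({describe_effect(value)})")
```

For log-transformed outcomes the same calculation would be applied to the logged values, so the SMD is then expressed in SD units on the log scale.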