Statistical methods in life-course studies
3.2 Statistical models for independent data
Linear regression models
Simple linear regression can be used to establish the relationship between a continuous response variable and a single explanatory variable, assuming that the residuals are independent and identically Normally distributed. The regression parameters (i.e. intercept and slope) can be estimated using Ordinary Least Squares (OLS), which minimizes the sum of squares of the residuals. The association between an explanatory variable of interest and the response can be assessed with a t-test of the null hypothesis that the true slope of the regression line is zero.
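As an illustration of the OLS fit and the slope t-test described above, a minimal pure-Python sketch follows. The function name `simple_ols` is hypothetical. The Python standard library has no t distribution, so the sketch returns the t statistic, to be compared with a t distribution on n - 2 degrees of freedom, rather than a p-value.

```python
import math
from statistics import mean


def simple_ols(x, y):
    """Fit y = a + b*x by ordinary least squares.

    Returns (a, b, se_b, t), where t = b / se_b is the statistic for
    H0: true slope = 0, to be compared with a t distribution on n - 2 df.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope minimising the residual sum of squares
    a = y_bar - b * x_bar      # fitted line passes through (x_bar, y_bar)
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)     # estimated residual variance
    se_b = math.sqrt(sigma2 / sxx)
    return a, b, se_b, b / se_b
```

For example, `simple_ols([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])` gives a slope of about 1.99 and an intercept of about 0.05 (hypothetical data). In practice a statistics package would also report the p-value and a confidence interval for the slope.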
In a multiple regression model, where more than one explanatory variable is considered, the parameter for a specific explanatory variable is a partial regression coefficient: it represents the change in the expected response for every unit increase in that explanatory variable when the other variables are held fixed (or adjusted for). Curvilinear relationships can be explored by adding quadratic, cubic or higher-order terms to the model.
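The same least-squares idea extends to several explanatory variables through the normal equations (X'X)b = X'y, where X is the design matrix. The sketch below, with hypothetical helper names `solve` and `ols`, builds a design matrix with an intercept column plus x and x² columns to fit a quadratic curve; each fitted coefficient is a partial regression coefficient adjusted for the other columns. The data are hypothetical and noise-free. Solving the normal equations directly is adequate for illustration, but statistical packages typically use a QR decomposition, which is numerically more stable.

```python
def solve(a, b):
    """Solve the square system a . beta = b by Gaussian elimination
    with partial pivoting, followed by back-substitution."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - tail) / m[i][i]
    return beta


def ols(X, y):
    """OLS coefficients for design matrix X (rows = subjects, first column = 1),
    obtained from the normal equations (X'X) beta = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Curvilinear (quadratic) model: y = b0 + b1*x + b2*x^2
xs = [0, 1, 2, 3, 4, 5]
y = [1 + 2 * x + 0.5 * x ** 2 for x in xs]   # hypothetical noise-free data
X = [[1.0, x, x ** 2] for x in xs]           # columns: intercept, x, x^2
b0, b1, b2 = ols(X, y)
```

Because the data contain no noise, the fit recovers the generating coefficients 1, 2 and 0.5. Further explanatory variables are added as extra columns of `X` in the same way.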
Test for linear trend in means for ordered categorical variables
Some explanatory variables are ordered categorical rather than continuous. In the simplest case, the ranks of the categories (1, 2, 3, ...) can be treated as a continuous variable. A linear regression model can then be applied to test for a linear trend in a continuous response variable across the ordered categories, assuming that the categories are equally spaced. For a response variable height and an explanatory variable social class, the slope indicates the height increment for every one-category increase in social class. A slope significantly different from zero indicates a social class gradient in height.
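A minimal sketch of this trend test, assuming equally spaced integer scores 1, 2, ..., k for the ordered categories. The function `linear_trend` and the category labels and heights in the usage example are hypothetical.

```python
import math
from statistics import mean


def linear_trend(categories, y, order):
    """Test for a linear trend in mean response across ordered categories.

    categories: category label of each subject; y: continuous response;
    order: labels from lowest to highest, scored 1, 2, ..., k (this
    assumes the categories are equally spaced).
    Returns (slope per one-category step, t statistic on n - 2 df).
    """
    score = {label: rank for rank, label in enumerate(order, start=1)}
    x = [score[c] for c in categories]
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    t = b / math.sqrt(rss / (n - 2) / sxx)
    return b, t
```

For example, with hypothetical classes `["low", "low", "middle", "middle", "high", "high"]`, heights `[160, 162, 163, 165, 166, 169]` cm and `order=["low", "middle", "high"]`, the slope is 3.25 cm per one-category step. Other scoring schemes (e.g. category midpoints) can be substituted when the categories are not equally spaced.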
Confounding factors and effect modifiers
The association between a particular exposure variable and a response variable can sometimes be altered after accounting or controlling for a third variable, known as a confounding variable, which is associated with both the response and the exposure variable of interest. In general, a confounder cannot be an intermediate step on the causal path between the exposure and the outcome. A confounding factor may partially or wholly account for the apparent association between the explanatory variable and the outcome variable. The distortion introduced by the confounder may lead to over- or underestimation of an effect, depending on the direction of the associations between the confounding factor and both the exposure and response variables. Failure to take account of confounding may lead to the conclusion that a relationship exists when in fact it does not, or to a failure to detect a relationship that truly exists. For example, consider an association between family size and height, with children from larger families tending to be shorter than children from smaller families. It is known that social class is positively associated with height and negatively associated with family size. If the association between family size and height differs depending on whether we ignore social class or control for it by studying the association separately within each social class, then social class is a confounding factor for the relationship between family size and height. It is not an intermediate variable, as family size is unlikely to be a causal factor for social class.
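The family size example can be made concrete with a small hypothetical dataset in which family size has no association with height within either social class, but the higher social class has smaller families and taller children. The numbers are invented purely for illustration, and the helper name `slope` is hypothetical.

```python
from statistics import mean


def slope(x, y):
    """OLS slope of y on x (simple linear regression)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical data: (social class, family size, height in cm)
data = [
    ("high", 1, 170), ("high", 2, 168), ("high", 1, 168), ("high", 2, 170),
    ("low", 3, 164), ("low", 4, 162), ("low", 3, 162), ("low", 4, 164),
]

# Crude analysis: social class ignored
crude = slope([d[1] for d in data], [d[2] for d in data])

# Stratified analysis: slope of height on family size within each class
within = {
    cls: slope([d[1] for d in data if d[0] == cls],
               [d[2] for d in data if d[0] == cls])
    for cls in ("high", "low")
}

print(f"crude slope: {crude:.2f} cm per extra child")
for cls, b in within.items():
    print(f"slope within {cls} social class: {b:.2f} cm per extra child")
```

Here the crude slope is -2.4 cm per additional child, whereas the slope within each social class is zero, so in these invented data the whole crude association is produced by confounding by social class.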
The strength of the association between the exposure and the response variable (i.e. the value of the regression coefficient) may sometimes vary with the level of a third variable, known as an effect modifier. In the same example, if the estimated association between family size and height differs significantly from one social class to another, then social class is an effect modifier. As another example, consider the association between size at birth and adult disease, where individuals with restricted fetal growth tend to have a higher risk of disease. It is known that childhood growth may act as an effect modifier of the associations between small size at birth and the risks of adult diseases [4, 228]. In particular, it has been shown that the relationship between birthweight and high blood pressure appears to be modified by adult BMI.
In statistical terms, the distinction between a confounder and an effect modifier is that a confounder is normally included as a covariate in a model, whereas effect modification is assessed by modelling the interaction between the exposure and the modifier. Confounding is a bias that needs to be controlled for, whereas effect modification is a feature of the association that should be described and reported; for example, an effect modifier can be used to identify a subgroup with a lower (or higher) risk of disease.
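Effect modification by a binary variable can be examined with the interaction model y = b0 + b1·x + b2·m + b3·x·m. A minimal sketch, assuming a 0/1 modifier m: in that case the interaction model is equivalent to fitting a separate line in each group, so b3 equals the difference between the two group slopes, and its standard error uses the residual variance pooled over both groups. The function names and the birthweight, BMI and blood pressure values are hypothetical; with more than two levels, or with continuous modifiers, the interaction term would be fitted directly in a multiple regression.

```python
import math
from statistics import mean


def line_fit(x, y):
    """OLS intercept, slope, residual sum of squares and Sxx for one group."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, rss, sxx


def interaction_test(x, m, y):
    """Fit y = b0 + b1*x + b2*m + b3*x*m for a binary modifier m (0/1).

    With a binary modifier this model fits a separate line in each group,
    so b3 = slope(m=1) - slope(m=0). Returns b3 and the t statistic for
    H0: b3 = 0 (n - 4 degrees of freedom).
    """
    g0 = [(xi, yi) for xi, mi, yi in zip(x, m, y) if mi == 0]
    g1 = [(xi, yi) for xi, mi, yi in zip(x, m, y) if mi == 1]
    _, s0, rss0, sxx0 = line_fit(*zip(*g0))
    _, s1, rss1, sxx1 = line_fit(*zip(*g1))
    b3 = s1 - s0
    sigma2 = (rss0 + rss1) / (len(y) - 4)   # common residual variance
    se_b3 = math.sqrt(sigma2 * (1 / sxx0 + 1 / sxx1))
    return b3, b3 / se_b3


# Hypothetical data: birthweight (kg), high adult BMI (1) or not (0), systolic BP (mmHg)
bw = [2.5, 3.0, 3.5, 4.0] * 2
high_bmi = [0] * 4 + [1] * 4
sbp = [121, 119, 118, 116, 131, 127, 122, 118]
b3, t = interaction_test(bw, high_bmi, sbp)
```

In these invented data the slope of blood pressure on birthweight is -3.2 mmHg/kg in the lower-BMI group and -8.8 mmHg/kg in the higher-BMI group, giving b3 = -5.6 with t of about -14 on 4 degrees of freedom.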
One way to remove a confounding effect, and to evaluate and describe effect modification, is to split the data into subgroups or strata according to the third variable and to obtain a separate effect estimate of the exposure within each stratum. Strata are usually defined by levels of a confounder or a combination of confounders. When the association estimated from the crude analysis (without stratification) has a different magnitude (or even a different direction) from the stratum-specific estimates, confounding is indicated, and a summary estimate can be obtained by combining the effect estimates from all strata. The effect estimated within each stratum is not confounded by the variable used to define the strata. When the effect estimates differ across strata, suggesting the existence of an effect modifier, the findings should be presented separately for each stratum.
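One common way to combine stratum-specific estimates into a single summary is an inverse-variance weighted average, with Cochran's Q as a check on whether the stratum estimates differ. This is a standard technique offered here as an illustration, not necessarily the method used in this study; the function name `pool_strata` is hypothetical, and the weighted average assumes a common effect across strata.

```python
import math


def pool_strata(estimates):
    """Inverse-variance weighted summary of stratum-specific estimates.

    estimates: list of (beta, se) pairs, one per stratum.
    Returns (pooled beta, pooled se, Cochran's Q, degrees of freedom).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]   # precision weights
    total = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    # Cochran's Q: compare with a chi-square distribution on k - 1 df;
    # a large value suggests the stratum effects differ (effect modification)
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    return pooled, pooled_se, q, len(estimates) - 1
```

Strata with more precise estimates (smaller standard errors) receive more weight. If Q is large relative to its degrees of freedom, the stratum-specific estimates should be reported separately rather than pooled.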
When many strata are involved, the data will be spread too thinly across the strata and the estimate of an effect will be imprecise. Multiple regression models are commonly used to adjust for confounders when a stratified analysis becomes impractical. The coefficient for any explanatory variable is adjusted for the remaining explanatory variables in the model. Multiple regression models therefore provide an efficient way to obtain precise estimates while controlling for potential confounding factors. The magnitude of the confounding effect is assessed by comparing the effect estimates obtained from models before and after adjustment for the potential confounder. The presence of an effect modifier can be assessed by testing the interaction term between the third variable and the exposure variable. In an epidemiological study involving many confounding variables, all the confounding factors can be controlled for simultaneously in a single multiple regression model, and the adjusted effect of each factor can be estimated.
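The comparison of estimates before and after adjustment can be sketched for a single confounder. The adjusted coefficient below is computed using the Frisch-Waugh-Lovell result: the coefficient of x in a regression of y on both x and z equals the slope of y on the residuals from regressing x on z. The function names and data (family size x, a higher-social-class indicator z, height y in cm) are hypothetical.

```python
from statistics import mean


def slope(x, y):
    """OLS slope of y on x (simple linear regression)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def adjusted_slope(x, z, y):
    """Coefficient of x in the model y = b0 + b1*x + b2*z.

    Frisch-Waugh-Lovell: regress x on z, then regress y on the residuals.
    """
    g = slope(z, x)
    x_bar, z_bar = mean(x), mean(z)
    x_res = [xi - x_bar - g * (zi - z_bar) for xi, zi in zip(x, z)]
    return slope(x_res, y)


def percent_change(crude, adjusted):
    """Change in the exposure coefficient after adjustment, as % of crude."""
    return 100.0 * (crude - adjusted) / crude


# Hypothetical data: family size x, higher social class z (1/0), height y (cm)
x = [1, 2, 3, 3, 4, 5]
z = [1, 1, 1, 0, 0, 0]
y = [165, 164, 163, 157, 156, 155]
crude = slope(x, y)
adjusted = adjusted_slope(x, z, y)
change = percent_change(crude, adjusted)
```

In these invented data the crude coefficient is -2.8 cm per additional child and the adjusted coefficient is -1.0, a change of about 64%, indicating substantial confounding by social class. With several confounders the same comparison is made using a full multiple regression fit.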
Potential confounding factors were examined throughout this study, and adjustments were made where necessary. In Chapters 4 and 5, the unadjusted relationship between each early life factor of interest and height (SDS) was examined first; the relationship was then adjusted by adding parental height and other fetal and early life factors to the model, to establish whether the relationship under investigation was partly explained by the influence of these other factors.
The situation becomes more complicated when the confounding factor is also an intermediate variable. This occurs when the exposure in some way causes a change in the confounding variable. For example, smokers may have higher blood pressure, which may in turn increase the risk of death. However, those with higher blood pressure may have increased motivation to quit smoking, which may in turn reduce the risk of death. Thus higher blood pressure may be both a confounder for the effect of smoking and an intermediate variable on the causal pathway from smoking to mortality. Longitudinal data are needed to assess such relationships. Conventional methods, such as survival models with time-varying covariates, may give biased estimates when time-varying confounders of this kind exist. Statistical approaches that are suitable for time-varying confounders are discussed in §3.4.2.
The current study focuses on early influences on height. The exposure and confounding variables used were all measured at a single age in early life, so the issue of a confounding factor also being an intermediate variable does not arise in our analyses. However, it is an important issue in life-course studies and is discussed briefly in §3.4.2.