Statistical methods in life-course studies

3.1 Study design and data structure

3.1.1 Cross-sectional and longitudinal studies

A cross-sectional study examines the relationship between a response variable and exposure variables at one point in time. The exposure variables can be fixed (e.g. sex) or time-varying (e.g. current smoking or recalled smoking history). Because a cross-sectional study captures the relationship at a single time point, it cannot establish the temporal sequence of events and therefore provides only limited evidence for a causal relationship. In general, the statistical techniques for cross-sectional studies are simpler than those required for longitudinal studies. When the response variable is continuous (e.g. height), the analysis can include testing the difference in mean height between two groups using a t-test, or among three or more groups using analysis of variance (ANOVA). A bivariate relationship between two continuous variables (e.g. height and BMI) may be assessed using correlation and regression analyses. Multiple regression models can be used to adjust for potential confounding factors (§3.2).
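The test statistics named above can be computed with the Python standard library alone. The following is a minimal sketch that assumes small, made-up samples of heights. The function names (`pooled_t_test`, `one_way_anova_f`, `correlation_and_slope`) are hypothetical. It computes only the statistics and their degrees of freedom. In practice, p-values and multiple regression with adjustment for confounders would come from a statistical package.

```python
import math
import statistics


def pooled_t_test(a, b):
    """Two-sample Student t statistic (equal variances assumed); returns (t, df)."""
    na, nb = len(a), len(b)
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(sp2 * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, na + nb - 2


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic; returns (F, df_between, df_within)."""
    values = [x for g in groups for x in g]
    grand_mean = statistics.mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k)), k - 1, n - k


def correlation_and_slope(x, y):
    """Pearson r plus least-squares intercept and slope for regressing y on x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    return sxy / math.sqrt(sxx * syy), my - slope * mx, slope


# Hypothetical heights (cm) for two groups, and for three groups.
t, df = pooled_t_test([150, 152, 154], [160, 162, 164])
f, df1, df2 = one_way_anova_f([150, 152, 154], [155, 157, 159], [160, 162, 164])
r, intercept, slope = correlation_and_slope([1, 2, 3, 4], [2, 4, 6, 8])
print(f"t = {t:.3f} on {df} df; F = {f:.2f} on ({df1}, {df2}) df; r = {r:.2f}")
```

As a check, pass `one_way_anova_f` the same two groups given to `pooled_t_test`. It returns F equal to t squared, because a two-group ANOVA is equivalent to the equal-variance t-test.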

Table 3.1 General overview of statistical methods that may be applied in life-course analyses

Linear regression or generalised linear models (216, 217)
  Data: cross-sectional; longitudinal
  Assumptions: underlying distribution of the response; observations are independent
  Advantages: available in all statistical packages
  Limitations: not practical when the path diagram is complicated

Multilevel/random effects models
  Data: cross-sectional; longitudinal
  Assumptions: underlying distribution of the response; observations are clustered within a higher level, or subjects are measured on more than one occasion
  Advantages: the correlation structure is accounted for; fixed parameters and variance components are estimated at multiple levels; all data are incorporated

Generalised Estimating Equations (GEE)
  Data: longitudinal
  Assumptions: underlying distribution of the response; subjects are measured on more than one occasion
  Advantages: the correlation structure is accounted for; all data are incorporated; useful for estimating fixed parameters

Growth models
  Data: longitudinal
  Assumptions: underlying distribution of the response; subjects are measured on more than one occasion; time is assumed to be continuous
  Advantages: the correlation structure is accounted for; all data are incorporated; trend can be tested; timing and spacing of time points may vary
  Limitations: not practical when time intervals are large

Multivariate response models
  Data: cross-sectional; longitudinal
  Assumptions: underlying distributions of the responses; subjects are measured on more than one occasion, or on several response measures; time is considered as fixed occasions
  Advantages: repeated or multiple response measures are examined simultaneously; the correlation structure is accounted for; effects can be directly compared between occasions or responses; all data are incorporated
  Limitations: not practical when the number of repeated measures is large

Structural equations
  Data: cross-sectional; longitudinal
  Assumptions: response and intermediate variables are Normally distributed; for path analysis, latent variables are involved; causal directions are assumed
  Advantages: direct and indirect effects are estimated; can be used when there is more than one response variable in the system
  Limitations: cannot determine the underlying causal structure

G-estimation
  Data: longitudinal (event data)
  Assumptions: time-dependent covariates are both confounders and intermediate variables
  Advantages: can be used when time-dependent exposure variables are also measured repeatedly
  Limitations: not available in statistical packages

In a longitudinal study, a sample of individuals is observed prospectively over a specified time interval. Exposure variables or response variables are observed on several occasions, with the response variable observed after the exposure variable. For example, a longitudinal study of coronary heart disease (CHD) may define a sample at baseline and follow the individuals to observe risk factors and morbidity over time. Longitudinal data can also be obtained retrospectively, by reviewing health records or by asking individuals to recall past events. The Longitudinal Study of the Office of Population Censuses and Surveys (OPCS) follows a 1% sample of the British population that was initially identified at the 1971 Census. Outcomes such as mortality and cancer incidence have been related to socio-economic factors measured at successive censuses. Because they follow the same individuals over time, longitudinal designs are uniquely suited to investigating three kinds of question: the temporal sequence of events; the relationship between changes in an outcome over time and the factors that affected those changes; and the relationship between the outcome and the accumulation of exposures. Longitudinal studies therefore provide stronger evidence of causality than cross-sectional studies.
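A simple way to relate change in an outcome to exposure is a two-stage summary approach. First, each subject's repeated measurements are reduced to a rate of change. That summary is then related to an accumulated exposure score. The sketch below assumes hypothetical subjects measured at ages 7, 11 and 16, with a binary exposure recorded at each occasion. The function names and data are illustrative only and are not taken from the cohort.

```python
from statistics import mean


def individual_slope(ages, values):
    """Least-squares slope of one subject's outcome on age (change per year)."""
    ma, mv = mean(ages), mean(values)
    num = sum((a - ma) * (v - mv) for a, v in zip(ages, values))
    den = sum((a - ma) ** 2 for a in ages)
    return num / den


def cumulative_exposure(flags):
    """Number of occasions on which the subject was exposed (flags are 0/1)."""
    return sum(flags)


# Hypothetical subjects: ages at measurement, heights (cm), exposure flags.
subjects = {
    "A": ([7, 11, 16], [120, 140, 165], [0, 0, 0]),
    "B": ([7, 11, 16], [118, 134, 154], [1, 1, 0]),
    "C": ([7, 11, 16], [115, 127, 142], [1, 1, 1]),
}

# Stage one: one summary of change and one exposure score per subject.
summary = {
    sid: (individual_slope(ages, heights), cumulative_exposure(flags))
    for sid, (ages, heights, flags) in subjects.items()
}
for sid, (rate, dose) in summary.items():
    print(f"{sid}: {rate:.2f} cm/year, exposed on {dose} occasion(s)")
```

In a second stage, the rates of change would be related to the exposure scores, for example with the correlation or regression methods used for cross-sectional data. This approach ignores the fact that some subject-level slopes are estimated more precisely than others. That is one reason the model-based methods in Table 3.1, such as growth models, are usually preferred.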

Prospective longitudinal studies are often referred to as cohort studies. The 1958 cohort provides responses and exposures collected at different life stages, from before birth to adulthood. The Mother and Child Study was conducted only once, in 1991, and is therefore a cross-sectional design. However, a cross-generational comparison of cohort members and their offspring is considered a longitudinal design. A common problem in longitudinal studies is loss to follow-up, which occurs for a variety of reasons and may be a source of bias. Methods for examining patterns of missing data are given in §3.3.1. In addition, the correlation of

3.1.2 Independent and hierarchical data structures

In an independent data structure, the value of one observation does not affect the values of the others. In a hierarchical structure, by contrast, individuals can be treated as members of groups. Units at one level are grouped within units at a higher level, and data observed from the same group are related. For example, children from the same family share similar genes as well as a common environment, and measurements taken from the same individual are more alike than those taken from different individuals.
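Hierarchical data of this kind are often stored in "long" format, with one row per measurement and an identifier for each level. The sketch below regroups such rows into the nested family, individual and occasion structure. It assumes hypothetical family and person identifiers and made-up heights; none of these values come from the cohort.

```python
from collections import defaultdict

# Hypothetical long-format rows: (family_id, person_id, generation, age, height_cm).
rows = [
    ("F1", "P1", "G2", 7, 121.0),
    ("F1", "P1", "G2", 11, 141.5),
    ("F1", "P2", "G3", 7, 123.0),
    ("F2", "P3", "G2", 7, 118.0),
    ("F2", "P3", "G2", 11, 137.0),
    ("F2", "P3", "G2", 16, 160.5),
]


def nest(rows):
    """Group measurement rows by family, then by person within family."""
    tree = defaultdict(lambda: defaultdict(list))
    for family, person, _generation, age, height in rows:
        tree[family][person].append((age, height))
    return tree


families = nest(rows)
for family, people in families.items():
    for person, measures in people.items():
        print(family, person, len(measures), "measurement(s)")
```

Grouped this way, the rows for P1 (or P3) are clearly not independent observations. Models for hierarchical data use the identifiers at each level to account for this.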

Data from the 1958 cohort have a distinctive two-fold hierarchical structure. First, height and other variables of interest were measured on several occasions for cohort members (G2), at ages 7, 11, 16, 23 and 33 years, so measurements are nested within individuals. Once age is adjusted for, the variation in height between individuals is greater than the variation between measurements within individuals. Second, cohort members and their offspring (G3) are nested within families, so subjects from the same family are correlated. Statistical methods that take account of this data structure are therefore essential for making valid inferences from the data.
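The relative size of between-individual and within-individual variation can be quantified by the intraclass correlation coefficient (ICC), the share of total variance that lies between individuals. The sketch below makes two assumptions: a balanced design, in which every subject is measured the same number of times, and age-adjusted values such as height z-scores. It estimates the ICC from one-way ANOVA mean squares. It then computes Kish's design effect, 1 + (m - 1) * ICC, for clusters of equal size m. The data and function names are hypothetical.

```python
from statistics import mean


def icc_oneway(groups):
    """ICC from one-way ANOVA mean squares, balanced design (k values per subject)."""
    n, k = len(groups), len(groups[0])
    assert all(len(g) == k for g in groups), "balanced design assumed"
    grand = mean(x for g in groups for x in g)
    msb = k * sum((mean(g) - grand) ** 2 for g in groups) / (n - 1)
    msw = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n * (k - 1))
    # Between-subject variance is (MSB - MSW) / k; the ICC is its share of the total.
    return (msb - msw) / (msb + (k - 1) * msw)


def design_effect(cluster_size, icc):
    """Variance inflation, for an overall mean, from ignoring equal-sized clusters."""
    return 1 + (cluster_size - 1) * icc


# Hypothetical age-adjusted height z-scores: three occasions for each of three people.
z_scores = [[1.0, 1.2, 0.8], [-1.0, -0.9, -1.1], [0.0, 0.1, -0.1]]
rho = icc_oneway(z_scores)
deff = design_effect(3, rho)
print(f"ICC = {rho:.3f}; design effect = {deff:.2f}")
```

With an ICC this close to 1, repeated measurements on the same person add little independent information for estimating an overall mean. Analyses that treated them as independent would typically understate standard errors. Multilevel models estimate these variance components directly rather than through this balanced-design shortcut.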

Statistical models that are suitable for independent data are briefly described in §3.2. In §3.3 more complicated models for data with a hierarchical structure are discussed in detail.

In document Influences on growth: A study of two generations based on the 1958 British Birth Cohort (Page 98-102)