Basic Study Designs in Analytical Epidemiology For Observational Studies

(1)


Cohort

Case Control

Hybrid design (case-cohort, nested case control)

Cross-Sectional

Ecologic

(2)

OBSERVATIONAL STUDIES (Non-Experimental)

Observational because there is no individual intervention; treatments and exposures occur in a “non-study” environment (i.e. not randomly assigned)

Individuals can be observed prospectively, retrospectively or currently

(3)

Three observational analytic designs with individuals as the unit of analysis

Cohort–can be prospective or retrospective (also called concurrent/non-concurrent) (can also be mixed)

Case Control

Hybrid design

Cross-sectional

(4)

DEFINITION OF COHORT STUDY

The analytic method of epidemiological study in which subsets of a defined population are identified who are, have been, or in the future may be exposed or not exposed, or exposed in different degrees, to a factor or factors hypothesized to influence the probability of occurrence of a given disease or other outcome.

(5)

Cohort Studies

Prospective cohort study is the “gold standard” of observational studies because events can be recorded as they occur, as opposed to obtaining information retrospectively.

Cohort is followed over time, and outcomes are ascertained (e.g., disease incidence, death, remission)

(6)

USUAL FEATURES OF COHORT STUDY

· Observation of Large Numbers Over A Long Period

· Comparison of Incidence Rates in Groups That Differ in Exposure Levels

(7)

CALENDAR TIME ISSUES

True Prospective Study/Concurrent study. Cohort constructed in present time; exposures documented in present time and possibly in future; cohort followed prospectively in future calendar time.

Historical Prospective/Non-concurrent Study. Cohort constructed in past time; exposures documented in past; follow-up can extend into future.

Mixed: Can include both concurrent and non-concurrent exposures

(8)

Advantages/Disadvantages of Concurrent, Non-concurrent cohort studies

Concurrent:

· Exposures/outcomes can be measured prospectively; can obtain biologic measurements

· Depending on outcome of interest, follow-up time may be long, increasing costs

· Easier to trace participants

Non-concurrent:

· Must rely on records and recall of participants to measure exposures and outcomes; subject to error

· Measures prior exposures; usually cheaper

· Subjects harder to trace, especially if last contact/record was many years prior

(9)

TYPES OF COHORTS

· General Populations

· Occupational Groups

· Memberships of Groups (HMO Members, Graduates of a Particular College, MEDICARE Beneficiaries, Vietnam Era Veterans, Survivors of Atomic Bomb, Framingham residents)

(10)

TEMPORAL MEASUREMENT OF EXPOSURES

· At Baseline Only

· Throughout Study

(11)

TYPES OF MEASUREMENT OF EXPOSURES

· Direct Measurement

· Ex. Biologic measurements / air sampling/ water sampling

· Use of Surrogate Measurements

· Ex. Work records, questionnaires, medical records

(12)

SOURCES OF MEASUREMENTS

· Direct Measurement–subject interviews

· Use of Proxies (friends, relatives)

(13)

STEPS IN CONDUCTING A COHORT STUDY

Prospective Study

Define cohort, invite subjects to participate

Obtain baseline exposure measurements

Interviews, biologic samples, clinical assessments, air /water sampling, etc

Follow cohort for disease–match subjects to registry data, survey cohort for disease, etc

May obtain additional exposure measurements over time ex. survey dietary intake, post menopausal hormone use, etc.

Analyze disease risk according to exposures

(14)

Retrospective Cohort Study Procedures

Define cohort: may include deceased individuals. Ex. All employees of company “a” who worked 6 months or longer between time x and time y

Obtain retrospective exposure measurements. Ex. Review work records for job descriptions and exposure data; survey subjects regarding work history, smoking history, etc.

Obtain retrospective disease/mortality data for period of study. Usually only mortality data will be available –can obtain death certificates. Some states may have disease registries for the time period under study.

Analyze events (i.e. cause of death), by exposure status (Relative Risk, SMRs)
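The SMR mentioned above can be sketched as observed deaths divided by the deaths expected if the cohort had experienced reference-population rates. This is a minimal illustration; the age strata, person-years, reference rates, and observed count are hypothetical numbers, and the function name `smr` is an assumption for this sketch.

```python
def smr(observed_deaths, person_years, reference_rates):
    """Standardized mortality ratio: observed / expected deaths.

    person_years and reference_rates are parallel lists, one entry per
    stratum (e.g., age group). Expected deaths are the cohort's
    person-years multiplied by the reference population's death rate.
    """
    expected = sum(py * rate for py, rate in zip(person_years, reference_rates))
    return observed_deaths / expected, expected


# Hypothetical occupational cohort with three age strata
person_years = [10000, 8000, 5000]       # cohort person-years per stratum
reference_rates = [0.001, 0.003, 0.008]  # reference deaths per person-year

ratio, expected = smr(90, person_years, reference_rates)
print(f"expected deaths = {expected:.1f}, SMR = {ratio:.2f}")
```

An SMR above 1 suggests more deaths in the cohort than expected from the reference rates; an SMR below 1 suggests fewer.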

(15)

OUTCOME MEASURES

· Incidence

· Risk Ratio (Relative Risk)

· Odds Ratio (Relative Odds)

· Attributable Risks

· Clinical Attributable Risk

· Population Attributable Risk
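The measures listed above can be computed from a 2x2 cohort table. The sketch below assumes the usual textbook definitions (attributable risk as the risk difference between exposed and unexposed; population attributable risk as total-population risk minus unexposed risk); the counts and the function name `cohort_measures` are hypothetical.

```python
def cohort_measures(a, b, c, d):
    """Measures from a 2x2 cohort table.

    a = exposed with disease,   b = exposed without disease
    c = unexposed with disease, d = unexposed without disease
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    risk_total = (a + c) / (a + b + c + d)
    return {
        "risk_exposed": risk_exposed,
        "risk_unexposed": risk_unexposed,
        "risk_ratio": risk_exposed / risk_unexposed,
        "odds_ratio": (a * d) / (b * c),
        # Risk difference among the exposed
        "attributable_risk": risk_exposed - risk_unexposed,
        # Excess risk in the whole population due to exposure
        "population_attributable_risk": risk_total - risk_unexposed,
    }


# Hypothetical cohort: 100 exposed (30 cases), 100 unexposed (10 cases)
m = cohort_measures(30, 70, 10, 90)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these numbers the odds ratio is noticeably larger than the risk ratio, because the disease is not rare in this hypothetical cohort.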

(16)

ISSUES THAT CAN IMPACT RATES

AGE EFFECT–Most diseases vary by age, hence most analytic studies will always take age into account in the analysis

COHORT EFFECT– Year of birth could affect exposure and/or disease (stomach cancer rates declined after the 1930s, probably due to the advent of refrigeration, and hence less need for preserved, smoked foods that are associated with stomach cancer)

PERIOD EFFECT– Change in risk of disease at some point in time (risk increases for all ages, all cohorts). Not as important for diseases with cumulative effects (e.g., smoking and lung cancer).

Change can be due to change in exposure (more relevant for infectious disease), change in treatment (change would have to have large impact) or improved detection (increase in brain tumor rates may be due to better diagnostic tools in the past 20-30 years)

(17)

Case Control Studies

Case based

Nested case control (nested within cohort)

(18)

DIAGRAM OF CASE-CONTROL STUDY

Population

Subjects

Cases (Disease Present): Exposed / Not Exposed

Controls (Disease Absent): Exposed / Not Exposed

(19)

STEPS TO CONDUCT A CASE CONTROL STUDY

Select cases (hospitals, registries, other)

Select controls (random digit dialing [RDD], Health Care Financing Administration [HCFA] records, friends, etc.)

Obtain exposure information (interviews, record reviews, etc)

Analyze data (odds ratios)
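The odds ratio analysis in the last step can be sketched for an unmatched study as follows. The confidence interval uses the log-based (Woolf) approximation, which is an assumption of this sketch rather than a method named in these notes; the counts and the function name `odds_ratio_ci` are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Exposure odds ratio with an approximate 95% confidence interval.

    a = exposed cases,   b = exposed controls
    c = unexposed cases, d = unexposed controls
    """
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Hypothetical study: 100 cases (40 exposed), 100 controls (20 exposed)
or_, lower, upper = odds_ratio_ci(40, 20, 60, 80)
print(f"OR = {or_:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

An interval that excludes 1 suggests an association between exposure and disease in these hypothetical data; the approximation is unreliable when any cell count is small or zero.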

(20)

WHY DO CASE CONTROL STUDIES?

ADVANTAGES COMPARED TO COHORT STUDIES

For rare diseases, less costly to conduct a case control study (requires fewer subjects)

A cohort study of rare diseases (e.g., cancer) requires follow-up of a large number of individuals for a long period of time

Cohort subjects may drop out or be lost to follow-up

(21)

DISADVANTAGES of Case Control Studies

Recall bias–selective recall of prior exposures among cases, controls

Retrospective assessment of exposures subject to error.

Limited to one disease.

If selected exposures are of interest (for example, asbestos), the number of cases/controls with this exposure is likely to be low. Better to conduct a cohort study among exposed/unexposed.

May be difficult to ascertain cases.

(22)

DESIGN FEATURES OF CASE-CONTROL STUDIES

• Cases

⇒ Sources of Cases

∗ Population Based Registries

∗ Hospital, HMO, and Other Health Provider Records

∗ Berkson’s Bias

⇒ Definition of Cases

∗ Diagnostic Criteria

∗ Incidence Based Cases

∗ Prevalence Based Cases

∗ Prevalence-Incidence Bias (Neyman’s Bias)

(23)

AN EXAMPLE OF PREVALENCE-INCIDENCE BIAS USING FRAMINGHAM DATA

(24)

(25)

DESIGN FEATURES OF CASE-CONTROL STUDIES (CONTINUED)

• Controls

⇒ Sources of Controls

∗ Institutional Controls (Hospital, HMO, or other Medical Care Provider Sources)

♦ Subjects Selected Among Those Not Having Same Disease as Cases.

♦ Subjects More Accessible and Cooperative

♦ Subject to Similar Referral Patterns as Cases

♦ Easier to Measure Exposure from Records, Physical, or Laboratory Measurements

∗ Population Controls (Neighbors, Friends, Relatives of Cases, Random Samples (RDD), Driver’s License Records, HCFA)

♦ Source Population is Better Defined and Easier to Ensure that Cases and Controls Come from the Same Population

♦ Exposure Measurements More Likely to be Representative of Population without Disease

♦ Problems of Overmatching or Over-Controlling

♦ Population Controls may have lower response rates, however (RDD, HCFA, Driver’s license records)

(26)

(27)

Other Issues

1. Should dead cases be included in your study?

(Ex. Eligible incident cancer cases who die before they can be interviewed)

Can interview proxies (family, co-workers) but data will not be as reliable

2. Should dead cases be matched to dead controls?

If dead cases are included, should only dead controls be used as a comparison group?

Rothman: dead controls are not in the source population for controls since they have no chance of getting the disease. However, if they have the same exposure distribution as the source population, dead controls can reasonably be used (“proxy sampling”).

Some exposures (e.g., smoking) are likely to be higher in the deceased population than in the source population, however.

(28)


DESIGN FEATURES OF CASE-CONTROL STUDIES (CONTINUED)

⇒ Definition of Controls

⇒ Selection of Controls

∗ Unmatched

∗ Pair Matched

∗ Frequency Matched

⇒ Ratio of Controls to Cases

∗ Not much gained from more than 3:1 Matching

(29)

EFFECT OF NUMBER OF CONTROLS PER CASE ON THE RELIABILITY OF THE RESULTING ESTIMATES

Controls Per Case | Reliability of Resulting Odds Ratio (Relative to One Control Per Case) | Incremental Gain

1 | 1.00 | -

2 | 1.33 | 33%

3 | 1.50 | 17%

4 | 1.60 | 10%

5 | 1.67 | 7%

6 | 1.71 | 4%

7 | 1.76 | 3%
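The reliability column closely tracks the expression 2r/(r+1) for r controls per case. That formula is an assumption here (it is a common approximation for relative efficiency, not something stated in these notes), and the function name is hypothetical. A minimal sketch:

```python
def relative_reliability(r):
    """Approximate reliability of r controls per case, relative to 1:1.

    Assumes the 2r / (r + 1) approximation, which approaches 2 as r grows.
    """
    return 2 * r / (r + 1)


table = [(r, round(relative_reliability(r), 2)) for r in range(1, 8)]

previous = None
for r, value in table:
    gain = "-" if previous is None else f"{value - previous:+.2f}"
    print(f"{r} controls per case: {value:.2f} (gain {gain})")
    previous = value
```

Because the expression can never exceed 2, each additional control adds less than the one before, which is the point behind the 3:1 rule of thumb.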

(30)

• Measurement of Exposure

• Measurement of Potential Confounders

Retrospective or concurrent measurement issues in case control studies

1. Recall bias

2. Current biologic measurements may not reflect past exposures (exposures with short half lives)

3. Refusal rates–do they vary by exposure status?

4. Use of existing records (hospital/medical records, occupational records etc). Are they accurate?

(31)

ANALYSIS ISSUES

1. Analysis Must Follow Particular Design

2. Major Analysis Issues for Case-Control Studies

• Is Exposure Quantitative or Categorical? If Categorical, Is it Measured at More Than Two Levels?

• Is Design Pair-Matched as Opposed to Unmatched or Frequency Matched?

• Is Main Outcome Trend or Overall Non-Specific Association?

• We Will First Discuss Analysis for Unmatched or Frequency Matched Studies, and then for Pair Matched Studies

(32)

NESTED CASE CONTROL STUDIES / CASE-COHORT STUDIES

Hybrid design. Identifies cases from cohort study.

Suppose you have identified a number of subjects with disease “y” in your cohort study. Exposure “x” has been identified as a potential risk factor for disease “y”, but you have not measured this exposure on your entire cohort.

Because exposure “x” is expensive to measure (e.g., laboratory analyses of serum collected at baseline, or detailed occupational records that must be abstracted), you decide to measure exposure in cases and a sample of the cohort without the disease.

Two approaches:

Case-cohort study: sample controls at baseline

Nested case control study: sample controls at the time each case occurs. This matches cases and controls on duration of follow up and uses more information.

In both methods, cases can be both controls and cases. Special statistical techniques are applied in these analyses. (Not covered in this course)

When cases are excluded as controls, then the usual case control analyses can be conducted, with exposure odds ratios calculated.
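Sampling controls at the time each case occurs can be sketched as follows. This is a minimal illustration, not the special statistical techniques referred to above: the cohort records, field names, and the function `sample_risk_sets` are all hypothetical. Controls are drawn from subjects still under follow-up when the case occurs, so a subject who later becomes a case can serve as a control for an earlier case.

```python
import random


def sample_risk_sets(cohort, controls_per_case=2, seed=0):
    """For each case, sample controls from those still at risk at the case's event time."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    matched = []
    for case in (s for s in cohort if s["event"]):
        # Risk set: everyone else whose follow-up extends to the case's event time
        risk_set = [
            s["id"] for s in cohort
            if s["id"] != case["id"] and s["time"] >= case["time"]
        ]
        k = min(controls_per_case, len(risk_set))
        matched.append({
            "case": case["id"],
            "time": case["time"],
            "controls": rng.sample(risk_set, k),
        })
    return matched


# Hypothetical cohort: time = years of follow-up, event = developed disease
cohort = [
    {"id": 1, "time": 2.0, "event": True},
    {"id": 2, "time": 5.0, "event": False},
    {"id": 3, "time": 3.5, "event": True},
    {"id": 4, "time": 6.0, "event": False},
    {"id": 5, "time": 1.0, "event": False},
    {"id": 6, "time": 4.0, "event": True},
]

matched_sets = sample_risk_sets(cohort)
for m in matched_sets:
    print(m)
```

A case-cohort design would instead draw one sample from the full cohort at baseline and reuse it for every case.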

(33)

Ex. Nurses’ Health Study. Analyzed PCB in serum on breast cancer cases and a sample of controls.

CROSS-SECTIONAL STUDIES

“SNAPSHOT” of Exposure and Disease at a SINGLE point in time.

Measures PREVALENCE, not incidence.

Examples:

In cohort studies, baseline measurements of exposure and disease represent cross sectional data.

Community surveys of exposure and disease at a single point in time.

Survey of current workplace exposures and current disease.

CUMULATIVE PREVALENCE

Survey of individuals that measures the occurrence of any current or prior disease in the person’s lifetime.

Measures of association: Prevalence rate ratio

(34)

Can use statistical approaches for incidence rate ratios.

(35)

Incidence Prevalence Bias in Cross Sectional Studies

Ex. Survey of current smoking and emphysema

Emphysema cases who keep smoking after diagnosis have shorter survival. So prevalent cases are less likely to be current smokers, underestimating risk from current smoking.

(36)

Ecologic Studies

Use aggregate data, used primarily for hypothesis generation as opposed to hypothesis testing

Examples of aggregate data:

Disease rates (incidence, mortality, etc.)

Birth rates

“Exposure” data: smoking rates, geographic residence, air pollution data, mean income, per capita consumption of saturated fats, proximity to nuclear power plants

(37)

Ecologic Fallacy

Grouped data do not necessarily represent individual level data

Example: Durkheim's classic work Suicide

Correlation between percent of population that is Protestant and suicide rates in the 19th century

Assumes rates are highest in Protestants, but what if minority Catholics within majority Protestant communities have the highest suicide rates due to their social isolation?

Also, information on confounders is not usually available.

(38)

Epidemiology example

Fat intake and breast cancer rates with countries as the unit of measurement have consistently been found to be highly correlated.

But studies of individuals (cohort, case control studies) have not found any association with fat intake.

Why?

Possible reasons: countries with high fat intake are more likely to have other risk factors associated with breast cancer (e.g., late age at first pregnancy)

Or within-population variability is low, but inter-population variability is high.

Extreme example: if everyone in a country had high fat intake, we would not be able to detect any excess because there would be no population with low fat intake to compare them to.
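The first explanation can be illustrated with made-up numbers. In the sketch below, three hypothetical countries differ in both the share of people with high fat intake and the overall disease rate, yet within each country the risk is identical for high and low intake. All counts, names, and the helper `pearson` are assumptions for illustration only.

```python
import math

# Hypothetical data per country: (n_low_fat, cases_low_fat, n_high_fat, cases_high_fat)
countries = {
    "A": (8000, 80, 2000, 20),
    "B": (5000, 100, 5000, 100),
    "C": (2000, 60, 8000, 240),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Aggregate (ecologic) view: one point per country
pct_high_fat = [nh / (nl + nh) for nl, _, nh, _ in countries.values()]
disease_rate = [(cl + ch) / (nl + nh) for nl, cl, nh, ch in countries.values()]
ecologic_r = pearson(pct_high_fat, disease_rate)

# Individual-level view: risk ratio (high vs low fat) within each country
within_rr = {
    name: (ch / nh) / (cl / nl) for name, (nl, cl, nh, ch) in countries.items()
}

print(f"country-level correlation: {ecologic_r:.2f}")
print("within-country risk ratios:", within_rr)
```

The country-level correlation is strong even though fat intake has no effect on any individual's risk in these constructed data; some other country-level factor drives the difference in rates.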
