Inference and Threats to Validity - O'Reilly, Statistics in a Nutshell: A Desktop Quick Reference

The choice of research design, experimental or observational, is usually governed by constraints on how the data of interest can be collected and by the type of statistical inference required; that is, inferring characteristics of a population using statistics calculated on a sample considered to be representative of that population. Before I review the structure of these research designs, I will first discuss inference and why it is the cornerstone of statistics.

Sometimes data is collected for a very specific purpose, without any desire to understand, characterize, or make predictions about a broader phenomenon. For example, clinicians in a hospital ward dedicated to the treatment of hypertension may be interested in how different antihypertensive medications affect each patient individually; these effects are difficult to predict, since so many physiological factors are involved. The selection of safe and effective medication for each patient is the primary motivation, and in each case data is collected and stored for that clinical purpose. However, after a number of years, the clinicians begin to notice some patient factors that appear to predict which drug will have the best clinical outcome for certain patients: Drug A appears to be most effective for men, while Drug B appears to be most effective for women. Rather than administering both A and B to every patient (since drug administration is inherently risky), the clinicians decide that they would like to determine whether the results they have observed for individual patients hold true for the wider population. Thus, based on the samples obtained in the past clinical context, they wish to make inferences about the parameters of a larger population. This will help them both to characterize quantitatively and to predict the effects of the drugs on patients.

Who is the population in this instance? It is the group of people who suffer from hypertension. Since it is infeasible to test the effects of Drug A and Drug B on all hypertensive patients worldwide, a sample (a representative subset of the population) is usually selected for such a study. A number of research designs could be used to investigate the study's two hypotheses, i.e., that Drug A is most effective for men and that Drug B is most effective for women. Case-control designs or clinical trials could be used to test the effects of Drug A and Drug B on men and women, perhaps using matched samples for the control and experimental conditions. You will learn more about these techniques and strategies later in this chapter.

It’s important to distinguish between the estimates obtained from a sample and the numerical characteristics of a population that would be determined if every member of the population were measured. For example, a parameter (from the population) might be the percentage of all male hypertensive patients whose systolic blood pressure reached 120 mm Hg after administration of Drug A, while the corresponding statistic (from the sample) might be the percentage of male patients at Hospital X whose systolic blood pressure reached 120 mm Hg after administration of Drug A. If the responses of patients at Hospital X were truly representative of the population, then the statistics computed would be considered accurate estimates of the population parameters.
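
To make the parameter/statistic distinction concrete, here is a minimal Python sketch using simulated data. Everything in it is an assumption for illustration: the hypothetical population of 100,000 post-treatment systolic readings is drawn from a normal distribution with mean 128 mm Hg and standard deviation 10 mm Hg, "reached 120" is interpreted as "at or below 120 mm Hg", and the helper names (make_population, proportion_at_or_below, parameter_and_statistic) are hypothetical. The parameter is computed over the whole simulated population; the statistic is computed over a simple random sample of 200 patients.

```python
import random

def make_population(size: int, seed: int = 1) -> list[float]:
    """Hypothetical post-treatment systolic BP readings (mm Hg); simulated, not real data."""
    rng = random.Random(seed)
    return [rng.gauss(128, 10) for _ in range(size)]

def proportion_at_or_below(values: list[float], threshold: float = 120.0) -> float:
    """Fraction of readings at or below the threshold (assumed meaning of 'reached 120')."""
    return sum(v <= threshold for v in values) / len(values)

def parameter_and_statistic(population: list[float], sample_size: int,
                            seed: int = 2) -> tuple[float, float]:
    """Return (parameter from the whole population, statistic from one random sample)."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)  # simple random sample, without replacement
    return proportion_at_or_below(population), proportion_at_or_below(sample)

population = make_population(100_000)  # every male patient given Drug A (simulated)
param, stat = parameter_and_statistic(population, 200)  # a representative (here: random) sample of 200
print(f"parameter (whole population): {param:.3f}")
print(f"statistic (sample of 200):    {stat:.3f}")
```

Changing the sampling seed changes the statistic but never the parameter; this sample-to-sample variation is why statistics are treated as random variables.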

In real-life situations, you rarely encounter a variable with zero variance, i.e., one where every experimental unit responds in exactly the same way. Consequently, statistics computed from random samples are treated as random variables, each with its own probability (or sampling) distribution. The properties of these distributions and the associated theorems (such as the Central Limit Theorem*) mean that you can make valid comparisons between experimental and control groups using the designs described in this chapter and the analytical tools described in Chapter 7 and the rest of this book.
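
The following sketch (Python, simulated data) illustrates a sampling distribution and the Central Limit Theorem. It assumes a deliberately skewed population, an exponential distribution with mean 1 and standard deviation 1, draws 5,000 samples of size 50, and records each sample mean. If the normal approximation holds, the sample means should center on 1, have a standard deviation close to σ/√n ≈ 0.141, and about 95% of them should fall within 1.96 standard errors of the population mean. The function name sample_means is hypothetical.

```python
import random
import statistics

def sample_means(n: int, reps: int, seed: int = 0) -> list[float]:
    """Means of `reps` random samples of size n from a skewed Exponential(rate=1)
    population, whose mean and standard deviation are both 1."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

n, reps = 50, 5_000
means = sample_means(n, reps)
se = 1 / n ** 0.5  # theoretical standard error sigma/sqrt(n), about 0.141

print(f"mean of sample means: {statistics.fmean(means):.3f}  (population mean 1)")
print(f"SD of sample means:   {statistics.stdev(means):.3f}  (sigma/sqrt(n) = {se:.3f})")
# Under the normal approximation, about 95% of sample means lie within 1.96 SE of the mean
within = sum(abs(m - 1) <= 1.96 * se for m in means) / reps
print(f"fraction within 1.96 SE: {within:.3f}")
```

Even though individual responses are strongly skewed, the distribution of the sample mean is close to normal at this sample size, which is what licenses the comparisons described above.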

Validity means being sure that what you are measuring is what you intend to measure or claim to have measured, as described in Chapter 1. In experiments, the validity of an observed treatment difference (between an experimental and a control condition, for example) is the extent to which the result cannot be attributed to error in sampling or measurement. Continuing the hypertensive drug example, if the case-control and subsequent experimental studies were undertaken by the same clinical team in the same hospital, to what extent would the results be valid? The main concerns about generalizing the results to the population would be that some consistent bias in measurement may be giving rise to the result, and/or that the clinical sample being tested was too small or not sufficiently representative of the broader hypertensive population. Thus, two straightforward techniques to improve the validity of a research result are to test across different laboratories, research teams, and facilities, and to use samples large enough to detect an experimental effect. The sample sizes actually required to test the statistical significance of differences between treatments depend on the specific test being used (power and sample size calculations are discussed further in Chapter 18).
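
As a rough illustration of why sample size matters, the sketch below uses a common normal-approximation formula for the number of subjects per group needed to detect a difference δ between two group means with a two-sided test at significance level α and power 1 − β, assuming equal group sizes and a known common standard deviation σ: n = 2σ²(z[1−α/2] + z[1−β])² / δ², rounded up. This is a textbook approximation, not necessarily the calculation appropriate to any particular test; the function name n_per_group and the example values (a 5 mm Hg difference in systolic pressure, σ = 10 mm Hg) are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(delta: float, sigma: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate subjects per group for a two-sided, two-sample comparison of means.

    Normal approximation with equal group sizes and a known common SD (sigma):
        n = 2 * sigma**2 * (z_{1-alpha/2} + z_{power})**2 / delta**2, rounded up.
    """
    std_normal = NormalDist()  # mean 0, SD 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = std_normal.inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(2 * (sigma * (z_alpha + z_beta) / delta) ** 2)

# Hypothetical planning values: detect a 5 mm Hg difference, common SD 10 mm Hg
print(n_per_group(delta=5, sigma=10))    # 63
# Halving the detectable difference roughly quadruples the required sample
print(n_per_group(delta=2.5, sigma=10))  # 252
```

Because n grows with 1/δ², small treatment effects demand disproportionately large samples, which is one reason a single small clinical sample may fail to reveal a real difference.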

In some fields, such as psychology, general notions of validity have been refined into typologies of validity. The American Psychological Association, for instance, originally classified validity into four categories: content validity, construct validity, concurrent validity, and predictive validity. More recently, this classification has been refined into the following major types: construct validity, content validity, internal validity, and statistical validity. I will review each of these below and discuss the threats to validity in each case.

* Where the normal distribution accurately represents the sampling distribution of the statistic of interest; the Central Limit Theorem predicts that, for sufficiently large samples, sample means will be approximately normally distributed.

Source: O'Reilly, Statistics in a Nutshell: A Desktop Quick Reference (August 2008), pages 120-122.