Probability Samples - Reading Statistics Huck

If all members of the population can be specified prior to drawing the sample, if each member of the population has at least some chance of being included in the sample, and if the probability of any member of the population being drawn is known, then the resulting sample is referred to as a probability sample. The four types of probability samples considered here are simple random samples, stratified random samples, systematic samples, and cluster samples. As you read about each of these samples, keep in mind the illustration presented in Figure 5.1a.

Simple Random Samples. With a simple random sample, the researcher, either literally or figuratively, puts the names of all members of the population into a hat, shuffles the hat’s contents, and then blindly selects a portion of the names to determine which members of the total group are or are not included in the sample.

The key feature of this kind of sample is an equal opportunity for each member of the population to be included in the sample. It is conceivable, of course, that such a sample could turn out to be grossly unrepresentative of the population (because, for example, it happens to contain the population members who are strongest or most intelligent or tallest). It is far more likely, however, that a simple random sample will lead to a measurement-based statistic that approximates the value of the parameter. This is especially true when the sample is large rather than small.
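The claim that larger simple random samples tend to produce statistics closer to the parameter can be checked with a small simulation. This is only an illustrative sketch: the population of 10,000 "heights" is hypothetical (generated with a fixed seed), and the helper name `spread_of_sample_means` is our own. `random.sample` draws without replacement, so every member of the population has the same chance of selection.

```python
import random
import statistics

def spread_of_sample_means(population, n, reps, rng):
    """Draw `reps` simple random samples of size n (without replacement)
    and return the standard deviation of their means."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(reps)]
    return statistics.stdev(means)

rng = random.Random(2024)  # fixed seed so the run is reproducible
# Hypothetical population: 10,000 heights (cm) from a normal distribution.
population = [rng.gauss(170, 10) for _ in range(10_000)]
parameter = statistics.mean(population)

small = spread_of_sample_means(population, n=25, reps=500, rng=rng)
large = spread_of_sample_means(population, n=400, reps=500, rng=rng)
print(f"population mean (the parameter): {parameter:.2f}")
print(f"spread of sample means, n=25:  {small:.2f}")
print(f"spread of sample means, n=400: {large:.2f}")
```

Expect the spread for n = 400 to be roughly a quarter of the spread for n = 25, because the variability of a sample mean shrinks with the square root of the sample size.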

In Excerpt 5.3, we see an example of a simple random sample being used in an applied research study. Because there are different kinds of random samples that can be drawn from a tangible population, these researchers deserve credit for using the word simple to clarify exactly what type of random sampling procedure was used in their study. In this excerpt, you see the term sampling frame. Generally speaking, a sampling frame is simply a list that enumerates the things (people, animals, objects, or whatever) in the population. In a very real sense, there must be a sampling frame for simple random samples (or, more generally, for any probability sample) to be drawn from a population.
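To make the idea of a sampling frame concrete, the sketch below follows the logic of Excerpt 5.3 on a made-up roster: the frame is built by excluding members under age eighteen, and a simple random sample is then drawn from that frame only. The roster, its size, and the helper names (`Student`, `build_sampling_frame`, `simple_random_sample`) are hypothetical, not the study's data or code.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Student:
    student_id: int
    age: int

def build_sampling_frame(roster, min_age=18):
    """The frame is the explicit list of population members eligible for selection."""
    return [s for s in roster if s.age >= min_age]

def simple_random_sample(frame, n, rng):
    """Every member of the frame has the same chance, n / len(frame), of selection."""
    return rng.sample(frame, n)

rng = random.Random(7)
# Hypothetical roster of 5,000 students aged 16 to 30.
roster = [Student(i, rng.randint(16, 30)) for i in range(5_000)]
frame = build_sampling_frame(roster)
sample = simple_random_sample(frame, 1_500, rng)
print(len(roster), len(frame), len(sample))
```

In Excerpt 5.3 the corresponding figures were 15,000 drawn from a frame of 50,701, so each eligible student had a chance of about 15,000 / 50,701 ≈ 0.296 of being invited.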

Stratified Random Samples. To reduce the possibility that the sample might turn out to be unrepresentative of the population, researchers sometimes select a stratified random sample. To do this, the population must first be subdivided into two or more parts based on the knowledge of how each member of the population stands relative to one or more stratifying variables. Then, a sample is drawn that mirrors the population percentages associated with each segment (or stratum) of the population. Thus, if a researcher knows that the population contains 60 percent males and 40 percent females, a random sample stratified on gender should contain six males for every four females.
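Proportional allocation, as in the 60/40 gender example above, can be sketched as follows. The function name, the toy frame, and the use of the largest-remainder rule to turn fractional quotas into whole numbers are our own assumptions; published studies may round differently.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, n, rng):
    """Proportional stratified random sample of total size n.

    `stratum_of` maps each frame member to its stratum. Each stratum's quota
    is its population share of n; fractional quotas are resolved with the
    largest-remainder rule so that the quotas sum to exactly n.
    """
    strata = defaultdict(list)
    for member in frame:
        strata[stratum_of(member)].append(member)

    N = len(frame)
    exact = {h: n * len(members) / N for h, members in strata.items()}
    quotas = {h: int(q) for h, q in exact.items()}
    leftover = n - sum(quotas.values())
    # Give any remaining slots to the strata with the largest fractional parts.
    for h in sorted(exact, key=lambda h: exact[h] - quotas[h], reverse=True)[:leftover]:
        quotas[h] += 1

    # Draw a simple random sample within each stratum.
    return {h: rng.sample(strata[h], quotas[h]) for h in strata}

# Hypothetical population: 600 males and 400 females.
frame = [("M", i) for i in range(600)] + [("F", i) for i in range(400)]
sample = stratified_sample(frame, stratum_of=lambda m: m[0], n=50, rng=random.Random(1))
print({h: len(s) for h, s in sample.items()})  # {'M': 30, 'F': 20}
```

The sample contains six males for every four females, mirroring the population percentages.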

An example of a stratified random sample is presented in Excerpt 5.4. This is a good example of a well-described stratified random sample because it answers the question, “Stratified on what?” Too often, researchers either make no mention whatsoever of the variable used to create the strata, or they use terms like “age-stratified” or “region-stratified” without any specification of how many strata were set up or what the numerical or geographic boundaries were between the strata.

EXCERPT 5.3

• Simple Random Sample

The population of interest was all college students at a major southeastern university. The entire student body, except those under age eighteen who were legally minors, formed the sampling frame. A simple random sample of 15,000 individuals from the population of 50,701 students age eighteen or older was e-mailed an invitation to anonymously participate in a Web-based survey.

Source: Patton, C. L., Nobles, M. R., & Fox, K. A. (2010). Look who’s stalking: Obsessive pursuit and attachment theory. Journal of Criminal Justice, 38(3), 282–290.

EXCERPT 5.4

• Stratified Random Sample

A target sample size of 40 was obtained through a stratified random sample method. Students were first grouped according to the final grade (high distinction, distinction, credit, pass, fail). Students were randomly selected so that the proportion of students with each grade in the final sample was equal to the proportion of the grades in the class as a whole. This was done to ensure that the sample was representative of the full range of abilities in the class.

Source: Neumann, D. L., Neumann, M. M., & Hood, M. (2010). The development and evaluation of a survey that makes use of student data to teach statistics. Journal of Statistics Education, 18(1), 1–18.

In some studies using stratified random samples, researchers make the size of the sample associated with one or more of the strata larger than that stratum’s proportionate slice of the population. This oversampling in certain strata is done for one of three reasons: (1) anticipated difficulty in getting people in certain strata to participate in the study, (2) a desire to make comparisons between strata (in which case there are advantages to having equal strata sizes in the sample, even if those strata differ in size in the population), and (3) a need to update old strata sizes, when using archival data, because of recent changes in the characteristics of the population. In Excerpt 5.5, we see an example of a stratified random sample that involved oversampling for the first of these three reasons.
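The contrast between proportional allocation and the equal-size allocation that suits comparisons between strata (the second reason above) comes down to simple arithmetic. Because the researcher fixes each stratum's sample size, each member's probability of being drawn is still known, namely n_h / N_h for stratum h, so an oversampled design still meets the definition of a probability sample. The stratum sizes and function names below are hypothetical.

```python
def allocate(stratum_sizes, n, equal=False):
    """Return the sample size for each stratum.

    equal=False: proportional allocation (each stratum gets its population share).
    equal=True:  equal allocation (each stratum gets n / number_of_strata),
                 which oversamples the smaller strata.
    Uses floor division, so the total may fall slightly short of n when the
    shares do not divide evenly.
    """
    N = sum(stratum_sizes.values())
    if equal:
        return {h: n // len(stratum_sizes) for h in stratum_sizes}
    return {h: n * size // N for h, size in stratum_sizes.items()}

def selection_probabilities(stratum_sizes, allocation):
    """Probability that a given member of stratum h is drawn: n_h / N_h."""
    return {h: allocation[h] / stratum_sizes[h] for h in stratum_sizes}

# Hypothetical population of 10,000 split 9,000 / 1,000 across two strata.
sizes = {"majority": 9_000, "minority": 1_000}
for equal in (False, True):
    alloc = allocate(sizes, n=200, equal=equal)
    probs = selection_probabilities(sizes, alloc)
    print("equal" if equal else "proportional", alloc, probs)
```

Under proportional allocation both strata have a selection probability of 0.02; under equal allocation a minority-stratum member's chance rises to 0.1 while a majority-stratum member's falls to about 0.011.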

EXCERPT 5.5

• Oversampling

Computer-assisted self-interviewing was used to collect data from [a stratified] sample of household residents in four cities (Baltimore; Durham, NC; St. Louis; and Seattle) and the U.S. census-defined county subdivisions immediately adjacent to them. . . . Within the four study sites, we stratified segments by the percentage of population who were black and oversampled segments with high minority concentrations. This procedure yielded a large enough sample of couples in which one or both partners were black to provide stable estimates of both their behaviors and the antecedents of those behaviors.

Source: Billy, J. O. G., Grady, W. R., & Sill, M. E. (2009). Sexual risk-taking among adult dating couples in the United States. Perspectives on Sexual & Reproductive Health, 41(2), 74–83.

Systematic Samples. A third type of probability sample, called a systematic sample, is created when the researcher goes through an ordered list of members of the population and selects, for example, every fifth entry on the list to be in the sample. (Of course, the desired size of the sample and the number of entries on the list determine how many entries are skipped following the selection of each entry to be in the sample.) So long as the starting position on the list is determined randomly, each entry on the full list has an equal chance of ending up in the sample. Thus, if the researcher decides to generate a sample by selecting every fifth entry, the first entry selected for the sample should not arbitrarily be the entry at the top of the list (or the one positioned in the fifth slot); instead, a random decision should determine which of the first five entries goes into the sample.
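A systematic sample with a random start takes only a few lines. In this sketch (the function name and the 100-entry list are hypothetical), the sampling interval k is the list length divided by the desired sample size, and the start is chosen at random from the first k positions rather than always being the top of the list.

```python
import random

def systematic_sample(ordered_list, n, rng):
    """Select every k-th entry after a random start among the first k entries.

    k = len(ordered_list) // n. Choosing the start uniformly from positions
    0..k-1 gives each entry in the first n*k positions a 1/k chance of selection.
    """
    k = len(ordered_list) // n
    start = rng.randrange(k)  # random start, not automatically the first entry
    return ordered_list[start::k][:n]

entries = list(range(1, 101))  # a hypothetical ordered list of 100 entries
sample = systematic_sample(entries, n=20, rng=random.Random(5))
print(sample)
```

With 100 entries and a sample of 20, k is 5, matching the "every fifth entry" example: the first selected entry is one of the first five, and each later selection sits five places after the previous one.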

Excerpt 5.6 exemplifies the use of a systematic sample. As indicated in this excerpt, pages of census data recorded on microfilm reels were the things being sampled. Every tenth page on a reel ended up in the sample, with the first of those pages being selected at random from among pages 1 through 5. (I think the excerpt’s fifth sentence would be clearer if the words “that page and” had appeared between the words “designate” and “every.”)
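The page designation described in Excerpt 5.6 can be reproduced directly. The helper below is hypothetical; it uses 1-based page numbers as the excerpt does, a random starting page from 1 to 5, and an interval of 10 pages.

```python
import random

def census_sample_pages(pages_on_reel, start):
    """Pages designated for the sample: the start page and every 10th page after it."""
    return list(range(start, pages_on_reel + 1, 10))

rng = random.Random(11)
start = rng.randint(1, 5)  # random starting point between 1 and 5, inclusive
print(start, census_sample_pages(400, start))

# The excerpt's own illustration: a start of 3 designates pages 3, 13, 23, ...
print(census_sample_pages(30, 3))  # [3, 13, 23]
```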

Cluster Samples. The last of the four kinds of probability sampling to be discussed here involves what are called cluster samples. When this technique is used to extract a sample from a population, the researcher first develops a list of the clusters in the population. The clusters might be households, schools, litters, car dealerships, or any other groupings of the things that make up the population. Next, a sample of these clusters is randomly selected. Finally, data are collected from each person, animal, or thing that is in each of the clusters that has been randomly selected, or data are collected from a randomly selected subset of the members of each cluster.
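Both versions of the final step described above (collecting data from every member of each selected cluster, or from a random subset of each) can be sketched as one function. The cluster names and sizes are made up, and the `per_cluster` parameter is our own device for switching between the two versions; the 45 clusters and 27 selections merely echo Excerpt 5.7.

```python
import random

def cluster_sample(clusters, m, rng, per_cluster=None):
    """Randomly select m clusters, then collect data from their members.

    clusters: dict mapping cluster name -> list of members.
    per_cluster=None: keep every member of each selected cluster.
    per_cluster=k:    keep a simple random sample of k members from each
                      selected cluster (or all of them if it has k or fewer).
    """
    chosen = rng.sample(sorted(clusters), m)
    sample = {}
    for name in chosen:
        members = clusters[name]
        if per_cluster is None or per_cluster >= len(members):
            sample[name] = list(members)
        else:
            sample[name] = rng.sample(members, per_cluster)
    return sample

rng = random.Random(3)
# Hypothetical population: 45 sites, each with 20 to 60 children.
sites = {f"site_{i:02d}": [f"child_{i:02d}_{j}" for j in range(rng.randint(20, 60))]
         for i in range(45)}
all_members = cluster_sample(sites, m=27, rng=rng)
subsampled = cluster_sample(sites, m=27, rng=rng, per_cluster=10)
print(len(all_members), sum(len(v) for v in subsampled.values()))
```

Only the list of clusters is needed up front; member lists are consulted only for the clusters actually selected, which is the practical appeal of the technique.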

In Excerpt 5.7, we see an example of a cluster sample in which each “cluster” was a Head Start school in the state of New Hampshire. As indicated in the excerpt, 27 schools were randomly selected. Then, every student in each selected school was examined (unless their parents declined the opportunity) by a dentist who checked for cavities. This technique of cluster sampling made it much easier for the researchers to collect their study’s data than would have been the case if a simple random sample of children had been taken from all 45 Head Start schools.

EXCERPT 5.6

• Systematic Sampling

The manuscripts from each census are stored on several thousand microfilm reels. Most reels contain several hundred pages. Each of these pages contains between 40 and 50 lines, with each line containing information on one person. . . . The sampling strategy is based on the census page. We generate a random starting point for each microfilm reel between 1 and 5, and then designate every 10th page thereafter as a sample page. Thus, for example, if the starting point is 3, we designate the 3rd, 13th, and 23rd pages, continuing in that fashion until the end of the reel.

Source: Davern, M. (2009). Drawing statistical inferences from historical Census data, 1850–1950. Demography, 46(3), 589–603.

EXCERPT 5.7

• Cluster Samples

We conducted the survey at 27 of the 45 New Hampshire Head Start sites. . . . We used a simple random one-stage cluster sample design, in which all children at each selected site would be surveyed. . . . Four volunteer dentists provided oral examinations and determined the presence of untreated dental caries, caries experience, and treatment urgency.

Source: Anderson, L., Martin, N. R., Burdick, A., Flynn, R. T., & Blaney, D. D. (2010). Oral health status of New Hampshire Head Start children, 2007–2008. Journal of Public Health Dentistry, 70(3), 245–248.
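Because every child at a selected site is examined in the one-stage design of Excerpt 5.7, a child's chance of being included equals the chance that his or her site is drawn. With 27 of the 45 sites selected at random, that probability is known in advance:

```latex
P(\text{child included}) = P(\text{site selected}) = \frac{27}{45} = 0.6
```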

As indicated in Excerpt 5.7, the Head Start dental evaluation study used a one-stage cluster sample design. You are also likely to encounter research reports that refer to two- or three-stage cluster samples. In these multistage cluster samples, clusters of one kind are embedded inside clusters of a different kind, with the sampling process beginning with the larger clusters and then continuing down to the smaller clusters. For example, a three-stage cluster sample of homes in a given state (perhaps to assess their paint color) might involve selecting a sample of counties first, then a sample of cities within the selected counties, and finally a sample of residential neighborhoods within the selected cities. Collecting the study’s data from the resulting sample of homes, grouped in clusters, is far more convenient than if a simple random sample of homes were selected from all homes in the state.
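The three-stage design in the homes example can be sketched as nested random selections. Everything here is hypothetical: the nested dictionary of counties, cities, neighborhoods, and homes, the number of units kept at each stage, the function name, and the choice to keep every home in each finally selected neighborhood.

```python
import random

def multistage_sample(tree, sizes, rng):
    """Sample nested clusters stage by stage, largest clusters first.

    tree:  nested dicts (county -> city -> neighborhood -> list of homes).
    sizes: how many units to select at each stage, outermost first;
           assumes len(sizes) equals the number of nesting levels.
    Returns every home in each finally selected neighborhood.
    """
    if not sizes:
        # Innermost level reached: tree is the list of homes in one neighborhood.
        return list(tree)
    chosen = rng.sample(sorted(tree), sizes[0])
    homes = []
    for name in chosen:
        homes.extend(multistage_sample(tree[name], sizes[1:], rng))
    return homes

# Hypothetical state: 6 counties x 3 cities x 4 neighborhoods x 5 homes.
state = {
    f"county_{c}": {
        f"city_{c}_{t}": {
            f"hood_{c}_{t}_{h}": [f"home_{c}_{t}_{h}_{k}" for k in range(5)]
            for h in range(4)
        }
        for t in range(3)
    }
    for c in range(6)
}
homes = multistage_sample(state, sizes=[2, 2, 2], rng=random.Random(42))
print(len(homes))  # 2 counties x 2 cities x 2 neighborhoods x 5 homes = 40
```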

In document Reading Statistics Huck (Page 123-127)