Most research studies are based on more than a few participants. The mean number of participants per study in articles published in 1988 in the Journal of Personality and Social Psychology was about 200 (Reis and Stiller, 1992). This is quite a substantial average number of participants. So:
• How big should a sample be in order for us to claim that our findings apply generally?
• How should samples be selected in order for us to maximise our ability to generalise from our findings?
If everyone behaved in exactly the same way in our studies, we would only need to select one person to investigate the topic in question – everyone else would behave exactly the same. The way in which we select the sample would not have any bearing on the outcome of the research because there is no variability. We only need sampling designs and statistics, for that matter, because of this variability. Psychology would also be a very boring topic to study.
Fortunately, people vary in an infinite number of different ways. Take, for example, something as basic as the number of hours people say they usually sleep a day. While most people claim to sleep between seven and eight hours, others claim that they sleep less than six hours and others that they sleep more than ten hours (Cox et al., 1987, p. 129). In other words, there is considerable variation in the number of hours claimed.
Furthermore, how much sleep a person has varies from day to day – one day it may be six hours and the next day eight hours. Differences between people and within a person are common just as one might expect. So our sampling methods need to be planned with the awareness of the issue of variability together with an awareness of the level of precision that we need in our estimates of the characteristics of people.
The necessary size of the samples used in research should partially reflect the consequences of the findings of the research. Research for which the outcome matters crucially may have more stringent requirements about sample size than research for which the outcome, whatever it is, is trivial. For example, what size sample would one require if the outcome of the study could result in counselling services being withdrawn by a health authority? What size sample would one require if the study is just part of the training of psychology students – a practical exercise? What size sample would one require for a pilot study prior to a major investigation? While they might disagree about the exact sample size to use, probably psychologists would all agree that larger samples are required for the study that might put the future of counselling services at risk. This is because they know that a larger sample is more likely to demonstrate a trend in the study if there is one in reality.
As we have mentioned, many psychologists also tend to favour larger sample sizes because they believe that this is likely to result in greater precision in their estimates of what is being measured. For example, it is generally the case that larger samples are employed when we are trying to estimate the frequency, or typical value, of a particular behaviour or characteristic in the population. If we wanted an estimate of the mean number of reported hours of sleep in, say, the elderly, then we are likely to use a bigger sample. What this means is that it is possible to suggest that the average number of hours of sleep has a particular value and that there is only a small margin of error involved in this estimate. That is, our estimate is likely to be pretty close to what the average is in reality. On the other hand, if we want to know whether the number of hours slept is related to mental health then we may feel that a smaller sample will suffice. The reason for this is that we only need to establish that sleep and mental health are related – we are less concerned about the precise size of the relationship between the two. (If the aim is to produce an estimate of some characteristic for the population, then we will have more confidence in that estimate if the sample on which that estimate is based is selected in such a way as to be representative of the population. The basic way of doing this is to draw samples at random. Ways of selecting a representative sample are discussed in some detail in Chapter 13, along with other sampling methods. Of course, the more representative we can assume our sample to be, the more confidence we can have in our generalisations based on that sample.)
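The link between sample size and margin of error can be made concrete with a little arithmetic. What follows is a minimal sketch rather than a recommended procedure: it assumes a simple random sample, uses the normal approximation for a 95 per cent interval, and the standard deviation of 1.5 hours is a made-up figure for illustration, not a value taken from any study cited here.

```python
import math

def margin_of_error(sd, n, z=1.96):
    """Approximate 95% margin of error for a sample mean (normal approximation)."""
    return z * sd / math.sqrt(n)

# Hypothetical standard deviation of reported hours of sleep (an assumption).
ASSUMED_SD = 1.5

for n in (25, 100, 400, 1600):
    moe = margin_of_error(ASSUMED_SD, n)
    print(f"n = {n:4d}: margin of error about ±{moe:.2f} hours")
```

Under these assumptions the margin of error halves only when the sample size is quadrupled, which is one reason why precise estimates of a population value tend to call for large samples.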
Most sampling in psychological research probably involves what are termed convenience samples. These are not random samples of anything but groups of people that are relatively easy for the researcher to get to take part in their study. In the case of university lecturers and students, the most convenient sample typically consists of students – often psychology students. What is convenient for one psychologist may not be convenient for another, of course. For a clinical psychologist, psychotherapy patients may be a more convenient sample than undergraduate students. Bodner (2006) noted that for a random sample of 200 studies selected from PsycINFO in 1999 only 25 per cent of them used college students, ranging from 5 per cent in clinical or health psychology to 50 per cent in social psychology.
Convenience samples are usually considered to be acceptable for much psychological research. Since psychological research often seeks to investigate whether there is a relationship between two or more variables, a precisely defined sample may be unnecessary (Campbell, 1969, pp. 360–2). Others would argue that this is very presumptuous about the nature of the relationship between the two variables – especially that it is consistent over different sorts of people. For example, imagine that watching television violence is related to aggressiveness in males, but inversely related to aggressiveness in females. A convenience sample of psychology students, who tend to be female, will actually stack things in favour of finding that watching television violence is associated with lower levels of aggressiveness.
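The television violence example can be sketched numerically. The figures below are entirely hypothetical. The sketch assumes that the relationship is a straight line within each sex, with a slope of +0.4 for males and −0.4 for females, and that the two sexes watch the same amount of television violence on average and with the same spread. Under those assumptions the slope found in the pooled sample is simply the average of the two slopes, weighted by the make-up of the sample.

```python
def pooled_slope(proportion_female, slope_male, slope_female):
    """Slope in the combined sample, assuming viewing is distributed identically in both sexes."""
    proportion_male = 1 - proportion_female
    return proportion_male * slope_male + proportion_female * slope_female

# Hypothetical slopes: aggressiveness rises with viewing in males, falls in females.
SLOPE_MALE = 0.4
SLOPE_FEMALE = -0.4

for proportion_female in (0.2, 0.5, 0.8):
    slope = pooled_slope(proportion_female, SLOPE_MALE, SLOPE_FEMALE)
    print(f"{proportion_female:.0%} female: pooled slope = {slope:+.2f}")
```

With these invented figures, a sample that is 80 per cent female suggests a negative relationship, an evenly balanced sample suggests none at all, and a mostly male sample suggests a positive one. The composition of the convenience sample alone decides which 'finding' emerges.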
Whether it is possible to generalise from a sample of psychology students, or even students, to the wider population is obviously an empirical question for any one topic of research. It is also a matter of credibility since it would be scarcely credible to study post-partum depression simply on the basis of a general convenience sample of university students. There are many circumstances in which it would seem perverse to choose to study students rather than other groups. For example, if a researcher was interested in the comprehensibility of the police caution then using university students might seem less appropriate than using a sample of people with poor educational attainment. Obviously, if one is addressing an issue that is particular to a certain group such as children or psychotherapy patients, then it is important to select this group of people. The use of students as a primary group for study has its advantages in the context of their education and training as it is time-consuming to contact other groups; on the other hand it has severe difficulties for virtually any other purposes. Getting the balance right is a matter for the research community in general, not students learning to do psychology.
Often in psychological research, it is difficult to identify the population that is of concern to the researcher. Although common sense would suggest that the population is that which is represented by the actual participants in the research, this usually does not appear to be what is in the researcher’s mind. Probably because psychologists tend to see research questions as general propositions about human behaviour rather than propositions about a particular type of person or specific population, they have a tendency to generalise beyond the population which would be defined by the research sample. The difficulty is, of course, just when the generalisation should stop – if ever.
Similarly, there tends to be an assumption that propositions are not just true at one point in time but true across a number of points in time. That is, psychological processes first identified more than a lifetime ago are still considered relevant today. Gergen (1973) has argued for the historical relativity of psychological ideas which Schlenker (1974) has questioned.
So there appears to be a distinction between the population of interest and the population defined clearly by the sample of participants utilised in the research. Of course, it would be possible to limit our population in time and space. We could say that our population is all students at Harvard University in 2010. However, it is almost certain that having claimed this we would readily generalise the findings that we obtain to students at other universities, for example. We may not directly state this but we would write in a way which is suggestive of this. Furthermore, people in our research may be samples from a particular group simply because of the resource constraints affecting our options. For example, a researcher may select some, but not all, 16-year-olds from a particular school to take part in research. Within this school, participants are selected on a random basis by selecting at random from the school’s list of 16-year-olds. While this would be a random sample from the school and can be correctly described as such, the population as defined by the sample would be very limited. Because of the extremely restricted nature of the initial selection of schools, the results of the study may not be seen as being more informative than a study where this random selection procedure was not used but a wider variety of research locations employed.
The question of the appropriateness of sampling methods in most psychological research is a difficult one. Psychological researchers rarely use random sampling from a clearly defined population. Almost invariably some sort of convenience sample of participants is employed – where randomisation is used it is in the form of random allocation to the conditions of an experiment or the sequence of taking part in the conditions. This is as true of the best and most influential psychological research as less auspicious and more mundane research. In other words, if precise sampling were the criterion for good research, psychology textbooks may just as well be put through the shredder. This is not to say that sampling in psychological research is good enough – there is a great deal to be desired in terms of current practices. However, given that the major justification for current practice lies in the assumed generality of psychological principles, things probably will not change materially in the near future.
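Random selection and random allocation are easily confused, so a short sketch may help to keep them apart. The list of pupils, the sample size of 40 and the function names are all hypothetical, chosen only for illustration. Random selection decides who is approached from a defined list. Random allocation decides which condition each person is placed in once they have been recruited.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: a school's list of 180 sixteen-year-olds.
school_list = [f"pupil_{i:03d}" for i in range(1, 181)]

def random_sample(frame, n, rng):
    """Random selection: every member of the frame has the same chance of being chosen."""
    return rng.sample(frame, n)

def random_allocation(participants, rng):
    """Random allocation: split people already recruited into two conditions by chance."""
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

selected = random_sample(school_list, 40, rng)
experimental, control = random_allocation(selected, rng)
print(len(selected), len(experimental), len(control))  # prints: 40 20 20
```

Only the first step bears on how far the findings can be generalised. The second step can be applied just as well to a convenience sample, which is how randomisation is usually used in psychological experiments.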
Another factor needs to be considered when evaluating the adequacy of psychological sampling methods: participation rates in many sorts of research are very low. Participation rates refer to the proportion of people who take part in the research compared with the number asked to take part in the research, that is, the proportion who supply usable data. Random sampling is considerably undermined by poor participation rates; it cannot be assumed that those who do not participate are a random sample of the people approached. They do not participate for a variety of reasons, some of which may mean that certain sorts of participants exclude themselves. These reasons may be systematically related to the research topic – maybe potential participants are simply uninterested in the topic of the research. Alternatively, there may be more technical reasons why participation rates are low. A study which involves the completion of a questionnaire is likely to result in less literate potential participants declining to take part. In other words, issues to do with sampling require the constant attention, consideration and vigilance of researchers planning, analysing and evaluating research. The issues are complex, and it is impossible to provide rules of thumb to deal with them. The lesson is that simply using random selection methods does not ensure a random sample. In these circumstances, convenience samples may be much more attractive propositions than at first they appear to be – if poor participation rates systematically distort the sample then what is to be gained by careful sampling? Figure 4.2 displays some points about the kinds of samples typically used by psychologists.
FIGURE 4.2 Factors in the generalisation of psychological research findings
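A small simulation can show how poor participation distorts even a properly drawn random sample. Everything in this sketch is invented for the purpose: the 'literacy' scores, the size of the population, and above all the assumed response model, in which people scoring below 50 have a 10 per cent chance of returning the questionnaire and everyone else a 60 per cent chance.

```python
import random
import statistics

rng = random.Random(2011)  # fixed seed so the sketch is repeatable

# Hypothetical population: each person has a literacy score (mean 50, SD 10).
population = [rng.gauss(50, 10) for _ in range(10_000)]

# Step 1: a genuinely random selection of people to approach.
approached = rng.sample(population, 2_000)

def takes_part(literacy, rng):
    """Assumed response model: less literate people are less likely to take part."""
    chance = 0.6 if literacy >= 50 else 0.1
    return rng.random() < chance

# Step 2: only some of those approached supply usable data.
respondents = [score for score in approached if takes_part(score, rng)]

participation_rate = len(respondents) / len(approached)
mean_approached = statistics.mean(approached)
mean_respondents = statistics.mean(respondents)

print(f"participation rate: {participation_rate:.0%}")
print(f"mean literacy of those approached: {mean_approached:.1f}")
print(f"mean literacy of those who took part: {mean_respondents:.1f}")
```

The people approached resemble the population, but under this response model those who actually take part are noticeably more literate on average. The selection was random; the final sample is not.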
4.4 Statistics and generalisation
Statistical analysis serves many important roles in psychology – as some students will feel they know to their cost. There are numerous statistical techniques that help researchers explore the patterns in their data, for example, which have little or nothing to do with what is taught on introductory statistics courses. Most students, however, are more familiar with what is known as ‘inferential statistics’ or, more likely, the concept of ‘significance testing’. Significance testing is only one aspect of research but is a crucial one in terms of a researcher’s willingness to generalise the trends that they find in their data. While students are encouraged to believe that statistical significance is an important criterion, it is just one of two really important things. The other is the size of the trend, difference, effect or relationship found in the research. The bigger that these are, then the more important the relationship. Furthermore, statistical significance is not the most important thing in evaluating one’s research. One needs a fuller picture than just that when reaching decisions about research.
Moreover, as a consequence of the tendency of psychologists to emphasise statistical significance, they can overlook the consequences of failing to show that there is a trend in their data when, in reality, there is a trend. This can be as serious as mistakenly concluding that there is a trend when in reality there is no trend and that our sample has capitalised on chance. For example, what if the study involves an innovative treatment for autism? It would be a tragedy in this case if the researcher decided that the treatment did not work simply because the sample size used was far too small for the statistical analysis to be statistically significant. Essentially this boils down to the need to plan one’s research in the light of the significance level selected, the minimum size of the effect or trend in your data that you wish to detect, and the risk that you are prepared to take of your data not showing a trend when in reality there is a trend. With these things decided, it is possible to calculate, for example, the minimum sample size that your study will need for an effect of a particular size to be statistically significant. This is a rather unfamiliar area of statistics to most psychologists, which is known as statistical power analysis. It is included in the new edition of the statistics textbook which accompanies this book (Howitt and Cramer, 2011a).
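The three planning decisions just described can be turned into a rough sample size. The sketch below is an approximation under stated assumptions and is not the procedure given in the accompanying statistics textbook. It assumes a two-tailed comparison of the means of two independent groups of equal size, expresses the effect as a standardised mean difference (Cohen's d), and uses the normal approximation, which gives figures slightly smaller than an exact calculation based on the t distribution. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group (two groups, two-tailed test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # set by the significance level
    z_power = NormalDist().inv_cdf(power)          # set by the accepted risk of missing a real trend
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} participants per group")
```

With a significance level of 0.05 and an 80 per cent chance of detecting a real effect, this approximation suggests roughly 393 participants per group for a small effect (d = 0.2), 63 for a medium effect (d = 0.5) and 25 for a large effect (d = 0.8). The smaller the trend one wishes to detect, the more sharply the required sample size rises.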