The principles of inferential statistics
GENERALIZATION, CONFIDENCE AND CAUSALITY
If you carry out a piece of research and obtain evidence that the difference between variable means, or the association between variables, is statistically significant, what are the implications of this? In the final section of this chapter we will examine the important issue of how you should interpret statistically significant differences and associations and, in particular, the extent to which it is appropriate to generalize your findings, how confident you can be that your findings are true, and when it is safe to assume that the associations and differences you have found reflect causal relationships.
Imagine that you carry out a piece of cross-sectional research on the relation between pay and well-being in a large organization. You obtain two main findings. First, there is a correlation between how much people earn and how happy they feel. Second, the mean level of happiness of people in the top two grades of the organization is higher than that of the people in the bottom two grades. Three important questions may now occur to you:
1 To what extent can you generalize your findings beyond the sample of people, the organization, and the setting you have used in your study?
2 To what extent should you have confidence in your findings?
3 Can you claim that you have found evidence of a causal relationship (e.g. that being in the top two grades causes people to be happier than being in the bottom two grades)?
The first issue concerns generalization: under what circumstances is it appropriate to generalize findings beyond the sample of people, organization and setting you have collected data on? This issue is closely connected with research design, an area that is not covered in this book, but on which there are many readable texts such as those by Collis and Hussey (2003) and Saunders et al. (2002). Assuming that you have a good working knowledge of research design, let’s return to the fictitious study outlined above. If you were to obtain the results outlined above, would you be in a position to claim that for people in general, salaries and well-being are positively associated, and that people in the top grades of organizations will be happier than those in the lowest grades? Or, in contrast, is it only appropriate to claim that it is likely that the association and difference you have found cannot be generalized beyond the people, organization and setting in which you collected your data?
Unfortunately, this question cannot be answered by carrying out a statistical analysis of your existing data. While inferential statistical analysis can be carried out on your data to examine the extent to which the association and difference found can be generalized to the populations of numbers you have sampled from, it cannot tell you anything about populations of numbers you haven’t sampled from. So, if you are to generalize your findings, the question is whether the association and difference found in the populations of numbers you sampled from would also be found in the populations of numbers obtained from people, organizations and settings other than those used in your study.
There is no simple, formulaic answer to this question. Instead, you need to actively think about the research design, carefully considering the extent to which it may or may not be appropriate to generalize your findings to people, organizations and settings beyond those examined in your research study. For example, a very tentative answer to the question of the extent to which you can generalize your findings can be obtained by posing the question: ‘Are there any reasons to suspect that the statistically significant association or difference, and effect sizes, I have found in my study would be different if I carried out the research again with other participants, organizations, or settings?’. If the answer to this question is yes (and in organizational research it very often is), very considerable caution should be shown in generalizing the findings. If the answer is no, you may tentatively conclude that it may be possible to generalize the findings to other people, organizations and settings. However, the only way to gain a substantial degree of confidence in the extent to which your findings can be generalized in this way is to carry out the research again, carefully and appropriately sampling from whatever people, organizations and settings you wish to generalize your results to.
The second issue concerns the level of confidence we can have in our research findings. This is something about which there is some controversy in statistics (e.g. Cohen 1994; Gigerenzer 1993), but I will draw attention here to two factors with which I believe most statisticians would agree. The first is that the smaller the p value obtained from a statistical analysis, the more confident we can be that the null hypothesis is false (Fisher 1925), and therefore that the alternative hypothesis is true. The second is that if more than one research study is carried out to examine the same thing, the greater the proportion of these studies which obtain statistical significance, and the lower the p values obtained across these studies, the more confident we can be that the null hypothesis is false.
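The second factor can be made concrete with a little arithmetic. This is an illustration rather than a method from this chapter, and it rests on assumptions: the studies are independent, the null hypothesis is true in every one of them, and each uses a significance level of .05, so that each has a 5 per cent chance of producing a significant result by chance. Under those assumptions the number of significant studies follows a binomial distribution, and the hypothetical helper `prob_at_least_k_significant` gives the probability that at least k of n studies would be significant by chance alone.

```python
from math import comb


def prob_at_least_k_significant(k: int, n: int, alpha: float = 0.05) -> float:
    """Probability that at least k of n independent studies reach p < alpha
    when the null hypothesis is true in every study (binomial model).

    Assumes each study has exactly an `alpha` chance of a significant result
    under the null hypothesis.
    """
    return sum(
        comb(n, j) * alpha**j * (1 - alpha) ** (n - j) for j in range(k, n + 1)
    )


if __name__ == "__main__":
    # One significant result among five studies is quite likely by chance...
    print(round(prob_at_least_k_significant(1, 5), 4))  # 0.2262
    # ...whereas four or more out of five is very unlikely if the null is true.
    print(prob_at_least_k_significant(4, 5))
```

In other words, a single significant result among several attempts is weak evidence, while a consistent pattern of significant results is much harder to explain as chance.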
Incidentally, in relation to confidence, it is not at all unusual for inexperienced researchers to claim that by finding a statistically significant association or difference between two variables they have ‘proved’ something. So a naïve researcher might believe that the statistically significant difference between the happiness of people in the top two grades and the bottom two grades proves that in the organization studied people in the most senior positions are happier than those in the bottom two grades. Such a claim is misleading and wrong. Remember that statistical significance merely indicates that, if the null hypothesis is true, the probability of finding an association (or difference) between variables at least as large as the one found in the data is less than 5 per cent. If statistical significance is attained, this does not mean that the existence of the association or difference between populations has been ‘proved’. Proof is a word used in law courts rather than in science. Instead, we should more cautiously claim only that we have found evidence that the variables in question may be associated, or that we have found evidence that the means of the variables may differ.
To claim that something has been proved implies that there is certainty about it, and, as explained earlier, inferential statistics is concerned with establishing probabilities and likelihoods rather than certainties.
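The definition of statistical significance given above can be made concrete with a permutation test. This is one way of estimating how often an association as large as the observed one would arise if the null hypothesis were true; it is not necessarily the approach used in the next chapter. The sketch assumes a very small hypothetical sample, so that every possible re-pairing of the scores can be enumerated, and the data and function names are illustrative only.

```python
from itertools import permutations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def exact_permutation_p(x, y):
    """Two-sided exact permutation p value for Pearson's r.

    If the null hypothesis of no association is true, every re-pairing of the
    y scores with the x scores is equally likely. The p value is the proportion
    of all n! re-pairings whose |r| is at least as large as the observed |r|.
    """
    r_obs = abs(pearson_r(x, y))
    hits = total = 0
    for shuffled_y in permutations(y):
        total += 1
        # Small tolerance so exact ties are not lost to floating-point rounding.
        if abs(pearson_r(x, shuffled_y)) >= r_obs - 1e-12:
            hits += 1
    return hits / total


if __name__ == "__main__":
    pay = [18, 21, 25, 30, 34, 41]   # hypothetical salaries (thousands)
    happiness = [3, 5, 4, 6, 7, 6]   # hypothetical happiness ratings
    print(f"r = {pearson_r(pay, happiness):.2f}, "
          f"p = {exact_permutation_p(pay, happiness):.3f}")
```

A small p here only says that such a strong association would be unusual if there were no association in the population; it does not prove that one exists. Exact enumeration grows as n!, so with realistic sample sizes a random subset of permutations would be used instead.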
The final issue is causality. In what circumstances can you claim that one variable causes changes in another? For example, in this case could you claim that you have shown that, at least in the organization you studied, changes in pay cause changes in happiness, or that being in the top two grades in an organization causes people to be happier than being in the bottom two grades? This question often causes considerable confusion. One reason for this is that it is often incorrectly assumed that statistical tests of association (e.g. the statistical significance of a correlation coefficient) do not, and cannot, inform us about causality, whereas tests of difference (e.g. the t-test) can do so.
In fact, whether or not we can infer that one or more variables have a causal effect on another has nothing to do with whether we are examining associations or differences, nor does it depend on the particular statistical test used to analyse the data. In order to answer the question of whether a change in one variable has a causal effect on another variable we need to examine the research design from which the data have been produced. If it is the case that a carefully controlled experimental study has been carried out, and the only possible explanation for variation in variable x is variation in variable z, it is appropriate to infer that variable z is having a causal effect on variable x. However, if there are any explanations for the variation in variable x other than the variation in variable z, we cannot claim that such a causal relationship exists.
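The point that tests of association and tests of difference are not fundamentally different can be checked numerically. If the grades are coded 1 (top two grades) and 0 (bottom two grades), the correlation between this grouping variable and happiness (the point-biserial correlation) carries the same information as the independent-samples t-test, because t = r × √(n − 2) / √(1 − r²). The sketch below uses hypothetical happiness scores and the pooled-variance form of the t-test; the function names are illustrative. Since the two statistics are interchangeable, the choice between them cannot settle the question of causality.

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pooled_t(group_a, group_b):
    """Independent-samples t statistic using the pooled variance."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def t_from_r(r, n):
    """Convert a point-biserial correlation into the equivalent t statistic."""
    return r * sqrt(n - 2) / sqrt(1 - r**2)


top = [6, 7, 5, 8, 7]     # hypothetical happiness scores, top two grades
bottom = [4, 5, 3, 5, 4]  # hypothetical happiness scores, bottom two grades

grade = [1] * len(top) + [0] * len(bottom)  # 1 = top grades, 0 = bottom grades
happiness = top + bottom

r = pearson_r(grade, happiness)
print(round(r, 3), round(t_from_r(r, len(happiness)), 3), round(pooled_t(top, bottom), 3))
# 0.802 3.795 3.795
```

Whether the result is reported as a correlation or as a t-test, the p value is identical; only the research design can tell us whether grade has a causal effect on happiness.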
Let’s take an example to illustrate this point. Imagine that you are a researcher who has developed a medicine to reduce blood pressure in people suffering from hypertension.
In a double-blind study (in which neither the patients nor the researchers know the dosage each patient receives), 100 people suffering from hypertension are given a placebo pill not containing any of the medicine, a pill containing 1 mg of the medicine, a pill containing 5 mg, or a pill containing 10 mg. Which of these pills each person receives is determined at random, and they are all given the pills at the same time and in the same environment. One week later their blood pressure is measured, and a statistically significant correlation of –.6 is found between the amount of the drug they received and their blood pressure. In these circumstances, because an experimental design has been used, and steps have been taken to eliminate the influence of all critical extraneous variables, it would be acceptable to tentatively conclude that the medicine caused a reduction in blood pressure.
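Random allocation is what makes it reasonable to treat dose as the only systematic difference between the groups. The sketch below shows one way it could be done with Python's standard `random` module. It assumes equal groups of 25 (the example does not say how the 100 people are split), and the function name, participant IDs and pill codes are all hypothetical.

```python
import random
from collections import Counter


def allocate_double_blind(participants, doses, seed=None):
    """Randomly allocate participants to equal-sized dose groups.

    Returns (allocation, key): `allocation` maps each participant to an opaque
    pill code, and `key` maps each code to its dose. Keeping the key away from
    both patients and the staff measuring blood pressure is what keeps the
    study double-blind.
    """
    if len(participants) % len(doses) != 0:
        raise ValueError("participants must divide evenly among the dose groups")
    rng = random.Random(seed)
    codes = [f"PILL-{i + 1}" for i in range(len(doses))]
    hidden_doses = list(doses)
    rng.shuffle(hidden_doses)  # code numbers reveal nothing about the dose
    key = dict(zip(codes, hidden_doses))
    per_group = len(participants) // len(doses)
    labels = [code for code in codes for _ in range(per_group)]
    rng.shuffle(labels)  # random allocation of participants to pill codes
    allocation = dict(zip(participants, labels))
    return allocation, key


participants = [f"patient-{i:03d}" for i in range(1, 101)]
doses = ["placebo", "1 mg", "5 mg", "10 mg"]
allocation, key = allocate_double_blind(participants, doses, seed=2024)
print(Counter(allocation.values()))  # each pill code is given to 25 patients
```

Seeding the generator makes the allocation reproducible for checking later; the `key` would typically stay with someone who neither gives the pills nor measures blood pressure until the data are analysed.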
However, such carefully controlled experimental designs are very rare in research carried out in organizations. Much more common is the type of situation proposed in this section, in which a researcher finds that there is an association between high pay and happiness, or a difference in the happiness of people in higher grades and lower ones. In these circumstances it is certainly not the case that the only explanation for the variation in pay levels is variation in happiness (or vice versa). In fact, it is possible that pay levels are having a causal effect on happiness, that happiness is somehow having a causal effect on pay levels, or that some other variable or variables are influencing both happiness and pay levels. Consequently, all we can say in this case is that there is a relationship between pay and happiness. The question of why this relationship exists cannot be answered with the data obtained here.
SUMMARY
In this chapter we have covered many of the key ideas and principles underpinning inferential statistics. An awareness of these ideas is helpful not only when carrying out your own statistical analysis, but also when interpreting research findings reported by others. For example, being aware that a statistically significant association or difference is not necessarily of practical significance, and that effect sizes are of critical importance when considering research findings, will prove valuable whatever the particular inferential statistical technique you are using or reading about. However, what has not been covered here is how all this is done: how are confidence intervals worked out, and how can we use sampled data to establish whether the probability of obtaining a difference between two sample means as large as the one observed, if the two population means are equal, is less than, or greater than, 5 per cent? This is covered in the next chapter.
■ It is important to distinguish between a sample of people and a statistical sample of numbers, or scores, obtained from these people.
■ We are usually interested in inferring information about a statistical population of numbers from the statistical sample of numbers. Commonly, you infer population parameters such as the population mean and population standard deviation from sample statistics such as the sample mean and sample standard deviation.
■ In research, a statistical sample of numbers is collected on whatever variable the researchers are interested in (such as job commitment, age or job performance). So you often want to know about the population mean and standard deviation of a variable such as job commitment, and you estimate these from the mean and standard deviation of a sample of job-commitment scores.
■ You can never be sure that you have accurate estimates of population parameters from sample statistics because inferential statistics is based on probabilities rather than certainties.
■ A confidence interval indicates that, with a specified level of confidence (e.g. 95 per cent), a population parameter falls somewhere between two specific values. For example, a 95 per cent confidence interval of 4.7–5.3 for the mean well-being of Spanish waiters means that we can be 95 per cent confident that the population mean well-being of Spanish waiters falls between these two values.
■ For continuous data, the less variation in the variable, and the larger the sample, the smaller the confidence interval will be.
■ For categorical data, the larger the sample, the smaller the confidence interval will be.
■ Often in organizational research you are interested in the associations between categorical or continuous variables, or in the differences in the central tendencies of continuous variables.
■ Null hypotheses and alternative hypotheses are statements made about statistical populations you wish to know about rather than the statistical samples you do know about.
■ Statistical tests tell you the probability that you would obtain a difference (or association) between variables as large as the one you have found in your sample data, if the null hypothesis is true.
■ If the probability is less than 0.05 (5 per cent) that you would obtain a difference (or association) between variables as large as the one you have found in your sample data, if the null hypothesis is true, the convention is to reject the null hypothesis and accept the alternative hypothesis. In these circumstances the finding is said to be statistically significant.
■ There is an important distinction between statistical significance, on the one hand, and theoretical or practical importance, on the other.
■ When considering the theoretical and practical importance of quantitative research findings, you should consider not only whether an association or difference is statistically significant but also the relevant effect sizes.
■ Common measures of effect size are the correlation coefficient (for the degree of association between two continuous variables) and d (for the amount of difference between the means of two variables).
■ If there is a real association or difference between populations, you are more likely to obtain statistical significance with large samples than with small samples.
■ Whether an effect size is large enough to be important needs to be considered in the context of the theoretical implications and practical applications of the research.
■ Power analysis makes it possible to establish the likelihood that a null hypothesis will be rejected if it is false. This likelihood is derived from information about the estimated population effect size, the significance level adopted in the study, and the sample size used.
■ Power analysis can also be used to identify the sample size required in a research study. In this case, information is required about the significance level to be adopted and the level of power required. It is also necessary to estimate the population effect size by carrying out a pilot study, examining the research literature for relevant studies carried out in the past, or adopting the effect sizes found to be typical in the social sciences.
■ In considering the extent to which the results of a research study can be generalized beyond the participants, organizations and settings used in that study, an attempt should be made to actively think of reasons why generalization may or may not be possible. The only way to gain a substantial degree of confidence in such generalization is to carry out the research again with other people and organizations, and in other settings.
■ The lower the p value obtained in a test of significance, the more confident we can be that the null hypothesis is false. Also, the greater the proportion of research studies carried out to examine an association or difference which produce statistically significant findings (and the lower the p values obtained across these studies), the more confident we can be that the null hypothesis is false. Claims that statistically significant findings ‘prove’ that particular associations and differences exist are incorrect and should be avoided.
■ If it is the case that a carefully controlled experimental study has been carried out, and the only possible explanation for variation in variable x is variation in variable z, it is appropriate to infer that variable z is having a causal effect on variable x. However, if there are any explanations for the variation in variable x other than the variation in variable z, we cannot claim that such a causal relationship exists.
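Several of the points summarized above (confidence intervals, d, and the links between effect size, sample size and power) can be illustrated with Python's standard library. How confidence intervals and significance tests are actually worked out is covered in the next chapter; this is only a sketch under stated assumptions. The confidence interval uses a large-sample normal approximation rather than the t distribution, d is computed with the pooled standard deviation (one common definition), and the power and sample-size functions use the normal approximation for a two-tailed comparison of two equal-sized groups. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist, mean, variance

Z = NormalDist()  # standard normal distribution


def mean_ci(scores, confidence=0.95):
    """Large-sample (normal approximation) confidence interval for a population mean."""
    z = Z.inv_cdf(0.5 + confidence / 2)
    se = sqrt(variance(scores) / len(scores))  # standard error of the mean
    m = mean(scores)
    return m - z * se, m + z * se


def cohens_d(group_a, group_b):
    """Difference between two means divided by the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def power_two_groups(d, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed, two-group comparison.

    Normal approximation; ignores the negligible chance of a significant
    result in the wrong direction.
    """
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(abs(d) * sqrt(n_per_group / 2) - z_crit)


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group needed to reach the target power."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    z_power = Z.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / d) ** 2)


if __name__ == "__main__":
    wellbeing = [4.8, 5.1, 5.3, 4.6, 5.0, 5.2, 4.9, 5.1]  # hypothetical scores
    print(mean_ci(wellbeing))
    print(cohens_d([5, 6, 7], [3, 4, 5]))  # 2.0
    print(n_per_group(0.5))                # 63
    print(round(power_two_groups(0.5, 63), 3))
```

With d = 0.5, a significance level of .05 and a target power of .80, this approximation gives 63 participants per group; calculations based on the t distribution give a slightly larger figure (about 64). The functions also reflect the bullets above: a larger sample, or less variable scores, gives a narrower confidence interval, and for a fixed effect size power rises with sample size.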
QUESTIONS
1 In inferential statistics do we generalize from samples to (a) larger samples or (b) populations?
2 Which of the following is true?
(a) If we have good sample statistics we can determine the values of population parameters with exactness and certainty.
(b) When estimates of population parameters are based on sample statistics we cannot be sure they are correct.
3 Which of the following correctly describes a confidence interval?
(a) An estimate of the range of values within which a population parameter falls at a given level of probability.
(b) An estimate of how confident we can be in a sample statistic.
4 What two factors determine the size of the confidence interval for a mean?
5 Which of these do inferential statistical methods make it possible to test directly?
(a) The alternative hypothesis.
(b) The null hypothesis.
6 Is the null hypothesis a statement about (a) the nature of the population or (b) the nature of the sample?
7 Why is .05 commonly chosen as the cut-off value for statistical significance?
(a) It provides a reasonable compromise between the probabilities of making type I and type II errors.
(b) It provides a trade-off between the probabilities that the null hypothesis and the alternative hypothesis are correct.
8 Statistical significance tells us whether a difference between two means is important from a practical point of view. True or false?
9 Name a widely used measure of the effect size of an association, and a widely used measure of the effect size of the difference between two means.
10 How is d worked out?
11 What are the conventions in organizational research for (a) small, medium and large correlations, and (b) small, medium and large differences between means?
12 What is statistical power and what factors determine the amount of statistical power that is present?