Investigations supported by statistics follow a surprisingly standard lifecycle. If you are reviewing a piece of work, try to determine the sequence of events during the investigation. Did the investigators start off with one hypothesis and change their minds once the results were in? Did they try numerous different tests with various post-hoc adjustments to make sure that they could report a “significance test” result? Asking searching questions about the research process is like a detective asking about someone's movements at a certain date and time: inconsistencies and story-changing can be very revealing!
In short, investigations based on statistics should proceed along the following lines:
• Assuming that a period of observation and exploration has preceded the start of an investigation, research questions should be stated up front. Investigators must have formulated hypotheses (and the corresponding null hypotheses) well before they begin to collect data. Otherwise, the use of hypothesis testing is invalid, and the investigation may take on the flavor of a “fishing expedition.” Given that testing at a significance level of α = 0.05 accepts a 1 in 20 chance of making a Type I error whenever the null hypothesis is true, and since many thousands of studies are published each year in the scientific literature alone, many “facts” must surely be open to question. This is where independent repeatability and reliability are critical to the integrity of the scientific method.
• The relationship between the population of interest and the sample obtained must be clearly understood. It’s not sufficient to make inferences about the entire human population based on a sample of highly educated, healthy, middle-class college students from one college. Honestly.
• Hypotheses must relate to the effect of specific IVs on DVs. Thus, it’s critical to know as much about the DVs as possible, especially every source of variation in them. This is particularly important where DVs are thought or known to be highly correlated with one another (a situation analogous to multicollinearity, the term usually reserved for highly correlated IVs). The DVs must be measurable and must operationalize the underlying concepts completely.
• In complex designs, where there are both main effects and interactions to consider, all of the possible combinations of main effects and interactions and their possible interpretations must be noted.
• Procedures for random sampling and for handling missing data or refusals must be formalized early on, to prevent bias from arising. Remember that a truly representative sample must be randomly selected. Where purely random sampling is not feasible, it may be possible to identify particular strata within the population and sample those in proportion to their occurrence within the population (proportional stratified sampling). For example, since the proportion of males and females is generally known for most populations, sampling can be performed appropriately and “in proportion.”
• Always select the simplest test that will support the inferences you need to make. Multivariate techniques are incredibly important, but if you only need to make simple comparisons, they may be inappropriate.
• Selection of tests must always be balanced against known or expected characteristics of the data. For example, if testing mean differences between two groups and only small samples are available, use a two-sample t-test in preference to an ANOVA. The two differ only in the number of group means being tested. Whichever test is chosen, however, a designed experiment remains the only common way to support causal inferences.
• Don’t be afraid to report deviations, nonsignificant test results, and failure to reject null hypotheses—not every experiment can or should result in a major scientific result!
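To make the 1 in 20 figure from the first point in the list above concrete, here is a minimal Python sketch (standard library only; the function names are hypothetical, chosen for illustration). The first function computes the chance of at least one Type I error across m independent tests when every null hypothesis is true, 1 - (1 - α)^m. The second checks the single-test rate by simulation, using a two-sample z-test with known σ = 1 as a simplifying assumption so that no statistics library is needed.

```python
import math
import random


def familywise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one Type I error across m independent tests
    when every null hypothesis is true: 1 - (1 - alpha)**m."""
    return 1.0 - (1.0 - alpha) ** m


def simulated_false_positive_rate(n_tests: int, n: int = 30, seed: int = 1) -> float:
    """Run n_tests two-sample z-tests on data where the null is true
    (both groups drawn from the same N(0, 1) population, sigma assumed known)
    and return the fraction declared 'significant' at alpha = 0.05."""
    rng = random.Random(seed)
    critical = 1.959964  # two-sided critical z value for alpha = 0.05
    se = math.sqrt(2.0 / n)  # standard error of the difference in means
    hits = 0
    for _ in range(n_tests):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(a) / n - sum(b) / n) / se
        if abs(z) > critical:
            hits += 1
    return hits / n_tests


if __name__ == "__main__":
    for m in (1, 5, 20, 100):
        rate = familywise_error_rate(0.05, m)
        print(f"{m:>3} tests: P(at least one false positive) = {rate:.3f}")
    print("simulated single-test rate:", simulated_false_positive_rate(10000))
```

With α = 0.05, twenty independent tests on pure noise give roughly a 64% chance of at least one “significant” result, which is why testing until something turns up amounts to a fishing expedition. The simulated single-test rate should land close to 0.05.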
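For the point in the list above about complex designs, the number of effects to be interpreted grows quickly: a full factorial design with k factors has 2^k - 1 main effects and interactions. A short Python sketch that enumerates them (the function name and the factor labels A, B, C are placeholders, not taken from the text):

```python
from itertools import combinations


def factorial_effects(factors: list[str]) -> list[str]:
    """List every main effect and interaction term in a full factorial
    design built from the given factors (2**k - 1 terms for k factors)."""
    effects = []
    for order in range(1, len(factors) + 1):  # 1 = main effects, 2 = two-way, ...
        for combo in combinations(factors, order):
            effects.append(" x ".join(combo))
    return effects


if __name__ == "__main__":
    for term in factorial_effects(["A", "B", "C"]):
        print(term)
```

Three factors already give seven terms (three main effects, three two-way interactions, and one three-way interaction), each of which needs a planned interpretation.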
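The proportional stratified sampling described in the list above can be sketched in Python as follows. This is an illustration under stated assumptions, not a prescribed procedure: the helper name `proportional_stratified_sample` is hypothetical, stratum sizes are allocated with the largest-remainder method so that they sum exactly to n, and units within each stratum are drawn by simple random sampling without replacement.

```python
import random
from collections import Counter, defaultdict


def proportional_stratified_sample(population, stratum_of, n, seed=None):
    """Hypothetical helper: draw a sample of size n in which each stratum is
    represented in proportion to its share of the population."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)

    total = len(population)
    quotas = {s: n * len(units) / total for s, units in strata.items()}
    alloc = {s: int(q) for s, q in quotas.items()}  # floor of each quota
    # Hand out any leftover places to the strata with the largest remainders.
    leftover = n - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1

    sample = []
    for s, units in strata.items():
        # Simple random sampling without replacement inside each stratum
        sample.extend(rng.sample(units, alloc[s]))
    return sample


if __name__ == "__main__":
    # Made-up population of 1,000 people: 520 labeled "F", 480 labeled "M"
    population = [("F", i) for i in range(520)] + [("M", i) for i in range(480)]
    sample = proportional_stratified_sample(population, lambda unit: unit[0], 50, seed=42)
    print(Counter(unit[0] for unit in sample))
```

For this made-up population, 52% of whom are female, a sample of 50 contains exactly 26 women and 24 men, matching the population proportions.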
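The point in the list above about t-tests and ANOVA can be checked numerically. With exactly two groups, the one-way ANOVA F statistic equals the square of the pooled-variance (Student's) two-sample t statistic, so the two tests give the same p-value. A standard-library Python sketch; the function names are hypothetical and the data values are arbitrary numbers invented for illustration:

```python
import math
from statistics import mean


def pooled_t(x, y):
    """Two-sample Student's t statistic with pooled variance (equal variances assumed)."""
    nx, ny = len(x), len(y)
    mx, my = mean(x), mean(y)
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    sp2 = ss / (nx + ny - 2)  # pooled variance estimate
    return (mx - my) / math.sqrt(sp2 * (1 / nx + 1 / ny))


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic: between-group mean square / within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


if __name__ == "__main__":
    # Arbitrary illustrative values, not real data
    group_1 = [4.1, 5.0, 5.6, 4.8, 5.3]
    group_2 = [5.9, 6.4, 5.2, 6.8, 6.1]
    t = pooled_t(group_1, group_2)
    f = one_way_anova_f(group_1, group_2)
    print(f"t = {t:.4f}, t^2 = {t * t:.4f}, F = {f:.4f}")
```

The printed t^2 and F agree, which is why, for two groups, the choice between the tests is about simplicity and clarity rather than about reaching a different answer.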
Research Design
Generally, the design of an investigation into a question of interest needs to follow the guidelines presented in Chapter 5 if meaningful inferences are eventually to be made. However, many investigations do not follow these types of guidelines at all, especially if you have a newspaper to sell that relies on sensational headlines to grab the attention of an inattentive reader. Other investigations are based on a single sample or event whose significance is then extrapolated to indicate some more fundamental shift.
One of the major problems in the climate change debate is that there is no experimental apparatus available that completely represents the complexity of the earth’s climatic systems. Indeed, to thoroughly test the various hypotheses that have been developed, you might need to have a number of different planets, identical in composition to earth, which can be assigned to control and treatment conditions to test different hypotheses. For example, does global warming arise from greenhouse gases? Or does it arise from increases in solar activity? Do both factors contribute to global warming? Do the factors have main effects on global warming, or do they interact to produce global warming, or both?
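The questions above map onto a factorial design. In a hypothetical 2×2 version with two factors A and B (say, a greenhouse-gas factor and a solar-activity factor), each either absent (0) or present (1), the main effects and the interaction are simple contrasts of the four cell means. A minimal Python sketch; the function name is hypothetical, the cell means are invented purely to show the arithmetic, and the interaction is expressed as the difference between the simple effects of A at the two levels of B (other texts scale this contrast differently):

```python
def two_by_two_effects(cell_means):
    """Return (main effect of A, main effect of B, A x B interaction) for a
    2x2 design.

    cell_means[(a, b)] is the mean response with factor A at level a and
    factor B at level b, where 0 = control (absent) and 1 = treatment (present).
    """
    m = cell_means
    # Main effect of A: change from A absent to A present, averaged over both levels of B
    main_a = ((m[1, 0] + m[1, 1]) - (m[0, 0] + m[0, 1])) / 2
    # Main effect of B: change from B absent to B present, averaged over both levels of A
    main_b = ((m[0, 1] + m[1, 1]) - (m[0, 0] + m[1, 0])) / 2
    # Interaction: does the effect of A depend on the level of B?
    interaction = (m[1, 1] - m[0, 1]) - (m[1, 0] - m[0, 0])
    return main_a, main_b, interaction


if __name__ == "__main__":
    # Invented cell means, for illustration only (not real measurements)
    additive = {(0, 0): 10.0, (1, 0): 12.0, (0, 1): 11.0, (1, 1): 13.0}
    interacting = {(0, 0): 10.0, (1, 0): 12.0, (0, 1): 11.0, (1, 1): 17.0}
    print(two_by_two_effects(additive))
    print(two_by_two_effects(interacting))
```

In the first set of means, A and B each add a fixed amount, so the interaction is 0. In the second, A has a larger effect when B is present, which shows up as a nonzero interaction even though both main effects are also nonzero: both main effects and an interaction at once.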
Clearly, it is not possible to obtain such samples for physical objects like planetary systems, so investigators may resort to using models that have been demonstrated to have predictive value. However, even though computer models of the climate and weather have improved dramatically over the years, thanks to advances in computer processing power and model refinement, they are still unable to predict the weather accurately beyond a few days. And even where models do have predictive validity, how well do they rate on explanatory value? A classic example is the development of supervised learning algorithms such as artificial neural networks. These generalized function approximators, trained with the backpropagation algorithm, were used to predict many different phenomena in psychology and neuroscience. Backpropagation attempts to minimize the error between the values of an array of output units and the expected values supplied for training, by adjusting the connection weights (including those feeding the hidden units) to compensate for that error. While such networks do have predictive validity, real neural networks rarely provide for the backpropagation of potentials in the way that the model implies. Thus, you would not want to base the development of new neurosurgical procedures on this type of artificial neural network, even though it has a level of predictive validity.
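To make the description of backpropagation concrete, here is a minimal pure-Python sketch of a network with one hidden layer trained on the XOR problem (an illustrative task chosen for this example, not one from the text). Training adjusts the connection weights and biases, not the hidden unit activations themselves. The function names, network size, learning rate, squared-error loss, and full-batch gradient descent are all assumptions made for the sketch:

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_xor(epochs=5000, lr=0.5, hidden=3, seed=0):
    """Train a 2-input, one-hidden-layer, 1-output sigmoid network on XOR by
    full-batch gradient descent with backpropagation. Returns the
    squared-error loss recorded at each epoch (before that epoch's update)."""
    rng = random.Random(seed)
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    # w1[j][i]: weight from input i to hidden unit j; b1[j]: hidden bias
    w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    # w2[j]: weight from hidden unit j to the output unit; b2: output bias
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = rng.uniform(-1, 1)

    losses = []
    for _ in range(epochs):
        gw1 = [[0.0, 0.0] for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gw2 = [0.0] * hidden
        gb2 = 0.0
        loss = 0.0
        for x, target in data:
            # Forward pass
            h = [sigmoid(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j]) for j in range(hidden)]
            y = sigmoid(sum(w2[j] * h[j] for j in range(hidden)) + b2)
            err = y - target  # error between output and the supplied target
            loss += 0.5 * err ** 2
            # Backward pass: error signal at the output unit...
            delta_out = err * y * (1 - y)
            gb2 += delta_out
            for j in range(hidden):
                gw2[j] += delta_out * h[j]
                # ...propagated back through w2 to each hidden unit
                delta_h = delta_out * w2[j] * h[j] * (1 - h[j])
                gb1[j] += delta_h
                gw1[j][0] += delta_h * x[0]
                gw1[j][1] += delta_h * x[1]
        losses.append(loss)
        # Gradient-descent update: only the weights and biases change
        b2 -= lr * gb2
        for j in range(hidden):
            w2[j] -= lr * gw2[j]
            b1[j] -= lr * gb1[j]
            w1[j][0] -= lr * gw1[j][0]
            w1[j][1] -= lr * gw1[j][1]
    return losses


if __name__ == "__main__":
    losses = train_xor()
    print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}")
```

Because `train_xor()` returns the loss at every epoch, you can confirm that the error between the output unit and the training targets falls as the weights are updated. The backward pass, in which error signals travel from the output unit back through the same weights, is the step that the paragraph above notes has no clear counterpart in real neural networks.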