
There are several identifiable limitations of the present study that must be borne in mind when interpreting the results. One of the most critical of these relates to the nature of the stimuli on which ratings were based. Participants were required to evaluate a hypothetical lecturer whose performance was

represented by statements contained in a written vignette. Using this method

The use of "paper people" as rating stimuli poses questions regarding the

external validity of the research. Can the results from the present study be generalised to "real life" rating situations, such as performance appraisals or

reference reports?

There is evidence that studies using "paper person" designs result in different experimental outcomes (larger effect sizes) compared to those that have used

direct observation designs (Murphy, Herr, Lockhart, & Maguire, 1986). Woehr

and Lance (1991) have tested competing explanations for these observed

differences in effect sizes. They concluded that differences are attributable to

a lower signal-to-noise ratio in direct observation studies. That is, effect sizes are smaller in direct observation studies because they include more performance-irrelevant information (background noise) than do paper people studies. Interestingly, Woehr and Lance found that scripts in which

performance statements were embedded in written descriptions that included irrelevant information resulted in recognition and accuracy rating outcomes

similar to those obtained using videotape stimuli. They suggest that carefully constructed performance scripts can simulate some of the additional cues present in "real life" rating situations. However, they also point out that none of the laboratory methodologies, including those using direct observation techniques such as videotape, are likely to capture fully all aspects of the

rating situation inherent in evaluations conducted in the "real world."

Nevertheless, it has to be acknowledged that the present study represents an idealised rating situation where performance-irrelevant information has been minimised. Therefore, further research is required to establish if the results

can generalise to more complex and "noisier" environments found in applied rating situations.

A related issue concerns the nature of the rating task and the setting in which the study was conducted. Participants in the present investigation were students who were required to evaluate the performance of a lecturer. It has been suggested by some researchers that results from laboratory studies conducted in educational settings using upward appraisal may not generalise to different settings with non-student raters (Dipboye, 1985; Gordon, Slade, & Schmitt, 1986; Ilgen & Favero, 1985; Slade & Gordon, 1988). However,

others have argued that the processes elucidated from laboratory research

have external validity and that laboratory and field methodologies are

complementary (Dobbins, Lane, & Steiner, 1988a, 1988b; Mook, 1983; Woehr & Lance, 1991). Concerns that have been expressed regarding the generality

of research findings are certainly reasonable. However, in the present case,

characteristics of the sample may mitigate some of these concerns. More

specifically, most of the distance education students who comprised the

sample were experienced raters and were very familiar with reference reports and performance appraisals. Moreover, the majority of the sample were

working full time, and, in addition, many of the participants were employed as managers or supervisors. The background and experience of the present sample sets them apart from the typical student participant used in many

other investigations. In fact, their profile is likely to closely match that of raters in applied settings to whom the results are supposed to generalise. Nevertheless, limitations imposed by the artificial rating situation and nature

of the rating task remain, and place constraints on external validity in the

present study.

In addition to the problem of external validity, there are several other

methodological limitations that challenge the robustness of the results. For

example, the manipulation of rating purpose was simplistic and poorly done.

Part of the rationale for the manipulation was the avoidance of demand

characteristics. However, in hindsight, a more thorough explanation of rating

purpose would have been more likely to have communicated and established

the desired motivational context. Another limitation was the fact that no

manipulation checks were included. This omission means that it is difficult to

determine if the failure to observe effects was due to the weak manipulation of the variable, inattention on the part of participants, or simply the irrelevance of the variable. One might also ask questions about the reliability

of the measure of rater affect. Unfortunately, because it was a single-item

measure, no reliability coefficients could be calculated.

The low return rate in the present study is also of concern. Requests for

participation were sent out to more than 900 individuals. Slightly fewer than

300 participated in the study, a return rate of only 31%. In hindsight, it would have been worthwhile to include a follow-up letter, which may have

helped to bolster participant numbers. However, while the return rate was

low, it must be pointed out that it is consistent with those reported in other

studies which have used postal surveys (e.g., Cleveland et al., 1989; Judge,

Furthermore, many investigations have reported far lower return rates (e.g.,

Arthur & Bennett, 1995; Lin, 1996; Shaw, Kirkbride, Fisher, & Tang, 1995).

Nevertheless, because of the low return rate the representativeness of the sample cannot be guaranteed, and questions remain concerning the external

validity of the results. Finally, the investigation would have been improved if

participants had been required to evaluate more than one ratee. The

inclusion of multiple ratees would have resulted in a design that allowed for

the calculation of the entire range of accuracy measures.