Problems and Limitations of the Study - The Development of a Scoring System for an Alternative

A problem identified in this study was the restricted score range on immediate recall (12 – 20) and delayed recall (7 – 20) for Design C on the Alternative Form. In addition, the mean score on immediate recall (17.9, SD = 1.74) was relatively high and comparable to the mean score on delayed recall (17.4, SD = 2.46), and performance on Design C was much higher than on the other three designs of the Alternative Form. Design C was developed to be less spatially demanding than the other designs, so verbal encoding may have enhanced performance on this design.

Furthermore, the higher performance on Design C may well reflect the truncated IQ range of the sample. Given the high functioning of the participants in this study and the greater amenability of the design to verbal encoding, the result may reflect the ability of the sample as a group to readily make semantic associations between elements of the design. In a larger sample with a wider IQ range, performances on the design may be more normally distributed.

No information was available in the manual on the third design of the Visual Reproduction subtest (Wechsler, 1987). Some data regarding performances on the third design were obtained from the local normative study conducted by Shores and Carstairs (2000). Detailed examination of the raw data indicated that 80% of the 399 participants in that study achieved a score of 6 – 9 on the third design. This indicates that scores on the third design of both the Alternative Form and the Visual Reproduction subtest of the Wechsler Memory Scale – Revised fell towards the upper end of the potential range. Although the restricted range of scores potentially reduces discriminatory power and increases ceiling effects, the results provide support for the equivalence of the Alternative Designs to the designs of the Visual Reproduction subtest of the Wechsler Memory Scale – Revised.

Moreover, the internal reliability coefficients for Design C of the Alternative Form (.61 on immediate recall and .66 on delayed recall) were inadequate. This suggests that, because participants generally scored on most items, the failure to score on a few items can have a substantial impact on the internal reliability of the design. Furthermore, the high scores on Design C of the Alternative Form may reflect scoring criteria that were too lenient for this particular design.
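The effect of near-ceiling scores on internal reliability can be illustrated with a minimal sketch. It assumes the coefficient is Cronbach's alpha (the text does not name the statistic) and uses small, entirely hypothetical item-level data; the function name `cronbach_alpha` and both data sets are illustrative only and are not taken from the study.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of participants, each a list of item scores.

    Assumption: the study's internal reliability coefficient is of this type.
    """
    k = len(scores[0])
    # Variance of each item (criterion) across participants.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of participants' total scores.
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: nearly everyone passes every criterion, and the few
# misses are scattered across different criteria.
near_ceiling = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 1],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
]

# Hypothetical data: criteria differ in difficulty and scores are spread out.
spread = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

print("near ceiling:", round(cronbach_alpha(near_ceiling), 2))
print("spread:", round(cronbach_alpha(spread), 2))
```

In this made-up data the near-ceiling pattern yields a far lower coefficient than the spread pattern, even though the affected participants each miss only a single criterion.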

In order to remedy the restricted range of scores, high mean performance and inadequate internal consistency estimate for Design C of the Alternative Form, the scoring criteria may require revision and further refinement. That is, criteria that correlate very highly with each other can be eliminated, as these items likely measure “the same thing.” Eliminating five items and replacing them with harder criteria may also be effective in providing a greater distribution and range of scores on the design.
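One way such redundant criteria might be identified is sketched below. It assumes item-level scores are available for each participant and flags pairs of criteria whose Pearson correlation meets a cutoff. The cutoff of .90, the function names and the data are all hypothetical; the text specifies no threshold for a "very high" correlation.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation, or None when either variable has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # e.g. a criterion that every participant passed
    return sxy / sqrt(sxx * syy)


def redundant_pairs(scores, threshold=0.9):
    """Return (item_i, item_j, r) for item pairs with r >= threshold.

    The threshold is an illustrative assumption, not a value from the study.
    """
    k = len(scores[0])
    columns = [[row[i] for row in scores] for i in range(k)]
    flagged = []
    for i, j in combinations(range(k), 2):
        r = pearson(columns[i], columns[j])
        if r is not None and r >= threshold:
            flagged.append((i, j, r))
    return flagged


# Hypothetical scores: criteria 0 and 1 are always scored identically.
demo_scores = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]

for i, j, r in redundant_pairs(demo_scores):
    print(f"criteria {i} and {j} correlate at r = {r:.2f}")
```

Criteria with no variance are skipped rather than flagged, since a correlation cannot be computed for an item that every participant passes, which is itself a sign that the criterion may be too lenient.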

The norms of the Visual Reproduction subtest of the Wechsler Memory Scale – Revised are American based, so caution is warranted when interpreting the test performance of the geographically and culturally unique Australian population. Furthermore, the application of the Visual Reproduction subtest normative data to the sample in this study needs to be carefully interpreted. That is, the interpolated normative data of the Wechsler Memory Scale – Revised for three of the age bands, including the 35 – 44 year olds in particular, have been criticised for overestimating performances at the upper end of the distribution on the Visual Memory and Delayed Recall Indices, which include the Visual Reproduction subtest (D’Elia et al., 1989; Mittenberg & Burton, 1992).

Moreover, the interpretation of performances on the original Visual Reproduction subtest may also be misleading, as the samples used to generate the normative data of the Wechsler Memory Scale – Revised were small, with around 50 to 55 participants in each of the six age bands. Although the statistical power afforded by a sample of 44 participants was considered sufficient to draw meaningful conclusions from the data, the sample size in this study was relatively small (Cohen, 1988; Tabachnick & Fidell, 1996). The sample characteristics limit the generalisation of performances to the general Australian population, as this was not a stratified and randomised study. Participants were recruited via convenience sampling, which can be misleading and can introduce bias with respect to education, ethnicity and socio-economic status (Holdnack et al., 2004).

The psychometric properties of the Alternative Form and scoring system need to be determined using a wider IQ range, particularly in the lower range of intellectual functioning. Furthermore, a larger sample, randomly recruited and stratified from the Australian population across a wider age range, would also be required. In particular, further studies developing a wider normative base with substantial numbers, particularly for the older age groups (55 years and over), would be advantageous. It is important to establish representative and extensive data in the older age groups, as it is often in these later years of life that memory problems are first identified and neuro-degenerative diseases are diagnosed.

A disadvantage of this study was that no inter-rater reliability data were available. It would be valuable to obtain an indication of the inter-rater reliability between experienced clinicians and novice scorers to assist with the refinement of the Alternative Scoring System. The scoring system for the Alternative Form was developed, and the protocols were scored, by a single researcher with considerable exposure to individual items and their interpretation. As such, the immediate and delayed recall performances on the Alternative Form may have been different if the protocols had been scored by a researcher who was naïve to the development of the designs and scoring system. However, the criteria are explicit and operationally defined, and example full credit and no credit drawings were added to provide further clarity in the scoring of the Alternative Designs.
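As an illustration of how agreement between an experienced and a novice scorer could be quantified in a future study, the sketch below computes Cohen's kappa on criterion-level credit/no-credit decisions. The choice of kappa is an assumption (an intraclass correlation on total scores would be another option), and the ratings shown are hypothetical rather than data from this study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical decisions on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical credit (1) / no credit (0) decisions on ten criteria.
experienced = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
novice = [1, 1, 0, 1, 1, 1, 0, 0, 1, 1]

print("kappa:", round(cohen_kappa(experienced, novice), 2))
```

Kappa corrects raw agreement for chance, which matters when most criteria are credited: two scorers who award credit to nearly every item will agree often even if their judgements are unrelated.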

Another problem identified in this study was the large number of participants (68%) who completely forgot at least one of the designs on the Alternative Form. This rate was considered unusually high for a non-clinical population, and even for some clinical populations. The unexpectedly high rate of “forgetting” may well have reflected the researcher's lack of experience in the clinical administration of neuropsychological tests at the time of data collection. Indeed, performance on the cued procedure demonstrated that the designs had not necessarily been forgotten.

Similarly, in the research by Clark (2000) a high percentage of non-clinical participants (47%) failed to recall at least one design. Moreover, given that a substantial number of participants benefited from the provision of a cue, the high rate of forgetting likely reflected the researcher's lack of clinical experience in test administration rather than a normal phenomenon. Furthermore, prompts, encouragement and reflection time may well be sufficient to facilitate recall of the designs. Standardised guidelines clearly need to be developed to address this issue, not only for future studies using the Alternative Form but for all test manuals.

In document The Development of a Scoring System for an Alternative Form of the Visual Reproduction subtest of the Wechsler Memory Scale - Revised (Page 192-196)