Professional Practice Board
Assessment of Effort in Clinical Testing of
Cognitive Functioning for Adults
ISBN 978-1-85433-497-8
Printed and published by the British Psychological Society. © The British Psychological Society 2009.
The British Psychological Society
St Andrews House, 48 Princess Road East, Leicester LE1 7DR, UK Telephone 0116 254 9568 Facsimile 0116 247 0787
Website www.bps.org.uk
Incorporated by Royal Charter Registered Charity No 229642
Working Party Members
Professor Tom McMillan (Chair), Psychological Medicine, University of Glasgow.
Dr Stuart Anderson, Sussex Rehabilitation Centre.
Professor Gus Baker, Head of Division of Neurosciences, University of Liverpool.
Professor Michael Berger, Representative of the Steering Committee on Test Standards.
Dr Graham Powell, Psychology Services, London.
Executive Summary
Purpose
To provide guidance and highlight key issues in the assessment of effort during cognitive testing of adults in clinical settings; to identify issues relevant to the interpretation of findings; and to provide further reading and highlight areas for future research.
Readership
This document is for professional psychologists who are trained in clinical assessment of cognitive function in adults.
Key Points
● Cognitive test results are not valid if the testee does not try hard on the tests.
● Effort tests should be given routinely as part of clinical assessment of cognitive function.
● There are some exceptions where routine assessment of effort is not appropriate.
● Failure on effort tests requires careful interpretation. Although a number of causes are possible, deceit should always be considered.
● Clinicians should be aware of the sensitivity and specificity of the effort tests that they use, and of the base rates of sub-optimal performance in the population from which their testee comes, and should take these factors into account when interpreting findings.
● Interpretation of failure on effort tests needs to be reported as clearly as possible.
● There is little UK research literature on effort testing; there is a need for this to be developed.
Purpose
The aims of the present document are to highlight key issues in the assessment of effort, provide guidance and suggest further reading. Specifically, the document focuses on the assessment of effort as part of a clinical assessment of cognitive function in a wide range of settings. It further aims to identify issues relevant to the interpretation of findings and the potential courses of action when an individual fails cognitive tests of effort. The primary concern of this document is with the validity of cognitive symptoms and not with a broader-based assessment of the validity of symptom complaint. The document will also highlight important areas for future research.
The scope of this document
This document is written for professional psychologists who have been trained in clinical assessment of cognitive function. The majority of research on testing effort is on adults, and it is to this group that this guidance refers. Modification of this document or a further document would be needed if this guidance is to be extended to other client groups, such as children and young people and those with learning disabilities.
Introduction
Cognitive testing is used in many clinical contexts because it is judged that the resulting scores, together with information from other sources collected as part of the assessment process, will advance an understanding of the testee and their needs, and aid decisions about further action. Tests are given as part of purpose-driven psychological assessment processes. An assessment, in the form of a written or verbal statement, constitutes the psychologist’s informed conclusions in relation to the purpose, based on their interpretation of the test scores and the other information. The value of an assessment is therefore crucially dependent on the quality of the contributing test data. Users of psychological tests will be aware of the many factors that influence data quality in testing. These include the psychometric properties of the test, competence of the tester and the multiple influences affecting the test performance of the testee. Good practice in testing and assessment requires the issue of data quality to be addressed, including, in particular, the testee’s motivation to adhere to the test requirements. Motivation that is at variance with test requirements can distort test findings, limit the relevance of the assessment and even invalidate it. The need for reliable and valid indices sensitive to distortions of motivation explains the development of ‘tests of effort’.
It is important to note that ‘effort’, a proxy for motivation, is a potentially misleading term, implying something that is uni-dimensional, uni-directional, and is encompassed by a test score that itself ranges along a uni-dimensional scale. In fact ‘effort’ is more likely to be a vector, having both magnitude and direction. For instance, a testee can put much effort into doing the opposite of what is required and little or no effort into what is. Further, like other concepts in psychology, ‘effort’ is a higher order concept, assessed and inferred from scores on special tests or test components, and other observations of behaviour. Although perhaps narrower in scope, it is no different in principle to concepts such as ‘intelligence’ and ‘personality’. Each can be conceptualised and tested in a variety of ways and each is the subject of multiple theories.
In applied settings, the test user is interested not only in the score but, in particular, in its meaning or interpretation. Indications of poor, atypical or unusual motivation can arise for one or more of many possible reasons, including disturbances in mental and physical health, fatigue, neuropsychological dysfunctions, negative beliefs about self-competence, poor communication or understanding of the demands of testing, prior tutoring, or deliberate non-adherence to the requirements of testing. Alternatively, the testee may perform to conform with a feigned condition, or to exaggerate a condition they have. Detecting and understanding such distortions of performance represent serious challenges for all test users who are required to reach sustainable conclusions about the pattern of performance. The assessment of effort is complicated by its multi-dimensional or multi-factorial nature, which may reflect the co-action and interaction of several influences on test performance, simultaneously and over time. The nature of the most salient of these will need to be identified and taken into account in order to judge their impact on the test data and the conclusions drawn from the assessment process. Part of the expertise of the test user comprises knowledge of such factors and how to identify and take them into account in an assessment. Tests of effort, either embedded in other tests or as stand-alone procedures, properly understood and used, can assist this process. Technically, such tests aim to detect signs (patterns of test performance) and symptoms (verbal reports – what the client says) indicative of deliberate distortion, and in this context are described as tests of symptom validity. Like other diagnostic tests, part of their psychometric evaluation involves demonstrating the fidelity with which they identify ‘true’ instances while minimising misclassifications. These and related issues are discussed below (see also Appendix 1 for further explanation of terms).
This document is intended to inform practitioners of the major advances and expanding literature concerned with testing effort and the assessment of symptom validity, evidenced for instance by the recent publication of the US National Academy of Neuropsychology position paper (Bush et al., 2005), and to draw on this literature and the experience of practitioners to provide a framework for good practice in relation to clinical testing.
Why is testing of effort an issue?
Psychometric testing is used widely to assess cognitive ability in clinical settings, and a crucial skill in forming a judgement is interpretation of results. Scores on cognitive tests are usually considered in the context of published normative data, which in turn assume that the testee has conformed to the instructions of the tester to ‘try hard’ during testing. Clearly, if effort is sub-optimal, then test data are likely to be invalid, erroneous conclusions may be reached and, as a consequence, inappropriate recommendations given. There has been considerable interest in recent years in identifying such sub-optimal performance. This is evidenced by the burgeoning literature on the assessment of symptom validity, which in turn mirrors the development of tests of ‘effort’ that aim to be highly sensitive and specific, and that can be administered easily in a routine clinical situation (Bush et al., 2005). In contrast, available evidence (although limited) suggests that clinical judgement is a poor means of detecting faking of cognitive impairment (Heaton et al., 1978; Faust et al., 1998).
Is sub-optimal effort common?
It is now well established that effort is an important determinant of performance on psychological tests and there is a substantial evidence base that supports the use of effort testing in neuropsychological assessment (e.g. Constantinou et al., 2005; Drane et al., 2006; Green et al., 2001; Green, 2007a; Lynch, 2004; Stevens et al., 2008). In an investigation of 904 individuals involved in litigation or claiming financial benefits for disability, Green et al. (2001) report that 53 per cent of variance on a composite neuropsychological test battery was explained by effort. This contrasted with 11 per cent of the variance shared by education, four per cent by age and five per cent by brain injury severity. Similar findings were reported in a UK study by Moss et al. (2003). These striking figures should be considered in the context of the sample characteristics of these studies, which tend to involve litigants.
Much of the research literature on the frequency of occurrence or base rates of sub-optimal effort has concentrated on purposeful exaggeration of cognitive impairment. Given the covert nature of deceit, base rates are difficult to determine with confidence. Larrabee (2003) reviewed 11 studies (N=1363) of mild head injury litigants, in which 15 to 64 per cent (average 40 per cent) were reported to be malingering. Overall, the figure may be higher given the likely prevalence of disability caused by mild head injury (five per cent) and the prevalence of complaints in litigation cases involving mild head injury (over 90 per cent) (Mittenberg et al., 2002). Malingering may also be common in disability claimants, with findings of 54 to 72 per cent probable or definite, reported in US studies (Miller et al., 2004; Chafetz, 2008). In the criminal forensic field, 54 per cent of cases have been reported as probable or definite malingerers (Ardolf et al., 2007). It is important that base rate data are considered when interpreting findings on effort tests (see Appendix 2 and Chapter 16 in Strauss, Sherman & Spreen, 2006). Such rates cannot be directly applied to the UK context, as all of the above studies are based on North American samples, although in general, base rates of malingering are likely to be higher in cases where financial reward is associated with the assessment than where assessment is purely for clinical purposes.
Can sub-optimal effort be detected?
The evidence that informs the importance of effort testing has led to the creation of purpose-built tests designed to detect non-credible cognitive symptoms as well as the development of measures within existing cognitive tests (i.e. embedded tests). While effort testing will not guarantee detection of poor effort, it will help the assessor to make a judgement about the reliability of the collected data.
The literature on effort testing tends to evaluate the discriminative ability of tests in terms of sensitivity and specificity (see Strauss, Sherman & Spreen, 2006 and Appendices 1 and 2 for elaboration and further explanation of these and related terms). Generally speaking, the trend has been to set desired specificity at high levels (greater than 90 per cent) in order to minimise the risk of false positive errors (i.e. to identify effort as sub-optimal when good effort has been applied). Sensitivity and specificity data continue to be published, and the reader is referred to recent reviews (see Boone, 2007; Larrabee, 2007). In Appendix 3 a representative (rather than an exhaustive) list of purpose-designed tests of effort is given. The list is based on measures frequently used in North America and data are based on US samples (see also Sharland & Gfeller, 2007). However, many of these tests also seem to be used commonly in the UK (McCarter et al., 2009).
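To illustrate why base rates matter alongside sensitivity and specificity, the following minimal Python sketch applies Bayes' theorem to estimate the probability that a failed effort test reflects genuinely sub-optimal effort (the positive predictive value). The function name, the sensitivity of 80 per cent and the 10 per cent clinical base rate are illustrative assumptions and do not describe any particular test; the 90 per cent specificity and the 40 per cent base rate simply echo figures quoted above.

```python
def positive_predictive_value(sensitivity, specificity, base_rate):
    """Probability that a failed effort test reflects sub-optimal effort.

    All three arguments are proportions between 0 and 1.
    """
    true_positives = sensitivity * base_rate
    false_positives = (1.0 - specificity) * (1.0 - base_rate)
    return true_positives / (true_positives + false_positives)


# Illustrative values only (assumptions, not data for a real test).
SENSITIVITY = 0.80  # hypothetical
SPECIFICITY = 0.90  # the level commonly sought in the literature

for label, base_rate in [("litigation sample", 0.40), ("clinical sample", 0.10)]:
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, base_rate)
    print(f"{label}: base rate {base_rate:.0%}, PPV {ppv:.0%}")
```

Under these assumptions the same failed test is considerably more informative when the base rate is 40 per cent (a positive predictive value of about 84 per cent) than when it is 10 per cent (about 47 per cent), which is why the population from which the testee comes needs to be taken into account.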
The use of embedded measures in conventional psychometric tests
There are a number of reasons why clinicians may need to consider alternatives and/or additions to purpose-designed tests of effort. At a practical level, the inclusion of effort tests increases assessment time. Furthermore, commonly used effort tests may be vulnerable to client coaching. Finally, there is an emerging consensus that clinicians should adopt a multi-method (and multi-test) approach to the testing of effort (Slick et al., 1999; Strauss, Sherman & Spreen, 2006; Boone, 2007; Larrabee, 2007).
Various conventional psychometric tests have been investigated for their sensitivity to suboptimal effort and the reader should consult Strauss, Sherman and Spreen (2006), Boone (2007) and Larrabee (2007) for listings and reviews of these various studies. Some of the more commonly used approaches are listed in Table 1. None of the indices listed in Table 1 are recommended for use as sole measures for detecting poor effort.
Table 1: Effort-sensitive measures embedded in conventional psychometric tests.

Measure: WAIS-III Reliable Digit Span (RDS)
Representative studies: Iverson & Tulsky (2003); Axelrod et al. (2006)
Comment: Computed by adding the longest string of digits forwards and backwards in which both trials at that level are passed, the RDS has been well validated for use as a measure of effort. A cut-off score of <7 provides generally good specificity but variable sensitivity (Larrabee, 2007).

Measure: WAIS-III Vocabulary minus Digit Span
Representative studies: Iverson & Tulsky (2003); Miller et al. (2004); Larrabee (2003)
Comment: Iverson & Tulsky (2003) suggest a low base rate (five per cent or less in general and clinical populations) for Vocabulary minus Digit Span difference scores of 5 or greater. Miller et al. (2004) found 90 per cent specificity for a difference score of 4 or more in non-litigating/compensation-seeking samples.

Measure: WMS-III Rarely Missed Index (RMI)
Representative studies: Killgore & DellaPietra (2000)
Comment: Killgore & DellaPietra (2000) identified six items in the Logical Memory recognition trial that were correctly endorsed at above chance levels by normal participants naïve to story content. A weighted scoring system forms the RMI, which demonstrated excellent sensitivity (97 per cent) and specificity (100 per cent) in the validation study comparing analogue malingerers with neurological controls.

Measure: WMS-III Auditory Recognition Delay Index
Representative studies: Langeluddecke & Lucas (2003)
Comment: Using a cut-off score of <43 on the ARD and <18 on word-list recognition, Langeluddecke and Lucas (2003) found sensitivity and specificity values of 80 per cent and 91 per cent for probable malingering (as identified by failed WRMT performance).

Measure: WMS-III Mental Control Test
Representative studies: Kelly et al. (2005)
Comment: This optional WMS-III task was investigated for its discriminative ability (TBI controls, simulators and non-simulators) in a study by Kelly et al. (2005). Using a cut-off score of <13.5, results revealed sensitivity and specificity values of 80 per cent and 82 per cent respectively.

Measure: RBANS Effort Index
Representative studies: Silverberg et al. (2007)
Comment: Recognising the value of word list recognition and digit span as measures of suboptimal effort, Silverberg and colleagues derived an RBANS Effort Index based on these subtest scores. A validation study using clinical TBI and malingering samples as well as three simulator groups found good to excellent discrimination with an overall classification accuracy of 86.9 per cent.
Many adult psychometric tests of personality, psychopathology and adjustment (e.g. MMPI-II, MCMI-III, BRIEF-A) incorporate built-in validity scales that may be sensitive to symptom augmentation, symptom denial and/or inconsistency. While it may be difficult to form an opinion about suboptimal effort using these scales, they are likely to be useful in providing evidence of non-credible or inconsistent performance.
Table 1 (continued)

Measure: Rey Auditory Verbal Learning Test (RAVLT)
Representative studies: Barrash, Suhr & Manzel (1998, 2004)
Comment: Delayed recognition score has been the most widely used index for detecting poor effort, although there is considerable overlap in the score distribution of probable malingerers and clinical controls (Larrabee, 2007). See Boone et al. (2005) and Barrash et al. (2004) for a combination approach using several AVLT indices.

Measure: Warrington Recognition Memory Test (WRMT)
Representative studies: Iverson & Franzen (1998); Millis (2002)
Comment: In a simulator study paradigm, Iverson & Franzen (1998) found a score of <32 on the Words subtest to be successful in detecting 90 per cent of malingerers with 100 per cent specificity. A score of <30 on the Faces subtest showed 80 per cent sensitivity to malingering with 100 per cent specificity. Millis provides similar data on a TBI group.

Measure: Wisconsin Card Sorting Test (WCST)
Representative studies: Greve et al. (2002); Larrabee (2003)
Comment: Mixed findings are evident. The data appear to favour the use of multivariate formulae rather than a single index. An increased rate of false positive errors occurs in the face of brain injury severity and so caution should be exercised when using the WCST to detect poor effort in cases other than mild TBI.

Measure: Multiple measure contrast approach
Representative studies: Larrabee (2003); Nelson et al. (2003); Meyers & Volbrecht (2003)
Comment: Some studies have attempted to operationalise the multi-test/multi-method of detecting poor effort recommended by Slick et al. (1999). Larrabee (2003) investigated patterns of failure on embedded measures of symptom validity and found that sensitivity could be improved by aggregating across multiple measures without appreciably altering specificity. Meyers & Volbrecht (2003) looked at the rate of failure on nine measures of response validity in various clinical groups, including litigating and non-litigating patients. They found that failure of more than two indicators was rare in non-institutionalised patients. However, they cautioned that their multiple indicator method was less appropriate for patients with stroke, dementia, mental retardation or those receiving 24-hour institutional care.
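The Reliable Digit Span calculation described in Table 1 (the longest forward string plus the longest backward string at which both trials were passed) can be sketched as follows. This is a minimal illustration only: the function names and the dictionary layout for recording trials are hypothetical, the example responses are invented, and the sketch is not a substitute for the scoring procedures in the test manual. The cut-off of <7 is the one quoted in Table 1.

```python
def longest_span_both_trials_passed(trials):
    """Longest digit-string length at which both trials were passed.

    `trials` maps string length -> (trial_1_passed, trial_2_passed).
    Returns 0 if there is no length at which both trials were passed.
    """
    passed = [length for length, (first, second) in trials.items()
              if first and second]
    return max(passed, default=0)


def reliable_digit_span(forward_trials, backward_trials):
    """Sum of the longest 'both trials passed' spans, forwards and backwards."""
    return (longest_span_both_trials_passed(forward_trials)
            + longest_span_both_trials_passed(backward_trials))


RDS_CUTOFF = 7  # scores below 7 are flagged, per the cut-off quoted in Table 1

# Invented example record (hypothetical data layout).
forward = {3: (True, True), 4: (True, True), 5: (True, False), 6: (False, False)}
backward = {2: (True, True), 3: (True, False), 4: (False, False)}

rds = reliable_digit_span(forward, backward)  # 4 forwards + 2 backwards
flagged = rds < RDS_CUTOFF
print(f"RDS = {rds}, below cut-off: {flagged}")
```

A flagged score of this kind is only a prompt for further consideration; none of the indices in Table 1 is recommended as a sole measure for detecting poor effort.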
What is optimal versus practical assessment?
In their recommendation for an ‘adequate assessment of response validity in neuropsychological assessment’ (p.425), Bush et al. (2005) recognise the contextual use of effort tests. Specifically, while they strongly recommend the use of effort tests in legal contexts, they recognise that their use in clinical settings may not always be indicated and/or appropriate (e.g. some individuals who require 24-hour supervised care; see below). They recommend that neuropsychologists in clinical settings outwith legal contexts assess symptom validity in the manner that is most appropriate for that context. Certainly, there is a strongly emerging view in North America that effort testing should be routinely carried out, other than in exceptional cases (Bush et al., 2005). A multi-method and multi-test approach has also been recommended for the proper evaluation of effort in neuropsychological evaluations (Slick et al., 1999; Bush et al., 2005).
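One way to see why a multi-test approach can raise sensitivity without appreciably lowering specificity is a simple probability sketch. The Python example below assumes, purely for illustration, five indicators that fail independently of one another, each with a sensitivity of 50 per cent and a specificity of 90 per cent, and a rule that flags a profile when at least two indicators are failed. These numbers, the decision rule and the independence assumption are hypothetical; real indicators are likely to be correlated, and this is not the procedure used in any of the studies cited.

```python
from math import comb


def prob_at_least_k_failures(n, k, p_fail):
    """P(at least k of n independent indicators are failed)."""
    return sum(comb(n, i) * p_fail**i * (1 - p_fail)**(n - i)
               for i in range(k, n + 1))


# Hypothetical values chosen for illustration only.
N_INDICATORS = 5
SINGLE_SENSITIVITY = 0.50  # P(indicator failed | sub-optimal effort)
SINGLE_SPECIFICITY = 0.90  # P(indicator passed | good effort)
MIN_FAILURES = 2           # flag the profile if at least two indicators fail

combined_sensitivity = prob_at_least_k_failures(
    N_INDICATORS, MIN_FAILURES, SINGLE_SENSITIVITY)
combined_specificity = 1 - prob_at_least_k_failures(
    N_INDICATORS, MIN_FAILURES, 1 - SINGLE_SPECIFICITY)

print(f"combined sensitivity: {combined_sensitivity:.3f}")
print(f"combined specificity: {combined_specificity:.3f}")
```

Under these assumptions the combined rule has a sensitivity of about 81 per cent and a specificity of about 92 per cent, compared with 50 and 90 per cent for any single indicator. Correlation between indicators would reduce this gain, so the figures should be read as an upper-bound illustration rather than an estimate.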
It is recognised that there is a diversity of practice settings in the UK and effort testing may be problematic in certain contexts and with certain client groups. For example, the reliability of effort testing can be adversely affected by sensory or motor impairment (as might be seen following stroke), in individuals with dementia or psychiatric illness or during early recovery from neurological injury or illness (e.g. during post-traumatic amnesia after head injury). Careful consideration of the usefulness of effort tests is needed in these situations.
If formal effort testing has been conducted and ‘failed’, particular care should be taken when interpreting findings because there can be several reasons for failure. Some propose alternative diagnostic categories for classification of individuals where there is insufficient evidence of intention to deceive or of symptom production to warrant a diagnosis of malingering (Delis & Wetter, 2007). There are published criteria that can aid clinicians in decision making about effort and malingering (Slick et al., 1999; Bianchini et al., 2005; Delis & Wetter, 2007).
What to do if poor effort is detected
(a) Discontinue session immediately? There is a need to obtain sufficient information in order to produce a more general formulation (i.e. beyond simply reporting cognitive test results). It is risky to discontinue as data may be inadequate. Advice is to continue the session.
(b) Discontinue testing? Generally there is a need to explore patterns of test performance and to consider the possibility (even if unlikely) that the testee underperformed only on the test of effort. Advice is to continue testing.
(c) Modify planned assessment? This is often appropriate; either to include further tests (e.g. a second test of effort) or to reduce the number of tests if the person seems to be generally underperforming. For the latter option, it may be important not to give some measures, in order to preserve them for a time when the person may be performing with more appropriate effort, but to give at least some tests as a baseline and to aid formulation.
(d) Continue as planned? This may not be justified; however it is recommended that the tester actively reconsiders their plan. They have to be able to form a view about why the person has failed the effort test.
Interpretation of failure on effort tests
Failure on a test of effort may be because of purposeful intent to deceive, but this is only one possibility and other causes must always be considered. Examples are abnormal arousal (e.g. during severe anxiety or drug induced hyperarousal or drug induced stupor), severe psychiatric disorder (e.g. acute psychosis), sensory impairment, poor compliance with testing, iatrogenic symptoms (see Appendix 4) or somatoform disorders. Of these, the last two can be the most difficult to distinguish from malingering because there is no motivation to underperform, but rather a belief in impaired capacity (Delis & Wetter, 2007). However, these possibilities should be balanced against the fact that effort tests are designed to be very easy with scores that are robust and not easily affected by factors such as anxiety or depression. For example, Boone (2007, p.297) sets out results of studies that examine the impact of anxiety and depression on one well known effort test, and reports that failure was rare.
Mittenberg et al. (2002) and Sharland and Gfeller (2007) suggest that there are a number of factors that are useful when considering the cause of underperformance. These include:
1. The severity of cognitive impairment in relation to the index event.
2. The pattern of cognitive test performance.
3. Scores that are below empirical cut-offs on forced choice tests.
4. Discrepancies among records, self-report and observed behaviour.
5. Implausible self-reported symptoms at interview.
6. Implausible changes in test scores across repeated examinations.
7. Unusual or bizarre errors observed during the interview not captured by the test.
This array of possibilities emphasises the need to consider information from a variety of sources when interpreting results from tests of effort and achieving a formulation. Included here would be data from other cognitive test sessions, medical records, concurrent diagnoses, career and social background, self-report of complaints, psychological state and behaviour during interview, and separate interview with a relative. It may become evident that further information should be obtained after the assessment in order to give an opinion and this needs to be actively considered.
Motivation to deceive should be considered and in doing so the context of the assessment must be taken into account (e.g. clinical, occupational, litigation). If it is believed that effort has been suboptimal, as clear a view as possible about the cause of this should be reported.
The assessor-testee interface: What should the testee be told?
There is a range of opinion with regard to this question as outlined below (see also Larrabee, 2007):
Advise testees to do their best at the start. This seems uncontroversial and good practice, providing it is in accord with test administration requirements.
Warn testees that effort on tests will be assessed. Whilst this could negatively affect rapport and might create anxiety, it has some benefits to the testee irrespective of whether best effort is intended by them. If intending to deceive, however, the testee may be prompted to look to ‘spot’ the test of effort and this could reduce the sensitivity of the test. However, it is ethical to be open and to indicate the purpose of assessment. This may include a general statement along the lines that the tester will assess how hard the testee is trying on the tests.
Raise the issue with the testee if they fail an effort test. The assessor may not have sufficient information to make sense of this at the time. This could give the testee an opportunity to explain away their poor test performance. In some contexts this may be useful (e.g. a general clinical context) and in others potentially less so (e.g. a litigation context).
Introduce a general query before administering an effort test? For example, ask how the testee is feeling and so on; this would give them an opportunity (without specific prompting) to say they are feeling unwell or cannot concentrate.
Choice of language in reports and feedback session. In a litigation context the testee will have access to the report and there may be possibilities of re-test where rapport would need to be re-established. Hence language must be clear, but where possible non-confrontational. In the clinical situation there is a need to explain the formulation and not simply the test results to the testee and to have a plan about how to take matters forward. It is important to recognise that in the situation of assessment for legal purposes the testee is not the client; the primary duty of the assessor is to the Court and the assessor should not offer feedback without the permission of the instructing solicitor.
Contextual and situational factors
Should effort testing be restricted to any specific context (e.g. only forensic, only clinical, etc.)? There seems to be no rational reason for doing so as sub-optimal effort could arise in a variety of contexts.
Should effort tests always be given in a forensic or litigation context? There is likely to be a motivation to deceive in a forensic or litigation context and effort should be assessed when possible. Indeed this may protect some testees from unfair criticism. If not assessed, the assessor should be prepared to justify this exclusion (e.g. see ‘Exceptional cases’ below).
Should effort tests be given to all cases seen by neuropsychologists (other than exceptional cases)? There is likely to be a desire to establish a good rapport with the testee and a fear of unjustifiably accusing them of poor effort. These factors, together with historical practice not to give tests of effort routinely, may explain the current situation whereby some psychologists do not use effort tests at all, even in a litigation context (see McCarter et al., 2009). However, there are reasons to argue that tests of effort should be given more often than not. There are exceptions with regard to specific client groups or severities of impairment (see below), in an acute setting (e.g. soon after an injury) and if a relatively brief screening assessment is the purpose of testing. However, in other contexts, recommendations with regard to treatment, rehabilitation, medication or for social support may be made on the basis of cognitive tests. There is a need to judge whether a person has given their best effort or these recommendations could disadvantage the person (e.g. encouraging them towards a disabled role when poor test scores do not accurately reflect the person’s ability).
Footnote 1: Statement on the Conduct of Psychologists providing Expert Psychometric Evidence to Courts and Lawyers. British Psychological Society, Psychological Testing Centre, 2007.
An example would be someone who was thought to have a severe head injury, an impression seemingly confirmed by poor performance on cognitive tests. If admitted to a brain injury rehabilitation unit, their belief in their disability can be inappropriately strengthened if in fact they sustained a mild head injury and underperformed on the cognitive tests.
Threats to effort testing: A cautionary note
Given the nature and purpose of effort tests, it is important that reasonable precautions are taken to safeguard their use in order to maintain their validity. It is recognised that there are potential threats to test security. In litigation contexts, coaching by solicitors represents one possible threat. Internet and other media exposure is another (Bauer & McCaffery, 2006). Psychologists have an ethical and professional obligation to maintain test security by restricting access to actual tests or test data to those qualified to use and interpret them. They should also strive to minimise unnecessary test exposure and avoid the release of sensitive data (e.g. detailed descriptions of effort tests and their underlying rationale) into the public arena (see footnote 1). During test administration, reasonable precautions need to be taken to avoid unnecessary exposure of test manuals and record forms to individuals undergoing assessment.
There are a number of other ways in which the security of symptom validity assessment can be maintained. In report writing for example, psychologists could omit the names and descriptions of effort tests in the body of the report. Instead, a general statement along the lines of ‘effort testing failed to indicate non-credible performance’ may suffice. Specific details of these procedures and actual test scores could subsequently be released to qualified professionals on request. Consideration of this practice is especially important in situations where individuals may complete several neuropsychological evaluations and where coaching is a possibility. The prevalence of coaching in the UK is unknown but in North America, a recent survey conducted by the National Academy of Neuropsychology and Association of Trial Lawyers suggested that 75 per cent of attorneys spend 25 to 60 minutes preparing clients for psychological evaluation by providing them with information about tests and how they might respond (Victor & Abeles, 2004).
Are there exceptional cases?
While effort tests are generally designed to minimise the risk of false positive classification errors, some studies suggest that they can be failed for reasons other than poor effort. Examples are learning disability, severe amnesia, moderate to severe dementia, severe sensory or physical disability, perceptual disorders, and neurological, psychiatric or physical conditions where initiation or alertness is diminished or attention severely impaired (e.g. confusional or low awareness states or acute psychosis) (Dean et al., 2008; Batt et al., 2008; Browndyke et al., 2008). Studies of this kind emphasise the need for further investigation of the performance characteristics of effort tests and their interpretation. Clearly, practitioners also need constantly to take account of issues relating to mental capacity and informed consent.
Cultural and ethnic issues
Assessment of effort needs to take account of ethnic and cultural differences that might render tests invalid. In many ways these issues are similar to those described more generally in textbooks on cognitive assessment (see Strauss et al., 2006). Specifically with regard to effort, there have been US studies, including work with Spanish-speaking and Chinese testees (see Salazar et al., 2007).
Future research needs
● Tests of cognitive function which simply include embedded assessments of effort need further development.
● Further evidence on UK base rates of cognitive impairment and failure on effort tests in a range of clinical presentations and service settings is needed.
● A better understanding of the sensitivity and specificity of tests of effort is needed using UK populations, especially in non-forced-choice measures (see Nitch & Glassmire, 2007).
References
Ardolf, B.R., Denney, R.L. & Houston, C.M. (2007). Base rates of negative response bias and malingered neurocognitive dysfunction among criminal defendants referred for neuropsychological evaluation. The Clinical Neuropsychologist, 21, 899–916.
Axelrod, B.N., Fichtenberg, N.L., Millis, S.R. & Wertheimer, J.C. (2006). Detecting incomplete effort with Digit Span from the Wechsler Adult Intelligence Scale – 3rd ed. The Clinical Neuropsychologist, 20, 513–523.
Barrash, J., Suhr, J. & Manzel, K. (1998). A brief, sensitive, and specific procedure for detecting malingered memory impairment. Journal of the International Neuropsychological Society, 4, 28.
Barrash, J., Suhr, J. & Manzel, K. (2004). Detecting poor effort and malingering with an extended version of the Auditory Verbal Learning Test (AVLTX): Validation with clinical samples. Journal of the International Neuropsychological Society, 26, 125–140.
Batt, K., Shores, A. & Chekaluk, E. (2008). The effect of distraction on the Word Memory Test and Test of Memory Malingering performance in patients with a severe brain injury. Journal of the International Neuropsychological Society, 14, 1074–1080.
Bauer, L., Yantz, C.L., Ryan, L.M., Warden, D.L. & McCaffrey, R.J. (2005). An examination of the California Verbal Learning Test II to detect incomplete effort in a traumatic brain-injury sample. Applied Neuropsychology, 12, 202–207.
Bauer, L. & McCaffrey, R.J. (2006). Coverage of the Test of Memory Malingering, Victoria Symptom Validity Test, and Word Memory Test on the Internet: Is test security threatened? Archives of Clinical Neuropsychology, 21, 121–126.
Bianchini, K.J., Greve, K.W. & Glynn, G. (2005). On the diagnosis of malingered pain-related disability: Lessons from cognitive malingering research. Spine Journal, 5, 404–417.
Boone, K.B., Salazar, X., Lu, P., Warner-Chacon, K. & Razani, J. (2002a). The Rey 15-Item Recognition Trial: A technique to enhance sensitivity of the Rey 15-Item Memorisation Test. Journal of Clinical and Experimental Neuropsychology, 24, 561–573.
Boone, K.B., Lu, P., Back, C., King, C., Lee, A., Philpott, L. et al. (2002b). Sensitivity and specificity of the Rey Dot Counting Test in patients with suspect effort and various clinical samples. Archives of Clinical Neuropsychology, 17, 625–642.
Boone, K.B., Lu, P. & Wen, J. (2005). Comparisons of various RAVLT scores in the detection of noncredible memory performance. Archives of Clinical Neuropsychology, 20, 301–319.
Boone, K.B. (2007). Assessment of feigned cognitive impairment: A neuropsychological perspective. New York: Guilford Press.
Browndyke, J.N., Paskavitz, J., Sweet, L.H., Cohen, R.A., Tucker, K.A., Welsh-Bohmer, K.A., Burke, J.R. & Schmechel, D.E. (2008). Neuroanatomical correlates of malingered memory impairment: Event-related fMRI of deception on a recognition memory task. Brain Injury, 22(6), 481–489.
Bush, S.S., Ruff, R.M., Troster, A.I. et al. (2005). Symptom validity assessment: Practice issues and medical necessity. NAN Policy and Planning Committee. Archives of Clinical Neuropsychology, 20, 419–426.
Chafetz, M.D. (2008). Malingering on the social security disability consultative exam: Predictors and base rates. The Clinical Neuropsychologist, 22, 529–546.
Conder, R., Allen, L. & Cox, D. (1992). Manual for the computerised assessment of response bias. Durham, NC: CogShell.
Constantinou, M., Bauer, L., Ashendorf, L. & McCaffrey, R.J. (2005). Is poor performance on recognition memory effort measures indicative of generalised poor performance on neuropsychological tests? Archives of Clinical Neuropsychology, 20, 191–198.
Dean, A.C., Victor, T.L., Boone, K.B. & Arnold, G. (2008). The relationship of IQ to effort test performance. The Clinical Neuropsychologist, 22(4), 705–722.
Delis, D.C. & Wetter, S.R. (2007). Cogniform disorder and cogniform condition: Proposed diagnoses for excessive cognitive symptoms. Archives of Clinical Neuropsychology, 22, 589–604.
Donders, J. (2005). Performance on the Test of Memory Malingering in a mixed pediatric sample. Child Neuropsychology, 11, 221–227.
Drane, D., Williamson, D.J., Strup, E.S., Holmes, M.D., Jung, M., Koerner, E., Chayter, N., Wilensky, A.J. & Miller, J.W. (2006). Impairment is not equal in patients with epileptic and psychogenic non-epileptic seizures. Epilepsia, 47(11), 1879–1886.
Dunn, T.M., Shear, P.K., Howe, S. & Ris, D.M. (2003). Detection of neuropsychological malingering: Effects of coaching and information. Archives of Clinical Neuropsychology, 18, 121–134.
Faust, D., Hart, K., Guilmette, T.J. et al. (1998). Neuropsychologists’ capacity to detect adolescent malingerers. Professional Psychology: Research and Practice, 19, 508–515.
Green, P., Rohling, M.L., Lees-Haley, P.R. & Allen, L.M. (2001). Effort has a greater effect on test scores than severe brain injury in compensation claimants. Brain Injury, 15, 1045–1060.
Green, P., Lees-Haley, P.R. & Allen, L.M. (2002). The Word Memory Test and validity of neuropsychological test scores. Journal of Forensic Neuropsychology, 2, 97–124.
Green, P. (2003). Word Memory Test for Windows: User’s manual and program. Edmonton, Alberta, Canada: Author. (Revised 2005)
Green, P. (2004). Medical Symptom Validity Test for Windows: User’s manual and program. Edmonton, Alberta, Canada: Author.
Green, P. (2007a). The pervasive influence of effort on neuropsychological tests. Physical Medicine and Rehabilitation Clinics of North America, 18(1), 43–68.
Green, P. (2007b). Spoilt for choice: Making comparisons between forced-choice effort tests. In K.B. Boone (Ed.), Assessment of feigned cognitive impairment: A neuropsychological perspective (pp.50–77). New York: Guilford Press.
Greve, K., Bianchini, K., Mathias, C., Houston, R. & Crouch, J. (2002). Detecting malingered performance with the Wisconsin Card Sorting Test: A preliminary investigation in traumatic brain injury. The Clinical Neuropsychologist, 16, 179–191.
Heaton, R.K., Smith, H.H., Lehman, R.A.W. et al. (1978). Prospects for faking believable deficits on neuropsychological testing. Journal of Consulting and Clinical Psychology, 46, 892–900.
Howe, L.L.S., Anderson, A.M., Kaufman, D.A.S., Sachs, B.C. & Loring, D.W. (2007). Characterisation of the Medical Symptom Validity Test in evaluation of clinically referred memory disorders clinic patients. Archives of Clinical Neuropsychology, 22, 753–761.
Iverson, G.L. & Franzen, M.D. (1998). Detecting malingered memory deficits with the Recognition Memory Test. Brain Injury, 12, 275–282.
Iverson, G.L. & Tulsky, D.S. (2003). Detecting malingering on the WAIS-III: Unusual Digit Span performance patterns in the normal population and in clinical groups. Archives of Clinical Neuropsychology, 18, 1–9.
Kelly, P.J., Baker, G.A., van den Broek, M.D., Jackson, H. & Humphries, G. (2005). The detection of malingering in memory performance: The sensitivity and specificity of four measures in a UK population. British Journal of Clinical Psychology, 44, 333–341.
Killgore, W.D.S. & DellaPietra, L. (2000). Using the WMS-III to detect malingering: Empirical validation of the Rarely Missed Index (RMI). Journal of Clinical and Experimental Neuropsychology, 22, 761–771.
Langeluddecke, P.M. & Lucas, S.K. (2003). Quantitative measures of memory malingering on the Wechsler Memory Scale – 3rd ed. in mild head injury litigants. Archives of Clinical Neuropsychology, 18, 181–197.
Larrabee, G.J. (2003). Detection of malingering using atypical patterns of performance on standard neuropsychological tests. The Clinical Neuropsychologist, 17, 410–425.
Larrabee, G.J. (2007). Assessment of malingered neuropsychological deficits. New York: Oxford University Press.
Loring, D., Lee, G. & Meador, J. (2005). Victoria Symptom Validity Test performance in non-litigating epilepsy surgery candidates. Journal of Clinical and Experimental Neuropsychology, 27, 610–617.
Lynch, W.J. (2004). Determination of effort level, exaggeration, and malingering in neurocognitive assessment. Tests, Measurement, and Technology, 19(3), 277–283.
McCarter, R., Walton, N.H., Brooks, D.N. & Powell, G.E. (2009). Effort testing in contemporary clinical neuropsychology practice. The Clinical Neuropsychologist, 23, 1050–1066.
Merten, T., Green, P., Henry, M., Blaskewitz, N. & Brockhaus, R. (2005). Analog validation of German-language symptom validity tests and the influence of coaching. Archives of Clinical Neuropsychology, 20, 719–
Meyers, J.E. & Volbrecht, M.E. (2003). A validation of multiple malingering detection methods in a large clinical sample. Archives of Clinical Neuropsychology, 18, 261–276.
Miller, L.J., Ryan, J.J., Carruthers, C.A. & Cluff, R.B. (2004). Brief screening indexes for malingering: A confirmation of Vocabulary minus Digit Span from the WAIS-III and the Rarely Missed Index from the WMS-III. The Clinical Neuropsychologist, 18, 327–333.
Millis, S. (2002). Warrington Recognition Memory Test in the detection of response bias. Journal of Forensic Neuropsychology, 2, 147–166.
Mittenberg, W., Patton, C., Canyock, E.M. & Condit, D.C. (2002). Base rates of malingering and symptom exaggeration. Journal of Clinical and Experimental Neuropsychology, 24, 1094–1102.
Moss, A., Jones, C., Fokias, D. & Quinn, D. (2003). The mediating effects of effort upon the relationship between head injury severity and cognitive functioning. Brain Injury, 17, 377–387.
Nelson, N.W., Boone, K., Dueck, A., Wagener, L., Lu, P. & Grills, C. (2003). Relationship between eight measures of suspect effort. The Clinical Neuropsychologist, 17, 263–272.
Nitch, S.R. & Glassmire, D.M. (2007). Non-forced choice measures to detect non-credible performance. In K.B. Boone (Ed.), Assessment of feigned cognitive impairment: A neuropsychological perspective (pp.78–102). New York: Guilford Press.
Rey, A. (1941). L’examen psychologique dans les cas d’encéphalopathie traumatique. Archives de Psychologie, 28, 286–340.
Rey, A. (1964). L’examen clinique en psychologie. Paris: Presses Universitaires de France.
Richman, J., Green, P., Gervais, R., Flaro, L., Merten, T., Brockhaus, R. et al. (2006). Objective tests of symptom exaggeration in independent medical examinations. Journal of Occupational and Environmental Medicine, 48, 303–311.
Root, J.C., Robbins, R.N., Chang, L. & van Gorp, W. (2006). Detection of inadequate effort on the California Verbal Learning Test (2nd ed.): Forced choice recognition and critical item analysis. Journal of the International Neuropsychological Society, 12, 688–696.
Salazar, X.F., Lu, P.H., Wen, J. et al. (2007). The use of effort tests in ethnic minorities. In K.B. Boone (Ed.), Assessment of feigned cognitive impairment: A neuropsychological perspective (pp.405–427). New York: Guilford Press.
Schmand, B. & Lindeboom, J. (1998). Amsterdam Short Term Memory Test Manual. The Netherlands: PITS Test Publishers.
Sharland, M.J. & Gfeller, J.D. (2007). A survey of neuropsychologists’ beliefs and practices with respect to the assessment of effort. Archives of Clinical Neuropsychology, 22, 213–223.
Silverberg, N.D., Wertheimer, J.C. & Fichtenberg, N.L. (2007). An effort index for the Repeatable Battery for the Assessment of Neuropsychological Status (RBANS). The Clinical Neuropsychologist, 21, 841–854.
Slick, D.J., Hopp, G., Strauss, E. & Thompson, G. (1997). Victoria Symptom Validity Test: Professional Manual. Odessa, FL: Psychological Assessment Resources.
Slick, D.J., Sherman, E.M.S. & Iverson, G.L. (1999). Diagnostic criteria for malingered neurocognitive dysfunction: Proposed standards for clinical practice and research. The Clinical Neuropsychologist, 13, 545–561.
Stevens, A., Friedel, E., Mehren, G. & Merten, T. (2008). Malingering and un-co-operativeness in psychiatric and psychological assessment: Prevalence and effects in a German sample of claimants. Psychiatry Research, 157, 191–200.
Strauss, E., Sherman, E.M.S. & Spreen, O. (2006). A compendium of neuropsychological tests: Administration, norms, and commentary. New York: Oxford University Press.
Teichner, G. & Wagner, M. (2004). The Test of Memory Malingering: Normative data from cognitively intact, cognitively impaired, and elderly patients with dementia. Archives of Clinical Neuropsychology, 19, 455–464.
Tombaugh, T. (1996). Test of Memory Malingering. Toronto, Ontario, Canada: Multi-Health Systems.
Victor, T.L. & Abeles, N. (2004). Coaching clients to take psychological and neuropsychological tests: A clash of ethical obligations. Professional Psychology: Research and Practice, 35, 373–379.
Appendix 1: Explanation of terms
● Effort: Motivation to comply with implicit or explicit test instructions with regard to speed, accuracy or other performance requirement. Failure on a test of effort means that someone has performed poorly on the test (below a suitable cut-off or low absolute score) and, where the test was appropriate for that person, that they performed below their capability as determined by other criteria.
● Symptom validity: The accuracy or truthfulness of the testee’s presentation (e.g. behaviour, self-report, test performance).
● Response bias: Presentation, including test performance, that results from an attempt to mislead the examiner, including through sub-optimal effort.
● Dissimulation: The falsification or misrepresentation of symptoms by over-representation or under-representation, with an intention to appear different from the ‘true’ state.
● Malingering: The intentional presentation of false or exaggerated symptoms motivated by external gain.
Appendix 2: Statistical Issues
Sensitivity, Specificity and Base Rates
The diagnostic validity of a test concerns the ability of that test to detect the presence or absence of a defined characteristic in the person assessed. For example, in the context of the present document, this might be whether the person is simulating or not simulating. Gauging diagnostic validity involves understanding the sensitivity and specificity of a test and knowing the base rate of the characteristic in the population being examined. A discussion of sensitivity, specificity and base rates in the context of effort testing can be found in chapter 2 of Larrabee (2007) and chapter 10 of Boone (2007).
Sensitivity is the ability of a test to detect a characteristic in those who truly do possess that characteristic. For example, a test might pick out 90 per cent of people who are truly simulating. (The precise percentage will of course depend on the cut-off score chosen.) Technically, sensitivity is the ratio of true positives to true positives plus false negatives. Therefore, in the example given, the sensitivity of the test will be 90/(90+10)=0.9.
Specificity is the ability of a test to detect the absence of a characteristic in those who truly do not possess that characteristic. For example, the same test using the same cut-off might pick out 95 per cent of those who are truly not simulating. Technically, specificity is the ratio of true negatives to true negatives plus false positives. Therefore, in the example given, the specificity is 95/(95+5)=0.95.
However, the precise interpretation of a test result depends on the base rate of the characteristic in the population being assessed, i.e. the percentage of that population that truly possesses the characteristic.
Suppose, in the example given, that clinician A sees a population in which 50 per cent are truly simulating. The results of 200 assessments on the test will be:

                                Test says simulating    Test says not simulating
Truly simulating (N=100)        90                      10
Truly not simulating (N=100)    5                       95

In this scenario, if the test indicates simulating, the likelihood that the person is truly simulating is 90:5, i.e. a probability of 90/95=0.947.
However, suppose clinician B sees a population in which only 10 per cent are truly simulating. In this case the results of 200 assessments on the test will be:

                                Test says simulating    Test says not simulating
Truly simulating (N=20)         18                      2
Truly not simulating (N=180)    9                       171

In this scenario, if the test indicates simulating, the likelihood that the person is truly simulating is 18:9, which is a probability of 18/27=0.667.
Hence, in this example, the lower the base rate of simulating, the less certain is the meaning of a positive result on the test.
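The arithmetic for clinicians A and B can be reproduced with a short script. The following Python sketch is illustrative only: the function name expected_outcomes and its parameter names are hypothetical, and it assumes that sensitivity and specificity stay the same whatever the base rate, as in the worked example.

```python
def expected_outcomes(sensitivity, specificity, base_rate, n_assessed):
    """Expected 2x2 counts for a test, plus the probability that a
    positive ('simulating') result is correct.

    Hypothetical helper for illustration; assumes sensitivity and
    specificity are identical in every population assessed.
    """
    n_simulating = base_rate * n_assessed
    n_not_simulating = n_assessed - n_simulating

    true_pos = sensitivity * n_simulating        # simulating, test says simulating
    false_neg = n_simulating - true_pos          # simulating, test says not simulating
    true_neg = specificity * n_not_simulating    # not simulating, test says not simulating
    false_pos = n_not_simulating - true_neg      # not simulating, test says simulating

    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "false_pos": false_pos,
        "true_neg": true_neg,
        "prob_true_given_positive": true_pos / (true_pos + false_pos),
    }


# Same test (sensitivity 0.90, specificity 0.95), two different base rates.
clinician_a = expected_outcomes(0.90, 0.95, base_rate=0.50, n_assessed=200)
clinician_b = expected_outcomes(0.90, 0.95, base_rate=0.10, n_assessed=200)

print(round(clinician_a["prob_true_given_positive"], 3))  # 0.947
print(round(clinician_b["prob_true_given_positive"], 3))  # 0.667
```

Changing only base_rate shows how the meaning of a positive result shifts while the test itself is unchanged.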
The Binomial Test:
In multiple-choice testing, the testee is required to select one from among a set of given alternative answers. In such a situation, by guessing alone (chance), a proportion of the answers are likely to be correct. To help compensate for this in interpreting test results, it is useful to have a procedure for taking into account the effect of correct guesses. The Binomial Test is a statistical procedure useful for determining the probability of achieving a particular score or less. For a given score (i.e. number correct) the standard score (Z) can be computed, and can be converted into a probability or percentage using tables of the normal distribution. The formula is Z = (X − NP)/√(NPQ), where Z is the standard score (a Z of 1.96 is significant at the .05 level, two-tailed), X is the obtained score (i.e. the number of correct answers achieved by the client), N is the number of items, and P and Q are the proportions of correct and incorrect options (e.g. both P and Q will be 0.5 in a two-alternative forced-choice situation, or 0.25 and 0.75 in a four-alternative forced-choice situation). A correction factor is applied for small N, i.e. Z = (|X − NP| − ½)/√(NPQ), where the expression |X − NP| means the modulus of (X − NP), i.e. the value of (X − NP) regardless of sign. Therefore, if a client scored 1 out of 10 on a two-alternative forced-choice test, the corrected Z would be (|1 − 10×½| − ½)/√(10×½×½) = (4 − ½)/√2.5 = 3.5/1.58 = 2.2. Hence the probability of obtaining a score of 1 out of 10 by chance is p<.05. Therefore, in this example, it is highly likely that the score was not obtained by chance. The score is so poor that the possibility that the individual deliberately gave wrong answers is raised.
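The corrected Z calculation can be checked with a few lines of Python. This is a minimal sketch, not a validated scoring tool: the function names corrected_z and prob_score_or_less are hypothetical. The second function is an addition not described above; it sums the binomial distribution directly, assuming every item is answered by independent guessing, which avoids relying on the normal approximation when N is small.

```python
import math


def corrected_z(x, n, p):
    """Z = (|X - NP| - 1/2) / sqrt(NPQ), the small-N corrected standard score.

    x: obtained score, n: number of items, p: proportion of correct options.
    """
    q = 1 - p
    return (abs(x - n * p) - 0.5) / math.sqrt(n * p * q)


def prob_score_or_less(x, n, p):
    """Exact binomial probability of scoring x or fewer by guessing alone.

    Illustrative supplement; assumes independent items with the same
    chance probability p of a correct answer.
    """
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(x + 1)
    )


# Worked example: 1 correct out of 10 on a two-alternative forced-choice test.
z = corrected_z(1, 10, 0.5)
p_exact = prob_score_or_less(1, 10, 0.5)

print(round(z, 1))        # 2.2
print(round(p_exact, 4))  # 0.0107
```

Both routes agree with the worked example: the corrected Z exceeds 1.96, and the exact probability of scoring 1 or less out of 10 by guessing is 11/1024, which is below .05.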
Appendix 3:

Measure and year of publication: Test of Memory Malingering (TOMM), Tombaugh (1996)
Paradigm: Picture recognition (forced choice)
Administration mode (time): Manual or Computer (20 min.)
Comment: Relatively unaffected by age, low education, cultural factors, anxiety, depression, pain. Suitable for use with children (Donders, 2005) but high misclassification rates in adults with dementia (Teichner & Wagner, 2004). Questions have been raised about sensitivity relative to other tests (e.g. Green, 2007b).

Measure and year of publication: Word Memory Test (WMT), Green (2003)
Paradigm: Word recognition (forced choice)
Administration mode (time): Manual or Computer (20 min.)
Comment: Assesses effort and verbal memory. Performance not affected by education, IQ, neurological or psychiatric conditions, pain or age (suitable for use with children). Appears to easily identify simulators and those not putting forward maximal effort (Green et al., 2002).

Measure and year of publication: Rey 15-Item Test (and recognition trial), Rey (1964)
Paradigm: Letter/Number/Symbol recall and recognition
Administration mode (time): Manual (5 min.)
Comment: Boone et al. (2002a) extended the original 15-Item test to include a recognition component. Their study found adequate sensitivity (71 per cent) and specificity (91 per cent) using a combination cut-off score <20. Sensitive to low effort in patient and control groups that exclude individuals with low IQ and dementia. Relatively unaffected by older age and lower education.

Measure and year of publication: Dot Counting Test, Rey (1941)
Paradigm: Processing of visual stimuli
Administration mode (time): Manual (10 min.)
Comment: Empirical studies are equivocal about sensitivity and specificity. Variable performance across clinical groups increases the risk of false positive errors. Use of combination effort index score (Boone et al., 2002b) improves sensitivity. Performance relatively unaffected by age and education but may be affected by psychosis and ethnicity (Larrabee, 2007).

Measure and year of publication: Computerised Assessment of Response Bias (CARB), Conder, Allen & Cox (1992)
Paradigm: Digit recognition (forced choice)
Administration mode (time): Computer (administration time variable depending on version used)
Comment: Performance is relatively unaffected by severity of head trauma and reasonably resistant to coaching efforts in simulated studies (e.g. Dunn et al., 2003). Use limited by poorly understood psychometric properties and risk of false negatives. Best used in combination with other effort tests.

Measure and year of publication: Victoria Symptom Validity Test (VSVT), Slick et al. (1997)
Paradigm: Digit recognition (forced choice)
Administration mode (time): Computer (15 to 20 min.)
Comment: Easy to administer. Format promotes test-taker perception of varied level of difficulty. Loring, Lee and Meador (2005) found poorer VSVT scores in individuals over 40 years with low performance on cognitive tests. Failure rate in clinical groups has been found to be substantially lower than compensation-claiming samples (Larrabee, 2007).

Measure and year of publication: Amsterdam Short-Term Memory Test (ASTM), Schmand & Lindeboom (1998), in collaboration with Merten & Millis
Paradigm: Word recall and recognition
Administration mode (time): Computer (10 to 30 min.)
Comment: Merten et al. (2005) found good separation between individuals asked to try their best and those coached in cognitive exaggeration, with reported sensitivity and specificity values of 100 per cent. They found the ASTM to be approximately equivalent to the MSVT in its ability to differentiate between simulators and those demonstrating good effort.

Measure and year of publication: Medical Symptom Validity Test (MSVT), Green (2004)
Paradigm: Word recognition (forced choice) and free recall
Administration mode (time): Computer (15 to 20 min.)
Comment: The MSVT is a relatively new addition to the field and awaits further independent validation. Preliminary findings are encouraging and the test manual includes information to help identify dementia profiles in failed protocols. Interestingly, respondents tested in a foreign language can achieve near ceiling scores on the MSVT (Richman et al., 2006). Howe et al. (2007) provide data comparing diagnostic groups of individuals attending a memory disorders clinic.
Appendix 4: Iatrogenic factors
One process that can shape presentation and symptom complaint is that individuals come to report symptoms to which they have been exposed during a clinical or litigation process (iatrogenic symptoms), e.g. when reading expert or clinical reports, when symptoms are described to them by therapists or lawyers, in educational programmes in rehabilitation units, or during their own investigations by reading books or searching the internet. An individual can come to believe that they have a severe brain injury and severe cognitive problems even when the brain injury itself was mild and their early history was of good recovery with only mild symptoms. It is important that experts and psychological services do not