1.4 Summary and Conclusions
2.2.2 Assessing the reliability and validity of the NHS3
2.2.4.2 Internal consistency
Internal consistency of the scale, as measured by Cronbach's alpha, was 0.77. Table 2.5 displays the effect of deleting each item in turn on the alpha coefficient. Deletion of no single item led to a significant increase in alpha.
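The calculation behind an "alpha if item deleted" analysis can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function names and the small score matrix are hypothetical, and the item scores are invented purely to show the mechanics.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of per-subject item-score lists (k >= 2 items)."""
    k = len(rows[0])
    # Sample variance of each item across subjects.
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of the total score across subjects.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_item_deleted(rows, names):
    """Recompute alpha with each item removed in turn."""
    result = {}
    for j, name in enumerate(names):
        reduced = [r[:j] + r[j + 1:] for r in rows]
        result[name] = cronbach_alpha(reduced)
    return result


# Hypothetical scores for three items in five subjects (NOT study data).
demo_items = ["Convulsion", "Fall", "Injury"]
demo_rows = [
    [0, 0, 0],
    [1, 1, 0],
    [2, 2, 1],
    [3, 2, 2],
    [4, 4, 3],
]

demo_alpha = cronbach_alpha(demo_rows)
demo_deleted = alpha_if_item_deleted(demo_rows, demo_items)
```

An item whose deletion raises alpha above the full-scale value contributes least to internal consistency; an item whose deletion lowers alpha contributes most.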
Table 2.5 Cronbach's alpha for the scale with each item deleted
Scale item      Alpha if item deleted
Convulsion      0.72
Fall            0.69
Injury          0.72
Incontinence    0.76
Warning         0.76
Automatism      0.79
Recovery        0.71
2.2.4.3 Inter-rater reliability
An intraclass correlation coefficient (ICC) of 0.90 was determined for inter-observer testing. The standard error of measurement for a single observation was thus 2.0 scale points, predicting that 95% of observations would be within plus or minus 4 scale points. The lower 99% confidence limit for the ICC was 0.86. The mean difference between the two observers was 0.15 scale points with a standard deviation of 2.08 scale points. The limits of agreement between two observers for an individual observation were therefore {-4.0,+4.3} scale points. The score differences were approximately normally distributed around zero. The confidence interval for the mean difference (bias) was {-0.2,+0.5}. The confidence interval for the upper limit of agreement was {+3.7,+4.9} and for the lower limit of agreement {-4.6,-3.4}. The magnitude of the inter-observer differences was not related to the mean score of the two observers (Figure 2.2).
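The arithmetic behind the inter-observer figures above can be checked with a short sketch. It assumes that the limits of agreement are the mean difference plus or minus two standard deviations (as the caption to Figure 2.2 indicates) and that the 95% band around a single observation is 1.96 times the standard error of measurement; the function names are illustrative only.

```python
def limits_of_agreement(mean_diff, sd_diff, k=2.0):
    """Bland-Altman limits: mean difference plus or minus k standard deviations."""
    return (mean_diff - k * sd_diff, mean_diff + k * sd_diff)


def sem_band(sem, z=1.96):
    """Half-width of the band expected to contain 95% of single observations."""
    return z * sem


# Values reported for the inter-observer comparison.
lower, upper = limits_of_agreement(0.15, 2.08)
band = sem_band(2.0)

print(round(lower, 1), round(upper, 1))  # limits of agreement
print(round(band, 1))                    # approximately 4 scale points
```

Both routes give a band of roughly 4 scale points either side, which is why the standard error of measurement and the limits of agreement lead to the same practical conclusion.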
2.2.4.4 Test-retest reliability
An ICC of 0.90 was found for test-retest observations. The standard error of measurement was 2.0 scale points. The lower 99% confidence limit for the ICC was 0.85. The mean difference between the first and second application of the scale was +0.5 (s.d. 2.8). The limits of agreement were {-5.1,+6.1}. The confidence interval for the mean difference was {-0.24,+1.24}. The confidence interval for the upper limit of agreement was {+4.8,+7.4} and for the lower limit of agreement {-6.4,-3.8}. There was no systematic effect of the magnitude of the severity score on the test-retest reliability (Figure 2.3). Differences were approximately normally distributed around zero.
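The relationship between the confidence intervals quoted above can be illustrated with a short sketch. It assumes the usual Bland and Altman large-sample approximations: the standard error of the mean difference is s/sqrt(n), and the standard error of each limit of agreement is approximately sqrt(3 s^2 / n), so the interval around a limit should be about sqrt(3) times wider than the interval around the mean, whatever the sample size. The sample size is not given in this section, so the value of n used at the end is a placeholder only.

```python
import math


def half_width(interval):
    """Half the width of a (low, high) confidence interval."""
    low, high = interval
    return (high - low) / 2.0


def standard_errors(sd_diff, n):
    """Approximate standard errors of the mean difference and of a limit of agreement."""
    se_mean = sd_diff / math.sqrt(n)
    se_limit = math.sqrt(3.0 * sd_diff ** 2 / n)
    return se_mean, se_limit


# Test-retest intervals as reported in the text.
reported_mean_ci = (-0.24, 1.24)
reported_upper_ci = (4.8, 7.4)
reported_lower_ci = (-6.4, -3.8)

ratio_upper = half_width(reported_upper_ci) / half_width(reported_mean_ci)
ratio_lower = half_width(reported_lower_ci) / half_width(reported_mean_ci)
print(round(ratio_upper, 2), round(ratio_lower, 2), round(math.sqrt(3), 2))

# n = 50 is a hypothetical placeholder, NOT the study's sample size.
se_mean, se_limit = standard_errors(2.8, 50)
```

The reported half-widths (1.3 against 0.74) are in a ratio close to sqrt(3), which is what the approximation predicts.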
2.2.4.5 Validity
2.2.4.5a Experiment 1
The rankings given by 80 subjects to the 5 prototype seizures were compared to the rankings derived from the NHS3 scores (Table 2.6). Close agreement is indicated by the figures in the left-right downward diagonal squares (outlined). The weighted kappa was 0.82, indicating very good agreement between the patients' rankings and the scale scores. Disagreement occurred mainly over seizures C and D, which were those with the closest scores using the NHS3 (see Experiment 2).
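The weighted kappa can be recomputed from the cell counts of the ranking table (Table 2.6). The sketch below is an illustration rather than the original analysis: the weighting scheme is not stated in the text, so linear disagreement weights (proportional to the number of ranks by which the two rankings differ) are an assumption, and the function name is hypothetical. The counts are transcribed from Table 2.6, with rows as the rank given by patients and columns as seizures A to E in NHS3 order.

```python
def weighted_kappa(table, power=1):
    """Weighted kappa for a square table; power=1 linear, power=2 quadratic weights."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            w = abs(i - j) ** power
            observed += w * table[i][j]
            expected += w * row_tot[i] * col_tot[j] / n
    return 1.0 - observed / expected


# Counts from Table 2.6 (rows: patient rank 1 to 5; columns: seizures A to E).
ranking_table = [
    [72, 8, 0, 0, 0],
    [7, 67, 5, 1, 0],
    [1, 5, 45, 26, 3],
    [0, 0, 20, 48, 12],
    [0, 0, 10, 5, 65],
]

kappa_linear = weighted_kappa(ranking_table, power=1)
print(round(kappa_linear, 2))
```

Under the linear-weight assumption the result rounds to the reported value of 0.82; quadratic weights (power=2) give a higher figure, which suggests that linear weights were used.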
2.2.4.5b Experiment 2
The mean VAS score matched the NHS3 score closely for each prototype seizure (Table 2.7). This indicated that the relative severity of the 5 types of seizure as judged by the 50 patients was reflected in the scores produced for each seizure by the scale.
Seizure A
The attack consists of a 10 second blank spell during which the patient stares straight ahead. The recovery is immediate and there are no after-effects. There are no falls or injuries.
Seizure B
The attack starts with a fluttering feeling in the stomach, which warns the patient to sit or lie down. The patient then loses awareness for 1/2 a minute during which he smacks his lips. When the attack is over the patient is back to normal within 10 minutes.
Seizure C
The attack occurs without warning, and results in sudden falling to the ground, but the patient recovers within a few seconds. The patient often cuts his head deeply as a result.
Seizure D
The attack starts without warning, and begins with the patient becoming confused, during which he may act oddly, for example undressing himself or moving objects around. Occasionally he is incontinent of urine or falls to the ground. He has never injured himself. The patient then comes round, and the recovery normally takes 30 minutes.
Seizure E
The attack starts without warning, and the patient always falls unconscious to the floor, and then has a "grand mal" convulsion (with shaking of the arms and legs). The patient is often incontinent of urine and always bites his tongue. Full recovery takes 6 hours.
[Figure 2.2 plot: inter-observer score difference (vertical axis, -10.0 to +10.0) against mean score of the two observers (horizontal axis, 0 to 30), with horizontal lines at mean+2 s.d., mean and mean-2 s.d.]
Figure 2.2 Inter-observer score differences plotted against the mean seizure severity score of the two observers. Each dot represents one or more patients. The mean difference and 2 standard deviations (s.d.) above and below the mean are indicated by the horizontal lines.
[Figure 2.3 plot: test-retest score difference (vertical axis, -10.0 to +10.0) against mean score of the two measurements (horizontal axis, 0 to 30), with horizontal lines at mean+2 s.d., mean and mean-2 s.d.]
Figure 2.3 Test-retest score differences plotted against the mean seizure severity score for the two measurements. The mean difference and 2 standard deviations (s.d.) above and below the mean are indicated by the horizontal lines.
Table 2.6 How the 80 patients ranked the 5 "prototype" seizures. The number in each cell indicates the number of patients for a given pair of rankings.
Columns: prototype seizures as ranked by NHS3 score, from A (least severe) to E (most severe).
Rows: prototype seizures as ranked by patients.
Rank given by patients    A     B     C     D     E
1 (least severe)          72    8     0     0     0
2                         7     67    5     1     0
3                         1     5     45    26    3
4                         0     0     20    48    12
5 (most severe)           0     0     10    5     65
Table 2.7 The scores (for prototype seizures A, B, C, D and E) as derived by the National Hospital Seizure Severity Scale (NHS3) compared with the scores derived from 50 patients using a visual analogue technique.
                               Seizure A  Seizure B        Seizure C  Seizure D        Seizure E
Seizure type                   absence    complex partial  atonic     complex partial  tonic-clonic
NHS3 score                     3          4                11         13               21
Score derived from VAS rating  3          5                14         16               23
NHS3 = National Hospital Seizure severity scale, VAS = Visual analogue scale.
2.2.5 Discussion
The principal results were that the NHS3 was reliable and had construct validity. The interpretation of these findings will now be discussed.
The alpha coefficient of 0.77 in our study indicated that the scale has adequate internal consistency. The alpha would rise slightly with the elimination of the item on automatisms, but at the cost of a reduction in the ability to assess complex partial seizures. This item has therefore been retained.
For the scale to be useful in AED trial settings it is critical that it is adequately reliable (especially between different raters). The ICC of 0.90 for both the inter-observer and retest conditions indicated adequate reliability. When the reliability data were expressed using the method of Bland and Altman, no systematic bias between observers or between the first and second administration of the scale was found. In addition, there was no systematic relation between the size of the measurement error and the score on the scale. However, the limits of agreement (the mean difference plus or minus two standard deviations), which were similar to those found in earlier work on the Chalfont scale, are a significant proportion of the maximum score on the scale (about 30%), and may be equivalent to the median magnitude of an NHS3 score for a typical complex partial seizure. The standard error of measurement for a single observation (derived from the ICC) was equivalently wide. These considerations suggest that the scale will be adequately reliable for use in AED trials when data are analysed by groups, but that when assessing an individual seizure type, in a single patient, over time, the level of precision reported here should be taken into account. It is likely that the scale is not able to measure small changes (2-3 scale points) reliably in individual patients. The cause of the imprecision is most probably related to reliance on the memory of both the subject and the witness to score the scale. Ideally, the reliability study should be repeated by observers not involved in its development and in a new sample of patients. A replication of the findings would increase confidence in the generalizability of the reliability estimates.
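The point about small changes in individual patients can be made concrete with a rough calculation. This is a sketch under stated assumptions, not a result from the study: it assumes that the measurement errors on two occasions are independent and share the reported standard error of measurement of 2.0 scale points, so that the standard deviation of a difference between two scores is sqrt(2) times that value. The function names are illustrative.

```python
import math


def sd_of_difference(sem):
    """SD of the difference between two scores with independent, equal errors."""
    return math.sqrt(2.0) * sem


def change_band(sem, z=1.96):
    """Change between two scores that measurement error alone would rarely exceed."""
    return z * sd_of_difference(sem)


sem_reported = 2.0  # standard error of measurement reported in the results
sd_diff = sd_of_difference(sem_reported)
band = change_band(sem_reported)

print(round(sd_diff, 2))  # close to the test-retest s.d. of differences (2.8)
print(round(band, 1))     # change that error alone could plausibly produce
```

On these assumptions, a change of 2-3 scale points in one patient lies well inside the range attributable to measurement error, whereas group means are estimated far more precisely.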
The scores from our sample of subjects were not normally distributed, being skewed towards lower values, reflecting a considerable number of patients in the sample with absences or brief complex partial seizures. The scores in Table 2.4 indicated that the scale discriminated (on a group basis) between different clinical seizure types for all comparisons except between absences, simple partial seizures and myoclonic jerks.
The meaning of score changes on the scale requires further explanation. An example of a 4-5 point change would be the cessation of injuries, or quicker recovery plus a reduction in the frequency of urinary incontinence. The minimum change that is significant for an individual patient has yet to be determined, but is likely to be approximately 2-3 points, as the validation experiment suggested that most people ranked "seizure D" above "seizure C" (2 points difference).
Experiments 1 and 2 have provided evidence for construct validity for the NHS3. The NHS3 scores for absences, mild complex partial seizures, atonic attacks, severe complex partial seizures, and generalised tonic-clonic seizures were almost exactly the same as those predicted using the visual analogue technique (Table 2.7). Thus, the scaling of the NHS3, though reliant only on objective criteria, is in accord with the subjective assessment of the relative severity of different seizure types by patients with epilepsy.
The responsiveness of the NHS3 remains to be determined. It is currently being assessed in two trials of the new antiepileptic drug topiramate (personal communication, Dr H. Coles, Cilag-Janssen, UK) and in a trial of the experimental drug ucb L059 (personal communication, Dr U. Falter, ucb Pharma, Belgium).
2.2.6 Conclusions
1. The National Hospital Seizure Severity Scale is sufficiently reliable for group studies.
2. The scale is simpler to complete and score than the Chalfont scale.
3. The scale is valid from the subjective point of view of a person with epilepsy.
4. The scale is a valuable additional outcome measure in trials of novel antiepileptic drugs. A number of international multicentre antiepileptic drug trials using the measure are taking place.
Section 3
The Development and Validation of the
Subjective Handicap of Epilepsy Scale
Introduction
The outcome measures sensitive to the psychosocial effects of epilepsy were reviewed in section 1.3.4. It was concluded that there remained a need for scales that assessed the long-term social and vocational handicapping effects of chronic epilepsy. This chapter describes the development of such a measure: "The subjective handicap of epilepsy scale". The material in this chapter has been published as:
O'Donoghue MF, Duncan JS, Sander JWAS. The Subjective Handicap of Epilepsy: a new approach to measuring treatment outcome. Brain 1998; 121:317-343.