PRO evaluation
3.5 Quantitative methods rationale
The quantitative methods in this study are drawn mainly from psychometrics, a branch of statistics. Psychometrics concerns the application of analytical methods to measure people's perceptions, beliefs and judgements about physical phenomena, and it forms a cornerstone in the development of health measurement methods (McDowell, 2006).
“The purpose of a psychometric analysis is to establish the extent to which a quantitative conceptualisation has been operationalised successfully” (Hobart, 2009, pg. 2). Older methods are underpinned by classical test theory (CTT), whilst more modern psychometric methods use item response theory (IRT) as their basis.
3.5.1 Classical Test Theory
Classical Test Theory (CTT) has its origins in the fields of education and psychology, where the aim of measurement was often the testing of students (Hobart and Cano, 2009). Nunnally and Bernstein (1994) suggest that “CTT views measurement as the determination of quantity or how much of an attribute is present in an object” (pg. 21). CTT is a strategy for measuring constructs that are not directly observable: “it is suitable for measurement of constructs that follow a reflective model” (De Vet et al., 2011, pg. 19). It rests on five main assumptions (Hobart and Cano, 2009), listed in Table 14.
Table 14: The assumptions underpinning classical test theory
1 Each person has an observed score, which is equal to their "true" score plus an error score.
2 If a scale is administered to a person an infinite number of times, the mean of their observed scores is equal to their true score.
3 Error scores and true scores are not correlated. Errors of measurement are not related to the observed score.
4 The error scores associated with two scales are uncorrelated.
5 The error scores on one scale are uncorrelated with the true score on another scale.
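The first three assumptions in Table 14 can be illustrated with a small simulation. The sketch below is illustrative only: the true score, the error spread, the sample sizes and the seed are assumed values, not data from this study, and the errors are generated independently of the true scores by construction, so the code demonstrates the assumptions rather than testing them.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(2024)  # fixed seed so the sketch is reproducible

# Assumptions 1 and 2: one person, many administrations of the same scale.
TRUE_SCORE = 50.0  # assumed true score (unobservable in practice)
ERROR_SD = 5.0     # assumed spread of the random error
# Observed score = true score + error score (assumption 1).
observed = [TRUE_SCORE + rng.gauss(0.0, ERROR_SD) for _ in range(100_000)]
# With many administrations the mean approaches the true score (assumption 2).
mean_observed = statistics.fmean(observed)

# Assumption 3: many people, one administration each.
true_scores = [rng.gauss(50.0, 10.0) for _ in range(20_000)]
errors = [rng.gauss(0.0, ERROR_SD) for _ in true_scores]
r_true_error = pearson(true_scores, errors)

print(f"mean of observed scores: {mean_observed:.2f}")
print(f"correlation(true, error): {r_true_error:.3f}")
```

In real data neither the true scores nor the errors are available separately, which is why these assumptions cannot be checked directly.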
Limitations of CTT
Limitations include the inability to determine accurately the values of the true score (T) or the error score (E) (Hobart and Cano, 2009). Because these variables are unobservable for individuals, the assumptions underpinning the theory cannot be tested. A further limitation is that item difficulty and item discrimination are group dependent: they depend on the sample from which they are obtained and are influenced by the heterogeneity or homogeneity of that sample and by its ability to complete the test. Scores are also test dependent, and CTT does not allow a prediction of how respondents may score on an item. The standard error of measurement (SEM) around individual patients’ scores is assumed to be a constant value regardless of the person's location on the range of a scale (Petrillo et al., 2015, pg. 32). This implies that the scores of respondents at either end of the scale (floor and ceiling) are treated as being as precise as the scores of those in the centre of the scale (following a normal distribution).
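The constant-SEM assumption can be made concrete with the standard CTT formula SEM = SD × √(1 − reliability). The following is a minimal sketch; the standard deviation, reliability coefficient and observed scores are assumed illustrative values, not results from this study.

```python
import math


def standard_error_of_measurement(score_sd, reliability):
    """CTT standard error of measurement: SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return score_sd * math.sqrt(1.0 - reliability)


def confidence_interval(observed_score, sem, z=1.96):
    """Approximate 95% interval around an observed score."""
    return (observed_score - z * sem, observed_score + z * sem)


# Assumed values for illustration only.
sem = standard_error_of_measurement(score_sd=10.0, reliability=0.91)

# The same SEM is applied to a score near the floor and one near the centre.
floor_interval = confidence_interval(observed_score=5.0, sem=sem)
centre_interval = confidence_interval(observed_score=50.0, sem=sem)

floor_width = floor_interval[1] - floor_interval[0]
centre_width = centre_interval[1] - centre_interval[0]
print(f"SEM: {sem:.2f}")
print(f"interval width at floor: {floor_width:.2f}, at centre: {centre_width:.2f}")
```

Because the SEM is computed once for the whole scale, the two intervals have identical width, which is the limitation described above.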
3.5.2 Item Response Theory (IRT)
Item response theory (IRT) also has its origins in the field of educational measurement. It is a set of modern psychometric methods focusing on the relationship between a person's unobservable level on the underlying trait (latent variable) and the probability of responding in each of the response categories of a scale item (Hobart and Cano, 2009, pg. 10). IRT differs from CTT in the inherent invariance of both item parameters and ability parameters (Hambleton & Jones, 1993): unlike in CTT, these parameters are sample independent.
IRT models are used to measure a patient’s ability (De Vet et al., 2011). The construct (ability) is usually represented by the Greek letter θ (theta). Guttman scales consist of multiple items measuring a single unidimensional construct. The items are chosen in such a way that they have a hierarchical order of difficulty. This is known as a ‘deterministic’ model. For example, when assessing a patient's ability to walk, scale items are ranked from “easy” to “difficult”, with the possible responses being ‘yes’ or ‘no’. If the patient answers ‘yes’ to an item, this patient will answer ‘yes’ to all the easier items; conversely, a patient who answers ‘no’ to an item will answer ‘no’ to all the more difficult items. Hence, the scale can determine the patient's ability, e.g. if a patient can run for 5 minutes, then they must also be able to stand. In practice, a true Guttman scale is very rare, hence IRT is based on the probabilities of the responses (De Vet et al., 2011).
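The deterministic Guttman pattern can be sketched in a few lines of code. The item wording and the function names below are hypothetical, chosen only to mirror the walking example; responses are coded 1 for ‘yes’ and 0 for ‘no’ and ordered from the easiest to the most difficult item.

```python
# Hypothetical items, ordered from easiest to most difficult.
ITEMS = ["stand", "walk 10 metres", "walk 100 metres", "run for 5 minutes"]


def is_guttman_pattern(responses):
    """Return True if no 'yes' (1) follows a 'no' (0) on a harder item."""
    seen_no = False
    for response in responses:
        if response not in (0, 1):
            raise ValueError("responses must be coded 0 or 1")
        if response == 0:
            seen_no = True
        elif seen_no:
            # A 'yes' on a harder item after a 'no' on an easier one.
            return False
    return True


def guttman_score(responses):
    """Number of items endorsed; only meaningful for a perfect pattern."""
    if not is_guttman_pattern(responses):
        raise ValueError("response pattern violates the Guttman ordering")
    return sum(responses)


consistent = [1, 1, 1, 0]    # can walk 100 metres but cannot run
inconsistent = [1, 0, 1, 0]  # 'no' to an easier item, 'yes' to a harder one

print(is_guttman_pattern(consistent), guttman_score(consistent))
print(is_guttman_pattern(inconsistent))
```

In a perfect Guttman scale the total score alone identifies exactly which items were endorsed. Patterns such as the second one are common in real data, which is why IRT replaces the deterministic rule with response probabilities.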
IRT aims to find the item response model that best explains the data. The models consist of item and person parameters.
3.5.3 Rasch models
Rasch measurement theory (RMT) uses a simple form of measurement model for a single latent trait. It assumes that a person’s score can be estimated independently of the test items used, and that the item locations (item difficulty) can be estimated independently of the ability distribution of the sample on which they were calibrated (Hambleton & Jones, 1993). In its dichotomous form, the Rasch model requires items with binary responses, coded as yes = 1 and no = 0. If data do not fit the Rasch model, researchers will seek to understand why and, if necessary, remove data, re-collect data or re-conceptualise the construct for it to fit the model (Hobart & Cano, 2009).
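For binary items, the Rasch model is commonly written as P(yes) = exp(θ − b) / (1 + exp(θ − b)), where θ is the person's ability and b is the item difficulty. The sketch below assumes this standard dichotomous form; the function names and the ability and difficulty values are illustrative and are not estimates from this study.

```python
import math


def rasch_probability(theta, difficulty):
    """Probability of a 'yes' (1) response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def log_odds(probability):
    """Natural log of the odds p / (1 - p)."""
    return math.log(probability / (1.0 - probability))


# Assumed person abilities and item difficulties, in logits.
able_person, less_able_person = 1.5, 0.5
easy_item, hard_item = -1.0, 2.0

# When ability equals difficulty, the probability of 'yes' is one half.
p_matched = rasch_probability(theta=0.0, difficulty=0.0)

# The comparison between two persons does not depend on which item is used.
gap_on_easy = (log_odds(rasch_probability(able_person, easy_item))
               - log_odds(rasch_probability(less_able_person, easy_item)))
gap_on_hard = (log_odds(rasch_probability(able_person, hard_item))
               - log_odds(rasch_probability(less_able_person, hard_item)))

print(f"P(yes) when ability equals difficulty: {p_matched:.2f}")
print(f"log-odds gap on easy item: {gap_on_easy:.3f}, on hard item: {gap_on_hard:.3f}")
```

The equal log-odds gaps illustrate the invariance property: the difference between two persons is the same whichever item is used to compare them.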
3.5.4 Limitations of IRT/Rasch
IRT prioritises the data and aims to find a model that best explains them, whereas RMT prioritises the Rasch model: if the data do not fit, the hypothesis will need to be revisited (Petrillo et al., 2015). In both IRT and Rasch, models tend to be complex and model fit can be problematic. Both approaches require an advanced level of mathematical understanding and specialised software. Large samples (more than 500 respondents) are required for IRT models.