CSQ scores for eight general practitioners
CHAPTER FOUR: STUDIES OF VALIDITY
4.2. Types of validity.
Although the main principles for the classification of the types of validity are widely accepted, there are differences in the terminology used by different authors. In the discussion that follows, the classification proposed by Streiner and Norman (1989) will be used. This classification was developed for health measurement scales, but is suitable for assessing the validity of measures of patients’ opinions. Streiner and Norman (1989 p. 107) divide validity into three types: content, criterion and construct. The same classification is used by Cronbach (1990 p. 144) and Kline (1993 p. 15).
1. Content validity
In assessing content validity, the test is examined to make sure it contains questions on each factor that is important to the patient’s decision about satisfaction. The questionnaire should include a representative sample of the universe of all possible questions relevant to patient satisfaction (Committee to Develop Standards for Psychological Testing, 1985). A measure that includes a more representative sample allows more accurate inferences to be drawn from the results or scores. If issues relevant to patient satisfaction are omitted from the questionnaire, inferences may be less accurate. One approach to assessing content validity is to ask a group of judges who are familiar with the topic to assess the measure (Kline, 1993 p. 21).
Face validity is related to content validity, and is an indication of whether the measure appears to be assessing the desired issues. It may be judged by review of the measure by one or more experts, with empirical methods of assessment rarely being used (Streiner and Norman, 1989 p. 5).
2. Criterion validity
In assessing criterion validity, a measure or criterion is chosen that is accepted as being concerned with what the test is supposed to measure. The test or questionnaire is then compared with this accepted "criterion" or "gold standard" (Cronbach, 1990 p. 152). The test of validity is then the correlation of the measure with the "gold standard". Criterion validity is divided into two types, concurrent and predictive. With concurrent validity, the new questionnaire and the "gold standard" are administered at the same time, and the findings of the two scales are compared. In predictive validity, the criterion is information which becomes available some time in the future. One illustration of predictive validity is the ability of an examination such as the advanced general certificate of education (A level) to predict a person’s performance on graduation in three or four years’ time. In this case, the criterion is the person’s eventual performance, and such criteria are sometimes referred to as outcome criteria (Committee to Develop Standards for Psychological Testing, 1985 p. 11).
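The correlation described above can be illustrated with a minimal sketch. The scores below are invented purely for illustration and are not data from this study; the names `new_scale` and `gold_standard` are hypothetical, and the Pearson coefficient is assumed here as the measure of correlation, although other coefficients may be more appropriate for ordinal data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dev_x, dev_y))
    ss_x = sum(a * a for a in dev_x)
    ss_y = sum(b * b for b in dev_y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one set of scores is constant")
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores for six respondents (invented for illustration only).
# Both instruments are assumed to have been administered at the same time,
# so the coefficient is an estimate of concurrent validity.
new_scale = [62, 70, 75, 81, 88, 93]
gold_standard = [58, 72, 71, 85, 86, 95]

concurrent_validity = pearson_r(new_scale, gold_standard)
print(f"r = {concurrent_validity:.2f}")
```

For predictive validity the same calculation would apply, with the second list replaced by the criterion scores that become available later.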
3. Construct validity
Constructs can be thought of as theories to explain the relationships among various behaviours and attitudes (Streiner and Norman, 1989 p. 113). Construct validity seeks to place the theory on which the test is based into a network of laws, at least some of which must involve observables that can be subjected to measurement (Cronbach and Meehl, 1979). The network of laws arises from available research evidence, and explains the relationship between the presence of the attitude being measured by the test and a particular change in the behaviour of the subject. In order to test the construct validity of a measure, studies are undertaken to determine whether inferences drawn from the results of the measure are in accordance with the construct. For example, if a theory or construct about X categorises people into groups according to certain attributes A, B, and C, where A, B, and C are other instruments, behaviours or diagnoses which can be observed, then a test of X should categorise people into the predicted groups. Thus, an assessment of construct validity not only assesses the measure’s validity but also tests the theory at the same time (Streiner and Norman, 1989 p. 115).
There are several approaches to establishing construct validity. The most straightforward is comparison of extreme groups. In this case, two groups of subjects are compared, one group of which has the attribute or behaviour in question, and the other group does not. The groups are referred to as extreme groups, and a measure intended to distinguish subjects on the basis of the presence or absence of the attribute should score one group significantly differently from the other. One weakness of extreme group comparisons is that differentiating between two very different groups of subjects may not present a very demanding assessment of the measure.
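A comparison of extreme groups can be sketched as follows. The scores are invented for illustration, and the choice of Welch's t statistic is an assumption made for this sketch rather than a method prescribed by the sources cited; a non-parametric test may suit skewed questionnaire scores better. The sketch computes the statistic and its approximate degrees of freedom only. Judging significance would additionally require the t distribution, for example from a statistics package.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t statistic and approximate degrees of freedom
    for two independent groups with possibly unequal variances."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two scores")
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("t is undefined when both groups have zero variance")
    t = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical scores (invented for illustration only): one group is assumed
# to have the attribute in question, the other is assumed not to have it.
with_attribute = [78, 85, 90, 74, 88, 81]
without_attribute = [55, 62, 49, 70, 58, 66]

t_stat, deg_freedom = welch_t(with_attribute, without_attribute)
print(f"t = {t_stat:.2f}, df = {deg_freedom:.1f}")
```

A large absolute value of t indicates that the measure separates the two groups, although, as noted above, this is not a very demanding test when the groups are very different.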
An alternative method is to assess how closely the new scale is related to other variables or other measures of the same construct to which it should or should not be related. The Standards for Educational and Psychological Testing (Committee to Develop Standards for Psychological Testing, 1985 p. 15) recommend that "Construct-related evidence of validity should demonstrate that the test scores are more closely associated with variables of theoretical interest than they are with variables not included in the theoretical network". In testing convergent validity, the degree of correlation expected between the new scale and the other measures will depend on the extent to which they are both concerned with the same attribute or trait (Cronbach, 1990 p. 182). If the two measures agree, despite superficially appearing to be dissimilar, the proposed theoretical interpretation (or construct) is supported.
However, if the new scale covers aspects of an attribute not covered by existing scales, the correlation should be relatively low. Thus, the scale should not only correlate with related measures or variables, it should also not correlate with unrelated variables. This is referred to as discriminant validity or divergence. For example, if the construct of patient satisfaction indicates that there is no relationship between the patient’s intelligence and reported satisfaction, finding a relationship may indicate that the questionnaire is complex and demands a minimum level of intelligence to understand it. There may also be other explanations for finding a relationship, for example the construct itself may be incorrect (Streiner and Norman, 1989 p. 118).
Convergent and discriminant validity can be assessed simultaneously in a more complex procedure known as the multitrait-multimethod method. Two or more unrelated traits or attributes are measured at the same time by two or more methods. The pattern of correlations enables an assessment of construct validity to be made. For example, low correlations would be expected between the measurement of different traits using the same method, but correlations between measures of the same trait using the same method but on separate occasions should be high.
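The pattern of correlations described above can be sketched for two traits measured by two methods. Everything in the sketch is an assumption made for illustration: the scores are invented, and the trait and method names (`satisfaction`, `unrelated_trait`, `questionnaire`, `interview`) are hypothetical. The sketch covers a single occasion only, so it does not include the repeated measurement of the same trait by the same method; instead it shows the usual expectation that measures of the same trait by different methods correlate highly, while measures of different traits correlate weakly.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dev_x, dev_y))
    return cov / sqrt(sum(a * a for a in dev_x) * sum(b * b for b in dev_y))

# Hypothetical scores for six respondents, keyed by (trait, method).
# Invented for illustration only.
scores = {
    ("satisfaction", "questionnaire"): [1, 2, 3, 4, 5, 6],
    ("satisfaction", "interview"): [2, 1, 4, 3, 6, 5],
    ("unrelated_trait", "questionnaire"): [4, 3, 6, 1, 2, 5],
    ("unrelated_trait", "interview"): [3, 4, 5, 2, 1, 6],
}

def pair_type(key_a, key_b):
    """Label a pair of measures by whether they share a trait or a method."""
    same_trait = key_a[0] == key_b[0]
    same_method = key_a[1] == key_b[1]
    if same_trait and not same_method:
        return "same trait, different method"      # convergent evidence
    if same_method and not same_trait:
        return "different trait, same method"      # discriminant evidence
    # Keys are unique, so the remaining case is different trait and method.
    return "different trait, different method"     # discriminant evidence

# Correlate every pair of measures and record which cell of the matrix it fills.
results = []
for key_a, key_b in combinations(scores, 2):
    r = pearson_r(scores[key_a], scores[key_b])
    results.append((pair_type(key_a, key_b), key_a, key_b, r))

for kind, key_a, key_b, r in results:
    print(f"{kind:35s} {key_a} vs {key_b}: r = {r:+.2f}")
```

With these invented scores the same-trait pairs correlate strongly and the different-trait pairs hardly at all, which is the pattern that would support construct validity.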
4. Summary
Assessment of the validity of CSQ and SSQ is essential if these questionnaires are to be used widely. The purpose of tests or measures, including those assessing patient satisfaction, is to enable inferences to be drawn from the results concerning an attribute or attributes of the subjects. The most important property of a measure is the extent to which confidence can be placed in the inferences that are drawn, that is, the validity of the measure. In this section three principal types of validity have been outlined (content, criterion and construct validity), each of which is assessed using different methods. In assessing the validity of a measure, the results of several tests are more useful than the results of a single test. The following sections of this chapter will describe the steps taken to assess the validity of CSQ and SSQ. Each form of validity will be considered in turn.