Section 4: General and background information
3.6.1.5 Reliability and validity
In developing the questionnaire, the researcher paid special attention to two technical considerations: reliability and validity. According to Hunter and Brewer (2003) and Neuman (2000), both qualities are central to assessing the goodness of a measurement and are important in establishing the truthfulness, credibility, or believability of findings. They are complementary and closely interrelated concepts; however, in some situations they can conflict with each other. In addition, one cannot predict the other; that is, reliability is necessary for validity, but it does not guarantee that a measure will be valid (Neuman 2000; Sarantakos 2005). Both terms have multiple meanings.
Reliability is “the degree to which measures are free from error and therefore yield consistent results” (Zikmund 2000, p.280). In other words, it refers to the consistency of a measure: a measure is reliable when similar results are obtained over time and across situations. There are two main aspects of this consistency: stability (or external reliability) and internal consistency (or internal reliability) (Bryman 1989; Punch 1998).
External reliability refers to “the degree to which a measure is consistent and stable over time” (Bryman 1989, p.55). It is usually expressed in the question: does the measure deliver the same answer when applied at different times? (Neuman 2000).
Internal reliability refers to the degree of internal consistency of a measure, or whether the indicators that make up the scale are consistent with each other (Bryman and Bell 2003) and are all working in the same direction (Punch 1998). It can be expressed in the question: does the measure yield consistent results across different indicators? (Neuman 2000).
There are four common ways of testing reliability: the test-retest method; the alternate (or parallel) form method; the split-half method; and Cronbach’s coefficient alpha (Burns 2000; Frankfort-Nachmias and Nachmias 1992; Punch 1998; Zikmund 2000).
The test-retest method involves administering the same measuring instrument to the same respondents at two separate times to test for stability; the correlation between the two sets of responses is then computed. The obtained coefficient is the reliability estimate, and a high stability correlation indicates a high degree of reliability.
However, this method has some limitations. It is difficult to persuade respondents to answer the same questionnaire twice, and if they do, they may think more deeply about the questions on the second occasion and give different answers, or may remember specific questions and answer the same way as on the first occasion, thus yielding a high but overstated reliability estimate. In addition, it is possible that change will occur in the measured variable during the measuring interval, thus lowering the estimate of reliability (De Vaus 2002; Frankfort-Nachmias and Nachmias 1992; Hussey and Hussey 1997; Saunders et al. 2007).
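The test-retest computation described above can be illustrated with a minimal sketch. It assumes that each respondent's answers are summarised as a single total score per administration and that the Pearson product-moment coefficient is the correlation used; the function name `pearson_r` and all scores are hypothetical and are not taken from the current study.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return cov / sqrt(var_x * var_y)

# Hypothetical total scores for the same five respondents on two occasions.
first_administration = [12, 15, 9, 20, 17]
second_administration = [13, 14, 10, 19, 18]

# The correlation between the two administrations is the reliability estimate.
test_retest_reliability = pearson_r(first_administration, second_administration)
print(round(test_retest_reliability, 3))
```

A coefficient close to 1 would be read as high stability, subject to the limitations noted above (memory effects can inflate it, and real change in the measured variable can deflate it).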
In the alternate (or parallel) form method, two alternative instruments are designed to be as equivalent as possible. Each of the two measuring instruments is then administered to a group of persons, and the two sets of measures (scores) are correlated to obtain an estimate of reliability. If there is a high correlation between the two instruments, the researcher concludes that the measure is reliable. However, with this technique it is difficult to determine whether the two forms of an instrument are in fact parallel, and constructing a second form requires additional time and effort (Burns 2000; Frankfort-Nachmias and Nachmias 1992; Zikmund 2000).
In the split-half method, the questionnaire is divided into two equal halves, or two equal sets of questions, and each of the two sets is treated separately and scored accordingly. The two sets of scores are then correlated, and the resulting correlation coefficient is taken as an estimate of reliability. However, with this technique the two halves may contain different types of items with different difficulty levels (Burns 2000; Frankfort-Nachmias and Nachmias 1992).
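As an illustration of the split-half procedure, the sketch below assumes an even number of numerically scored items and splits them by position (first, third, ... versus second, fourth, ...). The odd/even split is only one possible way of forming the halves and is an assumption of this example, as are the function names and the response data, which are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return cov / sqrt(var_x * var_y)

def split_half_reliability(responses):
    """Correlate each respondent's total on one half of the items with the other half.

    `responses` is a list of rows, one row of item scores per respondent.
    """
    first_half = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    second_half = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    return pearson_r(first_half, second_half)

# Hypothetical answers of five respondents to a four-item scale (1-5 points).
responses = [
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [2, 2, 1, 2],
    [5, 4, 5, 5],
    [3, 4, 3, 3],
]

split_half_estimate = split_half_reliability(responses)
print(round(split_half_estimate, 3))
```

Because the estimate depends on which items fall into which half, a different split of the same items can give a different coefficient, which is the weakness noted above.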
The method most frequently used by academic researchers for measuring internal reliability, or internal consistency, is Cronbach’s coefficient alpha. This is used to assess the reliability of a measurement scale with multi-point items. It calculates the average of all possible split-half reliability coefficients (Bryman and Bell 2003). The value of this coefficient varies between 1 (denoting perfect internal reliability) and 0 (denoting no internal reliability). Bryman (1989) argued that most researchers regard 0.80 as an acceptable level of internal reliability for any multiple-point scale.
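A minimal sketch of how Cronbach's coefficient alpha can be computed is given below. It uses the usual variance-based computational formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items, rather than literally averaging split-half coefficients. The function name `cronbach_alpha` and the response data are hypothetical; this is not the procedure or the data of the current study.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row of item scores per respondent)."""
    k = len(responses[0])  # number of items in the scale
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the respondents' total scale scores.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical answers of five respondents to a four-item scale (1-5 points).
responses = [
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [2, 2, 1, 2],
    [5, 4, 5, 5],
    [3, 4, 3, 3],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Unlike the split-half estimate, the result does not depend on how the items are divided, which is one reason this coefficient is generally preferred for multi-item scales.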
Based on the above discussion, and owing to the limitations recognised in the first three methods, Cronbach’s coefficient alpha was used in the current study to assess the reliability of the questionnaire. The scales attained Cronbach’s coefficient alpha values ranging from 0.667 to 0.766, which indicates a reasonable degree of consistency in the scales used.
As mentioned earlier, reliability is a necessary condition for validity, but a reliable instrument may not be valid. Thus, even though the responses to questions may turn out to be highly reliable, the results will be worthless if the questions do not measure what the researcher intended them to measure, that is, if validity is low. Therefore, after testing the reliability of the questionnaire, the researcher must carefully consider the validity of the research findings.
Validity is concerned with the question “Is one measuring what one intends to measure?” (Frankfort-Nachmias and Nachmias 1992, p. 158). Validity is a measure of precision, accuracy and relevance; it reflects the quality of indicators and instruments; and it refers to the ability to produce findings that are in agreement with theoretical or conceptual values (Sarantakos 2005). The researcher must provide supporting evidence that a measuring instrument in fact measures what it appears to measure.
Among the various approaches to the validation of an instrument, three of the main ones are: (1) content validity; (2) criterion-related validity; and (3) construct validity (De Vaus 2002; Neuman 2000; Punch 1998; Saunders et al. 2007; Zikmund 2000).
(1) Content validity refers to the extent to which the measurement device (the questions in the questionnaire) provides adequate coverage of the investigative questions (Saunders et al. 2007, p.366).
(2) Criterion-related validity is an attempt by the researcher to answer the question “Does my measure correlate with other measures of the same construct?” (Zikmund 2000, p.282); therefore, an indicator is compared with another measure of the same construct in which the researcher has confidence. There are two types of criterion-related validity: concurrent validity, where the criterion variable exists in the present, and predictive validity, where the criterion variable will not exist until later (Punch 1998, p.101).
(3) Construct validity refers to the extent to which the measurement questions actually measure the presence of those constructs the researcher intended them to measure (Saunders et al. 2007, p.367). It involves relating a measuring instrument (questionnaire) to a general theoretical framework in order to determine whether the instrument is tied to the concepts and theoretical assumptions that are employed (Frankfort-Nachmias and Nachmias 1992, p. 161).
According to Frankfort-Nachmias and Nachmias (1992) and Punch (1998), each of these three kinds of validity is concerned with a different aspect of the measurement situation, and each includes several kinds of evidence and has special value under certain conditions. In fact, there is no ideal way to establish validity, and the validation methods used should depend on the situation.
Given the importance of the validity of the research findings, two approaches were followed in the current study in order to enhance validity. First, the questionnaire was subject to many modifications and amendments and passed through several steps before its final distribution in order to enrich its quality and to improve the validity of individual questions (Section 3.6.1.2). Second, the current study employed both a postal questionnaire and semi-structured interviews. As mentioned in Section 3.3.3, the findings from one method can be checked against the findings derived from the other method. Consistent findings across different data collection methods increase the credibility of findings, improve the trustworthiness of research, and thus maximise its validity.