A fundamental aspect of behavioural research is the construction and application of psychometric measures for observing human behaviour (Dunn et al., 2014; Drost, 2011). In addition, the measurement of people’s behaviour is a method embedded in the positivist perspective (Dunn et al., 2014). It is therefore essential that the measurement instruments selected for any research questionnaire are reliable and valid in order to achieve rigour. Furthermore, reliability and validity tests are quality enhancers mostly used in quantitative research (Golafshani, 2003). Considering this, the following provides a detailed discussion of validity and reliability.
3.9.1 Validity
Validity is defined as the degree to which an item accurately measures what it is intended to measure (Heale and Twycross, 2015). It is possible for a measurement item to be reliable but not valid; however, for a measurement instrument to be valid, it must be reliable. To illustrate, consider a bus that arrives at the bus station at 8:15am every morning but is scheduled for 8:11am. It is reliable because it arrives at the same time each morning. However, it is not valid because it does not meet its scheduled time. Previous research suggests that validating survey instruments is a fundamental requirement before the conceptual framework of a study can be analysed (Straub, 1989). This is important for confirming the reliability of the instrument used, which in turn helps to gather valid data and produce a thorough interpretation of findings (Moon and Kim, 2001; Dwivedi et al., 2006).
Most studies assessing human behaviour involve quantifying attributes that cannot be measured directly (Dunn et al., 2014). Instead, hypothetical constructs or concepts such as attitude, self-efficacy and perceived use are measured by inference from observed behaviours, which can serve as indicators of the existence of the concept (Kimberlin and Winterstein, 2008). Given that this study deals with the assessment of human behaviour, validation is important for making appropriate generalisations with regard to the sample population. Examples of validations commonly carried out in a study include the following:
Content validity: Content validity is a method used to assess whether the selected items address all aspects of the concept they are intended to address (Haynes et al., 1995). It goes hand-in-hand with the pretesting of the questionnaire. In addition, it assesses whether the measurement items are a truly representative sample of the concept in question by ensuring that both theoretical and practical aspects are taken into consideration (Adamson and Prion, 2013; Heale and Twycross, 2015). According to Pandey and Chawla (2016), content validity is a fundamental element of construct validation given that it is sometimes applied in validating and refining the construct. This type of validation usually requires the judgement of experts who possess some knowledge of the area being studied. Furthermore, the content validity ratio or index proposed by Lawshe (1975) is the most commonly used method for determining content validity. According to Lawshe (1975), the content validity of a survey question is higher when more than half of the selected experts agree that the question is essential. Further details regarding the content validity exercise carried out for this study are provided in chapter 4.
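As an illustration only, the content validity ratio of Lawshe (1975) can be computed for a single item as CVR = (n_e - N/2) / (N/2), where n_e is the number of experts rating the item as essential and N is the total number of experts. The sketch below is a minimal example under that formula; the function name and the panel figures are hypothetical and are not taken from this study.

```python
def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Lawshe's CVR for one item: (n_e - N/2) / (N/2).

    n_essential: experts who rated the item "essential" (n_e)
    n_experts:   total number of experts on the panel (N)
    """
    if n_experts <= 0:
        raise ValueError("n_experts must be positive")
    if not 0 <= n_essential <= n_experts:
        raise ValueError("n_essential must lie between 0 and n_experts")
    half = n_experts / 2
    return (n_essential - half) / half


# Hypothetical panel of 10 experts, 8 of whom rate the item as essential.
print(content_validity_ratio(8, 10))  # 0.6
```

The ratio is positive when more than half of the experts rate the item as essential, zero when exactly half do, and negative otherwise. The minimum value regarded as acceptable depends on the size of the panel.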
Criterion-related validity: According to Heale and Twycross (2015) and Boudreau et al. (2001), this kind of validity is a form of construct validity and can be measured through concurrent and predictive validation. It assesses whether responses can predict a criterion measure, as well as whether the results of the survey correlate with results from other sources (Glasow, 2005).
Construct validity: According to Schriesheim et al. (1993), a construct is a hypothetical variable used in research to explain a phenomenon. Construct validity refers to the assessment of a measure as an effective indicator of a concept (Heale and Twycross, 2015). Overall, the steps of validation carried out in a study, including content validity, criterion-related validity, and convergent and discriminant validity, are all important components for achieving construct validation (Pandey and Chawla, 2016).
3.9.2 Reliability
Reliability is defined as the degree to which a research item is consistent or stable, in terms of producing the same results when applied under the same conditions. According to Golafshani (2003), reliability testing helps to determine whether a result is replicable. Heale and Twycross (2015) identify three attributes for estimating reliability: internal consistency (homogeneity), stability (test-retest) and equivalence (inter-rater). Further details on the techniques employed in estimating reliability are discussed below.
Internal consistency: Internal consistency is the most commonly used reliability estimator (Heale and Twycross, 2015). It is defined as the degree to which all the items in a test measure the same construct, and it is used to determine the inter-relatedness of the items in the test (Tavakol and Dennick, 2011). This helps to ensure the validity of the constructs being used in the research. Although there are various methods for evaluating the internal consistency of an instrument, Cronbach's alpha (Cronbach, 1951) is the most widely used. Alpha normally ranges from 0 to 1, with values of 0.7 and above considered acceptable (Litwin, 1995, p. 31). However, Streiner (2003) suggests that Cronbach's alpha must be used carefully because the length of the scale has an effect on alpha. Streiner (2003) further suggested that one could derive an acceptable value for alpha if the number of items being measured is over 20. Concurring with Streiner (2003), Adamson and Prion (2013) recommended that alpha would be an acceptable method for determining reliability if items have more than two response options. In this study, Cronbach's alpha was considered alongside other reliability estimators such as composite reliability and Dijkstra-Henseler’s rho_A (ρA) (Hinton and Brownlow, 2014).
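For illustration, Cronbach's alpha can be computed from a respondents-by-items score matrix using the standard formula alpha = k/(k-1) × (1 - Σ item variances / variance of total scores), where k is the number of items. The following is a minimal sketch, not the procedure used in this study; the function name and the data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a matrix of scores.

    scores: one row per respondent, one column per item.
    """
    k = len(scores[0])  # number of items
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*scores)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses from five respondents to four Likert-type items.
responses = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 5, 4],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}, acceptable at 0.7 threshold: {alpha >= 0.7}")
```

Because the total-score variance grows with the number of inter-correlated items, longer scales tend to yield higher alpha values, which is the reason for the caution raised by Streiner (2003).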
Stability: Stability is the ability of an instrument to show a high correlation between measurements taken at different points in time. Stability is usually assessed using test-retest and parallel-form reliability testing. Test-retest is used to examine the consistency of measures when an item is administered to the same participants more than once under the same conditions (Heale and Twycross, 2015). Parallel-form reliability is closely related to test-retest, except that the same concept is measured using a different form of the instrument, for instance, with a change in wording.
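As a simple illustration of stability, the correlation between two administrations of the same instrument can be estimated with the Pearson correlation coefficient. The sketch below assumes paired scores from the same participants at two time points; the function name and scores are hypothetical, and other coefficients may be preferred in practice.

```python
from math import sqrt


def pearson_r(first: list[float], second: list[float]) -> float:
    """Pearson correlation between two paired lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two paired lists with at least two scores")
    n = len(first)
    mean_first = sum(first) / n
    mean_second = sum(second) / n
    # Sum of cross-products of deviations from the means.
    cross = sum((a - mean_first) * (b - mean_second)
                for a, b in zip(first, second))
    # Square roots of the sums of squared deviations.
    dev_first = sqrt(sum((a - mean_first) ** 2 for a in first))
    dev_second = sqrt(sum((b - mean_second) ** 2 for b in second))
    if dev_first == 0 or dev_second == 0:
        raise ValueError("scores must vary within each administration")
    return cross / (dev_first * dev_second)


# Hypothetical total scores for six participants at time 1 and time 2.
time_1 = [12, 18, 15, 22, 9, 17]
time_2 = [13, 17, 16, 21, 10, 18]
print(f"test-retest r = {pearson_r(time_1, time_2):.2f}")
```

A coefficient close to 1 indicates that participants kept a similar relative standing across the two administrations, which is what a stable instrument is expected to show.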
Equivalence: Equivalence, also known as inter-rater reliability, is defined as a process used to
Having provided a detailed discussion on the reliability and validity processes, the next section discusses the ethical considerations employed in this research.