CHAPTER 2: LITERATURE REVIEW
2.6 Credibility Instrument
2.6.2 Instrument Verification Method
To properly evaluate the quality of measurement instruments, both reliability and validity should be tested. Scale reliability is the proportion of variance attributable to the true score of the latent variable. It is often treated as internal consistency reliability, which concerns the homogeneity of the items that make up a scale (DeVellis, 1991). A scale is internally consistent to the extent that its items are highly intercorrelated. Internal consistency is generally estimated with Cronbach's (1951) coefficient alpha (α). Computing alpha partitions the total variance of a set of items into signal (true-score variance) and noise (error variance), and alpha is the proportion of the total variance that is signal. In other words, alpha = 1 − (error variance / total variance). An alpha of 0.7 or greater is considered satisfactory (Jöreskog & Sörbom, 1984; Tate, Alexander, & Maheshwari, 2006) and indicates strong item covariance or homogeneity (Hinkin, Tracey, & Enz, 1997). Some authors treat a lower threshold of 0.50 or above as acceptable (Netemeyer, Johnston, & Burton, 1990; Tate et al., 2006).
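The signal/noise description above corresponds to the usual computational formula for k items: α = k/(k − 1) × (1 − Σσ²ᵢ / σ²ₓ). Here σ²ᵢ is the variance of item i and σ²ₓ is the variance of the summed scale score. The following is a minimal Python sketch. It assumes complete data, with rows as respondents and columns as items. The function name `cronbach_alpha` is hypothetical and not taken from any particular package.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores: list of rows, one per respondent, each holding k item scores.
    Population variances are used throughout; the n vs. n - 1 choice cancels
    in the ratio as long as it is applied consistently.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of respondents' total (summed) scale scores
    total_var = pvariance([sum(row) for row in scores])
    # alpha = k/(k-1) * (1 - sum of item variances / total-score variance)
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical two-item responses from four respondents
alpha = cronbach_alpha([[2, 3], [4, 5], [3, 3], [5, 4]])
print(round(alpha, 3), alpha >= 0.7)
```

The final line compares the result against the 0.7 threshold cited above.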
Whereas reliability concerns how strongly a latent variable influences a set of items, validity concerns whether that variable is the underlying cause of the item covariation (DeVellis, 1991). Whether a scale is an appropriate measure of a particular variable is therefore a question of validity. There are three types of validity: content validity, criterion validity, and construct validity (DeVellis, 1991). Content validity concerns the sampling adequacy of the items (DeVellis, 1991), that is, how well the items represent the content domain (Ong et al., 2009). In theory, a scale consisting of items randomly selected from the universe of appropriate items has content validity (DeVellis, 1991). To establish content validity, the underlying factors of the variable to be measured are usually identified through focus groups or expert interviews.
Criterion validity indicates how effectively a measure predicts behavior under certain circumstances. It can be assessed by computing the correlation coefficient between test scores and an external criterion or an overall measure (Ong et al., 2009). Criterion validity is a pragmatic rather than a scientific issue because it focuses on prediction rather than on understanding the underlying process. For this reason, it is often called "predictive validity" (DeVellis, 1991).
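The criterion-validity check described above reduces to a correlation between scale scores and the external criterion. The sketch below computes a Pearson correlation in plain Python. The score lists are hypothetical illustration data, not results from this study. In practice, the coefficient would also be tested for significance.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: scale totals and an external criterion for five respondents
scale_scores = [12, 15, 9, 20, 17]
criterion = [3, 4, 2, 5, 4]
r = pearson_r(scale_scores, criterion)
print(round(r, 3))
```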
Construct validity refers to the theoretical relationship of a variable to other variables in the instrument. In other words, it is the extent to which a measure of a construct "behaves" as it theoretically should with respect to established measures of other constructs (DeVellis, 1991). Several methods have been used to verify construct validity, including item-to-total correlations, factor analysis, and evaluations of convergent and discriminant validity. These methods check that the instrument correlates with the variables it should correlate with and does not correlate with the variables it should not (Ong et al., 2009).
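Item-to-total correlations, the first method listed above, can be sketched as follows. The sketch uses the "corrected" variant, which correlates each item with the sum of the *other* items so that the item does not inflate its own correlation. This variant is an assumption; the uncorrected version simply correlates each item with the full total. The function names are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def corrected_item_total(scores):
    """Correlate each item with the sum of the remaining items.

    scores: list of respondent rows, each holding k item scores.
    """
    k = len(scores[0])
    correlations = []
    for j in range(k):
        item = [row[j] for row in scores]
        rest = [sum(row) - row[j] for row in scores]  # total excluding item j
        correlations.append(pearson_r(item, rest))
    return correlations

# Hypothetical responses; the fourth item runs opposite to the others
print(corrected_item_total([[1, 1, 1, 3], [2, 2, 2, 2], [3, 3, 3, 1]]))
```

An item with a low or negative corrected correlation, like the fourth item here, does not move with the rest of the scale. Such an item would be a candidate for review.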
Convergent validity refers to the degree to which the operationalization of a construct is similar to the operationalizations of other constructs to which it should theoretically be similar. It can be tested by examining whether the correlations between items of the same factor are significantly greater than zero (Ong et al., 2009). Discriminant validity refers to the degree to which the operationalization of a construct is not similar to the operationalizations of other constructs to which it should theoretically not be similar. It can be assessed by counting the number of cases in which an item correlates more strongly with items belonging to other factors than with items of its own factor. This count should be less than 50% of the possible comparisons (Ong et al., 2009).
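Both checks above can be expressed in a short sketch. For convergent validity, a within-factor correlation r from n respondents is commonly tested against zero with t = r√(n − 2) / √(1 − r²) on n − 2 degrees of freedom. For discriminant validity, the counting rule can be applied to an item correlation matrix. One assumption must be stated plainly: this sketch counts a "comparison" as pairing each within-factor correlation of an item with each of that item's between-factor correlations, and it compares absolute values. Ong et al. (2009) may operationalize the count differently. The matrix and function names are hypothetical.

```python
from math import sqrt

def correlation_t(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

def discriminant_violations(R, factor_of):
    """Count comparisons in which an item correlates more strongly (in absolute
    value) with an item from another factor than with an item from its own factor.

    R: symmetric item correlation matrix as a list of lists.
    factor_of: factor label for each item, e.g. ["A", "A", "B", "B"].
    Returns (violations, comparisons).
    """
    n = len(R)
    violations = comparisons = 0
    for i in range(n):
        same = [j for j in range(n) if j != i and factor_of[j] == factor_of[i]]
        other = [k for k in range(n) if factor_of[k] != factor_of[i]]
        # Compare every within-factor correlation of item i with every
        # between-factor correlation of item i.
        for j in same:
            for k in other:
                comparisons += 1
                if abs(R[i][k]) > abs(R[i][j]):
                    violations += 1
    return violations, comparisons

# Hypothetical four-item matrix: items 0-1 on factor A, items 2-3 on factor B
R = [
    [1.0, 0.6, 0.2, 0.1],
    [0.6, 1.0, 0.7, 0.2],
    [0.2, 0.7, 1.0, 0.5],
    [0.1, 0.2, 0.5, 1.0],
]
violations, comparisons = discriminant_violations(R, ["A", "A", "B", "B"])
print(violations, comparisons, violations / comparisons < 0.5)  # 2 8 True
```

In this toy matrix, items 1 and 2 correlate at 0.7 across factors. That produces 2 violations out of 8 comparisons (25%), which is below the 50% threshold.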
For exploratory factor analysis (EFA), the criterion of eigenvalues greater than 1 is the primary rule for determining the number of factors. Only items that clearly load on a single factor should be retained, and a loading of 0.4 is the cutoff commonly used to identify meaningful factor loadings (Hinkin et al., 1997). The factor analysis and item deletion process should be repeated until all items have been analyzed. If no cross-loading items are found, the discriminant validity of the instrument is supported. The major disadvantage of EFA is that it is difficult to quantify the goodness-of-fit of the derived factor structure (Hinkin et al., 1997).
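The two EFA decision rules above (the eigenvalue-greater-than-1 rule and the 0.4 loading cutoff with no cross-loadings) can be sketched as follows. The sketch assumes that eigenvalues and a rotated loading matrix have already been produced by statistical software. It also treats a loading of exactly 0.4 as meeting the cutoff, which is an assumption. The functions, item names, and values are hypothetical. After dropping items, the factor analysis itself would be re-run, and that step is not shown.

```python
def kaiser_factor_count(eigenvalues):
    """Number of factors to retain under the eigenvalue-greater-than-1 rule."""
    return sum(1 for ev in eigenvalues if ev > 1.0)

def screen_items(loadings, cutoff=0.4):
    """Classify items from a (rotated) loading matrix.

    loadings: dict mapping item name -> list of loadings, one per factor.
    Returns (keep, drop): keep maps item -> index of its single salient factor;
    drop lists items with no salient loading or with a cross-loading.
    """
    keep, drop = {}, []
    for item, row in loadings.items():
        salient = [f for f, value in enumerate(row) if abs(value) >= cutoff]
        if len(salient) == 1:
            keep[item] = salient[0]
        else:  # loads on no factor, or on more than one factor
            drop.append(item)
    return keep, drop

# Hypothetical software output
print(kaiser_factor_count([2.8, 1.4, 0.9, 0.5]))
keep, drop = screen_items({
    "q1": [0.72, 0.10],
    "q2": [0.65, 0.45],  # cross-loads on both factors
    "q3": [0.20, 0.81],
    "q4": [0.30, 0.25],  # no loading reaches the cutoff
})
print(keep, drop)
```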
Confirmatory factor analysis (CFA), by contrast, can be used to assess the quality of the factor structure by examining the statistical significance of the overall model and of the relationships among items and scales (Hinkin et al., 1997). Several statistics can be used to measure goodness-of-fit, including the chi-square statistic, the comparative fit index (CFI), and the root mean square error of approximation (RMSEA). The chi-square statistic evaluates the fit of a specified model, and the smaller its value, the better the fit (Hinkin et al., 1997). CFI values larger than .95 indicate good model fit (Hinkin et al., 1997), as do RMSEA values smaller than or equal to .06 (Holbert & Stephenson, n.d.; Hu & Bentler, 1999).
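Both CFI and RMSEA are derived from chi-square values, which shows how they relate to the chi-square statistic described above. The sketch below uses the standard point-estimate formulas:

- RMSEA = √(max(χ² − df, 0) / (df(N − 1)))
- CFI = 1 − max(χ²_M − df_M, 0) / max(χ²_M − df_M, χ²_B − df_B, 0), where B denotes the baseline (independence) model

Some software uses N instead of N − 1 in the RMSEA denominator, so small differences from published output are possible. The function names and input values are hypothetical.

```python
from math import sqrt

def rmsea(chi2, df, n):
    """Point estimate of RMSEA from the model chi-square, its degrees of freedom,
    and the sample size n (this version uses n - 1 in the denominator)."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2, df, chi2_base, df_base):
    """Comparative fit index: 1 minus the model's noncentrality estimate divided by
    the larger of the model's and the baseline model's noncentrality estimates."""
    d_model = max(chi2 - df, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base

def good_fit(cfi_value, rmsea_value):
    """Apply the cutoffs cited in the text: CFI > .95 and RMSEA <= .06."""
    return cfi_value > 0.95 and rmsea_value <= 0.06

# Hypothetical CFA output: model chi2 = 50 on 40 df, baseline chi2 = 500 on 45 df, N = 201
fit_cfi = cfi(50, 40, 500, 45)   # 1 - 10/455
fit_rmsea = rmsea(50, 40, 201)   # sqrt(10 / 8000)
print(round(fit_cfi, 3), round(fit_rmsea, 3), good_fit(fit_cfi, fit_rmsea))
```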
2.6.3 Summary
This section has reviewed existing credibility instruments and the methods for verifying instruments. Credibility is a somewhat subjective concept and is therefore difficult to measure accurately. Nevertheless, accurate scientific results cannot be expected without accurate measurement. Among the reviewed credibility instruments, the one developed by Rains and Karmikel (2009) was selected in light of the characteristics of the datasets and the contexts of this dissertation study. The reviewed verification methods provide the basis for testing the newly introduced credibility instrument, which is worded in language that crowd workers can easily understand.
CHAPTER 3: METHODS