Model Evaluation - ANALYTICAL METHODOLOGY

CHAPTER 5 RESEARCH METHODOLOGY AND DESIGN

5.5 ANALYTICAL METHODOLOGY

5.5.3 Model Evaluation

The evaluation of the PLS model involves two steps: assessing the measurement model (also referred to as the outer model), which relates the indicators to their associated latent variables, and assessing the structural model (also referred to as the inner model), which relates endogenous latent variables to other latent variables (Hair et al., 2006). Traditional parametric techniques for testing significance are inappropriate for assessing a PLS model, because PLS makes no distributional assumption other than predictor specification (Chin, 1998b). Tests consistent with the distribution-free, predictive approach of PLS should therefore be adopted (Wold, 1980a; 1982). Accordingly, the PLS model was evaluated using nonparametric, prediction-oriented measures, including several techniques suggested by Chin (1998b).

(a) Evaluating the Measurement Model

The aim of assessing the measurement model is to test its reliability and validity, which is accomplished by examining two elements of factorial validity: convergent and discriminant validity (Churchill, 1979; and Gerbing & Anderson, 1988). Validity tests ensure that the measures perform adequately by showing how well the measurement items relate to the constructs (Gefen & Straub, 2005). When factorial validity is satisfied, each measurement item correlates strongly with the one construct it is related to, while correlating weakly or not significantly with all other constructs. The literature provides several criteria for validating reflective constructs (Chin, 1998b; Gefen & Straub, 2005; Barroso et al., 2010; and Gotz et al., 2010), which include indicator reliability, construct reliability, convergent validity and discriminant validity. All these measures are generated by the bootstrapping procedures of PLS-Graph Version 3.


(i) Indicator Reliability

The reliability of individual indicators or measures is evaluated by examining the loadings, or correlations, of the indicators with their respective latent variables (Hulland, 1999; and Barroso et al., 2010). A commonly accepted threshold is to retain items with loadings of 0.707 or more; because 0.707² ≈ 0.50, such a loading implies that the construct shares more variance with its measure than is attributable to error (Chin, 1998a; Hulland, 1999; Barroso et al., 2010; and Gotz et al., 2010). In practice, however, it is also common for several items in an estimated model to display loadings below 0.707, particularly when new items for newly developed scales are employed (Hulland, 1999; and Chin, 2010).

Chin (1998b), however, cautions against eliminating measures with low loadings where the measures are important to the construct. Chin (2010) advises that measures with low loadings should be removed only if they are influenced by additional factors, such as a method effect or some other concept. In covariance-based SEM, including additional poor indicators leads to a worse fit; in PLS, by contrast, including poor indicators helps to extract whatever useful information they contain to create a better construct score (Barroso et al., 2010). Because PLS works with determinate constructs, poor indicators are factored in through lower weights (Barroso et al., 2010). Keeping items with low loadings may therefore still increase predictiveness, since the PLS algorithm weights each such item to the extent that it helps minimise residual variance, as long as other, more reliable indicators exist (Chin, 2010).
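As an illustration of the 0.707 rule, the following Python sketch squares each standardised loading to obtain the share of item variance explained by the construct and flags, rather than removes, items below the threshold. The item names and loading values are hypothetical and do not come from this study.

```python
# Indicator reliability screen (illustrative sketch).
# 0.707 is roughly sqrt(0.5): a standardised loading at or above it means the
# construct accounts for at least half of the item's variance.
LOADING_THRESHOLD = 0.707


def screen_loadings(loadings):
    """Return {item: (loading, shared_variance, meets_threshold)}.

    `loadings` maps item names to standardised outer loadings. Items below
    the threshold are only flagged for review, not dropped, in line with the
    caution that low-loading items may still add predictive information.
    """
    report = {}
    for item, loading in loadings.items():
        shared_variance = loading ** 2  # variance shared with the construct
        report[item] = (loading, shared_variance, loading >= LOADING_THRESHOLD)
    return report


# Hypothetical items and loadings, invented for illustration only.
example = {"ATT1": 0.82, "ATT2": 0.76, "ATT3": 0.61}
for item, (loading, shared, ok) in screen_loadings(example).items():
    status = "meets 0.707" if ok else "review (do not drop automatically)"
    print(f"{item}: loading={loading:.2f}, shared variance={shared:.2f}, {status}")
```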

(ii) Construct Reliability

The construct reliability assessment evaluates the degree to which a variable, or a set of variables, is consistent in what it intends to measure (Straub et al., 2004). Construct reliability is established by examining the composite reliability, a measure of internal consistency developed by Werts et al. (1974) that is applicable to latent variables with reflective indicators (Chin, 1998b). The internal consistency of a given block of reflective measures can therefore be evaluated by calculating its composite reliability (Werts et al., 1974), which can be generated through the bootstrapping resampling procedure. Composite reliability is defined as follows:

$$\rho_c = \frac{\left(\sum_i \lambda_i\right)^2 F}{\left(\sum_i \lambda_i\right)^2 F + \sum_i \theta_{ii}}$$

where $\lambda_i$, $F$ and $\theta_{ii}$ are the factor loading, factor variance, and unique/error variance, respectively. If $F$ is set at 1, then $\theta_{ii}$ is 1 minus the square of $\lambda_i$.

Although this measure is similar to Cronbach's alpha, it does not assume that all indicators are equally weighted (Chin, 1998b). Values larger than 0.6 are considered acceptable (Bagozzi & Yi, 1988).
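A minimal Python sketch of the composite reliability formula follows. It assumes standardised indicators with F fixed at 1, so that each error variance defaults to 1 minus the squared loading; the loadings are invented for illustration.

```python
def composite_reliability(loadings, factor_variance=1.0, error_variances=None):
    """Composite reliability (rho_c) for one block of reflective indicators.

    rho_c = (sum of loadings)^2 * F / ((sum of loadings)^2 * F + sum of theta)
    If error variances are not supplied, they default to 1 - loading^2,
    which is only valid for standardised indicators with F fixed at 1.
    """
    if error_variances is None:
        error_variances = [1.0 - lam ** 2 for lam in loadings]
    true_score = sum(loadings) ** 2 * factor_variance
    return true_score / (true_score + sum(error_variances))


rho_c = composite_reliability([0.82, 0.76, 0.61])  # hypothetical loadings
print(f"rho_c = {rho_c:.3f}:", "acceptable" if rho_c > 0.6 else "below 0.6")
```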

(iii) Convergent Validity

The Average Variance Extracted (AVE) is commonly used to measure convergent validity for reflective measures (Fornell & Larcker, 1981; and Gotz et al., 2010). The AVE attempts to measure the amount of variance that a latent variable captures from its indicators, relative to the amount due to measurement error (Chin, 1998b). Arguably, this ratio is more conservative than composite reliability and is only applicable to constructs with reflective indicators. AVE is calculated as follows:

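$$\mathrm{AVE} = \frac{\sum_i \lambda_i^2 F}{\sum_i \lambda_i^2 F + \sum_i \theta_{ii}}$$

where $\lambda_i$, $F$ and $\theta_{ii}$ are the factor loading, factor variance, and unique/error variance, respectively. If $F$ is set at 1, then $\theta_{ii}$ is 1 minus the square of $\lambda_i$.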

AVE values should be greater than 0.50, indicating that the construct accounts for at least 50 percent of the variance in its indicators (Bagozzi & Yi, 1988; Chin, 1998b; Chin & Dibbern, 2010; and Barroso et al., 2010).
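The AVE formula can be sketched the same way in Python, again assuming standardised indicators and F = 1; under that assumption the AVE reduces to the mean squared loading. The loadings are hypothetical.

```python
def average_variance_extracted(loadings, factor_variance=1.0, error_variances=None):
    """AVE = sum(lambda^2) * F / (sum(lambda^2) * F + sum(theta)).

    With standardised indicators and F = 1 (theta = 1 - lambda^2), this
    reduces to the mean of the squared loadings.
    """
    if error_variances is None:
        error_variances = [1.0 - lam ** 2 for lam in loadings]
    explained = sum(lam ** 2 for lam in loadings) * factor_variance
    return explained / (explained + sum(error_variances))


ave = average_variance_extracted([0.82, 0.76, 0.61])  # hypothetical loadings
print(f"AVE = {ave:.3f}:", "above 0.50" if ave > 0.5 else "at or below 0.50")
```

For these hypothetical loadings the AVE is about 0.54, noticeably lower than their composite reliability of about 0.78, which reflects the more conservative nature of the AVE.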

(iv) Discriminant Validity (Cross Loadings and Square Root of Average Variance Extracted)

Discriminant validity demonstrates the extent to which a given construct differs from other constructs (Barroso et al., 2010). Discriminant validity is established when each measurement item correlates weakly with all other constructs except for the one to which it is theoretically associated (Gefen & Straub, 2005). Discriminant validity is assessed in two ways: the first is by examining how each item relates to the latent constructs (cross loadings), and the second is by comparing the square root of the AVE values with the correlations among constructs.

Cross loadings are derived by correlating the component scores of each latent variable with both its own block of indicators and all other items included in the model (Chin, 1998b). The correlations of the latent variable scores with the measurement items must show an appropriate pattern of loadings, in which the measurement items load highly on their theoretically assigned construct and not highly on other factors (Gefen & Straub, 2005). There is currently no widely accepted threshold for establishing discriminant validity; however, it is commonly accepted that the loading of each measurement item on its assigned latent variable should be larger than any of its other loadings (Gefen & Straub, 2005). Chin (1998b) further notes that any indicator that loads more highly on another latent variable than on the one it is intended to measure should be considered for elimination.
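The cross-loading rule can be expressed as a simple check over a table of item-to-construct correlations. In this Python sketch the construct labels (ATT, SN) and all correlations are hypothetical; an item is flagged when any other construct's correlation equals or exceeds its loading on its assigned construct.

```python
def check_cross_loadings(cross_loadings, assignment):
    """Return items whose loading on their assigned construct is not the highest.

    cross_loadings: {item: {construct: correlation of item with LV score}}
    assignment: {item: construct the item is intended to measure}
    """
    flagged = []
    for item, row in cross_loadings.items():
        own = row[assignment[item]]
        others = [r for construct, r in row.items() if construct != assignment[item]]
        if others and max(others) >= own:
            flagged.append(item)  # candidate for elimination (Chin, 1998b)
    return flagged


# Hypothetical cross-loading table for two constructs.
table = {
    "ATT1": {"ATT": 0.84, "SN": 0.31},
    "ATT2": {"ATT": 0.79, "SN": 0.40},
    "SN1": {"ATT": 0.28, "SN": 0.81},
    "SN2": {"ATT": 0.58, "SN": 0.52},
}
items = {"ATT1": "ATT", "ATT2": "ATT", "SN1": "SN", "SN2": "SN"}
print(check_cross_loadings(table, items))
```

Here only SN2 is flagged, because it correlates more strongly with ATT (0.58) than with its own construct (0.52).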

The square root of the Average Variance Extracted is another approach for establishing discriminant validity (Fornell & Larcker, 1981). Under this criterion, the square root of a construct's AVE, which for standardised indicators is the root mean square of its loadings, should be larger than the construct's correlations with the other latent variables (Gefen & Straub, 2005), and the AVE itself should be at least 0.50 (Fornell & Larcker, 1981).
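The square-root-of-AVE comparison can be sketched in Python as follows. The AVE values and inter-construct correlations are hypothetical; the function reports construct pairs that fail the comparison and constructs whose AVE falls below 0.50.

```python
import math


def fornell_larcker(ave, correlations):
    """Check the square-root-of-AVE criterion.

    ave: {construct: AVE}
    correlations: {(a, b): correlation between constructs a and b}
    Returns (pairs violating the criterion, constructs with AVE below 0.50).
    """
    violations = []
    for (a, b), r in correlations.items():
        # Each construct's sqrt(AVE) must exceed its correlation with the other
        if math.sqrt(ave[a]) <= abs(r) or math.sqrt(ave[b]) <= abs(r):
            violations.append((a, b))
    low_ave = [c for c, value in ave.items() if value < 0.50]
    return violations, low_ave


ave = {"ATT": 0.62, "SN": 0.55, "PBC": 0.48}  # hypothetical values
corr = {("ATT", "SN"): 0.41, ("ATT", "PBC"): 0.72, ("SN", "PBC"): 0.30}
print(fornell_larcker(ave, corr))
```

In this made-up example the ATT-PBC pair fails, since the square root of PBC's AVE (about 0.69) is below their correlation of 0.72, and PBC is also flagged for an AVE below 0.50.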

(b) Evaluating the Structural Model

The main aim of evaluating the structural model is to test the model's predictive power and the stability of its estimates. Given the unsuitability of traditional parametric techniques for evaluating PLS models, nonparametric, prediction-oriented measures are needed. These include using R² to assess the explanatory power of the model for each endogenous construct, and examining the effect size to assess whether a predictor variable has a substantive influence on the dependent variable. In addition, the global goodness of fit index was used to evaluate the overall fit of the model.

(i) R-square (R²)

The R² value for each dependent (endogenous) construct in the PLS model represents the proportion of variance in that construct that is explained by the model. R² values generated by PLS are interpreted in the same way as R² values derived from traditional regression analysis.

R² is a normalised measure that takes values between 0 and 1. There are no generally accepted guidelines for an acceptable threshold value of R²; whether a given coefficient of determination is deemed acceptable depends on the individual study. In general, however, the larger R² is, the larger the percentage of variance explained.
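Because the inner model in PLS is estimated through ordinary least squares regressions among latent variable scores, R² for an endogenous construct can be illustrated with a plain OLS fit. The dependency-free Python sketch below solves the normal equations directly; the latent variable scores in the usage example are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y, predictors):
    """R^2 of an OLS regression (with intercept) of y on predictor score columns."""
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations (X'X) beta = X'y
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


intention = [3.1, 2.4, 4.0, 3.6, 2.9, 4.4]  # hypothetical LV scores
attitude = [2.8, 2.0, 3.9, 3.1, 3.0, 4.2]
norms = [3.0, 2.9, 3.5, 3.8, 2.5, 4.0]
print(round(r_squared(intention, [attitude, norms]), 3))
```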

The effect size, f², is used to assess whether a predictor variable has a substantive influence on the dependent variable. The f² value reflects the change in the R² of the dependent variable when a predictor latent variable is included in or omitted from the structural equation. The effect size is calculated as follows:

$$f^2 = \frac{R^2_{\text{included}} - R^2_{\text{excluded}}}{1 - R^2_{\text{included}}}$$

where $R^2_{\text{included}}$ is the R² of the dependent variable when the independent variable is included, and $R^2_{\text{excluded}}$ is the R² of the dependent variable when the independent variable is excluded.

A higher f² value indicates a greater influence of the predictor variable on the dependent variable. Effect sizes of 0.02, 0.15 and 0.35 indicate a small, medium and large influence of the predictor variable, respectively (Cohen, 1988). A small f², however, does not necessarily imply an unimportant effect (Wilson, 2010). In the present study, a number of sub-models were created, each omitting one path, in order to test the effect size of each path.
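The effect size formula and Cohen's (1988) benchmarks translate directly into code. The Python sketch below assumes the two R² values come from the full model and from a sub-model with one path omitted, as in the procedure described above; the values themselves are hypothetical.

```python
def effect_size(r2_included, r2_excluded):
    """Cohen's f^2 for one predictor: change in R^2 relative to 1 - R^2_included."""
    return (r2_included - r2_excluded) / (1.0 - r2_included)


def describe_effect(f2):
    """Label f^2 using Cohen's (1988) 0.02 / 0.15 / 0.35 benchmarks."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "below small"


# Hypothetical R^2 values from a full model and a sub-model omitting one path.
f2 = effect_size(r2_included=0.45, r2_excluded=0.38)
print(f"f2 = {f2:.3f} ({describe_effect(f2)})")
```

With these hypothetical values, f² = 0.07 / 0.55, or about 0.13, which falls in the small range.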

(ii) Path Coefficients

The path coefficients of the PLS structural model are interpreted in a similar manner to standardised regression coefficients (Fornell & Cha, 1994; and Gefen et al., 2000). Path coefficients indicate the direction and strength of the relationships between the dependent and independent variables. The stability of the path estimates can be assessed through the PLS resampling techniques.
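To illustrate how path coefficients relate to standardised regression coefficients, the Python sketch below computes them for an endogenous construct with two predictors by solving the normal equations in correlation form. The two-predictor restriction and the example scores are simplifying assumptions; PLS software estimates these coefficients as part of its own algorithm.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardised_paths(y, x1, x2):
    """Standardised coefficients for y regressed on two predictors.

    Solves the 2 x 2 normal equations in correlation form: beta = R_xx^-1 r_xy.
    """
    r1y, r2y, r12 = pearson(x1, y), pearson(x2, y), pearson(x1, x2)
    denom = 1.0 - r12 ** 2
    return (r1y - r12 * r2y) / denom, (r2y - r12 * r1y) / denom


intention = [3.1, 2.4, 4.0, 3.6, 2.9, 4.4]  # hypothetical LV scores
attitude = [2.8, 2.0, 3.9, 3.1, 3.0, 4.2]
norms = [3.0, 2.9, 3.5, 3.8, 2.5, 4.0]
b_att, b_sn = standardised_paths(intention, attitude, norms)
print(f"attitude -> intention: {b_att:.3f}, norms -> intention: {b_sn:.3f}")
```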

(c) Resampling Techniques

(i) Q-Square Predictive Relevance (Blindfolding)

The predictive sample reuse technique, as developed by Stone (1974) and Geisser (1975), can also be applied to test the model's predictive validity (Chin, 1998b; 2010). The technique combines cross-validation and function fitting, from the perspective that the prediction of observables or potential observables is of much greater relevance than the estimation of what are often artificial construct-parameters (Geisser, 1975). In PLS, this test is carried out with the blindfolding procedure, which omits part of the data for a particular block of indicators during parameter estimation and then attempts to estimate the omitted part using the estimated parameters. This procedure is repeated until every data point has been omitted and estimated. The predictive measure for the block is:

$$Q^2 = 1 - \frac{\sum_D E_D}{\sum_D O_D}$$

where D is the omission distance; E is the sum of squares of prediction errors; and O is the sum of squares of prediction errors when the mean is used for prediction. Q² measures how well the observed values are reconstructed by the model and its parameters. If Q² is greater than 0, the model is considered to have predictive relevance, whereas a Q² of less than 0 indicates a lack of predictive relevance (Chin, 1998b; 2010).


The blindfolding procedure generates two forms of Q²: the cross-validated communality Q² and the cross-validated redundancy Q² (Fornell & Cha, 1994; and Chin, 1998b; 2010). The cross-validated redundancy measures the ability of the model to predict the endogenous manifest variables using the latent variables that predict the block in question, and serves as an indicator of the quality of the structural model (Tenenhaus et al., 2005). The cross-validated communality measures the ability of the path model to predict the manifest variables or data points from their own latent variable score, and serves as an indicator of the quality of the measurement model. Chin (1998b) suggests using the cross-validated redundancy measure to evaluate the predictive relevance of the theoretical/structural model. An omission distance of between 5 and 10 is considered sufficient for blindfolding (Wold, 1982; and Chin, 1998b; 2010).
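The blindfolding logic can be sketched for a single reflective block in Python. This is a deliberately simplified, hypothetical version of the cross-validated communality: each round omits every D-th data point (D = 7, within the 5 to 10 range noted above), predicts the omitted values from a latent score built from the remaining data, and accumulates E (model prediction errors) and O (errors from predicting with the mean). Real PLS software re-estimates the full PLS model in each round; the unweighted-mean latent score and the simulated data are assumptions made to keep the example short.

```python
import random


def blindfold_q2(data, omission_distance=7):
    """Toy cross-validated communality Q^2 for one reflective block.

    data: list of cases, each a list of (roughly standardised) indicator values.
    Simplifying assumption: a case's latent score is the unweighted mean of
    its non-omitted indicators, not an iteratively estimated PLS score.
    """
    n, p = len(data), len(data[0])
    sum_e = sum_o = 0.0  # E_D and O_D accumulated over the D rounds
    for d in range(omission_distance):
        # Omit every D-th data point, counting across the case-by-indicator grid
        omitted = {(i, j) for i in range(n) for j in range(p)
                   if (i * p + j) % omission_distance == d}
        scores = []
        for i in range(n):
            kept = [data[i][j] for j in range(p) if (i, j) not in omitted]
            scores.append(sum(kept) / len(kept) if kept else 0.0)
        for j in range(p):
            kept_rows = [i for i in range(n) if (i, j) not in omitted]
            # Loading: no-intercept slope of indicator j on the latent score
            sxs = sum(data[i][j] * scores[i] for i in kept_rows)
            sss = sum(scores[i] ** 2 for i in kept_rows)
            loading = sxs / sss if sss else 0.0
            column_mean = sum(data[i][j] for i in kept_rows) / len(kept_rows)
            for i in range(n):
                if (i, j) in omitted:
                    x = data[i][j]
                    sum_e += (x - loading * scores[i]) ** 2  # model prediction
                    sum_o += (x - column_mean) ** 2  # trivial mean prediction
    return 1.0 - sum_e / sum_o


def simulate_block(n=200, loadings=(0.8, 0.8, 0.8), seed=1):
    """Hypothetical one-factor data: x = loading * factor + noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        factor = rng.gauss(0, 1)
        rows.append([lam * factor + (1 - lam ** 2) ** 0.5 * rng.gauss(0, 1)
                     for lam in loadings])
    return rows


print(round(blindfold_q2(simulate_block()), 3))  # strongly related indicators
print(round(blindfold_q2(simulate_block(loadings=(0.0, 0.0, 0.0))), 3))  # unrelated
```

With strongly related simulated indicators Q² comes out positive, whereas unrelated indicators yield a negative Q², matching the interpretation given above.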

(ii) Jackknifing

The jackknife is an inferential technique that assesses the variability of a statistic by examining the variability of the sample data rather than by relying on parametric assumptions (Chin, 1998b). It provides estimates of the statistic, corrects for bias in those estimates, and can be used to develop robust confidence intervals. The procedure deletes a small number of cases at a time, typically one, recalculates the parameter estimates for each reduced sample, and analyses the variation in those estimates. The jackknife, however, is viewed as less efficient than the bootstrap, as it can be considered an approximation to the bootstrap (Efron & Tibshirani, 1993; and Chin, 1998b).
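A leave-one-out jackknife can be sketched in a few lines of Python. The statistic here is a simple mean over hypothetical values, but any function of a list of cases (for example, a path coefficient computed from case records) could be passed instead.

```python
import math


def jackknife(sample, statistic):
    """Leave-one-out jackknife: bias-corrected estimate, bias and standard error.

    sample: a list of cases (values or tuples); statistic: function of a list.
    """
    n = len(sample)
    full = statistic(sample)
    # Recompute the statistic with each case deleted in turn
    leave_one_out = [statistic(sample[:i] + sample[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    bias = (n - 1) * (mean_loo - full)
    se = math.sqrt((n - 1) / n * sum((t - mean_loo) ** 2 for t in leave_one_out))
    return full - bias, bias, se


def mean(values):
    return sum(values) / len(values)


estimate, bias, se = jackknife([2.9, 3.4, 4.1, 3.8, 2.7, 3.3], mean)  # hypothetical
print(f"estimate={estimate:.3f}, bias={bias:.3f}, se={se:.3f}")
```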

(iii) Bootstrapping

The bootstrap is a nonparametric technique for estimating the accuracy of the PLS estimates and is preferable to the less efficient jackknife approach (Chin, 1998b). The technique draws a large number of resamples in order to obtain one estimate of each model parameter per resample. Each resample is obtained by sampling with replacement from the original data set until it contains the same number of cases as the original sample. A number of approaches for estimating confidence intervals have been developed, but the two procedures available in PLS-Graph are the jackknife and the bootstrap. Because the jackknife is judged to be less efficient than the bootstrap, and is considered an approximation to it (Efron & Tibshirani, 1993), the current study will adopt the bootstrapping resampling technique.

Efron (1987) noted that 100 bootstrap iterations would suffice for estimating standard errors, but supported the use of 1,000 iterations for deriving good estimates of bootstrap confidence intervals. Efron and Tibshirani (1986) explain that confidence intervals are a more ambitious measure of statistical accuracy than standard errors and therefore require more computational effort.[107] Most recent studies on bootstrapping techniques suggest the use of 1,000 resamples (Chernick, 2008). This study will therefore use 1,000 resamples for the bootstrapping procedure, in an attempt to improve the accuracy of the models' confidence intervals and standard error estimates.
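The bootstrap procedure with 1,000 resamples can be sketched in Python as follows. The statistic and data are hypothetical; the interval shown is the simple percentile interval, which is one of several bootstrap interval methods and not necessarily the one reported by PLS-Graph. The ratio of an estimate to its bootstrap standard error is the t-value commonly used to judge path significance in PLS.

```python
import random


def bootstrap(cases, statistic, resamples=1000, level=0.95, seed=2024):
    """Nonparametric bootstrap: standard error and percentile interval.

    Each resample draws len(cases) cases with replacement from the data.
    """
    rng = random.Random(seed)
    n = len(cases)
    estimates = []
    for _ in range(resamples):
        resample = [cases[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    centre = sum(estimates) / resamples
    se = (sum((e - centre) ** 2 for e in estimates) / (resamples - 1)) ** 0.5
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * resamples)]
    upper = estimates[int((1 + level) / 2 * resamples) - 1]
    return statistic(cases), se, (lower, upper)


def mean(values):
    return sum(values) / len(values)


estimate, se, (lo, hi) = bootstrap(list(range(1, 101)), mean)  # hypothetical data
print(f"estimate={estimate}, se={se:.3f}, 95% interval=({lo:.2f}, {hi:.2f}), "
      f"t={estimate / se:.2f}")
```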

(d) Overall Model Validation

(i) Goodness of Fit Index

PLS does not optimise any global scalar function and, therefore, lacks an index for evaluating the overall quality of the model (Duarte & Raposo, 2010). To overcome this shortcoming, Tenenhaus et al. (2004) proposed a global goodness of fit (GoF) criterion for validating the PLS model globally. The GoF index takes into account the model's performance in both the measurement and the structural model, thus providing a single measure of the overall prediction performance of the model (Esposito Vinzi et al., 2010). The GoF is the geometric mean of the average communality and the average R², that is, the square root of their product:

$$\mathrm{GoF} = \sqrt{\overline{\text{communality}} \times \overline{R^2}}$$

where the average communality is computed as a weighted average of the construct communalities, with the number of manifest variables (indicators) of each construct as weights, and the average R² is the mean R² of the endogenous constructs.

The first term of the formula measures the quality of the outer model and the second measures the quality of the inner model. Only constructs with multiple indicators should be included in the computation of the average communality; single-indicator constructs should be excluded because their communalities are equal to 1 (Tenenhaus et al., 2005). Further, the GoF is considered more appropriate for reflective models (Esposito Vinzi et al., 2010).
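The GoF computation, including the exclusion of single-indicator constructs, can be sketched in Python. The construct labels, loadings and R² value are hypothetical. Weighting each construct's average communality by its number of indicators is equivalent to averaging the squared loadings over all indicators, which is what the code does.

```python
def goodness_of_fit(outer_loadings, r2_endogenous):
    """GoF = sqrt(average communality * average R^2).

    outer_loadings: {construct: list of standardised loadings}
    r2_endogenous: R^2 values of the endogenous constructs
    Single-indicator constructs are skipped, since their communality is 1.
    """
    multi = [lams for lams in outer_loadings.values() if len(lams) > 1]
    n_items = sum(len(lams) for lams in multi)
    # Indicator-weighted mean of construct communalities equals the mean
    # squared loading across all indicators of multi-item constructs.
    avg_communality = sum(lam ** 2 for lams in multi for lam in lams) / n_items
    avg_r2 = sum(r2_endogenous) / len(r2_endogenous)
    return (avg_communality * avg_r2) ** 0.5


loadings = {
    "ATT": [0.82, 0.76, 0.61],  # hypothetical constructs and loadings
    "SN": [0.80, 0.84],
    "INT": [0.90, 0.83, 0.85],
}
print(round(goodness_of_fit(loadings, r2_endogenous=[0.45]), 3))
```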

The GoF index ranges between 0 and 1; however, there is no inference-based threshold for judging the statistical significance of its values (Esposito Vinzi et al., 2010). Nor are there clear guidelines on threshold values, although recent studies suggest that a GoF index of approximately 0.3 is acceptable (Duarte & Raposo, 2010; and Tenenhaus et al., 2005).[108] Chin (2009) likewise considered a GoF of 0.3 to be adequate.

[107] Advances in modern technology mean that researchers are not limited by computational speed when determining the number of iterations in a bootstrapping procedure.

All constructs in the research model have multiple reflective indicators, which makes the model suitable for calculating the GoF statistic. The GoF index was therefore computed to measure the fit of the combined measurement and structural model. This single index is more efficient than applying the two separate Q² tests (the cross-validated communality and redundancy tests) prescribed for the Stone-Geisser test (Stone, 1974; and Geisser, 1975).

5.6 SUMMARY

This chapter introduced the research methodology and research design adopted for this study. The key objectives were to examine the influence of beliefs and attitudes towards paying tax, to test the applicability of the TPB model, and to provide justification for the use of SEM with PLS in tax compliance research.

The development of the questionnaire relating to the TPB model followed the guidelines established by Ajzen (1991), and the remaining questions were based on prior literature. The design of the survey instrument was influenced by Dillman's (1978; 2000) 'tailored design method'. Both the mail and web-based surveys were self-administered to randomly selected taxpayers, tax agents and tax lawyers.

This chapter also presented a description of the analytical methodology, which included details of the approaches used to address missing data, nonresponse bias, sample representativeness, and the descriptive analysis proposed for the selected demographic and study variables.

Next, the SEM with PLS approach was introduced, followed by a discussion of the key differences between the PLS and CBSEM methods. Reasons were also provided to justify the use of PLS (as opposed to CBSEM) for analysing the survey data. The validation process for the measurement models, which included a number of reliability and validity tests, was discussed. The process involved testing for indicator reliability, construct reliability, convergent validity and discriminant validity. The methods applied to evaluate the structural models were also discussed; these included estimating the path coefficients and the variance explained (R²) for each endogenous construct in the model. These two measures were used to evaluate the predictiveness of the survey models. The bootstrapping method with 1,000 iterations (resamples) was used as the resampling technique.

[108] Duarte and Raposo (2010) obtained a GoF index of 0.3814, whereas Tenenhaus et al. (2005) obtained a GoF index of 0.4645.

Finally, the computation of a GoF index was discussed; this nonparametric measure evaluates the overall performance of the research model developed to test the hypotheses established in Chapter 4. The next chapter presents the results of the preliminary analysis.


In document The application of the theory of planned behaviour and structural equation modelling in tax compliance behaviour: a New Zealand study (Page 151-161)