Assumptions in PLS Path Modelling - Explanatory Stage Data Preparation and Analysis


3.5 Confirmatory Research Phase

3.5.4 Explanatory Stage Data Preparation and Analysis

3.5.4.2 Assumptions in PLS Path Modelling

Before examining the findings, it is important to provide an overview of the PLS technique used and its underlying assumptions, as well as the particulars of the SmartPLS program used for the analysis. As mentioned before, the literature identifies several distinguishing features of PLS path modelling as an analysis technique: less stringent distribution assumptions (Fornell and Bookstein, 1982) and sample size requirements (Chin and Newsted, 1999), greater robustness when working with complex models containing many latent variables and measurement items (Henseler, 2009), and the ability to work with formative latent variables. These considerations, together with factors particular to the SmartPLS program used for data analysis and interpretation, are discussed below.

Complex models

PLS path modelling is a predictive process which can handle many independent variables (Hubona, 2010) and was thus an appropriate choice for the analysis of this model. Hubona states that with very complex models PLS has an advantage, because it breaks the model down into its individual components (a succession of partial models) and estimates the specific relationship between one pair of variables before moving on to assess the next relationship.

Covariance-based techniques are less flexible when analyzing complex models, whereas PLS may retain power and reliability beyond that obtained from covariance-based analysis.
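The idea of a "succession of partial models" can be illustrated for the final, structural step of PLS. The sketch below assumes that standardized latent variable scores have already been computed (in full PLS the outer weights are first estimated iteratively, which is omitted here), so each endogenous construct's path coefficients are standardized regression weights obtained from the latent variable correlations in its own, separate regression. The construct names and correlations are hypothetical and are not results of this study.

```python
# Hypothetical sketch: the inner (structural) model estimated as a succession of
# partial regressions, one per endogenous construct, from LV score correlations.


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def path_coefficients(corr, inner_model):
    """Estimate each endogenous construct's paths in its own partial regression.

    corr: dict mapping frozenset({a, b}) -> correlation between LV scores a and b
    inner_model: dict mapping endogenous LV -> list of its predictor LVs
    Returns dict: endogenous LV -> (dict predictor -> path coefficient, R squared)
    """
    def r(a, b):
        return 1.0 if a == b else corr[frozenset((a, b))]

    results = {}
    for target, predictors in inner_model.items():
        rxx = [[r(p, q) for q in predictors] for p in predictors]  # predictor correlations
        rxy = [r(p, target) for p in predictors]  # predictor-target correlations
        betas = solve(rxx, rxy)  # standardized regression weights
        r_squared = sum(beta * ry for beta, ry in zip(betas, rxy))
        results[target] = (dict(zip(predictors, betas)), r_squared)
    return results


if __name__ == "__main__":
    corr = {
        frozenset(("Quality", "Satisfaction")): 0.6,
        frozenset(("Quality", "Loyalty")): 0.4,
        frozenset(("Satisfaction", "Loyalty")): 0.5,
    }
    model = {"Satisfaction": ["Quality"], "Loyalty": ["Quality", "Satisfaction"]}
    for target, (paths, r2) in path_coefficients(corr, model).items():
        print(target, paths, round(r2, 3))
```

Each endogenous construct is handled independently, so adding constructs adds more small regressions rather than one large simultaneous system.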

Predictive capacity of PLS

Whilst PLS has strong predictive capacity and can be used in theory development, it has also been widely applied in research as a theory confirmation technique (Hubona, 2010).

Henseler and Fassott (2010) write that, as PLS path modelling does not rely on distributional assumptions, direct inferential statistical tests of model fit are not available. Vinzi, Trinchera and Amato (2010) write that, whereas there is no overall fit index in PLS path modelling, a global goodness-of-fit criterion, the GoF index, has been proposed by Tenenhaus et al. (2004). However, use of this index is not yet common practice. PLS maximizes the explained variance of the endogenous variables; as such, it is designed to explain variance, that is, to examine the significance of the relationships and the resulting R², as in linear regression (Gefen et al., 2000).
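As a sketch of the GoF index mentioned above: in the formulation usually attributed to Tenenhaus et al., GoF is the geometric mean of the average communality of the reflective measurement blocks and the average R² of the endogenous constructs. The function below assumes communalities are taken as squared standardized loadings averaged over all indicators; the construct names, loadings and R² values are hypothetical.

```python
from math import sqrt
from statistics import mean


def gof(loadings, r_squared):
    """Global goodness-of-fit index: sqrt(mean communality * mean R squared).

    loadings: dict mapping each reflective construct to its standardized outer loadings
    r_squared: list of R squared values of the endogenous constructs
    """
    # Average communality over all indicators (block communalities weighted
    # by the number of indicators in each block).
    communalities = [lam ** 2 for block in loadings.values() for lam in block]
    return sqrt(mean(communalities) * mean(r_squared))


if __name__ == "__main__":
    blocks = {"Trust": [0.8, 0.8], "Loyalty": [0.8, 0.8, 0.8]}  # hypothetical
    print(round(gof(blocks, [0.25, 0.25]), 3))  # 0.4
```

Because the index mixes measurement quality (communality) and structural explanatory power (R²), a low value does not by itself show which part of the model is weak.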

Multicollinearity

Hubona (2010) also states that, because of the orthogonal analysis structure of PLS, multicollinearity amongst items is less of a problem, but it should still be assessed and avoided at the beginning of the analysis procedure (predictor latent variables should share less than 50 per cent of their variance). Ideally, predictor variables should not be correlated at all, although some correlation is common in practice. As discussed previously, exploratory factor analysis was performed on the moderating community dimensions in order to refine the scale and prevent problems of multicollinearity within these constructs. Whilst collinearity is desirable within a block of items measuring one construct, measurement items should not correlate highly with other latent constructs (Hubona, 2010), as this clouds the results and makes it difficult to interpret a clear cause-and-effect relationship. Cross-correlations between unrelated latent variables should therefore be kept to a minimum.

Distribution Assumptions

Additionally, whereas covariance-based SEM techniques use maximum likelihood estimation and require normally distributed data, PLS is a “distribution free” approach, in that it does not assume that the data are normally distributed. As the distribution of the data is not known (or assumed), conventional inferential techniques that require a normal distribution, such as parametric significance tests and confidence intervals, are not available. Instead, to provide t-values and levels of significance, SmartPLS uses bootstrapping, a method of resampling within the data set. This indicates whether an estimate, such as a path coefficient or an item loading, is statistically significantly different from zero.

Standardized values

SmartPLS additionally standardizes all values so that they have a mean of zero and a standard deviation (and variance) of 1. This has the advantage that items measured on different scales are all standardized, so path coefficients within the model can be compared, which makes model interpretation easier (Hubona, 2010). However, SmartPLS also provides the capability to show unstandardized coefficients, as these may be important when comparing variables from different samples. The PLS method also provides better path estimates when using interval data such as a Likert scale (Hubona, 2010), as these contain more information about variance than, for example, a categorical scale with fewer groupings. Hence, the use of PLS for Likert data such as those used in this study is supported in the literature.
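A minimal sketch of the standardization step, assuming the population standard deviation (divisor n) so that the rescaled values have variance exactly 1; whether SmartPLS divides by n or n - 1 is not stated here. The hypothetical example shows that an item on a 1 to 5 scale and one on a 10 to 50 scale become identical once standardized, which is why standardized coefficients can be compared.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Rescale values to z-scores with mean 0 and (population) variance 1."""
    m = fmean(values)
    sd = pstdev(values, m)  # population standard deviation around the mean
    return [(v - m) / sd for v in values]


if __name__ == "__main__":
    likert = [1, 2, 3, 4, 5]  # hypothetical 5-point item
    scaled = [10, 20, 30, 40, 50]  # the same pattern on a different scale
    z1, z2 = standardize(likert), standardize(scaled)
    print(all(abs(a - b) < 1e-12 for a, b in zip(z1, z2)))  # True
```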

Sample Size and Power

An additional advantage of the PLS technique is its ability to work with smaller sample sizes. Chin (1998) indicates that an appropriate sample size can be determined by multiplying the maximum number of measurement items for a latent construct by ten (e.g. if the highest number of items used to measure any latent construct in the model is 3, the minimum sample size is 30). This rule of thumb has more recently been criticized, because estimates obtained from such small samples are less reliable. It is important to take into consideration the statistical power needed to detect the effect being measured and to determine the appropriate sample size on this basis (Hubona, 2010). In general, the smaller the sample size, the less reliable the path estimates will be.
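The rule of thumb described above reduces to a one-line calculation. The construct names and item counts below are hypothetical; as noted, the rule gives only a lower bound and does not take statistical power into account.

```python
def ten_times_minimum(items_per_construct):
    """Minimum sample size: ten times the largest number of items measuring one construct."""
    return 10 * max(items_per_construct.values())


if __name__ == "__main__":
    model = {"Trust": 3, "Satisfaction": 2, "Loyalty": 3}  # hypothetical item counts
    print(ten_times_minimum(model))  # 30, matching the example above
```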

Statistical power is the probability that a test will detect an effect of a given size (that is, correctly reject a false null hypothesis) when that effect truly exists in the population. A power of 80 percent is generally accepted, meaning that if an effect of the specified size exists, the analysis has an 80 percent chance of detecting it as significant. In this research, the software program G*Power was used to determine the sample size necessary to detect a medium effect, and the appropriate size was determined to be 96.
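The G*Power settings behind the figure of 96 (test family, number of predictors, significance level) are not given in this section, so the sketch below does not attempt to reproduce it. It only illustrates how power and sample size relate, assuming a two-tailed test of a single correlation at α = .05 with Cohen's medium effect size r = .30, and using the Fisher z normal approximation rather than G*Power's exact computation.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def power_correlation(r, n, alpha=0.05):
    """Approximate power of a two-tailed test of H0: rho = 0 via Fisher's z."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = atanh(r) * sqrt(n - 3)  # expected z statistic under the alternative
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def n_for_power(r, power=0.80, alpha=0.05):
    """Smallest n whose approximate power reaches the target (upper tail only)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(((z_crit + z_power) / atanh(r)) ** 2 + 3)


if __name__ == "__main__":
    print(n_for_power(0.3))  # 85
    print(round(power_correlation(0.3, 96), 2))  # 0.85
```

Under these assumptions roughly 85 cases give 80 percent power, and a sample of 96 gives about 85 percent; an analysis based on a regression with several predictors would give different figures.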

The objective of PLS is, overall, the same as that of linear regression: to show high R² and significant t-values and to reject the null hypothesis of no effect (Thompson et al., 1995).

The objective of covariance-based SEM, on the other hand, is to show that the null hypothesis (the assumed research model with all its paths) is insignificant, meaning that the complete set of paths as specified in the model being analyzed is plausible given the sample data (Gefen et al., 2000). Additionally, goodness-of-fit tests such as chi-square are not available with PLS, so alternative methods must be used. Good model fit is established by significant path coefficients, acceptably high R² and internal consistency (construct reliability) above .70 for each construct (Thompson et al., 1995; Gefen et al., 2000).
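Internal consistency in PLS is commonly reported as composite reliability, computed from the standardized outer loadings λ of a reflective block as (Σλ)² / ((Σλ)² + Σ(1 − λ²)); the same loadings give the AVE, the mean of λ², used for discriminant validity. Which reliability coefficient this study reports is an assumption here, and the loadings below are hypothetical.

```python
def composite_reliability(loadings):
    """Composite reliability of one reflective block from standardized loadings."""
    total = sum(loadings)
    error = sum(1 - lam ** 2 for lam in loadings)  # indicator error variances
    return total ** 2 / (total ** 2 + error)


def average_variance_extracted(loadings):
    """AVE: mean squared standardized loading of the block."""
    return sum(lam ** 2 for lam in loadings) / len(loadings)


if __name__ == "__main__":
    block = [0.8, 0.8, 0.8]  # hypothetical loadings
    print(all(lam > 0.707 for lam in block))  # True: each item shares over half its variance
    print(round(composite_reliability(block), 3))  # 0.842, above the .70 guideline
    print(round(average_variance_extracted(block), 2))  # 0.64
```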

Gefen et al. (2000) further state that convergent and discriminant validity are ensured by checking that the AVE of each construct is larger than its correlation with the other constructs, and that each item has a higher loading (calculated as the correlation between the factor scores and the standardized measures) on its assigned construct than on the other constructs.
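Both checks can be sketched as simple comparisons. One assumption should be noted: the sketch uses the widely cited Fornell-Larcker form of the first check, in which the square root of each construct's AVE (equivalently, the AVE compared with squared correlations) must exceed its correlations with the other constructs. The construct names, AVEs, correlations and loadings are hypothetical.

```python
from math import sqrt


def fornell_larcker(ave, lv_corr):
    """Check that sqrt(AVE) of each construct exceeds its correlations with all others.

    ave: dict construct -> AVE
    lv_corr: dict frozenset({a, b}) -> correlation between constructs a and b
    """
    result = {}
    for construct, value in ave.items():
        others = [abs(r) for pair, r in lv_corr.items() if construct in pair]
        result[construct] = all(sqrt(value) > r for r in others)
    return result


def cross_loadings_ok(loadings, assigned):
    """Check that each item loads more highly on its assigned construct than on any other.

    loadings: dict item -> dict construct -> loading
    assigned: dict item -> construct the item is meant to measure
    """
    result = {}
    for item, row in loadings.items():
        own = abs(row[assigned[item]])
        result[item] = all(own > abs(v) for c, v in row.items() if c != assigned[item])
    return result


if __name__ == "__main__":
    print(fornell_larcker({"A": 0.64, "B": 0.49}, {frozenset(("A", "B")): 0.75}))
    print(cross_loadings_ok(
        {"a1": {"A": 0.85, "B": 0.30}, "b1": {"A": 0.60, "B": 0.55}},
        {"a1": "A", "b1": "B"},
    ))
```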

The following tables 3.4 and 3.5 (adapted from Gefen et al., 2000) illustrate guidelines to assess the statistical validity of the model and constructs using PLS. Guidelines used for this research are highlighted in bold within the tables.

Validity | Technique | Heuristic

… | … | … >.90) and an insignificant χ², to show unidimensionality. Item loadings should be above .707, to show that over half the variance is captured by the latent construct (Chin, 1998; Hair et al., 1998; Segars, 1997; Thompson et al., 1995).

Discriminant validity | CFA used in covariance-based SEM | Comparing the χ² of the original model with an alternative model in which the constructs in question are united as one construct. If the χ² is significantly smaller in the original model, discriminant validity has been shown (Segars, 1997).

Convergent and discriminant validity | PLS | Each construct's AVE should be larger than its correlation with other constructs, and each item should load more highly on its assigned construct than on the other constructs.

Reliability: internal consistency | Cronbach's alpha | Cronbach's alphas should be above .60 for exploratory research and above .70 for confirmatory research (Nunnally, 1967; Nunnally, 1978; Nunnally and Bernstein, 1994; Peter, 1979).

Reliability: internal consistency | SEM | The internal consistency coefficient should be above .70 (Hair et al., 1998; Thompson et al., 1995).

… | … | … smaller χ² in the proposed measurement model in comparison with alternative measurement models (Segars, 1997).

Table 3.4 Comparison of Construct Validity and Reliability Guidelines
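The Cronbach's alpha heuristic in Table 3.4 can be computed from raw item scores as α = k/(k − 1) × (1 − Σ item variances / variance of the summed scale), where k is the number of items. The sketch below uses hypothetical 5-point responses and applies the .60 (exploratory) and .70 (confirmatory) cut-offs from the table.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of item columns over the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # summed scale score per respondent
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def meets_alpha_guideline(alpha, confirmatory=True):
    """Apply the Table 3.4 cut-off: .70 for confirmatory, .60 for exploratory research."""
    return alpha > (0.70 if confirmatory else 0.60)


if __name__ == "__main__":
    responses = [  # hypothetical: three items answered by five respondents
        [4, 5, 3, 4, 2],
        [4, 4, 3, 5, 2],
        [5, 5, 2, 4, 3],
    ]
    a = cronbach_alpha(responses)
    print(round(a, 3), meets_alpha_guideline(a))  # 0.886 True
```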

Validity | Technique | Heuristic

Model validity: AGFI | LISREL | AGFI > .80 (Segars and Grover, 1993)

Model validity: squared multiple correlations | LISREL, PLS | No official guidelines exist, but the larger these values, the better

Model validity: χ² | LISREL | Insignificant χ² and a χ² to degrees of freedom ratio of less than 3:1 (Chin and Todd, 1995; Hair et al., 1998)

Model validity: residuals | LISREL | RMR < .05 (Hair et al., 1998)

Model validity: NFI | LISREL | NFI > .90 (Hair et al., 1998)

Path validity: coefficients | LISREL | The β and γ coefficients must be significant; standardized values should be reported for comparison purposes (Bollen, 1989; Hair et al., 1998; Jöreskog and Sörbom, 1989)

Path validity: coefficients | PLS | Significant t-values (Thompson et al., 1995)

Path validity: coefficients | Linear regression | Significant t-values (Thompson et al., 1995)

Table 3.5 Comparison of Model Validity Guidelines
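The covariance-based model validity cut-offs in Table 3.5 can be expressed as simple checks. This is a sketch only: the fit statistics are hypothetical inputs of the kind a program such as LISREL reports, the significance of χ² itself is not tested (that would need a chi-square distribution function), and, as noted above, these indices are not available in PLS.

```python
def check_model_fit(chi2, df, agfi, rmr, nfi):
    """Compare covariance-based SEM fit statistics with the Table 3.5 cut-offs."""
    return {
        "chi2/df < 3": chi2 / df < 3,
        "AGFI > .80": agfi > 0.80,
        "RMR < .05": rmr < 0.05,
        "NFI > .90": nfi > 0.90,
    }


if __name__ == "__main__":
    # Hypothetical fit statistics for illustration only.
    for name, passed in check_model_fit(150.0, 60, 0.85, 0.04, 0.93).items():
        print(name, passed)
```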

In document Assessing determinants of customer loyalty in an online news service context (Page 178-183)