Data analysis - Questionnaire survey

CHAPTER 7 RESEARCH METHODOLOGY

7.5 Questionnaire survey

7.5.2 Data analysis

A simple analysis of the responses was first conducted by ranking the problems in terms of degree of occurrence and level of influence. The analysis also used rank correlation to examine whether perceptions differed between groups of respondents. Scores were therefore first converted to ranks. Based on the ranks, Spearman’s rank correlation coefficient was computed to test the strength of association between the rankings of respondent groups using the following formula (Tan, 2007):

$$r = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$

where $d_i$ is the difference between ranks for the ith characteristic and n is the number of characteristics ranked. For large samples (n > 20), a test of significance for r (with H0: the correlation is not significant) is given by (Tan, 2007):

$$z = r\sqrt{n - 1} \sim N(0, 1)$$
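As a minimal sketch of the rank correlation and its large-sample test, the following Python code converts two sets of scores to ranks, computes Spearman's coefficient and evaluates the z statistic. The function names and the example scores are hypothetical and are not taken from the survey data. Tied scores receive average ranks, for which the formula based on rank differences is only approximate.

```python
import math


def average_ranks(scores):
    """Rank scores from highest (rank 1) to lowest; ties share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score equals the current one (a tie).
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions in the run
        for j in range(pos, end + 1):
            ranks[order[j]] = avg
        pos = end + 1
    return ranks


def spearman_rs(ranks_a, ranks_b):
    """Spearman's coefficient from rank differences: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(ranks_a)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def z_statistic(r_s, n):
    """Large-sample test statistic, approximately N(0, 1) under H0 (intended for n > 20)."""
    return r_s * math.sqrt(n - 1)


# Hypothetical mean scores given to five problems by two respondent groups.
group_a_scores = [4.5, 3.8, 4.1, 2.9, 3.3]
group_b_scores = [4.2, 4.0, 3.9, 3.1, 2.7]

r_s = spearman_rs(average_ranks(group_a_scores), average_ranks(group_b_scores))
print(round(r_s, 3))  # 0.8
```

With only five items the z test would not be appropriate; `z_statistic` is meant for cases such as a ranking of more than 20 characteristics.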

Further data analysis in this study involved 35 variables and aimed to identify the key factors affecting the efficiency of government-funded infrastructure construction. The degree of criticality was considered first, so that factors not perceived as critical could be eliminated. The factors perceived as critical were then selected for further analysis of their degree of occurrence.

Data analysis is thus first performed to extract the problems that the respondents perceive as critical, based on their degree of criticality. In the next step, the relationships between the occurrences of the key problems and the outcome or effect are examined. Since the data analysis involved the simultaneous analysis of a large number of variables, the application of multivariate statistical techniques was considered. The selection of an appropriate multivariate technique depends on how the variables are classified, as shown by Hair et al. (2010). As there was no reason to divide the variables in this study into dependent and independent classifications, an interdependence technique such as factor analysis should be used.

Among interdependence techniques, factor analysis and structural equation modeling (SEM) are useful for studies focusing on variables rather than on cases or respondents. Factor analysis simplifies the variables by reducing a large number of variables into a smaller set of factors.

Meanwhile, SEM is more appropriate in analyzing the cause-effect relations between factors (Hair et al., 2010). As a result, in the first stage, factor analysis was used in this study to extract key causal factors limiting the efficient use of public investments in infrastructure. In the second stage, SEM was used to examine the relationships between these key causal factors and the outcomes; the dependent variable is a measure of the occurrence of the outcomes of the inefficient use of public investments in infrastructure, and the independent variables are measures of the occurrence of the factors. Measures of the outcomes and factors are thus the respondents’ ratings of the occurrence of the outcomes and factors.

Factor analysis is used to extract key factors from a large number of variables. With p observed variables, g common factors to be determined, and a unique factor δ for each variable, the factor analytic model can be written as:

$$X_1 = l_{11}F_1 + l_{12}F_2 + \dots + l_{1g}F_g + \delta_1$$

$$X_2 = l_{21}F_1 + l_{22}F_2 + \dots + l_{2g}F_g + \delta_2$$

$$\vdots$$

$$X_p = l_{p1}F_1 + l_{p2}F_2 + \dots + l_{pg}F_g + \delta_p$$

The coefficients $l_{ij}$ reflect the contribution of each factor to each variable and are therefore called factor loadings. In matrix form,

$$x = Lf + \delta$$

where x is a p × 1 vector with correlation matrix R, x ~ N(0, R) and Var(xi) = 1; L is a p × g matrix of factor loadings; f is a g × 1 vector of independent common factors, f ~ (0, I); δ is a p × 1 vector of unique factors, δ ~ (0, Ψ), with Var(δi) = ψi; and f and δ are independent. Hence,

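$$\mathrm{Var}(x_i) = \mathrm{Var}(l_{i1}F_1 + l_{i2}F_2 + \dots + l_{ig}F_g + \delta_i) = l_{i1}^2\,\mathrm{Var}(F_1) + \dots + l_{ig}^2\,\mathrm{Var}(F_g) + \mathrm{Var}(\delta_i) = \sum_{j=1}^{g} l_{ij}^2 + \psi_i = 1$$

$$\begin{aligned}
\mathrm{Var}(x) = R &= E[xx'] = E[(Lf + \delta)(Lf + \delta)'] \\
&= E[(Lf + \delta)(f'L' + \delta')] \\
&= E[Lff'L' + \delta\delta'] \\
&= LL' + \Psi
\end{aligned}$$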

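This relation is used to estimate the factor loadings. The first step is to compute R and express it through its spectral decomposition:

$$R = \lambda_1 u_1 u_1' + \dots + \lambda_p u_p u_p'$$

where the $\lambda_j$ are the eigenvalues of R and the $u_j$ are the corresponding eigenvectors. Eigenvalues greater than 1 are used to identify the number of factors (g) to retain. Then,

$$LL' = \lambda_1 u_1 u_1' + \dots + \lambda_g u_g u_g' \quad \text{and} \quad \Psi = R - LL'$$

(Tan, 2007).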
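The extraction steps above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the 4 × 4 correlation matrix is hypothetical, the function name is invented for the example, and the unique variances are taken as one minus the communalities, that is, the diagonal of R − LL´.

```python
import numpy as np


def principal_component_loadings(R):
    """Return loadings L (p x g), unique variances psi, and sorted eigenvalues of R."""
    eigvals, eigvecs = np.linalg.eigh(R)         # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # reorder from largest to smallest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    g = int(np.sum(eigvals > 1.0))               # retain eigenvalues greater than 1
    L = eigvecs[:, :g] * np.sqrt(eigvals[:g])    # column j is sqrt(lambda_j) * u_j
    psi = 1.0 - np.sum(L ** 2, axis=1)           # diagonal of R - LL'
    return L, psi, eigvals


# Hypothetical correlation matrix with two blocks of related variables.
R_example = np.array([
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
])

L, psi, eigvals = principal_component_loadings(R_example)
print("eigenvalues:", np.round(eigvals, 3))
print("loadings:\n", np.round(L, 3))
print("unique variances:", np.round(psi, 3))
```

In this example two eigenvalues exceed 1, so two factors are retained, and each row of squared loadings plus its unique variance sums to 1, matching the variance decomposition of each standardized variable.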

Principal component analysis is used to extract factors, and VARIMAX rotation is used to rotate the factor loadings so that they are close to either 0 or 1, which facilitates factor interpretation (Hair et al., 2010). Given the large number of variables and the volume of data, statistical software was used. Common statistical packages include MINITAB, SPSS, and SYSTAT. The Statistical Package for the Social Sciences (SPSS) was chosen for the factor analysis in this study because it is a comprehensive statistical package widely used by social scientists (Tan, 2007). By utilising this software package it is possible for researchers to:

- import data from a variety of formats;

- retrieve and view a pre-existing data file;

- manipulate and perform statistical analysis; and

- produce joint configurations of tables and graphs (Bryman and Cramer, 2009).

Utilizing this software, raw data were first entered into a computer and frequencies and percentages were calculated. Tables and cross tabulations were then constructed in order to examine the relationship between variables.
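Although the rotation in this study was carried out in SPSS, the idea behind the VARIMAX rotation mentioned earlier can be sketched independently of the package. The code below is a rough illustration using a common SVD-based iteration. The function name and the unrotated loadings are hypothetical, and no Kaiser normalization is applied, so the output should not be expected to reproduce SPSS results.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonally rotate a p x g loading matrix towards the varimax criterion."""
    p, g = loadings.shape
    T = np.eye(g)                      # rotation matrix, starts as the identity
    d_old = 0.0
    for _ in range(max_iter):
        rotated = loadings @ T
        # Gradient-like matrix of the varimax criterion at the current rotation.
        col_ss = np.sum(rotated ** 2, axis=0)
        target = rotated ** 3 - rotated @ np.diag(col_ss) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        T = u @ vt                     # nearest orthogonal matrix
        d_new = s.sum()
        if d_new < d_old * (1 + tol):  # stop when the criterion stops improving
            break
        d_old = d_new
    return loadings @ T, T


# Hypothetical unrotated loadings for four variables on two factors.
unrotated = np.array([
    [0.7, 0.5],
    [0.8, 0.4],
    [0.6, -0.5],
    [0.7, -0.6],
])

rotated, T = varimax(unrotated)
print(np.round(rotated, 3))
```

Because the rotation is orthogonal, the communality of each variable (the row sum of squared loadings) is unchanged; only the distribution of the loadings across factors changes.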

The next stage used SEM to identify the relationships between the occurrences of the key factors extracted in the factor analysis stage and the outcomes of the inefficient use of public investments in infrastructure. Causal factors that were found to be significant in the factor analysis were used as the independent variables in SEM. The outcomes of the inefficient use of public investments in infrastructure were the dependent variables. A single regression analysis can only interpret a single relationship between variables. However, in a model with a number of interrelated variables, it would not be meaningful to interpret the relationships separately. SEM is thus more useful for examining the structural relationships simultaneously. SEM examines the structure of interrelationships expressed in a series of equations, which depict all of the relationships among the constructs (the dependent and independent variables) involved in the analysis (Hair et al., 2010). As multiple relationships are expected among the variables in this study, SEM is useful for indicating which relationships are critical and which require more attention.

There are two distinct approaches to SEM analysis: covariance-based SEM (CB-SEM) and partial least squares SEM (PLS-SEM). The CB-SEM technique estimates model parameters by minimizing the difference between the estimated covariance values and the observed covariance values (Hair et al., 2010). To apply CB-SEM, several assumptions about the data and a minimum sample size need to be met.
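For illustration only, one widely used way of measuring the difference between estimated and observed covariances in CB-SEM is the maximum likelihood discrepancy function. The text does not state which discrepancy function is intended, so the choice of maximum likelihood and the symbols below are assumptions: S is the observed covariance matrix, Σ(θ) is the covariance matrix implied by the model parameters θ, and p is the number of observed variables.

```latex
F_{ML}(\theta) = \ln\lvert\Sigma(\theta)\rvert
               + \operatorname{tr}\!\left(S\,\Sigma(\theta)^{-1}\right)
               - \ln\lvert S\rvert - p
```

The function equals zero when Σ(θ) = S and grows as the model-implied covariances move away from the observed ones.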

Compared to CB-SEM, PLS-SEM has less stringent assumptions about the data and the minimum sample size. Statistically, PLS-SEM also uses a different approach to produce parameter estimates: it estimates model parameters by maximizing the explained variance of the dependent latent constructs. While CB-SEM has been widely applied in marketing research, the number of studies using PLS-SEM is growing. CB-SEM is appropriate if the research objective is theory confirmation, whereas PLS-SEM proves more useful if the objective is prediction or the extension of an existing structural theory. Moreover, when data characteristics or sample size do not meet the CB-SEM assumptions, PLS-SEM estimates can be used as proxies for CB-SEM results. Another advantage of PLS-SEM is its ability to produce estimates of factor scores, which are especially valuable in impact-performance analyses. Based on factor scores, key factors can be identified for potential performance improvements (Hair et al., 2011).

Given that the research objective is to uncover key issues affecting the investment efficiency of funded infrastructure development, PLS-SEM is more appropriate for this study. Among the software packages that implement PLS-SEM, SmartPLS can conduct all the data analyses required in PLS-SEM (Hair et al., 2011) and was therefore selected for this study. The stages and steps of the basic PLS-SEM algorithm are presented using the path model example in Figure 7.1.

Figure 7.1 Path model example

The path model example shown in Figure 7.1 has two components. The structural model (the inner model) in the PLS-SEM context refers to the relationships between the latent variables (LVs) yg. The measurement model (the outer model) refers to the relationships between the latent variables and the observed variables (x1, …, x3). The following PLS-SEM algorithm is adopted from Monecke and Leisch (2012).

In the measurement model, each latent variable (LV) yg is measured by a block Xg of observed variables (MVs), for example the three variables x1, x2 and x3 in the model. It is assumed that all MVs contained in the data matrix X are scaled to have zero mean and unit variance, and that each block of MVs Xg is already transformed to be positively correlated for all LVs yg, g = 1, …, G. Each block of MVs reflects its LV and can be written as the multivariate regression:

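$$X_g = y_g w_g' + \varepsilon_g, \qquad E[\varepsilon_g \mid y_g] = 0$$

where $\varepsilon_g$ denotes the residual term. So the outer weights can be estimated by least squares as

$$\hat{w}_g' = (y_g' y_g)^{-1} y_g' X_g = \mathrm{Var}(y_g)^{-1}\,\mathrm{Cov}(y_g, X_g) = \mathrm{Cor}(y_g, X_g).$$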
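The least-squares estimation of the outer weights can be illustrated with a short Python sketch. It assumes, as stated for the measurement model, that both the LV scores and the MVs are standardized; under that assumption the least-squares coefficients coincide with the correlations between the LV and each of its MVs. The simulated data and all names in the code are hypothetical.

```python
import numpy as np


def standardize(a):
    """Scale columns (or a 1-D array) to zero mean and unit sample variance."""
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)


def outer_weights_least_squares(X_block, y):
    """Regress each MV in the block on the LV score y: (y'y)^-1 y'X."""
    y = y.reshape(-1, 1)
    return (np.linalg.inv(y.T @ y) @ y.T @ X_block).ravel()


# Simulated block of three MVs reflecting one LV (hypothetical data).
rng = np.random.default_rng(0)
lv = rng.normal(size=200)
block = np.outer(lv, [0.9, 0.7, 0.5]) + rng.normal(scale=0.5, size=(200, 3))

y_std = standardize(lv)
X_std = standardize(block)

weights = outer_weights_least_squares(X_std, y_std)
correlations = np.array([np.corrcoef(y_std, X_std[:, k])[0, 1] for k in range(3)])
print(np.round(weights, 3))
print(np.round(correlations, 3))
```

The two printed vectors agree, which is why the weights of a reflective block can be read as correlations once everything is standardized.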

Let κg = {k ∈ {1, …, K} | xk ~ yg} be the set of indices of the MVs related to LV yg; then wg, g = 1, …, G, is a column vector of length |κg|. The adjacency matrix M for the measurement model has the same structure as the matrix of outer weights W and is used for the initialization. If mkg = 1, MV xk is one of the indicators of LV yg; otherwise mkg = 0.

The basic PLS-SEM algorithm thus starts with the matrix M, which is used for the initialization:

- Step 1: Initialization (constructing each LV as a weighted sum of its MVs):

$$\hat{Y} = XM$$

followed by scaling the LVs to have unit variance.
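A minimal sketch of this initialization step in Python is given below. It assumes that the adjacency matrix M has one column per LV with ones marking that LV's indicators, that each initial LV score is the plain sum of its standardized MVs, and that the sample variance (n − 1 denominator) is used for scaling. The function names, block structure and data are hypothetical.

```python
import numpy as np


def adjacency_matrix(blocks, n_mvs):
    """Build M (K x G): entry (k, g) is 1 if MV k is an indicator of LV g, else 0."""
    M = np.zeros((n_mvs, len(blocks)))
    for g, indices in enumerate(blocks):
        M[indices, g] = 1.0
    return M


def initialize_lvs(X, M):
    """Step 1: form each LV as the sum of its standardized MVs, then scale to unit variance."""
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Y = X_std @ M
    return Y / Y.std(axis=0, ddof=1)


# Hypothetical data: 50 respondents, 6 MVs, two LVs with three indicators each.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 6))
blocks = [[0, 1, 2], [3, 4, 5]]

M = adjacency_matrix(blocks, n_mvs=6)
Y_init = initialize_lvs(X, M)
print(M)
print(np.round(Y_init.std(axis=0, ddof=1), 6))
```

Each row of M contains a single 1 in this example because every MV belongs to exactly one block, and the initialized LV scores have zero mean and unit variance.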

In document Developing Critical Capacities for the efficient use of Public Investments in Infrastructure to Support Trade and Economic Development in Vietnam (Page 194-200)