Parametric Techniques - Quantitative Data Measurement and Analysis

CHAPTER 4. Research Methodology

4.7 Quantitative Data Measurement and Analysis

4.7.4 Parametric Techniques

Before starting statistical analysis, it has been emphasised by many authors (Fink, 2003, Lawton and Bass, 2006, Garth and Hallam, 2008, Gatignon, 2010) to check normality of data whether data are parametric or nonparametric. In general parametric data are assumed to be normally distributed. It means that the most value of data is distributed close to mean. Garth and Hallam (2008) argued that if the researcher is not assure about normality of data, it’s safer to assume those data are non-parametric. Garth and Hallam (2008) also expressed that the risk of this assumption is that the non-parametric tests are less sensitive; therefore the result would take smaller effect of missing.

Some statistical tests are based on assuming that data distribution of population is parametric such as T-test and correlation coefficient. The other group of tests are relies on non- parametric distribution assumption including; Kruskal-Wallis and Chi-square. After determining normality of data, the researcher would specify whether questionnaire look for measuring differences or correlation. The differences are will be measuring when there are two set of data and correlation is used when there are set of paired data. In the survey research, often the strength relationship between variables and the differences between groups are considered (Pallant, 2010). However, in some researches the interest is limited to the relationship between variables and there are a number of different techniques in that case.

104 In the following sections these techniques are described to have more clear understanding related to statistical tests available for analysing questionnaire survey.

4.7.4.1 Correlation test

Correlation is the technique that applies when the research is interested in exploring the relationship between two continuous variables. When the two variables are described numerically, correlation coefficient can be applied (Fink, 2003). The correlation coefficient has a range of +1 to -1. In correlation coefficient if we consider two variables Y and Z, which Y is independent variable and Z is dependent variable. +1 correlation indicates that the value of dependent variable growth by the same quantity for each unit growth in the value of dependent variable. Nevertheless, correlation -1 shows strong inverse relationship between independent and dependent variables and zero correlation specifies that there are no relationships between independent and dependent variables. Correlation can be described graphically by scatterplots (See Figure 4-5).

Figure 4-5 Positive and negative correlation between dependent and independent variables

Multiple Regressions is another well-known technique that is a more sophisticated extension of correlation and investigates the ability of a set of independent variables on one continuous dependent measure (Pallant, 2010). Correlation coefficient is symbolised by “r” and the formula for calculation is given as following;

Equation 3- Correlation Coefficient Formula

𝑟 = ∑( 𝑋−𝑋ˉ)(𝑌−𝑌ˉ)

105 -1 < r < +1

There is an important warning that has been noticed by Fink (2003) in using correlation test. The correlation is suitable to indicate the relationship between independent and dependent variables, however, strong correlation relationship between variables cannot claim refer to cause and effect. For instance this research would claim that there is strong relationship between using 5D modelling and information accuracy in structural engineering whereas it cannot be claimed that in structural engineering organisation the information is accurate because those organisation use 5D modelling.

Measuring correlation between ordinal data the Spearman’s rank correlation has been suggested (Keller, 2012). Spearman’s rank correlation coefficient is one of the methods for evaluating hypothesis. The degree of relationship between two ordinal variables can be presented by this descriptive statistical method. After running either pearson or sperman test through SPSS, a table of results will be provided. This result presents correlation coefficient, significant level and number of cases. Pallant (2010) expressed that firstly, number of cases should be check to find out if there are missing data or not. The second factor which should be considered is the direction of relationship between variables. The interpretation of correlation direction depends on the way that questionnaire is designed and variables scored. This interpretation can be also conducted by considering scatterplot. When a relationship between variables is positive it means high score on one variable is correlated with high score on the other variable. On the other hand, when a relationship is negative it means high score on one variable is correlated with low score on other one. The third factor that could be determined from correlation in SPSS is the strength of the relationship. As it has been presented in correlation coefficient formula, “r” indicates strength of the relationship and ranged between -1 and +1. Different authors recommended different methods for interpreting of strength. Pallant (2010, p. 134) mentioned the following method according to Cohen (1988) as following;

Low strength r = 0.1 to 0.29 Medium strength r = 0.3 to 0.49 High strength r = 0.5 to 1.0

106 4.7.4.2 Multiple Regression

The major difference between correlation and regression is explained by Fink (2003, p61). The relationship between variables are examined by correlation test however, the value of mathematical model which impact on relationship between dependent and independent variables can be estimated by regression test. Multiple regressions consists family of techniques to examine relationship between one continuous dependent variable and number of independent variables (Pallant, 2010, p148). Multiple regression is often applied to determine how well a set of variables are able to predict a particular outcome and which variable is the best predictor.

There are several various multiple regression tests and researcher can choose one of those based on the nature of research question. The three major categories of multiple regressions are standard, hierarchical and stepwise (Tabachnick and Fidell, 2007). In standard regression all the independent variables are moved into equation concurrently. In standard regression all independent variables can be compared from each other in terms of their predictive power. In hierarchical regression each independents variables are entered into blocks and each will be evaluated to explore how much it is adding to prediction while other variables have been controlled. In stepwise multi regression a list of independent variables will be provided and then this test decides on which variables should be used in equation (Pallant, 2010).

In the simple regression there is one predictor in equation. The regression equation can be presented graphically as illustrated in figure 4-4. In figure 4-4 the simple regression equation is shown as regression line. The regression line crosses in Y for each unit change in X. The slop of regression line presents the quantity of changes in Y for each unit change in X (Fink, 2003). The positive slope of regression line shows while X rises, The Y will rise despite, negative slope shows X rises as Y reduced (See Figure 4-6).

107

Figure 4-6 Graphic interpretation of regression line (Fink, 2003)

Equation 4-Multiple Regression Equation

Y = 𝛽₀+ 𝛽₁𝑥₁ + 𝛽₂𝑥₂+ 𝛽₃𝑥₃+ …+ 𝛽_𝑘𝑥_𝑘

Where; Y = predicted value on the outcome variable 𝛽₀= predicted value on Y when all x=0 𝑥𝑘 = dependent variable

𝛽_𝑘 = unstandardized regression coefficient K = number of independent variables

4.7.4.3 Factor analysis

Factor analysis is not suitable for testing hypothesis. Factor analysis is a technique to summarise variable to smaller group. Factor analysis is contained in SPSS to reduce data. The smallest number of factors can be determined by factor analysis. Those factors can be best representative of the interrelationships within a set of variables. It is assumed in factor analysis that relationships between variables are linear and this technique is relied on correlation analysis.

Factor analysis provide an opportunity for the researcher to test ideas regarding to variables, which are difficult to measure directly (Taylor, 2010). In other words, factor analysis can

108 support confidence of a research that all variables are the same underlying factor or not. Researcher can obtain several outcomes from running factor analysis test through SPSS. One of the most important result is “eigenvalue” (Pallant, 2010). The factors with an eigenvalue of 1.0 or more will be considered in factor analysis exploration process.

Where dependent variables are defined as X1, X2,… Xn, the common factors are F1, F2,…,Fn (independent variables) and unique factors are U1,U2,…,Un. Therefore the regression function can be defined as following;

Equation 5- Regression Function Formula

𝑥_{1 =} 𝑎₁₁𝐹₁+ 𝑎₁₂𝐹₂+ 𝑎₁₃𝐹₃+…+ 𝑎_1𝑚𝐹_𝑚 +𝑎₁𝑈₁ 𝑥2 = 𝑎21 𝐹1+ 𝑎22 𝐹2+ 𝑎23 𝐹3+…+ 𝑎2𝑚 𝐹𝑚 +𝑎2 𝑈2 …

𝑥_{𝑛 =} 𝑎_𝑛1𝐹₁+ 𝑎_𝑛2𝐹₂+ 𝑎_𝑛3𝐹₃+…+ 𝑎_𝑛𝑚𝐹_𝑚 +𝑎_𝑛𝑈_𝑛 (Taylor, 2010)

𝑎_𝑛𝑚is coefficient. For instance, the coefficient 𝑎₁₁shows the effect on variable𝑥₁. To achieve a score on each factor for each variable the equation can be formed as following;

Equation 6- Factor Analysis Formula

𝐹1 = 𝑏11 𝑥1+ 𝑏12 𝑥2+ 𝑏13 𝑥3+…+ 𝑏1𝑛 𝑥𝑛 𝐹2 = 𝑏21 𝑥1+ 𝑏22 𝑥2+ 𝑏23 𝑥3+…+ 𝑏2𝑛 𝑥𝑛 …

𝐹𝑚 = 𝑏𝑚1 𝑥1+ 𝑏𝑚2 𝑥2+ 𝑏𝑚3 𝑥3+…+ 𝑏𝑚𝑛 𝑥𝑛 (Taylor, 2010)

The main aim of factor analysis is to describe correlation within observed variable with regard to small relative factors. The correlation between variable 𝑥1 and 𝑥2 is formulated by summing up coefficients for two variables across all factors. According to table 4-5, the correlation between variable 𝑥₁and factor 𝐹₁is 𝑎₁₁. And the number at the end of each factor column is the sum of squared loading for that factor.

109

Table 4-4 Correlation table in SPSS

Variable Correlation

Factor1(𝐹1 ) Factor n(𝐹𝑛)

𝒙𝟏 𝑎11 𝑎1𝑛

𝒙_𝟐 𝑎₂₁ 𝑎_2𝑛

𝒙_𝒏 𝑎_𝑛1 𝑎_𝑛𝑛

Pallant (2010) explained all steps that a researcher needs to consider after obtaining output from SPSS for data interpretation. The first step is to verify that collected data is appropriate for factor analysis. It is recommended to consider correlation matrix table and if there are not many correlation coefficient more than 0.3 then the researcher should reconsider using factor analysis. In addition, it is recommended to consider Kaiser-Mayer (More than 0.6) and Bartlett’s test of Sphericity (should be 0.5 or smaller). The second step is determining number of factors to extract. It is suggested in factors which have an eigenvalue of 1 or more. In this step, the Total Variance Explained table will be considered. The third step in interpretation data via factor analysis is interpreting the plot. The shape of this plot will be considering whether, its elbow or change in plot. The factors above the change in plot will be extracted. The Component Matrix will be considered in fourth step. SPSS will retain all factors which are likely to be more appropriate. The last step is to considering on the Pattern Matrix. This matrix shows which factors are being loaded by variables.

4.7.4.4 Analysis of Variance (ANOVA)

Analysis of variance is close to regression test. This statistical test is appropriate to examine and model the relationship between a dependent variable and one or more independent variable (Muller and Fetterman, 2003). The analysis of regression and variance differ in some aspects. Despite regression, variables in ANOVA are qualitative (Categorical) and there is no assumption established that refers to coefficient for variables.

110 The most of the ANOVA’s interest is centralised on comparison of average of more than two populations. In studying ANOVA there are two important terms which are factor and level. Factor is characteristic which is under studying and level is a value of a factor. There are two types of analysis of variance. The first type is one-way analysis of variance. It helps the research to find whether groups differ, however, it would not find where the significant difference is. The second type is two-way analysis of variance. This test measure the impact of two independent variables on one dependent variable (Pallant, 2010). Some of the well- known parametric statistical techniques have been studied up to now. The next section discussed the justifications for adopting descriptive statistical analysis, factor analysis and multiple regressions as statistical techniques to analyse quantitative data in this research.

4.7.4.5 Statistical test adoption justification

The First part of questionnaire survey in this research measured on nominal scale data. These data did not consider numerical values however, this research intends to describe frequently that they occurred. For example, participants have been asked to specify their current job role in their organisation. Analysing these sort of data there are suggested some descriptive statistics including proportion, percentage and ratio (Gatignon, 2010). Proportion describes numbers of responses with specific characteristic divided by the total number of responses. The percentage is a kind of proportion from which can be described in hundredths. The ratio describes the relationships between two different parts. The ratio is the number of responses in a given group with a specific characteristic divided by the number of responses with another characteristic.

The second part of the questionnaire examined a summary of the information management challenges of the participants’ responses given in scores. This part of the questionnaire seeks to explore most critical challenges in structural engineering information management domain. Those responses have been analysed through Weighted Score (WS). WS represents ranking of each challenge in structural information management against the total number of participants. This part of the questionnaire also explored respondent’s weighted score of utilisation of BIM technological tools, workflow standards and human resources strategies in their organisation.

The third part of this questionnaire considered to discover that whether a relationship exist between using BIM technological, workflows, and human readiness and information quality

111 in structural engineering organisations. This research examined from literature review and qualitative case study interviews that information quality in structural engineering industry can be measure by interoperability, accessibility, accuracy and security. Therefore in this questionnaire survey there are four dependent variables involved in this analysis. This questionnaire survey is designed to test the effect of information quality by using BIM dimensions. Therefore, this research will examined the BIM criteria which impact on the information quality satisfaction. The criteria for adopting each BIM technological, workflows and human resources readiness have been examined from the literature review and qualitative case study. The questionnaire requested participants to rank BIM criteria outcomes (independent variables) within the context of technological, workflows and human resources perspective in their organisation, as well as information quality dimensions (dependent variables). Hence, this research seeks to measure the relationships between independent variables and four dependent variables (accessibility, accuracy, interoperability and security). In another words this research will seek to find a statistical technique to measure how independent variables can predict dependent variables separately.

This research should take into consideration for parametric statistical alternative. Non- parametric techniques do not make any assumption that refers to population distribution. Parametric techniques are perfect while data are being measured on normal and ordinal scales (Pallant, 2010, p 213). Factor analysis will be an appropriate option in this research to explain a set of fewer factors plus weightings. This research need to verify that the collected data is appropriate for factor analysis before conducting the analysis. The main requirement of factor analysis is normal distributed data. Therefore this research checked the normality of data before conducting the analysis (See Section 6-5-1). This research expects that many of the identified independent variables can be correlated to dependent variables. This research expects that information quality in structural engineering industry explains the identified variability of the measurement. Factor analysis can be applied to test this idea. There would be another idea that some identified variables are not correlated with information quality which has some independence from information quality. Therefore factor analysis could support the confidence of this research that all or most of variables which this research measured are the same underlying factor or not. Factor analysis also allows this research to diminish variables to a smaller and more manageable set of factors. Factor analysis contributes to this research to test which independent variables and dependent variables

112 categorised under the same component or factor and this technique aids this research to test the must correlated relationships between variables (See Section 6-4 for more details).

This research also adopted multiple regressions while the components (factors) presented through factor analysis. The multiple regression enables this research more explorations within a set of variables. The multiple regression also increases research reliability in predicting values for the dependent variables. The multiple regression technique has been conducted to evaluate relationships between independent and dependent variables which are categorised into components through factor analysis. The independent and dependent variables which are categorised into the same component are entered into regression model in the same time. It means that this research consider regression model for each component (See section 6-5 for more details). The Statistical Package for Social Science (SPSS 20.) is used in both Factor analysis and multiple regression tests. Next section justified reliability and validity of the research data analysis and findings.

In document Enhancing information quality through building information modelling implementation within UK structural engineering organisations (Page 118-127)