
Chapter 4 – Research Methodology

4.3 Data Analyses Procedures

4.3.1 Preliminary Data Analyses

According to Aaker et al. (2005), the quality of statistical analysis depends on how well the data are prepared and converted into a form suitable for analysis. Thus, before any further statistical analyses were conducted, the raw data were carefully screened to ensure that the data coding and entry were appropriate for the analyses. It was important to ensure the data were “clean” before proceeding to the next step. Screening was necessary because model estimation in SEM can fail when the data are “messy” (Kline, 2005; Schumacker & Lomax, 2004). According to Schumacker and Lomax (2004, p. 240), “messy data” such as “… missing data, outliers, multicollinearity, and non-normality of data distribution seriously affect the estimation process often resulting in fatal error messages or failure to reach convergence (unable to compute a set of parameter estimates).”

Briefly, the data cleaning procedures used in this study addressed: (1) sample non-response bias (using independent-samples t-tests); (2) missing data (using t-tests for a series of dependent variables); (3) outliers (using boxplots, histograms and standardised residual values or z-scores); and (4) normality of the data distribution (using skewness and kurtosis). Each of these procedures is further described in the following subsections. Besides the data cleaning procedures, SPSS was also used to produce descriptive statistics, including frequencies, means and standard deviations for each item and for the demographic characteristics of the respondents, to gain preliminary information about the data collected in the study.

4.3.1.1 Outliers

Upon calculating the descriptive statistics, outliers were thoroughly examined. An outlier is “an observation that is substantially different from the other observations (has an extreme value) on one or more characteristics (variables)” (Hair et al., 2010, p. 36). Typically an outlier is judged “… to be an unusually high or low value on a variable or a unique combination of values across several variables that make the observation stand out from the others” (Hair et al., 2010, p. 64).

There are three methods of detecting outliers: univariate, bivariate and multivariate. This study examined only univariate and multivariate methods, considering the claim by Hair et al. (2010, p. 66) that “... researchers should limit the general use of bivariate methods to specific relationships between variables, such as the relationship of the dependent versus independent variable in regression; as the outliers will arise whenever the number of variables increases.”

Univariate outliers can be detected during data screening. A case may be a univariate outlier if it has an extreme score on a single variable; such cases are easy to find by inspecting the frequency distributions of z-scores or standardised residual values (Hair et al., 2010; Kline, 2005). Hair et al. (2010) suggest that, for a large sample, any data value with a standardised value below -4 or above +4 can be identified as an outlier. In this study, any case with a standardised value below -4 or above +4 was eliminated from the database. However, the decision to remove outliers from the data set must be made with care because deletion often generates further outlying cases (Pallant, 2007). Multivariate outliers can be detected using graphical methods such as residual scatter plots or statistical methods such as the Mahalanobis distance (Hair et al., 2010).
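The univariate screening rule can be illustrated with a minimal Python sketch. It is not the SPSS procedure used in the study; the function name `zscore_outliers`, the sample standard deviation and the example scores are all assumptions made for illustration only.

```python
from statistics import mean, stdev


def zscore_outliers(values, threshold=4.0):
    """Return (index, z) pairs whose absolute z-score exceeds the threshold."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator); an assumption
    if s == 0:
        return []  # no variation, so no case can be extreme
    flagged = []
    for i, v in enumerate(values):
        z = (v - m) / s
        if abs(z) > threshold:
            flagged.append((i, z))
    return flagged


# Hypothetical item scores: 199 typical responses plus one extreme entry.
scores = [3, 4, 5] * 66 + [4] + [40]
flagged = zscore_outliers(scores)
print(flagged)  # only the final case (index 199) is flagged
```

Because removing a flagged case changes the mean and standard deviation, rerunning the check on the reduced data can flag new cases, which is the caution Pallant (2007) raises.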

4.3.1.2 Normality Test

Normality refers to the shape of the data distribution for an individual metric variable and its correspondence to the normal distribution (Hair et al., 2010, p. 71). Skewness and kurtosis are two indicators of normality: according to Morgan and Griego (1998), skewness refers to the symmetry of a distribution compared with a normal distribution, while kurtosis describes whether the peak of a distribution is taller or shorter than that of a normal distribution. Skewness and kurtosis values are frequently used to determine whether the measured items are normally distributed in large samples (200 or more) (Field, 2009). Kline (2005) suggests that absolute skewness values greater than three and absolute kurtosis values greater than eight indicate problems with normality in the sample distribution. This study adopted Kline's (2005) thresholds.
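The skewness and kurtosis check can be sketched in Python as follows. This is a hedged illustration rather than the study's procedure: it uses simple moment-based estimators and excess kurtosis (normal distribution = 0), whereas statistical packages such as SPSS typically report bias-corrected versions, so values will differ slightly. The function names and example data are hypothetical, and a variable is flagged here if either threshold is exceeded, which is an assumed reading of the rule.

```python
from statistics import mean


def skewness_kurtosis(values):
    """Moment-based skewness and excess kurtosis (normal distribution = 0)."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("variable has zero variance")
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def normality_problem(values, max_skew=3.0, max_kurt=8.0):
    """Flag a variable when either absolute index exceeds its threshold."""
    skew, kurt = skewness_kurtosis(values)
    return abs(skew) > max_skew or abs(kurt) > max_kurt


# Hypothetical examples: a symmetric item and a severely skewed one.
symmetric_item = [1, 2, 3, 4, 5] * 40
skewed_item = [1] * 199 + [100]
print(normality_problem(symmetric_item))  # False
print(normality_problem(skewed_item))     # True
```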

4.3.1.3 Procedures for Splitting the Data

Once the data screening was completed, the “clean” data were randomly split into two data sets. The objectives of the data splitting procedure were to validate the EFA results and to proceed to the SEM analyses (Hair et al., 2010; Kline, 2005; Schumacker & Lomax, 2004). According to Kline (2005), it is inappropriate to run EFA and CFA on the same data, because the results of EFA are subject to capitalization on chance variation, and using CFA to specify a model based on the results of EFA only compounds this problem. Kline (2005) added that factor structures identified through EFA may sometimes show poor model-fit indices for the same data when evaluated using SEM. In addition, Schumacker and Lomax (2004) suggest that a researcher could begin model generation by using EFA on one sample of data to find the number and type of latent variables in a plausible model. Once a plausible model is identified, another sample of data could be used with SEM to confirm or test the model.
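A random split of this kind can be sketched in Python as below. The function name `split_sample`, the fixed seed, the equal halves and the sample of 400 case identifiers are assumptions for illustration; the source does not state how the split was implemented or how large each subsample was.

```python
import random


def split_sample(case_ids, seed=42):
    """Randomly split case identifiers into two non-overlapping halves."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical data set of 400 screened cases, numbered 1 to 400.
efa_ids, sem_ids = split_sample(range(1, 401))
print(len(efa_ids), len(sem_ids))  # 200 200
```

Keeping the two sets of identifiers disjoint is what allows the second subsample to act as an independent check on the factor structure found in the first.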

In line with these reasons, this study deemed it inappropriate to run EFA and SEM on the same data set. Two subsamples were therefore required, one for each of the two techniques used in the data analysis: EFA and SEM. Each subsample had to meet the minimum size requirements explained in Section 4.2.1. The data analysis followed a three-stage process. In the first stage, the first subsample was used to conduct EFA and to compute Cronbach's alpha coefficients, which partially satisfied Research Objective 1. In the second stage, the second subsample was used to reassess the EFA results through SEM analysis by employing CFA, which satisfied Research Objectives 1 to 3. The third stage involved developing and estimating a causal path model on the second subsample to test the hypotheses regarding the relationships between the five constructs (service quality, customer satisfaction, perceived value, restaurant image and behavioural intentions) discussed in Subsection 3.2.4, which satisfied Research Objective 4.
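For readers unfamiliar with the reliability coefficient computed in the first stage, the standard formula for Cronbach's alpha can be sketched in Python. This is an illustration only, not the SPSS routine used in the study; the function name and the example item scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: one list of scores per item, aligned by respondent.
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    # Total scale score for each respondent.
    totals = [sum(responses) for responses in zip(*items)]
    sum_item_variances = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


# Hypothetical three-item scale answered by five respondents.
item_scores = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
alpha = cronbach_alpha(item_scores)
print(round(alpha, 3))
```

Alpha approaches 1 as the items vary together, so perfectly identical items give exactly 1.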

In document An analysis of restaurant patrons' experiences in Malaysia : a comprehensive hierarchical modelling approach (Page 86-88)