3.8 Part two-quantitative methods
3.8.3 Data analysis
The data analysis of the quantitative phase commenced with data preparation, including: missing data, reverse coding, normality and outliers. Thereafter, validating the instrument was done through EFA and CFA. The hypotheses testing was conducted through CFA, multi-group CFA, moderation and moderated-mediation methods.
3.8.3.1 Data preparation
The data preparation was carried out by determining the sample adequacy, treatment of missing values, reverse coding and testing of normality and outliers. Data preparation at the preliminary stages was expected to contribute to the quality and the integrity of the data.
3.8.3.2 The sample adequacy
Sample adequacy for both EFA and CFA was determined. EFA and CFA were carried out by splitting the total sample of 400 into two groups. The sample splitting was conducted through a random selection using the Statistical Package for the Social Sciences (SPSS). A total of 192 cases were utilised for EFA whereas the remainder of 202 were used for
the measurement models of CFA. This split sample was important to validate the new measurement scale developed for this study (Hair et al, 2006). After scale validation, the structural models for hypothesis testing were carried out for the total sample of 400. Prior to any analysis, the sample adequacy was determined.
The sample sizes required for Exploratory Factor Analysis (EFA) lack common consensus. The recommended sample size varies from 100 (Hair et al, 2006) to 300 (Tabachnick & Fidell, 2007), whereas Nunnaly (1978) recommended 10 participants per variable. It is also widely accepted that the suitable sample size for EFA depends on a number of factors such as the number of items, factor loadings and communalities (Field, 2009; Hair et al., 2006). The EFA was conducted with a sample size of 192 and meets recommendations (Hair et al., 2006).
The sample size for SEM is determined by factors including multivariate distribution of data, estimation technique, the complexity of the model, amount of missing data and amount of error variance among the reflective indicators (Hair et al., 2006). The sample size for SEM also depends on the data normality, the estimation method, the model complexity, the missing data, the communalities, the number of items and the number of factors individually or in combination and multivariate normality (Hair et al. 2006). When maximum likelihood estimation method is used a sample size of 100-150 is considered adequate (Hair et al., 2006). Since this study incorporated maximum likelihood estimation, complying to Hair et al. the sample size 202 was considered suitable for the measurement models through CFA. The structural models for hypotheses testing was carried out thorough the total sample, 400.
3.8.3.3 Treating missing data
The following section outlines the process adopted for treating missing data. Missing data results in respondents not fully completing the survey (Allen & Bennett, 2010, p. 12). The researcher may contribute to missing data by data entry omissions or through a weak research design (Hair et al, 2006). Therefore, prior to data analysis, the data was carefully observed to detect missing values. However, since the survey was launched online through Qualtrics, the data was much cleaner than when collected through other survey questionnaire methods, with only a very small number of missing values. For example,
the forced response option in Qualtrics software kept reminding the respondents to complete one question before going on to another.
The missing values were analysed using Little’s MCAR (missing completely at random) test with an expectation maximisation technique (Hair et al, 2006). The MCAR test resulted in chi-square=415.625 (DF=471; P< .968). This indicated that data was completely missing at random and did not have any identifiable pattern. The MCAR test consists of 4 imputation methods: mean substitution, all-variable, regression and expectation maximisation (EM). Since all these methods produce generally consistent results (Hair et al., 2006), EM was used for missing data estimation and for replacing missing values in SPSS.
3.8.3.4 Reverse coding
Reverse coding involves altering the item wording to minimise the extreme responses acquiescence bias6(Sauro & Lewis, 2011). The construct repeat visitation was measured through five items. Among them three items were negatively worded. The positively worded items appealed to respondents who show a preference for familiarity in their hotel selection whereas the negatively worded items appealed to respondents who prefer variety in their hotel selections. Therefore, the questions explaining the repeat visitation behaviour of variety-seekers were reverse coded prior to the analysis. Three items measuring repeat visit intention of variety-seekers: a hotel I have not been to before (16.2), a different one from the last visited one (16.3), a different type of accommodation altogether (e.g. B&B’s, resorts etc.) (16.4) were reverse coded.
3.8.3.5 Normality and outliers
Normality is the “degree to which the distribution of the sample data corresponds to a
normal distribution” Hair, et al (2006, p. 40). Outliers “represent the cases whose scores are substantially deviating from all the others in a particular set of data” (Byrne, 2010,
p. 105). The normality of the data can be observed by the skewness and kurtosis values and determining the normality of the data commenced with graphical observation of the
6
Acquiescence response bias is the tendency for survey respondents to agree with statements regardless of their content (Villar, 2008).
88
distribution curve (Allen & Bennett, 2010; Blunch, 2013; Field, 2009; Hair et al., 2006). The normality of the data was further established statistically. The items with univariate skewness (ZS) and kurtosis (Zk) in the range of ±2.58 were considered normal. According
to Hair et al. (2006) a sample size of 200 or more will not have substantial impact of non- normality on results as per smaller samples such as 50 or less. Given the sample size of 400, based on the law of statistical regularity and the central limit theorem, it was assumed that a large sample can help to estimate parameters properly and leads to a normal distribution of data.
Nevertheless, to carry out this decision the researcher ensured that the regression model is free from heteroscedasticity—in other words, it is homoscedastic. While homoscedasticity is defined as the assumption that dependent variable(s) exhibit equal
levels of variance across the range of predicator variable(s), homoscedasticity is
desirable since due to the variance of the dependent variable being explained in the
dependence relationship should not be concentrated in only a limited range of the independent values (Hair et al., 2006, p. 83). The scatter plot indicted that the results are
homoscedastic. Therefore, the variance of the residuals are constant and there was no systematic pattern of residuals. This indicated that the data was normal (Hair et al. 2006). Thereafter, the univariate outliers were checked using boxplots. SPSS highlighted a number of cases as outliers. Observing carefully, it was found that the cases indicated as outliers were not due to common causes such as data entry errors or mistakes in coding (Field, 2009; Hair et al., 2006; Holmes-Smith, 2012), which were further assured due to the auto generation of data sheets by the Qualtrics software. Thus, the cases which were detected as outliers were considered a valid group of the population (Hair et al, 2006), those indicating a genuine opinion which may point to extreme behaviour (Field, 2009). Considering these factors the outliers were not removed for analysis.
When using maximum likelihood estimation (MLE) for structural equation modelling (SEM) multivariate normality is essential (Blunch, 2013; Byrne, 2010). Multivariate normality was first examined by looking at the values for skewness and kurtosis generated through AMOS. The items with skewness > 2 and kurtosis >7 were considered as non-normal (Holmes-Smith, 2012). In addition, Mardia’s multivariate kurtosis values were observed (Blunch, 2013; Byrne, 2010; Holmes-Smith, 2012). Mardia’s kurtosis values between 3 and 30 were considered as signs of multivariate kutorsis (Newsom, 2005; Walker, 2010). Thereafter, the outliers were detected based on Mahalanobis
distance through AMOS (Hair et al., 2006; Holmes-Smith, 2012; Tabachnick & Fidell, 2007). The Mahalanobis test indicated a number of cases as outliers observed through the AMOS Mahalanobis distance (D2) for each case. Any case that significantly deviated from other cases based on Mahalanobis distance D2 was identified as a significant contributor to multivariate non-normality (Holmes-Smith, 2012). However, these cases were not removed since the removal of the cases did not result in significantly improved results. Similarly it has been identified that statistically significant results may not be delivered by the correction methods suggested to rectify non-normal data (Byrne, 2010; Gao, Mokhtarian, & Johnston, 2008).