DATA ANALYSIS - CHAPTER 4 − METHODOLOGY

4. CHAPTER 4 − METHODOLOGY

4.9 DATA ANALYSIS

This research adopted quantitative analysis to generate descriptive and inferential findings. This was considered the most appropriate way to meet the research objective, mainly because it allows not only a “systematic description, factually and accurately, of facts and characteristics of a given population or area of interest” (Pizam 1994), but also the identification and evaluation of ‘what causes the behaviour’ (of cooperation) (Finn et al. 2000). Accordingly, an analytical design was used in this research to identify and analyse the decisions, perceptions, and behaviour of respondents in relation to cooperation in the past and in the future, and the influences on the operation and outcomes of cooperation.

Data collected through the structured questionnaire were analysed using SPSS. The next sections explain the preparation of the data for descriptive and inferential analysis through univariate, bivariate, and multivariate techniques.

4.9.1. Preparing the data for analysis

Variable and value labels were defined in order to set up the SPSS database, which also included the definition of missing values. In a quantitative approach, preparing the data for analysis includes data cleaning. Hence, the data analysis in this study was preceded by a cleaning task to identify and correct any errors before the data were analysed.

In addition, in this study, preparing the data for analysis also involved dealing with open-ended questions. As explained in Section 3.8, the questionnaire included both open and pre-coded questions. The analysis of open-ended questions was done through data reduction, data display and conclusion drawing/verifying (Miles and Huberman 1994). Data reduction is a continuous process throughout the analysis and refers to the process of selecting, simplifying or transforming the answers (Miles and Huberman 1994). It involves the careful study of the content of the answers and then fitting the answers into a pattern of categories developed after the responses have been studied (Sarantakos 2005). To achieve this, lists of answers were produced in order to observe patterns of response. Although most of the open questions clearly indicated that respondents should give only one (the main) motive or reason, some respondents gave two or more reasons. In such circumstances, although all the answers were written down by the researcher, only the first reason given was considered for the purposes of the analysis. Then, a two-stage coding process was applied. The first stage focused on grouping the answers within a common theme. At this stage an effort was made to reduce significantly the number of categories while maintaining the meaning of the answer. The second stage involved grouping these themes into a few categories so that the information was reduced to a level at which quantitative analysis could be applied. An important requirement in quantitative analysis is to ensure that the data are not stripped of their context (Punch 2005). Using the two-stage process allows the few final categories to be linked to the themes that originated them, maintaining to a certain degree the context of the answers.
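The two-stage coding described above can be illustrated with a minimal Python sketch. It is not the procedure actually run in this study: the answers, themes, categories and the names ANSWER_TO_THEME, THEME_TO_CATEGORY and code_answer are all hypothetical, and the sketch only shows how keeping both lookup tables preserves the link from each final category back to the theme that originated it.

```python
from collections import Counter

# Hypothetical lookup tables: themes and categories are illustrative only.
ANSWER_TO_THEME = {
    "share marketing costs": "cost sharing",
    "split promotion expenses": "cost sharing",
    "attract more visitors": "market reach",
    "reach foreign tourists": "market reach",
}
THEME_TO_CATEGORY = {
    "cost sharing": "economic",
    "market reach": "marketing",
}


def code_answer(reasons):
    """Code one open-ended response, keeping only the first reason given."""
    first = reasons[0].strip().lower()
    theme = ANSWER_TO_THEME[first]        # stage 1: answer -> theme
    category = THEME_TO_CATEGORY[theme]   # stage 2: theme -> category
    return theme, category


# Hypothetical responses; the first respondent gave two reasons.
responses = [
    ["Share marketing costs", "Attract more visitors"],
    ["Reach foreign tourists"],
    ["Split promotion expenses"],
]
coded = [code_answer(r) for r in responses]
category_counts = Counter(category for _, category in coded)
print(coded)
print(category_counts)
```

Because each coded response keeps its theme alongside its category, the frequency counts used in the quantitative analysis can still be traced back to the wording of the original answers.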

In turn, closed-ended questions required diversified data analysis procedures, namely descriptive statistics and inferential statistics. Both descriptive and inferential analyses are explained below.

4.9.2. Descriptive and Inferential analysis

Descriptive statistical analysis was used to summarize the data (Barnes and Lewin 2005). For categorical variables (nominal data), response percentages were produced. For ordinal variables, measures of central tendency and dispersion (mean, median, standard deviation) were produced.
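As an illustration only, the descriptive summaries mentioned above could be computed as in the following Python sketch. The study itself used SPSS; the data values and the helper name percentages are hypothetical, and the standard deviation shown is the sample standard deviation.

```python
from collections import Counter
from statistics import mean, median, stdev


def percentages(values):
    """Response percentages for a categorical (nominal) variable."""
    counts = Counter(values)
    total = len(values)
    return {label: 100 * count / total for label, count in counts.items()}


# Hypothetical data: respondent sector (nominal) and one 1-5 agreement item (ordinal).
sector = ["wine", "tourism", "wine", "wine", "tourism"]
agreement = [1, 2, 2, 4, 5]

sector_pct = percentages(sector)
summary = {
    "mean": mean(agreement),
    "median": median(agreement),
    "sd": stdev(agreement),  # sample standard deviation
}
print(sector_pct)
print(summary)
```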

Inferential analysis was conducted with different independent variables, depending on the purpose of the analysis. First, when the purpose was to explore differences between respondents’ answers regarding their behaviour, their perceptions of advantages and disadvantages, and the influences on their decisions in relation to cooperation (closed-ended questions; Chapters 5 and 6), the independent variable used to test the null hypotheses was the respondent group (tourism or wine).

Secondly, when the purpose was to determine differences in the likelihood of cooperating or not in the future (Chapter 7), the null hypotheses were tested using the factors identified in the literature as independent variables. These factors include, for example, business size, age, and respondents’ personality.

The choice of tests to be used was based on the following requirements (Barnes and Lewin 2005):

• The type of data to be analysed, that is at the nominal (categorical), ordinal, or interval level;

• Number of groups of respondents;

• Independent observations.

Additionally, decisions on the tests to be used also implied a choice between parametric and non-parametric tests. For the purpose of this research, non-parametric tests were chosen. Although these tests are less sensitive, they make fewer assumptions about the population from which the sample was drawn (Pallant 2010) and are suitable when normal distribution requirements are not met (Barnes and Lewin 2005), which is the case in the present study. Thus, the statistical tests used in this research were the Chi-Square Test for Independence, the Mann-Whitney U Test and the Kruskal-Wallis Test.
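The way the three tests map onto the type of data and the number of groups in this chapter can be summarised in a short Python sketch. The function name choose_test and its arguments are hypothetical; the mapping itself follows the use of each test as described in this section (nominal data: Chi-Square; ordinal data with two groups: Mann-Whitney; ordinal data with three or more groups: Kruskal-Wallis).

```python
def choose_test(level, n_groups):
    """Return the non-parametric test used in this chapter for a given case.

    level    -- measurement level of the dependent variable ("nominal" or "ordinal")
    n_groups -- number of independent groups being compared
    """
    if n_groups < 2:
        raise ValueError("at least two groups are needed for a comparison")
    if level == "nominal":
        return "Chi-Square Test for Independence"
    if level == "ordinal":
        if n_groups == 2:
            return "Mann-Whitney U Test"
        return "Kruskal-Wallis Test"
    raise ValueError("unsupported measurement level: " + level)


print(choose_test("nominal", 2))
print(choose_test("ordinal", 2))
print(choose_test("ordinal", 3))
```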

The Chi-Square Test for Independence was used for nominal data to verify the existence of statistically significant differences between the two groups of the independent variable (wine and tourism respondents). This test has the following assumptions. First, each case or person must contribute to only one cell in the contingency table. Second, no cell may have an expected value of zero. Third, in a 2x2 table no cell should have an expected count below 5, while in larger contingency tables it is accepted that up to 20% of cells may have expected frequencies below 5, provided that all expected counts are greater than 1 (Pestana and Gageiro 2000; Barnes and Lewin 2005; Field 2009). When these assumptions were not met, the Chi-Square test was considered invalid.
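The expected-count assumptions listed above can be checked mechanically. The following Python sketch is a minimal illustration and not the SPSS procedure used in the study: the function names and the example tables are hypothetical, and the statistic computed is the uncorrected Pearson chi-square (no continuity correction).

```python
def expected_counts(table):
    """Expected cell counts under independence for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def assumptions_met(table):
    """Apply the expected-count rules described in the text."""
    cells = [e for row in expected_counts(table) for e in row]
    if len(table) == 2 and len(table[0]) == 2:
        return all(e >= 5 for e in cells)          # 2x2: no expected count below 5
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return share_below_5 <= 0.20 and all(e > 1 for e in cells)


# Hypothetical 2x2 table: rows = wine/tourism respondents, columns = yes/no.
observed = [[30, 20], [10, 40]]
stat, df = chi_square(observed)
valid = assumptions_met(observed)
too_sparse = assumptions_met([[3, 2], [1, 4]])
print(stat, df, valid, too_sparse)
```

In the second, sparser table every expected count falls below 5, so under the rules above the test would be treated as invalid.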

Data were analysed at a 95% confidence level. Thus, when a probability value of 0.05 or less was recorded for a hypothesis test, the null hypothesis was rejected (Pallant 2010). This 0.05 significance level for p (the probability value) was used throughout this study, based on common convention in the literature (Barnes and Lewin 2005). When presenting the results of the Chi-Square Test for Independence, the chi-square value, the degrees of freedom, the probability value (which indicates whether the result is likely to be real or due to chance) and, when significant differences were found and the null hypothesis was rejected, the effect size, namely the phi value (2x2 tables) or Cramer’s V (larger contingency tables), were presented and analysed. As recommended by Pallant (2010), for 2x2 tables the size of the effect is judged according to the following criteria: small = 0.10, medium = 0.30, large = 0.50. For larger tables, different criteria are recommended, depending on the number of categories in rows and columns (two, three or four categories). Thus, the criterion suited to the number of categories was chosen, and the effect size value is always indicated when appropriate.
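For illustration, the effect sizes mentioned above can be derived from the chi-square value as in the Python sketch below. It uses the standard definitions (phi is the square root of chi-square divided by N; Cramer’s V additionally divides by the smaller of rows minus one and columns minus one). The function names, the input values and the label "below small" are hypothetical, and only the 2x2 criteria quoted in the text are applied, since the criteria for larger tables are not reproduced here.

```python
from math import sqrt


def effect_size(chi_square, n_cases, n_rows, n_cols):
    """Phi for a 2x2 table; Cramer's V for larger tables.

    For a 2x2 table min(rows, cols) - 1 equals 1, so V reduces to phi.
    """
    k = min(n_rows, n_cols)
    return sqrt(chi_square / (n_cases * (k - 1)))


def label_2x2(phi):
    """Label a phi value using the 2x2 criteria quoted in the text."""
    if phi >= 0.50:
        return "large"
    if phi >= 0.30:
        return "medium"
    if phi >= 0.10:
        return "small"
    return "below small"  # hypothetical label for values under 0.10


# Hypothetical inputs: chi-square = 16.67 from a 2x2 table with 100 cases.
phi = effect_size(16.67, 100, 2, 2)
phi_label = label_2x2(phi)

# Hypothetical inputs: chi-square = 12.0 from a 3x4 table with 150 cases.
cramers_v = effect_size(12.0, 150, 3, 4)
print(phi, phi_label, cramers_v)
```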

When testing hypotheses that relied on ordinal data, the Mann-Whitney U Test (two groups) and the Kruskal-Wallis Test (three or more groups) were applied. Descriptive statistics (frequency, mean and median values) were also presented. As with the Chi-Square Test, when the probability value was 0.05 or less and the null hypothesis was rejected, the effect size of the significant differences was presented. For Mann-Whitney tests, the effect size (r) is calculated with the following formula (Pallant 2007; Field 2011): r = z / √N, where N is the total number of cases. Effect sizes are reported according to Cohen’s criteria (1988, cited in Pallant 2010; Field 2009): 0.1 = small effect, 0.3 = medium effect, 0.5 = large effect.
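The calculation of r = z / √N can be made concrete with a small worked sketch in Python. This is an illustration under stated assumptions, not the SPSS output used in the study: z is obtained from the normal approximation to U with a correction for ties and without a continuity correction, the absolute value of z is used so that r is non-negative, and the function names and data are hypothetical.

```python
from collections import Counter
from math import sqrt


def average_ranks(values):
    """Map each value to its 1-based rank; tied values share the mean rank."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return {value: sum(p) / len(p) for value, p in positions.items()}


def mann_whitney_r(group_a, group_b):
    """Return U for group_a, the z approximation, and r = |z| / sqrt(N)."""
    n1, n2 = len(group_a), len(group_b)
    n = n1 + n2
    pooled = list(group_a) + list(group_b)
    ranks = average_ranks(pooled)
    rank_sum_a = sum(ranks[v] for v in group_a)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    # Tie correction for the standard deviation of U.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values()) / (n * (n - 1))
    sd_u = sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    z = (u_a - n1 * n2 / 2) / sd_u
    return u_a, z, abs(z) / sqrt(n)


# Hypothetical 1-5 Likert-type answers for two groups of six respondents each.
wine = [1, 2, 2, 3, 1, 2]
tourism = [4, 5, 3, 4, 5, 4]
u, z, r = mann_whitney_r(wine, tourism)
print(u, z, r)
```

With these hypothetical data r exceeds 0.5, which would be read as a large effect under the criteria quoted above.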

In order to interpret the results of the analysis of the Likert-type scale data, when the mean was used descriptively, a zoned scale of averages (Vaughan 2007) was used to evaluate whether the likely decision was to cooperate or not to cooperate with businesses (wine/tourism).

4.9.3. Multivariate analysis

In order to group respondents into categories (Pestana and Gageiro 2000), with respect to personality traits, a Hierarchical Cluster Analysis was applied using the statistical program SPSS (version 18). Cluster Analysis is not an inferential test and it does not aim to estimate population parameters (O’Donoghue 2012). It is an exploratory data analysis tool concerned with ‘discovering groups in data’ (Everitt et al. 2011) and with the organization of the observed data (e.g. people) into meaningful groups, or clusters (Timm 2002).

In this research, a Hierarchical (agglomerative) Cluster Analysis was run because it is used to find relatively homogeneous clusters of cases based on measured characteristics, allowing their classification without prior knowledge about which elements belong to which clusters (O’Donoghue 2012). This technique “generates a sequence of cluster solutions beginning with clusters containing single object and combines objects until all objects form a single cluster” (Timm 2002, pp. 522-23).

In addition, a Hierarchical (agglomerative) Cluster Analysis is a technique that allows the researchers to choose how many clusters should be recognized (based on the inherent structure of the cluster hierarchy and the purposes of the research) (O’Donoghue 2012).

The Hierarchical Cluster Analysis in this research was run on 200 cases, each rating their level of agreement with a set of 10 statements on personality traits (Likert-type scale). The level of agreement ranged from 1 (Strongly agree) to 5 (Strongly disagree). The Hierarchical Cluster Analysis was run based on Ward’s method. Although distance can be measured differently (O’Donoghue 2012; Vincze and Mezei 2011), in this research the ‘Squared Euclidean Distance’ index was used, assuming that the variables considered are independent. This Cluster Analysis created a tree diagram or dendrogram (Timm 2002) and three clusters were identified: proactive, moderately proactive and cautious. The variables used for the typology were selected based on the literature review and their relevance for this research. The three identified groups and their mean scores (ranging from 1, Strongly agree, to 5, Strongly disagree) are presented in Figure 4.6.

Figure 4.5: Groups of respondents based on the personality variable (Hierarchical Cluster Analysis)

Source: author
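The agglomerative logic of Ward’s method with squared Euclidean distance can be sketched in plain Python. This is a simplified illustration, not the SPSS routine used in this research: it merges, at each step, the two clusters whose union gives the smallest increase in the within-cluster sum of squares, and stops at a chosen number of clusters instead of building the full dendrogram. The function names are hypothetical, and the data are six invented respondents scored on three statements (the study used 200 cases and 10 statements).

```python
from itertools import combinations


def centroid(points):
    """Mean score on each variable for the points in one cluster."""
    return [sum(column) / len(points) for column in zip(*points)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    squared_distance = sum((x - y) ** 2 for x, y in zip(ca, cb))  # squared Euclidean
    return len(a) * len(b) / (len(a) + len(b)) * squared_distance


def ward_clusters(points, k):
    """Agglomerate single-point clusters until only k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: ward_cost(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Hypothetical respondents: agreement (1-5) with three personality statements.
respondents = [(1, 1, 2), (1, 2, 1), (3, 3, 3), (3, 2, 3), (5, 5, 4), (4, 5, 5)]
clusters = ward_clusters(respondents, k=3)
cluster_means = [centroid(c) for c in clusters]
print(clusters)
print(cluster_means)
```

The cluster means computed at the end correspond to the kind of per-group mean scores that are used to characterise and name the clusters.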

A cluster analysis allows subsequent analysis on the clusters as groups (O’Donoghue 2012). Thus, in this research, the results of the Cluster Analysis were used with a twofold purpose: first, to analyse whether wine and tourism owners/managers differed in relation to their personality; and second, to test whether the personality of respondents (across the groups based on the different personality traits) was related to the decision whether or not to cooperate with other businesses (wine/tourism).

In document An Examination of inter-business cooperation by wine and tourism small and medium-sized businesses in the Douro valley of Portugal. (Page 145-151)