
4. Consumer behaviour issues in electronic commerce

5.4. The research process

5.4.10. Data analysis

Once the data collection was completed, the next step was to analyse it. The questionnaire included both open and pre-coded questions. This section explains the methods used in the analysis of both qualitative and quantitative type answers.

5.4.10.1. Quantitative analysis

The data was analysed using Version 10 of the Statistical Package for the Social Sciences (SPSS). There are three main types of statistical techniques: univariate, bivariate and multivariate (Sarantakos, 1998; Diamantopoulos and Schlegelmilch, 2000; Pallant, 2001; Bryman and Cramer, 2001). The main factor influencing the decision of which statistical technique to use is the type of measurement of the variables. There are three major types of measurement: nominal (or categorical), ordinal and interval/ratio (Hoinville and Jowell, 1977; Oppenheim, 1992; Tull and Hawkins, 1993; Sarantakos, 1998; Diamantopoulos and Schlegelmilch, 2000; Bryman and Cramer, 2001; Jennings, 2001).

The distinction between the types of variables is important because certain statistical tests presume certain kinds of variable (i.e. measurement). Some authors (e.g. Sarantakos, 1998; Bryman and Cramer, 2001) argue that, strictly speaking, multi-item questionnaire measures (e.g. Likert and semantic differential scales) are ordinal data. In this research, multi-item measures were treated as ordinal data for the purposes of conducting some bivariate statistical analysis. However, although this study treats the data as ordinal, both means and standard deviations were computed for the purpose of describing the results.

Univariate techniques

Univariate analysis is the simplest form of quantitative analysis and refers to the various ways of analysing and presenting information relating to a single variable. There are three main groups of univariate measures (Sarantakos, 1998; Diamantopoulos and Schlegelmilch, 2000; Bryman and Cramer, 2001):

• The relational measures relate parts of a group of scores to each other or to the whole (e.g. rate, ratio and percentage);

• The measures of central tendency represent the average or the typical value in a distribution and are one of the most commonly used statistical measures (e.g. mean, median and mode);

• The measures of dispersion inform about the degree to which the data is spread around the mean (e.g. range and standard deviation).
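The three groups of measures can be illustrated with a minimal Python sketch. The study itself used SPSS; the `scores` list below is hypothetical data for a single 5-point item, and the `percentages` helper is introduced here purely for illustration.

```python
from collections import Counter
from statistics import mean, median, mode, stdev

# Hypothetical responses to one 5-point item (illustrative, not study data).
scores = [4, 5, 3, 4, 2, 4, 5, 1, 4, 3]


def percentages(values):
    """Relational measure: share of each response category, in percent."""
    counts = Counter(values)
    total = len(values)
    return {value: 100 * n / total for value, n in sorted(counts.items())}


summary = {
    "percentages": percentages(scores),    # relational measure
    "mean": mean(scores),                  # central tendency
    "median": median(scores),              # central tendency
    "mode": mode(scores),                  # central tendency
    "range": max(scores) - min(scores),    # dispersion
    "std_dev": stdev(scores),              # dispersion (sample standard deviation)
}

for name, value in summary.items():
    print(name, value)
```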

The analysis of the data using univariate statistics provided interesting information. However, it was not sufficient to demonstrate differences in perceptions and relationships between variables. Therefore, bivariate techniques were also used.

Bivariate techniques

The essence of bivariate techniques is that they enable the examination of relationship patterns between two variables. One variable is used to form the comparison groups (i.e. the independent variable) and the other (i.e. the dependent variable) to assess whether it is explained by the independent variable (Sarantakos, 1998; Bryman and Cramer, 2001). There are a number of bivariate techniques and the main criteria influencing the choice of the most appropriate statistical test are the type of measurement and the number of comparison groups (Bryman and Cramer, 2001).

The chi-square test is used to assess whether two variables are independent of or related to each other (Sarantakos, 1998). More specifically, the test examines the null hypothesis (H₀), which assumes that the variables are independent of each other (Sarantakos, 1998; Bryman and Cramer, 2001; Pallant, 2001). This is achieved by comparing the observed and expected frequencies in each of the cells of the contingency table. Whether the null hypothesis is accepted or rejected depends on the probability, which the researcher is prepared to accept, that the differences happened by chance. The level of probability for rejecting the null hypothesis was set at the significance value of 0.05, which means that there is only a 1 in 20 chance that the null hypothesis is rejected when it should have been accepted. Chi-square can be used to assess whether there are differences between the variables, but it does not tell where the differences lie.

There are some circumstances where the chi-square should not be employed. Although these rules vary between researchers, the general rule is that the test should not be used when more than 20 percent of the cells have an expected frequency of less than 5 or if any cell has an expected frequency less than 1 (Tull and Hawkins, 1993; Bryman and Cramer, 2001; Pallant, 2001).
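The comparison of observed and expected frequencies, together with the rule of thumb on small expected counts, can be sketched as follows. This is an illustrative Python sketch rather than the SPSS procedure used in the study; the function names and the `observed` table are hypothetical, and no continuity correction is applied.

```python
def expected_frequencies(observed):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Return (statistic, degrees of freedom, expected frequencies)."""
    expected = expected_frequencies(observed)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df, expected


def expected_counts_acceptable(expected):
    """Rule of thumb from the text: no expected count below 1 and
    no more than 20 percent of cells with an expected count below 5."""
    cells = [e for row in expected for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return min(cells) >= 1 and share_below_5 <= 0.20


# Hypothetical 2x2 contingency table (rows: groups, columns: answers).
observed = [[30, 20], [10, 40]]
stat, df, expected = chi_square(observed)

# 3.841 is the chi-square critical value for df = 1 at the 0.05 level.
print(stat, df, stat > 3.841, expected_counts_acceptable(expected))
```

For tables with more than one degree of freedom, the statistic would be compared with the critical value for that number of degrees of freedom.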

Although the chi-square can be used with nominal, ordinal and interval/ratio variables, it is more frequently used on nominal variables. This is because there are other tests that can be used with ordinal/interval/ratio data and can indicate not only whether there is any difference between the variables, but also where those differences lie. When such tests are applicable, one important decision is whether to use parametric or non-parametric tests. The two types of tests make different assumptions about the population from which the sample has been drawn. The main difference between them is that while parametric tests are based on the assumption that the variable investigated is normally distributed in the population, non-parametric tests do not adhere to the principle of normality (Bryman and Cramer, 2001; Pallant, 2001; Dancey and Reidy, 2002).

There are several techniques for assessing the normality of a variable, such as skewness and kurtosis, histograms, normal Q-Q plots and the Kolmogorov-Smirnov statistic. In this research, the Kolmogorov-Smirnov statistic was used to assess normality. A series of tests was undertaken and consistently revealed that the scores were not normally distributed. Moreover, most of the variables were ‘true’ ordinal variables. Therefore, non-parametric tests were used for assessing the association between the independent variable and ordinal/interval/ratio variables.
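The idea behind the Kolmogorov-Smirnov statistic can be sketched in Python as the largest gap between the empirical distribution of the scores and a normal distribution with the same mean and standard deviation. This is a sketch under stated assumptions, not the SPSS output used in the study: the function name and the sample are hypothetical, and no significance value is computed.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(values):
    """Largest gap between the empirical CDF and a normal CDF fitted to the sample."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = fitted.cdf(x)
        # The empirical CDF steps from (i - 1)/n up to i/n at x; check both sides.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


# Hypothetical, heavily clustered scores of the kind rating scales can produce.
clustered = [1] * 15 + [5] * 5
print(ks_statistic_normal(clustered))
```

The larger the statistic, the further the scores depart from the fitted normal curve.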

The Mann-Whitney test was used to test for differences between two independent groups on an ordinal or interval/ratio variable. This is the equivalent of the parametric t-test for independent samples, but instead of comparing the means of the two groups, the Mann-Whitney test compares medians (Pallant, 2001). More specifically, “it compares the number of times a score from one of the samples is ranked higher than a score from the other sample” (Bryman and Cramer, 2001, p. 133). As with the chi-square, and following general practice (e.g. Balnaves and Caputi, 2001; Bryman and Cramer, 2001; Pallant, 2001), the significance level for rejecting the null hypothesis is 0.05.
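The ranking logic described in the quotation can be sketched in Python as follows. This is an illustration only, not the SPSS routine used in the study: the function names and the two groups are hypothetical, and the p-value uses the large-sample normal approximation without a correction for ties.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    ordered = sorted(values)

    def rank(v):
        first = ordered.index(v)        # 0-based position of the first tied value
        count = ordered.count(v)        # number of tied values
        return first + (count + 1) / 2  # mean of ranks first+1 .. first+count

    return [rank(v) for v in values]


def mann_whitney_u(group_a, group_b):
    """Return (U, two-sided p) using the normal approximation, no tie correction."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u = min(u_a, n_a * n_b - u_a)       # the smaller of the two U values
    mean_u = n_a * n_b / 2
    sd_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mean_u) / sd_u             # z <= 0 because u is the smaller U
    p = min(1.0, 2 * NormalDist().cdf(z))
    return u, p


# Hypothetical scores for two independent groups.
u, p = mann_whitney_u([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(u, p, p < 0.05)
```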

The Kruskal-Wallis test is similar to the Mann-Whitney but can be used to compare three or more unrelated samples. It is the equivalent test to the parametric ANOVA. A significant result (p<0.05) indicates that at least one of the groups is different from at least one of the others. It does not tell, however, which ones are different, nor does it tell how many groups are different from each other. Because SPSS does not contain a test to reveal where the differences lie, the Multiple Comparison Test (MCT) proposed by Siegel and Castellan (1988) was used to determine which groups were different. The MCT aims to check whether the null hypothesis should be accepted or rejected for each pair of groups.
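A minimal Python sketch of the Kruskal-Wallis statistic and the mean ranks per group is given below. It is illustrative only: the study relied on SPSS, the function names and the three groups are hypothetical, and no correction for ties is applied.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    ordered = sorted(values)

    def rank(v):
        first = ordered.index(v)        # 0-based position of the first tied value
        count = ordered.count(v)        # number of tied values
        return first + (count + 1) / 2  # mean of ranks first+1 .. first+count

    return [rank(v) for v in values]


def kruskal_wallis(groups):
    """Return (H, mean rank of each group) for three or more unrelated samples."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    mean_ranks = []
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        start += len(group)
        mean_ranks.append(rank_sum / len(group))
        total += rank_sum ** 2 / len(group)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    return h, mean_ranks


# Hypothetical scores for three sub-groups.
h, mean_ranks = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# 5.991 is the chi-square critical value for df = k - 1 = 2 at the 0.05 level.
print(h, mean_ranks, h > 5.991)
```

The mean ranks returned here are the quantities whose pairwise differences a follow-up comparison would examine.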

For this purpose, the researcher needs two values:

• The difference in mean rank between the two sub-groups, which can be easily calculated using the SPSS output.

• The critical value of z, which is obtained using the following formula (Siegel and Castellan, 1988):

$$z_{\alpha/k(k-1)} \sqrt{\frac{N(N+1)}{12}\left(\frac{1}{n_u}+\frac{1}{n_v}\right)}$$

As far as the z value is concerned, the significance (or p) value chosen for the original analysis (that is, 0.05) was used. Using the number of comparisons (three, as given by the formula k(k-1)/2, where k is the number of sub-groups) and the alpha level (α = 0.05), the corresponding z value is 2.394 (Siegel and Castellan, 1988). N is the total number of responses from the two sub-groups (u and v) being analysed, whereas n_u is the number of responses in sub-group u and n_v the number of responses in sub-group v. After obtaining the critical value of z, this value is compared with the difference in the mean ranks. Only when the difference in the mean ranks exceeds the critical value of z is the comparison significant. As far as the presentation of the results is concerned, given that the MCT does not give the exact significance value, the researcher can only report whether the difference between the two groups is significant or not at the 0.05 level (which in the tables is indicated with a +).
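The calculation of the critical value can be sketched in Python as below. This is a hedged illustration of the Siegel and Castellan (1988) procedure as described in the text, not the author's own worksheet: the function name, the sub-group sizes and the mean-rank difference are hypothetical, and N is passed in explicitly so that it can follow the definition given above.

```python
from math import sqrt
from statistics import NormalDist


def mct_critical_difference(k, n_total, n_u, n_v, alpha=0.05):
    """Return (z, critical difference in mean ranks) for one pair of sub-groups.

    k is the number of sub-groups, n_total is N, and n_u and n_v are the
    numbers of responses in the two sub-groups being compared.
    """
    tail = alpha / (k * (k - 1))        # alpha shared across the two-sided comparisons
    z = NormalDist().inv_cdf(1 - tail)  # upper-tail z value for that probability
    critical = z * sqrt(n_total * (n_total + 1) / 12 * (1 / n_u + 1 / n_v))
    return z, critical


# Hypothetical example: three sub-groups, comparing two of them with 30 responses each.
z, critical = mct_critical_difference(k=3, n_total=60, n_u=30, n_v=30)

observed_difference = 12.5              # hypothetical difference in mean ranks
print(round(z, 3), critical, abs(observed_difference) > critical)
```

With three sub-groups and an alpha level of 0.05, the z value computed this way agrees with the 2.394 reported in the text.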

The chi-square, Mann-Whitney, Kruskal-Wallis and Multiple Comparison tests were performed in order to examine the null hypothesis (H₀). The null hypothesis states that there is no difference between the sub-groups of the independent variables (described in Section 5.5.3) and, as such, any observed differences could have arisen by chance or been caused by the sampling procedures (Sarantakos, 1998). If the null hypothesis is rejected (at the 95% level of confidence), the alternative hypothesis is accepted, which suggests that any differences between the sub-groups of the independent variables are likely to be genuine (Dancey and Reidy, 2002).

Multivariate statistics

Multivariate statistics explore the connections between three or more variables simultaneously (Bryman and Cramer, 2001). Multivariate statistics are more complex than univariate and bivariate statistics. There are several multivariate techniques and the most frequently used are multiple regression, factor analysis and cluster analysis (Diamantopoulos and Schlegelmilch, 2000; Bryman and Cramer, 2001; Dancey and Reidy, 2002). Multivariate techniques were not used because they were not necessary for achieving the aims of the thesis.

5.4.10.2. Qualitative type analysis

There are a variety of approaches for analysing qualitative data. For example, Tesch (1990) identified 26 and Creswell (1998) 28 different approaches to qualitative analysis. Thus, there is no single right way to do qualitative data analysis and much depends on the purposes of the research (Punch, 1998).

One such approach to qualitative analysis is the interactive model suggested by Miles and Huberman (1994). According to these authors, analysis of qualitative data can be seen as a process consisting of three activities: data reduction, data display and conclusion drawing/verifying. This model was adopted for the analysis of open-ended questions (Figure 5.4) as it is an appropriate approach “for more quantitatively oriented researchers who accept the necessity of ‘going qualitative’ but are concerned that they will have to leave their scientific principles behind if they do so” (Robson, 2002, pp. 473-4).

Figure 5.4: Components of data analysis: Interactive model
Source: Miles and Huberman (1994)

[Figure: diagram linking four components: data collection, data reduction, data display, and conclusions (drawing/verifying).]

Data reduction is a continuous process throughout the analysis and refers to the process of selecting, simplifying or transforming the answers (Miles and Huberman, 1994). It involves the careful study of the content of the answers and then fitting the answers into a pattern of categories developed after the responses have been studied (Sarantakos, 1998). To achieve this, a list was produced in order to observe patterns of response. Although most of the open questions clearly indicated that the respondents should give only one motive/reason, some respondents wrote two or more reasons. In such circumstances, only the first reason written was considered.

The next stage was to input into SPSS all the data from the open-ended questions as they were given by the respondents. Following Miles and Huberman (1994), a two-stage coding process was applied. The first stage focused on grouping the answers with a common theme (e.g. saving of time). At this stage an effort was made to reduce significantly the number of categories while maintaining the meaning of the answer. The second stage involved grouping these themes into a few categories so that the information was reduced to a level at which quantitative analysis could be applied. These themes and categories were, it is argued, accurate, unidimensional, mutually exclusive and exhaustive (Sarantakos, 1998). For example, themes such as ‘saving of time’, ‘more time to evaluate options’ and ‘flexibility of time’ were grouped in the category ‘time’.
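The second coding stage can be sketched in Python as a mapping from themes to categories, followed by frequency counts. This is an illustration only: the coding in the study was done in SPSS, the first stage (assigning each answer to a theme) remains a matter of the researcher's judgement, and everything except the three ‘time’ themes is hypothetical.

```python
from collections import Counter

# Stage 2 mapping from themes to final categories.
# Only the three 'time' themes come from the text; the 'price' entries are hypothetical.
THEME_TO_CATEGORY = {
    "saving of time": "time",
    "more time to evaluate options": "time",
    "flexibility of time": "time",
    "lower prices": "price",
    "special offers": "price",
}

# Hypothetical stage 1 output: one theme per respondent.
coded_answers = [
    "saving of time",
    "lower prices",
    "flexibility of time",
    "saving of time",
    "special offers",
]

# Frequencies of the final categories.
category_counts = Counter(THEME_TO_CATEGORY[theme] for theme in coded_answers)

# Frequencies of the themes within each category, which keeps the link
# between each final category and the themes that originated it.
themes_by_category = {}
for theme in coded_answers:
    category = THEME_TO_CATEGORY[theme]
    themes_by_category.setdefault(category, Counter())[theme] += 1

print(category_counts)
print(themes_by_category)
```

Because each theme maps to exactly one category, the categories in this sketch are mutually exclusive by construction.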

A very important requirement in qualitative analysis is to guarantee that the data is not stripped of its context (Punch, 1998). Using the two-stage process allows the few final categories to be linked to the themes that originated them, maintaining to a certain degree the context of the answers.

Data display is the stage where information is organised, compressed and assembled (Punch, 1998). Tables were the display method used in this study. For each open-ended question two types of tables were produced. One table shows the final categories that resulted from the process of data reduction. Another table illustrates the themes that made up each main category in order to enable a deeper understanding of the more specific meanings of each main category and at the same time minimise the loss of information.

Drawing and verifying conclusions. Reducing and displaying data is meaningful only if it assists in drawing conclusions (Punch, 1998). There are several tactics that can be used for the purposes of drawing and verifying conclusions, such as making comparisons, noting themes and patterns, and looking for negative statements (Miles and Huberman, 1994).

In document An evaluation of the factors influencing the adoption of e-commerce in the purchasing of leisure travel by the residents of Cascais, Portugal (Page 191-198)