4. RESEARCH METHODOLOGY
4.2 Methodological approach
This section presents the methodology of this research work, which is applied to analyse the data collected through the travellers’ survey (see chapter 6) at four interurban interchanges (see chapter 5).
15 A critical incident is an encounter that is particularly satisfying or dissatisfying. Its occurrence is assumed to have a significant impact on satisfaction. In public transport services, negative critical incidents may have the
It is worth noting that the approach focuses on the subjective side of the quality loop: it aims to identify key quality factors according to travellers’ perceptions and then to cluster travellers according to those perceptions. The focus is on this “part of the cake”, whose measurement and assessment have become a challenge for researchers in the transportation sector (Ibeas et al., 2008; Piriyawat and Narupiti, 2004; Swanson et al., 1997; TRB, 2003; Vanhanen and Kurri, 2007), since past efforts were mainly devoted to the objective side (Lee, 1989; Nakanishi, 1997). Users’ expectations and perceptions are not easy to evaluate when speaking of quality, and in transportation this is even more complicated because of the different factors involved, such as economics, geography and sociology.
Figure 8 illustrates the methodological approach and the relationship between its steps and the specific objectives presented in section 1.2.2.
Figure 8. Methodological approach and specific objectives (A, B, C, D).
The input to the whole process is data obtained through customer satisfaction surveys, designed to measure customers’ satisfaction with a series of quality attributes. The characteristics of a “well designed” survey are described in section 3.2.2. It is common to ask respondents to rate their satisfaction on scales from “1” (very low satisfaction) to “5” or “10” (very high satisfaction), so that the variables to be analysed are ordinal-scale variables with 5 or 10 categories, or modalities (Peña and Romo, 1989).
It is worth stressing that, in this research work, the design of the survey and the collection of data were not tasks of the doctoral work and are considered as inputs to the methodology. In fact, data were collected within the HERMES project with the objective of testing different business models of intermodal services, as explained in section 2.2. Nevertheless, descriptive analyses of the samples shed light on the characteristics of the intermodal demand at interchanges (specific objective A).
The first step of the methodology is the implementation of Multiple Correspondence Analysis (MCA) on the data obtained through the customers’ survey. This multivariate statistical analysis makes it possible to explore the associations among categorical variables (quality attributes) and so to identify the perceived key quality factors at interurban interchanges (specific objective B).
The second step of the methodology is the implementation of k-means Cluster Analysis (CA) on the dimensions obtained in the previous step. This multivariate statistical analysis makes it possible to group respondents into homogeneous travellers profiles, according to their perceptions of quality (specific objective C). Statistical tests (chi-square tests with socio-economic variables) were implemented to better characterise the profiles.
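The chi-square test mentioned above can be illustrated with a minimal sketch. It is not the SPSS procedure used in this work: the function name, the contingency table and its labels (profiles by trip purpose) are hypothetical and serve only to show how the statistic and its degrees of freedom are obtained.

```python
def chi_square_statistic(table):
    """Return (chi2, dof) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, dof


# Hypothetical counts (not survey data): rows are two travellers profiles,
# columns are two trip purposes (e.g. work, leisure).
table = [[30, 10],
         [10, 30]]
chi2, dof = chi_square_statistic(table)
```

With one degree of freedom, the statistic would be compared with the critical value 3.841 at the 5% significance level; a larger value suggests that profile membership and the socio-economic variable are associated.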
The third step of the methodology is the development of policy recommendations to support stakeholders when identifying priorities (specific objective D). This last step builds on both previous findings: priorities are targeted at specific travellers because of their specific perception of quality at interchanges.
It is worth stressing that the methodology (the statistical analyses) was performed four times, once for each case study, and that data were analysed independently for each case study. Statistical analyses were performed using SPSS Statistics16, v.19.0.
The three steps of the methodological approach are described in the following sub-sections, pointing out their main features and justifying why they were selected for this work.
16 SPSS Statistics is a software package used for statistical analysis, developed by IBM Corporation. It is among the most widely used programs for statistical analysis in social science and provides data analysis, data mining,
4.2.1 Multiple Correspondence Analysis (MCA)
In the first step, Multiple Correspondence Analysis was employed to explore latent constructs concerning satisfaction with quality attributes and thus to identify key quality factors (KQF) at interchanges.
Correspondence analysis has become popular in some areas of the social sciences, such as marketing, as well as in ecology and behavioural research (Hoffman and De Leeuw, 1992). It is an exploratory technique for nominal and categorical data17 and allows the construction of principal components that optimally summarise the data, since it can show not only the relationship between rows and columns, but also the relationships among the categories of either the rows or the columns (Benzecri, 1992).
MCA attempts to reduce the dimensionality of the data by finding the minimum number of factors that can explain most of their variability: the aim of MCA is to explain the most inertia, or variance, in the fewest dimensions (Hair et al., 2010). MCA works similarly to principal components analysis (PCA), i.e. the total variance of the table is defined and this total is then decomposed optimally along so-called “principal axes”. However, PCA seeks the combinations of variables that explain the largest amount of variance in the data set, whereas the focus of MCA is on the associations among variables (Hoffman and De Leeuw, 1992).
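To make the notion of total inertia concrete, the following minimal sketch codes a handful of hypothetical ratings into an indicator (one-hot) matrix and computes its total inertia as the chi-square statistic of the table divided by its grand total. The responses and function names are assumptions made for illustration; the analyses in this work were run in SPSS. For an indicator matrix with Q variables and J observed categories, the total inertia is known to equal J/Q - 1, which the example reproduces.

```python
def indicator_matrix(responses, n_categories):
    """One-hot code each rating (1..n_categories) of each respondent."""
    rows = []
    for resp in responses:
        row = []
        for rating in resp:
            row.extend(1 if rating == c else 0
                       for c in range(1, n_categories + 1))
        rows.append(row)
    return rows


def total_inertia(matrix):
    """Chi-square statistic of the table divided by its grand total."""
    row_totals = [sum(r) for r in matrix]
    col_totals = [sum(c) for c in zip(*matrix)]
    n = sum(row_totals)
    inertia = 0.0
    for i, r in enumerate(matrix):
        for j, observed in enumerate(r):
            if col_totals[j] == 0:
                continue  # category never chosen: no contribution
            expected = row_totals[i] * col_totals[j] / n
            inertia += (observed - expected) ** 2 / expected
    return inertia / n


# Hypothetical ratings of Q = 2 quality attributes on a 5-point scale.
responses = [(5, 4), (4, 5), (2, 1), (1, 2), (3, 3)]
Z = indicator_matrix(responses, n_categories=5)
inertia = total_inertia(Z)  # J/Q - 1 with J = 10 categories, Q = 2
```

MCA then decomposes this total along principal axes, so that the first few dimensions account for as large a share of it as possible.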
MCA analyses multi-way tables, with each row and column becoming a point on a multidimensional graphical map. In practice, MCA represents the categories of both rows and columns graphically and allows for a comparison of their “correspondences”, or associations. These perceptual plots18 provide a comprehensive view of the associations among row and/or column points (categories), although the interpretation of the axes is not always intuitive, since MCA uses the chi-square distance rather than the Euclidean distance between points (Gower et al., 2010).
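The chi-square distance between two rows can be sketched as follows: each row is first converted into a profile (its relative frequencies), and squared differences between profiles are weighted by the inverse of the column masses. The table and the function name are hypothetical and only illustrate the definition.

```python
from math import sqrt


def chi_square_distance(table, i, k):
    """Chi-square distance between the profiles of rows i and k."""
    n = sum(sum(row) for row in table)
    col_mass = [sum(col) / n for col in zip(*table)]
    profile_i = [x / sum(table[i]) for x in table[i]]
    profile_k = [x / sum(table[k]) for x in table[k]]
    return sqrt(sum((a - b) ** 2 / c
                    for a, b, c in zip(profile_i, profile_k, col_mass)
                    if c > 0))


# Hypothetical frequency table: rows 0 and 2 have identical profiles.
table = [[10, 10],
         [20, 0],
         [10, 10]]
d_01 = chi_square_distance(table, 0, 1)
d_02 = chi_square_distance(table, 0, 2)
```

Because differences in rare columns are divided by a small mass, they weigh more than the same differences in frequent columns, which is one reason why distances on the map do not read like ordinary Euclidean distances.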
Annex A presents the theoretical concepts of MCA, together with useful hints for interpreting and understanding the output of the analysis.
MCA was performed in this research work on the quality attributes of the travellers’ survey carried out at the case studies. These attributes are presented in section 6.1 and in the analysis were coded on an ordinal scale, since travellers were asked to rate their satisfaction on a Likert scale that ranges from positive values (“5”) to negative ones (“1”).
17 This fact is extremely important: it is the point of the methodological approach. Ordinality is most often ignored and numbers such as “1, 2, 3,…”, representing ordered categories, are treated as numbers having metric properties, a procedure which is incorrect.
18 The word perceptual comes from the word “perception”, which refers to the consumers’ understanding of the
Figure 9 visually summarises the inputs and outputs of this step of the methodology: the inputs are the ratings of the quality attributes according to travellers’ perception, and the outputs are the key quality factors, i.e. the latent constructs (dimensions) obtained through the implementation of MCA.
Figure 9. The first step of the methodology.
Within this work, this statistical method was chosen for the following main reasons:
1. MCA assumes that the data being analysed are discrete (nominal or categorical) variables and, as already pointed out, we deal with ordinal-scale variables where the number of categories is “5” for each of them. The categories of ordinal-scale variables are non-quantitative and only indicate relative position within an ordered scale: applying classical techniques, e.g. Factor Analysis, and arithmetic operations to them is not the best approach (Hair et al., 2010).
2. MCA is a compositional technique, rather than a decompositional approach (Greenacre, 2007). It can simplify complex data from a large table into latent constructs while preserving as much of the information in the data set as possible, i.e. the maximum possible proportion of the total variation in the original data set.
3. MCA presents data using two-dimensional plots, displaying row and column categories together, and it makes it easy to add supplementary data points post hoc that may aid the interpretation of the model. In other words, it allows row or column points that carry zero inertia to be added to the biplot after it has been constructed (Hoffman and De Leeuw, 1992).
4. MCA needs no special statistical assumptions or requirements. Because MCA is a non-parametric technique, there are no theoretical distributional assumptions to be met19. MCA is interesting because it focuses mainly on how variables correspond to one another and not on whether there is a significant difference between these variables. Generally, the literature only suggests that (Greenacre, 2007; Garson, 2008):
- homogeneity of variance across row and column variables must be met, meaning that the statistical properties are similar across rows and columns;
- data are made up of several categories (typically more than three) so that the analysis is really informative;
- all values in the frequency table must be non-negative so that the distances between the points on the biplot are always positive.
4.2.2 Cluster Analysis (CA)
In the second step, k-means Cluster Analysis was performed to classify travellers into homogeneous groups of transport users, the so-called travellers profiles (TP), according to their perception of satisfaction with the KQF obtained from Multiple Correspondence Analysis.
Cluster analysis is an exploratory data analysis tool for segmenting observed data into groups, or clusters, maximising the dissimilarity between groups (Hair et al., 2010). In this sense, CA creates groups in which objects in the same cluster are homogeneous and similar in some way to each other and dissimilar to those in other clusters. The analysis makes no distinction between dependent and independent variables, because it is exploratory.
19 Therefore, contrary to the classical implementation of the chi-square test, when applied to MCA the chi-square test does not reveal whether the association between variables is statistically significant. MCA does not support
Segmenting respondents through CA is typical of marketing research and activity: using cluster analysis, a customer ‘type’ can represent a homogeneous market segment whose needs, attitudes and behaviour can be better identified, so that products can be better designed to appeal to it (Athanassopoulos, 2000). CA is a common technique in the banking, insurance and tourism markets and it has also been applied in the transportation sector, due to its strong relationship with economics and social science (QUATTRO, 1998).
Cluster analysis can be carried out with various algorithms, which differ significantly. The most common are hierarchical CA and partition CA. The first approach builds clusters from previously established clusters, either starting with one-point clusters and recursively merging the two or more most appropriate clusters (agglomerative algorithms) or starting with the whole set and dividing it into successively smaller clusters (divisive algorithms). The second approach generates a single partition of the data with a specified or estimated number of clusters. In this research work a partition method, i.e. k-means clustering, was chosen.
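A partition method of this kind can be sketched with the standard Lloyd iteration for k-means. The code is an illustration under stated assumptions, not the SPSS routine used in this work: the initial centres are simply the first k points, and the respondent scores on two MCA dimensions are hypothetical.

```python
def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; initial centres are the first k points (assumption)."""
    centres = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centre
        # (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centres[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: each centre becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster is empty
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


# Hypothetical respondent scores on two MCA dimensions (not survey data).
points = [(-1.0, -0.8), (1.1, 0.9), (-0.9, -1.1),
          (1.0, 1.2), (-1.2, -1.0), (0.9, 1.0)]
labels, centres = kmeans(points, k=2)
```

Each label identifies the profile to which a respondent is assigned, and each centre summarises the average position of that profile on the MCA dimensions. In practice the result can depend on the initial centres, so the simple initialisation used here should be read as a convenience of the sketch.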
K-means clustering is a method to quickly cluster large data sets, based on an iterative