4.3 Choosing appropriate analytic tools

4.3.1 An introduction to the Statistical Package for the Social Sciences (SPSS) Two-Step Cluster Analysis (STCA)

As mentioned in the introductory chapter, one of the primary objectives of this research is to explore how the field of cultural consumption in China is socially structured and to what extent the cultural hierarchy (if applicable) correlates with social structures, through an analysis of art museum visitors. To fulfil this purpose, the primary tasks of the quantitative phase are to explore whether the visitors’ cultural tastes follow certain patterns and what factors distinguish visitors who do not belong to the same ‘interpretive community’ (Fish 1980; Hooper-Greenhill 2000) or who have different ‘cultural profiles’ (Hanquinet 2015). In the field of cultural sociology, cluster analysis is a well-established approach to separating survey participants based on their cultural preferences (e.g. Savage and Gayo 2011; Everitt et al. 2011: 9). Many scholars have defined this analytic approach in their works. For instance, according to Henning and Meila, the aim of cluster analysis is to ‘divide data into groups (clusters) that are meaningful, useful, or both’ (2016: 6). Leese et al. take a similar view and define cluster analysis as a method of providing ‘objective and stable classifications’ (2011: 1).

Gordon gives a detailed introduction to the term ‘clusters’. For him (1999), clusters represent the relationship between internal factors and external factors, coherence and isolation, and homogeneity and separation.

Although cluster analysis has been considered ‘the most basic method of estimating similarities’ (Romesburg 2004: 8), this does not mean that scholars who apply the technique must sacrifice the complexity and richness of their research aims for a less sophisticated analytical instrument. This is exemplified in the work of many scholars in the realm of socio-cultural studies, who agree that cluster analysis is a practical and efficient approach to understanding cultural tastes and activities (e.g., Savage and Gayo 2009, 2011; Van Eijck 2011; Hanquinet 2013). For instance, in Van Eijck’s (2011) study on social differentiation in music taste categories, the findings of the factor and cluster analysis suggest that music taste genres in the Netherlands can be categorised into three hierarchical groups (highbrow, pop, and folk). Van Eijck also identifies an omnivore taste among the members of the ‘new middle class’, who have preferences for a broad range of cultural goods. Savage and Gayo’s (2011) account is another example of a study that successfully obtains results by applying cluster analysis. Savage and Gayo challenge the popularisation of the omnivore–univore hypothesis by highlighting the ‘subtle’ differences underlying the integration of individuals’ claimed tastes for classic and pop genres. For them, the explanatory power of the omnivore model is limited when it comes to understanding the criteria and the disciplines that separate their survey participants into six clusters. Hanquinet (2013) also uses cluster analysis as the primary approach for setting up different types of individual cultural profiles. In doing so, she illustrates the multidimensionality and the complexity of individual cultural preferences.

There are also some additional advantages to using STCA. First, in comparison to the traditional cluster analytical methods (e.g., hierarchical and k-means clustering), STCA is well known for its effectiveness in working with a large-scale database (Garson 2009; Altas et al. 2013; Rousseeuw 1987; Shih et al. 2010; Everitt et al. 2011; Norušis 2012; Chorianopoulos 2015: 318). Considering the large number of survey participants involved in the present study, STCA was therefore chosen as the way to conduct further data mining. Second, STCA can work simultaneously with both continuous and categorical variables. The procedure can also determine the number of clusters automatically after the calculation (Trpkova and Tevdovski 2009: 89). Third, STCA presents its output graphically, in tables and charts that show, for example, the importance of each variable as a predictor. This feature enables researchers to easily determine and interpret the composition of the clusters and to identify the importance of the specific variables that contribute to the model. Finally, STCA enables researchers to apply Outlier Treatment (OT) to their data, which limits the negative impact of cases that differ markedly from the other cases. According to Norušis, such cases tend to increase the overall number of clusters and to make the clusters less homogeneous (2010: 384). Zhang et al. (1996: 107) consider these kinds of cases as the outliers that are separated from the fine sub-clusters. In this regard, STCA detects and separates the atypical values (outliers) during the calculation and tries either to fit these atypical values into the sub-clusters without increasing the cluster size or to build another cluster for the cases.

The discussion above illustrates the efficiency of cluster analysis in identifying groups of cases that share common characteristics in a database. However, cluster analysis has its weaknesses. The stability and the validity of the results of the clustering process are difficult to measure and to guarantee. To be more specific, researchers must subjectively ‘decide [on] the optimal number of clusters that fits a data set’ (Halkidi et al. 2010). A bias may be introduced when there is a lack of theoretical guidelines for measuring the quality (goodness-of-fit index) of the cluster assignments (Cheong and Lee 2008). To avoid this methodological shortcoming, this research adopted two measures: the Silhouette Measure of Cohesion and Separation in two-step cluster analysis, and a stability and validation test. The section that follows introduces the two approaches.

First, the Statistical Package for the Social Sciences (SPSS) Two-Step Cluster Analysis (STCA) was chosen to explore the potential clusters in the database. In this type of analysis, the algorithm uses two phases of statistical calculation. Specifically, similar to the balanced iterative reducing and clustering using hierarchies (BIRCH) algorithm designed by Zhang et al. (1996), the first, or pre-clustering, step in STCA explores the dense regions by rejecting the outliers (clusters with few cases).
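The two-phase logic can be illustrated with a deliberately simplified sketch in Python. It is not the SPSS procedure: it assumes continuous variables only, uses Euclidean distance between centroids instead of a likelihood-based distance, scans a plain list of sub-clusters instead of building a tree, and omits outlier handling. The function names (`pre_cluster`, `merge_subclusters`, `two_step`), the `threshold` parameter and the toy data are all hypothetical and serve only to show how a single sequential pre-clustering pass feeds a later merging stage.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def pre_cluster(data, threshold):
    """Phase 1 (simplified): one sequential pass over the cases. Each case
    joins the nearest sub-cluster if that sub-cluster's centroid lies within
    `threshold`; otherwise it starts a new sub-cluster."""
    subclusters = []  # each sub-cluster is a list of case indices
    for idx, point in enumerate(data):
        best, best_dist = None, None
        for sub in subclusters:
            d = euclidean(point, centroid([data[i] for i in sub]))
            if best_dist is None or d < best_dist:
                best, best_dist = sub, d
        if best is not None and best_dist <= threshold:
            best.append(idx)
        else:
            subclusters.append([idx])
    return subclusters


def merge_subclusters(data, subclusters, n_clusters):
    """Phase 2 (simplified): repeatedly merge the two sub-clusters whose
    centroids are closest until `n_clusters` groups remain."""
    groups = [list(s) for s in subclusters]
    while len(groups) > n_clusters:
        best_pair, best_dist = None, None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = euclidean(centroid([data[k] for k in groups[i]]),
                              centroid([data[k] for k in groups[j]]))
                if best_dist is None or d < best_dist:
                    best_pair, best_dist = (i, j), d
        i, j = best_pair
        groups[i].extend(groups.pop(j))
    return groups


def two_step(data, threshold, n_clusters):
    """Run both phases and return one cluster label per case."""
    groups = merge_subclusters(data, pre_cluster(data, threshold), n_clusters)
    labels = [None] * len(data)
    for label, members in enumerate(groups):
        for idx in members:
            labels[idx] = label
    return labels


# Hypothetical toy data: three visibly separated groups of cases.
toy_data = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
            (5.0, 5.0), (5.1, 4.8), (4.9, 5.2),
            (9.0, 1.0), (9.2, 1.1)]
toy_labels = two_step(toy_data, threshold=0.5, n_clusters=3)
print(toy_labels)
```

A smaller `threshold` produces more and finer sub-clusters in the first phase and leaves more work to the merging phase; the final partition of the toy data stays the same in this example.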

In the second step, the pre-clusters are grouped into a number of clusters based on a likelihood-based distance measure that is calculated for the continuous and categorical variables (Everitt et al. 2011: 97). After the analysis, STCA provides an evaluation of the cohesion and the separation of the output (the silhouette measure of cohesion and separation). To be more specific, STCA measures the quality of clustering by displaying a graphic snapshot based on the silhouette scores introduced by Rousseeuw (1987) (as cited in Kaufman and Rousseeuw 1990: 41). In the STCA output, a good result refers to an average silhouette coefficient higher than 0.5, which indicates strong evidence of the cluster structure, while a coefficient lower than 0.2 represents a poor solution (Chorianopoulos 2015: 130). These two values were used as reference values in determining the number of clusters, with the aim of avoiding the bias caused by subjectivity.
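The silhouette coefficient referred to above can be computed by hand for a small example. For each case, a is the mean distance to the other members of its own cluster and b is the lowest mean distance to the members of any other cluster; the score is (b − a) / max(a, b). The sketch below is a minimal Python illustration that assumes continuous variables and Euclidean distance. The function names, the toy data and the label ‘fair’ for the middle band are illustrative assumptions; only the cut-offs of 0.5 and 0.2 come from the text.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_scores(data, labels):
    """Return one silhouette score per case. Needs at least two clusters."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette scores need at least two clusters")
    scores = []
    for idx, label in enumerate(labels):
        own = [j for j in clusters[label] if j != idx]
        if not own:
            scores.append(0.0)  # usual convention for single-case clusters
            continue
        # a: mean distance to the other cases in the same cluster (cohesion)
        a = sum(euclidean(data[idx], data[j]) for j in own) / len(own)
        # b: lowest mean distance to the cases of another cluster (separation)
        b = min(
            sum(euclidean(data[idx], data[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores


def average_silhouette(data, labels):
    """Mean silhouette score over all cases."""
    scores = silhouette_scores(data, labels)
    return sum(scores) / len(scores)


def interpret(avg):
    """Cut-offs quoted in the text: above 0.5 good, below 0.2 poor.
    The name of the middle band is an assumption made for illustration."""
    if avg > 0.5:
        return "good"
    if avg < 0.2:
        return "poor"
    return "fair"


# Hypothetical toy data with one sensible and one arbitrary partition.
toy_data = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
            (5.0, 5.0), (5.1, 4.8), (4.9, 5.2),
            (9.0, 1.0), (9.2, 1.1)]
sensible = [0, 0, 0, 1, 1, 1, 2, 2]
arbitrary = [0, 1, 0, 1, 0, 1, 0, 1]

print(interpret(average_silhouette(toy_data, sensible)))
print(interpret(average_silhouette(toy_data, arbitrary)))
```

Comparing the average score across candidate numbers of clusters is one way of using the coefficient as a reference value, instead of relying on judgement alone.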

The second set of measures, taken to test the stability and the validity of the resulting cluster partitions, comprises replication analysis, supplementary analysis, and interviews. First, following Milligan and Hirtle’s (2012) suggestion, replication analysis is introduced in the quantitative phase to measure the stability of the clustering solution. For Milligan and Hirtle, if the clustering result is stable, a researcher should be able to obtain similar results by applying cluster analysis to a second sample from the same source and with the same set of variables. Within the quantitative data analysis phase of this research, the case-split module in SPSS is used to separate the sample into two random halves. Next, Cohen’s kappa coefficient is calculated to measure the agreement between the two equivalent clustering solutions.
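The agreement statistic can be illustrated with a short Python sketch. Cohen’s kappa is (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of cases given the same label by both solutions and p_e is the proportion expected by chance from the marginal label frequencies. The sketch assumes that both label lists refer to the same cases in the same order and that the cluster labels of the two solutions have already been matched to one another; how that matching is done is not specified here. The helper `split_halves` is a hypothetical stand-in for a random case split, and the label lists are invented.

```python
import random
from collections import Counter


def split_halves(n_cases, seed=0):
    """Randomly split case indices 0..n_cases-1 into two halves."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    half = n_cases // 2
    return sorted(indices[:half]), sorted(indices[half:])


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two label lists covering the same cases."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of cases with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from the marginal frequencies of each label.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both solutions place every case in the same single cluster
    return (observed - expected) / (1 - expected)


# Invented example: two solutions that disagree on one of eight cases.
solution_a = [0, 0, 1, 1, 2, 2, 0, 1]
solution_b = [0, 0, 1, 1, 2, 2, 1, 1]
kappa = cohens_kappa(solution_a, solution_b)
print(round(kappa, 3))

first_half, second_half = split_halves(10, seed=1)
print(first_half, second_half)
```

Values of kappa close to 1 indicate that the two solutions assign cases to clusters in nearly the same way, which is what a stable clustering result should produce.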

Second, to measure the validity of the clustering solution, the quality of the final clustering result is also assessed with supplementary (bivariate) analysis. According to Skinner (1981), an attempt to establish external validity should be involved in the process of developing a classification system. Specifically, if the clusters show a correlation with variables that are associated with existing theoretical frameworks, then this supports the classification. Eight variables related to conservative aesthetic claims are tested and cross-tabulated with the ‘cultural profiles’ of the visitors.
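A cross-tabulation of this kind can be sketched in Python as follows. The text specifies only that the variables are cross-tabulated with the cluster memberships; the use of Pearson’s chi-square statistic and Cramér’s V as the measures of association is an assumption made for illustration, and the cluster names and responses are invented.

```python
import math
from collections import Counter


def crosstab(rows, cols):
    """Build a contingency table of counts from two categorical lists."""
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table


def chi_square(table):
    """Pearson chi-square statistic for a table with no empty row or column."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def cramers_v(table):
    """Cramér's V: chi-square rescaled to 0..1. Needs at least a 2 x 2 table."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi_square(table) / (n * k))


# Invented example: cluster membership against one aesthetic statement.
cluster = ["A", "A", "A", "A", "B", "B", "B", "B"]
response = ["agree", "agree", "agree", "disagree",
            "disagree", "disagree", "disagree", "agree"]

row_levels, col_levels, table = crosstab(cluster, response)
print(row_levels, col_levels, table)
print(chi_square(table), cramers_v(table))
```

In practice the chi-square statistic would be compared against its reference distribution to obtain a significance level, a step this sketch leaves out.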

The design of the variables is based on the theoretical framework within which the volume of embodied cultural capital held by an individual determines his or her position in the hierarchy of cultural consumption (e.g., Bourdieu 1979). The result of this test provides an assessment of the validity of the clustering solution while enriching the characterisation of the visitor categories. Third, by the same token, the information obtained in the qualitative analysis can also be used to assess the validity of the segments. In exploring whether and how the visitors in different categories distance themselves from one another by expressing their characteristic cultural tastes and interpretations of contemporary art, the findings of the qualitative phase can also be used to assess the differentiation in tastes among visitors who belong to different categories.

In summary, this section has illustrated the methodological strengths and weaknesses of STCA. It has also shown how the disadvantages of this analysis tool can be mitigated by adopting additional statistical measures. A more detailed introduction to the primary analytical tool (thematic analysis) in the qualitative phase of this study is given in the following section.