Distribution of the visitors in the identified age groups

5.4 Data analysis: Cluster analysis and other External Validation Test

5.4.1 Choosing cluster numbers and validation analysis

This section introduces the procedure for choosing an appropriate cluster solution. As mentioned in Chapter 4, based on the selected measuring criteria, the Sciences Two Step Cluster Analysis (STCA) is capable of automatically generating the optimal number of clusters. With this in mind, I decided to pursue a relatively idealistic solution in the initial phase of data analysis. I used the log-likelihood criterion as the distance measure because all the variables are categorical (Trpkova and Tevdovski 2009: 307; Norusis 2010: 380). The STCA offers two clustering criteria for auto-calculation: the Schwarz Bayesian Criterion (BIC) and the Akaike Information Criterion (AIC). Each criterion has its own advantages and disadvantages. Specifically, the AIC tends to overestimate the number of clusters by overfitting the model, while the BIC performs ineffectively when ‘the sample size is limited and the components are not well separated’ (Yang and Yang 2007 as cited in Trpkova and Tevdovski 2009: 161).

Given these limitations, both criteria were tested in order to find the best solution. The BIC- and AIC-based analyses each generated a two-cluster solution, with no significant differences between them. In both cases the clusters accounted for 22.4% and 77.6% of the visitors (as shown in table 5.4-2). According to the components of the clusters, the two groups could be described as Tolerants and Distants. Visitors in the former cluster like every genre of culture indiscriminately, while their counterparts have no taste for cultural products of any kind. Both solutions scored fairly high on the Silhouette test, which supports the quality of the solutions.
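The trade-off between the two criteria follows from their penalty terms. The following is a minimal sketch, assuming the standard textbook definitions (AIC = -2 ln L + 2k and BIC = -2 ln L + k ln n) rather than the software's internal computation. The log-likelihoods and parameter counts are invented for illustration and do not reproduce the results of this study; n = 2376 is simply the sum of the four cluster sizes reported later in this section.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: -2 ln L + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_obs):
    """Schwarz Bayesian Criterion: -2 ln L + k ln n."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


N_OBS = 2376  # sum of the four cluster sizes reported later in this section

# Hypothetical fits: (number of clusters, log-likelihood, free parameters).
candidates = [(2, -5200.0, 40), (3, -5120.0, 60), (4, -5060.0, 80)]


def best_k(criterion):
    """Return the number of clusters with the lowest criterion value."""
    return min(candidates, key=criterion)[0]


best_by_aic = best_k(lambda c: aic(c[1], c[2]))
best_by_bic = best_k(lambda c: bic(c[1], c[2], N_OBS))
print("AIC prefers", best_by_aic, "clusters; BIC prefers", best_by_bic)
```

Because ln n exceeds 2 for any sample larger than about seven cases, the BIC penalises each additional parameter more heavily than the AIC, which is why the AIC is the criterion more prone to favouring larger numbers of clusters.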

Figure 15 Clustering solutions based on two types of clustering criteria for auto-calculation: the Schwarz Bayesian Criterion (BIC) and the Akaike Information Criterion (AIC).

The STCA considered these solutions the most reasonable output; however, the limited information carried by only two clusters made the result less practicable for further analysis. More detailed information is required, which can be gained by increasing the number of clusters. Zhang et al. (1996) encouraged researchers to increase the precision of their method by increasing the number of sub-clusters. Many researchers (e.g. Trpkova and Tevdovski 2009; Norusis 2010) followed Zhang et al. in believing that the number of clusters should be controlled by researchers rather than the software. One might argue, however, that a self-determined number of clusters risks bias due to subjectivity. I agree with this argument and, accordingly, decided to control the issue by employing two criteria for selecting the final clustering solution: 1) the final solution should score at least 0.2 on the Silhouette Measure; and 2) the final solution should be based on the dendrogram provided by hierarchical analysis (see Figure 16).
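To make the first criterion concrete, the sketch below computes an average silhouette score for categorical records. It is an illustration only: the simple matching distance is an assumption chosen for readability and is not necessarily the measure the STCA uses internally, and the four records are hypothetical visitors coded 1 (likes a genre) or 0 (does not).

```python
def simple_matching_distance(a, b):
    """Share of variables on which two categorical records disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_silhouette(records, labels, distance=simple_matching_distance):
    """Average silhouette width over all records (singletons score 0)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the record's own cluster
        a = sum(distance(records[i], records[j]) for j in own) / len(own)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(distance(records[i], records[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        total += 0.0 if max(a, b) == 0 else (b - a) / max(a, b)
    return total / len(records)


# Hypothetical data: two visitors who like most genres, two who like few.
records = [(1, 1, 1, 1), (1, 1, 1, 0), (0, 0, 0, 0), (0, 0, 0, 1)]
labels = [0, 0, 1, 1]

score = mean_silhouette(records, labels)
print(round(score, 3), "passes the 0.2 threshold:", score >= 0.2)
```

Scores close to 1 indicate compact, well-separated clusters, scores near 0 indicate overlapping clusters, and negative scores suggest that records have been assigned to the wrong cluster.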

Figure 16 Dendrogram of hierarchical cluster analysis

In line with the criteria mentioned above, the hierarchical cluster analysis was initially conducted to determine the most suitable number of segments. Thanks to the visualisation provided by this method, the solutions with three to five clusters appeared promising. Among these three outcomes, only the four-cluster solution fulfilled all of the requirements. Specifically, the five-cluster solution did not pass the cohesion and separation test and suggested a poorly evidenced result, while the three-cluster solution separated visitors into three very uneven clusters. Therefore, the four-segment solution was chosen for further analysis. Its average silhouette score was above 0.3, suggesting a fair level of clustering quality.
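The comparison of the three-, four- and five-cluster solutions can be sketched as follows. This is a naive illustration under stated assumptions: the linkage method (average linkage) and the distance (a count of mismatching variables) are my own choices for the example, since neither is specified above, and the eight records are hypothetical. Cutting the merge sequence at a fixed number of clusters stands in for reading the dendrogram by eye.

```python
def hamming(a, b):
    """Number of variables on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def average_linkage(records, n_clusters):
    """Naive agglomerative clustering: repeatedly merge the two clusters
    with the smallest average pairwise distance until n_clusters remain.
    Returns a list of clusters, each a list of record indices."""
    clusters = [[i] for i in range(len(records))]
    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = sum(
                    hamming(records[i], records[j])
                    for i in clusters[p]
                    for j in clusters[q]
                ) / (len(clusters[p]) * len(clusters[q]))
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] += clusters.pop(q)  # q > p, so index p stays valid
    return clusters


# Hypothetical visitors: 1 = likes the genre, 0 = does not.
# Columns: two 'legitimate' genres followed by two 'popular' genres.
records = [
    (0, 0, 0, 0), (0, 0, 0, 0),  # like nothing
    (1, 1, 1, 1), (1, 1, 1, 1),  # like everything
    (1, 1, 0, 0), (1, 1, 0, 0),  # like only the 'legitimate' genres
    (0, 0, 1, 1), (0, 0, 1, 1),  # like only the 'popular' genres
]

for k in (3, 4, 5):
    sizes = sorted(len(c) for c in average_linkage(records, k))
    print(k, "clusters, sizes:", sizes)
```

Listing the cluster sizes for each candidate gives a rough check on how evenly a solution divides the sample, which is the ground on which the three-cluster solution was rejected above.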

Art Distants (n=652, size=27.4%): The visitors in this category tend to distance themselves from all the listed cultural genres.

Tolerants (n=461, size=19.4%): The visitors in this category tend to present their tastes for all the listed cultural genres.

Legitimate art lovers (n=638, size=26.9%)

Popular culture lovers (n=625, size=26.3%)

The four clusters account for 27.4%, 19.4%, 26.9% and 26.3% of the visitors respectively. As the largest cluster, cluster 1 can be described as comprising culture Distants. The group is so named because its members tended to distance themselves from all kinds of cultural genres. They claimed that they do not like to listen to music or read books in their leisure time. Unlike the culture Distants, the visitors in the Tolerants category had the opposite cultural preference: they accept various genres indiscriminately. This is the smallest visitor group, accounting for only 19.4% of the visitors. As the second largest visitor group, the legitimate culture lovers prefer high-profile cultural genres (e.g. biographies, autobiographies and opera) and tend to display special preferences for cultural goods from mainland China. To be specific, as shown in Table 10, they like almost all kinds of ‘high-brow’ cultural forms from mainland China. Their counterparts, the popular culture lovers, only express a preference for American and European popular art genres (e.g. American and European heavy metal, hard rock, and punk music) or for works of art that are in the process of legitimation (e.g. blues and jazz).
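The reported cluster shares can be checked directly against the cluster sizes. The short sketch below uses only the counts given above; the variable names are my own.

```python
# Cluster sizes as reported for the four-cluster solution.
cluster_sizes = {
    "Art Distants": 652,
    "Tolerants": 461,
    "Legitimate art lovers": 638,
    "Popular culture lovers": 625,
}

total = sum(cluster_sizes.values())

# Share of each cluster in the clustered sample, in percent, to one decimal.
shares = {name: round(100 * n / total, 1) for name, n in cluster_sizes.items()}

for name, share in shares.items():
    print(f"{name}: n={cluster_sizes[name]}, {share}% of {total}")
```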

The above section showed an interesting juxtaposition of four types of taste. In the following section, multinomial logistic regression will be applied to the data in order to establish cultural profiles for the clusters and to explore the relationship between these consumption patterns and visitors’ demographic characteristics.