7.3.10 Simulation Results

In this section, I compare the clustering results from spatial hierarchical clustering and Chameleon spatial hierarchical clustering across all location scenarios (Figures 7.11, 7.12, 7.13 and 7.14). Data are generated from all four scenarios detailed in Section 7.3.2, and the results are shown in tables. For all scenarios, the tuning parameters required by Chameleon spatial hierarchical clustering are set to K = 3, M = 10 and C = 2 (C = 4 or α = 0.55 for an odd number of objects in the Merging stage). PH is also used to select the number of clusters in both spatial clustering techniques. Each scenario is simulated 100 times.
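
The "Average ARI (sd)" and "Average No. Clusters (sd)" entries in the result tables summarise the 100 replicates of each scenario. A minimal sketch of that summary step is given here, assuming the per-replicate ARI values and cluster counts have already been collected. The function names `summarise_replicates` and `format_row`, the toy input values, and the use of the sample standard deviation are illustrative assumptions, not taken from the text.

```python
from statistics import mean, stdev


def summarise_replicates(aris, n_clusters):
    """Return ((mean, sd) of ARI, (mean, sd) of cluster count) over replicates.

    Uses the sample standard deviation (n - 1 denominator). This choice is an
    assumption: the text does not say which form of the sd was used.
    """
    if len(aris) != len(n_clusters):
        raise ValueError("expected one ARI and one cluster count per replicate")
    if len(aris) < 2:
        raise ValueError("at least two replicates are needed for a standard deviation")
    return (mean(aris), stdev(aris)), (mean(n_clusters), stdev(n_clusters))


def format_row(method, aris, n_clusters):
    """Format one table row as 'method  mean (sd)  mean (sd)'."""
    (ari_m, ari_sd), (k_m, k_sd) = summarise_replicates(aris, n_clusters)
    return f"{method}  {ari_m:.3f} ({ari_sd:.3f})  {k_m:.2f} ({k_sd:.2f})"


# Toy values for four replicates (illustrative only, not results from the study).
toy_aris = [0.90, 0.85, 0.88, 0.87]
toy_counts = [4, 4, 3, 4]
print(format_row("CSHC", toy_aris, toy_counts))
```

In an actual run the two lists would each hold 100 values, one per simulated data set.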

Table 7.5: Clustering Results for Data from All Sets of the Type Given in Figure 7.11(a)

Method    Average ARI (sd)    Average No. Clusters (sd)    Total No. Clusters (Main Clusters and Noise Points)

Distributions generated from simulation set-up 1
CSHC^a    0.870 (0.028)       3.80 (0.41)                  6 (4+2)
SHC^b     0.657 (0.075)       8.83 (1.39)                  6 (4+2)

Distributions generated from simulation set-up 2
CSHC      0.732 (0.063)       7.54 (0.37)                  6 (4+2)
SHC       0.545 (0.089)       10.87 (1.32)                 6 (4+2)

Distributions generated from simulation set-up 3
CSHC      0.936 (0.023)       4.88 (0.19)                  6 (4+2)
SHC       0.683 (0.081)       8.93 (1.42)                  6 (4+2)

Distributions generated from simulation set-up 4
CSHC      0.602 (0.058)       7.65 (0.44)                  6 (4+2)
SHC       0.547 (0.085)       10.67 (1.47)                 6 (4+2)

^a CSHC is short for Chameleon spatial hierarchical clustering.
^b SHC is short for spatial hierarchical clustering.

The simulation results for the locations in Figures 7.11(b), 7.12, 7.13 and 7.14 are shown in Appendix A.6. The tables show that Chameleon spatial hierarchical clustering ran much faster than spatial hierarchical clustering across all scenarios. Chameleon spatial hierarchical clustering also does better in a majority of scenarios, with a higher average ARI and an estimated number of clusters closer to the actual number. It tends to form fewer clusters, each containing a larger number of areal units, because it has difficulty identifying noise points.

In Table 7.5 we can see that, when the number of noise points is small, the locations form a condensed distribution of areas (i.e. locations in Figures 7.11 and 7.13) and the mixing proportions of the different distributions are similar, both spatial hierarchical clustering and Chameleon spatial hierarchical clustering give good results with high ARI (compared with the truth). Chameleon spatial hierarchical clustering nevertheless achieves a higher average ARI in all four distribution sets (from 0.602 against 0.547 in set-up 4 to 0.936 against 0.683 in set-up 3). However, compared with Table 7.5, the clusterings estimated by Chameleon spatial hierarchical clustering in Table A.22 are slightly worse, with lower ARIs, but are still good in general, with ARIs of around 0.5 to 0.8. In contrast, the behavior of spatial hierarchical clustering is less affected by the increasing number of noise points: the ARIs are similar in both tables (Tables 7.5 and A.22), and the estimated total numbers of clusters are much closer to the true total numbers of clusters. This suggests that Chameleon spatial hierarchical clustering is sensitive to the number of noise points. The same conclusion can be reached by comparing results for the same location scenario with different numbers of noise points (e.g. Tables A.23 and A.24).

From Tables A.27 and A.28, we can see that neither of the two spatial clustering techniques is good at clustering areal data with sparse distributions and different mixing proportions. It is interesting to note that the ARIs in Table A.28 are not worse than those in Table A.27, which means that, with sparse distributions (i.e. locations in Figures 7.12 and 7.14) and different mixing proportions, the behavior of Chameleon spatial hierarchical clustering is less affected by the number of noise points.

In addition, a larger variance relative to the mean (i.e. 0.6 and 8) affects both the number of clusters and the ARI in all of these scenarios. The estimated clusterings get slightly worse, with lower ARIs, but still capture the main clustering structure, as the ARIs remain positive. The ARI is expected to be zero when areas are randomly assigned to clusters, in which case the estimated clustering is no more similar to the true classification than chance. If this occurred (ARI around 0), it would indicate that both spatial hierarchical clustering and Chameleon spatial hierarchical clustering are highly sensitive to the variance.
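
The behavior of the ARI under random assignment can be checked numerically. The sketch below implements the adjusted Rand index in its usual Hubert and Arabie form using only the standard library. The function name `adjusted_rand_index` and the toy labelings are assumptions for illustration, and this is not necessarily the implementation used for the simulations.

```python
import random
from collections import Counter
from math import comb


def adjusted_rand_index(truth, estimate):
    """Adjusted Rand index between two labelings of the same objects."""
    if len(truth) != len(estimate):
        raise ValueError("labelings must have the same length")
    n = len(truth)
    if n < 2:
        raise ValueError("at least two objects are required")
    # Pair counts from the contingency table cells and from its two margins.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, estimate)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(estimate).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings place every object in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


rng = random.Random(0)
truth = [i % 4 for i in range(400)]         # four equally sized true clusters
relabelled = [(t + 1) % 4 for t in truth]   # same partition, different label names
shuffled = truth[:]
rng.shuffle(shuffled)                       # random assignment, same cluster sizes

print(adjusted_rand_index(truth, relabelled))  # identical partitions
print(adjusted_rand_index(truth, shuffled))    # random assignment, close to zero
```

The index depends only on the partition, not on the label names. It can also take small negative values when the agreement is below what chance would give.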

In the scenarios discussed above, it was assumed that areal units from the same cluster have independent variables/dimensions. However, the dimensions of areal units from the same cluster can be dependent, i.e. the cluster covariance matrix has non-zero off-diagonal values. To extend the applicability of the newly proposed clustering technique, dependent dimensions within clusters in multivariate space are therefore also used in simulations. They are added to the scenarios with the best performance (in terms of the number of clusters and average ARI) among all the scenarios with independent dimensions within clusters. In addition, I also simulate scenarios in which different dimensions within a cluster have different impacts on areal units, i.e. the diagonal entries of the covariance matrix take different values.

Comparing the different scenarios and distributions, the scenario in Figure 7.11(a) gives better results in both the number of clusters and the ARI. In addition, when the mean levels of two groups differ more, the ARI is higher and the number of clusters is closer to the actual number of clusters, so I compare the performance under dependent dimensions and different diagonals in this scenario (Figure 7.11(a), distributions 3 and 4). The off-diagonal elements are set to 0.5 (weak correlation) and 0.8 (strong correlation) separately, so the covariance matrices are

$\begin{pmatrix} 2 & 0.5 \\ 0.5 & 2 \end{pmatrix}$ (distribution set 5) or $\begin{pmatrix} 2 & 0.8 \\ 0.8 & 2 \end{pmatrix}$ (distribution set 6), and $\begin{pmatrix} 8 & 0.5 \\ 0.5 & 8 \end{pmatrix}$ (distribution set 7) or $\begin{pmatrix} 8 & 0.8 \\ 0.8 & 8 \end{pmatrix}$ (distribution set 8),

chosen so that the determinant is non-negative and the covariance matrix is symmetric. In the different-diagonals simulations, the covariance matrix in distribution 3 is replaced by $\begin{pmatrix} 0.15 & 0 \\ 0 & 0.35 \end{pmatrix}$ (distribution set 9), and the covariance matrix in distribution 4 is replaced by $\begin{pmatrix} 7 & 0 \\ 0 & 9 \end{pmatrix}$ (distribution set 10).
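
The covariance matrices for distribution sets 5 to 10 can be checked and sampled from with a few lines of code. The sketch below is only an illustration: the names `COVARIANCES`, `check_covariance` and `sample_bivariate_normal` are hypothetical, the zero mean is a placeholder because the cluster means are defined elsewhere, and the bivariate normal form is an assumption made here. The check requires positive definiteness, which is slightly stricter than the non-negative determinant mentioned in the text.

```python
import math
import random

# Covariance matrices quoted in the text for distribution sets 5 to 10.
COVARIANCES = {
    5: ((2.0, 0.5), (0.5, 2.0)),
    6: ((2.0, 0.8), (0.8, 2.0)),
    7: ((8.0, 0.5), (0.5, 8.0)),
    8: ((8.0, 0.8), (0.8, 8.0)),
    9: ((0.15, 0.0), (0.0, 0.35)),
    10: ((7.0, 0.0), (0.0, 9.0)),
}


def check_covariance(cov):
    """Validate a 2x2 covariance matrix; return (determinant, implied correlation)."""
    (a, b), (c, d) = cov
    if b != c:
        raise ValueError("covariance matrix must be symmetric")
    det = a * d - b * c
    if a <= 0 or d <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    return det, b / math.sqrt(a * d)


def sample_bivariate_normal(mean, cov, rng):
    """Draw one point from N(mean, cov) using the 2x2 Cholesky factor of cov."""
    (a, b), (_, d) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 * l21)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2


for set_id, cov in COVARIANCES.items():
    det, corr = check_covariance(cov)
    print(f"set {set_id}: det = {det:.2f}, implied correlation = {corr:.4f}")

rng = random.Random(42)
# The mean (0, 0) is a placeholder, not a value from the simulation design.
print(sample_bivariate_normal((0.0, 0.0), COVARIANCES[5], rng))
```

Read as covariances, the off-diagonal values 0.5 and 0.8 imply a correlation that depends on the diagonal entries, since the correlation is the covariance divided by the product of the two standard deviations.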

Table 7.6: Clustering Results for Dependent Dimensions Given in Figure 7.11(a)

Method    Average ARI (sd)    Average No. Clusters (sd)    Total No. Clusters (Main Clusters and Noise Points)

Distributions generated from simulation set-up 5
CSHC^a    0.930 (0.028)       4.78 (0.17)                  6 (4+2)
SHC^b     0.675 (0.088)       8.64 (1.27)                  6 (4+2)

Distributions generated from simulation set-up 6
CSHC      0.911 (0.034)       4.63 (0.28)                  6 (4+2)
SHC       0.672 (0.074)       8.50 (1.82)                  6 (4+2)

Distributions generated from simulation set-up 7
CSHC      0.589 (0.095)       7.54 (0.78)                  6 (4+2)
SHC       0.540 (0.078)       10.48 (1.63)                 6 (4+2)

Distributions generated from simulation set-up 8
CSHC      0.582 (0.058)       7.42 (0.53)                  6 (4+2)
SHC       0.536 (0.085)       10.21 (1.89)                 6 (4+2)

^a CSHC is short for Chameleon spatial hierarchical clustering.
^b SHC is short for spatial hierarchical clustering.

Table 7.7: Clustering Results for Different Variances Given in Figure 7.11(a)

Method    Average ARI (sd)    Average No. Clusters (sd)    Total No. Clusters (Main Clusters and Noise Points)

Distributions generated from simulation set-up 9
CSHC^a    0.921 (0.083)       4.67 (0.17)                  6 (4+2)
SHC^b     0.676 (0.086)       8.76 (1.31)                  6 (4+2)

Distributions generated from simulation set-up 10
CSHC      0.593 (0.089)       7.51 (0.53)                  6 (4+2)
SHC       0.542 (0.072)       10.65 (1.39)                 6 (4+2)

^a CSHC is short for Chameleon spatial hierarchical clustering.
^b SHC is short for spatial hierarchical clustering.

From Table 7.6 we can see that the correlation or dependence between dimensions has an influence on the clustering results. When the between-dimension correlation gets stronger, the average ARI gets slightly lower and the number of clusters is lower. This happens in all four scenarios, regardless of the magnitude of the variance.

From Table 7.7 we can see that the difference in diagonals has little influence on the average ARIs and the number of clusters. Comparing Tables 7.5 and 7.7, the clustering results do not vary much, which means that unequal diagonal entries do not appear to make the clustering results worse: the ARIs are not much lower, and the numbers of clusters are neither very low nor very high.

7.4 Chameleon Spatial Hierarchical Clustering Applied to
