ISSN: 2005-4238 IJAST 104
Copyright ⓒ 2019 SERSC
A PSO-SFLA Based Ensemble Link Weighted Triple Quality Algorithm to Improve the Performance of Clustering over Categorical Data
N. Yuvaraj1, Dr. T. Karthikeyan2, A. Sampath Dakshina Murthy3, K. Swathi3
1Research Scholar, Department of Computer Science and Engineering, St.Peter’s Institute of Higher Education and Research, Tamil Nadu, India
2Associate Professor, Department of Electronics and Communication Engineering, Annamacharya Institute of Technology and Sciences, Rajampet, Andhra Pradesh, India
3Assistant Professor, Department of Electronics and Communication Engineering, Vignan’s Institute of Information Technology, Visakhapatnam, Andhra Pradesh, India.
Abstract
This paper focuses on the irrelevant and null information that arises during cluster partitioning. To avoid the problems caused by such incomplete data, the proposed method uses a link-based cluster ensemble technique that combines an entropy-based weighted triple quality measure with a multi-viewpoint similarity measure. It groups the objects into clusters while avoiding the local optimum problem, and the quality of clustering is improved by reducing the dimensionality of high-dimensional datasets. The clustering is performed using a hybrid Particle Swarm Optimization (PSO) - Shuffled Frog Leaping Algorithm (SFLA). The proposed method is evaluated on categorical datasets to test its effectiveness in terms of Clustering Accuracy (CA), Normalized Mutual Information (NMI) and Adjusted Rand Index (ARI). The results show that the proposed approach attains better final clusters than the conventional methods.
Keywords: Bipartite Spectral Algorithm, Entropy Weighted Triple Quality, PSO-SFLA, Multiview point similarity measure.
1. Introduction
Data clustering is the basic means of structuring unstructured data through cluster analysis, which discovers the essential groups of objects in unlabeled data. High quality clusters are more likely when the clustering algorithm maximizes the intra-cluster similarity and minimizes the inter-cluster similarity. High quality clusters have been pursued through data mining, pattern recognition, machine learning and meta-heuristic evolutionary algorithms.
In recent years, the development of clustering algorithms for numerical data has increased [1]. Several clustering algorithms are available in the literature to cluster categorical data, including: the cluster ensemble framework with the group method of data handling (GMDH) [2], CHAMELEON [3], the ensemble Weiszfeld algorithm [4], a plurality voting-based consensus function [5], Fuzzy Weighted Locally Adaptive Clustering (FWLAC) [6], a combinatorial optimization formulation [7], Hierarchical Cluster Ensemble Selection [8], self-organizing maps (SOM) and k-means with the cluster-based similarity partitioning algorithm (CSPA), hypergraph partitioning algorithm (HGPA) and majority voting [9], a hierarchical cluster ensemble model based on knowledge granulation [10], Clustering Algorithms Independency Language [11], an incremental fuzzy cluster ensemble method based on rough set theory [11], a normalized cut algorithm with confidence measure [12], centroid-based ensemble merging [13] and fuzzy clustering with a final consensus matrix [14].
The main limitation arises with categorical data, where the attribute values are not numerical and act as unordered nominal values. Conventional clustering algorithms fail to deal with such nominal data, and their computational efficiency in terms of data retrieval is poor. The clustering algorithms utilized so far cannot discover the clustering patterns of unordered nominal categorical data. The clustering result usually varies with the parameter settings for a given unordered nominal categorical dataset. Hence, it is very difficult to select an algorithm to cluster such data. However, ensemble-based clustering avoids the problem caused by unordered nominal values in a categorical dataset.
Ensemble clustering merges different base clusters of similar criteria and improves the clustering quality.
In spite of its advantages, the cluster ensemble may fail to provide correct clusters at the final data partitioning stage. Here, the relationship between the variables is captured by a binary cluster ensemble matrix, which consists mostly of null values and leads to poor data partitioning quality. The present work aims to cluster categorical data by grouping different category values into a common cluster using a link-based cluster ensemble technique. This method uses a refined cluster association matrix that helps to establish relationships between different clusters, which improves the data quality. In this technique, the null entries are identified through cluster ensemble similarity and are finally eliminated. The refined ensemble technique forms a single clustering from multiple partitions of various clusters. It increases the robustness of the system in terms of accuracy and retrieval efficiency.
In this paper, a multi-viewpoint similarity measurement with an entropy-based weighted triple quality measurement is carried out to measure the cluster point similarity. The relative similarity is measured between the variables for data partitioning. Incorrect data partitioning is avoided using a spectral co-clustering algorithm. Thus the data is partitioned with suitable relations and a good bi-partition is obtained. The clustering is done using the meta-heuristic hybrid Particle Swarm Optimization (PSO) - Shuffled Frog Leaping Algorithm (SFLA) over categorical datasets.
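The rest of the paper is organized as follows: Section 2 describes the proposed methodology, Section 3 presents the experimental analysis, Section 4 reports the results and Section 5 concludes the paper.
2. Proposed Methodology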
In this section, we provide the proposed cluster ensemble technique to resolve the problem in categorical datasets.
2.1 Cluster Ensemble
The ensemble approach is used to cluster the datasets, and a data partition is generated even in the presence of incomplete data. The data point or variable clustering relation belonging to the ensemble matrix avoids the relation during the process of clustering. Further, a linked similarity measurement is used to find the incomplete data variables. This closes the gap between clustering and link analysis and hence provides high quality clustering over categorical datasets. The cluster ensemble is formulated as follows. The categorical dataset is represented as $X = \{x_1, x_2, \ldots, x_N\}$ with N objects and m attributes. The cluster ensemble is represented by $\Pi = \{\pi_1, \pi_2, \ldots, \pi_M\}$ with M base clusterings or ensemble members. Each base clustering is a set of clusters, $\pi_g = \{C_1^g, C_2^g, \ldots, C_{k_g}^g\}$, where $1 \le g \le M$ and $\bigcup_{j=1}^{k_g} C_j^g = X$, with $k_g$ the number of clusters in the g-th base clustering. The main problem in cluster ensemble selection is the choice of the base cluster subset $\Pi_S = \{\pi_1', \pi_2', \ldots, \pi_S'\}$, where $\pi_i' \in \Pi$ and $\Pi_S \subseteq \Pi$, through a selection strategy applied to the cluster ensemble $\Pi$. The consensus function is used to provide the final clustering result $\pi^* = \{C_1, C_2, \ldots, C_k\}$ from the base clusters $\Pi_S$. The ensemble framework is shown in Fig. 1. It is a three stage process, which involves base cluster generation, base cluster selection and final solution generation through a consensus function using $\Pi_S$.
Figure.1. General Process of Cluster Ensemble
B. Cluster Association Matrix
The cluster association matrix, or refined matrix, assigns the value zero to unknown associations and one to known associations. It preserves the association degree, and the similarity measurement using the binary refined matrix signifies the existence of two different clusters. A matrix transformation method is used to attain a better convergence rate. The normalization is maximized, and optimal convergence is further obtained using Euler time discretization over the matrix transformation. Semi-positive associations are found using the fractional Laplacian method and are eliminated from the clusters.
Finally, the link-based ensemble finds the clustering of unknown or known associations without semi-positive associations.
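The zero/one association matrix described above can be illustrated with a minimal sketch. It assumes that each base clustering is given as a list of integer labels, one per object; the function names `binary_association_matrix` and `zero_fraction` are hypothetical and only serve the illustration. It shows why the binary matrix is dominated by zero entries, which is the problem the refined matrix is meant to address.

```python
def binary_association_matrix(base_clusterings):
    """Build the binary object-by-cluster association matrix.

    base_clusterings: list of label lists, one per base clustering, each of
    length N. Returns (matrix, columns), where columns[c] is the
    (clustering index, label) pair represented by column c.
    """
    n_objects = len(base_clusterings[0])
    columns = []
    for g, labels in enumerate(base_clusterings):
        if len(labels) != n_objects:
            raise ValueError("all base clusterings must label the same objects")
        for label in sorted(set(labels)):
            columns.append((g, label))
    # Entry is 1 when object i belongs to the cluster of that column, else 0.
    matrix = [
        [1 if base_clusterings[g][i] == label else 0 for (g, label) in columns]
        for i in range(n_objects)
    ]
    return matrix, columns


def zero_fraction(matrix):
    """Fraction of entries that are zero (unknown associations)."""
    total = sum(len(row) for row in matrix)
    zeros = sum(row.count(0) for row in matrix)
    return zeros / total


# Toy ensemble (assumed data): M = 2 base clusterings over N = 4 objects.
ensemble = [[0, 0, 1, 1], [0, 1, 2, 2]]
matrix, columns = binary_association_matrix(ensemble)
print(matrix[0])              # [1, 0, 1, 0, 0]
print(zero_fraction(matrix))  # 0.6
```

Even in this tiny example more than half of the entries are zero, and the share grows with the number of clusters per base clustering.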
C. Link-Based Similarity
The similarity of two different clusters that share vertices is quantified through the weighted triple quality, where the clusters form triples. The weighted triple quality of each triple is the reciprocal of its total weight. These weights are summed to obtain the total over the cluster triple set as well as the individual triple weights. The accumulated weighted triple quality score is measured over all triples between the clusters. Finally, the similarity is measured between the clusters from these triples, which establishes the cluster similarity link.
2.2 Multiview Point Entropy based Weighted Triple Quality
A. Multiview Point Similarity Measurement (MPSM)
The MPSM improves the quality of clustering using an optimization process. The optimal partitioning is estimated based on the similarity distance measurement between the variables. The MPSM determines the clustering efficiency, and the similarity measurement in Eq. (1) is constructed using the data points or variables outside the cluster as reference:
$$\mathrm{Sim}(d_i, d_j)_{d_i, d_j \in S_r} = \frac{1}{n - n_r} \sum_{d_h \in S \setminus S_r} \mathrm{Sim}(d_i - d_h,\; d_j - d_h) \qquad (1)$$
where $n$ is the total number of data points and $n_r$ is the number of data points in the cluster $S_r$.
[Figure 1 blocks: Large Data set; Generation Step (Clustering 1, Clustering 2, ..., Clustering N); Base Clusters; Consensus Step (Consensus function); Final Clustering Result]
The similarity measurement is defined as the average similarity between the data points (di and dj) as seen from the documents outside the cluster. The two points being compared lie within the cluster, and the points used as references to build the similarity measure lie outside the cluster. This method is called MPSM, and the similarity measurement between di and dj is MPSM(di, dj | di, dj ∈ Sr). The MPSM within a cluster Sr for a point dh lying outside the cluster is determined as the dot product of the difference vectors from dh, or equivalently as the cosine of the angle between the documents seen from dh scaled by their distances from dh.
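$$\mathrm{MPSM}(d_i, d_j \mid d_i, d_j \in S_r) = \frac{1}{n - n_r} \sum_{d_h \in S \setminus S_r} (d_i - d_h)^t (d_j - d_h) = \frac{1}{n - n_r} \sum_{d_h \in S \setminus S_r} \cos(d_i - d_h,\; d_j - d_h)\, \|d_i - d_h\|\, \|d_j - d_h\| \qquad (2)$$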
The final product depends on individual similarity of the sum of clusters and dot-product of the vectors.
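As an illustration of Eq. (2), the following minimal sketch computes the multi-viewpoint similarity in both of its forms. It assumes that the data points are already encoded as numeric vectors (for categorical data this could be a one-hot encoding, which is an assumption here and not specified in the text); the helper names `dot`, `sub`, `mpsm` and `mpsm_cosine_form` are hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sub(u, v):
    """Component-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]


def mpsm(d_i, d_j, outside_points):
    """Average of (d_i - d_h)^t (d_j - d_h) over the points d_h outside the cluster.

    len(outside_points) plays the role of n - n_r in Eq. (2).
    """
    if not outside_points:
        raise ValueError("need at least one point outside the cluster")
    total = sum(dot(sub(d_i, d_h), sub(d_j, d_h)) for d_h in outside_points)
    return total / len(outside_points)


def mpsm_cosine_form(d_i, d_j, outside_points):
    """Same quantity written as cos(angle) * ||d_i - d_h|| * ||d_j - d_h||."""
    if not outside_points:
        raise ValueError("need at least one point outside the cluster")
    total = 0.0
    for d_h in outside_points:
        u, v = sub(d_i, d_h), sub(d_j, d_h)
        norm_u, norm_v = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
        if norm_u == 0.0 or norm_v == 0.0:
            continue  # a zero difference vector contributes a zero dot product
        cosine = dot(u, v) / (norm_u * norm_v)
        total += cosine * norm_u * norm_v
    return total / len(outside_points)


# Toy example (assumed data): two points in a cluster, two points outside it.
inside_a, inside_b = [1.0, 0.0], [1.0, 1.0]
viewpoints = [[0.0, 0.0], [2.0, 0.0]]
print(mpsm(inside_a, inside_b, viewpoints))  # 1.0
```

Both functions return the same value up to floating-point rounding, which mirrors the equality of the two right-hand sides in Eq. (2).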
B. Entropy Weighted Triple Quality:
The entropy is increased after measuring the MPSM between the variables (di and dj), and hence it applies corrections to the similarity. The merged clusters (di ∪ dj) are then found, and the variables (di and dj) have their own member variables (ni and nj). Based on the entropy levels H(K) and H(K + 1), the difference between the variables is set as the weighted entropy difference between (ni + nj) H(di ∪ dj) and ni H(di) + nj H(dj). The proposed method uses the entropy weighted triple quality, and bipartite spectral graph partitioning generates the cluster association matrix, which is formed using the same data points. The problem in the formation of the cluster association matrix is resolved with categorical label similarity estimation. The weights from the entropy weighted triple quality and bipartite spectral graph partitioning improve the clustering quality.
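The weighted entropy difference between (ni + nj) H(di ∪ dj) and ni H(di) + nj H(dj) can be sketched as follows. The text does not define H for a set of categorical records, so this sketch assumes that H is the sum, over attributes, of the Shannon entropy of the attribute values; the function names are hypothetical.

```python
import math
from collections import Counter


def cluster_entropy(records):
    """Sum over attributes of the Shannon entropy (in bits) of the value counts.

    This definition of H is an assumption made for illustration.
    """
    n = len(records)
    total = 0.0
    for column in zip(*records):
        for count in Counter(column).values():
            p = count / n
            total -= p * math.log2(p)
    return total


def weighted_entropy_difference(cluster_i, cluster_j):
    """(n_i + n_j) H(d_i U d_j) - (n_i H(d_i) + n_j H(d_j))."""
    n_i, n_j = len(cluster_i), len(cluster_j)
    merged = cluster_i + cluster_j
    return (n_i + n_j) * cluster_entropy(merged) - (
        n_i * cluster_entropy(cluster_i) + n_j * cluster_entropy(cluster_j)
    )


# Toy clusters (assumed data): each record is (colour, size).
reds = [("red", "small"), ("red", "small")]
blues = [("blue", "small"), ("blue", "small")]
print(weighted_entropy_difference(reds, blues))  # 4.0
print(weighted_entropy_difference(reds, reds))   # 0.0
```

Under this assumption, merging two clusters with identical value distributions costs nothing, while merging clusters that disagree on an attribute yields a positive difference.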
2.3 Bipartite Spectral Graph Partitioning
The base clusters are linked with the entropy weighted triple link similarity measurement technique, and the final clustering is obtained using spectral graph partitioning. The generated final clusters contain the complete ensemble information. Link-based spectral graph partitioning is used to obtain higher clustering quality, and it eliminates the incomplete information.
2.4 Ensemble Creation E*
The proposed link based ensemble forms a candidate ensemble over a cluster (M) during each iteration.
During clustering, the data points are considered as a cluster ensemble because different clusters are formed at the end of each iteration. The distance between two ensembles Ei and Ej is given by the maximum pairwise distance between their members:
$$\forall E_i, E_j,\; i \neq j: \quad d(E_i, E_j) = \max_{n_s \in E_i,\; n_t \in E_j} d(n_s, n_t) \qquad (3)$$
The data point ensemble E* is combined with the data point (M) through grouping, and the categorical dataset is evaluated. Further, if the maximum distance is not attained, the hybrid PSO-SFLA method is used to provide better clustering performance. Even if the data point lies in the worst-case probability, the creation of the ensemble data point (N) is accomplished with the hybrid PSO-SFLA clustering model with limited computational complexity.
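The maximum pairwise distance of Eq. (3) can be sketched in a few lines. The point-level distance d(ns, nt) is not specified in the text, so the sketch assumes the Hamming distance between categorical records; `hamming` and `max_link_distance` are hypothetical names.

```python
def hamming(a, b):
    """Number of attributes on which two categorical records differ (assumed point distance)."""
    return sum(1 for x, y in zip(a, b) if x != y)


def max_link_distance(ensemble_i, ensemble_j, point_distance=hamming):
    """d(E_i, E_j) = max over n_s in E_i and n_t in E_j of d(n_s, n_t)."""
    if not ensemble_i or not ensemble_j:
        raise ValueError("both ensembles must be non-empty")
    return max(point_distance(s, t) for s in ensemble_i for t in ensemble_j)


# Toy ensembles (assumed data) of two-attribute categorical records.
e_1 = [("a", "x"), ("a", "y")]
e_2 = [("a", "x"), ("b", "y")]
print(max_link_distance(e_1, e_2))  # 2
```

Any other point-level distance can be passed through the `point_distance` argument without changing the maximum rule itself.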
B. Hybrid PSO-SFLA Algorithm
To improve the clustering ability, this research uses a hybrid PSO-SFLA algorithm, which involves updating and cooperation strategies. The population is split into subpopulations in two different kinds of swarm: a main swarm and sub-swarms. The sub-swarms (si) run independently over the entire search space. Depending on the fitness value, the particles are placed in descending order. The entire group is divided into subgroups (M), where each particle (i) is assigned to its respective group (M), e.g. the first particle (i = 1) goes into the first group (M = 1) and so on. The global optimal particles are placed in the main group and the remaining particles in the subgroups.
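The sorting and grouping described above can be sketched as follows. The text does not state how many particles enter the main group, so the sketch assumes a hypothetical parameter `main_size` for the number of top-ranked particles placed there, and deals the remaining particles to the subgroups in round-robin order; `split_population` is a hypothetical name.

```python
def split_population(fitness, n_subgroups, main_size):
    """Return (main_group, subgroups) as lists of particle indices.

    Particles are ranked by descending fitness. The top `main_size` particles
    form the main group (an assumption), and the rest are dealt to the
    subgroups one at a time: first remaining particle to the first subgroup,
    second to the second subgroup, and so on.
    """
    if n_subgroups < 1 or main_size < 0:
        raise ValueError("n_subgroups must be >= 1 and main_size >= 0")
    ranked = sorted(range(len(fitness)), key=lambda i: fitness[i], reverse=True)
    main_group = ranked[:main_size]
    subgroups = [[] for _ in range(n_subgroups)]
    for position, particle in enumerate(ranked[main_size:]):
        subgroups[position % n_subgroups].append(particle)
    return main_group, subgroups


# Toy fitness values (assumed data) for seven particles.
fitness = [0.2, 0.9, 0.5, 0.7, 0.1, 0.4, 0.8]
main_group, subgroups = split_population(fitness, n_subgroups=3, main_size=1)
print(main_group)  # [1]
print(subgroups)   # [[6, 5], [3, 0], [2, 4]]
```

Dealing the ranked particles in turn gives every subgroup a mix of strong and weak particles, which is the usual motivation for this kind of partition in frog-leaping style algorithms.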
In the main group, the PSO algorithm is run to convergence for the particle update. In the subgroups, the entire search space is searched to update the particles based on the update rule of SFLA. The main formulation is as follows:
$$v_i(t+1) = w\,v_i(t) + c_1 r_1(t)\,\bigl(p_i(t) - x_i(t)\bigr) + c_2 r_2(t)\,\bigl(p_s(t) - x_i(t)\bigr)$$
where $p_i$ is the best position of particle i, with $i \in \{1, 2, \ldots, n\}$, and $p_s$ is the best position of the individual subgroup.
On the other hand, if the best position is not updated by the particle, the following equation is used to update the particle:
$$v_i(t+1) = w\,v_i(t) + c_1 r_1(t)\,\bigl(p_i(t) - x_i(t)\bigr) + c_2 r_2(t)\,\bigl(p_g(t) - x_i(t)\bigr)$$
where $p_g$ is the global best position.
Similarly, if the velocity and position of the particle are still not updated after all operations, the values are regenerated. Further, the particle convergence speed is accelerated by deep searching in the subgroups.
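The two velocity rules above differ only in the guide position (the subgroup best or the global best), so a single function can illustrate both. This is a minimal sketch: the default values of `w`, `c1` and `c2` are assumptions chosen for illustration and are not taken from the paper, and `update_particle` is a hypothetical name.

```python
import random


def update_particle(x, v, p_best, guide, w=0.7, c1=1.5, c2=1.5, rng=random):
    """One velocity and position update for a single particle.

    x, v     : current position and velocity (lists of floats)
    p_best   : the particle's own best position p_i
    guide    : p_s (subgroup best) for the first rule, p_g (global best)
               for the fallback rule
    w, c1, c2: inertia weight and acceleration coefficients (assumed values)
    """
    r1, r2 = rng.random(), rng.random()
    new_v = [
        w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi)
        for xi, vi, pi, gi in zip(x, v, p_best, guide)
    ]
    # The position moves from the current position by the new velocity.
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v


# Toy call (assumed data): one particle in two dimensions, guided by a subgroup best.
position, velocity = update_particle(
    x=[0.0, 0.0], v=[0.1, -0.1], p_best=[1.0, 1.0], guide=[2.0, 0.5]
)
```

Passing the global best as `guide` gives the fallback rule, so a caller can try the subgroup best first and switch only when the particle did not improve.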
Using the above strategies, each subgroup searches the global space and maintains particle diversity, as shown in Fig. 2.
In the cooperation strategy, the swarms cooperate periodically to exchange updated solutions, which attains better clustering through guided search. In the proposed system, we use the independent run strategy [16]. During the process of migration, there exists a trade-off between the accuracy of clustering and computational efficiency. This paper uses the blind strategy [17] to migrate the particles periodically after a certain number of iterations. Topology is the other factor that governs how information is shared between the swarms; when too much information is shared between the swarms, the computational efficiency reduces. The algorithm is then prone to the local minima problem due to particle convergence. On the other hand, if few swarms are used for information exchange, the computational efficiency increases but the results are unsatisfactory. In the proposed method, the redundant information is removed using the ensemble technique, hence little information is shared between the clusters and the computational efficiency is higher. The better solutions replace the poor particles in the main group, as shown in Fig. 3. Similarly, the subgroup is updated with the values of position and velocity.
The hybrid PSO-SFLA clustering model is described in the following steps.
Step: 1. Initialise the particles in a random manner.
Step: 2. Set the algorithm parameters (N, d, m, n, cvc, Imax),
where N - total number of particles, d - dimension, m - number of groups, n - number of subgroups, cvc - local iterations inside the subgroups, Imax - global iterations.
Step: 3. Calculate the fitness function of the particles or variables.
Step: 4. Sort the variables based on the fitness function in descending order.
Step: 5. Divide the variables into the main group with the optimal variables and the subgroups with the remaining variables.
Step: 6. Repeat step 1 to 5 in the subgroups until the deep searching criterion is met.
a. Find the optimal clusters (sbest) in each subgroup.
b. To update the particle, use $v_i(t+1) = w\,v_i(t) + c_1 r_1(t)(p_i(t) - x_i(t)) + c_2 r_2(t)(p_s(t) - x_i(t))$ and $x_i(t+1) = x_i(t) + v_i(t+1)$.
c. If the particle is not updated,
i. use $v_i(t+1) = w\,v_i(t) + c_1 r_1(t)(p_i(t) - x_i(t)) + c_2 r_2(t)(p_g(t) - x_i(t))$ and $x_i(t+1) = x_i(t) + v_i(t+1)$.
d. End
e. If the particle is still not updated, regenerate both position and velocity.
f. Update the optimal clusters (sbest) in each subgroup.
g. If sbest > gbest,
i. Update the value over the poor particle in the main group.
Step: 7. If I = migration time,
a. Mix the entire particles and go to step 3.
Step: 8. Else if I ≠ migration time,
a. Go to step 6.
Step: 9. End
Step: 10. If I = Imax,
a. End the process.
Step: 11. If I < Imax,
Step: 12. Go to step 5.
Step: 13. End.
Fig. 2. The flowchart of the Hybrid PSO-SFLA algorithm to update particles in subgroups.
[Flowchart nodes: Start; cvc = 0; cvc = cvc + 1; determine pbest and sbest; update according to formula (7); test whether the result is better than the original; replace sbest with gbest and update again; test whether the result is better; generate a new particle randomly; replace with the new particle; test cvc = n; Finish]
Fig. 3. Process of migration
3. Experimental analysis
This section presents the evaluation of the robustness and effectiveness of the proposed system over multivariate categorical datasets using benchmark evaluation criteria.
3.1 Experimental Setups
The performance of the proposed clustering method is evaluated by comparing it with conventional ensemble algorithms. The experimental setup for the evaluation is given below.
The base clusters are generated using random partition algorithms (I, II-Fixed k, II-Random k, III-Fixed k and III-Random k) with the number of clusters lying in the range [2, √N], where N is the number of data points.
To aggregate the base clusters, co-association and graph based consensus functions are used. The proposed method is further compared with agglomerative clustering methods like single link (SL) and complete link (CL), and other consensus methods like the Meta-Clustering Algorithm (MCLA), Cluster-based Similarity Partitioning Algorithm (CSPA) and Hyper-Graph Partitioning Algorithm (HGPA).
The size of the base clustering set is 50, i.e. M = 50; it runs for 20 iterations, and the hybrid PSO-SFLA method runs for 1500 iterations.
The proposed method is implemented in MATLAB, and the experiments are conducted on an Intel Xeon CPU E5- [email protected] GHz with 128 GB RAM.
3.2 Data sets
The properties of the categorical datasets are shown in Table 1. These datasets have supervised class information, i.e. they are labelled. The class labels are used for evaluation only and not for the ensemble or clustering process.
[Migration figure labels: Main group, Subgroup 1, Subgroup 2, Subgroup 3]
Table.1. Dataset Description
Dataset                   Data Points (N)  Attributes (d)  Classes
HIV-1 protease cleavage   6590             1               2
SPECT Heart               267              22              2
Primary Tumor             339              17              22
Breast Cancer             286              9               2
Thyroid Disease           7200             21              3
Soybean (Large)           307              35              19
Soybean (Small)           47               35              4
Mushroom                  8124             22              2
Audiology (Standardized)  226              69              9
3.3 Evaluation criteria
The comprehensive results are obtained using three popular metrics that measure the effectiveness of the proposed algorithm: CA, ARI and NMI [18]. They measure the quality of clustering through the agreement between the clustering produced by the algorithm and the ground truth.
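The three criteria can be illustrated with a minimal standard-library sketch. It makes a few assumptions that the text does not fix: CA is computed with the best one-to-one mapping between clusters and classes found by brute force (practical only for a handful of clusters), and NMI is normalized by the geometric mean of the two entropies. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def clustering_accuracy(truth, pred):
    """Best fraction of matches over one-to-one maps from clusters to classes."""
    classes = sorted(set(truth))
    clusters = sorted(set(pred))
    # Pad with None so every cluster can be mapped when clusters outnumber classes.
    padded = classes + [None] * max(0, len(clusters) - len(classes))
    overlap = Counter(zip(pred, truth))
    best = 0
    for assignment in permutations(padded, len(clusters)):
        hits = sum(overlap[(c, t)] for c, t in zip(clusters, assignment))
        best = max(best, hits)
    return best / len(truth)


def _pairs(m):
    """Number of unordered pairs among m items."""
    return m * (m - 1) / 2


def adjusted_rand_index(truth, pred):
    """Adjusted Rand Index from the contingency counts."""
    n = len(truth)
    if n < 2:
        raise ValueError("need at least two objects")
    sum_cells = sum(_pairs(c) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(_pairs(c) for c in Counter(truth).values())
    sum_cols = sum(_pairs(c) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / _pairs(n)
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


def normalized_mutual_information(truth, pred):
    """Mutual information divided by sqrt(H(truth) * H(pred)) (assumed normalization)."""
    n = len(truth)
    count_t, count_p = Counter(truth), Counter(pred)
    mutual = 0.0
    for (t, p), c in Counter(zip(truth, pred)).items():
        mutual += (c / n) * math.log(c * n / (count_t[t] * count_p[p]))
    h_t = -sum((c / n) * math.log(c / n) for c in count_t.values())
    h_p = -sum((c / n) * math.log(c / n) for c in count_p.values())
    if h_t == 0.0 or h_p == 0.0:
        return 1.0 if h_t == h_p else 0.0
    return mutual / math.sqrt(h_t * h_p)


# Toy labels (assumed data): the prediction matches the truth up to renaming.
truth = [0, 0, 1, 1]
pred = [1, 1, 0, 0]
print(clustering_accuracy(truth, pred))  # 1.0
```

All three functions ignore the actual label names and only compare the groupings, which is why the renamed prediction above still scores perfectly.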
4. Results on effectiveness analysis
In this section, we focus on the clustering performance of different clustering ensemble selection algorithms on the data sets listed in Table 1 with the three evaluation criteria. In particular, the results of the 50% base clustering subset are presented, and the related statistical tests are carried out.
Table.2. Results of Classification Accuracy over different datasets.
Ensemble Factor   Proposed  FALCEF    MCLA      CO+SL     CO+AL     CSPA      HGPA
HIV-1 protease cleavage
I                 0.9205    0.96      0.882     0.85      0.864     0.85      0.585
II-Fixed k        0.956     0.961     0.971     0.842     0.868     0.834     0.875
II-Random k       0.9653    0.96      0.995     0.885     0.883     0.843     0.84
III-Fixed k       0.9684    0.936     0.92      0.82      0.913     0.833     0.853
III-Random k      0.9526    0.969     0.926     0.846     0.862     0.862     0.834
SPECT Heart
I                 0.8035    0.7564    0.735     0.565     0.562     0.746     0.66
II-Fixed k        0.8852    0.865     0.785     0.58      0.742     0.775     0.706
II-Random k       0.8325    0.833     0.762     0.56      0.663     0.7       0.616
III-Fixed k       0.8365    0.784     0.735     0.54      0.709     0.716     0.645
III-Random k      0.8435    0.7945    0.766     0.54      0.6645    0.665     0.682
Primary Tumor
I                 0.7234    0.629     0.436     0.26      0.372     0.443     0.288
II-Fixed k        0.7335    0.634     0.412     0.306     0.464     0.42      0.425
II-Random k       0.7429    0.625     0.435     0.235     0.434     0.425     0.426
III-Fixed k       0.7336    0.6916    0.422     0.263     0.446     0.465     0.442
III-Random k      0.7721    0.6925    0.466     0.283     0.435     0.475     0.412
Breast Cancer
I                 0.9352    0.953     0.93      0.647     0.665     0.835     0.666
II-Fixed k        0.9826    0.9822    0.966     0.621     0.945     0.816     0.888
II-Random k       0.9865    0.9843    0.912     0.603     0.935     0.839     0.8
III-Fixed k       0.9834    0.9898    0.987     0.635     0.9985    0.82      0.824
III-Random k      0.9829    0.9724    0.935     0.653     0.935     0.822     0.866
Thyroid Disease
I                 0.8235    0.782     0.6       0.54      0.7       0.635     0.545
II-Fixed k        0.8434    0.8       0.756     0.6       0.735     0.649     0.626
II-Random k       0.8579    0.785     0.79      0.658     0.632     0.7       0.644
III-Fixed k       0.8565    0.825     0.735     0.534     0.6345    0.624     0.69
III-Random k      0.8648    0.82      0.51      0.69      0.6       0.61      0.699
Soybean (Large)
I                 0.9462    0.935     0.888     0.885     0.852     0.826     0.537
II-Fixed k        0.9512    0.937     0.91      0.8825    0.872     0.836     0.848
II-Random k       0.9565    0.941     0.929     0.8612    0.865     0.896     0.834
III-Fixed k       0.9565    0.954     0.935     0.8915    0.935     0.816     0.849
III-Random k      0.9542    0.945     0.931     0.845     0.819     0.852     0.834
Soybean (Small)
I                 0.8078    0.7561    0.734     0.573     0.525     0.748     0.57
II-Fixed k        0.8345    0.802     0.785     0.545     0.798     0.735     0.765
II-Random k       0.8552    0.826     0.77      0.545     0.662     0.75      0.636
III-Fixed k       0.8316    0.786     0.773     0.5342    0.725     0.726     0.656
III-Random k      0.8424    0.785     0.763     0.549     0.665     0.635     0.663
Mushroom
I                 0.7245    0.685     0.43      0.254     0.362     0.459     0.301
II-Fixed k        0.7305    0.647     0.456     0.3082    0.437     0.54      0.428
II-Random k       0.7406    0.686     0.484     0.265     0.452     0.435     0.465
III-Fixed k       0.7332    0.6845    0.456     0.277     0.421     0.442     0.436
III-Random k      0.7605    0.6458    0.475     0.265     0.433     0.453     0.465
Audiology (Standardized)
I                 0.9621    0.964     0.939     0.635     0.645     0.811     0.662
II-Fixed k        0.9892    0.9       0.964     0.695     0.932     0.848     0.866
II-Random k       0.9892    0.975     0.945     0.668     0.921     0.832     0.853
III-Fixed k       0.9875    0.9814    0.959     0.661     0.94      0.823     0.84
III-Random k      0.9816    0.974     0.957     0.674     0.952     0.832     0.856
Table.3. Overall CA, NMI and ARI between the proposed and other methods
Methods   CA (B)  CA (W)  NMI (B)  NMI (W)  ARI (B)  ARI (W)
Ensemble Factor: I
Proposed  195     21      158      57       171      57
FALCEF    186     26      144      54       163      54
MCLA      172     34      139      64       151      62
CO+SL     35      179     83       135      46       162
CO+AL     73      142     116      94       86       123
CSPA      107     94      134      83       108      96
HGPA      23      192     18       209      25       203
Ensemble Factor: II - fixed k
Proposed  239     3       231      5        225      7
FALCEF    226     4       229      6        213      8
MCLA      209     4       203      12       201      11
CO+SL     26      170     36       149      29       162
CO+AL     132     47      133      42       133      38
CSPA      85      92      67       103      84       103
HGPA      65      79      92       93       95       92
Ensemble Factor: II - random k
Proposed  244     2       223      6        226      6
FALCEF    236     2       216      8        220      7
MCLA      208     4       204      9        201      13
CO+SL     17      167     37       142      32       162
CO+AL     96      80      94       64       117      55
CSPA      74      115     47       130      51       127
HGPA      66      119     41       137      48       130
Ensemble Factor: III - fixed k
Proposed  224     4       216      7        208      7
FALCEF    217     5       202      8        196      8
MCLA      197     7       192      13       184      17
CO+SL     17      173     38       159      28       170
CO+AL     114     43      117      43       137      30
CSPA      72      87      51       94       60       108
HGPA      73      82      58       103      65       100
Ensemble Factor: III - random k
Proposed  231     2       217      7        212      8
FALCEF    227     3       208      9        198      9
MCLA      205     6       197      13       182      11
CO+SL     16      179     35       145      33       162
CO+AL     82      66      83       60       102      42
CSPA      53      113     38       123      46       122
HGPA      52      116     37       133      62       130
Table 2 shows the clustering accuracy of the proposed and other conventional methods. The results show that the proposed method attains better clustering accuracy than the existing algorithms for most of the given datasets and ensemble factors. This indicates the improved quality of the proposed algorithm on the given categorical datasets compared with the other existing algorithms. The algorithm can hence be used as a benchmark tool for clustering categorical datasets. Similarly, the CA, NMI and ARI of the proposed method are good for both better (B) and worst (W) results, and it proves to be computationally efficient.
5. CONCLUSIONS
This paper presents a novel link-based spectral graph clustering ensemble method, where PSO-SFLA is used to optimize the clusters. The proposed method proves to be more computationally efficient than the existing methods and avoids the degradation issues in clustering. It explores and utilizes the degree of relationship between the generated solutions in the base clusters. The MPSM technique with the refined matrix clusters the dataset well within the clustering ensemble. The PSO-SFLA algorithm attains better CA, NMI and ARI than the existing methods.
REFERENCES
[1] Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: a review. ACM computing surveys (CSUR), 31(3), 264-323.
[2] Teng, G., He, C., Xiao, J., He, Y., Zhu, B., & Jiang, X. (2016). Cluster ensemble framework based on the group method of data handling. Applied Soft Computing, 43, 35-46.
[3] Xiao, W., Yang, Y., Wang, H., Li, T., & Xing, H. (2016). Semi-supervised hierarchical clustering ensemble and its application. Neurocomputing, 173, 1362-1376.
[4] Franek, L., & Jiang, X. (2014). Ensemble clustering by means of clustering embedding in vector spaces. Pattern Recognition, 47(2), 833-842.
[5] Yu, H., & Zhou, Q. (2013, October). A Cluster Ensemble Framework Based on Three-Way Decisions. In RSKT (pp. 302-312).
[6] Parvin, H., & Minaei-Bidgoli, B. (2015). A clustering ensemble framework based on selection of fuzzy weighted clusters in a locally adaptive clustering algorithm. Pattern Analysis and Applications, 18(1), 87-112.
[7] Yang, F., Li, T., Zhou, Q., & Xiao, H. (2017). Cluster ensemble selection with constraints. Neurocomputing, 235, 59-70.
[8] Akbari, E., Dahlan, H. M., Ibrahim, R., & Alizadeh, H. (2015). Hierarchical cluster ensemble selection. Engineering Applications of Artificial Intelligence, 39, 146-156.
[9] Tsai, C. F., & Hung, C. (2012). Cluster ensembles in collaborative filtering recommendation. Applied Soft Computing, 12(4), 1417-1425.
[10] Hu, J., Li, T., Wang, H., & Fujita, H. (2016). Hierarchical cluster ensemble model based on knowledge granulation. Knowledge-Based Systems, 91, 179-188.
[11] Yousefnezhad, M., Reihanian, A., Zhang, D., & Minaei-Bidgoli, B. (2016). A new selection strategy for selective cluster ensemble based on Diversity and Independency. Engineering Applications of Artificial Intelligence, 56, 260-272.
[12] Hu, J., Li, T., Luo, C., Fujita, H., & Yang, Y. (2017). Incremental fuzzy cluster ensemble learning based on rough set theory. Knowledge-Based Systems.
[13] Yu, Z., Wong, H. S., You, J., Yu, G., & Han, G. (2012). Hybrid cluster ensemble framework based on the random combination of data transformation operators. Pattern Recognition, 45(5), 1826-1837.
[14] Hore, P., Hall, L. O., & Goldgof, D. B. (2009). A scalable framework for cluster ensembles. Pattern recognition, 42(5), 676-688.
[15] Bedalli, E., Mançellari, E., & Asilkan, O. (2016). A Heterogeneous Cluster Ensemble Model for Improving the Stability of Fuzzy Cluster Analysis. Procedia Computer Science, 102, 129-136.
[16] Bao, H., & Han, F. (2017, September). A Hybrid Multi-swarm PSO Algorithm Based on Shuffled Frog Leaping Algorithm. In International Conference on Intelligent Science and Big Data Engineering (pp. 101-112). Springer, Cham.
[17] de Oca, M. A. M., Aydın, D., & Stützle, T. (2011). An incremental particle swarm for large-scale continuous optimization problems: an example of tuning-in-the-loop (re)design of optimization algorithms. Soft Computing, 15(11), 2233-2255.
[18] Zhao, X., Liang, J., & Dang, C. (2017). Clustering ensemble selection for categorical data based on internal validity indices. Pattern Recognition, 69, 150-168.