7 CONCLUSIONS AND FUTURE WORKS

We presented a new approach to MOO-based multiview clustering. Many real-world datasets encompass multiple representations, and obtaining a single consensus partitioning from them is usually a nontrivial task. We therefore proposed three MOO-based approaches for that purpose. The first approach automatically identifies an appropriate partitioning that satisfies multiple views simultaneously for a given dataset. In the process, we proposed a new encoding strategy and a new way of measuring the dissimilarity between partitionings obtained from different views. We also proposed a new method for combining the partitionings obtained from multiple views into a single consensus partitioning; the final Pareto optimal front provides a set of such consensus partitionings. The second and third (baseline) approaches are based on the concept of cluster ensembles. They were developed to combine the outputs of simple clustering algorithms such as k-means and hierarchical clustering. The search operators used in the multiobjective clustering approaches help to combine the outputs of these simple clustering techniques effectively and efficiently. The proposed approaches differ in how their objective functions are calculated. In all three approaches, the search capability of AMOSA, a simulated annealing (SA) based MOO technique, is used to explore the search space efficiently.
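The idea of a Pareto optimal front of consensus partitionings can be illustrated with a small sketch. It is not the implementation used in this article: the pair-counting disagreement below is a simple stand-in for the proposed dissimilarity measure, an exhaustive non-dominated filter takes the place of the AMOSA search, and the toy labelings and function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def pair_disagreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of point pairs grouped together by one labeling but split by the other."""
    if len(a) != len(b):
        raise ValueError("labelings must cover the same points")
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 0.0
    differing = sum((a[i] == a[j]) != (b[i] == b[j]) for i, j in pairs)
    return differing / len(pairs)


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p is no worse than q on every objective and strictly better on one (minimization)."""
    return all(x <= y for x, y in zip(p, q)) and any(x < y for x, y in zip(p, q))


def pareto_front(scores: List[Tuple[float, ...]]) -> List[int]:
    """Indices of the score vectors that no other vector dominates."""
    return [
        i for i, p in enumerate(scores)
        if not any(dominates(q, p) for j, q in enumerate(scores) if j != i)
    ]


# Hypothetical toy data: six points, one partitioning per view.
view1 = [0, 0, 0, 1, 1, 1]
view2 = [0, 0, 1, 1, 2, 2]

# Hypothetical candidate consensus partitionings.
candidates = [
    [0, 0, 0, 1, 1, 1],  # identical to the view-1 partitioning
    [0, 0, 1, 1, 2, 2],  # identical to the view-2 partitioning
    [0, 0, 0, 1, 2, 2],  # a compromise between the two views
    [0, 1, 0, 1, 0, 1],  # agrees poorly with both views
]

# Two objectives per candidate, both minimized: disagreement with each view.
scores = [
    (pair_disagreement(c, view1), pair_disagreement(c, view2))
    for c in candidates
]
front = pareto_front(scores)
print(front)  # indices of the non-dominated candidates
```

In this toy setting the two single-view partitionings and the compromise are mutually non-dominated, while the last candidate is dominated by all of them. This mirrors why the front offers a set of consensus partitionings rather than a single one.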

The results of these approaches were shown for several datasets, some of them from the UCI machine learning repository. Two views were used for each dataset. The results show that the first approach detects suitable partitionings more effectively than the other two approaches and than several existing single-view and multiview clustering techniques. Moreover, we demonstrated the effectiveness of the approach in a real-world application, web search result clustering. Two views were considered for these datasets: one semantic and one syntactic. Results were shown for three benchmark datasets.

Future work includes replacing AMOSA with other multiobjective algorithms in order to assess its suitability as the underlying optimization technique. For example, multiobjective evolutionary algorithms such as NSGA-II could be used in place of AMOSA to carry out a comparative study.

Automatically determining multiple views from a dataset without using any domain knowledge is another important research direction; the search capability of MOO could also be used for that purpose. Apart from web search data, many other real-world domains, such as molecular biology (RNA-seq gene expression and proteomics), can benefit from such techniques. Finally, developing effective and efficient objective functions that capture the quality of partitionings from different domains, and incorporating them into the developed multiobjective approaches, will be the subject of further research.

Received August 2017; revised January 2018; accepted January 2018
