Bibliometric Survey On Clustering Algorithm Of
Machine Learning And Its Application
Rashmi Y. Lad, P. S. Metkewar
Abstract : Machine learning is an application of artificial intelligence and an emerging field of computer science. It gives systems the ability to learn automatically from experience. Learning means acquiring knowledge based on previous experience and perception, and it can be performed in a supervised or an unsupervised way. Clustering is a learning technique that works in an unsupervised way. The objective of this bibliometric study is to understand the utility of clustering algorithms and machine learning. In this study the author retrieved 97 research papers from the Scopus database that were published from 2003 to 2020. The survey revealed that most publications came from journals and conferences associated with computer science.
Index Terms : Artificial intelligence, Bibliometric Analysis, clustering algorithm, deep learning, Machine learning, supervised learning, unsupervised learning.
1. INTRODUCTION
Machine learning focuses on developing computer programs that can adapt when they are exposed to new data. A machine learning algorithm analyses patterns in data and makes predictions on the basis of them (R. Ahuja 2020). Machine learning is classified into categories such as supervised learning, unsupervised learning and deep learning. Classification and regression are the two main types of supervised learning. Supervised learning is performed under guidance, that is, with labeled training data. After sufficient training, the resulting system can predict the output for new inputs.
Clustering comes under unsupervised learning.
Unsupervised learning is a technique that classifies and analyses data without labels. The data can be divided into groups based on similarity. Many clustering algorithms are available to find such clusters, and they are used in machine learning, artificial intelligence, data mining and data science.
Speech recognition
Face detection
Spam detection
Credit Card fraud detection
Medical diagnosis
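The idea of grouping unlabeled data by similarity, described above for clustering, can be illustrated with a minimal k-means sketch. This is only one of many clustering algorithms and is not taken from any of the surveyed papers; the function name `kmeans`, the toy points and the choice to initialise centroids with the first k points are assumptions made for illustration.

```python
from math import dist

def kmeans(points, k, iterations=10):
    """Tiny k-means sketch: returns (centroids, labels)."""
    # Assumption: use the first k points as initial centroids
    # (deterministic and simple, but not a robust initialisation).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return centroids, labels

# Hypothetical toy data: two well-separated groups of 2-D points.
toy_points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8),
              (7.8, 8.2), (0.9, 1.1), (8.1, 7.9)]
centroids, labels = kmeans(toy_points, k=2)
print(labels)
```

No labels are supplied at any point; the groups emerge only from the distances between points, which is what distinguishes clustering from supervised classification.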
Deep learning is a recent technology in the field of artificial neural networks; it has emerged as a powerful machine learning tool for the future of artificial intelligence (Ravì, D., 2016). It plays an important role in speech recognition, image recognition, prediction of drug molecules, prediction of the effects of DNA, natural language processing and sentiment analysis. To learn more about how clustering algorithms and machine learning are applied, a bibliometric survey of the topic is needed. This paper presents such a survey: Section 2 describes the preliminary data collection on clustering and machine learning publications, Section 3 presents the bibliometric analysis, and Section 4 gives the conclusion, followed by the references.
2. PRELIMINARY DATA COLLECTION
A systematic method defines the process of retrieving the primary data: documents are found with a search string, and the final publications are selected from different libraries or databases. Various databases are available, such as
Scopus
Web of Science
Google Scholar
ResearchGate
Clarivate
In this paper the author used the Scopus database to search for peer-reviewed research literature.
2.1 Search strategy by Keywords
Papers were retrieved with a search string built with Boolean connectors such as AND and OR. Because the paper is about clustering algorithms and machine learning, the search string is
("Clustering algorithm" AND "machine learning" AND
"Unsupervised learning" AND "Artificial Intelligence" OR "deep learning")
2.2 Search strategy for Data
On the basis of the primary selection by keywords, the author retrieved 97 papers in 4 different languages from the Scopus database. The papers span the period from 2003 to 2020 and come from different journal, conference and book publications.
Fig. 1. Types of Machine Learning
Rashmi Y. Lad is currently pursuing a Ph.D. in Computer Studies at Symbiosis International University, Pune, India, and is working at MIT ACSC, Alandi (D), Pune. E-mail: [email protected]
Dr. P. S. Metkewar is Professor & Dy. Director at Symbiosis Institute of Computer Studies and Research (SICSR), Symbiosis International (Deemed) University, Lavale, Pune, India. E-mail: dy.director@sicsr.ac.in
2.3 Search strategy by publishing year
In this section the author discusses the publication details year-wise and shows the research paper count for each year in the form of a table. Fig. 3 is a graphical representation of the same data.
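A year-wise count of this kind can be produced from an exported result list with a few lines of code. The sketch below is an assumption about how such a table could be built, not the procedure used in this study; the record structure, the function name `papers_per_year` and the sample records are hypothetical.

```python
from collections import Counter

# Hypothetical records; real input would be the exported search results.
records = [
    {"title": "Paper A", "year": 2018},
    {"title": "Paper B", "year": 2019},
    {"title": "Paper C", "year": 2018},
    {"title": "Paper D", "year": 2003},
]

def papers_per_year(records, start=2003, end=2020):
    """Return (year, count) rows for every year in the span, zeros included."""
    counts = Counter(r["year"] for r in records
                     if start <= r["year"] <= end)
    return [(year, counts.get(year, 0)) for year in range(start, end + 1)]

table = papers_per_year(records)
for year, count in table:
    if count:  # print only the years that have publications
        print(year, count)
```

Keeping the zero-count years in the returned table makes gaps in the publication history visible when the counts are plotted.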
3. BIBLIOMETRIC ANALYSIS
In this survey paper the bibliometric analysis is performed in two ways.
Geographical region analysis
Analysis by keyword, area and affiliation
3.1 Geographical regional analysis
Fig. 4 gives the geographical distribution of the published research papers, drawn with the imapbuilder online tool. This tool takes the data in Excel format and plots each publication country on a map.
Fig. 5 shows that the most research work on clustering and machine learning algorithms was done in India, followed by the US and China. The least research work was done in Germany, Brazil and Australia. The figure shows the publication work country-wise.
3.2 Analysis by Keywords
In this bibliometric survey, Fig. 6 gives the details of the keywords from the research publications retrieved by the author. Clustering algorithm and artificial intelligence are the two keywords used in the most research papers.
3.3 Analysis by Area
Fig. 7 shows the subject areas in which clustering and machine learning algorithms are applied. Most research publications are in computer science (53%), and 20% of the research work is in engineering. Less research work has been done in neuroscience and in agricultural and biological sciences, which leaves scope for future research.
3.4 Analysis by Affiliation
In this survey paper, Fig. 8 shows the affiliations of the top ten universities and their contributions to clustering and machine learning algorithms.
3.5 Analysis details by Affiliation
In this bibliometric survey the author retrieved 97 papers from journals, conferences and book chapters. It was observed that 76.09% of the papers were published in conferences, 22.9% in journals and only 1.09% in books as chapters. No contributions from review articles were found.
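Percentage shares of this kind follow directly from the per-type paper counts. The sketch below shows the arithmetic under stated assumptions: the function name `shares_by_type` and the counts are hypothetical and are not the counts behind the figures reported above.

```python
def shares_by_type(counts):
    """Convert per-type paper counts into percentage shares (2 decimals)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no papers to analyse")
    return {doc_type: round(100 * n / total, 2)
            for doc_type, n in counts.items()}

# Hypothetical counts, for illustration only.
example_counts = {"conference": 3, "journal": 1}
shares = shares_by_type(example_counts)
print(shares)
```

Because each share is rounded independently, the rounded percentages may not sum to exactly 100.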
3.6 Analysis by citation
The author used the PlumX Metrics tool to show the citation details of the research paper "Subspace learning for unsupervised feature selection via matrix factorization". It includes citation details, usage, abstract and bibliographic information.
4. CONCLUSION
Machine learning is an emerging field in computer science. It is applied in various fields such as speech recognition, medical science, finance, prediction and self-driving cars. This bibliometric study gives a detailed analysis of the topic. The bibliometric survey on clustering algorithms of machine learning and their applications shows that the maximum number of publications came from the LNCS (Lecture Notes in Computer Science, including Artificial Intelligence and Bioinformatics) category. Most of the research papers from journals, conferences and book chapters are affiliated with computer science. The survey revealed that the maximum number of research papers were published in India, followed by the US and China.
5. REFERENCES
[1] Adams, C., Alrashed, M., An, R., Anthony, J., Asaadi, J., Ashkenazi, A., Zhang, C. (2019). Deep neural network for pixel-level electromagnetic particle identification in the MicroBooNE liquid argon time projection chamber. Physical Review D, 99(9) doi:10.1103/PhysRevD.99.092001
[2] Adinugroho, S., Sari, Y. A., Fauzi, M. A., & Adikara, P. P. (2017). Optimizing K-means text document clustering using latent semantic indexing and pillar algorithm. Paper presented at the 5th International Symposium on Computational and Business Intelligence, ISCBI 2017, 81-85. doi:10.1109/ISCBI.2017.8053549 Retrieved from www.scopus.com
[3] Ahuja, R., Chug, A., Gupta, S., Ahuja, P., & Kohli, S. (2020). Classification and Clustering Algorithms of Machine Learning with their Applications. In Nature-Inspired Computation in Data Mining and Machine Learning (pp. 225-248). Springer, Cham.
[4] Ai, P., Wang, D., Huang, G., & Sun, X. (2018). double-beta decay signal/background discrimination in high-pressure gaseous time projection chamber. Journal of Instrumentation, 13(8) doi:10.1088/1748-0221/13/08/P08015
[5] Ali, A. K., & Erçelebi, E. (2019). An M-QAM signal modulation recognition algorithm in AWGN channel. Scientific Programming, 2019 doi:10.1155/2019/6752694
[6] Aminanto, M. E., Choi, R., Tanuwidjaja, H. C., Yoo, P. D., & Kim, K. (2017). Deep abstraction and weighted feature selection for wi-fi impersonation detection. IEEE Transactions on Information Forensics and Security, 13(3), 621-636. doi:10.1109/TIFS.2017.2762828
[7] Amruthnath, N., & Gupta, T. (2018). Fault class prediction in unsupervised learning using model-based clustering approach. Paper presented at the 2018 International Conference on Information and Computer Technologies, ICICT 2018, 5-12. doi:10.1109/INFOCT.2018.8356831 Retrieved from www.scopus.com
[8] Baldi, P., Bian, J., Hertel, L., & Li, L. (2019). Improved energy reconstruction in NOvA with regression convolutional neural networks. Physical Review D, 99(1) doi:10.1103/PhysRevD.99.012011
[9] Bi, Y., Wang, P., Guo, X., Wang, Z., & Cheng, S. (2019). K-means clustering optimizing deep stacked sparse autoencoder. Sensing and Imaging, 20(1) doi:10.1007/s11220-019-0227-1
[10]Bostani, H., & Sheikhan, M. (2017). Modification of optimum-path forest using markov cluster process algorithm. Paper presented at the Proceedings - 2016 2nd International Conference of Signal Processing and Intelligent Systems, ICSPIS 2016, doi:10.1109/ICSPIS.2016.7869874 Retrieved from www.scopus.com
[11]Braun, P., Cuzzocrea, A., Leung, C. K., Pazdor, A. G. M., Souza, J., & Tanbeer, S. K. (2019). Pattern mining from big IoT data with fog computing: Models, issues, and research perspectives. Paper presented at the Proceedings - 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2019, 584-591. doi:10.1109/CCGRID.2019.00075 Retrieved from www.scopus.com
[12]Carvalho, T. P., Soares, F. A. A. M. N., Vita, R., Francisco, R. D. P., Basto, J. P., & Alcalá, S. G. S. (2019). A systematic literature review of machine learning methods applied to predictive maintenance. Computers and Industrial Engineering, 137 doi:10.1016/j.cie.2019.106024
[13]Cellini, F., Lavini, F., Berger, C., De Heer, W., & Riedo, E. (2019). Layer dependence of graphene-diamene phase transition in epitaxial and exfoliated few-layer graphene using machine learning. 2D Materials, 6(3) doi:10.1088/2053-1583/ab1b9f
[14]Chen, Z., Zhang, C., Mu, T., Yan, T., Chen, Z., & Wang, Y. (2019). An efficient representation-based subspace clustering framework for polarized hyperspectral images. Remote Sensing, 11(13) doi:10.3390/rs11131513
[15]Curiskis, S. A., Drake, B., Osborn, T. R., & Kennedy, P. J. (2020). An evaluation of document clustering and topic modelling in two online social networks: Twitter and reddit. Information Processing and Management, 57(2) doi:10.1016/j.ipm.2019.04.002
[16]Dai, C., Pi, D., Cui, L., & Zhu, Y. (2018). MTEEGC: A novel approach for multi-trial EEG clustering. Applied Soft Computing Journal, 71, 255-267. doi:10.1016/j.asoc.2018.07.006
[17]Dai, J., Chen, Y., Yi, Y., Bao, J., Wang, L., Zhou, W., & Lei, G. (2018). Unsupervised feature selection with ordinal preserving self-representation. IEEE Access, 6, 67446-67458. doi:10.1109/ACCESS.2018.2878855
[18]Daldal, N., Cömert, Z., & Polat, K. (2020). Automatic determination of digital modulation types with different noises using convolutional neural network based on time–frequency information. Applied Soft Computing Journal, 86 doi:10.1016/j.asoc.2019.105834
[19]Daldal, N., Yıldırım, Ö., & Polat, K. (2019). Deep long short-term memory networks-based automatic recognition of six different digital modulation types under varying noise conditions. Neural Computing and Applications, doi:10.1007/s00521-019-04261-2
[20]Das, A., Patterson, S., & Wittie, M. (2019). EdgeBench: Benchmarking edge computing platforms. Paper presented at the Proceedings - 11th IEEE/ACM International Conference on Utility and Cloud Computing Companion, UCC Companion 2018, 175-180. doi:10.1109/UCC-Companion.2018.00053 Retrieved from www.scopus.com
[21]Dreyfus, P. -., & Kyritsis, D. (2018). A framework based on predictive maintenance, zero-defect manufacturing and scheduling under uncertainty tools, to optimize production capacities of high-end quality products doi:10.1007/978-3-319-99707-0_37 Retrieved from www.scopus.com
[22]Goyal, J., & Kishan, B. (2019). Progress on machine learning techniques for software fault prediction. International Journal of Advanced Trends in Computer Science and Engineering, 8(2), 305-311. doi:10.30534/ijatcse/2019/33822019
[23]Halibas, A. S., Shaffi, A. S., & Mohamed, M. A. K. V. (2018). Application of text classification and clustering of twitter data for business analytics. Paper presented at the Proceedings of Majan International Conference: Promoting Entrepreneurship and Technological Skills: National Needs, Global Trends, MIC 2018, 1-7. doi:10.1109/MINTC.2018.8363162 Retrieved from www.scopus.com
[24]Hellas, A., Ihantola, P., Petersen, A., Ajanovski, V. V., Gutica, M., Hynninen, T., . . . Liao, S. N. (2018). Predicting academic performance: A systematic literature review. Paper presented at the Annual Conference on Innovation and Technology in Computer Science Education, ITiCSE, 175-199. doi:10.1145/3293881.3295783 Retrieved from www.scopus.com
[25]Injadat, M., Salo, F., Nassif, A. B., Essex, A., & Shami, A. (2018). Bayesian optimization with machine learning algorithms towards anomaly detection. Paper presented at the 2018 IEEE Global Communications Conference, GLOBECOM 2018 - Proceedings, doi:10.1109/GLOCOM.2018.8647714 Retrieved from www.scopus.com
[26]Kim, K., & Aminanto, M. E. (2018). Deep learning in intrusion detection perspective: Overview and further challenges. Paper presented at the Proceedings - WBIS 2017: 2017 International Workshop on Big Data and Information Security, 2018-January, 5-10. doi:10.1109/IWBIS.2017.8275095 Retrieved from www.scopus.com
[27]Kowshalya, G., & Nandhini, M. (2018). Predicting fraudulent claims in automobile insurance. Paper presented at the Proceedings of the International Conference on Inventive Communication and Computational Technologies, ICICCT 2018, 1338-1343. doi:10.1109/ICICCT.2018.8473034 Retrieved from www.scopus.com
[28]Lavanya, P. G., Kouser, K., & Suresha, M. (2020). Efficient pre-processing and feature selection for clustering of cancer tweets doi:10.1007/978-981-13-6095-4_2 Retrieved from www.scopus.com
451-468. doi:10.1007/s10586-018-2516-1
[30]Li, J., Yang, L., Qu, Y., & Sexton, G. (2018). An extended Takagi– Sugeno–Kang inference system (TSK+) with fuzzy interpolation and its rule base generation. Soft Computing, 22(10), 3155-3170. doi:10.1007/s00500-017-2925-8
[31]Li, L., Yu, Y., Bai, S., Hou, Y., & Chen, X. (2017). An effective two-step intrusion detection approach based on binary classification and κ-NN. IEEE Access, 6, 12060-12073. doi:10.1109/ACCESS.2017.2787719
[32]Li, L., Zhang, H., Peng, H., & Yang, Y. (2018). Nearest neighbors based density peaks approach to intrusion detection. Chaos, Solitons and Fractals, 110, 33-40. doi:10.1016/j.chaos.2018.03.010
[33]Li, W., Dou, Z., Qi, L., & Shi, C. (2019). Wavelet transform based modulation classification for 5G and UAV communication in multipath fading channel. Physical Communication, 34, 272-282. doi:10.1016/j.phycom.2018.12.019
[34]Li, X., Dong, F., Zhang, S., & Guo, W. (2019). A survey on deep learning techniques in wireless signal recognition. Wireless Communications and Mobile Computing, 2019 doi:10.1155/2019/5629572
[35]Li, Z., Li, J., Wang, Y., & Wang, K. (2019). A deep learning approach for anomaly detection based on SAE and LSTM in mechanical equipment. International Journal of Advanced Manufacturing Technology, 103(1-4), 499-510. doi:10.1007/s00170-019-03557-w
[36]Lydia, E. L., Prasad, B., Chevuru, M. B., Shankar, K., & Kumar, K. V. (2019). An unsupervised deep learning methods for fabricating text mining analysis based on topic modeling and document clustering techniques. International Journal on Emerging Technologies, 10(2), 103-1039. Retrieved from www.scopus.com
[37]Mahendiran, A., & Appusamy, R. (2018). An intrusion detection system for network security situational awareness using conditional random fields. International Journal of Intelligent Engineering and Systems, 11(3), 196-204. doi:10.22266/IJIES2018.0630.21
[38]Meng, F., Chen, P., Wu, L., & Wang, X. (2018). Automatic modulation classification: A deep learning enabled approach. IEEE Transactions on Vehicular Technology, 67(11), 10760-10772. doi:10.1109/TVT.2018.2868698
[39]Naqi, S. M., Sharif, M., & Jaffar, A. (2018). Lung nodule detection and classification based on geometric fit in parametric form and deep learning. Neural Computing and Applications, doi:10.1007/s00521-018-3773-x
[40]Narendra Kumar, B., Sivarama Bhadri Raju, M. S. V., & Vardhan, B. V. (2019). A novel approach for selective feature mechanism for two-phase intrusion detection system. Indonesian Journal of Electrical Engineering and Computer Science, 14(1), 105-116. doi:10.11591/ijeecs.v14.i1.pp105-116
[41]Nisioti, A., Mylonas, A., Yoo, P. D., & Katos, V. (2018). From intrusion detection to attacker attribution: A comprehensive survey of unsupervised methods. IEEE Communications Surveys and Tutorials, 20(4), 3369-3388. doi:10.1109/COMST.2018.2854724
[42]Nivaashini, M., & Thangaraj, P. (2019). State-of-the-art machine learning and deep learning: Evolution of intelligent intrusion detection system against wireless network (wi-fi) attacks in internet of things (iot). International Journal of Innovative Technology and Exploring Engineering, 8(3), 118-130. Retrieved from www.scopus.com
[43]Pajouh, H. H., Javidan, R., Khayami, R., Dehghantanha, A., & Choo, K. -. R. (2019). A layer dimension reduction and two-tier classification model for anomaly-based intrusion detection in IoT backbone networks. IEEE Transactions on Emerging Topics in Computing, 7(2), 314-323. doi:10.1109/TETC.2016.2633228
[44]Qian, Y., Chen, M., Chen, J., Hossain, M. S., & Alamri, A. (2018). Secure enforcement in cognitive internet of vehicles. IEEE Internet of Things Journal, 5(2), 1242-1250. doi:10.1109/JIOT.2018.2800035
[45]Ravì, D., Wong, C., Deligianni, F., Berthelot, M., Andreu-Perez, J., Lo, B., & Yang, G. Z. (2016). Deep learning for health informatics. IEEE journal of biomedical and health informatics, 21(1), 4-21.
[46]Radovic, A., Williams, M., Rousseau, D., Kagan, M., Bonacorsi, D., Himmel, A., . . . Wongjirad, T. (2018). Machine learning at the energy and intensity frontiers of particle physics. Nature, 560(7716), 41-48. doi:10.1038/s41586-018-0361-2
[47]Sadowski, P., & Baldi, P. (2018). Deep learning in the natural sciences: Applications to physics doi:10.1007/978-3-319-99492-5_12 Retrieved from www.scopus.com
[48]Salo, F., Injadat, M., Nassif, A. B., Shami, A., & Essex, A. (2018). Data mining techniques in intrusion detection systems: A systematic literature review. IEEE Access, 6, 56046-56058. doi:10.1109/ACCESS.2018.2872784
[49]Shah, M. H., & Dang, X. (2019). Robust approach for AMC in frequency selective fading scenarios using unsupervised sparse-autoencoder-based deep neural network. IET Communications, 13(4), 423-432. doi:10.1049/iet-com.2018.5688
[50]Shen, C. (2018). A transdisciplinary review of deep learning research and its relevance for water resources scientists. Water Resources Research, 54(11), 8558-8593. doi:10.1029/2018WR022643
[51]Somu, N., M.R., G. R., Kalpana, V., Kirthivasan, K., & V.S., S. S. (2018). An improved robust heteroscedastic probabilistic neural network based trust prediction approach for cloud service selection. Neural Networks, 108, 339-354. doi:10.1016/j.neunet.2018.08.005
[52]Vlasov, A. I., Grigoriev, P. V., Krivoshein, A. I., Shakhnov, V. A., Filin, S. S., & Migalin, V. S. (2018). Smart management of technologies: Predictive maintenance of industrial equipment using wireless sensor networks*. Entrepreneurship and Sustainability Issues, 6(2), 489-502. doi:10.9770/jesi.2018.6.2(2)
[53]Wang, L., Xi, Y., Sung, S., & Qiao, H. (2018). RNA-seq assistant: Machine learning based methods to identify more transcriptional regulated genes. BMC Genomics, 19(1) doi:10.1186/s12864-018-4932-2
[54]Yang, S., Wang, H., Zhang, Y., Li, P., Zhu, Y., & Hu, X. (2019). Semi-supervised representation learning via dual autoencoders for domain adaptation. Knowledge-Based Systems, doi:10.1016/j.knosys.2019.105161
[55]Yang, S., Zhang, Y., Zhu, Y., Li, P., & Hu, X. (2019). Representation learning via serial autoencoders for domain adaptation. Neurocomputing, 351, 1-9. doi:10.1016/j.neucom.2019.03.056
[56]Yuan, Y., Sun, Z., Wei, Z., & Jia, K. (2019). DeepMorse: A deep convolutional learning method for blind morse signal detection in wideband wireless spectrum. IEEE Access, 7, 80577-80587. doi:10.1109/ACCESS.2019.2923084