A SURVEY - BASED ON METHODS AND TECHNIQUES OF DATA MINING


Volume II, Issue X, October - 2015


A. Vishnu Priya
M.E. Computer Science and Engineering
Tamilnadu College of Engineering

Dr. K. Krishnaswari
Professor and Head of the Department
Tamilnadu College of Engineering

Abstract—Data mining is a powerful and relatively new field that offers a variety of techniques. It converts raw data into useful information across many research fields. Knowledge discovery in databases is a rapidly growing field whose development is driven by strong research interest as well as urgent practical, social, and economic needs. While knowledge discovery tools have been used mainly in research environments over the last few years, sophisticated software products are now rapidly emerging. This paper explains knowledge discovery in databases (KDD) and surveys the various methods and techniques of data mining.

Index Terms—Data mining, information prediction, raw data, Knowledge discovery in databases, surveys.

I. INTRODUCTION

Data mining, or knowledge discovery, is needed to make sense of data and put it to use. Knowledge discovery in databases (KDD) is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data [1]. Data mining consists of more than collecting and managing data; it also includes analysis and prediction. People often make mistakes when analyzing data or when trying to establish relationships between multiple features, which makes it difficult for them to find solutions to certain problems. Machine learning can often be applied successfully to these problems, improving the efficiency of systems and the designs of machines. Machine Learning (ML) has several applications, the most significant of which is data mining. Numerous ML applications involve tasks that can be set up as supervised learning. In the present paper, we concentrate on the techniques necessary to do this. In particular, this work is concerned with classification problems in which the output of instances admits only discrete, unordered values.

Figure 1. Data Mining and the KDD Process (Source: Fayyad et al., 1996)

Based on Figure 1, the KDD process consists of an iterative sequence of steps, as follows [7, 9]:

1. Selection: Selecting data relevant to the analysis task from the database

2. Preprocessing: Removing noise and inconsistent data; combining multiple data sources

3. Transformation: Transforming data into appropriate forms to perform data mining

4. Data mining: Choosing a data mining algorithm appropriate to the patterns in the data; extracting data patterns

5. Interpretation/Evaluation: Removing redundant or irrelevant patterns; translating the useful patterns into terms that humans can understand
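
The KDD steps listed above can be sketched as a small pipeline. This is a minimal illustration only: the toy records, the field names (`age`, `income`, `bought`), the income threshold, and all function names are assumptions made for the example and do not come from the surveyed work. The "mining" step here is a simple co-occurrence count standing in for a real algorithm.

```python
from collections import Counter

# Hypothetical toy records (not from the paper); None marks a missing value.
RAW_RECORDS = [
    {"id": 1, "age": 25, "income": 30000, "bought": "yes", "note": "a"},
    {"id": 2, "age": None, "income": 52000, "bought": "no", "note": "b"},
    {"id": 3, "age": 41, "income": 58000, "bought": "no", "note": "c"},
    {"id": 4, "age": 33, "income": 31000, "bought": "yes", "note": "d"},
]


def select(records, fields):
    """Selection: keep only the attributes relevant to the analysis task."""
    return [{f: r[f] for f in fields} for r in records]


def preprocess(records):
    """Preprocessing: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def transform(records, threshold=50000):
    """Transformation: discretize income into two bands (assumed threshold)."""
    return [
        {"income_band": "high" if r["income"] >= threshold else "low",
         "bought": r["bought"]}
        for r in records
    ]


def mine(records):
    """Data mining: count how often each (income_band, bought) pair occurs."""
    return Counter((r["income_band"], r["bought"]) for r in records)


def interpret(patterns, min_count=2):
    """Interpretation: drop infrequent patterns and render the rest as text."""
    return sorted(
        f"income={band} -> bought={label} ({n} records)"
        for (band, label), n in patterns.items()
        if n >= min_count
    )


def run_kdd(records):
    """Run the five steps in order on the given records."""
    selected = select(records, ["age", "income", "bought"])
    cleaned = preprocess(selected)
    transformed = transform(cleaned)
    patterns = mine(transformed)
    return interpret(patterns)


if __name__ == "__main__":
    for line in run_kdd(RAW_RECORDS):
        print(line)
```

In practice the process is iterative: the output of the interpretation step often sends the analyst back to an earlier step with a different selection or transformation.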

II. RELATED WORK

The survey is based on applications of data mining to organizational knowledge management for effectively capturing, storing, retrieving, and transferring knowledge. The review is organized into four main groups: (i) knowledge resources; (ii) knowledge types and/or knowledge datasets; (iii) data mining tasks; and (iv) data mining techniques and applications used in knowledge management (KM).

Table 1 summarizes the surveyed work on data mining and its techniques.

| Authors | Knowledge Resources | Knowledge Types | DM Tasks | DM Techniques / Applications |
|---|---|---|---|---|
| Lavrac et al. (2007) | Healthcare | Public health data: the health-care providers database; the out-patient health-care statistics database; the medical status database | Classification and clustering | Clustering methods: agglomerative classification; principal component analysis; the Kolmogorov-Smirnov test; the quantile range test and polar ordination. Classification: C4.5 |
| Hwang et al. (2008) | Healthcare (clinical diagnosis) | Knowledge conversion and transfer; knowledge measurement | Dependency modeling | Data mining tool: IBM Intelligent Miner. Data mining techniques: association analysis; sequential patterns analysis. Knowledge Management System (KMS) for disease classification |
| Liao, Chen & Wu (2008) | Retailing | Knowledge extraction; customer knowledge to product line and brand extension | Dependency modeling; clustering | Apriori algorithms (association analysis); K-means (cluster analysis) |
| Cheng, Lu & Sheu (2009) | Financial | Knowledge sets (strings of data, models, parameters, and reports); knowledge sharing processes to a corporate bond classification | Classification; clustering | Hybrid SOFM/LVQ classifier for bond rating: the Self-Organizing Feature Map (SOFM) and the Learning Vector Quantization (LVQ); ontology of [...] Management System (FKMS) prototype for financial research purposes |
| Li, Zhu & Pan (2010) | Small & Middle Businesses (SMBs): food company | Knowledge seeding & knowledge cultivating | Classification | Extension Theory (Extenics); Extension Data Mining (EDM), which combines Extenics with data mining |
| Li et al. (2010) | Food supply chain networks | Knowledge base | Classification | Decision tree; neural network; Early Warning and Proactive Control systems (EW&PC) |
| Cantu & Ceballos (2010) | Entrepreneurial science | Research assets | Classification | Data mining agents: reasoning; pattern recognition. Knowledge-based system (KBS); Knowledge and Information Network (KIN) approach; Semantic Web technologies |
| Wu et al. (2010) | Business | KM styles & KM performance | Classification | Bayesian network classifier; rough set theory |
| Liu & Lai (2011) | Collaboration and teamwork | Task (worker's log & documents); knowledge sharing; knowledge graph; knowledge flows (KFs) | Clustering | Process mining technique; knowledge flow mining; group-based knowledge flows (GKFs) |
| Ur-Rahman & Harding (2012) | Construction industry | Textual databases (textual data formats) | Dependency modeling; clustering | Text mining: clustering; Apriori association rule mining; Multiple Key Term Phrasal Knowledge Sequences (MKTPKS) |
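
Several entries in Table 1 rely on Apriori-style association analysis. The following is a minimal sketch of the level-wise idea (count candidates, keep the frequent ones, join and prune to form the next level), written in plain Python for illustration. The basket data, the function names, and the absolute support threshold are assumptions for the example; it is not the implementation used in any of the cited studies.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical market baskets.
BASKETS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

if __name__ == "__main__":
    found = apriori(BASKETS, min_support=3)
    for itemset, count in sorted(found.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), count)
    print("conf(bread -> milk) =", confidence(found, {"bread"}, {"milk"}))
```

The prune step is what keeps the search tractable: an itemset can only be frequent if all of its subsets are frequent, so most candidates are discarded without being counted.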

Table 2: Comparative Study of Different Algorithms Implemented to Reduce the Scalability Issue

A) Clustering methods

| Title of the paper | Conclusion | Drawbacks | Performance | Future work |
|---|---|---|---|---|
| [...] Filtering Recommendation Algorithm Based on User Clustering and Item Clustering [2010] | [...] traditional CF; quality improvement; more accurate and more scalable than traditional [...] | [...] | [...] Error | [...] is to be handled |
| 3. Collaborative Filtering Recommendation Algorithm Based on Cluster [2011] | Combination of CF and clustering to improve the scalability and sparsity problems | Cold start | Only the algorithm is proposed | For better performance, cold start and privacy are to be taken into consideration |
| 4. Scaling-up Item-based Collaborative Filtering Recommendation Algorithm based on Hadoop [2011] | Minimizes communication cost; scalable; efficient performance | Scalability to be handled more effectively | Speedup; isoefficiency | On the algorithm side, optimized allocation and execution process to be included; on the application side, scalability is to be improved |

B) Other methods

| Title of the paper | Conclusion | Drawbacks | Performance | Future work |
|---|---|---|---|---|
| 1. Hybrid Recommender System with Temporal Information [2012] | Scalable; sparsity and cold start problems are covered | Privacy issue is not handled | Mean Absolute Error and recommendation time | Geographic information can be considered and time stamping improved for better performance; RS for N-screen service |
| 2. An Improved Profile based CF Scheme with Privacy [2011] | Improved performance and accuracy; more scalable | Sparsity problem; accuracy and quality are not so good compared to the CF scheme | Mean Absolute Error | [...] |
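
The algorithms compared in Table 2 build on collaborative filtering (CF) and are typically evaluated with Mean Absolute Error. As a rough illustration, the sketch below shows a plain user-based neighbourhood CF predictor with cosine similarity, together with an MAE function. It is a generic textbook-style baseline, not any of the clustered or Hadoop-based variants in the table; the rating data and all names are hypothetical.

```python
import math

# Hypothetical user -> {item: rating} data on a 1-5 scale.
RATINGS = {
    "u1": {"i1": 5, "i2": 3, "i3": 4},
    "u2": {"i1": 4, "i2": 2, "i3": 5},
    "u3": {"i1": 1, "i2": 5},
    "u4": {"i1": 5, "i3": 4},
}


def cosine(a, b):
    """Cosine similarity of two users, computed over co-rated items only."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    numerator = denominator = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        weight = cosine(ratings[user], their_ratings)
        numerator += weight * their_ratings[item]
        denominator += weight
    # None signals that no neighbour has rated the item (cold start).
    return numerator / denominator if denominator else None


def mean_absolute_error(actual, predicted):
    """MAE: the average absolute difference between actual and predicted."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    print("predicted rating of i3 for u3:", predict(RATINGS, "u3", "i3"))
    print("MAE:", mean_absolute_error([3, 4, 5], [2, 4, 7]))
```

Every prediction scans all other users, which is the scalability cost that the clustering-based approaches in Table 2 try to reduce by restricting the neighbour search to one cluster.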


III. CONCLUSION

Data mining is a “decision support” process in which we search for patterns of information in data. Data mining techniques include classification, clustering, prediction, association, and sequential pattern analysis. Commercial, educational, and scientific applications are increasingly dependent on these methodologies. Decision trees are a reliable and effective decision-making technique that provides high classification accuracy with a simple representation of the collected knowledge. This paper surveys different data mining techniques that can be used to capture user behavior efficiently and accurately. The paper also presents guidelines that show which techniques may be used more efficiently according to the task implemented by the application.
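
To make the remark about decision trees concrete, the sketch below shows how a tree learner can choose the attribute to split on using entropy and information gain (the criterion used by ID3; C4.5 refines it into gain ratio). The weather-style rows, the attribute names, and the function names are hypothetical and serve only as an illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )


def information_gain(rows, attribute, target):
    """Reduction in target entropy obtained by splitting on the attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_split(rows, attributes, target):
    """Pick the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical training rows.
ROWS = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]

if __name__ == "__main__":
    for attr in ("outlook", "windy"):
        print(attr, information_gain(ROWS, attr, "play"))
    print("split on:", best_split(ROWS, ["outlook", "windy"], "play"))
```

A full tree is built by applying the same choice recursively to each subset until the labels in a subset are pure or no attributes remain.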

References

[1] An, X. & Wang, W. (2010). Knowledge management technologies and applications: A literature review. IEEE, 138-141. doi:10.1109/ICAMS.2010.5553046

[2] Berson, A., Smith, S.J. & Thearling, K. (1999). Building Data Mining Applications for CRM. New York: McGraw-Hill.

[3] Cantú, F.J. & Ceballos, H.G. (2010). A multi agent knowledge and information network approach for managing research assets. Expert Systems with Applications, 37(7), 5272-5284. doi:10.1016/j.eswa.2010.01.012

[4] Cheng, H., Lu, Y. & Sheu, C. (2009). An ontology-based business intelligence application in a financial knowledge management system. Expert Systems with Applications, 36, 3614-3622. doi:10.1016/j.eswa.2008.02.047


[5] Dalkir, K. (2005). Knowledge Management in Theory and Practice. Boston: Butterworth-Heinemann.

[6] Dawei, J. (2011). The Application of Data Mining in Knowledge Management. 2011 International Conference on Management of Commerce and e-Government, IEEE Computer Society, 7-9. doi:10.1109/ICMeCG.2011.58

[7] Fayyad, U., Piatetsky-Shapiro, G. & Smyth, P. (1996). From Data Mining to Knowledge Discovery in Databases. AI Magazine, 17(3), 37-54.

[8] Gorunescu, F. (2011). Data Mining: Concepts, Models, and Techniques. India: Springer.

[9] Han, J. & Kamber, M. (2012). Data Mining: Concepts and Techniques. 3rd ed. Boston: Morgan Kaufmann Publishers.

[10] Hwang, H.G., Chang, I.C., Chen, F.J. & Wu, S.Y. (2008). Investigation of the application of KMS for diseases classifications: A study in a Taiwanese hospital. Expert Systems with Applications, 34(1), 725-733. doi:10.1016/j.eswa.2006.10.018

[11] Lavrac, N., Bohanec, M., Pur, A., Cestnik, B., Debeljak, M. & Kobler, A. (2007). Data mining and visualization for decision support and modeling of public health-care resources. Journal of Biomedical Informatics, 40, 438-447. doi:10.1016/j.jbi.2006.10.003

[12] Li, X., Zhu, Z. & Pan, X. (2010). Knowledge cultivating for intelligent decision making in small & middle [...]

[...] product line and brand extension in retailing. Expert Systems with Applications, 34(3), 1763-1776. doi:10.1016/j.eswa.2007.01.036

[15] Liao, S. (2003). Knowledge management technologies and applications-literature review from 1995 to 2002. Expert Systems with Applications, 25, 155-164. doi:10.1016/S0957-4174(03)00043-5

[16] Liu, D.R. & Lai, C.H. (2011). Mining group-based knowledge flows for sharing task knowledge. Decision Support Systems, 50(2), 370-386. doi:10.1016/j.dss.2010.09.004

[17] Lee, M.R. & Chen, T.T. (2011). Revealing research themes and trends in knowledge management: From 1995 to 2010. Knowledge-Based Systems. doi:10.1016/j.knosys.2011.11.016

[18] McInerney, C.R. & Koenig, M.E. (2011). Knowledge Management (KM) Processes in Organizations: Theoretical Foundations and Practice. USA: Morgan & Claypool Publishers. doi:10.2200/S00323ED1V01Y201012ICR018

[19] McInerney, C. (2002). Knowledge Management and the Dynamic Nature of Knowledge. Journal of the American Society for Information Science and Technology, 53(12), 1009-1018. doi:10.1002/asi.10109

[20] Ngai, E., Xiu, L. & Chau, D. (2009). Application of data mining techniques in customer relationship management: A literature review and classification. Expert Systems with Applications, 36, 2592-2602. doi:10.1016/j.eswa.2008.02.021

[21] Ruggles, R.L. (ed.). (1997). Knowledge Management Tools. Boston: Butterworth-Heinemann.

[22] Sher, P.J. & Lee, V.C. (2004). Information technology as a facilitator for enhancing dynamic capabilities through knowledge management. Information & Management, 41, 933-945. doi:10.1016/j.im.2003.06.004

[23] Tseng, S.M. (2008). The effects of information technology on knowledge management. Expert Systems with Applications, 35, 150-160. doi:10.1016/j.eswa.2007.06.011

[24] Ur-Rahman, N. & Harding, J.A. (2012). Textual data mining for industrial knowledge management and text classification: A business oriented approach. Expert Systems with Applications, 39, 4729-4739. doi:10.1016/j.eswa.2011.09.124

[25] Wang, F. & Fan, H. (2008). Investigation on Technology Systems for Knowledge Management. IEEE, 1-4. doi:10.1109/WiCom.2008.2716

[26] Wang, H. & Wang, S. (2008). A knowledge management approach to data mining process for business intelligence. Industrial Management & Data Systems, 108(5), 622-634.

[27] Wu, W., Lee, Y.T., Tseng, M.L. & Chiang, Y.H. (2010). Data mining for exploring hidden patterns between KM and its performance. Knowledge-Based Systems, 23, 397-401. doi:10.1016/j.knosys.2010.01.014

[28] Farman Ullah, Ghulam Sarwar, Sung Chang Lee, “Hybrid Recommender System with Temporal Information”, International Conference on Information Networking (ICOIN), IEEE 2012, pp. 421-425

[29] Siavash Ghodsi Moghaddam, Ali Selamat, “A Scalable Collaborative Recommender Algorithm Based on User Density-Based Clustering”, 3rd International Conference on Data Mining and Intelligent Information Technology Applications (ICMiA), IEEE 2011, pp. 246-249


[31] Xingyuan Li, “Collaborative Filtering Recommendation Algorithm Based on Cluster”, International Conference on Business Computing and Global Information (BCGIN), IEEE 2011, pp. 645-648

[32] Jing Jiang, Jie Lu, Guangquan Zhang, Guodong Long, “Scaling-up Item-based Collaborative Filtering Recommendation Algorithm based on Hadoop”, IEEE World Congress on SERVICES, 2011, pp. 490-497

[33] Gilda Moradi Dakhel, Mehregan Mahdavi, “A New Collaborative Filtering Algorithm Using K-means Clustering and Neighbors Voting”, International Conference on Hybrid Intelligent Systems (HIS), IEEE 2011, pp. 179-184

[34] Kyung-Yong Chung, Daesung Lee and Kuinam J. Kim, “Categorization for grouping associative items using data mining in item-based collaborative filtering”, International Conference on Information Science and Applications (ICISA), IEEE 2011, pp. 1-6

[35] Joseph A. Konstan, John Riedl, “Recommender systems: from algorithms to user experience”, User Modeling and User-Adapted Interaction, 2012, Vol. 22, pp. 101-123

[36] Fatih Gedikli, Faruk Bagdat, Mouzhi Ge, and Dietmar Jannach, “RF-REC: Fast and Accurate Computation of Recommendations based on Rating Frequencies”, IEEE 13th Conference on Commerce and Enterprise Computing (CEC), 2011, pp. 50-57

[37] Mozhgan Tavakolifard, Kevin C. Almeroth, “Social Computing: An Intersection of Recommender Systems, Trust/Reputation Systems, and Social Networks”, IEEE Network, July/August 2012, Vol. 26, No. 4, pp. 53-58

[38] SongJie Gong, “A Collaborative Filtering Recommendation Algorithm Based on User Clustering and Item Clustering”, Journal of Software, July 2010, Vol. 5, No. 7

A. Vishnu Priya completed her B.E. (Computer Science and Engineering) in 2014 at K.S.R. College of Engineering under Anna University, Coimbatore. She is currently pursuing an M.E. (Computer Science and Engineering) at Tamilnadu College of Engineering under Anna University, Chennai.

Mrs. Dr. K. Krishneswari completed her B.E. (Computer Science and Engineering) in 2000 at Tamilnadu College of Engineering under Bharathiyar University, Coimbatore, and her M.E. (Software Engineering) in 2005 at Sri Ramakrishna Engineering College under Anna University, Chennai. She received her Ph.D. degree from Anna University, Coimbatore. She is working as an assistant professor and head of the Department of Computer Science and Engineering at Tamilnadu College of Engineering, Coimbatore. She is a member of CSI and ISTE. Her research areas include image processing, pattern recognition, and biometrics. She has 13 years of teaching experience in engineering colleges and has published 30 papers in journals.