Knowledge Discovery Techniques

A Monthly Double-Blind Peer Reviewed Refereed Open Access International e-Journal - Included in the International Serial Directories International Journal in IT and Engineering

http://www.ijmr.net.in email id- [email protected]

Dr. Jatinder Kumar

Assistant Prof., A. S. College, Khanna

Abstract: Knowledge discovery from large databases is the need of the hour in the present era of globalization and liberalization. In spite of the heavy investments that companies have made in IT and ITeS infrastructure, they are not able to reap the true benefit of this wealth of knowledge. In such a situation, knowledge discovery techniques powered by data mining can prove to be a boon for organizations. Techniques such as association rules, sequence path analysis and clustering are helping data managers get the true benefit from their data.

Keywords: Knowledge Generation, Retention, Data Mining.

INTRODUCTION

In the most basic sense, knowledge is defined as a derivative of information, which in turn is derived from data. Knowledge is information or data that has been organized in a meaningful way. Data is mostly unstructured, factual and often numeric, and resides in database management systems. Information is factual, but structured and crisp. Knowledge is inferential and abstract, and is needed to support decision making or hypothesis generation. The terms knowledge and knowledge management are often used interchangeably. Knowledge management has been practised in our society since time immemorial. Its origin can be traced to the time when languages first developed. Human beings learnt how to communicate with each other, and this was probably the first medium through which the transfer of information took place [8]. Knowledge was transferred from parents to children, from teacher to the taught, and from the educated to the uneducated through verbal or non-verbal mediums of communication [8]. This knowledge transfer was informal, and knowledge was passed down the generations through customs and folklore. As long as society was not widely dispersed, these means of knowledge dissemination were successful, even in small organizations. The major problem arose when organizations started growing in size, and the work pressure on employees and the ever-changing demands of industry, driven by cut-throat competition, started increasing. The solution to this problem lies in implementing techniques of knowledge management and dissemination. Organizations are collecting huge amounts of data through various

bases only because they lack a strong mechanism for knowledge collection, organization and dissemination, and hence are not able to use this knowledge for the survival and growth of the organization. Knowledge is: “... a fluid mix of framed experiences, values, contextual information, and expert insight that provides a framework for evaluating and incorporating new experiences and information”

Karl Sveiby defined KM as the art of creating value from an organization's intangible assets.

Davenport and Prusak defined KM as being concerned with the exploitation and development of the knowledge assets of an organization with a view to furthering the knowledge objectives.

Charles Despres and Daniele Chauvel stated that the purpose of knowledge management is to enhance organizational performance by explicitly designing and implementing tools, processes, systems, structures, and cultures to improve the creation, sharing, and use of different types of knowledge that are critical for decision-making.

According to the World Bank, KM is the management of knowledge through systematic sharing that can enable one to build on earlier experience and obviate the need for costly reworking of learning by making the same repetitive mistakes.

In the simplest sense, knowledge can be divided into two categories:

- Explicit knowledge

- Tacit knowledge

Explicit knowledge is formal knowledge that can be packaged as information and can be found in the documents of an organization: reports, articles, manuals, patents, pictures, images etc. It can be expressed in the form of a specific language and can be shared in the form of data or scientific formulae. It can be processed, transmitted and stored relatively easily.

Knowledge Discovery Techniques

Association Rules: This technique finds interesting associations or correlation relationships among a large set of data items. Basically, if X and Y are sets of items, association rule mining discovers all associations and correlations among data items where the presence of X in a transaction implies the presence of Y with a certain degree of confidence. The rule confidence is defined as the percentage

Association rule discovery techniques can generally be applied to a web mining research support system. The technique can be used to analyze the behavior of a given user. Each transaction comprises the set of URLs accessed by a user in one visit to the server. For example, using association rule discovery techniques we can find correlations in an OSS study such as the following:

- 40% of users who accessed the web page with URL /project1 also accessed /project2; or

- 30% of users who accessed /project1 downloaded software in /project1.
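A percentage of this kind can be computed directly from the visit transactions. The following is a minimal sketch, assuming the usual definitions: the support of an itemset is the fraction of transactions that contain it, and the confidence of a rule X -> Y is the number of transactions containing both X and Y divided by the number containing X. The visit data and the function names are hypothetical and are used for illustration only.

```python
from typing import FrozenSet, List

Visit = FrozenSet[str]  # the set of URLs accessed in one visit (one transaction)


def count(visits: List[Visit], items: Visit) -> int:
    """Number of visits that contain every URL in `items`."""
    return sum(1 for v in visits if items <= v)


def support(visits: List[Visit], items: Visit) -> float:
    """Fraction of all visits that contain every URL in `items`."""
    return count(visits, items) / len(visits) if visits else 0.0


def confidence(visits: List[Visit], x: Visit, y: Visit) -> float:
    """Among visits containing X, the fraction that also contain Y."""
    seen_x = count(visits, x)
    return count(visits, x | y) / seen_x if seen_x else 0.0


# Hypothetical access log: one frozenset of URLs per visit.
visits = [
    frozenset({"/project1", "/project2"}),
    frozenset({"/project1", "/project2", "/download"}),
    frozenset({"/project1"}),
    frozenset({"/project1", "/download"}),
    frozenset({"/project1"}),
    frozenset({"/project2"}),
]

x = frozenset({"/project1"})
y = frozenset({"/project2"})
rule_support = support(visits, x | y)
rule_confidence = confidence(visits, x, y)
print(f"support = {rule_support:.2f}, confidence = {rule_confidence:.0%}")
# prints: support = 0.33, confidence = 40%
```

In this toy log, five visits include /project1 and two of those also include /project2, so the rule /project1 -> /project2 holds with 40% confidence. In practice a minimum support and a minimum confidence threshold are normally set so that only the stronger rules are reported.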

With massive amounts of data continuously being collected from the web, companies can use association rules to help make effective marketing strategies. In addition, association rules discovered from WWW access logs can help organizations design their web pages. Association and correlation analysis is usually used to find frequently occurring data items in large data sets. It is the technique of finding patterns where one event is connected to another event. Findings of this type help businesses make decisions regarding pricing and selling and design marketing strategies, such as catalogue design, cross marketing and customer shopping behavior analysis. However, the number of possible association rules for a given dataset is generally very large, and a high proportion of the rules are usually of little value. The various types of association rules include:

- Multilevel association rule

- Multidimensional association rule

- Quantitative association rule

- Direct association rule

- Indirect association rule

Clustering is a technique to group together a set of items having similar characteristics. Clustering is applied in web usage mining to find two kinds of interesting clusters: usage clusters and page clusters. Usage clusters group users who exhibit similar browsing patterns. Clustering of client information or data items can facilitate the development and execution of future marketing strategies. Page clusters discover groups of pages having related content. This information is useful for Internet search engines and Web assistance providers. By using clustering, a web site can dynamically create HTML pages according to the user's query and user information such as past needs. Clustering can be described as the identification of similar classes of objects. This is the technique of combining the transactions with

group. A classification approach can also be used as an effective means of distinguishing groups. So clustering can be used as a preprocessing approach for attribute subset selection and classification. For example, customers from a given geographic location and with a particular job profile demand a particular set of services. In the banking sector, customers from the service class typically demand policies that ensure more security, as they do not intend to take risks; likewise, service-class people in rural areas have preferences for particular brands, which may differ from those of their counterparts in urban areas. This information will help the organization in cross-selling its products. Instead of mass pitching a certain “hot” product, the bank's customer service representatives can be equipped with customer profiles enriched by data mining that help them identify which products and services are most relevant to callers. This technique will also help management apply the 80/20 principle of marketing, which says that twenty per cent of your customers will provide you with 80 per cent of your profits. The problem is then to identify those 20%, and clustering techniques help in achieving this.
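The idea of usage clusters can be illustrated with a small k-means routine. This is only a sketch under stated assumptions: the paper does not name a clustering algorithm, so k-means is chosen here as one common option, each user is described by two hypothetical features (pages viewed per visit and downloads per visit), and the starting centroids are picked by hand so that the run is repeatable.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def assign(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
        for p in points
    ]


def update(points: Sequence[Point], labels: List[int],
           centroids: Sequence[Point]) -> List[Point]:
    """Move each centroid to the mean of the points assigned to it."""
    moved = []
    for c, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == c]
        if members:
            moved.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            moved.append(old)  # keep an empty cluster's centroid where it was
    return moved


def kmeans(points: Sequence[Point], centroids: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Alternate assignment and update steps until the labels stop changing."""
    centroids = [tuple(c) for c in centroids]
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Hypothetical users: (pages viewed per visit, downloads per visit).
users = [(2, 0), (3, 0), (2, 1), (10, 4), (11, 5), (12, 4)]
labels, centroids = kmeans(users, centroids=[users[0], users[3]])
print(labels)  # prints: [0, 0, 0, 1, 1, 1]
```

The first three users (light browsers) and the last three (heavy browsers who also download) fall into separate usage clusters. With real data the features would need to be scaled to comparable ranges, and the number of clusters and the starting centroids would have to be chosen with more care.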

Classification is another extensively studied topic in data mining. Classification maps a data item into one of several predefined classes. One task of classification is to extract and select features that best describe the properties of a given class or category. In web mining, classification rules allow one to develop a profile of items belonging to a particular group according to their common attributes. For example, classification on SourceForge access logs may lead to the discovery of relationships such as the following:

- Users from universities who visit the site tend to be interested in the page /project1; or

- 50% of users who download software in /project2 were developers of Open Source Software and worked in IT companies.

Classification is the most commonly applied data mining technique; it employs a set of pre-classified examples to develop a model that can classify the population of records at large. Fraud detection and credit risk applications are particularly well suited to this type of analysis. This approach frequently employs decision tree or neural network-based classification algorithms. The data classification process involves two steps, learning and classification. In learning, the training data are analyzed by a classification algorithm. In classification, test data are used to estimate the accuracy of the classification rules. If the accuracy is acceptable, the rules can be applied to new data tuples. For a fraud detection application, this would include complete records of both fraudulent and valid activities
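The two steps, learning on training data and then estimating accuracy on test data, can be sketched with a very small classifier. The example below assumes a one-feature decision stump (a decision tree of depth one) that learns a single threshold on the transaction amount. The records, labels and function names are hypothetical; a real fraud detection model would use many more attributes and a full decision tree or neural network.

```python
from typing import List, Tuple

Record = Tuple[float, str]  # (transaction amount, "fraud" or "valid")


def classify(amount: float, threshold: float) -> str:
    """Apply the learned rule: amounts above the threshold are flagged."""
    return "fraud" if amount > threshold else "valid"


def train_stump(train: List[Record]) -> float:
    """Learning step: choose the threshold with the fewest training errors."""
    amounts = sorted({amount for amount, _ in train})
    # Candidate thresholds are the midpoints between adjacent distinct amounts.
    candidates = [(a + b) / 2 for a, b in zip(amounts, amounts[1:])]
    if not candidates:
        raise ValueError("need at least two distinct amounts to learn a threshold")

    def errors(threshold: float) -> int:
        return sum(1 for amount, label in train
                   if classify(amount, threshold) != label)

    return min(candidates, key=errors)


def accuracy(records: List[Record], threshold: float) -> float:
    """Classification step: fraction of records the rule labels correctly."""
    correct = sum(1 for amount, label in records
                  if classify(amount, threshold) == label)
    return correct / len(records)


# Hypothetical pre-classified examples, split into training and test data.
train = [(20, "valid"), (35, "valid"), (50, "valid"),
         (60, "valid"), (900, "fraud"), (1200, "fraud")]
test = [(25, "valid"), (700, "fraud"), (45, "valid"), (300, "fraud")]

threshold = train_stump(train)
test_accuracy = accuracy(test, threshold)
print(threshold, test_accuracy)  # prints: 480.0 0.75
```

Here the rule learned from the training data misses the 300-unit fraudulent record in the test data, giving 75% accuracy. Whether that is acceptable before applying the rule to new data tuples depends on the application.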

CONCLUSION

Managing knowledge, retrieving relevant knowledge as per the requirements of the organization, and designing business strategies based on this knowledge is the need of the hour. An organization that is not able to act before its competitors will not survive. Data mining techniques can be of immense help to banks and financial institutions in this arena: acquiring new customers, detecting fraud in real time, providing segment-based products for customized services, analyzing customers' purchase patterns over time for better retention and relationships, detecting emerging trends in order to take a proactive approach in a highly competitive market, adding more value to existing products and services, and launching new product and service bundles.

References:

Hafizi Muhamad Ali, Nor Hayati Ahmad (2006). Knowledge Management in Malaysian Banks: A New Paradigm. Journal of Knowledge Management Practice, Vol. 7, No. 3, September 2006.

Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth (1996). From Data Mining to Knowledge Discovery in Databases. AI Magazine, Vol. 17, No. 3.

Michael J. Shaw, Chandrasekar Subramaniam, Gek Woo Tan, Michael E. Welge (2001). Knowledge Management and Data Mining for Marketing. Decision Support Systems 31, pp. 127-137. Elsevier Science B.V.

Syed Raiyan Ghani (2009). Knowledge Management: Tools and Techniques. DESIDOC Journal of Library & Information Technology, Vol. 29, No. 6, November 2009, pp. 33-38.

Suresh Chandra Bihari (2011). Technology In The Banking Sector In India, How Profitable It Is For The Customer. Asian Journal of Business and Management Sciences, Vol. 1, No. 2, pp. 56-76.

Ambika Bhatia (2011). A Frame Work for Decision Support System for the Banking Sector – An Empirical