A Monthly Double-Blind Peer Reviewed Refereed Open Access International e-Journal - Included in the International Serial Directories International Journal in IT and Engineering
http://www.ijmr.net.in email id- [email protected] Page 43
Knowledge Discovery Techniques
Dr. Jatinder Kumar
Assistant Prof., A. S. College, Khanna
Abstract: Knowledge discovery from huge databases is the need of the hour in the present era of globalization and liberalization. In spite of the huge investments companies have made in infrastructure related
to IT and ITeS, they are not able to reap the true benefit of this wealth of knowledge. In such a
situation, the techniques of knowledge discovery powered by data mining can prove to be a boon for
organizations. Techniques like association rules, sequence path analysis and clustering
help data managers get the true benefit from the data.
Keywords: Knowledge Generation, Retention, Data Mining.
INTRODUCTION
In the most basic sense, knowledge is defined as a derivative of information, which in turn is derived
from data. Knowledge is information or data that has been organized in a meaningful way.
Data is mostly unstructured, factual, and often numeric, and resides in database management
systems. Information is factual, but structured and crisp. Knowledge is inferential and abstract, and is
needed to support decision making or hypothesis generation. The terms knowledge and knowledge management
are often used interchangeably. Knowledge management has been practiced in our society since time
immemorial. Its origin can be traced to the time when languages developed. Human beings learnt
how to communicate with each other, and this was probably the first medium through which transfer of
information took place [8]. Knowledge was transferred from parents to children, from teacher to
the taught, and from the educated to the uneducated through verbal or non-verbal mediums of
communication [8]. This knowledge transfer was informal and was passed down the generations through
customs and folklore. As long as society was not widespread, these means of knowledge
dissemination were successful, even in small organizations. The major problem in this process
arose when organizations started growing in size and magnitude, and the work pressure on
employees and the ever-changing demands of industry posed by cut-throat competition started
increasing. The solution to this problem lies in implementing techniques of knowledge
management and dissemination. Organizations are collecting huge amounts of data through various
bases only because they do not have a strong mechanism for knowledge collection, organization,
and dissemination, and hence are not able to use this knowledge for the survival and growth of the
organization. Knowledge is: “... a fluid mix of framed experiences, values, contextual information, and
expert insight that provides a framework for evaluating and incorporating new experiences and
information”.
Karl Sveiby defined KM as the art of creating value from an organization's intangible assets.
Davenport and Prusak defined KM as being concerned with the exploitation and development of the
knowledge assets of an organization with a view to furthering the knowledge objectives.
Charles Despres and Daniele Chauvel stated that the purpose of knowledge management is to
enhance organizational performance by explicitly designing and implementing tools, processes,
systems, structures, and cultures to improve the creation, sharing, and use of different types of
knowledge that are critical for decision-making.
According to the World Bank, KM is the management of knowledge through systematic sharing that can
enable one to build on earlier experience and obviate the need for costly reworking of learning by
making the same repetitive mistakes.
In the simplest sense, knowledge can be divided into two categories:
- Explicit knowledge
- Tacit knowledge
Explicit knowledge is formal knowledge that can be packaged as information and can be found in the
documents of an organization: reports, articles, manuals, patents, pictures, images, etc. It can be
expressed in a specific language and shared in the form of data or scientific formulae. It can
be processed, transmitted and stored relatively easily.
Knowledge Discovery Techniques
Association Rules: This technique finds interesting associations or correlation relationships among a large set of data items. Basically, if X and Y are sets of items, association rule mining discovers all
associations and correlations among data items where the presence of X in a transaction implies the
presence of Y with a certain degree of confidence. The rule confidence is defined as the percentage
Association rule discovery techniques can generally be applied to a Web mining research support
system. The technique can be used to analyze the behavior of a given user. Each transaction
comprises the set of URLs accessed by a user in one visit to the server. For example, using association
rule discovery techniques we can find correlations in an open source software (OSS) study such as the following:
-- 40% of users who accessed the web page with URL /project1 also accessed /project2; or
-- 30% of users who accessed /project1 downloaded software in /project1.
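As a rough illustration of how a figure such as "40% of users who accessed /project1 also accessed /project2" might be computed, the sketch below uses the standard definitions of support and confidence. The visit data and the function names `support` and `confidence` are assumptions made for this example and do not come from any real access log.

```python
def support(transactions, items):
    """Fraction of all transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, x, y):
    """Fraction of the transactions containing X that also contain Y."""
    x, y = set(x), set(y)
    with_x = [t for t in transactions if x <= set(t)]
    if not with_x:
        return 0.0  # X never occurs, so the rule has no evidence
    return sum(1 for t in with_x if y <= set(t)) / len(with_x)


# Hypothetical data: each set holds the URLs one user accessed in one visit.
visits = [
    {"/project1", "/project2"},
    {"/project1", "/project2"},
    {"/project1"},
    {"/project1"},
    {"/project1"},
    {"/project2"},
    {"/project3"},
    {"/project3"},
]

rule_confidence = confidence(visits, {"/project1"}, {"/project2"})
rule_support = support(visits, {"/project1", "/project2"})
print(f"confidence = {rule_confidence:.2f}, support = {rule_support:.2f}")
```

Here five visits include /project1 and two of those also include /project2, so the rule /project1 -> /project2 holds with 40% confidence, while the pair appears in 25% of all visits.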
With massive amounts of data continuously being collected from the web, companies can use
association rules to help devise effective marketing strategies. In addition, association rules discovered
from WWW access logs can help organizations design their web pages. Association and correlation analysis is
usually used to find frequently occurring data items in large data sets. It is the technique of finding patterns
where one event is connected to another event. Findings of this type help businesses make
decisions regarding pricing and selling, and design marketing strategies such as catalogue design, cross marketing and
customer shopping behavior analysis. However, the number of possible association rules for a given
dataset is generally very large, and a high proportion of the rules are usually of little value. The various
types of associations include:
- Multilevel association rule
- Multidimensional association rule
- Quantitative association rule
- Direct association rule
- Indirect association rule
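Because most candidate rules are of little value, a common way to prune them is to keep only the rules that meet minimum support and confidence thresholds. The following is a minimal sketch of that idea, restricted to rules with a single item on each side; the function name `mine_pair_rules`, the basket data and the threshold values are all assumptions chosen for illustration.

```python
from itertools import permutations


def mine_pair_rules(transactions, min_support, min_confidence):
    """Return rules X -> Y (one item each) that meet both thresholds.

    Each result is a tuple (x, y, support, confidence).
    """
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    rules = []
    for x, y in permutations(items, 2):
        count_x = sum(1 for t in transactions if x in t)
        count_xy = sum(1 for t in transactions if x in t and y in t)
        if count_x == 0:
            continue
        sup = count_xy / n          # share of all transactions with X and Y
        conf = count_xy / count_x   # share of X-transactions that also have Y
        if sup >= min_support and conf >= min_confidence:
            rules.append((x, y, sup, conf))
    return rules


# Hypothetical market baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter"},
    {"milk"},
]

rules = mine_pair_rules(baskets, min_support=0.4, min_confidence=0.6)
for x, y, sup, conf in rules:
    print(f"{x} -> {y}: support={sup:.2f}, confidence={conf:.2f}")
```

With these thresholds, four of the six candidate rules survive; the two rules linking butter and milk are dropped because the pair occurs in only one of the five baskets. Practical miners such as Apriori avoid enumerating every candidate, which this brute-force sketch does not attempt.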
Clustering is a technique for grouping together a set of items having similar characteristics. In web usage mining, clustering is applied to find two kinds of interesting clusters: usage clusters and page
clusters. Usage clusters group users who exhibit similar browsing patterns. Clustering of client
information or data items can facilitate the development and execution of future marketing strategies.
Page clusters discover groups of pages having related content. This information is useful for Internet
search engines and Web assistance providers. By using clustering, a web site can dynamically create
HTML pages according to the user’s query and the user’s information, such as past needs. Clustering can be described as
the identification of similar classes of objects. This is the technique of combining the transactions with
group. The classification approach can also be used as an effective means of distinguishing groups, so clustering
can be used as a preprocessing approach for attribute subset selection and classification. For example,
customers of a given geographic location and a particular job profile demand a particular set of
services. In the banking sector, customers from the service class tend to demand policies that
ensure more security, as they do not intend to take risks; likewise, service-class
people in rural areas may prefer particular brands that differ from those preferred by their
counterparts in urban areas. This information helps the organization cross-sell its products.
Instead of mass-pitching a certain “hot” product, the bank’s customer service representatives can be
equipped with customer profiles enriched by data mining that help them identify which products and
services are most relevant to callers. This technique also helps management address the
80/20 principle of marketing, which says that twenty per cent of your customers will provide you with 80
per cent of your profits. The problem is then to identify those 20%, and clustering techniques help
in achieving this.
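The paper does not name a particular clustering algorithm. As one possible illustration of grouping customers with similar characteristics, the sketch below uses plain k-means. The customer figures, the two features and the naive initialization (the first k points serve as starting centroids) are assumptions made for this example; real results depend on how the centroids are initialized.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it."""
    centroids = [points[i] for i in range(k)]  # naive init: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


# Hypothetical customers: (monthly income in thousands,
#                          percentage of savings held in low-risk products)
customers = [
    (30, 80), (90, 20), (35, 85),
    (95, 25), (32, 78), (88, 15),
]

labels, centroids = kmeans(customers, k=2)
for customer, label in zip(customers, labels):
    print(customer, "-> cluster", label)
```

On this toy data the low-income, security-seeking customers fall into one cluster and the high-income, risk-tolerant customers into the other, which is the kind of segment a service representative could use when choosing products to offer.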
Classification is another extensively studied topic in data mining. Classification maps a data item into one of several predefined classes. One task of classification is to extract and select the features that best
describe the properties of a given class or category. In web mining, classification rules allow one to
develop a profile of items belonging to a particular group according to their common attributes. For
example, classification on SourceForge access logs may lead to the discovery of relationships such as
the following:
-- Users from universities who visit the site tend to be interested in the page /project1; or
-- 50% of users who downloaded software in /project2 were developers of Open Source Software and
worked in IT companies.
Classification is the most commonly applied data mining technique. It
employs a set of pre-classified examples to develop a model that can classify the population of records
at large. Fraud detection and credit risk applications are particularly well suited to this type of analysis.
This approach frequently employs decision tree or neural network-based classification algorithms. The
data classification process involves two steps: learning and classification. In the learning step, the training data are analyzed
by a classification algorithm. In the classification step, test data are used to estimate the accuracy of the
classification rules. If the accuracy is acceptable, the rules can be applied to new data tuples. For a
fraud detection application, this would include complete records of both fraudulent and valid activities
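The two steps described above (learning from pre-classified training data, then estimating accuracy on test data) can be sketched with a one-level decision tree, often called a decision stump. This is a deliberately simplified stand-in for the decision tree and neural network classifiers mentioned in the text. The transaction amounts, the labels and the function names are assumptions made for illustration only.

```python
def predict(threshold, value):
    """Classify a transaction amount with a single learned threshold."""
    return "fraud" if value > threshold else "valid"


def train_stump(samples):
    """Learning step: pick the threshold with the fewest training errors.

    `samples` is a list of (amount, label) pairs with at least two
    distinct amounts; labels are 'fraud' or 'valid'.
    """
    values = sorted({v for v, _ in samples})
    # Candidate thresholds are midpoints between consecutive distinct amounts.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(th):
        return sum(1 for v, lab in samples if predict(th, v) != lab)

    return min(candidates, key=errors)


def accuracy(threshold, samples):
    """Classification step: share of samples the rule labels correctly."""
    correct = sum(1 for v, lab in samples if predict(threshold, v) == lab)
    return correct / len(samples)


# Hypothetical pre-classified records: (transaction amount, label).
training_data = [
    (20, "valid"), (35, "valid"), (50, "valid"),
    (400, "fraud"), (650, "fraud"), (900, "fraud"),
]
test_data = [
    (30, "valid"), (700, "fraud"), (300, "valid"), (500, "fraud"),
]

threshold = train_stump(training_data)
test_accuracy = accuracy(threshold, test_data)
print(f"learned threshold = {threshold}, test accuracy = {test_accuracy:.2f}")
```

The learned rule separates the training data perfectly but misclassifies one of the four test records, which illustrates why accuracy is estimated on data that was held out from the learning step.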
CONCLUSION
Managing knowledge, retrieving relevant knowledge as per the requirements of the
organization, and designing business strategies based on this knowledge are the need of the
hour. An organization that is not able to act before its competitors cannot survive. Data mining
techniques can be of immense help to banks and financial institutions in this arena: acquiring new
customers, detecting fraud in real time, providing segment-based products for customized services,
analyzing customers’ purchase patterns over time for better retention and relationships, detecting
emerging trends to take a proactive approach in a highly competitive market, adding more value to
existing products and services, and launching new product and service bundles.
References:
Hafizi Muhamad Ali, Nor Hayati Ahmad (2006). Knowledge Management in Malaysian Banks: A New
Paradigm. Journal of Knowledge Management Practice, Vol. 7, No. 3, September 2006.
Usama Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth (1996). From Data Mining to Knowledge
Discovery in Databases. AI Magazine, Vol. 17, No. 3.
Michael J. Shaw, Chandrasekar Subramaniam, Gek Woo Tan, Michael E. Welge (2001). Knowledge
Management and Data Mining for Marketing. Decision Support Systems, Vol. 31, pp. 127–137.
Elsevier Science B.V.
Syed Raiyan Ghani (2009). Knowledge Management: Tools and Techniques. DESIDOC Journal of Library
& Information Technology, Vol. 29, No. 6, November 2009, pp. 33-38.
Suresh Chandra Bihari (2011). Technology In The Banking Sector In India, How Profitable It Is For The
Customer. Asian Journal of Business and Management Sciences, Vol. 1, No. 2, pp. 56-76.
Ambika Bhatia (2011) A Frame Work for Decision Support System for the Banking Sector – An Empirical