A Framework for Ontology-Based Knowledge Management System


Jiangning WU

Institute of Systems Engineering, Dalian University of Technology, Dalian, 116024, China E-mail: [email protected]

Abstract

Knowledge management is a crucial activity in organizations, since knowledge is considered the most important asset for sustaining competitive advantage in dynamic and competitive markets. The development of effective knowledge management systems (KMSs) has become an important issue in applied domains. In this paper, we present a framework for an ontology-based KMS that focuses on matching projects with domain experts, and we address its system architecture, ontology building, and semantic similarity calculation in turn. Finally, a simple experiment is carried out to evaluate the effectiveness of the proposed ontology-based KMS.

Keywords: Knowledge Management System, Ontology, Semantic Similarity, Matching

1. Introduction

Knowledge management is a crucial activity in organizations, since knowledge is considered the most important asset for sustaining competitive advantage in dynamic and competitive markets. The development of effective knowledge management systems (KMSs) has become an important issue in applied domains.

The goal of a general KMS is to provide the right knowledge to the right people at the right time and in the right format. Through KMSs, users can access and use the rich sources of data, information and knowledge stored in different forms. Furthermore, KMSs help people share knowledge and hence create new knowledge. Traditional KMSs are based on existing data repositories and users’ needs. To discover knowledge, users submit queries to the system and receive results by keyword matching. However, keyword-based systems cannot understand the meaning of data; they are inflexible and stifle knowledge creation.

Fortunately, emerging ontology-based KMSs can find the content-oriented knowledge that people really want, because a domain ontology is powerful in knowledge representation and the associated inference. Ontologies are meant to provide an understanding of static domain knowledge that facilitates knowledge retrieval, storage, sharing, and dissemination.

For KMSs, an ontology can be regarded as a classification of knowledge [1]. That is to say, an ontology defines a shared vocabulary that facilitates knowledge communication, storage, search and sharing in knowledge management systems.

In this paper, we propose a framework for an ontology-based KMS that focuses on matching projects with domain experts. In project management, it is not easy to choose an appropriate domain expert for a given project if the experts’ research areas and the contents of the projects are not well understood. Matching projects and domain experts is also hard work when the number of projects is large. So there is a great need for effective technologies that can capture the knowledge involved in both domain experts and projects. The ontology-based KMS proposed in this paper tries to solve this problem. The main idea is that both the experts’ research areas and the contents of the projects are represented by separate ontologies based on the same standard subject category of China. The matching problem is thus transformed into calculating the semantic similarities between ontologies. Once the similarity values are worked out, the matched results can be obtained and ranked accordingly.

The two main barriers facing our KMS are ontology building and similarity calculation. In the following sections we present our approaches to these two problems in detail.

2. Ontologies in Knowledge Representation

Research on knowledge representation has been a focus of the AI and IS disciplines for a number of years. Much contemporary research extends the seminal work within the AI discipline, and research in ontology has been one of the beneficiaries. Research in computational ontology has traditionally sought to develop structure for the purpose of knowledge subsumption. Such research aims to develop generic, reusable representations of domain ontology. Much ontology research considers a deep development approach necessary to provide the extensive knowledge and reasoning required for expert-level queries [3].

In [4], T.R. Gruber pointed out: An ontology is an explicit specification of a conceptualization. The term is borrowed from philosophy, where an ontology is a systematic account of existence. For knowledge-based systems, what “exists” is exactly that which can be represented. When the knowledge of a domain is represented in a declarative formalism, the set of objects that can be represented is called the universe of discourse. This set of objects, and the describable relationships among them, are reflected in the representational vocabulary with which a knowledge-based program represents knowledge. Thus, we can describe the ontology of a program by defining a set of representational terms. In such an ontology, definitions associate the names of entities in the universe of discourse (e.g., classes, relations, functions, or other objects) with human-readable text describing what the names are meant to denote, and formal axioms that constrain the interpretation and well-formed use of these terms. In short, an ontology is a vocabulary of entities, classes, properties, functions and their relationships.

So far, domain ontologies are thought to be capable of significantly improving knowledge management practices. They provide conceptual abstraction and differentiated relationships, specifically separate concepts from lexicalizations and thereby better reflect the structure of human understanding of a domain. In ontologies, the semantics are developed through ensuring that each concept within the domain is uniquely and precisely defined and by specifying elaborated relationships among the concepts. The relationships in an ontology are explicitly named and developed with specification of rules and constraints so that they reflect the context of the domain for which the knowledge is modeled.

In our work, the ontology is a collection of concepts and their relationships, and serves as a conceptualized vocabulary to describe an application domain. It is created with Protégé¹, which was developed at Stanford University.

¹ Protégé: http://protege.stanford.edu

The initial concepts in our ontology are broadly extracted from the standard subject category of China. To make the selected concepts more suitable for the projects and domain experts of concern, a tool called Concept Filler was developed. It is simply an interface that helps domain experts assign proper concepts and weights manually (see Figure 1). When a concept is specified, a weight between 0 and 1 is also assigned to it to indicate its importance: the larger the value, the more important the concept.

Fig. 1. The interface for specifying concepts by the concept filler

As to relationships between concepts, many types can be found in ontology construction, such as the IS-A relation, Kind-of relation, Part-of relation, Substance-of relation, and so on. Since the IS-A (hyponym/hypernym) relation is the most common concern in ontology representation, only this kind of relation is used in our research, for simplicity.

This restriction is also required by the similarity calculation between concepts presented below. After the concepts and their relationships have been specified based on the contextual knowledge involved in projects and domain experts, the hierarchical ontology-concept tree can be built with the Protégé tool. The remaining work is to calculate the similarity between concept trees and carry out the matching process.

3. Similarity Calculating and Matching Process

As mentioned above, there are many kinds of relationships between concepts in ontology creation. Calculating the similarity between concepts based on such complex relationships is challenging, and so far no method deals with this problem effectively. Since some similarity calculation methods have been developed for the simplest relation, the IS-A relation [8], only this kind of relation is retained in our study. The other relations are left as future research topics.

3.1 Node-based Approach and Edge-based Approach

Here we discuss two types of methods for calculating semantic similarities between concepts: the node-based method and the edge-based method [8].

Resnik used information content to measure similarity [9, 10]. His point is that the more information content two concepts share, the more similar they are. The similarity of two concepts c1 and c2 is quantified as

$$ sim(c_1, c_2) = \max_{c \in Sup(c_1, c_2)} \left[ -\log p(c) \right] , \tag{1} $$

where Sup(c1, c2) is the set of concepts whose child concepts contain c1 and c2, p(c) is the probability of encountering an instance of concept c, and

$$ p(c) = \frac{freq(c)}{N} , \tag{2} $$

where freq(c) is simply the statistical frequency of concept c, and N is the total number of concepts contained in the given document. Considering that many inherited concepts may have more than one sense, the similarity calculation should be modified as

$$ sim(t_1, t_2) = \max_{c_1 \in sen(t_1),\, c_2 \in sen(t_2)} \left[ sim(c_1, c_2) \right] , \tag{3} $$

where sen(t) denotes the set of possible concepts (senses) denoted by the same term t.
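As a rough illustration of Eqs. (1)-(3), the following Python sketch computes the node-based similarity on a small taxonomy. The taxonomy, the concept names and the counts are hypothetical. The sketch also assumes that each count already includes the occurrences of the concept's descendants and that a concept subsumes itself; neither assumption is stated in the text above.

```python
import math

# Hypothetical toy IS-A taxonomy: child -> parent (None marks the root).
PARENT = {
    "root": None,
    "a": "root",
    "b": "root",
    "a1": "a",
    "a2": "a",
    "b1": "b",
}

# Hypothetical counts. Assumption: each count already includes the
# occurrences of the concept's descendants, so p(root) = 1.
FREQ = {"root": 20, "a": 12, "b": 8, "a1": 5, "a2": 4, "b1": 6}
N = FREQ["root"]


def subsumers(concept):
    """Return the concept together with all of its ancestors."""
    result = []
    while concept is not None:
        result.append(concept)
        concept = PARENT[concept]
    return result


def information_content(concept):
    """-log p(c), with p(c) = freq(c) / N as in Eq. (2)."""
    return -math.log(FREQ[concept] / N)


def concept_similarity(c1, c2):
    """Eq. (1): largest information content among the shared subsumers."""
    shared = set(subsumers(c1)) & set(subsumers(c2))
    return max(information_content(c) for c in shared)


def term_similarity(senses1, senses2):
    """Eq. (3): best score over all pairs of senses of two terms."""
    return max(concept_similarity(c1, c2) for c1 in senses1 for c2 in senses2)
```

In this toy taxonomy, `a1` and `a2` share the subsumer `a`, so their score is the information content of `a`, whereas `a1` and `b1` share only the root and score zero.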

Another important method to quantify the similarity is the edge-based approach. Leacock and Chodorow took the shortest path length between two concepts and converted this distance into a similarity measure [11]:

$$ sim(t_1, t_2) = -\log \left[ \frac{\min_{c_1 \in sen(t_1),\, c_2 \in sen(t_2)} len(c_1, c_2)}{2 \times d_{\max}} \right] , \tag{4} $$

where len(c1, c2) is the number of edges along the shortest path between concepts c1 and c2, and dmax is the maximum depth of the ontology hierarchy.
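The edge-based measure can be sketched in the same way. The Python code below finds the nearest common parent of two concepts in an IS-A tree, counts the edges on the path through it, and applies the negative logarithm of that length divided by twice the maximum depth. The taxonomy is hypothetical, depth is assumed to be counted in edges from the root, and identical concepts (path length zero) are given an infinite score; these are choices made for the sketch, not details given in the text.

```python
import math

# Hypothetical toy IS-A taxonomy: child -> parent (None marks the root).
PARENT = {
    "root": None,
    "a": "root",
    "b": "root",
    "a1": "a",
    "a2": "a",
    "b1": "b",
}


def path_to_root(concept):
    """List the concept and its ancestors, ending at the root."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def nearest_common_parent(c1, c2):
    """First concept on c1's path to the root that also lies on c2's path.

    The result may be c1 or c2 itself when one subsumes the other.
    """
    on_second_path = set(path_to_root(c2))
    for concept in path_to_root(c1):
        if concept in on_second_path:
            return concept
    raise ValueError("concepts do not belong to the same tree")


def path_length(c1, c2):
    """Number of edges on the shortest path between c1 and c2."""
    common = nearest_common_parent(c1, c2)
    return path_to_root(c1).index(common) + path_to_root(c2).index(common)


# Assumption: maximum depth is counted in edges from the root.
D_MAX = max(len(path_to_root(c)) - 1 for c in PARENT)


def edge_similarity(c1, c2):
    """-log(len(c1, c2) / (2 * d_max)); infinite for identical concepts."""
    length = path_length(c1, c2)
    if length == 0:
        return math.inf
    return -math.log(length / (2 * D_MAX))
```

Because only edges are counted, every pair of concepts at the same distance receives the same score, whatever their weights or frequencies.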

3.2 An Integrated and Improved Approach

An analysis of the node-based and edge-based methods reveals three main shortcomings.

1. Both the node-based and the edge-based methods consider only two concepts in the same concept tree and do not extend to two lists of concepts in different concept trees. However, when different documents in the same domain are described using ontology structures, homogeneous but heteromorphic concept trees are often formed. The matching problem to be solved here is to calculate the similarity between two different concept trees, not between two concepts in the same tree. We therefore have to develop a new method that can calculate the similarity between two lists of concepts in different trees, so that the quantified similarity value shows how similar the documents are.

2. The node-based method does not take the distance between concepts into account. Take the four-hierarchy concept tree shown in Figure 2 as an example. If concepts C21, C31 and C36 have the same sense and equal frequencies, and therefore the same information content, the node-based method gives the following result:

sim(C21, C31) = sim(C21, C36). (5)

However, it is obvious from Figure 2 that concepts C21 and C31 are more similar, since C31 is the direct inheritor of C21.

Fig. 2. An example of a four-hierarchy concept tree (Layer 1: C11; Layer 2: C21, C22, C23; Layer 3: C31 to C36; Layer 4: C41 to C44)

3. In contrast to the node-based method, the edge-based method considers only the relationships between concepts and ignores the weights of concepts. For example, in Figure 2, concepts C31 and C32 are each connected to C21 by a single edge. According to the edge-based method, the same similarity value is obtained for both. That is,

sim(C31, C21) = sim(C32, C21). (6)

But if C31 has a larger weight than C32, C31 is considered more important, and the similarity value between C31 and C21 should be greater.

To overcome the shortcomings of both the node-based and edge-based methods, a new integrated method is proposed in this paper to calculate the similarity between two documents.

Before the proposed method is applied, the documents related to projects and domain experts should first be formalized, which results in two vectors containing the concepts together with their frequencies.

Suppose Doc(i) describes the ith project and Doc(j) describes the jth domain expert; the formalization results are:

Doc(i) = {ci1, ci2, …, cim} , (7)

Doc(j) = {cj1, cj2, …, cjn} , (8)

with their corresponding frequencies:

W(i) = {wi1, wi2, …, wim} , (9)

W(j) = {wj1, wj2, …, wjn} . (10)

For each pair of concepts (cis, cjt) in the concept tree, there must exist a concept c of which both cis and cjt are child concepts and for which the path length is minimum. Concept c is the nearest parent concept of both cis and cjt. The similarity between cis and cjt can be calculated by

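[Equation (11): sim(cis, cjt), expressed through a logarithm of the weights wis and wjt and the path lengths len(c, cis) and len(c, cjt); the typeset formula is not recoverable from the source.]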

where len(c, cis) is the path length between c and cis. To account for concepts with multiple senses, we extend the calculation to:

$$ sim(t_{is}, t_{jt}) = \max_{c_{is} \in sen(t_{is}),\, c_{jt} \in sen(t_{jt})} \left[ sim(c_{is}, c_{jt}) \right] , \tag{12} $$

where tis is the sense of concept cis. Then we calculate the maximum similarity value among all candidate concepts:

$$ SIM = \max \left[ sim\left( c_{is}^{t_{is}}, c_{jt}^{t_{jt}} \right) \right] . \tag{13} $$

Thus, the similarity between two documents can be calculated by using the following formula:

$$ sim(Doc(i), Doc(j)) = \frac{\sum_{s=1}^{m} \sum_{t=1}^{n} \dfrac{sim\left( c_{is}^{t_{is}}, c_{jt}^{t_{jt}} \right)}{SIM}}{m \times n} . \tag{14} $$

Once the similarity values between every project document and every domain-expert document have been computed, the matched results are ranked and returned to end users.
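The document-level step and the ranking can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the author's implementation. It assumes that the concept-pair scores have already been produced by some concept-level measure, that SIM is the largest pairwise score over all candidates being compared for one project, and that the document score is the SIM-normalized average of the m × n pairwise scores. The function names and the example data are hypothetical.

```python
def max_pair_score(candidates):
    """SIM: largest concept-pair score over all candidate experts."""
    return max(
        score
        for matrix in candidates.values()
        for row in matrix
        for score in row
    )


def document_similarity(pair_scores, sim_max):
    """Average of the m x n concept-pair scores, normalized by sim_max.

    pair_scores[s][t] is the score between concept s of the project
    document and concept t of the expert document.
    """
    if sim_max <= 0:
        raise ValueError("sim_max must be positive")
    m = len(pair_scores)
    n = len(pair_scores[0])
    total = sum(sum(row) for row in pair_scores)
    return total / (sim_max * m * n)


def rank_experts(candidates):
    """Return (expert, score) pairs sorted from best to worst match."""
    sim_max = max_pair_score(candidates)
    scores = {
        expert: document_similarity(matrix, sim_max)
        for expert, matrix in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical pairwise scores for one project (2 concepts) against
# two experts (2 concepts each).
EXAMPLE = {
    "expert_a": [[0.8, 0.2], [0.4, 0.6]],
    "expert_b": [[0.1, 0.3], [0.2, 0.2]],
}
```

Calling `rank_experts(EXAMPLE)` places `expert_a` ahead of `expert_b`, since its pairwise scores are higher on average.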

4. A Framework of Ontology-based KMS

Our ontology-based KMS encompasses four main modules, as shown in Figure 3: Ontology Building, Document Formalization, Similarity Calculation and User Interface.

Ontology Building: We adopt Protégé, developed by Stanford University, to build our domain ontologies. The concepts and relations are from the standard subject category of China.

Document Formalization: Benefiting from the ontologies that we have built, we can use the concepts to formalize the documents containing information about projects and domain experts.

Similarity Calculation: By applying the proposed integrated method to the concept trees corresponding to projects and domain experts respectively, we can calculate the similarities between them and then rank the candidate domain experts. As a result, the most appropriate domain expert can be identified.

User Interface: This matching system implements the typical client-server paradigm. End users can access and query the system from the Internet, while domain experts or system administrators can manipulate the formalization and ontology building process.

Fig. 3. The architecture for ontology-based KMS

[Figure components: Expert Documents, Project Documents, Documents Formalization, Ontologies Building, Ontology Library, Expert Concept Trees, Project Concept Trees, Similarity Calculation, Database, Result List, User Interface, Internet, Users]

5. Evaluation

We carried out a series of experiments to compare and evaluate the edge-based method, the node-based method and our integrated method. Two measures, precision and recall, are generally used to evaluate the effectiveness of an information retrieval system. In our research, we also use these two measures to evaluate our ontology-based KMS. Let R be the set of relevant documents and A be the answer set of documents. Precision and recall are defined as follows:

$$ Precision = \frac{|A \cap R|}{|A|} \times 100\% , \tag{21} $$

$$ Recall = \frac{|A \cap R|}{|R|} \times 100\% . \tag{22} $$
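The two measures translate directly into set operations. The Python sketch below is only an illustration; the function name and the expert identifiers are hypothetical, and empty sets are mapped to 0 by convention, which the definitions above do not address.

```python
def precision_recall(answer, relevant):
    """Return (precision, recall) in percent for an answer set A and a relevant set R."""
    answer = set(answer)
    relevant = set(relevant)
    hits = len(answer & relevant)  # |A ∩ R|
    # Convention for this sketch: an empty set yields 0 rather than an error.
    precision = 100.0 * hits / len(answer) if answer else 0.0
    recall = 100.0 * hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four experts returned, three experts relevant.
returned_experts = {"e1", "e2", "e3", "e4"}
relevant_experts = {"e2", "e4", "e5"}
precision, recall = precision_recall(returned_experts, relevant_experts)
```

Here two of the four returned experts are relevant, so precision is 50%, and two of the three relevant experts are returned, so recall is about 66.7%.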

In the experiment, we collected data on around 300 domain experts (including professors, engineers, researchers, etc.) and over 500 projects within the domain of computer science and engineering.

Table 1 shows the precision and recall obtained by the three methods for different numbers of projects. Comparison charts are given in Figures 4 and 5 respectively.

Table 1. Precision and recall comparison. E-based denotes the edge-based approach, N-based the node-based approach, and Integrated the integrated approach.

               Precision (%)                   Recall (%)
No.  Projects  E-based  N-based  Integrated    E-based  N-based  Integrated
1    100       20.65    25.99    30.71         30.56    32.28    39.04
2    200       22.32    25.85    28.93         31.00    33.98    34.73
3    300       27.55    19.32    32.79         23.56    30.46    42.92
4    400       20.38    27.61    31.59         30.87    35.43    32.96
5    500       23.40    23.44    29.63         33.70    43.75    49.74

Fig. 4. A comparison of precision among three different methods

Fig. 5. A comparison of recall among three different methods

6. Conclusions

In this paper, we present an ontology-based method for matching projects and domain experts. The prototype system we developed contains four modules: ontology building, document formalization, similarity calculation and user interface. Specifically, we discuss node-based and edge-based approaches to computing semantic similarity, and propose an integrated and improved approach to calculating the semantic similarity between two documents. In the experiments reported in Table 1, the integrated approach achieves higher precision than the edge-based and node-based approaches in all five settings and higher recall in four of the five.

As mentioned previously, only the simplest relation, the IS-A relation, is considered in our study. When dealing with a more complex ontology whose concepts are restricted by logic or axioms, our method is not powerful enough to describe the real semantic meaning, because it considers only the hierarchical structure. Future work should therefore focus on the other kinds of relations that are used in ontology construction; computing semantic similarity over various relations will be an exciting and challenging task.

References:

[1] N. Guarino, Understanding, Building, and Using Ontologies: A Commentary to Using Explicit Ontologies in KBS Development, International Journal of Human-Computer Studies, 46: 293-310, 1997.

[2] D.E. O'Leary, Enterprise Knowledge Management, Computer, 31(3): 54-61, 1998.

[3] M.S. Fox and M. Gruninger, Enterprise Modeling, AI Magazine, 19(3): 109-121, 1998.

[4] T.R. Gruber, A translation approach to portable ontologies, Knowledge Acquisition, 5(2): 199-220, 1993.

[5] S. Staab, H.-P. Schnurr, R. Studer, Y. Sure, Knowledge processes and ontologies, IEEE Intelligent Systems, 16(1): 26-34, 2001.

[6] N. Guarino, P. Giaretta, Ontologies and knowledge bases: Towards a terminological clarification, In N.J.I. Mars (Ed.), Towards Very Large Knowledge Bases, IOS Press, 1995.

[7] P. Resnik, Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language, Journal of Artificial Intelligence Research, 11: 95-130, 1999.

[8] J. Jiang and D. Conrath, Semantic similarity based on corpus statistics and lexical taxonomy, In Proceedings of the International Conference on Research in Computational Linguistics, Taiwan, 1997, pp. 19-33.

[9] P. Resnik, Using information content to evaluate semantic similarity in a taxonomy, In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI-95), Canada, 1995, pp. 448-453.

[10] P. Resnik, Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language, Journal of Artificial Intelligence Research, 11: 95-130, 1999.

[11] C. Leacock and M. Chodorow, Filling in a sparse training space for word sense identification, ms, 1994.