Knowledge Fusion Methods: A Survey

2017 2nd International Conference on Software, Multimedia and Communication Engineering (SMCE 2017) ISBN: 978-1-60595-458-5

Xue-lu YU and Lin QIAO*

Department of Computer Science and Technology, Tsinghua University, China

*Corresponding author

Keywords: Knowledge fusion, Facet fusion, Ontology.

Abstract. Knowledge fusion organizes knowledge from disparate sources into a unified form in a highly dynamic way. This article analyzes, summarizes and classifies the knowledge fusion patterns and methods proposed in recent years. We also discuss the advantages and disadvantages of the different kinds of knowledge fusion patterns and methods.

Introduction

The rapid development of science and technology has greatly increased the total amount of human knowledge. Much of this knowledge is distributed across different data sources on the Internet and presented in different forms. To exploit these knowledge sources to the fullest extent and increase the usability of knowledge, knowledge needs to be organized into a unified form from disparate sources in a highly dynamic way, a process called knowledge fusion [1]. Aggregating fragmented knowledge is a common requirement in fields such as large-scale information retrieval, knowledge management, e-learning and knowledge acquisition.

This article is a survey that summarizes the knowledge fusion patterns and knowledge fusion methods proposed in recent years. We classify them by their focus. There are two types of knowledge fusion patterns: the one-result-list fusion pattern and the facet fusion pattern. There are four types of knowledge fusion methods: ontology-based, rule-based, statistical learning-based and context-based methods.

The rest of this paper is structured as follows: Section 2 introduces the two types of knowledge fusion patterns with corresponding examples. Section 3 introduces the four types of knowledge fusion methods and related research.

Knowledge Fusion Patterns

One-result-list Fusion Pattern

The one-result-list pattern starts from the metadata or keywords to be searched. It retrieves relevant knowledge and information from knowledge bases distributed across different locations. The results are then screened, disambiguated and integrated into a final result list with a unified access interface [2]. This fusion pattern is common in indexing systems such as search engines and digital library systems.

The downside of this approach is that it usually simply lists the search results without providing deep analysis and optimization of the knowledge data. Knowledge of different facets comes in a mixture, and it is difficult for the user to quickly and accurately acquire the knowledge of interest from the list of results.
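The screening, disambiguation and integration steps of the one-result-list pattern can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Hit` record, the title-based `normalize` key, the `min_score` threshold and the sample data are hypothetical and are not taken from any of the cited systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str   # which knowledge base returned the hit
    title: str    # display title of the item
    score: float  # source-local relevance, assumed to lie in [0, 1]


def normalize(title: str) -> str:
    """Crude disambiguation key: lowercase and collapse whitespace."""
    return " ".join(title.lower().split())


def fuse_result_lists(result_lists, min_score=0.2):
    """Screen, de-duplicate and merge per-source result lists into one list."""
    best = {}
    for hits in result_lists:
        for hit in hits:
            if hit.score < min_score:      # screening
                continue
            key = normalize(hit.title)     # disambiguation by normalized title
            if key not in best or hit.score > best[key].score:
                best[key] = hit            # keep the strongest duplicate
    # Integration: a single list ordered by descending score.
    return sorted(best.values(), key=lambda h: h.score, reverse=True)


# Toy result lists from two hypothetical sources.
library = [Hit("library", "Knowledge Fusion", 0.9), Hit("library", "Data Fusion", 0.1)]
web = [Hit("web", "knowledge  fusion", 0.6), Hit("web", "Ontology Matching", 0.7)]
fused = fuse_result_lists([library, web])
print([(h.source, h.title) for h in fused])
```

The output is a flat ranked list, which also shows the limitation noted above: items belonging to different facets of a subject are mixed together, and nothing in the list tells the user which facet an item covers.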

Facet Fusion Pattern

This pattern typically uses structured knowledge resources such as RDF datasets [4] and Wikipedia to align and encapsulate fragmented knowledge and aggregate it into a "theme-facet" structure according to pre-defined schemas [5]. This pattern is used in the Google Knowledge Graph [6].

The Google Knowledge Graph collects its data from Freebase [7], the CIA World Factbook [8], Wikipedia and other sources. Google uses this knowledge base to enhance its search results with semantic-search information, so that users can get the knowledge of interest without having to navigate to other sites.

The facet fusion pattern not only gives all the relevant facets of a particular knowledge subject, but also allows the user to quickly locate the knowledge of interest by selecting a facet [9]. However, it still has some limitations. It cannot reflect the cognitive relationships between knowledge subjects, which can lead to the learning disorientation problem. The aggregation of knowledge requires pre-defined schemas and cannot be easily extended. It is also suitable only for knowledge sources with semantic and structural features.
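A minimal sketch of the "theme-facet" aggregation, under stated assumptions, is given below. The `SCHEMA` mapping, the predicate names and the sample triples are hypothetical; real systems rely on much richer schemas and alignment techniques.

```python
from collections import defaultdict

# Hypothetical pre-defined schema: source predicate -> facet name.
SCHEMA = {
    "birthPlace": "Origin",
    "bornIn": "Origin",
    "occupation": "Career",
    "knownFor": "Career",
}


def build_theme_facets(triples, schema=SCHEMA):
    """Group (subject, predicate, object) triples into theme -> facet -> values."""
    themes = defaultdict(lambda: defaultdict(set))
    unmapped = []
    for subject, predicate, obj in triples:
        facet = schema.get(predicate)
        if facet is None:
            unmapped.append((subject, predicate, obj))  # schema cannot place it
            continue
        themes[subject][facet].add(obj)  # sets merge duplicate values
    return themes, unmapped


# Toy triples, as if collected from two differently modelled sources.
triples = [
    ("Alan Turing", "birthPlace", "London"),
    ("Alan Turing", "bornIn", "London"),
    ("Alan Turing", "occupation", "Mathematician"),
    ("Alan Turing", "almaMater", "King's College, Cambridge"),
]
themes, unmapped = build_theme_facets(triples)
print(sorted(themes["Alan Turing"]["Origin"]))
print(unmapped)
```

The two differently named predicates are aligned into one facet, while the triple whose predicate is missing from the schema ends up in `unmapped`. This mirrors the extensibility limitation described above: knowledge that the pre-defined schema does not anticipate cannot be placed in any facet.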

Knowledge Fusion Methods

Ontology-based

Ontology-based knowledge fusion methods aggregate knowledge from different sources into an ontology library using techniques based on ontology theory. Ontology-based methods are effective within their specific field but depend heavily on the ontology base of that field.

Ontologies are very common on the Internet because they provide semantics for Web pages. However, this also means that ontologies overlap with each other, which makes them difficult to use. To address this, Noy and Musen [10] developed the PROMPT algorithm, which performs semi-automated alignment and merging of ontologies. PROMPT gives the user suggestions and guidance for selecting ontologies. It also finds possible solutions to conflicts. PROMPT can be used with different knowledge representation systems.

To deal with the uncertainty of ontologies, Laskey et al. [11] applied probabilistic ontologies to multi-source knowledge fusion and introduced a probabilistic ontology language, PR-OWL, to support knowledge fusion over noisy information sources. PR-OWL is an extension of the OWL language that provides a representation for probabilistic ontologies. It uses Multi-Entity Bayesian Networks (MEBN) as its expressive Bayesian logic. PR-OWL allows the user to add probability information to represent uncertain relations, which ensures semantic consistency with respect to issues of uncertainty or data quality.

Kuo et al. [12] propose a three-phase knowledge fusion framework that utilizes a shared vocabulary ontology. This method fuses all the different knowledge bases into one, using only the shared vocabulary ontology to facilitate the fusion process. The process includes three phases: a preprocessing phase, a partitioning phase and a meta-knowledge construction phase.

Castano et al. [13] propose an ontology architecture for the integration of heterogeneous XML data sources and introduce ontology design techniques for deriving clusters of semantically related data sources. With this method, ontology knowledge is organized into a semantic mapping scheme and a mediation scheme.

Rule-based

Rule-based knowledge fusion methods produce a unified representation of knowledge from different sources using a pre-defined or induced rule set, and resolve redundancy and ambiguity problems. Rule-based methods depend on their rule set, and a complete rule set can be difficult to obtain because knowledge representations are diverse and often not normalized.
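As a minimal sketch of this idea, the following example rewrites records from differently structured sources into one representation with a small hand-written rule set. The attribute names, the `RULES` table and the sample records are all hypothetical and serve only to show the mechanism; they do not reproduce the rule language of any cited system.

```python
# Hypothetical rule set: (source attribute, unified attribute, value normalizer).
RULES = [
    ("temp_f", "temperature_c", lambda v: round((float(v) - 32) * 5 / 9, 1)),
    ("temp_c", "temperature_c", lambda v: round(float(v), 1)),
    ("crop_name", "crop", lambda v: v.strip().lower()),
    ("crop", "crop", lambda v: v.strip().lower()),
]


def apply_rules(record, rules=RULES):
    """Rewrite one source record into the unified representation."""
    unified = {}
    for source_attr, target_attr, normalizer in rules:
        if source_attr in record:
            unified[target_attr] = normalizer(record[source_attr])
    return unified


def fuse(records, rules=RULES):
    """Apply the rules to every record and drop exact duplicates."""
    seen, fused = set(), []
    for record in records:
        unified = apply_rules(record, rules)
        key = tuple(sorted(unified.items()))
        if unified and key not in seen:   # skip empty and redundant records
            seen.add(key)
            fused.append(unified)
    return fused


records = [
    {"crop_name": " Wheat ", "temp_f": "68"},
    {"crop": "wheat", "temp_c": "20.0"},
    {"soil": "loam"},  # no rule covers this attribute, so it is lost
]
print(fuse(records))
```

The first two records describe the same fact in different vocabularies and units, and the rules reduce them to a single unified record. The third record is dropped because no rule covers it, which illustrates why an incomplete rule set limits this family of methods.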

engines to form a single answer. They define different fusion methods to select the rules to meet user preference, including data quality-based, content rule-based, and mixed method-based fusion.

Yang et al. [15] built Flora-2, a rule-based knowledge representation and inference system, which provides infrastructure for semantic information reasoning, like OWL [16] or RDF. Flora-2 uses rules to represent the problems to be solved and provides basic features that are essential for modeling semantic information on the Web, including basic rules, classification, frame-based syntax, meta-information processing, reification and transactional updates.

Most existing relational learning techniques make inferences based only on the facts in a knowledge base and ignore the fact that domain knowledge can also be used. Guo et al. [17] proposed the concept of rule-enhanced relational learning, which improves inference results by introducing domain knowledge. They introduced three kinds of rules (simple implication, argument type restriction and at-most-one restriction) and used them to constrain the inferred knowledge, providing additional evidence to improve the accuracy of the results of the embedding-based TransE model [18] and the path ranking algorithm (PRA) [19].
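The effect of an argument type restriction and an at-most-one restriction can be sketched as a filter over scored candidate facts. This is a greedy simplification written for illustration only and should not be read as the procedure of the cited work; the candidate scores, entity types and rule tables are hypothetical, and the scores stand in for the output of a model such as TransE or PRA.

```python
from collections import defaultdict

# Hypothetical candidate facts with model scores (higher = more plausible).
candidates = [
    ("Paris", "capitalOf", "France", 0.92),
    ("Lyon", "capitalOf", "France", 0.55),
    ("Paris", "capitalOf", "Seine", 0.60),
    ("Berlin", "capitalOf", "Germany", 0.88),
]

# Hypothetical entity types and rule tables.
entity_type = {"Paris": "City", "Lyon": "City", "Berlin": "City",
               "France": "Country", "Germany": "Country", "Seine": "River"}
argument_types = {"capitalOf": ("City", "Country")}  # argument type restriction
at_most_one_head = {"capitalOf"}                     # at most one head per tail


def apply_rules(candidates, threshold=0.5):
    """Greedily keep high-scoring facts that violate no rule."""
    kept, filled = [], defaultdict(set)
    for head, rel, tail, score in sorted(candidates, key=lambda c: -c[3]):
        if score < threshold:
            continue
        head_type, tail_type = argument_types.get(rel, (None, None))
        if head_type and entity_type.get(head) != head_type:
            continue  # head has the wrong type for this relation
        if tail_type and entity_type.get(tail) != tail_type:
            continue  # tail has the wrong type for this relation
        if rel in at_most_one_head and tail in filled[rel]:
            continue  # a better-scoring fact already fills this slot
        filled[rel].add(tail)
        kept.append((head, rel, tail))
    return kept


print(apply_rules(candidates))
```

Without the rules, all four candidates would pass the score threshold. With them, the fact with a river as its tail is rejected by the type restriction, and the lower-scoring second capital of the same country is rejected by the at-most-one restriction.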

Statistical Learning-based

Statistical learning-based knowledge fusion methods use supervised or semi-supervised learning to aggregate knowledge from different sources. The effectiveness of these methods depends on the training data set, and building a high-quality training set usually requires considerable effort.

Dong et al. [20] proposed a multi-source knowledge fusion method based on supervised learning and used it to build a large-scale knowledge base, Knowledge Vault (KV). KV stores information as RDF triples (subject, predicate, object) and attaches a confidence score to each triple, indicating how likely KV considers the triple to be correct. Knowledge Vault combines noisy information extracted from the Web with existing knowledge from knowledge bases to reduce the errors caused by the extraction process and the errors in the knowledge sources themselves.
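The idea of attaching a confidence score to each triple can be illustrated with a small sketch. Note the assumption: the sketch combines repeated extractions of the same triple with a noisy-OR, which treats extractions as independent. This is a stand-in chosen for brevity, not the learned fusion model of Knowledge Vault, and the extraction confidences below are made up.

```python
from collections import defaultdict


def noisy_or(confidences):
    """Probability that at least one independent extraction is correct."""
    p_all_wrong = 1.0
    for c in confidences:
        p_all_wrong *= (1.0 - c)
    return 1.0 - p_all_wrong


def fuse_extractions(extractions):
    """Merge repeated extractions of the same triple into one scored triple."""
    by_triple = defaultdict(list)
    for subject, predicate, obj, confidence in extractions:
        by_triple[(subject, predicate, obj)].append(confidence)
    return {triple: noisy_or(cs) for triple, cs in by_triple.items()}


# Hypothetical output of several extractors over Web pages.
extractions = [
    ("Tsinghua University", "locatedIn", "Beijing", 0.6),
    ("Tsinghua University", "locatedIn", "Beijing", 0.5),
    ("Tsinghua University", "locatedIn", "Shanghai", 0.1),
]
fused = fuse_extractions(extractions)

# Keep only triples whose fused confidence clears a chosen threshold.
accepted = [triple for triple, conf in fused.items() if conf >= 0.7]
print(accepted)
```

Two moderately confident extractions of the same triple reinforce each other, while the isolated low-confidence extraction stays below the threshold. A supervised approach would instead learn how much to trust each extractor and source from labeled data.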

Santos et al. [21] use a probabilistic model, Bayesian Knowledge Bases (BKBs) [22], as a framework to represent knowledge. They proposed a knowledge fusion method based on Bayesian learning that aggregates multiple Bayesian knowledge fragments into a single BKB. This method makes it easier to aggregate information from different knowledge sources and preserves the information from all sources.

Sawaragi et al. [23] present an automation method for data and knowledge fusion in signal-understanding tasks. They attempt to automate a task of velocity analysis that derives propagating velocities of seismic waves from velocity spectrum data. In this method, the human expert’s procedural skills are implemented using a clustering analysis and a genetic algorithm.

Context-based

In common knowledge fusion systems, knowledge is aggregated from different sources but context is rarely used. However, context is very important to the extracted knowledge, so it should play a larger role in knowledge fusion systems [24]. Context-based methods perform well in similar contexts but are usually less effective when the context changes, so they are hard to redeploy in a different scenario without extra training.

Alexander et al. [25] identified knowledge fusion patterns by studying context-based knowledge fusion in decision support systems. They describe the patterns along three aspects: (1) the effect of the knowledge fusion process on the system; (2) the preservation of context structure; (3) the preservation of the multiple sources and their context autonomy. They found seven knowledge fusion modes: simple fusion, extension, instantiated fusion, configured fusion, adaptation, flat fusion and historical fusion.

Summary

inconsistency. Knowledge fusion can reduce the redundancy of a knowledge base and improve retrieval efficiency. This article surveys recent knowledge fusion patterns and methods, each of which has its own advantages and disadvantages. Future knowledge fusion systems should integrate these methods to improve accuracy, shorten response time, and make more efficient use of fragmented knowledge.

Acknowledgement

This work is partially supported by the Key Program of National Natural Science Foundation of China, under Grant No. 61532015.

References

[1] Preece A, Hui K, Gray A, et al. The KRAFT architecture for knowledge fusion and transformation[J]. Knowledge-Based Systems, 2000, 13(2): 113-120.

[2] Nikolov A, Uren V, Motta E. KnoFuss: A comprehensive architecture for knowledge fusion[C]//Proceedings of the 4th international conference on Knowledge capture. ACM, 2007: 185-186.

[3] Hjørland B. Facet analysis: The logical approach to knowledge organization[J]. Information processing & management, 2013, 49(2): 545-557.

[4] Bizer C, Heath T, Berners-Lee T. Linked data-the story so far[J]. Semantic services, interoperability and web applications: emerging concepts, 2009: 205-227.

[5] Euzenat J, Shvaiko P. Ontology matching[M]. Heidelberg: Springer, 2007.

[6] Information on https://blog.google/products/search/introducing-knowledge-graph-things-not/

[7] Bollacker K, Evans C, Paritosh P, et al. Freebase: a collaboratively created graph database for structuring human knowledge[C]//Proceedings of the 2008 ACM SIGMOD international conference on Management of data. ACM, 2008: 1247-1250.

[8] Information on https://www.cia.gov/library/publications/the-world-factbook/

[9] Tunkelang D. Faceted search[J]. Synthesis lectures on information concepts, retrieval, and services, 2009, 1(1): 1-80.

[10] Noy N F, Musen M A. Algorithm and tool for automated ontology merging and alignment[C]//Proceedings of the 17th National Conference on Artificial Intelligence (AAAI-00). Available as SMI technical report SMI-2000-0831. 2000.

[11] Laskey K B, Costa P C G, Janssen T. Probabilistic ontologies for knowledge fusion[C]//Information Fusion, 2008 11th International Conference on. IEEE, 2008: 1-8.

[12] Kuo T T, Tseng S S, Lin Y T. Ontology-based knowledge fusion framework using graph partitioning[C]//International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems. Springer Berlin Heidelberg, 2003: 11-20.

[13] Castano S, Ferrara A. Knowledge representation and transformation in ontology-based data integration[J]. Transformation for the Semantic Web KTSW 2002, 2002, 21: 51.

[14] Nengfu X, Wensheng W, Xiaorong Y, et al. Rule-based agricultural knowledge fusion in web information integration[J]. Sensor Letters, 2012, 10(1-2): 635-638.

[16] McGuinness D L, Van Harmelen F. OWL web ontology language overview[J]. W3C recommendation, 2004, 10(10): 2004.

[17] Guo S, Ding B, Wang Q, et al. Knowledge Base Completion via Rule-Enhanced Relational Learning[M]//Knowledge Graph and Semantic Computing: Semantic, Knowledge, and Linked Big Data. Springer Singapore, 2016: 219-227.

[18] Bordes A, Usunier N, Garcia-Duran A, et al. Translating embeddings for modeling multi-relational data[C]//Advances in neural information processing systems. 2013: 2787-2795.

[19] Lao N, Cohen W W. Relational retrieval using a combination of path-constrained random walks[J]. Machine learning, 2010, 81(1): 53-67.

[20] Dong X, Gabrilovich E, Heitz G, et al. Knowledge vault: A web-scale approach to probabilistic knowledge fusion[C]//Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2014: 601-610.

[21] Santos Jr E, Wilkinson J T, Santos E E. Bayesian Knowledge Fusion[C]//FLAIRS Conference. 2009.

[22] Shimony S E, Santos Jr E, Rosen T. Independence Semantics for BKBs[C]//FLAIRS Conference. 2000: 308-312.

[23] Sawaragi T, Umemura J, Katai O, et al. Fusing multiple data and knowledge sources for signal understanding by genetic algorithm[J]. IEEE Transactions on Industrial Electronics, 1996, 43(3): 411-421.

[24] Snidaro L, Garcia-Herrera J, Llinas J, et al. Context-Enhanced Information Fusion[J]. 2016.