A SURVEY ON DATA MINING FOR SEMANTIC WEB DATA

Available Online at www.ijpret.com 1419

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

A PATH FOR HORIZING YOUR INNOVATIVE WORK

SNEHA D. JAWANJAL

Student of Master of Engineering (CE), Sipna College of Engineering and Technology, Amravati, India.

Accepted Date: 05/03/2015; Published Date: 01/05/2015

Abstract: The amount of semantic data available for a wide range of applications is constantly growing. This data is complex and heterogeneous, which creates new challenges for data mining research. The integration of two research areas, the Semantic Web and Web Mining, is known as Semantic Web Mining; when the focus is on the data itself, the field can be called Semantic Web Data Mining. The huge volume of semantic data has become an attractive target for researchers who apply data mining techniques. This paper shows the positive effects of Semantic Web Mining and proposes some approaches for dealing with the complex and heterogeneous information and knowledge produced by Web services. Association Rule Mining is a data mining technique that extracts interesting relations among the items of a large number of transactions. Because this technique depends heavily on how the data is represented, it is one of the more challenging data mining techniques to apply to semantic web data. This paper gives a brief survey of current research in this area and proposes a novel method that addresses some of the challenges of processing huge volumes of semantic data and mining semantic web data effectively.

Keywords: Association rule, Data mining, Semantic data, Web mining

Corresponding Author: MS. SNEHA D. JAWANJAL

Access Online On:

www.ijpret.com

How to Cite This Article:

Sneha D. Jawanjal, IJPRET, 2015; Volume 3 (9): 1419-1427

I. INTRODUCTION

Semantic Web Data Mining is the integration of two important and fast-developing scientific fields: the Semantic Web and Data Mining [1]. Semantic data is meaningful data, and the Semantic Web is used to provide meaningful data to its users. At the same time, the Semantic Web produces heterogeneous and complex data structures. Data mining, the second field, is typically used to extract interesting patterns of knowledge from homogeneous and less complex databases. Because the amount of data and knowledge is increasing rapidly in areas such as credit card transactions and biomedical and clinical scenarios, this data must be transformed correctly into a form suitable for mining [2,3], which gives rise to the term “Semantic Web Mining”.

The Semantic Web works in a smarter way because it provides web services that organize the data on the web consistently and in a disciplined manner. The success of the World Wide Web (WWW) has created a new challenge, because the amount of data is huge. The Semantic Web addresses part of this challenge by making the data machine-understandable, and Web Mining addresses the other part by extracting the useful knowledge hidden in this data [4]. Semantic Web Mining aims to combine the two areas, the Semantic Web and Web Mining, together with data mining. A growing number of researchers work on improving the quality of Web data mining by exploiting the semantics in Web data, and on using mining techniques to build the Semantic Web.

In general, data mining approaches can be used not only inside web services but also to develop them. For example, a user of web services can apply data mining methods to explore data resources on the web. In this paper, we first give a general overview of the terms used in semantic web data mining, such as semantic data, the Semantic Web, data mining, and web mining. In the following sections, we describe the techniques involved and propose a method for properly extracting semantic web data, based on association rule mining and the Apriori algorithm.

II. IMPORTANT TERMS

A. Semantic Data

real world it is a software engineering model. The design goal of a semantic data system is to represent the real world as accurately as possible within a data set. Data is organized linearly and hierarchically to convey specific meanings, as in the example below. By representing the real world within data sets, semantic data allows machines to interact with real-world information without human interpretation [5]. Semantic data is organized as binary models of objects, usually in groups of three parts: two objects and their relationship.

For example, to represent the fact that a pen is on a letter book, the data might be organized as: PEN LETTER BOOK. The objects (pen and letter book) are interpreted with regard to their relationship (residing on). Because the data is organized linearly, the software understands that PEN, which comes first in the line, is the object that acts; that is, the position of the word tells the software that the pen is on the letter book and not that the letter book is on the pen. Databases designed on this concept have greater applicability and are easily integrated with other databases.
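The pen example can be sketched as a three-part record of (subject, relationship, object), the same shape that RDF statements use. The `Triple` type, the `describe` helper, and the relationship label `rests_on` are hypothetical names chosen for this illustration; the original example leaves the relationship implicit.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A three-part semantic statement: two objects and their relationship."""
    subject: str    # first position: the object that acts
    predicate: str  # the relationship between the two objects (hypothetical label)
    obj: str        # last position: the object acted upon


def describe(t: Triple) -> str:
    # Position alone decides the roles: the first element is read as the actor.
    return f"{t.subject} {t.predicate.replace('_', ' ')} {t.obj}"


pen_on_book = Triple("PEN", "rests_on", "LETTER BOOK")
book_on_pen = Triple("LETTER BOOK", "rests_on", "PEN")

print(describe(pen_on_book))       # PEN rests on LETTER BOOK
print(pen_on_book == book_on_pen)  # False: swapping positions changes the fact
```

Because equality compares the fields position by position, the two records are different statements even though they mention the same objects and relationship.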

Semantic data modeling has been developed since 1970, and its uses keep growing and now reach many important applications, especially in the enterprise world. With semantic data models, database management systems can be integrated with one another and compared. The model helps streamline the relationship between a company and its vendors, because it makes database sharing and integration much simpler. A semantic language called Gellish was developed recently as a formal language for representing semantic data models. Gellish can be interpreted by computers easily, without human interaction.

B. Semantic Web

Uniform Resource Identifier (URI): A formatted string that serves as a means of identifying an abstract or physical resource; a URI can be further classified as a locator, a name, or both.

Resource Description Framework (RDF): RDF contains the concept of an assertion and allows assertions about assertions (meta-assertions), which make it possible to perform rudimentary checks on a document. RDF is a model of statements made about resources and their associated URIs; each statement has a three-part structure: subject, predicate, and object.

Metadata: Similar to meta-assertions, metadata are data about data. They index Web pages and Web sites in the Semantic Web, allowing other computers to know what a Web page is about.

Ontology: An ontology provides a set of well-founded constructs for building meaningful higher-level knowledge that specifies the semantics of terminology systems in a well-defined manner. An ontology offers a richer language for a particular domain, providing more complex constraints on the types of resources and their properties. By providing richer relationships between the terms of a vocabulary, it enhances the semantics of those terms.
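To illustrate how an ontology constrains the types of resources that a property may connect, the sketch below declares a tiny class hierarchy plus a domain and range for one property, then checks statements against them. All class, property, and individual names are hypothetical. Note that in RDFS and OWL, domain and range declarations are normally used to infer types rather than to reject statements; treating them as a validation check here is a simplification.

```python
from typing import Dict, Optional, Tuple

# Hypothetical class hierarchy: each class points to its direct superclass.
SUBCLASS_OF: Dict[str, str] = {
    "Pen": "Stationery",
    "LetterBook": "Stationery",
    "Stationery": "PhysicalObject",
}

# Hypothetical domain and range: the classes a property may connect.
PROPERTY_CONSTRAINTS: Dict[str, Tuple[str, str]] = {
    "rests_on": ("PhysicalObject", "PhysicalObject"),
}

# Hypothetical individuals and their declared classes.
TYPE_OF: Dict[str, str] = {"pen1": "Pen", "book1": "LetterBook", "idea1": "Concept"}


def is_a(cls: str, ancestor: str) -> bool:
    """True if cls equals ancestor or reaches it by following superclass links."""
    current: Optional[str] = cls
    while current is not None:
        if current == ancestor:
            return True
        current = SUBCLASS_OF.get(current)
    return False


def satisfies_constraints(subject: str, prop: str, obj: str) -> bool:
    """Check a (subject, property, object) statement against domain and range."""
    domain, range_ = PROPERTY_CONSTRAINTS[prop]
    return is_a(TYPE_OF[subject], domain) and is_a(TYPE_OF[obj], range_)


print(satisfies_constraints("pen1", "rests_on", "book1"))   # True
print(satisfies_constraints("idea1", "rests_on", "book1"))  # False
```

The richer relationship (`Pen` is a kind of `Stationery`, which is a kind of `PhysicalObject`) is what lets the check accept `pen1` even though no constraint mentions `Pen` directly.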

Web Services: A Web service is a software system designed to support interoperable machine-to-machine interaction over a network. Semantic Web services are useful for interchanging semantic data, making it easy for programmers to combine data from different sources and services with the proper meaning [7].

C. Data Mining

D. Web Mining

Web Mining is the application of data mining techniques to the content, structure, and usage of Web resources [11].

The three main areas of Web Mining are:

Content Mining – Analyzes the content of Web resources. It is mainly based on text mining techniques, but extensions to multimedia content are also beginning to emerge in this field.

Structure Mining – Mines data by analyzing the hyperlink structure between Web pages.

Usage Mining – Analyzes users' clicks recorded in Web server logs [12].
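Usage mining is the easiest of the three areas to show in a few lines. The sketch below assumes a simplified, hypothetical log format (one `user page` pair per line, in time order) rather than a real Web server log format; it groups clicks by user and counts how often each page is requested.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def parse_clicks(log_lines: List[str]) -> Dict[str, List[str]]:
    """Group page requests by user; each line is assumed to be 'user page'."""
    clicks_by_user: Dict[str, List[str]] = defaultdict(list)
    for line in log_lines:
        user, page = line.split()
        clicks_by_user[user].append(page)
    return dict(clicks_by_user)


def page_popularity(clicks_by_user: Dict[str, List[str]]) -> Counter:
    """Count how many times each page was requested across all users."""
    return Counter(page for pages in clicks_by_user.values() for page in pages)


log = [
    "alice /home", "bob /home", "alice /courses",
    "alice /courses/rdf", "bob /courses", "carol /home",
]
clicks = parse_clicks(log)
print(clicks["alice"])                         # ['/home', '/courses', '/courses/rdf']
print(page_popularity(clicks).most_common(2))  # [('/home', 3), ('/courses', 2)]
```

The per-user click lists are the kind of input that pattern mining, such as the association rule mining discussed below, can then be applied to.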

III. TECHNIQUES USED

A. Resource Description Framework (RDF)

The Resource Description Framework (RDF) makes it possible to store information about resources available on the World Wide Web, using their own domain vocabularies as a common language. RDF contains three types of elements: resources (URIs), literals, and properties [13]. It is an effective framework for representing any kind of data that one wants to define on the web [14].
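The three kinds of RDF elements can be mimicked with plain Python data structures. The following is a minimal sketch, not an RDF library: statements are (subject, property, object) triples in which subjects and properties are URIs and objects are URIs or literals, and `match` answers simple pattern queries with `None` acting as a wildcard. The `http://example.org/` namespace and all resource names are placeholders; real applications would normally use an RDF store and a query language such as SPARQL.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class URI:
    value: str  # identifies a resource or a property


@dataclass(frozen=True)
class Literal:
    value: str  # a plain data value such as a name or a colour


Node = Union[URI, Literal]
EX = "http://example.org/"  # placeholder namespace

graph = {
    (URI(EX + "pen1"), URI(EX + "restsOn"), URI(EX + "book1")),
    (URI(EX + "pen1"), URI(EX + "colour"), Literal("blue")),
    (URI(EX + "book1"), URI(EX + "title"), Literal("Letters")),
}


def match(graph, s: Optional[URI] = None, p: Optional[URI] = None,
          o: Optional[Node] = None):
    """Return the triples that agree with every non-None position, sorted."""
    return sorted(
        (t for t in graph
         if (s is None or t[0] == s)
         and (p is None or t[1] == p)
         and (o is None or t[2] == o)),
        key=lambda t: (t[0].value, t[1].value, t[2].value),
    )


# Everything stated about pen1: one URI-valued and one literal-valued property.
for subj, prop, obj in match(graph, s=URI(EX + "pen1")):
    print(subj.value, prop.value, obj.value)
```

Keeping `URI` and `Literal` as separate types means a literal "blue" can never be mistaken for a resource with the same text, which mirrors the distinction RDF draws between the two.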

B. Web Ontology Language (OWL)

The Web Ontology Language (OWL) is a more expressive language than RDF, with better machine interpretability; it precisely identifies the nature of resources and their relationships [16]. To represent Semantic Web information, OWL uses ontologies: shared, machine-readable representations that explicitly describe a common conceptualization and that are the fundamental key of Semantic Web Mining [15].

C. Association Rule

optimal way, many researchers have mainly focused on the first step: how to efficiently discover all frequent itemsets [17].
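The two measures that drive this step have standard definitions: for a set of transactions D, the support of an itemset X is the fraction of transactions that contain X, and the confidence of a rule X ⇒ Y is support(X ∪ Y) / support(X). The sketch below computes both on a small, made-up set of transactions; the item names are illustrative only.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               transactions: List[Transaction]) -> float:
    """support(X ∪ Y) / support(X) for the rule X => Y."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))


transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
x, y = frozenset({"bread"}), frozenset({"milk"})
print(support(x | y, transactions))    # 0.5
print(confidence(x, y, transactions))  # 2/3, about 0.667
```

Computing support naively needs one pass over all transactions per itemset, which is why the efficient discovery of frequent itemsets receives so much attention.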

D. Apriori Algorithm

The Apriori algorithm efficiently mines Boolean association rules from frequent itemsets. It uses a breadth-first, level-wise iterative search. First, it finds the set of frequent 1-itemsets, denoted L1; L1 is used to find L2, the set of frequent 2-itemsets, and L2 is used to find L3, the set of frequent 3-itemsets. The cycle continues until no new frequent k-itemsets can be found. Finding each Lk requires one scan of the database. Finding all the frequent itemsets is the core of an association rule mining algorithm and accounts for most of its computational workload. Afterward, effective association rules are constructed from the frequent itemsets according to the minimum confidence threshold [18].
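The level-wise search described above can be sketched as follows. This is a compact, unoptimized version of the textbook Apriori join, prune, and count steps with an absolute minimum support count; it is not the semantic variant proposed in this paper, and the sample transactions are invented.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def apriori(transactions: List[FrozenSet[str]],
            min_count: int) -> Dict[FrozenSet[str], int]:
    """Return every frequent itemset with its support count."""
    # L1: count single items in one pass over the database.
    counts: Dict[FrozenSet[str], int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)

    k = 2
    while level:
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        prev = list(level)
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        # One database scan per level to count the surviving candidates (Lk).
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {s: n for s, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
    return frequent


transactions = [frozenset(t) for t in (
    {"bread", "milk"}, {"bread", "butter"},
    {"bread", "milk", "butter"}, {"milk", "butter"},
)]
for itemset, count in sorted(apriori(transactions, min_count=2).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With a minimum count of 2, all three single items and all three pairs are frequent, while the only 3-itemset appears in just one transaction and is discarded, which ends the loop.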

Apriori is a basic algorithm for frequent itemset mining and association rule mining over transactional databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger itemsets as long as those itemsets appear sufficiently often in the database. The frequent itemsets determined by Apriori can be used to generate association rules, which highlight general trends in the database.
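Once the frequent itemsets and their support counts are known, rules are generated by splitting each itemset into an antecedent and a consequent and keeping only the splits whose confidence reaches the minimum threshold. To keep the sketch self-contained, it hard-codes the support counts of a small, hypothetical four-transaction database instead of computing them.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float]


def generate_rules(frequent: Dict[FrozenSet[str], int],
                   min_conf: float) -> List[Rule]:
    """Build rules X => Y with confidence count(X ∪ Y) / count(X) >= min_conf."""
    rules: List[Rule] = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is frequent, so it is in the table.
                conf = count / frequent[antecedent]
                if conf >= min_conf:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules


# Hypothetical support counts over 4 transactions.
frequent = {
    frozenset({"bread"}): 3,
    frozenset({"milk"}): 2,
    frozenset({"bread", "milk"}): 2,
}
for x, y, conf in generate_rules(frequent, min_conf=0.7):
    print(sorted(x), "=>", sorted(y), round(conf, 2))
```

Here only `['milk'] => ['bread']` survives, with confidence 1.0; the reverse rule has confidence 2/3 and falls below the 0.7 threshold.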

IV. PROPOSED METHODOLOGY

items are formatted according to the defined association rule mining ontology. Therefore, they can be linked with other semantic data, and semantic reasoning can be performed on them to generate new, more meaningful transactions.
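One simple way semantic reasoning can enrich transactions is to add, for every item, the more general concepts it belongs to in the ontology hierarchy, so that rules can also be found at a higher level of abstraction. The sketch below assumes a hypothetical medicine hierarchy; this generalization technique is borrowed from generalized association rule mining and is only one possible form of the reasoning step described here, not necessarily the one used in this paper.

```python
from typing import Dict, FrozenSet, List, Optional

# Hypothetical ontology fragment: each concept points to its direct superclass.
SUPERCLASS: Dict[str, Optional[str]] = {
    "Aspirin": "Painkiller",
    "Ibuprofen": "Painkiller",
    "Painkiller": "Medicine",
    "Amoxicillin": "Antibiotic",
    "Antibiotic": "Medicine",
    "Medicine": None,
}


def ancestors(concept: str) -> List[str]:
    """All superclasses of a concept, nearest first."""
    result: List[str] = []
    parent = SUPERCLASS.get(concept)
    while parent is not None:
        result.append(parent)
        parent = SUPERCLASS.get(parent)
    return result


def enrich(transaction: FrozenSet[str]) -> FrozenSet[str]:
    """Add every inferred superclass to the transaction's items."""
    inferred = {a for item in transaction for a in ancestors(item)}
    return transaction | inferred


print(sorted(enrich(frozenset({"Aspirin"}))))  # ['Aspirin', 'Medicine', 'Painkiller']
print(sorted(enrich(frozenset({"Ibuprofen", "Amoxicillin"}))))
```

After enrichment, two transactions that contain different painkillers share the item `Painkiller`, so a rule involving that concept can reach the minimum support even when no single drug does.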

There are therefore two general phases in our semantic association rule mining system: producing semantic transactions, and running a semantic association rule mining algorithm on them. The second phase is based on Apriori, which is rewritten to handle semantic transactions and, accordingly, to produce semantic rules in the format predefined in the ontology. In addition, parameters such as the minimum support and confidence must be determined for each particular data set. The resulting rules are expected to be useful for improving intelligent decision support systems.
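The first phase, producing transactions from semantic data, can be sketched by grouping RDF-style triples by subject and turning each property-value pair into an item. This representation is an assumption made for illustration; the paper's own transaction format is defined by its association rule mining ontology, which is not reproduced here. The property names and records below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def produce_transactions(triples: Iterable[Triple],
                         properties: FrozenSet[str]) -> Dict[str, FrozenSet[str]]:
    """Phase 1: one transaction per subject, one 'property=value' item per triple.

    Only the properties listed in `properties` are used, which mirrors choosing
    the target of the analysis before mining.
    """
    items: Dict[str, Set[str]] = defaultdict(set)
    for subject, prop, value in triples:
        if prop in properties:
            items[subject].add(f"{prop}={value}")
    return {s: frozenset(v) for s, v in items.items()}


# Hypothetical patient records expressed as triples.
triples = [
    ("patient1", "hasSymptom", "Fever"),
    ("patient1", "takes", "Aspirin"),
    ("patient1", "name", "A. Smith"),
    ("patient2", "hasSymptom", "Fever"),
    ("patient2", "takes", "Ibuprofen"),
]
transactions = produce_transactions(triples, frozenset({"hasSymptom", "takes"}))
for subject in sorted(transactions):
    print(subject, sorted(transactions[subject]))
```

The resulting `transactions` values are ordinary item sets, so they could be passed to an Apriori implementation for the second phase.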

Fig. 1: Flow diagram of the proposed methodology based on the above techniques.

V. CONCLUSION

challenges listed above, it is crucial to apply the knowledge in semantically annotated data, based on ontologies, to produce semantic transactions efficiently. By defining semantic web concepts and their properties in the ontology, the method tries to overcome the heterogeneity of semantic web data. The mining process presented in this paper can be performed automatically on any kind of semantic data once that data has been extracted. To conclude, we have given a general overview of the terms and techniques used in Semantic Web Mining, and we have proposed a method for properly mining data from the Semantic Web.

VI. REFERENCES

1. O. Mustapaşa, A. Karahoca, D. Karahoca and H. Uzunboylu, “Hello World, Web Mining for E-Learning,” Procedia Computer Science, Vol. 3, No. 2, 2011, pp. 1381-1387. doi:10.1016/j.procs.2011.01.019

2. H. Liu, “Towards Semantic Data Mining,” Proceedings of the 9th International Semantic Web Conference, Shanghai, 7-11 November 2010, pp. 1-8.

3. V. Nebot and R. Berlanga, “Finding Association Rules in Semantic Web Data,” Knowledge-Based Systems, Vol. 25, No. 1, 2012, pp. 51-62. doi:10.1016/j.knosys.2011.05.009

4. H. Hassanzadeh and M. R. Keyvanpour, “Semantic Web Requirements through Web Mining Techniques,” International Journal of Computer Theory and Engineering, Vol. 4, No. 4, August 2012.

5. “What is Semantic Data,” [Online]. Available: http://www.semagix.com/what-is-semantic-data.html.

6. T. Berners-Lee and W. Hall, “The Semantic Web,” Scientific American, May 2001.

7. H. Zeng and T. C. Son, “Semantic Web Services,” Intelligent Systems, Vol. 16, 2001, pp. 46-53.

8. U. Fayyad, G. Piatetsky-Shapiro and P. Smyth, “From Data Mining to Knowledge Discovery in Databases,” 1996. Retrieved 17 December 2008.

9. T. Hastie, R. Tibshirani and J. Friedman, “The Elements of Statistical Learning: Data Mining, Inference, and Prediction,” 2009. Retrieved 7 August 2012.

11. G. Stumme, A. Hotho and B. Berendt, “Semantic Web Mining: State of the Art and Future Directions,” Web Semantics: Science, Services and Agents on the World Wide Web, Vol. 4, No. 2, 2006, pp. 124-143 (Semantic Grid – The Convergence of Technologies).

12. B. Berendt, A. Hotho, D. Mladenic, M. van Someren, M. Spiliopoulou and G. Stumme, “A Roadmap for Web Mining: From Web to Semantic Web,” Web Mining: From Web to Semantic Web, Vol. 3209/2004, 2004, pp. 1-22.

13. V. Nebot and R. Berlanga, “Finding Association Rules in Semantic Web Data,” Knowledge-Based Systems, Vol. 25, No. 1, 2012, pp. 51-62. doi:10.1016/j.knosys.2011.05.009

14. Jeon and W. Kim, “Development of Semantic Decision Tree,” Proceedings of the 3rd International Conference on Data Mining and Intelligent Information Technology Applications, Macau, 24-26 October 2011, pp. 28-34.

15. Sugumaran and J. A. Gulla, “Applied Semantic Web Technologies,” Taylor & Francis Group, Boca Raton, 2012.

16. Jain, I. Khan and B. Verma, “Secure and Intelligent Decision Making in Semantic Web Mining,” International Journal of Computer Applications, Vol. 15, No. 7, 2011, pp. 14-18. doi:10.5120/1962-2625.

17. A. Mala and F. Ramesh Dhanaseelan, “Data Stream Mining Algorithms – A Review of Issues and Existing Approaches,” International Journal on Computer Science and Engineering (IJCSE), Vol. 3, No. 7, 2011, pp. 2726-2732.