Distributed Ontology - Knowledge extraction from unstructured data and classification through d

OWL Lite : supports those users primarily needing a classification hierarchy and

simple constraints. For example, while it supports cardinality constraints, it only permits cardinality values of 0 or 1. It should be simpler to provide tool support for OWL Lite than its more expressive relatives, and OWL Lite provides a quick migration path for thesauri and other taxonomies.

OWL DL : supports those users who want the maximum expressiveness while

retaining computational completeness (all conclusions are guaranteed to be computable) and reliability (all computations will finish in finite time). OWL DL includes all OWL language constructs, but they can be used only under certain restrictions (for example, while a class may be a subclass of many classes, a class cannot be an instance of another class). OWL DL is so named due to its correspondence with description logic, a field of research that has studied the logics that form the formal foundation of OWL.

OWL Full : is meant for users who want maximum expressiveness and the syn-

tactic freedom of the RDF Schema with no computational guarantees. For example, in OWL Full a class can be treated simultaneously as a collection of individuals and as an individual in its own right. OWL Full allows an ontology to augment the meaning of the pre-defined (RDF or OWL) vocabulary. It is unlikely that any reasoning software will be able to support complete reasoning for every feature of OWL Full.

2.3 Distributed Ontology

An ontology is a hierarchical structured set of terms for describing a domain that can be used as a skeletal for describing a domain. Often, the term ontology defines also the data it hosts, i.e. instances. Using an analogy with the database literature, we can define an instance is the equivalent of a record in a database table. A record, generally speaking, follows the structure of the table where it is stored. A bag of records, belonging to different tables, define the content of the database, or better, the knowledge base. Going back to the Semantic Web literature a set of instances, which follow the schema provided by an ontology, create the knowledge base. Although the ontology definition defines only the relationship level among classes, often that word is used to define the materialization of the classes, i.e. the instances. In this section we refer as ontology to the stack which contains both the structure definition and the knowledge base.

Many efforts have been made to build an infrastructure composed by several centralized peers aimed at sharing knowledge and support consistency and availability of retrieved data. Jena [23] and Sesame [20] are examples of centralized storage systems which enable external applications to retrieve data using the SPARQL [2],

a graph-matching query language used to retrieve semantic knowledge from RDF repositories. A set of this storage servers can cooperate together to share knowledge, but the major issue related to this approach is that the results may change over time, as a function on the availability of network nodes. That is, if a node becomes unavailable it is not backed up by any other node of the network. Thus any node may be a single point of failure for the entire architecture [87].

To overcome these centralized constraints, many approaches in literature consider a distributed scenario, based on a P2P network. A P2P approach is proposed in Edutella [70] using the Gnutella protocol [54]. The overlay is composed of a set of Super Peers (SP) [71], each of them connected to a large number of simple peers. The SPs are nodes with high network bandwidth and sustainable computation power. These SPs form the backbone, manage a local set of RDF data and are responsible for routing and querying. Although this approach is developed for a distributed scenario, it also presents the single point of failure issue for the SPs, because when a SP is not available, the consistency of the retrieved knowledge is compromised.

The P2P paradigm offers an interesting alternative to existing information sys- tem infrastructures. The most important features are: i) scalability in terms of number of nodes and distribution, ii) direct access to data at the source which guar- antees freshness in contrast to centralized repositories, iii) robustness and resilience against attacks and churn by exploiting self organization principles, and iv) simpli- fied deployment because peers can be used and no special infrastructure is required to join the network (e.g. a new data repository can be added to a P2P network without any particular administrative task or declaration of adherence to a common schema) [16]. A P2P system (we refer to the P2P database system in the database literature) is conceived as a collection of autonomous local repositories which in- teract (e.g. establish correspondences or exchange query and update requests) in a P2P style. That is, local repositories are autonomous peers with equal rights and are linked to only a small number of neighbors: it allows the decentralization of the control. Furthermore, the term repository indicates that a single peer is a collection of instances, where these instances are shared to other peers allowing sym- metric communication [37]. To summarize, P2P system enables two main features: autonomy (self-organization when insertion and leave of node events happen) and heterogeneity.

Although the link from the database community to the Semantic Web community is strongly enough to allow the knowledge sharing, a P2P architecture for an RDF knowledge base has been introduced by the RDFPeers [21] project, based on a DHT, which is developed on the top of Multi Attribute Addressable Network (MAAN) [22], an application layer based on Chord [89]. Each peer in this network is responsible for a segment of the whole key space, thus satisfying the “node equality” principle in terms of required network bandwidth, storage and query dependent computational load. Each RDF field is hashed and a copy of the triple is stored in the nodes

In document Knowledge extraction from unstructured data and classification through distributed ontologies (Page 34-36)