
4.4 Graph-based Representation of an OCMS

The OCMS is represented using a graph-based formalism. We chose a graph-based formalization over set theory or relational algebra for the following reasons. First, graphs are backed by an extensive body of theory, which reduces our problem to well-studied topics in graph theory [Baresi & Heckel, 2002]. These topics include mappings between structures and finding a minimal representation of a given graph. In this research, we frequently search for entities in the OCMS in order to add or delete semantics, and well-established, efficient algorithms exist for searching subgraphs, nodes and edges. Generic implementations of graphs and graph algorithms are also available [Heckel, 2006].

Second, graphs are an appropriate data structure for representing ontologies and annotations. Available ontology editors, such as Protégé, use graphs to represent ontologies in RDF and OWL [Trinkunas & Vasilecas, 2007] [Bönström et al., 2003]. Finally, graphs visualize complex data in a simple and understandable way. In our OCMS, the ontology and the annotation are represented as graphs, and the content is represented as a set of documents. The documents in this set serve as nodes (of type instance) in the annotation layer.

An OCMS is represented as a graph $G = G_o \cup G_a \cup Cont$, where $G_o$ is the ontology graph, $G_a$ is the annotation graph and $Cont$ is the content set. Figure 4.3 gives an example of a graph-based representation of an OCMS, with the ontology graph at the top, the annotation graph in the middle and the document set at the bottom. Each of the individual graphs is described below.
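The composition $G = G_o \cup G_a \cup Cont$ can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this research: nodes are identified by plain strings, the class names `LabelledGraph` and `OCMS` are hypothetical, and the `rdfs:range` edge from Contains to paragraph is our reading of Figure 4.4.g.

```python
from dataclasses import dataclass, field

Triple = tuple[str, str, str]  # (n1, alpha, n2): source node, edge label, target node


@dataclass
class LabelledGraph:
    nodes: set[str] = field(default_factory=set)
    edges: set[Triple] = field(default_factory=set)


@dataclass
class OCMS:
    """Hypothetical sketch of G = Go ∪ Ga ∪ Cont."""
    go: LabelledGraph  # ontology graph
    ga: LabelledGraph  # annotation graph
    cont: set[str]     # content set: document identifiers

    def nodes(self) -> set[str]:
        # Nodes of G: ontology nodes, annotation nodes and documents.
        return self.go.nodes | self.ga.nodes | self.cont

    def edges(self) -> set[Triple]:
        # Only Go and Ga contribute edges; Cont is a plain set of documents.
        return self.go.edges | self.ga.edges


DOC = "CNGL:id-19221955.xml"
PARA = "CNGL:id-19221955\\para"

g = OCMS(
    go=LabelledGraph({"Help_File", "paragraph", "Contains"},
                     {("Contains", "rdfs:domain", "Help_File"),
                      ("Contains", "rdfs:range", "paragraph")}),
    ga=LabelledGraph({DOC, PARA}, {(DOC, "cngl:Contains", PARA)}),
    cont={DOC, PARA},
)
print(len(g.nodes()), len(g.edges()))  # 5 3
```

Keeping the three parts as separate fields mirrors the layered view of Figure 4.3 while still letting the union be queried as a single graph.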

4.4.1 Ontology Graph

An Ontology Graph is represented by a directed labelled graph $G_o = (N_o, E_o)$, where $N_o$ is a set of labelled nodes $n_{o1}, n_{o2}, \ldots, n_{ol}$ which represent classes, data properties, object properties and instances [Zhang et al., 2010], and $E_o$ is a set of labelled edges $e_{o1}, e_{o2}, \ldots, e_{om}$. An edge $e_o$ is written as $(n_1, \alpha, n_2)$, where $n_1, n_2 \in N_o$ and the edge label $\alpha \in CA \cup DPA \cup OPA \cup IA \cup RA$.

Figure 4.3: Graph-based representation of OCMS

DPA = {subDataPropertyOf, dataPropertyRange, dataPropertyDomain, disjointDataProperties, equivalentDataProperties, functionalDataProperty}

OPA = {subObjectPropertyOf, objectPropertyRange, objectPropertyDomain, disjointObjectProperties, equivalentObjectProperties, inverseObjectProperties, symmetricObjectProperty, functionalObjectProperty, inverseFunctionalObjectProperty, transitiveObjectProperty, reflexiveObjectProperty, irreflexiveObjectProperty}

IA = {sameIndividual, differentIndividuals, classAssertion, dataPropertyAssertion, objectPropertyAssertion}

RA = {objectAllValuesFrom, objectSomeValuesFrom, objectHasValue, objectHasSelf, objectExactCardinality, objectMaximumCardinality, objectMinimumCardinality, dataAllValuesFrom, dataSomeValuesFrom, dataHasValue, dataExactCardinality, dataMaximumCardinality, dataMinimumCardinality}
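The definition of $G_o$ constrains both the endpoints and the labels of every edge. The following sketch enforces both constraints on insertion. It is an illustration only: each vocabulary lists just a few members of the corresponding set above, and because the members of CA are not enumerated in this section, the two CA labels used here (`subClassOf`, `equivalentClasses`) are assumptions.

```python
# Edge-label vocabularies: only a few members of each set listed above.
DPA = {"subDataPropertyOf", "dataPropertyDomain", "dataPropertyRange"}
OPA = {"subObjectPropertyOf", "objectPropertyDomain", "objectPropertyRange",
       "inverseObjectProperties"}
IA = {"sameIndividual", "differentIndividuals", "classAssertion"}
RA = {"objectAllValuesFrom", "objectSomeValuesFrom", "objectHasValue"}
# Hypothetical: CA is not enumerated in this section, so these members are assumed.
CA = {"subClassOf", "equivalentClasses"}
ALLOWED_LABELS = CA | DPA | OPA | IA | RA


class OntologyGraph:
    """Directed labelled graph Go = (No, Eo) whose edge labels are checked on insertion."""

    def __init__(self) -> None:
        self.nodes: set[str] = set()
        self.edges: set[tuple[str, str, str]] = set()

    def add_node(self, n: str) -> None:
        self.nodes.add(n)

    def add_edge(self, n1: str, alpha: str, n2: str) -> None:
        # An edge (n1, alpha, n2) requires n1, n2 in No ...
        if n1 not in self.nodes or n2 not in self.nodes:
            raise ValueError("both endpoints must already be nodes of Go")
        # ... and alpha in CA ∪ DPA ∪ OPA ∪ IA ∪ RA.
        if alpha not in ALLOWED_LABELS:
            raise ValueError(f"edge label {alpha!r} is not an allowed axiom")
        self.edges.add((n1, alpha, n2))


go = OntologyGraph()
for n in ("Help_File", "paragraph", "Contains"):
    go.add_node(n)
go.add_edge("Contains", "objectPropertyDomain", "Help_File")
go.add_edge("Contains", "objectPropertyRange", "paragraph")
```

Rejecting unknown labels at insertion time keeps every edge of $G_o$ within the axiom vocabulary, so later searches never encounter an unexpected label.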

In the ontology graph, properties are often represented as nodes and property instances as edges [Bönström et al., 2003]. User-defined property nodes are linked to class nodes by schema-level property instances such as rdfs:domain and rdfs:range. For example, in Figure 4.3, the object property Contains is a property node linked to Help File by an edge labelled with the schema-level property rdfs:domain, which defines the domain of the property node. At the annotation level, however, a property that is a node in the ontology (which now serves as the schema for the annotation graph) is represented as an edge rather than a node. For example, an instance of Help File, CNGL:id-a9221956.xml, is linked to an instance of a paragraph by the edge cngl:Contains.

In general, we treat properties as nodes and property instances as edges (Figure 4.6). When properties are defined as part of the ontology, we represent them as nodes and link them to class nodes and other property nodes in the ontology graph. When those properties are used in the annotation, their instances are represented as edges that link two instances in the annotation graph.
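The dual role of a property can be illustrated with a small sketch: the same property appears as a node in the ontology edges and only as a label in the annotation edges. Under the assumption that ontology nodes and annotation edge labels share the same `cngl:` identifier, a simple check can flag annotation labels that have no corresponding property node. The helper names are hypothetical, and treating a property as "declared" when it has an rdfs:domain or rdfs:range edge is a simplification made for this example.

```python
# Ontology layer: the property is a node, linked to class nodes by schema-level edges.
# Assumption: ontology nodes and annotation labels share the "cngl:" identifiers.
GO_EDGES = {
    ("cngl:Contains", "rdfs:domain", "cngl:Help_File"),
    ("cngl:Contains", "rdfs:range", "cngl:paragraph"),
}

# Annotation layer: the same property appears only as the label of an edge between instances.
GA_EDGES = {
    ("CNGL:id-19221955.xml", "cngl:Contains", "CNGL:id-19221955\\para"),
}


def property_nodes(go_edges):
    """Property nodes, taken here as the sources of rdfs:domain / rdfs:range edges in Go."""
    return {s for (s, p, _) in go_edges if p in ("rdfs:domain", "rdfs:range")}


def undeclared_properties(ga_edges, go_edges):
    """Edge labels used in Ga that do not correspond to a property node in Go."""
    declared = property_nodes(go_edges)
    return {p for (_, p, _) in ga_edges if p not in declared}


print(undeclared_properties(GA_EDGES, GO_EDGES))  # set()
```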

Figure 4.4: Graph-based representation of the ontology layer

(Panels: a. Types of the entities; b. Hierarchy of classes; c. Equivalent classes; d. Types of the entities; e. Hierarchy of classes; f. InversePropertyOf; g. Domain and range of properties. Edge labels: rdfs:SCO = rdfs:SubClassOf; owl:SOPO = owl:SubObjectPropertyOf; owl:EC = owl:EquivalentClasses; owl:IPO = owl:InversePropertyOf.)

In Figure 4.4.a, the graph nodes represent entities that are linked to the owl:class node. Both universal classes and domain-specific classes are defined as owl:class, and each edge links one of these entities to the owl:class node. Figure 4.4.b shows the hierarchical relationships among the class nodes, where each edge represents a subclass axiom. In Figure 4.4.c, the edge represents an equivalence axiom between two nodes, which are represented as classes in Figure 4.4.a. Figures 4.4.d, 4.4.e and 4.4.f show the property nodes and the edges between properties, and Figure 4.4.g shows the nodes and edges that link classes and properties.
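The panels of Figure 4.4 can be written down as plain triples and queried directly. The encoding below is our reading of panels a, b, c and f only, using the figure's own edge labels. The helper functions are hypothetical; they follow equivalence and inverse-of edges in both directions because both relations are symmetric.

```python
# Our reading of panels a, b, c and f of Figure 4.4, using the figure's own edge labels.
FIG_4_4 = {
    # a. types of the entities
    ("Help_File", "rdfs:type", "OWL:class"),
    ("paragraph", "rdfs:type", "OWL:class"),
    ("Activity", "rdfs:type", "OWL:class"),
    # b. hierarchy of classes
    ("Help_File", "rdfs:SCO", "OWL:Thing"),
    ("paragraph", "rdfs:SCO", "OWL:Thing"),
    ("Activity", "rdfs:SCO", "OWL:Thing"),
    # c. equivalent classes
    ("Activity", "owl:EC", "Task"),
    # f. inverse properties
    ("hasCity", "owl:IPO", "isCityOf"),
}

# Equivalence and inverse-of hold in both directions, so they are followed both ways.
SYMMETRIC_LABELS = {"owl:EC", "owl:IPO"}


def related(edges, node, label):
    """Nodes reachable from `node` over one edge labelled `label`."""
    found = {o for (s, p, o) in edges if s == node and p == label}
    if label in SYMMETRIC_LABELS:
        found |= {s for (s, p, o) in edges if o == node and p == label}
    return found


def subjects(edges, label, obj):
    """All nodes n with an edge (n, label, obj), e.g. every entity typed OWL:class."""
    return {s for (s, p, o) in edges if p == label and o == obj}


print(sorted(subjects(FIG_4_4, "rdfs:type", "OWL:class")))  # ['Activity', 'Help_File', 'paragraph']
print(related(FIG_4_4, "Task", "owl:EC"))                    # {'Activity'}
```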

4.4.2 Content Set

A Content Set can be viewed as a set of content documents, $Cont = \{d_1, d_2, \ldots, d_n\}$, where each $d_i$ represents a structured or semi-structured document, or an element of a document. In the content layer, each such item is represented as a node.

Figure 4.5: Document collection

The content is stored as a set of documents, either in flat files or in a database. We represent the set of documents by their unique identifiers, which give access to the exact location of each document. The choice of storage structure, however, is left to the architects at the time the OCMS is deployed.
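Because the storage structure is left open, the content set can be sketched as a mapping from each unique identifier to a location. In this minimal example, the in-memory dictionary stands in for whichever flat-file or database store is chosen at deployment. The class name `ContentSet` and the file path are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ContentSet:
    """Cont = {d1, ..., dn}, keyed by each document's unique identifier.

    The identifier -> location dictionary stands in for whichever store
    (flat files or a database) is chosen at deployment time.
    """
    locations: dict[str, str] = field(default_factory=dict)

    def add(self, doc_id: str, location: str) -> None:
        # Identifiers must be unique so that each one resolves to exactly one location.
        if doc_id in self.locations:
            raise ValueError(f"duplicate document identifier {doc_id!r}")
        self.locations[doc_id] = location

    def __contains__(self, doc_id: str) -> bool:
        return doc_id in self.locations

    def locate(self, doc_id: str) -> str:
        return self.locations[doc_id]


cont = ContentSet()
# Hypothetical storage path, for illustration only.
cont.add("CNGL:id-19221955.xml", "/data/help_files/id-19221955.xml")
print("CNGL:id-19221955.xml" in cont, cont.locate("CNGL:id-19221955.xml"))
```

Swapping the dictionary for a database lookup would leave the `add`, membership and `locate` interface unchanged, which matches the idea that storage is a deployment decision.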

4.4.3 Annotation Graph

An Annotation Graph is represented by a directed labelled graph $G_a = (N_a, E_a)$, where $N_a$ is a set of labelled nodes $n_{a1}, n_{a2}, \ldots, n_{al}$ and $E_a$ is a set of labelled edges $e_{a1}, e_{a2}, \ldots, e_{am}$. An annotation edge $e_a$ is written as $(n_{a1}, \alpha_a, n_{a2})$, where the subject $n_{a1} \in Cont$, the object $n_{a2} \in Cont \cup G_o$ and the predicate $\alpha_a \in G_o$. These edges are referred to as triples.
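The three membership conditions on an annotation edge translate directly into a check. The sketch below assumes that ontology nodes and documents are identified by strings; the ontology nodes `cngl:isAbout` and `cngl:Activity`, and the triple that uses them, are hypothetical examples chosen to exercise an object drawn from $G_o$.

```python
def is_valid_annotation_edge(edge, cont, go_nodes):
    """Check (n_a1, alpha_a, n_a2): subject in Cont, predicate in Go, object in Cont ∪ Go."""
    subject, predicate, obj = edge
    return (subject in cont
            and predicate in go_nodes
            and (obj in cont or obj in go_nodes))


DOC = "CNGL:id-19221955.xml"
PARA = "CNGL:id-19221955\\para"
CONT = {DOC, PARA}
# Hypothetical ontology nodes; cngl:isAbout and cngl:Activity are illustrative.
GO_NODES = {"cngl:Contains", "cngl:isAbout", "cngl:Help_File", "cngl:Activity"}

print(is_valid_annotation_edge((DOC, "cngl:Contains", PARA), CONT, GO_NODES))            # True
print(is_valid_annotation_edge((DOC, "cngl:isAbout", "cngl:Activity"), CONT, GO_NODES))  # True
print(is_valid_annotation_edge(("cngl:Activity", "cngl:Contains", DOC), CONT, GO_NODES)) # False
```

The last call fails because its subject is an ontology node rather than a member of $Cont$.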

User-defined properties serve as edge labels when they are used in the annotation layer to describe the document nodes. For example, in the triple (CNGL:id-19221955.xml, cngl:Contains, CNGL:id-19221955\para) in Figure 4.6, the document is a node and the property instance cngl:Contains is the label of the edge in the annotation layer. Figure 4.6 also depicts the sources of the objects in the triples: the triples in the vertical ovals take their objects from the ontology graph, whereas the triple in the horizontal oval takes its object from the content set.

Figure 4.6: Annotation graph
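The distinction between the vertical and horizontal ovals in Figure 4.6 amounts to classifying each triple by where its object comes from. A minimal sketch follows, assuming that $Cont$ and the ontology node set are disjoint string sets. The function names and the `cngl:isAbout` triple are hypothetical.

```python
from collections import defaultdict


def object_source(triple, cont, go_nodes):
    """Return 'content' or 'ontology' depending on where the triple's object comes from."""
    _, _, obj = triple
    if obj in cont:
        return "content"
    if obj in go_nodes:
        return "ontology"
    raise ValueError(f"object {obj!r} is in neither Cont nor Go")


def group_by_object_source(triples, cont, go_nodes):
    """Partition triples into those whose objects come from Cont and those from Go."""
    groups = defaultdict(list)
    for t in triples:
        groups[object_source(t, cont, go_nodes)].append(t)
    return dict(groups)


DOC = "CNGL:id-19221955.xml"
PARA = "CNGL:id-19221955\\para"
triples = [
    (DOC, "cngl:Contains", PARA),            # object from the content set
    (DOC, "cngl:isAbout", "cngl:Activity"),  # hypothetical: object from the ontology
]
groups = group_by_object_source(triples, {DOC, PARA},
                                {"cngl:Contains", "cngl:isAbout", "cngl:Activity"})
print({k: len(v) for k, v in groups.items()})  # {'content': 1, 'ontology': 1}
```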

4.4.4 Attributes of the Graph

The type of a node is given by $type(n)$, which maps the node to its type as defined in the schema (class, instance, data property or object property). The label of an edge $e = (n_1, \alpha, n_2)$ is the string $\alpha$, given by $label(e)$. The label of a node $n$ is the URI associated with the node, given by $label(n)$. All the edges of a node $n$ are given by the function $edges(n)$, which returns every edge of the form $(n, \alpha, m)$ or $(m, \alpha, n)$, where $n$ is the queried node and $m$ is any node linked to $n$ via $\alpha$.
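The attribute functions can be sketched on a small graph. This is an illustration under the assumption that each node is identified by its URI, so that $label(n)$ simply returns the identifier. The class `AttributedGraph` and its method names are hypothetical; `type_of` plays the role of $type(n)$ without shadowing Python's built-in `type`.

```python
from dataclasses import dataclass, field

Edge = tuple[str, str, str]  # (n1, alpha, n2)


@dataclass
class AttributedGraph:
    """Sketch of the node/edge attribute functions; node identifiers are their URIs."""
    node_types: dict[str, str] = field(default_factory=dict)  # URI -> schema type
    edge_set: set[Edge] = field(default_factory=set)

    def type_of(self, n: str) -> str:
        # type(n) in the text: class, instance, data property or object property.
        return self.node_types[n]

    @staticmethod
    def label(x):
        # label(e) is the edge's alpha; label(n) is the node's URI (its identifier here).
        return x[1] if isinstance(x, tuple) else x

    def edges(self, n: str) -> set[Edge]:
        # edges(n): every (n, alpha, m) or (m, alpha, n) in the graph.
        return {e for e in self.edge_set if e[0] == n or e[2] == n}


g = AttributedGraph(
    node_types={"Contains": "object property", "Help_File": "class", "paragraph": "class"},
    edge_set={("Contains", "rdfs:domain", "Help_File"),
              ("Contains", "rdfs:range", "paragraph")},
)
print(g.type_of("Contains"))                               # object property
print(g.label(("Contains", "rdfs:domain", "Help_File")))  # rdfs:domain
print(len(g.edges("Contains")), len(g.edges("Help_File")))  # 2 1
```

Scanning the whole edge set keeps the sketch short; an adjacency index keyed by node would make $edges(n)$ proportional to the node's degree instead.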