4.3 Layered OCMS Framework
4.3.3 Annotation Layer
The annotation layer provides a means of handling semantic annotation [Chu et al., 2009]. Semantic annotation is a process of linking content with ontology entities to enrich the content with semantics [Oren et al., 2006]. Semantic annotation is used to explicitly identify concepts and relationships between concepts discussed in the content [Uren et al., 2006] [Krötzsch et al., 2007].
The annotation process semantically enriches content by defining its attributes using concepts and properties from the ontology. It further creates a link between two content documents to indicate their semantic relationships. Annotation also refers to the output of the annotation process.
Annotation is a major element of the OCMS for the following reasons.
• In applications that make use of ontologies, the target content that needs to be semantically enriched must have an explicit link to at least one element in the ontology. Annotations provide this link.
• Annotation provides semantics which can be used by humans and machines.
• Annotation provides traceability of the content fragment using the semantics associated with it.
4.3.3.1 Annotation Storage
In semantic annotation, two approaches are commonly used to store annotation data: in-line annotation and stand-off annotation [Wilcock, 2009]. The in-line approach embeds the annotation information in the content. Such annotations either modify the original document to embed the annotation or maintain a copy of the original document together with the annotation data. Whenever the semantic data is required, the system needs to access the annotated document and extract the annotations. The disadvantage of in-line annotation is that the annotations must be kept aligned with the TBox statements of the ontology, which requires additional effort.
The stand-off approach stores the annotations of a document in a separate storage space. It uses the document URI as the unique identifier of the document, and every annotation of that document is associated with that URI. This approach has both advantages and disadvantages. First, it separates the semantics from the content, which allows the content and the annotations to evolve independently. Second, it allows the annotation data to be accessed without reading the whole document; when annotations are embedded instead, exhaustive annotation increases the size of the original document and makes individual annotations harder to access [Maynard, 2008]. Third, it is suitable for annotating content that the annotator does not have permission to modify. The separate annotation layer further provides facilities such as querying the annotation triples. However, there are also disadvantages associated with this approach [Wilcock, 2009].
The main disadvantage is that it requires systematic synchronization of the annotations with the content. When a document is modified or deleted, the annotation layer must be updated accordingly. In a distributed environment, this task may introduce additional overhead. The other disadvantage is the separate storage of the content and the annotations, which causes the content to be delivered separately from its annotations. However, this problem can be addressed by merging the content and the annotation data during content delivery.
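The stand-off approach described above can be illustrated with a minimal sketch. The class `StandOffStore` and its method names are hypothetical and are not part of the OCMS implementation; the sketch only assumes that documents are identified by URI, that annotations are kept in a separate structure, that deleting a document triggers a synchronization step, and that content and annotations are merged at delivery time.

```python
from collections import defaultdict


class StandOffStore:
    """Hypothetical stand-off store: annotations live apart from the content."""

    def __init__(self):
        self.content = {}                     # document URI -> document text
        self.annotations = defaultdict(list)  # document URI -> [(predicate, object)]

    def add_document(self, uri, text):
        self.content[uri] = text

    def annotate(self, uri, predicate, obj):
        # The document itself is never modified; only its URI is referenced.
        if uri not in self.content:
            raise KeyError(f"unknown document: {uri}")
        self.annotations[uri].append((predicate, obj))

    def annotations_for(self, uri):
        # Annotations are read without opening the document.
        return list(self.annotations.get(uri, []))

    def delete_document(self, uri):
        # Synchronization step: annotations are dropped together with the content.
        self.content.pop(uri, None)
        self.annotations.pop(uri, None)

    def deliver(self, uri):
        # Content and annotations are merged only at delivery time.
        return {
            "uri": uri,
            "content": self.content[uri],
            "annotations": self.annotations_for(uri),
        }
```

This sketch synchronizes only on deletion and only for annotations whose subject is the deleted document; annotations in which the deleted document appears as an object would need a separate check.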
Table 4.1: Annotation triple representation
Subject        Predicate       Object                     Context
CNGL:id-2.xml  rdf:type        rdfs:Resource              cngl:triple
CNGL:id-2.xml  rdf:type        CNGL:Document              cngl:triple
CNGL:id-2.xml  CNGL:isAbout    CNGL:DeletingEmail         cngl:triple
CNGL:id-2.xml  CNGL:hasTitle   “Deleting email account”   cngl:triple
CNGL:id-2.xml  CNGL:Contains   CNGL:id-6.xml              cngl:triple
CNGL:id-2.xml  CNGL:mediaType  CNGL:Text                  cngl:triple
4.3.3.2 Annotation Triples
The annotation process uses RDF triples (subject, predicate and object) to annotate any content document. It further stores the context of the annotation to distinguish between different contexts. The subject of the annotation comes from the content layer and is usually a URI. The predicate comes from the ontology or the schema defined for the ontologies. The objects can be resources from the ontology or other content artefacts. A single content document can have multiple annotation triples. A single resource defined in the ontology can be used many times in the annotation layer.
Table 4.1 shows the structure of the annotation triples. The subjects of the annotations are content documents (XML files in this example) or parts of XML files. The predicates originate from OWL or RDF properties (rdf:type) or from properties of domain-specific ontologies (CNGL:isAbout). The objects come either from the ontology (CNGL:Document) or from the content layer (CNGL:id-6.xml). The user can assign different contexts to different categories of annotation triples. For example, the context of the triples in Table 4.1 is cngl:triple, which indicates that they are triples for annotating resources in CNGL.
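As an illustration, the rows of Table 4.1 can be represented as subject, predicate, object and context tuples and queried by pattern. This is a minimal in-memory sketch; the names `Quad`, `QUADS` and `match` are hypothetical and do not correspond to the API of any particular triple store.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    context: str


CONTEXT = "cngl:triple"
DOC = "CNGL:id-2.xml"

# The six rows of Table 4.1.
QUADS = [
    Quad(DOC, "rdf:type", "rdfs:Resource", CONTEXT),
    Quad(DOC, "rdf:type", "CNGL:Document", CONTEXT),
    Quad(DOC, "CNGL:isAbout", "CNGL:DeletingEmail", CONTEXT),
    Quad(DOC, "CNGL:hasTitle", "Deleting email account", CONTEXT),
    Quad(DOC, "CNGL:Contains", "CNGL:id-6.xml", CONTEXT),
    Quad(DOC, "CNGL:mediaType", "CNGL:Text", CONTEXT),
]


def match(quads, subject=None, predicate=None, obj=None, context=None):
    """Return the quads that fit the pattern; None acts as a wildcard."""
    pattern = (subject, predicate, obj, context)
    return [
        q for q in quads
        if all(p is None or p == value for p, value in zip(pattern, q))
    ]
```

For example, `match(QUADS, predicate="rdf:type")` returns the two typing statements for CNGL:id-2.xml, and `match(QUADS, context="cngl:triple")` returns every row of the table.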
In the OCMS, we store the annotation triples in triple stores. Triple stores speed up the retrieval of the required information. They can hold large numbers of triples and are suitable for further expansion [Bizer & Schultz, 2008]. Furthermore, the annotation triples are compliant with RDF and can be serialized as RDF/XML.
4.3.3.3 Change in the Annotation
The annotation layer is the dynamic layer of an OCMS [Goncalves et al., 2011]. Changes in the annotation layer are frequent: triples are regularly added, modified or deleted. This layer is highly dependent on both the content layer and the ontology layer. Any change in the other two layers affects the annotation layer, which carries all the semantics related to the content. Changes made to the triples of this layer may in turn cause changes to related annotations within the layer. Changes in the annotation layer therefore require proper analysis and evaluation before they are implemented in the system.
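The kind of analysis meant here can be sketched as a simple impact check: before a resource is removed from the content or ontology layer, the annotation triples that mention it are identified. The function names are hypothetical, and the sketch assumes that a triple is affected whenever the removed resource appears as its subject, predicate or object; it does not represent the change analysis actually used in the OCMS.

```python
# Two triples from Table 4.1 plus one hypothetical triple for CNGL:id-6.xml.
TRIPLES = [
    ("CNGL:id-2.xml", "rdf:type", "CNGL:Document"),
    ("CNGL:id-2.xml", "CNGL:Contains", "CNGL:id-6.xml"),
    ("CNGL:id-6.xml", "rdf:type", "CNGL:Document"),  # hypothetical
]


def affected_annotations(triples, changed_resource):
    """Triples that mention the changed resource in any position."""
    return [t for t in triples if changed_resource in t]


def apply_deletion(triples, removed_resource):
    """Return (kept, dropped) triples after a resource is removed."""
    dropped = affected_annotations(triples, removed_resource)
    kept = [t for t in triples if t not in dropped]
    return kept, dropped
```

Calling `affected_annotations` first lets the affected triples be reviewed before `apply_deletion` is used to update the annotation layer.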
Figure 4.2: An example of a layered framework of OCMS
Now, let us take a concrete OCMS representation and see how the three layers interact with each other. In Figure 4.2, the ontology layer contains an ontology with concepts such as Help file, User, Software Feature, etc. These concepts are used in the annotation layer to describe help documents stored in the content layer. For example, the document that contains information about an administrator is linked to the concept admin-