Supporting Information Systems Literature Research
3.2 Tailored Data Entry Interface Generation Tools
3.2.3 Ontologies
Using an ontology to control the data entry for a database has the potential to ensure that better quality data is captured and that data from different data providers will be compatible. It may also allow a data entry interface to be created in which domain users enter data using terms with which they are familiar but which are clearly defined semantically. However, at the start of this research there was no agreed ontology for taxonomic description, although a structured data model and associated glossary can be considered similar to a weak form of ontology.
3.2.3.1 Defining Ontology
There is a great deal of literature concerning ontology, much of which, while valid in its own field, is not relevant to this project. Originally, ontology referred to a discipline of philosophy concerning the nature of reality, asking the question 'what is?' of 'the kinds and structures of objects, properties, events, processes and relations in every area of reality' [Smith 2003]. In the past decades, however, the concept of ontology has gained new meanings in other disciplines, particularly the information sciences.
Ontology has become a loosely used term for a number of different approaches. Gruber's definition, "An ontology is a formal explicit specification of a shared conceptualisation" [Gruber 1993a], is commonly cited in computer science literature, but is itself a fairly wide definition, open to interpretation. Another definition [Guarino 1995] describes an ontology as 'an engineering artefact, constituted by a specific vocabulary used to describe a certain reality, plus a set of explicit assumptions regarding the intended meaning of the vocabulary words… In the simplest case, an ontology describes a hierarchy of concepts related by subsumption relationships; in more sophisticated cases, suitable axioms are added in order to express other relationships between concepts and to constrain their intended interpretation.'
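The 'simplest case' in the definition above, a hierarchy of concepts related by subsumption, can be illustrated with a minimal sketch. The terms and the `PARENT` table below are hypothetical examples chosen for illustration and are not drawn from any agreed taxonomic ontology.

```python
from typing import Dict, Optional

# Hypothetical child -> parent ("is-a") links; the terms are illustrative only.
PARENT: Dict[str, Optional[str]] = {
    "plant organ": None,
    "leaf": "plant organ",
    "simple leaf": "leaf",
    "compound leaf": "leaf",
    "flower": "plant organ",
}


def subsumes(general: str, specific: str) -> bool:
    """Return True if `general` equals `specific` or is one of its ancestors."""
    current: Optional[str] = specific
    while current is not None:
        if current == general:
            return True
        # Walk one step up the hierarchy; unknown terms end the walk.
        current = PARENT.get(current)
    return False
```

Under these assumptions, `subsumes("plant organ", "simple leaf")` holds because the walk passes through "leaf" to "plant organ", whereas `subsumes("flower", "simple leaf")` does not. The more sophisticated cases mentioned in the definition would add axioms on top of such a hierarchy.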
Generally, ontological systems in non-philosophical disciplines aim to improve communication between entities in a field by ensuring that, within the scope of the ontological system, the entities understand precisely what each other means when referring to concepts and relationships. This is similar to the aim of the tool to support taxonomists, which is to promote better communication of ideas through improved description.
Top-level ontologies are concerned with general categories (e.g. time, space, identity, quantity, etc.), as opposed to domain-specific ontologies (e.g. of geography, medicine, ecology). Attempts at the former (e.g. OIL [Fensel 2000]) are criticised for employing exclusively set-based ontology construction, which would make them unsuitable for many real-world applications. In taxonomy, a domain-specific ontology could be envisaged that would simply define the terms for use in description and regulate how those terms could be related. Whilst this would be a fairly simple and limited ontology that did not attempt to model the entirety of the domain, such relationships would likely go further than just subsumption.
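A limited domain-specific ontology of the kind envisaged above, one that defines the terms and regulates how they may be related, might be sketched as follows. The term kinds, relation names and rules are assumptions made purely for illustration; they are not taken from any published descriptive standard.

```python
from typing import Dict, Set, Tuple

# Hypothetical glossary: each defined term is assigned a kind.
TERM_KIND: Dict[str, str] = {
    "leaf": "structure",
    "petiole": "structure",
    "shape": "property",
    "ovate": "state",
}

# Hypothetical rules: (kind of subject, relation, kind of object).
ALLOWED: Set[Tuple[str, str, str]] = {
    ("structure", "has_part", "structure"),
    ("structure", "has_property", "property"),
    ("property", "has_state", "state"),
}


def is_valid(subject: str, relation: str, obj: str) -> bool:
    """Accept a statement only if both terms are defined and a rule permits the relation."""
    if subject not in TERM_KIND or obj not in TERM_KIND:
        return False
    return (TERM_KIND[subject], relation, TERM_KIND[obj]) in ALLOWED
```

With these rules, "leaf has_part petiole" is accepted, while "leaf has_state ovate" is rejected because a state may only be attached to a property. The relations here (composition, property, state) go beyond subsumption, which is the point made in the paragraph above.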
3.2.3.2 Use of ontologies
As mentioned, ontologies are widely used in a variety of contexts. A summary classification of ontology usages can be made, such as in the following table.
Uses of Ontology
For communication:
    Between implemented computational systems.
    Between humans.
    Between humans and implemented computational systems.
For computational inference:
    For internally representing and manipulating plans and planning information.
    For analyzing the internal structures, algorithms, inputs and outputs of implemented systems in theoretical and conceptual terms.
For reuse (and organization) of knowledge:
    For structuring or organizing libraries or repositories of plans and planning and domain information.
Table 3.1: Uses of Ontologies [Gruninger 2002]
In these terms, an ontology intended primarily for reuse and organisation of knowledge, for communication between humans, and for communication between humans and implemented computational systems could be useful in controlling terminology. There are knowledge base systems in existence which attempt to gather knowledge about a domain for these purposes.
Existing ontology-based approaches for data entry are, however, generally still limited to automatically generated forms-based data entry interfaces unless manual editing is used (e.g. [Gennari 2002]). A form tends to be generated for each class instance, with input/output widgets derived from the ‘slot’ data types. Links between forms are based on class subsumption relationships. See figure 3.1 for an example of a data entry form and figure 3.2 for an example of an editor for such forms that allows the representative concrete widget to be changed. These systems are designed to populate a knowledge base describing relationships between instances of interest, rather than to regulate the capture of the description of a complex concept.
Figure 3.1: Protégé Frames instances tab for knowledge acquisition [Stanford Medical Informatics 2005a]
Figure 3.2: Protégé Frames forms tab for tailoring instance editing forms [Stanford Medical Informatics 2005b]
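The form-generation approach described above, in which widgets are derived from slot data types and can then be changed in a forms editor, can be sketched in outline. This is not the Protégé API: the `Slot` class, the widget names and the `generate_form` function are hypothetical and only illustrate the general idea.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Slot:
    name: str
    value_type: str  # e.g. "string", "integer", "boolean", "symbol", "instance"


# Hypothetical default mapping from slot value type to a widget kind.
DEFAULT_WIDGET: Dict[str, str] = {
    "string": "text field",
    "integer": "numeric field",
    "boolean": "checkbox",
    "symbol": "drop-down list",
    "instance": "link to another form",
}


def generate_form(slots: List[Slot],
                  overrides: Optional[Dict[str, str]] = None) -> List[Tuple[str, str]]:
    """Return (slot name, widget) pairs for one class.

    `overrides` stands in for manual tailoring in a forms editor:
    it maps a slot name to a widget that replaces the default.
    """
    overrides = overrides or {}
    form = []
    for slot in slots:
        # Unknown value types fall back to a plain text field.
        widget = DEFAULT_WIDGET.get(slot.value_type, "text field")
        form.append((slot.name, overrides.get(slot.name, widget)))
    return form
```

For example, a class with a string slot "collector" and an integer slot "count" would yield a text field and a numeric field by default; passing `{"count": "slider"}` as `overrides` would replace only the second widget. The sketch makes the limitation noted above visible: the form mirrors the slot structure of the class rather than the way a domain user would naturally compose a description.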
3.2.3.3 Ontology Representations
Editing an ontology has analogies with building a structured proforma using a defined description terminology. However, ontology editor tools are designed with IT specialists in mind. Again tending towards forms-based interfaces, they conform to the structure of the ontology, as might be expected, and use the terminology of the ontology modelling language. Generally, ontology editors represent the class hierarchy with some form of file-tree visualisation, with linked columns or separate forms for associated instances, slots or other elements. Some of these editors do offer graphical views of class relationships, such as in figure 3.3; however, even using all the screen space, only a small number of nodes are clearly displayed in these visualisations.
Figure 3.3: OWLviz visualisation of an ontology, showing the subsumption class relationships. [Stanford Medical Informatics 2006]
There has also been substantial work by ontologists in the medical informatics field on displaying ontologies visually, in this case mainly for exploration. Many ontologies in the scientific and medical fields concentrate on logical relationships between concepts. They seek to use one consensual nomenclature (standard definitions for terminology). In biological ontologies, wide agreement on basic anatomical details and terms, together with detailed composition hierarchies, differentiates these ontologies from the taxonomic case. Nevertheless, sufficient parallels can be drawn in the underlying subject matter to warrant investigating some representative visualisation techniques used in medical ontologies.
The Gene Expression Information Resource Project [Davidson 1997, Baldock 2002], for example, includes an ontology of genes and anatomy of mice with a 3D atlas. The 3D atlas uses a high-resolution digital representation of mouse anatomy, using serial sections of embryos at various developmental stages taken from the ontology (see figures 3.4 and 3.5). The ontology allows users to visually explore the tissue representations and relate gene expressions or anatomical terms to parts of the visualisation.
The visually impressive multi-pane biological visualisations, with alternate views of physical representations, are not, however, possible in the proposed taxonomic description interface. A multimedia-based approach to description representation and exploration may be possible for individual plant descriptions, but each one would require special work unless suitable mapping abstractions could be found. Such visualisations are more analogous to multimedia character keys than to direct representations of character description and definition space. The use of linked display panes and details on demand is, however, shown to be a useful technique with such structured data.
Figure 3.5: Gene Expression Information Resource Project - MOUSE ATLAS (with overview navigation pane)