Domain Models - Data Entry Process - Interactive visualisation tools for supporting taxonomists

Data Entry Process

7.2 Domain Models

The data entry interface operates on the specialised domain model described in previous chapters. This section will only cover extra elements of that model required to accurately capture the data entered by users.

7.2.1 Specimens

Each specimen being described requires a unique identifier to allow the system to track the instantiated data values for each specified attribute in the specialised domain model. The system provides such an identifier when users indicate they desire to instantiate a new specimen, although users can override the identifier with one of their own choosing to match domain or other practical considerations.

One XML file is exported for each instantiated specimen, containing the instantiated portions of the specialised domain model. This file can be exported to a database or used to reload the data for the instantiated specimen back into the system for further data entry.
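The export and reload cycle described above could be sketched as follows. This is only an illustration using Python's standard `xml.etree.ElementTree` module. The element names (`specimen`, `instance-score`, `value`) and the function names are assumptions made for the example and are not the system's actual XML schema.

```python
import xml.etree.ElementTree as ET


def specimen_to_xml(specimen_id, scores):
    """Serialise one specimen to an XML string.

    `scores` is a list of (attribute_name, [values]) pairs, one pair per
    instance-score. All element and attribute names are hypothetical.
    """
    root = ET.Element("specimen", {"id": specimen_id})
    for attribute, values in scores:
        score_el = ET.SubElement(root, "instance-score", {"attribute": attribute})
        for value in values:
            ET.SubElement(score_el, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


def specimen_from_xml(xml_text):
    """Reload the specimen identifier and its scores from an XML string."""
    root = ET.fromstring(xml_text)
    scores = []
    for score_el in root.findall("instance-score"):
        values = [value_el.text for value_el in score_el.findall("value")]
        scores.append((score_el.get("attribute"), values))
    return root.get("id"), scores


# Round trip: export a specimen, then reload it for further data entry.
exported = specimen_to_xml("SP-001", [("petal colour", ["white", "purple"])])
reloaded_id, reloaded_scores = specimen_from_xml(exported)
```

Values are reloaded as strings in this sketch; a real implementation would presumably restore numeric types from the value domain of each attribute.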

7.2.2 Multiple values

The domain model uses a system object called an instance-score to record instantiated values for an attribute of a specimen. This is straightforward when a user enters a single value for a specialised attribute. The domain model must also be able to distinguish and record accurately cases where users enter multiple values for an attribute.

7.2.2.1 Ranges

Multiple values can form a range for attributes with a numerical entry value domain, where the description object is being described abstractly.

However, ranges do not apply to attributes whose value constraints restrict entry to selection from defined value objects. It was decided not to allow the value objects of a value domain to form ranges, as this would require them to lie on a continuum. This decision was made primarily to protect the independence of the value object definitions and to retain clarity of entered data. The ordering of such a continuum would be highly subjective. Take, for example, the various outline shape value objects from which some taxonomists initially believed they could form a range. It quickly became apparent from informal exercises that users were unable to interpret accurately the meaning of each other’s ranges, as they were not forming the same mental ‘natural’ continuum. The difficulty can be seen by considering how one would order a series of ordinary shapes such as square, rectangular, circular, oval and triangular. One could start with the shapes that use straight lines and move to the more circular, or one could order them by the number of points, by their symmetry, and so on. Published descriptions were observed to use such qualitative ranges; however, taxonomists were unable to interpret those ranges unambiguously.

If the value objects in a value domain were part of a continuum, users would need to include the whole continuum in the specialised domain model, or comparability would not be possible. Take a very simple example with six value objects in a continuum: A, B, C, D, E, F. User #1 includes only A, D and F in their specialised domain model and then records a range D-F for a specimen. User #2 includes B, C, D and E and records D-E in data entry for a specimen. At first glance these describe different real-world values. However, user #2 does not have F as a possible point and hence may have recorded E as the closest available point, so the two users may in fact be describing the same real-world range. More subtle and complex examples can readily be conceived, introducing an area of uncertainty into the interpretation of recorded data.

Anyone interpreting the results would always need to see the whole set of value objects that had formed the value domain to understand the meaning of the range. Essentially a range of value objects would only be shorthand for an enumerated set of value objects. Given the subjective nature of any ordering and the need for every point along a continuum, it was considered to be better to constrain users to explicitly selecting all value objects that were applicable to a specimen. This also fits with the domain model assumption that each value object definition is self-contained and is not reliant on other value objects in order to interpret its meaning clearly.
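The point that a range of value objects is only shorthand for an enumerated set can be illustrated with the A to F example. The sketch below is purely illustrative: the function name `expand_range` is an assumption, and the fixed ordering in `CONTINUUM` is exactly the shared continuum that, as argued above, users cannot be relied upon to agree on.

```python
# The full ordered continuum from the example. Assumed here to be agreed;
# in practice this ordering is the subjective part.
CONTINUUM = ["A", "B", "C", "D", "E", "F"]


def expand_range(start, end, included):
    """Enumerate the value objects a recorded range stands for.

    Only value objects that the user included in their specialised
    domain model can be part of the result.
    """
    low, high = CONTINUUM.index(start), CONTINUUM.index(end)
    return [v for v in CONTINUUM[low:high + 1] if v in included]


# User #1 included A, D, F and recorded the range D-F.
user1 = expand_range("D", "F", included={"A", "D", "F"})
# User #2 included B, C, D, E and recorded the range D-E.
user2 = expand_range("D", "E", included={"B", "C", "D", "E"})
```

Here `user1` expands to D and F while `user2` expands to D and E, yet neither result can be interpreted without knowing which value objects each user had available. Recording the enumerated set directly removes that dependency.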

Numeric values do have a well understood and unambiguous range mechanism and so ranges of them are supported. An instance-score of a numeric attribute can thus contain either two numeric values, which represent a range, or one numeric value, which represents a single value.

When a description object is being recorded concretely (see chapter 6 for concrete description status details), ranges are not considered suitable, as individual real-world description object instances are being recorded.
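A minimal sketch of a numeric instance-score under these rules is given below. The class name `NumericScore` and its fields are assumptions made for illustration; the source does not specify how instance-scores are implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NumericScore:
    """Hypothetical instance-score for a numeric attribute."""
    low: float
    high: Optional[float] = None  # None means a single value, not a range
    concrete: bool = False        # True for a concrete description object

    def __post_init__(self):
        if self.high is not None:
            if self.concrete:
                # Ranges are not suitable when recording concretely.
                raise ValueError("ranges are not allowed for concrete description objects")
            if self.high < self.low:
                raise ValueError("upper bound must not be below lower bound")

    @property
    def is_range(self) -> bool:
        return self.high is not None


single = NumericScore(12.5)     # one value: a single measurement
span = NumericScore(3.0, 7.0)   # two values: a range
```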

7.2.2.2 AND/OR

Apart from numeric ranges, cases involving multiple values are referred to as AND-ing or OR-ing. Both cases can apply to attributes with a value domain of value objects.

It is necessary to distinguish between these two circumstances: where multiple values all apply to each of a number of real-world instances of the description object in the high-level concept (AND-ing), and where the attribute has different values on different individual real-world instances of the description object (OR-ing). For example, a specimen may have petals that are each white and purple, as opposed to a specimen whose individual petals are either white or purple. The permutations of this situation can be quite complex. Multiple values for an attribute that are AND-ing are contained within one instance-score. Multiple values for an attribute that are OR-ing are contained in separate instance-scores that are linked to the same high-level concept.
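The storage rule for AND-ing and OR-ing can be sketched as a small data structure. The class and method names below (`InstanceScore`, `Concept`, `record`, `alternatives`) are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class InstanceScore:
    """Values inside one instance-score are AND-ed together."""
    attribute: str
    values: frozenset


@dataclass
class Concept:
    """A high-level concept; separate scores for one attribute are OR-ed."""
    name: str
    scores: list = field(default_factory=list)

    def record(self, attribute, *values):
        """Add one instance-score holding the given AND-ed values."""
        self.scores.append(InstanceScore(attribute, frozenset(values)))

    def alternatives(self, attribute):
        """Return the OR-ed alternatives, each a set of AND-ed values."""
        return [s.values for s in self.scores if s.attribute == attribute]


# Every petal is white and purple: one instance-score with two values.
bicoloured = Concept("petal")
bicoloured.record("colour", "white", "purple")

# Petals are either white or purple: two instance-scores with one value each.
mixed = Concept("petal")
mixed.record("colour", "white")
mixed.record("colour", "purple")
```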

7.2.3 Concrete description objects

Instance-scores are grouped together for each concrete description object instance (see 5.3.4.3), using a sequential numerical identifier. As each group refers to only one real-world instance, OR-ing is not permitted for attributes of a concrete description object instance.

7.2.4 Modifiers

Various descriptive modifier terms can be mapped from the ontology to the domain model. These modifiers come in modifier groups (see figure 5.2) that restrict how they can be applied. Some of these modifiers have allowed relationships to multiple description objects and/or attributes (relative and spatial modifiers in the angiosperm ontology). These are applied to attributes in the specialisation process (see 5.3.4.4). Other modifiers are applied to an attribute by users during data entry. These groups may be restricted, based upon concrete status, as to which attributes they can apply to. In the angiosperm ontology, the modifier groups ‘locator modifiers’ (e.g. ‘at/on apex’), ‘frequency modifiers’ (e.g. ‘rarely’) and ‘qualifier modifiers’ (e.g. ‘approximately’) fall into this category. The cardinality of applying modifiers from modifier groups is derived from the mapped ontology. By default, if the ontology does not state otherwise, only one modifier from each group is permitted to apply to any instance-score of an instantiated attribute, for reasons of simplicity and to avoid over-use of modifiers.
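The default cardinality rule can be sketched as a simple check on the modifiers applied to one instance-score. The function name, the shape of its arguments and the modifier ‘usually’ are assumptions for illustration; only the group names and the default of one modifier per group come from the text.

```python
def groups_over_limit(applied, limits=None, default_limit=1):
    """Return the modifier groups that exceed their allowed cardinality.

    `applied` is a list of (group, modifier) pairs on one instance-score.
    `limits` maps a group name to the cardinality stated in the ontology;
    groups missing from it fall back to `default_limit`.
    """
    limits = limits or {}
    counts = {}
    for group, _modifier in applied:
        counts[group] = counts.get(group, 0) + 1
    return [g for g, n in counts.items() if n > limits.get(g, default_limit)]


# One modifier from each of two groups: allowed by default.
ok = groups_over_limit([("frequency", "rarely"), ("qualifier", "approximately")])

# Two frequency modifiers ('usually' is a hypothetical term): rejected by default.
too_many = groups_over_limit([("frequency", "rarely"), ("frequency", "usually")])
```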

These modifiers aim to give users the ability to qualify their descriptive statements without resorting to free text entry. This helps a user’s descriptive observation to be communicated accurately and consistently, and avoids loss of such data. However, the value of such modifiers for any form of automated comparison using a database is doubtful.

7.2.4 Not scored statement

The domain model has provision for each specialised attribute to be marked as not-scored. This allows the data entry user to make a positive statement that although an attribute has not been instantiated for the specimen, this has been done for a reason and not simply due to an error of omission. No other data is recorded for an attribute when the not-scored statement is recorded.

Taxonomic specimen descriptions are beset by inconsistent recording of characters: a character is recorded for some specimens but not for others. In the omitted cases, later interpreters are left uncertain whether the character was not present, not the same as in other explicitly recorded cases, not able to be measured due to specimen condition, not interesting enough to the recorder, or simply omitted in error. By providing the facility for a positive statement, the data can be interpreted with greater clarity.
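The distinction between a deliberate not-scored statement and a simple omission can be sketched as three separate states for an attribute. The class `AttributeEntry` and its method names are assumptions. In particular, refusing to mark an attribute as not-scored once values exist is one possible reading of the rule that no other data is recorded alongside the statement.

```python
class AttributeEntry:
    """Hypothetical record of data entry for one specialised attribute."""

    def __init__(self, name):
        self.name = name
        self.values = []
        self.not_scored = False

    def add_value(self, value):
        if self.not_scored:
            raise ValueError("attribute is marked not-scored; no data may be recorded")
        self.values.append(value)

    def mark_not_scored(self):
        # Assumption: existing values must be removed by the user first.
        if self.values:
            raise ValueError("attribute already has recorded values")
        self.not_scored = True

    def status(self):
        """Distinguish a positive statement from a plain omission."""
        if self.not_scored:
            return "not scored"
        return "scored" if self.values else "unrecorded"


leaf_length = AttributeEntry("leaf length")
leaf_length.mark_not_scored()        # deliberate, positive statement
leaf_width = AttributeEntry("leaf width")  # never touched: possible omission
```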

7.2.5 Description object presence attribute

Presence is a special attribute that was added to the model during development. This attribute is included in the specialised domain model for every description object that has any included specialised attributes and is treated in a special manner by the data entry presentation model. Presence has a value domain with four value objects: ‘present’, ‘absent’, ‘not scored’ and ‘no comment’. When an ontology is imported, the system attempts to map an ontology ‘presence’ attribute (and relevant value objects) to the system presence attribute. If an ontology-based term can be mapped, then the exported descriptions will utilise it. If no mapping can be formed, the system will use a default in-built presence attribute.

The presence attribute is recorded for all description objects with specialised attributes. When the description object’s presence attribute is recorded with the value object ‘not scored’, no other data is recorded for its specialised attributes. When the description object is recorded as ‘absent’, no other data is recorded for its attributes and all description objects below it in the hierarchy are also recorded as absent.
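The consequences of recording presence can be sketched as a small recursive rule over the description object hierarchy. The class `DescriptionObject`, its fields and the example object names are assumptions for illustration. Discarding previously entered scores when presence changes is also an assumption, since the text only states that no other data is recorded.

```python
PRESENCE_VALUES = ("present", "absent", "not scored", "no comment")


class DescriptionObject:
    """Hypothetical node in the description object hierarchy."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.presence = None   # not yet recorded
        self.scores = {}       # attribute name -> recorded values

    def set_presence(self, value):
        if value not in PRESENCE_VALUES:
            raise ValueError(f"unknown presence value: {value!r}")
        self.presence = value
        if value in ("absent", "not scored"):
            # No other data is recorded for the object's attributes.
            self.scores.clear()
        if value == "absent":
            # Everything below an absent object is also absent.
            for child in self.children:
                child.set_presence("absent")


petal = DescriptionObject("petal")
petal.scores["colour"] = ["white"]
corolla = DescriptionObject("corolla", [petal])
flower = DescriptionObject("flower", [corolla])

flower.set_presence("absent")  # cascades to corolla and petal
```

Note that only ‘absent’ cascades in this sketch; ‘not scored’ affects the attributes of the object itself, as described above.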

Enforcing the recording of presence aims to improve the clarity of the data. As with the ‘not scored’ statement (see above), this eliminates interpretative uncertainty when no data is recorded for a description object. It also ensures that the logic of the description object hierarchy relationships is reflected in recorded data. In addition to the automatically included presence attribute, users can specialise and include any ontology-based presence attributes that would normally be supported by ontology relationships. These user-specialised presence attributes do not have the same consequences, as their specialisation may have altered the concept so that the underlying logic no longer applies.
