Geospatial Data Quality Indicators and Attributes

Chapter 2 Background Literature Review

2.4 Geospatial Data Quality

2.4.2 Geospatial Data Standards and Quality Indicators (Internal Quality)

2.4.2.1 Geospatial Data Quality Indicators and Attributes

The importance of spatial data quality indicators is widely recognised in the scientific literature (e.g., Caprioli et al., 2003; Devillers et al., 2007; Wang and Huang, 2007). Devillers et al. (2002, p. 50) argue that quality indicators are “a way of seeing the big picture by looking at a small piece of it”. They suggest that quality indicators can give users a global measure of quality without requiring them to examine the data in much detail. Indicators significantly simplify quality evaluation, decision-making and justification processes by providing a number of quality cues that are easy to manage and that avoid information overflow (Devillers et al., 2007). Many researchers refer to the ‘famous five’ as the common criteria for evaluating spatial data quality (Duckham, 2000; Pundt, 2002), namely: lineage; completeness; consistency; positional accuracy; and attribute accuracy. Devillers et al. (2007) refine the ‘famous five’ into the following common spatial data quality elements: positional accuracy; attribute accuracy; temporal accuracy; logical consistency; and completeness. Caprioli et al. (2003) identify four major elements of spatial data quality: accuracy; resolution; consistency; and completeness. They further refine accuracy into spatial, temporal and thematic accuracy. Each of these commonly accepted spatial data quality attributes is discussed in more detail below.

Accuracy

significantly because highly accurate data can be costly and complex to produce. The concept of geospatial data accuracy can be refined to horizontal, vertical, attribute, conceptual, and logical accuracy. Accuracy is a relative measure and always depends on some defined specification of a true value.

Attribute/Thematic Accuracy

Attribute or thematic accuracy denotes the correctness of object classifications and the level of precision of attribute descriptions in the produced data (Cockcroft, 1997). A dataset can have high positional accuracy and still contain misclassified objects or provide a low level of detail. For instance, a line that denotes a river can be misclassified as a road. Conversely, an object can be correctly classified but insufficiently described; for instance, a farm object can lack descriptions of the farmer or the crops.
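The two failure modes described above (misclassification and insufficient description) can be illustrated with a minimal Python sketch. The function name, the dictionary layout, and the idea of a per-class list of required attributes are assumptions made for illustration; they are not taken from the cited literature.

```python
def attribute_accuracy(features, reference_classes, required_attributes):
    """Return (share of correctly classified features, missing attributes per feature).

    features:            {feature_id: {"class": str, "attributes": dict}}  (hypothetical layout)
    reference_classes:   {feature_id: str}, the classes accepted as true
    required_attributes: {class_name: [attribute names expected for that class]}
    """
    if not features:
        raise ValueError("features must not be empty")
    correct = 0
    missing = {}
    for fid, feature in features.items():
        # Classification check: compare against the reference class.
        if feature["class"] == reference_classes[fid]:
            correct += 1
        # Description check: list required attributes that are absent or empty.
        attrs = feature.get("attributes", {})
        absent = [name for name in required_attributes.get(feature["class"], [])
                  if attrs.get(name) in (None, "")]
        if absent:
            missing[fid] = absent
    return correct / len(features), missing


# A river misclassified as a road, and a farm without a crops description.
features = {
    1: {"class": "road", "attributes": {}},
    2: {"class": "farm", "attributes": {"farmer": "A. Smith"}},
}
reference_classes = {1: "river", 2: "farm"}
required_attributes = {"farm": ["farmer", "crops"]}

rate, missing = attribute_accuracy(features, reference_classes, required_attributes)
```

In this example one of the two features is classified correctly, and the farm object is flagged because its crops description is missing.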

Positional/Spatial Accuracy

Positional or spatial accuracy is the level of accuracy of the spatial objects in a dataset (Stein and van Oort, 2006). It is defined as “the difference between the recorded location of a feature in a spatial database or in a map and its actual location on the ground, or its location on a source of known higher accuracy” (Tucci and Giordano, 2011, p. 453). Positional accuracy can be refined into horizontal and vertical accuracy, which apply to the horizontal and vertical positions of the captured data respectively.
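As a minimal sketch of this definition, the differences between recorded and reference locations can be summarised separately for the horizontal and vertical components. The use of root mean square error (RMSE) as the summary statistic is an assumption made here for illustration; the definition quoted above speaks only of a difference. The function name and sample coordinates are hypothetical.

```python
import math


def positional_rmse(recorded, reference):
    """Return (horizontal_rmse, vertical_rmse) for paired (x, y, z) points.

    `reference` holds the locations accepted as true, for example from a
    source of known higher accuracy. RMSE is an assumed summary measure.
    """
    if not recorded or len(recorded) != len(reference):
        raise ValueError("need two non-empty point lists of equal length")
    sum_sq_h = 0.0
    sum_sq_v = 0.0
    for (x, y, z), (rx, ry, rz) in zip(recorded, reference):
        sum_sq_h += (x - rx) ** 2 + (y - ry) ** 2  # squared horizontal offset
        sum_sq_v += (z - rz) ** 2                  # squared vertical offset
    n = len(recorded)
    return math.sqrt(sum_sq_h / n), math.sqrt(sum_sq_v / n)


# Hypothetical coordinates in metres.
recorded = [(3.0, 4.0, 10.0), (0.0, 0.0, 5.0)]
reference = [(0.0, 0.0, 10.0), (0.0, 0.0, 7.0)]
h_rmse, v_rmse = positional_rmse(recorded, reference)
```

Reporting the two components separately mirrors the horizontal and vertical refinement described above.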

Temporal Accuracy

Temporal accuracy is the difference between encoded dataset values and the true temporal values of the measured entities (Veregin, 1999; Devillers and Jeansoulin, 2006). It only applies when the dataset has a temporal (time) dimension in the form of [x, y, z, t]. Temporal accuracy thus concerns the time stamps applied to the entities in the dataset. It is often confused with data currency, although the two concepts are quite distinct: currency refers to the degree to which a database is up to date (Veregin, 1999).
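The distinction between temporal accuracy and currency can be made concrete with a short Python sketch. Both function names and all dates are hypothetical; the sketch only assumes that a true event time and a last-update time are available for comparison.

```python
from datetime import datetime


def temporal_error(encoded_time, true_time):
    """Temporal accuracy: offset between the encoded time stamp and the true event time."""
    return encoded_time - true_time


def currency_age(last_updated, now):
    """Currency: how long ago the database was last brought up to date."""
    return now - last_updated


# An entity stamped 30 minutes later than the event actually occurred.
error = temporal_error(datetime(2020, 5, 1, 12, 0), datetime(2020, 5, 1, 11, 30))

# A database last updated one year before the time of use.
age = currency_age(datetime(2019, 1, 1), datetime(2020, 1, 1))
```

A dataset can therefore carry accurate time stamps and still be out of date, or be recently updated and still carry wrong time stamps.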

Lineage

The lineage of geospatial data is the historical information about the data, describing how the data has been collected and processed to arrive at the final data product (Stein and van Oort, 2006). Geospatial data lineage includes information on the data source, data producer, data content, capturing effort, the methodology applied to collect the data, the processing steps and algorithms applied to derive the data product, the geographic coverage of the data, and other historic information.

Completeness

Geospatial data completeness measures the omission error in the data and its compliance with the data specification. From a data supplier’s point of view, it can be defined as “a measure of the degree to which data content corresponds to the real world in accordance with the data capture specification, dataset coverage, and at the level of currency required by the update policy” (Harding, 2006, p. 150). Highly generalised data can be accepted as complete if it complies with its specification of coverage, classification and verification.
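Omission error can be sketched as the share of features required by the capture specification that are absent from the dataset. The function name and the use of feature identifiers are assumptions for illustration; the sketch is not a standardised completeness measure.

```python
def omission_rate(dataset_ids, expected_ids):
    """Return (omission rate, omitted ids) relative to the capture specification.

    expected_ids: identifiers of the features the specification requires.
    Features present in the dataset but not expected are ignored here,
    because only omission is being measured.
    """
    expected = set(expected_ids)
    if not expected:
        raise ValueError("the specification must list at least one feature")
    omitted = expected - set(dataset_ids)
    return len(omitted) / len(expected), omitted


# Hypothetical identifiers: two of the four required features are missing.
rate, omitted = omission_rate(["a", "c", "x"], ["a", "b", "c", "d"])
```

Because the rate is computed against the specification and not against the real world directly, a highly generalised dataset can still score as complete, as noted above.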

Consistency

Geospatial data consistency can be defined as the absence of conflicts or contradictions in a dataset (Caprioli et al., 2003). It includes logical consistency, topological consistency, temporal consistency, and thematic consistency. Logical consistency relates to the structures and attributes of geospatial data and defines compatibility between dataset objects – e.g., the variables used adhere to the appropriate limits or types (Servigne et al., 2006). Topological consistency is the dataset's compliance with topological rules – e.g., no objects can have x-coordinate values below 0, and polygons cannot intersect (Caprioli et al., 2003). Temporal consistency is conformance to temporal topology rules – e.g., the dataset rules can specify that only one event can occur in one place at a given time. Temporal consistency relates to dates of data acquisition, types of updates, and validity periods (Servigne et al., 2006). Thematic consistency measures conflicts in thematic attributes – e.g., a population density value must be correct given population and area (Caprioli et al., 2003).
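Two of the example rules above, the population density rule and the one-event-per-place-and-time rule, can be expressed as simple checks. This is a minimal Python sketch: the record layout, the function names, and the tolerance value are assumptions for illustration, and the area is assumed to be greater than zero.

```python
import math
from collections import Counter


def thematic_conflicts(records, rel_tol=1e-6):
    """Return ids of records whose stored density disagrees with population / area."""
    return [
        r["id"]
        for r in records
        # Assumes r["area"] > 0; a zero area would itself be a logical inconsistency.
        if not math.isclose(r["density"], r["population"] / r["area"], rel_tol=rel_tol)
    ]


def temporal_conflicts(events):
    """Return (place, time) pairs at which more than one event is recorded."""
    counts = Counter((e["place"], e["time"]) for e in events)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical records: the second density should be 30.0, not 25.0.
records = [
    {"id": 1, "population": 1000, "area": 50.0, "density": 20.0},
    {"id": 2, "population": 300, "area": 10.0, "density": 25.0},
]
# Hypothetical events: two events share place P1 and the same date.
events = [
    {"place": "P1", "time": "2020-01-01"},
    {"place": "P1", "time": "2020-01-01"},
    {"place": "P2", "time": "2020-01-01"},
]

bad_density = thematic_conflicts(records)
clashes = temporal_conflicts(events)
```

Topological rules such as non-intersecting polygons would need a geometry library and are left out of this sketch.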

Resolution

Resolution is the amount of detail that geospatial data contains and is also known as precision or granularity (Caprioli et al., 2003). Resolution is always finite because no measurement system can be infinitely precise. High resolution does not necessarily mean better fitness for use: in some cases low resolution may be required to formulate more general models.

The broad scientific acceptance of the common spatial quality elements does not imply that they apply to all cases of quality or fitness for use evaluation (Pundt, 2002), since user requirements can go far beyond the widely accepted ‘famous five’. While no tangible user-defined quality indicators that specifically assist fitness for use evaluation have been identified, there are many existing forms of metadata (such as documentation describing the subjective quality measures outlined in this section) which can potentially be used to this end if they are consistently supplied and can be easily viewed by users through the prism of their own priorities.

In document Visualisation of quality information for geospatial and remote sensing data:providing the GIS community with the decision support tools for geospatial dataset quality evaluation (Page 49-52)