Chapter 5 Phase I: Soliciting User and Expert Views on a GEO Label
5.2 Results and Discussion
a) the certifications “help the user to find the fit-for-purpose datasets”;
b) the certifications “provide some kind of guarantee and standard procedures of claiming”;
c) “in our case, [the certifications] ensure to the community the quality of our productive processes and existence of documentation about all the process steps, indicators of quality control, etc.”; and
d) “Local NWS office or subsequent NCDC publication of observations are routinely used and trusted by partner agencies, industrial users, academics, and others. NCDC 'certified' observations are regularly used in legal cases to establish weather or hydrologic conditions at the scene”.
5.2.4 User and Producer Views on Certification of Geospatial Data
When asked their initial views as to whether geospatial data or metadata records would benefit from the application of certification programme(s), the opinions of users and producers were reasonably concordant: 49% of users and 57% of producers agreed that certification would be beneficial; only 12% of users and 13% of producers disagreed with the benefit of certification programmes; and 39% of users and 30% of producers were not sure. Many respondents agreed that data certification “would help to improve general quality of data […], would also help the user to know the limitations of the data, and if done in a standard, nonbiased manner, would be a useful comparison among similar datasets”. One of the data users pointed out that “the amount of geospatial data without proper metadata is overwhelming; establishing the quality of data by trial and error takes too much time to benefit from the increasing availability of data”. Respondents argued that certification would “ensure that a minimum amount of information is captured”, that “it would encourage data providers to follow a common framework of data formats or metadata provision”, and that it would ensure consistency of data quality. Furthermore, respondents suggested that certification would help to “identify datasets from authoritative sources and distinguish those datasets from similar data of uncertain quality”. Respondents also confirmed the previous findings on the importance of provenance and licensing information, stating that certification could ensure that metadata “provides a record of the history of the processing, which tends to be neglected if there is a long path towards the final product. It may help with IP issues and similar questions. Some datasets are provided with restrictions on them which can be difficult to sort out after the event. Resolving them becomes a paper chase”.
Although geospatial data certification was viewed positively, some study respondents indicated that certification would never outweigh other important data characteristics such as data content, citation information and peers’ recommendations. Generally, users appear to view geospatial data certification as a type of formal data quality control (e.g., certification of a dataset’s conformance to a defined level of uncertainty, accuracy or resolution, or of data/metadata interoperability) and as control over metadata completeness; data content, peer review and citations do not appear to be considered as potential certification metrics. One study respondent highlighted that the “most important quality indicator is actually ratings by users (partly equivalent to peer review)”. Another respondent further argued:
“In general it would be good to know whether fundamental metadata is attached to the data in a form that is straightforward to use. This might be helpful when browsing the data, but when it comes down to the choice of what to use for your application I do not see how this would circumvent the process of looking through metadata, searching the literature and talking to other users”.
Consistent with the initial interview results, these findings indicate the value of producer-supplied metadata records as well as more subjective information to support dataset quality evaluation.
In contrast, those respondents who disagreed with certification stated that “certification is an extra effort […and] it seems unlikely asking for more extra effort will improve the current [situation] on a broad basis”. Data producers stated that they “have in-house quality checks and provide information on these [and the] procedures are ISO-certified already”. ISO, OGC, IETF, OASIS, W3C, etc. also have their own conformance schemes; consequently, “there are already too many labels/certificate/etcetera in the world, and it is VERY difficult to distinguish the really useful/neutral labels with industry 'self-labelling' things”. The cost of certification also raised concern, because the “regimented bureaucracy of [an] approval process may be costly and difficult for data providers to implement” and “if you need too much time to produce this certification, the updating of the data can be compromised”. Certification could also potentially “reduce the number of datasets made available (due to publishers not wanting to go through the certification process – it is hard enough to encourage filling in of Metadata records)”. This would not be desirable because certification would “make the data
Finally, a number of study respondents did not believe that certification of geospatial data quality was feasible because “quality is in the eye of the beholder, so that is not certifiable”. It was argued that “at best you can certify whether the producer followed recommended practice of documentation and stewardship”, and it was suggested that “data is used for a range of purposes, and data sets not passing the certification process could sometimes be better suitable for [one’s] purpose than certified ones”.