While any generic, shared data model could provide these benefits and allow the provenance of results to be determined, the data model that we specify is designed to support the creation of accurate process documentation. In Section 1.3, we outlined a number of characteristics that help ensure accurate process documentation. We termed process documentation that adheres to these characteristics high quality. These characteristics were derived from an analysis of use cases from several domains [126]. We looked at both the technical requirements enumerated in the analysis and the use cases themselves. We found that in a majority of the use cases, process documentation provides evidence that a process occurred. Thus, these characteristics are justified by philosophical arguments that equate process documentation with evidence. Beyond these philosophical arguments, a number of the characteristics also directly support a technical requirement enumerated in the use case analysis. We now revisit these characteristics and justify them in greater detail.
Characteristic 1 (Factual). In the previous chapter, we noted that a number of provenance systems produce documentation that contains both factual and inferred information. With this combination, it is difficult to determine whether the process evidenced by the documentation actually occurred as described. Thus, we introduce the notion that process documentation should be factual: it should only be about what is known to have occurred in an application. To support this characteristic, we adopt the notion of observation by participation; that is, process documentation that evinces a particular application operation should only be created by the component that performed the operation.
Characteristic 2 (Attributable). In a court of law, evidence, particularly testimony, is judged by the person or institution who provides it. Furthermore, if it is found that the evidence given is false, then remedial action can be taken against the provider. Similarly, if a user deems that process documentation is somehow erroneous, the user must know who is responsible for the creation of the documentation so that the party can be held accountable. Ensuring that users know the accountable party gives them greater confidence in process documentation.
Characteristic 3 (Autonomously Creatable). In both criminal and scientific investigations, evidence is gathered at the most appropriate time and by the most appropriate person, device, or institution. By analogy, the distributed components of a
Chapter 3 A Model of Process Documentation 45
multi-institutional scientific system should be able to create process documentation at a convenient point in time without having to synchronise with any other component. Furthermore, it should be possible to collate process documentation created in an autonomous manner to present a complete representation of processes that occur across multiple components.
Characteristic 4 (Process Oriented). Evidence is only useful in a court if it can be put together to make a convincing case that a particular crime occurred. Likewise, process documentation is only useful if it can be put together to show that a process occurred. Thus, for process documentation to be useful, it must be connected such that the process that led to a result can be determined from it. We term this notion of connectedness process orientation. Therefore, for the data model defined here to meet the motivating use case, it must be process oriented.
Characteristic 5 (Immutable). In a legal setting, once evidence has been collected, it is criminal to tamper with it because doing so corrupts the view the court has of what occurred. Therefore, if we treat process documentation as evidence, once an application has created it for a particular execution, it should not be modified or deleted. Often, it is not apparent that process documentation is useful until it is needed. Indeed, many users are caught off-guard when data they previously produced is deleted. Thus, it is important to maintain process documentation even when it is thought to be unimportant. Furthermore, in a multi-institutional setting, where the provenance of data might be used for scientific or legal verification, process documentation must not be tampered with. Hence, process documentation should be made immutable after creation.
Characteristic 6 (Finalizable). If process documentation is created in the context of a dynamic multi-institutional system, it is helpful to know when the components within the system have fully documented their processes so that the system can be disbanded or the component can be relinquished. Furthermore, it is helpful for users to know when full evidence (i.e. process documentation) has been provided for a particular process; otherwise, it is hard to know when a judgement can be made about the evidence. Thus, process documentation should be finalizable (i.e. markable as the final representation of a past process).
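To make the six characteristics above more concrete, the following is a minimal Python sketch of how they might surface in a record type and a store. It is an illustration only and is not the data model specified in this chapter: the names Assertion, DocumentationStore, record, finalize and process, together with the field layout, are hypothetical and chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)  # Immutable: fields cannot be reassigned after creation
class Assertion:
    """One piece of process documentation (hypothetical record type)."""
    asserter: str    # Attributable: the component accountable for the record
    operation: str   # Factual: an operation the asserter itself performed
    process_id: str  # Process oriented: connects records of one process


class DocumentationStore:
    """Append-only store (an illustrative assumption, not the thesis design)."""

    def __init__(self) -> None:
        self._records: List[Assertion] = []
        self._finalized: Set[Tuple[str, str]] = set()

    def record(self, assertion: Assertion) -> None:
        # Autonomously creatable: each component submits on its own,
        # without synchronising with any other component.
        if (assertion.asserter, assertion.process_id) in self._finalized:
            raise ValueError("documentation already finalized")
        self._records.append(assertion)

    def finalize(self, asserter: str, process_id: str) -> None:
        # Finalizable: the asserter marks its documentation as complete.
        self._finalized.add((asserter, process_id))

    def is_finalized(self, asserter: str, process_id: str) -> bool:
        return (asserter, process_id) in self._finalized

    def process(self, process_id: str) -> Tuple[Assertion, ...]:
        # Collate records from all components for a single process.
        return tuple(r for r in self._records if r.process_id == process_id)
```

In this sketch, finalization is tracked per component and per process, so one component can declare its documentation complete while others continue to record. This is one possible reading of the characteristic, made for the sake of the example.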
We term process documentation that possesses these six characteristics high quality process documentation. We now present our data model for process documentation called the p-structure. After presenting the p-structure in detail, we show how the design decisions for the p-structure support both these high quality characteristics and our analysis conclusions from Section 2.5.