One of the consequences of applications that span multiple institutions and the use of the above deployment patterns is that process documentation need not be centrally located and can reside across various locations. There are several benefits to this: the elimination of a central point of failure, the spreading of demand across multiple services and the ability for provenance stores to exist in different network areas (for example, one provenance store may be behind a firewall whereas another is not). In general, allowing p-assertions to be recorded across multiple locations increases the flexibility and scalability of systems recording p-assertions.
However, to retrieve the provenance of a result, the distributed process documentation must be connected so that all of its parts can be found. The technique of linking, discussed below, enables this distribution.
Given that the p-assertions documenting a given execution may be spread across multiple stores, there must be some mechanism to retrieve these p-assertions in order to validate, visualise or replay the represented process. To facilitate such a retrieval mechanism, we introduce the notion of a link defined as follows.
Definition 4.1. A link is a pointer to a provenance store.
We note that links are necessarily unidirectional: a link always points to a remote provenance store location. Links are used in two instances, which we now describe.
4.3.1 View Links
The first use of a link deals with the situation where a sender’s view of an interaction and a receiver’s view of the same interaction as identified by a shared interaction key are stored in two different provenance stores. It is necessary for each actor to record a link, which we refer to as a View Link, that points to the provenance store where the opposite party recorded their p-assertions. Thus, the sender in an interaction records a link to the provenance store that the receiver used to record p-assertions for the given interaction, and vice-versa. This allows querying actors to navigate from one provenance store to the other in order to retrieve both views of an interaction. We note that View Links point to provenance stores only, not to particular pieces of data in a provenance store; the actual data of interest can be found by a local search of the provenance store.
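The reciprocal recording described above can be sketched as follows. This is a minimal illustration, not the recording interface itself: the `Link` and `ProvenanceStore` classes, the `record_view_links` helper and the example URLs are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Link:
    """A pointer to a provenance store (Definition 4.1); it names a store, not a record."""
    store_url: str


@dataclass
class ProvenanceStore:
    """Hypothetical in-memory store: interaction key -> list of (view, content)."""
    url: str
    records: dict = field(default_factory=dict)

    def record(self, interaction_key, view, content):
        self.records.setdefault(interaction_key, []).append((view, content))


def record_view_links(interaction_key, sender_store, receiver_store):
    """Each party records a View Link to the store used by the opposite party."""
    sender_store.record(interaction_key, "sender", Link(receiver_store.url))
    receiver_store.record(interaction_key, "receiver", Link(sender_store.url))


pa = ProvenanceStore("http://example.org/PA")  # placeholder URLs (assumption)
pb = ProvenanceStore("http://example.org/PB")
record_view_links(2, pa, pb)
```

A querier that finds the sender's view under the shared interaction key in one store can read the link, move to the other store, and search there locally for the same key.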
Chapter 4 Recording Process Documentation 78
If an actor, A, interacting with actor B has to assert, in provenance store PA, that B is recording its view of the interaction in another provenance store, then actor A has to become aware that the store used by B is PB. Either such knowledge is built into A, or it is communicated to A in the course of application execution. If it is built into A, then such knowledge is already part of A’s scope, and can be asserted by A as an internal information p-assertion. Alternatively, if it is to be communicated to A, then such knowledge can be passed as part of a context, as shown by the ContextPassing pattern (for instance, when B returns a result to A). Hence, A can assert the link as part of an interaction p-assertion. When A and B do not have a request-response interaction, B can communicate the link to PB to A by an out-of-band mechanism.
One concern with View Links is the possible performance overhead of passing the provenance store location back to the sender. We have tried to minimize this impact by allowing the information to be sent in response messages. However, in a very dynamic system where an actor uses a different provenance store for every interaction, there may be significant overhead. Developers should be aware of this limitation when using View Links.
View Links are stored in provenance stores through p-assertions. If they are passed in the context of a message, then they are stored in interaction p-assertions. If they are provided by some other mechanism, they can be stored as internal information p-assertions. To enable queriers to more readily discover and traverse provenance stores, View Links should be made accessible via exposed interaction metadata. Recall from Section 3.3.2.5 that exposed interaction metadata places metadata in a well-defined location in the p-structure, so that queriers know where to look for the information without having to traverse application-specific p-assertion content.
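The request-response case can be sketched as below: B places its store location in the context of its response, and A records the response as an interaction p-assertion while also copying the link to a fixed metadata slot. The dictionary layout, the field names (`provenance_store`, `exposed_metadata`, `view_link`) and the URL are assumptions made for this illustration, not the p-structure schema.

```python
def handle_request(request, own_store_url):
    """Actor B: return a result and put its store location in the response context."""
    return {
        "body": f"result for {request['body']}",
        "context": {"provenance_store": own_store_url},
    }


def record_response(store, interaction_key, response):
    """Actor A: record the response and expose the View Link it carries."""
    link = response["context"].get("provenance_store")
    entry = store.setdefault(
        interaction_key, {"p_assertions": [], "exposed_metadata": {}}
    )
    entry["p_assertions"].append(
        {"type": "interaction", "content": response["body"],
         "context": response["context"]}
    )
    if link is not None:
        # Fixed location, so queriers need not parse application-specific content.
        entry["exposed_metadata"]["view_link"] = link
    return entry


PB_URL = "http://example.org/PB"  # placeholder URL (assumption)
pa = {}  # hypothetical store PA, keyed by interaction key
response = handle_request({"body": "M2"}, PB_URL)
entry = record_response(pa, 2, response)
```

Because the link travels with a message that is sent anyway, the extra cost is one small context field per response; the overhead noted above arises mainly when that field changes on every interaction.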
4.3.2 Cause Links
Chapter 3 defined relationship p-assertions as internal causal connections between occurrences, where an occurrence is identified within process documentation by locating the p-assertion that represents it (see Section 3.3.2.2). Furthermore, the p-assertion containing the occurrence that is the effect of a relationship p-assertion must be located in the same View as the relationship p-assertion itself (see Section 3.3.2.5). However, the p-assertions that represent the causes in the relationship p-assertion may be in various other Views and therefore located in different provenance stores.
To assist in finding these p-assertions, we introduce a second usage of links. For each relationship p-assertion it creates, an actor needs to identify which provenance stores the p-assertions representing causes are stored in; such a link is named a Cause Link. A Cause Link extends the relationship p-assertion data structure by adding the provenance store where each cause is stored. Like the View Link, a Cause Link only points to the provenance store and not to a particular piece of data in the store.
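One way to picture this extension is a relationship record in which every cause carries the location of the store that holds it. The sketch below is illustrative only: the class and field names, the relation name `isRelatedTo` and the URL are hypothetical and do not come from the p-structure definition.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Cause:
    interaction_key: int
    cause_link: str  # Cause Link: the store holding the cause's p-assertion


@dataclass(frozen=True)
class RelationshipPAssertion:
    effect_key: int             # the effect is in the same View as this assertion
    relation: str
    causes: Tuple[Cause, ...]   # each cause may reside in a different store

    def stores_to_visit(self):
        """Distinct stores a querier must contact to retrieve every cause."""
        return sorted({cause.cause_link for cause in self.causes})


rel = RelationshipPAssertion(
    effect_key=2,
    relation="isRelatedTo",  # hypothetical relation name
    causes=(Cause(1, "http://example.org/PR"),),
)
```

Note that the link names only a store; the querier still locates the cause inside that store by a local search on the interaction key.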
4.3.3 Linking Summary
Figure 4.8 shows an example of how both Cause and View Links are recorded. Actor A sends a message M2 to actor B as a consequence of message M1. The interaction exchanging message M2 is identified by interaction key 2. In the context (shown by the circle) of the message, A puts a link to the provenance store, PA, that it uses for the interaction with B. Actor B then extracts the link from the context and records it as an interaction p-assertion in the provenance store PB. As a result, a View Link from PB to PA is created (shown by the arc VL 1). We assume that A knows from its configuration that B always stores its p-assertions in PB. Hence, A records a link to PB as an internal information p-assertion in PA, which creates a corresponding View Link shown by the arc VL 2. All View Links are made available by actors recording exposed interaction metadata. Finally, A makes a relationship p-assertion between its interaction with B and the previous interaction containing M1. As part of the relationship p-assertion, it adds a Cause Link to the provenance store, PR, where the p-assertion related to the other interaction is stored. It then records the relationship p-assertion in PA, thus connecting PA to PR, shown by the arc CL. Figure 4.9 shows the contents of the provenance stores PA and PB after all p-assertions have been recorded.
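[Figure 4.8 diagram: actors A and B; provenance stores PA, PB and PR; messages M1 and M2; View Links VL 1 and VL 2; Cause Link CL. PA holds an internal information p-assertion containing a View Link to PB and a relationship p-assertion relating M2 and M1 with a Cause Link to PR; PB holds an interaction p-assertion containing M2 and a View Link to PA.]
Figure 4.8: An example of linking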
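The three link-bearing recordings in this example can be replayed in a few lines. The sketch covers only the p-assertions that carry links, and the tuple layout, the `record` and `outgoing_links` helpers and the `*_link` field names are assumptions made for illustration.

```python
from collections import defaultdict

# store name -> rows of (interaction key, p-assertion type, content)
stores = defaultdict(list)


def record(store, key, kind, content):
    stores[store].append((key, kind, content))


# B extracts the link from the context of M2 and records it (VL 1: PB -> PA).
record("PB", 2, "interaction", {"message": "M2", "view_link": "PA"})
# A knows from its configuration that B uses PB (VL 2: PA -> PB).
record("PA", 2, "internal information", {"view_link": "PB"})
# A relates interaction 2 to interaction 1, documented in PR (CL: PA -> PR).
record("PA", 2, "relationship", {"effect": 2, "cause": 1, "cause_link": "PR"})


def outgoing_links(store):
    """Stores that the given store points to through View or Cause Links."""
    targets = set()
    for _, _, content in stores[store]:
        targets.update(v for k, v in content.items() if k.endswith("_link"))
    return sorted(targets)
```

Here `outgoing_links("PA")` yields PB and PR, matching the arcs VL 2 and CL, while `outgoing_links("PB")` yields only PA, matching VL 1.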
Both View Links and Cause Links allow data and p-assertions stored across provenance stores in multiple institutions to be retrieved by querying actors. View and Cause Links can be contrasted as follows.
• A View Link points to another store that contains a piece of data written by another actor (which is providing a different view on the same interaction).
• A Cause Link points to another store containing a piece of data asserted by the same actor (which is making assertions about another interaction).
Contents of PA

interaction key   p-assertion type       p-assertion content
1                 interaction            M1
2                 interaction            M2
2                 internal information   View Link to PB
2                 relationship           2 is related to 1, Cause Link to PR

Contents of PB

interaction key   p-assertion type       p-assertion content
2                 interaction            M2, with View Link to PA

Figure 4.9: Contents of provenance stores
Links provide a solution to the problem of connecting distributed process documentation. Similar to the Web, the unidirectional nature of links avoids the problem of having to synchronise between provenance stores when recording a link. Instead, each actor is responsible for recording a link just as each web page author is responsible for adding links to other pages as appropriate. Creating links is lightweight; the information needed to establish a link is minimal. Furthermore, the link structure provides a structured and simple mechanism for querying actors to traverse provenance stores hosted by multiple institutions.
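Traversal by a querying actor amounts to a graph search over stores, following links outward from a starting point. The sketch below uses a breadth-first search over a hand-written link table that mirrors the example of Figure 4.8; the `LINKS` table and the `reachable_stores` function are hypothetical, and a real querier would obtain each store's links by querying it.

```python
from collections import deque

# Hypothetical link table: store -> stores it points to (View or Cause Links).
LINKS = {"PA": ["PB", "PR"], "PB": ["PA"], "PR": []}


def reachable_stores(start, links):
    """Breadth-first traversal of stores reachable by following links from start."""
    seen = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for target in links.get(current, []):
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen


print(reachable_stores("PB", LINKS))  # ['PB', 'PA', 'PR']
print(reachable_stores("PR", LINKS))  # ['PR']
```

The second call shows the consequence of unidirectional links: starting from PR, nothing else is reachable, because in this example no link was recorded in PR pointing back to PA.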
So far, we have discussed the high-level concept of recording process documentation into provenance stores and how this documentation can be connected so that it can be queried in a distributed environment. We now present a protocol by which actors can record their p-assertions into provenance stores.