Version Management - Requirements on Databases for PSDEs


3.5 Version Management

Given the overall representation of a project as defined in Section 3.1, the DBSE must support the management of versions of those subgraphs that represent versionable documents. This requires that the DBSE offer a concept for defining subgraphs. Subgraphs may be defined statically in the schema or dynamically at run-time. For a static definition, the DBSE's DDL must provide language primitives for distinguishing aggregation edges from reference edges. A dynamic definition requires a predefined type that can be used for implementing subgraphs. This type must offer operations for adding nodes and edges to, and removing them from, the subgraph. Static definition is more efficient in terms of space and time, since for any existing project-wide graph no further actions are required to identify the subgraphs that implement documents. Dynamic definition of subgraphs is more flexible, since arbitrary components of the project-wide abstract syntax graph can be identified as a subgraph.
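The two ways of defining subgraphs can be contrasted in a small Python sketch. It assumes a hypothetical `Edge` record whose `aggregation` flag stands in for the DDL's distinction between aggregation and reference edges, and a hypothetical `Subgraph` type for the dynamic case; none of these names belong to an actual DBSE interface.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    aggregation: bool  # True: aggregation edge; False: reference edge


def static_subgraph(root: str, edges: set) -> set:
    """Static definition: members are the root plus every node reachable
    along aggregation edges, so no explicit include/remove calls are needed."""
    members, stack = {root}, [root]
    while stack:
        node = stack.pop()
        for e in edges:
            if e.aggregation and e.source == node and e.target not in members:
                members.add(e.target)
                stack.append(e.target)
    return members


@dataclass
class Subgraph:
    """Dynamic definition (hypothetical predefined type): membership is
    changed explicitly at run-time."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)

    def include_node(self, node: str) -> None:
        self.nodes.add(node)

    def include_edge(self, edge: Edge) -> None:
        if not {edge.source, edge.target} <= self.nodes:
            raise ValueError("both end points must belong to the subgraph")
        self.edges.add(edge)

    def remove_node(self, node: str) -> None:
        self.nodes.discard(node)
        # drop edges that would otherwise dangle inside the subgraph
        self.edges = {e for e in self.edges if node not in (e.source, e.target)}

    def remove_edge(self, edge: Edge) -> None:
        self.edges.discard(edge)
```

In the static case membership is implied by the edge kinds alone (the traversal merely materialises it), whereas in the dynamic case the tools must maintain the member sets explicitly, which is the source of the added flexibility.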

The particular functionality required for version management of subgraphs is as follows. When the root node of a subgraph is created, an initial version, the root version, must be created as well. Each version must have a name, called a version label, by which it can be accessed; the DBSE must therefore offer facilities for defining version labels. The DBSE must then support the derivation of a new version of a subgraph from a particular existing version of that subgraph. It must maintain the version history between the different versions of a subgraph, i.e. the DBSE must keep track of the predecessor/successor relationship between subgraph versions, and it must facilitate navigation along these relationships. Moreover, the DBSE must support a tool session in selecting a current version of a subgraph, so that all of the subgraph's nodes and edges are seen in the state of that version. In addition, the system must support the definition of default versions of subgraphs, which are stored persistently and determine which state of a node or edge is accessed when no current version has been explicitly selected.
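A minimal sketch of these operations follows, assuming a hypothetical `VersionedSubgraph` class that keys versions by label and, for simplicity, stores a full copy of the node states in every version (the duplication strategies discussed later in this section would avoid that). The method names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VersionedSubgraph:
    """Hypothetical sketch of the required version operations."""
    states: Dict[str, Dict[str, str]] = field(default_factory=dict)  # label -> node -> value
    preds: Dict[str, List[str]] = field(default_factory=dict)        # label -> predecessor labels
    default: Optional[str] = None   # stored persistently with the subgraph
    current: Optional[str] = None   # selected per tool session

    @classmethod
    def create(cls, root_label: str, root_state: Dict[str, str]) -> "VersionedSubgraph":
        """Creating the root node also creates the root version."""
        vs = cls()
        vs.states[root_label] = dict(root_state)
        vs.preds[root_label] = []
        vs.default = root_label
        return vs

    def derive(self, from_label: str, new_label: str) -> None:
        if new_label in self.states:
            raise ValueError(f"version label {new_label!r} already in use")
        self.states[new_label] = dict(self.states[from_label])
        self.preds[new_label] = [from_label]

    def predecessors(self, label: str) -> List[str]:
        return list(self.preds[label])

    def successors(self, label: str) -> List[str]:
        return [v for v, ps in self.preds.items() if label in ps]

    def select(self, label: str) -> None:
        self.current = label

    def _visible(self) -> str:
        # fall back to the default version if no current version is selected
        label = self.current if self.current is not None else self.default
        if label is None:
            raise LookupError("no current or default version")
        return label

    def read(self, node: str) -> str:
        return self.states[self._visible()][node]

    def write(self, node: str, value: str) -> None:
        self.states[self._visible()][node] = value
```

Predecessors are kept as a list so that a version produced by merging could record more than one predecessor.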

[Figure 3.3 image: subgraph versions Window Version 0.0, Window Version 0.1 and BasicTypes Version 0.0; legend: version-duplicated edge, version-specific edge, successor edge]

Figure 3.3: Versions of Subgraphs of the Project-Wide Abstract Syntax Graph

Figure 3.3 depicts earlier subgraph versions of the module interface graphs displayed in Figure 3.1. The two subgraphs that represent module Window and module BasicTypes are under version control. Currently two versions of Window exist. They are labelled Version 0.0 and Version 0.1. There is a successor edge between these two subgraph versions indicating that Version 0.1 is a successor of Version 0.0.

After two subgraph versions have been derived from the same predecessor, they may have to be merged. This merging cannot be done completely by the DBSE, but requires additional actions by a tool. In the case of conflicting changes, where the same nodes have been modified in both subgraph versions, the DBSE cannot decide which version of a node to take; this has to be determined by the tool user. The basic mechanisms a tool builder requires from a DBSE are the ability to merge two subgraph versions in the version derivation graph and a primitive to find the difference, in terms of nodes and edges, between different versions of the same subgraph.
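The difference and merge primitives can be sketched as follows. The sketch assumes that each version is a mapping from node or edge identifiers to their states and that merging is done three-way, relative to the common predecessor; the text only requires a merge and a difference primitive, so the three-way strategy and the names `diff` and `merge` are assumptions. Conflicting elements are returned to the tool so that its user can decide.

```python
from typing import Dict, Set, Tuple

_MISSING = object()  # marks an element absent from a version


def diff(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, Set[str]]:
    """Difference between two versions of the same subgraph."""
    return {
        "added": set(new) - set(old),
        "removed": set(old) - set(new),
        "modified": {k for k in set(old) & set(new) if old[k] != new[k]},
    }


def merge(base: Dict[str, str], left: Dict[str, str],
          right: Dict[str, str]) -> Tuple[Dict[str, str], Set[str]]:
    """Three-way merge of two versions derived from `base`.
    Returns the merged state and the elements the tool user must decide on."""
    merged: Dict[str, str] = {}
    conflicts: Set[str] = set()
    for k in set(base) | set(left) | set(right):
        b, l, r = (v.get(k, _MISSING) for v in (base, left, right))
        if l == r:
            value = l            # unchanged, or changed identically in both
        elif l == b:
            value = r            # only the right version changed k
        elif r == b:
            value = l            # only the left version changed k
        else:
            conflicts.add(k)     # both versions changed k differently
            value = b            # keep the base state until the user decides
        if value is not _MISSING:
            merged[k] = value
    return merged, conflicts
```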

We also note that the DBSE need not offer the means for freezing versions. During configuration management, changes will be required to subgraph versions even though the corresponding document is considered frozen. These changes might have to create or delete reference edges that keep track of a new semantic relationship between document versions, and changes of attribute values might be required during inter-version consistency checks. Such changes are impossible if the subgraph is frozen. As a consequence, the DBSE must not impose on subgraph versions the behaviour that we required for document versions; instead, it must allow changes to be made to subgraphs that have successor versions. In fact, the mechanism that freezes documents cannot be implemented in the database, but must be implemented within tools.

Within versioned subgraphs, the DBSE must choose a strategy between the extremes of fully lazy and fully eager duplication of the nodes and aggregation edges of the subgraph concerned. Fully lazy duplication gives maximum sharing of components, and hence minimum storage usage, but complicates the update process during user edits, because a shared component must be copied before it is modified in one version. Fully eager duplication avoids all such complications, but implies maximum storage usage.
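The trade-off can be illustrated with a copy-on-write sketch, assuming a hypothetical record store with reference counts. Lazy derivation only increments counts, but every update must check whether the record is shared; eager derivation copies every record up front, so updates can write in place. `RecordStore`, `derive` and `update` are illustrative names.

```python
from typing import Dict, List


class RecordStore:
    """Physical node records with reference counts (hypothetical sketch)."""

    def __init__(self) -> None:
        self.values: List[str] = []
        self.refs: List[int] = []

    def new(self, value: str) -> int:
        self.values.append(value)
        self.refs.append(1)
        return len(self.values) - 1


def derive(store: RecordStore, version: Dict[str, int], eager: bool) -> Dict[str, int]:
    """Derive a new version, i.e. a new mapping from logical node to record."""
    if eager:
        # fully eager: copy every record now
        return {n: store.new(store.values[r]) for n, r in version.items()}
    for r in version.values():
        store.refs[r] += 1  # fully lazy: share every record
    return dict(version)


def update(store: RecordStore, version: Dict[str, int], node: str, value: str) -> None:
    """Update `node` in one version; a shared record is split first."""
    r = version[node]
    if store.refs[r] > 1:
        store.refs[r] -= 1
        version[node] = store.new(value)  # copy-on-write
    else:
        store.values[r] = value
```

Starting from a two-node version, deriving lazily and then editing one node leaves three records in the store, while eager derivation already needs four before any edit.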

The DBSE must also decide between alternative strategies for handling both intra- and inter-document relationships (reference edges). Intra-document reference edges will normally be treated, like aggregation edges, as version-duplicated, i.e. a new edge is automatically created for each new version. Inter-document reference edges may be treated as version-specific, and hence are not duplicated when new versions are created. Version-specific edges then represent relations between document versions within a configuration.
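A small sketch of this policy, assuming edges are tagged with a hypothetical `kind` and stored per version label: on derivation, only version-duplicated edges are carried over to the new version.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    kind: str  # "aggregation", "intra_ref" or "inter_ref"


# Policy assumed from the text: aggregation and intra-document reference
# edges are version-duplicated; inter-document reference edges are
# version-specific.
VERSION_DUPLICATED = {"aggregation", "intra_ref"}


def derive_edges(edges: Dict[str, Set[Edge]], old: str, new: str) -> None:
    """Give version `new` the version-duplicated edges of `old`.
    Version-specific edges stay with `old`; for `new` they must be
    created afresh within the configuration it belongs to."""
    edges[new] = {e for e in edges[old] if e.kind in VERSION_DUPLICATED}
```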

In the example given in Figure 3.3, aggregation and intra-document reference edges are considered as version-duplicated. Therefore, the edges that connect using types with the nodes where the types are declared are drawn as a solid shape. Inter-document reference edges are considered as version-specific, since they form particular configurations of subgraph versions.

3.6 Transactions

In Subsection 2.3.2 we required integrity preservation and immediate persistence for command execution in tools. To implement these, we require the DBSE to provide a means of grouping a set of operations that access or modify the abstract syntax graph into one unit. In the following, we call these units transactions. A tool starts a transaction before it begins to execute a command by issuing a start transaction request to the DBSE. The tool then performs the operations necessary to execute the command. When finished, it completes the transaction by issuing a commit request to the DBSE. If it detects an intolerable error, the tool may instead explicitly request a transaction abort. This undoes the effect of the current transaction and restores each modified node and edge to the state it had when the transaction was started.
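The start/commit/abort protocol can be sketched with an undo log. This is a minimal illustration, assuming a hypothetical `TransactionalGraph` that holds attribute values keyed by node identifier; durability (forcing changes to stable storage) is only indicated by a comment.

```python
from typing import Dict, List, Optional, Tuple


class TransactionalGraph:
    """Hypothetical DBSE front end; start/commit/abort are driven by the tool."""

    def __init__(self) -> None:
        self.attrs: Dict[str, str] = {}
        # undo log of (node, existed_before, old_value); None when no transaction is active
        self._undo: Optional[List[Tuple[str, bool, Optional[str]]]] = None

    def start_transaction(self) -> None:
        if self._undo is not None:
            raise RuntimeError("a transaction is already active")
        self._undo = []

    def set(self, node: str, value: str) -> None:
        if self._undo is None:
            raise RuntimeError("no active transaction")
        self._undo.append((node, node in self.attrs, self.attrs.get(node)))
        self.attrs[node] = value

    def commit(self) -> None:
        # a real DBSE would force the changes to stable storage here
        self._undo = None

    def abort(self) -> None:
        # undo in reverse order to restore the state at transaction start
        for node, existed, old in reversed(self._undo or []):
            if existed:
                self.attrs[node] = old
            else:
                del self.attrs[node]
        self._undo = None
```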

In general, we require transactions to have the ACID properties suggested in [Gra78]. Due to the atomicity property, transactions are either performed completely or not at all. Due to the durability property, the effect of a completed transaction persists in the database in every case. The consistency² preservation property ensures that after completion of a transaction the abstract syntax graph is in an intact state and tools can continue using it. Finally, the isolation property ensures that the effect of a transaction is independent of other concurrent transactions. Hence the isolation property of transactions contributes to the multi-user support required in Subsection 2.2.3. To achieve isolation of transactions, the DBSE has to apply a concurrency control protocol. The most common protocol is two-phase locking [BHG87]. Under this protocol, nodes and edges are locked by the DBSE as soon as they are accessed by tools, i.e. locking is transparent to tools. On commit or abort, all locks of the transaction are released at once.

If a node or edge is accessed without being modified, the DBSE only has to lock it in read mode. If it is modified, the DBSE has to lock it in write mode. Read locks are compatible with each other, i.e. the DBSE can grant arbitrarily many read locks on a node or edge. Write locks, by contrast, are compatible with neither read locks nor other write locks. This implies that if a transaction holds any lock on a node or edge, no other concurrent transaction can acquire a write lock on the same node or edge, and if a transaction holds a write lock, no other transaction can acquire any lock on it.
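The compatibility rules and the release of all locks at commit or abort can be sketched as a small lock table. The names `LockManager`, `acquire` and `release_all` are illustrative, and waiting is modelled simply by `acquire` returning `False`.

```python
from typing import Dict

# COMPATIBLE[(held, requested)]: can `requested` be granted while another
# transaction holds `held` on the same node or edge?
COMPATIBLE = {
    ("read", "read"): True,
    ("read", "write"): False,
    ("write", "read"): False,
    ("write", "write"): False,
}


class LockManager:
    def __init__(self) -> None:
        self.locks: Dict[str, Dict[str, str]] = {}  # item -> {txn: mode}

    def acquire(self, txn: str, item: str, mode: str) -> bool:
        """Grant the lock and return True, or return False if txn must wait."""
        holders = self.locks.setdefault(item, {})
        if all(COMPATIBLE[(held, mode)] for t, held in holders.items() if t != txn):
            if holders.get(txn) != "write":  # upgrade read to write, never downgrade
                holders[txn] = mode
            return True
        return False

    def release_all(self, txn: str) -> None:
        """Release every lock of txn at once, as on commit or abort."""
        for holders in self.locks.values():
            holders.pop(txn, None)
```

Because locks are only released in `release_all`, no transaction acquires a lock after releasing one, which is the two-phase rule in its strict form.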

This strict concurrency control protocol is appropriate for syntax-directed tools for the following reasons. Command execution in tools lasts only a very short period of time, say, less than a few hundred milliseconds, and during this time only a few nodes are modified. In addition, it is unusual for two or more users to edit the same document concurrently. Therefore, the only node accesses that could result in concurrency control conflicts are node accesses along inter-document reference edges. These actually cause a concurrency control conflict only if another tool concurrently accesses the remote node in an incompatible mode. In these rare cases the tool can wait for the concurrent transaction to complete before its lock is granted. Users will hardly ever notice these short delays.

In even rarer cases, concurrent command execution can result in a deadlock, because two-phase locking ensures serialisability but is not deadlock-free. Suppose that a transaction executing one command has locked a set of nodes and edges while another transaction has locked another set. If each transaction now tries to lock a node or edge that the other has already locked in an incompatible mode, a deadlock occurs. We require the DBSE to detect such deadlocks and inform the tools about them. The tools can then ask their users to abort the command execution and retry it later. If one user aborts, the deadlock is resolved without losing significant effort, and the tool recovers to the state it was in before the aborted transaction started.
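One common way to detect such deadlocks, assumed here because the text does not prescribe a method, is to search the waits-for graph for a cycle: transaction T waits for U if U holds a lock that T has requested in an incompatible mode. The function name `find_deadlock` is hypothetical.

```python
from typing import Dict, List, Optional, Set


def find_deadlock(waits_for: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return the transactions on one cycle of the waits-for graph, or None."""
    on_path: List[str] = []
    state: Dict[str, str] = {}  # "active" while on the DFS path, "done" afterwards

    def visit(t: str) -> Optional[List[str]]:
        state[t] = "active"
        on_path.append(t)
        for u in waits_for.get(t, set()):
            if state.get(u) == "active":
                return on_path[on_path.index(u):]  # back edge closes a cycle
            if u not in state:
                cycle = visit(u)
                if cycle is not None:
                    return cycle
        on_path.pop()
        state[t] = "done"
        return None

    for t in list(waits_for):
        if t not in state:
            cycle = visit(t)
            if cycle is not None:
                return cycle
    return None
```

The DBSE would then inform the tools owning the transactions on the returned cycle, whose users decide which command execution to abort.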

In many cases a PSDE builder knows that concurrency control conflicts cannot occur. If we consider an editor for a programming language like Modula-2, for instance, and assume that only one user is responsible for each module, then concurrency control conflicts cannot occur during edit operations on statements, comments, parameter names and local variable declarations. If two-phase locking is used for these operations, the DBSE spends unnecessary effort checking for concurrency control conflicts. If the DBSE offers a weaker transaction concept, which we call an activity in the following, transaction throughput can be increased significantly. Activities need not guarantee isolation of the group of operations they execute, but only ensure atomicity and durability. These two properties are needed to preserve the integrity of the project-wide abstract syntax graph against any kind of hardware or software failure. During a tool session, both transactions and activities may have to be used in an arbitrary sequence. As an example, consider that the user of the Modula-2 editor wants to change a procedure name after he or she has changed a comment. The comment change can be performed as an activity, whereas changing the procedure name must be performed as a transaction, since other users may, at the same time, create an import relationship in one of their modules that imports the changed procedure. We note that we do not require any advanced transactions, such as nested transactions [Mos85], design transactions [KSUW85, Kat84], group transactions [Kel89], CAD transactions [BKK85] or split/join transactions [KHPW90], from the DBSE. All these advanced transactions assume patterns of cooperation between users of a PSDE that do not apply in general. Therefore, these advanced transactions cannot be built into the DBSE, but should rather be defined explicitly by the process model and be implemented by the process engine. The DBSE and the tools only have to offer the basic mechanisms that enable the process engine to do so. As argued in [Wol94], ACID transactions, together with the means for versioning documents, are sufficient for this purpose.

² This notion of consistency is at a lower level and must not be confused with static semantics or inter-document consistency. For the DBSE any syntax graph is consistent, even though it might represent a completely inconsistent document, as long as the graph conforms to the schema definition.
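The difference between transactions and activities can be sketched as a single entry point with an isolation flag. This is an illustration under simplifying assumptions: `Database`, `run` and `isolated` are hypothetical names, the lock table records write locks only, and durability is indicated by a comment rather than implemented. Both modes keep undo information for atomicity; only the transaction consults and sets locks.

```python
from typing import Dict, Optional


class Database:
    """Hypothetical DBSE offering transactions and weaker activities."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.write_locks: Dict[str, str] = {}  # item -> owning transaction

    def run(self, owner: str, updates: Dict[str, Optional[str]], isolated: bool) -> None:
        """Apply `updates` all-or-nothing.
        isolated=True:  transaction; every item is write-locked first.
        isolated=False: activity; no locking, so the PSDE builder must know
                        that no conflicts can occur."""
        if isolated:
            busy = [i for i in updates if self.write_locks.get(i, owner) != owner]
            if busy:
                raise RuntimeError(f"concurrency control conflict on {busy}")
            for i in updates:
                self.write_locks[i] = owner
        before = dict(self.data)  # undo information, needed for atomicity
        try:
            for item, value in updates.items():
                if value is None:
                    raise ValueError(f"invalid value for {item}")
                self.data[item] = value
        except ValueError:
            self.data = before  # roll back the partial update
            raise
        finally:
            if isolated:
                for i in updates:
                    self.write_locks.pop(i, None)
        # durability: a real DBSE would force a commit record to disk here
```

In the Modula-2 scenario, the comment edit would call `run(..., isolated=False)` and the procedure renaming `run(..., isolated=True)`.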

Note also that version management with lazy node duplication interferes with concurrency control. Assume that a node is shared by two versions V1 and V2. If one transaction modifies the node in version V1 while another transaction accesses the node in version V2, a concurrency control conflict arises. This conflict is not serious and should not cause either transaction to be blocked or even aborted. Instead, the shared object is split into two versions and the two concurrent transactions continue to be executed. This, of course, requires concurrency control management to be aware of the versioning strategy. Thus, the lazy duplication strategy for version control must be implemented inside the DBSE together with the basic concurrency control; it cannot be implemented on top of a DBSE that does not support versions of subgraphs.
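Why the split needs knowledge of both versioning and locking can be shown in one sketch. It assumes a hypothetical `VersionAwareStore` in which node records are shared lazily between versions and the lock table records through which version each lock was taken. Lock modes are simplified: any lock held by another transaction on the same version blocks a write, whereas a lock held through a different version leads to a split instead of a wait.

```python
from typing import Dict, Tuple


class VersionAwareStore:
    """Hypothetical sketch combining lazy duplication with locking."""

    def __init__(self) -> None:
        self.values: Dict[int, str] = {}
        self.bind: Dict[Tuple[str, str], int] = {}   # (version, node) -> record
        self.locks: Dict[int, Dict[str, str]] = {}   # record -> {txn: version}

    def new_record(self, value: str) -> int:
        rid = len(self.values)
        self.values[rid] = value
        return rid

    def read(self, txn: str, version: str, node: str) -> str:
        rid = self.bind[(version, node)]
        self.locks.setdefault(rid, {})[txn] = version
        return self.values[rid]

    def write(self, txn: str, version: str, node: str, value: str) -> bool:
        """Return False if txn must wait for a genuine conflict."""
        rid = self.bind[(version, node)]
        holders = self.locks.get(rid, {})
        if any(t != txn and v == version for t, v in holders.items()):
            return False  # same node in the same version: real conflict
        if list(self.bind.values()).count(rid) > 1:
            # record shared with other versions: split it instead of blocking
            rid = self.new_record(value)
            self.bind[(version, node)] = rid
        else:
            self.values[rid] = value
        self.locks.setdefault(rid, {})[txn] = version
        return True
```

A lock manager that saw only physical records would treat the write through V1 as conflicting with the read through V2; here the store splits the record and both transactions proceed.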

In document Tool specification with GTSL (Page 34-38)