
Requirements on Databases for PSDEs

3.2 Data Definition and Data Manipulation Language

The kinds of nodes and edges required to represent a project, and the attribute information associated with each, cannot be determined by the DBSE itself. They should be defined by tool builders and then be controlled by the DBSE in order to have different tools sharing a well-defined project-graph. The overall structure of the project's syntax graph should, therefore, be defined in terms of the data definition language of the DBSE and established and controlled by the DBSE's conceptual schema.

As a minimum, we require that the data definition language (DDL) can express the different node types that occur within the graph, that it can express which edge types may start from node types and to which node types they may lead, and that it can express which attributes are attached to node types. Such basic requirements are common to any graph storage. To allow for concise schema definitions, the DDL should allow properties common to more than one node type to be specified only once in, for instance, an abstract node type declaration. All node types sharing this property should then be declared in such a way that they inherit the property from the abstract type. As an example, consider nodes of types Function and Procedure as used in Figure 3.1. They have common properties such as outgoing aggregation edges to nodes of types OpName, ParamList and Comment. It would be appropriate to define these properties only once in some abstract node type Operation. Then Function and Procedure node types should inherit the common properties from Operation.
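The inheritance of common properties from an abstract node type can be sketched as follows. This is only an illustration in Python, not the DDL of any particular DBSE: the class NodeType, its methods, the edge names (name, params, comment, result) and the result edge to Type are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NodeType:
    """One node type declaration in a hypothetical schema."""
    name: str
    parent: Optional["NodeType"] = None   # type to inherit from, if any
    abstract: bool = False                # abstract types are never instantiated
    attributes: dict = field(default_factory=dict)  # attribute name -> value type
    edges: dict = field(default_factory=dict)       # edge name -> target node type

    def all_attributes(self) -> dict:
        """Attributes declared here plus those inherited from ancestors."""
        inherited = self.parent.all_attributes() if self.parent is not None else {}
        return {**inherited, **self.attributes}

    def all_edges(self) -> dict:
        """Outgoing edges declared here plus those inherited from ancestors."""
        inherited = self.parent.all_edges() if self.parent is not None else {}
        return {**inherited, **self.edges}


# The common aggregation edges are declared once, on the abstract type.
operation = NodeType(
    "Operation",
    abstract=True,
    edges={"name": "OpName", "params": "ParamList", "comment": "Comment"},
)
# Function and Procedure inherit them; the extra edge on Function is assumed.
function = NodeType("Function", parent=operation, edges={"result": "Type"})
procedure = NodeType("Procedure", parent=operation)
```

With these declarations, procedure.all_edges() yields the three edges declared on Operation, and function.all_edges() yields the same three plus its own.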

In practice, the data definition language should be tailored towards the syntax graphs that the DBSE is used to store. Common structures in syntax graphs are multi-valued aggregations or references such as lists, sets and dictionaries of nodes. The data definition language should, therefore, offer the means to express these multi-valued edges as conveniently as possible. Such structures often contain nodes not of one type only, but of a number of different types. As an example, consider data flow diagrams that consist of a set of processes, terminators, stores and flows [dM78]. The DDL should, therefore, facilitate the definition of this kind of heterogeneous structure. Finally, structures in abstract syntax graphs may be nested. As examples, consider data flow diagrams where a process can be refined by another data flow diagram, or procedure declarations in Modula-2 that can contain nested procedure declarations. To anticipate these structures, the DDL of the DBSE must be able to express recursive structures.
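A heterogeneous, recursive structure of this kind might be sketched as below. The table ALLOWED_TARGETS, the edge names elements and refinement, and the helper functions are hypothetical; a Python list stands in for the multi-valued edge.

```python
from dataclasses import dataclass, field

# Hypothetical schema: (source node type, edge name) -> node types the edge may hold.
ALLOWED_TARGETS = {
    # heterogeneous: one multi-valued edge holds nodes of four different types
    ("DataFlowDiagram", "elements"): {"Process", "Terminator", "Store", "Flow"},
    # recursive: a process may be refined by another data flow diagram
    ("Process", "refinement"): {"DataFlowDiagram"},
}


@dataclass
class Node:
    type_name: str
    edges: dict = field(default_factory=dict)  # edge name -> list of target nodes


def attach(parent: Node, edge_name: str, child: Node) -> None:
    """Add child to a multi-valued edge, rejecting types the schema forbids."""
    allowed = ALLOWED_TARGETS.get((parent.type_name, edge_name))
    if allowed is None or child.type_name not in allowed:
        raise TypeError(
            f"{parent.type_name}.{edge_name} may not lead to {child.type_name}"
        )
    parent.edges.setdefault(edge_name, []).append(child)


def nesting_depth(diagram: Node) -> int:
    """Number of diagram levels, following refinement edges recursively."""
    deepest = 0
    for element in diagram.edges.get("elements", []):
        for refinement in element.edges.get("refinement", []):
            deepest = max(deepest, nesting_depth(refinement))
    return 1 + deepest


top = Node("DataFlowDiagram")
process = Node("Process")
attach(top, "elements", process)
attach(top, "elements", Node("Store"))
attach(process, "refinement", Node("DataFlowDiagram"))
```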

As argued previously, changes to the internal syntax graph should become incrementally persistent. Therefore, edit operations performed by tools on documents have to be implemented in terms of operations modifying the internal syntax graph. These operations should be established as part of the DBSE schema mainly for two reasons:

Encapsulation:

The structure definition of the project-graph should be encapsulated with operations which preserve the graph's integrity. They then provide a well-defined interface for accessing and modifying the graph. In order to enforce usage of this interface, the operations must become part of the DBSE schema.

Performance:

Executing graph accessing and modifying operations within the DBSE is more efficient than executing similar operations within tools. In the latter case a significant number of nodes and edges need to be transferred from the DBSE to tools via some network communication facility, which is rather expensive in time.

To establish graph-modifying operations as part of the DBSE schema, the data manipulation language (DML) must be powerful enough to express them. This means, in particular, that the DBSE's DML must be capable of expressing the creation and deletion of nodes and edges as well as the assignment of attribute values. Moreover, the DML must be computationally complete, as alternatives and iterations are needed in graph-modifying operations for graph traversal purposes.
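The primitives named above, together with one integrity-preserving operation that needs iteration and alternatives, can be sketched as follows. The class Graph and its method names are hypothetical, and for brevity the sketch treats every edge as an aggregation edge.

```python
class Graph:
    """A minimal in-memory graph with the primitive DML operations."""

    def __init__(self):
        self._next_id = 0
        self.nodes = {}     # node id -> {"type": str, "attrs": dict}
        self.edges = set()  # (source id, edge label, target id)

    def create_node(self, type_name, **attrs):
        node_id = self._next_id
        self._next_id += 1
        self.nodes[node_id] = {"type": type_name, "attrs": dict(attrs)}
        return node_id

    def create_edge(self, source, label, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both end points of an edge must exist")
        self.edges.add((source, label, target))

    def set_attribute(self, node_id, name, value):
        self.nodes[node_id]["attrs"][name] = value

    def delete_subtree(self, root):
        """Delete root and all nodes reachable from it, leaving no dangling edges."""
        pending, doomed = [root], set()
        while pending:                      # iteration for graph traversal
            current = pending.pop()
            if current in doomed:           # alternative: skip visited nodes
                continue
            doomed.add(current)
            pending.extend(t for (s, _label, t) in self.edges if s == current)
        self.edges = {
            e for e in self.edges if e[0] not in doomed and e[2] not in doomed
        }
        for node_id in doomed:
            del self.nodes[node_id]


graph = Graph()
module = graph.create_node("Module")
func = graph.create_node("Function")
op_name = graph.create_node("OpName", text="push")
graph.create_edge(module, "operations", func)
graph.create_edge(func, "name", op_name)
graph.set_attribute(op_name, "text", "pop")
```

Because delete_subtree removes edges together with their end points, a tool that only calls such operations cannot leave the graph with edges that lead nowhere.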

The DBSE should provide predefined operators to access and modify multi-valued aggregations or references such as lists, sets and dictionaries. These operators should facilitate enumeration of all nodes of these structures. Given the identity, a property or a position, there should be operators to search for nodes contained in such structures. Finally, there should be operators to update these structures through the insertion or deletion of a node.
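For an ordered list, such a set of operators could look like the sketch below. The class NodeList and its method names are hypothetical, and nodes are represented as plain dictionaries for brevity.

```python
class NodeList:
    """An ordered multi-valued aggregation of nodes."""

    def __init__(self):
        self._items = []

    def __iter__(self):
        """Enumerate all nodes of the structure."""
        return iter(self._items)

    def at(self, position):
        """Search by position."""
        return self._items[position]

    def position_of(self, node):
        """Search by identity (not equality); None if the node is absent."""
        for index, item in enumerate(self._items):
            if item is node:
                return index
        return None

    def find(self, **properties):
        """Search by property: all nodes whose attributes match every keyword."""
        return [
            node for node in self._items
            if all(node.get(key) == value for key, value in properties.items())
        ]

    def insert(self, position, node):
        self._items.insert(position, node)

    def delete(self, node):
        position = self.position_of(node)
        if position is None:
            raise ValueError("node is not contained in this list")
        del self._items[position]


params = NodeList()
first = {"name": "stack", "mode": "in"}
second = {"name": "item", "mode": "out"}
params.insert(0, first)
params.insert(1, second)
```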

Abstract syntax graph definitions for single document types have already become rather complex. If we consider C++ class definitions, for instance, about 80 node types have to be defined with their aggregation and reference edges because the subset of the C++ grammar for class definitions has as many non-terminal and terminal symbols. In order to cope with the complexity of the schema definition for the whole PSDE, which may contain several document types, the DBSE's DDL must offer structuring mechanisms for schemas. Then the overall conceptual schema of a PSDE can be defined in fairly independent component schemas, one for each tool. As these schema components must rely on definitions of other schema components in order to implement inter-document consistency constraints, importing and exporting schema definitions between components must be supported by the DDL.
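One possible shape for such a structuring mechanism is sketched below, assuming a simple export list per component. The class SchemaComponent, its methods and the type names used are hypothetical.

```python
class SchemaComponent:
    """A schema component that exports some of its type definitions."""

    def __init__(self, name, exports=()):
        self.name = name
        self.types = {}              # type name -> declaration (any object)
        self.exports = set(exports)  # names other components may import
        self.imports = {}            # imported type name -> defining component

    def define(self, type_name, declaration):
        self.types[type_name] = declaration

    def import_from(self, other, type_name):
        """Import a definition, but only if the other component exports it."""
        if type_name not in other.exports or type_name not in other.types:
            raise LookupError(f"{other.name} does not export {type_name}")
        self.imports[type_name] = other

    def resolve(self, type_name):
        """Look a name up locally first, then among the imports."""
        if type_name in self.types:
            return self.types[type_name]
        if type_name in self.imports:
            return self.imports[type_name].types[type_name]
        raise LookupError(f"{type_name} is unknown in {self.name}")


design = SchemaComponent("design", exports={"Operation"})
design.define("Operation", {"edges": ["name", "params", "comment"]})
design.define("LayoutHint", {"edges": []})  # private to the design component

implementation = SchemaComponent("implementation")
implementation.define("Body", {"edges": ["statements"]})
implementation.import_from(design, "Operation")
```

The implementation component can now refer to Operation, for example as the target of an inter-document reference edge, while LayoutHint remains invisible to it.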

3.3 Views

In PSDEs, there are often instances of different document types that provide different views of the same aspect of a software system. Most information contained in documents of early phases (such as architectural or module interface design) is included in documents produced in later phases (such as the module implementation). Hence, documents of different types often contain redundant information. The module interface design in Figure 2.1 on Page 5, for instance, defines signatures of operations. The C implementation contains each of these operations with the same names, and parameter lists matching the signature of the design. If design and implementation of the module are considered as distinct documents, this leads to a high number of inter-document consistency constraints. In the document representation, this leads to redundant nodes in many abstract syntax graphs that must be related by inter-document reference edges.

Eliminating this duplication by sharing the aggregation subtrees concerned has the following advantages:

storage of the schema and corresponding data requires less space,

inter-document consistency preservation, especially across document boundaries, is automatically achieved, and

the conceptual schema is simplified.

Such sharing cannot be contemplated, of course, if automatic inter-document consistency preservation between documents is inappropriate.

If subtree sharing is to be used, tools must use the same conceptual schema. In order to maintain appropriate separation of tool concerns and so allow separate tool development and maintenance, the DBSE must provide a view mechanism like that offered in many relational database systems.

The view mechanism must allow for view definitions consisting of virtual node and edge types. A virtual node or edge type declaration is based on a (real) node or edge type declaration given in a conceptual schema. The set of all virtual node and edge type declarations of a view defines the structure and behaviour of a class of virtual abstract syntax graphs.

Instances of these virtual node and edge types are virtual as well. They are called virtual nodes and virtual edges, respectively. They do not persist in the database, but are derived by the view mechanism, according to the virtual node and edge type definitions, from nodes and edges stored in the database. These stored nodes and edges have been instantiated from real node and edge types defined in the conceptual schema. Therefore, we call them real nodes and real edges, respectively. Accordingly, the abstract syntax graph they form is called the real abstract syntax graph.

As an example, consider Figure 3.2. It depicts an excerpt of the abstract syntax graph from Figure 3.1 with two virtual abstract syntax graphs defined on top of it. The two virtual syntax graphs represent excerpts of a module implementation graph and a module interface graph. Dashed lines indicate the relationship between virtual nodes and the real nodes they were derived from.

Figure 3.2: Two Views of an Abstract Syntax Graph (panels: Virtual Module Design Graph, Virtual Module Implementation Graph, Real Abstract Syntax Graph)

The particular functionality required from a view mechanism for a DBSE is concerned with how virtual node and edge types are defined based on real node and edge types. Not all node and edge types defined in a conceptual schema need to be accessed in a view defined on top of the schema. In the above example, it is not necessary for a view defining a virtual module interface graph to include variable declaration lists and statement lists. Therefore, the view mechanism must support the hiding of node and edge types defined in the conceptual schema from one or the other view.

Even if a node type must be visible in a view, it may be appropriate to hide parts of its declaration. Attributes, operations or edges starting from a node may only need to be visible in some views while they are hidden in others. In the example above, function nodes have to be visible in both views. As the body of a function need not be seen in the interface view, the edge leading from a function node to a body should be hidden from the module interface view. Moreover, it may be reasonable to build the interface and implementation tool in such a way that changes to the interface are made in the module interface document only and inhibited in the implementation. Therefore, operations for changing the signature of a function may be hidden in the implementation view while they are available in the design view.
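Hiding of whole node types and of individual edges or operations might be expressed as in the sketch below. The dictionary CONCEPTUAL_SCHEMA, the class View, and the edge and operation names (body, change_signature, edit_body and so on) are assumptions chosen to mirror the function example.

```python
# Hypothetical conceptual schema: node type -> declared edges and operations.
CONCEPTUAL_SCHEMA = {
    "Function": {
        "edges": {"name", "params", "comment", "body"},
        "operations": {"change_signature", "edit_body"},
    },
    "Body": {"edges": {"statements", "var_decls"}, "operations": set()},
    "StatementList": {"edges": set(), "operations": set()},
    "VarDeclList": {"edges": set(), "operations": set()},
}


class View:
    """A view that hides node types and parts of node type declarations."""

    def __init__(self, schema, hidden_types=(), hidden_members=()):
        self.schema = schema
        self.hidden_types = set(hidden_types)
        self.hidden_members = set(hidden_members)  # (type name, member name)

    def types(self):
        return set(self.schema) - self.hidden_types

    def _visible(self, type_name, kind):
        if type_name not in self.types():
            raise LookupError(f"{type_name} is not visible in this view")
        return {
            member for member in self.schema[type_name][kind]
            if (type_name, member) not in self.hidden_members
        }

    def edges(self, type_name):
        return self._visible(type_name, "edges")

    def operations(self, type_name):
        return self._visible(type_name, "operations")


# The interface view hides bodies altogether, including the edge leading to them.
interface_view = View(
    CONCEPTUAL_SCHEMA,
    hidden_types={"Body", "StatementList", "VarDeclList"},
    hidden_members={("Function", "body"), ("Function", "edit_body")},
)
# The implementation view sees everything but may not change signatures.
implementation_view = View(
    CONCEPTUAL_SCHEMA,
    hidden_members={("Function", "change_signature")},
)
```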

Some operations may be specific to only one view. Then it would be inappropriate to define them in the conceptual schema, for they would not be shared by different views and would have to be hidden in all views but one. We, therefore, require that operations for accessing or modifying virtual nodes or edges can be added to a view. Operations defined in a conceptual schema can then also be redefined in a view: the operation from the conceptual schema is hidden first, and its redefinition is then added to the view.
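Adding an operation to a view, and redefining one as "hide, then add", could be sketched like this. The class ViewOperations and the operations describe and param_count are hypothetical, and nodes are plain dictionaries.

```python
class ViewOperations:
    """Operations available in one view, layered over the conceptual schema."""

    def __init__(self, schema_operations):
        self._schema = dict(schema_operations)  # operations from the schema
        self._hidden = set()                    # schema operations hidden here
        self._added = {}                        # operations defined in the view

    def hide(self, name):
        if name not in self._schema:
            raise KeyError(name)
        self._hidden.add(name)

    def add(self, name, function):
        if name in self._schema and name not in self._hidden:
            raise ValueError(f"{name} must be hidden before it is redefined")
        self._added[name] = function

    def redefine(self, name, function):
        """Redefinition is hiding the schema operation, then adding a new one."""
        self.hide(name)
        self.add(name, function)

    def call(self, name, node, *args):
        if name in self._added:
            return self._added[name](node, *args)
        if name in self._schema and name not in self._hidden:
            return self._schema[name](node, *args)
        raise AttributeError(f"operation {name} is not available in this view")


def describe(node):
    """Operation defined in the conceptual schema."""
    return node["name"]


def describe_with_signature(node):
    """Redefinition used by the interface view only."""
    return node["name"] + "(" + ", ".join(node["params"]) + ")"


def param_count(node):
    """Operation specific to the interface view."""
    return len(node["params"])


schema_operations = {"describe": describe}
interface_ops = ViewOperations(schema_operations)
interface_ops.add("param_count", param_count)
interface_ops.redefine("describe", describe_with_signature)
implementation_ops = ViewOperations(schema_operations)
```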

Views are used by tools like editors that have to modify virtual abstract syntax graphs. Therefore, views must be updatable, i.e. updates that tools perform on virtual nodes and edges must transparently migrate into the database the view is built on. To make this updatability possible, we cannot allow virtual node or edge types that are added to a view but not defined in the conceptual schema. If a node or edge type were defined in the relevant view only, which would be attractive for the same reason as adding operations to a view, the view could not be updated, because the DBSE would not know of any real node or edge types in which to store, for example, new virtual nodes or edges.
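The updatability requirement can be illustrated with a sketch in which every virtual node wraps a real node, so that writes go straight through to the shared store. The classes VirtualNode and UpdatableView, the type names and the attribute names are assumptions made for this example.

```python
class VirtualNode:
    """A derived node; it holds no data of its own, only a reference."""

    def __init__(self, real, visible_attributes):
        self._real = real
        self._visible = set(visible_attributes)

    def _check(self, name):
        if name not in self._visible:
            raise AttributeError(f"{name} is hidden in this view")

    def get(self, name):
        self._check(name)
        return self._real["attrs"][name]

    def set(self, name, value):
        self._check(name)
        self._real["attrs"][name] = value  # the update reaches the real node


class UpdatableView:
    def __init__(self, store, virtual_types):
        self.store = store                  # list of real nodes, shared by views
        # virtual type name -> (real type name, visible attribute names)
        self.virtual_types = virtual_types

    def create(self, virtual_type, **attrs):
        if virtual_type not in self.virtual_types:
            # No real type is known, so there is nowhere to store the node.
            raise TypeError(f"{virtual_type} is not based on a real node type")
        real_type, visible = self.virtual_types[virtual_type]
        real = {"type": real_type, "attrs": dict(attrs)}
        self.store.append(real)
        return VirtualNode(real, visible)

    def nodes(self, virtual_type):
        real_type, visible = self.virtual_types[virtual_type]
        return [
            VirtualNode(real, visible)
            for real in self.store if real["type"] == real_type
        ]


store = []
design_view = UpdatableView(store, {"DesignFunction": ("Function", {"name"})})
impl_view = UpdatableView(store, {"ImplFunction": ("Function", {"name", "body"})})

created = design_view.create("DesignFunction", name="push", body="")
created.set("name", "pop")
```

After the update through the design view, the implementation view derives a virtual node with the new name from the same real node, so no inter-document consistency check is needed.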
