
5.9.3.3 Combining Horizontal and Vertical Integration

Given that there are two methods for data integration, we are faced with the problem of when to use which. In principle, vertical data integration could be implemented with horizontal integration. We would have to materialise basically all the virtual syntax graphs that we would have in the case of vertical integration. Therefore, we would have to export each node type from one schema, import it to the other schemas and vice versa. We would then have to create objects and change instance variables in the materialisations of all virtual syntax graphs, whenever the respective objects and instance variables materialising some other virtual syntax graph are changed. If the schemas and their instantiations in bases are, from a structural point of view, very similar, there are a number of serious implications. Firstly, tool schemas will be very hard to maintain since a change in the structure of one schema affects all other schemas. Secondly, tools will perform slower as they always have to perform updates in implementations of other virtual syntax graphs. Finally, physical disk space utilisation will be worse than with vertical integration, since each object is physically duplicated as often as it occurs in different syntax graphs.

On the other hand, vertical integration cannot be used to implement horizontal integration. Using import/export statements, we do not impose any strategies concerning when to propagate a change into a document of another type. In the example given above, changing a name of an entity does not imply an immediate change to the name of the ADT specification of that entity. For instance, it may well be deferred in order to give some other developer, who is in charge of the ADT specification, the chance to reject that change. This is not possible if syntax graphs are virtual and implemented with views. If a node is changed, the change is visible, after commit of the transaction that performed the change, in all the counterparts of that node in other virtual syntax graphs. In the example, if class Entity was a virtual class and ADTModule was another virtual class and both were derived from the same base class, then the change of the name of a virtual object of class Entity would have been visible immediately in the corresponding virtual object of class ADTModule.
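The immediate visibility of changes under vertical integration can be illustrated with a minimal Python sketch. It is not the view mechanism of any particular object database: the classes BaseNode, EntityView and ADTModuleView are hypothetical names chosen for illustration, and transaction commit is left out, so a change becomes visible as soon as it is made.

```python
class BaseNode:
    """Hypothetical object of the common conceptual schema (the only stored copy)."""

    def __init__(self, name):
        self.name = name


class EntityView:
    """Hypothetical virtual class: stores nothing itself, delegates to the base object."""

    def __init__(self, base):
        self._base = base

    @property
    def name(self):
        return self._base.name

    @name.setter
    def name(self, value):
        self._base.name = value


class ADTModuleView:
    """Second hypothetical virtual class derived from the same base object."""

    def __init__(self, base):
        self._base = base

    @property
    def name(self):
        return self._base.name


base = BaseNode("Customer")
entity = EntityView(base)
module = ADTModuleView(base)

# Renaming through one view changes the shared base object, so the other
# view sees the new name without any explicit propagation step.
entity.name = "Client"
```

Because both views read from the same base object, there is no point at which the owner of the ADT specification could reject the change.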

This means that views cannot be used for data integration if changes made to nodes are not to be propagated immediately into other nodes. If immediate visibility of changes is required, however, views are the preferred means of integration. In practice, therefore, a hybrid data integration approach that combines vertical and horizontal integration will be used in most PSDEs. Taking the above examples, the most beneficial integration of the schema for the entity relationship editor with the schemas for the module interface editor and the source code editor is the following. The interface and source code editors each declare a view based on a common conceptual schema and, therefore, share their physical syntax graph representations. The entity relationship editor schema is integrated with the former two by importing from and exporting to the conceptual schema of the two other editors. Thus, changes of an entity name can be performed without immediately changing the module name.
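The deferred propagation that horizontal integration permits can be sketched in Python as follows. This is only an illustration under stated assumptions: EntityNode, ModuleNode and ImportLink are hypothetical names, and the accept/reject protocol is one possible propagation strategy, not one prescribed by the import/export mechanism itself.

```python
from dataclasses import dataclass, field


@dataclass
class EntityNode:
    """Hypothetical node in the entity relationship editor's syntax graph."""
    name: str


@dataclass
class ModuleNode:
    """Hypothetical node in the conceptual schema shared by the two other editors."""
    name: str


@dataclass
class ImportLink:
    """Hypothetical link between an exported entity and its imported counterpart."""
    source: EntityNode
    target: ModuleNode
    pending: list = field(default_factory=list)

    def rename_source(self, new_name):
        # The entity changes at once; the counterpart only receives a proposal.
        self.source.name = new_name
        self.pending.append(new_name)

    def accept(self):
        # The developer in charge of the module applies the oldest proposal.
        self.target.name = self.pending.pop(0)

    def reject(self):
        # The developer in charge of the module discards the oldest proposal.
        self.pending.pop(0)


entity = EntityNode("Customer")
module = ModuleNode("Customer")
link = ImportLink(entity, module)

link.rename_source("Client")
name_before_decision = module.name  # still "Customer": nothing propagated yet
link.accept()
```

Until accept is called, the two documents are deliberately inconsistent, which is exactly the state that a view-based integration cannot represent.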

5.9.4 Summary

We have seen that the schema of a tool can be divided into a reusable and a tool-specific part. The reusable part contains class definitions which implement properties that are common to arbitrary tools. These classes can be used by inheritance or declaration of instance variables in classes contained in the tool-specific part. As tool-specific classes vary from tool to tool, we have discussed how to identify these classes based on the syntax and static semantics of the underlying languages and have suggested strategies for defining inheritance, instance variables and methods of these classes. In the last paragraphs, we have discussed how to integrate schemas of different tools in order to implement inter-document (type) consistency constraints. We have suggested a horizontal and a vertical strategy for data integration and have discussed when the strategies are to be employed and how they can be combined.

5.10 Related Work

A number of syntax-directed tools have been discussed in literature. Among the first were tools developed in the Gandalf project [HN86] (e.g. IPE [MF81], GP [HN86] or GNOME [GM84]), the Cornell Program Synthesizer [RT81] or tools developed in the Mentor project [DGHKL84]. The architectures of all these tools contain a subsystem of some sort that maintains abstract syntax trees of documents in main memory. These subsystems offer operations in order to dump syntax trees to a persistent representation on an operating system file and to restore trees from such a representation. Working on a syntax tree representation in main memory that is not the persistent representation has a number of serious consequences. First of all, checking for inter-document consistency constraints cannot be based on edges that span between documents. Therefore, these tools do not adequately address the issue of checking for consistency between different types of documents. Sometimes different unparsing schemes are used to display different document types. In this case, however, changes made to one document are immediately visible in its corresponding document, which may not always be appropriate. Secondly, updates of concurrent users that affect the document a user is working on cannot be handled appropriately, as there is no common persistent representation. Moreover, tools are not tolerant of hardware or software failures and users might lose significant effort in the case of a failure. Finally, none of these tools support version and configuration management of documents. Different revisions of a file representation of an abstract syntax could be maintained by controlling them with basic versioning mechanisms such as SCCS or RCS [Tic85]. Then, however, neither predecessor and successor relationships between documents nor consistency of configurations can be maintained by the tools.

For tools contained in the IPSEN environment [Nag85, ENS87], a fundamentally different architecture is chosen [Sch86]. First of all, the tools do not operate on a transient document representation, but work directly on the persistent representation managed by the GRAS database system (cf. Subsection 4.3.1). All documents are stored in a single GRAS graph. Each document is represented, in turn, by a subgraph with reference edges leading to other documents' subgraphs, thus facilitating efficient checks of static semantics and inter-document consistency constraints. The GRAS transaction mechanism is used and, therefore, loss of effort in the case of failures is restricted to the last completed command. Revisions of documents are supported in a later version of the IPSEN prototype [Wes89b] based on the functionality offered by the GRAS database system. Configurations that select different versions of a document, however, are not supported. Concurrent editing by multiple users is not possible in IPSEN, since all documents are stored in one graph which has to be locked exclusively as soon as a user wants to modify a document. If documents were stored in separate graphs in order to have a finer granularity of locking, reference edges between different graphs and exploitation of the transaction mechanism would be inhibited due to the limitations of the GRAS database (cf. Page 66). Further drawbacks are the lack of schemas and views in GRAS. Finally, the IPSEN architecture is not open. It might be extended with a subsystem for service execution, but then concurrency control problems might occur since services have to be executed concurrently to editing sessions. This is not possible with the GRAS database if both service and editing sessions have to access the same graph.

The problem of openness is, to a certain extent, solved by the architecture of the Field programming environment [Rei90]. The environment contains a number of tools including textual editors and graphical viewers for source code. These tools are open and communicate with one another based on a message passing subsystem. They send a message of a particular type to a broadcast message server⁴ which, in turn, broadcasts the message to any other tool which has registered as being interested in messages of that type. The problem with Field and its broadcast message server is that it is still a single-user solution as the server cannot route messages to remote workstations. In addition, exchanging messages in no way solves the problem of different users concurrently changing related documents.
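The registration and broadcast scheme described for Field can be sketched in a few lines of Python. This is a single-process illustration of the general pattern only, not Field's actual interface: the class names, method names and the message type FILE_SAVED are all assumptions made for the example.

```python
class BroadcastMessageServer:
    """Hypothetical in-process server that routes messages by type."""

    def __init__(self):
        self._subscribers = {}  # message type -> list of registered tools

    def register(self, message_type, tool):
        self._subscribers.setdefault(message_type, []).append(tool)

    def send(self, sender, message_type, payload):
        # Forward to every registered tool except the one that sent the message.
        for tool in self._subscribers.get(message_type, []):
            if tool is not sender:
                tool.receive(message_type, payload)


class Tool:
    """Hypothetical tool that records every message it receives."""

    def __init__(self, name, server):
        self.name = name
        self.server = server
        self.inbox = []

    def receive(self, message_type, payload):
        self.inbox.append((message_type, payload))

    def announce(self, message_type, payload):
        self.server.send(self, message_type, payload)


server = BroadcastMessageServer()
editor = Tool("editor", server)
viewer = Tool("viewer", server)
debugger = Tool("debugger", server)

server.register("FILE_SAVED", editor)
server.register("FILE_SAVED", viewer)

editor.announce("FILE_SAVED", "main.c")
```

Only the viewer receives the message: the editor is the sender and the debugger never registered for that type. The sketch also makes the criticised limitation visible, since all tools must hold a reference to the same local server object.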

5.11 Summary

The architecture developed in this thesis overcomes all the above deficiencies by storing abstract syntax graph representations of documents in object databases. The structure of, and available operations on, these syntax graphs are defined and controlled by the ToolSchema subsystem, defined in an object-oriented schema definition language. Version management of documents is supported, based on version management primitives for composite objects, which are to date only offered by Orion and O2. Simple support for configuration management is implemented in the ToolKernel of the tool architecture. Data integration of tools, i.e. measures for preserving inter-document (type) consistency constraints, is based on export/import between different schemas, an object-oriented view mechanism or a combination of these. Communication between tools and the process engine is done via a communication protocol built on top of a message router. Commands are executed as ACID database transactions, which are controlled by the ToolKernel in order to secure the integrity of the syntax graph against failures and to make concurrent updates of related documents safe. Distributed execution of tools is supported, though not reflected in the tool architecture. It is transparent to tool builders since it is achieved by the client/server architecture of current object database systems.
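The idea of running each command as a transaction can be illustrated with a small Python sketch. It covers atomicity only (a failed command leaves the syntax graph unchanged), uses an in-memory snapshot instead of a database, and does not model isolation or durability. The names SyntaxGraph and command are hypothetical and do not reflect the ToolKernel's real interface.

```python
import copy
from contextlib import contextmanager


class SyntaxGraph:
    """Hypothetical stand-in for a persistent syntax graph: node id -> name."""

    def __init__(self):
        self.nodes = {}


@contextmanager
def command(graph):
    """Run one editing command; restore the previous state if it fails."""
    snapshot = copy.deepcopy(graph.nodes)
    try:
        yield graph
    except Exception:
        graph.nodes = snapshot  # roll back every change made by the command
        raise


graph = SyntaxGraph()

# A command that completes: its change is kept.
with command(graph) as g:
    g.nodes["e1"] = "Customer"

# A command that fails half way: its change is rolled back.
try:
    with command(graph) as g:
        g.nodes["e1"] = "Client"
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
```

After the failed command the graph holds the state left by the last completed command, which is the guarantee the text attributes to transactional command execution.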

During the course of this chapter, we have separated components of the tool architecture into reusable and tool-specific components. The reusable components are the Control class, the ToolKernel, SoftwareProcessCommunicationProtocol and UserInterface subsystems and some of the classes contained in the CommandExecution, ToolAPI and the ToolSchema subsystems. The ODBS, the MessageRouterInterface and the UIMS subsystems are third party components and have been reused in any case. The reusable components can now be reused in any tool architecture. This significantly simplifies the tool construction process. Nevertheless, there are a number of components which still have to be constructed anew for each tool. These are the LayoutComputation class, the ToolSpecificServices class and most of the classes in the CommandExecution, ToolAPI and ToolSchema subsystems.

⁴ The Broadcast Message Server product from Hewlett Packard was, in fact, built according to the architecture


In document Tool specification with GTSL (Page 124-127)