Requirements on Databases for PSDEs
3.8 Distribution
Most PSDEs run on workstations connected via a local area network, with each user working at one workstation. To benefit from the computational power of modern workstations and to avoid making a centralised host the bottleneck of the PSDE, the tools of a PSDE should be executed on the user's workstation. This, however, creates a need for distribution of documents, or at least for distributed access to documents, and for distributed tool communication.
When a DBSE is used to store the project-wide abstract syntax graph in a database, a database monitor has to implement the concurrency control protocol. Tools therefore cannot access nodes and edges directly, but have to communicate their access requests to the database monitor and wait for the monitor to handle them. Both tools and monitor require a significant number of computations to fulfil their duties. To achieve a PSDE performance acceptable for a number of concurrent users, the execution of database monitors and tool processes should be distributed in order to balance the load over several machines.
Figure 3.5: Centralised Architecture (diagram labels: Tool 1 to Tool 3, Display 1 and Display 2, Database Monitor, DB 1 to DB 3; legend: OS inter-process communication, OS communication to external device, process, machine)
Figure 3.5 depicts a model where this is not the case: the database monitor and the tool processes are executed on the same host. The monitor and the tools use facilities provided by the operating system, such as shared memory, message queues or semaphore sets, for their inter-process communication. The advantage of this model is that the inter-process communication facilities provided by the operating system are fast. The severe disadvantage is that all tools must be executed on the same host and have to share its resources, such as virtual memory and CPU time. The performance of each tool decreases as the number of concurrent tools increases. Hence the host becomes a performance bottleneck as soon as a certain number of concurrent tools is reached.
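The request/reply interaction between tools and a database monitor in the centralised model can be illustrated with a minimal sketch. It assumes Python threads as stand-ins for operating system processes and `queue.Queue` as a stand-in for an operating system message queue; the class names `DatabaseMonitor` and `Tool` and the dictionary-based graph are hypothetical and not taken from any particular DBSE. The monitor here merely serialises requests, which is a much weaker discipline than a real concurrency control protocol.

```python
import queue
import threading


class DatabaseMonitor(threading.Thread):
    """Hypothetical monitor: handles one access request at a time."""

    def __init__(self):
        super().__init__(daemon=True)
        self.requests = queue.Queue()  # stand-in for an OS message queue
        self.graph = {}                # node id -> attribute value

    def run(self):
        while True:
            op, node, value, reply = self.requests.get()
            if op == "write":
                self.graph[node] = value
                reply.put(True)
            elif op == "read":
                reply.put(self.graph.get(node))


class Tool:
    """Hypothetical tool: never touches the graph directly."""

    def __init__(self, monitor):
        self.monitor = monitor
        self.reply = queue.Queue()     # private reply channel of this tool

    def _call(self, op, node, value=None):
        self.monitor.requests.put((op, node, value, self.reply))
        return self.reply.get()        # block until the monitor has answered

    def write(self, node, value):
        return self._call("write", node, value)

    def read(self, node):
        return self._call("read", node)


monitor = DatabaseMonitor()
monitor.start()
editor, browser = Tool(monitor), Tool(monitor)
editor.write("n1", "procedure P")
print(browser.read("n1"))
```

Every tool shares the single monitor and the single host, so each additional tool adds to the same request queue; this is the bottleneck described above.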
Figure 3.6: Client/Server Architecture (diagram labels: Tool 1 to Tool 3, Display 1 and Display 2, Database Monitor, DB 1 to DB 3; legend: network inter-process communication, OS communication to external device, process, machine)
To avoid this situation, the obvious strategy is to use a client/server architecture, with tools running on possibly different client hosts and the database monitor on a server host. This model is depicted in Figure 3.6. As a consequence, communication between database monitor and tools can no longer be implemented by operating system primitives; network communication facilities such as sockets or remote procedure calls must be used instead. The advantage is that the load directly caused by tool execution is removed from the server host.
In Section 3.2 we required the DBSE's schema to define all syntax graph accesses and modifications. There are two options on where to execute these operations. The first option is a server-based client/server architecture, where the database monitor executes access and modification operations, receives operation parameters from tools and communicates operation results back to tools. This can still result in a performance bottleneck on the server host, because a great deal of the functionality of tools is implemented by graph access and modification operations.
Figure 3.7: Client-based Client/Server Architecture (diagram labels: Tool 1 to Tool 3, each with a Database Engine, Display 1 and Display 2, Database Monitor, DB 1 to DB 3; legend: network inter-process communication, OS communication to external device, process, machine)
The other option, which remedies this problem, is sketched in Figure 3.7. In this client-based client/server architecture, the part of the database system that executes the operations defined in the schema (called database engine in the following) is linked with the tool. The engine is therefore executed in the same operating system process as the tool, and the tool communicates with the database engine by procedure calls, which are fairly efficient. The duties that remain with the database monitor are concurrency control and elementary operations on raw data, called pages, which it has to communicate via some network communication protocol to the processes running the database engine. Hence the server hosting the database monitor is further relieved from load compared with server-based client/server architectures.
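The difference between the two options can be made concrete by counting simulated network messages. The following is a sketch under stated assumptions: `PageServer`, `DatabaseEngine` and `reachable` are hypothetical names, a page is modelled as a dictionary mapping each node to its successor list, and the engine's page cache ignores concurrency control and invalidation entirely.

```python
class PageServer:
    """Hypothetical database monitor; counts simulated network round trips."""

    def __init__(self, pages):
        self.pages = pages     # page id -> {node: [successor nodes]}
        self.messages = 0

    def fetch_page(self, page_id):
        # Client-based variant: ship the raw page to the client.
        self.messages += 1
        return dict(self.pages[page_id])

    def successors(self, page_id, node):
        # Server-based variant: execute the graph operation on the server.
        self.messages += 1
        return list(self.pages[page_id][node])


class DatabaseEngine:
    """Hypothetical engine linked into the tool process."""

    def __init__(self, server):
        self.server = server
        self.cache = {}        # page id -> locally held page

    def successors(self, page_id, node):
        if page_id not in self.cache:
            self.cache[page_id] = self.server.fetch_page(page_id)
        return list(self.cache[page_id][node])  # plain procedure call


def reachable(backend, page_id, root):
    """Tool functionality built from graph access operations."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(backend.successors(page_id, node))
    return seen


pages = {"p1": {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}}

server_based = PageServer(pages)
reachable(server_based, "p1", "a")
print(server_based.messages)   # one message per visited node: 4

client_based = PageServer(pages)
reachable(DatabaseEngine(client_based), "p1", "a")
print(client_based.messages)   # one page transfer: 1
```

In this toy traversal the server-based variant costs one round trip per graph operation, whereas the client-based variant transfers the page once and then works by local procedure calls.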
Database monitors access pages either directly on raw disk devices or indirectly via the operating system's file system. In the latter case, the host running the database monitor may be further relieved from load by storing raw data on disks that are connected to other hosts. This is sketched in Figure 3.8. The database monitor must then use network file system facilities, and the server running the database monitor becomes a client of other file servers. We therefore call this architecture a multi-level client/server architecture.
In the previous models, any access a tool performs to a node or edge of the project-wide abstract syntax graph must be processed by a single database monitor. Therefore, even multi-level client/server architectures cannot be used in arbitrarily large projects. The distributed database architecture sketched in Figure 3.9 no longer assumes a single monitor process but allows for multiple database monitors. Each of these monitors controls a set of local databases. Each tool is still served by one monitor. Accesses of a tool to a remote database must be transferred from the local monitor to the respective remote monitor and be handled there.
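The forwarding of remote accesses between monitors can be sketched as follows. This is a simplified illustration, not a description of any existing distributed DBSE: the `Monitor` class, its `connect` method and the dictionary-based databases are hypothetical, and the remote call is modelled as an ordinary method call rather than a network message.

```python
class Monitor:
    """Hypothetical database monitor controlling a set of local databases."""

    def __init__(self, name, databases):
        self.name = name
        self.databases = databases  # database name -> {node: value}
        self.peers = {}             # remote database name -> responsible Monitor
        self.forwarded = 0          # accesses handed on to a remote monitor

    def connect(self, other):
        # Exchange knowledge about which monitor controls which database.
        for db in other.databases:
            self.peers[db] = other
        for db in self.databases:
            other.peers[db] = self

    def read(self, db, node):
        if db in self.databases:    # local database: handle here
            return self.databases[db][node]
        self.forwarded += 1         # remote database: transfer the access
        return self.peers[db].read(db, node)


m1 = Monitor("m1", {"DB1": {"x": 1}})
m2 = Monitor("m2", {"DB2": {"y": 2}, "DB3": {"z": 3}})
m1.connect(m2)

# A tool served by m1 accesses one local and one remote database.
print(m1.read("DB1", "x"), m1.read("DB3", "z"), m1.forwarded)
```

A tool only ever talks to its own monitor; whether an access is handled locally or remotely is decided by that monitor.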
Figure 3.8: Multi-level Client/Server Architecture (diagram labels: Tool 1 to Tool 3, each with a Database Engine, Display 1 and Display 2, Database Monitor, DB 1 to DB 3, two NFS Servers; legend: network inter-process communication, OS communication to external device, process, machine)
Figure 3.9: Distributed Database Architecture (diagram labels: Tool 1 to Tool 3, each with a Database Engine, Display 1 and Display 2, two Database Monitors, DB 1 to DB 3; legend: network inter-process communication, OS communication to external device, process, machine)
In practice, hybrid architectures that combine different distribution paradigms may be used. Database monitors of distributed database systems, for instance, can store their raw data on other file servers. Similarly, the database engine in distributed database systems may reside with the database monitor or be linked with tools.
In short, we require from the DBSE at least distribution support based on a client-based client/server architecture or a multi-level client/server architecture. For large projects, acceptable performance will not be achieved without a distributed database architecture. The disadvantage of using a distributed database architecture is that it incurs additional administration overheads, as several database monitors must be controlled and data distribution must be administered. Moreover, the two-phase locking protocol alone will no longer be sufficient for transaction management: it will have to be complemented by a two-phase commit protocol, so that a transaction spanning several monitors is either completed or aborted by all of them.
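The core idea of a two-phase commit can be sketched in a few lines. The sketch below is deliberately minimal and rests on several assumptions: `Participant` stands for a database monitor taking part in a distributed transaction, `two_phase_commit` plays the coordinator, and logging, timeouts and recovery from failures, which a real protocol requires, are omitted.

```python
class Participant:
    """Hypothetical database monitor taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote on whether the local part of the transaction can commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator: commit everywhere or nowhere."""
    # Phase 1 (voting): ask every participant to prepare.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): commit only if all participants voted yes.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


monitors = [Participant("m1"), Participant("m2", can_commit=False)]
print(two_phase_commit(monitors), [p.state for p in monitors])
```

A single negative vote causes every monitor to abort, which is what keeps the local databases mutually consistent.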