These classes are described fully elsewhere [dear87]. To give the reader a flavour of PAIL
5.4.2.5 Distribution
In a distributed persistent environment, objects may be transparently moved from one machine to another. The principle of orthogonal persistence implies that users do not know where or how their data is stored; in other words, data is manipulated independently of the storage mechanism. In a distributed network of non-homogeneous machines this has serious consequences for the design of the architecture. Since the architecture supports procedures as first class data objects, one of the objects that may be moved from machine to machine is code. This code must be capable of being executed on any of the machines in the network. Native code compiled for one machine clearly cannot, in general, be executed once it has been moved to a different kind of machine.
There is a need for a machine independent network language that describes procedures. PISA provides two of these. The first is Persistent Abstract Machine (PAM) code. This code may be executed on any machine that has a PAM implementation. It is not optimal since, in general, it will need to be interpreted. For this reason, a location exists within a PAM code vector to which alternative code vectors may be assigned. As an optimisation, native code vectors for any of the machines in the network may be assigned here. This location may be a reference to a vector of alternative code vectors if more than one alternative is required. In order to compile optimal native code for a machine, a second, higher level representation of the procedure is required. PAIL provides this high level, machine independent description of procedures.
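The arrangement of a portable code vector with a slot for native alternatives can be sketched in C. This is only an illustration under stated assumptions: the type names, the field layout and the two machine identifiers are hypothetical and are not taken from the PAM definition. The sketch shows the one decision that matters, namely using a native alternative when one exists for the host and falling back to interpreting the PAM code otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical machine identifiers; the text names no specific machines. */
enum machine { MACHINE_A, MACHINE_B };

/* A native code vector compiled for one kind of machine (assumed layout). */
struct native_code {
    enum machine target;
    const unsigned char *code;
    size_t length;
};

/* A PAM code vector with a location for alternative code vectors. */
struct pam_code {
    const unsigned char *pam;                /* portable PAM code */
    size_t pam_length;
    const struct native_code *alternatives;  /* NULL if none assigned */
    size_t alternative_count;
};

/*
 * Return the native alternative for `host`, or NULL when none has been
 * assigned, in which case the caller must interpret the PAM code.
 */
const struct native_code *select_native(const struct pam_code *cv,
                                        enum machine host)
{
    size_t i;

    assert(cv != NULL);
    for (i = 0; i < cv->alternative_count; i++) {
        if (cv->alternatives[i].target == host)
            return &cv->alternatives[i];
    }
    return NULL;
}
```

Because the PAM code always remains in the vector, the procedure stays executable on every machine in the network even when it arrives somewhere for which no native alternative has yet been compiled.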
5.4.2.6 Protection
Protection of data from corruption and misuse is important in any system; however, it is especially important in a persistent system. The data on which a persistent program operates is not merely local data loaded into RAM. It may be long lived data that has been expensive to collect, equivalent to the data stored in a conventional database. It is essential that this data cannot be corrupted by erroneous programs or by malice.
The most common method employed to protect data is the use of capabilities [nee74,wul74]. A capability gives a program the ability to perform some operations on a collection of data. The data may be viewed as a segment or object, and the capability may then be considered as an access mechanism for that particular object. The kind of operation the program may perform on the object depends on the type of the capability. Capabilities come in different flavours, such as read or write capabilities. A program cannot read data within an object without a read capability for that object.
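The idea of a typed capability guarding access to an object can be sketched as follows. The structure and function names are hypothetical and do not describe any particular capability machine. Note also that, in plain C, nothing stops a program from fabricating a `struct capability` of its own; the sketch shows only the check that is made on each access, not how the capabilities themselves are protected.

```c
#include <assert.h>
#include <stddef.h>

/* Rights a capability may carry (illustrative names). */
enum { CAP_READ = 1, CAP_WRITE = 2 };

/* An object viewed as a segment of words. */
struct object {
    int *words;
    size_t size;
};

/* A capability: a reference to an object plus the permitted operations. */
struct capability {
    struct object *target;
    unsigned rights;
};

/* Read word `i`; returns 0 on success, -1 if the right is missing
   or the index lies outside the object. */
int cap_read(const struct capability *cap, size_t i, int *out)
{
    assert(cap != NULL && out != NULL);
    if (!(cap->rights & CAP_READ) || i >= cap->target->size)
        return -1;
    *out = cap->target->words[i];
    return 0;
}

/* Write word `i`; returns 0 on success, -1 if the right is missing
   or the index lies outside the object. */
int cap_write(const struct capability *cap, size_t i, int value)
{
    assert(cap != NULL);
    if (!(cap->rights & CAP_WRITE) || i >= cap->target->size)
        return -1;
    cap->target->words[i] = value;
    return 0;
}
```

A program holding only a read capability for an object can call `cap_read` successfully, but every `cap_write` through that capability is refused and the object is left unchanged.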
In capability based systems the protection of capabilities themselves becomes crucial. Some protection must exist in the system to ensure that capabilities are not forged, either accidentally or deliberately, by programs. This results in the protection mechanism itself having to be protected, which adds considerable complexity to the architecture.
When a program attempts to perform some operation on an object, the capability that the program holds must be checked. The more complicated the protection regime provided by the system, the more complicated and therefore expensive this checking will be. This checking is extremely costly in terms of program execution time. Research performed on the Cambridge Capability Machine [nee74] estimated that 1000 operations were necessary between capability checks to obtain acceptable performance. This is required in order to keep the cost of context switching small in comparison with the amount of computation that is performed in a context.
In order to achieve this efficiency target, compilers are required to compile checks away by coalescing small objects into larger ones. In this way, one capability may protect many small objects, and many objects may be accessed with only one capability check. However, this may only be achieved if objects can be grouped statically. Whilst this is true for some objects, such as code vectors, it is not generally true; many objects are bound dynamically and must therefore have separate capabilities. This is an intrinsic problem of capability systems and cannot be overcome.
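The saving obtained by coalescing can be illustrated with a small sketch. The check itself is a stand-in that merely counts how often it is called; the function names and the counter are assumptions made for the illustration and do not model the cost of a real check.

```c
#include <assert.h>
#include <stddef.h>

/* Number of simulated capability checks performed so far. */
unsigned long checks_performed = 0;

/* Stand-in for a capability check: counts the call and reports
   whether the (already decoded) right is present. */
static int check_capability(int has_right)
{
    checks_performed++;
    return has_right;
}

/* Sum n small objects, each protected by its own capability:
   one check per object. */
long sum_separate(const int *objects, const int *rights, size_t n)
{
    long total = 0;
    size_t i;

    assert(objects != NULL && rights != NULL);
    for (i = 0; i < n; i++) {
        if (check_capability(rights[i]))
            total += objects[i];
    }
    return total;
}

/* Sum the same objects after they have been coalesced into one
   segment protected by a single capability: one check in total. */
long sum_coalesced(const int *segment, int right, size_t n)
{
    long total = 0;
    size_t i;

    assert(segment != NULL);
    if (!check_capability(right))
        return 0;
    for (i = 0; i < n; i++)
        total += segment[i];
    return total;
}
```

For n objects the first routine performs n checks and the second performs one. The second form is only available when the grouping of the objects is known statically.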
Another common solution to the protection problem is for each process within the system to have its own address space. Each process may only access data within this address space. The machine architecture prevents processes from accessing any data outside their own address space, protecting other data from misuse or corruption. This solution is common in modern operating systems like Unix.
The problem with each process having its own address space is twofold. Firstly, process creation is an extremely expensive operation. This expense has led to the name heavyweight processes being adopted for this solution. Secondly, and more seriously, this solution complicates the sharing of objects between processes.
The protection of data from corruption is only necessary if the executing code cannot be guaranteed to operate safely. Code that has been produced by low level languages such as C or assembler may violate data. A simple example of this is shown in the following segment of C code.
    disaster()
    {
        int *a = 0 ;
        while( ++a ) *a = (int) a ;
    }
This procedure will overwrite all the addresses in the address space with their own address.
If such programs are prevented from occurring, the protection mechanisms described above may be discarded. This allows processes to share one address space without fear of objects being corrupted by rogue processes.
In PISA, the protection of data is achieved by a high level protection mechanism provided by PAIL. All programs that wish to access the persistent store must be compiled into PAIL. PAIL is the lowest level at which access to the store is provided. The integrity of a PAIL program is checked by the code generator. If the program attempts some illegal operation on data, the program will not be accepted by the code generator.
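The kind of check a code generator might apply can be sketched with a toy typed tree. This is not the PAIL representation: the node kinds, the two types and the rule set below are assumptions chosen only to show how a program that manufactures a reference by arithmetic, in the manner of the C fragment above, could be rejected before any code is generated.

```c
#include <assert.h>
#include <stddef.h>

/* Toy types and node kinds (hypothetical, not taken from PAIL). */
enum type { T_ERROR, T_INT, T_REF };
enum kind { K_INT_CONST, K_OBJECT, K_ADD, K_DEREF };

struct node {
    enum kind kind;
    const struct node *left;   /* operand of K_DEREF, left operand of K_ADD */
    const struct node *right;  /* right operand of K_ADD, otherwise NULL */
};

/* Return the type of the tree, or T_ERROR if any operation is
   applied to an operand of the wrong type. */
enum type check(const struct node *n)
{
    assert(n != NULL);
    switch (n->kind) {
    case K_INT_CONST:
        return T_INT;
    case K_OBJECT:
        return T_REF;
    case K_ADD:
        /* Arithmetic is defined on integers only, so a reference
           can never be produced or altered by addition. */
        if (check(n->left) == T_INT && check(n->right) == T_INT)
            return T_INT;
        return T_ERROR;
    case K_DEREF:
        /* Only a genuine reference may be followed; in this toy
           the value found there is always an integer. */
        return check(n->left) == T_REF ? T_INT : T_ERROR;
    }
    return T_ERROR;
}

/* The code generator refuses any program that fails the check. */
int code_generator_accepts(const struct node *program)
{
    return check(program) != T_ERROR;
}
```

In this toy, following a reference to an existing object is accepted, whereas adding an integer to a reference, or following an integer as though it were a reference, causes the whole program to be refused.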
Not all languages may be mapped into PAIL; languages that are not typed, such as C or assembly language, are examples. Admitting only languages that may be compiled into PAIL may seem restrictive, but this is more than compensated for by the simplification of the underlying architecture.
5.5 Conclusions
PAIL is not an intrinsic requirement of the system; it is provided for engineering reasons. It has been shown how the provision of PAIL may support many different activities within the persistent architecture. These activities are wide ranging and include code generation, debugging, optimisation, syntax directed editing, distribution and protection.
6 The Compilation Environment