Programming Languages and Related Concepts

As previously mentioned, the Stack Based Approach has been used successfully to seamlessly integrate querying constructs and programming abstractions into a uniform database programming language. F irst, we introduce this approach in greater detail. Subsequently, we will address other integrated languages and object-oriented (database) environments that have made significant contributions to the respective research areas.

2 . 1 The Stack-Based Approach

Subieta et al. [131] investigates the 'seamless' integration of a query language with a programming language. Thus, a foundation of a QL-centralised programming language according to the traditional paradigms of the programming language domain is built. An extended approach to two-stack abstract machines - known from classical programming languages such as Pascal - is presented. This proposal includes definitions of an abstract storage model, an abstract machine model, and semantics of query and programming language operators that are defined through operations on these stacks. F igure 2 . 1 shows the relationship between data models and the abstract storage model.

Data Model

j

: Query in the : : Data Model :

I . I

1 Query Adressmg the 1 1 Abstract Storage Model I L - - - 1

� SBQL's Abstract

Machine Program Storage Model

: Interpretation of rile Result : in the Data Model

2 . 1 . THE STACK-BASED APPROACH Markus ·Kirchberg In the abstract storage model, objects are defined as triples < id, name, value > ,

where i d i s a n internal object identifier, name represents its external identifier,. ancl'

value is either an atomic value, an identifier of another object or a �et of ·objects. ·A database instance is viewed as a set of objects that satisfy the referential integrity . constraint.

The abstract machine model introduces two stacks. These are the environment stack

ES and the query result stack QRES. The environment stack deteri:nimis scoping and binding1 . ES never stores objects directly, instead it only holds (collections of) references to objects. The query result stack is a storage for i ntermediate results, used either for the evaluation of query operators or for the evaluation of arithmetic-style expressions. Results are represented as tables, which are bags of atomic values and internal object identifiers.

Semantics of operations are part of the proposed language SBQL (Stack-Based Query Language) . It is an untyped, query-centralised programming language in the 4GL style. Evaluation of SBQL operations is performed by a single machine. The state of this machine is defined by the current database instance, ES and QRES. Seman tics of SBQL operations are defined using top-down recursion in accordance with the respective query tree. In particular, semantics are defined for the following operations: Atomic queries, compound queries (using both algebraic and non-algebraic operators), selection, projection, navigation, path expressions, natural join, quantifiers, bounded variables, transitive closure, ordering, null values, variants, assignments, and f or each statements. In addition, remarks are provided on how to deal with procedures, classes, class inheritance, methods, and encapsulation. As an example, let us consider oper ational SBQL semantics for the navigational join [131 , pages 33 and 34] in greater detail.

EXAMPLE 2 . 1 . SBQL supports only a single join operator, which is the navigational

join. Its operational semantics are defined as follows:

01 procedure evaL ( query : string ) ; 02 begin

04 if query is recognised as q1 t><l Q2 then

11 the ma�n eval procedure

05 begin 11 commence evaLua t i on of a navigat ionaL join 06 var RESUL T : Table ; 11 define resu L t variab L e as a tab L e 07 RESULT := 0 ; 11 ini t i a L is e resu L t v ariab L e 0 8 eva L ( q1 ) ; I I eva Lua t e Ql expression

09 f or each r E t op ( QRES ) do I I for each row of the resu L t of Ql do

10 begin

1 1 push ( ES , nes t ed ( r ) ) ; I I open a new scop e on ES 1 2 eva L ( q2 ) ; I I eva Luate q2 express ion 1 3 RESULT : = RESULT U ( r. 181 t op ( QRES ) ) ; I I join and add t o resu L t

14 p op ( QRES ) ; I I c anc e L the resu L t of Q2

15 pop ( ES ) ; I I res t ore the previ ous s t a t e of ES 16 end ;

1 Binding refers to the association between two things, such as the association of values with identifiers. Binding is closely related to scoping. In fact, scopes decide when binding occurs. In most modern PLs, the scope of a binding is determined at compile-time (i.e. static scoping). Alternatively, binding may be dependent on the flow of execution ₍i.e. dynamic scoping).

2.2. OVERVIEW OF DB P ROGRAMMING ENVIRONMENTS Markus Kirchberg

17 pop C QRES ) ;

18 push ( QRES , RESULT 19 end ;

20 else . . . 2 1 end ( * eva L * ) ;

) ;

11 cance L the resu L t of ql

11 compose resu L t

r. denotes a single-row table from the row r . 0 denotes the horizontal composition of

bags. D

The SBA approach has been implemented in the LOQIS system [1 29] .

More recently, this SBA approach has been refined (refer [130] ) . In contrast to the earlier approach, it now supports a hierarchy of object store models ranging from a simple version (called MO) , which covers (nested-) relational and XML-oriented DBSs to higher-level object store models covering classes and static inheritance (in the model known as M 1 ) , object roles and dynamic inheritance (in the model known as M2) , and encapsulation (in the model known as M3) . In addition, a number of extensions have been proposed introducing types [52, 77] , object roles [54] and views [74] . However, the shortcomings outlined in Section 1 . 1.2 still apply.

2 . 2 Overview of Existing Database Programming Environments

While the SBA approach is of particular interest to our research, there are numerous research findings that have influenced the iDBPQL proposal. In this section, we intend to provide a brief overview of research results that are important to the DBPL research community and, in particular, to our research.

In general, there are two approaches to develop ODBSs. On one hand, a type system from an existing OOPL can be adopted and then extended to include concepts vital for DBSs. On the other hand, research may start by proposing a new, purpose-built object model. For instance, the former approach has been adopted by the DBPL project (which is based on Modula-2) , ObjectStore (which is based on C++) and Gemstone (which is based upon Smalltalk) . The latter approach has been more popular among research prototypes such as 02, SBA, TIGUKAT, and iDBPQL.

Persistence is a property not commonly supported by main stream programming languages. Thus, a large body of research has been devoted to address this shortcoming.

Next, we will introduce selected DB programming environments and important re search contributions in more detail.

2.2.1 The Database Programming Language DBPL

The Database programming language DBPL [115] integrates programming abstrac tions and querying constructs by extending the Modula-2 [144] system programming language. Main extensions include the addition of bulk data management (through a bulk type constructor for ' keyed sets' or relations) , access expressions (for access ing relational variables) , persistent modules (by extending Modula-2's modularisation mechanism with a special DATABASE MODULE construct) , and transactions (as special procedures) .

2 .2. OVERVIEW OF DB PRO G RAMMING ENVIRONMENTS Markus Kirchberg

While DBPL is not an object-oriented language, it still had a major impact on the development of object-oriented database languages. It demonstrated the feasibility and practicality of a database programming language t hat has been designed to ad vocate simplicity and orthogonality. In addition, DBPL supported an implementation ( procedure) type, but without incorporating a notion of methods.

2.2.2 The 02 Object Database System

The 02 object database system [14] has been developed in the late 1980's and 1990's. Starting from various (research) prototype systems, 02 became one of the most suc cessful commercial object base systems by the end of the 1 990's. The main objective underlying the development of 02 was to build an environment for data intensive ap plications that integrates the functionality of a DBMS, a programming language, and a programming environment .

02 distinguishes between values and objects. Each value has a type, which describes its structure. Pre-defined primitives are associated with types that may be invoked upon corresponding values. Besides atomic values, sets, lists and records are supported. Ob j ects belong to classes, have an identity and encapsulate values and user-defined meth ods. A class has a name, a type and a set of methods. Classes and types are partially ordered according to an inheritance relation. Multiple inheritance is supported. Persis tence is user-controlled. An 02 schema consists of classes, types, named objects and named values. A query language that uses the distinction between objects and values and the existence of primitives to manipulate structured values has been designed.

The 02 object base contains the 02 object manager [138] that is concerned with complex object access and manipulation, transaction management, persistence, disk management, and distribution in the adopted server j workstation environment. The object manager is divided into four layers:

- A layer that supports the manipulation of objects and values as well as transaction control;

- A memory management layer;

- A communication layer executing object transfers, execution migration, and application downloading; and

- A storage layer, on the server, provides persistence, disk management and transac tion support.

Applications correspond to system processes. For every application there is one process on the workstation and one (mirror) process on the server. The lock table, the buffer and the object manager's memory are shared among all processes.

2 .2.3 The Object Base Management System TIGUKAT

The TIG UKA T project [100] has been another attempt at developing a novel ODBMS that integrates object-oriented concepts with database systems. Characteristics of this approach include a purely behavioural object model (i.e. user interactions only happen t hrough behaviour invocations) , a uniform object model (i.e. everything is a first-class object; there is no separation between objects and val ues) , and the vision that this

2.2. OVERVIEW OF DB PROGRAMMING ENVIRONMENTS Markus Kirchberg

uniformity property is extended to other system entities such as queries, transactions etc. Research started by investigating requirements of OOPLs, DBPLs and ODBSs. The selection of requirements was mainly theory-driven. As a result, numerous desired properties that are not commonly found in practical systems even within the respective domain have been added. For instance, support of multi-methods and parametric poly morphism together with inclusion polymorphism is uncommon even in today's OOPLs due to their complexity and associated implementation challenges. Leontiev et al. [79] outlines all identified requirements in more detail and also proposes ten tests that may be used to verify whether or not an object-oriented type system satisfies some or all of the desired properties. Leontiev [78, page 206] already acknowledges the complexity and difficulty of meeting the outlined set of requirements: '[. .. ] the desired combination of features remains elusive in that no type system, theoretical or practical, exists in a programming language or proposed, has all the features necessary for the consistent and uniform treatment of object-oriented database programming. ' . In the same article [78] , a corresponding type system is proposed that passes all ten tests. While theoretical correctness was proven, practical verification or prototyping remained unaccomplished. Among other things, the complexity of verification algorithms is assumed to be expo nential.

The TIGUKAT research project was terminated (in 1 999, to the best of our knowl edge) without addressing problems that arise when data creation and manipulation statements, PL constructs, transactions and distribution are included into the pro posed behavioural DBS language. Nevertheless, this research project not only provides an extensive review of existing database and system programming languages, but it also influenced our research. In particular, the shape of parts of TIGUKAT's type sys tem hierarchy and the adopted approach to class-based object access has affected the respective concept in the proposed language iDBPQL.

2.2.4 The Parallel Database Programming Language FAD

The Franco-Armenian Data Model (FAD) has been designed for highly parallel da tabase systems. Besides query operations, the optimisation of programming language constructs, in terms of utilising parallelism, is considered to enhance system perfor mance. The language FAD is presented in [13] . FAD provides a rich set of built-in structural data types and operations. It operates on sets and tuples, which can be nested within each other to an unlimited degree. FAD operators mainly utilise pipelin ing and set-oriented parallelism. For instance, a f i lter operator is proposed targeting sets. In addition, a pump operator, which supports parallelism by a divide and conquer strategy, targets aggregate functions. Both operators have i nfluenced how iDBPQL supports simultaneous processing.

FAD was later extended [51 ] with communication primitives, in particular with asynchronous message passing mechanisms. The resulting language is called PFA D ,

2.2. OVERVIEW OF DB PROGRAMMING ENVIRONMENTS Markus Kirchberg

2.2.5 Additional Relevant Research Results on Database Programming Languages

The Object Database and Environment (ODE) [2] is based on the C++ [128] object paradigm. C++ classes are used for both database and general purpose manipulation. ODE provides its own database programming language 0++ [1] , which allows the definition, querying and manipulation of ODE databases. Persistence is modelled on the 'heap'. In particular, memory is partitioned into volatile and persistent. Volatile objects are allocated in main memory (on the heap) . Persistent objects are allocated in the persistent store. An ODE database is a collection of persistent objects. Each object is identified by a unique object identifier, which is realised as (persistent) pointer to a persistent object.

PS-Algol [9] is an experimental language, which demonstrates that persistence can be achieved regardless of type. It is considered to be the first programming language that supports this orthogonal persistence property.

The support of orthogonal persistence is strongly advocated, especially for DBPLs. Among others, [10, 1 1] underline this importance. Three principles of persistence are outlined ' that should govern language design:

- Persistence should be a property of arbitrary values and not limited to certain types.

- All values should have the same rights to persistence.

- While a value persists, so should its description (type). ' [ 1 1 , page 109]

In addition, [11] also advocates the support of type completeness (i.e. all data types should be of equal status) and adequate expressive power (i.e. computational power, database manipulation and database access) .

This need for orthogonal persistence is reinforced in [10] . Technologies required to support orthogonal persistence are presented. These include a stable and reliable per sistent object store, implementing persistence by reachability and type-safe linguistic reflection as defined in [126] .

Napier88 [93] is a persistent programming system that supports orthogonal per sistence and type completeness. In contrast to its predecessor, PS-algol, the Napier88 system consists of its own, special-purpose built N apier88 language and a persistent environment. As such, it uses objects within the persistent store.

In Napier88, the philosophy that types are sets of values from the value space is adopted. This is in accordance with research results obtained in type theory (e.g. refer to [27] ) .

Concurrency is provided by threads and semaphores for co-operative and competi tive concurrency, and designer transactions.

The Napier88 system is designed as a layered architecture consisting of a compiler, the Persistent Abstract Machine (PAM) and persistent storage architecture. Multiple incarnations of persistent stores and activations of the PAM are supported. However, only one PAM incarnation may work on one persistent store at any one time.

2.2. OVERVIEW OF DB PROGRAMMING ENVIRONMENTS Markus Kirchberg Fibonacci [5] is an object-oriented database programming language, which is based on the Galileo language [4] . Its main contribution is the proposal of three orthogonal mechanisms: Objects with roles, classes and associations.

Objects are encapsulated entities that can only be accessed by method invocation. Objects are grouped into classes. Internally, objects (e.g. persons) are organised as acyclic graphs of roles (e.g. students and academic staff members) . An object can only be accessed through one of its roles. Also, the associated behaviour depends on the role used to access the object. This enables a person to be both a student and an academic staff member (without supporting mul tiple inheritance on the level of objects / classes) . Associations relate objects that exist independently. In fact, they represent modifiable n-ary symmetric relations between classes.

Considering database functionality, Fibonacci supports common concepts such as persistence, transactions, queries, and integrity constraints. In addition, a modu larisation mechanism is provided enabling the structuring of complex databases in interrelated units, and for the definition of external schemata.

The Java [46] application programming language has become the most popular object-oriented language to date. There are numerous languages derived from Java, such as Orthogonally Persistent Java (PJava) [12] . PJava extends Java by supporting the three principles of persistence as outlined earlier. Persistence is achieved through promotion. As the first object of a class is promoted, so is its class and all classes that are used to define it. Thus, the promotion algorithm maintains reachability (sub-) graphs.

In addition, research results from our own research group are incorporated in Chap ter 3. Findings from other researchers that relate to a particular iDBPQL concept (e.g. support of NULLable types, union types and simultaneous processing) , related ODBS components (e.g. the persistent object store and the multi-level transaction manage ment system) and further implementation-specific issues are stated when the corre sponding concepts are to be discussed.

Chapter

In document Integration of database programming and query languages for distributed object bases : a dissertation presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Information Systems at Massey University (Page 30-38)