THE OBJECT MODEL - THE ASSOCIATIVE MODEL OF DATA

Object orientation is first and foremost a way of programming that ensures the integrity of data in the computer’s main memory. In object-oriented programming languages, the purpose of an object is to act as the custodian of some data. The object guarantees the integrity of its data by not allowing anything except itself to see or touch the data. Any process that needs to read or change the data must do so by invoking one of the object’s methods. A method is a process that the object itself knows how to perform on its data, in a manner designed and tested to guarantee the integrity of the data at all times. This way of hiding data from the outside world to ensure its integrity is called encapsulation.

An object’s data items and methods are determined by its class. A class is an abstract expression of the characteristics and behaviour of a collection of similar objects. A class may inherit data items and methods from other classes. An object is an instance of its class, and is said to instantiate the class. Instance is another word for object.

Your bank account is a good real-world analogy for an object. You cannot change anything in your bank account directly: instead you send it messages in the form of cheques, deposits and so on. Your bank account’s methods are “Pay a cheque”, “Receive a deposit”, “Produce a statement” and so on. Your bank account instantiates the class Bank account.
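The bank-account analogy can be sketched in Python. This is a minimal illustration, not a banking design: the class name `BankAccount`, its method names and the choice of holding amounts in pence are all assumptions made for the example. Python marks private data only by convention (a leading underscore) and does not enforce it, so the sketch shows the discipline of encapsulation rather than a guarantee of it.

```python
class BankAccount:
    """Hypothetical account: the balance and history are held privately
    (by convention) and change only through the account's own methods."""

    def __init__(self, holder: str):
        self._holder = holder
        self._balance = 0      # in pence, to avoid floating-point rounding
        self._history = []     # (description, signed amount) pairs

    def receive_deposit(self, amount: int) -> None:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount
        self._history.append(("deposit", amount))

    def pay_cheque(self, amount: int) -> None:
        if amount <= 0:
            raise ValueError("cheque amount must be positive")
        if amount > self._balance:
            raise ValueError("insufficient funds")  # the method guards integrity
        self._balance -= amount
        self._history.append(("cheque", -amount))

    def produce_statement(self) -> str:
        lines = [f"Statement for {self._holder}"]
        lines += [f"{desc}: {amt:+d}" for desc, amt in self._history]
        lines.append(f"balance: {self._balance}")
        return "\n".join(lines)


account = BankAccount("A. Customer")
account.receive_deposit(10_000)
account.pay_cheque(2_500)
print(account.produce_statement())
```

A cheque larger than the balance is refused by `pay_cheque` itself, so no caller can leave the account in an inconsistent state by going through its methods.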

For a mental image of an object, picture an egg. The yolk is the data. The white is the methods themselves, comprising procedural code, and the shell is the interface that the object presents to the world outside, comprising the names of its methods and their various parameters.

Objects and Methods

Let’s look more closely at objects and methods. Take a piece of data: August 11th 1999. In a traditional programming environment, this data item would exist in a computer’s memory as a string of digits: “19990811”. (Computers often store dates as year/month/day so that they can be ordered and compared more easily.) Suppose we want a program to calculate the date of the same day in fifty years’ time. Simple enough – we just add 50 to the year.
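A quick way to see why year/month/day is convenient: strings in that layout sort into chronological order with an ordinary text comparison, whereas day/month/year strings do not. The three sample dates are arbitrary.

```python
# The same three dates in two layouts: YYYYMMDD and DDMMYYYY.
ymd = ["19990811", "20000101", "19981231"]
dmy = ["11081999", "01012000", "31121998"]

# Plain string sorting puts YYYYMMDD strings in chronological order...
print(sorted(ymd))  # ['19981231', '19990811', '20000101']

# ...but not DDMMYYYY strings, whose leading characters are the day.
print(sorted(dmy))  # ['01012000', '11081999', '31121998']
```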

But suppose we forget that the date is stored in year/month/day form, and so by mistake tell the program to add 50 to the last four digits, which is where the year would have been if we hadn’t inverted it. The result when we read the data back is “19990861”, or August 61st 1999, very far from the desired result.
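The mistake can be reproduced in a few lines. Both helper functions are hypothetical and work directly on the raw digit string, which is exactly the kind of unguarded access the rest of this section argues against; the “correct” version also ignores the February 29th edge case.

```python
stored = "19990811"  # August 11th 1999, held as YYYYMMDD


def add_years_correctly(yyyymmdd: str, years: int) -> str:
    # In this layout the year is the FIRST four digits.
    return f"{int(yyyymmdd[:4]) + years:04d}{yyyymmdd[4:]}"


def add_years_by_mistake(yyyymmdd: str, years: int) -> str:
    # Treats the LAST four digits as the year, as if the layout were DDMMYYYY.
    return f"{yyyymmdd[:4]}{int(yyyymmdd[4:]) + years:04d}"


print(add_years_correctly(stored, 50))   # 20490811, August 11th 2049
print(add_years_by_mistake(stored, 50))  # 19990861, "August 61st 1999"
```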

In an object-oriented environment, the data would be in the custody of an object, and nothing else would be allowed to access the data directly. The object’s data items are “day”, “month” and “year”, but other programs can see neither their names nor their values. All that the object shows to other programs is its methods, which are the things that an object can do. Some of the methods for our date might be:

• Tell me what date you are
• Tell me just your year (or month, or day)
• Tell me what day of the week you are
• Tell me how many days before or after a given date you are
• Add x years to yourself and tell me the result
• Add x months to yourself and tell me the result
• Add x days to yourself and tell me the result

So to calculate a date 50 years on, we have our program invoke the method “Add x years to yourself”, specifying a value for x of 50, and the object gives us the answer “August 11th 2049”. In this object-oriented environment, we simply do not have the option of trying to do the job ourselves by getting our hands on the data.
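A sketch of such a Date object in Python, assuming the standard `datetime` and `calendar` modules do the underlying arithmetic. The method names (`describe`, `day_of_week`, `days_from`, `add_years` and so on) are illustrative renderings of the messages listed above, not a standard API. Each “Add x … to yourself” method returns a new Date rather than altering the existing one, which is one reasonable reading of “tell me the result”.

```python
import calendar
import datetime


def _ordinal(n: int) -> str:
    """English ordinal: 1 -> '1st', 11 -> '11th', 22 -> '22nd'."""
    if 11 <= n % 100 <= 13:
        return f"{n}th"
    suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


class Date:
    """The stored value is private by convention; callers use the methods."""

    def __init__(self, year: int, month: int, day: int):
        self._value = datetime.date(year, month, day)  # rejects invalid dates

    def describe(self) -> str:                   # "Tell me what date you are"
        d = self._value
        return f"{calendar.month_name[d.month]} {_ordinal(d.day)} {d.year}"

    def year(self) -> int:                       # "Tell me just your year"
        return self._value.year

    def month(self) -> int:
        return self._value.month

    def day(self) -> int:
        return self._value.day

    def day_of_week(self) -> str:                # "...what day of the week you are"
        return calendar.day_name[self._value.weekday()]

    def days_from(self, other: "Date") -> int:   # positive if self is later
        return (self._value - other._value).days

    def add_days(self, x: int) -> "Date":
        d = self._value + datetime.timedelta(days=x)
        return Date(d.year, d.month, d.day)

    def add_months(self, x: int) -> "Date":
        # Clamp the day, so January 31st + 1 month gives the end of February.
        total = self._value.year * 12 + (self._value.month - 1) + x
        year, month = total // 12, total % 12 + 1
        day = min(self._value.day, calendar.monthrange(year, month)[1])
        return Date(year, month, day)

    def add_years(self, x: int) -> "Date":
        # February 29th becomes February 28th in a non-leap target year.
        return self.add_months(12 * x)


eclipse = Date(1999, 8, 11)
print(eclipse.describe())                # August 11th 1999
print(eclipse.day_of_week())             # Wednesday
print(eclipse.add_years(50).describe())  # August 11th 2049
```

Because callers never see the stored value, the internal `datetime.date` could later be replaced by a year/month/day string without changing any calling code.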

This is an elegant way to work, but why is it so significant? To appreciate the full implication, picture yourself as a programmer working in a multi-tasking computing environment, where one computer’s central processor and main memory may be used not only by your own program, but also by many other programs simultaneously. In a traditional programming environment, your data could be overwritten or corrupted not only by your own mistakes, as in our example, but also by the mistakes of any number of other programmers who may have once worked on the programs executing alongside your own. By contrast, in an object-oriented environment, each piece of data is safely encapsulated inside its own custodian object.

Clearly this technique can significantly reduce the number and the impact of programming errors. Software errors are now the main cause of space rocket launch failures, so the potential for savings and disaster avoidance is clear.

One further word on encapsulation. Confusingly, many object-oriented programming languages let programmers circumvent encapsulation by giving objects public data items, which can be read and changed by the methods of other objects. The use of public data is a blatant subversion of object orientation and, as experienced programmers will know, it almost always comes back to bite those who practise it.
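Python illustrates the point well, because its attributes are public unless a programmer chooses otherwise. In this hypothetical `PublicDate` class nothing stands between other code and the data items, so the “August 61st” error returns; the standard library’s `datetime.date`, whose attributes are read-only, refuses the same change.

```python
import datetime


class PublicDate:
    """Hypothetical class that exposes its data items as public attributes."""

    def __init__(self, year: int, month: int, day: int):
        self.year, self.month, self.day = year, month, day


d = PublicDate(1999, 8, 11)
d.day += 50                    # any code may do this; no method is consulted
print(d.year, d.month, d.day)  # 1999 8 61, i.e. "August 61st 1999"

# datetime.date exposes read-only attributes, so the same change is refused.
protected = datetime.date(1999, 8, 11)
try:
    protected.day += 50
    refused = False
except AttributeError:
    refused = True
print(refused)                 # True
```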

Classes

It would be inappropriate to specify the entire list of data items and methods anew for every new date we come across, so instead we define a class called Date that does the job once and once only. Then, each time we need a new date, our program says “Create a new instance of the class Date called Date of eclipse”. Date of eclipse then comes fully equipped with the data items and methods that it needs to be a Date. So the class Date is an abstraction of every date we are likely to need in the future, and all future dates are constructed by reference to the class Date.

Another way to look at it is that the class Date is a machine whose purpose is to create properly-formed dates. When a new class is created, it is equipped by its creator with a full set of methods, including a special method called a constructor whose sole job is to create new instances of the class and ensure that only valid instances may be created. Once a class is written and thoroughly tested, it can safely be made available for use to other programmers.
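A constructor that acts as the gatekeeper for valid instances might look like the following sketch. The validation rules (Gregorian calendar, years 1 to 9999) are assumptions chosen for illustration.

```python
import calendar


class Date:
    """Sketch of a class whose constructor refuses to create invalid dates."""

    def __init__(self, year: int, month: int, day: int):
        if not 1 <= year <= 9999:
            raise ValueError(f"year out of range: {year}")
        if not 1 <= month <= 12:
            raise ValueError(f"month out of range: {month}")
        days_in_month = calendar.monthrange(year, month)[1]
        if not 1 <= day <= days_in_month:
            raise ValueError(f"day out of range for {year}-{month:02d}: {day}")
        self._year, self._month, self._day = year, month, day

    def __repr__(self) -> str:
        return f"Date({self._year}, {self._month}, {self._day})"


date_of_eclipse = Date(1999, 8, 11)
print(date_of_eclipse)  # Date(1999, 8, 11)

for bad in [(1999, 8, 61), (1999, 13, 1), (1999, 2, 29)]:
    try:
        Date(*bad)
    except ValueError as exc:
        print("refused:", exc)
```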

When we come across a need for a new class, often we already have a class that does part but not all of the job that we need doing. When this is the case, we can use the existing class as a starting point for the new one, and create our new class by adding the extra data items and methods that we need to the existing class. But we must do this carefully. The old class is already in use, so we can’t add the new items directly to it because this would alter its behaviour, and disrupt programs that rely on it. Nor do we want to make a copy of the old class and add the new items to it to form the new one, because if we later find a problem in the old class, we would then have to fix it in two or more places.

Instead, we use a mechanism called inheritance. We define the new class by associating it with the old one, and then specifying the extra items that we need. When the programming language compiler wants to build a complete picture of the new class, it first builds a picture of the old one by referring to its current definition, and then adds the extra items to create the new one. Suppose that as well as the date we also want to record the exact time of the eclipse, 11:11am on August 11th 1999. We can create the new class, Date and time, by taking the class Date and adding data items for the time of day, together with the appropriate methods to deal with them. Then we can say “Create a new instance of the class Date and time called Date and time of eclipse”.
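The Date and time example can be expressed with Python subclassing. The sketch assumes a cut-down `Date` class (repeated here so the block runs on its own) and adds hour and minute as the extra data items. `DateAndTime` reuses Date’s validation through `super()` instead of copying it, so a fix made to `Date` automatically reaches the subclass.

```python
import calendar


class Date:
    def __init__(self, year: int, month: int, day: int):
        if not 1 <= month <= 12 or not 1 <= day <= calendar.monthrange(year, month)[1]:
            raise ValueError("invalid date")
        self._year, self._month, self._day = year, month, day

    def describe(self) -> str:
        return f"{calendar.month_name[self._month]} {self._day} {self._year}"


class DateAndTime(Date):
    """Inherits Date's data items and methods, adding hour and minute."""

    def __init__(self, year: int, month: int, day: int, hour: int, minute: int):
        super().__init__(year, month, day)  # reuse Date's validation unchanged
        if not (0 <= hour < 24 and 0 <= minute < 60):
            raise ValueError("invalid time")
        self._hour, self._minute = hour, minute

    def describe(self) -> str:
        # Extend, rather than copy, the inherited behaviour.
        return f"{self._hour:02d}:{self._minute:02d} on {super().describe()}"


date_and_time_of_eclipse = DateAndTime(1999, 8, 11, 11, 11)
print(date_and_time_of_eclipse.describe())          # 11:11 on August 11 1999
print(isinstance(date_and_time_of_eclipse, Date))   # True
```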

The Object Model of Data

The object model of data was originally developed to provide persistent storage for object-oriented programming languages. Whilst an object-oriented program is running, all of its variables (that is, the data items that it is using at a particular point in time) are stored in main memory in the custody of objects. When the program ends, the memory is cleared and the objects are lost. Persistence is the capability that allows the objects to be stored on disk, so that when the program is stopped and re-started, it can re-load its objects from disk and carry on exactly where it had left off.
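Python’s standard `shelve` module gives a small taste of persistence: it stores picklable objects on disk under string keys. This is a sketch, not an object database; the store name and key are assumptions, and both “runs” happen in one process here, with the store closed and reopened to stand in for stopping and restarting the program.

```python
import datetime
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "objects")  # hypothetical store name

    # First "run": an object in main memory is written to disk.
    with shelve.open(path) as store:
        store["date of eclipse"] = datetime.date(1999, 8, 11)

    # Second "run": the store is reopened and the object reloaded from disk.
    with shelve.open(path) as store:
        reloaded = store["date of eclipse"]

print(reloaded)  # 1999-08-11
```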

A conference room whiteboard provides a good analogy for persistence. The content of a whiteboard is not persistent, because the next group of people to use the conference room will probably clean the whiteboard and write on it themselves. So if you want to keep a permanent record of anything you write on the board during your own meeting, you must copy it onto paper before you leave the conference room at the end of your meeting. Object databases were developed to be like the paper onto which the whiteboard’s contents are written.

The need for persistence first became evident for object-oriented Computer Aided Design (CAD) applications. After word processing, spreadsheets and programming tools, CAD was the fourth “killer application” for personal computers. The data used by the first three of these came in the form of strings of text and numbers interspersed with special characters such as carriage returns to show how the data should be presented. These strings could exist just as readily on disk as in main memory, so no special tools were needed to write the data to disk when the program ended. CAD applications were different. They allowed their users to create complex graphical representations that existed solely as networks of interrelated objects in main memory, and could not easily be converted into strings of characters that could be saved to and reloaded from disk (a process known as “serialisation”). Hence the need for object databases, which could copy the objects between main memory and disk virtually unaltered.
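The difficulty can be seen with Python’s `pickle` module, which serialises an entire network of objects, including shared references and cycles. The `Point`, `Line` and `Drawing` classes are hypothetical stand-ins for a CAD model. `pickle` is a serialisation format rather than a database, but it shows what has to be preserved when such a network is flattened and rebuilt.

```python
import pickle


class Point:
    def __init__(self, x: float, y: float):
        self.x, self.y = x, y


class Line:
    def __init__(self, start: Point, end: Point):
        self.start, self.end = start, end


class Drawing:
    def __init__(self):
        self.lines = []


# Two lines share a corner point, and each line refers back to its drawing:
# a network of interrelated objects rather than a string of text.
corner = Point(0, 0)
drawing = Drawing()
drawing.lines = [Line(corner, Point(10, 0)), Line(corner, Point(0, 10))]
for line in drawing.lines:
    line.drawing = drawing  # back-reference creates a cycle

data = pickle.dumps(drawing)   # serialise the whole network to bytes
restored = pickle.loads(data)  # rebuild it in memory

# The shared corner and the cycles survive the round trip.
print(restored.lines[0].start is restored.lines[1].start)  # True
print(restored.lines[0].drawing is restored)               # True
```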

The object model of data is consistent with the concepts that we discussed earlier: it pictures data items in the custody of objects that derive their characteristics from classes. The precise interpretation and implementation of this concept varies somewhat from one object database to another. In general terms, an object database is a piece of software that can copy individual objects or associated sets of objects between main memory and disk storage without needing to decompose (or normalise) the objects into their component data items. An association between two objects in main memory is typically expressed as a pointer from one object to the other. Such pointers resolve directly or indirectly to the physical address of the object in main memory. As objects are copied from main memory to disk, these pointers are replaced by the addresses of the objects on disk, and vice versa – a process called “swizzling”. As the objects are written to disk, the database builds and maintains an index that tells it where to locate individual objects on disk.
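The swizzling and indexing steps can be illustrated with a toy store. This is a simplification built on stated assumptions rather than how any particular object database works: object numbers stand in for disk addresses, an in-memory text stream stands in for the disk, and each object is written as one JSON record. Writing replaces references with object numbers and records each object’s offset in an index; reading uses the index to find each record and then turns the numbers back into references.

```python
import io
import json


class Node:
    """Hypothetical in-memory object holding direct references to others."""

    def __init__(self, name: str):
        self.name = name
        self.refs = []  # in-memory "pointers" to other Node objects


def write_objects(objects, stream):
    """Unswizzle and store: replace each reference with an object number,
    write one record per object and return an index of number -> offset.
    Every referenced object must itself appear in `objects`."""
    oid_of = {id(obj): oid for oid, obj in enumerate(objects)}
    index = {}
    for obj in objects:
        oid = oid_of[id(obj)]
        record = {"name": obj.name, "refs": [oid_of[id(r)] for r in obj.refs]}
        index[oid] = stream.tell()  # where this object lives "on disk"
        stream.write(json.dumps(record) + "\n")
    return index


def read_objects(stream, index):
    """Locate each record through the index, then swizzle numbers back
    into direct references between the newly created objects."""
    records = {}
    for oid, offset in index.items():
        stream.seek(offset)
        records[oid] = json.loads(stream.readline())
    loaded = {oid: Node(rec["name"]) for oid, rec in records.items()}
    for oid, rec in records.items():
        loaded[oid].refs = [loaded[ref] for ref in rec["refs"]]
    return loaded


wall, door = Node("wall"), Node("door")
wall.refs.append(door)
door.refs.append(wall)  # a cycle, as in a CAD model

disk = io.StringIO()    # stands in for a file on disk
index = write_objects([wall, door], disk)
loaded = read_objects(disk, index)
print(loaded[0].name, loaded[1].name)   # wall door
print(loaded[0].refs[0] is loaded[1])   # True
```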

The object model of data was not originally conceived to improve on or even compete with the relational model but was intended simply to provide persistent storage for object-oriented programming languages. The original proponents of object orientation came from scientific and engineering disciplines rather than commercial data processing, and had had little exposure to the needs of transaction processing systems where relational databases excel today. Nevertheless, the market’s enthusiasm for object orientation has positioned the object model as a challenger and potential successor to the relational model. Despite this, the object model has not been a commercial success, mainly because its proponents have never made a clear case that its potential benefits outweigh the costs associated with its adoption. In seeking to challenge the relational model on its home turf, the object model of data suffers from a series of shortcomings, both practical and conceptual.

Shortcomings of the Object Model of Data

Conceptual Model

The object model lacks a single, unified conceptual model, and so it is subject to interpretation by authors and vendors alike. As many authors have pointed out, a sound conceptual model is vital as the foundation of any application, and the lack of one for object databases has allowed an anarchic situation to develop. Authors disagree on basic matters of definition and terminology. Two examples: some people define an object unequivocally as an instance of a class, whilst others use the term to mean either a class or an instance. Some insist that inheritance refers to the passing of properties from a class to a subclass, whilst others allow it also to mean the passing of data and method definitions from a class to an instance.

Microsoft’s proprietary object technology OLE exemplifies this well. In 1991, Microsoft created a technique that would allow multimedia fragments such as spreadsheets and pictures to be embedded inside other multimedia documents, to create “compound documents”. It called this technique “Object Linking and Embedding”, or OLE Version 1. But what Microsoft meant by an object in OLE Version 1 – essentially a multimedia file – was not what C++ meant by an object, or Smalltalk meant by an object, or any one of a number of object databases then on the scene meant by an object. Microsoft has since re-invented OLE by renaming it “Oh-lay”, and has declared it to be an enabling technology for software component integration. Throughout, Microsoft has used the term object to mean what it understands by an object at the time, which continues to be a moving target.

Not Everything is an Object

As a way of structuring programs and managing data in main memory, the object model has the elegance, simplicity and clarity of an approach that is clearly right. But in seeking to apply the model more widely, there is a trap for the unwary: it is all too easy to start viewing things in the world solely as objects. This alters our viewpoint in a dangerous way: instead of using a model to illuminate our view of the real world, we see the real world only in terms of our preferred model, and ignore the features of the real world that do not fit the model.

This in turn has led to some questionable applications of object orientation. It is held by some to be a sound basis for a business process re-engineering methodology, but is it really appropriate to re-engineer a commercial enterprise around a model where individuals and departments jealously guard their own information against all-comers, and respond only to a limited and precisely-defined set of inputs? Also we commonly use object-oriented analysis and design methodologies for applications that are to be implemented using relational databases: again, is there not a fundamental mis-match here?

Software development is all about modelling. Programming languages model the facilities and capabilities of the computers that they drive, design methodologies model the behaviour of real-world systems, and database schemas model the structure of information in the real world. Within the constraints of current technology, each model must be as close a fit as possible to the reality that it represents, be it computer hardware at one end of the spectrum, or the real world at the other. Here are some of the ways in which the world of object orientation does not match the real world.

• Each object belongs to a class whose behaviour and characteristics it retains throughout its life. Things in the real world have a lifecycle, and alter their behaviour and characteristics as they pass through it. Children become adults; prospects become customers; trees become telegraph poles.

• Each object belongs to one class only: it has a single set of properties and methods. Things in the real world have many different aspects depending on who is interacting with them, and a different set of properties and methods for each aspect.

• Objects have no volition: they respond to messages. A wholly object-oriented system would never get started because no-one would send the first message. In the real world, individuals and enterprises have volition and act spontaneously.

• Objects have a limited and prescribed set of messages to which they can respond. In the real world, the ability of individuals and enterprises to survive and prosper depends on their ability to formulate useful responses to entirely new messages.

Querying

Querying is the process of obtaining information, in the form of answers to specific questions, from a database: “How many customers do we have in Scotland?”; “Which suppliers are able to supply 2-inch steel widgets by Friday?”, and so on. Given that the object model’s primary goal is to ensure the integrity of data through encapsulation, it is perhaps unavoidable that getting at the data is not quite as easy as it might otherwise be. There are three reasons why querying under the object model is less efficient than under other models.

Firstly, the value of an object’s data items can only be ascertained by invoking a method. This entails at a minimum passing process control to the method, fetching the value of the data item from storage, and passing process control back to the requesting program. Even if the extra work entailed in invoking a method is no more than these two control transfers (and, especially in a polymorphic environment, it is often much more than that), it is additional to the work involved in simply fetching the value of the data item from storage. Thus retrieving data in an object-oriented environment is inherently less efficient than the same operation in an environment that doesn’t implement encapsulation.
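The extra cost can be observed, if not precisely measured, with a small experiment. The `Customer` class and its data are made up for the example; the same question (“How many customers do we have in Scotland?”) is answered once by invoking a method on every object and once by reading plain records directly. On a typical CPython installation the method-based count tends to be the slower of the two, but timings vary from machine to machine, so no figures are given here.

```python
import timeit


class Customer:
    """Encapsulated customer: the region is reachable only through a method."""

    def __init__(self, name: str, region: str):
        self._name, self._region = name, region

    def region(self) -> str:
        return self._region


# Made-up data: one customer in four is in Scotland.
records = [(f"C{i}", "Scotland" if i % 4 == 0 else "England") for i in range(10_000)]
customers = [Customer(name, region) for name, region in records]


def count_via_methods() -> int:
    # One method invocation per object, on top of fetching the value itself.
    return sum(1 for c in customers if c.region() == "Scotland")


def count_via_fields() -> int:
    # Direct access to the stored values, with no encapsulation.
    return sum(1 for _, region in records if region == "Scotland")


print(count_via_methods(), count_via_fields())  # 2500 2500
print("methods:", timeit.timeit(count_via_methods, number=20))
print("fields: ", timeit.timeit(count_via_fields, number=20))
```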

Secondly, in practice many queries can be answered on the basis of only part of an object’s data. For example, it is not
