Operations and Methods - Database Design For Smarties Using UML For Data Modeling Robert Mull

Creating your columns only gets you part of the way to having a complete application. Table columns comprise the static part, but what about the dynamic part? UML classes have operations as well as attributes. How can you leverage these operations in a relational database environment?

The major relational databases limit you to stored procedures and functions as objects in your schema. For example, Oracle7 lets you define stored functions or procedures. If your functions don't have side effects, you can use them in standard SQL expressions wherever you can use a scalar value. With a little creativity, you can often represent your class behavior as stored procedures or functions that you can call either in SQL expressions or as part of a larger stored procedure. You can also call these from your application programs using various nonstandard APIs. Whether

this makes sense depends on the nature of the behavior, the larger system architecture of which the database is a part, and the requirements for reuse across applications.

First, however, it's important to realize the distinction between the database server and the application code. This division is not arbitrary; some things make sense as database server behavior, while other things do not. This is the art of application partitioning.

Partitioning There are whole classes of operations that you don't want to see on the server. Partitioning is the process of deciding whether to put the behavior on the database server, an application server, or the client.

Note

The world is full of partitions. With respect to computers, in different contexts you can partition any number of different things. The consequence so far as this book is concerned is to confuse you with using the same word for different things. In particular, don't confuse application partitioning, the division of the application between client and server, or server and server, with data partitioning, the division of a table or other collection of data into different physical chunks for physical input/output optimization.

First, you don't want operations that handle transient issues. Any operation that requires a reference to an in- memory address or an in-memory object is not suitable for the database server, since that server has no access to such things.

Second, you don't want client-called operations that access data attribute by attribute. Recall from Chapter 7 that you can create operations with the {query} property, here called accessors or observers, that simply return an attribute's value (usually as a constant reference in C++, for example). Database servers don't work that way; they return records, not values. Visualize a very large tunnel filled with a huge cargo train that contains a single grain of corn for each trip. It's going to take a lot of resources to get even a single bag of corn to market. Moving data value by value across a network is not a good idea. You can extend this logic to any of the operations on single attributes. For example, simple mutators (set operations) often change a single attribute value. Instead, you should leave these on the client or application server side, then submit changes as entire rows of data when you're ready to commit the transaction. If you can group several attributes for a single update, that's better, but still has limitations.

Note

If you intend the operations only for use within the database server (that is, by other stored procedures), then you can probably get away with accessors and mutators as stored functions. If you choose to do this, don't make the functions available to applications.

Third, you don't want to put operations on the server that need to refer back to the transient client in any way. It's not good design to require the server to maintain behavioral state across calls. It has enough to do managing transaction state. This is not to say that you can't do it; PL/SQL packages, for example, let you set data values and keep them in a special shared buffer across as many calls as you like. This can have very strange effects as you scale up to many clients accessing the data at once. Generally, you want your server code to be reentrant: you want a session to be able to access and run the behavior without worrying about anything that happened before. That means not relying on any state left over from previous invocations of the operation (static data or data from tables). Keep these kinds of operations on the server.

Also, some operations do extensive error checking in memory, aborting the operation if things don't work out. Do this checking on the client or application server side, not on the database server side. The only error handling you want to do on the server side is error handling related to database structures—business rule checking, physical data storage errors, DBMS errors, and so on. This especially applies to code you think belongs in a trigger. Because triggers execute when the DBMS raises an event, you don't really have control over trigger execution. Choose code you put into triggers wisely. You should code trigger operations that make sense for the event that runs the trigger (AFTER INSERT and so on). If you find yourself creating trigger code to work around some problem with a stored procedure, consider other alternatives. If you find yourself creating trigger code to work around other trigger code, bail out fast! Keeping it simple works nicely with triggers.

Finally, you want to evaluate the performance of your remaining operations. It may be cheaper to run your operation in a compiled language on the application server rather than in an interpreted, slow PL/SQL stored procedure. So what kind of operations does this leave?

Query a result set: Return a set of records to the client

Insert/update/delete: Modify data for a single row (not a single value)

Rule checking: Enforce business rules with triggers or procedures (for example, you can handle special constraints on foreign keys using triggers or add code that evaluates complex procedural constraints on an input row or rows)

Derived values: Calculate a value from database data with an algorithm you can't specify as an SQL expression in your target DBMS version of SQL

Encapsulating operations: Operations that call other database operations such as stored procedures provided by the vendor or data dictionary accesses related to the data in the table (for example, a stored procedure to retrieve the primary key column definition of the table)

Error checking: Operations that check various error states and raise exceptions (or whatever method by which errors return to the client)

Stimulus and Response: Behavior and Side Effects Much in the relational database world depends on the nature of the behavior you intend to put into your persistent classes. Oracle7, Informix, and Sybase all provide server-side programming facilities, but there are various subtleties about structure and use that depend directly on what's happening inside the black box of the operation.

Oracle7.3 provides both stored procedures and stored functions. You can use the functions in your SQL statements, but only if they have no side effects. In UML terms, the operation must have a {query} tag. This permits the Oracle7 parser to call the function and return a value with the knowledge that the function won't be doing anything to upset the internal processing of the DBMS.

On the flip side, if the operation has no {query} tag, you cannot use it as an SQL function. You can, however, add it to your set of database objects as a stored program unit for use by other program units or by the client through some kind of API for calling stored units from programs.

The SQL security system introduces another factor: privileges. One frequently asked question in the Usenet newsgroups for Oracle is "I've coded a stored procedure and I have access to all the data it uses, but I can't run it from my program." This is usually because the user logs in under a user name different from the owner of the stored procedure and has not granted EXECUTE privilege on the procedure to that user. There are various quirks and subtleties in the different RDBMS products with respect to stored program unit privileges.

A third factor is the external access required by the operation. Many programmers want to open or write to files or print things or display things on the screen from a stored program unit. The RDBMS permits all of these things. What it does at runtime is not, however, very easy to understand in a client/server context. Generally, if you try to do things outside the DBMS, the code executes on the server. You thus print to the printers attached to the server, not to those attached to your client. You write files on disks accessible to the server, not your client hard disk (unless of course that disk is accessible to the server and you identify it and write to it using parameters). Sending mail, running devices, synchronizing with other applications—all of these operations make for much more complexity in your operations if you make them stored procedures.

Another form of access is to in-memory data. Often, programmers want to refer to data in their running program in their stored program unit. Since they are running in different processes and potentially on different machines, this is not possible. All such references must be to parameters of the stored procedure. In its ultimate form, this leads to stateless programming: ensuring that the stored program unit gets everything it needs through its parameters and makes no assumptions about prior calls or actions in the database. You will often find such procedures with embedded transaction commits and rollbacks, as being stateless implies not carrying transactions across server access calls.

That leads to a final behavioral factor: transaction requirements. If your transactions are at all complex, you will have to decide at some point where those transactions happen. Generally, transactions happen in the application, not on the database server. In other words, you generally do not put transaction commands (COMMIT, ROLLBACK, SAVEPOINT) into your stored program units. Instead, the application server (or client in a two-tier system) issues these commands separately because it is the center for the transaction semantics. Whichever you decide is appropriate, you then must stick to that: mixing is forbidden. If some stored procedures have transaction logic and others don't, programmers get terminally confused about whether they must code commits and rollbacks.

Programmers are quite surprised if (in Oracle7) they attempt to roll back data that a stored procedure has already committed. They are surprised (in SQL Server or Sybase) when they get an exception or database error when they commit because there is no active transaction because a stored procedure committed or rolled back earlier. The transaction logic issue leads into the next topic: system architecture.

A Pattern Language: Architecture and Visibility The system architecture determines quite a lot about your approach. Given that your UML design is OO, and given that relational databases are not, you need to shift one architecture toward the other. It's better to make the relational system look more object-oriented than to mess up your OO system architecture to make it look relational. I've done both, and the latter approach is definitely bad for the digestion.

Unfortunately, RDBMS products are not chock-full of OO capabilities. This lack is part of what led to the primary justification for OODBMS products—the "impedance mismatch" between the OO and relational structure. You have a number of different ways to overcome this mismatch, none totally satisfactory.

On the database side, you can use stored procedures and packages to encapsulate your objects. In Oracle7, for example, you can create packages that correspond to the classes in your UML data model; in other systems, you can just use a collection of stored procedures. You implement the operations that make sense for the database server as packaged program units. You create the underlying tables and the packages in a schema belonging to a specific user, the "owner" of your application schema. You don't grant access to the tables, though; instead, you just grant the appropriate privileges such as EXECUTE to the users that need access to objects of the class. The procedures provide the database server interface (API) for the class. When an application logs on as an enabled user, the code calls the procedures rather than using SQL to query or manipulate the underlying tables.

This approach reduces the mismatch in two ways. First, it packages the underlying tables as classes rather than as tables. This permits you to use OO structure rather than forcing you into the flat table model of programming. Second, the approach eliminates the SQL programming interface, reducing every database access to a procedural call. Such calls are much closer to the OO way of thinking, though they don't get all the way there. You must still pass the object identifier (that is, the primary key or row identifier) to most of your procedures as a parameter, as the package doesn't let you create an instance for each object in your program. That is, procedures are independent of the state of any individual row or rows in the underlying tables, so you have to pass in primary keys or other identifiers to operate on the correct data.

Also, this approach provides full encapsulation of your data. No application can access the data in the tables directly. You can thus protect the data from accidental or deliberate damage through unauthorized access to the table data. Everything goes through the packaged procedures. This provides the same visibility control that an OO language gives you through access control and operational visibility restrictions. It's kind of all or nothing, however: you must hide all your table data, not just some of it.

The limitation to this approach is the mismatch hypothesis itself: the lack of SQL. SQL is a declarative language. In using it, you specify what you want, not how to get it. Your stored procedures, on the other hand (and OO

languages) are procedural: you tell the computer exactly how to navigate around the data by procedure calls. The whole reason for the existence of SQL is that it is much easier to use a declarative language to specify complex data requirements. By eliminating SQL, the procedural approach eliminates all the advantages that SQL gives you. You may be fully OO, but you'll be much less efficient at creating the code that manipulates the data. For simple data access, that usually isn't a problem. If you find yourself adding more than a few lines of code to a packaged procedure, however, you may want to reevaluate your approach.

Gray Water Applications: Portability and Reuse "Gray water" is water you recycle from household uses such as baths to reuse in irrigation or other applications that don't require "clean" water. Many jurisdictions wrongly ban the use of gray water on the assumption that dirty water is evil. Gray water is a useful metaphor for relational database operations. Each gray water application uses a different technology, which makes it difficult to develop a standard approach. Consequently, people use clean water for purposes for which they could be using gray water, and the gray water goes down the sewer without reuse. The end result is that you waste a lot of water.

Similarly, different RDBMS products have completely different approaches to database behavior. The SQL-92 standard says nothing whatever about it, leaving it up to the database vendors to be creative. Many shops outlaw the use of stored procedures and other behavioral constructs such as triggers on the theory that it makes the database less portable. I'll take a position on this: it's wrong. It's not wrong because the code is portable; it's wrong because it arbitrarily removes a tool you can use effectively to get reuse on the database server. You are essentially flushing your reuse potential down the sewer.

The holy grail of portability is to reduce the amount of code that requires work to move to another system to zero. Having achieved the grail, you can move your system from platform to platform with no effort—the ultimate in code reuse.

The trick with grails is that very few people can see them or reach them, and it's a whole lot of work to do so even if you are pure of heart, body, and soul. Usually, full database portability implies both a level of performance and a level of coding productivity that are close to unacceptable. Many managers think that by insisting on full portability they will be saving themselves millions in maintenance and porting costs. This is nothing but magical thinking in a shaman culture.

For example, different DBMS products have radically different transaction processing architectures. Some support page-level locking; others support row-level locking. Some support read consistency (Oracle7); others don't. Some RDBMS products have automatically maintained temporary tables (Sybase, SQL Server, Informix); others don't (Oracle7). Some RDBMS products support UNICODE for standard international language support (Oracle7); others don't (SQL Server). Don't even get me started on SQL optimizers. Portability? Only if you don't have transactions, don't use temporary tables, and don't need to retrieve data in a hurry.

I believe it is foolish not to take the limitations of your targeted range of platforms into account. It is also foolish not to take advantage of the major productivity and performance features of your target DBMS. In the best of all possible worlds, you would have a standard to which all vendors adhere. That doesn't exist, and will not exist, for relational databases, particularly with respect to optimization and transactions, but also for things like globalization and character strings. Economics is against it, the installed base is against it, and Microsoft/IBM/ Oracle are against it. I won't say I'm against it, but you must bow to reality occasionally.

The best way to proceed in this situation is to evaluate rationally which parts of your application can benefit from using a portable, reusable approach and which would benefit most from coding as stored procedures in your

database server. It's not going to be 100% one way or the other. As with gray water, you'll find that using dirty stuff is

In document Database Design For Smarties Using UML For Data Modeling Robert Muller pdf (Page 176-180)