• No results found

Packages, Subsystems, and Name Spaces

The package and its specialization, the subsystem (see Chapter 7), provide organizing name spaces for your system architecture. The objective is to design a system in reusable chunks that are as independent from one another as possible.

Packages are a new concept to the relational world. Most database designers think about relational schemas as a kind of global data repository, or even as a single name space. This approach comes from technological limitations in the past and from cultures that built up around these limitations. For example, you will still find DBAs who insist on global column names, or on names that include all kinds of identifying information. CMN_ID_ID_NO_NUM, for example, might represent such an excessive naming convention: The commonplace book (CMN) ID table (ID) ID number (ID_NO) column, which is a numeric data type (NUM). I prefer IDNumber in the Identification table, taking full advantage of the name space established by the schema and the table.

Note

Don't confuse the UML package that I discuss here and in Chapter 7 (and later in this chapter put to use designing a database) with the Oracle PL/SQL package. PL/SQL took the package idea from the Ada programming language, where a package is a compilation unit that collects a number of different elements into a single object. It's really more of a module or a class than a subsystem, although the concepts are related by the concept of the name space.

Figure 11-2: The Organization Subsystem in UML

The problem with schemas as sets of global names is precisely the problem that packages address: reusability through encapsulation and uncoupling. Most database designers have at some point confronted the need to integrate more than one database into a seamless whole. In reality, you should think about the problem as

integrating multiple packages into a system. Each package provides its contribution as required in an encapsulated, uncoupled way. Using OO techniques, you don't use the lowest common denominator approach of raising all the names up to the top as global; you use name spaces to carefully organize access.

Figure 11-3: The Entity Subsystem in UML

Standard SQL provides a very basic name space capability with the schema name [Melton and Simon 1993; ANSI 1992]. An ANSI schema is a descriptor that includes a name, an authorization identifier, a character set name, and all the descriptors of a set of components (tables, views, domains, assertions, privileges, character sets, collations, or translations). No major DBMS of which I am aware implements this schema capability precisely. IBM's SQL/DS and DB2 systems established a de facto standard by setting up a system of users in a single database. Oracle and Informix followed suit, as did a host of smaller DBMS vendors. This concept of user corresponds to the ANSI concept of authorization identifier. In the earlier SQL standard, schema name and authorization were the same; the 1992 standard uncoupled them.

Note

As an aside, I have to express my opinion (repeat, opinion) at this point in a small sermon about the unfortunate way in which the major relational vendors have ignored the ANSI standard. Not one of the major vendors even comes close to implementing the full features of that standard, despite the dramatic improvements it makes to the SQL language in areas ranging from date-time processing to character sets and internationalization [Melton and Simon 1993]. The schema name space capabilities are just another example of the major vendors' inability to move forward. You should pressure the vendors to implement the complete standard as quickly as possible. You can remind vendors that complain about the difficulty of implementing the standard that people are paying huge amounts of money for their software design and implementation skills. Also, having a complete ANSI SQL provides a tremendously strong value proposition for customers. A vendor who provides this will have a very strong competitive position in the RDBMS market.

The user corresponds to the authentication identifier and to the schema name. These databases do not have a "schema" object, although they implement the CREATE SCHEMA statement from the standard. Instead of establishing a schema name space, these systems establish a user name space. Each user can run multiple CREATE SCHEMA statements without any implication about the relationships between the sets of objects the various statements create other than that they all belong to the named user.

Users are really part of the standard SQL security mechanism, which security theorists call discretionary access control. This kind of security system establishes an owner for each object, such as a table, and a set of privileges granted on that object to other users. Users must refer to objects owned by other users by prefacing the object with the name of the owning user. Again, this corresponds to the SQL standard's authentication identifier and schema name.

Note

Microsoft has integrated its SQL Server security system with the security system of the underlying operating system, Windows NT. This makes it easier to administer the database by using the NT user and password to authenticate access to the database as well. You don't have to define separate users for the database and the operating system. The database name space thus becomes part of the larger name space established by operating system security.

Various DBMS vendors extend this system with an additional concept: the synonym or alias. You can create an alias for an object that belongs to another user. You can then refer to the alias without prefacing it with the name of the owning user. Using synonyms, you can create names in your user name space that make a database with multiple name spaces look like a single, global name space. A public synonym makes the name global to all users to provide a truly global name.

But what about multiple databases? The SQL standard is totally silent on the concept of database; its only concern is to establish the schema name space. This omission has resulted in an array of different approaches to having multiple databases. Oracle and Informix have added the concept of a database link. This is essentially a synonym you establish that refers to a foreign database within the database in which you create the link. You then prefix the name of an object in the foreign database with not only the name of the owning user but the database link name as well.

Sybase and its cousin Microsoft SQL Server take a different and more complex approach. These systems have multiple databases as objects registered on a server. Databases and users are completely separate. You also have logins, the username and password with which a user authenticates themselves, and these are also completely separate. You can give a user access to any object in any accessible database. You then refer to an object by prefacing it with the database name and the name of the owning user, which may or may not be the name of the login. At the moment, the name space extends only to the boundaries of the server on which the databases reside due to the way these systems identify the server. There is an administrative login (sa) with full access to everything. There is for each database a publicly owned set of tables (the standard user dbo, database owner, owns these tables) that any user with access to the database can see.

Note Again, SQL Server version 7 integrates the security system with operating system security. Bringing all of this diversity to bear on the problem of implementing persistent UML packages is an interesting exercise. First, forget a standard approach—there is none. Second, spend some design time establishing how your applications will refer to the different name spaces. Ideally, a subsystem or package should refer to a well-defined set of other packages, and it should be able to ignore any relationships between those packages.

To see an example of implementing an approach to name spaces in Oracle7, see the section "Packaged Subsystems" toward the end of this chapter.

To summarize, you can establish separate package and subsystem name spaces and encapsulation using a

combination of different aspects of your relational database manager. You still don't get a complete representation of the package, but you get more encapsulation than with a more usual implementation of the system.