The UML class and its structure correspond closely to the entity type and its structure in ER diagramming. In the UML:
A class is a description of a set of objects that share the same attributes, operations, methods, relationships, and semantics. A class may use a set of interfaces to specify collections of operations it provides to its environment. [Rational Software 1997b, p. 20]
There are quite a few words with specific meanings in this definition, all of which are relevant to the database designer:
Attribute: "A named slot within a classifier [an interface, type, class, subsystem, database, or component] that describes a range of values that instances of the classifier may hold" [Rational Software 1997b, p. 149]. This is very similar to the relational definition of an attribute as a named mapping to a domain [Codd 1972].
Operation: "A service that can be requested from an object to effect behavior. An operation has a signature [name and parameters, possibly including the returned parameter], which may restrict the actual parameters that are possible." [Rational Software 1997b, p. 155]
Method: "The implementation of an operation. It specifies the algorithm or procedure that effects the results of an operation." [Rational Software 1997b, p. 154]
Relationship: "A semantic connection among model elements. Examples of relationships include associations and generalizations." [Rational Software 1997b, p. 156]
Association: "The semantic relationship between two or more classifiers that involves connections among their instances." [Rational Software 1997b, p. 149]
Generalization: "A taxonomic relationship between a more general element and a more specific element. The more specific element is fully consistent with the more general element and contains additional information. An instance of the more specific element may be used where the more general element is allowed." [Rational Software 1997b, p. 152]
Interface: "A declaration of a collection of operations that may be used for defining a service offered by an instance." [Rational Software 1997b, p. 153]
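To make these terms concrete, here is a minimal sketch in Python of how they might appear in code. Every name in it (Reportable, Person, Criminal, Address, summary, report) is a hypothetical illustration rather than part of the UML or of the commonplace book design, and Python stands in for whatever OO language you actually use.

```python
from abc import ABC, abstractmethod


class Reportable(ABC):
    """Interface: declares operations without implementing them."""

    @abstractmethod
    def summary(self) -> str:
        """Operation: a service that can be requested from an object."""


class Person(Reportable):
    """Class: describes objects that share attributes and operations."""

    def __init__(self, name: str) -> None:
        self.name = name  # attribute: a named slot holding a value

    def summary(self) -> str:
        # Method: the implementation of the summary operation.
        return f"Person: {self.name}"


class Criminal(Person):
    """Generalization: a Criminal is a more specific kind of Person."""

    def __init__(self, name: str, alias: str) -> None:
        super().__init__(name)
        self.alias = alias  # additional information in the specific class

    def summary(self) -> str:
        return f"Criminal: {self.name}, alias {self.alias}"


class Address:
    """Association: each Address instance is connected to a Person."""

    def __init__(self, street: str, resident: Person) -> None:
        self.street = street
        self.resident = resident


def report(subject: Reportable) -> str:
    # Works for any instance whose class provides the Reportable interface.
    return subject.summary()


# Hypothetical sample data: a Criminal is allowed wherever a Person is.
suspect = Criminal("Example Name", "The Placeholder")
home = Address("1 Sample Street", suspect)
print(report(home.resident))
```

The operation is the request (summary); each class supplies its own method to carry it out. That distinction matters later when you decide which behavior belongs in the database and which stays in application code.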
The classifier is a fundamental concept in the UML that may confuse even many OO design experts. This is an abstract UML class that includes the various UML notational subclasses you can use to classify objects in some way by their structure and behavior. Classes, types, interfaces, subsystems, components, and databases are all kinds of classifiers. Often the UML diagramming notations apply to classifiers in general, not just to classes.
These terms combine to form the semantics of the class as it appears in the UML class diagram. That is, for your class diagram to be meaningful, you must build it with notation that corresponds to these concepts as the UML defines them. This section discusses classes and attributes, while the following sections discuss operations, relationships, and other business rules.
A class diagram is a structural or static diagram with which you model the structure of a system of classes. In many ways, class diagrams strongly resemble ER diagrams. If you compare the definition of the class given above to the definition of the entity type for ER diagrams, you'll see they are substantially the same. Differences emerge mainly in the modeling of operations and relationships. All this is hardly surprising, since the OO notations on which UML was founded all developed from ER notations.
Note
The UML class diagram is really broader than just classes, as it can model interfaces, relationships, and even individual class instances or objects. An alternative and sometimes preferable name for the class diagram is thus the "static structural diagram," which many of the books on the UML use instead of the more common "class diagram." Since the focus here is on classes and their relationships, I have chosen to use the term "class diagram."
But before getting into the details of class diagramming, you first need to understand how to structure your system as a whole, and that means packages.
Packages
Most UML-capable design tools let you organize your classes independently of the UML diagram structure, as the UML specification suggests. That is, you can model all your classes in a single diagram, or you can break them up into chunks across several diagrams. Some tools let you do both, letting you define your classes in a repository or single diagram from which you create other diagrams that refer back to the repository for class and relationship structure. You can thus show different views of your system without duplicating your actual modeling efforts.
In many ways, the organization of your diagrams depends on the tools you use to draw them. Using a simple drawing tool, you can organize the classes however you like, but it's a lot of work. Using a comprehensive tool such as Rational Rose or one of its competitors, you can let the tool do most of the work. However, you pays your money and you takes your choice: you must often structure your diagrams the way the tool requires.
Part of the art of OO project management is to structure your project and organization along the lines of your product architecture [Hohmann 1997; Muller 1998]. One aspect of the project is the data model, and modeling your information is just as subject to architectural structuring as every other part of the project. This may seem kind of vague and difficult, but it isn't really if you understand how object systems work in architectural terms.
A software system is a working software object that stands alone, such as a software product or a framework library. These systems in turn comprise various smaller systems that do not stand alone but which come together to make up the system as a whole. I call these subsystems clusters [Muller 1998]; others call them everything from packages to subsystems to layers to partitions to modules. A UML package is a grouping of model elements. A UML subsystem is a kind of package that represents the specification and realization of a set of behaviors [Rational Software 1997b, pp. 130–137], which is close to my concept of cluster. A subsystem is also a kind of classifier, so you can refer to it anywhere that you can use a class or interface. The specification consists of a set of use cases and their relationships, operations, and interfaces. The realization is a set of classes and other subsystems that provide the specified behavior. You relate the specification and the realization through a set of collaborations, mappings between the use cases and the classes or interfaces, in a collaboration diagram, which this book doesn't cover.
These subsystems are the building blocks or components that make up your software system. They, not classes or files, should be the basis for configuration management in your system, if your source control tools are smart enough to understand subsystems [Muller 1998]. Usually a subsystem groups several classes together, though you can have single-class subsystems or even subsystems that are merely facades for other subsystems, with no classes at all, just interfaces. In terms of system architecture, subsystems often correspond to static libraries (LIBs) or dynamic link libraries (DLLs), which present the external interfaces of the subsystem for reuse by other libraries.
Packages are really just namespaces, named groupings of elements that have unique names within the group. Subsystems are much more substantial and form the basis for your system design. You can use packages as such to create lightweight groups within your subsystems without needing to develop separate use cases and collaborations. Deciding what needs a subsystem and what is OK to leave as just a package is up to you.
The UML model is a package that contains a complete representation of your system from a given modeling perspective. You can have different models of the system using different perspectives or levels. For example, you can have an analysis model consisting entirely of use cases and a design model consisting of subsystems or classes.
Besides models and subsystems, there are several kinds of packages identified only by adding stereotypes to the package name. A stereotype is a phrase in guillemets, « », that you place on a symbol to represent an "official" extension to the semantics of the UML.
«System»: The package that contains all the models of the system
«Facade»: A package that consists solely of references to other packages, presenting the features of those packages in a view
«Framework»: A package that presents an extensible template for use in a specific domain, consisting mainly of patterns
«Top-level package»: A package that contains all the other packages in a model, representing all the nonenvironmental parts of the model
«Stub»: A package with a public interface and nothing more, representing design work that is not yet complete or that is deferred for some reason
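If you keep package information in a script or a small repository of your own, one way to record these stereotypes is sketched below in Python. The Stereotype enumeration mirrors the five stereotypes just listed; the Package class and its label method are hypothetical helpers of my own, not part of any UML tool.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Stereotype(Enum):
    """The package stereotypes listed above."""

    SYSTEM = "System"
    FACADE = "Facade"
    FRAMEWORK = "Framework"
    TOP_LEVEL = "Top-level package"
    STUB = "Stub"


@dataclass
class Package:
    """Hypothetical record of a package: a named grouping of elements."""

    name: str
    stereotype: Optional[Stereotype] = None
    contents: List["Package"] = field(default_factory=list)

    def label(self) -> str:
        # Render the name the way a diagram would, with guillemets
        # around the stereotype when there is one.
        if self.stereotype is None:
            return self.name
        return f"\u00ab{self.stereotype.value}\u00bb {self.name}"


# Deferred design work is marked as a stub; a plain package has no stereotype.
person = Package("Person", Stereotype.STUB)
top = Package("Commonplace Book", Stereotype.TOP_LEVEL, [person])
print(top.label())
print(person.label())
```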
Note
The way you structure your packages and subsystems is a part of the general process of OO system architectural design [Rumbaugh et al. 1992, pp. 198–226; Booch 1994, pp. 221–222; Booch 1996, pp. 108–115]. There is much more to system architecture than just the aspects relevant to database design, so please consult these references for more details. The UML semantics specification is also a very useful reference for system architecture concepts [Rational Software 1997b].
This concept of system architecture as related and nested packages helps you to structure the system for optimal reusability. The point of packaging model elements is to let you uncouple large parts of the system from one another. This in turn lets you encapsulate large parts of the system into the packages, which become your reusable components. In other words, the package is the focus for creating reusable components, not the class. Packages typically have a package interface or set of interfaces that model the services that the package provides. Other packages access the packaged elements through those interfaces but know nothing about the internal structure of the package.
It's best to be clear: you should think about your system architecture as a set of packages and their interfaces, not as a collection of interrelated database entities. This focus on packaging is the critical difference between the ER approach to designing databases and the OO methods I am presenting in this book. The consequences of this type of thinking go all the way through to the database, or, I should say, databases. You should consider the subsystems different schemas and hence different databases, even if you actually build them all together in a single physical schema.
The databases themselves become packages at some level, representing shared data, when you're detailing the actual implementation structure of your system. Breaking up your system into multiple databases gives you the option of modularizing your system for reuse. You can create a Person package that lets you model people and their related objects (addresses, for example). You can use that package in several systems, reusing the database schema nested within it along with everything else. You might even be able to reuse the database implementation, which is after all the point of the three-level ANSI/SPARC architecture.
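The idea of a package interface that hides its internal structure can be sketched in a few lines of Python. This is only an illustration under my own assumptions: PersonDirectory, its two operations, the in-memory store standing in for the package's database schema, and open_person_package are all hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class PersonDirectory(ABC):
    """Hypothetical package interface: all that other packages may see."""

    @abstractmethod
    def add_person(self, person_id: int, name: str) -> None:
        """Record a person under the given identifier."""

    @abstractmethod
    def find_name(self, person_id: int) -> Optional[str]:
        """Return the person's name, or None if the id is unknown."""


class _InMemoryPersonStore(PersonDirectory):
    """Internal structure; a stand-in for the package's own schema."""

    def __init__(self) -> None:
        self._rows: Dict[int, str] = {}

    def add_person(self, person_id: int, name: str) -> None:
        self._rows[person_id] = name

    def find_name(self, person_id: int) -> Optional[str]:
        return self._rows.get(person_id)


def open_person_package() -> PersonDirectory:
    # Clients receive the interface type, never the concrete store.
    return _InMemoryPersonStore()


directory = open_person_package()
directory.add_person(1, "Example Name")
print(directory.find_name(1))
```

Because clients hold only a PersonDirectory, you could replace the in-memory store with a real schema, or reuse the whole package in another system, without touching the code that calls it.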
You will find two central organizing principles behind most of your packages. The first is purely structural: How do the classes within the package relate to classes outside the package? Minimizing such interrelationships gives you a clear metric for judging whether to include a class in one package or another. From the database perspective, you have another consideration: transactions. Minimizing the number of packages that are part of a transaction is usually a good idea. Keeping transactions within a single package is ideal.
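Both principles lend themselves to simple counting. The sketch below, in Python, assumes you can list which package owns each class, which classes are related, and which classes a transaction touches; the function names and the sample classes are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def cross_package_links(
    package_of: Dict[str, str], links: Iterable[Tuple[str, str]]
) -> int:
    """Count relationships whose two classes live in different packages."""
    return sum(1 for a, b in links if package_of[a] != package_of[b])


def packages_in_transaction(
    package_of: Dict[str, str], touched_classes: Iterable[str]
) -> Set[str]:
    """Return the set of packages that a transaction's classes belong to."""
    return {package_of[c] for c in touched_classes}


# Hypothetical assignment of classes to packages.
package_of = {
    "Person": "Person",
    "Address": "Person",
    "Criminal": "Criminal",
    "Alias": "Criminal",
}
links = [("Person", "Address"), ("Criminal", "Alias"), ("Criminal", "Person")]

print(cross_package_links(package_of, links))
print(packages_in_transaction(package_of, ["Criminal", "Alias"]))
```

A lower link count after moving a class suggests the move is an improvement, and a transaction that maps to a single package is the ideal case described above. Treat both numbers as aids to judgment, not as rules.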
A good way to approach the packaging of your classes is through your use cases. To start developing your classes, as the next section describes, you go through your use cases looking for objects and their attributes. As you do this, also look at the transaction structure. You'll often find that you can simplify your package structure by understanding how the use case scenarios affect objects in the database. Start building subsystems by grouping the use cases that affect similar objects together. You will also often find that you can think of interesting ways to modify the use cases to simplify your transactions. As you build your packages, think about how the use cases map to the transactions involving the classes in the package. If you can find a way to restate the requirements to minimize the number of packages, you will usually have a better system in the end. Since your use cases correspond to transactions, this method guarantees that transactions center within packages rather than being spread all over the system.
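Grouping use cases by the objects they affect is mechanical enough to sketch. The Python function below merges use cases that share at least one class, directly or through a chain of other use cases, into one candidate subsystem. The function name and the classes assigned to each use case are assumptions for illustration; your own analysis supplies the real ones.

```python
from typing import Dict, List, Set, Tuple


def group_use_cases(touches: Dict[str, Set[str]]) -> List[List[str]]:
    """Group use cases that affect overlapping sets of classes."""
    groups: List[Tuple[Set[str], Set[str]]] = []  # (use cases, classes)
    for use_case, classes in touches.items():
        names, merged = {use_case}, set(classes)
        remaining = []
        for group_names, group_classes in groups:
            if group_classes & merged:
                # Shares a class with the new use case: fold it in.
                names |= group_names
                merged |= group_classes
            else:
                remaining.append((group_names, group_classes))
        remaining.append((names, merged))
        groups = remaining
    return [sorted(names) for names, _ in groups]


# Hypothetical classes touched by three of the commonplace book use cases.
touches = {
    "Report Criminal": {"Criminal", "Person"},
    "Identify with Alias": {"Criminal", "Alias"},
    "Report Topic": {"Topic"},
}
for group in group_use_cases(touches):
    print(group)
```

With this sample data the two criminal-related use cases fall into one group and Report Topic stands alone. The grouping is only a first cut; large merged groups are a hint that restating a use case might separate them.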
All this is part of the iterative nature of OO design. You will find yourself moving from the microdesign level back to the architectural level or even back to requirements, then moving forward again with a better system. Your subsystems grow from a set of use cases to a complete subsystem package with classes, interfaces, and collaborations as well as the use cases. Your use cases may get more detail as you understand more about what is really going on within them. As long as you have it under control, such iteration is the essence of good design.
Enough abstraction. How do packages work in practice? By the time you begin designing an architecture, you should usually have built a set of use cases. You've gotten some basic ideas about the classes you'll need, and a reasonably clear idea of the sorts of transactions in the system. You should be able at this point to brainstorm a few basic packages that will form a basis for building your system architecture model. You have a few class hierarchies in mind, and you've almost certainly identified a few major subject areas of the system.
For example, the commonplace book system has several major areas of interest. Recall the use cases from Chapter 4:
Authenticate: Connect to the system as an authenticated user
Report Case History: Report a case history with text and multimedia details
Report Event: Report on a set of events based on factual and text search criteria
Report Criminal: Report a criminal biography with text and multimedia details, including police file data and case histories involving the criminal
Identify with Alias: Find a criminal's identity based on alias
Identify with Evidence: Find a criminal's identity based on evidence analysis such as fingerprint, DNA, and so on
Report Topic: Report an encyclopedia topic
Find Agony Column: Find an agony column entry based on text search criteria
Find Mob: Find information about criminal organizations based on factual search criteria
Explore: Explore relationships between people, organizations, and facts in the database
The System Architecture model contains a top-level subsystem called Commonplace Book that contains all the other subsystems. You can see several major areas of interest here. Authentication implies that something needs to keep track of users of the system and their security attributes. You might call this subsystem the Security package or the User package. Other subsystems all check with Security before allowing access. Case histories and events have their own subsystems. There is a series of use cases relating to criminals, who are people, and use cases that refer to other kinds of people as well, so you have a Person subsystem. You could create a Criminal subsystem layered on top of Person, or generalizing Person, that holds things such as aliases and roles in criminal organizations. You need a CriminalOrganization subsystem, certainly, to accommodate the needs of the Report Case History, Report Criminal, and Find Mob use cases. The first two use cases are part of the Event and Criminal subsystems, while Find Mob becomes part of the CriminalOrganization subsystem. You also need Encyclopedia and Media Report packages to hold information about these elements of the use cases. Finally, Explore requires a separate subsystem called Ad Hoc Query that offers a set of operations permitting users to explore the relationships in the other subsystems.
Thinking a bit more about the needs of case history reports and criminal record reports, you might create some additional packages for the multimedia objects such as fingerprint records, photos, DNA profiles, and other data elements of the commonplace book. These are perhaps a bit too detailed for the first pass. You might benefit from doing some class design before creating packages for these things, or at least making sure you iterate a few times before putting it all in the freezer.
You can also take a first crack at relating the packages, resulting in the package diagram in Figure 7-1.
The package diagram shows the overall structure of your system. Each package is a named file folder icon with the package name and the appropriate stereotype. You show dependencies between the packages with dashed, directed arrows. In Figure 7-1, for example, both the Ad Hoc Query subsystem and the Role package depend on the CriminalOrganization subsystem. The Role package depends on both the Criminal and the CriminalOrganization subsystems, and the Case History subsystem depends on the Role. The Commonplace Book subsystem depends on all the other subsystems except for Person; it also uses the Role package. Security does this as well, but I've hidden the arrows for clarity. Keeping the diagram from resembling a plate of pasta is always a problem; using straight lines helps, and hiding minor lines helps even more. The solid arrow with a white arrowhead from Criminal to Person is a generalization, meaning the Criminal subsystem inherits from the Person subsystem (a criminal is a kind of person).
Figure 7-1: The Initial Package Diagram for the Commonplace Book
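A package diagram is a directed graph, so you can record it as data and ask questions of it. The Python sketch below encodes only the dependencies spelled out in the text for Figure 7-1, treating the generalization from Criminal to Person as a dependency; it leaves out the Commonplace Book and Security arrows, so it is a partial picture. The dictionary and function names are hypothetical.

```python
from typing import Dict, List, Set

# Dependencies named in the discussion of Figure 7-1 (partial).
DEPENDS_ON: Dict[str, List[str]] = {
    "Ad Hoc Query": ["CriminalOrganization"],
    "Role": ["Criminal", "CriminalOrganization"],
    "Case History": ["Role"],
    "Criminal": ["Person"],  # generalization: Criminal inherits from Person
    "CriminalOrganization": [],
    "Person": [],
}


def transitive_dependencies(graph: Dict[str, List[str]], package: str) -> Set[str]:
    """Return every package reachable by following dependency arrows."""
    seen: Set[str] = set()
    stack = list(graph.get(package, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, ()))
    return seen


print(sorted(transitive_dependencies(DEPENDS_ON, "Case History")))
```

Listing what a package reaches indirectly shows how far a change can ripple: a change to Person can affect Case History even though no arrow connects them directly.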
You could actually label all these as «Stub» until you're ready to push down the design, but in this example the intent is to complete the subsystem designs in the current design process step. If you tasked several people with designing the different pieces in separate design steps, you could stub the subsystems until the first complete design is ready.
As you progress in defining the details of the system, your packages acquire internal details such as use cases, classes, interfaces, and collaborations. By the end of your design effort, you should have a reasonably solid set of packages that are mostly independent of one another and that interact through well-defined, reusable interfaces. Each package should break down into a set of class diagrams that you can use to document and direct the construction of the actual code for the system, including the database creation and manipulation code.
Again, while architectural design is different from database design, you will find your subsystems and packages do have direct effects on the underlying database architecture in the end. You do the