Sizing a schema for estimation purposes is easy, right? You just count the classes, and voila! Perhaps, but if you are going to be using the size metric for estimating effort, you'll find that not all classes are created equal, at least for that purpose.
The best all-around metric for schema size that I've found is the function point. A function point is an ordinal-scale metric (with some interval-ratio tendencies) that measures the functionality of a software system. There is no room for an extensive description of function points here, so you should consult a tutorial for more details [Dreger 1989; Garmus and Herron 1996; IFPUG 1994; Jones 1991].
Once you have your use cases, you can count function points for your system as a whole. The total number of adjusted function points gives you a clear measure of the size of the software system, including the database-centric aspects. In counting function points, you divide the functionality of your system into five separate countable types of functions:
External inputs (EIs): Transactions that result in input across the application boundary
External outputs (EOs): Transactions that result in output across the application boundary
External queries (EQs): Transactions that send inputs to the application and get back outputs without changing anything in the database
Internal logical files (ILFs): Data the application manages
External interface files (EIFs): Data the application uses but doesn't manage
For schema sizing, these last two counts are the most important. ILFs are for the most part the persistent classes in your data model. To qualify as an ILF, your persistent class must be within the application boundary; that is, your application must create persistent objects of the class as part of its essential function. EIFs, on the other hand, are classes that you refer to but don't maintain. That is, your application does not create or update persistent objects of the class; it just uses them after retrieval from the database.
To count ILFs and EIFs, you need your data model and your use cases. First, divide the classes in your model according to whether the application manages them or just queries them. For example, in the commonplace book system, the bulk of the classes represent data that the application queries for the Holmes PLC operatives without letting them change anything. These are EIFs, such as Person and CriminalOrganization. There are some classes that represent operatives' input into the system, such as Case Note, in which operatives record their reports on current cases. These are ILFs. Most systems have many small tables that an administrator maintains, such as lists of cities and states, data type codes, and security information. These are EIFs in most applications. The tables critical to a given application are usually ILFs, but not always: in the commonplace book they are EIFs, because the system is for the most part query-only.
Tip
If your data model consistently marks the operations of classes with the {query} tag, you should consider adding a {query} tag to the class to indicate that it is an EIF rather than an ILF in your application. This makes counting function points even easier.
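The ILF/EIF split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Operation` and `PersistentClass` types and the operation names are hypothetical, and the rule "every operation is tagged {query}" is one simple reading of the tip, not a complete counting procedure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Operation:
    name: str
    is_query: bool  # True when the operation carries the {query} tag


@dataclass
class PersistentClass:
    name: str
    operations: List[Operation] = field(default_factory=list)


def file_type(cls: PersistentClass) -> str:
    """Return "EIF" when the class is only ever queried, otherwise "ILF".

    Assumption: a class with no operations at all is treated as an ILF,
    because nothing in the model says the application merely reads it.
    """
    if cls.operations and all(op.is_query for op in cls.operations):
        return "EIF"
    return "ILF"


# Hypothetical operations for two classes from the commonplace book example.
person = PersistentClass("Person", [Operation("getName", True)])
case_note = PersistentClass(
    "CaseNote",
    [Operation("getText", True), Operation("setText", False)],
)

for cls in (person, case_note):
    print(cls.name, file_type(cls))
```

In practice you would still check the result against the use cases, since a class can carry update operations that your particular application never calls.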
The second part of function point counting is determining the complexity of the count for a given item. To do this for the two file counts, you need to determine the record element types (RETs) and data element types (DETs) for the file. With respect to function points, complexity is a weighting factor by which you multiply the raw counts to get an "unadjusted" function point count (I'll get to "adjusting" those counts later).
DETs are "unique user recognizable, nonrecursive fields/attributes, including foreign key attributes, maintained on the ILF or EIF" [Garmus and Herron 1996, pp. 46-47]. These correspond to the individual attributes and relationships in your UML classes. You count one DET for each attribute/relationship; you don't vary this by the number of objects of the class or the number of related objects. You just count the attributes and relationships. If you have a relationship to a class with an explicit OID consisting of more than one attribute, you count one DET for each of the attributes rather than just one DET for the relationship. The explicit OID means that you are using a user-recognizable attribute to identify an object, and therefore each contributes to complexity. You can ignore attributes that appear only as implementation techniques. You also should collapse repeating attributes that are identical in format, a common implementation technique. Usually these are arrays or structures (or embedded classes); sometimes for reasons of clarity you break them out into separate attributes, such as a series of bit flags or a group of monthly amount fields. Ignore these extra attributes and count a single DET.
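The DET rules in this paragraph can be sketched as a small counting function. The `Attribute` and `Relationship` types, their field names, and the sample class are all hypothetical; the sketch assumes you have already marked which attributes are implementation-only and which belong to a repeating group.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Attribute:
    name: str
    implementation_only: bool = False      # not user recognizable, so not counted
    repeating_group: Optional[str] = None  # label shared by identically formatted fields


@dataclass
class Relationship:
    target: str
    target_oid_attributes: int = 1  # attributes in the target's explicit OID


def count_dets(attributes: Iterable[Attribute],
               relationships: Iterable[Relationship]) -> int:
    """Count data element types for one class."""
    seen_groups = set()
    total = 0
    for attr in attributes:
        if attr.implementation_only:
            continue  # ignore pure implementation techniques
        if attr.repeating_group is not None:
            if attr.repeating_group in seen_groups:
                continue  # collapse the repeats into a single DET
            seen_groups.add(attr.repeating_group)
        total += 1
    for rel in relationships:
        # One DET per relationship, or one per attribute of a multi-attribute OID.
        total += max(1, rel.target_oid_attributes)
    return total


# Hypothetical class: a name, twelve monthly amounts, and a row-version column,
# plus one relationship to a class identified by a two-attribute explicit OID.
attrs = [Attribute("name")]
attrs += [Attribute(f"amount{m}", repeating_group="monthlyAmount") for m in range(1, 13)]
attrs.append(Attribute("rowVersion", implementation_only=True))
rels = [Relationship("Account", target_oid_attributes=2)]

print(count_dets(attrs, rels))
```

Here the name counts once, the twelve monthly amounts collapse to one DET, the row-version column is ignored, and the relationship contributes two DETs for its two-attribute OID.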
RETs "are user recognizable subgroups (optional or mandatory) of data elements contained within an ILF or EIF. Subgroups are typically represented in an entity relationship diagram as entity subtypes or attributive entities, commonly called parent-child relationships" [Garmus and Herron 1996, p. 47]. The RET thus corresponds to the generalization relationship, the composite aggregation association, or the parent-child recursive relationship in UML persistent object diagrams. The idea is to count a single RET for a single object, and an object consists of its class and its superclasses as well as the objects it owns through aggregation. You count one RET for an object with no generalization or aggregation association; otherwise, you count one RET for each such relationship.
For a real-world example, consider the criminal organization in Figure 8-9, reproduced here and labeled for your convenience as Figure 9-1. The commonplace book lets the operative query information about the criminal organization; other parts of the Holmes PLC system maintain the information. Therefore, all of the classes in Figure 9-1 are EIFs or RETs. In a data entry application for criminal organizations, they would be ILFs and RETs. The RET/DET determination is the same, however. The EIFs appear with the «EIF» stereotype and the RETs appear with the «RET» stereotype, just to be completely compatible with the UML notational world. You show the DET counts as tagged DET values on the classes.
Figure 9-1: Counting the CriminalOrganization
There are five EIFs with five RETs and 29 DETs in Figure 9-1. It's not quite that simple, however, because you need to associate the RETs and DETs with the EIFs of which they are a part. This lets you factor in the complexity of the EIF or ILF in counting its function points. The example shows the kind of decisions you need to make.
Note
Figure 9-1 shows how you can use the UML notation in a completely different way, in this case to count function points more easily. If you are using a diagramming tool that permits multiple views of models, you can use the various different stereotypes and tags to great effect in this way. Such tools are hard to come by, however; I know of none that currently support this kind of modeling.
The first decision you must make is what class in a class hierarchy to use as the EIF. In this case, our application focuses on CriminalOrganization, so that becomes the first EIF. If you're counting a standalone database design, you start with the root class as the EIF. You then make the subclasses RETs. In this case, we make the CriminalOrganization class an EIF and the rest RETs.
The Entity class provides a second decision. It has a recursive, parent-child relationship to show how entities relate to one another. This association becomes an RET.
The Role class provides a third decision. Because it relates to the Organization class as a composite aggregate, it becomes an RET rather than an EIF. That is, you consider the Role a part of the Organization object rather than thinking about it separately. The Role is a part of the Organization.
Finally, the Membership class owns the RoleAssignment class, so that is in turn an RET rather than an EIF. Again, the RoleAssignment is a part of the Membership (which in turn is a relationship between a Person and an Organization, just to be totally confusing).
How do you convert the raw function point count (in this case, five EIFs) into the unadjusted function point count? Use Table 9-3 to determine the ordinal value for the complexity given the number of RETs and the number of DETs for a given EIF.
Let's consider the EIFs one by one. The first one is CriminalOrganization, the center of the web. It has two generalization RETs, Organization and Entity, its superclasses. Organization in turn has the RET for Role, making three RETs. Leave the RoleAssignment to its owner, Membership. Entity has an RET for the parent-child recursive relationship. Thus the CriminalOrganization EIF has four RETs and a total of eight DETs from the various classes, leading to a low complexity.
Person has a single RET, its superclass Entity. It has two DETs itself and two for Entity. That leads to a low complexity. You don't count the RET for the parent-child recursive relationship again, as you've already accounted for that in the CriminalOrganization count.
Table 9-3: File Complexity Ordinals
RETs    1-19 DETs    20-50 DETs    51+ DETs
1       Low          Low           Average
2-5     Low          Average       High
6+      Average      High          High
Site has no RETs, which means you count one RET for the class itself. It has three DETs. That leads to a low complexity.
Whereabouts has no RETs (so you count one) and four DETs, two for its attributes and two for the foreign keys of the association class, leading to a low complexity.
Membership has one RET (RoleAssignment) and five DETs, three for the attributes and two for the foreign keys of the association class, leading to a low complexity.
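The lookups above are easy to mechanize. The following sketch encodes Table 9-3 and applies it to the five EIFs using the RET and DET counts given in the text; the function and variable names are hypothetical.

```python
# Rows: RET bands (1, 2-5, 6+). Columns: DET bands (1-19, 20-50, 51+).
COMPLEXITY = (
    ("Low", "Low", "Average"),
    ("Low", "Average", "High"),
    ("Average", "High", "High"),
)


def ret_count(subgroups: int) -> int:
    """A file with no generalization or aggregation subgroups still counts one RET."""
    return max(1, subgroups)


def file_complexity(rets: int, dets: int) -> str:
    """Look up the complexity ordinal for an ILF or EIF in Table 9-3."""
    if rets < 1 or dets < 1:
        raise ValueError("a file has at least one RET and one DET")
    row = 0 if rets == 1 else 1 if rets <= 5 else 2
    col = 0 if dets <= 19 else 1 if dets <= 50 else 2
    return COMPLEXITY[row][col]


# (RETs, DETs) for each EIF, as worked out in the text.
eifs = {
    "CriminalOrganization": (4, 8),
    "Person": (1, 4),
    "Site": (ret_count(0), 3),
    "Whereabouts": (ret_count(0), 4),
    "Membership": (1, 5),
}

ordinals = {name: file_complexity(r, d) for name, (r, d) in eifs.items()}
for name, ordinal in ordinals.items():
    print(f"{name}: {ordinal}")
```

Every EIF in the example lands in the low band, because none has more than five RETs or more than 19 DETs.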
Given these ordinals, you calculate unadjusted function points with Table 9-4.
Since all of the complexity ordinals are low, you multiply the number of EIFs by five to get a total of 25 unadjusted function points (five EIFs multiplied by complexity multiplier five). The CriminalOrganization application that uses these classes thus has 25 function points relating to the database. There are, of course, other function points coming from the EI, EO, and EQ calculations for the transient software subsystems; these contribute to effort estimates just as much as the data calculations.
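As a sketch of the arithmetic, the following Python computes the unadjusted count for the five low-complexity EIFs. The names are hypothetical. The text uses only the low EIF multiplier of five; the average and high EIF weights shown here (7 and 10) are the standard IFPUG values and are included as an assumption for completeness.

```python
# Multipliers by file type and complexity ordinal.
MULTIPLIERS = {
    "ILF": {"Low": 7, "Average": 10, "High": 15},
    "EIF": {"Low": 5, "Average": 7, "High": 10},  # 7 and 10 assumed from IFPUG
}


def unadjusted_function_points(files):
    """Sum the weight of each (file type, complexity) pair."""
    return sum(MULTIPLIERS[file_type][ordinal] for file_type, ordinal in files)


names = ["CriminalOrganization", "Person", "Site", "Whereabouts", "Membership"]

# The commonplace book only queries these classes, so all five are EIFs.
as_eifs = [("EIF", "Low") for _ in names]
eif_total = unadjusted_function_points(as_eifs)

# In a data entry application the same classes would be ILFs instead.
as_ilfs = [("ILF", "Low") for _ in names]
ilf_total = unadjusted_function_points(as_ilfs)

print(eif_total, ilf_total)
```

The same five classes weigh more when your application maintains them, which is why the ILF/EIF decision matters for the size estimate.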
The final step in counting function points is to adjust the total by the "general system characteristics." It is at this point that function points become a bad metric. You evaluate your application according to an ordinal scale on various characteristics, such as data communications, performance, transaction rate, online data entry, reusability, installation ease, multiple sites, and online update. You rate each item on an ordinal scale of 0 to 5, then sum these to get the total degree of influence. You then calculate the value adjustment factor by multiplying this number by 0.01 and adding 0.65 to it. This calculation, unfortunately, completely destroys the validity of the measure by converting it from an ordinal scale to a ratio scale, resulting in a meaningless fractional number such as 4.356.
Fortunately, for most database systems using the client/server model and destined for a real-world commercial environment with a decent number of users, the value adjustment factor comes surprisingly close to unity (1). For the most part, I recommend using the unadjusted function point count as your size measure for this reason.
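To see why the factor hovers around 1, here is a small sketch of the adjustment calculation. It assumes the IFPUG convention of 14 general system characteristics, each rated no higher than 5; the ratings in the example are hypothetical, and the function names are mine.

```python
def value_adjustment_factor(ratings):
    """Return 0.65 + 0.01 * (total degree of influence).

    Computed as (65 + TDI) / 100, which is the same formula arranged to
    avoid floating-point noise in the result.
    """
    for rating in ratings:
        if not 0 <= rating <= 5:
            raise ValueError("each characteristic is rated no higher than 5")
    total_degree_of_influence = sum(ratings)
    return (65 + total_degree_of_influence) / 100


def adjusted_function_points(unadjusted, ratings):
    """Scale an unadjusted count by the value adjustment factor."""
    return unadjusted * value_adjustment_factor(ratings)


# Hypothetical middling ratings for 14 characteristics: seven 3s and seven 2s.
ratings = [3] * 7 + [2] * 7

vaf = value_adjustment_factor(ratings)
adjusted = adjusted_function_points(25, ratings)
print(vaf, adjusted)
```

With 14 characteristics the factor can only range from 0.65 to 1.35, and a system with middling ratings across the board sits at the center of that range, leaving the unadjusted count essentially unchanged.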
Table 9-4: Function Point Count Multipliers
Type    Low    Average    High
ILF     ×7     ×10        ×15
EIF     ×5     ×7         ×10
Note
You should consult the IFPUG documentation for complete details on adjusting function points. If your situation differs significantly from the normal commercial database application, you may need to adjust your function point count. Don't put too much faith in the exact nature of the number, however, since you've abrogated the mathematical validity of the scale by moving to a ratio scale.