A Case-Based Approach for Reuse in Software Design
3.3 Knowledge Base
The KB stores all the knowledge used by the reasoning mechanisms. It comprises a case library, the WordNet ontology, case indexes and the data type taxonomy. Each of these parts is described in this section.
3.3.1 Case Library
The case library comprises two types of cases: design cases and design pattern application cases (DPA cases). Design cases are the most frequently used in REBUILDER and represent UML class diagrams. DPA cases represent specific situations in which a design pattern was applied. In the text, the term case refers to a design case; when we refer to a DPA case, we explicitly use the DPA acronym.
Cases are stored in files, one file per case, and are read into memory only when necessary. Cases in the case library are organized in three lists. The list of confirmed cases comprises the cases used for reasoning, which were approved by the KB administrator. The list of unconfirmed cases contains the cases submitted to the case library by designers. These cases are not used for reasoning and await review by the KB administrator. If accepted, they are included in the case base and move to the confirmed list; if rejected, they are either deleted or moved to the obsolete list. The obsolete list comprises cases that were considered obsolete for various reasons.
Figure 3.4: Example of a UML class diagram (Case1); the package classification is School.
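The three-list organization of the case library can be sketched as follows. This is only an illustration under stated assumptions: the names `CaseLibrary`, `Status`, `submit`, `review` and `cases_for_reasoning` are hypothetical and do not come from REBUILDER, and the lists hold case names only, on the assumption that case bodies stay in their files.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    CONFIRMED = "confirmed"
    UNCONFIRMED = "unconfirmed"
    OBSOLETE = "obsolete"


@dataclass
class CaseLibrary:
    # One list of case names per status (hypothetical layout).
    lists: dict = field(default_factory=lambda: {s: [] for s in Status})

    def submit(self, case_name):
        """A designer submits a case; it waits for review."""
        self.lists[Status.UNCONFIRMED].append(case_name)

    def review(self, case_name, accept, keep_as_obsolete=False):
        """The KB administrator accepts, archives or deletes a submitted case."""
        self.lists[Status.UNCONFIRMED].remove(case_name)
        if accept:
            self.lists[Status.CONFIRMED].append(case_name)
        elif keep_as_obsolete:
            self.lists[Status.OBSOLETE].append(case_name)
        # Otherwise the rejected case is simply dropped (deleted).

    def cases_for_reasoning(self):
        """Only confirmed cases are visible to the reasoning mechanisms."""
        return list(self.lists[Status.CONFIRMED])
```

In this sketch, a case submitted with `submit` becomes available to reasoning only after `review(..., accept=True)` has been called for it.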
The next subsections describe the representations for design cases and DPA cases.
Design Cases
In REBUILDER a case describes a software design, which is represented in the UML formalism through the use of class diagrams. Figure 3.4 shows an example of a class diagram representing part of an educational information system. Nodes represent classes and comprise a name, attributes and methods. Links represent relations between classes. Conceptually, a case in REBUILDER comprises: a name used to identify the case within the case library; the main package, which is an object that comprises all the objects that describe the main class diagram; and the name of the file where the case is stored. Cases are stored using XML/XMI (eXtensible Markup Language, XML Metadata Interchange), which is a widely used format for data exchange. The UML class diagram objects available in REBUILDER are packages, classes and interfaces. See section 2.3.4 for more details.
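The conceptual structure of a design case (name, main package, file name) can be sketched as below. The class and field names are hypothetical. The `loader` callable stands in for an XMI reader that is not specified here, and loading the main package on first access is an assumption that mirrors the statement that cases are read into memory only when necessary.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class DiagramObject:
    kind: str                                   # "package", "class" or "interface"
    name: str
    attributes: List[str] = field(default_factory=list)
    methods: List[str] = field(default_factory=list)
    children: List["DiagramObject"] = field(default_factory=list)


@dataclass
class DesignCase:
    name: str                                   # identifies the case in the library
    file_name: str                              # file where the case is stored
    loader: Callable[[str], DiagramObject]      # hypothetical XMI reader
    _main_package: Optional[DiagramObject] = field(default=None, repr=False)

    @property
    def main_package(self) -> DiagramObject:
        # Read the file only on first access, then reuse the loaded package.
        if self._main_package is None:
            self._main_package = self.loader(self.file_name)
        return self._main_package
```

Keeping the loader as a parameter leaves the file format out of the sketch; any function that maps a file name to a `DiagramObject` can be plugged in.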
DPA Cases
A DPA case describes a specific situation where a software design pattern was applied to a class diagram. Each DPA case comprises a problem description and a solution description.
The problem describes the application situation in terms of the initial class diagram and the mapped participants. The initial class diagram is the UML class diagram to which the software design pattern was applied, that is, the diagram before the application of the design pattern. The mapped participants are specific elements that must be present in order for the software design pattern to be applicable. Participants can be classes, interfaces, methods or attributes. Each participant has a specific role in the design pattern, and this role is decisive for the correct application of the pattern. Each pattern has its own set of participants. Once the participants are identified, the application of a design pattern follows a specific algorithm that embeds the pattern actions. To select the role for each participant in the initial class diagram, a mapping of these participants is performed.
It is important to describe the types of participants defined within our approach. Object participants can be classes or interfaces, attribute participants correspond to class attributes, and method participants correspond to object methods. Each participant has a set of properties:
Role (String): Role of the participant in the design pattern.
Object (class or interface): Object playing the role or, in the case of an attribute or method participant, the object to which the attribute or method belongs.
Method (method): Method playing the role in case of a method participant.
Attribute (attribute): Attribute playing the role in case of an attribute participant.
Mandatory (Boolean): True if the participant must exist in order for the design pattern to be applicable; false if the participant is optional.
Unique (Boolean): True if there can be one or more participants with this role type, otherwise it is false.
The solution description of a DPA case is the name of the applied design pattern, which is then used to select the corresponding software design pattern operator (described in detail in subsection 4.4.2). Different DPA cases can have the same solution, because a DPA case represents the context in which a design pattern is applied, and a large number of such contexts is possible.
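The DPA case representation described above (participants with their properties, the initial class diagram, and the pattern name as solution) can be sketched as a pair of records. All identifiers are hypothetical. The helper `unmapped_mandatory_roles` is an assumed illustration of how the Mandatory flag might be checked against a candidate mapping; it is not the mapping algorithm used in REBUILDER.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Participant:
    role: str                         # Role of the participant in the pattern
    obj: str                          # class/interface playing the role or owning the member
    method: Optional[str] = None      # set only for method participants
    attribute: Optional[str] = None   # set only for attribute participants
    mandatory: bool = True            # Mandatory property
    unique: bool = False              # Unique property, stored as given in the case

    @property
    def kind(self) -> str:
        if self.method is not None:
            return "method"
        if self.attribute is not None:
            return "attribute"
        return "object"


@dataclass
class DPACase:
    initial_diagram: str              # problem: reference to the diagram before application
    participants: List[Participant]   # problem: mapped participants
    pattern_name: str                 # solution: name of the applied design pattern


def unmapped_mandatory_roles(case: DPACase, mapping: Dict[str, str]) -> List[str]:
    """Roles marked mandatory in the case that a candidate mapping leaves empty."""
    return [p.role for p in case.participants
            if p.mandatory and p.role not in mapping]
```

Here `mapping` is assumed to associate role names with elements of a new class diagram; an empty result means every mandatory role has a candidate element.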
3.3.2 WordNet
WordNet is used in REBUILDER as an ontology. It uses a differential theory where concept meanings are represented by symbols that enable a theorist to distinguish
among them. Symbols are words, and concept meanings are called synsets. A synset is a concept represented by one or more words. Words that can be used to represent the same synset are called synonyms. A word with more than one meaning is called a polysemous word. For instance, the word mouse has two meanings: it can denote a rodent, or it can denote a computer mouse. The word mouse therefore belongs to more than one synset.
WordNet is built around the concept of synset. It comprises two parts. The first part is a list of words, in which each word is linked to the list of synsets that the word can represent. The second part is a set of semantic relations between synsets. REBUILDER uses four semantic relations: is-a, part-of, substance-of, and member-of. Synsets are classified into four different types: nouns, verbs, adjectives, and adverbs.
Synsets are used in REBUILDER for the categorization of software objects. Each object has an associated synset, which is the synset selected as the correct one according to the specific diagram in which the object appears. In order to find the correct synset, REBUILDER uses the object name and the names of the objects related to it, which define the object context. The object’s synset can then be used for computing object similarity (using the WordNet semantic relations), or it can be used as a case index, allowing rapid access to objects with the same classification.
WordNet is also used to compute the semantic distance between synsets. This distance is the length of the shortest path between the two synsets. Any of the four relation types can be used to establish the path between synsets. This distance is used in REBUILDER to assess the type of similarity between objects, and to choose the correct synset when the object’s name has more than one synset. This process is called name disambiguation [Ide and Veronis, 1998] and is crucial in REBUILDER. If a diagram object has a name with several synsets, then more information about this object has to be gathered to find which synset is the correct one. The diagram objects that are directly or indirectly associated with this object are used for this disambiguation. If the object is a class, its attributes can also be used in the disambiguation process.
This procedure is used when a case is inserted in the case library or when the designer calls the retrieval module; section 4.7 describes this in more detail.
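The semantic distance described above, the length of the shortest path between two synsets over any of the relation types, can be illustrated with a breadth-first search. This is a minimal sketch under assumptions: relations are treated as traversable in both directions, synsets are plain strings, and the `disambiguate` rule (pick the candidate with the smallest total distance to the context synsets) is only one plausible reading of name disambiguation, not the procedure of section 4.7.

```python
from collections import deque


def build_graph(relations):
    """Relations are (synset, relation_type, synset) triples, traversed both ways."""
    graph = {}
    for a, _relation_type, b in relations:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def semantic_distance(graph, source, target):
    """Length of the shortest path between two synsets, or None if unconnected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def disambiguate(graph, candidates, context):
    """Assumed rule: choose the candidate synset closest overall to the context."""
    def total_distance(candidate):
        distances = (semantic_distance(graph, candidate, c) for c in context)
        return sum(float("inf") if d is None else d for d in distances)
    return min(candidates, key=total_distance)
```

With such a graph, the candidate synsets of an ambiguous object name would be passed as `candidates`, and the synsets of the directly or indirectly associated diagram objects as `context`.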
One problem that we encountered in WordNet was that it had few concepts concerning specific knowledge of the software development domain. We therefore decided to integrate the concepts of the Java class hierarchy, which is the hierarchy of classes of the Java language. We connected this hierarchy to the WordNet structure by establishing an is-a link from the synset representing the concept of computer program to the Java class Object (the top of the Java class hierarchy).
Figure 3.5: A small example of the WordNet structure and case indexes.
3.3.3 Case Indexes
Cases cannot be kept in memory due to their size, so they must be stored in files, which makes case access slower than if they were in main memory. To solve this problem we use case indexing, which provides a way to access the case parts relevant for retrieval without having to read all the case files from disk. Each object in a case is used as an index. REBUILDER uses the synset of each object to index the case in WordNet. In this way, REBUILDER can retrieve a complete case, using the case root package, or it can retrieve only a subset of case objects, using the objects’ indexes. This allows the designer to retrieve not only packages, but also classes and interfaces. To illustrate this approach, suppose that the class diagram of figure 3.4 represents Case1 and that figure 3.5 presents part of the WordNet structure with the case indexes associated with Case1. As can be seen, WordNet relations are of the types is-a, part-of and member-of, while the index relation links a case object (squared boxes) with a WordNet synset (rounded boxes). For instance, Case1 has one package named School (the one presented in figure 3.4), which is indexed by synset School. It also has a class with the same name and categorization, indexed by the same synset, which makes this class available for retrieval as well. Another advantage of indexing cases by their parts (diagram objects) is that cases can be retrieved in other situations in which a part of a case can be relevant. The indexing of Case1 by the synset Teacher illustrates this.
Figure 3.6: An example of DPA case indexing. Synsets are identified by nine-digit numbers.
DPA cases are indexed using the synsets of the object participants (figure 3.6 presents an example), and only the participants (objects, attributes and methods) can be used as retrieval indexes. The WordNet structure is used as an index structure, enabling the search for DPA cases in an incremental way. Each case can be stored in a file, which is read only when necessary. In figure 3.6 there are four indexed objects, three of them corresponding to object participants (Teacher, School and University), and one to a method participant (Classroom1), indexed by the object comprising the method.
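The idea of indexing cases by the synsets of their objects can be sketched as a mapping from synset to index entries, so that relevant case parts are found without opening any case file. The class `CaseIndex` and its methods are hypothetical, and synsets are represented by plain strings rather than by the numeric identifiers shown in figure 3.6.

```python
from collections import defaultdict


class CaseIndex:
    """Maps a synset to the case objects it indexes (illustrative only)."""

    def __init__(self):
        self._by_synset = defaultdict(list)

    def add(self, synset, case_name, object_kind, object_name):
        # object_kind is "package", "class" or "interface".
        self._by_synset[synset].append((case_name, object_kind, object_name))

    def lookup(self, synset, kind=None):
        """Index entries for a synset, optionally restricted to one object kind."""
        return [entry for entry in self._by_synset.get(synset, [])
                if kind is None or entry[1] == kind]

    def cases(self, synset):
        """Names of the cases that have at least one object under this synset."""
        return sorted({entry[0] for entry in self._by_synset.get(synset, [])})
```

Following the Case1 example, both the package School and the class School would be added under the same synset, and a lookup restricted to classes would return only the class entry.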
3.3.4 Data Type Taxonomy
The data type taxonomy is a hierarchy of data types used in REBUILDER. Data types are used in the definition of attributes and parameters. The data type taxonomy is used to compute the conceptual distance between two data types. Figure 3.7 presents part of the data type taxonomy used in REBUILDER.
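A distance over a type hierarchy can be illustrated as follows. The section does not give the formula REBUILDER uses, so the sketch assumes one common choice: the number of edges on the path between the two types through their closest common ancestor. The taxonomy is passed in as a child-to-parent mapping, and any concrete taxonomy supplied to it is an assumption of the caller, not the content of figure 3.7.

```python
def ancestors(parent, node):
    """The node followed by its chain of ancestors up to the root."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain


def taxonomy_distance(parent, a, b):
    """Edges between a and b via their closest common ancestor (assumed metric)."""
    depth_from_a = {n: i for i, n in enumerate(ancestors(parent, a))}
    for steps_from_b, n in enumerate(ancestors(parent, b)):
        if n in depth_from_a:
            return depth_from_a[n] + steps_from_b
    return None  # the two types share no ancestor
```

Under this assumed metric, two sibling types are at distance 2, and a type is at distance 0 from itself.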