Ontology Modeling Using UML
Xin Wang, Christine W. Chan
Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada S4S 0A2 email@example.com, firstname.lastname@example.org
Ontology is a comprehensive knowledge model which enables the developer to practice a “higher” level of reuse of knowledge. Typically, different modeling languages are employed in different phases of software development of a knowledge-based system. To achieve knowledge reuse rather than only software reuse, we propose forging a closer mapping between the knowledge and software models in the development process. In this paper, we first present an ontology that we developed and then investigate UML as an ontology modeling tool to facilitate the mapping from knowledge model to software model. To illustrate the integrated modeling approach using UML, we applied it to develop an ontology for the domain of selecting a remediation technique for petroleum contaminated sites.
Keywords: Ontology, Knowledge Reuse, Unified Modeling Language (UML)
The objective of our study is to investigate stronger coupling between the knowledge engineering and software engineering phases of a knowledge-based system. While both phases emphasize a model-based approach, the models are often not directly translatable between the two phases, which prevents a seamless transition in the development process. This paper presents our investigation of UML as an ontology modeling tool to facilitate the mapping from knowledge to software models. First, the model-based approach in the two phases of knowledge engineering and software engineering is discussed.
The notion of ‘knowledge level’ has provided an important perspective to Artificial Intelligence (AI) in general and to knowledge systems in particular. The emphasis on knowledge level has engendered the so-called modeling approaches for knowledge systems, in which developing a knowledge-based system is viewed as the construction of a series of models related to some problem-solving behavior. Within this context, ontology, as a kind of knowledge model, has become important as a vehicle to enhance knowledge sharing and reuse, and object-oriented database design.
In software engineering, a model is used as an abstraction of reality. Models enable developers to visualize a system and to specify the structure or behavior of a system. However, even though both software engineers and knowledge engineers use model-based methodologies, they often employ different modeling languages and tools. Hence, their models are often not directly translatable. The Unified Modeling Language (UML) is a standard language for writing software blueprints, which may be used to visualize, specify, construct and document an object-oriented system. To bridge the gap between ontological models and software engineering models, we investigate representing a domain ontology using UML. To illustrate model development, we apply our modeling approach to the sample domain of selection of petroleum remediation technology for contaminated sites.
The paper is organized as follows: Section 2 describes the problem domain of petroleum remediation selection and presents the knowledge models that were developed using a hierarchical tree of classes and entity-relationship (ER) diagrams. Section 3 briefly explains our rationale for integrating the knowledge and software models. In Section 4, the approach of adopting UML for the integration is explained. Concluding remarks are given in Section 5.
2. Ontology Design in the Petroleum Remediation
2.1 Ontology Modeling
Two major components of a knowledge-based system are domain knowledge and problem-solving knowledge. While ontologies influence problem-solving knowledge, they mainly play a role in analysing, modeling and representing domain knowledge. Development of a knowledge base typically assumes commitment to a single conceptualization and purpose. An ontology, on the other hand, is an explicit specification of a conceptualization, which can serve as a comprehensive foundation of knowledge. Ontologies are often equated with taxonomic hierarchies of classes, with class definitions, and the subsumption relation. They can be used as the basis of knowledge acquisition tools for gathering domain knowledge or for generating databases or expert systems. An ontology model can facilitate the knowledge analysis and representation processes. Before discussing the process of ontology construction, we first describe the application problem domain.
2.2 Domain of Selection of Petroleum Remediation Technologies for Contaminated Sites
Pollution from the petroleum industry is currently a major environmental concern world-wide. To adequately deal with each pollution situation, an appropriate remediation technology has to be selected. The environmental engineer must make a decision whether to control or reduce the contaminant in the soil and groundwater. However, contaminated sites have different characteristics depending on the pollutant’s properties, hydrological conditions, and a variety of physical characteristics such as mass transfer between different phases, chemical, and biological processes. Therefore, remediation technologies applicable for different site conditions can vary significantly. This selection process is difficult and poses an important challenge for environmental engineers who need support tools in this decision-making process. Thus, implementing a shareable knowledge base in the domain of remediation selection process is a step towards fulfilling this need.
2.3 Ontology Design for the Domain of Petroleum Remediation Selection
Ontology design is primarily a categorization process. Good categorizations can facilitate information retrieval. Studies on categorization that pertain to ontology design in the AI field include Sowa’s ontology, Dahlgren’s ontology, and Gensim. Since the domain ontology of a knowledge-based system is an explicit specification of the objects, concepts, and other entities that are presumed to exist in some area of interest as well as the relationships that hold among them, it defines the set of terms and relations of a domain independent of any problem-solving method. By contrast, a method-specific formulation of domain knowledge is normally difficult to reuse in a different application. Therefore, separating the potentially reusable domain knowledge from the method-specific knowledge is a consideration that guided our structuring of the domain ontology.
Construction of an ontology for a particular domain requires a profound analysis, which reveals the relevant concepts, attributes, relations, constraints, instances and axioms of that domain. Details of this process have been discussed in  and some key points are presented here. The design of the ontology structure for the domain of petroleum remediation selection is illustrated in Figure 1. The ontology structure is constructed with a number of assumptions which are discussed as follows.
Figure 1 consists of three major sub-categories under the root of Thing. The three sub-categories are class, process, and relation. A “Class” can be a Tangible Thing or an Abstraction. There are two major categories under Tangible Thing: decomposable objects and non-decomposable objects. Basically, the class ontology includes all tangible or abstract concepts or substances that are relevant in the petroleum remediation process, such as chemicals, site media, standards, and experiments.
A “Process” can be a simple process, complex process, or combination process. If a task can be accomplished in two steps using objects within a single class hierarchy such as mix and add, then it is considered to be a simple process. If a task is accomplished in more than two steps using objects within a single class hierarchy, we define it to be a complex process. The combination process applies when a process involves objects from more than one class hierarchy. In other words, a task is accomplished in more than two steps using objects from different class hierarchies.
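The three process categories can be read as a small decision rule. The following Python sketch is only an illustration of that reading and is not part of the original ontology: the names `ProcessKind` and `classify_process` are hypothetical, a one-step task is assumed to count as a simple process, and any task drawing on more than one class hierarchy is assumed to be a combination process regardless of its step count.

```python
from enum import Enum


class ProcessKind(Enum):
    SIMPLE = "simple process"
    COMPLEX = "complex process"
    COMBINATION = "combination process"


def classify_process(step_count: int, hierarchies: set[str]) -> ProcessKind:
    """Classify a task by its step count and the class hierarchies it uses.

    Assumptions (not stated in the text): one-step tasks are treated as
    simple, and multi-hierarchy tasks are always treated as combination.
    """
    if step_count < 1 or not hierarchies:
        raise ValueError("a process needs at least one step and one class hierarchy")
    # Objects from more than one class hierarchy: combination process.
    if len(hierarchies) > 1:
        return ProcessKind.COMBINATION
    # Single hierarchy: up to two steps is simple, more than two is complex.
    if step_count <= 2:
        return ProcessKind.SIMPLE
    return ProcessKind.COMPLEX
```

For instance, under these assumptions a two-step task that only uses objects from the Chemical hierarchy would be classified as a simple process.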
A “Relation” covers properties of classes including their internal structure and relationships between classes. A relation can be one of three types: binary relation, multiple relation, and instance relation. A binary relation is a relation between two classes; multiple relation is a relation involving more than two classes; instance relation is a relation of some sets of attributes with certain values for an object. An instance relation is only true for a specific class or instance.
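The top of the classification hierarchy described above can be sketched as a table of parent links, over which the subsumption (is-a) relation becomes a simple upward walk. This is a minimal Python illustration rather than the authors' implementation: the dictionary `PARENT` and the helpers `ancestors` and `is_a` are hypothetical names, and only the categories explicitly named in the text are included.

```python
# Child -> parent links for the categories named in the text.
PARENT = {
    "Class": "Thing",
    "Process": "Thing",
    "Relation": "Thing",
    "Tangible_thing": "Class",
    "Abstraction": "Class",
    "Decomposable_object": "Tangible_thing",
    "Non_decomposable_object": "Tangible_thing",
    "Simple_process": "Process",
    "Complex_process": "Process",
    "Combination_process": "Process",
    "Binary_relation": "Relation",
    "Multiple_relation": "Relation",
    "Instance_relation": "Relation",
}


def ancestors(name: str) -> list[str]:
    """Return the chain of categories above `name`, nearest first."""
    chain = []
    while name in PARENT:
        name = PARENT[name]
        chain.append(name)
    return chain


def is_a(child: str, ancestor: str) -> bool:
    """Subsumption test: is `child` the same as, or below, `ancestor`?"""
    return child == ancestor or ancestor in ancestors(child)
```

With this encoding, `is_a("Decomposable_object", "Class")` holds, while `is_a("Simple_process", "Class")` does not, which mirrors the separation of the three top-level sub-categories.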
[Figure 1 node labels recovered from the diagram: Things; Class; Tangible_thing; Abstraction; Process; Relation; Decomposable_object; Non_decomposable_object; Site_media; Chemical; Standard; Experiment; Remediation; Soil; Water; Gas; Groundwater; Chemical_contaminant; Organic_chemical_contaminant; Non_organic_chemical_contaminant; Petroleum_contamination_standard; Air_pollution_standard; Water_pollution_standard; USEPA_standard; Saskatchewan_standard; Alberta_standard; Sampling_experiment; Soil_sampling_experiment; Water_sampling_experiment; Air_sampling_experiment; Petroleum_waste_remediation; Water_pollution_remediation; Air_pollution_remediation; In_situ_remediation; Ex_situ_remediation; Place; Time; Efficiency; Simple_process; Complex_process; Combination_process; Binary_relation; Multiple_relation; Instance_relation]
Figure 1. Ontology Design: Classification Hierarchy of the Petroleum Remediation Domain (Taken from )
2.4 Entity Relationship Modeling
Another aspect of the ontology involves relationships among the classes in the petroleum remediation domain. Entity-Relation (ER) diagrams have been used to provide a convenient framework for database design, development and documentation. They represent relationships among entities involved in an information system and they are used here to represent relationships among classes in the ontology.
Some of the classes depicted in the classification hierarchy shown in Figure 1 are described in greater detail in Figure 2 which shows detailed information on attributes of select classes. Some sample relations in the domain are described in .
3. Integration of Knowledge and Software Models
The knowledge in the petroleum remediation domain has been analysed and represented using the tools of classification hierarchy, class hierarchy and entity-relationship diagrams. The explicit representation can be converted into an implemented ontology using ontology editors such as Protégé. The construction of the implemented ontology model has been described in . However, a domain ontology model is not directly translatable to a system model. While Protégé can construct an ontology model, the implemented representation of the ontology is usually not well understood by the software engineer.
[Figure 2 labels recovered from the diagram: Site_media; Soil; Water; Gas; Groundwater; Soil_groundwater; Soil_type; Soil_hydraulic_permeability; Soil_heterogeneity; Soil_isotropy; Horizontal_hydraulic_conductivity; Vertical_hydraulic_conductivity; Groundwater_type; pH_value; Site_size; Site_hydraulic_conductivity; Site_volume; Site_area; Depth_of_site. Legend: SuperClass; Class (class/object/entity); Slot (attribute/slot/species); isa link; species link]
Figure 2. A Sample Class Hierarchy (Taken from )
Ideally, the software development process should consist of two completely separate stages: a User-centered Stage, which is concerned with the users and their needs, and a System-centered Stage, which seeks the computer solution that satisfies those needs. From this perspective, the knowledge model is mainly related to the first stage, while the software model straddles the two stages but emphasizes the second. Knowledge modeling alone cannot produce good software because it lacks system-centered support. Hence, a bridge is needed to cross the gap in modeling between the knowledge model and a knowledge-based system.
In this paper, we propose that UML (a trademark of the Object Management Group, Inc.), together with the Object Constraint Language (OCL), can be used as the representation language to bridge this gap.
4. Application of the Integrated Modeling Approach
4.1 UML as an ontology modeling language
The ontology presented in Section 2 can be represented using the Unified Modeling Language (UML). UML is a standard language for writing software blueprints. UML can be used to visualize, specify, construct, and document the artifacts of a software-intensive system. It is appropriate for modeling systems ranging from enterprise information systems to distributed Web-based applications and real-time embedded systems. UML is an expressive language, capable of representing the different views needed to develop and then deploy such systems. However, despite its expressiveness, a UML graphical model, such as a class model, cannot by itself support a precise and complete specification. Usually, additional constraints need to be specified about the objects in the models. Constraints specified in natural language often result in ambiguities. Formal languages avoid such ambiguities, but traditional, mathematically derived formal languages are often difficult for the average business or system modeler to use. Users of UML can instead use OCL to specify constraints and other expressions attached to their models. OCL is a formal language that remains easy to read and write. An additional advantage of employing UML with the OCL formalism is that each symbol in the UML notation is well-defined with well-understood semantics. Hence, a model represented in UML with OCL can be interpreted unambiguously by other people or other tools. Next, some sample UML representations of the ontology in the petroleum remediation domain are discussed.
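The idea of attaching explicit constraints to a class can be approximated outside OCL as invariant checks on an object. The Python sketch below is an illustration only: the class `SiteMedia` is hypothetical, its attribute names are loosely borrowed from the slots shown in Figure 2, and the three constraints themselves (positive area, positive depth, pH between 0 and 14) are assumptions chosen for the example rather than constraints taken from the ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteMedia:
    """Hypothetical site media record with OCL-style invariants."""

    site_area: float      # assumed unit: square metres
    depth_of_site: float  # assumed unit: metres
    ph_value: float

    def __post_init__(self) -> None:
        # Rough OCL analogue: context Site_media inv: self.Site_area > 0
        if self.site_area <= 0:
            raise ValueError("site_area must be positive")
        # Rough OCL analogue: context Site_media inv: self.Depth_of_site > 0
        if self.depth_of_site <= 0:
            raise ValueError("depth_of_site must be positive")
        # Assumed illustrative range for pH; not taken from the source.
        if not 0 <= self.ph_value <= 14:
            raise ValueError("ph_value must lie between 0 and 14")
```

Constructing `SiteMedia(site_area=1500.0, depth_of_site=10.0, ph_value=7.0)` succeeds, whereas a negative area raises a `ValueError`. In a UML model the same conditions would be attached to the class as OCL invariants instead of being enforced in a constructor.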
4.2 Ontology of the Petroleum Remediation Domain: a UML
The ontology of the petroleum remediation domain model can be mapped into two modeling components as shown in Figure 3: the Classification and Relationship modules. The Classification module includes the classes of concepts in the petroleum remediation domain while the Relationship module describes relationships among those classes. The two modules can be regarded as the high-level encapsulation of the analysis and design model for the system to be developed. Depending on the target system, modifications and refinements may be necessary on the analysis model. Sample portions are explained in the following subsections.
Figure 3. Domain Ontologies
Figure 4 shows the top-level classification diagram of ontology design in the petroleum remediation domain. The standard notations in UML are used to represent the classes and the generalization association.
Figure 4. Top-level Diagram of Ontology Design in Petroleum Remediation Domain
Figure 5. Middle Level Ontology Design in Petroleum Remediation Domain (Sample Portion)
The top-level diagram can be refined further. For example, the middle-level classification diagram of the class of Decomposable Object is shown in Figure 5 and the lower-level classification diagram of the class of Media is shown in Figure 6.
Figure 6. Lower Level Ontology for Media Class
In addition to the taxonomy of classes, ontologies also include all relevant constraints between classes, attribute values, instances and relations (or axioms). OCL is employed to articulate the constraints among the classes. For example, in Figure 6, the OCL expression for the class of Soil specifies all the possible types of Soil. For the class of Media, two notes are attached, as indicated by the dotted lines: one specifies the possible values of the attribute Hydraulic Conductivity, and the other provides the formula for calculating the volume of the site media.
4.2.2 Relationship
Relationships among the classes can be represented in the Class diagram in UML. Figure 7 is a sample portion of the Relationship diagram depicting five classes in the petroleum remediation domain ontology.
The process describes the dynamic or task knowledge in the domain ontology. One possible way to explain the steps of the process is to use a natural language procedure. In UML, the activity diagram is used to represent the process from the activity view and the operation view. For example, from the activity view, the process of Determine the Media Size involves two steps: (1) measure the area of the contaminated site, and (2) calculate the volume of the contaminated site according to the formula listed in the classification hierarchy (see one of the notes attached to the class of Media shown in Figure 6). The UML activity diagram for this process is shown in Figure 8. The same process can also be described from the operation view. As shown in Figure 9, when the area of the contaminated site is less than 1600 m² and its volume is less than 25000 m³, the size of the site is small. If the area of the site is greater than 1600 m² but less than 2000 m², and the volume of the site is greater than 25000 m³ but less than 30000 m³, the size of the site is medium. Otherwise, the size is large. OCL expressions are used for the guard expressions, which are the conditions for making the decisions.
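The guard conditions of the operation view can be written down directly as a decision function. The Python sketch below follows the thresholds stated in the text literally; the function name `media_size` is hypothetical, and the volume is taken as an input because the volume formula itself is not reproduced here. Under this literal reading, boundary values (for example an area of exactly 1600 m²) and mixed cases (a small area with a medium volume) fall through to "large".

```python
def media_size(area_m2: float, volume_m3: float) -> str:
    """Classify the size of a contaminated site from its area and volume."""
    # Guard 1: both measurements below the lower thresholds.
    if area_m2 < 1600 and volume_m3 < 25000:
        return "small"
    # Guard 2: both measurements strictly inside the middle ranges.
    if 1600 < area_m2 < 2000 and 25000 < volume_m3 < 30000:
        return "medium"
    # Everything else, including boundary and mixed cases.
    return "large"
```

A practical implementation would probably need to decide explicitly how boundary and mixed cases should be handled, which is the kind of ambiguity that precise guard expressions are meant to expose.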
Figure 7. Relationship Among Some Classes in the Petroleum Remediation Domain
Figure 8. Determine the Media Size (Activity View)
Figure 9. Determine the Media Size (Operation View)
An important benefit of developing an ontology for a software system is that it supports a “higher” level of reuse than is usually the case in software engineering, that is, knowledge reuse instead of software reuse. Moreover, an ontology enables the developer to reuse and share application domain knowledge using a common vocabulary across heterogeneous software platforms and programming languages. It also enables developers to concentrate on the structure of the domain or task at hand and insulates them from implementation details. To ensure realization of these benefits, we propose that integrating the ontology into the software engineering process is essential. This integration can be realized by using UML with OCL as the representation mechanism for ontology construction. Our application illustrates that UML and OCL show promise for representing concepts, attribute constraints, relationships and process knowledge useful for both knowledge engineers and software engineers.
In our approach, the use of explicit domain ontologies can guide subsequent phases of the software development process because the ontology is directly incorporated into software design and implementation. This ensures the software developed is grounded solidly in the knowledge model derived directly from the initial knowledge capture. An ontology that captures and represents the basic concepts and relationships among concepts in a given application area can guide the generation of a large number of software components that contribute to a final system. The guidance provided by an ontology is important also for long term maintenance of the software. The UML with OCL can function as the representation tool for developing the integrated knowledge and software model. Future work includes investigating using UML with OCL for representing problem solving methods and task knowledge.
We are grateful for the generous support of a strategic grant from the Natural Sciences and Engineering Research Council of Canada.
1. Newell A. The knowledge level. Artificial Intelligence 1982; 18: 87-127
2. Gruber T.R. A translation approach to portable ontology specifications. Knowledge Acquisition 1993; 5: 199-220
3. Sowa J. Top-level ontological categories. International Journal of Human-Computer Studies 1995; 43(5/6): 669-686
Studer R, Benjamins V.R, Fensel D. Knowledge Engineering: Principles and Methods. Data & Knowledge Engineering 1998; 25: 161-197
4. Dahlgren K. Naive semantics for natural language understanding. Kluwer Academic, Boston, MA, 1988
5. Karp P.D., Hunter L. A qualitative biochemistry and its application to the regulation of the tryptophan operon. In: Artificial Intelligence and Molecular Biology. AAAI Press/The MIT Press, 1993, pp. 289-325
6. Chen L. Chan C.W. Ontology construction from knowledge acquisition. Pacific Knowledge Acquisition Workshop (PKAW 2000), 11-13 December, 2000, Sydney, Australia