Discovering Dimensional Concepts - AMDO: Automatic Multidimensional Design from Ontologies

4.3 AMDO: Automatic Multidimensional Design from Ontologies

4.3.1 Discovering Dimensional Concepts

This section introduces the multidimensional pattern to identify dimensional concepts from DL ontologies. Note that, in this chapter, we propose two different algorithms to compute this pattern (i.e., an ad hoc reasoning algorithm, and another using generic DL reasoners). Both are properly described in Section 4.4.

According to [C1], a dimensional concept is related to a fact by a one-to-many relationship (i.e., a complete fd); that is, every instance of factual data is related to one, and just one, of its instances. Hence, we can express our pattern to look for dimensional concepts as follows:

F ⊑ =1 r.D, where r ≡ (r1 ◦ ... ◦ rn)

Note that this pattern is expressed in Description Logic (DL) [BCM+03], where r and D are variables, and F is the ontology concept we are trying to identify as a potential fact. As discussed in the introduction, in the general case, we assume OWL DL (a W3C recommendation) as our input ontology language. Accordingly, we use OWL notation and thus, we consider a class to be a unary predicate (i.e., D and F), and a property (i.e., r) to be a binary predicate expressing a relationship between two classes. Briefly, the ⊑ symbol stands for subsumption, the basic inference in DL. Subsumption is the problem of checking whether the subsumer (in our assertion, =1 r.D) is more general than the subsumee (F); that is, whether the subsumee can always be considered a subset of the subsumer. ≡ stands for logical equivalence and can be defined as a specific kind of subsumption, that is: subsumer ⊑ subsumee and subsumee ⊑ subsumer. ◦ stands for property composition (i.e., {a, c} ∈ r ◦ s iff ∃b such that {a, b} ∈ r and {b, c} ∈ s). Finally, =1 stands for a specific number restriction where, in our case, the number of individuals belonging to class D related to a given individual of the class F, through the property r, must be exactly one. Thus, we are looking for classes (D) such that every instance of a given fact (F) is related, directly or by property composition (r), to at least, and at most, one of its instances. For each ontology class F we look for classes that may play the dimensional concept role by evaluating the pattern presented above, where the dimensional concept is defined by the class D (from here on, the ending concept) and a composite property r (from here on, the property path or, simply, the path).
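To make the semantics of property composition and of the "exactly one" restriction concrete, the following is a minimal sketch that evaluates them over a small set of instances. It is only an illustration: the pattern itself is a schema-level subsumption that must hold for every possible set of instances, whereas this code checks a single, made-up one. The function names (compose, exactly_one) and all instance identifiers are assumptions introduced here and are not part of AMDO.

```python
def compose(r, s):
    """Return r ◦ s: all pairs (a, c) such that (a, b) is in r and (b, c) is in s."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def exactly_one(fact_instances, r, d_instances):
    """True if every fact instance is related through r to exactly one instance of D."""
    return all(
        len({c for (a, c) in r if a == f and c in d_instances}) == 1
        for f in fact_instances
    )


# Hypothetical instance data (illustrative only, not taken from the source).
rental_agreements = {"ra1", "ra2"}
branches = {"b1", "b2"}
countries = {"es"}
drop_off_branch = {("ra1", "b1"), ("ra2", "b2")}
located_at = {("b1", "es"), ("b2", "es")}

# Composite property: dropOffBranch ◦ locatedAt
path = compose(drop_off_branch, located_at)

print(exactly_one(rental_agreements, drop_off_branch, branches))  # direct property
print(exactly_one(rental_agreements, path, countries))            # composite property
```

With this data both checks succeed. Adding a rental agreement with no drop-off branch would make the first check fail, which is the situation the =1 restriction rules out.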

For example, consider the conceptual schema in Figure 4.2. Branch is a dimensional concept of rental agreement, as every instance of rental agreement is related to at least, and at most, one instance of branch. Thus, rental agreement would play the F role, branch the D role, and dropOffBranch the r role. Note, however, that r can be a composite property; thus, country is a dimensional concept of rental agreement as well, because every instance of rental agreement is related to at least, and at most, one instance of country by means of dropOffBranch ◦ locatedAt.

In our approach, we consider not only classes but also datatypes to play a dimensional concept role (i.e., the role of D). Hence, a datatype may play a measure role (which seems the more natural choice, as we will discuss later), but also a dimensional concept role. Handling facts and dimensions uniformly is not new. In fact, it was introduced by Agrawal et al. [AGS97] and, since then, it has also been considered in many other design methods. Consequently, in our example, basicPrice and bestPrice will be considered potential dimensional concepts of rental agreement. Importantly, note that a dimensional concept and a measure derived from the same datatype must be semantically related in the output multidimensional schema (for example, by the “equivalence” construct in OWL or by an “association” relationship in UML).

Definition 1. A dimensional concept is defined by an ending concept and a path of properties (i.e., a composite property). From a multidimensional point of view, the path must be considered because it adds relevant semantics. Two classes related by means of n different to-one paths (i.e., a complete fd) must give rise to n different perspectives of analysis, as all these paths will potentially identify different sets of instances in the ending concept.

For example, consider Figure 4.2. There, rental agreement has two to-one relationships to branch (i.e., pickUpBranch and dropOffBranch). Thus, {branch, pickUpBranch} and {branch, dropOffBranch} must be considered as two different points of view from which to analyze a rental agreement, and the semantics of each dimensional concept identified is provided by the combined semantics of the path and the ending concept.
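The idea that a dimensional concept is a pair (ending concept, path) can be sketched at the schema level as a simple path search. The code below is not one of the two algorithms described in Section 4.4; it is a hypothetical simplification that assumes every property is strictly typed and annotated with explicit multiplicities. The Prop record, the dimensional_concepts function, the max_len bound, and the multiplicities assigned to the Figure 4.2 properties are all assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical schema description: a typed property with min/max multiplicity
# on the range side (assumed values, not read from Figure 4.2).
Prop = namedtuple("Prop", "name domain range min_card max_card")

PROPS = [
    Prop("pickUpBranch", "RentalAgreement", "Branch", 1, 1),
    Prop("dropOffBranch", "RentalAgreement", "Branch", 1, 1),
    Prop("locatedAt", "Branch", "Country", 1, 1),
    Prop("basicPrice", "RentalAgreement", "xsd:decimal", 1, 1),  # datatype as ending concept
]


def dimensional_concepts(fact, props, max_len=3):
    """Return (ending concept, path) pairs reachable from `fact` through
    chains of exactly-one properties. A chain of 1..1 properties composes
    into a 1..1 path, so every returned path satisfies the strict pattern."""
    results = []

    def walk(cls, path, visited):
        for p in props:
            if p.domain != cls or p.range in visited:
                continue  # wrong starting class, or the path would cycle
            if (p.min_card, p.max_card) != (1, 1):
                continue  # not a complete functional dependency
            new_path = path + (p.name,)
            results.append((p.range, new_path))
            if len(new_path) < max_len:
                walk(p.range, new_path, visited | {p.range})

    walk(fact, (), {fact})
    return results


for concept, path in dimensional_concepts("RentalAgreement", PROPS):
    print(concept, " ◦ ".join(path))
```

Under these assumptions, Branch is reported twice, once per path, and Country is likewise reached through two different paths. Each pair is kept as a separate result because, as Definition 1 states, the path is part of the dimensional concept.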

4.3.1.1 Practical Consideration

Several current design methods consider a dimensional concept to be a functional dependency (see, for example, [GR09, BvE99, HLV00, MK00, JHP04, GRG05]). Thus, they do not require the dimensional concept to be a complete functional dependency. From our point of view, we strongly recommend enforcing the theoretical pattern presented as much as possible. Relaxing it may, indeed, entail the identification of meaningless dimensions or give rise to sparser multidimensional spaces, which may mislead the user.

Nevertheless, we could accommodate this practical consideration (i.e., fact instances not related to any instance of a dimensional concept) and, as current approaches do, automatically create a dummy instance (for example, named others) related to the fact instances that are not related to the dimensional concept. Then, our pattern to look for dimensional concepts would be as follows:

F ⊑ ≤1 r.D, where r ≡ (r1 ◦ ... ◦ rn)

This multidimensional pattern, however, cannot be used over arbitrary OWL DL ontologies. Indeed, mandatory participation is needed in arbitrary ontologies to avoid discovering meaningless functional dependencies. Importantly, in an ontology, properties are not necessarily typed, i.e., they do not necessarily have a specified class as domain and a specified class as range. Therefore, we cannot establish, in the general case, that a property relates one class to another class. As a consequence, considering the pattern introduced in this section, every functional untyped property would potentially allow us to infer that two arbitrary classes are functionally dependent on each other, provided that the property relates one instance to, at most, a single other instance (i.e., that it is functional).

Note, however, that this general assumption does not hold for conceptual schemas. Consider Figure 4.2. In a UML conceptual schema, every property is strictly typed. Therefore, OWL DL ontologies derived from conceptual schemas are also strictly typed. Thus, when working from OWL DL ontologies assuming strict property-typing (for example, ontologies derived from conceptual schemas) we may relax and successfully compute the alternative pattern presented in this section.
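The effect of missing property typing on the relaxed pattern can be illustrated with a small, deliberately simplified sketch. It assumes that a typed functional property is only considered between its declared domain and range, while an untyped functional property cannot be tied to any particular pair of classes and is therefore matched against every pair. The function relaxed_matches, the class list, and the property hasCode are hypothetical names introduced for this illustration; this is not how AMDO or a DL reasoner evaluates the pattern.

```python
from itertools import product


def relaxed_matches(classes, functional_props):
    """Enumerate (F, property, D) triples accepted by the relaxed pattern.

    functional_props maps a property name to (domain, range); either value
    may be None when the ontology does not declare it (untyped property)."""
    matches = set()
    for name, (dom, rng) in functional_props.items():
        # Without a declared domain (or range), any class is a candidate.
        doms = [dom] if dom is not None else classes
        rngs = [rng] if rng is not None else classes
        for f, d in product(doms, rngs):
            matches.add((f, name, d))
    return matches


classes = ["RentalAgreement", "Branch", "Country"]

typed = {"dropOffBranch": ("RentalAgreement", "Branch")}
untyped = {"hasCode": (None, None)}  # hypothetical untyped functional property

print(len(relaxed_matches(classes, typed)))    # one candidate triple
print(len(relaxed_matches(classes, untyped)))  # one triple per pair of classes
```

In this toy setting the typed property yields a single candidate, whereas the untyped one yields a candidate for each of the nine ordered pairs of classes, most of them meaningless. This is why the relaxed pattern is only advisable when strict property typing can be assumed.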

In the AMDO tool, we introduced a check-box to allow the user to enforce this restriction or relax it, according to the designer's own considerations.

In document Automating the multidimensional design of data warehouses (Page 135-137)