4.2 A data mining query language
4.2.3 Syntax for concept hierarchy specification
Concept hierarchies allow the mining of knowledge at multiple levels of abstraction. To accommodate the different viewpoints of users with regard to the data, there may be more than one concept hierarchy per attribute or dimension. For instance, some users may prefer to organize branch locations by provinces and states, while others may prefer to organize them according to the languages used. In such cases, a user can indicate which concept hierarchy is to be used with the statement
    use hierarchy ⟨hierarchy⟩ for ⟨attribute_or_dimension⟩
Otherwise, a default hierarchy per attribute or dimension is used.
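The selection behavior described above can be sketched in a few lines of Python. This is only an illustration, not part of DMQL: the registry layout, the function name select_hierarchy, the name language_hierarchy, and its levels are all assumptions made for the example.

```python
# Hypothetical registry: attribute -> {hierarchy name -> ordered levels}.
# The level names of language_hierarchy are invented for illustration.
HIERARCHIES = {
    "location": {
        "location_hierarchy": ["street", "city", "province_or_state", "country"],
        "language_hierarchy": ["branch", "language"],
    },
}

# Hypothetical default hierarchy for each attribute or dimension.
DEFAULTS = {"location": "location_hierarchy"}


def select_hierarchy(attribute, use=None):
    """Return the levels of the named hierarchy, or of the default if none is named."""
    name = use if use is not None else DEFAULTS[attribute]
    return HIERARCHIES[attribute][name]


# With an explicit choice (the analogue of "use hierarchy ... for ..."):
by_language = select_hierarchy("location", use="language_hierarchy")
# Without one, the default hierarchy for the attribute is used:
by_default = select_hierarchy("location")
```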
How can we define concept hierarchies using DMQL? In Section 4.1.3, we studied four types of concept hierarchies, namely schema, set-grouping, operation-derived, and rule-based hierarchies. Let's look at the following syntax for defining each of these hierarchy types.
1. Definition of schema hierarchies.
Example 4.17
Earlier, we defined a schema hierarchy for a relation address as the total order street < city < province_or_state < country. This can be defined in the data mining query language as:
    define hierarchy location_hierarchy on address as [street, city, province_or_state, country]
The ordering of the listed attributes is important. In fact, a total order is defined which specifies that street is conceptually one level lower than city, which is in turn conceptually one level lower than province_or_state, and so on. □
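Because the listed attributes form a total order, the position of an attribute in the list is enough to compare levels. The following Python sketch illustrates this; the helper names is_lower and parent_level are hypothetical and are not DMQL constructs.

```python
# The ordered attribute list from the schema hierarchy, lowest level first.
LOCATION_HIERARCHY = ["street", "city", "province_or_state", "country"]


def is_lower(a, b, hierarchy=LOCATION_HIERARCHY):
    """True if attribute a is conceptually lower (more specific) than b."""
    return hierarchy.index(a) < hierarchy.index(b)


def parent_level(attribute, hierarchy=LOCATION_HIERARCHY):
    """Return the attribute one level higher, or None at the top of the list."""
    i = hierarchy.index(attribute)
    return hierarchy[i + 1] if i + 1 < len(hierarchy) else None
```

Reordering the list changes the hierarchy, which is why the order of the attributes in the define statement matters.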
Example 4.18
A data mining system will typically have a predefined concept hierarchy for the schema date(day, month, quarter, year), such as:
    define hierarchy time_hierarchy on date as [day, month, quarter, year]
□
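As a rough illustration of how the levels of such a time hierarchy relate to a single date value, the sketch below derives each level from a Python date object. The function name time_levels is an assumption, and calendar quarters (January to March as quarter 1) are assumed as well.

```python
from datetime import date


def time_levels(d):
    """Map a date to its value at each level of [day, month, quarter, year]."""
    quarter = (d.month - 1) // 3 + 1  # assumes calendar quarters
    return {"day": d.day, "month": d.month, "quarter": quarter, "year": d.year}


levels = time_levels(date(2001, 8, 15))
```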
Example 4.19
Concept hierarchy definitions can involve several relations. For example, an item_hierarchy may involve two relations, item and supplier, defined by the following schema.
    item(item_ID, brand, type, place_made, supplier)
    supplier(name, type, headquarter_location, owner, size, assets, revenue)
The hierarchy item_hierarchy can be defined as follows:
    define hierarchy item_hierarchy on item, supplier as
        [item_ID, brand, item.supplier, item.type, supplier.type]
        where item.supplier = supplier.name
If the concept hierarchy definition contains an attribute name that is shared by two relations, then the attribute is prefixed by its relation name, using the same dot (".") notation as in SQL (e.g., item.supplier). The join condition of the two relations is specified by a where clause. □
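To make the role of the join condition concrete, the following Python sketch builds one hierarchy path per item by matching item.supplier against supplier.name. The sample rows and the function name item_hierarchy_paths are hypothetical, and only the attributes used by the hierarchy are filled in.

```python
# Hypothetical sample rows for the two relations.
items = [
    {"item_ID": 1, "brand": "BrandA", "type": "TV", "supplier": "SupplierX"},
    {"item_ID": 2, "brand": "BrandB", "type": "computer", "supplier": "SupplierY"},
    {"item_ID": 3, "brand": "BrandC", "type": "camera", "supplier": "Unknown"},
]
suppliers = [
    {"name": "SupplierX", "type": "wholesaler"},
    {"name": "SupplierY", "type": "manufacturer"},
]


def item_hierarchy_paths(items, suppliers):
    """Join on item.supplier = supplier.name and list the hierarchy attributes
    in the order [item_ID, brand, item.supplier, item.type, supplier.type]."""
    by_name = {s["name"]: s for s in suppliers}
    paths = []
    for it in items:
        sup = by_name.get(it["supplier"])
        if sup is None:
            continue  # no matching supplier: the row drops out of the join
        paths.append(
            (it["item_ID"], it["brand"], it["supplier"], it["type"], sup["type"])
        )
    return paths


paths = item_hierarchy_paths(items, suppliers)
```

Both relations have an attribute called type, so the tuple keeps the item's type and the supplier's type in separate positions, mirroring the item.type and supplier.type prefixes in the statement.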
2. Definition of set-grouping hierarchies.
Example 4.20
The set-grouping hierarchy for age of Example 4.4 can be defined in terms of ranges as follows:
    define hierarchy age_hierarchy for age on customer as
        level1: {young, middle_aged, senior} < level0: all
        level2: {20, ..., 39} < level1: young
        level2: {40, ..., 59} < level1: middle_aged
        level2: {60, ..., 89} < level1: senior
[Figure: all (level 0); young, middle_aged, senior (level 1); 20,...,39, 40,...,59, 60,...,89 (level 2)]
Figure 4.7: A concept hierarchy for the attribute age.
The notation "..." implicitly specifies all the possible values within the given range. For example, "{20, ..., 39}" includes all integers within the range of the endpoints, 20 and 39. Ranges may also be specified with real numbers as endpoints. The corresponding concept hierarchy is shown in Figure 4.7. The most general concept for age is all, and is placed at the root of the hierarchy. By convention, the all value is always at level 0 of any hierarchy. The all node in Figure 4.7 has three child nodes, representing more specific abstractions of age, namely young, middle_aged, and senior. These are at level 1 of the hierarchy. The age ranges for each of these level 1 concepts are defined at level 2 of the hierarchy. □
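The three levels of this hierarchy can be mimicked with a small lookup, shown here as a sketch only. The ranges follow Figure 4.7 (20 to 39, 40 to 59, 60 to 89) and are treated as inclusive integer ranges; the function name generalize_age and the choice to return None for ages outside every range are assumptions.

```python
# Level 1 concept -> level 2 range of integer ages (endpoints inclusive).
AGE_HIERARCHY = {
    "young": range(20, 40),        # 20, ..., 39
    "middle_aged": range(40, 60),  # 40, ..., 59
    "senior": range(60, 90),       # 60, ..., 89
}


def generalize_age(age, level):
    """Generalize an integer age to the given level (0, 1, or 2).

    Returns None if the age falls outside every listed range.
    """
    if level == 0:
        return "all"  # by convention, the root of any hierarchy
    for concept, ages in AGE_HIERARCHY.items():
        if age in ages:
            if level == 1:
                return concept
            if level == 2:
                return (ages.start, ages.stop - 1)  # inclusive endpoints
    if level not in (1, 2):
        raise ValueError("level must be 0, 1, or 2")
    return None
```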
Example 4.21
The schema hierarchy in Example 4.17 for location can be refined by adding an additional concept level, continent.
    define hierarchy on location_hierarchy as
        country: {Canada, USA, Mexico} < continent: NorthAmerica
        country: {England, France, Germany, Italy} < continent: Europe
        ...
        continent: {NorthAmerica, Europe, Asia} < all
By listing the countries (for which AllElectronics sells merchandise) belonging to each continent, we build an additional concept layer on top of the schema hierarchy of Example 4.17. □
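A set-grouping layer of this kind amounts to a lookup from each country to its continent. The Python sketch below includes only the countries spelled out in the statement; the groupings elided by "..." (including the countries under Asia) are deliberately left out, and the function name roll_up_country is an assumption.

```python
# Country -> continent, restricted to the groupings listed explicitly.
CONTINENT_OF = {
    "Canada": "NorthAmerica",
    "USA": "NorthAmerica",
    "Mexico": "NorthAmerica",
    "England": "Europe",
    "France": "Europe",
    "Germany": "Europe",
    "Italy": "Europe",
}


def roll_up_country(country, level):
    """Generalize a country to 'country', 'continent', or 'all'.

    Returns None at the continent level if the country is not listed.
    """
    if level == "country":
        return country
    if level == "continent":
        return CONTINENT_OF.get(country)
    if level == "all":
        return "all"
    raise ValueError("unknown level: " + level)
```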
3. Definition of operation-derived hierarchies
Example 4.22
As an alternative to the set-grouping hierarchy for age in Example 4.20, a user may wish to define an operation-derived hierarchy for age based on data clustering routines. This is especially useful when the values of a given attribute are not uniformly distributed. A hierarchy for age based on clustering can be defined with the following statement:
    define hierarchy age_hierarchy for age on customer as
        {age_category(1), ..., age_category(5)} := cluster(default, age, 5) < all(age)
This statement indicates that a default clustering algorithm is to be performed on all of the age values in the relation customer in order to form five clusters. The clusters are ranges with names explicitly defined as "age_category(1), ..., age_category(5)", organized in ascending order. □
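The text does not say which algorithm "default" refers to. Purely for illustration, the Python sketch below swaps in a simple gap-based split: the distinct values are sorted and cut at the k - 1 widest gaps, which yields k ranges in ascending order. The function names cluster_1d and age_categories and the sample ages are hypothetical.

```python
def cluster_1d(values, k):
    """Split the distinct values into k contiguous ranges at the k - 1 widest gaps.

    A stand-in for the unspecified default clustering algorithm.
    Returns a list of (low, high) pairs in ascending order.
    """
    xs = sorted(set(values))
    if k < 1 or k > len(xs):
        raise ValueError("k must be between 1 and the number of distinct values")
    # Index i denotes the gap between xs[i] and xs[i + 1]; widest gaps first.
    gaps = sorted(range(len(xs) - 1), key=lambda i: xs[i + 1] - xs[i], reverse=True)
    cuts = sorted(gaps[: k - 1])
    ranges, start = [], 0
    for i in cuts:
        ranges.append((xs[start], xs[i]))
        start = i + 1
    ranges.append((xs[start], xs[-1]))
    return ranges


def age_categories(ages, k=5):
    """Name the clusters age_category(1), ..., age_category(k) in ascending order."""
    return {
        "age_category(%d)" % n: r
        for n, r in enumerate(cluster_1d(ages, k), start=1)
    }


# Hypothetical, non-uniformly distributed age values.
sample_ages = [21, 22, 23, 35, 36, 50, 51, 52, 70, 71, 88]
categories = age_categories(sample_ages)
```

Unlike the fixed ranges of Example 4.20, the resulting boundaries depend on the data, which is the point of an operation-derived hierarchy.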
4. Definition of rule-based hierarchies
Example 4.23
A concept hierarchy can be defined based on a set of rules. Consider the concept hierarchy of Example 4.8 for items at AllElectronics. This hierarchy is based on item profit margins, where the profit margin of an item is defined as the difference between the retail price of the item and the cost incurred by AllElectronics to purchase the item for sale. The hierarchy organizes items into low_profit_margin items, medium-profit_margin items, and high_profit_margin items, and can be defined in DMQL as follows:
    define hierarchy profit_margin_hierarchy on item as
        level_1: low_profit_margin < level_0: all
            if (price - cost) < $50
        level_1: medium-profit_margin < level_0: all
            if ((price - cost) > $50) and ((price - cost) <= $250)
        level_1: high_profit_margin < level_0: all
            if (price - cost) > $250
□
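The three rules translate directly into a chain of conditions on the margin price - cost. The Python sketch below is one possible reading, not a definitive one: it assumes that the medium rule includes a margin of exactly $250, and it returns None for a margin of exactly $50, which none of the rules as written covers. The function name profit_margin_category is hypothetical.

```python
def profit_margin_category(price, cost):
    """Assign an item to a level 1 concept based on its profit margin in dollars."""
    margin = price - cost
    if margin < 50:
        return "low_profit_margin"
    if 50 < margin <= 250:  # assumes the upper bound of the medium rule is inclusive
        return "medium-profit_margin"
    if margin > 250:
        return "high_profit_margin"
    return None  # margin is exactly 50: not covered by any rule as written
```

For example, an item sold for $300 that cost $100 has a margin of $200 and falls into the medium category.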