SPATIAL RULES - Dunham Data Mining pdf

Spatial rules can be generated that describe the relationships between, and the structure of, spatial objects. There are three types of rules that can be found during spatial data mining [KAH96]. Spatial characteristic rules describe the data. Spatial discriminant rules describe the differences between classes of the data; that is, they describe the features that differentiate the classes. Spatial association rules are implications of one set of data by another. The following examples illustrate these three types of rules:

• Characteristic rule: In Dallas the average family income is $50,000.

• Discriminant rule: In Dallas the average family income is $50,000, while in Plano


• Association rule: In Dallas the average family income for families living near White Rock Lake is $100,000.

Characterization is the process of finding a description for a database or some subset thereof. All of these rules can be thought of as special types of characterizations. The characteristic rule is the simplest.

Another common approach to summarizing spatial data is trend detection. A trend is viewed as a regular change in one or more nonspatial attribute values for spatial objects as you move away from another spatial object [EFKS98]. For example, the average price per square foot of a house may increase as the proximity to the ocean increases. Regression analysis may be used to detect a trend.
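As a minimal sketch of how regression can expose such a trend, the following fits a least-squares line to a nonspatial attribute against distance from a reference object. The function name `fit_line` and the sample values are hypothetical and chosen only for illustration; they do not come from [EFKS98].

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical observations: distance from the ocean (km) and price per square foot.
distance_km = [1.0, 2.0, 3.0, 4.0, 5.0]
price_per_sqft = [500.0, 450.0, 400.0, 350.0, 300.0]

slope, intercept = fit_line(distance_km, price_per_sqft)
# A negative slope means the price falls as distance grows,
# that is, it rises with proximity to the ocean.
trend = "decreasing" if slope < 0 else "increasing" if slope > 0 else "flat"
```

The sign and size of the slope summarize the trend; a real analysis would also check how well the line fits before reporting it.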

8.5.1 Spatial Association Rules

Spatial association rules are association rules about spatial data objects. Either the antecedent or the consequent of the rule must contain some spatial predicates (such as near):

• Nonspatial antecedent and spatial consequent: All elementary schools are located close to single-family housing developments.

• Spatial antecedent and nonspatial consequent: If a house is located in Highland Park, it is expensive.

• Spatial antecedent and spatial consequent: Any house that is near downtown is south of Plano.

Support and confidence for spatial association rules are defined identically to those for regular association rules. Unlike traditional association rules, however, the underlying database being examined usually is not viewed as a set of transactions. Instead, it is a set of spatial objects.
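The following sketch shows the usual support and confidence calculations applied to a set of spatial objects rather than to transactions. It assumes each object is represented by the set of predicates it satisfies; the helper names, the `(relation, target)` tuple encoding, and the sample houses are all hypothetical.

```python
def support(objects, predicates):
    """Fraction of spatial objects that satisfy every predicate in `predicates`."""
    hits = sum(1 for satisfied in objects.values() if predicates <= satisfied)
    return hits / len(objects)

def confidence(objects, antecedent, consequent):
    """support(antecedent and consequent) divided by support(antecedent)."""
    base = support(objects, antecedent)
    if base == 0:
        return 0.0
    return support(objects, antecedent | consequent) / base

# Hypothetical houses, each mapped to the predicates it satisfies.
houses = {
    "h1": {("located_in", "Highland Park"), ("is", "expensive")},
    "h2": {("located_in", "Highland Park"), ("is", "expensive")},
    "h3": {("located_in", "Highland Park")},
    "h4": {("is", "expensive")},
}
antecedent = frozenset({("located_in", "Highland Park")})
consequent = frozenset({("is", "expensive")})

# Two of the four houses satisfy both sides of the rule.
rule_support = support(houses, antecedent | consequent)
# Two of the three houses in Highland Park are expensive.
rule_confidence = confidence(houses, antecedent, consequent)
```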

The simplest spatial association rule generation algorithm is found in [KH95]. The approach is similar to that discussed earlier for classification in that a two-step approach is used. As with traditional association rule algorithms, all association rules that satisfy the minimum confidence and support are generated by this algorithm. Because of the large number of possibilities for topological relationships, it is assumed that the data mining request indicates what spatial predicate(s) are to be used. Once the relevant subset of the database is determined, relationships of this type are identified. It initially is assumed that "generalized" versions of the topological relationships are used. A generalized relationship is satisfied if some objects higher up the concept hierarchy satisfy it. For example, zip codes may be used instead of the exact structure of the house. At this level, a filtering is performed to remove objects that could not possibly satisfy the relationship.

To illustrate the concept of generalization with the spatial relationships, we follow the example found in [Kop99]. Suppose that the topological relationship being examined is "close_to." The GIS would define precisely what this predicate means. For example, it could define the relationship based on the Euclidean distance between the two spatial objects. In addition, it might be defined differently based on the type of objects in question. The generalization of "close_to" that is written as "g_close_to" may be defined by a hierarchy that shows that g_close_to contains close_to as well as other predicates (such as contains and equal). A first step in determining satisfiability of the close_to predicate would be to look at a coarse evaluation of g_close_to. The coarse evaluation is used as a type of filter to efficiently rule out objects that could not possibly satisfy the true predicate. The coarse predicate coarse_g_close_to is satisfied by objects if their MBRs satisfy g_close_to. Only those objects that satisfy coarse_g_close_to are examined to see if they satisfy g_close_to.
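The filter-then-refine idea can be sketched as follows. This is a simplified illustration, not the method of [Kop99]: it models only the distance-based part of the predicate, represents each object as a list of vertices, and assumes that the fine test is "some pair of vertices lies within a threshold distance." All function names and sample objects are hypothetical.

```python
from math import hypot

def mbr(points):
    """Minimum bounding rectangle as (min_x, min_y, max_x, max_y)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)

def mbr_distance(a, b):
    """Smallest Euclidean distance between two axis-aligned rectangles (0 if they overlap)."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return hypot(dx, dy)

def coarse_close_to(p, q, threshold):
    """Coarse filter: test the MBRs only."""
    return mbr_distance(mbr(p), mbr(q)) <= threshold

def fine_close_to(p, q, threshold):
    """Fine test (assumed definition): some pair of vertices lies within the threshold."""
    return any(hypot(x1 - x2, y1 - y2) <= threshold for x1, y1 in p for x2, y2 in q)

def close_pairs(objects, threshold):
    """Run the fine test only on pairs that survive the coarse filter."""
    names = sorted(objects)
    result = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if not coarse_close_to(objects[a], objects[b], threshold):
                continue  # ruled out cheaply; the fine test is skipped
            if fine_close_to(objects[a], objects[b], threshold):
                result.append((a, b))
    return result

# Hypothetical objects given as vertex lists.
objects = {
    "house": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "park": [(2, 0), (4, 0), (4, 2), (2, 2)],
    "school": [(10, 10), (11, 10), (11, 11), (10, 11)],
}
pairs = close_pairs(objects, threshold=1.5)
```

The filter is safe because the distance between two MBRs never exceeds the distance between any points inside them, so a pair rejected at the coarse level cannot satisfy the fine test.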

The five-step algorithm is outlined in Algorithm 8.4. It is assumed that a data mining query is input. The query contains selection information that is used to retrieve the objects from the database that are of interest. The topological predicates defining the spatial relationships of interest are also input. Using these predicates, P, an initial table CP is built that identifies which pairs of objects satisfy P at a coarse level. The input minimum supports are actually a set of support values to be used at different levels in the processing. s[1] is the support level to be used at the coarse filtering level. After this filtering, the pairs of objects that satisfy the coarse predicates are counted to see if their support is above the minimum. In effect, this frequent coarse predicate (FCP) database is the set of large one-itemsets. The predicates in FCP are then examined to find the frequent predicates at a fine level (FFP). The last step expands these frequent predicates of size 1 to all arbitrary predicate sizes and then generates the rules as with traditional association rules. This is performed similarly to Apriori. By finding the FCPs first, the number of objects to be examined is reduced at the last step.

ALGORITHM 8.4
Input:
   D   // Data, including spatial and nonspatial attributes
   C   // Concept hierarchies
   s   // Minimum support for levels
   α   // Confidence
   q   // Query to retrieve interested objects
   P   // Topological predicate(s) of interest
Output:
   R   // Spatial association rules
Spatial association rule algorithm:
   d = q(D);
   CP is built by applying the coarse predicate version of P to d;
      // CP consists of the set of coarse predicates satisfied by
      // pairs of objects in d.
   determine the set of frequent coarse predicates FCP by finding
      the coarse predicates that satisfy s;
   find the set of frequent fine predicates FFP from FCP;
   find R by finding all frequent fine predicates and then
      generating rules;

This algorithm works in a similar manner to the Apriori algorithm in that large "predicate sets" are determined. Here a predicate set is a set of predicates of interest. A 1-predicate might be {(close_to, park)}, so all spatial objects that are close_to a park will be counted as satisfying this predicate. A 2-predicate could be {(close_to, park), (south_of, Plano)}. The 1-predicate sets are counted, then those that are large are used to generate 2-predicate sets, and these are then counted. In actuality, the algorithm can be used to generate multilevel association rules if desired or rules at a coarse level rather than a fine level.
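The level-wise counting of predicate sets can be sketched as below. This is an Apriori-style illustration only: it assumes the fine-level predicates satisfied by each object are already known, uses a single support threshold rather than one per level, and stops before rule generation. The function name and the sample data are hypothetical.

```python
from itertools import combinations

def frequent_predicate_sets(objects, min_support):
    """Level-wise (Apriori-style) search for large predicate sets.

    `objects` maps an object id to the set of predicates it satisfies.
    Returns {frozenset_of_predicates: support_fraction}.
    """
    n = len(objects)

    def support(pset):
        return sum(1 for sat in objects.values() if pset <= sat) / n

    # Level 1 candidates: every predicate that appears at least once.
    level = {frozenset([p]) for sat in objects.values() for p in sat}
    frequent = {}
    k = 1
    while level:
        counted = {c: support(c) for c in level}
        large = {c: s for c, s in counted.items() if s >= min_support}
        frequent.update(large)
        k += 1
        # Join: unions of two large sets that yield a set of size k.
        candidates = {a | b for a in large for b in large if len(a | b) == k}
        # Prune: drop candidates that have a (k-1)-subset that is not large.
        level = {c for c in candidates
                 if all(frozenset(sub) in large for sub in combinations(c, k - 1))}
    return frequent

# Hypothetical houses and the predicates each one satisfies.
houses = {
    "h1": {("close_to", "park"), ("south_of", "Plano")},
    "h2": {("close_to", "park"), ("south_of", "Plano")},
    "h3": {("close_to", "park")},
    "h4": {("south_of", "Plano"), ("close_to", "school")},
}
frequent = frequent_predicate_sets(houses, min_support=0.5)
```

With these sample values, both 1-predicate sets about the park and Plano are large, so the 2-predicate set combining them is generated and counted, while (close_to, school) is dropped at the first level.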
