Journal of Environmental Science, Computer Science and Engineering & Technology


JECET; September 2013 – November 2013; Vol.2.No.4, 1320-1335.


An International Peer Review E-3 Journal of Sciences and Technology

Available online at www.jecet.org

Computer Science

Research Article

A New Approach for Spatial Pattern Analysis

Neethu C V and Subu Surendran

Department of Computer Science, SCT College of Engineering Trivandrum, Kerala, India

Received: 5 October 2013; Revised: 6 November 2013; Accepted: 14 November 2013

Abstract: Spatial data mining aims to extract interesting, useful, non-trivial patterns from spatial datasets. The main challenge in accomplishing this task is the huge amount of data contained in spatial databases. To tackle this problem, we propose a new framework based on spatial clustering and association rule mining. It is a generalized framework that can be applied to any spatial application domain. Here, we illustrate its working for the discovery of co-located points of interest in the Indian region, which provides an overview of the current trends in this region. The DBSCAN algorithm is used for spatial clustering. The spatial clusters obtained are then given as input to the FP Growth algorithm for association rule mining. The resulting spatial association rules are indicators of current spatial trends.

Key words: Spatial pattern, Cluster, Association Rules.

INTRODUCTION

The last decade has experienced a revolution in information exchange in various sectors via the internet. In the same spirit, more and more businesses and organizations began to collect data related to their own operations, so mining important data and relations from stored data became a critical issue. While database technologists have been seeking efficient means of storing, retrieving and manipulating data, the machine learning community has focused on developing techniques for learning and extracting valuable knowledge from the data. Due to the wide applicability of this technology, it became an important thread of active research. This led to more techniques for mining interesting, previously unknown patterns from databases, and also resolved many challenges related to the theoretical concepts and practical applications. The main issues related to data mining are the following:

• Massive data sets and high dimensionality.

• Overfitting and assessing the statistical significance.

• Understandability of patterns.

• Non-standard incomplete data and data integration.

• Mixed changing and redundant data.

1.1 Spatial Association Rule Mining: Spatial association rule mining is one of the most popular techniques for deducing implicit information from spatial data. It is very similar to conventional association rule mining; the only difference is in the data set. Here, a generalized approach is used for mining rules based on spatial clustering. The concept of association rule mining was introduced in¹ for discovering sales patterns in large-scale supermarket transaction data. Support and confidence are the two key parameters in rule mining algorithms. Suppose the transaction database D is composed of n items, I = {I1, I2, I3, ..., In}, and an item set is a subset of I. The support of an item set X is denoted Supp(X) and defined as the proportion of transactions that contain the item set X. An association rule has the form Antecedent → Consequent. Here, both antecedent and consequent can be any item set, but they must satisfy Antecedent ∩ Consequent = ∅. The confidence of a rule X → Y is given by Conf(X → Y) = Supp(X ∪ Y)/Supp(X). The main research challenge is the transactionizing of the spatial data.
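As an illustration of the two measures defined above, the following minimal Python sketch computes Supp(X) and Conf(X → Y) over a list of transactions. The function names and the toy transactions are hypothetical and are not taken from the data set used in this work.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Conf(X -> Y) = Supp(X u Y) / Supp(X), with X and Y disjoint."""
    x = frozenset(antecedent)
    y = frozenset(consequent)
    if x & y:
        raise ValueError("antecedent and consequent must be disjoint")
    return support(transactions, x | y) / support(transactions, x)


# Hypothetical toy transactions (illustration only).
toy = [
    {"hospital", "pharmacy", "atm"},
    {"hospital", "pharmacy"},
    {"hospital", "bank"},
    {"school", "bank"},
]

supp_hospital = support(toy, {"hospital"})                # 3 of 4 transactions
conf_rule = confidence(toy, {"hospital"}, {"pharmacy"})   # (2/4) / (3/4)
```

In this toy example, hospital appears in three of the four transactions and hospital together with pharmacy in two, so the rule hospital → pharmacy has confidence 2/3.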

This can be solved by applying spatial clustering methods. Spatial clusters are obtained by grouping spatial objects so that the similarity within clusters is high compared to the similarity between clusters. DBSCAN² and CLIQUE³ are the two clustering methods utilized in this work for making transactions. The spatial data set consists of interesting locations such as schools, hospitals, parking areas, restaurants etc. in the Indian region. Our aim is to find the interesting patterns in this data set so that builders can make decisions based on the patterns obtained through this method. The same framework can be applied to find interesting patterns in other applications. The proposed system is a generalized way of deducing spatial association rules that provide useful hidden information indicating the current trends. For example, hospital and pharmacy are two spatial points that are found to be in nearby regions. This kind of relation can be represented by the rule hospital → pharmacy. These rules provide hidden information about the new trends in Indian cities that may influence decisions such as, “Where are the suitable locations for a particular building?” The quality of these decisions is governed by the accuracy of the spatial information on the various locations in the spatial database. For comparison purposes, the same framework was implemented in two ways by changing the clustering algorithm.

1.2 Motivation for Spatial Association Rule Mining: There are mainly two motivations for spatial association rule mining. The first is the change of trends with respect to geography, and the other is the difficulty of handling huge spatial data sets. Spatial analysis deals with many complex issues, most of which are not clearly specified. One of the fundamental issues is the difficulty of specifying the spatial location of a particular entity. For example, a study of the activities of the people of a particular region needs to describe the locations where they live, where they work etc. As the spatial locations change, the activities of the people also show variations, which constitute the spatial trends. There are many challenges related to spatial data handling. The most complex and fascinating problem is the integration of spatial data collected at different spatial scales. Recent advances in geographic information systems make it possible to combine spatial data with different spatial scales by using techniques such as block kriging⁴. Another issue in the management of spatial data is its huge size. Spatial pattern extraction through spatial association rule mining makes it easier to handle and analyze spatial data.

The paper is organized as follows. Section 2 points out the major milestones in the literature of spatial data mining technology, including the important spatial clustering and association rule mining methods. Section 3 explores the design of the cluster based spatial association rule mining framework in the context of spatial points in the Indian region. Section 4 describes the results obtained from each stage of this framework and the spatial patterns regarding the region under study. This section also includes the performance analysis of the cluster based spatial association rule mining framework. Finally, Section 5 provides the conclusion of the work.

2 LITERATURE SURVEY

Data mining is frequently described as “the process of extracting valid, authentic, and actionable information from large databases”. In other words, data mining derives patterns and trends that exist in data. These patterns and trends can be collected together and defined as a mining model. Mining models can be applied to specific business scenarios, such as forecasting sales, targeting mailings toward specific customers, determining which products are likely to be sold together, and finding sequences in the order in which customers add products to a shopping cart. Spatial data mining is the process of applying data mining techniques to spatial data sets. The proposed framework for spatial association rule mining combines spatial clustering and association rule mining techniques.

2.1 Spatial Clustering: Spatial clustering is the process of grouping a set of objects in a certain dimensional space into clusters such that objects in the same cluster are highly similar to each other while being dissimilar to those in other clusters². Spatial clustering is an important component of spatial data mining; it is a tool for gaining insight into the distribution of data and for characterizing the clusters.

Spatial clustering plays an important role in several applications, such as spatial epidemiology, landscape ecology, crime analysis, disease surveillance, population genetics and many other fields. There are different types of spatial clustering methods, depending upon the concept behind the grouping. Partitional clustering divides the whole data set into partitions so that similarity within clusters is maximized while similarity between clusters is minimized. K-means⁵, K-medoid⁶ etc. are members of this category. In hierarchical clustering, clusters are formed by a sequence of partitional operations, which can be either bottom-up or top-down. AGNES⁷ and DIANA⁷ are examples of this method. Density-based clustering is another method, in which dense regions of the data space form the clusters while spatial points in the low density regions are regarded as outliers. DBSCAN² and DENCLUE are examples of density-based methods. The fourth and last method is grid based clustering, in which the object space is divided into hyper-rectangles according to the dimension, and the required operations are then performed on them to form clusters. CLIQUE³, STING⁹ etc. belong to this category.

2.2 Association Rule Mining: Association rule mining is one of the most important and well researched techniques of data mining. It aims to extract interesting correlations, frequent patterns, associations or causal structures among sets of items in transaction databases or other data repositories⁵. Association rules are widely used in various areas such as telecommunication networks, market and risk management, inventory control etc. Algorithms related to association rule mining are the following:

Apriori Algorithm has great importance in the history of data mining. It is a simple and straightforward approach proposed by Agrawal in⁹. The algorithm proceeds through repeated passes over the database, generating candidate item sets and storing a counter for each of them, and then retrieves the frequent item sets for association rule mining¹⁰. In the process of finding frequent item sets, candidates are formed by joining frequent item sets level by level. Infrequent item sets are eliminated using the support measure, and the confidence measure is used in the rule generation process. The main disadvantage of the Apriori algorithm is that it scans the database many times for candidate generation. The FP-Tree algorithm was therefore proposed in¹⁰ as a solution to this drawback of the Apriori series of algorithms. Frequent patterns are generated in only two passes over the whole database, without candidate item set generation. Computationally, FP-Tree is an order of magnitude faster than Apriori. A detailed description of this algorithm is given in the next section. Rapid Association Rule Mining (RARM) is another approach for association rule mining without candidate generation; it was proposed in¹¹ and shown to be faster than FP-Tree. It utilizes a specialized SOTrieIT structure, similar in construction to FP-Tree, for generating large 1-item sets and large 2-item sets quickly.
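The level-wise candidate generation of Apriori can be sketched in Python as follows. This is a minimal illustration, not the implementation used in this work; the function name, the toy transactions and the support threshold are assumptions chosen for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support} for all item sets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def supp(candidate):
        return sum(1 for t in transactions if candidate <= t) / n

    # Level 1: frequent single items.
    items = {i for t in transactions for i in t}
    level = {}
    for i in items:
        s = supp(frozenset([i]))
        if s >= min_support:
            level[frozenset([i])] = s
    frequent = dict(level)

    k = 2
    while level:
        prev = set(level)
        # Join step: unite pairs of frequent (k-1)-item sets into k-item sets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        level = {}
        for c in candidates:          # one pass over the database per level
            s = supp(c)
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        k += 1
    return frequent


# Hypothetical toy transactions (illustration only).
toy_db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent_sets = apriori(toy_db, min_support=0.6)
```

With a minimum support of 0.6, the three single items and the three pairs are frequent, while {a, b, c} (support 0.4) is generated as a candidate and then rejected. Each level requires a further pass over the database, which is the drawback noted above.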

Figure 1: database and support of items

Figure 2 shows the FP Tree and SOTrieIT structures built from the transaction database given in Figure 1.

Figure 2: (a) FP Tree and (b) SOTrieIT structure corresponding to the original database

3. CLUSTERS BASED SPATIAL ASSOCIATION RULE MINING

3.1 Problem Definition: Let S = {S1, S2, S3, ..., Sn} be a set of spatial objects in a geographical region. The task is to extract spatial association rules of the form X → Y, where X, Y ⊆ S, and thus find the spatial patterns X ∪ Y of that geographical region.

Def: A spatial pattern is a perceptual structure, placement, or arrangement of objects on Earth. It also includes the space in between those objects. Patterns may be recognized because of their arrangement, for example in a line or by a clustering of points.

3.2 Framework of Spatial Association Rule Mining

Figure 3: Overview of spatial Association Rule Mining Framework


The framework combines two important data mining techniques, i.e. spatial clustering and association rule mining, and has two main stages (Figure 3). The input for the first stage is the spatial data stored in the spatial database of the ArcGIS 10.1 tool, which is a platform for designing and managing solutions through the application of geographic knowledge. The first stage represents the clustering of the spatial data. This is done by the DBSCAN and CLIQUE clustering algorithms, which are specifically designed for handling spatial data. Each cluster represents a dense area in the spatial region, obtained by materializing the distance relation between spatial points. The main advantage of this framework is that the overall efficiency of the system is directly related to the efficiency of the clustering method, so any advancement in spatial clustering can make this framework more efficient. These clusters are analogous to the transactions of market-basket data and can be used as input to the next stage.

The next stage represents the association rule mining from the spatial clusters. Here, the FP Growth algorithm is employed to extract spatial association rules. Transforming spatial data into transactions by spatial clustering reduces the overhead of handling spatial data while mining implicit information from the huge spatial data set. This stage is also independent, so it can contribute to the overall efficiency of the system when new association rule mining methods are employed. The output of this stage is the set of spatial association rules.
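The hand-over between the two stages can be sketched as follows: each cluster of spatial points becomes one transaction whose items are the point categories found in that cluster. This is a minimal Python sketch under stated assumptions; the function name, the use of -1 as the noise label and the choice to drop repeated categories within a cluster are illustrative and are not specified in the framework description.

```python
from collections import defaultdict


def clusters_to_transactions(categories, labels, noise_label=-1):
    """Group point categories by cluster label; each cluster becomes one transaction.

    categories : category of each spatial point, e.g. "hospital"
    labels     : cluster label of each point (noise_label marks outliers)
    """
    groups = defaultdict(set)
    for category, label in zip(categories, labels):
        if label != noise_label:          # outliers do not join any transaction
            groups[label].add(category)   # assumption: duplicates are collapsed
    return [sorted(groups[label]) for label in sorted(groups)]


# Hypothetical points and cluster labels (illustration only).
point_categories = ["hospital", "pharmacy", "atm", "school", "bank", "cafe"]
point_labels = [0, 0, 0, 1, 1, -1]
transactions = clusters_to_transactions(point_categories, point_labels)
```

Any rule mining algorithm that accepts market-basket transactions can then be applied to the resulting list, which is what makes the two stages independent of each other.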

3.3 DBSCAN (Density-Based Spatial Clustering of Applications with Noise): The first phase of the framework is the spatial clustering. DBSCAN is one of the effective methods for obtaining spatial clusters. It is a density based method and requires two parameters: ε (eps) and the minimum number of points required to form a cluster (minPts). It starts with an arbitrary starting point that has not been visited¹². This point’s ε-neighbourhood is retrieved, and if it contains sufficiently many points, a cluster is started. Otherwise, the point is labelled as noise. Note that this point might later be found in a sufficiently sized ε-neighbourhood of a different point and hence be made part of a cluster. If a point is found to be a dense part of a cluster, its ε-neighbourhood is also part of that cluster. Figure 4 illustrates the working of the DBSCAN algorithm.

Figure 4: Working of DBSCAN


Hence, all points that are found within the ε-neighbourhood are added, as are their own ε-neighbourhoods when they are also dense. This process continues until the density-connected cluster is completely found. Then, a new unvisited point is retrieved and processed, leading to the discovery of a further cluster or noise. The two parameters strongly influence the result of the algorithm. The pseudo code for the algorithm is given below.

Data: D, eps, MinPts
Algorithm DBSCAN()
    C = 0
    for each unvisited point P in dataset D do
        mark P as visited
        NeighborPts = regionQuery(P, eps)
        if sizeof(NeighborPts) < MinPts then
            mark P as NOISE
        else
            C = next cluster
            expandCluster(P, NeighborPts, C, eps, MinPts)
        end
    end

Data: P, NeighborPts, C, eps, MinPts
Procedure expandCluster()
    add P to cluster C
    for each point K in NeighborPts do
        if K is not visited then
            mark K as visited
            NeighborPts1 = regionQuery(K, eps)
            if sizeof(NeighborPts1) >= MinPts then
                NeighborPts = NeighborPts joined with NeighborPts1
            end
        end
        if K is not yet member of any cluster then
            add K to cluster C
        end
    end

Data: P, eps
Result: Neighborlist
Procedure regionQuery()
    return all points within P’s eps-neighborhood

Algorithm 1: DBSCAN algorithm


The DBSCAN algorithm may visit a point in the database several times (for example, when the point is a candidate for different clusters). According to the above algorithm, the complexity is governed by the number of regionQuery invocations. One regionQuery is executed for each point; if it executes in O(log n) using an indexing structure, where n is the number of spatial points, the overall runtime complexity is O(n log n). Without such an indexing structure, the complexity increases to O(n²).
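The procedure can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in this work: regionQuery is written as a brute-force scan, so it corresponds to the O(n²) case without an indexing structure, and the names `region_query`, `dbscan` and the sample points are assumptions made for the example. It uses `math.dist`, which requires Python 3.8 or later.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)          # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE              # may later be relabelled as a border point
            continue
        cluster += 1                       # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbors)
        while queue:                       # expandCluster
            k = queue.pop()
            if labels[k] == NOISE:
                labels[k] = cluster        # noise reached from a core point joins it
            if labels[k] is not None:
                continue                   # already assigned to a cluster
            labels[k] = cluster
            k_neighbors = region_query(points, k, eps)
            if len(k_neighbors) >= min_pts:
                queue.extend(k_neighbors)  # k is dense too: keep expanding
    return labels


# Hypothetical sample points (illustration only).
sample = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10),
          (50, 50)]
sample_labels = dbscan(sample, eps=1.5, min_pts=3)
```

For these sample points the two dense groups form clusters 0 and 1, and the isolated point (50, 50) is labelled as noise. Replacing `region_query` with a lookup in a spatial index would give the O(n log n) behaviour discussed above.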

3.4 Clique Algorithm: CLIQUE (Clustering In QUEst) is another spatial clustering algorithm used for implementing the first stage. The CLIQUE algorithm integrates density based and grid based clustering. Unlike the other clustering algorithms described earlier, CLIQUE is able to discover clusters in subspaces of the data³. In CLIQUE, the data space is partitioned into non-overlapping rectangular units by equal space partition along each dimension. A unit is dense if the fraction of total data points contained in it exceeds an input model parameter.

CLIQUE performs multidimensional clustering by moving from lower dimensional spaces to higher ones. When searching for dense units in the k-dimensional space, CLIQUE makes use of information obtained from clustering in the (k-1)-dimensional space to prune unnecessary search. This is done by observing the Apriori property used in association rule mining. In general, this property employs prior knowledge of the items in the search space so that portions of the space can be pruned. The property adapted for CLIQUE states the following: if a k-dimensional unit is dense, then so are its projections in the (k-1)-dimensional space. That is, given a k-dimensional candidate dense unit, if any of its (k-1)-dimensional projection units is not dense, then the k-dimensional unit cannot be dense either. Therefore, the potential candidate dense units in the k-dimensional space are generated from the dense units found in the (k-1)-dimensional space. This is illustrated in Figure 5. In general, the resulting search space is much smaller than the original one. The dense units are then examined to construct clusters.

Having found clusters, CLIQUE generates a minimal description for each cluster as follows: For each cluster, it determines the maximal region that covers the cluster of connected dense units. It then determines a minimal cover for each cluster².

Figure 5: Determining potential region with dense units


Data: D
for each dimension d in D do
    Partition d into equal intervals
    Identify dense units
end
k = 2
while (1) do
    for each combination of k dimensions d1, d2, ..., dk do
        for each intersection I of dense units along the k dimensions do
            if I is dense then
                mark I as dense unit
            end
        end
    end
    if no units marked as dense then
        break from while (1) loop
    end
    k = k + 1
end

Algorithm 2: CLIQUE algorithm

CLIQUE automatically finds the subspaces of highest dimensionality in which high density clusters exist. It is insensitive to the order of input tuples and does not presume any canonical distribution. It scales linearly with the size of the input and has good scalability as the number of dimensions in the data is increased. However, the accuracy of the clustering results may be degraded as the price of the method's simplicity. The clusters obtained through this method act as transactions and can be given as the input to the next stage of the framework. As the efficiency of clustering increases, the accuracy of the patterns obtained at the last stage can rise in turn. Here, efficiency is not meant in terms of time; the main concern is the quality of the clusters with respect to the distance relation.
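The bottom-up search for dense units can be illustrated for the two-dimensional case with the following Python sketch. It covers only the identification of dense units, not the later steps of connecting units into clusters and generating minimal descriptions. The function name, the parameter names `xi` (intervals per dimension) and `tau` (density threshold) and the sample points are assumptions made for the example.

```python
from collections import Counter
from itertools import product


def dense_units_2d(points, lo, hi, xi, tau):
    """Dense 1-D and 2-D grid units for 2-D points lying in [lo, hi] x [lo, hi].

    xi  : number of equal-width intervals per dimension
    tau : density threshold, as a fraction of all points
    """
    width = (hi - lo) / xi
    n = len(points)

    def interval(v):
        return min(int((v - lo) / width), xi - 1)   # clamp v == hi into last interval

    cells = [(interval(x), interval(y)) for x, y in points]

    # Pass 1: dense units in each single dimension.
    dense_1d = []
    for d in (0, 1):
        counts = Counter(c[d] for c in cells)
        dense_1d.append({u for u, k in counts.items() if k / n > tau})

    # Pass 2: a 2-D unit is a candidate only if both of its 1-D projections
    # are dense (Apriori-style pruning); only candidates are tested.
    counts_2d = Counter(cells)
    dense_2d = {c for c in product(dense_1d[0], dense_1d[1])
                if counts_2d[c] / n > tau}
    return dense_1d, dense_2d


# Hypothetical sample points (illustration only).
pts = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (0.3, 0.4),
       (3.1, 3.2), (3.5, 3.5), (3.3, 3.8), (3.9, 3.1),
       (1.5, 2.5), (2.5, 1.5)]
units_1d, units_2d = dense_units_2d(pts, lo=0.0, hi=4.0, xi=4, tau=0.3)
```

In this example only intervals 0 and 3 are dense in each dimension, so four 2-D candidates are tested instead of all sixteen grid cells, and two of them, (0, 0) and (3, 3), turn out to be dense.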

3.5 FP Growth Algorithm: The second stage of the framework is the association rule mining from the spatial clusters. Efficient rule mining can make the patterns more accurate and can thereby help in taking good decisions. FP Growth is one of the basic algorithms used for generating association rules, and it is based on the divide and conquer method. The main purpose of this technique is to produce frequent item sets by using combinations of data attributes¹³. The main advantages of the FP Tree algorithm are the following:

• Only two passes over the data set are required to complete the algorithm

• The FP Tree structure compresses the data

• No prior candidate generation is required

• It is much faster than Apriori

Although Apriori and the Frequent Pattern Tree (FP Tree) both aim to derive frequent patterns, they differ in how they treat the source data. Apriori works on the whole source data set, generating candidate item sets and rescanning the database to count them at every level, while FP Tree retains only the items that satisfy the user’s minimum support requirement and mines the compressed tree built from them. As a result, FP Tree can derive frequent patterns much faster than the Apriori algorithm. The major steps of FP Growth are the following:

3.5.1 FP Tree Construction: The construction of the FP-tree requires two scans of the transaction database. The first scan accumulates the support of each item and then selects items that satisfy minimum support, i.e. frequent 1-itemsets. The supports of the ancestors of each item are also accumulated. Those items are sorted in frequency descending order to form F-list. The second scan constructs the FP-tree.

First, the ancestors of each item in the transaction are added. Then the transactions are reordered according to the F-list, while non-frequent items are stripped off. Lastly, the reordered transactions are inserted into the FP-tree. In the function insert-fptree, if the node corresponding to an item in the transaction exists, the count of the node is increased; otherwise a new node is generated and its count is set to one. Keeping the same order of items plays an important role in compressing the database, since common prefixes can be shared among many transactions. The FP-tree also has a frequent-item header table that holds the heads of the node-links, which connect nodes of the same item in the FP-tree. The node-links facilitate item traversal during the mining of frequent patterns. The pseudo code for the FP-tree construction is given below.

Data: D, FList
Result: FPtree
while not eof(D) do
    tranline = read-trans(D)
    add all ancestors of each item in tranline, removing any duplicates in tranline
    o-trans = get-ordered-trans(FList, tranline)
    insert-fptree(FPtree, o-trans)
end

Algorithm 3: FP Tree construction
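The two-scan construction can be sketched in Python as follows. This is a simplified illustration: it omits the step that adds the ancestors of each item, and the class name `FPNode`, the function name `build_fptree`, the tie-breaking rule in the F-list and the toy transactions are assumptions made for the example.

```python
from collections import Counter


class FPNode:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fptree(transactions, min_count):
    """Two-scan FP-tree construction. Returns (root, header, flist)."""
    # Scan 1: accumulate item supports and build the F-list in
    # frequency-descending order (ties broken alphabetically).
    counts = Counter(i for t in transactions for i in set(t))
    flist = sorted((i for i, c in counts.items() if c >= min_count),
                   key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(flist)}

    # Scan 2: strip non-frequent items, reorder by F-list, insert into the tree.
    root = FPNode(None, None)
    header = {item: [] for item in flist}   # item -> its nodes (the node-links)
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in rank), key=rank.get)
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:               # no shared prefix yet: new node
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header, flist


# Hypothetical toy transactions (illustration only).
toy_tx = [["a", "b"], ["b", "c", "d"], ["a", "c", "d", "e"],
          ["a", "d", "e"], ["a", "b", "c"]]
tree_root, tree_header, tree_flist = build_fptree(toy_tx, min_count=2)
```

In this example four of the five transactions start with the most frequent item a, so they share a single a node with count 4 directly under the root, which is where the compression comes from.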

3.5.2 Bottom up traversal: The pseudo code for the recursive function Bottom-Up FP-tree is given in Algorithm 4. Inputs to the algorithm are the FP-tree, the minimum support, and a list of ancestors which have been investigated so far. To find all frequent patterns whose support is higher than the minimum support, it adopts a methodology called conditional search, which looks for all patterns with the same suffix, one suffix at a time.


Data: FPtree, X, anclist
Procedure BU-FPtree()
for each item y (in bottom-up order) in the header of FPtree do
    if y is in anclist then
        continue
    end
    generate pattern Y = y ∪ X with support = y.support
    add ancestors of y to anclist
    cond-pbase = construct-cond-pbase(FPtree, y)
    Y-FList = sort-cond-pbase(cond-pbase)
    Y-Tree = construct-fptree(cond-pbase, Y-FList)
    if Y-Tree is not NULL then
        BU-FPtree(Y-Tree, Y, anclist)
    end
end

Algorithm 4: Bottom up traversal

It traverses the nodes in the FP-tree starting from the least frequent item in the F-list. While visiting each node, it collects the prefix path of the node, which is the set of items on the path from the suffix node to the root of the tree. It also stores the count of the node as the count of the prefix path. The prefix paths form the so-called conditional pattern base of that item. BU-FPtree then creates a small FP-tree from the conditional pattern base, called the conditional FP-tree. During each iteration, a new frequent item set is generated by adding the suffix to the item set from the previous iteration. BU-FPtree also maintains a list of ancestors, anclist, for the items in the current item set. The process is iterated recursively until no conditional pattern base can be generated and all frequent patterns that contain the item are discovered.
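The conditional search can be sketched in Python as follows. This is a compact, self-contained illustration of the plain FP-growth recursion over conditional pattern bases; it leaves out the ancestor list (anclist) handling of BU-FPtree. The names `Node`, `build_tree` and `fp_growth` and the toy transactions are assumptions made for the example.

```python
from collections import Counter


class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}


def build_tree(weighted_paths, min_count):
    """Build an FP-tree from (items, count) pairs. Returns (header, flist)."""
    counts = Counter()
    for items, c in weighted_paths:
        for i in set(items):
            counts[i] += c
    flist = sorted((i for i in counts if counts[i] >= min_count),
                   key=lambda i: (-counts[i], i))
    rank = {i: r for r, i in enumerate(flist)}
    root, header = Node(None, None), {i: [] for i in flist}
    for items, c in weighted_paths:
        node = root
        for i in sorted((i for i in set(items) if i in rank), key=rank.get):
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += c
    return header, flist


def fp_growth(weighted_paths, min_count, suffix=frozenset()):
    """Return {frozenset: support count} of all frequent item sets."""
    header, flist = build_tree(weighted_paths, min_count)
    patterns = {}
    for item in reversed(flist):                  # least frequent item first
        nodes = header[item]
        pattern = suffix | {item}
        patterns[pattern] = sum(n.count for n in nodes)
        # Conditional pattern base: the prefix path of every node holding `item`,
        # weighted by that node's count.
        cond_base = []
        for n in nodes:
            path, p = [], n.parent
            while p is not None and p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                cond_base.append((path, n.count))
        if cond_base:                             # recurse on the conditional tree
            patterns.update(fp_growth(cond_base, min_count, pattern))
    return patterns


# Hypothetical toy transactions (illustration only).
toy_tx = [["a", "b"], ["b", "c", "d"], ["a", "c", "d", "e"],
          ["a", "d", "e"], ["a", "b", "c"]]
found = fp_growth([(t, 1) for t in toy_tx], min_count=2)
```

For these five transactions and a minimum count of 2, the search returns thirteen frequent item sets, the largest being {a, d, e} with count 2, which is reached through the conditional pattern base of e.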

3.5.3 Association Rule Mining: Given a frequent item set L, find all non-empty proper subsets f ⊂ L such that the rule f → L - f satisfies the minimum confidence requirement. If {A, B, C, D} is a frequent item set, the candidate rules are:

ABC→D, ABD→C, ACD→B, BCD→A, A→BCD, B→ACD, C→ABD, D→ABC, AB→CD, AC→BD, AD→BC, BC→AD, BD→AC, CD→AB

This means that if |L| = k, then there are 2^k - 2 candidate association rules (ignoring L → Φ and Φ → L). So, in order to mine efficient rules, the confidence measure is used.


Figure 6: Possible association rules

The dark circles correspond to rules which have confidence greater than a particular threshold. The confidence measure is used to eliminate unwanted rules. Those rules which have confidence greater than the threshold are then retrieved as the result.
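Rule generation from one frequent item set can be sketched in Python as follows. The function enumerates the 2^k - 2 candidate rules f → L - f and keeps those that reach the confidence threshold. The function name, the dictionary of support counts and the thresholds are hypothetical values chosen for the example.

```python
from itertools import combinations


def generate_rules(itemset, support_of, min_conf):
    """Return (antecedent, consequent, confidence) for rules f -> L - f.

    support_of maps frozensets to their support (counts or fractions) and must
    contain L and every non-empty proper subset of L.
    """
    L = frozenset(itemset)
    rules = []
    for r in range(1, len(L)):                    # sizes of non-empty proper subsets
        for f in combinations(sorted(L), r):
            f = frozenset(f)
            conf = support_of[L] / support_of[f]  # Conf(f -> L - f)
            if conf >= min_conf:
                rules.append((f, L - f, conf))
    return rules


# Hypothetical support counts for L = {a, d, e} and its subsets (illustration only).
counts = {
    frozenset("a"): 4, frozenset("d"): 3, frozenset("e"): 2,
    frozenset("ad"): 2, frozenset("ae"): 2, frozenset("de"): 2,
    frozenset("ade"): 2,
}
all_rules = generate_rules("ade", counts, min_conf=0.0)      # every candidate rule
strong_rules = generate_rules("ade", counts, min_conf=0.8)   # rules above threshold
```

With k = 3 there are 2^3 - 2 = 6 candidate rules. At a confidence threshold of 0.8, the rules with antecedent {a} (confidence 0.5) and {d} (confidence 2/3) are eliminated and four rules remain.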

4 IMPLEMENTATION RESULTS

The framework was applied to find the current trends in the placement of interesting locations in the Indian region. For this, a spatial database consisting of various interesting points such as schools, hospitals, pharmacies, restaurants etc. in various parts of India was created, and the system was applied to find the associations among the various spatial points. This provides us an insight into the implicit needs of the society. Figure 7 is the thematic map of these spatial points. The data set used in this study was obtained from¹⁴,¹⁵. The shape files of the case study area were then transformed into the spatial database using the ArcGIS tool, and the rest of the system was implemented using Java 1.7 and NetBeans 7.0.

Figure 7: Thematic map of Indian region with spatial points


The framework mainly consists of two stages: spatial clustering and association rule mining. The main aim of clustering the spatial data is to transactionize the geographical data. The clusters obtained from the first stage are analogous to the transactions of market basket analysis and can be passed to the next stage as input. DBSCAN and CLIQUE are the two algorithms utilized for constructing clusters in the first stage. These two algorithms are specifically designed for handling spatial data. Conventional clustering methods such as hierarchical and partitional clustering are avoided for the following reasons.

• Partitional clustering such as k-means algorithm can only discover spherical clusters

• It is sensitive to centroid point and noise

• Hierarchical clustering can be used to discover well separated isotropic clusters. It does not usually consider the attributes of spatial objects

The main advantages of DBSCAN and CLIQUE algorithms are given below.

• Discover clusters of arbitrary shape

• Do not require user to input number of clusters

The sample clusters obtained are listed in Table 1:

Table 1: Sample Clusters

Cluster Number | Cluster Members
1 | Theatre library post_box post_office parking cafe restaurant
2 | Hospital fuel theatre attraction restaurant restaurant fuel fuel atm bank
3 | Post_office post_office atm bank pharmacy bank parking
4 | atm pharmacy fast_food post_box telephone parking atm
5 | Pub bank restaurant fuel hospital supermarket post_office fast-food

When considering the time complexity of these two algorithms, CLIQUE is better than DBSCAN, because the complexity of DBSCAN is O(n log n), where n is the number of data points, and the complexity of CLIQUE is O(D²), where D is the dimensionality of the input data. The CLIQUE algorithm is specifically designed to handle high dimensional spatial data. It combines the density concept of clustering with the grid based concept, so the method benefits from both.

The second step is association rule mining, which utilizes the FP Growth algorithm for the discovery of co-located spatial patterns. The unique characteristics of the FP Growth algorithm as compared to other algorithms such as Apriori are:

Completeness: It preserves complete information for frequent pattern mining and never breaks a long pattern of any transaction.

Compactness: It reduces irrelevant information, and because frequently occurring items are shared among transactions, the tree is never larger than the original database.

The main advantages of the FP Growth algorithm are the following:

• Only two passes over the data set are required to complete the algorithm

• The FP Tree structure “compresses” the data set

• No candidate generation for frequent pattern mining

• It is much faster than the Apriori algorithm (the time complexity of Apriori is O(n²), where n is the number of items in the transaction database, whereas the time complexity of the FP Growth algorithm is O(2ⁿ))

Support and confidence are the two parameters involved in association rule mining. The support was set to 40% (0.4) and the confidence to 80%. The spatial association rules obtained show that there is a high degree of association between the locations {school, bank, atm, restaurant}, {bank, atm, parking, fuel}, {hospital, pharmacy}. Some of the association rules are given below.

{Hospital} → {restaurant bank} (sup= 0.5168539325842697 conf= 1.0)

{School bank} → {hospital} (sup= 0.6123595505617978 conf= 1.0)

{Hospital} → {pharmacy} (sup= 0.5539325842696629 conf= 0.9130434782608695)

{Parking parking} → {fuel} (sup= 0.5168539325842697 conf= 0.8440366972477065)

4.1 Performance Analysis: This clustering based spatial association rule mining framework was implemented in two ways in order to show the flexibility of the system. As explained before, the first is based on the DBSCAN algorithm and the second is based on the CLIQUE clustering algorithm.

The handling of spatial data by CLIQUE is found to be more efficient than by the DBSCAN algorithm.

Figure 8 is a comparison graph for evaluating the performance of the DBSCAN based system and the CLIQUE based system in detecting association rules, based on the data provided in Table 2. For the same number of clusters, the CLIQUE based system generates at least as many association rules under the same support and confidence measures.

Table 2: DBSCAN based system vs CLIQUE based system (number of association rules)

Number of clusters | DBSCAN based system | CLIQUE based system
50 | 764 | 822
100 | 2138 | 3500
150 | 3706 | 3706
200 | 10034 | 14000

So, as the efficiency of the clustering method increases, the number of association rules increases. When the number of clusters is 50, the DBSCAN based system produces 764 rules while the CLIQUE based system generates 822 rules. The number of rules is the same (3706) when the number of clusters is 150. However, in the other two cases, i.e. when the number of clusters is 100 and 200, the second system detected more association rules than the first one. From these observations, it is clear that any improvement in the spatial clustering methods can subsequently increase the efficiency of the system.


Figure 8: Comparison of DBSCAN based system with CLIQUE based system

CONCLUSIONS

This framework has been developed to analyze the spatial trends of a particular region via spatial association rule mining. The generated rules are oriented to show the spatial trends, so that necessary actions can be taken accordingly. The main advantage of this system is that any advancement in spatial clustering methods can enhance the efficiency of the system. The performance analysis of the system shows that efficient clustering can enhance its efficiency. DBSCAN and CLIQUE are the two algorithms employed for spatial clustering, and the spatial data was handled better by CLIQUE than by DBSCAN. As the efficiency of the clustering increased, more association rules were generated.

REFERENCES

1. Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami, “Mining association rules between sets of items in large databases,” Proceedings of the 1993 ACM SIGMOD international conference on management of data, 1993.

2. Martin Ester, Hans-Peter Kriegel, Jorg Sander, Xiaowei Xu, “A density based algorithm for discovering clusters in large spatial databases with noise,” Proceedings of international

3. Lance Parsons, Ehtesham Haque, Huan Liu, “Evaluating subspace clustering algorithms,” Department of Computer Science and Engineering, Arizona State University, 2004.

4. Carol A. Gotway, Linda J. Young, “Combining incompatible spatial data,” Journal of the American Statistical Association, 2002.

5. Chris Ding, Xiaofeng He, “K-means clustering via principal component analysis,” Computational Research Division, Lawrence Berkeley National Laboratory, 2007.


6. Hae-Sang Park, Jong-Seok Lee, Chi-Hyuck Jun, “A K-means like algorithm for K-medoid clustering and its performance,” Department of Industrial and Management Engineering, POSTECH, 2000.

8. J. Wang, R. Yang, Muntz, “STING: A Statistical Information Grid Approach to spatial data mining,” International conference of VLDB, 1997.

9. Rakesh Agrawal, Ramakrishnan Srikant, “Fast algorithms for mining association rules in large databases,” Proceedings of the 20th international conference on VLDB, 1994.

10. Qiankun Zhao, Sourav S. Bhowmick, “Association Rule Mining: A Survey,” Technical Report, CAIS, Nanyang Technological University, Singapore, 2003.

11. Amitabha Das, Wee-Keong Ng, Yew-Kwong Woon, “Rapid Association Rule Mining,” In the proceedings of the 10th international conference on information and knowledge management, 2001.

12. J. Wang, R. Yang, Muntz, “STING: A Statistical Information Grid Approach to spatial data mining,” International conference of VLDB, 1997.

13. Jiawei Han, Jian Pei, and Yiwen Yin, “Mining frequent patterns without candidate generation,” In proceedings of ACM SIGMOD international conference, 2000.

14. http://www.diva-gis.org/gdata.

15. http://www.cs.utah.edu/ lifeifei/SpatialDataset.htm.

*Corresponding Author: Neethu C V;

Department of Computer Science, SCT College of Engineering Trivandrum, Kerala.