SPATIAL JOIN

Academic year: 2021


Biplob Kumar Debnath

Department of Electrical and Computer Engineering, University of Minnesota

SYNONYMS

Intersect Join

DEFINITION

The spatial join operation combines two or more datasets with respect to a spatial predicate. The predicate can be a combination of directional, distance, and topological spatial relations. In a nonspatial join, the joining attributes must be of the same type, but in a spatial join they can be of different types. Usually, each spatial attribute is represented by its minimum bounding rectangle (MBR).

A typical example of a spatial join is “Find all pairs of rivers and cities that intersect”. For example, in Figure 1, the result of the join between the set of rivers {R1, R2} and the set of cities {C1, C2, C3, C4, C5} is {(R1, C1), (R2, C5)}.

Figure 1: Example of spatial join

HISTORICAL BACKGROUND

In 1986, Orenstein used a grid-based technique to perform a spatial join. It is the first known technique for the spatial join operation. Using a grid, the multidimensional space is divided into smaller blocks, known as pixels, and a z-ordering is then used to order the pixels. Each object is approximated by the pixels that intersect its MBR. Because the pixels are ordered by the z-ordering, each object is represented by a set of z-values, which are one-dimensional. Any one-dimensional index (e.g., a B+-tree) can then be used to sort them, and the spatial join is performed with a sort-merge join. The performance of this technique depends solely on the granularity of the grid: the finer the grid, the more accurate the results, but the more memory is consumed. To remedy this problem, multidimensional indices (e.g., the R-tree) that can handle spatial data directly were later devised. Various new spatial join algorithms based on multidimensional indices (e.g., R-tree join, sort and match, spatial hash join, and slot index spatial join) then appeared.
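The z-ordering idea can be illustrated with bit interleaving. The following is a minimal sketch, assuming pixels are addressed by non-negative integer grid coordinates; the function names `z_value` and `cells_for_mbr` are hypothetical and are not taken from Orenstein's paper.

```python
def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of grid coordinates (x, y) into a single z-value."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # bits of x go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # bits of y go to odd positions
    return z


def cells_for_mbr(xmin: int, ymin: int, xmax: int, ymax: int) -> list:
    """Sorted z-values of all pixels covered by an MBR given in pixel units."""
    return sorted(
        z_value(x, y)
        for x in range(xmin, xmax + 1)
        for y in range(ymin, ymax + 1)
    )
```

Each object then becomes a sorted list of one-dimensional keys, which is what allows a one-dimensional index and a sort-merge join to be used. A finer grid yields more keys per object, which reflects the memory cost noted above.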

KEY CONCEPTS

A spatial join is performed in two steps: a filter step and a refinement step. In the filter step, the tuples whose MBRs overlap (for a join, the pairs of tuples with overlapping MBRs) are determined. This step is not computationally expensive, as at most four comparisons are required to determine whether two rectangles intersect. The tuples that pass the filter step are fed to the refinement step, where the exact spatial representations are used and the spatial predicate is checked on them. The refinement step is computationally expensive, but it processes fewer tuples because of the initial filter step.
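The two steps can be sketched as follows. This is an illustrative example only: circles stand in for exact geometries (an assumption made to keep the refinement test short), and the names `Circle`, `mbrs_intersect`, `exactly_intersect`, and `filter_and_refine` are hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Circle:
    """Stand-in for an exact geometry (an assumption made for brevity)."""
    x: float
    y: float
    r: float

    def mbr(self):
        # MBR as (xmin, ymin, xmax, ymax)
        return (self.x - self.r, self.y - self.r, self.x + self.r, self.y + self.r)


def mbrs_intersect(a, b):
    """Filter test: four comparisons decide whether two rectangles intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def exactly_intersect(c1, c2):
    """Refinement test on the exact geometry."""
    return hypot(c1.x - c2.x, c1.y - c2.y) <= c1.r + c2.r


def filter_and_refine(c1, c2):
    if not mbrs_intersect(c1.mbr(), c2.mbr()):
        return False                      # rejected cheaply by the filter step
    return exactly_intersect(c1, c2)      # only surviving pairs reach this test
```

Two unit circles centered at (0, 0) and (1.8, 1.8) show why refinement is needed: their MBRs overlap, so the pair passes the filter step, but the circles themselves do not intersect.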

Spatial join algorithms can be classified into three categories. For the discussion below, we assume that we want to spatially join relations R1 and R2, and we focus only on the intersection join. The same techniques can be extended to other join variants (e.g., distance join).

Nested Loop

In this algorithm, the entire relation R2 is scanned for each tuple of R1; any pair of tuples from R1 and R2 that satisfies the spatial join predicate is added to the result. The basic algorithm follows:

1. for all tuples r1 ∈ R1
2.     for all tuples r2 ∈ R2
3.         if pair (r1, r2) satisfies the spatial join predicate
4.             add <r1, r2> to result

Here, R1 is the outer relation and R2 is the inner relation. If an index is available on one of the relations, we can make that relation the inner one. In this case, we need not scan the entire inner relation.
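A minimal sketch of the nested loop join, assuming each relation is a list of (id, MBR) pairs and the predicate is MBR intersection. The coordinates are invented for illustration and are not taken from Figure 1; `nested_loop_join` and `mbrs_intersect` are hypothetical names.

```python
def mbrs_intersect(a, b):
    """MBRs are (xmin, ymin, xmax, ymax); True if the rectangles intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def nested_loop_join(r1, r2, predicate=mbrs_intersect):
    """Scan all of r2 (inner relation) for every tuple of r1 (outer relation)."""
    result = []
    for id1, geom1 in r1:
        for id2, geom2 in r2:
            if predicate(geom1, geom2):
                result.append((id1, id2))
    return result


# Made-up data for illustration only.
rivers = [("R1", (0, 0, 4, 1)), ("R2", (6, 5, 9, 6))]
cities = [("C1", (3, 0, 5, 2)), ("C2", (0, 3, 1, 4)), ("C5", (8, 4, 10, 7))]
pairs = nested_loop_join(rivers, cities)
```

The predicate is evaluated once for every pair of tuples from the two relations, which is why an index on the inner relation helps: it replaces the inner scan with a lookup.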

Tree Matching

The tree matching algorithm can be applied when indices are available on both relations. For this discussion, we assume that an R-tree index is available. In an R-tree, every node entry has the form <ref, rect>, where ref (written ptr in the algorithm below) is a pointer to a child node or to a spatial object, and rect is the MBR of the child node or of the spatial object. The pages that contain leaf nodes are called data pages, and the pages that contain non-leaf nodes are called directory pages. Because each directory entry contains the MBR of its child node's entries, if the MBRs of two directory entries Er1 and Er2 are disjoint, there can be no match between the entries beneath them. If they are not disjoint, there may be matches between the entries, so we have to traverse deeper into the trees to find the matching tuples. The basic algorithm follows:

Spatial_Join (R1, R2) // R1 and R2 are R-tree nodes
1. for all Er1 ∈ R1
2.     for all Er2 ∈ R2
3.         if (Not_Disjoint (Er1.rect, Er2.rect))
4.             if (R1 and R2 are leaf pages)
5.                 if pair (Er1, Er2) satisfies the spatial join predicate
6.                     add <Er1, Er2> to result
7.             else if (R1 is a leaf page)
8.                 Read_Page (Er2.ptr)
9.                 Spatial_Join (Er1.ptr, Er2.ptr)
10.             else if (R2 is a leaf page)
11.                 Read_Page (Er1.ptr)
12.                 Spatial_Join (Er1.ptr, Er2.ptr)
13.             else
14.                 Read_Page (Er1.ptr)
15.                 Read_Page (Er2.ptr)
16.                 Spatial_Join (Er1.ptr, Er2.ptr)

When an index exists on only one relation, an index on the other relation is built on the fly and the tree matching technique is applied.
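The tree matching traversal can be sketched with in-memory nodes. This is a simplified illustration, not the published R-tree join: the `Entry` and `Node` classes are hypothetical, page reads are omitted, and only the filter step (MBR candidates) is produced. When one node is a leaf and the other is not, the sketch wraps the single leaf entry in a temporary one-entry node so that only the deeper tree is descended; that handling of unequal tree heights is an assumption of this example.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Entry:
    rect: Rect
    ref: Any  # child Node in a directory page, object id in a data page


@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)


def not_disjoint(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def spatial_join(n1: Node, n2: Node, result: list) -> list:
    """Collect (id1, id2) candidate pairs whose MBRs are not disjoint."""
    for e1 in n1.entries:
        for e2 in n2.entries:
            if not not_disjoint(e1.rect, e2.rect):
                continue  # prune: nothing below these entries can match
            if n1.is_leaf and n2.is_leaf:
                result.append((e1.ref, e2.ref))
            elif n1.is_leaf:
                # Only the second tree can be descended; keep e1 fixed.
                spatial_join(Node(True, [e1]), e2.ref, result)
            elif n2.is_leaf:
                # Only the first tree can be descended; keep e2 fixed.
                spatial_join(e1.ref, Node(True, [e2]), result)
            else:
                spatial_join(e1.ref, e2.ref, result)
    return result
```

The pruning in the disjointness test is what distinguishes this from a nested loop over all objects: a pair of disjoint directory entries eliminates every pair of objects beneath them in one comparison.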

Partition-Based Spatial Merge Join

In this case, both relations are first divided into p partitions if they do not both fit in main memory. Partition i of R1, where 1 ≤ i ≤ p, is then compared with the corresponding partition i of R2. We briefly go through the filter step of this algorithm:

1. For each tuple in R1 and R2, form new relations R1’ and R2’, where each tuple consists of the unique object id of the original tuple and the MBR of its joining attribute.

2. If both R1’ and R2’ fit in main memory, we can process the join with a plane-sweep algorithm.

3. If R1’ and R2’ do not both fit in main memory, we partition both relations into p parts (R1’_1, …, R1’_p and R2’_1, …, R2’_p) such that any partition pair (R1’_i, R2’_i) fits in main memory. In addition, we make sure that, for each R1’_i, any overlapping tuples in R2’ reside in partition R2’_i. We can then apply the plane-sweep algorithm to each partition pair.

This strategy works well when no index is available on either relation.
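The filter step above can be sketched as follows. This is a simplified illustration under stated assumptions: partitions are equal-width vertical strips over a known x-range, an MBR is copied into every strip it overlaps, and duplicate candidates are removed with a set. The published algorithm uses a finer tiling scheme; `plane_sweep_join`, `partition`, and `pbsm_filter` are hypothetical names.

```python
def plane_sweep_join(r1, r2):
    """Candidate (id1, id2) pairs; inputs are lists of (id, (xmin, ymin, xmax, ymax))."""
    a = sorted(r1, key=lambda t: t[1][0])  # sort both inputs by xmin
    b = sorted(r2, key=lambda t: t[1][0])
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i][1][0] <= b[j][1][0]:
            id1, m1 = a[i]
            k = j
            # Every b[k] scanned here starts inside the x-extent of m1.
            while k < len(b) and b[k][1][0] <= m1[2]:
                m2 = b[k][1]
                if m1[1] <= m2[3] and m2[1] <= m1[3]:  # y-extents overlap
                    out.append((id1, b[k][0]))
                k += 1
            i += 1
        else:
            id2, m2 = b[j]
            k = i
            while k < len(a) and a[k][1][0] <= m2[2]:
                m1 = a[k][1]
                if m1[1] <= m2[3] and m2[1] <= m1[3]:
                    out.append((a[k][0], id2))
                k += 1
            j += 1
    return out


def partition(rel, p, xlo, xhi):
    """Copy each tuple into every vertical strip that its MBR overlaps."""
    width = (xhi - xlo) / p
    parts = [[] for _ in range(p)]
    for oid, m in rel:
        first = min(p - 1, max(0, int((m[0] - xlo) // width)))
        last = min(p - 1, max(0, int((m[2] - xlo) // width)))
        for s in range(first, last + 1):
            parts[s].append((oid, m))
    return parts


def pbsm_filter(r1, r2, p, xlo, xhi):
    """Join partition i of r1 with partition i of r2 and merge the candidates."""
    candidates = set()  # a pair spanning several strips is reported only once
    for part1, part2 in zip(partition(r1, p, xlo, xhi), partition(r2, p, xlo, xhi)):
        candidates.update(plane_sweep_join(part1, part2))
    return sorted(candidates)
```

Copying an MBR into every strip it overlaps is one way to meet the requirement that overlapping tuples end up in corresponding partitions, at the cost of duplicate candidates that must be removed afterwards.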

KEY APPLICATIONS

One of the applications of spatial join is to find all the objects that intersect or overlap each other. Some variants of spatial join (e.g., distance join) are used in data mining for data analysis and clustering. Spatial join can also be used to process closest-pair queries, k-nearest-neighbor queries, and ε-distance queries.

FUTURE DIRECTIONS

Some issues in spatial join require further attention from the research community. To process spatial join queries, we usually perform the filter and refinement steps in order. In some cases, variants of this scheme (e.g., interleaving the two steps) may be more beneficial. We can explore where such variants are beneficial and what information needs to be collected to decide this. Although intersection join algorithms (e.g., R-tree join) can be directly extended to other join types (e.g., distance join), the result is often inefficient. Various optimization techniques can be applied to remedy this. Extending existing intersection join algorithms, with various optimization criteria, to other domains will be an interesting area for research.

CROSS REFERENCES

1. Intersection join
2. Distance join
3. Similarity join
4. Spatial access method
5. R-Tree

RECOMMENDED READING

1. Shekhar S., Chawla S. (2003). Spatial Databases: A Tour, First Edition, Prentice Hall.

2. Patel J. M., DeWitt D. J. (1996). Partition Based Spatial-Merge Join. Proceedings of the ACM SIGMOD Conference, pages 259-270.

3. Brinkhoff T., Kriegel H., Seeger B. (1993). Efficient Processing of Spatial Joins Using R-trees. Proceedings of the ACM SIGMOD Conference, pages 237-246.

4. Brinkhoff T., Kriegel H., Seeger B. (1996). Parallel Processing of Spatial Joins Using R-trees. Proceedings of the ICDE Conference, pages 258-265.

5. Manolopoulos Y., Papadopoulos A., Vassilakopoulos M. Gr. (2005). Spatial Databases: Technologies, Techniques and Trends, Idea Group Publishing.

6. Böhm C., Krebs F. (2002). High Performance Data Mining Using the Nearest Neighbor Join. Proceedings of the IEEE International Conference on Data Mining, pages 43-55.

7. Shou Y., Mamoulis N., Cao H., Papadias D., Cheung D. W. (2003). Evaluation of Iceberg Distance Joins. Proceedings of the Eighth International Symposium on Spatial and Temporal Databases, pages 270-288.

8. Corral A., Manolopoulos Y., Theodoridis Y., Vassilakopoulos M. (2000). Closest Pair Queries in Spatial Databases. Proceedings of the ACM SIGMOD Conference, pages 189-200.

9. Guttman A. (1984). R-trees: A Dynamic Index Structure for Spatial Searching. Proceedings of the ACM SIGMOD Conference, pages 47-57.

10. Koudas N., Sevcik K. (2000). High Dimensional Similarity Join. Proceedings of the ACM SIGMOD Conference, pages 324-335.

11. Mamoulis N., Papadias D. (2001). Multi-way Spatial Joins. ACM Transactions on Database Systems (TODS), 26(4), pages 424-475.

12. An N., Yang Z.-Y., Sivasubramaniam A. (2001). Selectivity Estimation for Spatial Joins. Proceedings of the IEEE ICDE Conference, pages 368-375.

13. Faloutsos C., Seeger B., Traina A., Traina C. (2000). Spatial Join Selectivity Using Power Laws. Proceedings of the ACM SIGMOD Conference, pages 177-188.

14. Mamoulis N., Papadias D. (2003). Slot Index Spatial Join. IEEE Transactions on Knowledge and Data Engineering (TKDE), 15(1), pages 211-231.

15. Orenstein J. (1986). Spatial Query Processing in an Object-Oriented Database System. Proceedings of the ACM SIGMOD Conference, pages 326-336.
