PART 2 A SURVEY ON PARALLEL SPATIAL OVERLAY AND
2.4 Key Applications
In this section, we will describe parallelization of spatial overlay and join computation using Windows Azure and Hadoop MapReduce platform. We will also describe the role of high performance computing to accelerate spatial overlay and join.
2.4.1 Polygon Overlay Processing using Windows Azure
The basic idea is to divide the overlay computation tasks among web role and worker roles using Azure queues and Blobs. An Azure Cloud based parallel system for polygon overlay known as Crayons is described in [14, 15].
1. The web role presents the interface with a list of Geographic Markup Language (GML) files available to be processed along with the supported operations (currently union, intersection, xor, and difference). The user selects the GML files to be processed along with the spatial operation to be performed on these files. The web role puts this information as a message on the input queue.
2. Worker roles continuously check the input queue for new tasks. If there is a task (message) in the queue, the worker roles read the message, download the input files, parse them, and create the intersection graph to find the independent tasks. To achieve this, Crayons finds each overlay polygon that can potentially intersect with the given base polygon and only performs spatial operation on these tasks. This is achieved using the coordinates of bounding boxes generated during parsing of input files. Then each worker role shares the tasks it creates among all the worker roles by storing the task IDs in a common task pool.
3. After the workers finish task creation, they fetch work from the task pool and process them by invoking polygon clipping library. After each task is processed, the corre- sponding worker role permanently deletes the message related to this task from the task pool queue. Additionally, each worker role puts a message on the termination indicator queue to indicate successful processing of the task.
4. The web role keeps checking the number of messages in the termination indicator queue to update the user interface with the current progress of the operation. On completion, the web role commits the resultant blob and flushes it as a persistent blob in the Blob storage. The output blob’s uniform resource identifier (URI) is presented to the user for downloading or further processing.
R"tree&Index& Overlay&& R"tree&Index& Overlay&& R"tree&Index& Overlay&& Polygons(from(Input( layers( ( Emit((cell(id,(polygon)( ( ( Grid:based(Shuffle(&( Exchange( ( ( ( R:tree(query(based( Overlay(Processing( & Emit(output(polygon(
Figure (2.1)Polygon Overlay in Hadoop Topology Suite (HTS) using a single Map and Reduce
phase.
2.4.2 Spatial Overlay and Join using Hadoop
Parallelizing polygonal overlay and join with MapReduce has its own set of challenges. MapReduce focuses mainly on processing homogeneous data sets, while polygon overlay and join are binary-input problems that has to deal with two data sets. Moreover, partitioning vector data evenly to yield good load balance is non-trivial.
A suite of overlay algorithms, known as Hadoop Topology Suite (HTS), is presented
in [16, 17] by employing MapReduce in three different forms, namely, i) with a single map and reduce phase, ii) with chaining of MapReduce jobs, and iii) with a single map phase
only. HTS is based on R-tree data structure which works well with non-uniform data. HTS
carries out end-to-end overlay computation starting from two input GML files, including their parsing, employing the bounding boxes of potentially overlapping polygons to determine
the basic overlay tasks, partitioning the tasks among processes, and melding the resulting polygons to produce the output GML file.
The basic idea is to perform filtering step in map phase and perform polygon clipping in the reduce phase. This works because the geometric computations are performed on large sets of independent spatial data objects and thus, naturally lend themselves to parallel processing using map and reduce paradigm. Initially, the dimension of the grid is determined which is a minimum bounding box spatially containing all the polygons from base layer and overlay layer. Then, the dimension of grid cells is computed based on the number of partitions. The number of cells is greater than the reduce capacity of the cluster in order to ensure proper load-balancing. A polygon may belong to one or more cells and since the bounding boxes of all the grid cells are known, each mapper task can independently determine to which cell(s) a polygon belongs to.
Figure 2 shows the HTS architecture using one map and one reduce phase. First, each
mapper process reads the polygon and parses the MBR and polygon vertices. Then, the polygons are emitted as (key, value) pairs, where the cell ID is the key and polygon vertex list is the value. After all the mappers are done, each reducer gets a subset of base layer and overlay layer polygons corresponding to a grid cell. In each reducer, a local R-tree is built from the overlay layer polygons and for every base layer polygon, the intersecting overlay layer polygons are found out by querying the R-tree. Finally, the overlay operation is carried out using polygon clipping library and output polygons are written to HDFS.
Hadoop-GIS [18] is a scalable and high performance spatial data warehousing system
on Hadoop and it supports large scale spatial join queries. According to the authors, its performance is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. It is available as a set of library for processing spatial queries, and as an integrated software package. The evaluation of the system is done using pathology imaging data and
OpenStreetMap data. SpatialHadoop [19, 20] is an open source MapReduce framework for
large spatial data that supports spatial joins. In addition to spatial join, SpatialHadoop is
cient spatial operations. BothHadoop-GISandSpatialHadoopused spatial indexing whereas Zhang et al. [21] used sweepline algorithm for spatial join.
Authors in [9] have taken a parallel database approach for spatial join. The unique fea-
tures of their system known asNiharikaare spatial declustering and dynamic load balancing
on top of a cluster of worker nodes, each of which runs a standard PostgreSQL/PostGIS rela- tional database. In [22] Ray et al. discussed a skew-resistant parallel in-memory spatial join using Quadtree declustering, which partitions the spatial dataset such that the amount of computation demanded by each partition is equalized and the processing skew is minimized.
SpatialHadoopand Niharika have been shown to run on Amazon EC2 cluster [9, 23].
2.4.3 Spatial Overlay and Join using GPUs
GPUs have been successfully utilized in many scientific applications that require high performance computing. Architecturally, a CPU is composed of only a few cores that can handle a limited number of software threads at a time. In contrast, a GPU is composed of hundreds to thousands of cores that can handle thousands of threads simultaneously. In addition to providing a very high level of thread parallelism, the GPUs coupled with high throughput memory architectures achieve significant computational throughput. As a result, leveraging the parallel processing power of GPUs for speeding up overlay and join computation has been explored recently. However, the architecture of GPUs is fundamentally different than CPUs; thus, traditional computational geometry algorithms cannot simply be run on a GPU. GPUs exhibit Single Instruction Multiple Data (SIMD) architecture and many algorithms, including plane-sweep algorithms, do not naturally map to SIMD parallelism. As a result, new algorithms and implementations need to be developed to get performance gains from GPU hardware.
GPU-accelerated parallel rectangle intersection algorithms are discussed in [24, 25]. In [24], all intersecting pairs of iso-oriented rectangles are reported using GPU. The result shows over 30x speedup using Geforce GTX 480 GPU against well implemented sequential algorithms on Intel i7 CPU. In [26], Wang et al. used hybrid CPU-GPU approach to find
intersection and union of polygons in pixel format. McKenney et al. [27] developed GPU implementation of line segment intersection and the arrangement problem for overlay com- putation. For some datasets, authors show that their implementation of geospatial overlay on Nvidia Tesla GPU runs faster than plane-sweep implementation. In [28], Franklin’s uni- form grid based overlay algorithm [29] is implemented using OpenMP in CPUs and CUDA on GPUs. The robustness issue is handled by snap rounding technique and improved per- formance is achieved by computing the winding number of overlay faces on the uniform grid efficiently using CPUs as well as GPUs. At first, edges of spatial features are mapped to grid cells. According to the paper, 10 to 20 edges per cell on average minimizes execution time. The Single Instruction Multiple Thread (SIMT) architecture of a GPU is used to detect intersecting edges by all-to-all comparison of edges. Simple point-in-polygon tests are executed in parallel to find the output vertices. Experimental results using real-world datasets showed their multi-threaded implementation exhibiting strong scaling with an effi- ciency of 0.9 on average. GPU implementation took 4 times less time when compared with CPU version. A GPU implementation of Greiner-Hormann polygon clipping algorithm [30] is discussed in [8].
Authors in [31] describe how a spatial join operation with R-Tree can be implemented on a GPU. The algorithm begins by parallel filtering of objects on the GPU. The steps of the algorithm are as follows:
1. The data arrays required for the R-Tree are mapped to the GPU memory.
2. A function to find an overlap between two MBR objects is executed by threads on the GPU in parallel.
3. The set of MBRs from Step 2 are checked whether they are in the leaf nodes or not. If they are in the leaf nodes, return the set as the result and send them to the CPU. If they are not the leaf nodes, then they are used as input again recursively until reaching leaf nodes.
2.4.4 Accelerated Hadoop for Spatial Join
While MapReduce can effectively address data-intensive aspects of geospatial problems, it is not well suited to handle compute-intensive aspect of spatial overlays and joins. A high performance approach can leverage parallelism at multiple levels for these compute- intensive operations. Most spatial algorithms are designed for executing on CPU, and the branch intensive nature of CPU based algorithms require them to be redesigned for running efficiently on GPUs. Haggis [35] first transforms the vector based geometry representation into raster representation using a pixelization method. The pixelization method reduces the geometry calculation problem into simple pixel position checking problem, which is well suited for executing on GPUs. Since testing the position of one pixel is independent of another, the computation is be parallelized by having multiple threads processing the pixels in parallel.