• No results found

Spectrum Location Intelligence for Big Data

N/A
N/A
Protected

Academic year: 2022

Share "Spectrum Location Intelligence for Big Data"

Copied!
93
0
0

Loading.... (view fulltext now)

Full text

(1)

Data

User Guide

Version 5.0

(2)

1 - Introduction

What is Spectrum Location Intelligence for Big Data?...4 Spectrum Location Intelligence for Big Data

Architecture...5 System Requirements and Dependencies...6

2 - Spatial

Installing the SDK...8 Hive User-Defined Spatial Functions ...9 Spark...73

3 - Appendix

PGD Builder...86 Download Permissions...88 Operators and Syntax Delimiters...89

(3)

In this section

What is Spectrum Location Intelligence for Big Data?...4 Spectrum Location Intelligence for Big Data Architecture...5 System Requirements and Dependencies...6

(4)

What is Spectrum Location Intelligence for Big Data?

The Precisely Spectrum Location Intelligence for Big Data is a toolkit for processing enterprise data for large scale spatial analysis. Billions of records can be processed in parallel, using MapReduce, Hive, and Apache Spark's cluster processing framework, yielding results faster than ever. Unlike traditional processing techniques that used to take weeks to process the data, now the data processing can be done in a few hours using this product.

(5)

Spectrum Location Intelligence for Big Data Architecture

What is Spectrum Location Intelligence for Big Data?

The Spectrum Location Intelligence for Big Data transforms and packages Location Intelligence components into an SDK for Big Data platforms like Hadoop for Spark, MapReduce, and Hive.

SDK provides:

• Integration APIs for Location Intelligence

• Input datasets and metadata API Types:

• Pre-built Spark and Hive UDF wrappers for Location Intelligence operations

• Core Location Intelligence APIs with sample MapReduce, Hive, and Spark programs (security enabled via Kerberos and Apache Sentry for Hive)

(6)

System Requirements and Dependencies

This product is verified on the following Hadoop distributions:

• Cloudera 5.16, 6.3, 7.1

• Hortonworks 3.0, 3.1

• EMR 5.30, 6.0

To use the product, you must be familiar with configuring Hadoop in Hortonworks, Cloudera, or EMR, and developing applications for distributed processing. For more information, refer toHortonworks, Cloudera, orEMRdocumentation.

The following additional tools must be available to use certain product features:

for Hive:

• Hive version 1.2.1 or above for Spark and Zeppelin Notebook:

• Java JDK version 1.8 or above

• Hadoop version 2.6.0 or above

• Spark version 2.0 or above

• Zeppelin Notebook is not supported in Cloudera

(7)

This section describes the Spark jobs and Hive user defined functions (UDFs) for geometry and coordinate operations and the ability to read TAB files.

Spark jobs use the Location Intelligence SDK (LI SDK) API in map and reduce operations to use the big data processing systems for spatial data analysis. The LI SDK provides geometry and coordinate operations, the ability to read TAB files, and in-memory r-tree creation and searching.

Hive UDFs also use the LI SDK API to provide SQL-like functions for spatial analysis in Hive.

In this section

Installing the SDK...8 Hive User-Defined Spatial Functions ...9 Spark...73

(8)

Installing the SDK

To use spatial functions for Spectrum Location Intelligence for Big Data, the Hadoop cluster must have reference data and libraries accessible from the master node.

For the purposes of this guide, we will:

• use a user called sdkuser

• install everything into /precisely

Perform the following steps from a node in your cluster, such as the master node.

1. Create the install directory and give ownership to sdkuser.

sudo mkdir /precisely

sudo chown sdkuser:sdkuser /precisely

2. Add the Location Intelligence distribution zip to the node at a temporary location, for example:

/precisely/temp/spectrum-bigdata-locationintelligence-version.zip 3. Extract the Location Intelligence distribution.

mkdir /precisely/li

mkdir /precisely/li/software

unzip /precisely/temp/spectrum-bigdata-locationintelligence-version.zip -d /precisely/li/software

4. Create an install directory on hdfs and give ownership to sdkuser.

sudo -u hdfs hadoop fs -mkdir -p hdfs:///precisely/li

sudo -u hdfs hadoop fs -chown -R sdkuser:sdkuser hdfs:///precisely 5. Upload the distribution into HDFS.

hadoop fs -copyFromLocal /precisely/li/software hdfs:///precisely/li

(9)

Hive User-Defined Spatial Functions

Hive user-defined functions (UDFs) create MapReduce jobs in SQL-like syntax so there is no need to write code. Spectrum Location Intelligence for Big Data and Spectrum Geocoding for Big Data provide Hive user defined functions for Geometry operations and to work with grids in the

spectrum-bigdata-li-hive-<version>.jar.

Refer to the table below to quickly navigate to Hive UDFs described in this document:

Name Description

Type

FromGeoJSON Construct an instance of WritableGeometry

from supported geometry representation formats

Constructor Functionson page 16

FromKML

FromWKB

FromWKT

ST_Point

GeoHashBoundary Grid processing functions

Grid Functionson page 56

GeoHashID

HexagonBoundary

HexagonID

SquareHashBoundary

SquareHashID

(10)

Name Description

Type

Area Geometry measurement functions

Measurement Functions on page 32

ClosestPoints

Distance

Length

Perimeter

ST_X Geometry observer functions

Observer Functionson page 49

ST_Y

ST_XMax

ST_XMin

ST_YMax

ST_YMin

ToGeoJSON Serialize an instance of WritableGeometry

to supported geometry representation formats

Persistence Functionson page 22

ToKML

ToWKB

ToWKT

Disjoint Geometry predicate functions

Predicate Functionson page 26

Intersects

Overlaps

Within

IsNullGeometry

(11)

Name Description

Type

Buffer Geometry processing functions

Processing Functionson page 42

ConvexHull

Intersection

Transform

Union

LocalPointInPolygon Spatial search functions

Search Functions on page 67

LocalSearchNearest

(12)

Setup

This topic assumes the product is installed to /precisely/li/software as described inInstalling the SDKon page 8. To set up user-defined spatial functions for Hive, perform the following steps:

1. Proceed according to your platform.

Do this On this platform

Copy the Hive jar for Location Intelligence to the HiveServer node.

/precisely/li/software/hive/lib/spectrum-bigdata-li-hive-version.jar

Cloudera

In Cloudera Manager, navigate to the Hive Configuration page. Search for the Hive Auxiliary JARs Directory setting. If the value is already set then move the Hive jar into the specified folder. If the value is not set then set it to the parent folder of the Hive jar.

/precisely/li/software/hive/lib/

On the HiveServer2 node, create the Hive auxlib folder if one does not already exist.

sudo mkdir /usr/hdp/current/hive-server2/auxlib/

Hortonworks

Copy the Hive jar for Location Intelligence to the auxlib folder on the HiveServer2 node:

sudo cp /precisely/li/software/hive/lib/

spectrum-bigdata-li-hive-version.jar /usr/hdp/current/hive-server2/auxlib/

2. Restart all Hive services.

3. Launch Beeline, or some other Hive client, for the remaining step.

beeline -u jdbc:hive2://localhost:10000/default -n sdkuser

4. Register spatial user-defined functions. Add the temporary keyword after create if you want a temporary function (this step would need to be redone for every new Hive session).

create temporary function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';

create temporary function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';

create temporary function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';

create temporary function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';

create temporary function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

create temporary function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';

(13)

create temporary function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';

create temporary function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';

create temporary function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

create temporary function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';

create temporary function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';

create temporary function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';

create temporary function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';

create temporary function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

create temporary function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

create temporary function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

create temporary function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

create temporary function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

create temporary function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

create temporary function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

create temporary function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

create temporary function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

create temporary function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

create temporary function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

create temporary function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

create temporary function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

create temporary function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

create temporary function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

create temporary function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

create temporary function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

create temporary function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

create temporary function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

create temporary function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

create temporary function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

create temporary function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';

create temporary function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';

create temporary function LocalSearchNearest as

'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';

create temporary function LocalPointInPolygon as

'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';

Note: If you want to view the complete stack trace for any encountered error, enable logging in DEBUG mode and then restart the job execution.

• The first time you run a job may take a while if the reference data has to be downloaded remotely from HDFS or S3. It may also time out when using a large number of datasets that are stored in remote locations such as HDFS or S3. If you are using Hive with the MapReduce engine, you can adjust the value of the mapreduce.task.timeout property.

• Some types of queries will cause Hive to evaluate UDFs in the HiveServer2 process space instead of on a data node. The Routing UDFs in particular use a significant amount of memory and can shut down the Hive server due to memory constraints. To process these queries, we recommend increasing the amount of memory available to the HiveServer2 process (for example, by setting HADOOP_HEAPSIZE in hive-env.sh).

(14)

WritableGeometry

This is an implementation of Hadoop's Writable interface for geometry.

Spatial Hive user defined functions (UDFs) use WritableGeometry to exchange data between two functions. Constructor Hive functions provide a mechanism to get an instance of WritableGeometry from standard geometry formats like WKT, WKB, GeoJSON and KML. For example:

To get an instance of WritableGeometry from WKT:

SELECT FromWKT(t.geometry,'epsg:4267') FROM hivetable t;

To get an instance of WritableGeometry from WKB string:

SELECT FromWKB(t.geometry,'epsg:4267') FROM hivetable t;

Persistence Hive UDFs convert an instance of WritableGeometry to standard formats like WKT, WKB, GeoJSON and KML. For example:

To serialize an instance of WritableGeometry to WKT:

SELECT ToWKT(t.geometry) FROM hivetable t;

The output of Constructor functions can be supplied as input to other Hive functions that perform some operations on it. For example:

To calculate the length of a geometry:

SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

To get the distance between two geometries:

SELECT Distance(FromWKT(t.geometry,'epsg:4267'), FromWKT(t.geometry2,'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

The IsNullGeometry Hive UDF provides the capability to perform a null check on an instance of WritableGeometry. The IsNullGeometry function returns true if the geometry is NULL or empty;

otherwise, it returns false. The result type is a boolean.

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT("POINT(10 20)"));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

For more information, see

https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/io/Writable.html.

(15)

Geometry Functions

Constructor Functions

Persistence Functions

Predicate Functions

Measurement Functions

Processing Functions

Observer Functions

Grid Functions

(16)

Constructor Functions

The following Constructor functions are available:

FromGeoJSON

FromWKT

FromWKB

FromKML

Point FromGeoJSON

Description

The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry.

Function Registration

create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';

Syntax

fromGeoJSON(String jsonGeometry)

Parameters

Description Type

Parameter

The geometry in geoJSON format.

String jsonGeometry

Return Values

Description Return Type

The geometry from geoJSON format.

WritableGeometry

(17)

Examples

SELECT FromGeoJSON('{ "type": "Point", "coordinates": [100.0, 0.0] }');

SELECT FromGeoJSON(t.geometry) FROM hivetable t;

FromWKT

Description

The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry.

Function Registration

create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';

Syntax

fromWKT(String geometry, [SpatialInfo CRS])

Parameters

Description Type

Parameter

The geometry in WKT format.

String geometry

Optional. The coordinate system for the geometry. Default = EPSG:4326 String

CRS

Return Values

Description Return Type

The geometry from WKT format.

WritableGeometry

(18)

Examples

SELECT FromWKT(t.geometry) FROM hivetable t;

SELECT FromWKT(t.geometry,'epsg:4267') FROM hivetable t;

SELECT FromWKT ('POINT (30 20)', 'epsg:4267');

FromWKB

Description

The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry.

Function Registration

create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';

Syntax

fromWKB(String geometry, [SpatialInfo CRS])

Parameters

Description Type

Parameter

The WKB of the geometry in byte array format (byte[ ]) .

String geometry

Optional. The coordinate system for the geometry. Default = EPSG:4326 String

CRS

Return Values

Description Return Type

The geometry from WKB format.

WritableGeometry

(19)

Examples

SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');

FromKML

Description

The FromKML function returns a WritableGeometry instance from the text formatted in KML (Keyhole Markup Language).

Function Registration

create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';

Syntax

fromKML(String geometry)

Parameters

Description Type

Parameter

A KML string, where only the geometry or geometry in placemark will be parsed.

String geometry

Return Values

Description Return Type

The geometry from KML format.

WritableGeometry

Examples

SELECT FromKML(t.geometry) FROM hivetable t;

(20)

ST_Point

Description

The ST_Point function constructs a point geometry from the provided X and Y, and an optional CRS.

Function Registration

create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';

Syntax

ST_Point(String|Number X, String|Number Y, [SpatialInfo CRS])

Parameters

Description Type

Parameter

The X ordinate.

String or Number X

The Y ordinate.

String or Number Y

Optional. The coordinate system for the geometry. Default = EPSG:4326 String

CRS

Return Values

Description Return Type

The geometry of the specified X/Y coordinates. If any of the argument values are invalid, then an empty geometry will be returned in the output.

WritableGeometry

(21)

Examples

SELECT ST_Point(-73.750333 , 42.736103, 'epsg:4326');

SELECT ST_Point('-73.750333' , '42.736103', 'epsg:4326');

SELECT ST_Point(-73.750333 , 42.736103);

SELECT ST_Point('-73.750333' , 42.736103);

SELECT ST_Point(p.x, p.y, p.crs) FROM points p;

(22)

Persistence Functions

The following Persistence functions are available:

ToGeoJSON

ToWKT

ToWKB

ToKML ToGeoJSON

Description

The ToGeoJSON function returns a text formatted in GeoJSON representation of geometry from the specified WritableGeometry instance.

Function Registration

create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';

Syntax

ToGeoJSON(WritableGeometry geometry)

Parameters

Description Type

Parameter

The instance of a WritableGeometry.

WritableGeometry geometry

Return Values

Description Return Type

The GeoJSON representation of a geometry.

GeoJSON String

(23)

Examples

SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 5.0, 'km', 12, 'SPHERICAL' )) FROM hivetable t;

ToWKT

Description

The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance.

Function Registration

create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';

Syntax

ToWKT(WritableGeometry geometry)

Parameters

Description Type

Parameter

The instance of a WritableGeometry.

WritableGeometry geometry

Return Values

Description Return Type

The WKT representation of a geometry.

String

Examples

SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 5.0, 'km', 12, 'SPHERICAL' )) FROM hivetable t;

(24)

ToWKB

Description

The ToWKB function returns a byte array in a Well-Known Binary (WKB) representation of a geometry as parsed from the specified WritableGeometry instance.

Function Registration

create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';

Syntax

ToWKB(WritableGeometry geometry)

Parameters

Description Type

Parameter

The instance of a WritableGeometry.

WritableGeometry geometry

Return Values

Description Return Type

The WKB representation of a geometry expressed as a byte array.

Byte[ ]

Examples

SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 5.0, 'km', 12, 'SPHERICAL' )) FROM hivetable t;

(25)

ToKML

Description

The ToKML function returns a text formatted in KML in the OGC standard, KML2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd) as parsed from the specified

WritableGeometry instance.

Function Registration

create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';

Syntax

ToKML(WritableGeometry geometry)

Parameters

Description Type

Parameter

The instance of a WritableGeometry.

WritableGeometry geometry

Return Values

Description Return Type

The KML representation of a geometry expressed as a hexadecimal encoded string.

String

Examples

SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 5.0, 'km', 12, 'SPHERICAL' )) FROM hivetable t;

(26)

Predicate Functions

The following Predicate functions are available:

Disjoint

Intersects

Overlaps

Within

IsNullGeometry Disjoint

Description

The Disjoint function tests if two geometry objects have no points in common.

Function Registration

create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';

Syntax

Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Description Type

Parameter

The first instance of a WritableGeometry.

WritableGeometry geometry1

The second instance of a WritableGeometry.

WritableGeometry geometry2

Return Values

Description Return Type

True, if the two geometry objects have no points in common; otherwise, False.

If either geometry1 or geometry2 are null, Null is returned.

Boolean

(27)

Examples

SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Intersects

Description

The Intersects function determines whether or not one geometry object intersects another geometry object.

Function Registration

create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';

Syntax

Intersects(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Description Type

Parameter

The first instance of a WritableGeometry.

WritableGeometry geometry1

The second instance of a WritableGeometry.

WritableGeometry geometry2

Return Values

Description Return Type

True, if there is any direct position in common between the two geometries; otherwise, False.

If either geometry1 or geometry2 are null, Null is returned.

Boolean

(28)

Examples

SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name='Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));

Overlaps

Description

The Overlaps function determines whether or not one geometry object overlaps another geometry object.

Function Registration

create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';

Syntax

Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Description Type

Parameter

The first instance of a WritableGeometry.

WritableGeometry geometry1

The second instance of a WritableGeometry.

WritableGeometry geometry2

Return Values

Description Return Type

True, if the geometry1 overlaps the geometry2; otherwise, False.

If either geometry1 or geometry2 are null, Null is returned.

Boolean

(29)

Examples

SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;

Within

Description

The Within function returns whether or not one geometry object is entirely within another geometry object.

Function Registration

create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';

Syntax

Within(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Description Type

Parameter

The first instance of a WritableGeometry.

WritableGeometry geometry1

The second instance of a WritableGeometry.

WritableGeometry geometry2

Return Values

Description Return Type

True, if the geometry2 entirely contains geometry1; otherwise, False.

If either geometry1 or geometry2 are null, Null is returned.

Boolean

(30)

Examples

SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;

SELECT L.zipcode as zipcode, SUM(L.insurance) as TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore

FROM book_of_business L, FIRE_RISK_BOUNDRIES R

WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;

IsNullGeometry

Description

The IsNullGeometry function performs a null check of the input geometry.

Function Registration

create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';

Syntax

IsNullGeometry(WritableGeometry inputGeometry)

Parameters

Description Type

Parameter

The input geometry to be checked for a null or empty value.

WritableGeometry inputGeometry

Return Values

Description Return Type

True, if the geometry is null or empty; otherwise, False.

Boolean

(31)

Examples

SELECT IsNullGeometry(null);

SELECT IsNullGeometry(FromWKT("POINT(10 20)"));

SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;

(32)

Measurement Functions

The following Measurement functions are available:

Area

ClosestPoints

Distance

Length

Perimeter Area

Description

The Area function calculates and returns the area of given Geometry in the desired unit. The unit must be specified as a parameter while calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.

Function Registration

create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';

Syntax

Area(WritableGeometry geometry, String areaUnits, [String computationType])

Parameters

Description Type

Parameter

The input geometry.

WritableGeometry geometry

The desired return unit type. For valid values, seeArea Unitson page 33.

String areaUnits

(33)

Description Type

Parameter

Optional. Indicates the logic to be used to interpret geometry coordinates. The

computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

The geometry coordinates are interpreted using cartesian logic.

CARTESIAN

The geometry coordinates are interpreted using spherical logic.

SPHERICAL String

computationType

Area Units

Valid values for unit are the following area units:

Description Value

square miles sq mi

square kilometers sq km

square inches sq in

square foot sq ft

square yards sq yd

square millimeters sq mm

square centimeters sq cm

square meters sq m

(34)

Description Value

square US Survey feet sq survey ft

square nautical miles sq nmi

acres acre

hectares ha

Return Values

Description Return Type

The area of the geometry.

Double

Examples

SELECT Area(FromWkt(t.geometry,'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;

SELECT Area(FromWkt(t.geometry,'epsg:4267'), 'sq mi') FROM hivetable t;

ClosestPoints

Description

The ClosestPoints function returns the closest points between two geometries.

Function Registration

create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';

Syntax

ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)

(35)

Parameters

Description Type

Parameter

The first instance of a WritableGeometry.

WritableGeometry geometry1

The second instance of a WritableGeometry.

WritableGeometry geometry2

Return Values

Description Return Type

The closest points between the two geometries. The geometries that intersect are at distance zero from each other, and in this case a shared point is returned.

Array<WritableGeometry>

Examples

SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;

Distance

Description

The Distance function calculates and returns the distance between two geometries.

Function Registration

create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';

Syntax

Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])

(36)

Parameters

Description Type

Parameter

The first instance of a WritableGeometry.

WritableGeometry geometry1

The second instance of a WritableGeometry.

WritableGeometry geometry2

The desired return unit type. For valid values, seeLinear Unitson page 36.

String linearUnits

Optional. Indicates the logic to be used to interpret geometry coordinates. The

computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

The geometry coordinates are interpreted using cartesian logic.

CARTESIAN

The geometry coordinates are interpreted using spherical logic.

SPHERICAL String

computationType

Linear Units

The following table lists the valid values for unit type.

Description Value

miles mi

kilometers km

inches in

(37)

Description Value

feet ft

yards yd

millimeters mm

centimeters cm

meters m

US Survey feet survey ft

nautical miles nmi

Return Values

Description Return Type

The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.

Double

Examples

SELECT Distance(FromWkt(t.geometry,'epsg:4267'), FromWkt(t.geometry2,'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Distance(FromWkt(t.geometry,'epsg:4267'), FromWkt(t.geometry2,'epsg:4267'), 'm') FROM hivetable t;

Length

Description

The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.

(38)

Function Registration

create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';

Syntax

Length(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Description Type

Parameter

The input geometry.

WritableGeometry geometry

The desired return unit type. For valid values, seeLinear Unitson page 38.

String linearUnits

Optional. Indicates the logic to be used to interpret geometry coordinates. The

computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

The geometry coordinates are interpreted using cartesian logic.

CARTESIAN

The geometry coordinates are interpreted using spherical logic.

SPHERICAL String

computationType

Linear Units

Valid values for unit type:

(39)

Description Value

miles mi

kilometers km

inches in

feet ft

yards yd

millimeters mm

centimeters cm

meters m

US Survey feet survey ft

nautical miles nmi

Return Values

Description Return Type

The length of the geometry.

Double

Examples

SELECT Length(FromWkt(t.geometry,'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

SELECT Length(FromWkt(t.geometry,'epsg:4267'), 'm') FROM hivetable t;

(40)

Perimeter

Description

The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The Perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes).

The curves are considered as thin polygons.

Function Registration

create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';

Syntax

Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])

Parameters

Description Type

Parameter

The input geometry.

WritableGeometry geometry

The desired return unit type. For valid values, seeLinear Unitson page 41.

String linearUnits

Optional. Indicates the logic to be used to interpret geometry coordinates. The

computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

The geometry coordinates are interpreted using cartesian logic.

CARTESIAN

The geometry coordinates are interpreted using spherical logic.

SPHERICAL String

computationType

(41)

Linear Units

Valid values for unit type:

Description Value

miles mi

kilometers km

inches in

feet ft

yards yd

millimeters mm

centimeters cm

meters m

US Survey feet survey ft

nautical miles nmi

Return Values

Description Return Type

The perimeter of the geometry.

Double

Examples

SELECT Perimeter(FromWkt(t.geometry,'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;

(42)

Processing Functions

The following Processing functions are available:

Buffer

ConvexHull

Intersection

Transform

Union Buffer

Description

The Buffer function returns an instance of WritableGeometry having a MultiPolygon geometry inside it which represents a buffered distance around another geometry object.

Function Registration

create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';

Syntax

Buffer(WritableGeometry geometry, Number offset, String linearUnit, Number resolution, [String computationType])

Parameters

Description Type

Parameter

The geometry to buffer.

WritableGeometry geometry

The distance from the input geometry.

Number offset

The desired return unit type. For valid values, seeLinear Unitson page 43.

String linearUnits

(43)

Description Type

Parameter

Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.

Number resolution

Optional. Indicates the logic to be used to interpret geometry coordinates. The

computation type is based on the coordinate system of the geometry being operated on:

• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)

• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)

• For engineering coordinate systems: Valid type = CARTESIAN (default)

The geometry coordinates are interpreted using cartesian logic.

CARTESIAN

The geometry coordinates are interpreted using spherical logic.

SPHERICAL String

computationType

Linear Units

Valid values for unit type:

Description Value

miles mi

kilometers km

inches in

feet ft

(44)

Description Value

millimeters mm

centimeters cm

meters m

US Survey feet survey ft

nautical miles nmi

Return Values

Description Return Type

A geometry consisting of all the points within the offset distance of the input geometry.

WritableGeometry

Examples

SELECT Buffer(FromWKT(t.geometry,'epsg:4267'), 5.0, 'km', 12, 'SPHERICAL' ) FROM hivetable t;

SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 5.0, 'km', 12, 'SPHERICAL' );

ConvexHull

Description

The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.

Function Registration

create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';

(45)

Syntax

ConvexHull(WritableGeometry geometry)

Parameters

Description Type

Parameter

The input geometry.

WritableGeometry geometry

Return Values

Description Return Type

The convex hull of the geometry.

WritableGeometry

Examples

SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(ConvexHull(FromWKT(table.geometry,'epsg:4267'))) AS result FROM hivetable;

SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))'),'epsg:4267');

Intersection

Description

The Intersection function is a geometry (point, line, or curve) common in two geometry objects (such as lines, curves, planes, and surfaces). It returns the geometry consisting of direct positions that lie in both specified geometries.

Function Registration

create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';

(46)

Syntax

Intersection(WritableGeometry geometry1, WritableGeometry geometry2)

Parameters

Description Type

Parameter

The first instance of a WritableGeometry.

WritableGeometry geometry1

The second instance of a WritableGeometry.

WritableGeometry geometry2

Return Values

Description Return Type

The geometry formed from the direct positions that are common to both input geometries.

WritableGeometry

Examples

SELECT Intersection(FromWKT(t1.geometry,'epsg:4267'), FromWKT(t2.geometry,'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;

Transform

Description

The Transform function transforms a given geometry from one coordinate system to another.

Function Registration

create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';

Syntax

Transform(WritableGeometry geometry, String CRS)

(47)

Parameters

Description Type

Parameter

The source input geometry.

WritableGeometry geometry

The destination coordinate system for the geometry.

String CRS

Return Values

Description Return Type

The geometry transformed to the destination coordinate system.

WritableGeometry

Examples

SELECT Transform(FromWKT(t.geometry,'epsg:4326'), 'epsg:3857') FROM hivetable t;

SELECT Transform(ST_POINT(30, 20),'epsg:3857');

Union

Description

The Union function returns a geometry object which represents the union of two input geometry objects.

Function Registration

create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';

Syntax

Union(WritableGeometry geometry1, WritableGeometry geometry2)

(48)

Parameters

Description Type

Parameter

The first instance of a WritableGeometry.

WritableGeometry geometry1

The second instance of a WritableGeometry.

WritableGeometry geometry2

Return Values

Description Return Type

The geometry that represents the union of the input geometries.

WritableGeometry

Examples

SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;

SELECT ToWKT(Union(FromWKT(t1.geometry,'epsg:4267'),FromWKT(t2.geometry,'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;

(49)

Observer Functions

Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the TRANSFORM UDF accepts and returns a geometry which means an XY table cannot be transformed FROM one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table FROM one coordinate system to another.

Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR for a writeable geometry.

The following Observer index functions are available:

ST_X

ST_XMax

ST_XMin

ST_Y

ST_YMax

ST_YMin ST_X

Description

The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';

Syntax

ST_X(WritableGeometry geometry)

Parameters

Description Type

Parameter

The input geometry.

WritableGeometry geometry

(50)

Return Values

Description Return Type

The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Double

Examples

SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_XMax

Description

The ST_XMax function returns the X maxima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';

Syntax

ST_XMax(WritableGeometry geometry)

Parameters

Description Type

Parameter

The input geometry.

WritableGeometry geometry

(51)

Return Values

Description Return Type

The X maxima of the input geometry, or Null if the specified value is not a geometry.

Double

Examples

SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_XMin

Description

The ST_XMin function returns the X minima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';

Syntax

ST_XMin(WritableGeometry geometry)

Parameters

Description Type

Parameter

The input geometry.

WritableGeometry geometry

(52)

Return Values

Description Return Type

The X minima of the input geometry, or Null if the specified value is not a geometry.

Double

Examples

SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;

ST_Y

Description

The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Function Registration

create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';

Syntax

ST_Y(WritableGeometry geometry)

Parameters

Description Type

Parameter

The input geometry.

WritableGeometry geometry

(53)

Return Values

Description Return Type

The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.

Double

Examples

SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;

ST_YMax

Description

The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';

Syntax

ST_YMax(WritableGeometry geometry)

Parameters

Description Type

Parameter

The input geometry.

WritableGeometry geometry

(54)

Return Values

Description Return Type

The Y maxima of the input geometry, or Null if the specified value is not a geometry.

Double

Examples

SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;

ST_YMin

Description

The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.

Function Registration

create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';

Syntax

ST_YMin(WritableGeometry geometry)

Parameters

Description Type

Parameter

The input geometry.

WritableGeometry geometry

(55)

Return Values

Description Return Type

The Y minima of the input geometry, or Null if the specified value is not a geometry.

Double

Examples

SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;

(56)

Grid Functions

A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between.

This makes grids very useful for spatial indexing and aggregating.

SpectrumLocation Intelligence for Big Data provides Hive user defined functions (UDF) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.

We provide three types of UDFs for processing three grid cell shapes:

• rectangular (geohash)

• hexagon (hexagon hash)

• square (square hash)

Hashes are useful for analysis and interoperability with other systems.

Square hash is similar to GeoHash but has the advantage that when displayed in Popular Mercator, the cells appear as squares.

Hexagons are often used in telecommunication solutions as they approximate circles while covering the surface of the earth without gaps.

The following Grid index functions are available:

GeoHashBoundary:

GeoHashID

HexagonBoundary

HexagonID

SquareHashBoundary

SquareHashID GeoHashBoundary

Description

The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It also can return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.

Function Registration

create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';

Syntax

GeoHashBoundary(String UNIQUE_ID)

(57)

Parameters

Description Type

Parameter

The unique geohash identifier of a cell in a grid.

String UNIQUE_ID

Return Values

Description Return Type

A representation of the boundary of a cell in a grid.

WritableGeometry

Examples

SELECT GeoHashBoundary(hashStringId) FROM hivetable;

SELECT GeoHashBoundary("ebvnk");

Syntax

GeoHashBoundary(Number|String X, Number|String Y, Number precision)

Parameters

Description Type

Parameter

The longitude value of the point.

Number or String X

The latitude value of the point.

Number or String Y

The length of the string key to be returned.

The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells).

Number precision

(58)

Return Values

Description Return Type

The boundary of the grid cell at the given precision that the point falls into.

WritableGeometry

Examples

SELECT GeoHashBoundary(x, y, precision) FROM hivetable;

SELECT GeoHashBoundary("-73.750333", "42.736103", 3);

SELECT GeoHashBoundary(-73.750333, 42.736103, 3);

GeoHashID

Description

The GeoHashID function returns a unique well-known string ID for the grid cell. The ID then is sortable and searchable that corresponds to the specified X, Y, and precision.

Function Registration

create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';

Syntax

GeoHashID(Number|String X, Number|String Y, Number precision)

Parameters

Description Type

Parameter

The longitude value of the point.

Number or String X

The latitude value of the point.

Number or String Y

(59)

Description Type

Parameter

The length of the string key to be returned.

The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells).

Number precision

Return Values

Description Return Type

The geohash ID of the grid cell at the specified precision that contains the point.

String

Examples

SELECT GeoHashID(x, y, precision) FROM hivetable;

SELECT GeoHashID("-73.750333", "42.736103", 3);

SELECT GeoHashID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS

SELECT *, (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT *, (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(GeoHashBoundary (c.hashID)), count (*) as quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c

GROUP BY c.hashID;

HexagonBoundary

Description

The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It also can return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.

(60)

Function Registration

create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';

Syntax

HexagonBoundary(String UNIQUE_ID)

Parameters

Description Type

Parameter

The unique geohash identifier of a cell in a grid.

String UNIQUE_ID

Return Values

Description Return Type

A representation of the boundary of a cell in a grid.

WritableGeometry

Examples

SELECT HexagonBoundary(hashStringId) FROM hivetable;

SELECT HexagonBoundary("PF625028642");

Syntax

HexagonBoundary(Number|String X, Number|String Y, Number precision)

(61)

Parameters

Description Type

Parameter

The longitude value of the point.

Number or String X

The latitude value of the point.

Number or String Y

The length of the string key to be returned.

The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells).

Number precision

Return Values

Description Return Type

The boundary of the grid cell at the given precision that the point falls into.

WritableGeometry

Examples

SELECT HexagonBoundary(x, y, precision) FROM hivetable;

SELECT HexagonBoundary(-73.750333, 42.736103, 3);

HexagonID

Description

The HexagonID function returns a unique well-known string ID for the grid cell. The ID then is sortable and searchable that corresponds to the specified X, Y, and precision.

Function Registration

create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';

(62)

Syntax

HexagonID(Number|String X, Number|String Y, Number precision)

Parameters

Description Type

Parameter

The longitude value of the point.

Number or String X

The latitude value of the point.

Number or String Y

The length of the string key to be returned.

The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells).

Number precision

Return Values

Description Return Type

The ID of the grid cell at the specified precision that contains the point.

String

Examples

SELECT HexagonID(x, y, precision) FROM hivetable;

SELECT HexagonID(-73.750333, 42.736103, 3);

CREATE TEMPORARY TABLE tmptbl AS

SELECT *, (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

INSERT INTO TABLE coordinates_with_hash SELECT *, (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;

SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)),

count (*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates)

c GROUP BY c.hashID;

References

Related documents

แผนพัฒนาบุคลากร ส านักวิทยบริการและเทคโนโลยีสารสนเทศ ประจ าปีงบประมาณ 2554 ประกาศคณะกรรมการส านักวิทยบริการและเทคโนโลยีสารสนเทศ เรื่อง หลักเกณฑ์และ

Some scholars cited in earlier chapters, as well as some participants in this study, have chronicled how e-governance could be used to promote sustainable development in Ghana,

Further, by showing that v τ is a modular unit over Z we give a new proof of the fact that the singular values of v τ are units at all imaginary quadratic arguments and obtain

A consequence of this perspective of woman directors being appointed as part of the symbolic management of the independence of the board is that, when these directors lose

The Wisconsin Law Enforcement Directory is maintained and published monthly by The Wisconsin Department of Justice, Training and Standards Bureau. The directory and a CSV

It examines the gender dimensions of rural land reform process and impacts by exploring the accessibility and benefits of land- use certificates for female household heads

everyone else who were not members of the First or Second Estates. There was an enormous logistical problem with the way in which the estates were divided for example, the First

Nussbaum has served as president of the Disease Management Association of America, chairman of the National Committee for Quality Health Care, chair of America's Health