Data
User Guide
Version 5.0
1 - Introduction
What is Spectrum Location Intelligence for Big Data?...4 Spectrum Location Intelligence for Big Data
Architecture...5 System Requirements and Dependencies...6
2 - Spatial
Installing the SDK...8 Hive User-Defined Spatial Functions ...9 Spark...73
3 - Appendix
PGD Builder...86 Download Permissions...88 Operators and Syntax Delimiters...89
In this section
What is Spectrum Location Intelligence for Big Data?...4 Spectrum Location Intelligence for Big Data Architecture...5 System Requirements and Dependencies...6
What is Spectrum Location Intelligence for Big Data?
The Precisely Spectrum Location Intelligence for Big Data is a toolkit for processing enterprise data for large scale spatial analysis. Billions of records can be processed in parallel, using MapReduce, Hive, and Apache Spark's cluster processing framework, yielding results faster than ever. Unlike traditional processing techniques that used to take weeks to process the data, now the data processing can be done in a few hours using this product.
Spectrum Location Intelligence for Big Data Architecture
What is Spectrum Location Intelligence for Big Data?
The Spectrum Location Intelligence for Big Data transforms and packages Location Intelligence components into an SDK for Big Data platforms like Hadoop for Spark, MapReduce, and Hive.
SDK provides:
• Integration APIs for Location Intelligence
• Input datasets and metadata API Types:
• Pre-built Spark and Hive UDF wrappers for Location Intelligence operations
• Core Location Intelligence APIs with sample MapReduce, Hive, and Spark programs (security enabled via Kerberos and Apache Sentry for Hive)
System Requirements and Dependencies
This product is verified on the following Hadoop distributions:
• Cloudera 5.16, 6.3, 7.1
• Hortonworks 3.0, 3.1
• EMR 5.30, 6.0
To use the product, you must be familiar with configuring Hadoop in Hortonworks, Cloudera, or EMR, and developing applications for distributed processing. For more information, refer toHortonworks, Cloudera, orEMRdocumentation.
The following additional tools must be available to use certain product features:
for Hive:
• Hive version 1.2.1 or above for Spark and Zeppelin Notebook:
• Java JDK version 1.8 or above
• Hadoop version 2.6.0 or above
• Spark version 2.0 or above
• Zeppelin Notebook is not supported in Cloudera
This section describes the Spark jobs and Hive user defined functions (UDFs) for geometry and coordinate operations and the ability to read TAB files.
Spark jobs use the Location Intelligence SDK (LI SDK) API in map and reduce operations to use the big data processing systems for spatial data analysis. The LI SDK provides geometry and coordinate operations, the ability to read TAB files, and in-memory r-tree creation and searching.
Hive UDFs also use the LI SDK API to provide SQL-like functions for spatial analysis in Hive.
In this section
Installing the SDK...8 Hive User-Defined Spatial Functions ...9 Spark...73
Installing the SDK
To use spatial functions for Spectrum Location Intelligence for Big Data, the Hadoop cluster must have reference data and libraries accessible from the master node.
For the purposes of this guide, we will:
• use a user called sdkuser
• install everything into /precisely
Perform the following steps from a node in your cluster, such as the master node.
1. Create the install directory and give ownership to sdkuser.
sudo mkdir /precisely
sudo chown sdkuser:sdkuser /precisely
2. Add the Location Intelligence distribution zip to the node at a temporary location, for example:
/precisely/temp/spectrum-bigdata-locationintelligence-version.zip 3. Extract the Location Intelligence distribution.
mkdir /precisely/li
mkdir /precisely/li/software
unzip /precisely/temp/spectrum-bigdata-locationintelligence-version.zip -d /precisely/li/software
4. Create an install directory on hdfs and give ownership to sdkuser.
sudo -u hdfs hadoop fs -mkdir -p hdfs:///precisely/li
sudo -u hdfs hadoop fs -chown -R sdkuser:sdkuser hdfs:///precisely 5. Upload the distribution into HDFS.
hadoop fs -copyFromLocal /precisely/li/software hdfs:///precisely/li
Hive User-Defined Spatial Functions
Hive user-defined functions (UDFs) create MapReduce jobs in SQL-like syntax so there is no need to write code. Spectrum Location Intelligence for Big Data and Spectrum Geocoding for Big Data provide Hive user defined functions for Geometry operations and to work with grids in the
spectrum-bigdata-li-hive-<version>.jar.
Refer to the table below to quickly navigate to Hive UDFs described in this document:
Name Description
Type
FromGeoJSON Construct an instance of WritableGeometry
from supported geometry representation formats
Constructor Functionson page 16
FromKML
FromWKB
FromWKT
ST_Point
GeoHashBoundary Grid processing functions
Grid Functionson page 56
GeoHashID
HexagonBoundary
HexagonID
SquareHashBoundary
SquareHashID
Name Description
Type
Area Geometry measurement functions
Measurement Functions on page 32
ClosestPoints
Distance
Length
Perimeter
ST_X Geometry observer functions
Observer Functionson page 49
ST_Y
ST_XMax
ST_XMin
ST_YMax
ST_YMin
ToGeoJSON Serialize an instance of WritableGeometry
to supported geometry representation formats
Persistence Functionson page 22
ToKML
ToWKB
ToWKT
Disjoint Geometry predicate functions
Predicate Functionson page 26
Intersects
Overlaps
Within
IsNullGeometry
Name Description
Type
Buffer Geometry processing functions
Processing Functionson page 42
ConvexHull
Intersection
Transform
Union
LocalPointInPolygon Spatial search functions
Search Functions on page 67
LocalSearchNearest
Setup
This topic assumes the product is installed to /precisely/li/software as described inInstalling the SDKon page 8. To set up user-defined spatial functions for Hive, perform the following steps:
1. Proceed according to your platform.
Do this On this platform
Copy the Hive jar for Location Intelligence to the HiveServer node.
/precisely/li/software/hive/lib/spectrum-bigdata-li-hive-version.jar
Cloudera
In Cloudera Manager, navigate to the Hive Configuration page. Search for the Hive Auxiliary JARs Directory setting. If the value is already set then move the Hive jar into the specified folder. If the value is not set then set it to the parent folder of the Hive jar.
/precisely/li/software/hive/lib/
On the HiveServer2 node, create the Hive auxlib folder if one does not already exist.
sudo mkdir /usr/hdp/current/hive-server2/auxlib/
Hortonworks
Copy the Hive jar for Location Intelligence to the auxlib folder on the HiveServer2 node:
sudo cp /precisely/li/software/hive/lib/
spectrum-bigdata-li-hive-version.jar /usr/hdp/current/hive-server2/auxlib/
2. Restart all Hive services.
3. Launch Beeline, or some other Hive client, for the remaining step.
beeline -u jdbc:hive2://localhost:10000/default -n sdkuser
4. Register spatial user-defined functions. Add the temporary keyword after create if you want a temporary function (this step would need to be redone for every new Hive session).
create temporary function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
create temporary function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
create temporary function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
create temporary function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
create temporary function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
create temporary function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
create temporary function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
create temporary function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
create temporary function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
create temporary function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
create temporary function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
create temporary function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
create temporary function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
create temporary function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
create temporary function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
create temporary function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
create temporary function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
create temporary function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
create temporary function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
create temporary function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
create temporary function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
create temporary function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
create temporary function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
create temporary function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
create temporary function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
create temporary function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
create temporary function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
create temporary function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
create temporary function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
create temporary function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
create temporary function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
create temporary function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
create temporary function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
create temporary function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
create temporary function SquareHashBoundary as 'com.pb.bigdata.spatial.hive.grid.SquareHashBoundary';
create temporary function SquareHashID as 'com.pb.bigdata.spatial.hive.grid.SquareHashID';
create temporary function LocalSearchNearest as
'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
create temporary function LocalPointInPolygon as
'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Note: If you want to view the complete stack trace for any encountered error, enable logging in DEBUG mode and then restart the job execution.
• The first time you run a job may take a while if the reference data has to be downloaded remotely from HDFS or S3. It may also time out when using a large number of datasets that are stored in remote locations such as HDFS or S3. If you are using Hive with the MapReduce engine, you can adjust the value of the mapreduce.task.timeout property.
• Some types of queries will cause Hive to evaluate UDFs in the HiveServer2 process space instead of on a data node. The Routing UDFs in particular use a significant amount of memory and can shut down the Hive server due to memory constraints. To process these queries, we recommend increasing the amount of memory available to the HiveServer2 process (for example, by setting HADOOP_HEAPSIZE in hive-env.sh).
WritableGeometry
This is an implementation of Hadoop's Writable interface for geometry.
Spatial Hive user defined functions (UDFs) use WritableGeometry to exchange data between two functions. Constructor Hive functions provide a mechanism to get an instance of WritableGeometry from standard geometry formats like WKT, WKB, GeoJSON and KML. For example:
To get an instance of WritableGeometry from WKT:
SELECT FromWKT(t.geometry,'epsg:4267') FROM hivetable t;
To get an instance of WritableGeometry from WKB string:
SELECT FromWKB(t.geometry,'epsg:4267') FROM hivetable t;
Persistence Hive UDFs convert an instance of WritableGeometry to standard formats like WKT, WKB, GeoJSON and KML. For example:
To serialize an instance of WritableGeometry to WKT:
SELECT ToWKT(t.geometry) FROM hivetable t;
The output of Constructor functions can be supplied as input to other Hive functions that perform some operations on it. For example:
To calculate the length of a geometry:
SELECT Length(FromWKT(t.geometry, 'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
To get the distance between two geometries:
SELECT Distance(FromWKT(t.geometry,'epsg:4267'), FromWKT(t.geometry2,'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
The IsNullGeometry Hive UDF provides the capability to perform a null check on an instance of WritableGeometry. The IsNullGeometry function returns true if the geometry is NULL or empty;
otherwise, it returns false. The result type is a boolean.
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT("POINT(10 20)"));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
For more information, see
https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/io/Writable.html.
Geometry Functions
• Constructor Functions
• Persistence Functions
• Predicate Functions
• Measurement Functions
• Processing Functions
• Observer Functions
• Grid Functions
Constructor Functions
The following Constructor functions are available:
• FromGeoJSON
• FromWKT
• FromWKB
• FromKML
• Point FromGeoJSON
Description
The FromGeoJSON function returns a WritableGeometry instance from a GeoJSON representation of a geometry.
Function Registration
create function FromGeoJSON as 'com.pb.bigdata.spatial.hive.construct.FromGeoJSON';
Syntax
fromGeoJSON(String jsonGeometry)
Parameters
Description Type
Parameter
The geometry in geoJSON format.
String jsonGeometry
Return Values
Description Return Type
The geometry from geoJSON format.
WritableGeometry
Examples
SELECT FromGeoJSON('{ "type": "Point", "coordinates": [100.0, 0.0] }');
SELECT FromGeoJSON(t.geometry) FROM hivetable t;
FromWKT
Description
The FromWKT function returns a WritableGeometry instance from a Well-Known Text (WKT) representation of a geometry.
Function Registration
create function FromWKT as 'com.pb.bigdata.spatial.hive.construct.FromWKT';
Syntax
fromWKT(String geometry, [SpatialInfo CRS])
Parameters
Description Type
Parameter
The geometry in WKT format.
String geometry
Optional. The coordinate system for the geometry. Default = EPSG:4326 String
CRS
Return Values
Description Return Type
The geometry from WKT format.
WritableGeometry
Examples
SELECT FromWKT(t.geometry) FROM hivetable t;
SELECT FromWKT(t.geometry,'epsg:4267') FROM hivetable t;
SELECT FromWKT ('POINT (30 20)', 'epsg:4267');
FromWKB
Description
The FromWKB function returns a WritableGeometry instance from a Well-Known Binary (WKB) representation of a geometry.
Function Registration
create function FromWKB as 'com.pb.bigdata.spatial.hive.construct.FromWKB';
Syntax
fromWKB(String geometry, [SpatialInfo CRS])
Parameters
Description Type
Parameter
The WKB of the geometry in byte array format (byte[ ]) .
String geometry
Optional. The coordinate system for the geometry. Default = EPSG:4326 String
CRS
Return Values
Description Return Type
The geometry from WKB format.
WritableGeometry
Examples
SELECT FromWKB(unhex('010100000000000000000024400000000000002440'), 'epsg:4326');
FromKML
Description
The FromKML function returns a WritableGeometry instance from the text formatted in KML (Keyhole Markup Language).
Function Registration
create function FromKML as 'com.pb.bigdata.spatial.hive.construct.FromKML';
Syntax
fromKML(String geometry)
Parameters
Description Type
Parameter
A KML string, where only the geometry or geometry in placemark will be parsed.
String geometry
Return Values
Description Return Type
The geometry from KML format.
WritableGeometry
Examples
SELECT FromKML(t.geometry) FROM hivetable t;
ST_Point
Description
The ST_Point function constructs a point geometry from the provided X and Y, and an optional CRS.
Function Registration
create function ST_Point as 'com.pb.bigdata.spatial.hive.construct.ST_Point';
Syntax
ST_Point(String|Number X, String|Number Y, [SpatialInfo CRS])
Parameters
Description Type
Parameter
The X ordinate.
String or Number X
The Y ordinate.
String or Number Y
Optional. The coordinate system for the geometry. Default = EPSG:4326 String
CRS
Return Values
Description Return Type
The geometry of the specified X/Y coordinates. If any of the argument values are invalid, then an empty geometry will be returned in the output.
WritableGeometry
Examples
SELECT ST_Point(-73.750333 , 42.736103, 'epsg:4326');
SELECT ST_Point('-73.750333' , '42.736103', 'epsg:4326');
SELECT ST_Point(-73.750333 , 42.736103);
SELECT ST_Point('-73.750333' , 42.736103);
SELECT ST_Point(p.x, p.y, p.crs) FROM points p;
Persistence Functions
The following Persistence functions are available:
• ToGeoJSON
• ToWKT
• ToWKB
• ToKML ToGeoJSON
Description
The ToGeoJSON function returns a text formatted in GeoJSON representation of geometry from the specified WritableGeometry instance.
Function Registration
create function ToGeoJSON as 'com.pb.bigdata.spatial.hive.persistence.ToGeoJSON';
Syntax
ToGeoJSON(WritableGeometry geometry)
Parameters
Description Type
Parameter
The instance of a WritableGeometry.
WritableGeometry geometry
Return Values
Description Return Type
The GeoJSON representation of a geometry.
GeoJSON String
Examples
SELECT ToGeoJSON(Buffer(FromGeoJSON(t.geometry), 5.0, 'km', 12, 'SPHERICAL' )) FROM hivetable t;
ToWKT
Description
The ToWKT function returns a Well-Known Text (WKT) representation of a geometry from the specified WritableGeometry instance.
Function Registration
create function ToWKT as 'com.pb.bigdata.spatial.hive.persistence.ToWKT';
Syntax
ToWKT(WritableGeometry geometry)
Parameters
Description Type
Parameter
The instance of a WritableGeometry.
WritableGeometry geometry
Return Values
Description Return Type
The WKT representation of a geometry.
String
Examples
SELECT ToWKT(Buffer(FromGeoJSON(t.geometry), 5.0, 'km', 12, 'SPHERICAL' )) FROM hivetable t;
ToWKB
Description
The ToWKB function returns a byte array in a Well-Known Binary (WKB) representation of a geometry as parsed from the specified WritableGeometry instance.
Function Registration
create function ToWKB as 'com.pb.bigdata.spatial.hive.persistence.ToWKB';
Syntax
ToWKB(WritableGeometry geometry)
Parameters
Description Type
Parameter
The instance of a WritableGeometry.
WritableGeometry geometry
Return Values
Description Return Type
The WKB representation of a geometry expressed as a byte array.
Byte[ ]
Examples
SELECT ToWKB(Buffer(FromGeoJSON(t.geometry), 5.0, 'km', 12, 'SPHERICAL' )) FROM hivetable t;
ToKML
Description
The ToKML function returns a text formatted in KML in the OGC standard, KML2.2 namespace (http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd) as parsed from the specified
WritableGeometry instance.
Function Registration
create function ToKML as 'com.pb.bigdata.spatial.hive.persistence.ToKML';
Syntax
ToKML(WritableGeometry geometry)
Parameters
Description Type
Parameter
The instance of a WritableGeometry.
WritableGeometry geometry
Return Values
Description Return Type
The KML representation of a geometry expressed as a hexadecimal encoded string.
String
Examples
SELECT ToKML(Buffer(FromGeoJSON(t.geometry), 5.0, 'km', 12, 'SPHERICAL' )) FROM hivetable t;
Predicate Functions
The following Predicate functions are available:
• Disjoint
• Intersects
• Overlaps
• Within
• IsNullGeometry Disjoint
Description
The Disjoint function tests if two geometry objects have no points in common.
Function Registration
create function Disjoint as 'com.pb.bigdata.spatial.hive.predicate.Disjoint';
Syntax
Disjoint(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Description Type
Parameter
The first instance of a WritableGeometry.
WritableGeometry geometry1
The second instance of a WritableGeometry.
WritableGeometry geometry2
Return Values
Description Return Type
True, if the two geometry objects have no points in common; otherwise, False.
If either geometry1 or geometry2 are null, Null is returned.
Boolean
Examples
SELECT Disjoint(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Intersects
Description
The Intersects function determines whether or not one geometry object intersects another geometry object.
Function Registration
create function Intersects as 'com.pb.bigdata.spatial.hive.predicate.Intersects';
Syntax
Intersects(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Description Type
Parameter
The first instance of a WritableGeometry.
WritableGeometry geometry1
The second instance of a WritableGeometry.
WritableGeometry geometry2
Return Values
Description Return Type
True, if there is any direct position in common between the two geometries; otherwise, False.
If either geometry1 or geometry2 are null, Null is returned.
Boolean
Examples
SELECT Intersects(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
SELECT R.highway FROM USA_RIVERS L, usa_highways R WHERE L.name='Hudson River' AND Intersects(FromWKT(L.geom), FromWKT(R.geom));
Overlaps
Description
The Overlaps function determines whether or not one geometry object overlaps another geometry object.
Function Registration
create function Overlaps as 'com.pb.bigdata.spatial.hive.predicate.Overlaps';
Syntax
Overlaps(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Description Type
Parameter
The first instance of a WritableGeometry.
WritableGeometry geometry1
The second instance of a WritableGeometry.
WritableGeometry geometry2
Return Values
Description Return Type
True, if the geometry1 overlaps the geometry2; otherwise, False.
If either geometry1 or geometry2 are null, Null is returned.
Boolean
Examples
SELECT Overlaps(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) FROM hivetable1 t1, hivetable2 t2;
Within
Description
The Within function returns whether or not one geometry object is entirely within another geometry object.
Function Registration
create function Within as 'com.pb.bigdata.spatial.hive.predicate.Within';
Syntax
Within(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Description Type
Parameter
The first instance of a WritableGeometry.
WritableGeometry geometry1
The second instance of a WritableGeometry.
WritableGeometry geometry2
Return Values
Description Return Type
True, if the geometry2 entirely contains geometry1; otherwise, False.
If either geometry1 or geometry2 are null, Null is returned.
Boolean
Examples
SELECT Within(FromWKT(t1.geometry, 'epsg:4326'), FromWKT(t2.geometry, 'epsg:4326')) AS Result FROM hivetable1 t1, hivetable2 t2;
SELECT L.zipcode as zipcode, SUM(L.insurance) as TotalInsuredAmount, AVG(R.riskdesc) AS RiskScore
FROM book_of_business L, FIRE_RISK_BOUNDRIES R
WHERE Within(FromWKT(L.location), FromWKT(R.geom)) GROUP BY L.zipcode;
IsNullGeometry
Description
The IsNullGeometry function performs a null check of the input geometry.
Function Registration
create function IsNullGeometry as 'com.pb.bigdata.spatial.hive.predicate.IsNullGeometry';
Syntax
IsNullGeometry(WritableGeometry inputGeometry)
Parameters
Description Type
Parameter
The input geometry to be checked for a null or empty value.
WritableGeometry inputGeometry
Return Values
Description Return Type
True, if the geometry is null or empty; otherwise, False.
Boolean
Examples
SELECT IsNullGeometry(null);
SELECT IsNullGeometry(FromWKT("POINT(10 20)"));
SELECT IsNullGeometry(ST_Point(x, y, 'epsg:4326')) FROM src;
Measurement Functions
The following Measurement functions are available:
• Area
• ClosestPoints
• Distance
• Length
• Perimeter Area
Description
The Area function calculates and returns the area of given Geometry in the desired unit. The unit must be specified as a parameter while calling the function. The area of a polygon is computed as the area of its exterior ring minus the areas of its interior rings. Points and curves have zero area.
Function Registration
create function Area as 'com.pb.bigdata.spatial.hive.measurement.Area';
Syntax
Area(WritableGeometry geometry, String areaUnits, [String computationType])
Parameters
Description Type
Parameter
The input geometry.
WritableGeometry geometry
The desired return unit type. For valid values, seeArea Unitson page 33.
String areaUnits
Description Type
Parameter
Optional. Indicates the logic to be used to interpret geometry coordinates. The
computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
The geometry coordinates are interpreted using cartesian logic.
CARTESIAN
The geometry coordinates are interpreted using spherical logic.
SPHERICAL String
computationType
Area Units
Valid values for unit are the following area units:
Description Value
square miles sq mi
square kilometers sq km
square inches sq in
square foot sq ft
square yards sq yd
square millimeters sq mm
square centimeters sq cm
square meters sq m
Description Value
square US Survey feet sq survey ft
square nautical miles sq nmi
acres acre
hectares ha
Return Values
Description Return Type
The area of the geometry.
Double
Examples
SELECT Area(FromWkt(t.geometry,'epsg:4267'), 'sq mi', 'SPHERICAL') FROM hivetable t;
SELECT Area(FromWkt(t.geometry,'epsg:4267'), 'sq mi') FROM hivetable t;
ClosestPoints
Description
The ClosestPoints function returns the closest points between two geometries.
Function Registration
create function ClosestPoints as 'com.pb.bigdata.spatial.hive.measurement.ClosestPoints';
Syntax
ClosestPoints(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Description Type
Parameter
The first instance of a WritableGeometry.
WritableGeometry geometry1
The second instance of a WritableGeometry.
WritableGeometry geometry2
Return Values
Description Return Type
The closest points between the two geometries. The geometries that intersect are at distance zero from each other, and in this case a shared point is returned.
Array<WritableGeometry>
Examples
SELECT ToWKT(result.points[0]) AS point1, ToWKT(result.points[1]) AS point2 FROM hivetable t LATERAL VIEW OUTER inline(array(named_struct('points', ClosestPoints(FromWKT(t.geometry1, 'epsg:4326'), FromWkt(t.geometry2, 'epsg:4326'), 'SPHERICAL')))) result;
Distance
Description
The Distance function calculates and returns the distance between two geometries.
Function Registration
create function Distance as 'com.pb.bigdata.spatial.hive.measurement.Distance';
Syntax
Distance(WritableGeometry geometry1, WritableGeometry geometry2, String linearUnits, [String computationType])
Parameters
Description Type
Parameter
The first instance of a WritableGeometry.
WritableGeometry geometry1
The second instance of a WritableGeometry.
WritableGeometry geometry2
The desired return unit type. For valid values, seeLinear Unitson page 36.
String linearUnits
Optional. Indicates the logic to be used to interpret geometry coordinates. The
computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
The geometry coordinates are interpreted using cartesian logic.
CARTESIAN
The geometry coordinates are interpreted using spherical logic.
SPHERICAL String
computationType
Linear Units
The following table lists the valid values for unit type.
Description Value
miles mi
kilometers km
inches in
Description Value
feet ft
yards yd
millimeters mm
centimeters cm
meters m
US Survey feet survey ft
nautical miles nmi
Return Values
Description Return Type
The distance between the two geometries. Geometries that intersect are at distance zero from each other. Distance is always non-negative.
Double
Examples
SELECT Distance(FromWkt(t.geometry,'epsg:4267'), FromWkt(t.geometry2,'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Distance(FromWkt(t.geometry,'epsg:4267'), FromWkt(t.geometry2,'epsg:4267'), 'm') FROM hivetable t;
Length
Description
The Length function calculates and returns the geographic length of a line or polyline geometry object in the desired unit type.
Function Registration
create function Length as 'com.pb.bigdata.spatial.hive.measurement.Length';
Syntax
Length(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Description Type
Parameter
The input geometry.
WritableGeometry geometry
The desired return unit type. For valid values, seeLinear Unitson page 38.
String linearUnits
Optional. Indicates the logic to be used to interpret geometry coordinates. The
computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
The geometry coordinates are interpreted using cartesian logic.
CARTESIAN
The geometry coordinates are interpreted using spherical logic.
SPHERICAL String
computationType
Linear Units
Valid values for unit type:
Description Value
miles mi
kilometers km
inches in
feet ft
yards yd
millimeters mm
centimeters cm
meters m
US Survey feet survey ft
nautical miles nmi
Return Values
Description Return Type
The length of the geometry.
Double
Examples
SELECT Length(FromWkt(t.geometry,'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
SELECT Length(FromWkt(t.geometry,'epsg:4267'), 'm') FROM hivetable t;
Perimeter
Description
The Perimeter function calculates and returns the total perimeter of a given geometry in the desired unit type. The Perimeter of a polygon is the sum of the lengths of its rings (both exterior and holes).
The curves are considered as thin polygons.
Function Registration
create function Perimeter as 'com.pb.bigdata.spatial.hive.measurement.Perimeter';
Syntax
Perimeter(WritableGeometry geometry, String linearUnits, [String computationType])
Parameters
Description Type
Parameter
The input geometry.
WritableGeometry geometry
The desired return unit type. For valid values, seeLinear Unitson page 41.
String linearUnits
Optional. Indicates the logic to be used to interpret geometry coordinates. The
computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
The geometry coordinates are interpreted using cartesian logic.
CARTESIAN
The geometry coordinates are interpreted using spherical logic.
SPHERICAL String
computationType
Linear Units
Valid values for unit type:
Description Value
miles mi
kilometers km
inches in
feet ft
yards yd
millimeters mm
centimeters cm
meters m
US Survey feet survey ft
nautical miles nmi
Return Values
Description Return Type
The perimeter of the geometry.
Double
Examples
SELECT Perimeter(FromWkt(t.geometry,'epsg:4267'), 'm', 'SPHERICAL') FROM hivetable t;
Processing Functions
The following Processing functions are available:
• Buffer
• ConvexHull
• Intersection
• Transform
• Union Buffer
Description
The Buffer function returns an instance of WritableGeometry having a MultiPolygon geometry inside it which represents a buffered distance around another geometry object.
Function Registration
create function Buffer as 'com.pb.bigdata.spatial.hive.processing.Buffer';
Syntax
Buffer(WritableGeometry geometry, Number offset, String linearUnit, Number resolution, [String computationType])
Parameters
Description Type
Parameter
The geometry to buffer.
WritableGeometry geometry
The distance from the input geometry.
Number offset
The desired return unit type. For valid values, seeLinear Unitson page 43.
String linearUnits
Description Type
Parameter
Specifies how many straight-line segments are used in approximating a circle. For example, with a resolution of 8, the buffer of a point will be an octagon. Buffers with larger resolution values take more time and space to compute.
Number resolution
Optional. Indicates the logic to be used to interpret geometry coordinates. The
computation type is based on the coordinate system of the geometry being operated on:
• For geographic (long/lat) coordinate systems: Valid type = SPHERICAL (default)
• For projected coordinate systems: Valid types = CARTESIAN, SPHERICAL (default)
• For engineering coordinate systems: Valid type = CARTESIAN (default)
The geometry coordinates are interpreted using cartesian logic.
CARTESIAN
The geometry coordinates are interpreted using spherical logic.
SPHERICAL String
computationType
Linear Units
Valid values for unit type:
Description Value
miles mi
kilometers km
inches in
feet ft
Description Value
millimeters mm
centimeters cm
meters m
US Survey feet survey ft
nautical miles nmi
Return Values
Description Return Type
A geometry consisting of all the points within the offset distance of the input geometry.
WritableGeometry
Examples
SELECT Buffer(FromWKT(t.geometry,'epsg:4267'), 5.0, 'km', 12, 'SPHERICAL' ) FROM hivetable t;
SELECT Buffer(ST_POINT(5, 6, 'epsg:4267'), 5.0, 'km', 12, 'SPHERICAL' );
ConvexHull
Description
The ConvexHull function computes the convex hull of a geometry. The convex hull is the smallest convex geometry that contains all the points in the input geometry.
Function Registration
create function ConvexHull as 'com.pb.bigdata.spatial.hive.processing.ConvexHull';
Syntax
ConvexHull(WritableGeometry geometry)
Parameters
Description Type
Parameter
The input geometry.
WritableGeometry geometry
Return Values
Description Return Type
The convex hull of the geometry.
WritableGeometry
Examples
SELECT ConvexHull(FromWKT(geometry, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(ConvexHull(FromWKT(table.geometry,'epsg:4267'))) AS result FROM hivetable;
SELECT ConvexHull(FromWKT('MULTIPOLYGON (((40 40, 20 45, 45 30, 40 40)), ((20 35, 10 30, 10 10, 30 5, 45 20, 20 35), (30 20, 20 15, 20 25, 30 20)))'),'epsg:4267');
Intersection
Description
The Intersection function is a geometry (point, line, or curve) common in two geometry objects (such as lines, curves, planes, and surfaces). It returns the geometry consisting of direct positions that lie in both specified geometries.
Function Registration
create function Intersection as 'com.pb.bigdata.spatial.hive.processing.Intersection';
Syntax
Intersection(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Description Type
Parameter
The first instance of a WritableGeometry.
WritableGeometry geometry1
The second instance of a WritableGeometry.
WritableGeometry geometry2
Return Values
Description Return Type
The geometry formed from the direct positions that are common to both input geometries.
WritableGeometry
Examples
SELECT Intersection(FromWKT(t1.geometry,'epsg:4267'), FromWKT(t2.geometry,'epsg:4267')) FROM hivetable1 t1, hivetable2 t2;
Transform
Description
The Transform function transforms a given geometry from one coordinate system to another.
Function Registration
create function Transform as 'com.pb.bigdata.spatial.hive.processing.Transform';
Syntax
Transform(WritableGeometry geometry, String CRS)
Parameters
Description Type
Parameter
The source input geometry.
WritableGeometry geometry
The destination coordinate system for the geometry.
String CRS
Return Values
Description Return Type
The geometry transformed to the destination coordinate system.
WritableGeometry
Examples
SELECT Transform(FromWKT(t.geometry,'epsg:4326'), 'epsg:3857') FROM hivetable t;
SELECT Transform(ST_POINT(30, 20),'epsg:3857');
Union
Description
The Union function returns a geometry object which represents the union of two input geometry objects.
Function Registration
create function Union as 'com.pb.bigdata.spatial.hive.processing.Union';
Syntax
Union(WritableGeometry geometry1, WritableGeometry geometry2)
Parameters
Description Type
Parameter
The first instance of a WritableGeometry.
WritableGeometry geometry1
The second instance of a WritableGeometry.
WritableGeometry geometry2
Return Values
Description Return Type
The geometry that represents the union of the input geometries.
WritableGeometry
Examples
SELECT Union(FromWKT(geometry1, 'epsg:4326'), FromWKT(geometry2, 'epsg:4326')) FROM hivetable;
SELECT ToWKT(Union(FromWKT(t1.geometry,'epsg:4267'),FromWKT(t2.geometry,'epsg:4267'))) FROM hivetable1 t1, hivetable2 t2;
Observer Functions
Obtaining the X and Y ordinates of a geometry is important when dealing with XY tables. For example, the TRANSFORM UDF accepts and returns a geometry which means an XY table cannot be transformed FROM one coordinate system to another. The ST_X and ST_Y UDFs allow the transformation of an XY table FROM one coordinate system to another.
Another common need is the ability to filter records in an XY table by the bounds of a geometry. The ST_XMax, ST_XMin, ST_YMax, and ST_YMin UDFs provide a way to get the values of the MBR for a writeable geometry.
The following Observer index functions are available:
• ST_X
• ST_XMax
• ST_XMin
• ST_Y
• ST_YMax
• ST_YMin ST_X
Description
The ST_X function returns the X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_X as 'com.pb.bigdata.spatial.hive.observer.ST_X';
Syntax
ST_X(WritableGeometry geometry)
Parameters
Description Type
Parameter
The input geometry.
WritableGeometry geometry
Return Values
Description Return Type
The X ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Double
Examples
SELECT ST_X(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_XMax
Description
The ST_XMax function returns the X maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMax as 'com.pb.bigdata.spatial.hive.observer.ST_XMax';
Syntax
ST_XMax(WritableGeometry geometry)
Parameters
Description Type
Parameter
The input geometry.
WritableGeometry geometry
Return Values
Description Return Type
The X maxima of the input geometry, or Null if the specified value is not a geometry.
Double
Examples
SELECT ST_XMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_XMin
Description
The ST_XMin function returns the X minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_XMin as 'com.pb.bigdata.spatial.hive.observer.ST_XMin';
Syntax
ST_XMin(WritableGeometry geometry)
Parameters
Description Type
Parameter
The input geometry.
WritableGeometry geometry
Return Values
Description Return Type
The X minima of the input geometry, or Null if the specified value is not a geometry.
Double
Examples
SELECT ST_XMin(FromWKT(…, 'epsg:4326')) FROM src;
ST_Y
Description
The ST_Y function returns the Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Function Registration
create function ST_Y as 'com.pb.bigdata.spatial.hive.observer.ST_Y';
Syntax
ST_Y(WritableGeometry geometry)
Parameters
Description Type
Parameter
The input geometry.
WritableGeometry geometry
Return Values
Description Return Type
The Y ordinate of the geometry if the geometry is a point, or Null if the geometry is not a point or is null.
Double
Examples
SELECT ST_Y(ST_Point(x, y, 'epsg:4326')) FROM src;
ST_YMax
Description
The ST_YMax function returns the Y maxima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMax as 'com.pb.bigdata.spatial.hive.observer.ST_YMax';
Syntax
ST_YMax(WritableGeometry geometry)
Parameters
Description Type
Parameter
The input geometry.
WritableGeometry geometry
Return Values
Description Return Type
The Y maxima of the input geometry, or Null if the specified value is not a geometry.
Double
Examples
SELECT ST_YMax(FromWKT(…, 'epsg:4326')) FROM src;
ST_YMin
Description
The ST_YMin function returns the Y minima of a geometry, or Null if the specified value is not a geometry.
Function Registration
create function ST_YMin as 'com.pb.bigdata.spatial.hive.observer.ST_YMin';
Syntax
ST_YMin(WritableGeometry geometry)
Parameters
Description Type
Parameter
The input geometry.
WritableGeometry geometry
Return Values
Description Return Type
The Y minima of the input geometry, or Null if the specified value is not a geometry.
Double
Examples
SELECT ST_YMin(FromWKT(…, 'epsg:4326')) FROM src;
Grid Functions
A grid is a way of dividing the surface of the earth into contiguous cells with no gaps in between.
This makes grids very useful for spatial indexing and aggregating.
Spectrum™Location Intelligence for Big Data provides Hive user defined functions (UDF) for hashing that allow you to manage grid cells for a variety of use cases. Hashing is a way of encoding and decoding the grid cell using the cell boundary and a unique identifier.
We provide three types of UDFs for processing three grid cell shapes:
• rectangular (geohash)
• hexagon (hexagon hash)
• square (square hash)
Hashes are useful for analysis and interoperability with other systems.
Square hash is similar to GeoHash but has the advantage that when displayed in Popular Mercator, the cells appear as squares.
Hexagons are often used in telecommunication solutions as they approximate circles while covering the surface of the earth without gaps.
The following Grid index functions are available:
• GeoHashBoundary:
• GeoHashID
• HexagonBoundary
• HexagonID
• SquareHashBoundary
• SquareHashID GeoHashBoundary
Description
The GeoHashBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It also can return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is rectangular.
Function Registration
create function GeoHashBoundary as 'com.pb.bigdata.spatial.hive.grid.GeoHashBoundary';
Syntax
GeoHashBoundary(String UNIQUE_ID)
Parameters
Description Type
Parameter
The unique geohash identifier of a cell in a grid.
String UNIQUE_ID
Return Values
Description Return Type
A representation of the boundary of a cell in a grid.
WritableGeometry
Examples
SELECT GeoHashBoundary(hashStringId) FROM hivetable;
SELECT GeoHashBoundary("ebvnk");
Syntax
GeoHashBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Description Type
Parameter
The longitude value of the point.
Number or String X
The latitude value of the point.
Number or String Y
The length of the string key to be returned.
The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells).
Number precision
Return Values
Description Return Type
The boundary of the grid cell at the given precision that the point falls into.
WritableGeometry
Examples
SELECT GeoHashBoundary(x, y, precision) FROM hivetable;
SELECT GeoHashBoundary("-73.750333", "42.736103", 3);
SELECT GeoHashBoundary(-73.750333, 42.736103, 3);
GeoHashID
Description
The GeoHashID function returns a unique well-known string ID for the grid cell. The ID then is sortable and searchable that corresponds to the specified X, Y, and precision.
Function Registration
create function GeoHashID as 'com.pb.bigdata.spatial.hive.grid.GeoHashID';
Syntax
GeoHashID(Number|String X, Number|String Y, Number precision)
Parameters
Description Type
Parameter
The longitude value of the point.
Number or String X
The latitude value of the point.
Number or String Y
Description Type
Parameter
The length of the string key to be returned.
The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells).
Number precision
Return Values
Description Return Type
The geohash ID of the grid cell at the specified precision that contains the point.
String
Examples
SELECT GeoHashID(x, y, precision) FROM hivetable;
SELECT GeoHashID("-73.750333", "42.736103", 3);
SELECT GeoHashID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS
SELECT *, (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT *, (GeoHashID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(GeoHashBoundary (c.hashID)), count (*) as quantity FROM (SELECT GeoHashID(x, y, 10) AS hashID FROM coordinates) c
GROUP BY c.hashID;
HexagonBoundary
Description
The HexagonBoundary function returns a WritableGeometry that defines the boundary of a cell in a grid if given a unique ID for the location. It also can return the boundary of the cell that contains the given point at the specified precision. The shape of the cell is a hexagon.
Function Registration
create function HexagonBoundary as 'com.pb.bigdata.spatial.hive.grid.HexagonBoundary';
Syntax
HexagonBoundary(String UNIQUE_ID)
Parameters
Description Type
Parameter
The unique geohash identifier of a cell in a grid.
String UNIQUE_ID
Return Values
Description Return Type
A representation of the boundary of a cell in a grid.
WritableGeometry
Examples
SELECT HexagonBoundary(hashStringId) FROM hivetable;
SELECT HexagonBoundary("PF625028642");
Syntax
HexagonBoundary(Number|String X, Number|String Y, Number precision)
Parameters
Description Type
Parameter
The longitude value of the point.
Number or String X
The latitude value of the point.
Number or String Y
The length of the string key to be returned.
The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells).
Number precision
Return Values
Description Return Type
The boundary of the grid cell at the given precision that the point falls into.
WritableGeometry
Examples
SELECT HexagonBoundary(x, y, precision) FROM hivetable;
SELECT HexagonBoundary(-73.750333, 42.736103, 3);
HexagonID
Description
The HexagonID function returns a unique well-known string ID for the grid cell. The ID then is sortable and searchable that corresponds to the specified X, Y, and precision.
Function Registration
create function HexagonID as 'com.pb.bigdata.spatial.hive.grid.HexagonID';
Syntax
HexagonID(Number|String X, Number|String Y, Number precision)
Parameters
Description Type
Parameter
The longitude value of the point.
Number or String X
The latitude value of the point.
Number or String Y
The length of the string key to be returned.
The precision determines how large the grid cells are (longer strings means higher precision and smaller grid cells).
Number precision
Return Values
Description Return Type
The ID of the grid cell at the specified precision that contains the point.
String
Examples
SELECT HexagonID(x, y, precision) FROM hivetable;
SELECT HexagonID(-73.750333, 42.736103, 3);
CREATE TEMPORARY TABLE tmptbl AS
SELECT *, (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
INSERT INTO TABLE coordinates_with_hash SELECT *, (HexagonID(x, y, 10)) AS hashID FROM coordinates ORDER BY hashID;
SELECT c.hashID, ToWKT(HexagonBoundary(c.hashID)),
count (*) AS quantity FROM (SELECT HexagonID(x, y, 10) AS hashID FROM coordinates)
c GROUP BY c.hashID;