The following Search functions are available:
• LocalPointInPolygon
• LocalSearchNearest
LocalPointInPolygon
Description
The LocalPointInPolygon UDTF returns the polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that contains the search point is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, that geometry is included in the output.
Function Registration
create function LocalPointInPolygon as
'com.pb.bigdata.spatial.hive.search.LocalPointInPolygon';
Syntax
LocalPointInPolygon(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing a point.
dataSourcePath | String | The path to the data source to be searched. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location as described in the table below.
options | Map | Optional. Options that allow you to set return criteria, in <String, String> format.
Options
Option | Example | Description
shpCharset | 'shpCharset', 'utf-8' | the charset to use when reading a shapefile
shpCrs | 'shpCrs', 'epsg:4326' | the coordinate reference system to use when reading a shapefile
remoteDataSourceLocation | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip' | the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3)
downloadLocation | 'downloadLocation', '/precisely/downloads' | the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup | 'downloadGroup', 'dm_users' | the operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 88.
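As a rough illustration of the shapefile options above, the following HiveQL sketch searches a shapefile that is stored on the local file system of every node, so the remote-storage options are left out. The table parcel_points, its columns, the path /data/local/regions.shp, and the region_name attribute are hypothetical placeholders rather than names from the product, and the function is assumed to be registered as shown under Function Registration.

```sql
-- Hypothetical table, columns, and path; adjust to your own data.
SELECT p.id, pipresult.region_name
FROM parcel_points p
LATERAL VIEW LocalPointInPolygon(
  FromWKT(p.geometry, p.crs),     -- build the input point from WKT text and a CRS column
  '/data/local/regions.shp',      -- local path, assumed to exist on the master node and every data node
  map('shpCharset', 'utf-8',      -- charset used when reading the shapefile
      'shpCrs', 'epsg:4326')      -- coordinate reference system used when reading the shapefile
) pipresult;
```

Because the data source is local in this sketch, remoteDataSourceLocation, downloadLocation, and downloadGroup are not set.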
Return Values
Return Type | Description
geometry | The polygon (contained in the specified TAB or shapefile) in which an input point resides. Every geometry that contains the search point is returned; that is, if the search point lies on a Line, Polyline, Point, Polygon, or MultiPolygon geometry, that geometry is included in the output.
Examples
Using HDFS:
SELECT pip_points.id, pipresult.capital, pipresult.state
FROM pip_points
LATERAL VIEW LocalPointInPolygon(
  FromWKT(pip_points.geometry, pip_points.crs),
  '/STATECAP.TAB',
  map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip',
      'downloadLocation', '/precisely/pip/download',
      'downloadGroup', 'dm_users')) pipresult;
In the above example, id is a field from the pip_points table, which supplies the points to search with. The pipresult.capital and pipresult.state fields come from the STATECAP TAB file and are the attributes we want in the query result.
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 86.
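With a plain LATERAL VIEW, Hive drops any input row for which the table function emits no rows. Assuming LocalPointInPolygon emits nothing for a point that lies in no geometry (an assumption, not something stated here), the sketch below uses the standard Hive LATERAL VIEW OUTER form to keep such points and then lists only the unmatched ones. It reuses the table, file, and option values from the HDFS example in this section.

```sql
-- Sketch: list search points that fall inside no geometry of the data source.
SELECT pip_points.id
FROM pip_points
LATERAL VIEW OUTER LocalPointInPolygon(
  FromWKT(pip_points.geometry, pip_points.crs),
  '/STATECAP.TAB',
  map('remoteDataSourceLocation', 'hdfs:///data/pip/capitals.zip',
      'downloadLocation', '/precisely/pip/download',
      'downloadGroup', 'dm_users')
) pipresult
-- OUTER keeps the input row and fills the UDTF columns with NULL when nothing is emitted
WHERE pipresult.state IS NULL;
```

Dropping the WHERE clause returns matched and unmatched points together, with NULL in the pipresult columns for the unmatched ones.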
LocalSearchNearest
Description
The LocalSearchNearest UDTF returns the geometry or geometries, contained in the specified TAB or shapefile, that are nearest to an input point.
Function Registration
create function LocalSearchNearest as
'com.pb.bigdata.spatial.hive.search.LocalSearchNearest';
Syntax
LocalSearchNearest(WritableGeometry inputPoint, String dataSourcePath, [map(String options)])
Parameters
Parameter | Type | Description
inputPoint | WritableGeometry | A WritableGeometry representing the point to search near.
dataSourcePath | String | The location of the input TAB or shapefile. The path can be either a relative path based on the remote resource or a local path to a file that must be available on the master node and every data node. Note: If you are storing and distributing your data remotely using HDFS or S3, you must set the remoteDataSourceLocation option and also specify the download location as described in the table below.
options | Map | Optional. Options that allow you to return more than one value, return additional information, or set other return criteria, in <String, String> format.
Options
Option | Example | Description
maxCandidates | 'maxCandidates', '3' | the maximum number of results to return (if not set, the default value is 1)
maxDistance | 'maxDistance', '25' | the maximum distance to search for results (if not set, the default value is no limit)
distanceUnit | 'distanceUnit', 'mi' | the distance unit (if not set, the default value is m for meters). See the Distance function on page 35 for examples of supported distance units.
returnDistanceColumnName | 'returnDistanceColumnName', 'Miles' | the name of the column to use for returning the distance
shpCharset | 'shpCharset', 'utf-8' | the charset to use when reading a shapefile
shpCrs | 'shpCrs', 'epsg:4326' | the coordinate reference system to use when reading a shapefile
remoteDataSourceLocation | 'remoteDataSourceLocation', 'hdfs:///data/mydata.zip' | the path to the directory or archive that contains the data source (required only if you are storing and distributing data remotely on HDFS or S3)
downloadLocation | 'downloadLocation', '/precisely/downloads' | the local file system location to which resources get downloaded (required only if you are storing and distributing data remotely on HDFS or S3). Note: If you are also using Spectrum Geocoding for Big Data and have already set the pb.download.location Hive variable, then you do not need to set this option here as well.
downloadGroup | 'downloadGroup', 'dm_users' | the operating system group which should be applied to downloaded data on a local file system; the default is the value from the Hive variable pb.download.group (required only if you are storing and distributing data remotely on HDFS or S3). For more information, see Download Permissions on page 88.
queryFilter | 'queryFilter', 'Name Like \'%Park%\'' | search on a subset of the data based on an attribute. Note: For more information about defining filter expressions, see Operators and Syntax Delimiters on page 89.
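To show how the distance-related options might be combined, the sketch below asks for up to three candidates within 25 miles and requests the distance in a column named Miles. The search_points table and the STATECAP.TAB data source are placeholder names, and selecting the distance as nearestresult.Miles assumes the UDTF exposes the distance under the name given in returnDistanceColumnName.

```sql
-- Sketch: nearest candidates within a radius, with the distance returned as a column.
SELECT s.id, nearestresult.capital, nearestresult.Miles
FROM search_points s
LATERAL VIEW OUTER LocalSearchNearest(
  FromWKT(s.geometry, s.crs),
  '/STATECAP.TAB',
  map('maxCandidates', '3',                  -- at most three results per search point
      'maxDistance', '25',                   -- search radius
      'distanceUnit', 'mi',                  -- radius and returned distance in miles
      'returnDistanceColumnName', 'Miles',   -- name of the distance column (assumed to appear in the output)
      'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
      'downloadLocation', '/precisely/search/download',
      'downloadGroup', 'dm_users')
) nearestresult;
```

All option keys and values are passed as strings, matching the <String, String> format of the options map.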
Return Values
Return Type | Description
geometry | The nearest geometry or geometries contained in the specified TAB or shapefile to the input point.
Examples
Using HDFS:
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
  FromWKT(search_points.geometry, search_points.crs),
  '/STATECAP.TAB',
  map('maxCandidates', '3',
      'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip',
      'downloadLocation', '/precisely/search/download',
      'downloadGroup', 'dm_users',
      'queryFilter', 'Name Like \'%Park%\'')) nearestresult;
In the above example, id is a field from the search_points table, which supplies the points to search from. The nearestresult.capital and nearestresult.state fields come from the STATECAP TAB file and are the attributes we want in the query result. Here, the maxCandidates option limits the results to 3 records for each search point, and the queryFilter option restricts the search to records whose Name attribute contains "Park".
Tip: To improve performance when searching TAB files, consider creating PGD (prepared geometry) index files. For more information, see PGD Builder on page 86.
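The option descriptions mention the pb.download.location and pb.download.group Hive variables as alternatives to passing downloadLocation and downloadGroup on every call. The sketch below assumes these variables can be set for the session with the standard Hive set command and that the UDTF then picks them up; the exact mechanism is an assumption, and the values are taken from the example above.

```sql
-- Assumed session-level settings; the variable names come from the option descriptions.
set pb.download.location=/precisely/search/download;
set pb.download.group=dm_users;

-- With the variables set, only the remote source location is passed in the options map.
SELECT search_points.id, nearestresult.capital, nearestresult.state
FROM search_points
LATERAL VIEW OUTER LocalSearchNearest(
  FromWKT(search_points.geometry, search_points.crs),
  '/STATECAP.TAB',
  map('maxCandidates', '3',
      'remoteDataSourceLocation', 'hdfs:///data/search/capitals.zip')
) nearestresult;
```

If the variables are not honored in your environment, pass downloadLocation and downloadGroup explicitly as in the example above.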