Platfora Utilities Reference

In document Platfora Installation Guide (Page 85-94)

The Platfora command-line management utilities are located in $PLATFORA_HOME/bin of your Platfora server installation. All utility commands should be executed from the Platfora master node.

Topics:

setup.py

hadoop-check

hadoopcp

hadoopfs

install-node

platfora-catalog

platfora-config

platfora-export

platfora-import

platfora-license

platfora-node

platfora-services

platfora-syscapture

platfora-syscheck

setup.py

Initializes a new Platfora instance or upgrades an existing one. Can also be used to reset bootstrap system configuration properties.

Synopsis

setup.py [-h] [-q] [-v] [-V]

setup.py [--hadoop_conf path] [--platfora_conf path] [--datadir path]
         [--dfs_dir dfs_path] [--port admin_port]
         [--websvc_port http_port] [--ssl_port https_port] [--jvmsize jvm_size]
         [--hadoop_version string] [--extraclasspath path] [--extrajavalib path]
         [--skip_checks] [--skip_syscheck] [--skip_sync] [--skip_setup_ssl]
         [--skip_setup_dfscachesize] [--skip_setup_telemetry]
         [--upgrade_catalog] [--nochanges] [--verbose]

Description

The setup.py utility is run on the Platfora master node after installing the Platfora software, but before starting the Platfora server for the first time.

For new installations, setup.py:

• Runs platfora-syscheck to verify that all system prerequisites have been met.

• Confirms that you have installed the correct Platfora software package for your intended Hadoop distribution.

• Prompts for bootstrap configuration information, such as port numbers, directory locations, memory resources, secure connections, and diagnostic data collection.

• Verifies that the supplied ports are open and that permissions and disk space are sufficient on both the local and remote DFS file systems.

• Initializes the Platfora metadata catalog database in PostgreSQL.

• Creates the default System Administrator user account.

• Copies setup files to the Platfora storage location in the configured Hadoop DFS.

For upgrade installations, setup.py:

• Runs platfora-syscheck to verify that all system prerequisites have been met.

• Confirms that you have installed the correct Platfora software package for your intended Hadoop distribution.

• Displays your current bootstrap configuration settings and prompts if you want to make changes.

• Upgrades the Platfora metadata catalog database in PostgreSQL if necessary.

• Copies any updated library files to the Platfora storage location in the configured Hadoop DFS.

• Synchronizes the Platfora software and configuration files on the worker nodes in a multi-node installation.

Required Arguments

No required arguments.

Optional Arguments

-c | --hadoop_conf path

This is the local directory where the configuration files for your Hadoop cluster are located.

-C | --platfora_conf path

This is the local directory where Platfora will store its configuration files. Defaults to $PLATFORA_CONF_DIR if set.

-d | --datadir path

This is the local directory where Platfora will store its metadata catalog database, lens data, and log files. Defaults to $PLATFORA_DATA_DIR if set.

--data_port port

This is the data transfer port used during query processing on multi-node Platfora clusters. By default, uses the same port number as the master node.

--db_port port

This is the port of the PostgreSQL database instance where the Platfora metadata catalog database resides. The default PostgreSQL port is 5432.

--db_dump_path path

This is the path where the backup SQL file of the Platfora metadata catalog database will be created prior to upgrading the catalog. Defaults to the current directory.

-g | --dfs_dir dfs_path

This is the remote directory in the configured Hadoop distributed file system (DFS) where Platfora will store its library files and MapReduce output (lens data).

-j | --extraclasspath path

This is the path where the Platfora server will look for additional custom Java classes (.jar files), such as those for Hive JDBC connectors, custom Hive SerDes, or user-defined functions. These are not included in Lens Building in Hadoop. This option is deprecated; use $PLATFORA_DATA_DIR/extlib instead.

-l | --extrajavalib path

This is the path where the Platfora server should look for native Java libraries. These are not included in Lens Building in Hadoop. This option is deprecated; use $PLATFORA_DATA_DIR/extlib instead.

-n | --nochanges

On upgrade, do not prompt the user if they want to make changes to their current Platfora bootstrap configuration settings.

-p | --port admin_port

This is the server administration port used for management utility and API calls to the Platfora server. This is also the port that multi-node Platfora servers use to connect to each other. The default is 8002.

-s | --jvmsize jvm_size

The maximum amount of Java virtual memory (JVM) allocated to a Platfora server process. On a dedicated machine, this should be about 80 percent of total system memory. You can specify size using M for megabytes or G for gigabytes.
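As a rough sketch of the 80 percent guideline (assuming a Linux host, where total memory can be read from /proc/meminfo), a suitable --jvmsize value could be derived like this:

```shell
# Sketch: derive a --jvmsize value as roughly 80% of total system memory.
# Assumes Linux (/proc/meminfo); rounds down to whole gigabytes.
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
jvm_gb=$(( total_kb * 80 / 100 / 1024 / 1024 ))
echo "--jvmsize ${jvm_gb}G"
```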

--skip_checks

Do not perform safety checks, such as verifying ports, disk space, and file permissions.

--skip_setup_dfscachesize

Do not prompt to configure the maximum local disk space utilization for storing lens data. If this question is skipped, Platfora will set the maximum to 80 percent of the available space in $PLATFORA_DATA_DIR. When this limit is reached, lens builds will fail during the pre-fetch stage.

--skip_setup_ssl

Do not prompt to configure secure connections (SSL) between browser clients and the Platfora server. If these questions are skipped, the default is no (do not use SSL).

--skip_sync

Do not sync the installation directory to the worker nodes.

--skip_syscheck

Do not run the platfora-syscheck utility prior to setup.

--skip_setup_telemetry

Do not prompt to disable/enable diagnostic data collection. If these questions are skipped, the default is yes (enable diagnostic data collection), and the company name is set to default (anonymous).

-t | --hadoop_version version_string

The version string corresponding to the Hadoop distribution you are using with Platfora. Valid values are cdh5 (Cloudera 5.0.x and 5.1.x), cdh52 (Cloudera 5.2.x and 5.3.x), cdh54 (Cloudera 5.4.x), mapr4 (MapR 4.0.1), mapr402 (MapR 4.0.2 and 4.1.x), emr3 (Amazon Elastic MapReduce), HDP_2.1 (Hortonworks 2.1.x), HDP_2.2 (Hortonworks 2.2.x), pivotal_3 (PivotalHD 3.0).

--upgrade_catalog

Automatically upgrade the metadata catalog schema if necessary. The catalog update check is run by default.

-v | --verbose

Runs in verbose mode. Shows all output messages.

-w | --websvc_port http_port

This is the HTTP listener port for the Platfora web application server. This is the port that browser clients use to connect to Platfora. The default is 8001.

--ssl_port https_port

This is the HTTPS listener port for the Platfora web application server. This is the SSL port that browser clients use to connect to Platfora. The default is 8443.

Examples

Run setup without doing the prerequisite checks first:

$ setup.py --skip_syscheck

Run initial setup without any prompts using the specified bootstrap configuration settings (or use the default settings when not specified):

$ setup.py --hadoop_conf /home/platfora/hadoop_conf --platfora_conf /home/platfora/platfora_conf \
  --datadir /data/platfora --dfs_dir /user/platfora --jvmsize 12G --hadoop_version cdh5 \
  --skip_setup_ssl --skip_setup_dfscachesize --skip_setup_telemetry

Run upgrade setup without any prompts and keep all previous configuration settings:

$ setup.py --upgrade_catalog --nochanges

hadoop-check

Checks the Hadoop cluster connected to Platfora to make sure it is not misconfigured. Collects information about the Hadoop environment for troubleshooting purposes.

Synopsis

hadoop-check [-h] [-v] [-vv] [-V]

Description

The hadoop-check utility verifies that Hadoop is correctly configured for use with Platfora. It also collects system information from the Hadoop cluster environment. You must complete setup.py before running this utility.

Output from this utility is logged in $PLATFORA_DATA_DIR/logs/hadoop-check.log. It performs the following checks:

Root DFS Test. This test makes sure that Platfora can connect to the configured Hadoop file system, and that file permissions are correct on the directories that Platfora needs to write to. It also makes sure that any jar files that have been placed in $PLATFORA_DATA_DIR/extlib have the correct file permissions.

File Codec Test. This test makes sure that Platfora has the codecs (file compression libraries) it needs to recognize and read the compression types supported in Hadoop. If Hadoop is configured to support a compression type that Platfora does not recognize, then this test will fail. You can put the jar files for any additional codecs in $PLATFORA_DATA_DIR/extlib of the Platfora server (requires a restart).

Hadoop Host Configuration Test. This test runs a small MapReduce job on the Hadoop cluster and reports back information from the Hadoop environment. It makes sure that memory is not over-subscribed on the Hadoop MapReduce cluster. These tests assume that all nodes in the Hadoop cluster have the same resource configuration (same amount of memory, CPU cores, etc.).

The check returns an RC (return code) value. A return code of 0 means all tests passed; a return code of 1 means one or more tests failed.

Root DFS Test

This test is skipped if Platfora is configured to use Amazon S3. Tests the DFS file system and returns the following information:

Total: The total disk space in the Platfora storage directory on the Hadoop file system.

Used: The used disk space in the Platfora storage directory on the Hadoop file system.

Available: The available disk space in the Platfora storage directory on the Hadoop file system.

Permissions on Platfora DFS Directory: The platfora system user has write permissions to the Platfora storage directory on the Hadoop file system (PASSED or FAILED).

File Codec Test

Codecs Installed: The file compression libraries that are installed in Hadoop.

Output compression in Hadoop Conf: Checks if the mapred-site.xml property mapred.output.compress is enabled, and if it is, makes sure the compression library specified in mapred.output.compression.codec is also installed in Platfora.

Hadoop Host Configuration Test

JobTracker Status (ResourceManager for YARN): Ensures the server is up and running.

Black Listed TaskTrackers (NodeManagers for YARN): Lists the number of servers marked unavailable in the Hadoop cluster.

Total Cluster Map Tasks: Total number of map task slots available. This is the value of mapred.tasktracker.map.tasks.maximum in the JobTracker for pre-YARN distributions, or mapreduce.tasktracker.map.tasks.maximum in the ResourceManager for YARN distributions.

Map Tasks Occupied: The number of map task slots that were occupied at the time of the test.

Total Cluster Reduce Tasks: Total number of reduce task slots available. This is the value of mapred.tasktracker.reduce.tasks.maximum in the JobTracker for pre-YARN distributions, or mapreduce.tasktracker.reduce.tasks.maximum in the ResourceManager for YARN distributions.

Reduce Tasks Occupied: The number of reduce task slots that were occupied at the time of the test.

Job Submission Took: How long it took for Platfora to submit the test MapReduce job.

Hadoop Host: The host name of the JobTracker, or the host name of the ResourceManager node for YARN distributions.

Hadoop Version: The version of Hadoop that is running.

CPUs: Number of CPUs per TaskTracker node, or per NodeManager node in YARN distributions.

RAM: The available memory per TaskTracker, or per NodeManager in YARN distributions.

Map Slots: Maximum map task slots available.

Reduce Slots: Maximum reduce task slots available.

Hadoop Configured Memory: The configured amount of memory available to MapReduce processes. Looks at the maximum JVM size per task (mapred.child.java.opts) times the total number of task slots. The total number of task slots is equal to mapred.tasktracker.map.tasks.maximum plus mapred.tasktracker.reduce.tasks.maximum for pre-YARN distributions, or mapreduce.tasktracker.map.tasks.maximum plus mapreduce.tasktracker.reduce.tasks.maximum on YARN distributions.

This test will fail if the Hadoop configured memory exceeds available RAM.
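The over-subscription check above can be sketched arithmetically. The JVM size and slot counts below are hypothetical illustration values, not Platfora or Hadoop defaults:

```shell
# Sketch of the configured-memory check, with hypothetical per-node values:
# a 1024 MB JVM per task (mapred.child.java.opts -Xmx1024m),
# 8 map slots plus 4 reduce slots, and 16384 MB of physical RAM.
jvm_mb=1024
map_slots=8
reduce_slots=4
ram_mb=16384

# Configured memory = JVM size per task x total task slots.
configured_mb=$(( jvm_mb * (map_slots + reduce_slots) ))

if [ "$configured_mb" -gt "$ram_mb" ]; then
  echo "FAILED: configured memory ${configured_mb} MB exceeds ${ram_mb} MB RAM"
else
  echo "PASSED: configured memory ${configured_mb} MB fits in ${ram_mb} MB RAM"
fi
```

With these values the configured memory is 12288 MB, which fits in 16384 MB, so the check passes.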

Required Arguments

No required arguments.

Optional Arguments

-h | --help

Shows the command-line syntax help and then exits.

-v | --verbose

Runs in verbose mode. Shows all output messages.

-V | --version

Shows the software version information and then exits.

-vv

Runs in extra verbose mode.

Examples

Test and collect information from the Hadoop cluster that Platfora is configured to use:

$ hadoop-check

hadoopcp

Copies a file from one location in the configured DFS to another location in the configured DFS with the ability to transcode files.

Synopsis

hadoopcp source_dfs_uri destination_dfs_uri

Description

The hadoopcp utility allows you to copy a file residing in the remote Hadoop DFS from one location to another and optionally transcode the file.

File paths must be specified in URI format using the appropriate DFS file system protocol. For example, hdfs:// for Cloudera, Apache, or Hortonworks Hadoop, maprfs:// for MapR, s3n:// for

Amazon S3.

This command executes as the currently logged in system user (the platfora user, for example). The target directory location must exist, and this user must have write permissions to the directory.

Required Arguments

source_dfs_uri

The source location in a remote Hadoop file system in URI format. For example: hdfs://hostname:[port]/dfs_path

destination_dfs_uri

The target location in a remote Hadoop file system in URI format. For example: hdfs://hostname:[port]/dfs_path

Optional Arguments

-h

Shows the command-line syntax help and then exits.

Examples

Copy the file /mydata/foo.csv residing in HDFS to the same location in HDFS but transcode it to a gzip compressed file:

$ hadoopcp hdfs://localhost/mydata/foo.csv hdfs://localhost/mydata/foo.csv.gz

hadoopfs

Executes the specified hadoop fs command on the remote Hadoop file system.

Synopsis

hadoopfs -command

Description

The hadoopfs utility allows you to run Hadoop file system commands from the Platfora server. This is analogous to running the specified hadoop fs command on the Hadoop NameNode server.

The command executes as the currently logged in system user (the platfora user, for example). This user must have sufficient Hadoop file system permissions to perform the command.

Required Arguments

-command

A Hadoop file system shell command. See the Hadoop Shell Command Documentation for the list of possible commands.

Optional Arguments

No optional arguments.

Examples

List the contents of the /platfora/uploads directory in the configured Hadoop file system:

$ hadoopfs -ls /platfora/uploads

Remove the file /platfora/uploads/test.csv in the configured Hadoop file system:

$ hadoopfs -rm /platfora/uploads/test.csv
