The Platfora command-line management utilities are located in $PLATFORA_HOME/bin of your Platfora server installation. All utility commands should be executed from the Platfora master node.
Topics:
•setup.py
•hadoop-check
•hadoopcp
•hadoopfs
•install-node
•platfora-catalog
•platfora-config
•platfora-export
•platfora-import
•platfora-license
•platfora-node
•platfora-services
•platfora-syscapture
•platfora-syscheck
setup.py
Initializes a new Platfora instance or upgrades an existing one. Can also be used to reset bootstrap system configuration properties.
Synopsis
setup.py [-h] [-q] [-v] [-V]
setup.py [--hadoop_conf path] [--platfora_conf path] [--datadir path] [--dfs_dir dfs_path]
         [--port admin_port] [--data_port port] [--db_port port] [--db_dump_path path]
         [--websvc_port http_port] [--ssl_port https_port] [--jvmsize jvm_size]
         [--hadoop_version string] [--extraclasspath path] [--extrajavalib path]
         [--skip_checks] [--skip_syscheck] [--skip_sync] [--skip_setup_ssl]
         [--skip_setup_dfscachesize] [--skip_setup_telemetry] [--upgrade_catalog]
         [--nochanges] [--verbose]
Description
The setup.py utility is run on the Platfora master node after installing the Platfora software, but
before starting the Platfora server for the first time.
For new installations, setup.py:
• Runs platfora-syscheck to verify that all system prerequisites have been met.
• Confirms that you have installed the correct Platfora software package for your intended Hadoop distribution.
• Prompts for bootstrap configuration information, such as port numbers, directory locations, memory resources, secure connections, and diagnostic data collection.
• Verifies that the supplied ports are open and that permissions and disk space are sufficient on both the local and remote DFS file systems.
• Initializes the Platfora metadata catalog database in PostgreSQL.
• Creates the default System Administrator user account.
• Copies setup files to the Platfora storage location in the configured Hadoop DFS.
For upgrade installations, setup.py:
• Runs platfora-syscheck to verify that all system prerequisites have been met.
• Confirms that you have installed the correct Platfora software package for your intended Hadoop distribution.
• Displays your current bootstrap configuration settings and prompts if you want to make changes.
• Upgrades the Platfora metadata catalog database in PostgreSQL if necessary.
• Copies any updated library files to the Platfora storage location in the configured Hadoop DFS.
• Synchronizes the Platfora software and configuration files on the worker nodes in a multi-node installation.
Required Arguments
No required arguments.
Optional Arguments
-c | --hadoop_conf path
This is the local directory where the Hadoop configuration files are located.
-C | --platfora_conf path
This is the local directory where Platfora will store its configuration files. Defaults to $PLATFORA_CONF_DIR if set.
-d | --datadir path
This is the local directory where Platfora will store its metadata catalog database, lens data, and log files. Defaults to $PLATFORA_DATA_DIR if set.
--data_port port
This is the data transfer port used during query processing on multi-node Platfora clusters. By default, uses the same port number as the master node.
--db_port port
This is the port of the PostgreSQL database instance where the Platfora metadata catalog database resides. The default PostgreSQL port is 5432.
--db_dump_path path
This is the path where the backup SQL file of the Platfora metadata catalog database will be created prior to upgrading the catalog. Defaults to the current directory.
-g | --dfs_dir dfs_path
This is the remote directory in the configured Hadoop distributed file system (DFS) where Platfora will store its library files and MapReduce output (lens data).
-j | --extraclasspath path
This is the path where the Platfora server will look for additional custom Java classes (.jar files), such as those for Hive JDBC connectors, custom Hive SerDes, or user-defined functions. These are not included in lens building in Hadoop. This option is deprecated; use $PLATFORA_DATA_DIR/extlib instead.
-l | --extrajavalib path
This is the path where the Platfora server should look for native Java libraries. These are not included in lens building in Hadoop. This option is deprecated; use $PLATFORA_DATA_DIR/extlib instead.
-n | --nochanges
On upgrade, do not prompt the user if they want to make changes to their current Platfora bootstrap configuration settings.
-p | --port admin_port
This is the server administration port used for management utility and API calls to the Platfora server. This is also the port that multi-node Platfora servers use to connect to each other. The default is 8002.
-s | --jvmsize jvm_size
The maximum amount of Java virtual machine (JVM) memory allocated to a Platfora server process. On a dedicated machine, this should be about 80 percent of total system memory. You can specify the size using M for megabytes or G for gigabytes.
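The 80 percent guideline can be computed directly on the host. A minimal sketch, assuming a Linux machine where total memory is read from /proc/meminfo; the helper name is illustrative, not part of Platfora:

```shell
# Derive a --jvmsize value as roughly 80% of total system memory.
# jvmsize_from_kb is a hypothetical helper; pass it MemTotal in kilobytes
# (e.g. from: awk '/^MemTotal:/ {print $2}' /proc/meminfo).
jvmsize_from_kb() {
    # 80% of the kB value, rounded down to whole gigabytes
    echo "$(( $1 * 8 / 10 / 1024 / 1024 ))G"
}

jvmsize_from_kb 16777216   # 16 GiB host → 12G
```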
--skip_checks
Do not perform safety checks, such as verifying ports, disk space, and file permissions.
--skip_setup_dfscachesize
Do not prompt to configure the maximum local disk space utilization for storing lens data. If this question is skipped, Platfora will set the maximum to 80 percent of the available space in $PLATFORA_DATA_DIR. When this limit is reached, lens builds will fail during the pre-fetch stage.
--skip_setup_ssl
Do not prompt to configure secure connections (SSL) between browser clients and the Platfora server. If these questions are skipped, the default is no (do not use SSL).
--skip_sync
Do not sync the installation directory to the worker nodes.
--skip_syscheck
Do not run the platfora-syscheck utility prior to setup.
--skip_setup_telemetry
Do not prompt to disable/enable diagnostic data collection. If these questions are skipped, the default is yes (enable diagnostic data collection), and the company name is set to default (anonymous).
-t | --hadoop_version version_string
The version string corresponding to the Hadoop distribution you are using with Platfora. Valid values are cdh5 (Cloudera 5.0.x and 5.1.x), cdh52 (Cloudera 5.2.x and 5.3.x), cdh54 (Cloudera 5.4.x), mapr4 (MapR 4.0.1), mapr402 (MapR 4.0.2 and 4.1.x), emr3 (Amazon Elastic MapReduce), HDP_2.1 (Hortonworks 2.1.x), HDP_2.2 (Hortonworks 2.2.x), pivotal_3 (PivotalHD 3.0).
--upgrade_catalog
Automatically upgrade the metadata catalog schema if necessary. The catalog update check is run by default.
-v | --verbose
Runs in verbose mode. Shows all output messages.
-w | --websvc_port http_port
This is the HTTP listener port for the Platfora web application server. This is the port that browser clients use to connect to Platfora. The default is 8001.
--ssl_port https_port
This is the HTTPS listener port for the Platfora web application server. This is the SSL port that browser clients use to connect to Platfora. The default is 8443.
Examples
Run setup without doing the prerequisite checks first:
$ setup.py --skip_syscheck
Run initial setup without any prompts using the specified bootstrap configuration settings (or use the default settings when not specified):
$ setup.py --hadoop_conf /home/platfora/hadoop_conf --platfora_conf /home/platfora/platfora_conf \
  --datadir /data/platfora --dfs_dir /user/platfora --jvmsize 12G --hadoop_version cdh5 \
  --skip_setup_ssl --skip_setup_dfscachesize --skip_setup_telemetry
Run upgrade setup without any prompts and keep all previous configuration settings:
$ setup.py --upgrade_catalog --nochanges
hadoop-check
Checks the Hadoop cluster connected to Platfora to make sure it is not misconfigured. Collects information about the Hadoop environment for troubleshooting purposes.
Synopsis
hadoop-check [-h] [-v] [-vv] [-V]
Description
The hadoop-check utility verifies that Hadoop is correctly configured for use with Platfora. It also collects system information from the Hadoop cluster environment. You must complete setup.py before running this utility.
Output from this utility is logged in $PLATFORA_DATA_DIR/logs/hadoop-check.log. It performs the following checks:
• Root DFS Test. This test makes sure that Platfora can connect to the configured Hadoop file system, and that file permissions are correct on the directories that Platfora needs to write to. It also makes sure that any jar files that have been placed in $PLATFORA_DATA_DIR/extlib have the correct file permissions.
• File Codec Test. This test makes sure that Platfora has the codecs (file compression libraries) it needs to recognize and read the compression types supported in Hadoop. If Hadoop is configured to support a compression type that Platfora does not recognize, then this test will fail. You can put the jar files for any additional codecs in $PLATFORA_DATA_DIR/extlib of the Platfora server (requires a restart).
• Hadoop Host Configuration Test. This test runs a small MapReduce job on the Hadoop cluster and reports back information from the Hadoop environment. It makes sure that memory is not over-subscribed on the Hadoop MapReduce cluster. These tests assume that all nodes in the Hadoop cluster have the same resource configuration (same amount of memory, CPU cores, etc.).
The check returns an RC (return code) value. A return code of 0 means all tests passed. A return code of 1 means one or more tests failed.
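Because the utility signals success solely through its return code, it is straightforward to script against. A hedged sketch of the pattern; the wrapper function is illustrative, and in practice you would pass hadoop-check as the command under test:

```shell
# Map a command's return code to hadoop-check's pass/fail semantics:
# RC 0 = all tests passed, nonzero RC = one or more tests failed.
check_status() {
    if "$@" > /dev/null 2>&1; then
        echo "PASSED"        # RC 0
    else
        echo "FAILED"        # nonzero RC
    fi
}

# In practice: check_status hadoop-check
check_status true    # → PASSED
```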
Root DFS Test
This test is skipped if Platfora is configured to use Amazon S3. Tests the DFS file system and returns the following information:
Total
The total disk space in the Platfora storage directory on the Hadoop file system.
Used
The used disk space in the Platfora storage directory on the Hadoop file system.
Available
The available disk space in the Platfora storage directory on the Hadoop file system.
Permissions on the Platfora DFS Directory
The platfora system user has write permissions to the Platfora storage directory on the Hadoop file system (PASSED or FAILED).
File Codec Test
Codecs Installed
The file compression libraries that are installed in Hadoop.
Output compression in Hadoop Conf
Checks if the mapred-site.xml property mapred.output.compress is enabled, and if it is, makes sure the compression library specified in mapred.output.compression.codec is also installed in Platfora.
Hadoop Host Configuration Test
JobTracker Status (ResourceManager for YARN)
Ensures the server is up and running.
Black Listed TaskTrackers (NodeManagers for YARN)
Lists the number of servers marked unavailable in the Hadoop cluster.
Total Cluster Map Tasks
Total number of map task slots available. This is the value of mapred.tasktracker.map.tasks.maximum in the JobTracker for pre-YARN distributions, or mapreduce.tasktracker.map.tasks.maximum in the ResourceManager for YARN distributions.
Map Tasks Occupied
The number of map task slots that were occupied at the time of the test.
Total Cluster Reduce Tasks
Total number of reduce task slots available. This is the value of mapred.tasktracker.reduce.tasks.maximum in the JobTracker for pre-YARN distributions, or mapreduce.tasktracker.reduce.tasks.maximum in the ResourceManager for YARN distributions.
Reduce Tasks Occupied
The number of reduce task slots that were occupied at the time of the test.
Job Submission Took
How long it took for Platfora to submit the test MapReduce job.
Hadoop Host
The host name of the JobTracker, or of the ResourceManager node for YARN distributions.
Hadoop Version
The version of Hadoop that is running.
CPUs
Number of CPUs per TaskTracker node, or per NodeManager in YARN distributions.
RAM
The available memory per TaskTracker, or per NodeManager in YARN distributions.
Map Slots
Maximum map task slots available.
Reduce Slots
Maximum reduce task slots available.
Hadoop Configured Memory
The configured amount of memory available to MapReduce processes. Looks at the maximum JVM size per task (mapred.child.java.opts) times the total number of task slots. The total number of task slots is equal to mapred.tasktracker.map.tasks.maximum plus mapred.tasktracker.reduce.tasks.maximum for pre-YARN distributions, or mapreduce.tasktracker.map.tasks.maximum plus mapreduce.tasktracker.reduce.tasks.maximum on YARN distributions.
This test will fail if the Hadoop configured memory exceeds available RAM.
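The over-subscription arithmetic behind this test can be sketched as follows; the function name and the sample values are illustrative, not Platfora defaults:

```shell
# Hadoop Configured Memory = per-task JVM heap (mapred.child.java.opts)
#   * (map slots + reduce slots); the test fails if it exceeds physical RAM.
memcheck() {  # args: heap_mb map_slots reduce_slots ram_mb
    local configured=$(( $1 * ($2 + $3) ))
    if [ "$configured" -gt "$4" ]; then echo "FAIL"; else echo "PASS"; fi
}

memcheck 1024 8 4 16384    # 12288 MB configured vs 16384 MB RAM → PASS
memcheck 2048 8 4 16384    # 24576 MB configured vs 16384 MB RAM → FAIL
```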
Required Arguments
No required arguments.
Optional Arguments
-h | --help
Shows the command-line syntax help and then exits.
-v | --verbose
Runs in verbose mode. Shows all output messages.
-V | --version
Shows the software version information and then exits.
-vv
Runs in extra verbose mode.
Examples
Test and collect information from the Hadoop cluster that Platfora is configured to use:
$ hadoop-check
hadoopcp
Copies a file from one location in the configured DFS to another location in the configured DFS with the ability to transcode files.
Synopsis
hadoopcp source_dfs_uri destination_dfs_uri
Description
The hadoopcp utility allows you to copy a file residing in the remote Hadoop DFS from one location to another and optionally transcode the file.
File paths must be specified in URI format using the appropriate DFS file system protocol. For example, hdfs:// for Cloudera, Apache, or Hortonworks Hadoop, maprfs:// for MapR, and s3n:// for Amazon S3.
This command executes as the currently logged in system user (the platfora user, for example). The target directory location must exist, and this user must have write permissions to the directory.
Required Arguments
source_dfs_uri
The source location in a remote Hadoop file system in URI format. For example: hdfs://hostname:[port]/dfs_path
destination_dfs_uri
The target location in a remote Hadoop file system in URI format. For example: hdfs://hostname:[port]/dfs_path
Optional Arguments
-h
Shows the command-line syntax help and then exits.
Examples
Copy the file /mydata/foo.csv residing in HDFS to the same location in HDFS but transcode it to a gzip compressed file:
$ hadoopcp hdfs://localhost/mydata/foo.csv hdfs://localhost/mydata/foo.csv.gz
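In this example, the .gz extension on the destination URI requests gzip output. A minimal sketch of that extension-to-codec inference; the helper is hypothetical, and only the gzip case comes from the example above:

```shell
# Hypothetical helper: pick a compression codec from the destination
# file extension. Only .gz → gzip is documented here; the default
# (no transcoding) branch is an assumption.
codec_for() {
    case "$1" in
        *.gz) echo "gzip" ;;
        *)    echo "none" ;;
    esac
}

codec_for /mydata/foo.csv.gz    # → gzip
codec_for /mydata/foo.csv       # → none
```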
hadoopfs
Executes the specified hadoop fs command on the remote Hadoop file system.
Synopsis
hadoopfs -command
Description
The hadoopfs utility allows you to run Hadoop file system commands from the Platfora server. This is analogous to running the specified hadoop fs command on the Hadoop NameNode server.
The command executes as the currently logged in system user (the platfora user, for example). This user must have sufficient Hadoop file system permissions to perform the command.
Required Arguments -command
A Hadoop file system shell command. See the Hadoop Shell Command Documentation for the list of possible commands.
Optional Arguments
No optional arguments.
Examples
List the contents of the /platfora/uploads directory in the configured Hadoop file system:
$ hadoopfs -ls /platfora/uploads
Remove the file /platfora/uploads/test.csv in the configured Hadoop file system:
$ hadoopfs -rm /platfora/uploads/test.csv