Running Hadoop and Stratosphere jobs on the TomPouce cluster
TomPouce cluster
• TomPouce is a cluster of 20 calculation nodes (240 cores in total)
• Located in the Inria Turing building (École Polytechnique)
• Used jointly by Inria teams
• Jobs are run through the SGE (Sun Grid Engine) scheduler
TomPouce cluster
• SPECIFICATIONS:
  – Calculation:
    • 20 nodes, each with 2 processors of 6 cores. Total: 240 cores
    • 48 GB RAM per node
    • 400 GB of local disk space per node
  – Storage:
    • Dell R510, /home, 19 TB, NFS
    • Dell R710 x2, /scratch, 37 TB, FhGFS (FraunhoferFS)
  – Network:
    • Dell 5548 switch
    • Mellanox InfiniScale IV QDR InfiniBand switch
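As a quick sanity check (not on the original slides), the two shared mounts can be inspected from the front node once logged in:

# Show the NFS /home and FhGFS /scratch filesystems and their free space
$ df -h /home /scratch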
Execute a Hadoop/Stratosphere job
1. Copy your job from the local machine to the cluster front node:

$ scp myjob.jar inria_username@195.83.212.209:~/

myjob.jar will be copied into the folder /home/leo/inria_username.
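If the job also needs input data, the same scp call can ship a whole directory along with the jar; a minimal sketch, assuming a local data/ folder that is not part of the original slides:

# Copy the jar and an input directory in one go (paths are illustrative)
$ scp -r myjob.jar data/ inria_username@195.83.212.209:~/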
Execute a Hadoop/Stratosphere job
2. Connect via ssh to the front node:

$ ssh inria_username@195.83.212.209

Welcome to Bright Cluster Manager 6.0 Based on Scientific Linux release 6
Cluster Manager ID: #120054
Use the following commands to adjust your environment:
'module avail'             - show available modules
'module add <module>'      - adds a module to your environment for this session
'module initadd <module>'  - configure module to be loaded at every login
IMPORTANT: To connect to the cluster, your ssh key should be stored in the
Inria LDAP. If not, send an e-mail with your public ssh key to:
helpmi-saclay@inria.fr
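If you do not have a key pair yet, one can be generated locally before sending the public half for LDAP registration; a short sketch using the standard OpenSSH defaults (key type and file names are not from the slides):

# Generate a key pair; ~/.ssh/id_rsa.pub is the public key to send
$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub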
Execute a Hadoop/Stratosphere job
3. Log in as the clustervision superuser using your LDAP password:

$ sudo su - clustervision

– Needed to execute Hadoop and Stratosphere jobs and to edit their configurations.
– If you don't have enough permissions, ask for them at:
helpmi-saclay@inria.fr
Execute a Hadoop/Stratosphere job
4. Add the Hadoop/Stratosphere environment to your session
– To add the Hadoop environment, type:

$ module add hadoop/1.1.1

– To add the Stratosphere environment, type:

$ module add stratosphere/stratosphere

– To load an environment automatically when you log in:

$ module initadd hadoop/1.1.1

– To check all the environments loaded:

$ module list

Currently Loaded Modulefiles:
  1) gcc/4.7.0                         2) intel-cluster-checker/1.8
  3) stratosphere/stratosphere-0.2.1   4) sge/2011.11
  5) openmpi/gcc/64/1.4.5              6) gromacs/openmpi/gcc/64/4.0.7
  7) hadoop/1.1.1
Execute a Hadoop/Stratosphere job
4. Add the Hadoop/Stratosphere environment to your session
• Hadoop installation: /cm/shared/apps/hadoop/current/
• Stratosphere installation: /cm/shared/apps/stratosphere/current/
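A few more standard environment-modules subcommands are handy at this step; a hedged session sketch (module names are taken from the slides, the rest are stock module/Hadoop commands):

$ module avail                                # list everything installed on the cluster
$ module add stratosphere/stratosphere        # load for this session
$ module rm stratosphere/stratosphere         # unload for this session
$ module initadd stratosphere/stratosphere    # load at every login
$ hadoop version                              # confirm the Hadoop environment is active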
Execute a Hadoop/Stratosphere job
5. Create an execution script (Hadoop)

#!/bin/bash
#$ -N hadoop_run
#$ -pe hadoop 12
#$ -j y
#$ -o output.$JOB_ID
#$ -l h_rt=00:10:00,hadoop=true,excl=true
#$ -cwd
#$ -q hadoop.q

# Copy the input files into the HDFS filesystem
hadoop --config /home/guests/clustervision/current/ dfs -copyFromLocal \
  /home/guests/clustervision/tmp /input

# Run the Hadoop task(s), specifying the jar, class and run parameters
hadoop --config /home/guests/clustervision/current/ jar myjob.jar \
  org.myorg.job /input /output

# Copy the output files from the HDFS filesystem
hadoop --config /home/guests/clustervision/current/ fs -get /output
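Between runs it helps to inspect and clean HDFS, since Hadoop refuses to write into an /output directory that already exists; a hedged sketch using standard Hadoop 1.x fs commands (paths as in the script above, the part-00000 file name assumes the usual reducer output naming):

$ hadoop --config /home/guests/clustervision/current/ fs -ls /input
$ hadoop --config /home/guests/clustervision/current/ fs -cat /output/part-00000
$ hadoop --config /home/guests/clustervision/current/ fs -rmr /output   # remove before re-running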
Execute a Hadoop/Stratosphere job
SGE execution parameters:
• Should be written after '#$' at the beginning of the script.
• -N <job_name>: gives a name to the job to run.
• -pe <environment> N: specifies the parallel environment; N is the number of cores (limited to 180).
• -j y: merges errors and standard output into the same output file.
Execute a Hadoop/Stratosphere job
SGE execution parameters:
• -o output.$JOB_ID: the standard output will go to a file named output.$JOB_ID, where $JOB_ID is the number SGE assigns automatically to the job.
• -l name=value: requests a resource. In this case:
  • h_rt=00:10:00 indicates that the job should be killed after 10 minutes
  • hadoop=true indicates that the job to run is a Hadoop job (it DOES NOT change for Stratosphere jobs)
  • excl=true indicates that the job is executed exclusively (no other job shares its nodes)
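Putting these flags together, a header for a longer, wider run might look as follows; the 1-hour limit and 24 cores are illustrative values, not from the slides:

#!/bin/bash
# Job name shown by qstat
#$ -N my_longer_run
# Parallel environment and core count (max 180)
#$ -pe hadoop 24
# Merge stderr into stdout
#$ -j y
# Stdout file, numbered by SGE
#$ -o output.$JOB_ID
# 1 h wall clock, Hadoop resource, exclusive nodes
#$ -l h_rt=01:00:00,hadoop=true,excl=true
# Run from the submission directory, on the hadoop queue
#$ -cwd
#$ -q hadoop.q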
Execute a Hadoop/Stratosphere job
5. Create an execution script (Hadoop)
HADOOP COMMANDS
• Copy input files into HDFS:

hadoop --config /home/guests/clustervision/current/ dfs -copyFromLocal \
  /home/guests/clustervision/tmp /input

• Run Hadoop tasks:

hadoop --config /home/guests/clustervision/current/ jar \
  /pathToJob/myjob.jar org.myorg.job /input /output

• Copy output files from HDFS:

hadoop --config /home/guests/clustervision/current/ fs -get /output
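When a job produces several part files, Hadoop 1.x also offers fs -getmerge, which concatenates them into a single local file; a hedged one-liner (the local file name results.txt is illustrative):

$ hadoop --config /home/guests/clustervision/current/ fs -getmerge /output results.txt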
Execute a Hadoop/Stratosphere job
5. Create an execution script (Stratosphere)

#!/bin/bash
#$ -N strato_run
#$ -pe stratosphere 24
#$ -j y
#$ -o output.$JOB_ID
#$ -l h_rt=00:10:00,hadoop=true,excl=true
#$ -cwd
#$ -q hadoop.q

export PATH=$PATH:'/cm/shared/apps/hadoop/current/conf/'
export STRATOSPHERE_HOME='/cm/shared/apps/stratosphere/current'
MASTER=`cat /home/guests/clustervision/current/masters`

# Copy the input files into the HDFS filesystem
hadoop --config /home/guests/clustervision/current/ dfs -copyFromLocal \
  /home/guests/clustervision/tmp /input

# Run the Stratosphere job through the PACT client
$STRATOSPHERE_HOME/bin/pact-client.sh run -j myjob.jar -a 2 \
  hdfs://$MASTER:50040/input hdfs://$MASTER:50040/output

# Copy the output files from the HDFS filesystem
hadoop --config /home/guests/clustervision/current/ fs -get /output
Execute a Hadoop/Stratosphere job
5. Create an execution script (Stratosphere)
STRATOSPHERE COMMANDS
• Copy input files into HDFS:

hadoop --config /home/guests/clustervision/current/ dfs -copyFromLocal \
  /home/guests/clustervision/tmp /input

• Run Stratosphere tasks:

$STRATOSPHERE_HOME/bin/pact-client.sh run -j /pathToJob/myjob.jar -a 2 \
  hdfs://$MASTER:50040/input hdfs://$MASTER:50040/output

• Copy output files from HDFS:

hadoop --config /home/guests/clustervision/current/ fs -get /output
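The values after -a are passed straight to the job's plan assembler; in the Stratosphere example jobs the first argument is the degree of parallelism, followed by the input and output paths. A hedged sketch raising the parallelism (the value 8 is illustrative, and the argument order assumes a job whose plan assembler reads parallelism, input, output):

$STRATOSPHERE_HOME/bin/pact-client.sh run -j /pathToJob/myjob.jar -a 8 \
  hdfs://$MASTER:50040/input hdfs://$MASTER:50040/output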
Execute a Hadoop/Stratosphere job
6. Submission of a job
• To submit, execute:

$ qsub script.qsub

• After submission, you can see the state of execution with the command:

$ qstat

job-ID  prior    name        user          state  submit/start at      queue                        slots  ja-task-ID
----------------------------------------------------------------------------------------------------------------------
159048  0.60500  strato_run  clustervisio  r      10/15/2013 23:17:59  hadoop.q@node011.cm.cluster  24
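A few more standard SGE commands are useful once the job is queued; a short sketch, where the job ID is the one printed by qsub/qstat:

$ qstat -j 159048     # detailed status and scheduling info for one job
$ qdel 159048         # cancel a queued or running job
$ qacct -j 159048     # accounting record once the job has finished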