Running Hadoop and Stratosphere Jobs on the TomPouce Cluster (Inria) with SGE (Sun Grid Engine)


(1)

Running Hadoop and Stratosphere jobs on the TomPouce cluster

 

 

(2)

TomPouce cluster

TomPouce is a cluster of 20 computation nodes, 240 cores in total.
It is located in the Inria Turing building (École Polytechnique).
It is used jointly by Inria teams.
Jobs are run with the help of the SGE scheduler (Sun Grid Engine).

(3)

TomPouce cluster

SPECIFICATIONS:

Computation:
20 nodes × 2 processors × 6 cores = 240 cores
48 GB RAM per node
400 GB local disk per node

Storage:
Dell R510, /home, 19 TB, NFS
Dell R710 ×2, /scratch, 37 TB, FhGFS (FraunhoferFS)

Network:
Dell 5548 switch
Mellanox InfiniScale IV QDR InfiniBand switch

(4)
(5)

Execute a Hadoop/Stratosphere job

1. Copy your job from the local machine to the cluster front node

$ scp myjob.jar inria_username@195.83.212.209:~/

myjob.jar will be copied to the folder /home/leo/inria_username.
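If the job comes with auxiliary files, the same command copies a whole directory recursively (a sketch; mydir/ is a hypothetical directory name):

$ scp -r mydir/ inria_username@195.83.212.209:~/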

 

(6)

Execute a Hadoop/Stratosphere job

2. Connect via ssh to the front node

$ ssh inria_username@195.83.212.209

Welcome to Bright Cluster Manager 6.0. Based on Scientific Linux release 6. Cluster Manager ID: #120054
Use the following commands to adjust your environment:
'module avail'             - show available modules
'module add <module>'      - adds a module to your environment for this session
'module initadd <module>'  - configure module to be loaded at every login

IMPORTANT: To connect to the cluster, your ssh key should be stored in the Inria LDAP. If not, send an e-mail with your public ssh key to: helpmi-saclay@inria.fr
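If you do not have a key pair yet, OpenSSH's standard tools generate one and print the public half to send (a minimal sketch with the default key path):

$ ssh-keygen -t rsa        # generate a key pair, accepting the default path ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub    # the public key to include in your e-mail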

(7)

Execute a Hadoop/Stratosphere job

3. Log in as the clustervision superuser using your LDAP password

$ sudo su - clustervision

- Needed to execute Hadoop and Stratosphere jobs and to edit the required configurations.
- If you don't have enough permissions, ask for them at: helpmi-saclay@inria.fr

(8)

Execute a Hadoop/Stratosphere job

4. Add the Hadoop/Stratosphere environment to your session

To add the Hadoop environment, type:
$ module add hadoop/1.1.1

To add the Stratosphere environment, type:
$ module add stratosphere/stratosphere

- To add an environment automatically when you log in:
$ module initadd hadoop/1.1.1

- To check all the environments loaded:
$ module list

Currently Loaded Modulefiles:
  1) gcc/4.7.0      2) intel-cluster-checker/1.8   3) stratosphere/stratosphere-0.2.1
  4) sge/2011.11    5) openmpi/gcc/64/1.4.5        6) gromacs/openmpi/gcc/64/4.0.7
  7) hadoop/1.1.1
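To undo these operations, the environment-modules tool shown in the login banner has standard counterparts (a sketch, assuming the stock module command):

$ module rm hadoop/1.1.1         # unload the module from the current session
$ module initrm hadoop/1.1.1     # stop loading it automatically at every login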

(9)

Execute a Hadoop/Stratosphere job

4. Add the Hadoop/Stratosphere environment to your session

• Hadoop installation: /cm/shared/apps/hadoop/current/
• Stratosphere installation: /cm/shared/apps/stratosphere/current/

(10)

Execute a Hadoop/Stratosphere job

5. Create an execution script (Hadoop)

#!/bin/bash
#$ -N hadoop_run
#$ -pe hadoop 12
#$ -j y
#$ -o output.$JOB_ID
#$ -l h_rt=00:10:00,hadoop=true,excl=true
#$ -cwd
#$ -q hadoop.q

# Copy the input files into the HDFS filesystem
hadoop --config /home/guests/clustervision/current/ dfs -copyFromLocal /home/guests/clustervision/tmp /input

# Run the hadoop task(s) here, specifying the jar, class, and run parameters
hadoop --config /home/guests/clustervision/current/ jar myjob.jar org.myorg.job /input /output

# Copy the output files from the HDFS filesystem
hadoop --config /home/guests/clustervision/current/ fs -get /output

(11)


(12)

Execute a Hadoop/Stratosphere job

SGE execution parameters:
These should be written after '#$' at the beginning of the script.

-N <job_name>: gives a name to the job to run.
-pe <environment> N: specifies the parallel environment; N is the number of cores (limited to 180).
-j y: merge the error and standard output streams into the same output file.

(13)

Execute a Hadoop/Stratosphere job

SGE execution parameters:

-o output.$JOB_ID: the standard output will be in a file named output.$JOB_ID, where $JOB_ID is the number SGE assigns automatically to the job.
-l name=value: used to request a resource. In this case:
  h_rt=00:10:00 indicates that the job should be killed after 10 minutes
  hadoop=true indicates that the job to run is a Hadoop job (it DOES NOT CHANGE for Stratosphere jobs)
  excl=true indicates that the job runs exclusively, with no other jobs sharing its nodes

 

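Put together, a minimal annotated job header follows the same pattern as the scripts in these slides (a sketch; the job name and limits are illustrative):

#!/bin/bash
#$ -N my_run                                # job name (-N), shown by qstat
#$ -pe hadoop 12                            # parallel environment and core count (max 180)
#$ -j y                                     # merge errors into the standard output file
#$ -o output.$JOB_ID                        # output file, numbered automatically by SGE
#$ -l h_rt=00:10:00,hadoop=true,excl=true   # 10-minute wall clock, Hadoop job, exclusive nodes
#$ -cwd                                     # run from the directory where qsub was called
#$ -q hadoop.q                              # queue to submit to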

(14)

Execute a Hadoop/Stratosphere job

5. Create an execution script (Hadoop)

HADOOP COMMANDS

Copy input files into HDFS:
hadoop --config /home/guests/clustervision/current/ dfs -copyFromLocal /home/guests/clustervision/tmp /input

Run Hadoop tasks:
hadoop --config /home/guests/clustervision/current/ jar /pathToJob/myjob.jar org.myorg.job /input /output

Copy output files from HDFS:
hadoop --config /home/guests/clustervision/current/ fs -get /output
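To verify the results, the usual HDFS listing command works with the same --config option, and fs -get leaves a local copy in the working directory (a sketch assuming text output with Hadoop's standard part-file naming):

$ hadoop --config /home/guests/clustervision/current/ fs -ls /output   # list the results in HDFS
$ cat output/part-*                                                    # read the local copy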

(15)


(16)

Execute a Hadoop/Stratosphere job

5. Create an execution script (Stratosphere):

#!/bin/bash
#$ -N strato_run
#$ -pe stratosphere 24
#$ -j y
#$ -o output.$JOB_ID
#$ -l h_rt=00:10:00,hadoop=true,excl=true
#$ -cwd
#$ -q hadoop.q

export PATH=$PATH:'/cm/shared/apps/hadoop/current/conf/'
export STRATOSPHERE_HOME='/cm/shared/apps/stratosphere/current'
MASTER=`cat /home/guests/clustervision/current/masters`

hadoop --config /home/guests/clustervision/current/ dfs -copyFromLocal /home/guests/clustervision/tmp /var/hadoop/dfs.name.dir

$STRATOSPHERE_HOME/bin/pact-client.sh run -j myjob.jar -a 2 hdfs://$MASTER:50040/var/hadoop/dfs.name.dir/inputFile hdfs://$MASTER:50040/var/hadoop/dfs.name.dir/outputFile

hadoop --config /home/guests/clustervision/current/ fs -get /var/hadoop/dfs.name.dir/output

(17)

Execute a Hadoop/Stratosphere job

5. Create an execution script (Stratosphere):

#!/bin/bash
#$ -N strato_run
#$ -pe stratosphere 24
#$ -j y
#$ -o output.$JOB_ID
#$ -l h_rt=00:10:00,hadoop=true,excl=true
#$ -cwd
#$ -q hadoop.q

export PATH=$PATH:'/cm/shared/apps/hadoop/current/conf/'
export STRATOSPHERE_HOME='/cm/shared/apps/stratosphere/current'
MASTER=`cat /home/guests/clustervision/current/masters`

hadoop --config /home/guests/clustervision/current/ dfs -copyFromLocal /home/guests/clustervision/tmp /input

$STRATOSPHERE_HOME/bin/pact-client.sh run -j myjob.jar -a 2 hdfs://$MASTER:50040/input hdfs://$MASTER:50040/output

hadoop --config /home/guests/clustervision/current/ fs -get /output
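The MASTER variable is read from the cluster's masters file; before submitting, it can help to confirm it resolves to a hostname (a minimal sketch using the same file; the hostname shown is illustrative):

$ MASTER=`cat /home/guests/clustervision/current/masters`
$ echo $MASTER    # should print the master node's hostname, e.g. node011.cm.cluster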

 

 

(18)


(19)

Execute a Hadoop/Stratosphere job

5. Create an execution script (Stratosphere)

STRATOSPHERE COMMANDS

Copy input files into HDFS:
hadoop --config /home/guests/clustervision/current/ dfs -copyFromLocal /home/guests/clustervision/tmp /input

Run Stratosphere tasks:
$STRATOSPHERE_HOME/bin/pact-client.sh run -j /pathToJob/myjob.jar -a 2 hdfs://$MASTER:50040/input hdfs://$MASTER:50040/output

(-j names the job jar; -a passes the job's arguments: here 2, the number of subtasks, followed by the input and output paths.)

Copy output files from HDFS:
hadoop --config /home/guests/clustervision/current/ fs -get /output

(20)

Execute a Hadoop/Stratosphere job

6. Submission of a job

To submit, execute:
$ qsub script.qsub

After submission, you can see the state of execution with the command:
$ qstat

job-ID  prior    name        user          state  submit/start at      queue                        slots  ja-task-ID
----------------------------------------------------------------------------------------------------------------------
159048  0.60500  strato_run  clustervisio  r      10/15/2013 23:17:59  hadoop.q@node011.cm.cluster  24
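To cancel a submitted job, SGE's standard qdel command takes the job-ID column from qstat (159048 is the illustrative ID above):

$ qdel 159048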

 

(21)

Execute a Hadoop/Stratosphere job

6. Submission of a job

• Or, if you want more detailed information:
$ qstat -t

 

(22)

Execute a Hadoop/Stratosphere job

7. Logs

• /home/guests/clustervision/output.$JOB_ID: output of the job execution in SGE
• /home/guests/clustervision/config.$JOB_ID/logs: logs of the Hadoop file system
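To follow a job while it runs, tailing the SGE output file works (a sketch; 159048 is the illustrative job-ID from the qstat example):

$ tail -f /home/guests/clustervision/output.159048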

 
