Get the hardware:
*
• Raspberry Pi boards (as many as you like). To keep things simple we are going to use only 3. You should have at least one Pi (the Master) with a keyboard, mouse and monitor. You can also create an Open MPI + OpenMP cluster from regular x86 or x64 machines using the same procedure.
• A network router with at least 4 ports (3 Pis on 3 ports, plus an extra port if you want to extend the cluster by adding more switches or hubs).
*
Set up the first Pi by installing the Raspbian image
(http://www.raspberrypi.org/documentation/installation/installing-images/).
1. Start the image and log in to the Pi. Type “sudo raspi-config” to start the configuration screen. Go to Advanced Options and set the hostname to VOLTAIRE-1. This is going to be our Master Node.
Go to YouTube and play https://www.youtube.com/watch?v=LTP9FUIt0Eg or https://www.youtube.com/watch?v=h3cE9iXIx9c
Install OpenMPI by typing this into the terminal:
sudo apt-get install openmpi-bin openmpi-dev
*
Cloning
Now that everything needed has been set up, let's see if it runs.
2. Go to the terminal and type “mpiexec -f machinefile -n 1 hostname”. It should display the system's hostname, which is “VOLTAIRE-1”.
3. Create a directory on the Desktop and name it “Parallel”. Create a new file called mpi1.cpp and write your MPI code in it. See the last slide for the code of our cpp file.
4. Clone the memory cards and change their hostnames to “VOLTAIRE-2” and “VOLTAIRE-3”.
For cloning you can use either the dd command on Linux or Win32DiskImager on Windows.
Copy image on Linux: sudo dd bs=4M if=/dev/mmcblk0 of=~/Desktop/voltair-1.img
Write image on Linux: sudo dd if=~/Desktop/voltair-1.img of=/dev/mmcblk0 bs=4M
5. A command executed with mpiexec will run on all the nodes. The nodes will run mpi1.out, which is the output file, so it must be present in the same location on all the nodes. We are going to compile the source code over SSH by logging in to VOLTAIRE-2,
*
6. When the Master invokes the two Slaves over the Ethernet network, it needs to log in to the remote Slaves to execute the mpiexec command. Therefore we must create a way for the Master to access the Slaves without typing a password.
Log in to the router and find all the attached devices and the IP addresses given to them by the router's DHCP service. You can also assign static IPs instead.
Type this into the terminal of the Master node to allow passwordless login from the Master:
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub | ssh [email protected] "mkdir .ssh;cat >> .ssh/authorized_keys"
You may have to type “yes”. If it then asks for a password, the default for the Raspbian image is “raspberry” for the login “pi”.
*
7. On the Master, open a terminal and compile:
cd Desktop/Parallel
mpic++ -o mpi1.out mpi1.cpp
8. On the Master, open a terminal, log in to the Slave and do the same (compile):
ssh [email protected]
9. Let's run the program from the Master. In the terminal type:
mpiexec -n 2 -host 192.168.2.3,192.168.2.4 mpi1.out
(Labels on the command: Master, Slave, executable)
*
11. If all goes well you will see the code running:
In our code we have used SEND and RECEIVE. The two Slaves send the Master their “hostname” as an array of characters of size 100. The Master receives them and displays them. Simple. See our code on the last slide for the details.
The order in which the messages return has not been synchronized, but they do return. The Master closes the program when MPI::Finalize() is called.
Changing the code and making sure it is on all the nodes is a tedious task if you have, say, 64 nodes. Therefore you can use either NFS, or FTP with a script and another MPI program that fetches the source and compiles it.
NFS Share Executable: http://stackoverflow.com/questions/25829684/how-to-avoid-copying-executable-from-master-node-to-slaves-in-mpilibs
*
We suggest you also look at OpenMP on the slides below, as it will allow for proper utilization of the target slaves.
[Diagram: Master, Slave 1, Slave 2 and Slave 3, each with its own Memory and Core 1 to Core 4; each node carries an MPI label and an MP (OpenMP) label.]
What is OpenMP (Open Multi Processing)
It is a de facto standard API for writing shared-memory parallel applications in C, C++ and Fortran.
OpenMP is managed by the nonprofit technology consortium OpenMP Architecture Review Board (or OpenMP ARB), jointly defined by a group of major computer hardware and software vendors, including AMD, IBM, Intel, Cray, HP, Fujitsu, Nvidia, NEC, Red Hat, Texas Instruments, Oracle Corporation, and more.[1]
OpenMP uses a portable, scalable model that gives programmers a simple and flexible interface for developing parallel applications for platforms ranging from the standard desktop computer to the supercomputer.
*
OpenMP
• Compiler Directives
• Runtime Subroutines
• Environment variables
Directive (pragma): a language construct that specifies how a compiler (or assembler or interpreter) should process its input.
Older Generation Processors
Only one processor was used, and it had one Core.
*
[Diagram labels: Processor, Core, Memory]
Current Generation Processors
Today, multiple Cores are present on the same Processor.
[Diagram labels: Processor, Core 1, Core 2, Memory]
Sequential Program
Programs were written sequentially and used only 1 Core, even if multiple cores were available. But we want to use all of the cores.
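One way to put every core to work on a loop is an OpenMP work-sharing directive. The following is only a minimal sketch: the function name sum_to is hypothetical and the directive used (parallel for with a reduction) is one option among several. It assumes the file is compiled with the -fopenmp flag; without the flag the pragma is ignored and the loop simply runs on one core.

```cpp
#include <cassert>

// Hypothetical helper for illustration: adds the integers 1..n.
long long sum_to(int n) {
    assert(n >= 0);
    long long sum = 0;
    // The iterations are divided among the threads of the team. Each thread
    // accumulates a private partial sum, and OpenMP combines the partial sums
    // into `sum` when the loop ends.
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= n; ++i) {
        sum += i;
    }
    return sum;
}
```

Calling sum_to(100) from main() gives the same result as the sequential loop; only the way the work is spread over the cores changes.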
*
[Diagram: on the single-core Processor the Instructions go to one Core; on the multi-core Processor the Instructions still go only to Core 1 while Core 2, Core 3 and Core 4 stay unused.]
*
• OpenMP programs start with a single thread: the master thread.
• At the start of a parallel region the master creates a team of parallel “worker” threads (FORK).
• Statements in the parallel block are executed in parallel by every thread.
• At the end of the parallel region, all threads synchronize and join the master thread (JOIN).
[Diagram: Instructions reach FORK, run on Thread 0 (master), Thread 1, Thread 2 and Thread 3, then JOIN.]
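The fork-join sequence can be sketched in C++ as follows. This is an illustration under stated assumptions, not code from the slides: the function name run_parallel_region is hypothetical, and the small fallback for omp_get_thread_num() exists only so the sketch still builds when -fopenmp is not given (in which case a single thread runs the block).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Assumption for illustration: single-thread fallback when OpenMP is disabled.
inline int omp_get_thread_num() { return 0; }
#endif

// Hypothetical helper: returns the id of every thread that ran the parallel block.
std::vector<int> run_parallel_region() {
    std::vector<int> ids;  // sequential part: only the master thread runs here

    // FORK: the master creates the team and every thread executes the block.
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        // Only one thread at a time may modify the shared vector.
        #pragma omp critical
        ids.push_back(id);
    }
    // JOIN: all threads have synchronized; only the master continues.

    std::sort(ids.begin(), ids.end());
    assert(!ids.empty());  // the master (id 0) always takes part
    return ids;
}
```

With a team of four threads the returned vector holds the ids 0 to 3; the order in which the threads reached the block is not fixed, which is why the ids are sorted before returning.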
What are threads, cores, and how do they relate?
A thread is an independent sequence of execution of program code: a block of code with one entry and one exit. OpenMP threads are mapped onto physical cores. It is possible to map more than 1 thread onto a core.
Every program consists of two parts:
• Sequential part
The compiler is available for free from http://openmp.org/wp/openmp-compilers/
OpenMP v4.0 specification (July 2013). The GNU compilers (C++, C etc.) implement OpenMP through the library “libgomp”.
The manual can be found at: https://gcc.gnu.org/onlinedocs/libgomp/
#include <iostream>
#include <omp.h> // inclusion of the OpenMP header file
using namespace std;
int main() {
    #pragma omp parallel
    { // start of the parallel region: every thread runs this block
        cout << "Hello World" << endl;
    } // end of the parallel region
    return 0;
}
*
#include <iostream>
using namespace std;
int main() {
    int threads = 100; int id = 100; // fixed placeholder values
    cout << "Viewing Thread Number: " << id << " Of ";
    cout << threads << endl;
    return 0;
}
*
Environment Variables
OpenMP can control the number of threads used. It can be set using the following:
• Environment variable OMP_NUM_THREADS
• Runtime function omp_set_num_threads(n)
To get information about threads:
• Runtime function omp_get_num_threads()
  • Returns the number of threads in the parallel region
  • Returns 1 if called outside a parallel region
• Runtime function omp_get_thread_num()
  • Returns the id of the thread in the team
  • Value in [0, n-1], where n = number of threads
  • Master thread always has id 0
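The runtime functions listed above can be tried with a short sketch like the one below. The helper names team_size and team_size_outside_region are hypothetical, and the fallback definitions are an assumption added so the code still builds without -fopenmp. Note that omp_set_num_threads(n) is a request: the runtime may provide fewer threads than asked for.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#else
// Assumption for illustration: single-thread fallbacks when OpenMP is disabled.
inline void omp_set_num_threads(int) {}
inline int omp_get_num_threads() { return 1; }
#endif

// Hypothetical helper: requests `requested` threads and reports the team size
// actually seen inside the parallel region.
int team_size(int requested) {
    assert(requested > 0);
    omp_set_num_threads(requested);
    int n = 0;
    #pragma omp parallel
    {
        // Exactly one thread of the team stores the value.
        #pragma omp single
        n = omp_get_num_threads();
    }
    return n;
}

// Hypothetical helper: called outside any parallel region, so it returns 1.
int team_size_outside_region() {
    return omp_get_num_threads();
}
```

The same request can be made without recompiling by setting OMP_NUM_THREADS in the shell before starting the program.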
To activate the OpenMP extensions for C/C++, the compile-time flag -fopenmp must be specified.
Example: g++ -fopenmp -o hello.x hello.cpp
(g++ invokes the compiler, -fopenmp is the flag, hello.x is the executable.)
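A program can check at compile time whether the flag was actually given, because a compiler with OpenMP enabled defines the macro _OPENMP (its value is a date in yyyymm form). The sketch below is only an illustration; the function names openmp_version and build_mode are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the value of _OPENMP, or 0 when the program
// was compiled without OpenMP support.
long openmp_version() {
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}

// Hypothetical helper: a short description of how the program was built.
std::string build_mode() {
    long version = openmp_version();
    assert(version >= 0);
    return version > 0 ? "OpenMP enabled"
                       : "OpenMP disabled (pragmas are ignored)";
}
```

Printing build_mode() at program start is a quick way to notice that -fopenmp was forgotten, since the program would otherwise still compile and just run on one thread.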
*
OpenMPI C++ Test Code
#include <iostream>
#include <ctime>
#include <unistd.h> // gethostname()
#include <mpi.h>
using namespace std;

int main() {
    MPI::Init();
    int process = MPI::COMM_WORLD.Get_size(); // total number of processes
    int rank = MPI::COMM_WORLD.Get_rank();    // id of this process
    char host[100];
    char displayhost[100];
    gethostname(host, 100);
    cout << "Hostname: " << host << endl;
    cout << "Process : " << process << endl;
    cout << "Rank : " << rank << endl;
    cout << " + " << endl;
    if (rank == 1) {
        MPI::COMM_WORLD.Send(host, 100, MPI::CHAR, 0, 0); // send to Rank 0
    }
    if (rank == 2) {
        MPI::COMM_WORLD.Send(host, 100, MPI::CHAR, 0, 0); // send to Rank 0
    }
    if (rank == 0) {
        MPI::COMM_WORLD.Recv(displayhost, 100, MPI::CHAR, 1, 0); // receive from Rank 1
        cout << "Received Hostname: " << displayhost << endl;
        cout << "--- " << endl;
        MPI::COMM_WORLD.Recv(displayhost, 100, MPI::CHAR, 2, 0); // receive from Rank 2
        cout << "Received Hostname: " << displayhost << endl;
        cout << "--- " << endl;
    }
    MPI::Finalize(); // shut down MPI on every process
    return 0;
}