Evaluation of Java Message Passing in High Performance Data Analytics
Overview
• Performance of MPI Kernel Operations
  • Implementations based on Ohio MicroBenchmark suite
  • Evaluates MPI allreduce, and send and receive
• Performance of Deterministic Annealing Vector Sponge
  • Performance with pure MPI and MPI + threads
  • Threads come from Habanero Java library
• Terms
  • OMB – Ohio MicroBenchmark suite
  • DAVS – Deterministic Annealing Vector Sponge
  • OMPI-trunk – OpenMPI source tree revision 30301
  • OMPI-nightly – OpenMPI nightly snapshot version 1.9a1r28881
Performance of MPI Kernel Operations
[Figure: Performance of MPI send and receive operations. x-axis: message size (bytes), 0B to 512KB; y-axis: average time (us), 1 to 10000; series: MPI.NET C# in Tempest, FastMPJ Java in FG, OMPI-nightly Java FG, OMPI-trunk Java FG, OMPI-trunk C FG]
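The per-size averages in a plot like this typically come from an OMB-style timing loop: a few untimed warm-up calls, then a timed batch whose elapsed time is divided by the iteration count. The sketch below shows that pattern in plain Java under stated assumptions. `KernelTimer`, `averageMicros`, and `messageSizes` are hypothetical names, the power-of-two size sweep is assumed, and an in-memory array copy stands in for the MPI call, so the numbers it prints say nothing about MPI itself.

```java
import java.util.function.IntConsumer;

/** Hypothetical OMB-style timing loop; not taken from OMB or from the benchmarked code. */
final class KernelTimer {

    /** Calls op(size) warmup times untimed, then iters times timed; returns mean microseconds per call. */
    static double averageMicros(IntConsumer op, int size, int warmup, int iters) {
        if (iters <= 0) {
            throw new IllegalArgumentException("iters must be positive");
        }
        for (int i = 0; i < warmup; i++) {
            op.accept(size); // warm-up calls are excluded from the measurement
        }
        long start = System.nanoTime();
        for (int i = 0; i < iters; i++) {
            op.accept(size);
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000.0 / iters;
    }

    /** Message sizes from minBytes up to maxBytes, doubling each step (assumed sweep). */
    static int[] messageSizes(int minBytes, int maxBytes) {
        if (minBytes <= 0 || maxBytes < minBytes) {
            throw new IllegalArgumentException("need 0 < minBytes <= maxBytes");
        }
        int count = 0;
        for (long s = minBytes; s <= maxBytes; s *= 2) {
            count++;
        }
        int[] sizes = new int[count];
        long s = minBytes;
        for (int i = 0; i < count; i++, s *= 2) {
            sizes[i] = (int) s;
        }
        return sizes;
    }

    public static void main(String[] args) {
        byte[] src = new byte[4 * 1024 * 1024];
        byte[] dst = new byte[src.length];
        // Stand-in operation: a local copy, NOT an MPI send/receive or allreduce.
        IntConsumer copy = size -> System.arraycopy(src, 0, dst, 0, size);
        for (int size : messageSizes(4, src.length)) {
            System.out.printf("%8d B  %10.3f us%n", size, averageMicros(copy, size, 100, 1000));
        }
    }
}
```

In a real measurement the `IntConsumer` would wrap the send/receive or allreduce call of the library under test, and every rank would run the same loop.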
[Figure: Performance of MPI allreduce operation. x-axis: message size (bytes), 4B to 4MB; y-axis: average time (us), 10 to 100000; series: MPI.NET C# in Tempest, FastMPJ Java in FG, OMPI-nightly Java FG, OMPI-trunk Java FG, OMPI-trunk C FG]
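As a reminder of what the timed collective computes: allreduce combines one buffer per rank with a reduction operator and leaves the combined result on every rank. The following is a minimal single-process sketch of the sum case in plain Java. No MPI library is involved, `AllreduceSketch` and `allreduceSum` are hypothetical names, and the ranks are simulated as rows of an array.

```java
import java.util.Arrays;

/** Single-process illustration of allreduce with a sum operator; hypothetical names, no MPI involved. */
final class AllreduceSketch {

    /**
     * perRank[r] is the send buffer of simulated rank r. Returns one receive
     * buffer per rank, each holding the element-wise sum over all ranks.
     */
    static double[][] allreduceSum(double[][] perRank) {
        if (perRank.length == 0) {
            throw new IllegalArgumentException("need at least one rank");
        }
        int count = perRank[0].length;
        double[] total = new double[count];
        for (double[] buf : perRank) {
            if (buf.length != count) {
                throw new IllegalArgumentException("all buffers must have the same length");
            }
            for (int i = 0; i < count; i++) {
                total[i] += buf[i];
            }
        }
        double[][] result = new double[perRank.length][];
        for (int r = 0; r < perRank.length; r++) {
            result[r] = total.clone(); // every rank receives its own copy of the full result
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] out = allreduceSum(new double[][] {{1, 2}, {3, 4}, {5, 6}});
        for (double[] buf : out) {
            System.out.println(Arrays.toString(buf)); // each simulated rank holds [9.0, 12.0]
        }
    }
}
```

A real implementation exchanges the buffers over the network, which is why the measured time grows with message size.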
[Figure: x-axis: message size (bytes), 4B to 4MB; y-axis: average time (us), 1 to 1000000; series: OMPI-trunk C Madrid, OMPI-trunk Java Madrid, OMPI-trunk C FG, OMPI-trunk Java FG]
[Figure: x-axis: message size (bytes), 0B to 512KB; y-axis: average time (us), 1 to 10000; series: OMPI-trunk C Madrid, OMPI-trunk Java Madrid, OMPI-trunk C FG, OMPI-trunk Java FG]
Performance of MPI send and receive on
DAVS Performance
[Figure: DAVS Charge5 performance. x-axis: TxPxN (1x1x1, 1x1x2, 1x2x1, 1x1x4, 1x4x1, 1x1x8, 1x2x4, 1x4x2, 1x8x1); y-axis: time (hours), 0 to 1.2; series: MPI.NET, OMPI-nightly, OMPI-trunk]
[Figure: DAVS Charge5 speedup. x-axis: TxPxN (1x1x1, 1x1x2, 1x2x1, 1x1x4, 1x4x1, 1x1x8, 1x2x4, 1x4x2, 1x8x1); y-axis: speedup, 1 to 6; series: MPI.NET, OMPI-nightly, OMPI-trunk]
[Figure: DAVS Charge2 performance. x-axis: TxPxN (1x1x1, 1x1x2, 1x2x1, 1x1x4, 1x4x1, 1x1x8, 1x2x4, 1x4x2); y-axis: time (hours), 0 to 30; series: MPI.NET, OMPI-nightly, OMPI-trunk]
[Figure: DAVS Charge2 speedup. x-axis: TxPxN (1x1x1, 1x1x2, 1x2x1, 1x1x4, 1x4x1, 1x1x8, 1x2x4, 1x4x2); y-axis: speedup, 1 to 5.5; series: MPI.NET, OMPI-nightly, OMPI-trunk]
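The speedup axes start at 1 for the 1x1x1 configuration, which suggests speedup is taken relative to that run. The sketch below works through that arithmetic in Java under stated assumptions. `SpeedupSketch` and its methods are hypothetical names, TxPxN is read as threads per process x processes per node x nodes (the slides do not define it), and the timings in `main` are placeholder values, not numbers read from the charts.

```java
/** Hypothetical helper for TxPxN labels and speedup arithmetic; not taken from the DAVS code. */
final class SpeedupSketch {

    /** Parses a label such as "2x4x8" into {T, P, N}. */
    static int[] parseConfig(String label) {
        String[] parts = label.trim().split("x");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected TxPxN, got: " + label);
        }
        int[] tpn = new int[3];
        for (int i = 0; i < 3; i++) {
            tpn[i] = Integer.parseInt(parts[i]);
            if (tpn[i] <= 0) {
                throw new IllegalArgumentException("factors must be positive: " + label);
            }
        }
        return tpn;
    }

    /** Total parallel workers, ASSUMING T threads per process, P processes per node, N nodes. */
    static int totalParallelism(String label) {
        int[] tpn = parseConfig(label);
        return tpn[0] * tpn[1] * tpn[2];
    }

    /** Speedup relative to the baseline (assumed to be the 1x1x1 run): baselineTime / time. */
    static double speedup(double baselineTime, double time) {
        if (baselineTime <= 0 || time <= 0) {
            throw new IllegalArgumentException("times must be positive");
        }
        return baselineTime / time;
    }

    /** Parallel efficiency: speedup divided by total parallelism. */
    static double efficiency(double baselineTime, double time, String label) {
        return speedup(baselineTime, time) / totalParallelism(label);
    }

    public static void main(String[] args) {
        // Placeholder timings in hours, for illustration only.
        double baseline = 1.0;
        double parallel = 0.25;
        String label = "1x1x8";
        System.out.println("workers    = " + totalParallelism(label));
        System.out.println("speedup    = " + speedup(baseline, parallel));
        System.out.println("efficiency = " + efficiency(baseline, parallel, label));
    }
}
```

Dividing speedup by the product T·P·N gives a rough efficiency figure that makes configurations with the same total parallelism, such as 1x1x8, 1x2x4, 1x4x2 and 1x8x1, directly comparable.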
[Figure: x-axis: TxPxN (2x1x8, 4x1x8, 8x1x8, 1x2x8, 4x2x8, 1x4x8, 2x4x8, 1x8x8); y-axis: time (hours), 0 to 5; series: MPI.NET, OMPI-nightly, OMPI-trunk]
DAVS Charge5 performance w/ threads
[Figure: x-axis: TxPxN (2x1x8, 4x1x8, 8x1x8, 1x2x8, 4x2x8, 1x4x8, 2x4x8); y-axis: time (hours), 0 to 0.35; series: MPI.NET, OMPI-nightly, OMPI-trunk]
DAVS Performance on Single Node
[Figure: DAVS Charge2 performance on single node. x-axis: TxPxN (1x1x1); y-axis: time (hours), 0 to 35; series: OMPI-trunk Madrid, OMPI-trunk FG, MPI.NET Tempest]
[Figure: DAVS Charge6 performance on single node. x-axis: TxPxN (1x1x1); y-axis: time (s), 0 to 200; series: OMPI-trunk Madrid, OMPI-trunk FG, MPI.NET Tempest]
[Figure: DAVS Charge6 performance on single node with multiple processes. x-axis: TxPxN (1x4x1); y-axis: time (s), 0 to 140]