• No results found

Performance of RDMA-Capable Storage Protocols on Wide-Area Network

N/A
N/A
Protected

Academic year: 2021

Share "Performance of RDMA-Capable Storage Protocols on Wide-Area Network"

Copied!
16
0
0

Loading.... (view fulltext now)

Full text

(1)

Performance of RDMA-Capable Storage

Performance of RDMA-Capable Storage

Protocols on Wide-Area Network

Protocols on Wide-Area Network

Weikuan Yu

Weikuan Yu

Nageswara S.V. Rao

Nageswara S.V. Rao

Pete Wyckoff*

Pete Wyckoff*

Jeffrey S. Vetter

Jeffrey S. Vetter

Ohio

(2)

InfiniBand Clusters around the World

InfiniBand Clusters around the World

Ranger (US)

SGI (US) CEA (France)

Tsubame (Japan) Dawning (China)

(3)

The Problem of Computing Islands

The Problem of Computing Islands

Islands of InfiniBand (IB) clusters

Islands of InfiniBand (IB) clusters

More IB clusters are deployed

More IB clusters are deployed

Some already connected, e.g. through

Some already connected, e.g. through

TeraGrid

TeraGrid

• But only via TCP/IP protocolsBut only via TCP/IP protocols

Data transfer across these islands

Data transfer across these islands

Need ever-greater data movement capabilities.

Need ever-greater data movement capabilities.

GridFTP, BBCP or other special storage configuration

GridFTP, BBCP or other special storage configuration

TCP performance on Long Distance can be low

TCP performance on Long Distance can be low

• With 10GigE on UltraScience Net (no tuning)With 10GigE on UltraScience Net (no tuning)

– – 9.2 Gbps at 0.2 mile9.2 Gbps at 0.2 mile – – 8.2 Gbps at 1400 miles8.2 Gbps at 1400 miles – – 2.3-2.5 Gbps at 6600+ miles2.3-2.5 Gbps at 6600+ miles

(4)

RDMA (IB) in Clusters and Local Area Networks

RDMA (IB) in Clusters and Local Area Networks

Sub-microsecond latency

Sub-microsecond latency

Superb bandwidth (32Gbps with IB QDR)

Superb bandwidth (32Gbps with IB QDR)

Heavily used for clustering

Heavily used for clustering

Getting popular in storage environment

Getting popular in storage environment

– NFS over RDMA (NFS over RDMA (NFSoRDMANFSoRDMA)) –

– SCSI RDMA Protocol (SRP)SCSI RDMA Protocol (SRP) –

– iSCSI over RDMA (iSCSI over RDMA (iSERiSER))

InfiniBand HCA Applications Verbs InfiniBand HCA MPI NFS/iSERI/SRP Applications Verbs MPI NFS/iSERI/SRP 1µsec

(5)

Sample Performance of RDMA-based Storage

Sample Performance of RDMA-based Storage

RDMA
enables
good
iSCSI
bandwidth
within
LAN

RDMA
enables
good
iSCSI
bandwidth
within
LAN

(6)

Feasibility of RDMA (IB) on WAN

Feasibility of RDMA (IB) on WAN

• Long-range Extensions for InfiniBand availableLong-range Extensions for InfiniBand available

– Network Equipment Technologies (NET): NX5010Network Equipment Technologies (NET): NX5010 –

– Obsidian Research: LongbowObsidian Research: Longbow •

• Long latency (10Long latency (1044~10~1055µsec)µsec)

• High bandwidth yet feasibleHigh bandwidth yet feasible

– Good Good distance scalability and tolerance to interfering trafficdistance scalability and tolerance to interfering traffic –

– Good network throughput and MPI-level PerformanceGood network throughput and MPI-level Performance

• Can RDMA provide a good transport protocol for storage on WAN?Can RDMA provide a good transport protocol for storage on WAN?

InfiniBand HCA Applications Verbs InfiniBand HCA MPI NFS/iSERI/SRP Applications Verbs MPI NFS/iSERI/SRP 104~105µsec

(7)

Experimental Environment

Experimental Environment

• HardwareHardware

– Long-range IB extension devices from NET (Network EquipmentLong-range IB extension devices from NET (Network Equipment Technologies, Inc)

Technologies, Inc) –

– Mellanox PCI-Express 4x DDR Mellanox PCI-Express 4x DDR HCAs HCAs (InfiniHost-III and Connect-X)(InfiniHost-III and Connect-X)

• Software PackagesSoftware Packages

– OFED-1.3 from OFED-1.3 from openfabricsopenfabrics.org.org

– Linux-2.6.25 with Linux-2.6.25 with NFSoRDMA NFSoRDMA and and iSER iSER supportsupport •

• Performance of RDMA-based Storage Protocols on WANPerformance of RDMA-based Storage Protocols on WAN

– NFS over RDMANFS over RDMA –

(8)

UltraScience Net at ORNL

UltraScience Net at ORNL

• Experimental WAN NetworkExperimental WAN Network

– Oak Ridge, Oak Ridge, Atlanta, Chicago, Seattle, and SunnyvaleAtlanta, Chicago, Seattle, and Sunnyvale –

– OC192 backbone connectionsOC192 backbone connections –

(9)

RDMA-based Transport

RDMA-based Transport

• • Request
and
request
becomes
pure
control
messages,Request
and
request
becomes
pure
control
messages, and
have
to
travel
long
distance
on
WAN and
have
to
travel
long
distance
on
WAN • • Use
of
RDMA
read
(round‐trip
operations)
for
clients
to
write
dataUse
of
RDMA
read
(round‐trip
operations)
for
clients
to
write
data • • Possible
additional
control
messages
for
NFSoRDMA
for
long
argumentsPossible
additional
control
messages
for
NFSoRDMA
for
long
arguments • • Further
fragmentation
due
to
the
use
of
page‐based
operationsFurther
fragmentation
due
to
the
use
of
page‐based
operations

(10)

RDMA on WAN

RDMA on WAN

• • RDMA
has
good
network‐level
performance
within
short
distance
WANRDMA
has
good
network‐level
performance
within
short
distance
WAN • • High
bandwidth
at
long
distance
is
only
possible
for
large
messagesHigh
bandwidth
at
long
distance
is
only
possible
for
large
messages • • Low
RDMA‐read
performance
for
page‐based
messages
(4KB),
even
atLow
RDMA‐read
performance
for
page‐based
messages
(4KB),
even
at 0.2
mile
when
using
InfiniHost‐III

(11)

NFS over RDMA

NFS over RDMA

• NFS
over
RDMA
achieves
goodNFS
over
RDMA
achieves
good

bandwidth
within
short
distancebandwidth
within
short
distance

(12)

NFS - Large block size

NFS - Large block size

• • NFS
over
IPoIB‐CM
benefits
from
large
block
sizeNFS
over
IPoIB‐CM
benefits
from
large
block
size • • NFS
over
RDMA
needs
to
support
large
block
size
for
better
fitNFS
over
RDMA
needs
to
support
large
block
size
for
better
fit on
long‐distance
WAN on
long‐distance
WAN

(13)

NFS over RDMA - using Connect-X

NFS over RDMA - using Connect-X

• Better
RDMA
read
in
connect‐X
improvesBetter
RDMA
read
in
connect‐X
improves

the
performancethe
performance

of
file
write
for
NFS
over
RDMA

of
file
write
for
NFS
over
RDMA

(14)

iSCSI over RDMA (

iSCSI over RDMA (

iSER

iSER

)

)

• • RDMA
enables
high‐performance
iSCSI
within
short
distanceRDMA
enables
high‐performance
iSCSI
within
short
distance • • RDMA
has
good
promise
over
long
distance
as
shown
with
largeRDMA
has
good
promise
over
long
distance
as
shown
with
large messages messages

(15)

Perspectives

Perspectives

Long-range InfiniBand

Long-range InfiniBand

InfiniBand over SONET is promising

InfiniBand over SONET is promising

Storage protocols are not yet exploiting the bandwidth

Storage protocols are not yet exploiting the bandwidth

potential of RDMA at long distance

potential of RDMA at long distance

RDMA-based Storage on WAN

RDMA-based Storage on WAN

Need to enable large block sizes

Need to enable large block sizes

Need to avoid page-based RDMA operations in NFS

Need to avoid page-based RDMA operations in NFS

• Utilize IB FRMR support to avoidUtilize IB FRMR support to avoid small RDMA operationssmall RDMA operations

(16)

Acknowledgment

Acknowledgment

Network Equipment Technologies, Inc

Network Equipment Technologies, Inc

– Andrew Andrew DiSilvestreDiSilvestre –

– Rich Rich EriksonErikson –

– Brad Brad ChalkerChalker

ORNL

ORNL

– Susan HicksSusan Hicks –

– Philip RothPhilip Roth

References

Related documents

In addition, retailers that were locally-owned were more likely to offer both larger craft sections, and fewer brands with ownership ties to big brewers.. Below I provide

In this case, when assessed against a wide range of sustainability criteria from a perspective that values those criteria, HS2 Phase I turned out to be the least

single-object read only transactions with a single RDMA read Transaction commit protocol optimized for RDMA operations. read validation during commit using a single

International protocols for blood pressure measuring device validation have been released and the European Society of Hypertension Working Group on Blood Pressure Measurement [11],

In Creating Your First Deck, you added notes using the Basic note type to create multiple cards with front and back fields—this is not to be confused with the front and back

05-20-2015 CIPANP 2015 Page 14 ATLAS: Differential cross sections: Boosted top.. • 20.3fb -1 at 8TeV

Active Directory Domain Services (ADDS) เป็น Roles บน Windows Server ที่มีไว้เพื่อใช้งานเกี่ยวกับการ ระบุตัวตน (ช่วย

• Storage options viable for Hadoop on Gordon • SSD via iSCSI Extensions for RDMA (iSER).. • Lustre filesystem (Data Oasis),