EFFICIENT BIG DATA TRANSFER (Networking for Big Data, Chapman)

We present four methods for speeding up Big Data transfer:

- data segmentation for multipathing and parallel TCP streams (in section “Data Segmentation for Multipathing and Parallel TCP Streams”);
- multihop path splitting (in section “Multihop Path Splitting”);
- hardware-supported dynamic bandwidth control for reducing delay and packet loss rate (in section “Hardware-Supported Dynamic Bandwidth Control: Aspera FASP”);
- BitTorrent, which improves Big Data transfer speed by making the best use of the bandwidth resources available on the Internet (in section “Optimizing Bandwidth: BitTorrent”).

The first three methods optimize throughput, and the last method increases the available bandwidth for Big Data transfer.

[Figure: a sender and a receiver linked by parallel TCP connections 1 to 5, plus TCP connection 0 for control.]

Data Segmentation for Multipathing and Parallel TCP Streams

Segmenting a Big Data block into small pieces is the basis for both multipathing and parallel TCP streams. Multiple streams may route data pieces through multiple paths to achieve different levels of parallelism. This means that one path may be used by more than one stream at different times.
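The following minimal Python sketch makes the segmentation idea concrete. It assumes that the whole block fits in memory. The helpers `segment`, `assign`, and `reassemble` are hypothetical names. Pieces are dealt to streams round-robin, and stream s is routed over path s mod the number of paths. This is just one simple policy chosen for illustration, and it shows how one path can carry more than one stream.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    index: int   # position of the piece in the original block
    offset: int  # byte offset of the piece in the original block
    data: bytes


def segment(blob: bytes, piece_size: int) -> list[Piece]:
    """Cut a data block into fixed-size pieces (the last one may be shorter)."""
    return [Piece(i, off, blob[off:off + piece_size])
            for i, off in enumerate(range(0, len(blob), piece_size))]


def assign(pieces: list[Piece], n_streams: int, n_paths: int) -> dict[int, dict[int, list[int]]]:
    """Return {path: {stream: [piece indices]}}.

    Pieces are dealt to streams round-robin, and stream s is routed over
    path s % n_paths, so one path carries several streams when
    n_streams > n_paths (a hypothetical policy for illustration).
    """
    plan: dict[int, dict[int, list[int]]] = {}
    for p in pieces:
        stream = p.index % n_streams
        path = stream % n_paths
        plan.setdefault(path, {}).setdefault(stream, []).append(p.index)
    return plan


def reassemble(pieces: list[Piece]) -> bytes:
    """Receiver side: pieces may arrive in any order, so sort by offset."""
    return b"".join(p.data for p in sorted(pieces, key=lambda p: p.offset))


blob = bytes(range(256)) * 40            # 10,240 bytes of sample data
pieces = segment(blob, piece_size=1024)  # 10 pieces
print(assign(pieces, n_streams=4, n_paths=2))
# {0: {0: [0, 4, 8], 2: [2, 6]}, 1: {1: [1, 5, 9], 3: [3, 7]}}
assert reassemble(list(reversed(pieces))) == blob
```

In this example, path 0 carries streams 0 and 2, and path 1 carries streams 1 and 3.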

Multipathing

Moving Big Data in and out of the data center is a challenge, because existing transport technology cannot fully use the end-to-end capacity of the underlying hardware platform, particularly over a wide area [18].

According to Aspera [19], leading web companies use storage systems such as the Hadoop Distributed File System (HDFS), the Google File System (GFS), and Amazon Dynamo. These systems often adopt object storage architectures that organize file data and its associated metadata as an “object.” They assume that file data is written into the storage system in small “chunks,” typically 64–128 MB, which are stored redundantly across many physical disks. For example, writing a 1 terabyte file requires dividing it into more than 10,000 chunks of 64 MB each. Cloud storage systems then use HTTP to PUT and GET each object chunk.

Multipathing builds on this chunking. It uses multiple independent routes to transfer disjoint chunks of a file to their destination simultaneously, for efficiency [20].
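The chunk arithmetic and the multipathing idea can be sketched as follows. The sketch assumes binary units (1 TB = 2^40 bytes, 1 MB = 2^20 bytes) and hypothetical path bandwidths. The greedy rule hands each chunk to the path on which it would finish earliest. This rule is an illustrative assumption, not the scheme of [20].

```python
import math

MB = 1024 ** 2
TB = 1024 ** 4


def chunk_count(file_size: int, chunk_size: int) -> int:
    """Number of fixed-size chunks needed to hold a file."""
    return math.ceil(file_size / chunk_size)


def multipath_schedule(n_chunks: int, chunk_size: int, path_bw: list[float]):
    """Send disjoint chunks over independent paths.

    path_bw holds each path's bandwidth in bytes/s (hypothetical values).
    Each chunk goes to the path on which it would finish earliest.
    Returns (chunks sent per path, time until the last chunk arrives).
    """
    free_at = [0.0] * len(path_bw)  # time at which each path becomes idle
    counts = [0] * len(path_bw)
    for _ in range(n_chunks):
        p = min(range(len(path_bw)),
                key=lambda i: free_at[i] + chunk_size / path_bw[i])
        free_at[p] += chunk_size / path_bw[p]
        counts[p] += 1
    return counts, max(free_at)


n = chunk_count(1 * TB, 64 * MB)
print(n)  # 16384, i.e., "more than 10,000" chunks
paths = [125e6, 62.5e6, 62.5e6]  # a 1 Gbps route and two 500 Mbps routes, in bytes/s
counts, seconds = multipath_schedule(n, 64 * MB, paths)
print(counts, round(seconds), "s vs", round(1 * TB / paths[0]), "s on the fastest path alone")
```

With decimal units (10^12 bytes and 64 × 10^6 bytes), the count is 15,625, which is also above 10,000.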

Parallel TCP Streams

Parallel TCP streams, as used for example in bbFTP [21] and GridFTP [22], send data over multiple flows in parallel to address the problem of large file transfers [23]. The relationship among throughput, number of streams, packet loss rate, and RTT is critical to optimizing the throughput of multiple concurrent file transfers [24]. Three observations follow:

- choosing the optimal number of concurrent threads at runtime may speed up data transfers;
- packet loss and delay may greatly reduce the throughput of FTPs;
- a mixed system using different FTPs may satisfy various Big Data applications, for example bbFTP for single big files and GridFTP for bulk small files.
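The steady-state TCP approximation of Mathis et al. is one common way to make the relationship among throughput, number of streams, loss rate, and RTT concrete. In this model, one flow achieves roughly (MSS/RTT)·(C/√p), with C = √(3/2). The sketch below rests on three assumptions: that model, independent streams with the same loss rate, and a hypothetical link capacity used as a cap. It is a back-of-the-envelope estimate, not the analysis of [24].

```python
import math


def mathis_bps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Mathis et al. steady-state estimate for one TCP flow, in bits/s:
    (MSS / RTT) * C / sqrt(p), with C = sqrt(3/2)."""
    return 8 * mss_bytes / rtt_s * math.sqrt(1.5) / math.sqrt(loss)


def parallel_bps(n_streams: int, mss_bytes: int, rtt_s: float,
                 loss: float, link_bps: float) -> float:
    """n independent streams, each following the estimate, capped by the link."""
    return min(link_bps, n_streams * mathis_bps(mss_bytes, rtt_s, loss))


def streams_to_fill(link_bps: float, mss_bytes: int, rtt_s: float, loss: float) -> int:
    """Smallest number of streams whose summed estimate reaches the link rate."""
    return math.ceil(link_bps / mathis_bps(mss_bytes, rtt_s, loss))


# Hypothetical path: 1 Gbps link, 100 ms RTT, 0.01% loss, 1460-byte segments.
for n in (1, 4, 16, 64):
    print(n, f"{parallel_bps(n, 1460, 0.1, 1e-4, 1e9) / 1e6:.0f} Mbps")
print(streams_to_fill(1e9, 1460, 0.1, 1e-4))
```

At these settings, a single stream gets roughly 14 Mbps, so about 70 streams would be needed to fill the link. Doubling the RTT or quadrupling the loss rate halves each stream's share.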

The bbFTP protocol [21] is open-source file transfer software. It implements its own transfer protocol, optimized for large files (larger than 2 GB), using parallel TCP streams [25]. The main strengths of bbFTP are:

- support for Secure Shell (SSH) and certificate-based authentication;
- on-the-fly data compression;
- customizable timeouts [26].

As shown in Figure 8.3a, the bbFTP protocol segments a Big Data file into n smaller subfiles. It then opens m (m ≤ n) FTP connections, depending on the number of streams required. Because m is often less than n, bbFTP reopens data connections to transfer each subsequent file. As a result, it sends small and medium-sized files at low throughput.

In addition, bbFTP uses a fixed TCP window size. It tries to balance stable settings for the number of parallel streams and the TCP window sizes used by different applications. Using a fixed TCP window size may benefit scenarios with high packet loss [27].
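A toy model illustrates the cost of reopening data connections. Everything below is an assumption made for illustration:

- files move in waves of m;
- `setup_s` lumps the handshake and ramp-up of a new connection into one hypothetical number;
- `reuse=True` stands for permanent connections of the kind GridFTP uses, included for contrast.

```python
import math


def transfer_time(n_files: int, file_bytes: int, m_streams: int,
                  stream_bps: float, setup_s: float, reuse: bool):
    """Rough completion time and connection count for n equal files sent
    over m parallel data connections, m files at a time.

    reuse=False: a new data connection per file (bbFTP-style, per the text),
                 so every wave of m files pays setup_s again.
    reuse=True:  permanent connections (GridFTP-style) pay setup_s once.
    """
    waves = math.ceil(n_files / m_streams)
    per_file_s = file_bytes * 8 / stream_bps  # time to push one file on one stream
    if reuse:
        return setup_s + waves * per_file_s, min(m_streams, n_files)
    return waves * (setup_s + per_file_s), n_files


# 1,000 small files (1 MB each) vs one 2 GB file cut into 16 subfiles,
# 4 streams of 100 Mbps each, 0.3 s setup per connection (all hypothetical).
for label, n, size in (("small files", 1000, 10**6), ("big file", 16, 2 * 10**9 // 16)):
    for reuse in (False, True):
        t, conns = transfer_time(n, size, 4, 100e6, 0.3, reuse)
        print(f"{label:11s} reuse={reuse!s:5s} {t:7.1f} s, {conns} connections")
```

In this model, the per-file setup cost dominates for many small files (95 s vs 20.3 s). For a few large subfiles it barely matters (41.2 s vs 40.3 s).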

Speedup of Big Data Transfer on the Internet ◾ 145

The GridFTP protocol opens permanent data connections that can be reused to transfer multiple files, as shown in Figure 8.3b. This allows it to achieve higher throughput with small files. For files bigger than what can be sent within one TCP window, GridFTP transfers a single file in parallel over several streams, a feature it shares with bbFTP [27]. GridFTP is therefore well suited to medium and large file transfers on WANs without packet loss. However, its throughput decreases when the network loses packets.
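The one-window criterion follows from the bandwidth-delay product. A stream with a fixed window W carries at most W per RTT, so filling a path of bandwidth B takes roughly ⌈B·RTT/W⌉ such streams. Here is a minimal sketch with hypothetical window, RTT, and bandwidth values.

```python
import math


def window_limited_bps(window_bytes: int, rtt_s: float) -> float:
    """One TCP stream can have at most one window in flight per RTT."""
    return window_bytes * 8 / rtt_s


def streams_needed(path_bps: float, rtt_s: float, window_bytes: int) -> int:
    """Fixed-window streams needed to cover the bandwidth-delay product."""
    bdp_bytes = path_bps * rtt_s / 8
    return math.ceil(bdp_bytes / window_bytes)


def worth_splitting(file_bytes: int, window_bytes: int) -> bool:
    """A file that fits in a single window gains little from extra streams."""
    return file_bytes > window_bytes


# Hypothetical WAN: 1 Gbps, 80 ms RTT, 64 KiB fixed window.
print(window_limited_bps(64 * 1024, 0.08) / 1e6)   # about 6.55 Mbps per stream
print(streams_needed(1e9, 0.08, 64 * 1024))        # 153
print(worth_splitting(32 * 1024, 64 * 1024), worth_splitting(10**9, 64 * 1024))  # False True
```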

Parallel TCP streams can also increase throughput by multiplexing and demultiplexing data blocks. The side effect is significant reordering of data blocks when applications multiplex them with the obvious round-robin scheduling algorithm. The reordering arises from differences in transmission rate between the individual TCP streams. It forces the demultiplexing receiver to buffer out-of-order data blocks, which consumes memory and can cause the receiving application to stall [28].

Hacker et al. [28] therefore propose an adaptive weighted scheduling approach for multiplexing data blocks over a set of parallel TCP streams. Compared with the scheduling approach used by GridFTP, it:

- reduces reordering of data blocks between individual TCP streams;
- maintains the aggregate throughput gains of parallel TCP;
- consumes less receiver memory for buffering out-of-order packets.
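A small simulation reproduces the reordering effect. It is an idealized stand-in, not the algorithm of Hacker et al. [28]. It assumes each stream delivers blocks at a known, constant rate, whereas an adaptive scheme has to estimate those rates from observed throughput. `earliest_finish` is a hypothetical weighted policy that sends each block to the stream that would deliver it first.

```python
def round_robin(n_blocks: int, rates: list[float]) -> list[int]:
    """Block i goes to stream i mod k, ignoring stream speeds."""
    return [i % len(rates) for i in range(n_blocks)]


def earliest_finish(n_blocks: int, rates: list[float]) -> list[int]:
    """Weighted choice: each block goes to the stream that delivers it first,
    so faster streams receive proportionally more blocks."""
    free_at = [0.0] * len(rates)
    plan = []
    for _ in range(n_blocks):
        s = min(range(len(rates)), key=lambda j: free_at[j] + 1.0 / rates[j])
        free_at[s] += 1.0 / rates[s]
        plan.append(s)
    return plan


def arrival_times(plan: list[int], rates: list[float]) -> list[float]:
    """Each stream delivers its own blocks in FIFO order at a constant rate."""
    free_at = [0.0] * len(rates)
    times = []
    for s in plan:
        free_at[s] += 1.0 / rates[s]
        times.append(free_at[s])
    return times


def peak_reorder_buffer(times: list[float]) -> int:
    """Most blocks the receiver must hold while waiting for an earlier block."""
    held, next_needed, peak = set(), 0, 0
    for blk in sorted(range(len(times)), key=lambda b: (times[b], b)):
        held.add(blk)
        while next_needed in held:  # release the in-order prefix
            held.remove(next_needed)
            next_needed += 1
        peak = max(peak, len(held))
    return peak


rates = [4.0, 2.0, 1.0, 1.0]  # hypothetical blocks/s of four TCP streams
for name, policy in (("round robin", round_robin), ("weighted", earliest_finish)):
    times = arrival_times(policy(400, rates), rates)
    print(f"{name:11s} buffer peak {peak_reorder_buffer(times):3d}, done at {max(times):.0f} s")
```

With round robin, the slow streams hold back the in-order prefix, so the receiver buffers on the order of a hundred blocks. The weighted policy delivers blocks in order and needs no reorder buffer in this idealized setting.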

Multihop Path Splitting

Multihop path splitting [29] replaces a direct TCP connection between source and destination with a multihop chain through intermediate nodes. A split-TCP connection may perform better than a single end-to-end TCP connection for two reasons:

- The RTT of each intermediate hop is shorter than that of the direct end-to-end path. TCP's congestion control therefore quickly senses the maximum throughput and reaches steady state, where it delivers the maximal possible throughput until a congestion event occurs.
- A packet loss is not propagated all the way back to the source, only to the previous intermediate hop.

If the bandwidth of each intermediate hop is higher than that of the direct path, the overall throughput can be improved. An example is given in section “One-to-All Broadcast of Big Data.” Building on the pipelining parallelism scheduling of He et al. [30], Theorem 8.1 proves that, under the stated conditions, multihop transfer is more efficient than single-hop transfer.

FIGURE 8.3 A comparison of two FTPs: (a) bbFTP for a big file, which segments a Big Data file into n smaller subfiles and uses standard FTP to transmit them; (b) GridFTP for bulk small files, which uses a permanent TCP connection.

Theorem 8.1

We assume that R (R ≥ 1) gigabytes of data on the source node are split into k pieces of equal size. We also assume that the greatest bandwidth is less than f (f ≪ k) times the smallest bandwidth. When data are moved using pipelining parallelism, a multihop path (w ≪ k hops) is more efficient than a single hop if the smallest bandwidth on the multihop path is greater than the bandwidth of the single hop.

Proof

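We analyze the pipelining time shown in Figure 8.4. Suppose the hop bandwidths of the multihop path in Figure 8.4a, sorted from low to high, are $b_1, b_2, \ldots, b_w$. The per-unit transfer times then satisfy $t_1 = 1/b_1 > t_2 = 1/b_2 > \cdots > t_w = 1/b_w$. The pipelining time is

$$T_{\text{pipelining}} = kT + \Delta T \tag{8.1}$$

Here $T = R/(k b_1)$ is the time for one piece to cross the slowest hop. $\Delta T$ is the extra time the first piece spends crossing the other hops:

$$\Delta T = \frac{R}{k}\sum_{i=2}^{w}\frac{1}{b_i} \le \frac{(w-1)R}{k b_2}.$$

Because $b_2 \ge b_1$ and $w - 1 \ll k$,

$$\frac{(w-1)R}{k b_2} \le \frac{(w-1)R}{k b_1} \ll \frac{R}{b_1} = kT,$$

so $kT \gg \Delta T$. Thus,

$$T_{\text{pipelining}} \approx kT = \frac{R}{b_1} \tag{8.2}$$

For a single hop with bandwidth $b < b_1$, $T_{\text{pipelining}} = R/b_1 < T_{\text{singlehop}} = R/b$. This proves Theorem 8.1.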
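Equations 8.1 and 8.2 can be checked numerically. The sketch below simulates store-and-forward pipelining of k equal pieces, hop by hop. It compares the result with kT + ΔT, with R/b_1, and with a single hop of bandwidth b < b_1. The units (gigabytes and gigabytes per second) and all values are hypothetical.

```python
def pipeline_time(R: float, k: int, hop_bw: list[float]) -> float:
    """Store-and-forward k equal pieces through the hops in path order.

    A piece starts on a hop once it has left the previous hop and the
    previous piece has left this hop.
    """
    piece = R / k
    done = [0.0] * len(hop_bw)  # when the latest piece finished on each hop
    for _ in range(k):
        ready = 0.0  # the source holds all pieces at time 0
        for h, b in enumerate(hop_bw):
            done[h] = max(done[h], ready) + piece / b
            ready = done[h]
    return done[-1]


def equation_8_1(R: float, k: int, hop_bw: list[float]) -> float:
    """kT + dT with T = R/(k*b1) and dT = (R/k) * sum over i >= 2 of 1/b_i."""
    b = sorted(hop_bw)
    T = R / (k * b[0])
    delta_T = (R / k) * sum(1.0 / x for x in b[1:])
    return k * T + delta_T


# Hypothetical: R = 10 GB, k = 1000 pieces, hop bandwidths in GB/s.
R, k, hops, single_hop_bw = 10.0, 1000, [1.0, 2.0, 1.5], 0.8
print(pipeline_time(R, k, hops), equation_8_1(R, k, hops), R / min(hops), R / single_hop_bw)
```

The simulated time matches kT + ΔT regardless of where the slowest hop sits on the path. It stays within a fraction of a percent of R/b_1, well below the single-hop time R/b.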

Hardware-Supported Dynamic Bandwidth Control: Aspera FASP

Aspera FASP [16] uses hardware-supported dynamic bandwidth control to transfer large files with significantly higher throughput than standard FTP and TCP transfers. It does so whether or not the links suffer long delays and high loss rates [31].

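[Figure 8.4: (a) a multihop path through nodes A1, A2, …, Aw with per-hop times t1, …, tw; (b) the pipelining timeline for pieces T1, T2, …, Tk, with total time kT + ΔT.]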


The Mechanism of Aspera FASP

A limitation of WAN transport is that round-trip latency and packet loss are enough to limit the achievable throughput. The limit is below 100 Mbps on a WAN and below 10 Mbps over international WANs [19].

Aspera FASP provides dynamic bandwidth control with hardware support. It manages priority and bandwidth, and it gives up TCP's coupling of rate control and reliability, even though this may cause congestion. It therefore achieves higher efficiency and bandwidth utilization.
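FASP's internals are proprietary, so the following is only a toy model of the general idea of decoupling rate control from reliability. It is not Aspera's algorithm. It assumes:

- a fixed sending rate that is never reduced on loss;
- a receiver that reports missing sequence numbers once per round;
- random, independent loss.

All parameter values are hypothetical.

```python
import random


def nack_blast(n_packets: int, loss: float, rate_pps: float, rtt_s: float, seed: int = 1):
    """Toy rate-based transfer with selective retransmission.

    Each round sends every outstanding packet at a fixed rate (the rate is
    never cut on loss), then waits one RTT for the receiver's list of
    missing sequence numbers; only those are sent again.
    Returns (elapsed seconds, rounds, packets sent).
    """
    rng = random.Random(seed)
    missing = list(range(n_packets))
    elapsed, rounds, sent = 0.0, 0, 0
    while missing:
        rounds += 1
        sent += len(missing)
        elapsed += len(missing) / rate_pps + rtt_s
        missing = [seq for seq in missing if rng.random() < loss]  # lost again
    return elapsed, rounds, sent


# Hypothetical: 100,000 packets, 5% loss, 10,000 packets/s, 120 ms RTT.
elapsed, rounds, sent = nack_blast(100_000, 0.05, 10_000, 0.12)
print(f"{elapsed:.2f} s in {rounds} rounds, {sent} packets sent")
print(f"goodput {100_000 / elapsed:.0f} packets/s of {10_000} sent per second")
```

Because loss only triggers retransmission and never a rate cut, goodput in this model stays close to the sending rate times (1 − loss). Each extra round adds only one RTT.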

Aspera FASP is limited only by the available network bandwidth and the hardware resources at both ends of the transfer. It enables transfer speeds 100 to 1000 times faster than standard TCP under the same conditions [16].

According to Munson [16], Aspera FASP achieves transfers of 10 Gbps and above over the global WAN. It does so by combining systems based on Intel® Xeon® processor E5-2600 products with Aspera's FASP transport. The supporting hardware features include:

• “Intel® Data Direct I/O Technology (Intel® DDIO), which allows Intel® Ethernet controllers to route I/O traffic directly to the processor cache.”

• “Built-in support for Single-Root I/O Virtualization (SR-IOV), which allows virtual machine platforms to bypass the hypervisor in order to directly access resources on the physical network interface.”

The Performance of Aspera FASP

In transfer throughput tests with FASP [31], packet loss and delay do not affect throughput much when larger files are transferred; only high loss rates do. Also, the bigger the data in these tests, the worse the delay and packet loss rate. For example, packet loss increases from 5% (50 GB) to 10% (100 GB). Round-trip delay (RTD) climbs from 120 ms (50 GB) to 240 ms (100 GB).

Table 8.1 lists the throughput results of transferring a 2 GB file with FASP and FTP, as reported by Keltsch and Hammer [31]:

- With no delay and no loss, FTP reaches an average throughput of only 172 Mbps, while FASP reaches 930 Mbps.
- With 120 ms delay and 0% packet loss, FTP reaches an average of only 320 Kbps, while FASP still keeps about 930 Mbps.
- With 5% packet loss and no delay, FTP throughput is 120 Kbps and FASP throughput is 880 Mbps.

This test confirms that packet loss affects throughput more than delay does. The reason is that a packet loss in a TCP connection translates into an unknown delay, as shown in Figure 8.1.

TABLE 8.1 FASP versus FTP

Test Environment                   FASP Throughput   FTP Throughput
No delay and 0% packet loss        930 Mbps          172 Mbps
120 ms delay and 0% packet loss    930 Mbps          320 Kbps
No delay and 5% packet loss        880 Mbps          120 Kbps
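A few lines of arithmetic check the ratios behind Table 8.1. The sketch assumes 1 GB = 10^9 bytes and ignores protocol overhead. The throughput figures are those in the table.

```python
# Throughputs from Table 8.1 in bits/s: (FASP, FTP).
TABLE_8_1 = {
    "no delay, 0% loss": (930e6, 172e6),
    "120 ms delay, 0% loss": (930e6, 320e3),
    "no delay, 5% loss": (880e6, 120e3),
}
FILE_BITS = 2 * 8e9  # 2 GB test file, assuming 1 GB = 10**9 bytes


def speedup(fasp_bps: float, ftp_bps: float) -> float:
    return fasp_bps / ftp_bps


def transfer_seconds(bits: float, bps: float) -> float:
    return bits / bps


for env, (fasp, ftp) in TABLE_8_1.items():
    print(f"{env:22s} FASP/FTP = {speedup(fasp, ftp):8,.1f}x   "
          f"FASP {transfer_seconds(FILE_BITS, fasp):5.1f} s   "
          f"FTP {transfer_seconds(FILE_BITS, ftp):9,.0f} s")
```

At the reported rates, FASP moves the 2 GB file in about 17 to 18 seconds in every environment. FTP needs about 14 hours with 120 ms delay and about 37 hours with 5% loss.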

Table 8.2 compares Aspera FASP and standard TCP transferring a 2 GB file in parallel over the same link, based on results from Keltsch and Hammer [31]:

- First test (120 ms RTD, no loss): a single TCP session reaches 500 Kbps, while FASP reaches 948 Mbps.
- Second test (40 ms RTD, no loss): four TCP sessions were run simultaneously instead of one, and together they reached a total throughput of around 930 Mbps.
- Third test (40 ms RTD, no loss): Aspera FASP was started first and reached 930 Mbps. Four TCP sessions were then started in parallel, and FASP left almost half of the available bandwidth to the TCP transfer.

Optimizing Bandwidth: BitTorrent

BitTorrent [32] is a peer-to-peer file-sharing protocol, the counterpart of FTP in the client/server paradigm, that aggregates the available bandwidth between data centers. BitTorrent is simple, widely used, and effective in most cases. It uses a simple tracker to coordinate a complex set of participating nodes and tries to use as much of the underlying network bandwidth as possible [14]. As we have analyzed, its performance depends on the size and behavior of the group of downloaders [33]. BitTorrent [32] is widely used to transmit massive files by replacing a single data source with a distribution of multiple sources. This increases the bandwidth resources involved in the network.
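The standard fluid model of file distribution found in introductory networking textbooks makes the bandwidth argument concrete. Let N be the number of downloaders, F the file size, u_s the server upload rate, u_i the peer upload rates, and d_min the slowest download rate. Then:

- client/server distribution takes at least max(N·F/u_s, F/d_min);
- peer-to-peer distribution takes at least max(F/u_s, F/d_min, N·F/(u_s + Σu_i)).

The rates used below are hypothetical.

```python
def client_server_time(F: float, N: int, u_s: float, d_min: float) -> float:
    """Lower bound: the server alone must upload N copies of F bits."""
    return max(N * F / u_s, F / d_min)


def p2p_time(F: float, N: int, u_s: float, peer_up: list[float], d_min: float) -> float:
    """Lower bound: the server uploads at least one copy, and peers add their
    upload capacity to the N copies that must be delivered."""
    return max(F / u_s, F / d_min, N * F / (u_s + sum(peer_up)))


# Hypothetical: 1 GB file, 100 downloaders, 1 Gbps server, 100 Mbps peer
# uploads, 1 Gbps slowest download link.
F, N = 8e9, 100
print(client_server_time(F, N, 1e9, 1e9))             # 800.0 seconds
print(round(p2p_time(F, N, 1e9, [1e8] * N, 1e9), 1))  # 72.7 seconds
```

In the client/server case the bound grows linearly with N. In the peer-to-peer case, every new downloader also adds upload capacity, which is the "increased bandwidth resources" effect described above.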

Figure 8.5 illustrates the main technique of BitTorrent [14].

In Figure 8.5a, a static file with the extension .torrent is put on an ordinary web server. The .torrent file contains information about the file (its length and name), hashing information, and the URL of a tracker. BitTorrent cuts files into pieces of fixed size (e.g., 1/4 MB), and the .torrent file includes the SHA1 hashes of all the pieces.

Trackers help downloaders find each other. Trackers and downloaders use a simple protocol layered on top of HTTP (HTTP runs over TCP*). A downloader sends information such as which file it is downloading and which port it is listening on. The tracker responds with a list of contact information for peers that are downloading the same file. Downloaders then use this information to connect to each other.

In Figure 8.5b, a downloader that has the complete file is called a seed. A seed must send out at least one complete copy of the original file. BitTorrent transfers data over TCP. To keep requests pipelined on each connection, it breaks pieces (e.g., 256 KB) further into subpieces (e.g., 16 KB). BitTorrent adopts various piece selection schemes:

- strict priority, where the remaining subpieces of a piece already being downloaded have higher priority;
- random first piece;
- rarest first.
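Here is a minimal sketch of the piece bookkeeping described above, using Python's standard `hashlib` for SHA-1. It is not the BitTorrent wire protocol or the bencoded .torrent file format. The helper names, the in-memory stand-in file, and the peer piece sets are hypothetical, and strict priority is not modeled.

```python
import hashlib
import random
from collections import Counter
from typing import Optional

PIECE = 256 * 1024    # piece size (the text's 256 KB example)
SUBPIECE = 16 * 1024  # subpiece size requested over the wire


def piece_hashes(data: bytes, piece_size: int = PIECE) -> list[bytes]:
    """SHA-1 digest of every piece, as a .torrent file would list them."""
    return [hashlib.sha1(data[i:i + piece_size]).digest()
            for i in range(0, len(data), piece_size)]


def verify(index: int, piece: bytes, hashes: list[bytes]) -> bool:
    """Check a downloaded piece against its published digest."""
    return hashlib.sha1(piece).digest() == hashes[index]


def subpieces(piece: bytes, size: int = SUBPIECE) -> list[tuple[int, int]]:
    """(offset, length) requests that cover one piece."""
    return [(off, min(size, len(piece) - off)) for off in range(0, len(piece), size)]


def next_piece(have: set[int], peers: list[set[int]], rng: random.Random) -> Optional[int]:
    """Random first piece, then rarest first among pieces that peers offer."""
    counts = Counter(p for peer in peers for p in peer if p not in have)
    if not counts:
        return None  # nothing new is available from these peers
    if not have:
        return rng.choice(sorted(counts))  # get a first complete piece quickly
    rarest = min(counts.values())
    return rng.choice(sorted(p for p, c in counts.items() if c == rarest))


data = random.Random(0).randbytes(1_000_000)  # stand-in for the shared file
hashes = piece_hashes(data)
print(len(hashes), verify(1, data[PIECE:2 * PIECE], hashes))  # 4 True
peers = [{0, 1, 2, 3}, {0, 1, 2}, {0, 2}]  # hypothetical peer piece sets
print(next_piece({0}, peers, random.Random(1)))                 # 3 (held by one peer)
```

Choosing the rarest piece first spreads copies of scarce pieces quickly. A random first piece helps a new downloader finish something it can upload to others as early as possible.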

* Tim Berners-Lee, The Original HTTP as defined in 1991, World Wide Web Consortium. http://www.w3.org/Protocols/HTTP/AsImplemented.html.

TABLE 8.2 FASP versus TCP

                                   Throughput
Test Environment                   FASP          1 TCP Session   4 TCP Sessions in Parallel
120 ms RTD and no packet loss      948 Mbps      500 Kbps        -
40 ms RTD and no packet loss       -             -               930 Mbps
40 ms RTD and no packet loss       465 Mbps(a)   -               465 Mbps(a)

(a) Estimated.

