
BGP Design for the Separation of the Local Big Data Traffic in Data Center




Hyoung Woo Park¹, Il-Yeon Yeo² and Sang Oh Park³,*

¹,²,³ Korea Institute of Science and Technology Information, South Korea

¹[email protected], ²[email protected], ³[email protected]

Abstract

As services for big data increase, the number of servers used to analyze big data grows rapidly. For example, the analysis servers for the big data from the Large Hadron Collider (LHC) at CERN in Switzerland generally number in the tens of thousands. Selecting an appropriate local routing protocol, and configuring it efficiently for those servers, is therefore becoming an increasingly important part of local network administration in a big data center. In this paper, we introduce BGP as a routing protocol for the local network of a big data center, even though it was developed for the WAN. The concept we introduce for separating big data traffic in a local data center network is based on the partitioning of conference rooms in a convention center. A main conference room can be partitioned into multiple small meeting rooms according to users' requests, and each small room is protected from the noise of neighboring rooms by partition walls. In the same way, a huge number of servers can be grouped into server farms, and each server farm can be identified by an Autonomous System (AS) number that defines a BGP routing domain for the network administrator. Using these AS numbering techniques, big data traffic in the data center can be controlled more efficiently than before. We introduce a new way of applying AS numbers so that the traffic of a particular service can be recognized, which allows big data traffic to be separated and controlled in finer detail. We argue that BGP is superior to both IGPs and overlay networking with respect to routing control in a big data center.

Keywords: BGP, Local Routing Protocol, Big Data Center

1. Introduction

Big data is so large that a great number of servers is required to analyze it. Large numbers of servers and heavy big data traffic have become typical features of a big data center and, at the same time, have created new problems in network management. One of these problems concerns the routing protocol. The local network of a legacy data center usually runs IGPs (Interior Gateway Protocols) such as RIP (Routing Information Protocol), OSPF (Open Shortest Path First) [1], and IGRP (Interior Gateway Routing Protocol). The techniques that IGPs use for IP routing include default routing, best-path-first selection, IP-address-based routing, and dynamic route propagation. These techniques strongly tend to make IP packets converge on a single best path, often the default route. This convergence frequently leaves the LAN (Local Area Network) of a legacy data center congested for long periods [2], and the problem grows more serious as big data services become more popular. As one solution, many network researchers and experts have therefore proposed new network architectures [3, 4] for big data centers, with the aim of making it easier to separate big data traffic from existing traffic.
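The convergence effect described above can be illustrated with a toy model. The sketch below assumes a hypothetical two-leaf, two-spine topology and deliberately models single-best-path routing with no equal-cost multipath (modern OSPF deployments can enable ECMP, so this is the worst case the paragraph describes). Every flow between the two leaves lands on the same spine, while an equally good spine carries nothing.

```python
import heapq
from collections import Counter

# Hypothetical two-leaf, two-spine fragment; weights are IGP link costs.
# Both spines offer an equal-cost path between the leaves.
GRAPH = {
    "leaf1": {"spine1": 10, "spine2": 10},
    "leaf2": {"spine1": 10, "spine2": 10},
    "spine1": {"leaf1": 10, "leaf2": 10},
    "spine2": {"leaf1": 10, "leaf2": 10},
}


def single_best_path(graph, src, dst):
    """Dijkstra that returns exactly one lowest-cost path.

    Ties are broken deterministically (lexicographically smallest path),
    modelling single-best-path routing with no ECMP.
    """
    queue = [(0, [src])]
    settled = set()
    while queue:
        cost, path = heapq.heappop(queue)
        node = path[-1]
        if node == dst:
            return path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, weight in graph[node].items():
            if neighbor not in settled:
                heapq.heappush(queue, (cost + weight, path + [neighbor]))
    return None


def link_load(graph, flows):
    """Count how many flows cross each directed link."""
    load = Counter()
    for src, dst in flows:
        path = single_best_path(graph, src, dst)
        for a, b in zip(path, path[1:]):
            load[(a, b)] += 1
    return load


if __name__ == "__main__":
    flows = [("leaf1", "leaf2")] * 100  # 100 big data flows between two racks
    load = link_load(GRAPH, flows)
    print(load[("leaf1", "spine1")], load[("leaf1", "spine2")])  # 100 0
```

The equal-cost spine2 stays idle, which is the kind of congestion on a single path that motivates separating big data traffic.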


In this paper, we propose introducing BGP (Border Gateway Protocol) [5][6] as the internal routing protocol for the LAN of a big data center. BGP uses AS (Autonomous System) numbers to control the routes that IP packets follow. Routing based on AS numbers is more flexible than routing based on IP addresses, because an IP address is used both to identify a network device and to route packets to it. Changing the IP addresses of network interfaces merely to change IP routing is therefore not advisable.

The concept we introduce in this paper for separating big data traffic in a local data center network is based on the partitioning of conference rooms in a convention center. A main conference room can be partitioned into multiple small meeting rooms according to users' requests, and each small room is protected from the noise of neighboring rooms by partition walls. In the same way, a huge number of servers can be grouped into server farms, and each server farm can be identified by an AS number that is used both for the BGP routing domain and for the routing policy set by the network administrator. Using these AS numbering techniques, big data traffic in the data center can be controlled more efficiently than before. We also introduce a new way of applying AS numbers to recognize the traffic of a particular service, so that big data traffic can be separated and controlled in finer detail. We argue that BGP is superior to both IGPs and overlay networking with respect to routing control in a big data center. Figure 1 and Figure 2 show the concept of our approach.

Figure 1. Dynamic Allocation of Meeting Rooms in Convention Center

Figure 2. Dynamic Traffic Separation by AS Numbering in Big Data Center

2. Related Research

As legacy data centers begin to serve big data, separating big data traffic from existing data traffic becomes an important part of traffic management for network administrators. Most data centers use IGPs as the routing protocol for traffic management. As mentioned above, IGPs were born as routing protocols for the LAN, and they are limited in their ability to separate and control traffic according to user service policies. In contrast, EGPs (Exterior Gateway Protocols) such as BGP, together with related WAN technologies such as overlay networking, were developed specifically to separate and control traffic in the WAN (Wide Area Network), because the many ISPs (Internet Service Providers) on the WAN must collaborate closely for their own benefit. As a result, EGPs are superior to IGPs in their traffic control and separation functions. This led us to the idea that EGPs will work better than IGPs once a data center has hundreds of thousands of servers, which is the scale usually required to analyze 10 petabytes of data.

There are two leading technologies of this kind: BGP and overlay networking. Most research on both was carried out in the early 2000s. The main reason for introducing them into the local network of a big data center is to give network administrators more control over big data traffic. Because controlling big data traffic also demands higher forwarding performance than current IGP-based approaches provide, we prefer BGP to overlay networking.

Research on overlay networking [7, 8] has tended to emphasize enhanced routing control. Some ISPs wanted to be able to offer users special routes that the routing protocols did not provide. For example, the route recommended by the routing protocol might be the best one yet congested; the network administrator would then find another, idle route and want to force the users' traffic onto it. Overlay networking technologies were introduced to fulfill this wish of network administrators. Overlay networking therefore requires additional servers that reroute users' traffic according to the ISP's own network policy. Although its traffic control capability is very high, its forwarding performance is limited by the processing capacity of these additional servers. Figure 3 shows the concept of overlay networking: with the aid of the additional servers, network administrators can change the route of the traffic according to an SLA (Service Level Agreement).
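The rerouting idea above can be sketched as a one-hop detour choice: an overlay node compares the direct path chosen by the underlying routing with detours through relay servers, and picks whichever currently measures best. This is a minimal illustration, not the method of [7, 8]; the node names, relay list, and latency figures are all hypothetical.

```python
def best_overlay_path(src, dst, latency, relays):
    """Return (path, cost) with the lowest measured latency.

    latency maps (a, b) -> measured delay in ms of the underlay path a->b.
    relays are hypothetical overlay servers that can forward traffic.
    """
    # Start from the direct path recommended by the routing protocol.
    best_path, best_cost = (src, dst), latency[(src, dst)]
    for relay in relays:
        if relay in (src, dst):
            continue
        # One-hop detour: src -> relay -> dst through the additional server.
        cost = latency[(src, relay)] + latency[(relay, dst)]
        if cost < best_cost:
            best_path, best_cost = (src, relay, dst), cost
    return best_path, best_cost


if __name__ == "__main__":
    measured = {
        ("A", "B"): 120.0,  # recommended direct route, currently congested
        ("A", "R1"): 20.0,
        ("R1", "B"): 25.0,
        ("A", "R2"): 60.0,
        ("R2", "B"): 70.0,
    }
    print(best_overlay_path("A", "B", measured, ["R1", "R2"]))  # (('A', 'R1', 'B'), 45.0)
```

Every rerouted packet must pass through the relay's application process, which is why forwarding performance is bounded by those additional servers.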

Figure 3. Path Selection by Overlay Networking

One computing model used in scientific big data analysis is Grid computing, and its model is similar to that of overlay networking, as Figure 4 illustrates. Research has been carried out on bridging Grid computing and overlay networking technologies in order to provide a direct interface between users and the network. Currently, every user requirement for network service can be fulfilled only with the help of network administrators. In other words, from the users' point of view all network services are indirect services, and users have no means to control the network system directly.

Figure 4. Model Comparison between Grid Computing Model and Overlay Networking

Compared with overlay networking, BGP [9, 10, 11] is preferred by the network administrators of big data centers for managing local big data traffic, because BGP is simple and controls forwarding directly at the network layer, whereas overlay networking technologies generally forward traffic through application-layer servers. BGP is therefore suitable for high-performance networks. Figure 5 shows the concept of path control in BGP. Overlay networking technologies can also support multiple routes, but provisioning multiple routes is more complex in overlay networking than in BGP.

Figure 5. Path Selection by BGP Multihoming

The benefits of using BGP as a local routing protocol include heterogeneous vendor interoperability, per-hop traffic engineering, simple troubleshooting, and constrained propagation. Regarding vendor interoperability, most network devices and server systems provide BGP as a basic (or default) function. Per-hop traffic engineering with BGP can be implemented with an unequal-cost anycast load balancing solution. Simple troubleshooting means that the BGP RIB (Routing Information Base) is simpler than the link-state LSDB (Link State Database). Constrained propagation means that the effects of a link failure propagate only within a limited scope, which gives greater stability because event "flooding" domains are smaller.
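Unequal-cost load balancing, mentioned above as the basis of per-hop traffic engineering, can be sketched as weighted flow hashing: each flow is hashed onto one next hop, with next hops weighted by link bandwidth (the idea carried by the BGP link bandwidth extended community [6]). This is a simplified software model under our own assumptions; the next-hop names and bandwidths are hypothetical, and real routers implement this in hardware hash tables.

```python
import hashlib
from collections import Counter


def pick_next_hop(flow, next_hops):
    """Map a flow to one next hop, with probability proportional to weight.

    flow: hashable 5-tuple (src_ip, dst_ip, proto, src_port, dst_port).
    next_hops: list of (name, weight) pairs with positive integer weights,
    here interpreted as link bandwidth in Gbit/s.
    """
    total = sum(weight for _, weight in next_hops)
    if total <= 0:
        raise ValueError("need at least one next hop with positive weight")
    # Hash the whole flow so all its packets stay on one path (no reordering).
    digest = hashlib.sha256(repr(flow).encode()).digest()
    point = int.from_bytes(digest[:8], "big") % total
    # Walk the cumulative weights to find the bucket that contains the point.
    for name, weight in next_hops:
        if point < weight:
            return name
        point -= weight
    raise AssertionError("unreachable: point is always below total")


if __name__ == "__main__":
    hops = [("spine1", 40), ("spine2", 10)]  # hypothetical 40G and 10G uplinks
    counts = Counter(
        pick_next_hop(("10.0.0.1", "10.0.1.1", 6, port, 443), hops)
        for port in range(10000, 20000)
    )
    print(counts)  # roughly a 4:1 split between spine1 and spine2
```

Hashing per flow rather than per packet trades perfect balance for in-order delivery within each flow.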

3. BGP Design for the Local Routing of Big Data Center

Although BGP was born as a routing protocol for the WAN (Wide Area Network), it can also work in a LAN that has hundreds of thousands of servers. Using AS numbers, BGP can forward IP packets on a path other than the one with the best metric (or cost). BGP does not need to rely on a default route and can provide multiple paths for separating big data traffic, so it can mitigate the drawbacks that IGPs cause in a big data center as big data services become popular. As an application of this BGP design, we introduce the new concept of function-based AS numbering. In BGP, an AS number usually identifies the administrative domain of a network; in this paper, we extend the AS number into a tool for recognizing a particular service, which makes it more convenient to separate and control IP traffic.

Figure 6. Function based AS Numbering for Tiered Data Center

For the BGP design of a big data center, we define four kinds of AS numbers according to their function: Representative AS numbers, Gateway AS numbers, Service AS numbers, and Server Farm AS numbers. Representative and Gateway AS numbers identify the administrative domain and are configured with public AS numbers. Service and Server Farm AS numbers are used to recognize big data services and are configured with private AS numbers. Because private AS numbers are normally removed from the AS path before routes are advertised to other networks, ISPs and external network administrators cannot see the private AS numbers used inside a big data center. Nevertheless, a big data center that uses private AS numbers as proposed in this paper gains more control over its IP traffic. For example, a Service AS number can serve as an identifier of a service's traffic: using it, we can control (or filter) the routing of IP packets that originate from one or more servers, regardless of the IGP routing metric. Figure 6 shows the concept of function-based AS numbering. To identify a particular traffic flow, we recommend that the traffic pass through a specifically numbered Service AS, which is then used for AS filtering in traffic control. We are also considering BGP multipath technologies for separating big data traffic. Interest in BGP multipath technologies has recently revived because of growing interest in big data routing in data centers. Figures 7 and 8 show the concept of BGP multipath. The multipath technology within the same AS shown in Figure 7 has already been developed and deployed in commercial network systems, whereas multipath with different ASes is still under development.
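The function-based numbering scheme above can be sketched in a few lines. The private ranges are those reserved by RFC 6996; everything else is an assumption for illustration: the plan in `AS_PLAN` is hypothetical, 64496 and 64497 are documentation ASNs (RFC 5398) standing in for real public numbers, and `export_to_external` is a simplified model of the private-AS removal that routers offer in vendor-specific forms.

```python
# Private ASN ranges reserved by RFC 6996.
PRIVATE_2BYTE = range(64512, 65535)            # 64512-65534
PRIVATE_4BYTE = range(4200000000, 4294967295)  # 4200000000-4294967294

# Hypothetical function-based numbering plan using the four roles in the text.
# 64496/64497 are documentation ASNs (RFC 5398) standing in for public ASNs.
AS_PLAN = {
    "representative": 64496,   # public: identifies the whole data center
    "gateway": 64497,          # public: border of the administrative domain
    "service_bigdata": 65001,  # private: marks the big data service
    "farm_1": 65101,           # private: one server farm
    "farm_2": 65102,           # private: another server farm
}


def is_private_asn(asn):
    return asn in PRIVATE_2BYTE or asn in PRIVATE_4BYTE


def via_service(as_path, service_asn):
    """AS-path filter: True if the route passed through the given Service AS."""
    return service_asn in as_path


def export_to_external(as_path, public_asn):
    """Simplified edge export: drop private ASNs, then prepend our public ASN."""
    return [public_asn] + [asn for asn in as_path if not is_private_asn(asn)]


if __name__ == "__main__":
    # AS_PATH of a route from farm_1 as seen at the gateway (nearest AS first).
    path = [AS_PLAN["service_bigdata"], AS_PLAN["farm_1"]]
    print(via_service(path, AS_PLAN["service_bigdata"]))        # True
    print(export_to_external(path, AS_PLAN["representative"]))  # [64496]
```

Inside the center, the Service AS in the path tags big data routes for filtering; outside, only the public number remains visible.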

Figure 7. BGP Multi-Path with same AS

Figure 8. BGP Multi-Path with different AS
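The difference between the two multipath modes in Figures 7 and 8 can be modelled as an eligibility rule. This is a simplified sketch under stated assumptions: candidate paths are assumed to already tie on the earlier best-path attributes (LOCAL_PREF, MED, and so on). Strict mode requires identical AS_PATHs (same AS). Relaxed mode accepts paths through different ASes if their AS_PATH lengths match, which some vendors expose as an option (for example, `bgp bestpath as-path multipath-relax` on Cisco platforms). The `Path` type and addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    next_hop: str
    as_path: tuple  # ASNs, nearest first


def multipath_set(paths, relax=False, max_paths=4):
    """Return the paths installed for multipath forwarding (simplified).

    relax=False: only paths with an AS_PATH identical to the best path
    (multipath within the same AS). relax=True: any path whose AS_PATH has
    the same length as the best path (multipath across different ASes).
    """
    if not paths:
        return []
    # Simplified best path: shortest AS_PATH, first one wins ties.
    best = min(paths, key=lambda p: len(p.as_path))
    if relax:
        eligible = [p for p in paths if len(p.as_path) == len(best.as_path)]
    else:
        eligible = [p for p in paths if p.as_path == best.as_path]
    return eligible[:max_paths]


if __name__ == "__main__":
    candidates = [
        Path("10.1.0.1", (65101,)),
        Path("10.1.0.2", (65101,)),
        Path("10.2.0.1", (65102,)),
    ]
    print([p.next_hop for p in multipath_set(candidates)])              # ['10.1.0.1', '10.1.0.2']
    print([p.next_hop for p in multipath_set(candidates, relax=True)])  # all three next hops
```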

4. Analysis of the BGP Application for Local Routing

We first compared the characteristics of IGPs and BGP qualitatively from the point of view of a big data center. The emergence of big data makes data centers more complex than ever. IGPs, which exchange IP routing information automatically, have begun to show a lack of fine control over the IP traffic of big data centers. In particular, they are weak in fine-grained, policy-based routing control. Table 1 shows the result of the comparison between IGPs and BGP.


Table 1. Characteristics Comparison of IGPs and BGP

Consideration             IGPs                       BGP
Service domain            Intra-domain               Inter-domain
Multipath                 Not available              Available
Network size              Relatively small           Large
Identifier for routing    IP address                 AS number
CIDR implementation       Not easy                   Easy
Routing information       Proportional to servers    Small
Configuration             Dynamic routing            Static routing
Policy-based routing      Weak                       Strong

We designed each AS to connect to other ASes through EBGP, so each IGP runs only within the domain of a single AS. Each Server Farm AS is designed to contain hundreds of servers, a scale for which, in our experience, IGPs are appropriate. The aim of our BGP design is to make full use of the benefits of both IGPs and BGP. BGP takes on the roles of classifying big data traffic and providing multiple paths, while the existing IGPs let us adjust the size of a server farm without changing the IP address of any server. This gives us a tool for fine-grained control of large-scale big data traffic and small legacy traffic at the same time. It is the key reason we propose using dynamic routing (from the IGP) and policy-based routing (from BGP) simultaneously in the LAN of a big data center.
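Because each Server Farm AS holds only hundreds of servers while the whole center may hold hundreds of thousands, the number of private AS numbers the design consumes is worth checking. The sketch below is a back-of-the-envelope estimate. The pool sizes follow from the RFC 6996 ranges. The farm size of 200 servers and the count of 10 Service ASes are illustrative assumptions, not figures from this paper.

```python
import math

# Sizes of the RFC 6996 private ASN pools (both ranges inclusive).
PRIVATE_2BYTE_COUNT = 65534 - 64512 + 1            # 1,023 ASNs
PRIVATE_4BYTE_COUNT = 4294967294 - 4200000000 + 1  # 94,967,295 ASNs


def farm_as_budget(servers, servers_per_farm, service_ases=0):
    """Estimate how many private ASNs a function-based plan would consume."""
    farm_ases = math.ceil(servers / servers_per_farm)
    needed = farm_ases + service_ases
    return {
        "farm_ases": farm_ases,
        "private_ases_needed": needed,
        "fits_2byte_private": needed <= PRIVATE_2BYTE_COUNT,
        "fits_4byte_private": needed <= PRIVATE_4BYTE_COUNT,
    }


if __name__ == "__main__":
    # GSDC-sized center (4,700 servers) with hypothetical 200-server farms.
    print(farm_as_budget(4700, 200, service_ases=10))
    # Hypothetical 300,000-server center: 1,500 farms + 10 services > 1,023.
    print(farm_as_budget(300_000, 200, service_ases=10))
```

Under these assumptions, a center of this paper's target scale would exhaust the two-byte private pool and would need four-byte private AS numbers.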

Table 2. Characteristics Comparison of BGP and Overlay Networking

Consideration          BGP                               Overlay
Cost                   Low (connectivity fees)           High (connectivity fees + overlay fees)
Path selection         Policy-oriented selection         Performance-oriented selection
Management             Simple                            Complex
Style of multipath     One path via each AS              Multiple candidate paths
Routing information    Part of the entire information    Full routing information
Connection             Neighbor to neighbor              Peer-to-peer applications
Domain                 Area-specific                     Application-specific
Convergence            A few minutes                     Tens of seconds

We also compared the features of BGP and overlay networking, because both technologies have been used for traffic separation and control in the WAN, and both are likely to play an important role in the local network of a big data center. Table 2 shows the results of the comparison. First, implementing BGP is cheaper than implementing overlay networking, because BGP incurs only connectivity fees, whereas overlay networking incurs both connectivity fees and overlay fees. Second, the path selection criteria differ: BGP basically selects the path with the shortest AS_PATH, or a path chosen by policy, while overlay networking prefers performance-oriented selection. Third, BGP is simpler to manage than overlay networking. Path selection based on the shortest AS path is also easier to manage and troubleshoot than path selection by a link-state algorithm. BGP itself uses a path-vector algorithm, a variant of distance-vector routing. Fourth, the way multiple paths are built differs: BGP prefers one path via each AS, whereas overlay networking tends to build multiple candidate paths regardless of the size of the AS. Fifth, a BGP router keeps only the routing information learned from its neighbors, whereas overlay networking maintains full routing information for the network. Sixth, the entities connected in BGP are neighbor nodes in adjacent networks, while the peering entities in overlay networking are application software defined by users. The routing domain configured by BGP is tied to an area, which can mean one subnetwork, multiple subnetworks, or the entire network.
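The "shortest AS_PATH or policy" selection described above can be sketched as an ordered comparison. This covers only a subset of the standard BGP decision process: highest LOCAL_PREF, then shortest AS_PATH, then lowest MED, then lowest router ID. Real implementations have additional steps (origin, eBGP versus iBGP, IGP cost to the next hop, and others) and normally compare MED only between paths from the same neighbor AS; this sketch ignores those details. `apply_policy` and all values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Route:
    next_hop: str
    as_path: list
    local_pref: int = 100
    med: int = 0
    router_id: str = "0.0.0.0"


def route_key(r):
    """Sort key: smaller is better (LOCAL_PREF is negated so higher wins)."""
    rid = tuple(int(octet) for octet in r.router_id.split("."))
    return (-r.local_pref, len(r.as_path), r.med, rid)


def best_path(routes):
    return min(routes, key=route_key)


def apply_policy(routes, match_asn, local_pref):
    """Hypothetical policy: raise LOCAL_PREF on routes that traverse match_asn."""
    for r in routes:
        if match_asn in r.as_path:
            r.local_pref = local_pref
    return routes


if __name__ == "__main__":
    routes = [
        Route("spine1", [65001], router_id="10.0.0.1"),
        Route("spine2", [65002, 65003], router_id="10.0.0.2"),
    ]
    print(best_path(routes).next_hop)  # spine1: shorter AS_PATH
    apply_policy(routes, match_asn=65003, local_pref=200)
    print(best_path(routes).next_hop)  # spine2: policy overrides AS_PATH length
```

The policy step is what allows an administrator to steer traffic onto a longer but idle path, which IGP metric-based selection does not support as directly.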

The domain of overlay networking can be configured with more variety: overlay networking can control the traffic flow of a particular application as well as the traffic of a subnetwork. Finally, BGP may take as long as ten minutes to converge after recovery, but this figure was obtained in the WAN. We expect BGP convergence to be much shorter when BGP is deployed in the LAN of a big data center, which is built with 10G NICs (Network Interface Cards), and we argue that the real power of BGP emerges when it is implemented in such a LAN. The convergence time of overlay networking, also measured in the WAN, is tens of seconds. We studied BGP as an IGP for a big data center with hundreds of thousands of servers, and our analysis suggests that BGP will be one of the best solutions for separating big data traffic in a data center.

5. Conclusion

The increasing complexity of big data centers is creating new research trends in routing technology for the LAN. Introducing BGP into the LAN to make local routing control efficient for thousands of servers is again attracting the attention of network researchers. In our case, we operate a data center called GSDC (Global Science Data Center), which currently has 4,700 physical servers and provides a multi-petabyte (10¹⁵ bytes) big data service. We have therefore joined the study of BGP as a data center IGP. We will continue to refine the BGP design for GSDC in detail and will try to validate it through simulations before implementing BGP locally.

Acknowledgments

This work was supported by the program of the Global hub for Experiment Data of Basic Science, 2015 funded by the NRF (N-15-NM-CR01-S01).

References

[1] A. Sridharan, R. Guerin and C. Diot, “Achieving Near-Optimal Traffic Engineering Solutions for Current OSPF/IS-IS Networks”, IEEE/ACM Transactions on Networking, vol.13, no. 2, (2005), pp.234-247.

[2] S. Hares, “NANOG 53 Operators Perspective”, IETF Internet-Draft, draft-hares-armd-nanog52-00.txt, (2012).

[3] A. Greenberg, P. Lahiri, D. A. Maltz, P. Patel and S. Sengupta, “Towards a Next Generation Data Center Architecture: Scalability and Commoditization”, Proceedings of PRESTO ’08, Seattle, USA, (2008) August 22.

[4] M. Caesar, D. Caldwell, N. Feamster, J. Rexford, A. Shaikh and J. V. Merwe, “Design and implementation of a routing control platform”, Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation, Berkeley, USA, (2005), July 27.

[5] P. Lapukhov, A. Premji and J. Mitchell, “Use of BGP for routing in large-scale data centers”, IETF Internet-Draft, draft-lapukhov-bgp-routing-large-dc-06.txt, (2014).

[6] P. Mohapatra and R. Fernando, “BGP Link Bandwidth Extended Community”, IETF Internet-Draft, draft-ietf-idr-link-bandwidth-06, (2013).

[7] A. Ganguly, A. Agrawal, P. Oscar and R. Figueiredo, “IP over P2P: Enabling Self-configuring Virtual IP Networks for Grid Computing”, Proceedings of IPDPS, Rhodes Island, Greece, (2006), April 25-29.

[8] D. Rao, J. Mullooly and R. Fernando, “Layer-3 virtual network overlays based on BGP Layer-3 VPNs”, IETF Internet-Draft, draft-drao-bgp-l3vpn-virtual-network-overlays-00, (2012).

[9] Z. Li, P. Mohapatra and C. Chuah, “Virtual Multi-homing: On the Feasibility of Combining Overlay Routing with BGP Routing”, LNCS 3462 (2005), pp.1348-1352.

[10] Cisco, “Load Sharing with BGP in Single and Multihomed Environments: Sample Configurations,” Document ID 13762, (2005).

[11] A. Akella, J. Pang, A. Shaikh, B. Maggs and S. Seshan, “A Comparison of Overlay Routing and Multihoming Route Control”, ACM SIGCOMM Computer Communication Review, vol.34, no.4 (2004), pp.93-106.


Authors

Hyoung Woo Park is the director of the national R&E network (KREONET) center and a Principal Researcher at the KISTI Supercomputing Center in South Korea. He obtained his Ph.D. in Computer Networks at SungKyunKwan University in South Korea. He has participated in many R&D projects, including the construction of the national R&D network (KREONET), the implementation of the National Grid Computing Infrastructure, and the construction of a peta-scale science data Grid center for global collaborative research on CERN LHC data, KEK Belle data, etc.

Il-Yeon Yeo received the B.S. and M.S. degrees from the School of Electronic Engineering at Kyungpook National University in 2000 and 2002, respectively.

He has been serving as a Senior Researcher at the Global Science Experimental Data Hub Center of the Korea Institute of Science and Technology Information (KISTI) since 2012. From 2002, he served as a Senior Researcher at the Knowledge Information Center and the National Science and Technology Information Service at KISTI. His research interests include Parallel Computing, Information Retrieval, Database, Grid Computing, and Security.

Sang Oh Park received the B.S., M.S., and Ph.D. degrees from the School of Computer Science and Engineering at Chung-Ang University in 2005, 2007, and 2010, respectively.

He has been serving as a Senior Researcher of Global Science Experimental Data Hub Center at Korea Institute of Science and Technology Information since 2012. He served as a Research Professor at Chung-Ang University. His research interests include Big Data System, Tape Storage System, Embedded System, Cyber Physical System, Home Network, and Linux System.
