
CSEIT183172 | Received : 05 Jan 2018 | Accepted : 19 Jan 2018 | January-February-2018 [(3) 1 : 236-241 ]

International Journal of Scientific Research in Computer Science, Engineering and Information Technology © 2018 IJSRCSEIT | Volume 3 | Issue 1 | ISSN : 2456-3307


Distributed Packet Buffers for High Bandwidth Switches and Routers

Maragoni Mahendar1, Pinnapureddy Manasa2

1Assistant Professor, Department of CSE, Avanthi's Scientific Technological & Research Academy, Hyderabad, Telangana, India

2M.Tech Student, Department of CSE, Avanthi's Scientific Technological & Research Academy, Hyderabad, Telangana, India

ABSTRACT

High-speed routers rely on well-designed packet buffers that support multiple queues, provide large capacity, and offer short response times. Some researchers have suggested combined SRAM/DRAM hierarchical buffer architectures to meet these challenges. However, these architectures suffer from either a large SRAM requirement or high time complexity in memory management. In this paper, we present a scalable, efficient, and novel distributed packet buffer architecture. Two fundamental issues must be addressed to make this architecture feasible: 1) how to minimize the overhead of an individual packet buffer; and 2) how to design scalable packet buffers using independent buffer subsystems. We address these issues by first designing an efficient compact buffer that reduces the SRAM size requirement by (k - 1)/k. We then introduce a feasible way of coordinating multiple subsystems with a load-balancing algorithm that maximizes overall system performance. Both theoretical analysis and experimental results demonstrate that our load-balancing algorithm and the distributed packet buffer architecture can easily scale to meet the buffering needs of high-bandwidth links while satisfying the requirements of scale and support for multiple queues.

Keywords: Distributed Packet Buffers, SRAM, DRAM, RTT, Cisco CRS, TCP, HSD

I. INTRODUCTION

The phenomenal growth of the Internet has been fueled by the rapid increase in communication link bandwidth. Internet routers play a crucial role in sustaining this growth by switching packets extremely fast to keep up with the growing bandwidth (line rate). This demands sophisticated packet switching and buffering techniques. Packet buffers need to be designed to support large capacity, multiple queues, and short response times.

Router buffer sizing is still an open issue. The traditional rule of thumb for Internet routers states that a router should be capable of buffering RTT × R data, where RTT is the round-trip time of a flow and R is the line rate.
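To make the rule concrete (with illustrative numbers, not figures from this paper): at a line rate of R = 40 Gb/s and the commonly assumed RTT of 250 ms, the rule calls for RTT × R = 0.25 s × 40 Gb/s = 10 Gb, i.e., roughly 1.25 GB of buffering on a single line card.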


Beyond raw capacity requirements, a packet buffer nowadays usually maintains thousands of queues. For example, the Juniper E-series routers maintain as many as 64,000 queues. Given the increasing popularity of OpenFlow, a packet buffer that supports millions of queues is desirable.

Furthermore, a packet buffer should be capable of sustaining continuous data streams for both ingress and egress. With ever-increasing line rates, currently available memory technologies, namely SRAM or DRAM alone, cannot simultaneously satisfy these three requirements. This prompted researchers to suggest hybrid SRAM/DRAM (HSD) architectures with a single DRAM, interleaved DRAMs, or parallel DRAMs sandwiched between SRAMs. In this paper, we briefly review previous work on packet buffer architectures and present a scalable and efficient hierarchical packet buffer architecture. This is our first attempt to combine the merits of two previously published packet buffer architectures; consequently, the SRAM occupancy is significantly reduced. By fully exploiting the advantages of parallel DRAMs, we first propose a memory management algorithm (MMA) called Random Round Robin (RRR). Thereafter, we devise a "traffic-aware" approach that aims to provide different services for different types of data streams; this approach further reduces the system overhead. Both mathematical analysis and simulation demonstrate that the proposed architecture, together with its algorithms, reduces the overall SRAM requirement significantly while providing guaranteed performance in terms of low time complexity, an upper-bounded drop rate, and uniform allocation of resources. In one simulation, the proposed architecture reduces the size of SRAM by more than 95 percent, and the maximal delay is only at the microsecond level when the traffic intensity is 76 percent.
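The paper does not reproduce RRR at this point, so the following is only a minimal Python sketch of one plausible reading, with all names (RandomRoundRobin, pick_bank, k) our own: chunks are spread over k parallel DRAM banks in round-robin order, with a randomized starting bank each round so that no bank is systematically favored.

import random

class RandomRoundRobin:
    # Sketch: dispatch b-sized chunks across k parallel DRAM banks.
    # Each full round starts at a random bank to avoid systematic bias.
    def __init__(self, k):
        self.k = k                           # number of parallel DRAM banks
        self.start = random.randrange(k)     # random starting bank of this round
        self.offset = 0                      # position within the current round

    def pick_bank(self):
        bank = (self.start + self.offset) % self.k
        self.offset += 1
        if self.offset == self.k:            # round complete: re-randomize start
            self.start = random.randrange(self.k)
            self.offset = 0
        return bank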

Existing System

Researchers have argued that buffers can be made far smaller than RTT × R (for example, RTT × R / √N for N long-lived TCP flows) at the expense of a small loss in throughput.

Focusing on the performance of individual TCP flows, researchers claimed that the output/input capacity ratio at a network link largely determines the required buffer size. If the output/input capacity ratio is larger than one, the loss rate follows a power-law reduction with the buffer size and small buffers suffice; if it is lower than one, significant buffering is needed.

Proposed System

We devise a "traffic-aware" approach that aims to provide different services for different types of data streams. This approach further reduces the system overhead. Both mathematical analysis and simulation demonstrate that the proposed architecture, together with its algorithms, reduces the overall SRAM requirement significantly while providing guaranteed performance in terms of low time complexity, an upper-bounded drop rate, and uniform allocation of resources.

3. Main Router: The main router forwards packets from source to destination and backward packets from destination to source. It receives empty packets from the destination to calculate the destination's bandwidth, and ack packets that trigger sending the next packet to the destination.

4. Destination Router: It sends empty and ack packets to the centralized router.


II. ARCHITECTURE

The basic hybrid SRAM/DRAM architecture [8] was first introduced with one DRAM sandwiched between two smaller SRAM memories, where the two SRAMs hold the heads and tails of all the queues and the DRAM maintains the middle parts of the queues. Shuffling packets between the SRAM and the DRAM is under the control of a memory management algorithm. The principal idea behind this MMA is to temporarily hold b amount of data for each queue in both the ingress and egress SRAM, so as to turn scattered DRAM accesses into continuous ones. Since batch loads are strictly limited within each queue, the SRAM size requirement for the HSD architecture is O(Qb), where Q is the number of FIFO queues and b is the batch size. Whenever a FIFO queue accumulates b amount of data, it is transferred to the DRAM through a single write; an SRAM queue size of 2b guarantees against queue overflow. The authors further suggested that the size of the tail SRAM can be reduced by introducing a pipeline design. They also introduced the so-called Earliest Critical Queue First (ECQF) MMA for the egress, which reduces the size of the head SRAM to Qb. By introducing an extra delay, ECQF-MMA predicts the most critical queue (the one that goes empty or bears the biggest deficit first) and fetches the corresponding b-sized chunk [10] of data from the DRAM in advance. This architecture, also known as Nemo, has been adopted by Cisco.

Distributed Packet Buffer Architecture

In our view, all packet buffering techniques so far have adopted a traffic-agnostic approach when designing packet buffering algorithms. We must clarify that even though existing approaches do use Q queues, each queue is treated the same by the buffer management algorithms. No effort is made to exploit the inherent characteristics of the corresponding traffic patterns, such as arrival rates, burst sizes, and transit time requirements through the router. However, a traffic-aware approach to the problem, we believe,

will yield new possibilities for conquering the scalability problem.
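To make the batching idea at the heart of HSD concrete, here is a minimal Python sketch of a tail cache, with all names (TailCache, dram_write, B) our own rather than the authors': each queue stages cells in SRAM and issues one wide DRAM write per b cells.

from collections import defaultdict

B = 8  # cells per DRAM batch (the paper's b; value chosen only for illustration)

class TailCache:
    def __init__(self, dram_write):
        self.queues = defaultdict(list)   # queue id -> cells staged in SRAM
        self.dram_write = dram_write      # callback performing one wide DRAM write

    def enqueue(self, qid, cell):
        q = self.queues[qid]
        q.append(cell)
        if len(q) == B:                   # batch full: one continuous DRAM access
            self.dram_write(qid, list(q))
            q.clear()

A per-queue budget of 2b cells then suffices: one batch can accumulate while the previous batch waits for its DRAM write slot, which is the overflow guarantee cited above.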

Per-Destination and Per-Packet Load Balancing

Load balancing can be set to work per-destination or per-packet. Per-destination load balancing means the router distributes packets based on the destination address: given two paths to the same network, all packets for destination1 on that network go over the first path, all packets for destination2 on that network go over the second path, and so on.

Figure 1. Distributed Packet Buffer Architecture


With per-destination load balancing, the route cache information includes the outgoing interface. For per-packet load balancing, the forwarding process determines the outgoing interface for each packet by looking up the route table and picking the least-used interface. This ensures equal utilization of the links, but it is a processor-intensive task that impacts overall forwarding performance, so this form of per-packet load balancing is not well suited to higher-speed interfaces.
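The contrast between the two policies is easy to state in code. The Python sketch below is illustrative only (the path names, the hash choice, and the load counters are our assumptions, not router internals):

import zlib

PATHS = ["path-A", "path-B"]               # two equal-cost paths
load = {p: 0 for p in PATHS}               # packets currently queued per path

def per_destination(dst_addr):
    # Same destination always maps to the same path: no packet reordering,
    # but utilization depends on how destinations happen to hash.
    return PATHS[zlib.crc32(dst_addr.encode()) % len(PATHS)]

def per_packet():
    # Every packet takes the least-used path: near-equal utilization,
    # but extra per-packet work and possible reordering within a flow.
    return min(PATHS, key=lambda p: load[p])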

Figure 2. Flow Allocation among Multiple Subsystems

Figure 3 shows the state machine we have defined. Although there could be thousands of combinations, we reserve only six critical states: "unallocated", "large", "small", and three intermediate states, "large-small", "small-large", and "large-small-large". Any flow can switch its state between "small" and "large" smoothly, subject to certain constraints.

Figure 3. Flow States

Meanwhile, it is strictly enforced that any flow can possess no more than three serving states at any time, i.e., at most two turning points. This helps the system minimize the overhead of state maintenance. Based on the state machine above, we devise a load-balancing algorithm; its pseudocode is shown in Fig. 4. The algorithm naturally separates into three tasks, implemented at the distributor, the compact packet buffer subsystems, and the aggregator, respectively.

The tasks communicate with each other through the centralized flow table.
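Since the states are only depicted in Fig. 3, the transition table below is our reading of them, written as a small Python sketch; the exact set of legal transitions is an assumption on our part.

# Six reserved flow states; a name like "large-small" records one turning point.
ALLOWED = {
    "unallocated":       {"small", "large"},
    "small":             {"small-large"},
    "large":             {"large-small"},
    "small-large":       set(),                  # one turn recorded
    "large-small":       {"large-small-large"},
    "large-small-large": set(),                  # two turns: the stated maximum
}

def transition(state, nxt):
    # Reject anything beyond the "at most two turning points" constraint.
    if nxt not in ALLOWED[state]:
        raise ValueError("illegal transition: %s -> %s" % (state, nxt))
    return nxt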

Figure 4. Load-balancing algorithm

III. SIMULATIONS

Our experimental results are presented in this section. Unless otherwise specified, all experiments use the default parameters given in Table 1.

Table 1. Default Parameters


Assume the simulation lasts for X timeslots. For the first 0.2X timeslots, there is only input and no output, so cells are backlogged. In this way, we create an initial backlog and simulate the situation when congestion happens. After timeslot 0.2X + 1, full-speed output begins while the input continues. With the cells backlogged in the first phase, we can monitor the system performance in detail, especially how the load-balancing algorithm behaves. After 0.5X timeslots, the input stops and only the output continues, draining any backlogged cells; in this way, we simulate the situation when the system is lightly loaded. Moreover, we choose the parallel system (i.e., PHSD in [19]) as the baseline reference for our distributed system, because it is the best parallel architecture we know of that represents the previous "flow-agnostic" approaches.
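The schedule above reduces to three phases over X timeslots; a compact Python rendering (step and its flags are our own names, not the simulator's) is:

def run_simulation(X, step):
    # step(input_on, output_on) advances the buffer model by one timeslot.
    for t in range(X):
        if t < 0.2 * X:
            step(input_on=True, output_on=False)    # build the initial backlog
        elif t < 0.5 * X:
            step(input_on=True, output_on=True)     # congestion: full-speed output
        else:
            step(input_on=False, output_on=True)    # drain: lightly loaded system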

Figure 5. The overall active queues for both architectures

Figure 6. Flow-aware services based on probability method

Figure 7. Active queue allocation among four subsystems

Since our algorithm does not consult the flow assignment status when balancing, we are curious about the actual distribution of active queues among subsystems. Fig. 7 shows the distribution of active queues among multiple subsystems. By increasing the flow number from 100 to 1,000, we observe that the unbalanced distribution is greatly improved, which matches the previous mathematical analyses. We also monitor the length of the front-buffers. As shown in Fig. 8, the average FIFO lengths for both algorithms are much less than 100.


Figure 8. The number of backlogged cells in a front-buffer

IV. CONCLUSION

Unlike previous approaches, our design dispenses with both the head and tail caches, keeping only tiny distributed front-buffers inside each subsystem. Each flow is mapped to approximately one queue in a compact buffer. Thus, the design maintains far fewer physical queues than other approaches and reduces the size of the SRAM significantly. The significance of this research lies in the bold new direction towards distributed, scalable buffer design that we plan to pursue. In particular, our approach yields a scalable, independent-subsystem-based packet buffer architecture that can easily be tailored to meet the specific line rate and traffic requirements of different switches, and to match flow-level requirements, including bandwidth and delay guarantees, within a switch. Our approach will also yield a power-efficient buffer design, where parts of the buffer may be switched on and off based on the real-time traffic arrival rates and buffering requirements.

V. REFERENCES

[1] S. Iyer, R. Kompella, and N. McKeown, "Designing Packet Buffers for Router Line Cards," IEEE/ACM Transactions on Networking, vol. 16, no. 3, Jun. 2008.

[2] Samsung SRAM Chips, http://www.samsung.com/global/business/semiconductor/products/sram/Products_HighSpeedSRAM.html

[3] Samsung DRAM Chips, http://www.samsung.com/global/business/semiconductor/products/dram/Products_DRAM.html

[4] J. Corbal, R. Espasa, and M. Valero, "Command Vector Memory Systems: High Performance at Low Cost," in Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 68-77, Oct. 1998.

[5] K. G. Coffman and A. M. Odlyzko, "Is There a 'Moore's Law' for Data Traffic?," in Handbook of Massive Data Sets, Kluwer, 2002, pp. 47-93.

Author Details

Maragoni Mahendar received his B.Tech degree in Computer Science Engineering (CSE) from Avanthi's Scientific Technological & Research Academy, Gunthapally, Ranga Reddy, in 2012, under Jawaharlal Nehru Technological University Hyderabad, and his Master of Technology in Computer Science Engineering (CSE) from Nova College of Engineering & Technology, Jafferguda, Ranga Reddy, in 2014, under Jawaharlal Nehru Technological University Hyderabad. He has been dedicated to the teaching field for the last four years. His fields of interest include Cloud Computing, Data Science, and Big Data. He has published two international journal papers and participated in four international conferences. At present he is working as Assistant Professor, Department of Computer Science and Engineering, Avanthi's Scientific Technological & Research Academy, Ranga Reddy, Telangana, India.

Email: m.mahender527@gmail.com
