Putting it on the NIC: A Case Study on application offloading to a Network Interface Card (NIC)

Academic year: 2021

Yaron Weinsberg, Elan Pavlov, Yossi Amir, Gilad Gat, Sharon Wulff

Department of Computer Science

The Hebrew University of Jerusalem

Email: {wyaron,elan}@cs.huji.ac.il

Abstract: We have implemented a firewall application on a Network Interface Card (NIC) and measured CPU utilization and throughput in a variety of scenarios. The benefits of offloading are most pronounced when packets are rejected. Our results suggest that offloading applications, and firewall logic in particular, to a NIC yields significant benefits.

I. INTRODUCTION

Many communication applications act on every incoming packet. Offloading such applications to the network interface card (NIC) has many potential advantages. Using the NIC’s onboard computational power can reduce the demands placed on the CPU. If the NIC can process incoming data itself, it can avoid costly interrupts to the CPU. In addition, the NIC can serve as a gatekeeper, shielding the host from potential threats. Furthermore, applications on a NIC can be built to be system- and OS-independent.

A firewall is a particularly promising candidate for offloading. Since a firewall filters packets according to a user-defined security policy, filtering earlier (especially discarding packets) has the potential to improve performance significantly. A firewall on a NIC has the additional advantage that it is harder for an adversary to modify than a software application running on the host.

We have designed and implemented a firewall application on a NIC, which we call SCIRON (Secure-Communication IntegRated Over NIC). The system consists of three elements: the firewall logic, a management console and a policy builder. This paper presents SCIRON and shows that offloading full applications has significant advantages, and more market potential than TCP offload engines (TOEs) [9] or protocol-specific offloaded extensions.

II. RELATED WORK

Numerous firewall applications exist in the market today. Their objective is to protect the network from external and internal attacks. They range from commercial products developed by leading companies in the industry (such as Checkpoint [2] and Cisco [3]), to lightweight free tools such as Linux IP-Chains [10], and more advanced open-source solutions such as Snort [7] and ClamAV [4]. These products share a common philosophy of filtering communication at the network stack layer. The packet filtering module is integrated into the operating system’s kernel and intercepts each incoming and outgoing packet. Each packet is evaluated against the firewall’s security policy and is either discarded or allowed to pass to (or from) the protected computer.

Recently, 3Com independently developed [1] an intrusion detection system (IDS) suite, a distributed firewall bundle for servers and desktops priced at around $800. Our philosophy of offloading firewall logic to the NIC is similar to that of their product, but has significant cost advantages. Furthermore, although there is evident commercial potential, no research has been published on the quality of such a solution.

III. SYSTEM DESCRIPTION

The system is divided into three main parts. The first is the firewall logic, which controls which packets are accepted and which are filtered. For purposes of comparison, we implemented the firewall both on the NIC and as a driver; Section IV compares the results of the NIC implementation with those of the driver implementation.

The second part is a policy builder, which creates the policy that regulates the firewall logic. Since the number of rules affects the speed of the system, we attempt to find a small set of rules that implements the given policy. As this problem is NP-complete, we use several heuristics (detailed in the full paper). In addition, the policy builder warns of conflicting or redundant rules.

Finally, a management console controls the loading of rules onto a specific computer in the network, as well as logging and other management activities.

A. Overview and Motivation

Offloading firewall logic to a NIC offers several benefits. It provides an off-the-shelf, OS-independent implementation. A hardware implementation is also harder to tamper with than a software implementation. Finally, it allows flexibility in accessing a remote host without jeopardizing the whole segment.

Firewall applications are computationally expensive for several reasons:

• The host’s CPU is repeatedly interrupted by the NIC on incoming packets. The processing power required to handle the interrupts is wasted if the packet is doomed to be discarded.
• An adversary can try to perform a denial of service (DoS) attack by sending packets from many computers in an attempt to overload the system.
• The networking stack has significant overhead.
• The PCI [6] bus is a major bottleneck, especially given today’s trend towards faster networking fabrics (this bottleneck should be reduced by the upcoming PCI Express).

Offloading firewall activities to the NIC can mitigate some of these issues. We have implemented a static 5-tuple (protocol, IP source address, IP destination address, source port, destination port) firewall on the NIC. Our motivation was to examine the benefits of such an offload and to measure its expected effect on CPU utilization and overall throughput.
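To make the 5-tuple concrete, here is a small C sketch of one static rule and the test of a packet against it. The struct layout, the use of network masks for addresses, inclusive port ranges, and protocol 0 as a wildcard are all our assumptions for illustration; the text only states that the firewall is static and 5-tuple based.

```c
#include <stdbool.h>
#include <stdint.h>

/* The 5-tuple of one packet (hypothetical layout, host byte order). */
struct five_tuple {
    uint8_t  proto;      /* IP protocol number, e.g. 6 = TCP, 17 = UDP */
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
};

/* One static rule: addresses match under a mask (mask 0 = any address),
   ports match an inclusive range, protocol 0 matches any protocol. */
struct tuple_rule {
    uint8_t  proto;
    uint32_t src_ip, src_mask;
    uint32_t dst_ip, dst_mask;
    uint16_t sport_lo, sport_hi;
    uint16_t dport_lo, dport_hi;
};

/* Returns true if every field of the packet's 5-tuple satisfies the rule. */
static bool tuple_matches(const struct tuple_rule *r, const struct five_tuple *t)
{
    if (r->proto != 0 && r->proto != t->proto)
        return false;
    if ((t->src_ip & r->src_mask) != (r->src_ip & r->src_mask))
        return false;
    if ((t->dst_ip & r->dst_mask) != (r->dst_ip & r->dst_mask))
        return false;
    if (t->src_port < r->sport_lo || t->src_port > r->sport_hi)
        return false;
    if (t->dst_port < r->dport_lo || t->dst_port > r->dport_hi)
        return false;
    return true;
}
```

For example, a rule {6, 0, 0, 0xC0A80100, 0xFFFFFF00, 0, 65535, 80, 80} matches TCP traffic from any source to port 80 on any host in 192.168.1.0/24.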

B. Environment

Our programmable interface card is based on the Tigon chipset. The Tigon programmable Ethernet controller, released in 1997, is used in 3Com’s family of Gigabit NICs.

The Tigon controller supports a PCI host interface and a full-duplex Gigabit Ethernet interface. Figure 1 shows a block diagram of the Tigon. The Tigon has two 88 MHz MIPS R4000-based processors that share access to external SRAM. Each processor has a one-line (64-byte) instruction cache to capture spatial locality for instructions fetched from the SRAM. In the Tigon, each processor also has a private on-chip scratch pad memory, which serves as a low-latency software-managed cache. Hardware DMA and MAC controllers enable the firmware to transfer data to and from the system’s main memory and the network, respectively.

Fig. 1. Tigon Controller Block Diagram (components: CPU A and CPU B, each with a scratch pad; memory bus arbiter; external RAM; MAC; read DMA and write DMA; PCI interface; memory bus; PCI and full-duplex Gigabit Ethernet interfaces)

The Tigon controller uses an event-loop approach instead of interrupt-driven logic. The motivation is to increase the NIC’s runtime performance by avoiding the overhead of taking an interrupt each time a packet arrives or a DMA request is ready. Furthermore, on a single processor, the need for synchronization and its associated overhead is eliminated. Our system uses the Netgear GA620T NIC, which is based on the Tigon II chipset and has 1 MB of external SRAM. This NIC does not provide a CPU interrupt mechanism. Although the NIC’s operating system in our architecture is designed as a non-preemptive kernel, our work does provide the OS specification for interrupt-enabled NIC hardware.
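The event-loop style can be sketched in a few lines of C. This is not the Tigon firmware: the event bits, the handler table and the example handler are hypothetical, and on real hardware the pending events would be read from a status register rather than from a variable. The point is that each handler runs to completion, so on a single processor no locking is required.

```c
#include <stdint.h>

/* Hypothetical event bits. */
#define EV_RX_FRAME (1u << 0)   /* a frame arrived from the MAC */
#define EV_DMA_DONE (1u << 1)   /* a DMA transfer completed */
#define EV_TX_READY (1u << 2)   /* the host posted a frame to send */
#define EV_COUNT    3u

typedef void (*event_handler)(void);

static volatile uint32_t pending_events;    /* stand-in for a status register */
static event_handler handlers[EV_COUNT];    /* one run-to-completion handler per event */

/* One pass of the event loop: service every pending event, lowest bit first.
   The firmware's main routine would call it forever: for (;;) event_loop_once(); */
static unsigned event_loop_once(void)
{
    unsigned served = 0;
    for (unsigned bit = 0; bit < EV_COUNT; bit++) {
        uint32_t mask = 1u << bit;
        if (pending_events & mask) {
            pending_events &= ~mask;        /* acknowledge before handling */
            if (handlers[bit] != 0)
                handlers[bit]();
            served++;
        }
    }
    return served;
}

/* Example handler: counts received frames. */
static unsigned rx_frames;
static void on_rx_frame(void) { rx_frames++; }
```

A handler that must wait (for example, for a DMA to finish) returns and lets a later event resume the work, instead of blocking the loop.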

C. Programming Model

SCIRON [11] is based on STORM [12], a previous project conducted in our lab. STORM provides a framework on top of the original NIC firmware that enables a developer to install predefined hooks. Hooks can be installed at the firmware level and/or at the kernel level (inside the NIC’s driver). STORM’s framework also enables adding custom events that are triggered by the driver. These triggers can serve as a communication channel between the firmware and the network driver.
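The hook mechanism can be pictured as a table of function pointers consulted by the firmware’s Rx and Tx stubs. The sketch below is hypothetical throughout: the text does not give STORM’s real hook names or signatures, so install_hook, run_hook, the hook points and the example hook are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook points. */
enum hook_point { HOOK_PRE_RECV, HOOK_PRE_SEND, HOOK_POINT_COUNT };

/* A packet hook inspects a raw frame and returns true to let it pass. */
typedef bool (*packet_hook)(const unsigned char *frame, size_t len);

static packet_hook hook_table[HOOK_POINT_COUNT];

/* Install or replace the hook at `point`; returns false for an invalid point. */
static bool install_hook(enum hook_point point, packet_hook fn)
{
    if ((unsigned)point >= (unsigned)HOOK_POINT_COUNT)
        return false;
    hook_table[point] = fn;
    return true;
}

/* Called from an Rx or Tx stub. With no hook installed, the frame follows
   the original firmware path and is allowed. */
static bool run_hook(enum hook_point point, const unsigned char *frame, size_t len)
{
    packet_hook fn = hook_table[point];
    return fn == NULL || fn(frame, len);
}

/* Example hook: reject runt frames shorter than a 14-byte Ethernet header. */
static bool reject_runts(const unsigned char *frame, size_t len)
{
    (void)frame;
    return len >= 14;
}
```

Keeping "no hook" equivalent to "allow" means the unmodified firmware behavior is preserved until a hook is installed.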

Figure 2 presents the modules and hooks provided by STORM’s framework. The Rx and Tx stubs provide the hooks necessary for intercepting traffic on the NIC. These hooks enable SCIRON’s firewall to filter traffic according to a predefined security policy. The firmware’s trace module provides development and debugging support for firmware code; we use this capability to transmit packets from within the NIC for remote logging.

Fig. 2. STORM’s modules

Figure 3 presents a sample hook invocation performed by STORM’s framework. The storm_pre_recv hook is invoked in the receive control flow of the NIC’s firmware.

Fig. 3. Installing a Hook

/* call the hook and get the packet verdict */
BOOL allowed = storm_pre_recv(pkt);
if (!allowed) {
    /* discard the packet */
    storm_discard(pkt);
} else {
    /* allow the packet */
    ...
}

The storm_pre_recv hook receives a pointer to the beginning of the packet, i.e., the Ethernet header, and returns false if the packet is to be discarded or true if it is to be allowed (forwarded to the operating system).

SCIRON’s firewall is implemented as a set of such firmware hooks that are installed using STORM’s API. These hooks are compiled and linked with the firmware image and are installed during NIC initialization.
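Before any rule can be evaluated, a receive hook must extract the 5-tuple from the frame it is handed, starting at the Ethernet header as storm_pre_recv does. The following is a sketch of that step; the helper and struct names are hypothetical, and it assumes untagged Ethernet II frames carrying IPv4 (no VLAN tags or IPv6). Fields are read byte by byte, so the code does not depend on host byte order or alignment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct five_tuple {
    uint8_t  proto;
    uint32_t src_ip, dst_ip;      /* host byte order */
    uint16_t src_port, dst_port;  /* 0 when the packet carries no ports */
};

static uint16_t rd16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }
static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns false if the frame is not an IPv4 packet that can be parsed. */
static bool parse_five_tuple(const uint8_t *frame, size_t len, struct five_tuple *t)
{
    const size_t eth_len = 14;
    if (len < eth_len + 20 || rd16(frame + 12) != 0x0800)   /* EtherType IPv4 */
        return false;
    const uint8_t *ip = frame + eth_len;
    size_t ihl = (size_t)(ip[0] & 0x0F) * 4;                /* IP header length in bytes */
    if ((ip[0] >> 4) != 4 || ihl < 20 || len < eth_len + ihl)
        return false;
    t->proto  = ip[9];
    t->src_ip = rd32(ip + 12);
    t->dst_ip = rd32(ip + 16);
    t->src_port = t->dst_port = 0;
    /* Only the first fragment of a datagram carries the TCP/UDP header. */
    bool first_fragment = (rd16(ip + 6) & 0x1FFF) == 0;
    if ((t->proto == 6 || t->proto == 17) && first_fragment) {
        if (len < eth_len + ihl + 4)
            return false;
        t->src_port = rd16(ip + ihl);
        t->dst_port = rd16(ip + ihl + 2);
    }
    return true;
}
```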

In order to simulate common kernel-based firewalls for performance evaluation, we also installed hooks at the driver layer (again using STORM’s framework). All comparisons in Section IV run the same firewall code (with the same filtering policy) in the driver-based firewall and in the NIC-based firewall. Currently, the firewall code is fully stateless: no state is saved between successive hook invocations.

D. SCIRON Architecture

This section presents the main components of the SCIRON runtime: the SCIRON enforcement module and SCIRON’s management console.

1) Enforcement module: The enforcement module is the engine of SCIRON’s firewall; it actively enforces the security policy on incoming and outgoing packets. SCIRON’s firewall is an ordered 5-tuple firewall. When a packet arrives, the rules are scanned sequentially, and the action (accept or reject) of the first rule that matches the packet header is performed. If no rule matches, the default policy action (reject all) is performed.
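The ordered, first-match evaluation with a default reject can be sketched as follows. The rule layout (an inclusive range per 5-tuple field, so a wildcard is simply the full range) and all names are hypothetical. The loop also shows why the number of rules matters for speed: a packet that matches late, or not at all, costs a pass over the whole list.

```c
#include <stddef.h>
#include <stdint.h>

enum action { ACT_REJECT = 0, ACT_ACCEPT = 1 };

struct range32 { uint32_t lo, hi; };   /* inclusive */

struct rule {
    struct range32 proto, src_ip, dst_ip, src_port, dst_port;
    enum action action;
};

struct pkt_tuple { uint32_t proto, src_ip, dst_ip, src_port, dst_port; };

static int in_range(struct range32 r, uint32_t v) { return v >= r.lo && v <= r.hi; }

/* First match wins; with no match the default policy (reject all) applies.
   If `matched` is non-NULL it receives the matching rule index, or n for
   the default policy. */
static enum action evaluate(const struct rule *rules, size_t n,
                            const struct pkt_tuple *p, size_t *matched)
{
    for (size_t i = 0; i < n; i++) {
        const struct rule *r = &rules[i];
        if (in_range(r->proto, p->proto) && in_range(r->src_ip, p->src_ip) &&
            in_range(r->dst_ip, p->dst_ip) && in_range(r->src_port, p->src_port) &&
            in_range(r->dst_port, p->dst_port)) {
            if (matched) *matched = i;
            return r->action;
        }
    }
    if (matched) *matched = n;
    return ACT_REJECT;
}
```

With the rules "reject TCP to port 23" followed by "accept all TCP", a Telnet packet stops at the first rule, a web packet at the second, and a UDP packet falls through to the default reject.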

2) Management Console: SCIRON’s management console provides remote administration and logging capabilities. Administrators can remotely install security policies on the enforcement modules of machines in their domain. This is done by communicating with SCIRON’s embedded enforcement module using a proprietary protocol called SRPP (SCIRON Remote Policy Protocol).

An administrator can also define the policy for monitoring and logging events to the management console. This is done by marking specific rules as log-rules. A packet caught by such a rule generates a log packet containing the packet’s information, which is then sent to the management console. Real-time monitoring and tracking of network activity enables the administrator to act immediately upon potential attacks.
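The text does not specify the format of a log packet. As a hypothetical sketch, a log-rule match could be serialized into a fixed-size, big-endian record that the NIC places in the payload of a packet addressed to the management console; the chosen fields and the 16-byte layout below are assumptions, not SCIRON’s actual format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical contents of one log record. */
struct log_record {
    uint16_t rule_index;  /* which log-rule fired */
    uint8_t  action;      /* 1 = accepted, 0 = rejected */
    uint8_t  proto;
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

#define LOG_RECORD_SIZE 16

static void put16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)(v >> 8); p[1] = (uint8_t)v; }
static void put32(uint8_t *p, uint32_t v)
{
    put16(p, (uint16_t)(v >> 16));
    put16(p + 2, (uint16_t)v);
}

/* Writes the record in network byte order; returns the number of bytes
   written, or 0 if the buffer is too small. */
static size_t log_serialize(const struct log_record *r, uint8_t *buf, size_t cap)
{
    if (cap < LOG_RECORD_SIZE)
        return 0;
    put16(buf, r->rule_index);
    buf[2] = r->action;
    buf[3] = r->proto;
    put32(buf + 4, r->src_ip);
    put32(buf + 8, r->dst_ip);
    put16(buf + 12, r->src_port);
    put16(buf + 14, r->dst_port);
    return LOG_RECORD_SIZE;
}
```

A fixed layout keeps the firmware side trivial and lets the log viewer decode records without any per-record metadata.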

SCIRON’s management console consists of the following modules: (1) Management console GUI: a tool used for defining and managing the security policy. (2) Log viewer: a server application that receives log packets sent by the various enforcement modules and displays them graphically to the administrator. (3) Policy builder: a tool for verifying the correctness of the security policy defined by the administrator by searching for shadowed and redundant rules. The verifier implements the algorithm presented in [8].
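The idea behind the shadowing and redundancy checks can be illustrated with a simplified pairwise test based on the anomaly definitions of [8]: if an earlier rule matches every packet that a later rule matches, the later rule never fires; it is shadowed when the two actions differ and redundant when they agree. This is a sketch, not the full algorithm of [8] (for example, it misses a rule covered only by the union of several earlier rules), and the range-based rule layout is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

enum action { ACT_REJECT = 0, ACT_ACCEPT = 1 };
struct range32 { uint32_t lo, hi; };           /* inclusive */
#define NFIELDS 5   /* protocol, source/destination address, source/destination port */

struct rule {
    struct range32 f[NFIELDS];
    enum action action;
};

enum anomaly { ANOMALY_NONE, ANOMALY_SHADOWED, ANOMALY_REDUNDANT };

/* True if every packet matching `inner` also matches `outer`. */
static int covered_by(const struct rule *inner, const struct rule *outer)
{
    for (int i = 0; i < NFIELDS; i++)
        if (inner->f[i].lo < outer->f[i].lo || inner->f[i].hi > outer->f[i].hi)
            return 0;
    return 1;
}

/* Compare rules[j] with every earlier rule. If one covers it, rules[j] can
   never fire: shadowed if the actions differ, redundant if they agree.
   `by` (if non-NULL) receives the index of the covering rule. */
static enum anomaly classify(const struct rule *rules, size_t j, size_t *by)
{
    for (size_t i = 0; i < j; i++) {
        if (covered_by(&rules[j], &rules[i])) {
            if (by) *by = i;
            return rules[i].action == rules[j].action ? ANOMALY_REDUNDANT
                                                      : ANOMALY_SHADOWED;
        }
    }
    return ANOMALY_NONE;
}
```

For instance, after "reject TCP to ports 0-1023", a later "accept TCP to port 80" is shadowed and a later "reject TCP to port 22" is redundant.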

Fig. 4. SCIRON’s Management Console Architecture

IV. EXPERIMENTAL RESULTS

Many parameters influence firewall performance. The number of rules, current CPU utilization, packet size, the ratio of incoming to outgoing packets, the total number of packets, the number of packets accepted versus rejected, etc., can all potentially affect performance. We measure performance using two metrics: the load on the CPU and the throughput. In this section we present and discuss several typical results.

In the first scenario (Figure 5), the firewall discards all the packets it receives. During this scenario the CPU runs only system processes. CPU utilization is shown on the left of the graph and throughput on the right.

Fig. 5. Receiving - Discarding all packets

As expected, in this scenario the CPU utilization with the firewall implemented on the NIC is approximately zero, whereas for the same firewall on the host it is quite high. The second scenario is the complementary one, in which the firewall accepts all received packets.

In this case, both CPU utilization and throughput are much higher when the SCIRON firewall is deployed (see Figure 6). The host CPU utilization is very high (98.12%), leaving very little CPU time to host applications; most of the CPU cycles are consumed in the networking stack and interrupt handlers. Although the host CPU has more computational power than the NIC, it has less work to do for each incoming packet (as rule matching is done by the NIC), which leads to higher throughput.

Fig. 6. Receiving - Accepting all packets

The third scenario (Figure 7), with a 50% acceptance rate, probably reflects more realistic behavior for a typical host machine.

Fig. 7. Receiving - 50% acceptance rate

It is evident that the NIC-based firewall performs better in both CPU utilization and throughput.

The last scenario is somewhat less typical. In this scenario the host sends packets that all comply with the firewall rules, so all packets are forwarded.

As Figure 8 shows, the SCIRON firewall’s performance is inferior to that of the host’s firewall. Since all outgoing packets have to be processed, the firewall’s computational power becomes a bottleneck, as the host CPU is faster than the NIC’s processor. We expect this difference to be less pronounced on high-end NICs. In practice, the load on most machines is driven by incoming rather than outgoing bandwidth.

Fig. 8. Sending - Accepting all packets

A. Extensions

Offloading code can potentially exacerbate the security problem by adding more opportunities for bugs. Unfortunately, even if an offloaded protocol design can be shown to be secure, this does not imply that all of its implementations are secure. In fact, many (if not most) security holes are implementation bugs, not specification bugs. Hackers actively find and exploit bugs, and a bug in offloaded code could be much more severe than a bug in a traditional user-level application, because it might allow unbounded and unchecked access to host memory.

An additional, minor problem is that in some scenarios a firewall on a NIC suffers drawbacks compared to standard implementations. These scenarios typically involve a heavy load of outgoing packets. These two problems lead us to consider a mixed paradigm: use a NIC firewall for preliminary filtering of incoming traffic along with a conventional firewall for additional filtering. Bugs in the code offloaded to the NIC can then easily be dealt with by the conventional firewall. Since most of the performance gain is due to faster discarding of unwanted packets, this solution preserves most of the benefits of offloading the firewall logic to the NIC. In addition, the conventional firewall can filter outgoing packets, thereby eliminating the bottleneck associated with NIC filtering.
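A minimal sketch of the mixed paradigm, assuming that the NIC stage and the host stage are each reduced to a single predicate: incoming packets must pass both, while outgoing packets are checked only by the host firewall. The pipeline type, the toy packet type and the example predicates are hypothetical placeholders for real rule sets.

```c
#include <stdbool.h>

enum direction { DIR_IN, DIR_OUT };

typedef bool (*filter_fn)(const void *pkt);

struct pipeline {
    filter_fn nic_prefilter;   /* runs on the NIC, incoming traffic only */
    filter_fn host_filter;     /* conventional host firewall, both directions */
};

/* Returns true if the packet is delivered (incoming) or transmitted (outgoing). */
static bool pipeline_pass(const struct pipeline *pl, enum direction dir, const void *pkt)
{
    /* Early drop on the NIC: a rejected incoming packet never reaches the host. */
    if (dir == DIR_IN && !pl->nic_prefilter(pkt))
        return false;
    /* The host firewall is the final check, and the only one for outgoing
       packets, so the NIC is not a bottleneck on the send path. */
    return pl->host_filter(pkt);
}

/* Hypothetical example predicates over a toy packet type. */
struct toy_pkt { unsigned dst_port; };
static bool nic_block_telnet(const void *p) { return ((const struct toy_pkt *)p)->dst_port != 23; }
static bool host_block_smtp(const void *p)  { return ((const struct toy_pkt *)p)->dst_port != 25; }
```

Because the host stage sees every packet that the NIC lets through, a NIC rule that wrongly accepts a packet can still be caught by the host rule set.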

In the future we intend to look at a Stateful Packet Inspection (SPI) firewall, in which filtering depends on previously received packets, to counter attacks such as denial of service (DoS). We expect the advantages of an SPI firewall on a NIC to be substantial, although less pronounced than for stateless packet inspection.
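To illustrate how stateful inspection differs from the current stateless hooks, here is a minimal connection-tracking sketch, one common form of SPI: outgoing packets record a flow, and an incoming packet is accepted only if it belongs to a recorded flow. The flow structure and function names are hypothetical, and the fixed-size table, linear search and lack of entry expiry are simplifications; a practical SPI firewall would also track protocol state and time out idle flows.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A flow seen from the protected host's point of view. */
struct flow {
    uint8_t  proto;
    uint32_t local_ip, remote_ip;
    uint16_t local_port, remote_port;
};

#define MAX_FLOWS 64
static struct flow flows[MAX_FLOWS];
static size_t nflows;

static bool same_flow(const struct flow *a, const struct flow *b)
{
    return a->proto == b->proto && a->local_ip == b->local_ip &&
           a->remote_ip == b->remote_ip && a->local_port == b->local_port &&
           a->remote_port == b->remote_port;
}

/* Outgoing packet: remember its flow (silently ignored when the table is full). */
static void spi_outgoing(const struct flow *f)
{
    for (size_t i = 0; i < nflows; i++)
        if (same_flow(&flows[i], f))
            return;
    if (nflows < MAX_FLOWS)
        flows[nflows++] = *f;
}

/* Incoming packet, with local = its destination and remote = its source:
   accept it only if it belongs to a flow the host opened. */
static bool spi_incoming(const struct flow *f)
{
    for (size_t i = 0; i < nflows; i++)
        if (same_flow(&flows[i], f))
            return true;
    return false;
}
```

The state table is what a stateless 5-tuple rule cannot express: the same incoming packet is accepted or rejected depending on what the host sent earlier.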

We also intend to port this work to the NICOS operating system, which was also developed in our lab [5].

V. CONCLUSIONS

We learned that offloading firewall logic to a NIC has many advantages. In scenarios with a heavy incoming packet load (especially if packets need to be discarded), a firewall offloaded to a NIC significantly improves both CPU utilization and packet throughput. In the less likely scenario of heavy outgoing traffic, a firewall offloaded to a NIC is slower than a conventional firewall. It is important to note that our implementation is based on an obsolete NIC. We expect the performance gain to be more pronounced on an advanced NIC. Although NIC hardware is continuously improving, the host CPU will likely remain faster than NIC hardware. To further improve the performance of the sending flow, a mixed paradigm can be used, in which outgoing packets are processed at the host while incoming packets are processed on the NIC. We will study this kind of solution further in future research.

REFERENCES

[1] 3Com embedded firewall. http://www.3com.com/prod.

[2] Checkpoint. http://www.checkpoint.com/.

[3] Cisco Systems Inc. http://www.cisco.com/.

[4] ClamAV. http://www.clamav.net/.

[5] Network Interface Card Operating System (NICOS). Homepage at http://www.cs.huji.ac.il/~wyaron/.

[6] PCI-SIG industry organization, PCI specification. http://www.pcisig.com/specifications.

[7] Snort. http://www.snort.org/.

[8] E. S. Al-Shaer and H. H. Hamed. Discovery of policy anomalies in distributed firewalls. In INFOCOM, 2004.

[9] A. Currid. TCP offload to the rescue. Queue, 2(3):58–65, 2004.

[10] Linux Journal. Building a firewall with IP chains. www2.linuxjournal.com/article/3622.

[11] Y. Weinsberg. SCIRON, Secure Communication IntegRated over NIC. Homepage at http://www.cs.huji.ac.il/~wyaron/sciron.html/.

[12] Y. Weinsberg. STORM, Super-fast Transport Over Replicated Machines. Homepage at http://www.cs.huji.ac.il/~wyaron/storm.html/.
