Now that you understand the OS aspects of server applications and how the TCP stack works, you can take advantage of network products and features to optimize the use of the servers in a Data Center. Optimizing the server performance involves a combination of many factors:
• Use of faster bus architectures. This topic was covered in the section, “PCI and PCI-X Buses.”
• Use of the Window Scale option. This topic was described in the section, “High-Speed Networks and the Window Scale Option.”
• Bundling multiple NICs in EtherChannels. This topic was described in the section, “Server Multihoming.” More information about using multiple NICs from a single server and the network design aspects of this technique can be found in Chapter 20.
• Alleviating the CPU load by using interrupt coalescing.
• Alleviating the CPU load by using TCP offloading.
• Alleviating the CPU load by using jumbo frames.
• Using reverse proxy caching to deliver static content.
This section provides additional details about jumbo frames and reverse proxy caching.
Jumbo Frames
After reading the section, “Client and Server Packet Processing,” you should be aware of how much processing each interrupt from a NIC triggers when a packet arrives.
The overhead associated with the processing of each interrupt can be a limiting factor in the achievable throughput because the CPU could be busy processing hardware interrupts without performing any other operation on the incoming traffic.
With the adoption of Gigabit Ethernet NICs, and assuming one interrupt per frame at wire rate, the number of interrupts per second that arrive at the CPU can range from about 81,000 (with 1518-byte frames) to about 1,488,000 (with 64-byte frames).
NOTE In Gigabit Ethernet, the preamble and the interpacket gap together add the equivalent of 20 bytes to each frame on the wire. As a result, the number of packets per second for 64-byte frames can be calculated as follows:
1 Gbps ÷ (84 bytes × 8 bits per byte) = 1.488 million packets per second
Similarly, the number of packets per second for 1518-byte frames equals:
1 Gbps ÷ (1538 bytes × 8 bits per byte) = 81,274 packets per second
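The arithmetic in the note can be reproduced with a short script. This is a minimal sketch; the function and constant names are illustrative and do not come from any product or library.

```python
LINE_RATE_BPS = 1_000_000_000   # Gigabit Ethernet line rate
WIRE_OVERHEAD_BYTES = 20        # per-frame overhead on the wire, as in the note


def frames_per_second(frame_bytes, line_rate_bps=LINE_RATE_BPS):
    """Return the maximum frame rate at wire rate for a given frame size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return line_rate_bps / bits_on_wire


if __name__ == "__main__":
    for size in (64, 1518):
        rate = frames_per_second(size)
        print(f"{size}-byte frames: {rate:,.0f} frames per second")
```

If every frame generates one interrupt, these frame rates are also the interrupt rates that the CPU has to sustain.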
Network Architecture Design Options 63
The consequence is that wire-rate Gigabit Ethernet traffic could consume all the available CPU cycles, even with gigahertz processors. This is the result of interrupt processing and excessive context switching.
Two main solutions to this problem exist:
• Interrupt coalescing: This mechanism consists of delaying the interrupt generation from the NIC to the CPU. When the interrupt is eventually raised, the CPU processes a batch of packets instead of just one.
• Jumbo frames: This mechanism uses frames larger than the standard 1500-byte MTU.
Interrupt coalescing is a feature available on NICs that does not require any extra network configuration. Jumbo frames allow throughput improvements and require configuration on the NICs as well as the network.
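The effect of interrupt coalescing can be illustrated with a small simulation. This is a sketch under stated assumptions: the hypothetical NIC raises an interrupt when `max_frames` packets are pending or when `max_delay_us` microseconds have passed since the first pending packet arrived. Real NICs expose their own parameters, and the names used here are illustrative only.

```python
def count_interrupts(arrival_times_us, max_frames, max_delay_us):
    """Count interrupts raised for a sorted list of packet arrival times.

    An interrupt fires when max_frames packets are pending, or when
    max_delay_us has elapsed since the first pending packet arrived.
    """
    interrupts = 0
    pending = 0
    first_pending_at = None
    for t in arrival_times_us:
        # The delay timer expired before this packet arrived: flush the batch.
        if pending and t - first_pending_at >= max_delay_us:
            interrupts += 1
            pending = 0
        if pending == 0:
            first_pending_at = t
        pending += 1
        # The frame-count threshold was reached: raise the interrupt now.
        if pending >= max_frames:
            interrupts += 1
            pending = 0
    if pending:
        interrupts += 1  # the timer eventually flushes the last batch
    return interrupts


if __name__ == "__main__":
    # 1000 packets spaced 12 microseconds apart (roughly 1518-byte frames).
    arrivals = [i * 12 for i in range(1000)]
    print(count_interrupts(arrivals, max_frames=1, max_delay_us=100))
    print(count_interrupts(arrivals, max_frames=8, max_delay_us=100))
```

With one frame per interrupt, every packet interrupts the CPU. Batching eight frames reduces the interrupt count by a factor of eight for the same traffic, at the cost of a small added delay per packet.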
Jumbo frames are bigger than 1518 bytes. Because each frame carries more data, fewer frames (and therefore fewer interrupts) are needed to move the same amount of data. As a result, throughput increases, CPU utilization decreases, and efficiency improves because each packet has a higher ratio of data to control information.
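The efficiency gain can be estimated with a quick calculation. The sketch below assumes TCP over IPv4 with no options (20-byte IP and TCP headers), 18 bytes of Ethernet header and CRC, and 20 bytes of per-frame wire overhead. These figures are assumptions made for illustration, and the function names are hypothetical.

```python
import math

ETHERNET_HEADER_AND_CRC = 18   # assumed: 14-byte header plus 4-byte CRC
WIRE_OVERHEAD = 20             # assumed per-frame overhead on the wire
IP_HEADER = 20                 # assumed: IPv4 without options
TCP_HEADER = 20                # assumed: TCP without options


def payload_efficiency(mtu):
    """Fraction of the bytes on the wire that carry application data."""
    payload = mtu - IP_HEADER - TCP_HEADER
    on_wire = mtu + ETHERNET_HEADER_AND_CRC + WIRE_OVERHEAD
    return payload / on_wire


def frames_needed(data_bytes, mtu):
    """Number of full-size frames needed to carry data_bytes of payload."""
    payload = mtu - IP_HEADER - TCP_HEADER
    return math.ceil(data_bytes / payload)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: efficiency {payload_efficiency(mtu):.1%}, "
              f"{frames_needed(1_000_000, mtu)} frames per megabyte")
```

Under these assumptions, the efficiency gain is a few percentage points, while the number of frames (and potentially interrupts) per megabyte drops by roughly a factor of six, which is where most of the CPU saving comes from.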
How can an Ethernet frame be bigger than the maximum transmission unit? Ethernet specifications mandate a maximum frame size of 1518 bytes (1500-byte MTU), but, in reality, frames can exceed the 1518-byte size as long as the cyclic redundancy check (CRC) algorithm remains effective (its error detection deteriorates with frames bigger than 12,000 bytes).
The typical size for jumbo frames is approximately 9000 bytes, which is well below the limits of the CRC and big enough to carry an 8-KB Network File System (NFS) block over UDP in a single frame.
In terms of network configuration, support for jumbo frames needs to be enabled on a per-port basis on switches and routers in the path between servers exchanging this type of traffic.
If jumbo packets go out to an interface of an intermediate network device with a smaller MTU, they are fragmented. The network device that needs to fragment performs this operation in software with an obvious performance impact.
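The cost of fragmentation becomes clearer when you count the fragments. The following sketch assumes IPv4 with a 20-byte header, no options, and the Don't Fragment bit cleared. The function name is illustrative.

```python
def fragment_sizes(packet_bytes, mtu, ip_header_bytes=20):
    """Return the sizes (header included) of the IPv4 fragments of a packet.

    Every fragment except the last must carry a payload that is a
    multiple of 8 bytes, because the fragment offset is in 8-byte units.
    """
    payload = packet_bytes - ip_header_bytes
    max_chunk = (mtu - ip_header_bytes) // 8 * 8
    sizes = []
    while payload > 0:
        chunk = min(max_chunk, payload)
        sizes.append(chunk + ip_header_bytes)
        payload -= chunk
    return sizes


if __name__ == "__main__":
    sizes = fragment_sizes(9000, 1500)
    print(len(sizes), sizes)
```

A single 9000-byte packet leaving an interface with a 1500-byte MTU becomes seven packets, each with its own IP header, and the fragmenting device must build all of them.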
Reverse Proxy Caching
Having a kernel mode and a user mode increases the reliability of servers by isolating the operating system from the operations performed by the applications. The drawback of separating the two modes is performance, which degrades with the number of times that data is copied within the server on each transition between user mode and kernel mode (see the section, “User Mode and Kernel Mode”).
A solution to this problem involves deploying kernel caches in the form of either separate appliances or servers. Deployment of caches in Data Center environments typically goes by the name of reverse proxy caching.
Typically, a proxy cache is just a server, but it runs in kernel mode because it does not need to host a number of different applications. As a result, with the hardware being equivalent, a proxy cache is faster than a server in delivering static objects because of the reduced overhead on data processing. Figure 2-5 shows the architecture of a typical Data Center and highlights the placement of cache engines in relation to the aggregation switches.
Figure 2-5 Cache Attachment in a Data Center
[Figure labels: Enterprise Campus Core, Aggregation Layer, Front End Layer, Mainframe, Servers. Legend: Load Balancer, Firewall, SSL Offloader, Cache, Site Selector, IDS Sensor.]
Another benefit of deploying caches is outlined in Chapter 3. Using caches can simplify the design of multitier server farms.
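The basic behavior of a reverse proxy cache for static objects can be sketched as follows. This is a conceptual illustration only: it runs in user space, ignores object expiration and cache size limits, and the class name and `fetch_from_origin` callback are hypothetical rather than part of any cache product.

```python
class ReverseProxyCache:
    """Serve static objects from memory, contacting the origin only on a miss."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin   # callable: path -> object body
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, path):
        if path in self._store:
            self.hits += 1                # served without touching the server
            return self._store[path]
        self.misses += 1
        body = self._fetch(path)          # one request to the origin server
        self._store[path] = body
        return body


if __name__ == "__main__":
    origin_requests = []

    def origin(path):
        origin_requests.append(path)
        return f"<contents of {path}>"

    cache = ReverseProxyCache(origin)
    for _ in range(3):
        cache.get("/images/logo.gif")
    print(cache.hits, cache.misses, len(origin_requests))
```

After the first request, repeated requests for the same static object are answered by the cache, so the origin server sees only one request for it.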