3 Network Interface Layer
3.5 Buffer Management
Incoming packets must be placed in memory and passed to the appropriate protocol software for processing. Meanwhile, when an application program generates output, it must be stored in packets in memory and passed to a network hardware device for transmission. Thus, the network interface layer accepts outgoing data in memory and passes incoming data to higher-level protocol software in memory. The ultimate efficiency of protocol software depends on how it manages the memory used to hold packets. A good design allocates space quickly and avoids copying data as packets move between layers of protocol software.
Ideally, a system could make memory allocation efficient by dividing memory into fixed-size buffers, where each buffer is large enough to hold a packet. In practice, however, choosing an optimum buffer size is complex for several reasons. First, a computer may connect to several networks, each of which has its own notion of maximum packet size. Furthermore, it should be possible to add connections to new types of networks without changing the system's buffer size. Second, IP may need to store datagrams larger than any network's maximum packet size (e.g., when reassembling a datagram from fragments). Third, an application program may choose to send or receive messages of arbitrary size.
[Figure: Interface between IP and networks; Datagrams to and from local host]
3.5.1 Large Buffer Solution
It may seem that the ideal solution is to allocate buffers that are capable of storing the largest possible message or packet. However, because an IP datagram can be 64K octets long, allocating buffers large enough for arbitrary datagrams quickly expends all available memory on only a few buffers. Furthermore, small packets are the norm; large datagrams are rare. Thus, using large buffers can result in a situation where memory utilization remains low even though the system does not have sufficient buffers to accommodate traffic.
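A quick calculation shows why. Assume, purely for illustration, M = 1 MB (2^20 octets) of buffer memory divided into buffers of B = 64K octets (2^16), and a typical small packet of p = 64 octets. The number of buffers N and the fraction U of a buffer that such a packet occupies are:

```latex
N = \left\lfloor \frac{M}{B} \right\rfloor
  = \left\lfloor \frac{2^{20}}{2^{16}} \right\rfloor = 16,
\qquad
U = \frac{p}{B} = \frac{64}{65536} \approx 0.1\%
```

Under these assumptions only 16 packets can be buffered at once, and each small packet leaves more than 99.9% of its buffer unused.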
In practice, designers who use the large buffer approach usually choose an upper bound on the size of datagrams the system will handle, D, and make buffers large enough to hold a datagram of size D plus the physical network frame header. The choice of D is a tradeoff between allowing large datagrams and having sufficient buffers for the expected traffic. Thus, D depends on the expected size of buffer memory as well as the expected use of the system. Typically, timesharing systems choose values of D between 4K and 8K bytes.
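A large-buffer declaration along these lines might look as follows. It is a sketch only: the frame header length (14 octets, as for an Ethernet header) and D = 4096 are assumed values, and the names bigbuf, dgram_fits, and bufs_in_pool are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parameters: the text leaves the choice of D to the designer. */
#define FRAME_HDR_LEN 14     /* assumed frame header size (Ethernet, no VLAN tag) */
#define MAX_DGRAM     4096   /* D: assumed upper bound on datagram size */

/* A "large buffer": room for one frame header plus a datagram of up to D octets. */
struct bigbuf {
    size_t        len;                             /* octets of the frame in use */
    unsigned char data[FRAME_HDR_LEN + MAX_DGRAM]; /* frame header + datagram */
};

/* Can a datagram of n octets be stored in a single buffer? */
int dgram_fits(size_t n)
{
    return n <= MAX_DGRAM;
}

/* How many large buffers fit in a memory pool of pool_bytes octets. */
size_t bufs_in_pool(size_t pool_bytes)
{
    return pool_bytes / sizeof(struct bigbuf);
}
```

Because each buffer costs roughly D octets regardless of the packet it holds, doubling D roughly halves the number of buffers a fixed pool can supply.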
3.5.2 Linked List Solutions (mbufs)
The chief alternative to large buffers uses linked lists of smaller buffers to handle arbitrary datagram sizes. In linked list designs, the individual buffers on the list can be fixed or variable in size. Most systems allocate fixed-size buffers because doing so prevents fragmentation and guarantees high memory utilization. Usually, each buffer is small (e.g., between 128 and 1K bytes), so many buffers must be linked together to represent a complete datagram. For example, Berkeley UNIX uses a linked structure known as the mbuf, where each mbuf is 128 bytes long. Individual mbufs need not be completely full; a short header specifies where data starts in the mbuf and how many bytes are present. Permitting buffers on the linked list to contain partial data has another advantage: it allows quick encapsulation without copying. When a layer of software receives a message from a higher layer, it allocates a new buffer, fills in its header information, and prepends the new buffer to the linked list that represents the message. Thus, additional bytes can be inserted at the front of a message without moving the existing data.
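The following C sketch illustrates the idea with a simplified, hypothetical sbuf structure rather than the real BSD mbuf, whose layout is more elaborate. Each buffer records an offset and length for its valid data, and prepend_header encapsulates a message by linking a new header buffer in front of the existing chain.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SBUF_SIZE 128   /* data area per buffer; echoes the 128-byte mbuf in the text */

/* Simplified small buffer (hypothetical, not BSD's struct mbuf): off and len
   record where valid data starts in data[] and how many bytes are present. */
struct sbuf {
    struct sbuf  *next;            /* next buffer in the same message */
    size_t        off;             /* index of first valid byte */
    size_t        len;             /* number of valid bytes */
    unsigned char data[SBUF_SIZE];
};

/* Encapsulate by prepending: put hdr in a fresh buffer and link the existing
   chain behind it. The bytes already on the chain are never moved. */
struct sbuf *prepend_header(struct sbuf *chain, const void *hdr, size_t hlen)
{
    struct sbuf *b;

    assert(hlen <= SBUF_SIZE);
    if ((b = malloc(sizeof *b)) == NULL)
        return NULL;
    b->off = SBUF_SIZE - hlen;     /* header sits at the end, leaving room in front */
    b->len = hlen;
    memcpy(b->data + b->off, hdr, hlen);
    b->next = chain;
    return b;
}

/* Message length: the sum of the valid bytes in every buffer on the chain. */
size_t chain_len(const struct sbuf *b)
{
    size_t n = 0;

    for (; b != NULL; b = b->next)
        n += b->len;
    return n;
}

/* Flatten the message into contiguous memory (e.g., for a device that
   cannot gather). out must hold at least chain_len(b) bytes. */
size_t chain_copyout(const struct sbuf *b, unsigned char *out)
{
    size_t n = 0;

    for (; b != NULL; b = b->next) {
        memcpy(out + n, b->data + b->off, b->len);
        n += b->len;
    }
    return n;
}

/* Release every buffer on the chain. */
void chain_free(struct sbuf *b)
{
    while (b != NULL) {
        struct sbuf *next = b->next;
        free(b);
        b = next;
    }
}
```

A layer that adds a 20-octet header would call prepend_header(chain, hdr, 20); none of the bytes already on the chain move.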
3.5.3 Our Example Solution
Our example system chooses a compromise between large buffers sufficient to store arbitrary datagrams and linked lists of small buffers: it allocates many network buffers large enough to hold a single packet and a few buffers large enough to hold large datagrams. The system performs packet-level I/O using the small buffers, and only resorts to using large buffers when generating or reassembling large datagrams. This design was chosen because we expect most datagrams to be smaller than a conventional network MTU, but we want to be able to reassemble larger datagrams as well. Thus, in most instances, it will be possible to pass an entire buffer to IP after reading a packet into it; the system will only need to copy data when reassembling a large datagram.
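A minimal sketch of the two decisions this design implies: which buffer size a datagram needs, and how reassembly copies fragments into a large buffer. The sizes SMALL_BUFSZ and LARGE_BUFSZ and the function names are hypothetical; the text does not specify them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SMALL_BUFSZ 1518   /* hypothetical: roughly one Ethernet frame */
#define LARGE_BUFSZ 8192   /* hypothetical: largest datagram the system will build */

/* Choose a buffer size for a datagram of n octets: small buffers for the
   common case, large ones only when the datagram cannot fit in a packet
   buffer. Returns 0 if the datagram exceeds every buffer size. */
size_t pick_bufsize(size_t n)
{
    if (n <= SMALL_BUFSZ)
        return SMALL_BUFSZ;
    if (n <= LARGE_BUFSZ)
        return LARGE_BUFSZ;
    return 0;
}

/* Reassembly is the one place data is copied: a fragment's payload is
   copied into the large buffer at the fragment's byte offset. */
int reasm_copy(unsigned char *big, size_t bigsz, size_t frag_off,
               const void *frag, size_t frag_len)
{
    if (frag_off > bigsz || frag_len > bigsz - frag_off)
        return -1;                 /* fragment would overrun the large buffer */
    memcpy(big + frag_off, frag, frag_len);
    return 0;
}
```

Fragments may arrive in any order; because each copy is placed by offset, arrival order does not matter.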
To make buffer processing uniform, our system uses a self-identifying buffer scheme provided by the operating system. To allocate a buffer, the system calls function getbuf and specifies whether it needs a large buffer or a small one. Once the buffer has been allocated, however, only a pointer to it needs to be saved. To return the buffer to the free list, the system calls freebuf, passing it a pointer to the buffer being released; freebuf deduces the size of the buffer automatically. The advantage of self-identifying buffers is that protocol software can pass along a pointer to a buffer without having to remember whether it was allocated from the large or small group. Thus, outgoing packets can be kept in a simple list that identifies them by address. Once a device has transmitted a packet, the driver software can call freebuf to dispose of the buffer without having to know the buffer type.
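One common way to make buffers self-identifying is to hide a small header immediately before the address returned to the caller and record the pool there. The sketch below assumes that technique; the text does not say how the operating system implements getbuf and freebuf, so the enum argument, the header layout, and the helper bufsize are assumptions. For brevity it obtains buffers from malloc on demand instead of preallocating fixed pools.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SMALL_BUFSZ 1518   /* hypothetical small (one-packet) buffer size */
#define LARGE_BUFSZ 8192   /* hypothetical large-datagram buffer size */

enum bufpool { POOL_SMALL = 0, POOL_LARGE = 1 };   /* hypothetical argument type */

/* Hidden header stored just before the address handed to the caller. */
union bufhdr {
    struct {
        enum bufpool  pool;   /* pool this buffer belongs to */
        union bufhdr *next;   /* free-list link, used only while the buffer is free */
    } h;
    max_align_t align;        /* keeps the caller's data area suitably aligned */
};

static union bufhdr *freelist[2];
static const size_t poolsize[2] = { SMALL_BUFSZ, LARGE_BUFSZ };

/* Allocate from the requested pool; only the returned pointer need be saved. */
void *getbuf(enum bufpool pool)
{
    union bufhdr *b;

    assert(pool == POOL_SMALL || pool == POOL_LARGE);
    if ((b = freelist[pool]) != NULL)
        freelist[pool] = b->h.next;                  /* reuse a released buffer */
    else if ((b = malloc(sizeof *b + poolsize[pool])) == NULL)
        return NULL;                                 /* out of memory */
    b->h.pool = pool;
    return b + 1;                                    /* caller sees only the data area */
}

/* Release a buffer: its pool is read from the hidden header, not passed in. */
void freebuf(void *p)
{
    union bufhdr *b = (union bufhdr *)p - 1;

    b->h.next = freelist[b->h.pool];
    freelist[b->h.pool] = b;
}

/* Report the usable size of a buffer, again using only its address. */
size_t bufsize(const void *p)
{
    return poolsize[((const union bufhdr *)p - 1)->h.pool];
}
```

A driver that has finished transmitting can simply call freebuf(pkt) on each pointer in its output list; whether pkt came from the small or the large pool never enters into it.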
3.5.4 Other Buffer Issues
DMA Memory. Hardware requirements often complicate buffer management. For example, some devices can perform I/O only in an area of memory reserved for direct memory access (DMA). In such systems, the operating system may choose to allocate two sets of buffers: those used by protocol software and those used for device transfer. The system must copy outgoing data from conventional buffers to the DMA area before transmission, and must copy incoming data from the DMA area to conventional buffers.
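The copying this implies can be sketched as follows. The DMA region is modeled as a static array, an assumption standing in for memory the operating system would reserve for the device; the function names are hypothetical, and starting the hardware transfer is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DMA_BUFSZ 1518   /* hypothetical frame size */

/* Stand-in for the memory region the device can reach by DMA (assumption:
   a real system would reserve this area; here it is an ordinary array). */
static unsigned char dma_area[DMA_BUFSZ];

/* Output: copy a packet from a conventional buffer into the DMA area before
   transmission. Returns the number of bytes staged, or -1 if too large. */
long dma_stage_output(const void *pkt, size_t len)
{
    if (len > DMA_BUFSZ)
        return -1;
    memcpy(dma_area, pkt, len);
    /* hypothetical: start the device transfer from dma_area here */
    return (long)len;
}

/* Input: after the device has deposited len bytes in the DMA area, copy them
   into a conventional buffer so the DMA area can be reused. */
long dma_collect_input(void *buf, size_t bufsz, size_t len)
{
    if (len > DMA_BUFSZ || len > bufsz)
        return -1;
    memcpy(buf, dma_area, len);
    return (long)len;
}
```

Each direction pays for one extra memcpy per packet, which is the overhead that devices able to reach all of memory avoid.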
Gather-write, scatter-read. Some devices can transmit or receive packets in noncontiguous memory locations. On output, such devices accept a list of buffer addresses and lengths. They gather pieces of the packet from buffers on the list, and transmit the resulting sequence of bytes without requiring the system to assemble the packet in contiguous memory locations. The technique is known as gather-write.
Similarly, the hardware may support scatter-read, in which the hardware deposits the packet in noncontiguous memory locations according to a list of buffer addresses specified by the device driver. Gather-write and scatter-read make linked buffer allocation easy and efficient because they allow the hardware to pick up pieces of the packet from the buffers on the linked list without requiring the processor to assemble a complete packet in memory. These techniques can also be used with fixed-size buffers because they allow the driver to encapsulate a datagram without copying it. To do so, the driver places the frame header in one part of memory and passes to the hardware the address of the header along with the address of the datagram, which becomes the data portion of the physical packet.
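The following C sketch simulates both techniques in software with a hypothetical (address, length) descriptor list. The memcpy calls stand in for the device's own DMA transfers, so they model hardware behavior rather than processor copying. It also illustrates the fixed-size-buffer case from the text: the frame header and the datagram live in separate buffers and are handed over as two descriptors.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor: one (address, length) pair handed to the device. */
struct iodesc {
    void  *addr;
    size_t len;
};

/* Simulated gather-write: the "device" fetches each piece in list order and
   emits them as one contiguous frame into wire[], standing in for the network.
   Returns the frame length, or 0 if the pieces would not fit. */
size_t gather_write(const struct iodesc *d, int n, unsigned char *wire, size_t wiresz)
{
    size_t total = 0, off = 0;

    for (int i = 0; i < n; i++)
        total += d[i].len;
    if (total > wiresz)
        return 0;
    for (int i = 0; i < n; i++) {
        memcpy(wire + off, d[i].addr, d[i].len);   /* models the device's DMA fetch */
        off += d[i].len;
    }
    return off;
}

/* Simulated scatter-read: an incoming frame is deposited piece by piece into
   the buffers named by the descriptor list. Returns the bytes stored. */
size_t scatter_read(const unsigned char *frame, size_t flen,
                    const struct iodesc *d, int n)
{
    size_t off = 0;

    for (int i = 0; i < n && off < flen; i++) {
        size_t k = flen - off < d[i].len ? flen - off : d[i].len;
        memcpy(d[i].addr, frame + off, k);         /* models the device's DMA store */
        off += k;
    }
    return off;
}
```

POSIX exposes the same idea to application software through writev and readv, which take an array of struct iovec (address, length) pairs.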
Page alignment. In a computer system that supports paged virtual memory, protocol software can attempt to allocate buffers on page boundaries, making it possible to pass a buffer to another process by exchanging page table entries instead of copying. The technique is especially useful on machines with small page sizes (e.g., the Digital Equipment Corporation VAX architecture, which has 512-byte pages), but it does not work well on computers with large page sizes (e.g., the Sun Microsystems Sun 3 architecture, which has 8K-byte pages). Furthermore, swapping page table entries improves efficiency most when moving data between the operating system and an application program, which requires the user data itself to begin on a page boundary. However, incoming packets contain a set of headers that make the exact offset of user data difficult or impossible to determine before a packet has been read. Therefore, few implementations try to align data on page boundaries.
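The sketch below illustrates why a page-aligned buffer is not enough. It uses the real POSIX calls sysconf(_SC_PAGESIZE) and posix_memalign to obtain a page-aligned buffer; the header sizes (a 14-octet Ethernet header, 20-octet IP and TCP headers without options, and a variant with options) are illustrative assumptions, and the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Allocate a packet buffer that starts on a page boundary. The page size is
   stored in *pagesz_out. Returns NULL on failure. */
void *alloc_page_aligned(size_t len, size_t *pagesz_out)
{
    long pg = sysconf(_SC_PAGESIZE);
    void *p = NULL;

    if (pg <= 0 || posix_memalign(&p, (size_t)pg, len) != 0)
        return NULL;
    if (pagesz_out != NULL)
        *pagesz_out = (size_t)pg;
    return p;
}

/* Offset of user data within a received frame. The IP and TCP header
   lengths vary with options, so the value is known only after arrival. */
size_t user_data_offset(size_t link_hdr, size_t ip_hdr, size_t tcp_hdr)
{
    return link_hdr + ip_hdr + tcp_hdr;
}
```

Even though the buffer itself starts on a page boundary, the user data starts 54 or 70 octets into it in these two cases, so it cannot be handed to an application by remapping pages without a copy.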