
Part II: Architecture of Network Implementation

Chapter 5. Network Devices

5.3 Network Drivers

The large number of different protocols in the Linux network architecture leads to considerable differences in the implementations of drivers for different physical network adapters. As mentioned in the section describing the net_device structure, the properties of different network adapters are hidden behind the interface of network devices, which presents a uniform view to the upper layers. This abstraction from the driver in use is achieved through function pointers in the net_device structure. For example, a higher-layer protocol instance uses the method hard_start_xmit() to send an IP packet over a network device. Note, however, that this is merely a function pointer; in the case of a 3c509 network adapter, it points to the method el3_start_xmit(). This method takes the steps required to pass a socket buffer to the 3c509 adapter. The upper layers of the Linux network architecture do not know which driver or network adapter is actually used; the function pointer abstracts from the hardware and its particularities.

The following sections provide an overview of the typical structure and implementation characteristics of the functions of a network driver, without discussing adapter-specific details such as manipulating hardware registers or describing the transmit buffers. These tasks depend on the hardware, so we skip them here. Readers interested in these details can use the large number of network drivers in the drivers/net directory as examples. We use the skeleton driver to explain how driver methods work. This is a sample driver that shows the usual processes in driver methods rather than a real driver for a network adapter. For this reason, it is particularly useful for explaining the implementation characteristics of network drivers.[2]

[2] At this point, we would like to thank Donald Becker, who implemented most of the network drivers for Linux, greatly contributing to the success of Linux. Donald Becker is also the author of the skeleton driver used here.

Some of the methods listed below are not implemented by some drivers (e.g., example_set_config() to change system resources at runtime); others are essential, such as example_hard_start_xmit() to start a transmission process.
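The abstraction through function pointers can be illustrated outside the kernel. The following is a minimal user-space sketch, not kernel code: the types sk_buff_model and net_device_model and the two "drivers" are hypothetical stand-ins invented for this illustration. It only shows that the calling code sees the same hard_start_xmit() interface regardless of which driver function sits behind it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for sk_buff and net_device. */
struct sk_buff_model {
    const unsigned char *data;
    size_t len;
};

struct net_device_model {
    const char *name;
    /* Driver-specific transmit routine, set by the driver at init time. */
    int (*hard_start_xmit)(struct sk_buff_model *skb,
                           struct net_device_model *dev);
    unsigned long tx_packets;
};

/* "Driver A": pretends to hand the buffer to a 3c509-like adapter. */
int el3_model_start_xmit(struct sk_buff_model *skb,
                         struct net_device_model *dev)
{
    (void)skb;
    dev->tx_packets++;
    return 0;               /* 0 = packet accepted */
}

/* "Driver B": an adapter whose transmit buffer is always full. */
int busy_model_start_xmit(struct sk_buff_model *skb,
                          struct net_device_model *dev)
{
    (void)skb;
    (void)dev;
    return 1;               /* non-zero = packet could not be sent */
}

/* Upper-layer code: knows only the device interface, not the driver. */
int upper_layer_send(struct net_device_model *dev, struct sk_buff_model *skb)
{
    return dev->hard_start_xmit(skb, dev);
}
```

A device whose hard_start_xmit member points to el3_model_start_xmit() accepts the packet, while one pointing to busy_model_start_xmit() rejects it; upper_layer_send() is identical in both cases.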

5.3.1 Initializing Network Adapters

Before a network device can be activated, the appropriate network adapter first has to be found; otherwise, the device will not be added to the list of registered network devices. The init() function of the network driver is responsible for searching for an adapter and initializing its net_device structure with matching driver information. Because it searches for a network adapter, this function is often called the search function.

The argument of the init() method is a pointer to the device dev that is being initialized. init() returns 0 on success and a negative error code (e.g., -ENODEV) when no adapter was found.

net_init()/net_probe() net/core/dev.c

The tasks of the method dev->init(dev) are explained using the source text of our example driver, isa_skeleton. There is also an example driver for PCI network adapters in drivers/net/pci_skeleton.c, but we will not describe it here.

As mentioned earlier, the main task of the init() method is to search for a matching network adapter; that is, it has to discover the adapter's I/O ports, in particular the basic address, which is stored in dev->base_addr.

We distinguish between two different cases of searching for a network adapter:

• Specifying the basic address: In this case, the previously created net_device structure of the network device is passed as a parameter to the init() method. The caller can use this structure to specify a basic address for I/O ports in advance. When no matching adapter is found at this address, the init() method returns the error code -ENODEV. The basic address can be specified in either of the following two ways:

o For modularized drivers, parameters can be passed when loading the module, including the I/O basic address (e.g., io=0x280). In this case, it should be transferred to the net_device structure of the network device in the init_module() method of the driver module, so that it will be considered during the search for the network adapter.

o For drivers permanently integrated in the kernel, we can also pass parameters when the system boots; these parameters are maintained in the list dev_boot_setup. They are transferred to the net_device structure of a network device in the method init_netdev() (see Section 5.2) and can be used when the network adapter is initialized.

• Searching in known basic addresses: A network adapter generally supports a set of defined port addresses. If no basic address is specified when calling the init() method, the addresses in this list are probed one after the other. If no adapter can be found at any of the basic addresses in the list, -ENODEV is returned.

The following source code of the init() method for the skeleton driver handles only the selection of the basic addresses to be searched (by the methods described above). The actual verification of a specific basic address and the initialization of the net_device structure take place in the method netcard_probe1(dev, ioaddr), which is logically part of the init() method and was implemented separately to keep the code simple and easy to understand.

/* The name of the card. Is used for messages and in the requests for
 * io regions, irqs and dma channels */
static const char* cardname = "netcard";

/* A zero-terminated list of I/O addresses to be probed. */
static unsigned int netcard_portlist[] __initdata =
    { 0x200, 0x240, 0x280, 0x2C0, 0x300, 0x320, 0x340, 0};

/* The number of low I/O ports used by the ethercard. */
#define IO_NUM 32

/* Information that needs to be kept for each board. */
struct net_local {
    struct net_device_stats stats;
    long open_time;     /* Useless example local info. */

    /* Tx control lock. This protects the transmit buffer ring
     * state along with the "tx full" state of the driver. This
     * means all netif_queue flow control actions are protected
     * by this lock as well. */
    spinlock_t lock;
};

/* The station (ethernet) address prefix, used for IDing the board. */
#define SA_ADDR0 0x00
#define SA_ADDR1 0x42
#define SA_ADDR2 0x65

int __init netcard_probe(struct net_device *dev)
{
    int i;
    int base_addr = dev->base_addr;

    SET_MODULE_OWNER(dev);

    if (base_addr > 0x1ff)      /* Check a single specified location. */
        return netcard_probe1(dev, base_addr);
    else if (base_addr != 0)    /* Don't probe at all. */
        return -ENXIO;

    for (i = 0; netcard_portlist[i]; i++) {
        int ioaddr = netcard_portlist[i];
        if (check_region(ioaddr, IO_NUM))
            continue;
        if (netcard_probe1(dev, ioaddr) == 0)
            return 0;
    }
    return -ENODEV;
}
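To make the three branches of netcard_probe() easier to follow, the selection logic can be reproduced in user space. This is only a sketch under stated assumptions: model_adapter_at, model_probe1() and model_probe() are hypothetical names, the hardware test is replaced by a comparison with a simulated adapter address, and the check_region() step is left out.

```c
#include <assert.h>
#include <errno.h>

/* Zero-terminated list of candidate I/O basic addresses (from the text). */
static const unsigned int model_portlist[] =
    { 0x200, 0x240, 0x280, 0x2C0, 0x300, 0x320, 0x340, 0 };

/* Hypothetical stand-in for the hardware: the address at which the
 * simulated adapter answers; 0 means that no adapter is installed. */
unsigned int model_adapter_at = 0;

/* Stand-in for netcard_probe1(): succeeds only at the adapter's address. */
int model_probe1(unsigned int ioaddr)
{
    return (model_adapter_at != 0 && ioaddr == model_adapter_at)
               ? 0 : -ENODEV;
}

/* Mirrors the three branches of netcard_probe(). On success, *found
 * receives the basic address at which the adapter was detected. */
int model_probe(unsigned int base_addr, unsigned int *found)
{
    int i;

    if (base_addr > 0x1ff) {            /* check one given location */
        int rc = model_probe1(base_addr);
        if (rc == 0)
            *found = base_addr;
        return rc;
    }
    if (base_addr != 0)                 /* low non-zero value: no probing */
        return -ENXIO;

    for (i = 0; model_portlist[i]; i++) {   /* probe the known list */
        if (model_probe1(model_portlist[i]) == 0) {
            *found = model_portlist[i];
            return 0;
        }
    }
    return -ENODEV;
}
```

With the simulated adapter at 0x280, calling model_probe(0, &found) walks the list and stops at 0x280, whereas model_probe(0x300, &found) tests only the given address and fails with -ENODEV.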

Once a basic address has been selected for the network adapter in the above method, the method netcard_probe1(dev, ioaddr) tests whether the adapter we are searching for is really present at this basic address. For this purpose, the method has to check specific properties of the card. Access should be limited to reads of the I/O ports, so that no other adapters are disturbed, because at this point it is still unknown whether the adapter we are searching for is really present at the basic address ioaddr.

A very simple way to identify the adapter is to check the manufacturer identification contained in the MAC address. Each network adapter has a unique MAC address, the first three bytes of which identify the manufacturer. This identification must match the manufacturer code of the card being searched for. Additional checks should be done in any case, but they are adapter-specific and are not described in detail here.
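The manufacturer check amounts to a comparison of three bytes. The following user-space sketch shows this comparison on an address that has already been read into memory; the function name mac_has_oui() and the constants MAC_LEN and OUI_LEN are assumptions made for this illustration and do not appear in the skeleton driver, which reads the bytes directly from the I/O ports.

```c
#include <assert.h>
#include <string.h>

#define MAC_LEN 6
#define OUI_LEN 3   /* the first three octets identify the manufacturer */

/* Returns 1 if the station address starts with the expected manufacturer
 * prefix, and 0 otherwise. */
int mac_has_oui(const unsigned char mac[MAC_LEN],
                const unsigned char oui[OUI_LEN])
{
    return memcmp(mac, oui, OUI_LEN) == 0;
}
```

For the prefix 00:42:65 used by the skeleton driver, an address such as 00:42:65:12:34:56 passes the check, while an address with any other prefix is rejected.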

Once we are sure that the network adapter we searched for is present at the basic address ioaddr, this address is stored in the net_device structure (dev->base_addr), and the network device is initialized. The I/O ports, starting from the basic address, are reserved by request_region(ioaddr, IO_NUM, cardname) at the end of the initialization function to ensure that no other initialization method can get write access to them. The initialization process can be divided into the following three phases:

• If the network adapter does not support dynamic interrupt allocation, the interrupt set by jumpers on the network adapter should be determined and reserved at this point. The kernel supports the search for the interrupt number. Calling the method autoirq_setup() makes the kernel record, in a variable, the interrupt lines that are not currently registered. Subsequently, the network adapter should be made to trigger an interrupt. We can then use the method autoirq_report() to discover, by comparing the previously stored and the current interrupt state, which interrupt was actually triggered. Next, the interrupt found is reserved for the network adapter by the method request_irq(). In addition, the DMA channel is determined and reserved by request_dma().

For modern adapters that do not require specific interrupt or DMA lines, these two system resources are allocated not at this point, but when the device is opened. This avoids conflicts with other devices.

• Once system resources have been allocated (for older adapters only), memory is reserved for the private data structure of the network device, dev->priv, and initialized. This data structure stores the private data of the network driver and the statistical information collected during the operation of the network device (net_device_stats structure).

• Finally, the references to driver-specific methods are set in the net_device structure, so that they can be used by the higher layers and protocols. The adapter-specific methods (see also Section 5.1.1) have to be set explicitly. Methods specific to the MAC protocol used (e.g., Ethernet) can be set by special methods (e.g., ether_setup()).

If the network adapter was found and all data structures were initialized correctly, then dev->init() returns 0.
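The interrupt search described in the first phase can be pictured as a comparison of two bit masks. The following is a simplified user-space model, not the kernel implementation: the functions model_autoirq_setup() and model_autoirq_report(), the 16-line limit, and the return value -1 for an ambiguous result are assumptions chosen for this sketch.

```c
#include <assert.h>

/* Bit n set = interrupt line n. All values here are simulated. */
typedef unsigned int irq_mask_t;

/* Step 1: remember which lines are currently unregistered and may
 * therefore be probed (the role the text assigns to autoirq_setup()). */
irq_mask_t model_autoirq_setup(irq_mask_t registered_lines)
{
    return ~registered_lines & 0xFFFFu;  /* model with 16 interrupt lines */
}

/* Step 3: given the candidate lines and the lines that fired after the
 * adapter was made to interrupt, report the line number. Returns 0 if
 * no candidate fired and -1 if more than one fired. Line 0 is assumed
 * to be registered already, so 0 is never a valid result. */
int model_autoirq_report(irq_mask_t candidates, irq_mask_t fired)
{
    irq_mask_t hit = candidates & fired;
    int irq;

    if (hit == 0)
        return 0;
    if (hit & (hit - 1))                 /* more than one bit set */
        return -1;
    for (irq = 0; !(hit & 1u); irq++)
        hit >>= 1;
    return irq;
}
```

Between the two calls (step 2), a real driver would program the adapter to raise an interrupt; in this model, that step is represented by the fired argument.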

/* This is the real probe routine. Linux has a history of friendly device
 * probes on the ISA bus. A good device probe avoids doing writes, and
 * verifies that the correct device exists and functions. */
static int __init netcard_probe1(struct net_device *dev, int ioaddr)
{
    struct net_local *np;
    static unsigned version_printed = 0;
    int i;

    /*
     * For Ethernet adaptors the first three octets of the station address
     * contains the manufacturer's unique code. That might be a good probe
     * method. Ideally you would add additional checks.
     */
    if (inb(ioaddr + 0) != SA_ADDR0
        || inb(ioaddr + 1) != SA_ADDR1
        || inb(ioaddr + 2) != SA_ADDR2) {
        return -ENODEV;
    }

    if (net_debug && version_printed++ == 0)
        printk(KERN_DEBUG "%s", version);

    printk(KERN_INFO "%s: %s found at %#3x, ", dev->name, cardname, ioaddr);

    /* Fill in the 'dev' fields. */
    dev->base_addr = ioaddr;

    /* Retrieve and print the Ethernet address. */
    for (i = 0; i < 6; i++)
        printk(" %2.2x", dev->dev_addr[i] = inb(ioaddr + i));

#ifdef jumpered_interrupts
    /* If this board has jumpered interrupts, allocate the interrupt
     * vector now. There is no point in waiting since no other device
     * can use the interrupt, and this marks the irq as busy. Jumpered
     * interrupts are typically not reported by the boards, and we must
     * use autoIRQ to find them. */

    /* ... REMOVED for this book, details see in drivers/net/isa-skeleton.c */
#endif /* jumpered interrupt */
#ifdef jumpered_dma
    /* If we use a jumpered DMA channel, that should be probed for and
     * allocated here as well. See lance.c for an example. */

    /* ... REMOVED for this book, details see in drivers/net/isa-skeleton.c */
#endif /* jumpered DMA */

    /* Initialize the device structure. */
    if (dev->priv == NULL) {
        dev->priv = kmalloc(sizeof(struct net_local), GFP_KERNEL);
        if (dev->priv == NULL)
            return -ENOMEM;
    }

    memset(dev->priv, 0, sizeof(struct net_local));

    np = (struct net_local *)dev->priv;
    spin_lock_init(&np->lock);

    /* Grab the region so that no one else tries to probe our ioports. */
    request_region(ioaddr, IO_NUM, cardname);

    dev->open = net_open;
    dev->stop = net_close;
    dev->hard_start_xmit = net_send_packet;
    dev->get_stats = net_get_stats;
    dev->set_multicast_list = &set_multicast_list;
    dev->tx_timeout = &net_tx_timeout;
    dev->watchdog_timeo = MY_TX_TIMEOUT;

    /* Fill in the fields of the device structure with Ethernet values. */
    ether_setup(dev);

    return 0;
}

Helper Functions to Allocate System Resources

request_region(), release_region(), check_region() kernel/resource.c

request_region(port, range, name) reserves a region of I/O ports, starting with the address port, and marks them as allocated. The kernel manages these reserved port ranges in a linear list. This list can be read from the proc file /proc/ioports, where name is the name shown for the reserving instance.

We reserve ports to prevent a driver that searches for an adapter from accessing the ports of another device, which could put that device into an undefined or unintended state. For this reason, before port ranges are reserved, we should always use check_region() to check whether that range is already taken. The address of the first I/O port of an adapter is stored in the variable dev->base_addr.
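The bookkeeping behind port reservation can be sketched with a small table of ranges. This is a user-space model for illustration only: the functions model_check_region(), model_request_region() and model_release_region(), the fixed-size table, and the error values are assumptions and do not reproduce the kernel's resource management.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_REGIONS 16

struct port_region {
    unsigned long start;
    unsigned long len;
    const char *name;     /* owner, as it would appear in /proc/ioports */
    int used;
};

static struct port_region region_table[MAX_REGIONS];

/* Returns 0 if [start, start+len) is free, -EBUSY if it overlaps a
 * reserved range. */
int model_check_region(unsigned long start, unsigned long len)
{
    size_t i;

    for (i = 0; i < MAX_REGIONS; i++) {
        const struct port_region *r = &region_table[i];
        if (r->used && start < r->start + r->len && r->start < start + len)
            return -EBUSY;
    }
    return 0;
}

/* Reserves the range for 'name'; fails if taken or if the table is full. */
int model_request_region(unsigned long start, unsigned long len,
                         const char *name)
{
    size_t i;

    if (model_check_region(start, len) != 0)
        return -EBUSY;
    for (i = 0; i < MAX_REGIONS; i++) {
        if (!region_table[i].used) {
            region_table[i].start = start;
            region_table[i].len = len;
            region_table[i].name = name;
            region_table[i].used = 1;
            return 0;
        }
    }
    return -ENOMEM;
}

/* Releases a range previously reserved with the same start and length. */
void model_release_region(unsigned long start, unsigned long len)
{
    size_t i;

    for (i = 0; i < MAX_REGIONS; i++) {
        struct port_region *r = &region_table[i];
        if (r->used && r->start == start && r->len == len)
            r->used = 0;
    }
}
```

After model_request_region(0x280, 32, "netcard"), a check of any range that overlaps 0x280 through 0x29f reports -EBUSY, so a second driver probing its own port list would skip that range.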

release_region(start, n) can be used to release allocated port ranges.

request_irq(), free_irq() kernel/irq.c

request_irq(irq, handler, flags, device, dev_id) reserves and initializes the interrupt line with number irq. At the same time, the handling routine handler() is registered for this interrupt.

As it does with I/O ports, the kernel manages a list of reserved interrupts and makes this list available in the proc directory (/proc/interrupts). Again, the string device tells you who reserved the interrupt. The parameter flags can be used to pass options when reserving an interrupt. For more information, see [RuCo01].
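The effect of reserving an interrupt line and registering a handler can also be modeled in user space. In the following sketch, all names (model_request_irq(), model_free_irq(), model_raise_irq(), model_count_handler()) are hypothetical; the flags parameter and shared interrupt lines are deliberately left out, so a line has at most one owner.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NR_MODEL_IRQS 16

typedef void (*model_irq_handler_t)(int irq, void *dev_id);

struct model_irq_slot {
    model_irq_handler_t handler;
    const char *device;   /* owner name, as listed in /proc/interrupts */
    void *dev_id;
};

static struct model_irq_slot model_irq_table[NR_MODEL_IRQS];

/* Reserves line 'irq' and registers 'handler' for it. */
int model_request_irq(int irq, model_irq_handler_t handler,
                      const char *device, void *dev_id)
{
    if (irq < 0 || irq >= NR_MODEL_IRQS || handler == NULL)
        return -EINVAL;
    if (model_irq_table[irq].handler != NULL)
        return -EBUSY;                 /* line already reserved */
    model_irq_table[irq].handler = handler;
    model_irq_table[irq].device = device;
    model_irq_table[irq].dev_id = dev_id;
    return 0;
}

/* Releases line 'irq' if it is held by the owner identified by dev_id. */
void model_free_irq(int irq, void *dev_id)
{
    if (irq < 0 || irq >= NR_MODEL_IRQS)
        return;
    if (model_irq_table[irq].dev_id == dev_id)
        model_irq_table[irq].handler = NULL;
}

/* Simulates the hardware raising 'irq'. Returns 1 if a handler ran. */
int model_raise_irq(int irq)
{
    if (irq < 0 || irq >= NR_MODEL_IRQS
        || model_irq_table[irq].handler == NULL)
        return 0;
    model_irq_table[irq].handler(irq, model_irq_table[irq].dev_id);
    return 1;
}

/* Example handler: dev_id points at an interrupt counter. */
void model_count_handler(int irq, void *dev_id)
{
    (void)irq;
    ++*(unsigned long *)dev_id;
}
```

The dev_id argument plays the same role as in the text: it identifies the owner when the line is released, and it is handed back to the handler so that the handler can find its device.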

A reserved interrupt can be released by free_irq(irq, dev_id).

request_dma(), free_dma() kernel/dma.c

request_dma(dmanr, device_id) tries to reserve the DMA channel dmanr. free_dma(dmanr) can be used to release a reserved DMA channel.

5.3.2 Opening and Closing a Network Adapter

We know from Section 5.2 that network devices are activated and deactivated by the command ifconfig. More specifically, ioctl() calls invoke the methods dev_open() or dev_close(), where the general steps to activate and deactivate a network device are executed. The adapter-specific actions are handled in the driver methods dev->open() and dev->stop(), respectively, of the present network adapter. We use the skeleton sample driver to explain these steps.

net_open() drivers/net/isa_skeleton.c

The open() method is responsible for initializing and activating the network adapter. At the beginning, the required system resources (interrupt, DMA channel, etc.) are requested. The kernel offers various helper methods for reserving these system resources; they were introduced briefly in the previous section. System resources are reserved in the open() method for modern adapters, which do not have fixed values for IRQ and DMA lines. For older cards, the resources are searched for and reserved in the init() method. (See init().)

Once a network adapter has been initialized successfully, the use counter of the module should be incremented for modularized drivers, to prevent inadvertent unloading of the driver module from the kernel. We can use the macro MOD_INC_USE_COUNT for this purpose.

The network adapter is initialized when all system resources have been allocated successfully. Each adapter is initialized in an individual manner. Normally, a specific value is written to a hardware register (I/O port) of the adapter, which causes the adapter to initialize itself.

The transmission of packets over the network device is started by netif_start_queue(dev).

Finally, the value 0 is returned if the adapter was opened successfully; otherwise, a negative error code is returned.

/*
 * Open/initialize the board. This is called (in the current kernel)
 * sometime after booting when the 'ifconfig' program is run.
 *
 * This routine should set everything up anew at each open, even
 * registers that "should" only need to be set once at boot, so that
 * there is a non-reboot way to recover if something goes wrong.
 */
static int net_open(struct net_device *dev)
{
    struct net_local *np = (struct net_local *)dev->priv;
    int ioaddr = dev->base_addr;

    /*
     * This is used if the interrupt line can be turned off (shared).
     * See 3c503.c for an example of selecting the IRQ at config-time.
     */
    if (request_irq(dev->irq, &net_interrupt, 0, cardname, dev)) {
        return -EAGAIN;
    }

    /*
     * Always allocate the DMA channel after the IRQ,
     * and clean up on failure.
     */
    if (request_dma(dev->dma, cardname)) {
        free_irq(dev->irq, dev);
        return -EAGAIN;
    }
    MOD_INC_USE_COUNT;

    /* Reset the hardware here. Don't forget to set the station address. */
    chipset_init(dev, 1);
    outb(0x00, ioaddr);
    np->open_time = jiffies;

    /* We are now ready to accept transmit requests from
     * the queuing layer of the networking.
     */
    netif_start_queue(dev);

    return 0;
}
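The ordering rule stated in the comments of net_open() (interrupt first, then DMA channel, and release the interrupt again if the DMA request fails) can be tried out in isolation. The following user-space sketch uses a hypothetical structure model_resources in which the availability of each resource is simulated by a flag; it is an illustration of the unwind pattern, not driver code.

```c
#include <assert.h>
#include <errno.h>

/* Simulated resource state of one adapter. */
struct model_resources {
    int irq_available;   /* can the interrupt line be granted? */
    int dma_available;   /* can the DMA channel be granted? */
    int irq_held;        /* 1 = currently held by the driver */
    int dma_held;
    int queue_running;
};

static int model_get_irq(struct model_resources *r)
{
    if (!r->irq_available)
        return -EBUSY;
    r->irq_held = 1;
    return 0;
}

static int model_get_dma(struct model_resources *r)
{
    if (!r->dma_available)
        return -EBUSY;
    r->dma_held = 1;
    return 0;
}

/* Follows the order used in net_open(): IRQ first, then DMA; a failure
 * in the second step undoes the first one before the error is reported. */
int model_open(struct model_resources *r)
{
    if (model_get_irq(r))
        return -EAGAIN;
    if (model_get_dma(r)) {
        r->irq_held = 0;         /* undo step 1 */
        return -EAGAIN;
    }
    r->queue_running = 1;        /* corresponds to netif_start_queue() */
    return 0;
}
```

The point of the pattern is that a failed open leaves no resource behind: after model_open() returns an error, neither irq_held nor dma_held is set.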

Deactivating a Network Adapter

example_stop() drivers/net/isa_skeleton.c

During deactivation of a network adapter, all operations done when the adapter was opened should be undone. This concerns mainly allocated system resources (interrupts, DMA channels, etc.), which should now be freed.

For modularized drivers, the use counter has to be decremented with MOD_DEC_USE_COUNT, and the network device must not accept any more packets from higher layers (netif_stop_queue()). Again, the return value is either 0, if successful, or a negative error code.

/* The inverse routine to net_open(). */
static int net_close(struct net_device *dev)
{
    struct net_local *lp = (struct net_local *)dev->priv;
    int ioaddr = dev->base_addr;

    lp->open_time = 0;
    netif_stop_queue(dev);

    /* Flush the Tx and disable Rx here. */
    disable_dma(dev->dma);

    /* If not IRQ or DMA jumpered, free up the line. */
    outw(0x00, ioaddr+0);   /* Release the physical interrupt line. */
    free_irq(dev->irq, dev);
    free_dma(dev->dma);

    /* Update the statistics here. */
    MOD_DEC_USE_COUNT;

    return 0;
}
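The purpose of the use counter can be shown with a few lines of user-space code. The structure model_module and the three functions below are hypothetical and only model the idea: opening a device increments the counter, closing it decrements the counter, and the module can be removed only when the counter is zero.

```c
#include <assert.h>
#include <errno.h>

struct model_module {
    int use_count;   /* number of open devices that rely on the module */
    int loaded;
};

/* Corresponds to MOD_INC_USE_COUNT in the open() method. */
void model_dev_open(struct model_module *m)
{
    m->use_count++;
}

/* Corresponds to MOD_DEC_USE_COUNT in the stop() method. */
void model_dev_close(struct model_module *m)
{
    if (m->use_count > 0)
        m->use_count--;
}

/* Unloading is refused while any device is still open. */
int model_unload(struct model_module *m)
{
    if (m->use_count > 0)
        return -EBUSY;
    m->loaded = 0;
    return 0;
}
```

An attempt to unload the module between model_dev_open() and model_dev_close() fails with -EBUSY; after the close, the same call succeeds.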

5.3.3 Transmitting Data

Each data transmission in the Linux network architecture occurs over a network device, more specifically by use of the method hard_start_xmit() (start hardware transmission). Of course, this is a function pointer, pointing to a driver-specific transmission function, ..._start_xmit(). This method is responsible for forwarding the packet in the form of a socket buffer and starting the transmission. Before we discuss the usual steps involved in the driver method dev->hard_start_xmit() in this section, we will briefly describe the common architecture of network adapters.

A network adapter is an interface adapter that automatically transmits and receives network packets according to a defined MAC protocol (Ethernet, token ring, etc.). This means that a network adapter has independent logic that works in parallel to the regular central processor(s). The network adapter and a system processor interact over I/O ports (hardware registers) and interrupts. When a processor wants to pass data to the network adapter, the processor writes its data to the appropriate I/O ports and starts the desired action. When the adapter wants to pass data to the processor (e.g., a packet it received), the adapter triggers an interrupt, and the processor uses the interrupt-handling routine of the network adapter to serve the network adapter. This shows clearly that system processors have a leading role versus interface adapters (master-slave relationship).

Transmitting Data Packets

net_start_xmit() drivers/net/isa_skeleton.c

dev->hard_start_xmit(skb, dev) is responsible for forwarding a data packet to the network adapter so that the latter can transmit it. The packet data of the socket buffer is copied to an internal buffer location in the network adapter, and the time stamp dev->trans_start = jiffies is set, marking the beginning of that transmission. If this copy was successful, the transmission is also assumed to be successful. In this case, hard_start_xmit() has to return a value of 0. Otherwise, it should return 1, so that the kernel knows that the packet could not be sent.

When forwarding network packets between the operating system and the network adapter, we can distinguish between two different techniques:

• Older network adapters (e.g., 3Com 3c509) have an internal buffer memory on the adapter for packets to be sent. This means that the kernel can always forward only one single packet to the adapter at a time. If a buffer is free, a packet is copied to the adapter right away and the kernel can delete the corresponding socket buffer.

• More recent network adapters work differently. The driver manages a ring buffer consisting of