16.3.4 The Forwarding Process

Section 14.2.1 discussed how a forwarding query is embedded into the processing of incoming IP packets: ip_rcv_finish() invokes ip_route_input() to find a dst_entry structure that determines the packet's further route. Section 14.2.2 discussed outgoing packets: their routing decision is made in ip_route_output(), which is invoked by, for example, ip_queue_xmit().

ip_route_input() net/ipv4/route.c

The function ip_route_input() is invoked for each IP packet arriving over a network interface.

The parameters are a pointer to the socket-buffer structure, the destination and source addresses, the TOS value, and a pointer to the net_device structure of the receiving network interface.[4]

[4] The last four parameters could, alternatively, be derived from the first. However, because they are passed separately, ip_route_input() does not need to know how they are represented in the socket-buffer structure. Although the socket-buffer structure is still not treated entirely as an encapsulated "black box," at least only a few data elements that exist specifically for routing are accessed.

First, rt_hash_code() is applied to the addresses and the TOS value to compute an index into the hash table of the routing cache. If necessary, the list anchored at that hash-table slot is walked to find a cache entry that matches the addresses, input interface, TOS value, and fwmark (if present). If this search is successful, a pointer to the entry is stored as dst in the sk_buff structure, and the task is complete.
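The lookup just described (hash the key, then walk the chain and compare every key field) can be sketched in a few lines of C. This is only an illustration under stated assumptions: the toy_ names, the table size, the structure layout, and the mixing function are all hypothetical and are not the kernel's rtable or rt_hash_code(); the fwmark comparison is omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_HASH_SIZE 16u   /* hypothetical table size (power of two) */

/* Hypothetical, much simplified cache entry; not the kernel's struct rtable. */
struct toy_rt_entry {
    uint32_t dst, src;          /* addresses, host byte order in this toy */
    int      iif;               /* index of the input interface */
    uint8_t  tos;               /* type-of-service value */
    struct toy_rt_entry *next;  /* next entry in the same hash chain */
};

static struct toy_rt_entry *toy_cache[TOY_HASH_SIZE];

/* Illustrative mixing function only; the real rt_hash_code() differs. */
static unsigned toy_hash(uint32_t dst, uint32_t src, uint8_t tos)
{
    uint32_t h = dst ^ (src << 5) ^ (src >> 27) ^ tos;

    h ^= h >> 16;
    h ^= h >> 8;
    return h & (TOY_HASH_SIZE - 1);
}

/* Put an entry at the head of its hash chain. */
static void toy_cache_insert(struct toy_rt_entry *e)
{
    unsigned idx = toy_hash(e->dst, e->src, e->tos);

    e->next = toy_cache[idx];
    toy_cache[idx] = e;
}

/* Walk the chain and return the first entry whose key fields all match,
 * or NULL on a cache miss (the slow path would take over in that case). */
static struct toy_rt_entry *toy_cache_lookup(uint32_t dst, uint32_t src,
                                             int iif, uint8_t tos)
{
    struct toy_rt_entry *e;

    for (e = toy_cache[toy_hash(dst, src, tos)]; e != NULL; e = e->next)
        if (e->dst == dst && e->src == src && e->iif == iif && e->tos == tos)
            return e;
    return NULL;
}
```

Note that the input interface takes part in the comparison but not in the hash of this sketch, so two entries that differ only in the interface share a chain and are told apart during the walk.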

If no matching cache entry is found, then either of the two following functions is responsible for further handling:

• ip_route_input_mc() is invoked if the destination address is a multicast address. Another prerequisite is that the input interface either belongs to that multicast group or has been configured for multicast routing. The packet can be discarded if this is not the case. The function ip_route_input_mc() is discussed later in the chapter about IP multicast (Section 17.4). What is done there is similar to the procedure for local destination addresses described below, the only difference being that the packet is always delivered to the local machine rather than causing an FIB query.

• ip_route_input_slow() handles "normal" destination addresses and is described next.
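The choice between the two functions after a cache miss can be pictured as a small decision function. The following is a minimal sketch, assuming hypothetical names (toy_path, toy_pick_path) and two boolean flags that stand in for the group-membership and multicast-routing checks; it does not reproduce the kernel code.

```c
#include <stdint.h>

/* Hypothetical outcome codes for this sketch. */
enum toy_path {
    TOY_PATH_MC,    /* would be handled by ip_route_input_mc() */
    TOY_PATH_SLOW,  /* would be handled by ip_route_input_slow() */
    TOY_PATH_DROP   /* multicast packet the input interface has no use for */
};

/* IPv4 multicast addresses lie in 224.0.0.0/4 (top four bits 1110).
 * Addresses are kept in host byte order here for readability. */
static int toy_is_multicast(uint32_t addr)
{
    return (addr & 0xf0000000u) == 0xe0000000u;
}

/* iface_in_group and iface_mc_routing are stand-ins for the two
 * prerequisites named in the text; how they are obtained is not shown. */
static enum toy_path toy_pick_path(uint32_t daddr, int iface_in_group,
                                   int iface_mc_routing)
{
    if (!toy_is_multicast(daddr))
        return TOY_PATH_SLOW;
    if (iface_in_group || iface_mc_routing)
        return TOY_PATH_MC;
    return TOY_PATH_DROP;
}
```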

Both functions take the same parameters as ip_route_input() itself.

ip_route_input_slow() net/ipv4/route.c

To begin with, an rt_key structure is filled with the parameters passed. However, before it is used to run an FIB query, the addresses are checked for invalid values: multicast source addresses and addresses with a network prefix beginning with zero. Such packets are dropped and, if verbose messages are configured with CONFIG_IP_ROUTE_VERBOSE, they are recorded in the system log.

The use of 0.0.0.0 as both source and destination address, in the sense of a limited broadcast, is explicitly allowed as an exception, because it is occasionally used for automatic network configuration.

Next, the FIB query is started by calling fib_lookup(). If no matching entry is found, ip_route_input_slow() aborts processing and returns an error code, which subsequently causes ip_rcv_finish() (from where the lookup was started through ip_route_input()) to discard the packet.
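The address checks that precede the FIB query can be sketched as a small classifier. This is a simplified illustration, not the kernel's code: the toy_ names and verdict codes are hypothetical, the order of the tests is an assumption, and only the cases named in this section are covered (multicast sources, zero network prefixes, the 0.0.0.0 exception, and the limited broadcast to 255.255.255.255).

```c
#include <stdint.h>

/* Hypothetical verdicts for this sketch. */
enum toy_verdict {
    TOY_DROP,       /* invalid address: discard (and possibly log) */
    TOY_BROADCAST,  /* limited broadcast: treated like a local destination */
    TOY_NEEDS_FIB   /* addresses look sane: continue with the FIB query */
};

/* 224.0.0.0/4; addresses are in host byte order in this toy. */
static int toy_is_multicast(uint32_t a)
{
    return (a & 0xf0000000u) == 0xe0000000u;
}

/* Network prefix beginning with zero, i.e. 0.x.x.x. */
static int toy_is_zeronet(uint32_t a)
{
    return (a & 0xff000000u) == 0;
}

static enum toy_verdict toy_check_addrs(uint32_t saddr, uint32_t daddr)
{
    if (toy_is_multicast(saddr))
        return TOY_DROP;            /* a multicast source is never valid */
    if (daddr == 0xffffffffu)
        return TOY_BROADCAST;       /* limited broadcast */
    if (saddr == 0 && daddr == 0)
        return TOY_BROADCAST;       /* explicitly allowed exception */
    if (toy_is_zeronet(saddr) || toy_is_zeronet(daddr))
        return TOY_DROP;
    return TOY_NEEDS_FIB;
}
```

The exception for 0.0.0.0 has to be tested before the zero-prefix check; otherwise the pair would be rejected together with all other 0.x.x.x addresses.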

If the routing NAT mentioned in Section 16.1.6 is active, the next step transforms the source address according to the information in the routing rule used (or the destination address, if the route found is a nat route). However, the new addresses are entered only in the rt_key structure; the old addresses remain in the call parameters of the function and are available for other operations. Once the destination address has been transformed, fib_lookup() has to be invoked once more to find a regular routing entry (no other transformations are permitted) for the new destination address.

Among other things, the result of the FIB query also shows whether the destination address is a local address, which means that the packet is intended for the local system. Local and nonlocal destinations are handled separately from this point on:

• Local destination address: A new cache entry is created once the source address has been checked by calling fib_validate_source(). The function pointer output() is set to ip_rt_bug(), because the packet is not allowed to leave the system. Next, the input() pointer is set to ip_local_deliver(), so that the packet is delivered to the local machine. Because there is no next router, the rt_gateway element of the cache entry is set to the destination address.

Broadcast addresses, such as the 0.0.0.0 address mentioned above as source and destination, or the normal limited broadcast to 255.255.255.255, are detected in the address validation at the beginning and handled identically to local destination addresses. However, fib_validate_source() is not called for 0.0.0.0 addresses.

• Nonlocal destination address: Nonlocal destination addresses have to be handled only if the forwarding function for the input interface is enabled, so this is checked first.

If the routing-table entry found describes several output routes, fib_select_multipath() is called to select one of them. Subsequently, the source address is checked by fib_validate_source(), which in this case can also take the output network interface found into account. ip_forward() and ip_output() are set in the new cache entry as the input() and output() function pointers, respectively. The function rt_set_nexthop() makes the necessary assignment to rt_gateway and also fills in other elements of the dst_entry structure that are of interest only for forwarded packets.
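The way the two cases differ in the function pointers of the new cache entry can be illustrated with a toy structure. All toy_ names below are hypothetical stand-ins (for ip_local_deliver(), ip_forward(), ip_output(), and ip_rt_bug()), the handlers merely return a code so that they can be told apart, and the next-hop address is simply passed in rather than derived as rt_set_nexthop() would do.

```c
#include <stdint.h>

struct toy_pkt;                               /* opaque packet type */
typedef int (*toy_handler)(struct toy_pkt *);

/* Hypothetical, reduced cache entry: the two handlers and the gateway. */
struct toy_dst {
    toy_handler input;    /* next step for a packet that was received */
    toy_handler output;   /* next step for a packet that is to be sent */
    uint32_t    gateway;  /* next hop, or the destination itself */
};

/* Dummy handlers; each returns a distinct code and ignores the packet. */
static int toy_local_deliver(struct toy_pkt *p) { (void)p; return 1; }
static int toy_forward(struct toy_pkt *p)       { (void)p; return 2; }
static int toy_output(struct toy_pkt *p)        { (void)p; return 3; }
static int toy_bug(struct toy_pkt *p)           { (void)p; return -1; }

/* Fill the entry for a local or a forwarded destination.
 * Returns 0 on success, -1 if forwarding would be needed but is disabled. */
static int toy_fill_entry(struct toy_dst *d, int is_local,
                          int forwarding_enabled,
                          uint32_t daddr, uint32_t nexthop)
{
    if (is_local) {
        d->input   = toy_local_deliver;
        d->output  = toy_bug;       /* a local packet must never be sent */
        d->gateway = daddr;         /* there is no next router */
        return 0;
    }
    if (!forwarding_enabled)
        return -1;
    d->input   = toy_forward;
    d->output  = toy_output;
    d->gateway = nexthop;
    return 0;
}
```

Storing the handlers in the entry means that later packets hitting the same cache entry need no further decision: the caller just invokes d->input().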

rt_intern_hash() is used to insert the rtable structure, which is now almost complete, into the hash table of the routing cache. It also supplies the return value of ip_route_input_slow(), which is then complete.

ip_route_output() include/net/route.h

According to a comment in the route.h header file, where it is implemented as an inline function, the function ip_route_output() is going to be replaced by ip_route_output_key(). In practice, however, it is invoked at many points within the network implementation as the main routing interface (e.g., by the IP transmit function ip_queue_xmit() [see Section 14.2.2] or by udp_sendmsg() [see Section 25.3.1] for packets created by UDP).

Its only task currently is to create an rt_key structure with the source and destination addresses, the TOS value, and the output device from the passed parameters, and then to invoke the function ip_route_output_key(), which is described next. The last parameter is the pointer struct rtable **rp, which serves to return the result; it is also passed to ip_route_output_key().

ip_route_output_key() net/ipv4/route.c

The task of ip_route_output_key() is to determine a routing entry for the rt_key structure passed. The procedure is similar to that of ip_route_input(): the routing cache is searched for a matching entry, and the process branches to ip_route_output_slow() only if no such entry is found. The output interface, rather than the input interface, is used for hash computation and comparisons.

ip_route_output_slow() net/ipv4/route.c

ip_route_output_slow() is invoked if no entry for the destination of a locally created IP packet exists in the routing cache. This function runs an FIB query, enters the result in the routing cache, and returns the new entry. In addition, it handles several special cases for which fib_lookup() alone is not sufficient.

As with ip_route_output_key(), the only input parameter is an rt_key structure. This structure is first copied to a new structure of the same type, which can then be modified without losing the information passed. The iif and scope information is not taken from the caller; instead, the loopback device is always assumed for iif, and scope is set to either RT_SCOPE_LINK or RT_SCOPE_UNIVERSE, depending on the RTO_ONLINK flag in the tos element.
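The preparation of the working copy can be sketched as follows. The structure, the flag value, the scope constants, and the loopback interface index are hypothetical placeholders chosen for this illustration; only the logic (copy the key, force iif to the loopback device, derive scope from a flag carried in tos) follows the description above.

```c
#include <stdint.h>

#define TOY_ONLINK           0x01  /* stand-in for the RTO_ONLINK flag bit */
#define TOY_LOOPBACK_IFINDEX 1     /* assumed index of the loopback device */

enum toy_scope { TOY_SCOPE_UNIVERSE, TOY_SCOPE_LINK };

/* Hypothetical, reduced search key; not the kernel's struct rt_key. */
struct toy_key {
    uint32_t dst, src;
    int      iif, oif;
    uint8_t  tos;
    int      scope;
};

/* Return a modified copy; the caller's key is left untouched. */
static struct toy_key toy_prepare_key(const struct toy_key *old)
{
    struct toy_key key = *old;

    /* Whatever the caller put into iif and scope is overridden. */
    key.iif   = TOY_LOOPBACK_IFINDEX;
    key.scope = (old->tos & TOY_ONLINK) ? TOY_SCOPE_LINK
                                        : TOY_SCOPE_UNIVERSE;
    return key;
}
```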

The input parameters are first checked for errors or special cases. For example, for multicast destination addresses, a route is created immediately, without an FIB query, if a valid source address is specified that can be used to identify a usable output device. This special handling simplifies the transmission of multicast packets (and some multicast tools that traditionally rely on this possibility continue to work). For a specified output interface, a matching source address is found, and 127.0.0.1 is used as the destination address if none is specified.

An FIB query is started once all preparations have been completed. Notice that the process can, in some cases, continue even if this query returns a negative result: if an output interface was specified in the call, that interface is used, on the simple assumption that the destination is in the adjacent network.
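The fallback for a failed query can be pictured with a small helper. This is a sketch under assumptions: the names are hypothetical, the FIB result is passed in as plain arguments instead of being looked up, an interface index of 0 is taken to mean "no output interface specified," and using the destination itself as the next hop is this sketch's reading of "the destination is in the adjacent network."

```c
#include <stdint.h>

/* Hypothetical result record for this sketch. */
struct toy_out_route {
    int      oif;       /* output interface index */
    uint32_t gateway;   /* next hop */
    int      from_fib;  /* 1 if taken from the FIB, 0 if assumed on-link */
};

/* fib_hit, fib_oif, and fib_gw stand in for the outcome of the FIB query;
 * req_oif is the interface requested by the caller (0 means none).
 * Returns 0 on success, -1 if the destination is unreachable. */
static int toy_resolve(int fib_hit, int fib_oif, uint32_t fib_gw,
                       int req_oif, uint32_t daddr,
                       struct toy_out_route *out)
{
    if (fib_hit) {
        out->oif      = fib_oif;
        out->gateway  = fib_gw;
        out->from_fib = 1;
        return 0;
    }
    if (req_oif != 0) {
        /* No route found, but the caller named an interface:
         * assume the destination is directly reachable through it. */
        out->oif      = req_oif;
        out->gateway  = daddr;
        out->from_fib = 0;
        return 0;
    }
    return -1;
}
```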

The query result can be used to distinguish between local and nonlocal destination addresses. The loopback device is always used as the output interface for local addresses, whereas for nonlocal addresses fib_select_multipath() might have to select one out of several routes, or fib_select_default() might have to choose among several default routes.

Again, ip_route_output_slow() completes the job by filling in a new rtable structure, using the function rt_set_nexthop() in the same way as ip_route_input_slow(). The output() function pointer is set to ip_output(), and the input() pointer is set to ip_local_deliver() if the destination address is in the local system. rt_intern_hash() is used to add the cache entry to the hash table and also yields the return value for ip_route_output_slow().

Chapter 17. IP Multicast for Group