Chapter 22. Extending the Linux Network Architecture Functionality: KIDS
22.3 Using the KIDS Example to Extend the Linux Network Architecture
Now that we have given a brief overview of the elements in the KIDS framework, this section discusses its implementation in the Linux kernel as an example of how the functionality of the Linux network architecture can be extended. We focus on the design and management of the components: how and why they were designed, and how they are introduced into the kernel at runtime. In addition, we will see how hooks are implemented on top of existing kernel interfaces, which means that the kernel itself does not have to be changed to use KIDS. Finally, we use the kidsd daemon as an example to show how components and hooks are configured and how the kernel and the user level interact in the process.
22.3.1 Components and Their Instances
The KIDS framework offers different types of components that can be used to implement different QoS mechanisms (e.g., token buckets; see Section 18.6.1). A component can occur more than once within a component chain, and each of these occurrences can have different parameters. This means that we should be able to create an arbitrary number of instances from a component, while keeping the memory required by these instances low. This principle strongly resembles the object-oriented concept of creating an arbitrary number of object instances from a class. Although these objects exist independently of one another, they show the same behavior, because they use the same methods.
This means that the component concept of Linux KIDS has an object-oriented character, even though it is written in C, a programming language without built-in support for object orientation. The component concept of Linux KIDS consists of the following two parts:
• Components are QoS mechanisms implementing a specific behavior. They are managed in the bhvr_type structure of Linux KIDS. This structure contains all properties of a component, including its behavior in the form of pointers to the corresponding methods (shown below). These methods are used concurrently by several instances of the component, so they have to be reentrant. Components correspond to classes in the object-oriented model.
• Component instances are created whenever an instance of a component is needed. To this end, a data structure of the type bhvr is created. It stores all information about this component instance, mainly its individual parameter configuration. Because the instance should exhibit the component's behavior, it refers to the information stored in the bhvr_type structure of the component. Component instances correspond to objects (or object instances) in the object-oriented model.
The following discussion introduces how these two structures are built and what the parameters mean. Subsequently, we will see how components can be registered or unregistered dynamically.
struct bhvr_type kids/kids_bhvr.h
Figure 22-4 shows how components and their instances interact. The bhvr_type structure of the token bucket stores general component information.
Figure 22-4. The bhvr_type and bhvr structures manage components and their instances.
struct bhvr_type {
    char name[STRLEN];
    unsigned int bhvr_class_id;
    unsigned long private_data_size;
    unsigned int instances;
    struct bhvr_type *next;
    int (*func) (struct bhvr *, struct sk_buff *);
    struct sk_buff* (*deq_func) (struct bhvr *);
    int (*constructor) (struct bhvr *bhvr, char *data, int flag);
    int (*destructor) (struct bhvr *bhvr);
    struct bhvr* (*get_bhvr) (struct bhvr *bhvr, char *port);
    int (*append_bhvr) (struct bhvr *new_bhvr, struct bhvr *old_bhvr, char *port);
    int (*proc) (struct bhvr *bhvr, char *ptr, int layer);
    int (*get_config) (struct bhvr *bhvr, char *ptr);
};
The fields have the following meaning:
• name is the name of the component (e.g., Token_Bucket).
• bhvr_class_id contains the component's class (see Section 22.2.1). Possible values are BHVR_ID, ENQ_BHVR_ID, DEQ_BHVR_ID, DEQ_DISC_ID, and QUEUE_ID.
• private_data_size specifies the size of the private data structure that each instance of the component carries in its bhvr structure. Preferably, a separate private structure is defined for this purpose (e.g., tb_data; see below), and a sizeof expression giving the size of this structure is entered in this field.
• instances holds the number of instances created from a component. This variable is maintained by Linux KIDS itself. It must be 0 before a component can be removed from the kernel.
• next is also used internally, namely to link bhvr_type structures in the bhvr_type_list.
(See Figure 22-4.)
The following elements of the bhvr_type structure are function pointers that specify the behavior of a component and are used to manage it.
• func(bhvr, skb) refers to the function that is invoked when a packet (i.e., a socket buffer, skb) is passed to an instance (bhvr) of this component. It implements the functionality of this component type. Because a socket buffer is handed over when func() is invoked, this function implements a packet interface and is used only by operative and enqueuing components. Section 22.3.5 uses an example to introduce the func() method of the Token_Bucket component.
The bhvr parameter is a pointer to the bhvr structure of the component instance to which the socket buffer skb is passed. Because the same func() method is used for all instances of the Token_Bucket component, a pointer to the instance-specific information has to be passed as well. Otherwise, it would be impossible to tell which instance, with which parameter and variable values, is meant.
• deq_func(bhvr) is used by dequeuing and strategic components. It implements a message interface and is invoked when a packet is requested from an instance (bhvr) of this component. A component implements only one of the two functions, either func() or deq_func(), depending on whether its input is a packet interface or a message interface.
• constructor(bhvr, data, flag) is invoked when a bhvr instance of this component is initialized or when its configuration changes. The character string data carries the parameters to be written into the instance's private data. The flag parameter shows whether this is the first initialization of the instance (INIT_BHVR) or a change of its parameters at runtime, in which case only the parameters passed should be altered.
• destructor(bhvr) is invoked to destroy the bhvr instance of the component. All required cleanup work (e.g., freeing memory or deactivating timers) should be done at this point.
• get_bhvr(bhvr, port) is invoked by KIDS to obtain a pointer to the bhvr structure of the component instance appended to the output port of bhvr. The number and names of the outputs differ from component to component, so this function has to be implemented separately for each component.
• append_bhvr(new_bhvr, old_bhvr, port) connects the component instance new_bhvr to the output port of the existing component instance old_bhvr. For the same reason as with get_bhvr(), this function is component-specific, because it has to handle the component's individual outputs.
• proc(bhvr, ptr, layer) generates a textual description of the component instance bhvr, which can be read from proc files. The layer parameter specifies the distance between the component instance and the hook; it is used to indent the output. The ptr pointer specifies the buffer the output is written to. (See Section 2.8.)
• get_config(bhvr, ptr) is invoked by KIDS to write the configuration of the bhvr
component instance to the ptr buffer space, based on the KIDS configuration syntax (see
Section 22.3.6).
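Because get_bhvr() and append_bhvr() depend on a component's outputs, each component supplies its own versions. The following user-space sketch assumes a two-output component with ports named Conform and Non_Conform (the outputs mentioned for Token_Bucket in Section 22.3.5). The simplified struct bhvr, the tb_get_bhvr()/tb_append_bhvr() names, the -1 error convention, the one-successor-per-port rule, and incrementing use_counter on append are all assumptions, not the KIDS implementation. For brevity, the successor pointers sit directly in the structure instead of in the private data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct bhvr;
/* void * stands in for struct sk_buff * in this user-space sketch. */
typedef int (*pkt_func)(struct bhvr *b, void *skb);

/* Hypothetical, simplified instance layout. */
struct bhvr {
    const char *name;
    unsigned int use_counter;     /* number of direct predecessors */
    pkt_func func;                /* this instance's own packet handler */
    struct bhvr *conform_bhvr;    /* successor at port "Conform" */
    pkt_func conform_func;        /* its cached handler */
    struct bhvr *non_conform_bhvr;
    pkt_func non_conform_func;
};

/* Component-specific get_bhvr(): map a port name to its successor. */
static struct bhvr *tb_get_bhvr(struct bhvr *b, const char *port)
{
    if (strcmp(port, "Conform") == 0)
        return b->conform_bhvr;
    if (strcmp(port, "Non_Conform") == 0)
        return b->non_conform_bhvr;
    return NULL; /* unknown port */
}

/* Component-specific append_bhvr(): attach new_b to a port of old_b.
 * Returns 0 on success, -1 for an unknown or already occupied port. */
static int tb_append_bhvr(struct bhvr *new_b, struct bhvr *old_b, const char *port)
{
    struct bhvr **slot;
    pkt_func *fslot;

    if (strcmp(port, "Conform") == 0) {
        slot = &old_b->conform_bhvr;
        fslot = &old_b->conform_func;
    } else if (strcmp(port, "Non_Conform") == 0) {
        slot = &old_b->non_conform_bhvr;
        fslot = &old_b->non_conform_func;
    } else {
        return -1;
    }
    if (*slot != NULL)
        return -1;            /* port already in use */
    *slot = new_b;
    *fslot = new_b->func;     /* cache the successor's handler */
    new_b->use_counter++;     /* one more direct predecessor */
    return 0;
}
```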
struct bhvr kids/kids_bhvr.h
Each bhvr structure represents one specific instance of a component and manages its information (e.g., its name or number of references). The bhvr data structure is built as follows:
struct bhvr {
    char name[STRLEN];
    unsigned int use_counter;
    struct bhvr *next_bhvr;
    struct bhvr_type *bhvr_type;
    char bhvr_data[0];
};
The fields have the following meaning:
• name is the name of this instance (e.g., tb0 or marker1).
• use_counter specifies the number of direct predecessors of this component instance, that is, the number of references to this bhvr structure.
• next_bhvr is used to link the individual bhvr data structures in the bhvr_list. This list is
managed by KIDS and used to search for a component by its name
(get_bhvr_by_name()).
• bhvr_type points to the relevant bhvr_type structure, representing the type of this
component instance. This means that this pointer specifies the behavior of the component instance, which is registered in the bhvr_type structure.
• bhvr_data is a placeholder for the private information of this component instance (as is shown later). No type can be specified, because the structure of this information differs from component to component. A type cast is required before each access, for example:
struct tb_data *data = (struct tb_data *) &(tb_bhvr->bhvr_data);
The private information directly follows the bhvr structure in memory. When memory for an instance is reserved, the length of its private information is therefore added to the size of the bhvr structure. As was mentioned earlier, this length is stored in the bhvr_type structure (private_data_size).
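The single allocation described above can be sketched in user space as follows. This is an illustration under assumptions rather than KIDS code: calloc() stands in for kernel memory allocation, the value of STRLEN is made up, bhvr_type is cut down to two fields, tb_data is reduced to three fields, and a C99 flexible array member with an alignment specifier replaces the GNU zero-length array bhvr_data[0].

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define STRLEN 32 /* assumed; the KIDS header defines its own value */

struct bhvr_type {
    const char *name;
    unsigned long private_data_size;
};

struct bhvr {
    char name[STRLEN];
    unsigned int use_counter;
    struct bhvr *next_bhvr;
    const struct bhvr_type *bhvr_type;
    /* Flexible array member in place of bhvr_data[0]; the alignment lets
     * any private structure be overlaid on it safely. */
    _Alignas(max_align_t) char bhvr_data[];
};

/* Cut-down Token_Bucket private data (a subset of tb_data). */
struct tb_data {
    unsigned int rate, bucket_size;
    unsigned long token;
};

static const struct bhvr_type tb_type = { "Token_Bucket", sizeof(struct tb_data) };

/* Reserve header and private area in one block; calloc zeroes both. */
static struct bhvr *alloc_bhvr(const struct bhvr_type *t, const char *name)
{
    struct bhvr *b = calloc(1, sizeof(*b) + t->private_data_size);

    if (b == NULL)
        return NULL;
    strncpy(b->name, name, STRLEN - 1); /* terminator already zeroed */
    b->bhvr_type = t;
    return b;
}

/* Typed view of the private area, mirroring the cast shown in the text. */
static struct tb_data *tb_priv(struct bhvr *b)
{
    return (struct tb_data *) b->bhvr_data;
}
```

Because header and private data come from one calloc() call, a single free() releases both, and the private area starts exactly at offsetof(struct bhvr, bhvr_data).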
Using the Token-Bucket Component as an Example for a Private Data Structure
The data structure containing private information (bhvr_data) is of particular importance. Its structure depends on the respective component, because it stores that component's parameters and runtime variables. Because all instances of a component have the same variables, though with different values, this data structure is stored in the instances (i.e., in the bhvr structure), while its length (which is identical for all instances of a component) is stored in the bhvr_type structure.
Consequently, all information concerning the state or configuration of a particular component instance is kept in the instance itself, in the private data area of its bhvr structure.
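The following example shows the private data structure of the Token_Bucket component:
struct tb_data {
    unsigned int rate, bucket_size;                              /* parameters */
    unsigned long token, packets_arvd, packets_in, packets_out;  /* runtime variables */
    CPU_STAMP last_arvl, cycles_per_byte;                        /* last arrival, CPU cycles per token */
    struct bhvr *enough_token_bhvr;                              /* successor at the in-profile output */
    struct bhvr *not_enough_token_bhvr;                          /* successor at the out-of-profile output */
    int (*enough_token_func) (struct bhvr *, struct sk_buff *);     /* cached handler of the in-profile successor */
    int (*not_enough_token_func) (struct bhvr *, struct sk_buff *); /* cached handler of the out-of-profile successor */
};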
The meaning of each of the variables in such a private data structure can be divided into three groups:
• The parameter and runtime variables are specific to the algorithm a component implements. This is why they are managed in a private data structure of the component, which exists separately in each of that component's instances. Examples include the parameters rate and bucket_size and the runtime variable token of the Token_Bucket component.
• In addition, the private information holds the following two elements for each component output. The number of outputs is also specific to each component, so these elements cannot be placed in the bhvr_type structure:
o The first element is a function pointer to the func() function (for a packet interface)
or deq_func() (for a message interface) in the subsequent component instance.
This means that a component instance stores a reference to the handling routine for the component instance appended to this output.
o The second element is a reference to the bhvr structure of the subsequent component instance at this output. This pointer is what actually links the component instances.
The reference to the handling routine of the subsequent component instance is not strictly required, because it could be obtained via the bhvr_type pointer in the successor's bhvr structure. For performance reasons, however, this double dereferencing is avoided at the cost of an additional pointer. If no component instance is appended to an output, both variables are NULL, and a packet to be forwarded is returned recursively to the hook. (See Section 22.3.5.)
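The effect of caching the successor's handler can be shown with a small user-space sketch. The Counter and Sink components, the DEMO_DROP verdict, the value chosen for KIDS_ACCEPT, and demo_connect() are hypothetical. The point is that counter_func() forwards through the cached out_func with a single indirect call and falls back to KIDS_ACCEPT when its output is unconnected.

```c
#include <assert.h>
#include <stddef.h>

#define KIDS_ACCEPT 1 /* assumed value; the real constant comes from KIDS */
#define DEMO_DROP   0 /* hypothetical verdict returned by the sink below */

struct bhvr;
/* void * stands in for struct sk_buff * in this sketch. */
typedef int (*pkt_func)(struct bhvr *b, void *skb);

struct bhvr_type {
    const char *name;
    pkt_func func;
};

/* Private data of a hypothetical one-output "Counter" component. */
struct counter_data {
    unsigned long packets;
    struct bhvr *out_bhvr;   /* successor instance, or NULL */
    pkt_func out_func;       /* cached copy of out_bhvr->bhvr_type->func */
};

/* Fixed private type for brevity; KIDS overlays it on bhvr_data instead. */
struct bhvr {
    const struct bhvr_type *bhvr_type;
    struct counter_data data;
};

static int counter_func(struct bhvr *self, void *skb)
{
    struct counter_data *d = &self->data;

    d->packets++;
    /* Fast path: one indirect call, no detour through bhvr_type. */
    if (d->out_bhvr != NULL && d->out_func != NULL)
        return d->out_func(d->out_bhvr, skb);
    return KIDS_ACCEPT;      /* output unconnected: verdict goes back to the hook */
}

static int sink_func(struct bhvr *self, void *skb)
{
    (void) self;
    (void) skb;
    return DEMO_DROP;
}

static const struct bhvr_type counter_type = { "Counter", counter_func };
static const struct bhvr_type sink_type = { "Sink", sink_func };

/* Connect dst behind src, caching dst's handler. */
static void demo_connect(struct bhvr *src, struct bhvr *dst)
{
    src->data.out_bhvr = dst;
    src->data.out_func = dst->bhvr_type->func;
}
```

The cached pointer must always equal out_bhvr->bhvr_type->func, so it has to be updated whenever the successor changes; that bookkeeping is the price of saving one dereference per packet.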
22.3.2 Registering and Managing Components
Before we can use Linux KIDS to implement the desired QoS mechanisms, we have to tell the kernel which components are currently available. To this end, Linux KIDS maintains a list, bhvr_type_list, to manage all registered components. This list is a simple linked list of the bhvr_type data structures that store all information about the components (see Figure 22-4). Linking data structures into a list is the usual way of managing functionalities in the Linux kernel (see Section 22.1).
We can use the function register_bhvr_type(bhvr_type) to register a component represented by a bhvr_type structure (see Figure 22-5). More specifically, the bhvr_type structure is entered in the bhvr_type_list (see Figure 22-4). From then on, this component is known in the kernel, and we can create instances of it. To remove a component from the list, we invoke unregister_bhvr_type(bhvr_type). Of course, we have to ensure that no instances of the component are left before we remove it, which is why the instances variable has to be checked first.
Figure 22-5.
struct bhvr_type token_bucket_element = {
    "Token_Bucket",                    /* name */
    BHVR_ID,                           /* class */
    sizeof(struct token_bucket_data),  /* private data size */
    0,                                 /* instances */
    NULL,                              /* next */
    token_bucket_func,                 /* packet interface */
    NULL,                              /* message interface */
    token_bucket_init,                 /* constructor */
    NULL,                              /* destructor */
    token_bucket_get,                  /* get bhvr of a port */
    token_bucket_append,               /* append bhvr on a port */
    token_bucket_proc,                 /* proc output routine */
    token_bucket_config                /* get config of a bhvr */
};

int init_module(void)
{
    register_bhvr_type(&token_bucket_element);
    return 0; /* tell the module loader that initialization succeeded */
}

void cleanup_module(void)
{
    unregister_bhvr_type(&token_bucket_element);
}
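The bookkeeping behind register_bhvr_type() and unregister_bhvr_type() can be approximated by the user-space sketch below. The demo_ prefixed functions, the duplicate-name check, and the negative errno return values are assumptions; the text only states that components are linked into bhvr_type_list and that the instances counter must be checked before removal. A kernel version would also need locking, which is omitted here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Cut-down bhvr_type: only the fields the list management touches. */
struct bhvr_type {
    const char *name;
    unsigned int instances;    /* maintained by the framework */
    struct bhvr_type *next;    /* link in bhvr_type_list */
};

static struct bhvr_type *bhvr_type_list;   /* head of the registry */

/* Stand-in for register_bhvr_type(): reject duplicate names, then push
 * the component onto the head of the list. */
static int demo_register_bhvr_type(struct bhvr_type *t)
{
    struct bhvr_type *p;

    for (p = bhvr_type_list; p != NULL; p = p->next)
        if (strcmp(p->name, t->name) == 0)
            return -EEXIST;
    t->next = bhvr_type_list;
    bhvr_type_list = t;
    return 0;
}

/* Stand-in for unregister_bhvr_type(): refuse while instances exist,
 * otherwise unlink the entry with a pointer-to-pointer walk. */
static int demo_unregister_bhvr_type(struct bhvr_type *t)
{
    struct bhvr_type **pp;

    if (t->instances > 0)
        return -EBUSY;
    for (pp = &bhvr_type_list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == t) {
            *pp = t->next;
            t->next = NULL;
            return 0;
        }
    }
    return -ENOENT;
}
```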
In addition to the list of component categories, Linux KIDS has two other elements that can be used to register or unregister functionalities dynamically. To prevent this chapter from getting too long, we will discuss these two elements only briefly. They are managed similarly to the previous elements:
• Hooks are represented by the hook data structure; they are registered by register_hook(hook) and unregistered by unregister_hook(hook). If a protocol instance wants to provide a hook, it emulates a packet interface or message interface, builds an appropriate hook data structure, and registers the hook. Subsequently, components can be appended to this hook. The files kids/layer2_hooks.c and kids/nf_hooks.c contain examples of hooks based on the Traffic Control (TC) and netfilter interfaces, respectively.
• Different queue types are managed by the kids_queue_type data structure; they are registered with register_queue_type() and unregistered with unregister_queue_type(). An instance of a queue type is represented by a kids_queue structure. Queues are managed almost exactly like components, but because components and queues differ, they are managed separately.
22.3.3 Managing Component Instances
The previous section described how we can register and manage components in Linux KIDS; this section discusses how we manage instances of components: how component instances are created, deleted, and linked. A special syntax was developed to keep the management of the QoS mechanisms as simple as possible; Section 22.3.6 will introduce this syntax. A character device, /dev/kids, is used to pass configuration commands to Linux KIDS and to invoke the methods introduced below.
create_bhvr() kids/kids_bhvr.c
create_bhvr(type, name, data, id) creates an instance named name of the component type. To create this instance, the component has to be present in the list of registered components (bhvr_type_list).
First, memory is reserved for the data of the new component instance. This memory consists of a bhvr structure that is identical for all components and a private data structure that is specific to each component. Next, the bhvr structure is initialized, and the component's constructor is invoked to fill the private data with the component's configuration parameters. These parameters were extracted from the CREATE command and are passed in the data character string. A newly created component instance is not yet connected to any other component. Finally, it is added to the bhvr_list.
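The steps of create_bhvr() can be sketched in user space as follows. This is a simplification under stated assumptions: demo_create_bhvr() omits the id parameter (its meaning is not covered here), treats a nonzero constructor result as failure, uses calloc() in place of kernel allocation, and assumes values for STRLEN and INIT_BHVR; the Counter component with its single start parameter exists only to exercise the function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define STRLEN    32 /* assumed size */
#define INIT_BHVR 1  /* assumed flag value */

struct bhvr;

struct bhvr_type {
    const char *name;
    unsigned long private_data_size;
    unsigned int instances;
    struct bhvr_type *next;
    int (*constructor)(struct bhvr *b, const char *data, int flag);
};

struct bhvr {
    char name[STRLEN];
    unsigned int use_counter;
    struct bhvr *next_bhvr;
    struct bhvr_type *bhvr_type;
    _Alignas(max_align_t) char bhvr_data[];
};

static struct bhvr_type *bhvr_type_list;  /* registered components */
static struct bhvr *bhvr_list;            /* existing instances */

static struct bhvr_type *find_bhvr_type(const char *name)
{
    struct bhvr_type *t;

    for (t = bhvr_type_list; t != NULL; t = t->next)
        if (strcmp(t->name, name) == 0)
            return t;
    return NULL;
}

static struct bhvr *demo_create_bhvr(const char *type, const char *name, const char *data)
{
    struct bhvr_type *t = find_bhvr_type(type);
    struct bhvr *b;

    if (t == NULL)
        return NULL;                      /* component not registered */
    b = calloc(1, sizeof(*b) + t->private_data_size);
    if (b == NULL)
        return NULL;
    strncpy(b->name, name, STRLEN - 1);   /* unconnected, use_counter 0 */
    b->bhvr_type = t;
    if (t->constructor != NULL && t->constructor(b, data, INIT_BHVR) != 0) {
        free(b);                          /* assumed: nonzero means failure */
        return NULL;
    }
    t->instances++;
    b->next_bhvr = bhvr_list;             /* add to bhvr_list */
    bhvr_list = b;
    return b;
}

/* Hypothetical "Counter" component used to exercise demo_create_bhvr(). */
struct cnt_data { long start; };

static int cnt_constructor(struct bhvr *b, const char *data, int flag)
{
    struct cnt_data *d = (struct cnt_data *) b->bhvr_data;

    (void) flag;
    d->start = strtol(data, NULL, 10);
    return 0;
}

static struct bhvr_type cnt_type = { "Counter", sizeof(struct cnt_data), 0, NULL, cnt_constructor };
```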
remove_bhvr() kids/kids_bhvr.c
remove_bhvr(name, force) deletes the component instance designated by name and removes it from the bhvr_list. Setting force ignores the use_counter of that instance. This should normally be avoided, because other instances may still refer to this data structure. Before the data structure is released, the component's destructor, if present, is invoked to free resources.
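A user-space sketch of the remove_bhvr() logic follows. It assumes negative errno values as return codes (the text does not specify them) and a cut-down struct bhvr; demo_add() and counting_destructor() are hypothetical scaffolding for trying it out.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct bhvr;

struct bhvr_type {
    unsigned int instances;
    int (*destructor)(struct bhvr *b);
};

struct bhvr {
    char name[32];
    unsigned int use_counter;
    struct bhvr *next_bhvr;
    struct bhvr_type *bhvr_type;
};

static struct bhvr *bhvr_list;

static int demo_remove_bhvr(const char *name, int force)
{
    struct bhvr **pp;

    for (pp = &bhvr_list; *pp != NULL; pp = &(*pp)->next_bhvr) {
        struct bhvr *b = *pp;

        if (strcmp(b->name, name) != 0)
            continue;
        if (b->use_counter > 0 && !force)
            return -EBUSY;                /* still referenced by predecessors */
        *pp = b->next_bhvr;               /* unlink from bhvr_list */
        if (b->bhvr_type->destructor != NULL)
            b->bhvr_type->destructor(b);  /* release timers, memory, ... */
        b->bhvr_type->instances--;
        free(b);
        return 0;
    }
    return -ENOENT;
}

/* Scaffolding: a destructor that counts its calls, and an add helper. */
static int destroyed;

static int counting_destructor(struct bhvr *b)
{
    (void) b;
    destroyed++;
    return 0;
}

static struct bhvr_type demo_type = { 0, counting_destructor };

static struct bhvr *demo_add(const char *name, unsigned int users)
{
    struct bhvr *b = calloc(1, sizeof(*b));

    if (b == NULL)
        return NULL;
    strncpy(b->name, name, sizeof(b->name) - 1);
    b->use_counter = users;
    b->bhvr_type = &demo_type;
    demo_type.instances++;
    b->next_bhvr = bhvr_list;
    bhvr_list = b;
    return b;
}
```

With force set, predecessors that still point at the removed instance are left with dangling pointers, which is why forced removal should remain the exception.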
change_bhvr() kids/kids_bhvr.c
change_bhvr(name, data) can be used to alter the private data of a component instance at runtime. All that happens here is that the component's constructor is invoked with the data character string holding the parameters to be changed. The INIT_BHVR flag is not set, so the constructor knows that only the parameters specified have to be altered; otherwise, the entire component instance would be reset.
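How a constructor can honor the INIT_BHVR flag is shown in the following sketch. It operates on a cut-down tb_data directly instead of on a bhvr, it assumes a value for INIT_BHVR, and it assumes a blank-separated key=value format for data; the real KIDS configuration syntax is described in Section 22.3.6 and may differ. Starting with a full bucket on initialization is likewise an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define INIT_BHVR 1 /* assumed flag value; KIDS defines the real one */

struct tb_data {
    unsigned int rate, bucket_size;
    unsigned long token;
};

static int tb_constructor(struct tb_data *d, const char *data, int flag)
{
    char key[32];
    unsigned long v;
    int n = 0;

    if (flag == INIT_BHVR)
        memset(d, 0, sizeof(*d));     /* first call: start from a clean slate */

    /* Apply only the parameters that actually appear in data. */
    while (sscanf(data, " %31[^= ]=%lu%n", key, &v, &n) == 2) {
        if (strcmp(key, "rate") == 0)
            d->rate = (unsigned int) v;
        else if (strcmp(key, "bucket_size") == 0)
            d->bucket_size = (unsigned int) v;
        else
            return -1;                /* unknown parameter name */
        data += n;
    }
    while (*data == ' ')
        data++;
    if (*data != '\0')
        return -1;                    /* malformed trailing text */

    if (flag == INIT_BHVR)
        d->token = d->bucket_size;    /* assumption: bucket starts full */
    return 0;
}
```

On the first call everything is reset before parsing; on a change, only the keys present in data are written, so a later "rate=2000" leaves bucket_size and the current token level untouched.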
22.3.4 Implementing Hooks
Hooks are extensions of existing protocol instances that allow us to easily embed QoS components based on the rules of the KIDS framework [Wehr01a]. One of the most important factors is the position within the processing of a protocol instance that we want to extend by a hook, and thus by QoS mechanisms. The reason is that each position gives access to a specific set of packets (e.g., all packets to be forwarded at the IP_FORWARD hook, or all packets of the IP instance to be delivered locally at the IP_LOCAL_DELIVER hook).
Thanks to its set of different interfaces, the Linux network architecture offers an inherent way to extend a protocol instance by additional functionality. The KIDS framework uses these interfaces, so the hooks shown in Figure 22-3 could be implemented without changing the source code of the Linux kernel. The hooks for the IP instance are based on the netfilter interface (see Section 19.3); the data-link layer hooks are based on the Traffic Control interface.
The following example shows the netfilter handling method of the IP_FORWARD hook. It merely checks whether a component instance is appended and, if so, invokes that instance:
unsigned int ip_forward_hook_fn(unsigned int hooknum, struct sk_buff **skb, ...)
{
    /* hand the packet to the component chain attached to the hook, if any */
    if (ip_forward_hook && ip_forward_hook->bhvr && ip_forward_hook->func)
        return ip_forward_hook->func(ip_forward_hook->bhvr, skb[0]);
    else
        return NF_ACCEPT; /* no chain attached: let the packet pass */
}
Additional hooks can be integrated easily, even at runtime. To integrate a hook, we store the required information about the hook in a hook data structure and register it with the register_hook() method. The protocol instance to be extended then only needs an additional function call, structured similarly to the above example of the IP_FORWARD hook. Additional information about the concept of hooks can be found in [Wehr01a].
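The hook dispatch above can be mirrored in user space to show its two outcomes. struct demo_hook, demo_hook_fn(), and drop_all() are hypothetical; the verdict values match netfilter's NF_DROP (0) and NF_ACCEPT (1), but the sketch does not use any netfilter API.

```c
#include <assert.h>
#include <stddef.h>

#define VERDICT_DROP   0 /* same value as netfilter's NF_DROP */
#define VERDICT_ACCEPT 1 /* same value as netfilter's NF_ACCEPT */

struct bhvr { const char *name; };   /* minimal stand-in instance */
typedef int (*pkt_func)(struct bhvr *b, void *skb);

/* Hypothetical hook record: entry point of an attached component chain. */
struct demo_hook {
    const char *name;
    struct bhvr *bhvr;   /* first component instance, or NULL */
    pkt_func func;       /* its handler */
};

/* Hand the packet to the chain if one is attached; otherwise let it pass. */
static int demo_hook_fn(struct demo_hook *hook, void *skb)
{
    if (hook != NULL && hook->bhvr != NULL && hook->func != NULL)
        return hook->func(hook->bhvr, skb);
    return VERDICT_ACCEPT;
}

/* A trivial component used only to exercise the hook. */
static int drop_all(struct bhvr *b, void *skb)
{
    (void) b;
    (void) skb;
    return VERDICT_DROP;
}
```

An empty hook is transparent: packets pass with an accept verdict until a chain is attached, which is what makes it safe to register hooks before any components are configured.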
22.3.5 How a Component Works
Once all components of the KIDS framework are registered with the kernel, and a component chain has been created and appended to a hook, the remaining question is how an individual component processes packets. The following example follows a packet through the Token_Bucket component to describe how this component operates:
token_bucket_func() kids/std_bhvr.c
int token_bucket_func(struct bhvr *tb_bhvr, struct sk_buff *skb)
{
    struct tb_data *data = (struct tb_data *) &(tb_bhvr->bhvr_data);
    CPU_STAMP now;
    data->packets_arvd++;
    TAKE_TIME(now);
    /* compute the tokens produced since the last packet arrival */
    data->token += (unsigned long) (now - data->last_arvl) /
                   (unsigned long) data->cycles_per_byte;
    /* check whether the bucket overflows */
    if (data->token > data->bucket_size)
        data->token = data->bucket_size;
    data->last_arvl = now;
    /* check whether there are enough tokens to send the packet */
    if (data->token < skb->len) {
        /* not enough tokens -> out of profile */
        data->packets_out++;
        /* forward the packet to the next behavior (out-of-profile) */
        if (data->not_enough_token_bhvr && data->not_enough_token_func)
            return data->not_enough_token_func(data->not_enough_token_bhvr, skb);
    } else {
        /* enough tokens -> in profile */
        data->token -= skb->len;
        data->packets_in++;
        /* forward the packet to the next behavior (in-profile) */
        if (data->enough_token_bhvr && data->enough_token_func)
            return data->enough_token_func(data->enough_token_bhvr, skb);
    }
    return KIDS_ACCEPT; /* do not discard the packet when no behavior is attached */
}
The Token_Bucket component belongs to the operative component class, which means that it has a
packet input and up to n packet outputs. In this example, these are the Conform and Non_Conform
outputs.
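The token arithmetic of token_bucket_func() can be tried outside the kernel with explicit timestamps instead of TAKE_TIME(). In this sketch, tb_classify(), struct tb_state, cpu_stamp, and the IN_PROFILE/OUT_PROFILE codes are hypothetical stand-ins, and the two outputs are reduced to return values. One deliberate deviation: the refill is clamped before it is added, so a long idle period cannot wrap the unsigned token counter.

```c
#include <assert.h>

#define IN_PROFILE  1 /* hypothetical codes standing for the two outputs */
#define OUT_PROFILE 0

typedef unsigned long long cpu_stamp;   /* stands in for CPU_STAMP */

struct tb_state {
    unsigned long token, bucket_size;   /* in bytes */
    unsigned long cycles_per_byte;      /* CPU cycles needed to earn one token */
    cpu_stamp last_arvl;                /* time of the previous arrival */
    unsigned long packets_in, packets_out;
};

/* Classify one packet of len bytes arriving at time now (in CPU cycles). */
static int tb_classify(struct tb_state *s, unsigned long len, cpu_stamp now)
{
    /* Tokens earned since the last arrival; fractional tokens are dropped,
     * as in token_bucket_func(). */
    unsigned long earned = (unsigned long) ((now - s->last_arvl) / s->cycles_per_byte);

    /* Refill without letting the level exceed (or wrap past) bucket_size. */
    if (earned >= s->bucket_size - s->token)
        s->token = s->bucket_size;
    else
        s->token += earned;
    s->last_arvl = now;

    if (s->token < len) {               /* too few tokens: out of profile */
        s->packets_out++;
        return OUT_PROFILE;
    }
    s->token -= len;                    /* conforming packet pays len tokens */
    s->packets_in++;
    return IN_PROFILE;
}
```

With bucket_size = 1500 bytes and cycles_per_byte = 10, a bucket that starts full accepts a 1000-byte packet at time 0 (leaving 500 tokens), rejects a second one at the same instant, and accepts a third 1000-byte packet 5000 cycles later, because 5000 / 10 = 500 new tokens bring the level back to 1000.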
When an instance of the Token_Bucket component receives a packet, the corresponding func()