

4.4 Data Structure and Implementation

4.4.1 Memory Footprint Reduction

The first way to improve our algorithms is to reduce the memory usage of our data structure. In particular, a more careful use of main memory gives us a higher chance of finding the data in the CPU cache, and this can significantly boost the performance of the matcher. Here we describe three techniques that we use in our implementation to reduce the memory requirements:

1. permute the bits of the descriptors in the FIB according to their popularity.

2. implement the trie using a vector.

3. remove the chains (lists of nodes with a single child) in the deepest levels of the trie.

In the rest of this section we describe how we apply these techniques to the original data structure, and we analyze the implications and the advantages of each technique.


Bit Permutation

The first transformation that we apply to our trie is to sort the bits in the descriptors contained in the FIB by popularity. By pushing the most popular bits to the first part of the descriptors, we create many descriptors with a common prefix. Since we use a prefix trie, we need to create only one path for all the descriptors that share the same prefix. Fewer paths to store means fewer nodes in the trie, and therefore less memory usage.

Bit Pos   Bit Freq   New Pos
1         2          3
2         1          5
3         4          2
4         1          6
5         5          1
6         0          9
7         1          7
8         2          4
9         1          8
10        0          10

Original Filters   Permuted Filters
(1,5)              (1,3)
(1,3,8)            (2,3,4)
(2,3,5)            (1,2,5)
(3,4,5,9)          (1,2,6,8)
(3,5,7)            (1,2,7)
(5,8)              (1,4)

Figure 4.4. Trie compression with bit popularity (the original trie is the one presented in Figure 4.1)

Figure 4.4 shows this transformation applied to the trie in Figure 4.1. The table in the top left part of the figure shows how frequently each bit appears. The new position is computed by ranking the bits according to their frequency.


What we obtain is a permutation that we apply to each descriptor in the table. For example, the descriptor (2,3,5) becomes (1,2,5), because the new position of 2 is 5, the new position of 3 is 2, and the new position of 5 is 1.
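The ranking step can be sketched in a few lines of Python. This is only an illustration, not our implementation: the function names are hypothetical, and the tie-breaking rule (bits with equal frequency keep their original relative order) is an assumption that happens to reproduce the table in Figure 4.4.

```python
from collections import Counter

def popularity_permutation(descriptors, width):
    """Map each 1-based bit position to its new 1-based position."""
    freq = Counter(bit for d in descriptors for bit in d)
    # Most frequent bits first; ties keep the original bit order
    # (an assumption, chosen because it matches the table in Figure 4.4).
    ranked = sorted(range(1, width + 1), key=lambda b: (-freq[b], b))
    return {bit: new for new, bit in enumerate(ranked, start=1)}

def permute(descriptor, new_pos):
    """Apply the permutation and keep the positions in increasing order."""
    return tuple(sorted(new_pos[b] for b in descriptor))

# The six filters of Figure 4.4, over 10-bit descriptors.
fib = [(1, 5), (1, 3, 8), (2, 3, 5), (3, 4, 5, 9), (3, 5, 7), (5, 8)]
new_pos = popularity_permutation(fib, width=10)
permuted_fib = [permute(d, new_pos) for d in fib]
# permuted_fib is [(1, 3), (2, 3, 4), (1, 2, 5), (1, 2, 6, 8), (1, 2, 7), (1, 4)]
```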

The new trie that we obtain is represented in the lower part of Figure 4.4. This trie has 18 nodes, while the original one in Figure 4.1 has 22 nodes. Although the compression obtained in terms of nodes in this small example is not high, in practice the bit permutation is quite effective. In particular, this technique is more effective when the frequency of the bits is skewed, that is, when only a few bits are set to one with high probability. This can easily happen in practice. For example, if each application adds an application tag to the descriptors, then the bits related to popular applications will have a high probability of being set. The same happens with hierarchical names, especially with names derived from URLs. In a URL-like name there are just a few domain names that can be used as the first name component (e.g., com, it, ch, ...). These components are much more popular than other words, and this popularity is reflected in the bits set for these components in the descriptors.
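The node counts quoted above can be checked without building the trie. The sketch below assumes that a prefix trie has one root, one node per distinct non-empty prefix, and one final node per descriptor; the function name is hypothetical.

```python
def trie_node_count(descriptors):
    """Count the nodes of a prefix trie over sorted descriptors.

    Assumed node model: 1 root, 1 node per distinct non-empty prefix,
    and 1 final node per distinct descriptor.
    """
    prefixes = {d[:i] for d in descriptors for i in range(1, len(d) + 1)}
    return 1 + len(prefixes) + len(set(descriptors))

original = [(1, 5), (1, 3, 8), (2, 3, 5), (3, 4, 5, 9), (3, 5, 7), (5, 8)]
permuted = [(1, 3), (2, 3, 4), (1, 2, 5), (1, 2, 6, 8), (1, 2, 7), (1, 4)]

nodes_before = trie_node_count(original)  # 22
nodes_after = trie_node_count(permuted)   # 18
```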

Another advantage of this technique is that it helps us to skip significant parts of the trie during the matching execution. If an incoming descriptor does not have a popular bit set to 1, the algorithm can skip the entire subtree under such a bit.
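The pruning effect can be illustrated with a small sketch. It assumes that a FIB descriptor matches a packet when all of its bits are also set in the packet descriptor (subset matching), and it uses a nested dictionary as a stand-in for the trie; both the representation and the function names are hypothetical.

```python
FINAL = "$"

def build_trie(descriptors):
    """Nested-dict prefix trie; a FINAL key marks the end of a descriptor."""
    root = {}
    for d in descriptors:
        node = root
        for bit in sorted(d):
            node = node.setdefault(bit, {})
        node[FINAL] = {}
    return root

def match(node, packet_bits, path=()):
    """Return (matches, visited): the FIB descriptors that are subsets of
    the packet, and the number of trie nodes entered (root excluded)."""
    matches, visited = [], 0
    for pos, child in node.items():
        if pos == FINAL:
            matches.append(path)
            visited += 1
        elif pos in packet_bits:
            sub_matches, sub_visited = match(child, packet_bits, path + (pos,))
            matches += sub_matches
            visited += 1 + sub_visited
        # Otherwise the bit is not set in the packet: skip the whole subtree.
    return matches, visited

# Permuted filters of Figure 4.4; bit 1 is the most popular one.
trie = build_trie([(1, 3), (2, 3, 4), (1, 2, 5), (1, 2, 6, 8), (1, 2, 7), (1, 4)])

# A packet without bit 1 never enters the large subtree rooted at bit 1.
found, visited = match(trie, {2, 3, 4})
```

In this example the packet enters only 4 of the 17 non-root nodes, because the subtree under the popular bit 1 is skipped at the first level.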

The main disadvantage of this technique is that we also need to permute the bits in the descriptor of every packet that enters the matcher. However, this can be done quickly: the algorithm requires a time proportional to the number of bits set to one in the descriptor of the packet.
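One possible way to permute a packet descriptor with one step per set bit is sketched below. It assumes that the descriptor is held as an integer bitset (bit i-1 of the integer stands for position i) and that the permutation is a lookup table indexed by the old position; the names are hypothetical.

```python
# NEW_POS[i] is the new 1-based position of bit i (index 0 is unused).
# The values are the "New Pos" column of the table in Figure 4.4.
NEW_POS = [0, 3, 5, 2, 6, 1, 9, 7, 4, 8, 10]

def to_bits(positions):
    """Build an integer bitset from 1-based bit positions."""
    return sum(1 << (p - 1) for p in positions)

def permute_bits(descriptor, new_pos):
    """Permute an integer bitset, looping once per bit set to one."""
    out = 0
    while descriptor:
        low = descriptor & -descriptor   # isolate the lowest set bit
        old = low.bit_length()           # its 1-based position
        out |= 1 << (new_pos[old] - 1)   # set the bit at the new position
        descriptor ^= low                # clear the bit just handled
    return out

packet = to_bits((2, 3, 5))
permuted_packet = permute_bits(packet, NEW_POS)
# permuted_packet equals to_bits((1, 2, 5))
```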

Vector Representation

One of the main disadvantages of the trie representation is the use of pointers. Pointers add a lot of overhead to each node. Each node carries only 3 bytes of useful information, namely 1 byte for the position, 1 byte for the maximum depth and 1 byte for the minimum depth. Everything else, the pointers in particular, is overhead added by the data structure. One way to reduce this overhead is to consider the trie as a first-child/next-sibling binary trie, and then transform it into a vector, as described in Figure 4.5. The figure represents the same trie as Figure 4.4. In this representation each node has a single pointer to its first child, which is represented with a solid arrow in the figure, and all sibling nodes are linked together in a list, indicated with dashed arrows.

This data structure can be represented with a vector, where the pointer to the first child of a node is an index or an offset to a particular cell of the vector, and adjacent nodes are sibling nodes in the trie. In order to recognize the last child of a node (the rightmost sibling) we add a new field to the node called last_child. The lower part of Figure 4.5 shows the vector representation of the trie. Each node has a position, a maximum and a minimum depth, as in the original trie. The pointer, implemented with a 32-bit integer, is represented with an arrow, and the last_child field is indicated with a vertical bar that separates sets of sibling nodes.

Figure 4.5. Trie represented as a vector

The numbers close to each node of the trie indicate the position of the node in the vector. There are many ways to order the nodes in the vector; the one in the figure is just an example. Different node layouts define different memory access patterns, which have an impact on the performance of the matching algorithm. Later we analyze two possible node layouts to see which one better fits our matching algorithm.
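The flattening can be sketched as follows. This is an illustration under several assumptions: the names are hypothetical, the sibling groups are laid out breadth-first (only one of the possible layouts), and the maximum and minimum depth of a node are taken to be the largest and smallest number of set bits among the descriptors below it, which is our reading of the node labels in the figures. In the real structure a final node points to the forwarding information; here it simply has no child.

```python
from collections import deque
from dataclasses import dataclass

FINAL = "$"
NO_CHILD = -1

@dataclass
class Node:
    pos: object        # bit position, or "$" for a final node
    max_depth: int
    min_depth: int
    first_child: int   # index of the first child in the vector
    last_child: bool   # True on the rightmost sibling of a group

def build_trie(descriptors):
    """Nested-dict trie; a final node stores the descriptor length."""
    root = {}
    for d in descriptors:
        node = root
        for bit in sorted(d):
            node = node.setdefault(bit, {})
        node[FINAL] = len(d)
    return root

def depth_range(subtree):
    """(max, min) descriptor length below a node (assumed depth meaning)."""
    if isinstance(subtree, int):
        return subtree, subtree
    ranges = [depth_range(child) for child in subtree.values()]
    return max(hi for hi, _ in ranges), min(lo for _, lo in ranges)

def flatten(root):
    """Store every sibling group in adjacent cells, breadth-first."""
    vector = []
    queue = deque([(root, None)])  # (children, index of the parent node)
    while queue:
        children, parent = queue.popleft()
        if parent is not None:
            vector[parent].first_child = len(vector)
        items = list(children.items())
        for i, (pos, sub) in enumerate(items):
            hi, lo = depth_range(sub)
            vector.append(Node(pos, hi, lo, NO_CHILD, i == len(items) - 1))
            if isinstance(sub, dict):
                queue.append((sub, len(vector) - 1))
    return vector

def descriptors(vector, start=0, path=()):
    """Walk the vector and list the stored descriptors (sanity check)."""
    out, i = [], start
    while True:
        node = vector[i]
        if node.pos == FINAL:
            out.append(path)
        else:
            out += descriptors(vector, node.first_child, path + (node.pos,))
        if node.last_child:
            return out
        i += 1

permuted = [(1, 3), (2, 3, 4), (1, 2, 5), (1, 2, 6, 8), (1, 2, 7), (1, 4)]
vector = flatten(build_trie(permuted))
```

For the permuted filters of Figure 4.4 this produces a vector of 17 cells, one per non-root node, and walking the vector gives back the six descriptors.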


Chains Removal

The last transformation that we apply to the prefix trie to reduce the footprint of the data structure is to remove the chains. With the term chain we indicate a sequence of nodes with a single child. In this implementation we focus only on the chains that terminate in a final node, namely a node with position $. A chain is highlighted on the left side of Figure 4.6. When we reach a chain during the matching algorithm, we can match at most a single descriptor, because a chain defines a single path that corresponds to a single descriptor. For this reason a chain is not useful for navigating among descriptors in the FIB; it simply says yes or no to a particular match. We decided to remove these chains from the trie and store them in another data structure. The best candidate that we have is the list of tree-interface pairs. In fact, in case of a match, we need to access this data structure anyway and, since we do not expect really long chains, the probability of getting a match when we enter a chain is high. The chains with the next-hop information are stored in a vector of bytes, as described in the right part of Figure 4.6.

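[Figure 4.6 content: on the left, the chain of nodes with positions 2, 3, 4 and $ (each with maximum and minimum depth 3), ending in the tree-interface pairs (T1,i3) and (T1,i6); on the right, the byte vector 3 | 2 3 4 | 2 | 3 6, whose fields are, in order: number of positions in the chain, positions in the chain, number of output interfaces, output interfaces.]

Figure 4.6. A chain in the trie (on the left) and its representation (on the right)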

As the example shows, in order to remove a chain from the trie, we remove all the nodes in the chain and we add a final node in place of the first node of the chain, which is 233 in this case. The new final node has the same maximum and minimum depth as the final node in the chain, which also correspond to the values in the first node. The final node points to a cell in a vector that stores the number of nodes in the chain, that is, the number of positions that we still have to check in order to match the entire descriptor. In the figure, this value is 3, because we have 3 nodes in the original chain. After this first value we store all the positions that are in the chain. After the positions, we store the information needed to forward a matching packet: first the number of output interfaces that we can use for the packet, and then the list of those output interfaces.
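The byte layout described above can be sketched as follows. The function names are hypothetical, and we assume that every field (count, position, interface) fits in one byte and that a chain matches when all of its positions are set in the packet descriptor. As in the figure, only the interface numbers are stored.

```python
def encode_chain(positions, interfaces):
    """Layout: [number of positions, positions..., number of interfaces,
    interfaces...]; every value is assumed to fit in one byte."""
    return bytes([len(positions), *positions, len(interfaces), *interfaces])

def match_chain(store, offset, packet_bits):
    """Return the output interfaces if every position of the chain is set
    in the packet, otherwise None."""
    n = store[offset]
    positions = store[offset + 1 : offset + 1 + n]
    if any(p not in packet_bits for p in positions):
        return None
    k = store[offset + 1 + n]
    start = offset + 2 + n
    return list(store[start : start + k])

# The chain of Figure 4.6: positions 2, 3, 4 and output interfaces 3, 6.
store = encode_chain([2, 3, 4], [3, 6])
# store is bytes([3, 2, 3, 4, 2, 3, 6])

hit = match_chain(store, 0, {1, 2, 3, 4})   # [3, 6]
miss = match_chain(store, 0, {2, 3})        # None
```

Several chains can be appended to the same byte vector; the final node that replaces a chain then only needs to keep the offset of its first byte.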