7.4 Kecleon Compilation Pipeline
7.4.2 Runtime Statistics and Data Collection
The second phase of the Kecleon compilation pipeline consists of two steps. The first step retrieves the actual runtime data of the previously identified instructions: once the configuration and state operations have been identified, Kecleon can retrieve all their runtime values by issuing a request to the back-end plugin. For instance, in the Katran example shown before, once vip_inf has been identified as a configuration variable, Kecleon can “ask” the eBPF plugin to retrieve the set of values within the vip_map.
The second step instruments the original data plane code to retrieve more specific information about the current execution of the program, which is particularly useful for optimizations that improve performance for particular traffic patterns. There are several ways in which this instrumentation can be achieved. A first possible approach would be to add a local MAT into the data plane of the NF that records the values of each packet (e.g., its 5-tuple) traversing the NF. However, this may incur additional and unnecessary overhead if the actions performed within the NF data plane do not use those fields. A better approach would be to monitor only the fields that are actually used, but this requires an additional analysis of the NF data plane, making the instrumentation pass aware of the actions performed on the packets or depending on their content.
² A careful reader may argue that, if the variable is retrieved from a map with a lookup operation and is later modified without an update operation, it can still be considered read-only. This is not valid for eBPF, where the result of a lookup is a pointer to the entry in the map; as a consequence, a modification of the pointer’s content will modify the corresponding map.
7 – Kecleon: A Dynamic Compiler and Optimizer for Software Network Data Planes
To avoid this, Kecleon adopts an implicit traffic-specific mechanism by automatically instrumenting the sections of the code that access or modify the internal state of the NF. In particular, the Kecleon instrumentation pass retrieves all the MAT accesses and, for each of them, adds a local MAT that stores the same data as the original table. The local MAT samples only the most-accessed entries of the original MAT and their corresponding values, keeping a limited number of entries to reduce its size and overhead. Then, the Kecleon platform-specific plugin exports the entries from the instrumented MAT using the Kecleon common data format, so that the various optimization passes can consume them in a target-independent way.
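The bounded local MAT described above can be sketched in plain C. This is only an illustration under stated assumptions: the names (local_mat, local_mat_record, LOCAL_MAT_CAPACITY) are hypothetical, and the eviction rule is a Space-Saving-style heuristic chosen here for brevity, not necessarily the policy Kecleon uses to select the most-accessed entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOCAL_MAT_CAPACITY 4   /* hypothetical bound on the sampled entries */

struct local_entry {
    uint32_t key;    /* lookup key seen on the original table (e.g., L4 port) */
    uint64_t value;  /* copy of the value found in the original table */
    uint64_t hits;   /* approximate number of accesses */
    int      used;   /* 0 while the slot is free */
};

struct local_mat {
    struct local_entry slots[LOCAL_MAT_CAPACITY];
};

/* Record one access to (key, value) in the bounded local table. */
static void local_mat_record(struct local_mat *m, uint32_t key, uint64_t value)
{
    size_t victim = 0;
    uint64_t base = 0;
    int found_free = 0;

    assert(m != NULL);

    /* Known key: refresh the value and bump its counter. */
    for (size_t i = 0; i < LOCAL_MAT_CAPACITY; i++) {
        if (m->slots[i].used && m->slots[i].key == key) {
            m->slots[i].value = value;
            m->slots[i].hits++;
            return;
        }
    }

    /* New key: take a free slot, or else the least-hit entry. */
    for (size_t i = 0; i < LOCAL_MAT_CAPACITY; i++) {
        if (!m->slots[i].used) {
            victim = i;
            found_free = 1;
            break;
        }
        if (m->slots[i].hits < m->slots[victim].hits)
            victim = i;
    }

    /* Space-Saving rule: a key that evicts another inherits its count. */
    if (!found_free)
        base = m->slots[victim].hits;

    m->slots[victim] = (struct local_entry){
        .key = key, .value = value, .hits = base + 1, .used = 1
    };
}

/* Return the recorded hit count for key, or 0 if it is not tracked. */
static uint64_t local_mat_hits(const struct local_mat *m, uint32_t key)
{
    assert(m != NULL);
    for (size_t i = 0; i < LOCAL_MAT_CAPACITY; i++)
        if (m->slots[i].used && m->slots[i].key == key)
            return m->slots[i].hits;
    return 0;
}
```

With a fixed capacity, the memory cost of the instrumentation stays constant regardless of how many distinct keys the traffic contains; the price is that the counters of rarely seen keys are only approximate.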
This approach brings several advantages. First, it does not require any static analysis of the NF code to retrieve a common set of packet values that should be representative of the incoming traffic. Second, it gives Kecleon more fine-grained control over the instrumentation that is applied. If Kecleon recognizes that the overhead of a single table is minimal, or that there is no room for improvement, the instrumentation can be disabled for that specific branch or table only, while information is still collected on the others. Kecleon can also change at runtime both the size of the instrumented tables, according to the level of information it needs, and the sampling rate of the instrumented entries, which is a compromise between the accuracy of the collected traffic information and the performance overhead introduced by the instrumentation. Finally, by merging the information coming from the different local MATs, it is possible to reconstruct the hot code paths and then optimize for them.
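The per-table controls mentioned above (disabling the instrumentation for one table and tuning its sampling rate) can be illustrated with a small C sketch. The structure and function names (instr_ctl, instr_should_record) are hypothetical, and a deterministic one-in-N policy is assumed only for simplicity; the text does not specify how Kecleon implements sampling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-table instrumentation knobs, adjustable at runtime. */
struct instr_ctl {
    int      enabled;       /* 0 disables instrumentation for this table only */
    uint32_t sample_every;  /* record one access out of every N (N >= 1) */
    uint32_t seen;          /* accesses observed since the last recorded one */
};

/* Return 1 if the current access should be recorded in the local MAT. */
static int instr_should_record(struct instr_ctl *c)
{
    assert(c != NULL);
    if (!c->enabled || c->sample_every == 0)
        return 0;
    if (++c->seen < c->sample_every)
        return 0;
    c->seen = 0;
    return 1;
}
```

A larger sample_every lowers the per-packet cost of the instrumentation at the expense of a coarser view of the traffic, which is the accuracy versus overhead trade-off described in the text.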
7.4.2.1 Implementation Details (eBPF Plugin)
We show here a small example taken from the bpf-iptables network function presented in Chapter 5. Listing 7.3 shows a short code snippet of the module that matches the L4 ports of the packet; the colored code should be ignored for now and is explained later. On line 8, a lookup in the port_map is performed to retrieve the associated bitvector; then, from line 11 to 15, the final bitvector is calculated by performing a bitwise AND with the current bitvector obtained in the previous steps of the pipeline. If the resulting bitvector is zero, the default action is applied (e.g., the packet is dropped); otherwise, the next module in the pipeline is called.
1  u64 *val = bpf_lookup(&port_map_c, &l4port);
2  (*val)++;
3  bpf_map_update(&cpu_port_map, &l4port, val);
4  if (l4port == 8080) {
5      isAllZero = true;
6      goto NEXT;
7  }
8  ele = bpf_lookup(&port_map, &l4port);
9  ...
10 #pragma unroll
11 for (...) {
12     bits[i] = bits[i] & (ele->bits)[i];
13     if (res->bits[i] != 0)
14         isAllZero = false;
15 }
16 goto NEXT;
17 ...
18 NEXT:;
19 if (isAllZero) {
20     applyDefaultAction();
21     return;
22 }
23 call_bpf_program(ctx, _NEXT_HOP_1);
24 ...
Listing 7.3: Sample code of the bpf-iptables L4-Port matching module
In the eBPF case, the Kecleon instrumentation uses a per-CPU BPF_HASH map to store every L4 port that reaches the original map, together with a corresponding counter³.
The red code in Listing 7.3 (lines 1 to 3) shows the corresponding C code that would be added by the instrumentation pass.
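Because the instrumentation map is per-CPU, each key carries one counter per CPU, and these counters have to be summed before the keys can be ranked. The following plain C sketch illustrates that aggregation step. It is not the Kecleon plugin code: the port_sample layout and the function names are assumptions made for the example, and the per-CPU values are passed in as an ordinary array rather than read from a real eBPF map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One exported row (hypothetical layout): a port and its per-CPU counters. */
struct port_sample {
    uint16_t        port;
    const uint64_t *per_cpu_hits;  /* one counter per CPU */
};

/* Sum the per-CPU counters of one sample into a single hit count. */
static uint64_t total_hits(const struct port_sample *s, size_t ncpus)
{
    uint64_t total = 0;

    assert(s != NULL && s->per_cpu_hits != NULL);
    for (size_t cpu = 0; cpu < ncpus; cpu++)
        total += s->per_cpu_hits[cpu];
    return total;
}

/* Return the index of the sample with the highest total; n must be > 0. */
static size_t hottest_sample(const struct port_sample *samples, size_t n,
                             size_t ncpus)
{
    size_t best = 0;

    assert(samples != NULL && n > 0);
    for (size_t i = 1; i < n; i++)
        if (total_hits(&samples[i], ncpus) > total_hits(&samples[best], ncpus))
            best = i;
    return best;
}
```

Keeping the counters per-CPU avoids contention between cores in the data plane and moves the cost of merging them to the slower path that reads the statistics.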
The Kecleon optimization pass can then try to optimize the code using the result of the instrumentation. For instance, in this example, if most of the runtime flows contain packets with L4 port 8080, and the associated bitvector⁴ has all bits set to zero, a Kecleon optimization pass can pre-compute the result (green code of Listing 7.3, starting at line 4). For those packets, the lookup on line 8 and the loop from line 11 to 15 are skipped, saving CPU cycles in the most common case.
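The effect of this specialization can be reproduced in a self-contained C sketch that compares the generic matching step with a version specialized for the hot port. Everything here is illustrative: BV_WORDS, port_lookup (a stand-in for the lookup in port_map) and the function names are assumptions, and the sketch does not show how the fast path would be invalidated if the bitvector of port 8080 later changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BV_WORDS 4  /* hypothetical bitvector length, in 64-bit words */

/* Generic step: AND the running bitvector with the one for this port.
 * Returns true when the result has no bit set. */
static bool and_is_all_zero(uint64_t bits[BV_WORDS],
                            const uint64_t ele[BV_WORDS])
{
    bool all_zero = true;

    for (size_t i = 0; i < BV_WORDS; i++) {
        bits[i] &= ele[i];
        if (bits[i] != 0)
            all_zero = false;
    }
    return all_zero;
}

/* Stand-in for the port_map lookup: port 80 matches rule 0,
 * every other port (including 8080) has an all-zero bitvector. */
static const uint64_t *port_lookup(uint16_t port)
{
    static const uint64_t zero[BV_WORDS]  = {0};
    static const uint64_t rule0[BV_WORDS] = {1};

    return port == 80 ? rule0 : zero;
}

/* Specialized step: the outcome for the hot port is pre-computed,
 * so neither the lookup nor the AND loop runs for it. */
static bool match_specialized(uint16_t port, uint64_t bits[BV_WORDS])
{
    assert(bits != NULL);
    if (port == 8080)   /* hot key observed with an all-zero bitvector */
        return true;
    return and_is_all_zero(bits, port_lookup(port));
}
```

For port 8080 both versions report an all-zero result, but the specialized one reaches it with a single comparison; all other ports still take the generic path.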