In this section we show how BEX can be used to verify real-world string coders and
how S-EFTs can be used to model problems in networking and functional programming.
2.7.1 Analysis of string encoders
We first discuss how BEX can prove the correctness of several real-world coders, and
then present some scalability results.
2.7.1.1 Functional correctness
A string encoder $E$ transforms input strings in a given format into output strings in a different format. A decoder $D$ inverts such a transformation. For coders $E$ and $D$ to invert each other, the following equalities should hold: $E \circ D \stackrel{1}{=} I$ and $D \circ E \stackrel{1}{=} I$ (where $I$ is the identity transducer).
We illustrated in Example 2.4 how the BASE64 encoder and decoder can be modeled using Cartesian S-EFTs. Similarly, we can model BASE32, BASE16, and UTF8 coders. Using the equivalence and composition procedures presented in this chapter we proved that the equalities presented above hold for all these coders. Table 2.1 shows the corresponding running times. The letters E, D, and I stand for encoder, decoder, and the identity transducer respectively. The first half of Table 2.1 shows the look-ahead sizes of both encoders and decoders, while the second half shows the running times for checking correctness. Composition times (typically 1-2 ms) are included in the measurements.
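To make the inversion property concrete, the following minimal sketch checks it on a handful of sample inputs, using Python's standard-library coders as stand-ins for the S-EFT models. The names `CODERS`, `SAMPLES`, `round_trips`, and `check_all` are hypothetical and introduced only for illustration. Unlike the symbolic check performed by BEX, which covers all inputs, this only tests finitely many inputs and only the encode-then-decode direction.

```python
import base64

# Each entry pairs an encoder with its decoder.
# These are standard-library stand-ins, not the S-EFT models from the text.
CODERS = {
    "UTF8": (lambda s: s.encode("utf-8"), lambda b: b.decode("utf-8")),
    "BASE64": (base64.b64encode, base64.b64decode),
    "BASE32": (base64.b32encode, base64.b32decode),
    "BASE16": (base64.b16encode, base64.b16decode),
}

# Sample inputs: text for UTF8, raw bytes for the BASE* coders.
TEXT_SAMPLES = ["", "abc", "na\u00efve \u20ac"]
BYTE_SAMPLES = [b"", b"a", b"ab", b"abc", bytes(range(256))]
SAMPLES = {
    "UTF8": TEXT_SAMPLES,
    "BASE64": BYTE_SAMPLES,
    "BASE32": BYTE_SAMPLES,
    "BASE16": BYTE_SAMPLES,
}


def round_trips(name, sample):
    """Return True if decoding the encoding of `sample` gives `sample` back."""
    encode, decode = CODERS[name]
    return decode(encode(sample)) == sample


def check_all():
    """Map each coder name to whether every sample round-trips."""
    return {
        name: all(round_trips(name, s) for s in SAMPLES[name])
        for name in CODERS
    }


if __name__ == "__main__":
    for name, ok in check_all().items():
        print(name, "round-trips on all samples:", ok)
```

A failing sample here would be a counterexample to the equality, but passing samples prove nothing about the remaining inputs; that gap is what the equivalence procedure closes.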
2.7.1.2 Running time analysis
In this section we analyze the cost of running our composition and equivalence algorithms on larger S-EFTs. Most of the compositions performed in this section would take more than 1 hour if the programs were modeled using finite state transducers.
            Look-ahead      Analysis (ms)
            E     D         $E \circ D \stackrel{1}{=} I$     $D \circ E \stackrel{1}{=} I$
UTF8:       2     4         16                                24
BASE64:     3     5         53                                19
BASE32:     5     8         8                                 12
BASE16:     1     2         2                                 1
TABLE 2.1: Analysed coders with corresponding look-aheads and running times.
We consider consecutive compositions of encoders and decoders and analyze their correctness using 1-equality for S-EFTs. We define the following notation for consecutive composition of S-EFTs. Given an S-EFT $P$ we define $P^1 \equiv P$ and $P^{i+1} \equiv P \circ P^i$. We fix $E$/$D$ to be the UTF8 encoder/decoder respectively, and analyze the running times of the following checks.
Equivalence for Enc/Dec: $E^i \circ D^i \stackrel{1}{=} I$ for $1 \le i \le 9$, Figure 2.4(a);
Inequivalence for Enc/Dec: $E^{i+1} \circ D^i \stackrel{1}{\neq} I$ for $1 \le i \le 9$, Figure 2.4(a);
Equivalence for Dec/Enc: $D^i \circ E^i \stackrel{1}{=} I$ for $1 \le i \le 3$, Figure 2.4(b);
Inequivalence for Dec/Enc: $D^i \circ E^{i+1} \stackrel{1}{\neq} I$ for $1 \le i \le 3$, Figure 2.4(b).
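The Enc/Dec checks can be illustrated on a concrete input with the following sketch. It makes two assumptions. First, it reads $E^i \circ D^i$ as "apply $E$ $i$ times, then $D$ $i$ times", which matches the description of first encoding and then decoding. Second, the functions `E` and `D` are hypothetical stand-ins that treat each UTF-8 byte as a code point below 256 so that the encoder can be applied repeatedly; they are not the S-EFTs analyzed in the text. The helper names `power` and `then` are also introduced here for illustration.

```python
def power(p, i):
    """Return the function applying p exactly i times (P^1 = P, P^(i+1) = P then P^i)."""
    if i < 1:
        raise ValueError("i must be at least 1")

    def run(x):
        for _ in range(i):
            x = p(x)
        return x

    return run


def then(first, second):
    """Sequential composition: run `first`, feed its output to `second`."""
    return lambda x: second(first(x))


def E(s):
    """Stand-in UTF8 encoder: each output byte becomes one code point < 256."""
    return s.encode("utf-8").decode("latin-1")


def D(s):
    """Stand-in UTF8 decoder: inverse of E on E's image; raises otherwise."""
    return s.encode("latin-1").decode("utf-8")


SAMPLE = "na\u00efve \u20ac"  # contains non-ASCII symbols, so E changes it

if __name__ == "__main__":
    for i in range(1, 10):
        balanced = then(power(E, i), power(D, i))(SAMPLE)
        unbalanced = then(power(E, i + 1), power(D, i))(SAMPLE)
        # Balanced compositions return the input; one extra E does not.
        print(i, balanced == SAMPLE, unbalanced == SAMPLE)
```

On this sample the balanced composition returns the input for every `i`, while the composition with one extra encoder returns `E(SAMPLE)`, which differs from the input because the sample contains non-ASCII symbols. A single witness like this is enough to show inequivalence, whereas equivalence requires the symbolic check.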
The top of Figure 2.4 shows the running times for the case in which we first encode and then decode. The figure plots the following measures where $i$ varies between 1 and 9:
Composition: cost of computing $E^{i+1} \circ D^i$ (we omit the cost of computing $E^i \circ D^i$ since it is almost equivalent);
Equivalence: cost of checking $E^i \circ D^i \stackrel{1}{=} I$;
Inequivalence: cost of checking $E^{i+1} \circ D^i \stackrel{1}{\neq} I$.
In this case the algorithm scales well with the number of composed S-EFTs. It is worth noticing that for every $i$ we are analyzing the composition of $2i$ transducers in the case of equivalence and $2i+1$ transducers in the case of inequivalence.
The bottom of Figure 2.4 shows the running times for the case in which we first decode and then encode. The plot has the same meaning as before, but in this case the running time increases at a faster pace. This happens because both the state space and the look-ahead become larger than in the case in which we encode first. In the case in which we first encode, the number of states and transitions does not grow when $i$ increases. However, when we first decode, we already reach a large number of states (3645) and transitions (6791) for $i = 3$. Moreover, while the size of the look-ahead remains 2 in the case of
[Figure 2.4, panel (a): running time in seconds against the number of composed Encoder/Decoder (1 to 9); series: Composition, Equivalence, Inequivalence; annotation: 5 states, 22 rules, look-ahead 2. Panel (b): running time in seconds against the number of composed Decoder/Encoder (1 to 3); series: Composition, Equivalence, Inequivalence; annotation: 3645 states, 6791 rules, look-ahead 16.]
FIGURE 2.4: Running times for equivalence/inequivalence checking.
2.7.2 Deep packet inspection
Fast identification of network traffic patterns is of vital importance in network routing, firewall filtering, and intrusion detection. This task is known as “deep packet inspection” (DPI) [SEJK08]. Due to performance constraints, DPI must be performed in a single pass over the input. The simplest approach is to use DFAs and NFAs to identify patterns. These representations are either not succinct or not streamable. Extended Finite Automata (XFA) [SEJK08] make use of registers to reduce the state space while preserving determinism; deterministic S-EFAs can therefore be seen as a subclass of XFAs that is able to deal with finite look-ahead. Deterministic S-EFAs can also represent the alphabet symbolically, which enables a new level of succinctness. We believe that deterministic S-EFAs can help achieve further succinctness. To support this hypothesis we observe that the examples described by Smith et al. [SEJK08, Figures 2 and 3] can be represented as deterministic S-EFAs with few transitions. For example, the language ^/\ncmd[^\n]{200}$, which accepts all strings consisting of ‘\ncmd’ followed by a string s of 200 symbols, can be succinctly modeled as a deterministic S-EFA with one transition! Moreover, the ability to compile S-EFAs to Symbolic Automata with registers (Section 2.6.2.1) makes this model appealing for efficient deterministic left-to-right DPI.
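The one-transition claim can be pictured with the sketch below. It assumes the intended language is the literal prefix `\ncmd` followed by exactly 200 non-newline symbols, and it imitates a single look-ahead transition as one guard evaluated over a window of 204 symbols. The function names and the exact shape of the guard are hypothetical; the S-EFA formalism of the chapter is not reproduced here.

```python
import re

PREFIX = "\ncmd"      # literal prefix of the pattern
PAYLOAD_LEN = 200     # number of non-newline symbols after the prefix
LOOKAHEAD = len(PREFIX) + PAYLOAD_LEN  # 204 symbols read by the one transition

# Reference matcher: the same language written as a Python regular expression.
PATTERN = re.compile(r"\ncmd[^\n]{200}")


def accepts_regex(s):
    """Accept iff the whole string matches the reference pattern."""
    return PATTERN.fullmatch(s) is not None


def accepts_one_transition(s):
    """Imitate one look-ahead transition from the initial to the final state.

    The transition fires only if exactly LOOKAHEAD symbols are available and
    the guard holds on all of them at once.
    """
    if len(s) != LOOKAHEAD:
        return False
    head, payload = s[:len(PREFIX)], s[len(PREFIX):]
    return head == PREFIX and all(c != "\n" for c in payload)


if __name__ == "__main__":
    good = PREFIX + "a" * PAYLOAD_LEN
    bad = PREFIX + "a" * 100 + "\n" + "a" * 99
    print(accepts_regex(good), accepts_one_transition(good))
    print(accepts_regex(bad), accepts_one_transition(bad))
```

A classical DFA for the same language needs a chain of roughly 200 states to count the payload, which is the state blow-up that the look-ahead guard avoids in this sketch.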
2.7.3 Verification of list manipulating programs
In our previous work [DVLM14] we showed how S-FTs can be used to verify pre- and postconditions of list-manipulating programs. However, S-FTs can only model programs in which each node in the output list depends on at most one node in the input list. S-EFTs can be used to mitigate this problem as they can model sequential pattern matching. For example, the CAML guards
x1::x2::xs -> (x1+x2)::(f2 xs) and x1::x2::x3::xs -> (x1+x2+x3)::(f3 xs)
can be naturally expressed as S-EFT transitions. Let’s consider two functions f2 and f3, both of type list int → list int, that respectively contain the two guards defined above. These functions can be modeled as S-EFTs. Using the one-equality algorithm of Section 2.5 and the composition algorithm of Section 2.6.2.3 we were able to prove that $\forall l.\ f3(f2\ l) \stackrel{1}{=} f2(f3\ l)$ in less than 1 millisecond.
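The commutation property can be explored on concrete lists with the sketch below. The text only gives the recursive guards, so two points are assumptions: the empty list is mapped to the empty list, and each function is undefined (modeled here as `None`) on any list whose length does not fit the guard. The check also reads the 1-equality as "the two sides agree on every input where both are defined", which is an interpretation and not a definition taken from the text. The helper names `sum_blocks` and `agree_where_defined` are hypothetical.

```python
def sum_blocks(xs, k):
    """Sum consecutive blocks of k elements; None if xs is None or does not split evenly."""
    if xs is None or len(xs) % k != 0:
        return None
    return [sum(xs[i:i + k]) for i in range(0, len(xs), k)]


def f2(xs):
    """Model of the guard x1::x2::xs -> (x1+x2)::(f2 xs)."""
    return sum_blocks(xs, 2)


def f3(xs):
    """Model of the guard x1::x2::x3::xs -> (x1+x2+x3)::(f3 xs)."""
    return sum_blocks(xs, 3)


def agree_where_defined(xs):
    """True unless both f3(f2 xs) and f2(f3 xs) are defined and differ."""
    left, right = f3(f2(xs)), f2(f3(xs))
    return left is None or right is None or left == right


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5, 6]
    print(f3(f2(sample)), f2(f3(sample)))  # [21] [21]
    print(all(agree_where_defined(list(range(n))) for n in range(0, 25)))
```

In this model both compositions are defined exactly on lists whose length is a multiple of 6, where each side sums consecutive blocks of six elements. Testing a range of lengths only samples the property; the S-EFT procedure establishes it for all lists.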