

3.3 Frontend

3.3.1 Extending Datalog for Static Analysis

The Soufflé language was designed with large industrial static analyses in mind. While the language implements the basic features of Datalog, e.g., Datalog with stratified negation and aggregation (see [139, 90], [41] and [37]), the fact remains that Datalog was originally designed for database querying. Non-trivial static analyses, on the other hand, require additional features, both to express static analysis specifications more directly and to aid user productivity. The standard Datalog language is therefore extended to accommodate large-scale static analysis use cases [140]. In addition, the language design takes into account the fact that Soufflé, unlike most Datalog engines, synthesises an analyser and, as such, does not dynamically execute queries. Considerations such as a solid I/O system and static typing are therefore paramount to its design. The non-standard features of Soufflé are summarised below.

3.3.1.1 Type System

Types for logic programming are non-standard; however, for large Datalog specifications a rich type system is paramount. Large projects typically require several hundred relations (e.g., Doop), and tool support is needed to ensure that programmers do not bind attributes of the wrong type. For this reason, Soufflé provides a static type system. All attributes in a relation declaration must be typed, and these types are enforced at translation time. We avoid dynamic checks at runtime, as evaluation speed is paramount in static analyses.

Soufflé's type system is built on two primitive types, namely the symbol type and the number type. The symbol type is defined as the universe of all strings. Internally, a symbol is represented by an ordinal number, which can be accessed using the ord(<string>) construct. The number type is the universe of all numbers, i.e., signed integers with a width of 32 or 64 bits. Symbol and number types can be declared with the .symbol_type <name> and .number_type <name> constructs, respectively. In addition, Soufflé provides the means for defining user-defined types using the .type directive. Moreover, Soufflé allows the user to construct type hierarchies via union types using the following syntax:

    .type <ident> = <ident1> | <ident2> | ... | <identk>

For example, the code .type A = B | C creates a type A that is either a B or a C, but not both.
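The following is a minimal sketch of how union types might be used in practice. It assumes a recent Soufflé release in which primitive subtypes are written `.type T <: symbol` (the counterpart of the `.symbol_type` form described above); the type and relation names (Variable, Object, Entity, pointsTo, mentioned) are hypothetical and chosen only for illustration.

```souffle
// Two distinct symbol subtypes and their union (names are illustrative).
.type Variable <: symbol
.type Object <: symbol
.type Entity = Variable | Object

.decl pointsTo(v: Variable, o: Object)
.decl mentioned(e: Entity)

pointsTo("x", "obj1").

// Values of either member type may flow into an attribute of the union type.
mentioned(v) :- pointsTo(v, _).
mentioned(o) :- pointsTo(_, o).

// A rule such as the following is expected to be rejected at translation
// time, because o would have to be both an Object and a Variable:
//   .decl bad(v: Variable)
//   bad(o) :- pointsTo(_, o).

.output mentioned
```

The commented-out rule indicates the kind of attribute mismatch that the static type check is meant to catch before any evaluation takes place.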

3.3.1.2 Functors

The Soufflé language has the ability to perform computations in numerical domains. Functors are provided to support such computations, including arithmetic and bit-vector operations. For example, the rule P(a+1) :- G(a) is valid in Soufflé. Additionally, several string functors are supported, such as concatenation, length and substring. Functors significantly extend the Datalog semantics: they allow non-terminating Datalog programs, i.e., infinite relations can be defined. However, the underlying evaluation algorithm remains the same, since properties such as monotonicity still hold, i.e., we have infinite chains of increasing sets of tuples.

Functors have several practical benefits for static analysis, including arithmetic operations in networks (see Chapter 6) and context increments in points-to analysis [28], among many other applications. However, they must be used sparingly when the semi-naïve evaluation algorithm is used, due to their potential to cause non-termination. In Chapter 5 we provide an approach to solving Horn clauses with numerical constraints using model checking techniques, which is a potential candidate for rules that make heavy use of functors and numerical constraints.
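The points above can be sketched with a small program. This is an illustrative example only: the relations Count, Name and Info are hypothetical, and the functor names cat, strlen and substr are those of current Soufflé releases.

```souffle
.decl G(a: number)
.decl P(a: number)
G(1).
G(2).

// Arithmetic functor in the head, as in the rule P(a+1) :- G(a).
P(a + 1) :- G(a).

// Recursive use of a functor. Without the guard n < 10 the relation
// would be infinite and semi-naive evaluation would not terminate.
.decl Count(n: number)
Count(0).
Count(n + 1) :- Count(n), n < 10.

// String functors: concatenation, length and substring.
.decl Name(s: symbol)
.decl Info(s: symbol, len: number, prefix: symbol)
Name("souffle").
Info(cat(s, "_dl"), strlen(s), substr(s, 0, 3)) :- Name(s).

.output P
.output Count
.output Info
```

With the guard in place, Count contains the numbers 0 to 10. Removing the guard yields exactly the kind of infinite, monotonically increasing chain of tuple sets described above.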

3.3.1.3 Records

Relations are two-dimensional structures in Datalog. Large-scale problems often require more complex structures. The Soufflé language can construct objects that break out of the flat Datalog world. A data structure can be defined using records with the following syntax:

    .type <name> = [<name1>: <type1>, ..., <namek>: <typek>]

This construction can be used to form lists, trees and other data structures. Using logic rules, these data structures can be augmented, traversed, etc., similar to functional programming. In addition, records can be very powerful for implementing complex domains of computation, e.g., intervals.

    .type list = [val: number, tail: list]

    .type tree = [val: number, l: tree, r: tree]
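As a hedged illustration of records used for a domain of computation, the sketch below encodes intervals as a record with two number fields. The names Interval, Range, Candidate and Contains are hypothetical and do not come from the analyses discussed in this chapter.

```souffle
// Hypothetical interval domain encoded as a record of two numbers.
.type Interval = [lo: number, hi: number]

.decl Range(v: symbol, i: Interval)
Range("x", [0, 10]).
Range("y", [5, 20]).

.decl Candidate(n: number)
Candidate(3).
Candidate(7).
Candidate(15).

// Unpack the record in the rule body and test whether n lies in [lo, hi].
.decl Contains(v: symbol, n: number)
Contains(v, n) :- Range(v, [lo, hi]), Candidate(n), lo <= n, n <= hi.

.output Contains
```

For these facts, Contains relates x to 3 and 7, and y to 7 and 15. The record pattern [lo, hi] in the body shows how a rule takes a record apart, in the same way that the list and tree types are traversed.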

Data structures are implemented by providing a hidden reference type in a relation for each data structure type. This translates the elements of a data structure into a number. During evaluation, if an element does not exist, it is created on the fly. Semantically, data structures are relations containing references that grow monotonically, and structural equivalence is determined by identity, with new elements created on the fly.

    IntList                  L
    Ref   next   x           l
    1     0      10          1
    2     1      20          2
    3     2      30          3

    (the l and next columns hold references into the Ref column)

Figure 3.4: Relational representation of a list using records

We note, however, that Datalog lookup for data structures comes at a cost in performance, as an extra lookup is necessary. For example, the program below builds a list of numbers:

    .type IntList = [next: IntList, x: number]
    .decl L(l: IntList)

    L([nil, 10]).
    L([r1, x+10]) :- L(r1), r1 = [r2, x], x < 30.

    .decl Flatten(x: number)
    Flatten(x) :- L([_, x]).

Figure 3.4 illustrates the layout of a list in an in-memory relation. Here, IntList is a set of references (the Ref field), each of which can be used as a value in the next field, which is itself of type IntList. In this way we can build and traverse the list data structure.

As we can see in the extended example in Figure 3.5, records have practical benefits such as defining traces, use in context-sensitive points-to analysis [28], and even the potential to define lattices (see Chapter 7).

3.3.1.4 Components

Large logic programs often have little structure. Such programs consist of unstructured sets of rules. For large-scale static analysis specifications this creates serious software engineering challenges. To rectify this, Soufflé provides support for components. Components provide support for encapsulation, i.e., separation of concerns, replication of code and adaptation of code. Components can be seen as a form of meta semantics for Datalog. Similar to C++ templates, the templatised Datalog code is expanded at translation time, generating new instantiations of the templatised code with the input values substituted. Components are first defined with the following syntax:

    .comp <name>[<params,...>] [: <super-name1>[<params,...>], ..., <super-namek>[<params,...>]] { <code> }

To use a component, it must first be instantiated:

    .init <name> = <name>[<params,...>]

Each component instantiation has its own name, which creates a namespace; type and relation definitions inside the component inherit this namespace. Note that component definitions may themselves contain embedded component definitions. Similar to classes in C++, this results in an embedded namespace. The translation of components to standard Datalog is shown in the following example:

Input program with components:

    .symbol_type s
    .decl A(x:s, y:s)
    .input A
    .comp myC {
        .decl B(x:s, y:s)
        .output B
        B(x,y) :- A(x,y).
    }
    .comp myCC : myC {
        B(x,z) :- A(x,y), B(y,z).
    }
    .init c = myCC

Translation to standard Datalog:

    // outer scope: no name space
    .decl A(x:s, y:s)
    .input A
    // name scoping
    // B is declared inside myC / myCC
    .decl c.B(x:s, y:s)
    .output c.B
    c.B(x,y) :- A(x,y).
    c.B(x,z) :- A(x,y), c.B(y,z).

Here, two components are defined, where one component inherits from the other. Component myCC adds an additional rule to myC. We instantiate myCC and label it as c. Soufflé then instantiates the rules from both components using c as a prefix.
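Components may also take type parameters, which is what makes the comparison with C++ templates apt. The following is a minimal sketch assuming the parameter syntax of current Soufflé releases; the component Reach, its relations, and the types Node1 and Node2 are hypothetical names introduced only for this illustration.

```souffle
.type Node1 <: symbol
.type Node2 <: symbol

// Hypothetical parametrised component: reachability over any node type N.
.comp Reach<N> {
    .decl edge(x: N, y: N)
    .decl path(x: N, y: N)
    path(x, y) :- edge(x, y).
    path(x, z) :- path(x, y), edge(y, z).
}

// Two independent instantiations, each with its own namespace.
.init r1 = Reach<Node1>
.init r2 = Reach<Node2>

r1.edge("a", "b").
r1.edge("b", "c").
r2.edge("u", "v").

.output r1.path
.output r2.path
```

Each instantiation yields its own copies of edge and path, named r1.edge, r1.path, r2.edge and r2.path, with N replaced by the supplied type. The facts of one instance therefore never mix with those of the other.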

Example 9. In Figure 3.5, we extend the Datalog static analysis of Example 1.1 with an extended static analysis that uses Soufflé language extensions. While the static analysis in Example 1.1 gives us a list of insecure nodes, we may desire more information, such as a program trace. Using standard Datalog, this can be awkward to define. A user would be required to define several new relations and rules, many of which are redundant. However, using Soufflé's extensions we can use components to extend the analysis and encode traces into a list data structure.

We first wrap the analysis of the motivating example in a Base component. This allows the analysis to be instantiated for different types of data. The analysis is instantiated in the second file with the line .init A1 = Base<Node1>, where the type Node1 is given as an argument. In the analysis of analysis2.dl, we instantiate the Derived analysis as an A2 object. The Derived component inherits from the Base component, meaning that all rules are accessible to A2 as they are in A1. However, the Derived component defines an additional analysis. This analysis keeps track of the trace of all insecure nodes (in case several exist) and of the number of edges traversed, up to a user-specified value K. To do this, we define a constructor in the line .type Tr = [v : N, tail: Tr]. Note that this is a recursive, list-like definition. Also note that, when we instantiate A2, we instantiate it with a superset type Node, which is a union of the types Node1 and Node2, and a value K. The derived analysis contains two rules. The first rule represents the base case: here we add s as the head of the list and keep the tail as nil (the empty list), and we initialise the edge count to 0. The next rule increments the edge count by 1 and adds a node to the head of the list, if it is not a protected node.