Figure 3.3 illustrates the architecture of MulVAL and shows the data sources of the analysis database. MulVAL scanners, running on each individual host, provide machine configuration information. Smart Firewall [10] provides network configuration in terms of HACL tuples. LDAP provides principal binding information. The security policy is defined by the system administrator, who also needs to define data binding as part of the policy. Data sources on the left come from third-party bug-reporting agencies. They provide formal specifications of software bugs, in the form of OVAL definitions and entries from the NVD database.
Figure 3.3: Complete MulVAL framework
[Figure components: OVAL definition, ICAT database, MulVAL Scanner on Host 1 through Host N, Smart Firewall, LDAP database, DB query, analysis database, interaction rules, security policy, Prolog engine, violation and attack trace.]
Chapter 4
Basic analysis
The reasoning model and the formal description of the configuration discussed in the previous two chapters provide a foundation for conducting various kinds of security analysis. One can view the interaction rules in the reasoning model as formalized expert knowledge on security interaction. The process of MulVAL analysis is in essence applying this security knowledge to configuration data and deriving the properties of a network. This is a logical deduction process, and for the particular logical language used in MulVAL, the problem is reduced to Datalog evaluation. This chapter briefly reviews the evaluation strategies for Datalog and discusses two basic analyses: attack simulation and policy check.
4.1 Datalog evaluation and XSB
Datalog has two basic evaluation strategies: bottom-up evaluation and top-down evaluation [11]. In bottom-up evaluation, the rules in the Datalog program are applied to the input facts in the EDB to derive new facts in the IDB, until no new facts can be derived. In top-down evaluation, given a goal, the rules are applied backward to find subgoals that must be true to satisfy the original goal, until all subgoals hit the input facts. Bottom-up evaluation has the advantage of computing each fact only once. For a Datalog program, there is at most a polynomial number of facts that can be derived. If each fact is computed only once, the evaluation is guaranteed to be polynomial. Top-down evaluation has the advantage that only facts related to the query goal are computed. However, a naive implementation of top-down evaluation may compute a fact multiple times, leading to inefficiencies.
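The bottom-up strategy can be made concrete with a small sketch. The Python code below is only an illustration, not MulVAL or XSB code. It assumes a two-rule transitive-closure program (reachable derived from edge), and the function name `bottom_up` and the sample facts are chosen here for the example. The loop applies both rules to the facts known so far and stops when a round derives nothing new.

```python
# Illustrative sketch of naive bottom-up (fixpoint) Datalog evaluation.
# Program assumed for the example:
#   reachable(V1, V2) :- edge(V1, V2).
#   reachable(V1, V3) :- reachable(V1, V2), reachable(V2, V3).

EDGES = {("node1", "node2"), ("node1", "node3"), ("node2", "node3")}


def bottom_up(edges):
    """Return (set of reachable facts, number of rounds until the fixpoint)."""
    reachable = set()
    rounds = 0
    while True:
        rounds += 1
        # Rule 1: every edge fact yields a reachable fact.
        derived = set(edges)
        # Rule 2: join reachable with itself on the shared middle node.
        derived |= {
            (a, d)
            for (a, b) in reachable
            for (c, d) in reachable
            if b == c
        }
        new_facts = derived - reachable
        if not new_facts:
            # No rule produced anything new: the fixpoint has been reached.
            return reachable, rounds
        reachable |= new_facts


if __name__ == "__main__":
    facts, rounds = bottom_up(EDGES)
    print(sorted(facts), rounds)
```

Because the set of facts only grows and is bounded by the number of node pairs, the loop terminates even when the edge relation contains cycles. The sketch recomputes old facts in every round, so it shows the termination argument rather than the "each fact only once" property, which requires a semi-naive refinement.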
A standard Prolog system operates in a top-down manner: each rule is tried in order, and so is each subgoal of a rule. It does not remember which facts have already been computed, so a fact may be computed multiple times if it is needed at different places in the depth-first search. This can hurt performance. A more severe problem in such Prolog engines is that cycles in Datalog rules may lead to nonterminating execution, and the order of the clauses, as well as that of the subgoals within a clause, affects the result of execution. For example, the following is a Datalog specification for computing transitive closure.
reachable(V1, V3) :- reachable(V1, V2), reachable(V2, V3).
reachable(V1, V2) :- edge(V1, V2).
Suppose the facts about edge are:
edge(node1, node2).
edge(node1, node3).
edge(node2, node3).
Executing the following query in a standard Prolog system will cause an infinite loop without outputting a single result.
| ?- reachable(node1,V).
If we switch the order of the two rules for reachable, the query will output three results (correctly) before going into an infinite loop.
| ?- reachable(node1,V).
V = node2 ? ;
V = node3 ? ;
V = node3 ? ;
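The effect of clause order can be reproduced outside Prolog with a small depth-first simulation. The Python sketch below is an illustration only: it hard-codes the two reachable clauses, always calls reachable with its first argument bound (as this query does), and replaces the infinite loop with a hypothetical depth budget so that the script halts. The names `solve`, `run_query`, and `DepthExceeded` are invented for the example.

```python
# Illustrative simulation of Prolog-style depth-first, left-to-right evaluation
# of reachable(X, V) with X bound. A depth budget stands in for "loops forever".

EDGES = [("node1", "node2"), ("node1", "node3"), ("node2", "node3")]


class DepthExceeded(Exception):
    """Raised when the search descends past the budget (a stand-in for nontermination)."""


def solve(x, base_clause_first, depth):
    """Yield answers V for reachable(x, V) in the order Prolog would find them."""
    if depth == 0:
        raise DepthExceeded

    def base_clause():
        # reachable(V1, V2) :- edge(V1, V2).
        for a, b in EDGES:
            if a == x:
                yield b

    def recursive_clause():
        # reachable(V1, V3) :- reachable(V1, V2), reachable(V2, V3).
        for mid in solve(x, base_clause_first, depth - 1):
            for target in solve(mid, base_clause_first, depth - 1):
                yield target

    if base_clause_first:
        clauses = [base_clause, recursive_clause]
    else:
        clauses = [recursive_clause, base_clause]
    for clause in clauses:
        yield from clause()


def run_query(base_clause_first, depth=50):
    """Return (answers found, finished) for the query reachable(node1, V)."""
    answers = []
    try:
        for value in solve("node1", base_clause_first, depth):
            answers.append(value)
    except DepthExceeded:
        return answers, False
    return answers, True


if __name__ == "__main__":
    print(run_query(base_clause_first=False))
    print(run_query(base_clause_first=True))
```

With the recursive clause first, the simulation exhausts its budget before producing any answer. With the base clause first, it produces node2, node3, and node3 again before exhausting the budget, which matches the session shown above.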
Cycles are common when it comes to modeling computer attacks. For example, an attacker can modify a user's files if he can execute arbitrary code as the user. But it is possible that he can execute arbitrary code as the user precisely because he modified some executables and installed a Trojan-horse program. In particular, the following two interaction rules may cause cycles in derivation.
accessFile(P, H, Access, Path) :-
    execCode(P, H, User),
    localFileProtection(H, User, Access, Path).
execCode(Attacker, H, User) :-
    malicious(Attacker),
    accessFile(Attacker, H, write, Path),
    not setuidProgram(H, Path),
    localFileProtection(H, User, exec, Path).
The presence of cycles in interaction rules is completely legitimate in terms of the semantics of security interaction. Requiring interaction rules to be cycle-free is not only too restrictive, but also extremely hard, if not impossible. Unfortunately, these cycles will introduce infinite loops in a standard Prolog system, which views a Datalog program operationally rather than declaratively.
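Read declaratively, the same two rules are harmless. The Python sketch below evaluates them as a fixpoint over a handful of input facts. Everything concrete in it is hypothetical and invented for illustration: the host `h1`, the users `alice` and `bob`, the file paths, and the seed fact that the attacker already runs code as `alice`. It is not MulVAL code, and it covers only these two rules.

```python
# Illustrative fixpoint evaluation of the two mutually recursive interaction rules.
# All facts below are hypothetical sample data.

MALICIOUS = {"attacker"}
SETUID_PROGRAM = set()  # (host, path) pairs; empty in this example
LOCAL_FILE_PROTECTION = {  # (host, user, access, path)
    ("h1", "alice", "write", "/home/alice/notes.txt"),
    ("h1", "alice", "write", "/shared/run.sh"),
    ("h1", "bob", "exec", "/shared/run.sh"),
    ("h1", "bob", "read", "/home/bob/mail"),
}
# Assumed to have been derived by some other rule, e.g. a remote exploit.
SEED_EXEC_CODE = {("attacker", "h1", "alice")}


def derive(seed_exec_code):
    """Apply both rules repeatedly until neither produces a new fact."""
    exec_code = set(seed_exec_code)
    access_file = set()
    while True:
        # accessFile(P, H, Access, Path) :-
        #     execCode(P, H, User), localFileProtection(H, User, Access, Path).
        new_access = set()
        for (p, h, user) in exec_code:
            for (h2, user2, access, path) in LOCAL_FILE_PROTECTION:
                if h2 == h and user2 == user:
                    new_access.add((p, h, access, path))

        # execCode(Attacker, H, User) :-
        #     malicious(Attacker), accessFile(Attacker, H, write, Path),
        #     not setuidProgram(H, Path), localFileProtection(H, User, exec, Path).
        new_exec = set()
        for (p, h, access, path) in access_file:
            if p in MALICIOUS and access == "write" and (h, path) not in SETUID_PROGRAM:
                for (h2, user, access2, path2) in LOCAL_FILE_PROTECTION:
                    if h2 == h and access2 == "exec" and path2 == path:
                        new_exec.add((p, h, user))

        if new_access <= access_file and new_exec <= exec_code:
            return exec_code, access_file
        access_file |= new_access
        exec_code |= new_exec


if __name__ == "__main__":
    exec_code, access_file = derive(SEED_EXEC_CODE)
    print(sorted(exec_code))
    print(sorted(access_file))
```

In this sample, code execution as `alice` gives write access to a script that `bob` executes, which in turn gives code execution as `bob`. The derivation passes through the execCode and accessFile cycle and still stops, because each round can only add facts from a finite set.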
XSB [41] is a system that computes the well-founded semantics of logic programs [20]. XSB supports tabled execution, which is a kind of memoization technique. Put simply, the computation of a tabled predicate is conducted only once and the result is stored in a table for reuse. The effects of tabling are two-fold. First, it essentially implements a dynamic-programming algorithm, so that facts about a tabled predicate are not recomputed during the execution of a logic program. Second, if a tabled predicate is involved in a cycle during evaluation, XSB will detect it and not enter a loop. As a result, cycles in Datalog programs do not introduce nonterminating computation, and the order of clauses does not affect the result of execution. This advantage makes XSB an ideal candidate for the logic engine in MulVAL.
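The cycle-handling effect of tabling can be sketched in a few lines. The Python code below is a deliberately simplified illustration and not XSB's actual resolution algorithm. It keeps one answer table per call reachable(x, _); when a call that is already in progress is encountered again, it reads the answers found so far from the table instead of recursing. The function name `tabled_reachable` is hypothetical. Unlike a real tabling engine, the sketch may re-evaluate a call that is no longer in progress, so it shows the termination behavior rather than the compute-once guarantee.

```python
# Simplified illustration of tabled top-down evaluation for
#   reachable(V1, V2) :- edge(V1, V2).
#   reachable(V1, V3) :- reachable(V1, V2), reachable(V2, V3).
# Not XSB's algorithm: a minimal sketch of answer tables plus cycle detection.

EDGES = {("node1", "node2"), ("node1", "node3"), ("node2", "node3")}


def tabled_reachable(edges, start):
    """Return the set of V such that reachable(start, V) holds."""
    table = {}  # call argument x -> answers found so far for reachable(x, _)

    def evaluate(x, active):
        if x in active:
            # The same call is already in progress: reuse its table, do not loop.
            return table[x]
        table.setdefault(x, set())
        active = active | {x}
        while True:
            # Base clause: direct edges out of x.
            answers = {b for (a, b) in edges if a == x}
            # Recursive clause: the first subgoal repeats the current call, so its
            # answers come from the table; the second subgoal is a new call.
            for mid in list(table[x]):
                answers |= evaluate(mid, active)
            if answers <= table[x]:
                return table[x]  # nothing new: this call has reached a fixpoint
            table[x] |= answers

    return set(evaluate(start, frozenset()))


if __name__ == "__main__":
    print(sorted(tabled_reachable(EDGES, "node1")))
    print(sorted(tabled_reachable({("a", "b"), ("b", "a")}, "a")))
```

Both calls return regardless of the order in which the clauses are considered, including on the cyclic two-node graph, where an untabled depth-first search would not terminate.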
4.1.1 Properties of Datalog evaluation in XSB
Soundness and completeness. Soundness and completeness state that 1) any result of the analysis is a logical consequence of the MulVAL interaction rules and the input facts, and 2) the analysis is able to compute all such logical consequences. There are different notions of semantics for Datalog that formally define what logical consequence means [16, 20, 11]. These semantics coincide for Datalog programs with stratified negation, which is the only kind of negation used in MulVAL. The XSB system can efficiently compute the well-founded semantics [20], which captures the intuitive bottom-up derivation semantics of Datalog programs. Since MulVAL uses XSB as its logic engine, the soundness and completeness of XSB in computing the well-founded semantics ensures that the analysis in MulVAL is both sound and complete.
Complexity. The complexity of MulVAL analysis is determined by the data complexity of the Datalog interaction rules. Data complexity measures the evaluation time of a Datalog program as a function of the size of the data input, with the Datalog program held fixed. For a pure Datalog program, there is only a constant number of predicates, and the maximum arity of the predicates is also constant. Since an argument of a predicate can only come from an input domain whose size is proportional to the size of the data input, there is only a polynomial number of facts that can possibly be derived by the Datalog program. So the data complexity of pure Datalog is at most polynomial. In fact, Datalog is data complete for P [14]. The introduction of stratified negation does not affect the polynomial complexity of Datalog [14].
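The counting argument can be written out explicitly. The symbols below are notation introduced here for illustration and do not appear in the original text: n is the size of the input domain, k is the number of predicates in the program, a_i is the arity of the i-th predicate, and a is the maximum arity.

```latex
\[
  \#\{\text{derivable facts}\}
  \;\le\; \sum_{i=1}^{k} n^{a_i}
  \;\le\; k \cdot n^{a},
  \qquad a = \max_{1 \le i \le k} a_i .
\]
```

Since k and a are fixed by the program, the bound is polynomial in n. For the transitive-closure example in Section 4.1, the domain has three constants (node1, node2, node3) and reachable has arity 2, so at most 3^2 = 9 distinct reachable facts can ever be derived.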
In XSB, if every predicate is tabled, then every fact will be computed only once and the execution time of a Datalog program is guaranteed to be polynomial. However, table manipulation also introduces overhead, which may counteract the benefit brought by dynamic programming. Currently we table only enough predicates to avoid infinite loops in programs. The precise complexity of the MulVAL reasoning process, however, depends on the interaction rules and input data. Section 6.2 shows some experimental results that illustrate the speed of the reasoning engine on large synthesized inputs.