Preprocessing tools - Implementation details


1.2 Preprocessing tools

A number of tools were developed in parallel with Kparam to preprocess its inputs. They are available in the tool folder of the archive previously mentioned. Equationalizer accepts flat ground cnf TPTP inputs containing booleans and returns fully equational formulæ. To do so, it uses t as a constant standing for 'true' and converts the propositional literals p and ¬p into p ≃ t and p ≄ t respectively.

Remark 8.3 The constraint t < x for any x ∈ Σ0 \ {t}, which makes it possible to simulate resolution, can be artificially enforced in the program by adding the tautological clause t ≃ t at the beginning of the file.
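The conversion performed by Equationalizer can be sketched as follows. This is only an illustrative Python sketch, not the tool's actual code: it assumes one clause per line, propositional atoms written as bare constants, negation written with `~`, and no extra parentheses around literals. The function names and the clause name `t_first` are hypothetical.

```python
import re

TRUE_CONST = "t"  # assumed name of the constant standing for 'true'

def equationalize_literal(lit: str) -> str:
    """Turn a propositional literal into an equational one; keep equations as-is."""
    lit = lit.strip()
    if "=" in lit:                      # covers both '=' and '!='
        return lit
    if lit.startswith("~"):             # negated atom: ~p becomes p != t
        return f"{lit[1:].strip()}!={TRUE_CONST}"
    return f"{lit}={TRUE_CONST}"        # positive atom: p becomes p = t

def equationalize_clause(line: str) -> str:
    """Rewrite one flat ground 'cnf(name,role,l1|l2|...).' line."""
    m = re.fullmatch(r"cnf\(([^,]+),([^,]+),(.*)\)\.", line.strip())
    if m is None:
        raise ValueError(f"not a cnf line: {line!r}")
    name, role, body = m.groups()
    lits = [equationalize_literal(l) for l in body.split("|")]
    return f"cnf({name},{role},{'|'.join(lits)})."

def equationalize_file(lines, t_first=False):
    """Convert all clauses; optionally prepend the tautology t = t (Remark 8.3)."""
    out = [equationalize_clause(l) for l in lines if l.strip()]
    if t_first:
        out.insert(0, f"cnf(t_first,plain,{TRUE_CONST}={TRUE_CONST}).")
    return out
```

Prepending the tautology makes t the first constant to appear in the file, which is what matters for an order based on the order of appearance.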

Flattener_for_kparam is a tool that flattens non-flat TPTP cnf inputs by replacing non-flat terms with fresh constants and instantiating the substitutivity axiom when necessary. For example, if c and d are introduced to replace f(a) and f(b), then the clause a ≄ b ∨ c ≃ d must also be added.
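The flattening idea can be illustrated with a short Python sketch. It is not the code of Flattener_for_kparam: the class, the fresh-constant prefix `k` and the term parser are hypothetical, fresh names are assumed not to clash with constants of the input, and the substitutivity axiom is instantiated for every pair of replaced terms with the same function symbol, which may be more than the tool considers necessary.

```python
import itertools

def parse_term(s):
    """Parse 'f(a,g(b))' into ('f', [('a', []), ('g', [('b', [])])])."""
    s = s.replace(" ", "")
    def term(i):
        j = i
        while j < len(s) and s[j] not in "(),":
            j += 1
        head, args = s[i:j], []
        if j < len(s) and s[j] == "(":
            j += 1
            while True:
                arg, j = term(j)
                args.append(arg)
                if s[j] == ",":
                    j += 1
                else:               # closing parenthesis of this argument list
                    j += 1
                    break
        return (head, args), j
    parsed, end = term(0)
    if end != len(s):
        raise ValueError(f"trailing characters in term: {s!r}")
    return parsed

class Flattener:
    def __init__(self, prefix="k"):
        self.defs = {}                      # (symbol, argument constants) -> fresh constant
        self.counter = itertools.count()
        self.prefix = prefix

    def flatten(self, term):
        """Return the constant standing for `term`, naming subterms bottom-up."""
        head, args = term
        if not args:
            return head
        key = (head, tuple(self.flatten(a) for a in args))
        if key not in self.defs:
            self.defs[key] = f"{self.prefix}{next(self.counter)}"
        return self.defs[key]

    def substitutivity_clauses(self):
        """Instances of substitutivity for replaced terms sharing a symbol."""
        clauses = []
        for ((f, xs), c), ((g, ys), d) in itertools.combinations(self.defs.items(), 2):
            if f == g and len(xs) == len(ys):
                lits = [f"{x}!={y}" for x, y in zip(xs, ys) if x != y]
                clauses.append("|".join(lits + [f"{c}={d}"]))
        return clauses
```

With f(a) and f(b) replaced by k0 and k1, `substitutivity_clauses()` yields the single clause `a!=b|k0=k1`, which mirrors the clause a ≄ b ∨ c ≃ d of the example above.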

2 cSP

2.1 Implementation details

LogTk. The main difference between cSP and cSP_flat is the use of the LogTk library [16]. In cSP its main use is to manage everything related to terms. Given that cSP handles non-flat terms, the simple order implemented in Kparam is no longer usable because it is not necessarily a reduction order in this case, as the following example shows.

Example 8.4 The order of appearance in the following input file is a ≺ b ≺ c ≺ d ≺ f(b) ≺ f(a).

cnf(cl1,plain,a=b|c=d).
cnf(cl2,plain,f(b)!=f(a)).

In a reduction order, a ≺ b necessarily implies f(a) ≺ f(b), because a reduction order is a rewrite order (see page 29). Here instead a ≺ b and f(b) ≺ f(a), thus this order is not a reduction order. ♣


Fortunately, LogTk implements several reduction orders, e.g. the KBO (see Example ii.9) and the RPO (Recursive Path Ordering [2]) in its multiset and lexicographic variants. Among those, cSP relies (arbitrarily) on the KBO.

Normalization. For the normalization of clauses, a Union-Find data structure is no longer sufficient.

Example 8.5 In the following input, the equivalence classes associated with cl are {a, b}, {f(a), f(b)}.

cnf(cl,plain,a!=b|f(a)=f(b)).

In a Union-Find data structure, the propagation of the identity a ≡cl b to f(a) ≡cl f(b) is not automatically computed. ♣

Instead, a congruence closure algorithm [51], also implemented in LogTk, is used. The advantage of such an algorithm is that the propagation of unification from subterms to superterms is done automatically. Going back to the previous example where a and b are unified, using a congruence closure algorithm ensures that f(a) and f(b) are put in the same equivalence class as well. Any other such terms, e.g. g(c, a) and g(c, b), are also unified.
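The difference with a plain Union-Find structure can be seen on a deliberately naive Python sketch of congruence closure over ground terms. This is not LogTk's implementation (LogTk is a separate library with its own interface); the class below is hypothetical and recomputes the closure by a quadratic fixpoint, which is enough to illustrate the propagation. Terms are nested tuples: the constant a is `("a",)` and f(a) is `("f", ("a",))`.

```python
class CongruenceClosure:
    """Union-Find over ground terms, closed under f(s1..sn) = f(t1..tn) when si = ti."""

    def __init__(self):
        self.parent = {}

    def add(self, term):
        """Register a term and all of its subterms."""
        if term not in self.parent:
            self.parent[term] = term
            for arg in term[1:]:
                self.add(arg)

    def find(self, term):
        """Representative of the class of `term` (with path halving)."""
        while self.parent[term] != term:
            self.parent[term] = self.parent[self.parent[term]]
            term = self.parent[term]
        return term

    def merge(self, s, t):
        """Assert s = t and propagate the identity to superterms."""
        self.add(s)
        self.add(t)
        self.parent[self.find(s)] = self.find(t)
        self._propagate()

    def _propagate(self):
        # Naive fixpoint: merge any two registered terms with the same symbol
        # whose arguments are pairwise in the same class.
        changed = True
        while changed:
            changed = False
            terms = list(self.parent)
            for i, s in enumerate(terms):
                for t in terms[i + 1:]:
                    if (len(s) > 1 and s[0] == t[0] and len(s) == len(t)
                            and self.find(s) != self.find(t)
                            and all(self.find(x) == self.find(y)
                                    for x, y in zip(s[1:], t[1:]))):
                        self.parent[self.find(s)] = self.find(t)
                        changed = True

    def equal(self, s, t):
        """Check whether s and t are in the same class."""
        self.add(s)
        self.add(t)
        self._propagate()
        return self.find(s) == self.find(t)
```

After `merge(a, b)`, the query `equal(f(a), f(b))` succeeds, whereas a Union-Find structure alone would keep f(a) and f(b) in separate classes.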

Options. The options of cSP are:

— -max-size, -max-neg and -max-depth, three filters respectively limiting the number of literals appearing in the prime implicates, their number of negative literals and the depth of the terms.

— -cov, another filter accepting only prime implicates that entail one of the clauses of the input formula (see Example 3.31 and preceding paragraph).

— -odiff, an option that, when a timeout is reached, compares the set of processed implicates to the original input formula and counts how many new implicates (not in the original input) have been processed. This option also sets the function that selects clauses in the processed set (see description below) to take into account only the clause part of constrained clauses.

Unlike those of Kparam, the options of cSP can be combined.
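The combined effect of the -max-size, -max-neg and -max-depth filters can be illustrated by a small Python predicate. This is a post-hoc sketch under stated assumptions: the function names and the clause representation (a list of `(lhs, rhs, is_positive)` triples with terms as strings) are hypothetical, the -cov filter is not covered, and how cSP applies these limits during the search is not modelled. The clauses used below are those of Example 8.6.

```python
def term_depth(term: str) -> int:
    """Depth of a term given as a string: constants have depth 0, f(a) has depth 1."""
    depth = best = 0
    for ch in term:
        if ch == "(":
            depth += 1
            best = max(best, depth)
        elif ch == ")":
            depth -= 1
    return best

def passes_filters(clause, max_size=None, max_neg=None, max_depth=None):
    """clause: list of (lhs, rhs, is_positive) literals; None disables a filter."""
    if max_size is not None and len(clause) > max_size:
        return False
    if max_neg is not None and sum(1 for _, _, pos in clause if not pos) > max_neg:
        return False
    if max_depth is not None and any(
            term_depth(t) > max_depth for lhs, rhs, _ in clause for t in (lhs, rhs)):
        return False
    return True

# The three prime implicates of Example 8.6.
c1 = [("f(a)", "b", True)]
c2 = [("a", "c", True), ("b", "d", True), ("g(e)", "d", True)]
c3 = [("f(g(f(a)))", "d", True), ("g(e)", "b", True)]
kept = [c for c in (c1, c2, c3) if passes_filters(c, max_size=2, max_depth=1)]
```

Here `kept` contains only `c1`: `c2` has three literals and `c3` contains a term of depth 3.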

Example 8.6 A call to cSP with the options -max-depth 1 and -max-size 2 returns only the prime implicates of size 1 or 2 made of terms of depth 0 or 1. If f(a) ≃ b, a ≃ c ∨ b ≃ d ∨ g(e) ≃ d and f(g(f(a))) ≃ d ∨ g(e) ≃ b are prime implicates of a formula, only the first one is computed. ♣

Clause selection. The order in which the clauses are extracted from the waiting set is controlled by the parameter cs_comp_func. It is easily modifiable in the code and three orders are considered:

— a sort based only on the size of the clausal part of constrained clauses,

— a lexicographic sort based on the size of both parts of the clauses, starting with the clausal part,

— the same as the previous one but starting with the constraints.

All three are used in increasing order, i.e. the smallest clause is the one selected. By default, the third one is used so as to find the simplest prime implicates first.

Assertable terms. To ensure the termination of the calculus, cSP generates only implicates built on a finite number of terms that are introduced by the Assertion rules (see Chapter 3, Section 2). By default, these assertable terms are the terms already occurring in the input formula. However, it is possible to specify additional assertable terms to cSP. To do so, a TPTP input file with the extension '.conf' that contains a cnf formula containing all assertable terms can be provided to cSP along with the input formula.
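The three clause-selection orders described earlier in this section can be sketched as sort keys over a priority queue. The Python below is only an illustration, not the code behind cs_comp_func: the names are hypothetical, a constrained clause is modelled as a pair of literal lists, and the second order is assumed to start with the clausal part, by contrast with the third one, which starts with the constraints.

```python
import heapq
import itertools
from collections import namedtuple

# Hypothetical model of a constrained clause: two lists of literals.
ConstrainedClause = namedtuple("ConstrainedClause", ["constraints", "clause"])

def clause_size_only(c):
    """First order: size of the clausal part only."""
    return (len(c.clause),)

def clause_then_constraints(c):
    """Second order (assumed): lexicographic, clausal part first."""
    return (len(c.clause), len(c.constraints))

def constraints_then_clause(c):
    """Third order: lexicographic, constraints first."""
    return (len(c.constraints), len(c.clause))

class WaitingSet:
    """Priority queue returning the smallest clause for the chosen order."""

    def __init__(self, key=constraints_then_clause):
        self._key = key
        self._heap = []
        self._tick = itertools.count()   # tie-breaker: insertion order

    def push(self, c):
        heapq.heappush(self._heap, (self._key(c), next(self._tick), c))

    def pop(self):
        return heapq.heappop(self._heap)[2]
```

For a clause with three constraint literals and one clause literal and another with one constraint literal and two clause literals, the first two orders select the former first and the third order selects the latter.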

2.2 Preprocessing tools

A flattener has been developed in cSP but is not available as a standalone tool (the Kparam flattener can be used in this case). Instead, it is used in full_array_preprocessing, a script that converts TPTP cnf files about the array theory into TPTP (untyped) ground cnf files, based on the method described in [11] to generate equisatisfiable problems free of the axioms of the theory of arrays with extensionality. The conversion proceeds in three steps. First, Array_preprocessing with option -arr1 flattens the input and decomposes it into an operational and a definitional part. The definitional part contains all the clauses of the form store(a, i, e) ≃ b plus the axioms of the array theory. The clauses allowed in the operational part are strictly flat clauses and clauses of the form select(a, i) ≃ e. Then the E theorem prover is used to saturate the definitional part. Finally, Array_preprocessing performs the necessary instantiations of the axioms of the array theory.

Example 8.7 (Example 66 in [11]) The following input:

cnf(a1,axiom,select(store(A,I,E),I)=E).
cnf(a2,axiom,select(store(A,I,E),J)=select(A,J)|I=J).
cnf(c1,plain,store(b,i,d)=a).
cnf(c2,plain,select(b,i2)=e).
cnf(c3,plain,select(c,i2)=e2).
cnf(c4,plain,a=c).
cnf(c5,plain,i!=i2).
cnf(c6,plain,e!=e2).

is first split in two. The operational part is:

cnf(c2,plain,select(b,i2)=e).
cnf(c3,plain,select(c,i2)=e2).
cnf(c4,plain,a=c).
cnf(c5,plain,i!=i2).
cnf(c6,plain,e!=e2).


Since the ground clauses of the original formula are flat, no flattening was necessary to generate the operational part. The definitional part is:

cnf(a1,plain,select(store(X0,X1,X2),X1)=X2).
cnf(a2,plain,select(store(X0,X1,X2),X3)=select(X0,X3)|X1=X3).
cnf(c1,plain,store(b,i,d)=a).

The renaming of the variables (in the definitional part) is an internal mechanism of LogTk. Then E saturates the definitional part:

cnf(i_0_2,plain,(select(store(X1,X2,X3),X2)=X3)).
cnf(i_0_1,plain,(select(store(X1,X2,X3),X4)=select(X1,X4)|X2=X4)).
cnf(i_0_3,plain,(store(b,i,d)=a)).
cnf(i_0_4,plain,(select(a,i)=d)).
cnf(i_0_5,plain,(select(a,X1)=select(b,X1)|i=X1)).

Again, the renaming of the clauses and variables is an internal process of E. In this saturation two new clauses have been created. After the instantiation the final result is:

cnf(pi2,plain,select(b,i2)=e).
cnf(pi3,plain,select(c,i2)=e2).
cnf(pi4,plain,a=c).
cnf(pi5,plain,i!=i2).
cnf(pi6,plain,e!=e2).
cnf(pi1,plain,store(b,i,d)=a).
cnf(pi0,plain,select(a,i)=d).
cnf(pi7,plain,select(a,i2)=select(b,i2)|i=i2).
cnf(pi8,plain,select(a,i)=select(b,i)|i=i).

The non-ground clause i_0_5 from the previous step has been used to generate the clauses pi7 and pi8. The other clauses are extracted unchanged from the operational and saturated definitional part. ♣
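The last step of the example, where i_0_5 yields pi7 and pi8, amounts to grounding a clause over a set of constants. The Python sketch below only illustrates that substitution. It is not the code of Array_preprocessing: the function name is hypothetical, the set of constants used for instantiation (here the index constants i2 and i) is supplied by hand, and the choice of which clauses to instantiate and with which terms follows the method of [11], which is not reproduced here.

```python
import itertools
import re

# TPTP variables start with an upper-case letter.
VAR = re.compile(r"\b[A-Z][A-Za-z0-9_]*\b")

def instantiate(clause: str, constants):
    """Return all ground instances of `clause` over the given constants."""
    variables = sorted(set(VAR.findall(clause)))
    if not variables:
        return [clause]                 # already ground
    instances = []
    for values in itertools.product(constants, repeat=len(variables)):
        subst = dict(zip(variables, values))
        instances.append(VAR.sub(lambda m: subst[m.group(0)], clause))
    return instances

# Body of clause i_0_5 from the saturated definitional part.
ground = instantiate("select(a,X1)=select(b,X1)|i=X1", ["i2", "i"])
```

The two resulting strings are the bodies of pi7 and pi8 in the final result.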

3 Summary

Table 2 and Table 3 summarize the different options available in Kparam and cSP.

Kparam_s    K-paramodulation calculus (see Part I, Chapter 2, Section 1)
    -b                  basic implementation

Kparam_u    K-paramodulation calculus with atomic implicates generated in advance using unordered paramodulation (see Part I, Chapter 2, Section 2)
    -up                 rewriting between the unordered paramodulation and the K-paramodulation
    -ur                 rewriting propagated during the unordered paramodulation

Kparam_r    K-paramodulation calculus with atom rewriting on the fly
    -o2, -o5            different collision criteria (see page 78)

cSP_flat    cSP calculus in E0 (see Chapter 3)
    -cs1, ..., -cs7     different selection functions used on the waiting set
    -csi                cSP calculus with index (see page 130)
    -csize, -cov        different filters (see page 129)
    -isize              -csize filter plus index

Table 2 – Summary of the different options of Kparam and cSP

cSP         cSP calculus in E1 (see Chapter 3)
    -max-size, -max-neg, -max-depth, -cov    different filters (see page 131)
    -reg                regression option for comparison with Kparam and cSP_flat
    -odiff              tells how many new clauses have been generated if a timeout is reached

Chapter 9

In document Prime implicate generation in equational logic (Page 151-156)