Experimental context
1.2 Kparam_u and Kparam_r
The variants of the K-paramodulation calculus implemented in Kparam_u and Kparam_r were developed to handle formulæ with atomic prime implicates more efficiently than Kparam_s. To select the best variant, we first compared the options available within Kparam_u and within Kparam_r, then compared the best setting of each program with the other in order to determine the best overall choice of parameters (see Table 2, page 134, for a summary of the options).
Plot (13a) shows the execution time of the -up (X axis) and -ur (Y axis) options. As expected, when the formulæ have no atomic implicates (light gray dots) the results are equivalent, because in this case both options process the inputs in the same way. In contrast, when atomic implicates are involved the results vary considerably. Overall, 67% of the formulæ with atomic implicates are handled more efficiently by the -ur option than by the -up option. Thus Kparam_u is more efficient if rewriting is applied eagerly during the unordered paramodulation step, rather than waiting until all unit positive implicates are generated.
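As an illustration of how a figure such as the 67% above can be obtained, the following minimal sketch computes the share of formulæ on which one option is strictly faster than another, given paired timings. The function name `share_faster` and the sample timings are hypothetical and are not taken from the benchmark.

```python
def share_faster(times_a, times_b):
    """Return the fraction of paired runs where option A is strictly faster than option B."""
    if len(times_a) != len(times_b):
        raise ValueError("timing lists must be paired")
    if not times_a:
        raise ValueError("no timings given")
    wins = sum(1 for a, b in zip(times_a, times_b) if a < b)
    return wins / len(times_a)


# Hypothetical execution times in seconds (illustrative only, not benchmark data).
ur_times = [0.8, 12.0, 3.5, 40.0]
up_times = [1.1, 10.0, 7.2, 95.0]
print(f"-ur faster on {share_faster(ur_times, up_times):.0%} of the sample")  # 75%
```

Ties count for neither option in this sketch; whether the original analysis treated them differently is not stated in the text.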
Plot (13b) shows that there is next to no difference between the -o2 and -o5 options. The improvement expected from the simplification implemented in the -o5 option is not confirmed experimentally. This could indicate that the cost of reexamining more clauses (in the waiting set) offsets the benefits of relaxing the collision criterion, but in fact there are fewer than ten cases in which the numbers of processed clauses differ between -o2 and -o5. In these cases, the difference is just one clause, so the conclusion is that -o5 is a good approximation of -o2. A first hint of the efficiency of Kparam_r is the fact that all the formulæ with atomic implicates (dark gray dots) are processed in less than fifty seconds. Since a finer analysis reveals that -o5 is more efficient than -o2 on 68% of the benchmark (although not enough to make a real difference), we selected this option for the comparison with Kparam_u.
Figure 12 – Kparam_s vs Zres, random flat benchmark. Panels: (a) Prime implicates, with axes Zres (pi nb) and Kparam_s (pi nb); (b) Execution time, with axes Zres (s) and Kparam_s (s); (c) Kparam_s, with axes Kparam_s (pi nb) and Kparam_s (implicate nb).
The results presented in Plot (13c) confirm the efficiency of Kparam_r compared to Kparam_u. As in the other plots, the light gray dots represent the formulæ with no atomic implicates, on which Kparam_u and Kparam_r are roughly equivalent, and the dark gray dots represent the formulæ with atomic implicates. Since Kparam_r is clearly more efficient than Kparam_u on the latter formulæ, we select this variant for the comparison with Kparam_s.
1.3 Kparam_r vs Kparam_s
Before comparing Kparam_r and Kparam_s, a first result worth mentioning is that in Kparam_r the execution time of the last processing step, i.e. the recovery of the original solution, is almost negligible no matter what the total execution time is: the maximum is less than one second and the mean is 0.04 seconds. In this benchmark, it always represents less than 1 percent of the total execution time.¹
Figure 13 – Kparam_u and Kparam_r time comparisons, random flat benchmark. Panels: (a) Kparam_u: -up vs -ur, with axes Kparam_u up (s) and Kparam_u ur (s); (b) Kparam_r: -o2 vs -o5, with axes Kparam_r o2 (s) and Kparam_r o5 (s); (c) Kparam_u (-ur) vs Kparam_r (-o5), with axes Kparam_u ur (s) and Kparam_r o5 (s).
Another interesting indicator of the relative superiority of Kparam_r over Kparam_s is the fact that while 7% of the benchmark reaches the timeout with Kparam_s, only 4% does so with Kparam_r. In our benchmark, 49% of the formulæ have no atomic implicates (the light gray dots in Figure 13) and, as expected, Kparam_s and Kparam_r essentially coincide on such problems. Results concerning the remaining 47% of the benchmark are presented in Figure 14. Plot (14a) shows the gain in execution time obtained by going from Kparam_s to Kparam_r. A logarithmic scale is used for the X axis to highlight that this graph empirically indicates an exponential gain on our benchmark. Plot (14b) compares the number of implicates generated by Kparam_s and Kparam_r. Two kinds of dots are represented on the graph, dark gray and light gray; the latter represent tests for which Kparam_s reaches the 5-minute timeout before terminating. The difference of scale between the X and Y axes shows that some formulæ with atomic implicates that Kparam_s cannot solve, even after computing more than ten million implicates, can be solved by Kparam_r with fewer than two million implicates generated.
1. Note that this is a characteristic of randomly generated formulæ, but it may not always be true.
Figure 14 – Kparam_r vs Kparam_s, random flat benchmark (formulæ with atomic implicates only). Panels: (a) execution time, with axes Kparam_s (s, logarithmic scale) and Kparam_r o5 (s); (b) generated implicates, with axes Kparam_s (scale 1e7) and Kparam_r o5 (scale 1e6).
Number of atomic implicates    0     1     > 1   > 0   Total
Kparam_s                       69%   28%   25%   27%   48%
Kparam_r                       69%   86%   84%   85%   77%
Table 4 – Percentage of Tests Executed at Least Twice as Fast as with Zres
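Percentages such as those reported in Table 4 can be derived from per-test timings along the following lines. This is only a sketch under stated assumptions: the record layout, the function name `twice_as_fast_share`, and the sample values are hypothetical, and the way timeouts were handled in the actual experiments is not specified here.

```python
from collections import defaultdict


def twice_as_fast_share(runs):
    """Map each group label to the fraction of its runs in which the solver
    needs at most half of the reference time."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, solver_s, reference_s in runs:
        totals[group] += 1
        if 2 * solver_s <= reference_s:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


# Hypothetical records: (number of atomic implicates, solver time, Zres time), in seconds.
runs = [(0, 1.0, 3.0), (0, 2.0, 2.5), (1, 0.5, 4.0), (1, 6.0, 5.0)]
print(twice_as_fast_share(runs))  # {0: 0.5, 1: 0.5}
```

Grouping by the number of atomic implicates mirrors the columns of the table; coarser groups such as "> 0" would be obtained by relabelling the records before the call.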
Our original motivation was to improve Kparam_s on formulæ with atomic implicates (denoted by f.a.i. in this paragraph) compared to Zres. We summarize these improvements in Table 4. The observation that Zres is more efficient on f.a.i. is apparent in the first row of the table, where only 27% of these formulæ are executed at least twice as fast with Kparam_s as with Zres, while on 69% of the rest of the benchmark Kparam_s is at least twice as fast as Zres. The second row of Table 4 shows that an additional 58% of the f.a.i. (85% in total) are executed at least twice as fast with Kparam_r as with Zres, validating this work as a real improvement over Kparam_s. The results also distinguish between formulæ with a single atomic implicate (71% of the f.a.i.) and those with several (29% of the f.a.i.). A slight improvement of the performance is noted for the latter, but not as significant as the gap between none and one atomic implicate.
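The aggregate figures quoted above can be cross-checked with a short computation. The sketch below assumes that the "> 0" column of Table 4 is the average of the "1" and "> 1" columns, weighted by the 71% and 29% shares given in the text; since all published percentages are rounded, the check is only approximate.

```python
# Shares of f.a.i. with a single and with several atomic implicates (from the text).
share_single, share_several = 0.71, 0.29

# Table 4 columns "1" and "> 1", in percent.
table = {"Kparam_s": (28, 25), "Kparam_r": (86, 84)}


def weighted(single_pct, several_pct):
    """Weighted average of the two columns, assuming they partition the f.a.i."""
    return share_single * single_pct + share_several * several_pct


combined = {name: round(weighted(*cols)) for name, cols in table.items()}
print(combined)  # {'Kparam_s': 27, 'Kparam_r': 85}
print(combined["Kparam_r"] - combined["Kparam_s"])  # 58
```

Under this assumption the recomputed values match the "> 0" column (27% and 85%) and the additional 58% mentioned in the text.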