An Improved Unification Algorithm for Coq
4.4 Evaluation of the Algorithm
We tested our algorithm compiling the Mathematical Components library (Gonthier et al., 2008), version 1.4. We chose this library as test-bench for three reasons: (1) It is a huge development, with a total of 62 files. (2) It uses canonical structures heavily, providing us with several examples of canonical structures idioms that our algorithm should support. (3) It uses its own set of better behaved tactics. This last point is extremely important, although a bit technical. Truth be told, Coq has actually two different unification algorithms. One of these algorithms is mainly used by the type inference mechanism, and it outputs a sound substitution (up to bugs, like the one shown in §4.2.2). This is the one mentioned in this thesis as “the original unification algorithm of Coq”. The other algorithm is used by most of Coq’s original tactics (like apply or rewrite), and it is unsound, with convoluted semantics. Ssreflect’s tactics use the former, better-behaved algorithm, which for this evaluation we have replaced with our own.
Since, as we saw in the previous section, our algorithm does not incorporate certain heuristics, it is reasonable to expect that it will not solve all the unification problems posed by the theories in the library. Surprisingly, only a very small set of problems were not solved by our algorithm. In total, only about 300 lines out of about 67,000 required changes. To give a sense of the magnitude of the task, our algorithm is called about 32 million times, of which only a very small fraction of times did it fail to find a solution when the original algorithm did (0.005%).
In the following paragraphs we group the different issues we found, ordered by number of affected lines. After presenting the issues, we discuss different solutions to solve them.
Non-dependent if−then−elses: Most notably, two thirds of the problematic lines can be easily solved with a minor change in the way Ssreflect handles if−then−elses.
Indeed, 200 out of those 300 lines are if−then−else constructions in which the return type does not depend on the conditional. While in standard Coq this is not a problem, in Ssreflect the type of the branches is assumed to be dependent on the conditional. The
following example illustrates this point:
if b then 0 else 1
It features a simple if−then−else with conditional b : bool, and branches with type nat.
In standard Coq (modified to use our algorithm) this example typechecks, but it does not if the Ssreflect library is imported. With Ssreflect, the typechecker generates a fresh meta-variable ?T for the type of the branches, and proceeds to unify this meta-variable with the actual type of each branch. More precisely, ?T is created with contextual type Type[b : true], and when unifying it with the actual type of each branch, b is substituted by the corresponding boolean constructor. This results in the following equations:
?T [true] ≈ nat
?T [false] ≈ nat
Since they are not of the form required by higher-order pattern unification, our algorithm fails. If ?T were created without depending on b, these equations can be easily solved.
False dependency in the in modifier: Another issue affecting 35 lines comes from the in modifier in Ssreflect’s rewrite tactic. This modifier allows the selection of a portion of the goal to perform the rewrite. For instance, if the goal is
1 + x = x + 1
and we want to apply commutativity of addition on the term on the right, we can perform the following rewrite:
rewrite [in X in = X]addnC
The standard unification algorithm of Coq instantiates X with the right hand side of the equation, and rewrite applies commutativity only to that portion of the goal. With our algorithm, however, rewrite fails. Again, the culprit is in the way meta-variables are created, with false dependencies forbidding the algorithm to apply higher-order pattern unification. In this case, the implicit ( ) is replaced by a meta-variable ?y, which is assumed to depend on X. But X is also replaced by a meta-variable, ?z, therefore the unification problem becomes
?y[x, ?z[x]] = ?z[x] ≈ 1 + x = x + 1
that, in turn, poses the equation ?y[x, ?z[x]] ≈ 1 + x, which does not have an MGU.
Non-dependent products: 30 lines required a simple typing annotation to remove dependencies in products. Consider the following Coq term:
∀P x. (P (S x) = True)
When Coq refines this term, it first assigns P and x two unknown types, ?T and ?U respectively, the latter depending on P . Then, it refines the term on the left of the equal sign, obtaining further information about the type ?T of P : it has to be a (possibly dependent) function ∀y : nat. ?T0[y]. The type of the term on the left is the type of P applied to S x, that is, ?T0[S x]. After refining the term on the right and finding out it is a Prop, it unifies the types of the two terms, obtaining the equation
?U0[S x] ≈ Prop
Since, again, this equation does not fall into the HOPU fragment, our algorithm fails.
Explicit duplicated dependencies: In 11 lines, the proof developer wrote explicitly a dependency that duplicates an existing one. Consider for instance the following rewrite statement:
rewrite [ ∗ w]mulrC
Here, the proof developer tries to rewrite using the commutativity of rings (mulrC), on a fragment of the goal matching the pattern ∗ w. Let’s assume that in the goal there are several occurrences of the ∗ operator, but only one has w occurring in the right-hand side. For illustration purposes, let’s assume this occurrence has the form t ∗ (w + u), for some terms t and u. The proof dev intends to select this particular expression, however, the pattern fails to match the expression.
The problem comes from the way implicit arguments are refined: An implicit is refined as a meta-variable depending on the entire local context. In this case, the local context will include w, so the pattern will be refined as ?y[w] ∗ ?z[w] w (assuming no other variables appear in the local context). When unifying the pattern with the desired occurrence of the ∗ operator we obtain the problem:
?z[w] w ≈ w + u
As we saw in Example 4.1, this equation does not have a most general unifier.
Other issues: The remaining 14 lines have other issues, and in most cases they require attention from an Ssreflect expert to identify the origin of the problem.
There are two possible solutions to solve these problems: (1) to modify the Ssreflect library to avoid these dependencies (to solve the first two cases), and the refiner to avoid creating spurious dependencies (to solve the third and fourth cases), or (2) to extend the class of problems handled by the unification algorithm. The former solution is out of the scope of this work, although we plan to explore it in the near future. We are convinced that it is not the role of the unification algorithm to decide which dependencies should be ignored. Nevertheless, in the following section (§4.5) we explore the second solution, obtaining very good results.
4.4.1 Performance
We have not yet looked into improving the performance of the algorithm. Our prelimi-nary findings show that compiling the whole Ssreflect library takes about 35% more time with our algorithm than with the original algorithm of Coq. Note that our algorithm performs more checks than the original algorithm (e.g., the hypotheses inside the grey box in Lam-η). We are certain that, with a careful profiling of the algorithm, we will detect major issues, and bring the compilation time closer to the current algorithm.
In the long term, we aim to find ways to drastically improve the performance of the algorithm.