
7.3 Optimizing The Synthesis Procedure

7.3.4 Efficient Raw-term Enumeration

Our refinement tree structure allows us to cache I-refinements, which greatly speeds up the synthesis procedure. Can we perform similar caching for E-guessing as well? We observe that the inefficiency here lies in repeated calls to our term-generation functions gen_E and gen_I. For example, we may need to repeatedly call gen_E(Σ;Γ;list;k) to synthesize expressions of type list as the potential body of our function, the leaf expression of a pattern match branch, an argument to a function application, or some other part of a complex expression we are building up. Ideally, we should cache the results of gen_E and gen_I for particular combinations of arguments to ensure that we only ever generate a particular term once during the enumeration process.

If we examine the arguments to gen, we see that the signature Σ remains constant and that the goal type τ and term size k are natural “keys” by which we can cache the results of gen. However, our context Γ does not remain constant throughout the synthesis process. For example, in our refinement tree example above, we generate terms of type list in several contexts:

• Γ₁ = f:list→list, l:list just inside the body of the fix and in the Nil branch of the match.

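\[
\begin{aligned}
&\mathrm{gen}_E(\Sigma;\Gamma;\tau;n) \\[4pt]
&\mathrm{gen}_E(\Sigma;\cdot;\tau;n) = \{\} \\
&\mathrm{gen}_E(\Sigma;\cdot;\tau;0) = \{\} \\
&\mathrm{gen}_E(\Sigma;x{:}\tau_1,\Gamma;\tau;n) = \mathrm{gen}^{x:\tau_1}_E(\Sigma;\Gamma;\tau;n) \cup \mathrm{gen}_E(\Sigma;\Gamma;\tau;n) \\[8pt]
&\mathrm{gen}^{x:\tau_1}_E(\Sigma;\Gamma;\tau;n) \\[4pt]
&\mathrm{gen}^{x:\tau_1}_E(\Sigma;\Gamma;\tau;0) = \{\} \\
&\mathrm{gen}^{x:\tau}_E(\Sigma;\Gamma;\tau;1) = \{x\} \\
&\mathrm{gen}^{x:\tau_1}_E(\Sigma;\Gamma;\tau;1) = \{\} \qquad (\tau \neq \tau_1) \\
&\mathrm{gen}^{x:\tau_1}_E(\Sigma;\Gamma;\tau;n) = \bigcup_{\tau_2\to\tau\,\in\,\Gamma}\ \bigcup_{k=1}^{n-1}
\begin{aligned}[t]
&\bigl(\mathrm{gen}^{x:\tau_1}_E(\Sigma;\Gamma;\tau_2\to\tau;k) \otimes_{\mathrm{app}} \mathrm{gen}_I(\Sigma;\Gamma;\tau_2;n-k)\bigr) \\
{}\cup{} &\bigl(\mathrm{gen}_E(\Sigma;\Gamma;\tau_2\to\tau;k) \otimes_{\mathrm{app}} \mathrm{gen}^{x:\tau_1}_I(\Sigma;\Gamma;\tau_2;n-k)\bigr) \\
{}\cup{} &\bigl(\mathrm{gen}^{x:\tau_1}_E(\Sigma;\Gamma;\tau_2\to\tau;k) \otimes_{\mathrm{app}} \mathrm{gen}^{x:\tau_1}_I(\Sigma;\Gamma;\tau_2;n-k)\bigr)
\end{aligned}
\end{aligned}
\]

Figure 7.7: Relevant E-term generation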

Clearly, we cannot interchange the results of gen_E(Σ;Γ₁;list;k) and gen_E(Σ;Γ₂;list;k) because they contain different sets of expressions. But the two calls to gen also clearly share some expressions in common. We would like to realize this sharing in our term caches as well to avoid redundant work.

To do this, we present a technique for efficiently performing term enumeration in the presence of contexts called relevant term generation. Critically, we note that our contexts during the synthesis process only grow; they never shrink or shuffle their contents. As a result, we can factor the term generation function as follows:

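\[
\begin{aligned}
\mathrm{gen}_E(\Sigma;x{:}\tau_1,\Gamma;\tau;n) &= \mathrm{gen}^{x:\tau_1}_E(\Sigma;\Gamma;\tau;n) \cup \mathrm{gen}_E(\Sigma;\Gamma;\tau;n) \\
\mathrm{gen}_I(\Sigma;x{:}\tau_1,\Gamma;\tau;n) &= \mathrm{gen}^{x:\tau_1}_I(\Sigma;\Gamma;\tau;n) \cup \mathrm{gen}_I(\Sigma;\Gamma;\tau;n)
\end{aligned}
\]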

This factorization ensures that for a given goal type τ and size n, two calls to gen in different contexts Γ and x:τ₁,Γ share the same set of terms under the shared context Γ.
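The factorization can be illustrated with a small Python sketch. It is not Myth's implementation: it assumes a simplified setting with no signature Σ, with arguments drawn from E-terms rather than I-terms (so no constructors or fix), and with candidate argument types collected from the whole context including x. The names `gen_e`, `rel_gen_e`, `arg_types`, and `app` are hypothetical. Contexts are tuples with the newest binding first, so a call in x:τ₁,Γ delegates to a cached call in Γ.

```python
from functools import lru_cache

# Types: a base type is a string; an arrow type is the tuple ("->", arg, result).
# Contexts are tuples of (name, type) pairs, newest binding first, so that
# ctx[1:] is the shared context Γ when ctx is x:τ1,Γ.

def arg_types(ctx, goal):
    """Argument types t2 such that some binding in ctx can produce t2 -> goal."""
    found = set()
    for _, t in ctx:
        while isinstance(t, tuple):
            if t[2] == goal:
                found.add(t[1])
            t = t[2]
    return found

def app(heads, args):
    """All applications of a head to an argument, rendered as strings."""
    return {f"({h} {a})" for h in heads for a in args}

@lru_cache(maxsize=None)
def gen_e(ctx, goal, n):
    """E-terms of type goal and size n, factored as relevant part + shared part."""
    if not ctx or n <= 0:
        return frozenset()
    (x, t1), rest = ctx[0], ctx[1:]
    return rel_gen_e(x, t1, rest, goal, n) | gen_e(rest, goal, n)

@lru_cache(maxsize=None)
def rel_gen_e(x, t1, rest, goal, n):
    """E-terms of type goal and size n over x:t1,rest that mention x."""
    if n <= 0:
        return frozenset()
    if n == 1:
        return frozenset([x]) if goal == t1 else frozenset()
    out = set()
    for arg in arg_types(((x, t1),) + rest, goal):
        fn = ("->", arg, goal)
        for k in range(1, n):
            rel_f = rel_gen_e(x, t1, rest, fn, k)
            rel_a = rel_gen_e(x, t1, rest, arg, n - k)
            out |= app(rel_f, gen_e(rest, arg, n - k))  # x in the head only
            out |= app(gen_e(rest, fn, k), rel_a)       # x in the argument only
            out |= app(rel_f, rel_a)                    # x in both
    return frozenset(out)

LIST_TO_LIST = ("->", "list", "list")
gamma1 = (("l", "list"), ("f", LIST_TO_LIST))
gamma2 = (("x", "list"),) + gamma1  # gamma1 extended with a new binding

print(sorted(gen_e(gamma1, "list", 2)))  # ['(f l)']
print(sorted(gen_e(gamma2, "list", 2)))  # ['(f l)', '(f x)']
```

In this sketch the second call computes only the terms that mention x; the terms over the shared context come from the cache entry filled by the first call.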

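Here, $\mathrm{gen}^{x:\tau_1}_E$ and $\mathrm{gen}^{x:\tau_1}_I$ are relevant term-generation functions. Inspired by relevance logic [Anderson et al., 1992], these functions are variants of our standard term-enumeration functions except that they require that every expression they generate contains the relevant variable x. Figure 7.7 and Figure 7.8 give the definitions of relevant E- and I-term generation. The functions operate similarly to the gen functions we developed in Figure 7.4. The critical difference is that when our relevant E-term generation function bottoms out at size one, rather than generating all terms of goal type τ, it generates only a single term, the relevant variable x, and only when the goal type is the relevant variable’s type.

\[
\begin{aligned}
&\mathrm{gen}_I(\Sigma;\Gamma;\tau;n) \\[4pt]
&\mathrm{gen}_I(\Sigma;\cdot;\tau;n) = \{\} \\
&\mathrm{gen}_I(\Sigma;\Gamma;\tau;0) = \{\} \\
&\mathrm{gen}_I(\Sigma;x{:}\tau_1,\Gamma;\tau;n) = \mathrm{gen}^{x:\tau_1}_I(\Sigma;\Gamma;\tau;n) \cup \mathrm{gen}_I(\Sigma;\Gamma;\tau;n) \\
&\mathrm{gen}_I(\Sigma;\cdot;\tau_1\to\tau_2;n) = \{\mathsf{fix}\ f\ (x{:}\tau_1) : \tau_2 = I \mid I \in \mathrm{gen}_I(\Sigma;f{:}\tau_1\to\tau_2,x{:}\tau_1;\tau_2;n-1)\} \\
&\mathrm{gen}^{x:\tau}_I(\Sigma;\Gamma;T;n) = \mathrm{gen}^{x:\tau}_E(\Sigma;\Gamma;T;n) \cup \bigcup_{C:\tau_1*\ldots*\tau_k\to T\,\in\,\Sigma}\ \bigcup_{\substack{n_1,\ldots,n_k \\ n_1+\ldots+n_k=n}} \{C(I_1,\ldots,I_k) \mid I_j \in \mathrm{gen}_I(\Sigma;\Gamma;\tau_j;n_j)\} \\[8pt]
&\mathrm{gen}^{x:\tau_1}_I(\Sigma;\Gamma;\tau;n) \\[4pt]
&\mathrm{gen}^{x:\tau}_I(\Sigma;\Gamma;\tau_1\to\tau_2;n) = \mathrm{gen}^{x:\tau}_E(\Sigma;\Gamma;\tau_1\to\tau_2;n) \cup \{\mathsf{fix}\ f\ (y{:}\tau_1) : \tau_2 = I \mid I \in \mathrm{gen}^{x:\tau}_I(\Sigma;f{:}\tau_1\to\tau_2,y{:}\tau_1,\Gamma;\tau_2;n-1)\} \\
&\mathrm{gen}^{x:\tau}_I(\Sigma;\Gamma;T;n) = \mathrm{gen}^{x:\tau}_E(\Sigma;\Gamma;T;n) \cup \bigcup_{C:\tau_1*\ldots*\tau_k\to T\,\in\,\Sigma}\ \bigcup_{\substack{n_1,\ldots,n_k \\ n_1+\ldots+n_k=n}}\ \bigcup_{r_1,\ldots,r_k\,\in\,\mathrm{parts}(k)} \{C(I_1,\ldots,I_k) \mid I_j \in \mathrm{genp}^{r_j;x:\tau}_I(\Sigma;\Gamma;\tau_j;n_j)\} \\[8pt]
&\mathrm{parts}(k) = \{\underbrace{\mathsf{Not},\ldots,\mathsf{Not}}_{i-1},\ \mathsf{Must},\ \underbrace{\mathsf{May},\ldots,\mathsf{May}}_{k-i} \mid i \in 1,\ldots,k\} \\[8pt]
&\mathrm{genp}^{r;x:\tau_1}_I(\Sigma;\Gamma;\tau;n) =
\begin{cases}
\mathrm{gen}^{x:\tau_1}_I(\Sigma;\Gamma;\tau;n) & r = \mathsf{Must} \\
\mathrm{gen}_I(\Sigma;x{:}\tau_1,\Gamma;\tau;n) & r = \mathsf{May} \\
\mathrm{gen}_I(\Sigma;\Gamma;\tau;n) & r = \mathsf{Not}
\end{cases}
\end{aligned}
\]

Figure 7.8: Relevant I-term generation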

The relevant term-generation functions ensure that the relevant variable x appears in every term generated by the function. When we generate terms that contain multiple sub-expressions in a relevant context, we must be careful to ensure that this property holds. For example, we can break up generation of a function application E I with a relevant variable x into three cases:

1. x must appear in E and must not appear in I.
2. x must not appear in E and must appear in I.
3. x must appear in both E and I.

These cases are reflected in the definition of relevant term generation for function applications in Figure 7.7. To ensure that x appears in a particular sub-term, we invoke the relevant term generation function with x, $\mathrm{gen}^{x:\tau}_E$ or $\mathrm{gen}^{x:\tau}_I$. To ensure that x does not appear in a particular sub-term, we invoke the non-relevant term generation function in a context not containing x.
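The following Python sketch checks on a small example that the three cases cover every application mentioning x exactly once. It is an illustration only: it classifies already-built terms by filtering, whereas the relevant generation functions produce each class directly. The helper names (`mentions`, `split`, `relevant_apps`) and the sample terms are hypothetical.

```python
from itertools import product

def mentions(term, x):
    """True if variable x occurs in term (a name, or a (head, arg) pair)."""
    if isinstance(term, tuple):
        return mentions(term[0], x) or mentions(term[1], x)
    return term == x

def split(terms, x):
    """Partition terms into those that mention x and those that do not."""
    rel = {t for t in terms if mentions(t, x)}
    return rel, set(terms) - rel

def relevant_apps(heads, args, x):
    """Applications (E, I) that mention x, built from the three cases."""
    rel_e, no_e = split(heads, x)
    rel_i, no_i = split(args, x)
    case1 = set(product(rel_e, no_i))   # x appears in E and not in I
    case2 = set(product(no_e, rel_i))   # x appears in I and not in E
    case3 = set(product(rel_e, rel_i))  # x appears in both
    return case1, case2, case3

# Sample sub-terms: applications are (head, arg) pairs.
heads = {"f", ("g", "x")}
args = {"l", "x", ("f", "x")}
c1, c2, c3 = relevant_apps(heads, args, "x")

every_app = set(product(heads, args))
with_x = {t for t in every_app if mentions(t, "x")}
print(c1 | c2 | c3 == with_x)                      # the cases cover all of them
print(len(c1) + len(c2) + len(c3) == len(with_x))  # and no term is built twice
```

Because the three cases are disjoint, no application is generated more than once, which is what makes the cached sets safe to combine with a plain union.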

For constructors (Figure 7.8), we must generalize this factorization to k subexpressions rather than just two. To do this, we employ a “sliding window” factorization (realized by the parts helper function) where we walk the list of sub-expressions, distinguishing the current expression I_m as the one that must contain the relevant variable x. By the time the window reaches I_m, every case in which x appears in some expression before I_m has already been covered by an earlier window position. Therefore, we require that x must not appear in the expressions before I_m. In contrast, x may appear in the expressions after I_m, as we have not placed any restrictions on them yet.

Now we have three cases of sub-term generation that we handle with the genp helper function. The cases where x must and must not appear in the sub-term are handled similarly to the function application case. To generate terms that may contain x, we appeal to the non-relevant term generation function, adding x into the context.
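As a rough illustration of the sliding window, the sketch below builds the rows of parts(k) and checks that each non-empty pattern of "x occurs in sub-term j" is admitted by exactly one row, namely the row whose Must sits at the first sub-term containing x. The function `admits` is a hypothetical helper introduced only for this check; it is not part of the definitions in Figure 7.8.

```python
from itertools import product

def parts(k):
    """Rows of the form Not^(i-1), Must, May^(k-i) for i = 1..k."""
    return [("Not",) * (i - 1) + ("Must",) + ("May",) * (k - i)
            for i in range(1, k + 1)]

def admits(row, occurs):
    """Does this row allow the given pattern of occurrences of x?

    occurs[j] is True when x appears in sub-term j.
    """
    return all(
        r == "May" or (r == "Must" and o) or (r == "Not" and not o)
        for r, o in zip(row, occurs)
    )

for row in parts(3):
    print(row)
# ('Must', 'May', 'May')
# ('Not', 'Must', 'May')
# ('Not', 'Not', 'Must')

# Each pattern with at least one occurrence matches exactly one row;
# the pattern with no occurrence of x matches none.
for occurs in product([False, True], repeat=3):
    matching = [row for row in parts(3) if admits(row, occurs)]
    assert len(matching) == (1 if any(occurs) else 0)
```

The Must, May, and Not tags then select the generator for each sub-term as in genp: the relevant function, the non-relevant function with x added to the context, or the non-relevant function without x.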

Chapter 8 Evaluating Myth

In Chapter 7, we developed an efficient synthesis procedure from our core synthesis calculus MLsyn. In this chapter, we explore our implementation of this synthesis procedure, a prototype program synthesizer called Myth.

Our goal with Myth is to further explore the type-theoretic foundations for program synthesis that we have developed so far. We started our exploration by carefully analyzing the metatheory of type-directed program synthesis, in particular the soundness and completeness of λ→syn and MLsyn. However, this is insufficient for getting a complete sense of how program synthesis systems built on top of these foundations will perform in practice. By exploring the behavior of an actual implementation, we can better understand the capabilities and limitations of our approach and identify areas for future improvement.

Note that an explicit non-goal of this exploration is to justify Myth-the-artifact as a practical tool for program synthesis. While we explore some aspects of the viability of Myth as an end-user tool, e.g., performance, we intentionally do not explore the usability of the tool. We do this primarily as a matter of pragmatics. There are plenty of empirical questions to investigate about Myth (How many examples do we need to synthesize a particular program? How long does it take to synthesize a particular program?) without delving into the usability side of the project. However, we also want to stress that, throughout this work, we have been less concerned with building a practical tool and more interested in answering foundational questions about the integration of types into program synthesis. We do not want to overshadow these important results with claims about usability that we do not have the time to develop thoroughly. We leave such investigation to future work.

In document Program Synthesis With Types (Page 124-129)