5.8 Extending to circularity

5.8.1 Discussion

We do not consider non-circularity to be a part of our modular well-definedness analysis, for a number of different reasons.

The most important reason is that the flow sets the circularity analysis associates with nonterminals are not a clean interface; they are a very complex one. We can reasonably infer most flow types from a host language, with a few exceptions where the host would prefer that extensions be allowed to depend on an attribute that the host language itself never depends on. The flow sets, however, are vastly more difficult to reason about (as they arise non-locally), much more difficult to specify, and result in error messages that would best be described as unhelpful. We no longer have a simple answer to the question of which inherited attributes a synthesized attribute equation is permitted to depend upon. There may be a set of very subtly different flows that an extension developer may run afoul of, despite being well within the flow type. Further, these flows are global properties of an attribute grammar, and so our error messages cannot necessarily pinpoint which piece of code caused the problem to appear, only that a particular production now has a problem. This makes understanding the restrictions (and understanding what went wrong in violating them) very difficult.

Another reason to avoid non-circularity is that it is much more difficult to compute. The flow set analysis potentially blows up exponentially, while the flow type analysis is much simpler. If we believed the non-circularity analysis were an important part of ensuring that extensions would work, this might be worthwhile. However, there are a number of reasons to believe it is not important. For one, we have never actually encountered a circularity problem with an extension that wasn’t also a flow type problem. Flow types by themselves are quite good at preventing circularity issues, because they permit only certain circularities to exist. If a translation attribute can only depend upon env, then we can only introduce a circularity there by using translation of a subtree to define env for that subtree. This restriction is sufficient to make potential circularities rather obvious when they are written, as well as to prevent accidental circularities (which are usually a result of accidentally emitting very wide dependencies, which would likely violate the flow type).
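The translation/env example can be made concrete with a small sketch. This is a Python model written for illustration, not Silver and not the thesis's analysis: the names FLOW_TYPES, check_equation, may_close_direct_cycle, and the inherited attribute returnType are all hypothetical, and only direct cycles on a single subtree are considered.

```python
# Hypothetical model: a flow type maps each synthesized attribute to the
# set of inherited attributes its equations are permitted to depend on.
FLOW_TYPES = {
    "translation": frozenset({"env"}),
}


def check_equation(syn_attr, deps, flow_types=FLOW_TYPES):
    """Return the inherited attributes an equation uses beyond its flow type."""
    allowed = flow_types[syn_attr]
    return sorted(set(deps) - allowed)


def may_close_direct_cycle(inh_attr, uses_syn_attrs, flow_types=FLOW_TYPES):
    """List the synthesized attributes of a subtree that could form a direct
    cycle when they are used to define `inh_attr` for that same subtree."""
    return sorted(
        s for s in uses_syn_attrs
        if inh_attr in flow_types.get(s, frozenset())
    )


# An extension equation for `translation` that reads only `env` is accepted.
ok = check_equation("translation", {"env"})

# One that also reads the (hypothetical) `returnType` violates the flow type,
# and is reported locally at the offending equation.
bad = check_equation("translation", {"env", "returnType"})

# Defining a subtree's `env` from that subtree's `translation` is the one
# place a direct circularity can appear under this flow type.
risky = may_close_direct_cycle("env", {"translation"})
harmless = may_close_direct_cycle("returnType", {"translation"})
```

Under these assumptions the only suspicious pattern is the one the text names: using translation of a subtree to define env for that subtree.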

Finally, there are a few other reasons not to bother. For one, because Silver is a lazy language, circularities can actually be productive, for example if they compute streams or other data structures where only partial demands may be made. (Silver comes with a pretty printing library that uses a circular stream of this sort.) Or the circularities may be entirely false anyway (especially because our method of dealing with references is extremely conservative). Finally, circularity is just one kind of nontermination, and to close this gap we must also check for termination of higher-order attribute expansion [65] (via locals and forward equations) as well as general function termination. Some of these problems may be solvable, but we consider it reasonable to call them beyond the scope of this thesis. As a result, for our purposes in developing reliably composable language extensions, we take the modular well-definedness analysis to refer only to modular effective completeness, which we consider sufficient.
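To illustrate why a circular definition can be productive under lazy evaluation, here is a minimal sketch in Python that simulates laziness with explicit thunks. It is not taken from Silver or its pretty printing library; the Stream class and the take helper are hypothetical names introduced for this example.

```python
class Stream:
    """A lazy cons cell: the tail is a thunk evaluated only on demand."""

    def __init__(self, head, tail_thunk):
        self.head = head
        self._tail_thunk = tail_thunk
        self._tail = None

    @property
    def tail(self):
        # Force the thunk at most once and remember the result.
        if self._tail is None:
            self._tail = self._tail_thunk()
        return self._tail


def take(n, stream):
    """Demand only the first n elements of a possibly infinite stream."""
    out = []
    for _ in range(n):
        out.append(stream.head)
        stream = stream.tail
    return out


# A circular definition: `alternating` refers to itself in its own tail.
# Because the self-reference sits behind a thunk, partial demands terminate.
alternating = Stream(0, lambda: Stream(1, lambda: alternating))
first_five = take(5, alternating)
```

A static circularity analysis would flag the self-reference, even though every finite demand on the stream terminates.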

5.9 Related work

Knuth provided (and later corrected) a circularity analysis when introducing attribute grammars [20]. In presenting higher-order attributes, Vogt et al. [56] extended Knuth’s completeness and circularity analyses to that setting. Reference and remote attributes do not have a precise circularity analysis [59], as the problem is undecidable. Completeness in these settings is simply a matter of using occur-on relationships to check for the existence of all equations for all attributes. With forwarding, flow-analysis is used to check completeness and thus a definedness analysis that combines the check of completeness and circularity was defined [21, 72]. This analysis used dependency functions instead of flow graphs in order to distinguish between synthesized attributes that depend on no inherited attributes and those that cannot be computed because of a missing equation or circularity, and thus conflate these two types of errors. All of these are non-modular analyses.

Saraiva and Swierstra [79] present generic attribute grammars in which modules can be parameterized by nonterminals with a particular flow type. This is the origin of flow types, though we use them for slightly different purposes, and we infer them rather than specify them as part of the module. Generic AGs are very different from our language extension model, however. They do not allow for multiple independent extensions to be composed, except by first merging them into a single extension, on which the analysis must then be performed, effectively making it monolithic.

In AspectAG, Viera et al. [10] have shown the completeness analysis can be encoded in the type system of Haskell. However, this analysis is again performed at the time of composition (by the type checker) and is thus a monolithic analysis.

Current AG systems such as JastAdd [81] and Kiama [9] do not do static flow analysis but, like previous versions of Silver, instead provide error messages at attribute evaluation time that indicate the missing equation or circularity. An extension writer can write test cases to test his or her specification and perhaps find any lurking problems, but this does not provide any assurances if independently developed grammars are later composed. And of course, no one has written tests for an attribute grammar composed of several independent extensions.
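The difference between static analysis and evaluation-time errors can be sketched with a small demand-driven evaluator. This is a generic Python illustration, not the implementation or API of JastAdd, Kiama, or Silver; the Evaluator class, the exception names, and the example attributes are all hypothetical.

```python
class MissingEquation(Exception):
    pass


class Circularity(Exception):
    pass


class Evaluator:
    """Demand-driven attribute evaluation over (node, attribute) pairs."""

    def __init__(self, equations):
        # equations: dict mapping (node, attr) -> function(evaluator) -> value
        self.equations = equations
        self.cache = {}
        self.in_progress = set()

    def get(self, node, attr):
        key = (node, attr)
        if key in self.cache:
            return self.cache[key]
        if key not in self.equations:
            raise MissingEquation(f"no equation for {attr} on {node}")
        if key in self.in_progress:
            # The attribute was demanded again while still being computed.
            raise Circularity(f"{attr} on {node} depends on itself")
        self.in_progress.add(key)
        try:
            value = self.equations[key](self)
        finally:
            self.in_progress.discard(key)
        self.cache[key] = value
        return value


equations = {
    ("root", "value"): lambda ev: ev.get("child", "value") + 1,
    ("child", "value"): lambda ev: 41,
    # `errors` on child has no equation: only found when demanded.
    ("root", "errors"): lambda ev: ev.get("child", "errors"),
    # `env` and `defs` on root depend on each other.
    ("root", "env"): lambda ev: ev.get("root", "defs"),
    ("root", "defs"): lambda ev: ev.get("root", "env"),
}
ev = Evaluator(equations)


def outcome(node, attr):
    try:
        return ev.get(node, attr)
    except (MissingEquation, Circularity) as err:
        return type(err).__name__


results = [outcome("root", "value"), outcome("root", "errors"), outcome("root", "env")]
```

Both failures surface only when a test happens to demand the affected attribute on a tree that exercises it, which is why such checks give no assurance about compositions nobody has tested.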

Chapter 6 Non-interference

In the previous chapter, we described an analysis that ensures composition of language extensions will always result in a well-defined attribute grammar. This ensures we are always able to compose extensions together, without errors occurring during the composition process. However, we are left knowing only fairly weak properties about the behavior of the resulting composed language. The trouble is that attribute values can now be computed by one independent extension and consumed by another, and it is quite easy to imagine ways in which this can result in undesirable behavior, despite the lack of type errors or well-definedness issues in the attribute grammar.

We will take two different perspectives on this problem throughout this chapter. The first perspective views the problem as undesirable interaction between language extensions, which we call interference. From this perspective, we are concerned with the extension developer’s task in ensuring their extension will continue to work when composed with other unknown extensions. Although this perspective makes clear the problem we wish to avoid, it does not leave us with much guidance on how to resolve the problem.

The second perspective is to consider modular and composable proofs of properties about attribute grammars. One major difference with this perspective is that now the host language, too, is involved. We wish to prove our host language and extensions correct, and be sure these properties will hold of the composed language, too. In this way, we can ensure non-interference: if the proofs of correctness still hold, then the extensions’ behavior should be as expected. The most important development we make with this perspective is the notion of coherence, which is the particular tool we use to ensure our properties and proofs will not be invalidated when other extensions are composed into the system.

We are far from the day when our compilers are routinely verified, however, and so instead of doing verification, we wish to use this perspective as a theoretical framework for a more practical way of achieving non-interference. Throughout this chapter, we will weave these two perspectives together to develop a practical, testing-based approach to ensuring extensions are non-interfering, grounded in the notion of coherence. This allows us to identify potential interference problems in a quick and practical way, without the need to actually do verification.

We begin in section 6.1 with a more detailed explanation of the interference problem. We show two examples of language extensions that seem correct in isolation, but when composed together show observable errors. However, this perspective seems to give us no guidance on what went wrong. The problem just looks like something that needs glue code to resolve; what could the extension developers have done?

In order to find a solution, we shift attention to the verification perspective. In section 6.2, we take a look at what proofs of properties about attribute grammars look like. In particular, we consider how modular proofs can be constructed. Then in section 6.3, we introduce a coherence meta-property (that is, a property we can show about the properties we prove about attribute grammars). Under the assumption that language extensions do not violate coherent properties, we show how proofs of coherent properties can be automatically extended to cover the new cases arising from composition with other language extensions. As a result, coherent properties can be proved for individual language extensions (and the host language) in isolation and remain true for their composition with other extensions.

We then justify that assumption, in section 6.4, by showing a set of restrictions that are sufficient to ensure an extension preserves all coherent properties, thus ensuring non-interference. While we consider these restrictions a reasonable burden on extension developers (enforcement can even be done syntactically, without the need to actually do verification), they are unreasonably restrictive of the capabilities of extensions. And so in section 6.5 we develop an “attribute properties” approach to ensuring coherent properties are preserved. This is sufficient to solve the problems we identified in the overly restrictive approach.

Finally, in section 6.6, we describe a method for enforcement of this “attribute properties” approach using randomized property testing, managing to entirely avoid having to do verification in practice. In section 6.7, we apply this technique to a real compiler, providing some evidence that the testing method works well enough. We conclude with some related work in section 6.8 and some discussion in section 6.9. In particular, we note that although the testing approach is less perfect than verification, there are several reasons why it might work better than one might initially expect. Lastly, we note that our theoretical development has left us with a notion of blame: even if an interference bug slips through testing, we are able to identify the extension at fault, and problems in the composed language are not emergent behavior with no solution.

6.1 The problem

The problem of interference between composable extensions arises because the developers of independent language extensions (E_A and E_B) are unable to examine the composed language H ◁ (E_A ⊎_∅ E_B). The artifacts for H, H ◁ E_A, and H ◁ E_B are each whole programs about which their developers can reason or write tests. It may not be possible to construct any trees in either individual extended language (that is, H ◁ E_A or H ◁ E_B) that demonstrate any flaws, but we may be able to do so for the composed language. Indeed, it may be difficult to precisely identify what “flaw” means, as no one has necessarily developed a semantics for the composed language specifying how the two extensions should interact. Worse still, having found a tree that reveals (what we have decided is) a flaw in the composed compiler, there may not be any obvious way to fix it. Both extensions may seem perfectly innocent in isolation, and the flaw may be the result of an unfortunate interaction that seems the fault of neither or seems to require glue code to fix. When extensions are developed independently, the differing developers may be quite willing to simply point their fingers at each other, resolving nothing.

For example, consider figure 6.1, showing two extensions to a Boolean expression language (like that of figure 3.5, though referring back is likely unnecessary). Each extension in this figure introduces some syntax and some associated synthesized attribute for analysis (the aspects for or and literal are shown, the rest omitted).

    E_A:

    production taint
    e::Expr ::= x::Expr
    { e.is_tainted = true;
      forwards to x; }

    aspect or
    e::Expr ::= l::Expr r::Expr
    { e.is_tainted = l.is_tainted || r.is_tainted; }

    aspect literal
    e::Expr ::= b::Boolean
    { e.is_tainted = false; }

    E_B:

    production identity
    e::Expr ::= x::Expr
    { forwards to x.id_transform; }

    aspect or
    e::Expr ::= l::Expr r::Expr
    { e.id_transform = or(l.id_transform, r.id_transform); }

    aspect literal
    e::Expr ::= b::Boolean
    { e.id_transform = literal(b); }

Figure 6.1: A simple example of interference. Top: E_A. Bottom: E_B.

E_A attempts to discover the use of unsanitized input (e.g., in a normal programming language, to detect SQL-injection vulnerabilities) by introducing a taint annotation on expressions, as well as an analysis for discovering whether tainted values are used in a subexpression. That is, we expect

    or(literal(false), taint(t)).is_tainted

to discover the tainted subtree, for any t. Extension E_B does something seemingly useless, but also perfectly innocent: it transforms an input tree (in the host language) to itself. It does this by way of a synthesized attribute on expressions that recursively reconstructs the same expression. (Although identity seems useless, it is the simplest of tree transformations, which are generally quite useful.)

Because this E_B transformation is only defined on host language productions (and is unaware of other extensions like E_A, and so cannot handle taint except via forwarding), this attribute has the effect of replacing forwarding productions with what they forward to. In the figure, we see that the implementors of E_A have their taint production simply forward to whatever expression it wraps. Thus, the analysis’s success depends on their is_tainted analysis being applied to the forwarding production.
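The interaction described above can be reproduced with a small executable model of figure 6.1. This is a sketch in Python rather than Silver, and it makes simplifying assumptions: trees are plain records, equations are looked up by production name, and forwarding is modelled as "if a production has no equation for an attribute, ask the tree it forwards to". The names Node, EQUATIONS, FORWARDS, attr, and or_ are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    prod: str
    children: tuple = ()


# Host language productions.
def literal(b):
    return Node("literal", (b,))


def or_(l, r):
    return Node("or", (l, r))


# Extension productions: taint comes from E_A, identity from E_B.
def taint(x):
    return Node("taint", (x,))


def identity(x):
    return Node("identity", (x,))


# What each forwarding production forwards to.
FORWARDS = {
    "taint": lambda n: n.children[0],
    "identity": lambda n: attr(n.children[0], "id_transform"),
}

# Explicit equations, keyed by (production, attribute).
EQUATIONS = {
    # E_A
    ("taint", "is_tainted"): lambda n: True,
    ("or", "is_tainted"): lambda n: (
        attr(n.children[0], "is_tainted") or attr(n.children[1], "is_tainted")
    ),
    ("literal", "is_tainted"): lambda n: False,
    # E_B
    ("or", "id_transform"): lambda n: or_(
        attr(n.children[0], "id_transform"), attr(n.children[1], "id_transform")
    ),
    ("literal", "id_transform"): lambda n: literal(n.children[0]),
}


def attr(node, name):
    """Evaluate an attribute, falling back to the forward tree if the
    production has no explicit equation for it."""
    equation = EQUATIONS.get((node.prod, name))
    if equation is not None:
        return equation(node)
    return attr(FORWARDS[node.prod](node), name)


t = literal(True)
tree = or_(literal(False), taint(t))

direct = attr(tree, "is_tainted")
transformed = attr(tree, "id_transform")
via_identity = attr(identity(tree), "is_tainted")
```

In this model, direct is True, but id_transform rebuilds the tree without the taint production, so asking for is_tainted through identity yields False. Each extension behaves as its author intended in isolation; the loss of the annotation only appears in the composition.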
