Copper - Reliably composable language extensions

appropriate for implementing compiler front-ends. That is, it is well suited to parsing (via Copper, see below), analyzing the abstract syntax tree, and transforming trees. Its current design is not well suited to non-tree analysis activities (such as iterative algorithms over graphs), however. This means our target compiler design will likely go from source code to an intermediate language and then hand off to another system to complete compilation (lower-level optimizations, code generation, etc.). This could be a compiler back-end (such as LLVM), or it could simply mean pretty-printing a program (that has had all extensions translated away) and handing it off to a traditional compiler. Thus, our target domain of extensibility is the concrete and abstract syntax of the language. When we speak of domain-specific optimizations, we expect these to be high-level optimizations applied to the AST, rather than lower-level optimizations concerned with basic blocks and instructions.

3.3 Copper

This thesis is not concerned directly with concrete syntax, thanks to an “off the shelf” solution provided by Copper [66, 22]. Copper is a context-aware scanner and parser generator that is ideal for language extension for several reasons. The context-sensitivity of its lexer allows language extensions to introduce keywords without causing clashes between extensions, as the parsing context is sufficient to disambiguate them. This is ensured by a modular determinism analysis [67, 68] that, although it places certain restrictions on the syntax an extension can introduce, confines the possible errors at composition time to conflicts between “marking terminals,” which will be explained shortly. These conflicts are of the expected kind that programmers can easily handle, as they routinely do essentially the same thing for ordinary libraries (renaming or prefixing clashing symbol names, for example, when two imported libraries export the same name).
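To make the role of parsing context concrete, the following is a minimal Python sketch of context-aware scanning; it is our own illustration, not Copper's implementation. It assumes the parser hands the scanner the set of terminals that are valid in its current state (in Copper this set comes from the LR parse table; here it is passed in by hand). The terminal names, the `scan` function, and the numeric `priority` used to break ties between equal-length matches are hypothetical simplifications.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Terminal:
    name: str
    pattern: str       # regular expression for the terminal's lexemes
    priority: int = 0  # hypothetical tie-breaker for equal-length matches


# Hypothetical terminals: a host identifier and an extension keyword.
IDENTIFIER = Terminal("Identifier", r"[A-Za-z_][A-Za-z0-9_]*")
MATCH_KW = Terminal("Match_kwd", r"match", priority=1)


def scan(text: str, pos: int, valid: set) -> tuple:
    """Return (terminal, lexeme) for the longest match at `pos`,
    considering only the terminals valid in the current parse state."""
    best = None
    for term in valid:
        m = re.compile(term.pattern).match(text, pos)
        if m is None:
            continue
        key = (len(m.group()), term.priority)
        if best is None or key > best[0]:
            best = (key, term, m.group())
    if best is None:
        raise SyntaxError(f"no valid terminal matches at position {pos}")
    return best[1], best[2]


# Where the extension construct may begin, both terminals are valid and
# the keyword wins the tie on the five-character lexeme "match".
print(scan("match x", 0, {IDENTIFIER, MATCH_KW})[0].name)  # Match_kwd
# In a host-only context the keyword is not valid, so "match" is an identifier.
print(scan("match x", 0, {IDENTIFIER})[0].name)  # Identifier
# Longest match still applies: "matchbox" is an identifier either way.
print(scan("matchbox", 0, {IDENTIFIER, MATCH_KW})[0].name)  # Identifier
```

Because the keyword is only a candidate where the parser could accept it, an extension can reserve `match` without breaking host programs that already use it as a name.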

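The modular determinism analysis works roughly as follows:

$$\Big(\forall i \in [1, n].\ \mathit{isComposable}(\mathit{CFG}^H, \mathit{CFG}^{E_i}) \land \mathit{conflictFree}(\mathit{CFG}^H \cup \{\mathit{CFG}^{E_i}\})\Big) \implies \mathit{conflictFree}(\mathit{CFG}^H \cup \{\mathit{CFG}^{E_1}, \ldots, \mathit{CFG}^{E_n}\})$$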

Here, we ensure that each extension passes the analysis ($\mathit{isComposable}(\mathit{CFG}^H, \mathit{CFG}^{E_i})$) and that the result of composing the host language with this single extension has no conflicts ($\mathit{conflictFree}(\mathit{CFG}^H \cup \{\mathit{CFG}^{E_i}\})$). As a result, we can compose any number of such extensions, the result will also have no conflicts, and we can therefore build an LR parser for it. This ability to achieve a global property by analyzing individual artifacts in isolation is why we call this analysis modular.

We apply an additional check to each extension ($\mathit{isComposable}(\mathit{CFG}^H, \mathit{CFG}^{E_i})$), which we will not describe in exhaustive technical detail. Roughly speaking, though, it imposes two restrictions at the boundaries between host and extension syntax. First, we call a production that transitions from host language syntax (an existing nonterminal in H) to extension syntax a bridge production. All bridge productions must have the form:

HostNT ::= MarkingToken <extension syntax>

The marking token signals an unambiguous transition from host syntax to extension syntax, and it must not be preceded by anything else on the right-hand side of the production. If two marking tokens clash, Copper allows the user to give them a transparent prefix that simply precedes the token, disambiguating them. The prefix is essentially a module indicator (like std::cout in C++ or os.path in Python).
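The bridge-production form, and the marking-terminal clashes that remain possible at composition time, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Copper's analysis: productions are plain records, marking terminals are identified by literal lexemes, and the extension names, terminal names, and the `prefixed` helper with its `:` separator are hypothetical stand-ins for how Copper attaches transparent prefixes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Production:
    lhs: str
    rhs: tuple  # sequence of grammar symbol names


def bridges_ok(host_nts: set, ext: list, marking: set) -> bool:
    """Every extension production with a host nonterminal on its left-hand
    side (a bridge production) must begin with a marking terminal."""
    return all(
        len(p.rhs) > 0 and p.rhs[0] in marking
        for p in ext
        if p.lhs in host_nts
    )


def marking_clashes(markers: dict) -> set:
    """Given {extension: {marking terminal: lexeme}}, return the lexemes that
    more than one extension claims as a marking terminal."""
    owners: dict = {}
    for ext_name, terms in markers.items():
        for lexeme in terms.values():
            owners.setdefault(lexeme, set()).add(ext_name)
    return {lex for lex, exts in owners.items() if len(exts) > 1}


def prefixed(prefix: str, lexeme: str) -> str:
    """Hypothetical rendering of a transparent prefix in front of a lexeme."""
    return f"{prefix}:{lexeme}"


HOST_NTS = {"Expr", "Stmt"}

# An extension whose only bridge production starts with its marking terminal.
sql_ext = [
    Production("Expr", ("Sql_kwd", "SqlQuery")),
    Production("SqlQuery", ("Select_kwd", "Expr")),
]
# A bridge production that starts with host syntax is rejected.
bad_ext = [Production("Expr", ("Expr", "Sql_kwd", "SqlQuery"))]

print(bridges_ok(HOST_NTS, sql_ext, {"Sql_kwd"}))  # True
print(bridges_ok(HOST_NTS, bad_ext, {"Sql_kwd"}))  # False

# Two independently checked extensions happen to pick the same keyword...
markers = {"sql": {"Sql_kwd": "query"}, "xml": {"Xml_kwd": "query"}}
print(marking_clashes(markers))  # {'query'}
# ...which the programmer resolves by prefixing one of them.
markers["xml"]["Xml_kwd"] = prefixed("xml", "query")
print(marking_clashes(markers))  # set()
```

The first check runs on each extension in isolation; only the second needs the whole set of extensions, and its failures are name clashes of the familiar library kind.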

The second restriction concerns the transition from extension syntax to host language syntax, or in a sense, the transition back again. Given an extension production with a host nonterminal in its signature, which looks something like:

ExtNT ::= ... HostNT T ...

We require that the terminal T already be in the follow set of HostNT in the host language. In other words, there must already be syntax in the host language where a HostNT can be followed by a T. In real-world languages, this might mean that an expression can easily be followed by a closing parenthesis ')', whereas a statement might not be permitted to be followed by anything but a ';'. This restriction ensures an unambiguous transition back from the host language to the extension without adding new terminals or rules to the host parsing table, which could cause conflicts.
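The follow-set restriction can be stated operationally. The Python sketch below is our own illustration rather than Copper's implementation: it computes FOLLOW sets for a toy host grammar with the standard fixpoint algorithm and then checks only the ExtNT ::= ... HostNT T ... shape described above. The grammar, the terminal names (`Len_kwd`, `Of_kwd`), and the function names are hypothetical, and Copper's actual analysis is considerably more involved and covers further cases (such as a host nonterminal at the end of an extension production).

```python
from collections import defaultdict

Grammar = list  # list of (lhs, rhs) pairs; rhs is a tuple of symbol names


def follow_sets(grammar: Grammar, start: str, eof: str = "$") -> dict:
    """Textbook fixpoint computation of NULLABLE, FIRST and FOLLOW."""
    nts = {lhs for lhs, _ in grammar}
    nullable = set()
    first = defaultdict(set)

    def first_of_seq(seq):
        """FIRST of a symbol sequence, and whether the whole sequence is nullable."""
        out = set()
        for sym in seq:
            if sym not in nts:  # terminal
                out.add(sym)
                return out, False
            out |= first[sym]
            if sym not in nullable:
                return out, False
        return out, True

    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            f, null = first_of_seq(rhs)
            if not f <= first[lhs]:
                first[lhs] |= f
                changed = True
            if null and lhs not in nullable:
                nullable.add(lhs)
                changed = True

    follow = {nt: set() for nt in nts}
    follow[start].add(eof)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            for i, sym in enumerate(rhs):
                if sym not in nts:
                    continue
                f, null = first_of_seq(rhs[i + 1:])
                new = f | (follow[lhs] if null else set())
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


def follow_restriction_ok(host: Grammar, start: str, ext: Grammar) -> bool:
    """A terminal T directly after a host nonterminal in an extension
    production must already be in FOLLOW(HostNT) in the host grammar."""
    host_nts = {lhs for lhs, _ in host}
    all_nts = host_nts | {lhs for lhs, _ in ext}
    follow = follow_sets(host, start)
    for _, rhs in ext:
        for sym, nxt in zip(rhs, rhs[1:]):
            if sym in host_nts and nxt not in all_nts and nxt not in follow[sym]:
                return False
    return True


# A toy host language: statements are expressions terminated by ';'.
HOST = [
    ("Stmt", ("Expr", ";")),
    ("Expr", ("Expr", "+", "Term")),
    ("Expr", ("Term",)),
    ("Term", ("Id",)),
    ("Term", ("(", "Expr", ")")),
]
print(sorted(follow_sets(HOST, "Stmt")["Expr"]))  # [')', '+', ';']

# ')' may already follow an Expr in the host, so this extension passes...
LEN_EXT = [("Expr", ("Len_kwd", "(", "Expr", ")"))]
# ...but a new terminal 'Of_kwd' after an Expr does not.
BAD_EXT = [("Expr", ("Len_kwd", "Expr", "Of_kwd", "Expr"))]
print(follow_restriction_ok(HOST, "Stmt", LEN_EXT))  # True
print(follow_restriction_ok(HOST, "Stmt", BAD_EXT))  # False
```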

For this thesis, we consider the syntactic composition problem solved by Copper, though there is undoubtedly room for improvement. Instead, we focus on providing the equivalent of the modular determinism analysis, but for composing semantics (attribute grammars), instead of composing syntax (LR grammars).

Chapter 4 Integrating AGs and functional programming

This chapter will present a subset of Silver, which we call Ag. Our aim with Ag is to distill Silver down to its essential features for two purposes. First, we wish to examine the particular issues involved in blending the features of an attribute-grammar-based language with those of modern functional programming languages (the subject of this chapter). Second, it will serve as the language on which we develop an analysis for ensuring well-defined composition in chapter 5. Ag therefore contains all the features of Silver that are interesting for these purposes.

Attribute grammars play a central role in our story of how language extension can be accomplished. But this means host language compilers must be implemented in an attribute grammar-based language. As a result, we have two major goals for this language.

First, it must be analyzable for the purposes of reasoning about extensibility and composition. Attribute grammars were originally described in a way that is agnostic to the expression language. Some attribute grammar-based languages provide this expression language by essentially hybridizing with another language (such as Java or Haskell), which makes this sort of analysis difficult. If attribute equations are simply arbitrary Java code, then we may have trouble understanding how they might interact when composed with an extension. For example, that code might involve mutation, and extensions can easily cause an unexpected mutation to happen multiple times or zero times. As a result, Ag is a programming language in its own right, in order to limit the power of expressions to what we can reason about in a modular way. But equally important, if we are to insist on host language compilers (and extensions) being written in this language, it must be a fundamentally good one. For example, some other attribute grammar-based languages chose an extremely limited expression language, leaving programmers quite constrained, especially compared to the general-purpose languages they are used to. This often makes programmers feel trapped in an inferior sort of specification language, instead of the programming language they would prefer to use to implement compilers. And so, in this chapter we reach for functional programming as a means to make this language less cumbersome and more amenable to the modern development of large programs. In particular, we develop Ag as, in essence, a (nearly) general-purpose purely functional programming language, but with attribute grammars as its fundamental data abstraction mechanism (instead of algebraic datatypes or objects).

Ag is a language with the attribute grammar features discussed in chapter 3, but with several features typical of functional languages as well. This includes the integration of a Hindley-Milner-style type system, parameterized nonterminals, an attribute grammar equivalent of generalized algebraic datatypes, functions, pattern matching on attribute grammars, and some useful ways of leveraging types that are specific to attribute grammars. We also identify and explore some difficulties in pulling off this integration, and point to a few areas in the design space that might be interesting future work.

This chapter is based on our previously published paper “Integrating functional programming and attribute grammar language features” [69]. We will begin in section 4.1 by describing our goals for the language design in more detail. Following that, in section 4.2, we will introduce the syntax and semantics of Ag, including some discussion of interesting design points and alternatives. In section 4.3, we will describe our type system for Ag, again considering some design alternatives, while highlighting points of friction and synergy between the AG and functional styles. Useful integration of pattern matching, without compromising the extensibility properties and without unexpected behavior, requires a number of careful design decisions, considered throughout section 4.4. Finally, we conclude with some consideration of the language’s extensibility properties (section 4.5) and related work (section 4.6).

4.1 Language Design Goals

Attribute grammars come with a straightforward means of composing together fragments (or modules). Because each module consists only of a collection of unordered declarations, we can take the composition of several modules to be the simple union of the sets of declarations from each module. Preserving this easy composition property is the first goal for the design of an implementation language for language extensions. The forwarding feature was invented to allow new productions to be introduced without causing problems with other extensions that introduce new attributes. While mere composition of fragments is the first hurdle, the next battle is to ensure that the resulting composed attribute grammar is well-defined. Each extended language (e.g., $H \triangleleft E_1$) may be well-defined, but composition gives us no assurances about the result, $H \triangleleft (E_1 \uplus_{\emptyset} E_2)$ (this assurance is the subject of the next chapter). Forwarding is intended to make it possible for the result to still be well-defined, as equations for new attributes on new productions can be evaluated via the forwarded-to tree.
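Both ideas in this paragraph, composition as the union of declarations and forwarding as a fallback for missing equations, can be seen in a small Python model. This is a hypothetical dynamic encoding for illustration only, not Ag or Silver syntax, and it checks nothing statically: the productions `num`, `add`, and `double`, the attributes `value` and `pp`, and the helpers `compose` and `evaluator` are all invented for the example. Extension E1 adds a production that forwards to host syntax, extension E2 adds an attribute with equations only on host productions, and neither knows about the other.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Node:
    prod: str            # name of the production that built this node
    children: tuple = ()


@dataclass
class Module:
    # (production, attribute) -> equation taking the node and an evaluator
    equations: dict = field(default_factory=dict)
    # production -> function building the tree it forwards to
    forwards: dict = field(default_factory=dict)


def compose(*modules: Module) -> Module:
    """Union of the modules' declarations, rejecting duplicate equations."""
    result = Module()
    for m in modules:
        for key in m.equations:
            if key in result.equations:
                raise ValueError(f"duplicate equation for {key}")
        result.equations.update(m.equations)
        result.forwards.update(m.forwards)
    return result


def evaluator(ag: Module) -> Callable:
    def ev(node: Node, attr: str):
        key = (node.prod, attr)
        if key in ag.equations:
            return ag.equations[key](node, ev)
        if node.prod in ag.forwards:
            # No equation here: evaluate the attribute on the forwarded-to tree.
            return ev(ag.forwards[node.prod](node), attr)
        raise LookupError(f"no equation for {attr} on {node.prod}")
    return ev


# Host: productions num and add, with a value attribute.
host = Module(equations={
    ("num", "value"): lambda n, ev: n.children[0],
    ("add", "value"): lambda n, ev: ev(n.children[0], "value") + ev(n.children[1], "value"),
})
# Extension E1: a new production, defined only by what it forwards to.
e1 = Module(forwards={"double": lambda n: Node("add", (n.children[0], n.children[0]))})
# Extension E2: a new attribute, with equations only for host productions.
e2 = Module(equations={
    ("num", "pp"): lambda n, ev: str(n.children[0]),
    ("add", "pp"): lambda n, ev: f"({ev(n.children[0], 'pp')} + {ev(n.children[1], 'pp')})",
})

ev = evaluator(compose(host, e1, e2))
tree = Node("double", (Node("num", (3,)),))
print(ev(tree, "value"))  # 6
print(ev(tree, "pp"))     # (3 + 3)
```

Evaluating `pp` on a `double` node succeeds even though no module supplies that equation, because the evaluator falls back to the forwarded-to `add` tree. Whether such fallbacks always succeed in a composed grammar is the well-definedness question taken up in the next chapter.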
