
integrating a language feature of interest, host language designers can be concerned with modifying the host language design to expand the space of possible extensions to encompass that feature instead. This can help focus design attention on important things, instead of on bikeshedding.

1.5 Outline of the thesis

In chapter 2 (related work), we look at some of the history of research into extensible languages. We discuss the expression problem in more detail and survey the compiler construction and language implementation literature, covering other work that our solution does not directly draw upon and how it compares to our approach. Note that while we give an overall survey of related work there, each chapter also includes its own concluding related work section, and the related work of chapter 7 (section 7.7) is also relevant to the thesis as a whole, as that chapter is a synthesis of our contributions.

In chapter 3 (background), we introduce the prior work that we do directly draw upon for this thesis, in particular attribute grammars, forwarding, the Silver programming language, and the Copper parser generator [22]. Copper is an integral part of our solution, solving the syntax side of the composition problem, while this thesis solves the semantics side. Although this thesis is concerned with the Silver programming language, it is worth emphasizing that this language is for implementing extensible compilers. The languages that we are making extensible are any language with a compiler implemented using Silver. Silver is not the language we are interested in making extensible (although it is, by virtue of being implemented in itself).

In chapter 4, we introduce the language Ag, which is the relevant subset of the Silver programming language. Ag integrates the various features of attribute grammars introduced in chapter 3, and we specify its type system. We further develop a novel semantics for pattern matching, which ensures that pattern matching and attribute evaluation behave identically. This ensures that the use of pattern matching does not compromise the extensibility of the program (which forwarding was introduced to enable). Finally, we observe that, with Ag, the language composition operators merely specify which modules are being composed, and there is no special meaning to their shape, other than implications for dependencies between modules.

In chapter 5, we develop the modular well-definedness analysis that applies to Ag modules independently of each other. Full details of the previous chapter are not strictly necessary, though one should have a good understanding of how the language Ag works, as this is an analysis over it. This analysis is central to ensuring that composition of Silver modules will succeed, and as a result ensuring composition of language extensions will succeed. The primary restriction is that language extensions cannot introduce new non-forwarding productions for existing nonterminals, thus ensuring everyone agrees on a “ground truth” set of these. The remaining restrictions are primarily concerned with other aspects of ensuring attribute grammars are well-defined, including the use of flow types for tracking dependencies.
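To make the role of forwarding in this restriction concrete, the following is a minimal toy model in Python, not Silver or Ag code. All names (`Node`, `EQUATIONS`, `attr`, `Double`) are hypothetical and chosen only for illustration. It assumes the usual reading of forwarding: when a production has no equation for an attribute, the value is taken from the tree it forwards to. Under that assumption, an extension adding syntax and an extension adding an attribute can be written independently and still compose.

```python
# Toy model of forwarding (illustrative only; not how Silver is implemented).
# Equations are registered per (production, attribute) pair.
EQUATIONS = {}


class Node:
    """Base class for all productions in this sketch."""

    def forward(self):
        # Host-language (non-forwarding) productions have no forward tree.
        return None

    def attr(self, name):
        # Prefer an equation defined directly on this production.
        equation = EQUATIONS.get((type(self), name))
        if equation is not None:
            return equation(self)
        # Otherwise fall back to the tree this production forwards to.
        target = self.forward()
        if target is None:
            raise AttributeError(
                f"no equation for '{name}' on {type(self).__name__}")
        return target.attr(name)


# --- Host language: the "ground truth" non-forwarding productions ---
class Lit(Node):
    def __init__(self, n):
        self.n = n


class Add(Node):
    def __init__(self, left, right):
        self.left, self.right = left, right


EQUATIONS[(Lit, "value")] = lambda t: t.n
EQUATIONS[(Add, "value")] = (
    lambda t: t.left.attr("value") + t.right.attr("value"))


# --- Extension A: new syntax, defined by forwarding to host syntax ---
class Double(Node):
    def __init__(self, operand):
        self.operand = operand

    def forward(self):
        return Add(self.operand, self.operand)


# --- Extension B: new attribute, with equations for host productions only ---
EQUATIONS[(Lit, "pp")] = lambda t: str(t.n)
EQUATIONS[(Add, "pp")] = (
    lambda t: f"({t.left.attr('pp')} + {t.right.attr('pp')})")


# Composition: B never heard of Double, yet pp is defined on it via forwarding.
tree = Add(Double(Lit(4)), Lit(1))
```

In this sketch, extension B only needs equations for `Lit` and `Add`. If extension A had instead introduced `Double` without a `forward`, `tree.attr("pp")` would raise, which is the kind of composition failure the restriction is meant to rule out.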

In chapter 6, we introduce the problem of interference. Apart from an understanding of how forwarding works generally, of how non-forwarding productions are special, and possibly of decorated trees (emphasized in section 4.2.1), this chapter does not require in-depth understanding of previous ones. To ensure extensions are non-interfering, we impose a requirement that they only rely on coherent properties. Given this, we show how we are able to take modular proofs of properties and extend these automatically to the composed attribute grammar, ruling out interference. Further, we show that there is a testing-based approach to finding interference problems, without requiring the extension developers to write specifications and proofs of their extensions. Despite the potential for testing to miss bugs, coherence also gives us a precise notion of blame: if an interference problem does arise in practice, it will be the fault of a single extension observable in isolation, and not a “gestalt” failure arising from the composition.

In chapter 7, we synthesize these tools into AbleC, a C compiler front-end capable of reliable composition of language extensions. For AbleC, we build several example language extensions under the constraints imposed by our analyses, and we attempt to probe the upper and lower bounds of the space of extensions we can build for AbleC. We also observe how changes to the C host language (and AbleC host language implementation) could have enabled a broader class of language extensions.

Finally, in chapter 8, we re-summarize these contributions with more detail, note important future work, and conclude.

Chapter 2

Related work

Before we move on to background on attribute grammars and the contributions of this thesis, we take a look at other language specification systems. Many of these are simply methods of building compilers, but each has different characteristics. At its simplest, for example, if we choose to represent our abstract syntax trees (ASTs) using objects, then the compiler has extensible syntax (ignoring parsing concerns), since new derived classes can be introduced. Or if the AST is represented using something like algebraic datatypes, then the compiler has extensible semantics, as arbitrary new functions can be written over the AST.
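The contrast between the two representations can be sketched in a few lines of Python. This is an illustrative toy, not code from any system surveyed in this chapter. The class and function names are hypothetical, and the algebraic datatype is only emulated with frozen dataclasses, since Python has no closed sum types.

```python
from dataclasses import dataclass


# Style 1: the AST as objects. Each node class carries its own behaviour.
class Expr:
    def evaluate(self) -> int:
        raise NotImplementedError


class Lit(Expr):
    def __init__(self, n: int):
        self.n = n

    def evaluate(self) -> int:
        return self.n


class Add(Expr):
    def __init__(self, left: Expr, right: Expr):
        self.left, self.right = left, right

    def evaluate(self) -> int:
        return self.left.evaluate() + self.right.evaluate()


# Extensible syntax: a new derived class, with no existing class edited.
# Adding a new operation (say, pretty-printing) would instead require
# touching every class above.
class Neg(Expr):
    def __init__(self, operand: Expr):
        self.operand = operand

    def evaluate(self) -> int:
        return -self.operand.evaluate()


# Style 2: the AST as an (emulated) algebraic datatype with a fixed
# set of constructors and no behaviour attached.
@dataclass(frozen=True)
class LitT:
    n: int


@dataclass(frozen=True)
class AddT:
    left: object
    right: object


def evaluate(t) -> int:
    if isinstance(t, LitT):
        return t.n
    if isinstance(t, AddT):
        return evaluate(t.left) + evaluate(t.right)
    raise TypeError(f"unknown node: {t!r}")


# Extensible semantics: a new function over the AST, with no existing
# function edited. Adding a new constructor would instead require
# touching every function that inspects the tree.
def pretty(t) -> str:
    if isinstance(t, LitT):
        return str(t.n)
    if isinstance(t, AddT):
        return f"({pretty(t.left)} + {pretty(t.right)})"
    raise TypeError(f"unknown node: {t!r}")
```

Each style makes one direction of extension cheap and the other invasive, which is the tension the expression problem names.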

Beyond just looking at these tools as ways of specifying compilers and languages, there are also many tools that were created specifically to support extensible languages. In this chapter, we will explore how each of these does not meet our goals. In some cases, there is no support for composing language extensions. In others, the kinds of language extensions that can be composed are far too limited (they do not support new analyses, for example). Finally, there are some which can attempt to compose extensions, but without any assurance of success. The result of composing several extensions together may be a compiler that crashes on some inputs.

In this chapter we take a very high-level view of each of these systems, and compare them primarily to our goals. In later chapters, as we make specific contributions, we will make more detailed comparisons where it is interesting to do so.

2.1 Extensible languages, historically

The notion of extensible languages is not a new one; it dates back to nearly the invention of compilers themselves. However, early work focused on the idea of an extensible language rather than language extensions more generally. As a direct consequence, most of the work focused on macros as the tool to achieve their goals, sometimes to the point of there being nothing but macros between parser and assembler (for example, META II [23]). These were often quite rudimentary; Algol 60, for example, simply had call by name procedures. More modern macro systems will be examined more closely in section 2.3.4.

One of the earliest pieces of relevant historical work was the introduction of the notion, not of an extensible language, but of an extensible compiler. In 1971, Scowen wrote [24]:

“The normal approach in providing an extensible programming language seems to be to design and implement a base language which has facilities enabling the programmer to define and use extensions. This paper discusses a solution using an alternative approach in which extensions are made by changing the compiler.”

They defined several goals for an extensible compiler which, in essence, covered most of our goals: new syntax, new semantics, good error messages, and complex translation.

However, there are two notable things about these goals that well characterize the historical research in extensible languages. First, the possibility of composing together multiple extensions was not considered. Language extension was considered just a way to modify a language to better suit some purpose. Second, one of the stated goals for their approach requires that “it is possible to define a subset or to change the meaning of a language.” This goal is in direct conflict with composition, and may be why composition was never considered a possibility. If one extension unexpectedly changes the meaning of the host language, we cannot realistically expect other extensions to continue to work. As a result, the goals of much early language extension research differed considerably from our own.

Another revealing piece of historical work is a paper introducing Simula 67 to the International Conference on Extensible Programming Languages in 1971. This conference took place twice, representing the peak of historical interest in extensible programming languages, and it is worth considering why interest died down subsequently. Simula was described, unexpectedly to us, as an extensible programming language [25]:

“Extensions can be divided into syntactic extensions¹ and extensions by introduction of new data types and of operations on these data types. The extensions in Simula are of the latter form. They correspond to the concept of class.”

We believe this hints at, essentially, a profoundly different reason for interest in extensible languages than modern reasons. Programming languages of this era lacked sufficient abstraction mechanisms. The problem of introducing a new type, like complex numbers, or linked lists, looked like a language extension problem, because extending the language was the only known way of introducing such things. With the development of object-orientation and other abstractions (e.g. data and procedural

¹ It is worth pointing out that new data types and operations would otherwise be syntactic extensions, so this seems more like identifying a subgroup of syntactic extension than a separate group in contrast to it.