Lessons for language embedders

In document Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation (Page 64-66)


7.2.4 Lessons for language embedders

Various domains, such as the one considered in our case study, allow powerful domain-specific optimizations. Such optimizations are often hard to express compositionally, so they cannot be performed while building the query and must instead be expressed as global optimization passes. For those domains, deep embedding is key to enabling significant optimizations. However, deep embedding requires implementing an interpreter or a compiler.
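As a rough illustration of what a global pass over a deeply embedded query looks like, the following sketch reifies a query as a tree and rewrites it before evaluation. All names (Query, Source, MapQ, FilterQ, optimize, eval, size) are hypothetical and are not the node types used in this work. The fusion rewrite shown is deliberately simple and could also be done with smart constructors; it only demonstrates the mechanics of traversing and transforming a reified query.

```scala
object DeepEmbeddingSketch {
  // Hypothetical mini-AST for collection queries (an assumption for illustration).
  sealed trait Query
  final case class Source(data: List[Int]) extends Query
  final case class MapQ(in: Query, f: Int => Int) extends Query
  final case class FilterQ(in: Query, p: Int => Boolean) extends Query

  // Global pass: optimize the input first, then fuse adjacent maps
  // and adjacent filters into a single node.
  def optimize(q: Query): Query = q match {
    case MapQ(in, g) =>
      optimize(in) match {
        case MapQ(in2, f) => MapQ(in2, f andThen g)
        case other        => MapQ(other, g)
      }
    case FilterQ(in, p2) =>
      optimize(in) match {
        case FilterQ(in2, p1) => FilterQ(in2, x => p1(x) && p2(x))
        case other            => FilterQ(other, p2)
      }
    case s: Source => s
  }

  // Straightforward interpreter for the reified query.
  def eval(q: Query): List[Int] = q match {
    case Source(d)      => d
    case MapQ(in, f)    => eval(in).map(f)
    case FilterQ(in, p) => eval(in).filter(p)
  }

  // Number of nodes, used to observe the effect of the pass.
  def size(q: Query): Int = q match {
    case Source(_)      => 1
    case MapQ(in, _)    => 1 + size(in)
    case FilterQ(in, _) => 1 + size(in)
  }
}
```

Because the whole query is available as data, optimize can inspect any part of it, which is not possible when each combinator runs eagerly as the query is built.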

Interpretation overhead is significant in Scala, even when using HOAS to take advantage of the metalevel implementation of argument access.
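To make the role of HOAS concrete, here is a minimal sketch of an interpreter in which object-level functions are represented by Scala functions over expressions, so that argument access is handled by Scala closures rather than by an explicit environment. The names Exp, Const, Plus, Fun, and App are assumptions chosen for this example and need not match the actual implementation.

```scala
object HoasSketch {
  // Each node knows how to interpret itself (hypothetical design).
  sealed trait Exp[T] { def interpret(): T }

  final case class Const[T](value: T) extends Exp[T] {
    def interpret(): T = value
  }

  final case class Plus(a: Exp[Int], b: Exp[Int]) extends Exp[Int] {
    def interpret(): Int = a.interpret() + b.interpret()
  }

  // HOAS: the body is a Scala function from expressions to expressions,
  // so there is no variable node and no environment lookup.
  final case class Fun[A, B](body: Exp[A] => Exp[B]) extends Exp[A => B] {
    def interpret(): A => B =
      (x: A) => body(Const(x)).interpret() // rebuilds and walks the body on every call
  }

  final case class App[A, B](f: Exp[A => B], arg: Exp[A]) extends Exp[B] {
    def interpret(): B = {
      val fn = f.interpret()
      fn(arg.interpret())
    }
  }
}
```

Even in this small sketch, every call of an interpreted function reconstructs the body and dispatches on each node, which hints at where the overhead comes from.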

Instead of interpreting a program, one can compile an EDSL program to Scala and load it, as done by Rompf et al. [2011]. We use this approach, but its disadvantage is the compilation delay, especially for Scala, whose compilation process is complex and time-consuming. Possible alternatives include generating bytecode directly or combining interpretation and compilation, as tiered JIT compilers do, so that only frequently executed code is compiled. We plan to investigate such alternatives in future work.
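The code-generation half of this approach can be sketched as a function from expression trees to Scala source text. The example below is only an assumption-laden illustration: the node types (Num, Var, Add, Mul) and the functions emit and emitFunction are hypothetical, and the step that compiles and loads the generated source is omitted, since it depends on having the Scala compiler available at run time.

```scala
object CodegenSketch {
  // Hypothetical first-order expression AST with named variables.
  sealed trait Expr
  final case class Num(value: Int) extends Expr
  final case class Var(name: String) extends Expr
  final case class Add(l: Expr, r: Expr) extends Expr
  final case class Mul(l: Expr, r: Expr) extends Expr

  // Emit fully parenthesized Scala source for an expression.
  def emit(e: Expr): String = e match {
    case Num(v)    => v.toString
    case Var(n)    => n
    case Add(l, r) => s"(${emit(l)} + ${emit(r)})"
    case Mul(l, r) => s"(${emit(l)} * ${emit(r)})"
  }

  // Wrap an expression as a one-parameter Scala method definition.
  def emitFunction(name: String, param: String, body: Expr): String =
    s"def $name($param: Int): Int = ${emit(body)}"
}
```

For example, emitFunction("f", "x", Add(Num(1), Mul(Var("x"), Num(2)))) yields the text def f(x: Int): Int = (1 + (x * 2)), which would then have to be compiled before it can run; that compilation is the source of the delay discussed above.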

⁸ https://issues.scala-lang.org/browse/SI-5651, reported by us.

⁹ One could of course write a specific implicit conversion for this case; however, (a, (b, c)) already requires a

Chapter 8

Related work

This chapter builds on prior work on language-integrated queries, query optimization, techniques for DSL embedding, and other works on code querying.


Language-Integrated Queries

Microsoft’s Language-Integrated Query technology (Linq) [Meijer et al., 2006; Bierman et al., 2007] is similar to our work in that it also reifies queries on collections to enable analysis and optimization. Such queries can be executed against a variety of backends (such as SQL databases or in-memory objects), and adding new backends is supported. Its implementation uses expression trees, a compiler-supported implicit conversion between expressions and their reification as a syntax tree. There are various major differences, though. First, the support for expression trees is hard-coded into the compiler. This means that the techniques are not applicable in languages that do not explicitly support expression trees. More importantly, the way expression trees are created in Linq is generic and fixed. For instance, it is not possible to create different tree nodes for method calls that are relevant to an analysis (such as the map method) than for method calls that are irrelevant for the analysis (such as the toString method). For this reason, expression trees in Linq cannot be customized to the task at hand and contain too much low-level information. It is well known that this makes it quite hard to implement programs operating on expression trees [Eini, 2011].
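The difference between a fixed, generic tree shape and a customizable one can be sketched as follows. In this hypothetical AST, an operation the analysis cares about (map) gets a dedicated node, while everything else goes through a generic method-call node. The names Ref, Call, MapNode, and countMaps are assumptions for illustration and do not reproduce Linq's or this work's actual node types.

```scala
object NodeKindsSketch {
  sealed trait Exp
  final case class Ref(name: String) extends Exp
  // Generic node: any method call, identified only by its name.
  final case class Call(receiver: Exp, method: String, args: List[Exp]) extends Exp
  // Dedicated node for an operation that is relevant to the analysis.
  final case class MapNode(coll: Exp, fun: Exp) extends Exp

  // An analysis can match the relevant node structurally and treat all
  // other calls uniformly, without inspecting method names.
  def countMaps(e: Exp): Int = e match {
    case MapNode(c, f)       => 1 + countMaps(c) + countMaps(f)
    case Call(r, _, callArgs) => countMaps(r) + callArgs.map(countMaps).sum
    case Ref(_)              => 0
  }
}
```

With only the generic Call node available, the same analysis would have to recognize map by comparing method names and checking receiver types, which is the kind of low-level detail the paragraph above refers to.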

Linq queries also cannot easily be decomposed and modularized. For instance, consider the task of refactoring the filter in the query from x in y where x.z == 1 select x into a function. Defining this function as bool comp(int v) { return v == 1; } would destroy the possibility of analyzing the filter for optimization, since the resulting expression tree would only contain a reference to an opaque function. The function could instead be declared as returning an expression tree, but then it could no longer be used in the original query, since the compiler expects an expression of type bool and not an expression tree of type bool. It could only be integrated if the expression tree of the original query were created by hand, without using the built-in support for expression trees.
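For contrast, the following sketch shows how such a refactoring can look in a library-based deep embedding, where a helper simply returns a reified expression and composes with the rest of the query. It is a simplified illustration under stated assumptions: the types Exp, Var, IntLit, Eq, Opaque, and Where, and the functions comp and equalityConstants, are hypothetical, and the query filters a collection of integers directly rather than accessing a field z.

```scala
object ModularQuerySketch {
  sealed trait Exp[T]
  final case class Var[T](name: String) extends Exp[T]
  final case class IntLit(value: Int) extends Exp[Int]
  final case class Eq(l: Exp[Int], r: Exp[Int]) extends Exp[Boolean]
  // How an ordinary host-language function appears to an analysis: a black box.
  final case class Opaque(f: Int => Boolean, arg: Exp[Int]) extends Exp[Boolean]
  final case class Where(source: Exp[List[Int]], param: Var[Int], pred: Exp[Boolean])
      extends Exp[List[Int]]

  // The refactored filter: takes and returns reified expressions,
  // so it can be reused in any query and stays analyzable.
  def comp(v: Exp[Int]): Exp[Boolean] = Eq(v, IntLit(1))

  // Toy analysis: collect constants used in equality tests
  // (for instance, as candidates for an index lookup).
  def equalityConstants(e: Exp[_]): List[Int] = e match {
    case Eq(_, IntLit(c))    => List(c)
    case Eq(IntLit(c), _)    => List(c)
    case Eq(_, _)            => Nil
    case Where(src, _, pred) => equalityConstants(src) ++ equalityConstants(pred)
    case Opaque(_, _)        => Nil // nothing can be learned from a black box
    case Var(_) | IntLit(_)  => Nil
  }
}
```

A query built as Where(y, v, comp(v)) exposes the constant 1 to the analysis, whereas Where(y, v, Opaque(_ == 1, v)) does not, even though both filters compute the same result.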

Although queries against in-memory collections could theoretically also be optimized in Linq, the standard implementation, Linq2Objects, performs no optimizations.

A few optimized embedded DSLs allow executing queries or computations on distributed clusters. DryadLINQ [Yu et al., 2008], based on Linq, optimizes queries for distributed execution. It inherits Linq’s limitations and thus does not support decomposing queries in different modules. Modularizing queries is supported instead by FlumeJava [Chambers et al., 2010], another library (in Java) for distributed query execution. However, FlumeJava cannot express many optimizations because its representation of expressions is more limited; also, its query language is more cumbersome. Both problems are rooted in Java’s limited support for embedded DSLs. Other embedded DSLs support parallel platforms such as GPUs or many-core CPUs, such as Delite [Rompf et al., 2013].

Willis et al. [2006, 2008] add first-class queries to Java through a source-to-source translator and implement a few selected optimizations, including join order optimization and incremental maintenance of query results. They investigate how well their techniques apply to Java programs, and they suggest that programmers use manual optimizations to avoid expensive constructs like nested loops. While the goal of these works is similar to ours, their implementation as an external source-to-source translator makes the adoption, extensibility, and composability of their technique difficult.

There have been many approaches for a closer integration of SQL queries into programs, such as HaskellDB [Leijen and Meijer, 1999] (which also inspired Linq), or Ferry [Grust et al., 2009] (which moves part of a program execution to a database). In Scala, there are also APIs which integrate SQL queries more closely, such as Slick.¹ Its frontend allows defining and combining type-safe queries, similarly to ours (also in the way it is implemented). However, the language for defining queries maps to SQL, so it neither supports nesting collections in other collections (a feature which simplified our example in Sec. 2.2) nor distinguishes statically between different kinds of collections, such as Set or Seq. Based on Ferry, ScalaQL [Garcia et al., 2010] extends Scala with a compiler plugin to integrate a query language on top of a relational database. The work by Spiewak and Zhao [2009] is unrelated to [Garcia et al., 2010] but also called ScalaQL. It is similar to our approach in that it also proposes to reify queries based on for-comprehensions, but it is not clear from their paper how the reification works.²
