Limitations and future work

In document Optimizing and Incrementalizing Higher-order Collection Queries by AST Transformation (Page 176-178)

In this section we describe limitations to be addressed in future work.

17.5.1 Hiding the cache type

In our experiments, functions of the same type f1, f2 :: A → B can be transformed to CTS functions f1 :: A → (B, C1), f2 :: A → (B, C2) with different cache types C1, C2, since cache types depend on the implementation; as a result, equal source types need not map to equal target types. We can fix this problem, at the cost of some runtime overhead, by using a single cache type Cache, defined as a tagged union of all cache types. If we defunctionalize function changes, we can index this cache type with tags representing functions, but other approaches are possible and we omit details. We conjecture (but have not proven) that this fix gives a type-preserving translation, and leave this question for future work.
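The tagged-union idea can be sketched as follows. This is only an illustration under assumptions: the functions incC and sqIncC, their cache contents, and the constructor names are hypothetical and do not come from our implementation.

```haskell
-- Two hypothetical CTS functions for source functions of type Int -> Int.
-- Their cache types differ: () versus Int.
incC :: Int -> (Int, ())
incC x = (x + 1, ())

sqIncC :: Int -> (Int, Int)
sqIncC x = let y = x * x in (y + 1, y) -- caches the intermediate result y

-- A single cache type, defined as a tagged union of all cache types.
data Cache
  = IncCache ()
  | SqIncCache Int
  deriving (Eq, Show)

-- Wrapping each cache in its tag gives both functions the same type.
incU :: Int -> (Int, Cache)
incU x = let (r, c) = incC x in (r, IncCache c)

sqIncU :: Int -> (Int, Cache)
sqIncU x = let (r, c) = sqIncC x in (r, SqIncCache c)

-- Both functions can now be used wherever Int -> (Int, Cache) is expected.
uniform :: [Int -> (Int, Cache)]
uniform = [incU, sqIncU]
```

The runtime overhead mentioned above shows up as the extra constructor in each cache, which a derivative would have to inspect before reading the cache.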

Chapter 17. Cache-transfer-style conversion 159

17.5.2 Nested bags

Our implementation of bags makes nested bags overly slow: we represent bags as tree-based maps from elements to multiplicity, so looking up a bag b in a bag of bags takes time proportional to the size of b. Possible solutions in this case include shredding, as done by Koch et al. [2016]. We have no such problem for nested sequences, or for other nested data that can be addressed in O(1).
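A minimal sketch of this representation, assuming Data.Map from the containers package as the tree-based map; the names Bag, fromListBag and multiplicity are hypothetical and chosen for illustration only.

```haskell
import qualified Data.Map as M

-- A bag maps each element to its multiplicity.
type Bag a = M.Map a Int

fromListBag :: Ord a => [a] -> Bag a
fromListBag xs = M.fromListWith (+) [(x, 1) | x <- xs]

-- Lookup compares the searched element against stored keys.
multiplicity :: Ord a => a -> Bag a -> Int
multiplicity x b = M.findWithDefault 0 x b

inner1, inner2 :: Bag Int
inner1 = fromListBag [1, 2, 2]
inner2 = fromListBag [3]

-- In a bag of bags, the keys are themselves maps, so each key comparison
-- performed by multiplicity may traverse the inner bag.
nested :: Bag (Bag Int)
nested = fromListBag [inner1, inner2, inner1]
```

Here multiplicity inner1 nested has to compare inner1 against stored keys element by element, which is where the dependence on the size of the inner bag comes from.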

17.5.3 Proper tail calls

CTS transformation conflicts with proper tail calls, as it turns most tail calls into non-tail calls. In A′NF syntax, tail calls such as let y = f x in g y become let y = f x in let z = g y in z, and in CTS that becomes let (y, cy) = f x in let (z, cz) = g y in (z, (cy, cz)), where the call to g is genuinely not in tail position. This prevents recursion on deeply nested data structures such as long lists. However, such programs incrementalize inefficiently if deeply nested data is affected, so it is advisable to replace lists by other sequences anyway. It is unclear whether such fixes are available for other uses of tail calls.
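The example above can be written out in Haskell as a sketch. The bodies of f and g and the choice of caching each input are assumptions made only so that the code is complete.

```haskell
-- Hypothetical base functions.
f, g :: Int -> Int
f x = x + 1
g y = y * 2

-- Direct style: the call to g is a tail call.
h :: Int -> Int
h x = let y = f x in g y

-- Hypothetical CTS versions; each one caches its input.
fC, gC :: Int -> (Int, Int)
fC x = (x + 1, x)
gC y = (y * 2, y)

-- CTS version of h: the result of gC must be taken apart and its cache
-- paired with cy, so the call to gC is no longer in tail position.
hC :: Int -> (Int, (Int, Int))
hC x =
  let (y, cy) = fC x
  in let (z, cz) = gC y
     in (z, (cy, cz))
```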

17.5.4 Pervasive replacement values

Thanks to replacement changes, we can compute a change from any v1 to any v2 in constant time. Cai et al. [2014] use a difference operator ⊖ instead, but it is hard to implement ⊖ in constant time on values of non-constant size. So our formalization and implementation allow replacement values everywhere, to ensure all computations can be incrementalized in some sense. Supporting replacement changes introduces overhead even if they are not used, because it prevents writing self-maintainable CTS derivatives. Hence, to improve performance, one should consider dropping support for replacement values and restricting supported computations. Consider a call to a binary CTS derivative dfc da db c1 after computing (y1, c1) = fc a1 b1: if db is a replacement change !b2, then dfc must compute a result afresh by invoking fc a2 b2, and a2 = a1 ⊕ da requires remembering the previous input a1 inside c1. By the same argument, c1 must also remember input b1. Worse, replacement values are only needed to handle cases where incremental computation reduces to recomputation, because the new input is completely different, or because the condition of an if expression changed. Such changes to conditions are forbidden by other works [Koch et al., 2016], we believe for similar reasons.
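The argument can be illustrated with a small sketch for integer addition. The change type IntChange and the names oplus, addC and dAddC are hypothetical; the point is only that supporting Replace forces the cache to hold both inputs, although additive changes alone would need no cache.

```haskell
-- A hypothetical change type for Int: an additive delta or a replacement.
data IntChange = Delta Int | Replace Int
  deriving (Eq, Show)

-- Change application, written ⊕ in the text.
oplus :: Int -> IntChange -> Int
oplus x (Delta d) = x + d
oplus _ (Replace y) = y

-- CTS version of addition. The cache remembers both inputs, which is only
-- necessary because replacement changes must be supported.
addC :: Int -> Int -> (Int, (Int, Int))
addC a b = (a + b, (a, b))

-- CTS derivative: takes both input changes and the old cache, and returns
-- the output change together with the updated cache.
dAddC :: IntChange -> IntChange -> (Int, Int) -> (IntChange, (Int, Int))
-- Additive changes: the output change ignores the cache, but the cache must
-- still be kept up to date for later steps.
dAddC (Delta da) (Delta db) (a1, b1) = (Delta (da + db), (a1 + da, b1 + db))
-- Any replacement: recompute afresh from the updated inputs, which requires
-- the old inputs a1 and b1 from the cache.
dAddC da db (a1, b1) =
  let a2 = a1 `oplus` da
      b2 = b1 `oplus` db
      (y2, c2) = addC a2 b2
  in (Replace y2, c2)
```

If replacement changes were dropped, the first clause alone would suffice and the cache could be empty, making the derivative self-maintainable.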

17.5.5 Recomputing updated values

In some cases, the same updated input might be recomputed more than once. If a derivative df needs some base input x (that is, if df is not self-maintainable), df's input cache will contain a copy of x, and df's output cache will contain its updated value x ⊕ dx. When all or most derivatives are self-maintainable this is convenient, because in most cases updated inputs will not need to be computed. But if most derivatives are not self-maintainable, the same updated input might be computed multiple times: specifically, if derivative dh calls functions df and dg, and both df and dg need the same base input x, caches for both df and dg will contain the updated value x ⊕ dx, computed independently. Worse, because of pervasive replacement values (Sec. 17.5.4), derivatives in our case studies tend not to be self-maintainable.

In some cases, such repeated updates should be removable by a standard optimizer after inlining and common-subexpression elimination, but it is unclear how often this happens. To solve this problem, derivatives could take and return both old inputs x1 and updated ones x2 = x1 ⊕ dx, and x2 could be computed at the single location where dx is bound. In this case, to avoid updates for unused base inputs we would have to rely more on absence analysis (Sec. 17.5.6); pruning function inputs appears easier than pruning caches. Otherwise, computations of updated inputs that are not used, in a lazy context, might cause space leaks, where thunks for x2 = x1 ⊕ dx1, x3 = x2 ⊕ dx2 and so on might accumulate and grow without bounds.
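The duplication and the proposed alternative can be sketched as follows. All names and function bodies are hypothetical, changes are simplified to additive Int deltas (so x ⊕ dx is x + dx), and the alternative derivatives only take the old and updated inputs rather than also returning them.

```haskell
-- Hypothetical base functions: f x = x^2, g x = x^3, h x = f x + g x.

-- Cached style: both fC and gC keep their own copy of x.
fC, gC :: Int -> (Int, Int)
fC x = (x * x, x)
gC x = (x * x * x, x)

-- Each derivative updates its own copy, so x1 + dx is computed twice.
dfC, dgC :: Int -> Int -> (Int, Int)
dfC dx x1 = (2 * x1 * dx + dx * dx, x1 + dx)
dgC dx x1 = (3 * x1 * x1 * dx + 3 * x1 * dx * dx + dx * dx * dx, x1 + dx)

hC :: Int -> (Int, (Int, Int))
hC x =
  let (a, ca) = fC x
      (b, cb) = gC x
  in (a + b, (ca, cb))

dhC :: Int -> (Int, Int) -> (Int, (Int, Int))
dhC dx (ca, cb) =
  let (da, ca') = dfC dx ca
      (db, cb') = dgC dx cb
  in (da + db, (ca', cb'))

-- Alternative: derivatives receive both the old input x1 and the updated
-- input x2, so no derivative recomputes x2 on its own.
dfS, dgS, dhS :: Int -> Int -> Int -> Int
dfS x1 x2 dx = (x1 + x2) * dx
dgS x1 x2 dx = (x1 * x1 + x1 * x2 + x2 * x2) * dx
dhS x1 x2 dx = dfS x1 x2 dx + dgS x1 x2 dx

-- x2 is computed once, at the single location where dx is bound.
stepH :: Int -> Int -> (Int, Int)
stepH x1 dx = let x2 = x1 + dx in (dhS x1 x2 dx, x2)
```

In a lazy language, stepH would leave x2 as a thunk if no derivative used it, which is the space-leak risk described above.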

17.5.6 Cache pruning via absence analysis

To reduce memory usage and runtime overhead, it should be possible to automatically remove from transformed programs any caches or cache fragments that are not used (directly or indirectly) to compute outputs. Liu [2000] performs this transformation on CTS programs by using absence analysis, which was later extended to higher-order languages by Sergey et al. [2014]. In lazy languages, absence analysis removes thunks that are not needed to compute the output. We conjecture that, as long as the analysis is extended to not treat caches as part of the output, it should be able to remove unused caches or inputs (assuming unused inputs exist, see Sec. 17.5.4).

17.5.7 Unary vs n-ary abstraction

We only show our transformation correct for unary functions and tuples. But many languages provide efficient support for applying curried functions such as div :: Z → Z → Z, via either the push-enter or eval-apply evaluation model. For instance, invoking div m n should not require the overhead of a function invocation for each argument [Marlow and Peyton Jones, 2006]. Naively transforming such a curried function to CTS would produce a function divc of type Z → (Z → (Z, DivC2), DivC1) with DivC1 = (), which adds excessive overhead. Based on preliminary experiments, we believe we can straightforwardly combine our approach with a push-enter or eval-apply evaluation strategy for transformed curried functions. Alternatively, we expect that translating Haskell to Strict Core [Bolingbroke and Peyton Jones, 2009] would turn Haskell into a language where function types describe their arity, which our transformation can easily handle. We leave a more thorough investigation for future work.
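The overhead of the naive translation can be seen in a small sketch. Int stands in for Z, the contents of DivC2 are an assumption, and divCArity is a hypothetical name for what an arity-aware translation might produce.

```haskell
-- Cache types for the two stages of the curried function.
type DivC1 = ()         -- partial application computes nothing worth caching
type DivC2 = (Int, Int) -- assumption: the second stage caches both inputs

-- Naive CTS version: each argument gets its own invocation, and the first
-- invocation returns a closure paired with a trivial cache.
divC :: Int -> (Int -> (Int, DivC2), DivC1)
divC m = (\n -> (m `div` n, (m, n)), ())

-- Arity-aware version: one invocation for both arguments and a single cache.
divCArity :: Int -> Int -> (Int, DivC2)
divCArity m n = (m `div` n, (m, n))

-- Calling the naive version requires unpacking the intermediate pair.
callNaive :: Int -> Int -> (Int, DivC2)
callNaive m n = let (k, _c1) = divC m in k n
```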
