In this section, we describe a collection API that we incrementalize (partially) in this chapter. To avoid notation conflicts, we represent lists via datatype List a, defined as follows:
data List a = Nil | Cons a (List a)
We also treat the standard mapping function map as a primitive operation. We support two restricted forms of aggregation: (a) folding over an abelian group via fold, similar to how one usually folds over a monoid; (b) list concatenation via concat. We will not discuss how to differentiate concat, as we reuse existing solutions by Firsov and Jeltsch.
singleton :: a → List a
singleton x = Cons x Nil
map :: (a → b) → List a → List b
map f Nil = Nil
map f (Cons x xs) = Cons (f x) (map f xs)
fold :: AbelianGroupChangeStruct b ⇒ List b → b
fold Nil = mempty
fold (Cons x xs) = x ⋄ fold xs -- Where ⋄ is infix for mappend.
concat :: List (List a) → List a
concat = . . .
While fold usually requires only an instance Monoid b of type class Monoid to aggregate collection elements, our variant of fold requires an instance of type class AbelianGroupChangeStruct, a subclass of Monoid. This type class is not used by fold itself, but only by its derivative, as we explain in Sec. 11.3.2; nevertheless, we add this stronger constraint to fold itself because we forbid derivatives with stronger type-class constraints. With this approach, all clients of fold can be incrementalized using differentiation.
Chapter 11. A tour of differentiation examples 71
Using those primitives, one can define further higher-order functions on collections such as concatMap, filter, foldMap. In turn, these functions form the kernel of a collection API, as studied for instance by work on the monoid comprehension calculus [Grust and Scholl, 1996a; Fegaras and Maier, 1995; Fegaras, 1999], even if they are not complete by themselves.
concatMap :: (a → List b) → List a → List b
concatMap f = concat ◦ map f
filter :: (a → Bool) → List a → List a
filter p = concatMap (λx → if p x then singleton x else Nil)
foldMap :: AbelianGroupChangeStruct b ⇒ (a → b) → List a → b
foldMap f = fold ◦ map f
In first-order DSLs such as SQL, such functionality must typically be added through separate primitives (consider for instance filter), while here we can simply define, for instance, filter on top of concatMap, and incrementalize the resulting definitions using differentiation.
Function filter uses conditionals, which we haven’t discussed yet; we show how to incrementalize filter successfully in Sec. 11.6.
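The derived operations can be tried out with the following minimal sketch. It is not the implementation used in this chapter: the text elides concat, so the definition below (through a hypothetical helper append) is the textbook one, and fromBuiltin is a hypothetical helper for writing examples. The fold-based operations are left out because they need the type classes introduced later.

```haskell
import Prelude hiding (map, concat, concatMap, filter)

-- Lists as defined in this section.
data List a = Nil | Cons a (List a)
  deriving (Eq, Show)

singleton :: a -> List a
singleton x = Cons x Nil

map :: (a -> b) -> List a -> List b
map _ Nil         = Nil
map f (Cons x xs) = Cons (f x) (map f xs)

-- Assumption: the text elides concat; this is the textbook definition,
-- written with a hypothetical helper `append`.
append :: List a -> List a -> List a
append Nil         ys = ys
append (Cons x xs) ys = Cons x (append xs ys)

concat :: List (List a) -> List a
concat Nil           = Nil
concat (Cons xs xss) = append xs (concat xss)

-- Derived operations, as in the text.
concatMap :: (a -> List b) -> List a -> List b
concatMap f = concat . map f

filter :: (a -> Bool) -> List a -> List a
filter p = concatMap (\x -> if p x then singleton x else Nil)

-- Hypothetical helper for building examples from built-in lists.
fromBuiltin :: [a] -> List a
fromBuiltin = foldr Cons Nil
```

For instance, filter even (fromBuiltin [1 .. 6]) keeps the elements 2, 4 and 6, built entirely from singleton, map and concat.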
Changes to type-class instances?
In this whole chapter, we assume that type-class instances, such as fold’s AbelianGroupChangeStruct argument, do not undergo changes. Since type-class instances are closed top-level definitions of operations and are canonical for a datatype, it is hard to imagine a change to a type-class instance. On the other hand, type-class instances can be encoded as first-class values. We can for instance imagine a fold taking a unit value and an associative operation as arguments. In such scenarios, one needs additional effort to propagate changes to operation arguments, similarly to changes to the function argument to map.
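To make the first-class encoding concrete, here is a small sketch of a fold that receives its operations as an ordinary argument. The names MonoidDict, unit, combine and foldWith are hypothetical and do not appear elsewhere in this chapter; the sketch only shows the encoding, not a derivative that reacts to changes of the dictionary.

```haskell
data List a = Nil | Cons a (List a)

-- Hypothetical record encoding a monoid as a first-class value.
data MonoidDict b = MonoidDict
  { unit    :: b
  , combine :: b -> b -> b   -- assumed associative, with `unit` as identity
  }

-- Variant of fold that takes the operations as an ordinary argument.
foldWith :: MonoidDict b -> List b -> b
foldWith d Nil         = unit d
foldWith d (Cons x xs) = combine d x (foldWith d xs)

-- Two dictionaries for the same element type.
sumDict, productDict :: MonoidDict Int
sumDict     = MonoidDict 0 (+)
productDict = MonoidDict 1 (*)
```

Because the dictionary is now a regular input, a derivative of foldWith would also receive a change to it, which is the extra effort mentioned above.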
Let’s now discuss how to incrementalize fold. We consider an oversimplified change structure that allows only two sorts of changes: prepending an element to a list or removing the list head of a non-empty list, and study how to incrementalize fold for such changes:
data ListChange a = Prepend a | Remove
instance ChangeStruct (List a) where
  type ∆(List a) = ListChange a
  xs ⊕ Prepend x = Cons x xs
  (Cons x xs) ⊕ Remove = xs
  Nil ⊕ Remove = error "Invalid change"
dfold xs (Prepend x) = . . .
Removing an element from an empty list is an invalid change, hence it is safe to give an error in that scenario as mentioned when introducing ⊕ (Sec. 10.3).
By using equational reasoning as suggested in Sec. 11.2, starting from Eq. (11.1), one can show formally that dfold xs (Prepend x) should be a change that, in a sense, “adds” x to the result using group operations:
dfold xs (Prepend x)
=∆ fold (xs ⊕ Prepend x) ⊖ fold xs
= fold (Cons x xs) ⊖ fold xs
= (x ⋄ fold xs) ⊖ fold xs
Similarly, dfold (Cons x xs) Remove should instead “subtract” x from the result:
dfold (Cons x xs) Remove
=∆ fold (Cons x xs ⊕ Remove) ⊖ fold (Cons x xs)
= fold xs ⊖ fold (Cons x xs)
= fold xs ⊖ (x ⋄ fold xs)
As discussed, using ⊖ is fast enough on, say, integers or other primitive types, but not in general. To avoid using ⊖ we must rewrite its invocation to an equivalent expression. In this scenario we can use group changes for abelian groups, and restrict fold to situations where such changes are available.
dfold :: AbelianGroupChangeStruct b ⇒ List b → ∆(List b) → ∆b
dfold xs (Prepend x) = inject x
dfold (Cons x xs) Remove = inject (invert x)
dfold Nil Remove = error "Invalid change"
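One way to check the two non-error equations of dfold is the following worked derivation. It assumes the law a ⊕ inject b = a ⋄ b that class AbelianGroupChangeStruct imposes on inject, together with the usual abelian group laws (associativity, commutativity, unit and inverse):

```latex
\begin{align*}
\mathit{fold}\ xs \oplus \mathit{inject}\ x
  &= \mathit{fold}\ xs \diamond x
  && \text{(law of inject)} \\
  &= x \diamond \mathit{fold}\ xs
  && \text{(commutativity)} \\
  &= \mathit{fold}\ (\mathit{Cons}\ x\ xs)
  && \text{(definition of fold)} \\[1ex]
\mathit{fold}\ (\mathit{Cons}\ x\ xs) \oplus \mathit{inject}\ (\mathit{invert}\ x)
  &= (x \diamond \mathit{fold}\ xs) \diamond \mathit{invert}\ x
  && \text{(definition of fold, law of inject)} \\
  &= \mathit{fold}\ xs \diamond (x \diamond \mathit{invert}\ x)
  && \text{(commutativity, associativity)} \\
  &= \mathit{fold}\ xs
  && \text{(inverse and unit laws)}
\end{align*}
```

In both cases the change returned by dfold updates fold xs to the fold of the updated list, without ever calling ⊖.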
To support group changes we define the following type classes to model abelian groups and group change structures, omitting APIs for more general groups. AbelianGroupChangeStruct only requires that group elements of type g can be converted into changes (type ∆g), allowing change type ∆g to contain other sorts of changes.
class Monoid g ⇒ AbelianGroup g where
  invert :: g → g
class (AbelianGroup a, ChangeStruct a) ⇒ AbelianGroupChangeStruct a where
  -- Inject group elements into changes. Law:
  -- a ⊕ inject b = a ⋄ b
  inject :: a → ∆a
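The pieces above can be assembled into a runnable sketch, under some assumptions: ⊕ is spelled oplus, ∆ is spelled Delta and encoded as an associated type family, and the only group instance is a hypothetical newtype Add for integers under addition whose changes are again integers. This is one possible encoding, not necessarily the one used in our implementation.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Change structures: a change type (∆ in the text) and an update operator (⊕).
class ChangeStruct a where
  type Delta a
  oplus :: a -> Delta a -> a

class Monoid g => AbelianGroup g where
  invert :: g -> g

class (AbelianGroup a, ChangeStruct a) => AbelianGroupChangeStruct a where
  -- Law: a `oplus` inject b == a <> b
  inject :: a -> Delta a

-- Assumption: integers under addition, with integer changes.
newtype Add = Add Int deriving (Eq, Show)

instance Semigroup Add where Add a <> Add b = Add (a + b)
instance Monoid Add where mempty = Add 0
instance AbelianGroup Add where invert (Add a) = Add (negate a)
instance ChangeStruct Add where
  type Delta Add = Add
  oplus = (<>)
instance AbelianGroupChangeStruct Add where inject = id

data List a = Nil | Cons a (List a) deriving (Eq, Show)
data ListChange a = Prepend a | Remove

instance ChangeStruct (List a) where
  type Delta (List a) = ListChange a
  oplus xs          (Prepend x) = Cons x xs
  oplus (Cons _ xs) Remove      = xs
  oplus Nil         Remove      = error "Invalid change"

fold :: AbelianGroupChangeStruct b => List b -> b
fold Nil         = mempty
fold (Cons x xs) = x <> fold xs

-- Derivative of fold: never traverses the tail of the base list.
dfold :: AbelianGroupChangeStruct b => List b -> Delta (List b) -> Delta b
dfold _          (Prepend x) = inject x
dfold (Cons x _) Remove      = inject (invert x)
dfold Nil        Remove      = error "Invalid change"
```

With xs holding Add 1, Add 2 and Add 3, both fold (xs `oplus` Prepend (Add 10)) and fold xs `oplus` dfold xs (Prepend (Add 10)) evaluate to Add 16, but the second expression reuses the old result instead of folding again.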
Chapter 16 discusses how we can use group changes without assuming a single group is defined on elements, but here we simply select the canonical group as chosen by type-class resolution. To use a different group, as usual, one defines a different but isomorphic type via the Haskell newtype construct. As a downside, derivatives of newtype constructors must convert changes across different representations.
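As an illustration of selecting a group through newtype, the sketch below equips the same carrier type Int with two different abelian groups. The names Add, Xor and cancel are hypothetical; xor is the standard function from Data.Bits.

```haskell
import Data.Bits (xor)

class Monoid g => AbelianGroup g where
  invert :: g -> g

-- Integers under addition.
newtype Add = Add Int deriving (Eq, Show)
instance Semigroup Add where Add a <> Add b = Add (a + b)
instance Monoid Add where mempty = Add 0
instance AbelianGroup Add where invert (Add a) = Add (negate a)

-- The same carrier under bitwise xor: every element is its own inverse.
newtype Xor = Xor Int deriving (Eq, Show)
instance Semigroup Xor where Xor a <> Xor b = Xor (a `xor` b)
instance Monoid Xor where mempty = Xor 0
instance AbelianGroup Xor where invert = id

-- Generic code selects the group through the type alone.
cancel :: AbelianGroup g => g -> g
cancel x = x <> invert x
```

Here cancel (Add 5) and cancel (Xor 5) both yield the unit of their respective group, even though the underlying integers are combined differently.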
Rewriting ⊖ away can also be possible for other specialized folds, though sometimes they can be incrementalized directly; for instance map can be written as a fold. Incrementalizing map for the insertion of x into xs requires simplifying map f (Cons x xs) ⊖ map f xs. To avoid ⊖ we can rewrite this change statically to Insert (f x); indeed, we can incrementalize map also for more realistic change structures.
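The rewriting for map can be sketched as follows for the simple change structure with Prepend and Remove. This is a sketch under two assumptions: the function f itself does not change, and Prepend plays the role of the insertion mentioned above. The names oplus, mapL and dmapL are chosen here to avoid clashing with the Prelude.

```haskell
data List a = Nil | Cons a (List a) deriving (Eq, Show)
data ListChange a = Prepend a | Remove

-- Update operator for this change structure (⊕ in the text).
oplus :: List a -> ListChange a -> List a
oplus xs          (Prepend x) = Cons x xs
oplus (Cons _ xs) Remove      = xs
oplus Nil         Remove      = error "Invalid change"

mapL :: (a -> b) -> List a -> List b
mapL _ Nil         = Nil
mapL f (Cons x xs) = Cons (f x) (mapL f xs)

-- Derivative of mapL for a fixed f: it never calls ⊖ and ignores the base list.
dmapL :: (a -> b) -> List a -> ListChange a -> ListChange b
dmapL f _ (Prepend x) = Prepend (f x)
dmapL _ _ Remove      = Remove
```

The output change is computed with a single call to f for Prepend and none for Remove, independently of the length of the base list.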
Associative tree folds
Other usages of fold over sequences produce results of small bounded size (such as integers). In this scenario, one can incrementalize the given fold efficiently using ⊖ instead of relying on group operations. For such scenarios, one can design a primitive for associative tree folds, that is, a function foldMonoid that folds over the input sequence using a monoid (that is, an associative operation with a unit). For efficient incrementalization, foldMonoid’s intermediate results should form a balanced tree and updating this tree should take logarithmic time: one approach to ensure this is to represent the input sequence itself using a balanced tree, such as a finger tree [Hinze and Paterson, 2006].
Various algorithms store intermediate results of folding inside an input balanced tree, as described by Cormen et al. [2001, Ch. 14] or by Hinze and Paterson. But intermediate results can also be stored outside the input tree, as is commonly done using self-adjusting computation [Acar, 2005, Sec. 9.1], or as can be done in our setting. While we do not use such folds, we describe the existing algorithms briefly and sketch how to integrate them in our setting.
Function foldMonoid must record the intermediate results, and the derivative dfoldMonoid must propagate input changes to affected intermediate results.
To study time complexity of input change propagation, it is useful to consider the dependency graph of intermediate results: in this graph, an intermediate result v1 has an arc to intermediate result v2 if and only if computing v1 depends on v2. To ensure dfoldMonoid is efficient, the dependency graph of intermediate results from foldMonoid must form a balanced tree of logarithmic height, so that changes to a leaf only affect a logarithmic number of intermediate results.
In contrast, implementing foldMonoid using foldr on a list produces an unbalanced graph of intermediate results. For instance, take input list xs = [1 .. 8], containing numbers from 1 to 8, and assume a given monoid. Summing them with foldr (⋄) mempty xs means evaluating
1 ⋄ (2 ⋄ (3 ⋄ (4 ⋄ (5 ⋄ (6 ⋄ (7 ⋄ (8 ⋄ mempty))))))).
Then, a change to the last element of input xs affects all intermediate results, hence incrementalization takes at least linear time. In contrast, using foldMonoid on xs should evaluate a balanced tree similar to
((1 ⋄ 2) ⋄ (3 ⋄ 4)) ⋄ ((5 ⋄ 6) ⋄ (7 ⋄ 8)),
so that any individual change to a leaf, insertion or deletion only affects O(log n) intermediate results (where n is the sequence size). Upon modifications to the tree, one must ensure that the balancing is stable [Acar, 2005, Sec. 9.1]. In other words, altering the tree (by inserting or removing an element) must only alter O(log n) nodes.
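The following sketch illustrates the idea for modifications of a single element only. It is not an implementation of foldMonoid: Tree, build and update are hypothetical names, the tree is a plain binary tree rather than a finger tree, insertions and deletions (and hence rebalancing) are not handled, and the input is assumed non-empty so that only the associative operation is needed.

```haskell
-- Hypothetical tree whose inner nodes cache the fold of their subtree.
data Tree m
  = Leaf m
  | Node Int m (Tree m) (Tree m)   -- number of leaves, cached result, children

size :: Tree m -> Int
size (Leaf _)       = 1
size (Node n _ _ _) = n

-- The fold of the whole sequence, read off in constant time.
cached :: Tree m -> m
cached (Leaf x)       = x
cached (Node _ r _ _) = r

-- Smart constructor that recomputes the cached result from the children.
node :: Semigroup m => Tree m -> Tree m -> Tree m
node l r = Node (size l + size r) (cached l <> cached r) l r

-- Build a balanced tree from a non-empty list by splitting it in half.
build :: Semigroup m => [m] -> Tree m
build []  = error "build: empty input"
build [x] = Leaf x
build xs  = node (build ls) (build rs)
  where (ls, rs) = splitAt (length xs `div` 2) xs

-- Replace the i-th leaf (0-based); only nodes on the path to it are rebuilt.
update :: Semigroup m => Int -> m -> Tree m -> Tree m
update _ y (Leaf _) = Leaf y
update i y (Node _ _ l r)
  | i < size l = node (update i y l) r
  | otherwise  = node l (update (i - size l) y r)
```

For an input of eight elements, build produces the shape ((1 ⋄ 2) ⋄ (3 ⋄ 4)) ⋄ ((5 ⋄ 6) ⋄ (7 ⋄ 8)), and update recomputes only the three cached results on the path from the modified leaf to the root.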
We have implemented associative tree folds on very simple but unbalanced tree structures; we believe they could be implemented and incrementalized over balanced trees representing sequences, such as finger trees or random access zippers [Headley and Hammer, 2016], but doing so requires transforming the implementation of these data structures to cache-transfer style (CTS) (Chapter 17). We leave this for future work, together with an automated implementation of CTS transformation.
Modifying list elements
In this section, we consider another change structure on lists that allows expressing changes to individual elements. Then, we present dmap, derivative of map for this change structure. Finally, we sketch informally the correctness of dmap, which we prove formally in Example 14.1.1.
We can then represent changes to a list (List a) as a list of changes (List (∆a)), one for each element. A list change dxs is valid for source xs if they have the same length and each element change is valid for its corresponding element. For this change structure we can define ⊕ and ⊚, but not a total ⊖: such list changes can’t express the difference between two lists of different lengths. Nevertheless, this change structure is sufficient to define derivatives that act correctly on the changes that can be expressed. We can describe this change structure in Haskell using a type-class instance for class ChangeStruct:
instance ChangeStruct (List a) where
  type ∆(List a) = List (∆a)
  Nil ⊕ Nil = Nil
  (Cons x xs) ⊕ (Cons dx dxs) = Cons (x ⊕ dx) (xs ⊕ dxs)
  _ ⊕ _ = Nil
The following dmap function is a derivative for the standard map function (included for reference) and the given change structure. We discuss derivatives for recursive functions in Sec. 15.1.
map :: (a → b) → List a → List b
map f Nil = Nil
map f (Cons x xs) = Cons (f x) (map f xs)
dmap :: (a → b) → ∆(a → b) → List a → ∆(List a) → ∆(List b)
-- A valid list change has the same length as the base list:
dmap f df Nil Nil = Nil
dmap f df (Cons x xs) (Cons dx dxs) = Cons (df x dx) (dmap f df xs dxs)
-- Remaining cases deal with invalid changes, and a dummy
-- result is sufficient.
dmap f df xs dxs = Nil
Function dmap is a correct derivative of map for this change structure, according to Slogan 10.3.3: we sketch an informal argument by induction. The equation for dmap f df Nil Nil returns Nil, a valid change from initial to updated outputs, as required. In the equation for dmap f df (Cons x xs) (Cons dx dxs) we compute changes to the head and tail of the result, to produce a change from map f (Cons x xs) to map (f ⊕ df) (Cons x xs ⊕ Cons dx dxs). To this end, (a) we use df x dx to compute a change to the head of the result, from f x to (f ⊕ df) (x ⊕ dx); (b) we use dmap f df xs dxs recursively to compute a change to the tail of the result, from map f xs to map (f ⊕ df) (xs ⊕ dxs); (c) we assemble changes to head and tail with Cons into a change from map f (Cons x xs) to map (f ⊕ df) (Cons x xs ⊕ Cons dx dxs).
In other words, dmap turns input changes to output changes correctly according to our Slogan 10.3.3: it is a correct derivative for map according to this change structure. We have reasoned informally; we formalize this style of reasoning in Sec. 14.1.
Crucially, our conclusions only hold if input changes are valid, hence term map f xs ⊕ dmap f df xs dxs is not denotationally equal to map (f ⊕ df) (xs ⊕ dxs) for arbitrary change environments: these two terms only evaluate to the same result for valid input changes.
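The correctness claim can be tested on a concrete instance with the following sketch. It rests on several assumptions: ⊕ is spelled oplus, ∆ is encoded as an associated type family Delta, integer changes are integers to add, and the function does not change, so that f ⊕ df is just f. The names mapL, dmapL, double and ddouble are hypothetical.

```haskell
{-# LANGUAGE TypeFamilies #-}

class ChangeStruct a where
  type Delta a
  oplus :: a -> Delta a -> a

-- Assumption: integer changes are integers to add.
instance ChangeStruct Int where
  type Delta Int = Int
  oplus = (+)

data List a = Nil | Cons a (List a) deriving (Eq, Show)

-- Element-wise list changes, as in the text.
instance ChangeStruct a => ChangeStruct (List a) where
  type Delta (List a) = List (Delta a)
  oplus Nil         Nil           = Nil
  oplus (Cons x xs) (Cons dx dxs) = Cons (x `oplus` dx) (xs `oplus` dxs)
  oplus _           _             = Nil   -- invalid change: dummy result

mapL :: (a -> b) -> List a -> List b
mapL _ Nil         = Nil
mapL f (Cons x xs) = Cons (f x) (mapL f xs)

-- df x dx must be a change from f x to the updated output at x `oplus` dx.
dmapL :: (a -> b) -> (a -> Delta a -> Delta b)
      -> List a -> List (Delta a) -> List (Delta b)
dmapL _ _  Nil         Nil           = Nil
dmapL f df (Cons x xs) (Cons dx dxs) = Cons (df x dx) (dmapL f df xs dxs)
dmapL _ _  _           _             = Nil   -- invalid change: dummy result

-- Example function with a hand-written derivative (the function is unchanged).
double :: Int -> Int
double x = 2 * x

ddouble :: Int -> Int -> Int
ddouble _ dx = 2 * dx
```

For a base list with elements 1, 2, 3 and the valid change with elements 10, 0, -1, both mapL double (xs `oplus` dxs) and mapL double xs `oplus` dmapL double ddouble xs dxs produce the list with elements 22, 4, 4.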
Since this definition of dmap is a correct derivative, we could use it in an incremental DSL for list manipulation, together with other primitives. Because of limitations we describe next, we will use instead improved language plugins for sequences.
We have shown simplified list changes, but they have a few limitations. Fixing those requires more sophisticated definitions.
As discussed, our list changes intentionally forbid changing the length of a list. And our definition of dmap has further limitations: a change to a list of n elements has size O(n), even when most elements do not change, and calling dmap f df on it requires n calls to df. This is only faster if df is faster than f, but adds no further speedup.
We can describe instead a change to an arbitrary list element x in xs by giving the change dx and the position of x in xs. A list change is then a sequence of such changes:
type ∆(List a) = List (AtomicChange a)
data AtomicChange a = Modify ℤ (∆a)
However, fetching the i-th list element still takes time linear in i: we need a better representation of sequences. In the next section, we switch to a change structure on sequences defined by finger trees [Hinze and Paterson, 2006], following Firsov and Jeltsch.
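A possible way to apply such positional changes is sketched below, which also makes the linear cost visible. The sketch assumes 0-based positions represented as Int, spells ⊕ as oplus and ∆ as an associated type family Delta, and uses hypothetical names applyAtomic and applyAll; out-of-range positions leave the list unchanged.

```haskell
{-# LANGUAGE TypeFamilies #-}

class ChangeStruct a where
  type Delta a
  oplus :: a -> Delta a -> a

-- Assumption: integer changes are integers to add.
instance ChangeStruct Int where
  type Delta Int = Int
  oplus = (+)

data List a = Nil | Cons a (List a) deriving (Eq, Show)

-- Modify i dx: apply change dx to the element at 0-based position i.
data AtomicChange a = Modify Int (Delta a)

-- Walks i cells before reaching the element: time linear in i.
applyAtomic :: ChangeStruct a => List a -> AtomicChange a -> List a
applyAtomic Nil _ = Nil   -- position out of range: list left unchanged
applyAtomic (Cons x xs) (Modify i dx)
  | i == 0    = Cons (x `oplus` dx) xs
  | otherwise = Cons x (applyAtomic xs (Modify (i - 1) dx))

-- Apply a sequence of atomic changes from left to right.
applyAll :: ChangeStruct a => List a -> List (AtomicChange a) -> List a
applyAll xs Nil         = xs
applyAll xs (Cons c cs) = applyAll (applyAtomic xs c) cs
```

The change itself is now proportional to the number of modified elements, but each Modify still pays for the traversal to its position.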