In this chapter we have toured what can and cannot be incrementalized using differentiation, and how using higher-order functions allows defining generic primitives to incrementalize.
Changes and differentiation, formally
To support incrementalization, in this chapter we introduce differentiation and formally prove it correct. That is, we prove that ⟦ D ⟦ t ⟧ ⟧ produces derivatives. As we explain in Sec. 12.1.2, derivatives transform valid input changes into valid output changes (Slogan 10.3.3). Hence, we first define what valid changes are (Sec. 12.1). As we’ll explain in Sec. 12.1.4, validity is a logical relation. As we explain in Sec. 12.2.2, our correctness theorem is the fundamental property for the validity logical relation, proved by induction over the structure of terms. Crucial definitions and derived facts are summarized in Fig. 12.1. Later, in Chapter 13, we study consequences of correctness and change operations.
All definitions and proofs in this and the next chapter are mechanized in Agda, except where otherwise indicated. To this writer, given these definitions all proofs have become straightforward and unsurprising. Nevertheless, first obtaining these proofs took a while. So we typically include full proofs. We also believe these proofs clarify the meaning and consequences of our definitions. To make proofs as accessible as possible, we try to provide enough detail that our target readers can follow along without pencil and paper, at the expense of making our proofs look longer than they would usually be. As we target readers proficient with STLC (but not necessarily proficient with logical relations), we’ll still omit routine steps needed to reason on STLC, such as typing derivations or binding issues.
Changes and validity
In this section we introduce formally (a) a description of changes; (b) a definition of which changes are valid. We have already introduced informally in Chapter 10 these notions and how they fit together. We next define the same notions formally, and deduce their key properties. Language plugins extend these definitions for base types and constants that they provide.
To formalize the notion of changes for elements of a set V, we define the notion of basic change structure on V.
Definition 12.1.1 (Basic change structures)
A basic change structure on set V, written \widetilde{V}, comprises:
(a) a change set ∆V;
(b) a ternary validity relation dv ▷ v1 ↪ v2 : V, for v1, v2 ∈ V and dv ∈ ∆V, that we read as “dv is a valid change from source v1 to destination v2 (on set V)”. □
∆ι = . . .
∆(σ → τ) = σ → ∆σ → ∆τ
(a) Change types (Definition 12.1.17).
∆ε = ε
∆(Γ, x : τ) = ∆Γ, x : τ, dx : ∆τ
(b) Change contexts (Definition 12.1.23).
D ⟦ t ⟧
D ⟦ λ(x : σ) → t ⟧ = λ(x : σ) (dx : ∆σ) → D ⟦ t ⟧
D ⟦ s t ⟧ = D ⟦ s ⟧ t D ⟦ t ⟧
D ⟦ x ⟧ = dx
D ⟦ c ⟧ = DC ⟦ c ⟧
(c) Differentiation (Definition 10.5.1).
Γ ⊢ t : τ
──────────────────── Derive
∆Γ ⊢ D ⟦ t ⟧ : ∆τ
(d) Differentiation typing (Lemma 12.2.8).
dv ▷ v1 ↪ v2 : τ   with v1, v2 : ⟦ τ ⟧, dv : ⟦ ∆τ ⟧
dv ▷ v1 ↪ v2 : ι = . . .
df ▷ f1 ↪ f2 : σ → τ = ∀da ▷ a1 ↪ a2 : σ. df a1 da ▷ f1 a1 ↪ f2 a2 : τ
dρ ▷ ρ1 ↪ ρ2 : Γ   with ρ1, ρ2 : ⟦ Γ ⟧, dρ : ⟦ ∆Γ ⟧
────────────────
∅ ▷ ∅ ↪ ∅ : ε
dρ ▷ ρ1 ↪ ρ2 : Γ      da ▷ a1 ↪ a2 : τ
────────────────────────────────────────────────────────────────
(dρ, x = a1, dx = da) ▷ (ρ1, x = a1) ↪ (ρ2, x = a2) : Γ, x : τ
(e) Validity (Definitions 12.1.21 and 12.1.24).
If Γ ⊢ t : τ then
∀dρ ▷ ρ1 ↪ ρ2 : Γ. ⟦ D ⟦ t ⟧ ⟧ dρ ▷ ⟦ t ⟧ ρ1 ↪ ⟦ t ⟧ ρ2 : τ.
(f) Correctness of D ⟦ – ⟧ (from Theorem 12.2.2).
Figure 12.1: Defining differentiation and proving it correct. The rest of this chapter explains and motivates the above definitions.
Example 12.1.2
In Examples 10.3.1 and 10.3.2 we exemplified informally change types and validity on naturals, integers and bags. We define formally basic change structures on naturals and integers. Compared to validity for integers, validity for naturals ensures that the destination v1 + dv is again a natural. For instance, given source v1 = 1, change dv = −2 is valid (with destination v2 = −1) only on integers, not on naturals. □
Definition 12.1.3 (Basic change structure on integers)
Basic change structure \widetilde{Z} on integers has integers as changes (∆Z = Z) and the following validity judgment.
──────────────────────
dv ▷ v1 ↪ v1 + dv : Z
□
Definition 12.1.4 (Basic change structure on naturals)
Basic change structure \widetilde{N} on naturals has integers as changes (∆N = Z) and the following validity judgment.
v1 + dv ⩾ 0
──────────────────────
dv ▷ v1 ↪ v1 + dv : N
□
Intuitively, we can think of a valid change from v1 to v2 as a graph edge from v1 to v2, so we’ll often use graph terminology when discussing changes. This intuition is robust and can be made fully precise.¹ More specifically, a basic change structure on V can be seen as a directed multigraph, having as vertexes the elements of V, and as edges from v1 to v2 the valid changes dv from v1 to v2. This is a multigraph because our definition allows multiple edges between v1 and v2.
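The two validity judgments and the graph reading can be tried out concretely. The following is only a sketch in Python (not the Agda mechanization): the names `valid_int`, `valid_nat` and `edges` are hypothetical, and the graph is enumerated over small finite samples rather than over all of Z or N.

```python
def valid_int(dv: int, v1: int, v2: int) -> bool:
    """dv is a valid change from v1 to v2 on integers iff v2 = v1 + dv."""
    return v2 == v1 + dv


def valid_nat(dv: int, v1: int, v2: int) -> bool:
    """On naturals, the destination v1 + dv must additionally be a natural."""
    return v1 >= 0 and v1 + dv >= 0 and v2 == v1 + dv


def edges(valid, vertices, changes):
    """Enumerate edges (v1, dv, v2) of the change graph over finite samples."""
    return [
        (v1, dv, v2)
        for v1 in vertices
        for dv in changes
        for v2 in vertices
        if valid(dv, v1, v2)
    ]


# Edges among the naturals 0, 1, 2 using changes between -2 and 2.
nat_edges = edges(valid_nat, range(3), range(-2, 3))
```

With these definitions, `valid_int(-2, 1, -1)` holds while `valid_nat(-2, 1, -1)` does not, matching Example 12.1.2.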
A change dv can be valid from v1 to v2 and also from v3 to v4, but we’ll still want to talk about the source and the destination of a change. When we talk about a change dv valid from v1 to v2, value v1 is dv’s source and v2 is dv’s destination. Hence we’ll systematically quantify theorems over valid changes dv together with their sources v1 and destinations v2, using the following notation.²
Notation 12.1.5 (Quantification over valid changes)
We write
∀dv ▷ v1 ↪ v2 : V. P,
and say “for all (valid) changes dv from v1 to v2 on set V we have P”, as a shortcut for
∀v1, v2 ∈ V, dv ∈ ∆V, if dv ▷ v1 ↪ v2 : V then P.
Since we focus on valid changes, we’ll omit the word “valid” when clear from context. In particular, a change from v1 to v2 is necessarily valid. □
¹ See for instance Robert Atkey’s blog post [Atkey, 2015] or Yufei Cai’s PhD thesis [Cai, 2017].
² If you prefer, you can tag a change with its source and destination by using a triple, and regard the whole triple (v1, dv, v2) as a change. Mathematically, this gives the correct results, but we’ll typically not use such triples as changes in programs for performance reasons.
We can have multiple basic change structures on the same set.
Example 12.1.6 (Replacement changes)
For instance, for any set V we can talk about replacement changes on V: a replacement change dv = !v2 for a value v1 : V simply specifies directly a new value v2 : V, so that !v2 ▷ v1 ↪ v2 : V. We read ! as the “bang” operator.
A basic change structure can decide to use only replacement changes (which can be appropriate for primitive types with values of constant size), or to make ∆V a sum type allowing both replacement changes and other ways to describe a change (as long as we’re using a language plugin that adds sum types). □
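As an illustration of the second option, here is a minimal Python sketch of a change set for integers that is a sum of replacement changes and additive changes. The class names `Replace` and `Add` and the function `valid` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Replace:
    """Replacement change !v2: carries the new value directly."""
    new_value: int


@dataclass(frozen=True)
class Add:
    """Additive change, as in the basic change structure on integers."""
    delta: int


# Sum type of the two ways to describe a change (assumption of this sketch).
IntChange = Union[Replace, Add]


def valid(dv: IntChange, v1: int, v2: int) -> bool:
    """Validity of dv from v1 to v2 for the combined change set."""
    if isinstance(dv, Replace):
        # A replacement change is valid from any source to its new value.
        return v2 == dv.new_value
    return v2 == v1 + dv.delta
```

Here both `Replace(6)` and `Add(1)` are valid changes from 5 to 6, which shows concretely that the same pair of values can be connected by more than one change.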
Nil changes
Just like integers have a null element 0, among changes there can be nil changes:
Definition 12.1.7 (Nil changes)
We say that dv : ∆V is a nil change for v : V if dv ▷ v ↪ v : V. □
For instance, 0 is a nil change for any integer number n. However, in general a change might be nil for an element but not for another. For instance, the replacement change !6 is a nil change on 6 but not on 5.
We’ll say more on nil changes in Sec. 12.1.2 and 13.2.1.
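The two observations above can be checked mechanically. In this Python sketch, validity relations are modeled as three-argument predicates; `valid_add`, `valid_replace` and `is_nil` are hypothetical names, and a replacement change !v2 is represented simply by the value v2.

```python
def valid_add(dv, v1, v2):
    """Additive integer change: dv is valid from v1 to v1 + dv."""
    return v2 == v1 + dv


def valid_replace(new_value, v1, v2):
    """Replacement change !new_value: valid from any v1 to new_value."""
    return v2 == new_value


def is_nil(valid, dv, v):
    """dv is a nil change for v iff dv is valid from v to v."""
    return valid(dv, v, v)
```

For example, `is_nil(valid_add, 0, n)` holds for every integer `n`, whereas `is_nil(valid_replace, 6, 6)` holds but `is_nil(valid_replace, 6, 5)` does not.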
Next, we define a basic change structure that we call \widetilde{A} → \widetilde{B} for an arbitrary function space A → B, assuming we have basic change structures for both A and B. We claimed earlier that valid function changes map valid input changes to valid output changes. We make this claim formal through the next definition.
Definition 12.1.8 (Basic change structure on A → B)
Given basic change structures on A and B, we define a basic change structure on A → B that we write \widetilde{A} → \widetilde{B} as follows:
(a) Change set ∆(A → B) is A → ∆A → ∆B.
(b) Function change df is valid from f1 to f2 (that is, df ▷ f1 ↪ f2 : A → B) if and only if, for all valid input changes da ▷ a1 ↪ a2 : A, value df a1 da is a valid output change from f1 a1 to f2 a2 (that is, df a1 da ▷ f1 a1 ↪ f2 a2 : B). □
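Clause (b) quantifies over all valid input changes, so it cannot be decided by testing, but it can be refuted or spot-checked on a finite sample. The following Python sketch does this for functions on integers with additive changes. All names (`valid_fun_on_sample`, `f1`, `f2`, `df`, `bad_df`) are hypothetical, and passing the check is only a necessary condition for validity.

```python
from itertools import product


def valid_int(dv, v1, v2):
    """Additive integer change: dv is valid from v1 to v1 + dv."""
    return v2 == v1 + dv


def valid_fun_on_sample(df, f1, f2, sample, valid_a=valid_int, valid_b=valid_int):
    """Check clause (b) on a finite sample of triples (da, a1, a2).

    For every sampled valid input change da from a1 to a2, the output
    df(a1)(da) must be a valid change from f1(a1) to f2(a2).
    """
    for da, a1, a2 in sample:
        if valid_a(da, a1, a2) and not valid_b(df(a1)(da), f1(a1), f2(a2)):
            return False
    return True


f1 = lambda x: 2 * x
f2 = lambda x: 2 * x + 1
# f2(a + da) - f1(a) = 2 * da + 1, so this df should be valid from f1 to f2.
df = lambda a: lambda da: 2 * da + 1
# This candidate forgets the constant offset between f1 and f2.
bad_df = lambda a: lambda da: 2 * da

# All valid integer changes with source and change drawn from -3..3.
sample = [(da, a1, a1 + da) for a1, da in product(range(-3, 4), repeat=2)]
```

On this sample, `df` passes the check while `bad_df` is rejected. Note that `df` takes the source `a1` before the change `da`, mirroring the change set A → ∆A → ∆B.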
Notation 12.1.9 (Applying function changes to changes)
When reading out df a1 da we’ll often talk for brevity about applying df to da, leaving da’s source a1 implicit when it can be deduced from context. □
We’ll also consider valid changes df for curried n-ary functions. We show what their validity means for curried binary functions f : A → B → C. We omit similar statements for higher arities, as they add no new ideas.
Lemma 12.1.10 (Validity on A → B → C)
For any basic change structures \widetilde{A}, \widetilde{B} and \widetilde{C}, function change df : ∆(A → B → C) is valid from f1 to f2 (that is, df ▷ f1 ↪ f2 : A → B → C) if and only if applying df to valid input changes da ▷ a1 ↪ a2 : A and db ▷ b1 ↪ b2 : B gives a valid output change
df a1 da b1 db ▷ f1 a1 b1 ↪ f2 a2 b2 : C. □
Proof. The equivalence follows from applying the definition of function validity of df twice.
That is: function change df is valid (df ▷ f1 ↪ f2 : A → (B → C)) if and only if it maps valid input change da ▷ a1 ↪ a2 : A to valid output change
df a1 da ▷ f1 a1 ↪ f2 a2 : B → C.
In turn, df a1 da is a function change, which is valid if and only if it maps valid input change db ▷ b1 ↪ b2 : B to
df a1 da b1 db ▷ f1 a1 b1 ↪ f2 a2 b2 : C
as required by the lemma. □
Among valid function changes, derivatives play a central role, especially in the statement of correctness of differentiation.
Definition 12.1.11 (Derivatives)
Given function f : A → B, function df : A → ∆A → ∆B is a derivative for f if, for all changes da from a1 to a2 on set A, change df a1 da is valid from f a1 to f a2. □
It follows that derivatives are exactly the nil function changes:
Lemma 12.1.12 (Derivatives as nil function changes)
Given function f : A → B, function df : ∆(A → B) is a derivative of f if and only if df is a nil change of f (df ▷ f ↪ f : A → B). □
Proof. First we show that nil changes are derivatives. A nil change df ▷ f ↪ f : A → B has the right type to be a derivative, because A → ∆A → ∆B = ∆(A → B). Since df is a nil change from f to f, by definition it maps valid input changes da ▷ a1 ↪ a2 : A to valid output changes df a1 da ▷ f a1 ↪ f a2 : B. Hence df is a derivative as required.
In fact, all proof steps are equivalences, and by tracing them backward, we can show that derivatives are nil changes: since a derivative df maps valid input changes da ▷ a1 ↪ a2 : A to valid output changes df a1 da ▷ f a1 ↪ f a2 : B, df is a change from f to f as required. □
Applying derivatives to nil changes gives again nil changes. This fact is useful when reasoning on derivatives. Its proof is a useful exercise on validity.
Lemma 12.1.13 (Derivatives preserve nil changes)
For any basic change structures \widetilde{A} and \widetilde{B}, if function change df : ∆(A → B) is a derivative of f : A → B (df ▷ f ↪ f : A → B) then applying df to an arbitrary input nil change da ▷ a ↪ a : A gives a nil change
df a da ▷ f a ↪ f a : B. □
Proof. Just rewrite the definition of derivatives (Lemma 12.1.12) using the definition of validity of df.
In detail, by definition of validity for function changes (Definition 12.1.8), df ▷ f1 ↪ f2 : A → B means that from da ▷ a1 ↪ a2 : A follows df a1 da ▷ f1 a1 ↪ f2 a2 : B. Just substitute f1 = f2 = f and a1 = a2 = a to get the required implication. □
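A concrete instance may help fix ideas. The Python sketch below takes f(x) = x · x on integers with additive changes; the names `f`, `df`, `valid_int` and `sample` are hypothetical. Since f(a + da) − f(a) = 2·a·da + da·da, the function `df` is a derivative of f, and the checks confirm on a finite sample that it is a nil change of f and that it maps the nil change 0 to a nil change.

```python
def valid_int(dv, v1, v2):
    """Additive integer change: dv is valid from v1 to v1 + dv."""
    return v2 == v1 + dv


def f(x):
    return x * x


def df(a):
    """Candidate derivative of f: takes the source a, then the change da."""
    return lambda da: 2 * a * da + da * da


# Sources and changes drawn from -4..4.
sample = [(a, da) for a in range(-4, 5) for da in range(-4, 5)]

# df maps every sampled valid input change to a valid output change
# from f(a) to f(a + da), i.e. it behaves as a nil change of f.
is_derivative_on_sample = all(
    valid_int(df(a)(da), f(a), f(a + da)) for a, da in sample
)

# Applying df to the nil change 0 yields a nil change for f(a).
preserves_nil_on_sample = all(
    valid_int(df(a)(0), f(a), f(a)) for a, _ in sample
)
```

As before, checking a finite sample does not prove validity; here the algebraic identity above does.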
Derivatives of curried n-ary functions f also preserve nil changes. We only state this formally for curried binary functions f : A → B → C; higher arities require no new ideas.
Lemma 12.1.14 (Derivatives preserve nil changes on A → B → C)
For any basic change structures \widetilde{A}, \widetilde{B} and \widetilde{C}, if function change df : ∆(A → B → C) is a derivative of f : A → B → C then applying df to nil changes da ▷ a ↪ a : A and db ▷ b ↪ b : B gives a nil change
df a da b db ▷ f a b ↪ f a b : C. □
Proof. Similarly to validity on A → B → C (Lemma 12.1.10), the thesis follows by applying twice the fact that derivatives preserve nil changes (Lemma 12.1.13).
In detail, since derivatives preserve nil changes, df is a derivative only if for all da ▷ a ↪ a : A we have df a da ▷ f a ↪ f a : B → C. But then, df a da is a nil change, that is, a derivative, and since it preserves nil changes, df is a derivative only if for all da ▷ a ↪ a : A and db ▷ b ↪ b : B we have df a da b db ▷ f a b ↪ f a b : C. □
Basic change structures on types
After studying basic change structures in the abstract, we apply them to the study of our object language.
For each type τ, we can define a basic change structure on domain ⟦ τ ⟧, which we write \widetilde{τ}. Language plugins must provide basic change structures for base types. To provide basic change structures for function types σ → τ, we use the earlier construction for basic change structures \widetilde{σ} → \widetilde{τ} on function spaces ⟦ σ → τ ⟧ (Definition 12.1.8).
Definition 12.1.15 (Basic change structures on types)
For each type τ we associate a basic change structure on domain ⟦ τ ⟧, written \widetilde{τ}, through the following equations:
\widetilde{ι} = . . .
\widetilde{σ → τ} = \widetilde{σ} → \widetilde{τ}
Plugin Requirement 12.1.16 (Basic change structures on base types)
For each base type ι, the plugin defines a basic change structure on ⟦ ι ⟧ that we write \widetilde{ι}. □
Crucially, for each type τ we can define a type ∆τ of changes, that we call change type, such that the change set ∆ ⟦ τ ⟧ is just the domain ⟦ ∆τ ⟧ associated to change type ∆τ: ∆ ⟦ τ ⟧ = ⟦ ∆τ ⟧. This equation allows writing change terms that evaluate directly to change values.³
Definition 12.1.17 (Change types)
The change type ∆τ of a type τ is defined as follows:
∆ι = . . .
∆(σ → τ) = σ → ∆σ → ∆τ □
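The definition of ∆τ is a structural recursion on types, which can be sketched as a small Python function over a type syntax tree. The classes `Base` and `Fun`, the function `change_type`, and the example plugin `int_changes_plugin` are all hypothetical. The base case is left to a plugin-supplied function, as in the definition, and the example plugin merely assumes that both naturals and integers have integer changes, as in Definitions 12.1.3 and 12.1.4.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Base:
    """A base type ι, identified by name."""
    name: str


@dataclass(frozen=True)
class Fun:
    """A function type σ → τ."""
    dom: "Type"
    cod: "Type"


Type = Union[Base, Fun]


def change_type(ty: Type, base_change: Callable[[Base], Type]) -> Type:
    """Compute ∆τ; base_change supplies ∆ι for base types (the plugin's job)."""
    if isinstance(ty, Base):
        return base_change(ty)
    # ∆(σ → τ) = σ → ∆σ → ∆τ
    d_dom = change_type(ty.dom, base_change)
    d_cod = change_type(ty.cod, base_change)
    return Fun(ty.dom, Fun(d_dom, d_cod))


NAT, INT = Base("Nat"), Base("Int")


def int_changes_plugin(base: Base) -> Type:
    """Hypothetical plugin: every base type here has integer changes."""
    return INT
```

For instance, `change_type(Fun(NAT, NAT), int_changes_plugin)` yields `Fun(NAT, Fun(INT, INT))`, that is, Nat → Int → Int.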
Lemma 12.1.18 (∆ and ⟦ – ⟧ commute on types)
For each type τ, change set ∆ ⟦ τ ⟧ equals the domain of change type ⟦ ∆τ ⟧. □
Proof. By induction on types. For the case of function types, we simply prove equationally that ∆ ⟦ σ → τ ⟧ = ∆(⟦ σ ⟧ → ⟦ τ ⟧) = ⟦ σ ⟧ → ∆ ⟦ σ ⟧ → ∆ ⟦ τ ⟧ = ⟦ σ ⟧ → ⟦ ∆σ ⟧ → ⟦ ∆τ ⟧ = ⟦ σ → ∆σ → ∆τ ⟧ = ⟦ ∆(σ → τ) ⟧. The case for base types is delegated to plugins (Plugin Requirement 12.1.19). □
³ Instead, in earlier proofs [Cai et al., 2014] the values of change terms were not change values, but had to be related to
Plugin Requirement 12.1.19 (Base change types)
For each base type ι, the plugin defines a change type ∆ι such that ∆ ⟦ ι ⟧ = ⟦ ∆ι ⟧. □
We refer to values of change types as change values or just changes.
We write basic change structures for types \widetilde{τ}, not \widetilde{⟦ τ ⟧}, and dv ▷ v1 ↪ v2 : τ, not dv ▷ v1 ↪ v2 : ⟦ τ ⟧. We also write consistently ⟦ ∆τ ⟧, not ∆ ⟦ τ ⟧. □
Validity as a logical relation
Next, we show an equivalent definition of validity for values of terms, directly by induction on types, as a ternary logical relation between a change, its source and destination. A typical logical relation constrains functions to map related inputs to related outputs. In a twist, validity constrains function changes to map related inputs to related outputs.
Definition 12.1.21 (Change validity)
We say that dv is a (valid) change from v1 to v2 (on type τ), and write dv ▷ v1 ↪ v2 : τ, if dv : ⟦ ∆τ ⟧, v1, v2 : ⟦ τ ⟧ and dv is a “valid” description of the difference from v1 to v2, as we define in Fig. 12.1e. □
The key equations for function types are:
∆(σ → τ) = σ → ∆σ → ∆τ
df ▷ f1 ↪ f2 : σ → τ = ∀da ▷ a1 ↪ a2 : σ. df a1 da ▷ f1 a1 ↪ f2 a2 : τ
Remark 12.1.22
We have kept repeating the idea that valid function changes map valid input changes to valid output changes. As seen in Sec. 10.4 and Lemmas 12.1.10 and 12.1.14, such valid outputs can in turn be valid function changes. We’ll see the same idea at work in Lemma 12.1.27, in the correctness proof of D ⟦ – ⟧.
As we have finally seen in this section, this definition of validity can be formalized as a logical relation, defined by induction on types. We’ll later take for granted the consequences of validity,
together with lemmas such as Lemma 12.1.10. □
Change structures on typing contexts
To describe changes to the inputs of a term, we now also introduce change contexts ∆Γ, environment changes dρ : ⟦ ∆Γ ⟧, and validity for environment changes dρ ▷ ρ1 ↪ ρ2 : Γ.
A valid environment change from ρ1 : ⟦ Γ ⟧ to ρ2 : ⟦ Γ ⟧ is an environment dρ : ⟦ ∆Γ ⟧ that extends environment ρ1 with valid changes for each entry. We first define the shape of environment changes through change contexts:
Definition 12.1.23 (Change contexts)
For each context Γ we define change context ∆Γ as follows:
∆ε = ε
∆(Γ, x : τ) = ∆Γ, x : τ, dx : ∆τ. □
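Computing ∆Γ amounts to interleaving each binding with a binding for its change. The Python sketch below represents a context as a list of (name, type) pairs. The function names `change_context` and `symbolic_delta` are hypothetical, types are plain strings, and the change-type operation is passed in as a parameter so that the example stays self-contained. Naming the change variable by prefixing "d" follows the dx convention of the definition and assumes this never clashes with an existing variable.

```python
def change_context(ctx, change_type):
    """Compute ∆Γ for a context given as a list of (name, type) pairs.

    ∆ε = ε and ∆(Γ, x : τ) = ∆Γ, x : τ, dx : ∆τ.
    """
    result = []
    for name, ty in ctx:
        result.append((name, ty))
        # Assumption: "d" + name is fresh, i.e. not already bound in ctx.
        result.append(("d" + name, change_type(ty)))
    return result


def symbolic_delta(ty):
    """Stand-in for ∆ on types that just builds a symbolic name."""
    return "Δ" + ty


example = change_context([("x", "σ"), ("y", "τ")], symbolic_delta)
```

Here `example` is the list of bindings x : σ, dx : Δσ, y : τ, dy : Δτ, and the empty context is mapped to the empty context.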
Definition 12.1.24 (Environment change validity)
We define validity for environment changes through judgment dρ ▷ ρ1 ↪ ρ2 : Γ, pronounced “dρ is an environment change from ρ1 to ρ2 (at context Γ)”, where ρ1, ρ2 : ⟦ Γ ⟧, dρ : ⟦ ∆Γ ⟧, via the following inference rules:
────────────────
∅ ▷ ∅ ↪ ∅ : ε
dρ ▷ ρ1 ↪ ρ2 : Γ      da ▷ a1 ↪ a2 : τ
────────────────────────────────────────────────────────────────
(dρ, x = a1, dx = da) ▷ (ρ1, x = a1) ↪ (ρ2, x = a2) : Γ, x : τ
□
Definition 12.1.25 (Basic change structures for contexts)
To each context Γ we associate a basic change structure on set ⟦ Γ ⟧. We take ⟦ ∆Γ ⟧ as change set
and reuse validity on environment changes (Definition 12.1.24). □
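The inference rules for environment change validity translate directly into an executable check when environments are finite maps. In this Python sketch, environments are dictionaries and the context is represented by a dictionary `validity` from each variable to the validity predicate of its type. These representations and the names `valid_env` and `valid_int` are assumptions of the sketch, as is the convention that the change for variable x is stored under the key "d" + x.

```python
def valid_int(dv, v1, v2):
    """Additive integer change: dv is valid from v1 to v1 + dv."""
    return v2 == v1 + dv


def valid_env(drho, rho1, rho2, validity):
    """Check that drho is a valid environment change from rho1 to rho2.

    For each variable x of the context, drho must repeat the source value
    rho1[x] and carry under "d" + x a valid change from rho1[x] to rho2[x].
    """
    variables = set(validity)
    if set(rho1) != variables or set(rho2) != variables:
        return False
    if set(drho) != variables | {"d" + x for x in variables}:
        return False
    return all(
        drho[x] == rho1[x] and validity[x](drho["d" + x], rho1[x], rho2[x])
        for x in variables
    )


ctx = {"x": valid_int}  # context with a single integer variable x
```

With the empty context, `valid_env({}, {}, {}, {})` holds, which corresponds to the first rule. The second rule corresponds to the per-variable condition inside `all`.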
We write dρ ▷ ρ1 ↪ ρ2 : Γ rather than dρ ▷ ρ1 ↪ ρ2 : ⟦ Γ ⟧. □
Finally, to state and prove correctness of differentiation, we are going to need to discuss function changes on term semantics. The semantics of a term Γ ⊢ t : τ is a function ⟦ t ⟧ from environments in ⟦ Γ ⟧ to values in ⟦τ ⟧. To discuss changes to ⟦ t ⟧ we need a basic change structure on function space ⟦ Γ ⟧ → ⟦τ ⟧.
The construction of basic change structures on function spaces (Definition 12.1.8) associates to each context Γ and type τ a basic change structure \widetilde{Γ} → \widetilde{τ} on function space ⟦ Γ ⟧ → ⟦ τ ⟧. □
Notation 12.1.28
As usual, we write the change set as ∆(⟦ Γ ⟧ → ⟦ τ ⟧); for validity, we write df ▷ f1 ↪ f2 : Γ, τ rather than df ▷ f1 ↪ f2 : ⟦ Γ ⟧ → ⟦ τ ⟧. □