In this section we state and prove correctness of differentiation, a term-to-term transformation written D ⟦ t ⟧ that produces incremental programs. We recall that all our results apply only to well-typed terms (since we formalize no other ones).

Earlier, we described how D ⟦ – ⟧ behaves through Slogan 10.3.3; here it is again, for reference:

Slogan 10.3.3

Term D ⟦ t ⟧ maps input changes to output changes. That is, D ⟦ t ⟧ applied to initial base inputs and valid input changes (from initial inputs to updated inputs) gives a valid output change from t applied to old inputs to t applied to new inputs. □

In our slogan we did not specify what we meant by inputs, though we gave examples during the discussion. We now have the notions needed for a more precise statement. Term D ⟦ t ⟧ must satisfy our slogan for two sorts of inputs:

1. Evaluating D ⟦ t ⟧ must map an environment change dρ from ρ1 to ρ2 into a valid result change ⟦ D ⟦ t ⟧ ⟧ dρ, going from ⟦ t ⟧ ρ1 to ⟦ t ⟧ ρ2.

2. As we learned since stating our slogan, validity is defined by recursion over types. If term t has type σ → τ, change ⟦ D ⟦ t ⟧ ⟧ dρ can in turn be a (valid) function change (Remark 12.1.22). Function changes map valid changes for their inputs to valid changes for their outputs.

Chapter 12. Changes and differentiation, formally 87
Instead of saying that ⟦ D ⟦ t ⟧ ⟧ maps dρ ▷ ρ1 ↪ ρ2 : Γ to a change from ⟦ t ⟧ ρ1 to ⟦ t ⟧ ρ2, we can say that function ⟦ t ⟧∆ = λρ dρ → ⟦ D ⟦ t ⟧ ⟧ dρ must be a nil change for ⟦ t ⟧, that is, a derivative for ⟦ t ⟧. We give a name to this function change, and state D ⟦ – ⟧’s correctness theorem.
Definition 12.2.1 (Incremental semantics)

We define the incremental semantics of a well-typed term Γ ⊢ t : τ in terms of differentiation as:

⟦ t ⟧∆ = (λρ1 dρ → ⟦ D ⟦ t ⟧ ⟧ dρ) : ⟦ Γ ⟧ → ⟦ ∆Γ ⟧ → ⟦ ∆τ ⟧. □

Theorem 12.2.2 (D ⟦ – ⟧ is correct)

Function ⟦ t ⟧∆ is a derivative of ⟦ t ⟧. That is, if Γ ⊢ t : τ and dρ ▷ ρ1 ↪ ρ2 : Γ then ⟦ D ⟦ t ⟧ ⟧ dρ ▷ ⟦ t ⟧ ρ1 ↪ ⟦ t ⟧ ρ2 : τ. □

For now we discuss this statement further; we defer the proof to Sec. 12.2.2.

Remark 12.2.3 (Why ⟦ – ⟧∆ ignores ρ1)

Incremental semantics ⟦ t ⟧∆ = λρ1 dρ → ⟦ D ⟦ t ⟧ ⟧ dρ can safely ignore ρ1 because ⟦ – ⟧∆ assumes that change environment dρ is valid (dρ ▷ ρ1 ↪ ρ2 : Γ), so dρ extends environment ρ1 and ρ1 provides no further information. □
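As an illustration of Theorem 12.2.2 and Remark 12.2.3, the following Python sketch checks the statement for one open term on sample inputs. It assumes that integer changes are integers and that dv ▷ v1 ↪ v2 : Int holds iff v1 + dv = v2. The term t = x * y, the hand-written function `dt` standing in for ⟦ D ⟦ t ⟧ ⟧, and all function names are hypothetical choices made for this example; they are not part of the formalization.

```python
from itertools import product


def valid_int(dv, v1, v2):
    """Assumed validity at Int: dv goes from v1 to v2 iff v1 + dv == v2."""
    return v1 + dv == v2


def t(rho):
    """[[ t ]] rho for the open term t = x * y in context x : Int, y : Int."""
    return rho["x"] * rho["y"]


def dt(drho):
    """Hand-written stand-in for [[ D[[ t ]] ]] drho.

    drho extends the base environment rho1: it binds x, y and also dx, dy.
    """
    x, dx, y, dy = drho["x"], drho["dx"], drho["y"], drho["dy"]
    return x * dy + dx * y + dx * dy


def incremental_semantics(rho1, drho):
    """[[ t ]]^Delta = lambda rho1 drho -> [[ D[[ t ]] ]] drho; rho1 is never read."""
    return dt(drho)


def theorem_holds_on(samples):
    """Check that each sampled valid drho is mapped to a valid result change."""
    for x1, dx, y1, dy in samples:
        rho1 = {"x": x1, "y": y1}
        rho2 = {"x": x1 + dx, "y": y1 + dy}
        drho = {"x": x1, "dx": dx, "y": y1, "dy": dy}  # valid from rho1 to rho2
        if not valid_int(incremental_semantics(rho1, drho), t(rho1), t(rho2)):
            return False
    return True


all_valid = theorem_holds_on(product(range(-3, 4), repeat=4))
```

Since `drho` already contains the bindings of `rho1`, passing `rho1` separately adds nothing, which is the point of Remark 12.2.3. A check on finitely many samples is of course only a sanity test, not a proof.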

Remark 12.2.4 (Term derivatives)

In Chapter 10, we suggested that D ⟦ t ⟧ only produced a derivative for closed terms, not for open ones. But ⟦ t ⟧∆ = λρ dρ → ⟦ D ⟦ t ⟧ ⟧ dρ is always a nil change and derivative of ⟦ t ⟧ for any Γ ⊢ t : τ. There is no contradiction, because the value of D ⟦ t ⟧ is ⟦ D ⟦ t ⟧ ⟧ dρ, which is only a nil change if dρ is a nil change as well. In particular, for closed terms (Γ = ε), dρ must equal the empty environment, hence dρ is a nil change. If τ is a function type, df = ⟦ D ⟦ t ⟧ ⟧ dρ accepts further inputs; since df must be a valid function change, it will also map them to valid outputs as required by our Slogan 10.3.3. Finally, if Γ = ε and τ is a function type, then df = ⟦ D ⟦ t ⟧ ⟧ is a derivative of f = ⟦ t ⟧.

We summarize this remark with the following definition and corollary. □

Definition 12.2.5 (Derivatives of terms)

For all closed terms of function type ⊢ t : σ → τ, we call D ⟦ t ⟧ the (term) derivative of t. □

Corollary 12.2.6 (Term derivatives evaluate to derivatives)

For all closed terms of function type ⊢ t : σ → τ, function ⟦ D ⟦ t ⟧ ⟧ is a derivative of ⟦ t ⟧. □

Proof. Because ⟦ t ⟧∆ is a derivative (Theorem 12.2.2), and applying derivative ⟦ t ⟧∆ to a nil change gives a derivative (Lemma 12.1.13). □
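To see what Corollary 12.2.6 asks of a term derivative, here is a small Python sketch for a closed term of type Int → Int. It assumes the same integer validity as above (v1 + dv = v2); the example term λx → x * x, the candidate derivatives `df` and `not_df`, and the helper `is_derivative` are hypothetical and only illustrate the specification on sample inputs.

```python
from itertools import product


def valid_int(dv, v1, v2):
    """Assumed validity at Int: dv goes from v1 to v2 iff v1 + dv == v2."""
    return v1 + dv == v2


def is_derivative(df, f, samples):
    """Check on samples that df is a nil change for f at type Int -> Int.

    Every valid input change da from a1 to a1 + da must be mapped to a
    valid output change from f(a1) to f(a1 + da).
    """
    return all(valid_int(df(a1)(da), f(a1), f(a1 + da)) for a1, da in samples)


def f(x):
    """[[ t ]] for the closed term t = \\x -> x * x (written here by hand)."""
    return x * x


def df(x):
    """Candidate for [[ D[[ t ]] ]]: takes the base input, then its change."""
    return lambda dx: 2 * x * dx + dx * dx


def not_df(x):
    """A first-order approximation only; it drops the dx * dx term."""
    return lambda dx: 2 * x * dx


samples = list(product(range(-3, 4), repeat=2))
df_ok = is_derivative(df, f, samples)
not_df_ok = is_derivative(not_df, f, samples)
```

Here `df` passes the check while `not_df` fails as soon as the input change is nonzero: derivatives in this sense must be exact on valid changes, not approximations.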

Remark 12.2.7

We typically talk about a derivative of a function value f : A → B, not the derivative, since multiple different functions can satisfy the specification of derivatives. We talk about the derivative to refer to a canonically chosen derivative. For terms and their semantics, the canonical derivative is the one produced by differentiation. For language primitives, the canonical derivative is the one chosen by the language plugin under consideration. □

Lemma 12.2.8 (Typing of D ⟦ – ⟧)

Typing rule

Γ ⊢ t : τ
──────────────────── Derive
∆Γ ⊢ D ⟦ t ⟧ : ∆τ

is derivable. □

Once we define ⊕ in the next chapter, we will be able to relate ⊕ to validity by proving Lemma 13.4.4, which we state in advance here:

Lemma 13.4.4 (⊕ agrees with validity)

If dv ▷ v1 ↪ v2 : τ then v1 ⊕ dv = v2. □

Hence, updating base result ⟦ t ⟧ ρ1 by change ⟦ D ⟦ t ⟧ ⟧ dρ via ⊕ gives the updated result ⟦ t ⟧ ρ2.

Corollary 13.4.5 (D ⟦ – ⟧ is correct, corollary)

If Γ ⊢ t : τ and dρ ▷ ρ1 ↪ ρ2 : Γ then ⟦ t ⟧ ρ1 ⊕ ⟦ D ⟦ t ⟧ ⟧ dρ = ⟦ t ⟧ ρ2. □

We anticipate the proof of this corollary:

Proof. First, differentiation is correct (Theorem 12.2.2), so under the hypotheses

⟦ D ⟦ t ⟧ ⟧ dρ ▷ ⟦ t ⟧ ρ1 ↪ ⟦ t ⟧ ρ2 : τ;

that judgement implies the thesis

⟦ t ⟧ ρ1 ⊕ ⟦ D ⟦ t ⟧ ⟧ dρ = ⟦ t ⟧ ρ2

because ⊕ agrees with validity (Lemma 13.4.4). □
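The relation between ⊕ and validity can be sketched in Python before ⊕ is formally defined. The definitions below are assumptions made for this example only: at Int, ⊕ is taken to be addition, and at function types f ⊕ df is taken to apply df to the base input and its nil change 0. The functions `f1`, `f2` and the change `df` between them are hypothetical.

```python
from itertools import product


def valid_int(dv, v1, v2):
    """Assumed validity at Int: dv goes from v1 to v2 iff v1 + dv == v2."""
    return v1 + dv == v2


def oplus_int(v, dv):
    """Assumed update at Int: v (+) dv is integer addition."""
    return v + dv


def oplus_fun(f, df):
    """Assumed update at Int -> Int: (f (+) df) a = f a (+) df a 0,
    where 0 is the nil change of the integer a."""
    return lambda a: oplus_int(f(a), df(a)(0))


def valid_fun(df, f1, f2, samples):
    """Validity at Int -> Int on samples: valid input changes from a1 to a2
    are mapped to valid output changes from f1(a1) to f2(a2)."""
    return all(valid_int(df(a1)(da), f1(a1), f2(a1 + da)) for a1, da in samples)


def f1(a):
    return 2 * a


def f2(a):
    return 2 * a + 3


def df(a):
    """A change from f1 to f2: accounts for the input change and the new +3."""
    return lambda da: 2 * da + 3


samples = list(product(range(-3, 4), repeat=2))
df_is_valid = valid_fun(df, f1, f2, samples)
updated = oplus_fun(f1, df)
agrees = all(updated(a) == f2(a) for a in range(-3, 4))
```

Under these assumptions, a change that is valid from `f1` to `f2` also updates `f1` to a function that agrees with `f2` on the sampled inputs, which is the shape of Lemma 13.4.4 at a function type.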

### 12.2.1 Plugin requirements

Differentiation is extended by plugins on constants, so plugins must prove their extensions correct.

Plugin Requirement 12.2.9 (Typing of DC ⟦ – ⟧)

For all ⊢C c : τ, the plugin defines DC ⟦ c ⟧ satisfying ⊢ DC ⟦ c ⟧ : ∆τ. □

Plugin Requirement 12.2.10 (Correctness of DC ⟦ – ⟧)

For all ⊢C c : τ, DC ⟦ c ⟧ is a derivative for ⟦ c ⟧. □

Since constants are typed in the empty context, and the only change for an empty environment is an empty environment, Plugin Requirement 12.2.10 means that for all ⊢C c : τ we have

DC ⟦ c ⟧ ▷ ⟦ c ⟧ ↪ ⟦ c ⟧ : τ.
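As a sketch of what a plugin has to establish, the following Python code tests Plugin Requirement 12.2.10 for one hypothetical constant, curried integer addition `add`, with candidate derivative `dadd`. The validity checker unfolds validity by recursion over types, as in the text, but only over a finite sample of integer changes; the type encoding, `sample_changes`, and the assumption that integer validity means v1 + dv = v2 are all choices made for this example.

```python
from itertools import product

INT = "int"


def fun(sigma, tau):
    """Encode the function type sigma -> tau as a tuple."""
    return ("fun", sigma, tau)


def sample_changes(ty):
    """Finite sample of valid changes (da, a1, a2); only sketched at Int."""
    if ty != INT:
        raise NotImplementedError("sampling is only sketched for Int arguments")
    return [(da, a1, a1 + da) for a1, da in product(range(-2, 3), repeat=2)]


def valid(ty, dv, v1, v2):
    """Check dv |> v1 -> v2 : ty by recursion on ty, over sampled inputs."""
    if ty == INT:
        return v1 + dv == v2
    _, sigma, tau = ty
    return all(
        valid(tau, dv(a1)(da), v1(a1), v2(a2))
        for da, a1, a2 in sample_changes(sigma)
    )


def add(x):
    """Hypothetical constant add : Int -> Int -> Int."""
    return lambda y: x + y


def dadd(x):
    """Candidate plugin derivative: each base argument is followed by its change."""
    return lambda dx: lambda y: lambda dy: dx + dy


def bad_dadd(x):
    """Incorrect candidate: ignores the change of the second argument."""
    return lambda dx: lambda y: lambda dy: dx


ADD_TYPE = fun(INT, fun(INT, INT))
dadd_ok = valid(ADD_TYPE, dadd, add, add)
bad_dadd_ok = valid(ADD_TYPE, bad_dadd, add, add)
```

The requirement is exactly that the derivative is a nil change, so both endpoints passed to `valid` are `add` itself. The incorrect candidate is rejected because it fails whenever the second argument changes.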

### 12.2.2 Correctness proof

Definition 10.5.1 (Differentiation)

Differentiation is the following term transformation:

D ⟦ λ(x : σ) → t ⟧ = λ(x : σ) (dx : ∆σ) → D ⟦ t ⟧

D ⟦ s t ⟧ = D ⟦ s ⟧ t D ⟦ t ⟧

D ⟦ x ⟧ = dx

D ⟦ c ⟧ = DC ⟦ c ⟧

where DC ⟦ c ⟧ defines differentiation on primitives and is provided by language plugins (see Appendix A.2.5), and dx stands for a variable generated by prefixing x’s name with d, so that D ⟦ y ⟧ = dy and so on. □
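The four equations of Definition 10.5.1 translate almost directly into a recursive function on syntax trees. The Python sketch below is a minimal illustration under several assumptions: terms are untyped (type annotations on binders are dropped), the only constant is a hypothetical `add` with plugin derivative `dadd`, and variable names are assumed never to start with `d` already, so prefixing cannot clash. The class names, `derive`, `evaluate`, and the two tables are all invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: object


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


@dataclass(frozen=True)
class Const:
    name: str


# Hypothetical plugin data: derivative terms and meanings of constants.
PLUGIN_DERIVATIVES = {"add": Const("dadd")}
CONSTANTS = {
    "add": lambda x: lambda y: x + y,
    "dadd": lambda x: lambda dx: lambda y: lambda dy: dx + dy,
}


def derive(t):
    """Differentiation D[[ t ]], one branch per equation of the definition."""
    if isinstance(t, Lam):    # D[[ \x -> t ]] = \x dx -> D[[ t ]]
        return Lam(t.param, Lam("d" + t.param, derive(t.body)))
    if isinstance(t, App):    # D[[ s t ]] = D[[ s ]] t D[[ t ]]
        return App(App(derive(t.fn), t.arg), derive(t.arg))
    if isinstance(t, Var):    # D[[ x ]] = dx
        return Var("d" + t.name)
    if isinstance(t, Const):  # D[[ c ]] = DC[[ c ]], supplied by the plugin
        return PLUGIN_DERIVATIVES[t.name]
    raise TypeError(f"not a term: {t!r}")


def evaluate(t, env):
    """A standard environment-based evaluator, used for both t and D[[ t ]]."""
    if isinstance(t, Var):
        return env[t.name]
    if isinstance(t, Lam):
        return lambda a: evaluate(t.body, {**env, t.param: a})
    if isinstance(t, App):
        return evaluate(t.fn, env)(evaluate(t.arg, env))
    if isinstance(t, Const):
        return CONSTANTS[t.name]
    raise TypeError(f"not a term: {t!r}")


# Open term t = \x -> add x y, with y free.
term = Lam("x", App(App(Const("add"), Var("x")), Var("y")))
derived = derive(term)

f1 = evaluate(term, {"y": 10})               # [[ t ]] rho1
f2 = evaluate(term, {"y": 15})               # [[ t ]] rho2
df = evaluate(derived, {"y": 10, "dy": 5})   # [[ D[[ t ]] ]] drho
```

With these definitions, `derived` is the term λx dx → dadd x dx y dy, and `df(1)(2)` is a change from `f1(1)` to `f2(3)`, in line with Theorem 12.2.2 when integer validity is read as v1 + dv = v2.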

Before correctness, we prove Lemma 12.2.8:

Lemma 12.2.8 (Typing of D ⟦ – ⟧)

Typing rule

Γ ⊢ t : τ
──────────────────── Derive
∆Γ ⊢ D ⟦ t ⟧ : ∆τ

is derivable. □

Proof. The thesis can be proven by induction on the typing derivation Γ ⊢ t : τ. The case for constants is delegated to plugins in Plugin Requirement 12.2.9. □

We prove Theorem 12.2.2 using a typical logical relations strategy. We proceed by induction on term t and prove for each case that if D ⟦ – ⟧ preserves validity on subterms of t, then D ⟦ t ⟧ also preserves validity. Hence, if the input environment change dρ is valid, then the result of differentiation evaluates to a valid change ⟦ D ⟦ t ⟧ ⟧ dρ.

Readers familiar with logical relations proofs should be able to reproduce this proof on their own, as it is rather standard once one uses the given definitions. In particular, this proof closely resembles the proof of the abstraction theorem or relational parametricity (as given by Wadler [1989, Sec. 6] or by Bernardy and Lasson [2011, Sec. 3.3, Theorem 3]) and the proof of the fundamental theorem of logical relations by Statman [1985].

Nevertheless, we spell this proof out, and use it to motivate how D ⟦ – ⟧ is defined, more formally than we did in Sec. 10.5. For each case, we first give a short proof sketch, and then redo the proof in more detail to make the proof easier to follow.

Theorem 12.2.2 (D ⟦ – ⟧ is correct)

Function ⟦ t ⟧∆ is a derivative of ⟦ t ⟧. That is, if Γ ⊢ t : τ and dρ ▷ ρ1 ↪ ρ2 : Γ then ⟦ D ⟦ t ⟧ ⟧ dρ ▷ ⟦ t ⟧ ρ1 ↪ ⟦ t ⟧ ρ2 : τ. □

Proof. By induction on typing derivation Γ ⊢ t : τ .

• Case Γ ⊢ x : τ. The thesis is that ⟦ D ⟦ x ⟧ ⟧ is a derivative for ⟦ x ⟧, that is ⟦ D ⟦ x ⟧ ⟧ dρ ▷ ⟦ x ⟧ ρ1 ↪ ⟦ x ⟧ ρ2 : τ. Since dρ is a valid environment change from ρ1 to ρ2, ⟦ dx ⟧ dρ is a valid change from ⟦ x ⟧ ρ1 to ⟦ x ⟧ ρ2. Hence, defining D ⟦ x ⟧ = dx satisfies our thesis.

• Case Γ ⊢ s t : τ. The thesis is that ⟦ D ⟦ s t ⟧ ⟧ is a derivative for ⟦ s t ⟧, that is ⟦ D ⟦ s t ⟧ ⟧ dρ ▷ ⟦ s t ⟧ ρ1 ↪ ⟦ s t ⟧ ρ2 : τ. By inversion of typing, there is some type σ such that Γ ⊢ s : σ → τ and Γ ⊢ t : σ.

To prove the thesis, in short, you can apply the inductive hypothesis to s and t, obtaining respectively that ⟦ D ⟦ s ⟧ ⟧ and ⟦ D ⟦ t ⟧ ⟧ are derivatives for ⟦ s ⟧ and ⟦ t ⟧. In particular, ⟦ D ⟦ s ⟧ ⟧ evaluates to a validity-preserving function change. Term D ⟦ s t ⟧, that is D ⟦ s ⟧ t D ⟦ t ⟧, applies validity-preserving function D ⟦ s ⟧ to valid input change D ⟦ t ⟧, and this produces a valid change for s t as required.

In detail, our thesis is that for all dρ ▷ ρ1 ↪ ρ2 : Γ we have

⟦ D ⟦ s t ⟧ ⟧ dρ ▷ ⟦ s t ⟧ ρ1 ↪ ⟦ s t ⟧ ρ2 : τ,

where ⟦ s t ⟧ ρ = (⟦ s ⟧ ρ) (⟦ t ⟧ ρ) and

⟦ D ⟦ s t ⟧ ⟧ dρ
= ⟦ D ⟦ s ⟧ t D ⟦ t ⟧ ⟧ dρ
= (⟦ D ⟦ s ⟧ ⟧ dρ) (⟦ t ⟧ dρ) (⟦ D ⟦ t ⟧ ⟧ dρ)
= (⟦ D ⟦ s ⟧ ⟧ dρ) (⟦ t ⟧ ρ1) (⟦ D ⟦ t ⟧ ⟧ dρ)

The last step relies on ⟦ t ⟧ dρ = ⟦ t ⟧ ρ1. Since weakening preserves meaning (Lemma A.2.8), this follows because dρ : ⟦ ∆Γ ⟧ extends ρ1 : ⟦ Γ ⟧, and t can be typed in context Γ.

Our thesis becomes

⟦ D ⟦ s ⟧ ⟧ dρ (⟦ t ⟧ ρ1) (⟦ D ⟦ t ⟧ ⟧ dρ) ▷ ⟦ s ⟧ ρ1 (⟦ t ⟧ ρ1) ↪ ⟦ s ⟧ ρ2 (⟦ t ⟧ ρ2) : τ.

By the inductive hypothesis on s and t we have

⟦ D ⟦ s ⟧ ⟧ dρ ▷ ⟦ s ⟧ ρ1 ↪ ⟦ s ⟧ ρ2 : σ → τ
⟦ D ⟦ t ⟧ ⟧ dρ ▷ ⟦ t ⟧ ρ1 ↪ ⟦ t ⟧ ρ2 : σ.

Since ⟦ s ⟧ is a function, its validity means

∀da ▷ a1 ↪ a2 : σ. ⟦ D ⟦ s ⟧ ⟧ dρ a1 da ▷ ⟦ s ⟧ ρ1 a1 ↪ ⟦ s ⟧ ρ2 a2 : τ.

Instantiating in this statement the hypothesis da ▷ a1 ↪ a2 : σ by ⟦ D ⟦ t ⟧ ⟧ dρ ▷ ⟦ t ⟧ ρ1 ↪ ⟦ t ⟧ ρ2 : σ gives the thesis.

• Case Γ ⊢ λx → t : σ → τ. By inversion of typing, Γ, x : σ ⊢ t : τ. By typing of D ⟦ – ⟧ you can show that

∆Γ, x : σ, dx : ∆σ ⊢ D ⟦ t ⟧ : ∆τ.

In short, our thesis is that ⟦ λx → t ⟧∆ = λρ1 dρ → ⟦ λx dx → D ⟦ t ⟧ ⟧ dρ is a derivative of ⟦ λx → t ⟧. After a few simplifications, our thesis reduces to

⟦ D ⟦ t ⟧ ⟧ (dρ, x = a1, dx = da) ▷ ⟦ t ⟧ (ρ1, x = a1) ↪ ⟦ t ⟧ (ρ2, x = a2) : τ

for all dρ ▷ ρ1 ↪ ρ2 : Γ and da ▷ a1 ↪ a2 : σ. But then, the thesis is simply that ⟦ t ⟧∆ is the derivative of ⟦ t ⟧, which is true by inductive hypothesis.

More in detail, our thesis is that ⟦ λx → t ⟧∆ is a derivative for ⟦ λx → t ⟧, that is

∀dρ ▷ ρ1 ↪ ρ2 : Γ.
⟦ λx dx → D ⟦ t ⟧ ⟧ dρ ▷ ⟦ λx → t ⟧ ρ1 ↪ ⟦ λx → t ⟧ ρ2 : σ → τ. (12.1)

By simplifying, the thesis Eq. (12.1) becomes

∀dρ ▷ ρ1 ↪ ρ2 : Γ.
λa1 da → ⟦ D ⟦ t ⟧ ⟧ (dρ, x = a1, dx = da) ▷
(λa1 → ⟦ t ⟧ (ρ1, x = a1)) ↪ (λa2 → ⟦ t ⟧ (ρ2, x = a2)) : σ → τ. (12.2)

By definition of validity of function type, the thesis Eq. (12.2) becomes

∀dρ ▷ ρ1 ↪ ρ2 : Γ. ∀da ▷ a1 ↪ a2 : σ.
⟦ D ⟦ t ⟧ ⟧ (dρ, x = a1, dx = da) ▷
⟦ t ⟧ (ρ1, x = a1) ↪ ⟦ t ⟧ (ρ2, x = a2) : τ. (12.3)

To prove the rewritten thesis Eq. (12.3), take the inductive hypothesis on t: it says that ⟦ D ⟦ t ⟧ ⟧ is a derivative for ⟦ t ⟧, so ⟦ D ⟦ t ⟧ ⟧ maps valid environment changes on Γ, x : σ to valid changes on τ. But by inversion of the validity judgment, all valid environment changes on Γ, x : σ can be written as

(dρ, x = a1, dx = da) ▷ (ρ1, x = a1) ↪ (ρ2, x = a2) : Γ, x : σ,

for valid changes dρ ▷ ρ1 ↪ ρ2 : Γ and da ▷ a1 ↪ a2 : σ. So, the inductive hypothesis is that

∀dρ ▷ ρ1 ↪ ρ2 : Γ. ∀da ▷ a1 ↪ a2 : σ.
⟦ D ⟦ t ⟧ ⟧ (dρ, x = a1, dx = da) ▷
⟦ t ⟧ (ρ1, x = a1) ↪ ⟦ t ⟧ (ρ2, x = a2) : τ. (12.4)

But that is exactly our thesis Eq. (12.3), so we’re done!

• Case Γ ⊢ c : τ . In essence, since weakening preserves meaning, we can rewrite the thesis to match Plugin Requirement 12.2.10.

In more detail, the thesis is that DC ⟦ c ⟧ is a derivative for c, that is, if dρ ▷ ρ1 ↪ ρ2 : Γ then ⟦ D ⟦ c ⟧ ⟧ dρ ▷ ⟦ c ⟧ ρ1 ↪ ⟦ c ⟧ ρ2 : τ. Since constants don’t depend on the environment and weakening preserves meaning (Lemma A.2.8), and by the definitions of ⟦ – ⟧ and D ⟦ – ⟧ on constants, the thesis can be simplified to DC ⟦ c ⟧ ▷ ⟦ c ⟧ ↪ ⟦ c ⟧ : τ, which is delegated to plugins in Plugin Requirement 12.2.10. □
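The application and abstraction cases of the proof above can be replayed on a concrete instance. The Python sketch below works at the level of semantics, with context Γ = y : Int, and assumes integer validity means v1 + dv = v2. The terms s = λx → x * y and t = y + 1, their hand-written derivative semantics, and every function name are hypothetical choices for this example; in particular `d_t` assumes a plugin derivative for addition with a literal whose change is 0.

```python
from itertools import product


def valid_int(dv, v1, v2):
    """Assumed validity at Int: dv goes from v1 to v2 iff v1 + dv == v2."""
    return v1 + dv == v2


def body(rho):
    """[[ x * y ]] in the extended context (Gamma, x : Int)."""
    return rho["x"] * rho["y"]


def d_body(drho):
    """Hand-written stand-in for [[ D[[ x * y ]] ]] on (drho, x = a1, dx = da)."""
    x, dx, y, dy = drho["x"], drho["dx"], drho["y"], drho["dy"]
    return x * dy + dx * y + dx * dy


def s(rho):
    """[[ \\x -> x * y ]] rho: extend rho with x = a, then evaluate the body."""
    return lambda a: body({**rho, "x": a})


def d_s(drho):
    """[[ \\x dx -> D[[ x * y ]] ]] drho: extend drho with x = a and dx = da."""
    return lambda a: lambda da: d_body({**drho, "x": a, "dx": da})


def t(rho):
    """[[ y + 1 ]] rho; reads only y, so t(drho) equals t(rho1)."""
    return rho["y"] + 1


def d_t(drho):
    """Hand-written stand-in for [[ D[[ y + 1 ]] ]] drho."""
    return drho["dy"]


def app(rho):
    """[[ s t ]] rho = ([[ s ]] rho) ([[ t ]] rho)."""
    return s(rho)(t(rho))


def d_app(drho):
    """[[ D[[ s ]] t D[[ t ]] ]] drho, as in the application case."""
    return d_s(drho)(t(drho))(d_t(drho))


def cases_hold_on(samples):
    """Check the abstraction and application theses on sampled valid changes."""
    for y1, dy, a1, da in samples:
        rho1, rho2 = {"y": y1}, {"y": y1 + dy}
        drho = {"y": y1, "dy": dy}  # valid from rho1 to rho2
        lam_ok = valid_int(d_s(drho)(a1)(da), s(rho1)(a1), s(rho2)(a1 + da))
        app_ok = valid_int(d_app(drho), app(rho1), app(rho2))
        if not (lam_ok and app_ok):
            return False
    return True


both_cases_hold = cases_hold_on(product(range(-3, 4), repeat=4))
```

Note how `d_app` evaluates `t` in the change environment `drho`: because `drho` extends `rho1`, this gives the same value as `t(rho1)`, which is the step justified by weakening in the application case. Likewise, `d_s` shows the environment extension (dρ, x = a1, dx = da) used in the abstraction case.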