• No results found

Practical Foundations for Programming Languages - Free Computer, Programming, Mathematics, Technical Books, Lecture Notes and Tutorials

N/A
N/A
Protected

Academic year: 2020

Share "Practical Foundations for Programming Languages - Free Computer, Programming, Mathematics, Technical Books, Lecture Notes and Tutorials"

Copied!
590
0
0

Loading.... (view fulltext now)

Full text

(1)

Practical Foundations for Programming

Languages

Robert Harper

Carnegie Mellon University

(2)

Copyright c2012 by Robert Harper.

All Rights Reserved.

The electronic version of this work is licensed under the Cre-ative Commons Attribution-Noncommercial-No DerivCre-ative Works 3.0 United States License. To view a copy of this license, visit

http://creativecommons.org/licenses/by-nc-nd/3.0/us/

(3)

Preface

Types are the central organizing principle of the theory of programming languages. Language features are manifestations of type structure. The syntax of a language is governed by the constructs that define its types, and its semantics is determined by the interactions among those constructs. The soundness of a language design—the absence of ill-defined programs— follows naturally.

The purpose of this book is to explain this remark. A variety of pro-gramming language features are analyzed in the unifying framework of type theory. A language feature is defined by its statics, the rules govern-ing the use of the feature in a program, and itsdynamics, the rules defining how programs using this feature are to be executed. The concept ofsafety emerges as the coherence of the statics and the dynamics of a language.

In this way we establish a foundation for the study of programming languages. But why these particular methods? The main justification is provided by the book itself. The methods we use are bothpreciseand in-tuitive, providing a uniform framework for explaining programming lan-guage concepts. Importantly, these methodsscale to a wide range of pro-gramming language concepts, supporting rigorous analysis of their prop-erties. Although it would require another book in itself to justify this as-sertion, these methods are alsopracticalin that they aredirectly applicableto implementation anduniquely effectiveas a basis for mechanized reasoning. No other framework offers as much.

(4)

(1996),Pierce(2002,2004), andReynolds(1998).

The book is divided into parts that are, in the main, independent of one another. Parts I and II, however, provide the foundation for the rest of the book, and must therefore be considered prior to all other parts. On first reading it may be best to skim Part I, and begin in earnest with Part II, returning to Part I for clarification of the logical framework in which the rest of the book is cast.

Numerous people have read and commented on earlier editions of this book, and have suggested corrections and improvements to it. I am particu-larly grateful to Andrew Appel, Iliano Cervesato, Lin Chase, Derek Dreyer, Zhong Shao, and Todd Wilson for their extensive efforts in reading and criticizing the book. I also thank the following people for their sugges-tions: Arbob Ahmad, Zena Ariola, Eric Bergstrome, Guy Blelloch, William Byrd, Luis Caires, Luca Cardelli, Manuel Chakravarti, Richard C. Cobbe, Karl Crary, Yi Dai, Daniel Dantas, Anupam Datta, Jake Donham, Favo-nia, Matthias Felleisen, Kathleen Fisher, Dan Friedman, Maia Ginsburg, Byron Hawkins, Kevin Hely, Justin Hsu, Cao Jing, Salil Joshi, Gabriele Keller, Scott Kilpatrick, Danielle Kramer, Akiva Leffert, Ruy Ley-Wild, Dan Licata, Karen Liu, Dave MacQueen, Chris Martens, Greg Morrisett, Tom Murphy, Aleksandar Nanevski, Georg Neis, David Neville, Doug Perkins, Frank Pfenning, Jean Pichon, Benjamin Pierce, Andrew M. Pitts, Gordon Plotkin, David Renshaw, John Reynolds, Carter Schonwald, Dale Schu-macher, Dana Scott, Robert Simmons, Pawel Sobocinski, Daniel Spoon-hower, Paulo Tanimoto, Peter Thiemann, Bernardo Toninho, Michael Tschantz, Kami Vaniea, Carsten Varming, David Walker, Dan Wang, Jack Wileden, Roger Wolff, Omer Zach, Luke Zarko, Yu Zhang. I am grateful to the stu-dents of 15–312 and 15–814 at Carnegie Mellon who have provided the impetus for the preparation of this book and who have endured the many revisions to it over the last ten years.

I thank the Max Planck Institute for Software Systems in Germany for its hospitality and support. I also thank Espresso a Mano in Pittsburgh, CB2 Cafe in Cambridge, and Thonet Cafe in Saarbr ¨ucken for providing a steady supply of coffee and a conducive atmosphere for writing.

(5)
(6)
(7)

Contents

Preface iii

I Judgments and Rules 1

1 Syntactic Objects 3

1.1 Abstract Syntax Trees . . . 4

1.2 Abstract Binding Trees . . . 7

1.3 Notes . . . 12

2 Inductive Definitions 15 2.1 Judgments . . . 15

2.2 Inference Rules . . . 16

2.3 Derivations. . . 17

2.4 Rule Induction. . . 19

2.5 Iterated and Simultaneous Inductive Definitions . . . 21

2.6 Defining Functions by Rules . . . 22

2.7 Modes . . . 24

2.8 Notes . . . 25

3 Hypothetical and General Judgments 27 3.1 Hypothetical Judgments . . . 27

3.1.1 Derivability . . . 27

3.1.2 Admissibility . . . 29

3.2 Hypothetical Inductive Definitions . . . 31

3.3 General Judgments . . . 33

3.4 Generic Inductive Definitions . . . 34

(8)

viii CONTENTS

II Statics and Dynamics 37

4 Statics 39

4.1 Syntax . . . 39

4.2 Type System . . . 40

4.3 Structural Properties . . . 42

4.4 Notes . . . 44

5 Dynamics 45 5.1 Transition Systems . . . 45

5.2 Structural Dynamics . . . 47

5.3 Contextual Dynamics . . . 49

5.4 Equational Dynamics . . . 51

5.5 Notes . . . 54

6 Type Safety 55 6.1 Preservation . . . 56

6.2 Progress . . . 56

6.3 Run-Time Errors. . . 58

6.4 Notes . . . 59

7 Evaluation Dynamics 61 7.1 Evaluation Dynamics . . . 61

7.2 Relating Structural and Evaluation Dynamics . . . 63

7.3 Type Safety, Revisited. . . 64

7.4 Cost Dynamics . . . 65

7.5 Notes . . . 66

III Function Types 67 8 Function Definitions and Values 69 8.1 First-Order Functions . . . 70

8.2 Higher-Order Functions . . . 71

8.3 Evaluation Dynamics and Definitional Equality . . . 73

8.4 Dynamic Scope . . . 74

(9)

CONTENTS ix

9 G ¨odel’s T 77

9.1 Statics . . . 78

9.2 Dynamics . . . 79

9.3 Definability . . . 80

9.4 Undefinability . . . 82

9.5 Notes . . . 84

10 Plotkin’s PCF 85 10.1 Statics . . . 87

10.2 Dynamics . . . 88

10.3 Definability . . . 90

10.4 Notes . . . 92

IV Finite Data Types 93 11 Product Types 95 11.1 Nullary and Binary Products . . . 96

11.2 Finite Products . . . 97

11.3 Primitive and Mutual Recursion . . . 98

11.4 Notes . . . 100

12 Sum Types 101 12.1 Nullary and Binary Sums . . . 101

12.2 Finite Sums . . . 103

12.3 Applications of Sum Types. . . 104

12.3.1 Void and Unit . . . 104

12.3.2 Booleans . . . 105

12.3.3 Enumerations . . . 106

12.3.4 Options . . . 106

12.4 Notes . . . 108

13 Pattern Matching 109 13.1 A Pattern Language. . . 110

13.2 Statics . . . 110

13.3 Dynamics . . . 112

13.4 Exhaustiveness and Redundancy . . . 114

13.4.1 Match Constraints . . . 114

13.4.2 Enforcing Exhaustiveness and Redundancy. . . 116

13.4.3 Checking Exhaustiveness and Redundancy . . . 117

(10)

x CONTENTS

13.5 Notes . . . 119

14 Generic Programming 121 14.1 Introduction . . . 121

14.2 Type Operators . . . 122

14.3 Generic Extension . . . 122

14.4 Notes . . . 125

V Infinite Data Types 127 15 Inductive and Co-Inductive Types 129 15.1 Motivating Examples . . . 129

15.2 Statics . . . 133

15.2.1 Types . . . 133

15.2.2 Expressions . . . 134

15.3 Dynamics . . . 134

15.4 Notes . . . 135

16 Recursive Types 137 16.1 Solving Type Isomorphisms . . . 138

16.2 Recursive Data Structures . . . 139

16.3 Self-Reference . . . 141

16.4 The Origin of State . . . 144

16.5 Notes . . . 145

VI Dynamic Types 147 17 The Untypedλ-Calculus 149 17.1 Theλ-Calculus . . . 149

17.2 Definability . . . 150

17.3 Scott’s Theorem . . . 153

17.4 Untyped Means Uni-Typed . . . 155

17.5 Notes . . . 156

18 Dynamic Typing 159 18.1 Dynamically Typed PCF . . . 159

18.2 Variations and Extensions . . . 163

18.3 Critique of Dynamic Typing . . . 166

(11)

CONTENTS xi

19 Hybrid Typing 169

19.1 A Hybrid Language. . . 169

19.2 Dynamic as Static Typing . . . 171

19.3 Optimization of Dynamic Typing . . . 172

19.4 Static Versus Dynamic Typing . . . 175

19.5 Notes . . . 176

VII Variable Types 177 20 Girard’s System F 179 20.1 SystemF . . . 180

20.2 Polymorphic Definability . . . 183

20.2.1 Products and Sums . . . 183

20.2.2 Natural Numbers . . . 184

20.3 Parametricity Overview . . . 185

20.4 Restricted Forms of Polymorphism . . . 186

20.4.1 Predicative Fragment. . . 186

20.4.2 Prenex Fragment . . . 187

20.4.3 Rank-Restricted Fragments . . . 189

20.5 Notes . . . 190

21 Abstract Types 191 21.1 Existential Types . . . 192

21.1.1 Statics . . . 192

21.1.2 Dynamics . . . 193

21.1.3 Safety. . . 194

21.2 Data Abstraction Via Existentials . . . 194

21.3 Definability of Existentials . . . 196

21.4 Representation Independence . . . 197

21.5 Notes . . . 200

22 Constructors and Kinds 201 22.1 Statics . . . 202

22.2 Higher Kinds . . . 204

22.3 Canonizing Substitution . . . 206

22.4 Canonization . . . 209

22.5 Notes . . . 211

(12)

xii CONTENTS

VIII Subtyping 213

23 Subtyping 215

23.1 Subsumption. . . 216

23.2 Varieties of Subtyping . . . 216

23.2.1 Numeric Types . . . 216

23.2.2 Product Types . . . 217

23.2.3 Sum Types . . . 218

23.3 Variance . . . 218

23.3.1 Product and Sum Types . . . 219

23.3.2 Function Types . . . 219

23.3.3 Quantified Types . . . 220

23.3.4 Recursive Types. . . 221

23.4 Safety . . . 223

23.5 Notes . . . 225

24 Singleton Kinds 227 24.1 Overview. . . 228

24.2 Singletons . . . 229

24.3 Dependent Kinds . . . 231

24.4 Higher Singletons . . . 235

24.5 Notes . . . 237

IX Classes and Methods 239 25 Dynamic Dispatch 241 25.1 The Dispatch Matrix . . . 242

25.2 Class-Based Organization . . . 244

25.3 Method-Based Organization . . . 245

25.4 Self-Reference . . . 247

25.5 Notes . . . 249

26 Inheritance 251 26.1 Class and Method Extension . . . 251

26.2 Class-Based Inheritance . . . 253

26.3 Method-Based Inheritance . . . 254

(13)

CONTENTS xiii

X Exceptions and Continuations 257

27 Control Stacks 259

27.1 Machine Definition . . . 259

27.2 Safety . . . 261

27.3 Correctness of the Control Machine. . . 262

27.3.1 Completeness . . . 264

27.3.2 Soundness . . . 264

27.4 Notes . . . 266

28 Exceptions 267 28.1 Failures . . . 267

28.2 Exceptions . . . 269

28.3 Exception Type . . . 271

28.4 Encapsulation of Exceptions . . . 272

28.5 Notes . . . 275

29 Continuations 277 29.1 Informal Overview . . . 277

29.2 Semantics of Continuations . . . 279

29.3 Coroutines . . . 281

29.4 Notes . . . 285

XI Types and Propositions 287 30 Constructive Logic 289 30.1 Constructive Semantics. . . 290

30.2 Constructive Logic . . . 291

30.2.1 Provability. . . 292

30.2.2 Proof Terms . . . 293

30.3 Proof Dynamics . . . 295

30.4 Propositions as Types . . . 296

30.5 Notes . . . 297

31 Classical Logic 299 31.1 Classical Logic . . . 300

31.1.1 Provability and Refutability . . . 300

31.1.2 Proofs and Refutations . . . 302

31.2 Deriving Elimination Forms . . . 305

31.3 Proof Dynamics . . . 306

(14)

xiv CONTENTS

31.4 Law of the Excluded Middle . . . 308

31.5 The Double-Negation Translation . . . 310

31.6 Notes . . . 311

XII Symbols 313 32 Symbols 315 32.1 Symbol Declaration . . . 316

32.1.1 Scoped Dynamics. . . 317

32.1.2 Scope-Free Dynamics . . . 318

32.2 Symbolic References . . . 319

32.2.1 Statics . . . 319

32.2.2 Dynamics . . . 320

32.2.3 Safety. . . 320

32.3 Notes . . . 321

33 Fluid Binding 323 33.1 Statics . . . 323

33.2 Dynamics . . . 324

33.3 Type Safety. . . 325

33.4 Some Subtleties . . . 326

33.5 Fluid References . . . 328

33.6 Notes . . . 330

34 Dynamic Classification 331 34.1 Dynamic Classes . . . 332

34.1.1 Statics . . . 332

34.1.2 Dynamics . . . 333

34.1.3 Safety. . . 334

34.2 Class References . . . 334

34.3 Definability of Dynamic Classes. . . 335

34.4 Classifying Secrets . . . 336

34.5 Notes . . . 337

XIII State 339 35 Modernized Algol 341 35.1 Basic Commands . . . 341

(15)

CONTENTS xv

35.1.2 Dynamics . . . 343

35.1.3 Safety. . . 345

35.2 Some Programming Idioms . . . 346

35.3 Typed Commands and Typed Assignables. . . 348

35.4 Notes . . . 351

36 Assignable References 353 36.1 Capabilities . . . 354

36.2 Scoped Assignables . . . 355

36.3 Free Assignables. . . 357

36.4 Safety for Free Assignables. . . 360

36.5 Benign Effects . . . 362

36.6 Notes . . . 364

XIV Laziness 365 37 Lazy Evaluation 367 37.1 By-Need Dynamics . . . 368

37.2 Safety . . . 372

37.3 Lazy Data Structures . . . 374

37.4 Suspensions . . . 375

37.5 Notes . . . 377

38 Polarization 379 38.1 Positive and Negative Types . . . 380

38.2 Focusing . . . 381

38.3 Statics . . . 382

38.4 Dynamics . . . 384

38.5 Safety . . . 385

38.6 Notes . . . 386

XV Parallelism 387 39 Nested Parallelism 389 39.1 Binary Fork-Join . . . 390

39.2 Cost Dynamics . . . 393

39.3 Multiple Fork-Join . . . 396

39.4 Provably Efficient Implementations. . . 398

39.5 Notes . . . 402

(16)

xvi CONTENTS

40 Futures and Speculations 403

40.1 Futures . . . 404

40.1.1 Statics . . . 404

40.1.2 Sequential Dynamics . . . 404

40.2 Speculations . . . 405

40.2.1 Statics . . . 405

40.2.2 Sequential Dynamics . . . 405

40.3 Parallel Dynamics . . . 406

40.4 Applications of Futures. . . 408

40.5 Notes . . . 411

XVI Concurrency 413 41 Process Calculus 415 41.1 Actions and Events . . . 415

41.2 Interaction . . . 417

41.3 Replication . . . 419

41.4 Allocating Channels . . . 421

41.5 Communication . . . 424

41.6 Channel Passing . . . 427

41.7 Universality . . . 430

41.8 Notes . . . 432

42 Concurrent Algol 435 42.1 Concurrent Algol . . . 436

42.2 Broadcast Communication . . . 438

42.3 Selective Communication . . . 441

42.4 Free Assignables as Processes . . . 444

42.5 Notes . . . 446

43 Distributed Algol 447 43.1 Statics . . . 448

43.2 Dynamics . . . 450

43.3 Safety . . . 451

43.4 Situated Types . . . 452

(17)

CONTENTS xvii

XVII Modularity 457

44 Components and Linking 459

44.1 Simple Units and Linking . . . 460

44.2 Initialization and Effects . . . 461

44.3 Notes . . . 463

45 Type Abstractions and Type Classes 465 45.1 Type Abstraction . . . 467

45.2 Type Classes . . . 468

45.3 A Module Language . . . 472

45.4 First- and Second-Class. . . 476

45.5 Notes . . . 478

46 Hierarchy and Parameterization 481 46.1 Hierarchy . . . 481

46.2 Parameterizaton . . . 485

46.3 Extending Modules with Hierarchies and Parameterization. 488 46.4 Applicative Functors . . . 491

46.5 Notes . . . 493

XVIII Equational Reasoning 495 47 Equational Reasoning for T 497 47.1 Observational Equivalence. . . 498

47.2 Logical Equivalence. . . 502

47.3 Logical and Observational Equivalence Coincide . . . 503

47.4 Some Laws of Equality . . . 506

47.4.1 General Laws . . . 506

47.4.2 Equality Laws . . . 507

47.4.3 Induction Law . . . 507

47.5 Notes . . . 508

48 Equational Reasoning for PCF 509 48.1 Observational Equivalence. . . 509

48.2 Logical Equivalence. . . 510

48.3 Logical and Observational Equivalence Coincide . . . 511

48.4 Compactness. . . 514

48.5 Co-Natural Numbers . . . 517

48.6 Notes . . . 519

(18)

xviii CONTENTS

49 Parametricity 521

49.1 Overview. . . 521

49.2 Observational Equivalence. . . 522

49.3 Logical Equivalence. . . 524

49.4 Parametricity Properties . . . 530

49.5 Representation Independence, Revisited . . . 533

49.6 Notes . . . 534

50 Process Equivalence 537 50.1 Process Calculus. . . 537

50.2 Strong Equivalence . . . 540

50.3 Weak Equivalence . . . 543

50.4 Notes . . . 545

XIX Appendices 547

(19)

Part I

(20)
(21)

Chapter 1

Syntactic Objects

Programming languages are languages, a means of expressing computa-tions in a form comprehensible to both people and machines. The syntax of a language specifies the means by which various sorts of phrases (expres-sions, commands, declarations, and so forth) may be combined to form programs. But what sort of thing are these phrases? What is a program made of?

The informal concept of syntax may be seen to involve several distinct concepts. The surface, or concrete, syntaxis concerned with how phrases are entered and displayed on a computer. The surface syntax is usually thought of as given by strings of characters from some alphabet (say, ASCII or Unicode). Thestructural, orabstract,syntaxis concerned with the struc-ture of phrases, specifically how they are composed from other phrases. At this level a phrase is a tree, called anabstract syntax tree, whose nodes are operators that combine several phrases to form another phrase. The binding structure of syntax is concerned with the introduction and use of identifiers: how they are declared, and how declared identifiers are to be used. At this level phrases areabstract binding trees, which enrich abstract syntax trees with the concepts of binding and scope.

(22)

4 1.1 Abstract Syntax Trees

1.1

Abstract Syntax Trees

Anabstract syntax tree, orastfor short, is an ordered tree whose leaves are variables, and whose interior nodes are operators whose arguments are its children. Ast’s are classified into a variety ofsortscorresponding to differ-ent forms of syntax. Avariablestands for an unspecified, or generic, piece of syntax of a specified sort. Ast’s may be combined by anoperator, which has both a sort and anarity, a finite sequence of sorts specifying the num-ber and sorts of its arguments. An operator of sorts and arity s1, . . . ,sn combinesn≥ 0 ast’s of sorts1, . . . ,sn, respectively, into a compound ast of sorts. As a matter of terminology, a nullaryoperator is one that takes no arguments, aunaryoperator takes one, abinaryoperator two, and so forth. The concept of a variable is central, and therefore deserves special em-phasis. As in mathematics a variable is anunknownobject drawn from some domain, itsrange of signficance. In school mathematics the (often implied) range of significance is the set of real numbers. Here variables range over ast’s of a specified sort. Being an unknown, the meaning of a variable is given bysubstitution, the process of “plugging in” an object from the do-main for the variable in a formula. So, in school, we might plug inπ for xin a polynomial, and calculate the result. Here we would plug in an ast of the appropriate sort for a variable in an ast to obtain another ast. The process of substitution is easily understood for ast’s, because it amounts to a “physical” replacement of the variable by an ast within another ast. We will shortly consider a generalization of the concept of ast for which the substitution process is somewhat more complex, but the essential idea is the same, and bears repeating:a variable is given meaning by substitution.

For example, consider a simple language of expressions built from num-bers, addition, and multiplication. The abstract syntax of such a language would consist of a single sort,Exp, and an infinite collection of operators that generate the forms of expression:num[n]is a nullary operator of sort

Exp whenever n ∈ N; plus and times are binary operators of sort Exp

whose arguments are both of sortExp. The expression 2+ (3×x), which involves a variable,x, would be represented by the ast

plus(num[2];times(num[3];x))

of sortExp, under the assumption that x is also of this sort. Because, say, num[4], is an ast of sort Exp, we may plug it in for x in the above ast to obtain the ast

(23)

1.1 Abstract Syntax Trees 5

which is written informally as 2+ (3×4). We may, of course, plug in more complex ast’s of sortExpforxto obtain other ast’s as result.

The tree structure of ast’s supports a very useful principle of reasoning, calledstructural induction. Suppose that we wish to prove that some prop-erty, P(a), holds of all ast’s, a, of a given sort. To show this it is enough to consider all the ways in which a may be generated, and show that the property holds in each case, under the assumption that it holds for each of its constituent ast’s (if any). So, in the case of the sort Expjust described, we must show

1. The property holds for any variable,x, of sortExp: P(x).

2. The property holds for any number,num[n]: for everyn∈N,P(num[n]).

3. Assuming that the property holds fora1anda2, show that it holds for

plus(a1;a2)andtimes(a1;a2): ifP(a1)andP(a2), thenP(plus(a1;a2))

andP(times(a1;a2)).

Because these cases exhaust all possibilities for the formation of a, we are assured thatP(a)holds for any astaof sortExp.

For the sake of precision, and to prepare the ground for further develop-ments, we will now give precise definitions of the foregoing concepts. Let S be a finite set of sorts. Let{Os}s∈S be a sort-indexed family ofoperators, o, of sort s with arityar(o) = (s1, . . . ,sn). Let{ Xs}s∈S be a sort-indexed family of variables, x, of each sort s. The family A[X] = { A[X]s}s∈S of ast’s of sortsis defined as follows:

1. A variable of sortsis an ast of sorts: ifx∈ Xs, thenx ∈ A[X]s.

2. Operators combine ast’s: ifois an operator of sortssuch thatar(o) = (s1, . . . ,sn), and ifa1 ∈ A[X]s1, . . . ,an∈ A[X]sn, theno(a1;. . .;an)∈

A[X]s.

It follows from this definition that the principle ofstructural inductionmay be used to prove that some property, P, holds of every ast. To showP(a) holds for everya∈ A[X], it is enough to show:

1. Ifx∈ Xs, thenPs(x).

2. Ifo∈ Osandar(o) = (s1, . . . ,sn), then ifPs1(a1)and . . . andPsn(an),

thenPs(o(a1;. . .;an)).

(24)

6 1.1 Abstract Syntax Trees

For example, it is easy to prove by structural induction that ifX ⊆ Y, then A[X]⊆ A[Y].

If X is a sort-indexed family of variables, we writeX,x, wherex is a variable of sorts such that x ∈ X/ s, to stand for the family of sets Y such thatYs = Xs∪ {x}and Ys0 = Xs0 for all s0 6= s. The familyX,x, where x is a variable of sort s, is said to be the family obtained by adjoiningthe variablexto the familyX.

Variables are given meaning bysubstitution. Ifx is a variable of sorts, a ∈ A[X,x]s0, and b ∈ A[X]s, then [b/x]a ∈ A[X]s0 is defined to be the result of substitutingbfor every occurrence ofxina. The astais called the target, andxis called thesubject, of the substitution. Substitution is defined by the following equations:

1. [b/x]x=band[b/x]y=yifx6=y.

2. [b/x]o(a1;. . .;an)=o([b/x]a1;. . .;[b/x]an).

For example, we may check that

[num[2]/x]plus(x;num[3])=plus(num[2];num[3]).

We may prove by structural induction that substitution on ast’s is well-defined.

Theorem 1.1. If a ∈ A[X,x], then for every b ∈ A[X]there exists a unique c∈ A[X]such that[b/x]a=c

Proof. By structural induction on a. If a = x, then c = b by definition, otherwise if a = y 6= x, then c = y, also by definition. Otherwise, a = o(a1, . . . ,an), and we have by induction uniquec1, . . . ,cnsuch that[b/x]a1=

c1and . . . [b/x]an = cn, and socisc= o(c1;. . .;cn), by definition of

sub-stitution.

(25)

1.2 Abstract Binding Trees 7

is essential that distinct parameters determine distinct operators: ifu and vare active parameters, andu 6= v, then cls[u]andcls[v]are different operators. Extensibility is achieved by introducing new active parameters. So, if uis not active, thencls[u]makes no sense, but ifubecomes active, thencls[u]is a nullary operator.

Parameters are easily confused with variables, but they are fundamen-tally different concepts. As we remarked earlier, a variable stands for an unknown ast of its sort, but a parameterdoes not stand for anything. It is a purely symbolic identifier whose only significance is whether it is the same or different as another parameter. Whereas variables are given meaning by substitution, it is not possible, or sensible, to substitute for a parameter. As a consequence, disequality of parameters is preserved by substitution, whereas disequality of variables is not (because the same ast may be sub-stituted for two distinct variables).

To account for the set of active parameters, we will writeA[U;X]for the set of ast’s with variables drawn from X and with parameters drawn fromU. Certain operators, such ascls[u], areparameterizedby parameters, u, of a given sort. The parameters are distinguished from the arguments by the square brackets around them. Instances of such operators are permitted only for parameters drawn from the active set,U. So, for example, ifu∈ U, thencls[u]is a nullary operator, but ifu ∈ U/ , thencls[u]is not a valid operator. In the next section we will introduce the means of extendingUto make operators available within that context.

1.2

Abstract Binding Trees

Abstract binding trees, orabt’s, enrich ast’s with the means to introduce new variables and parameters, called a binding, with a specified range of sig-nificance, called its scope. The scope of a binding is an abt within which the bound identifier may be used, either as a placeholder (in the case of a variable declaration) or as the index of some operator (in the case of a pa-rameter declaration). Thus the set of active identifiers may be larger within a subtree of an abt than it is within the surrounding tree. Moreover, dif-ferent subtrees may introduce identifiers with disjoint scopes. The crucial principle is that any use of an identifier should be understood as a refer-ence, or abstract pointer, to its binding. One consequence is that the choice of identifiers is immaterial, so long as we can always associate a unique binding with each use of an identifier.

As a motivating example, consider the expressionletxbea1ina2, which

(26)

8 1.2 Abstract Binding Trees

introduces a variable,x, for use within the expressiona2to stand for the

ex-pressiona1. The variablexis bound by theletexpression for use withina2;

any use ofxwithina1refers to a different variable that happens to have the

same name. For example, in the expressionletxbe7inx+xoccurrences of x in the addition refer to the variable introduced by the let. On the other hand in the expressionletxbex∗xinx+x, occurrences ofxwithin the multiplication refer to a different variable than those occurring within the addition. The latter occurrences refer to the binding introduced by the let, whereas the former refer to some outer binding not displayed here.

The names of bound variables are immaterial insofar as they determine the same binding. So, for example, the expressionletxbex∗xinx+x could just as well have been writtenletybex∗xiny+ywithout chang-ing its meanchang-ing. In the former case the variablexis bound within the ad-dition, and in the latter it is the variabley, but the “pointer structure” re-mains the same. On the other hand the expressionletxbey∗yinx+x has a different meaning to these two expressions, because now the vari-abley within the multiplication refers to a different surrounding variable. Renaming of bound variables is constrained to the extent that it must not alter the reference structure of the expression. For example, the expression letxbe2in letybe3inx+xhas a different meaning than the expression letybe2in letybe3iny+y, because theyin the expressiony+yin the second case refers to the inner declaration, not the outer one as before.

The concept of an ast may be enriched to account for binding and scope of a variable. These enriched ast’s are calledabstract binding trees, orabt’s for short. Abt’s generalize ast’s by allowing an operator to bind any finite number (possibly zero) of variables in each argument position. An argu-ment to an operator is called anabstractor, and has the form x1, . . . ,xk.a. The sequence of variablesx1, . . . ,xkare bound within the abta. (Whenkis zero, we elide the distinction between.aandaitself.) Written in the form of an abt, the expressionletxbea1ina2has the formlet(a1;x.a2), which

more clearly specifies that the variablexis bound withina2, and not within

a1. We often write~xto stand for a finite sequencex1, . . . ,xnof distinct

vari-ables, and write~x.ato meanx1, . . . ,xn.a.

(27)

1.2 Abstract Binding Trees 9

sequence s1, . . . ,sn of sorts, and we say that~xis of sort~sto mean that the two sequences have the same length and that eachxiis of sortsi.

Thus, for example, the arity of the operatorletis(Exp,(Exp)Exp), which indicates that it takes two arguments described as follows:

1. The first argument is of sortExpand binds no variables.

2. The second argument is of sortExpand binds one variable of sortExp. The definition expressionletxbe2+2inx×xis represented by the abt

let(plus(num[2];num[2]);x.times(x;x)).

LetObe a sort-indexed family of operators,o, with arities,ar(o). For a given sort-indexed family,X, of variables, the sort-indexed family of abt’s, B[X], is defined similarly to A[X], except that the set of active variables changes for each argument according to which variables are bound within it. A first cut at the definition is as follows:

1. Ifx∈ Xs, thenx ∈ B[X]s.

2. Ifar(o) = ((~s1)s1, . . . ,(~sn)sn), and if, for each 1≤i≤n,~xiis of sort~si andai ∈ B[X,~xi]si, theno(~x1.a1;. . .;~xn.an)∈ B[X]s.

The bound variables are adjoined to the set of active variables within each argument, with the sort of each variable determined by the valence of the operator.

This definition is almost correct, but fails to properly account for the behavior of bound variables. An abt of the form let(a1;x.let(a2;x.a3))

is ill-formed according to this definition, because the first binding adjoins xtoX, which implies that the second cannot also adjoinxtoX,xwithout causing confusion. The solution is to ensure that each of the arguments is well-formed regardless of the choice of bound variable names. This is achieved by altering the second clause of the definition using renaming as follows:1

Ifar(o) = ((~x1)s1, . . . ,(~xn)sn), and if, for each 1≤i≤nand for each renaming πi : ~xi ↔ ~x0i, where~x0i ∈ X/ , we have πi·ai ∈ B[X,~xi0], theno(~x1.a1;. . .;~xn.an)∈ B[X]s.

1The action of a renaming extends to abt’s in the obvious way by replacing every occur-rence ofxbyπ(x), including any occurrences in the variable list of an abstractor as well as

within its body.

(28)

10 1.2 Abstract Binding Trees

The renaming ensures that when we encounter nested binders we avoid collisions. This is called thefreshness condition on bindersbecause it ensures that all bound variables are “fresh” relative to the surrounding context.

The principle of structural induction extends to abt’s, and is called struc-tural induction modulo renaming. It states that to show thatP(a)[X]holds for everya∈ B[X], it is enough to show the following:

1. ifx ∈ Xs, thenP[X]s(x).

2. For everyoof sortsand arity((~s1)s1, . . . ,(~sn)sn), and if for each 1≤

i ≤ n, we have P[X,~x0i]sii·ai) for every renaming πi : ~xi ↔ ~x0i, thenP[X]s(o(~x1.a1;. . .;~xn.an)).

The renaming in the second condition ensures that the inductive hypothe-sis holds forallfresh choices of bound variable names, and not just the ones actually given in the abt.

As an example let us define the judgmentx ∈ a, wherea ∈ B[X,x], to mean thatx occurs freein a. Informally, this means that x is bound some-where outside ofa, rather than withinaitself. Ifxis bound withina, then those occurrences ofxare different from those occurring outside the bind-ing. The following definition ensures that this is the case:

1. x∈ x.

2. x ∈ o(~x1.a1;. . .;~xn.an)if there exists 1 ≤ i≤ nsuch that for every fresh renamingπ :~xi ↔~zi we havex ∈π·ai.

The first condition states thatx is free inx, but not free inyfor any vari-abley other than x. The second condition states that if x is free in some argument, independently of the choice of bound variable names in that ar-gument, then it is free in the overall abt. This implies, in particular, thatx isnotfree inlet(zero;x.x).

The relationa =α bofα-equivalence(so-called for historical reasons), is defined to mean thataandbare identical up to the choice of bound variable names. This relation is defined to be the strongest congruence containing the following two conditions:

1. x=α x.

(29)

1.2 Abstract Binding Trees 11

The idea is that we rename~xi and~x0i consistently, avoiding confusion, and check thatai anda0i areα-equivalent. Ifa =α b, thenaandbare said to be

α-variantsof each other.

Some care is required in the definition ofsubstitutionof an abtbof sorts for free occurrences of a variablexof sortsin some abtaof some sort, writ-ten[b/x]a. Substitution is partially defined by the following conditions:

1. [b/x]x =b, and[b/x]y =yifx6=y.

2. [b/x]o(~x1.a1;. . .;~xn.an)=o(~x1.a10;. . .;~xn.a0n), where, for each 1≤ i ≤ n, we require that~xi 6∈ b, and we set ai0 = [b/x]ai ifx ∈/~xi, and a0i = ai otherwise.

If xis bound in some argument to an operator, then substitution does not descend into its scope, for to do so would be to confuse two distinct vari-ables. For this reason we must take care to definea0iin the second equation according to whether or not x ∈ ~xi. The requirement that~xi 6∈ bin the second equation is calledcapture avoidance. If some xi,j occurred free inb, then the result of the substitution[b/x]aiwould in general containxi,j free as well, but then forming~xi.[b/x]ai would incur captureby changing the referent of xi,j to be the jth bound variable of the ith argument. In such casessubstitution is undefined because we cannot replacex bybin ai with-out incurring capture.

One way around this is to alter the definition of substitution so that the bound variables in the result are chosen fresh by substitution. By the prin-ciple of structural induction we know inductively that, for any renaming πi : ~xi ↔ ~x0i with~x0i fresh, the substitution[b/x](πi·ai) is well-defined. Hence we may define

[b/x]o(~x1.a1;. . .;~xn.an)=o(~x10.[b/x](π1·a1);. . .;~x0n.[b/x](πn·an)) for some particular choice of fresh bound variable names (any choice will do). There is no longer any need to take care thatx ∈/~xiin each argument, because the freshness condition on binders ensures that this cannot occur, the variablexalready being active. Noting that

o(~x1.a1;. . .;~xn.an)=α o(~x10.π1·a1;. . .;~x0n.πn·an),

another way to avoid undefined substitutions is to first choose anα-variant of the target of the substitution whose binders avoid any free variables in the substituting abt, and then perform substitution without fear of incur-ring capture. In other words substitution is totally defined onα-equivalence classes of abt’s.

(30)

12 1.3 Notes

To avoid all the bureaucracy of binding, we adopt the following identi-fication conventionthroughout this book:

Abstract binding trees are always to be identified up toα-equivalence.

That is, we implicitly work withα-equivalence classes of abt’s, rather than abt’s themselves. We tacitly assert that all operations and relations on abt’s respectα-equivalence, so that they are properly defined onα-equivalence classes of abt’s. Whenever we examine an abt, we are choosing a repre-sentative of itsα-equivalence class, and we have no control over how the bound variable names are chosen. On the other hand experience shows that any operation or property of interest respectsα-equivalence, so there is no obstacle to achieving it. Indeed, we might say that a property or operation is legitimate exactly insofar as it respectsα-equivalence!

Parameters, as well as variables, may be bound within an argument of an operator. Such binders introduce a “new,” or “fresh,” parameter within the scope of the binder wherein it may be used to form further abt’s. To allow for parameter declaration, the valence of an argument is generalized to indicate the sorts of the parameters bound within it, as well as the sorts of the variables, by writing(~s1;~s2)s, where~s1specifies the sorts of the

pa-rameters and~s2specifies the sorts of the variables. The sort-indexed family

B[U;X]is the set of abt’s determined by a fixed set of operators using the parameters, U, and the variables, X. We rely on naming conventions to distinguish parameters from variables, reservingu and v for parameters, andxandyfor variables.

1.3

Notes

(31)

1.3 Notes 13

for the binding and scope of variables. These ideas were developed further in LF (Harper et al.,1993).

(32)
(33)

Chapter 2

Inductive Definitions

Inductive definitions are an indispensable tool in the study of program-ming languages. In this chapter we will develop the basic framework of inductive definitions, and give some examples of their use. An inductive definition consists of a set of rulesfor derivingjudgments, or assertions, of a variety of forms. Judgments are statements about one or more syntactic objects of a specified sort. The rules specify necessary and sufficient condi-tions for the validity of a judgment, and hence fully determine its meaning.

2.1

Judgments

We start with the notion of ajudgment, orassertion, about a syntactic object. We shall make use of many forms of judgment, including examples such as these:

nnat nis a natural number n= n1+n2 nis the sum ofn1andn2

τtype τis a type

e :τ expressionehas typeτ e ⇓v expressionehas valuev

A judgment states that one or more syntactic objects have a property or stand in some relation to one another. The property or relation itself is called ajudgment form, and the judgment that an object or objects have that property or stand in that relation is said to be aninstanceof that judgment form. A judgment form is also called apredicate, and the objects constituting an instance are its subjects. We writea Jfor the judgment asserting thatJ

(34)

16 2.2 Inference Rules

we write J to stand for an unspecified judgment. For particular judgment forms, we freely use prefix, infix, or mixfix notation, as illustrated by the above examples, in order to enhance readability.

2.2

Inference Rules

Aninductive definitionof a judgment form consists of a collection ofrulesof the form

J1 . . . Jk

J (2.1)

in whichJ and J1, . . . , Jk are all judgments of the form being defined. The judgments above the horizontal line are called thepremisesof the rule, and the judgment below the line is called itsconclusion. If a rule has no premises (that is, whenkis zero), the rule is called an axiom; otherwise it is called a proper rule.

An inference rule may be read as stating that the premises are suffi-cientfor the conclusion: to show J, it is enough to show J1, . . . ,Jk. When k is zero, a rule states that its conclusion holds unconditionally. Bear in mind that there may be, in general, many rules with the same conclusion, each specifying sufficient conditions for the conclusion. Consequently, if the conclusion of a rule holds, then it is not necessary that the premises hold, for it might have been derived by another rule.

For example, the following rules constitute an inductive definition of the judgmentanat:

zeronat (2.2a)

anat

succ(a)nat (2.2b)

These rules specify that a nat holds whenever either a is zero, or a is succ(b) where b nat for some b. Taking these rules to be exhaustive, it follows thatanatiffais a natural number.

Similarly, the following rules constitute an inductive definition of the judgmentatree:

emptytree (2.3a)

a1tree a2tree

node(a1;a2)tree (2.3b)

These rules specify thatatreeholds if eitheraisempty, oraisnode(a1;a2),

(35)

2.3 Derivations 17

thatais a binary tree, which is to say it is either empty, or a node consisting of two children, each of which is also a binary tree.

The judgment a=b nat defining equality of a nat and b natis induc-tively defined by the following rules:

zero=zeronat (2.4a)

a=bnat

succ(a)=succ(b)nat (2.4b)

In each of the preceding examples we have made use of a notational convention for specifying an infinite family of rules by a finite number of patterns, or rule schemes. For example, Rule (2.2b) is a rule scheme that determines one rule, called aninstanceof the rule scheme, for each choice of object a in the rule. We will rely on context to determine whether a rule is stated for aspecificobject, a, or is instead intended as a rule scheme specifying a rule foreach choiceof objects in the rule.

A collection of rules is considered to define thestrongestjudgment that is closed under, or respects, those rules. To be closed under the rules sim-ply means that the rules aresufficientto show the validity of a judgment: J holdsif there is a way to obtain it using the given rules. To be thestrongest judgment closed under the rules means that the rules are alsonecessary: J holds only if there is a way to obtain it by applying the rules. The suffi-ciency of the rules means that we may show that J holds byderivingit by composing rules. Their necessity means that we may reason about it using rule induction.

2.3

Derivations

To show that an inductively defined judgment holds, it is enough to exhibit aderivationof it. A derivation of a judgment is a finite composition of rules, starting with axioms and ending with that judgment. It may be thought of as a tree in which each node is a rule whose children are derivations of its premises. We sometimes say that a derivation of J is evidence for the validity of an inductively defined judgment J.

We usually depict derivations as trees with the conclusion at the bot-tom, and with the children of a node corresponding to a rule appearing above it as evidence for the premises of that rule. Thus, if

J1 . . . Jk J

(36)

18 2.3 Derivations

is an inference rule and∇1, . . . ,∇kare derivations of its premises, then ∇1 . . . ∇k

J

is a derivation of its conclusion. In particular, ifk=0, then the node has no children.

For example, this is a derivation ofsucc(succ(succ(zero)))nat:

zeronat

succ(zero)nat

succ(succ(zero))nat

succ(succ(succ(zero)))nat .

(2.5)

Similarly, here is a derivation ofnode(node(empty;empty);empty)tree:

emptytree emptytree

node(empty;empty)tree emptytree

node(node(empty;empty);empty)tree .

(2.6)

To show that an inductively defined judgment is derivable we need only find a derivation for it. There are two main methods for finding derivations, calledforward chaining, orbottom-up construction, andbackward chaining, ortop-down construction. Forward chaining starts with the axioms and works forward towards the desired conclusion, whereas backward chaining starts with the desired conclusion and works backwards towards the axioms.

More precisely, forward chaining search maintains a set of derivable judgments, and continually extends this set by adding to it the conclusion of any rule all of whose premises are in that set. Initially, the set is empty; the process terminates when the desired judgment occurs in the set. As-suming that all rules are considered at every stage, forward chaining will eventually find a derivation of any derivable judgment, but it is impossible (in general) to decide algorithmically when to stop extending the set and conclude that the desired judgment is not derivable. We may go on and on adding more judgments to the derivable set without ever achieving the intended goal. It is a matter of understanding the global properties of the rules to determine that a given judgment is not derivable.

(37)

2.4 Rule Induction 19

backward chaining is goal-directed. Backward chaining search maintains a queue of current goals, judgments whose derivations are to be sought. Initially, this set consists solely of the judgment we wish to derive. At each stage, we remove a judgment from the queue, and consider all rules whose conclusion is that judgment. For each such rule, we add the premises of that rule to the back of the queue, and continue. If there is more than one such rule, this process must be repeated, with the same starting queue, for each candidate rule. The process terminates whenever the queue is empty, all goals having been achieved; any pending consideration of candidate rules along the way may be discarded. As with forward chaining, back-ward chaining will eventually find a derivation of any derivable judgment, but there is, in general, no algorithmic method for determining in general whether the current goal is derivable. If it is not, we may futilely add more and more judgments to the goal set, never reaching a point at which all goals have been satisfied.

2.4

Rule Induction

Because an inductive definition specifies thestrongestjudgment closed un-der a collection of rules, we may reason about them byrule induction. The principle of rule induction states that to show that a propertyP holds of a judgmentJwheneverJis derivable, it is enough to show thatPisclosed un-der, orrespects, the rules definingJ. WritingP(J)to mean that the property Pholds of the judgment J, we say thatPrespects the rule

J1 . . . Jk J

ifP(J)holds wheneverP(J1), . . . ,P(Jk). The assumptionsP(J1), . . . ,P(Jk) are called theinductive hypotheses, andP(J)is called theinductive conclusion of the inference.

The principle of rule induction is simply the expression of the definition of an inductively defined judgment form as the strongestjudgment form closed under the rules comprising the definition. This means that the judg-ment form defined by a set of rules is both (a) closed under those rules, and (b) sufficient for any other property also closed under those rules. The for-mer means that a derivation is evidence for the validity of a judgment; the latter means that we may reason about an inductively defined judgment form by rule induction.

(38)

20 2.4 Rule Induction

When specialized to Rules (2.2), the principle of rule induction states that to showP(anat)wheneveranat, it is enough to show:

1. P(zeronat).

2. for everya, ifanatandP(anat), then (succ(a)natand)P(succ(a)nat). This is just the familiar principle ofmathematical inductionarising as a spe-cial case of rule induction.

Similarly, rule induction for Rules (2.3) states that to show P(atree) wheneveratree, it is enough to show

1. P(emptytree).

2. for everya1anda2, ifa1treeandP(a1tree), and ifa2 treeandP(a2 tree),

then (node(a1;a2)treeand)P(node(a1;a2)tree).

This is called the principle oftree induction, and is once again an instance of rule induction.

We may also show by rule induction that the predecessor of a natural number is also a natural number. Although this may seem self-evident, the point of the example is to show how to derive this from first principles.

Lemma 2.1. Ifsucc(a)nat, then anat.

Proof. It suffices to show that the property,P(anat)stating that a natand thata =succ(b)impliesbnatis closed under Rules (2.2).

Rule(2.2a) Clearly zero nat, and the second condition holds vacuously, becausezerois not of the formsucc(−).

Rule(2.2b) Inductively we know that a nat and that if a is of the form succ(b), thenbnat. We are to show thatsucc(a)nat, which is imme-diate, and that ifsucc(a)is of the formsucc(b), thenbnat, and we havebnatby the inductive hypothesis.

This completes the proof.

Using rule induction we may show that equality, as defined by Rules (2.4) is reflexive.

(39)

2.5 Iterated and Simultaneous Inductive Definitions 21

Rule(2.2a) Applying Rule (2.4a) we obtainzero=zeronat.

Rule(2.2b) Assume thata=a nat. It follows that succ(a)=succ(a) nat

by an application of Rule (2.4b).

Similarly, we may show that the successor operation is injective.

Lemma 2.3. Ifsucc(a1)=succ(a2)nat, then a1=a2nat.

Proof. Similar to the proof of Lemma2.1.

2.5

Iterated and Simultaneous Inductive Definitions

Inductive definitions are often iterated, meaning that one inductive defi-nition builds on top of another. In an iterated inductive defidefi-nition the premises of a rule

J1 . . . Jk J

may be instances of either a previously defined judgment form, or the ment form being defined. For example, the following rules define the judg-mentaliststating thatais a list of natural numbers.

nillist (2.7a)

anat blist

cons(a;b)list (2.7b)

The first premise of Rule (2.7b) is an instance of the judgment forma nat, which was defined previously, whereas the premise blistis an instance of the judgment form being defined by these rules.

Frequently two or more judgments are defined at once by a simultane-ous inductive definition. A simultaneous inductive definition consists of a set of rules for deriving instances of several different judgment forms, any of which may appear as the premise of any rule. Because the rules defining each judgment form may involve any of the others, none of the judgment forms may be taken to be defined prior to the others. Instead we must un-derstand that all of the judgment forms are being defined at once by the entire collection of rules. The judgment forms defined by these rules are, as

(40)

22 2.6 Defining Functions by Rules

before, the strongest judgment forms that are closed under the rules. There-fore the principle of proof by rule induction continues to apply, albeit in a form that requires us to prove a property of each of the defined judgment forms simultaneously.

For example, consider the following rules, which constitute a simulta-neous inductive definition of the judgmentsaeven, stating thatais an even natural number, andaodd, stating thatais an odd natural number:

zeroeven (2.8a)

aodd

succ(a)even (2.8b)

a even

succ(a)odd (2.8c)

The principle of rule induction for these rules states that to show simul-taneously thatP(aeven)wheneveraevenandP(aodd)wheneveraodd, it is enough to show the following:

1. P(zeroeven);

2. ifP(aodd), thenP(succ(a)even); 3. ifP(aeven), thenP(succ(a)odd).

As a simple example, we may use simultaneous rule induction to prove that (1) ifa even, thena nat, and (2) ifa odd, thena nat. That is, we define the propertyP by (1) P(aeven) iff a nat, and (2) P(aodd)iff a nat. The principle of rule induction for Rules (2.8) states that it is sufficient to show the following facts:

1. zeronat, which is derivable by Rule (2.2a).

2. Ifanat, thensucc(a)nat, which is derivable by Rule (2.2b). 3. Ifanat, thensucc(a)nat, which is also derivable by Rule (2.2b).

2.6

Defining Functions by Rules

(41)

2.6 Defining Functions by Rules 23

relationsum(a;b;c), with the intended meaning thatcis the sum ofaandb, as follows:

bnat

sum(zero;b;b) (2.9a)

sum(a;b;c)

sum(succ(a);b;succ(c)) (2.9b) The rules define a ternary (three-place) relation,sum(a;b;c), among natural numbers a,b, andc. We may show thatcis determined byaandbin this relation.

Theorem 2.4. For every a natand b nat, there exists a unique c natsuch that

sum(a;b;c).

Proof. The proof decomposes into two parts:

1. (Existence) Ifanatandbnat, then there existscnatsuch thatsum(a;b;c). 2. (Uniqueness) Ifsum(a;b;c), andsum(a;b;c0), thenc=c0 nat.

For existence, let P(anat)be the proposition if b natthen there exists c nat

such thatsum(a;b;c). We prove that ifanatthenP(anat)by rule induction on Rules (2.2). We have two cases to consider:

Rule(2.2a) We are to showP(zeronat). Assuming b natand takingcto beb, we obtainsum(zero;b;c)by Rule (2.9a).

Rule(2.2b) AssumingP(anat), we are to show P(succ(a)nat). That is, we assume that ifb natthen there existscsuch thatsum(a;b;c), and are to show that ifb0 nat, then there existsc0such thatsum(succ(a);b0;c0). To this end, suppose thatb0nat. Then by induction there existscsuch thatsum(a;b0;c). Taking c0 = succ(c), and applying Rule (2.9b), we obtainsum(succ(a);b0;c0), as required.

For uniqueness, we prove thatifsum(a;b;c1), then ifsum(a;b;c2), then c1=c2nat

by rule induction based on Rules (2.9).

Rule(2.9a) We have a = zero andc1 = b. By an inner induction on the

same rules, we may show that if sum(zero;b;c2), then c2 is b. By

Lemma2.2we obtainb=bnat.

Rule(2.9b) We have thata=succ(a0)andc1 =succ(c01), wheresum(a0;b;c10).

By an inner induction on the same rules, we may show that ifsum(a;b;c2),

thenc2=succ(c02)natwheresum(a0;b;c02). By the outer inductive

hy-pothesisc01=c02 natand soc1=c2nat.

(42)

24 2.7 Modes

2.7

Modes

The statement that one or more arguments of a judgment is (perhaps uniquely) determined by its other arguments is called amode specificationfor that judg-ment. For example, we have shown that every two natural numbers have a sum according to Rules (2.9). This fact may be restated as a mode spec-ification by saying that the judgment sum(a;b;c) has mode (∀,∀,∃). The notation arises from the form of the proposition it expresses: for all a nat

and for all b nat, there exists c natsuch that sum(a;b;c). If we wish to fur-ther specify thatcisuniquelydetermined byaandb, we would say that the judgmentsum(a;b;c)has mode(∀,∀,∃!), corresponding to the proposition for all anatand for all b nat, there exists a unique cnatsuch thatsum(a;b;c). If we wish only to specify that the sum is unique,if it exists, then we would say that the addition judgment has mode(∀,∀,∃≤1), corresponding to the

propositionfor all anatand for all bnatthere exists at most one cnatsuch that

sum(a;b;c).

As these examples illustrate, a given judgment may satisfy several dif-ferent mode specifications. In general the universally quantified arguments are to be thought of as the inputs of the judgment, and the existentially quantified arguments are to be thought of as itsoutputs. We usually try to arrange things so that the outputs come after the inputs, but it is not es-sential that we do so. For example, addition also has the mode(∀,∃≤1,∀),

stating that the sum and the first addend uniquely determine the second addend, if there is any such addend at all. Put in other terms, this says that addition of natural numbers has a (partial) inverse, namely subtraction. We could equally well show that addition has mode(∃≤1,,∀), which is

just another way of stating that addition of natural numbers has a partial inverse.

Often there is an intended, or principal, mode of a given judgment, which we often foreshadow by our choice of notation. For example, when giving an inductive definition of a function, we often use equations to indi-cate the intended input and output relationships. For example, we may re-state the inductive definition of addition (given by Rules (2.9)) using equations:

anat

a+zero=anat (2.10a)

a+b=cnat

a+succ(b)=succ(c)nat (2.10b)

(43)

2.8 Notes 25

equations is determined as a function of those on the left. Having done so, we abuse notation, writinga+bfor the uniquecsuch thata+b=cnat.

2.8

Notes

Aczel(1977) provides a thorough account of the theory of inductive defi-nitions. The formulation given here is strongly influenced by Martin-L ¨of’s development of the logic of judgments (Martin-L ¨of,1983,1987).

(44)
(45)

Chapter 3

Hypothetical and General

Judgments

Ahypothetical judgment expresses an entailment between one or more hy-potheses and a conclusion. We will consider two notions of entailment, calledderivabilityandadmissibility. Both enjoy the same structural proper-ties, but they differ in that derivability is stable under extension with new rules, admissibility is not. Ageneral judgmentexpresses the universality, or genericity, of a judgment. There are two forms of general judgment, the genericand theparametric. The generic judgment expresses generality with respect to all substitution instances for variables in a judgment. The para-metric judgment expresses generality with respect to renamings of sym-bols.

3.1

Hypothetical Judgments

The hypothetical judgment codifies the rules for expressing the validity of a conclusion conditional on the validity of one or more hypotheses. There are two forms of hypothetical judgment that differ according to the sense in which the conclusion is conditional on the hypotheses. One is stable under extension with additional rules, and the other is not.

3.1.1 Derivability

(46)

28 3.1 Hypothetical Judgments

additional axioms

J1

. . .

Jk .

We treat thehypotheses, orantecedents, of the judgment,J1, . . . ,Jnas “tempo-rary axioms”, and derive theconclusion, orconsequent, by composing rules inR. Thus, evidence for a hypothetical judgment consists of a derivation of the conclusion from the hypotheses using the rules inR.

We use capital Greek letters, frequentlyΓor∆, to stand for a finite col-lection of basic judgments, and writeR[Γ]for the expansion ofRwith an axiom corresponding to each judgment inΓ. The judgmentΓ`R Kmeans that K is derivable from rules R[Γ], and the judgment `R Γ means that `R Jfor eachJinΓ. An equivalent way of defining J1, . . . ,Jn`R Jis to say that the rule

J1 . . . Jn

J (3.1)

isderivable fromR, which means that there is a derivation of J composed of the rules inRaugmented by treatingJ1, . . . ,Jnas axioms.

For example, consider the derivability judgment

anat`(2.2)succ(succ(a))nat (3.2) relative to Rules (2.2). This judgment is valid for anychoice of objecta, as evidenced by the derivation

anat

succ(a)nat

succ(succ(a))nat

(3.3)

which composes Rules (2.2), starting witha natas an axiom, and ending with succ(succ(a)) nat. Equivalently, the validity of (3.2) may also be expressed by stating that the rule

anat

succ(succ(a))nat (3.4)

is derivable from Rules (2.2).

It follows directly from the definition of derivability that it is stable un-der extension with new rules.

Theorem 3.1(Stability). IfΓ`R J, thenΓ`R∪R0 J.

(47)

3.1 Hypothetical Judgments 29

Derivability enjoys a number ofstructural propertiesthat follow from its definition, independently of the rules,R, in question.

Reflexivity Every judgment is a consequence of itself: Γ,J `R J. Each hypothesis justifies itself as conclusion.

Weakening If Γ `R J, then Γ,K `R J. Entailment is not influenced by unexercised options.

Transitivity IfΓ,K `R J and Γ `R K, thenΓ `R J. If we replace an ax-iom by a derivation of it, the result is a derivation of its consequent without that hypothesis.

Reflexivity follows directly from the meaning of derivability. Weakening follows directly from the definition of derivability. Transitivity is proved by rule induction on the first premise.

3.1.2 Admissibility

Admissibility, written Γ |=R J, is a weaker form of hypothetical judgment stating that`R Γimplies `R J. That is, the conclusion J is derivable from rules R whenever the assumptions Γ are all derivable from rules R. In particular if any of the hypotheses arenotderivable relative toR, then the judgment is vacuously true. An equivalent way to define the judgment

J1, . . . ,Jn |=R J is to state that the rule J1 . . . Jn

J (3.5)

isadmissible relative to the rules in R. This means that given any deriva-tions of J1, . . . ,Jnusing the rules inR, we may construct a derivation of J using the rules inR.

For example, the admissibility judgment

succ(a)nat|=(2.2) anat (3.6) is valid, because any derivation ofsucc(a)natfrom Rules (2.2) must con-tain a sub-derivation ofa natfrom the same rules, which justifies the con-clusion. The validity of (3.6) may equivalently be expressed by stating that the rule

succ(a)nat

anat (3.7)

is admissible for Rules (2.2).

(48)

30 3.1 Hypothetical Judgments

In contrast to derivability the admissibility judgment isnotstable under extension to the rules. For example, if we enrich Rules (2.2) with the axiom

succ(junk)nat (3.8)

(wherejunk is some object for whichjunk nat is not derivable), then the admissibility (3.6) is invalid. This is because Rule (3.8) has no premises, and there is no composition of rules derivingjunk nat. Admissibility is as sensitive to which rules are absentfrom an inductive definition as it is to which rules arepresentin it.

The structural properties of derivability ensure that derivability is stronger than admissibility.

Theorem 3.2. IfΓ`R J, thenΓ|=R J.

Proof. Repeated application of the transitivity of derivability shows that if

Γ`R Jand`RΓ, then`R J.

To see that the converse fails, observe that there is no composition of rules such that

succ(junk)nat`(2.2)junknat, yet the admissibility judgment

succ(junk)nat|=(2.2) junknat

holds vacuously.

Evidence for admissibility may be thought of as a mathematical func-tion transforming derivafunc-tions∇1, . . . ,∇n of the hypotheses into a deriva-tion∇of the consequent. Therefore, the admissibility judgment enjoys the same structural properties as derivability, and hence is a form of hypothet-ical judgment:

Reflexivity IfJis derivable from the original rules, thenJis derivable from the original rules: J |=R J.

(49)

3.2 Hypothetical Inductive Definitions 31

Transitivity IfΓ,K|=R JandΓ |=R K, thenΓ|=R J. If the judgments inΓ are derivable, so isK, by assumption, and hence so are the judgments inΓ,K, and hence so isJ.

Theorem 3.3. The admissibility judgmentΓ |=R J enjoys the structural proper-ties of entailment.

Proof. Follows immediately from the definition of admissibility as stating that if the hypotheses are derivable relative toR, then so is the conclusion.

If a rule, r, is admissible with respect to a rule set, R, then `R,r J is equivalent to `R J. For if `R J, then obviously `R,r J, by simply disre-garding r. Conversely, if `R,r J, then we may replace any use ofr by its expansion in terms of the rules in R. It follows by rule induction on R,r that every derivation from the expanded set of rules, R,r, may be trans-formed into a derivation fromR alone. Consequently, if we wish to show that P(J)whenever `R,r J, it is sufficient to show that P is closed under the rulesRalone. That is, we need only consider the rulesRin a proof by rule induction to deriveP(J).

3.2

Hypothetical Inductive Definitions

It is useful to enrich the concept of an inductive definition to permit rules with derivability judgments as premises and conclusions. Doing so per-mits us to introducelocal hypotheses that apply only in the derivation of a particular premise, and also allows us to constrain inferences based on the global hypothesesin effect at the point where the rule is applied.

Ahypothetical inductive definition consists of a collection ofhypothetical rulesof the following form:

Γ Γ1` J1 . . . Γ Γn` Jn

Γ` J . (3.9)

The hypotheses Γare the global hypothesesof the rule, and the hypotheses

Γiare thelocal hypothesesof theith premise of the rule. Informally, this rule states thatJis a derivable consequence ofΓwhenever eachJi is a derivable consequence ofΓ, augmented with the additional hypothesesΓi. Thus, one way to show that J is derivable from Γis to show, in turn, that each Ji is derivable from Γ Γi. The derivation of each premise involves a “context

(50)

32 3.2 Hypothetical Inductive Definitions

switch” in which we extend the global hypotheses with the local hypothe-ses of that premise, establishing a new set of global hypothehypothe-ses for use within that derivation.

In most cases a rule is stated forallchoices of global context, in which case it is said to be uniform. A uniform rule may be given in the implicit form

Γ1` J1 . . . Γn ` Jn

J (3.10)

which stands for the collection of all rules of the form (3.9) in which the global hypotheses have been made explicit.

A hypothetical inductive definition is to be regarded as an ordinary in-ductive definition of aformal derivability judgmentΓ` Jconsisting of a finite set of basic judgments, Γ, and a basic judgment, J. A collection of hypo-thetical rules, R, defines the strongest formal derivability judgment that isstructuralandclosed under rulesR. Structurality means that the formal derivability judgment must be closed under the following rules:

Γ,J` J (3.11a)

Γ` J

Γ,K ` J (3.11b)

Γ` K Γ,K` J

Γ` J (3.11c)

These rules ensure that formal derivability behaves like a hypothetical judg-ment. By a slight abuse of notation we writeΓ`R Jto indicate thatΓ` Jis derivable from rulesR.

The principle of hypothetical rule induction is just the principle of rule induction applied to the formal hypothetical judgment. So to show that P(Γ ` J) wheneverΓ `R J, it is enough to show that P is closed under both the rules ofR and under the structural rules. Thus, for each rule of the form (3.10), whether structural or inR, we must show that

ifP(Γ Γ1` J1)and . . . andP(Γ Γn ` Jn), thenP(Γ` J).

This is just a restatement of the principle of rule induction given in Chap-ter2, specialized to the formal derivability judgmentΓ` J.

(51)

3.3 General Judgments 33

alone. If all of the rules of a hypothetical inductive definition are uniform, the structural rules (3.11b) and (3.11c) are readily seen to be admissible. Usually, Rule (3.11a) must be postulated explictly as a rule, rather than shown to be admissible on the basis of the other rules.

3.3

General Judgments

General judgments codify the rules for handling variables in a judgment. As in mathematics in general, a variable is treated as anunknownranging over a specified collection of objects. A genericjudgment expresses that a judgment holds for any choice of objects replacing designated variables in the judgment. Another form of general judgment codifies the handling of parameters. A parametric judgment expresses generality over any choice of fresh renamings of designated parameters of a judgment. To keep track of the active variables and parameters in a derivation, we write Γ `UR;X J to indicate that J is derivable from Γ according to rules R, with objects consisting of abt’s over parametersUand variablesX.

Generic derivabilityjudgment is defined by

~x |Γ`XR J iff ∀π:~x↔~x0π·Γ`X,~x 0 R π·J,

where the quantification is restricted to variables~x0 not already active in X. Evidence for generic derivability consists of a generic derivation, ∇, in-volving the variables~x such that for every fresh renamingπ :~x ↔~x0, the derivationπ· ∇is evidence forπ·Γ`X,~x

0

R π·J. The renaming ensures that the variables in the generic judgment are fresh (not already declared inX), and that the meaning of the judgment is not dependent on the choice of variable names.

For example, the generic derivation,∇,

xnat

succ(x)nat

succ(succ(x))nat

is evidence for the judgment,

x|xnat`X(2.2)succ(succ(x))nat.

This is because every fresh renaming ofxtoyin∇results in a valid deriva-tion of the corresponding renaming of the indicated judgment.

The generic derivability judgment enjoys the followingstructural prop-ertiesgoverning the behavior of variables:

References

Related documents

Second, by distinguishing between the impact of source country female labor supply on the labor supply versus the wages of immigrant women in the United States, we are able to

The ques- tions guiding this research are: (1) What are the probabilities that Korean and Finnish students feel learned helplessness when they attribute academic outcomes to the

The purpose of the study was to assess the de- velopment in oral proficiency in English language among prospective teachers undertaking one year teacher education program in

Physical inactivity was significantly more common in urban areas of the country compared to rural areas Males were significantly more active than females Most of the time spent

In relation to this study, a possible explanation for this is that students who were in the high mathematics performance group were driven to learn for extrinsic reasons and

The main purpose of this work was the evaluation of different CZM formulations to predict the strength of adhesively-bonded scarf joints, considering the tensile and shear

The results show that our multilin- gual annotation scheme proposal has been utilized to produce data useful to build anaphora resolution systems for languages with

ADC: Apparent diffusion coefficient; AP: Arterial perfusion; ceCT: Contrast- enhanced computed tomography; CEUS: Contrast-enhanced ultrasound; CRLM: Colorectal liver metastases