Dependent types - Space cost analysis using sized types

The defining characteristic of dependent type systems is the possibility of parameterising types over values. Dependent type systems generalise the function type A → B to the dependent product Πx : A. B, where the type B of the co-domain is allowed to vary with x; the simple function type is obtained as the instance where x does not occur in B.

Restricted forms of type dependency have long been used in programming languages. For example, the Pascal array type depends on its size, and the types of the arguments of the C-language printf depend on its first argument (the format string). Dependent type systems provide a formal basis for reasoning about such notions. Following the Curry-Howard correspondence (Girard et al. 1989), dependent types allow expressing both propositions and computational (data) types in a single framework; dependent type theories can therefore form the basis of proof assistants, e.g. Coq (Coq 2006), and program verifiers, e.g. Lego (Luo and Pollack 1992).

More recently, there has been increased research into functional programming languages incorporating dependent types, e.g. Dependent ML (Xi 1998), Cayenne (Augustsson 1998), Agda (Coquand and Coquand 1999) and Epigram (McBride and McKinna 2004). This is motivated by the desire to express more refined program properties using types than is possible with standard polymorphic type systems. In fact, some extensions of the Haskell type system implemented in GHC, e.g. type classes with functional dependencies (Jones 2000) and generalised algebraic data types (Jones et al. 2006), allow simulating some of the expressive power of dependent types (McBride 2002, Apple and Weimer 2007).

Dependent ML (DML) is a conservative extension of the ML language with dependent types (Xi 1998, Xi and Pfenning 1999). The motivation for DML was to extend a realistic programming language with dependent types whilst retaining both decidability of type checking and a low overhead of type annotations. This is achieved by separating arbitrary ML terms (where general recursion is allowed and whose equivalence is therefore undecidable) from the indices allowed in types (taken from some decidable constraint domain).

Computation on DML type indices is restricted to constraint normalisation; this allows reducing the type checking of DML programs to constraint solving in the underlying domain. The constraint domain of natural indices with addition allows capturing size invariants of data structures; deciding the equivalence of DML types with such indices can then be reduced to checking the equivalence of Presburger constraints, e.g. using the Omega calculator (Pugh 1992). Xi (1998) presented applications of DML types with natural indices to program error detection and optimisations, e.g. the elimination of array bounds checks and dead code.

Dependent types in DML are introduced by refining a standard data type declaration. For example, a canonical declaration for a list data type

datatype 'a list = nil | cons of 'a * 'a list

can be refined with a natural length measure by the declaration:

typeref 'a list of nat with nil <| 'a list(0)
                          | cons <| {n:nat} 'a * 'a list(n) -> 'a list(n+1)

This refinement assigns nil a type with length zero and cons a type that increases the length by one; the notation {n:nat} is the concrete syntax for introducing a dependent product Πn : nat. Size properties of lists can then be expressed by dependent type annotations; for example, the size relation for the list append function is expressed by the type

append <| {m:nat}{n:nat} 'a list(m) * 'a list(n) -> 'a list(m+n)

and the DML type checker can verify that this size relation holds for the canonical recursive definition of append.
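As a rough dynamic analogue of this size relation, the following Python sketch builds cons lists and checks at run time that append returns a list of length m+n. DML establishes the same fact statically by constraint solving, so this is an illustration of the invariant only, not of the type checker. The names `Cons`, `from_iterable`, `length` and `append` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Cons:
    """A cons cell; the empty list (nil) is represented by None."""
    head: Any
    tail: Optional["Cons"]


def from_iterable(items):
    """Build a cons list from a Python iterable (helper for the example)."""
    result = None
    for item in reversed(list(items)):
        result = Cons(item, result)
    return result


def length(xs):
    """Plays the role of the natural index n in 'a list(n)."""
    n = 0
    while xs is not None:
        n, xs = n + 1, xs.tail
    return n


def append(xs, ys):
    """Canonical recursive append with the size relation as a runtime check."""
    if xs is None:
        # nil case: the index constraint is 0 + n = n
        result = ys
    else:
        # cons case: the index constraint is (m + 1) + n = (m + n) + 1
        result = Cons(xs.head, append(xs.tail, ys))
    # Runtime stand-in for what DML verifies statically.
    assert length(result) == length(xs) + length(ys)
    return result
```

The two comments name the arithmetic facts that each branch relies on; both are linear constraints over naturals, which is the kind of obligation a Presburger solver can discharge.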

For size relations that cannot be expressed exactly, DML allows the use of dependent sum types. For example, the higher-order filter function computes the sub-list of elements satisfying some predicate; since the length of the result depends on the predicate, it cannot be specified exactly. However, an upper bound can be specified by the type

filter <| ('a -> bool) * {n:nat} 'a list(n) -> [m:nat | m<=n] 'a list(m)

where [m:nat | m<=n] is a dependent sum that constrains the result list length m to be at most n.
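The upper bound can be illustrated dynamically in the same spirit. The Python sketch below returns the filtered list together with its length m, loosely mirroring how a dependent sum packages an index with a value. `filter_bounded` is a hypothetical name, and the bound is checked at run time rather than by a type checker.

```python
def filter_bounded(pred, xs):
    """Return (m, ys), where ys keeps the elements of xs satisfying pred.

    The pair loosely mirrors the dependent sum [m:nat | m<=n] 'a list(m):
    m is the witness for the result length, and the assertion checks the
    constraint m <= n that DML would verify statically.
    """
    n = len(xs)
    ys = [x for x in xs if pred(x)]
    m = len(ys)
    assert m <= n
    return m, ys
```

The exact value of m depends on the predicate and the data, which is why only the bound, and not an equation, can appear in the type.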

DML with integer indices allows expressing properties similar to the sized type systems (Reistad and Gifford 1994, Hughes et al. 1996, Chin and Khoo 2001). The main distinctions between the two approaches are: DML indices are user-definable for each data type whereas the notion of “size” in the sized type systems above is rigid; there is no implicit subtyping relation for size coercion in DML (instead, relevant functions must be annotated with dependent sum types); and finally, the DML type checker can verify user-annotated size relations but not infer them as in (Reistad and Gifford 1994, Chin and Khoo 2001).

Grobauer (2001) presented a method for automatically deriving cost recurrences from first-order DML programs. The main contribution is the use of indices in DML types as data sizes for expressing the recurrences. This allows the user to specify more precise size measures for data, e.g. for nested lists or trees. The cost model is asymptotic (e.g. counting the number of function calls or of some other primitive operation). This work focuses on extracting cost recurrences, not on obtaining solutions to the cost equations; except in very simple cases, obtaining closed-form solutions requires human intervention. For example, a function merging two lists in order (part of a merge sort example)

fun merge l = case l of
    (nil, l2) => l2
  | (l1, nil) => l1
  | (l1 as cons(h1,t1), l2 as cons(h2,t2)) =>
      if h1<h2 then cons(h1, merge(t1,l2)) else cons(h2, merge(l1,t2))
with merge <| {n1:nat}{n2:nat} list(n1)*list(n2) -> list(n1+n2)

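yields the following cost recurrence (braces represent a possibly guarded maximum between alternatives):

\[
\mathit{merge}^c\; n_1\; n_2 =
\begin{cases}
  n_1 = 0 \;\mapsto\; 0 \\
  n_2 = 0 \;\mapsto\; 0 \\
  n_1 > 0 \land n_2 > 0 \;\mapsto\; 1 +
    \begin{cases}
      \mathit{merge}^c\; (n_1 - 1)\; n_2 \\
      \mathit{merge}^c\; n_1\; (n_2 - 1)
    \end{cases}
\end{cases}
\tag{3.8}
\]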

It is immediate that the cost recurrence mimics the recursive structure of the original function. Even using computer algebra systems such as Maple or Mathematica, some human intervention is required to convert a recurrence such as (3.8) into a closed-form expression; here, with the inner alternatives combined by a maximum, $\mathit{merge}^c\; n_1\; n_2 = n_1 + n_2 - 1$ whenever $n_1, n_2 > 0$.
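A candidate closed form for a recurrence such as (3.8) can be sanity-checked by evaluating the recurrence on small inputs. The Python sketch below does this; the function names are illustrative assumptions. Reading the inner brace as a maximum (worst case), the values agree with n1 + n2 − 1 whenever both arguments are positive, whereas combining the two alternatives with a minimum (best case) gives min(n1, n2). Such a check is evidence only and does not replace a proof by induction.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def merge_cost(n1, n2):
    """Recurrence (3.8), reading the inner brace as a maximum."""
    if n1 == 0 or n2 == 0:
        return 0
    return 1 + max(merge_cost(n1 - 1, n2), merge_cost(n1, n2 - 1))


@lru_cache(maxsize=None)
def merge_cost_best(n1, n2):
    """The same recurrence with the alternatives combined by a minimum."""
    if n1 == 0 or n2 == 0:
        return 0
    return 1 + min(merge_cost_best(n1 - 1, n2), merge_cost_best(n1, n2 - 1))


def closed_form(n1, n2):
    """Candidate closed form for the maximum reading."""
    return n1 + n2 - 1 if n1 > 0 and n2 > 0 else 0


def agrees_up_to(limit):
    """Compare the recurrences with their candidate solutions below limit."""
    return all(
        merge_cost(a, b) == closed_form(a, b)
        and merge_cost_best(a, b) == min(a, b)
        for a in range(limit)
        for b in range(limit)
    )
```

Calling `agrees_up_to(20)` compares both readings against their candidate solutions on a 20 by 20 grid of sizes.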

1997) to perform verification of resource bounds. This system is based on an intermediate compiler language called LXres that allows expressing resource properties in types by exposing a “virtual clock” representing some available resource (e.g. time). Resource properties can then be verified by the type checker. To deal with variable-time procedures, they employ a technique of encoding static type-level representations of data using sum and inductive kinds; this simulates type dependency while allowing a simpler theory and type checker.

Costs can be expressed as primitive-recursive functions over the static data representations (so that type checking remains decidable). These must be provided by the user: the system allows verifying resource bounds, but makes no attempt to infer them.

Brady and Hammond (2006) employed a dependently-typed language similar to Epigram to encode and verify size properties of functional programs. Their approach generalises the previous examples of sized lists in DML by introducing a dependent type Size that pairs a type indexed by a natural size with a predicate (itself represented as a dependent type). A term size v p : Size A P pairs a value v of indexed type A n with a proof p that v respects a size property P.

Brady and Hammond applied this framework to express size relations of functions on lists, including an example similar to the split by function of Section 3.3. They also extend the technique to capture size relations for higher-order functions by associating size predicates and functions with higher-order arguments. The authors illustrate the technique with higher-order functions such as twice, map and fold.

A first limitation of this work is that it considers only verifying sizes expressed as dependent types. The elaboration of a simply typed program in Haskell or ML into a dependently typed version with size annotations is left to the user (particularly guessing size relations of functions). The extent to which this step can be automated is not addressed.

Secondly, this work uses dependent types for expressing size information but not time or space costs. Although the authors mention that the technique should be extendible to other metrics such as heap, stack or time usage, we remark that such an extension is not likely to be straightforward, because it requires reasoning about intensional properties of evaluation (e.g. cost) rather than just denotational ones (e.g. size).

Danielsson (2008) has also used a dependently-typed language for expressing complexity analysis of functional programs. This work focuses on expressing costs rather than sizes by encapsulating values in a cost monad (Wadler 1993) parameterised by the number of computation steps: Thunk n a is the type of a computation that evaluates to an a in n steps. The unit and bind operations for the thunk monad are:

return : a → Thunk 0 a
(>>=) : Thunk m a → (a → Thunk n b) → Thunk (m + n) b

The monadic unit injects a value into the cost monad with zero cost, while the bind combines the costs of two computations. Any atomic costs must be explicitly introduced using “tick” annotations in the program; each tick adds one unit of cost:

tick : Thunk n a → Thunk (1 + n) a

Note that the Thunk type is dependent on the natural n and that both the monadic operations and tick have dependent types.

These basic combinators form a library implemented in the dependently typed language Agda and allow a programmer to specify machine-checkable complexity proofs; for example, assuming a dependent type for lists annotated with their length, and assigning a unit cost to each lambda-abstraction, we can type check a list concatenation function annotated with a linear cost on the first argument:

(++) : List m a → List n a → Thunk (1 + 2∗m) (List (m + n) a)
[] ++ ys = tick (return ys)
(x:xs) ++ ys = tick (xs ++ ys >>= λt → tick (return (x:t)))
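The step counts assigned by these combinators can be replayed dynamically. The following Python sketch is a runtime analogue only: it stores the cost as an integer field instead of a type index, so the bound 1 + 2∗m is tested on sample inputs rather than proved. `Thunk`, `ret`, `bind` and `tick` are illustrative names chosen for this example and are not the API of the Agda library.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Thunk:
    """A value paired with the number of steps charged to compute it."""
    steps: int
    value: Any


def ret(value) -> Thunk:
    """Monadic unit: inject a value at zero cost."""
    return Thunk(0, value)


def bind(t: Thunk, f: Callable[[Any], Thunk]) -> Thunk:
    """Monadic bind: sequence two computations and add their costs."""
    u = f(t.value)
    return Thunk(t.steps + u.steps, u.value)


def tick(t: Thunk) -> Thunk:
    """Charge one extra step."""
    return Thunk(1 + t.steps, t.value)


def append(xs: list, ys: list) -> Thunk:
    """List concatenation with the same tick placement as the definition above."""
    if not xs:
        return tick(ret(ys))
    x, rest = xs[0], xs[1:]
    return tick(bind(append(rest, ys), lambda t: tick(ret([x] + t))))
```

For a first argument of length m, `append(xs, ys).steps` can be compared against `1 + 2 * len(xs)`; the cost does not depend on the length of the second argument.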

The use of a dependently-typed cost monad allows expressing quite precise cost information, e.g. it can be used to reason about the complexity of lazy evaluation by explicitly embedding Thunk types into data structures.

However, it requires insightful annotations by the user and considerable knowledge of dependent type systems. For example, type checking the concatenation example above requires providing a lemma for the arithmetic equality 1 + ((1 + 2∗m) + (1 + 0)) = 1 + 2∗(m + 1). Non-trivial programs also require the introduction of auxiliary operators, e.g. to “waste” costs and ensure that the two branches of a conditional admit the same type [14].
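The arithmetic behind the lemma is routine once written out. The left-hand side is the cost that the combinators assign to the cons clause when the tail has length m: one outer tick, plus the recursive call at cost 1 + 2m, plus the continuation at cost 1 + 0. The right-hand side is the declared bound for a first argument of length m + 1. A worked version, using only the unit law, associativity and commutativity of addition, and distributivity:

```latex
\begin{align*}
1 + \bigl((1 + 2m) + (1 + 0)\bigr)
  &= 1 + \bigl((1 + 2m) + 1\bigr) \\
  &= 1 + (2m + 2) \\
  &= 1 + 2(m + 1)
\end{align*}
```

The lemma is presumably needed because the two index expressions are not syntactically identical, so the equality has to be supplied as an explicit proof.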

The cost model used is quite abstract: it counts the number of “steps” specified by the ticks annotated in the code. Presumably the technique could be extended to a cost model based on an abstract machine, e.g. as in (Hughes and Pareto 1999).

[14] This is analogous to the subeffecting allowed in effects systems for time (Reistad and Gifford

Finally, the system allows only checking cost bounds but does not aid in obtaining the cost bounds in the first place.
