Amortised cost analysis - Space cost analysis using sized types

Amortised complexity analysis aims at obtaining bounds for the cost of a sequence of operations (Tarjan 1985, Okasaki 1998); it is sometimes possible to obtain better worst-case bounds by amortisation than by reasoning about the costs of individual operations. For example, it might be possible to obtain a worst-case bound of $O(n)$ for a sequence of $n$ operations even if some of the individual operations cost more than $O(1)$.

The “physicist method” for deriving amortised bounds starts by assigning a non-negative potential function to data. The amortised cost of an operation is then defined as the sum of the actual cost (e.g. time cost or heap cells allocated) plus the difference in potential incurred by the operation. The key idea is to choose the potential functions so as to facilitate computing the amortised cost, e.g. in such a way as to make the amortised costs constant. Provided the potential is always non-negative and initially zero, the accumulated amortised costs will be an upper bound on the accumulated actual costs (Okasaki 1998).
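The definition above can be illustrated with a minimal sketch on a textbook example that is not taken from the source: a binary counter whose potential is assumed to be the number of 1 bits and whose actual cost is the number of bits written. The names `increment`, `potential` and `run` are hypothetical helpers introduced only for this illustration.

```python
def increment(bits):
    """Increment a little-endian binary counter in place.

    Returns the actual cost, counted as the number of bits written.
    """
    writes = 0
    i = 0
    # Clear the run of trailing 1 bits.
    while i < len(bits) and bits[i] == 1:
        bits[i] = 0
        writes += 1
        i += 1
    # Set the first 0 bit (extending the counter if necessary).
    if i == len(bits):
        bits.append(1)
    else:
        bits[i] = 1
    writes += 1
    return writes


def potential(bits):
    """Assumed potential function: the number of 1 bits (never negative)."""
    return sum(bits)


def run(n_ops):
    """Perform n_ops increments from zero; return (actual, amortised) totals."""
    bits = []  # the empty counter has potential zero
    total_actual = 0
    total_amortised = 0
    for _ in range(n_ops):
        before = potential(bits)
        actual = increment(bits)
        # Amortised cost = actual cost + change in potential.
        total_amortised += actual + potential(bits) - before
        total_actual += actual
    return total_actual, total_amortised
```

With this choice of potential every increment has amortised cost 2, although a single increment may write many bits; the accumulated amortised cost therefore bounds the accumulated actual cost from above.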

Hofmann and Jost (2003) proposed a type-based analysis for heap space usage using amortisation. Instead of extending type judgements with effects as in (Dornic et al. 1992, Reistad and Gifford 1994, Hughes and Pareto 1999), the analysis of Hofmann and Jost is based on annotating data types with weights representing the relative contribution of parts of a data structure to the overall heap usage (the potential associated with the data structure).

The language under analysis is a first-order functional notation with a strict semantics and algebraic data types including sums, products, booleans and lists. There are two kinds of pattern-matching deconstructors for heap-allocated values: a deallocating match and a non-deallocating match′. The heap cost is defined by a big-step operational semantics instrumented with the size of a free list of heap cells; the free list shrinks at each constructor application and grows at each match (but not at match′).

The augmented typing judgements take the form $\Gamma, k \vdash e : A, k'$ where $\Gamma$ are the type assumptions, $e$ is an expression, $A$ is an annotated type and $k, k'$ are non-negative rational numbers representing the available potential before and after the evaluation of $e$. The annotations in $A$ together with $k$ and $k'$ give both an upper bound on the initial heap space for evaluation of $e$ and a lower bound on the available heap space after evaluation. For example, the judgement

$$x : L(L(B, 1), 2),\ 3 \vdash e : L(B, 4),\ 5$$

informally says that if $x$ is a list of lists of booleans then $e$ is a list of booleans; furthermore, if $x = [l_1, \ldots, l_n]$ then a free list of size $3 + 2n + 1 \cdot \sum_i |l_i|$ is sufficient to evaluate $e$; and if $e$ evaluates to a list $[b_1, \ldots, b_m]$ of length $m$, the resulting free list will have size at least $5 + 4m$.
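To make the reading of the annotations concrete, the following sketch evaluates both bounds of the example judgement for given values. The function names and the representation of lists as Python lists are assumptions made for illustration only; the coefficients 1, 2, 3, 4 and 5 are those of the judgement above.

```python
def sufficient_initial_free_list(x, inner=1, outer=2, constant=3):
    """Free-list size sufficient to evaluate e when x : L(L(B, inner), outer), constant.

    Each outer list node contributes `outer`, each element of an inner
    list contributes `inner`, and `constant` is the fixed part.
    """
    n = len(x)
    return constant + outer * n + inner * sum(len(l) for l in x)


def guaranteed_final_free_list(result, coeff=4, constant=5):
    """Lower bound on the free list after e returns `result` : L(B, coeff), constant."""
    return constant + coeff * len(result)


# Hypothetical input: three inner lists holding three booleans in total.
x = [[True, False], [], [True]]
print(sufficient_initial_free_list(x))             # 3 + 2*3 + 1*3
print(guaranteed_final_free_list([True, False]))   # 5 + 4*2
```

Both quantities depend only on the sizes of the input and of the output respectively, which is why the judgement by itself yields no relation between the two sizes.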

From this example we can see that type annotations play a very different role here than in the sized type systems: in the system of Hofmann and Jost an annotation represents not a size, but the coefficient of the heap cost incurred by a part of a data structure. The upper bound on the initial free list is a function of the (unknown) sizes of the input. Note also that the lower bound on the final free list size is a function of the (unknown) size of the output and that no input/output size relation is obtained.

The type system of Hofmann and Jost performs an amortised analysis of the size of the free list: the coefficients in types represent the potential associated with the data structures; the typing rules constrain the annotations so that the amortised costs for each expression are properly accounted for. For example, the typing rules for constructing and deconstructing a list node are:¹⁵

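$$\frac{n \geq \mathrm{SIZE}(A \otimes L(A, k)) + k + n'}{\Gamma, x_h : A, x_t : L(A, k), n \vdash \mathsf{cons}(x_h, x_t) : L(A, k), n'} \qquad (3.9)$$

$$\frac{\Gamma, n \vdash e_1 : C, n' \qquad \Gamma, x_h : A, x_t : L(A, k), n + \mathrm{SIZE}(A \otimes L(A, k)) + k \vdash e_2 : C, n'}{\Gamma, x : L(A, k), n \vdash \mathsf{match}\ x\ \mathsf{with}\ |\ \mathsf{nil} \Rightarrow e_1\ |\ \mathsf{cons}(x_h, x_t) \Rightarrow e_2 : C, n'} \qquad (3.10)$$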

Rule (3.9) specifies that the available potential $n$ must be at least the amortised cost of cons, that is, the actual heap cells used (given by the SIZE function) plus the potential $k$ associated with the list elements (because the list length is increased by one). Dually, rule (3.10) specifies that the available potential at the cons alternative increases by the amortised cost (because match does deallocation).
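The arithmetic behind the two rules can be sketched as follows. The helper names are hypothetical, the value passed as `size` stands for SIZE(A ⊗ L(A, k)) and is chosen arbitrarily in the usage below, and exact rationals are used because the annotations are non-negative rational numbers.

```python
from fractions import Fraction


def cons_leftover(n, size, k):
    """Side condition of rule (3.9): n >= size + k + n'.

    Returns the largest admissible n' for the given n, or None when the
    available potential cannot pay the amortised cost of cons.
    """
    leftover = n - size - k
    return leftover if leftover >= 0 else None


def match_cons_available(n, size, k):
    """Rule (3.10): potential available to the cons branch e2.

    The deallocating match returns the node's cells (size) and releases
    the potential k that was attached to the list element.
    """
    return n + size + k


# Matching a node and rebuilding it leaves the potential unchanged.
n, size, k = Fraction(1, 2), 2, Fraction(3, 2)
after_match = match_cons_available(n, size, k)
after_cons = cons_leftover(after_match, size, k)
print(after_match, after_cons)
```

The round trip shows the duality described above: whatever a deallocating match releases is exactly what a subsequent cons of the same type must pay.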

Hofmann and Jost presented an algorithm that automatically infers the type annotations. Their technique associates each program $P$ with a system of linear inequalities $L(P)$ such that the valid annotated type derivations for $P$ correspond to the admissible solutions of $L(P)$; these solutions can be obtained by a standard linear programming solver.
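As a hedged illustration of how such a system of inequalities might arise, consider a hypothetical list-copy function that is not taken from the source. Beyond rules (3.9) and (3.10), the sketch assumes that nil consumes no potential, that SIZE does not depend on the annotation $k$, and that calling a function of type $L(B, k), n \to L(B, k'), n'$ with available potential $m \geq n$ leaves $m - n + n'$; the names $m_1$ and $m_2$ are introduced only for this derivation.

```latex
\begin{align*}
\mathsf{copy}(x) &= \mathsf{match}\ x\ \mathsf{with}\ |\ \mathsf{nil} \Rightarrow \mathsf{nil}\
                    |\ \mathsf{cons}(x_h, x_t) \Rightarrow \mathsf{cons}(x_h, \mathsf{copy}(x_t)) \\
\mathsf{copy} &: L(B, k),\ n \to L(B, k'),\ n' \\[1ex]
\text{nil branch:} \quad & n \geq n' \\
\text{cons branch, after match (3.10):} \quad & m_1 = n + \mathrm{SIZE}(B \otimes L(B, k)) + k \\
\text{recursive call:} \quad & m_1 \geq n, \qquad m_2 = m_1 - n + n' \\
\text{cons (3.9):} \quad & m_2 \geq \mathrm{SIZE}(B \otimes L(B, k')) + k' + n' \\[1ex]
\text{hence:} \quad & n \geq n', \qquad k \geq k', \qquad k, k', n, n' \geq 0
\end{align*}
```

Under these assumptions every non-negative solution with $n \geq n'$ and $k \geq k'$ gives a valid annotation; the smallest one, $k = k' = n = n' = 0$, expresses that copying with a deallocating match needs no heap beyond the cells of its argument.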

¹⁵ Following Hofmann and Jost (2003) and without loss of generality, we present the type rules

The worst-case theoretical complexity for solving linear programs is polynomial; the variants of the simplex algorithm used in solver implementations, although exponential in the worst case, are quite efficient in practice. This compares favourably with the sized type systems (Hughes et al. 1996, Hughes and Pareto 1999, Chin and Khoo 2001) where type checking alone requires checking validity of Presburger constraints with doubly exponential worst-case time.

Since annotations represent coefficients of the potential function, the system can only derive heap bounds that are linear in the sizes of data structures. However, since the language implements deallocation using destructive matching, it is still expressive enough to obtain heap costs for many list processing functions, including sorting algorithms such as insertion sort and quicksort.¹⁶ Unlike the sized type analysis of Hughes and Pareto (1999), the amortised analysis deals with the irregular divide-and-conquer recursions by “splitting” the potentials between the two recursive calls. Hofmann and Jost also present good results for a binary tree traversal and report successful analysis of other textbook examples.

One limitation of the analysis of Hofmann and Jost is that the inferred type annotations are sometimes not sufficiently polymorphic because every use of a function shares the same potentials. Consider the identity function $f : L(B) \to L(B)$ on a list of booleans; if a particular use requires the annotation $f : L(B, 5), 3 \to L(B, 5), 3$ then it is not possible to apply $f$ to an argument of type $L(B, 0)$. The authors suggest that this can be relaxed by conducting a separate analysis for each use of $f$. However, this implies that it is not possible to analyse functions separately from their use, i.e. the analysis is not fully modular.

Hofmann and Jost have considered heap usage but not time or stack usage. Time could, in principle, be treated similarly to heap, by simply recording the number of execution steps instead of the size of a free list. The only difference is the absence of a deallocation mechanism for time costs.

¹⁶ The sorting algorithms exhibit linear space or even constant bounds by reusing the heap

Extending the amortised analysis to stack usage is less straightforward. One technical problem is that a realistic model for stack must employ a small-step rather than a big-step semantics as used in (Hofmann and Jost 2003). Another concern is that the bounds expressible by the amortised analysis are linear in the size of data structures (the total number of elements). While this is generally a good match for obtaining heap bounds, it will yield coarse stack bounds for, say, a tree search algorithm whose worst-case complexity is linear in the depth of the tree. A recently submitted PhD thesis investigates the extension of amortised analysis to stack costs; the definition of potential is modified to account for the depth of data structures (Campbell 2008).
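The gap between size and depth that makes size-linear stack bounds coarse can be seen in a small sketch. The tree representation (nested tuples with `None` for leaves) and the helper names are assumptions made for this illustration and do not come from the analysis itself.

```python
def size(tree):
    """Total number of nodes: the quantity a size-linear bound can refer to."""
    if tree is None:
        return 0
    left, _, right = tree
    return 1 + size(left) + size(right)


def depth(tree):
    """Length of the longest root-to-leaf path: what a recursive search
    actually needs in stack frames."""
    if tree is None:
        return 0
    left, _, right = tree
    return 1 + max(depth(left), depth(right))


def balanced(levels):
    """Build a perfectly balanced tree with the given number of levels."""
    if levels == 0:
        return None
    sub = balanced(levels - 1)
    return (sub, levels, sub)  # sharing the subtree is safe: tuples are immutable


t = balanced(10)
print(size(t), depth(t))
```

For a balanced tree the size grows exponentially with the depth, so a stack bound expressed in terms of the number of nodes overestimates by a wide margin the stack a depth-bounded search would use.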
