Bibliographic Notes - Data Structures and Algorithms Alfred V Aho pdf

The concept of an abstract data type can be traced to the class type in the language SIMULA 67 (Birtwistle et al. [1973]). Since that time, a variety of other languages that support abstract data types have been developed, including Alphard (Shaw, Wulf, and London [1977]), C with classes (Stroustrup [1982]), CLU (Liskov et al. [1977]), MESA (Geschke, Morris, and Satterthwaite [1977]), and Russell (Demers and Donahue [1979]). The ADT concept is further discussed in works such as Gotlieb and Gotlieb [1978] and Wulf et al. [1981].

Knuth [1968] was the first major work to advocate the systematic study of the running time of programs. Aho, Hopcroft, and Ullman [1974] relate the time and space complexity of algorithms to various models of computation, such as Turing machines and random-access machines. See also the bibliographic notes to Chapter 9 for more references to the subject of analysis of algorithms and programs.

For additional material on structured programming see Hoare, Dahl, and Dijkstra [1972], Wirth [1973], Kernighan and Plauger [1974], and Yourdon and Constantine [1975]. Organizational and psychological problems arising in the development of large software projects are discussed in Brooks [1974] and Weinberg [1971]. Kernighan and Plauger [1981] show how to build useful software tools for a programming environment.

† The symbol Ø stands for the empty set.

‡ We distinguish the abstract data type SET from the built-in set type of Pascal.

† The record has no known name because it was created by a call new(header), which made header point to this newly-created record. Internal to the machine, however, there is a memory address that can be used to locate the cell.

† Note the asymmetry between big-oh and big-omega notation. The reason such asymmetry is often useful is that there are many times when an algorithm is fast on many but not all inputs. For example, there are algorithms to test whether their input is of prime length that run very fast whenever that length is even, so we could not get a good lower bound on running time that held for all n ≥ n₀.

† Unless otherwise specified all logarithms are to the base 2. Note that O(log n) does not depend on the base of the logarithm since log_a n = c log_b n, where c = log_a b.
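The base-change identity used in this footnote follows in one step. The derivation below is a short sketch in LaTeX notation; the auxiliary symbol y is introduced here only for illustration.

```latex
% Let y = \log_b n, so that n = b^{y}. Taking logarithms to the base a:
\log_a n \;=\; \log_a\!\left(b^{y}\right) \;=\; y \,\log_a b \;=\; (\log_a b)\,\log_b n .
% Hence \log_a n = c \,\log_b n with the constant c = \log_a b,
% and therefore O(\log_a n) = O(\log_b n).
```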

† UNIX is a Trademark of Bell Laboratories.

‡ We could use an unabridged dictionary, but many misspellings are real words one has never heard of.

Basic Abstract Data Types

In this chapter we shall study some of the most fundamental abstract data types. We consider lists, which are sequences of elements, and two special cases of lists: stacks, where elements are inserted and deleted at one end only, and queues, where elements are inserted at one end and deleted at the other. We then briefly study the mapping or associative store, an ADT that behaves as a function. For each of these ADT's we consider several implementations and compare their relative merits.

2.1 The Abstract Data Type "List"

Lists are a particularly flexible structure because they can grow and shrink on demand, and elements can be accessed, inserted, or deleted at any position within a list. Lists can also be concatenated together or split into sublists. Lists arise routinely in applications such as information retrieval, programming language translation, and simulation. Storage management techniques of the kind we discuss in Chapter 12 use list-processing techniques extensively. In this section we shall introduce a number of basic list operations, and in the remainder of this chapter present data structures for lists that support various subsets of these operations efficiently.

Mathematically, a list is a sequence of zero or more elements of a given type (which we generally call the elementtype). We often represent such a list by a comma-separated sequence of elements

a₁, a₂, . . . , a_n

where n ≥ 0, and each a_i is of type elementtype. The number n of elements is said to be the length of the list. Assuming n ≥ 1, we say that a₁ is the first element and a_n is the last element. If n = 0, we have an empty list, one which has no elements.

An important property of a list is that its elements can be linearly ordered according to their position on the list. We say a_i precedes a_{i+1} for i = 1, 2, . . . , n-1, and a_i follows a_{i-1} for i = 2, 3, . . . , n. We say that the element a_i is at position i. It is also convenient to postulate the existence of a position following the last element on a list. The function END(L) will return the position following position n in an n-element list L. Note that position END(L) has a distance from the beginning of the list that varies as the list grows or shrinks, while all other positions have a fixed distance from the beginning of the list.

To form an abstract data type from the mathematical notion of a list we must define a set of operations on objects of type LIST.† As with many other ADT's we discuss in this book, no one set of operations is suitable for all applications. Here, we shall give one representative set of operations. In the next section we shall offer several data structures to represent lists and we shall write procedures for the typical list operations in terms of these data structures.

To illustrate some common operations on lists, let us consider a typical application in which we have a mailing list from which we wish to purge duplicate entries. Conceptually, this problem can be solved quite simply: for each item on the list, remove all equivalent following items. To present this algorithm, however, we need to define operations that find the first element on a list, step through all successive elements, and retrieve and delete elements.

We shall now present a representative set of list operations. In what follows, L is a list of objects of type elementtype, x is an object of that type, and p is of type position. Note that "position" is another data type whose implementation will vary for different list implementations. Even though we informally think of positions as integers, in practice, they may have another representation.

1. INSERT(x, p, L). Insert x at position p in list L, moving elements at p and following positions to the next higher position. That is, if L is a₁, a₂, . . . , a_n, then L becomes a₁, a₂, . . . , a_{p-1}, x, a_p, . . . , a_n. If p is END(L), then L becomes a₁, a₂, . . . , a_n, x. If list L has no position p, the result is undefined.

2. LOCATE(x, L). This function returns the position of x on list L. If x appears more than once, then the position of the first occurrence is returned. If x does not appear at all, then END(L) is returned.

3. RETRIEVE(p, L). This function returns the element at position p on list L. The result is undefined if p = END(L) or if L has no position p. Note that the elements must be of a type that can be returned by a function if RETRIEVE is used. In practice, however, we can always modify RETRIEVE to return a pointer to an object of type elementtype.

4. DELETE(p, L). Delete the element at position p of list L. If L is a₁, a₂, . . . , a_n, then L becomes a₁, a₂, . . . , a_{p-1}, a_{p+1}, . . . , a_n. The result is undefined if L has no position p or if p = END(L).

5. NEXT(p, L) and PREVIOUS(p, L) return the positions following and preceding position p on list L. If p is the last position on L, then NEXT(p, L) = END(L). NEXT is undefined if p is END(L). PREVIOUS is undefined if p is 1. Both functions are undefined if L has no position p.

6. MAKENULL(L). This function causes L to become an empty list and returns position END(L).

7. FIRST(L). This function returns the first position on list L. If L is empty, the position returned is END(L).

8. PRINTLIST(L). Print the elements of L in the order of occurrence.
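The eight operations above can be sketched on top of a built-in array. The following is a minimal illustration in Python rather than the Pascal used in this book. The class name ArrayList, the method names, and the choice of 1-based integers as positions are all assumptions made for this sketch. Where the text says a result is undefined, the sketch raises IndexError, which is one possible choice among many.

```python
class ArrayList:
    """Illustrative LIST ADT; positions are 1-based integers (an assumption)."""

    def __init__(self):
        self.items = []

    def end(self):
        # END(L): the position following the last element.
        return len(self.items) + 1

    def first(self):
        # FIRST(L): position 1, which equals END(L) when L is empty.
        return 1

    def makenull(self):
        # MAKENULL(L): empty the list and return END(L).
        self.items.clear()
        return self.end()

    def insert(self, x, p):
        # INSERT(x, p, L): elements at p and beyond move one position up.
        if not 1 <= p <= self.end():
            raise IndexError("list has no position p")
        self.items.insert(p - 1, x)

    def locate(self, x):
        # LOCATE(x, L): position of the first occurrence, or END(L).
        for position, item in enumerate(self.items, start=1):
            if item == x:
                return position
        return self.end()

    def retrieve(self, p):
        # RETRIEVE(p, L): undefined at END(L), so END(L) is rejected here.
        if not 1 <= p < self.end():
            raise IndexError("no element at position p")
        return self.items[p - 1]

    def delete(self, p):
        # DELETE(p, L): later elements move one position down.
        if not 1 <= p < self.end():
            raise IndexError("no element at position p")
        del self.items[p - 1]

    def next(self, p):
        # NEXT(p, L): undefined at END(L).
        if not 1 <= p < self.end():
            raise IndexError("NEXT is undefined at this position")
        return p + 1

    def previous(self, p):
        # PREVIOUS(p, L): undefined at position 1.
        if not 2 <= p <= self.end():
            raise IndexError("PREVIOUS is undefined at this position")
        return p - 1

    def printlist(self):
        # PRINTLIST(L): print the elements in order of occurrence.
        print(*self.items)


L = ArrayList()
L.insert("a", L.end())
L.insert("c", L.end())
L.insert("b", 2)        # "c" moves from position 2 to position 3
L.printlist()
```

Integer positions are only one representation. A pointer-based list would use a different position type behind the same operations, which is why the text treats "position" as a separate data type.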

Example 2.1. Let us write, using these operators, a procedure PURGE that takes a list as argument and eliminates duplicates from the list. The elements of the list are of type elementtype, and a list of such elements has type LIST, a convention we shall follow throughout this chapter. There is a function same(x,y), where x and y are of elementtype, that is true if x and y are "the same" and false if not. The notion of sameness is purposely left vague. If elementtype is real, for example, we might want same(x,y) true if and only if x = y. However, if elementtype is a record containing the account number, name, and address of a subscriber as in

    type
        elementtype = record
            acctno: integer;
            name: packed array [1..20] of char;
            address: packed array [1..50] of char
        end

then we might want same(x, y) to be true whenever x.acctno=y.acctno.†

Figure 2.1 shows the code for PURGE. The variables p and q are used to represent two positions in the list. As the program proceeds, duplicate copies of any elements to the left of position p have been deleted from the list. In one iteration of the loop (2)-(8), q is used to scan the list following position p to delete any duplicates of the element at position p. Then p is moved to the next position and the process is repeated.

In the next section we shall provide appropriate declarations for LIST and position, and implementations for the operations so that PURGE becomes a working program. As written, the program is independent of the manner in which lists are represented so we are free to experiment with various list implementations.

    procedure PURGE ( var L: LIST );
        { PURGE removes duplicate elements from list L }
        var
            p, q: position; { p will be the "current" position }
        begin
    (1)     p := FIRST(L);
    (2)     while p <> END(L) do begin
    (3)         q := NEXT(p, L);
    (4)         while q <> END(L) do
    (5)             if same(RETRIEVE(p, L), RETRIEVE(q, L)) then
    (6)                 DELETE(q, L)
                    else
    (7)                 q := NEXT(q, L);
    (8)         p := NEXT(p, L)
            end
        end; { PURGE }

Fig. 2.1. Program to remove duplicates.
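The logic of Fig. 2.1 can be tried out with a small stand-in for the list operations. The following Python sketch is an illustration only, not the implementation developed in the next section. It assumes a Python list as the representation and 1-based integer positions, and the helper names first, end, next_pos, retrieve, and delete are chosen here to mirror the ADT operations. The subscriber records are made-up sample data.

```python
def first(L):
    return 1                      # FIRST(L)

def end(L):
    return len(L) + 1             # END(L): position after the last element

def next_pos(p, L):
    return p + 1                  # NEXT(p, L)

def retrieve(p, L):
    return L[p - 1]               # RETRIEVE(p, L)

def delete(p, L):
    del L[p - 1]                  # DELETE(p, L): later elements move down

def purge(L, same=lambda x, y: x == y):
    """Remove duplicates from L in place, following lines (1)-(8) of Fig. 2.1."""
    p = first(L)                                      # (1)
    while p != end(L):                                # (2)
        q = next_pos(p, L)                            # (3)
        while q != end(L):                            # (4)
            if same(retrieve(p, L), retrieve(q, L)):  # (5)
                delete(q, L)                          # (6) q now names the next element
            else:
                q = next_pos(q, L)                    # (7)
        p = next_pos(p, L)                            # (8)

# Sameness by account number, as in the subscriber record of Example 2.1.
subscribers = [
    {"acctno": 17, "name": "A. Smith"},
    {"acctno": 42, "name": "B. Jones"},
    {"acctno": 17, "name": "Alice Smith"},
]
purge(subscribers, same=lambda x, y: x["acctno"] == y["acctno"])
print([s["name"] for s in subscribers])
```

Passing same as a parameter keeps the notion of sameness outside purge, in the same way that Fig. 2.1 leaves same unspecified.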

A point worth observing concerns the body of the inner loop, lines (4)-(7) of Fig. 2.1. When we delete the element at position q at line (6), the elements that were at positions q+1, q+2, . . . , and so on, move up one position in the list. In particular, should q happen to be the last position on L, the value of q would become END(L). If we then executed line (7), NEXT(END(L), L) would produce an undefined result. Thus, it is essential that either (6) or (7), but never both, is executed between the tests for q = END(L) at line (4).
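The effect of executing both the deletion and the advance in one pass can be seen in a small experiment. The sketch below is a hypothetical Python illustration that uses 0-based list indices as positions. With that representation, stepping after a deletion does not fail outright. Instead it silently skips the element that moved into position q, so a duplicate can survive. The function names purge_buggy and purge_correct are invented for this comparison.

```python
def purge_buggy(L):
    """Deliberately incorrect: advances q even after deleting at q."""
    p = 0
    while p < len(L):
        q = p + 1
        while q < len(L):
            if L[p] == L[q]:
                del L[q]      # the following element moves into position q
            q += 1            # wrong: that element is never examined
        p += 1

def purge_correct(L):
    """Either delete at q or advance q, never both in one pass."""
    p = 0
    while p < len(L):
        q = p + 1
        while q < len(L):
            if L[p] == L[q]:
                del L[q]
            else:
                q += 1
        p += 1

a = [1, 1, 1, 2]
b = [1, 1, 1, 2]
purge_buggy(a)
purge_correct(b)
print(a)   # a duplicate 1 remains
print(b)
```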

2.2 Implementation of Lists

In this section we shall describe some data structures that can be used to represent lists. We shall consider array, pointer, and cursor implementations of lists. Each of these implementations permits certain list operations to be done more efficiently than others.
