Abstract Data Types Based on Sets

4.4 A Linked-List Implementation of Sets

It should also be evident that sets can be represented by linked lists, where the items in the list are the elements of the set. Unlike the bit-vector representation, the list representation uses space proportional to the size of the set represented, not the size of the universal set. Moreover, the list representation is somewhat more general since it can handle sets that need not be subsets of some finite universal set.

When we have operations like INTERSECTION on sets represented by linked lists, we have several options. If the universal set is linearly ordered, then we can represent a set by a sorted list. That is, we assume all set members are comparable by a relation "<" and the members of a set appear on a list in the order e₁, e₂, ..., eₙ, where e₁ < e₂ < e₃ < ... < eₙ. The advantage of a sorted list is that we do not need to search the entire list to determine whether an element is on the list.
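As a minimal sketch of that advantage, the Python fragment below tests membership on a sorted list and stops at the first element that is not smaller than the one sought. The names `Cell`, `from_sorted`, and `member` are illustrative assumptions, not part of the text, and the list is assumed to start with an empty header cell.

```python
class Cell:
    """One list cell with an 'element' field and a 'next' link (assumed layout)."""
    def __init__(self, element=None, next=None):
        self.element = element
        self.next = next


def from_sorted(items):
    """Build a header-cell list from items already given in increasing order."""
    header = Cell()          # empty header cell; holds no set element
    last = header
    for x in items:
        last.next = Cell(x)
        last = last.next
    return header


def member(x, header):
    """Return True if x is on the sorted list headed by 'header'."""
    current = header.next
    # Skip only the elements smaller than x; anything past that cannot equal x.
    while current is not None and current.element < x:
        current = current.next
    return current is not None and current.element == x
```

On the list 2, 5, 8, 13, a search for 6 stops as soon as it reaches 8, without looking at 13.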

An element is in the intersection of lists L₁ and L₂ if and only if it is on both lists. With unsorted lists we must match each element on L₁ with each element on L₂, a process that takes O(n²) steps on lists of length n. The reason that sorting the lists makes intersection and some other operations easy is that if we wish to match an element e on one list L₁ with the elements of another list L₂, we have only to look down L₂ until we either find e, or find an element greater than e; in the first case we have found the match, while in the second case, we know none exists. More importantly, if d is the element on L₁ that immediately precedes e, and we have found on L₂ the first element, say f, such that d ≤ f, then to search L₂ for an occurrence of e we can begin with f. The conclusion from this reasoning is that we can find matches for all the elements of L₁, if they exist, by scanning L₁ and L₂ once, provided we advance the position markers for the two lists in the proper order, always advancing the one with the smaller element. The routine to implement INTERSECTION is shown in Fig. 4.5. There, sets are represented by linked lists of "cells" whose type is defined

    type
        celltype = record
            element: elementtype;
            next: ↑ celltype
        end

Figure 4.5 assumes elementtype is a type, such as integer, that can be compared by <. If not, we have to write a function that determines which of two elements precedes the other.
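One way to handle element types that cannot be compared by < is sketched below in Python: the intersection scan takes a caller-supplied function that says which of two elements precedes the other. This is an illustration under stated assumptions, not the routine of Fig. 4.5 itself; `Cell`, `from_sorted`, `to_list`, `by_id`, and the `precedes` parameter are hypothetical names, and two elements are treated as equal when neither precedes the other.

```python
class Cell:
    """One list cell with an 'element' field and a 'next' link (assumed layout)."""
    def __init__(self, element=None, next=None):
        self.element = element
        self.next = next


def from_sorted(items):
    """Build a header-cell list from items already in the intended order."""
    header = Cell()
    last = header
    for x in items:
        last.next = Cell(x)
        last = last.next
    return header


def to_list(header):
    """Collect the elements after the header cell into a Python list."""
    out = []
    current = header.next
    while current is not None:
        out.append(current.element)
        current = current.next
    return out


def intersection(ahead, bhead, precedes):
    """Intersect two sorted lists; precedes(x, y) is True when x comes before y."""
    pc = Cell()                      # header for the result list C
    ccurrent = pc
    acurrent, bcurrent = ahead.next, bhead.next
    while acurrent is not None and bcurrent is not None:
        if precedes(acurrent.element, bcurrent.element):
            acurrent = acurrent.next
        elif precedes(bcurrent.element, acurrent.element):
            bcurrent = bcurrent.next
        else:
            # Neither precedes the other: treated as equal, so add to C.
            ccurrent.next = Cell(acurrent.element)
            ccurrent = ccurrent.next
            acurrent = acurrent.next
            bcurrent = bcurrent.next
    return pc


def by_id(x, y):
    """Example ordering for records that cannot be compared by < directly."""
    return x["id"] < y["id"]
```

Here the elements are dictionaries, which Python cannot order with <, so `by_id` supplies the ordering on an assumed "id" field.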

The linked lists in Fig. 4.5 are headed by empty cells that serve as entry points to the lists. The reader may, as an exercise, write this program in a more general abstract form using list primitives. The program in Fig. 4.5, however, may be more efficient than the more abstract program. For example, Fig. 4.5 uses pointers to particular cells rather than "position" variables that point to previous cells. We can do so because we only append to list C, and A and B are only scanned, with no insertions or deletions done on those lists.

The operations of UNION and DIFFERENCE can be written to look surprisingly like the INTERSECTION procedure of Fig. 4.5. For UNION, we must attach all elements from either the A or B list to the C list, in their proper, sorted order, so when the elements are unequal (lines 12-14), we add the smaller to the C list just as we do when the elements are equal. We also append to list C all elements on the list not exhausted when the test of line (5) fails. For DIFFERENCE we do not add an element to the C list when equal elements are found. We only add the current A list element to the C list when it is smaller than the current B list element; for then we know the former cannot be found on the B list. Also, we add to C those elements on A when and if the test of line (5) fails because B is exhausted.
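The two variations described above might look as follows in Python. This is a sketch that assumes the same header-cell layout and elements comparable by <; the names `Cell`, `from_sorted`, `to_list`, `union`, and `difference` are illustrative rather than taken from the text.

```python
class Cell:
    """One list cell with an 'element' field and a 'next' link (assumed layout)."""
    def __init__(self, element=None, next=None):
        self.element = element
        self.next = next


def from_sorted(items):
    """Build a header-cell list from items already in increasing order."""
    header = Cell()
    last = header
    for x in items:
        last.next = Cell(x)
        last = last.next
    return header


def to_list(header):
    """Collect the elements after the header cell into a Python list."""
    out = []
    current = header.next
    while current is not None:
        out.append(current.element)
        current = current.next
    return out


def union(ahead, bhead):
    """Return a new sorted list holding every element on A or on B, once each."""
    pc = Cell()
    ccurrent = pc
    a, b = ahead.next, bhead.next
    while a is not None and b is not None:
        if a.element == b.element:
            value, a, b = a.element, a.next, b.next
        elif a.element < b.element:
            value, a = a.element, a.next     # smaller element goes to C
        else:
            value, b = b.element, b.next
        ccurrent.next = Cell(value)
        ccurrent = ccurrent.next
    rest = a if a is not None else b         # the list not yet exhausted
    while rest is not None:
        ccurrent.next = Cell(rest.element)
        ccurrent = ccurrent.next
        rest = rest.next
    return pc


def difference(ahead, bhead):
    """Return a new sorted list holding the elements on A that are not on B."""
    pc = Cell()
    ccurrent = pc
    a, b = ahead.next, bhead.next
    while a is not None and b is not None:
        if a.element == b.element:
            a, b = a.next, b.next            # on both lists: leave it out
        elif a.element < b.element:
            ccurrent.next = Cell(a.element)  # cannot appear later on B
            ccurrent = ccurrent.next
            a = a.next
        else:
            b = b.next
    while a is not None:                     # B exhausted: keep the rest of A
        ccurrent.next = Cell(a.element)
        ccurrent = ccurrent.next
        a = a.next
    return pc
```

Both functions build fresh cells for C and only scan A and B, so the input lists are left unchanged.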

The operator ASSIGN(A, B) copies list B into list A. Note that this operator cannot be implemented simply by making the header cell of A point to the same place as the header cell of B, because in that case, subsequent changes to B would cause unexpected changes to A. The MIN operator is especially easy; just return the first element on the list. DELETE and FIND can be implemented by finding the target item as discussed for general lists and, in the case of a DELETE, disposing of its cell. Lastly, insertion is not difficult to implement, but we must arrange to insert the new element into the proper position. Figure 4.6 shows a procedure INSERT that takes as parameters an element and a pointer to the header cell of a list, and inserts the element into the list. Figure 4.7 shows the crucial cells and pointers just before (solid) and after (dashed) insertion.
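The remaining operators can be sketched in Python along the same lines. The fragment below is an assumption-laden illustration and not the procedure of Fig. 4.6: `assign` copies cell by cell so the two lists share no storage, `minimum` returns the first element, and `insert` and `delete` stop at the cell just before the target position. All names are hypothetical, and raising `ValueError` on an empty set is a choice made here, not something the text specifies.

```python
class Cell:
    """One list cell with an 'element' field and a 'next' link (assumed layout)."""
    def __init__(self, element=None, next=None):
        self.element = element
        self.next = next


def from_sorted(items):
    """Build a header-cell list from items already in increasing order."""
    header = Cell()
    last = header
    for x in items:
        last.next = Cell(x)
        last = last.next
    return header


def to_list(header):
    """Collect the elements after the header cell into a Python list."""
    out = []
    current = header.next
    while current is not None:
        out.append(current.element)
        current = current.next
    return out


def assign(ahead, bhead):
    """Make list A a cell-by-cell copy of list B; no cells are shared."""
    last = ahead
    current = bhead.next
    while current is not None:
        last.next = Cell(current.element)
        last = last.next
        current = current.next
    last.next = None


def minimum(header):
    """Return the smallest element, which is the first one on a sorted list."""
    if header.next is None:
        raise ValueError("MIN of an empty set")  # behavior assumed here
    return header.next.element


def insert(x, header):
    """Insert x in its sorted position unless it is already on the list."""
    p = header
    while p.next is not None and p.next.element < x:
        p = p.next
    if p.next is None or p.next.element != x:
        p.next = Cell(x, p.next)


def delete(x, header):
    """Unlink the cell holding x, if there is one."""
    p = header
    while p.next is not None and p.next.element < x:
        p = p.next
    if p.next is not None and p.next.element == x:
        p.next = p.next.next
```

Because `assign` builds new cells, a later `insert` or `delete` on B leaves A untouched, which is the property that sharing a header pointer would lose.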

    procedure INTERSECTION ( ahead, bhead: ↑ celltype;
            var pc: ↑ celltype );
        { computes the intersection of sorted lists A and B with
          header cells ahead and bhead, leaving the result
          as a sorted list whose header is pointed to by pc }
        var
            acurrent, bcurrent, ccurrent: ↑ celltype;
            { the current cells of lists A and B, and the last cell
              added to list C }
        begin
(1)         new(pc); { create header for list C }
(2)         acurrent := ahead↑.next;
(3)         bcurrent := bhead↑.next;
(4)         ccurrent := pc;
(5)         while (acurrent <> nil) and (bcurrent <> nil) do begin
                { compare current elements on lists A and B }
(6)             if acurrent↑.element = bcurrent↑.element then begin
                    { add to intersection }
(7)                 new( ccurrent↑.next );
(8)                 ccurrent := ccurrent↑.next;
(9)                 ccurrent↑.element := acurrent↑.element;
(10)                acurrent := acurrent↑.next;
(11)                bcurrent := bcurrent↑.next
                end
                else { elements unequal }
(12)                if acurrent↑.element < bcurrent↑.element then
(13)                    acurrent := acurrent↑.next
                    else
(14)                    bcurrent := bcurrent↑.next
            end;
(15)        ccurrent↑.next := nil
        end; { INTERSECTION }

In document Data Structures and Algorithms Alfred V Aho pdf (Page 143-146)