Restructuring Hash Tables - Data Structures and Algorithms Alfred V Aho pdf

If we use an open hash table, the average time for operations increases as N/B, a quantity that grows rapidly as the number of elements exceeds the number of buckets. Similarly, for a closed hash table, we saw from Fig. 4.15 that efficiency goes down as N approaches B, and N cannot exceed B.

To retain the constant time per operation that is theoretically possible with hash tables, we suggest that if N gets too large, for example N ≥ 0.9B for a closed table or N ≥ 2B for an open one, we simply create a new hash table with twice as many buckets. The insertion of the current members of the set into the new table will, on the average, take less time than it took to insert them into the smaller table, and we more than make up this cost when doing subsequent dictionary operations.
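The restructuring rule can be sketched as follows. This is a minimal illustration in Python rather than the Pascal of the text, and it makes several assumptions that are not from the text: the class name OpenHashSet, the initial bucket count of 4, and the use of Python's built-in hash() in place of the hash function h. It doubles the bucket array of an open hash table as soon as N ≥ 2B.

```python
class OpenHashSet:
    """Open (chained) hash table that doubles its bucket count when N >= 2B."""

    def __init__(self, buckets=4):
        self.buckets = [[] for _ in range(buckets)]  # B bucket lists
        self.count = 0                               # N, the number of members

    def _index(self, x, table):
        # hash() stands in for the text's hash function h (an assumption)
        return hash(x) % len(table)

    def member(self, x):
        return x in self.buckets[self._index(x, self.buckets)]

    def insert(self, x):
        if self.member(x):
            return
        self.buckets[self._index(x, self.buckets)].append(x)
        self.count += 1
        if self.count >= 2 * len(self.buckets):      # N >= 2B: restructure
            self._restructure()

    def _restructure(self):
        # Build a table with twice as many buckets and reinsert every member.
        new_table = [[] for _ in range(2 * len(self.buckets))]
        for bucket in self.buckets:
            for x in bucket:
                new_table[self._index(x, new_table)].append(x)
        self.buckets = new_table
```

Starting from 4 buckets, inserting 100 distinct elements triggers a doubling at N = 8, 16, 32 and 64, so the table ends with 64 buckets and N/B stays below 2 throughout.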

4.9 Implementation of the Mapping ADT

Recall our discussion of the MAPPING ADT from Chapter 2 in which we defined a mapping as a function from domain elements to range elements. The operations for this ADT are:

1. MAKENULL(A) initializes the mapping A by making each domain element have no assigned range value.

2. ASSIGN(A, d, r) defines A(d) to be r.

3. COMPUTE(A, d, r) returns true and sets r to A(d) if A(d) is defined; false is returned otherwise.

The hash table is an effective way to implement a mapping. The operations ASSIGN and COMPUTE are implemented much like INSERT and MEMBER operations for a dictionary. Let us consider an open hash table first. We assume the hash function h(d) hashes domain elements to bucket numbers. While for the dictionary, buckets consisted of a linked list of elements, for the mapping we need a list of domain elements paired with their range values. That is, we replace the cell definition in Fig. 4.12 by

type
    celltype = record
        domainelement: domaintype;
        range: rangetype;
        next: ↑ celltype
    end

where domaintype and rangetype are whatever types domain and range elements have in this mapping. The declaration of a MAPPING is

type
    MAPPING = array[0..B-1] of ↑ celltype

This array is the bucket array for a hash table. The procedure ASSIGN is written in Fig. 4.17. The code for MAKENULL and COMPUTE is left as an exercise.
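As one possible approach to the three mapping operations on an open hash table, consider the following sketch, written in Python rather than the Pascal of the text. The bucket count B = 8, the use of Python's hash() for h, and the conventions that makenull returns a fresh bucket array and compute returns a (found, value) pair are all assumptions made for the illustration, not part of the text.

```python
B = 8  # number of buckets; an arbitrary choice for this sketch


class Cell:
    """One list cell: a domain element paired with its range value."""

    def __init__(self, domainelement, range_value, next_cell):
        self.domainelement = domainelement
        self.range = range_value
        self.next = next_cell


def h(d):
    # Python's hash() stands in for the text's hash function (an assumption).
    return hash(d) % B


def makenull():
    """Return a mapping in which no domain element has a range value."""
    return [None] * B


def assign(A, d, r):
    """Define A(d) to be r, replacing any old value for d."""
    bucket = h(d)
    current = A[bucket]
    while current is not None:
        if current.domainelement == d:
            current.range = r          # replace old value for d
            return
        current = current.next
    # d was not found on the list: add a new cell at the front
    A[bucket] = Cell(d, r, A[bucket])


def compute(A, d):
    """Return (True, A(d)) if A(d) is defined, and (False, None) otherwise."""
    current = A[h(d)]
    while current is not None:
        if current.domainelement == d:
            return True, current.range
        current = current.next
    return False, None
```

Here assign follows the same steps as the procedure ASSIGN of Fig. 4.17, and compute is the corresponding search without the insertion step.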

Similarly, we can use a closed hash table as a mapping. Define cells to consist of domain element and range fields and declare a MAPPING to be an array of cells. As for open hash tables, let the hash function apply to domain elements, not range elements. We leave the implementation of the mapping operations for a closed hash table as an exercise.
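A closed-table version might look like the sketch below, again in Python. It assumes linear probing as the rehash strategy, Python's hash() as the hash function, and a mapping that never deletes pairs, so no "deleted" marker is needed. The names ClosedMapping and EMPTY are hypothetical.

```python
EMPTY = object()  # marks a cell that has never held a pair


class ClosedMapping:
    """Mapping stored in a closed hash table with linear probing."""

    def __init__(self, B=8):
        self.cells = [EMPTY] * B   # each used cell holds a (domain, range) pair

    def _probe(self, d):
        # Yield cell indices h(d), h(d)+1, ..., wrapping around the table once.
        B = len(self.cells)
        start = hash(d) % B        # hash applies to the domain element only
        for i in range(B):
            yield (start + i) % B

    def assign(self, d, r):
        for i in self._probe(d):
            cell = self.cells[i]
            if cell is EMPTY or cell[0] == d:
                self.cells[i] = (d, r)
                return
        raise OverflowError("table is full: N cannot exceed B")

    def compute(self, d):
        for i in self._probe(d):
            cell = self.cells[i]
            if cell is EMPTY:      # an empty cell ends the search
                return False, None
            if cell[0] == d:
                return True, cell[1]
        return False, None
```

Because the table is closed, an attempt to assign a value to a new domain element when all B cells are in use fails, which is the situation the restructuring rule is meant to prevent.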

procedure ASSIGN ( var A: MAPPING; d: domaintype; r: rangetype );
    var
        bucket: integer;
        current: ↑ celltype;
    begin
        bucket := h(d);
        current := A[bucket];
        while current <> nil do
            if current↑.domainelement = d then begin
                current↑.range := r; { replace old value for d }
                return
            end
            else
                current := current↑.next;
        { at this point, d was not found on the list }
        current := A[bucket]; { use current to remember first cell }
        new(A[bucket]);
        A[bucket]↑.domainelement := d;
        A[bucket]↑.range := r;
        A[bucket]↑.next := current
    end; { ASSIGN }

Fig. 4.17. The procedure ASSIGN for an open hash table.

4.10 Priority Queues

The priority queue is an ADT based on the set model with the operations INSERT and DELETEMIN, as well as the usual MAKENULL for initialization of the data structure. To define the new operation, DELETEMIN, we first assume that elements of the set have a "priority" function defined on them; for each element a, p(a), the priority of a, is a real number or, more generally, a member of some linearly ordered set. The operation INSERT has the usual meaning, while DELETEMIN is a function that returns some element of smallest priority and, as a side effect, deletes it from the set. Thus, as its name implies, DELETEMIN is a combination of the operations DELETE and MIN discussed earlier in the chapter.

Example 4.9. The term "priority queue" comes from the following sort of use for this ADT. The word "queue" suggests that people or entities are waiting for some service, and the word "priority" suggests that the service is given not on a "first-come-first-served" basis as for the QUEUE ADT, but rather that each person has a priority based on the urgency of need. An example is a hospital waiting room, where patients having potentially fatal problems will be taken before any others, no matter how long the respective waits are.

As a more mundane example of the use of priority queues, a time-shared computing system needs to maintain a set of processes waiting for service. Usually, the system designers want to make short processes appear to be instantaneous (in practice, response within a second or two appears instantaneous), so these are given priority over processes that have already consumed substantial time. A process that requires several seconds of computing time cannot be made to appear instantaneous, so it is a sensible strategy to defer such processes until all processes that have a chance to appear instantaneous have been done. However, if we are not careful, processes that have taken substantially more time than average may never get another time slice and will wait forever.

One possible way to favor short processes, yet not lock out long ones, is to give process P a priority 100 t_used(P) - t_init(P). The parameter t_used gives the amount of time consumed by the process so far, and t_init gives the time at which the process initiated, measured from some "time zero." Note that priorities will generally be large negative integers, unless we choose to measure t_init from a time in the future. Also note that 100 in the above formula is a "magic number"; it is selected to be somewhat larger than the largest number of processes we expect to be active at once. The reader may observe that if we always pick the process with the smallest priority number, and there are not too many short processes in the mix, then in the long run, a process that does not finish quickly will receive 1% of the processor's time. If that is too much or too little, another constant can replace 100 in the priority formula.

We shall represent processes by records consisting of a process identifier and a priority number. That is, we define

type
    processtype = record
        id: integer;
        priority: integer
    end;

The priority of a process is the value of the priority field, which here we have defined to be an integer. We can define the priority function as follows.

function p ( a: processtype ): integer;
    begin
        return (a.priority)
    end;

In selecting processes to receive a time slice, the system maintains a priority queue WAITING of processtype elements and uses two procedures, initial and select, to manipulate the priority queue by the operations INSERT and DELETEMIN.

Whenever a process is initiated, procedure initial is called to place a record for that process in WAITING. Procedure select is called when the system has a time slice to award to some process. The record for the selected process is deleted from WAITING, but retained by select for reentry into the queue with a new priority; the priority is increased by 100 times the amount of time used.

We make use of function currenttime, which returns the current time, in whatever time units are used by the system, say microseconds, and we use procedure execute(P) to cause the process with identifier P to execute for one time slice. Figure 4.18 shows the procedures initial and select.

procedure initial ( P: integer );
    { initial places process with id P on the queue }
    var
        process: processtype;
    begin
        process.id := P;
        process.priority := - currenttime;
        INSERT (process, WAITING)
    end; { initial }

procedure select;
    { select allocates a time slice to process with highest priority }
    var
        begintime, endtime: integer;
        process: processtype;
    begin
        process := ↑ DELETEMIN(WAITING);
            { DELETEMIN returns a pointer to the deleted element }
        begintime := currenttime;
        execute (process.id);
        endtime := currenttime;
        process.priority := process.priority + 100*(endtime - begintime);
            { adjust priority to incorporate amount of time used }
        INSERT (process, WAITING)
            { put selected process back on queue with new priority }
    end; { select }

Fig. 4.18. Allocating time to processes.
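To see how the priorities evolve, the two procedures can be simulated in a few lines of Python. The sketch below is not the implementation of the text: it assumes a simulated clock in place of currenttime, an execute function that returns the amount of time the slice used, and Python's heapq module as a stand-in priority queue. The names Scheduler and demo are hypothetical.

```python
import heapq


class Scheduler:
    def __init__(self):
        self.waiting = []   # heap of (priority, id) pairs, smallest priority first
        self.clock = 0      # simulated current time, in arbitrary units

    def initial(self, pid):
        """Place process pid on the queue with priority -currenttime."""
        heapq.heappush(self.waiting, (-self.clock, pid))

    def select(self, execute):
        """Give one time slice to the process with the smallest priority number."""
        priority, pid = heapq.heappop(self.waiting)     # DELETEMIN
        begintime = self.clock
        self.clock += execute(pid)                      # assumed: returns time used
        endtime = self.clock
        priority += 100 * (endtime - begintime)         # charge 100 per unit used
        heapq.heappush(self.waiting, (priority, pid))   # INSERT with new priority
        return pid


def demo():
    s = Scheduler()
    s.initial(1)            # process 1 starts at time 0, priority 0
    s.clock = 5
    s.initial(2)            # process 2 starts at time 5, priority -5
    # every slice uses one time unit in this simulation
    return [s.select(lambda pid: 1) for _ in range(4)]
```

In this run demo() returns [2, 1, 2, 1]: the more recently initiated process has the smaller priority number and goes first, after which each slice adds 100 to the priority of the process that ran, so the two processes alternate.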

4.11 Implementations of Priority Queues

With the exception of the hash table, the set implementations we have studied so far are also appropriate for priority queues. The reason the hash table is inappropriate is that there is no convenient way to find the minimum element, so hashing merely adds complications, and does not improve performance over, say, a linked list.

If we use a linked list, we have a choice of sorting it or leaving it unsorted. If we sort the list, finding a minimum is easy -- just take the first element on the list. However, insertion requires scanning half the list on the average to maintain the sorted list. On the other hand, we could leave the list unsorted, which makes insertion easy and selection of a minimum more difficult.
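The sorted-list alternative can be sketched as follows, in Python and under the assumptions that the list has a header cell and that elements are bare integer priorities. The names Cell, insert_sorted and deletemin_sorted are hypothetical.

```python
class Cell:
    def __init__(self, priority=None, next_cell=None):
        self.priority = priority
        self.next = next_cell


def insert_sorted(header, priority):
    """Scan to the first cell whose priority is not smaller, and insert before it."""
    current = header
    while current.next is not None and current.next.priority < priority:
        current = current.next
    current.next = Cell(priority, current.next)


def deletemin_sorted(header):
    """The minimum is always the first cell after the header."""
    first = header.next
    if first is None:
        raise IndexError("empty priority queue")
    header.next = first.next
    return first.priority
```

Insertion does the scanning here, while deletion of the minimum takes a constant number of steps; the unsorted list reverses these costs.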

Example 4.10. We shall implement DELETEMIN for an unsorted list of elements of type processtype, as defined in Example 4.9. The list is headed by an empty cell. The implementations of INSERT and MAKENULL are straightforward, and we leave the implementation using sorted lists as an exercise. Figure 4.19 gives the declaration for cells, for the type PRIORITYQUEUE, and for the procedure DELETEMIN.
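For readers who want something to experiment with, here is a Python sketch of the same idea. It is not a transcription of Fig. 4.19; the names Process, Cell, makenull, insert and deletemin, and the choice to raise IndexError on an empty queue, are assumptions of the sketch.

```python
from dataclasses import dataclass


@dataclass
class Process:
    id: int
    priority: int


class Cell:
    def __init__(self, element=None, next_cell=None):
        self.element = element
        self.next = next_cell


def makenull():
    """Return an empty priority queue: a lone header cell."""
    return Cell()


def insert(x, A):
    """Unsorted list, so simply add x right after the header."""
    A.next = Cell(x, A.next)


def deletemin(A):
    """Remove and return an element of smallest priority."""
    if A.next is None:
        raise IndexError("empty priority queue")
    prev = A                                   # cell before the best cell so far
    lowpriority = A.next.element.priority      # smallest priority seen so far
    current = A.next
    while current.next is not None:
        if current.next.element.priority < lowpriority:
            prev = current
            lowpriority = current.next.element.priority
        current = current.next
    cell = prev.next                           # cell holding the minimum
    prev.next = cell.next                      # unlink it from the list
    return cell.element
```

Keeping a reference to the cell before the minimum, rather than to the minimum itself, is what allows the cell to be unlinked without a second pass; the header cell guarantees that such a predecessor always exists.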


In document Data Structures and Algorithms Alfred V Aho pdf (Page 163-168)