Linear Search - Algorithms and Data Structures Niklaus Wirth pdf

1.9. Searching

1.9.1. Linear Search

When no further information is given about the searched data, the obvious approach is to proceed sequentially through the array in order to increase step by step the size of the section, where the desired element is known not to exist. This approach is called linear search. There are two conditions which terminate the search:

1. The element is found, i.e. ai = x.

2. The entire array has been scanned, and no match was found. This results in the following algorithm:

i := 0;

WHILE (i < N) & (a[i] # x) DO INC(i) END

Note that the order of the terms in the Boolean expression is relevant. The invariant, i.e the condition satisfied before each incrementing of the index i, is

(0 ≤ i < N) & (Ak : 0 ≤ k < i : ak ≠ x)

expressing that for all values of k less than i no match exists. From this and the fact that the search terminates only if the condition in the while-clause is false, the resulting condition is derived as

((i = N) OR (ai = x )) & (Ak : 0 < k < i : ak≠ x)

This condition not only is our desired result, but also implies that when the algorithm did find a match, it found the one with the least index, i.e. the first one. i = N implies that no match exists.

Termination of the repetition is evidently guaranteed, because in each step i is increased and therefore certainly will reach the limit N after a finite number of steps; in fact, after N steps, if no match exists. Each step evidently requires the incrementing of the index and the evaluation of a Boolean expression. Could this task be simplifed, and could the search thereby be accelerated? The only possibility lies in finding a simplification of the Boolean expression which notably consists of two factors. Hence, the only chance for finding a simpler solution lies in establishing a condition consisting of a single factor that implies both factors. This is possible only by guaranteeing that a match will be found, and is achieved by posting an additional element with value x at the end of the array. We call this auxiliary element a sentinel, because it prevents the search from passing beyond the index limit. The array a is now declared as

a: ARRAY N+1 OF INTEGER

and the linear search algorithm with sentinel is expressed by a[N] := x; i := 0;

WHILE a[i] # x DO INC(i) END

The resulting condition, derived from the same invariant as before, is (ai = x) & (Ak : 0 ≤ k < i : ak≠ x)

Evidently, i = N implies that no match (except that for the sentinel) was encountered. 1.9.2. Binary Search

There is quite obviously no way to speed up a search, unless more information is available about the searched data. It is well known that a search can be made much more effective, if the data are ordered. Imagine, for example, a telephone directory in which the names were not alphabetically listed. It would be utterly useless. We shall therefore present an algorithm which makes use of the knowledge that a is ordered, i.e., of the condition

Ak: 1 ≤ k < N : ak-1 ≤ ak

The key idea is to inspect an element picked at random, say am, and to compare it with the search argument x. If it is equal to x, the search terminates; if it is less than x, we infer that all elements with indices less or equal to m can be eliminated from further searches; and if it is greater than x, all with index greater or equal to m can be eliminated. This results in the following algorithm called binary search; it uses two index variables L and R marking the left and at the right end of the section of a in which an element may still be found.

L := 0; R := N-1; found := FALSE ; WHILE (L ≤ R) & ~found DO m := any value between L and R; IF a[m] = x THEN found := TRUE ELSIF a[m] < x THEN L := m+1 ELSE R := m-1

END

The loop invariant, i.e. the condition satisfied before each step, is (L ≤ R) & (Ak : 0 ≤ k < L : ak < x) & (Ak : R < k < N : ak > x) from which the result is derived as

found OR ((L > R) & (Ak : 0 ≤ k < L : ak < x ) & (Ak : R < k < N : ak > x)) which implies

(am = x) OR (Ak : 0 ≤ k < N : ak ≠ x)

The choice of m is apparently arbitrary in the sense that correctness does not depend on it. But it does influence the algorithm's effectiveness. Clearly our goal must be to eliminate in each step as many elements as possible from further searches, no matter what the outcome of the comparison is. The optimal solution is to choose the middle element, because this eliminates half of the array in any case. As a result, the maximum number of steps is log2N, rounded up to the nearest integer. Hence, this algorithm offers a drastic improvement over linear search, where the expected number of comparisons is N/2.

The efficiency can be somewhat improved by interchanging the two if-clauses. Equality should be tested second, because it occurs only once and causes termination. But more relevant is the question, whether -- as in the case of linear search -- a solution could be found that allows a simpler condition for termination. We indeed find such a faster algorithm, if we abandon the naive wish to terminate the search as soon as a match is established. This seems unwise at first glance, but on closer inspection we realize that the gain in efficiency at every step is greater than the loss incurred in comparing a few extra elements. Remember that the number of steps is at most log N. The faster solution is based on the following invariant:

(Ak : 0 ≤ k < L : ak < x) & (Ak : R ≤ k < N : ak≥ x)

and the search is continued until the two sections span the entire array. L := 0; R := N;

WHILE L < R DO m := (L+R) DIV 2;

IF a[m] < x THEN L := m+1 ELSE R := m END END

The terminating condition is L ≥ R. Is it guaranteed to be reached? In order to establish this guarantee, we must show that under all circumstances the difference R-L is diminished in each step. L < R holds at the

beginning of each step. The arithmetic mean m then satisfies L ≤ m < R. Hence, the difference is indeed diminished by either assigning m+1 to L (increasing L) or m to R (decreasing R), and the repetition terminates with L = R. However, the invariant and L = R do not yet establish a match. Certainly, if R = N, no match exists. Otherwise we must take into consideration that the element a[R] had never been compared. Hence, an additional test for equality a[R] = x is necessary. In contrast to the first solution, this algorithm -- like linear search -- finds the matching element with the least index.

In document Algorithms and Data Structures Niklaus Wirth pdf (Page 30-32)