• No results found

Chapter Objectives. Chapter 9. Sequential Search. Search Algorithms. Search Algorithms. Binary Search

N/A
N/A
Protected

Academic year: 2022

Share "Chapter Objectives. Chapter 9. Sequential Search. Search Algorithms. Search Algorithms. Binary Search"

Copied!
8
0
0

Loading.... (view fulltext now)

Full text

(1)

Data Structures Using C++ 1

Chapter 9

Search Algorithms

Data Structures Using C++ 2

Chapter Objectives

• Learn the various search algorithms

• Explore how to implement the sequential and binary search algorithms

• Discover how the sequential and binary search algorithms perform

• Become aware of the lower bound on comparison-based search algorithms

• Learn about hashing

Data Structures Using C++ 3

Sequential Search

template<class elemType>

int arrayListType<elemType>::seqSearch(const elemType& item) {

int loc;

bool found = false;

for(loc = 0; loc < length; loc++) if(list[loc] == item) {

found = true;

break;

} if(found)

return loc;

else return -1;

}//end seqSearch

What is the time complexity?

Data Structures Using C++ 4

Search Algorithms

• Search item: target

• To determine the average number of comparisons in the successful case of the sequential search algorithm:

– Consider all possible cases

– Find the number of comparisons for each case – Add the number of comparisons and divide by

the number of cases

Search Algorithms

Suppose that there are n elements in the list. The following expression gives the average number of comparisons, assuming that each element is equally likely to be sought:

It is known that

Therefore, the following expression gives the average number of comparisons made by the sequential search in the successful case:

Binary Search

(assumes list is sorted)

(2)

Data Structures Using C++ 7

Binary Search: middle element

first + last mid = 2

Data Structures Using C++ 8

Binary Search

template<class elemType>

int orderedArrayListType<elemType>::binarySearch (const elemType& item) {

int first = 0;

int last = length - 1;

int mid;

bool found = false;

while(first <= last && !found) {

mid = (first + last) / 2;

if(list[mid] == item) found = true;

else

if(list[mid] > item) last = mid - 1;

else

first = mid + 1;

} if(found)

return mid;

else return –1;

}//end binarySearch

Data Structures Using C++ 9

Binary Search: Example

Data Structures Using C++ 10

Binary Search: Example

• Unsuccessful search

• Total number of comparisons is 6

Performance of Binary Search Performance of Binary Search

(3)

Data Structures Using C++ 13

Performance of Binary Search

• Unsuccessful search

– for a list of length n, a binary search makes approximately 2 log2 (n + 1) key comparisons

• Successful search

– for a list of length n, on average, a binary search makes 2 log2 n – 4 key comparisons

• Worst case upper bound: 2 + 2 log2n

Data Structures Using C++ 14

Search Algorithm Analysis Summary

Data Structures Using C++ 15

Lower Bound on Comparison- Based Search

• Definition: A comparison-based search algorithm performs its search by repeatedly comparing the target element to the list elements.

• Theorem: Let L be a list of size n > 1. Suppose that the elements of L are sorted. If SRH(n) denotes the minimum number of comparisons needed, in the worst case, by using a comparison-based algorithm to recognize whether an element x is in L, then SRH(n) = log2 (n + 1).

– If list not sorted, worst case is n comparisons

• Corollary: The binary search algorithm is the optimal worst-case algorithm for solving search problems by the comparison method (when the list is sorted).

– For unsorted lists, sequential search is optimal

Data Structures Using C++ 16

Hashing

• An alternative to comparison-based search

• Requires storing data in a special data structure, called a hash table

• Main objectives to choosing hash functions:

– Choose a hash function that is easy to compute – Minimize the number of collisions

Commonly Used Hash Functions

• Mid-Square

– Hash function, h, computed by squaring the identifier – Using appropriate number of bits from the middle of

the square to obtain the bucket address – Middle bits of a square usually depend on all the

characters, it is expected that different keys will yield different hash addresses with high probability, even if some of the characters are the same

Commonly Used Hash Functions

• Folding

– Key X is partitioned into parts such that all the parts, except possibly the last parts, are of equal length – Parts then added, in convenient way, to obtain hash

address

• Division (Modular arithmetic) – Key X is converted into an integer iX – This integer divided by size of hash table to get

remainder, giving address of X in HT

(4)

Data Structures Using C++ 19

Commonly Used Hash Functions

Suppose that each key is a string. The following C++ function uses the division method to compute the address of the key:

int hashFunction(char *key, int keyLength) {

int sum = 0;

for(int j = 0; j <= keyLength; j++) sum = sum + static_cast<int>(key[j]);

return (sum % HTSize);

}//end hashFunction

Data Structures Using C++ 20

Collision Resolution

• Algorithms to handle collisions

• Two categories of collision resolution techniques

– Open addressing (closed hashing) – Chaining (open hashing)

Data Structures Using C++ 21

Collision Resolution:

Open Addressing

Pseudocode implementing linear probing:

hIndex = hashFunction(insertKey);

found = false;

while(HT[hIndex] != emptyKey && !found) if(HT[hIndex].key == key)

found = true;

else

hIndex = (hIndex + 1) % HTSize;

if(found)

cerr<<”Duplicate items are not allowed.”<<endl;

else

HT[hIndex] = newItem;

Data Structures Using C++ 22

Linear Probing

• 9 will be next location if h(x) = 6,7,8, or 9

ƒ Probability of 9 being next = 4/20, for 14, it’s 5/20, but only 1/20 for 0 or 1

ƒ Clustering

Random Probing

• Uses a random number generator to find the next available slot

• ith slot in the probe sequence is: (h(X) + r

i

)

% HTSize where r

i

is the ith value in a random permutation of the numbers 1 to HTSize – 1

• All insertions and searches use the same sequence of random numbers

Quadratic Probing

• ith slot in the probe sequence is: (h(X) + i2) % HTSize (start i at 0)

• Reduces primary clustering of linear probing

• We do not know if it probes all the positions in the table

• When HTSize is prime, quadratic probing probes about half the table before repeating the probe sequence

(5)

Data Structures Using C++ 25

Deletion: Open Addressing

• When deleting, need to remove the item from its spot, but cannot reset it to empty (Why?)

Data Structures Using C++ 26

Deletion: Open Addressing

• IndexStatusList[i] set to –1 to mark item i as deleted

Data Structures Using C++ 27

Collision Resolution:

Chaining (Open Hashing)

• No probing needed; instead put linked list at each hash

position Data Structures Using C++ 28

Hashing Analysis

Let

Then a is called the load factor

Average Number of Comparisons

• Linear probing:

• Successful search:

• Unsuccessful search:

• Quadratic probing:

• Successful search:

• Unsuccessful search:

Chaining:

Average Number of Comparisons

1. Successful search

2. Unsuccessful search

(6)

1

Sear ching L ists

aremanyinstanceswhenoneisinterested andsearchingalist: phonecompanywantstoprovidecallerID: venaphonenumberanameisreturned. whoplaystheLottothinkstheycan theirchancetowiniftheykeeptrack thewinningnumberfromthelast3years. thesehavesimplesolutionsusingarrays, lists,binarytrees,etc. ,noneofthesesolutionsisadequate, willsee.

Hashing2

Pr oblematic L ist Sear ching

Herearesomesolutionstotheproblems: •ForcallerID: Useanarrayindexedbyphonenumber. ∗Requiresoneoperationtoreturnname. ∗Anarrayofsize1,000,000,000isneeded. Usealinkedlist. ∗ThisrequiresO(n)timeandspace,wheren isthenumberofactualphonenumbers. ∗Thisisthebestwecandospace-wise,but thetimerequiredishorrendous. Abalancedbinarytree:O(n)spaceand O(logn)time.Better,butnotgoodenough. •FortheLottowecouldusethesamesolutions. Thenumberofpossiblelottonumbersis 15,625,000fora6/50lotto. Thenumberofentriesstoredisontheorderof 1000,assumingadailylotto.

s

Hashing3

The H ash T able Solution

•AHashtableissimilartoanarray: Hasfixedsizem. Anitemwithkeykgoesintoindexh(k), wherehisafunctionfromthekeyspaceto {0,1,...,m−1} •Thefunctionhiscalledahashfunction. •Wecallh(k)thehashvalueofk. •Ahashtableallowsustostorevaluesfroma largesetinasmallarrayinsuchawaythat searchingisfast. •Thespacerequiredtostorennumbersinahash tableofsizemisO(m+n).Noticethatthisdoes notdependonthesizeofthekeyspace. •Theaveragetimerequiredtoinsert,delete,and searchforanelementinahashtableisO(1). •Soundsperfect.Thenwhat’stheproblem?Let’s lookatanexample. 4

Hash T a ble E xample

5friendswhosephonenumbersIwantto ahashtableofsize8.Theirnamesand are: SusyOlson555-1212 SarahSkillet555-4321 RyanHamster545-3241 JustinCase555-6798 ChrisLindmeyer535-7869 ahashfunctionh(k)=kmod8,wherek phonenumberviewedasa7-digitdecimal . that mod8=5453241mod8=1. tputSarahSkilletandRyanHamster theposition1. efixthisproblem?

Hashing5

Hash T a ble P ro blems

•Theproblemwithhashtablesisthattwokeyscan havethesamehashvalue.Thisiscalledcollision. •Thereareseveralwaystodealwiththisproblem: Pickthehashfunctionhtominimizethe numberofcollisions. Implementthehashtableinawaythatallows keyswiththesamehashvaluetoallbestored. •Thesecondmethodisalmostalwaysneeded,even forverygoodhashfunctions.Why? •Wewilltalkabout2collisionresolution techniques: Chaining OpenAddressing •Butfirst,abitmoreonhashfunctions

Hashing6

Hash Functions

•Mosthashfunctionsassumethekeyscomefrom theset{0,1,...}. •Ifthekeysarenotnaturalnumbers,somemethod mustbeusedtoconvertthem. PhonenumberandIDnumberscanbe convertedbyremovingthehyphens. CharacterscanbeconvertedusingASCII. Stringscanbeconvertedbyconvertingeach characterusingASCII,andtheninterpreting thestringofnaturalnumbersasifitwhere storedbase128. •Weneedtochooseagoodhashfunction. •Ahashfunctionisgoodif itcanbecomputedquickly,and thekeysaredistributeduniformlythroughout thetable.

(7)

7

Good Hash Functions

method: h(k)=kmodm mmustbechosencarefully. of2and10canbebad.Why? nottooclosetopowersof2 choice. pickmbychoosinganappropriate thatisclosetothetablesizewe method:LetAbeaconstant 1.Thenweuse h(k)=m(kAmod1), mod1”meansthefractionalpartof ofmisnotascriticalhere. choosemtomaketheimplementation fast.

Hashing8

Uni v ersal Hashing

•Letxandybedistinctkeys. •LetHbeasetofhashfunctions. •Hiscalleduniversalifthenumberoffunctions h∈Hforwhichh(x)=h(y)isprecisely|H|/m. •Inotherwords,ifwepickarandomh∈H,the probabilitythatxandycollideunderhis1/m. •Universalhashingcanbeusefulinmany situations. Youareaskedtocomeupwithahashing techniqueforyourboss. Yourbosstellsyouthatafteryouaredone,he willpicksomekeystohash. Ifyougettoomanycollisions,hewillfireyou. Ifyouuseuniversalhashing,theonlywayhe cansucceedisbygettinglucky.Why?

Hashing9

Collision R esolution: Chaining

•Withchaining,wesetupanarrayoflinks, indexedbythehashvalues,tolistsofitemswith thesamehashvalue. 13465

5 21

320 1 2 3 4 •Letnbethenumberofkeys,andmthesizeofthe hashtable.Definetheloadfactorα=n/m. •Successfulandunsuccessfulsearchingbothtake timeΘ(1+α)onaverage,assumingsimple uniformhashing. •Bysimpleuniformhashingwemeanthatakey hasequalprobabilitytohashintoanyofthem slots,independentoftheotherelementsofthe table. 10

esolution: Open Addr essing

Addressing,allelementsarestoredin severalthings isusedthanwithchainingsince thavetostorepointers. tablehasanabsolutesizelimitofm. lanningaheadisimportantwhenusing haveawaytostoremultipleelements samehashvalue. hashfunction,weneedtouseaprobe Thatis,asequenceofhashvalues. thesequenceonebyoneuntilwe position. wedothesamething,skipping donotmatchourkey.

Hashing11

Pr obe S equences

•Wedefineourhashfunctionswithanextra parameter,theprobenumberi.Thusourhash functionslooklikeh(k,i). •Wecomputeh(k,0),h(k,1),...,h(k,i)until h(k,i)isempty. •Thehashfunctionhmustbesuchthattheprobe sequenceh(k,0),...h(k,m−1)isa permutationof0,1,...,m−1.Why? •Insertionandsearchingarefairlyquick,assuming agoodimplementation. •Deletioncanbeaproblembecausetheprobe sequencecanbecomplex. •Wewilldiscuss2waysofdefiningprobe sequences: LinearProbing DoubleHashing

Hashing12

Linear Pr obing

•Letgbeahashfunction. •Whenweuselinearprobing,theprobesequence iscomputedby h(k,i)=(g(k)+i)modm. •Whenweinsertanelement,westartwiththehash value,andproceedelementbyelementuntilwe findanemptyslot. •Forsearching,westartwiththehashvalueand proceedelementbyelementuntilwefindthekey wearelookingfor. •Example:Letg(k)=kmod13.Wewillinsert thefollowingkeysintothehashtable: 18,41,22,44,59,32,31,73 5432161211109870 •Problem:Thevaluesinthetabletendtocluster.

(8)

13

Double H ashing

hashingweusetwohashfunctions. andh2behashfunctions. ourprobesequenceby h(k,i)=(h1(k)+i×h2(k))modm. pickm=2nforsomen,andensurethat isalwaysodd. hashingtendstodistributekeysmore thanlinearprobing.

Hashing14

Double H ashing Example

•Leth1=kmod13 •Leth2=1+(kmod8),and •Letthehashtablehavesize13. •Thenourprobesequenceisdefinedby h(k,i)=(h1(k)+i×h2(k))mod13 =(kmod13+i(1+(kmod8)))mod13 •Insertthefollowingkeysintothetable: 18,41,22,44,59,32,31,73 5432161211109870

Hashing15

Open Addr essing: P erf ormance

•Again,wesetα=n/m.Thisistheaverage numberofkeysperarrayindex. •Notethatα≤1foropenaddressing. •Weassumeagoodprobesequencehasbeenused. Thatis,foranykey,eachpermutationof 0,1...,m−1isequallylikelyasaprobe sequence. •Theaveragenumberofprobesforinsertionor unsuccessfulsearchisatmost1/(1−α). •Theaveragenumberofprobesforasuccessful searchisatmost 1 αln1 1−α+1 α. 16

OA: SS OA: I/US Ch: S Ch: I

0 5 10 15 20 0 0.2

0.4 0.6

0.8 1

Hashing Performance: Chaining Verses Open Addressing

1 1+x 1/(1-x) (1/x)*log(1/(1-x))+1/x

References

Related documents

The Company‘s current pro-forma financial outlook builds on the Restructuring Plan presented to Congress on December 2, incorporating updated economic and industry

For an array of a million elements, binary search, O(log N), will find the target element with a worst case of only 20 comparisons.. Linear search, O(N), on average will take

If the grid is stored using a binary search tree, the expected running time is O(n log n).. Using hashing, the running time improves by

NML also offers blood antibody testing services for groups of 5-10 prevaccinated youngstock over 6 months of age (in each management group) to identify if animals have

How is texas, short term lease apartments antonio, like to stay suite rentals available and all of which reside in san antonio apartment.. Roommate may give each year, short term

We will assess methods used to blind outcome assessment as low, high or unclear risk of bias.. (4) Incomplete outcome data (checking for possible attrition bias due to the

Customer check-up see Maintenance schedules chapter Change of descaling filter see menu service date Descaling see service routines menu.. Menu PIN 

What is the average number of comparisons needed in a sequential search to determine the position of an element in an array of 100 elements, if the elements are ordered from