The Open Addressing Method: This finds an alternate place relative to the hashed value i by applying an appropriate increment (for some constant c).

The record being deleted is on a chain (logical or physical) of collided records. This chain needs to be adjusted before the record is deleted, which is not an easy task. An easier way is to flag the record as deleted.
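The idea of probing with a fixed increment and flagging deleted records can be sketched as follows. This is only an illustration under assumptions: the probe sequence i, i + c, i + 2c, ... (taken modulo the table size), the class name OpenAddressTable and the table size are not given in the text above and are chosen here for the example.

```python
# Sketch of open addressing with a fixed increment c and "deleted" flags.
# The probe sequence, class name and table size are illustrative assumptions.

EMPTY = object()     # slot that has never been used
DELETED = object()   # slot whose record has been flagged as deleted


class OpenAddressTable:
    def __init__(self, size=11, c=1):
        # c should share no common factor with size so that every slot is reachable
        self.size = size
        self.c = c
        self.slots = [EMPTY] * size

    def _probe(self, key):
        """Yield slot numbers i, i + c, i + 2c, ... (mod size)."""
        i = hash(key) % self.size
        for step in range(self.size):
            yield (i + step * self.c) % self.size

    def insert(self, key, record):
        free = None                       # first reusable slot seen on the chain
        for pos in self._probe(key):
            slot = self.slots[pos]
            if slot is EMPTY:
                if free is None:
                    free = pos
                break                     # end of the chain: key is not present
            if slot is DELETED:
                if free is None:
                    free = pos            # a flagged slot may be reused
            elif slot[0] == key:
                self.slots[pos] = (key, record)   # overwrite existing record
                return
        if free is None:
            raise RuntimeError("table is full")
        self.slots[free] = (key, record)

    def find(self, key):
        for pos in self._probe(key):
            slot = self.slots[pos]
            if slot is EMPTY:
                return None               # chain ends here
            if slot is not DELETED and slot[0] == key:
                return slot[1]
        return None

    def delete(self, key):
        for pos in self._probe(key):
            slot = self.slots[pos]
            if slot is EMPTY:
                return False
            if slot is not DELETED and slot[0] == key:
                self.slots[pos] = DELETED  # flag only; the chain stays intact
                return True
        return False
```

Because a deleted slot is flagged and not emptied, a search for a record further along the same chain still passes over it and continues, so the chain never has to be rebuilt.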

Hashed files, in general, give good performance; however, they do not facilitate sequential access, and only one hashed access path can be set up.

Indexed Files

Although an indexed sequential file uses a sparse index and stores the index efficiently, it does not handle insertions and deletions efficiently, and it allows only one sparse index.

Indexed files use a dense index, in which the index has an entry for every key value. Many independent indexes can be created, which facilitates both sequential and random access on many fields (e.g., on both account number and customer name for the accounts file of a bank). Here the file records need not be physically stored in key sequence, which simplifies insertion. The index itself is stored in ascending order of key values to facilitate sequential retrieval of records in ascending key order. Being sorted, it can also be searched efficiently for random access.
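A small in-memory sketch may help to picture a dense index. The sample records, field names and function names below are invented for illustration; a real index would live on disk, whereas here a list position stands in for the record pointer.

```python
import bisect

# Records are kept in arrival order, not key order; the list position acts
# as the record pointer. Sample data and names are illustrative assumptions.
records = [
    {"account_no": 305, "customer": "Rao"},
    {"account_no": 101, "customer": "Mehta"},
    {"account_no": 210, "customer": "Das"},
]


def build_dense_index(records, field):
    """Return a sorted list of (key value, record position), one entry per record."""
    return sorted((rec[field], pos) for pos, rec in enumerate(records))


def random_access(index, records, key):
    """Binary-search the sorted index, then follow the pointer to the record."""
    i = bisect.bisect_left(index, (key,))
    if i < len(index) and index[i][0] == key:
        return records[index[i][1]]
    return None


def sequential_access(index, records):
    """Yield records in ascending order of the indexed field."""
    for _, pos in index:
        yield records[pos]


# Two independent indexes over the same file.
by_account = build_dense_index(records, "account_no")
by_customer = build_dense_index(records, "customer")
```

Here `random_access(by_customer, records, "Rao")` follows the customer-name index, while `sequential_access(by_account, records)` walks the same file in account-number order without the records themselves being sorted.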

For a large file, the index itself becomes large. Since it is stored on disk, it must be organised so that it permits efficient search and update without incurring a high I/O cost.

B-Tree:

The B-tree is a practical and efficient method for organising indexes on external storage devices. Each level in the B-tree is like a level in the index, leading to a multilevel index. It effectively provides indexes to indexes from one level to another, until we reach the node leading to the desired record. The B-tree data structure is defined as follows:

• an order m is associated with a B-tree

• the root node has at least 1 key value and 2 pointers

• all leaf nodes are at the same level

• all nodes other than the root have at least m/2 keys and m/2 + 1 pointers (the maximum number of keys is m)

Searching B-Tree for a given key value k:

• Start with the root node. If the node contains k, the search ends here.

• Otherwise, look for the two consecutive keys in the node between which k falls, and take the pointer between them to the node on the next level.

• Repeat the above process until k is found, or until a leaf node has been searched without finding k.

From the above we conclude that the maximum length of a search is equal to the height of the B-tree.
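The search steps above can be sketched as follows. The Node class, the function name and the tiny hand-built tree are assumptions made for illustration; nodes are plain in-memory objects here, whereas in a real index each node visit would be one disk read.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted key values in this node
        self.children = children or []  # len(keys) + 1 pointers; empty for a leaf


def search(root, k):
    """Return (found, nodes_read) for key value k."""
    node, nodes_read = root, 0
    while True:
        nodes_read += 1
        i = bisect_left(node.keys, k)       # first position with keys[i] >= k
        if i < len(node.keys) and node.keys[i] == k:
            return True, nodes_read
        if not node.children:               # leaf reached without finding k
            return False, nodes_read
        node = node.children[i]             # pointer between keys[i-1] and keys[i]


# A hand-built two-level tree used only to exercise the search.
root = Node([50], [Node([10, 20]), Node([60, 70, 80])])
```

In this sketch `search(root, 70)` reads two nodes, which is the height of the example tree, and no search reads more than that.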

Insertion of new key value:

• Starting from the root node, locate the leaf node B into which k must be placed.

• If B is not full (has fewer than m keys), then k is added to B (maintaining the order of keys).

• If B is already full, adding k to it will give it m+1 keys. We now need to split B. This is done as follows:

• get a free node B’.

• redistribute the m+1 keys between B and B’, each having m/2 keys; the middle key and a pointer to B’ are inserted in the parent of B using the same procedure.
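The split step can be illustrated on the key list of a single leaf. This is a minimal sketch, assuming m is even and ignoring pointers and the parent node; the function name is hypothetical.

```python
import bisect


def insert_into_leaf(keys, k, m):
    """Insert k into a sorted leaf holding at most m keys.

    Returns (left_keys, middle_key, right_keys). When no split is needed,
    middle_key and right_keys are None. Assumes m is even.
    """
    keys = list(keys)
    bisect.insort(keys, k)              # keep the keys in ascending order
    if len(keys) <= m:
        return keys, None, None         # B was not full: no split
    half = m // 2
    # m + 1 keys: m/2 stay in B, m/2 go to the new node, the middle key moves up
    return keys[:half], keys[half], keys[half + 1:]
```

For m = 4, inserting 25 into the full leaf [10, 20, 30, 40] leaves [10, 20] in the old node, puts [30, 40] in the new node, and sends 25 to the parent. If the parent is itself full, the same split is applied there.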

One may also note that a B-tree always grows upwards.

Deletion of a key:

• First ensure that the definition of the B-tree is preserved. The deletion of k is simple if it is in a leaf node. Otherwise, we replace it by the next higher key k1, which would be in a leaf, and delete k1 instead.

• While deleting, a leaf may become critical when the number of keys in it falls below m/2, in which case we either borrow key values from its sibling nodes or merge it with one of them.

• Merging removes 1 node from the tree. Merging may also propagate upwards and may reduce the height of the tree.

The advantages of the B-tree for organising an index are:

• Usually the order (m) is quite large (100-200); hence, the height of the tree is usually small.

• With buffering, most of the action takes place in main memory. In one experiment with m = 120, a file was created with 1,00,000 keys; 10 buffers were used for buffering nodes; it required only 22 reads and 857 writes to create the index.

• Space utilisation is good, as nodes are required by definition to be at least half full; it can be further improved by modifying the definition.
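To see why a large order keeps the tree shallow, one can work out the greatest height a tree can reach for a given number of keys. The sketch below follows from the definition given earlier (root with at least 1 key, every other node with at least m/2 keys and m/2 + 1 pointers): a tree with h levels then holds at least 2(m/2 + 1)^(h-1) - 1 keys. The function name is an assumption, and m is assumed to be even.

```python
def max_height(n_keys, m):
    """Largest number of levels a B-tree of order m can have with n_keys keys."""
    d = m // 2          # minimum number of keys in a non-root node
    h = 1
    # A tree with h + 1 levels needs at least 2 * (d + 1) ** h - 1 keys.
    while 2 * (d + 1) ** h - 1 <= n_keys:
        h += 1
    return h
```

For the figures quoted above (m = 120 and 1,00,000 keys) this bound gives at most 3 levels, so any key can be reached by reading at most 3 nodes.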

Secondary Indexes

The salient features of secondary indexes are listed below:

• It is an index on a non-key field, which may not have unique values in the records.

• A file may have many secondary indexes to provide efficient access paths on many attributes independently.

• This index may be exhaustive or selective: in the former case, index entries are made for all values of the attribute, and in the latter case, entries exist only for selected values of the attribute.

• As a value may occur in many records, a typical index entry consists of a value and a set of pointers to records.

• The size of an index entry will vary depending on the set size. One may choose an appropriate method for storing such varying-length entries.

• Insertion and deletion of records in a file require modifying the index too, i.e., insertion requires a pointer to be added to the set and deletion requires a pointer to be removed from the set.
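The features listed above can be pictured with a small in-memory sketch. The class name, method names and the use of integers as record pointers are assumptions made for illustration only.

```python
from collections import defaultdict


class SecondaryIndex:
    """Index on a non-key field: each value maps to a set of record pointers."""

    def __init__(self, field, selected_values=None):
        self.field = field
        self.selected = selected_values   # None means the index is exhaustive
        self.entries = defaultdict(set)

    def _is_indexed(self, value):
        return self.selected is None or value in self.selected

    def on_insert(self, pointer, record):
        """Record insertion: add the pointer to the set for the record's value."""
        value = record[self.field]
        if self._is_indexed(value):
            self.entries[value].add(pointer)

    def on_delete(self, pointer, record):
        """Record deletion: remove the pointer from the set for the record's value."""
        value = record[self.field]
        if value in self.entries:
            self.entries[value].discard(pointer)
            if not self.entries[value]:
                del self.entries[value]   # drop entries whose set became empty

    def lookup(self, value):
        return set(self.entries.get(value, ()))
```

An exhaustive index is created with `SecondaryIndex("city")`, while `SecondaryIndex("city", {"Delhi"})` is selective and keeps entries for the listed value only.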

Varying Length Records

When file records are of fixed length, it is easy to calculate the offset (i.e., relative position) of a field within the record and access the field value. When the fields are of varying length, we also need to store the field lengths within the record, which makes access to field values more difficult. A varying-length record contains a varying-length field (e.g., employee name) or a varying number of occurrences of a field. The length of the field or the actual number of occurrences must be stored within the record, as illustrated below:

E# | Length of name | Employee name | Salary | ….

In order to interpret such a record, we must know how many varying-length fields are present and how the lengths are stored. The contents of a record need to be scanned in order to locate a field value (such as salary).
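The layout illustrated above (employee number, length of name, name, salary) can be sketched with a length-prefixed encoding. The field widths, byte order and function names are assumptions chosen for the example and are not specified in the text.

```python
import struct

# Assumed layout: 4-byte employee number, 2-byte name length,
# the name bytes themselves, then a 4-byte salary (little-endian, no padding).
HEADER = struct.Struct("<IH")
SALARY = struct.Struct("<I")


def pack_record(emp_no, name, salary):
    name_bytes = name.encode("utf-8")
    return HEADER.pack(emp_no, len(name_bytes)) + name_bytes + SALARY.pack(salary)


def unpack_record(buf, offset=0):
    """Decode one record starting at offset; return (fields, offset of next record)."""
    emp_no, name_len = HEADER.unpack_from(buf, offset)
    name_start = offset + HEADER.size
    name = buf[name_start:name_start + name_len].decode("utf-8")
    # The salary offset is only known after the name length has been read.
    (salary,) = SALARY.unpack_from(buf, name_start + name_len)
    next_offset = name_start + name_len + SALARY.size
    return (emp_no, name, salary), next_offset
```

Note that the salary cannot be located at a fixed offset: the stored name length has to be read first, which is the scanning cost mentioned above.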

A varying length record may be stored using different methods as discussed below:

1. Reserved space: Here we allocate maximum length required by a field,
