The idea behind hashing is that some data either has no inherent ordering (such as images) or is expensive to compare (such as images). If the data has no inherent ordering, you can’t perform comparison searches. If the data is expensive to compare, the number of comparisons used even by a binary search might be too many. So instead of looking at the data themselves, you’ll condense (hash) the data to an integer (its hash value) and keep all the data with the same hash value in the same place. This task is carried out by using the hash value as an index into an array.
To search for an item, you simply hash it and look at all the data whose hash values match that of the data you’re looking for. This technique greatly lessens the number of items you have to look at. If the parameters are set up with care and enough storage is available for the hash table, the number of comparisons needed to find an item can be made arbitrarily close to one.
One aspect that affects the efficiency of a hashing implementation is the hash function itself. It should ideally distribute data randomly throughout the entire hash table, to reduce the likelihood of collisions. Collisions occur when two different keys have the same hash value. There are two ways to resolve this problem. In “open addressing,” the collision is resolved by the choosing of another position in the hash table for the element inserted later. When the hash table is searched, if the entry is not found at its hashed position in the table, the search continues checking until either the element is found or an empty position in the table is found.
The second method of resolving a hash collision is called “chaining.” In this method, a “bucket” or linked list holds all the elements whose keys hash to the same value. When the hash table is searched, the list must be searched linearly.
No comments:
Post a Comment