Hash functions – efficency and speed

hash function is any function that can be used to map data of arbitrary size to data of a fixed size. The values returned by a hash function are called hash valueshash codesdigests, or simply hashes. Hash functions are often used in combination with a hash table, a common data structure used in computer software for rapid data lookup.

If you happen to need a non cryptographic hash function (and you might need it even if your are using languages that have builtin hashtables like java or c++ ) here are some references taken from various parts:

Interesting comparison of different hash tables :

http://www.eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx

Some more hashfunctions here :

http://www.cse.yorku.ca/~oz/hash.html

Some really interesting benchmarks here by Peter Kankowsky.

And again hash benchmarks here.

Some (a mininum set) benchmarks ran using User-Agents as keys, 1.5 multiply factor on hash size, next power of 2 hash sizes :

Keys 3130591, Average fh_default_hash(8388608) time in nanosecs : 292.56
Total collisions 517939, longest chain 6, hash_dim_factor 1.500000
----------------------
Keys 3130591, Average djb_hash(8388608) time in nanosecs : 181.24
Total collisions 517445, longest chain 7, hash_dim_factor 1.500000
----------------------
Keys 3130591, Average sax_hash(8388608) time in nanosecs : 244.49
Total collisions 517431, longest chain 7, hash_dim_factor 1.500000
----------------------
Keys 3130591, Average jsw_hash(8388608) time in nanosecs : 141.39
Total collisions 522951, longest chain 14, hash_dim_factor 1.500000
----------------------
Keys 3130591, Average jen_hash(8388608) time in nanosecs : 148.54
Total collisions 518565, longest chain 7, hash_dim_factor 1.500000
----------------------
Keys 3130591, Average djb2_hash(8388608) time in nanosecs : 182.37
Total collisions 516405, longest chain 7, hash_dim_factor 1.500000
----------------------
Keys 3130591, Average sdbm_hash(8388608) time in nanosecs : 238.93
Total collisions 525513, longest chain 7, hash_dim_factor 1.500000
----------------------
Keys 3130591, Average fnv_hash(8388608) time in nanosecs : 237.01
Total collisions 517530, longest chain 7, hash_dim_factor 1.500000
----------------------
Keys 3130591, Average oat_hash(8388608) time in nanosecs : 294.78
Total collisions 517939, longest chain 6, hash_dim_factor 1.500000
----------------------


Tests done with the hashtable code you can find here

And last a research study (which I’m still working on) for SIMD optimized hash functions : here  some theory and code here. The all use the vector extensions present in gcc.