## Hash Function

Performance of Hash Tables depends on how good a hash functions is

In chaining implementation

- insert/delete could be anywhere from $\cfrac{m}{n}$ to $m$ for $m$ objects
- $\cfrac{m}{n}$ - equally distributed
- $m$ - all in the same bucket

That means that performance depends on the choice of hash function

### Good Hash Function

So a good hash function should

- spread data out evenly
- be easy to store
- be fast to evaluate

### Bad Hash Function

Example of bad function

- given: memory locations for objects
- $h(x) = x \mod 1000$
- all odd buckets will be empty!

Pathological data sets

- even a super-clever hash function does not guarantee even distribution
- for every hash function there exists a pathological data set

## Collisions

A *collision* is

- distinct $x, y \in U$
- such that $h(x) = h(y)$

Well-known issue: Birthday paradox

## See also

## Sources