The Invisible Wall Around Hash Tables

Why we believed reordering was essential — and what happens when someone proves it’s not

We’ve all done it.

You’re building a hash table. Open addressing. Greedy insertion. Straightforward: hash the key, scan forward until you find an empty slot, drop it in.

And yeah, we know the price: expected \( \Theta(\log(1/\delta)) \) probes per insertion as the table fills up. That’s what Yao proved in the 80s. Ullman saw it coming even earlier. We’ve lived with that bound as if it’s gravity.

What does \(\Theta(\log(1/\delta))\) actually mean?

This bit of notation might look like a maths flex, but it describes something clear:

  • \( \delta \) is the fraction of slots still empty. If the table is 90% full, then \( \delta = 0.1 \).
  • \( 1/\delta \) tells you how little room is left: the bigger it is, the fuller the table. At 90% full, \( 1/0.1 = 10 \) — not much room left.
  • Taking the log? That grows slowly at first, but steeply as \( \delta \to 0 \).

Here’s what it feels like:

Load factor | δ (fraction empty) | log₁₀(1/δ) | Probes needed
0.5   | 0.5   | 0.30 | few
0.75  | 0.25  | 0.60 | more
0.9   | 0.1   | 1.00 | lots
0.99  | 0.01  | 2.00 | painful
0.999 | 0.001 | 3.00 | why

Takeaway:
Even though \( \log(1/\delta) \) grows slowly at first, the tail end hurts.
Greedy insertions struggle badly once tables get nearly full.
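
Those numbers aren’t magic. Here’s a tiny standalone Rust snippet (not part of the repo) that reproduces the table:

fn main() {
    // delta is the fraction of empty slots: delta = 1 - load factor.
    for load in [0.5_f64, 0.75, 0.9, 0.99, 0.999] {
        let delta = 1.0 - load;
        let log_term = (1.0 / delta).log10();
        println!("load {load:>6} -> delta {delta:>6.3} -> log10(1/delta) = {log_term:.2}");
    }
}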

That “law”? Not really a law. Just a habit.


The mental model we never questioned

The story was: if you don’t reorder elements, you can’t beat \(\Theta(\log(1/\delta))\) probe complexity. Most improvements tried to sidestep the rule — move keys around after insertions, or use clever backup schemes.

Recent research (Farach-Colton, Krapivin, and Kuszmaul, “Optimal Bounds for Open Addressing Without Reordering”, 2025) reveals something big: you can break that bound without any reordering at all. No cleanup. Just stop being greedy.

Let’s see what that actually means.


What’s the state of the art?

Most modern hash tables don’t just do naive linear probing. We’ve gotten smarter — but not that much smarter.

Here’s what’s common today:

  • Linear Probing: Fast and cache-friendly, but clusters badly under load.
  • Quadratic Probing: Spreads out collisions better but can still form secondary clusters.
  • Double Hashing: Uses a second hash to avoid predictable probe sequences. Good in theory, finicky in practice.
  • Robin Hood Hashing: Tries to balance probe lengths by stealing slots from rich (short-probe) keys. It helps — but requires reordering.
  • Hopscotch and Cuckoo Hashing: Push even further to avoid clusters. Both reorder aggressively and juggle entries to keep probe lengths low.
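
For reference, the first three strategies differ only in the probe sequence they try on the \( i \)-th attempt, where \( m \) is the table size and \( c_1, c_2 \) are constants the quadratic scheme picks:

\[
\begin{aligned}
\text{linear:} \quad & h_i(k) = \bigl(h(k) + i\bigr) \bmod m \\
\text{quadratic:} \quad & h_i(k) = \bigl(h(k) + c_1 i + c_2 i^2\bigr) \bmod m \\
\text{double hashing:} \quad & h_i(k) = \bigl(h_1(k) + i \cdot h_2(k)\bigr) \bmod m
\end{aligned}
\]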

Each of these strategies tries to dodge the log bound in its own way — but they either:

  1. reorder entries, or
  2. introduce more bookkeeping (like tracking probe lengths or relocation windows).

Which makes sense, right? If you assume greedy is your only move, your only fix is to clean up afterwards.
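
To make that concrete, here’s a minimal sketch of the Robin Hood displacement step (toy types and a caller-supplied hash closure, not the repo’s API):

// Minimal Robin Hood insertion sketch. Each entry stores (key, value, distance
// from its home slot). Assumes at least one slot is free, otherwise this loops.
fn robin_hood_insert(
    table: &mut [Option<(u64, u64, usize)>],
    key: u64,
    value: u64,
    hash: impl Fn(u64) -> usize,
) {
    let len = table.len();
    let (mut key, mut value, mut dist) = (key, value, 0usize);
    let mut idx = hash(key) % len;
    loop {
        let resident = table[idx]; // Option<(u64, u64, usize)> is Copy
        match resident {
            None => {
                table[idx] = Some((key, value, dist));
                return;
            }
            Some((k, v, d)) if d < dist => {
                // The resident is "richer" (closer to its home slot) than the
                // incoming entry: steal its slot and keep probing on its behalf.
                // This is the reordering step that greedy insertion never does.
                table[idx] = Some((key, value, dist));
                key = k;
                value = v;
                dist = d;
            }
            _ => {}
        }
        idx = (idx + 1) % len;
        dist += 1;
    }
}

fn main() {
    // Toy demo: ten slots, a hash that collides on purpose.
    let mut table: Vec<Option<(u64, u64, usize)>> = vec![None; 10];
    for key in 0..6u64 {
        robin_hood_insert(&mut table, key, key * 100, |k| (k % 3) as usize);
    }
    println!("{table:?}");
}

The greedy insert we look at next is this same loop minus the middle arm; that displacement is exactly the cleanup work the new result says you can skip.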

What if you didn’t need to?


Seeing Greedy in Action

We’ll start with the classic greedy insert, and add a simple visualisation so you can see the clustering happen.

Here’s the core logic:

pub fn insert_greedy(&mut self, key: K, value: V) -> usize {
    // Classic greedy probing: start at the key's home slot and walk forward
    // until an empty slot turns up. Assumes the table still has at least one
    // free slot; otherwise this loops forever.
    let home = self.hash(&key);
    let mut probes = 0;
    loop {
        let idx = (home + probes) % self.table.len();
        if self.table[idx].is_none() {
            self.table[idx] = Some((key, value));
            return probes + 1; // slots inspected, including the final empty one
        }
        probes += 1;
    }
}

Each key just walks forward until it finds a spot. Direct. Deadly.


Run it yourself

If you want to feel the slowdown, here’s a Rust demo you can tweak live:

https://github.com/oliverdaff/hash-tables-without-assumptions

cargo run --bin post1-invisible-wall

Or customise it:

cargo run --bin post1-invisible-wall -- --slots 40 --keys 25

You can also control the hashing behaviour:

# Use real-world hashing (DefaultHasher) — the default
cargo run --bin post1-invisible-wall -- --slots 40 --keys 25

# Force terrible clustering with a mod-10 hash
cargo run --bin post1-invisible-wall -- --slots 40 --keys 25 --hash-strategy mod10

A note about hashing

By default, we use Rust’s DefaultHasher — the same randomised SipHash that production systems use.

No tricks. No cheating. Keys land evenly unless you’re deliberately chasing chaos.

If you want to see worst-case clustering — to feel the wall hit even harder — you can switch to a deliberately bad hash (mod10).
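
In code, the two strategies boil down to something like this (function names are illustrative, not the repo’s exact API):

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Default strategy: Rust's SipHash-based DefaultHasher, reduced to a slot index.
fn default_slot(key: u64, slots: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() as usize) % slots
}

// Bad-on-purpose strategy: only ten possible home slots, so clusters are guaranteed.
fn mod10_slot(key: u64, slots: usize) -> usize {
    (key % 10) as usize % slots
}

fn main() {
    // Mirror the demo's defaults: 30 slots, keys 0 through 14.
    for key in 0u64..=14 {
        println!(
            "key {key:>2}: default -> {:>2}, mod10 -> {:>2}",
            default_slot(key, 30),
            mod10_slot(key, 30)
        );
    }
}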


HashTable Config Example

When you run it, you’ll see something like:

cargo run --bin post1-invisible-wall
HashTable Config:
  - Slots: 30
  - Keys: 0..14
  - Hash strategy: Default

Hash Table View:

  [ ]  [2]  [4]  [9]  [ ] [11]  [ ]  [5]  [ ]  [ ]
  [7]  [ ]  [ ]  [ ]  [ ] [12]  [ ]  [8]  [ ] [13]
  [0] [14]  [3] [10]  [1]  [ ]  [ ]  [ ]  [ ]  [6]

or if you force a bad hash:

cargo run --bin post1-invisible-wall -- --hash-strategy=mod10

HashTable Config:
  - Slots: 30
  - Keys: 0..14
  - Hash strategy: Mod10

Hash Table View:

  [0]  [1]  [2]  [3]  [4]  [5]  [6]  [7]  [8]  [9]
 [10] [11] [12] [13] [14]  [ ]  [ ]  [ ]  [ ]  [ ]
  [ ]  [ ]  [ ]  [ ]  [ ]  [ ]  [ ]  [ ]  [ ]  [ ]

With the mod-10 hash, keys 0..9 each land on their own home slot, while keys 10..14 collide with the already-occupied slots 0..4 and get pushed forward into slots 10..14. One solid run of occupied slots: exactly the clustering the flag promises.

It’s all explicit — no surprises.


Measuring the Invisible Wall

Demos are fun, but we also measured it properly.

We benchmarked insertion times across different load factors, then plotted them.

Here’s what happens:

[Figure: Insert Times vs Load Factor]

Take a look at the right end.
Greedy looks fine… until about 90% full. After that, things quickly unravel.


How we did it

We took two different measurements:


Measuring probes per insertion

We wanted to see the real invisible wall — how the number of probes needed grows as the table fills up.

Commands:

just probes_at_load
just plot_probe_time

The first command inserts keys one-by-one into a large table,
recording the load factor and the number of probes used for each insertion.

The second plots probes per insertion against load factor.

This highlights the raw logarithmic slowdown without any averaging effects.
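
If you’d rather not run the just recipes, here’s a rough standalone sketch of the same measurement (table size and output format are made up; the repo’s binary is the real thing):

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Standalone sketch of the probe measurement: greedy linear probing into a
// fixed-size table, printing (load factor, probes) as CSV for every insert.
fn main() {
    let slots = 1 << 16;
    let mut table: Vec<Option<u64>> = vec![None; slots];

    // Stop at 99% full so the table never runs out of empty slots.
    for n in 0..(slots as u64 * 99 / 100) {
        let mut hasher = DefaultHasher::new();
        n.hash(&mut hasher);
        let mut idx = (hasher.finish() as usize) % slots;

        let mut probes = 1;
        while table[idx].is_some() {
            idx = (idx + 1) % slots;
            probes += 1;
        }
        table[idx] = Some(n);

        let load = n as f64 / slots as f64; // load factor before this insert
        println!("{load:.4},{probes}");
    }
}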


Measuring average insertion time

We also measured the practical impact — how slow greedy insertion feels when inserting many keys at once.

Commands:

just bench
just bench-report

The bench command runs Criterion benchmarks at five load factors (50%, 75%, 90%, 95%, 99%), then bench-report opens the benchmark graphs in your browser.

This shows how quickly performance degrades under real-world usage as the tables fill up.
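
For the curious, a Criterion benchmark over fixed load factors has roughly this shape (a sketch only; the table setup is elided and the names are not from the repo’s actual bench file):

use criterion::{criterion_group, criterion_main, Criterion};

// Sketch only: one Criterion benchmark per target load factor.
fn bench_greedy_inserts(c: &mut Criterion) {
    for load in [0.50, 0.75, 0.90, 0.95, 0.99] {
        c.bench_function(&format!("greedy_insert_at_{load}"), |b| {
            b.iter(|| {
                // Pre-fill a fresh table to `load`, then insert a batch of new
                // keys and let Criterion time it. (Table code elided here.)
            });
        });
    }
}

criterion_group!(benches, bench_greedy_inserts);
criterion_main!(benches);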


Two measurements, two graphs, two ways to see the invisible wall.


Explore the Code

You can grab the repo:

git clone https://github.com/oliverdaff/hash-tables-without-assumptions
cd hash-tables-without-assumptions

Then:

just post1
just bench
just bench-report

Or with Nix:

nix develop -c just post1
nix develop -c just bench
nix develop -c just bench-report

We use just for shortcuts and flake.nix for reproducibility.


Next up:

How to break the wall — without reordering anything.
Elastic insertions, sub-logarithmic probes, and fairer hashing.

It gets better.