unagi-bloomfilter

A fast, cache-efficient, concurrent bloom filter

http://github.com/jberryman/unagi-bloomfilter

Latest on Hackage:

0.1.1.2

This package is not currently in any snapshots. If you're interested in using it, we recommend adding it to Stackage Nightly. Doing so will make builds more reliable, and allow stackage.org to host generated Haddocks.

BSD-3-Clause licensed by Brandon Simmons

Maintained by [email protected]

This library implements a fast concurrent bloom filter, based on bloom-1 from "Fast Bloom Filters and Their Generalization" by Y Qiao, et al.

A bloom filter is a probabilistic, constant-space, set-like data structure supporting insertion and membership queries. This implementation is backed by SipHash so can safely consume untrusted inputs.

The implementation here compares favorably with traditional set implementations in a single-threaded context, e.g. here are 10 inserts or lookups compared across some sets of different sizes:

With the llvm backend benchmarks take around 75-85% of the runtime of the native code gen.

Unfortunately writes in particular don't seem to scale currently; i.e. distributing writes across multiple threads may be slower than in a single-threaded context, because of memory effects. We plan to export functionality that would support using the filter here in a concurrent context with better memory behavior (e.g. a server that shards to a thread-pool which handles only a portion of the bloom array).