This library implements a fast concurrent bloom filter, based on bloom-1 from
"Fast Bloom Filters and Their Generalization" by Y Qiao, et al.
A bloom filter is a probabilistic, constant-space, set-like data structure
supporting insertion and membership queries. This implementation is backed by
SipHash so can safely consume untrusted inputs.
The implementation here compares favorably with traditional set
implementations in a single-threaded context, e.g. here are 10 inserts or
lookups compared across some sets of different sizes:
With the llvm backend benchmarks take around 75-85% of the runtime of the
native code gen.
Unfortunately writes in particular don't seem to scale currently; i.e.
distributing writes across multiple threads may be slower than in a
single-threaded context, because of memory effects. We plan to export
functionality that would support using the filter here in a concurrent
context with better memory behavior (e.g. a server that shards to a
thread-pool which handles only a portion of the bloom array).