Ghost

Image by: Bertrand Colombo

Understanding Bloom Filters

Published on: June 29, 2025

Bloom filters: what they are, how they work, and how we can implement and use them in real systems.

Table of Contents

Introduction

Recently, while studying the Design a URL Shortener problem in System Design Interview, I came across Bloom filters again. I remembered how useful they are for checking whether an item exists in a set quickly and with very little memory.

I thought it would be a good idea to write about Bloom filters: what they are, how they work, and how we can implement and use them in real systems.

This made me think about why Bloom filters are so popular in system design problems, so I decided to take a deeper look and share what I learned.

Have you ever needed to check if an item exists in a large set, but wanted to do it quickly and using very little memory? Bloom filters are data structures that help you do exactly that. They can tell you if an item is definitely not in the set or maybe in the set, with very little memory usage and high speed.

In this post, we will:

What is a Bloom Filter?

A Bloom filter is a probabilistic(it can give you results that are probably correct, but not always guaranteed.) data structure. It allows you to check for membership in a set, but it has a small chance of false positives.

This means:

Here is a simple diagram showing how a Bloom filter works between a client and storage:

sequenceDiagram
    participant Client
    participant Filter
    participant Storage

    Client->>Filter: Do you have 'key1'?
    Filter-->>Client: No

    Client->>Filter: Do you have 'key2'?
    Filter-->>Client: Yes
    Client->>Storage: Do you have 'key2'?
    Storage-->>Client: Yes, here is key2

    Client->>Filter: Do you have 'key3'?
    Filter-->>Client: Yes (False Positive)
    Client->>Storage: Do you have 'key3'?
    Storage-->>Client: No

In this diagram, you can see:

Bloom filters use:

Because of their efficiency, Bloom filters are widely used in real-world systems.

For example, they are used in:

Now, let’s see how Bloom filters actually work in more detail.

How Does It Work?

A Bloom filter works by using multiple hash functions to set and check bits in a bit array.

graph TD
    Item["Item: apple"] --> Hash1["Hash Function 1 → index 2"]
    Item --> Hash2["Hash Function 2 → index 5"]
    Item --> Hash3["Hash Function 3 → index 8"]

    subgraph BitArray["Bit Array"]
        direction LR
        B0["0"]
        B1["0"]
        B2["1"]
        B3["0"]
        B4["0"]
        B5["1"]
        B6["0"]
        B7["0"]
        B8["1"]
        B9["0"]
    end

    Hash1 --> B2
    Hash2 --> B5
    Hash3 --> B8

Here is how it works step by step:

Adding an item:

When you add an item to the Bloom filter, it runs the item through several hash functions. Each hash function returns an index in the bit array. The filter sets the bits at these indexes to 1. Using multiple hash functions* ensures that each item affects multiple bits, which makes the filter more accurate.

graph TB
    subgraph HashFunctions["Hash Functions"]
        direction LR
        Hash1["Hash1 → index 2"]
        Hash2["Hash2 → index 5"]
        Hash3["Hash3 → index 8"]
    end

    Item["Add Item: apple"] --> Hash1
    Item --> Hash2
    Item --> Hash3

    subgraph BitArray["Bit Array (indexes 0-9)"]
        direction LR
        B0["0"]
        B1["0"]
        B2["0 → 1"]
        B3["0"]
        B4["0"]
        B5["0 → 1"]
        B6["0"]
        B7["0"]
        B8["0 → 1"]
        B9["0"]
    end

    Hash1 --> B2
    Hash2 --> B5
    Hash3 --> B8

Checking an item:

When you check for an item, the same hash functions are used to calculate the indexes. If any of these bits are 0, the item is definitely not in the set. If all bits are 1, the item might be in the set. (This is where false positives happen.)

graph TB
    subgraph HashFunctions["Hash Functions"]
        direction LR
        Hash1["Hash1 → index 2"]
        Hash2["Hash2 → index 5"]
        Hash3["Hash3 → index 8"]
    end

    Item["Check Item: apple"] --> Hash1
    Item --> Hash2
    Item --> Hash3

    Hash1 --> Check2["Check bit 2"]
    Hash2 --> Check5["Check bit 5"]
    Hash3 --> Check8["Check bit 8"]

    subgraph BitArray["Bit Array (indexes 0-9)"]
        direction LR
        B0["0"]
        B1["0"]
        B2["1"]
        B3["0"]
        B4["0"]
        B5["1"]
        B6["0"]
        B7["0"]
        B8["1"]
        B9["0"]
    end

    Check2 --> B2
    Check5 --> B5
    Check8 --> B8

    B2 --> IsSet2["bit 2 is set"]
    B5 --> IsSet5["bit 5 is set"]
    B8 --> IsSet8["bit 8 is set"]

    IsSet2 --> FinalResult["Result"]
    IsSet5 --> FinalResult
    IsSet8 --> FinalResult

    subgraph Outcome["Outcome"]
        direction LR
        Outcome1["All bits set → maybe in set"]
        Outcome2["Any bit unset → definitely not in set"]
    end

    FinalResult --> Outcome1
    FinalResult --> Outcome2

Why False Positives Occur:

False positives happen because different items can produce the same indexes after hashing. As more items are added, the chance of these overlaps increases.

Why Multiple Hash Functions?

If we used only one hash function:

By using k different hash functions:

In short:

However, using too many hash functions will slow down the filter without giving much improvement.  

Example Implementation Go

Let’s implement a simple Bloom filter in Go to understand how it works in practice. In this implementation:

We use xxHash for fast hashing.

We store bits efficiently using a byte slice ([]byte) instead of a boolean array.

Multiple hash functions are simulated by salting the data with a counter.

Here is the full code:


package main

import (
        "fmt"

        "github.com/cespare/xxhash/v2"
)

type BloomFilter struct {
        size      uint64
        hashCount uint64
        bitArray  []byte
}

func NewBloomFilter(size, hashCount uint64) *BloomFilter {
        byteSize := (size + 7) / 8
        return &BloomFilter{
                size:      size,
                hashCount: hashCount,
                bitArray:  make([]byte, byteSize),
        }
}

func (bf *BloomFilter) getHashes(data string) []uint64 {
        hashes := make([]uint64, bf.hashCount)
        for i := uint64(0); i < bf.hashCount; i++ {
                h := xxhash.New()
                h.Write(fmt.Appendf(nil, "%s%d", data, i))
                hashes[i] = h.Sum64() % bf.size
        }
        return hashes
}

func (bf *BloomFilter) setBit(pos uint64) {
        byteIndex := pos / 8
        bitIndex := pos % 8
        bf.bitArray[byteIndex] |= 1 << bitIndex
}

func (bf *BloomFilter) getBit(pos uint64) bool {
        byteIndex := pos / 8
        bitIndex := pos % 8
        return (bf.bitArray[byteIndex] & (1 << bitIndex)) != 0
}

func (bf *BloomFilter) Add(data string) {
        for _, idx := range bf.getHashes(data) {
                bf.setBit(idx)
        }
}

func (bf *BloomFilter) Check(data string) bool {
        for _, idx := range bf.getHashes(data) {
                if !bf.getBit(idx) {
                        return false
                }
        }
        return true
}

Explanation:

Bloom filters are simple but powerful. In the next section, we will see how to use Bloom filters with Redis for scalable and production-ready membership checks.

Using Bloom Filters with Redis Stack

In production systems, you often need a distributed and scalable Bloom filter. Redis Stack makes this easy because it includes RedisBloom by default, along with other useful modules like RediSearch and RedisJSON.

What is Redis Stack?

Redis Stack is an enhanced version of Redis that combines:

With Redis Stack, you can use Bloom filters out of the box without extra setup.

Installing Redis Stack

If you use Docker, you can start Redis Stack quickly:


docker run -d -p 6379:6379 redis/redis-stack:latest

This runs Redis enabled on port 6379.

Basic Usage

If you want to use redis-cli directly inside the Redis Stack container, follow these steps:


docker ps
docker exec -it <container_id_or_name> redis-cli

This will open the Redis CLI inside your running Redis Stack container. You can now run commands like:


# Create a Bloom filter named myfilter
BF.RESERVE myfilter 0.01 1000
# Syntax: BF.RESERVE {key} {error_rate} {capacity}

# Add an item
BF.ADD myfilter "apple"

# Check if an item exists
BF.EXISTS myfilter "apple"
# Returns 1 (exists) or 0 (does not exist)

Why Use Redis Stack for Bloom Filters?

In the next section, we will summarise what we learned about Bloom filters and when to use them in your projects.

Using Bloom Filter in Go with Redis Stack

Here is a simple example using go-redis and Redis Stack:


package main

import (
        "context"
        "fmt"

        "github.com/redis/go-redis/v9"
)

type RedisBloomFilter struct {
        client *redis.Client
        key    string
}

func NewRedisBloomFilter(addr, key string, errorRate float64, capacity int64) (*RedisBloomFilter, error) {
        rdb := redis.NewClient(&redis.Options{
                Addr: addr,
        })

        ctx := context.Background()

        err := rdb.Do(ctx, "BF.RESERVE", key, fmt.Sprintf("%f", errorRate), capacity).Err()
        if err != nil && err.Error() != "ERR item exists" {
                return nil, fmt.Errorf("failed to create bloom filter: %v", err)
        }

        return &RedisBloomFilter{
                client: rdb,
                key:    key,
        }, nil
}

func (rbf *RedisBloomFilter) Add(item string) (bool, error) {
        ctx := context.Background()
        added, err := rbf.client.Do(ctx, "BF.ADD", rbf.key, item).Bool()
        if err != nil {
                return false, fmt.Errorf("failed to add item: %v", err)
        }
        return added, nil
}

func (rbf *RedisBloomFilter) Check(item string) (bool, error) {
        ctx := context.Background()
        exists, err := rbf.client.Do(ctx, "BF.EXISTS", rbf.key, item).Bool()
        if err != nil {
                return false, fmt.Errorf("failed to check item: %v", err)
        }
        return exists, nil
}

Conclusion

In this post, we explored Bloom filters, a powerful and memory-efficient data structure that:

We:

When should you use Bloom filters?

Final thoughts

Bloom filters are simple to implement but extremely powerful in real-world applications such as caches, URL shorteners, web crawlers, and databases. They save memory and time, making your systems faster and more efficient.

You can find the full implementation on GitHub: View the full code on GitHub

Tags: #Bloom Filters #System Design #redis #go #en