What is a Bloom Filter and What are they used for?

Shadid Haque - Oct 2 - - Dev Community

What is a Bloom Filter?

A Bloom filter is like a "quick check" tool that helps you know if something might be in a list, without needing to store the whole list. It's really fast and uses very little space, but sometimes it might say something is in the list when it's actually not. However, if it says something is not in the list, you can be 100% sure it's not.

A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set. It is very fast and consumes minimal memory, making it ideal for applications where quick membership checks are needed. However, it comes with a trade-off: false positives are possible, but false negatives are not.

How does it work?

To build a bloom filter we need an array of bits and a couple fast hash functions.

  • Say we have an empty Bloom filter with 10 bits: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0].
  • Suppose we want to insert the element "apple", and we have 3 hash functions that hash "apple" to positions 1, 4, and 7 in the array.
  • We set those bits to 1: [0, 1, 0, 1, 0, 0, 1, 0, 0, 0].
  • Now, if we check for "apple", we hash it again and check bits 1, 4, and 7. Since they are all 1, the Bloom filter says "apple" is probably in the set.
  • If we query for "orange", and any of the corresponding bits (based on its hash) are 0, we know it's definitely not in the set.

demo of how bloom filter works

Use cases of Bloom Filters?

*In Content Delivery Networks (CDNs): *

Bloom filters are widely used in Content Delivery Networks (CDNs) to optimize efficiency, especially when handling large amounts of data. Their primary role in a CDN is to help in quickly determining whether a content or a resource is cached at a particular edge server or not, without requiring an exhaustive search.

The Bloom filter is used to quickly check if a particular object (e.g., a web page, video segment) is possibly cached. If the filter returns a negative response, it means the object is definitely not in the cache.

Akamai Technologies, a leading content delivery network (CDN) provider, uses Bloom filters in an innovative way to optimize their cache performance.

Cache Filtering in Databases: Databases may use Bloom filters to reduce cache lookup times by ensuring that only relevant data is fetched into the cache. If the Bloom filter indicates that a key is not present, it avoids unnecessary cache queries.

Key reasons Bloom filters are used:

Speed: Queries (to check for membership) are performed in constant time, O(k), where k is the number of hash functions used. This makes it very fast for applications where quick lookups are essential. Bloom filters are in memory lookup. When you use Redis for cache you are still using an external service that adds latency. Fun fact Redis (the database) internally uses bloom filters.

Space Efficiency: Bloom filters are highly space-efficient compared to other data structures like hash tables or arrays when it comes to storing large sets. They use a bit array and several hash functions to achieve this, which results in significant memory savings, especially when the set of possible elements is large.

Avoiding Expensive Lookups: In scenarios where querying an actual database or collection is costly (in terms of time or computation), a Bloom filter can be used as a preliminary step. If the filter says the item is not present, no further lookup is needed.

JavaScript implementation of a Bloom Filter

The following code implements a simple bloom filter in javascript.

class BloomFilter {
    constructor(size = 100) {
        this.size = size;
        this.bitArray = new Array(size).fill(0);
    }

    // Simple hash functions (these are not cryptographic hash functions)
    hash1(value) {
        let hash = 0;
        for (let i = 0; i < value.length; i++) {
            hash = (hash + value.charCodeAt(i) * i) % this.size;
        }
        return hash;
    }

    hash2(value) {
        let hash = 0;
        for (let i = 0; i < value.length; i++) {
            hash = (hash + value.charCodeAt(i) * (i + 1)) % this.size;
        }
        return hash;
    }

    // Insert element into Bloom Filter
    add(value) {
        const hash1 = this.hash1(value);
        const hash2 = this.hash2(value);

        this.bitArray[hash1] = 1;
        this.bitArray[hash2] = 1;
    }

    // Check if an element might be in the set
    contains(value) {
        const hash1 = this.hash1(value);
        const hash2 = this.hash2(value);

        return this.bitArray[hash1] === 1 && this.bitArray[hash2] === 1;
    }
}

// Example usage
const bloom = new BloomFilter(50); // Bloom filter with a bit array of size 50

bloom.add("apple");
bloom.add("banana");

console.log(bloom.contains("apple")); // true (or possibly true)
console.log(bloom.contains("banana")); // true (or possibly true)
console.log(bloom.contains("grape")); // false (definitely not in the set)
Enter fullscreen mode Exit fullscreen mode
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Terabox Video Player