Implementing boolean OR Search in meilisearch

Bedram Tamang - Sep 4 - - Dev Community

Meilisearch is a lightning-fast search engine implemented in rust, It's starightforward setup & easy APIs makes it very attractive among developers. Additionally, first-party support by Laravel with it's package scout helps it to span across Laravel community. In this article, I will show you how OR search is implemented in meilisearch.

Problem Statement:

We would like to perform a full-text search among selected attributes with OR conditions, for example: a search of "PHP" or "Engineer" should return all the documents that contain either "PHP" or "Engineer" or both "PHP" & "Engineer". This is a fundamental idea of the OR condition we studied in Boolean algebra. However, there is no direct solution for this problem in [Meilisearch][1]. The closest possible you can reach is to use "last" or "frequency" for MatchingStrategy option, but it's not exactly the OR, instead, these algorithms remove the one word (last or more frequent) at a time and perform search if there are not enough results.

Solution

The basic idea to solve this problem is to use the A U B formula of Math's SET formula, The union formula can be written as:

n(A∪B)=n(A)+n(B)−n(A∩B)
Enter fullscreen mode Exit fullscreen mode

Where,

  • n(A U B) represents A OR B, that means it should return all the documents containing terms either A or either B, or both A and B
  • n(A) is the number of documents that contains the term A,
  • n(B) is the number of documents that contains the term B
  • n(A∩B) is the number of documents that contains both term A and B

By using above formula, the search for terms "PHP" OR "Engineer" would look like,

n(PHP ∪ Engineer) = n(PHP) + n(Engineer) - n(PHP ∩ Engineer)
Enter fullscreen mode Exit fullscreen mode

Similarly, the formula for the three search terms would look like,

n(A∪B∪C)=n(A)+n(B)+n(C)−n(A∩B)−n(B∩C)−n(A∩C)+n(A∩B∩C)
Enter fullscreen mode Exit fullscreen mode

Since n(A) doesn't only represent the cardinality of documents containing terms A, but also the documents that contain both A and other sets. For the simplicity of calculation, we would like to use the formula that deals with explicitly only A, or only that part. For example,

n(A∩B) = n0(A∩B) + n(A∩B∩C)  
Enter fullscreen mode Exit fullscreen mode

Here, n0(A∩B) is the set that only contains the intersection between A and B.

Image description

Which can be further written as,

n(A∪B∪C)=n0(A)+n0(B)+n0(C)+n0(A∩B)+n0(B∩C)+n0(A∩C)+n(A∩B∩C)
Enter fullscreen mode Exit fullscreen mode

Where,

n0(A) represents containing only A, excluding B and C entirely. In search, n0(PHP) refers to documents that include PHP but exclude any mention of Engineer or Manager.
In Meilisearch, exclusion can be indicated using the - sign, such as 'PHP' -'Engineer' -'Manager'.

So, our search becomes:

n0(A) = PHP -Engineer -Manager
n0(B) = Engineer -PHP -Manager
n0(C) = Manager -Engineer -PHP
n0(AB) = PHP Engineer -Manager
n0(BC) = Engineer Manager -PHP
n0(AC) = PHP Manager -Engineer
n(ABC) = PHP Engineer Manager
Enter fullscreen mode Exit fullscreen mode

This means we need to make 7 API calls for 3 search terms. As the number of search terms increases, the number of API calls required also increases, following the formula 2^n−1. So, the number of API calls needed is as follows:

Number of Search Terms (n) Number of API Calls
1 1
2 3
3 7
4 15
5 31

To search using OR conditions with 5 terms, we would need to make 31 API calls. Since there’s no alternative to implement this feature directly, we could limit the number of search terms to 4 or 5. However, there’s a feature called multi-search that allows us to perform multiple searches simultaneously, enhancing performance by executing all the API calls at once.

We can generate these terms using a recursive approach:

function generateTerm(array $ans, $data, array $current)
{
    $last = end($current);

    if ($last == count($data) - 1) {
        return $ans;
    }

    $nextCurrent   = $current;
    $nextCurrent[] = $last + 1;

    array_pop($current);
    $current[] = $last + 1;

    $ans[formatTerm($data, $nextCurrent)] = count($nextCurrent);
    $ans[formatTerm($data, $current)]     = count($current);

    $ans = generateTerm($ans, $data, $nextCurrent);
    $ans = generateTerm($ans, $data, $current);

    return $ans;
}
Enter fullscreen mode Exit fullscreen mode

The algorithm above can be a bit tricky to understand. If you’d like a detailed explanation, let me know. This method can be called as follows:

$data = ['php', 'engineer', 'manager'];
$ans[formatTerm($data, [0])] = 1;

$ans = generateTerm($ans, $data, [0]);
arsort($ans);
Enter fullscreen mode Exit fullscreen mode

The formatTerm function is used to format the search terms specifically for Meilisearch. The output of the above algorithm would look something like this, depending on the input data:

Image description

We have prepared all the search components necessary to perform OR operations. In this structure, the key represents the search terms, while the value represents the score. The score indicates the number of matching terms in each document, with the idea being that documents with a higher number of matching terms will appear first in the search results.

To count the total documents for each search terms

$searches = [];
foreach ($terms as $term => $index) {
    $searches[] = (new SearchQuery())
        ->setQuery($term)
        ->setAttributesToRetrieve(['id'])
        ->setIndexUid('jobins_data')
        ->setPage(1)
        ->setHitsPerPage(1)
        ->setMatchingStrategy('all');
}

$client   = new \Meilisearch\Client('http://localhost:7700', 'masterKey');
$response = $client->multiSearch($searches);
Enter fullscreen mode Exit fullscreen mode

To find the what are terms we should query for a particular page

$page     = 1;
$per_page = 10;
$offset   = ($page - 1) * $per_page;

$possibleSearchTerms = [];
$count               = 0;
$termKeys            = array_keys($terms);
$next                = $offset + $per_page;
$hasOffset           = false;
while (count($termKeys) > 0) {
    $term            = array_pop($termKeys);
    $neededForOffset = $offset - $count;
    $neededForLimit  = $offset + $per_page - $count;
    $count           = $count + $metadata[$term];

    if ($count > $offset && $count < $offset + $per_page) {
        $possibleSearchTerms[] = [
            'q'      => $term,
            'offset' => $hasOffset ? 0 : $neededForOffset,
        ];
        $hasOffset             = true;
    }

    if ($count >= $offset + $per_page) {
        $possibleSearchTerms[] = [
            'offset' => $hasOffset ? 0 : $neededForOffset,
            'q'      => $term,
            'limit'  => min($neededForLimit, $per_page),
        ];
        break;
    }
}
Enter fullscreen mode Exit fullscreen mode

The $possibleSearchTerms gives the possible search terms to query for the results. Finally performing search to the possible search terms gives the necessary results.

$searches = [];
foreach ($possibleSearchTerms as $term) {
    $searches[] = (new SearchQuery())
        ->setQuery($term['q'])
        ->setAttributesToRetrieve(['id'])
        ->setIndexUid('jobins_data')
        ->setOffset($term['offset'] ?? 0)
        ->setLimit($term['limit'] ?? $per_page)
        ->setSort(['id:desc'])
        ->setMatchingStrategy('all');
}

$response        = $client->multiSearch($searches);
$results['hits'] = collect($response['results'])->pluck('hits')->flatten(1);
Enter fullscreen mode Exit fullscreen mode

The complete example in tinker is given below,

<?php

const BASE_PATH = '/Users/ellite/code/office/api.jobins.test';

require_once BASE_PATH.'/vendor/autoload.php';
use Illuminate\Foundation\Console\Kernel;
use Meilisearch\Contracts\SearchQuery;

$app = require BASE_PATH.'/bootstrap/app.php';
$app->make(Kernel::class)->bootstrap();

/** Data preparation */
config()->set('scout.queue', false);
config()->set('scout.after_commit', false);

class Post extends \Illuminate\Database\Eloquent\Model
{
    use \Laravel\Scout\Searchable;

    protected $table = 'posts';

    protected $fillable = [
        'title',
        'body',
    ];

    public function searchableAs(): string
    {
        return 'posts';
    }

    public function toSearchableArray()
    {
        return $this->toArray();
    }
}

config()->set('scout.meilisearch.index-settings', [
    Post::class => [

    ],
]);

\Illuminate\Support\Facades\Artisan::call('scout:sync-index-settings');

\Illuminate\Support\Facades\Schema::dropIfExists('posts');
\Illuminate\Support\Facades\Schema::create('posts', function (\Illuminate\Database\Schema\Blueprint $table) {
    $table->id();
    $table->string('title');
    $table->text('body')->nullable();
    $table->timestamps();
});

$data = [
    ['title' => 'PHP'],
    ['title' => 'Engineer'],
    ['title' => 'Manager'],
    ['title' => 'PHP Engineer'],
    ['title' => 'PHP Manager'],
    ['title' => 'Engineer Manager'],
    ['title' => 'PHP Engineer Manager'],

    ['title' => 'Engineering'],
];

foreach ($data as $row) {
    $post = Post::query()->create($row);
}

/** End of Data Preparation*/

function formatTerm($data, array $current): string
{
    $data    = array_map(fn($item) => "'$item'", $data);
    $current = array_map(fn($item) => $data[$item], $current);

    $exclude = array_diff($data, $current);

    $current = implode(' ', $current);
    $exclude = implode(' -', $exclude);

    return trim(implode(' -', [$current, $exclude]), ' -');
}

function generateTerm(array $ans, $data, array $current)
{
    $last = end($current);

    if ($last == count($data) - 1) {
        return $ans;
    }

    $nextCurrent   = $current;
    $nextCurrent[] = $last + 1;

    array_pop($current);
    $current[] = $last + 1;

    $ans[formatTerm($data, $nextCurrent)] = count($nextCurrent);
    $ans[formatTerm($data, $current)]     = count($current);

    $ans = generateTerm($ans, $data, $nextCurrent);
    $ans = generateTerm($ans, $data, $current);

    return $ans;
}

$data = ['PHP', 'Engineer', 'manager'];

$ans[formatTerm($data, [0])] = 1;

$terms = generateTerm($ans, $data, [0]);
arsort($terms);


$searches = [];
foreach ($terms as $term => $index) {
    $searches[] = (new SearchQuery())
        ->setQuery($term)
        ->setAttributesToRetrieve(['id'])
        ->setIndexUid('posts')
        ->setPage(1)
        ->setHitsPerPage(1)
        ->setMatchingStrategy('all');
}

$client   = new \Meilisearch\Client('http://localhost:7700', 'masterKey');
$response = $client->multiSearch($searches);
$metadata = collect($response['results'])->pluck('totalHits', 'query');


$page     = 1;
$per_page = 10;
$offset   = ($page - 1) * $per_page;

$count               = 0;
$possibleSearchTerms = [];
$termKeys            = array_keys($terms);
$next                = $offset + $per_page;
$hasOffset           = false;
while (count($termKeys) > 0) {
    $term            = array_pop($termKeys);
    $neededForOffset = $offset - $count;
    $neededForLimit  = $offset + $per_page - $count;
    $count           = $count + $metadata[$term];

    if ($count > $offset && $count < $offset + $per_page) {
        $possibleSearchTerms[] = [
            'q'      => $term,
            'offset' => $hasOffset ? 0 : $neededForOffset,
        ];
        $hasOffset             = true;
    }

    if ($count >= $offset + $per_page) {
        $possibleSearchTerms[] = [
            'offset' => $hasOffset ? 0 : $neededForOffset,
            'q'      => $term,
            'limit'  => min($neededForLimit, $per_page),
        ];
        break;
    }
}

$searches = [];
foreach ($possibleSearchTerms as $term) {
    $searches[] = (new SearchQuery())
        ->setQuery($term['q'])
        ->setAttributesToRetrieve(['id'])
        ->setIndexUid('jobins_data')
        ->setOffset($term['offset'] ?? 0)
        ->setLimit($term['limit'] ?? $per_page)
        ->setSort(['id:desc'])
        ->setMatchingStrategy('all');
}

$response        = $client->multiSearch($searches);
$results['hits'] = collect($response['results'])->pluck('hits')->flatten(1);

dd($results['hits']);
Enter fullscreen mode Exit fullscreen mode

Conclusion

In conclusion, implementing OR search in Meilisearch requires a strategic workaround due to the absence of a native OR operation. By leveraging the principles of set theory, we can craft a solution that involves performing multiple search queries and combining the results. This approach, while computationally intensive with the number of API calls increasing exponentially with the number of search terms, ensures that we accurately retrieve documents that match any combination of the specified terms. The use of multi-search functionality in Meilisearch helps mitigate some of the performance concerns, allowing for simultaneous execution of queries. This method provides a practical solution for complex search scenarios where OR conditions are essential.

References

. . . . . . .
Terabox Video Player