Fuzzy Matching with Trigram Search: Building Intelligent Search in Node.js and MySQL

Mukesh Rajbanshi - Sep 17 - Dev Community

Introduction
In modern applications, efficient text search is crucial, especially when dealing with large databases. MySQL provides full-text search, but it falls short when it comes to fuzzy matching and handling misspellings. This is where trigram-based search comes into play. In this post, we’ll explore what a trigram is, how it makes search more tolerant of typos, and how you can implement trigram search in MySQL and call it from Node.js.

What is a Trigram?
A trigram is a sequence of three consecutive characters from a given string. For instance, the word "search" can be broken down into the following trigrams:

  • sea
  • ear
  • arc
  • rch

By breaking down words into trigrams, we can perform more flexible text matching, especially when trying to match incomplete or slightly misspelled terms.
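
The splitting step can be sketched in a few lines of TypeScript. This is only an illustration of the idea; the function name `trigrams` is a hypothetical helper and not part of the MySQL implementation that follows.

```typescript
// Split a string into overlapping three-character windows (trigrams).
// Strings shorter than three characters yield an empty list.
function trigrams(text: string): string[] {
  const result: string[] = [];
  for (let i = 0; i + 3 <= text.length; i++) {
    result.push(text.slice(i, i + 3));
  }
  return result;
}

console.log(trigrams("search").join(", ")); // sea, ear, arc, rch
```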

Implementing Trigram Search in MySQL Using Temporary Tables

1. Create the trigram function in the MySQL database. Trigram function code:

-- DELIMITER is a command of the mysql client (MySQL Workbench also supports it).
-- If your tool sends the statement as a single unit, omit both DELIMITER lines
-- and end the function with "END;" instead of "END //".
DELIMITER //

CREATE FUNCTION TRIGRAM_SEARCH(search_string VARCHAR(255), target_string VARCHAR(255))
RETURNS FLOAT
DETERMINISTIC
BEGIN
    DECLARE i INT DEFAULT 1;
    DECLARE total_trigrams INT DEFAULT 0;
    DECLARE matched_trigrams INT DEFAULT 0;
    DECLARE search_length INT;
    DECLARE target_length INT;

    SET search_length = CHAR_LENGTH(search_string);
    SET target_length = CHAR_LENGTH(target_string);

    -- Handle edge cases where strings are too short to contain a trigram
    IF search_length < 3 OR target_length < 3 THEN
        RETURN 0;
    END IF;

    -- Create temporary tables to store trigrams, first clearing any left over
    -- from an earlier call that failed before reaching the cleanup step
    DROP TEMPORARY TABLE IF EXISTS search_trigrams;
    DROP TEMPORARY TABLE IF EXISTS target_trigrams;
    CREATE TEMPORARY TABLE search_trigrams (trigram VARCHAR(3));
    CREATE TEMPORARY TABLE target_trigrams (trigram VARCHAR(3));

    -- Insert trigrams from search_string into temporary table
    WHILE i <= search_length - 2 DO
        INSERT INTO search_trigrams VALUES (SUBSTRING(search_string, i, 3));
        SET i = i + 1;
    END WHILE;

    -- Reset index for target_string
    SET i = 1;

    -- Insert trigrams from target_string into temporary table
    WHILE i <= target_length - 2 DO
        INSERT INTO target_trigrams VALUES (SUBSTRING(target_string, i, 3));
        SET i = i + 1;
    END WHILE;

    -- Count the distinct search trigrams that also appear in target_string
    SELECT COUNT(DISTINCT t1.trigram) INTO matched_trigrams
    FROM search_trigrams t1
    JOIN target_trigrams t2 ON t1.trigram = t2.trigram;

    -- Count the distinct trigrams in search_string
    SELECT COUNT(DISTINCT trigram) INTO total_trigrams FROM search_trigrams;

    -- Drop temporary tables
    DROP TEMPORARY TABLE search_trigrams;
    DROP TEMPORARY TABLE target_trigrams;

    -- Return similarity score (0 to 1): the share of the search string's
    -- distinct trigrams that were found in the target string
    IF total_trigrams > 0 THEN
        RETURN matched_trigrams / total_trigrams;
    ELSE
        RETURN 0;
    END IF;
END //

DELIMITER ;
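
To make the score easier to reason about, here is a minimal TypeScript sketch of the same calculation: the number of distinct search trigrams that also occur in the target, divided by the number of distinct search trigrams. The names `distinctTrigrams` and `trigramSimilarity` are hypothetical, and lowercasing both inputs is an assumption meant to approximate a case-insensitive MySQL collation; the database may compare characters differently depending on the collation in use.

```typescript
// Distinct trigrams of a string, lowercased.
// Assumption: lowercasing stands in for a case-insensitive collation.
function distinctTrigrams(text: string): string[] {
  const lower = text.toLowerCase();
  const result: string[] = [];
  for (let i = 0; i + 3 <= lower.length; i++) {
    const trigram = lower.slice(i, i + 3);
    if (result.indexOf(trigram) === -1) {
      result.push(trigram);
    }
  }
  return result;
}

// Share of the search string's distinct trigrams that occur in the target.
// Returns 0 when either string is shorter than three characters.
function trigramSimilarity(search: string, target: string): number {
  if (search.length < 3 || target.length < 3) {
    return 0;
  }
  const searchTrigrams = distinctTrigrams(search);
  const targetTrigrams = distinctTrigrams(target);
  let matched = 0;
  for (const trigram of searchTrigrams) {
    if (targetTrigrams.indexOf(trigram) !== -1) {
      matched++;
    }
  }
  return matched / searchTrigrams.length;
}

// "physis" has the trigrams phy, hys, ysi, sis; the first three occur in "physics".
console.log(trigramSimilarity("physis", "physics")); // 0.75
```

Note that the score is relative to the search string only, so a short search term that is fully contained in a long title still scores 1.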

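2. Add a full-text index to the desired column

Note that MySQL uses a full-text index only for MATCH ... AGAINST queries; TRIGRAM_SEARCH itself still evaluates every row it is applied to.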

import { BaseEntity, Column, Entity, Index, PrimaryGeneratedColumn } from 'typeorm';

@Entity()
@Index(['title'], { fulltext: true })
export class Ebook extends BaseEntity {
  @PrimaryGeneratedColumn()
  ebookId: number;

  @Column({ nullable: true })
  title: string;
}

3. Test the trigram search function

SELECT *
FROM ebook e
WHERE TRIGRAM_SEARCH('physis onlu', e.title) > 0.4
ORDER BY TRIGRAM_SEARCH('physis onlu', e.title) DESC;
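
The test query writes the TRIGRAM_SEARCH call twice, once for filtering and once for sorting. One possible variation, sketched here in TypeScript, selects the score under an alias and refers to that alias in HAVING and ORDER BY, which MySQL permits. The helper `buildTrigramQuery`, the `TrigramQuery` type, and the `score` alias are hypothetical names introduced for illustration; the resulting `sql` and `params` are meant for a driver that accepts `?` placeholders.

```typescript
interface TrigramQuery {
  sql: string;
  params: [string, number];
}

// Build a parameterized query that computes the score under an alias and
// filters and sorts on that alias. The default threshold of 0.4 is the
// value used in the test query.
function buildTrigramQuery(term: string, threshold: number = 0.4): TrigramQuery {
  if (threshold < 0 || threshold > 1) {
    throw new RangeError("threshold must be between 0 and 1");
  }
  const sql = [
    "SELECT e.*, TRIGRAM_SEARCH(?, e.title) AS score",
    "FROM ebook e",
    "HAVING score > ?",
    "ORDER BY score DESC",
  ].join("\n");
  return { sql, params: [term, threshold] };
}

const query = buildTrigramQuery("physis onlu");
console.log(query.sql);
console.log(query.params);
```

Passing the search term as a bound parameter rather than concatenating it into the SQL string keeps user input out of the query text.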


4. Implement trigram search in code

  async find(title?: string): Promise<Ebook[]> {
    // createQueryBuilder() is synchronous, so no await is needed here
    const eBooks = this.dataSource
      .getRepository(Ebook)
      .createQueryBuilder('eBook');
    if (title) {
      // Bind the search term as a parameter and keep rows scoring above 0.4
      eBooks.where('TRIGRAM_SEARCH(:title, eBook.title) > 0.4', { title });
    }
    return eBooks.getMany();
  }

Conclusion
Trigram search offers a practical way to implement fuzzy matching in MySQL databases. By breaking text down into trigrams, we can perform more flexible and forgiving searches, which improves the user experience in applications where text search is crucial.
While this approach has its strengths, it's important to consider alternatives such as Levenshtein distance or Soundex, depending on your specific use case and performance requirements.
By implementing trigram search, you can make search in your Node.js and MySQL applications more tolerant of typos and incomplete terms.
