How I Built a Profanity Blocking JavaScript Library
Introduction
As developers, we often need to filter and sanitize text to block or remove profanity. To tackle this problem, I created a JavaScript library called bc-ProfanityBlock. In this article, I will walk you through the steps I took to build it and explain how it can be used to block profanity in your applications.
Step 1: Defining the Problem
The first step in building any library is to clearly define the problem it solves. In this case, the problem was to detect and handle profanity in text. Profanity can appear in several forms: plain words, spelling variations, and words disguised with evasion characters (for example, "@" in place of "a"). The goal was a library that could efficiently detect and sanitize such content.
Step 2: Research and Planning
Before diving into the implementation, I researched existing profanity-blocking techniques and libraries. Among these were https://github.com/2Toad/Profanity and https://www.npmjs.com/package/bad-words (both fantastic libraries!).
This helped me understand the different approaches and challenges involved. Based on my research, I decided to use a combination of encoded bad words and evasion pattern detection to build an effective solution.
Step 3: Designing the Architecture
Next, I designed the architecture of the library. I created a ContentFilterBadWord class that encapsulates all the methods and properties needed for filtering and cleaning text. The class has functions for decoding encoded bad words, normalizing text with evasion patterns, checking whether text contains bad words, and cleaning text by replacing or removing bad words.
Step 4: Implementing the Functionality
With the architecture in place, I started implementing the functionality of the library. I created methods to decode base64 encoded bad words, normalize text with evasion patterns, and check if text contains bad words. I also added options to match bad words as whole words and detect evasion characters and separators. Lastly, I implemented functions to clean text by replacing or removing bad words.
Step 5: Testing and Optimization
Once the functionality was implemented, I tested the library extensively to make sure it worked as expected. I created test cases for different scenarios, including common bad words, variations, and evasion techniques. I also tested the library's performance with large volumes of text. Based on the test results, I made optimizations to improve the speed and accuracy of the library. There are still some very minor bugs that I am working on.
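To give a sense of what such test cases might look like, here is a minimal table-driven sketch. The looksProfane helper and the placeholder word "darn" are assumptions made for illustration; this is not the library's actual test suite, API, or word list.

```javascript
// Hypothetical stand-in for the library's check, using a harmless placeholder word.
const placeholderWords = ["darn"];

function looksProfane(text) {
  // Undo two character substitutions, then drop separators and whitespace.
  const normalized = text
    .toLowerCase()
    .replace(/4|@/g, "a")
    .replace(/[-_.\s]/g, "");
  return placeholderWords.some((word) => normalized.includes(word));
}

// Each case pairs an input with the result we expect.
const cases = [
  { input: "well darn", expected: true },       // plain word
  { input: "well D4RN", expected: true },       // character substitution
  { input: "well d-a-r-n", expected: true },    // separators
  { input: "well d a r n", expected: true },    // spaces between letters
  { input: "nothing to see", expected: false }, // clean text
];

const failures = cases.filter(
  ({ input, expected }) => looksProfane(input) !== expected
);
console.log(failures.length === 0 ? "all cases pass" : failures);
```

Note that stripping all whitespace, as this stand-in does, also flags "bad arnold" (the letters run together into "badarnold"), so a real test suite should include false-positive cases as well as evasion cases.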
Now, let's take a look at the code.
Step 1: Define a class (ContentFilterBadWord)
In JavaScript, a class is a blueprint or template for creating objects that share similar properties and behaviors. It provides a way to define the structure and behavior of an object.
To create a class in JavaScript, you can use the class keyword followed by the name of the class. Here's an example:
class Person {
  constructor(name, age) {
    this.name = name;
    this.age = age;
  }
  greet() {
    console.log(`Hello, my name is ${this.name} and I'm ${this.age} years old.`);
  }
}
In the above example, we define a Person class with a constructor and a greet method. The constructor is a special method that gets called when a new object is created from the class. It is used to initialize the object's properties.
To create an instance of a class, you can use the new keyword followed by the class name with parentheses. Here's an example:
const person1 = new Person("John", 25);
const person2 = new Person("Jane", 30);
person1.greet(); // Output: Hello, my name is John and I'm 25 years old.
person2.greet(); // Output: Hello, my name is Jane and I'm 30 years old.
In the above example, we create two instances of the Person class and call the greet method on each instance.
Using classes in JavaScript allows you to create reusable and organized code by encapsulating related properties and behaviors within a single class.
In this case, we define a class called ContentFilterBadWord. In the constructor, we store the bad word list (Base64-encoded, so the words are not readable at a glance in the source) and the evasion patterns. After that, we need to add some functions to the class. See below:
class ContentFilterBadWord {
  constructor() {
    // Base64 encoded bad words (placeholder values shown here)
    this.encodedCussWords = [
      "AAAAAA",
      "BBBBBB",
      "CCCCCC",
    ];
    // Characters commonly substituted for letters, and the letter each one stands for
    this.evasionPatterns = [
      { pattern: /4/gi, replacement: "a" },
      { pattern: /\$/gi, replacement: "s" },
      { pattern: /5/gi, replacement: "s" },
      { pattern: /0/gi, replacement: "o" },
      { pattern: /1/gi, replacement: "i" },
      { pattern: /!/gi, replacement: "i" },
      { pattern: /@/gi, replacement: "a" },
    ];
  }
  // The methods from the following steps go here, inside the class body
Step 2: decodeBase64
This one is pretty simple: it wraps the built-in atob function, which decodes a Base64 string.
decodeBase64(encodedString) {
  return atob(encodedString);
}
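To see how an entry in the encoded list could be produced and read back, here is a small round-trip sketch. The word "darn" is a harmless placeholder chosen for illustration and is not taken from the library's list. It assumes an environment where btoa and atob are available as globals (browsers and recent Node.js versions).

```javascript
// Placeholder word used purely for illustration.
const plain = "darn";

// btoa encodes a string to Base64; atob reverses it.
const encoded = btoa(plain);
const decoded = atob(encoded);

console.log(encoded); // "ZGFybg=="
console.log(decoded); // "darn"
```

Keep in mind that Base64 is an encoding, not encryption: it only keeps the words from being readable at a glance, and anyone can decode them.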
Step 3: normalizeText
This one is also pretty simple. The normalizeText function takes a text parameter and iterates through the evasionPatterns array, replacing every match of each pattern with its replacement. The result is the text with evasion characters swapped back to the letters they stand for.
normalizeText(text) {
  // Apply evasion patterns to normalize text
  this.evasionPatterns.forEach(({ pattern, replacement }) => {
    text = text.replace(pattern, replacement);
  });
  return text;
}
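As a quick illustration of what this normalization does to real input, here is a standalone sketch that uses the same substitution table as the constructor above. It is written as a plain function (rather than a class method) only so that it can be run on its own.

```javascript
// Same substitution table as the constructor above.
const evasionPatterns = [
  { pattern: /4/gi, replacement: "a" },
  { pattern: /\$/gi, replacement: "s" },
  { pattern: /5/gi, replacement: "s" },
  { pattern: /0/gi, replacement: "o" },
  { pattern: /1/gi, replacement: "i" },
  { pattern: /!/gi, replacement: "i" },
  { pattern: /@/gi, replacement: "a" },
];

// Standalone version of normalizeText: applies each pattern in order.
function normalizeText(text) {
  return evasionPatterns.reduce(
    (current, { pattern, replacement }) => current.replace(pattern, replacement),
    text
  );
}

console.log(normalizeText("b4d w0rd$"));        // "bad words"
console.log(normalizeText("Call me at 10:45")); // "Call me at io:as"
```

The second call shows the trade-off: legitimate digits and symbols are rewritten too, so normalized text is well suited to detection but changes the content if it is shown back to the user.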
Step 4: containsBadWords
The containsBadWords function accepts four parameters:
- text: The string to be checked for bad words.
- matchWord (default false): A boolean indicating whether to match only whole words.
- detectEvasionCharacters (default true): A boolean indicating whether to normalize the text for character evasion attempts (like using "@" instead of "a").
- detectEvasionSeperators (default true, spelled this way in the code): A boolean indicating whether to remove certain separators or spaces that might be used to disguise bad words.
The function begins by decoding the array of Base64-encoded bad words (this.encodedCussWords) to their original form for comparison.
If detectEvasionCharacters is true, the function applies the patterns defined in this.evasionPatterns to replace evasion characters in the text with their normal counterparts.
If detectEvasionSeperators is true, the function removes common separators (hyphens, underscores, and periods) from the text. It then collapses any whitespace between the letters of each bad word, to catch cases where spaces are used to evade detection.
After normalization, the function logs the normalized text to the console.
Finally, it uses the Array.prototype.some method to check whether any bad word is present in the normalized text. It does this by creating a regular expression for each bad word. If matchWord is true, it wraps the word in word boundaries (\b) so that only whole words match. Otherwise, it matches the bad word as a substring anywhere in the text. The function returns true if any bad word is detected, and false otherwise.
containsBadWords(
  text,
  matchWord = false,
  detectEvasionCharacters = true,
  detectEvasionSeperators = true
) {
  // Decode bad words for comparison
  const cussWords = this.encodedCussWords.map((encodedWord) =>
    this.decodeBase64(encodedWord)
  );
  // Normalize text to catch evasion attempts
  let normalizedText = text;
  if (detectEvasionCharacters) {
    // Apply evasion patterns to normalize text
    this.evasionPatterns.forEach(({ pattern, replacement }) => {
      normalizedText = normalizedText.replace(pattern, replacement);
    });
  }
  if (detectEvasionSeperators) {
    // Remove common separators between letters
    normalizedText = normalizedText.replace(/[-_.]/g, "");
    // Remove spaces between letters only for bad words
    cussWords.forEach((cussWord) => {
      // Create a dynamic regular expression that matches the bad word with any spaces between the letters
      const wordRegex = new RegExp(cussWord.split("").join("\\s*"), "gi");
      // Replace the matched substring with the bad word without spaces
      normalizedText = normalizedText.replace(wordRegex, (match) => {
        return match.replace(/\s/g, "");
      });
    });
  }
  // NOTE: the listing was cut off at this point. The rest of the method is
  // reconstructed from the description above and may differ from the library's code.
  console.log(normalizedText);
  // Return true as soon as any bad word matches the normalized text
  return cussWords.some((cussWord) => {
    const wordRegex = matchWord
      ? new RegExp(`\\b${cussWord}\\b`, "i") // whole words only
      : new RegExp(cussWord, "i"); // substring anywhere in the text
    return wordRegex.test(normalizedText);
  });
}
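The two regular-expression techniques described in this step, collapsing spaced-out letters and matching whole words with \b, can be tried in isolation. This is a minimal sketch using the placeholder word "darn", which is an assumption for illustration and not part of the library's word list.

```javascript
// Placeholder word for illustration only.
const word = "darn";

// "d\s*a\s*r\s*n" matches the word with any whitespace between its letters.
const spacedRegex = new RegExp(word.split("").join("\\s*"), "gi");
const collapsed = "oh d a  r n it".replace(spacedRegex, (match) =>
  match.replace(/\s/g, "")
);
console.log(collapsed); // "oh darn it"

// Whole-word matching uses \b so the word is not found inside longer words.
const wholeWord = new RegExp(`\\b${word}\\b`, "i");
const substring = new RegExp(word, "i");

console.log(wholeWord.test("darning socks")); // false
console.log(substring.test("darning socks")); // true
console.log(wholeWord.test("oh darn it"));    // true
```

Substring matching catches more evasion attempts but also flags innocent words that happen to contain a listed word, which is why the matchWord option is useful when false positives matter more than coverage.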
Step 5: cleanText
Here is a breakdown of the cleanText function.
Parameters:
- text: The text to be cleaned.
- method (default "replace"): The method to use for cleaning the text. Can be either "replace" or "remove".
- detectEvasionCharacters (default true): A boolean indicating whether to normalize the text for character evasion attempts (like using "@" instead of "a").
- detectEvasionSeparators (default true): A boolean indicating whether to remove certain separators or spaces that might be used to disguise bad words.
Function Body:
- Initialization:
  - The function creates a new variable cleanedText and assigns it the value of the input text.
- Evasion Character Detection (if enabled):
  - If detectEvasionCharacters is true, the function calls normalizeText (Step 3) to replace evasion characters in cleanedText with their normal counterparts.
- Evasion Separator Detection (if enabled):
  - If detectEvasionSeparators is true:
    - The function removes common separators (hyphens, underscores, and periods) from cleanedText using the regular expression /[-_.]/g.
    - It iterates over the array of Base64-encoded bad words (this.encodedCussWords).
    - For each encoded word:
      - It decodes the word using this.decodeBase64 (Step 2).
      - It creates a regular expression wordRegex for the decoded word, with the flags g (global) and i (case-insensitive).
      - If method is "replace", it replaces every occurrence of the bad word in cleanedText with the same number of asterisks, using a callback function.
      - If method is "remove", it replaces every occurrence of the bad word in cleanedText with an empty string.
- Return:
  - The function returns the cleaned text, cleanedText.
Overall, this function takes text as input and cleans it by removing or replacing bad words. It can optionally handle evasion attempts by normalizing characters and separators. Note that in the code as shown, the replacement loop sits inside the detectEvasionSeparators branch, so bad words are only replaced or removed when that option is true.
cleanText(
  text,
  method = "replace",
  detectEvasionCharacters = true,
  detectEvasionSeparators = true
) {
  let cleanedText = text;
  if (detectEvasionCharacters) {
    // Swap evasion characters back to the letters they stand for
    cleanedText = this.normalizeText(cleanedText);
  }
  if (detectEvasionSeparators) {
    // Remove common separators (this affects the whole text, not just bad words)
    cleanedText = cleanedText.replace(/[-_.]/g, "");
    // As written, bad words are only cleaned when detectEvasionSeparators is true
    this.encodedCussWords.forEach((encodedWord) => {
      const cussWord = this.decodeBase64(encodedWord);
      const wordRegex = new RegExp(cussWord, "gi");
      if (method === "replace") {
        // Mask each character of the match with an asterisk
        cleanedText = cleanedText.replace(wordRegex, (match) => {
          return match.replace(/\S/g, "*");
        });
      } else if (method === "remove") {
        cleanedText = cleanedText.replace(wordRegex, "");
      }
    });
  }
  return cleanedText;
}
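To make the difference between the "replace" and "remove" methods concrete, here is a standalone sketch. The maskWords helper, its signature, and the placeholder word "darn" are hypothetical and are not part of bc-ProfanityBlock. Unlike cleanText above, this variation always applies the masking step and leaves punctuation untouched, which is one possible design and not necessarily what the library does.

```javascript
// Hypothetical helper: masks or removes every occurrence of each listed word.
function maskWords(text, words, method = "replace") {
  return words.reduce((current, word) => {
    // Escape regex metacharacters so each word is matched literally.
    const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    const wordRegex = new RegExp(escaped, "gi");
    return method === "remove"
      ? current.replace(wordRegex, "")
      : current.replace(wordRegex, (match) => "*".repeat(match.length));
  }, text);
}

const masked = maskWords("Well, darn it. DARN!", ["darn"]);
const removed = maskWords("Well, darn it. DARN!", ["darn"], "remove");

console.log(masked);  // "Well, **** it. ****!"
console.log(removed); // "Well,  it. !"
```

Masking preserves the length and shape of the message, while removal can leave stray spaces and punctuation behind, so "replace" is usually the gentler default for user-facing text.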
That's it as far as the code goes!
How to use...
For usage documentation, see https://github.com/The-Best-Codes/bc-ProfanityBlock.
Conclusion
In this article, I shared the process of building the bc-ProfanityBlock JavaScript library for blocking profanity in text. By combining encoded bad words and evasion pattern detection, the library provides an efficient and effective solution for filtering and sanitizing content. Whether you are building a social media platform, chat application, or any other system where content moderation is important, this library can be a valuable addition to your toolkit.
You can find the complete source code and documentation for the ContentFilterBadWord library on GitHub. I hope this article has been informative and encourages you to explore the world of content moderation in your applications.
If you have any questions or feedback, please feel free to reach out to me via email at best-codes@proton.me.
Happy coding!
Some content in this article is generated by the BestCodes AI.
Article by Best_codes.