Drawbacks of Traditional WAFs

Traditional Web Application Firewalls (WAFs) often rely on regular expressions to identify attack patterns. For instance, the widely used ModSecurity engine powers about 80% of WAFs globally. However, regex-based rules match the surface text of a request rather than its meaning, which limits how effective these WAFs can be.

Consider the following examples from typical WAF rules:

union[\w\s]*?select: This rule flags an SQL injection attempt when "union" is followed by "select" with only word characters or whitespace in between.
\balert\s*\(: This rule flags a potential XSS attack when the keyword "alert" is followed, after optional whitespace, by a left parenthesis "(".

Real attackers can easily bypass these basic rules, rendering the WAF ineffective. For example:

union/**/select: Inserting an empty comment between "union" and "select" breaks the pattern and evades detection.
window['\x61lert']: Writing the letter "a" as the escape sequence "\x61" hides the keyword "alert" from the pattern, so the attack slips through.

These examples illustrate how traditional regex-based WAFs can fail to prevent attacks due to the ease with which hackers can bypass the rules.
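The bypasses can be reproduced with a few lines of Python. This is a minimal sketch, assuming the two rules read `union[\w\s]*?select` and `\balert\s*\(`; the function name `flagged` and the sample payloads are illustrative and do not come from any real rule set.

```python
import re

# Assumed forms of the two example rules (illustrative, not a real rule set).
SQLI_RULE = re.compile(r"union[\w\s]*?select", re.IGNORECASE)
XSS_RULE = re.compile(r"\balert\s*\(", re.IGNORECASE)

def flagged(payload: str) -> bool:
    """Return True if either regex rule matches the payload."""
    return bool(SQLI_RULE.search(payload) or XSS_RULE.search(payload))

samples = [
    "1 union select password from users",       # plain SQL injection
    "1 union/**/select password from users",    # comment-based bypass
    "<script>alert(1)</script>",                # plain XSS
    "<script>window['\\x61lert'](1)</script>",  # escape-based bypass
]

for payload in samples:
    print(flagged(payload), payload)
```

The plain payloads are flagged, while the two rewritten payloads pass through unflagged even though they would behave the same way once interpreted.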

Additionally, regex-based WAFs often result in high rates of false positives, inadvertently blocking legitimate users. Consider the following scenarios:

The union select members from each department to form a committee: This benign sentence might trigger an SQL injection rule.
Her down on the alert(for the man) and walked into a world of rivers: This innocent phrase could be mistakenly identified as an XSS attack.
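The same kind of check shows the false positives. Again a sketch under the assumption that the rules read `union[\w\s]*?select` and `\balert\s*\(`; the helper `matched_rules` and the rule labels are hypothetical.

```python
import re

# Assumed forms of the two example rules (illustrative only).
RULES = {
    "sqli": re.compile(r"union[\w\s]*?select", re.IGNORECASE),
    "xss": re.compile(r"\balert\s*\(", re.IGNORECASE),
}

def matched_rules(text: str) -> list:
    """Return the names of all rules that match the text."""
    return [name for name, rule in RULES.items() if rule.search(text)]

benign = [
    "The union select members from each department to form a committee",
    "Her down on the alert(for the man) and walked into a world of rivers",
]

for sentence in benign:
    print(matched_rules(sentence), sentence)
```

Both sentences are ordinary prose, yet each one triggers a rule, so a WAF in blocking mode would reject them.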

These limitations highlight the need for more advanced detection methods, as showcased in studies like "AutoSpear: Towards Automatically Bypassing and Inspecting Web Application Firewalls" and "Web Application Firewalls: Attacking Detection Logic Mechanisms" from the Black Hat conference.

How SafeLine Uses Syntax Analysis in WAF

SafeLine WAF takes a different approach, employing a syntax analysis algorithm at its core. Unlike traditional WAFs that rely on simple regex patterns, SafeLine parses the user input carried in the traffic and analyzes what that input would actually do if it were executed.

Let's explore this with SQL injection as an example. For an SQL injection attack to succeed, two conditions must be met:

1. The traffic must contain an SQL statement that is syntactically valid.
   - "union select xxx from xxx where" is a valid SQL statement fragment.
   - "union select xxx from xxx xxx xxx xxx xxx where" is not valid.
   - "1 + 1 = 2" is a valid SQL statement fragment.
   - "1 + 1 is 2" is not valid.
2. The SQL statement must exhibit malicious behavior, beyond being merely syntactically correct.
   - "union select xxx from xxx where" has the potential for malicious intent.
   - "1 + 1 = 2" has no harmful purpose.
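To make the two conditions concrete, here is a toy recognizer for the four fragments above. It is only a sketch: the grammar covers a tiny, made-up subset of SQL, the names `tokenize`, `parse_union`, `parse_comparison`, and `classify` are hypothetical, and none of this reflects SafeLine's actual parser.

```python
import re

KEYWORDS = {"union", "select", "from", "where"}
TOKEN_RE = re.compile(r"\s*(\d+|\w+|[+=])")

def tokenize(text):
    """Turn text into a list of token kinds; None if a character is unknown."""
    text, pos, kinds = text.strip(), 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            return None
        value = m.group(1).lower()
        if value.isdigit():
            kinds.append("num")
        elif value in KEYWORDS or value in ("+", "="):
            kinds.append(value)
        else:
            kinds.append("ident")
        pos = m.end()
    return kinds

def parse_union(toks):
    """Toy rule: UNION SELECT ident FROM ident [alias] [WHERE]."""
    i = 0
    for expected in ("union", "select", "ident", "from", "ident"):
        if i >= len(toks) or toks[i] != expected:
            return False
        i += 1
    if i < len(toks) and toks[i] == "ident":   # optional table alias
        i += 1
    if i < len(toks) and toks[i] == "where":   # the fragment may stop at WHERE
        i += 1
    return i == len(toks)

def parse_sum(toks, i):
    """Toy rule: num ('+' num)*. Returns the index after the sum, or -1."""
    if i >= len(toks) or toks[i] != "num":
        return -1
    i += 1
    while i + 1 < len(toks) and toks[i] == "+" and toks[i + 1] == "num":
        i += 2
    return i

def parse_comparison(toks):
    """Toy rule: sum '=' sum."""
    i = parse_sum(toks, 0)
    if i < 0 or i >= len(toks) or toks[i] != "=":
        return False
    return parse_sum(toks, i + 1) == len(toks)

def classify(text):
    toks = tokenize(text)
    if toks is None:
        return "invalid"
    if parse_union(toks):
        return "valid, suspicious"   # condition 1 and condition 2
    if parse_comparison(toks):
        return "valid, harmless"     # condition 1 only
    return "invalid"

for fragment in ("union select xxx from xxx where",
                 "union select xxx from xxx xxx xxx xxx xxx where",
                 "1 + 1 = 2",
                 "1 + 1 is 2"):
    print(classify(fragment), "<-", fragment)
```

The point of the sketch is the order of the checks: a fragment is first tested for syntactic validity, and only a valid fragment is then judged on what it would do.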

SafeLine's approach to attack detection focuses on these essential aspects of SQL injection attacks. The process is as follows:

1. Parsing HTTP traffic to identify potential input locations.
2. Recursively decoding parameters to reach the most original user input.
3. Checking if the user input conforms to SQL syntax.
4. Analyzing the potential malicious intent behind the SQL syntax.
5. Scoring the malicious intent and deciding whether to intercept the request.
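The plumbing around those steps might look like the sketch below. It only illustrates locating inputs, decoding them repeatedly, and applying a score threshold. The names `decode_recursively`, `extract_inputs`, and `inspect`, the threshold value, and the placeholder scorer are all assumptions made for illustration; the syntax and intent analysis itself is left to a caller-supplied function.

```python
from urllib.parse import urlsplit, parse_qsl, unquote

def decode_recursively(value: str, max_rounds: int = 5) -> str:
    """URL-decode repeatedly until the value stops changing or a limit is hit."""
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:
            break
        value = decoded
    return value

def extract_inputs(url: str) -> list:
    """Locate query parameters and decode each value as far as possible."""
    query = urlsplit(url).query
    # parse_qsl performs the first round of URL decoding itself.
    return [(name, decode_recursively(value))
            for name, value in parse_qsl(query, keep_blank_values=True)]

def inspect(url: str, score_fn, threshold: int = 5) -> str:
    """Score every decoded input and block if any score reaches the threshold."""
    scores = [score_fn(value) for _, value in extract_inputs(url)]
    return "block" if any(s >= threshold for s in scores) else "allow"

# Placeholder scorer: a real engine would parse the value and rate its intent.
def placeholder_score(value: str) -> int:
    return 10 if "union select" in value.lower() else 0

url = "http://example.com/item?id=1%2520union%2520select%2520password%2520from%2520users"
print(extract_inputs(url))
print(inspect(url, placeholder_score))
```

The example value is URL-encoded twice, so a single decoding pass would leave `%20` sequences in place; decoding until the value stabilizes exposes the text that the backend would eventually see.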

SafeLine WAF includes built-in compilers for common programming languages. After deeply decoding the HTTP payload, it selects the syntax compiler that matches the payload's language, compares the parsed result against its threat model, and assigns a threat rating that determines whether the request is allowed or blocked.

Why Semantic Analysis is Superior

In computer science, the Chomsky hierarchy classifies formal grammars into four types:

Type 0 Grammar (Unrestricted Grammar): Recognizable by Turing Machines
Type 1 Grammar (Context-Sensitive Grammar): Recognizable by Linear Bounded Automata
Type 2 Grammar (Context-Free Grammar): Recognizable by Pushdown Automata
Type 3 Grammar (Regular Grammar): Recognizable by Finite State Automata

The expressive power of these grammars diminishes from Type 0 to Type 3. The syntax of most programming languages, such as SQL, HTML, and JavaScript, is largely Type 2 (context-free), with some Type 1 (context-sensitive) elements. In contrast, regular expressions fall under Type 3, the weakest in expressive power.

A significant limitation of regular expressions is that they cannot count without bound, so they cannot recognize arbitrarily nested structures such as matched parentheses. Attack payloads are written in programming languages that nest freely, and attackers keep rewriting them, so a finite set of regex rules cannot cover every form a payload can take. This is why traditional rule-based WAFs fall short.
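The parentheses case is easy to demonstrate. In the sketch below, `DEPTH_ONE` is a regular expression that accepts parentheses nested at most one level deep, and `is_balanced` is a stack-based checker in the spirit of a pushdown automaton. Both names are illustrative.

```python
import re

# Accepts text whose parentheses are balanced AND nested at most one level deep.
DEPTH_ONE = re.compile(r"^(?:[^()]|\([^()]*\))*$")

def is_balanced(text: str) -> bool:
    """Check bracket balance with an explicit stack, at any nesting depth."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in text:
        if ch in "([{":
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack

for sample in ("f(a)", "f(g(a))", "f(g(a)"):
    print(sample, bool(DEPTH_ONE.match(sample)), is_balanced(sample))
```

The regex handles `f(a)` but rejects the balanced `f(g(a))`. It can be extended to depth two or three by hand, but each fixed pattern stops at some depth, whereas the stack-based check has no such limit.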

Compared to regex-based pattern matching, syntax analysis offers high accuracy and a low false-positive rate, making it a more powerful tool for threat detection.

Finally, I invite you to try out SafeLine for yourself, and join the discussion on Discord and GitHub.
