Inefficient Regular Expression Complexity

Draft Base
Structure: Simple
Description

The product uses a regular expression with an inefficient, possibly exponential worst-case computational complexity that consumes excessive CPU cycles.

Extended Description

Some regular expression engines have a feature called "backtracking". If the token cannot match, the engine "backtracks" to a position that may result in a different token that can match. Backtracking becomes a weakness if all of these conditions are met: - The number of possible backtracking attempts are exponential relative to the length of the input. - The input can fail to match the regular expression. - The input can be long enough. Attackers can create crafted inputs that intentionally cause the regular expression to use excessive backtracking in a way that causes the CPU consumption to spike.

Common Consequences 1
Scope: Availability

Impact: DoS: Resource Consumption (CPU)

Potential Mitigations 4
Phase: Architecture and Design
Use regular expressions that do not support backtracking, e.g. by removing nested quantifiers.

Effectiveness: High

Phase: System Configuration
Set backtracking limits in the configuration of the regular expression implementation, such as PHP's pcre.backtrack_limit. Also consider limits on execution time for the process.

Effectiveness: Moderate

Phase: Implementation
Do not use regular expressions with untrusted input. If regular expressions must be used, avoid using backtracking in the expression.

Effectiveness: High

Phase: Implementation
Limit the length of the input that the regular expression will process.

Effectiveness: Moderate

Demonstrative Examples 2

ID : DX-158

This example attempts to check if an input string is a "sentence" [REF-1164].

Code Example:

Bad
JavaScript

var test_string = "Bad characters: $@#"; var bad_pattern = /^(\w+\s?)*$/i; var result = test_string.search(bad_pattern);

The regular expression has a vulnerable backtracking clause inside (\w+\s?)*$ which can be triggered to cause a Denial of Service by processing particular phrases. To fix the backtracking problem, backtracking is removed with the ?= portion of the expression which changes it to a lookahead and the \2 which prevents the backtracking. The modified example is:

Code Example:

Good
JavaScript

var test_string = "Bad characters: $@#"; var good_pattern = /^((?=(\w+))\2\s?)*$/i; var result = test_string.search(good_pattern);

Note that [REF-1164] has a more thorough (and lengthy) explanation of everything going on within the RegEx.
This example attempts to check if an input string is a "sentence" and is modified for Perl [REF-1164].

Code Example:

Bad
Perl

my $test_string = "Bad characters: $@#"; my $bdrslt = $test_string; $bdrslt =~ /^(\w+\s?)*$/i;

The regular expression has a vulnerable backtracking clause inside (\w+\s?)*$ which can be triggered to cause a Denial of Service by processing particular phrases. To fix the backtracking problem, backtracking is removed with the ?= portion of the expression which changes it to a lookahead and the \2 which prevents the backtracking. The modified example is:

Code Example:

Good
Perl

my $test_string = "Bad characters: $@#"; my $gdrslt = $test_string; $gdrslt =~ /^((?=(\w+))\2\s?)*$/i;

Note that [REF-1164] has a more thorough (and lengthy) explanation of everything going on within the RegEx.
Observed Examples 8
CVE-2020-5243server allows ReDOS with crafted User-Agent strings, due to overlapping capture groups that cause excessive backtracking.
CVE-2021-21317npm package for user-agent parser prone to ReDoS due to overlapping capture groups
CVE-2019-16215Markdown parser uses inefficient regex when processing a message, allowing users to cause CPU consumption and delay preventing processing of other messages.
CVE-2019-6785Long string in a version control product allows DoS due to an inefficient regex.
CVE-2019-12041Javascript code allows ReDoS via a long string due to excessive backtracking.
CVE-2015-8315ReDoS when parsing time.
CVE-2015-8854ReDoS when parsing documents.
CVE-2017-16021ReDoS when validating URL.
References 7
Runaway Regular Expressions: Catastrophic Backtracking
Jan Goyvaerts
22-12-2019
ID: REF-1162
Regular expression Denial of Service - ReDoS
Adar Weidman
ID: REF-1163
Catastrophic backtracking
Ilya Kantor
13-12-2020
ID: REF-1164
Freezing the Web: A Study of ReDoS Vulnerabilities in JavaScript-based Web Servers
Cristian-Alexandru Staicu and Michael Pradel
USENIX Security Symposium
11-07-2018
ID: REF-1165
The Impact of Regular Expression Denial of Service (ReDoS) in Practice: An Empirical Study at the Ecosystem Scale
James C. Davis, Christy A. Coghlan, Francisco Servant, and Dongyoon Lee
01-08-2018
ID: REF-1166
The Regular Expression Denial of Service (ReDoS) cheat-sheet
James Davis
23-05-2020
ID: REF-1167
Likelihood of Exploit

High

Applicable Platforms
Languages:
Not Language-Specific : Undetermined
Modes of Introduction
Implementation
Alternate Terms

ReDoS

ReDoS is an abbreviation of "Regular expression Denial of Service".

Regular Expression Denial of Service

While this term is attack-focused, this is commonly used to describe the weakness.

Catastrophic backtracking

This term is used to describe the behavior of the regular expression as a negative technical impact.