Incorrect Regular Expression

Draft Class
Structure: Simple
Description

The product specifies a regular expression in a way that causes data to be improperly matched or compared.

Extended Description

When the regular expression is used in protection mechanisms such as filtering or validation, this may allow an attacker to bypass the intended restrictions on the incoming data.

Common Consequences 2
Scope: Other

Impact: Unexpected StateVaries by Context

When the regular expression is not correctly specified, data might have a different format or type than the rest of the program expects, producing resultant weaknesses or errors.

Scope: Access Control

Impact: Bypass Protection Mechanism

In PHP, regular expression checks can sometimes be bypassed with a null byte, leading to any number of weaknesses.

Detection Methods 1
Automated Static AnalysisHigh
Automated static analysis, commonly referred to as Static Application Security Testing (SAST), can find some instances of this weakness by analyzing source code (or binary/compiled code) without having to execute it. Typically, this is done by building a model of data flow and control flow, then searching for potentially-vulnerable patterns that connect "sources" (origins of input) with "sinks" (destinations where the data interacts with external components, a lower layer such as the OS, etc.)
Potential Mitigations 1
Phase: Architecture and Design

Strategy: Refactoring

Regular expressions can become error prone when defining a complex language even for those experienced in writing grammars. Determine if several smaller regular expressions simplify one large regular expression. Also, subject the regular expression to thorough testing techniques such as equivalence partitioning, boundary value analysis, and robustness. After testing and a reasonable confidence level is achieved, a regular expression may not be foolproof. If an exploit is allowed to slip through, then record the exploit and refactor the regular expression.
Demonstrative Examples 2

ID : DX-37

The following code takes phone numbers as input, and uses a regular expression to reject invalid phone numbers.

Code Example:

Bad
Perl
perl

looks like it only has hyphens and digits*

perl
An attacker could provide an argument such as: "; ls -l ; echo 123-456" This would pass the check, since "123-456" is sufficient to match the "\d+-\d+" portion of the regular expression.

ID : DX-154

This code uses a regular expression to validate an IP string prior to using it in a call to the "ping" command.

Code Example:

Bad
Python
python

The ping command treats zero-prepended IP addresses as octal*

python
Since the regular expression does not have anchors (Regular Expression without Anchors), i.e. is unbounded without ^ or $ characters, then prepending a 0 or 0x to the beginning of the IP address will still result in a matched regex pattern. Since the ping command supports octal and hex prepended IP addresses, it will use the unexpectedly valid IP address (Incorrect Parsing of Numbers with Different Radices). For example, "0x63.63.63.63" would be considered equivalent to "99.63.63.63". As a result, the attacker could potentially ping systems that the attacker cannot reach directly.
Observed Examples 11
CVE-2002-2109Regexp isn't "anchored" to the beginning or end, which allows spoofed values that have trusted values as substrings.
CVE-2005-1949Regexp for IP address isn't anchored at the end, allowing appending of shell metacharacters.
CVE-2001-1072Bypass access restrictions via multiple leading slash, which causes a regular expression to fail.
CVE-2000-0115Local user DoS via invalid regular expressions.
CVE-2002-1527chain: Malformed input generates a regular expression error that leads to information exposure.
CVE-2005-1061Certain strings are later used in a regexp, leading to a resultant crash.
CVE-2005-2169MFV. Regular expression intended to protect against directory traversal reduces ".../...//" to "../".
CVE-2005-0603Malformed regexp syntax leads to information exposure in error message.
CVE-2005-1820Code injection due to improper quoting of regular expression.
CVE-2005-3153Null byte bypasses PHP regexp check.
CVE-2005-4155Null byte bypasses PHP regexp check.
References 1
Writing Secure Code
Michael Howard and David LeBlanc
Microsoft Press
04-12-2002
ID: REF-7
Applicable Platforms
Languages:
Not Language-Specific : Undetermined
Modes of Introduction
Implementation
Taxonomy Mapping
  • PLOVER
Notes
RelationshipWhile there is some overlap with allowlist/denylist problems, this entry is intended to deal with incorrectly written regular expressions, regardless of their intended use. Not every regular expression is intended for use as an allowlist or denylist. In addition, allowlists and denylists can be implemented using other mechanisms besides regular expressions.
Research GapRegexp errors are likely a primary factor in many MFVs, especially those that require multiple manipulations to exploit. However, they are rarely diagnosed at this level of detail.