Improper Control of Generation of Code ('Code Injection')

Status: Draft
Abstraction: Base
Structure: Simple
Description

The product constructs all or part of a code segment using externally-influenced input from an upstream component, but it does not neutralize or incorrectly neutralizes special elements that could modify the syntax or behavior of the intended code segment.

Common Consequences
Scope: Access Control

Impact: Bypass Protection Mechanism

In some cases, injectable code controls authentication; this may lead to a remote vulnerability.

Scope: Access Control

Impact: Gain Privileges or Assume Identity

Injected code can access resources that the attacker is directly prevented from accessing.

Scope: Integrity, Confidentiality, Availability

Impact: Execute Unauthorized Code or Commands

When a product allows a user's input to contain code syntax, it might be possible for an attacker to craft the code in such a way that it will alter the intended control flow of the product. As a result, code injection can often result in the execution of arbitrary code. Code injection attacks can also lead to loss of data integrity in nearly all cases, since the control-plane data injected is always incidental to data recall or writing.

Scope: Non-Repudiation

Impact: Hide Activities

Often the actions performed by injected control code are unlogged.

Detection Methods
Automated Static Analysis

Effectiveness: High

Automated static analysis, commonly referred to as Static Application Security Testing (SAST), can find some instances of this weakness by analyzing source code (or binary/compiled code) without having to execute it. Typically, this is done by building a model of data flow and control flow, then searching for potentially-vulnerable patterns that connect "sources" (origins of input) with "sinks" (destinations where the data interacts with external components, a lower layer such as the OS, etc.).
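The following Python sketch gives a rough illustration of the "sink" side of this analysis. It uses the standard ast module to flag calls to eval(), exec(), or compile() whose first argument is not a string literal. It is only a pattern match. Unlike a real SAST tool, it does not trace data flow back to input sources, so it will report false positives and will miss calls made through aliases. The SINKS set and the find_code_sinks() name are assumptions of this sketch.

```python
import ast

# Hypothetical sink list for this sketch: built-ins that run a string as code.
SINKS = {"eval", "exec", "compile"}


def find_code_sinks(source: str) -> list[tuple[int, str]]:
    """Return (line number, function name) for each eval/exec/compile call
    whose first argument is not a constant (for example, a variable)."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in SINKS
            and node.args
            and not isinstance(node.args[0], ast.Constant)
        ):
            findings.append((node.lineno, node.func.id))
    return sorted(findings)


sample = '''
user = input("expr: ")
print(eval(user))
print(eval("1 + 1"))
'''
print(find_code_sinks(sample))  # [(3, 'eval')]
```

The constant-argument call on the last line is not reported, but the call on a variable is. A real tool would go further and check whether that variable can be traced to input.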
Potential Mitigations
Phase: Architecture and Design
Refactor your program so that you do not have to dynamically generate code.
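One common refactoring replaces a generated expression such as eval(f"{left} {op} {right}") with a lookup table of operator functions. The Python sketch below illustrates it under illustrative assumptions: the calculate() function and its four-operator table are hypothetical. The point is that the input only selects among code paths the developer already wrote, and is never itself parsed as code.

```python
import operator

# Each permitted operator symbol maps directly to a function, so no source
# code is ever assembled from the input. The operator set is illustrative.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def calculate(left: str, op: str, right: str) -> float:
    # Vulnerable alternative (do not use): eval(f"{left} {op} {right}")
    if op not in OPERATORS:
        raise ValueError(f"unsupported operator: {op!r}")
    # float() accepts only numeric text, so operands cannot smuggle in code.
    return OPERATORS[op](float(left), float(right))


print(calculate("6", "*", "7"))  # 42.0
```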
Phase: Architecture and Design
Run your code in a "jail" or similar sandbox environment that enforces strict boundaries between the process and the operating system. This may effectively restrict which code can be executed by your product. Examples include the Unix chroot jail and AppArmor. In general, managed code may provide some protection. This may not be a feasible solution, and it only limits the impact to the operating system; the rest of your application may still be subject to compromise. Be careful to avoid CWE-243 (Creation of chroot Jail Without Changing Working Directory) and other weaknesses related to jails.
Phase: Implementation

Strategy: Input Validation

Assume all input is malicious. Use an "accept known good" input validation strategy, i.e., use a list of acceptable inputs that strictly conform to specifications. Reject any input that does not strictly conform to specifications, or transform it into something that does.

When performing input validation, consider all potentially relevant properties, including length, type of input, the full range of acceptable values, missing or extra inputs, syntax, consistency across related fields, and conformance to business rules. As an example of business rule logic, "boat" may be syntactically valid because it only contains alphanumeric characters, but it is not valid if the input is only expected to contain colors such as "red" or "blue."

Do not rely exclusively on looking for malicious or malformed inputs (a denylist). This is likely to miss at least one undesirable input, especially if the code's environment changes, and can give attackers enough room to bypass the intended validation. However, denylists can be useful for detecting potential attacks or determining which inputs are so malformed that they should be rejected outright.

To reduce the likelihood of code injection, use stringent allowlists that limit which constructs are allowed. If you are dynamically constructing code that invokes a function, then verifying that the input is alphanumeric might be insufficient: an attacker might still be able to reference a dangerous function that you did not intend to allow, such as system(), exec(), or exit().
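The difference between a syntactic check and an allowlist can be sketched in Python as follows. The allowlists, and the names is_alphanumeric(), validate_color(), and validate_action(), are hypothetical. In practice, the accepted values come from the product's specification. Note that "system" passes the alphanumeric check. Only the allowlist actually constrains which names could end up in generated code.

```python
import re

# Illustrative allowlists; real values come from the product's specification.
ALLOWED_COLORS = {"red", "blue"}
ALLOWED_ACTIONS = {"add", "set", "delete"}


def is_alphanumeric(value: str) -> bool:
    # Syntax-only check: "boat" and "system" both pass.
    return re.fullmatch(r"[A-Za-z0-9]+", value) is not None


def validate_color(value: str) -> str:
    # Business-rule check: only the expected colors are accepted.
    if value not in ALLOWED_COLORS:
        raise ValueError(f"not an allowed color: {value!r}")
    return value


def validate_action(value: str) -> str:
    # Names such as "system" or "exit" are rejected because they are not on
    # the allowlist, even though they are purely alphanumeric.
    if value not in ALLOWED_ACTIONS:
        raise ValueError(f"not an allowed action: {value!r}")
    return value


print(is_alphanumeric("boat"), is_alphanumeric("system"))  # True True
print(validate_color("red"))  # red
```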
Phase: Testing
Use automated static analysis tools that target this type of weakness. Many modern techniques use data flow analysis to minimize the number of false positives. This is not a perfect solution, since 100% accuracy and coverage are not feasible.
Phase: Testing
Use dynamic tools and techniques that interact with the product using large test suites with many diverse inputs, such as fuzz testing (fuzzing), robustness testing, and fault injection. The product's operation may slow down, but it should not become unstable, crash, or generate incorrect results.
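Below is a very small fuzz harness, sketched under the assumption that the code under test should reject malformed input only with ValueError. The names parse_numbers() and fuzz() are hypothetical. A purely random generator like this one is far weaker than coverage-guided fuzzers. It only illustrates the idea of feeding many diverse inputs and flagging any failure other than the documented rejection.

```python
import random
import string


def parse_numbers(text: str) -> list[int]:
    # Hypothetical function under test: parses space-separated integers and
    # is expected to fail only with ValueError on malformed input.
    return [int(token) for token in text.split()]


def fuzz(target, iterations: int = 1000, seed: int = 0) -> list[str]:
    """Feed random printable strings to target and return every input that
    raised something other than the expected ValueError."""
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    unexpected = []
    for _ in range(iterations):
        length = rng.randint(0, 20)
        candidate = "".join(rng.choice(string.printable) for _ in range(length))
        try:
            target(candidate)
        except ValueError:
            pass  # documented rejection of malformed input
        except Exception:
            unexpected.append(candidate)
    return unexpected


print(fuzz(parse_numbers))  # an empty list means no unexpected failures
```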
Phase: Operation

Strategy: Compilation or Build Hardening

Run the code in an environment that performs automatic taint propagation and prevents any command execution that uses tainted variables, such as Perl's "-T" switch. This will force the program to perform validation steps that remove the taint, although you must be careful to correctly validate your inputs so that you do not accidentally mark dangerous inputs as untainted (see Permissive List of Allowed Inputs and Incomplete List of Disallowed Inputs).
Phase: Operation

Strategy: Environment Hardening

Run the code in an environment that performs automatic taint propagation and prevents any command execution that uses tainted variables, such as Perl's "-T" switch. This will force the program to perform validation steps that remove the taint, although you must be careful to correctly validate your inputs so that you do not accidentally mark dangerous inputs as untainted (see Permissive List of Allowed Inputs and Incomplete List of Disallowed Inputs).
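Python has no built-in equivalent of Perl's "-T" taint mode. The sketch below is a hypothetical, partial emulation. The Tainted class, from_request(), untaint_with_allowlist(), and guarded_eval() are all invented for illustration. Only string concatenation propagates the taint here. Slicing, formatting, and other string methods silently return untainted values, which is one reason real taint checking is implemented inside the language runtime.

```python
class Tainted(str):
    """Hypothetical marker for strings that came from an untrusted source."""

    def __add__(self, other):
        # Concatenation with tainted data produces tainted data.
        if not isinstance(other, str):
            return NotImplemented
        return Tainted(str.__add__(self, other))

    def __radd__(self, other):
        # Same rule when the tainted string is on the right-hand side.
        if not isinstance(other, str):
            return NotImplemented
        return Tainted(str.__add__(other, self))


def from_request(value: str) -> Tainted:
    # Everything arriving from outside the program starts out tainted.
    return Tainted(value)


def untaint_with_allowlist(value: str, allowed: set) -> str:
    # Passing an allowlist check is the only way to drop the taint;
    # str() of a str subclass returns a plain str.
    if value not in allowed:
        raise ValueError(f"rejected: {value!r}")
    return str(value)


def guarded_eval(expression: str):
    # Sink guard: refuse to evaluate anything still marked as tainted.
    if isinstance(expression, Tainted):
        raise PermissionError("tainted data reached eval()")
    return eval(expression)


action = from_request("1")
expression = action + " + 1"  # concatenation keeps the taint
try:
    guarded_eval(expression)
except PermissionError as exc:
    print("blocked:", exc)  # blocked: tainted data reached eval()

clean = untaint_with_allowlist(action, {"1", "2", "3"})
print(guarded_eval(clean + " + 1"))  # 2
```

As in the mitigation above, the untainting step is only as safe as the validation behind it. A permissive allowlist would mark dangerous input as clean.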
Phase: Implementation
For Python programs, developers are frequently encouraged to use the ast.literal_eval() function instead of eval(), since it is intentionally designed to avoid executing code. However, an adversary could still cause excessive memory or stack consumption via deeply nested structures [REF-1372], so the Python documentation discourages use of ast.literal_eval() on untrusted data [REF-1373].

Effectiveness: Discouraged Common Practice
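A short Python session shows the difference between the two functions. The payload is an illustrative assumption: it looks like a list literal but contains a function call. eval() runs the call, while ast.literal_eval() raises ValueError. This does not address the memory and stack-consumption concern described above, so it is not an argument for using ast.literal_eval() on untrusted data.

```python
import ast

# An input that looks like a list but embeds a function call.
payload = "[1, 2, len('abc')]"

# eval() executes the embedded call, so code hidden in the input runs.
print(eval(payload))  # [1, 2, 3]

# ast.literal_eval() accepts only literal syntax and rejects the call.
try:
    ast.literal_eval(payload)
except ValueError as exc:
    print("rejected by literal_eval:", exc)

# Plain literals are still parsed normally.
print(ast.literal_eval("[1, 2, 3]"))  # [1, 2, 3]
```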

Demonstrative Examples

ID: DX-32

This example attempts to write user messages to a message file and allow users to view them.

Code Example (Bad, PHP):

While the programmer intends for the MessageFile to only include data, an attacker can provide a message such as:

Code Example (Attack):

which will decode to the following:

Code Example (Attack):

The programmer thought they were just including the contents of a regular data file, but PHP parsed it and executed the code. Now, this code is executed any time people view messages.
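The underlying mistake is that the stored messages are later handed to the PHP interpreter. A hedged Python sketch of the same feature keeps the messages as data instead. Each message is stored as a JSON record and HTML-escaped when the page is built, so nothing in a message is ever parsed as code. MESSAGE_FILE, save_message(), and render_messages() are hypothetical names, not part of the original example.

```python
import html
import json
from pathlib import Path

# Hypothetical storage location for the message board.
MESSAGE_FILE = Path("messages.jsonl")


def save_message(name: str, message: str, path: Path = MESSAGE_FILE) -> None:
    # Append one JSON record per line; the file is only ever read back as data.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps({"name": name, "message": message}) + "\n")


def render_messages(path: Path = MESSAGE_FILE) -> str:
    # Parse each record and HTML-escape every field while building the page,
    # so markup or code inside a message is displayed, not interpreted.
    if not path.exists():
        return ""
    rendered = []
    for line in path.read_text(encoding="utf-8").splitlines():
        record = json.loads(line)
        name = html.escape(record["name"])
        message = html.escape(record["message"])
        rendered.append(f"<p><b>{name}</b>: {message}</p>")
    return "\n".join(rendered)
```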

ID: DX-31

edit-config.pl: This CGI script is used to modify settings in a configuration file.

Code Example (Bad, Perl):

# code to add a field/key to a file goes here

# code to delete key from a particular file goes here

The script intends to take the 'action' parameter and invoke one of a variety of functions based on the value of that parameter: config_file_add_key(), config_file_set_key(), or config_file_delete_key(). It could set up a conditional to invoke each function separately, but eval() is a powerful way of doing the same thing in fewer lines of code, especially when a large number of functions or variables are involved. Unfortunately, in this case, the attacker can provide other values in the action parameter, such as:

Code Example (Attack):

This would produce the following string in handleConfigAction():

Code Example (Result):

Any arbitrary Perl code could be added after the attacker has "closed off" the construction of the original function call, in order to prevent parsing errors from causing the malicious eval() to fail before the attacker's payload is activated. This particular manipulation would fail after the system() call, because the "_key(\$fname, \$key, \$val)" portion of the string would cause an error, but this is irrelevant to the attack because the payload has already been activated.
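A safer design replaces the generated function call with an explicit mapping from each permitted action value to its handler. The Python sketch below assumes the action values "add", "set", and "delete", and uses a dict in place of the configuration file. The handler bodies are hypothetical stand-ins for config_file_add_key(), config_file_set_key(), and config_file_delete_key(). With this design, an injected action value simply fails the lookup instead of being evaluated.

```python
# Hypothetical stand-ins for the three config_file_*_key() functions;
# a dict plays the role of the configuration file.
def config_file_add_key(config: dict, key: str, val: str) -> None:
    config.setdefault(key, val)  # add only if the key is not already present


def config_file_set_key(config: dict, key: str, val: str) -> None:
    config[key] = val


def config_file_delete_key(config: dict, key: str, val: str) -> None:
    config.pop(key, None)


# Explicit mapping from the 'action' parameter to a handler. No code string
# is built, so unexpected action values cannot alter what gets executed.
HANDLERS = {
    "add": config_file_add_key,
    "set": config_file_set_key,
    "delete": config_file_delete_key,
}


def handle_config_action(config: dict, action: str, key: str, val: str) -> None:
    handler = HANDLERS.get(action)
    if handler is None:
        raise ValueError(f"unknown action: {action!r}")
    handler(config, key, val)


settings = {}
handle_config_action(settings, "set", "color", "blue")
print(settings)  # {'color': 'blue'}
```

Adding a new action still takes only one line in HANDLERS, which keeps most of the brevity that motivated the eval() approach.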

ID: DX-156

This simple script asks a user to supply a list of numbers as input and adds them together.

Code Example (Bad, Python):

The eval() function can take the user-supplied list and convert it into a Python list object, therefore allowing the programmer to use list comprehension methods to work with the data. However, if code is supplied to the eval() function, it will execute that code. For example, a malicious user could supply the following string:

Code Example (Attack):

This would delete all the files in the current directory. For this reason, it is not recommended to use eval() with untrusted input.
A way to accomplish this without the use of eval() is to apply an integer conversion on the input within a try/except block. If the user-supplied input is not numeric, this will raise a ValueError. By avoiding eval(), there is no opportunity for the input string to be executed as code.
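A sketch of that approach follows. The names sum_numbers() and describe_sum() are illustrative. The input is passed as an argument rather than read with input(), so the logic is easy to test.

```python
def sum_numbers(text: str) -> int:
    """Sum a space-separated list of integers without eval().

    int() parses an integer literal and nothing else, so input such as a
    function call raises ValueError instead of being executed.
    """
    total = 0
    for token in text.split():
        total += int(token)
    return total


def describe_sum(raw: str) -> str:
    # The try/except turns non-numeric input into an error message.
    try:
        return f"Sum = {sum_numbers(raw)}"
    except ValueError:
        return "Error: input must contain only whole numbers."


print(describe_sum("1 2 3"))  # Sum = 6
print(describe_sum("__import__('os').getcwd()"))  # Error: input must contain only whole numbers.
```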

Code Example (Good, Python):

An alternative, commonly cited mitigation for this kind of weakness is to use the ast.literal_eval() function, since it is intentionally designed to avoid executing code. However, an adversary could still cause excessive memory or stack consumption via deeply nested structures [REF-1372], so the Python documentation discourages use of ast.literal_eval() on untrusted data [REF-1373].
Observed Examples
CVE-2023-29374: Math component in an LLM framework translates user input into a Python expression that is input into the Python exec() method, allowing code execution (one variant of a "prompt injection" attack).
CVE-2024-5565: Python-based library uses an LLM prompt containing user input to dynamically generate code that is then fed as input into the Python exec() method, allowing code execution (one variant of a "prompt injection" attack).
CVE-2024-4181: Framework for LLM applications allows eval injection via a crafted response from a hosting provider.
CVE-2022-2054: Python compiler uses eval() to execute malicious strings as Python code.
CVE-2021-22204: Chain: regex in EXIF processor code does not correctly determine where a string ends (Permissive Regular Expression), enabling eval injection (Improper Neutralization of Directives in Dynamically Evaluated Code ('Eval Injection')), as exploited in the wild per CISA KEV.
CVE-2020-8218: "Code injection" in VPN product, as exploited in the wild per CISA KEV.
CVE-2008-5071: Eval injection in PHP program.
CVE-2002-1750: Eval injection in Perl program.
CVE-2008-5305: Eval injection in Perl program using an ID that should only contain hyphens and numbers.
CVE-2002-1752: Direct code injection into Perl eval function.
CVE-2002-1753: Eval injection in Perl program.
CVE-2005-1527: Direct code injection into Perl eval function.
CVE-2005-2837: Direct code injection into Perl eval function.
CVE-2005-1921: MFV. Code injection into PHP eval statement using nested constructs that should not be nested.
CVE-2005-2498: MFV. Code injection into PHP eval statement using nested constructs that should not be nested.
CVE-2005-3302: Code injection into Python eval statement from a field in a formatted file.
CVE-2007-1253: Eval injection in Python program.
CVE-2001-1471: Chain: resultant eval injection. An invalid value prevents initialization of variables, which can be modified by attacker and later injected into PHP eval statement.
CVE-2002-0495: Perl code directly injected into CGI library file from parameters to another CGI program.
CVE-2005-1876: Direct PHP code injection into supporting template file.
CVE-2005-1894: Direct code injection into PHP script that can be accessed by attacker.
CVE-2003-0395: PHP code from User-Agent HTTP header directly inserted into log file implemented as PHP script.
References
[REF-44] Michael Howard, David LeBlanc, and John Viega. "24 Deadly Sins of Software Security". McGraw-Hill. 2010.
[REF-1372] "How ast.literal_eval can cause memory exhaustion". Reddit. 14-12-2022.
[REF-1373] "ast - Abstract Syntax Trees". Python. 02-11-2023.
Likelihood of Exploit

Medium

Applicable Platforms
Languages:
Interpreted: Sometimes
Technologies:
AI/ML: Undetermined
Modes of Introduction
Implementation
Alternate Terms

Code Injection

Taxonomy Mapping
  • PLOVER
  • ISA/IEC 62443
  • ISA/IEC 62443
  • ISA/IEC 62443
  • ISA/IEC 62443
Notes
Theoretical

Injection problems encompass a wide variety of issues, all mitigated in very different ways. For this reason, the most effective way to discuss these weaknesses is to note the distinct features that classify them as injection weaknesses. The most important issue to note is that all injection problems share one thing in common: they allow for the injection of control plane data into the user-controlled data plane. This means that the execution of the process may be altered by sending code in through legitimate data channels, using no other mechanism. While buffer overflows and many other flaws require some further issue to gain execution, injection problems only require the data to be parsed. The most classic instantiations of this category of weakness are SQL injection and format string vulnerabilities.
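SQL injection, one of the instantiations named above, makes the control-plane/data-plane distinction concrete. This Python sketch uses the standard sqlite3 module, and the table and input value are illustrative assumptions. The first query splices the input into the SQL text, so its quote characters change the query's syntax. The second passes the same input through a "?" placeholder, so it stays data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

user_input = "nobody' OR '1'='1"

# Vulnerable: the input becomes part of the SQL text (control plane), so the
# injected OR clause matches every row.
unsafe = conn.execute(
    f"SELECT name FROM users WHERE name = '{user_input}'"
).fetchall()

# Safer: the placeholder keeps the input in the data plane; the driver binds
# it as a value and it is never parsed as SQL.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

print(len(unsafe), len(safe))  # 2 0
```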