Improper Input Validation

Stable Class

Structure: Simple

Description

The product receives input or data, but it does not validate or incorrectly validates that the input has the properties that are required to process the data safely and correctly.

Extended Description

Input validation is a frequently used technique for checking potentially dangerous inputs in order to ensure that the inputs are safe for processing within the code, or when communicating with other components.

Input can consist of:

- raw data - strings, numbers, parameters, file contents, etc.
- metadata - information about the raw data, such as headers or size

Data can be simple or structured. Structured data can be composed of many nested layers, composed of combinations of metadata and raw data, with other simple or structured data.

Many properties of raw data or metadata may need to be validated upon entry into the code, such as:

- specified quantities such as size, length, frequency, price, rate, number of operations, time, etc.
- implied or derived quantities, such as the actual size of a file instead of a specified size
- indexes, offsets, or positions into more complex data structures
- symbolic keys or other elements into hash tables, associative arrays, etc.
- well-formedness, i.e. syntactic correctness - compliance with expected syntax
- lexical token correctness - compliance with rules for what is treated as a token
- specified or derived type - the actual type of the input (or what the input appears to be)
- consistency - between individual data elements, between raw data and metadata, between references, etc.
- conformance to domain-specific rules, e.g. business logic
- equivalence - ensuring that equivalent inputs are treated the same
- authenticity, ownership, or other attestations about the input, e.g. a cryptographic signature to prove the source of the data

Implied or derived properties of data must often be calculated or inferred by the code itself. Errors in deriving properties may be considered a contributing factor to improper input validation.

Common Consequences 3

Scope: Availability

Impact: DoS: Crash, Exit, or Restart; DoS: Resource Consumption (CPU); DoS: Resource Consumption (Memory)

An attacker could provide unexpected values and cause a program crash or arbitrary control of resource allocation, leading to excessive consumption of resources such as memory and CPU.

Scope: Confidentiality

Impact: Read Memory; Read Files or Directories

An attacker could read confidential data if they are able to control resource references.

Scope: Integrity, Confidentiality, Availability

Impact: Modify Memory; Execute Unauthorized Code or Commands

An attacker could use malicious input to modify data or possibly alter control flow in unexpected ways, including arbitrary command execution.

Detection Methods 10

Automated Static Analysis

Some instances of improper input validation can be detected using automated static analysis.

A static analysis tool might allow the user to specify which application-specific methods or functions perform input validation; the tool might also have built-in knowledge of validation frameworks such as Struts. The tool may then suppress or de-prioritize any associated warnings. This allows the analyst to focus on areas of the software in which input validation does not appear to be present.

Except in the cases described in the previous paragraph, automated static analysis might not be able to recognize when proper input validation is being performed, leading to false positives - i.e., warnings that do not have any security consequences or require any code changes.

Manual Static Analysis

When custom input validation is required, such as when enforcing business rules, manual analysis is necessary to ensure that the validation is properly implemented.

Fuzzing

Fuzzing techniques can be useful for detecting input validation errors. When unexpected inputs are provided to the software, the software should not crash or otherwise become unstable, and it should generate application-controlled error messages. If exceptions or interpreter-generated error messages occur, this indicates that the input was not detected and handled within the application logic itself.

Automated Static Analysis - Binary or Bytecode

Effectiveness: SOAR Partial

According to SOAR [REF-1479], the following detection techniques may be useful:

Cost effective for partial coverage:

- Bytecode Weakness Analysis - including disassembler + source code weakness analysis
- Binary Weakness Analysis - including disassembler + source code weakness analysis

Manual Static Analysis - Binary or Bytecode

Effectiveness: SOAR Partial

According to SOAR [REF-1479], the following detection techniques may be useful:

Cost effective for partial coverage:

- Binary / Bytecode disassembler - then use manual analysis for vulnerabilities & anomalies

Dynamic Analysis with Automated Results Interpretation

Effectiveness: High

According to SOAR [REF-1479], the following detection techniques may be useful:

Highly cost effective:

- Web Application Scanner
- Web Services Scanner
- Database Scanners

Dynamic Analysis with Manual Results Interpretation

Effectiveness: High

According to SOAR [REF-1479], the following detection techniques may be useful:

Highly cost effective:

- Fuzz Tester
- Framework-based Fuzzer

Cost effective for partial coverage:

- Host Application Interface Scanner
- Monitored Virtual Environment - run potentially malicious code in sandbox / wrapper / virtual machine, see if it does anything suspicious

Manual Static Analysis - Source Code

Effectiveness: High

According to SOAR [REF-1479], the following detection techniques may be useful:

Highly cost effective:

- Focused Manual Spotcheck - Focused manual analysis of source
- Manual Source Code Review (not inspections)

Automated Static Analysis - Source Code

Effectiveness: High

According to SOAR [REF-1479], the following detection techniques may be useful:

Highly cost effective:

- Source code Weakness Analyzer
- Context-configured Source Code Weakness Analyzer

Architecture or Design Review

Effectiveness: High

According to SOAR [REF-1479], the following detection techniques may be useful:

Highly cost effective:

- Inspection (IEEE 1028 standard) (can apply to requirements, design, source code, etc.)
- Formal Methods / Correct-By-Construction

Cost effective for partial coverage:

- Attack Modeling

Potential Mitigations 10

Phase: Architecture and Design

Strategy: Attack Surface Reduction

Consider using language-theoretic security (LangSec) techniques that characterize inputs using a formal language and build "recognizers" for that language. This requires parsing to be a distinct layer that enforces a boundary between raw input and internal data representations, instead of allowing parser code to be scattered throughout the program, where it could be subject to errors or inconsistencies that create weaknesses. [REF-1109] [REF-1110] [REF-1111]

Phase: Architecture and Design

Strategy: Libraries or Frameworks

Use an input validation framework such as Struts or the OWASP ESAPI Validation API. Note that using a framework does not automatically address all input validation problems; be mindful of weaknesses that could arise from misusing the framework itself (Improper Use of Validation Framework).

Phase: Architecture and Design; Implementation

Strategy: Attack Surface Reduction

Understand all the potential areas where untrusted inputs can enter the product, including but not limited to: parameters or arguments, cookies, anything read from the network, environment variables, reverse DNS lookups, query results, request headers, URL components, e-mail, files, filenames, databases, and any external systems that provide data to the application. Remember that such inputs may be obtained indirectly through API calls.

Phase: Implementation

Strategy: Input Validation

Assume all input is malicious. Use an "accept known good" input validation strategy, i.e., use a list of acceptable inputs that strictly conform to specifications. Reject any input that does not strictly conform to specifications, or transform it into something that does.

When performing input validation, consider all potentially relevant properties, including length, type of input, the full range of acceptable values, missing or extra inputs, syntax, consistency across related fields, and conformance to business rules. As an example of business rule logic, "boat" may be syntactically valid because it only contains alphanumeric characters, but it is not valid if the input is only expected to contain colors such as "red" or "blue."

Do not rely exclusively on looking for malicious or malformed inputs. This is likely to miss at least one undesirable input, especially if the code's environment changes. This can give attackers enough room to bypass the intended validation. However, denylists can be useful for detecting potential attacks or determining which inputs are so malformed that they should be rejected outright.

Effectiveness: High
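The "accept known good" strategy for the color example can be sketched as follows. This is a minimal illustration, not a definitive implementation: the class `ColorValidator`, the set `ALLOWED_COLORS`, and the method `isValidColor` are hypothetical names, and the three permitted values are an assumption standing in for whatever the real specification allows.

```java
import java.util.Set;

// Hypothetical allowlist validator for a field that may only hold a color name.
final class ColorValidator {
    // Assumption: the specification permits exactly these values.
    private static final Set<String> ALLOWED_COLORS = Set.of("red", "green", "blue");

    private ColorValidator() {}

    // Returns true only when the input is exactly one of the known-good values.
    // The input is checked as-is; it is not trimmed, lowercased, or repaired.
    static boolean isValidColor(String input) {
        if (input == null) {
            return false; // a missing input is rejected rather than defaulted
        }
        return ALLOWED_COLORS.contains(input);
    }
}
```

With this check, "boat" is rejected even though it is syntactically well-formed, because it is not in the list of acceptable values.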

Phase: Architecture and Design

For any security checks that are performed on the client side, ensure that these checks are duplicated on the server side, in order to avoid Client-Side Enforcement of Server-Side Security. Attackers can bypass the client-side checks by modifying values after the checks have been performed, or by changing the client to remove the client-side checks entirely. Then, these modified values would be submitted to the server. Even though client-side checks provide minimal benefits with respect to server-side security, they are still useful. First, they can support intrusion detection. If the server receives input that should have been rejected by the client, then it may be an indication of an attack. Second, client-side error-checking can provide helpful feedback to the user about the expectations for valid input. Third, there may be a reduction in server-side processing time for accidental input errors, although this is typically a small savings.

Phase: Implementation

When your application combines data from multiple sources, perform the validation after the sources have been combined. The individual data elements may pass the validation step but violate the intended restrictions after they have been combined.
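As one illustration of validating after combination, the sketch below joins a fixed base directory with a user-supplied relative path and only then checks the result. The base directory `/srv/data`, the class `CombinedPathCheck`, and the method `staysInsideBase` are assumptions made for the example.

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical check that runs on the combined value, not on each part alone.
final class CombinedPathCheck {
    // Assumption: all user-accessible files must stay below this directory.
    private static final Path BASE = Paths.get("/srv/data").normalize();

    private CombinedPathCheck() {}

    // Combines the base directory with the user-supplied part, then validates
    // the combined, normalized path. The check is purely syntactic: it does
    // not touch the file system and does not resolve symbolic links.
    static boolean staysInsideBase(String userPart) {
        if (userPart == null || userPart.isEmpty()) {
            return false;
        }
        try {
            Path combined = BASE.resolve(userPart).normalize();
            return combined.startsWith(BASE);
        } catch (InvalidPathException e) {
            return false; // e.g. an embedded NUL character
        }
    }
}
```

A part such as "reports/../../etc/passwd" contains only ordinary path segments, yet the combined path leaves the base directory, which is why the check is applied to the combined value.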

Phase: Implementation

Be especially careful to validate all input when invoking code that crosses language boundaries, such as from an interpreted language to native code. This could create an unexpected interaction between the language boundaries. Ensure that you are not violating any of the expectations of the language with which you are interfacing. For example, even though Java may not be susceptible to buffer overflows, providing a large argument in a call to native code might trigger an overflow.

Phase: Implementation

Directly convert your input type into the expected data type, such as using a conversion function that translates a string into a number. After converting to the expected data type, ensure that the input's values fall within the expected range of allowable values and that multi-field consistencies are maintained.
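A minimal sketch of this convert-then-check order, assuming a numeric field with an inclusive range. The class `RangeParser` and the method `parseInRange` are hypothetical names introduced for illustration.

```java
import java.util.OptionalInt;

// Hypothetical helper: convert the raw string to an int first, then range-check it.
final class RangeParser {
    private RangeParser() {}

    // Returns the parsed value when the input is a valid int within
    // [min, max]; returns an empty OptionalInt otherwise.
    static OptionalInt parseInRange(String raw, int min, int max) {
        if (raw == null) {
            return OptionalInt.empty();
        }
        final int value;
        try {
            // Conversion step: non-numeric text and values that do not
            // fit in an int both raise NumberFormatException.
            value = Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
        // Range step: performed on the converted value, not on the string.
        if (value < min || value > max) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(value);
    }
}
```

Multi-field consistency checks (for example, that a start value does not exceed an end value) would follow the same pattern, applied after every field has been converted.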

Phase: Implementation

Inputs should be decoded and canonicalized to the application's current internal representation before being validated (Incorrect Behavior Order: Validate Before Canonicalize, Incorrect Behavior Order: Validate Before Filter). Make sure that your application does not inadvertently decode the same input twice (Double Decoding of the Same Data). Such errors could be used to bypass allowlist schemes by introducing dangerous inputs after they have been checked. Use libraries such as the OWASP ESAPI Canonicalization control. Consider performing repeated canonicalization until your input does not change any more. This will avoid double-decoding and similar scenarios, but it might inadvertently modify inputs that are allowed to contain properly-encoded dangerous content.
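The ordering can be sketched as below, using percent-decoding as the only canonicalization step. This is an assumption-laden illustration rather than a substitute for a library such as ESAPI: the class `Canonicalizer`, its methods, the round limit, and the file-name pattern are all hypothetical, and `URLDecoder.decode(String, Charset)` requires Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decode repeatedly until stable, then validate.
final class Canonicalizer {
    // Assumption: legitimate inputs are never encoded more than a few times.
    private static final int MAX_ROUNDS = 5;

    private Canonicalizer() {}

    // Percent-decodes the input until it stops changing.
    // Throws IllegalArgumentException for malformed escapes or
    // when the input is still changing after MAX_ROUNDS rounds.
    static String canonicalize(String input) {
        String current = input;
        for (int i = 0; i < MAX_ROUNDS; i++) {
            String decoded = URLDecoder.decode(current, StandardCharsets.UTF_8);
            if (decoded.equals(current)) {
                return current;
            }
            current = decoded;
        }
        throw new IllegalArgumentException("input is encoded too many times");
    }

    // Validation runs on the canonical form only. The allowlist pattern
    // accepts dot-separated name segments, so ".." and "/" cannot match.
    static boolean isSafeFileName(String input) {
        if (input == null) {
            return false;
        }
        final String canonical;
        try {
            canonical = canonicalize(input);
        } catch (IllegalArgumentException e) {
            return false;
        }
        return canonical.matches("[A-Za-z0-9_-]+(\\.[A-Za-z0-9_-]+)*");
    }
}
```

As the text notes, decoding until stable also rewrites inputs that legitimately contain encoded content, so this approach suits fields where such content is never expected.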

Phase: Implementation

When exchanging data between components, ensure that both components are using the same character encoding. Ensure that the proper encoding is applied at each interface. Explicitly set the encoding you are using whenever the protocol allows you to do so.
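A small sketch of setting the encoding explicitly on both sides of an interface, assuming the two components have agreed on UTF-8. The class `EncodingBoundary` and its members are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical boundary helper: one explicit charset, used for both directions.
final class EncodingBoundary {
    // Assumption: both components have agreed on UTF-8 for this interface.
    static final Charset WIRE_CHARSET = StandardCharsets.UTF_8;

    private EncodingBoundary() {}

    // Sender side: never rely on the platform default charset.
    static byte[] encode(String text) {
        return text.getBytes(WIRE_CHARSET);
    }

    // Receiver side: decode with the same explicitly named charset.
    static String decode(byte[] bytes) {
        return new String(bytes, WIRE_CHARSET);
    }
}
```

If the receiver decoded the same bytes with a different charset, such as ISO-8859-1, any non-ASCII character would arrive as different text, and validation performed on one side would no longer describe the data seen on the other.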

Demonstrative Examples 5

ID : DX-135

This example demonstrates a shopping interaction in which the user is free to specify the quantity of items to be purchased and a total is calculated.

Code Example: Bad (Java)

The user has no control over the price variable; however, the code does not prevent a negative value from being specified for quantity. If an attacker were to provide a negative value, then the user would have their account credited instead of debited.
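One possible way to close this gap is to range-check the quantity before the total is calculated. The sketch below is not the original example's code: the class `OrderTotal`, the unit price, and the upper limit `MAX_QUANTITY` are assumptions chosen for illustration.

```java
import java.math.BigDecimal;

// Hypothetical order calculation that validates quantity before using it.
final class OrderTotal {
    // Assumption: a fixed unit price that the user cannot influence.
    private static final BigDecimal PRICE = new BigDecimal("20.00");
    // Assumption: a business rule caps a single order at this many items.
    private static final int MAX_QUANTITY = 1000;

    private OrderTotal() {}

    // Rejects zero, negative, and excessive quantities, then computes the total.
    static BigDecimal totalFor(int quantity) {
        if (quantity < 1 || quantity > MAX_QUANTITY) {
            throw new IllegalArgumentException("quantity out of range: " + quantity);
        }
        return PRICE.multiply(BigDecimal.valueOf(quantity));
    }
}
```

Rejecting the order outright, rather than silently clamping the value, keeps the failure visible to both the user and any monitoring in place.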

ID : DX-136

This example asks the user for a height and width of an m X n game board with a maximum dimension of 100 squares.

Code Example: Bad

/* board dimensions */

While this code checks to make sure the user cannot specify large, positive integers and consume too much memory, it does not check for negative values supplied by the user. As a result, an attacker can perform a resource consumption (Uncontrolled Resource Consumption) attack against this program by specifying two large negative values that will not overflow, resulting in a very large memory allocation (Memory Allocation with Excessive Size Value) and possibly a system crash. Alternatively, an attacker can provide very large negative values which will cause an integer overflow (Integer Overflow or Wraparound) and unexpected behavior will follow depending on how the values are treated in the remainder of the program.
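The missing lower-bound check can be sketched as follows. The sketch is written in Java and is not the example's own code; the class `BoardDimensions` and its methods are hypothetical, while the maximum of 100 comes from the example's description.

```java
// Hypothetical dimension check that enforces both ends of the allowed range.
final class BoardDimensions {
    // The example states a maximum dimension of 100 squares.
    static final int MAX_DIM = 100;

    private BoardDimensions() {}

    // A dimension is valid only when it lies in [1, MAX_DIM].
    static boolean isValidDimension(int value) {
        return value >= 1 && value <= MAX_DIM;
    }

    // Returns the number of squares to allocate, after validating both inputs.
    static int squareCount(int height, int width) {
        if (!isValidDimension(height) || !isValidDimension(width)) {
            throw new IllegalArgumentException(
                "board dimensions must be between 1 and " + MAX_DIM);
        }
        // multiplyExact throws ArithmeticException on overflow instead of wrapping.
        return Math.multiplyExact(height, width);
    }
}
```

With both bounds enforced, the product can never exceed 10,000, so the overflow guard is a second line of defense rather than the primary check.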

The following example shows a PHP application in which the programmer attempts to display a user's birthday and homepage.

Code Example: Bad (PHP)

The programmer intended for $birthday to be in a date format and $homepage to be a valid URL. However, since the values are derived from an HTTP request, if an attacker can trick a victim into clicking a crafted URL with <script> tags providing the values for birthday and / or homepage, then the script will run on the client's browser when the web server echoes the content. Notice that even if the programmer were to defend the $birthday variable by restricting input to integers and dashes, it would still be possible for an attacker to provide a string of the form:

Code Example:Attack
bash

If this data were used in a SQL statement, it would treat the remainder of the statement as a comment. The comment could disable other security-related logic in the statement. In this case, encoding combined with input validation would be a more useful protection mechanism.

Furthermore, an XSS (Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')) attack or SQL injection (Improper Neutralization of Special Elements used in an SQL Command ('SQL Injection')) are just a few of the potential consequences when input validation is not used. Depending on the context of the code, CRLF Injection (Improper Neutralization of CRLF Sequences ('CRLF Injection')), Argument Injection (Improper Neutralization of Argument Delimiters in a Command ('Argument Injection')), or Command Injection (Improper Neutralization of Special Elements used in a Command ('Command Injection')) may also be possible.
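To illustrate why a character-level restriction on the birthday field is weaker than a format-level one, the following Java sketch parses the value as a strict ISO calendar date. It is an illustration under assumptions, not a rewrite of the PHP example: the class `BirthdayParser`, the method `parseBirthday`, and the choice of the `yyyy-MM-dd` format are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical parser that accepts only a complete, valid ISO local date.
final class BirthdayParser {
    private BirthdayParser() {}

    // Returns the parsed date, or an empty Optional when the whole input
    // is not a valid date. Trailing characters cause the parse to fail.
    static Optional<LocalDate> parseBirthday(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(LocalDate.parse(raw, DateTimeFormatter.ISO_LOCAL_DATE));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }
}
```

An input made up only of digits and dashes, such as a date followed by extra dashes, satisfies a character allowlist but fails this parse. Echoing the parsed `LocalDate` rather than the raw string also means the output is regenerated from a typed value, although output encoding is still needed for free-form fields such as the homepage.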

ID : DX-34

The following example takes a user-supplied value to allocate an array of objects and then operates on the array.

Code Example: Bad (Java)

This example attempts to build a list from a user-specified value, and even checks to ensure a non-negative value is supplied. If, however, a 0 value is provided, the code will build an array of size 0 and then try to store a new Widget in the first location, causing an exception to be thrown.
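A hedged sketch of a stricter check, requiring at least one element and imposing an upper bound. The class `WidgetListBuilder`, the nested placeholder `Widget`, and the limit `MAX_WIDGETS` are assumptions for illustration and are not taken from the original example.

```java
// Hypothetical builder that validates the requested size before allocating.
final class WidgetListBuilder {
    // Assumption: an application-specific cap on how many widgets may be requested.
    private static final int MAX_WIDGETS = 1000;

    // Placeholder standing in for the example's Widget type.
    static final class Widget {}

    private WidgetListBuilder() {}

    // Allocates the array only for sizes in [1, MAX_WIDGETS], so that
    // writing to index 0 afterwards is always within bounds.
    static Widget[] build(int count) {
        if (count < 1 || count > MAX_WIDGETS) {
            throw new IllegalArgumentException("count out of range: " + count);
        }
        Widget[] widgets = new Widget[count];
        widgets[0] = new Widget();
        return widgets;
    }
}
```

The upper bound is not mentioned in the example's description; it is included because an unbounded size would reintroduce a resource-consumption problem.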

ID : DX-110

This Android application has registered to handle a URL when sent an intent:

Code Example: Bad (Java)

...
IntentFilter filter = new IntentFilter("com.example.URLHandler.openURL");
MyReceiver receiver = new MyReceiver();
registerReceiver(receiver, filter);
...

The application assumes the URL will always be included in the intent. When the URL is not present, the call to getStringExtra() will return null, thus causing a null pointer exception when length() is called.
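The missing check can be sketched in plain Java, independent of the Android classes, as a helper applied to whatever `getStringExtra()` returns before the value is used. The class `UrlExtraCheck`, the method `isUsableUrl`, and the restriction to `http` and `https` are assumptions made for this illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Set;

// Hypothetical validation for a URL string that may be absent or malformed.
final class UrlExtraCheck {
    // Assumption: the handler should only ever open web URLs.
    private static final Set<String> ALLOWED_SCHEMES = Set.of("http", "https");

    private UrlExtraCheck() {}

    // Returns true only for a non-null, syntactically valid URI
    // whose scheme is on the allowlist.
    static boolean isUsableUrl(String url) {
        if (url == null || url.isEmpty()) {
            return false; // the extra was not included in the intent
        }
        try {
            String scheme = new URI(url).getScheme();
            return scheme != null && ALLOWED_SCHEMES.contains(scheme.toLowerCase());
        } catch (URISyntaxException e) {
            return false;
        }
    }
}
```

A receiver would call such a helper first and return early when it reports false, so that `length()` is never invoked on a null value.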

Observed Examples 43

CVE-2024-37032: Large language model (LLM) management tool does not validate the format of a digest value (Improper Validation of Specified Type of Input) from a private, untrusted model registry, enabling relative path traversal (Relative Path Traversal), a.k.a. Probllama

CVE-2022-45918: Chain: a learning management tool debugger uses external input to locate previous session logs (External Control of File Name or Path) and does not properly validate the given path (Improper Input Validation), allowing for filesystem path traversal using "../" sequences (Path Traversal: '../filedir')

CVE-2021-30860: Chain: improper input validation (Improper Input Validation) leads to integer overflow (Integer Overflow or Wraparound) in mobile OS, as exploited in the wild per CISA KEV.

CVE-2021-30663: Chain: improper input validation (Improper Input Validation) leads to integer overflow (Integer Overflow or Wraparound) in mobile OS, as exploited in the wild per CISA KEV.

CVE-2021-22205: Chain: backslash followed by a newline can bypass a validation step (Improper Input Validation), leading to eval injection (Improper Neutralization of Directives in Dynamically Evaluated Code ('Eval Injection')), as exploited in the wild per CISA KEV.

CVE-2021-21220: Chain: insufficient input validation (Improper Input Validation) in browser allows heap corruption (Out-of-bounds Write), as exploited in the wild per CISA KEV.

CVE-2020-9054: Chain: improper input validation (Improper Input Validation) in username parameter, leading to OS command injection (Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection')), as exploited in the wild per CISA KEV.

CVE-2020-3452: Chain: security product has improper input validation (Improper Input Validation) leading to directory traversal (Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal')), as exploited in the wild per CISA KEV.

CVE-2020-3161: Improper input validation of HTTP requests in IP phone, as exploited in the wild per CISA KEV.

CVE-2020-3580: Chain: improper input validation (Improper Input Validation) in firewall product leads to XSS (Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')), as exploited in the wild per CISA KEV.

CVE-2021-37147: Chain: caching proxy server has improper input validation (Improper Input Validation) of headers, allowing HTTP response smuggling (Inconsistent Interpretation of HTTP Requests ('HTTP Request/Response Smuggling')) using an "LF line ending"

CVE-2008-5305: Eval injection in Perl program using an ID that should only contain hyphens and numbers.

CVE-2008-2223: SQL injection through an ID that was supposed to be numeric.

CVE-2008-3477: lack of input validation in spreadsheet program leads to buffer overflows, integer overflows, array index errors, and memory corruption.

CVE-2008-3843: insufficient validation enables XSS

CVE-2008-3174: driver in security product allows code execution due to insufficient validation

CVE-2007-3409: infinite loop from DNS packet with a label that points to itself

CVE-2006-6870: infinite loop from DNS packet with a label that points to itself

CVE-2008-1303: missing parameter leads to crash

CVE-2007-5893: HTTP request with missing protocol version number leads to crash

CVE-2006-6658: request with missing parameters leads to information exposure

CVE-2008-4114: system crash with offset value that is inconsistent with packet size

CVE-2006-3790: size field that is inconsistent with packet size leads to buffer over-read

CVE-2008-2309: product uses a denylist to identify potentially dangerous content, allowing attacker to bypass a warning

CVE-2008-3494: security bypass via an extra header

CVE-2008-3571: empty packet triggers reboot

CVE-2006-5525: incomplete denylist allows SQL injection

CVE-2008-1284: NUL byte in theme name causes directory traversal impact to be worse

CVE-2008-0600: kernel does not validate an incoming pointer before dereferencing it

CVE-2008-1738: anti-virus product has insufficient input validation of hooked SSDT functions, allowing code execution

CVE-2008-1737: anti-virus product allows DoS via zero-length field

CVE-2008-3464: driver does not validate input from userland to the kernel

CVE-2008-2252: kernel does not validate parameters sent in from userland, allowing code execution

CVE-2008-2374: lack of validation of string length fields allows memory consumption or buffer over-read

CVE-2008-1440: lack of validation of length field leads to infinite loop

CVE-2008-1625: lack of validation of input to an IOCTL allows code execution

CVE-2008-3177: zero-length attachment causes crash

CVE-2007-2442: zero-length input causes free of uninitialized pointer

CVE-2008-5563: crash via a malformed frame structure

CVE-2008-5285: infinite loop from a long SMTP request

CVE-2008-3812: router crashes with a malformed packet

CVE-2008-3680: packet with invalid version number leads to NULL pointer dereference

CVE-2008-3660: crash via multiple "." characters in file extension

References 12

Seven Pernicious Kingdoms: A Taxonomy of Software Security Errors

Katrina Tsipenyuk, Brian Chess, and Gary McGraw

NIST Workshop on Software Security Assurance Tools Techniques and Metrics • NIST

07-11-2005

https://samate.nist.gov/SSATTM_Content/papers/Seven%20Pernicious%20Kingdoms%20-%20Taxonomy%20of%20Sw%20Security%20Errors%20-%20Tsipenyuk%20-%20Chess%20-%20McGraw.pdf

ID: REF-6

Input Validation with ESAPI - Very Important

Jim Manico

15-08-2008

https://manicode.blogspot.com/2008/08/input-validation-with-esapi.html (2023-04-07)

ID: REF-166

OWASP Enterprise Security API (ESAPI) Project

OWASP

https://owasp.org/www-project-enterprise-security-api/ (2025-07-24)

ID: REF-45

Hacking Exposed Web Applications, Second Edition

Joel Scambray, Mike Shema, and Caleb Sima

McGraw-Hill

05-06-2006

ID: REF-168

Input validation or output filtering, which is better?

Jeremiah Grossman

30-01-2007

https://blog.jeremiahgrossman.com/2007/01/input-validation-or-output-filtering.html (2023-04-07)

ID: REF-48

The importance of input validation

Kevin Beaver

06-09-2006

http://searchsoftwarequality.techtarget.com/tip/0,289483,sid92_gci1214373,00.html

ID: REF-170

Writing Secure Code

Michael Howard and David LeBlanc

Microsoft Press

04-12-2002

https://www.microsoftpressstore.com/store/writing-secure-code-9780735617223

ID: REF-7

LANGSEC: Language-theoretic Security

http://langsec.org/

ID: REF-1109

LangSec: Recognition, Validation, and Compositional Correctness for Real World Security

http://langsec.org/bof-handout.pdf

ID: REF-1110

Curing the Vulnerable Parser: Design Patterns for Secure Input Handling

Sergey Bratus, Lars Hermerschmidt, Sven M. Hallberg, Michael E. Locasto, Falcon D. Momot, Meredith L. Patterson, and Anna Shubina

USENIX ;login:

2017

https://www.usenix.org/system/files/login/articles/login_spring17_08_bratus.pdf

ID: REF-1111

Supplemental Details - 2022 CWE Top 25

MITRE

28-06-2022

https://cwe.mitre.org/top25/archive/2022/2022_cwe_top25_supplemental.html#problematicMappingDetails (2024-11-17)

ID: REF-1287

State-of-the-Art Resources (SOAR) for Software Vulnerability Detection, Test, and Evaluation

Gregory Larsen, E. Kenneth Hong Fong, David A. Wheeler, and Rama S. Moorthy

07-2014

https://www.ida.org/-/media/feature/publications/s/st/stateoftheart-resources-soar-for-software-vulnerability-detection-test-and-evaluation/p-5061.ashx (2025-09-05)

ID: REF-1479

Likelihood of Exploit

High

Applicable Platforms

Languages:

Not Language-Specific : Often

Modes of Introduction

Architecture and Design

Implementation

Related Attack Patterns

Related Weaknesses

ChildOf:

Improper Neutralization (CWE-707)

PeerOf:

Insufficient Verification of Data Authenticity (CWE-345)

CanPrecede:

Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal') (CWE-22)

CanPrecede:

Improper Resolution of Path Equivalence (CWE-41)

CanPrecede:

Improper Neutralization of Special Elements in Output Used by a Downstream Component ('Injection') (CWE-74)

CanPrecede:

Improper Restriction of Operations within the Bounds of a Memory Buffer (CWE-119)

CanPrecede:

Allocation of Resources Without Limits or Throttling (CWE-770)

Taxonomy Mapping

7 Pernicious Kingdoms
OWASP Top Ten 2004
CERT C Secure Coding
CERT C Secure Coding
CERT C Secure Coding
WASC
Software Fault Patterns

Notes

Relationship: Improper Encoding or Escaping of Output and Improper Input Validation have a close association because, depending on the nature of the structured message, proper input validation can indirectly prevent special characters from changing the meaning of a structured message. For example, by validating that a numeric ID field should only contain the 0-9 characters, the programmer effectively prevents injection attacks. Multiple techniques exist to transform potentially dangerous input into something safe, which is different than "validation," which is a technique to check if an input is already safe. CWE users need to be cautious during root cause analysis to ensure that an issue is truly an input-validation problem.

Maintenance: As of 2020, this entry is used more often than preferred, and it is a source of frequent confusion. It is being actively modified for CWE 4.1 and subsequent versions.

Maintenance: Concepts such as validation, data transformation, and neutralization are being refined, so relationships between Improper Input Validation and other entries such as Improper Neutralization may change in future versions, along with an update to the Vulnerability Theory document.

Maintenance: Input validation - whether missing or incorrect - is such an essential and widespread part of secure development that it is implicit in many different weaknesses. Traditionally, problems such as buffer overflows and XSS have been classified as input validation problems by many security professionals. However, input validation is not necessarily the only protection mechanism available for avoiding such problems, and in some cases it is not even sufficient. The CWE team has begun capturing these subtleties in chains within the Research Concepts view (Research Concepts), but more work is needed.

Terminology: The "input validation" term is extremely common, but it is used in many different ways. In some cases its usage can obscure the real underlying weakness or otherwise hide chaining and composite relationships. Some people use "input validation" as a general term that covers many different neutralization techniques for ensuring that input is appropriate, such as filtering, i.e., attempting to remove dangerous inputs (related to Improper Filtering of Special Elements); encoding/escaping, i.e., attempting to ensure that the input is not misinterpreted when it is included in output to another component (related to Improper Encoding or Escaping of Output); or canonicalization, which often indirectly removes otherwise-dangerous inputs. Others use the term in a narrower context to simply mean "checking if an input conforms to expectations without changing it." CWE uses this narrow interpretation.

Note that "input validation" has very different meanings to different people, or within different classification schemes. Caution must be used when referencing this CWE entry or mapping to it. For example, some weaknesses might involve inadvertently giving control to an attacker over an input when they should not be able to provide an input at all, but sometimes this is referred to as input validation.

Finally, it is important to emphasize that the distinctions between input validation and output escaping are often blurred. Developers must be careful to understand the difference, including how input validation is not always sufficient to prevent vulnerabilities, especially when less stringent data types must be supported, such as free-form text. Consider a SQL injection scenario in which a person's last name is inserted into a query. The name "O'Reilly" would likely pass the validation step since it is a common last name in the English language. However, this valid name cannot be directly inserted into the database because it contains the "'" apostrophe character, which would need to be escaped or otherwise transformed. In this case, removing the apostrophe might reduce the risk of SQL injection, but it would produce incorrect behavior because the wrong name would be recorded.