Improper Validation of Generative AI Output

Incomplete Base
Structure: Simple
Description

The product invokes a generative AI/ML component whose behaviors and outputs cannot be directly controlled, but the product does not validate or insufficiently validates the outputs to ensure that they align with the intended security, content, or privacy policy.

Common Consequences 1
Scope: Integrity

Impact: Execute Unauthorized Code or CommandsVaries by Context

In an agent-oriented setting, output could be used to cause unpredictable agent invocation, i.e., to control or influence agents that might be invoked from the output. The impact varies depending on the access that is granted to the tools, such as creating a database or writing files.

Detection Methods 3
Dynamic Analysis with Manual Results Interpretation
Use known techniques for prompt injection and other attacks, and adjust the attacks to be more specific to the model or system.
Dynamic Analysis with Automated Results Interpretation
Use known techniques for prompt injection and other attacks, and adjust the attacks to be more specific to the model or system.
Architecture or Design Review
Review of the product design can be effective, but it works best in conjunction with dynamic analysis.
Potential Mitigations 4
Phase: Architecture and Design
Since the output from a generative AI component (such as an LLM) cannot be trusted, ensure that it operates in an untrusted or non-privileged space.
Phase: Operation
Use "semantic comparators," which are mechanisms that provide semantic comparison to identify objects that might appear different but are semantically similar.
Phase: Operation
Use components that operate externally to the system to monitor the output and act as a moderator. These components are called different terms, such as supervisors or guardrails.
Phase: Build and Compilation
During model training, use an appropriate variety of good and bad examples to guide preferred outputs.
Observed Examples 1
CVE-2024-3402chain: GUI for ChatGPT API performs input validation but does not properly "sanitize" or validate model output data (Improper Validation of Generative AI Output), leading to XSS (Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')).
References 5
LLM02: Insecure Output Handling
OWASP
21-03-2024
ID: REF-1441
Validating Outputs
Cohere and Guardrails AI
13-09-2023
ID: REF-1442
NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails
Traian Rebedea, Razvan Dinu, Makesh Sreedhar, Christopher Parisien, and Jonathan Cohen
12-2023
ID: REF-1443
Insecure output handling in LLMs
Snyk
ID: REF-1444
Building Guardrails for Large Language Models
Yi Dong, Ronghui Mu, Gaojie Jin, Yi Qi, Jinwei Hu, Xingyu Zhao, Jie Meng, Wenjie Ruan, and Xiaowei Huang
29-05-2024
ID: REF-1445
Applicable Platforms
Languages:
Not Language-Specific : Undetermined
Technologies:
AI/ML : UndeterminedNot Technology-Specific : Undetermined
Modes of Introduction
Architecture and Design
Implementation
Related Weaknesses
Notes
Research GapThis entry is related to AI/ML, which is not well understood from a weakness perspective. Typically, for new/emerging technologies including AI/ML, early vulnerability discovery and research does not focus on root cause analysis (i.e., weakness identification). For AI/ML, the recent focus has been on attacks and exploitation methods, technical impacts, and mitigations. As a result, closer research or focused efforts by SMEs is necessary to understand the underlying weaknesses. Diverse and dynamic terminology and rapidly-evolving technology further complicate understanding. Finally, there might not be enough real-world examples with sufficient details from which weakness patterns may be discovered. For example, many real-world vulnerabilities related to "prompt injection" appear to be related to typical injection-style attacks in which the only difference is that the "input" to the vulnerable component comes from model output instead of direct adversary input, similar to "second-order SQL injection" attacks.
MaintenanceThis entry was created by members of the CWE AI Working Group during June and July 2024. The CWE Project Lead, CWE Technical Lead, AI WG co-chairs, and many WG members decided that for purposes of timeliness, it would be more helpful to the CWE community to publish the new entry in CWE 4.15 quickly and add to it in subsequent versions.