Insecure Setting of Generative AI/ML Model Inference Parameters

Status: Draft
Abstraction: Base
Structure: Simple
Description

The product has a component that relies on a generative AI/ML model configured with inference parameters that produce an unacceptably high rate of erroneous or unexpected outputs.

Extended Description

Generative AI/ML models, such as those used for text generation, image synthesis, and other creative tasks, rely on inference parameters that control model behavior, such as temperature, Top P, and Top K. These parameters shape the probability distributions from which the model samples its outputs at inference time. Incorrect settings can lead to unusual behavior such as text "hallucinations" or unrealistic images. Such misconfigurations can compromise the integrity of the application. If the results are used in security-critical operations or decisions, then this could violate the intended security policy, i.e., introduce a vulnerability.
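To illustrate how these parameters operate, the following Python sketch (not part of this entry's demonstrative examples; the logits, vocabulary size, and parameter values are made up) shows how temperature, Top P, and Top K reshape the probability distribution from which the next token is sampled.

Code Example:

Python

import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    # Illustrative next-token sampling; real models implement this internally.
    rng = rng or np.random.default_rng()

    # Temperature rescales the logits: low values sharpen the distribution
    # (more deterministic); high values flatten it (more surprising tokens).
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    order = np.argsort(probs)[::-1]        # token indices, most likely first
    keep = np.ones_like(probs, dtype=bool)

    if top_k is not None:                  # keep only the k most likely tokens
        keep[order[top_k:]] = False
    if top_p is not None:                  # keep the smallest set of tokens
        cumulative = np.cumsum(probs[order])   # whose total mass reaches top_p
        cutoff = int(np.searchsorted(cumulative, top_p)) + 1
        keep[order[cutoff:]] = False

    probs = np.where(keep, probs, 0.0)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# With a high temperature, low-probability tokens (e.g., a hallucinated
# package name) are sampled far more often than with a low temperature.
logits = [4.0, 2.0, 0.5, -1.0]
print(sample_next_token(logits, temperature=0.2))
print(sample_next_token(logits, temperature=1.8))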

Common Consequences
Scope: Integrity; Other

Impact: Varies by Context; Unexpected State

The product can generate inaccurate, misleading, or nonsensical information.

Scope: Other

Impact: Alter Execution Logic; Unexpected State; Varies by Context

If outputs are used in critical decision-making processes, errors could be propagated to other systems or components.

Detection Methods
Automated Dynamic Analysis (Effectiveness: Moderate)
Manipulate inference parameters and perform comparative evaluation to assess the impact of the selected values. Build a test suite that uses targeted tools to detect problems such as prompt injection (Improper Neutralization of Input Used for LLM Prompting). Consider statistically measuring the token distribution to see whether it is consistent with expected results.
Manual Dynamic Analysis (Effectiveness: Moderate)
Manipulate inference parameters and perform comparative evaluation to assess the impact of the selected values. Build a test suite that uses targeted tools to detect problems such as prompt injection (Improper Neutralization of Input Used for LLM Prompting). Consider statistically measuring the token distribution to see whether it is consistent with expected results, as sketched below.
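A minimal sketch of the token-distribution check described in these methods, in Python. The generate() callable is a hypothetical hook that invokes the model with the given inference parameters and returns the sampled tokens; the KL-divergence measure and threshold are illustrative choices, not prescribed by this entry.

Code Example:

Python

from collections import Counter
import math

def token_distribution(samples):
    # Empirical token frequencies over a list of generated token sequences.
    counts = Counter(token for seq in samples for token in seq)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def kl_divergence(observed, baseline, epsilon=1e-9):
    # KL(observed || baseline); epsilon stands in for tokens unseen in the baseline.
    return sum(p * math.log(p / baseline.get(tok, epsilon))
               for tok, p in observed.items() if p > 0)

def flag_suspicious_settings(generate, prompts, candidate_params, baseline_params,
                             threshold=0.5):
    # Comparative evaluation: generate output with baseline and candidate
    # parameters, then flag candidates whose token distribution drifts past
    # the threshold. generate(prompt, **params) is a hypothetical hook.
    baseline = token_distribution([generate(p, **baseline_params) for p in prompts])
    flagged = []
    for params in candidate_params:
        observed = token_distribution([generate(p, **params) for p in prompts])
        if kl_divergence(observed, baseline) > threshold:
            flagged.append(params)
    return flagged

In practice the comparison would be run over a large prompt suite and combined with targeted checks, such as verifying that each suggested package name resolves in the package registry.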
Potential Mitigations
Phases: Implementation; System Configuration; Operation
Develop and adhere to robust parameter tuning processes that include extensive testing and validation (see the tuning sketch after these mitigations).
Phases: Implementation; System Configuration; Operation
Implement feedback mechanisms to continuously assess and adjust model performance.
Phase: Documentation
Provide comprehensive documentation and guidelines for parameter settings to ensure consistent and accurate model behavior.
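As a sketch of the first two mitigations, the following Python outlines a tuning loop that searches for the most deterministic temperature that still passes a validation suite. The evaluate_quality() and evaluate_error_rate() callables are hypothetical hooks, and the thresholds are illustrative; they are not prescribed by this entry.

Code Example:

Python

def tune_temperature(evaluate_quality, evaluate_error_rate, temperatures,
                     min_quality=0.8, max_error_rate=0.01):
    # Walk candidate temperatures from most to least deterministic and return
    # the first one that meets the quality bar while keeping the rate of
    # erroneous outputs (e.g., hallucinated package names) acceptably low.
    # evaluate_quality(t) and evaluate_error_rate(t) are hypothetical hooks
    # that run the model against a validation suite at temperature t.
    for t in sorted(temperatures):
        quality = evaluate_quality(t)
        error_rate = evaluate_error_rate(t)
        if quality >= min_quality and error_rate <= max_error_rate:
            return t, {"quality": quality, "error_rate": error_rate}
    raise ValueError("No candidate temperature met both thresholds; "
                     "revisit the thresholds, prompts, or model choice.")

# Example (hypothetical hooks):
# best_temperature, metrics = tune_temperature(
#     run_quality_suite, run_hallucination_suite,
#     temperatures=[0.0, 0.2, 0.5, 0.8, 1.0])

Re-running such a loop as part of a continuous feedback mechanism helps catch drift when the model, prompts, or workload change.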
Demonstrative Examples
Assume the product offers an LLM-based AI coding assistant that helps users write code within an Integrated Development Environment (IDE). Assume the model has been trained on real-world code and behaves normally under its default settings. Suppose the default temperature is 1, with a range of temperature values from 0 (most deterministic) to 2. Consider the following configuration (the parameter values shown are illustrative).

Code Example:

Bad
JSON

{
  "temperature": 1.8
}
The problem is that the configuration sets the temperature hyperparameter higher than the default. This significantly increases the likelihood that the LLM will suggest a package that did not exist at training time, a behavior sometimes referred to as "package hallucination." Note that other unwanted behaviors, not just package hallucination, can also arise from a higher temperature.

An adversary could anticipate which package names are likely to be generated and publish a malicious package under one of those names; for example, it has been observed that the same LLM may hallucinate the same package regularly. Any code generated by the LLM, when run by the user, would then download and execute the malicious package. This is similar to typosquatting.

The risk could be reduced by lowering the temperature, which makes outputs less unpredictable and more likely to stay in line with the training data. If the temperature is set too low, some of the power of the model is lost, and it may be less capable of producing solutions for rarely-encountered problems that are not reflected in the training data. However, if the temperature is not set low enough, the risk of hallucinating package names may still be too high. Unfortunately, the "best" temperature cannot be determined a priori, and sufficient empirical testing is needed.

Code Example:

Good
JSON

{
  "temperature": 0.2
}
In addition to more restrictive temperature settings, consider adding guardrails that independently verify any referenced package to ensure that it exists, is not obsolete, and comes from a trusted party. Note that reducing the temperature does not entirely eliminate the risk of package hallucination: even with very low temperatures or other settings, there is still a small chance that a non-existent package name will be generated.
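One way to implement such a guardrail is sketched below in Python, under the assumption that the generated code uses Python-style imports and that a package_exists() hook can query the organization's package registry. The extraction logic, hook names, and allowlist handling are illustrative, not a complete defense.

Code Example:

Python

import re

def extract_imports(generated_code):
    # Rough extraction of top-level package names from Python-style import
    # statements in LLM-generated code (illustrative only; a real guardrail
    # would use a proper parser for each supported language).
    pattern = re.compile(r"^\s*(?:import|from)\s+([A-Za-z_]\w*)", re.MULTILINE)
    return set(pattern.findall(generated_code))

def verify_packages(generated_code, package_exists, allowlist=None):
    # Guardrail: collect problems for any referenced package that does not
    # exist or is not on a curated allowlist. package_exists(name) is a
    # hypothetical hook that queries the organization's package registry.
    problems = []
    for name in sorted(extract_imports(generated_code)):
        if allowlist is not None and name not in allowlist:
            problems.append(f"{name}: not on the approved allowlist")
        elif not package_exists(name):
            problems.append(f"{name}: not found in the package registry")
    return problems

# Example: block the suggestion before it reaches the user.
# issues = verify_packages(llm_output, package_exists=registry_lookup,
#                          allowlist=approved_packages)
# if issues:
#     reject_suggestion(issues)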
References
REF-1487: Joseph Spracklen, Raveen Wijewickrama, A H M Nazmus Sakib, Anindya Maiti, Bimal Viswanath, and Murtuza Jadliwala. "We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs". 02-03-2025.
Applicable Platforms
Languages:
Not Language-Specific : Undetermined
Technologies:
AI/ML : Undetermined
Not Technology-Specific : Undetermined
Modes of Introduction
Build and Compilation
Installation
Patching and Maintenance
Notes
Research Gap: This weakness might be under-reported as of CWE 4.18, since there are no clear observed examples in CVE. However, insecure inference parameters may be the root cause of, or important contributing factors to, various vulnerabilities, while the vulnerability reports may concentrate on the negative impact (e.g., code execution) or on the weaknesses that the insecure settings contribute to. Alternatively, dynamic techniques might not reveal the root cause if the researcher does not have access to the underlying source code and environment.