Exposure of Sensitive Information caused by Shared Microarchitectural Predictor State that Influences Transient Execution

Description

Shared microarchitectural predictor state may allow code to influence transient execution across a hardware boundary, potentially exposing data that is accessible beyond the boundary over a covert channel.

Extended Description

Many commodity processors have Instruction Set Architecture (ISA) features that protect software components from one another. These features can include memory segmentation, virtual memory, privilege rings, trusted execution environments, and virtual machines, among others. For example, virtual memory provides each process with its own address space, which prevents processes from accessing each other's private data. Many of these features can be used to form hardware-enforced security boundaries between software components.

When separate software components (for example, two processes) share microarchitectural predictor state across a hardware boundary, code in one component may be able to influence microarchitectural predictor behavior in another component. If the predictor can cause transient execution, the shared predictor state may allow an attacker to influence transient execution in a victim in a manner that could allow the attacker to infer the victim's private data by monitoring observable discrepancies (CWE-203) in a covert channel [REF-1400].

Predictor state may be shared when the processor transitions from one component to another (for example, when a process makes a system call to enter the kernel). Many commodity processors have features that prevent microarchitectural predictions that occur before a boundary from influencing predictions that occur after the boundary.

Predictor state may also be shared between hardware threads, for example, sibling hardware threads on a processor that supports simultaneous multithreading (SMT). This sharing may be benign if the hardware threads are simultaneously executing in the same software component, or it could expose a weakness if one sibling is a malicious software component and the other sibling is a victim software component. Processors that share microarchitectural predictors between hardware threads may have features that prevent microarchitectural predictions that occur on one hardware thread from influencing predictions that occur on another hardware thread. Features that restrict predictor state sharing across transitions or between hardware threads may be always-on, on by default, or may require opt-in from software.
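
The sharing described above can be illustrated with a toy model. The sketch below does not model any real processor: the class `SharedBranchPredictor`, its indexing by branch address alone, the helper `run_victim_branch`, and all addresses are assumptions made for illustration. It shows only that an entry trained by one component can steer the transiently executed target in another component when entries carry no record of who trained them.

```python
class SharedBranchPredictor:
    """Toy indirect branch predictor indexed only by branch address.

    Hypothetical model: entries do not record which software component
    trained them, so predictor state is shared across boundaries.
    """

    def __init__(self):
        self._targets = {}

    def train(self, branch_addr, target):
        # Remember the most recently observed target for this branch.
        self._targets[branch_addr] = target

    def predict(self, branch_addr):
        # Return the predicted target, or None when no entry exists.
        return self._targets.get(branch_addr)


def run_victim_branch(predictor, branch_addr, architectural_target):
    """Return (transient_target, retired_target) for one victim branch."""
    predicted = predictor.predict(branch_addr)
    # A wrong prediction is squashed eventually, but the predicted target
    # is executed transiently before the squash.
    if predicted is None or predicted == architectural_target:
        transient = None
    else:
        transient = predicted
    # Resolving the branch retrains the entry with the correct target.
    predictor.train(branch_addr, architectural_target)
    return transient, architectural_target


predictor = SharedBranchPredictor()
BRANCH = 0x401000   # hypothetical victim branch address
GADGET = 0x7FF00000  # hypothetical disclosure gadget inside the victim
LEGIT = 0x402000    # hypothetical architecturally correct target

predictor.train(BRANCH, GADGET)  # attacker trains from another component
transient, retired = run_victim_branch(predictor, BRANCH, LEGIT)
print(hex(transient), hex(retired))
```

In this model the victim retires the correct target, yet the attacker-chosen target is what runs transiently; that transient window is where a covert channel could be modulated.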

Common Consequences

Scope: Confidentiality

Impact: Read Memory

Detection Methods

Manual Analysis

This weakness can be detected in hardware by manually inspecting processor specifications. Features that exhibit this weakness may have microarchitectural predictor state that is shared between hardware threads, execution contexts (for example, user and kernel), or other components that may host mutually distrusting software (or firmware, etc.).

Effectiveness: Moderate

Automated Analysis

Software vendors can release tools that detect the presence of known weaknesses on a processor. For example, some of these tools can attempt to transiently execute a vulnerable code sequence and detect whether the code successfully leaks data in a manner consistent with the weakness under test. Alternatively, some hardware vendors provide enumeration for the presence of a weakness (or lack of a weakness). These enumeration bits can be checked and reported by system software. For example, Linux supports these checks for many commodity processors:

    $ cat /proc/cpuinfo | grep bugs | head -n 1
    bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit srbds mmio_stale_data retbleed

Effectiveness: High
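
The enumeration check can also be scripted. The following is a minimal sketch that parses the `bugs` line of `/proc/cpuinfo`-style text. The function names are hypothetical, and the choice of `spectre_v2` and `retbleed` as the predictor-related names is an assumption made for this example, not a complete mapping to this weakness. The sample text reuses the output shown above.

```python
# Assumption for illustration: these two names from the sample output are
# treated as related to shared predictor state; the set is not exhaustive.
PREDICTOR_RELATED = {"spectre_v2", "retbleed"}


def parse_cpu_bugs(cpuinfo_text):
    """Return the set of names on the first 'bugs' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "bugs":
            return set(value.split())
    return set()


def predictor_bugs(cpuinfo_text):
    """Return the sorted predictor-related bug names that are reported."""
    return sorted(parse_cpu_bugs(cpuinfo_text) & PREDICTOR_RELATED)


SAMPLE = (
    "bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds "
    "swapgs taa itlb_multihit srbds mmio_stale_data retbleed\n"
)

print(predictor_bugs(SAMPLE))
```

On a Linux system, the same functions could be applied to the contents of `/proc/cpuinfo` instead of `SAMPLE`.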

Automated Analysis

This weakness can be detected in hardware by employing static or dynamic taint analysis methods [REF-1401]. These methods can label each predictor entry (or prediction history, etc.) according to the processor context that created it. Taint analysis or information flow analysis can then be applied to detect when predictor state created in one context can influence predictions made in another context.

Effectiveness: Moderate
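
The labeling idea can be sketched over a recorded trace of predictor events. This is a simplified illustration and not the method of [REF-1401]: the `PredictorEvent` record, the trace format, and the function name are all hypothetical. Each entry is labeled with the context that last trained it, and a prediction that consumes an entry labeled with a different context is reported as a cross-context flow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictorEvent:
    kind: str      # "train" or "predict"
    context: str   # for example "user" or "kernel"
    entry: int     # index of the predictor entry that was touched


def find_cross_context_flows(events):
    """Return (entry, trainer_context, consumer_context) tuples."""
    taint = {}   # entry index -> context that last trained the entry
    flows = []
    for ev in events:
        if ev.kind == "train":
            taint[ev.entry] = ev.context
        elif ev.kind == "predict":
            label = taint.get(ev.entry)
            # Flag predictions that use state created in another context.
            if label is not None and label != ev.context:
                flows.append((ev.entry, label, ev.context))
    return flows


trace = [
    PredictorEvent("train", "user", 5),
    PredictorEvent("predict", "user", 5),     # same context: not flagged
    PredictorEvent("predict", "kernel", 5),   # user-trained entry used in kernel
    PredictorEvent("train", "kernel", 7),
    PredictorEvent("predict", "kernel", 7),   # same context: not flagged
]

print(find_cross_context_flows(trace))
```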

Potential Mitigations

Phase: Architecture and Design

The hardware designer can attempt to prevent transient execution from causing observable discrepancies in specific covert channels.

Phase: Architecture and Design

Hardware designers may choose to use microarchitectural bits to tag predictor entries. For example, each predictor entry may be tagged with a kernel-mode bit which, when set, indicates that the predictor entry was created in kernel mode. The processor can use this bit to enforce that predictions in the current mode must have been trained in the current mode. This can prevent malicious cross-mode training, such as when user-mode software attempts to create predictor entries that influence transient execution in the kernel. Predictor entry tags can also be used to associate each predictor entry with the SMT thread that created it, and thus the processor can enforce that each predictor entry can only be used by the SMT thread that created it. This can prevent an SMT thread from using predictor entries crafted by a malicious sibling SMT thread.

Effectiveness: Moderate
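
The tagging scheme in this mitigation can be sketched as a toy model. The class `TaggedBranchPredictor` and its tag layout (a kernel-mode bit plus an SMT thread identifier) are assumptions for illustration and do not describe any specific processor.

```python
class TaggedBranchPredictor:
    """Toy predictor whose entries are tagged with (kernel_mode, smt_thread)."""

    def __init__(self):
        self._entries = {}   # branch address -> (tag, target)

    def train(self, branch_addr, target, kernel_mode, smt_thread):
        # Store the target together with the mode and thread that created it.
        self._entries[branch_addr] = ((kernel_mode, smt_thread), target)

    def predict(self, branch_addr, kernel_mode, smt_thread):
        entry = self._entries.get(branch_addr)
        if entry is None:
            return None
        tag, target = entry
        # Use the entry only when it was trained in the same mode on the
        # same SMT thread; otherwise behave as if no entry existed.
        return target if tag == (kernel_mode, smt_thread) else None


predictor = TaggedBranchPredictor()
BRANCH, GADGET = 0x401000, 0x7FF00000   # hypothetical addresses

# User-mode code on SMT thread 0 attempts to train the entry.
predictor.train(BRANCH, GADGET, kernel_mode=False, smt_thread=0)

print(predictor.predict(BRANCH, kernel_mode=True, smt_thread=0))    # cross-mode
print(predictor.predict(BRANCH, kernel_mode=False, smt_thread=1))   # cross-thread
print(hex(predictor.predict(BRANCH, kernel_mode=False, smt_thread=0)))
```

In this sketch, two user-mode processes that run on the same SMT thread carry the same tag and would still share entries, so tags of this kind address cross-mode and cross-thread training only.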

Phase: Architecture and Design

Hardware designers may choose to sanitize microarchitectural predictor state (for example, branch prediction history) when the processor transitions to a different context, for example, whenever a system call is invoked. Alternatively, the hardware may expose instruction(s) that allow software to sanitize predictor state according to the user's threat model. For example, this can allow operating system software to sanitize predictor state when performing a context switch from one process to another.

Effectiveness: Moderate
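
Sanitizing predictor state on a context switch can be sketched with a toy model. The `BranchPredictor.sanitize` operation stands in for whatever mechanism the hardware exposes, and the `Scheduler` class is hypothetical; neither reflects a real operating system or processor interface.

```python
class BranchPredictor:
    """Toy predictor with a software-visible sanitize operation."""

    def __init__(self):
        self._targets = {}

    def train(self, branch_addr, target):
        self._targets[branch_addr] = target

    def predict(self, branch_addr):
        return self._targets.get(branch_addr)

    def sanitize(self):
        # Discard all predictor state, for example branch targets.
        self._targets.clear()


class Scheduler:
    """Toy scheduler that sanitizes the predictor when the process changes."""

    def __init__(self, predictor):
        self._predictor = predictor
        self._current = None

    def switch_to(self, process_id):
        # Sanitize only when execution moves to a different process.
        if self._current is not None and process_id != self._current:
            self._predictor.sanitize()
        self._current = process_id


predictor = BranchPredictor()
scheduler = Scheduler(predictor)

scheduler.switch_to("attacker")
predictor.train(0x401000, 0x7FF00000)   # hypothetical malicious training
scheduler.switch_to("victim")
print(predictor.predict(0x401000))      # the trained entry is gone
```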

Phase: Implementation

System software can mitigate this weakness by invoking predictor-state-sanitizing operations (for example, the indirect branch prediction barrier on Intel x86) when switching from one context to another, according to the hardware vendor's recommendations.

Effectiveness: Moderate

Phase: Build and Compilation

If the weakness is exposed by a single instruction (or a small set of instructions), then the compiler (or JIT, etc.) can be configured to prevent the affected instruction(s) from being generated. One prominent example of this mitigation is retpoline ([REF-1414]).

Effectiveness: Limited

Phase: Build and Compilation

Use control-flow integrity (CFI) techniques to constrain the behavior of instructions that redirect the instruction pointer, such as indirect branch instructions.

Effectiveness: Moderate

Phase: Build and Compilation

Use software techniques (including the use of serialization instructions) that are intended to reduce the number of instructions that can be executed transiently after a processor event or misprediction.

Effectiveness: Incidental

Phase: System Configuration

Some systems may allow the user to disable predictor sharing. For example, this could be a BIOS configuration, or a model-specific register (MSR) that can be configured by the operating system or virtual machine monitor.

Effectiveness: Moderate

Phase: Patching and Maintenance

The hardware vendor may provide a patch to, for example, sanitize predictor state when the processor transitions to a different context, or to prevent predictor entries from being shared across SMT threads. A patch may also introduce new ISA features that allow software to toggle a mitigation.

Effectiveness: Moderate

Phase: Documentation

If a hardware feature can allow microarchitectural predictor state to be shared between contexts, SMT threads, or other architecturally defined boundaries, the hardware designer may opt to disclose this behavior in architecture documentation. This documentation can inform users about potential consequences and effective mitigations.

Effectiveness: High

Phase: Requirements

Processor designers, system software vendors, or other agents may choose to restrict the ability of unprivileged software to access high-resolution timers that are commonly used to monitor covert channels.
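
The effect of restricting timer resolution can be illustrated with a small sketch. The latencies, the start time, and the function names below are hypothetical values chosen for the example; real cache timings and timer interfaces differ by platform.

```python
CACHE_HIT_NS = 40     # hypothetical latency of a cached access
CACHE_MISS_NS = 200   # hypothetical latency of an uncached access


def coarsen(timestamp_ns, resolution_ns):
    """Round a timestamp down to a multiple of resolution_ns."""
    if resolution_ns <= 0:
        raise ValueError("resolution_ns must be positive")
    return timestamp_ns - (timestamp_ns % resolution_ns)


def measured_latency(start_ns, latency_ns, resolution_ns):
    """Latency as seen by software that can only read the coarsened timer."""
    end_ns = start_ns + latency_ns
    return coarsen(end_ns, resolution_ns) - coarsen(start_ns, resolution_ns)


START = 1_000_000   # hypothetical start time in nanoseconds

for resolution in (1, 100_000):
    hit = measured_latency(START, CACHE_HIT_NS, resolution)
    miss = measured_latency(START, CACHE_MISS_NS, resolution)
    print(resolution, hit, miss)
```

With a 1 ns timer the two latencies are clearly distinguishable, while with a 100,000 ns timer both measurements collapse to the same value for this start time. Measurements that straddle a timer tick can still differ, so a coarse timer reduces the signal without necessarily removing it.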

Demonstrative Examples

Branch Target Injection (BTI) is a vulnerability that can allow an SMT hardware thread to maliciously train the indirect branch predictor state that is shared with its sibling hardware thread. A cross-thread BTI attack requires the attacker to find a vulnerable code sequence within the victim software. For example, the authors of [REF-1415] identified the following code sequence in the Windows library ntdll.dll:

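Code Example: Bad (x86 Assembly)

    example_code_sequence:
        adc edi, dword ptr [ebx+edx+13BE13BDh]
        adc dl, byte ptr [edi]
        ...

    indirect_branch_site:
        jmp dword ptr [rsi]   ; at this point the attacker knows edx and controls edi and ebx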

To successfully exploit this code sequence to disclose the victim's private data, the attacker must also be able to find an indirect branch site within the victim, where the attacker controls the values in edi and ebx, and the attacker knows the value in edx as shown above at the indirect branch site.

A proof-of-concept cross-thread BTI attack might proceed as follows:

1. The attacker thread and victim thread must be co-scheduled on the same physical processor core.
2. The attacker thread must train the shared branch predictor so that when the victim thread reaches indirect_branch_site, the jmp instruction will be predicted to target example_code_sequence instead of the correct architectural target. The training procedure may vary by processor, and the attacker may need to reverse-engineer the branch predictor to identify a suitable training algorithm.
3. This step assumes that the attacker can control some values in the victim program, specifically the values in edi and ebx at indirect_branch_site. When the victim reaches indirect_branch_site, the processor will (mis)predict example_code_sequence as the target and (transiently) execute the adc instructions. If the attacker chooses ebx so that ebx = m - 0x13BE13BD - edx, then the first adc will load 32 bits from address m in the victim's address space and add *m (the data loaded from m) to the attacker-controlled base address in edi. The second adc instruction accesses a location in memory whose address corresponds to *m.
4. The adversary uses a covert channel analysis technique such as Flush+Reload ([REF-1416]) to infer the value of the victim's private data *m.
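
The address arithmetic in the third step can be checked with a short worked example. This is a sketch under stated assumptions: the memory contents, the addresses, and the function names are hypothetical, 32-bit wraparound is modeled with a mask, and the carry flag consumed by adc is assumed to be 0.

```python
MASK32 = 0xFFFFFFFF
DISPLACEMENT = 0x13BE13BD   # constant displacement from the text above


def choose_ebx(m, edx):
    """Pick ebx so that (ebx + edx + DISPLACEMENT) mod 2**32 equals m."""
    return (m - DISPLACEMENT - edx) & MASK32


def first_adc_address(ebx, edx):
    """Address read by the first adc, with 32-bit wraparound."""
    return (ebx + edx + DISPLACEMENT) & MASK32


def second_adc_address(memory, ebx, edx, edi, carry=0):
    """Address touched by the second adc after the first adc updates edi."""
    loaded = memory[first_adc_address(ebx, edx)]   # the 32-bit value *m
    return (edi + loaded + carry) & MASK32


M = 0x00403000       # hypothetical address of the victim's private data
EDX = 0x00001234     # value the attacker knows at the branch site
SECRET = 0x2A        # hypothetical private data stored at M
BASE = 0x10000000    # attacker-chosen base address placed in edi

victim_memory = {M: SECRET}
ebx = choose_ebx(M, EDX)
touched = second_adc_address(victim_memory, ebx, EDX, BASE)

print(hex(first_adc_address(ebx, EDX)), hex(touched), hex(touched - BASE))
```

The address touched by the second access differs from the attacker's base by exactly the secret value, which is the quantity a cache covert channel such as Flush+Reload would then try to recover.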

BTI can also allow software in one execution context to maliciously train branch predictor entries that can be used in another context. For example, on some processors user-mode software may be able to train predictor entries that can also be used after transitioning into kernel mode, such as after invoking a system call. This vulnerability does not necessarily require SMT and may instead be performed in synchronous steps, though it does require the attacker to find an exploitable code sequence in the victim's code, for example, in the kernel.

Observed Examples

CVE-2017-5715 (Branch Target Injection, BTI, Spectre v2). Shared microarchitectural indirect branch predictor state may allow code to influence transient execution across a process, VM, or privilege boundary, potentially exposing data that is accessible beyond the boundary.

CVE-2022-0001 (Branch History Injection, BHI, Spectre-BHB). Shared branch history state may allow user-mode code to influence transient execution in the kernel, potentially exposing kernel data over a covert channel.

CVE-2021-33149 (RSB underflow, Retbleed). Shared return stack buffer state may allow code that executes before a prediction barrier to influence transient execution after the prediction barrier, potentially exposing data that is accessible beyond the barrier over a covert channel.

References

Retpoline: A Branch Target Injection Mitigation

Intel Corporation

2022-08-22

https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/retpoline-branch-target-injection-mitigation.html (2023-02-13)

ID: REF-1414

Spectre Attacks: Exploiting Speculative Execution

Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom

2019-05

https://spectreattack.com/spectre.pdf (2024-02-14)

ID: REF-1415

Flush+Reload: A High Resolution, Low Noise, L3 Cache Side-Channel Attack

Yuval Yarom and Katrina Falkner

2014

https://www.usenix.org/system/files/conference/usenixsecurity14/sec14-paper-yarom.pdf (2023-02-13)

ID: REF-1416

Control Flow Integrity

The Clang Team

https://clang.llvm.org/docs/ControlFlowIntegrity.html (2024-02-13)

ID: REF-1398