Tractable Problems in AI Security via Formal Methods

The five layers below run top to bottom in the order an adversary walks the stack. Each section that follows opens on the layer’s status quo and current attack surface, then surfaces the FM-shaped widgets that would close part of it. Scope follows § — infrastructure only; the model itself is out of frame.

Figure 1: The five layers of the ML inference and training stack, with the adversary archetypes (left) that each layer invites and the assets (right) that pass through them. Adversaries are tagged on each layer below using the same labels; see § for full descriptions.

Bibliography

// click to expand

[execution-harness]Execution Harnessfull page →

invites:external-user malicious-model co-tenant

Execution Harness

The execution harness is the outermost software layer of an ML deployment: the wrapper that actually calls the model and handles its inputs and outputs. A user’s prompt enters here, gets tokenized and batched, passes through the model, and returns as a completion. Because every interaction transits this layer, it is simultaneously the easiest place to bolt on security controls and the place where a failure is most directly exploitable from the outside.

Figure 1: A typical request flow through the execution harness. The gateway makes policy decisions on the way in and the way back; the inference server holds the hot path; tool calls divert into a sandbox. The labelled red callouts mark the externally-reachable failure modes addressed later in §.

Three components make up a typical execution harness, each with its own attack surface.

Inference Servers

An inference server (vLLM, Hugging Face TGI, Nvidia Triton) manages GPU memory, batches incoming requests for throughput, and exposes the model behind an HTTP or gRPC endpoint. The server is responsible for continuous batching, KV-cache management, and (increasingly) speculative decoding — all performance-sensitive paths written in a mix of Python and C++/CUDA.

The security-relevant surface here is large. The server deserializes model weights, often from formats like SafeTensors or GGUF, and the parsers for the latter have had their own history of memory-corruption bugs [1] (SafeTensors’ own audit [2] found no critical flaws, but the GGUF loader in llama.cpp has seen repeated integer-overflow-to-heap-overflow patterns). It accepts user-controlled input (prompts, generation parameters, LoRA adapter names) that flows deep into the batching and memory-allocation logic. A malformed request that triggers an out-of-bounds KV-cache write, for instance, could corrupt another user’s in-flight completion in a multi-tenant deployment. Formal methods work in this area has two legs: memory-safety of the batching and caching layers, and verified parsers across the several deserialization boundaries the server straddles (weights on the way in, request payloads on the hot path, and structured tool-call outputs on the way back). The second leg is the subject of §; the tooling is mature and the gap is targeting, not capability.

Inference proxies like LiteLLM are also part of the dependency supply chain (§) — a compromised package at this layer gives an attacker access to every API key and model endpoint the proxy routes.

Sandboxes

When an AI system can execute code — a “code interpreter” in a chatbot, a tool-use agent writing and running scripts — it needs a sandbox. In practice this means a containerized environment (often an OCI container or a microVM like Firecracker) with a constrained syscall profile and filesystem. The sandbox must prevent the AI-generated code from escaping to the host, exfiltrating data from other tenants, or persisting state across invocations in unintended ways.

The gap between “container” and “sandbox” is where real attacks live. Default container configurations are not security boundaries; they share a kernel with the host. A sandbox that allows ptrace, or that mounts the Docker socket, or that runs with CAP_SYS_ADMIN, is a sandbox in name only. The OCI runtime hardening problem (§) addresses this directly: writing machine-checked specifications for what a sandbox policy must guarantee and verifying that a given runtime configuration meets them.

API Gateways

The API gateway sits at the public edge. It handles authentication, rate limiting, content filtering, prompt injection detection, and usage metering. In many deployments this is a conventional reverse proxy (Envoy, Kong) with AI-specific middleware bolted on: a classifier that scans prompts for injection attempts, a filter that redacts PII from completions, a logger that records interactions for abuse review.

The security challenge is that these filters are making semantic decisions about adversarial input using heuristics or secondary ML classifiers — a brittle arrangement. A prompt injection that evades the filter is not a bug in the filter’s code but a failure of its classification boundary. Formal methods have limited traction on the classification problem itself, but they can verify the plumbing: that the gateway’s policy engine correctly composes its rules, that a request flagged by the filter cannot reach the model through an alternate code path, and that rate-limiting state cannot be poisoned by concurrent requests. The value here is ensuring that when a policy says “block this,” the infrastructure actually does.

Beneath all of this AI-specific middleware sits the protocol layer that admits the request in the first place: the TLS session terminating the HTTPS connection, the SSH daemon an operator reaches the box through, and increasingly the WireGuard overlay that decides which machines can see the fleet at all. These are the oldest and most-studied trust boundaries in the stack, and the ones whose single-bug blast radius is largest; § treats them as a joint verification target.

Bibliography

[1] ggml-org, “GHSA-96jg-mvhq-q7q7: Heap Buffer Overflow via Integer Overflow in GGUF Tensor Parsing.” [Online]. Available: https://github.com/ggml-org/llama.cpp/security/advisories/GHSA-96jg-mvhq-q7q7
[2] Trail of Bits, “EleutherAI, Hugging Face Safetensors Library Security Assessment,” technical report, May 2023. [Online]. Available: https://github.com/trailofbits/publications/blob/master/reviews/2023-03-eleutherai-huggingface-safetensors-securityreview.pdf

// related tractable problems

adversarial-robustness-fm agent-weirdmachines ai-control-proof-carrying-code audit-log-integrity capability-accumulation context-window-integrity edge-policy-verification neuralese-governance oci-runtime-hardening sampler-verification spec-elicitation verified-input-parsers verified-protocol-boundaries weight-integrity

[software-framework]Software & ML Frameworkfull page →

invites:supply-chain malicious-model rogue-insider

Software and ML Framework Layer

Between the orchestration infrastructure below and the execution harness above sits the software that actually defines and compiles a model: the frameworks researchers write against, the compilers that lower their code to GPU kernels, and the sprawling dependency graph that connects everything. This layer is where the mathematical definition of a neural network meets the reality of executable code. A compromise here is dangerous because it can alter what a model computes without changing what it appears to be — the weights, the architecture description, and the training logs all look clean while the compiled artifact does something else.

Compilers

An ML compiler takes a high-level model description and lowers it through several intermediate representations to GPU machine code. The typical path is something like PyTorch → torch.compile → Triton IR → LLVM IR → PTX → SASS, though the specifics vary: TVM uses its own Relay/TIR stack, XLA compiles through HLO and StableHLO, and JAX traces through jaxpr before hitting XLA. Each stage applies optimization passes — operator fusion, memory layout transforms, quantization, tiling — that rewrite the computation in ways the model author never sees directly.

Figure 1: A typical PyTorch lowering path, with sample injection points (red) where a poisoned pass could alter what the silicon computes without changing what the source, weights, or training logs look like.

The attack surface is a modern instance of Thompson’s “Trusting Trust” [1]. A poisoned compiler pass could inject a small additive bias into attention weights, swap an activation function under specific input conditions, or silently alter a quantization rounding mode. The result would be a model that behaves correctly on benchmarks but misbehaves on targeted inputs, with no trace of the modification in source code, model weights, or training logs. The compilation chain is long enough that auditing every stage by hand is not feasible.

This is not hypothetical. LLVM’s MLIR infrastructure — shared by TVM, XLA, and Triton — had multiple memory-management vulnerabilities in 2023 [2] where a crafted MLIR file could trigger crashes or worse. More telling is a 2025 XLA:TPU miscompilation in the approximate top-k operation that returned silently wrong results [3] for certain batch sizes, affecting production models. The compiler produced valid-looking output; it was just computing the wrong function. No error was raised. This is the failure mode that formal methods are built to prevent.

The formal methods community has begun to engage. Bhat et al. [4] formalized a core calculus for SSA-based IRs in Lean and verified peephole rewrites across three use cases: LLVM bitvector rewrites, structured control flow, and fully homomorphic encryption. Fehr et al. [5] defined semantics for five MLIR dialects, built three verification tools, and found five miscompilation bugs in upstream MLIR in the process. Both cover individual rewrites and local properties, not end-to-end compilation correctness. The gap between “verified peephole rewrites” and “verified compiler” is the same gap CompCert [6] closed for C — and closing it for even one ML compilation path (say, StableHLO to PTX for a fixed set of operations) would be a major result.

A practical intermediate target: verified compilation of attention kernels. Attention is the computational core of transformer inference, and it is also the operation most aggressively optimized by ML compilers (FlashAttention, PagedAttention, various fused variants). Proving that the compiled kernel preserves the semantics of the source-level attention specification, up to documented floating-point tolerances, would cover the highest-value target in the compilation chain.

Frameworks

PyTorch, JAX, and TensorFlow are the environments where models are defined, trained, and exported. They provide automatic differentiation, GPU dispatch, a zoo of layer implementations, and serialization formats for saving and loading model weights. Each is a million-line codebase with native extensions, JIT compilers, and deep operating system integration, but the highest-impact vulnerability class is mundane: deserialization.

PyTorch’s default serialization uses Python’s pickle protocol, which can embed arbitrary executable code in a saved file. Loading an untrusted .pt file is functionally equivalent to running eval() on attacker-controlled input. This was understood in principle for years, but [7] demonstrated that even the weights_only=True safeguard — PyTorch’s own recommended mitigation — could be bypassed, restoring full remote code execution. Hugging Face uses PickleScan to screen uploaded models, but JFrog found three zero-day bypasses in the scanner itself [8]: one corrupts magic-number detection so the scanner halts with an “invalid magic number” error while torch.load() still loads the payload; a second exploits a file-extension mismatch that skips scanning entirely; and a third uses submodule imports to downgrade the severity of dangerous globals from blocked to merely flagged. In a separate line of attack, ReversingLabs’ nullifAI technique [9] hid working reverse shells in PyTorch models by 7z-compressing the pickle payload so the scanner’s decompression failed, even as the malicious code ran first. TensorFlow has its own variant: Keras Lambda layers embed arbitrary Python in model definitions, and [10] showed that the safe_mode fix could be bypassed via a downgrade attack when loading older checkpoints.

The SafeTensors format, developed by Hugging Face, stores only tensor data and metadata with no code execution path. An independent security audit [11] found no critical flaws. This is the right direction, but the safety argument rests on an audit, not a proof. A serialization format with a machine-checked parser — in the style of EverParse [12] or similar verified parsers — would close the vulnerability class permanently rather than playing whack-a-mole with scanner bypasses.

Beyond serialization, the frameworks expose attack surface through JIT compilation and custom operators. PyTorch’s torch.utils.cpp_extension.load() compiles C++/CUDA code into /tmp and dynamically loads the resulting shared library. Any process that can influence the source or the cached .so gets native code execution. TensorFlow has accumulated roughly 435 CVEs across 44 CWE categories [13], many in parsing and memory management of its graph representation. Formal verification of core tensor operations — proving that softmax, layernorm, or matmul produce results consistent with their mathematical definitions across all compilation backends — remains open territory.

Dependency Supply Chain

Figure 2: Top: ecosystem-level statistics on the transitive-dependency surface from [14]. Bottom: three real compromises that reached production ML stacks via transitive pulls in 2022–2026.

A typical ML project does not just depend on PyTorch or TensorFlow. It depends on a graph of packages — data loaders, tokenizers, experiment trackers, serving utilities, CUDA bindings — that pull in their own transitive dependencies. Mahon et al. [14] analyzed 378,573 PyPI packages and found that the average package has 129.6 transitive dependencies spanning a dependency chain up to 23 levels deep. A single CVE in urllib3 created guaranteed exposure in 1,906 downstream packages. The ML ecosystem sits on top of all of this and adds its own layer of domain-specific packages with fast release cycles and small maintainer teams.

Supply chain attacks against this graph are not theoretical. In December 2022, an attacker uploaded a malicious torchtriton package to PyPI with the same name as PyTorch’s internal dependency [15]. Because pip resolves the public PyPI index before private indices by default, PyTorch-nightly users received the attacker’s version, which exfiltrated hostnames, environment variables, and password files. In December 2024, the Ultralytics YOLO library (60+ million PyPI downloads) was compromised through a GitHub Actions script injection: the attacker crafted PR branch names that injected code into the CI/CD pipeline, and the published package bundled a cryptominer [16]. In 2026, the legitimate litellm package — an inference proxy pulled as a transitive dependency by many ML stacks — was compromised by a direct upload to PyPI that bypassed the project’s GitHub release pipeline entirely [17]. The injected payload harvested SSH keys and cloud credentials and attempted lateral movement across Kubernetes clusters. The attacks are moving up the dependency graph, where one package reaches thousands of dependents.

The defenses that exist are largely unsigned and unverified. SLSA (Supply-chain Levels for Software Artifacts) [18] defines four levels of provenance assurance, from basic metadata (L1) through hermetic reproducible builds (L4). Most ML toolchains operate at L0 or L1 — packages are uploaded by individual maintainers from their laptops with no build provenance, no signed attestation, and no reproducibility guarantee. Sigstore [19] offers keyless artifact signing tied to OIDC identities, which would at least bind a package version to a specific CI run. But even with signed provenance, the resolver itself is a trust boundary: pip’s dependency resolution algorithm is complex, underspecified, and has produced dependency confusion vulnerabilities by design [20].

Formal methods can contribute at two levels. First, at the resolver level: proving that a dependency resolution algorithm correctly implements its specification and cannot be tricked by namespace collisions, version range manipulation, or index priority ordering. This is a bounded, well-defined verification target. Second, at the build level: verified reproducible builds where the mapping from source to artifact is proven deterministic, so that any party can independently verify that a published package corresponds exactly to its claimed source. Achieving SLSA L4 with machine-checked guarantees — rather than relying on build system configuration being correct — would make supply chain substitution attacks structurally impossible rather than merely detectable.

Bibliography

[1] K. Thompson, “Reflections on Trusting Trust,” Communications of the ACM, vol. 27, no. 8, pp. 761–763, Aug. 1984, doi: 10.1145/358198.358210.
[2] LLVM Project, “CVE-2023-29932 through CVE-2023-29939: Memory-management vulnerabilities in LLVM MLIR.” 2023.
[3] Anthropic, “A postmortem of three recent issues.” [Online]. Available: https://www.anthropic.com/engineering/a-postmortem-of-three-recent-issues
[4] S. Bhat, A. d. Keizer, C. Hughes, A. Goens, and T. Grosser, “Verifying Peephole Rewriting in SSA Compiler IRs,” in 15th International Conference on Interactive Theorem Proving (ITP 2024), 2024. doi: 10.4230/LIPIcs.ITP.2024.9.
[5] M. Fehr, Y. Fan, H. Pompougnac, J. Regehr, and T. Grosser, “First-Class Verification Dialects for MLIR,” in Proceedings of the ACM on Programming Languages (PLDI), 2025. doi: 10.1145/3729309.
[6] X. Leroy, “Formal verification of a realistic compiler,” Communications of the ACM, vol. 52, no. 7, pp. 107–115, 2009, doi: 10.1145/1538788.1538814.
[7] PyTorch Contributors, “CVE-2025-32434: Remote code execution via \texttt{torch.load} with \texttt{weights_only=True}.” 2025.
[8] JFrog Security Research, “Unveiling 3 Zero-Day Vulnerabilities in PickleScan.” [Online]. Available: https://jfrog.com/blog/unveiling-3-zero-day-vulnerabilities-in-picklescan/
[9] ReversingLabs, “Malicious ML models discovered on Hugging Face platform.” [Online]. Available: https://www.reversinglabs.com/blog/rl-identifies-malware-ml-model-hosted-on-hugging-face
[10] Google / TensorFlow, “CVE-2024-3660: Arbitrary code execution via Keras Lambda layer \texttt{safe_mode} bypass.” 2024.
[11] Trail of Bits, “EleutherAI, Hugging Face Safetensors Library Security Assessment,” technical report, May 2023. [Online]. Available: https://github.com/trailofbits/publications/blob/master/reviews/2023-03-eleutherai-huggingface-safetensors-securityreview.pdf
[12] T. Ramananandro et al., “EverParse: Verified Secure Zero-Copy Parsers for Authenticated Message Formats,” in Proceedings of the 28th USENIX Security Symposium, 2019.
[13] K. Zhang, Y. Feng, and Z. Chen, “Unraveling the Characterization and Propagation of Security Vulnerabilities in TensorFlow-based Deep Learning Software Supply Chain,” in 16th Asia-Pacific Symposium on Internetware (Internetware 2025), 2025. doi: 10.1145/3755881.3755923.
[14] J. Mahon, C. Hou, and Z. Yao, “PyPitfall: Dependency Chaos and Software Supply Chain Vulnerabilities in Python,” 2025.
[15] PyTorch Team, “Compromised PyTorch-nightly dependency chain (torchtriton).” [Online]. Available: https://pytorch.org/blog/compromised-nightly-dependency/
[16] ReversingLabs, “Compromised Ultralytics PyPI package delivers crypto coinminer.” [Online]. Available: https://www.reversinglabs.com/blog/compromised-ultralytics-pypi-package-delivers-crypto-coinminer
[17] FutureSearch, “litellm Supply Chain Attack.” [Online]. Available: https://futuresearch.ai/blog/litellm-pypi-supply-chain-attack/
[18] SLSA Contributors, “SLSA v1.0: Supply-chain Levels for Software Artifacts.” [Online]. Available: https://slsa.dev/spec/v1.0/
[19] Z. Newman, J. S. Meyers, and S. Torres-Arias, “Sigstore: Software Signing for Everybody,” in Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security (CCS), 2022. doi: 10.1145/3548606.3560596.
[20] A. Birsan, “Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies.” [Online]. Available: https://medium.com/@alex.birsan/dependency-confusion-4a5d60fec610

// related tractable problems

adversarial-robustness-fm ai-control-proof-carrying-code neuralese-governance sampler-verification spec-elicitation verified-input-parsers weight-integrity

[orchestration-cloud]Orchestration & Cloudfull page →

invites:co-tenant rogue-insider network-mitm supply-chain

Orchestration and Cloud Layer

The orchestration layer manages the compute resources that ML workloads run on: scheduling jobs across GPU clusters, routing traffic between them, and controlling who can access what. It sits below the frameworks and above the firmware — a layer of distributed systems software that most ML engineers interact with only through configuration files, but whose security properties determine whether isolation between tenants, jobs, and data pipelines actually holds. Following § we treat this layer as a stack of mechanisms (runtime confinement, distributed-protocol correctness, network-fabric isolation, IAM logic) and not as a question about scheduler optimization; placement and load-balancing decisions are downstream of these mechanisms and out of scope.

Cluster Orchestration

GPU training and inference jobs run inside containers managed by Kubernetes, Slurm, or Ray. The container is the unit of isolation: it is supposed to confine a workload to its own filesystem, network namespace, and device access. In practice, GPU workloads erode this confinement. NVIDIA’s container toolkit — the standard mechanism for exposing GPUs inside containers — had a critical TOCTOU vulnerability [1] that allowed a malicious container image to escape to the host, affecting 33% of cloud environments per Wiz’s estimate. The underlying container runtime, runc, had its own escape [2] via leaked file descriptors. These are not exotic attacks; they are the kind of bugs that container runtimes accumulate routinely, and they interact badly with the privileged device access that GPU workloads require. The OCI runtime hardening problem (§) addresses the gap between what orchestrators assume containers guarantee and what the runtimes actually enforce.

Slurm, the dominant scheduler for HPC and large training runs, has a worse security record than most ML engineers realize. It has historically run its daemons as root, and its authentication rests on MUNGE — a shared-secret credential system that was never designed for adversarial multi-tenant environments. [3] demonstrated this: attackers could bypass MUNGE’s hash-based message integrity protections in the slurmd process to replay root-level authentication tokens, gaining privileged access to compute nodes over the network. Earlier vulnerabilities were blunter — [4] let an unprivileged user send data to arbitrary Unix sockets on the host as root through Slurm’s PMI2/PMIx RPC handler, and [5] allowed privilege escalation to root via SPANK environment variable injection during Prolog/Epilog execution. Slurm clusters at national labs and cloud GPU providers routinely run versions months or years behind patches, because upgrading the scheduler means draining the entire cluster.

Ray’s security posture is, by its maintainers’ admission, a design choice rather than an oversight. Until recently, Ray had no authentication on its Jobs API or Dashboard — any process that could reach the network port could submit arbitrary code for execution on the cluster. Anyscale’s position was that Ray should only run in isolated networks and act upon trusted code. The result was predictable: the ShadowRay campaign [6] compromised hundreds of thousands of exposed Ray clusters for cryptocurrency mining, data exfiltration, and eventually self-propagating botnets. Attackers extracted production database credentials, cloud environment tokens, and model weights from compromised AI workloads. The vulnerability remained unpatched for over two years because the vendor classified it as intended behavior.

Kubernetes introduces its own GPU-specific attack surface through device plugins. The NVIDIA device plugin runs as a privileged DaemonSet with access to the host’s /var/lib/kubelet/device-plugins socket and device files. A compromised plugin can register fake GPU devices, intercept kubelet communications, or escalate to full node compromise. Kubernetes also lacks native GPU resource throttling — unlike CPU and memory, there are no cgroup controls for GPU compute time [7], so a malicious pod can monopolize GPU cycles and starve other tenants’ inference workloads with no scheduler-level mitigation. In multi-tenant clusters, GPU memory is not isolated by default; processes on the same GPU can read each other’s memory regions unless MIG (Multi-Instance GPU) or MPS partitioning is explicitly configured, and even then the isolation guarantees are weaker than what hypervisors provide for CPU and RAM.

Network Fabric

Large training runs communicate over InfiniBand or RoCE interconnects using RDMA — remote direct memory access that bypasses the kernel to move data between GPU memory regions at line rate. This is a performance architecture, not a security architecture. RDMA’s kernel bypass means that the standard OS-level network monitoring and access control stack is not in the path. InfiniBand partition keys (P_Keys) provide coarse tenant isolation at the hardware level, but verifying that traffic isolation policies are correctly enforced across a fabric of thousands of ports is a manual, configuration-driven process with no runtime audit trail. The policy half of this problem has a formal foundation: NetKAT [8] gives forwarding policies a sound and complete equational theory in which reachability and tenant isolation are decidable equivalence checks. But its packet model targets OpenFlow-style Ethernet — nothing comparable exists for P_Key partitioning or RDMA fabrics, the gap that fabric policy verification (§) addresses — and a verified policy is still a claim about what the controller was supposed to install. If you want to verify what is actually on the wire between nodes in a GPU cluster — not what the SDN controller says is on the wire — you need an independent observation point, which is the problem the verified network tap (§) is designed to solve.

The SDN controllers that manage GPU cluster fabrics are themselves a concentrated target. An SDN controller is a single logical authority over all traffic routing decisions in the network; compromise it and you can redirect, mirror, or drop any flow. The OpenFlow protocol, which most SDN deployments use for controller-to-switch communication, did not require authentication of switches in early versions, and even post-1.2 versions with TLS support are frequently deployed without it [9]. An attacker who gains access to the control plane can add a second malicious controller (a feature available in OpenFlow 1.2+) and persistently reroute traffic without touching the data plane hardware. The centralization that makes SDN manageable is exactly what makes it a single point of failure for traffic integrity.

Figure 1: NVLink/NVSwitch topology inside a typical eight-GPU node. The fabric manager that programs NVSwitch routing tables runs as a privileged host process; the resulting tenant boundary is software-configured rather than hardware-enforced, and the traffic that crosses it never touches the CPU, the PCIe bus, or the OS network stack.

Within a single node, GPUs communicate over NVLink and NVSwitch at bandwidths up to 900 GB/s [10] — bypassing the host CPU, PCIe bus, and OS network stack. NVIDIA’s fabric manager controls NVSwitch routing tables and restricts applications to designated address ranges, but this is a software-configured trust boundary, not a hardware-enforced one. There is no equivalent of IOMMU page-table isolation for NVLink traffic; the protection rests on correct configuration of the fabric manager, which runs as a privileged host process. A compromised fabric manager, or a bug in NVSwitch routing table setup, could allow one GPU’s process to read or write another’s memory region across the switch. For multi-tenant GPU servers using NVLink-connected GPUs — the standard configuration in DGX systems and cloud GPU instances — this is a lateral movement path that never touches the network and never appears in host OS logs.

Even when interconnect traffic is encrypted, traffic analysis remains viable. The volume and timing of gradient synchronization traffic are highly regular and predictable [11] — all-reduce operations produce characteristic burst patterns that correlate with batch boundaries, model architecture, and even convergence state. An observer on the fabric (or with access to switch port counters) can infer training progress, detect when checkpointing occurs, and distinguish between different model architectures without decrypting a single packet.

Distributed Systems

Distributed training coordinates gradient synchronization across hundreds or thousands of GPUs using collective operations — all-reduce, all-gather, reduce-scatter — implemented by libraries like NCCL and Gloo. These protocols run over the RDMA fabric with no authentication or encryption — NCCL’s own documentation states it “does not provide secure network communication between GPUs” [12]; any process that can reach the network can inject or modify gradient data in transit. In federated and multi-node settings, this is the mechanism by which model poisoning attacks operate: a compromised node contributes manipulated gradients that shift the model’s behavior while remaining within the statistical noise of normal training variance. The protocol’s correctness properties — that all-reduce actually computes the sum of all participants’ contributions, that no participant can observe another’s individual gradient — are distributed-system invariants that formal methods can specify and verify. Proof-carrying code (§) is one path to making these guarantees checkable at deployment time rather than assumed.

Checkpoint integrity is the other open wound. Large training runs write distributed checkpoints — sharded model state spread across dozens or hundreds of files on networked storage — every few thousand steps. These checkpoints are the recovery mechanism when nodes fail, which happens constantly at scale. They are also the artifact that gets promoted to production: the “final model” is just the last checkpoint that passed evaluation. Yet checkpoints are typically stored as serialized tensors with no cryptographic signature, no content-addressable hash chain linking them to the training run that produced them, and no tamper-evident log of which processes wrote which shards. An attacker with write access to the shared filesystem — or to the object store behind it — can modify checkpoint shards to inject backdoor behavior, and the training job will resume from the poisoned state without any indication that the checkpoint was altered. The same exposure applies when checkpoints are copied between clusters, uploaded to model registries, or handed off from a training team to a deployment team. The integrity gap between “this file exists on S3” and “this file is the authentic output of training run X at step Y” is currently bridged by convention, not cryptography.

Data pipeline integrity compounds the problem. Training data flows through distributed storage systems — HDFS, cloud object stores, data lakes — through preprocessing jobs, shuffling, and batching before it reaches the GPUs. Each stage is a point where data can be modified, filtered, or augmented by an attacker with access to the storage layer. Unlike gradient poisoning, which requires a compromised training node, data poisoning can happen long before training begins, in ETL jobs or data labeling pipelines that are often run with broader permissions and less monitoring than the training jobs themselves.

Identity and Access Management

The IAM layer controls which principals — human users, CI pipelines, serving infrastructure, and increasingly autonomous agents — can read model weights, write to training data stores, modify deployment configurations, or invoke models. The failure mode is over-permissioning: ML pipelines tend to accumulate broad service account credentials because the alternative is debugging permission errors during a training run. In 2025, Sysdig’s threat-research team documented an attack chain [13] where compromised credentials reached cloud administrator privileges in eight minutes across 19 distinct AWS principals — LLM-assisted reconnaissance followed by Lambda-based privilege escalation, with invocation logging disabled along the way. When the principal is an AI agent operating with long-lived API tokens, the IAM boundary is the control boundary. A token compromise gives the attacker everything the agent could do, with access patterns indistinguishable from legitimate automation. Proof-carrying code (§) offers a structural alternative: rather than trusting that IAM policies are correctly configured and that tokens are uncompromised, require that actions carry machine-checkable proofs of authorization.

Model registries are an under-secured chokepoint. A model registry — Hugging Face Hub, a private MLflow instance, a cloud model catalog — is where trained models are stored, versioned, and promoted to production. Whoever can push a new version to the registry controls what runs in production. The PoisonGPT demonstration [14] showed that a surgically modified open-source model could be uploaded to Hugging Face and spread targeted misinformation while retaining normal benchmark performance. ReversingLabs found malicious code embedded in pickle-serialized models on Hugging Face that evaded the platform’s safety scanning [15]. Model confusion attacks — the AI supply-chain analog of dependency confusion — exploit namespace reuse in registries to trick downstream pipelines into pulling attacker-controlled models. Access control on model registries is often weaker than on code repositories: push permissions are granted to broad service accounts, there is no equivalent of signed commits or required code review, and the model artifact format (often pickle or safetensors) has no built-in integrity or provenance chain.

Secret sprawl is the background condition that makes all of the above worse. ML workflows generate and consume credentials at every stage: cloud storage keys in training scripts, Weights & Biases tokens in experiment configs, Hugging Face tokens in model download scripts, database credentials in data pipeline notebooks. GitGuardian’s 2025 report [16] found 23.8 million secrets leaked in public GitHub repositories, a 25% increase over the prior year — and ML repositories are disproportionately affected because the notebook-driven, experiment-oriented workflow encourages hardcoding credentials for quick iteration. These secrets leak into container images (baked into layers during build), into training logs (printed during debugging and never scrubbed), and into checkpoint metadata (serialized alongside model state). When an agent operates autonomously — writing code, calling APIs, launching training runs — it accumulates credentials in its context window and working files with no human in the loop to notice. The agent-to-agent delegation problem makes this worse: when one agent delegates a subtask to another, what credentials does the delegate inherit, and who revokes them when the task is done? Current IAM systems have no concept of transitive, time-bounded delegation for non-human principals.

Bibliography

[1] NVIDIA, “CVE-2024-0132: NVIDIA Container Toolkit TOCTOU container escape.” 2024.
[2] Open Containers / Snyk Security Labs, “CVE-2024-21626: runc container escape via leaked file descriptors (Leaky Vessels).” 2024.
[3] SchedMD, “CVE-2023-49935: Slurm MUNGE message integrity bypass.” 2023.
[4] SchedMD, “CVE-2022-29501: Slurm PMI2/PMIx handler allows unprivileged write to arbitrary Unix sockets.” 2022.
[5] SchedMD, “CVE-2017-15566: Slurm SPANK environment variable privilege escalation.” 2017.
[6] Oligo Security, “CVE-2023-48022: Unauthenticated remote code execution in Ray (ShadowRay).” 2023.
[7] The Kubernetes Authors, “Schedule GPUs.” [Online]. Available: https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/
[8] C. J. Anderson et al., “NetKAT: Semantic Foundations for Networks,” in Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2014, pp. 113–126. doi: 10.1145/2535838.2535862.
[9] K. Benton, L. J. Camp, and C. Small, “OpenFlow Vulnerability Assessment,” in Proceedings of the Second ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking (HotSDN), 2013, pp. 151–152. doi: 10.1145/2491185.2491222.
[10] NVIDIA, “NVIDIA Hopper Architecture In-Depth.” [Online]. Available: https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/
[11] W. Li et al., “Analyzing Communication Predictability in LLM Training,” 2025.
[12] NVIDIA, “NCCL User Guide: Setup.” [Online]. Available: https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/setup.html
[13] Sysdig Threat Research Team, “AI-assisted cloud intrusion achieves admin access in 8 minutes.” [Online]. Available: https://sysdig.com/blog/ai-assisted-cloud-intrusion-achieves-admin-access-in-8-minutes
[14] Mithril Security, “PoisonGPT: How we hid a lobotomized LLM on Hugging Face to spread fake news.” [Online]. Available: https://blog.mithrilsecurity.io/poisongpt-how-we-hid-a-lobotomized-llm-on-hugging-face-to-spread-fake-news/
[15] ReversingLabs, “Malicious ML models discovered on Hugging Face platform.” [Online]. Available: https://www.reversinglabs.com/blog/rl-identifies-malware-ml-model-hosted-on-hugging-face
[16] GitGuardian, “The State of Secrets Sprawl 2025,” technical report, 2025. [Online]. Available: https://www.gitguardian.com/state-of-secrets-sprawl-report-2025

// related tractable problems

agent-weirdmachines ai-control-proof-carrying-code audit-log-integrity edge-policy-verification fabric-policy-verification network-tap-fpga neuralese-governance oci-runtime-hardening scheduler-cotenancy spec-elicitation verified-protocol-boundaries

[firmware-lowlevel]Firmware & Low-Level Systemsfull page →

invites:co-tenant rogue-insider supply-chain

Firmware and Low-Level Systems

Below the orchestration layer and above the silicon sits the firmware: hypervisors, device drivers, and boot chains. Code here runs at the highest privilege levels on both CPU and GPU. A bug is not a container escape — it is a host compromise, often with no log entry at all.

Microkernels and Hypervisors

Multi-tenant GPU isolation today relies on hypervisors, and the trusted computing base is enormous: KVM’s is roughly ten million lines of code [1], Xen’s smaller but still far past exhaustive verification. Both have had VM-escape vulnerabilities — Google Project Zero’s 2021 KVM breakout via AMD SVM nested virtualization [2] is representative — and the PCI passthrough path that GPU workloads require widens the surface further, since IOMMU misconfigurations or missing PCI Access Control Services (ACS) can allow peer-to-peer DMA between devices assigned to different tenants. NVIDIA’s Multi-Instance GPU (MIG) partitioning adds hardware-level memory isolation within a single GPU, but the partitioning is managed by the host driver, which runs inside the hypervisor’s TCB. The verified-core response to a TCB that large — stop trusting most of it, verify a minimal core, and push policy and drivers into a de-privileged layer — is the microkernel line of work taken up as an enabler problem in § (seL4, NOVA, and the tailored hypervisors seKVM and AWS’s Nitro Isolation Engine). None of those ships a verified driver for the accelerator itself, which is the widget at §.

Even where MIG is deployed, GPU memory is not always scrubbed between context switches. Trail of Bits demonstrated this with LeftoverLocals [3]: on AMD, Apple, and Qualcomm GPUs, a co-resident process could read local memory left behind by a previous kernel, recovering roughly 5.5 MB per GPU invocation — enough to reconstruct an LLM’s token-by-token output with high fidelity. NVIDIA hardware was not affected in that specific case, but the deeper issue is architectural. GPUs were designed for throughput, not isolation, and the memory hierarchy reflects that priority. Shared L2 caches, shared interconnects, and performance counters that leak timing information all create side channels across tenant boundaries. The NVBleed attack [4] showed that contention on NVLink interconnects between GPUs lets an attacker fingerprint which deep-learning model a co-tenant is running with 97.8% accuracy, and the attack works across VM boundaries on Google Cloud — even after NVIDIA patched performance-counter access, the timing channel alone still yields F1 scores above 83%.

Confidential VMs compound the problem. CPU-side TEEs like AMD SEV-SNP encrypt guest memory so the hypervisor cannot read it, but extending that guarantee to a GPU is an open engineering problem. SEV-SNP requires the IOMMU in non-passthrough mode to prevent peripherals from reaching encrypted memory, which directly conflicts with the PCIe passthrough that GPU workloads need. AMD’s SEV-TIO extension [5] is meant to bridge this gap using PCI-SIG’s TDISP protocol, but it is not yet widely deployed, and the attestation story for a GPU behind a TDISP-secured link is still being worked out. In the meantime, any “confidential AI” deployment that claims SEV-SNP protection while using GPU passthrough has a gap in its threat model that the marketing materials do not mention.

Device Drivers and Runtimes

The GPU driver is the most privileged code that touches the accelerator. NVIDIA’s proprietary CUDA driver stack, AMD’s ROCm, and Intel’s oneAPI are all closed-source, kernel-mode codebases that handle memory mapping, command submission, and context switching for every GPU workload on the machine. NVIDIA ships quarterly security bulletins; 2024 alone included [6] (privilege escalation in the display driver) and [7] (out-of-bounds read leading to code execution), plus nine vulnerabilities in the CUDA toolkit found by Palo Alto’s Unit 42 [8]. These are not exotic attacks — they are standard memory-safety bugs in C code running at ring 0. The driver is a single point of compromise: an attacker who controls it can read any tenant’s GPU memory, modify in-flight computations, or pivot to the host kernel. Because the driver source is proprietary, the only available mitigations are patching and hoping. A verified, open driver for at least the GPU command-submission path would change the calculus (§).

A compromised or buggy GPU driver also opens the door to DMA attacks. The GPU sits on the PCIe bus and performs DMA to host memory; the IOMMU is supposed to restrict which physical pages the device can touch, but if the driver programs the IOMMU mappings incorrectly — or if an attacker can influence them — the GPU becomes a tool for reading or writing arbitrary host memory. [9] demonstrated exactly this on NVIDIA’s Jetson platform, where a PCIe DMA attack bypassed secure boot. The IOMMU is a necessary defense, but it is only as good as the code that configures it, and that code is part of the same proprietary driver stack.

The CUDA compatibility layer adds its own surface area. NVIDIA maintains forward and backward compatibility across major CUDA versions, which means the driver must support multiple ABIs simultaneously and preserve behavior across a sprawling matrix of toolkit-to-driver version combinations. Operators running multi-framework training pipelines frequently pin older driver versions to avoid breaking their stack, which means known-patched vulnerabilities persist in production for months or years. The compatibility shims themselves are additional code paths that rarely get the same scrutiny as the hot path, and any bug in ABI translation is a bug at ring 0.

On the open-source side, NVIDIA released its kernel-mode GPU modules as open source in 2022 [10], and the Mesa project’s NVK driver now provides Vulkan support for Kepler through Blackwell. These are steps forward for auditability — community review can find classes of bugs that internal QA misses. But neither effort covers the full compute path that ML workloads use: the user-mode CUDA runtime, the compiler backend, and the firmware blobs that run on the GPU’s internal microcontrollers all remain closed. An open kernel module with a closed firmware blob is better than nothing, but it is not a verifiable system. The gap between “source available” and “formally verified” is where the interesting work lies.

Worth naming what the open problem is and is not. The verified microkernels and hypervisors above (§) all push device drivers out of the trusted core and into deprivileged user mode, which makes the driver verifiable in principle but does not make it verified. As a research methodology it works: separation-logic proofs of a driver against an abstract hardware model are a demonstrated pattern, from a ZynqMP DMA engine in concurrent separation logic [11] to BlueRock’s VirtIO virtual switch [12], [13], [14]. The unsolved half is the abstract hardware model. The instruction set is the part that is modeled: Arm and RISC-V both ship machine-readable ISA specifications — Arm’s even covers page-table walks — and REMS extends that line to relaxed virtual memory and the Arm MMU [15], [16]. What none of them covers is the accelerator’s device-control surface: the GPU command processor, the IOMMU/SMMU programming interface, and the DMA engines. AMD publishes a machine-readable GPU ISA, but it is the shader instruction set, not that surface. The pivot from such a model to a software proof is a further gap; sketches exist [14] but no one has closed it at scale. The tractable problem at § is accordingly as much about producing a formal model of a device command interface as it is about proving a driver against one.

Boot Integrity and Firmware Supply Chain

Figure 1: Two strategies for boot-chain integrity. Verified boot halts on a bad signature at each stage but trusts whatever lives in the signature database. Measured boot records hashes into a TPM and defers enforcement to an external policy plane that can release secrets, quarantine, or hard-reset based on attestation.

None of the isolation above matters if the firmware itself has been tampered with. Secure Boot and measured boot chains establish trust from the hardware root of trust (RoT) through each firmware stage to the OS kernel: each stage cryptographically verifies the next before handing off execution. NVIDIA’s H100 extends this model to the GPU, with a per-device ECC keypair, on-die RoT, and a measured boot sequence that produces an attestation report — a signed manifest of every firmware component loaded. Combined with CPU-side TEEs (Intel TDX, AMD SEV-SNP), this enables composite remote attestation: a verifier can check that both CPU and GPU booted clean firmware before releasing model weights or training data to a node.

There is a real architectural choice here, not just a terminology one, between enforcing boot locally via signature check (verified boot) and enforcing it externally via an attestation-gated control plane (measured boot). Verified boot halts on bad signatures, but it trusts whatever is in the signature database, and that database has been bypassed in the wild: the BlackLotus bootkit (2023) exploited [17] to defeat UEFI Secure Boot on fully patched Windows systems by bringing its own copy of a legitimately signed but vulnerable boot manager. Measured boot does not prevent anything from loading — the TPM faithfully records whatever hash is presented, clean or not — but in exchange it is more compositional: the hardware only measures, and policy (what to do with a given measurement) lives outside the boot path, in an operator-owned plane that can release secrets only to nodes that pass attestation, hard reset machines that fail it, or quarantine them from the management network for analysis. NOVA deliberately went this direction [18], on the argument that post-facto measurability with an external control plane is a better composition point than bundling enforcement into the boot ROM. Either story depends on continuous attestation and a fresh revocation list; a stale verifier leaves a tampered node operating undetected between checks.

The Baseboard Management Controller (BMC) is a separate, quieter threat. Every server in a training cluster has a BMC — a small system-on-chip running its own OS (often Linux), connected to its own network interface, with out-of-band access to the host’s power, console, and firmware update mechanisms. BMCs are managed via IPMI or Redfish, and they are rarely patched with the same urgency as the host OS. In 2023, Binarly disclosed seven vulnerabilities in Supermicro BMC firmware [19] that gave unauthenticated attackers root access to the BMC. From there, an attacker can read cleartext credentials off the BMC filesystem, flash malicious UEFI firmware to the host, or pivot to every other BMC on the management VLAN. The persistence is the real problem: a BMC implant survives host OS reinstalls, disk wipes, and even GPU firmware updates, because it lives on a separate flash chip that the host never touches.

Firmware signing for GPUs is better than it was — NVIDIA’s secure firmware update mechanism verifies digital signatures and enforces version anti-rollback — but the trust anchor is only as strong as the key management around it. The 2022 LAPSUS$ breach of NVIDIA [20] resulted in the theft of two code-signing certificates. Although the certificates were expired, Windows still accepts expired certificates for kernel drivers, and malware signed with the stolen keys appeared in the wild within a day. The broader lesson: a single key compromise turns the entire firmware-signing infrastructure from a defense into a distribution channel. What formal methods can contribute here is verifying the attestation protocol itself: proving that the chain of measurements is unforgeable, that a compromised node cannot replay a clean attestation, that revocation logic has no time-of-check/time-of-use gaps, and that the BMC’s update path cannot be used to write outside its intended flash region. The machinery exists, but deploying it across a 10,000-GPU training cluster with continuous attestation, key rotation, and revocation checking is an engineering problem that remains largely unsolved at scale.

Bibliography

[1] S. Biggs, D. Lee, and G. Heiser, “The Jury Is In: Monolithic OS Design Is Flawed: Microkernel-based Designs Improve Security,” in Proceedings of the 9th Asia-Pacific Workshop on Systems (APSys), 2018. doi: 10.1145/3265723.3265733.
[2] F. Wilhelm, “An EPYC escape: Case-study of a KVM breakout.” [Online]. Available: https://projectzero.google/2021/06/an-epyc-escape-case-study-of-kvm.html
[3] Trail of Bits, “CVE-2023-4969: LeftoverLocals — GPU local memory leak across tenant boundaries.” 2023.
[4] Y. Zhang, R. Nazaraliyev, S. B. Dutta, A. Marquez, K. Barker, and N. Abu-Ghazaleh, “NVBleed: Covert and Side-Channel Attacks on NVIDIA Multi-GPU Interconnect.” [Online]. Available: https://arxiv.org/abs/2503.17847
[5] Advanced Micro Devices, “AMD SEV-TIO: Trusted I/O for Secure Encrypted Virtualization.” [Online]. Available: https://www.amd.com/content/dam/amd/en/documents/developer/sev-tio-whitepaper.pdf
[6] NVIDIA, “CVE-2024-0126: NVIDIA GPU Display Driver privilege escalation.” 2024.
[7] NVIDIA, “CVE-2024-0107: NVIDIA GPU Display Driver out-of-bounds read.” 2024.
[8] A. Zambelli, “Multiple Vulnerabilities Discovered in NVIDIA CUDA Toolkit.” [Online]. Available: https://unit42.paloaltonetworks.com/nvidia-cuda-toolkit-vulnerabilities/
[9] NVIDIA, “CVE-2022-21819: NVIDIA Jetson PCIe DMA attack bypassing secure boot.” 2022.
[10] NVIDIA, “NVIDIA Releases Open-Source GPU Kernel Modules.” [Online]. Available: https://developer.nvidia.com/blog/nvidia-releases-open-source-gpu-kernel-modules/
[11] G. Stewart, “Verified ZynqMP DMA Driver in Concurrent Separation Logic (talk).” [Online]. Available: https://sel4.systems/Summit/2025/abstracts2025.html#a-verified-zynqmp
[12] J. Haag, Y. Hirai, S. Hudon, A. Masood, G. Malecha, and G. Stewart, “Protocol Completion of a Robust C++ Virtual Switch,” Tech report, Aug. 2024. Accessed: June 14, 2026. [Online]. Available: https://bluerocksec.gitlab.io/formal-methods/tech_reports/protocol-completion-of-a-robust-c-virtual-switch/
[13] BlueRock Security, “Verifying a Virtual Machine Monitor,” Tech report, 2024. [Online]. Available: https://bluerocksec.gitlab.io/formal-methods/tech_reports/verifying-a-virtual-machine-monitor/
[14] BlueRock Security, “Modularizing CPU Semantics for Virtualization,” Tech report, 2024. [Online]. Available: https://bluerocksec.gitlab.io/formal-methods/tech_reports/modularizing-cpu-semantics-for-virtualization/
[15] P. Sewell and REMS Group, “REMS: Rigorous Engineering of Mainstream Systems.” [Online]. Available: https://www.cl.cam.ac.uk/~pes20/rems/
[16] B. Simner, A. Armstrong, J. Pichon-Pharabod, C. Pulte, R. Grisenthwaite, and P. Sewell, “Relaxed Virtual Memory in Armv8-A,” in Programming Languages and Systems (ESOP 2022), 2022. doi: 10.1007/978-3-030-99336-8_6.
[17] Microsoft, “CVE-2022-21894: UEFI Secure Boot bypass exploited by BlackLotus bootkit.” 2022.
[18] BlueRock Security, “NOVA: A Microhypervisor-Based Secure Virtualization Architecture.” [Online]. Available: https://bluerocksec.gitlab.io/formal-methods/faq/what-is-nova/
[19] Binarly, “CVE-2023-40284 through CVE-2023-40290: Supermicro BMC IPMI firmware vulnerabilities.” 2023.
[20] BleepingComputer, “NVIDIA confirms data was stolen in recent cyberattack.” [Online]. Available: https://www.bleepingcomputer.com/news/security/nvidia-confirms-data-was-stolen-in-recent-cyberattack/

// related tractable problems

device-drivers hypervisors-microkernels weight-integrity

[hardware-supply-chain]Hardware & Physical Supply Chainfull page →

invites:supply-chain physical-adversary rogue-insider

Hardware and Physical Supply Chain

At the bottom of the stack is the silicon itself: the GPU, TPU, and NPU dies that run every training and inference workload, and the global supply chain that produces them. Compromise here is durable: a hardware trojan survives every software update, every OS reinstall, every firmware reflash. It is also opaque: you cannot read out the RTL of a fabricated chip to check what it does.

Chip Architecture

The RTL design of a modern AI accelerator is proprietary. NVIDIA does not publish the register-transfer-level description of its GPU compute cores; neither do Google (TPUs), AMD, or any other vendor. This means the formal methods community cannot verify what the silicon actually computes. We take it on trust that an H100′s tensor cores implement matrix multiplication correctly, that its memory controllers enforce isolation between MIG partitions, and that no undocumented debug interfaces exist. Classen et al. [1] demonstrated that a hardware trojan implanted in a Xilinx Vitis AI DPU could backdoor a traffic-sign recognition model by replacing just 30 of 43,000 parameters at the accelerator level — expanding the circuit by 0.24% with zero runtime overhead, invisible to all software-side defenses. The attack needed no cooperation from the model or the training pipeline; the accelerator alone was sufficient.

The design tools themselves are part of the attack surface. The chip industry flows through a handful of EDA vendors — primarily Synopsys and Cadence — whose toolchains are millions of lines of proprietary code that no customer audits. Basu et al. demonstrated in their “CAD-Base” work [2] that every stage of the EDA pipeline — high-level synthesis, logic synthesis, place-and-route, verification — is a viable insertion point for hardware trojans. The place-and-route attack is nasty: the tool can emit one netlist for verification and a different netlist for tapeout, and existing equivalence-checking flows never compare the two. A malicious logic synthesis tool can exploit underspecified finite-state machines to insert extra transitions triggered by rare conditions, leading the circuit into attacker-controlled states. These are not hypothetical proof-of-concept attacks against toy designs; they target the actual tool interfaces that every fabricated chip passes through. Open-source EDA toolchains (Yosys, OpenROAD) exist but are not used for leading-edge AI accelerator design, so for now the industry trusts its toolchain vendors implicitly.

Beyond the design tools, modern SoCs integrate dozens of third-party IP cores — licensed blocks for PCIe controllers, memory interfaces, interconnect fabrics, security modules — that the chip designer incorporates without access to the underlying RTL. A third-party IP vendor can embed a hardware trojan that activates post-fabrication, leaks data through a covert channel, or degrades reliability on a trigger condition. The SoC integrator’s only recourse is black-box testing and whatever contractual guarantees the IP license provides, neither of which catch a well-designed trojan. Formal verification of third-party IP at the pre-silicon stage is an active research area, but it requires a golden specification of correct behavior — which is exactly what you don’t have when the IP is a licensed black box.

Hardware formal verification is used in industry, but its coverage is narrower than outsiders might expect. Intel adopted formal methods in earnest after the 1994 Pentium FDIV bug (a $475 million recall) [3]; by the Nehalem microarchitecture in 2008, formal verification was the primary validation method for floating-point and other arithmetic units. ARM’s ISA-Formal framework [4] uses bounded model checking in JasperGold to verify that RTL implementations conform to the instruction set specification — but bounded model checking only proves correctness up to a fixed sequence length, not for arbitrary execution traces, unless you can find the right invariants to lift it to an unbounded proof. In practice, formal verification at major vendors covers specific high-value blocks (FPUs, cache coherence protocols, security state machines) while the rest of the design relies on simulation-based testing. Nobody is formally verifying an entire GPU.

RISC-V changes part of this picture. The ISA specification is open, and several open-source RTL implementations exist (BOOM, Rocket, CVA6) with corresponding formal verification frameworks — YosysHQ’s riscv-formal [5] can check that an implementation conforms to the RISC-V spec using model checking, and the OpenHW Group’s CORE-V-VERIF [6] provides industrial-grade verification for the CV32 and CV64 families. MIT’s Riscy processors, built on the Coq-embedded Kami DSL [7], are formally verified open-source RISC-V cores. But open ISA is not open silicon. These designs still get fabricated at foundries through proprietary EDA toolchains and opaque manufacturing processes. An open RTL that you can verify pre-silicon is a real improvement over a proprietary RTL that you can’t — but it does not close the gap between the verified netlist and the actual transistors on the die. FPGAs remain the one place where the deployer controls the entire path from RTL to running hardware, which is why § proposes FPGA taps as a verification target.

On the GPU side, the architecture determines what isolation primitives are available to driver software. Whether a verified driver (§) can enforce strong tenant isolation depends on what the hardware exposes — and right now, we are relying on vendor documentation rather than verified specs for that interface.

Physical Security

Confidential computing has reached the GPU. NVIDIA’s H100 supports a hardware-rooted TEE that, in conjunction with CPU-side enclaves (Intel TDX, AMD SEV-SNP), can encrypt model weights in transit and at rest on the device, with attestation that the firmware and driver stack have not been tampered with. This matters for AI because model weights are the most valuable artifact in the ecosystem — worth billions in training compute — and multi-tenant cloud deployments expose them to the cloud operator’s entire software stack. But confidential computing protects against software adversaries with root access; it does not protect against physical adversaries with access to the hardware.

Modern AI accelerators use High Bandwidth Memory (HBM) stacked on a silicon interposer — the GPU die and HBM stacks sit side by side on a shared substrate connected by thousands of micrometer-scale copper traces. This interposer is a physical interface carrying model weights and activations at terabytes per second, unencrypted in current designs — even the H100′s confidential-computing mode draws the security boundary at the package and leaves on-package HBM traffic in the clear [8]. An attacker with physical access to the package could, in principle, probe these traces or tap the interposer to exfiltrate data. This is not a low-effort attack — it requires lab equipment and destructive or semi-destructive access to the package — but for a state-level adversary targeting a frontier model worth billions, it is not out of scope. HBM encryption at the interface level is not yet standard; until it is, the physical package boundary is the confidentiality boundary.

Side-channel attacks add another dimension. [9] showed that the magnetic flux from a GPU’s power cable — measurable with a $3 induction sensor — betrays the detailed topology and hyperparameters of a neural network being evaluated, achieving 96.8% classification accuracy for identifying network layers on an Nvidia Titan V. [10] demonstrated that electromagnetic emanations from GPU DVFS (dynamic voltage and frequency scaling) activity penetrate walls and can be captured at 3+ meters, enabling website fingerprinting and keystroke timing attacks. Scale these techniques to a data center with thousands of GPUs running a single training job, and the EM signature is correspondingly richer. TEMPEST-class shielding is well-understood for traditional compute environments, but GPU training clusters dissipate megawatts of power with correspondingly strong emanations, and it is not clear that standard data center construction provides adequate EM containment. This is an area where the AI security community has barely begun to assess the threat surface.

Counterfeit and recycled chips are a supply chain problem that predates AI but gains new significance when the chips in question carry model weights. The Semiconductor Industry Association estimates counterfeit semiconductors cost the U.S. industry roughly $7.5 billion a year in lost revenue [11]; the most common attack is remarking — taking a chip desoldered from e-waste, sanding the package markings, and relabeling it as a higher-grade or newer part. For commodity components this causes reliability failures; for security-critical components in a training cluster, it could mean a node with degraded or compromised silicon participating in a distributed training run. Detection methods exist — on-chip aging sensors (“odometers”), power side-channel profiling, X-ray inspection of die markings — but none are deployed systematically for GPU procurement at scale. The assumption is that buying from authorized distributors eliminates the problem, which is true until it isn’t.

The geographic concentration of semiconductor manufacturing is a systemic risk that no amount of formal verification can address, but it shapes the threat model for everything else in this section. TSMC fabricates over 90% of the world’s most advanced chips [12], including every NVIDIA AI accelerator. All of this capacity sits in Taiwan. A single earthquake, a fabrication contamination event, or a military conflict in the Taiwan Strait would halt frontier AI hardware production globally. The CHIPS Act and TSMC’s fabs in Arizona, Japan, and Germany are slowly redistributing this concentration, but leading-edge capacity outside Taiwan remains years away from volume production. Advanced chip packaging — the 2.5D interposer and CoWoS technology that makes HBM-equipped AI accelerators possible — is even more concentrated than fabrication itself. For security planning, this means the physical supply chain has a single point of failure that cannot be engineered around with software or formal methods; it can only be mitigated through industrial policy and supply chain diversification.

For training clusters of 10,000+ GPUs, the physical supply chain itself becomes a target at the system level, not just the component level. Every node in the cluster must be attested before it receives model weights or gradient updates. Every network link between nodes carries data that, if intercepted or modified, could poison a training run or exfiltrate a model. Physical tamper-evidence for installed hardware — seals on server chassis, tamper-detecting rack enclosures, logged physical access — is standard practice in classified environments but not in commercial data centers running AI training workloads. As frontier model training costs reach hundreds of millions of dollars, the gap between the physical security posture of a commercial hyperscaler and the value of the assets it protects is widening. FPGA-based network taps (§) address the link-level problem: a tap with a formally verified packet-processing pipeline can guarantee that it faithfully mirrors traffic without injection or modification — a property that no software tap running on a general-purpose OS can credibly claim.

Bibliography

[1] A. Classen, D. Hansma, T.-N. Ngo, M. Becker, and K. Rieck, “Evil from Within: Machine Learning Backdoors through Hardware Trojans,” in Proceedings of the IEEE Symposium on Security and Privacy, 2025. [Online]. Available: https://arxiv.org/abs/2304.08411
[2] K. Basu et al., “CAD-Base: An Attack Vector into the Electronics Supply Chain,” ACM Transactions on Design Automation of Electronic Systems (TODAES), vol. 24, no. 4, pp. 38:1–38:30, 2019, doi: 10.1145/3315574.
[3] J. Harrison, “Formal Verification at Intel,” in 18th Annual IEEE Symposium on Logic in Computer Science (LICS), 2003, pp. 45–54. doi: 10.1109/LICS.2003.1210044.
[4] A. Reid, “Trustworthy Specifications of ARM v8-A and v8-M System Level Architecture,” in Proceedings of Formal Methods in Computer-Aided Design (FMCAD), 2016. doi: 10.1109/FMCAD.2016.7886675.
[5] YosysHQ, “riscv-formal: RISC-V Formal Verification Framework.” [Online]. Available: https://github.com/YosysHQ/riscv-formal
[6] OpenHW Group, “CORE-V-VERIF: Functional Verification for CORE-V RISC-V Cores.” [Online]. Available: https://github.com/openhwgroup/core-v-verif
[7] J. Choi, M. Vijayaraghavan, B. Sherman, A. Chlipala, and Arvind, “Kami: A Platform for High-Level Parametric Hardware Specification and Its Modular Verification,” Proceedings of the ACM on Programming Languages (ICFP), vol. 1, 2017, doi: 10.1145/3110268.
[8] NVIDIA, “Confidential Compute on NVIDIA Hopper H100.” [Online]. Available: https://images.nvidia.com/aem-dam/en-zz/Solutions/data-center/HCC-Whitepaper-v1.0.pdf
[9] H. Maia, C. Chang, D. Sanchez, and S. Devadas, “Can one hear the shape of a neural network?: Snooping the GPU via magnetic side channel,” in Proceedings of the 31st USENIX Security Symposium, 2022.
[10] Z. Zhan et al., “Graphics Peeping Unit: Exploiting EM Side-Channel Information of GPUs,” in Proceedings of the IEEE Symposium on Security and Privacy (S&P), 2022. doi: 10.1109/SP46214.2022.9833773.
[11] Semiconductor Industry Association, “Winning the Battle Against Counterfeit Semiconductor Products.” [Online]. Available: https://www.semiconductors.org/wp-content/uploads/2018/01/SIA-Anti-Counterfeiting-Whitepaper.pdf
[12] Semiconductor Industry Association and Boston Consulting Group, “Strengthening the Global Semiconductor Supply Chain in an Uncertain Era,” technical report, Apr. 2021. [Online]. Available: https://www.semiconductors.org/strengthening-the-global-semiconductor-supply-chain-in-an-uncertain-era/

// related tractable problems

device-drivers network-tap-fpga

ML Training & Inference Stack

Bibliography

Execution Harness

Inference Servers

Sandboxes

API Gateways

Bibliography

Software and ML Framework Layer

Compilers

Frameworks

Dependency Supply Chain

Bibliography

Orchestration and Cloud Layer

Cluster Orchestration

Network Fabric

Distributed Systems

Identity and Access Management

Bibliography

Firmware and Low-Level Systems

Microkernels and Hypervisors

Device Drivers and Runtimes

Boot Integrity and Firmware Supply Chain

Bibliography

Hardware and Physical Supply Chain

Chip Architecture

Physical Security

Bibliography