Tractable Problems in AI Security via Formal Methods

Software and ML Framework Layer

Between the orchestration infrastructure below and the execution harness above sits the software that actually defines and compiles a model: the frameworks researchers write against, the compilers that lower their code to GPU kernels, and the sprawling dependency graph that connects everything. This layer is where the mathematical definition of a neural network meets the reality of executable code. A compromise here is dangerous because it can alter what a model computes without changing what it appears to be — the weights, the architecture description, and the training logs all look clean while the compiled artifact does something else.

Compilers

An ML compiler takes a high-level model description and lowers it through several intermediate representations to GPU machine code. The typical path is something like PyTorch → torch.compile → Triton IR → LLVM IR → PTX → SASS, though the specifics vary: TVM uses its own Relay/TIR stack, XLA compiles through HLO and StableHLO, and JAX traces through jaxpr before hitting XLA. Each stage applies optimization passes — operator fusion, memory layout transforms, quantization, tiling — that rewrite the computation in ways the model author never sees directly.

Figure 1: A typical PyTorch lowering path, with sample injection points (red) where a poisoned pass could alter what the silicon computes without changing what the source, weights, or training logs look like.

The attack surface is a modern instance of Thompson’s “Trusting Trust” [1]. A poisoned compiler pass could inject a small additive bias into attention weights, swap an activation function under specific input conditions, or silently alter a quantization rounding mode. The result would be a model that behaves correctly on benchmarks but misbehaves on targeted inputs, with no trace of the modification in source code, model weights, or training logs. The compilation chain is long enough that auditing every stage by hand is not feasible.

This is not hypothetical. LLVM’s MLIR infrastructure — shared by TVM, XLA, and Triton — had multiple memory-management vulnerabilities in 2023 [2] where a crafted MLIR file could trigger crashes or worse. More telling is a 2025 XLA:TPU miscompilation in the approximate top-k operation that returned silently wrong results [3] for certain batch sizes, affecting production models. The compiler produced valid-looking output; it was just computing the wrong function. No error was raised. This is the failure mode that formal methods are built to prevent.

The formal methods community has begun to engage. Bhat et al. [4] formalized a core calculus for SSA-based IRs in Lean and verified peephole rewrites across three use cases: LLVM bitvector rewrites, structured control flow, and fully homomorphic encryption. Fehr et al. [5] defined semantics for five MLIR dialects, built three verification tools, and found five miscompilation bugs in upstream MLIR in the process. Both cover individual rewrites and local properties, not end-to-end compilation correctness. The gap between “verified peephole rewrites” and “verified compiler” is the same gap CompCert [6] closed for C — and closing it for even one ML compilation path (say, StableHLO to PTX for a fixed set of operations) would be a major result.

A practical intermediate target: verified compilation of attention kernels. Attention is the computational core of transformer inference, and it is also the operation most aggressively optimized by ML compilers (FlashAttention, PagedAttention, various fused variants). Proving that the compiled kernel preserves the semantics of the source-level attention specification, up to documented floating-point tolerances, would cover the highest-value target in the compilation chain.

Frameworks

PyTorch, JAX, and TensorFlow are the environments where models are defined, trained, and exported. They provide automatic differentiation, GPU dispatch, a zoo of layer implementations, and serialization formats for saving and loading model weights. Each is a million-line codebase with native extensions, JIT compilers, and deep operating system integration, but the highest-impact vulnerability class is mundane: deserialization.

PyTorch’s default serialization uses Python’s pickle protocol, which can embed arbitrary executable code in a saved file. Loading an untrusted .pt file is functionally equivalent to running eval() on attacker-controlled input. This was understood in principle for years, but [7] demonstrated that even the weights_only=True safeguard — PyTorch’s own recommended mitigation — could be bypassed, restoring full remote code execution. Hugging Face uses PickleScan to screen uploaded models, but JFrog found three zero-day bypasses in the scanner itself [8]: one corrupts magic-number detection so the scanner halts with an “invalid magic number” error while torch.load() still loads the payload; a second exploits a file-extension mismatch that skips scanning entirely; and a third uses submodule imports to downgrade the severity of dangerous globals from blocked to merely flagged. In a separate line of attack, ReversingLabs’ nullifAI technique [9] hid working reverse shells in PyTorch models by 7z-compressing the pickle payload so the scanner’s decompression failed, even as the malicious code ran first. TensorFlow has its own variant: Keras Lambda layers embed arbitrary Python in model definitions, and [10] showed that the safe_mode fix could be bypassed via a downgrade attack when loading older checkpoints.

The SafeTensors format, developed by Hugging Face, stores only tensor data and metadata with no code execution path. An independent security audit [11] found no critical flaws. This is the right direction, but the safety argument rests on an audit, not a proof. A serialization format with a machine-checked parser — in the style of EverParse [12] or similar verified parsers — would close the vulnerability class permanently rather than playing whack-a-mole with scanner bypasses.

Beyond serialization, the frameworks expose attack surface through JIT compilation and custom operators. PyTorch’s torch.utils.cpp_extension.load() compiles C++/CUDA code into /tmp and dynamically loads the resulting shared library. Any process that can influence the source or the cached .so gets native code execution. TensorFlow has accumulated roughly 435 CVEs across 44 CWE categories [13], many in parsing and memory management of its graph representation. Formal verification of core tensor operations — proving that softmax, layernorm, or matmul produce results consistent with their mathematical definitions across all compilation backends — remains open territory.

Dependency Supply Chain

Figure 2: Top: ecosystem-level statistics on the transitive-dependency surface from [14]. Bottom: three real compromises that reached production ML stacks via transitive pulls in 2022–2026.

A typical ML project does not just depend on PyTorch or TensorFlow. It depends on a graph of packages — data loaders, tokenizers, experiment trackers, serving utilities, CUDA bindings — that pull in their own transitive dependencies. Mahon et al. [14] analyzed 378,573 PyPI packages and found that the average package has 129.6 transitive dependencies spanning a dependency chain up to 23 levels deep. A single CVE in urllib3 created guaranteed exposure in 1,906 downstream packages. The ML ecosystem sits on top of all of this and adds its own layer of domain-specific packages with fast release cycles and small maintainer teams.

Supply chain attacks against this graph are not theoretical. In December 2022, an attacker uploaded a malicious torchtriton package to PyPI with the same name as PyTorch’s internal dependency [15]. Because pip resolves the public PyPI index before private indices by default, PyTorch-nightly users received the attacker’s version, which exfiltrated hostnames, environment variables, and password files. In December 2024, the Ultralytics YOLO library (60+ million PyPI downloads) was compromised through a GitHub Actions script injection: the attacker crafted PR branch names that injected code into the CI/CD pipeline, and the published package bundled a cryptominer [16]. In 2026, the legitimate litellm package — an inference proxy pulled as a transitive dependency by many ML stacks — was compromised by a direct upload to PyPI that bypassed the project’s GitHub release pipeline entirely [17]. The injected payload harvested SSH keys and cloud credentials and attempted lateral movement across Kubernetes clusters. The attacks are moving up the dependency graph, where one package reaches thousands of dependents.

The defenses that exist are largely unsigned and unverified. SLSA (Supply-chain Levels for Software Artifacts) [18] defines four levels of provenance assurance, from basic metadata (L1) through hermetic reproducible builds (L4). Most ML toolchains operate at L0 or L1 — packages are uploaded by individual maintainers from their laptops with no build provenance, no signed attestation, and no reproducibility guarantee. Sigstore [19] offers keyless artifact signing tied to OIDC identities, which would at least bind a package version to a specific CI run. But even with signed provenance, the resolver itself is a trust boundary: pip’s dependency resolution algorithm is complex, underspecified, and has produced dependency confusion vulnerabilities by design [20].

Formal methods can contribute at two levels. First, at the resolver level: proving that a dependency resolution algorithm correctly implements its specification and cannot be tricked by namespace collisions, version range manipulation, or index priority ordering. This is a bounded, well-defined verification target. Second, at the build level: verified reproducible builds where the mapping from source to artifact is proven deterministic, so that any party can independently verify that a published package corresponds exactly to its claimed source. Achieving SLSA L4 with machine-checked guarantees — rather than relying on build system configuration being correct — would make supply chain substitution attacks structurally impossible rather than merely detectable.

Bibliography

[1] K. Thompson, “Reflections on Trusting Trust,” Communications of the ACM, vol. 27, no. 8, pp. 761–763, Aug. 1984, doi: 10.1145/358198.358210.
[2] LLVM Project, “CVE-2023-29932 through CVE-2023-29939: Memory-management vulnerabilities in LLVM MLIR.” 2023.
[3] Anthropic, “A postmortem of three recent issues.” [Online]. Available: https://www.anthropic.com/engineering/a-postmortem-of-three-recent-issues
[4] S. Bhat, A. d. Keizer, C. Hughes, A. Goens, and T. Grosser, “Verifying Peephole Rewriting in SSA Compiler IRs,” in 15th International Conference on Interactive Theorem Proving (ITP 2024), 2024. doi: 10.4230/LIPIcs.ITP.2024.9.
[5] M. Fehr, Y. Fan, H. Pompougnac, J. Regehr, and T. Grosser, “First-Class Verification Dialects for MLIR,” in Proceedings of the ACM on Programming Languages (PLDI), 2025. doi: 10.1145/3729309.
[6] X. Leroy, “Formal verification of a realistic compiler,” Communications of the ACM, vol. 52, no. 7, pp. 107–115, 2009, doi: 10.1145/1538788.1538814.
[7] PyTorch Contributors, “CVE-2025-32434: Remote code execution via \texttt{torch.load} with \texttt{weights_only=True}.” 2025.
[8] JFrog Security Research, “Unveiling 3 Zero-Day Vulnerabilities in PickleScan.” [Online]. Available: https://jfrog.com/blog/unveiling-3-zero-day-vulnerabilities-in-picklescan/
[9] ReversingLabs, “Malicious ML models discovered on Hugging Face platform.” [Online]. Available: https://www.reversinglabs.com/blog/rl-identifies-malware-ml-model-hosted-on-hugging-face
[10] Google / TensorFlow, “CVE-2024-3660: Arbitrary code execution via Keras Lambda layer \texttt{safe_mode} bypass.” 2024.
[11] Trail of Bits, “EleutherAI, Hugging Face Safetensors Library Security Assessment,” technical report, May 2023. [Online]. Available: https://github.com/trailofbits/publications/blob/master/reviews/2023-03-eleutherai-huggingface-safetensors-securityreview.pdf
[12] T. Ramananandro et al., “EverParse: Verified Secure Zero-Copy Parsers for Authenticated Message Formats,” in Proceedings of the 28th USENIX Security Symposium, 2019.
[13] K. Zhang, Y. Feng, and Z. Chen, “Unraveling the Characterization and Propagation of Security Vulnerabilities in TensorFlow-based Deep Learning Software Supply Chain,” in 16th Asia-Pacific Symposium on Internetware (Internetware 2025), 2025. doi: 10.1145/3755881.3755923.
[14] J. Mahon, C. Hou, and Z. Yao, “PyPitfall: Dependency Chaos and Software Supply Chain Vulnerabilities in Python,” 2025.
[15] PyTorch Team, “Compromised PyTorch-nightly dependency chain (torchtriton).” [Online]. Available: https://pytorch.org/blog/compromised-nightly-dependency/
[16] ReversingLabs, “Compromised Ultralytics PyPI package delivers crypto coinminer.” [Online]. Available: https://www.reversinglabs.com/blog/compromised-ultralytics-pypi-package-delivers-crypto-coinminer
[17] FutureSearch, “litellm Supply Chain Attack.” [Online]. Available: https://futuresearch.ai/blog/litellm-pypi-supply-chain-attack/
[18] SLSA Contributors, “SLSA v1.0: Supply-chain Levels for Software Artifacts.” [Online]. Available: https://slsa.dev/spec/v1.0/
[19] Z. Newman, J. S. Meyers, and S. Torres-Arias, “Sigstore: Software Signing for Everybody,” in Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security (CCS), 2022. doi: 10.1145/3548606.3560596.
[20] A. Birsan, “Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies.” [Online]. Available: https://medium.com/@alex.birsan/dependency-confusion-4a5d60fec610

Software & ML Framework

Software and ML Framework Layer

Compilers

Frameworks

Dependency Supply Chain

Bibliography