Tractable Problems in AI Security via Formal Methods

Introduction

How to read this document

The document has two halves. § walks the five layers of the ML training and inference stack — execution harness, software/ML framework, orchestration and cloud, firmware and low-level systems, hardware and physical supply chain — and, for each, sketches the status quo and how attackable it is. § is the payload: a list of concrete problems, each tagged by which layers of the stack it touches and whether it is an enabler (unbottlenecks a class of downstream work) or a widget (a scoped, shippable artifact). The website at tractable.for-all.dev mirrors the same content, with the layer/problem tagging exposed as a many-to-many filter.
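
The many-to-many tagging can be pictured as a small data model. This is a minimal sketch, not the website's actual implementation. The `Layer`, `Kind`, and `Problem` types, the `filter_problems` function, its "match any selected layer" semantics, and the placeholder entries are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class Layer(Enum):
    HARNESS = "execution harness"
    FRAMEWORK = "software/ML framework"
    ORCHESTRATION = "orchestration and cloud"
    FIRMWARE = "firmware and low-level systems"
    HARDWARE = "hardware and physical supply chain"

class Kind(Enum):
    ENABLER = "enabler"  # unbottlenecks a class of downstream work
    WIDGET = "widget"    # a scoped, shippable artifact

@dataclass(frozen=True)
class Problem:
    title: str
    kind: Kind
    layers: frozenset  # many-to-many: one problem can touch several layers

def filter_problems(problems, layers=frozenset(), kind=None):
    """Return problems touching any selected layer (all problems if no layer
    is selected), optionally restricted to one kind."""
    return [p for p in problems
            if (not layers or p.layers & layers)
            and (kind is None or p.kind == kind)]

# Placeholder entries only; they do not correspond to real problems in the list.
problems = [
    Problem("hypothetical problem 1", Kind.WIDGET, frozenset({Layer.ORCHESTRATION})),
    Problem("hypothetical problem 2", Kind.ENABLER,
            frozenset({Layer.FRAMEWORK, Layer.ORCHESTRATION})),
    Problem("hypothetical problem 3", Kind.WIDGET, frozenset({Layer.FIRMWARE})),
]

hits = filter_problems(problems, frozenset({Layer.ORCHESTRATION}), Kind.WIDGET)
print([p.title for p in hits])  # ['hypothetical problem 1']
```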

What formal methods can and can’t claim

Figure 1: After Evan Miyazono. Formal verification only closes the middle gap; the elicitation gap on the left is out of scope for the proof itself, and the modeling gap on the right is only as good as the model of the system the proof is stated against. Both ends are where most real-world failures live.

This is the same line the June 2026 RAND expert survey draws: verification reaches code-level correctness (memory safety, access control, protocol enforcement, sandbox isolation) and has little purchase on model semantics such as jailbreaks or sandbagging [1]. We take that boundary as our scope. Everything in this document sits to the left of the modeling gap, in the infrastructure the ML model runs on, not in the ML model itself.

What we count as tractable

Three working principles narrow the problem list in §.

Mechanism, not policy

Verification has the most leverage on the mechanisms that establish system integrity — microkernels, hypervisors, drivers, network protocols — and the least on the policies layered on top of them. Container scheduling on Kubernetes is the canonical example: a scheduler that places pods optimally produces a faster answer, not a more correct one, and verifying its placement decisions does not improve isolation between tenants if the underlying runtime is unsound. We accordingly scope the orchestration layer (§) to its mechanism content — distributed-protocol correctness, IAM logic, network-fabric isolation, runtime confinement — and treat scheduler optimization as out of scope. The same line shows up at every layer: there is a verifiable mechanism underneath, and a policy or optimization above it whose correctness is downstream of the mechanism’s.
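
A toy model makes the split concrete. In this minimal sketch (every name is hypothetical, and real container runtimes are far richer), a per-node admission check plays the role of the mechanism, and the tenant-isolation invariant is checked against every placement a scheduler could emit, not just the one a particular scheduler picks. Turning the mechanism off breaks the invariant no matter how good the placement policy is.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Pod:
    name: str
    tenant: str

def admit(node_tenants: set, pod: Pod, enforce: bool) -> bool:
    """Hypothetical isolation mechanism: when enforced, a node accepts a pod
    only if it is empty or already hosts the same tenant."""
    if not enforce:
        return True
    return not node_tenants or node_tenants == {pod.tenant}

def run(placement, pods, n_nodes, enforce=True):
    """Apply any scheduler's placement through the mechanism; pods the
    mechanism rejects are left unplaced."""
    nodes = [set() for _ in range(n_nodes)]
    for pod, node in zip(pods, placement):
        if admit(nodes[node], pod, enforce):
            nodes[node].add(pod.tenant)
    return nodes

def isolated(nodes) -> bool:
    # Invariant: no node ever hosts pods from more than one tenant.
    return all(len(tenants) <= 1 for tenants in nodes)

pods = [Pod("a1", "A"), Pod("b1", "B"), Pod("a2", "A"), Pod("b2", "B")]
N_NODES = 2
# Every placement any scheduler could emit: 2**4 = 16 here.
placements = list(product(range(N_NODES), repeat=len(pods)))

sound = all(isolated(run(p, pods, N_NODES)) for p in placements)
unsound = all(isolated(run(p, pods, N_NODES, enforce=False)) for p in placements)
print(sound, unsound)  # True False
```

The proof obligation lives in `admit` and `isolated`; the scheduler only chooses among placements that are already safe, so optimizing it changes performance, not isolation.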

Small API surfaces are reachable; large ones are not

Compare NOVA’s 16-hypercall interface [2] to the POSIX surface that Linux exports, or to the API the Docker daemon exposes across its versions. The first is small enough to specify and prove against; the others balloon with each release and have no single coherent specification to target. Tractability tracks API size more reliably than codebase size: a small interface in front of a large implementation is a verification target, and a large interface is not. This pushes the widget specs in § toward the smallest customer-relevant subset, the 10 or 25 functions a real workload actually uses, rather than full-coverage proofs of sprawling APIs. The RAND survey points at the same first targets from the other direction: its experts ranked cryptographic primitives and access control as both highest in security value and highest in verification feasibility, and noted that production-ready verified implementations of both already exist [1].
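
To show what "small enough to specify" buys, here is a minimal sketch of a hypothetical three-call memory-mapping interface (`map_rw`, `map_ro`, `unmap`; these are not NOVA's actual hypercalls) with an exclusive-writer invariant. Because the interface is tiny, a breadth-first search can visit every reachable state and check the invariant in each. This is explicit-state model checking of a toy, not a proof, but it illustrates why a small interface is a verification target.

```python
from collections import deque
from itertools import product

DOMAINS = (0, 1)
PAGES = (0, 1)
# A state is a frozenset of mappings (domain, page, mode), mode "rw" or "ro".

def step(state, call):
    """Hypothetical three-call interface. A rejected call returns the state
    unchanged, modelling an error return."""
    op, d, p = call
    if op == "map_rw":
        if any(pg == p for (_, pg, _) in state):
            return state  # page already mapped somewhere
        return state | {(d, p, "rw")}
    if op == "map_ro":
        if any(pg == p and mode == "rw" for (_, pg, mode) in state):
            return state  # page is held writable
        return state | {(d, p, "ro")}
    if op == "unmap":
        return frozenset(m for m in state if not (m[0] == d and m[1] == p))
    raise ValueError(f"unknown call {op}")

def invariant(state) -> bool:
    """Exclusive writer: a page mapped rw in one domain is mapped in no other."""
    return all(not any(pg == p and dom != d for (dom, pg, _) in state)
               for (d, p, mode) in state if mode == "rw")

CALLS = list(product(("map_rw", "map_ro", "unmap"), DOMAINS, PAGES))

def explore():
    """Breadth-first search over every state reachable from the empty one."""
    start = frozenset()
    seen, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        if not invariant(s):
            return False, len(seen)
        for call in CALLS:
            t = step(s, call)
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return True, len(seen)

ok, n_states = explore()
print(ok, len(CALLS), n_states)  # True 12 36
```

With two domains and two pages, the 12 concrete calls reach 36 states. Every additional call widens the branching at each state and can enlarge the reachable set, which is one way to see why the smallest customer-relevant subset is the practical target.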

Separable, language-agnostic specifications

A system’s specification should be separable from its implementation, both technically (so a Rust rewrite of a C++ component does not invalidate the proof effort) and legally (so an open spec can be referenced by implementations under stricter licenses). NOVA is the existence proof: the implementation is GPL-2 (Intel and TU Dresden), while the specification is licensed separately by BlueRock [2]. The same separation is what would let a Linux retrofit happen without forcing every maintainer onto a single proof toolchain. We treat separability as a precondition: we cannot recommend a widget that does not factor cleanly into spec and implementation, however shippable the implementation looks. The RAND recommendations make a related cut along a different axis: split the verified, invariant-enforcing core from the unverified, performance-optimized code around it, so verification can keep pace with a fast-moving codebase instead of holding the whole thing hostage to its slowest-changing part [1].
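
Separability can be sketched as a spec written once as a standalone relation between inputs and outputs, then checked against each implementation in turn. Everything here is hypothetical: the bounds-checked `read` operation, the `spec` predicate, and the three implementations. In practice the original and the rewrite might be C++ and Rust checked against a spec in a proof assistant; both are Python here only so the sketch runs, and an exhaustive check over a small domain stands in for a proof.

```python
from typing import Callable, Optional

def spec(buf: bytes, off: int, n: int, result: Optional[bytes]) -> bool:
    """Hypothetical spec for a bounds-checked read: a relation between inputs
    and result that says nothing about how the result is computed."""
    in_bounds = 0 <= off and 0 <= n and off + n <= len(buf)
    if not in_bounds:
        return result is None          # out-of-bounds requests must fail
    return result == buf[off:off + n]  # in-bounds requests return exactly n bytes

def read_loop(buf: bytes, off: int, n: int) -> Optional[bytes]:
    """Original implementation: explicit index arithmetic, C-style."""
    if off < 0 or n < 0 or off > len(buf) - n:
        return None
    out = bytearray()
    for i in range(n):
        out.append(buf[off + i])
    return bytes(out)

def read_slice(buf: bytes, off: int, n: int) -> Optional[bytes]:
    """A rewrite of the same operation using slicing."""
    if off < 0 or n < 0 or off + n > len(buf):
        return None
    return buf[off:off + n]

def read_buggy(buf: bytes, off: int, n: int) -> Optional[bytes]:
    """A rewrite that drops the negative-length check."""
    if off < 0 or off + n > len(buf):
        return None
    return buf[off:off + n]  # n < 0 returns a (possibly empty) slice instead of failing

Impl = Callable[[bytes, int, int], Optional[bytes]]

def conforms(impl: Impl) -> bool:
    """Check impl against the unchanged spec over a small exhaustive domain,
    including negative and out-of-range offsets and lengths."""
    buf = bytes(range(5))
    return all(spec(buf, off, n, impl(buf, off, n))
               for off in range(-2, 8) for n in range(-2, 8))

print(conforms(read_loop), conforms(read_slice), conforms(read_buggy))  # True True False
```

The spec did not change between the loop version and the slice rewrite; only the implementation under test did, and the same spec caught the rewrite that dropped a bounds check.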

Why this is hard to fund

The honest difficulty: formal methods compete in a market where the unverified alternative is usually free. A startup considering whether to buy a verified hypervisor over KVM, or a verified container runtime over runc, is comparing a paid product against a zero-marginal-cost open-source incumbent that is good enough for the threat model the customer thinks they have. The ROI calculation is upside-down before the conversation starts. This is the central reason the formal-methods talent that exists today is concentrated in domains — avionics, defense, automotive [3] — where regulators force the comparison to be against a counterfactual incident rather than against the free alternative.
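
The upside-down comparison can be written as a simple expected-loss inequality. This framing is ours, not the source's: C_v and C_0 are the costs of the verified component and the incumbent, p_v and p_0 the probabilities of a qualifying incident over the same period under each, and L the loss from that incident. With a free incumbent, a buyer pays C_v only if the risk reduction times the loss exceeds it. Two levers move the inequality: a larger L, and a smaller effective C_v when verification also pays for itself in maintenance or release velocity.

```latex
\underbrace{C_v - C_0}_{\text{price premium}} \;<\; \underbrace{(p_0 - p_v)\,L}_{\text{expected loss avoided}},
\qquad C_0 \approx 0 \;\Longrightarrow\; C_v < \Delta p \, L, \quad \Delta p = p_0 - p_v
```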

The document does not solve this problem, but it tries to make the right cases visible. For each tractable problem in §, we name the threat model the verified component would close, the alternatives a buyer is implicitly comparing it against, and the lift to deliver a usable artifact rather than a research prototype. AI infrastructure is one of the few private-sector settings where the counterfactual incident is large enough to invert the calculation: model-weight exfiltration, training-data poisoning, or container escape from an agentic workload are losses on the order of the model’s training cost, and the customers (frontier labs, regulated downstream deployers) have both the budget and the risk model to act on it.

Those losses fall under the two converging threat models that a 2026 RAND expert consultation places at the center of the case for verifying this infrastructure [1]: a conventional cyber adversary (nation-state or criminal) stealing weights or disrupting operations, and a loss-of-control scenario in which a misaligned model exploits vulnerabilities in its own infrastructure to bypass safety monitors or exfiltrate itself. A single hardened runtime closes both. The same consultation is a useful corrective on how the sale actually closes: in a capability race, frontier labs will not accept a meaningful performance or velocity penalty for security alone, so a verified component has to win on something they already want (fewer bugs, breach protection, faster incident recovery), with the assurance riding along rather than carrying the pitch. The report also offers an existence proof that this is achievable: AWS’s Automated Reasoning Group found that formally verifying parts of S3 often left the code more performant than what it replaced, easier to maintain, and faster to release [1]. That is verification buying velocity rather than spending it, which is what inverts the upside-down ROI above.
Public money is now in play too: the June 2026 executive order directs OMB to identify federal grant funding for AI vulnerability detection [4], a channel that could underwrite the early widgets in § before any private buyer’s ROI flips. The labs’ own security planning corroborates the diagnosis: Anthropic’s frontier safety roadmap, under the heading “leveling up across the board,” names the same surfaces this document does — hardened Kubernetes access controls across frontier clusters, allowlist-only network egress in sensitive environments, integrity guarantees for build sources and dependencies, cryptographically verified short-lived identities [5]. The roadmap states those as configuration goals a security team checks; they are also properties a verified component could discharge rather than attest. Making the business case requires actually writing it down. That is what this document is.
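
As an illustration of discharging a property rather than attesting to it, consider allowlist-only egress. In this minimal sketch the declarative allowlist is the spec and a compiled bitmask fast path is the implementation; a bounded exhaustive check confirms they agree over a small address universe. `ALLOWLIST`, `compile_rules`, `egress_allowed`, `spec`, and `agrees` are hypothetical, the rules are IPv4-only (IPv6 destinations are denied by default), and a verified component would prove agreement for all inputs rather than a finite sample.

```python
import ipaddress
from itertools import product

# Hypothetical spec: (IPv4 network, destination port) pairs that may be reached.
ALLOWLIST = [
    (ipaddress.ip_network("10.0.0.0/30"), 443),
    (ipaddress.ip_network("10.0.0.8/31"), 8443),
]

def spec(dst: str, port: int) -> bool:
    """Declarative policy: permitted iff some allowlist entry covers the flow."""
    addr = ipaddress.ip_address(dst)
    return any(addr in net and port == p for net, p in ALLOWLIST)

def compile_rules(allowlist):
    """Hypothetical fast path: index rules by port as (base int, mask int)."""
    table: dict[int, list[tuple[int, int]]] = {}
    for net, port in allowlist:
        table.setdefault(port, []).append(
            (int(net.network_address), int(net.netmask)))
    return table

TABLE = compile_rules(ALLOWLIST)

def egress_allowed(dst: str, port: int) -> bool:
    """Default-deny implementation: unknown ports, non-IPv4 destinations,
    and unmatched addresses are all refused."""
    addr = ipaddress.ip_address(dst)
    if addr.version != 4:
        return False  # rules are IPv4-only; deny rather than mis-match
    a = int(addr)
    return any(a & mask == base for base, mask in TABLE.get(port, []))

def agrees(universe: str, ports: list[int]) -> bool:
    """Bounded check: the implementation matches the spec on every
    (address, port) pair in a small finite universe."""
    return all(egress_allowed(str(a), p) == spec(str(a), p)
               for a, p in product(ipaddress.ip_network(universe), ports))

print(agrees("10.0.0.0/28", [22, 443, 8443]))  # True
```

The explicit version check in the fast path matters: without it, an IPv6 address whose low 32 bits fall inside an allowed IPv4 prefix would match the bitmask while the spec denies it. A sampled check only catches that divergence if its universe includes IPv6 addresses, which is one reason a proof over all inputs is worth more than a check.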

Bibliography

[1] G. P. Sarma, R. Steratore, S. D. Bhatt, and G. Irving, “Verified Machine Learning Infrastructure: Formal Methods for Trustworthy Artificial Intelligence Deployment,” RAND Corporation, Santa Monica, CA, Research Report RR-A4881-1, June 2026. doi: 10.7249/RRA4881-1.
[2] BlueRock Security, “NOVA: A Microhypervisor-Based Secure Virtualization Architecture.” [Online]. Available: https://bluerocksec.gitlab.io/formal-methods/faq/what-is-nova/
[3] J. Woodcock, P. G. Larsen, J. Bicarregui, and J. Fitzgerald, “Formal Methods: Practice and Experience,” ACM Computing Surveys, vol. 41, no. 4, pp. 1–36, 2009, doi: 10.1145/1592434.1592436.
[4] The White House, “Promoting Advanced Artificial Intelligence Innovation and Security.” [Online]. Available: https://www.whitehouse.gov/presidential-actions/2026/06/promoting-advanced-artificial-intelligence-innovation-and-security/
[5] Anthropic, “Anthropic's Frontier Safety Roadmap.” [Online]. Available: https://www.anthropic.com/responsible-scaling-policy/roadmap