Tractable Problems in AI Security via Formal Methods

Adversarial Robustness of Formal Methods

Adversarial Robustness of Formal Methods Agda ’s issue label “false” on GitHub sits at the time of this commit at 10 open and 76 closed . Agda ’s issue label “false” tracks the pro...

software-frameworkexecution-harnessmalicious-model

enabler

Formal Theory of Weird Machines in Agents

Formal Theory of Weird Machines in Agents A weird machine [1] is the unintended computational substrate an attacker programs when individually-legitimate operations compose into ca...

execution-harnessorchestration-cloudmalicious-model

enabler

AI Control via Proof-Carrying Code

AI Control via Proof-Carrying Code Proof-carrying code [1] and AI Control [2] , [3] are two literatures about the same problem: keep an untrusted producer on the far side of an int...

execution-harnesssoftware-frameworkorchestration-cloudmalicious-model

enabler

Audit Log Integrity and Session State Channels

Audit Log Integrity and Session State Channels The audit log is the primary accountability mechanism for deployed AI systems. Regulators, red teams, and incident responders all dep...

execution-harnessorchestration-cloudmalicious-modelrogue-insider

Capability-Safe Tool Interfaces

Capability-Safe Tool Interfaces A single tool call in an agentic harness is usually benign. Read a file, run a shell command, make an HTTP request — each is authorized individually...

execution-harnessmalicious-model

Context Window Integrity Under Adversarial Length

Context Window Integrity Under Adversarial Length A model’s context window is a fixed-size resource. The serving harness fills it by concatenating, in priority order, a system prom...

execution-harnessexternal-usermalicious-model

Device Drivers for Verified Kernels

Device Drivers for Verified Kernels Multi-tenant GPU workloads today run on hypervisors whose TCB is orders of magnitude too large to verify, and whose device-facing drivers are pr...

firmware-lowlevelhardware-supply-chainco-tenantrogue-insider

Edge Policy Verification

Edge Policy Verification The API gateway in front of an inference service decides two things on every request: whether the caller is who they claim to be, and whether they have exc...

execution-harnessorchestration-cloudexternal-usermalicious-model

Fabric Policy Verification

Fabric Policy Verification Every tenant-isolation claim on a GPU cluster fabric bottoms out in configuration state: P_Key partition tables and forwarding tables programmed by the I...

orchestration-cloudco-tenantnetwork-mitm

Microkernels and Hypervisors

Microkernels and Hypervisors Multi-tenant GPU isolation today relies on hypervisors. While hypervisors started small, broadly they have grown into full cloud-operating systems. KVM...

firmware-lowlevelco-tenantrogue-insider

enabler

Verified Network Tap

Verified Network Tap The control plane that decides what flows where on a GPU cluster fabric — SDN controllers, fabric managers, tenant routing tables ( § ) — is the same piece of ...

orchestration-cloudhardware-supply-chainnetwork-mitmphysical-adversary

Governance and Interpretability of Neuralese Proof Stacks

Governance and Interpretability of Neuralese Proof Stacks table stakes is that proofs are cheap. We assume a proof oracle will make code and proofs that they’re correct too cheap t...

execution-harnesssoftware-frameworkorchestration-cloudmalicious-model

enabler

OCI Runtime Hardening

OCI Runtime Hardening The standard container runtime, runc , implements the OCI specification atop Linux namespaces and cgroups. It was never designed to resist an adversary inside...

orchestration-cloudexecution-harnessmalicious-modelco-tenant

Sampler Integrity and Determinism

Sampler Integrity and Determinism The sampler is the last piece of code between the model’s output logits and the token that reaches the user. It applies temperature scaling, top- ...

software-frameworkexecution-harnessrogue-insidermalicious-model

Scheduler Integrity and Co-Tenancy Isolation

Scheduler Integrity and Co-Tenancy Isolation The dominant real-world mitigation for co-tenant attacks is not a scheduler policy but a purchase order: hyperscaler customers with ser...

orchestration-cloudco-tenantrogue-insider

Specification Elicitation and Validation

Specification Elicitation and Validation If proofs are cheap and specs are expensive, then specs are the bottleneck. Every widget in this document terminates in a spec: a verified ...

execution-harnesssoftware-frameworkorchestration-cloudmalicious-model

enabler

Verified Input Parsers

Verified Input Parsers Every boundary in the AI serving stack where text is parsed or transformed is a security boundary. The abuse classifier preprocessor tokenizes, truncates, an...

execution-harnesssoftware-frameworkexternal-usermalicious-model

Verified Protocol Trust Boundaries

Verified Protocol Trust Boundaries Before any of the AI-specific machinery in this chapter runs, a request has already crossed two or three protocol boundaries that predate AI enti...

execution-harnessorchestration-cloudexternal-usernetwork-mitm

Weight Integrity and Kernel Supply Chain

Weight Integrity and Kernel Supply Chain Model weights are the highest-value static artifact in an ML deployment. A frontier model’s checkpoint runs tens to hundreds of gigabytes, ...

execution-harnesssoftware-frameworkfirmware-lowlevelsupply-chainrogue-insider

Adversarial Robustness of Formal Methods

Adversarial Robustness of Formal Methods Agda ’s issue label “false” on GitHub sits at the time of this commit at 10 open and 76 closed . Agda ’s issue label “false” tracks the pro...

software-frameworkexecution-harnessmalicious-model

enabler

Formal Theory of Weird Machines in Agents

Formal Theory of Weird Machines in Agents A weird machine [1] is the unintended computational substrate an attacker programs when individually-legitimate operations compose into ca...

execution-harnessorchestration-cloudmalicious-model

enabler

AI Control via Proof-Carrying Code

AI Control via Proof-Carrying Code Proof-carrying code [1] and AI Control [2] , [3] are two literatures about the same problem: keep an untrusted producer on the far side of an int...

execution-harnesssoftware-frameworkorchestration-cloudmalicious-model

enabler

Audit Log Integrity and Session State Channels

Audit Log Integrity and Session State Channels The audit log is the primary accountability mechanism for deployed AI systems. Regulators, red teams, and incident responders all dep...

execution-harnessorchestration-cloudmalicious-modelrogue-insider

Capability-Safe Tool Interfaces

Capability-Safe Tool Interfaces A single tool call in an agentic harness is usually benign. Read a file, run a shell command, make an HTTP request — each is authorized individually...

execution-harnessmalicious-model

Context Window Integrity Under Adversarial Length

Context Window Integrity Under Adversarial Length A model’s context window is a fixed-size resource. The serving harness fills it by concatenating, in priority order, a system prom...

execution-harnessexternal-usermalicious-model

Device Drivers for Verified Kernels

Device Drivers for Verified Kernels Multi-tenant GPU workloads today run on hypervisors whose TCB is orders of magnitude too large to verify, and whose device-facing drivers are pr...

firmware-lowlevelhardware-supply-chainco-tenantrogue-insider

Edge Policy Verification

Edge Policy Verification The API gateway in front of an inference service decides two things on every request: whether the caller is who they claim to be, and whether they have exc...

execution-harnessorchestration-cloudexternal-usermalicious-model

Fabric Policy Verification

Fabric Policy Verification Every tenant-isolation claim on a GPU cluster fabric bottoms out in configuration state: P_Key partition tables and forwarding tables programmed by the I...

orchestration-cloudco-tenantnetwork-mitm

Microkernels and Hypervisors

Microkernels and Hypervisors Multi-tenant GPU isolation today relies on hypervisors. While hypervisors started small, broadly they have grown into full cloud-operating systems. KVM...

firmware-lowlevelco-tenantrogue-insider

enabler

Verified Network Tap

Verified Network Tap The control plane that decides what flows where on a GPU cluster fabric — SDN controllers, fabric managers, tenant routing tables ( § ) — is the same piece of ...

orchestration-cloudhardware-supply-chainnetwork-mitmphysical-adversary

Governance and Interpretability of Neuralese Proof Stacks

Governance and Interpretability of Neuralese Proof Stacks table stakes is that proofs are cheap. We assume a proof oracle will make code and proofs that they’re correct too cheap t...

execution-harnesssoftware-frameworkorchestration-cloudmalicious-model

enabler

OCI Runtime Hardening

OCI Runtime Hardening The standard container runtime, runc , implements the OCI specification atop Linux namespaces and cgroups. It was never designed to resist an adversary inside...

orchestration-cloudexecution-harnessmalicious-modelco-tenant

Sampler Integrity and Determinism

Sampler Integrity and Determinism The sampler is the last piece of code between the model’s output logits and the token that reaches the user. It applies temperature scaling, top- ...

software-frameworkexecution-harnessrogue-insidermalicious-model

Scheduler Integrity and Co-Tenancy Isolation

Scheduler Integrity and Co-Tenancy Isolation The dominant real-world mitigation for co-tenant attacks is not a scheduler policy but a purchase order: hyperscaler customers with ser...

orchestration-cloudco-tenantrogue-insider

Specification Elicitation and Validation

Specification Elicitation and Validation If proofs are cheap and specs are expensive, then specs are the bottleneck. Every widget in this document terminates in a spec: a verified ...

execution-harnesssoftware-frameworkorchestration-cloudmalicious-model

enabler

Shovel-Ready Project Specs