Tractable Problems in AI Security via Formal Methods

Edge Policy Verification

The API gateway in front of an inference service decides two things on every request: whether the caller is who they claim to be, and whether they have exceeded their allowed rate. Both reduce to the same kind of property — over any window of length $T$ , principal $X$ may send at most $N$ requests — with authentication setting which $X$ applies and rate limiting enforcing the bound. They also fail the same way: the protocol’s state space is large enough that production implementations exhibit reachable states no one designed for. JWT algorithm confusion [1] let attackers forge tokens by switching from RSA to HMAC verification, a bug that lived in production libraries for years because the auth handshake’s state space was never systematically explored. OAuth 2.0 compounds the problem: authorization codes, refresh tokens, PKCE challenges, and session binding interact in enough combinations that implementation errors routinely produce exploitable states. On the quota side, distributed counters (Redis, or in-process sliding windows with gossip) admit concurrent-write race conditions that silently let burst traffic exceed policy — the system reports correct enforcement while the count drifts above the limit. The standard response to both failure modes — unit tests and penetration testing — samples the state space rather than covering it.

Authorization for agent-originated requests is a different shape of problem. In agentic deployments, the model generates tool calls — HTTP requests, database queries, shell commands — that carry the human session’s credentials. The model acts on behalf of the user but is not the user; it is a textbook confused deputy. If the routing logic does not attenuate authority, a prompt injection or jailbreak that causes the model to emit an unexpected tool call inherits whatever permissions the session holds. The question here is not “are the right principals allowed through” but “is the set of actions reachable through model-generated requests a strict subset of what the human intended to authorize” — a containment property over a capability graph, not a rate property over a credential state machine.

The two concerns share an architectural site (the gateway) but separate cleanly as formal targets, and they are treated below as two distinct projects.

Solution Sketch A: Rate-Bounded Authorization

Build a TLA+ model of the gateway’s credentialed-request lifecycle as a single state machine: authentication maps a request to a principal, then a per-principal counter decides whether to admit. On the authentication half, verify that no reachable state admits a token under an unintended algorithm, an expired credential, or a forged signature — covers JWT algorithm confusion and the OAuth state-machine errors above. Extend with a ProVerif [2] or Tamarin [3] model of the full OAuth 2.0 flow, producing machine-checked proofs of token secrecy and session binding. On the rate-limit half, express the policy as a TLA+ invariant — “over any window of length $T$ , the count for principal $k$ does not exceed $N$ ” — and verify the distributed counter implementation against it under all process interleavings and crash-recovery scenarios. Where the counter uses shared mutable state, apply Iris or another concurrent separation logic to prove the monotonicity invariant (the counter never decreases except at window expiry). The two halves share the principal identifier as their interface: the auth model guarantees principal identity is sound, and the rate-limit model guarantees enforcement is correct given that identity. The deliverable is a verified reference gateway with both halves discharged in the same TLA+ model file, plus a test harness that regression-checks deployed gateways against the spec.

Solution Sketch B: Agent Authority Attenuation

Model the gateway’s agent-mediation logic as a capability system in Alloy: the human session holds a capability set, the model’s tool-call interface holds an attenuated subset, and the routing logic mediates between them. The specification: no sequence of model-generated requests can reach a capability outside the attenuated set — a direct confused-deputy analysis of the gateway’s forwarding rules. The vocabulary of capabilities — what they name, what authority they convey, how they attenuate for tool calls — is exactly what § develops; this sketch consumes that vocabulary and verifies that the gateway’s enforcement of it has no escape paths. The deliverable is a verified reference for the gateway’s tool-call routing layer (or shim), with proofs that the attenuation function is complete (every model-reachable action lies inside the attenuated set) and that the routing rules are sound with respect to it.

Bibliography

[1] Auth0, “CVE-2015-9235: JWT algorithm confusion in \texttt{jsonwebtoken}.” 2015.
[2] B. Blanchet, “An Efficient Cryptographic Protocol Verifier Based on Prolog Rules,” in 14th IEEE Computer Security Foundations Workshop (CSFW), 2001, pp. 82–96.
[3] S. Meier, B. Schmidt, C. Cremers, and D. Basin, “The TAMARIN Prover for the Symbolic Analysis of Security Protocols,” in 25th International Conference on Computer Aided Verification (CAV), in Lecture Notes in Computer Science, vol. 8044. 2013, pp. 696–701. doi: 10.1007/978-3-642-39799-8\_48.