Tractable Problems in AI Security via Formal Methods

Sampler Integrity and Determinism

The sampler is the last piece of code between the model’s output logits and the token that reaches the user. It applies temperature scaling, top-k or top-p filtering, and then draws from the resulting distribution. This is a small amount of code — a few hundred lines in a typical inference server — but it sits at a chokepoint. An adversary who controls the temperature parameter, the random seed, or the filtering threshold can bias the output distribution without touching weights or the forward pass. The attack surface is not hypothetical: inference servers expose these parameters via API, configuration files, and environment variables, and the sampler’s internal state (particularly the PRNG) is rarely isolated from the rest of the serving process.

The scoping here is what makes the problem attractive. Unlike weight loading or kernel validation, the sampler is small enough to verify. The specification is a probability distribution parameterized by (logits, temperature, top-k, top-p, seed), and the property to check is that the implementation samples from exactly that distribution — no hidden state, no side-channel inputs, no drift from the spec across calls.

Solution/project Sketch

Specify the sampler as a probabilistic program and verify it with a probabilistic model checker (PRISM [1] or Storm [2]). The key property: for any input logit vector, the output token distribution is within a specified KL-divergence bound of the theoretical distribution defined by the declared parameters. This catches both outright bugs (off-by-one in top-k filtering) and subtle manipulation (PRNG state leaking information across requests). Separately, prove data-flow integrity on the implementation — that the sampler is a pure function of its declared inputs with no dependence on mutable global state or memory reachable from other threads. This is a standard information-flow analysis, tractable for code this size. The combination of distributional correctness and data-flow purity gives you a sampler you can trust at the boundary between model and user.

Bibliography

[1] M. Kwiatkowska, G. Norman, and D. Parker, “PRISM 4.0: Verification of Probabilistic Real-Time Systems,” in Proceedings of the 23rd International Conference on Computer Aided Verification (CAV), 2011, pp. 585–591. doi: 10.1007/978-3-642-22110-1_47.
[2] C. Dehnert, S. Junges, J.-P. Katoen, and M. Volk, “A Storm is Coming: A Modern Probabilistic Model Checker,” International Journal on Software Tools for Technology Transfer (STTT), 2017, doi: 10.1007/s10009-017-0469-y.