Execution Harness

The execution harness is the outermost software layer of an ML deployment: the wrapper that actually calls the model and handles its inputs and outputs. A user’s prompt enters here, gets tokenized and batched, passes through the model, and returns as a completion. Because every interaction transits this layer, it is simultaneously the easiest place to bolt on security controls and the place where a failure is most directly exploitable from the outside.

Figure 1: A typical request flow through the execution harness. The gateway makes policy decisions on the way in and the way back; the inference server holds the hot path; tool calls divert into a sandbox. The labelled red callouts mark the externally-reachable failure modes addressed later in §.

Three components make up a typical execution harness, each with its own attack surface.

Inference Servers

An inference server (vLLM, Hugging Face TGI, Nvidia Triton) manages GPU memory, batches incoming requests for throughput, and exposes the model behind an HTTP or gRPC endpoint. The server is responsible for continuous batching, KV-cache management, and (increasingly) speculative decoding — all performance-sensitive paths written in a mix of Python and C++/CUDA.

The security-relevant surface here is large. The server deserializes model weights, often from formats like SafeTensors or GGUF, and the parsers for the latter have had their own history of memory-corruption bugs [1] (SafeTensors’ own audit [2] found no critical flaws, but the GGUF loader in llama.cpp has seen repeated integer-overflow-to-heap-overflow patterns). It accepts user-controlled input (prompts, generation parameters, LoRA adapter names) that flows deep into the batching and memory-allocation logic. A malformed request that triggers an out-of-bounds KV-cache write, for instance, could corrupt another user’s in-flight completion in a multi-tenant deployment. Formal methods work in this area has two legs: memory-safety of the batching and caching layers, and verified parsers across the several deserialization boundaries the server straddles (weights on the way in, request payloads on the hot path, and structured tool-call outputs on the way back). The second leg is the subject of §; the tooling is mature and the gap is targeting, not capability.

Inference proxies like LiteLLM are also part of the dependency supply chain (§) — a compromised package at this layer gives an attacker access to every API key and model endpoint the proxy routes.

Sandboxes

When an AI system can execute code — a “code interpreter” in a chatbot, a tool-use agent writing and running scripts — it needs a sandbox. In practice this means a containerized environment (often an OCI container or a microVM like Firecracker) with a constrained syscall profile and filesystem. The sandbox must prevent the AI-generated code from escaping to the host, exfiltrating data from other tenants, or persisting state across invocations in unintended ways.

The gap between “container” and “sandbox” is where real attacks live. Default container configurations are not security boundaries; they share a kernel with the host. A sandbox that allows ptrace, or that mounts the Docker socket, or that runs with CAP_SYS_ADMIN, is a sandbox in name only. The OCI runtime hardening problem (§) addresses this directly: writing machine-checked specifications for what a sandbox policy must guarantee and verifying that a given runtime configuration meets them.

API Gateways

The API gateway sits at the public edge. It handles authentication, rate limiting, content filtering, prompt injection detection, and usage metering. In many deployments this is a conventional reverse proxy (Envoy, Kong) with AI-specific middleware bolted on: a classifier that scans prompts for injection attempts, a filter that redacts PII from completions, a logger that records interactions for abuse review.

The security challenge is that these filters are making semantic decisions about adversarial input using heuristics or secondary ML classifiers — a brittle arrangement. A prompt injection that evades the filter is not a bug in the filter’s code but a failure of its classification boundary. Formal methods have limited traction on the classification problem itself, but they can verify the plumbing: that the gateway’s policy engine correctly composes its rules, that a request flagged by the filter cannot reach the model through an alternate code path, and that rate-limiting state cannot be poisoned by concurrent requests. The value here is ensuring that when a policy says “block this,” the infrastructure actually does.

Bibliography

// related tractable problems

adversarial-robustness-fm

Adversarial Robustness of Formal Methods Agda ’s issue label “false” on GitHub sits at the time of this commit at 10 open and 76 closed . Agda ’s issue label “false” tracks the proofs of false that Ag

agent-weirdmachines

Formal Theory of Weird Machines in Agents A weird machine [1] is the unintended computational substrate an attacker programs when individually-legitimate operations compose into capabilities their des

ai-control-proof-carrying-code

AI Control via Proof-Carrying Code Proof-carrying code [1] and AI Control [2] , [3] are two literatures about the same problem: keep an untrusted producer on the far side of an interface and let only

audit-log-integrity

Audit Log Integrity and Session State Channels The audit log is the primary accountability mechanism for deployed AI systems. Regulators, red teams, and incident responders all depend on the same thin

capability-accumulation

Capability-Safe Tool Interfaces A single tool call in an agentic harness is usually benign. Read a file, run a shell command, make an HTTP request — each is authorized individually by whatever permiss

context-window-integrity

Context Window Integrity Under Adversarial Length A model’s context window is a fixed-size resource. The serving harness fills it by concatenating, in priority order, a system prompt, developer instru

edge-policy-verification

Edge Policy Verification The API gateway in front of an inference service decides two things on every request: whether the caller is who they claim to be, and whether they have exceeded their allowed

neuralese-governance

Governance and Interpretability of Neuralese Proof Stacks table stakes is that proofs are cheap. We assume a proof oracle will make code and proofs that they’re correct too cheap to meter. One idea is

oci-runtime-hardening

OCI Runtime Hardening The standard container runtime, runc , implements the OCI specification atop Linux namespaces and cgroups. It was never designed to resist an adversary inside the container, and

sampler-verification

Sampler Integrity and Determinism The sampler is the last piece of code between the model’s output logits and the token that reaches the user. It applies temperature scaling, top- k or top- p filterin

spec-elicitation

Specification Elicitation and Validation If proofs are cheap and specs are expensive, then specs are the bottleneck. Every widget in this document terminates in a spec: a verified sampler is verified

verified-input-parsers

Verified Input Parsers Every boundary in the AI serving stack where text is parsed or transformed is a security boundary. The abuse classifier preprocessor tokenizes, truncates, and encodes user input

weight-integrity

Weight Integrity and Kernel Supply Chain Model weights are the highest-value static artifact in an ML deployment. A frontier model’s checkpoint runs tens to hundreds of gigabytes, sharded across a dis

← all layers