problems / audit-log-integrity / widget
authors: quinn, maxvh
The audit log is the primary accountability mechanism for deployed AI systems. Regulators, red teams, and incident responders all depend on the same thing: a complete, tamper-evident record of what the model did. If log entries can be suppressed, falsified, or selectively deleted, the system is unauditable — and any safety claim that rests on “we can review the logs” collapses. Append-only storage and cryptographic log chaining (in the style of Certificate Transparency [1]) provide tamper evidence, but the implementations are not formally verified. The gap between “we use Merkle trees” and “no API call sequence available to the serving infrastructure can modify a committed entry” is exactly the gap formal methods fills.
There is a subtler problem on the same boundary. When a model’s output or summary is written back to session state for future context windows, a persistent side channel is created. A misaligned model can encode messages in session state that survive request boundaries — instructions to future invocations of itself, breadcrumbs for capability accumulation, or data staged for later exfiltration. This channel is invisible to the output filter, which inspects the response returned to the user, not the write-back to the session store. The threat model is not hypothetical: any system that persists conversation summaries, scratchpads, or tool-call results across requests has this surface.
The two problems are linked by a common structure: both involve a write path that must satisfy integrity constraints, and both fail silently when the constraints are violated. A corrupted log and a poisoned session store both look normal from outside the system until someone audits the contents — and the audit itself depends on the log being intact.
Build a TLA+ model of the serving infrastructure’s request lifecycle. State variables: the committed log, the in-flight request set, the per-principal permission relation. Transitions: a request enters the gateway, dispatches to forward-pass execution, emits a log entry, commits the entry to storage, returns a response. Two properties discharge against the same model. Tampering safety: in every reachable state, no transition available to any principal (model, harness, orchestrator) modifies or deletes an already-committed log entry — a safety check over the permission relation and the storage state machine. Logging completeness, the liveness counterpart: every request that reaches forward-pass execution is eventually followed by a log-commit transition before the corresponding response-emit transition, catching suppression by the same infrastructure that handles writing. Sketch A treats the log as a passive resource and verifies the surrounding access-control and lifecycle machinery that protects it.
An alternative architecture closes the same gap from the other direction: rather than verifying that the permission system prevents tampering, make tampering cryptographically detectable inside the log itself. Append-only Merkle-tree logs of the kind Certificate Transparency [1] uses (also sigstore’s Rekor [2]) publish signed tree heads that any third party can use to check, via consistency proofs, that two published heads belong to the same history. A serving infrastructure that retroactively removes a committed entry cannot then issue a new signed head consistent with the previous one — the tamper-evidence rests on the structure, not on the access-control system.
The verification target is the log component itself, against an abstract specification of an append-only Merkle log. Three local properties:
append(entry) operation produces a tree whose root is a valid extension of the previous root; no implementation path produces a signed root that omits a previously committed entry.consistency_proof(old_root, new_root) is sound (it only verifies for genuinely-consistent trees) and complete (it verifies whenever the trees are consistent).These are local properties of a single program and discharge cleanly in Dafny or F* as a refinement against the abstract spec. The result is a verified log component whose tamper-evidence guarantees rest on a small cryptographic structure rather than the permission graph surrounding it — a property that any third party with two signed tree heads can re-check independently of the operator, which Sketch A’s approach cannot offer. Sketch A and Sketch B are alternatives: pick one (or run both, treating Sketch B’s verified log as the storage primitive Sketch A’s permission model protects).
For the persistent-side-channel problem, apply information flow analysis. Model the session store as shared memory and the write-back function as the interface under test. The noninterference property: the content written to session state must be a function only of authorized inputs (the model’s declared output schema, the user’s message, the tool results). Any information flow from the model’s hidden state to the session store that is not mediated by the declared output path is a violation. This is verifiable as a pre/postcondition contract on the write-back function — specify the allowed content, prove the function produces only that content. Dafny is a natural fit: the write-back function is small, the specification is a refinement type on the output, and the verifier handles the quantifier reasoning automatically.
// stack context
[execution-harness] Execution Harness
Execution Harness The execution harness is the outermost software layer of an ML deployment: the wrapper that actually calls the model and handles its inputs and outputs. A user’s prompt enters here, gets tokenized and batched, passes through the model, and returns as a completion. Because every int…
[orchestration-cloud] Orchestration & Cloud
Orchestration and Cloud Layer The orchestration layer manages the compute resources that ML workloads run on: scheduling jobs across GPU clusters, routing traffic between them, and controlling who can access what. It sits below the frameworks and above the firmware — a layer of distributed systems s…