Specification Elicitation and Validation

If proofs are cheap and specs are expensive, then specs are the bottleneck. Every widget in this document terminates in a spec: a verified sampler is verified against some statement of what a sampler should do, an OCI runtime is hardened against some threat model, an IAM policy is checked against some notion of least privilege. The spec is where human intent meets machine-checkable formalism, and it is the part that no proof oracle can write for you — or rather, if a proof oracle writes it for you, you have quietly handed over the thing you were trying to keep.

Two subproblems:

  • Elicitation. Getting the spec out of the stakeholder’s head and into a formal language, without losing the parts they could not articulate. The classical software-engineering requirements-gathering literature is relevant but insufficient — we need elicitation techniques that produce artifacts a kernel can consume, not prose a PM can sign off on.
  • Validation. Convincing yourself that the spec you wrote is the spec you meant. A proof of P against the wrong P is worse than useless: it launders a bug into a theorem. The property-based testing tradition [1] and the spec-mining tradition [2] are the two complementary handles we have on this — the first forces you to execute the spec against concrete inputs, the second infers candidate specs from observed behavior for the human to ratify or reject.

TODO: discuss the relationship to § — if the proof substrate is itself opaque, spec elicitation becomes unsolvable rather than merely hard, which is why we treat that as a separate enabler.

TODO: discuss differential specification (two independently elicited specs, checked for agreement) as a validation technique.

Solution/project Sketch

TODO: flesh out. Candidate directions: interactive spec-synthesis from examples, lightweight formal methods for spec sanity-checking, spec-level mutation testing, and human-factors work on what kinds of formal notation domain experts can actually read and write.

Bibliography

  • [1] K. Claessen and J. Hughes, “QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs,” in Proceedings of the 5th ACM SIGPLAN International Conference on Functional Programming (ICFP), 2000, pp. 268–279. doi: 10.1145/351240.351266.
  • [2] G. Ammons, R. Bodík, and J. R. Larus, “Mining Specifications,” in Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2002, pp. 4–16. doi: 10.1145/503272.503275.

// stack context

[execution-harness] Execution Harness

Execution Harness The execution harness is the outermost software layer of an ML deployment: the wrapper that actually calls the model and handles its inputs and outputs. A user’s prompt enters here, gets tokenized and batched, passes through the model, and returns as a completion. Because every int

[software-framework] Software & ML Framework

Software and ML Framework Layer Between the orchestration infrastructure below and the execution harness above sits the software that actually defines and compiles a model: the frameworks researchers write against, the compilers that lower their code to GPU kernels, and the sprawling dependency grap

[orchestration-cloud] Orchestration & Cloud

Orchestration and Cloud Layer The orchestration layer manages the compute resources that ML workloads run on: scheduling jobs across GPU clusters, routing traffic between them, and controlling who can access what. It sits below the frameworks and above the firmware — a layer of distributed systems s

← all problems