Composable Skill Compiler — English → Verified Agent Pipelines

The Problem

Why Current Agent
Orchestration Fails

01

Tool Selection Lottery

LLMs perform probabilistic retrieval under context pressure. The same prompt on the same model can select different tools on successive invocations. Selection is not deterministic, not provably correct, and not auditable.

02

Context Bloat

Sub-agents receiving full conversation history carry forward irrelevant tokens. This wastes budget, dilutes attention, and introduces surface area for confabulation.

03

Scope Creep

Without declared capability envelopes, a skill authorised to READ a calendar can attempt to WRITE to it. The capability boundary is implicit and unenforced.

04

P(success) Degradation

For a pipeline of n steps each with success probability p_i, joint success = ∏_i p_i. A 10-step pipeline at 95% per step achieves only 59.9% end-to-end. At 90%: 34.9%.

The Metaphor

Skill Composition
as Compilation

Compilation Stage	CSC Equivalent
Lexical analysis	LLM-assisted tokenisation of English prompt
Parsing / AST	Semantic IR extraction (intent, entities, constraints)
Type checking	Lean 4 handoff proofs at every DAG edge
Optimisation	Parallel branch detection, budget partitioning
Code generation	SkillDAG with execution mode, triage map, traces
Test harness	Wind tunnel + chaos monkey

English Prompt

SemanticIR

SkillDAG

HandoffProofs

Execution

Three-Layer Verification

What Gets Proved

Type Safety

output_type(source) ≼ input_type(target)

Structural subtyping: source output has at least the fields required by target input, with compatible types.

Capability Safety

capability_envelope(target) ⊆ allowed_after(source)

No skill escalates beyond its declared envelope. Composed envelope bounded by intersection.

Budget Safety

budget(source) + budget(target) ≤ remaining_budget

Total DAG budget proved ≤ available quota before any agent executes.

Confabulum Gate Soundness

confabulum_gate st = .Pass → st.confabulum_score < threshold

Synthesis only proceeds if all upstream SubTasks pass the ConfabulumRate gate (score < 0.65 on all 9 types). Proved in CSC/ConfabulumGate.lean.

EscalationGate (I11)

material_decision ∧ execute_allowed → human_auth_token.isSome

Material decisions require a human auth token. escalation_gate_sound proved in CSC/EscalationGate.lean. Maps to FINRA Rule 3110/3120.

Groundtrace Tamper-Evidence

valid_chain chain → modify chain[i] → ¬valid_chain chain

Hash-chain integrity: any modification to a historical record invalidates all successors. chain_tamper_evident proved in CSC/Groundtrace.lean.

Pathway Compliance (Compositional)

pathway_compliant dag ↔ ∀ e ∈ dag.edges, edge_norm_preserving e

Compliance is a path property, not just terminal-state. pathway_compliance_compositional proved. Addresses Goodhart's Law applied to agentic AI.

ClientProfile Freshness (I13)

requires_client_context ∧ execute_allowed → ∃ cp, fresh(cp)

Advice SubTasks cannot execute without a fresh ClientProfile. profile_required_sound proved in CSC/ClientProfile.lean.

Routing Cost-Optimality

route_cost_optimal: minimise token_spend s.t. accuracy ≥ floor

The router minimises token spend under an accuracy floor. route_cost_optimal proved in CSC/Router.lean. 9 theorems total.

All properties are proved in Lean 4. Original three-layer proofs are cached in the HandoffProof structure and replayed on demand. Phase 1–5 theorems extend coverage to epistemic integrity, audit tamper-evidence, agentic risk, and client context freshness. Invariant set: I1–I13.

Two-Axis Testing

Wind Tunnel +
Chaos Monkey

Wind Tunnel

Translation Axis

Tests English → SemanticIR robustness. Given a semantically fixed prompt, does paraphrase variation produce the same canonical IR?

Generates covering arrays over 8 paraphrase factors
Uses factorial test generation
If a pipeline can't survive its wind tunnel, it shouldn't be trusted with user data

Chaos Monkey

Pipeline Axis

Tests SkillDAG against input variation. Samples from property space of each skill's input types.

Monte Carlo over the input space
Triages into Green (machine-decidable correct), Amber (human determination required), Red (sorry/gap)
Quantifies the pipeline's coverage margin

These two axes are orthogonal. A pipeline can be paraphrase-robust but narrow in coverage, or wide in coverage but paraphrase-fragile. Both must be measured.

Core Data Structures & Modules

Architecture

SkillSignature

structure SkillSignature where
  id               : SkillId
  name             : String
  level            : SkillLevel          -- L1 | L2 | L3
  operation        : Operation           -- READ | WRITE | TRANSFORM
  input_type       : TypeSignature
  output_type      : TypeSignature
  capability_envelope : CapabilityEnvelope
  budget_ceiling   : Nat
  side_effects     : List SideEffect
  safety_class     : SafetyClass         -- safe | confirm | dangerous

HandoffProof

structure HandoffProof where
  edge             : DAGEdge
  type_proof       : TypeProof
  capability_proof : CapabilityProof
  budget_proof     : BudgetProof

SemanticIR

structure SemanticIR where
  intent           : Intent
  entities         : List Entity
  constraints      : List Constraint
  decomposition    : List SubTask
  uncertainty      : List FormalisationBoundary

Phase 1–5 Additions

New Lean 4 / Elixir Modules

Module	Description	Phase
`CSC.Router`	Task classifier — routes Factual / Structural / Formal to optimal execution path. 9 theorems including `route_cost_optimal`.	P1
`CSC.Groundtrace`	GroundtraceRecord hash-chain schema. Tamper-evident per-SubTask audit trail. `chain_tamper_evident` proved.	P2
`CSC.EscalationGate`	I11 — material decision gate requiring human auth token. FINRA Rule 3110/3120 supervisory substitution. `escalation_gate_sound` proved.	P3
`CSC.CompetenceSignal`	Certainty vocabulary gate — confidence level constrains synthesis language register. Addresses the “digital sociopath” problem.	P3
`CSC.PathwayAudit`	DAG edge compliance monitor — deontic obligation set checked at every step. `pathway_compliance_compositional` proved.	P3
`CSC.ClientProfile`	I13 — first-class client context type. Freshness proof enforced structurally. `profile_required_sound` proved.	P4
`CSC.DataAdapter`	Typed interface for grounded data sources (FRED, EDGAR, PerplexitySearch). Includes `stale_ground_blocked` theorem.	P4
`CSC.FiduciaryScope`	Licensed operator wrapper — hard pipeline gate requiring fiduciary acceptance. Integrates with `TokenGov.NormfallAlert`.	P5

Trustworthiness Roadmap

Fiduciary AI Gap Closure

The fiduciary AI problem (Lo, MIT Sloan, 2026) is that AI systems deliver correct and incorrect answers with equal confidence. CSC addresses this at five levels:

P2 — Confabulation

Phase 1

ConfabulumGate + RAG grounding + ConfidenceScore propagation. 9-type hallucination taxonomy with pipeline halts. TruthfulQA regression gate in CI.

4 / 5

P4 — Agentic Risk

Phase 3

EscalationGate (I11) + GroundtraceRecord + PathwayAudit + CompetenceSignal. FINRA 3110/3120 supervisory substitution. Deontic compliance checked at every DAG edge.

4.5 / 5

P5 — Governance

Phase 2 + 5

Hash-chain audit trail + telemetry + FiduciaryScope. Append-only AuditStore (Rule 17a-4 path). NormfallAlert monitors DOL/SEC/FINRA regulatory corpus.

4 / 5

P3 — Missing Context

Phase 4

ClientProfile (I13) + DataAdapter (FRED / EDGAR). Staleness invariants enforced structurally. Advice cannot proceed without a resolved, fresh client profile.

3.5 / 5

P1 — Legal Accountability

Phase 5

FiduciaryScope + NormfallAlert (TokenGov). Requires licensed operator. Stack refuses financial advice without signed FiduciaryScope. Engineering ceiling: P1 full closure requires regulatory assignment by a licensed entity.

3 / 5

Technical Lexicon

Lexicon

chorema

The geometric shape of a skill composition graph (chain, tree, diamond, mesh). Determines parallel execution optimisations.

groundtrace

Conjunction of all HandoffProofs in a SkillDAG. If inhabited, the pipeline is safe to execute.

promptome

Full semantic content of a prompt after IR extraction. The invariant preserved through canonicalisation.

normfall

A failure mode where a proof attempt collapses. Sufficient to halt compilation (fail-fast).

clausome

The universe of skill type signatures known to the compiler. The skill registry.

ergotropy

Ratio of useful work to total token expenditure.

tersiture

Maximum semantic density per token.

loquency

Verbosity overhead in agent communication.

witnesshood

Third-party attestation of execution correctness via CCAP trace.

yoneme

Cross-account task identity token for deduplication.

colimese

Categorical colimit of composed skill types at fan-in nodes.

driftlock

Irreversible capability envelope contraction over time.

Technology Stack

Built With

L4

Lean 4

Formal verification, 13 invariants (I1–I13), 4,163+ lines. Machine-checked proofs for type safety, capability safety, budget safety, confabulation gating, tamper-evident audit trails, agentic risk controls, and client context freshness.

Ex

Elixir / OTP

Runtime, supervision trees, fault tolerance, 5,800+ lines. BEAM VM provides lightweight processes with preemptive scheduling for concurrent pipeline execution.

CC

CCAP Protocol

ClawCombinator Agent Protocol for execution traces and witnesshood. Provides third-party attestation of execution correctness across agent boundaries.

Ecosystem

Part of a Larger Stack

TokenGov

Mechanism-design token budget allocation (VCG/DSIC). Ensures truthful bidding for compute resources across competing agent pipelines.

Elan

CompanyAsCode orchestrator. Models organisational structure as typed, composable skill graphs with formal governance constraints.

LegalLean

Formal legal reasoning in Lean 4. Encodes regulatory obligations as propositions and verifies compliance before pipeline execution.

EconLib4

Formalised economics in Lean 4 — 253 theorems. Provides the auction theory and mechanism design foundations for TokenGov.

OpenCompliance

Regulatory compliance framework. Maps jurisdiction-specific requirements to machine-checkable constraints on agent behaviour.

Composable SkillCompiler

Why Current AgentOrchestration Fails