Lean 4 + Elixir/OTP

Composable Skill
Compiler

English → Verified Agent Pipelines

Compile natural language into typed, capability-bounded, budget-constrained DAGs with machine-checked Lean 4 proofs at every handoff point.

13
Lean 4 Invariants
583
Elixir Tests
8
Phase 1–5 Modules
5
Fiduciary Gap Closures
View on GitHub

Why Current Agent
Orchestration Fails

01

Tool Selection Lottery

LLMs perform probabilistic retrieval under context pressure. The same prompt on the same model can select different tools on successive invocations. Selection is not deterministic, not provably correct, and not auditable.

02

Context Bloat

Sub-agents receiving full conversation history carry forward irrelevant tokens. This wastes budget, dilutes attention, and introduces surface area for confabulation.

03

Scope Creep

Without declared capability envelopes, a skill authorised to READ a calendar can attempt to WRITE to it. The capability boundary is implicit and unenforced.

04

P(success) Degradation

For a pipeline of n steps each with success probability pi, joint success = ∏i pi. A 10-step pipeline at 95% per step achieves only 59.9% end-to-end. At 90%: 34.9%.

Skill Composition
as Compilation

Compilation Stage CSC Equivalent
Lexical analysisLLM-assisted tokenisation of English prompt
Parsing / ASTSemantic IR extraction (intent, entities, constraints)
Type checkingLean 4 handoff proofs at every DAG edge
OptimisationParallel branch detection, budget partitioning
Code generationSkillDAG with execution mode, triage map, traces
Test harnessWind tunnel + chaos monkey

What Gets Proved

Type Safety

output_type(source) ≼ input_type(target)

Structural subtyping: source output has at least the fields required by target input, with compatible types.

Capability Safety

capability_envelope(target) ⊆ allowed_after(source)

No skill escalates beyond its declared envelope. Composed envelope bounded by intersection.

Budget Safety

budget(source) + budget(target) ≤ remaining_budget

Total DAG budget proved ≤ available quota before any agent executes.

Confabulum Gate Soundness

confabulum_gate st = .Pass → st.confabulum_score < threshold

Synthesis only proceeds if all upstream SubTasks pass the ConfabulumRate gate (score < 0.65 on all 9 types). Proved in CSC/ConfabulumGate.lean.

EscalationGate (I11)

material_decision ∧ execute_allowed → human_auth_token.isSome

Material decisions require a human auth token. escalation_gate_sound proved in CSC/EscalationGate.lean. Maps to FINRA Rule 3110/3120.

Groundtrace Tamper-Evidence

valid_chain chain → modify chain[i] → ¬valid_chain chain

Hash-chain integrity: any modification to a historical record invalidates all successors. chain_tamper_evident proved in CSC/Groundtrace.lean.

Pathway Compliance (Compositional)

pathway_compliant dag ↔ ∀ e ∈ dag.edges, edge_norm_preserving e

Compliance is a path property, not just terminal-state. pathway_compliance_compositional proved. Addresses Goodhart's Law applied to agentic AI.

ClientProfile Freshness (I13)

requires_client_context ∧ execute_allowed → ∃ cp, fresh(cp)

Advice SubTasks cannot execute without a fresh ClientProfile. profile_required_sound proved in CSC/ClientProfile.lean.

Routing Cost-Optimality

route_cost_optimal: minimise token_spend s.t. accuracy ≥ floor

The router minimises token spend under an accuracy floor. route_cost_optimal proved in CSC/Router.lean. 9 theorems total.

All properties are proved in Lean 4. Original three-layer proofs are cached in the HandoffProof structure and replayed on demand. Phase 1–5 theorems extend coverage to epistemic integrity, audit tamper-evidence, agentic risk, and client context freshness. Invariant set: I1–I13.

Wind Tunnel +
Chaos Monkey

Wind Tunnel

Translation Axis

Tests English → SemanticIR robustness. Given a semantically fixed prompt, does paraphrase variation produce the same canonical IR?

  • Generates covering arrays over 8 paraphrase factors
  • Uses factorial test generation
  • If a pipeline can't survive its wind tunnel, it shouldn't be trusted with user data

Chaos Monkey

Pipeline Axis

Tests SkillDAG against input variation. Samples from property space of each skill's input types.

  • Monte Carlo over the input space
  • Triages into Green (machine-decidable correct), Amber (human determination required), Red (sorry/gap)
  • Quantifies the pipeline's coverage margin
These two axes are orthogonal. A pipeline can be paraphrase-robust but narrow in coverage, or wide in coverage but paraphrase-fragile. Both must be measured.

Architecture

SkillSignature
structure SkillSignature where
  id               : SkillId
  name             : String
  level            : SkillLevel          -- L1 | L2 | L3
  operation        : Operation           -- READ | WRITE | TRANSFORM
  input_type       : TypeSignature
  output_type      : TypeSignature
  capability_envelope : CapabilityEnvelope
  budget_ceiling   : Nat
  side_effects     : List SideEffect
  safety_class     : SafetyClass         -- safe | confirm | dangerous
HandoffProof
structure HandoffProof where
  edge             : DAGEdge
  type_proof       : TypeProof
  capability_proof : CapabilityProof
  budget_proof     : BudgetProof
SemanticIR
structure SemanticIR where
  intent           : Intent
  entities         : List Entity
  constraints      : List Constraint
  decomposition    : List SubTask
  uncertainty      : List FormalisationBoundary

New Lean 4 / Elixir Modules

Module Description Phase
CSC.Router Task classifier — routes Factual / Structural / Formal to optimal execution path. 9 theorems including route_cost_optimal. P1
CSC.Groundtrace GroundtraceRecord hash-chain schema. Tamper-evident per-SubTask audit trail. chain_tamper_evident proved. P2
CSC.EscalationGate I11 — material decision gate requiring human auth token. FINRA Rule 3110/3120 supervisory substitution. escalation_gate_sound proved. P3
CSC.CompetenceSignal Certainty vocabulary gate — confidence level constrains synthesis language register. Addresses the “digital sociopath” problem. P3
CSC.PathwayAudit DAG edge compliance monitor — deontic obligation set checked at every step. pathway_compliance_compositional proved. P3
CSC.ClientProfile I13 — first-class client context type. Freshness proof enforced structurally. profile_required_sound proved. P4
CSC.DataAdapter Typed interface for grounded data sources (FRED, EDGAR, PerplexitySearch). Includes stale_ground_blocked theorem. P4
CSC.FiduciaryScope Licensed operator wrapper — hard pipeline gate requiring fiduciary acceptance. Integrates with TokenGov.NormfallAlert. P5

Fiduciary AI Gap Closure

The fiduciary AI problem (Lo, MIT Sloan, 2026) is that AI systems deliver correct and incorrect answers with equal confidence. CSC addresses this at five levels:

P2 — Confabulation
Phase 1
ConfabulumGate + RAG grounding + ConfidenceScore propagation. 9-type hallucination taxonomy with pipeline halts. TruthfulQA regression gate in CI.
4 / 5
P4 — Agentic Risk
Phase 3
EscalationGate (I11) + GroundtraceRecord + PathwayAudit + CompetenceSignal. FINRA 3110/3120 supervisory substitution. Deontic compliance checked at every DAG edge.
4.5 / 5
P5 — Governance
Phase 2 + 5
Hash-chain audit trail + telemetry + FiduciaryScope. Append-only AuditStore (Rule 17a-4 path). NormfallAlert monitors DOL/SEC/FINRA regulatory corpus.
4 / 5
P3 — Missing Context
Phase 4
ClientProfile (I13) + DataAdapter (FRED / EDGAR). Staleness invariants enforced structurally. Advice cannot proceed without a resolved, fresh client profile.
3.5 / 5
P1 — Legal Accountability
Phase 5
FiduciaryScope + NormfallAlert (TokenGov). Requires licensed operator. Stack refuses financial advice without signed FiduciaryScope. Engineering ceiling: P1 full closure requires regulatory assignment by a licensed entity.
3 / 5

Lexicon

chorema
The geometric shape of a skill composition graph (chain, tree, diamond, mesh). Determines parallel execution optimisations.
groundtrace
Conjunction of all HandoffProofs in a SkillDAG. If inhabited, the pipeline is safe to execute.
promptome
Full semantic content of a prompt after IR extraction. The invariant preserved through canonicalisation.
normfall
A failure mode where a proof attempt collapses. Sufficient to halt compilation (fail-fast).
clausome
The universe of skill type signatures known to the compiler. The skill registry.
ergotropy
Ratio of useful work to total token expenditure.
tersiture
Maximum semantic density per token.
loquency
Verbosity overhead in agent communication.
witnesshood
Third-party attestation of execution correctness via CCAP trace.
yoneme
Cross-account task identity token for deduplication.
colimese
Categorical colimit of composed skill types at fan-in nodes.
driftlock
Irreversible capability envelope contraction over time.

Built With

L4

Lean 4

Formal verification, 13 invariants (I1–I13), 4,163+ lines. Machine-checked proofs for type safety, capability safety, budget safety, confabulation gating, tamper-evident audit trails, agentic risk controls, and client context freshness.

Ex

Elixir / OTP

Runtime, supervision trees, fault tolerance, 5,800+ lines. BEAM VM provides lightweight processes with preemptive scheduling for concurrent pipeline execution.

CC

CCAP Protocol

ClawCombinator Agent Protocol for execution traces and witnesshood. Provides third-party attestation of execution correctness across agent boundaries.

Part of a Larger Stack

TokenGov

Mechanism-design token budget allocation (VCG/DSIC). Ensures truthful bidding for compute resources across competing agent pipelines.

Elan

CompanyAsCode orchestrator. Models organisational structure as typed, composable skill graphs with formal governance constraints.

LegalLean

Formal legal reasoning in Lean 4. Encodes regulatory obligations as propositions and verifies compliance before pipeline execution.

EconLib4

Formalised economics in Lean 4 — 253 theorems. Provides the auction theory and mechanism design foundations for TokenGov.

OpenCompliance

Regulatory compliance framework. Maps jurisdiction-specific requirements to machine-checkable constraints on agent behaviour.