Recursa

The Temporal Gap

Internal verification is necessary but insufficient. Every system in the stack verifies its own invariants, whether by proof, test, or attestation, but none can verify behaviour across the stack boundary or across time.

System	Internal Verification	What It Cannot Verify
EconLib4	253 Lean theorems	Whether downstream systems correctly invoke these theorems
CSC	Wind tunnel + chaos monkey	Whether compiled pipelines behave correctly when Elan dispatches them at scale
TokenGov	7 formally stated invariants	Whether budget allocation improves actual business outcomes over time
Spectral	Hypergraph-closure safety proofs	Whether the speculum remains valid when the underlying systems change
LegalLean	87 Lean theorems	Whether legal reasoning survives integration with OpenCompliance evidence chains
OpenCompliance	Schema conformance testing	Whether compliance posture reports are accurate against real regulatory scenarios
CCAP	Capability attestation	Whether end-to-end protocol execution preserves trust under adversarial conditions
Elan	1,119 tests	Whether the orchestration layer correctly coordinates all downstream systems simultaneously
LegalEngine	Production traffic	Whether formal verification translates to real-world outcomes
FiduciaryScope	Validates operator licence at gate	Whether operators providing `liability_acceptance_hash` are actually licensed in their declared jurisdiction

Spectral computes the speculum at a point in time. But the stack evolves — code changes, theorem counts increase, invariants are added or modified, integration surfaces shift. No system currently answers: did this week’s changes make the integrated stack better, worse, or equivalent? This is the temporal gap.

The Solution

Scenario Engine

Synthesises realistic multi-agent scenarios exercising all integration paths — coalition formation, budget disputes, compliance audits, cross-boundary handoffs, adversarial injection, and full stack integration.

Trace Capture

Instruments all 9 systems with a unified trace format. Every theorem invoked, every allocation, every proof — captured with deterministic input/output hashes, durations, and proof status.
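A minimal sketch of what one unified trace event could look like, assuming SHA-256 over canonical JSON as the deterministic hash. The `TraceEvent` type, its field names, and the `proof_status` values are hypothetical and are not taken from the Recursa schema.

```python
import hashlib
import json
from dataclasses import dataclass


def stable_hash(payload) -> str:
    """SHA-256 over canonical JSON, so equal payloads always hash equally."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class TraceEvent:          # hypothetical shape, for illustration only
    system: str            # e.g. "TokenGov"
    operation: str         # hypothetical operation label
    input_hash: str
    output_hash: str
    duration_ms: float
    proof_status: str      # hypothetical values: "proved", "unproved", "n/a"


def capture(system, operation, inputs, outputs, duration_ms, proof_status):
    """Build one trace event; only hashes of the payloads are stored."""
    return TraceEvent(system, operation, stable_hash(inputs),
                      stable_hash(outputs), duration_ms, proof_status)
```

Sorting keys before hashing means two runs that pass the same logical input in a different key order still produce the same `input_hash`, which is what makes traces comparable across versions.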

Differential Oracle

Compares traces from version v against version v-1. Classifies every change as a regression, improvement, or drift. Supports golden, differential, and property oracle modes.
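The three-way classification can be sketched as follows. The exact rules are assumptions made for illustration: a regression is a scenario that passed in v-1 and fails in v, an improvement is the reverse, and drift is an unchanged pass/fail status with a different output hash. `ScenarioResult` is a hypothetical type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioResult:      # hypothetical per-scenario summary of a trace
    passed: bool
    output_hash: str


def classify(prev: ScenarioResult, curr: ScenarioResult) -> str:
    """Classify one scenario's change between version v-1 and version v."""
    if prev.passed and not curr.passed:
        return "regression"
    if not prev.passed and curr.passed:
        return "improvement"
    if prev.output_hash != curr.output_hash:
        return "drift"        # same verdict, different observable output
    return "unchanged"


def diff_versions(prev_run: dict, curr_run: dict) -> dict:
    """Classify every scenario id present in both runs."""
    shared = prev_run.keys() & curr_run.keys()
    return {sid: classify(prev_run[sid], curr_run[sid]) for sid in shared}
```

Scenarios that exist in only one of the two runs are skipped here; a real oracle would need a policy for added and removed scenarios.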

Recursive Loop

Generate → execute → trace → compare → improve → recurse. Meta-recursion tests the test generator itself — if scenarios stop finding issues, the generator is flagged as insufficient.

Architecture

The Temporal Envelope

┌─────────────────────────────────────────────────────────┐
│                        RECURSA                          │
│         Temporal Envelope · Scenario → Trace → Δ        │
│                                                         │
│  ┌───────────────────────────────────────────────────┐  │
│  │ Layer 4: LegalEngine · CCAP                       │  │
│  │ Layer 3: LegalLean · OpenCompliance               │  │
│  │ Layer 2: Spectral                                 │  │
│  │ Layer 1: Elan · CSC · TokenGov                    │  │
│  │ Layer 0: EconLib4                                 │  │
│  └───────────────────────────────────────────────────┘  │
│                                                         │
│  scenarios(t) → stack(v) → traces(v,t) → Δ(v, v-1)     │
└─────────────────────────────────────────────────────────┘

RecursaScore

RecursaScore(v) = w₁·Correctness + w₂·Safety + w₃·Ergotropy + w₄·ProofDensity - w₅·Latency - w₆·Regressions
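As a sketch, the composite can be computed as a weighted sum of rewarded terms minus a weighted sum of penalised terms. The weights are not specified in this document, so the values in the usage example are placeholders, and the sketch assumes every metric (including latency) has already been normalised to a comparable 0 to 1 scale.

```python
REWARDED = ("correctness", "safety", "ergotropy", "proof_density")
PENALISED = ("latency", "regressions")


def recursa_score(metrics: dict, weights: dict) -> float:
    """Weighted sum of rewarded metrics minus weighted sum of penalised ones."""
    gain = sum(weights[name] * metrics[name] for name in REWARDED)
    penalty = sum(weights[name] * metrics[name] for name in PENALISED)
    return gain - penalty


# Placeholder weights and metric values, for illustration only.
example_weights = {"correctness": 0.3, "safety": 0.3, "ergotropy": 0.1,
                   "proof_density": 0.1, "latency": 0.1, "regressions": 0.1}
example_metrics = {"correctness": 0.9, "safety": 0.8, "ergotropy": 0.5,
                   "proof_density": 0.6, "latency": 0.2, "regressions": 0.1}
example_score = recursa_score(example_metrics, example_weights)
```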

Benchmark Metrics

Metric	Formula	Source
Correctness	`Σ(scenario_pass) / Σ(scenarios)`	Oracle engine
Safety Coverage	`Σ(speculum_valid) / Σ(coalition_scenarios)`	Spectral traces
Ergotropy	`useful_output_tokens / total_tokens`	TokenGov traces
Tersiture	`semantic_content / token_count`	EconLib4 SemanticCompression
Latency	`p50, p95, p99 of trace.duration`	Trace engine
Proof Density	`proved_steps / total_steps`	Trace engine
Regression Rate	`regressions(v) / scenarios`	Differential oracle
Improvement Rate	`improvements(v) / scenarios`	Differential oracle
Drift Rate	`drifts(v) / scenarios`	Differential oracle
Recursive Gain	`score(v) - score(v-1)`	Benchmark composite
ConfabulumRate	`halt_events / total_pipeline_runs`	CSC.ConfabulumRate (Phase 1) — Now instrumented
CertaintyVocab Dist.	`verified_outputs / total_outputs`	CSC.CertaintyVocabulary (Phase 1) — Now instrumented
EscalationGate Rate	`escalation_halts / material_decisions`	CSC.EscalationGate (Phase 3) — Now instrumented
NormfallStatus	`active_normfalls / tracked_norms`	TokenGov.NormfallAlert (Phase 5) — Now instrumented
TruthfulQA Gate	`stack_score ≥ 0.95` (exits 1 on fail)	run_regression_truthfulqa.exs — Regression gate live
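Most rows in the table are simple ratios, and the latency row is a set of percentiles. The sketch below shows one way to compute both; the nearest-rank percentile is one of several common definitions and is an assumption, not necessarily the one the trace engine uses.

```python
import math


def rate(numerator: int, denominator: int) -> float:
    """Ratio metric such as Correctness or Proof Density; 0.0 when empty."""
    return numerator / denominator if denominator else 0.0


def percentile(values, p: int):
    """Nearest-rank percentile of a non-empty sequence, p in 1..100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(durations_ms) -> dict:
    """p50, p95 and p99 of trace durations."""
    return {f"p{p}": percentile(durations_ms, p) for p in (50, 95, 99)}
```

For example, `rate(proved_steps, total_steps)` gives Proof Density and `rate(regressions, scenarios)` gives Regression Rate for a version.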

Trace Infrastructure — Now Available

Phase 2 built exactly the telemetry schema Recursa needs: CSC.GroundtraceRecord (20 fields: record_id, run_id, subtask_id, adapter, model_id, prompt_hash, tokens_in, tokens_out, latency_ms, confidence_score, score, confabulum_verdict, certainty_vocab, prev_record_hash, record_hash + 5 more). BenchArena emits per-question groundtrace records to audit_store_<run_id>.jsonl — append-only JSON-Lines with SHA-256 hash chain. Recursa can consume these files directly as v1 trace inputs.
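Before consuming an audit file, a reader would typically verify the hash chain. The sketch below assumes that `record_hash` is the SHA-256 of the record's canonical JSON with the `record_hash` field itself excluded, and that the first record carries a null `prev_record_hash`. Both conventions are assumptions; the actual hashing scheme used by CSC.GroundtraceRecord is not described here.

```python
import hashlib
import json


def compute_record_hash(record: dict) -> str:
    """Hash every field except record_hash (assumed convention)."""
    body = {k: v for k, v in record.items() if k != "record_hash"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_record(lines: list, fields: dict) -> None:
    """Append one JSON line, linking it to the previous record's hash."""
    prev_hash = json.loads(lines[-1])["record_hash"] if lines else None
    record = dict(fields, prev_record_hash=prev_hash)
    record["record_hash"] = compute_record_hash(record)
    lines.append(json.dumps(record))


def verify_chain(lines) -> bool:
    """True if every record links to its predecessor and hashes correctly."""
    prev_hash = None
    for line in lines:
        record = json.loads(line)
        if record.get("prev_record_hash") != prev_hash:
            return False
        if record.get("record_hash") != compute_record_hash(record):
            return False
        prev_hash = record["record_hash"]
    return True
```

`verify_chain` accepts any iterable of JSON lines, so it can be passed an open file handle for an `audit_store_<run_id>.jsonl` file as well as an in-memory list.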

Stack Integration

Recursa consumes outputs from every system and provides regression reports, trend data, and improvement signals back across the stack.

EconLib4

Consumes: SemanticCompression.Groundtrace, Information.Entropy, Learning.Regret

Provides: Regression reports; semantic preservation scores across versions

CSC

Consumes: SkillDAG definitions, wind-tunnel paraphrase sets

Provides: Scenario-derived paraphrases; new wind-tunnel inputs from scenario corpus

TokenGov

Consumes: Budget snapshots, allocation history, yoneme registry

Provides: Ergotropy trend (is useful-work-per-token improving over versions?)

Spectral

Consumes: Speculum snapshots, π_safe proofs

Provides: Temporal speculum diff, Δ(speculum(v), speculum(v-1))

Elan

Consumes: Process topology, supervision tree

Provides: Trace-derived supervision hints; which topologies produce better outcomes

LegalLean

Consumes: Rule formalisation database

Provides: Legal reasoning scenario seeds from existing rule corpus

OpenCompliance

Consumes: Evidence schema, control library

Provides: Compliance regression alerts (did a code change break a compliance property?)

CCAP

Consumes: Trace format specification, attestation protocol

Provides: Cross-boundary trace capture in standard format; regression reports

LegalEngine

Consumes: Production scenario templates, outcome data

Provides: Synthetic scenarios seeded from real-world patterns; outcome tracking

Scenario Difficulty Escalation

Scenarios progressively increase in complexity. When the stack improves, Recursa raises the difficulty. When it regresses, Recursa simplifies the scenarios to isolate the fault.

Difficulty Dimensions

Dimension	Easy	Medium	Hard	Adversarial
Agent count	2	5	20	100
Coalition depth	1 (flat)	2 (nested)	3+ (deep)	Dynamic (join/leave)
Forbidden set size	1	5	20	Evolving
Budget pressure	Abundant	Constrained	Scarce	Adversarial hoarding
Regulatory change	None	Minor amendment	Major revision	Conflicting jurisdictions
Failure injection	None	Single crash	Cascade	Byzantine
Temporal span	Single step	Multi-step	Multi-round	Multi-session
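The table above can be carried into a scenario generator as a small lookup. This sketch copies three of the seven dimensions verbatim and assumes that an integer difficulty level maps onto the four tiers by clamping; that mapping is an assumption, as are the names `DIFFICULTY_TIERS` and `tier_for`.

```python
# Values copied from the difficulty table; only three dimensions are shown.
DIFFICULTY_TIERS = {
    1: {"name": "Easy", "agent_count": 2,
        "failure_injection": "None", "temporal_span": "Single step"},
    2: {"name": "Medium", "agent_count": 5,
        "failure_injection": "Single crash", "temporal_span": "Multi-step"},
    3: {"name": "Hard", "agent_count": 20,
        "failure_injection": "Cascade", "temporal_span": "Multi-round"},
    4: {"name": "Adversarial", "agent_count": 100,
        "failure_injection": "Byzantine", "temporal_span": "Multi-session"},
}


def tier_for(level: int) -> dict:
    """Clamp an integer difficulty level into the four tiers (assumption)."""
    clamped = min(max(level, 1), len(DIFFICULTY_TIERS))
    return DIFFICULTY_TIERS[clamped]
```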

Adversarial Scenario Classes

FiduciaryScope Bypass

Attempt to elicit financial advice without a validly populated FiduciaryScope (missing licensed_entity, invalid liability_acceptance_hash, unauthorised action type). Expected: pipeline halts at FiduciaryScope gate. Tests: CSC.FiduciaryScope.authorise/2 returns {:halt, :unlicensed_operator, ...}.

Escalation Policy

if RecursaScore(v) > RecursaScore(v-1) + ε:
    difficulty(t+1) = difficulty(t) + 1     -- stack is improving; challenge harder
elif |RecursaScore(v) - RecursaScore(v-1)| ≤ ε:
    difficulty(t+1) = difficulty(t)          -- plateau; explore different scenario types
else:
    difficulty(t+1) = max(1, difficulty(t) - 1)  -- regression; simplify to isolate
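A runnable sketch of the policy, reading the plateau case as "within ε of the previous score in either direction". That reading, the default value of `epsilon`, and the function name are assumptions.

```python
def next_difficulty(score_v: float, score_prev: float,
                    difficulty: int, epsilon: float = 0.01) -> int:
    """Return difficulty(t+1) given the scores of versions v and v-1."""
    gain = score_v - score_prev
    if gain > epsilon:
        return difficulty + 1            # stack is improving; challenge harder
    if gain < -epsilon:
        return max(1, difficulty - 1)    # regression; simplify to isolate
    return difficulty                    # plateau; keep level, vary scenario types
```

The floor of 1 matches the `max(1, ...)` in the policy, so repeated regressions cannot push the difficulty below the easiest level.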

New Lexicon

confabulum

A synthetic scenario that appears realistic but exercises a never-before-tested integration path — Recursa’s primary unit of test generation.

ergodrift

Long-term trend in ergotropy across versions. Are we getting more useful work per token over time, or less? The derivative of efficiency.

temporal speculum

The diff Δ(speculum(v), speculum(v-1)) — how the safety surface evolved between versions. Did the safe operating envelope grow or shrink?

recursive gain

RecursaScore(v) − RecursaScore(v-1). The measurable improvement (or regression) from one version to the next. The fundamental unit of progress.
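Both recursive gain and ergodrift are differences over a per-version series. A small sketch, assuming ergodrift is summarised as the mean per-version change in ergotropy (other summaries, such as a fitted slope, would also fit the definition):

```python
def deltas(series):
    """Finite differences: series[v] - series[v-1] for each version v."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def recursive_gain(scores):
    """RecursaScore(v) - RecursaScore(v-1) for every version after the first."""
    return deltas(scores)


def ergodrift(ergotropy_by_version) -> float:
    """Mean per-version change in ergotropy; positive means improving."""
    steps = deltas(ergotropy_by_version)
    return sum(steps) / len(steps) if steps else 0.0
```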

Metacognitive Dashboard

Wave 4 instruments Recursa with metacognitive observability: risk-weighted coverage oracles, structured reporting, and SLO monitoring across the 8 monitored stack systems.

Risk-Weighted Coverage

CoverageOracle

Bipartite graph mapping scenarios to stack layers, weighted by risk. Computes a weighted coverage score (0.0–1.0) that accounts for severity of uncovered scenario classes — a high score means critical paths are exercised, not just a high count.

5 built-in scenario classes: :hallucination_breach, :drift_alert, :laxity_overflow, :confabulum_spike, :sorry_depth_critical

Coverage score: 0.0 – 1.0
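One way to compute a risk-weighted score over the bipartite graph is to weight each scenario-class-to-layer edge by the class's risk and divide covered weight by total weight. The formula, the risk values, and the function name below are assumptions used to illustrate the idea; they are not the CoverageOracle implementation.

```python
def weighted_coverage(required: dict, exercised: set, risk: dict) -> float:
    """Risk-weighted share of required (scenario class, layer) edges covered.

    required:  scenario class -> set of layers it should reach
    exercised: set of (scenario class, layer) edges hit by executed scenarios
    risk:      scenario class -> risk weight
    """
    total = sum(risk[cls] * len(layers) for cls, layers in required.items())
    if total == 0:
        return 0.0
    covered = sum(risk[cls]
                  for cls, layers in required.items()
                  for layer in layers
                  if (cls, layer) in exercised)
    return covered / total
```

With a high-risk class only half covered, the weighted score falls below the plain edge count: covering 2 of 3 edges can score 4/7 rather than 2/3 when the missing edge belongs to a class weighted three times as heavily.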

Structured Reporting

Markdown Report

Recursa.Report.generate/1 emits structured reports with 5 sections covering the full improvement lifecycle. Each report is version-stamped and diff-ready.

Report sections: SLO Summary, Improvement Log, Sorry Depth, Drift Alerts, Recommendations.

Invoke from CLI:

mix recursa.report --format markdown

SLO Monitoring

SLO Integration

Recursa monitors SLO thresholds from 8 stack systems simultaneously. When a threshold is breached, Recursa escalates via MetaBus, the cross-system event bus, triggering downstream alerting and recovery flows.

Monitored systems: EconLib4, CSC, TokenGov, Spectral, Elan, LegalLean, OpenCompliance, CCAP.

MetaBus escalation emits structured breach events with layer id, metric name, current value, and threshold delta for downstream consumers.
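A sketch of how a breach event with those four fields might be built. The `Slo` type, the key names, and the `higher_is_better` flag are hypothetical, and publishing to MetaBus is left out because its API is not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Slo:                       # hypothetical SLO definition
    layer_id: str                # e.g. "CSC"
    metric: str                  # e.g. "confabulum_rate"
    threshold: float
    higher_is_better: bool = True


def check_slo(slo: Slo, current: float) -> Optional[dict]:
    """Return a structured breach event, or None if the SLO is met."""
    if slo.higher_is_better:
        breached = current < slo.threshold
    else:
        breached = current > slo.threshold
    if not breached:
        return None
    return {
        "layer_id": slo.layer_id,
        "metric": slo.metric,
        "current_value": current,
        "threshold_delta": current - slo.threshold,  # signed distance from threshold
    }
```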

Position in The Stack

Recursa is the temporal envelope. It is not a layer in the vertical hierarchy but the system that wraps all nine stack systems and answers the question no individual system can: is the integrated stack getting better over time?

Cross-boundary verification gap

Per-system proofs guarantee component correctness. They cannot guarantee that the integrated stack behaves correctly under realistic end-to-end conditions, or that it continues to do so as the codebase evolves.

Temporal evolution tracking

Spectral computes the speculum at a point in time. Recursa computes the temporal speculum, the differential across versions. Without it, there is no way to measure whether a refactor grew or shrank the safe operating envelope.

Recursive self-improvement

Implements the full recursive loop: evaluate → identify → design → validate → deploy → recurse. Each iteration must prove it does not regress.

Meta-recursion

Recursa tests its own scenario quality. If generated scenarios fail to discover issues across N consecutive versions, the scenario generator itself is flagged as insufficient.
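The meta-recursion check reduces to a window test over per-version issue counts. A minimal sketch, with a hypothetical function name and the assumption that "issues discovered" is tracked as one integer per version:

```python
def generator_insufficient(issues_found_per_version, n: int) -> bool:
    """True when each of the last n versions surfaced zero issues."""
    if n <= 0 or len(issues_found_per_version) < n:
        return False                 # not enough history to judge
    return all(count == 0 for count in issues_found_per_version[-n:])
```

For example, `generator_insufficient([3, 0, 0, 0], 3)` flags the generator, while a single finding inside the window resets the check.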

TruthfulQA Regression Gate — First Recursa Seed

Regression Gate Live

The first Recursa-style regression gate is now live: run_regression_truthfulqa.exs runs 15 TruthfulQA questions through the stack adapter and exits 1 if accuracy < 95%. Current score: 53.3% (target not yet met — RAG pipeline built, regression closure in progress). This gate is the seed of RecursaScore’s Correctness metric.
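The gate's pass/fail logic can be sketched in a few lines. This is an illustration of the exit-code rule only, not the contents of run_regression_truthfulqa.exs; the function name is hypothetical.

```python
def gate_exit_code(correct: int, total: int, threshold: float = 0.95) -> int:
    """Return 0 when accuracy meets the threshold, otherwise 1."""
    accuracy = correct / total if total else 0.0
    return 0 if accuracy >= threshold else 1
```

With 15 questions the threshold is strict: 14 of 15 correct is 93.3%, which still fails, so the gate passes only when all 15 answers are correct.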