Compositional Learning
How the Bastion composition engine discovers attack vectors that no single-domain auditor — and no human — anticipated, and how those discoveries become permanent deterministic governance.
The Problem
Individual security domains (authorization, arithmetic, temporal, state) are well-understood. Professional auditors have checklists for each. Static analysis tools have rules for each. But the most severe real-world exploits are chains — a medium-severity temporal issue enables a critical authorization bypass; a low-severity rounding error accumulates through state transitions into material financial impact.
These composite attack paths are:
- Not in any external source — they emerge from how this project's contracts interact
- Not visible to single-domain auditors — each sees only their domain slice
- The highest-value discoveries — they're the attack paths nobody checks for
No amount of domain-specific expertise catches them, because the vulnerability doesn't exist in any single domain. It exists in the composition.
What Is Compositional Learning
In formal AI terms, compositional learning means a system that:
- Learns atomic primitives — the building blocks of knowledge
- Learns how primitives compose — how blocks interact and chain
- Generalizes to unseen combinations — discovers novel compositions it was never explicitly taught
Bastion implements this with the LLM as the composition engine and the growing vector corpus as the knowledge base:
| Component | Role in compositional learning |
|---|---|
| Domain vectors (auth, arithmetic, temporal, state) | Primitives — the atomic vulnerability building blocks |
| Composition auditor agent | Composition engine — reasons about how primitives interact |
| LLM reasoning | Generalization — discovers novel chains from semantic understanding |
| Growing vector corpus | Accumulated knowledge — each accepted finding becomes a new primitive |
| Deterministic memorialization | Learning persistence — discoveries become permanent governance |
Why this qualifies as compositional learning without traditional ML:
The LLM has internalized composition patterns from its training on millions of security analyses, post-mortems, and academic papers. It doesn't need to be trained on your vectors — it understands how vulnerabilities compose as a general capability. The vector corpus provides the project-specific primitives; the LLM provides the composition reasoning. Each accepted composite becomes a new primitive available for the next run, expanding the composition space. The system learns because the knowledge base it reasons over grows — not because the model's weights change.
How It Works
The Composition Pipeline
Step 1: Load All Domain Vectors as Primitives
The composition auditor reads every vector across all domains. Unlike the 4 focused auditors (which only read their own domain file), the composition auditor sees the entire vulnerability landscape simultaneously.
For each vector, it models:
- What it weakens — what security property fails if exploited
- What it assumes — what security properties must hold for the mitigation to work
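The Step 1 model can be pictured as a small data structure. This Python sketch is a hypothetical illustration, not Bastion's implementation. It assumes each vector's weakens and assumes properties are recorded as plain string labels (such as "freshness", an invented label), and it uses a set intersection to flag cross-domain CHAIN candidates. In Bastion that judgment is made by the LLM. A keyword overlap like this could at most serve as a rough pre-filter.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass(frozen=True)
class Primitive:
    """One domain vector as modeled in Step 1 (field names are hypothetical)."""
    vector_id: str
    domain: str
    weakens: frozenset = field(default_factory=frozenset)  # properties that fail if exploited
    assumes: frozenset = field(default_factory=frozenset)  # properties the mitigation relies on


def chain_candidates(primitives):
    """Yield (a_id, b_id, shared) where exploiting a may break something b assumes.

    Only cross-domain pairs are considered, checked in both directions.
    """
    for x, y in combinations(primitives, 2):
        if x.domain == y.domain:
            continue
        for a, b in ((x, y), (y, x)):
            shared = a.weakens & b.assumes
            if shared:
                yield a.vector_id, b.vector_id, shared


# Usage with the stale-attestation example (labels are illustrative).
stale = Primitive("AV-T-002", "temporal", weakens=frozenset({"freshness"}))
cred = Primitive("AV-AUTH-003", "authorization", assumes=frozenset({"freshness"}))
candidates = list(chain_candidates([stale, cred]))
# [("AV-T-002", "AV-AUTH-003", frozenset({"freshness"}))]
```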
Step 2: Composition Reasoning
For every cross-domain vector pair, the agent evaluates three composition types:
| Type | Question | Severity |
|---|---|---|
| CHAIN | Does exploiting A break an assumption that B depends on? | max(A, B) + 1 level |
| AMPLIFY | Does exploiting A increase the impact of B? | B severity + 1 level |
| BYPASS | Does exploiting A defeat B's mitigation entirely? | CRITICAL |
The agent also evaluates three-vector chains (A → B → C), which are rare but almost always critical when they exist.
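The severity column of the Step 2 table is mechanical enough to express directly. This sketch assumes a four-level scale (LOW, MEDIUM, HIGH, CRITICAL): the document names low, medium, and critical, and HIGH is assumed to sit between them. It also assumes that a bump past CRITICAL is capped at CRITICAL.

```python
SEVERITIES = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]  # assumed four-level scale


def bump(level: str, steps: int = 1) -> str:
    """Raise a severity by `steps` levels, capped at CRITICAL."""
    i = min(SEVERITIES.index(level) + steps, len(SEVERITIES) - 1)
    return SEVERITIES[i]


def composite_severity(kind: str, a: str, b: str) -> str:
    """Apply the Step 2 table. `a` is the severity of the exploited vector A,
    `b` the severity of the affected vector B."""
    if kind == "CHAIN":
        return bump(max(a, b, key=SEVERITIES.index))  # max(A, B) + 1 level
    if kind == "AMPLIFY":
        return bump(b)                                # B severity + 1 level
    if kind == "BYPASS":
        return "CRITICAL"
    raise ValueError(f"unknown composition type: {kind}")
```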
Step 3: Code Verification
Theoretical compositions are verified against actual source code:
- Interaction point — where do the two domains actually touch in code?
- Data flow — can an attacker move data from domain A's output to domain B's input?
- Mitigation independence — if A's fix fails, does B's fix still hold?
Only chains that are actually reachable in the code become proposals. Chains that are valid in theory but not currently reachable are logged as LATENT: they would become exploitable if a mitigation regresses.
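The data-flow question in Step 3 is, at its core, graph reachability. Assuming a data-flow graph has already been extracted from the source (extraction is out of scope here, and the node names are hypothetical), a breadth-first search answers whether anything produced at A's output can reach B's input:

```python
from collections import deque


def data_flows(graph: dict[str, list[str]], source: str, sink: str) -> bool:
    """Return True if `sink` is reachable from `source` along data-flow edges.

    `graph` maps a code location to the locations its output feeds into.
    """
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical flow: temporal output (attestation) feeds the credential check.
flow = {
    "Attestation.fetch": ["Attestation.isFresh"],
    "Attestation.isFresh": ["CredentialCheck.authorize"],
}
```

A reachable path is necessary but not sufficient; the interaction-point and mitigation-independence checks still apply.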
Step 4: Memorialization
Accepted composite vectors follow the same path as any other vector:
Proposal → Human review → /integrate-vector → YAML + semgrep rule + test → make verify
The critical difference: the composite vector is now checked deterministically on every future build. A chain that the LLM discovered through reasoning becomes a permanent, mechanical, zero-judgment check. The non-deterministic insight has been converted into deterministic governance.
The Learning Loop
This is the core mechanism that makes the system genuinely learn over time:
Key properties:
- The primitive set only grows. Rejected proposals are discarded, but the existing corpus never shrinks.
- Composites become primitives. An accepted chain (A→B) can be composed with a new vector C to form a higher-order chain (A→B→C) on the next run.
- The composition space expands combinatorially. With N primitives, there are O(N²) possible pairs. Each accepted composite adds a new primitive, expanding the space for the next run.
- Diminishing novelty is expected and healthy. As the corpus matures, the agent finds fewer new chains per run. This means the known attack surface is converging — which is the goal.
- New single-domain vectors re-expand the space. When a domain auditor discovers a new primitive, it creates fresh composition possibilities that didn't exist before.
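The growth described above can be made concrete. With N primitives there are N(N-1)/2 unordered pairs, which matches the figure in the Expected Output example (45 vectors, 990 pairs). This sketch assumes, as the properties state, that every accepted composite joins the corpus and that nothing is removed. The per-run acceptance counts are invented inputs.

```python
from math import comb


def pair_count(n: int) -> int:
    """Unordered vector pairs among n primitives: n(n-1)/2."""
    return comb(n, 2)


def simulate_growth(initial: int, accepted_per_run: list[int]) -> list[tuple[int, int]]:
    """Return (primitives, pairs) before the first run and after each run.

    Every accepted composite is added as a new primitive; the corpus never shrinks.
    """
    n = initial
    history = [(n, pair_count(n))]
    for accepted in accepted_per_run:
        n += accepted
        history.append((n, pair_count(n)))
    return history
```

A run with zero accepted composites leaves the space unchanged, which is the "diminishing novelty" case; a new single-domain vector raises N just as a composite does.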
Composition Types in Detail
CHAIN: A Enables B
The most common composition. Exploiting vector A breaks a security property that vector B's mitigation assumes is intact.
Example: Stale attestation enables authorization bypass
Temporal: stale data accepted (EX-T-002)
↓ effect: time check bypassed
↓ breaks assumption of:
Authorization: credential check trusts freshness (EX-AUTH-003)
↓ result: expired credential passes authorization
Neither vulnerability alone is critical. The temporal issue just accepts old data. The auth check does validate the credential. But composed, the attacker retains access after their credentials expire.
Mitigation strategy: Break the chain at the composition point. The authorization check must independently verify freshness rather than relying on the temporal layer.
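The chain-breaking guard can be illustrated in a language-neutral way. Bastion's contracts are DAML; this Python sketch only shows the shape of the check. The Credential type, its expires_at field, and the attestation_is_fresh flag are all hypothetical. The point is that the authorization check compares the credential's expiry against the current time itself, so a stale-data bug in the temporal layer no longer lets an expired credential through.

```python
from dataclasses import dataclass


@dataclass
class Credential:
    holder: str
    expires_at: int  # hypothetical expiry, as an integer timestamp


def authorize(cred: Credential, now: int, attestation_is_fresh: bool) -> bool:
    """Authorization that does not rely solely on the temporal layer.

    Even if `attestation_is_fresh` is wrongly True (the stale-data bug),
    the expiry check below still rejects an expired credential.
    """
    if not attestation_is_fresh:
        return False
    return now < cred.expires_at  # independent freshness check at the composition point
```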
AMPLIFY: A Increases Severity of B
Exploiting vector A doesn't enable B, but magnifies its impact.
Example: Precision loss accumulates across state transitions
Arithmetic: integer division loses precision (EX-A-002)
↓ each calculation: small rounding error
↓ amplified by:
State: transitions compound the error (EX-S-001)
↓ result: O(N * epsilon) drift over N transitions
A 0.01% error per day is negligible. Over a year across a portfolio, it's material. The arithmetic bug alone is medium; the state machine alone is fine; the composition creates systematic financial drift.
Mitigation strategy: Track cumulative error and reconcile periodically.
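The drift is easy to reproduce. This sketch uses invented numbers: a balance of 9,999 units accruing at 0.01% per step (1/10,000) for 365 steps. Truncating integer division rounds every step's accrual down to zero, so the whole year's accrual is lost. Carrying each step's remainder forward is one concrete way to track cumulative error and reconcile it. Here it reconciles at every step rather than periodically, which keeps the total within one unit of the exact value.

```python
from fractions import Fraction


def accrue_truncating(balance: int, rate_num: int, rate_den: int, steps: int) -> int:
    """Integer division on every step; each truncation is silently lost."""
    total = 0
    for _ in range(steps):
        total += balance * rate_num // rate_den
    return total


def accrue_with_carry(balance: int, rate_num: int, rate_den: int, steps: int) -> int:
    """Carry each step's remainder into the next step, so the accumulated
    rounding error stays below one unit."""
    total, carry = 0, 0
    for _ in range(steps):
        q, carry = divmod(balance * rate_num + carry, rate_den)
        total += q
    return total


exact = Fraction(9_999 * 1 * 365, 10_000)          # 364.9635
naive = accrue_truncating(9_999, 1, 10_000, 365)    # 0
carried = accrue_with_carry(9_999, 1, 10_000, 365)  # 364
```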
BYPASS: A Defeats B's Mitigation
The most severe composition. Exploiting vector A renders vector B's mitigation completely ineffective.
Example: Deadline manipulation defeats temporal authorization guard
Temporal: off-by-one in deadline check (EX-T-001)
↓ creates boundary window
↓ bypasses:
Authorization: emergency override gated by deadline (EX-AUTH-001)
↓ result: override activates early, multi-party auth defeated
The auth mitigation (multi-party requirement) is sound in isolation. The temporal mitigation (deadline enforcement) is sound in isolation. But the auth control depends on the temporal control, and the temporal control has an off-by-one. The entire multi-party governance model is defeated.
Mitigation strategy: Defense in depth — the authorization check must not depend solely on the temporal layer. Add a grace period buffer at the authorization level.
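The BYPASS example and its defense-in-depth fix can be sketched as follows. All names and the 60-second GRACE_PERIOD are hypothetical, and times are plain integers. The temporal helper reproduces the off-by-one (>= where > was intended); the hardened authorization path re-checks the time itself with a buffer, so the boundary error no longer opens the override early.

```python
GRACE_PERIOD = 60  # hypothetical buffer, in seconds


def deadline_passed(now: int, deadline: int) -> bool:
    # Temporal layer with the off-by-one: the override should open strictly
    # after the deadline, but `>=` also opens it at the deadline itself.
    return now >= deadline


def override_allowed(approvals: int, required: int, now: int, deadline: int,
                     hardened: bool = True) -> bool:
    """Multi-party approval, or an emergency override once the deadline has passed."""
    if approvals >= required:
        return True
    if not hardened:
        return deadline_passed(now, deadline)  # trusts the temporal layer alone
    # Defense in depth: re-check the time here, with a buffer that absorbs
    # boundary errors in the temporal layer.
    return deadline_passed(now, deadline) and now > deadline + GRACE_PERIOD
```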
Where It Fits in the Architecture
The composition engine sits between the focused auditors (Layer 2) and the vector store, creating a new Layer 2.5:
| Layer | What it does | Authority |
|---|---|---|
| Layer 1: Intelligence | Sweep external sources for new findings | Propose only |
| Layer 2: Domain Auditors | Review code within single domain | Propose only |
| Layer 2.5: Composition | Discover cross-domain attack chains | Propose only |
| Layer 2 (Human): Review | Accept, reject, or revise proposals | Decision authority |
| Layer 3: Automation | Generate artifacts from accepted proposals | Execute only |
| Layer 4: Deterministic | Run checks forever | Mechanical |
The composition auditor has the same authority as domain auditors — it can only propose, never modify. The human review gate (Layer 2) applies equally to composite vectors.
Domain Ownership
| Auditor | Owns | Reads |
|---|---|---|
| authorization-auditor | authorization.yaml | Own domain only |
| arithmetic-auditor | arithmetic.yaml | Own domain only |
| temporal-auditor | temporal.yaml | Own domain only |
| state-auditor | state.yaml | Own domain only |
| composition-auditor | composition.yaml | All domain files |
The composition auditor is the only agent that reads across all domains. It writes only to composition.yaml. Composite vectors reference their constituent vectors by ID, maintaining full traceability.
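The ownership table amounts to a small read/target policy. In this sketch the mapping mirrors the table, and the function names are hypothetical. Because auditors can only propose, never modify, "target" means the one file an agent's proposals may be aimed at. The sketch also assumes that "all domain files" includes composition.yaml, since accepted composites are primitives for higher-order chains.

```python
OWNED_FILE = {
    "authorization-auditor": "authorization.yaml",
    "arithmetic-auditor": "arithmetic.yaml",
    "temporal-auditor": "temporal.yaml",
    "state-auditor": "state.yaml",
    "composition-auditor": "composition.yaml",
}


def readable_files(agent: str) -> set[str]:
    """Files an auditor may read."""
    if agent == "composition-auditor":
        # Assumption: includes composition.yaml so composites can be recomposed.
        return set(OWNED_FILE.values())
    return {OWNED_FILE[agent]}  # focused auditors read their own domain only


def may_target(agent: str, filename: str) -> bool:
    """Each auditor proposes changes only to the file it owns."""
    return OWNED_FILE.get(agent) == filename
```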
Schema Extension
Composite vectors extend the standard vector schema with composition-specific fields:
```yaml
- id: AV-C-001
  name: "Stale attestation enables authorization bypass"
  severity: CRITICAL
  composition_type: CHAIN        # CHAIN | AMPLIFY | BYPASS
  constituent_vectors:           # What primitives compose into this
    - vector_id: AV-T-002
      domain: temporal
      role: enabler              # enabler | target | amplifier | intermediate
    - vector_id: AV-AUTH-003
      domain: authorization
      role: target
  applies_to:
    - "**/CredentialCheck.daml"
  description: |
    [Standard description field: how the chain works]
  mitigation_pattern: |
    [Cross-domain guard that breaks the chain]
  required_test: testStaleCredentialAuthChain
  cwe: "CWE-613"
  status: MISSING
  discovered_by: agent
  discovered_date: "2026-03-30"
```
New fields:
| Field | Type | Purpose |
|---|---|---|
| composition_type | enum | CHAIN, AMPLIFY, or BYPASS |
| constituent_vectors | array | References to the primitives that compose |
| constituent_vectors[].role | enum | enabler, target, amplifier, intermediate |
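A build step could reject malformed composites before they reach review. This validator is a sketch that checks only the composite-specific fields, on a vector already parsed from YAML into a Python dict (parsing is not shown). The rule that a composite needs at least two constituents is an assumption, not part of the stated schema.

```python
COMPOSITION_TYPES = {"CHAIN", "AMPLIFY", "BYPASS"}
ROLES = {"enabler", "target", "amplifier", "intermediate"}


def validate_composite(vector: dict) -> list[str]:
    """Return problems with the composite-specific fields (empty list if valid)."""
    errors = []
    if vector.get("composition_type") not in COMPOSITION_TYPES:
        errors.append(f"composition_type must be one of {sorted(COMPOSITION_TYPES)}")
    constituents = vector.get("constituent_vectors") or []
    if len(constituents) < 2:  # assumption: a composite combines two or more primitives
        errors.append("a composite needs at least two constituent_vectors")
    for i, c in enumerate(constituents):
        if not c.get("vector_id"):
            errors.append(f"constituent_vectors[{i}] is missing vector_id")
        if c.get("role") not in ROLES:
            errors.append(f"constituent_vectors[{i}].role must be one of {sorted(ROLES)}")
    return errors
```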
These fields enable:
- Traceability — which primitives produced this composite
- Impact analysis — if a constituent vector is fixed, is the composite still exploitable?
- Regression detection — if a constituent's mitigation regresses, flag all composites that depend on it
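Regression detection reduces to a reverse index from constituent IDs to the composites that reference them. Because composites can themselves be constituents, this sketch follows dependencies transitively. The second composite, AV-C-002, and the vector AV-S-001 are hypothetical IDs used only for illustration.

```python
from collections import defaultdict


def build_dependents(composites: list[dict]) -> dict[str, list[str]]:
    """Map each constituent vector_id to the ids of composites that reference it."""
    dependents = defaultdict(list)
    for comp in composites:
        for c in comp["constituent_vectors"]:
            dependents[c["vector_id"]].append(comp["id"])
    return dict(dependents)


def affected_composites(vector_id: str, dependents: dict[str, list[str]]) -> set[str]:
    """All composites depending on vector_id directly or through other composites."""
    affected, stack = set(), [vector_id]
    while stack:
        for comp_id in dependents.get(stack.pop(), []):
            if comp_id not in affected:
                affected.add(comp_id)
                stack.append(comp_id)
    return affected


composites = [
    {"id": "AV-C-001", "constituent_vectors": [{"vector_id": "AV-T-002"}, {"vector_id": "AV-AUTH-003"}]},
    # Hypothetical higher-order chain built on AV-C-001.
    {"id": "AV-C-002", "constituent_vectors": [{"vector_id": "AV-C-001"}, {"vector_id": "AV-S-001"}]},
]
deps = build_dependents(composites)
```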
Latent Chain Tracking
Not all compositions are currently exploitable. A latent chain is one where the composition is theoretically valid but currently blocked because one or more constituent vectors have COVERED status (their mitigations are in place).
Latent chains matter because:
- Mitigations can regress. A refactor might accidentally remove a guard.
- Coverage can change. A test might be deleted or a semgrep rule disabled.
- Dependencies should be explicit. If vector B's safety depends on vector A being mitigated, that dependency should be documented.
```yaml
latent_chains:
  - name: "Stale attestation → authorization bypass"
    constituent_vectors: ["AV-T-002", "AV-AUTH-003"]
    blocked_by: "AV-T-002 has COVERED status"
    risk: "If temporal mitigation regresses, auth check becomes bypassable"
    recommendation: "Add integration test verifying both mitigations together"
```
Integration with Layer 4: When make verify detects that a vector's status changes from COVERED to MISSING, it should check whether any latent chains become exploitable and escalate their severity.
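The latent-to-active check can be sketched as a comparison of two status snapshots. This assumes blocked_by is stored as a list of vector IDs rather than the free-text string in the latent_chains example, and that a chain stays blocked while any of those vectors is still COVERED, matching the definition of a latent chain. Escalating the returned chains' severity is left to the caller.

```python
def is_blocked(chain: dict, status: dict[str, str]) -> bool:
    # Assumption: blocked while any vector in the (structured) blocked_by list is COVERED.
    return any(status.get(v) == "COVERED" for v in chain["blocked_by"])


def newly_exploitable(chains: list[dict], before: dict[str, str],
                      after: dict[str, str]) -> list[str]:
    """Names of latent chains that were blocked before and are unblocked now."""
    return [c["name"] for c in chains
            if is_blocked(c, before) and not is_blocked(c, after)]


chains = [{"name": "Stale attestation → authorization bypass", "blocked_by": ["AV-T-002"]}]
```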
Running the Composition Auditor
When to Run
| Trigger | Reason |
|---|---|
| After domain auditors discover new vectors | New primitives expand composition space |
| After /integrate-vector adds composites | Composites become primitives for higher-order chains |
| Before major releases | Comprehensive cross-domain review |
| After significant refactoring | Mitigations may have regressed, activating latent chains |
Invocation
The composition auditor is a Claude Code agent, invoked like any other:
Run the composition-auditor agent against this project
It reads all domain vector files, the project source code, and external intelligence, then outputs structured proposals.
Expected Output
```yaml
composition_metadata:
  vectors_analyzed: 45
  pairs_evaluated: 990
  chains_found: 3
  latent_chains: 5
new_vectors:
  - id: AV-C-NEW-001
    name: "..."
    composition_type: CHAIN
    constituent_vectors: [...]
    # ... full vector definition
latent_chains:
  - name: "..."
    constituent_vectors: [...]
    blocked_by: "..."
```
Why LLM Composition, Not Traditional ML
A trained classifier finds vectors that are statistically similar. The composition auditor needs to find vectors that are causally related but semantically distant — a temporal staleness issue and an authorization credential check don't look similar at all. They compose because of hidden dependencies in how one's effect breaks the other's assumption. That's a reasoning problem, not a similarity problem.
| Aspect | Why the LLM approach is correct |
|---|---|
| Cold start | Full capability from run 1 — no labeled training data needed |
| Reasoning | Understands why vulnerabilities compose, not just statistical correlation |
| Infrastructure | Just another Claude Code agent — no training pipeline, GPU, or model hosting |
| Knowledge growth | The vector corpus the LLM reads expands over time — that is the learning |
| Domain scale | DAML/Canton is a bounded ecosystem; the total distinct vulnerability surface will converge well before volume would justify a trained model |
| Novel discovery | The highest-value compositions are the surprising ones — exactly where causal reasoning outperforms pattern matching |
Measuring Effectiveness
Track these metrics to evaluate whether the composition engine is finding real value:
| Metric | What it measures | Healthy range |
|---|---|---|
| Proposals per run | Discovery rate | 1-5 (decreasing over time is expected) |
| Accept rate | Proposal quality | Above 40% (a lower rate suggests too many false positives) |
| Unique composition types | Diversity of findings | All 3 types (CHAIN, AMPLIFY, BYPASS) |
| Cross-domain coverage | Which domain pairs are explored | All 6 pairs (4 choose 2) |
| Latent→Active conversions | Regression detection value | Should be near 0 (means mitigations hold) |
| Time to memorialization | Review pipeline efficiency | Under 1 sprint from proposal to deterministic check |
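Two of these metrics are straightforward to compute from review records. This sketch assumes each proposal is a dict with a decision field and the list of domains its constituents come from (both hypothetical field names). A three-domain chain counts toward all three of the pairs it spans.

```python
from itertools import combinations

DOMAINS = ("authorization", "arithmetic", "temporal", "state")
ALL_PAIRS = {frozenset(p) for p in combinations(DOMAINS, 2)}  # 4 choose 2 = 6 pairs
HEALTHY_ACCEPT_RATE = 0.40  # threshold from the metrics table


def accept_rate(proposals: list[dict]) -> float:
    """Fraction of proposals that reviewers accepted."""
    if not proposals:
        return 0.0
    accepted = sum(1 for p in proposals if p["decision"] == "accepted")
    return accepted / len(proposals)


def uncovered_pairs(proposals: list[dict]) -> set[frozenset]:
    """Domain pairs that no proposal has touched yet."""
    covered = set()
    for p in proposals:
        for pair in combinations(sorted(set(p["domains"])), 2):
            covered.add(frozenset(pair))
    return ALL_PAIRS - covered
```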
Process Model Summary
The complete Bastion process, including compositional learning:
The key insight: Every box above the Human Review Gate is non-deterministic — it uses AI judgment, external research, and semantic reasoning. Every box below it is deterministic — YAML files, semgrep rules, DAML tests, mechanical verification. The Human Review Gate is the trust boundary that converts one into the other. Compositional learning is the mechanism that ensures the non-deterministic layer keeps finding things the deterministic layer hasn't captured yet.