Multi-Model Adversarial Review

How Bastion uses independent LLMs to catch what any single model misses — not just at proposal time, but at every layer where AI judgment is involved.

The Problem with Single-Model Security

The Bastion composition engine uses an LLM to discover attack vectors and composite chains. This works because LLMs can reason about causal relationships between vulnerabilities. But every LLM has systematic blind spots:

Training data gaps — patterns absent from the training corpus are invisible
Reasoning biases — consistent tendencies to decompose problems in certain ways
Attention patterns — some code structures get more scrutiny than others
Severity calibration — models disagree on what constitutes CRITICAL vs. MEDIUM

If a single model both discovers AND reviews, its blind spots are self-reinforcing. A bias that causes a miss also causes the miss to go undetected. The model doesn't know what it doesn't know.

Adversarial review at the proposal stage catches composition errors — chains that don't actually work, severity miscalibrations, false positives. But it does NOT catch discovery misses — primitives that should exist but were never proposed, because the discovering model has a systematic blind spot in that area.

The adversarial review must cover every layer where AI judgment is involved.

Where AI Judgment Is Involved

Layer	AI Role	What a blind spot misses
Layer 0: Intelligence	Sweep sources, classify findings	A finding is dismissed as DUPLICATE when it's actually a new variant
Layer 1: Deterministic	None — mechanical	N/A (no AI judgment)
Layer 2: Domain Auditors	Review code against checklists	A vulnerability pattern not on the checklist is never flagged
Layer 2.5: Composition	Reason about cross-domain chains	A valid chain is never proposed because the model doesn't see the causal link
Layer 3: Automation	Generate artifacts from accepted proposals	A semgrep rule has a pattern gap that misses a variant

Only Layer 1 (deterministic) and the final human review step are free of AI judgment. Every other layer needs multi-model coverage.

The Multi-Model Architecture

Principle: Independent Discovery, Adversarial Review, Union of Findings

No model reviews its own work. Every model runs independently. The human sees the union.

Layer 0: Intelligence Sweep

Each model sweeps the same source catalog independently and classifies findings.

What multi-model catches:

Model A dismisses a finding as DUPLICATE; Model B classifies it as VARIANT
Model A misses a relevant finding in a source; Model B catches it
Model A and Model B extract different vulnerability patterns from the same audit report

Implementation: Run intelligence-sync with each model. Diff the outputs. Findings classified differently across models get escalated for human review.
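A minimal Python sketch of the Layer 0 diff. It assumes each model's intelligence-sync output can be reduced to a map from finding ID to classification. The finding IDs and the NEW label are hypothetical; only DUPLICATE and VARIANT come from this document. If one model never reported a finding, that counts as a disagreement. This covers the case where Model B catches something Model A missed.

```python
# Hypothetical input shape: {model_name: {finding_id: classification}}.
# DUPLICATE and VARIANT come from the text; NEW is an assumed label.
def escalate_classification_disagreements(outputs):
    """Return findings the models did not classify identically.

    A finding a model never reported is recorded as None for that model,
    so "Model A missed it, Model B caught it" is escalated as well.
    """
    models = sorted(outputs)
    all_ids = set()
    for per_model in outputs.values():
        all_ids.update(per_model)

    escalations = {}
    for finding_id in sorted(all_ids):
        labels = {m: outputs[m].get(finding_id) for m in models}
        if len(set(labels.values())) > 1:  # any differing label or any miss
            escalations[finding_id] = labels
    return escalations


if __name__ == "__main__":
    sweep = {
        "claude": {"SRC-101": "DUPLICATE", "SRC-102": "NEW"},
        "gemini": {"SRC-101": "VARIANT", "SRC-102": "NEW", "SRC-103": "NEW"},
        "codex": {"SRC-101": "DUPLICATE", "SRC-102": "NEW"},
    }
    for finding_id, labels in escalate_classification_disagreements(sweep).items():
        print(finding_id, labels)  # each line is one item for human review
```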

Layer 2: Domain Audits

Each model runs domain-specific code review independently with the same checklist.

What multi-model catches:

Model A considers a code pattern safe; Model B flags it
Model A focuses on the explicit check and misses an implicit assumption; Model B catches the assumption gap
Model A's code comprehension misses a subtle control flow path; Model B traces it

Implementation: Run each domain auditor (authorization, arithmetic, temporal, state) with each model against the same source files. Unique-to-one-model findings are the highest-value output — they're the blind spots.
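Here is one way to surface unique-to-one-model findings. This sketch assumes each auditor's output has been normalized into comparable keys, shown here as hypothetical `file:pattern` strings. In practice, normalization is the hard part. If two models describe the same bug in different words, both findings will look unique unless they are first mapped to a shared key.

```python
from collections import Counter


def unique_to_one_model(results):
    """Findings raised by exactly one model within the same domain.

    results maps (domain, model) -> set of normalized finding keys.
    Returns {(domain, model): keys that only this model raised in this domain}.
    """
    unique = {}
    for domain in sorted({d for d, _ in results}):
        per_model = {m: keys for (d, m), keys in results.items() if d == domain}
        # Each model contributes a set, so a key is counted at most once per model.
        counts = Counter(key for keys in per_model.values() for key in keys)
        for model, keys in per_model.items():
            only_here = {key for key in keys if counts[key] == 1}
            if only_here:
                unique[(domain, model)] = only_here
    return unique
```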

Layer 2.5: Composition

Each model reads all domain vectors and independently proposes composite attack chains.

What multi-model catches:

Model A composes temporal+auth but never considers arithmetic+state; Model B finds it
Model A stops at 2-vector chains; Model B finds a 3-vector chain
Model A's causal reasoning misses a data flow path that Model B traces

Implementation: Run composition-auditor with each model. The union of proposals is never smaller than any single model's output, and it is larger whenever one model proposes a chain the others miss. Duplicates across models are high-confidence (multiple independent reasoners converged on the same chain).
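This sketch of the merge step assumes each proposed chain is reduced to an ordered tuple of primitive vector IDs. The IDs below are invented. Keeping every chain produces the union. The list of models attached to each chain shows which models converged on it.

```python
from collections import defaultdict


def merge_composite_proposals(proposals_by_model):
    """Union of composite chains across models.

    proposals_by_model: {model: iterable of chains}, each chain a tuple of
    primitive vector IDs in causal order (step 1 enables step 2, ...).
    Returns {chain: sorted list of models that proposed it}.
    """
    merged = defaultdict(set)
    for model, chains in proposals_by_model.items():
        for chain in chains:
            merged[tuple(chain)].add(model)
    return {chain: sorted(models) for chain, models in merged.items()}


def high_confidence(merged):
    """Chains that two or more independent models converged on."""
    return [chain for chain, models in merged.items() if len(models) >= 2]
```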

Layer 3: Artifact Generation

After a human accepts a proposal, the generated artifacts (semgrep rules, test stubs) are reviewed by a model other than the one that generated them.

What multi-model catches:

A semgrep rule pattern that's too narrow (misses variants)
A test stub that doesn't actually test the attack path
A mitigation pattern that has its own vulnerability

Implementation: Model A generates artifacts; Model B reviews them before they enter the deterministic layer.
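This sketch of the Layer 3 gate assumes the reviewing model returns an approval flag plus a list of concerns. The `ArtifactReview` record and the artifact IDs are hypothetical. The check enforces the rule that no model reviews its own work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArtifactReview:
    artifact_id: str              # hypothetical ID for a semgrep rule or test stub
    generated_by: str             # model that produced the artifact
    reviewed_by: str              # model that reviewed it
    approved: bool
    concerns: tuple[str, ...] = ()  # e.g. "pattern too narrow"


def admit_to_deterministic_layer(review: ArtifactReview) -> bool:
    """Admit an artifact only after an independent, unqualified approval."""
    if review.reviewed_by == review.generated_by:
        raise ValueError(f"{review.artifact_id}: a model may not review its own artifact")
    return review.approved and not review.concerns
```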

Confidence Scoring

When multiple models independently analyze the same scope, their agreement (or disagreement) is a signal:

Finding	Claude	Gemini	Codex	Confidence	Action
AV-AUTH-NEW	FOUND	FOUND	FOUND	High	Fast-track to human review
AV-T-NEW	FOUND	FOUND	—	Medium	Standard review
AV-C-NEW	FOUND	—	—	Low (or novel)	Investigate — blind spot or false positive?
AV-S-NEW	—	FOUND	—	Low (or novel)	Investigate — blind spot or false positive?

Single-model findings are NOT automatically lower priority. They may be the most valuable — a genuine blind spot that only one model catches. But they warrant deeper investigation before acceptance.
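The table reduces to a small function. This sketch assumes three models, as in the table. For larger panels, it treats any count between two models and all of them as Medium. That banding is an assumption.

```python
def score_confidence(found_by, total_models=3):
    """Map agreement among independent models to (confidence, action)."""
    n = len(set(found_by))
    if n == 0:
        raise ValueError("a finding must come from at least one model")
    if n >= total_models:
        return ("High", "Fast-track to human review")
    if n >= 2:
        return ("Medium", "Standard review")
    # A lone finding is flagged for investigation, not dismissed:
    # it may be a genuine blind spot that only one model catches.
    return ("Low (or novel)", "Investigate: blind spot or false positive?")
```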

The Adversarial Challenge Protocol

When Model B challenges Model A's proposal, it answers these specific questions:

For Domain Vector Proposals

Is this a real vulnerability? Can you construct a concrete exploit scenario, or is this theoretical?
Is the severity correct? What's the actual worst-case impact? Is it overstated or understated?
Is the mitigation sound? Does the proposed mitigation actually prevent the attack, or does it introduce a new weakness?
What's missing from this domain? Given the code you see, what vulnerabilities did the proposing model NOT flag?

For Composite Vector Proposals

Is this chain physically possible? Trace the data flow in the actual code — can an attacker actually move from step 1 to step 2?
Is the composition type correct? Is this really a CHAIN (A enables B), or is it two independent issues mislabeled as related?
Are there simpler explanations? Could this be a single-domain issue that doesn't require composition?
What compositions were NOT proposed? Given these primitives, what cross-domain chains did the proposing model miss?

For the Overall Framework

What security domains are missing? Are there vulnerability categories that don't fit authorization, arithmetic, temporal, state, or composition?
What source categories are missing? Are there classes of intelligence source not represented in the catalog?
Where does the deterministic layer have gaps? What types of vulnerabilities can't be caught by semgrep rules or DAML tests?
Where does the process model fail? Under what conditions does the discovery→review→memorialization loop break down?
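This sketch shows how a per-proposal challenge request might be rendered from the questions above. The dictionary keys, prompt wording, and function name are assumptions. The questions are condensed from this section. The framework-level questions are left out because they belong to the one-time framework review, not to per-proposal challenges.

```python
# Questions condensed from the challenge protocol; keys and wording are assumptions.
CHALLENGE_QUESTIONS = {
    "domain": [
        "Is this a real vulnerability? Construct a concrete exploit scenario or say it is theoretical.",
        "Is the severity correct? State the worst-case impact and whether it is overstated or understated.",
        "Is the mitigation sound? Does it prevent the attack without introducing a new weakness?",
        "What is missing from this domain? List vulnerabilities in this code the proposer did not flag.",
    ],
    "composite": [
        "Is this chain physically possible? Trace the data flow from each step to the next in the actual code.",
        "Is the composition type correct? Is it a real CHAIN or two independent issues mislabeled as related?",
        "Is there a simpler explanation? Could this be a single-domain issue?",
        "What compositions were not proposed? List cross-domain chains over these primitives that were missed.",
    ],
}


def build_challenge(kind, proposal_text, proposer, challenger):
    """Render a challenge request for one proposal, refusing self-review."""
    if challenger == proposer:
        raise ValueError("the challenger must be a different model from the proposer")
    questions = CHALLENGE_QUESTIONS[kind]  # unknown kinds raise KeyError on purpose
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(questions, start=1))
    return (
        f"You are {challenger}, challenging a {kind} vector proposal written by {proposer}.\n\n"
        f"Proposal:\n{proposal_text}\n\n"
        f"Answer each question with evidence from the code:\n{numbered}\n"
    )
```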

Implementation: How to Run Multi-Model Review

One-Time Framework Review

Give each model the complete Bastion documentation and ask it to find holes.

Input for each model:

website/docs/architecture.md — Overall system design
website/docs/layers.md — Layer details and agent architecture
website/docs/composition.md — Compositional learning mechanism
website/docs/adversarial-review.md — This document (meta-review)
agents/AGENT_COORDINATION.md — Agent ownership and coordination
agents/composition-auditor.md — Composition agent definition
vectors/examples/daml-common.yaml — Reference vector examples
vectors/examples/composition.yaml — Reference composite examples
semgrep/daml-security.yaml — Static analysis rules

Prompt for framework review:

You are an adversarial security reviewer. Your job is to find weaknesses
in this security framework's design — not to validate it.

Read all provided documents. Then answer:

1. PROCESS MODEL GAPS: Where does the discovery→review→memorialization
   pipeline fail? What types of vulnerabilities slip through?

2. DOMAIN COVERAGE GAPS: What vulnerability categories exist in
   DAML/Canton smart contracts that are NOT covered by the 5 domains
   (authorization, arithmetic, temporal, state, composition)?

3. COMPOSITION BLIND SPOTS: What types of cross-domain attack chains
   does the composition auditor's heuristic list miss? What
   composition patterns are NOT in the reference examples?

4. DETERMINISTIC LAYER WEAKNESSES: What types of findings CANNOT be
   memorialized as semgrep rules or DAML tests? How does the system
   handle vulnerabilities that resist deterministic checking?

5. INTELLIGENCE PIPELINE GAPS: What categories of security
   intelligence source are missing? What types of findings would
   those sources surface that current sources don't?

6. ADVERSARIAL REVIEW GAPS: How could this multi-model review
   process itself be gamed or fail? What are WE not thinking about?

Be specific. Name the gap, explain why it matters, and propose
how to close it. Theoretical concerns without concrete impact
are not useful.
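This sketch fans the framework review out to each model. It assumes a caller-supplied `read_doc(path)` function, which is hypothetical, and no particular model API. The shape is designed so that every model receives identical input and nothing from the other models.

```python
# The nine input documents listed above.
FRAMEWORK_DOCS = [
    "website/docs/architecture.md",
    "website/docs/layers.md",
    "website/docs/composition.md",
    "website/docs/adversarial-review.md",
    "agents/AGENT_COORDINATION.md",
    "agents/composition-auditor.md",
    "vectors/examples/daml-common.yaml",
    "vectors/examples/composition.yaml",
    "semgrep/daml-security.yaml",
]


def build_framework_review_requests(models, read_doc, prompt):
    """Build one self-contained request per model.

    Every model gets the same prompt and the same document bundle, and none
    sees another model's answer, which keeps the reviews independent.
    read_doc(path) -> str is a hypothetical caller-supplied hook.
    """
    bundle = "\n\n".join(f"=== {path} ===\n{read_doc(path)}" for path in FRAMEWORK_DOCS)
    return [{"model": model, "prompt": prompt, "documents": bundle} for model in models]
```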

Ongoing Per-Review Multi-Model Protocol

For each security review cycle:

Step 1: Parallel independent discovery

Run intelligence-sync, domain auditors, and composition auditor on each model
Each produces its own set of proposals

Step 2: Cross-model challenge

Each model's proposals are reviewed by a different model
Challenger answers the adversarial questions above
Disagreements are flagged

Step 3: Merge and triage

Union of all unique findings
Multi-model agreement findings → fast-track
Single-model findings → investigate (could be blind spot or false positive)
Cross-model disagreements → human arbitrates

Step 4: Human review

Human sees: all proposals, all challenges, confidence scores
Human makes accept/reject/revise decision
Accepted findings → /integrate-vector → deterministic governance
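Step 2 requires a challenger for every proposal. This minimal sketch assumes proposals arrive as `(proposal_id, proposing_model)` pairs. It picks the least-loaded other model and breaks ties by list order. That balancing rule is an assumption. The document only requires that the challenger differ from the proposer.

```python
def assign_challengers(proposals, models):
    """Return {proposal_id: challenger}, never assigning a model its own work."""
    if len(models) < 2:
        raise ValueError("cross-model challenge needs at least two models")
    load = {m: 0 for m in models}
    assignments = {}
    for proposal_id, proposer in proposals:
        candidates = [m for m in models if m != proposer]
        # Least-loaded challenger first; ties go to the earlier model in the list.
        challenger = min(candidates, key=lambda m: (load[m], models.index(m)))
        load[challenger] += 1
        assignments[proposal_id] = challenger
    return assignments
```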

What This Costs

Multi-model review multiplies discovery compute by the number of models and adds one challenge run per proposal. For Bastion, the cost is bounded:

Activity	Runs per cycle	Models	Total runs
Intelligence sync	1	3	3
Domain auditors (4)	4	3	12
Composition auditor	1	3	3
Cross-model challenge	~15 proposals	1 each	~15
Total			~33 agent runs

This is a weekly or per-release cost, not per-commit. The deterministic layer (semgrep, tests, coverage) runs on every commit at zero AI cost. The multi-model review is the investment that makes the deterministic layer grow correctly.
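Here is the table's arithmetic as a function. It uses the table's approximate figure of 15 proposals per cycle. Six discovery agents run on each model: intelligence sync, four domain auditors, and the composition auditor. Each proposal then gets one challenge run.

```python
def agent_runs_per_cycle(num_models=3, num_domain_auditors=4, num_proposals=15):
    """Discovery runs scale with models; challenge runs scale with proposals."""
    # intelligence sync + domain auditors + composition auditor, on every model
    discovery_runs = (1 + num_domain_auditors + 1) * num_models
    challenge_runs = num_proposals  # one challenger per proposal
    return discovery_runs + challenge_runs


print(agent_runs_per_cycle())              # 33
print(agent_runs_per_cycle(num_models=4))  # 39
```

Adding a fourth model raises discovery from 18 to 24 runs. If the proposal count stays at 15, the total becomes 39.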

Why Not Just Use the "Best" Model?

There is no best model for security review. Each model shows different strengths in practice:

Model	Observed strength	Why it matters for security
Claude	Long-context reasoning, nuanced severity assessment	Can hold entire codebases in context for cross-file analysis
Gemini	Broad knowledge retrieval, aggressive pattern matching	Catches patterns from obscure sources Claude may not have trained on
Codex/GPT	Code generation fluency, exploit scenario construction	Better at constructing concrete PoC exploit paths

The strengths don't matter as much as the differences. If all three models had the same blind spots, multi-model review would be useless. They don't. The value is in the disagreement.

Measuring Effectiveness

Track these metrics to evaluate whether multi-model review is finding real value:

Metric	What it measures	Target
Unique-to-one-model findings	Blind spot detection rate	At least 10% of findings should be unique to one model
Cross-model disagreement rate	How often models disagree on classification	15-30% (too low = models are too similar; too high = noise)
Single-model findings accepted	Blind spots that were real	Above 30% of single-model findings should be accepted
Challenge-caught false positives	Adversarial review effectiveness	Challenges should catch at least 20% of false positives before human review
Framework review findings implemented	Design-level improvement	Each framework review should produce at least 2 actionable changes
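This sketch checks one cycle against the targets above. The `CycleStats` counters are hypothetical. How they are collected is also an assumption, for example how false positives that slip past challenges are later counted. A ratio with a zero denominator counts as missing its target rather than passing it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CycleStats:
    # Hypothetical per-cycle counters.
    total_findings: int
    single_model_findings: int
    jointly_classified: int            # findings every model classified
    classification_disagreements: int
    single_model_accepted: int         # single-model findings a human accepted
    false_positives: int               # all false positives identified this cycle
    false_positives_caught_by_challenge: int
    framework_changes_implemented: int


def ratio(numerator: int, denominator: int) -> Optional[float]:
    return numerator / denominator if denominator else None


def check_targets(s: CycleStats) -> dict:
    unique = ratio(s.single_model_findings, s.total_findings)
    disagreement = ratio(s.classification_disagreements, s.jointly_classified)
    accepted = ratio(s.single_model_accepted, s.single_model_findings)
    fp_caught = ratio(s.false_positives_caught_by_challenge, s.false_positives)
    return {
        "unique_rate": unique is not None and unique >= 0.10,
        "disagreement_rate": disagreement is not None and 0.15 <= disagreement <= 0.30,
        "single_model_acceptance": accepted is not None and accepted > 0.30,
        "challenge_fp_catch": fp_caught is not None and fp_caught >= 0.20,
        "framework_changes": s.framework_changes_implemented >= 2,
    }
```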
