Multi-Agent Architecture

The Coherence Diagnostic Pipeline

Six autonomous agents. Adversarial debate. Deterministic scoring. All local inference. No cloud dependency for pipeline execution.

6 Autonomous Agents
4 Pipeline Stages
0.000 Score Stdev
133 Production Runs
15 Entities
01 Collect → JSONL → 02 Extract → Claims + Observations → 03 Score → 04 Synthesize → Case File
Agent 01
The Collector
Gathers raw center + edge documents from 7 public source types per entity.
CENTER press releases, job postings, earnings transcripts
EDGE customer reviews, social media, news, employee reviews
OUTPUT ~1,800 docs/entity → JSONL
DEDUP content-level, ~170 lines
TOOLS Perplexity Computer + custom collectors
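The content-level dedup stage could look like the following sketch. The function name, record fields, and normalization (lowercase, collapsed whitespace) are assumptions; the source states only that dedup is content-level and spans roughly 170 lines:

```python
import hashlib

def dedup_by_content(records):
    """Content-level dedup sketch: drop records whose normalized text
    hashes identically. The exact normalization used by the pipeline
    is not documented; this is one plausible choice."""
    seen, unique = set(), []
    for rec in records:
        # Collapse whitespace and case so trivially reformatted copies collide.
        normalized = " ".join(rec["text"].lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

Hashing normalized content rather than raw bytes means the same press release scraped from two mirrors with different whitespace still collapses to one record.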
Agent 02
The Extractor
Reads each document individually. Produces structured claims (center) and observations (edge).
INPUT raw JSONL, batch_size=1
CENTER → structured claims
EDGE → structured observations
OUTPUT ~6,100 claims + observations
MODULE agent_extract.py
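A minimal sketch of the batch_size=1 routing, with hypothetical field names (`source_class`, `entity`, `kind`); the actual prompts and schemas live in agent_extract.py and are not reproduced here:

```python
def extract_document(doc):
    """Per-document extraction contract: center sources yield claims,
    edge sources yield observations. Field names are assumptions."""
    kind = "claim" if doc["source_class"] == "center" else "observation"
    return {"kind": kind, "entity": doc["entity"], "text": doc["text"]}

def extract_all(docs):
    # One call per document keeps each extraction independent (batch_size=1).
    return [extract_document(d) for d in docs]
```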
Agents 03–05 · The Adversarial Core
Agent 03a
Truth Diagnostician
Scores center-edge alignment. Three finding types: alignment, omission, contradiction.
WEIGHT 55% of overall
CAP 3 findings (hard)
Agent 03b
Authority Diagnostician
Scores voice/power coherence. Three finding types: compression, diffusion, misalignment.
WEIGHT 45% of overall
CAP 3 findings (hard)
Agent 04 · The Skeptic
Finding submitted with evidence citations
Skeptic challenges — evidence quality, circular reasoning, unsupported confidence
Diagnostician rebuts — defends with specific evidence
Verdict rendered for each of 6 findings
Sustained → Evidence Ledger
Rejected → Rejection Log
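The verdict routing above reduces to plain control flow. In this sketch the skeptic, which is really an LLM debate over evidence quality, is stubbed as any boolean-returning callable:

```python
def adjudicate(findings, skeptic):
    """Route each finding by the skeptic's verdict:
    sustained → evidence ledger, rejected → rejection log.
    `skeptic` is any callable returning True (sustained) or False (rejected)."""
    ledger, rejection_log = [], []
    for finding in findings:
        (ledger if skeptic(finding) else rejection_log).append(finding)
    return ledger, rejection_log
```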
truth = alignment(+0.45) + omission(-0.30) + contradiction(-0.50)
authority = compression(-0.25) + diffusion(-0.18) + misalignment(-0.22)
overall = truth × 0.55 + authority × 0.45
strength: strong(0.40) · moderate(0.25) · weak(0.10)
0.448 · σ = 0.000 (deterministic)
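The published weights translate directly into a deterministic scoring function. One assumption in this sketch: each finding's direction weight is scaled multiplicatively by its strength weight, which the two weight tables suggest but the source does not state outright:

```python
# Direction weights per finding type, from the published formulas.
TRUTH_W = {"alignment": +0.45, "omission": -0.30, "contradiction": -0.50}
AUTH_W = {"compression": -0.25, "diffusion": -0.18, "misalignment": -0.22}
# Evidence-strength multipliers, from the published strength table.
STRENGTH = {"strong": 0.40, "moderate": 0.25, "weak": 0.10}

def overall_score(findings):
    """Deterministic score over sustained findings (at most 3 per
    diagnostician): overall = truth * 0.55 + authority * 0.45."""
    truth = sum(TRUTH_W[f["type"]] * STRENGTH[f["strength"]]
                for f in findings if f["type"] in TRUTH_W)
    authority = sum(AUTH_W[f["type"]] * STRENGTH[f["strength"]]
                    for f in findings if f["type"] in AUTH_W)
    return truth * 0.55 + authority * 0.45
```

Because every term is a fixed table lookup, repeated runs over the same sustained findings produce identical scores, which is what the σ = 0.000 figure reflects.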
Agent 06
The Recorder
Logs every run to the performance ledger. Maintains evidence chains. Writes case files and summaries.
CASE FILE case_file.json
SUMMARY case_summary.md
LEDGER ledger.json (133 entries)
TOKENS aggregate_tokens.py
HYGIENE post_run_hygiene.sh
NOTIFY operator_inbox.jsonl
Agent 00 · Human
The Architect
Designs pipeline, writes prompts, debugs failures, makes all decisions. The only entity with decision authority. Agents observe and suggest. The Architect decides.
RULE AR-001: Automation may observe, summarize, and suggest. Automation may not decide.
SCOPE prompt design, architecture, failure triage, all external decisions
Agent 05
The Validator
Entity resolution, schema validation, provenance hash checks. Runs inline between extraction and scoring.
CHECK schema conformance
CHECK provenance hashes
CHECK entity resolution
GATE blocks malformed data
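The provenance gate could be as simple as recomputing a content hash and comparing it to the hash recorded at collection time. Field names (`text`, `provenance_hash`) and the hash algorithm are assumptions; the source states only that provenance hashes are checked:

```python
import hashlib

def check_provenance(doc):
    """Gate check sketch: recompute the content hash and compare it to
    the value recorded at collection time. A mismatch means the document
    was altered after collection and must be blocked."""
    digest = hashlib.sha256(doc["text"].encode("utf-8")).hexdigest()
    return digest == doc["provenance_hash"]
```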
Compute Infrastructure — All Local, No Cloud Dependency
Spark 1 · NVIDIA DGX · Primary inference · Ops Center host
Spark 2 · NVIDIA DGX · Parallel fleet · Scoring
M2 Studio · Apple M2 Ultra · Embeddings · Agent host
M3 Ultra · Apple M3 Ultra · 72B vision · Image generation
NAS · Synology + NVMe · Shared filesystem · Source of truth
MacBook Pro · Orchestrator · SSH control plane · Dispatches, doesn't infer
Autoresearch — Continuous Prompt Optimization
Track 1 — Scoring
Scoring Prompt Optimization
Optimizes Truth, Authority, Skeptic, and Rebuttal prompts against a fixed proxy entity.
PROXY Nike run_079 (22 docs)
EXPERIMENTS 41+
BASELINE 0.058
BEST 0.091 (1.57×)
TIME/EXPERIMENT ~45 min (3 trials)
METRIC repro × stability × evidence × diagnostic
Track 2 — Extraction
Extraction Prompt Optimization
Optimizes source-specific extraction prompts. Key insight: fewer, sharper items beat volume.
PROXY Nike stratified (15 docs)
EXPERIMENTS 19+
METRIC sustain_rate × mean_strength
KEY FINDING 98% of extracted items were never cited
SOLUTION asymmetric limits; fewer, sharper items
TIME/EXPERIMENT ~20 min
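The Track 2 objective, sustain_rate × mean_strength, can be sketched directly. The per-item fields (`sustained`, numeric `strength`) are assumptions about how debate outcomes are fed back into the optimizer:

```python
def extraction_metric(items):
    """Track 2 objective: sustain_rate × mean_strength.
    Rewards prompts whose extracted items survive the Skeptic's debate
    with strong evidence, not prompts that maximize raw item volume."""
    if not items:
        return 0.0
    sustained = [i["strength"] for i in items if i["sustained"]]
    sustain_rate = len(sustained) / len(items)
    mean_strength = sum(sustained) / len(sustained) if sustained else 0.0
    return sustain_rate * mean_strength
```

A multiplicative objective like this makes the "fewer, sharper items" finding fall out naturally: padding the output with items that are never sustained drags sustain_rate toward zero.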
AR-001
"Automation may observe, summarize, and suggest. Automation may not decide."