Design Philosophy
Most diagnostic systems fail in one of two ways: they make interpretation simple and scoring complex, or they make both complex. The Coherence Scan separates these concerns deliberately.
Interpretation is hard. Extracting meaningful signal from SEC filings, employee reviews, customer complaints, and job postings requires a sophisticated lens. AI agents do this work, and an adversarial process keeps them honest.
Scoring is simple. Once findings survive debate, the math that converts them into scores is deterministic, locked, and auditable. No interpretation happens at the scoring layer. No weights shift between runs. The formula was calibrated against 74 scored runs and frozen.
This separation is the instrument's integrity mechanism. The agents can hallucinate or miss nuance — that's why the skeptic exists. But once a finding is sustained, the score it produces is the same score it will always produce. No drift. No judgment calls. No "it depends."
The Pipeline
Every scan follows the same five-stage process. No stage is optional. No stage can be skipped.
Each stage produces artifacts that feed the next. Nothing is inferred. Nothing is assumed. The synthesizer writes the final assessment from sustained findings only — it cannot reference anything the skeptic rejected.
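As a shape, the hand-off looks like the minimal sketch below. The stage internals are deliberately abstracted, since this section does not enumerate them; the two rules the sketch encodes are the real ones: every stage runs in order, and the synthesizer can only see what survived debate.

```python
# Minimal sketch of the artifact hand-off. Stage internals are abstracted;
# the structural rules are the point: every stage runs, each stage's
# artifact feeds the next, and synthesis sees sustained findings only.
def run_pipeline(stages, raw_sources):
    artifact = raw_sources
    for stage in stages:            # no stage is optional or skippable
        artifact = stage(artifact)  # each artifact feeds the next stage
    return artifact

def synthesize(debated_findings):
    # Rejected findings are filtered out before synthesis begins, so the
    # final assessment cannot reference them even by accident.
    return [f for f in debated_findings if f["verdict"] == "sustained"]
```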
Evidence Lineage
Every number in a Coherence Scan traces back to a specific source document. The chain is fully auditable.
If a buyer asks "why is the Authority score 0.38?" — the answer traces from score, to sustained finding, to specific claims and observations, to the exact passage in the source document. The chain does not break.
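A minimal sketch of what that chain looks like as data, assuming hypothetical field and type names; the published material does not specify the artifact schema, only the guarantee that every hop is an explicit reference.

```python
# Hypothetical schema for illustration; the real artifact format is not
# published. What matters: every hop in the chain is an explicit reference,
# so the walk from score to source passage never breaks.
from dataclasses import dataclass, field

@dataclass
class SourcePassage:
    document_id: str   # e.g. a specific filing or review export
    excerpt: str       # the exact passage the evidence points to

@dataclass
class EvidenceItem:
    kind: str                  # "claim" or "observation"
    passage: SourcePassage

@dataclass
class Finding:
    statement: str
    verdict: str                                    # "sustained" after debate
    evidence: list[EvidenceItem] = field(default_factory=list)

def trace(score: float, finding: Finding) -> None:
    """Walk the chain: score -> sustained finding -> evidence -> passage."""
    print(f"score {score:.2f} <- finding: {finding.statement!r}")
    for item in finding.evidence:
        print(f"  <- {item.kind} <- {item.passage.document_id}: "
              f"{item.passage.excerpt[:50]!r}")
```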
The Adversarial Skeptic
The skeptic is not a quality check. It is a structural feature of the instrument. It exists because the agents that generate findings can be wrong.
How it works
Round 1: Challenge. The skeptic reviews each proposed finding and issues a verdict: sustained (evidence holds), weakened (evidence is thin but not indefensible), or rejected (evidence does not support the claim).
Round 2: Rebuttal. Weakened findings get a second chance. The skeptic reviews them again with the original challenge in view. Final verdict: sustained or rejected. No appeals.
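In code, the control flow is small; the hard part lives inside the skeptic agent itself. A minimal sketch, assuming a black-box `skeptic_review` call (a hypothetical name) that returns a verdict and the challenge text:

```python
# Sketch of the two-round debate. `skeptic_review` stands in for the
# skeptic agent; its prompts and verdict criteria are not published.
def debate(proposed_findings, skeptic_review):
    sustained = []
    for finding in proposed_findings:
        # Round 1: challenge. Verdict is sustained, weakened, or rejected.
        verdict, challenge = skeptic_review(finding, prior_challenge=None)
        if verdict == "weakened":
            # Round 2: rebuttal, with the original challenge in view.
            # The final verdict is binary. There is no third round.
            verdict, _ = skeptic_review(finding, prior_challenge=challenge)
        if verdict == "sustained":
            sustained.append(finding)
    return sustained   # only sustained findings reach the scoring layer
```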
What the skeptic catches
In a recent scan, the skeptic rejected 4 of 6 proposed findings. That is the instrument working correctly. The rejection rate is published in every scan because what the instrument chose not to assert is as important as what it did.
Evidence minimums
Findings must meet minimum evidence thresholds before the skeptic even sees them. The thresholds vary by finding type — contradictions require more supporting evidence than alignments, because the claim is stronger. Findings below the threshold are automatically excluded. The exact thresholds are part of the proprietary scoring model.
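The shape of that pre-filter is simple even though the numbers are not shared. A sketch, with loudly hypothetical values standing in for the proprietary thresholds:

```python
# The threshold values below are hypothetical placeholders; the real values
# are part of the proprietary scoring model. The rule's shape is the point:
# stronger claim types require more supporting evidence.
MIN_EVIDENCE = {
    "contradiction": 3,   # stronger claim, higher bar (hypothetical)
    "alignment": 1,       # weaker claim, lower bar (hypothetical)
}

def meets_minimum(finding_kind: str, evidence_count: int) -> bool:
    """Findings below the evidence floor never reach the skeptic."""
    return evidence_count >= MIN_EVIDENCE.get(finding_kind, 2)
```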
Deterministic Scoring
Once findings survive debate, the math is mechanical.
Each sustained finding has two properties that determine its contribution to the score: a base value, set by its finding type, and a strength weight.
The score computation is: contribution = base value × strength weight, summed across all sustained findings, normalized, and bounded. A sparse-finding dampener prevents single findings from dominating. No findings in a dimension = neutral score (0.50), not a good score.
The overall coherence score is a weighted average of Truth, Authority, and Continuity. The vertex weights and dimension base values were calibrated against 74 agent-scored runs and have been frozen since calibration. They do not change between scans, between entities, or between operators.
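A minimal sketch of that arithmetic. Every constant below (base values, dampener curve, vertex weights) is a hypothetical stand-in; the calibrated values are frozen and proprietary. Only the shape of the computation is shown.

```python
# Deterministic scoring sketch. All constants are hypothetical stand-ins
# for the frozen, calibrated values. No interpretation happens here: the
# same sustained findings always produce the same score.
def dimension_score(findings: list[dict]) -> float:
    if not findings:
        return 0.50                       # no findings = neutral, not good
    # contribution = base value x strength weight, summed
    total = sum(f["base_value"] * f["strength"] for f in findings)
    # Sparse-finding dampener: fewer findings, smaller swing, so a single
    # finding cannot dominate the dimension (hypothetical curve).
    dampener = min(1.0, len(findings) / 4)
    raw = 0.50 + total * dampener         # normalized around neutral
    return max(0.0, min(1.0, raw))        # bounded to [0, 1]

# Hypothetical vertex weights; the calibrated values are proprietary.
VERTEX_WEIGHTS = {"truth": 0.40, "authority": 0.35, "continuity": 0.25}

def overall_score(vertex_scores: dict) -> float:
    # Weighted average across Truth, Authority, and Continuity.
    return sum(VERTEX_WEIGHTS[v] * s for v, s in vertex_scores.items())
```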
Confidence: What the Instrument Trusts
Every score comes with a confidence rating. This is not a subjective assessment — it is computed from five measurable properties of the evidence base:
- Evidence count. How many claims and observations support the assessment. More evidence = higher confidence.
- Source diversity. How many distinct source types contributed. A scan drawing from 7 source types is more reliable than one drawing from 2.
- Recency. How current the evidence is. Stale data reduces confidence.
- Internal consistency. Do the findings contradict each other? Contradicting findings reduce confidence. The penalty is capped — some contradiction is expected in complex organizations.
- Traceability. What fraction of evidence has a verifiable link to a source document. Untraceable evidence is weighted less.
The confidence score is the minimum across all assessed vertices. If Truth confidence is 0.85 but Authority confidence is 0.60, the overall confidence is 0.60. The instrument reports its weakest link, not its strongest.
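A sketch of that computation, assuming hypothetical component scales and a hypothetical penalty floor; the published description fixes the five inputs and the weakest-link rule, not the constants.

```python
# Confidence sketch. Component scales and the contradiction-penalty floor
# are hypothetical; the five inputs and the min-across-vertices rule
# follow the description above.
def vertex_confidence(evidence_count: int, source_types: int,
                      recency: float, consistency: float,
                      traceable_fraction: float) -> float:
    components = [
        min(1.0, evidence_count / 20),  # evidence count (hypothetical scale)
        min(1.0, source_types / 7),     # source diversity (hypothetical scale)
        recency,                        # in [0, 1]; stale evidence scores low
        max(0.6, consistency),          # penalty capped: some contradiction
                                        # is expected (hypothetical floor)
        traceable_fraction,             # untraceable evidence weighs less
    ]
    return sum(components) / len(components)

def overall_confidence(per_vertex: dict) -> float:
    # Weakest link: Truth 0.85 and Authority 0.60 reports as 0.60.
    return min(per_vertex.values())
```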
A confidence of 0.78 means: "The evidence base is strong across multiple source types with high traceability and low internal contradiction." A confidence of 0.50 means: "I'm showing you what I see, but I need more data to be certain."
Caps and Guards
The instrument limits its own claims. When data quality is insufficient, scores are automatically capped regardless of what the agents found. The sketch after this list shows the pattern.
- Source coverage caps. If the scan lacks minimum center or edge source diversity, scores cannot exceed a defined ceiling and confidence is reduced. The instrument will not produce a confident diagnosis from narrow data.
- Evidence density caps. If total evidence (claims + observations) falls below a minimum threshold, the overall score is capped and confidence is downgraded. Sparse evidence cannot produce precise scores.
- Temporal depth caps. Continuity scoring requires multiple collection periods. Without sufficient longitudinal data, the continuity vertex is scored from observed trajectory patterns rather than the full continuity formula.
- Pre-debate rejection. Findings that do not meet minimum evidence thresholds are excluded before the skeptic debate begins. They never enter the scoring pipeline.
- Sustain rate monitoring. If the skeptic rejects more than 80% of proposed findings, the system flags a data quality warning. This prevents low-quality evidence from producing overconfident assessments.
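Here is that guard layer in sketch form, with hypothetical ceilings, thresholds, and multipliers in place of the proprietary values. The pattern is what matters: the caps apply after scoring, mechanically, regardless of what the agents found.

```python
# Guard-layer sketch. Every numeric constant is a hypothetical stand-in
# for a proprietary value.
def apply_guards(score: float, confidence: float, *,
                 source_types: int, total_evidence: int,
                 sustain_rate: float) -> tuple[float, float, list[str]]:
    warnings = []
    if source_types < 3:                 # source coverage cap (hypothetical)
        score = min(score, 0.65)
        confidence *= 0.80
    if total_evidence < 15:              # evidence density cap (hypothetical)
        score = min(score, 0.70)
        confidence *= 0.90
    if sustain_rate < 0.20:              # skeptic rejected more than 80%
        warnings.append("data quality warning: sustain rate below 20%")
    return score, confidence, warnings
```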
These constraints exist because the most dangerous thing a diagnostic instrument can do is claim precision it hasn't earned. The caps are not a limitation. They are a design choice.
Reproducibility
A diagnostic instrument must produce consistent results. We tested this by scanning the same organization seven times across four collection periods as data quality improved from Grade C to Grade A. Results by period:
| Period | Truth | Authority | Overall | Grade | Primary Signal |
|---|---|---|---|---|---|
| 1 | 0.45 | 0.38 | 0.42 | C | Authority compression detected |
| 2 | 0.33 | 0.55 | 0.43 | B | Multiple failure modes cascade |
| 3 | 0.58 | 0.38 | 0.49 | A | Authority crisis — lowest score |
| 4 | 0.44 | 0.50 | 0.47 | A | Stress migrates to Truth |
Scores moved. The structural diagnosis persisted. Authority compression was detected in the majority of runs. When data quality improved (Grade C to A), scoring precision increased but the underlying story remained consistent. That is how a real diagnostic instrument behaves: the readings sharpen as measurement quality improves, but the condition being detected does not change merely because the measurement got better.
Across a 15-entity fleet diagnostic, failure modes showed sector-level patterns: the same structural condition appeared in all three fintech entities analyzed, while aerospace and defense entities showed a different structural profile. The instrument detects structure, not noise.
Failure Mode Detection
The 17 failure modes in the DRI™ taxonomy are the structural conditions the instrument detects. Each has:
- A tier classification. Foundational (Tier 1), Systemic (Tier 2), or Terminal (Tier 3). Higher tiers require stronger evidence to activate.
- An activation threshold. A minimum confidence score that must be exceeded before the failure mode is reported as active. The thresholds are tiered — terminal failure modes have higher activation thresholds than foundational ones because the claim is more consequential.
- Cascade relationships. Defined pathways through which one failure mode produces conditions that activate another. These are structural, not statistical — they are derived from the taxonomy, not from correlations in the data.
- Precursor signals (field notes). 21 early-warning indicators that track patterns below the failure-mode threshold. These are the leading indicators — the conditions that, if left unaddressed, will cross the activation threshold.
The activation thresholds, cascade definitions, and precursor mappings are part of the proprietary scoring model.
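The activation logic itself is simple even though its parameters are not shared. A sketch, with hypothetical tiers, thresholds, and mode names (only "authority compression" appears in this document; the rest are invented for illustration):

```python
# All numbers and most names here are hypothetical illustrations; the real
# thresholds, cascade definitions, and precursor mappings are proprietary.
TIER_THRESHOLD = {1: 0.55, 2: 0.65, 3: 0.80}   # higher tier, higher bar

def active_modes(candidates: list[tuple[str, int, float]]) -> list[str]:
    """candidates: (failure_mode, tier, confidence) triples."""
    return [mode for mode, tier, conf in candidates
            if conf >= TIER_THRESHOLD[tier]]

# Cascades are structural edges defined by the taxonomy, not correlations
# learned from the data: an active mode creates conditions for the next.
CASCADES = {
    "authority_compression": ["decision_latency"],   # hypothetical edge
    "decision_latency": ["truth_erosion"],           # hypothetical edge
}
```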
What We Don't Share
The architecture of this instrument is transparent. The parameters are proprietary.
This is the same boundary that defines every serious measurement system. FICO publishes that credit scores use payment history, utilization, length of history, new credit, and credit mix. It does not publish the scoring formula. The architecture builds trust. The parameters are the instrument.