Greenbaum Labs
Methodology

How the Instrument Works

The Coherence Scan is not a survey, a sentiment analysis, or a consulting framework. It is a measurement instrument with a defined evidence chain, deterministic scoring, and built-in constraints that prevent it from claiming more than it can prove.

Collect → Extract → Score → Debate → Synthesize

Evidence only. Public data, traceable to source.
Adversarial filtering. Skeptic rejects what can't be defended.
Deterministic scoring. Locked formula. Zero drift.
Confidence + caps. The instrument limits its own claims.

Design Philosophy

Core Principle
The lens is sophisticated because it needs to be. The math is simple because it should be.

Most diagnostic systems fail in one of two ways: they make interpretation simple and scoring complex, or they make both complex. The Coherence Scan separates these concerns deliberately.

Interpretation is hard. Extracting meaningful signal from SEC filings, employee reviews, customer complaints, and job postings requires a sophisticated lens. AI agents do this work, and an adversarial process keeps them honest.

Scoring is simple. Once findings survive debate, the math that converts them into scores is deterministic, locked, and auditable. No interpretation happens at the scoring layer. No weights shift between runs. The formula was calibrated against 74 scored runs and frozen.

This separation is the instrument's integrity mechanism. The agents can hallucinate or miss nuance — that's why the skeptic exists. But once a finding is sustained, the score it produces is the same score it will always produce. No drift. No judgment calls. No "it depends."

The Pipeline

Every scan follows the same five-stage process. No stage is optional. No stage can be skipped.

Collect
Automated collection from public sources: SEC filings, earnings transcripts, job postings, employee reviews, customer reviews, social media, news coverage, and regulatory signals. No proprietary data. No inside sources. No surveys.
Extract
AI agents read every document and extract two types of signal: claims (what the organization says about itself) and observations (what the world says about the organization). Each extraction is tagged with source, scope, polarity, and a traceable hash.
Score
Specialized agents assess Truth and Authority by comparing claims against observations. They produce structured findings with dimension classification, strength rating, and supporting evidence references. Scoring is deterministic from findings — the formula is locked.
Debate
An adversarial skeptic agent challenges every proposed finding across two rounds. It identifies logical flaws, evidence gaps, and unsupported conclusions. Only findings that survive both rounds enter the final assessment.
Synthesize
A synthesizer agent writes the final assessment from sustained findings only. It cannot reference anything the skeptic rejected.

Each stage produces artifacts that feed the next. Nothing is inferred. Nothing is assumed.
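As a minimal sketch of that contract, assuming hypothetical function names and stubbed-out stage bodies (the production interfaces are not published), the flow looks like this:

```python
# Minimal sketch of the five-stage contract. All names and stub bodies are
# hypothetical; the point is that each stage consumes only the artifacts of
# the stage before it.

def collect(entity: str) -> list[str]:
    """Stage 1: gather public documents (stubbed)."""
    return [f"10-K filing for {entity}", f"employee review of {entity}"]

def extract(documents: list[str]) -> list[dict]:
    """Stage 2: pull claims and observations out of each document."""
    return [{"doc": doc, "kind": "claim", "text": "..."} for doc in documents]

def propose_findings(extractions: list[dict]) -> list[dict]:
    """Stage 3: compare claims against observations, propose findings."""
    return [{"dimension": "truth", "strength": "moderate", "evidence": extractions}]

def debate(findings: list[dict]) -> list[dict]:
    """Stage 4: the skeptic sustains or rejects each finding (stubbed)."""
    return [f for f in findings if f["evidence"]]

def synthesize(sustained: list[dict]) -> str:
    """Stage 5: write the assessment from sustained findings only."""
    return f"assessment built from {len(sustained)} sustained finding(s)"

def run_scan(entity: str) -> str:
    return synthesize(debate(propose_findings(extract(collect(entity)))))

print(run_scan("ExampleCo"))
```

The design point is in the signatures: synthesize only ever receives what debate returned, so rejected material cannot resurface downstream.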

Evidence Lineage

Every number in a Coherence Scan traces back to a specific source document. The chain is fully auditable.

Source Document
A public document — an SEC filing, a customer review, an employee review, a job posting. Identified by source type, retrieval date, and content hash.
Extraction
AI agent reads the document and extracts claims or observations. Each is assigned an ID, classification, and excerpt hash that ties it to the specific passage in the source.
Finding
A scoring agent compares claims and observations, identifies a pattern (alignment, contradiction, omission, compression, diffusion, misalignment), and proposes a finding with references to the specific claims and observations that support it.
Debate
The skeptic challenges the finding. If sustained, the finding enters the evidence ledger with the skeptic's reasoning preserved. If rejected, the rejection reason and evidence gap are recorded.
Score
The sustained finding's dimension and strength feed the deterministic formula. The contribution to the final score is computed mechanically. No interpretation at this layer.

If a buyer asks "why is the Authority score 0.38?" — the answer traces from score, to sustained finding, to specific claims and observations, to the exact passage in the source document. The chain does not break.
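A sketch of how that chain can be represented, assuming hypothetical type names and a truncated SHA-256 standing in for whatever hash the real system uses (the actual schema is not published):

```python
import hashlib
from dataclasses import dataclass

def sha(text: str) -> str:
    """Truncated SHA-256; a stand-in for the real content hash."""
    return hashlib.sha256(text.encode()).hexdigest()[:12]

@dataclass
class SourceDocument:
    source_type: str          # e.g. "sec_filing", "customer_review"
    retrieved: str            # retrieval date
    content: str

    @property
    def content_hash(self) -> str:
        return sha(self.content)

@dataclass
class Extraction:
    extraction_id: str
    kind: str                 # "claim" or "observation"
    excerpt: str              # the specific passage in the source
    document: SourceDocument

    @property
    def excerpt_hash(self) -> str:
        return sha(self.excerpt)

@dataclass
class Finding:
    pattern: str              # alignment, contradiction, omission, ...
    evidence: list[Extraction]

def trace(finding: Finding) -> None:
    """Walk a finding back to the exact passages and documents behind it."""
    for e in finding.evidence:
        print(f"{finding.pattern} <- {e.extraction_id} [{e.excerpt_hash}]"
              f" <- {e.document.source_type} [{e.document.content_hash}]")

doc = SourceDocument("sec_filing", "2025-01-15", "full text of the filing ...")
claim = Extraction("c-001", "claim", "We lead the market in ...", doc)
trace(Finding("contradiction", [claim]))
```

Answering the "why 0.38?" question is then a mechanical walk down this chain.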

The Adversarial Skeptic

The skeptic is not a quality check. It is a structural feature of the instrument. It exists because the agents that generate findings can be wrong.

How it works

Round 1: Challenge. The skeptic reviews each proposed finding and issues a verdict: sustained (evidence holds), weakened (evidence is thin but not indefensible), or rejected (evidence does not support the claim).

Round 2: Rebuttal. Weakened findings get a second chance. The skeptic reviews them again with the original challenge in view. Final verdict: sustained or rejected. No appeals.
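The control flow is simple enough to sketch. Here the challenge callable stands in for the skeptic agent itself, and toy_challenge is purely illustrative:

```python
# Two-round verdict logic. The verdict labels and the challenge callable
# are stand-ins for the real skeptic agent.
SUSTAINED, WEAKENED, REJECTED = "sustained", "weakened", "rejected"

def debate(findings, challenge):
    ledger = []
    for finding in findings:
        verdict = challenge(finding, prior=None)          # round 1: challenge
        if verdict == WEAKENED:
            verdict = challenge(finding, prior=WEAKENED)  # round 2: rebuttal
            assert verdict in (SUSTAINED, REJECTED)       # no appeals
        if verdict == SUSTAINED:
            ledger.append(finding)    # sustained findings enter the ledger
    return ledger

# Toy skeptic: sustain well-evidenced findings, reject thin ones on rebuttal.
def toy_challenge(finding, prior):
    if len(finding["evidence"]) >= 3:
        return SUSTAINED
    return REJECTED if prior == WEAKENED else WEAKENED

print(debate([{"evidence": [1, 2, 3]}, {"evidence": [1]}], toy_challenge))
# only the first finding survives
```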

What the skeptic catches

Insufficient breadth
"The finding cites only 5 of 47 complaints. Are these representative or cherry-picked?"
Logical inversion
"The finding treats quality control issues as evidence of improvement. These observations contradict the claim."
Missing context
"Account freezes could be temporary or situational, not systemic. Frequency data is needed."
Aspirational conflation
"The mission may be aspirational. Operational constraints may exist. The misalignment is plausible but not proven."

In a recent scan, the skeptic rejected 4 of 6 proposed findings. That is the instrument working correctly. The rejection rate is published in every scan because what the instrument chose not to assert is as important as what it did.

Evidence minimums

Findings must meet minimum evidence thresholds before the skeptic even sees them. The thresholds vary by finding type — contradictions require more supporting evidence than alignments, because the claim is stronger. Findings below the threshold are automatically excluded. The exact thresholds are part of the proprietary scoring model.
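A sketch of that gate with placeholder numbers only; since the real thresholds are proprietary, the values below exist just to show the asymmetry between finding types:

```python
# Placeholder minimums, not the real model: contradictions carry a stronger
# claim, so they demand more supporting evidence than alignments.
MIN_EVIDENCE = {"alignment": 2, "omission": 3, "contradiction": 4}

def meets_minimum(finding: dict) -> bool:
    required = MIN_EVIDENCE.get(finding["pattern"], 3)
    return len(finding["evidence"]) >= required

proposed = [
    {"pattern": "contradiction", "evidence": ["e1", "e2"]},  # excluded
    {"pattern": "alignment", "evidence": ["e1", "e2"]},      # passes
]
for_debate = [f for f in proposed if meets_minimum(f)]  # only the alignment
```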

Deterministic Scoring

Once findings survive debate, the math is mechanical.

Each sustained finding has two properties that determine its contribution to the score:

Dimension
Truth: alignment (positive), omission (negative), contradiction (negative). Authority: compression (negative), diffusion (negative), misalignment (negative). Each dimension has a fixed base value.
Strength
Strong, moderate, or weak — determined by the scoring agent based on evidence density and specificity. Each strength level has a fixed weight.

The score computation is: contribution = base value × strength weight, summed across all sustained findings, normalized, and bounded. A sparse-finding dampener prevents single findings from dominating. No findings in a dimension = neutral score (0.50), not a good score.

The overall coherence score is a weighted average of Truth, Authority, and Continuity. The vertex weights and dimension base values were calibrated against 74 agent-scored runs and have been frozen since calibration. They do not change between scans, between entities, or between operators.

What this means
Give two operators the same sustained findings, and they will produce the same score. The scoring layer has zero degrees of freedom.
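That zero-degrees-of-freedom property can be made concrete with a sketch. Every constant below (base values, strength weights, the dampener, the vertex weights) is an illustrative placeholder, not the calibrated model:

```python
# Hypothetical rendering of the locked formula. All constants are
# placeholders; the calibrated values are proprietary and frozen.
BASE = {                                   # dimension base values (signs per the text)
    "alignment": +1.0, "omission": -1.0, "contradiction": -1.0,    # Truth
    "compression": -1.0, "diffusion": -1.0, "misalignment": -1.0,  # Authority
}
STRENGTH = {"strong": 1.0, "moderate": 0.6, "weak": 0.3}  # placeholder weights
NEUTRAL = 0.50                             # no findings = neutral, not good

def vertex_score(findings: list[tuple[str, str]]) -> float:
    """findings: (dimension, strength) pairs sustained for one vertex."""
    if not findings:
        return NEUTRAL
    total = sum(BASE[dim] * STRENGTH[strength] for dim, strength in findings)
    dampener = len(findings) / (len(findings) + 1)  # placeholder sparse-finding dampener
    raw = NEUTRAL + 0.5 * dampener * total / len(findings)
    return min(1.0, max(0.0, raw))                  # bounded to [0, 1]

def overall(truth: float, authority: float, continuity: float) -> float:
    w_t, w_a, w_c = 0.4, 0.4, 0.2          # placeholder vertex weights
    return w_t * truth + w_a * authority + w_c * continuity

truth = vertex_score([("alignment", "strong"), ("omission", "weak")])
authority = vertex_score([("compression", "moderate")])
print(round(overall(truth, authority, NEUTRAL), 2))
```

Run it twice, or hand it to a second operator: identical findings produce identical numbers, because nothing in the function asks for judgment.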

Confidence: What the Instrument Trusts

Every score comes with a confidence rating. This is not a subjective assessment — it is computed from five measurable properties of the evidence base.

The confidence score is the minimum across all assessed vertices. If Truth confidence is 0.85 but Authority confidence is 0.60, the overall confidence is 0.60. The instrument reports its weakest link, not its strongest.

A confidence of 0.78 means: "The evidence base is strong across multiple source types with high traceability and low internal contradiction." A confidence of 0.50 means: "I'm showing you what I see, but I need more data to be certain."
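The weakest-link rule itself is one line of code; only the computation of the per-vertex components is proprietary:

```python
# Overall confidence is the minimum across assessed vertices: the
# instrument reports its weakest link, not its strongest.
def overall_confidence(vertex_confidence: dict[str, float]) -> float:
    return min(vertex_confidence.values())

print(overall_confidence({"truth": 0.85, "authority": 0.60}))  # -> 0.6
```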

Caps and Guards

The instrument limits its own claims. When data quality is insufficient, scores are automatically capped regardless of what the agents found.

These constraints exist because the most dangerous thing a diagnostic instrument can do is claim precision it hasn't earned. The caps are not a limitation. They are a design choice.
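A sketch of the mechanism, assuming a hypothetical grade-to-cap table (the real triggers are proprietary):

```python
# Placeholder caps keyed on data-quality grade. The shape is the point:
# the cap applies regardless of what the agents found.
SCORE_CAP_BY_GRADE = {"A": 1.00, "B": 0.85, "C": 0.70, "D": 0.55}

def apply_cap(score: float, data_grade: str) -> float:
    return min(score, SCORE_CAP_BY_GRADE.get(data_grade, 0.55))

print(apply_cap(0.92, "C"))  # strong score, weak data -> capped at 0.7
```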

Reproducibility

A diagnostic instrument must produce consistent results. We tested this by scanning the same organization seven times across four collection periods as data quality improved from Grade C to Grade A.

Period   Truth   Authority   Overall   Grade   Primary Signal
1        0.45    0.38        0.42      C       Authority compression detected
2        0.33    0.55        0.43      B       Multiple failure modes cascade
3        0.58    0.38        0.49      A       Authority crisis — lowest score
4        0.44    0.50        0.47      A       Stress migrates to Truth

Scores moved. The structural diagnosis persisted. Authority compression was detected in the majority of runs. When data quality improved from Grade C to Grade A, scoring precision increased but the underlying story remained consistent. That is how a real diagnostic instrument behaves: readings sharpen as measurement quality improves, but the condition being detected does not change just because the measurement did.

Across a 15-entity fleet diagnostic, failure modes showed sector-level patterns: the same structural condition appeared in all three fintech entities analyzed, while aerospace and defense entities showed a different structural profile. The instrument detects structure, not noise.

Failure Mode Detection

The 17 failure modes in the DRI™ taxonomy are the structural conditions the instrument detects. Each is defined by an activation threshold, a set of cascade relationships to other failure modes, and a mapping from precursor signals.

The activation thresholds, cascade definitions, and precursor mappings themselves are part of the proprietary scoring model.
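As a structural sketch only, a taxonomy entry might carry fields like these (the names and values below are illustrative, not the published taxonomy or its real parameters):

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    name: str
    activation_threshold: float   # when the condition counts as active (placeholder)
    cascades_to: list[str]        # downstream failure modes it can trigger
    precursors: list[str]         # finding patterns that tend to precede it

# Illustrative entry only; real thresholds and mappings are proprietary.
example = FailureMode(
    name="authority_compression",
    activation_threshold=0.40,
    cascades_to=["decision_bottleneck"],        # hypothetical mode name
    precursors=["compression", "misalignment"],
)
```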

What We Don't Share

The architecture of this instrument is transparent. The parameters are proprietary.

Published
The pipeline stages, the evidence chain structure, the adversarial debate process, the confidence model components, the scoring philosophy, the cap and guard system, the reproducibility evidence, and the full 17-failure-mode taxonomy.
Proprietary
The dimension base values, strength weights, vertex weights, sparse-finding dampener formula, confidence component weights, failure mode activation thresholds, evidence minimum thresholds, score cap triggers, and calibration dataset.

This is the same boundary that defines every serious measurement system. FICO publishes that credit scores use payment history, utilization, length of history, new credit, and credit mix. It does not publish the exact formula. The architecture builds trust. The parameters are the instrument.

See it in action
The method produces the measurement. The measurement produces the weather map.
View Sample Scan