Greenbaum Labs
Lab

Build notes from the diagnostic engine.

Experiments, benchmarks, and field notes from building a structural diagnostic pipeline on local inference infrastructure. Real models, real data, real conditions.

What this is
The Coherence External Diagnostic Pipeline runs entirely on local hardware — no cloud APIs. Every model choice, prompt change, and scoring adjustment has measurable consequences. This is where we publish what we learn.
Model Race April 7, 2026
Gemma 4 vs Qwen3 on a Global Restaurant Chain
Two 30B-class models ran the full pipeline against the same 709-document corpus on parallel DGX Sparks. Speed, extraction quality, scoring fidelity, and diagnostic depth compared head-to-head.
709 documents Gemma 4 0.498 Qwen3 0.432 Winner: Gemma 4 31B
Pipeline Hardening March 10, 2026
Autoresearch: 73 Experiments to 0.000 Standard Deviation
Systematic pipeline hardening across 4 research tracks on distributed DGX Sparks. 30 hours of compute. Prompt tuning, scoring calibration, finding thresholds, and adversarial debate until the pipeline converged.
73 experiments 4 research tracks 30h compute 0.000 std dev