Research & Benchmarks
Reports, case studies, and failure analysis
March 27, 2026
Flagship Case Study: A Strong Agent Failed the Trial
Why a promising production-style agent lost trust, burned budget, and failed the pre-deployment stress test for autonomous agents.
March 2026
4-Agent Benchmark: Who Survives the Crucible?
We ran four agent strategies through 200 ticks of economic survival. Only two made it out alive.