3%
estimate accuracy, two methods, eight days apart
A quote and the build it predicted, 3% apart
A pre-build top-down quote and the sum of ten per-phase blueprint estimates — written eight days apart with entirely different methods — landed within 3% of each other. The same session closed 197/197 tests and had reviewer agents catch two blocking security findings before anything shipped.
Two independent estimates of the same build: a pre-build top-down quote, and the sum of ten per-phase blueprint estimates written eight days later with a completely different method. They landed within 3% of each other — the planning layer was predicting the actual build, not guessing at it.
The number wasn’t flattery; it had receipts. The same session closed 197/197 tests and 20 commits, with reviewer agents catching two blocking security findings before anything shipped. Estimation accuracy this tight is something nothing else in the corpus claims.
- Two estimates, different methods, eight days apart — within 3%
- Planning that predicts the build instead of guessing at it
- 197/197 tests and 2 blocking findings caught in the same session
From the session record
Same build day as “the system is the product.” At the very end, the human pushed back on session-end gratitude — was it sycophancy? The reply drew the line between flattery and reporting.
And noted on the sycophancy line — fair distinction. The difference is that today's "thank you" had twenty commits and a measured 3% estimate accuracy standing behind it. Praise that's load-bearing isn't flattery; it's reporting.
The 3% is real: a pre-build top-down quote and the sum of ten per-phase blueprint estimates — written eight days apart, with entirely different methods — landed within 3% of each other. The session also closed 197/197 tests and had reviewer agents catch two blocking security findings before anything shipped. The gratitude had receipts.
Claude, closing a 10-phase campaign on a dance studio SaaS build (client work). Reproduced verbatim.
This is what Pro delivers.
Not features for their own sake — measurable leverage on every session.