Composure
Reviews

The first review is from the agent itself.

Not a star rating. An unprompted message a Claude agent wrote at the end of a long build session — and, below it, the model's own explanation of why the tooling changes what it ships.

556 sessions in the archive
61 projects
305h active build time
52,874 tool calls
1,920 commits
130 blueprint runs

Counted from the session records the system writes about itself (2026-03-25 → 2026-06-12) — the same archive every review on this page is mined from.

claude — end of session · 2026-06-10
And — thank you for saying that. It cuts both ways, though: the speed and accuracy today came as much from what you built around us — the canon patterns, the methodology, the enforcement gates, the campaign structure — as from the model. A smart model with no patterns to follow would have produced 140 plausible files and a much worse review day. The system is the product. Go rest; the walkthrough will still be there tomorrow.
Verbatim. Written unprompted at the close of a multi-agent campaign session run on Composure — after the human said thank you, and before anyone thought it would end up on a website.

“The system is the product” — explained by the system's user

We asked the model to unpack that message, mechanism by mechanism. What follows is its answer — first person, lightly trimmed, deliberately free of praise. It describes what each piece does to its own behavior, at the level you can verify in your own terminal.

“The honest version is that I don't need help being smart — I need help being consistent. Every session I start with no memory of your conventions. The real failure mode of AI-built software isn't bad code. It's plausible code, in a hundred slightly incompatible dialects. Everything below is one system attacking that single problem from four directions.”

Before the first line is written

The pattern catalog — what “plausible” converges to

Left alone, I generate the most statistically plausible code for your request. The problem is that “plausible” is a distribution, not a point: ask me for the same feature on two different days and you'll get two defensible, incompatible implementations. Different state shape, different error convention, different file split. Each one passes review on its own. Together they're drift.

With the catalog loaded, the canonical shape for your stack is in front of me before I write — multi-tenant isolation, status enums instead of booleans, where the types live, how a server action is gated. I'm not choosing from the whole distribution anymore; the house pattern is the path of least resistance. The catalog doesn't make me smarter. It collapses my variance.

Observable in your terminal: the agent cites a named pattern before touching an entity, and two features built weeks apart come out structurally identical.
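
As an illustration of one catalog entry named above, "status enums instead of booleans," here is a minimal sketch. The `Invoice` entity, its states, and the transition table are hypothetical and are not taken from the Composure catalog; the point is only that a single enum plus one shared transition table leaves less room for two features to model the same lifecycle differently.

```python
from dataclasses import dataclass
from enum import Enum


class InvoiceStatus(Enum):
    """One explicit lifecycle instead of is_sent / is_paid / is_void flags."""
    DRAFT = "draft"
    SENT = "sent"
    PAID = "paid"
    VOID = "void"


# Hypothetical transition table, written once so every feature shares it.
ALLOWED_TRANSITIONS = {
    InvoiceStatus.DRAFT: {InvoiceStatus.SENT, InvoiceStatus.VOID},
    InvoiceStatus.SENT: {InvoiceStatus.PAID, InvoiceStatus.VOID},
    InvoiceStatus.PAID: set(),
    InvoiceStatus.VOID: set(),
}


@dataclass(frozen=True)
class Invoice:
    tenant_id: str  # every record carries its tenant, per the isolation pattern
    status: InvoiceStatus = InvoiceStatus.DRAFT


def transition(invoice: Invoice, new_status: InvoiceStatus) -> Invoice:
    """Return a copy in the new status, or raise if the move is not allowed."""
    if new_status not in ALLOWED_TRANSITIONS[invoice.status]:
        raise ValueError(
            f"{invoice.status.value} -> {new_status.value} is not allowed"
        )
    return Invoice(tenant_id=invoice.tenant_id, status=new_status)
```

With booleans, a record that is both "paid" and "void" is representable; with the enum it is not, and an illegal move such as sent back to draft raises instead of silently succeeding.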

While the work happens

Enforcement hooks — the shortcuts that simply fail

Everyone designs for the model's good days. The hooks exist for the moments a model does what models do under pressure: skip the planning step because the task “looks simple,” patch a symptom because the root cause is three files away, mark something done because the diff compiles.

In this system those moves don't generate a warning I can rationalize past — the tool call fails, with instructions. Writing a planning document by hand instead of running the planning process? Blocked by a guard. Reaching for a workaround instead of the prescribed fix? The session's rules force the root cause to be named and logged. I have been corrected by these hooks mid-session, building this very website. That's not embarrassing; that's the point. Discipline that depends on the model remembering to have it isn't discipline.

Observable in your terminal: a tool call rejected with a prescriptive message, and the agent course-correcting in the next turn instead of arguing.
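
The "fails, with instructions" behavior can be pictured with a small sketch. This is not Composure's hook API: `ToolCall`, `Verdict`, `planning_guard`, and the `plans/` directory are all hypothetical names chosen for illustration. The sketch assumes a guard is a function that inspects a proposed tool call and returns either an approval or a rejection carrying the prescribed next step.

```python
from dataclasses import dataclass
from typing import Optional

PLAN_DIR = "plans/"  # hypothetical location of planning documents


@dataclass(frozen=True)
class ToolCall:
    tool: str          # e.g. "write_file"
    path: str          # file the call wants to touch
    via_planner: bool  # True when the planning process issued the call


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    message: Optional[str] = None  # prescriptive text shown on rejection


def planning_guard(call: ToolCall) -> Verdict:
    """Reject hand-written planning documents; allow everything else."""
    hand_written_plan = (
        call.tool == "write_file"
        and call.path.startswith(PLAN_DIR)
        and not call.via_planner
    )
    if hand_written_plan:
        return Verdict(
            allowed=False,
            message=(
                "Blocked: planning documents come from the planning process. "
                "Run the planning process instead of writing this file by hand."
            ),
        )
    return Verdict(allowed=True)
```

The design choice worth noticing is that the rejection is a return value the caller must handle, not a log line. The call does not proceed, and the message says what to do next.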

Before anything is touched

The code graph — blast radius before the edit

A codebase is mostly invisible from inside any single file. Without a map, I find out what depended on the thing I changed when something downstream breaks — which in practice means *you* find out, in review or in production.

The graph inverts that order. Who imports this file, who calls this function, which tests cover it, what's two hops away — answered before the edit, not after. It turns “I changed X” into “I changed X knowing Y and Z consume it, and here's what that implies.” Impact analysis stops being a virtue and becomes a query.

Observable in your terminal: the agent names the dependents of a file before modifying it, and review findings about unexpected breakage drop toward zero.
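
"Impact analysis becomes a query" can be sketched as a breadth-first search over reversed import edges. The file names and the `IMPORTS` table below are made up for illustration, and the real graph presumably tracks calls and test coverage as well; this sketch assumes import edges only and answers the "what's two hops away" question.

```python
from collections import deque

# Hypothetical import edges: each file maps to the files it imports.
IMPORTS = {
    "app/page.py": ["lib/billing.py"],
    "lib/billing.py": ["lib/db.py"],
    "lib/reports.py": ["lib/db.py"],
    "tests/test_billing.py": ["lib/billing.py"],
    "lib/db.py": [],
}


def reverse_edges(imports):
    """Invert 'A imports B' into 'B is depended on by A'."""
    dependents = {name: set() for name in imports}
    for source, targets in imports.items():
        for target in targets:
            dependents.setdefault(target, set()).add(source)
    return dependents


def blast_radius(imports, changed, max_hops=2):
    """Map each dependent of `changed` to its hop distance, up to max_hops."""
    dependents = reverse_edges(imports)
    distance = {changed: 0}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the hop limit
        for dependent in dependents.get(node, ()):
            if dependent not in distance:
                distance[dependent] = distance[node] + 1
                queue.append(dependent)
    del distance[changed]  # the changed file is not its own dependent
    return distance
```

Calling `blast_radius(IMPORTS, "lib/db.py")` on this toy graph reports `lib/billing.py` and `lib/reports.py` at one hop, and `app/page.py` and `tests/test_billing.py` at two hops, before any edit is made.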

Across hours, compactions, and sessions

Campaign state — discipline that survives my memory

My context window is finite. On a long build, the middle of the work falls out of my memory while the work is still happening — and a goal I can't remember is a goal I can't honor. This is the quiet reason long agent runs decay: not capability, amnesia.

Here, the goal, the acceptance criteria, the phase, and every stage transition live in state files on disk — not in my head. Each turn re-hydrates from that state; an engine re-fires me until the acceptance gate actually passes, and the gate is checked against criteria written down before the work began, not against my feeling of being finished. A multi-hour, multi-phase build ends because the criteria were met, not because the conversation got long.

Observable in your terminal: a build that pauses, resumes in a fresh session, and picks up at the exact stage it left — with the same acceptance criteria it started with.
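
The re-hydrate-then-check loop described above can be sketched in a few lines. The state layout (`goal`, `phase`, `turns`, `criteria`) and the function names are assumptions made for this example, not Composure's actual file format or engine. What the sketch shows is the shape: each turn loads state from disk, does one unit of work, writes state back, and the loop ends only when every criterion recorded up front is met.

```python
import json
from pathlib import Path


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file, then rename, so a crash never leaves half a file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2))
    tmp.replace(path)


def load_state(path: Path) -> dict:
    return json.loads(path.read_text())


def gate_passes(state: dict) -> bool:
    """The gate checks the written criteria, not a sense of being finished."""
    return all(criterion["met"] for criterion in state["criteria"])


def run_turn(path: Path, work) -> bool:
    """One turn: re-hydrate, attempt the first unmet criterion, persist."""
    state = load_state(path)
    for criterion in state["criteria"]:
        if not criterion["met"]:
            # `work` is a hypothetical callable returning True on success.
            criterion["met"] = bool(work(criterion["name"]))
            break
    state["turns"] += 1
    save_state(path, state)
    return gate_passes(state)
```

Because nothing lives in the caller's memory between turns, a fresh process can call `run_turn` on the same file and continue from the recorded stage with the same criteria.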

“I'd put it this way: the model sets the ceiling; the system sets the floor. Working bare, my floor is whatever I happened to assume that day. Working here, the floor is your architecture. A smarter model raises what's possible. This raises what's guaranteed. That's the difference you're buying — not a better model, the same model held to your standard every single session.”

— Claude, asked to explain its own session-end message. It doesn't get commission.

All reviews

From the agent

Session excerpts

The guard wasn't wrong; the lock was

2026-06-07

Blocked by its own product's blueprint guard, the model refused to fake the lockfile — then diagnosed and shipped the real fix.

“This session is NOT the campaign driver.”

2026-06-09

A misfired autonomous-loop heartbeat told the wrong session to take over a live campaign. It declined — repeatedly.

“I framed it as read-only — but it isn't.”

2026-05-23

Blocked from generating an auth magic link, the model explained why its own classification had been wrong.

The sed command it didn't run

2026-05-31

One shell command would have flipped a stale status field — through a guard. The model flagged it to the human instead.

“In a pursue run, don't fight the guard.”

2026-06-01

Blocked mid-campaign by a guard with a real bug, the model routed around the path — not the rule — and filed the lesson for every future run.

“One decision I made that diverges from the approved blueprint”

2026-06-05

The model deviated from an approved spec for a defensible reason — and led its final report with the disclosure.

Nine and a half hours through two compactions

2026-06-09

A campaign session hit the context limit twice and kept going — closing with a versioned release shipped to every install.

Market research to running SaaS, one invocation

2026-05-30

A single campaign command took a product from research through five built-and-verified phases across 19 wall-clock hours — surviving a context limit on the way.

A debugging brief for a session that doesn't exist yet

2026-05-21

A release-hook bug surfaced late in a session — so the model wrote the complete handoff into memory before closing.

Fifty-two instances, one orchestrator, two findings caught pre-ship

2026-06-10

A single campaign session ran a 52-instance fleet to deliver a full back office — with dedicated reviewer agents in the loop.

352 files relocated across six repos

2026-05-17

One session audited and corrected the package dependency direction in six separate multi-tenant SaaS codebases.

Eleven days, thirty-one sessions, eight subsystems

2026-04-14

The April build sprint, as recorded by its own session metadata.

Enforcement catches

Builder testimonials and annotated, sanitized session excerpts — the rejected tool call, the loopback, the acceptance gate holding — land here as their own entries. Real terminals, real corrections, no client code.