Skip to content
Engineering · Field Notes The Armature Method

Hope is not
a control.

After two silent agent failures inside a HIPAA-regulated codebase, I stopped trying to make the agent more reliable — and started designing the system around the fact that it isn't.

RB
Roderick Bertoncini
Founder, Mente360 · May 21, 2026 · 12 min
The Armature Method — turning fallible intelligence into reliable outcomes
DISPATCHES · TWO INCIDENTS, EARLY 2026
i.

At the beginning of this year, a coding agent left a requirement out of a build. Not implemented it wrong — left it out. The specification was in front of it; the omission was silent.

ii.

Six weeks later, a different agent, working against a clearinghouse API that did not follow the industry-standard pattern, hallucinated the industry-standard pattern into the integration code anyway.

Neither failure was dramatic. Neither was caught by reading the diff.
Both should have been impossible.

I build practice-management software for clinicians. The platform is HIPAA-compliant and multi-tenant; the data inside it is some of the most regulated in the country. Unlikely to recur is not a defensible posture.

The failure that ends my nights is not the dropped claim — a dropped claim is a hotfix and an apology. The failure that ends my nights is an audit trail missing a field a year from now because an agent quietly omitted it when no one was looking, and the discovery happening during a regulator's review.

That is not a bug to be cleaned up in a follow-up sprint. That is the kind of failure that ends a company.

After those two incidents I stopped trying to make the agent more reliable and started designing the system around the fact that it isn't.

This is the framework that came out of that.

THE ARMATURE METHOD

Named after the rotating core of an electric motor.

The magnets are inert without it. The agent is inert without the orchestration wrapped around it.

The layers in this method are not a static defense. They are windings, and work moves through them.

§ 1

The Premise

Agent-driven development must be approached as a system, with the agent assumed to be fallible.

This is the framing I was missing for months. Agents forget context. They drift from conventions. They infer requirements that were never agreed to. They declare work complete before it has been verified. None of this is a defect to be corrected in the agent. These are properties of the agent, to be designed around — the same way you design a process around a tired engineer at 4 p.m., not against an idealized one.

A well-designed system does not rely on the agent being consistently correct. It relies on consistently structured context, uniformly applied conventions, artifacts that carry state between sessions, and programmatic verification at every layer where verification is possible. The agent's intelligence is amplified by the system, not tested by it.

This is the operational philosophy I absorbed at Amazon, applied to a problem Amazon never faced: prefer mechanisms over diligence wherever automation is possible.

If a correct outcome depends on anyone remembering to, it is not yet a control; it is a hope.

The rest of Armature is what falls out when you take that line seriously.

§ 2

The Tenets

Eight rules, in priority order. When two of them point in different directions, the earlier one wins. They are not descriptions of how work happens. They are the criteria against which any proposed change to how work happens gets judged.

  1. 01
    Agents are fallible; design the system, not the agent.

    Correct outcomes are produced despite imperfect participants, human and machine.

  2. 02
    Automated mechanisms beat diligence.

    Any check that can be expressed as a script, hook, linter rule, manifest, or CI job should be expressed that way. Guidelines are placeholders for mechanisms not yet built.

  3. 03
    State lives in artifacts, not in memory.

    If it is not written down in a durable, versioned place, it does not exist.

  4. 04
    Defense is layered, inside-out.

    Quality is enforced at every layer from agent direction outward to CI. No single gate is sufficient.

  5. 05
    CI is the last line of defense, not the first.

    A CI failure is a diagnostic: an inner layer leaked. The target is CI with nothing to fail on.

  6. 06
    Query the source of truth before inferring.

    Knowledge store, language server, manifest. Pattern search is a last resort, not a first instinct.

  7. 07
    The spec leads the code.

    If implementation would diverge from the spec, the spec is updated first. Silent drift is the most expensive failure mode the framework exists to prevent.

  8. 08
    Surface the gap; do not work around it.

    Missing context, missing tools, missing answers are signals that the system has a hole. Improvising patches the symptom and buries the hole.

— Unless you know better ones.

§ 3

The Onion

Armature enforces quality in six layers, ordered inside-out. The innermost layer is closest to the agent's decision-making; the outermost is closest to production. Each layer catches what the previous one missed. CI is the last layer, not the first; the goal is code that is already correct by the time CI runs.

The six layers of the Armature Method — agent direction, context, sources of truth, tooling, cross verification, CI final gate
L1 Agent Direction

The instructions the agent operates under — system prompts, memory files, persona definitions. The cheapest to change and the weakest form of control. Anything load-bearing should be promoted outward as soon as a stronger mechanism is available.

L2 Context

The information the agent is given for a specific task. Recurring context is injected by a hook on every agent invocation, not fetched by the agent on request. The agent cannot forget to read what it was never asked to read.

L3 Sources of Truth

Knowledge store, language server, manifests. Conformance manifests came out of the hallucinated-API-pattern failure: every external interface now has a YAML manifest declaring the actual shape of the actual API. An invented field fails the manifest before the code reaches a test.

L4 CLI Tooling

Linters, formatters, type checkers, test runners, pre-commit and pre-push hooks. This is where conventions stop being prose and start being enforced. If a convention exists and a tool can enforce it, Armature requires the tool to be wired up.

L5 Cross-Verification

Agent work is reviewed by another agent and by programmatic checks before it is considered done. A second agent reading the diff against the spec catches deviations the implementing agent had already normalized.

L6 CI / Final Gate

The full test suite, the full linter set, security scans. A CI failure here is a diagnostic, not a catch. It means one of the inner layers should have caught the problem first, and a new control needs to be added inward.

A worked example · the schema validation gate

Early on, agents routinely submitted pull requests that failed schema validation in CI — DRF serializers out of sync with API contracts, fixtures that no longer matched their schemas. The fix wasn't a better instruction. It was a CLI tool that runs the relevant validations locally, plus a pre-push hook that requires the CLI to pass before the push is allowed. The CLI consults the manifests at Layer 3. The hook at Layer 4 makes the consultation structurally unavoidable.

The failure that used to surface at PR review or in CI now surfaces inside the agent's own session, as a precise machine-readable signal. The loop closes inside the work, not after it.

The direction of motion is always toward stronger, earlier, more structural enforcement.

§ 4

The Paradox at the Core

The innermost layer of Armature is agent behavioral direction. It is also the layer I rely on least.

This is the contradiction at the core of the framework, and I think it is the most important thing to be honest about. Layer 1 is a control, in the sense that it is better than nothing. But it is the weakest control, and the architecture of Armature is, in effect, a structure for escaping its own innermost layer as fast as possible.

Every recurring requirement that lives there is a candidate for promotion outward. Every directive that says the agent should remember to is a hook that has not been written yet.

The layer I rely on least is the layer I am asked about most.

What's in your CLAUDE.md? is the wrong question.
The right question is: what's in your CLAUDE.md that shouldn't be?

When something slips through Armature — and things still do — the diagnostic almost always points to the same thing. A control was sitting in Layer 1 that should have been promoted outward, and nobody noticed because Layer 1 had been making the right noises about it.

§ 5

Where This Comes From

Mente360 is the codebase Armature was born in. A solo-founder HIPAA-compliant SaaS, multi-tenant, real customers, real claims, real PHI. The stakes are real and the headcount is one. There is no team to absorb the cost of a quiet omission; there is no second reviewer who happens to spot the missing audit field. The system has to be designed under the assumption that no one is paying attention, because most of the time, no one is.

That assumption is the engine of the whole framework. Every protocol in this document traces to a real incident or a predictable class of error. The hooks were written because something was forgotten. The manifests were written because something was hallucinated. The cross-verification layer was added because a single confident agent shipped something silently wrong. None of this is theoretical. All of it was paid for.

The Armature Method is a working hypothesis, held provisionally. It is not a finished artifact. When a better mechanism is identified, the framework absorbs it. When a tenet is no longer load-bearing, it is retired. When a new failure mode appears that the existing layers do not catch, a new control is added at the innermost layer that can hold it — and then, over time, gets promoted outward, like everything else.

It will be wrong about some things. The point is that it should be wrong in ways that are surfaceable, falsifiable, and improvable, rather than in ways that depend on someone remembering to notice.

HOPE IS NOT A CONTROL.
Until we find better ones.
RB

Roderick Bertoncini

Founder, Mente360

Building practice-management software, and occasionally writing about how it gets built.

Ready to simplify your practice?

See how Mente360 can help you spend less time on admin and more time with clients.