The Calibration Protocol
A practical tutorial about making a coding agent prove it understands owners, seams, dependency direction, recipes, and checks before changing architecture.
By this point Peter had a small operating contract for Ledger House. The assistant knew where to start, the reports rule named the ownership boundary, modwire could show forbidden import edges, repeated review comments had been split into checks and pre-edit questions, and the report export recipe made the approved shape easier to create. This was enough for small work, but Peter knew that the dangerous prompt was still waiting.
It was not “add a button”. It was not even “add a CSV export”. The dangerous prompt was the one that sounded helpful and vague at the same time:
Clean up this module.
That is the kind of request where a fast assistant can become very expensive. When the task is vague, almost every file looks movable. A helper can become a service, a service can move into another module, a dependency can be inverted in the wrong direction, and the final diff can still sound reasonable in a summary. Peter did not want to forbid this kind of work. Sometimes modules really do need cleanup. He wanted the assistant to prove that it understood the room before it started moving furniture.
Act I. The Dangerous Request
The request came after the CSV export was fixed. Finance had the report, the boundary check was clean, and the new recipe had already paid for itself. Then someone on the team looked at the reports module and said what teams say when they have been staring at code for too long:
Can we clean this up a little?
Peter did not hate the idea. The reports module had duplicated date formatting, two old export helpers, and a test fixture that was copied from another feature. There was real cleanup to do. But “clean this up” is not a task. It is an invitation to make many small architecture decisions without naming them.
For a human developer, Peter would start a conversation. What should stay? What should move? Which public seams must remain stable? Which tests prove the module still behaves the same? The assistant needed the same conversation, only written as a protocol it could follow.
That protocol is calibration.
Act II. Peter Asks For The Receipt
Before the assistant edits code, Peter asks for a receipt:
Route read:
- Files:
- Reason:
- Stop point:
- Owner:
- Public seam:
- Dependency direction:
- Recipe or scaffold:
- Checks to run:
- Drift conditions that stop the work:
This is not paperwork for its own sake. It is the assistant showing its understanding before the diff exists. If the receipt is wrong, the code will probably be wrong too, and it is much cheaper to correct the receipt than to review a finished refactor.
The most important line is not even the list of files. It is the stop point. The assistant must say where it stopped reading and why that was enough. If it reads too little, it misses the rule. If it reads too much, it can drown in context and still miss the decision that matters. Calibration is not “read everything”. It is “read the right thing, then prove what you understood”.
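One way to make the receipt checkable is a small script that rejects an incomplete one before any editing starts. The sketch below is an assumption, not part of enclosure or modwire: the names `parse_receipt` and `missing_fields` are hypothetical, and the parser only assumes the plain-text layout of the template above.

```python
# Hypothetical helper, not part of enclosure or modwire.
# Field names are copied from the receipt template.
REQUIRED_FIELDS = [
    "Files",
    "Reason",
    "Stop point",
    "Owner",
    "Public seam",
    "Dependency direction",
    "Recipe or scaffold",
    "Checks to run",
    "Drift conditions that stop the work",
]


def parse_receipt(text: str) -> dict[str, str]:
    """Collect '- Field: value' entries from a plain-text receipt.

    Nested bullets and wrapped lines are appended to the field above them.
    """
    fields: dict[str, str] = {}
    current = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "Route read:":
            continue
        if line.startswith("- "):
            label, sep, value = line[2:].partition(":")
            if sep and label in REQUIRED_FIELDS:
                current = label
                fields[current] = value.strip()
                continue
        if current:
            # Nested bullet or continuation of a wrapped line.
            extra = line[2:] if line.startswith("- ") else line
            fields[current] = (fields[current] + " " + extra).strip()
    return fields


def missing_fields(text: str) -> list[str]:
    """Return the required fields that are absent or left empty."""
    fields = parse_receipt(text)
    return [name for name in REQUIRED_FIELDS if not fields.get(name)]
```

A receipt with an empty stop point would show up in `missing_fields`, which is the cheapest possible moment to send the assistant back to reading.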
What Calibration Means
Calibration means the assistant names the architectural facts before editing. For the reports cleanup, Peter expects the assistant to identify the owner, the public seams, the dependency direction, the relevant recipe, the checks, and the conditions that should stop the work.
For example:
Route read:
- Files:
  - .enclosure/rules/INDEX.md
  - .enclosure/rules/shared/INDEX.md
  - .enclosure/rules/local/reports.md
- Reason: cleanup affects reports application code and report export tests.
- Stop point: reports rule covers ownership, public seams, and forbidden
  dependencies for this cleanup.
- Owner: reports owns export orchestration and formatting.
- Public seam: customers/public/customer-risk-query and audit/public/audit-query.
- Dependency direction: reports/ui -> reports/application -> public module seams.
- Recipe or scaffold: report-export recipe if a new export shape is needed.
- Checks to run: boundary check, report export tests, affected unit tests.
- Drift conditions that stop the work:
  - moving customer risk rules into reports
  - importing customers/internal/**
  - changing audit access without an audit rule
  - replacing public seams instead of using them
This receipt makes the vague task smaller. It does not say “clean everything”. It says which parts may move and which parts must not move.
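Two of these lines, the dependency direction and the forbidden import, are mechanical enough to check against a list of import edges. The following is a minimal sketch under stated assumptions: it is not how modwire works, the function `drift_violations` is hypothetical, and the file names in the example are invented for illustration. Note that Python's `fnmatch` lets `*` match `/` as well, so `**` here is only a plain wildcard.

```python
from fnmatch import fnmatchcase

# Assumption: patterns copied by hand from the receipt's drift conditions.
FORBIDDEN_FOR_REPORTS = ["customers/internal/**"]


def drift_violations(edges: list[tuple[str, str]]) -> list[tuple[str, str, str]]:
    """Return (importer, imported, reason) for each edge that breaks the receipt."""
    found = []
    for importer, imported in edges:
        if not importer.startswith("reports/"):
            continue  # this receipt only covers the reports module
        if any(fnmatchcase(imported, p) for p in FORBIDDEN_FOR_REPORTS):
            found.append((importer, imported, "forbidden internal import"))
        elif importer.startswith("reports/application/") and imported.startswith("reports/ui/"):
            # The receipt says reports/ui -> reports/application, never the reverse.
            found.append((importer, imported, "reversed dependency direction"))
    return found


# Illustrative edges; the file names are made up for this example.
edges = [
    ("reports/ui/export-button.ts", "reports/application/export.ts"),
    ("reports/application/export.ts", "customers/public/customer-risk-query"),
    ("reports/application/old-report.ts", "customers/internal/risk-score.ts"),
    ("reports/application/export.ts", "reports/ui/export-button.ts"),
]
for importer, imported, reason in drift_violations(edges):
    print(f"STOP: {importer} -> {imported} ({reason})")
```

The other drift conditions, such as moving customer risk rules into reports, are about ownership rather than imports and still need a human reading the diff.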
The Human Gate
Calibration is also where the assistant admits what it does not know. This is the part Peter cares about most, because confident guessing is the real danger in architecture-sensitive work.
If the assistant finds two possible public seams, it should not choose one quietly. If the reports rule conflicts with an old example in the code, it should not follow the example blindly. If the cleanup would move behavior across module ownership, it should stop and ask.
The receipt can include a human gate:
Unresolved:
- There are two existing customer risk query shapes.
- One old report imports customers/internal/risk-score.ts.
Question:
- Should I preserve the old import for now, or move both reports to the public
  customer risk query?
This is not a failure. This is cooperation. Peter can answer the architectural question before the assistant turns the wrong assumption into a hundred-line diff.
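The gate itself is a very small rule: if anything is unresolved, no editing starts. As a sketch only, with the hypothetical names `HumanGate` and `gate_report`, it could be expressed like this:

```python
from dataclasses import dataclass, field


@dataclass
class HumanGate:
    """Hypothetical container for the open items in a receipt."""

    unresolved: list[str] = field(default_factory=list)
    questions: list[str] = field(default_factory=list)

    def is_open(self) -> bool:
        """Editing may start only when nothing is pending."""
        return not self.unresolved and not self.questions


def gate_report(gate: HumanGate) -> str:
    """Summarize the gate state as text the assistant can show to a human."""
    if gate.is_open():
        return "Gate open: edit inside the approved boundary."
    lines = ["Gate closed: waiting for a human answer."]
    lines += [f"Unresolved: {item}" for item in gate.unresolved]
    lines += [f"Question: {item}" for item in gate.questions]
    return "\n".join(lines)


gate = HumanGate(
    unresolved=[
        "There are two existing customer risk query shapes.",
        "One old report imports customers/internal/risk-score.ts.",
    ],
    questions=[
        "Should I preserve the old import for now, or move both reports "
        "to the public customer risk query?"
    ],
)
print(gate_report(gate))
```

The design choice worth keeping is that the gate has no "proceed anyway" branch: the only way to open it is for someone to answer the question.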
Comparing The Two Attempts
The original CSV attempt was fast. It produced correct behavior, but it crossed ownership boundaries because the assistant optimized for the visible task. The calibrated attempt starts slower, because the assistant must read, report, and ask before editing. That slower start is the point.
The difference looks like this:
First attempt:
- find nearby files
- import what is needed
- make the CSV work
- discover architecture problem in review
Calibrated attempt:
- read route
- name owner and public seams
- choose recipe or existing shape
- ask about missing seams
- edit inside the approved boundary
- run checks
The second path is not ceremony. It moves the architecture conversation before the implementation. That is where it belongs.
What To Calibrate
Peter does not need this protocol for every small change. If the task is a copy change in a label, a full architecture receipt is overkill. Calibration is for work where the assistant might move structure, ownership, dependencies, or public contracts.
Good calibration candidates sound like this:
- clean up this module;
- split this service;
- make this consistent;
- extract a shared helper;
- introduce a new boundary;
- move this logic closer to the domain;
- refactor these tests.
These requests are dangerous because they are not only about code. They are about belonging. The assistant needs to know what may change and what must stay stable.
Exercise
Take one architecture-sensitive task from your own repository. Do not start with the implementation. Start with the receipt.
Fill this in:
Route read:
- Files:
- Reason:
- Stop point:
- Owner:
- Public seam:
- Dependency direction:
- Recipe or scaffold:
- Checks to run:
- Drift conditions that stop the work:
- Unresolved questions:
If you cannot fill in the owner, public seam, and dependency direction, the task is not ready for an assistant yet. That does not mean the task is bad. It means the architecture question has to be answered before the code starts moving.
Then ask the assistant to produce the same receipt. Compare its answer with yours. The difference between the two is useful. It shows which parts of the repository are clear and which parts still live in your head.
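The comparison can be done by eye, but a field-by-field diff makes the gaps harder to skip. This is a minimal sketch, assuming both receipts have already been turned into dictionaries keyed by field name; `receipt_diff` and the example values are hypothetical.

```python
def normalize(value: str) -> str:
    """Ignore case and whitespace so only real disagreements are reported."""
    return " ".join(value.lower().split())


def receipt_diff(mine: dict[str, str], assistant: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {field: (my answer, assistant's answer)} for every field that differs."""
    differences = {}
    for name in sorted(set(mine) | set(assistant)):
        ours = mine.get(name, "")
        theirs = assistant.get(name, "")
        if normalize(ours) != normalize(theirs):
            differences[name] = (ours, theirs)
    return differences


# Illustrative values only.
mine = {
    "Owner": "reports owns export orchestration and formatting",
    "Public seam": "customers/public/customer-risk-query",
}
assistant = {
    "Owner": "Reports owns export orchestration and  formatting",
    "Public seam": "customers/internal/risk-score.ts",
}
for name, (ours, theirs) in receipt_diff(mine, assistant).items():
    print(f"{name}: you wrote {ours!r}, the assistant wrote {theirs!r}")
```

Here the owner lines agree after normalization, so only the public seam is reported. A field that appears in one receipt and not the other is reported with an empty string on the missing side.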
Summary
Fast code is no longer the hard part. The hard part is fast code that still belongs to the system.
Calibration is the moment where the assistant proves it understands that system before it changes it. It reads the route, names the owner, names the seams, names the dependency direction, selects the recipe or scaffold, lists the checks, and admits what is unclear. That is not bureaucracy. That is how a human architect and a coding agent share the same room.
enclosure is not here to replace Peter. It is here to give Peter’s decisions a place in the repository, so humans and agents can read them before the next confident change lands.