The Ward — A Room the AI Is Allowed to Work In
A field note about bounded work, architectural ownership, and why "make the thing" is not a safe unit of AI-assisted development.
A ticket describes work. A Ward describes a boundary.
Most tickets are written for humans who already know the system. They assume shared context, team memory, architectural taste, and all the small obvious things nobody writes down because nobody wants Jira to become a haunted legal contract.
An AI assistant does not reliably share that context. It sees the task, the nearby files, the examples in scope, and the shortest path to done. So when the task is "add colored schedule cells," it may do exactly that — and a few things you did not ask for, in places you did not mean to open.
The Ward starts one level earlier. Before the work, it asks where the work is allowed to happen, what it may read, what it may produce, what it must not touch, and who approves the contract before any code is written.
That is the room.
Ward #4 was the leak. This is the shape of the wall that should have stopped it.
The ticket is too soft
A normal ticket might say: "add colored schedule cells." That is enough for a human who knows where schedule logic belongs, which layer owns domain meaning, and why the rendering kernel should not suddenly develop opinions about nurse shifts, school lessons, payroll cells, or any other business object wearing a rectangle costume.
The AI does not get that background for free. So a Ward says the quiet part out loud:
Add colored schedule cells in the schedule vertical. The rendering kernel receives color as data. It must not know what the color means.
The ticket asks for output.
The Ward defines ownership.
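Here is a minimal Rust sketch of that ownership split. Every name in it is hypothetical: `kernel`, `Rgba`, `RenderCommand`, `schedule`, `ShiftColors`, and `to_render_command`. The kernel only ever sees a rectangle and an RGBA value. The mapping from shift type to color lives in the schedule module, and the colors themselves come from caller-supplied configuration.

```rust
// Sketch only: every module, type, and function name here is hypothetical.

/// Rendering kernel: knows rectangles and colors, nothing about schedules.
pub mod kernel {
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct Rgba(pub u8, pub u8, pub u8, pub u8);

    #[derive(Debug, Clone, PartialEq, Eq)]
    pub struct RenderCommand {
        pub x: u32,
        pub y: u32,
        pub width: u32,
        pub height: u32,
        /// Plain render data: the kernel draws it but never interprets it.
        pub color: Rgba,
    }

    /// Stand-in for real drawing: describes each command as one line of text.
    pub fn render(commands: &[RenderCommand]) -> Vec<String> {
        commands
            .iter()
            .map(|c| format!("{}x{} at ({}, {}) {:?}", c.width, c.height, c.x, c.y, c.color))
            .collect()
    }
}

/// Schedule vertical: owns what a shift is and which color it gets.
pub mod schedule {
    use crate::kernel::{RenderCommand, Rgba};

    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum ShiftType {
        Day,
        Night,
    }

    pub struct ScheduleCell {
        pub row: u32,
        pub col: u32,
        pub shift: ShiftType,
    }

    /// Configured colors, supplied by the caller instead of hardcoded in the engine.
    pub struct ShiftColors {
        pub day: Rgba,
        pub night: Rgba,
    }

    impl ShiftColors {
        /// Domain meaning -> color: this mapping stays in the vertical.
        pub fn color_for(&self, shift: ShiftType) -> Rgba {
            match shift {
                ShiftType::Day => self.day,
                ShiftType::Night => self.night,
            }
        }
    }

    const CELL_W: u32 = 80;
    const CELL_H: u32 = 24;

    /// Translates a schedule cell into kernel vocabulary (geometry + color).
    pub fn to_render_command(cell: &ScheduleCell, colors: &ShiftColors) -> RenderCommand {
        RenderCommand {
            x: cell.col * CELL_W,
            y: cell.row * CELL_H,
            width: CELL_W,
            height: CELL_H,
            color: colors.color_for(cell.shift),
        }
    }
}

fn main() {
    use crate::kernel::Rgba;
    use crate::schedule::{ScheduleCell, ShiftColors, ShiftType};

    // Example configuration; a real app would load this from settings.
    let colors = ShiftColors { day: Rgba(255, 236, 179, 255), night: Rgba(63, 81, 181, 255) };
    let cells = [
        ScheduleCell { row: 0, col: 0, shift: ShiftType::Day },
        ScheduleCell { row: 2, col: 1, shift: ShiftType::Night },
    ];
    let commands: Vec<_> = cells
        .iter()
        .map(|c| crate::schedule::to_render_command(c, &colors))
        .collect();
    for line in crate::kernel::render(&commands) {
        println!("{line}");
    }
}
```

The direction of the `use` is the point. `schedule` imports from `kernel`, and nothing in `kernel` imports from `schedule`.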
The shape of a Ward
A Ward is a bounded unit of work with enough structure to keep the AI useful without letting it improvise the architecture.
A useful Ward usually fits on one readable page and answers a few boring questions before the code begins:
- What is this Ward allowed to change?
- What does it depend on?
- What should it produce for later Wards?
- Which behaviors define success?
- Which boundaries must not be crossed?
- How will a human verify that it is done?
A minimal Ward has these parts:
```markdown
# Ward 017: Colored Schedule Cells
## Scope
Add colored schedule cell rendering in the schedule vertical.
## Inputs
- Existing schedule cell model
- Existing render command API
## Outputs
- Schedule metadata maps to configured UI colors
- Renderer receives color as plain render data
## Tests
1. maps_shift_type_to_color
   Verifies: Shift metadata maps to configured colors
2. sends_color_to_renderer
   Verifies: Render command receives explicit color
3. kernel_does_not_import_shift_types
   Verifies: Kernel has no dependency on schedule types
## Must DO
- Keep domain meaning in the vertical layer.
- Pass colors into rendering as data.
## Must NOT
- Do not add shift types to the rendering kernel.
- Do not hardcode demo colors in engine code.
- Do not make rendering depend on schedule-specific concepts.
## Verification
All tests pass, and the implementation can be explained without saying "just for the demo."
```
This is close enough to a ticket that it feels familiar, but strict enough that it changes the work.
The scope says what room we are in.
The inputs and outputs say where the doors are.
The tests define the contract.
The Must-NOT list marks the walls.
The verification step tells the human what "done" means before the AI tries to define it itself.
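Test 3 in the Ward example, `kernel_does_not_import_shift_types`, is a wall written as a test. The sketch below shows one possible way to write it: a crude textual scan of kernel source for schedule-specific names. The forbidden-name list and `boundary_violations` are hypothetical, and in a real crate the source would likely be read with something like `include_str!` on the kernel file rather than inlined.

```rust
// Sketch of a "Must NOT" wall as a test. The names and the idea of scanning
// source text are illustrative assumptions, not a prescribed tool.

/// Schedule-specific names the kernel must never mention (hypothetical list).
const FORBIDDEN_IN_KERNEL: &[&str] = &["ShiftType", "ScheduleCell", "schedule::"];

/// Returns (1-based line number, forbidden name) for every violation found.
fn boundary_violations(source: &str, forbidden: &[&str]) -> Vec<(usize, String)> {
    let mut hits = Vec::new();
    for (i, line) in source.lines().enumerate() {
        // Crude: ignore text after `//` so comments don't count
        // (this also misfires on `//` inside string literals).
        let code = line.split("//").next().unwrap_or("");
        for name in forbidden {
            if code.contains(*name) {
                hits.push((i + 1, name.to_string()));
            }
        }
    }
    hits
}

fn main() {
    // Inlined so the sketch runs; a real test might load the kernel file instead.
    let clean_kernel = "pub struct RenderCommand { pub color: Rgba }\n";
    let leaky_kernel =
        "pub struct RenderCommand { pub color: Rgba }\nuse crate::schedule::ShiftType;\n";

    assert!(boundary_violations(clean_kernel, FORBIDDEN_IN_KERNEL).is_empty());
    for (line, name) in boundary_violations(leaky_kernel, FORBIDDEN_IN_KERNEL) {
        println!("kernel line {line} mentions {name}");
    }
}
```

A textual scan is easy to fool. Where the project layout allows it, a stronger wall is to move the kernel into its own crate whose `Cargo.toml` does not depend on the schedule crate. With that layout, any import of `ShiftType` from the kernel fails to compile.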
The lifecycle is part of the design
A Ward is not complete because the AI says it is complete.
It moves through a small lifecycle:
```
planned → red → approved → gold → complete
```
Planned means the room exists, but nobody has started work inside it.
Red means the AI has written failing tests for the Ward contract.
Then it stops.
That stop is the important part.
The human reviews the tests before implementation begins. Not after. Not when the pull request is already a 900-line confidence incident. Before.
The question is not only "do these tests pass or fail?" They should fail. That is their job.
The real questions are:
- Are these the right behaviors?
- Is the ownership in the right layer?
- Are the boundaries tested?
- Is anything important missing?
- Is this Ward secretly three Wards in a trench coat?
Only after that review does the Ward become approved.
Gold means implementation. The AI is now allowed to make the approved tests pass, inside the approved boundaries.
Not redesign the subsystem.
Not expand the scope because the neighboring file looked lonely.
Not add a small clever abstraction that will require a future exorcism.
Just implement the contract.
Complete means the human has verified the result and the Ward becomes stable ground for future work.
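The lifecycle can be sketched as a small state machine. The sketch below is in Rust and uses the hypothetical names `WardState`, `Actor`, and `advance`. It assumes that only a human may move a Ward into Approved or Complete, which matches how the text describes the two gates. Every other transition must follow the order exactly.

```rust
// Sketch of the Ward lifecycle as a state machine. All names are hypothetical;
// the point is that Gold cannot be reached without a human approval.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WardState {
    Planned,
    Red,      // failing tests written; the AI stops here
    Approved, // a human has reviewed the tests
    Gold,     // implementation makes the approved tests pass
    Complete, // a human has verified the result
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Actor {
    Ai,
    Human,
}

#[derive(Debug, PartialEq, Eq)]
enum TransitionError {
    OutOfOrder { from: WardState, to: WardState },
    NeedsHuman { to: WardState },
}

fn advance(from: WardState, to: WardState, by: Actor) -> Result<WardState, TransitionError> {
    use WardState::*;
    // Only the next state in the lifecycle is reachable; no skipping, no going back.
    let next_in_order = match from {
        Planned => Some(Red),
        Red => Some(Approved),
        Approved => Some(Gold),
        Gold => Some(Complete),
        Complete => None,
    };
    if next_in_order != Some(to) {
        return Err(TransitionError::OutOfOrder { from, to });
    }
    // Approval and completion are human gates.
    if matches!(to, Approved | Complete) && by != Actor::Human {
        return Err(TransitionError::NeedsHuman { to });
    }
    Ok(to)
}

fn main() {
    use Actor::*;
    use WardState::*;

    let mut state = Planned;
    state = advance(state, Red, Ai).unwrap();
    // The AI cannot approve its own tests.
    println!("{:?}", advance(state, Approved, Ai));
    state = advance(state, Approved, Human).unwrap();
    // Nobody can skip from Approved straight to Complete.
    println!("{:?}", advance(state, Complete, Human));
    state = advance(state, Gold, Ai).unwrap();
    state = advance(state, Complete, Human).unwrap();
    println!("{state:?}");
}
```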
The lifecycle is not process theatre. It is how the architecture stays attached to the work while the AI moves fast.
The approval gate is the architecture gate
The pause between Red and Gold is where WDD earns its keep.
Without that pause, the AI can write tests that validate its own misunderstanding, then implement confidently against the wrong contract. That is not malicious. It is just very efficient nonsense.
With the pause, the human gets to inspect the contract while it is still cheap to change.
This is where architectural ownership happens.
If a test expects the kernel to understand ShiftType::Night, the problem is not that the implementation is missing. The problem is that the test already moved domain meaning into the wrong room.
Catching that before implementation is the difference between correcting a boundary and cleaning up a leak.
This is why the human approval gate is not a formality. The human is not approving that the AI wrote some tests. The human is approving that the Ward describes the right work in the right place.
That is a very different review.
Small enough to distrust completely
A Ward should be small enough that you do not have to trust the AI very much.
That sounds pessimistic. It is actually the optimistic version.
I want AI to write useful code quickly. I also want the work to be small enough that I can review the contract, understand the diff, and catch architectural drift without pretending I read every line with equal spiritual intensity.
A Ward is too large when review becomes ritual.
You know the feeling. The diff is big. The tests are green. The assistant sounds confident. You scroll, nod, leave a comment about naming, and hope the basement is fine.
The basement is rarely fine.
A good Ward keeps the review honest. It gives the AI enough room to be useful, but not enough room to rearrange the house.
The smallest useful rule
The simplest version of WDD is this:
No Ward, no code.
Before implementation starts, the room should exist. The scope should be written down. The inputs and outputs should be named. The tests should be approved. The walls should be visible.
Then the AI can move fast. That is the bargain.
Not the whole castle, not "whatever files seem relevant," not "just make the demo work." A room. With a door. With a human holding the key.
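"No Ward, no code" could be made mechanical with a pre-implementation check like the one below. This is a hypothetical Rust sketch. The `Ward` struct loosely mirrors the Ward template, and `blockers` lists every reason implementation may not start yet. Approval is modeled as a boolean the human flips; a real workflow would record it somewhere more durable.

```rust
// Sketch of "No Ward, no code" as a gate. The Ward struct and its fields are
// hypothetical and only loosely follow the Ward template.

struct Ward {
    scope: String,
    inputs: Vec<String>,
    outputs: Vec<String>,
    tests: Vec<String>,
    tests_approved: bool,
    must_not: Vec<String>,
}

/// Returns every reason implementation may not start yet (empty = go).
fn blockers(ward: Option<&Ward>) -> Vec<&'static str> {
    let w = match ward {
        Some(w) => w,
        None => return vec!["no Ward"],
    };
    let mut missing = Vec::new();
    if w.scope.trim().is_empty() {
        missing.push("scope not written down");
    }
    if w.inputs.is_empty() || w.outputs.is_empty() {
        missing.push("inputs/outputs not named");
    }
    if w.tests.is_empty() || !w.tests_approved {
        missing.push("tests not approved");
    }
    if w.must_not.is_empty() {
        missing.push("walls not visible (empty Must NOT list)");
    }
    missing
}

fn main() {
    let mut ward = Ward {
        scope: "Add colored schedule cell rendering in the schedule vertical.".into(),
        inputs: vec!["Existing schedule cell model".into()],
        outputs: vec!["Renderer receives color as plain render data".into()],
        tests: vec!["maps_shift_type_to_color".into()],
        tests_approved: false,
        must_not: vec!["Do not add shift types to the rendering kernel.".into()],
    };
    println!("{:?}", blockers(None)); // no room at all
    println!("{:?}", blockers(Some(&ward))); // tests written but not yet approved
    ward.tests_approved = true; // the human signs off
    println!("{:?}", blockers(Some(&ward))); // empty: the AI may start
}
```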
Next: The Must-NOT List — the part of the spec that keeps architecture alive.
The next Ward zooms in on the most useful part of the contract: the rules that say what the AI must not do, even if it would make the local task easier.