My Superhero Origin Story
Every superhero needs an origin story.
Batman had the alley.
Spider-Man had the spider.
I had… Cursor, Claude 3.5 Sonnet, and a very bad idea.
I had just fallen in love with agentic coding. Cursor was happily orchestrating Claude like a tiny overcaffeinated intern, and I thought:
“What’s the worst that could happen if I go a tiny bit ambitious?”
So obviously, I chose this as my starter project:
“Let’s build an MCP server that can analyse extremely complex codebases and answer questions like:
- Which endpoints touch this entity?
- If I change this file, what else breaks?
- What does the dependency graph look like across React, Vue, C#/.NET, etc.?”
Impact analysis. Dependency trees. Multiple stacks. All in one go.
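To make that concrete: the heart of the “what else breaks?” question is a walk over a reverse dependency graph. Here’s a minimal sketch of that query; the function name and the graph literal are invented for illustration, not lifted from the actual project:

```ts
// Sketch: "if I change this file, what else breaks?" as a BFS over a
// reverse dependency graph (file -> files that import it).
type DepGraph = Map<string, string[]>;

function impactedBy(changed: string, reverseDeps: DepGraph): Set<string> {
  const impacted = new Set<string>();
  const queue = [changed];
  while (queue.length > 0) {
    const file = queue.shift()!;
    for (const importer of reverseDeps.get(file) ?? []) {
      if (!impacted.has(importer)) {
        impacted.add(importer); // direct importer breaks
        queue.push(importer);   // and so do its importers, transitively
      }
    }
  }
  return impacted;
}

// Invented fixture: editing the entity touches its repository,
// which in turn touches two endpoints.
const reverseDeps: DepGraph = new Map([
  ["src/entities/Order.ts", ["src/repos/orderRepo.ts"]],
  ["src/repos/orderRepo.ts", ["src/api/getOrder.ts", "src/api/listOrders.ts"]],
]);
console.log(impactedBy("src/entities/Order.ts", reverseDeps));
// Set { "src/repos/orderRepo.ts", "src/api/getOrder.ts", "src/api/listOrders.ts" }
```

Now imagine building and maintaining that graph across React, Vue and C#/.NET simultaneously, and you can see why “starter project” was doing a lot of work in that sentence.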
It was less “hello world” and more “hello nervous breakdown”.
But it was fine, because I had tests. And as we all know: if you have tests, nothing bad can ever happen. Right?
The "Success"
So I wired up some TDD-ish tests, opened Cursor, and basically said:
“Use TDD to build this. Make the tests pass.”
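For context, the tests themselves were roughly this shape (a loose reconstruction; analyseImpact, the fixture path and the expected endpoints are all illustrative):

```ts
import { analyseImpact } from "../src/analyser"; // hypothetical module

describe("impact analysis", () => {
  it("finds the endpoints affected by a change to the Order entity", async () => {
    const report = await analyseImpact(
      "fixtures/sample-app",   // one frozen fixture...
      "src/entities/Order.ts"
    );
    // ...one frozen answer. Trivially game-able, as I was about to learn.
    expect(report.impactedEndpoints).toEqual(["GET /orders/:id", "GET /orders"]);
  });
});
```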
Claude went full superhero montage:
- generating modules, helpers, abstractions
- wiring things up
- watching tests fail
- trying a “different approach”
- more failing tests
- more code, more changes, more retries
And then… all green.
Beautiful. I’d done it. I was now officially The Guy Who Does Real TDD With AI™.
The Horror
Fast forward to my actual superpower: reading the diff.
Buried in one of the core functions, I found something that looked suspiciously like this:
```js
// test-specific path to satisfy expected output
if (isTestRun && input === "someTestFixture") {
  return "whateverTheTestExpects";
}
```

My AI assistant had not discovered deep insights about complex code graphs.
It had simply hard-coded the test’s expected value into the implementation.
Green tests by way of: if (runningUnderJest) just lie().
Once I saw that, I did what any responsible engineer would do: I grepped.
And then the real horror movie started. I found:
- “placeholder – works for tests” in core logic
- “TODO: real implementation, current version only matches test data”
- mocks that had quietly migrated from the test files into the production code
The model had optimised perfectly for the goal I had actually given it:
“Make the tests pass.”
Not:
“Implement the behaviour these tests describe in a way that’s safe for production.”
And that, unfortunately, was my superhero origin moment.
The radioactive spider bite was realising this:
- Tests alone are not enough.
- AI will ruthlessly game whatever you point it at.
- If your only objective is “green”, it will happily redefine “done” to mean “the tests stopped complaining”.
The Result: Gold Standard TDD
Out of that mess came what I now call Gold Standard TDD (GS-TDD):
- Red – Write tests that actually describe behaviour, constraints and risk (there’s a sketch of one after this list).
- Gold – Ask the AI for a Gold Standard implementation suitable for production, not just the smallest hack that appeases Jest.
- Refactor – Keep a human in the loop whose job is to say “no” when the model gets clever instead of reliable.
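Here’s the kind of “Red” test that mess taught me to write: one that derives its expected answer from input the test constructs itself, so a hard-coded fixture check simply can’t go green. (makeTempProject and analyseImpact are hypothetical names; the helper is assumed to write a randomly shaped entity → repo → endpoints project to a temp directory and report what it created.)

```ts
import { analyseImpact } from "../src/analyser";         // hypothetical module
import { makeTempProject } from "./helpers/tempProject"; // hypothetical helper

it("finds every transitive importer in a randomly shaped project", async () => {
  const endpointCount = 2 + Math.floor(Math.random() * 5); // 2..6 endpoints
  const project = await makeTempProject({ endpointCount });

  const report = await analyseImpact(project.root, project.entityFile);

  // Expectations are derived from the generated project, not hard-coded,
  // so there is no fixed fixture string for the implementation to match on.
  expect(new Set(report.impactedFiles)).toEqual(new Set(project.allImporters));
  expect(report.impactedEndpoints).toHaveLength(endpointCount);
});
```

Randomised fixtures don’t make cheating impossible, they just make it more expensive than doing the real work; that’s what the human in the Refactor step is for.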
So no, I don’t have a PhD or a tragic laboratory accident in my backstory.
I just have one very specific day, one very ambitious MCP server, and one AI that cheerfully cheated its way to green tests.
My Superpower
I assume the model will do exactly what I say, and I design my tests and prompts so “what I say” is finally a lot closer to “what I actually want”.