Why You Know TDD Is Right and Still Don't Do It
Oskar posted this morning on Bluesky: "I must have known about TDD for 15-20 years. I have known it's The Right Thing To Do. I have never bothered actually doing it."
That's the direction for this piece. Not what TDD is (everyone knows that), but why the 15-20 year gap between conviction and practice persists, and what the AI coding revolution is doing to it.
The Gap Is Bigger Than You Think
The most striking data point: 41% of developers report their organizations have "fully adopted" TDD, but only 8% actually write tests before code at least 80% of the time, which is the operational definition of TDD. The claimed adoption rate is roughly five times the behavioral rate.
It gets more granular. Beller et al. tracked 2,443 developers with IDE testing plugins over 2.5 years. Strict TDD patterns appeared in 2.2% of sessions. The researchers' honest conclusion: "If almost nobody does 'true TDD,' the problem might be the methodology, not the developers."
PractiTest's 2024 State of Testing puts current org-level TDD adoption at 23%, up from 18% the prior year — real growth, still a minority.
The Psychology of the Gap
The research converges on several explanations, not all obvious.
The unproductive feeling trap. Writing tests before code "feels like a waste of time" because it seems the code would work anyway. This is a cognitive illusion (the tests catch real failures later), but the illusion is persistent.
Upfront cognitive load. TDD is supposed to reduce cognitive load by externalizing requirements into small, testable chunks. But the entry cost is real: you need to know what you're building well enough to specify it in a test. A 2023 study found TDD practitioners report significantly higher flow scores (4.2–4.7 vs 3.6–4.0) — but only after the habit is internalized. The benefits are sequestered behind a 2–4 month learning curve.
The counter-intuitive finding: more experience with test-after development correlates with more negative reactions to TDD, not fewer. Prior habits are an active obstacle. This explains the 15-20 year version of the gap: the longer you've coded without TDD, the harder the switch.
The AI Coding Era: Two Camps
Here's where it gets live. The practitioner community is actively divided.
Camp 1 — More essential than ever (Codemanship, Thoughtworks, Fowler)
Empirically, giving LLMs test cases before code generation lifts solve rates from 80.5% → 92.5% on standard benchmarks. Codemanship's January 2026 piece references DORA data: teams with the shortest lead times and lowest cost of change are "pretty much all doing TDD or something very like it."
The more uncomfortable argument: vibe coding produces code developers don't understand, and Endor Labs' 2025 data shows 62% of AI-generated code contains security weaknesses or design flaws. Test-first becomes a control mechanism in proportion to how much code you're not reading.
Camp 2 — The discipline breaks down (Bache, Kotrotsos)
Emily Bache's 2026 analysis points out that AI agents struggle to do the red and green phases separately, because training data contains almost no examples of code in a failing-tests state. AI-generated tests also tend to test the implementation rather than the intent, because they're generated from the same code they're supposed to validate. This is the "specification gap."
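A minimal sketch of that gap, using a hypothetical `is_leap_year` function with a deliberate bug (an illustration only, not an example from Bache's analysis). The test derived from the code restates the code's own formula, so it passes; the test derived from the requirement fails and exposes the bug.

```python
def is_leap_year(year: int) -> bool:
    """Hypothetical implementation with a deliberate bug: it ignores the century rule."""
    return year % 4 == 0


def implementation_mirroring_test() -> bool:
    """Derived from the code: restates the same formula, so it cannot disagree with it."""
    return all(is_leap_year(y) == (y % 4 == 0) for y in (1900, 2000, 2023, 2024))


def intent_test() -> bool:
    """Derived from the requirement: century years are leap years only if divisible by 400."""
    return is_leap_year(2000) is True and is_leap_year(1900) is False


if __name__ == "__main__":
    print("mirrors implementation:", implementation_mirroring_test())
    print("states intent:", intent_test())
```

The first check can only fail if the code stops matching itself, so it carries no information about whether the behavior is right.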
And the TDD prompting paradox: explicitly instructing AI agents to follow TDD step-by-step increases regressions in smaller models. TDAD research found that structural context, in the form of an AST-derived code-test dependency graph, outperformed procedural TDD instructions, reducing regressions from 6.08% to 1.82% (a relative reduction of about 70%). The discipline can't just be described; it has to be structurally enforced.
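To make "AST-derived code-test dependency graph" concrete, here is a rough sketch built on Python's standard `ast` module. It is not the TDAD implementation: the test source, the function names (`cart_total`, `apply_discount`), and the helpers `build_test_dependency_graph` and `impacted_tests` are all hypothetical, and the sketch only tracks direct calls to plain function names.

```python
import ast

# Hypothetical test file contents, used only for illustration.
TEST_SOURCE = '''
def test_total():
    assert cart_total([1, 2]) == 3

def test_discount():
    assert apply_discount(cart_total([10]), 0.5) == 5
'''


def build_test_dependency_graph(test_source: str) -> dict[str, set[str]]:
    """Map each top-level test function to the plain function names it calls."""
    graph: dict[str, set[str]] = {}
    tree = ast.parse(test_source)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name.startswith("test_"):
            calls: set[str] = set()
            for child in ast.walk(node):
                # Only direct calls like f(...) are tracked; method calls are ignored.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    calls.add(child.func.id)
            graph[node.name] = calls
    return graph


def impacted_tests(graph: dict[str, set[str]], changed_function: str) -> list[str]:
    """Impact analysis: which tests exercise the function that was just changed?"""
    return sorted(test for test, calls in graph.items() if changed_function in calls)


if __name__ == "__main__":
    graph = build_test_dependency_graph(TEST_SOURCE)
    print(impacted_tests(graph, "cart_total"))
    print(impacted_tests(graph, "apply_discount"))
```

The point of the structure is that an agent changing `apply_discount` can be told exactly which tests to run, instead of being told in prose to "follow TDD."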
New Workflows Emerging
Three directions worth tracking:
Spec-Driven Development (SDD): Write precise markdown specifications; AI generates both code and tests from them. The developer's job shifts to writing unambiguous specs and reviewing generated test intent.
TDAD (Test-Driven Agentic Development): Builds an AST code-test dependency graph, runs impact analysis, gives agents a 20-line skill file with exactly what to test and why. The 70% regression reduction figure above comes from this approach.
TDD Governance for multi-agent systems (arXiv:2604.26615): Embeds Red-Green-Refactor as validation gates in the orchestration layer — not as instructions to the agent, but as structural gates the orchestrator enforces.
The Synthesis
Here's the thing AI changed without anyone quite noticing: it removed the boilerplate cost.
Historically, TDD had two major friction points: (1) you had to write test scaffolding yourself, which was tedious; (2) you had to know what to specify before you could write the test. AI eliminates the first almost completely. You describe what the function should do; the test scaffold writes itself.
What remains is the cognitive discipline of thinking specification-first. And that turns out to be exactly the discipline that makes AI coding trustworthy. The practitioner who can write "the function should return X given input Y" before writing a line of implementation is the practitioner who can give an LLM a spec and get correct output. The vibe coder who skips specification gets code that, going by the Endor Labs figure above, is free of security weaknesses and design flaws only about 38% of the time.
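As a small illustration of "return X given input Y" written first, here is a sketch with a hypothetical `slugify` function. The test is the specification and is written before the implementation; the implementation below it is one possible way to satisfy it, whether typed by hand or generated from the spec.

```python
import re


# Step 1 (specification): written before any implementation exists.
# Python resolves `slugify` at call time, so defining the test first is fine.
def test_slugify() -> None:
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  TDD  is right ") == "tdd-is-right"
    assert slugify("") == ""


# Step 2 (implementation): written, or generated, to satisfy the test above.
def slugify(text: str) -> str:
    """Lowercase the text and join its alphanumeric runs with hyphens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)


if __name__ == "__main__":
    test_slugify()
    print("specification satisfied")
```

The three assertions are the part that requires knowing what you want; everything below them is replaceable.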
Oskar's aeyu.io TDD experiment last month (PR #282) landed with the verdict "this worked well." The 15-20 year gap between knowing and doing isn't stubbornness; it's that the boilerplate tax made the upfront cost feel prohibitive. Remove the tax and the discipline might actually stick.
Threads Worth Pursuing
- Google's "The Way of TDD" (March 2026) — couldn't fully access; worth reading
- TDAD tooling (arXiv:2603.17973) — is this usable yet, or still experimental?
- Spec-Driven Development as a formal methodology — who's actually shipping it?
Sources: PractiTest 2024 · Beller et al. IDE patterns · IEEE Flow State study · InfoQ TDD adoption · Codemanship Jan 2026 · TDD for LLM code gen · Thoughtworks TDD+Copilot · Emily Bache 2026 · TDAD paper · TDD Governance multi-agent · Vibe coding ICSE 2026