Auto vs Manual Definition of Done: When Checklists Fail and What to Use Instead
Manual DoD checklists drift within 6 months. Here is why, what auto-evaluation looks like in Jira, and when each model wins.
Most teams start their Definition of Done with a Confluence page and a fresh checklist app. The team agrees, the boxes get ticked, sprints feel cleaner. For about two months. If you do not yet have an explicit DoD list, our 10-criteria DoD template for Jira is a starting point.
Then the boxes start getting ticked the same way standup updates get copied from yesterday — by reflex, on autopilot, without re-checking the underlying state. The DoD has not failed. The checklist has.
The lifecycle of a manual DoD checklist
Month 1: Enthusiasm
Everyone is bought in. The Scrum Master walks through the checklist at retro, devs untick boxes when something breaks, the PO checks acceptance criteria before sign-off. Velocity stabilises. Defects in production drop.
Month 3: Friction
A release is late. A bug slips through. Someone realises the checklist's "tests passed" box was ticked on a story whose tests were actually skipped because the test environment was down. The team patches the rule ("tests passed means tests actually ran, not just got past CI"). The checklist gets one more line.
Month 6: Ritual
New devs join. They see the checklist and tick the boxes because that's what you do here. The team has 14 criteria, half of which nobody remembers the reason for. Reviewers click through them faster than they can read. The checklist is now theatre — the team performs "Done" for itself.
Three failure patterns of manual checklists
1. Honest ticking, dishonest reality
The reviewer ticks "code reviewed" because someone approved the PR, but the approver only scanned the file names. The box is ticked truthfully against the question asked, but the question was the wrong proxy for actual code review.
A box that tracks an artefact (PR exists, approval exists, document linked) is more honest than a box that tracks a verdict (code is good, tests are sufficient).
2. Stale boxes
The story passed DoD on Tuesday: all criteria ticked, ready to ship. On Wednesday morning the engineer rebases and accidentally drops the new test file. The DoD box still says "tests written" because nothing in Jira knows the file is gone. The story ships.
Manual checklists are point-in-time. They capture what the ticker believed at the moment of clicking. They do not re-evaluate when the underlying issue changes — and Jira issues change constantly.
3. Criteria drift
The criteria themselves rot. Three sprints in, the team adopts feature flags but the DoD still says "deployed to production" without a flag clause. Six sprints in, the team moves to trunk-based development but "PR approved by two reviewers" is still on the list. Manual checklists do not push back when reality shifts; they just collect cruft.
What auto-evaluation actually checks
Auto-evaluating DoD apps read Jira state directly. Instead of asking the reviewer "did you do X?" they ask Jira "is the condition for X true?":
- All subtasks Done? Auto: query subtask statuses. Manual: tick a box.
- Resolution set? Auto: check the Resolution field. Manual: tick a box.
- PR linked and merged? Auto: query the Development panel. Manual: tick a box.
- No blocking bugs open? Auto: query "is blocked by" links with status not Done. Manual: nobody remembers to tick this.
- Fix Version released? Auto: check Fix Version release state. Manual: requires the reviewer to know what Fix Version is.
The score updates whenever the underlying state changes. If a rebase drops a test file and the linked test issue goes red as a result, the DoD score drops too. The story stops being "done" without anyone re-ticking anything.
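As a rough illustration, the five checks listed above can be written as predicates over a snapshot of the issue. This is a minimal sketch, not how any particular app implements it: the dictionary keys (`subtask_statuses`, `blocked_by_statuses`, and so on) are assumptions made for the example and do not mirror the shape of Jira's REST API responses.

```python
# Hypothetical, simplified issue snapshot. The keys are assumptions made for
# this example and do not mirror the shape of Jira's REST API responses.
def all_subtasks_done(issue):
    # Note: an issue with no subtasks passes, because all([]) is True.
    return all(status == "Done" for status in issue["subtask_statuses"])


def resolution_set(issue):
    return issue["resolution"] is not None


def pr_linked_and_merged(issue):
    return any(pr["merged"] for pr in issue["pull_requests"])


def no_open_blockers(issue):
    return all(status == "Done" for status in issue["blocked_by_statuses"])


def fix_version_released(issue):
    versions = issue["fix_versions"]
    return bool(versions) and all(v["released"] for v in versions)


CRITERIA = {
    "All subtasks Done": all_subtasks_done,
    "Resolution set": resolution_set,
    "PR linked and merged": pr_linked_and_merged,
    "No blocking bugs open": no_open_blockers,
    "Fix Version released": fix_version_released,
}


def evaluate(issue):
    """Re-ask every criterion against the current issue state."""
    return {name: check(issue) for name, check in CRITERIA.items()}


def score(issue):
    """Unweighted share of passing criteria, as a whole-number percentage."""
    results = evaluate(issue)
    return round(100 * sum(results.values()) / len(results))


issue = {
    "subtask_statuses": ["Done", "Done"],
    "resolution": "Done",
    "pull_requests": [{"merged": True}],
    "blocked_by_statuses": ["Done"],
    "fix_versions": [{"name": "2.4.0", "released": True}],
}

before = score(issue)                               # all five criteria pass
issue["blocked_by_statuses"].append("In Progress")  # a new blocking bug is linked
after = score(issue)                                # one of five now fails
```

Nothing is stored between calls: each call to `score` reads the issue as it is at that moment, so the result cannot go stale the way a tick can.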
When manual still wins
Auto-evaluation only works when the criterion maps to something Jira knows. Some DoD criteria are intrinsically subjective and need a human:
- PO sign-off — a person has to agree the work meets intent. Best modelled as a manual check after auto-criteria pass.
- UX review — designer eyes on the output. Cannot be inferred from Jira fields.
- Security review — for changes touching auth, payments, PII. A specialist has to inspect.
- Accessibility check — keyboard navigation, screen reader, colour contrast. Tools help, but the verdict is human.
The strongest DoD setups use both: auto-evaluation for the objective criteria (typically 70-80% of the list) and manual checkboxes for the subjective ones. The DoD score is computed across both, but the team understands which boxes carry which weight.
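One way to compute a single score across both kinds of criteria is a weighted share of the criteria that pass. The sketch below assumes the auto-evaluated results and the manually ticked results arrive as two dictionaries. The criterion names and the weights are invented for the example, not taken from any specific tool.

```python
def dod_score(results, weights):
    """Return a 0-100 score: the weighted share of criteria that pass.

    results maps criterion name -> bool; weights maps name -> positive number.
    A criterion missing from results counts as not met.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    earned = sum(w for name, w in weights.items() if results.get(name, False))
    return round(100 * earned / total, 1)


# Hypothetical inputs: auto results come from issue state, manual from toggles.
auto_results = {
    "subtasks_done": True,
    "resolution_set": True,
    "pr_merged": True,
    "no_open_blockers": False,
}
manual_results = {"po_signoff": False}

# Assumed weighting: PO sign-off counts double, everything else counts once.
weights = {
    "subtasks_done": 1,
    "resolution_set": 1,
    "pr_merged": 1,
    "no_open_blockers": 1,
    "po_signoff": 2,
}

combined = dod_score({**auto_results, **manual_results}, weights)
```

Here three of six weight points are earned, so the story sits at 50%. Making the weights explicit is what lets the team see that a missing PO sign-off costs more than a missing field.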
How to implement auto + manual DoD in Jira
The native-only path
Workflow validators (ScriptRunner, JMWE) can enforce auto-criteria on the transition to Done: block the move unless subtasks are Done, Resolution is set, and Fix Version is filled. Manual criteria live in a description checklist that the reviewer ticks before transitioning. This works, but you maintain two systems, and the score is binary (pass/fail) rather than graduated.
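The logic of such a gate, stripped of any tool-specific syntax, looks roughly like the Python sketch below. It is not ScriptRunner or JMWE code, and the issue keys are hypothetical. It only shows why the outcome is binary: the transition is either allowed or refused, with no notion of "80% ready".

```python
def blocking_reasons(issue):
    """Return the reasons a transition to Done should be refused."""
    reasons = []
    if any(status != "Done" for status in issue["subtask_statuses"]):
        reasons.append("open subtasks")
    if issue["resolution"] is None:
        reasons.append("Resolution not set")
    if not issue["fix_versions"]:
        reasons.append("Fix Version empty")
    return reasons


def can_transition_to_done(issue):
    # Binary outcome: any single failing rule blocks the move.
    return not blocking_reasons(issue)


# Hypothetical issue snapshots; the keys are assumptions for this example.
draft = {
    "subtask_statuses": ["Done", "In Progress"],
    "resolution": None,
    "fix_versions": ["2.4.0"],
}
ready = {
    "subtask_statuses": ["Done", "Done"],
    "resolution": "Done",
    "fix_versions": ["2.4.0"],
}
```

With these inputs, `draft` is refused for two reasons and `ready` passes. The manual criteria never enter this function, which is why they end up in a second system.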
The dedicated-app path
A purpose-built DoD app evaluates the auto-criteria against Jira fields and exposes a single panel where the team can also tick subjective items. ReDo works this way. Auto-criteria (description filled, subtasks Done, PR linked, no blocking bugs, etc.) evaluate continuously; manual criteria ("PO approved", "UX reviewed") sit as toggles in the same panel. The 0-100% score reflects both, weighted however the team configures.
Common mistakes when moving to auto-evaluation
- Trying to auto-eval everything. Some criteria are irreducibly human. Forcing them into auto-form (e.g. a label that someone has to apply manually) just moves the manual click and loses the human judgement signal.
- Switching all at once. Move auto-eval-able criteria over first, leave the manual ones intact, validate for a sprint or two, then iterate.
- Ignoring the score until Done. The point of a continuous score is to see readiness during development, not just at the transition. Surface it on the issue panel and the sprint board.
Key takeaway
Manual DoD checklists fail predictably: they go from enthusiasm to ritual within six months because the team has no signal when the boxes start lying. Auto-evaluation against Jira fields fixes the feedback loop for the objective criteria, leaving humans to handle the genuinely subjective parts. The right setup uses both — with the boundary drawn at "can Jira answer this question?". The same logic applies to the input side of the sprint: 5 signs your backlog isn't ready for planning covers the early-warning signals.