Flakiness erodes test trust. Manage at team level.
Process:
1. Detect.
- Track flake rate per test.
- Rerun failing tests automatically; if pass, mark as flaky.
- CI tools support this.
2. Quarantine.
- Mark flaky tests as skipped.
- Don't block merges on them.
- Track in dedicated dashboard.
3. Root cause.
- Each flaky test gets investigation.
- Fix or retire.
4. Fix.
- Address root cause (not retry-loop).
- Re-run 100x to verify.
- Remove from quarantine.
5. Retire.
- If can't fix, remove.
- Don't keep dead weight.
Cultural elements:
- Flakiness is broken — same urgency as failures.
- No blame — flakies happen; fix them.
- Owner per flaky test — accountability.
Tools:
- Pytest-flaky for Python tests.
- Jest --retries for LWC.
- Custom CI logic for Apex.
Anti-patterns:
- Ignoring flakies — develops habit of ignoring failures.
- Retry-and-pray — mask not fix.
- Flakies accumulate — eventually unsustainable.
Senior QA insight: flaky tests are technical debt. Manage like code debt.
The senior framing: address aggressively before they normalize ignoring failures.
