There’s not much a developer hates more than a pipeline blocked by a flaky test. Well, maybe having to refactor someone’s legacy code, but pipeline delays are right up there.
They’re on their own deadlines, and every minute counts. If the blocker is a real bug, no one argues - better to stop a bad merge than fire-drill it in production. But if the blocker turns out to be a flaky test? That’s when things get… heated. Faster than you can say “sorry,” your end-to-end tests get yanked from the pipeline and you’re back to hoping the right bugs get caught before prod. Not a great look.
And yet, ignoring tests isn’t an option either. If you’ve ever had something critical fail in production because “oh yeah, that test was turned off,” you know the pain. Tests need to run, and they need to run as early in the pipeline as possible. Anything else is just a slow slide into “we’ll fix it later” - a.k.a. never.
How do you keep your pipeline trustworthy without triggering a developer revolt?
Here’s the strategy we use at Octomind that’s been keeping our devs (mostly) happy, our pipeline (mostly) green, and our releases (mostly) bug-free.
Forget the textbook “testing pyramid.” Our shape looks more like an hourglass.
We don’t have a dedicated QA role, but we consider software quality to be an essential part of every Octoneer’s work. To keep stability front and center, we introduced a rotating role: OctoQA.
Each week, a different person wears the OctoQA hat. Their mission: monitor and manage our “non-pipeline” e2e tests.
Why non-pipeline? Because fresh e2e tests are often flaky at first. Not because the code is bad, but because test setup is hard - isolation issues, data dependencies, timing quirks. Even seasoned test writers don’t always nail it on the first try.
Here’s how it works:
This “quarantine first, promote later” approach means our blocking pipeline stays green for the right reasons - not because we stripped all the tests out of it, but because only proven and stable tests make it in.
Even after promotion, a test can occasionally turn flaky. In that case, we don’t let it torture developers. It’s immediately pulled back into quarantine for investigation. Once fixed and stable again, it can rejoin the main pipeline.
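To make the quarantine, promote and demote cycle concrete, here is a minimal Python sketch. It is an illustration, not our actual tooling. The names (TrackedTest, record_run, pipeline_blocked) and the promotion rule of 20 consecutive green runs are assumptions made up for this example, and the "flaky" flag stands in for a human judgment call, for instance by the OctoQA of the week.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Status(Enum):
    QUARANTINED = "quarantined"  # runs and gets monitored, never blocks a merge
    BLOCKING = "blocking"        # runs in the main pipeline and can block


# Hypothetical threshold: consecutive green runs needed before promotion.
PROMOTION_STREAK = 20


@dataclass
class TrackedTest:
    name: str
    status: Status = Status.QUARANTINED  # every new test starts in quarantine
    green_streak: int = 0

    def record_run(self, passed: bool, flaky: bool = False) -> None:
        """Update the test's state after one run.

        `flaky` marks a failure that was judged to be a false alarm
        rather than a real bug (an assumed, manual classification).
        """
        if passed:
            self.green_streak += 1
            if (self.status is Status.QUARANTINED
                    and self.green_streak >= PROMOTION_STREAK):
                self.status = Status.BLOCKING  # proven stable: promote
            return

        self.green_streak = 0
        if flaky and self.status is Status.BLOCKING:
            # A promoted test turned flaky: pull it back for investigation.
            self.status = Status.QUARANTINED


def pipeline_blocked(failed_tests: Iterable[TrackedTest]) -> bool:
    """Only failures of promoted (blocking) tests stop the pipeline."""
    return any(t.status is Status.BLOCKING for t in failed_tests)
```

In this sketch a real failure of a promoted test keeps blocking, while a failure flagged as flaky sends the test back to quarantine and resets its streak, so it has to earn its place in the blocking pipeline again.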
After running this system for a while, a few truths have become obvious:
So yes, the pipeline still blocks when it has to. But it blocks for real bugs, not false alarms. That’s how you avoid both broken prod releases and angry dev mobs.