Feature flag kill switches
Wrap a risky feature in a flag. When it breaks, flip it off in seconds: no redeploy, no full rollback, just the safe fallback you chose.
A feature flag kill switch wraps a risky or expensive code path in a flag so you can disable it in production the instant it misbehaves. When something goes wrong, you set the flag to off in a dashboard. Every connected SDK serves the safe fallback on its next evaluation, within a second or two, with no redeploy, rebuild, or rollback. The code stays in the binary; no traffic runs it. Because the flag carries a default value, the same switch protects you against an unexpected outage as well as a deliberate one: if the SDK cannot reach the service, it returns the fallback you chose. Because the switch is scoped to one feature, cutting it does not revert the rest of the release, which is what makes it the fastest lever you have during an incident. For the bare definition, see what a kill switch is.
Why put a kill switch on a feature?
Four reasons teams wrap risky behaviour in a flag before it ever ships to everyone.
1. Mitigate in seconds, not a deploy cycle
   A redeploy during an incident means a rebuild, a pipeline run, and a fleet rollout while the error rate keeps climbing. Flipping a flag is a single configuration change that takes effect on the next evaluation. The mitigation lands in seconds, so the clock stops before you have even found the root cause.
2. Scoped to one feature
   A rollback reverts everything in the release, including the unrelated fixes that shipped alongside the broken feature. A kill switch disables exactly one code path and leaves the rest of the deploy untouched. You stop the one thing that is on fire without un-shipping four things that are fine.
3. Decouple incident response from the pipeline
   When the fix is a flag, the person mitigating does not need deploy access, a green CI run, or a release window. The on-call engineer flips the switch from the dashboard at 3am and triages afterwards. The deploy pipeline stops being on the critical path of every incident.
4. Pre-wired safety for risky paths
   The switch only helps if it exists before the incident. Wrapping a third-party call or a heavy feature in a flag at build time means the lever is already in place when the dependency degrades. You decide the safe fallback once, calmly, instead of writing it under pressure during an outage.
What to put behind a kill switch
Not every line needs a switch. The candidates are the code paths most likely to fail or get expensive independently of the rest of your app.
| What to wrap | Why it earns a switch | What the fallback does |
|---|---|---|
| A flaky third-party dependency | A payment provider, search index, or recommendations API can slow down or fail independently of your own code. | Skip the call and serve a cached or degraded result instead of hanging the request behind a dead upstream. |
| A resource-heavy feature | A new query, an expensive render, or a fan-out job can spike database or CPU load only once real traffic hits it. | Drop back to the cheaper code path and shed the load while you tune or reshape the feature. |
| A brand-new launch | A feature carries the most unknown risk in the first hours after it ships, before production has stress-tested it. | Serve the previous, known-good behaviour the moment a metric turns red. |
| A quota- or cost-bound integration | An LLM call, an SMS gateway, or a metered API can burn money or hit a hard limit if usage runs away. | Disable the calls and queue, drop, or stub the work until the spend is back under control. |
A useful test: if this code path failed in production, would you want a way to disable it without shipping a new build? If yes, it earns a flag. The same flag you ramp up with a progressive rollout doubles as its kill switch once the feature is live.
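As a rough illustration of the first row of the table, the sketch below wraps a third-party call so that the off path serves a cached or empty result instead of calling the provider. Everything in it is hypothetical: `makeStubClient`, `getRecommendations`, `fetchFromProvider`, and the in-process `cache` are illustrative names, and the stub only mirrors the `evaluate(key, context, fallback)` call shape used on this page rather than any real SDK.

```javascript
// Hypothetical stand-in for an SDK client. It only mirrors the
// evaluate(key, context, fallback) call shape; it is not a real SDK.
function makeStubClient(flags) {
  return {
    evaluate(key, context, fallback) {
      return flags.has(key) ? flags.get(key) : fallback;
    },
  };
}

// Last known-good results, keyed by user id (assumed in-process cache).
const cache = new Map();

function getRecommendations(client, user, fetchFromProvider) {
  // Fallback is false: flag off or unknown means "skip the provider call".
  if (!client.evaluate('recommendations-enabled', user, false)) {
    // Degraded result: whatever we cached earlier, or nothing at all.
    return cache.get(user.id) ?? [];
  }
  const fresh = fetchFromProvider(user);
  cache.set(user.id, fresh);
  return fresh;
}
```

With the flag off, the provider function is never invoked, so a dead upstream cannot hold the request open.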
How Featureflip handles kill switches
What makes the off-switch fast, safe, and dependable at the moment you actually need it.
- Server-Sent Events push. A flag change reaches every connected SDK within a second or two, fleet-wide, with no redeploy: Featureflip pushes it over an open Server-Sent Events connection the moment you flip the switch. This is the propagation guarantee that turns "off" into a real-time operation rather than a poll-interval wait. See the streaming API for how the SSE channel works.
- Sub-millisecond local evaluation. The kill-switch check adds no measurable latency to the hot path: the SDK evaluates from an in-memory copy of the config, so the switch you depend on during an incident never adds a network hop to every request. The config stays local even if the control plane has a blip.
- Fail-safe defaults. Every evaluate call takes a fallback value, returned if the flag is off or the SDK cannot reach Featureflip. A deliberate kill and an unexpected outage both resolve to the safe path you picked, so the switch protects you in either failure mode.
- Environment-scoped. Each environment holds its own flag state, so you can cut a feature in production while it keeps running in staging. See environments for how configuration is separated.
- Composable with rollouts. The kill switch and the percentage rollout are two ends of the same dial: ramp a feature up gradually, then cut it to zero instantly if it misbehaves. One flag, both behaviours, no extra wiring.
- Audit trail. On the Business plan, the audit log records who flipped the switch and when, so the post-incident review has the "who turned it off, and at what time" answer without guesswork.
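To make the local-evaluation and fail-safe-default behaviour described above more concrete, here is a minimal sketch of the general pattern: an in-memory snapshot that pushed updates replace, with the caller's fallback returned whenever no config has arrived, the flag is unknown, or the flag is off. This is not Featureflip's SDK code; `LocalFlagStore`, `applyUpdate`, and the `{ enabled, value }` flag shape are assumptions made for illustration.

```javascript
// Illustrative only: a tiny in-memory flag store, NOT a real SDK.
class LocalFlagStore {
  constructor() {
    this.snapshot = null; // null until the first config update arrives
  }

  // Called when a pushed update (for example, over a stream) is received.
  // `config` is an assumed shape: { [flagKey]: { enabled, value } }.
  applyUpdate(config) {
    this.snapshot = new Map(Object.entries(config));
  }

  // Pure in-memory lookup: no network hop on the hot path.
  evaluate(key, context, fallback) {
    if (this.snapshot === null) return fallback; // never received config
    const flag = this.snapshot.get(key);
    if (flag === undefined || !flag.enabled) return fallback; // unknown or off
    return flag.value;
  }
}
```

Because the last snapshot stays in memory, a brief loss of connectivity after the first update leaves evaluation working from the most recent known state, while a store that never connected returns the fallback.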
What it looks like in your app
A kill switch is the ordinary flag-evaluation pattern with one rule: the third argument, the fallback, must be the safe value. That fallback is what the SDK returns when the flag is off and when it cannot reach Featureflip, so it has to represent the feature being disabled.
```javascript
// 'false' is the fallback: returned if the flag is off OR if the
// SDK can't reach Featureflip. Pick the safe value, not the risky one.
if (client.evaluate('recommendations-enabled', user, false)) {
  return renderRecommendations(user); // the expendable feature
}
return renderWithoutRecommendations(user); // the safe fallback
```
Set the flag to off in the dashboard and every SDK returns false on its next evaluation: the recommendations block stops rendering everywhere within seconds. Because the default is false, a control-plane outage also drops to the safe path rather than leaving the risky feature running. The application code never changes between an incident and a normal day. The same surface works in every language the platform supports, from Python and Go to C#, Java, and Node. Pick a quickstart from the SDK overview.
Kill switch vs rollback vs circuit breaker
Three tools that all reduce blast radius, but they are not interchangeable. The differences are trigger, scope, and speed.
| Dimension | Kill switch | Rollback | Circuit breaker |
|---|---|---|---|
| Trigger | A human, deliberately | A human, deliberately | Automatic, on error rate or latency |
| Scope | One feature | The whole release | One dependency call |
| Speed | Next evaluation (seconds) | Rebuild and redeploy (minutes) | Immediate, per request |
| What happens to the code | Stays deployed, dormant | Removed from production | Stays; calls short-circuit |
| Best for | Disabling a known-risky feature on demand | Reverting a bad deploy wholesale | Shedding load from a failing dependency without a human |
They compose well. A circuit breaker handles the automatic, per-request case (a dependency degrades, calls short-circuit on their own); a kill switch handles the deliberate case (you have judged a feature risky and want it off); a rollback is the heavier lever for when the whole release is bad. Reaching for the right one is half of a calm incident.
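One way the two levers might be layered around the same integration is sketched below: the kill switch is checked first (the deliberate, human decision), and the circuit breaker wraps the call only when the flag allows it (the automatic, per-request decision). The `CircuitBreaker` class is a deliberately simplified consecutive-failure breaker written for this example, and `guardedCall`, `riskyFn`, and `safeFn` are hypothetical names; a production breaker would usually track error rates and latency as well.

```javascript
// Simplified breaker: opens after `threshold` consecutive failures and
// allows a trial call once `cooldownMs` has elapsed. Illustrative only.
class CircuitBreaker {
  constructor(threshold, cooldownMs, now = Date.now) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
    this.failures = 0;
    this.openedAt = null; // null means the circuit is closed
  }

  call(fn, fallbackFn) {
    const open = this.openedAt !== null &&
      this.now() - this.openedAt < this.cooldownMs;
    if (open) return fallbackFn(); // short-circuit without calling fn
    try {
      const result = fn();
      this.failures = 0;
      this.openedAt = null;
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.threshold) this.openedAt = this.now();
      return fallbackFn();
    }
  }
}

function guardedCall(client, breaker, user, riskyFn, safeFn) {
  // Human lever: flag off (or unevaluable) means the feature is disabled.
  if (!client.evaluate('recommendations-enabled', user, false)) {
    return safeFn();
  }
  // Automatic lever: the breaker sheds load if the dependency keeps failing.
  return breaker.call(riskyFn, safeFn);
}
```

Both levers land on the same `safeFn`, so there is only one degraded path to keep working.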
Common mistakes to avoid
The patterns that turn a kill switch into a switch that does nothing, or makes things worse, when you finally pull it.
Failing open instead of failing safe
The fallback value decides what happens when evaluation can't run. If you default to the risky path, an outage that stops evaluation leaves the dangerous code executing, the exact opposite of what a kill switch is for. Default every kill switch to the safe value so a deliberate flip and an unexpected blip both land on the same safe behaviour.
Never exercising the off path
The fallback branch rots when nobody runs it. The day you finally flip the switch, you discover the legacy path was deleted, throws, or renders broken. Exercise the off path in CI or staging on a schedule so it still works when production needs it.
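A scheduled check for the off path could look something like the sketch below, which runs the same render function once with the flag forced on and once with it forced off, so a deleted or broken fallback shows up in CI rather than mid-incident. `renderHome` and `exerciseBothPaths` are hypothetical names, and the forced client is a test double that mirrors the `evaluate(key, context, fallback)` call shape, not an SDK feature.

```javascript
// Example code under test: the flag guards an expendable feature.
function renderHome(client, user) {
  if (client.evaluate('recommendations-enabled', user, false)) {
    return 'home+recommendations';
  }
  return 'home'; // the safe fallback that must keep working
}

// Runs `render` with the given flag forced on, then forced off, and
// returns both outputs so a test can assert on each path.
function exerciseBothPaths(render, flagKey) {
  const results = {};
  for (const state of [true, false]) {
    // Test double: forces one flag and returns the fallback for the rest.
    const client = {
      evaluate: (key, context, fallback) =>
        key === flagKey ? state : fallback,
    };
    results[state ? 'on' : 'off'] = render(client, { id: 'test-user' });
  }
  return results;
}
```

If the off branch throws, `exerciseBothPaths` throws too, which is exactly the failure you want the pipeline to surface.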
A kill switch that depends on what it kills
If the flag configuration is fetched through the same gateway, database, or service that is failing, the switch dies with it. Keep the flag-evaluation path independent of the dependency it protects. Featureflip evaluates from an in-memory copy of the config, so the SDK can still read the flag even when the protected dependency is down.
No alert to tell you to flip it
A kill switch is a manual lever. Without monitoring on the wrapped code path, nobody knows the feature is misbehaving until a customer reports it. Pair every kill switch with an alert on the metric that would justify pulling it.
Flipping the wrong environment
Each environment holds its own flag state. Cutting the switch in staging does nothing for the production incident; cutting it in production when you only meant to test it disables the feature for live users. Confirm the environment before you flip, every time.
Treating an ops toggle like a release toggle
A rollout flag is short-lived: you delete it once the feature hits 100%. A kill switch is legitimately long-lived: it earns its keep precisely when something breaks, which might be months later. Keep it, but document it clearly so a later cleanup pass does not mistake it for flag debt.
When a kill switch is the wrong tool
A kill switch only works when there is a safe path to fall back to. A few situations do not offer one:
- Irreversible actions. A schema migration that has already altered the table, an email already sent, a card already charged: flipping a flag cannot un-do any of these. Reach for staged migrations and idempotent, replayable jobs instead, and put the switch in front of the action, not after it.
- Failures that need handling per request, automatically. If the right response is "retry, then degrade the moment this dependency errors," that is a circuit breaker tripping itself, not a human flipping a switch. Use the breaker for the automatic case and keep the kill switch for the deliberate one.
- Core functionality with no safe fallback. If a feature has no degraded mode, then "off" is just a different outage. The work is to build the fallback first; the switch is only as useful as the path it falls back to.
- Changes that must apply uniformly. A compliance-mandated behaviour or a security patch usually cannot be left disabled for some users and on for others. Ship it to everyone and monitor, rather than gating it behind a switch you are reluctant to leave off.
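For the irreversible-actions case in the list above, "put the switch in front of the action" might look like the following sketch: the flag is checked before the email is sent, and when the switch is off the work is parked so it can be replayed later rather than lost. The flag key `campaign-email-enabled`, the `sendEmail` callback, and the in-memory `deferred` array are all assumptions for illustration; a real system would likely use a durable queue.

```javascript
// The flag check sits BEFORE the irreversible side effect. Once
// sendEmail has run, no flag flip can take it back.
function sendCampaignEmail(client, user, message, sendEmail, deferred) {
  // Fallback is false: flag off or unevaluable means "do not send".
  if (!client.evaluate('campaign-email-enabled', user, false)) {
    // Park the work so it can be replayed once the switch is back on.
    deferred.push({ userId: user.id, message });
    return 'deferred';
  }
  sendEmail(user, message); // irreversible from this point on
  return 'sent';
}
```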
Frequently asked questions
- What is a feature flag kill switch?
- A feature flag kill switch is a flag whose only job is to turn a feature off quickly. You wrap a risky or expensive code path in a flag, and when it misbehaves in production you set the flag to off in the dashboard. Every connected SDK serves the safe fallback on its next evaluation, with no deployment, rebuild, or rollback. The code stays in the binary, but no traffic runs it, so you can mitigate first and diagnose later.
- How quickly does a kill switch take effect?
- With Featureflip, typically within a second or two. The dashboard pushes the change to every connected SDK over a Server-Sent Events stream, and the next flag evaluation uses the new value fleet-wide. SDKs configured for polling pick the change up on their next interval instead. Either way it is a configuration change, not a deploy, so there is no rebuild or rollout cycle between flipping the switch and the feature going dark.
- How is a kill switch different from a rollback?
- A rollback reverts a deployment: it rebuilds and redeploys, it is coarse (it reverts everything in that release, not just the broken feature), and it is slow and risky to run mid-incident. A kill switch is scoped to a single feature and takes effect on the next flag evaluation. The new code stays deployed but dormant, so you can stop the bleeding in seconds and then triage the root cause without the deploy pipeline on the critical path.
- How is a kill switch different from a circuit breaker?
- A circuit breaker is automatic: it watches error rates or latency on a specific dependency call and trips itself, per request, without a human. A kill switch is deliberate: a person decides a feature should be off and flips it. They are complementary. Use a circuit breaker to shed load from a failing dependency the instant it degrades, and a kill switch to disable a feature you have judged risky on your own schedule. Many teams wire both around the same risky integration.
- What should the kill switch's default value be?
- The safe one. Every evaluate call takes a fallback value that the SDK returns if the flag is off or if it cannot reach Featureflip. Choose the value that represents the feature being disabled, so a deliberate kill and an unexpected outage both resolve to the safe path. Defaulting to the risky value means a network blip silently re-enables the very thing you were guarding against.
- Can I disable a feature in one environment without affecting others?
- Yes. Flag state is per environment, so you can cut a feature in production while it keeps running in development or staging, and vice versa. That separation is what lets you flip a production kill switch during an incident without disturbing the environments your team is actively building in.
Wrap your next risky feature in a flag
Free Solo plan covers 10 flags and 2 environments. No credit card, no demo call: sign up and put a switch on it.
Related
Kill switch (definition)
The glossary entry: the short, plain-English definition and how it differs from a rollback.
Progressive rollouts
The other end of the dial: ramp a feature up gradually, with the same flag ready to cut it to zero.
What are feature flags?
The concept page covering flags, variations, environments, and the evaluation model.
Feature flag anti-patterns
How teams hurt themselves with flags, including kill switches that fail open or never get tested.