Feature flags don’t cause outages. Specific anti-patterns do — and they have names. Reusing a deprecated flag bit cost Knight Capital $460 million in 45 minutes (Henrico Dolfing case study, 2019). A flag-rollout-triggered performance bug at Slack cascaded into a day of degraded service in May 2020 (Slack Engineering). A code path that wasn’t behind a flag at all crashed half of Google Cloud globally in June 2025 (ThousandEyes outage analysis, 2025).
The interesting thing about those three incidents isn’t that flags are dangerous. It’s that the failure modes are recurring, named, and avoidable. This post lists nine of them — what each looks like in code, which public postmortem it shows up in, and the fix.
Key Takeaways
- Most flag-related incidents trace back to a small set of named anti-patterns: reuse, client-side authorization, flag-as-config, coupled toggle points, untested off-state, missing kill switches, nested dependencies, no observability, and “set and forget.”
- The flag itself is rarely the cause. The hidden coupling between flag identity and code-path identity is.
- A flag check is display logic, not security logic. Authorization belongs behind a session-validated server check.
- Most release flags should live days to weeks, not months (LaunchDarkly). Tag flag type at creation; lifecycle follows from type.
1. Reusing a deprecated flag
Nearly a decade before the incident, Knight Capital deprecated a feature called Power Peg. They marked the flag deprecated, switched users away, and disabled the option in the UI. They didn’t remove the code. Years later, an engineer needed a new flag bit and reused the deprecated Power Peg bit. The deploy missed one of eight servers. When the flag was set in production, the eighth server — still carrying the old binary — ran the dead code path. Knight bought roughly $7 billion of stock it couldn’t pay for, lost about $460 million in 45 minutes, and agreed to be acquired by Getco four months later (Knight Capital Group, Wikipedia).
The framing most retellings reach for is “flags are dangerous.” The narrower framing is more useful: a deprecated flag whose underlying code path still ships is a future incident waiting for an unrelated change to step on it. Two rules prevent the recurrence: never reuse a flag bit or key for a different feature, and remove the code path in the same change that retires the flag — in two PRs, flag check first, dead branch second. The full procedure lives in our feature flag cleanup playbook.
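As a sketch, the two-PR retirement looks like this; the flag key and router functions here are hypothetical, not Knight’s actual code:

```diff
// PR 1: delete the toggle point. The winning path becomes unconditional.
- if (client.evaluate('smart-order-router', ctx)) {
-   return routeWithSmartRouter(order);
- } else {
-   return routeWithPowerPeg(order); // retired path, still shipping
- }
+ return routeWithSmartRouter(order);
```

PR 2 then deletes `routeWithPowerPeg()` and everything only it calls, and archives the flag key so it can never be reassigned.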
2. Using a client-side flag for an authorization decision
Client-side flags are tamperable. Any user with browser dev tools can flip them in three seconds. That’s fine when the flag controls display — whether a button is rendered, whether a tab is shown — because the server still enforces what the user can actually do. It’s an outage-class problem when the flag is the access control:
```js
// Anti-pattern: the API trusts the client's flag.
if (flags.canExportAllTenants) {
  return await api.exportAllTenants(); // server has no auth check
}
```

The user opens dev tools, sets `flags.canExportAllTenants = true`, and exfiltrates every tenant’s data. The flag was never the boundary; the API was. The fix is a server-side authorization check that doesn’t care what the client believes:
```js
// Server-side route handler.
if (!session.hasRole('platform_admin')) {
  return res.status(403).end();
}
return res.json(await exportAllTenants());
```

The flag stays. It now controls whether the export button renders. The authorization check is where it belongs. The principle: a flag is display logic. If toggling it changes who can do what, you have an authorization rule wearing a flag’s clothing.
3. Using a flag where an environment variable belongs
Database connection strings, API endpoints, region keys, log levels, and most numeric tuning parameters don’t belong in a feature flag service. Flags are for behavior changes — turn a feature on for a cohort, gradually expose a new code path, kill it under load. Environment variables are for configuration — the static, deploy-time values an app needs at startup.
The smell is a “flag” whose variant looks like configuration:
```js
// Anti-pattern: flag whose value is configuration.
const config = client.evaluate('payments-config', user, {
  endpoint: 'https://payments-v2.example.com',
  timeoutMs: 5000,
  region: 'us-east-1',
});
```

Three things go wrong. The app’s startup now depends on the flag service being reachable. A static value becomes a runtime evaluation, paying network cost on every request. And the flag dashboard becomes a config registry that no one audits. The full distinction lives in our feature flags vs environment variables post; the short version is that if the variant looks like configuration, it is configuration.
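For contrast, a minimal sketch of the same values as deploy-time configuration, read once from the environment at startup. The variable names are illustrative:

```js
// Deploy-time configuration: read once at startup, static for the
// life of the process, no flag service in the request path.
// Env var names are hypothetical.
const config = {
  endpoint: process.env.PAYMENTS_ENDPOINT,
  timeoutMs: Number(process.env.PAYMENTS_TIMEOUT_MS ?? '5000'),
  region: process.env.PAYMENTS_REGION ?? 'us-east-1',
};
```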
4. Coupling the toggle point to the toggle router
Martin Fowler’s canonical feature toggles article calls out the distinction: the toggle point is where the decision is read; the toggle router is the logic that decides. Tangling them is what makes flag removal terrifying years later, because nobody knows what they’re deleting.
The anti-pattern looks like this — five files, each calling the SDK directly with the same set of inputs:
```js
// In billing.js, checkout.js, signup.js, dashboard.js, mobile.js:
if (client.evaluate('billing-v2', { tenantId, region, plan, betaCohort })) {
  // ...
}
```

Now imagine deleting `billing-v2`. You have to find every call site, verify each one passes the same evaluation context, and trust that none of them silently differ. Wrap the decision once, behind a name:
```js
// One file. One named decision.
export function isBillingV2Enabled(user) {
  return client.evaluate('billing-v2', {
    tenantId: user.tenantId,
    region: user.region,
    plan: user.plan,
    betaCohort: user.betaCohort,
  });
}
```

Every other file imports `isBillingV2Enabled(user)`. The decision lives once. Removal is one delete and a search-and-replace, not an archaeological dig.
5. Not testing the off state — or the on/off transition
The most common silent flag failure is shipping a flag that’s only ever been tested in one state. CI exercises the new code path because that’s what the PR added. The old path stops getting touched. When the flag flips back during an incident — or rolls forward to a cohort that exposes a latent bug — the un-exercised path executes for the first time in months.
Slack’s May 12, 2020 incident started exactly this way. Around 8:30am Pacific, a percentage-based flag rollout exposed a longstanding performance bug. Slack rolled the flag back within minutes, but the morning load spike had already pushed the backend into a cascade through stale HAProxy state and webapp autoscaling — by 4:45pm Pacific, the user-visible outage began (Slack Engineering).
The flag itself wasn’t broken. The code under the flag had never been exercised under production load. Two rules cover the gap: every PR that adds a flag check needs a CI test for both branches, and pre-rollout, the on path needs a load test or a canary that actually pushes traffic through it. Per-environment overrides in your flag platform exist for exactly this; use them.
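A minimal sketch of what that looks like in CI, assuming a Jest-style runner and a `withFlag` helper that overrides one flag for the duration of a callback. Both names are illustrative, not a specific platform’s API:

```js
// Both branches get a test, so the off path can't silently rot.
// `withFlag` is a hypothetical per-test flag override helper.
test('checkout succeeds with billing-v2 on', async () => {
  await withFlag('billing-v2', true, async () => {
    expect((await checkout(cart)).status).toBe('ok');
  });
});

test('checkout succeeds with billing-v2 off', async () => {
  await withFlag('billing-v2', false, async () => {
    expect((await checkout(cart)).status).toBe('ok'); // the rollback path
  });
});
```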
6. No kill switch — or one that’s never been flipped
Google Cloud’s June 12, 2025 outage is the inverse case study. On May 29, Google rolled out new quota-policy checking code into Service Control. The code path was dormant — it activated only when a specific kind of policy was published — and the rollout itself wasn’t gated behind a feature flag. On June 12, a policy update introduced unintended blank fields, hit the dormant code, and triggered a null pointer dereference. Service Control crashed in regions worldwide, taking down APIs across Compute Engine, Cloud Storage, BigQuery, IAM, Cloud Run, Vertex AI, and Workspace (ThousandEyes, 2025).
The team had built a “red-button” kill switch into the new feature. They activated it within 10 minutes of identifying the root cause and finished rolling it out across all regions 40 minutes later (Google Cloud incident report). Two lessons sit on top of each other. The kill switch worked — Google’s own postmortem is explicit that without it, recovery would have taken hours longer. And the initial deploy should also have been behind a flag, defaulting to off, and exercised in production-like conditions before activation.
A kill switch you’ve never flipped isn’t a kill switch — it’s a hope. Mark every kill switch as permanent at creation, assign it an owner, and exercise it in a non-production environment quarterly. If it can’t be flipped without breaking something, it doesn’t actually work.
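A sketch of the shape, assuming an SDK that accepts a default value. The key property is that the switch fails safe: if the flag service is unreachable, the new path stays off.

```js
// Kill-switch wrapper: any failure resolves to the protective state.
// The default-value option is an assumption about the SDK, and the
// flag key is illustrative.
function newQuotaChecksEnabled(ctx) {
  try {
    return client.evaluate('quota-policy-checks', ctx, { default: false });
  } catch {
    return false; // flag service unreachable: keep the new path off
  }
}
```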
7. Nested flag dependencies and “flag spaghetti”
When a flag’s evaluation depends on another flag’s value, debugging an incident becomes a graph-traversal problem. The combinatorics get ugly fast — six nested flags create 64 possible code paths, and most teams have tested fewer than five of them.
```js
// Anti-pattern: nested flags. Three flags below mean 8 possible paths;
// at six flags you're at 64.
if (flags.checkoutV2) {
  if (flags.applePayEnabled) {
    return checkoutV2WithApplePay(cart);
  } else if (flags.expressPaymentBeta) {
    return checkoutV2Express(cart);
  } else {
    return checkoutV2(cart);
  }
} else {
  return legacyCheckout(cart);
}
```

Fold the decision into one named function that returns one variant:
```js
// One decision. Switchable. Auditable.
function checkoutVariantFor(user) {
  // Platform-side targeting; the client just renders the chosen variant.
  return client.evaluate('checkout-variant', user);
  // returns: 'legacy' | 'v2' | 'v2-apple-pay' | 'v2-express-beta'
}

switch (checkoutVariantFor(user)) {
  case 'v2-apple-pay':
    return checkoutV2WithApplePay(cart);
  case 'v2-express-beta':
    return checkoutV2Express(cart);
  case 'v2':
    return checkoutV2(cart);
  default:
    return legacyCheckout(cart);
}
```

The platform decides; the client picks. Targeting moves to a place where it can be audited, and the combinatorial explosion collapses into a single multi-arm experiment.
8. No observability on flag evaluations
When a flag is the suspect in an incident, you need to answer four questions in the first ten minutes:
- Which flags changed in the last hour?
- For the affected user, which variant fired?
- What’s the variant rollout percentage right now?
- Has variant traffic correlated with the error spike?
Without per-flag evaluation events and an audit log, every one of those is a guess. Most platforms emit evaluation telemetry; if yours doesn’t, instrument the wrapper that calls the SDK so you can correlate variant against error rate at the request level.
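A sketch of that instrumentation, layered onto the wrapper pattern from anti-pattern 4. `emitEvent` stands in for whatever telemetry client you already run:

```js
// Every evaluation emits one event: flag key, variant, user, timestamp.
// `emitEvent` is a placeholder for your existing telemetry pipeline.
export function isBillingV2Enabled(user) {
  const variant = client.evaluate('billing-v2', {
    tenantId: user.tenantId,
    region: user.region,
    plan: user.plan,
    betaCohort: user.betaCohort,
  });
  emitEvent('flag_evaluation', {
    flagKey: 'billing-v2',
    variant,
    userId: user.id,
    timestamp: Date.now(),
  });
  return variant;
}
```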
The minimum bar is: every flag change is logged with who, when, and what (audit log); every evaluation emits an event with flag key, variant, user identifier, and timestamp; a dashboard somewhere shows variant share by flag. None of that is exotic — it’s the same telemetry you’d want for any production system. Flags are just the system whose changes happen during an incident, which is exactly when you need the data the most. Featureflip surfaces both today: an audit log on every flag and per-evaluation events through the standard SDK telemetry path. (We don’t yet ship anomaly detection on variant share — that one is on us.)
9. “Set and forget” — treating flags as permanent infrastructure
A release flag that lives past its rollout becomes a zombie. Once a feature is at 100%, the flag is no longer protecting anything — it’s just a conditional branch a future engineer has to reason about. LaunchDarkly’s documentation is explicit: “Release flags are temporary. After you verify the new code is stable and roll out the feature to 100% of contexts, you should archive the flag” (LaunchDarkly). Industry data puts the share of flags that never get removed at 73% (FlagShark, 2025).
The fix isn’t a quarterly cleanup sprint. It’s a tag at creation. Every flag belongs to one of four lifecycle types, and the lifecycle drives the expected lifespan:

| Flag type | Expected lifespan |
| --- | --- |
| Release | Days to weeks; archive once the feature holds at 100% |
| Experiment | Experiment duration plus a two-week stabilization window |
| Kill switch | Permanent, with an owner and a quarterly exercise cadence |
| Entitlement | Permanent; plan-tier access control, documented as such |
The cleanup happens at creation, not at quarter-end. If a flag is tagged “release” the day it’s created, its expiration date is set then too; the dashboard surfaces it for review the moment it expires. If a flag is tagged “kill switch,” it has an owner, a test cadence, and never appears on a stale-flag report. The full operational loop — detect, triage, remove, prevent — sits in our feature flag cleanup playbook.
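Concretely, the tag is just metadata attached when the flag is created. A sketch with a hypothetical creation API; no specific platform’s shape is implied:

```js
// Lifecycle metadata set on day one. `flags.create` and its fields
// are hypothetical; the point is that type, owner, and expiry are
// decided at creation, not at quarter-end.
await flags.create({
  key: 'billing-v2',
  type: 'release', // release | experiment | kill-switch | entitlement
  owner: 'payments-team',
  expiresAt: '2026-03-31', // surfaced for review the day it expires
});
```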
The shorter version
Most flag-related incidents come from a small set of named anti-patterns. Don’t reuse a deprecated flag bit — and when you retire a flag, remove the code in the same change. Don’t put authorization decisions in client-side flags. Don’t store configuration in a flag service. Wrap each flag decision in one named function so removal is mechanical. Test the off state, the on state, and the transition. Build a kill switch and exercise it before you need it. Flatten nested flag combinations into a single named decision. Emit evaluation telemetry so flag-suspected incidents are debuggable. And tag every flag with a lifecycle type at creation, so cleanup follows from policy instead of willpower.
The Knight Capital, Slack, and Google Cloud incidents are public reminders that the failure modes are real and the costs are large. Most teams will never run an automated order-routing system or operate one of the world’s three largest cloud platforms. The anti-patterns are the same.
Frequently asked questions
What is a feature flag anti-pattern?
A feature flag anti-pattern is a recurring misuse of feature flags that causes incidents, technical debt, or operational pain. Common examples include reusing deprecated flag bits, using client-side flags for authorization, treating flags as configuration, coupling toggle decisions to the code that reads them, and leaving release flags in place after rollout. Each has a documented fix.
Can feature flags cause outages?
Flags rarely cause outages directly. The anti-pattern around the flag does. Knight Capital’s $460M loss came from a deprecated flag reused without removing the code path it once gated (Henrico Dolfing, 2019). Slack’s May 2020 incident began with a flag rollout that exposed a latent performance bug. Google Cloud’s June 2025 outage came from a code path that wasn’t behind a flag at all.
Should feature flags be used for security or authorization decisions?
No. Client-side flags are tamperable — any user can flip them in browser dev tools. Server-side flags are safer but mix authorization logic with display logic, making access rules harder to audit. Authorization belongs in a session-validated server check; flags should control display, not access. If toggling the flag changes who can do what, you have an authorization rule.
How long should a feature flag live in production?
It depends on the flag’s type. Release flags should be retired within weeks of hitting 100%. Experiment flags should live for the duration of the experiment plus a two-week stabilization window. Kill switches and plan-tier entitlements live indefinitely if they’re documented as permanent and assigned an owner (LaunchDarkly). Tag the type at creation; lifespan follows.
What’s the difference between a feature flag and an environment variable?
Feature flags are for behavior changes — turning a feature on for a cohort, gradually exposing a new code path, killing it under load. Environment variables are for deploy-time configuration — connection strings, region keys, log levels. Flags evaluate at runtime against user context; env vars are static for the life of the process. The full distinction lives in our feature flags vs environment variables post.
Featureflip is built around the failure modes in this post: an audit log on every flag, per-evaluation events through the SDK, a one-click kill switch, and flag-type tags at creation so cleanup follows the lifecycle. If you’d like to see how that plays out in practice, start with the Solo plan. It’s free forever for one project.