Environment variables and feature flags both control application behavior, but they solve different problems. Conflating them leads to fragile systems, awkward deployment workflows, and flags that outlive their usefulness by years. The doctrine of putting config in the environment goes back to The Twelve-Factor App; the concept of a feature toggle was formalised by Martin Fowler and Pete Hodgson, who split toggles into release, experiment, ops, and permission categories. This post lays out when each tool belongs, and where teams confuse them.
Key Takeaways
- Env vars configure where and how a process runs (database URLs, API keys, log levels). They are static for the life of the process and require a redeploy to change.
- Feature flags configure what a specific request or user sees. They are evaluated per-request, support targeting, and change without a deploy.
- The one-line rule: if two users hitting the same running instance could ever need different values, it’s a flag. If not, it’s an env var.
- The most common mistake is `ENABLE_X=true` env vars repurposed as poor-man’s flags. Fine until the day you need a 10% rollout or a 2 AM kill switch (see the feature flag cleanup playbook).
What each one is
An environment variable is a named string value set in the process environment at startup and treated as static for the lifetime of that process. It configures where the application connects, what mode it runs in, and what secrets it uses: things that differ between deployment targets, not between users.
A feature flag is a named boolean or multi-variant value evaluated at runtime, typically over a remote source of truth, that controls which code path executes for a given request or user. It changes behavior without a redeploy, and can be scoped to a specific segment of traffic.
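The contrast can be sketched in a few lines. The `isEnabled` helper below is hypothetical, standing in for a real flag SDK that would consult a remote ruleset:

```typescript
// Env var: read once at startup, identical for every request this process serves.
const databaseUrl: string =
  process.env.DATABASE_URL ?? "postgres://localhost:5432/dev";

// Feature flag: evaluated per request, so two users hitting the same process
// can get different answers. Hypothetical helper; a real SDK evaluates
// targeting rules fetched from a remote source of truth.
function isEnabled(flagKey: string, userId: string): boolean {
  const betaUsers = new Set(["user-42"]); // e.g. a targeted beta segment
  return flagKey === "new-dashboard" && betaUsers.has(userId);
}

isEnabled("new-dashboard", "user-42"); // true for a targeted user
isEnabled("new-dashboard", "user-7");  // false for everyone else
```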
When should you use environment variables?
Environment variables excel at configuration that is static within an environment, secret, or determined before the process starts. The Twelve-Factor App’s third factor codifies this: anything that varies between deploys (credentials, hosts, per-environment toggles) should live in the process environment, not in source code (12factor.net). Env vars are read once at startup and treated as immutable for the process’s lifetime.
1. Database connection strings
DATABASE_URL is the canonical example. It points to your Postgres instance, includes credentials, and is different in dev, staging, and production. It never changes while the app is running. Storing it as an env var means it stays out of source code, can be rotated by updating the deployment secret, and doesn’t require any runtime evaluation logic.
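A common companion pattern is to read and validate required values once at startup, so a missing variable crashes the process immediately instead of failing on the first query minutes later. A minimal sketch (the helper name is illustrative):

```typescript
// Read required config once at process startup; fail fast if it's absent.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Evaluated once; the value is then fixed for the life of the process.
// const DATABASE_URL = requireEnv("DATABASE_URL");
```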
2. API keys and secrets
Third-party service keys (payment processor secrets, object storage credentials, outbound email API keys) are secrets first and configuration second. Env vars compose naturally with secret management systems (Kubernetes secrets, Vault, Doppler) and satisfy security policies that require secrets to be kept out of application code. Evaluating a feature flag to find a secret is the wrong abstraction entirely.
3. Third-party service endpoints
STRIPE_API_BASE, OPENAI_API_HOST, S3_ENDPOINT. These are environment-level choices. Staging points at sandbox endpoints, production points at live ones. These don’t change per-user and don’t need runtime toggle semantics.
4. Build-time and runtime mode flags
NODE_ENV=production, RAILS_ENV=production, ASPNETCORE_ENVIRONMENT=Production. These inform the framework itself, not just your code. They affect which config files load, whether debug middleware is enabled, and how assets are bundled. They must be set before the process starts and cannot meaningfully be changed mid-flight.
5. Log levels and observability config
LOG_LEVEL=warn, OTEL_EXPORTER_OTLP_ENDPOINT, SENTRY_DSN. These affect how the process reports on itself. They are environment-wide, they configure external systems, and they often come from your platform’s secrets store. Routing them through a feature flag system adds a circular dependency (what if the flag system itself fails before logging is configured?).
The common thread: env vars are for where and how the process runs, not what it does for a specific user.
When should you use feature flags?
Feature flags shine when the question is: “should this code path execute, for whom, and starting when?” Pete Hodgson’s canonical taxonomy, published on martinfowler.com, splits flags into four categories (release, experiment, ops, and permission toggles), each with different lifetimes and ownership. Three of those four are impossible to express cleanly as env vars, because they need per-request, per-user, or runtime-mutable evaluation.
1. Gradual rollouts
You’ve merged a rework of your checkout flow. You want to expose it to 5% of users, monitor error rates and conversion, then ramp to 100% if the metrics look healthy, all without a second deployment. A feature flag does this; an env var does not.
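Under the hood, percentage rollouts are typically implemented with a stable hash of the user ID, so each user gets a sticky answer as the percentage ramps. A simplified sketch (real SDKs add per-flag salts and tuned hash distribution):

```typescript
import { createHash } from "node:crypto";

// Map a user deterministically into a bucket in [0, 100). The same user always
// lands in the same bucket, so ramping 5% → 50% only ever adds users to the
// rollout; nobody flips back and forth between variants.
function rolloutBucket(flagKey: string, userId: string): number {
  const digest = createHash("sha256").update(`${flagKey}:${userId}`).digest();
  return digest.readUInt32BE(0) % 100;
}

function inRollout(flagKey: string, userId: string, percentage: number): boolean {
  return rolloutBucket(flagKey, userId) < percentage;
}

inRollout("checkout-v2", "user-123", 0);   // always false at 0%
inRollout("checkout-v2", "user-123", 100); // always true at 100%
```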
2. Kill switches
Certain features carry operational risk: a new third-party integration, a resource-intensive background job, a new payment provider. A kill switch flag lets an on-call engineer disable it in seconds without touching infrastructure. An env var change requires a process restart (typically a rolling deploy that takes minutes, not seconds) and introduces its own risk.
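In code, a kill switch is just a flag check with a safe fallback around the risky path. A sketch with a hypothetical `FlagClient` interface (real SDKs expose an equivalent method):

```typescript
// Hypothetical flag client interface standing in for a real SDK.
interface FlagClient {
  isEnabled(key: string, defaultValue: boolean): boolean;
}

// The risky integration, guarded by a kill switch. The default is *true*
// because the feature is normally on; an on-call engineer flips
// "new-payment-provider" off in the flag dashboard and every running
// instance reacts within seconds, no restart required.
function chargeViaProvider(flags: FlagClient, amountCents: number): string {
  if (!flags.isEnabled("new-payment-provider", true)) {
    return `old-provider:${amountCents}`; // safe fallback path
  }
  return `new-provider:${amountCents}`; // the risky new integration
}
```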
3. A/B tests and experiments
Testing two button labels, two pricing page layouts, or two recommendation algorithms requires serving different variants to different users within the same deployed build. That’s a flag, specifically a multivariate flag, not a config value.
4. Per-user and per-segment targeting
Beta programs, internal dogfooding, enterprise tenant overrides, and geographic feature launches all require the same question: “should user X get behavior Y?” Env vars have no concept of a user context. Flags evaluate against user attributes, segment membership, or a deterministic hash of the user ID.
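A targeting rule is ultimately a predicate over user attributes. A simplified evaluation sketch (the attribute names and the rule itself are illustrative; real flag systems express rules as data edited in a UI, not hard-coded):

```typescript
interface UserContext {
  id: string;
  plan?: string;
  country?: string;
  email?: string;
}

// One rule: "enterprise tenants in Germany, plus internal dogfooders".
function betaDashboardEnabled(user: UserContext): boolean {
  if (user.email?.endsWith("@example.com")) return true; // internal dogfooding
  return user.plan === "enterprise" && user.country === "DE";
}
```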
5. Paywall and plan-based variations
Showing premium features to paid users, gating beta features behind an opt-in, or launching a new UI only for enterprise accounts: these are targeting decisions made at evaluation time, per-request. Flags model this directly. A build-time config cannot.
6. Dark launches
You want to call the new code path in production, observe its behavior, and collect metrics, but not yet show its output to users. Wrap it in a flag that’s off for everyone, deploy, then turn it on for internal users only. This is impossible to express cleanly as an env var.
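A dark launch often pairs the flag with a shadow call: run both paths, serve the old result, and log any divergence. A sketch (the pricing functions are placeholders for the real and rewritten code paths):

```typescript
function oldPricing(cartTotal: number): number {
  return Math.round(cartTotal * 1.2); // current behavior, served to users
}

function newPricing(cartTotal: number): number {
  return Math.round(cartTotal * 1.2); // candidate rewrite under observation
}

// Shadow-execute the new path when the flag is on, but always return the
// old result; users never see the new output during the dark launch.
function price(cartTotal: number, shadowEnabled: boolean): number {
  const served = oldPricing(cartTotal);
  if (shadowEnabled) {
    const candidate = newPricing(cartTotal);
    if (candidate !== served) {
      console.warn(`pricing divergence: old=${served} new=${candidate}`);
    }
  }
  return served;
}
```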
The common thread: flags are for what a specific request or user experiences, and for control you need to exert without touching deployment.
Side-by-side comparison
| Dimension | Environment variable | Feature flag |
|---|---|---|
| Changes at runtime | No, requires process restart or redeploy | Yes, evaluated on every request |
| Per-user targeting | No, process-wide value | Yes, evaluates against user context |
| Audit log | No built-in history | Yes, changes tracked with timestamp and actor |
| Restart required | Yes, set at process startup | No, changes propagate to running instances |
| Granularity | Environment (dev / staging / prod) | Per-user, per-segment, percentage-based |
| Typical lifetime | Indefinite (rotated when credentials change) | Weeks to months (should be cleaned up after rollout) |
The table understates one dimension: change-propagation latency. The order-of-magnitude gap between deploying a new env var and flipping a flag is what makes flags suitable for kill switches and incident response.
Decision flowchart: which one do I reach for?
If you prefer it as text:
- Secret, credential, or static per deployment (API keys, `DATABASE_URL`, `LOG_LEVEL`) → environment variable.
- Runtime change, per-user targeting, or temporary rollout (kill switches, A/B tests, beta gates) → feature flag.
- Still ambiguous? Ask whether two users hitting the same running instance could ever need different values. If yes → flag. If no → env var.
Common pitfalls
Using env vars as poor-man’s feature flags
The most frequent mistake: an engineer creates ENABLE_NEW_CHECKOUT=true in the environment to toggle a feature. This works until you need to roll it out to 10% of users, or turn it off at 2 AM without waking up the DevOps rotation. At that point the team discovers they’ve built a deployment-gated toggle instead of a runtime one. Changing it requires a redeployment and process restart, not a flag flip, and migrating the semantics while the feature is already in production is uncomfortable. The kill-switch case has real-world stakes: the Knight Capital incident — $460M lost in 45 minutes when a deprecated flag’s bit was reused — is the textbook example of a toggle that should have been a runtime kill switch backed by a cleanup workflow.
Using flags for things that never change
The inverse mistake is routing static infrastructure config through a flag system. POSTGRES_MAX_CONNECTIONS or REDIS_CLUSTER_HOST do not need audit logs or gradual rollout semantics. Adding them to a flag system increases the surface area for misconfiguration and creates a dependency on the flag service during startup, which is exactly when you want the fewest external dependencies.
Stale flags rotting in the codebase
Feature flags are temporary by design, but in practice they rot. Industry data puts the share of flags that never get removed at 73%, with the average enterprise application carrying 200+ active flags and 60% stale beyond 90 days (FlagShark, 2025). A flag for a rollout that completed eight months ago is dead code wrapped in an if statement, and nobody is sure whether it’s safe to remove. Build retirement into your workflow: when a flag hits 100% rollout and metrics are stable, schedule the cleanup. The four-step cleanup playbook (detect, triage, remove, prevent) covers the specific procedure.
Leaking configuration concerns across layers
Checking process.env.ENABLE_NEW_CHECKOUT deep in a service module creates an implicit coupling between deployment config and business logic. Flag evaluation, by contrast, passes a user context explicitly. This makes the behavior testable (inject a flag client that returns the value you want), auditable (the flag system records who changed it), and decoupled from deployment.
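The testability point, concretely: if flag evaluation goes through an injected client rather than `process.env`, a unit test can pin the flag to either value without touching any environment. The interface below is illustrative, not a specific SDK's API:

```typescript
// Hypothetical client interface modeled on typical flag SDKs.
interface FlagClient {
  boolVariation(key: string, ctx: { user_id: string }, defaultValue: boolean): boolean;
}

function checkoutHandler(flags: FlagClient, userId: string): string {
  const useV2 = flags.boolVariation("checkout-v2", { user_id: userId }, false);
  return useV2 ? "checkout-v2" : "checkout-v1";
}

// In tests, stub clients pin the flag to each value:
const alwaysOn: FlagClient = { boolVariation: () => true };
const alwaysOff: FlagClient = { boolVariation: () => false };

checkoutHandler(alwaysOn, "user-1");  // "checkout-v2"
checkoutHandler(alwaysOff, "user-1"); // "checkout-v1"
```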
What flag evaluation looks like in production code
Here’s a concrete example using Featureflip’s Node.js SDK:
```typescript
import { FeatureflipClient } from "@featureflip/node-sdk";

const client = await FeatureflipClient.create({
  sdkKey: process.env.FEATUREFLIP_SDK_KEY!,
});

const currentUser = { id: "user-123" }; // from your auth layer

const checkoutV2Enabled = client.boolVariation(
  "checkout-v2",
  { user_id: currentUser.id },
  false, // default if evaluation fails or flag is missing
);

if (checkoutV2Enabled) {
  // new flow
} else {
  // old flow
}
```

The SDK key itself is an env var: it’s a credential scoped to an environment. The flag evaluation is runtime, per-user, and falls back gracefully to `false` if the flag service is unreachable. The SDK key never changes between requests; the flag result does. For more on how Featureflip models targeting, segments, and rollout percentages, see the rollout strategies and environments docs.
Frequently asked questions
Can I use environment variables for A/B testing?
Not effectively. A/B testing requires serving different variants to different users within the same running process, based on a stable hash of the user ID or a targeting rule. Environment variables are process-wide (every request sees the same value), so you can’t split traffic without running multiple deployments. Use a multivariate feature flag instead.
How long should a feature flag live in the codebase?
Most rollout flags should be retired within weeks of hitting 100%. Permanent kill switches and entitlement flags (paywall, plan tier) live indefinitely by design. The trap is rollout flags that quietly become permanent: 73% of flags are never removed in surveys of mature flag installations (FlagShark, 2025). Treat retirement as part of the rollout, not an afterthought.
Are feature flags a security risk?
They can be, if misused. Putting secrets in a flag system is the obvious mistake: flag values are typically cached on the client and visible to anyone who inspects the SDK payload. Use environment variables and a secrets manager for credentials. Flags are safe for behavior toggles and targeting rules, which don’t expose sensitive data even if the flag config leaks.
What happens if the feature flag service is down?
Every reputable SDK is built to degrade safely: evaluations fall back to the default value passed in code, and the SDK retries the connection in the background. That’s why the third argument to boolVariation(...) is false in the example above. It’s the value served if the flag service can’t be reached, so your app stays up and the new code path simply doesn’t activate.
Should I version-control my feature flag definitions?
Most teams keep flag definitions (key, variations, default) out of source control and manage them through the flag service’s UI or API, because the whole point is changing them without a deploy. What does belong in version control: the flag key used in code, the default value, and a comment explaining what the flag gates and when it’s expected to be retired.
The one-line rule
Configuration that belongs to the process goes in env vars. Configuration that belongs to the request or user goes in flags. Env vars answer “where does this service connect?”; flags answer “what does this user see?”
If you remember one heuristic from this post: ask whether you’d ever want this value to differ between two users hitting the same running instance. If yes, it’s a flag. If no, it’s an env var.
For a deeper reference on how Featureflip models flags, targeting rules, and evaluation context, see Feature flags overview.