Feature Flag Reliability & Resilience
Feature flags sit in the request path of almost everything you ship, so the question that matters before any feature comparison is operational: what happens when Featureflip is slow, unreachable, or misconfigured? Adding flags should not add a new way for your application to fall over.
This page collects, in one place, how Featureflip behaves under failure — local evaluation, cached values, safe defaults, and graceful streaming fallback. The short version: a Featureflip outage does not take your application down, and a missing or un-resolvable flag falls back to a value you control.
What happens when Featureflip is unavailable
Featureflip SDKs do not make a network call on every flag check. They fetch the flag configuration once, hold it, and evaluate against that local copy. That single design choice is what makes flag checks both fast and resilient:
- Server-side SDKs (Node.js, Python, Go, Java, C#, Ruby) download the full flag configuration at startup and keep it in an in-memory store. A flag check is an in-memory lookup that typically completes in well under a millisecond. If the Evaluation API goes down after startup, evaluation is unaffected — the SDK keeps serving from its in-memory copy and reconnects in the background.
- The PHP SDK is cache-first by design, because PHP’s request-per-process model makes a long-lived in-memory client impractical. It writes the configuration through a PSR-16 cache (Redis, Memcached, APCu, or filesystem) and serves subsequent requests from cache without a network round trip.
- Client-side SDKs (Browser/JS, React, Android, Swift, Flutter) receive pre-evaluated values and keep them in an in-memory snapshot. The Android and Swift SDKs additionally persist the last successful snapshot to disk, so flags still resolve when the app cold-starts in airplane mode or on a flaky network — they serve the values from the last successful sync and retry quietly in the background.
If a value cannot be resolved at all, the SDK returns the default you supplied (see Safe defaults below). Nothing in the evaluation path waits on the network.
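As a rough illustration of why an outage after startup does not affect evaluation, the following minimal sketch keeps flags in an in-memory store and refreshes them separately from lookups. `LocalFlagStore`, `sync_once`, and `failing_fetch` are hypothetical names made up for this example; they are not the Featureflip SDK's API.

```python
import threading
from typing import Any, Callable, Dict


class LocalFlagStore:
    """Hypothetical in-memory flag store (illustration only, not the real SDK)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._flags: Dict[str, Any] = {}

    def replace_all(self, flags: Dict[str, Any]) -> None:
        # Called by a background sync, never by the request path.
        with self._lock:
            self._flags = dict(flags)

    def get(self, key: str, default: Any) -> Any:
        # A plain dictionary lookup: no network call is made here.
        with self._lock:
            return self._flags.get(key, default)


def sync_once(store: LocalFlagStore, fetch: Callable[[], Dict[str, Any]]) -> bool:
    """Try to refresh the store; on failure keep the last known values."""
    try:
        store.replace_all(fetch())
        return True
    except Exception:
        return False


def failing_fetch() -> Dict[str, Any]:
    # Stands in for an unreachable service.
    raise ConnectionError("evaluation API unreachable")


store = LocalFlagStore()
sync_once(store, lambda: {"new-checkout": True})  # startup fetch succeeds
outage_sync_ok = sync_once(store, failing_fetch)   # a later refresh fails
during_outage = store.get("new-checkout", False)   # last known value is still served
unknown_flag = store.get("no-such-flag", False)    # unresolved flag falls back to the default
```

The lookup and the refresh share nothing but the dictionary, so a failed refresh leaves the previous configuration in place.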
Safe defaults: the fallback you control
Every variation method takes an explicit default value as a required argument. That default is returned whenever the flag is missing, the value has an unexpected type, the client has not finished loading yet, or any error occurs during evaluation:
```javascript
// Node.js: the third argument is the default
const checkout = client.boolVariation('new-checkout', { user_id: 'u-123' }, false);
```

```python
# Python: variation() never raises; it returns the default on any error
checkout = client.variation("new-checkout", {"user_id": "u-123"}, default=False)
```

Because the safe value lives at the call site, your fallback behavior is a local decision, not a remote one. Evaluation is built to stay out of your error path: the Python and PHP SDKs are documented to never throw from a variation call, and the other SDKs return the default instead of surfacing an exception. Choose each default as the value you want users to get if Featureflip could tell you nothing at all, which is usually the current, safe code path.
For the difference between a flag’s off variation and its fallthrough (default) variation inside the evaluator, see What Are Feature Flags?.
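The fallback rules above (missing flag, unexpected type, any error) can be sketched as a small helper. This is an assumption-laden illustration of the behavior described, not the SDK's implementation; `bool_variation` and `lookup` are hypothetical names.

```python
from typing import Any, Callable, Dict


def bool_variation(lookup: Callable[[str], Any], key: str, default: bool) -> bool:
    """Return the flag value, or `default` on any problem (hypothetical helper)."""
    try:
        value = lookup(key)
    except Exception:
        # Missing flag, store not loaded yet, or any other evaluation error.
        return default
    # A value of an unexpected type also falls back to the default.
    return value if isinstance(value, bool) else default


flags: Dict[str, Any] = {"new-checkout": True, "banner-text": "hello"}

resolved = bool_variation(flags.__getitem__, "new-checkout", False)   # flag exists and is a bool
wrong_type = bool_variation(flags.__getitem__, "banner-text", False)  # string value, default returned
missing = bool_variation(flags.__getitem__, "no-such-flag", False)    # KeyError swallowed, default returned
```

Because the helper never raises, the caller needs no `try`/`except` around a flag check.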
Caching and last-known-good values
| SDK group | Where flag data lives | Survives an outage by |
|---|---|---|
| Node.js, Python, Go, Java, C#, Ruby | In-memory configuration loaded at startup | Evaluating locally with no per-request network call |
| PHP | PSR-16 cache (Redis, Memcached, APCu, filesystem) | Reading the configuration from cache between requests |
| Browser / JS, React | In-memory snapshot, updated by streamed deltas | Serving the last values received |
| Android, Swift | In-memory snapshot plus on-disk cache | Resolving to the last successful sync, even on a cold start with no network |
Server-side SDKs receive streamed delta updates and merge them into the in-memory store rather than re-fetching the whole configuration, so a brief disconnect costs you nothing once it recovers. Client SDKs work the same way for the values they hold.
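To make the idea of merging a delta concrete, here is a minimal sketch. The delta shape (`version`, `upserts`, `deletes`) and the rule of ignoring stale versions are assumptions chosen for illustration; the actual wire format is described by the Streaming Endpoint documentation, not by this example.

```python
from typing import Any, Dict


def apply_delta(store: Dict[str, Any], version: int, delta: Dict[str, Any]) -> int:
    """Merge one delta into `store` in place and return the resulting version.

    The delta layout used here is hypothetical.
    """
    if delta["version"] <= version:
        # Stale or replayed delta (for example after a reconnect): ignore it.
        return version
    store.update(delta.get("upserts", {}))
    for key in delta.get("deletes", []):
        store.pop(key, None)
    return delta["version"]


store: Dict[str, Any] = {"new-checkout": True, "banner-text": "hello"}
version = 1

delta = {
    "version": 2,
    "upserts": {"banner-text": "welcome", "max-items": 10},
    "deletes": ["new-checkout"],
}
version = apply_delta(store, version, delta)
version_after_replay = apply_delta(store, version, delta)  # replaying the same delta is a no-op
```

Only the changed keys are touched, so the rest of the in-memory configuration stays valid while updates arrive.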
Real-time updates and graceful degradation
By default, SDKs subscribe to configuration changes over Server-Sent Events (SSE), so a rollout you adjust in the dashboard takes effect across your fleet within seconds, with no redeploy and no restart. The connection is designed to degrade gracefully rather than fail:
- If the stream drops, the SDK reconnects with exponential backoff.
- After repeated failures — five attempts by default in the JavaScript, Node.js, and Ruby SDKs — the SDK falls back to polling on the poll interval (30 seconds by default) and keeps serving the last known values the entire time.
- A ping keepalive every 30 seconds holds the connection open through proxies and load balancers.
- When connectivity returns, streaming resumes automatically.
You can also disable streaming and run polling-only if your environment forbids long-lived connections (streaming: false). Either way, evaluation never blocks on the network — it always reads from the local store.
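The reconnect-then-poll behavior in the list above can be sketched as a small decision function. The five-attempt limit and 30-second poll interval come from the text; the 1-second base delay and 30-second cap on the backoff are assumptions for this example, as are the function names.

```python
from typing import List, Tuple

MAX_STREAM_ATTEMPTS = 5        # from the text: default in the JavaScript, Node.js, and Ruby SDKs
POLL_INTERVAL_SECONDS = 30.0   # from the text: default poll interval
BASE_DELAY_SECONDS = 1.0       # assumption for this sketch
MAX_DELAY_SECONDS = 30.0       # assumption for this sketch


def backoff_delay(attempt: int) -> float:
    """Delay before reconnect attempt `attempt` (1-based), doubling each time up to a cap."""
    return min(MAX_DELAY_SECONDS, BASE_DELAY_SECONDS * 2 ** (attempt - 1))


def next_step(consecutive_failures: int) -> Tuple[str, float]:
    """Return (mode, seconds to wait) after a number of consecutive stream failures."""
    if consecutive_failures >= MAX_STREAM_ATTEMPTS:
        return ("polling", POLL_INTERVAL_SECONDS)
    return ("streaming", backoff_delay(consecutive_failures))


delays: List[float] = [backoff_delay(n) for n in range(1, 6)]
after_one_failure = next_step(1)
after_four_failures = next_step(4)
after_five_failures = next_step(5)
```

Production clients commonly add random jitter to each delay so that many instances do not reconnect at the same moment; that is omitted here to keep the sketch deterministic.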
Initialization behavior
Because evaluation reads from a local copy, the only moment the network matters is the initial load. SDKs handle a slow or failing first fetch deliberately:
- Load in the background (default). Most SDKs return a client immediately and load flags in the background. Until the first load completes, evaluations return your defaults. Call waitForInitialization() (or the create() helper) if you would rather block until flags are ready.
- Start offline. The C# SDK exposes StartOffline, which lets the client start even if the first fetch fails; it serves defaults and retries in the background.
- Fail fast. The Python SDK initializes synchronously and raises InitializationError if the first fetch fails or times out, so you can decide at startup whether to proceed. After initialization, variation() still never raises.
The initial-load timeout defaults to 10 seconds (initTimeout); server SDKs that expose them default to a 5-second connect timeout and a 10-second read timeout. All are configurable. Each SDK exposes an isInitialized flag so you can gate readiness checks or health probes.
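The background-load mode can be pictured with the sketch below: the client is usable immediately, returns defaults until the first load finishes, and offers a blocking wait with a timeout. `BackgroundClient` and its methods are hypothetical stand-ins that mirror the names in the text; they are not the SDK's real classes, and a real client would also retry a failed first fetch.

```python
import threading
from typing import Any, Callable, Dict


class BackgroundClient:
    """Hypothetical client that loads flags on a background thread."""

    def __init__(self, fetch: Callable[[], Dict[str, Any]]) -> None:
        self._flags: Dict[str, Any] = {}
        self._ready = threading.Event()
        threading.Thread(target=self._load, args=(fetch,), daemon=True).start()

    def _load(self, fetch: Callable[[], Dict[str, Any]]) -> None:
        try:
            flags = fetch()
        except Exception:
            return  # stay uninitialized; defaults keep being served
        self._flags = flags
        self._ready.set()

    @property
    def is_initialized(self) -> bool:
        # Suitable for a readiness check or health probe.
        return self._ready.is_set()

    def wait_for_initialization(self, timeout: float = 10.0) -> bool:
        # Block until flags are loaded or the timeout elapses.
        return self._ready.wait(timeout)

    def variation(self, key: str, default: Any) -> Any:
        if not self._ready.is_set():
            return default
        return self._flags.get(key, default)


gate = threading.Event()  # lets the example control when the "network" responds


def slow_fetch() -> Dict[str, Any]:
    gate.wait(5.0)
    return {"new-checkout": True}


client = BackgroundClient(slow_fetch)
before = client.variation("new-checkout", False)  # first load still pending: default
gate.set()                                         # the fetch may now complete
ready = client.wait_for_initialization(timeout=5.0)
after = client.variation("new-checkout", False)   # loaded value
```

Whether to block on the wait or proceed with defaults is a startup policy decision; the evaluation call behaves the same either way.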
Misconfiguration and guardrails
Resilience is not only about outages. Featureflip is built so that a configuration mistake fails toward safety, and so that mistakes are reviewable after the fact:
- A disabled flag serves its off variation to everyone with no targeting evaluated — the fastest, most predictable way to shut a feature off. This is the basis of the kill switch pattern.
- Separate SDK keys per environment limit the blast radius of a leaked key, and client keys only ever expose flags marked Client-side visible. See client-side vs server-side.
- Flags are not an authorization layer. A flag controls whether a feature is visible or active; it does not gate data or API access. Keep server-side permission checks independent of flag state. See Are feature flags a security risk?
- Every change is recorded. Who toggled a flag, changed a rule, or rotated a key, with before and after values, is captured automatically, so a misconfiguration has an answer to “what changed, and when?” See the Audit Log.
- Finished flags surface on their own. Stale flag detection flags conditionals that no longer make a real decision, so dead branches do not linger as hidden risk.
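The kill-switch behavior from the first item in the list above (a disabled flag serves its off variation with no targeting evaluated) can be sketched as follows. The `Flag` fields and the rule representation are assumptions for illustration and do not reflect Featureflip's internal data model.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Rule = Tuple[Callable[[Context], bool], Any]  # (predicate, variation to serve)


@dataclass
class Flag:
    """Hypothetical flag definition used only in this sketch."""
    enabled: bool
    off_variation: Any
    fallthrough_variation: Any
    rules: List[Rule] = field(default_factory=list)


def evaluate(flag: Flag, context: Context) -> Any:
    if not flag.enabled:
        # Kill switch: short-circuit before any targeting rule runs.
        return flag.off_variation
    for matches, variation in flag.rules:
        if matches(context):
            return variation
    # Enabled, but no rule matched.
    return flag.fallthrough_variation


rule_calls: List[str] = []


def is_beta(context: Context) -> bool:
    rule_calls.append(context["user_id"])  # record that targeting was evaluated
    return context.get("plan") == "beta"


flag = Flag(enabled=True, off_variation=False, fallthrough_variation=False,
            rules=[(is_beta, True)])
beta_user = {"user_id": "u-123", "plan": "beta"}

while_enabled = evaluate(flag, beta_user)
flag.enabled = False
rule_calls.clear()
while_disabled = evaluate(flag, beta_user)
```

After the flag is disabled, `rule_calls` stays empty, which is what makes the off state predictable: the result no longer depends on any rule or context.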
Next steps
- SDK Overview: server-side vs client-side, and SDK key security
- Audit Log: who changed what, and when
- Streaming Endpoint: the SSE contract and reconnection notes
- What Are Feature Flags?: off vs default variations and the evaluation flow
Frequently Asked Questions
What happens if the Featureflip service is unavailable?
Your application keeps serving flags. Featureflip SDKs do not call the service on every flag check. Server-side SDKs download the flag configuration once at startup and evaluate locally from an in-memory copy, so an outage after startup has no effect on evaluation. Client-side SDKs hold the last set of values they received, and the Android and Swift SDKs persist the last successful snapshot to disk so flags still resolve on a cold start with no network. If a value genuinely cannot be resolved, the SDK returns the default you passed to the evaluation call. The connection retries in the background and recovers on its own once the service is reachable again.
Does Featureflip cache feature flag values?
Yes. Server-side SDKs (Node.js, Python, Go, Java, C#, Ruby) keep the full flag configuration in an in-memory store and evaluate against it with no per-request network call. The PHP SDK is cache-first: it writes the configuration through a PSR-16 cache (Redis, Memcached, APCu, or filesystem) and reads from it between requests. Client-side SDKs keep an in-memory snapshot, and the Android and Swift SDKs also persist it to disk so the last known values are available offline and on the next cold start.
What value does a feature flag return if evaluation fails?
Every variation method takes an explicit default value, and that default is what you get if the flag is missing, the value has an unexpected type, the client has not finished loading, or any error occurs. Evaluation is designed not to throw from your hot path: the Python and PHP SDKs are documented to never raise from a variation call, and the other SDKs return the default rather than surfacing an exception. Because the default is required at every call site, the safe fallback behavior is whatever you chose locally, not a remote decision.
Can I use Featureflip server-side only?
Yes. Server-side SDKs use a server SDK key that stays on your infrastructure and is never exposed to users, and they can see every flag in the project. You do not need to ship any client-side SDK or expose flags to the browser. If you do want client-side evaluation, client SDK keys are public by design and only return flags explicitly marked Client-side visible, so your targeting rules never reach the browser. Server keys and client keys are separate and managed per environment.
How does Featureflip handle real-time updates if the streaming connection drops?
SDKs subscribe to flag changes over Server-Sent Events and apply updates within seconds. If the stream drops, the SDK reconnects with exponential backoff. After repeated failures (five attempts by default in the JavaScript, Node.js, and Ruby SDKs) it falls back to polling on the poll interval (30 seconds by default) and keeps serving the last known values the whole time. When connectivity returns, streaming resumes automatically. Throughout, evaluation continues against the in-memory configuration, so a dropped connection never blocks a flag check.