What Is DNS Failover? How It Keeps Sites Online

How DNS failover works, how health checks trigger automatic record updates, TTL considerations, provider options, and when to use DNS-based failover vs alternatives.

When a server goes down, users get errors. DNS failover is a technique that automatically redirects traffic away from a failed server by changing DNS records in response to health check failures. Instead of users seeing an error page while your team scrambles to fix the problem, DNS failover points them to a healthy backup server. It is not instantaneous and it has limitations, but for many setups it is the simplest way to add redundancy. For a broader understanding of DNS, see our DNS guide.

How DNS Failover Works

DNS failover combines two components: health checks and automatic record updates.

Health checks are probes sent to your servers at regular intervals (typically every 30 to 60 seconds). The health check system probes your server over HTTP, HTTPS, TCP, or ICMP (ping) and verifies that it responds correctly. A simple HTTP health check might request a specific URL and confirm it returns a 200 status code. More sophisticated checks can verify response content, measure response time, or check multiple endpoints.
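As a rough illustration of an HTTP check, here is a minimal sketch using only the Python standard library. The function names, the 2-second latency limit, and the optional expected text are assumptions made for this example, not any provider's actual implementation.

```python
import http.client
import time
import urllib.request


def evaluate_response(status, body, elapsed_s, expect_text=None, max_elapsed_s=2.0):
    """Return True if a single probe result counts as healthy."""
    if status != 200:
        return False                      # wrong status code
    if elapsed_s > max_elapsed_s:
        return False                      # responded, but too slowly
    if expect_text is not None and expect_text not in body:
        return False                      # responded, but with unexpected content
    return True


def probe(url, timeout_s=5.0, **criteria):
    """Send one HTTP(S) GET and judge the result. Network errors count as unhealthy."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, DNS errors, and 4xx/5xx responses
        # (urllib raises HTTPError, an OSError subclass, for those).
        return False
    return evaluate_response(status, body, time.monotonic() - start, **criteria)
```

A call such as `probe("https://www.example.com/healthz", expect_text="ok")` would then return True or False for one check; the URL is a placeholder.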

Automatic record updates happen when a health check detects that a server is down. The DNS provider removes the failed server's IP from the DNS response (or replaces it with a backup IP). When users query your domain, they get the IP of a healthy server instead of the failed one.

The process works like this:

  1. You configure your primary server IP and one or more backup server IPs with your DNS provider.
  2. The provider runs health checks against all servers continuously.
  3. Under normal conditions, DNS queries return your primary server's IP.
  4. When the primary fails health checks (typically 2-3 consecutive failures to avoid false positives), the provider updates the DNS response to return the backup server's IP.
  5. When the primary recovers and passes health checks again, the provider switches DNS responses back.
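The steps above can be sketched as a small state machine. This is an illustration only: the class name is hypothetical, the IPs are documentation addresses, and the recovery threshold (how many passing checks before switching back) is an assumption, since providers differ on how they handle failback.

```python
from dataclasses import dataclass


@dataclass
class FailoverState:
    primary_ip: str
    backup_ip: str
    failure_threshold: int = 3    # consecutive failures before failing over
    recovery_threshold: int = 2   # consecutive passes before failing back (assumed)
    consecutive_failures: int = 0
    consecutive_successes: int = 0
    on_backup: bool = False

    def record_check(self, primary_healthy: bool) -> str:
        """Feed in one health check result; return the IP DNS should answer with."""
        if primary_healthy:
            self.consecutive_failures = 0
            self.consecutive_successes += 1
            if self.on_backup and self.consecutive_successes >= self.recovery_threshold:
                self.on_backup = False    # primary has recovered: switch back
        else:
            self.consecutive_successes = 0
            self.consecutive_failures += 1
            if not self.on_backup and self.consecutive_failures >= self.failure_threshold:
                self.on_backup = True     # failure confirmed: switch to backup
        return self.backup_ip if self.on_backup else self.primary_ip


state = FailoverState("192.0.2.10", "198.51.100.20")
for result in (True, False, False, False, True, True):
    print(result, "->", state.record_check(result))
```

Requiring several consecutive failures is what keeps a single dropped probe from triggering a switch.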

TTL and the Failover Delay

DNS failover is not instant. This is its most important limitation, and understanding why requires understanding how DNS caching and TTL work.

When a resolver gets a DNS response, it caches it for the duration specified by the TTL (Time to Live) value. If your TTL is 3600 seconds (one hour), resolvers will use the cached response for up to an hour without querying your authoritative server again. If your server fails and DNS failover triggers a record change, resolvers that have the old record cached will continue sending users to the failed server until the cached record expires.

This means your effective failover time is:

Failover time = detection time (check interval × failures required to confirm the outage) + TTL (how long until cached records expire)

With a 30-second health check interval, 3 failures needed for confirmation, and a 300-second TTL, your worst-case failover time is about 90 + 300 = 390 seconds (6.5 minutes). With a 3600-second TTL, it could be over an hour.
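The same arithmetic can be written as a short helper for comparing TTL choices. The function name is made up for this example, and it models only the simple worst case described here, ignoring resolvers that clamp TTLs and any client-side caching.

```python
def worst_case_failover_seconds(check_interval_s, failure_threshold, ttl_s):
    """Worst-case time until clients stop reaching the failed server."""
    detection_s = check_interval_s * failure_threshold  # time to confirm the outage
    return detection_s + ttl_s                          # plus time for caches to expire


for ttl in (60, 300, 3600):
    total = worst_case_failover_seconds(30, 3, ttl)
    print(f"TTL {ttl}s -> worst case {total}s ({total / 60:.1f} min)")
```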

To minimize failover time, you need a low TTL. Most DNS failover configurations use TTLs of 30 to 300 seconds. But low TTLs have their own costs: more DNS queries hit your authoritative servers, resolution is slightly slower for users (more cache misses), and not all resolvers respect very low TTLs.

Some resolvers ignore low TTLs

A small number of ISP resolvers enforce minimum TTL values regardless of what your authoritative server specifies. This is uncommon but means a handful of users may experience longer failover times than expected. There is nothing you can do about this on your end.

DNS Failover vs Load Balancing

DNS failover and DNS load balancing are related but serve different purposes.

DNS load balancing distributes traffic across multiple healthy servers by returning different IPs to different queries (round-robin) or using weighted responses. All servers are active and serving traffic simultaneously.

DNS failover keeps backup servers on standby. Under normal conditions, only the primary server receives traffic. Backup servers only receive traffic when the primary fails.

Many DNS providers combine both. You can have multiple active servers with load balancing during normal operation, with failover removing any server that fails health checks from the rotation.

Aspect           | DNS Failover                      | DNS Load Balancing
Active servers   | Primary only (backup on standby)  | All servers active
Purpose          | High availability                 | Traffic distribution
Health checks    | Required                          | Recommended but optional
Backup resources | Standby (potentially lower cost)  | All provisioned and running
Failover speed   | TTL-dependent                     | N/A (already distributed)
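To make the combined mode concrete, here is a minimal sketch of round-robin answers that skip servers failing their health checks. The function name and the fall-back behavior when every server is unhealthy are assumptions for illustration; real providers make their own choice in that situation.

```python
def pick_server(servers, health, query_number):
    """Return the IP for the Nth query: round-robin over servers passing health checks."""
    healthy = [ip for ip in servers if health.get(ip, False)]
    if not healthy:
        # Assumption: if nothing is healthy, answer from the full list
        # rather than returning no address at all.
        healthy = list(servers)
    return healthy[query_number % len(healthy)]


servers = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
health = {"192.0.2.10": True, "192.0.2.11": False, "192.0.2.12": True}
print([pick_server(servers, health, n) for n in range(4)])
```

With the middle server marked unhealthy, queries alternate between the two remaining addresses.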

Providers That Offer DNS Failover

Several DNS providers include failover as a feature, either in their standard plans or as an add-on.

AWS Route 53

Route 53 offers health checks and failover routing policies as core features. You can configure primary and secondary records for any record type. Health checks can monitor HTTP/HTTPS endpoints, TCP connections, or even CloudWatch alarms. Route 53 charges per health check ($0.50-$0.75/month each) plus standard DNS query fees.
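For a sense of what a primary/secondary pair looks like in the Route 53 API, the sketch below builds the change batch that the `ChangeResourceRecordSets` operation accepts. The helper name, record name, IPs, hosted zone ID, and health check ID are all placeholders for illustration; applying the batch needs the boto3 package and AWS credentials, so that call is left as a comment.

```python
def failover_change_batch(name, primary_ip, secondary_ip, primary_health_check_id, ttl=60):
    """Build a ChangeBatch with one PRIMARY and one SECONDARY failover A record."""

    def change(role, ip, health_check_id=None):
        rrset = {
            "Name": name,
            "Type": "A",
            "SetIdentifier": f"{role.lower()}-a",  # must differ between the two records
            "Failover": role,                      # "PRIMARY" or "SECONDARY"
            "TTL": ttl,
            "ResourceRecords": [{"Value": ip}],
        }
        if health_check_id:
            rrset["HealthCheckId"] = health_check_id
        return {"Action": "UPSERT", "ResourceRecordSet": rrset}

    return {
        "Comment": "Failover pair (illustrative)",
        "Changes": [
            change("PRIMARY", primary_ip, primary_health_check_id),
            change("SECONDARY", secondary_ip),
        ],
    }


batch = failover_change_batch(
    "www.example.com.", "192.0.2.10", "198.51.100.20", "example-health-check-id"
)

# To apply it (placeholder zone ID):
# import boto3
# boto3.client("route53").change_resource_record_sets(
#     HostedZoneId="Z_EXAMPLE", ChangeBatch=batch
# )
```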

Cloudflare

Cloudflare's load balancing product includes health checks and failover. It supports multiple origin pools with configurable failover priority. Health checks run from multiple Cloudflare data centers globally, reducing the chance of false positives from network issues. Pricing starts at $5/month for the load balancing add-on.

NS1

NS1 provides advanced DNS traffic management with health checks, failover, and sophisticated routing policies. Their Filter Chain system allows complex failover logic including geographic failover and performance-based routing.

DNSMadeEasy

DNSMadeEasy was one of the first providers to offer DNS failover as a standalone feature. They provide health checks from multiple monitoring locations with configurable failover records. It is included in their business plans.

For a broader comparison of DNS providers, see our public DNS providers guide.

Setting Up DNS Failover

The exact steps vary by provider, but the general process is consistent.

1. Provision your backup infrastructure

Before configuring failover, you need somewhere to fail over to. This could be a server in a different data center, a different cloud region, a static "maintenance" page, or a CDN origin. The backup needs to be capable of serving your application or at least a useful maintenance page.

2. Lower your TTL in advance

If your current TTL is high, lower it well before you need failover. Set it to 60-300 seconds. Wait at least as long as your old TTL before relying on the new value, so all caches expire the old record.

3. Configure health checks

Set up health checks for your primary server. Use HTTPS if possible and check a specific endpoint that exercises your application (not just a static page). Configure the check interval, failure threshold (how many failures before triggering failover), and monitoring locations.
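An endpoint that exercises the application could look roughly like the sketch below, which returns 200 only when every dependency check passes. The `/healthz` path, the function names, and the idea of passing in named check callables are assumptions for this example.

```python
import json
from http.server import BaseHTTPRequestHandler


def health_report(checks):
    """Run named dependency checks; return (http_status, json_body)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False          # a check that raises counts as failed
    status = 200 if all(results.values()) else 503
    return status, json.dumps({"healthy": status == 200, "checks": results})


def make_handler(checks):
    """Build a request handler class that serves the report at /healthz."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/healthz":
                self.send_error(404)
                return
            status, body = health_report(checks)
            payload = body.encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return HealthHandler
```

The handler can be served with `http.server.HTTPServer(("", 8080), make_handler(checks))`, where each entry in `checks` is a callable such as a database ping. Keep the checks cheap, since the endpoint is hit on every probe.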

4. Configure failover records

Set your primary DNS record and one or more failover records. Specify the priority order so the system knows which backup to use first.

5. Test the failover

Simulate a failure by temporarily blocking health check requests to your primary server or by pointing the health check at a non-existent endpoint. Verify that DNS responses switch to your backup server. Then restore the primary and verify that DNS switches back. Document the observed failover time.
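One way to record the observed failover time is to poll DNS from a client machine until the answer changes. The sketch below assumes a single backup IP and uses the operating system's resolver through `socket.gethostbyname`, so the result reflects local caching as well; the function name and defaults are illustrative.

```python
import socket
import time


def watch_for_switch(hostname, backup_ip, resolve=socket.gethostbyname,
                     poll_interval_s=5.0, timeout_s=900.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll until `hostname` resolves to `backup_ip`.

    Returns the seconds elapsed, or None if the switch is not seen before the timeout.
    """
    start = clock()
    while clock() - start <= timeout_s:
        try:
            current = resolve(hostname)
        except OSError:
            current = None                 # resolution failed: keep polling
        if current == backup_ip:
            return clock() - start
        sleep(poll_interval_s)
    return None
```

Start it at the moment you simulate the failure, for example `watch_for_switch("www.example.com", "198.51.100.20")`, and run it again with the primary IP to time the switch back.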

Limitations of DNS Failover

DNS failover is useful, but it is not a complete high-availability solution.

TTL delay. As discussed above, failover is not instant. There will always be some period where cached records point to the failed server. For applications that require near-instant failover, DNS-based solutions are insufficient.

No connection draining. DNS failover doesn't gracefully handle in-progress connections. Users who are mid-session when failover occurs may lose their session state unless your application handles this at the application layer.

Health check limitations. Health checks only verify what they are configured to check. If your health check endpoint returns 200 but the application is actually broken in some other way, failover won't trigger.

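Client-side caching. Some applications and operating systems cache DNS responses independently of the TTL. Java applications, in particular, have historically cached DNS lookups aggressively: the JVM keeps successful lookups for the period set by the networkaddress.cache.ttl security property, and by default caches them forever when a security manager is installed.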

Not all records fail over equally. DNS failover works well for A and AAAA records (web traffic). Failing over MX records (email) is more complex because email delivery involves queuing and retry logic that interacts with DNS TTLs in different ways.

Alternatives to DNS Failover

When DNS failover's limitations are too significant, several alternatives exist.

Anycast routing allows multiple servers in different locations to share the same IP address. Traffic is routed to the nearest healthy server by the network itself. If one server goes down, traffic automatically flows to the next nearest. There is no TTL delay because the IP address doesn't change. Cloudflare, Google, and major CDNs use anycast extensively.

CDN failover is handled at the CDN layer. If your application is behind a CDN, the CDN can detect origin failures and serve cached content or route to a backup origin. This is faster than DNS failover because the CDN's IP (which is what DNS resolves to) stays the same.

Load balancer failover uses hardware or software load balancers (like AWS ALB/NLB, HAProxy, or NGINX) that sit in front of your servers. The load balancer handles health checks and routing at the TCP/HTTP layer with sub-second failover. DNS points to the load balancer, which is itself made highly available.

Global server load balancing (GSLB) combines DNS-based routing with health checks and geographic awareness. It is essentially a more sophisticated version of DNS failover that considers latency, geography, and server load when making routing decisions.

When DNS Failover Makes Sense

DNS failover is a good fit when:

  • You have a primary/secondary architecture where the secondary is a warm standby.
  • Your availability requirements tolerate 1-5 minutes of downtime during failover.
  • You want a simple, provider-managed solution without deploying your own load balancers.
  • You need geographic failover between regions that can't share a load balancer.
  • You are adding redundancy to an existing setup with minimal changes.

DNS failover is not ideal when you need sub-second failover, when you need to maintain session state during failover, or when your traffic volumes justify a dedicated load balancing infrastructure.

Regardless of your failover strategy, monitoring your DNS records is essential. If failover triggers and your DNS records change, you want to know about it immediately. Use a multi-provider DNS strategy for additional resilience at the DNS layer itself. For broader availability strategies, see our sibling guide on high availability.
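As a minimal sketch of record monitoring, the function below compares two snapshots of resolved addresses and reports which hostnames changed. The snapshot format (a dict of hostname to a set of IPs) and the function name are assumptions for this example.

```python
def diff_records(previous, current):
    """Compare two {hostname: set_of_ips} snapshots.

    Returns {hostname: (added_ips, removed_ips)} for every hostname that changed.
    """
    changes = {}
    for host in previous.keys() | current.keys():
        before = set(previous.get(host, ()))
        after = set(current.get(host, ()))
        if before != after:
            changes[host] = (after - before, before - after)
    return changes


before = {"www.example.com": {"192.0.2.10"}}
after = {"www.example.com": {"198.51.100.20"}}
print(diff_records(before, after))
```

Snapshots could be collected on a schedule with `socket.gethostbyname_ex(host)[2]`, which returns the list of IPv4 addresses the local resolver sees for a host.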
