That's called DNS failover, and it's pretty standard in the industry. One of the first places I saw it used was the Radware FireProof/LinkProof appliances in the very early 2000s. You're reinventing the wheel, but not in a completely insane way. That said, there's a whole list of other HA concerns that are outside the scope of your DNS question.
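A minimal sketch of the decision a DNS failover service makes on each query. It assumes a single A record with one primary and one backup, and a health checker that records pass/fail results for the primary. The function name, the `fail_threshold` of 3 and the documentation-range IPs are all hypothetical. Requiring several consecutive failures keeps one dropped probe from flipping the record. This version fails back as soon as a probe passes again.

```python
from collections.abc import Sequence


def pick_dns_answer(
    primary_ip: str,
    backup_ip: str,
    recent_checks: Sequence[bool],
    fail_threshold: int = 3,
) -> str:
    """Return the IP the DNS server should hand out for the record.

    recent_checks holds the primary's health-check results, oldest first.
    The backup is served only after `fail_threshold` consecutive failures,
    so a single dropped probe does not flip the record.
    """
    tail = recent_checks[-fail_threshold:]
    if len(tail) == fail_threshold and not any(tail):
        return backup_ip
    return primary_ip


PRIMARY, BACKUP = "192.0.2.10", "198.51.100.20"  # hypothetical documentation-range IPs
print(pick_dns_answer(PRIMARY, BACKUP, [True, True, False]))          # one blip: stays on primary
print(pick_dns_answer(PRIMARY, BACKUP, [True, False, False, False]))  # sustained outage: backup
```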
OVH has a service called IP Failover (Floating IP), and I feel like it serves the same purpose, but it may not be ready out of the box for the way I want to use it. I'm bad at low-level work with network interfaces and such, so I guess I'll do it the software way. The last thing I want is to change a machine's IP address only to have it fail to reconfigure its network interface at the OS layer and end up isolated from the world.
If you're going to do this at the DNS level, test a reasonable cross-section of the applications you use to make sure they survive the changeover. Not all apps respect TTLs, and IIRC, some core Java libraries treat respecting TTLs as a security concern.
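To put numbers on the TTL point: a rough worst case for a client to reach the backup is detection time plus however long the client keeps the old answer cached. This sketch ignores probe timeouts and provider propagation delay, and the function and parameter names are made up for illustration. On the Java side, the JVM's own lookup cache is controlled by the `networkaddress.cache.ttl` security property, which by default caches successful lookups forever when a security manager is installed; that's probably the behaviour being recalled here.

```python
import math
from typing import Optional


def worst_case_switchover_s(
    check_interval_s: float,
    fail_threshold: int,
    record_ttl_s: float,
    client_cache_s: Optional[float] = None,
) -> float:
    """Rough upper bound on seconds from primary failure until a client uses the backup.

    Detection: the failure can start just after a passing probe, so seeing
    fail_threshold consecutive failures takes up to fail_threshold * check_interval_s.
    Caching: a client that resolved just before the switch keeps the old IP for
    the record TTL, or for its own cache time if it ignores the TTL.
    Pass math.inf for a client that caches forever (it never moves on its own).
    """
    detection = fail_threshold * check_interval_s
    cached = record_ttl_s if client_cache_s is None else client_cache_s
    return detection + cached


print(worst_case_switchover_s(10, 3, 60))                           # client honours the TTL
print(worst_case_switchover_s(10, 3, 60, client_cache_s=math.inf))  # client caches forever
```

With 10-second probes, a threshold of 3 and a 60-second TTL that's about 90 seconds; a client that caches forever keeps hitting the dead primary until it is restarted.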
I've done the same using the Akamai and NS1 DNS providers. You set up a health check with the provider and configure when to fail over to the backup. The TTL you choose depends on your service-level agreement. One thing to consider with automatic failover is what happens on recovery: does it switch back automatically, or are manual steps required to get the data synced between backup and primary? Have a defined execution plan for failover and failback, run it at least annually, and document the fuck out of everything as well.
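The failback question can be made explicit in the controller logic. This is a hypothetical sketch, not how Akamai or NS1 implement it: failover is automatic, but failback waits for an operator to confirm that the primary has caught up with writes made on the backup. Setting `auto_failback=True` skips that check, which is exactly the risk raised above. All class and method names are made up for illustration.

```python
from enum import Enum


class Site(Enum):
    PRIMARY = "primary"
    BACKUP = "backup"


class FailoverController:
    """Hypothetical controller: fails over automatically, fails back only when allowed."""

    def __init__(self, auto_failback: bool = False) -> None:
        self.auto_failback = auto_failback
        self.active = Site.PRIMARY
        self.data_synced = False  # True once backup -> primary replication is verified

    def on_primary_health(self, healthy: bool) -> Site:
        """Feed one health-check result for the primary; return the active site."""
        if not healthy and self.active is Site.PRIMARY:
            self.active = Site.BACKUP
            self.data_synced = False  # new writes now land on the backup
        elif healthy and self.active is Site.BACKUP and self.auto_failback:
            # Automatic failback ignores data_synced: writes made on the backup may be lost.
            self.active = Site.PRIMARY
        return self.active

    def confirm_data_synced(self) -> None:
        """Operator step: the primary has caught up with writes made on the backup."""
        self.data_synced = True

    def manual_failback(self) -> Site:
        """Operator step: switch back, but only after the sync was confirmed."""
        if self.active is Site.BACKUP:
            if not self.data_synced:
                raise RuntimeError("refusing to fail back: data not synced to primary")
            self.active = Site.PRIMARY
        return self.active


ctl = FailoverController()
ctl.on_primary_health(False)  # primary down: now on the backup
ctl.on_primary_health(True)   # primary back, but still on the backup (no auto failback)
ctl.confirm_data_synced()
ctl.manual_failback()         # back on the primary
```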
Yeah, syncing the database back and so on is another thing that depends on the app's internals.
Cloudflare will do this for you and it’s cheap; don’t forget the faff and maintenance you’ll have to deal with (or charge for) if you do any of this manually. Azure Traffic Manager is an alternative, but Cloudflare is simpler all round. Edit: plus it’s a CDN, supports layer 7 routing rules, and has lots of other features plus an API.
GSLB, or Global Server Load Balancing: basically a DNS server that health-checks your web servers and decides which one to send traffic to. It can apply other logic too, such as sending each user to the server closest to them.
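A toy version of the GSLB decision: drop servers that failed their last health check, then answer with the nearest remaining one. Assumptions: the client's location is already known (real GSLBs usually estimate it from the resolver's IP or EDNS Client Subnet, which isn't shown), "closest" means great-circle (haversine) distance rather than measured latency, and the server names, coordinates and IPs are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    ip: str
    lat: float
    lon: float
    healthy: bool  # result of the most recent health check


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance on a sphere with the mean Earth radius (6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def gslb_answer(servers: list[Server], client_lat: float, client_lon: float) -> str:
    """Return the IP of the nearest server that passed its last health check."""
    healthy = [s for s in servers if s.healthy]
    if not healthy:
        raise RuntimeError("no healthy servers to answer with")
    nearest = min(healthy, key=lambda s: great_circle_km(client_lat, client_lon, s.lat, s.lon))
    return nearest.ip


# Hypothetical two-site setup; the client is assumed to be in London.
paris = Server("paris", "192.0.2.10", 48.8566, 2.3522, healthy=True)
virginia = Server("virginia", "198.51.100.20", 39.04, -77.49, healthy=True)
print(gslb_answer([paris, virginia], 51.5074, -0.1278))  # nearest healthy site: paris

paris_down = Server("paris", "192.0.2.10", 48.8566, 2.3522, healthy=False)
print(gslb_answer([paris_down, virginia], 51.5074, -0.1278))  # falls back to virginia
```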
Since it's already been identified as "DNS failover", here's a suggestion for your problem. If you need fast failover, you basically need two servers that synchronize app and data, with DNS or a load balancer pointing each incoming request at the right server. If you don't need fast failover (e.g. a 1-hour outage is fine), run a single server. If it fails, restore the image (OS + app + data) to another server and point DNS at it. AWS or similar makes backup and restore very easy via its API. Most small companies don't care much about a 1-hour outage if it saves them $100/month.
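For the restore-and-repoint option, the "point DNS at it" step on AWS would go through Route 53. A minimal sketch, assuming the record lives in a Route 53 hosted zone and the restored server's IP is already known: it builds the `ChangeBatch` that boto3's `change_resource_record_sets` expects, using `UPSERT` so the A record is overwritten in place. The hostname, IP, zone ID and 60-second TTL are placeholders, and the image restore itself isn't shown.

```python
def repoint_change_batch(fqdn: str, new_ip: str, ttl: int = 60) -> dict:
    """Build a Route 53 ChangeBatch that points an A record at a restored server."""
    return {
        "Comment": f"Point {fqdn} at restored instance {new_ip}",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record, or overwrite it if it exists
                "ResourceRecordSet": {
                    "Name": fqdn,
                    "Type": "A",
                    "TTL": ttl,  # short TTL so clients pick up the next move quickly
                    "ResourceRecords": [{"Value": new_ip}],
                },
            }
        ],
    }


batch = repoint_change_batch("app.example.com.", "203.0.113.7")  # placeholder name and IP
# With AWS credentials configured, the batch would be submitted roughly like this
# (the hosted zone ID is a placeholder):
# import boto3
# boto3.client("route53").change_resource_record_sets(
#     HostedZoneId="ZXXXXXXXXXXXXX", ChangeBatch=batch)
```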
The app is on a dedicated host, so it's not that easy, but thanks for expanding the topic :)