How to protect against datacenter failure

If any of you are on Fremont, you know what I mean. I love Linode, but I'm seeing far more downtime and want to spread that risk across two Linodes in different datacenters. The tutorials currently on the site only seem to explain how to build highly available sites in the SAME datacenter, and a NodeBalancer doesn't seem to be a complete solution (it only balances the web servers, not MySQL).

Does anyone know what I can do to protect myself against future failures? I simply can't afford for this to happen.

6 Replies

IMO, keeping a hot, replicated copy in another datacenter that can be switched to immediately is quite a bit of work. It's possible for sure, but it's not something you can set up and forget about, and it's going to have a performance impact, etc. (I'd love somebody to prove me wrong on this BTW :) )

Maybe just move datacenters? I'm planning to move my servers out of Fremont this weekend. OFC you will still get the occasional datacenter failure, but it's going to be much rarer.

There are a few common solutions for highly available setups.

Probably the cheapest option is to use a simple round-robin DNS setup to serve your site from multiple data centers. The drawback here is that if one of your nodes goes down, you have to update DNS manually and this update may take a while to propagate. Some of your users will see downtime as a result.
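To make the round-robin idea concrete, here is a toy model in Python (not how any real DNS server is implemented): the server hands out the same set of A records on every query, rotated by one position each time, so clients that use the first address end up spread across data centers. The addresses are placeholders from the documentation ranges. Note that a dead node stays in the rotation until someone edits the list by hand, which is the manual step described above.

```python
class RoundRobinResolver:
    """Toy round-robin DNS: every answer contains all addresses, rotated by one."""

    def __init__(self, addresses):
        self.addresses = list(addresses)
        self.offset = 0

    def query(self):
        n = len(self.addresses)
        # Rotate the full record set so each query starts at the next address.
        answer = [self.addresses[(self.offset + i) % n] for i in range(n)]
        self.offset = (self.offset + 1) % n
        return answer


# Hypothetical: one node per data center, documentation-range IPs.
resolver = RoundRobinResolver(["192.0.2.10", "198.51.100.10", "203.0.113.10"])
for _ in range(3):
    print(resolver.query())
```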

One step up from basic round-robin DNS is a third-party DNS service that does load balancing and health monitoring. UltraDNS is one I know about, though I have not used it. These services monitor the availability of each node and reconfigure the DNS records (which have a TTL of a minute or less) to reduce downtime.
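Roughly speaking, what such a service adds is a filter between its health checks and the answers it publishes. The sketch below is an assumption about the general idea, not UltraDNS's actual behaviour: publish only the addresses whose checks pass, with a short TTL so caches forget the dead one quickly, and (a design choice of this sketch) fall back to publishing everything if every check fails rather than returning an empty answer.

```python
from typing import Callable, Iterable, List, Tuple

TTL_SECONDS = 60  # "a minute or less", per the post above


def healthy_answer(records: Iterable[str],
                   is_healthy: Callable[[str], bool]) -> Tuple[List[str], int]:
    """Return (addresses to publish, TTL) given a health-check function."""
    records = list(records)
    healthy = [addr for addr in records if is_healthy(addr)]
    # If everything looks down, the checker itself may be the broken part,
    # so keep publishing all records instead of an empty answer.
    return (healthy if healthy else records), TTL_SECONDS


down = {"198.51.100.10"}  # hypothetical failed node
print(healthy_answer(["192.0.2.10", "198.51.100.10"], lambda a: a not in down))
```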

From there things get really expensive. Hardware products like BIG-IP are what the big dogs use, but they aren't applicable to Linode setups.

That covers the external stuff. For the internal side, you need to think about how to replicate configuration and data between nodes. These options are covered pretty well in the Linode Library:

http://library.linode.com/linux-ha

For your setup, you'll need to consider Apache and MySQL. You can ignore the parts about IP failover if you're sticking with one node per data center.

To tie it all together for the highest availability, I would recommend combining at least two nodes behind a NodeBalancer in each of at least two data centers. Then you stick UltraDNS in front of your NodeBalancers. This covers individual node downtime, host downtime, and a full data center outage.

Hope this helps!

@dwc:

Probably the cheapest option is to use a simple round-robin DNS setup to serve your site from multiple data centers. The drawback here is that if one of your nodes goes down, you have to update DNS manually and this update may take a while to propagate. Some of your users will see downtime as a result.

It's important to note that the propagation delay applies to any DNS-based solution (UltraDNS included), and that setting a low TTL can minimize its impact (at the expense of higher load on the DNS server, and slightly longer page load times for users, since their DNS answers are cached for less time and have to be looked up more often). It's also important to note that manually triggering failover is probably the only reliable approach (unless you get really fancy and complicated), because otherwise whatever is monitoring for downtime becomes a single point of failure. UltraDNS' solution isn't really any better than round-robin, except that it automates the failover (and potentially introduces another single point of failure). That's not to say the convenience might not be worth it.
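One rough way to see why the TTL matters: in a simplified model, the longest a client can keep using a dead address is the time it takes to notice the outage, plus the time to change the record, plus up to one full TTL while cached answers expire. The function and the numbers below are illustrative assumptions, and the model ignores resolvers that don't honour TTLs.

```python
def worst_case_stale_minutes(detect_min: float, update_min: float,
                             ttl_seconds: int) -> float:
    """Upper bound on how long clients may still be sent to a dead node."""
    # Notice the outage + edit the record + wait out one cached TTL.
    return detect_min + update_min + ttl_seconds / 60


# Hypothetical: 10 minutes to notice, 2 minutes to edit the zone.
for ttl in (3600, 300, 60):
    print(ttl, worst_case_stale_minutes(10, 2, ttl))
```

With a one-hour TTL the cache dominates the window; once the TTL is down to a minute, detection time dominates, which is where automated monitoring helps and where its own reliability starts to matter.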

The good thing about the round-robin approach is that, during the time between one server going down and you taking it out of the rotation, you're not losing all your traffic, only a portion of it. If you have a Linode in each of the six datacenters and it takes 15 minutes from the server going down to your DNS changes propagating, you're only losing about 17% (one sixth) of your traffic for those 15 minutes.
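The 17% figure is just one node out of six equally weighted records. As a quick sketch, assuming clients don't retry and requests spread evenly across the records (the request rate is a made-up number):

```python
def fraction_lost(total_nodes: int, down_nodes: int = 1) -> float:
    """Share of visitors sent to a dead node with equally weighted A records."""
    return down_nodes / total_nodes


def requests_lost(rate_per_min: float, minutes: float,
                  total_nodes: int, down_nodes: int = 1) -> float:
    """Requests that hit the dead node before it is taken out of rotation."""
    return rate_per_min * minutes * down_nodes / total_nodes


print(f"{fraction_lost(6):.1%}")   # 16.7%
print(requests_lost(600, 15, 6))   # 1500.0 (hypothetical 600 requests/minute)
```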

Probably the best way (in my opinion) is to set up a cron job that dumps your database, rsyncs the dump and your site files to the other server, and then has the other server import the dump. Since rsync can copy each file directly to the correct location, this is the easiest way to do it. If you're concerned about bandwidth, you can have your cron job create a compressed tarball before sending it over, but then you need a cron job on the other machine to unpack the tarball and put the files in place, which adds a small amount of complexity, though it's not hard to do.
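A minimal sketch of that cron job in Python, assuming MySQL credentials live in ~/.my.cnf on both machines, SSH keys are set up between them, and the database name, paths and standby host (all hypothetical here) are replaced with your own. It runs the import on the standby over ssh instead of using a second cron job there; rsync's -z flag compresses data in transit, which covers much of the bandwidth concern without a separate tarball.

```python
import shlex
import subprocess
import sys

# Hypothetical settings: replace with your own database, paths and standby host.
DB_NAME = "mysite"
DUMP_PATH = "/var/backups/mysite.sql"
SITE_DIR = "/srv/www/mysite/"
STANDBY = "standby.example.com"


def sync_commands(db=DB_NAME, dump=DUMP_PATH, site=SITE_DIR, host=STANDBY):
    """Dump the database, copy dump + site files, import on the standby."""
    return [
        # Consistent snapshot of InnoDB tables without locking them;
        # credentials are assumed to come from ~/.my.cnf.
        ["mysqldump", "--single-transaction", f"--result-file={dump}", db],
        # -a keeps permissions/timestamps, -z compresses in transit,
        # --delete removes files on the standby that were deleted here.
        ["rsync", "-az", "--delete", site, f"{host}:{site}"],
        ["rsync", "-az", dump, f"{host}:{dump}"],
        # The redirect is interpreted by the remote shell on the standby.
        ["ssh", host, f"mysql {shlex.quote(db)} < {shlex.quote(dump)}"],
    ]


def run(apply=False):
    for cmd in sync_commands():
        if not apply:
            print(shlex.join(cmd))  # dry run: show what would be executed
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing step


# Example crontab entry (path hypothetical), every 30 minutes:
#   */30 * * * * /usr/bin/python3 /usr/local/bin/sync_standby.py --apply
if __name__ == "__main__":
    run(apply="--apply" in sys.argv)
```

Without --apply it only prints the commands, which is a cheap way to check the paths before letting cron run it for real.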

From there, you need to keep an eye on your server and update DNS promptly if it goes down. You can also use a DNS provider that supports failover, which will ping your main server and switch the records over to the backup when it doesn't respond, though as Guspaz pointed out, that adds another point of failure. Nothing says you need to fill in all the name server slots at your domain registrar with the same DNS provider: you can use two or three name server entries from another DNS provider and fill in the others with Linode's name servers as a backup.
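If you do automate it, the core of such a monitor is small. This sketch (the URL, the threshold and the failover action are all assumptions) probes the primary over HTTP and only signals a failover after several consecutive failures, so one dropped request doesn't flip your DNS. Actually changing the records depends on your DNS provider and is left out. Keep Guspaz's point in mind: run it somewhere other than the primary, since the monitor is itself a point of failure.

```python
import http.client
import urllib.request


def probe(url: str, timeout: float = 5.0) -> bool:
    """True if the URL answers with a non-error status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    # URLError, HTTPError (4xx/5xx) and timeouts are OSErrors;
    # malformed responses raise http.client.HTTPException.
    except (OSError, http.client.HTTPException):
        return False


class FailoverMonitor:
    """Signals a failover after `threshold` consecutive failed probes."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0
        self.failed_over = False

    def record(self, ok: bool) -> bool:
        """Feed one probe result; returns True exactly once, when to fail over."""
        self.failures = 0 if ok else self.failures + 1
        if not self.failed_over and self.failures >= self.threshold:
            self.failed_over = True  # switching back is left as a manual step
            return True
        return False


# Hypothetical loop, e.g. every 30 seconds:
#   if monitor.record(probe("http://primary.example.com/")):
#       switch_dns_to_backup()   # hypothetical, provider-specific, not shown
```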

Actually, isn't one of the points of RR DNS that you don't get "hard" downtime when you lose one of the endpoints? It requires clients to retry, but this indicates that browsers do, and I know my web service client, based on libcurl, does as well.

It's not completely invisible: there will generally be a delay while clients try the bad address, but many (probably most) clients will fail over to the next address.

That requires, of course, that the downed address be truly gone (not answering at all). If it's returning an error status, e.g., an HTTP 500 or a socket-level connection refused, that's what the client will see.
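For what it's worth, Python's socket.create_connection already does this kind of retry: it tries each address getaddrinfo returns until one connects. A stripped-down version of that loop, written so the connect step can be swapped out, looks like this (the function names are made up). It only skips addresses where connecting fails; a server that accepts the connection and then answers with an HTTP 500 isn't skipped. Whether a refused connection gets retried varies by client; this sketch treats it like any other connection failure.

```python
import socket
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def connect_first_reachable(addresses: Sequence[str],
                            connect: Callable[[str], T]) -> Tuple[str, T]:
    """Try each address in order and return (address, connection) for the
    first one that connects; `connect` must raise OSError on failure."""
    last_error: OSError = OSError("no addresses to try")
    for addr in addresses:
        try:
            return addr, connect(addr)
        except OSError as exc:  # timeout, refused, unreachable...
            last_error = exc
    raise last_error


def resolve(host: str, port: int) -> List[str]:
    """All IP addresses DNS returns for host (one per record)."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [info[4][0] for info in infos]


# Real use (needs network):
#   addr, sock = connect_first_reachable(
#       resolve("www.example.com", 80),
#       lambda ip: socket.create_connection((ip, 80), timeout=3))
```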

As for databases, I have streaming replication set up between PostgreSQL instances. It does take manual effort to fail over, and it's definitely the weak link in the chain.

@jords:

Maybe just move datacenter?

Data centers are very, very heavy so that is probably not an option.

James
