RAID card failure - all data was lost
> We experienced a RAID card failure on your Linode's host, dallas653. Despite our best efforts, we were unable to recover your Linode disk images from dallas653 due to an internal issue and unfortunately all data was lost.
I'm seriously disappointed with Linode. The basic tenet of data security is to have content in at least two places. I guess I thought there was some level of redundancy in their VPS service. It's all too common that RAID hardware fails or disks can't be rebuilt. Even without me purchasing their backup service, I thought they had some level of backup in case of their "oops".
Unfortunately, we just deployed a new website and used this server for both a development and production environment. If I'm lucky, I may be able to piecemeal the site back together from local sources, but this may not go so well. First, I have to rebuild my server.
RAID card failures that lead to data loss are pretty rare, but they do happen. Linode makes no secret of the fact that hardware is not magic, and as always it's your responsibility to do your due diligence.
Sounds like you had some bad luck, but based on your description it doesn't appear that Linode behaved poorly. This wasn't "their oops", this was a hardware issue.
Backups are good for the soul.
I can count on one hand the number of times we've lost an entire host in the 12 years we've been doing this. It's rare, but it can and does happen.
Nightly rsyncs are your friend.
I started with Linode because of good reviews, but will be switching my company over to Digital Ocean for a few reasons:
- Linode support was unable to offer me any useful compensation for the loss of my data - such as a period of free backup service.
- Digital Ocean has really quick droplet deployment and was able to get me a working LAMP setup in only a few minutes
- I am also able to get monthly service + backup on DO for less than my service alone on Linode.
On top of that, I will be responsible for making my own backups with Cron/Rsync, Crashplan, or another solution.
I wouldn't say VPS services are fragile as such - no more so than a dedicated server - there are similar sorts of hardware issues involved. I'm not a long-time Linode user, but I have had a couple of VPSs here for >2 years now, and not a single instance of any sort of data loss. I don't consider that fragile. When I used to have dedicated servers, I'd be totally responsible for my hardware (including RAID) and backups. With a VPS, I'm only responsible for backups. I take incremental backups every 4 hours, to two separate locations.
I'm not going to tell you to do backups - you know that already; it appears you just didn't get them set up on this box.
Regarding Digital Ocean, I have no idea if they are good or bad. If you think their 'droplet' setup is easy, have you taken a look at Linode's stackscripts to compare?
A few steps I've taken to make sure this doesn't happen again:
1) UpdraftPlus Wordpress Plugin to make automatic site backups to Dropbox
2) Cloudflare CDN to keep serving static copies of my site if there are any outages
3) Using Digital Ocean's VPS backup plan
For additional protection, I may backup the whole server using CrashPlan, but for now the server is only hosting a single website.
I had full backups of all my stuff — but on the same disk. It has been on my to-do list to improve my backup "system" and get all my important files in three separate locations. I didn't get to setting up a true backup system in time. I screwed up. In 15+ years of running dedicated servers and VPSs, I'd never had a full data loss before, and other things were always more pressing.
As Jamie Zawinski would say:
RAID is not a backup solution.
As Saul Goodman might say: "Backups. They're like health insurance. You hope you never need them, but man, oh man, not having them? No!"
Back up everything, offsite. Set up Crashplan, do an rsync to your computer, set up git hooks to keep your database in version control and fully-backed up, or whatever it is you need to do. But don't delay.
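As a sketch of the "database in version control" idea - with placeholder repo paths and a hypothetical `commit_if_changed` helper, and assuming your dump tool can write plain text (mysqldump, pg_dump, etc.) - a nightly cron job like this goes a long way:

```shell
#!/bin/sh
# Commit a database dump into git only when it actually changed, then
# push it offsite. Dump command, repo path and remote are placeholders.
set -eu

# commit_if_changed REPO FILE MESSAGE
# Stages FILE inside REPO and commits only if its contents differ from
# the last commit, so quiet days add no history.
commit_if_changed() {
    repo=$1; file=$2; msg=$3
    git -C "$repo" add "$file"
    if git -C "$repo" diff --cached --quiet; then
        return 0                      # nothing changed tonight
    fi
    git -C "$repo" commit -q -m "$msg"
}

# Nightly usage might look like (all names hypothetical):
#   mysqldump --single-transaction --skip-dump-date mydb > ~/db-backup/mydb.sql
#   commit_if_changed ~/db-backup mydb.sql "db backup $(date +%F)"
#   git -C ~/db-backup push -q origin master
```

Plain-text dumps diff well, so the repo stays small and you get point-in-time history of your schema and data for free.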
Our production server has two tiers of backup schemes in place:
- The Linode backup service
- Nightly rsyncs stored as incremental ZFS snapshots

And if all else fails, we've at least got copies of the various code and content on the developer machines, along with occasional manual backups. I think we're pretty good with the two main backup setups, though: the first would let us get back up and running within a few minutes, the second within a few hours. This was prompted after we got bit ourselves, when our previous fly-by-night VPS operator basically disappeared without a trace. Rebuilding from dev copies of code/content and slightly out-of-date manual database snapshots got us back up and running, but it was painful, and we've taken the steps above to ensure it never happens again.
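For the curious, the receiving side of the nightly-rsync-into-ZFS-snapshots tier can be as simple as the following - the dataset name, host, and schedule are made up, and it obviously assumes a backup box with a ZFS pool:

```
#!/bin/sh
# Runs nightly on the backup box. rsync refreshes the dataset in place,
# then a ZFS snapshot freezes it; each snapshot costs only the blocks
# that changed since the previous night.
set -eu

DATASET="tank/backups/webserver"    # hypothetical pool/dataset

rsync --archive --delete backup@yourvps:/var/www/ "/$DATASET/"

zfs snapshot "$DATASET@$(date +%F)"

# Any prior night is then readable directly from
#   /$DATASET/.zfs/snapshot/<date>/
```

Because snapshots are copy-on-write, keeping months of nightlies is cheap, and restoring a single file is just a `cp` out of the snapshot directory.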
Sorry to bring up an old thread, but I am new to Linode and have been running a couple of production servers for which I would like to set up backups properly.
I am curious to know a couple of things:
- Have you been able to get a new production server running from a Linode backup with a good db backup? The Linode documentation itself says that their backups are not to be relied upon for a high-transaction db.
- Do you have any pointers to share for learning more about the nightly rsync stored as incremental ZFS snapshots?
Thanks in advance.
> Have you been able to get a new production server running from a Linode backup with a good db backup? The Linode documentation itself says that their backups are not to be relied upon for a high transaction db.
To this point, I've definitely done this (with PostgreSQL). It will depend on your specific database, but in general, as long as it's using a write-ahead log or some sort of journaling, and appropriately flushing things, the snapshot taken during a backup should be consistent enough for the database to recover, albeit perhaps losing any transactions that were "in flight" at the moment of the backup snapshot. To the database this should be no different from a spontaneous reboot at that point in time (actually a bit less harsh).
The problem is that this assumes journaling and flushing work properly, so there's wiggle room for something to fail, and spontaneous database reboots don't always work out that well even on dedicated systems - so I think the general warning in the backup documentation is well founded. For older versions of MySQL, for example, even if you used InnoDB for your own tables, the system tables used MyISAM and were at greater risk if they were being changed.
What I suggest is the same as the backup docs - in advance of your backup window (enough in advance for it to complete), schedule a database backup/checkpoint/etc… using whatever tools are appropriate to the specific database to ensure you have a consistent view of data stored on your system by the time the Linode backups come around. Then, should the need to restore occur, you can start by just seeing if the database recovers on its own, which is likely to succeed. But if it fails, you can revert to your local consistent snapshot, with the cost being losing a little additional time (the gap from the local backup to the actual Linode backup).
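Concretely, if your Linode backup window opened at, say, 3 AM, the crontab entry might look like this (PostgreSQL shown since that's what I've used; the database name and path are placeholders):

```
# crontab sketch: take a consistent local dump ~2 hours before an
# assumed 3 AM backup window, so the dump has finished and is sitting
# on disk when the Linode backup runs over it.
0 1 * * *  pg_dump -Fc -f /var/backups/mydb.dump mydb
```

`pg_dump` takes a consistent snapshot without blocking writers, and the `-Fc` custom format compresses well and restores selectively with `pg_restore`.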
This can be independent of any further backup layers you use, or it can be part of them (e.g., you might transfer the local consistent backup to a central location for longer term archiving as part of your own backup task).
Since you mentioned high transaction volume, none of these approaches alone is likely to be sufficient, since you'll still lose up to 24 hours of transactions in the disaster scenario. So if the data is critical, I'd recommend adding some sort of real-time replication slave or streaming backup (depending on database capabilities) to a separate backup location.
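For PostgreSQL, the streaming/archiving side is mostly configuration. A hedged sketch for a 9.x-era primary, with the archive host and specific values as placeholders:

```
# postgresql.conf on the primary -- enables streaming replication and
# continuous WAL archiving to a separate machine (names hypothetical):
wal_level = hot_standby
max_wal_senders = 3
archive_mode = on
archive_command = 'rsync -a %p backup@backuphost:/wal_archive/%f'
```

A standby fed this way trails the primary by seconds rather than a day, and the WAL archive also gives you point-in-time recovery between base backups.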