Linode.com Forum
Linode Community Forums

Migration: dallas165

 
Linode.com Forum Index -> System and Network Status
caker
Linode.com Staff


Joined: 15 Apr 2003
Posts: 2715
Location: Galloway, NJ

Posted: Tue Jul 21, 2009 7:18 pm    Post subject: Migration: dallas165

We're investigating a problem with dallas165. Updates in a bit.

-Chris
jed
Linode.com Staff


Joined: 28 Mar 2009
Posts: 357
Location: New Jersey

Posted: Tue Jul 21, 2009 8:42 pm

Early this afternoon, one hard drive in dallas165's RAID 10 array failed. Calls were made, tickets filed, and a plan of action put into place. Customers would have never been any wiser, had no other drives failed; however, at around 8 PM EDT, another drive did.

Not even RAID can prepare for double drive failure. Two drives failing within six hours of each other is unprecedented, and quite unlucky. This is an extremely rare situation for Linode, and one we regret immensely. After extensive triage and troubleshooting, we have determined that all customer data on dallas165 has been lost.

However, hardware does fail; this sort of situation is mostly outside of our control. Let me be the first, on behalf of Linode, to apologize if you are affected by this host failure.

Customers on this host have been moved to dallas166, and tickets opened to discuss specifics relating to their account. If you have any questions whatsoever, please don't hesitate to open a ticket or e-mail us. Check your e-mail for a ticket from us, if you were on dallas165.

Once again, I apologize.
_________________
Jed Smith
Developer & Systems Administrator
Linode.com
Xan
Senior Member


Joined: 08 Feb 2004
Posts: 552
Location: Austin

Posted: Tue Jul 21, 2009 8:56 pm

Ouch... My condolences to people affected. Once again, the lesson is: backups! Any valuable data must be backed up, whether it's on Linode or anywhere. Events like this are good to remind us of that, because it's a matter of when, not if, data loss hits us all.

Tangentially, RAID certainly can protect against double drive failures (RAID 6, for example), but RAID 10 can't guarantee it: the array is lost when both drives of one mirror pair fail. In any case, RAID is not a backup.

[edited to fix error pointed out by hybinet]


Last edited by Xan on Thu Jul 23, 2009 2:52 am; edited 1 time in total
freedom_is_chaos
Senior Member


Joined: 12 Sep 2008
Posts: 166

Posted: Wed Jul 22, 2009 12:00 am

jed wrote:
Not even RAID can prepare for double drive failure. Two drives failing within six hours of each other is unprecedented, and quite unlucky.


Should tell that to our SAN system, we had 17 drives fail semi-simultaneously out of our 32 drive array.

It is unfortunate that all customer data is lost, and unfortunate that Linode Backup isn't completely up and running yet either :P But we knew what un-managed meant.
_________________
If it ain't broke, you didn't tweak it enough. If it is broke, use more duct tape.
http://independentchaos.com
essentialdots



Joined: 22 Jul 2009
Posts: 1

Posted: Wed Jul 22, 2009 11:04 am

We had our Linode hosted on dallas165.

It was a complete shock when I logged in and saw no disk images in the dashboard. Talk about scary... I hope that none of you will ever see something like that.

We had a backup of everything locally. However, we also had a bunch of things set up on our Linode (mail, web, svn, mysql...) with a lot of optimizations and tweaks (custom patched Apache...). So just restoring these would take days, I guess (even with the server log we manually update).

We had the "luck" to move to dallas165 at the beginning of July (this was our most important image, which was 2 years old). Tech support somehow managed to recover our two-week-old image file. This was very convenient, as in the end only emails and the two projects we are working on right now had to be recovered in addition to the old image file. Even so, just those took us a full working day (8 hours) to restore. All of our live production projects were up and running within an hour (so we were "offline" for a very short period of time in the Central European time zone).

I now feel as lucky as I felt desperate this morning. Finally, we have put this behind us.

The moral of the story: when you do backups, don't think just about backing up. The reverse process and its speed are equally important.
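One way to act on that moral is a periodic restore drill. What follows is only a minimal sketch, not a description of anyone's actual setup: it archives a directory with Python's standard `tarfile` module, restores it into a scratch location, times the restore, and compares checksums. The function names `digest_tree` and `restore_drill` and the sample file contents are hypothetical, made up for illustration.

```python
import hashlib
import tarfile
import tempfile
import time
from pathlib import Path


def digest_tree(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def restore_drill(source: Path, workdir: Path) -> tuple:
    """Back up `source`, restore it elsewhere, return (restore_seconds, verified)."""
    archive = workdir / "backup.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname=".")

    target = workdir / "restored"
    target.mkdir()

    # Time only the restore step: that is the part that hurts during an outage.
    start = time.perf_counter()
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(target)
    elapsed = time.perf_counter() - start

    # A restore only counts if the restored files match the originals.
    return elapsed, digest_tree(source) == digest_tree(target)


# Demo on throwaway data (hypothetical file names and contents).
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    src = tmp / "data"
    (src / "svn").mkdir(parents=True)
    (src / "svn" / "repo.txt").write_text("hypothetical repository contents\n")
    (src / "mail.txt").write_text("hypothetical mailbox\n")
    seconds, verified = restore_drill(src, tmp)
    print(f"restore took {seconds:.3f}s, verified={verified}")
```

A real drill would restore onto a separate machine and include service configuration, since in the story above it was the tweaks and setup, not the raw data, that threatened to take days.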

I hope that Linode backup will be available soon.

My condolences to the rest of our "cohabitants" on dallas165.
_________________
Nikola Stojiljkovic
Essential Dots
Guspaz
Senior Member


Joined: 26 May 2009
Posts: 357

Posted: Wed Jul 22, 2009 12:55 pm

There seem to have been a few double-drive failures of late (well, I think this is only the second recently).

Nevertheless, considering the huge impact when they do occur (and we've seen that they do occur on occasion), has Linode considered switching from RAID10 to RAID6 or switching to simple RAID1 with 3 larger drives instead of (presumably) the four smaller drives used in RAID10? Either of these solutions would provide for the ability to survive two-drive failures.
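The comparison can be checked by brute force. The sketch below is illustrative only and assumes the layouts guessed at above (a four-drive RAID 10 built from two mirror pairs, a four-drive RAID 6, and a three-way RAID 1 mirror); the thread does not confirm Linode's actual drive count. It enumerates every possible pair of failed drives and counts how many the array survives. All function names are hypothetical.

```python
from itertools import combinations


def survives_raid10(failed, mirror_pairs):
    """RAID 10 survives unless some mirror pair loses both of its drives."""
    return all(not set(pair) <= failed for pair in mirror_pairs)


def survives_raid6(failed):
    """RAID 6 (double parity) tolerates any two failed drives."""
    return len(failed) <= 2


def survives_raid1(failed, n_drives):
    """An n-way mirror survives while at least one drive remains."""
    return len(failed) < n_drives


def two_drive_survival(n_drives, survives):
    """Return (survivable, total) over all possible two-drive failures."""
    combos = list(combinations(range(n_drives), 2))
    ok = sum(1 for combo in combos if survives(set(combo)))
    return ok, len(combos)


# Assumed layouts; drive indices are arbitrary labels.
raid10 = two_drive_survival(4, lambda f: survives_raid10(f, [(0, 1), (2, 3)]))
raid6 = two_drive_survival(4, survives_raid6)
raid1_3way = two_drive_survival(3, lambda f: survives_raid1(f, 3))

print("RAID 10, 4 drives:", raid10)      # (4, 6)
print("RAID 6, 4 drives:", raid6)        # (6, 6)
print("RAID 1, 3 drives:", raid1_3way)   # (3, 3)
```

Under these assumptions, four-drive RAID 10 survives four of the six possible two-drive failures, while RAID 6 and the three-way mirror survive all of them. The trade-off is capacity and write performance: the three-way mirror yields only one drive's worth of usable space, and RAID 6 pays a parity cost on writes.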
hybinet
Senior Member


Joined: 02 May 2008
Posts: 445

Posted: Wed Jul 22, 2009 1:13 pm

jed wrote:
Two drives failing within six hours of each other is unprecedented,

AFAIK, drives purchased at the same time from the same vendor tend to do that. They most likely are from the same production line (same "batch"), which sort of explains why they might have similar potentials for premature failure.
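A rough back-of-the-envelope sketch shows why correlation is the usual suspect. It assumes, purely for illustration, that drives fail independently at a constant rate, that three drives remained in the array, and that each has a 3% annual failure rate; none of those numbers come from Linode. Under that model a second failure inside a six-hour window should be very rare, so when it does happen, a shared cause such as a common batch becomes a plausible explanation.

```python
import math

HOURS_PER_YEAR = 24 * 365


def p_second_failure(afr, remaining_drives, window_hours):
    """Chance that at least one remaining drive fails within the window,
    assuming independent drives with a constant (exponential) failure rate."""
    rate_per_hour = -math.log(1.0 - afr) / HOURS_PER_YEAR
    return 1.0 - math.exp(-rate_per_hour * remaining_drives * window_hours)


# Hypothetical inputs: 3% annual failure rate, 3 surviving drives, 6 hours.
p_independent = p_second_failure(afr=0.03, remaining_drives=3, window_hours=6)
print(f"P(second failure within 6 h, independent drives) = {p_independent:.4%}")
```

The independence assumption is exactly what same-batch drives violate, so the real-world probability can be far higher than this figure suggests.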

Xan wrote:
it's a matter of if, not when, data loss hits us all.

I think you got it backwards. It's a matter of when, not if, data loss will occur.
Xan
Senior Member


Joined: 08 Feb 2004
Posts: 552
Location: Austin

Posted: Thu Jul 23, 2009 2:52 am

pfft, what a doof, thanks!
smiffy
Senior Member


Joined: 23 Jan 2007
Posts: 88
Location: Rural South Australia

Posted: Fri Jul 24, 2009 5:24 pm

Once again, I find myself wishing that Linux kernel licensing were different so that we could have ZFS. That CAN cope with multiple disc failures. There's even a video of a guy taking a sledgehammer to a pair of discs in a hot system, plugging new discs in and watching it all rebuild without a hitch.
Guspaz
Senior Member


Joined: 26 May 2009
Posts: 357

Posted: Mon Jul 27, 2009 9:09 am

smiffy wrote:
Once again, I find myself wishing that Linux kernel licensing were different so that we could have ZFS. That CAN cope with multiple disc failures. There's even a video of a guy taking a sledgehammer to a pair of discs in a hot system, plugging new discs in and watching it all rebuild without a hitch.


ZFS can't handle multiple disk failures. It has no inherent redundancy. RAID-Z2 can handle two disk failures (RAID-Z can handle one).

But, RAID-5 can handle one disk failure, and RAID-6 can handle two.

ZFS/RAID-Z's advantages are not in the number of disk failures they can handle; they're in other things.
irgeek
Linode.com Staff


Joined: 21 Jun 2003
Posts: 151
Location: Absecon, NJ

Posted: Mon Jul 27, 2009 12:03 pm

Most of you who were affected by the RAID crash have probably seen the ticket updates, but I just wanted to let everyone know that we finally managed to get the RAID to respond again this weekend. All customer data was copied off to a standby host and it's sitting there now in case anyone wants access to it.

If you haven't redeployed yet, we can put your Linode back the way it was. If you have redeployed and you'd just like access to the disks, let us know and we'll set it up for you.

-James
jsr
Junior Member


Joined: 09 Dec 2008
Posts: 43
Location: Gilbert, AZ

Posted: Tue Jul 28, 2009 8:45 am

I wasn't affected by this, but it is good to know that Linode kept on working on getting the data back and didn't just give up. Good job Linode.
freedom_is_chaos
Senior Member


Joined: 12 Sep 2008
Posts: 166

Posted: Thu Jul 30, 2009 12:56 am

jsr wrote:
I wasn't affected by this, but it is good to know that Linode kept on working on getting the data back and didn't just give up. Good job Linode.


This is why people who come to Linode stay with Linode :)
_________________
If it ain't broke, you didn't tweak it enough. If it is broke, use more duct tape.
http://independentchaos.com
All times are GMT - 5 Hours