Linode Community Forums
 


Host26 reboot


 
       Linode.com Forum Forum Index -> System and Network Status
Author Message
caker



Joined: 15 Apr 2003
Posts: 2404
Location: Galloway, NJ

Posted: Tue Mar 01, 2005 3:00 am    Post subject: Host26 reboot  

OOM Killer is eating Linodes on host26 now.

Upgrading the host kernel -- Linodes will be shut down properly, host will be rebooted, and Linodes will be restarted.

Updates in a few.

-Chris
caker



Joined: 15 Apr 2003
Posts: 2404
Location: Galloway, NJ

Posted: Tue Mar 01, 2005 3:19 am    Post subject:  

Host26's kernel has been upgraded. Linodes are restarting now.

-Chris
jpw



Joined: 29 Jun 2004
Posts: 29

Posted: Tue Mar 01, 2005 12:49 pm    Post subject: OOM Killer details  

Chris,
Is there an overview explanation somewhere of the OOM Killer issues? I checked the forums and there seem to be occasional complaints about the OOM Killer being overly aggressive, but I was looking for more details.

What are the specific issues with regard to UML hosts? Is the host OS's OOM Killer causing the problems, or a specific Linode instance's OOM Killer? Are there steps a Linode user can take to reduce the OOM Killer's wrath?

Like others, I have noticed that during more intensive work such as compiling, I occasionally get out-of-memory errors even though plenty of swap is configured (Linode 64 + 256M swap) and available.

Thanks,

--John
caker



Joined: 15 Apr 2003
Posts: 2404
Location: Galloway, NJ

Posted: Tue Mar 01, 2005 1:04 pm    Post subject:  

This is an issue with Linux kernel 2.6.10 having bugs in the OOM killer. None of these hosts are actually out of memory, but once the OOM killer comes alive it will eventually kill everything on the box. At least that's been our experience. As for an explanation, look for the various OOM killer related threads on LKML.

A bunch of OOM killer patches went into 2.6.11, and that seems to have fixed things.

-Chris
jpw



Joined: 29 Jun 2004
Posts: 29

Posted: Tue Mar 01, 2005 1:21 pm    Post subject: OOM Killer  

So the big problem is with the host kernel killing entire UML instances?

And for the people who have seen out of memory compile problems (when there is enough swap), it's most likely their own UML's OOM Killer that may be killing processes?

Which means that some of my memory problems described above may be due to my UML kernel (using 2.4 latest) not having the OOM Killer patches backported? (I don't know how bad the OOM Killer problems are in 2.4, but there seem to be some issues based on what I've read.)


--John
caker



Joined: 15 Apr 2003
Posts: 2404
Location: Galloway, NJ

Posted: Tue Mar 01, 2005 1:33 pm    Post subject: Re: OOM Killer  

jpw wrote: So the big problem is with the host kernel killing entire UML instances?
Yup. The OOM killer "decides" that the host needs more memory (when there is *plenty* of free memory/swap), and starts killing UML processes. Once it starts, it never stops.

jpw wrote: And for the people who have seen out of memory compile problems (when there is enough swap), it's most likely their own UML's OOM Killer that may be killing processes?
Well, this might be another problem. Are you using Gentoo? If so, what host are you on?

jpw wrote: Which means that some of my memory problems described above may be due to my UML kernel (using 2.4 latest) not having the OOM Killer patches backported? (I don't know how bad the OOM Killer problems are in 2.4, but there seem to be some issues based on what I've read.)
Hmm, 2.4 had and then didn't have the OOM killer. If you're referring to "page_alloc" failures in your console, I believe those are genuine OOM scenarios (no RAM/swap). Or, perhaps the memory allocator in 2.4 is susceptible to the same problems.
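
For anyone wanting to tell a genuine OOM (no RAM/swap left) from a spurious one, here is a rough sketch that reads the MemFree and SwapFree figures from /proc/meminfo-style text. The function names and the 4096 kB threshold are made up for illustration and are not anything Linode uses; treat it as a starting point only.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        # Keep only lines whose first field after the colon is a plain number.
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def looks_out_of_memory(info, min_free_kb=4096):
    """Heuristic (assumed threshold): true when free RAM + free swap is tiny."""
    free_kb = info.get("MemFree", 0) + info.get("SwapFree", 0)
    return free_kb < min_free_kb


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file on a Linux box."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

On a Linode, calling looks_out_of_memory(read_meminfo()) right after a failure gives a quick hint: if it returns False while processes are being killed, real memory exhaustion is less likely to be the cause.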

-Chris
jpw



Joined: 29 Jun 2004
Posts: 29

Posted: Tue Mar 01, 2005 2:08 pm    Post subject: Re: OOM Killer  

caker wrote: Well, this might be another problem. Are you using Gentoo? If so, what host are you on?
host26, running Fedora Core 1, kernel 2.4 latest

I see the out of memory errors occasionally when compiling various things (named or sendmail, for example). Rerunning 'make' usually finishes without problems. Restarting the build completely from scratch (to try to reproduce the problem) doesn't usually result in the out of memory error. So it seems less reproducible than I would expect if it were simply running out of RAM + swap, which is why I thought it might be OOM Killer related.

caker wrote: Hmm, 2.4 had and then didn't have the OOM killer. If you're referring to "page_alloc" failures in your console, I believe those are genuine OOM scenarios (no RAM/swap). Or, perhaps the memory allocator in 2.4 is susceptible to the same problems.
I don't see anything in the logs or the system console, just in the shell as part of the make. There always seems to be a lot of unused swap when I see the error. Perhaps it's not OOM Killer related at all.
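
To double-check the logs more systematically, a small sketch like the one below could filter kernel log text for lines that look OOM related. The patterns are guesses at typical kernel wording ("Out of Memory", "oom-killer", "allocation failed"), not an exhaustive or authoritative list, and find_oom_lines is a hypothetical helper name.

```python
import re

# Assumed patterns; real kernel messages vary between versions.
OOM_PATTERN = re.compile(
    r"out of memory|oom[-_ ]?kill|allocation fail",
    re.IGNORECASE,
)


def find_oom_lines(log_text):
    """Return the lines of log_text that match any assumed OOM pattern."""
    return [line for line in log_text.splitlines() if OOM_PATTERN.search(line)]
```

Feeding it the output of dmesg or the contents of /var/log/messages should work. An empty result would fit the idea that the failure is not OOM Killer related, though it does not prove it.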

--John
 