No Traffic but I/O Spikes, OOMs, Crashes

hey all,

constant OOMing has me at a loss for the cause.

sometimes the I/O spikes before the OOM killer starts hacking at apache and the box completely crashes, with CPU jumping over 100% on each processor.

here are the most relevant settings:

# StartServers 5
MinSpareServers 5
MaxSpareServers 10
MaxClients 150
MaxRequestsPerChild 0

StartServers 1
MinSpareServers 3
MaxSpareServers 6
MaxClients 15
MaxRequestsPerChild 3000

MySQL tweaks:

key_buffer = 16K
max_allowed_packet = 1M
thread_stack = 64K
table_cache = 4
sort_buffer = 64K
net_buffer_length = 2K


sysctl.conf tweak: vm/min_free_kbytes = 16384

here is a munin report showing the crash:

any help would be appreciated. thanks!

9 Replies


constant OOMing has me at a loss for the cause.
Well, the immediate cause is fairly easy to determine - you're running out of memory, to state the obvious. There's supporting evidence in your munin graphs, which show significant swap usage, and if you're using a default 256MB swap image I can certainly see how you could run completely out of memory. I'm not sure the graphs directly show such a failure, but it's often fast enough to be missed by (or to preclude) the next munin polling cycle. And yes, if the OOM killer can't free enough memory fast enough, it can lead to a kernel panic, which can max out CPU usage (though in my case it's always seemed to be a single core, probably for the kernel thread).

Of course, the rub is determining why, but a reasonable first step is to focus on your web stack, since that tends to have the most variability in a lot of configurations.

I'd start by reviewing status when the system is up, to determine a process size estimate for Apache, and how much memory is available once you take away any other standard processes on your system. Then see if that size, multiplied by your MaxClients configuration, could use more memory than you have.
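To sketch that arithmetic (the three RSS figures in KB and the 350MB of available memory are made-up samples, not your numbers - on a live box you'd generate the RSS list with `ps -C apache2 -o rss=`, or `-C httpd` on some distros):

```shell
# Multiply the average Apache worker size by MaxClients and compare it
# to the memory left over after other standard processes.
printf '30720\n28672\n32768\n' | awk -v maxclients=15 -v avail_mb=350 '
  { sum += $1; n++ }
  END {
    avg_mb  = sum / n / 1024
    need_mb = avg_mb * maxclients
    printf "avg worker: %.1f MB, worst case (x%d): %.1f MB, available: %d MB\n", avg_mb, maxclients, need_mb, avail_mb
    if (need_mb > avail_mb) print "WARNING: MaxClients alone can exhaust memory"
  }'
```

With those sample numbers, 15 workers at ~30MB each would want ~450MB, well past the assumed 350MB - exactly the kind of mismatch that invites the OOM killer.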

If I could do so I'd also try stress testing the configuration (such as with "ab" choosing a representative URL that involves the full stack and database) in its current configuration to see if I could cause the problem. If so, it's a big advantage since you can stress test after changes.

In either case, you're likely going to want to drop MaxClients. If your analysis shows the current value is too large, you can use that to pick a new value. But even if it appears ok, one troubleshooting approach is to drop MaxClients a lot - like down to 1-2 - as well as dropping MaxRequestsPerChild a lot - maybe low double or single digits - in case there's a per-request leak going on. Dropping MaxRequestsPerChild is to help avoid a single process growing unusually large, which you may not have been able to catch while observing the system. Performance may suffer, but at this point the goal is to completely stop the full failure, and worry about performance second.
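For the prefork MPM, that drastic troubleshooting configuration might look something like this (the numbers are placeholders to start the diagnosis from, not a recommendation):

```apache
<IfModule mpm_prefork_module>
    StartServers          1
    MinSpareServers       1
    MaxSpareServers       2
    MaxClients            2
    MaxRequestsPerChild  10
</IfModule>
```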

If you can't afford to do the stress testing or configuration changes on the production Linode, clone it to another Linode (even if you just add one for a few days for the testing), and then perform your stress testing and test changes there.

If your application stack is large enough memory-wise for each request, it's not necessarily wrong to have to drop MaxClients into single digits on a Linode 512. Nor does doing so necessarily imply terrible performance, unless each request takes a really long time to satisfy. Though of course, delaying requests is still a far more graceful degradation than keeling over completely.

There are a number of topics here on tuning Apache (and associated application stacks) and MySQL that have more detailed suggestions and ways to work up to a final configuration once you've stopped the pain, plus ways to improve performance at whatever configuration your Linode can support, so I'd definitely do a few forum searches to see if those can also help. But in terms of first steps, most of them boil down to dropping the configuration extremely low to guarantee you have enough resources, and then adjusting them slowly upwards. This may eventually also lead to a conclusion that a larger Linode is needed for your purpose, but that's really only something you can conclude with certainty after having tuned everything to the current Linode.

– David

I'm also having constant problems with OOM and I/O lockups. My Ubuntu 10.04 box just completely grinds to a halt about every twenty minutes.

In the end my Apache got down to this:

# prefork MPM
StartServers 2
MinSpareServers 2
MaxSpareServers 2
ServerLimit 3
MaxClients 50
MaxRequestsPerChild 3000

# worker MPM
StartServers 2
MinSpareThreads 2
MaxSpareThreads 5
ThreadLimit 32
ThreadsPerChild 25
MaxClients 50
ServerLimit 3
MaxRequestsPerChild 0

By limiting the server to just a few instances I can keep it stable for more than an hour, but only if I reboot every hour. However, performance is abysmal.

The latest version of the paravirt kernel is buggy according to Linode support, but I'm at my wits' end as to what the cause is.

You either need to edit the prefork or worker section, depending on which MPM you are using. You won't be using both at once.

I'd recommend dropping MaxClients down to 5, and seeing what happens.

This will tell you which MPM you have installed, and thus which section of the config to tune:

apache2 -V | grep MPM


what is an appropriate max requests per child on a 512 box?


what is an appropriate max requests per child on a 512 box?
The requests per child setting is a way to amortize the overhead of creating a new worker process across many requests. Even small non-zero values will probably always be helpful. But larger values really only make a difference if that overhead is your bottleneck, and since it leaves worker processes around longer, it risks resource growth over time. There's no single "appropriate" answer.
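To make the amortization concrete, here's a toy calculation - the 20ms fork cost is entirely made up for illustration - showing how quickly the per-request share of process-creation overhead shrinks:

```shell
# Hypothetical 20 ms cost to fork a new worker, spread across the
# requests that worker serves before it's recycled.
for n in 1 10 50 300 3000; do
  awk -v n="$n" 'BEGIN { printf "MaxRequestsPerChild=%4d: ~%6.3f ms fork cost per request\n", n, 20/n }'
done
```

Past a few dozen requests per child the per-request overhead is already negligible, which matches the point above that large values mostly just extend a leaky process's lifetime.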

If I were setting a value for a new configuration, I might choose something like 10-50 just to have a starting value and then tune as I tested. Starting low acknowledges that the risk of a growing worker process exhausting resources has a much more disastrous impact than slower response rates due to process creation being a bottleneck.

A lot also depends on your request load. If you're peaking at single digit request/s rates, process fork overhead and thus the setting probably isn't going to matter all that much. If you're trying to hit hundreds or thousands a second, it could make a big difference.

It's far more important to first tune things so you are staying within your available resources (which requests per child doesn't really impact - that's MaxClients primarily). Once you're there, increasing requests per child will likely initially benefit performance, but will quickly fall off, something it wouldn't surprise me to see happening even at double digit values.

In the end, best is probably to just test in your specific situation. Use something like ab to get MaxClients to a point where you fit in memory, then see if increasing requests per child yields higher performance rates. My bet is pretty quickly it won't make much difference as the bottlenecks are elsewhere, and dwarf the process creation overhead.

– David

As suggested, I dropped MaxRequestsPerChild way down and it helps a lot. I've been logging slow queries and WordPress seems to be the primary offender. I'll be increasing the value and monitoring memory consumption closely. But this brings up a separate general memory question.

Is there a healthy percentage of memory to be in use, during normal operations? Such as low traffic periods? I know this question depends on a lot of variables. I guess I'm most interested to see what is comfortable for others.

This might help


Is there a healthy percentage of memory to be in use, during normal operations? Such as low traffic periods? I know this question depends on a lot of variables. I guess I'm most interested to see what is comfortable for others.
You're right that it's almost impossible to answer outside of the parameters of your specific application stack and usage patterns. Ideally you want to use exactly 100% of your memory during the heaviest load (don't worry about low usage times), but usage is dynamic and impossible to predict exactly. Your main choice is likely to be how much space you want to try to leave for filesystem buffers/cache.

Some caching will always be beneficial - I/O is likely the most constrained resource on a Linode, and it doesn't help if you can service extra requests with more Apache workers if they all become I/O bound anyway.

If you just want a number to aim for, try to reserve 10-20% for caching/buffering, so 50-100MB on a 512 under full load. But testing is still best. You may find that due to other bottlenecks there's little performance increase to increasing MaxClients at some point, even if that leaves a lot more memory free. At that point, I'd leave things alone and let the kernel use more memory for buffering. I suppose the converse could be true (increasing MaxClients at the expense of buffering could have a big boost to performance) but I wouldn't bet on it.
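One quick way to see where you currently stand, assuming a Linux /proc/meminfo (as on any Linode):

```shell
# Report how much memory the kernel currently devotes to buffers/cache,
# and what fraction of total RAM that is (compare against the 10-20% figure).
awk '/^MemTotal:/ {t=$2} /^Buffers:/ {b=$2} /^Cached:/ {c=$2}
     END { printf "buffers+cache: %d MB of %d MB total (%.0f%%)\n", (b+c)/1024, t/1024, (b+c)*100/t }' /proc/meminfo
```

Checking this during peak load is more informative than during quiet periods, for the reasons above.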

This is also just the first "next step" in your additional tuning. This first step helped stop the major pain, but there are a number of performance dimensions you can work on now that you're fitting within the available memory. For example, caching for WordPress (also discussed in other posts) makes a separate trade-off where you'll want to take some other memory away from Apache workers and instead give it to the cache. So whatever balance you find with your initial MaxClients tuning and system memory may change as you fine tune things further.

– David

