What can be done about the load spikes?

Hi all. I've been a Linode customer for about a year and a half now, and I've experienced load spikes of varying severity pretty much the entire time. I started on host5, which had them very consistently at a certain time of night (because that host was set up before caker's scripts started moving crontab jobs around); then I was moved to host31 when it became available. Both are Linode64 hosts. Over the past 6 months I've seen a steady increase in the load spikes on host31, and it's getting a bit unbearable. My munin graphs show a load spike above 3 about two to three times per day on average. (I've found that a load of 3 or above renders the machine effectively unusable; between 2 and 3 the spike slows the machine down, but it is still basically usable.)
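To make the thresholds above concrete, here is a minimal Python sketch that labels load-average samples using the 2 and 3 cut-offs described in this post and counts distinct spike episodes (runs of consecutive samples at or above 3). The function names are hypothetical, and the sample values are made up for illustration, not taken from the munin graphs.

```python
from typing import List

# Cut-offs from the post: >= 3 felt "effectively unusable", 2 to 3 "slow but usable".
# These are one user's observations, not general rules.
UNUSABLE = 3.0
SLOW = 2.0


def classify(load: float) -> str:
    """Label a single load-average sample using the cut-offs above."""
    if load >= UNUSABLE:
        return "unusable"
    if load >= SLOW:
        return "slow"
    return "ok"


def count_spikes(samples: List[float], threshold: float = UNUSABLE) -> int:
    """Count spike episodes: runs of consecutive samples at or above threshold."""
    episodes = 0
    in_spike = False
    for load in samples:
        if load >= threshold and not in_spike:
            episodes += 1      # a new episode starts
            in_spike = True
        elif load < threshold:
            in_spike = False   # the episode (if any) has ended
    return episodes


# Hypothetical 5-minute samples (munin's default polling interval), not real data.
day = [0.1, 0.2, 3.4, 15.2, 4.0, 0.3, 0.1, 2.5, 0.2, 3.1, 0.4]
print(count_spikes(day))              # -> 2
print([classify(x) for x in day[:4]])  # -> ['ok', 'ok', 'unusable', 'unusable']
```

Counting episodes rather than individual samples keeps one long spike (like the 15+ one mentioned below) from being counted several times.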

My subjective experience mirrors this too: about once every couple of days I happen to be trying to do something with my Linode during a spike, and I find that it is effectively unreachable. In effect, it's equivalent to a network outage several times a day, every day. If the data center had network outages this frequently, the terms of service would indicate that a rebate is in order. These load spikes have the same effect but apparently don't invoke the same terms of service.

I'm trying to figure out the best hope of avoiding these spikes. I have watched my Linode's performance for several months and found that it has used io tokens only a couple of times, and has never gone below 50% of its io tokens (except the once or twice when I deliberately ran the io token count down to test the feature). So it's not my Linode that is causing the load spikes, which are generally acknowledged to be caused by one or more Linodes on the host hammering the disk. Unless maybe the load spikes are sometimes just a natural consequence of 30+ Linodes all doing a bit more io than normal at the same time, without any of them doing enough to start using io tokens?
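The hypothesis at the end of that paragraph can be checked with simple arithmetic. The sketch below uses hypothetical figures (32 Linodes, a per-Linode io limit of 50 requests/s, and a shared disk that serves about 150 random requests/s; none of these are Linode's real numbers) to show how a modest, simultaneous rise in every Linode's io could saturate the disk while each Linode stays far below its own limit.

```python
from typing import List

# Hypothetical figures for illustration only; not Linode's actual limits or hardware.
LINODES = 32
PER_LINODE_LIMIT = 50.0  # io requests/s one Linode may issue before being throttled
DISK_CAPACITY = 150.0    # random io requests/s the shared disk can serve


def saturated(per_linode_iops: List[float], capacity: float) -> bool:
    """True if the combined demand of all Linodes exceeds what the disk can serve."""
    return sum(per_linode_iops) > capacity


def all_within_limit(per_linode_iops: List[float], limit: float) -> bool:
    """True if no individual Linode exceeds its own per-Linode limit."""
    return all(x <= limit for x in per_linode_iops)


normal = [3.0] * LINODES  # quiet period: 96 requests/s in total
busy = [6.0] * LINODES    # everyone a little busier: 192 requests/s in total

for label, demand in (("normal", normal), ("busy", busy)):
    print(label, sum(demand),
          "disk saturated:", saturated(demand, DISK_CAPACITY),
          "all under own limit:", all_within_limit(demand, PER_LINODE_LIMIT))
```

Under these assumed numbers, doubling everyone's io from 3 to 6 requests/s pushes the aggregate past the disk's capacity even though no single Linode comes close to its limit, so a per-Linode limiter would never kick in.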

Would it be possible to tune a host with an even stricter io token limit, one better suited to mostly idle Linodes? (Mine is basically just a web server and mail server and uses very few resources.) If so, I would gladly move to that host, even if it meant slightly slower overall performance, in exchange for a reduction (hopefully elimination) of the load spikes.
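This thread doesn't describe how Linode's io token limiter is actually implemented, so the following is only a generic token-bucket sketch with made-up parameters. Each Linode's bucket refills at `rate` tokens per tick up to `capacity`, and each io request spends one token; a "stricter" host in the sense asked about would simply use a smaller capacity and refill rate. The example suggests why that trade-off might suit a mostly idle Linode: a heavy burst is throttled hard, while a light, steady workload is not throttled at all.

```python
from typing import List


class TokenBucket:
    """Generic token bucket with hypothetical parameters (not Linode's real limiter)."""

    def __init__(self, capacity: int, rate: int) -> None:
        self.capacity = capacity  # maximum tokens that can accumulate
        self.rate = rate          # tokens added per tick
        self.tokens = capacity    # start with a full bucket

    def tick(self, requested: int) -> int:
        """Refill, then allow as many of `requested` io requests as tokens permit."""
        self.tokens = min(self.capacity, self.tokens + self.rate)
        allowed = min(requested, self.tokens)
        self.tokens -= allowed
        return allowed


def run(bucket: TokenBucket, requests: List[int]) -> int:
    """Total io requests allowed through over a sequence of ticks."""
    return sum(bucket.tick(r) for r in requests)


burst = [100] * 10  # a disk-hammering Linode
quiet = [2] * 10    # a mostly idle web/mail Linode

print(run(TokenBucket(capacity=200, rate=20), burst))  # lenient limit -> 380
print(run(TokenBucket(capacity=50, rate=5), burst))    # stricter limit -> 95
print(run(TokenBucket(capacity=50, rate=5), quiet))    # idle Linode, strict limit -> 20
```

In this toy model the idle Linode gets all 20 of its requests through under either limit, while the bursty one is cut from 380 to 95 by the stricter bucket.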

Also, how much do newer kernels with io load balancing help? Do these kernels have to be installed on the host, on the Linodes, or on both? I'm still running a 2.4 kernel because I haven't gotten around to upgrading yet, though I don't think my kernel has anything to do with the spikes; as far as I can tell, it's other Linodes that are causing them.

I know that other people have complained of the same problem. I've read other posts where people say their Linodes are inaccessible during certain times of the day, or at random, and I attribute all of it to these load spikes.

I'll post my most recent munin graphs for load average here so you can see what I am talking about.

Here's my daily graph for today. There was a 15+ load spike early this morning. It was so bad that my other munin graphs have a gap in them at that time because munin didn't run as scheduled.

![Daily load graph](http://www.ischo.com/load_graphs/load-daily.png)

Weekly:

![Weekly load graph](http://www.ischo.com/load_graphs/load-weekly.png)

Monthly: (here you can really see the worsening of the load spike problem over the last couple of weeks)

![Monthly load graph](http://www.ischo.com/load_graphs/load-monthly.png)

Yearly: (here we see that even though the average load is low, there are so many load spikes that the entire graph is filled with them)

![Yearly load graph](http://www.ischo.com/load_graphs/load-yearly.png)

Otherwise I love Linode's service, but the load spike issue has been a real problem for the entire time I've used it. During periods when the load spikes are relatively small and infrequent, I'm a very happy customer indeed. Unfortunately, the spikes have been getting worse and worse, and over the course of my subscription they have more often than not been a real problem for me.

Any suggestions on what I can do with my Linode, or what Linode itself can do, to solve this problem would be very, very welcome. Thank you!

6 Replies

Hi,

The newer hosts run CFQ (the Completely Fair Queuing I/O scheduler) in their host kernels, which, from what caker has said, has a good effect.

Hopefully, when caker gets round to rebooting all the hosts, CFQ will at least lessen the effects you are seeing.

Adam
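On 2.6-series Linux kernels, the active block I/O scheduler for a device can be read from sysfs at `/sys/block/<device>/queue/scheduler`, which lists the available schedulers with the active one in square brackets (for example `noop anticipatory deadline [cfq]`). The Python sketch below parses that line. Two caveats: the CFQ discussed here runs in the *host* kernel, so running this inside a Linode shows only the guest kernel's own scheduler (if its block devices expose one), not the host's; and a 2.4 kernel has no sysfs, so the lookup simply returns None there. The device name `sda` in the usage line is only an example.

```python
from pathlib import Path
from typing import Optional


def active_scheduler(sysfs_line: str) -> Optional[str]:
    """Return the bracketed (active) scheduler name, or None if none is marked."""
    for name in sysfs_line.split():
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    return None


def scheduler_for(device: str) -> Optional[str]:
    """Read /sys/block/<device>/queue/scheduler; return None if it can't be read."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    try:
        return active_scheduler(path.read_text())
    except OSError:  # missing file (no sysfs, unknown device) or permission error
        return None


print(active_scheduler("noop anticipatory deadline [cfq]"))  # -> cfq
print(scheduler_for("sda"))  # result depends on the machine this runs on
```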

@adamgent:

Well, if that will really help, then I'll put in a ticket requesting a reboot of host31…

I wish someone who watches this stuff as closely as I do could comment on how much the CFQ patches improved the problem. It surprises me that, with as many Linode users as there are (there must be many hundreds now), so few others are complaining about a problem that must affect everyone…

Everyone complains, just not everyone on the forums ;).

Caker, however, has proven himself flexible, trying to meet everyone's needs. Some people would rather have uptime than a newer host kernel; it's just a matter of priorities.

I'm having the same problem on host31…

I posted a ticket before I saw your post.

Anyway, it seems to be working correctly today. *crosses fingers*

Has this issue been resolved?

@mmajorow:

Yes, I'd say it's mostly been resolved. Upgrading the host to the new kernel with CFQ helped a lot. You can see my more recent graphs here:

Daily:

![Daily load graph](http://www.ischo.com/loadgraphs2/eva.ischo.com-load-day.png)

Weekly:

![Weekly load graph](http://www.ischo.com/loadgraphs2/eva.ischo.com-load-week.png)

Monthly:

![Monthly load graph](http://www.ischo.com/loadgraphs2/eva.ischo.com-load-month.png)

Yearly:

![Yearly load graph](http://www.ischo.com/loadgraphs2/eva.ischo.com-load-year.png)

There has been significant improvement. As you can see, there are still occasional spikes, but nowadays they are much less frequent and smaller. Only a couple of times per week is there a spike above 2, and even that is often not very noticeable; when spikes do happen, they don't seem as severe in their effects as they used to be.

It seems that Linode may be moving from UML to Xen relatively soon, which will hopefully eliminate this problem completely. At least I'll be able to compare my pre-Xen and post-Xen graphs and see how much better Linodes hosted on Xen perform.

In summary, yes, this problem has been resolved; not 100%, but enough that it's no longer a major issue.
