bji
Joined: 27 Aug 2003
Posts: 182
|
| Posted: Tue Feb 24, 2004 2:27 pm Post subject: One week in the life of a Linode64 on host5 |
|
|
Hi all. I started keeping track of the load on my Linode last week when the system was being unresponsive. I've accumulated a week's worth of load data and it's rather interesting.
The way this works is, I run a shell script that appends a line with the date and the current "uptime" output to a file, goes to sleep for 15 seconds, then repeats. Thus I have accumulated load information for my Linode every 15 seconds for a week. I wrote a script which plots this data using gnuplot. Here is the result:
My system is almost completely unloaded; on its own, I would never expect its load to go over 1, and certainly never 2. I would attribute all of the spikes to activity of other Linodes on the host system.
It's nice to see that the uptime graph accurately reflects the load that occurs on host5 every night at about 1:20 am (note the consistent spikes to load 3 or 4 at this time). What is also interesting is that last night's load was quite low - will we see an improvement in this hotspot in the future? Only time will tell ...
Also notice the spikes early last week. Holy crap, I've never seen a load of 18+ on a Linux box before! I started keeping this information around the time of that first spike to load 4 because I was having some performance problems, and these are readily reflected in the big spikes in the graph over the next day or so.
But it's really quieted down since then and is more like what I have traditionally found to be the performance on host5 - quite good 95% of the time, but with spikes late at night when everyone's "updatedb" cron jobs run ...
BTW, I know that there are more elegant ways to accumulate this data than my stupid little script (MRTG/RRDtool?), but this took me all of 5 minutes to hack together, and I haven't set aside a block of time yet to install/configure those other tools ... pointers would be most appreciated! |
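For anyone who wants to collect the same kind of data, here is a minimal sketch of a sampler along the lines described above. It is not the original shell script: it is written in Python, it reads the 1-minute load average directly via `os.getloadavg()` instead of parsing "uptime", and the file name `load.log`, the line format, and the function names are all assumptions made for illustration.

```python
import os
import time
from datetime import datetime


def format_sample(now, load1):
    """Return one log line: ISO timestamp, a space, the 1-minute load average."""
    return f"{now.isoformat(timespec='seconds')} {load1:.2f}"


def sample_forever(path="load.log", interval=15.0):
    """Append one sample to `path` every `interval` seconds until interrupted.

    `path` and `interval` are illustrative defaults (15 s matches the post).
    """
    while True:
        # os.getloadavg() is available on Unix-like systems only.
        load1, _load5, _load15 = os.getloadavg()
        with open(path, "a") as log:
            log.write(format_sample(datetime.now(), load1) + "\n")
        time.sleep(interval)
```

Calling `sample_forever()` from a background session produces a two-column text file (timestamp, load) that a plotting tool such as gnuplot should be able to read.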
|
|
caker
Joined: 15 Apr 2003
Posts: 2392
Location: Galloway, NJ
|
| Posted: Tue Feb 24, 2004 2:38 pm Post subject: |
|
|
Interesting!
A little history on this topic:
All of the cron jobs (mostly just updatedb and makewhatis and other "not-really-worth-it" jobs) were left at their default times when I first created the distro templates (used by the distro wizard). The first few hosts that were deployed, and the customers who are on them, deployed their Linux installs with the cron jobs running at the same time.
After realizing this, I modified the majority of the template distros and moved the cron jobs to weekly. So now it just gets hammered on Sundays. The two biggest problem hosts are host3 (Linode 128) and host5 (Linode 64). The hosts that were added later don't seem to exhibit this problem at all.
The only reason an "idle" Linode's loadavg goes up is processes blocked (waiting) on disk access. Each process waiting for the disk adds 1 to loadavg.
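As a rough illustration of that arithmetic, the sketch below models how a damped 1-minute load average responds when a fixed number of processes sit blocked on disk. It assumes the usual Linux-style model: the count of runnable plus disk-waiting tasks is sampled every 5 seconds and folded into an exponentially decaying average. The function names are hypothetical, and the real kernel uses fixed-point math rather than floats.

```python
import math

SAMPLE_SECONDS = 5.0   # assumed sampling interval
WINDOW_SECONDS = 60.0  # time constant of the 1-minute average


def next_load(load, active):
    """Fold one sample of `active` (runnable + disk-blocked tasks) into `load`."""
    decay = math.exp(-SAMPLE_SECONDS / WINDOW_SECONDS)
    return load * decay + active * (1.0 - decay)


def simulate(active, seconds, start=0.0):
    """Return the load after `seconds` with a constant number of active tasks."""
    load = start
    for _ in range(int(seconds // SAMPLE_SECONDS)):
        load = next_load(load, active)
    return load
```

Under this model, four processes stuck waiting on disk push the 1-minute figure toward 4 even when the CPU is idle: `simulate(4, 60)` is about 63% of the way there, and `simulate(4, 600)` is essentially 4.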
I don't really like messing with people's filesystems, but I've considered a script which edits the FS the next time the Linode is rebooted. Other options include sending an email to those on host3 and host5 with a few commands they can run to lighten the load.
The biggest reason I'm pushing 2.6 on the hosts is its fairer I/O scheduler. Still, though, running updatedb, etc., and sucking up disk bandwidth is wasteful.
I'm open to suggestions.
-Chris |
|
|
bji
Joined: 27 Aug 2003
Posts: 182
|
| Posted: Tue Feb 24, 2004 3:47 pm Post subject: |
|
|
Chris,
Thank you for your response!
Three things:
1. I think you should send an email out to customers on host3 and host5, rather than modifying people's filesystems without their knowledge. I think that a round of emails just letting people know how they can benefit from changing their cron job times would be sufficient to solve most of the problem (after all, it's for their own good too - their own updatedb will run faster at a time when the Linode host is not loaded down).
2. What about "randomizing" the cron times on a disk image before deploying it for a particular Linode? I imagine that right now, when a user selects the deployment of a particular distribution, the host just copies the filesystem over into their UML partition "file" and then resizes the filesystem. What about adding a step where the filesystem is mounted and the cron times are "randomized" - you could just have a script that opens a filesystem and writes a randomized /etc/crontab out into it. By "randomized" I mean that daily scripts are run at a random time between, say, 2 and 4 am EST, weeklies at a random time on either Saturday or Sunday between 4 and 6 am, etc.
3. I think that if step 2 were done, then updatedb and other stuff which is normally done daily should be moved back to daily.
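The "randomizing" in point 2 could be sketched roughly like this. It is only an illustration under assumptions: the `run-parts /etc/cron.daily` style of /etc/crontab entry is common but not universal across distros, the function names are made up, and the times are whatever the Linode's local clock says (not necessarily EST).

```python
import random


def random_time(rng, start_hour, end_hour):
    """Pick (minute, hour) uniformly between start_hour:00 and end_hour:00."""
    offset = rng.randrange((end_hour - start_hour) * 60)
    return offset % 60, start_hour + offset // 60


def crontab_lines(rng):
    """Build randomized daily and weekly /etc/crontab entries (hypothetical layout)."""
    d_min, d_hour = random_time(rng, 2, 4)   # daily: between 2 and 4 am
    w_min, w_hour = random_time(rng, 4, 6)   # weekly: between 4 and 6 am
    w_day = rng.choice([6, 0])               # crontab day-of-week: 6 = Saturday, 0 = Sunday
    return [
        f"{d_min} {d_hour} * * * root run-parts /etc/cron.daily",
        f"{w_min} {w_hour} * * {w_day} root run-parts /etc/cron.weekly",
    ]
```

A deployment step would call something like `crontab_lines(random.Random())` once per new disk image and write the result into the mounted filesystem, so that no two Linodes are likely to share the same minute.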
I hope that my graph demonstrates that for 95% of the time, Linode performance is really awesome. It's just those predictable spikes that I'd like to see if we can do something about, and I appreciate your enthusiasm in this endeavor!
Best wishes,
Bryan |
|
|
bji
Joined: 27 Aug 2003
Posts: 182
|
| Posted: Tue Mar 02, 2004 1:54 pm Post subject: Another week in the life of host5 |
|
|
Here's this week's graph:
What's very interesting is that the nightly spike increases in severity linearly up to a maximum on Feb. 27 (Friday), and then decreases linearly from there. Very strange.
What's the status on addressing this issue? Have emails been sent out to host5 Linode owners asking them to change their cron times?
(Edited to change graph to use the same scale as the previous graph; I'll use 0 - 20 as my load scale from now on so that all graphs can be easily compared) |
|
|
caker
Joined: 15 Apr 2003
Posts: 2392
Location: Galloway, NJ
|
| Posted: Tue Mar 02, 2004 4:41 pm Post subject: Re: Another week in the life of host5 |
|
|
bji wrote: What's the status on addressing this issue? Have emails been sent out to host5 Linode owners asking them to change their cron times?
Not yet. What I need to do is go through each distro and figure out which files to move and to where. Once I have a set of instructions, I'll send out the emails.
-Chris |
|
|
bji
Joined: 27 Aug 2003
Posts: 182
|
| Posted: Tue Mar 09, 2004 7:59 pm Post subject: This week's graph |
|
|
Still no improvement.
|
|
|
bji
Joined: 27 Aug 2003
Posts: 182
|
| Posted: Tue Mar 09, 2004 11:19 pm Post subject: Re: This week's graph |
|
|
bji wrote: Still no improvement.
Whoops, I didn't realize that I have to keep those images hosted on my server in order for them to show up in this forum. I lost the most recent graph because it wasn't backed up, but I restored the other graphs. I lost about half a day's worth of data at the end of the most recent graph too ... my backups were only up to last night ... sorry :( |
|
|
myrealbox
Joined: 13 Mar 2004
Posts: 8
|
| Posted: Sat Mar 13, 2004 6:23 am Post subject: |
|
|
Those results seem very weird. What could be causing the nightly load to be symmetrical like that?
-Mike |
|
|
bji
Joined: 27 Aug 2003
Posts: 182
|
| Posted: Sat Mar 13, 2004 11:46 am Post subject: |
|
|
myrealbox wrote: Those results seem very weird. What could be causing the nightly load to be symmetrical like that?
-Mike
Didn't you read the posts above? It's caused by everyone's "updatedb" cron jobs running at the same time; this cron job puts a heavy disk burden on the Linode, and a bunch of them at once is really bad for the whole system.
The very best solution would be a kernel which somehow fairly allocated disk bandwidth, so that no one would ever "starve" for disk I/O like this.
A secondary solution would be to change the cron times so that they are staggered instead of everyone running them at the same time. I have changed my Linode's cron time, but the majority of people on host5 seem to be oblivious to this problem and have not done so. |
|
|
caker
Joined: 15 Apr 2003
Posts: 2392
Location: Galloway, NJ
|
| Posted: Sat Mar 13, 2004 2:57 pm Post subject: Re: Another week in the life of host5 |
|
|
bji wrote: What's the status on addressing this issue? Have emails been sent out to host5 Linode owners asking them to change their cron times?
Emails sent to host3 and host5 members. I've asked people to ack back when they make a change -- we'll see how much of a difference it makes. Of course, tonight/tomorrow morning is cron.weekly day, so we might have to wait a few days to see.
Looking forward to your graphs after people make these changes...
Also, I'm working on the host-reboot-to-2.6 schedule. So look for that in the next week or two. 2.6 has been running great on host18 and host19. This is big! :)
-Chris |
|
|
myrealbox
Joined: 13 Mar 2004
Posts: 8
|
| Posted: Sat Mar 13, 2004 11:37 pm Post subject: |
|
|
bji wrote:
Didn't you read the posts above? It's caused by everyone's "updatedb" cron jobs running at the same time; this cron job puts a heavy disk burden on the Linode, and a bunch of them at once is really bad for the whole system.
But as far as I can see, this does not explain why the load is cyclic and always symmetrical about a particular, but differing, day of the week.
-Mike |
|
|
bji
Joined: 27 Aug 2003
Posts: 182
|
| Posted: Sun Mar 14, 2004 1:18 am Post subject: |
|
|
myrealbox wrote:
But as far as I can see, this does not explain why the load is cyclic and always symmetrical about a particular, but differing, day of the week.
-Mike
Ah. Yes, that is an interesting question. Sorry, I didn't understand what you meant before. I am looking forward to seeing people change their cron times after Chris' email (I hope he asked people to randomize their cron minutes and possibly hours rather than just moving stuff to cron.weekly). I hope that we never have to figure out why the graphs look like that :) ... |
|
|
bji
Joined: 27 Aug 2003
Posts: 182
|
| Posted: Tue Mar 16, 2004 10:54 pm Post subject: This week's graph |
|
|
Here it is:
It's hard to tell if there has been any improvement since Caker's email went out. The spikes are small but there were periods of smaller spikes in previous graphs as well. I hope to see them getting even smaller next week :) ... |
|
|
caker
Joined: 15 Apr 2003
Posts: 2392
Location: Galloway, NJ
|
| Posted: Wed Mar 17, 2004 12:07 am Post subject: Re: This week's graph |
|
|
bji wrote: It's hard to tell if there has been any improvement since Caker's email went out. The spikes are small but there were periods of smaller spikes in previous graphs as well. I hope to see them getting even smaller next week :) ...
Good. We'll get a good sample over the next week or so before host5 is rebooted onto 2.6.
-Chris |
|
|
bji
Joined: 27 Aug 2003
Posts: 182
|
| Posted: Tue Mar 23, 2004 4:38 pm Post subject: A definite improvement |
|
|
There has been a definite improvement; this week's peak spike is small. The spikes are still regular but they are definitely getting smaller overall. Will a 2.6 kernel for host5 help? Let's keep our fingers crossed ...
|
|