Higher load average after upgrade

Hi,

I recently upgraded one of my Linodes from a Linode 512 to 768 due to some memory limitations. After booting it up again, I noticed that while idling the load is still higher than it should be.

It's currently supposed to be idle, but the load avg are: load average: 0.49, 0.62, 0.59

It's staying around that number, I've seen it peak at 0.90, with nothing going on that should cause it.

CPU looks idle, IO seems low and RAM is fine as well. Here's the summary from top:

top - 16:07:26 up  4:11,  1 user,  load average: 0.60, 0.64, 0.60
Tasks: 106 total,   2 running, 104 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.2%sy,  0.0%ni, 99.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    768316k total,   709116k used,    59200k free,    32348k buffers
Swap:   524284k total,      180k used,   524104k free,   478828k cached

I tried to troubleshoot myself a bit, but I don't have enough experience to say whether this could be the cause or not. With nmon I looked at the kernel statistics and saw this:

 Kernel Stats ----------------------------------------------------------------------------------------------------------------------------------------                                                                                     |
| RunQueue              1   Load Average    CPU use since boot time                                                                                                                                                                         |
| ContextSwitch    1776.8    1 mins  0.88    Uptime Days=  0 Hours= 4 Mins=13                                                                                                                                                               |
| Forks               3.0    5 mins  0.71    Idle   Days=  1 Hours= 9 Mins=21                                                                                                                                                               |
| Interrupts       1050.6   15 mins  0.62    Average CPU use=-687.96%                                                                                                                                                                       |

I thought that maybe the interupts/contextswitched are what causes the high load? I don't know what it should be, but from what I saw on screenshots from other people, it wasn't this high when the system should be idle.

I looked at the interupts and see this:

Average:         INTR    intr/s
Average:            0      0.00
Average:            1      0.00
Average:            2      0.00
Average:            3      0.00
Average:            4      0.00
Average:            5      0.00
Average:            6      0.00
Average:            7      0.00
Average:            8      0.00
Average:            9      0.00
Average:           10      0.00
Average:           11      0.00
Average:           12      0.00
Average:           13      0.00
Average:           14      0.00
Average:           15      0.00
Average:           16    183.40
Average:           17      1.70
Average:           18     47.70
Average:           19      0.00
Average:           20      0.00
Average:           21      0.00
Average:           22    179.05
Average:           23      1.85
Average:           24     43.25
Average:           25      0.00
Average:           26      0.00
Average:           27      0.40
Average:           28    180.35
Average:           29      1.45
Average:           30     44.70
Average:           31      0.00
Average:           32      0.00
Average:           33      1.35
Average:           34    178.05
Average:           35      1.80
Average:           36     42.55
Average:           37      0.00
Average:           38      0.00
Average:           39      1.05
Average:           40      0.00
Average:           41      0.00
Average:           42      7.35
Average:           43      0.00
Average:           44      1.30
Average:           45      0.00
Average:           46      0.00
Average:           47      0.00
[email protected]:~# cat /proc/interrupts 
           CPU0       CPU1       CPU2       CPU3       
 16:    2972148          0          0          0  xen-percpu-virq      timer0
 17:      21870          0          0          0  xen-percpu-ipi       spinlock0
 18:     750503          0          0          0  xen-percpu-ipi       resched0
 19:        146          0          0          0  xen-percpu-ipi       callfunc0
 20:          0          0          0          0  xen-percpu-virq      debug0
 21:       8028          0          0          0  xen-percpu-ipi       callfuncsingle0
 22:          0    2915213          0          0  xen-percpu-virq      timer1
 23:          0      22061          0          0  xen-percpu-ipi       spinlock1
 24:          0     695840          0          0  xen-percpu-ipi       resched1
 25:          0         92          0          0  xen-percpu-ipi       callfunc1
 26:          0          0          0          0  xen-percpu-virq      debug1
 27:          0       8167          0          0  xen-percpu-ipi       callfuncsingle1
 28:          0          0    2824373          0  xen-percpu-virq      timer2
 29:          0          0      21485          0  xen-percpu-ipi       spinlock2
 30:          0          0     692198          0  xen-percpu-ipi       resched2
 31:          0          0        225          0  xen-percpu-ipi       callfunc2
 32:          0          0          0          0  xen-percpu-virq      debug2
 33:          0          0       8251          0  xen-percpu-ipi       callfuncsingle2
 34:          0          0          0    2895170  xen-percpu-virq      timer3
 35:          0          0          0      20237  xen-percpu-ipi       spinlock3
 36:          0          0          0     678003  xen-percpu-ipi       resched3
 37:          0          0          0        202  xen-percpu-ipi       callfunc3
 38:          0          0          0          0  xen-percpu-virq      debug3
 39:          0          0          0       8396  xen-percpu-ipi       callfuncsingle3
 40:        433          0          0          0   xen-dyn-event     xenbus
 41:        405          0          0          0   xen-dyn-event     hvc_console
 42:      66505          0          0          0   xen-dyn-event     blkif
 43:         67          0          0          0   xen-dyn-event     blkif
 44:      90115          0          0          0   xen-dyn-event     eth0
NMI:          0          0          0          0   Non-maskable interrupts
LOC:          0          0          0          0   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:          0          0          0          0   Performance monitoring interrupts
IWI:          0          0          0          0   IRQ work interrupts
RTR:          0          0          0          0   APIC ICR read retries
RES:     750503     695840     692198     678003   Rescheduling interrupts
CAL:       8174       8259       8476       8598   Function call interrupts
TLB:          0          0          0          0   TLB shootdowns
ERR:          0
MIS:          0

As for contextswitches, I have no idea on how to look up what's causing the high number. Or even -if- these things can be the cause for the high load.

EDIT: I tried the following, but it doesn't seem to show what's causing it. In this list there's nothing that's very high.

[email protected]:~# pidstat -w 
Linux 3.4.2-linode44 (nuvini)   07/28/2012      _i686_  (4 CPU)

04:17:56 PM       PID   cswch/s nvcswch/s  Command
04:17:56 PM         1      0.33      0.00  init
04:17:56 PM         2      0.01      0.00  kthreadd
04:17:56 PM         3      0.37      0.00  ksoftirqd/0
04:17:56 PM         4      1.82      0.00  kworker/0:0
04:17:56 PM         5      0.11      0.00  kworker/u:0
04:17:56 PM         6      0.43      0.00  migration/0
04:17:56 PM         7      0.44      0.00  migration/1
04:17:56 PM         8      0.00      0.00  kworker/1:0
04:17:56 PM         9      0.34      0.00  ksoftirqd/1
04:17:56 PM        10      0.45      0.00  migration/2
04:17:56 PM        11      0.00      0.00  kworker/2:0
04:17:56 PM        12      0.34      0.00  ksoftirqd/2
04:17:56 PM        13      0.45      0.00  migration/3
04:17:56 PM        14      0.00      0.00  kworker/3:0
04:17:56 PM        15      0.38      0.00  ksoftirqd/3
04:17:56 PM        16      0.00      0.00  cpuset
04:17:56 PM        17      0.00      0.00  khelper
04:17:56 PM        18      0.04      0.00  kdevtmpfs
04:17:56 PM        19      0.00      0.00  kworker/u:1
04:17:56 PM        21      0.01      0.00  xenwatch
04:17:56 PM        22      0.03      0.00  xenbus
04:17:56 PM       155      0.17      0.00  sync_supers
04:17:56 PM       157      0.00      0.00  bdi-default
04:17:56 PM       159      0.00      0.00  kblockd
04:17:56 PM       168      0.96      0.00  kworker/1:1
04:17:56 PM       171      0.00      0.00  md
04:17:56 PM       255      0.00      0.00  rpciod
04:17:56 PM       257      1.40      0.00  kworker/0:1
04:17:56 PM       268      0.02      0.00  kswapd0
04:17:56 PM       269      0.00      0.00  ksmd
04:17:56 PM       270      0.00      0.00  fsnotify_mark
04:17:56 PM       274      0.00      0.00  ecryptfs-kthrea
04:17:56 PM       276      0.00      0.00  nfsiod
04:17:56 PM       277      0.00      0.00  cifsiod
04:17:56 PM       280      0.00      0.00  jfsIO
04:17:56 PM       281      0.00      0.00  jfsCommit
04:17:56 PM       282      0.00      0.00  jfsCommit
04:17:56 PM       283      0.00      0.00  jfsCommit
04:17:56 PM       284      0.00      0.00  jfsCommit
04:17:56 PM       285      0.00      0.00  jfsSync
04:17:56 PM       286      0.00      0.00  xfsalloc
04:17:56 PM       287      0.00      0.00  xfs_mru_cache
04:17:56 PM       288      0.00      0.00  xfslogd
04:17:56 PM       289      0.00      0.00  glock_workqueue
04:17:56 PM       290      0.00      0.00  delete_workqueu
04:17:56 PM       291      0.00      0.00  gfs_recovery
04:17:56 PM       292      0.00      0.00  crypto
04:17:56 PM       540      0.20      0.00  php5-fpm
04:17:56 PM       541      0.01      0.00  php5-fpm
04:17:56 PM       542      0.01      0.00  php5-fpm
04:17:56 PM       543      0.02      0.00  php5-fpm
04:17:56 PM       544      0.00      0.00  php5-fpm
04:17:56 PM       545      0.00      0.00  php5-fpm
04:17:56 PM       546      0.00      0.00  php5-fpm
04:17:56 PM       547      0.00      0.00  php5-fpm
04:17:56 PM       855      0.00      0.00  khvcd
04:17:56 PM       971      0.00      0.00  kpsmoused
04:17:56 PM       972      0.87      0.00  kworker/2:1
04:17:56 PM      1009      0.00      0.00  deferwq
04:17:56 PM      1011      0.89      0.00  kworker/3:1
04:17:56 PM      1013      0.37      0.00  kjournald
04:17:56 PM      1048      0.00      0.01  udevd
04:17:56 PM      1142      0.00      0.00  udevd
04:17:56 PM      1143      0.00      0.00  udevd
04:17:56 PM      1381      0.00      0.00  portmap
04:17:56 PM      1393      0.00      0.00  rpc.statd
04:17:56 PM      1618      0.00      0.00  rsyslogd
04:17:56 PM      1627      0.16      0.01  monit
04:17:56 PM      1687      0.01      0.00  mysqld_safe
04:17:56 PM      1831      0.11      0.00  mysqld
04:17:56 PM      1832      0.00      0.00  logger
04:17:56 PM      1887      1.00      0.01  apache2
04:17:56 PM      1891      0.00      0.00  apache2
04:17:56 PM      1892      0.00      0.00  apache2
04:17:56 PM      1898      0.33      0.00  apache2
04:17:56 PM      1900      0.00      0.00  atd
04:17:56 PM      1939      0.02      0.00  cron
04:17:56 PM      1948      0.20      0.00  flush-202:0
04:17:56 PM      1949      0.00      0.00  apache2
04:17:56 PM      1950      0.00      0.00  apache2
04:17:56 PM      2102      0.00      0.00  dhclient
04:17:56 PM      2158      0.03      0.00  sshd
04:17:56 PM      2166      0.09      0.00  master
04:17:56 PM      2169      0.01      0.00  qmgr
04:17:56 PM      2193      0.00      0.00  tlsmgr
04:17:56 PM      2550      1.02      0.00  ntpd
04:17:56 PM      2553      1.00      0.00  memcached
04:17:56 PM      2727      0.05      0.00  fail2ban-server
04:17:56 PM      2821      0.22      0.04  lfd
04:17:56 PM      2842      0.00      0.00  lfd
04:17:56 PM      3498      0.00      0.00  mingetty
04:17:56 PM      3655      0.00      0.00  sshd
04:17:56 PM      3657      0.42      0.01  sshd
04:17:56 PM      3660      0.01      0.00  bash
04:17:56 PM      3703      0.00      0.00  su
04:17:56 PM      3704      0.05      0.01  bash
04:17:56 PM      5250      0.00      0.00  sshd
04:17:56 PM      5252      0.01      0.00  sshd
04:17:56 PM      5253      0.00      0.00  sshd
04:17:56 PM      5710      0.00      0.00  teamspeak
04:17:56 PM      5711      1.47      0.14  ts3server_start
04:17:56 PM      5731      0.01      0.00  ts3server_linux
04:17:56 PM     11700      0.00      0.00  sleep
04:17:56 PM     11701      0.00      0.00  pidstat
04:17:56 PM     28375      0.01      0.00  smtpd
04:17:56 PM     29738      0.00      0.00  pickup
[email protected]:~# sar -w 1 3 
Linux 3.4.2-linode44 (nuvini)   07/28/2012      _i686_  (4 CPU)

04:18:19 PM    proc/s   cswch/s
04:18:20 PM      3.00   1899.00
04:18:21 PM      2.00   2031.00
04:18:22 PM      1.00   1959.00
Average:         2.00   1963.00

The server itself does seem to be running fine - but I would like to have a realistic value of the load, and this only started after the upgrade. It should be idle now so it shouldn't show a load like that.

Hopefully anyone has any ideas on how to get it back to how it should be. Thanks :)

6 Replies

http://forum.linode.com/viewtopic.php?f … =15#p52809">http://forum.linode.com/viewtopic.php?f=12&t=9169&start=15#p52809

James

Hi,

Linode offered to migrate me to another host. The problem wasn't fixed with that however, so it's probably something else.

I shut down pretty much every service/process that was non-essential. Load average still stays off. Nothing very high, but it should be lower.

I also tried to verify if I have the same problem CPU-graph wise, however my CPU graphs top off at 400, not 200.

Is there anything I can do to see where this is coming from?

Here's a process-list:

[email protected]:~# uptime
 00:16:29 up 28 min,  2 users,  load average: 0.27, 0.19, 0.15
[email protected]:~# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 Jul28 ?        00:00:00 init [2]          
root         2     0  0 Jul28 ?        00:00:00 [kthreadd]
root         3     2  0 Jul28 ?        00:00:00 [ksoftirqd/0]
root         4     2  0 Jul28 ?        00:00:00 [kworker/0:0]
root         5     2  0 Jul28 ?        00:00:00 [kworker/u:0]
root         6     2  0 Jul28 ?        00:00:00 [migration/0]
root         7     2  0 Jul28 ?        00:00:00 [migration/1]
root         8     2  0 Jul28 ?        00:00:00 [kworker/1:0]
root         9     2  0 Jul28 ?        00:00:00 [ksoftirqd/1]
root        10     2  0 Jul28 ?        00:00:00 [migration/2]
root        11     2  0 Jul28 ?        00:00:00 [kworker/2:0]
root        12     2  0 Jul28 ?        00:00:00 [ksoftirqd/2]
root        13     2  0 Jul28 ?        00:00:00 [migration/3]
root        14     2  0 Jul28 ?        00:00:00 [kworker/3:0]
root        15     2  0 Jul28 ?        00:00:00 [ksoftirqd/3]
root        16     2  0 Jul28 ?        00:00:00 [cpuset]
root        17     2  0 Jul28 ?        00:00:00 [khelper]
root        18     2  0 Jul28 ?        00:00:00 [kdevtmpfs]
root        19     2  0 Jul28 ?        00:00:00 [kworker/u:1]
root        21     2  0 Jul28 ?        00:00:00 [xenwatch]
root        22     2  0 Jul28 ?        00:00:00 [xenbus]
root       155     2  0 Jul28 ?        00:00:00 [sync_supers]
root       157     2  0 Jul28 ?        00:00:00 [bdi-default]
root       159     2  0 Jul28 ?        00:00:00 [kblockd]
root       168     2  0 Jul28 ?        00:00:00 [kworker/3:1]
root       171     2  0 Jul28 ?        00:00:00 [md]
root       255     2  0 Jul28 ?        00:00:00 [rpciod]
root       257     2  0 Jul28 ?        00:00:00 [kworker/0:1]
root       268     2  0 Jul28 ?        00:00:00 [kswapd0]
root       269     2  0 Jul28 ?        00:00:00 [ksmd]
root       270     2  0 Jul28 ?        00:00:00 [fsnotify_mark]
root       274     2  0 Jul28 ?        00:00:00 [ecryptfs-kthrea]
root       276     2  0 Jul28 ?        00:00:00 [nfsiod]
root       277     2  0 Jul28 ?        00:00:00 [cifsiod]
root       280     2  0 Jul28 ?        00:00:00 [jfsIO]
root       281     2  0 Jul28 ?        00:00:00 [jfsCommit]
root       282     2  0 Jul28 ?        00:00:00 [jfsCommit]
root       283     2  0 Jul28 ?        00:00:00 [jfsCommit]
root       284     2  0 Jul28 ?        00:00:00 [jfsCommit]
root       285     2  0 Jul28 ?        00:00:00 [jfsSync]
root       286     2  0 Jul28 ?        00:00:00 [xfsalloc]
root       287     2  0 Jul28 ?        00:00:00 [xfs_mru_cache]
root       288     2  0 Jul28 ?        00:00:00 [xfslogd]
root       289     2  0 Jul28 ?        00:00:00 [glock_workqueue]
root       290     2  0 Jul28 ?        00:00:00 [delete_workqueu]
root       291     2  0 Jul28 ?        00:00:00 [gfs_recovery]
root       292     2  0 Jul28 ?        00:00:00 [crypto]
root       855     2  0 Jul28 ?        00:00:00 [khvcd]
root       969     2  0 Jul28 ?        00:00:00 [kpsmoused]
root      1006     2  0 Jul28 ?        00:00:00 [kworker/1:1]
root      1009     2  0 Jul28 ?        00:00:00 [deferwq]
root      1012     2  0 Jul28 ?        00:00:00 [kjournald]
root      1047     1  0 Jul28 ?        00:00:00 udevd --daemon
root      1091     2  0 Jul28 ?        00:00:00 [kworker/2:1]
root      1113  1047  0 Jul28 ?        00:00:00 udevd --daemon
root      1114  1047  0 Jul28 ?        00:00:00 udevd --daemon
root      1647     1  0 Jul28 ?        00:00:00 dhclient -v -pf /var/run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases eth0
root      1688     1  0 Jul28 ?        00:00:00 /usr/sbin/sshd
root      1926     2  0 Jul28 ?        00:00:00 [flush-202:0]
root      1971     1  0 Jul28 ?        00:00:00 /usr/sbin/cron
root      3545     1  0 Jul28 hvc0     00:00:00 /sbin/mingetty hvc0
root      3655  1688  0 Jul28 ?        00:00:00 sshd: niels [priv]
niels     3657  3655  0 Jul28 ?        00:00:00 sshd: [email protected]/0
niels     3658  3657  0 Jul28 pts/0    00:00:00 -bash
root      3687  1688  0 Jul28 ?        00:00:00 sshd: niels [priv]
niels     3689  3687  0 Jul28 ?        00:00:00 sshd: [email protected]/1
niels     3690  3689  0 Jul28 pts/1    00:00:00 -bash
root      3719  3690  0 Jul28 pts/1    00:00:00 su
root      3720  3719  0 Jul28 pts/1    00:00:00 bash
niels     3984  3658  0 Jul28 pts/0    00:00:04 htop
root      4250  3720  0 00:16 pts/1    00:00:00 ps -ef

I see the same thing since Jul 1st reboot (fixing leap second problems). Average load is around 1, previously was less than 0.5. I also see "steal" in Munin CPU graphs which I previously didn't. I attribute that to the new kernel. 3.4.2 (linode25), and before that it was 3.0.18 (linode24). CentOS 6.3 (currently, but it was the same with 6.2).

There have been issues with the load average calculation in recent kernel versions. There appears to be a fix for this in mainline 3.5, and I know the fix has made it to 3.4.6 and 3.2.24. (I am not sure 3.0 was affected.)

https://patchwork.kernel.org/patch/1162981/

@Azathoth:

I see the same thing since Jul 1st reboot (fixing leap second problems). Average load is around 1, previously was less than 0.5. I also see "steal" in Munin CPU graphs which I previously didn't. I attribute that to the new kernel. 3.4.2 (linode25), and before that it was 3.0.18 (linode24). CentOS 6.3 (currently, but it was the same with 6.2).

Thank you! Seems like the upgrade pushed me to the latest kernel, I was on an older 3.x(something) kernel before. It looks like it's a bug in 3.4.2. When I went back to the 3.0.18 it looked normal again. Awesome.

I'll switch back to the latest kernel when an update comes to fix that load avg calculations.

Thanks again.

A cosmetic issue is not a good reason not to run the newer kernel…

Reply

Please enter an answer
Tips:

You can mention users to notify them: @username

You can use Markdown to format your question. For more examples see the Markdown Cheatsheet.

> I’m a blockquote.

I’m a blockquote.

[I'm a link] (https://www.google.com)

I'm a link

**I am bold** I am bold

*I am italicized* I am italicized

Community Code of Conduct