CPU spike, unresponsive SSH and corrupted logs

I've searched forums and googled but so far I've found very little info on what may be the problem.

Basically, today at around 8am I lost the ability to log in via SSH or the web console, though I could still access the website (lighttpd+php5+mysql), FTP (proftpd) and the database (postgresql), and all of them responded quickly. Later today I found out that email (postfix+courier) had also stopped working at around the same time, ~8am. At that point I decided not to wait until night and restarted the server from the dashboard; the shutdown took almost two minutes.

When I started reading through /var/log/ I didn't find any suspicious records, except that some of the logs are corrupted: they have garbage in them, entries out of order, or entries from other log files mixed in. So far I haven't found any corrupted files other than logs.
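For anyone wanting to check their own logs for the same symptoms, here is a rough Python sketch that flags lines whose timestamp cannot be parsed or runs backwards. It assumes the classic syslog prefix ("Jun  3 08:00:01 ..."), which carries no year, so the year has to be passed in. The function name and the sample lines are made up for illustration.

```python
from datetime import datetime

# Assumed layout: classic syslog prefix, e.g. "Jun  3 08:00:01 host prog: msg".
STAMP_LENGTH = 15
STAMP_FORMAT = "%Y %b %d %H:%M:%S"


def find_suspect_lines(lines, year):
    """Return (line_number, reason) pairs for lines that look damaged."""
    suspects = []
    previous = None
    for number, line in enumerate(lines, start=1):
        stamp = line[:STAMP_LENGTH]
        try:
            # The year is prepended because syslog stamps do not include one.
            current = datetime.strptime(f"{year} {stamp}", STAMP_FORMAT)
        except ValueError:
            suspects.append((number, "unparseable timestamp"))
            continue
        if previous is not None and current < previous:
            suspects.append((number, "timestamp goes backwards"))
        previous = current
    return suspects


if __name__ == "__main__":
    sample = [
        "Jun  3 08:00:01 host sshd[1]: first entry",
        "Jun  3 07:59:00 host cron[2]: entry out of order",
        "\x00\x00garbage bytes",
        "Jun  3 08:00:05 host postfix[3]: later entry",
    ]
    for number, reason in find_suspect_lines(sample, year=2010):
        print(number, reason)
```

This only catches ordering and garbage problems. Entries from another log file that happen to carry plausible timestamps would still slip through.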

In the graphs I can see that today at ~8:00 the CPU suddenly maxed out at ~95-105% and stayed there for about an hour and a half, until ~9:30. The IO rate had been steady at 50 with occasional (once in ~12 hours) spikes to ~250-300; during the CPU spike from ~8:00 to ~9:30 it gradually dropped to 0 and then started jumping around erratically, with a max of ~600.

I would really appreciate it if anyone could give me any insight into how I can find out what happened and how I can prevent it from happening again.

3 Replies

I've had this exact problem. No resolution yet unfortunately.

When this occurred I also noticed the system date had been changed to 1914. I've had this date error occur one other time on another Linode, where it didn't cause corrupted logs or a CPU spike, but that might be because I noticed the date change before it caused too much of a problem (just a few hours after the change). I could see no hint in the logs as to the cause of the date change.

NTP is set to ntp.ubuntu.com, but if that were the problem I assume I'd be seeing a lot of reports of this problem. Also, ntpd doesn't reset the date when this occurs due to the amount of change exceeding the sanity limit. ntpd's -g option could be used to avoid the sanity check, but I'm not going to use this till I'm sure of the cause.
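To illustrate why ntpd leaves the clock alone here, a small Python sketch of the sanity check. It assumes ntpd's default panic threshold of 1000 seconds; the function name and the example dates are made up, and the reference time would have to come from a trusted source.

```python
from datetime import datetime

# Assumption: ntpd's default panic threshold, in seconds.
PANIC_THRESHOLD_SECONDS = 1000


def exceeds_panic_threshold(local, reference, threshold=PANIC_THRESHOLD_SECONDS):
    """Return True if the local clock is too far off to be stepped without -g."""
    offset = abs((reference - local).total_seconds())
    return offset > threshold


if __name__ == "__main__":
    # Made-up example values: a clock thrown back to 1914 versus a sane reference.
    broken_clock = datetime(1914, 1, 1, 0, 0, 0)
    reference = datetime(2010, 6, 3, 8, 0, 0)
    print(exceeds_panic_threshold(broken_clock, reference))
```

A jump of decades is far beyond that limit, so without -g the daemon gives up rather than stepping the clock.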

The date issue resolves with a reboot, but then the modification times of all the affected files need to be fixed.
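A minimal Python sketch of that cleanup, assuming any file with a modification time before some cutoff you choose was touched while the clock was wrong. The function names are hypothetical, and note that os.utime sets the access time as well.

```python
import os
import time


def find_files_older_than(root, cutoff_epoch):
    """Return sorted paths under root whose mtime is before cutoff_epoch."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                # Broken symlink or file removed mid-scan: skip it.
                continue
            if mtime < cutoff_epoch:
                hits.append(path)
    return sorted(hits)


def reset_mtimes(paths, new_epoch=None):
    """Set access and modification time of each path (default: now)."""
    stamp = time.time() if new_epoch is None else new_epoch
    for path in paths:
        os.utime(path, (stamp, stamp))


if __name__ == "__main__":
    # Assumption: nothing legitimate on the box predates the year 2000.
    cutoff = time.mktime((2000, 1, 1, 0, 0, 0, 0, 0, -1))
    for path in find_files_older_than("/var/log", cutoff):
        print(path)
```

It is probably safer to print the list and review it first, as above, and only call reset_mtimes once the hits look right.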

@capitan_dorko:

When this occurred I also noticed the system date had been changed to 1914.

That is a kernel bug in 2.6.34… if you're still running that version after a reboot, make sure you're set to run Latest 2.6 Paravirt and aren't locked to a specific kernel.
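A quick way to check which kernel a box actually booted into, sketched in Python. It only compares the first three numeric components of the release string against 2.6.34; the function name is made up, and the release strings in the comments are illustrative, not real kernel names.

```python
import platform

# The version named in this thread as affected.
AFFECTED_VERSION = (2, 6, 34)


def is_affected_release(release):
    """Return True if a release string such as '2.6.34-foo' is a 2.6.34 kernel."""
    numeric = release.split("-", 1)[0]      # drop any '-suffix' part
    parts = numeric.split(".")[:3]          # keep major.minor.patch
    try:
        version = tuple(int(part) for part in parts)
    except ValueError:
        return False
    return version == AFFECTED_VERSION


if __name__ == "__main__":
    release = platform.release()            # same string that `uname -r` prints
    if is_affected_release(release):
        print(f"{release}: still on 2.6.34, check the kernel setting")
    else:
        print(f"{release}: not 2.6.34")
```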

Ah. Thank you so much. I am running 2.6.34 because the paravirt kernel had a bug affecting syslog with iptables. That, however, is not a showstopper the way this can be.

Again. Much thanks.
