My server goes down at the same time every day, help!

I have been using a Linode 360 since this August, but in late November I started having a downtime problem. My server goes down at 2pm (Mountain time) every single day, and I have to restart the VPS multiple times to get it back online.

Here's the traffic chart from my Linode control panel:

![](http://www.panda-greatwall.com/downtime_linode.jpg)

In this picture, you can see the downtime as a gap with no traffic, which corresponds to high CPU usage (~140%). I have checked almost everything on my VPS: system logs, cron jobs, etc.

What could cause a problem that recurs at exactly the same time every day?

Thanks

15 Replies

check your cron scripts… specifically, i'd look for something that starts around 20:00 (see the spike before the downtime in the CPU load?)

How's your IO graph at that time? What kernel version are you on?

here's the IO graph:

![](http://www.panda-greatwall.com/io_down.jpg)

I've checked my cron jobs and there's nothing set around 20:00.

I found this thread, http://www.linode.com/forums/viewtopic.php?t=3653, which reports a problem similar to mine. I installed Munin by following this tutorial, http://blog.jploh.com/2007/06/14/how-to-install-munin-on-centos/, and here's the report: http://69.56.251.127/munin/index.html (is it supposed to be viewable only by me? If so, please let me know).

By the way, I use Lighttpd as the web server on CentOS 5.2.

Thanks!

When you said you've checked your cron scripts, I assume you checked what is returned by crontab -e, as well as what's listed in /etc/cron.d/ and /etc/cron.daily?
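As a rough aid for that check, here is a minimal sketch that filters crontab-format lines down to the ones that could fire during a given hour. The function name `cron_lines_at_hour` is hypothetical, and the matching is deliberately simple: it understands `*`, a plain hour, and comma-separated hours, but not ranges such as `1-5` or steps such as `*/2`, so treat an empty result as a hint rather than proof. Use the hour shown by the server's own clock (`date` prints it), which may not be in your local time zone.

```shell
# Hypothetical helper: read crontab-format lines on stdin and print the ones
# whose hour field (2nd column) is "*" or lists the hour given as $1.
# Not handled: ranges ("1-5"), steps ("*/2"), and "@daily"-style shortcuts.
cron_lines_at_hour() {
    awk -v hour="$1" '
        /^[ \t]*#/ || NF < 6 { next }   # skip comments, blank lines, VAR=value lines
        {
            n = split($2, hours, ",")
            for (i = 1; i <= n; i++)
                if (hours[i] == "*" || hours[i] == hour) { print; break }
        }'
}

# Possible usage (paths are the conventional defaults, adjust as needed):
#   crontab -l | cron_lines_at_hour 14
#   cat /etc/crontab /etc/cron.d/* 2>/dev/null | cron_lines_at_hour 14
```

Scripts in /etc/cron.daily do not carry their own schedule; they are started by an entry in /etc/crontab (or by anacron), so the time they run is set there.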

@Internat:

When you said you've checked your cron scripts, I assume you checked what is returned by crontab -e, as well as what's listed in /etc/cron.d/ and /etc/cron.daily?

Yes, I've checked all of these. thanks

Here's the Munin report for you experts to help me analyze:

http://69.56.251.127/munin/localhost/localhost.html

Just now, my VPS went down again and I had to restart it to bring it back online. The downtime starts at exactly the same time every day: 2pm Mountain Standard Time (MST).

What could be causing this? Thanks

Well, for one, your root filesystem seems to be full. That might not be the reason for the hang-up, but it's not good news at all. Your processes won't be able to write logs or create temporary files, and all kinds of bad things can happen.

Try running "apt-get clean" as root (on CentOS, the equivalent is "yum clean all") and see if that helps.

Otherwise try to remove some old logs from /var/log.
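To spot a full filesystem before it causes trouble, something like the following sketch could be run by hand or from a monitoring script. The function name `full_filesystems` and the 90% default threshold are assumptions made for illustration; the parsing relies on the fixed column layout of `df -P` and will misread mount points that contain spaces.

```shell
# Hypothetical helper: read `df -P` output on stdin and print the mount point
# and Use% of every filesystem at or above the threshold in $1 (default 90).
full_filesystems() {
    awk -v limit="${1:-90}" '
        NR > 1 {                       # skip the header line
            use = $5
            sub(/%/, "", use)          # "100%" -> "100"
            if (use + 0 >= limit + 0) print $6, $5
        }'
}

# Possible usage:
#   df -P | full_filesystems 90
```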

All right, I think I've found the source of the problem: cron jobs!

One of the cron jobs runs a Google sitemap generator that reads the access log, and it consumes so much CPU and memory that it effectively freezes the entire system.

How can I make my access logs smaller and easier to read? I have started using logrotate for the access logs of my Lighttpd server. Does anyone happen to know?

Thanks!!

I'm coming in on the tail end of this, but I'd like to offer some general sysadmin guidelines:

  • Run Munin. You seem to be doing this now, which is a good start.

  • Start checking Munin on a daily basis, so you get an idea of what normal "baseline" performance for your machine is.

  • When you have problems, start checking the logs under /var/log/. You can type "ls -ltr" to get a list of files sorted by time updated. The most recently updated log files will appear at the end of the list.

  • Log rotation. You should be doing this on a daily basis.

  • In your case, where the disk is full, you need to delete/remove the biggest offenders. Type "du -hs *" while in /var/log to get disk usage on each of the directories underneath of it. You can use this technique to quickly figure out if you have one or two large files (or directories full of lots of files) that are taking up lots of space.

Good luck!

– Doug

Yes, I just found that my disk is 100% used. When I last checked (probably a month ago), it was only 50% used. I don't even know what kind of crap filled up my disk so quickly.

So what command should I use to find which directories are eating up the most space?

Thanks!!

cd /

du -hs *

"man du" for more information on how the command works.
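With many directories, the plain "du -hs *" listing is easier to read when sorted. A minimal sketch, assuming a hypothetical helper named `biggest_entries`: it reports sizes in kilobytes so that `sort -n` can order them, stays on one filesystem, and skips hidden entries because the `*` glob does not match them.

```shell
# Hypothetical helper: show the N largest entries directly under a directory,
# biggest first. $1 = directory (default: current), $2 = how many (default: 10).
biggest_entries() {
    dir="${1:-.}"
    count="${2:-10}"
    # -x: do not cross into other filesystems, -k: sizes in KB, -s: one total per entry
    du -xks "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Possible usage, drilling down one level at a time:
#   biggest_entries / 5
#   biggest_entries /var 5
#   biggest_entries /var/log 5
```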

I'd bet it's something in /var/log being a likely candidate.

I always slice up my drives, very baaaaaad idea to have everything in one slice.

@marcus0263:

I'd bet it's something in /var/log being a likely candidate.

I always slice up my drives, very baaaaaad idea to have everything in one slice.

You are right! One of the error.log files is 4.6 GB, and I don't know what happened to that website. I deleted the gigantic error.log file by running 'rm -rf error.log', but the system still shows 100% of the space used when I run 'df -hT'.

Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/xvda     ext3     12G   12G   43M 100% /
tmpfs        tmpfs    181M     0  181M   0% /dev/shm

Did the deleted file go to some kind of recycle bin?

Send a SIGHUP to the web server. It still has the file open, so the file won't actually go away until the server closes it. I think a SIGHUP will do it. (If not, restart it :-)
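One way to avoid the deleted-but-still-open situation is to empty the log in place rather than removing it. The sketch below uses a hypothetical helper named `truncate_log`; it assumes the daemon opened the log in append mode, which is the usual case for log files. The file keeps its inode, so the process holding it open continues writing to the same, now empty, file and the space is released immediately.

```shell
# Hypothetical helper: empty a log file without deleting it.
# The redirection truncates the file to zero bytes and leaves the inode alone.
truncate_log() {
    : > "$1"
}

# Possible usage:
#   truncate_log /path/to/error.log
#
# To find files that were deleted while a process still holds them open:
#   lsof +L1                                         # if lsof is installed
#   ls -l /proc/*/fd 2>/dev/null | grep '(deleted)'  # Linux only
```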

a restart solved the problem. thanks
