Ways to monitor Disk IO and Traffic to pinpoint an error

general

Hi!

Ok, so I'm still a total Linux novice even though I've been running my Linodes for a couple of years now. Lately (as in the last couple of days), I've been seeing occasional massive CPU spikes, my disk IO rate has been far more active, and there seem to be a lot of transfer spikes. Most of these happen when I'm not paying attention (not that I spend my days watching top scroll by :P) and they don't happen at any regular interval. I have about 15 WordPress sites and a couple of static HTML sites on the server.

What are some methods or tools that I can use to try to pinpoint what causes the increased load? I would also love to be able to see exactly which files are being transferred. It is possible that I suddenly got a bunch more traffic, but the spikes are a bit worrying.

Also, I already checked the Apache logs to see if there was anything out of the ordinary, and I made a couple of corrections to APC, but that was about it. I don't know if there is a program I can run for a day and then check through its logs to see what's going on.

So I'm running:

Debian 6

Apache 2.2.16

PHP 5.3.3-7+squeeze15

Thanks for the help!

15 Replies

forum:Azathoth 11 years, 7 months ago

Basically 'sar'. Also try 'psacct' (process accounting), I think it's 'acct' on Debian. For realtime viewing try iostat, iotop, mtop if running MySQL…
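
For the "run something for a day and read the logs afterwards" part of the question, here is a minimal Python sketch of the kind of per-device sampling that iostat and sar do, reading `/proc/diskstats` twice and turning the difference into bytes per second. It assumes Python 3 and a kernel that reports the usual 14 or more fields per device; the function names (`parse_diskstats`, `io_rates`, `sample`) are made up for illustration and are not part of any of the tools mentioned above.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats(text):
    """Return {device: (sectors_read, sectors_written)} from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip short lines (some old kernels print fewer fields)
        name = fields[2]
        stats[name] = (int(fields[5]), int(fields[9]))
    return stats


def io_rates(before, after, seconds):
    """Return {device: (read_bytes_per_s, write_bytes_per_s)} between two snapshots."""
    rates = {}
    for name, (r1, w1) in after.items():
        if name not in before:
            continue
        r0, w0 = before[name]
        rates[name] = (
            (r1 - r0) * SECTOR_BYTES / seconds,
            (w1 - w0) * SECTOR_BYTES / seconds,
        )
    return rates


def sample(interval=5.0, path="/proc/diskstats"):
    """Take two snapshots `interval` seconds apart and return the IO rates."""
    with open(path) as fh:
        before = parse_diskstats(fh.read())
    time.sleep(interval)
    with open(path) as fh:
        after = parse_diskstats(fh.read())
    return io_rates(before, after, interval)
```

Calling `sample()` in a loop and appending the result to a file with a timestamp would give a rough day-long record. In practice sar from the sysstat package already does this, so the sketch is mainly useful for seeing where the numbers come from.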

forum:Tamerax 11 years, 7 months ago

Thanks. I don't suppose any of those log the actual file names that are being sent?

Also, since I made my change to APC today, my IO load is way higher. At what point should I be concerned about it?

forum:Azathoth 11 years, 7 months ago

You can check the files accessed by a process with lsof, and use ftop to watch the changes. You could also try apachetop and run it in parallel with iotop to see what's going on. There is no single tool that can give you the answers you seek, so you might wish to script various diagnostic reports and run them when the load gets above a certain threshold (or IOPS, or CPU, or whatever).

How are you running PHP? How big are these spikes? Can you give some numbers?
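
The "script various diagnostic reports and run them when the load gets above a certain threshold" idea could be sketched roughly like this in Python. Everything here is an assumption for illustration: the threshold, the check interval, the report directory, and the choice of commands are placeholders, and it needs Python 3.5 or newer for `subprocess.run`, which a stock Debian 6 install may not have. iotop and lsof must be installed and generally need root.

```python
import os
import subprocess
import time
from datetime import datetime

# Assumed values for illustration; tune them for your own server.
LOAD_THRESHOLD = 4.0
CHECK_INTERVAL = 30  # seconds between load checks
REPORT_DIR = "/var/tmp/load-reports"  # hypothetical location

# Snapshot commands to run when the load is high.
COMMANDS = [
    ["uptime"],
    ["ps", "axo", "pid,user,pcpu,pmem,stat,args", "--sort=-pcpu"],
    ["iotop", "-b", "-o", "-n", "1"],
    ["lsof", "-n", "-P", "-i"],
]


def load_exceeded(load1, threshold=LOAD_THRESHOLD):
    """True when the 1-minute load average is at or above the threshold."""
    return load1 >= threshold


def run_command(cmd):
    """Run one command and return its combined stdout/stderr as text."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=60,
        )
        return result.stdout
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "failed to run: {}\n".format(exc)


def build_report(commands, runner=run_command):
    """Concatenate the output of every command under a small header."""
    sections = []
    for cmd in commands:
        sections.append("### " + " ".join(cmd) + "\n" + runner(cmd))
    return "\n".join(sections)


def watch():
    """Loop forever, writing a timestamped report whenever the load is high."""
    os.makedirs(REPORT_DIR, exist_ok=True)
    while True:
        load1 = os.getloadavg()[0]
        if load_exceeded(load1):
            stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
            path = os.path.join(REPORT_DIR, "report-{}.txt".format(stamp))
            with open(path, "w") as fh:
                fh.write(build_report(COMMANDS))
        time.sleep(CHECK_INTERVAL)
```

Nothing runs until `watch()` is called, for example from a small wrapper started under screen or from an init script. The reports can then be compared against the times of the spikes in the graphs.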

forum:Tamerax 11 years, 7 months ago

I don't know exactly how to answer those questions. Here is my graph. What I can tell you is that around 3 is when I made those changes to my APC config.


forum:Azathoth 11 years, 7 months ago

Without knowing your content, I'd make an assumption about a "typical" WordPress site and say the traffic graph looks normal. Things sometimes come together and result in spikes. Crawlers, visitors… I assume the 20h incoming spike is something uploaded by you?

You really need to check the access logs for the times during those spikes and see what was accessed.
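
As a rough illustration of checking the access logs for a spike window, the following Python sketch totals bytes sent per URL and hits per client IP between two timestamps. It assumes the Apache common or combined log format (client IP first, then a timestamp in square brackets), an English month abbreviation in the timestamp, and that comparing in server-local time is good enough, so the timezone offset is simply dropped. The names `parse_line` and `summarize` are made up for this example.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the leading part of a common/combined format access log line.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return (timestamp, ip, path, bytes_sent) or None if the line does not match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    # Drop the timezone offset and keep the local wall-clock time.
    when = datetime.strptime(m.group("time").split()[0], "%d/%b/%Y:%H:%M:%S")
    size = 0 if m.group("size") == "-" else int(m.group("size"))
    return when, m.group("ip"), m.group("path"), size


def summarize(lines, start, end, top=10):
    """Return (top paths by bytes, top IPs by hits) for requests in [start, end]."""
    bytes_by_path = Counter()
    hits_by_ip = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        when, ip, path, size = parsed
        if start <= when <= end:
            bytes_by_path[path] += size
            hits_by_ip[ip] += 1
    return bytes_by_path.most_common(top), hits_by_ip.most_common(top)
```

To use it, open the relevant access log (on Debian this is often under `/var/log/apache2/`, but the exact path depends on how the virtual hosts are configured) and pass the file object as `lines`, with `start` and `end` set to `datetime` values around the 20h spike. A handful of large files or a single busy IP at the top of either list is usually a good lead.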

As for the disk IO increase, well, since you say you tampered with APC I'd say that's the cause. I'd hazard a guess and say you reduced APC memory and it went from swapping to disk IO because the PHP files have to be purged and reloaded between requests. But that's really a lot of assumption.

You should post your APC config.

forum:danblack 11 years, 7 months ago

or grab the apc.php in the doc directory associated with the install, put that in your docroot, and see how much cache you're actually using.

forum:Tamerax 11 years, 7 months ago

my apc ini:

extension=apc.so

apc.enabled=1

apc.shm_segments=1

;32M per WordPress install

apc.shm_size=1024M

;Relative to the number of cached files (you may need to watch your stats for a day or two to find out a good number)

apc.num_files_hint=7000

;Relative to the size of WordPress

apc.user_entries_hint=4096

;The number of seconds a cache entry is allowed to idle in a slot before APC dumps the cache

apc.ttl=7200

apc.user_ttl=7200

apc.gc_ttl=3600

;Setting this to 0 will give you the best performance, as APC will

;not have to check the IO for changes. However, you must clear

;the APC cache to recompile already cached files. If you are still

;developing, updating your site daily in WP-ADMIN, and running W3TC

;set this to 1

apc.stat=1

;This MUST be 0, WP can have errors otherwise!

apc.include_once_override=0

;Only set to 1 while debugging

apc.enable_cli=0

;Allow 2 seconds after a file is created before it is cached to prevent users from seeing half-written/weird pages

apc.file_update_protection=2

;Leave at 2M or lower. WordPress doesn't have any file sizes close to 2M

apc.max_file_size=2M

apc.cache_by_default=1

apc.use_request_time=1

apc.slam_defense=0

apc.mmap_file_mask=/tmp/apc.XXXXXX

apc.stat_ctime=0

apc.canonicalize=1

apc.write_lock=1

apc.report_autofilter=0

apc.rfc1867=0

apc.rfc1867_prefix=upload_

apc.rfc1867_name=APC_UPLOAD_PROGRESS

apc.rfc1867_freq=0

apc.rfc1867_ttl=3600

apc.lazy_classes=0

apc.lazy_functions=0

That spike at 20h is nothing I did. It's one of the things I'm concerned about and trying to track down.

forum:Tamerax 11 years, 7 months ago

Here is the current output from my apc.php:

![](http://pclennox.com/pic2.jpg)

The thing is, I don't really know what it SHOULD look like. I'll take any and all advice at this point :)

Sorry…newbie moments…

forum:jasonlitka 11 years, 7 months ago

Dude… That is a lot of stuff cached in APC considering that you're only seeing 26 requests/s. At that rate I'm wondering whether or not you might be better off with the 1GB of RAM for other things.

forum:Azathoth 11 years, 7 months ago

I wonder what you changed in the APC config? And I suppose you checked the relevant error logs? Are any of them filling up quickly? Other than that, you should check iotop and maybe even ftop for the most IO-active process.

forum:danblack 11 years, 7 months ago

At least prune it by 300M

forum:ruario 11 years, 7 months ago

iotop and iftop might come in handy
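
If iotop is not available, the per-process counters it relies on can also be read directly. The Python sketch below is only an approximation of that idea: it assumes a kernel with task IO accounting enabled (so that `/proc/<pid>/io` exists) and root privileges to read other users' processes, and the helper names are invented for this example.

```python
import os


def parse_proc_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value)
    return counters


def read_all(proc="/proc"):
    """Return {pid: (command, read_bytes, write_bytes)} for every readable process."""
    snapshot = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "io")) as fh:
                io = parse_proc_io(fh.read())
            with open(os.path.join(proc, entry, "cmdline")) as fh:
                name = fh.read().replace("\0", " ").strip()
        except OSError:
            continue  # process exited or permission denied
        snapshot[int(entry)] = (name, io.get("read_bytes", 0), io.get("write_bytes", 0))
    return snapshot


def busiest(before, after, top=5):
    """Return the top processes by bytes read plus written between two snapshots."""
    deltas = []
    for pid, (name, r1, w1) in after.items():
        if pid in before:
            _, r0, w0 = before[pid]
            deltas.append((r1 - r0 + w1 - w0, pid, name))
    deltas.sort(reverse=True)
    return deltas[:top]
```

Taking `read_all()` twice a few seconds apart and passing both results to `busiest()` gives a crude "who is hitting the disk" list. Kernel threads show up with an empty command name because their `cmdline` is empty.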

forum:Tamerax 11 years, 7 months ago

@Azathoth:

> I wonder what you changed in the APC config? And I suppose you checked the relevant error logs? Are any of them filling up quickly? Other than that, you should check iotop and maybe even ftop for the most IO-active process.

All I did was change from the default APC setup.

forum:Tamerax 11 years, 7 months ago

@danblack:

> At least prune it by 300M

Can you explain a little bit more what you are referring to?

forum:danblack 11 years, 7 months ago

@Tamerax:

> @danblack:
>
> > At least prune it by 300M
>
> Can you explain a little bit more what you are referring to?

The amount of unused APC memory allocation, which I think I read wrong.
