Help me diagnose a slow web server

Several times a week, my company sends out a mass email (through Mailchimp) to our subscribers. This results in a huge amount of traffic for our site, both from visits to the site and images in the email that are hosted by us.

The site runs on a custom PHP framework with MySQL and Apache2 on Ubuntu.

I've already tried to do a bit of optimization on the front end (according to YSlow guidelines) but the site grinds to a halt after the email goes out.

I also installed Munin a couple weeks ago to monitor the problem. You can see that I added some additional RAM this morning and restarted the server. I'm not sure why the "cache" portion of the "Memory usage" chart is always steadily climbing, but we were hitting the swap space briefly so I upped the RAM and rebooted at that point. After rebooting, I'm barely using any.

Here's the output of top:

top - 16:54:05 up  1:40,  1 user,  load average: 0.00, 0.04, 0.05
Tasks:  90 total,   1 running,  87 sleeping,   2 stopped,   0 zombie
Cpu(s):  0.4%us,  0.0%sy,  0.0%ni, 99.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1668552k total,   258936k used,  1409616k free,    13996k buffers
Swap:   262140k total,        0k used,   262140k free,   101232k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                           
10190 www-data  20   0 36264  10m 4572 S    1  0.6   0:03.16 apache2                                                                                            
11534 www-data  20   0 37596  12m 5700 S    1  0.8   0:02.77 apache2                                                                                            
10857 www-data  20   0 37628  12m 5660 S    0  0.8   0:02.43 apache2                                                                                            
11532 www-data  20   0 36252 9.9m 4484 S    0  0.6   0:02.36 apache2                                                                                            
14854 www-data  20   0 35792  10m 5536 S    0  0.6   0:01.02 apache2                                                                                            
16189 root      20   0  2584 1072  820 R    0  0.1   0:00.03 top                                                                                                
    1 root      20   0  2728 1592 1204 S    0  0.1   0:00.81 init                                                                                               
    2 root      20   0     0    0    0 S    0  0.0   0:00.00 kthreadd                                                                                           
    3 root      20   0     0    0    0 S    0  0.0   0:00.00 ksoftirqd/0                                                                                        
    4 root      20   0     0    0    0 S    0  0.0   0:00.41 kworker/0:0                                                                                        
    5 root      20   0     0    0    0 S    0  0.0   0:00.08 kworker/u:0                                                                                        
    6 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/0                                                                                        
    7 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/1                                                                                        
    8 root      20   0     0    0    0 S    0  0.0   0:00.00 kworker/1:0                                                                                        
    9 root      20   0     0    0    0 S    0  0.0   0:00.01 ksoftirqd/1                                                                                        
   10 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/2                                                                                        
   11 root      20   0     0    0    0 S    0  0.0   0:00.00 kworker/2:0                                                                                        
   12 root      20   0     0    0    0 S    0  0.0   0:00.00 ksoftirqd/2                                                                                        
   13 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/3                                                                                        
   14 root      20   0     0    0    0 S    0  0.0   0:00.23 kworker/3:0                                                                                        
   15 root      20   0     0    0    0 S    0  0.0   0:00.00 ksoftirqd/3                                                                                        
   16 root       0 -20     0    0    0 S    0  0.0   0:00.00 khelper                                                                                            
   17 root      20   0     0    0    0 S    0  0.0   0:00.00 kworker/u:1                                                                                        
   21 root      20   0     0    0    0 S    0  0.0   0:00.00 xenwatch                                                                                           
   22 root      20   0     0    0    0 S    0  0.0   0:00.00 xenbus                                                                                             
  133 root      20   0     0    0    0 S    0  0.0   0:00.01 sync_supers                                                                                        
  135 root      20   0     0    0    0 S    0  0.0   0:00.00 bdi-default                                                                                        
  137 root       0 -20     0    0    0 S    0  0.0   0:00.00 kblockd                                                                                            
  148 root       0 -20     0    0    0 S    0  0.0   0:00.00 md

Apachetop after a minute:

last hit: 16:56:36         atop runtime:  0 days, 00:01:00             16:56:37
All:          149 reqs (   2.5/sec)       1632.9K (   27.7K/sec)      11.0K/req
2xx:     131 (87.9%) 3xx:      18 (12.1%) 4xx:     0 ( 0.0%) 5xx:     0 ( 0.0%)
R ( 60s):     149 reqs (   2.5/sec)       1632.9K (   27.2K/sec)      11.0K/req
2xx:     131 (87.9%) 3xx:      18 (12.1%) 4xx:     0 ( 0.0%) 5xx:     0 ( 0.0%)

   17  0.31 541.2  9.8*/static/img/emails/hfh-email.jpg
   16  0.27 330.3  5.6 /images/logo.png
    8  0.20  44.7  1.1 /deals/two-10oz-bags-of-sweet-potato-carrot-dog-treats
    5  0.08   5.1  0.1 /
    5  0.08   1.8  0.0 /account/logout
    4  0.07   2.2  0.0 /images/green-shadow-left.png
    4  0.07  53.7  0.9 /images/grass.jpg
    4  0.07  10.6  0.2 /css/tables.css
    4  0.08   6.2  0.1 /css/form.css  
    4  0.19 138.9  6.6 /scripts/compressed.js
    4  0.08  11.6  0.2 /favicon.ico
    3  0.05  24.9  0.4 /images/nav.png
    3  0.07 119.6  2.9 /static/team/2011/0505/13046351963269.jpg
    3  0.05   6.2  0.1 /images/refer_button.gif
    3  0.05   9.5  0.2 /images/get_daily_deals.png
    3  0.05   3.6  0.1 /images/social_twitter.png
    3  0.05   3.6  0.1 /images/social_fb.png
    3  0.05   5.3  0.1 /images/social_follow.png
    3  0.06   1.9  0.0 /account/register   
    3  0.06  10.6  0.2 /blueprint/screen.css
    3  0.05  15.0  0.3 /images/footer-grass.png
    3  0.06  16.4  0.3 /css/style.css
    3  0.06   3.0  0.1 /blueprint/print.css
    2  0.04   1.3  0.0 /images/treats-bar-top.png
    2  0.04  15.6  0.3 /images/buy-now-bg.png
    2  0.04   1.0  0.0 /images/deal_bg.png
    2  0.04   4.5  0.1 /images/social_receive.png
    2  0.03   3.4  0.1 /images/white-shadow-top.png
    2  0.04   1.4  0.0 /images/treats-bar-bottom.png
    2  0.06   2.8  0.1 /images/captive-go.png
    2  0.04  11.5  0.2 /images/deal_bottom.png
    2  0.03   1.6  0.0 /images/signup_short.png
    2  0.04   2.7  0.1 /account/
    2  0.22   5.7  0.6 /deals/buy
    2  0.03   1.6  0.0 /images/signup_shorter.png
    2  0.06  72.9  2.1 /static/team/2011/0505/13046351968837.jpg
    2  0.05  66.5  1.7 /static/team/2011/0505/13046351962970.jpg
    1  0.02   0.8  0.0 /order/pay
    1  0.02   5.5  0.1 /static/css/i/bg-dashboard-tab.gif
    1  0.03  52.0  1.3 /images/captive-white.jpg

The top entry in that list is the image that went out in the email. We can probably host that elsewhere, but whatever.

vmstat:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 1406424  14264 101688    0    0     2     4   29   10  1  0 99  0

I don't know how to track down where all the slowness is coming from. It's not a very complex site from a DB perspective. Any leads?

Your CPU usage is low. You have plenty of free RAM. Your apache req/s is low.

Why do you say / how do you see that the website is slow?

Also, it's normal that the cache takes a lot of memory. Linux uses free memory as a cache, but that memory is still free for applications to allocate if needed.

To see your real memory consumption, look at the free column of the -/+ buffers/cache row in the sample output below:

$ free -m
             total       used       free     shared    buffers     cached
Mem:           490        410         79          0         18        238
-/+ buffers/cache:        153        337
Swap:          511         12        499

Here the free memory is 337 MB, not 79 MB.
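If you want that number without reading it off by eye, here is a rough Python sketch that adds up free, buffers, and cache. It assumes /proc/meminfo-style text as input. The sample values are the figures from the top output posted earlier, rearranged into that layout by me, and the function name is my own. Newer kernels also report a MemAvailable field, which is a better estimate when it exists.

```python
def reclaimable_free_kb(meminfo_text):
    """Return MemFree + Buffers + Cached (in kB) from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value kB" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields["MemFree"] + fields["Buffers"] + fields["Cached"]


# Figures taken from the top output above (in kB), laid out like /proc/meminfo.
sample = """MemTotal:  1668552 kB
MemFree:   1409616 kB
Buffers:     13996 kB
Cached:     101232 kB
"""

print(reclaimable_free_kb(sample))  # 1524844
```

On a live box you would pass in open("/proc/meminfo").read() instead of the sample string.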

That's what I'm wondering too.

However, YSlow still gives lots of suggestions for your site when looking at this page:


Take a look at Page Speed too; it's another extension for Firefox.

I'd also consider making those dynamic pages into static pages for use in your email blast if that really nails the server. That way you're not hitting the DB for those requests.

Just some ideas.

The problem has already resolved itself as the traffic has died down. At its peak (shortly after the email campaign was sent), it would take up to 30 seconds for the homepage to load.

Watching the requests in Firebug, it's the images, CSS, and JS that were taking an extremely long time to load, which leads me to think it's not the DB that's the problem, but rather serving the actual resources over HTTP.

Are you using apache2?

Could you share the mpm-prefork or mpm-worker section of your apache2.conf?

Is your bandwidth throttled?

2.5 req/sec is absolutely nothing… a machine with 512 MB of RAM can easily handle 100 req/sec.

Thanks for all your help so far.

Here's my apache2 conf file. I didn't build the site, so there very well might be some crazy stuff in there…

Try setting MaxRequestsPerChild to something other than 0.

Looks like with a setting of 0, the child processes never expire. I believe the default listed in the Apache 2.2 documentation is 10000.

Also, MaxClients looks like it was probably set for a 512 node. Since you have a lot more resources available to you now, maybe try bumping up MaxClients a little bit at a time and see how things go.
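As a starting point for that bump, a common rule of thumb with prefork is to divide the RAM you can spare for Apache by the size of one Apache child. Here's a minimal Python sketch of that arithmetic. The 512 MB set aside for MySQL and the OS is purely my assumption, the 12 MB per child is the largest RES value in the top output above, and the function name is made up. RES counts shared pages and the processes were idle when measured, so treat the result as a rough figure and still increase gradually.

```python
def estimate_max_clients(total_ram_mb, reserved_mb, per_child_mb):
    """Rough upper bound on prefork MaxClients from available RAM.

    total_ram_mb: physical RAM on the box
    reserved_mb:  RAM kept back for MySQL, the OS, and so on (an assumption)
    per_child_mb: resident size of one apache2 child, e.g. RES from top
    """
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    usable_mb = total_ram_mb - reserved_mb
    return max(int(usable_mb // per_child_mb), 0)


# 1668552 kB total from top is about 1629 MB; 512 MB reserved is a guess.
print(estimate_max_clients(1629, 512, 12))  # 93
```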

You can test your server, maybe from a separate Linode, using ab (Apache Bench)

Do you have any idea how many people are hitting the server when the email goes out?

For example:

ab -n 1000 -c 10
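Note that ab also expects the URL to test as its last argument. If ab isn't handy, the same idea (a fixed number of requests with a fixed number in flight at once) can be sketched in Python. This is only a rough stand-in and not how ab works internally: the function names are mine, it uses threads and urllib from the standard library, and it will be much slower than ab at generating load.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_url(url):
    """Fetch one URL and discard the body; raises on HTTP or network errors."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        resp.read()


def run_load(fetch, total_requests=1000, concurrency=10):
    """Call fetch() total_requests times using `concurrency` worker threads.

    Returns (sorted latencies in seconds for successful calls, failure count).
    """
    def timed_call(_):
        start = time.perf_counter()
        try:
            fetch()
            ok = True
        except Exception:
            ok = False
        return time.perf_counter() - start, ok

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    latencies = sorted(t for t, ok in results if ok)
    failures = sum(1 for _, ok in results if not ok)
    return latencies, failures
```

To mirror the ab example, call run_load(lambda: fetch_url(target_url), 1000, 10), where target_url is whatever page you would have given ab, then look at the middle and the tail of the returned latencies.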

And perhaps take a look at this thread:

obs posted some good insights there too.


Woo fame! :D


