Optimize a 540: Tuning PHP/cgi+Apache/worker+APC+fcgid+MySQL
I'm running a Linode 540. It's currently running 32-bit openSUSE 11.1:
uname -a
Linux dev 2.6.27.29-0.1-xen #1 SMP 2009-08-15 17:53:59 +0200 i686 i686 i386 GNU/Linux
and its primary utilization is as a Drupal (v6.14) server, running on top of:
* PHP 5.3.0 (fastcgi)
* Apache 2.2.13/worker-mpm
* mod_fcgid (apache svn/trunk r816755, == '2.3.2-dev')
* APC 3.1.3p1
* MySQL v5.1.36
Details of my current config follow below. I've yet to find a resource that discusses this particular scenario, and even for similar ones, for each suggestion to "do X" there's another site suggesting exactly the opposite. Translated: "much voodoo and mojo required!" :-/ That said, it works -- so far -- well enough.
In my experience this is, generally, NOT an uncommon config. What IS less common is running it all on a RAM/CPU-limited Linode 540 … rather than, e.g., a standalone box with 4 dedicated CPU cores & 8 GB RAM.
So, "The Question" is … for a "small-to-moderate" (yes, that's subjective …) site, running on a Linode 540, how can/should the PHP/Apache/FCGId config be "optimized" to get the most bang for the buck?
I'll be very interested in any/all comments, suggestions, experience, etc for doing this right @ Linode. Yes, it seems that these configs are very usage-specific, and ultimately benchmarking is, of course, needed. Hopefully, though, others will share some interest here.
Thanks!
php-cgi5 -v
PHP 5.3.0 (cgi-fcgi) (built: Sep 8 2009 16:47:38)
Copyright (c) 1997-2009 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2009 Zend Technologies
with Xdebug v2.0.5, Copyright (c) 2002-2008, by Derick Rethans
with Suhosin v0.9.29, Copyright (c) 2007, by SektionEins GmbH
httpd2 -V
Server version: Apache/2.2.13 (Linux/SUSE)
Server built: Aug 10 2009 02:14:02
Server's Module Magic Number: 20051115:23
Server loaded: APR 1.3.8, APR-Util 1.3.9
Compiled using: APR 1.3.8, APR-Util 1.3.9
Architecture: 32-bit
Server MPM: Worker
threaded: yes (fixed thread count)
forked: yes (variable process count)
Server compiled with....
-D APACHE_MPM_DIR="server/mpm/worker"
-D APR_HAS_SENDFILE
-D APR_HAS_MMAP
-D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
-D APR_USE_SYSVSEM_SERIALIZE
-D APR_USE_PTHREAD_SERIALIZE
-D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
-D APR_HAS_OTHER_CHILD
-D AP_HAVE_RELIABLE_PIPED_LOGS
-D DYNAMIC_MODULE_LIMIT=128
-D HTTPD_ROOT="/srv/www"
-D SUEXEC_BIN="/usr/sbin/suexec2"
-D DEFAULT_PIDLOG="/var/run/httpd2.pid"
-D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
-D DEFAULT_ERRORLOG="/var/log/apache2/error_log"
-D AP_TYPES_CONFIG_FILE="/etc/apache2/mime.types"
-D SERVER_CONFIG_FILE="/etc/apache2/httpd.conf"
mysql -V
mysql Ver 14.14 Distrib 5.1.36, for suse-linux-gnu (i686) using readline 5.2
cat server-tuning.conf
<ifmodule mod_worker.c="">StartServers 1
MinSpareServers 1
MaxSpareServers 2
MaxClients 8
ThreadsPerChild 4
MinSpareThreads 4
MaxSpareThreads 8
MaxRequestsPerChild 2000
ThreadLimit 64
ThreadStackSize 1048576
</IfModule>
KeepAlive On
MaxKeepAliveRequests 10
KeepAliveTimeout 120
EnableMMAP off
EnableSendfile off
LimitRequestBody 1048576
AddDefaultCharset utf-8
AddEncoding x-compress .Z
AddEncoding x-gzip .gz .tgz
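Whether MaxClients and the fcgid process counts actually fit in a 540's RAM comes down to per-process footprint; a hedged way to check is to total resident set sizes for the binaries named in this setup (httpd2 / php-cgi5 here; adjust if yours differ):

# sum RSS per binary while the box is serving traffic
ps -C httpd2,php-cgi5 -o rss=,comm= \
  | awk '{sum[$2]+=$1} END {for (c in sum) printf "%s: %.1f MB\n", c, sum[c]/1024}'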
cat conf.d/mod_fcgid.conf
<ifmodule mod_fcgid.c="">Options +ExecCGI
PHP_Fix_Pathinfo_Enable 1
SharememPath /var/cache/apache2/fcgid_shm
SocketPath /var/cache/apache2
BusyScanInterval 120
BusyTimeout 300
DefaultInitEnv PHP_FCGI_CHILDREN 1
DefaultInitEnv PHP_FCGI_MAX_REQUESTS 1000
DefaultMaxClassProcessCount 2
DefaultMinClassProcessCount 2
ErrorScanInterval 3
IdleTimeout 300
IdleScanInterval 120
IPCCommTimeout 120
IPCConnectTimeout 60
# MaxProcessCount 2000
OutputBufferSize 64
ProcessLifeTime 3600
SpawnScore 1
SpawnScoreUpLimit 10
TerminationScore 2
ZombieScanInterval 3
</IfModule>
cat vhosts.d/master.conf
...
Include /etc/apache2/conf.d/mod_fcgid.conf
...
AddHandler fcgid-script .php
FCGIWrapper "/usr/bin/php-cgi5 -d apc.shm_size=25 -c /etc/php5/fastcgi/" .php
...
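To confirm the wrapper's overrides actually reach PHP, you can run the same command line the FCGIWrapper uses and grep the phpinfo output (a sketch using the exact flags above):

/usr/bin/php-cgi5 -d apc.shm_size=25 -c /etc/php5/fastcgi/ -i \
  | grep -E 'Loaded Configuration File|apc.shm_size'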
cat /etc/php5/conf.d/apc.ini
extension=apc.so
apc.enabled="1"
apc.cache_by_default="1"
apc.shm_segments="1"
apc.ttl="7200"
apc.user_ttl="7200"
apc.gc_ttl="3600"
apc.num_files_hint="1024"
apc.mmap_file_mask="/tmp/apc.XXXXXX"
apc.enable_cli="0"
apc.slam_defense="0"
apc.file_update_protection="2"
apc.max_file_size="1M"
apc.stat="1"
apc.write_lock="1"
apc.report_autofilter="0"
apc.include_once_override="0"
apc.rfc1867="0"
apc.rfc1867_prefix="upload_"
apc.rfc1867_name="APC_UPLOAD_PROGRESS"
apc.rfc1867_freq="0"
apc.localcache="0"
apc.localcache.size="512"
apc.coredump_unmap="0"
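To see whether this cache is actually getting hit, APC ships an apc.php status page that can be dropped into the docroot (password-protect it!). The source path below is a guess; it varies by distro/package:

# location of apc.php is an assumption; find it via the APC package or source
cp /usr/share/doc/packages/php5-APC/apc.php /srv/www/htdocs/apc-status.php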
13 Replies
@PG1326:
DefaultMaxClassProcessCount 2
DefaultMinClassProcessCount 2
This is the only part that jumps out at me as being possibly bad.
If I understand correctly, your site is almost entirely Drupal-based, but here you're effectively limiting yourself to two simultaneous Drupal requests. Your 540 can handle more than that!
I wouldn't set a maximum of less than 4, and I'd suggest trying at least 8 and seeing how well it works. I also wouldn't have it pre-spawn less than 4.
And remember, even though you're sharing the host with other people, you can use up to four full CPU cores if it's available, and it often will be.
@nknight:
If I understand correctly, your site is almost entirely Drupal-based,
That's correct (for the moment …)
@nknight:
but here you're effectively limiting yourself to two simultaneous Drupal requests. Your 540 can handle more than that!
I wouldn't set a maximum of less than 4, and I'd suggest trying at least 8 and seeing how well it works. I also wouldn't have it pre-spawn less than 4.
Per your suggestion, I've changed to
DefaultMaxClassProcessCount 8
DefaultMinClassProcessCount 4
and, so far, no problems.
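A quick way to watch whether those limits behave as intended (a sketch; binary name per this setup):

# refresh the php-cgi process list and memory picture every 5 seconds
watch -n 5 'ps -C php-cgi5 -o pid=,rss=,etime=; free -m'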
@nknight:
And remember, even though you're sharing the host with other people, you can use up to four full CPU cores if it's available, and it often will be.
That implies that core usage is controllable by me … what specific setting are you referring to?
Fwiw, the largest 'effect' I've seen so far in terms of memory usage / performance is from
reducing swappiness, in /etc/sysctl.conf:
vm.swappiness = 0
and adding a cronjob to flush caches:
45 * * * * sync; echo 3 > /proc/sys/vm/drop_caches
atm, that results in
free -m
total used free shared buffers cached
Mem: 546 171 374 0 2 81
-/+ buffers/cache: 87 459
Swap: 1023 0 1023
which appears encouraging …
I'm still seeing a not-insignificant startup delay at first access to the site. I think this has to do with preload of PHP children by fcgid … namely, I've
DefaultInitEnv PHP_FCGI_CHILDREN 1
DefaultInitEnv PHP_FCGI_MAX_REQUESTS 1000
and wonder if this is right for this usage … thoughts?
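One way to put a number on that first-hit delay (a sketch; the hostname is a placeholder, and -k is only needed if the cert is self-signed):

# first request pays the fcgid/PHP spawn cost; repeat to compare a warm hit
time curl -sk -o /dev/null https://example.com/
time curl -sk -o /dev/null https://example.com/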
thanks!
@nknight:
I also wouldn't have it pre-spawn less than 4.
Checking on the pre-spawning, I found a mailing-list post which purports:
> YOU CAN NOT PRESPAWN FCGID.

mod-fastcgi has an option to do this but mod-fcgid does not.
I'll dig some more … but, AFAYK, has that changed?
"Re: [Mod-fcgid-users] Spawning explanation"
there's an interesting discussion about setting DefaultMaxClassProcessCount & DefaultMinClassProcessCount (modfcgid forking) "versus" PHPFCGI_CHILDREN (php forking).
iiuc, php forking doesn't work (well?) with mod_fgid.
BUT, if PHPFCGICHILDREN is not used (unclear whether that's ==1, ==0, or really rm'd) and instead forking control is handed over to mod_fcgid, one loses shared memory capability (APC, Xcache, etc).
is that really true? investigating further …
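One crude way to probe the sharing question on a live box: this apc.ini sets apc.mmap_file_mask="/tmp/apc.XXXXXX", so each php-cgi process maps a file-backed segment that shows up in /proc/<pid>/maps. If every process maps the same file, the cache is shared; a different file per PID would mean per-process caches. A sketch (process name per this setup):

# one distinct /tmp/apc.* mapping shared by all PIDs = shared cache;
# a different mapping per PID = separate caches (no sharing)
for p in $(pgrep php-cgi5); do echo "== $p"; grep apc /proc/$p/maps; done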
FastCGI with a PHP APC Opcode Cache
reading,
> "… The maxClassProcesses option is very important: it tells FastCGI to only spawn one php-cgi process regardless of how many requests are pending. Remember that our PHP process will spawn its own children, so FastCGI only needs to spawn one. Until this APC bug is fixed, this is necessary to allow sharing the APC cache among children."
he's set
maxClassProcesses 1
checking the referenced bug,
http://pecl.php.net/bugs/bug.php?id=11988
it's unclear whether APC 3.1.3p1 (which I use) includes a fix yet; a comment therein does refer to an external solution,
PHP-FPM: PHP FastCGI Process Manager
PHP-FPM is a patch for PHP4/5 to greatly improve PHP's FastCGI SAPI capabilities and administration
http://php-fpm.org/WhatisPHP-FPM
but at least as of Aug 6, 2009
http://michaelshadle.com/category/php-fpm/
it was not PHP 5.3.0-friendly …
so, if we believe THIS scenario, it seems,
DefaultInitEnv PHP_FCGI_CHILDREN 5
DefaultInitEnv PHP_FCGI_MAX_REQUESTS 500
DefaultMaxClassProcessCount 1
DefaultMinClassProcessCount 1
is the best (perhaps required) config for APC to be useful.
I think :-/
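If PHP really is the one doing the forking, the process table should show a single php-cgi parent with the PHP_FCGI_CHILDREN workers beneath it; a quick hedged check (binary name per this setup):

# expect one parent and 5 children if PHP_FCGI_CHILDREN=5 is in effect
ps -C php-cgi5 -o pid,ppid,rss,cmd --forest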
@PG1326:
@nknight: And remember, even though you're sharing the host with other people, you can use up to four full CPU cores if it's available, and it often will be.
That implies that core usage is controllable by me … what specific setting are you referring to?
No, what nknight means is that you have access to four cores on the host machine. At a minimum, you are guaranteed a proportional amount of CPU time (e.g., if every Linode on the same host were running full bore, something like while true; do :; done). However, if other Linodes on your host are using less than their full allotment (which they are most of the time), you can use more than your full allotment, up to 400% CPU. No configuration is necessary on your part (although the application must be either multi-threaded or split among multiple processes to be able to take advantage of more than one core).
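There's no setting involved; the guest simply sees its virtual CPUs and the Xen scheduler maps them onto host cores as capacity allows. You can check what's presented from inside the Linode:

grep -c ^processor /proc/cpuinfo   # count of vCPUs visible to the guest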
Sorry, can't be of any help on the FCGI stuff.
It's been quite a while since I dealt much with either of the Apache FastCGI modules (these days I can usually get away with alternate solutions that let me use, e.g., mod_php/python/perl), so the advice you've found about how exactly to configure the number of processes stands a better chance of being right than I do.
It seems you've worked out a good configuration (or at least a working one you can keep tweaking if needed) using PHP's built-in facilities, but out of curiosity I went hunting a bit and it does seem that mod_fcgid indeed has no ability to pre-spawn processes.
It looks like DefaultMinClassProcessCount 'n' just means that it will never kill processes if the number of processes is 'n' or less. This is something of an oddity, particularly in the context of Apache, where normally such an option would mean that 'n' processes would always be running, even at initial startup.
If you're using fastcgi, you may want to look into lighttpd or nginx. That isn't exactly an optimization of Apache, but unless you need some functionality of Apache that can't be replicated with lighty or nginx, you'd probably save some RAM; you're already paying the overhead of FCGI.
The reason most people here frequently recommend Lighttpd/nginx is because they fit in the amount of memory provided by a linode quite well out-of-the-box. For example, lighttpd is a uniprocess server, and so doesn't need to have various thread/process related things tweaked. Most tweaking of lighttpd performance has to do with just picking the right max connections setting, and the default is usually good for most cases anyhow.
@PG1326:
Fwiw, the largest 'effect' I've seen so far in terms of memory usage / performance is from
reducing swapiness,@ /etc/sysctl.conf
vm.swappiness = 0
and adding a cronjob to flush caches
45 * * * * sync; echo 3 > /proc/sys/vm/drop_caches
atm, that results in
free -m
total used free shared buffers cached
Mem: 546 171 374 0 2 81
-/+ buffers/cache: 87 459
Swap: 1023 0 1023
which appears encouraging …
Is this really improving performance for you? Far from encouraging, I'd call that very discouraging. You're flushing out a lot of cached disk reads, which is only going to hurt performance later on.
You're only using 87 MB of RAM, might as well let the system use the rest of it to do something useful.
~JW
@Guspaz:
If you're using fastcgi, you may want to look into lighttpd or nginx
Unfortunately, Drupal is still a kludge (mod_rewrite? more?) on both lighttpd and nginx … _not_ using Apache adds too many 'gotchas' (for now …)
@JshWright:
Far from encouraging, I'd call that very discouraging.
Encouraging only in the sense that I'd misunderstood Linux caching :-/
@JshWright:
You're flushing out a lot of cached disk reads, which is only going to hurt performance later on.
You're only using 87 MB of RAM, might as well let the system use the rest of it to do something useful.
You're absolutely right here. I stopped using that "sync; echo 3 > /proc/sys/vm/drop_caches".
I stepped back and took a look at the whole system, rather than just the (supposed) APC performance.
Realized that Apache's 'fat', mod_cache & mod_ssl are sloooow, and learned that Drupal's use of caching is underperforming …
So, I made some changes:
* Drupal 6.14 -> Pressflow 6
* installed the Drupal CacheRouter module, configured for APC
* installed Pound as front-end/proxy, installed SSL certs there
* got rid of Apache mod_ssl, mod_cache, mod_file_cache & mod_disk_cache
* installed Varnish as a caching reverse proxy (a minimal VCL sketch follows)
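For reference, a minimal sketch of the Varnish piece of a stack like this (Varnish 2.x VCL; the backend port and the static-asset rule are illustrative assumptions, not necessarily the exact config in use):

# Pound terminates SSL on :443 and hands plain HTTP to Varnish, which
# fronts Apache; the :8080 backend port here is an assumption
backend apache {
    .host = "127.0.0.1";
    .port = "8080";
}
sub vcl_recv {
    # strip cookies on static assets so Drupal's session cookies
    # don't make every image/css/js request uncacheable
    if (req.url ~ "\.(png|gif|jpg|css|js|ico)$") {
        remove req.http.Cookie;
    }
}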
With that config, I'm starting to do some local hammering with httperf:
httperf --hog --http-version=1.1 \
--server=my.site.com --uri=/info.php \
--port=443 --ssl --ssl-no-reuse --ssl-ciphers=AES256-SHA \
--send-buffer=4096 --recv-buffer=16384 \
--num-calls=10 --num-conns=5000 --timeout=5 --rate=10
which shows a decent CPU load:
top - 18:35:55 up 21:58, 3 users, load average: 2.29, 8.01, 13.52
Tasks: 102 total, 2 running, 100 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.4%us, 0.3%sy, 0.0%ni, 97.8%id, 1.5%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 559976k total, 330100k used, 229876k free, 3028k buffers
Swap: 1048568k total, 274004k used, 774564k free, 56232k cached
mem's ok – much better, relatively, than before,
free -m
total used free shared buffers cached
Mem: 546 322 224 0 2 55
-/+ buffers/cache: 263 282
Swap: 1023 267 756
and, httperf returns
Maximum connect burst length: 1
Total: connections 5000 requests 49970 replies 49966 test-duration 499.964 s
Connection rate: 10.0 conn/s (100.0 ms/conn, <=109 concurrent connections)
Connection time [ms]: min 50.6 avg 485.9 max 14368.1 median 67.5 stddev 1537.5
Connection time [ms]: connect 102.5
Connection length [replies/conn]: 9.995
Request rate: 99.9 req/s (10.0 ms/req)
Request size [b]: 89.0
Reply rate [replies/s]: min 34.2 avg 99.9 max 223.8 stddev 18.9 (99 samples)
Reply time [ms]: response 28.7 transfer 9.4
Reply size [b]: header 266.0 content 75975.0 footer 0.0 (total 76241.0)
Reply status: 1xx=0 2xx=49966 3xx=0 4xx=0 5xx=0
CPU time [s]: user 156.91 system 308.77 (user 31.4% system 61.8% total 93.1%)
Net I/O: 7449.6 KB/s (61.0*10^6 bps)
Errors: total 4 client-timo 4 socket-timo 0 connrefused 0 connreset 0
Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0
This is at access of a verbose PHP page (the /info.php above), not reusing SSL session IDs, and hence renegotiating ...
If I switch to a 'lightweight' HTML page as --uri target, I'm seeing request rates ~400 req/s.
With the 'old' Apache config I'd had in place, I couldn't even get 10 req/s consistently ...
Note that, atm, this is testing from the Linode itself ... so performance is tainted by the load of the testing executable itself.
Now, to figure out how the rates I _am_ seeing compare to 'norms' ... for Linodes (somebody in here has to have checked at some point ...), and Drupal in general.
Oh, and I also did some MySQL tweaking (a sketch of the relevant my.cnf bits follows the list):
* put MySQL's /tmp in tmpfs (in /etc/fstab):
tmpfs /tmp/mysqltmp tmpfs rw,gid=105,uid=105,size=128M,nosuid,nodev,noexec,nr_inodes=10k,mode=0700 0 0
* monkeyed with (and continue to ...) cache, query, thread & buffer sizes
* increased thread_concurrency from 2 -> 8 (i.e., 2x the # of CPUs)
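A sketch of the relevant my.cnf bits; the values are illustrative assumptions, not the exact numbers settled on:

[mysqld]
tmpdir             = /tmp/mysqltmp   # the tmpfs mount from fstab above
thread_concurrency = 8               # 2x the number of CPUs
query_cache_size   = 16M             # illustrative; tune against hit rate
key_buffer_size    = 16M             # MyISAM index buffer; ditto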
All together, a Flash-heavy, dynamic PHP page that had taken ~15 secs to load is coming up in ~1-2 secs now ...
Still more room to improve, I suspect ....
@PG1326:
Drupal is still a kludge (mod_rewrite? more?) on both lighttpd and nginx … _not_ using Apache adds too many 'gotchas' (for now …)
Personally, I've had no problems at all running Drupal over Nginx. If you're only after clean URLs, this is fairly trivial – just use something like the following:
if (!-e $request_filename) {
    rewrite ^/(.*)$ /index.php?q=$1 last;
}
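On nginx >= 0.7.27 the same thing can be done without the if block via try_files (a hedged alternative, equivalent for Drupal's clean URLs):

location / {
    try_files $uri $uri/ /index.php?q=$uri;
}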
That said, it seems like you're pretty set on Apache.
For the record, and using your httperf params from above (no real optimization, no APC), accessing a page containing phpinfo() over SSL, on a 360 also running MySQL, php-fastcgi, spamd and exim4:
Total: connections 5000 requests 50000 replies 50000 test-duration 499.990 s
Connection rate: 10.0 conn/s (100.0 ms/conn, <=2 concurrent connections)
Connection time [ms]: min 50.0 avg 87.8 max 175.2 median 87.5 stddev 4.4
Connection time [ms]: connect 40.8
Connection length [replies/conn]: 10.000
Request rate: 100.0 req/s (10.0 ms/req)
Request size [b]: 105.0
Reply rate [replies/s]: min 99.4 avg 100.0 max 100.4 stddev 0.1 (100 samples)
Reply time [ms]: response 1.9 transfer 2.8
Reply size [b]: header 182.0 content 52920.0 footer 2.0 (total 53104.0)
Reply status: 1xx=0 2xx=50000 3xx=0 4xx=0 5xx=0
CPU time [s]: user 150.35 system 333.25 (user 30.1% system 66.7% total 96.7%)
Net I/O: 5196.2 KB/s (42.6*10^6 bps)
Errors: total 0 client-timo 0 socket-timo 0 connrefused 0 connreset 0
Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0
And accessing a (thinly modified) default Drupal front page, still with the above httperf settings:
Total: connections 5000 requests 50000 replies 50000 test-duration 500.174 s
Connection rate: 10.0 conn/s (100.0 ms/conn, <=21 concurrent connections)
Connection time [ms]: min 222.8 avg 290.5 max 2257.2 median 274.5 stddev 123.4
Connection time [ms]: connect 19.2
Connection length [replies/conn]: 10.000
Request rate: 100.0 req/s (10.0 ms/req)
Request size [b]: 95.0
Reply rate [replies/s]: min 81.8 avg 100.0 max 117.8 stddev 2.6 (100 samples)
Reply time [ms]: response 27.1 transfer 0.0
Reply size [b]: header 487.0 content 6666.0 footer 2.0 (total 7155.0)
Reply status: 1xx=0 2xx=50000 3xx=0 4xx=0 5xx=0
CPU time [s]: user 103.38 system 295.68 (user 20.7% system 59.1% total 79.8%)
Net I/O: 707.6 KB/s (5.8*10^6 bps)
Errors: total 0 client-timo 0 socket-timo 0 connrefused 0 connreset 0
Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0
@PG1326:
@nknight: And remember, even though you're sharing the host with other people, you can use up to four full CPU cores if it's available, and it often will be.
Be careful with this. I don't remember exactly how Xen handles it, but most of the time VM hosts will schedule all of a VM's CPU "slices" synchronously, meaning that if you have 4 vCPUs set up, the host will wait to run that VM until it has room on all 4 real CPUs. On VMware, a lot of things actually end up running better with 1-2 "cores" on 4-16 core machines when they used to run on 4-8 core hardware pre-virtualization.