Combating referrer spam
I'm currently on a Linode 512 running 32-bit Ubuntu 10.04.3 LTS, hosting several sites, most of which are WordPress installations. I have gone through several tuning tutorials to ensure that my LAMP stack is configured appropriately. Additionally, the WordPress installations are caching with XCache, courtesy of W3 Total Cache.
I believe Apache is being hammered by referrer spam bots, and I can't seem to control it, even with fail2ban. What else can I do to determine the true cause of the issue, and how should I handle it?
This setup has worked quite well for some time. However, one of the WordPress installations is being targeted by referrer spam, which I see in Apache's access.log for that particular vhost. The request rate doesn't appear to be substantial enough to bring the server down (generally 250-700 requests per hour). I've dug a bit deeper into access.log, using awk to get a list of the offending referrers and the IPs that are submitting them. I've tried banning several class B IP ranges in iptables via ufw to prevent the connections from even reaching Apache. Furthermore, I've installed fail2ban and enabled the default Apache protection configuration (along with SSH).
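For anyone wanting to reproduce the referrer/IP tally described above without awk, here is a minimal Python sketch. It assumes the vhost logs in Apache's "combined" format; the function name `tally` is just an illustration, not something from the original setup.

```python
import re
from collections import Counter

# Assumed layout (Apache "combined" format):
# ip ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def tally(lines):
    """Count requests per referrer and per (referrer, IP) pair."""
    by_referer = Counter()
    by_pair = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        ref = m.group("referer")
        if ref in ("", "-"):
            continue  # the client sent no referrer
        by_referer[ref] += 1
        by_pair[(ref, m.group("ip"))] += 1
    return by_referer, by_pair
```

Passing an open log file to `tally` and printing `by_referer.most_common(20)` gives the busiest referrers; `by_pair` shows which IPs are sending each one.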
However, from time to time (almost daily), the site still gets hammered, spawning lots of Apache processes that chew up available memory. This leads to heavy swapping, which sends IO through the roof (10K+ blocks/sec). This is reflected in the Linode dashboard graphs and brings the server to such a grinding halt that I can barely SSH in. When I have been able to get in, I've confirmed with iotop that apache2 is indeed thrashing.
Generally, my only course of action is to bounce the entire box, which resolves the issue until the IO shoots through the roof again. I can't seem to pin down a particular time of day when this occurs, but when it does happen, it's generally in the morning. I've tried validating that the server is running efficiently by running
ab -n 1000 -c 10 http://www.theproblematicsite.com/
and the results look extremely promising (though potentially skewed, since I ran it on localhost). It's reporting 585 requests/sec, with the longest request at 61ms.
At this point, I have no recourse except to monitor iotop and attempt to bounce either Apache (I don't know whether that is effective, since I usually can't even SSH in) or the whole server. I can throw this together and schedule it with cron, but it's a band-aid on a bullet wound.
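As a rough illustration of the cron band-aid mentioned above, a watchdog could key off swap usage rather than iotop output. This is only a sketch: the function names and the 50% threshold are hypothetical, and it assumes the standard Linux `/proc/meminfo` layout.

```python
def parse_meminfo(text):
    """Parse the contents of /proc/meminfo into a dict of integer kB values."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def swap_used_fraction(info):
    """Fraction of swap in use, or 0.0 when no swap is configured."""
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return float(total - info.get("SwapFree", 0)) / total


def should_restart(info, swap_limit=0.5):
    """True once swap usage reaches swap_limit (a hypothetical threshold)."""
    return swap_used_fraction(info) >= swap_limit
```

A cron job could read `/proc/meminfo`, pass it to `parse_meminfo`, and run `service apache2 restart` when `should_restart` returns True. That only shortens the outage; it does nothing about why Apache grows into swap in the first place.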
All of this has led me here, to see whether there's anything else I can do to determine the root cause of the issue and how to handle it.
I imagine you're using Apache prefork, which chews up RAM even for static files.
If you want more advanced solutions, you can switch to mpm-worker or nginx and split the web server from PHP so that static files don't load the PHP binary.
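To make the prefork memory point concrete, here is a back-of-the-envelope sizing sketch. It uses the common rule of thumb that the number of prefork children should fit in the RAM left over after everything else; the 200 MB reserved for MySQL and the OS and the 25 MB per Apache child are assumed figures for illustration, not measurements from this server.

```python
def max_clients(total_mb, reserved_mb, per_process_mb):
    """Rough upper bound on prefork children that fit in RAM without swapping."""
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    available = total_mb - reserved_mb
    # Always allow at least one child, even if the estimate says otherwise.
    return max(1, int(available // per_process_mb))


# Linode 512 with assumed figures: 200 MB reserved, 25 MB per child.
print(max_clients(512, 200, 25))  # 12
```

If MaxClients in the prefork configuration is well above an estimate like this, a modest burst of slow requests can be enough to push the box into swap. The real per-child size is best measured from the RSS column of `ps` for the apache2 processes.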
* Yes, I'm using prefork. I've entertained the idea of testing out Nginx, but I don't want to use the nuke option if unnecessary.
As for static files, I'm using Amazon CloudFront for most static data (images, minified CSS, etc.) on the most traffic-heavy site, though this isn't the site being spammed. I can enable this on the problematic site, but I'm hesitant to bless this as The Fix.
I'm going to try using apachetop to monitor the server when it gets hammered, but it would be easier if I could schedule a dump of the results to a text file via cron. However, I don't see a non-interactive/batch mode like the one iotop has. Is there a way to emulate this behavior so that I can see results retroactively?
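One way to approximate a batch mode is to skip apachetop and summarize the tail of access.log directly. The sketch below is not apachetop's own output; it assumes common/combined log format, and `snapshot` with its parameters is a hypothetical helper. It reports the top IPs, URLs, and referrers in the last few minutes before the newest log entry.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Matches common and combined log formats; the referrer field is optional.
LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" \d{3} \S+'
    r'(?: "(?P<referer>[^"]*)")?'
)


def parse(line):
    """Return (timestamp, ip, url, referrer) or None for unparsable lines."""
    m = LINE.match(line)
    if not m:
        return None
    # Drop the UTC offset; entries in one log normally share the server's zone.
    stamp = datetime.strptime(m.group("time").split()[0], "%d/%b/%Y:%H:%M:%S")
    parts = m.group("request").split()
    url = parts[1] if len(parts) > 1 else m.group("request")
    return stamp, m.group("ip"), url, m.group("referer") or "-"


def snapshot(lines, minutes=5, top=10):
    """Text summary of the last `minutes` before the newest log entry."""
    rows = [r for r in map(parse, lines) if r]
    if not rows:
        return "no parsable entries\n"
    newest = max(r[0] for r in rows)
    cutoff = newest - timedelta(minutes=minutes)
    recent = [r for r in rows if r[0] >= cutoff]
    out = ["%s  %d requests in last %d min" % (newest.isoformat(), len(recent), minutes)]
    for title, idx in (("IP", 1), ("URL", 2), ("Referrer", 3)):
        out.append("top %s:" % title)
        for value, n in Counter(r[idx] for r in recent).most_common(top):
            out.append("  %6d  %s" % (n, value))
    return "\n".join(out) + "\n"
```

A small wrapper script that calls `snapshot` on the vhost's access.log and appends the result to a text file could then be run from cron every few minutes, leaving a record to read after the server recovers.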