MySQL dies every 12-24 hours, requires manual restart
I am using 1GB Linode and running a single Wordpress website on it. Average traffic is about 100 visitors/300 views per day. Since day one I've been having "Error establish connection to database" issue that repeats normally every 24 hours. Sometimes it does not happen for 3-4 days, but inevitably comes back. I have to restart mysql through "sudo service mysql restart" in order to get the website online again.
As I started working on this issue, I installed Mysqltuner and found that my maximum possible memory usage was 170%. I adjusted config to make it 70%, then 50% and now 29%, but the issue is still coming back.
I made a screenshot of Longview exactly at the moment where the issue happened (you can see drop of CPU/memory usage afterwards). It also appears that there was a big surge in swap usage just before the crash.
Also attaching a screenshot of mysqltuner output, right after the restart I did this morning.
Any advice would be greatly appreciated, I've been struggling with this for 2 months now, but no matter how much I research online and try to adjust my Linode/MySQL config, the issue persists.
Disable the OOM killer.
Switch to nginx + PHP-FPM if you are still using Apache.
Add more swap.
Disable the OOM killer.
Switch to nginx + PHP-FPM if you are still using Apache.
Apache is perfectly capable of serving 300 page views a day with a typical WordPress setup.
But you're right to point your finger at Apache and the OOM killer. MySQL is probably an innocent victim in this case, since it is configured to use no more than 297M of RAM, and with only 21M of data, it will never reach the maximum anyway. The spike in memory usage and swap thrashing suggests that Apache ate all the RAM, leaving no room for MySQL. This is typical for an Apache+PHP installation with default settings.
OP: go into your Apache configuration and find a line that says MaxClients or MaxRequestWorkers, under the mpm_prefork section. (Depending on your Linux distribution, this might be in its own file named mpm something.) Reduce this setting to 10 or less. Also, find a setting that says KeepAlive and turn it off. Restart Apache. Hopefully it will stop trying to eat all your RAM anymore.
I have been setting up Apache following this guide:
(I am using Ubuntu)
So KeepAlive was already Off, but MaxClients was set to 30. I have changed it to 8.
Here are my current settings:
I have done a lot of research about configuring MySQL, as I was certain that it is responsible for crashes (I've been told in Linode chat). However, I did not even look into Apache configuration, other than following that Linode guide. Thanks a lot for pointing me in the right direction, I will learn more about configuring Apache now.
Hopefully MaxClients set to 8 will be enough to solve the issue.
P.S. Website crashed again this morning (before I adjusted config), same surge in swap usage in Longview charts. Apache count (with user "www-data") in process explorer was always 15-20, using at least 600MB of memory. After the suggested config adjustment total memory usage went from 819MB to 178MB.
We can't rewrite applications so that they only ask for the memory they really need.
But we can give them all the virtual memory they ask. Some of it in real RAM, some of it in swap.
If applications ask for double the amount of RAM they really need, the logical thing to do is to have as much swap as RAM.
If applications ask for three times the amount of RAM they really need, the logical thing to do is to have a swap partition to be twice the size of RAM.
Increasing swap and disabling the OOM killer will make the system to run out of memory when it's really out of memory, not before based on some strange algorithm.
Not having enough swap makes the system not to be used at its full potential.
This is regardless of any other consideration. I will not claim that increasing swap will solve the real or the "underlying" problem. But between having a system that runs a little bit slow and having a system which does not work because processes die randomly, I wonder why so many people choose a system which does not work.
between having a system that runs a little bit slow and having a system which does not work because processes die randomly, I wonder why so many people choose a system which does not work.
More realistically, the choice is between a system that occasionally gets VERY slow (because swapping on a virtual machine is VERY slow) and a system that works perfectly fine 99.99% of the time (because the memory usage of well-known apps is highly predictable).
To extend the airliner analogy from your link, I'm not gonna fly around with a bunch of tanker trucks parked on top of the wings just because my plane might run out of fuel. A good pilot knows roughly how much fuel he will need to fly from A to B with X passengers and Y tons of cargo, taking into account the current weather and a comfortable margin of error. A good sysadmin does the same, only with RAM instead of jet fuel.
To extend the airliner analogy from your link, I'm not gonna fly around with a bunch of tanker trucks parked on top of the wings just because my plane might run out of fuel.
I think this is no more a good analogy, because a bunch of tanker trucks over the wings would make the plane to have a lot more weight, which in turn would make it to need more fuel per hour while flying.
Swap does not work that way. Unused swap does not make the machine to go slower or to be more CPU or memory consuming,
so a little bit more of swap does not harm and makes the possibilify of crash a little bit more unlikely. It's just a safety net.
We don't want any of our passengers to be ejected in a random fashion. We want them all to reach their destination. So my suggestion about OOM killer and swap translate to disabling the eject mechanism and ensuring that the plane has enough fuel (even if most of the time you have a lot more fuel than needed). This will not prevent the plane to crash for other reasons, but at least lack of fuel in normal circumstances and the evil eject mechanism will no longer be the reason.
Both your inputs are greatly appreciated, thanks a lot for adding to the discussion! Even though simply reducing max apache count from 30 to 8 seemed to fix the issue (the system never before went so long without crashing) I am still reading with interest about OOM killer and swap usage in my server. I am not a sysadmin, and these are all new concepts to me, so I am really grateful that you are pointing me in the right direction to keep on learning how to keep my Linode stable at all times.
Flying analogy makes total sense, thanks a lot for the link Sanvila.
A bit offtopic, but would you be able to recommend any book/online resource that teaches server administration (Linux/Ubuntu) starting from the basics? I do not want to become advanced sysadmin, but I do want to know what I am doing on my server. So far I was able only to follow instructions with vague understanding why I am doing this or that.
Is it normal, however, that apache2 process count is always at 8 now? Doesn't it mean that it is always limited with my settings? When it was limited to 30, apache2 process count was normally at 16.
Yes, limiting the number of Apache processes that can run at the same time is the whole point of "MaxClients 8".
Of course, this also limits the number of people who can visit your site at the same time. Which is why I also recommended "KeepAlive off".
Normally, Apache keeps a connection open for a few seconds in case the visitor tries to request another page. But this ties up an entire process, like making everyone else wait in line at the cashier just so you can grab another carton of milk from the back. "Keepalive off", on the other hand, cuts off the connection as soon as the visitor has been served, so that they can't keep a connection open for any longer than they absolutely need to. Then your 8 processes are free to serve another batch of 8 visitors.
If a visitor wants to request another page, they'll have to go to the back of the queue and wait their turn. But since the server is now running much more efficiently, each visitor gets served very quickly and the queue moves quickly, too. Some visitors will have to wait a fraction of a second longer to be served, but the overall experience is improved for everyone. This is how you serve lots of visitors with a small server, at least with the typical Apache+mpmprefork+modphp setup. (The equations are different with nginx.)
This is how you serve lots of visitors with a small server, at least with the typical Apache+mpmprefork+modphp setup. (The equations are different with nginx.)
The bigger question is why people still use prefork on a small VPS to begin with. Unless your site has virtually no static files, Apache is basically going to fire up the PHP interpreter for every single request which is a terrible waste of resources. When I switched to PHP FPM and the worker mpm, I haven't looked back.
If you're memory bound, KeepAlive is probably better off but if you run a different mpm, at least then you can use KeepAlive to improve site performance.
nginx is even better in this respect though as hybinet points out, Apache is perfectly capable of running a small site but it needs to be tuned.
I would bet the reason you're seeing those spikes is most likely script kiddies trying to brute force your logins and otherwise hack your WordPress install. I host a few WP sites myself and I get thousands of these attempts per day. WordPress is the most attacked CMS so if you want to host it yourself, you need better tools to fight back. If you can, I'd recommend only allowing access to wp-admin/ by IP address. That'll cut down on the number of attempts tremendously.
nginx is easier to rate limit out of the box so you can more easily block or slow down these requests though so they never hit your backend and flood you server. Its fastcgi_cache is a godsend and if your WP site is mostly static pages anyway, you can really crank up the cache time value. If your site suddenly gets a lot of traffic, you don't have to do a single thing to Apache, since nginx is handling the request out of its static cache.