Transient server errors: "Error establishing a database connection"
Hi everyone, thanks for the help.
I have a bunch of small, low-traffic, non-commercial / hobbyist WordPress websites that were on shared cPanel hosting for many years, and I've recently moved them all to Linode. Thanks to the outstanding Linode guides and a few weeks of breaking things, googling, tinkering, fixing, &c. I've managed to get everything mostly done.
They are all also using Cloudflare's free plan for SSL.
Now that the sites are on my Linode, I'm starting to face an extremely annoying and worrying issue: transient server errors.
In waves, every 10-30 minutes or so, I'm getting notifications via Jetpack's monitoring service that my sites are down. When I try to access them, I get a blank "Error establishing a database connection" message.
Usually, as I try to dig in and solve the problem, the sites come back on their own. Each outage lasts somewhere between five and twenty minutes, but they have been recurring many times in a row for about a week now.
Full text of the error:
Error establishing a database connection
This either means that the username and password information in your wp-config.php file is incorrect or we can’t contact the database server at localhost. This could mean your host’s database server is down.
Are you sure you have the correct username and password?
Are you sure you have typed the correct hostname?
Are you sure the database server is running?
If you’re unsure what these terms mean you should probably contact your host. If you still need help you can always visit the WordPress Support Forums.
It's a WordPress error, which means that the request is reaching WordPress but the WordPress app is failing in some way (I think).
I don't think the issue could be a misconfigured wp-config.php, because the sites keep coming back and then disappearing after a few minutes without me touching wp-config.php. And whenever I'm not getting errors, they work completely fine.
Could Apache or MySQL on my Linode be constantly crashing and then restarting, with no involvement or intervention by me? How would I diagnose that, and what would cause it?
When I do curl -I https://example.com/, I get a 500 status code:
HTTP/1.1 500 Internal Server Error
Date: Sat, 23 May 2020 17:17:16 GMT
Content-Type: text/html; charset=UTF-8
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, max-age=0
Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
That seems to indicate that the errors are being served by Cloudflare, not by WordPress, so I'm not really sure whether to try looking for Cloudflare issues, WordPress issues, or Linode issues.
If you’re seeing the WordPress error page, that’s being served from your Linode, not Cloudflare.
As your Linode is returning a 500 error, I doubt CF would cache it, so I’m reasonably confident you should be looking at your Linode.
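To make that reasoning concrete, here is a minimal Python sketch of the triage. It assumes you have the status code and the response body to hand (a plain GET rather than `curl -I`, which fetches headers only). The function name and the return labels are made up for illustration, and treating 520-527 as Cloudflare's own origin-error range is my understanding of their error pages rather than something shown in this thread.

```python
# Phrase WordPress prints when it cannot reach its database (quoted above).
WP_DB_ERROR = "Error establishing a database connection"


def likely_error_source(status: int, body: str) -> str:
    """Guess which layer produced an error response.

    Hypothetical helper: the labels returned are illustrative only.
    """
    # The WordPress message in the body means PHP ran on the origin server.
    if WP_DB_ERROR in body:
        return "wordpress"
    # Assumption: Cloudflare reports origin problems with 520-527.
    if 520 <= status <= 527:
        return "cloudflare"
    # Any other 5xx most likely passed through from the origin unchanged.
    if 500 <= status <= 599:
        return "origin"
    return "none"
```

A 500 whose body contains the WordPress message therefore points at the Linode, which matches the advice above.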
How low is “low traffic”? I’ve seen cases with MySQL on Ubuntu where, although max_connections is set high in the MySQL config, the OS limits the number of open files/sockets MySQL can use, and MySQL ends up reducing max_connections to the seemingly random value of 214.
During these periods of downtime, could your sites be handling upwards of 214 concurrent visitors?
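If you want to rule the open-files limit in or out, one rough way is to read the limits Linux reports for the running MySQL process. The sketch below parses the "Max open files" line from text in the format of /proc/<pid>/limits. The sample text and the function names are invented for illustration, and the 1024 in the sample is just a typical default, not a value taken from your server.

```python
import re
from typing import Optional, Tuple

# Invented sample in the layout Linux uses for /proc/<pid>/limits.
SAMPLE_LIMITS = """\
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max open files            1024                 4096                 files
"""


def _to_int(value: str) -> Optional[int]:
    """Convert a limit field to int; 'unlimited' becomes None."""
    return None if value == "unlimited" else int(value)


def open_files_limit(limits_text: str) -> Tuple[Optional[int], Optional[int]]:
    """Return the (soft, hard) open-files limits found in limits_text."""
    match = re.search(
        r"^Max open files\s+(\S+)\s+(\S+)", limits_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("no 'Max open files' line found")
    return _to_int(match.group(1)), _to_int(match.group(2))
```

On a live box you would pass it the contents of /proc/<pid>/limits for the mysqld process. A soft limit around 1024 would be consistent with the 214 behaviour described above, though that link is from my own experience rather than anything confirmed here.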
Have you checked out your MySQL error log to see if there is anything logged in there?
Alternatively, when this happens again, you could try enabling debugging in WordPress then visit your site to see the actual error rather than WordPress’ “friendly” page.
@andysh Thank you - when I say low traffic, I mean on the order of a few visits a month per site. 214 visits for all the sites combined over a month would probably be on the high end.
I think you're right about the MySQL error log - I'm looking at it now and it's a sea of entries like this:
2020-05-23T00:00:47.161466Z 31 [ERROR] [MY-013134] [Server] Table './example/example_options' is marked as crashed and should be repaired
2020-05-23T00:02:10.593332Z 43 [ERROR] [MY-013134] [Server] Table './example/example_options' is marked as crashed and last (automatic?) repair failed
I'm looking for a Linode guide that addresses issues like this but not finding one - can you point me in the right direction?
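To get a sense of how many tables are affected before digging further, a small Python sketch like the one below can tally the "marked as crashed" entries per table. The regular expression is written against the two log lines quoted above and may need adjusting for other message variants. The function name is hypothetical, and the sample list simply reuses those two lines.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the table path in "[ERROR] ... Table '<path>' is marked as crashed".
CRASHED_RE = re.compile(r"\[ERROR\].*Table '([^']+)' is marked as crashed")

# The two entries quoted from the MySQL error log above.
SAMPLE_LOG = [
    "2020-05-23T00:00:47.161466Z 31 [ERROR] [MY-013134] [Server] Table "
    "'./example/example_options' is marked as crashed and should be repaired",
    "2020-05-23T00:02:10.593332Z 43 [ERROR] [MY-013134] [Server] Table "
    "'./example/example_options' is marked as crashed and last (automatic?) "
    "repair failed",
]


def crashed_tables(log_lines: Iterable[str]) -> Counter:
    """Count 'marked as crashed' errors per table path."""
    counts: Counter = Counter()
    for line in log_lines:
        match = CRASHED_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Running it over the whole error log (one line per list item, or an open file object) shows whether the problem is confined to one table per site or spread more widely.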