fastcgi throws 502 gateway errors on nginx after a few days

I have a mystery and I'm not sure how to solve it. My Linode 360 is running several WordPress-based sites using nginx. Everything is great, except I find that, every few days, fastcgi fails and I start getting 502 gateway errors when I try to post a comment or upload an image, etc. Restarting fastcgi fixes the problem.

One thing that I noticed is that running free shows a little swap space used when this crops up. After restarting fastcgi, the swap frees up.

For fastcgi, I'm using the standard tutorial recommendation – it's php-cgi running in external fastcgi mode.

This is my config file:

#
# Settings for php-cgi in external FASTCGI Mode
#

# Should php-fastcgi run automatically on startup? (default: no)

START=yes

# Which user runs PHP? (default: www-data)

EXEC_AS_USER=

# Host and TCP port for FASTCGI-Listener (default: localhost:9000)

FCGI_HOST=localhost
FCGI_PORT=9000

# Environment variables, which are processed by PHP

PHP_FCGI_CHILDREN=5
PHP_FCGI_MAX_REQUESTS=1000

So, here's what I'm wondering: What could this be? What should I modify to try to fix this? Why is fastcgi dying on me?

I'm pretty good about figuring things out on my own, but I haven't been able to get anywhere through Google searches and I'm just not sure where to start. Thanks in advance for any tips you can throw my way!

13 Replies

I did a little more research and I'm going to try raising my PHP_FCGI_CHILDREN to 15.

It sort of sounds like the problem could be that 4 worker_processes with 1024 connections each is just too much for 5 fcgi children.
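For reference, here are the relevant bits of my nginx.conf (excerpted; the surrounding directives are omitted):

# nginx.conf (excerpt) – 4 workers with 1024 connections each,
# all funneling PHP traffic at only 5 fcgi children
worker_processes  4;

events {
    worker_connections  1024;
}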

I'm curious as to these ratios also – keep us posted on how you get on! ;)

Which distribution and which PHP version are you using?

When I was on Ubuntu 8.04 LTS with PHP 5.2.4, it had heaps of problems, and sometimes fastcgi would die multiple times a day depending on the load. I changed to Debian 5 with PHP 5.2.6 and it's been fine since. 70k-100k page views per day (all executed through PHP) and 5 FCGI children is enough.

Yeah, I'm using Hardy. Debian 5? You found an older distro to be the answer? Huh. Interesting. If the problem persists, I'll consider changing distros.

That will not be a fun day.

I think I might try spawn-fcgi first, though.

Things are working fine right now with more children. But, again, it's sort of too early to really call it.

Debian 5 is newer than Ubuntu Hardy…

Anyway, if your fastcgi process keeps committing suicide, put the startup script in another shell script, inside an infinite loop. That way, whenever the process exits, a new one will be started right away.
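A minimal sketch of what I mean, assuming the php-cgi binary path and the host/port from your config above:

#!/bin/sh
# respawn wrapper (sketch) – restart php-cgi whenever it exits
export PHP_FCGI_CHILDREN=5
export PHP_FCGI_MAX_REQUESTS=1000
while true; do
    # -b makes php-cgi bind and listen as a FastCGI server
    /usr/bin/php-cgi -b localhost:9000
    sleep 1  # don't spin the CPU if it crashes immediately
done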

I also have WordPress running in an external FastCGI process, and I've seen those processes take a lot of memory. Is there a way to limit this?

@hybinet:

Debian 5 is newer than Ubuntu Hardy…

Exactly :)

> Anyway, if your fastcgi process keeps committing suicide, put the startup script in another shell script, inside an infinite loop. That way, whenever the process exits, a new one will be started right away.

It won't work because the processes do not die – they just become zombies and require a kill -9 :)

What I did before was run another script that scans /var/log/nginx/error.log – something like a tail -f, but it immediately restarts php-fcgi if a 502 error is detected (and sends me an email notification). Again, upgrading to a different PHP version completely solved it.

@scotty:

@hybinet:

Debian 5 is newer than Ubuntu Hardy…

Exactly :)

What happened there? I read "Debian" and thought "Ubuntu".

Because I am an idiot. Just like I put this thread in the LAMP forum and requested help with nginx. :oops:

> It won't work because the processes do not die – they just become zombies and require a kill -9 :)

Yes, this is the issue. It just happened again today. With 15 children running, I haven't noticed any issues with the pages that are being served, but the swap thing has cropped up again with fastcgi.

I hadn't checked the error log for nginx; mine is in /usr/local/nginx/logs:

2009/03/30 18:21:31 [alert] 16982#0: worker process 16986 exited on signal 9
2009/04/04 14:48:35 [alert] 11041#0: worker process 11044 exited on signal 9
2009/04/06 03:51:22 [alert] 7300#0: worker process 7304 exited on signal 9
2009/04/06 04:01:51 [alert] 8350#0: worker process 8354 exited on signal 9
2009/05/08 15:47:27 [alert] 2464#0: worker process 2466 exited on signal 9

So, I guess the fastcgi children go zombie one at a time until it impacts nginx and then boom!

Maybe I'll try spawn-fcgi when I have a chance. Easier than switching to Debian.

> What I did before was run another script that scans /var/log/nginx/error.log – something like a tail -f, but it immediately restarts php-fcgi if a 502 error is detected (and sends me an email notification). Again, upgrading to a different PHP version completely solved it.

Thanks for the tip. I assume you set it up in your hourly cron? I'll see if I can't figure out how to write up something like that for now.

I also thought I could just run a daily cron that restarts fastcgi.
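Something like this in root's crontab would do it, I think (the init script name is a guess; yours may differ):

# restart php-fastcgi once a day at 4am
0 4 * * * /etc/init.d/php-fastcgi restart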

> Thanks for the tip. I assume you set it up in your hourly cron? I'll see if I can't figure out how to write up something like that for now.

Well, not exactly. Having it run hourly would mean I could lose up to an hour of traffic, which is not desirable.

Anyway. Here's the script:

http://p.linode.com/2500

What it does is basically:

1. Scan /var/log/nginx/error.log every 2 seconds (it checks the file size first, so it should have minimal impact on I/O).

2. If "104: Connection reset by peer" or "111: Connection refused" is detected, start respawning the PHP process.

When it tries to respawn, it:

1. /etc/init.d/spawn-fcgi-php stop

2. Wait 2 seconds

3. killall -9 php-cgi

4. /etc/init.d/spawn-fcgi-php start

5. Send an email to tell me that something is wrong.

It should cut the downtime to a minimum. You can just run it (as root) and it will daemonize and scan the Nginx error log file. Anyway, it has been customised to suit my environment, but feel free to use it – at your own risk :)
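In case the paste ever disappears, the core logic is roughly this – a simplified sketch, not the actual script; the init script names are from my setup and the email address is a placeholder:

import os
import subprocess
import time

LOG = '/var/log/nginx/error.log'
ERRORS = ('104: Connection reset by peer', '111: Connection refused')

offset = os.path.getsize(LOG)
while True:
    time.sleep(2)
    size = os.path.getsize(LOG)
    if size < offset:
        offset = 0  # log was rotated/truncated; start over
    if size > offset:
        with open(LOG) as f:
            f.seek(offset)  # only read what was appended
            chunk = f.read()
        offset = size
        if any(e in chunk for e in ERRORS):
            subprocess.call(['/etc/init.d/spawn-fcgi-php', 'stop'])
            time.sleep(2)
            subprocess.call(['killall', '-9', 'php-cgi'])
            subprocess.call(['/etc/init.d/spawn-fcgi-php', 'start'])
            subprocess.call(
                'echo respawned | mail -s "php-cgi respawned" me@example.com',
                shell=True)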

Thanks for posting your code. I don't know Python so it's a bit beyond me. But I appreciate your effort.

So, I tried using spawn-fcgi. And then I stopped it. I didn't really understand how it worked at first, but I now understand it to just be yet another way of launching php5-cgi. It doesn't provide any help at all that I can see.
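For the record, the invocation I tried boiled down to something like this (flags per the spawn-fcgi docs; the values match my config above):

spawn-fcgi -a 127.0.0.1 -p 9000 -C 5 -u www-data -g www-data -f /usr/bin/php5-cgi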

My latest attempt? I lowered PHP_FCGI_MAX_REQUESTS to 500. Hopefully, fcgi spawns will recycle before they zombie now. I had this previously set at 1000.

This is one of the reasons I opted to use nginx to proxy to apache, rather than trying to keep FCGI alive. Yeah, apache brings some memory overhead, but it's often less than people estimate from top or ps – on my VPS, the apache+php workers each seem to be using ~12MB of RAM and I have ~240MB "free" as buffers, unused memory and file cache. Are php-fcgi processes really that much lighter?

Apache+PHP is widely understood, and the apache parent seems to do a decent job of keeping the child processes in line. Proxy to apache with nginx and you get the best of both worlds: nginx can handle static files and keepalives with very low overhead, and it lets apache get back to work on the next dynamic page while nginx handles returning the script result to the client.
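A minimal sketch of that layout, with placeholder ports and paths:

# nginx serves static files itself and proxies the rest to apache
server {
    listen 80;
    server_name example.com;            # placeholder
    root /var/www/example.com;          # placeholder

    # static assets never touch apache
    location ~* \.(css|js|jpe?g|png|gif|ico)$ {
        expires 7d;
    }

    # everything else goes to apache+mod_php on a local port
    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}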

When I was initially setting up my server, I tried Apache and it was eating memory. So I switched to Nginx, and my memory usage never pushes above 140MB of RAM (and that's with generous allowances for FastCGI). It sounds reasonable that the best path might be to combine the two. It also sounds a lot easier than switching to an entirely new distro. I'm going to look into that more. Thanks.
