SSL NodeBalancer issue

dbcdave, 6 months, 1 week ago

Hi,

I've had a NodeBalancer sitting in front of two app servers, running smoothly for many years. SSL connections from the outside world (coming in on port 443) terminate at the NodeBalancer and are routed via port 444 to the internal app servers.

All of a sudden the NodeBalancer is showing these servers as down for HTTPS. HTTP on port 80 still works, but the website forces traffic to HTTPS, so effectively the whole site is down.

I don't have a lot of experience with this stuff, so I'm not sure where to start debugging.

Some things:

No configs have changed recently. I have tried rebooting everything. Normal HTTP traffic works fine. I checked that my SSL cert is still valid.

I ran:

tail -f /var/log/apache2/access.log

and I see a lot of traffic from 192.168.255.xx (I think this is an internal IP?).
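A quick way to confirm whether an address is private is Python's ipaddress module. The sketch below also tests membership in 192.168.255.0/24; treating that as the range Linode NodeBalancers use to reach backend nodes is an assumption here, so check it against Linode's documentation. The sample address 192.168.255.7 is made up, since the real one is elided in the log.

```python
import ipaddress

# Assumption: NodeBalancers connect to backend nodes from this private range.
NODEBALANCER_NET = ipaddress.ip_network("192.168.255.0/24")

def classify(addr: str) -> str:
    """Label an address as NodeBalancer, other private, or public."""
    ip = ipaddress.ip_address(addr)
    if ip in NODEBALANCER_NET:
        return "nodebalancer"
    if ip.is_private:
        return "private"
    return "public"

# Hypothetical address standing in for the elided 192.168.255.xx
print(classify("192.168.255.7"))
```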

Thanks!

7 Replies

The NodeBalancers use HTTP on the private network to communicate with your backend nodes. They should be taking the SSL traffic and passing it back to the nodes on port 80.

Are the failing health checks for port 444? And why use a separate port for HTTPS requests at all, since all the requests are being transmitted over HTTP anyway?

Our PHP framework uses the port to know whether traffic is secure, so we use a different port to route HTTP and HTTPS traffic internally (otherwise, as you say, we could just use port 80 for both).
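The framework is PHP, so this Python sketch only illustrates the idea: the backend port a request arrives on stands in for the client's original scheme. The mapping (80 for HTTP, 444 for decrypted HTTPS) comes from the thread; the names `PORT_SCHEME` and `request_scheme` are hypothetical.

```python
# Mapping taken from the thread: the NodeBalancer forwards decrypted HTTPS
# traffic to port 444 and plain HTTP traffic to port 80.
PORT_SCHEME = {80: "http", 444: "https"}

def request_scheme(server_port: int) -> str:
    """Infer the client's original scheme from the backend port used (hypothetical helper)."""
    try:
        return PORT_SCHEME[server_port]
    except KeyError:
        raise ValueError(f"unexpected backend port {server_port}") from None

print(request_scheme(444))
```

Because both ports reach the same Apache and PHP code, a health check on port 444 exercises the application itself, not just whether Apache is listening.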

I ran netstat -pltnu and see that the server is listening on 80, 443, and 444, so I don't think that is the issue. The NodeBalancer uses port 444 to route the traffic to the web servers, but for some reason it is not seeing them.

What sort of health check is it using? That's most likely your issue.

Hmm. It's just checking for a valid HTTP status every 20 seconds (timeout 10, attempts 4). It has been working like this for 4 years, so I'm not sure what would cause it to stop working.
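With "attempts 4", a backend is presumably removed only after 4 failed checks in a row. The sketch below models that rule. Treating any 2xx or 3xx response as a pass is an assumption about what "HTTP valid status" means, `None` stands in for a check that hit the 10-second timeout, and letting one passing check bring the node back up is a simplification.

```python
from typing import Iterable, Optional

def status_passes(status: Optional[int]) -> bool:
    """Assumption: a check passes on any 2xx or 3xx; None means the check timed out."""
    return status is not None and 200 <= status < 400

def node_state(results: Iterable[Optional[int]], attempts: int = 4) -> str:
    """Replay check results in order; mark the node down after `attempts` consecutive failures."""
    state = "up"
    failures = 0
    for status in results:
        if status_passes(status):
            # Simplified model: a single passing check restores the node.
            failures = 0
            state = "up"
        else:
            failures += 1
            if failures >= attempts:
                state = "down"
    return state

print(node_state([200, 500, None, 500, 500]))
```

Under this model a checked page that starts returning 500s, or responding slower than the timeout, takes every backend serving it out of rotation, even while Apache is still answering other requests.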

Just an update in case anyone has ideas: the servers came up on their own and are now working most of the time, but they go down intermittently with the same issue. I can SSH in and don't see any obvious problems with CPU use, and Apache clearly isn't overloaded, since I can serve content over plain HTTP.

It does seem to be related to the health check, but I'm not sure whether the servers are really going down (a real issue) or whether the health check is giving a false positive. I turned on IP throttling (the Linode setting on the NodeBalancer was 0; I set it to 1), but I'm not sure what else to do.

Any thoughts?

Thanks!

The other interesting thing is that I have 2 app servers for redundancy (precisely to avoid this issue), and both seem to be removed from the rotation at the same time.

OK, I dug into the logs and am seeing a large number of requests with a Python user agent, and their status (500) means they are failing:

xxx.xx.xx.xx - - [13/Mar/2018:16:55:27 -0700] "GET /urlremoved HTTP/1.1" 500 21227 "-" "python-requests/2.14.2"

I suspect these failing requests are causing the health checks to fail.
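To see how many of the 500s come from that client, a short script can tally 5xx responses per user agent. This is a minimal sketch assuming Apache's standard "combined" log format, which the line above matches; the regular expression and the function name `errors_by_agent` are illustrative, not part of any existing tool.

```python
import re
from collections import Counter

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def errors_by_agent(lines):
    """Count 5xx responses per user agent; lines that don't parse are skipped."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m and m.group("status").startswith("5"):
            counts[m.group("agent")] += 1
    return counts

sample = [
    'xxx.xx.xx.xx - - [13/Mar/2018:16:55:27 -0700] "GET /urlremoved HTTP/1.1" 500 21227 "-" "python-requests/2.14.2"',
]
print(errors_by_agent(sample))

# On the server, for example:
# with open("/var/log/apache2/access.log") as f:
#     print(errors_by_agent(f).most_common(5))
```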

I'm guessing someone is scraping the site? apachetop says I'm getting 3.33 requests per second for this URL, and it serves a potentially large file depending on the parameter. The IP address in the log is an internal IP address (from the NodeBalancer, I assume).
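A rate like the one apachetop reports can be cross-checked against the raw log by timing the matching requests. This sketch assumes the Apache timestamp format shown in the log line above; matching on the `"GET <path>` prefix (so that different query parameters count as the same URL) is an illustrative choice, and `/urlremoved` is the placeholder from the thread.

```python
import re
from datetime import datetime

TIME = re.compile(r'\[([^\]]+)\]')
FMT = "%d/%b/%Y:%H:%M:%S %z"  # e.g. 13/Mar/2018:16:55:27 -0700

def requests_per_second(lines, path):
    """Average request rate for `path` between its first and last logged request."""
    times = []
    for line in lines:
        if f'"GET {path}' not in line:
            continue
        m = TIME.search(line)
        if m:
            times.append(datetime.strptime(m.group(1), FMT))
    if len(times) < 2:
        return 0.0
    span = (max(times) - min(times)).total_seconds()
    # N requests spread over `span` seconds means N - 1 intervals.
    return (len(times) - 1) / span if span else float("inf")
```

Usage: `requests_per_second(open("/var/log/apache2/access.log"), "/urlremoved")`.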

What's the best way to block this IP? All the traffic seems to come from that one IP, since it is forwarded internally by the outward-facing NodeBalancer.
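Because every request reaches Apache from the NodeBalancer's private address, blocking that address would block everyone. A block decision has to use something the NodeBalancer passes through: the user agent seen in the log, or the real client address. This sketch assumes the NodeBalancer appends the client address to an `X-Forwarded-For` header; confirm that for your configuration. The function names and the blocklist are hypothetical, and in practice the check would live in Apache or the PHP app rather than in Python.

```python
import ipaddress
from typing import Iterable, Optional

# Assumption: requests from this range come from the NodeBalancer, which
# appends the real client address to X-Forwarded-For.
TRUSTED_PROXY = ipaddress.ip_network("192.168.255.0/24")

def client_ip(remote_addr: str, forwarded_for: Optional[str]) -> str:
    """Return the real client IP, trusting X-Forwarded-For only from the proxy range."""
    if forwarded_for and ipaddress.ip_address(remote_addr) in TRUSTED_PROXY:
        # The last entry is the one the trusted proxy itself appended.
        return forwarded_for.split(",")[-1].strip()
    return remote_addr

def should_block(remote_addr: str, forwarded_for: Optional[str], user_agent: str,
                 blocked_ips: set, blocked_agents: Iterable[str] = ("python-requests/",)) -> bool:
    """Block by real client IP or by user-agent prefix (hypothetical policy)."""
    ip = client_ip(remote_addr, forwarded_for)
    return ip in blocked_ips or user_agent.startswith(tuple(blocked_agents))
```

Blocking on the user agent is crude, since a scraper can change it, so the client-IP check is the more durable of the two.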

Thanks
