Is anyone else suffering from stability issues?

As the title says.

I've been experiencing stability issues for some months.

The problem occurs at random, roughly every 20 to 30 days, although it is too irregular to say for sure. It happens rarely, but it does happen, and that is terrible for me.

I'm using CentOS 6.4 with the latest CentOS kernel, booted via pv-grub.

Since this problem has occurred many times, I'm now logging resource usage every five minutes. This is the usage on the frozen VPS:
> btmp begins Wed Oct 23 10:04:08 2013

total used free shared buffers cached

Mem: 1015568 945728 69840 0 123092 526536

-/+ buffers/cache: 296100 719468

Swap: 262136 496 261640

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND

root 1392 2.2 1.0 2399220 10864 ? Sl Oct22 136:25 /usr/bin/python /usr/bin/fail2ban-server -b -s /var/run/fail2ban/fail2ban.sock -p /var/run/fail2ban/fail2ban.pid -x

502 25965 0.3 6.0 168132 61352 ? S 17:19 0:31 dovecot/imap

mysql 1245 0.1 2.2 771976 22372 ? Sl Oct22 8:52 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock

502 27180 0.1 5.1 161032 51868 ? S 18:34 0:06 dovecot/imap

root 1 0.0 0.1 19360 1412 ? Ss Oct22 0:02 /sbin/init

root 2 0.0 0.0 0 0 ? S Oct22 0:00 [kthreadd]

root 3 0.0 0.0 0 0 ? S Oct22 0:00 [migration/0]

root 4 0.0 0.0 0 0 ? S Oct22 0:00 [ksoftirqd/0]

root 5 0.0 0.0 0 0 ? S Oct22 0:00 [migration/0]

root 6 0.0 0.0 0 0 ? S Oct22 0:01 [watchdog/0]

root 7 0.0 0.0 0 0 ? S Oct22 0:00 [migration/1]

root 8 0.0 0.0 0 0 ? S Oct22 0:00 [migration/1]

root 9 0.0 0.0 0 0 ? S Oct22 0:01 [ksoftirqd/1]

root 10 0.0 0.0 0 0 ? S Oct22 0:01 [watchdog/1]

root 11 0.0 0.0 0 0 ? S Oct22 0:00 [migration/2]

root 12 0.0 0.0 0 0 ? S Oct22 0:00 [migration/2]

As you can see, there is no CPU problem and no memory problem.

When the system locks up, it stops responding to SSH, HTTP and the mail servers; the system seems dead.

The only thing that works perfectly is the Linode Lish console, which makes me think something strange is going on.

What do you think? What could the problem be? Is there anything more I can do to troubleshoot it?

Thanks.

22 Replies

Nothing suspicious on the console like OOM messages?

Is fail2ban doing something funky with your iptables rules, and blocking all network traffic?

If you can access the system via Lish during these times, I'd make sure your iptables rules aren't wonked. Does networking work at all during these times? Outbound?

-Chris

I'm quite honored to get an answer from caker directly (honestly, I'm not joking), so thanks for the reply :)

No, there are no OOM messages on the console: nothing strange, no warnings.

I didn't try to see whether my server could communicate with the outside world during the hang, and now that I've rebooted it I can't check.

I never checked the iptables rules while the system was not responding to external IPs.

Do you think that fail2ban is banning all the external IPs?

I don't have any fail2ban logs, but fail2ban sends me an email every time it bans an IP, and I didn't receive any email.

I created a cron job that logs the output of these commands every 5 minutes, in the hope of understanding why the system hangs:

#!/bin/ksh -p
# Snapshot of system state for crash detection; run from cron every 5 minutes.
# iostat comes from the sysstat package.

# Read the clock once so the day directory and the file name always agree.
NOW=$(date +%Y-%m-%d-%H-%M)
DIR=/root/log_for_crash_detect/day_${NOW%-*-*}   # strip -HH-MM -> YYYY-MM-DD
mkdir -p "$DIR"
LOG=$DIR/log_$NOW

# Everything below, error messages included, is appended to $LOG.
{
  echo -- date --------------------
  date
  echo -- date-end ----------------
  echo
  echo
  echo -- uptime ------------------
  uptime
  echo -- uptime-end --------------
  echo
  echo
  echo -- last --------------------
  last
  echo -- last-end ----------------
  echo
  echo
  echo -- lastlog -----------------
  lastlog
  echo -- lastlog-end -------------
  echo
  echo
  echo -- lastb -------------------
  lastb
  echo -- lastb-end ---------------
  echo
  echo
  echo -- free --------------------
  free
  echo -- free-end ----------------
  echo
  echo
  echo -- ps aux --sort '-pcpu' ---
  ps aux --sort '-pcpu'
  echo -- ps aux-end --------------
  echo
  echo
  echo -- iostat 1 5 ---------------------------------------------------------------------
  iostat 1 5
  echo -- iostat 1 5 end -----------------------------------------------------------------
  echo
  echo
  echo -- vmstat 1 5 ---------------------------------------------------------------------
  vmstat 1 5
  echo -- vmstat 1 5 end -----------------------------------------------------------------
  echo
  echo
  # top 5 processes by %MEM (column 4), then by %CPU (column 3)
  echo "-- ps auxf | sort -nr -k 4 | head -5 -----------------------------------------------"
  ps auxf | sort -nr -k 4 | head -5
  echo "-- ps auxf | sort -nr -k 4 | head -5 end -------------------------------------------"
  echo
  echo
  echo "-- ps auxf | sort -nr -k 3 | head -5 -----------------------------------------------"
  ps auxf | sort -nr -k 3 | head -5
  echo "-- ps auxf | sort -nr -k 3 | head -5 end -------------------------------------------"
  echo
  echo
  # \$ stops the shell from expanding $2, $4, $11 inside the double-quoted headers
  echo "-- ps aux | awk '{print \$2, \$4, \$11}' | sort -k2rn | head -n 20 -------"
  echo "[PID] [%MEM] [PATH]" && ps aux | awk '{print $2, $4, $11}' | sort -k2rn | head -n 20
  echo "-- ps aux | awk '{print \$2, \$4, \$11}' | sort -k2rn | head -n 20 end ----"
  echo
  echo
  # -n makes the %CPU sort numeric (plain -r would rank 9.0 above 10.0)
  echo "-- ps -eo pcpu,pid,user,args | sort -k 1 -nr | head -10 -----------------------------"
  ps -eo pcpu,pid,user,args | sort -k 1 -nr | head -10
  echo "-- ps -eo pcpu,pid,user,args | sort -k 1 -nr | head -10 end -------------------------"
  echo
  echo
  # without -n, iptables -L does reverse DNS lookups, which can stall when the network is down
  echo "-- iptables -L ---------------------------------------------------------------------"
  iptables -L
  echo "-- iptables -L end -----------------------------------------------------------------"
  echo
  echo

} >> "$LOG" 2>&1
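After a freeze and reboot, the interesting snapshot is the last one written before the hang. A minimal sketch for finding it, assuming the files keep the log_YYYY-MM-DD-HH-MM naming used in the script above and that POSIX sh and awk are available; `find_gaps` is a hypothetical helper name, not an existing tool. It prints every pair of consecutive snapshots that are further apart than the cron interval:

```sh
#!/bin/sh
# find_gaps (hypothetical helper): read snapshot paths ending in
# log_YYYY-MM-DD-HH-MM on stdin, one per line, in chronological order, and
# print each pair of consecutive snapshots more than MAX minutes apart
# (default 5, the cron interval).
find_gaps() {
  awk -v max="${1:-5}" '
    # Days since 1970-01-01 for a Gregorian date (pure arithmetic, no TZ/DST).
    function days(y, m, d,    n) {
      if (m <= 2) { y--; m += 12 }
      n = 365 * y + int(y / 4) - int(y / 100) + int(y / 400)
      return n + int((153 * (m - 3) + 2) / 5) + d - 719469
    }
    NF == 0 { next }
    {
      s = $0
      sub(/.*log_/, "", s)            # keep only YYYY-MM-DD-HH-MM
      split(s, t, "-")
      now = (days(t[1] + 0, t[2] + 0, t[3] + 0) * 24 + t[4]) * 60 + t[5]
      if (seen && now - prev > max + 0) print "gap: " prevname " -> " $0
      prev = now; prevname = $0; seen = 1
    }'
}

# Usage: printf '%s\n' /root/log_for_crash_detect/day_*/log_* | find_gaps
```

The glob expands in name order, which is chronological because every field is zero-padded. The date arithmetic is done by hand so the result does not depend on GNU `date`; the stamps are local time, though, so a daylight-saving change can show up as a spurious one-hour gap.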

Nope, things are good for me. If I were experiencing a problem with my Linode, I would file a ticket with Support.

Next time you see this, log into Lish and grab the output of the following commands:

iptables -L -n -v

ifconfig

Sounds like a firewall issue, IMO.

I will add these commands to the "log every five minutes" list and check the output when the problem occurs again.

Thanks for the help.

Fail2ban. The cause of oh so much pain and oh so little benefit.

How do you ban brute-force attacks against webmail without fail2ban?

You can't. A distributed brute-force attack renders fail2ban useless.

Personally, I don't see the point of fail2ban. Your passwords are either secure or they're not. fail2ban has little benefit other than reducing log spam, IMO.

-Chris

I've never received a distributed brute-force attack (I'm lucky ;) ), but I have received many attacks that hit my webmail/Postfix for days.

With fail2ban, those days of attempts dropped to three attempts, and my logs shrank from hundreds of MB to dozens of MB.

I'm sure fail2ban is not the definitive security tool, but I find it useful.

I'll add something more in case it helps troubleshooting.

When the server stopped responding, I read the logs in my hurry to bring it back up: there were no strange attempts against my server. Then I restarted iptables.

The system hung while iptables was restarting.

@sednet:

Fail2ban. The cause of oh so much pain and oh so little benefit.

So stopping people who are actively trying to break into your server is a bad idea? That's what it does, monitors logs, sees suspicious activity, bans the IP. How's that "little benefit"?

See caker's point. Your password and/or SSH key are either secure or they aren't. fail2ban won't stop a determined attacker from getting into your system if the means of authentication have been compromised. The only questionably useful purpose of fail2ban is to reduce the amount of logspam received from people attempting to brute force a properly secured system. Other measures can be taken to reduce the logspam that can't result in you getting locked out of your own system because you've forgotten your password or have too many SSH keys in your local SSH agent (this can be rather puzzling). Such measures include configuring your logging daemon to not log the messages generated by brute force attempts in the first place, or properly configuring logrotate to rotate and compress old logs. Other options include port knocking and single packet authorization (the latter being the preferred method, as it further validates that you are who you say you are, and supplements existing security).

-Doug

>> can't result in you getting locked out of your own system because

If you can successfully fail to log into your system more than a few times in a row then you might want to consider installing fail2ban or at least iptables rate limiting. Preferably both.

Both of which only reduce logspam, and we're back to the point I gave before. Neither of them increases security in any useful way; instead they cause unnecessary headaches for legitimate users.

-Doug

Are you saying iptables rate limiting is just good for reducing log congestion?

Hi,

I would like to add something to the logs that tells me whether the Linode can connect to the Internet.

I don't want to ping Google every 5 minutes, so I'd like to find an alternative.

Do you think this command is a good way to check whether my Linode is online and able to reach the Internet?

ping -q -w 1 -c 1 `ip r | grep default | cut -d ' ' -f 3` > /dev/null && echo ok || echo error
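Pinging the default gateway, as in the one-liner above, keeps the check inside the provider's network. A slightly more defensive sketch, assuming the iproute2 `ip route` output format (`default via <gateway> dev <iface>`) and that the gateway answers ICMP echo, which nobody in this thread has confirmed; `default_gw` and `check_gw` are hypothetical names. A successful ping only shows that the first hop is reachable and that local firewall rules let ICMP through, not that the wider Internet is reachable.

```sh
#!/bin/sh
# default_gw (hypothetical helper): print the IPv4 default gateway from
# `ip route` output on stdin, or nothing if there is no default route.
default_gw() {
  awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# check_gw (hypothetical helper): print "ok" if the default gateway answers
# one ping within 1 second, "error" otherwise (including no default route).
check_gw() {
  gw=$(ip route | default_gw)
  if [ -n "$gw" ] && ping -q -w 1 -c 1 "$gw" > /dev/null 2>&1; then
    echo ok
  else
    echo error
  fi
}
```

Matching on fields rather than `cut -d ' '` means the parse does not depend on exactly one space between words, and an empty result is reported as `error` instead of running `ping` with no host.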

@jebblue:

@sednet:

Fail2ban. The cause of oh so much pain and oh so little benefit.

So stopping people who are actively trying to break into your server is a bad idea? That's what it does, monitors logs, sees suspicious activity, bans the IP. How's that "little benefit"?

The key word here is "distributed". More and more of the brute force attempts are coming from distributed botnets as opposed to a single script running on a single system. This means Fail2ban will fill up your firewall deny rules with thousands of hosts and still not stop the attack.

Rate limiting is a good idea though (as you mention in later posts).

I rate limit connections to SSH with the following ruleset:

# cat /etc/iptables/iptables.rules
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
COMMIT
*mangle
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
COMMIT
*filter
:INPUT DROP [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
COMMIT

Those firewall rules have served me well over the years.

  • Les

@Ox-:

@jebblue:

@sednet:

Fail2ban. The cause of oh so much pain and oh so little benefit.

So stopping people who are actively trying to break into your server is a bad idea? That's what it does, monitors logs, sees suspicious activity, bans the IP. How's that "little benefit"?

The key word here is "distributed". More and more of the brute force attempts are coming from distributed botnets as opposed to a single script running on a single system. This means Fail2ban will fill up your firewall deny rules with thousands of hosts and still not stop the attack.

Rate limiting is a good idea though (as you mention in later posts).

In any case, it sends you an email warning you about what is happening.

I don't think a botnet has so many IPs that it can launch thousands of attacks from different addresses.

@sblantipodi:

In any case, it sends you an email warning you about what is happening.

For that you want a real HIDS because it will give you a better idea of the strength of the threat, not to mention it will send you warning emails on many other things attackers will try. Check out OSSEC.

@akerl:

I rate limit connections to SSH with the following ruleset:

# cat /etc/iptables/iptables.rules
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
COMMIT
*mangle
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
COMMIT
*filter
:INPUT DROP [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
COMMIT

Those firewall rules have served me well over the years.

  • Les

What is the difference between rate limiting IPs and using fail2ban?

Rate limiting limits the number of connections from an IP; fail2ban limits the number of errors from an IP.

I find fail2ban more interesting than iptables rate limiting.

Am I wrong? If so, please explain why. Thanks.
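The "count errors per IP" half of what fail2ban does can be sketched in a few lines of awk over the sshd log (on CentOS, /var/log/secure). This is a minimal sketch assuming OpenSSH's `Failed password ... from <IP> port ...` message format; the default threshold of 3 echoes the three attempts mentioned earlier in the thread, and `count_failures` is a hypothetical name. Unlike fail2ban it has no time window (fail2ban's `findtime`) and bans nothing, and the webmail/Postfix attacks discussed earlier would need different patterns.

```sh
#!/bin/sh
# count_failures (hypothetical helper): read sshd log lines on stdin and print
# "IP COUNT" for every address with at least MAXRETRY "Failed password"
# lines (default 3). This only counts; it does not ban anything.
count_failures() {
  awk -v maxretry="${1:-3}" '
    /Failed password/ {
      # The client address is the field right after the word "from".
      for (i = 1; i < NF; i++)
        if ($i == "from") { fails[$(i + 1)]++; break }
    }
    END {
      for (ip in fails)
        if (fails[ip] >= maxretry + 0) print ip, fails[ip]
    }' | sort
}
```

For example, `count_failures 3 < /var/log/secure` lists the addresses a fail2ban-style rule would act on. iptables rate limiting, by contrast, counts new connections whether or not the login fails.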

@akerl:

I rate limit connections to SSH with the following ruleset:

# cat /etc/iptables/iptables.rules
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
COMMIT
*mangle
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
COMMIT
*filter
:INPUT DROP [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
COMMIT

Those firewall rules have served me well over the years.

  • Les

I believe I understand most iptables implementations, but yours doesn't look like it's going to work.

You accept ICMP. Good.

You accept connections on the loopback interface. Good.

You accept established or related connections via conntrack. Good.

You drop everything else. Wait a minute: what happens if your network connection/session flakes out, or your terminal craps out, and you try to reconnect?

I don't get how this ruleset would help anybody unless they have console access and are using the box as a personal desktop. Please enlighten me.
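The reconnect concern holds for the ruleset as posted: with `:INPUT DROP` and no rule accepting NEW connections to SSH, only sessions that are already established get through. SSH rate limiting with iptables is commonly done with the `recent` match, roughly as sketched below. Assumptions: sshd on port 22, a limit of 4 NEW connections per source address in 60 seconds (so roughly three get through per minute), and the list name SSH, which is arbitrary. These rules are not part of the ruleset quoted above; try them from the Lish console so a mistake cannot lock you out.

```sh
# Sketch only: append after the RELATED,ESTABLISHED rule in the INPUT chain.
# Port 22, the list name SSH and the 4-hits-per-60-seconds limit are assumptions.

# Record every new SSH connection attempt in the "SSH" list (no target: continue).
iptables -A INPUT -p tcp --dport 22 -m conntrack --ctstate NEW \
  -m recent --name SSH --set
# Drop the attempt if this address has 4 or more entries in the last 60 seconds.
iptables -A INPUT -p tcp --dport 22 -m conntrack --ctstate NEW \
  -m recent --name SSH --update --seconds 60 --hitcount 4 -j DROP
# Otherwise accept the new SSH connection.
iptables -A INPUT -p tcp --dport 22 -m conntrack --ctstate NEW -j ACCEPT
```

On CentOS 6, `service iptables save` writes the running rules to /etc/sysconfig/iptables so they survive a reboot.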
