Getting disconnected

Hi,

I was wondering if anyone has the same problem. I IRC from my linode with bX, and most of the time I'll have a second SSH session opened up so I can monitor some log files or for whatever else I need to do. I keep getting disconnected at random (today at least 5 times). Has this happened to anyone?

Erik

Acid-Duck@IRC

ducky@thermofart.homeunix.net


This has happened to me - connections to my BNC (on the linode) and also SSH, when idle for more than 30 minutes, are silently dropped and only pick back up after a reconnect.

A search of my firewall logs showed odd TCP packets being dropped at roughly the same time the connection dropped (and a netstat on the Linode showed a very high Send-Q on the affected connection). Opening up parts of my firewall solved the problem - hope that helps :)
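For anyone who wants to repeat that netstat check, here is a minimal sketch. It assumes the net-tools `netstat -tn` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State), and `stuck_sendq` is just an illustrative name. It reads netstat output on stdin and prints only the TCP connections with a non-zero Send-Q, which netstat describes as bytes not yet acknowledged by the remote host.

```shell
# Print TCP connections whose Send-Q (3rd column of `netstat -tn`) is non-zero.
# A Send-Q that stays non-zero means the peer is not acknowledging our data.
# Assumes net-tools netstat output; header lines are skipped because
# their first field does not start with "tcp".
stuck_sendq() {
    awk '$1 ~ /^tcp/ && $3 > 0 { print $3, $4, $5, $6 }'
}

# Usage on the Linode (run it a few times; a stuck queue stays non-zero):
#   netstat -tn | stuck_sendq
```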

Problem is I don't have a firewall up, so that would not be the cause.

Erik

When the connection goes down to one SSH client, run netstat in the other and see what it says.

Also, what distribution are you using? I believe there's a setting somewhere that may be enabled which is causing your sessions to time out automatically.

In one of the shells, there is a default timeout of 30 minutes if there is no input activity. So even if it was parsing log information, the timeout would still happen.

It stems from the early days of Unix.
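The shell isn't named here; one likely candidate (an assumption, not confirmed in this thread) is the `TMOUT` variable honoured by bash and ksh. When it is set to a positive number of seconds, an interactive shell exits if no command line is entered within that time; in bash the timer applies while the shell is waiting at the prompt. A quick check, with `report_tmout` as a made-up helper name:

```shell
# Report whether this shell will log itself out when idle at the prompt.
# TMOUT is a number of seconds; unset, empty or 0 means "never".
report_tmout() {
    if [ -n "${TMOUT:-}" ] && [ "$TMOUT" -gt 0 ] 2>/dev/null; then
        echo "idle logout after ${TMOUT}s"
    else
        echo "no idle logout"
    fi
}

# Usage:  report_tmout
# To turn it off for the current shell (fails if /etc/profile made it readonly):
#   unset TMOUT
```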

Adam

@Quik:

When the connection goes down to one SSH client, run netstat in the other and see what it says.

Also, what distribution are you using? I believe there's a setting somewhere that may be enabled which is causing your sessions to time out automatically.

I'm using the small Slackware image. The thing is, though, there is activity, since I'm IRC'ing and the program is run from the shell. The funny thing is that my IRC connection doesn't time out, but my SSH session stops responding, which forces me to close my SSH windows. Then I see the client quitting IRC (I'm on IRC with two clients). So I guess the connection between my client and the IRC server was still active, but for some reason the one between me and the SSH server was losing packets?

Erik

You can always run the SSH daemon in debug mode and see what happens when your connection is lost; it may show some problem.

Or try having two SSH sessions set up and see if only the one with IRC running in it gets kicked.

Adam

@adamgent:

You can always run the SSH daemon in debug mode and see what happens when your connection is lost; it may show some problem.

Or try having two SSH sessions set up and see if only the one with IRC running in it gets kicked.

Adam

I usually do have two windows and both of them drop. As for running SSH in debug mode, I'll definitely look into it. Thanks!

Erik

Just one point: when in debug mode, sshd will only accept one SSH connection. Also remember to put in three -d options,

so

sshd -d -d -d

You will then need to review the system log.

Adam

I generally like to put it on another port too and leave the existing ssh daemon on 22.

sshd -dddp 33

and ssh -vvvp 33 myuser@mylinode

This is what I generally do. It leaves your normal daemon on 22 to connect to, so you don't have to go through the console to restart it. It may be nice to log the terminal that sshd is started from.

script ~/sshd.log

This will give you a new shell with stdin and stdout logged to sshd.log in your home directory.

Hope you get your problem worked out. I haven't had any issues like this on my Linode. As an aside, you might look into the screen program. If your session gets disconnected, ssh back in and type 'screen -r', and you're back where you left off. You can also detach manually with CTRL+A then D. This leaves screen but keeps the session running, so you can reconnect to it with 'screen -r' later.

I have started being disconnected by SSH recently as well. I could have sworn that up until recently, I could leave my SSH session connected overnight and it would not time out. But today I noticed that it is timing out.

I ran an sshd with debugging, and also a local ssh to my server with debugging as well, and I don't see anything in the SSH logs that would indicate what is going on.

The client times out and the last few things it says are:

debug1: fd 3 setting TCP_NODELAY
debug2: callback done
debug1: channel 0: open confirm rwindow 0 rmax 32768
debug2: channel 0: rcvd adjust 131072
debug3: channel_close_fds: channel 0: r -1 w -1 e -1

then my shell output is printed, and I leave it sitting for half an hour; on my next keypress, I get:

debug1: channel_free: channel 0: client-session, nchannels 1
debug3: channel_free: status: The following connections are open:
  #0 client-session (t4 r0 i0/0 o0/0 fd 4/5)

debug3: channel_close_fds: channel 0: r 4 w 5 e 6
Read from remote host shell.ischo.com: Connection reset by peer
Connection to shell.ischo.com closed.
debug1: Transferred: stdin 0, stdout 0, stderr 104 bytes in 2034.8 seconds
debug1: Bytes per second: stdin 0.0, stdout 0.0, stderr 0.1
debug1: Exit status -1

Looks like the connection is reset by the remote host. But on the server side, the logs don't show anything interesting:

debug1: session_input_channel_req: session 0 req shell
debug1: PAM setting tty to "/dev/pts/2"
debug1: PAM establishing creds
debug1: fd 4 setting TCP_NODELAY
debug3: mm_answer_pty: tty /dev/pts/2 ptyfd 3
debug3: mm_request_receive entering
debug1: channel 0: rfd 10 isatty
debug1: fd 10 setting O_NONBLOCK
debug2: fd 9 is O_NONBLOCK
debug1: Setting controlling tty using TIOCSCTTY.

I don't think that sshd or ssh are doing any timing out. I think that the firewall or router at The Planet data center, or our Linode host system, is timing out TCP sessions with no activity after a certain number of seconds.

Which really sucks …

Note also that the server seems to still be hanging around even though the client disconnected:

-bash-2.05b$ ps auxww | grep sshd
root     26393  0.0  3.3  6704 1996 pts/1    S    Nov27   0:00 sshd -dddp 33
bji      26434  0.0  3.7  6732 2232 pts/1    S    Nov27   0:00 sshd -dddp 33

That's weird … it looks like all of my disconnected SSH sessions are still active on the server. It's like all of my TCP connections are being tunnelled and the tunnel-to-my-local-host connection is broken while the server-to-tunnel connection is still alive.

My guess is that the TCP connections are being tunnelled via the host system, with the Ethernet device being a virtual device implemented by the UML system, and that the host system (host5) is timing out my connection while the UML-side connection is being kept around.

Any comments?

Not too sure about this, but there are two things that you guys might want to try, and something caker may have to do on the host server. (I'm not too sure how this all works from a UML point of view)

The first is decreasing the value in /proc/sys/net/ipv4/tcp_keepalive_time (either on the UML or on the host server). (bji)

Lastly, you guys can try setting KeepAlive yes in either /etc/ssh/sshd_config or ~/.ssh/config
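On the client side, newer OpenSSH releases also offer `ServerAliveInterval`, which sends a keepalive through the encrypted SSH channel itself after that many seconds without data from the server. `KeepAlive` (renamed `TCPKeepAlive` in newer releases) instead relies on kernel TCP keepalives. Whether your ssh supports the option is worth checking in ssh_config(5). The sketch below appends a catch-all stanza to a client config file only if no `ServerAliveInterval` line exists yet; the helper name `add_ssh_keepalive` and the 60-second default are arbitrary choices.

```shell
# Append a "Host *" keepalive stanza to an ssh client config file,
# unless some ServerAliveInterval setting is already present.
# ssh_config keywords are case-insensitive, hence grep -i.
add_ssh_keepalive() {
    conf=$1
    secs=${2:-60}
    if ! grep -qi '^[[:space:]]*ServerAliveInterval' "$conf" 2>/dev/null; then
        # Leading blank line guards against a file lacking a final newline.
        printf '\nHost *\n    ServerAliveInterval %s\n' "$secs" >> "$conf"
    fi
}

# Usage:
#   add_ssh_keepalive "$HOME/.ssh/config" 60
# or, for a single connection:
#   ssh -o ServerAliveInterval=60 myuser@mylinode
```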

Hope that helps, sorry if it doesn't.

Bill Clinton

@Bill Clinton:

Not too sure about this, but there are two things that you guys might want to try, and something caker may have to do on the host server. (I'm not too sure how this all works from a UML point of view)

The first is increasing the value in /proc/sys/net/ipv4/tcp_keepalive_time (either on the UML or on the host server).

Lastly, you guys can try setting KeepAlive yes in either /etc/ssh/sshd_config or ~/.ssh/config

Hope that helps, sorry if it doesn't.

Bill Clinton

My understanding of the TCP keepalive time is that it is the idle time after which a TCP keepalive packet will be sent. If so, then we'd want to DECREASE the value in /proc/sys/net/ipv4/tcp_keepalive_time, so that keepalive packets are sent more often, which will (hopefully) get around the idle timeout that is set on the host.
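That reading matches the Linux tcp(7) description: `tcp_keepalive_time` is how long a connection must sit idle before the first keepalive probe; probes then repeat every `tcp_keepalive_intvl` seconds, and the connection is declared dead after `tcp_keepalive_probes` unanswered ones. While the peer keeps answering, an idle connection sees one probe per `tcp_keepalive_time`, which is what keeps an idle-timeout box in the path from forgetting it. These settings only affect sockets whose program enabled `SO_KEEPALIVE` (sshd does so when `KeepAlive` is on). The sketch just prints the three values; `show_keepalive` is a hypothetical helper, and its optional directory argument only exists so it can be tried against a copy.

```shell
# Print the kernel's TCP keepalive settings and the worst-case time an idle
# connection can stay up before a dead peer is detected:
#   time + intvl * probes   (defaults: 7200 + 75 * 9 = 7875 seconds)
show_keepalive() {
    dir=${1:-/proc/sys/net/ipv4}
    t=$(cat "$dir/tcp_keepalive_time")
    i=$(cat "$dir/tcp_keepalive_intvl")
    p=$(cat "$dir/tcp_keepalive_probes")
    echo "first probe after ${t}s idle, then every ${i}s, up to ${p} probes"
    echo "dead peer noticed after about $((t + i * p))s"
}

# Usage:  show_keepalive
```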

I'm going to try playing around with my Linode's keepalive settings and see what helps. I have a feeling that if I set it low enough, the timeouts will no longer happen. I'll do a binary search to try to find the largest keepalive time that still keeps sessions open, and report my results here.
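The binary search can be written down generically. In this sketch (all names hypothetical), `LO` is a value known to keep idle sessions alive, `HI` one known to let them drop, and the trial command is something you supply: it sets the candidate value, leaves an SSH session idle long enough (over half an hour here), and exits 0 if the session survived. Each real trial is slow, so a coarse stopping step is sensible.

```shell
# bisect_keepalive LO HI STEP TRIAL_CMD...
#   LO:   a tcp_keepalive_time value known to keep idle sessions alive
#   HI:   a value known to let them time out
#   STEP: stop once HI - LO is no larger than this
# TRIAL_CMD is run with the candidate value appended and must exit 0
# if the idle session survived. Prints the final "LO HI" bracket.
bisect_keepalive() {
    lo=$1 hi=$2 step=$3
    shift 3
    while [ $((hi - lo)) -gt "$step" ]; do
        mid=$(( (lo + hi) / 2 ))
        if "$@" "$mid"; then
            lo=$mid        # survived: the failing threshold is above mid
        else
            hi=$mid        # timed out: the threshold is at or below mid
        fi
    done
    echo "$lo $hi"
}
```

For example, with a stand-in trial that pretends anything below 1650 survives (`fake_trial() { [ "$1" -lt 1650 ]; }`), `bisect_keepalive 1500 1800 10 fake_trial` narrows the bracket to `1640 1650` after five trials.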

@bji:

My understanding of the TCP keepalive time is that it is the timeout after which a TCP keepalive packet will be sent. If so, then we'd want to DECREASE the value in /proc/sys/net/ipv4/tcp_keepalive_time

Yes, sadly I wrote the wrong word, doh! Updated now.

Bill Clinton

Mine works great: SSH stays locked on, no problems, and 9 ircds are holding firm with two psyBNCs and two Eggdrops, solid as a rock.

just curious, but why 9 ircds?

So far my results seem to show that a tcp_keepalive_time of 1650 causes SSH timeouts, but 1500 does not. I'm trying to narrow it down a little further.

Just for reference, the bridging that occurs on the hosts essentially dumps Ethernet frames onto the local network (the switch); the hosts don't have any concept of TCP connections between your Linodes and the world. It's just layer-2 bridging.

There is a known issue that affects a few of the hosts, including host5 and host6. Apparently, when someone joins the bridge it blips the bridge long enough to disturb at least some of the people already on the bridge.

I'm scheduling a reboot for host6 and will test the fix. If everything goes well, I'll schedule reboots for the rest of the hosts that are affected as well.

Now, as far as idle ssh connections being timed out, you can try this (worked for me):

# send keepalive packets every 5 minutes
echo 300 > /proc/sys/net/ipv4/tcp_keepalive_time

# I also added this to my /etc/ssh/sshd_config:
KeepAlive yes
ClientAliveInterval 10

/etc/init.d/ssh restart

Log out and reconnect. My idle SSH session remained connected overnight.

-Chris

@mikegrb:

just curious, but why 9 ircds?

$x9=more money

$x1=less money

I find it odd that idle ssh sessions are dying.

I generally keep SSH sessions open from at least 2 systems in separate locations (i.e., home/work) to my Linode and use screen to jump back and forth.

Using PuTTY or OpenSSH as the client, I have no real disconnect issues in the long term. I believe my PuTTY session from an XP system has been connected for about 2 weeks now, and it sits idle for hours if not days at a time.

-E

OK, I've narrowed it down as far as I am going to go; the magic number is between 1600 and 1700.

If you set your /proc/sys/net/ipv4/tcp_keepalive_time value to 1600, your (otherwise not keep-alived) SSH sessions will not time out.

If you set your /proc/sys/net/ipv4/tcp_keepalive_time value to 1700, your (otherwise not keep-alived) SSH sessions will time out after about half an hour.

Anyone who is not experiencing SSH timeouts probably either already has their tcp_keepalive_time value set low enough, or has SSH set to do keepalives.

I recommend using the tcp_keepalive_time value rather than SSH keepalives, because the former will work for all kinds of network connections, the latter only for SSH.
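One caveat worth knowing: the sysctl only sets the timing, and a connection gets keepalive probes only if the program that owns it turned on `SO_KEEPALIVE`. A quick way to see which connections are covered is netstat's `-o` (timers) option, whose last column reads something like `keepalive (1490.12/0/0)` or `off (0.00/0/0)`; that output format is an assumption about net-tools netstat. A small filter, with `keepalive_conns` as a made-up name:

```shell
# From `netstat -tno` output, print local and remote address of each TCP
# connection whose timer column currently says "keepalive".
# Assumes net-tools column order:
#   Proto Recv-Q Send-Q Local Foreign State Timer
keepalive_conns() {
    awk '$1 ~ /^tcp/ && $7 == "keepalive" { print $4, $5 }'
}

# Usage:  netstat -tno | keepalive_conns
```

A connection with data in flight may briefly show a retransmission timer (`on`) instead, so an occasional miss does not necessarily mean keepalive is off.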

@bji:

I recommend using the tcp_keepalive_time value rather than SSH keepalives, because the former will work for all kinds of network connections, the latter only for SSH.

Ok, but how do we set tcp_keepalive_time?

-Ashen

@Ashen:

Ok, but how do we set tcp_keepalive_time?

-Ashen

echo VALUE > /proc/sys/net/ipv4/tcp_keepalive_time

Put this in an rc startup file (such as /etc/rc.d/rc.local) to apply this setting on every boot.

@bji:

echo VALUE > /proc/sys/net/ipv4/tcp_keepalive_time

Put this in an rc startup file (such as /etc/rc.d/rc.local) to apply this setting on every boot.

While that will work just fine, the canonical Linux method is to use /etc/sysctl.conf, adding a line like:

net.ipv4.tcp_keepalive_time = 1500

See sysctl.conf(5) and sysctl(8) for details.
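To apply the sysctl.conf approach without hand-editing, a sketch like the following updates an existing `key = value` line or appends a new one. `set_sysctl_conf` is a hypothetical helper, and it relies on GNU `sed -i`. Afterwards, `sysctl -p` loads /etc/sysctl.conf immediately, and `sysctl -w net.ipv4.tcp_keepalive_time=1500` changes the running value without touching the file.

```shell
# Set KEY = VALUE in a sysctl.conf-style file: replace an existing
# assignment of KEY if there is one, otherwise append a new line.
# Uses GNU sed's in-place -i option.
set_sysctl_conf() {
    conf=$1 key=$2 val=$3
    # Escape the dots in the key so they match literally in the regex.
    re=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^[[:space:]]*$re[[:space:]]*=" "$conf" 2>/dev/null; then
        sed -i "s|^[[:space:]]*$re[[:space:]]*=.*|$key = $val|" "$conf"
    else
        printf '%s = %s\n' "$key" "$val" >> "$conf"
    fi
}

# Usage (as root):
#   set_sysctl_conf /etc/sysctl.conf net.ipv4.tcp_keepalive_time 1500
#   sysctl -p
```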
