Getting disconnected
I was wondering if anyone else has this problem. I IRC from my Linode with bX, and most of the time I'll have a second SSH session open so I can monitor some log files or do whatever else I need to do. I keep getting disconnected at random (at least 5 times today). Has this happened to anyone?
Erik
Acid-Duck@IRC
A search of my firewall logs showed odd TCP packets being dropped at roughly the same time the connection dropped (a netstat on the Linode also showed a very high Send-Q on the relevant connection). Opening up parts of my firewall solved the problem. Hope that helps.
Erik
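To spot a connection like this, where unacknowledged data is piling up in the send queue, it can help to filter netstat's output. This is a minimal sketch, assuming the usual Linux `netstat -tn` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the function name `stuck_sendq` and the 1000-byte threshold in the usage line are hypothetical choices, not part of any tool.

```shell
#!/bin/sh
# stuck_sendq (hypothetical helper): read `netstat -tn` output on stdin and
# print TCP connections whose Send-Q (bytes not yet acknowledged by the
# remote host) is at least MIN_BYTES (default 1).
stuck_sendq() {
    min_bytes=${1:-1}
    awk -v min="$min_bytes" '
        # Data lines start with "tcp" or "tcp6"; header lines do not.
        # Column 3 is Send-Q; columns 4-6 are local, foreign and state.
        $1 ~ /^tcp/ && $3 + 0 >= min + 0 {
            printf "%s -> %s  Send-Q=%s  %s\n", $4, $5, $3, $6
        }'
}

# Usage: netstat -tn | stuck_sendq 1000
```

Running it in the surviving session while the other one hangs shows whether the dead session's data is stuck on the Linode side.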
Also, what distribution are you using? I believe there's a setting somewhere that may be enabled which is causing your sessions to time out automatically.
It stems from the early days of Unix.
Adam
@Quik:
When the connection to one SSH client goes down, run netstat in the other and see what it says.
Also, what distribution are you using? I believe there's a setting somewhere that may be enabled which is causing your sessions to time out automatically.
I'm using the small Slackware image. The thing is, there is activity since I'm IRC'ing, and the program is run from the shell. The funny thing is that my connection doesn't time out, but my SSH session stops responding, which forces me to close my SSH windows. Then I see my client quit IRC (I'm on IRC with two clients). So I guess the connection between my client and the IRC server was still active, but for some reason the one between me and the SSH server was losing packets?
Erik
Or try having two SSH sessions set up and see whether only the one running IRC gets kicked.
Adam
@adamgent:
You can always run the SSH daemon in debug mode and see what happens when your connection is lost; it may show the problem.
Or try having two SSH sessions set up and see whether only the one running IRC gets kicked.
Adam
I usually do have two windows, and both of them drop. As for running SSH in debug mode, I'll definitely look into it. Thanks!
Erik
So, run:
sshd -d -d -d
You will then need to review the system log.
Adam
sshd -dddp 33
and ssh -vvvp 33 myuser@mylinode
This is what I generally do. It leaves your normal daemon running on port 22, so you can still connect without having to use the console to restart it. It may also be nice to log the terminal that sshd is started from:
script ~/sshd.log
This starts a new shell and records everything shown on that terminal to sshd.log in your home directory.
Hope you get your problem worked out. I haven't had any issues like this on my Linode. As an aside, you might look into the screen program. If your session gets disconnected, ssh back in and type 'screen -r', and you're back where you left off. You can also detach manually with Ctrl+A then D. This leaves screen but keeps the session running, so you can reconnect to it with 'screen -r' later.
I ran an sshd with debugging, and also a local ssh to my server with debugging as well, and I don't see anything in the SSH logs that would indicate what is going on.
The client times out and the last few things it says are:
debug1: fd 3 setting TCP_NODELAY
debug2: callback done
debug1: channel 0: open confirm rwindow 0 rmax 32768
debug2: channel 0: rcvd adjust 131072
debug3: channel_close_fds: channel 0: r -1 w -1 e -1
then my shell output is printed, and I leave it sitting for half an hour; on my next keypress, I get:
debug1: channel_free: channel 0: client-session, nchannels 1
debug3: channel_free: status: The following connections are open:
#0 client-session (t4 r0 i0/0 o0/0 fd 4/5)
debug3: channel_close_fds: channel 0: r 4 w 5 e 6
Read from remote host shell.ischo.com: Connection reset by peer
Connection to shell.ischo.com closed.
debug1: Transferred: stdin 0, stdout 0, stderr 104 bytes in 2034.8 seconds
debug1: Bytes per second: stdin 0.0, stdout 0.0, stderr 0.1
debug1: Exit status -1
Looks like the connection is reset by the remote host. But on the server side, the logs don't show anything interesting:
debug1: session_input_channel_req: session 0 req shell
debug1: PAM setting tty to "/dev/pts/2"
debug1: PAM establishing creds
debug1: fd 4 setting TCP_NODELAY
debug3: mm_answer_pty: tty /dev/pts/2 ptyfd 3
debug3: mm_request_receive entering
debug1: channel 0: rfd 10 isatty
debug1: fd 10 setting O_NONBLOCK
debug2: fd 9 is O_NONBLOCK
debug1: Setting controlling tty using TIOCSCTTY.
I don't think that sshd or ssh is doing any timing out. I think that the firewall or router at The Planet data center, or our Linode host system, is timing out TCP sessions that have been idle for a certain number of seconds.
Which really sucks …
Note also that the server seems to still be hanging around even though the client disconnected:
-bash-2.05b$ ps auxww | grep sshd
root 26393 0.0 3.3 6704 1996 pts/1 S Nov27 0:00 sshd -dddp 33
bji 26434 0.0 3.7 6732 2232 pts/1 S Nov27 0:00 sshd -dddp 33
That's weird … it looks like all of my disconnected SSH sessions are still active on the server. It's like all of my TCP connections are being tunnelled and the tunnel-to-my-local-host connection is broken while the server-to-tunnel connection is still alive.
My guess is that the TCP connections are being tunnelled via the host system, with the Ethernet device being a virtual device implemented by UML, and that the host system (host5) is timing out my connection while the UML-side connection is being kept around.
Any comments?
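The first is decreasing the value in /proc/sys/net/ipv4/tcp_keepalive_time (either on the UML or on the host server).
Lastly, you guys can try setting KeepAlive yes in either /etc/ssh/sshd_config or ~/.ssh/config.
Hope that helps, sorry if it doesn't.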
Bill Clinton
@Bill Clinton:
Not too sure about this, but there are two things that you guys might want to try, and something caker may have to do on the host server. (I'm not too sure how this all works from a UML point of view.)
The first is increasing the value in
/proc/sys/net/ipv4/tcp_keepalive_time (either on the UML or on the host server).
Lastly, you guys can try setting
KeepAlive yes in either /etc/ssh/sshd_config or ~/.ssh/config.
Hope that helps, sorry if it doesn't.
Bill Clinton
My understanding of the TCP keepalive time is that it is the idle time after which a TCP keepalive packet will be sent. If so, then we'd want to DECREASE the value in /proc/sys/net/ipv4/tcp_keepalive_time, so that we send keepalive packets more often, which will (hopefully) get around the idle timeout that is set on the host.
I'm going to try playing around with my Linode's keepalive settings and see what helps. I have a feeling that if I set it low enough, the timeouts will no longer happen. I'll do a binary search to try to find the longest keepalive time that still keeps sessions open, and report my results here.
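A binary search like this only works if there is a single cut-off: every value at or below it keeps sessions open and every value above it does not. Under that assumption, the search can be sketched as below. `largest_working` and the predicate it calls are hypothetical; in practice the predicate would set tcp_keepalive_time, open an SSH session and leave it idle long enough to see whether it survives, so each step is slow.

```shell
#!/bin/sh
# largest_working LO HI PREDICATE (hypothetical helper)
# Binary search for the largest integer in [LO, HI] for which
# "PREDICATE value" succeeds, assuming PREDICATE succeeds for LO and that
# success is monotonic (true up to some cut-off, false above it).
largest_working() {
    _lo=$1
    _hi=$2
    _pred=$3
    while [ "$_lo" -lt "$_hi" ]; do
        # Round up so the range always shrinks when _lo moves to _mid.
        _mid=$(( (_lo + _hi + 1) / 2 ))
        if "$_pred" "$_mid"; then
            _lo=$_mid            # _mid works: the answer is _mid or higher
        else
            _hi=$(( _mid - 1 ))  # _mid fails: the answer is below _mid
        fi
    done
    echo "$_lo"
}
```

For example, with a stand-in predicate that succeeds for values up to 1650, `largest_working 60 7200 pred` prints 1650. That cut-off is made up for illustration; it is not a measured value.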
@bji:
My understanding of the TCP keepalive time is that it is the timeout after which a TCP keepalive packet will be sent. If so, then we'd want to DECREASE the value in /proc/sys/net/ipv4/tcp_keepalive_time
Yes, sadly I wrote the wrong word, doh! Updated now.
Bill Clinton
There is a known issue that affects a few of the hosts, including host5 and host6. Apparently, when someone joins the bridge it blips the bridge long enough to disturb at least some of the people already on the bridge.
I'm scheduling a reboot for host6 and will test the fix. If everything goes well, I'll schedule reboots for the rest of the hosts that are affected as well.
Now, as far as idle ssh connections being timed out, you can try this (worked for me):
# send keepalive packets every 5 minutes
echo 300 > /proc/sys/net/ipv4/tcp_keepalive_time
# I also added this to my /etc/ssh/sshd_config:
KeepAlive yes
ClientAliveInterval 10
/etc/init.d/ssh restart
Log out and reconnect. My idle SSH session remained connected overnight.
-Chris
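To see what the kernel is currently using, the related settings can be read back from /proc. Per the Linux tcp(7) man page, tcp_keepalive_time is the idle time before the first probe, tcp_keepalive_intvl is the gap between probes, and tcp_keepalive_probes is how many unanswered probes are sent before the connection is dropped; the stock default for tcp_keepalive_time is 7200 seconds (two hours). The helper below is a hypothetical sketch; its optional directory argument exists only so it can be tried against copies of those files.

```shell
#!/bin/sh
# show_keepalive [DIR] (hypothetical helper): print the kernel's TCP
# keepalive settings and roughly how long an idle connection to a dead
# peer lasts before the kernel gives up (time + intvl * probes).
show_keepalive() {
    dir=${1:-/proc/sys/net/ipv4}
    t=$(cat "$dir/tcp_keepalive_time") || return 1
    i=$(cat "$dir/tcp_keepalive_intvl") || return 1
    p=$(cat "$dir/tcp_keepalive_probes") || return 1
    echo "keepalive_time=${t}s"
    echo "keepalive_intvl=${i}s"
    echo "keepalive_probes=${p}"
    echo "dead_peer_dropped_after=$(( t + i * p ))s"
}

# Usage: show_keepalive    (run it before and after changing the value)
```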
@mikegrb:
Just curious, but why 9 ircds?
$x9=more money
$x1=less money
I generally keep SSH sessions open from at least two systems in separate locations (i.e., home and work) to my Linode and use screen to jump back and forth.
Using PuTTY or OpenSSH as the client, I have no real long-term disconnect issues. I believe my PuTTY session from an XP system has been connected for about two weeks now, and it sits idle for hours if not days at a time.
-E
If you set your /proc/sys/net/ipv4/tcp_keepalive_time value to 1600, your SSH sessions (without SSH-level keepalives) will not time out.
If you set your /proc/sys/net/ipv4/tcp_keepalive_time value to 1700, your SSH sessions (without SSH-level keepalives) will time out after about half an hour.
Anyone who is not experiencing SSH timeouts probably either already has their tcp_keepalive_time value set low enough or has SSH set to do keepalives.
I recommend using the tcp_keepalive_time value rather than SSH keepalives, because the former works for all kinds of network connections, while the latter works only for SSH.
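One caveat worth checking before relying on the kernel setting alone: according to tcp(7), keepalive probes are only sent on sockets that enable the SO_KEEPALIVE option, so tcp_keepalive_time has no effect on programs that never ask for it. sshd and ssh do set it when KeepAlive (renamed TCPKeepAlive in later OpenSSH releases) is yes, which is the default. `netstat -tno` adds a Timer column that reads `keepalive (...)` on such sockets while they are idle; the filter below is a sketch assuming that layout, and `keepalive_conns` is a hypothetical name.

```shell
#!/bin/sh
# keepalive_conns (hypothetical helper): read `netstat -tno` output on stdin
# and list TCP connections whose timer column shows a keepalive timer.
# An idle socket with SO_KEEPALIVE set normally shows "keepalive"; one with
# unacknowledged data shows the retransmit timer ("on") instead, so this
# list can miss some keepalive-enabled sockets.
keepalive_conns() {
    awk '
        # Column 7 is the timer name ("keepalive", "on", "off", ...),
        # column 8 its "(remaining/retries/probes)" details.
        $1 ~ /^tcp/ && $7 == "keepalive" {
            printf "%s -> %s  %s %s\n", $4, $5, $7, $8
        }'
}

# Usage: netstat -tno | keepalive_conns
```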
@bji:
I recommend using the tcp_keepalive_time value rather than SSH keepalives, because the former works for all kinds of network connections, while the latter works only for SSH.
Ok, but how do we set tcpkeepalivetime?
-Ashen
@Ashen:
@bji:
I recommend using the tcp_keepalive_time value rather than SSH keepalives, because the former works for all kinds of network connections, while the latter works only for SSH.
Ok, but how do we set tcpkeepalivetime?
-Ashen
echo VALUE > /proc/sys/net/ipv4/tcp_keepalive_time
Put this in an rc startup file (such as /etc/rc.d/rc.local) to apply this setting on every boot.
@bji:
echo VALUE > /proc/sys/net/ipv4/tcp_keepalive_time
Put this in an rc startup file (such as /etc/rc.d/rc.local) to apply this setting on every boot.
While that will work just fine, the canonical Linux method is to use /etc/sysctl.conf, adding a line like:
net.ipv4.tcp_keepalive_time = 1500
See sysctl.conf(5) and sysctl(8) for details.
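Adding the line by hand is the usual route. If it should be scripted, a minimal sketch that replaces any existing entry for the key instead of appending a duplicate might look like this. `set_sysctl_conf` is a hypothetical helper, its optional third argument exists only for testing, and `sysctl -p` (which re-reads /etc/sysctl.conf) is what applies the file without a reboot.

```shell
#!/bin/sh
# set_sysctl_conf KEY VALUE [FILE] (hypothetical helper)
# Remove any existing "KEY = ..." line from FILE (default /etc/sysctl.conf),
# then append "KEY = VALUE". Comments and other keys are left untouched.
set_sysctl_conf() {
    key=$1
    value=$2
    conf=${3:-/etc/sysctl.conf}
    [ -f "$conf" ] || : > "$conf" || return 1
    # Escape the dots in the key so grep matches them literally.
    pat=$(printf '%s' "$key" | sed 's/\./\\./g')
    tmp=$(mktemp) || return 1
    grep -v "^[[:space:]]*$pat[[:space:]]*=" "$conf" > "$tmp"
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    # Copy back with cat so the file keeps its owner and permissions.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Usage (as root):
#   set_sysctl_conf net.ipv4.tcp_keepalive_time 1500 && sysctl -p
```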