Jumphost suddenly resetting first SSH MUX connection attempts
I have been using a Debian 9 SSH jumpbox host to run my scripts/Ansible playbooks for a while. The jumpbox talks mostly to Debian 9 and some Debian 8 servers. Most of the servers are VMs running under VMware Enterprise 5.5.
The SSH client on the jumpbox is configured for SSH multiplexing (MUX), and authentication is done with an RSA certificate file.
SSH had been working well for years; however, connections suddenly started failing with the error ssh_exchange_identification: read: Connection reset by peer on the first attempt, several times a day, which obviously creates havoc with my scripts and those of our development team.
However, after the first attempt, connections are fine for a while. The misbehaving servers appear random at first, but there seem to be some patterns/timeouts. If I send a command to all of the servers, for instance by running a command before the intended script/playbook, a few will fail, but the next script will run on all of them.
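The "send a throwaway command first" approach described above can be sketched as follows. This is only an illustration of the workaround, not a fix for the underlying resets. The names `ssh_true` and `warm_up` are made up for this example, and it assumes that a plain `ssh host true` with the standard `BatchMode` and `ConnectTimeout` options is an acceptable no-op on the servers.

```python
import subprocess
from typing import Callable, Iterable, List


def ssh_true(host: str) -> bool:
    """Run a no-op command over SSH; True if ssh exited with status 0."""
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10", host, "true"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def warm_up(
    hosts: Iterable[str],
    attempt: Callable[[str], bool] = ssh_true,
    tries: int = 2,
) -> List[str]:
    """Try each host up to `tries` times; return the hosts that never succeeded."""
    still_failing = []
    for host in hosts:
        # any() stops at the first successful attempt for this host.
        if not any(attempt(host) for _ in range(tries)):
            still_failing.append(host)
    return still_failing
```

Calling `warm_up(["sql01", "fenix-storage"])` before a playbook would absorb the first failed attempt and report only hosts that fail twice in a row. The `attempt` parameter exists so the retry logic can be exercised without touching the network.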
There haven't been any recent significant changes on the servers, except for security updates. The transition to Debian 9 happened a (significant) while ago.
I did find an MTU setting that had once been applied to several servers during a malfunction and then forgotten; however, that turned out not to be the cause. I also reduced ControlPersist on the client side and ClientAliveInterval on the server side, both to 1h, and that did not improve the situation.
So at the moment, I am at a loss as to why this is happening. I am, however, more inclined towards a layer 7 issue than a network problem.
The SSH configuration on the client side (/etc/ssh/ssh_config, Debian 9) is:
Host *
SendEnv LANG LC_*
HashKnownHosts yes
GSSAPIAuthentication yes
GSSAPIDelegateCredentials no
ControlMaster auto
ControlPath /tmp/ssh_mux_%h_%p_%r
ControlPersist 1h
Compression no
UseRoaming no
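To make the ControlPath setting concrete, here is a small sketch of how the `%h`, `%p` and `%r` tokens (remote host, port and remote user) expand into the socket names that show up in the debug logs, plus a check for whether a master socket currently exists. The function names are invented for this example and only the tokens used in this configuration are handled.

```python
import os
import re


def expand_control_path(template: str, host: str, port: int, remote_user: str) -> str:
    """Expand the %h, %p, %r (and literal %%) tokens of a ControlPath template."""
    tokens = {"%h": host, "%p": str(port), "%r": remote_user, "%%": "%"}
    return re.sub(r"%[hpr%]", lambda m: tokens[m.group(0)], template)


def has_master_socket(template: str, host: str, port: int, remote_user: str) -> bool:
    """True if a control socket file exists for this host/port/user."""
    return os.path.exists(expand_control_path(template, host, port, remote_user))


print(expand_control_path("/tmp/ssh_mux_%h_%p_%r", "sql01", 22, "rui"))
# prints /tmp/ssh_mux_sql01_22_rui
```

If `has_master_socket` returns False for a host, the next ssh invocation has to open a fresh TCP connection rather than reuse a multiplexed one.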
The sshd configuration on the server side, on several Debian servers:
Protocol 2
HostKey /etc/ssh/ssh_host_rsa_key
HostKey /etc/ssh/ssh_host_dsa_key
UsePrivilegeSeparation yes
SyslogFacility AUTH
LogLevel INFO
LoginGraceTime 120
PermitRootLogin forced-commands-only
StrictModes yes
PubkeyAuthentication yes
IgnoreRhosts yes
HostbasedAuthentication no
PermitEmptyPasswords no
ChallengeResponseAuthentication no
PasswordAuthentication no
X11Forwarding no
X11DisplayOffset 10
PrintMotd no
PrintLastLog yes
TCPKeepAlive yes
AcceptEnv LANG LC_*
Subsystem sftp /usr/lib/openssh/sftp-server -l INFO
UsePAM yes
ClientAliveInterval 3600
ClientAliveCountMax 0
AddressFamily inet
SSH versions:
client:
$ ssh -V
OpenSSH_7.4p1 Debian-10+deb9u1, OpenSSL 1.0.2l 25 May 2017
server(s):
SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u1 (Debian 9)
SSH-2.0-OpenSSH_6.7p1 Debian-5+deb8u3 (Debian 8)
I have seen the error at least in cases where both servers involved run the 4.9.0-0.bpo.1-amd64 kernel.
Below is the tcpdump of a misbehaving server; both machines are on the same network, with no firewalls in between. I also monitor MAC addresses, and there is no log of a new machine with the same MAC addresses in the last few years.
# tcpdump port 22
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
19:42:25.462896 IP jumbox.40270 > server.ssh: Flags [S], seq 3882361678, win 23200, options [mss 1160,sackOK,TS val 354223428 ecr 0,nop,wscale 7], length 0
19:42:25.463289 IP server.ssh > jumbox.40270: Flags [S.], seq 405921081, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0
19:42:25.463306 IP jumbox.40270 > server.ssh: Flags [.], ack 1, win 182, length 0
19:42:25.481470 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0
19:42:25.481477 IP jumbox.40270 > server.ssh: Flags [.], ack 504902058, win 182, length 0
19:42:25.481490 IP server.ssh > jumbox.40270: Flags [R], seq 405921082, win 0, length 0
19:42:25.481494 IP server.ssh > jumbox.40270: Flags [P.], seq 504902058:504902097, ack 1, win 182, length 39
19:42:26.491536 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0
19:42:26.491551 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0
19:42:28.507528 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0
19:42:28.507552 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0
19:42:32.699540 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0
19:42:32.699556 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0
19:42:40.891490 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0
19:42:40.891514 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0
19:42:57.019511 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0
19:42:57.019534 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0
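One detail in the capture above is that the SYN-ACK segments sent to the same client port carry two different sequence numbers (405921081 and 4195986320). A rough sketch for flagging that pattern in longer captures follows. It assumes the default one-line `tcpdump` text format shown here, and `conflicting_syn_acks` is a name made up for this example.

```python
import re
from collections import defaultdict

# Matches SYN-ACK lines such as:
# "... IP server.ssh > jumbox.40270: Flags [S.], seq 405921081, ack ..."
SYN_ACK = re.compile(r"IP (\S+) > (\S+): Flags \[S\.\], seq (\d+),")


def conflicting_syn_acks(lines):
    """Return {(src, dst): [seqs]} for flows whose SYN-ACKs carry more than one seq."""
    seqs = defaultdict(set)
    for line in lines:
        match = SYN_ACK.search(line)
        if match:
            src, dst, seq = match.groups()
            seqs[(src, dst)].add(int(seq))
    return {flow: sorted(s) for flow, s in seqs.items() if len(s) > 1}


SAMPLE = [
    "19:42:25.462896 IP jumbox.40270 > server.ssh: Flags [S], seq 3882361678, win 23200, length 0",
    "19:42:25.463289 IP server.ssh > jumbox.40270: Flags [S.], seq 405921081, ack 3882361679, win 23200, length 0",
    "19:42:25.481470 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, length 0",
]
print(conflicting_syn_acks(SAMPLE))
```

Run against a saved capture (for example `conflicting_syn_acks(open("capture.txt"))`, where the file name is hypothetical), it lists every flow for which more than one initial sequence number was seen.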
An ssh -v server log of a failed connection, with the reset error:
OpenSSH_7.4p1 Debian-10+deb9u1, OpenSSL 1.0.2l 25 May 2017
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug1: /etc/ssh/ssh_config line 59: Deprecated option "useroaming"
debug1: auto-mux: Trying existing master
debug1: Control socket "/tmp/ssh_mux_fenix-storage_22_rui" does not exist
debug1: Connecting to fenix-storage [10.10.32.156] port 22.
debug1: Connection established.
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_rsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_rsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_dsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_dsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_ecdsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_ecdsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_ed25519 type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
write: Connection reset by peer
An ssh -v server log of a successful connection:
OpenSSH_7.4p1 Debian-10+deb9u1, OpenSSL 1.0.2l 25 May 2017
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug1: /etc/ssh/ssh_config line 59: Deprecated option "useroaming"
debug1: auto-mux: Trying existing master
debug1: Control socket "/tmp/ssh_mux_sql01_22_rui" does not exist
debug1: Connecting to sql01 [10.20.10.88] port 22.
debug1: Connection established.
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_rsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_rsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_dsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_dsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_ecdsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_ecdsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_ed25519 type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u1
debug1: Remote protocol version 2.0, remote software version OpenSSH_7.4p1 Debian-10+deb9u1
debug1: match: OpenSSH_7.4p1 Debian-10+deb9u1 pat OpenSSH* compat 0x04000000
debug1: Authenticating to sql01:22 as 'rui'
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: algorithm: curve25519-sha256
debug1: kex: host key algorithm: rsa-sha2-512
debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: Server host key: ssh-rsa SHA256:6aJ+ipXRZJfbei5YbYtvqKXB01t1YO34O2ChdT/vk/4
debug1: Host 'sql01' is known and matches the RSA host key.
debug1: Found key in /home/rui/.ssh/known_hosts:315
debug1: rekey after 134217728 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: rekey after 134217728 blocks
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,ssh-rsa,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521>
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /home/rui/.ssh/id_rsa
debug1: Server accepts key: pkalg ssh-rsa blen 277
debug1: Authentication succeeded (publickey).
Authenticated to sql01 ([10.20.10.88]:22).
debug1: setting up multiplex master socket
debug1: channel 0: new [/tmp/ssh_mux_sql01_22_rui]
debug1: control_persist_detach: backgrounding master process
debug1: forking to background
debug1: Entering interactive session.
debug1: pledge: id
debug1: multiplexing control connection
debug1: channel 1: new [mux-control]
debug1: channel 2: new [client-session]
debug1: client_input_global_request: rtype hostkeys-00@openssh.com want_reply 0
debug1: Sending environment.
debug1: Sending env LC_ALL = en_US.utf8
debug1: Sending env LANG = en_US.UTF-8
debug1: mux_client_request_session: master session id: 2
Interestingly enough, the behaviour can be reproduced with a telnet command:
$ telnet remote-server 22
Trying x.x.x.x...
Connected to remote-server
Escape character is '^]'.
Connection closed by foreign host.
$ telnet remote-server 22
Trying x.x.x.x...
Connected to remote-server
Escape character is '^]'.
SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u1
Protocol mismatch.
Connection closed by foreign host.
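The telnet test can be automated so that it is easy to repeat against many servers. The sketch below opens a TCP connection, waits for the server's identification line, and reports what happened. The function name and the "RESET"/"CLOSED"/"TIMEOUT" tags are choices made for this example, not anything OpenSSH defines.

```python
import socket


def probe_banner(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Return the first line the server sends, or a short tag describing the failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            data = sock.recv(256)
    except ConnectionResetError:
        return "RESET"
    except socket.timeout:
        return "TIMEOUT"
    except OSError as exc:
        return f"ERROR: {exc}"
    if not data:
        # The peer closed the connection without sending a banner,
        # as in the first telnet attempt above.
        return "CLOSED"
    return data.decode("ascii", errors="replace").splitlines()[0]
```

Calling `probe_banner("remote-server")` twice in a row should mirror the two telnet runs: a failure tag on the first attempt and the `SSH-2.0-...` banner on the second, if the behaviour is the same.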
UPDATE:
Forced Protocol 2 in the /etc/ssh/ssh_config client configuration on the jumpbox. No change.
UPDATE2:
Changed the old key encrypted with DES-EDE3-CBC for a new key encrypted with AES-128-CBC. Again no visible change.
UPDATE3:
Interestingly enough, while the mux is active, the situation does not present itself.
UPDATE4:
I have also found a similar question on Server Fault, although without an accepted answer: https://serverfault.com/questions/445045/ssh-connection-error-ssh-exchange-identification-read-connection-reset-by-pe
Tried regenerating the SSH host keys, and the suggestion of adding sshd: ALL, without success.
UPDATE 5
Opened a console on the destination VM and saw something 'strange'.
In the tcpdump below, 1.1.1.1 is the jumpbox.
# tcpdump -n -vvv "host 1.1.1.1"
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
11:47:45.808273 IP (tos 0x0, ttl 64, id 38171, offset 0, flags [DF], proto TCP (6), length 60)
1.1.1.1.37924 > 1.1.1.2.22: Flags [S], cksum 0xfc1f (correct), seq 3260568985, win 29200, options [mss 1460,sackOK,TS val 407355522 ecr 0,nop,wscale 7], length 0
11:47:45.808318 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
1.1.1.2.22 > 1.1.1.1.37924: Flags [S.], cksum 0x5508 (incorrect -> 0x68a8), seq 2881609759, ack 3260568986, win 28960, options [mss 1460,sackOK,TS val 561702650 ecr 407355522,nop,wscale 7], length 0
11:47:45.808525 IP (tos 0x0, ttl 64, id 38172, offset 0, flags [DF], proto TCP (6), length 52)
1.1.1.1.37924 > 1.1.1.2.22: Flags [.], cksum 0x07b0 (correct), seq 1, ack 1, win 229, options [nop,nop,TS val 407355522 ecr 561702650], length 0
11:47:45.808917 IP (tos 0x0, ttl 64, id 38173, offset 0, flags [DF], proto TCP (6), length 92)
1.1.1.1.37924 > 1.1.1.2.22: Flags [P.], cksum 0x6de0 (correct), seq 1:41, ack 1, win 229, options [nop,nop,TS val 407355522 ecr 561702650], length 40
11:47:45.808930 IP (tos 0x0, ttl 64, id 1754, offset 0, flags [DF], proto TCP (6), length 52)
1.1.1.2.22 > 1.1.1.1.37924: Flags [.], cksum 0x5500 (incorrect -> 0x0789), seq 1, ack 41, win 227, options [nop,nop,TS val 561702651 ecr 407355522], length 0
11:47:45.822178 IP (tos 0x0, ttl 64, id 1755, offset 0, flags [DF], proto TCP (6), length 91)
1.1.1.2.22 > 1.1.1.1.37924: Flags [P.], cksum 0x5527 (incorrect -> 0x70c1), seq 1:40, ack 41, win 227, options [nop,nop,TS val 561702654 ecr 407355522], length 39
11:47:45.822645 IP (tos 0x0, ttl 64, id 21666, offset 0, flags [DF], proto TCP (6), length 40)
1.1.1.1.37924 > 1.1.1.2.22: Flags [R], cksum 0xaeb1 (correct), seq 3260569026, win 0, length 0
11:47:50.919752 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 1.1.1.2 tell 1.1.1.1, length 46
11:47:50.919773 ARP, Ethernet (len 6), IPv4 (len 4), Reply 1.1.1.2 is-at 00:50:56:b9:3d:2b, length 28
11:47:50.948732 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 1.1.1.1 tell 1.1.1.2, length 28
11:47:50.948916 ARP, Ethernet (len 6), IPv4 (len 4), Reply 1.1.1.1 is-at 00:50:56:80:57:1a, length 46
^C
11 packets captured
11 packets received by filter
0 packets dropped by kernel
UPDATE 6
Due to the checksum errors, I disabled TCP/UDP checksum offloading to the NIC in the VM; however, it did not help.
$ sudo ethtool -K eth0 rx off
$ sudo ethtool -K eth0 tx off
iface eth0 inet static
address 1.1.1.2
netmask 255.255.255.0
network 1.1.1.0
broadcast 1.1.1.255
gateway 1.1.1.254
post-up /sbin/ethtool -K $IFACE rx off
post-up /sbin/ethtool -K $IFACE tx off
Understanding TCP Checksum Offloading (TCO) in a VMware Environment (2052904)
UPDATE 7
Disabled GSSAPIAuthentication in the SSH client on the jumpbox. Also tested enabling Compression yes.
No change.
UPDATE 8
Tested filling in the checksum with iptables.
/sbin/iptables -A POSTROUTING -t mangle -p tcp -j CHECKSUM --checksum-fill
It did not improve the situation.
UPDATE 9:
Found an interesting test about limiting ciphers; will try it out. MTU problems do not seem to be the culprit, as I am having problems in some cases with server and client on the same network.
For now tested in the client side "ssh -c aes256-ctr", and the symptoms do not improve.
The mysterious case of broken SSH client (“connection reset by peer”)
UPDATE 10
Added this to /etc/ssh/ssh_config:
Ciphers aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc
No change.
SSH issues: Read from socket failed: Connection reset by peer
UPDATE 11
Configured the SSH service to listen on both port 22 and port 2222. It did not help.
UPDATE 12
I suspected a regression bug present in OpenSSH 7.4 that was fixed in OpenSSH 7.5.
Release notes from OpenSSH 7.5
- sshd(8): Fix regression in OpenSSH 7.4 support for the
server-sig-algs extension, where SHA2 RSA signature methods were
not being correctly advertised. bz#2680
To use OpenSSH 7.5 on Debian 9/Stretch, I installed openssh-client and openssh-server from Debian testing/Buster.
No improvement.
UPDATE 13
Defined
Ciphers aes256-ctr
MACs hmac-sha1
Both at the client(s) and server side. No improvements.
UPDATE 14
Setup
UseDNS no
GSSAPIAuthentication no
GSSAPIKeyExchange no
No change.
UPDATE 15
Changed this in /etc/ssh/sshd_config:
TCPKeepAlive no
From "How does tcp-keepalive work in ssh?":
TCPKeepAlive operates on the TCP layer. It sends an empty TCP ACK
packet [from the SSH server to the client - Rui]. Firewalls can be configured to ignore these packets, so if you
go through a firewall that drops idle connections, these may not keep
the connection alive.
My guess is that TCPKeepAlive made the server send a packet that is being optimised away/ignored at some layer further down the stack, and somehow the remote SSH server believed it was still connected to the TCP mux client while in fact the session had already been torn down; thus the TCP reset(s) on the first try.
So whilst some say that if you're using ClientAliveInterval you can disable TCPKeepAlive, it seems to be more that if you are using ClientAliveInterval you ought to disable TCPKeepAlive.
- It is clearly this option; as for the explanation, it is mainly conjecture, and I will have to double-check it against the source when and if I have time.
TCPKeepAlive apparently also has spoofing issues, so it is recommended that it be turned off.
Nevertheless, the problem persists.
debian ssh vmware
add a comment |
I have been using a Debian 9 SSH jumpbox host to run my scripts/ansible playbooks for a while. The jumbox talks with Debian 9 and some Debian 8 servers, mostly. Most of the servers are VMs running under VMWare Enterprise 5.5.
The SSH client in the jumbox is configured for doing SSH MUX, and the authentication is done by an RSA certificate file.
The SSH has been working well for years now, however suddenly SSH connections started giving the error ssh_exchange_identification: read: Connection reset by peer
at first try, several times a day, which obviously creates havoc with my scripts and scripts of our development team.
However, after the first try they are ok for a while. The servers misbehaving appear be random at first, but they have some patterns/timeouts. If I do send a command to all of the servers, for instance, running in a command before the intended script/playbook, a few will fail, but the next script will run in all of them.
There havent been recent significant changes on the servers, except for security updates. The transition for Debian 9 has already some (significant) time.
I already found a MTU configuration or other that was once applied to several servers in a malfunction and forgotten, however that was not the case. I also diminished both on the client and server side the ControlPersist
and ClientAliveInterval
both to 1h, and that did not improve the situation.
So at the moment, I am at loss of why this is happening. I am however more inclined to a layer 7 issue than a network problem.
The SSH configuration on the client side /etc/ssh_config
, Debian 9 is:
Host *
SendEnv LANG LC_*
HashKnownHosts yes
GSSAPIAuthentication yes
GSSAPIDelegateCredentials no
ControlMaster auto
ControlPath /tmp/ssh_mux_%h_%p_%r
ControlPersist 1h
Compression no
UseRoaming no
On SSH on the server side of several Debian servers:
Protocol 2
HostKey /etc/ssh/ssh_host_rsa_key
HostKey /etc/ssh/ssh_host_dsa_key
UsePrivilegeSeparation yes
SyslogFacility AUTH
LogLevel INFO
LoginGraceTime 120
PermitRootLogin forced-commands-only
StrictModes yes
PubkeyAuthentication yes
IgnoreRhosts yes
HostbasedAuthentication no
PermitEmptyPasswords no
ChallengeResponseAuthentication no
PasswordAuthentication no
X11Forwarding no
X11DisplayOffset 10
PrintMotd no
PrintLastLog yes
TCPKeepAlive yes
AcceptEnv LANG LC_*
Subsystem sftp /usr/lib/openssh/sftp-server -l INFO
UsePAM yes
ClientAliveInterval 3600
ClientAliveCountMax 0
AddressFamily inet
SSH versions:
client -
$ssh -V
OpenSSH_7.4p1 Debian-10+deb9u1, OpenSSL 1.0.2l 25 May 2017
server(s)
SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u1 (Debian 9)
SSH-2.0-OpenSSH_6.7p1 Debian-5+deb8u3 (Debian 8)
I have seen that error at least in situations with both servers with the 4.9.0-0.bpo.1-amd64 version.
The tcpdump
of a server misbehaving, both machines being in the same network without any firewalls in the middle. I also monitor MAC addresses and there is not log of a new machine/MAC with the same MAC addresses in the last few years.
#tcpdump port 22
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
19:42:25.462896 IP jumbox.40270 > server.ssh: Flags [S], seq 3882361678, win 23200, options [mss 1160,sackOK,TS val 354223428 ecr 0,nop,wscale 7], length 0
19:42:25.463289 IP server.ssh > jumbox.40270: Flags [S.], seq 405921081, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0
19:42:25.463306 IP jumbox.40270 > server.ssh: Flags [.], ack 1, win 182, length 0
19:42:25.481470 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0
19:42:25.481477 IP jumbox.40270 > server.ssh: Flags [.], ack 504902058, win 182, length 0
19:42:25.481490 IP server.ssh > jumbox.40270: Flags [R], seq 405921082, win 0, length 0
19:42:25.481494 IP server.ssh > jumbox.40270: Flags [P.], seq 504902058:504902097, ack 1, win 182, length 39
19:42:26.491536 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0
19:42:26.491551 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0
19:42:28.507528 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0
19:42:28.507552 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0
19:42:32.699540 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0
19:42:32.699556 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0
19:42:40.891490 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0
19:42:40.891514 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0
19:42:57.019511 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0
19:42:57.019534 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0
An ssh -v server
log of a failed connection, with the reset error:
OpenSSH_7.4p1 Debian-10+deb9u1, OpenSSL 1.0.2l 25 May 2017
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug1: /etc/ssh/ssh_config line 59: Deprecated option "useroaming"
debug1: auto-mux: Trying existing master
debug1: Control socket "/tmp/ssh_mux_fenix-storage_22_rui" does not exist
debug1: Connecting to fenix-storage [10.10.32.156] port 22.
debug1: Connection established.
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_rsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_rsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_dsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_dsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_ecdsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_ecdsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_ed25519 type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
write: Connection reset by peer
An ssh -v server
of a successful connection:
OpenSSH_7.4p1 Debian-10+deb9u1, OpenSSL 1.0.2l 25 May 2017
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug1: /etc/ssh/ssh_config line 59: Deprecated option "useroaming"
debug1: auto-mux: Trying existing master
debug1: Control socket "/tmp/ssh_mux_sql01_22_rui" does not exist
debug1: Connecting to sql01 [10.20.10.88] port 22.
debug1: Connection established.
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_rsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_rsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_dsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_dsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_ecdsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_ecdsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_ed25519 type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u1
debug1: Remote protocol version 2.0, remote software version OpenSSH_7.4p1 Debian-10+deb9u1
debug1: match: OpenSSH_7.4p1 Debian-10+deb9u1 pat OpenSSH* compat 0x04000000
debug1: Authenticating to sql01:22 as 'rui'
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: algorithm: curve25519-sha256
debug1: kex: host key algorithm: rsa-sha2-512
debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: Server host key: ssh-rsa SHA256:6aJ+ipXRZJfbei5YbYtvqKXB01t1YO34O2ChdT/vk/4
debug1: Host 'sql01' is known and matches the RSA host key.
debug1: Found key in /home/rui/.ssh/known_hosts:315
debug1: rekey after 134217728 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: rekey after 134217728 blocks
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,ssh-rsa,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521>
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /home/rui/.ssh/id_rsa
debug1: Server accepts key: pkalg ssh-rsa blen 277
debug1: Authentication succeeded (publickey).
Authenticated to sql01 ([10.20.10.88]:22).
debug1: setting up multiplex master socket
debug1: channel 0: new [/tmp/ssh_mux_sql01_22_rui]
debug1: control_persist_detach: backgrounding master process
debug1: forking to background
debug1: Entering interactive session.
debug1: pledge: id
debug1: multiplexing control connection
debug1: channel 1: new [mux-control]
debug1: channel 2: new [client-session]
debug1: client_input_global_request: rtype hostkeys-00@openssh.com want_reply 0
debug1: Sending environment.
debug1: Sending env LC_ALL = en_US.utf8
debug1: Sending env LANG = en_US.UTF-8
debug1: mux_client_request_session: master session id: 2
Interestingly enough, the behaviour can be reproduced with a telnet command:
$ telnet remote-server 22
Trying x.x.x.x...
Connected to remote-server
Escape character is '^]'.
Connection closed by foreign host.
$ telnet remote-server 22
Trying x.x.x.x...
Connected to remote-server
Escape character is '^]'.
SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u1
Protocol mismatch.
Connection closed by foreign host.
UPDATE:
Forced Protocol 2
in the /etc/ssh_client
client configuration in the jumpbox. No change.
UPDATE2:
Changed the old key encrypted with DES-EDE3-CBC for a new key encrypted with AES-128-CBC. Again no visible change.
UPDATE3:
Interestingly enough, while the mux is active, the situation does not presents itself.
UPDATE4:
I also have found a similar question at serverfault, however without a chosen answer: https://serverfault.com/questions/445045/ssh-connection-error-ssh-exchange-identification-read-connection-reset-by-pe
Tried regenerating the ssh host keys, and the suggestion of sshd: ALL
without success.
UPDATE 5
Opened a console on the VM on the destination and saw something 'strange'.
tcpdump whereas 1.1.1.1 is the jumpbox.
# tcpdump -n -vvv "host 1.1.1.1"
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
11:47:45.808273 IP (tos 0x0, ttl 64, id 38171, offset 0, flags [DF], proto TCP (6), length 60)
1.1.1.1.37924 > 1.1.1.2.22: Flags [S], cksum 0xfc1f (correct), seq 3260568985, win 29200, options [mss 1460,sackOK,TS val 407355522 ecr 0,nop,wscale 7], length 0
11:47:45.808318 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
1.1.1.2.22 > 1.1.1.1.37924: Flags [S.], cksum 0x5508 (incorrect -> 0x68a8), seq 2881609759, ack 3260568986, win 28960, options [mss 1460,sackOK,TS val 561702650 ecr 407355522,nop,wscale 7], length 0
11:47:45.808525 IP (tos 0x0, ttl 64, id 38172, offset 0, flags [DF], proto TCP (6), length 52)
1.1.1.1.37924 > 1.1.1.2.22: Flags [.], cksum 0x07b0 (correct), seq 1, ack 1, win 229, options [nop,nop,TS val 407355522 ecr 561702650], length 0
11:47:45.808917 IP (tos 0x0, ttl 64, id 38173, offset 0, flags [DF], proto TCP (6), length 92)
1.1.1.1.37924 > 1.1.1.2.22: Flags [P.], cksum 0x6de0 (correct), seq 1:41, ack 1, win 229, options [nop,nop,TS val 407355522 ecr 561702650], length 40
11:47:45.808930 IP (tos 0x0, ttl 64, id 1754, offset 0, flags [DF], proto TCP (6), length 52)
1.1.1.2.22 > 1.1.1.1.37924: Flags [.], cksum 0x5500 (incorrect -> 0x0789), seq 1, ack 41, win 227, options [nop,nop,TS val 561702651 ecr 407355522], length 0
11:47:45.822178 IP (tos 0x0, ttl 64, id 1755, offset 0, flags [DF], proto TCP (6), length 91)
1.1.1.2.22 > 1.1.1.1.37924: Flags [P.], cksum 0x5527 (incorrect -> 0x70c1), seq 1:40, ack 41, win 227, options [nop,nop,TS val 561702654 ecr 407355522], length 39
11:47:45.822645 IP (tos 0x0, ttl 64, id 21666, offset 0, flags [DF], proto TCP (6), length 40)
1.1.1.1.37924 > 1.1.1.2.22: Flags [R], cksum 0xaeb1 (correct), seq 3260569026, win 0, length 0
11:47:50.919752 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 1.1.1.2 tell 1.1.1.1, length 46
11:47:50.919773 ARP, Ethernet (len 6), IPv4 (len 4), Reply 1.1.1.2 is-at 00:50:56:b9:3d:2b, length 28
11:47:50.948732 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 1.1.1.1 tell 1.1.1.2, length 28
11:47:50.948916 ARP, Ethernet (len 6), IPv4 (len 4), Reply 1.1.1.1 is-at 00:50:56:80:57:1a, length 46
^C
11 packets captured
11 packets received by filter
0 packets dropped by kernel
UPDATE 6
Due to the checkum error, I disabled the TCP/UDP checksum offloading to the NIC in the VM, however it did not help.
$sudo ethtool -K eth0 rx off
$sudo ethtool -K eth0 tx off
iface eth0 inet static
address 1.1.1.2
netmask 255.255.255.0
network 1.1.1.0
broadcast 1.11.1.255
gateway 1.1.1.254
post-up /sbin/ethtool -K $IFACE rx off
post-up /sbin/ethtool -K $IFACE tx off
Understanding TCP Checksum Offloading (TCO) in a VMware Environment (2052904)
UPDATE 7
Disabled GSSAPIAuthentication
in the ssh client in the jumpbox. Tested Enable Compression yes
No change.
UPDATE 8
Testing filling up the checksum with iptables
.
/sbin/iptables -A POSTROUTING -t mangle -p tcp -j CHECKSUM --checksum-fill
It did not improve the situation.
UPDATE 9:
Found an interesting test about limiting cyphers, will try it out. MTU problems does not seem the culprit as I am having problems in some cases with server and client in the same network.
For now tested in the client side "ssh -c aes256-ctr", and the symptoms do not improve.
The mysterious case of broken SSH client (“connection reset by peer”)
UPDATE 10
Added this to /etc/ssh/ssh_config
. No changes.
Ciphers aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc
SSH issues: Read from socket failed: Connection reset by peer
UPDATE 11
Configured the SSH service to listen on both port 22 and port 2222. It did not help.
UPDATE 12
I suspected a regression present in OpenSSH 7.4 that was fixed in OpenSSH 7.5.
Release notes from OpenSSH 7.5:
- sshd(8): Fix regression in OpenSSH 7.4 support for the
server-sig-algs extension, where SHA2 RSA signature methods were
not being correctly advertised. bz#2680
To use OpenSSH 7.5 on Debian 9/Stretch, I installed openssh-client and openssh-server from Debian testing/Buster.
No improvement.
UPDATE 13
Defined:
Ciphers aes256-ctr
MACs hmac-sha1
on both the client(s) and the server side. No improvement.
UPDATE 14
Set:
UseDNS no
GSSAPIAuthentication no
GSSAPIKeyExchange no
No change.
UPDATE 15
Changed this in /etc/ssh/sshd_config:
TCPKeepAlive no
From "How does tcp-keepalive work in ssh?":
TCPKeepAlive operates on the TCP layer. It sends an empty TCP ACK
packet [from the SSH server to the client - Rui]. Firewalls can be configured to ignore these packets, so if you
go through a firewall that drops idle connections, these may not keep
the connection alive.
My guess is that with TCPKeepAlive the server was sending a packet that is optimised away or ignored somewhere further down the stack, so the remote SSH server believed it was still connected to the TCP mux client when in fact the session had already been torn down; hence the TCP reset(s) on the first try.
So whilst some say that if you are using ClientAliveInterval you can disable TCPKeepAlive, it seems to be more that if you are using ClientAliveInterval you ought to disable TCPKeepAlive.
- It is clearly this option; the explanation, however, is mainly conjecture, and I will have to double-check it against the source when and if I have time.
TCPKeepAlive apparently also has spoofing issues, so turning it off is recommended.
Nevertheless, the problem persists.
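To confirm which keepalive settings the running configuration really resolves to, `sshd -T` (extended test mode) prints the effective options with lower-case keywords. A minimal sketch, assuming sshd lives in /usr/sbin as on Debian; `keepalive_settings` is a hypothetical helper that only filters the three relevant lines:

```shell
#!/bin/sh
# Hypothetical helper: read `sshd -T` output on stdin and keep only the
# keepalive-related options discussed above.
keepalive_settings() {
    grep -E '^(tcpkeepalive|clientaliveinterval|clientalivecountmax) '
}

# Example (needs root, run on the server):
# sudo /usr/sbin/sshd -T | keepalive_settings
```

This only shows what sshd would apply on its next start or reload; it does not prove that already-running sessions picked up the change.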
debian ssh vmware
The RST packets are not normal, something between your machine and the server seems to be killing your TCP connection. It's hard to tell what that might be without a full packet dump.
– Satō Katsura
Sep 8 '17 at 10:01
@SatōKatsura Though better. That server and jumpbox in the tcpdump are both in the same network; I do have other servers that do routing via firewall
– Rui F Ribeiro
Sep 8 '17 at 11:24
Well, you need to find out where those RST come from. There could be any number of reasons for that. shrug
– Satō Katsura
Sep 8 '17 at 11:33
@SatōKatsura sure indeed. Will add another tcpdump when at work. The difficult part is that this is a bit random
– Rui F Ribeiro
Sep 8 '17 at 11:37
I have been using a Debian 9 SSH jumpbox host to run my scripts/ansible playbooks for a while. The jumbox talks with Debian 9 and some Debian 8 servers, mostly. Most of the servers are VMs running under VMWare Enterprise 5.5.
The SSH client in the jumbox is configured for doing SSH MUX, and the authentication is done by an RSA certificate file.
The SSH has been working well for years now, however suddenly SSH connections started giving the error ssh_exchange_identification: read: Connection reset by peer
at first try, several times a day, which obviously creates havoc with my scripts and scripts of our development team.
However, after the first try they are ok for a while. The servers misbehaving appear be random at first, but they have some patterns/timeouts. If I do send a command to all of the servers, for instance, running in a command before the intended script/playbook, a few will fail, but the next script will run in all of them.
There havent been recent significant changes on the servers, except for security updates. The transition for Debian 9 has already some (significant) time.
I already found a MTU configuration or other that was once applied to several servers in a malfunction and forgotten, however that was not the case. I also diminished both on the client and server side the ControlPersist
and ClientAliveInterval
both to 1h, and that did not improve the situation.
So at the moment, I am at loss of why this is happening. I am however more inclined to a layer 7 issue than a network problem.
The SSH configuration on the client side /etc/ssh_config
, Debian 9 is:
Host *
SendEnv LANG LC_*
HashKnownHosts yes
GSSAPIAuthentication yes
GSSAPIDelegateCredentials no
ControlMaster auto
ControlPath /tmp/ssh_mux_%h_%p_%r
ControlPersist 1h
Compression no
UseRoaming no
On SSH on the server side of several Debian servers:
Protocol 2
HostKey /etc/ssh/ssh_host_rsa_key
HostKey /etc/ssh/ssh_host_dsa_key
UsePrivilegeSeparation yes
SyslogFacility AUTH
LogLevel INFO
LoginGraceTime 120
PermitRootLogin forced-commands-only
StrictModes yes
PubkeyAuthentication yes
IgnoreRhosts yes
HostbasedAuthentication no
PermitEmptyPasswords no
ChallengeResponseAuthentication no
PasswordAuthentication no
X11Forwarding no
X11DisplayOffset 10
PrintMotd no
PrintLastLog yes
TCPKeepAlive yes
AcceptEnv LANG LC_*
Subsystem sftp /usr/lib/openssh/sftp-server -l INFO
UsePAM yes
ClientAliveInterval 3600
ClientAliveCountMax 0
AddressFamily inet
SSH versions:
client -
$ssh -V
OpenSSH_7.4p1 Debian-10+deb9u1, OpenSSL 1.0.2l 25 May 2017
server(s)
SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u1 (Debian 9)
SSH-2.0-OpenSSH_6.7p1 Debian-5+deb8u3 (Debian 8)
I have seen that error at least in situations with both servers with the 4.9.0-0.bpo.1-amd64 version.
The tcpdump
of a server misbehaving, both machines being in the same network without any firewalls in the middle. I also monitor MAC addresses and there is not log of a new machine/MAC with the same MAC addresses in the last few years.
#tcpdump port 22
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
19:42:25.462896 IP jumbox.40270 > server.ssh: Flags [S], seq 3882361678, win 23200, options [mss 1160,sackOK,TS val 354223428 ecr 0,nop,wscale 7], length 0
19:42:25.463289 IP server.ssh > jumbox.40270: Flags [S.], seq 405921081, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0
19:42:25.463306 IP jumbox.40270 > server.ssh: Flags [.], ack 1, win 182, length 0
19:42:25.481470 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0
19:42:25.481477 IP jumbox.40270 > server.ssh: Flags [.], ack 504902058, win 182, length 0
19:42:25.481490 IP server.ssh > jumbox.40270: Flags [R], seq 405921082, win 0, length 0
19:42:25.481494 IP server.ssh > jumbox.40270: Flags [P.], seq 504902058:504902097, ack 1, win 182, length 39
19:42:26.491536 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0
19:42:26.491551 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0
19:42:28.507528 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0
19:42:28.507552 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0
19:42:32.699540 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0
19:42:32.699556 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0
19:42:40.891490 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0
19:42:40.891514 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0
19:42:57.019511 IP server.ssh > jumbox.40270: Flags [S.], seq 4195986320, ack 3882361679, win 23200, options [mss 1160,nop,nop,sackOK,nop,wscale 7], length 0
19:42:57.019534 IP jumbox.40270 > server.ssh: Flags [R], seq 3882361679, win 0, length 0
An ssh -v server
log of a failed connection, with the reset error:
OpenSSH_7.4p1 Debian-10+deb9u1, OpenSSL 1.0.2l 25 May 2017
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug1: /etc/ssh/ssh_config line 59: Deprecated option "useroaming"
debug1: auto-mux: Trying existing master
debug1: Control socket "/tmp/ssh_mux_fenix-storage_22_rui" does not exist
debug1: Connecting to fenix-storage [10.10.32.156] port 22.
debug1: Connection established.
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_rsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_rsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_dsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_dsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_ecdsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_ecdsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_ed25519 type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
write: Connection reset by peer
An ssh -v server
of a successful connection:
OpenSSH_7.4p1 Debian-10+deb9u1, OpenSSL 1.0.2l 25 May 2017
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug1: /etc/ssh/ssh_config line 59: Deprecated option "useroaming"
debug1: auto-mux: Trying existing master
debug1: Control socket "/tmp/ssh_mux_sql01_22_rui" does not exist
debug1: Connecting to sql01 [10.20.10.88] port 22.
debug1: Connection established.
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_rsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_rsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_dsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_dsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_ecdsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_ecdsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_ed25519 type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/rui/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u1
debug1: Remote protocol version 2.0, remote software version OpenSSH_7.4p1 Debian-10+deb9u1
debug1: match: OpenSSH_7.4p1 Debian-10+deb9u1 pat OpenSSH* compat 0x04000000
debug1: Authenticating to sql01:22 as 'rui'
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: algorithm: curve25519-sha256
debug1: kex: host key algorithm: rsa-sha2-512
debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: Server host key: ssh-rsa SHA256:6aJ+ipXRZJfbei5YbYtvqKXB01t1YO34O2ChdT/vk/4
debug1: Host 'sql01' is known and matches the RSA host key.
debug1: Found key in /home/rui/.ssh/known_hosts:315
debug1: rekey after 134217728 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: rekey after 134217728 blocks
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_input_ext_info: server-sig-algs=<ssh-ed25519,ssh-rsa,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521>
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /home/rui/.ssh/id_rsa
debug1: Server accepts key: pkalg ssh-rsa blen 277
debug1: Authentication succeeded (publickey).
Authenticated to sql01 ([10.20.10.88]:22).
debug1: setting up multiplex master socket
debug1: channel 0: new [/tmp/ssh_mux_sql01_22_rui]
debug1: control_persist_detach: backgrounding master process
debug1: forking to background
debug1: Entering interactive session.
debug1: pledge: id
debug1: multiplexing control connection
debug1: channel 1: new [mux-control]
debug1: channel 2: new [client-session]
debug1: client_input_global_request: rtype hostkeys-00@openssh.com want_reply 0
debug1: Sending environment.
debug1: Sending env LC_ALL = en_US.utf8
debug1: Sending env LANG = en_US.UTF-8
debug1: mux_client_request_session: master session id: 2
Interestingly enough, the behaviour can be reproduced with a telnet command:
$ telnet remote-server 22
Trying x.x.x.x...
Connected to remote-server
Escape character is '^]'.
Connection closed by foreign host.
$ telnet remote-server 22
Trying x.x.x.x...
Connected to remote-server
Escape character is '^]'.
SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u1
Protocol mismatch.
Connection closed by foreign host.
UPDATE:
Forced Protocol 2
in the /etc/ssh_client
client configuration in the jumpbox. No change.
UPDATE2:
Changed the old key encrypted with DES-EDE3-CBC for a new key encrypted with AES-128-CBC. Again no visible change.
UPDATE3:
Interestingly enough, while the mux is active, the situation does not presents itself.
UPDATE4:
I also have found a similar question at serverfault, however without a chosen answer: https://serverfault.com/questions/445045/ssh-connection-error-ssh-exchange-identification-read-connection-reset-by-pe
Tried regenerating the ssh host keys, and the suggestion of sshd: ALL
without success.
UPDATE 5
Opened a console on the VM on the destination and saw something 'strange'.
tcpdump whereas 1.1.1.1 is the jumpbox.
# tcpdump -n -vvv "host 1.1.1.1"
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
11:47:45.808273 IP (tos 0x0, ttl 64, id 38171, offset 0, flags [DF], proto TCP (6), length 60)
1.1.1.1.37924 > 1.1.1.2.22: Flags [S], cksum 0xfc1f (correct), seq 3260568985, win 29200, options [mss 1460,sackOK,TS val 407355522 ecr 0,nop,wscale 7], length 0
11:47:45.808318 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
1.1.1.2.22 > 1.1.1.1.37924: Flags [S.], cksum 0x5508 (incorrect -> 0x68a8), seq 2881609759, ack 3260568986, win 28960, options [mss 1460,sackOK,TS val 561702650 ecr 407355522,nop,wscale 7], length 0
11:47:45.808525 IP (tos 0x0, ttl 64, id 38172, offset 0, flags [DF], proto TCP (6), length 52)
1.1.1.1.37924 > 1.1.1.2.22: Flags [.], cksum 0x07b0 (correct), seq 1, ack 1, win 229, options [nop,nop,TS val 407355522 ecr 561702650], length 0
11:47:45.808917 IP (tos 0x0, ttl 64, id 38173, offset 0, flags [DF], proto TCP (6), length 92)
1.1.1.1.37924 > 1.1.1.2.22: Flags [P.], cksum 0x6de0 (correct), seq 1:41, ack 1, win 229, options [nop,nop,TS val 407355522 ecr 561702650], length 40
11:47:45.808930 IP (tos 0x0, ttl 64, id 1754, offset 0, flags [DF], proto TCP (6), length 52)
1.1.1.2.22 > 1.1.1.1.37924: Flags [.], cksum 0x5500 (incorrect -> 0x0789), seq 1, ack 41, win 227, options [nop,nop,TS val 561702651 ecr 407355522], length 0
11:47:45.822178 IP (tos 0x0, ttl 64, id 1755, offset 0, flags [DF], proto TCP (6), length 91)
1.1.1.2.22 > 1.1.1.1.37924: Flags [P.], cksum 0x5527 (incorrect -> 0x70c1), seq 1:40, ack 41, win 227, options [nop,nop,TS val 561702654 ecr 407355522], length 39
11:47:45.822645 IP (tos 0x0, ttl 64, id 21666, offset 0, flags [DF], proto TCP (6), length 40)
1.1.1.1.37924 > 1.1.1.2.22: Flags [R], cksum 0xaeb1 (correct), seq 3260569026, win 0, length 0
11:47:50.919752 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 1.1.1.2 tell 1.1.1.1, length 46
11:47:50.919773 ARP, Ethernet (len 6), IPv4 (len 4), Reply 1.1.1.2 is-at 00:50:56:b9:3d:2b, length 28
11:47:50.948732 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 1.1.1.1 tell 1.1.1.2, length 28
11:47:50.948916 ARP, Ethernet (len 6), IPv4 (len 4), Reply 1.1.1.1 is-at 00:50:56:80:57:1a, length 46
^C
11 packets captured
11 packets received by filter
0 packets dropped by kernel
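When sifting through longer captures, a small script can list which endpoint appears as the source of each RST. The following is a minimal sketch that assumes the `tcpdump -n -vvv` text layout shown above (IPv4 only, addresses printed as `ip.port`); the function name is made up for this example.

```python
import re

# Matches the second line of each packet in `tcpdump -n -vvv` output, e.g.
# "1.1.1.1.37924 > 1.1.1.2.22: Flags [R], cksum ..."
SEGMENT = re.compile(
    r"^\s*(\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+): Flags \[([^\]]*)\]"
)


def find_resets(lines):
    """Return (src_ip, src_port, dst_ip, dst_port) for every segment with the R flag."""
    resets = []
    for line in lines:
        match = SEGMENT.match(line)
        if match and "R" in match.group(5):
            resets.append(
                (match.group(1), int(match.group(2)), match.group(3), int(match.group(4)))
            )
    return resets


if __name__ == "__main__":
    sample = [
        "1.1.1.1.37924 > 1.1.1.2.22: Flags [S], cksum 0xfc1f (correct), seq 3260568985, win 29200, length 0",
        "1.1.1.1.37924 > 1.1.1.2.22: Flags [R], cksum 0xaeb1 (correct), seq 3260569026, win 0, length 0",
    ]
    for src, sport, dst, dport in find_resets(sample):
        print(f"RST from {src}:{sport} to {dst}:{dport}")
```

In the capture above, the only reset carries the jumpbox address (1.1.1.1) as its source. The source address alone does not prove which device actually generated the packet.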
UPDATE 6
Because of the checksum errors, I disabled TCP/UDP checksum offloading to the NIC in the VM; however, it did not help.
$ sudo ethtool -K eth0 rx off
$ sudo ethtool -K eth0 tx off
To make this persistent, in /etc/network/interfaces:
iface eth0 inet static
address 1.1.1.2
netmask 255.255.255.0
network 1.1.1.0
broadcast 1.1.1.255
gateway 1.1.1.254
post-up /sbin/ethtool -K $IFACE rx off
post-up /sbin/ethtool -K $IFACE tx off
See: Understanding TCP Checksum Offloading (TCO) in a VMware Environment (VMware KB 2052904)
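For context on the "incorrect" checksums in the capture: with transmit checksum offloading, tcpdump on the sending host sees the packet before the NIC fills in the checksum, so "incorrect" on locally generated packets is commonly harmless. In that capture only the packets sent by 1.1.1.2, where tcpdump was running, are flagged. The sketch below shows the 16-bit ones'-complement checksum (RFC 1071) that TCP and IP use. It works on a plain byte string; a real TCP checksum also covers a pseudo-header with the IP addresses, which is left out here.

```python
def internet_checksum(data):
    """Compute the RFC 1071 Internet checksum of a byte string."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # sum big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


if __name__ == "__main__":
    payload = bytes.fromhex("0001f203f4f5f6f7")  # example words from RFC 1071
    checksum = internet_checksum(payload)
    print(hex(checksum))
    # A receiver checksums the data plus the transmitted checksum and expects 0.
    print(internet_checksum(payload + checksum.to_bytes(2, "big")))
```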
UPDATE 7
Disabled GSSAPIAuthentication in the SSH client on the jumpbox. Also tested with Compression yes.
No change.
UPDATE 8
Tested filling in the checksum with iptables:
/sbin/iptables -A POSTROUTING -t mangle -p tcp -j CHECKSUM --checksum-fill
It did not improve the situation.
UPDATE 9:
Found an interesting test about limiting ciphers; will try it out. MTU problems do not seem to be the culprit, as in some cases I am having problems with server and client on the same network.
For now, I tested "ssh -c aes256-ctr" on the client side, and the symptoms did not improve.
See: The mysterious case of broken SSH client (“connection reset by peer”)
UPDATE 10
Added this to /etc/ssh/ssh_config. No change.
Ciphers aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc
See: SSH issues: Read from socket failed: Connection reset by peer
UPDATE 11
Configured the SSH service to listen on both port 22 and port 2222. It did not help.
UPDATE 12
I suspected a regression bug in OpenSSH 7.4 that was corrected in OpenSSH 7.5.
From the OpenSSH 7.5 release notes:
- sshd(8): Fix regression in OpenSSH 7.4 support for the
server-sig-algs extension, where SHA2 RSA signature methods were
not being correctly advertised. bz#2680
To use OpenSSH 7.5 on Debian 9/Stretch, I installed openssh-client and openssh-server from Debian testing/Buster.
No improvement.
UPDATE 13
Defined the following on both the client(s) and the server side:
Ciphers aes256-ctr
MACs hmac-sha1
No improvement.
UPDATE 14
Set:
UseDNS no
GSSAPIAuthentication no
GSSAPIKeyExchange no
No change.
UPDATE 15
Changed /etc/ssh/sshd_config to:
TCPKeepAlive no
From "How does tcp-keepalive work in ssh?":
TCPKeepAlive operates on the TCP layer. It sends an empty TCP ACK
packet [from the SSH server to the client - Rui]. Firewalls can be configured to ignore these packets, so if you
go through a firewall that drops idle connections, these may not keep
the connection alive.
My guess is that TCPKeepAlive made the server send a packet that was being optimised away or ignored at some layer further down the stack, so the remote SSH server believed it was still connected to the mux client while the session had in fact already been torn down; hence the TCP reset(s) at the first try.
So whilst some say that if you are using ClientAliveInterval you can disable TCPKeepAlive, it seems to be more that if you are using ClientAliveInterval you ought to disable TCPKeepAlive.
- It is clearly this option; as for the explanation, it is mainly conjecture, and I will have to double-check it against the source if and when I have time.
TCPKeepAlive apparently also has spoofing issues, so turning it off is recommended.
Nevertheless, the problem is still there.
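To audit many servers for the two keepalive options discussed here, a small parser over the text of sshd_config can report the effective values. This is a rough sketch under stated assumptions: it ignores Match blocks and Include directives, it relies on sshd using the first value it finds for a keyword, and it falls back to the documented defaults (TCPKeepAlive yes, ClientAliveInterval 0) when a keyword is absent.

```python
import re

# "Keyword value" or "Keyword=value"; keywords are case-insensitive in sshd_config.
OPTION = re.compile(r"^(\S+?)(?:\s*=\s*|\s+)(.*)$")


def parse_sshd_config(text):
    """Return {lowercased keyword: value}, keeping the first value seen per keyword."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = OPTION.match(line)
        if match:
            settings.setdefault(match.group(1).lower(), match.group(2).strip())
    return settings


def keepalive_settings(text):
    """Report the two keepalive-related options, falling back to sshd defaults."""
    settings = parse_sshd_config(text)
    return {
        "tcpkeepalive": settings.get("tcpkeepalive", "yes"),
        "clientaliveinterval": settings.get("clientaliveinterval", "0"),
    }


if __name__ == "__main__":
    sample = "Protocol 2\nTCPKeepAlive no\nClientAliveInterval 1h\n"
    print(keepalive_settings(sample))
```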
debian ssh vmware
edited Oct 10 '17 at 13:02
Rui F Ribeiro
asked Sep 8 '17 at 9:35
Rui F Ribeiro
The RST packets are not normal, something between your machine and the server seems to be killing your TCP connection. It's hard to tell what that might be without a full packet dump.
– Satō Katsura
Sep 8 '17 at 10:01
@SatōKatsura Though better. That server and jumpbox in the tcpdump are both in the same network; I do have other servers that do routing via firewall
– Rui F Ribeiro
Sep 8 '17 at 11:24
Well, you need to find out where those RST come from. There could be any number of reasons for that. shrug
– Satō Katsura
Sep 8 '17 at 11:33
@SatōKatsura sure indeed. Will add another tcpdump when at work. The difficult part is that this is a bit random
– Rui F Ribeiro
Sep 8 '17 at 11:37
4 Answers
Your symptoms sound consistent with having a machine on the network using the same IP address as the SSH server. Check the MAC address of the RST packets.
answered Sep 9 '17 at 6:57
user1998586
I actually monitor MAC addresses on that netblock, and it seems not to be the case.
– Rui F Ribeiro
Sep 9 '17 at 8:20
Are you going through any firewall or other device that attempts TCP optimisation? I had the same experience on a network, and it turned out to be a device doing TCP optimisation.
answered Oct 4 '18 at 17:41
Yusufk
Most probably. Later on I solved a couple of bugs on that FW/router by changing the configuration at the Cisco level, but I never came back to this, and nowadays I am at another job.
– Rui F Ribeiro
Oct 25 '18 at 16:59
Found some systems with
net.ipv4.tcp_timestamps = 0
in /etc/sysctl.conf; all the servers with the problem have that setting.
I ended up removing this line from the affected systems and running the following on all systems:
sudo sysctl -w net.ipv4.tcp_timestamps=1
Waiting for further tests.
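To find which hosts carry that setting, the contents of each /etc/sysctl.conf can be scanned for the key. A minimal sketch follows, assuming the usual `key = value` layout with `#` or `;` comment lines, and assuming that a later assignment overrides an earlier one because the file is applied in order. The function name is invented for this example, and files under /etc/sysctl.d are not considered.

```python
def sysctl_conf_value(text, key):
    """Return the last value assigned to key in sysctl.conf-style text, or None."""
    value = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # skip blank lines and comments
        name, sep, rest = line.partition("=")
        if sep and name.strip() == key:
            value = rest.strip()
    return value


if __name__ == "__main__":
    sample = "# tuning\nnet.ipv4.tcp_timestamps = 0\n"
    if sysctl_conf_value(sample, "net.ipv4.tcp_timestamps") == "0":
        print("TCP timestamps are disabled in this configuration")
```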
edited Oct 10 '17 at 13:26
answered Oct 6 '17 at 22:08
Rui F Ribeiro
In the end, I found out it was due to bugs in the Cisco 6059 core router and the ASA firewall being used.
Linux kernels v3 and v4 do not play well with TCP Sequence Randomization, which causes "random" problems when transferring big files and other obscure problems in many connections, of which SSH was the most visible. Unfortunately, Windows, Mac and FreeBSD do play well with it, so it could arguably be called a Linux bug.
Each TCP connection has two ISNs: one generated by the client and one
generated by the server. The ASA randomizes the ISN of the TCP SYN
passing in both the inbound and outbound directions.
Randomizing the ISN of the protected host prevents an attacker from
predicting the next ISN for a new connection and potentially hijacking
the new session.
You can disable TCP initial sequence number randomization if
necessary, for example, because data is getting scrambled. For
example:
If another in-line firewall is also randomizing the initial sequence
numbers, there is no need for both firewalls to be performing this
action, even though this action does not affect the traffic.
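As a rough illustration of what the quoted passage describes, a device that randomizes ISNs has to shift every sequence number it forwards in one direction by a per-connection offset, and shift the matching acknowledgment numbers back in the other direction. The toy model below is not Cisco's implementation; the class name and the offset value are made up. It only shows the 32-bit modular bookkeeping involved. Any other field that echoes sequence numbers would need the same translation, which is one plausible way such a feature could confuse an endpoint. The answer itself does not say what exactly went wrong.

```python
MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


class SequenceRandomizer:
    """Toy per-connection model of a middlebox that rewrites sequence numbers."""

    def __init__(self, offset):
        self.offset = offset % MOD

    def outbound_seq(self, seq):
        """Sequence number as seen on the far side of the device."""
        return (seq + self.offset) % MOD

    def inbound_ack(self, ack):
        """Translate an ACK from the far side back into the host's own numbering."""
        return (ack - self.offset) % MOD


if __name__ == "__main__":
    box = SequenceRandomizer(offset=0x12345678)  # arbitrary illustrative offset
    host_isn = 3260568985                        # client ISN from the capture in the question
    wire_isn = box.outbound_seq(host_isn)
    peer_ack = (wire_isn + 1) % MOD              # peer acknowledges the SYN
    print(box.inbound_ack(peer_ack) == host_isn + 1)
```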
I initially disabled Cisco sequence randomization only in the internal core router, but that was not enough. Once it was disabled in both the border firewalls and the core Cisco router/switch, the problem stopped happening.
To disable it, use something similar to:
policy-map global_policy
class preserve-sq-no
set connection random-sequence-number disable
See Cisco note Disable TCP Sequence Randomization
answered 4 mins ago
Rui F Ribeiro