[update: This issue is reportedly fixed in Net::SSH::Perl 1.34]
I ran into an issue with the Perl module Net::SFTP that drives the file transfers in a script that I use. Net::SFTP is built on top of Net::SSH::Perl, a pure Perl implementation of the SSH-2 protocol.
The issue is that after 1 GB of data has been transferred, the connection hangs indefinitely.
I found that something similar had been reported in the Net::SSH::Perl bug tracker a couple of years ago, though in that case the hang came after 1 hour rather than after 1 GB of data transferred. The maintainer rejected the bug report just the other day with the comment “try the latest version”.
Well, I have the latest versions of Net::SSH::Perl (1.30) and Net::SFTP from CPAN, and I still have the issue.
I did some digging.
On the server side I have:
OpenSSH_3.9p1, OpenSSL 0.9.7a Feb 19 2003
With sshd debugging turned on, this is logged at the moment the transfer stalls:
Feb 15 23:21:55 olive sshd: debug1: SSH2_MSG_KEXINIT sent
The client Perl script, using Net::SFTP with debugging on, reports:
sftp: Sent message SSH2_FXP_READ I:130563 O:1069555712
Warning: ignore packet type 20
channel 1: window 16358 sent adjust 16410
sftp: Received reply T:103 I:130563
sftp: In read loop, got 8192 offset 1069555712
sftp: Sent message SSH2_FXP_READ I:130564 O:1069563904
I did some more digging and found that the ignored ‘packet type 20’ is the SSH2_MSG_KEXINIT sent by the server, which is a request to initiate session rekeying.
Rekeying was originally something the commercial SSH2 package did, and it sometimes caused problems early on when talking to OpenSSH clients.
That was then. This is now, and it seems that OpenSSH 3.9 knows all about rekeying sessions (I think this was added in 3.6?). The OpenSSH 3.9 server I was connecting to automatically attempts to rekey with SSH-2 clients after ~1 GB of data transfer, and as far as I can tell there’s no configuration option to turn it off. There is a poorly documented ‘RekeyLimit’ option in that version that is supposed to control how much data is transferred before rekeying, but it had no effect in my hands, and I do not see it regulating anything in the source code (though I’m not very good at C, so maybe I’m missing something).
The proper fix, I suppose, would be to patch Net::SSH::Perl so that it implements rekeying. However, I’m not nearly well-versed enough in the SSH-2 protocol to do that, so I needed to work around the issue.
After further digging in the OpenSSH source code, I discovered that the server is compiled with a list of known clients that do not support session rekeying (see the client version strings flagged SSH_BUG_NOREKEY in OpenSSH’s compat.c).
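Why does ignoring one packet hang the whole transfer? My reading of RFC 4253 (section 7.1) is that once a side has sent KEXINIT, it may send only transport and key-exchange messages until it has sent NEWKEYS, and it cannot get that far unless the peer answers with its own KEXINIT. The Python sketch below is a simplified model of that rule, not OpenSSH’s code; the function names and the small message table are illustrative only.

```python
# A few SSH-2 message numbers (RFC 4253 transport layer, RFC 4254
# connection layer), spelled the way OpenSSH's ssh2.h spells them.
MSG_NAMES = {
    20: "SSH2_MSG_KEXINIT",
    21: "SSH2_MSG_NEWKEYS",
    93: "SSH2_MSG_CHANNEL_WINDOW_ADJUST",
    94: "SSH2_MSG_CHANNEL_DATA",
}


def allowed_mid_kex(msg_type: int) -> bool:
    """Simplified RFC 4253 s7.1: between sending KEXINIT and sending NEWKEYS,
    a party may send only transport-generic (1-19), algorithm-negotiation
    (20-29) and key-exchange-method (30-49) messages. The RFC carves out a
    few further exceptions (e.g. no second KEXINIT) not modelled here."""
    return 1 <= msg_type <= 49


def may_send(msg_type: int, sent_kexinit: bool, sent_newkeys: bool) -> bool:
    """Can an endpoint in this outgoing key-exchange state send msg_type?"""
    if sent_kexinit and not sent_newkeys:
        return allowed_mid_kex(msg_type)
    return True


# The server has sent KEXINIT at ~1 GB. The client ignores it (type 20), so
# the exchange never completes and the server never sends NEWKEYS.
for msg_type in (94, 93):
    print(MSG_NAMES[msg_type], "allowed:", may_send(msg_type, True, False))
```

Channel data is what carries the SFTP replies, so in this model the server simply goes quiet, which would fit a transfer that hangs rather than one that errors out.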
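The ~1 GB figure may come from the cipher rather than from any setting. The sketch below assumes a rule I believe OpenSSH’s packet.c uses (not checked against 3.9p1 specifically): rekey after 2^(block bits / 4) cipher blocks for ciphers with 16-byte or larger blocks, but after a flat 1 GiB for 8-byte-block ciphers such as 3DES and Blowfish. The function names are hypothetical.

```python
GIB = 1 << 30


def max_blocks(block_size: int) -> int:
    """Cipher blocks that may be encrypted under one key (assumed rule)."""
    if block_size >= 16:
        # 2**(block_bits / 4), where block_bits = 8 * block_size
        return 1 << (block_size * 2)
    # The full limit is considered too expensive for small blocks: cap at 1 GiB.
    return GIB // block_size


def rekey_after_bytes(block_size: int) -> int:
    """Encrypted bytes (packet headers and padding included) before rekeying."""
    return max_blocks(block_size) * block_size


# SFTP read offset at the stall, taken from the client debug log.
stall_offset = 1069555712

for name, block_size in (("3des-cbc", 8), ("blowfish-cbc", 8), ("aes128-cbc", 16)):
    print(f"{name:13} rekey after {rekey_after_bytes(block_size) / GIB:g} GiB")
print(f"stall offset: {stall_offset / GIB:.3f} GiB of an 8-byte-block limit")
```

If the session was using an 8-byte-block cipher (an assumption; I didn’t log the negotiated cipher), the limit works out to exactly 2^30 bytes, and the stall offset in the client log is about 99.6% of that. The gap could plausibly be SSH and SFTP packet overhead, which counts toward the block total but not toward the file offset.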
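For anyone tempted to attempt the proper fix, here is a rough Python sketch of what a client has to do when the server initiates a re-exchange, per my reading of RFC 4253: answer with its own KEXINIT, run the key exchange, then swap NEWKEYS. It assumes the diffie-hellman-group1-sha1 method, leaves out all the cryptography, and the class and method names are hypothetical; it is not Net::SSH::Perl code.

```python
from enum import Enum, auto

# RFC 4253 message numbers; KEXDH_INIT/KEXDH_REPLY are the messages of the
# diffie-hellman-group1-sha1 method assumed here.
KEXINIT, NEWKEYS = 20, 21
KEXDH_INIT, KEXDH_REPLY = 30, 31


class State(Enum):
    NORMAL = auto()             # ordinary traffic under the current keys
    AWAIT_KEXDH_REPLY = auto()  # we answered KEXINIT and sent KEXDH_INIT
    AWAIT_NEWKEYS = auto()      # we sent NEWKEYS; waiting for the server's


class RekeyingClient:
    """Toy model of the client side of a server-initiated re-exchange.
    No cryptography: outbox just records the message numbers we would send."""

    def __init__(self) -> None:
        self.state = State.NORMAL
        self.outbox: list[int] = []
        self.key_generation = 0

    def can_send_channel_data(self) -> bool:
        # RFC 4253 s7.1: nothing but transport/kex messages between our
        # KEXINIT and our NEWKEYS.
        return self.state is not State.AWAIT_KEXDH_REPLY

    def on_message(self, msg_type: int) -> None:
        if self.state is State.NORMAL and msg_type == KEXINIT:
            # The step Net::SSH::Perl 1.30 skips: reply with our own KEXINIT,
            # then start the Diffie-Hellman exchange.
            self.outbox += [KEXINIT, KEXDH_INIT]
            self.state = State.AWAIT_KEXDH_REPLY
        elif self.state is State.AWAIT_KEXDH_REPLY and msg_type == KEXDH_REPLY:
            # Real code would verify the host key signature and derive keys here.
            self.outbox.append(NEWKEYS)
            self.state = State.AWAIT_NEWKEYS
        elif self.state is State.AWAIT_NEWKEYS and msg_type == NEWKEYS:
            self.key_generation += 1  # incoming direction now uses new keys
            self.state = State.NORMAL
        elif self.state is not State.NORMAL and not 1 <= msg_type <= 19:
            raise ValueError(f"unexpected message {msg_type} in {self.state.name}")
        # Otherwise: normal traffic, or IGNORE/DEBUG-style transport messages
        # during the exchange; neither is modelled here.


client = RekeyingClient()
for msg in (KEXINIT, KEXDH_REPLY, NEWKEYS):
    client.on_message(msg)
print(client.outbox, client.state.name, client.key_generation)  # -> [20, 30, 21] NORMAL 1
```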
Net::SSH::Perl reports itself to the server as version 1.30 (which it is), as the OpenSSH server’s debug-mode logging showed.
‘1.30’ is not on OpenSSH’s compat.c list of SSH_BUG_NOREKEY clients. I could add it to the list and recompile, but I don’t have admin access to the sshd server I use in production (the debug messages were from my test server). Therefore, the fix has to be in the client.
So I changed the $VERSION of Net::SSH::Perl to ‘OpenSSH_2.5.2’. The OpenSSH server recognizes that it is talking to a non-rekeying client and skips session rekeying, and large transfers now complete without hanging.
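The compat check itself amounts to glob matching on the client’s identification string. Below is a Python sketch, assuming OpenSSH compares its patterns against everything after the ‘SSH-protoversion-’ prefix and that stock Net::SSH::Perl 1.30 identifies itself as ‘SSH-2.0-1.30’. The pattern list is an illustrative subset, not a copy of compat.c, and the function names are hypothetical.

```python
import fnmatch

# Illustrative subset of client patterns flagged SSH_BUG_NOREKEY in OpenSSH's
# compat.c; NOT the real list (consult compat.c for that).
NOREKEY_PATTERNS = ["OpenSSH_2.5.0*", "OpenSSH_2.5.1*", "OpenSSH_2.5.2*"]


def remote_version(ident_line: str) -> str:
    """Return everything after 'SSH-protoversion-' in an identification
    string of the form 'SSH-protoversion-softwareversion [comments] CR LF'
    (RFC 4253 s4.2). Assumption: this is the part OpenSSH matches on."""
    parts = ident_line.rstrip("\r\n").split("-", 2)
    if len(parts) != 3 or parts[0] != "SSH":
        raise ValueError(f"not an SSH identification string: {ident_line!r}")
    return parts[2]


def skips_rekey(ident_line: str) -> bool:
    """True if the peer matches a no-rekey pattern (case-sensitive glob)."""
    version = remote_version(ident_line)
    return any(fnmatch.fnmatchcase(version, pat) for pat in NOREKEY_PATTERNS)


# Assumed identification strings: stock 1.30 vs. the patched $VERSION.
for ident in ("SSH-2.0-1.30\r\n", "SSH-2.0-OpenSSH_2.5.2\r\n"):
    print(ident.strip(), "->", "no rekeying" if skips_rekey(ident) else "rekeys at ~1 GB")
```

One caveat I haven’t checked: if compat.c attaches other bug flags to the same OpenSSH_2.5.2 pattern, the server will apply those workarounds to the relabelled client as well.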