You are currently browsing the monthly archive for February 2007.
[update: This issue is reportedly fixed in Net::SSH::Perl 1.34]
I ran into an issue with the Perl module Net::SFTP that drives the file transfers in a script that I use. Net::SFTP is built on top of Net::SSH::Perl, a pure Perl implementation of the SSH-2 protocol.
The issue is that after 1 GB of data is transferred the connection hangs indefinitely.
I found that something similar was reported in the Net::SSH::Perl bug tracker a couple of years ago, though in that case the hang came after 1 hour of connection time rather than after 1 GB of data transferred. The maintainer rejected the bug report just the other day with the comment to “try the latest version”.
Well, I have the latest version of Net::SSH::Perl and Net::SFTP from CPAN (1.30) and still have the issue.
I did some digging.
On the server side I have
OpenSSH_3.9p1, OpenSSL 0.9.7a Feb 19 2003
If I turn on sshd debugging I get this logged at the time of the transfer stalling:
Feb 15 23:21:55 olive sshd[19497]: debug1: SSH2_MSG_KEXINIT sent
The client Perl script, using Net::SFTP with debugging on, reports:
sftp: Sent message SSH2_FXP_READ I:130563 O:1069555712
Warning: ignore packet type 20
channel 1: window 16358 sent adjust 16410
sftp: Received reply T:103 I:130563
sftp: In read loop, got 8192 offset 1069555712
sftp: Sent message SSH2_FXP_READ I:130564 O:1069563904
I did some more digging and found that the ‘packet type 20’ that is being ignored corresponds to the SSH2_MSG_KEXINIT sent by the server and is a request to initiate session rekeying.
Rekeying was originally something that the commercial SSH2 package did, and early on it sometimes caused problems when talking to OpenSSH clients.
That was then. This is now, and it seems that OpenSSH 3.9 knows all about rekeying sessions (I think this was added in 3.6?). The OpenSSH 3.9 server I was connecting to automatically attempts rekeying with SSH-2 clients after ~1 GB of data transfer, and as far as I can tell there’s no configuration option to turn it off. There is a poorly documented ‘RekeyLimit’ option for that version that is supposed to control how much data is transferred before rekeying, but it had no effect in my hands and I do not see it regulating anything in the source code (though I’m not very good at C, so maybe I’m missing something).
The proper fix I suppose would be to patch Net::SSH::Perl so it implements rekeying. However, I’m not nearly well versed enough in the SSH-2 protocol to do that, so I needed to work around this issue.
After further digging around in the OpenSSH source code I discovered that the server is compiled with a list of known clients that do not support session rekeying (see client strings with SSH_BUG_NOREKEY in OpenSSH compat.c).
Net::SSH::Perl reports itself to the server as version 1.30 (which it is) as the logging from the OpenSSH server in debug mode demonstrates:
‘1.30’ is not on OpenSSH’s compat.c list of SSH_BUG_NOREKEY clients. I could add it to the list and recompile but I don’t have admin access to the sshd server I use in production (the debug messages were from my test server). Therefore the fix has to be in the client.
So, I changed the $VERSION of Net::SSH::Perl to ‘OpenSSH_2.5.2’. The OpenSSH server recognizes it’s talking to a non-rekeying client and skips the session rekeying process. Now large transfers complete without hanging.
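For anyone who would rather not edit the installed module by hand, here is a minimal sketch of the same workaround as a shell function. It assumes the module sets its version on a line of the form $VERSION = '1.30'; which may not hold for every release, and the function name set_ssh_perl_version is made up for illustration. It prints the patched source to stdout and leaves the original file alone.

```shell
# Hypothetical helper: print a copy of Perl.pm with its $VERSION string replaced.
# Assumption: the version is assigned on a line like:  $VERSION = '1.30';
set_ssh_perl_version() {
    file=$1
    new_version=$2
    # [$] matches a literal dollar sign; \1 keeps the "$VERSION = " prefix as found.
    sed "s/^\([$]VERSION[[:space:]]*=[[:space:]]*\)'[^']*';/\1'$new_version';/" "$file"
}
```

Something like `set_ssh_perl_version Perl.pm OpenSSH_2.5.2 > patched/Net/SSH/Perl.pm` would then write a patched copy, and putting the `patched` directory on PERL5LIB makes the script pick it up ahead of the stock module.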
I have a shell script to manage and report on my Tomcat instances. I wanted the ‘status’ portion of the script to report on instance uptime (which, by the way, has improved significantly since switching to JRockit). The script was already reporting the PID of the parent tomcat process so I shoved in this one-liner that takes that PID and gets the elapsed time from ps. I filter the result through grep and sed to get a clean human-readable output.
echo $uptime
The output is formatted as one of days & hours, hours & minutes, or minutes & seconds.
03h 23m
20m 56s
Anyone got a better or different way?
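One different way, as a rough sketch: skip grep and sed and pick the elapsed-time string apart with shell parameter expansion. This is not the one-liner described above. It assumes ps reports elapsed time as [[dd-]hh:]mm:ss, and the function name format_etime is made up for illustration.

```shell
# Hypothetical helper: turn a ps elapsed-time string into the short form above.
# Input is [[dd-]hh:]mm:ss, possibly left-padded with spaces by ps.
format_etime() {
    etime=$(printf '%s' "$1" | tr -d ' ')
    case "$etime" in
        *-*)    # dd-hh:mm:ss -> days and hours
            days=${etime%%-*}
            rest=${etime#*-}
            printf '%sd %sh\n' "$days" "${rest%%:*}"
            ;;
        *:*:*)  # hh:mm:ss -> hours and minutes
            rest=${etime#*:}
            printf '%sh %sm\n' "${etime%%:*}" "${rest%%:*}"
            ;;
        *)      # mm:ss -> minutes and seconds
            printf '%sm %ss\n' "${etime%%:*}" "${etime#*:}"
            ;;
    esac
}
```

With the Tomcat PID in $pid, the status line could then be built with `uptime=$(format_etime "$(ps -o etime= -p "$pid")")`.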
ls -d /usr/local/tomcat_instances/{InstanceA,InstanceB,InstanceC,InstanceD}/conf/Catalina/localhost | xargs -I{} cp /usr/local/tomcat_instances/Instance_Template/conf/Catalina/localhost/ROOT.xml {}
rsync -a -e "ssh gateway.remotenet.org ssh server.remotenet.org" :/logs /sync/logs
That is all.
sshfs is wickedly handy for mounting remote directories on your local filesystem. Recently I needed to mount the /logs directory off a remote server so a program on my workstation could process log files in /logs.
The textbook command to do that would be:
The tricky part in this particular case is that the server is on a private network so my workstation can not directly access it. I’m required to first ssh to a gateway machine and then ssh to the server.
----------------      -------------      -------------
|workstation   |      |           |      | server    |
|              |------|  gateway  |------|           |
|/mnt/svrlogs  |      |           |      | /logs     |
----------------      -------------      -------------
I found three ways to work with this scenario. I’d love to hear of more ways and get feedback on these.
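As an illustration of the general idea (not necessarily one of the three), here is a sketch that forwards a local port through the gateway and then points sshfs at that port. The function names, the DRY_RUN switch and the default local port 2222 are all assumptions made for the example; the host names and paths are the ones used in this post.

```shell
# Print the command (dry run) or execute it, depending on DRY_RUN.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: mount server:remote_dir on mount_point via the gateway.
mount_via_gateway() {
    gateway=$1
    server=$2
    remote_dir=$3
    mount_point=$4
    local_port=${5:-2222}   # any free local port will do (assumption)

    # Background tunnel: local_port on the workstation -> sshd on the server.
    run ssh -f -N -L "$local_port:$server:22" "$gateway"

    # Mount through the forwarded port as if the server were local.
    run sshfs -p "$local_port" "localhost:$remote_dir" "$mount_point"
}
```

Running `DRY_RUN=1 mount_via_gateway gateway.remotenet.org server.remotenet.org /logs /mnt/svrlogs` only prints the two commands, which is a cheap way to check them before letting them run for real.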
I reported yesterday being unable to use the JRockit JVM when launching Tomcat with jsvc -user tomcat. I put strace -f on the process and came up with
[pid 13655] open("/proc/self/maps", O_RDONLY) = -1 EACCES (Permission denied)
That wasn’t surprising. I expected a /proc read problem based on my previous research, but this nailed it. strace has helped me out on a few other occasions (even though I barely understand the output). I really must remember to use it sooner in the troubleshooting process.
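Since strace output is long and mostly noise, a small filter helps. The sketch below assumes the trace was saved to a file (for example with strace's -o option); the function name denied_calls is made up for illustration.

```shell
# Hypothetical helper: show only the syscalls in an strace log that failed
# with EACCES (permission denied).
denied_calls() {
    grep -F -- '= -1 EACCES' "$1"
}
```

After something like `strace -f -o /tmp/jsvc.trace` followed by the usual jsvc command line, `denied_calls /tmp/jsvc.trace` would list lines like the open() on /proc/self/maps above.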
Anyway, I delved into the source code for jsvc (jsvc-unix.c) where I was introduced to the world of Linux capabilities. Long story short, I implemented CAP_DAC_READ_SEARCH in the relevant macros, permitting the switched user to read /proc (and any other file on the system for that matter).
Here is my patch of jsvc-unix.c (from jsvc source shipped with the Apache Tomcat 5.5.20 distribution).
--- jsvc-unix.c.dist 2007-02-05 22:34:01.000000000 -0500
+++ jsvc-unix.c 2007-02-05 23:41:18.000000000 -0500
@@ -115,12 +115,15 @@
#define CAPSMAX (1 << CAP_NET_BIND_SERVICE)+ \
(1 << CAP_DAC_READ_SEARCH)+ \
(1 << CAP_DAC_OVERRIDE)
-/* That a more reasonable configuration */
+/* That a more reasonable configuration.
+ CAP_DAC_READ_SEARCH permits reading /proc/self */
#define CAPS (1 << CAP_NET_BIND_SERVICE)+ \
+ (1 << CAP_DAC_READ_SEARCH)+ \
(1 << CAP_SETUID)+ \
(1 << CAP_SETGID)
/* probably the only one Java could use */
-#define CAPSMIN (1 << CAP_NET_BIND_SERVICE)
+#define CAPSMIN (1 << CAP_NET_BIND_SERVICE)+ \
+ (1 << CAP_DAC_READ_SEARCH)
static int set_caps(int caps)
{
struct __user_cap_header_struct caphead;
With patched jsvc in service, Tomcat happily runs with JRockit under a non-privileged user.
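One way to check that a running process really holds the extra capability is to read the CapEff line from /proc/<pid>/status and test the relevant bit. A rough sketch, assuming the standard Linux numbering in which CAP_DAC_READ_SEARCH is bit 2 and CAP_NET_BIND_SERVICE is bit 10; the function names are made up for illustration.

```shell
# Bit numbers as defined in <linux/capability.h>.
CAP_DAC_READ_SEARCH=2
CAP_NET_BIND_SERVICE=10

# Print the effective capability mask of a process (hex, no 0x prefix).
cap_eff() {
    awk '/^CapEff:/ { print $2 }' "/proc/$1/status"
}

# Succeed if bit $2 is set in the hex mask $1, e.g. 0000000000000404.
cap_is_set() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}
```

For example, `cap_is_set "$(cap_eff "$pid")" "$CAP_DAC_READ_SEARCH" && echo "can read /proc"`. For reference, a mask equal to the patched CAPSMIN, (1 << 10) + (1 << 2), is 0x404.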
Related:
$ man 7 capabilities
BEA JRockit
Postscript
I’m reposting the pre-patch error from catalina.err here for Google to index. Maybe this post will be discovered and help someone encountering the same problem.
05/02/2007 23:13:16 12004 jsvc.exec error: Cannot create Java VM
05/02/2007 23:13:16 12003 jsvc.exec error: Service exit with a return value of 1
[WARN ] pthread_getattr_np failed in psiGetStackInfo for 0x2a9557b960.
[WARN ] OS message: Permission denied (13)
I’ve been exploring replacing Sun’s JVM with JRockit as a means of improving the uptime of our Struts-based web applications running under Tomcat. The problem I’ve been having is this: repeatedly reloading our Struts-based web application (something we do frequently during development) causes Tomcat to stop responding to requests after 10-ish reloads. Other techniques for updating a single webapp running under a live Tomcat instance (undeploy/deploy, stop/start) exhibit the same symptoms. The only out is to restart the entire Tomcat instance, disrupting all the other fully functional webapps deployed there.
Initially Tomcat logs were of no help to me but after upgrading Tomcat to 5.5 (from 5.0.28) and Sun’s JDK to 1.5.0_10 (from 1.5.0_05) I started getting Tomcat log entries that suggested the problem.
java.lang.OutOfMemoryError: PermGen space
That’s something Google can sink its teeth into and it quickly spit out the digested remains of the other poor saps that have fallen prey to this. The likely cause of this memory leak is incomplete garbage collection of the webapp’s Classloader, so that its classes persist in the Sun JVM’s PermGen memory heap, and therefore for the life of the JVM.
The simplest workaround suggested (short of tracking down the sources of memory leaks in the app itself) is to replace Sun’s JVM with BEA’s JRockit. JRockit doesn’t have a PermGen heap so there can be no leaking there.
I rarely use GMail’s web interface – I don’t like it as well as desktop mail clients. For my primary account I retrieve messages via POP3 to my desktop mail client (all the while wishing Google would get busy with the IMAP) and my other low-volume, primarily read-only accounts are monitored with RSS feeds (well, technically they are Atom feeds). However, this week I needed to email out from one of the read-only accounts and that meant a trip to GMail web. Only I couldn’t get there on my PowerBook as both Safari and Firefox would hang at the blank ‘Loading…’ screen. The stuck at Loading… scenario Googles well but the top hits were of no help. Clear cache. Check. Clear cookies. Check. Adjust security settings in ZoneAlarm. Ch… “ZoneAlarm? WTF is that”, asks my PowerBook. You don’t want to know.
So I went off and did my business in Firefox on my Ubuntu box. Problem worked around but mystery not solved, and that bugged me. So I committed some mental background processes to work on the problem. I was not surprised that Safari wasn’t playing nice with GMail, but for both Safari and FF to exhibit the same symptoms was unusual. I ruled out problems with my GMail accounts (they were accessible from other machines), so the focus was on the commonality of Safari and FF on my PowerBook.
Eventually I came up with ‘userContent.css’ as just such a commonality. Years ago I installed a userContent.css file to suppress ads in web pages and to override style annoyances such as underlined links. It’s been quietly doing its job all this time and I had all but forgotten about it. Sure enough, disabling this custom CSS permitted GMail to load normally. A binary search of the contents led me to the specific offending line:
IFRAME[SRC*="ad."] { display: none ! important; }
I removed this line and userContent.css is now GMail compatible.
The bulk of my userContent.css originated from floppymoose.com with a few of my own edits. I paid floppymoose a visit and lo ‘n behold they report userContent.css is “now GMail compatible”.
That’s what I said.
Related:
floppymoose userContent.css
GMail via Atom
GMail::IMAPD
gmailproxy
Mike Davidson and comments on GMail over IMAP