Last week SpikeLab.org posted a set of benchmarks for scp, tar over ssh, and tar over netcat. The loser in SpikeLab’s environment was scp, coming in roughly three times slower than tar over ssh (10m10s vs. 3m18s) for a directory of two hundred 10MB files.
Interesting. I had never noticed scp performing any worse than tar over ssh, but then again I had never compared them directly. I decided to run my own unscientific tests on my servers to see whether I’m in the same boat.
The Hardware
sender: RedHat EL4, OpenSSH_3.9p1, OpenSSL 0.9.7a (19 Feb 2003); 16GB memory; 4x dual-core AMD Opteron
receiver: Apple OS X Server, OpenSSH_4.5p1, OpenSSL 0.9.7l (28 Sep 2006); 1GB memory; 2GHz PowerPC G5
The machines are on a 1Gb/s network with four router hops between them and potentially competing traffic, so I repeated the tests a few times during off-peak hours to minimize the effects of any interference.
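(If you want a feel for the raw floor of a path like this, independent of ssh and tar, something like the following netcat pipe gives a rough number; the host name and port here just mirror what the script at the end of the post uses.)

```bash
# Rough raw-throughput check for the path, no ssh or tar involved.
# On the receiver, start a listener that discards whatever arrives:
#   nc -l -p 6969 > /dev/null
# On the sender, push 1GB of zeros through netcat and time it:
\time -f "%es" sh -c "dd if=/dev/zero bs=1M count=1024 2>/dev/null | nc -w1 server.remotehost.com 6969"
```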
The Tests
First I created a directory of two hundred 10MB files.
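(For reference, something along these lines will generate them; the Table 1 files were filled from /dev/urandom, and the directory name `dir` matches what the script at the end of the post expects.)

```bash
# Generate two hundred 10MB files of pseudo-random data in ./dir
mkdir -p dir
for i in $(seq 1 200); do
    dd if=/dev/urandom of=dir/file$i bs=1M count=10 2>/dev/null
done
```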
Then I ran a series of scripted file-transfer tests modeled after SpikeLab’s. The script is posted at the end of this post.
The Results
A representative result of one of the benchmarks for transferring urandom-generated files across the network is shown in Table 1.
Table 1.
| command | compression | time |
|---|---|---|
| scp | no | 251.04s |
| scp | ssh | 262.37s |
| tar | no | 264.99s |
| tar | ssh | 267.34s |
| tar | gzip | 324.88s |
| tar | ssh and gzip | 331.32s |
| tar | no, blowfish encryption | 279.94s |
| nc | no | 69.45s |
| nc | gzip | 219.53s |
In contrast to SpikeLab’s results, I saw no significant difference between scp and tar over ssh. Also, in my environment adding gzip compression to the tar transfers actually hurt performance, whereas in SpikeLab’s results gzip compression significantly improved the transfer rates.
I can think of a few reasons for the transfer-rate differences between the two sites. Different versions or build options of OpenSSH could affect the results. ssh and scp also honor a number of options set in configuration files, so the command lines shown here don’t tell the whole story; those behind-the-scenes settings may be influencing the results.
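One quick way to see what a given connection actually negotiates, once all the config files are applied, is to run a throwaway command with client debugging turned on, for example:

```bash
# Dump the client-side debug log for one connection; the protocol
# version, key-exchange (kex) and compression lines show what was
# actually used after ~/.ssh/config and /etc/ssh/ssh_config are applied.
ssh -v server.remotehost.com true 2>&1 | less
```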
The effect of gzip compression that I see can be explained by the randomness of the files being compressed. The Table 1 results used files generated from /dev/urandom. If I repeat the tests with files composed uniformly of NULL characters from /dev/zero, gzip gives a marked improvement in the transfer times (Table 2). The more random the contents of a file, the less compression gzip can achieve; if the contents are fully random, as is the case here, no compression takes place at all and the compressed file actually ends up slightly larger because of the accounting overhead gzip stores with it. So in a case like this, gzip adds compute time with no reduction in the data sent over the wire. The NULL files from /dev/zero, on the other hand, compress very well (the 2GB directory shrinks to a roughly 2MB tarball), so the bandwidth savings are substantial.
Table 2.
Trials using non-random files generated from /dev/zero
| command | compression | time |
|---|---|---|
| scp | no | 271.80s |
| scp | ssh | 264.76s |
| tar | no | 269.62s |
| tar | ssh | 272.20s |
| tar | gzip | 78.33s |
| tar | ssh and gzip | 76.25s |
| tar | no, blowfish encryption | 277.29s |
| nc | no | 78.51s |
| nc | gzip | 78.12s |
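A quick way to see for yourself why compression pays off for the /dev/zero files but not the /dev/urandom files is to gzip a 10MB file of each kind side by side, something like:

```bash
# Compare gzip on 10MB of random data vs. 10MB of NULLs.
dd if=/dev/urandom of=random.dat bs=1M count=10 2>/dev/null
dd if=/dev/zero    of=zeros.dat  bs=1M count=10 2>/dev/null
gzip -c random.dat > random.dat.gz
gzip -c zeros.dat  > zeros.dat.gz
ls -l random.dat random.dat.gz zeros.dat zeros.dat.gz
# random.dat.gz comes out slightly *larger* than random.dat;
# zeros.dat.gz shrinks to roughly 10KB.
```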
Interestingly, enabling compression in scp/ssh had no real effect on the NULL files, even though it should be using the same zlib algorithm and the same default compression level (6) as gzip. With netcat, the CPU on the receiver seems to be the limiting factor for the gzip runs, so no improvement was seen there either. The earlier ssh results used SSH protocol 2; if I switch to protocol 1, with and without compression, I do see a dramatic difference (Table 3).
Table 3.
SSH-1 and /dev/zero data
| command | compression | time |
|---|---|---|
| scp | no | 587.42s |
| scp | ssh | 98.88s |
| tar | no | 687.80s |
| tar | ssh | 93.92s |
| tar | gzip | 78.05s |
| tar | ssh and gzip | 87.90s |
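For the record, the script at the end of the post only shows the protocol-2 command lines; the SSH-1 runs in Table 3 simply force protocol 1 on the command line, along these lines (with the OpenSSH versions used here, -1 and -2 select the protocol version):

```bash
# Same transfers as before, but forcing SSH protocol 1 with -1
# (the default here negotiates protocol 2).
\time -f "%es" sh -c "scp -1 -qr -oCompression=no dir server.remotehost.com:~"
\time -f "%es" sh -c "scp -1 -Cqr dir server.remotehost.com:~"
\time -f "%es" sh -c "tar cf - dir | ssh -1 -oCompression=no server.remotehost.com 'tar xf -'"
\time -f "%es" sh -c "tar cf - dir | ssh -1 -C server.remotehost.com 'tar xf -'"
```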
As an aside, I tested the SSH blowfish cipher, which is reportedly faster than the default AES. However, I saw no improvement in transfer rate from using it (Table 1).
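If you want to check whether the cipher itself is ever the limiting factor, one way is to take the network out of the picture and push the same tarball over the loopback interface with different ciphers. A minimal sketch, assuming sshd is running locally and using OpenSSH protocol-2 cipher names:

```bash
# Compare cipher overhead alone by sending the test directory to
# localhost and discarding it on the far end.
for cipher in aes128-cbc blowfish-cbc arcfour; do
    echo "== $cipher =="
    \time -f "%es" sh -c "tar cf - dir | ssh -c $cipher -oCompression=no localhost 'cat > /dev/null'"
done
```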
In summary, I think all of this highlights the need to benchmark your specific environment and adjust accordingly. Your mileage may vary.
perfTests.sh
```bash
#!/bin/bash
# perfTests.sh -- time scp, tar-over-ssh and tar-over-netcat transfers
# of the $FILE directory to $RHOST and print the results as an HTML table.

FILE=dir                       # directory of test files to send
RHOST=server.remotehost.com    # receiving host

echo $RHOST
echo $FILE

echo '<table>'
echo '<tr><th>command</th><th>compression</th><th>time</th></tr>'

# scp, with and without ssh compression (-C)
echo '<tr><td>scp</td><td>no</td><td>'
\time -f "%es" sh -c "scp -qr -oCompression=no $FILE $RHOST:~"
echo '</td></tr>'

echo '<tr><td>scp</td><td>ssh</td><td>'
\time -f "%es" sh -c "scp -Cqr $FILE $RHOST:~"
echo '</td></tr>'

# tar over ssh, with and without ssh compression and gzip
echo '<tr><td>tar</td><td>no</td><td>'
\time -f "%es" sh -c "tar cf - $FILE | ssh -oCompression=no $RHOST 'tar xf -'"
echo '</td></tr>'

echo '<tr><td>tar</td><td>ssh</td><td>'
\time -f "%es" sh -c "tar cf - $FILE | ssh -C $RHOST 'tar xf -'"
echo '</td></tr>'

echo '<tr><td>tar</td><td>gzip</td><td>'
\time -f "%es" sh -c "tar zcf - $FILE | ssh -oCompression=no $RHOST 'tar zxf -'"
echo '</td></tr>'

echo '<tr><td>tar</td><td>ssh and gzip</td><td>'
\time -f "%es" sh -c "tar zcf - $FILE | ssh -C $RHOST 'tar zxf -'"
echo '</td></tr>'

# tar over ssh using the blowfish cipher instead of the default
echo '<tr><td>tar</td><td>no, blowfish encryption</td><td>'
\time -f "%es" sh -c "tar cf - $FILE | ssh -c blowfish-cbc -oCompression=no $RHOST 'tar xf -'"
echo '</td></tr>'

# tar over netcat: start a listener on the receiver, give it a few
# seconds to come up, then pipe the tarball at it
echo '<tr><td>nc</td><td>no</td><td>'
ssh -f $RHOST "nc -l -p 6969 | tar xf -"
sleep 3
\time -f "%es" sh -c "tar cf - $FILE | nc -w1 $RHOST 6969"
echo '</td></tr>'

echo '<tr><td>nc</td><td>gzip</td><td>'
ssh -f $RHOST "nc -l -p 6969 | tar zxf -"
sleep 3
\time -f "%es" sh -c "tar zcf - $FILE | nc -w1 $RHOST 6969"
echo '</td></tr>'

echo '</table>'
exit
```
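One note on running it: GNU time writes its measurements to stderr, so to capture the finished HTML table in a single file redirect both streams, e.g. `bash perfTests.sh > results.html 2>&1`.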

4 comments
August 19, 2008 at 2:20 pm
Ben Clark
Of course compression will be detrimental when you are compressing random (/dev/urandom) data. With a real file full of very redundant data (try some XML files) you will see a serious improvement with compression.
Random data is by definition very information dense; if it weren’t, it would have redundancies (patterns) and so would not be random.
Because the data you send is random, there are no patterns for the compression algorithm to find.
That is why compressing an already-compressed file can make it larger: the compressed file is already information dense and its bytes appear pretty random. The added overhead of the next ‘layer’ of compression takes more space than it is worth (all the low-hanging fruit has been picked by the first layer of compression), so the resulting file is larger.
In your test, compressing the random data before sending it over the wire may have resulted in more data being sent over the wire (unless the algorithm first tries compression but sends the raw file if the result is larger, which it probably does, though I don’t know). There was also the computational cost involved in doing the compression.
Try the test again with some real files that are susceptible to compression, such as XML files. (GIFs, JPEGs, MP3s, .mov, .zip, .Z, .gz, and .bz2 files are already compressed and will contain little redundancy to compress out.)
August 19, 2008 at 2:51 pm
crashingdaily
Thank you for your comment, Ben. The effects of compression on random data, and the tests with non-random data, were covered in Tables 2 and 3 and the associated discussion. I started with random data because I wanted to reproduce the tests reported by SpikeLab.
October 31, 2008 at 3:16 pm
Euro
Thank you for the article, it’s very useful.
BTW, the bottleneck for your scp is the network bandwidth, so the transfer time is limited for a given file size. Using Blowfish can’t change anything even though it is, theoretically, much faster. (You’d see the difference when using scp to localhost.)
September 16, 2011 at 10:23 am
ram
Hi
I need some help here. I just want to do scp at a constant rate. What I observed when transmitting a file is that the data rate varies during the transfer. Is there any command to make the data flow at a constant rate rather than varying? Also, let me know the maximum data rate at which the data can be transmitted.