NerdVana

Now with 100% more Nerd

Fast Parallel Network Copy

Written by Eric Schwimmer

For when you need to copy a bunch of files between two network hosts in a hurry. This script is especially useful if both hosts are configured with LACP bonded interfaces, as these types of interfaces only load-balance on a per-stream basis (i.e. a single network file copy, via scp for example, will only ever go as fast as the slave interface that it gets bound to). Parallel rsync is one possible solution, but it is not well suited to an initial/one-time file copy: checking for the existence of remote files imposes non-negligible overhead, and using SSH also slows things down quite a bit if you are using a relatively heavyweight cipher like aes128-cbc, which is the default SSHv2 cipher on many systems.

Enter ptarpipe! It leverages tar (file bundling), netcat (network transfer), screen (process control), lzop (fast [de]compression), pipe viewer (transfer rates) and dstat (net+disk I/O) in a way that is all but guaranteed to flat-line your network:

#!/bin/bash

[[ $# != 2 && $# != 3 ]] \
    && echo "Usage: ptarpipe <local-path> <remote-host> [remote-path]" \
    && exit 1

LOCAL_PATH=$1
REMOTE_HOST=$2
REMOTE_PATH=$3
NUM_THREADS=4
LISTEN_PORT=10000

[[ "$REMOTE_PATH" == "" ]] && REMOTE_PATH=$LOCAL_PATH

# Create a working directory
WORK_DIR=/tmp/ptarpipe.$$
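mkdir -p "$WORK_DIR"
cd "$LOCAL_PATH" || exit 1
# Remember the absolute path; the screen windows start in $WORK_DIR,
# so a relative path would not resolve there
LOCAL_PATH=$PWD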

# Create a list of all the files that need to be copied. Randomize
# the file list, so that no one thread ends up with an unfair number
# of large files
find . \( -type f -o \( -type d -empty \) \) | sort -R > $WORK_DIR/files

# Create one file per thread, containing a list of files
# to be copied by that thread
NUM_FILES=$(wc -l $WORK_DIR/files | awk '{print $1}')
(( FILES_PER_THREAD = (NUM_FILES + NUM_THREADS - 1) / NUM_THREADS ))
cd $WORK_DIR
split -d -l $FILES_PER_THREAD files

# Now create our screen config that will launch all of the sending
# and receiving processes
SCREEN_RC=$WORK_DIR/screenrc
echo "startup_message off"  > $SCREEN_RC
i=0
# Compare numerically; [[ $i < $NUM_THREADS ]] would compare as strings
while (( i < NUM_THREADS ))
do
    cat >> $SCREEN_RC << EOT
screen -t remote-$i ssh $REMOTE_HOST 'mkdir -p $REMOTE_PATH && \
nc -ld $LISTEN_PORT | lzop -d | tar xvp -C $REMOTE_PATH'
screen -t local-$i sh -c 'sleep 5; cd $LOCAL_PATH; \
tar cfp - -T $WORK_DIR/x$(printf '%-2.2d' $i) | \
pv -c -N DISK | lzop | pv -c -N NET | nc $REMOTE_HOST $LISTEN_PORT'
EOT
    (( LISTEN_PORT++ ))
    (( i++ ))
done
echo "screen -t dstat dstat 5" >> $SCREEN_RC

# Start the copy, and then clean up when we are done
screen -c $SCREEN_RC
rm -rf $WORK_DIR
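
If you want to see how the file list gets carved up before pointing this at real data, the chunking step can be tried on its own. The following is only a sketch of that one step: chunk_demo, its two arguments and the fake ./fileN names are made up for illustration and are not part of ptarpipe, and nothing is copied anywhere.

```shell
#!/bin/bash
# Sketch of the chunking step on its own. The function name, its
# arguments and the file names are hypothetical; nothing is copied.
chunk_demo() {
    local num_threads=$1 num_files=$2
    local demo_dir files_per_thread chunk
    demo_dir=$(mktemp -d)

    # Stand-in for the find output: one fake path per line
    seq 1 "$num_files" | sed 's|^|./file|' > "$demo_dir/files"

    # Ceiling division, so the last chunk picks up the remainder
    (( files_per_thread = (num_files + num_threads - 1) / num_threads ))

    # split -d names its output x00, x01, ... which is what the
    # sender loop expects to find
    ( cd "$demo_dir" && split -d -l "$files_per_thread" files )

    echo "files per thread: $files_per_thread"
    for chunk in "$demo_dir"/x*; do
        # $(( )) strips the padding some wc implementations add
        echo "$(basename "$chunk"): $(( $(wc -l < "$chunk") ))"
    done
    rm -rf "$demo_dir"
}

chunk_demo 4 10
```

With 4 threads and 10 names this reports a chunk size of 3 and four chunks (x00 to x03) holding 3, 3, 3 and 1 names. Note that when there are fewer files than threads (try chunk_demo 4 2), split produces fewer chunks than threads, so the later senders would have no list to read.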

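Each sender/receiver pair is really just a tar pipe with netcat, lzop and pv spliced into the middle. As a rough way to sanity-check the tar half on a single machine, here is a sketch that connects the sending tar straight to the receiving tar. It deliberately leaves out nc, lzop and pv, so it says nothing about throughput; stream_demo and all of the directory and file names in it are throwaway examples, not part of the script.

```shell
#!/bin/bash
# One "stream" of the copy, run locally with no nc, lzop or pv.
# The function name and every path below are throwaway examples.
stream_demo() {
    local src dst list
    src=$(mktemp -d)
    dst=$(mktemp -d)
    list=$(mktemp)

    # A tiny source tree: two files and one empty directory
    mkdir -p "$src/docs" "$src/empty-dir"
    echo "hello" > "$src/docs/a.txt"
    echo "world" > "$src/b.txt"

    # Same selection as the script: regular files plus empty directories
    ( cd "$src" && find . \( -type f -o \( -type d -empty \) \) ) > "$list"

    # Sender half piped directly into the receiver half
    ( cd "$src" && tar cfp - -T "$list" ) | tar xpf - -C "$dst"

    # Show what arrived on the "remote" side
    ( cd "$dst" && find . | sort )
    rm -rf "$src" "$dst" "$list"
}

stream_demo
```

The listing should include ./docs/a.txt, ./b.txt and ./empty-dir, which shows that listing only files and empty directories is enough for tar to recreate the whole tree on the far side.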
