How to Copy Files Across a Network/Internet in UNIX/LINUX (Redhat, Debian, FreeBSD, etc) - scp tar rsync
One of the many advantages of Linux/UNIX is how many ways there are to do one thing. This tutorial shows some of the many ways you can transfer files over a network connection.
This tutorial covers rsync, scp, and tar. There are many other ways to copy files; these are just some of the more common ones. All of the methods covered assume that SSH is used for every session, which makes them much more secure and reliable than rcp or ftp. If you are looking for an alternative to FTP for transferring files over a network, any of them will do the job.
scp
scp, or secure copy, is probably the easiest of all the methods. It is designed as a replacement for rcp, which was a quick copy of cp with network functionality.
scp syntax
scp [-Cr] /some/file [ more ... ] host.name:/destination/file
-or-
scp [-Cr] [[user@]host1:]file1 [ more ... ] [[user@]host2:]file2
Before scp does any copying, it first connects via ssh. Unless the proper keys are in place, you will be asked for a password. You can test whether the connection works by running ssh -v hostname.
The -r switch is used when you want to copy directories recursively. Please note that the source must be a directory for this switch to have any effect.
scp encrypts data as it crosses the network, and with the -C switch it also compresses the data before sending it. This can significantly decrease the time it takes to copy large files, provided the data compresses well and the network, rather than the CPU, is the bottleneck.
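Tip: All encryption algorithms cost some speed, but some are faster than others. Older versions of scp defaulted to the 3DES encryption algorithm, and on those -c blowfish can speed things up. Current OpenSSH releases default to faster ciphers and no longer support blowfish; ssh -Q cipher lists what your version offers.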
What scp shouldn't be used for:
1. When you are copying more than a few files. scp handles each file individually, which can be quite slow and resource-intensive when copying a large number of files.
2. When using the -r switch on a tree that contains symbolic links. scp does not know about symbolic links and will blindly follow them, even if it has already made a copy of the file. This can lead to scp copying an endless amount of data and can easily fill up your hard disk, so be careful.
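Because of the second point, it can be worth checking a tree for symbolic links before handing it to scp -r. The following is only a sketch: it builds a throwaway directory with mktemp, and the directory and file names in it are made up for illustration.

```shell
#!/bin/sh
# Build a small throwaway tree (hypothetical names) containing a symlink
# that points back at its own parent directory, the kind of loop that
# "scp -r" would keep following.
tree=$(mktemp -d)
mkdir -p "$tree/project/docs"
echo "hello" > "$tree/project/docs/readme.txt"
ln -s .. "$tree/project/docs/loop"

# List every symbolic link in the tree. find does not follow symlinks
# by default, so the loop is harmless here.
links=$(find "$tree/project" -type l)
link_count=$(find "$tree/project" -type l | wc -l | tr -d ' ')

echo "symlinks found: $link_count"
echo "$links"

# Clean up the throwaway tree.
rm -rf "$tree"
```

If the count is not zero, rsync -a or the tar pipeline described later in this tutorial are safer choices, since both copy a link as a link.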
rsync
rsync has very similar syntax to scp:
rsync -e ssh [-avz] /some/file [ more ... ] host.name:/destination/file
-or-
rsync -ave ssh source.server:/path/to/source /destination/dir
rsync's speciality lies in its ability to analyse files and only copy the changes made to files rather than all files. This can lead to enormous improvements when copying a directory tree a second time.
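The effect is easiest to see by running the same copy more than once. The sketch below does this between two local temporary directories so that it can be tried without a remote host. It assumes rsync is installed, the file names are invented, and it uses the -i (--itemize-changes) switch, which is not covered elsewhere in this tutorial, to make rsync print one line per item it updates.

```shell
#!/bin/sh
# Assumes rsync is installed. All paths live under a throwaway directory.
work=$(mktemp -d)
mkdir "$work/src" "$work/dest"
echo "first file"  > "$work/src/a.txt"
echo "second file" > "$work/src/b.txt"

# Run the copy and count the files rsync reports as transferred
# (itemized lines that start with "<f" or ">f").
sync_count() {
    rsync -ai "$work/src/" "$work/dest/" | grep -c '^[<>]f'
}

first=$(sync_count)     # both files are new to the destination
second=$(sync_count)    # nothing has changed since the first run
echo "changed" >> "$work/src/a.txt"
third=$(sync_count)     # only a.txt differs now

echo "files transferred per run: $first, $second, $third"
```

Over ssh the same idea applies, with the added benefit that rsync sends only the changed parts of a modified file instead of the whole file.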
Switches:
-a Archive mode; most likely you should always keep this on. It recurses into directories, preserves file permissions, ownership and timestamps, and copies symlinks as symlinks rather than following them.
-v Verbose, lists files being copied
-z Enable compression; this compresses the file data as it gets sent over the pipe. This can greatly decrease the transfer time, depending on what sort of files you are copying.
-e ssh Uses ssh as the transport. Recent versions of rsync use ssh by default, but older ones default to rsh, so it is safest to always specify it.
Disadvantages of using rsync:
1. Picky syntax; the use of trailing slashes can be confusing.
2. You have to remember to tell rsync to use ssh.
3. rsync is not installed on all computers.
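The trailing-slash rule from the first point can be tried out safely on local directories. This is a minimal sketch, assuming rsync is installed; the directory names are invented for the example.

```shell
#!/bin/sh
# Assumes rsync is installed. All paths live under a throwaway directory.
work=$(mktemp -d)
mkdir "$work/src" "$work/dest1" "$work/dest2"
echo "data" > "$work/src/a.txt"

# No trailing slash on the source: the directory itself is copied,
# so the file ends up at dest1/src/a.txt.
rsync -a "$work/src" "$work/dest1/"

# Trailing slash on the source: only the directory's contents are copied,
# so the file ends up at dest2/a.txt.
rsync -a "$work/src/" "$work/dest2/"

# Show where the copies landed.
find "$work/dest1" "$work/dest2" -type f | sort
```

A trailing slash on the destination makes no such difference; it is the source path that decides which layout you get.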
tar
tar is usually used for archiving, but in this case we are going to create the archive and pipe it straight over an ssh connection. tar handles large file trees quite well, preserves file permissions and similar metadata (including on those UNIX systems which use ACLs), and works quite well with symlinks.
The syntax is slightly different, as we are piping it to ssh:
tar -cf - /some/file | ssh host.name tar -xf - -C /destination
-or with compression-
tar -czf - /some/file | ssh host.name tar -xzf - -C /destination
The -c switch tells tar to create an archive, and -f - tells it to write the new archive to stdout instead of to a file.
The second tar command uses the -C switch, which changes directory on the target host before extracting. It takes its input from stdin. The -x switch extracts the archive.
The second way of doing the transfer adds the -z option, which compresses the stream and so decreases the time it takes to transfer over the network.
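To get a feel for what the -z option saves, the stream can be measured locally before any network is involved. The sketch below is an illustration only: it uses an invented, highly repetitive sample file (real data will compress less well), counts the bytes tar writes to stdout with wc -c, and then runs both halves of the pipeline without the ssh in the middle.

```shell
#!/bin/sh
# All paths live under a throwaway directory; the names are hypothetical.
work=$(mktemp -d)
mkdir "$work/src" "$work/dest"

# Create a repetitive, and therefore very compressible, sample file.
i=0
while [ "$i" -lt 2000 ]; do
    echo "the same line of text repeated many times"
    i=$((i + 1))
done > "$work/src/sample.txt"

# "-f -" sends the archive to stdout; wc -c counts the bytes that would
# travel down the pipe to ssh. -C changes directory before archiving.
plain_bytes=$(tar -cf - -C "$work" src | wc -c | tr -d ' ')
gzip_bytes=$(tar -czf - -C "$work" src | wc -c | tr -d ' ')
echo "uncompressed stream: $plain_bytes bytes"
echo "compressed stream:   $gzip_bytes bytes"

# Both halves of the pipeline, minus ssh: the second tar reads the
# archive from stdin and extracts it under the destination directory.
tar -czf - -C "$work" src | tar -xzf - -C "$work/dest"
```

For data that is already compressed, such as most image and video formats, the two numbers will be close and -z mostly costs CPU time.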
Some people may ask why tar is used. It is great for large file trees, as it simply streams the data from one host to another and does not have to do intensive operations on the file tree.
If using the -v (verbose) switch, be sure to include it only on the second tar command; otherwise you will see double output.
Using tar and piping can also be a great way to transfer files locally to be sure that file permissions are kept correctly:
tar cf - /some/file | (cd /destination && tar xf -)
This may seem like a long command, but it is great for making sure all file permissions are kept intact. It streams the files through a pipe into a sub-shell, which changes to the target directory and untars them there. Please note that the -z switch should not be used for local copies: there is no network to save time on, so the extra CPU work of compressing will only slow the copy down.
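Here is a small, self-contained sketch of the local copy that can be run to confirm that permissions and symlinks survive. The file names are invented, and the -p switch (restore the recorded permissions exactly, regardless of umask) is an addition that the command above does not use.

```shell
#!/bin/sh
# All paths live under a throwaway directory; the names are hypothetical.
work=$(mktemp -d)
mkdir "$work/src" "$work/dest"
echo "secret" > "$work/src/private.txt"
chmod 600 "$work/src/private.txt"
ln -s private.txt "$work/src/link"

# Archive the tree in one sub-shell and unpack it in another.
# Using && means tar never runs if the cd fails.
(cd "$work/src" && tar cf - .) | (cd "$work/dest" && tar xpf -)

# Compare the permission strings of the original and the copy.
src_perm=$(ls -l "$work/src/private.txt" | cut -c1-10)
dest_perm=$(ls -l "$work/dest/private.txt" | cut -c1-10)
echo "source: $src_perm  copy: $dest_perm"
ls -l "$work/dest"
```

Archiving from inside the source directory (tar cf - .) keeps the paths in the archive relative, so the files land directly in the target directory instead of under a copy of the full source path.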
Why tar shouldn't be used:
1. The syntax can be hard to remember.
2. It's not as quick to type as scp for a small number of files.
3. rsync will beat it hands down for a tree of files that already exist in the destination.
There are several other ways of copying over a network, such as FTP, NAS, and NFS, but these all require specialised software installed on either the receiving or the sending end, and hence are not as useful as the above commands.