5.9 Playing Nicely with Others in a Shared Environment
5.9.4 Creating Archives of Your Data
So, after months of your time, hundreds of megabytes of files, and several layers of subdirectories, the otter project is finally complete. Time to move on to the next project with a clean slate. But as
refreshing as it may sound, you can't just type:
% rm -rf otter/
Other people may need to look back at your findings or use them as a starting point for their own research. At the other extreme, you can't leave your files lying around or laboriously copy them a few at a time to another location. Not every file needs to be accessible at all times; some files are replaced, while others are more conveniently stored elsewhere. This section covers the tools provided by Unix for archiving your data so you don't have to worry about it on a day-to-day basis but can find things later when you need them.
5.9.4.1 tar: Hold the feathers
Usage: tar functions[options] [arguments] filenames
After going through all the effort of setting up your filesystem rationally, it seems like a waste to lose that structure in the process of storing it away, like hastily packed dishes in an unexpected cross-country move. Fortunately, there is a Unix command that lets you work with whole directories of files while retaining the directory structure. tar compacts a directory and all its component files and subdirectories into a single file, conventionally given the name of the compacted directory and a .tar extension. Its behavior is controlled by a combination of functions and options. tar is short for "tape archive," since the utility was originally designed to read and write archives stored on magnetic tape. Another common use of tar is to package software in a form that can be easily transferred over the Internet.
To run tar, you must choose one of the following functions:
c
Creates a new tape archive
r
Appends the files to an existing archive
u
Adds files to the archive if they aren't present or are modified
x
Extracts files from an existing archive
t
Prints a table of contents of the archive
The options for tar are as follows:
f archive
Performs the specified operation on archive, which can either be a device (such as a tape drive or a removable disk) or a tar file
v
(verbose mode) Prints the name of each file archived or extracted with a character to indicate the function (a for archived; x for extracted)
w
(whiny mode) Asks for confirmation at every step
Note that neither functions nor options require the hyphen that usually precedes Unix command options.
If you type:
% tar cvf otter.tar otter/
the otter/ directory and all its subdirectories are rolled into a single file called otter.tar. It's good practice to use the v option, so you can see if something is going horribly wrong while the archive is being processed.
If, on the other hand, you want to make an archive of the otter/ directory on the tape drive nftape, you can type:
% tar cvf /dev/nftape otter/
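The functions described above can be tried out safely in a scratch directory. What follows is a minimal sketch rather than a recipe: the files inside otter/ (notes.txt, data/weights.dat) and the extra file stoat.txt are invented for illustration, and mktemp is assumed to be available on your system. It exercises the c, r, t, and x functions with the f option.

```shell
# Work in a throwaway directory so nothing real is touched.
start=$PWD
work=$(mktemp -d)
cd "$work"

# Build a tiny stand-in for the otter/ project (hypothetical files).
mkdir -p otter/data
echo "field notes" > otter/notes.txt
echo "7.2 8.1 6.9" > otter/data/weights.dat
echo "stoat sightings" > stoat.txt

# c: create otter.tar from the directory and everything beneath it.
tar cvf otter.tar otter

# r: append one more file to the existing archive.
tar rf otter.tar stoat.txt

# t: print the table of contents without extracting anything.
listing=$(tar tf otter.tar)
echo "$listing"

# x: extract somewhere else; the directory tree is recreated as archived.
mkdir restored
cd restored
tar xf ../otter.tar
restored_weights=$(cat otter/data/weights.dat)

# Return to where we started and remove the scratch directory.
cd "$start"
rm -rf "$work"
```

Listing an archive with t before extracting it is a cheap way to check that the paths inside are the ones you expect.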
A couple of warnings about tar are in order. First, before you use tar on your system, you should use
which to find out whether the GNU or the standard version is installed. Several of the options mean different things to each version; the ones listed earlier are the same in each version.
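As a rough sketch of that check: the path tells you where tar lives, and many versions will also identify themselves when asked. The --version flag used here is an assumption about your system. GNU tar and bsdtar accept it, but an older vendor-supplied tar may not, so the sketch falls back to a placeholder message in that case.

```shell
# Where the shell finds tar (command -v is the portable cousin of which).
tar_path=$(command -v tar)
echo "tar found at: $tar_path"

# Ask tar to identify itself; keep only the first line of the answer.
version_line=$(tar --version 2>/dev/null | head -n 1)
if [ -z "$version_line" ]; then
    version_line="unknown (this tar does not accept --version)"
fi
echo "tar version: $version_line"
```

GNU tar includes the words "GNU tar" in that first line; anything else deserves a look at the local manpage before you rely on options beyond those listed earlier.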
Second, the tar file you create will be at least as large as the combined contents of the directory and all the subdirectories beneath it. This has dire implications if your archived directory is large and you have limited disk space, or if you need to transfer large amounts of tar'd data. In these cases, you should break down the directory into subdirectories of a more manageable size, and tar those instead.
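One way to do this is a short loop that writes a separate archive for each immediate subdirectory. The sketch below is an illustration under assumptions, not a prescribed layout: the subdirectory names (sequences, structures, papers), the files in them, and the otter-NAME.tar naming scheme are all hypothetical.

```shell
start=$PWD
work=$(mktemp -d)
cd "$work"

# Hypothetical layout: three subdirectories under otter/.
mkdir -p otter/sequences otter/structures otter/papers
echo "MKV" > otter/sequences/a.seq
echo "ATOM" > otter/structures/a.pdb
echo "draft" > otter/papers/draft.txt

# One archive per immediate subdirectory instead of one huge otter.tar.
# Archiving otter/NAME (not just NAME) keeps the otter/ prefix in every
# path, so each piece unpacks back into its original place.
for dir in otter/*/; do
    name=$(basename "$dir")
    tar cf "otter-$name.tar" "otter/$name"
done

archives=$(ls otter-*.tar)
echo "$archives"
papers_listing=$(tar tf otter-papers.tar)

cd "$start"
rm -rf "$work"
```

Files that sit directly in otter/ rather than in a subdirectory are not picked up by this loop and would need an archive of their own.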
If you don't have the space on your current filesystem or partition for your files and the archive you are creating to exist simultaneously, or you don't wish to download a whole archive file and unpack it just to retrieve a few files, you can transfer your archive over the network or even just to another partition using a combination of ftp and tar commands. Sending an archive this way and then extracting it at the destination can be less time-consuming than a cp -r if a large number of files are involved. The ftp program recognizes a form in which a command replaces the local filename. The command is executed in a subshell on the local machine and operates on files on the local filesystem. The construct is:
ftp command "| command" filename
Inside the ftp program, here's how to send the output of the tar command, enclosed in quotes, into the filename specified as the target on the remote machine:
put "|tar cvBf - *" filename
Here's how to direct the downloaded archive through the tar command, resulting in extraction of only the files in the specified directory within the archive:
get filename.tar "|tar xvf - dirname"
Finally, here's how to list the contents of the remote archive:
get filename.tar "|tar tf -"
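In each of these commands, the - that follows f tells tar to write the archive to standard output, or read it from standard input, instead of using a named file. The same idea works without ftp when the destination is simply another partition on the same machine. The sketch below is a local analogue under stated assumptions, not the ftp form itself: the two temporary directories stand in for the crowded filesystem and the roomier one, and the files are invented.

```shell
src=$(mktemp -d)     # stands in for the filesystem that is short on space
dest=$(mktemp -d)    # stands in for the other partition

mkdir -p "$src/otter/data"
echo "field notes" > "$src/otter/notes.txt"
echo "7.2 8.1 6.9" > "$src/otter/data/weights.dat"

# The first tar writes the archive to stdout (f -); the second reads it
# from stdin and unpacks it in $dest. No intermediate .tar file is ever
# written to disk on either side.
(cd "$src" && tar cf - otter) | (cd "$dest" && tar xf -)

copied=$(cat "$dest/otter/data/weights.dat")
echo "$copied"

rm -rf "$src" "$dest"
```

Each side runs in its own subshell, so the cd commands do not change the directory of the shell you typed them into.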
5.9.4.2 compress
Usage: compress -[options] filenames
Ultimately, you don't want to be left with large—if more manageable—tar files cluttering up your filesystem. In this situation, data-compression utilities are important, since they allow you to cheat and
reduce the amount of space that files take up on your hard disk. compress is the standard Unix file-compression command. It's the opposite of uncompress, the command used in Chapter 3 to open compressed papers and software. compress adds a .Z to the end of the filename.
Here are the most useful options for compress:
-f
Forces compression, even if a compressed version of the file already exists (the existing .Z file is overwritten) or the file would not get any smaller
-v
(verbose mode) Prints the percentage compression achieved for each file
-r
(recursive mode) If compress is applied to a directory that contains subdirectories, compresses their contents as well as those of the original directory
If you have a text file named stoat.txt and the tar file of the otter/ directory from the last section, and you want to compress both and see the compression ratio achieved for each, type:
% compress -v stoat.txt otter.tar
This command produces two files, stoat.txt.Z and otter.tar.Z. The files can be uncompressed using the uncompress command or gzip -d (described next). In case you were wondering, natural languages (the kind humans use) end up with a compression ratio around 60%, and programming languages get around 40%. Try compressing the sequences of some of your favorite proteins to see what sort of ratio you get: the values can be wildly variable, depending on whether there are repeats in the sequence.
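The effect of repeats is easy to see with a small experiment. The sketch below makes several assumptions: both "sequences" are made up (one is a single ten-letter motif repeated 500 times, the other is 5,000 letters drawn pseudo-randomly from the 20 amino acid codes), and because compress is not installed on every system, the script falls back to gzip when it is missing. Exact byte counts will differ between tools and machines, so none are quoted here.

```shell
work=$(mktemp -d)

# Use compress when it is installed; otherwise fall back to gzip.
if command -v compress >/dev/null 2>&1; then
    tool=compress
else
    tool=gzip
fi

# A made-up repetitive sequence: one 10-letter motif repeated 500 times.
awk 'BEGIN { for (i = 0; i < 500; i++) printf "MKVLAAGIVG"; print "" }' \
    > "$work/repeat.seq"

# A made-up varied sequence of the same length, drawn from the 20
# one-letter amino acid codes with a fixed seed.
awk 'BEGIN {
    srand(1); aa = "ACDEFGHIKLMNPQRSTVWY"
    for (i = 0; i < 5000; i++) printf "%s", substr(aa, int(rand() * 20) + 1, 1)
    print ""
}' > "$work/varied.seq"

# -c writes the compressed data to stdout, so the originals stay in place
# and the compressed size can be counted directly.
orig_size=$(wc -c < "$work/repeat.seq" | tr -d ' ')
repeat_size=$($tool -c < "$work/repeat.seq" | wc -c | tr -d ' ')
varied_size=$($tool -c < "$work/varied.seq" | wc -c | tr -d ' ')

echo "tool used:           $tool"
echo "original size:       $orig_size bytes each"
echo "repetitive sequence: $repeat_size bytes compressed"
echo "varied sequence:     $varied_size bytes compressed"

rm -rf "$work"
```

The repetitive file should shrink to a small fraction of the size of the varied one, which is the behavior to expect from real sequences that contain long repeats.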
5.9.4.3 gzip
Usage: gzip -[options] filenames
As usual, in addition to the standard Unix compress, there's a faster and more efficient GNU utility:
gzip. gzip behaves in much the same way as compress, except that it gets better compression on average, since it uses a superior algorithm. gzip adds the suffix .gz to a file that it compresses. It emulates the compress options described earlier and adds a few of its own:
-N
(default setting) Preserves the original name and timestamp from the file being compressed
-q
(quiet mode) Suppresses warnings when running
-d
Returns a file that has been compressed by gzip to its uncompressed state; gzip can also recognize and uncompress files produced by compress
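A quick round trip shows what gzip does to a file and how -d undoes it. This is a minimal sketch under assumptions: stoat.txt here is a two-line stand-in created on the spot, and stoat.orig is a reference copy kept only so the result can be compared.

```shell
work=$(mktemp -d)
printf 'stoat sightings, week 1\nstoat sightings, week 2\n' > "$work/stoat.txt"
cp "$work/stoat.txt" "$work/stoat.orig"    # reference copy for comparison

# Compress: stoat.txt is replaced by stoat.txt.gz, and -v reports the
# percentage saved (which can be poor for a file this small).
gzip -v "$work/stoat.txt"
compressed_listing=$(ls "$work")
echo "$compressed_listing"

# Decompress: -d restores stoat.txt and removes stoat.txt.gz.
gzip -d "$work/stoat.txt.gz"

# Confirm that the restored file matches the reference copy byte for byte.
if cmp -s "$work/stoat.txt" "$work/stoat.orig"; then
    round_trip=identical
else
    round_trip=different
fi
echo "round trip: $round_trip"

rm -rf "$work"
```

Note that gzip replaces the original file rather than leaving a copy beside it, so there is never a moment when both versions take up space.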