TACC User Guides
Ranch User Guide

System Overview

HPC machines are used primarily for scientific computing, so the disk space owned by each account (the size of the $HOME directory) is limited. This is also true for TACC's visualization systems. The Ranch system serves the HPC and visualization community machines by providing a massive, high-performance file system for archiving files.

TACC's long-term mass storage solution is a Sun Microsystems® StorageTek Mass Storage Facility named Ranch (ranch.tacc.utexas.edu). Ranch utilizes Sun's Storage Archive Manager Filesystem (SAM-FS) for migrating files to and from a tape archival system with a current storage capacity of 1 petabyte (PB).

 

Architecture

Ranch's disk cache is built on a Sun ST6540 disk array containing approximately 50 terabytes (TB) of spinning disk. This disk array is controlled by a Sun x4600 SAM-FS metadata server, which has 16 CPUs and 32 GB of RAM.

A single Sun StorageTek SL8500 Automated Tape Library houses all of the offline archival storage. Each SL8500 library contains 10,000 tape slots and 64 tape drive slots. Each tape is capable of holding 500 GB of uncompressed data, so when fully populated, a single SL8500 library can house 5 PB. Each SL8500 library also contains 4 handbots to manage tapes and move them to or from the tape drives. If necessary, up to 4 SL8500 libraries can be integrated into a single archival solution, allowing for a maximum offline storage capacity of 20 PB.

The current Ranch configuration has 5,000 tapes and can hold 2.5 PB of uncompressed data. Future plans call for populating the remaining tape slots and upgrading the physical media from 500 GB to 1 TB per tape, for a total capacity of 10 PB.
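The capacity figures above follow from multiplying the tape count by the per-tape capacity. As a quick check, here is a small shell sketch; the function name tape_capacity_tb is hypothetical, and it assumes decimal units (1 TB = 1000 GB) and uncompressed data.

```shell
# Hypothetical helper: capacity in TB for a given number of tapes ($1)
# and per-tape capacity in GB ($2), assuming 1 TB = 1000 GB.
tape_capacity_tb() {
    echo $(( $1 * $2 / 1000 ))
}

tape_capacity_tb 5000 500     # current configuration
tape_capacity_tb 10000 500    # one fully populated SL8500 library
tape_capacity_tb 10000 1000   # fully populated with 1 TB media
```

These three calls print 2500, 5000, and 10000 (TB), which correspond to the 2.5 PB, 5 PB, and 10 PB figures quoted in the text.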

[Figure: Ranch]

 

User Environment

 

System Access

The preferred way of accessing Ranch, especially from scripts, is by using the TACC-defined environment variables $ARCHIVER and $ARCHIVE. These variables define the hostname of the current TACC archival system ($ARCHIVER) and each account's personal archival space ($ARCHIVE). These environment variables help ensure that scripts will continue to work, even if the system itself changes in the future.
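As a minimal sketch of how a script might rely on these variables, the following shell function stops early if either variable is unset and otherwise builds an scp-style destination on the archive. The function name archive_target and the example file name are hypothetical; they are not TACC-provided commands.

```shell
# Hypothetical helper (not a TACC command): build an scp-style destination
# on the archive from $ARCHIVER and $ARCHIVE, failing early if either is unset.
archive_target() {
    : "${ARCHIVER:?ARCHIVER is not set}"
    : "${ARCHIVE:?ARCHIVE is not set}"
    printf '%s:%s/%s\n' "$ARCHIVER" "$ARCHIVE" "$1"
}

# Example use in a script (the file name is illustrative):
#   scp results.tar "$(archive_target results.tar)"
```

Because the hostname and path come only from the environment, a script written this way should not need editing if the archival system is replaced.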

Currently, direct login to Ranch is allowed so that you can create directories and demigrate files from tape back to the disk subsystem for later transfer to TACC machines or personal computers. However, because Ranch is an archive system, files that have not been accessed recently are stored on tape, so we recommend that you use the 'stage' command, documented below, to retrieve files from tape before attempting to access them. We also recommend that you use 'tar' or another utility to bundle large numbers of small files together for more efficient storage and retrieval on Ranch.

Ranch access is not allowed from within job scripts on other TACC resources; data must be transferred from Ranch in order to be available to running jobs.

 

Login Information

From most TACC machines, you can access Ranch using rsh, as in the following example.

lonestar% rsh $ARCHIVER

The above method will usually not request a password. From the outside world, however, and from TACC systems where rsh is not available, use ssh by typing:

localhost% ssh ranch.tacc.utexas.edu

When using ssh, expect to type in a password.

 

File Systems

Ranch uses the Storage and Archive Manager File System (SAM-FS). SAM-FS contains several commands to manage the storage and location of files stored on Ranch. To get a full description of usage for any of the commands below, use the manpages on Ranch by logging in and typing "man < command >".

List of SAM-FS Commands

Command   Description
stage     Retrieve files from tape and place them in the disk cache
sls       Similar to ls, with more migration information
sfind     SAM-FS version of find
sdu       Replacement for du; reports the size of an archived directory or file
 

Usage Policies

 For usage policies on Ranch and other TACC resources, refer to TACC's Usage Policies.

 

Usage

 

High Speed File Transfers

SSH and BBCP

We support the use of SSH/SCP and BBCP to transfer files to the archive. You can use any of the following alternatives to copy files to Ranch ($ARCHIVER). The first uses ssh to create a 'tar' archive file on Ranch directly from a local directory:

  1. tar cvf - < dirname > | ssh ${ARCHIVER} "cat > ${ARCHIVE}/< tarfile.tar >"

    where < dirname > is the path to the directory you want to archive, and < tarfile.tar > is the name of the archive on Ranch.

    You could add the -z option to tar to compress the archive with gzip; however, the transfer will run faster if you do not compress the tar file. Gzip uses a lot of CPU, and the local network is not typically a bottleneck.

  2. You can also use the 'scp' utility to directly copy files to Ranch:
    scp <file> ${ARCHIVER}:${ARCHIVE}/<filename>

    where < file > is the name of the file to copy and < filename > is the path to the archive on Ranch.

  3. You can also use the 'bbcp' utility:
    bbcp -T '/usr/bin/rsh -l %U %H bbcp' \
         -S '/usr/bin/rsh -l %U %H bbcp' <file> ${ARCHIVER}:${ARCHIVE}/<filename>

    where < file > is the name of the file to copy and < filename > is the path to the archive on Ranch. These options allow for transfer without the need to type a password. You can see all options by typing the following command:

    bbcp -h

    Here are a few bbcp options that you might find useful:

    • Like cp, bbcp has a -r option for recursively transferring directories.
    • The connection between systems is often lost during large transfers. The -a option gives bbcp the ability to pick up where it left off.
    • The -P option takes an interval in seconds and displays a progress message at that interval, which may also be useful during large transfers.

    The multistreaming transfer ability of bbcp makes it ideal for large files. It can break up the transfer into multiple simultaneous streams, thereby transferring data much faster than single-streaming utilities such as scp and sftp. For more information, see the man page.
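Because large transfers can be interrupted, one option is to wrap a resumable command, such as bbcp with the -a option, in a retry loop. The following is only a sketch: the retry function and the RETRY_DELAY variable are hypothetical names introduced here and are not part of bbcp.

```shell
# Hypothetical helper: run a command up to $1 times, pausing RETRY_DELAY
# seconds (default 30) between attempts. Returns 0 on the first success
# and 1 if every attempt fails.
retry() {
    max=$1
    shift
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-30}"
    done
}

# Example (placeholders as in the bbcp command above):
#   retry 5 bbcp -a -T '/usr/bin/rsh -l %U %H bbcp' \
#               -S '/usr/bin/rsh -l %U %H bbcp' <file> ${ARCHIVER}:${ARCHIVE}/<filename>
```

The loop itself knows nothing about bbcp; it only helps when the wrapped command can resume a partial transfer, which is what the -a option is described as doing.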

For large amounts of data, create several smaller tar files, perhaps breaking the data up by subdirectory. This also makes it more efficient to retrieve portions of your data as needed. If you are concerned about space and need to compress the tar files, please try to do so when the system is not heavily loaded. We recommend that small files be tarred together and compressed, but you should try to keep tar files under 10 GB if at all possible (this reduces the chance of file corruption). Binary data often compresses poorly, so you can usually skip the compression step for it.
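One way to follow this advice is to build one tar file per top-level subdirectory before transferring. The sketch below assumes that layout suits your data; the function name tar_by_subdir is hypothetical, and the size check treats 10 GB as 10 × 1024³ bytes.

```shell
# Hypothetical helper: create one uncompressed tar file per top-level
# subdirectory of $1, writing the archives into directory $2.
# Warns on stderr when an archive exceeds the 10 GB guideline.
tar_by_subdir() {
    src=$1
    out=$2
    limit_bytes=$((10 * 1024 * 1024 * 1024))
    mkdir -p "$out" || return 1
    for dir in "$src"/*/; do
        [ -d "$dir" ] || continue          # no subdirectories: nothing to do
        name=$(basename "$dir")
        tar cf "$out/$name.tar" -C "$src" "$name" || return 1
        size=$(( $(wc -c < "$out/$name.tar") ))
        if [ "$size" -gt "$limit_bytes" ]; then
            echo "warning: $name.tar exceeds 10 GB; consider splitting it further" >&2
        fi
    done
}

# Example (paths are illustrative):
#   tar_by_subdir /path/to/project /path/to/tarfiles
```

Each resulting tar file can then be copied to Ranch with any of the transfer methods listed above, and a single subdirectory can later be retrieved without pulling back the whole data set.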

Staging Data

To stage data (that is, to begin retrieving it from tape) before transferring it back from Ranch, run:

ssh $ARCHIVER stage "file list"

This begins the staging process and returns immediately. If you add the stage option -w, the command waits until staging is complete. Then you can run:

rcp $ARCHIVER:"file list" < destination >

Or, you can log in to Ranch and issue the commands from there. Use the following command to identify files which are on tape or disk:

ranch$ sls -2
-rwxr-xr-x   1 username G-81769        349 May  4  2008 filename
---------  ----- -- --  dk ti       
-rwx------   1 username G-81769        349 Jun 14  2008 filename
O--------  ----- -- --  dk ti   

The second line of output for each file lists attributes related to archiving. The first character of that line gives the file's status:

Status   Description
O        The file is offline, removed from disk, and is only on tape.
P        The file is offline, with a partial copy online.
E        The tape where the file resides has been flagged as "damaged." Contact TACC User Services.
-        The file is online (and has not been copied to tape).

Files in the offline state should be staged using the stage command before attempting to retrieve them.
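To find which files need staging, the sls -2 output can be filtered for entries whose status is O. The following sketch assumes the output has exactly two lines per file, as in the sample above, with no header line and no spaces in file names; the function name offline_files is hypothetical.

```shell
# Hypothetical helper: read `sls -2` style output on stdin and print the
# names of files whose status character is O (offline, only on tape).
# Assumes two lines per file: an ls-style line, then an attribute line.
offline_files() {
    awk '
        NR % 2 == 1 { name = $NF; next }         # ls-style line: remember the file name
        substr($0, 1, 1) == "O" { print name }   # attribute line: O means offline
    '
}

# Example, run on Ranch:
#   sls -2 | offline_files
```

The printed names can then be passed to the stage command before you attempt to copy the files.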

Remote ls (rls)

The rls command, where available, allows you to view your files on a remote system. It can be used just like a normal ls. Be sure to include the $ARCHIVE variable to give rls the correct path. For example:

lslogin2$ ./rls -la test*
-rw-r--r-- 1 username support 30720 Sep 5 08:50 (DUL) /archive/username/test.tar.bz2

Sinc Is Not Cp (sinc)

The sinc utility is not supported at this time. The transfer methods described above (ssh, scp, and bbcp) are recommended as replacements.