Introduction to iRODS at TACC
iRODS is a data grid/data management tool. It allows you to store data in a unified namespace using multiple storage resources, to replicate data so that copies exist on multiple systems, and to store checksums and arbitrary metadata with a file. The TACC iRODS configuration supports accessing iRODS either through the native iRODS tools, such as the UNIX "i-Commands", or through WebDAV. iRODS is configured to store data on a 20TB "cache" filesystem for relatively short-term storage, on the 750TB "Corral" Lustre file system, and on any of the three Ranch archive file systems. Each of these storage systems is referred to as a "resource" within iRODS. The resource names are documented in the following table:
| Resource Name | Storage System |
|---|---|
| cache | 20TB cache filesystem |
| ranch1 | Ranch /home1 archive |
| ranch2 | Ranch /home2 archive |
| ranch3 | Ranch /home3 archive |
| corral | 750TB Lustre file system |
Use of the Corral resource may be subject to allocation constraints. Unless you have an allocation on Corral, please use the Ranch archive file systems for long-term storage.
Setup for command-line usage
On Ranger and Lonestar:
The directory /opt/apps/irods contains an example configuration file, irodsEnv.
1. cp /opt/apps/irods/irodsEnv ~/.irods/.irodsEnv
2. Edit ~/.irods/.irodsEnv to include your username in the indicated locations
3. Add /opt/apps/irods/bin to your path so you can access the binaries, or use "module load irods"
4. Run the "iinit" command to initialize the system. It will ask for a password, which you only have to enter once per session
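The steps above can be sketched as a small shell function. This is an illustration rather than an official TACC script: the `run` helper, the `DRY_RUN` variable, and the `irods_setup` name are made up for this example, and the `mkdir -p` line assumes that ~/.irods may not exist yet. By default the sketch only prints the commands it would run.

```shell
#!/bin/sh
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

irods_setup() {
    # Assumption: ~/.irods may not exist yet, so create it before copying.
    run mkdir -p "$HOME/.irods"
    # Step 1: copy the example configuration file into place.
    run cp /opt/apps/irods/irodsEnv "$HOME/.irods/.irodsEnv"
    # Step 2 is manual: edit ~/.irods/.irodsEnv and fill in your username.
    # In a real session, do that edit before step 4.
    # Step 3: put the i-commands on the PATH for this shell
    # ("module load irods" is the alternative). This line always takes effect.
    PATH="/opt/apps/irods/bin:$PATH"
    export PATH
    # Step 4: authenticate; iinit prompts for your password.
    run iinit
}

irods_setup
```

Running the sketch as written prints three lines prefixed with "+", one for each command it would execute.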
Once you have configured the ~/.irods/.irodsEnv file, you can use the "i-commands" to access data in the system. The i-commands are generally the same as the standard Unix file management commands, but with an i prepended: ils, imkdir, icd, and so on. The "-R" switch is used to specify a target resource for commands that store data in the system. Other common switches include:
-v for verbose output
-r for recursive operation
-h for detailed information on usage of any given command
The "ils -l" command can be used to see all the copies of files in the system. If a file has been replicated from corral to ranch3, for example, the file will be listed twice. Each listing indicates the resource where the file or replica is stored, along with the replica number. Commands like irm or iget can use the replica number to target a specific copy of a file.
Storing data into and retrieving data from the iRODS system
The “iput” command is used to store data into the system:
iput /home/
If you used the default .irodsEnv file, this will store data in the largest Ranch file system, /home3.
Including the “-K” switch will cause a checksum to be generated when the data is stored. The user has the option of verifying this checksum when retrieving the data with the “iget” command.
The “iget” command is used to retrieve data from the system.
iget
Including the -K switch will trigger verification of the checksum the user may have generated when storing the data using the “iput” command.
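As a rough illustration of a checksummed round trip, the sketch below stores a file on a chosen resource with "-K" and later retrieves it with "-K" so the checksum is verified. The file name results.tar, the `run` helper, and the `DRY_RUN` variable are hypothetical and exist only for this example; by default the commands are printed rather than executed.

```shell
#!/bin/sh
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical file name used only for illustration.
LOCAL_FILE="results.tar"

# Store the file on the ranch3 resource and generate a checksum (-K).
run iput -K -R ranch3 "$LOCAL_FILE"

# Retrieve it under a new local name and verify the stored checksum (-K).
run iget -K "$LOCAL_FILE" "restored-$LOCAL_FILE"
```

Omitting "-R ranch3" would fall back to whatever default resource is set in your .irodsEnv file.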
Long-term storage of data from the cache
To replicate data stored in the cache file system into Ranch for long-term storage, use the "irepl" command as follows:
irepl -R ranch3
The command above will replicate the cache file to the ranch3 resource.
Deleting data from one or all resources
Use the “irm” command to delete data entirely or from a single resource. irm without options deletes all copies of a file. irm with the "-n #" switch deletes a specific replica. If you have stored data initially in the cache and then replicated it to Ranch, for example, replica 0 will be the copy stored on the cache, and the following command deletes only that copy, leaving the Ranch replica in place:
irm -n 0 <path-to-file>
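Replication and replica removal are often used together to move data from the cache into long-term storage. The sketch below is one possible way to combine them, not a prescribed procedure: the `archive_from_cache` function, the `run` helper, the `DRY_RUN` variable, and the file name data.tar are hypothetical, and it assumes replica 0 is the cache copy, as in the scenario described above. By default it only prints the commands.

```shell
#!/bin/sh
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# archive_from_cache <irods-path>
# Replicate a cached file to ranch3, list its replicas, then drop replica 0.
archive_from_cache() {
    obj="$1"
    # Stop here if replication fails, so the only copy is never deleted.
    run irepl -R ranch3 "$obj" || return 1
    # Show all replicas with their resource names and replica numbers.
    run ils -l "$obj"
    # Assumption: replica 0 is the copy on the cache resource.
    run irm -n 0 "$obj"
}

# Hypothetical iRODS path used only for illustration.
archive_from_cache /tacc/home/joeuser/myproject/data.tar
```

When running this for real, it is worth checking the "ils -l" output by hand before the irm step to confirm which replica number belongs to the cache.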
Synchronizing a local directory with iRODS
The irsync command can be used to synchronize a local directory with iRODS, similar to the rsync Unix command. It can be used to make an exact copy of a directory hierarchy on a local disk within iRODS, or retrieve an exact copy of a directory hierarchy already stored in iRODS. It may also be used to create an exact copy of a file or directory within iRODS. iRODS paths are identified with an i: prefix in the irsync command. For example, if you have created a directory within iRODS called /tacc/home/joeuser/myproject, and you wish to retrieve an exact copy of that directory on Ranger, run the command:
irsync -r i:/tacc/home/joeuser/myproject /path/to/joeusers/work/directory
After editing the files on Ranger, you can then synchronize the data back into iRODS using the command:
irsync -r /path/to/joeusers/work/directory i:/tacc/home/joeuser/myproject
If you are storing or retrieving data to Ranch with the -R ranch3 option, you should also use the -s switch. With -s, irsync compares file sizes rather than checksums to determine whether synchronization is necessary, which avoids retrieving all the files from tape to compute checksums. This will greatly improve the performance of synchronization with Ranch.
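A size-based synchronization with Ranch might look like the sketch below. It reuses the example paths from this section; the `sync_to_ranch` and `sync_from_irods` function names, the `run` helper, and the `DRY_RUN` variable are hypothetical. By default the commands are printed rather than executed.

```shell
#!/bin/sh
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# sync_to_ranch <local-dir> <i:irods-path>
# Push a local directory to ranch3, comparing by size (-s) instead of checksum.
sync_to_ranch() {
    run irsync -r -s -R ranch3 "$1" "$2"
}

# sync_from_irods <i:irods-path> <local-dir>
# Pull a directory back to local disk, again comparing by size.
sync_from_irods() {
    run irsync -r -s "$1" "$2"
}

sync_to_ranch /path/to/joeusers/work/directory i:/tacc/home/joeuser/myproject
sync_from_irods i:/tacc/home/joeuser/myproject /path/to/joeusers/work/directory
```

Size comparison is a trade-off: a file that changes without changing size would not be detected, so this is best suited to cases where avoiding tape retrieval matters more than exact content verification.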







