Storage Resources

Overview

ARC offers several different storage options for users’ data:

Home
  • Intent: Long-term storage of files
  • File System: Qumulo
  • Environment Variable: $HOME
  • Per User Maximum: 640 GB, 1 million files
  • Data Lifespan: Unlimited
  • Available On: Login and Compute Nodes

Group (Cascades, DragonsTooth, Huckleberry)
  • Intent: Long-term storage of shared, group files
  • File System: GPFS
  • Environment Variable: n/a
  • Per User Maximum: 10 TB, 5 million files per faculty researcher (expandable via investment)
  • Data Lifespan: Unlimited
  • Available On: Login and Compute Nodes

Project (TinkerCliffs, Infer)
  • Intent: Long-term storage of shared, group files
  • File System: GPFS (replaced a BeeGFS system)
  • Environment Variable: n/a
  • Per User Maximum: 25 TB, 5 million files per faculty researcher (expandable via investment)
  • Data Lifespan: Unlimited
  • Available On: Login and Compute Nodes

Work (Cascades, DragonsTooth, Huckleberry)
  • Intent: Fast I/O, temporary storage
  • File System: GPFS
  • Environment Variable: $WORK
  • Per User Maximum: 14 TB, 3 million files
  • Data Lifespan: 120 days
  • Available On: Login and Compute Nodes

Work (TinkerCliffs, Infer) (end of life: March 2022)
  • Intent: Personal storage; use Project or Fast Scratch instead
  • File System: (formerly a BeeGFS filesystem)
  • Environment Variable: $WORK
  • Per User Maximum: 1 TB, 1 million files
  • Data Lifespan: Unlimited
  • Available On: Login and Compute Nodes

Fast Scratch
  • Intent: Short-term, fast access to working files
  • File System: VAST
  • Environment Variable: n/a
  • Per User Maximum: No size limits enforced
  • Data Lifespan: 90 days
  • Available On: Login and Compute Nodes

Local Scratch (tmpdir)
  • Intent: Fast, temporary storage; auto-deleted when the job ends
  • File System: Local disk, usually spinning disk or SSD
  • Environment Variable: $TMPDIR
  • Per User Maximum: Size of node hard drive
  • Data Lifespan: Length of job
  • Available On: Compute Nodes

Local Scratch (tmpnvme)
  • Intent: Fast, temporary storage; auto-deleted when the job ends
  • File System: Local NVMe drive
  • Environment Variable: $TMPNVME
  • Per User Maximum: Size of node NVMe drive
  • Data Lifespan: Length of job
  • Available On: Compute Nodes

Memory (tmpfs)
  • Intent: Very fast I/O
  • File System: Memory (RAM)
  • Environment Variable: $TMPFS
  • Per User Maximum: Size of node memory allocated to job
  • Data Lifespan: Length of job
  • Available On: Compute Nodes

Archive
  • Intent: Long-term storage for infrequently-accessed files
  • File System: GPFS
  • Environment Variable: $ARCHIVE
  • Per User Maximum: -
  • Data Lifespan: Unlimited
  • Available On: Login Nodes

Global
  • Intent: Centralized repositories of large, commonly needed datasets and databases
  • File System: VAST
  • Environment Variable: n/a
  • Per User Maximum: -
  • Data Lifespan: -
  • Available On: Login and Compute Nodes (Tinkercliffs only)

Each is described in the sections that follow.

Home

Home provides long-term storage for system-specific data or files, such as installed programs or compiled executables. Home can be reached via the variable $HOME, so a user who wishes to navigate to their Home directory can simply type cd $HOME. Each user is provided a maximum of 640 GB in their Home directory (across all systems). When a user exceeds the soft limit, they are given a grace period, after which they can no longer add files to their Home directory until they are back below the soft limit. Home directories are also subject to a 690 GB hard limit; users' Home directories are not allowed to exceed this limit. Note that running jobs fail if they try to write to a Home directory after the soft-limit grace period has expired or once the hard limit is reached.
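
For a quick sense of how close you are to these limits, the following can be run from a login node (du is slower on directories holding many files; the quota command is covered under Checking Usage below):

# Total size of your Home directory (can be slow if there are many files)
du -sh $HOME

# ARC's quota command reports usage against the Home limits
quota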

Note

Avoid reading/writing data to/from HOME in a job or using it as a working directory. Stage files into a scratch location (for example, /fastscratch or Local Scratch) to keep unnecessary I/O off of the HOME filesystem and improve performance.
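
As a rough sketch of that staging pattern (the directory names below are hypothetical, and the per-user layout under /fastscratch is an assumption):

# Hypothetical example: copy inputs out of HOME before heavy I/O starts
mkdir -p /fastscratch/$USER/myrun
rsync -av $HOME/inputs/ /fastscratch/$USER/myrun/inputs/
cd /fastscratch/$USER/myrun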

Group and Project

Project (on TinkerCliffs and Infer) and Group (on Cascades, DragonsTooth, and Huckleberry) provide long-term storage for files shared among a research project or group, facilitating collaboration and data exchange within the group. Each Virginia Tech faculty member can request group storage up to the prescribed limit at no cost by requesting a storage allocation via ColdFront. Additional storage may be purchased through the investment computing or cost center programs.

Data ownership passes to the shared directory creator/owner

A project PI requests a shared storage directory and gives access to others. Users who are given access to the shared directory can add, modify, and delete files in the directory according to the mode (set of permissions) of the directory.

Modes get set and changed in many different ways, and this sometimes results in group members, or even the group owner, not having access to some files or subdirectories in their shared directory. The owner(s) of such files can fix this on their own with some chmod and/or chown commands. ARC encourages shared-directory owners to work with their group members to establish best practices and to ensure that files are properly culled, organized, and manageable by the group before a member leaves the group. ARC personnel can consult on the commands to use and best practices for implementing such a transfer.
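
As an illustrative sketch, using the same hypothetical project and directory names as the quota examples below, a file owner could hand the project group access to a subdirectory with:

# Give the project's group ownership of a subdirectory and its contents
chgrp -R arc.myproject /projects/myproject/mydata

# Grant the group read/write access, plus execute (traverse) on directories
chmod -R g+rwX /projects/myproject/mydata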

When a project owner removes a user from their shared directory via ColdFront, that user will no longer be able to access the directory to make such changes.

Quotas on Previous Project system

Note

This information is deprecated as of March 2022. Project storage has been migrated from the BeeGFS system to an IBM ESS storage system running GPFS, which computes quota usage directly from the contents of each directory.

The file system that provides Project and Work directories on TinkerCliffs and Infer enforces quotas based on the group ID (GID) associated with files. This means that:

  • Files in your Work directory can count against your Project quota if they have that project’s GID

  • Files in your Project directory can count against your Work quota if they have your personal GID

You can check your Project and Work quotas with the quota command. You can check the GID associated with your files with ll (the same as ls -l) and can change the group with chgrp (chgrp -R for recursive application to a directory). You can also locate files in a more automated fashion with find. As an example, here we find some files in /projects/myproject that are associated with the personal group mypid:

[mypid@tinkercliffs2 ~]$ find /projects/myproject/test -group mypid
/projects/myproject/test
/projects/myproject/test/datafile
/projects/myproject/test/test.txt
[mypid@tinkercliffs2 ~]$ ls -ld /projects/myproject/test/
drwxrwxr-x 2 mypid mypid 2 Oct  4 08:43 /projects/myproject/test/
[mypid@tinkercliffs2 ~]$ ls -lh /projects/myproject/test/
total 1.1G
-rw-rw-r-- 1 mypid mypid 1.0G Oct  4 08:43 datafile
-rw-rw-r-- 1 mypid mypid    5 Jun  8 10:51 test.txt

These files will count against mypid's Work quota. We change their group to the associated project group with chgrp -R:

[mypid@tinkercliffs2 ~]$ chgrp -R arc.myproject /projects/myproject/test
[mypid@tinkercliffs2 ~]$ ls -ld /projects/myproject/test/
drwxrwxr-x 2 mypid arc.myproject 2 Oct  4 08:43 /projects/myproject/test/
[mypid@tinkercliffs2 ~]$ ls -lh /projects/myproject/test/
total 1.1G
-rw-rw-r-- 1 mypid arc.myproject 1.0G Oct  4 08:43 datafile
-rw-rw-r-- 1 mypid arc.myproject    5 Jun  8 10:51 test.txt

The files will now count against the Project quota.

A more automated approach is to have find both locate the files and change their group:

[mypid@tinkercliffs2 ~]$ ls -lh /projects/myproject/test/
total 1.1G
-rw-rw-r-- 1 mypid mypid 1.0G Oct  4 08:43 datafile
-rw-rw-r-- 1 mypid mypid    5 Jun  8 10:51 test.txt
[mypid@tinkercliffs2 ~]$ find /projects/myproject/test -group mypid -exec chgrp arc.myproject {} +
[mypid@tinkercliffs2 ~]$ ls -lh /projects/myproject/test/
total 1.1G
-rw-rw-r-- 1 mypid arc.myproject 1.0G Oct  4 08:43 datafile
-rw-rw-r-- 1 mypid arc.myproject    5 Jun  8 10:51 test.txt

Work

Work provides users with fast, user-focused storage for use during simulations or other research computing applications. However, it encompasses two paradigms depending on the cluster where it is being used:

  • On TinkerCliffs and Infer, it provides 1 TB of user-focused storage that is not subject to a time limit. Note that this quota is enforced by the GID associated with files and not by directory, so files in Project storage can wind up being counted against your Work quota; see Quotas on Previous Project system above for details and fixes.

  • On Cascades, DragonsTooth, and Huckleberry, it provides up to 14 TB of space. However, ARC reserves the right to purge files older than 120 days from this file system. It is therefore aimed at temporary files, checkpoint files, and other scratch files that might be created during a run but are not needed long-term. Work for a given system can be reached via the variable $WORK, so a user who wishes to navigate to their Work directory can simply type cd $WORK.

Work on Tinkercliffs and Infer

Changes starting in March 2022

Note

The /work file system is being decommissioned on 3/28/2022. If you wish to preserve any data you have in /work/<pid>, you must migrate it to another location: /fastscratch for temporary storage of working files or /projects for longer-term, shared group storage.
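
A hedged sketch of such a migration, with <pid> left as a placeholder for your own directory name and the destination directories as examples only:

# Move shared or long-term data into a project directory (names are placeholders)
rsync -av /work/<pid>/important_results/ /projects/myproject/important_results/

# Stage temporary working files to fast scratch instead
rsync -av /work/<pid>/working_files/ /fastscratch/<pid>/working_files/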

Why is /work going away?

At the launch of Tinkercliffs in Fall 2020, two filesystems were made available for "scratch"-style storage: /work and /fastscratch. The latter, /fastscratch, was hosted on a system new to ARC at the time and was provisionally released without restrictions to provide some time to monitor and gauge real-world performance and capabilities. The former, /work, followed the pattern of other ARC systems over the past 10+ years by providing user-specific storage. It was intended to serve as per-user scratch space, a role now filled by /fastscratch for temporary working files and /projects for longer-term, shared storage.

Archive

Archive provides users with long-term storage for data that does not need to be frequently accessed, e.g., important or historical results. Archive is accessible from all of ARC's systems. Archive is not mounted on compute nodes, so running jobs cannot access files on it. Archive can be reached via the variable $ARCHIVE, so a user who wishes to navigate to their Archive directory can simply type cd $ARCHIVE.

Best Practices for archival storage

Because the ARCHIVE filesystem is backed by tape (a high-capacity but very high-latency medium), it is very inefficient and disruptive to do file operations (especially on lots of small files) on the archive filesystem itself. Archival systems are designed to move and replicate very large files; ideally users will tar all related files into single, large archive files. Procedures are below:

To place data in $ARCHIVE:

  1. create a tarball containing the files in your $HOME (or $WORK) directory

  2. copy the tarball to the $ARCHIVE filesystem (use rsync so the copy can be restarted if the transfer fails)
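
A minimal sketch of those two steps, using hypothetical directory and tarball names:

# 1. Bundle a results directory into a single tarball in $HOME
cd $HOME
tar -czf results_2021.tar.gz results_2021/

# 2. Copy the tarball to Archive; rsync lets you restart if the transfer is interrupted
rsync -av results_2021.tar.gz $ARCHIVE/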

To retrieve data from $ARCHIVE:

  1. copy the tarball back to your $HOME (or $WORK) directory (use rsync so the copy can be restarted if the transfer fails).

  2. untar the file on the login node in your $HOME (or $WORK) directory. Directories can be tarred up in parallel with, for example, gnu parallel (available via the parallel module). This line will create a tarball for each directory more than 180 days old:

find . -mindepth 1 -maxdepth 1 -type d -mtime +180 | parallel '[[ -e {}.tar.gz ]] || tar -czf {}.tar.gz {}'

The resulting tarballs can then be moved to Archive and directories can then be removed. (Directories can also be removed automatically by providing the --remove-files flag to tar, but this flag should of course be used with caution.)
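
A corresponding retrieval sketch, again with a hypothetical tarball name:

# Copy a tarball back from Archive and unpack it on a login node
rsync -av $ARCHIVE/results_2021.tar.gz $HOME/
cd $HOME
tar -xzf results_2021.tar.gz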

VAST - Fast Scratch

While the scratch storage options below are constrained to the duration of a job, the VAST storage system provides a temporary staging and working space with better performance characteristics than HOME or PROJECT. It is a shared resource and has limited capacity (200 TB). Individual use is not subject to enforced size limits, but files are subject to the 90-day lifespan noted in the table above.
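
A small sketch of typical use, assuming a per-user directory layout under /fastscratch (adjust to whatever your group actually uses):

# Create a personal working area on fast scratch (directory layout is an assumption)
mkdir -p /fastscratch/$USER/working

# List files older than 90 days, i.e. candidates for the purge policy
find /fastscratch/$USER -mtime +90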

Local Scratch

Running jobs are given a workspace on the local drives of each compute node allocated to the job. The path to this space is specified in the $TMPDIR environment variable. This provides a higher-performing option for I/O, which is a bottleneck for some tasks that involve handling a large volume of data or a large number of file operations.

Note

Any files in local scratch are removed at the end of a job, so any results or files to be kept after the job ends must be copied to another location (e.g., /fastscratch, Home, or Project) as part of the job.
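
A hedged sketch of that pattern as a Slurm batch script (the application, file names, and resource requests are hypothetical, and account/partition flags are omitted; the point is staging into $TMPDIR and copying results back before the job exits):

#!/bin/bash
#SBATCH --job-name=tmpdir_example
#SBATCH --nodes=1
#SBATCH --time=1:00:00

# Stage inputs into the node-local scratch space provided for this job
cp $HOME/inputs/data.in $TMPDIR/

# Run a (hypothetical) application against the local copy
cd $TMPDIR
./my_app data.in > data.out

# Copy results back before the job ends; $TMPDIR is wiped afterwards
cp data.out $HOME/results/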

Local Drives

Running jobs are given a workspace on the local drives on each compute node. The path to this space is specified in the $TMPDIR environment variable.

Solid State Drives (SSDs)

Solid state drives do not use rotational media (spinning disks/platters) but memory-like flash storage, which gives them better performance characteristics. When an SSD is available on the compute nodes allocated to a job, the environment variable $TMPSSD is set to a directory on that SSD accessible to the owner of the job.
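
Because $TMPSSD is only set when an SSD is present, a defensive sketch is to fall back to $TMPDIR when it is not (the input file is hypothetical):

# Use the SSD-backed directory if it exists, otherwise the default local scratch
SCRATCH_DIR=${TMPSSD:-$TMPDIR}
cp $HOME/inputs/data.in "$SCRATCH_DIR/"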

Memory as storage

Running jobs have access to an in-memory mount on compute nodes via the $TMPFS environment variable. This should provide very fast read/write speeds for jobs doing I/O to files that fit in memory (see the system documentation for the amount of memory per node on each system). Please note that these files are removed at the end of a job, so any results or files to be kept after the job ends must be copied to Work or Home.
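
Since $TMPFS draws on the memory allocated to the job, it can be worth checking the available space before writing large files there:

# Check how much space is available on the job's in-memory filesystem
df -h $TMPFS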

NVMe Drives

Same idea as Local Scratch, but on NVMe media which “has been designed to capitalize on the low latency and internal parallelism of solid-state storage devices.” Running jobs are given a workspace on the local NVMe drive on each compute node if it is so equipped. The path to this space is specified in the $TMPNVME environment variable. This provides another option for users who would prefer to do I/O to local disk (such as for some kinds of big data tasks). Please note that any files in local scratch are automatically removed at the end of a job, so any results or files to be kept after the job ends must be copied to Work or Home.

NVMe local scratch storage is available on the following nodes, with the listed capacities (a brief usage sketch follows the list):

  • Cascades

    • largemem_q nodes, 1.8TB

    • k80_q nodes, 1.8TB

  • Tinkercliffs

    • a100_normal_q nodes, 11.7TB

    • intel_q nodes, 3.2TB
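
As an illustration, inside a job running on one of these nodes (for example, a Tinkercliffs intel_q node), a large dataset could be staged onto the NVMe drive via $TMPNVME; the dataset name here is hypothetical:

# Stage a large dataset onto node-local NVMe scratch, then work from there
cp $HOME/big_dataset.tar $TMPNVME/
cd $TMPNVME
tar -xf big_dataset.tar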

Global

On Tinkercliffs, the /global/ directory has been set up to provide centralized access to some commonly used databases and datasets which are large and/or have many files. All users can access these files. Some example datasets are the ImageNet dataset and some biodatabases, such as those needed by AlphaFold or other genomics applications. If you know of a dataset you think we should add to this repository, please let us know by submitting an ARC helpdesk request.
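
To see what is currently available, simply list the directory from a Tinkercliffs login or compute node:

# Browse the shared dataset repository on Tinkercliffs
ls /global/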

Checking Usage

You can check your current storage usage (in addition to your compute allocation usage) with the quota command:

[mypid@tinkercliffs2 ~]$ quota
USER       FILESYS/SET                         DATA (GiB)   QUOTA (GiB) FILES      QUOTA      NOTE 
mypid      /home                               584.2        596         -          -           

           GPFS                                                                              
mypid      /projects/myproject1                109.3        931                                
mypid      /projects/myproject2                2648.4       25600