(system-changes-jan23)=
# ARC System Changes - January 2023

## Notes and Guidance for January 2023 cluster changes

### Changes to `/home` on Cascades and Dragonstooth

```{admonition} HOME decoupled
:class: warning
While `/home` has been "universal" across ARC clusters in recent history, the Cascades/Dragonstooth `/home` is being decoupled from the others starting January 17, 2023.
```

Before the January maintenance began, the `/home` filesystem was universal across the Tinkercliffs, Dragonstooth, Cascades, and Infer clusters because the same network-attached storage system was mounted on `/home` on all of them. During the maintenance outage, a larger, faster replacement system was brought online to serve `/home` for [Tinkercliffs](tinkercliffs) and [Infer](infer). Data was synchronized between the old and new systems so that people using those clusters would not notice the transition.

Because they are being decommissioned, [Cascades](cascades) and [Dragonstooth](dragonstooth) remain on the previous `/home` system, where the old data is still intact, and are not connected to the new one. As a result, any file actions (added files, removed files, changes to files) performed on the Tinkercliffs/Infer `/home` will not be reflected in the Cascades/Dragonstooth `/home` directories. The converse is also true: any file actions performed on Cascades/Dragonstooth will not be reflected on Tinkercliffs/Infer.

### New policies for /fastscratch

```{admonition} Quota to be implemented on FASTSCRATCH
:class: note
Starting in January 2023, quota limits on the usage of `/fastscratch` will be put in place.
```

## All ARC systems down for maintenance

During a [maintenance outage](maintenance) in January 2023, the [Cascades](cascades) and [Dragonstooth](dragonstooth) clusters will be decommissioned. This means that jobs will no longer be accepted or started on their compute nodes.
`/work` and `/groups` on Cascades and Dragonstooth will also be decommissioned in the following weeks. The login nodes will remain accessible for a limited time (tentatively about 3 weeks, or until February 7th) so that people have the opportunity to retrieve data from those systems.

A new backend storage system will come online to host `/home` directories on all current mainline ARC systems (Tinkercliffs and Infer).

* At the time of transition, all data from the previous system will be replicated to the new system. No user action is needed.
* The previous storage system for HOME directories will continue to serve the Cascades and Dragonstooth login nodes while they remain online. Changes made on these nodes will not be reflected in `/home` on Tinkercliffs/Infer, or vice versa.

### Rationale

We would prefer to keep Cascades and Dragonstooth online until the new resource is available, but these older systems have rapidly become a liability:

- their compute nodes are failing (25% have been lost at this point) and are no longer supported by their manufacturers;
- their storage has recently endured a startling number of component failures and replacements;
- their provisioning, configuration management, and administration systems are defunct; and
- their software stacks are outdated (OS kernel, `glibc`, compilers, libraries, software deployment system).

To reduce the risk of catastrophic failure during operations and to focus engineering time and effort on new systems and services, these clusters are being taken offline.

### A new CPU system in the works

As of December 2022, ARC is in the final phases of purchasing a new CPU system to replace these clusters, but due to acquisition, engineering, and testing timelines, it is not likely to be available before Summer 2023.

## What is NOT directly affected?

The Tinkercliffs and Infer clusters and storage systems will resume normal operations in their current state after the end of the maintenance.
The `/projects`, `/fastscratch`, and `/home` storage on those systems will remain in operation.

## Actions you may need to take

People will have the 3-6 weeks after the mid-January maintenance to migrate any data they need to keep from those storage systems. A copy of all the `/groups` directories was made to `/projects` when Tinkercliffs was launched in fall 2020, and ARC will not make another bulk copy like this. ARC personnel are available to consult with PIs and labs as needed to help archive older data sets and merge those still in active use. Information about moving data is available on the [data transfers](data-transfer) page.

### Clean up data in /groups and /work

The hardware hosting the data currently stored in `/groups` or `/work` on Cascades is due to be decommissioned. If you need to preserve any files from those locations, please consider the following steps.

```{admonition} Please audit data before moving it
:class: note
Please avoid making bulk data transfers from `/groups` or `/work` until you have thoroughly reviewed the data. ARC systems are not intended for indefinite, permanent storage, and keeping old, unused files greatly increases the cost of the filesystems and can degrade their performance.
```

- Check whether your data is already in `/projects` on Tinkercliffs or in some other storage repository.
- Delete any old, duplicate, or unneeded data and files.
- Consolidate old results or data so that only the necessary elements are kept.
- Package old results or data into larger, more manageable files using the `tar` and/or `zip` utilities. An ideal file size for archival or transfer across networks is often between 1GB and 100GB; data sets smaller than 1GB or larger than 100GB are often more cumbersome to work with.

```{admonition} tar vs. zip
:class: note
`tar` packages a directory tree into a single file and does not compress it by default, while `zip` utilities both package and compress files. Test your data for compressibility before attempting to zip it; many modern data formats do not compress well.
```

## Get Help

ARC personnel can help assess and carry out these steps. The best way to request such help is via a [4Help ticket](https://4help.vt.edu) or by attending [ARC office hours](https://arc.vt.edu/office-hours).
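The following minimal Python sketch makes the packaging and compressibility checks from the data cleanup steps more concrete. It uses only the standard-library `tarfile` and `zlib` modules. It is an illustration under stated assumptions, not an ARC-provided tool. The function names (`compression_ratio`, `package_directory`, `size_advice`), the 64 MiB sample size, zlib level 6, and the use of decimal gigabytes for the 1GB-100GB range are all choices made for this example. Running the `tar` and `zip` command-line utilities directly is equally valid.

```python
import os
import tarfile
import zlib

# Rough bounds taken from the guidance on this page (about 1GB to 100GB per
# archive). Decimal gigabytes are an assumption made for this sketch.
GB = 10**9
MIN_ARCHIVE_BYTES = 1 * GB
MAX_ARCHIVE_BYTES = 100 * GB


def compression_ratio(path, sample_bytes=64 * 1024 * 1024):
    """Compress the first `sample_bytes` of a file with zlib and return
    compressed size / original size for that sample.

    A ratio close to (or above) 1.0 suggests the data does not compress well,
    so zipping it is unlikely to save much space.
    """
    with open(path, "rb") as f:
        sample = f.read(sample_bytes)
    if not sample:
        return 1.0  # empty file: nothing to gain from compression
    return len(zlib.compress(sample, 6)) / len(sample)


def package_directory(src_dir, archive_path, compress=False):
    """Bundle a directory tree into a single tar file and return its size in bytes.

    With compress=True the archive is gzip-compressed ("w:gz"). Write the
    archive outside src_dir so it does not try to include itself.
    """
    mode = "w:gz" if compress else "w"
    # Store entries under the directory's own name rather than its full path.
    top = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(archive_path, mode) as tar:
        tar.add(src_dir, arcname=top)  # adds the whole tree recursively
    return os.path.getsize(archive_path)


def size_advice(nbytes):
    """Compare an archive size against the suggested ~1GB-100GB range."""
    if nbytes < MIN_ARCHIVE_BYTES:
        return "smaller than ~1GB: consider bundling with related data"
    if nbytes > MAX_ARCHIVE_BYTES:
        return "larger than ~100GB: consider splitting into several archives"
    return "within the suggested ~1GB-100GB range"
```

As a hypothetical example, suppose you have a results directory named `run1`. If `compression_ratio("run1/output.dat")` returns a value near 1.0, compressing that file will save little space. `size_advice(package_directory("run1", "run1.tar"))` then reports whether the uncompressed archive falls within the suggested size range. Sampling only the start of each file keeps the check fast on large data, at the cost of missing files whose contents change character partway through.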