ARC System Changes: 2026-05
Open OnDemand: ood.arc.vt.edu system, firmware, and software updates
Hosted LLMs: llm.arc.vt.edu authentication framework change
Cluster changes (Tinkercliffs, Owl, Falcon):
update to Slurm version 25.11 (requires job queues to be dropped)
cluster manager software updates
fix bug in sizing of TMPFS
updates to correctly auto-drain nodes due to ECC errors on GPUs
(TBD, in testing) update NVIDIA drivers to 595.71.05 for CUDA 13.2 compatibility on all GPU nodes except V100s
Configuration changes:
tweak to reduce how often nodes are drained after jobs that hit a timeout
increase throughput of backfill scheduler cycle
introduce scheduling tweaks to reduce wait time for interactive jobs launched via OnDemand or interact
update job priority weights to better balance job age, job size, and fairshare and improve capacity for backfill scheduling
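For readers unfamiliar with how priority weights interact, the following is a minimal sketch of the weighted-sum idea behind Slurm's multifactor priority plugin, restricted to the three factors named above. The weight values, the `PriorityWeights` class, and the `job_priority` function are hypothetical and chosen only for illustration; they are not ARC's actual settings, and the real plugin includes further terms (partition, QOS, TRES, nice) that are omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriorityWeights:
    """Hypothetical weights; NOT the values configured on ARC clusters."""
    age: int
    job_size: int
    fairshare: int


def job_priority(weights: PriorityWeights,
                 age_factor: float,
                 job_size_factor: float,
                 fairshare_factor: float) -> int:
    """Weighted sum of normalized factors, each expected in [0.0, 1.0].

    Simplified: other terms of the real multifactor formula are left out.
    """
    factors = {
        "age_factor": age_factor,
        "job_size_factor": job_size_factor,
        "fairshare_factor": fairshare_factor,
    }
    for name, value in factors.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    total = (weights.age * age_factor
             + weights.job_size * job_size_factor
             + weights.fairshare * fairshare_factor)
    return int(round(total))


# Illustrative weights only (assumed, not taken from any ARC configuration).
weights = PriorityWeights(age=1000, job_size=500, fairshare=2000)

# A job that has waited half of the maximum age, is small, and belongs to
# an account with a good fairshare standing.
example_priority = job_priority(weights, 0.5, 0.25, 0.75)
print(example_priority)  # 2125
```

Raising one weight relative to the others makes that factor dominate the ordering of pending jobs, which is why rebalancing the weights changes how long jobs of different ages and sizes wait.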
Storage systems:
updates to the VAST (/scratch, /apps), Qumulo (/home), and ESS (/projects) storage systems