(scratch)=
# Performance Comparison of Scratch vs. Various ARC Filesystems

## Test Results Summary

The table below shows some informal test timings of file actions performed on a relatively small sample dataset which contains a very large number of files. This type of dataset can be a major challenge for many filesystems because a large portion of the time needed to process it is spent in overhead operations for the operating system, networking, storage protocol, and storage subsystems.

When the filesystem is attached via a network (as are `/home` and `/projects` on ARC systems), there is an extra layer of overhead for the network communications and storage protocols. While ARC systems are interconnected with some of the fastest and lowest-latency networks available, the aggregate impact of that latency when performing on the order of 10^5 operations and beyond can be very noticeable.

## Sample fileset properties

|format|size|number of files|mean file size (bytes)|stdev (bytes)|min (bytes)|median (bytes)|max|
|-|-|-|-|-|-|-|-|
|tar|9244907520 bytes (8.6GiB)|1290284|7165|26623|21|1785|1.2MiB|

## Table of results

All values are wall-clock ("real") times in seconds, taken from the transcripts below.

| target filesystem | copy from HOME (s) | untar (s)  | find (s)  | delete (s)  |
|-------------------|--------------------|------------|-----------|-------------|
| HOME              | n/a                | 6365.208   | 276.925   | 3014.559    |
| k80_q node NVMe   | 11.487             | 42.125     | 2.688     | -           |
| A100 node NVMe    | 17.486             | 25.424     | 1.653     | 32.130      |
| PROJECTS          | 9.352              | 2520       | 664.77    |             |
| /fastscratch      | 25.385             | 5906.447   | 89.391    | 2821.392    |

## Lessons to infer from these results

### Data needs to be close to compute

"Data locality" is a widely used mantra for compute performance, and these tests provide a nearly real-world example: untarring the same archive took 25-42 seconds on node-local NVMe drives but 42-106 minutes on the networked filesystems.

### Keep many-small-files datasets tarred on networked file systems like `/home` and `/projects`

A single large file moves over the network in seconds, while unpacking it there incurs per-file metadata and protocol overhead roughly a million times over. A sketch of a workflow that keeps the archive tarred until it reaches local storage follows this list of lessons.

### Transferring data makes it more likely to be in a nearby cache

A freshly copied file passes through the node's local caches, so the first reads after a transfer are often served locally rather than from the remote filesystem.

### NVMe built-in parallelism can be a huge advantage

NVMe drives are designed around many deep hardware queues and handle highly concurrent I/O well; see the parallel-delete sketch at the end of this page.
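The lessons above suggest a staging pattern: keep the dataset as one tar file on `/home` or `/projects`, copy it to node-local NVMe at the start of a job, work there, and pack results back into a single file before copying them off the node. Below is a minimal sketch of that pattern, assuming a node that provides the `$TMPNVME` scratch directory (as in the Cascades transcript below); the `results/` directory and the `results.tar` output name are hypothetical placeholders.

```
# Stage a many-small-files dataset onto node-local NVMe and work there.
cd $TMPNVME

# One large sequential copy over the network: fast.
cp ~/fstest/mil.tar .

# Unpack locally: the per-file overhead stays on the NVMe drive.
tar -xf mil.tar

# ... run the actual computation against the unpacked files here ...

# Pack outputs (hypothetical results/ directory) into a single file
# and move it back to networked storage with one sequential copy.
tar -cf results.tar results/
cp results.tar ~/fstest/
```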
## Cascades K80 node with NVMe test results

```
# get a job on a k80_q node
[brownm12@calogin2 ~]$ salloc --nodes=1 --exclusive --partition=k80_q --account=arctest
salloc: Granted job allocation 926141
salloc: Waiting for resource configuration
salloc: Nodes ca005 are ready for job
[brownm12@ca005 ~]$ cd $TMPNVME
[brownm12@ca005 926141]$ ll
total 0

# copy from /home to TMPNVME
[brownm12@ca005 926141]$ time cp ~/fstest/mil.tar .

real    0m11.487s
user    0m0.010s
sys     0m8.308s

# untar from TMPNVME -> TMPNVME
[brownm12@ca005 926141]$ time tar -xf mil.tar

real    0m42.125s
user    0m4.399s
sys     0m37.456s

# Count the files extracted from the tar
[brownm12@ca005 926141]$ time find ./10* | wc -l
1290284

real    0m2.688s
user    0m1.009s
sys     0m1.808s
```

## Cascades login node working in `$HOME`

```
# Untar in /home -> /home
[brownm12@calogin2 fstest]$ time tar -xf mil.tar

real    106m5.208s
user    0m21.187s
sys     4m9.755s

# Count the files extracted from the tar
[brownm12@calogin2 fstest]$ time find ./10* | wc -l
1290284

real    4m36.925s
user    0m3.257s
sys     0m20.711s

# rm on /home
[brownm12@calogin2 fstest]$ time rm -rf 10*

real    50m14.559s
user    0m6.426s
sys     1m38.699s
```

## Tinkercliffs A100 node with NVMe drive tests

```
# Copy from $HOME to node-local NVMe
[brownm12@tc-gpu001 tmp]$ time cp ~/fstest/mil.tar .

real    0m17.486s
user    0m0.002s
sys     0m5.363s

# Untar NVMe -> NVMe
[brownm12@tc-gpu001 tmp]$ time tar -xf mil.tar

real    0m25.424s
user    0m2.717s
sys     0m22.601s

# Count the files extracted from the tar
[brownm12@tc-gpu001 tmp]$ time find ./10* | wc -l
1290284

real    0m1.653s
user    0m0.647s
sys     0m1.074s

# Delete the extracted files from NVMe
[brownm12@tc-gpu001 tmp]$ time rm -rf ./10*

real    0m32.130s
user    0m0.786s
sys     0m26.716s

# Re-tar to /dev/null (read-only pass over the files)
[brownm12@tc-gpu001 tmp]$ time tar -c 10* > /dev/null

real    0m6.420s
user    0m3.210s
sys     0m3.188s

# Re-tar to a file on NVMe
[brownm12@tc-gpu001 tmp]$ time tar -cf mil2.tar 10*

real    0m13.066s
user    0m3.787s
sys     0m9.230s
```

## Tinkercliffs login node testing against `/fastscratch`

```
# Copy from $HOME to /fastscratch
[brownm12@tinkercliffs2 brownm12]$ time cp $HOME/fstest/mil.tar .

real    0m25.385s
user    0m0.002s
sys     0m6.788s

# Untar /fastscratch -> /fastscratch
[brownm12@tinkercliffs2 brownm12]$ time tar -xf mil.tar

real    98m26.447s
user    0m4.996s
sys     1m23.815s

# Use find to count the files in the unpacked dataset
[brownm12@tinkercliffs2 brownm12]$ time find ./10* | wc -l
1290284

real    1m29.391s
user    0m0.827s
sys     0m6.329s

# Delete files from /fastscratch
[brownm12@tinkercliffs2 brownm12]$ time rm -rf ./10*

real    47m1.392s
user    0m1.077s
sys     1m4.614s
```
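One way to exploit the NVMe parallelism noted in the lessons above: the single-process `rm -rf` timings can often be improved on a local NVMe drive by fanning the work out across several processes. Below is a hedged sketch using standard GNU `find`/`xargs`; the `./10*` glob matches the test dataset's top-level directories, and the process count of 8 is an arbitrary starting point, not a tuned value.

```
# Delete the extracted tree with up to 8 parallel rm processes.
# -print0/-0 keeps unusual filenames safe; -n 64 batches arguments.
find ./10* -mindepth 1 -maxdepth 1 -print0 | xargs -0 -P 8 -n 64 rm -rf
# Remove the now-mostly-empty top-level directories.
rm -rf ./10*
```

On networked filesystems this tends to help far less, since the bottleneck there is the shared metadata service rather than the local drive.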