(anaconda)=
# Using Anaconda on ARC systems

## Use recent versions of Anaconda

ARC keeps older versions of software packages available on clusters, but we recommend using more recent packages where available. This is particularly true of Anaconda, because the base code continues to evolve in functionality and in its integration with package repositories. There are many cases where current/recent environments are impossible to create or update when using older versions of Anaconda. Use `module spider anaconda` to search our module system for the most recent Anaconda available on the system you're using.

## Do not run `conda init`

Running `conda init` is a convenience for managing Anaconda virtual environments on a single computer, but it does not produce portable results. The principal action of `conda init` is to add lines like these to the user's bash startup script `~/.bashrc`:

```
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/apps/easybuild/software/tinkercliffs-rome/Anaconda3/2020.11/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/apps/easybuild/software/tinkercliffs-rome/Anaconda3/2020.11/etc/profile.d/conda.sh" ]; then
        . "/apps/easybuild/software/tinkercliffs-rome/Anaconda3/2020.11/etc/profile.d/conda.sh"
    else
        export PATH="/apps/easybuild/software/tinkercliffs-rome/Anaconda3/2020.11/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
```

Notably, you can see that explicit references are made to paths which are specific to a particular node type on ARC systems, like `/apps/easybuild/software/tinkercliffs-rome/...`. Such a path exists only on one node type on Tinkercliffs and will fail on a different type of Tinkercliffs node or on any other cluster's nodes. In short, `conda init` produces non-portable results, so we recommend against using it.
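If you have run `conda init` in the past, you can undo it by deleting the marker-delimited block it wrote to `~/.bashrc`. A minimal cleanup sketch, assuming the block uses the standard `# >>> conda initialize >>>` / `# <<< conda initialize <<<` markers shown above:

```shell
# Delete the conda-managed block from ~/.bashrc, if the file exists.
# sed removes everything between the two marker comments, inclusive;
# the original file is saved as ~/.bashrc.bak in case you need it back.
if [ -f ~/.bashrc ]; then
    sed -i.bak '/# >>> conda initialize >>>/,/# <<< conda initialize <<</d' ~/.bashrc
fi
```

The change takes effect in new shells; log out and back in (or `source ~/.bashrc`) after editing.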
## Use `source activate` and do not use `conda activate`

`conda activate` requires the Anaconda initialization described above, and so is not designed to work on systems where a single home directory is shared between several different node types. Instead, use `source activate` to activate Anaconda virtual environments.

## Create a virtual environment specifically for the type of node where it will be used

The Tinkercliffs cluster has at least three different node types, and so does Infer. Each node type is equipped with a different CPU micro-architecture, a slightly different operating system and/or kernel version, and slightly different system configuration and packages, all tuned for the features of that particular node type. These system differences can make Anaconda virtual environments non-portable between node types. As a result, you should create and build a virtual environment on a node of the same type where you will use the environment.

### Example 1:

The Tinkercliffs login nodes are essentially identical to `normal_q` partition nodes. So if you wish to use Anaconda for jobs on `normal_q` nodes, you can build the environment on the Tinkercliffs login nodes OR on the `normal_q` nodes. But you should not use an environment which was built on another cluster (e.g. Cascades or Infer). Instead, use `conda list` to view and document the most important packages and versions in that environment, then build a new environment for the Tinkercliffs `normal_q` nodes matching those specifications.

### Example 2:

If you want to use Anaconda on Tinkercliffs `a100_normal_q` nodes, then you need to build the environment from a shell on those nodes.
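For Example 1, the document-and-rebuild workflow might look like the session sketched below. The environment path `$HOME/env/cas-science`, the spec file name, and the hostnames are hypothetical; substitute your own.

```
# On the cluster where the original environment lives, record its packages:
[jdoe2@calogin1 ~]$ source activate $HOME/env/cas-science
(/home/jdoe2/env/cas-science) [jdoe2@calogin1 ~]$ conda list --export > ~/cas-science-spec.txt

# Then, on a Tinkercliffs login or normal_q node, rebuild from that spec:
[jdoe2@tinkercliffs1 ~]$ module load Anaconda3/2020.11
[jdoe2@tinkercliffs1 ~]$ conda create -p $HOME/env/tcnq --file ~/cas-science-spec.txt
```

`conda list --export` records exact builds, which may not resolve on a different node type; if `conda create` fails, edit the spec file to pin only package names and versions and try again.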
The important commands for this are:

|command |purpose|
|-|-|
|`interact`|get an interactive command line shell on a compute node|
|`module spider`|search for the latest anaconda module|
|`module load`|load a module|
|`conda create -p $HOME/envname`|create a new anaconda environment at the provided path|
|`source activate $HOME/envname`|activate the newly created environment|
|`conda install ...`|install packages into the environment|

```{note}
`$HOME` "expands" in the shell to your home directory, e.g. `/home/jdoe2`. And `envname` from above should be a short but meaningful name for the environment. Since environments are particular to the node type, it is recommended to reference the node type in the name. For example, `tca100-science` or `tcnq` for Tinkercliffs `a100_normal_q` nodes or Tinkercliffs `normal_q` nodes respectively.
```

```
[jdoe2@tinkercliffs2 ~]$ interact --partition=a100_normal_q --nodes=1 --ntasks-per-node=4 --gres=gpu:1 --account=jdoeacct
srun: job 438605 queued and waiting for resources
srun: job 438605 has been allocated resources
[jdoe2@tc-gpu004 ~]$ module spider anaconda

---------------------------------------------------------------------------------------------------------------
  Anaconda3:
---------------------------------------------------------------------------------------------------------------
    Description:
      Built to complement the rich, open source Python community, the Anaconda platform provides an
      enterprise-ready data analytics platform that empowers companies to adopt a modern open data
      science analytics architecture.

     Versions:
        Anaconda3/2020.07
        Anaconda3/2020.11

---------------------------------------------------------------------------------------------------------------
  For detailed information about a specific "Anaconda3" package (including how to load the modules) use
  the module's full name. Note that names that have a trailing (E) are extensions provided by other modules.
  For example:

     $ module spider Anaconda3/2020.11
---------------------------------------------------------------------------------------------------------------

[jdoe2@tc-gpu004 ~]$ module load Anaconda3/2020.11
[jdoe2@tc-gpu004 ~]$ conda create -p ~/env/a100_env
Collecting package metadata (current_repodata.json): done
Solving environment: done

Please update conda by running

    $ conda update -n base conda

## Package Plan ##

  environment location: /home/jdoe2/env/a100_env

Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
[jdoe2@tc-gpu004 ~]$ source activate /home/jdoe2/env/a100_env/
(/home/jdoe2/env/a100_env) [jdoe2@tc-gpu004 ~]$ conda install python=3.9 pandas
Collecting package metadata (current_repodata.json): done
Solving environment: done

==> WARNING: A newer version of conda exists. <==
  current version: 4.10.3
  latest version: 4.12.0

Please update conda by running

    $ conda update -n base conda

## Package Plan ##

  environment location: /home/jdoe2/env/a100_env

  added / updated specs:
    - pandas
    - python=3.9
```

## GPU - CUDA compatibility

While `nvidia-smi` will display a CUDA version, this is just the base CUDA on the node and can be overridden by:

- loading a different CUDA module: `module spider cuda`
- activating an Anaconda environment which includes `cudatoolkit`: `conda list cudatoolkit`
- installing a conda package built with a different CUDA version: `conda list tensorflow` -> check the build string

### A100 GPUs require CUDA 11.0 or greater

### Check CUDA version in TensorFlow

```
import tensorflow as tf
sys_details = tf.sysconfig.get_build_info()
cuda_version = sys_details["cuda_version"]
print(cuda_version)
```

### Check cuDNN version in TensorFlow

```
import tensorflow as tf
sys_details = tf.sysconfig.get_build_info()
cudnn_version = sys_details["cudnn_version"]
print(cudnn_version)
```

### Check CUDA version in PyTorch

```
import torch
print(torch.version.cuda)
```