Using Anaconda on ARC systems

Use recent versions of Anaconda

ARC will keep older versions of software packages available on clusters, but we recommend using more recent packages where available. This is particularly true of Anaconda because the base code continues to evolve in functionality and integration with package repositories. There are many cases where current/recent environments are impossible to create or update when using older versions of Anaconda.

Use module spider anaconda to search our module system for the most recent Anaconda available on the system you’re using.

Do not run conda init

Running conda init is a convenience for managing Anaconda virtual environments on a single computer, but it does not produce portable results. The principal action of conda init is to add lines like the following to the user's bash startup script ~/.bashrc:

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/apps/easybuild/software/tinkercliffs-rome/Anaconda3/2020.11/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/apps/easybuild/software/tinkercliffs-rome/Anaconda3/2020.11/etc/profile.d/conda.sh" ]; then
        . "/apps/easybuild/software/tinkercliffs-rome/Anaconda3/2020.11/etc/profile.d/conda.sh"
    else
        export PATH="/apps/easybuild/software/tinkercliffs-rome/Anaconda3/2020.11/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<

Notably, you can see that explicit references are made to paths which are specific to a particular node type on ARC systems, like /apps/easybuild/software/tinkercliffs-rome/.... Such a path exists on only one node type on Tinkercliffs and will fail on a different type of Tinkercliffs node or on any other cluster's nodes. In short, conda init produces non-portable results, so we recommend against using it.
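If conda init has already modified your ~/.bashrc, the marked block can be removed with a single sed range expression that deletes everything between the two marker comments, inclusive. A sketch on a scratch file (the file name bashrc.demo and its contents are illustrative, not from a real system; test on a copy before editing your real ~/.bashrc):

```shell
# Create a scratch file standing in for ~/.bashrc (contents illustrative)
cat > bashrc.demo <<'EOF'
export PATH=$HOME/bin:$PATH
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
eval "$__conda_setup"
# <<< conda initialize <<<
alias ll='ls -l'
EOF

# Delete the whole conda init block, keeping a backup with a .bak suffix
sed -i.bak '/# >>> conda initialize >>>/,/# <<< conda initialize <<</d' bashrc.demo

cat bashrc.demo   # the conda block is gone; surrounding lines are untouched
```

The same sed expression applied to ~/.bashrc undoes what conda init added, without disturbing the rest of the file.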

Use source activate and do not use conda activate

Use of conda activate <envname> requires the Anaconda initialization described above, and so is not designed to work on systems where a single home directory is shared among several different node types. Instead, use source activate <envname> to activate Anaconda virtual environments.

Create a virtual environment specifically for the type of node where it will be used

The Tinkercliffs cluster has at least three different node types, as does Infer. Each node type has a different CPU micro-architecture and slightly different operating system, kernel version, system configuration, and installed packages, all tuned for that node type's particular hardware. These system differences can make Anaconda virtual environments non-portable between node types.

As a result, you should create and build a virtual environment on a node of the type where you will use the environment.

Example 1:

The Tinkercliffs login nodes are essentially identical to normal_q partition nodes. So if you wish to use Anaconda for jobs on normal_q nodes, you can build the environment on the Tinkercliffs login nodes OR on the normal_q nodes. But you should not use an environment which was built on another cluster (e.g. Cascades or Infer). Instead, use conda list to view and document the most important packages and versions in the environment, then build a new environment for the Tinkercliffs normal_q matching those specifications.

Example 2:

If you want to use Anaconda on Tinkercliffs a100_normal_q nodes, then you need to build the environment from a shell on those nodes.

The important commands for this are:

command                          purpose
-------------------------------  -------------------------------------------------------
interact                         get an interactive command line shell on a compute node
module spider                    search for the latest Anaconda module
module load                      load a module
conda create -p $HOME/envname    create a new Anaconda environment at the provided path
source activate $HOME/envname    activate the newly created environment
conda install ...                install packages into the environment

Note

$HOME "expands" in the shell to your home directory, e.g. /home/jdoe2. And envname from above should be a short but meaningful name for the environment. Since environments are particular to the node type, it is recommended to reference the node type in the name. For example, tca100-science or tcnq for Tinkercliffs a100_normal_q nodes or Tinkercliffs normal_q nodes, respectively.

[jdoe2@tinkercliffs2 ~]$ interact --partition=a100_normal_q --nodes=1 --ntasks-per-node=4 --gres=gpu:1 --account=jdoeacct
srun: job 438605 queued and waiting for resources
srun: job 438605 has been allocated resources
[jdoe2@tc-gpu004 ~]$ module spider anaconda
---------------------------------------------------------------------------------------------------------------
  Anaconda3:
---------------------------------------------------------------------------------------------------------------
    Description:
      Built to complement the rich, open source Python community, the Anaconda platform provides an enterprise-ready data analytics platform that empowers companies
      to adopt a modern open data science analytics architecture.

     Versions:
        Anaconda3/2020.07
        Anaconda3/2020.11

---------------------------------------------------------------------------------------------------------------
  For detailed information about a specific "Anaconda3" package (including how to load the modules) use the module's full name.
  Note that names that have a trailing (E) are extensions provided by other modules.
  For example:

     $ module spider Anaconda3/2020.11
---------------------------------------------------------------------------------------------------------------

[jdoe2@tc-gpu004 ~]$ module load Anaconda3/2020.11
[jdoe2@tc-gpu004 ~]$ conda create -p ~/env/a100_env
Collecting package metadata (current_repodata.json): done
Solving environment: done

Please update conda by running

    $ conda update -n base conda

## Package Plan ##

  environment location: /home/jdoe2/env/a100_env

Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done

[jdoe2@tc-gpu004 ~]$ source activate /home/jdoe2/env/a100_env/
(/home/jdoe2/env/a100_env) [jdoe2@tc-gpu004 ~]$ conda install python=3.9 pandas
Collecting package metadata (current_repodata.json): done
Solving environment: done

==> WARNING: A newer version of conda exists. <==
  current version: 4.10.3
  latest version: 4.12.0

Please update conda by running

    $ conda update -n base conda

## Package Plan ##

  environment location: /home/jdoe2/env/a100_env

  added / updated specs:
    - pandas
    - python=3.9
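The same module load / source activate pattern applies inside non-interactive batch jobs. A minimal job-script sketch reusing the environment built above (the partition, account name, environment path, and my_script.py are placeholders from the examples, not fixed values):

```shell
#!/bin/bash
#SBATCH --partition=a100_normal_q
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --gres=gpu:1
#SBATCH --account=jdoeacct        # replace with your own allocation

# Load the same Anaconda module that was used to build the environment
module load Anaconda3/2020.11

# Activate with 'source activate' (not 'conda activate'; see above)
source activate $HOME/env/a100_env

python my_script.py
```

Because the job runs on the same node type where the environment was built, the activated environment matches the node's architecture.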

GPU - CUDA compatibility

While nvidia-smi will display a CUDA version, this is just the base CUDA on the node and can be overridden by:

  • loading a different CUDA module (search with module spider cuda)

  • activating an Anaconda environment which includes cudatoolkit (check with conda list cudatoolkit)

  • installing a conda package built against a different CUDA (e.g. run conda list tensorflow and check the build string)

A100 GPUs require CUDA 11.0 or greater.

Check CUDA version in Tensorflow

import tensorflow as tf
sys_details = tf.sysconfig.get_build_info()
cuda_version = sys_details["cuda_version"]
print(cuda_version)

Check cuDNN version in TensorFlow

cudnn_version = sys_details["cudnn_version"]
print(cudnn_version)

Check CUDA version in PyTorch

import torch
print(torch.version.cuda)