Jack Atkinson

Science, Archery, Computing

Pondering Conda

5 minutes
February 5, 2025
computing, software, rse

I recently had call to use conda in a project. This was a first for me: I had always been reluctant to explore it, preferring to manage Python environments using pip and venv, and other dependencies via manual install or modules on an HPC system. Here I reflect on the experience and the basics that I learnt.


Motivation

In the past, a few people had occasionally asked for help building our software using conda. Since we were not conda users, and our main use case was on High Performance Computing (HPC) systems where conda is neither well suited nor widely used, we had never really bothered to explore it.

However, as part of the review in submitting the code to JOSS (the Journal of Open Source Software) one of the reviewers attempted to build using conda and experienced some issues. This, coupled with the fact that some newer colleagues were familiar with conda, finally prompted us to take a closer look into it.

Obtaining conda and navigating the conda licensing mire

The first piece of advice I got was on “how to get conda”. I was vaguely aware of various discussions following changes to licensing; I gather that recent changes mean that packages distributed by Anaconda, the company behind conda, require a paid licence for some commercial use. conda-forge, a community-maintained channel of packages that predates these changes, offers an alternative to fetching from Anaconda.

So, I was advised to use conda-forge for setting up conda and obtaining packages (conda-forge GitHub).

To set up conda environments I was advised to use miniforge. Instructions for installing and using this can be found on the miniforge GitHub.

Similarly I was advised to disable default channels (so as not to fetch from anaconda) and use the conda-forge channels wherever possible.
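
As a rough illustration of that advice, a user-level .condarc along the following lines restricts conda to conda-forge. This is a minimal sketch rather than the configuration we actually used, and miniforge already sets conda-forge as its default channel, so it may not be needed at all.

```yaml
# ~/.condarc (sketch, not our actual configuration)
# Listing channels explicitly replaces the implicit "defaults" channel,
# so packages are only fetched from conda-forge.
channels:
  - conda-forge
# Prefer packages from the highest-priority channel and do not mix sources.
channel_priority: strict
```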

Using conda

For reproducibility I found it is best to define environments through YAML files, and indeed that was our main objective: to provide a simple environment from which users could build and run our software.
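
For anyone who has not seen one, an environment file has roughly the following shape. This is a generic sketch, not the file from our project: the environment name and the package list are placeholders chosen for illustration.

```yaml
# environment.yml (illustrative sketch; name and packages are placeholders)
name: my-project        # hypothetical environment name
channels:
  - conda-forge
  - nodefaults          # do not fall back to Anaconda's default channels
dependencies:
  - python >=3.10       # assumed version bound, adjust as needed
  - cmake
  - pip
```

An environment can then be created from such a file with conda env create -f environment.yml and entered with conda activate followed by the environment name.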

One thing I learnt was that sometimes you will need to specify more packages than you think, especially if you are used to an environment modules system where there is good subdependency management. For example, I am used to just running module load cuda/12.6 on the HPC, but in conda I needed:

dependencies:
  - cuda-version 12.6  # Set version of cuda here
  - cuda-compiler >=12.6.3
  - cuda-libraries-dev >=12.6.3
  - cuda-nvtx-dev >=12.6.77

to get everything else (compilers, libraries, etc.) required.

Related to this, I found that conda package documentation can be confusing for newcomers. Often package docs provide a long list of files/subpackages, but not much explanation of which ones you need in which cases.

I also learnt that whilst conda is a good way to specify dependencies for a reproducible environment, it is not perfect. Sometimes conda will still fall back on system packages that may not be available on someone else’s machine. An example in our case was zlib, which is available on 99% of Unix systems. It was a missing subdependency of something we were fetching from conda, and this didn’t get picked up until a user tried installing on a super minimal system.
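
One way to guard against this kind of gap is to list the library explicitly in the environment file, so that conda provides it rather than relying on the host system. The fragment below is a sketch of that idea and not necessarily the exact fix we applied.

```yaml
dependencies:
  - zlib  # listed explicitly so a minimal system does not need its own copy
```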

Related to our specific application, PyTorch on conda is out of date! The homepage points you at the PyTorch conda channel, but elsewhere it is documented that this channel is no longer maintained. However, it is possible to get PyTorch from the conda-forge channel, though this is not officially supported by PyTorch.

Follow-up: The PyTorch homepage now seems to have been updated to no longer recommend any conda install of PyTorch, instead preferring pip. pip can be used from inside conda, and we made use of this for Mac users. Elsewhere there is a conda-forge version of PyTorch available, though it is not officially supported by PyTorch themselves.
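
Using pip from inside conda can be expressed directly in the environment file through a nested pip section. The following is a minimal sketch assuming an unpinned torch from PyPI; it is not the exact file from our project.

```yaml
dependencies:
  - python
  - pip            # pip itself is installed by conda
  - pip:
      - torch      # installed by pip from PyPI once the conda packages are in place
```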

Takeaways

Will I be using conda for everything going forwards?
Short answer: no.

I found that the benefits were not enough to outweigh the headaches it gave me, compared to the approach I currently use. There are also existing reasons for not wanting it: we sometimes want more minimal installs, and defining a different environment file for every application does not feel very sensible or maintainable.

However, I do see a benefit in projects including conda instructions/environments for their code as a simple way to get started. It is something of a halfway house towards containerisation, providing users with a sandbox in which to explore the code without immediately being bogged down in installation issues.

In future, if I want to have a tinker with a piece of code and a conda environment file is provided, I might use it to take a first look. This is why we implemented conda in our project, and I can now see a benefit.

Follow-up: Other reviewers and users have since come forward to say this made it easier for them to explore and review our software.

Acknowledgments

I would like to thank my colleagues Kacper, Matt, and Karl for their advice in setting up conda. I would also like to extend a huge thank you to Matthew Feickert, our JOSS reviewer, for his input as we implemented this, even if he would prefer pixi.