Eric J Ma's Website

Speeding up CI Pipelines with Micromamba: A LlamaBot Case Study

written by Eric J. Ma on 2023-11-26 | tags: continuous integration python micromamba llamabot conda yaml mambaforge caching


Nobody likes waiting around for continuous integration (CI) pipelines to finish, and I'm sure many of you know the frustration of slow build times. In my recent post, How to choose a (conda) distribution of Python, I touched upon Python distributions. Afterwards, a LinkedIn comment by Wade Rosko pointed me to Hugo Shi's post about speeding up a Saturn Cloud CI job with micromamba, and it got me thinking. They achieved a 40x speedup! I had to try this.

The Experiment

I decided to use LlamaBot's CI system as a test case. The original setup used the setup-miniconda action, installing the Mambaforge variant and using mamba as the solver, with the following YAML configuration:

- name: Setup miniconda
  if: matrix.environment-type == 'miniconda'
  uses: conda-incubator/setup-miniconda@v2
  with:
    auto-update-conda: true
    miniforge-variant: Mambaforge
    channels: conda-forge
    activate-environment: llamabot
    environment-file: environment.yml
    use-mamba: true
    python-version: ${{ matrix.python-version }}
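Both this step and its replacement read matrix.environment-type and matrix.python-version, which come from the job's strategy matrix elsewhere in the workflow. The post doesn't show LlamaBot's actual matrix. The fragment below is a sketch with hypothetical values, showing only where those two names would be defined:

```yaml
# Fragment of a job definition (not a complete workflow).
# Hypothetical values: LlamaBot's real matrix is not shown in this post.
strategy:
  matrix:
    python-version: ["3.10", "3.11"]
    environment-type: ["miniconda"]
```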

In the new approach, I switched to micromamba. The following YAML replaces the setup-miniconda step and keeps the same if: condition:

- uses: mamba-org/setup-micromamba@v1
  if: matrix.environment-type == 'miniconda'
  with:
    micromamba-version: '1.4.5-0'
    environment-file: environment.yml
    init-shell: bash
    cache-environment: true
    cache-environment-key: environment-${{ steps.date.outputs.date }}
    cache-downloads-key: downloads-${{ steps.date.outputs.date }}
    post-cleanup: 'all'

The rest of the YAML file remained unchanged.
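The cache keys above reference steps.date.outputs.date, so the job needs an earlier step with id: date that writes a date output. That step isn't shown in the post. The sketch below is one way to produce it, assuming a Linux runner and GITHUB_OUTPUT for step outputs. It also shows a follow-up test step. The setup-micromamba documentation recommends a login shell (bash -el {0}) in later steps when using init-shell: bash, so that the environment is activated. The pytest command is an assumed placeholder.

```yaml
# Hypothetical step defining steps.date.outputs.date (e.g. 2023-11-26).
# It must run before setup-micromamba so the cache keys can read it.
- name: Get current date
  id: date
  run: echo "date=$(date +'%Y-%m-%d')" >> "$GITHUB_OUTPUT"

# (the setup-micromamba step shown above goes here)

# With init-shell: bash, later steps use a login shell so the
# micromamba environment is activated before the command runs.
- name: Run tests
  shell: bash -el {0}
  run: pytest  # assumed test command
```

Because the key changes every calendar day, a cached environment is reused for at most one day before it is rebuilt.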

Benchmarking Results

I meticulously recorded the timings, and you can find the full record here.

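| Configuration               | Run 1  | Run 2  | Run 3  | Run 4  |
|-----------------------------|--------|--------|--------|--------|
| Old YAML (setup-miniconda)  | 2m 4s  | 3m 16s | 2m 3s  | 6m 34s |
| New YAML (setup-micromamba) | 1m 1s  | 1m 19s | 2m 9s  | N/A    |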
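Converting the table to seconds makes the comparison concrete. The arithmetic below is my own summary of the four old runs and three new runs, not a figure from the original record. The mean is written $\bar{t}$ and the median $\tilde{t}$:

```latex
\[
\begin{aligned}
\bar{t}_{\text{old}}   &= \frac{124 + 196 + 123 + 394}{4} = \frac{837}{4} \approx 209\,\text{s} \approx 3\text{m}\,29\text{s} \\
\bar{t}_{\text{new}}   &= \frac{61 + 79 + 129}{3} = \frac{269}{3} \approx 90\,\text{s} \approx 1\text{m}\,30\text{s} \\
\tilde{t}_{\text{old}} &= \frac{124 + 196}{2} = 160\,\text{s} = 2\text{m}\,40\text{s} \\
\tilde{t}_{\text{new}} &= 79\,\text{s} = 1\text{m}\,19\text{s}
\end{aligned}
\]
```

The 6m 34s outlier inflates the old mean. The medians are less sensitive to it and differ by 81 s.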

Analysis and Conclusion

With the new setup, the timings on my latest PRs feel more consistent. There's a noticeable reduction in build and test times, averaging around 1 minute. There's also no difference in the final environment, and all tests still pass.

The primary win comes from setup-micromamba's built-in, turnkey caching of the entire environment. This is a stark contrast to the more complicated caching approaches suggested for Mambaforge. Keying the cache on the date seems like a practical compromise. The cached environment is reused for at most a day, so new package releases still get picked up, which matters because I often test against bleeding-edge packages.

Wrapping Up

Switching LlamaBot's CI system to micromamba was worthwhile: it's a straightforward and effective way to cut CI times. If you're dealing with similar CI delays, consider giving micromamba a try. It may be just the fix you're looking for.


Cite this blog post:
@article{
    ericmjl-2023-speeding-study,
    author = {Eric J. Ma},
    title = {Speeding up CI Pipelines with Micromamba: A LlamaBot Case Study},
    year = {2023},
    month = {11},
    day = {26},
    howpublished = {\url{https://ericmjl.github.io}},
    journal = {Eric J. Ma's Blog},
    url = {https://ericmjl.github.io/blog/2023/11/26/speeding-up-ci-pipelines-with-micromamba-a-llamabot-case-study},
}