Remove cuda based images #903

Open
eitsupi opened this issue Jan 25, 2025 · 8 comments
Labels
CI pre-built images Related to pre-built images

Comments

@eitsupi
Member

eitsupi commented Jan 25, 2025

I am fed up with the number of questions about CUDA and Python setup and with the maintenance hassle. I strongly believe that users should pick the version of the CUDA image they want and install whichever version of Python they need on it with uv (and then use rig to install and use any version of R).

The situation has changed dramatically from a few years ago when there was no rig or uv, and I think the significance of the old kind of pre-built image is declining.
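
A minimal sketch of that workflow, for illustration only and not taken from the rocker-versioned2 repository. The CUDA tag, the Python version, and the choice of `rig add release` are placeholder assumptions; the uv and rig install steps follow the patterns shown in those projects' own documentation.

```dockerfile
# Placeholder tag: substitute whichever nvidia/cuda tag you actually need.
ARG CUDA_TAG=12.4.1-runtime-ubuntu22.04
FROM nvidia/cuda:${CUDA_TAG}

# Avoid interactive prompts from apt during the build.
ARG DEBIAN_FRONTEND=noninteractive

# curl and CA certificates are needed to download rig.
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates curl

# uv: copy the static binaries from the official uv image,
# then install a Python version (3.12 is an arbitrary example).
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /usr/local/bin/
RUN uv python install 3.12

# rig: unpack the release tarball into /usr/local,
# then install the current R release and clean up the apt cache.
RUN curl -Ls "https://github.com/r-lib/rig/releases/download/latest/rig-linux-$(arch)-latest.tar.gz" \
    | tar xz -C /usr/local \
    && rig add release \
    && rm -rf /var/lib/apt/lists/*
```

Pinning the uv image tag and the R version, rather than using `latest` and `release`, would make rebuilds more reproducible.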

@cboettig @noamross Thoughts?

@eitsupi eitsupi added CI pre-built images Related to pre-built images labels Jan 25, 2025
@eitsupi eitsupi pinned this issue Jan 25, 2025
@cboettig
Member

makes sense to me -- that's what I've been doing for my needs, e.g. building on top of the jupyterhub cuda images. (e.g. https://github.com/boettiger-lab/k8s/blob/main/images/Dockerfile.gpu#L1 is my current gpu setup)

@cboettig
Member

@eitsupi I'm thinking I'll drop a JupyterHub-based image into the old https://github.com/rocker-org/ml repo.

@eitsupi
Member Author

eitsupi commented Jan 29, 2025

Thanks, that might make sense.

However, when we look here, there are multiple images for ML use. Which one is agreed to be the base image?
https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html

Since it is not practical to cover all of these, I imagine it would probably be easiest to provide documentation and sample Dockerfiles explaining how to install R and RStudio on these images.

@cboettig
Member

cboettig commented Jan 29, 2025

Yes, great points, thanks for raising these issues! I'll document these things, and I don't intend to cover all those images. You've probably noticed that quite few of those images actually include the NVIDIA CUDA libraries.

I do intend to provide a pre-built image with my recommended configuration as well, which will use the CUDA image on the latest Ubuntu, as I indicated above. Jupyter's tensorflow image is only provided as cuda latest, while tagged versions exist only for their pytorch base image. In my experience, and in surveys I have seen from colleagues at computing centers, pytorch is far more widely used at this time. So while I agree with you that the full set of images discussed there looks intimidating, I think the choice I indicated above, quay.io/jupyter/pytorch-notebook:cuda12-ubuntu-24.04, makes sense for this purpose.

I completely agree that we want to document how to customize this. Given the recent introduction of JupyterHub's Fancy Profiles that can build directly from a Dockerfile, it is easier than ever to bring-your-own Dockerfile (which is a natural pattern for codespaces and gitlab use as well).

There are obviously many ways to set these things up, and just as Rocker has always done, the rocker/ml repo will show just one opinionated way to go about it rather than something comprehensive or overly flexible; experienced users will always be able to adapt. For example, I will go with Dirk's r2u approach, since for users writing their own Dockerfiles for a binder/jupyter experience, having it automatically solve apt dependencies is a significant win.

I know you've grown weary of all the python and cuda issues over here, so it sounds like addressing these in a different repo would be helpful too. For simplicity, the ml/ cuda image will not attempt the strong versioning promises we try to keep here.

@benz0li
Contributor

benz0li commented Feb 3, 2025

> The situation has changed dramatically from a few years ago when there was no rig or uv, and I think the significance of the old kind of pre-built image is declining.

As a user, pre-built images are easier to work with than using a base image + a virtual environment manager.

IMHO containers [like the ones here] + rig/uv/other [virtual environment manager] are not meant for each other.

> makes sense to me -- that's what I've been doing for my needs, e.g. building on top of the jupyterhub cuda images. (e.g. https://github.com/boettiger-lab/k8s/blob/main/images/Dockerfile.gpu#L1 is my current gpu setup)

You could also use b-data's/my CUDA-based JupyterLab R docker stack.

> [...] that can build directly from a Dockerfile, it is easier than ever to bring-your-own Dockerfile (which is a natural pattern for codespaces and gitlab use as well).

Most people are simply building on existing Rocker or Jupyter images.
ℹ Like almost all of the few GPU-accelerated [Jupyter-based] images available.

@cboettig
Member

cboettig commented Feb 3, 2025

Thanks @benz0li ! Your work is excellent as well. And yes, I totally get where you're coming from on containers vs virtual envs. I think that's definitely true for 'production containers', but perhaps a bit different for these 'dev containers' in which the goal is to support an end user customizing the environment further using patterns with which they are already familiar.

e.g. conda can be pretty cumbersome, especially when it comes to packages that require conda's 'activation' mechanism of shell shims and global env vars (e.g. as in rasterio and other gdal-binding conda packages).

However, as you already know, the official jupyter stacks are conda based, the python geospatial pangeo community is deeply conda based, and users know and expect conda. Hence the design I proposed above. This provides a concise Dockerfile that transparently extends the base Jupyter cuda image. Python installs are handled by conda. Meanwhile R installs are handled by Dirk's excellent r2u / bspm approach -- again based on user considerations. None of us think conda is a nice solution for installing R packages, but bspm handles the binary dependencies nicely (at container build time; at runtime I switch to binary installs from r-universe). In this way, a user can extend the environment with environment.yaml and install.r scripts without manually resolving lib deps.

As I noted above, this is certainly an opinionated setup, a bit different from existing setups but closely aligned with the official Jupyter images. I've tested this in a range of classroom and research settings over the past year or so alongside the other images discussed above. Moreover, I think this provides a good way forward to maintain some cuda options in a separate repo in the rocker project while avoiding the headaches @eitsupi noted at top. Big thanks to you both!
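
For readers unfamiliar with the pattern, the Python half of such a setup might look roughly like the sketch below. This is an illustrative assumption, not the actual rocker/ml Dockerfile: it assumes an `environment.yaml` sits next to the Dockerfile, and it leaves out the r2u / bspm configuration and the `install.r` step entirely.

```dockerfile
# Base image named earlier in this thread.
FROM quay.io/jupyter/pytorch-notebook:cuda12-ubuntu-24.04

# Python side: let conda resolve everything listed in environment.yaml.
# NB_UID, NB_GID, NB_USER, CONDA_DIR and fix-permissions are provided
# by the Jupyter Docker Stacks base images.
COPY --chown=${NB_UID}:${NB_GID} environment.yaml /tmp/environment.yaml
RUN conda env update --name base --file /tmp/environment.yaml \
    && conda clean --all --force-pkgs-dirs --yes \
    && fix-permissions "${CONDA_DIR}" \
    && fix-permissions "/home/${NB_USER}"

# R side (r2u / bspm setup and running install.r) is intentionally omitted here.
```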

@benz0li
Contributor

benz0li commented Feb 4, 2025

> However, as you already know, the official jupyter stacks are conda based

Yes. That was one reason I created my own docker stacks.

Other reasons: Rocker images' use of s6-overlay and Jupyter images' handling of the user's home directory [1].

> the python geospatial pangeo community is deeply conda based, and users know and expect conda.

People may install Conda / Mamba at user level.


Both the Version-stable Rocker images and Jupyter Docker Stacks are very popular and @eitsupi as well as @mathbunnyru do a great job improving and maintaining them.

Footnotes

  1. b-data's/my docker stacks allow for a persistent home directory that may be shared among all JupyterLab R/Python/Mojo/MAX/Julia docker images.

@benz0li
Contributor

benz0li commented Feb 4, 2025

Regarding dev containers: (CUDA-based) Data Science dev containers
ℹ Available for R, Python, Julia and MAX/Mojo

(I am trying to serve a larger community with a unified setup)


3 participants