ML image update #188
Comments
Correct, we are not using them much right now. However, several projects are spinning up now that will require ML Pangeo images, so it's a good time to think about this. IMO, before creating more images, we need a plan for maintaining these images sustainably going forward. Within a month or so we should have a dedicated, full-time Pangeo engineer at 2i2c, and that person should be able to help out with this. |
I don’t use these images. My $0.02: the many-images problem is a symptom of Docker not being a package manager. Dockerfiles are a linear sequence of commands, while packages form a dependency graph, so it will always be hard to map Docker images onto the packages people want. Maintaining multiple images is painful. Honestly, for scientific workflows with GB/TB-scale datasets, “light” containers don’t seem worth the trouble. If you can get away with it, I suggest one mega Docker image (you need to pin all package versions or it will constantly break), or leveraging a tool like repo2docker if you need multiple images. You can also, e.g., have packages installed when a user starts a container, like the dask image does. |
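To illustrate the "pin all package versions" point, here is a minimal Python sketch that flags dependency specs without an exact version. It is an illustration only, not tooling from this repo: the function name `loosely_pinned` and the example package versions are hypothetical, and treating only `name==version` as pinned is a deliberately strict assumption.

```python
def loosely_pinned(specs):
    """Return the specs that do not carry an exact `==` version.

    Assumption: only `name==version` (no wildcard) counts as pinned here.
    """
    loose = []
    for spec in specs:
        # Drop an optional `channel::` prefix before inspecting the version.
        requirement = spec.split("::", 1)[-1].strip()
        name, sep, version = requirement.partition("==")
        if not sep or not name.strip() or not version.strip() or "*" in version:
            loose.append(spec)
    return loose


# Hypothetical specs, for illustration only.
example_specs = ["xarray==0.17.0", "dask>=2021.3", "numpy", "zarr==2.6.*"]
print(loosely_pinned(example_specs))
```

A check like this could run in CI before an image build, so that an unpinned spec fails fast instead of silently changing the image.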
It looks like this repo already uses repo2docker, so maybe the tooling is good enough to support many images 🤷. Maybe pin the image references in the “FROM” statements as well to keep things more reproducible. |
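One way to act on the suggestion about pinning FROM statements is to check Dockerfiles for base images that are not referenced by digest. The Python sketch below is a rough illustration under stated assumptions: `unpinned_base_images` and the image names are hypothetical, "pinned" is taken to mean a `@sha256:` digest, and base images supplied through `ARG` variables are not resolved.

```python
import re

# Matches: FROM [--platform=...] <image> [AS <stage>]
FROM_RE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?",
    re.IGNORECASE,
)


def unpinned_base_images(dockerfile_text):
    """Return base image references that are not pinned by digest."""
    stages = set()   # names of earlier build stages (FROM ... AS name)
    unpinned = []
    for line in dockerfile_text.splitlines():
        match = FROM_RE.match(line)
        if match is None:
            continue
        image, stage = match.group(1), match.group(2)
        # A reference to an earlier build stage is not an external image.
        is_stage_ref = image.lower() in stages
        if stage:
            stages.add(stage.lower())
        if image.lower() == "scratch" or is_stage_ref:
            continue
        if "@sha256:" not in image:
            unpinned.append(image)
    return unpinned


# Hypothetical Dockerfile content, for illustration only.
example = "FROM example/base-image:latest AS build\nRUN true\nFROM build\n"
print(unpinned_base_images(example))
```

A tag such as `:latest` can point to different content over time, whereas a digest identifies one specific image, which is why the sketch treats only digests as pinned.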
@nbren12 thanks for the comments. This repo is a bit confusing to understand. Despite the tags, images are in theory reproducible, because conda-lock presolves the environment that gets added to the Docker image. So, for example, to recreate an image from the past:
In our attempts so far, though, GPU-enabled ML packages have been hard to cram into the same conda environment, which is why it's perhaps best to pick either tensorflow or pytorch. Preferably, someone actively using the image would be responsible for curating the packages. Not sure who that would be these days?
Couldn't agree more, although we've gotten a lot of mileage out of people using a common environment on Pangeo hubs. For long-term sustainability, though, someone will need to tackle allowing users to customize their environment: #148 |
Ah yes. I see the lock files now.
Interesting. What's the main barrier? Package versions not resolving? |
Yeah. For example, trying to add pytorch-gpu and jax in #179: https://github.com/pangeo-data/pangeo-docker-images/runs/1712185623?check_suite_focus=true The general guidance seems to be not to mix conda channels (ideally everything comes from conda-forge with the 'strict' channel priority setting). But to get the GPU-enabled packages we've had to relax that setting (https://github.com/pangeo-data/pangeo-docker-images/blob/master/ml-notebook/condarc.yml) and point to packages on specific channels: pangeo-docker-images/ml-notebook/environment.yml, lines 11 to 14 in b6e6b19 |
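To make the channel-mixing point concrete, here is a small Python sketch that groups dependency specs by the channel they are taken from, using conda's `channel::package` spec syntax. It does not reproduce the contents of the environment.yml referenced above: the function name `group_by_channel`, the channel name `some-channel`, and the example specs are all hypothetical.

```python
from collections import defaultdict


def group_by_channel(specs, default_channel="conda-forge"):
    """Group dependency specs by channel.

    Specs written as `channel::package` go under that channel; everything
    else is assumed to come from `default_channel`.
    """
    groups = defaultdict(list)
    for spec in specs:
        channel, sep, package = spec.partition("::")
        if sep:
            groups[channel.strip()].append(package.strip())
        else:
            groups[default_channel].append(spec.strip())
    return dict(groups)


# Hypothetical specs, for illustration only.
example_specs = ["xarray", "dask", "some-channel::pytorch-gpu"]
print(group_by_channel(example_specs))
```

A listing like this makes it easy to see how many packages depart from the single-channel guidance, and which ones would need attention if strict channel priority were restored.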
Good to know. This topic provokes so much in me, since I've spent a lot of time maintaining developer environments. I've been interested in a package manager called Nix, which is basically a more composable Docker. I hope it picks up steam in the next few years. |
For some context, I will share the amazing blog post Noah recently published on this topic! https://www.noahbrenowitz.com/post/2021-version-pinning/ It's a hard problem, but one we should keep plugging away at. We don't have a perfect solution yet, but we have made good progress! |
Love the post, @nbren12! This one is also worth checking out for tips on reducing image size: https://uwekorn.com/2021/03/01/deploying-conda-environments-in-docker-how-to-do-it-right.html |
I was talking to @scottyhq about using the ML image over here and having pytorch preloaded. I know @rabernat has asked about this before (#179).
We were wondering who is using the ML image, and what requirements they might have? @nbren12 @jhamman
Usage of the ML image seems to be low, based on the pulls here: https://github.com/pangeo-data/pangeo-docker-images.
Since pytorch and tensorflow are two of the big candidates (and are probably used independently most of the time), @scottyhq suggested having a pangeo-pytorch and a pangeo-tensorflow image.
Does anyone have other thoughts?