ML image update #188

dhruvbalwada · 2021-02-12T05:06:01Z

I was talking to @scottyhq about using the ML image over here and having pytorch preloaded. I know @rabernat has asked about this before (#179).

We were wondering who is using the ML image, and what requirements they might have? @nbren12 @jhamman
It seems like usage of the ML image is low, based on the pulls here: https://github.com/pangeo-data/pangeo-docker-images.

Since pytorch and tensorflow are two of the big candidates (and are probably used independently most of the time), @scottyhq suggested having a pangeo-pytorch and a pangeo-tensorflow.

Any other thoughts that people have?

rabernat · 2021-02-12T05:14:56Z

Correct, we are not using them much right now. However, there are several projects spinning up now that will require ML Pangeo images, so it's a good time to think about this.

IMO, before creating more images, we need to make a plan to address how to maintain these images sustainably going forward. Within a month or so we should have a dedicated, full-time Pangeo engineer at 2i2c, and that person should be able to help out with this.

nbren12 · 2021-02-12T07:54:05Z

I don’t use these images.

My $0.02: the many images problem is a symptom of docker not being a package manager. Dockerfiles are a linear sequence of commands, while packages form a dependency graph. It will always be hard to map docker images onto the packages people want.
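
To make the graph point concrete, here is a minimal sketch. The dependency graph is a hand-written illustration, not real package metadata, and `closure` is a hypothetical helper. It shows that two images differing by a single top-level package still each carry the whole shared subgraph.

```python
def closure(package, graph):
    """Return `package` plus everything it depends on, directly or indirectly."""
    seen = set()
    stack = [package]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            # Packages missing from the graph are treated as having no dependencies.
            stack.extend(graph.get(current, ()))
    return seen


# Illustrative dependency graph (an assumption, not real conda metadata).
GRAPH = {
    "pytorch": {"numpy", "cudatoolkit"},
    "tensorflow": {"numpy", "cudatoolkit"},
    "xarray": {"numpy", "pandas"},
    "pandas": {"numpy"},
}

# Each image is the union of the closures of its top-level packages.
pytorch_image = closure("pytorch", GRAPH) | closure("xarray", GRAPH)
tensorflow_image = closure("tensorflow", GRAPH) | closure("xarray", GRAPH)

# Everything except the one ML framework is duplicated across both images.
shared = pytorch_image & tensorflow_image
```

In this toy graph the two images differ only in the framework itself, which is roughly the situation a pangeo-pytorch / pangeo-tensorflow split would create.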

Maintaining multiple images is painful. Honestly, for scientific workflows with GB/TB scale datasets, “light” containers don’t seem worth the trouble. If you can get away with it, I suggest 1 mega docker image (you need to pin all package versions or it will constantly break) or leveraging a tool like repo2docker if you need multiple images. You can also, e.g., have packages installed when a user starts a container, like the dask image does.
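
As a rough sketch of what checking for pins could look like: the `unpinned` helper below is hypothetical, and it treats any `name=version` or `name==version` spec as pinned, which is a simplification of conda's real match-spec syntax.

```python
import re

# Matches "name=1.2.3" or "name==1.2.3", optionally followed by "=build".
# Simplification: conda's own spec grammar is richer than this.
_PINNED = re.compile(r"^[A-Za-z0-9_.\-]+={1,2}\d[\w.]*(=[\w.]+)?$")


def unpinned(specs):
    """Return the dependency specs that do not name one exact version."""
    result = []
    for spec in specs:
        # Drop an optional "channel::" prefix before checking the pin.
        name_and_version = spec.split("::")[-1].strip()
        if not _PINNED.match(name_and_version):
            result.append(spec)
    return result


# Example input; the version numbers are placeholders for illustration.
loose = unpinned([
    "numpy=1.19.5",
    "xarray",
    "tensorflow-gpu>=2",
    "conda-forge::dask==2021.2.0",
])
```

A check like this could run in CI and fail the build whenever `loose` is non-empty.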

nbren12 · 2021-02-12T08:01:42Z

It looks like this repo already uses repo2docker, so maybe the tooling is good enough to support many images 🤷. Maybe pin the “FROM image” statements as well to keep things more reproducible.

scottyhq · 2021-02-16T18:02:59Z

@nbren12 thanks for the comments. This repo is a bit confusing to understand. Despite the tags, images are in theory reproducible, thanks to using conda-lock to pre-solve the environment that is added to the docker image. So, for example, to recreate an image from the past:

git clone https://github.com/pangeo-data/pangeo-docker-images.git
cd pangeo-docker-images
git checkout 2020.09.30
docker build -t pangeo/base-image:master base-image
docker build -t pangeo/ml-notebook:2020.09.30 ml-notebook
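
The same steps can be parameterized by tag. This is a minimal sketch: `rebuild_commands` is a hypothetical helper, and it assumes the repository layout (a `base-image` directory plus one directory per image) is the same at the chosen tag.

```python
def rebuild_commands(tag, image="ml-notebook"):
    """Return the commands (as argument lists) to rebuild `image` at `tag`."""
    repo = "https://github.com/pangeo-data/pangeo-docker-images.git"
    workdir = "pangeo-docker-images"
    return [
        ["git", "clone", repo],
        # "git -C <dir>" runs the checkout inside the cloned repository.
        ["git", "-C", workdir, "checkout", tag],
        # The base image is tagged "master", as in the commands above.
        ["docker", "build", "-t", "pangeo/base-image:master", f"{workdir}/base-image"],
        ["docker", "build", "-t", f"pangeo/{image}:{tag}", f"{workdir}/{image}"],
    ]


commands = rebuild_commands("2020.09.30")
```

Each entry can be passed to `subprocess.run(cmd, check=True)` to actually execute it.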

GPU-enabled ML packages are hard to cram into the same conda environment though in our attempts so far, which is why perhaps it's best to pick either tensorflow or pytorch. Preferably we have someone actively using the image responsible for curating the packages. Not sure who that would be these days?

> It will always be hard to map docker images onto the packages people want.

Couldn't agree more, although we've gotten a lot of mileage out of people using a common environment on pangeo hubs. For long-term sustainability, though, someone will need to tackle allowing users to customize their environment: #148

nbren12 · 2021-02-16T22:24:04Z

Ah yes. I see the lock files now.

> GPU-enabled ML packages are hard to cram into the same conda environment though in our attempts so far

Interesting. What's the main barrier? Package versions not resolving?

scottyhq · 2021-02-16T23:38:25Z

> Interesting. What's the main barrier? Package versions not resolving?

Yeah. For example, see the attempt to add pytorch-gpu and jax in #179: https://github.com/pangeo-data/pangeo-docker-images/runs/1712185623?check_suite_focus=true

It seems like the general guidance is not to mix conda channels (ideally everything comes from conda-forge with the 'strict' channel priority setting). But to get the GPU-enabled packages we've had to relax that setting (https://github.com/pangeo-data/pangeo-docker-images/blob/master/ml-notebook/condarc.yml) and point to packages on specific channels:

pangeo-docker-images/ml-notebook/environment.yml, lines 11 to 14 in b6e6b19:

    #rapidsai-nightly/linux-64::cuspatial
    #rapidsai-nightly/linux-64::cudf
    - conda-forge/linux-64::cupy
    - pkgs/main/linux-64::tensorflow-gpu>=2
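
To see at a glance how many channels an environment mixes, the channel-qualified specs can be split apart. This is a minimal sketch, not conda's own parser: `split_spec`, `channels_used`, and the `KNOWN_SUBDIRS` list are hypothetical names introduced for illustration.

```python
# Platform subdirectories that may trail a channel name (not exhaustive).
KNOWN_SUBDIRS = {"linux-64", "linux-aarch64", "osx-64", "osx-arm64", "win-64", "noarch"}


def split_spec(spec):
    """Split 'channel/subdir::package' into (channel, subdir, package).

    Channel and subdir are None when the spec has no '::' prefix.
    """
    if "::" not in spec:
        return None, None, spec.strip()
    prefix, package = spec.split("::", 1)
    prefix = prefix.strip()
    channel, _, last = prefix.rpartition("/")
    if channel and last in KNOWN_SUBDIRS:
        return channel, last, package.strip()
    # No recognizable subdir: the whole prefix is the channel name.
    return prefix, None, package.strip()


def channels_used(specs):
    """Return the set of channels explicitly named in the specs."""
    channels = set()
    for spec in specs:
        channel = split_spec(spec)[0]
        if channel is not None:
            channels.add(channel)
    return channels


# The two active lines from the environment.yml excerpt above.
mixed = channels_used([
    "conda-forge/linux-64::cupy",
    "pkgs/main/linux-64::tensorflow-gpu>=2",
])
```

More than one entry in `mixed` means the environment relies on relaxed channel priority, which is the situation described above.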

nbren12 · 2021-02-17T02:36:47Z

Good to know. This topic provokes so much in me---I've spent a lot of time maintaining developer environments. I've been interested in a package manager called nix which is basically a more composable docker. I hope it picks up steam in the next few years.

rabernat · 2021-02-17T02:41:37Z

For some context, I will share the amazing blog post Noah recently published on this topic! https://www.noahbrenowitz.com/post/2021-version-pinning/

It's a hard problem, but one we should keep plugging away at. We don't have a perfect solution yet, but we have made good progress!

scottyhq · 2021-03-02T00:38:38Z

Love the post, @nbren12! This one is also worth checking out for tips on reducing image size: https://uwekorn.com/2021/03/01/deploying-conda-environments-in-docker-how-to-do-it-right.html

weiji14 · 2023-09-26T19:57:32Z

Closing this as we've added a pytorch-notebook image in #315. See also discussion at #457 on optimizing the ml-notebook (tensorflow) and pytorch-notebook images further for GPU-accelerated workflows.

scottyhq mentioned this issue Mar 19, 2021: add Jax to ML notebook #196 (Merged)

rabernat mentioned this issue Apr 26, 2022: Add pytorch to ML image (or create separate image) #312 (Closed)

weiji14 mentioned this issue Apr 27, 2022: Create pytorch-notebook docker image #315 (Merged)

weiji14 mentioned this issue Sep 13, 2022: Add cupy to ml notebooks #322 (Open)

weiji14 closed this as completed Sep 26, 2023
