Setting Up a Development Environment

Ryan Holbrook edited this page Apr 6, 2021 · 3 revisions

More information about Kaggle's Python images is in the repo.

In Docker, "images" are like classes and "containers" are like instances of a class. An image is a description of a runtime environment we want to create -- in our case, Ubuntu Linux, JupyterLab, and a bunch of Python packages. A container is a "live" version of an image. You could have lots of containers created from a single image if you wanted.

  1. Pull the latest image: docker pull "gcr.io/kaggle-images/python:latest" (CPU only) or docker pull "gcr.io/kaggle-gpu-images/python:latest" (with GPU support).
  • If you want the staging image, replace latest with staging.
  • The CPU image is about 20GB and the GPU image about 30GB.
  • You can see what images you currently have on your system with docker image ls.
  • To delete an image, run docker image rm <IMAGE ID>. After downloading a new version of an image, you will probably want to delete the old version so it doesn't fill up your hard drive.
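
The pull-then-clean-up routine above could be wrapped up roughly like this. It's only a sketch: update_kaggle_image is a made-up helper name, and it relies on the old copy of latest losing its tag after a pull (it shows up as <none> in docker image ls), which is what docker image prune cleans up. An old image that a container still uses will probably be left alone until that container is deleted.

```shell
# Sketch: pull a fresh image, then remove the copy it replaced.
# update_kaggle_image is a hypothetical helper name, not part of Docker.
update_kaggle_image() {
    # Default to the CPU image; pass the GPU image path as $1 to override.
    image="${1:-gcr.io/kaggle-images/python:latest}"

    docker pull "$image" || return 1

    # After a successful pull the previous "latest" loses its tag and is
    # listed as <none>; "docker image prune" deletes such dangling images.
    docker image prune --force

    # Show what is left on disk.
    docker image ls
}
```

Call it as update_kaggle_image for the CPU image, or update_kaggle_image "gcr.io/kaggle-gpu-images/python:latest" for the GPU one.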

Containers can be started and stopped. Stopping a container is like turning off a computer: processes terminate, but anything saved inside it will still be there when you start it again. If you delete a container, however, anything saved inside is gone forever. To keep from losing your work, you can share a folder from your "real" hard drive with the container. Anything you save in the shared folder is saved permanently: you can delete the container and it will still be there.

  2. Create a folder called kaggle (or whatever you like) in your home directory and clone a copy of learntools into it with git clone https://github.com/Kaggle/learntools.git. The kaggle folder will be your shared folder.

  3. Create a container: docker run -d -v /home/yourname/kaggle/:/home/jupyter/ -w /home/jupyter gcr.io/kaggle-images/python:latest

  • -d is the same as --detach. This gives you back control of your shell after starting the container.
  • -v is the same as --volume. This is how you share a folder. The syntax is -v <HOST PATH>:<CONTAINER PATH>. (The "host" is what Docker calls your "real" computer.)
  • -w is the same as --workdir. This sets the working directory to your shared folder.
  • Add --gpus all if you're using the GPU image.
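  4. In your web browser, go to 172.17.0.2:8080. You should see JupyterLab come up. You can create notebooks, run IPython consoles, and run Bash terminals from here. (This is the address Docker normally gives the first container on its default network. If nothing loads, docker inspect <container> will show the container's actual IP address.)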
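
Here is the container-creation command with its options spelled out in long form, as a rough sketch. The function name start_kaggle_container and its two arguments are invented for illustration; the flags and image paths are the ones described above.

```shell
# Sketch: create a container with the shared folder mounted.
# start_kaggle_container and its arguments are hypothetical names.
#   $1  host folder to share (default: $HOME/kaggle)
#   $2  "gpu" to use the GPU image, anything else for the CPU image
start_kaggle_container() {
    host_dir="${1:-$HOME/kaggle}"
    if [ "${2:-cpu}" = "gpu" ]; then
        # The GPU image needs --gpus all to see the host's GPUs.
        docker run --detach --gpus all \
            --volume "$host_dir:/home/jupyter" \
            --workdir /home/jupyter \
            gcr.io/kaggle-gpu-images/python:latest
    else
        docker run --detach \
            --volume "$host_dir:/home/jupyter" \
            --workdir /home/jupyter \
            gcr.io/kaggle-images/python:latest
    fi
}
```

For example, start_kaggle_container "$HOME/kaggle" creates a CPU container, and start_kaggle_container "$HOME/kaggle" gpu creates a GPU one. Docker prints the new container's ID, which you can use with the other docker commands on this page.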

You are logged in as root inside the container. This means you'll want to change permissions back to yourself for anything you create inside the container. (I haven't been successful at making non-root access work.) In your regular, non-container shell, you can run sudo chown -R $USER:$USER * in /home/yourname/kaggle/ to change everything at once.
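
A slightly more careful version of that ownership fix might look like the sketch below. The helper name reclaim_shared_folder is made up, and the default path assumes your shared folder is kaggle in your home directory. Naming the folder itself instead of using * also picks up hidden files at the top level, which * skips.

```shell
# Sketch: hand ownership of the shared folder back to your own user.
# reclaim_shared_folder is a hypothetical helper name.
reclaim_shared_folder() {
    shared_dir="${1:-$HOME/kaggle}"
    # id -un and id -gn print the current user and primary group names.
    sudo chown -R "$(id -un):$(id -gn)" "$shared_dir"
}
```

Run it from your regular, non-container shell, for example reclaim_shared_folder "$HOME/kaggle".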

  5. You'll need to install a couple of packages after creating a container the first time to get learntools to work. In a JupyterLab terminal, run:
  • pip install titlecase (a missing dependency)
  • pip uninstall learntools (since we're getting ready to install an editable version from our local repo)
  • cd /home/jupyter/learntools (the shared folder is mounted at /home/jupyter inside the container)
  • pip install --editable .
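
The same steps can be run in one go from a terminal inside the container. This is a sketch, not a tested setup script: install_learntools is a made-up name, and the default path assumes the shared folder is mounted at /home/jupyter as in the docker run command above, so adjust it if your clone lives somewhere else.

```shell
# Sketch: one-time learntools setup inside a fresh container.
# install_learntools is a hypothetical helper name.
install_learntools() {
    # Assumed location of the cloned repo inside the container.
    repo_dir="${1:-/home/jupyter/learntools}"

    # A dependency the image is missing.
    pip install titlecase || return 1

    # Remove the preinstalled copy; --yes skips the confirmation prompt.
    pip uninstall --yes learntools

    # Install the local clone in editable mode, so edits to the repo take
    # effect without reinstalling. The subshell keeps your current
    # directory unchanged.
    ( cd "$repo_dir" && pip install --editable . )
}
```

Call it with no arguments to use the default path, or pass the path to your clone as the first argument.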

And that's it! Just remember to delete containers after you're done using them. I usually just keep the same one around until I download an updated image, but if something ever breaks in a container, you can always just delete it and create a fresh one.

A couple more useful things:

  • docker container ls -a to list all containers
  • docker start <container> to start a stopped container (if you restarted your computer, say)
  • docker stop <container> to stop a running container
  • docker container rm <container> to delete an old container
  • docker exec -it <container> bash to open a Bash terminal in the container (or replace bash with any other program)
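
If a container breaks and you want to throw it away, the stop and delete commands can be chained as in this sketch. The name remove_container is invented here; the argument is whatever name or ID docker container ls -a shows. Anything in the shared folder stays on your hard drive.

```shell
# Sketch: stop a container and then delete it.
# remove_container is a hypothetical helper name.
remove_container() {
    if [ -z "$1" ]; then
        echo "usage: remove_container <container>" >&2
        return 1
    fi
    # Only delete the container if it stopped cleanly.
    docker stop "$1" && docker container rm "$1"
}
```

After that, create a fresh container with the docker run command from earlier on this page.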