# Setting Up a Development Environment
More information about Kaggle's Python images is in the [Kaggle/docker-python](https://github.com/Kaggle/docker-python) repo.
In Docker, "images" are like classes and "containers" are like instances of a class. An image is a description of a runtime environment we want to create -- in our case, Ubuntu Linux, JupyterLab, and a bunch of Python packages. A container is a "live" version of an image. You could have lots of containers created from a single image if you wanted.
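For example, here's a minimal illustration of that relationship (the container names `dev-a` and `dev-b` are arbitrary placeholders and aren't needed for the setup below):

```bash
# One image on disk...
docker image ls

# ...can back any number of containers: each `docker run` creates a
# fresh container from the same image (the names here are arbitrary).
docker run -d --name dev-a gcr.io/kaggle-images/python:latest
docker run -d --name dev-b gcr.io/kaggle-images/python:latest

# Two live instances of the one image:
docker container ls
```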
- Pull the latest image: `docker pull "gcr.io/kaggle-images/python:latest"` (CPU only) or `docker pull "gcr.io/kaggle-gpu-images/python:latest"` (with GPU support).
- If you want the staging image, replace `latest` with `staging`.
- The CPU image is about 20GB and the GPU image about 30GB.
- You can see what images you currently have on your system with `docker image ls`.
- To delete an image, do `docker image rm <IMAGE ID>`. After downloading a new version of an image, you will probably want to delete the old version to keep from filling up your hard drive (see the sketch after this list).
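Here's roughly what that cleanup looks like (a sketch only; `<IMAGE ID>` is whatever `docker image ls` reports for the superseded image):

```bash
# Pull the newest image; the one it replaces is left behind untagged.
docker pull "gcr.io/kaggle-images/python:latest"

# The superseded image now shows up with a <none> tag.
docker image ls

# Remove it by ID to reclaim the ~20-30GB it occupies
# (delete any containers still using it first).
docker image rm <IMAGE ID>

# Or sweep up all untagged images in one go:
docker image prune
```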
Containers can be started and stopped. Stopping a container is like turning off a computer: processes terminate, but anything saved inside it will still be there when you start it again. If you delete a container, however, anything saved inside is gone forever. To keep from losing your work, you can share a folder from your "real" hard-drive with the container. Anything you save in the shared folder is saved permanently: you can delete the container and it will still be there.
- Create a folder called `kaggle` (or whatever) in your home directory and clone a copy of learntools into it: `git clone https://github.com/Kaggle/learntools.git`. This will be your shared folder.
- Create a container (the full command sequence is sketched after this list): `docker run -d -v /home/yourname/kaggle/:/home/jupyter/ -w /home/jupyter gcr.io/kaggle-images/python:latest`
    - `-d` is the same as `--detach`. This gives you back control of your shell after starting the container.
    - `-v` is the same as `--volume`. This is how you share a folder. The syntax is `-v <HOST PATH>:<CONTAINER PATH>`. (The "host" is what Docker calls your "real" computer.)
    - `-w` is the same as `--workdir`. This sets the working directory to your shared folder.
    - Add `--gpus all` if you're using the GPU image.
- In your web browser, go to `172.17.0.2:8080`. You should see JupyterLab come up. You can create notebooks, run IPython consoles, and run Bash terminals from here.
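Putting those steps together, the host-side setup looks roughly like this (a sketch only: `yourname` is a placeholder for your username, and the final `docker inspect` line is just a convenience for finding the container's address if `172.17.0.2` doesn't work for you):

```bash
# On the host: create the shared folder and clone learntools into it.
mkdir -p /home/yourname/kaggle
cd /home/yourname/kaggle
git clone https://github.com/Kaggle/learntools.git

# Start a detached container with the folder mounted at /home/jupyter.
# (Add --gpus all to the flags if you pulled the GPU image.)
docker run -d \
    -v /home/yourname/kaggle/:/home/jupyter/ \
    -w /home/jupyter \
    gcr.io/kaggle-images/python:latest

# If 172.17.0.2:8080 doesn't respond, look up the container's address:
docker inspect -f '{{.NetworkSettings.IPAddress}}' <CONTAINER ID>
```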
You are logged in as `root` inside the container. This means you'll want to change permissions back to yourself for anything you create inside the container. (I haven't been successful at making non-root access work.) In your regular, non-container shell, you can run `sudo chown -R $USER:$USER *` in `/home/yourname/kaggle/` to change everything at once.
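For example, on the host (nothing here is specific to learntools):

```bash
# Files created from inside the container come out owned by root.
cd /home/yourname/kaggle
ls -l

# Hand everything back to your own user and group.
sudo chown -R $USER:$USER *
```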
- You'll need to install a couple of packages after creating a container the first time to get learntools to work. In JupyterLab, run the following (collected into a single block after this list):
    - `pip install titlecase` (a missing dependency)
    - `pip uninstall learntools` (since we're getting ready to install an editable version from our local repo)
    - `cd /home/jupyter/learntools` and then `pip install --editable .`
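Collected into one block you can paste into a JupyterLab terminal (a sketch; the path assumes the volume mount from the `docker run` command above, and `-y` just skips pip's confirmation prompt):

```bash
# Inside the container: install the missing dependency.
pip install titlecase

# Drop the preinstalled copy of learntools...
pip uninstall -y learntools

# ...and install your local clone in editable mode instead, so code
# changes made on the host (your shared folder) take effect immediately.
cd /home/jupyter/learntools
pip install --editable .
```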
And that's it! Just remember to delete containers after you're done using them. I usually just keep the same one around until I download an updated image, but if something ever breaks in a container, you can always just delete it and create a fresh one.
A couple more useful things:
- `docker container ls -a` to list all containers
- `docker start <container>` to start a stopped container (if you restarted your computer, say)
- `docker stop <container>` to stop a running container
- `docker container rm <container>` to delete an old container
- `docker exec -it <container> bash` to open a Bash terminal in the container (or replace `bash` with any other program); a typical cycle using these commands is sketched below
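A typical day-to-day cycle with those commands might look like this (`<container>` stands for the name or ID shown by `docker container ls -a`):

```bash
# Bring an existing container back up after a reboot.
docker start <container>

# Open a shell inside it when you need one.
docker exec -it <container> bash

# Stop it when you're done for the day.
docker stop <container>

# Retire it once you've pulled a newer image and made a fresh container.
docker container rm <container>
```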