Revamp Dockerfiles #17403

Merged
merged 55 commits into from
Feb 5, 2025


@blozano-tt blozano-tt commented Jan 30, 2025

Problem(s) description

  • We have a Dockerfile for every version of Ubuntu when we could have one.
  • The way a customer sets up a bare-metal machine is completely decoupled from CI, so their flow is never verified.
  • Our release Docker image is 6.5GB.
  • Our CI Docker container has things it would never need, like gdb and vim, and we pull it many times.
  • We are not making use of Docker's multi-stage build facilities.
  • Changing one apt dependency incurs the cost of rebuilding gdb.

What's changed

  • Introduce a single Dockerfile with base, ci-build, ci-test, dev, and release image definitions.
    • In a subsequent PR we can make use of ci-build for build machines, and ci-test for test machines.
  • Remove unnecessary files used to augment the Dockerfile definition (requirements)
  • I added ClangBuildAnalyzer
  • I added several CPM dependencies, so we can prototype building with system deps.
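
The stage layout listed above could look roughly like the sketch below. Only the stage names and the two lines `FROM ci-test AS dev` and `FROM dev AS release` come from this PR's diff. The `ubuntu:20.04` base, the way `ci-build` and `ci-test` chain from `base`, and the example packages in each stage are assumptions for illustration, not the merged file.

```dockerfile
# Hypothetical sketch of a single multi-stage Dockerfile.
# Assumption: ubuntu:20.04 is the starting image.
FROM ubuntu:20.04 AS base
ENV DEBIAN_FRONTEND=noninteractive
# Assumption: only what precompiled binaries need at runtime goes here.
RUN apt-get update \
    && apt-get install -y --no-install-recommends python3 \
    && rm -rf /var/lib/apt/lists/*

# Assumption: build machines extend base with build tooling.
FROM base AS ci-build
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential cmake \
    && rm -rf /var/lib/apt/lists/*

# Assumption: test machines extend ci-build with test-only tools.
FROM ci-build AS ci-test
RUN apt-get update \
    && apt-get install -y --no-install-recommends graphviz \
    && rm -rf /var/lib/apt/lists/*

# From the PR diff: dev builds on ci-test.
# Interactive tools such as gdb and vim stay out of the CI stages.
FROM ci-test AS dev
RUN apt-get update \
    && apt-get install -y --no-install-recommends gdb vim \
    && rm -rf /var/lib/apt/lists/*

# From the PR diff: release builds on dev.
FROM dev AS release
```

With a layout like this, `docker build --target ci-build .` stops at the `ci-build` stage, and changing the package list in `dev` leaves the cached layers of the earlier stages untouched.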

Possible in future work

It will be possible to make a much smaller release image in future work.
However, there is work to be done to make sure that the smoke test works with the release image.

$ docker images | grep release 
ghcr.io/tenstorrent/tt-metal/tt-metalium-ubuntu-20.04-amd64-release   dev-blozano-docker   adfcfcb6d262   18 minutes ago   2.5GB
ghcr.io/tenstorrent/tt-metal/tt-metalium-ubuntu-20.04-amd64-release   latest-rc            a41a7472f8c6   23 hours ago     7.65GB

Checklist

dockerfile/Dockerfile (outdated, resolved)
@blozano-tt blozano-tt added this to the Metal CI to Green milestone Feb 4, 2025

#############################################################

FROM ci-test AS dev
Contributor

Are we going to trim down the runtime dependencies later after the basis is merged?

Contributor Author

yes, I'll file a ticket for that.

#############################################################

FROM dev AS release

Contributor

@dimitri-tenstorrent dimitri-tenstorrent Feb 5, 2025

There might be an issue with the release images as you merge @afuller-TT's latest PR. See this line: https://github.com/tenstorrent/tt-metal/pull/17516/files#diff-6e6cd4043a79554fa4a311541ed0418aebae47985936fecc38c5a15ec0adb32bR1

I do not have a good idea of how to support that workflow with the stages outlined.

FROM $BASE_IMAGE as release?

Contributor Author

I don't think we need the build argument anymore.

The release image is now based on an earlier layer, instead of a pushed tag.
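
To make the difference concrete, here is a hedged sketch of the two approaches under discussion. The `BASE_IMAGE` argument name comes from the question above. The default value, the stand-in `dev` stage, and the stage named `release-from-tag` are hypothetical and exist only so that both variants fit in one file.

```dockerfile
# An ARG declared before the first FROM may be referenced in FROM lines.
# Hypothetical default; a real build would pass --build-arg BASE_IMAGE=<pushed tag>.
ARG BASE_IMAGE=ubuntu:20.04

# Stand-in for the dev stage defined earlier in the real Dockerfile.
FROM ubuntu:20.04 AS dev

# Variant A: start from a previously pushed image named by a build argument.
FROM ${BASE_IMAGE} AS release-from-tag

# Variant B: start from an earlier stage of the same file, no build argument needed.
FROM dev AS release
```

Variant B drops the build argument, but every build of `release` must first build the `dev` stage or find it in the local build cache.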

Contributor

We'll be losing out on the efficiency of re-using a previously built stage, unless we have dumb luck on the build machine.

I'm okay with this as a stepping stone. But certainly there's room for improvement. I don't think it's detrimental; just inefficient (compute and storage).

Contributor

Any idea where this was used? I only see it referenced in egg-info.

dockerfile/Dockerfile (outdated, resolved)
jq \
pandoc \
pkg-config \
python3-dev \
Contributor

Aren't some of these hard requirements for building even outside of CI?

Contributor Author

I put all build dependencies in ci-build layer.

runtime layer is just for running with precompiled binaries.

Contributor

I thought the build dependencies went in install_dependencies.sh? Since we're calling it on line 20 asking for the build dependencies.

So if one adds a new build dependency, does it go here in the Dockerfile? Or in the install_dependencies.sh script?

Contributor Author

@blozano-tt blozano-tt Feb 5, 2025

You raise a really good point.
My thinking was install_dependencies.sh should manage all build dependencies, and this section should be anything we need in CI for our workflows ... for instance, we do some sudo rm in workflows, so that should go here. We need jq for some parsing? add it here.... clang-tidy ... add it here ...
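
A minimal sketch of that split, under stated assumptions: the `ubuntu:20.04` base, the script's location in the build context, the `/tmp` destination, and the flag-free invocation are all hypothetical, since the real path and arguments of `install_dependencies.sh` are not shown in this thread. The CI-only packages are the ones named in the comment above.

```dockerfile
# Assumption: ubuntu:20.04 base and a script at the root of the build context.
FROM ubuntu:20.04 AS base
ENV DEBIAN_FRONTEND=noninteractive

# Build dependencies are managed by the script, not listed in the Dockerfile.
# Hypothetical path and invocation; the real script may need flags.
COPY install_dependencies.sh /tmp/install_dependencies.sh
RUN /bin/bash /tmp/install_dependencies.sh

FROM base AS ci-build
# Tools that only CI workflows use stay in the Dockerfile.
RUN apt-get update \
    && apt-get install -y --no-install-recommends jq sudo clang-tidy \
    && rm -rf /var/lib/apt/lists/*
```

Under this split, a new build dependency would go in the script, and a tool needed only by a workflow step would go in the `RUN` line of the CI stage.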

Contributor Author

Cargo ... is only needed for tt-train I think ... so I have mixed emotions about it

Contributor Author

graphviz is only needed to run some pytests ...

Contributor Author

I am not ready to put these in install dependencies as they aren't used as of yet.

libfmt-dev
libyaml-cpp-dev
pybind11-dev
nlohmann-json3-dev
libgtest-dev
libboost-all-dev \

Contributor Author

I moved git, pkg-config, and python3-dev to install_dependencies.sh

Contributor

@afuller-TT afuller-TT left a comment

I like it ❤️
I'll review again after latest main is merged in.

dockerfile/Dockerfile (outdated, resolved)
Comment on lines +55 to +56
libtbb-dev \
libcapstone-dev \
Contributor Author

these two are TRACY "requirements" ... a customer would never build with tracy? I'm thinking keep them here.
@mo-tenstorrent

Contributor

Customers definitely build with tracy. This is a tool for anyone developing on our cards. I would keep them in any docker release we have to support profiler builds.

Contributor

@afuller-TT afuller-TT left a comment

Let's roll!

I know this doesn't USE these slimmed-down Docker images, but we can iterate after this lands.

@blozano-tt blozano-tt merged commit 7f0f16f into main Feb 5, 2025
11 checks passed
@blozano-tt blozano-tt deleted the blozano-docker branch February 5, 2025 20:41

4 participants