[ROCm] ci build and dockerfile changes #161

Open
wants to merge 1 commit into
base: rocm-main
3 changes: 3 additions & 0 deletions build/rocm/build_wheels/Dockerfile.manylinux_2_28_x86_64.rocm
@@ -4,6 +4,9 @@ ARG ROCM_VERSION=6.1.1
ARG ROCM_BUILD_JOB
ARG ROCM_BUILD_NUM

# Install system GCC and C++ libraries.
RUN yum install -y gcc-c++.x86_64
Collaborator

I know this is here to fix the problem where Clang can't find the standard C++ libraries, but this is definitely something we shouldn't be doing (at least long-term). GCC 13, which is the GCC version for manylinux_2_28, is already installed as part of the base image. If this pulls in, say, GCC 14 stuff, this may put us out of manylinux compliance.

I suspect there's another solution here that involves telling Clang to check standard C++ libraries wherever GCC 13 is keeping them.

@mrodden Any thoughts on where that might be?
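
One way to explore the suggestion above is to look for the GCC installation that already ships with the image and pass its prefix to Clang through the `--gcc-toolchain` driver option. The sketch below is only illustrative: the `/opt/rh/gcc-toolset-N/root/usr` layout is an assumption about how the base image stores its toolset, and both helper names are hypothetical rather than part of this PR.

```python
import re
from pathlib import Path


def find_gcc_toolset_roots(base="/opt/rh"):
    """Return sorted (version, prefix) pairs for each gcc-toolset-N under base.

    Assumption: toolsets live at <base>/gcc-toolset-<N>/root/usr.
    """
    found = []
    for path in Path(base).glob("gcc-toolset-*/root/usr"):
        # path.parent is ".../root"; path.parent.parent is ".../gcc-toolset-N".
        match = re.fullmatch(r"gcc-toolset-(\d+)", path.parent.parent.name)
        if match and path.is_dir():
            found.append((int(match.group(1)), str(path)))
    # Sort numerically so that, for example, 9 comes before 13.
    return sorted(found)


def clang_gcc_toolchain_flag(prefix):
    """Build the Clang driver flag that points it at a GCC install prefix."""
    return "--gcc-toolchain=%s" % prefix


if __name__ == "__main__":
    for version, prefix in find_gcc_toolset_roots():
        print(version, clang_gcc_toolchain_flag(prefix))
```

Running the script inside the build image would list whichever toolsets it finds, which may help answer where GCC keeps its C++ standard library without installing a second compiler.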


RUN --mount=type=cache,target=/var/cache/dnf \
    --mount=type=bind,source=build/rocm/tools/get_rocm.py,target=get_rocm.py \
    python3 get_rocm.py --rocm-version=$ROCM_VERSION --job-name=$ROCM_BUILD_JOB --build-num=$ROCM_BUILD_NUM
7 changes: 7 additions & 0 deletions build/rocm/ci_build
@@ -47,6 +47,13 @@ def dist_wheels(
    # create manylinux image with requested ROCm installed
    image = "jax-manylinux_2_28_x86_64_rocm%s" % rocm_version.replace(".", "")

    # Try removing the Docker image.
    try:
Collaborator

I don't think we should do this. I was talking to Matt, and he said that the original intent of this script was to run it in Jenkins, where the image has already been removed and Docker has been pruned. When you're not running in Jenkins (for example, on the GitHub Actions self-hosted runner), docker build should be smart enough to pick up the new image.
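
If removing the image is still wanted in some environments, one possible compromise is to make it opt-in rather than unconditional. The sketch below is a simplified illustration and not the script's actual code: `image_build_steps` and its `remove_first` switch are hypothetical names, and the `docker build` arguments are reduced to a tag and a build context.

```python
def image_build_steps(image, context=".", remove_first=False):
    """Return the docker commands (as argv lists) used to build `image`.

    remove_first is a hypothetical opt-in switch. By default the existing
    image is left alone and docker build decides which layers to rebuild.
    """
    steps = []
    if remove_first:
        # Only delete the old image when the caller explicitly asks for it.
        steps.append(["docker", "rmi", image])
    steps.append(["docker", "build", "--tag", image, context])
    return steps


if __name__ == "__main__":
    for argv in image_build_steps("jax-manylinux_2_28_x86_64_rocm611"):
        print(" ".join(argv))
```

Returning the commands instead of running them keeps the decision separate from the Docker calls, so the default path matches the behavior described in the comment above.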

        subprocess.run(["docker", "rmi", image], check=True)
        print(f"Image {image} removed successfully.")
    except subprocess.CalledProcessError as e:
        print(f"Failed to remove Docker image {image}: {e}")

    cmd = [
        "docker",
        "build",
3 changes: 3 additions & 0 deletions build/rocm/tools/get_rocm.py
@@ -75,6 +75,9 @@ def install_packages(self, package_specs):
if self.pkgbin == "apt":
env["DEBIAN_FRONTEND"] = "noninteractive"

# Update indexes.
subprocess.check_call(["apt-get", "update"])
Collaborator

Just for my education, what is this line for?


LOG.info("Running %r" % cmd)
subprocess.check_call(cmd, env=env)
