set LD_LIBRARY_PATH for fbgemm in validate_binaries.sh (pytorch#2696)
Summary:

# context
* addresses the following error when running the GitHub test
```
+++ conda run -n build_binary python -c 'import torch; import fbgemm_gpu; import torchrec'
+++ local cmd=run
+++ case "$cmd" in
+++ __conda_exe run -n build_binary python -c 'import torch; import fbgemm_gpu; import torchrec'
+++ /opt/conda/bin/conda run -n build_binary python -c 'import torch; import fbgemm_gpu; import torchrec'
ERROR:root:Could not load the library 'fbgemm_gpu_tbe_index_select.so': /lib64/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /opt/conda/envs/build_binary/lib/python3.10/site-packages/fbgemm_gpu/fbgemm_gpu_tbe_index_select.so)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/envs/build_binary/lib/python3.10/site-packages/fbgemm_gpu/__init__.py", line 62, in <module>
    _load_library(f"{library}.so")
  File "/opt/conda/envs/build_binary/lib/python3.10/site-packages/fbgemm_gpu/__init__.py", line 21, in _load_library
    raise error
  File "/opt/conda/envs/build_binary/lib/python3.10/site-packages/fbgemm_gpu/__init__.py", line 17, in _load_library
    main()
  File "/home/ec2-user/actions-runner/_work/torchrec/torchrec/test-infra/.github/scripts/run_with_env_secrets.py", line 98, in main
    run_cmd_or_die(f"docker exec -t {container_name} /exec")
  File "/home/ec2-user/actions-runner/_work/torchrec/torchrec/test-infra/.github/scripts/run_with_env_secrets.py", line 39, in run_cmd_or_die
    raise RuntimeError(f"Command {cmd} failed with exit code {exit_code}")
RuntimeError: Command docker exec -t d5cfe23625bf3b1538b808a1344090ae72ff3977990bc1f780c7a46435a384ec /exec failed with exit code 1
    torch.ops.load_library(os.path.join(os.path.dirname(__file__), filename))
  File "/opt/conda/envs/build_binary/lib/python3.10/site-packages/torch/_ops.py", line 1357, in load_library
    ctypes.CDLL(path)
  File "/opt/conda/envs/build_binary/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /opt/conda/envs/build_binary/lib/python3.10/site-packages/fbgemm_gpu/fbgemm_gpu_tbe_index_select.so)
```
* the same issue was previously fixed for another test by D67949409 ([pytorch#2671](pytorch#2671))
* this diff applies the same fix to the validate_binaries test.
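
The error above says that the system `/lib64/libstdc++.so.6` does not export the `GLIBCXX_3.4.29` version tag required by the fbgemm_gpu library. As a rough diagnostic sketch, one could check which copy of libstdc++ provides the tag. The `has_glibcxx` helper below is hypothetical and not part of this commit, and it assumes GNU grep is available:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): succeed if the given
# libstdc++ file contains the requested GLIBCXX version tag.
# Usage: has_glibcxx <path-to-libstdc++> <version, e.g. 3.4.29>
has_glibcxx() {
    local lib="$1" version="$2"
    # -a treats the binary as text, -o prints only the matched tags.
    # The second grep requires an exact, literal, whole-line match so that
    # "3.4.2" does not accidentally match "3.4.29".
    grep -ao 'GLIBCXX_[0-9.]*' "$lib" | grep -qxF "GLIBCXX_${version}"
}
```

For example, `has_glibcxx /lib64/libstdc++.so.6 3.4.29` would be expected to fail on the runner shown above, while the same check against `${CONDA_ENV}/lib/libstdc++.so.6` should succeed if the conda environment ships a newer libstdc++ (an assumption about the environment, not something the log confirms).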

# details
* previous failures
{F1974496108}

Differential Revision: D68511145
TroyGarden authored and facebook-github-bot committed Jan 22, 2025
1 parent 519f193 commit fbce6b2
Showing 2 changed files with 24 additions and 0 deletions.
18 changes: 18 additions & 0 deletions .github/scripts/validate_binaries.sh
@@ -49,6 +49,24 @@ elif [[ ${MATRIX_CHANNEL} = 'release' ]]; then
    export PYTORCH_URL="https://download.pytorch.org/whl/${CUDA_VERSION}"
fi


echo "CU_VERSION: ${CUDA_VERSION}"
echo "MATRIX_CHANNEL: ${MATRIX_CHANNEL}"
echo "CONDA_ENV: ${CONDA_ENV}"

if [[ $CUDA_VERSION = cu* ]]; then
    # Setting LD_LIBRARY_PATH fixes the runtime error with fbgemm_gpu not
    # being able to locate libnvrtc.so
    echo "[NOVA] Setting LD_LIBRARY_PATH ..."
    conda env config vars set -p ${CONDA_ENV} \
        LD_LIBRARY_PATH="/usr/local/lib:${CUDA_HOME}/lib64:${CONDA_ENV}/lib:${LD_LIBRARY_PATH}"
else
    echo "[NOVA] Setting LD_LIBRARY_PATH ..."
    conda env config vars set -p ${CONDA_ENV} \
        LD_LIBRARY_PATH="/usr/local/lib:${CONDA_ENV}/lib:${LD_LIBRARY_PATH}"
fi


# install pytorch
# switch back to conda once torch nightly is fixed
# if [[ ${MATRIX_GPU_ARCH_TYPE} = 'cuda' ]]; then
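
One detail of the added block: when `LD_LIBRARY_PATH` is unset, the value ends in a trailing colon, and the dynamic loader treats an empty entry as the current working directory. A minimal sketch of one way to avoid that follows. The `join_ld_path` helper is hypothetical and is not what the commit does:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): join the non-empty
# arguments with ':' so an unset variable leaves no empty path entry.
join_ld_path() {
    local out="" dir
    for dir in "$@"; do
        [ -n "$dir" ] || continue        # skip empty or unset entries
        out="${out:+${out}:}${dir}"      # add ':' only between entries
    done
    printf '%s\n' "$out"
}
```

It could then be used as `conda env config vars set -p "${CONDA_ENV}" LD_LIBRARY_PATH="$(join_ld_path /usr/local/lib "${CONDA_ENV}/lib" "${LD_LIBRARY_PATH:-}")"`, which keeps the same search order as the commit while dropping the empty entry.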
6 changes: 6 additions & 0 deletions .github/workflows/validate-binaries.yml
@@ -1,6 +1,12 @@
name: Validate binaries

on:
  pull_request:
    paths-ignore:
      - "docs/*"
      - "third_party/*"
      - .gitignore
      - "*.md"
  workflow_call:
    inputs:
      channel: