Sync some features with anaconda's recipe #318
Conversation
Hi! This is the friendly automated conda-forge-linting service. I failed to even lint the recipe, probably because of a conda-smithy bug 😢. This likely indicates a problem in your recipe, though. This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/12957694482. Examine the logs at this URL for more detail.
Thanks Daniel! 🙏
Had a few questions below
+ python_include_dirs
+ torch_include_dirs
+ omp_include_dir_paths
+ + [os.getenv('CONDA_PREFIX') + '/include']
Glancing through the source code, it looks like inductor accepts an include_dirs flag, which comes from a JSON config file.
Could we just add our own JSON config file?
It seems this may be needed in other contexts as well.
inductor takes the compile_flags.json file for its AOT mode (handled in package.py), but not for its JIT mode. This is the problem the patch is solving - making it look in prefix/include during JIT (torch.compile) compilation. Probably in AOT mode the user wants to specify their own compile flags for their platform, which is what this json file is for.
The code you're looking at is where the AOT code is initializing the base class (BuildOptionsBase). However, we want to initialize the include directories in the child CppTorchOptions class, which is instantiated in cpu_vec_isa.py. It was a while ago I wrote this patch but IIRC that was the path in the stack trace.
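For context, a minimal sketch of the idea behind the patch (the helper name below is invented for illustration; the real change is the diff hunk above, where CppTorchOptions assembles its include directories):

import os

def _jit_include_dirs(python_include_dirs, torch_include_dirs, omp_include_dir_paths):
    # Combine the include dirs inductor already knows about...
    include_dirs = python_include_dirs + torch_include_dirs + omp_include_dir_paths
    # ...and additionally make JIT (torch.compile) C++ builds look in the active
    # conda environment, mirroring what compile_flags.json provides for AOT builds.
    conda_prefix = os.getenv("CONDA_PREFIX")
    if conda_prefix:
        include_dirs = include_dirs + [os.path.join(conda_prefix, "include")]
    return include_dirs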
does this check out ok for you?
mv build/lib.*/torch/include/{ATen,caffe2,tensorpipe,torch,c10} ${PREFIX}/include/
rm ${PREFIX}/lib/libtorch_python.*

# Keep the original backed up to sed later
cp build/CMakeCache.txt build/CMakeCache.txt.orig
;;
pytorch)
$PREFIX/bin/python -m pip install . --no-deps -vvv --no-clean \
$PREFIX/bin/python -m pip install . --no-deps --no-build-isolation -vvv --no-clean \
Should we sync these flags across calls and Windows? Seems they vary a bit.
Idk whether it is worth it, but we could also consider using script_env to set most of these flags once and reuse them throughout (see the sketch below).
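For reference, a rough sketch of that idea (PIP_FLAGS is an invented name here; script_env lists environment variables that conda-build passes through into the build environment):

# meta.yaml (sketch)
build:
  script_env:
    - PIP_FLAGS   # e.g. exported once in CI as "--no-deps --no-build-isolation -vvv --no-clean"

The build scripts on both platforms could then expand ${PIP_FLAGS} / %PIP_FLAGS% instead of repeating the flag list.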
For Windows we've got
%PYTHON% -m pip %PIP_ACTION% . --no-build-isolation --no-deps -vvv --no-clean
and for Unix
$PREFIX/bin/python -m pip install . --no-deps --no-build-isolation -vvv --no-clean
So the same AFAICS?
Keeping them here seems more in line with what people will expect given other feedstocks?
any further comments?
export USE_SYSTEM_PYBIND11=1
export USE_SYSTEM_EIGEN_INSTALL=1
What happens to the PyTorch vendored copies of these when using the system copies?
We have a patch to prevent this for MKL, but I thought it best to look at the BLAS/OpenMP stuff more closely and maybe address it in a different PR.
are you ok with this?
export MAX_JOBS=${CPU_COUNT}
# Leave a spare core for other tasks. This may need to be reduced further
# if we get out of memory errors.
export MAX_JOBS=$((CPU_COUNT > 1 ? CPU_COUNT - 1 : 1))
What happens if CPU_COUNT is undefined or an empty string?
What CI provider do things run on for Anaconda? I think it might need to become specific to that, and not go in the overall else: clause here. Our OSX builds on Azure are successful within the 6h limit, but seem to fail with this PR. This is a likely culprit.
It's AWS. Happy to just leave it as-is. I think originally we had it maxed at four cores and then increased it to CPU_COUNT-1 because of common practice and a desire to speed things up, rather than for concrete technical reasons. We can leave this as-is and, if we have issues on our end, solve them later.
I've fixed our case in 6074128, so I have no objection to doing what helps you in the else: branch.
have made this a default in the else branch
Note that CPU_COUNT is a passthrough environment variable, so one can set it in their CI scripts to the intended value and conda-build will use it. This is what conda-forge does.
Instead of adding this to the build recipe, I would recommend doing this in Anaconda's CI scripts.
CPU_COUNT isn't being set here. If it's set (somewhere else, like in CI as you suggest) then it'll be used to set MAX_JOBS. If it's not set, this expression will evaluate to 1.
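For illustration, a quick shell check of that behaviour (bash treats an unset or empty variable as 0 in arithmetic context):

unset CPU_COUNT
echo $((CPU_COUNT > 1 ? CPU_COUNT - 1 : 1))   # prints 1

CPU_COUNT=8
echo $((CPU_COUNT > 1 ? CPU_COUNT - 1 : 1))   # prints 7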
The linter doesn't like this line:
I guess it doesn't like calling .split on cuda_compiler_version. I'm getting …
Normally something like this will do the trick …
Edit: Can also use Jinja's …
Overall this looks very nice, thanks a lot. I left some questions on the individual patches.
I think this PR is getting very close to merging. We're down to a tiny handful of test failures, which will hopefully be addressed by the latest round of pushes. Beyond that, I don't want to shove more feature work into this PR; tensorpipe support on windows (etc.) can come in another PR - the CMake fixes were unavoidable though... Any other thoughts @danpetry?
I'm good with that.
PS. So glad to see the GPU server back in full force - 4 simultaneous GPU jobs 🤩 - big thank you to @aktech for sorting out some very gnarly hardware issues there! 🙏 🚀
May I ask how you found that PR? By looking at these tests in upstream's main and seeing if anything's changed, suggesting a fix...?
Going through the blame of the test file. It's a bit cumbersome because you need to do it in a local checkout - the file's too big for GH to load the blame. Example (on …
From there you find the commit pytorch/pytorch@381213e (and its PR). Since that wasn't the one that introduced …
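In case it's useful, one way this can be done from a local checkout (the test file path below is just a placeholder, not the actual file in question):

# Clone upstream and walk the history of the too-big-for-GitHub test file locally
git clone https://github.com/pytorch/pytorch.git
cd pytorch

# Show the commits that touched the file (placeholder path)
git log --follow --oneline -- test/inductor/test_torchinductor.py

# Or blame a specific line range around the failing test
git blame -L 100,160 -- test/inductor/test_torchinductor.py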
Ok, thanks. Additionally, I see that cpp_wrapper should be off by default, which doesn't match up with the changes' suggestion that these tests are failing because cpp_wrapper is enabled. Or, are you just taking these changes because they solve the issue by skipping, and the patch can be removed with v2.6?
TBH I was looking for changes in the failing tests that would be plausible for fixing things, and we get to drop backported patches in the next version upgrade. I didn't research much further (those rabbit holes can be an enormous time sink, so I went with 🤞 for now), and it's entirely possible that the fix is not sufficient. In that case I'm happy to skip the tests, but as that added skip was staring me in the face, I thought it's worth a shot. 🤷
Yep. Thanks for the knowledge/process share.
Hm. We're failing the new CMake tests for the CUDA-enabled pytorch.
Let me add a …
Yay, more failures
Presumably also … The test skip was also not effective, as seen on Windows. I'll switch to your suggestion with the unique names. Finally, we still have a problem with the CMake metadata, because the Caffe2 files still use the long-deprecated …
Yes, it uses cudart to query the CUDA device count: https://github.com/pytorch/pytorch/blob/40ccb7a86d456bc2fb9e45e4b33c774fbf0b3e46/torch/cuda/__init__.py#L125
Potentially, it's worth removing this check, because it's a test of the platform, not of the package itself. Probably just leaving this check should be fine:
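For illustration, roughly the kind of check being discussed (a sketch, not the exact test used in the recipe):

import torch

# device_count() goes through cudart (cudaGetDeviceCount) under the hood, so it
# probes the platform/driver at least as much as the package build itself.
if torch.cuda.is_available():
    print("visible CUDA devices:", torch.cuda.device_count())
else:
    print("no usable CUDA runtime/driver on this machine")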
Do you want some help with this or are you on it? I don't want to duplicate work but am happy to help.
I'd very much appreciate some help with this. There's just too many …
Aarch+CUDA timed out after 15h (note time increments per line, as well as some failures)
Still, I'm going to merge this for now to get some new builds with the CMake fixes out. I've discussed with @danpetry that it would be good to have some recent builds downloadable for debugging the CUDA part of #333, and so this will help in that respect as well. For that, I'm skipping the CUDA failures mentioned above, but I'll have yet another PR to try fixing this more properly (+ a bunch of other accumulated stuff).
Thanks for the patience here!
Well, unfortunately this fails even with the full CUDA toolchain (incl. cudart) present. I don't know how these tests are passing for Anaconda, or what's different here, but this is not working.
WTH, why does the test not find the GPUs anymore? (this is on MKL; meaning it passed …
Anyway, I went back to the state before this PR in 4f1b584.
Sigh, we really went 4/4 with the CUDA builds failing for various reasons; it looks like the …
Checklist
- Reset the build number to 0 (if the version changed)
- Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)

See commit messages for individual changes.
If there are any changes that aren't wanted, it'd be nice to know before debugging them.
There are some more involved changes we have over our side, and they will follow in further PRs if after investigation they're deemed appropriate. (full diff here, for interest)