-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
"spack mirror create -a" (mirror all) in parallel #48411
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your PR.
I wouldn't use ThreadPoolExecutor
here cause Spack isn't thread safe. In particular fetching creates stage objects, which create lock objects, of which the bookkeeping is not thread safe iirc. See spack.util.parallel.make_concurrent_executor
in the code base elsewhere.
Secondly it's better to update mirror_stats
from the return value of create_mirror_for_one_spec
in the main thread/process, instead of updating it in parallel.
Finally I would implement -j
using spack.cmd.common.arguments.add_common_arguments(..., ["jobs"])
plus spack.config.determine_number_of_jobs(parallel=True)
since it takes into account config files and the number of available CPUs.
@haampie Thank you for the code review. I'll work on implementing these changes. |
@haampie
Mirror_stats seems to be updated as a part of that process and the only value returned from "create_mirror_from_package_object" is a true or false for success or failure. |
Actually I think I figured out what you mean... let me push some changes... |
@spackbot request review |
lib/spack/spack/cmd/mirror.py
Outdated
for candidate in mirror_specs | ||
] | ||
for mirror_future in futures: | ||
pkg_obj = mirror_future.result() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be in the context manager, __exit__
destroys the thread pool immediately.
lib/spack/spack/cmd/mirror.py
Outdated
def create_mirror_for_one_spec(candidate, mirror_cache, mirror_stats): | ||
pkg_cls = spack.repo.PATH.get_pkg_class(candidate.name) | ||
pkg_obj = pkg_cls(spack.spec.Spec(candidate)) | ||
spack.mirrors.utils.create_mirror_from_package_object(pkg_obj, mirror_cache, mirror_stats) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're still mutating mirror_stats
in a subprocess, so it's not reflected in the main process. The function should return what it added / what errored. Do the mirror_stats
reduction in the main process.
This PR improves the "spack mirror create -a" process by adding threading and an additional parameter "-j" to choose the number of threads used to download a full mirror.