Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"spack mirror create -a" (mirror all) in parallel #48411

Open
wants to merge 34 commits into
base: develop
Choose a base branch
from

Conversation

aquan9
Copy link

@aquan9 aquan9 commented Jan 5, 2025

This PR improves the "spack mirror create -a" process by adding threading and an additional parameter "-j" to choose the number of threads used to download a full mirror.

@spackbot-app spackbot-app bot added commands core PR affects Spack core functionality documentation Improvements or additions to documentation labels Jan 5, 2025
@aquan9 aquan9 marked this pull request as ready for review January 7, 2025 03:15
Copy link
Member

@haampie haampie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your PR.

I wouldn't use ThreadPoolExecutor here cause Spack isn't thread safe. In particular fetching creates stage objects, which create lock objects, of which the bookkeeping is not thread safe iirc. See spack.util.parallel.make_concurrent_executor in the code base elsewhere.

Secondly it's better to update mirror_stats from the return value of create_mirror_for_one_spec in the main thread/process, instead of updating it in parallel.

Finally I would implement -j using spack.cmd.common.arguments.add_common_arguments(..., ["jobs"]) plus spack.config.determine_number_of_jobs(parallel=True) since it takes into account config files and the number of available CPUs.

@aquan9
Copy link
Author

aquan9 commented Jan 7, 2025

@haampie Thank you for the code review. I'll work on implementing these changes.

@aquan9
Copy link
Author

aquan9 commented Jan 8, 2025

Secondly it's better to update mirror_stats from the return value of create_mirror_for_one_spec in the main thread/process, instead of updating it in parallel.

@haampie
I'm a little bit confused about this comment as I was going through the code and I was hoping you could clarify. There is no return value from the create_mirror_for_one_spec function. The line that actually downloads the file is:

spack.mirrors.utils.create_mirror_from_package_object(pkg_obj, mirror_cache, mirror_stats)

Mirror_stats seems to be updated as a part of that process and the only value returned from "create_mirror_from_package_object" is a true or false for success or failure.

@aquan9
Copy link
Author

aquan9 commented Jan 8, 2025

Secondly it's better to update mirror_stats from the return value of create_mirror_for_one_spec in the main thread/process, instead of updating it in parallel.

@haampie I'm a little bit confused about this comment as I was going through the code and I was hoping you could clarify. There is no return value from the create_mirror_for_one_spec function. The line that actually downloads the file is:

spack.mirrors.utils.create_mirror_from_package_object(pkg_obj, mirror_cache, mirror_stats)

Mirror_stats seems to be updated as a part of that process and the only value returned from "create_mirror_from_package_object" is a true or false for success or failure.

Actually I think I figured out what you mean... let me push some changes...

@aquan9 aquan9 requested a review from haampie January 16, 2025 20:39
@spackbot-app spackbot-app bot added the tests General test capability(ies) label Jan 16, 2025
@aquan9
Copy link
Author

aquan9 commented Jan 23, 2025

@spackbot request review

for candidate in mirror_specs
]
for mirror_future in futures:
pkg_obj = mirror_future.result()
Copy link
Member

@haampie haampie Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be in the context manager, __exit__ destroys the thread pool immediately.

def create_mirror_for_one_spec(candidate, mirror_cache, mirror_stats):
pkg_cls = spack.repo.PATH.get_pkg_class(candidate.name)
pkg_obj = pkg_cls(spack.spec.Spec(candidate))
spack.mirrors.utils.create_mirror_from_package_object(pkg_obj, mirror_cache, mirror_stats)
Copy link
Member

@haampie haampie Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're still mutating mirror_stats in a subprocess, so it's not reflected in the main process. The function should return what it added / what errored. Do the mirror_stats reduction in the main process.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
commands core PR affects Spack core functionality documentation Improvements or additions to documentation shell-support tests General test capability(ies)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants