Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do resolutions within a module in parallel #391

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

craffit
Copy link

@craffit craffit commented May 27, 2022

Profiling showed conflict resolution to be a performance hotspot. Cpu utilization is a about 2 cores on my machine, after parallelizing the toplevel resolution loop utilization went to about 4 cores for my local workload.

With this change i measured a performance improvement from 50 seconds on sbt update to 30 seconds.

Copy link
Contributor

@jtjeferreira jtjeferreira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a nice improvement

Comment on lines +3 to +5
// This is required for jscala 2.12/2.13 compatibility for parallel collections
// (see https://github.com/scala/scala-parallel-collections/issues/22)
private[internal] object CompatParColls {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code already exists in sbt, but is not public https://github.com/sbt/sbt/search?q=CollectionConverters

@craffit
Copy link
Author

craffit commented Jul 21, 2022

@alexarchambault could you take a look at this PR and see what you would like changed for it to be mergeable? (Or reject it if you think this work is no good)

Copy link
Member

@alexarchambault alexarchambault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @craffit. I'm not against parallelizing this code, all the more that it brings perf gains. The only issue here is that the thread pool these calculations run on is not clear / not explicit. I've been bitten by this again very recently here for example.

So either the parallel collection should be configured so that it runs on an explicit thread pool (seems can be done, as shown here, although I have no experience with this API). Or things should be parallelized manually using futures and Future.sequence to go from List[Future[…]] to Future[List[…]] (all of that while passing those methods a custom thread pool / execution context).

As a thread pool to run those calculations, we could use either

  • the thread pool used by the coursier cache (but its thread count defaults to 6, which might not correspond to the number of cores that are available…)
  • a thread pool lazily started just for this via coursier.cache.internal.ThreadUtil. fixedThreadPool (which can be kept in a lazy field on an object somewhere - its threads automatically shut down after 1 minute of inactivity, so there's no need to manually shut them down)

I think both thread pools ought to do the job…

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants