feature request - support remote persistent workers #776

Closed
aherrmann opened this issue Sep 13, 2024 · 4 comments

@aherrmann
Contributor

Buck2 supports persistent workers; however, they are only available for locally executed actions. Bazel, in contrast, supports remote persistent workers (see also the BuildBuddy docs).

Persistent workers can provide large performance benefits: the Bazel documentation reports 2-4x speed-ups for Java, and preliminary experiments on our Haskell builds have shown speed-ups of about 3x. Without support for persistent workers in remote execution environments, users have to trade off the speed-up of a persistent worker against the speed-up of scaling out to many build nodes. It would be preferable to combine both speed-ups by supporting persistent workers in remote execution environments.

@aherrmann
Contributor Author

I brought up the topic of remote persistent workers in the Remote Execution API Working Group meeting on 2024-10-08.

Remote persistent worker support in Bazel currently follows the 2021-03-06-remote-persistent-workers proposal. However, it is not a properly standardized feature of the remote execution protocol. Some remote execution systems already support that proposal: to my knowledge, BuildFarm, BuildBuddy, and EngFlow. Others have not yet implemented it, and some are hesitant to do so until it is standardized.

The outcome of the discussion in the remote execution working group was that there is appetite to standardize this feature. Some concerns were raised about the current remote persistent worker proposal, in particular potential resource leakage and the lack of support for multiplex workers. An advantage of the 2021-03-06 proposal is that it is automatically backward compatible with remote execution systems that do not support persistent workers.

The worker protocols themselves also differ. The Bazel worker protocol exchanges length-prefixed protobuf messages over stdin/stdout, which can cause problems when a worker inadvertently writes to stdout, e.g. through an underlying library it uses. Buck2, on the other hand, uses a gRPC protocol over Unix domain sockets, which does not have that issue. One path proposed in the meeting was to first define a standard persistent worker protocol and then use it in the remote execution protocol.
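To illustrate why stray stdout output is a problem for stdin/stdout framing, here is a minimal sketch. It assumes the worker uses protobuf's varint-delimited framing (a base-128 varint length followed by the message bytes, as written by Java's `writeDelimitedTo`). The payloads are opaque byte strings standing in for serialized `WorkRequest`/`WorkResponse` messages, and the helper names (`write_delimited`, `read_delimited`, and so on) are hypothetical, not part of Bazel or Buck2.

```python
import io
from typing import BinaryIO


def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a protobuf base-128 varint."""
    if n < 0:
        raise ValueError("varint length must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # last byte: high bit clear
            return bytes(out)


def read_varint(stream: BinaryIO) -> int:
    """Decode one varint from the stream, byte by byte."""
    shift = 0
    result = 0
    while True:
        b = stream.read(1)
        if not b:
            raise EOFError("stream ended inside a length prefix")
        result |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            return result
        shift += 7


def write_delimited(stream: BinaryIO, payload: bytes) -> None:
    """Write one length-prefixed message: varint length, then payload."""
    stream.write(encode_varint(len(payload)))
    stream.write(payload)


def read_delimited(stream: BinaryIO) -> bytes:
    """Read one length-prefixed message written by write_delimited."""
    length = read_varint(stream)
    payload = stream.read(length)
    if len(payload) != length:
        raise EOFError("stream ended inside a message")
    return payload


def demo() -> None:
    # Well-behaved worker: stdout carries only framed responses.
    clean = io.BytesIO()
    write_delimited(clean, b"response-1")
    write_delimited(clean, b"response-2")
    clean.seek(0)
    print(read_delimited(clean), read_delimited(clean))

    # A library inside the worker prints a stray line to the same stdout.
    noisy = io.BytesIO()
    noisy.write(b"warning: cache miss\n")
    write_delimited(noisy, b"response-1")
    noisy.seek(0)
    try:
        read_delimited(noisy)
    except EOFError as err:
        print("protocol desynchronized:", err)


if __name__ == "__main__":
    demo()
```

In the noisy case the reader parses the stray `w` (byte 0x77, i.e. 119) as a length prefix and tries to consume a 119-byte message that does not exist, so the stream is desynchronized. A transport dedicated to the protocol, such as the Unix domain socket Buck2 uses, keeps such incidental output out of the message stream.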

To my knowledge, Meta is currently revisiting the persistent worker protocol and its internal remote persistent worker support. This may be a good opportunity to also keep the open source remote execution protocol in mind. In the working group meeting, @mostynb, @allada, and @ulfjack said that they would be interested in participating in the discussion. @christolliday, if this sounds interesting to Meta, could you share your thoughts on a possible future standardized remote persistent worker feature in the remote execution protocol?

@bergsieker

I'm also interested in participating in discussions surrounding remote persistent workers.

@tjgq

tjgq commented Nov 12, 2024

Please count me as interested as well (from the Bazel side).

@aherrmann
Contributor Author

An update: there has been progress on getting the prototype implementation in #787 closer to a mergeable state by adding tests to Buck2 CI. Some coordination on the CI side is still in progress. Once that is in, my next step on this topic is to review the existing persistent worker protocols, collect their capabilities, and think about what additional needs the remote execution use case may have.
