[evals] Add support for scaling evals and inference with ray #63
Conversation
We don't have any unit tests in this repo, so some things to verify manually:
- does the regular single-node CLI still work after these changes?
- does it still work with OpenAI models?
There is a bit of extra stuff in workload.py that we can trim down so it doesn't confuse people.
Yep, checked with the new updates and made sure everything is working e2e for both.

```python
responses = copy.deepcopy(responses)
```
Could you add a NOTE + TODO comment here for now explaining the issue we saw?
bug details:
- a new Response object, which is just a Python dataclass with str, int, int attributes, is initialized from the values of ds.iter_rows() on a ray dataset
- these responses are processed in a ProcessPoolExecutor, but when we exit the executor's context and it tries to clean up the response objects, we run into a SIGSEGV at the ray object store level for some reason (see the traceback below)
Traceback for posterity
```
(raylet) *** SIGSEGV received at time=1738696160 on cpu 214 ***
(raylet) PC: @ 0x56080cd7f1ae (unknown) plasma::ReadReleaseRequest()
(raylet) @ 0x7fcb9fbaf520 4656 (unknown)
(raylet) @ 0x56080cd5715f 1456 plasma::PlasmaStore::ProcessMessage()
(raylet) @ 0x56080cd50f15 32 std::_Function_handler<>::_M_invoke()
(raylet) @ 0x56080cd86d33 1280 plasma::Client::Create()::{lambda()#1}::operator()()
(raylet) @ 0x56080cd873f8 48 std::_Function_handler<>::_M_invoke()
(raylet) @ 0x56080cf56aad 1376 ray::ClientConnection::ProcessMessage()
(raylet) @ 0x56080cf6de98 1168 EventTracker::RecordExecution()
(raylet) @ 0x56080cf58fb8 400 boost::asio::detail::reactive_socket_recv_op<>::do_complete()
(raylet) @ 0x56080d557f9b 128 boost::asio::detail::scheduler::do_run_one()
(raylet) @ 0x56080d55a529 288 boost::asio::detail::scheduler::run()
(raylet) @ 0x56080d55aa42 96 boost::asio::io_context::run()
(raylet) @ 0x56080cd50b20 1424 plasma::PlasmaStoreRunner::Start()
(raylet) @ 0x56080ccc4b05 208 std::thread::_State_impl<>::_M_run()
(raylet) @ 0x56080d6bafb0 258531312 execute_native_thread_routine
(raylet) @ ... and at least 3 more frames
```
LGTM. Thanks!!
What does this PR do?
This PR adds support for using ray to speed up evals and data generation. Currently, we are using a preliminary version of ray data + vllm while we wait for the code at ray.llm to be fully open sourced (coming in the next 1-2 weeks), after which we will migrate over.
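For context, this is roughly what the ray data + vllm pattern looks like in general (a minimal sketch, not this repo's actual workload.py; the model name, prompts, and batch settings are placeholders):

```python
# Minimal sketch of the ray data + vllm batch-inference pattern.
import ray
from vllm import LLM, SamplingParams


class VLLMPredictor:
    def __init__(self):
        # One vllm engine per ray actor; model and sampling settings
        # here are placeholders.
        self.llm = LLM(model="NovaSky-AI/Sky-T1-32B-Preview")
        self.params = SamplingParams(max_tokens=8192)

    def __call__(self, batch):
        # Generate a response for every prompt in the batch.
        outputs = self.llm.generate(list(batch["prompt"]), self.params)
        batch["response"] = [out.outputs[0].text for out in outputs]
        return batch


prompts = ["What is 2 + 2?"]  # placeholder prompts
ds = ray.data.from_items([{"prompt": p} for p in prompts])
# Each actor holds one engine on one GPU; scale out by raising concurrency.
ds = ds.map_batches(VLLMPredictor, concurrency=2, num_gpus=1, batch_size=64)
```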
Speedups
We also see faster data generation when sampling n parallel generations (above: the DeepSeek-distilled Qwen-7B). For comparison, the same inference steps (32k max tokens for AIME, n=128) with the Qwen math repo using a single tp=4 replica take ~10 hours.
How to Use
To use the new path, simply add `--use_ray` to existing commands and set the relevant scaling parameters in `--ray_config`. Reasonable defaults and examples of how to set advanced vllm engine arguments are provided in `ray_configs/ray_config.yaml`. For example, to run the Math-500 eval with `Sky-T1-32B-Preview`, you can use the following command:

```
python inference_and_check.py --model NovaSky-AI/Sky-T1-32B-Preview --task math500 --split test --max_tokens 8192 --use_ray --ray_config ray_configs/ray_config.yaml
```
where the `ray_config` looks like:
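The actual YAML is not reproduced in this excerpt; as a rough, hypothetical sketch of the kind of fields such a config might hold (every key below is an assumption, not the real schema of `ray_configs/ray_config.yaml`):

```yaml
# Hypothetical sketch only -- every key name here is an assumption,
# not the actual schema of ray_configs/ray_config.yaml.
model_id: NovaSky-AI/Sky-T1-32B-Preview
engine_kwargs:                 # advanced vllm engine arguments
  tensor_parallel_size: 4
  gpu_memory_utilization: 0.9
scaling:
  num_replicas: 2              # number of vllm engine replicas
  batch_size: 64               # rows per ray data batch
```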