
Agents #2936

Open

wants to merge 17 commits into main
Conversation

@August-murr (Collaborator) commented Feb 23, 2025

How to use:
pip install trl[agents]

from trl import GRPOConfig, GRPOTrainer, prepare_data_for_local_agent
from transformers import AutoTokenizer
from datasets import load_dataset

dataset = load_dataset(...)
tokenizer = AutoTokenizer.from_pretrained(...)
my_prompt = "..."

prepared_data = prepare_data_for_local_agent(dataset, tokenizer, system_prompt=my_prompt)

training_args = GRPOConfig(output_dir=..., ..., use_vllm=True, use_agent=True)

trainer = GRPOTrainer(model="model_path", reward_funcs=reward_funcs, args=training_args, train_dataset=prepared_data)

trainer.train()

The default setting uses LocalExecutor, and I haven't trained any models with E2BExecutor yet.

I separated the data preprocessing from the GRPOTrainer script by using the prepare_data functions. This keeps changes to the trainer script minimal.

Performance:

I ran into out-of-memory (OOM) issues and couldn't train an agent on more complex tasks. To show that the pipeline works, I trained Qwen 2.5 0.5B to call code, using a reward function that counted the number of code calls in its output. The results improved from 14% for the base model to 58% after 10 steps and 93% after 20 steps.
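The reward used in that run can be sketched roughly like this (a minimal sketch in the `reward_funcs` style; the `<code>` tag matches this PR's default `parsing_string`, but the exact reward shape used in the experiment is an assumption):

```python
def count_code_calls(completions, **kwargs):
    # Score each completion by how many code calls it contains. Counting
    # "<code>" follows this PR's default parsing_string; the real run's
    # reward may have been shaped differently (e.g. capped or binary).
    return [float(c.count("<code>")) for c in completions]
```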

While the results look good, they don’t represent a practical use case. I think we should train an agent on a larger scale to achieve specific goals or benchmarks. We can also improve any factors that slow down training or scalability. In the end, we could write a blog post or report to showcase its effectiveness.
@qgallouedec, @lewtun, @edbeeching, @aymeric-roucher
Here are some ideas for training the agent:

  1. Use an R1 variant, chosen based on the budget, to train an agent to think and write code recursively to solve tasks.
  2. Focus on standard benchmarks like GAIA or coding tests, with a reward function based on the think-and-code structure and on correctness.
  3. For deep research, we could train the R1 agent on a dataset of queries and reports or articles. The reward function would assess how similar the text embedding of the output report is to the reference report, and also score its structure (e.g., markdown formatting).
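The embedding-similarity reward in idea 3 could be sketched as follows. This is a toy illustration: the bag-of-words "embedding" stands in for a real sentence-embedding model, and the `reference_reports` column name is a hypothetical:

```python
import math
from collections import Counter

def bow_embed(text):
    # Stand-in embedding: bag-of-words token counts. A real run would use
    # a sentence-embedding model here; this is only for illustration.
    return Counter(text.lower().split())

def cosine_sim(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def report_similarity_reward(completions, reference_reports, **kwargs):
    # Reward each generated report by its similarity to the reference one.
    return [cosine_sim(bow_embed(c), bow_embed(r))
            for c, r in zip(completions, reference_reports)]
```

A structure score (e.g., checking for markdown headings) could be added as a second term.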

@August-murr August-murr requested review from qgallouedec, lewtun, edbeeching and kashif and removed request for lewtun February 23, 2025 13:13
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec (Member) commented Feb 23, 2025

Super cool PR @August-murr! I'll check in details asap.

I faced out-of-memory (OOM) issues and couldn't train an agent for more complex tasks.

Can you share your code?

In the end, we could write a blog post or report to showcase its effectiveness.

Definitely!

While the results look good, they don’t represent a practical use case

Do you have another simple yet practical use case?

trl/__init__.py Outdated
Comment on lines 232 to 241
from .agents import (
E2BExecutor,
LocalExecutor,
generate_agent_responses,
get_code,
prepare_data_for_e2b_agent,
prepare_data_for_local_agent,
read_script,
)


Move this just after

if TYPE_CHECKING:

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .utils import (

Suggested change
from .utils import (
from .utils import (

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from ..import_utils import is_agents_available

Suggested change
from ..import_utils import is_agents_available
from ..import_utils import is_agents_available

Comment on lines 17 to 28
if is_agents_available():
from inspect import getsource
from pathlib import Path
from typing import Callable, List, Optional

from e2b_code_interpreter import Sandbox
from langchain_experimental.utilities import PythonREPL
from vllm import LLM, SamplingParams
else:
raise ImportError(
"Agents utilities are not available. Please install trl with " "`pip install trl[agents]` to use utils"
)

@qgallouedec qgallouedec Feb 23, 2025


Suggested change
if is_agents_available():
from inspect import getsource
from pathlib import Path
from typing import Callable, List, Optional
from e2b_code_interpreter import Sandbox
from langchain_experimental.utilities import PythonREPL
from vllm import LLM, SamplingParams
else:
raise ImportError(
"Agents utilities are not available. Please install trl with " "`pip install trl[agents]` to use utils"
)
from inspect import getsource
from pathlib import Path
from typing import Callable, List, Optional
if e2b_available():
from e2b_code_interpreter import Sandbox
if is_langchain_experimental_available():
from langchain_experimental.utilities import PythonREPL
if is_vllm_available():
from vllm import LLM, SamplingParams

and move

raise ImportError(
         "Agents utilities are not available. Please install trl with " "`pip install trl[agents]` to use utils"
     )

directly into __init__ or in the functions that need it
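Moving the raise into the functions that need it could look something like this (a sketch; the helper name `require_agents_deps` is hypothetical):

```python
def require_agents_deps(available: bool) -> None:
    # Fail lazily at call time rather than at module import time, so that
    # importing trl stays cheap when the optional deps are absent.
    if not available:
        raise ImportError(
            "Agents utilities are not available. Please install trl with "
            "`pip install trl[agents]` to use utils"
        )

# A function needing a specific dependency would then start with, e.g.:
# require_agents_deps(is_vllm_available())
```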

Comment on lines 38 to 43
Args:
chat (str): The chat message containing the code snippet.
tools_script (str, optional): A script to prepend to the extracted code snippet. Defaults to None.
parsing_string (str, optional): The string used to identify the start of the code snippet in the chat message. Defaults to "<code>".
Returns:
str: The extracted code snippet, optionally prepended with the tools script.

Suggested change
Args:
chat (str): The chat message containing the code snippet.
tools_script (str, optional): A script to prepend to the extracted code snippet. Defaults to None.
parsing_string (str, optional): The string used to identify the start of the code snippet in the chat message. Defaults to "<code>".
Returns:
str: The extracted code snippet, optionally prepended with the tools script.
Args:
chat (`str`):
Chat message containing the code snippet.
tools_script (`str` or `None`, *optional*, defaults to `None`):
A script to prepend to the extracted code snippet.
parsing_string (`str`, *optional*, defaults to `"<code>"`):
String used to identify the start of the code snippet in the chat message.
Returns:
`str`:
Extracted code snippet, optionally prepended with the tools script.

Comment on lines 52 to 64
"""
A class to handle code execution in an e2b sandbox environment.
"""

def __init__(self, api_key: str, dependencies: Optional[List[str]] = None, template: Optional[str] = None):
"""
Initialize the E2BExecutor with API key and optional settings.

Args:
api_key (str): Your E2B API Key.
dependencies (Optional[List[str]]): A list of dependencies to install. Defaults to None.
template (Optional[str]): Template for the sandbox environment. Defaults to None.
"""

@qgallouedec qgallouedec Feb 23, 2025


Suggested change
"""
A class to handle code execution in an e2b sandbox environment.
"""
def __init__(self, api_key: str, dependencies: Optional[List[str]] = None, template: Optional[str] = None):
"""
Initialize the E2BExecutor with API key and optional settings.
Args:
api_key (str): Your E2B API Key.
dependencies (Optional[List[str]]): A list of dependencies to install. Defaults to None.
template (Optional[str]): Template for the sandbox environment. Defaults to None.
"""
"""
A class to handle code execution in an e2b sandbox environment.
Initialize the E2BExecutor with API key and optional settings.
Args:
api_key (`str`):
Your E2B API Key.
dependencies (`list[str]` or `None`, *optional*, defaults to `None`):
A list of dependencies to install.
template (`str` or `None`, *optional*, defaults to `None`):
Template for the sandbox environment.
"""
def __init__(self, api_key: str, dependencies: Optional[List[str]] = None, template: Optional[str] = None):

Comment on lines 101 to 104
try:
return Path(user_script_path).read_text()
except Exception as e:
raise RuntimeError(f"Error reading the user script: {e}") from e

@qgallouedec qgallouedec Feb 23, 2025


Suggested change
try:
return Path(user_script_path).read_text()
except Exception as e:
raise RuntimeError(f"Error reading the user script: {e}") from e
return Path(user_script_path).read_text()

Unless it really adds something, I'd avoid this. Unnecessarily wrapping exceptions can obscure tracebacks and make debugging harder

@qgallouedec (Member)

Can you add some tests and docs as well?

@August-murr (Collaborator Author) commented Feb 23, 2025

Can you share your code?

Kaggle Notebook
The biggest issue was really Kaggle's 2xT4 setup having so little VRAM.
I did try PEFT, but then I couldn't use it properly with vLLM, so I decided to train the full model instead.

Do you have another simple yet practical use case?

No, nothing simpler than that.

@qgallouedec (Member)

Not sure when you tested it but peft + vllm should be fixed now

@August-murr August-murr changed the base branch from agents to main February 24, 2025 08:22
Comment on lines 64 to 66
def is_agents_available() -> bool:
return _langchain_experimental_available and _vllm_available and _e2b_available


Suggested change
def is_agents_available() -> bool:
return _langchain_experimental_available and _vllm_available and _e2b_available

I would not combine requirements here. Let's be fine-grained about where the deps are used.

@August-murr (Collaborator Author)

@qgallouedec I don't get why the tests failed.
8 tests failed with error:
module 'torch' has no attribute 'hip'

@qgallouedec (Member)

It's because of liger; merging main should solve this.

@kashif (Collaborator) commented Feb 28, 2025

They have a fix in the 0.5.4 version.

@qgallouedec (Member)

Actually, I think it's 0.5.4 that contains the bug; that's why we pinned to 0.5.3: #2952

@qgallouedec (Member)

Don't bother too much with Windows. If a test fails, you can skip it.

@August-murr (Collaborator Author) commented Mar 2, 2025

@qgallouedec is there anything else needed?

processed_prompts = tokenizer.apply_chat_template(conversations, tokenize=False, add_generation_prompt=True)

# Create a new dataset with processed prompts
return dataset.map(lambda x, idx: {prompt_column: processed_prompts[idx]}, with_indices=True)


Will this remove all of the columns? If so this might break things for folks who put the answer in a column to verify the completion in the reward func.

@August-murr (Collaborator Author)

No, it doesn't; that was taken into consideration.
The function only modifies the "prompt" column (or whatever column name you pass in) and keeps the other columns unchanged.
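That contract can be illustrated with a dict-based stand-in for `datasets.Dataset.map` (the real code uses the datasets library; this toy version only shows that untouched columns survive the rewrite):

```python
def map_prompt_column(rows, processed_prompts, prompt_column="prompt"):
    # Rewrite only the targeted column; every other key in each row is
    # copied through unchanged, so e.g. an "answer" column used by a
    # reward function is preserved.
    return [
        {**row, prompt_column: processed_prompts[i]}
        for i, row in enumerate(rows)
    ]
```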

@August-murr (Collaborator Author)

Just added parallel code execution using asyncio in 2197385 to make code generation more scalable.

Right now, we’re working with E2B, but if we find that alternatives like CodeSandbox.io, Modal, or Daytona are cheaper or faster, we’ll create wrappers for those too.
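The parallel-execution pattern described above can be sketched like this. `run_in_sandbox` is a stub standing in for a blocking sandbox round-trip (e.g. an E2B call); the function names here are assumptions for illustration:

```python
import asyncio

def run_in_sandbox(code: str) -> str:
    # Stub for a blocking sandbox round-trip (e.g. E2B code execution).
    return f"ran: {code}"

async def execute_all(snippets):
    loop = asyncio.get_running_loop()
    # Offload each blocking call to a worker thread so the snippets run
    # concurrently; asyncio.gather preserves input order in its results.
    tasks = [loop.run_in_executor(None, run_in_sandbox, s) for s in snippets]
    return await asyncio.gather(*tasks)

results = asyncio.run(execute_all(["print(1)", "print(2)"]))
```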

@August-murr (Collaborator Author)

Not sure when you tested it but peft + vllm should be fixed now

@qgallouedec
Can you share some code for a training run that worked for you? I've been using the example from the docs plus a LoRA config and use_vllm=True, and it hasn't been working. Not for agents, by the way; just training a model with the reward function in the docs.

@qgallouedec (Member)

Thanks for the feedback, what's the traceback?

@August-murr (Collaborator Author)

Thanks for the feedback, what's the traceback?

There is no error; everything's running, but the reward doesn't really seem to improve, and it's all over the place.
Notebook
This exact notebook, without the PEFT config, works perfectly, pretty much nailing it with no response (getting the max reward) after just a few steps. So I'm pretty convinced it has something to do with PEFT and not the reward function, the data, or other hyper-parameters.

@qgallouedec (Member)

Possibly related? #2873

@August-murr (Collaborator Author)

Possibly related? #2873

Can you share the code you used in #2873 (comment)
