
Agents #2936

Open

wants to merge 17 commits into main
Conversation

@August-murr (Collaborator) commented Feb 23, 2025

How to use:
pip install trl[agents]

from trl import GRPOConfig, GRPOTrainer, prepare_data_for_local_agent
from transformers import AutoTokenizer
from datasets import load_dataset

dataset = load_dataset(...)
tokenizer = AutoTokenizer.from_pretrained(...)
my_prompt = "..."

prepared_data = prepare_data_for_local_agent(dataset, tokenizer, system_prompt=my_prompt)

training_args = GRPOConfig(output_dir=..., ..., use_vllm=True, use_agent=True)

trainer = GRPOTrainer(model="model_path", reward_funcs=reward_funcs, args=training_args, train_dataset=prepared_data)

trainer.train()

The default setting uses LocalExecutor, and I haven't trained any models with E2BExecutor yet.

I separated the data preprocessing from the GRPOTrainer script by using the prepare_data functions. This keeps changes to the trainer script minimal.

Performance:

I ran into out-of-memory (OOM) issues and couldn't train an agent on more complex tasks. To show that the pipeline works, I trained Qwen 2.5 0.5B to call code, using a reward function that counted the number of code calls in its output. The results improved from 14% for the base model to 58% after 10 steps and 93% after 20 steps.
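The reward used in that run can be sketched roughly like this (a minimal sketch in the `reward_funcs` style; the `<code>` tag matches this PR's default `parsing_string`, but the exact reward shape used in the experiment is an assumption):

```python
def count_code_calls(completions, **kwargs):
    # Score each completion by how many code calls it contains. Counting
    # "<code>" follows this PR's default parsing_string; the real run's
    # reward may have been shaped differently (e.g. capped or binary).
    return [float(c.count("<code>")) for c in completions]
```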

While the results look good, they don’t represent a practical use case. I think we should train an agent on a larger scale to achieve specific goals or benchmarks. We can also improve any factors that slow down training or scalability. In the end, we could write a blog post or report to showcase its effectiveness.
@qgallouedec, @lewtun, @edbeeching, @aymeric-roucher
Here are some ideas for training the agent:

  1. Use an R1 variant, chosen based on the budget, to train an agent to think and write code recursively to solve tasks.
  2. Focus on standard benchmarks like GAIA or coding tests, with a reward function based on the think-and-code structure and on correctness.
  3. For deep research, we could train the R1 agent on a dataset of queries and reports or articles. The reward function would assess how similar the text embedding of the output report is to the reference report, and also score its structure (e.g., markdown formatting).
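The embedding-similarity reward in idea 3 could be sketched as follows. This is a toy illustration: the bag-of-words "embedding" stands in for a real sentence-embedding model, and the `reference_reports` column name is a hypothetical:

```python
import math
from collections import Counter

def bow_embed(text):
    # Stand-in embedding: bag-of-words token counts. A real run would use
    # a sentence-embedding model here; this is only for illustration.
    return Counter(text.lower().split())

def cosine_sim(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def report_similarity_reward(completions, reference_reports, **kwargs):
    # Reward each generated report by its similarity to the reference one.
    return [cosine_sim(bow_embed(c), bow_embed(r))
            for c, r in zip(completions, reference_reports)]
```

A structure score (e.g., checking for markdown headings) could be added as a second term.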

@August-murr August-murr requested review from qgallouedec, lewtun, edbeeching and kashif and removed request for lewtun February 23, 2025 13:13
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec (Member) commented Feb 23, 2025

Super cool PR @August-murr! I'll check in details asap.

I faced out-of-memory (OOM) issues and couldn't train an agent for more complex tasks.

Can you share your code?

In the end, we could write a blog post or report to showcase its effectiveness.

Definitely!

While the results look good, they don’t represent a practical use case

Do you have another simple yet practical use case?

trl/__init__.py Outdated
Comment on lines 232 to 241
from .agents import (
E2BExecutor,
LocalExecutor,
generate_agent_responses,
get_code,
prepare_data_for_e2b_agent,
prepare_data_for_local_agent,
read_script,
)


Move this just after

if TYPE_CHECKING:

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .utils import (

Suggested change
from .utils import (
from .utils import (

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from ..import_utils import is_agents_available

Suggested change
from ..import_utils import is_agents_available
from ..import_utils import is_agents_available

Comment on lines 17 to 28
if is_agents_available():
from inspect import getsource
from pathlib import Path
from typing import Callable, List, Optional

from e2b_code_interpreter import Sandbox
from langchain_experimental.utilities import PythonREPL
from vllm import LLM, SamplingParams
else:
raise ImportError(
"Agents utilities are not available. Please install trl with " "`pip install trl[agents]` to use utils"
)

@qgallouedec qgallouedec Feb 23, 2025


Suggested change
if is_agents_available():
from inspect import getsource
from pathlib import Path
from typing import Callable, List, Optional
from e2b_code_interpreter import Sandbox
from langchain_experimental.utilities import PythonREPL
from vllm import LLM, SamplingParams
else:
raise ImportError(
"Agents utilities are not available. Please install trl with " "`pip install trl[agents]` to use utils"
)
from inspect import getsource
from pathlib import Path
from typing import Callable, List, Optional
if e2b_available():
from e2b_code_interpreter import Sandbox
if is_langchain_experimental_available():
from langchain_experimental.utilities import PythonREPL
if is_vllm_available():
from vllm import LLM, SamplingParams

and move

raise ImportError(
         "Agents utilities are not available. Please install trl with " "`pip install trl[agents]` to use utils"
     )

directly into __init__ or in the functions that need it
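Moving the raise into the functions that need it could look something like this (a sketch; the helper name `require_agents_deps` is hypothetical):

```python
def require_agents_deps(available: bool) -> None:
    # Fail lazily at call time rather than at module import time, so that
    # importing trl stays cheap when the optional deps are absent.
    if not available:
        raise ImportError(
            "Agents utilities are not available. Please install trl with "
            "`pip install trl[agents]` to use utils"
        )

# A function needing a specific dependency would then start with, e.g.:
# require_agents_deps(is_vllm_available())
```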

Comment on lines 38 to 43
Args:
chat (str): The chat message containing the code snippet.
tools_script (str, optional): A script to prepend to the extracted code snippet. Defaults to None.
parsing_string (str, optional): The string used to identify the start of the code snippet in the chat message. Defaults to "<code>".
Returns:
str: The extracted code snippet, optionally prepended with the tools script.

Suggested change
Args:
chat (str): The chat message containing the code snippet.
tools_script (str, optional): A script to prepend to the extracted code snippet. Defaults to None.
parsing_string (str, optional): The string used to identify the start of the code snippet in the chat message. Defaults to "<code>".
Returns:
str: The extracted code snippet, optionally prepended with the tools script.
Args:
chat (`str`):
Chat message containing the code snippet.
tools_script (`str` or `None`, *optional*, defaults to `None`):
A script to prepend to the extracted code snippet.
parsing_string (`str`, *optional*, defaults to `"<code>"`):
String used to identify the start of the code snippet in the chat message.
Returns:
`str`:
Extracted code snippet, optionally prepended with the tools script.

Comment on lines 52 to 64
"""
A class to handle code execution in an e2b sandbox environment.
"""

def __init__(self, api_key: str, dependencies: Optional[List[str]] = None, template: Optional[str] = None):
"""
Initialize the E2BExecutor with API key and optional settings.

Args:
api_key (str): Your E2B API Key.
dependencies (Optional[List[str]]): A list of dependencies to install. Defaults to None.
template (Optional[str]): Template for the sandbox environment. Defaults to None.
"""

@qgallouedec qgallouedec Feb 23, 2025


Suggested change
"""
A class to handle code execution in an e2b sandbox environment.
"""
def __init__(self, api_key: str, dependencies: Optional[List[str]] = None, template: Optional[str] = None):
"""
Initialize the E2BExecutor with API key and optional settings.
Args:
api_key (str): Your E2B API Key.
dependencies (Optional[List[str]]): A list of dependencies to install. Defaults to None.
template (Optional[str]): Template for the sandbox environment. Defaults to None.
"""
"""
A class to handle code execution in an e2b sandbox environment.
Initialize the E2BExecutor with API key and optional settings.
Args:
api_key (`str`):
Your E2B API Key.
dependencies (`list[str]` or `None`, *optional*, defaults to `None`):
A list of dependencies to install.
template (`str` or `None`, *optional*, defaults to `None`):
Template for the sandbox environment.
"""
def __init__(self, api_key: str, dependencies: Optional[List[str]] = None, template: Optional[str] = None):

Comment on lines 101 to 104
try:
return Path(user_script_path).read_text()
except Exception as e:
raise RuntimeError(f"Error reading the user script: {e}") from e

@qgallouedec qgallouedec Feb 23, 2025


Suggested change
try:
return Path(user_script_path).read_text()
except Exception as e:
raise RuntimeError(f"Error reading the user script: {e}") from e
return Path(user_script_path).read_text()

Unless it really adds something, I'd avoid this. Unnecessarily wrapping exceptions can obscure tracebacks and make debugging harder

@qgallouedec (Member)

Can you add some tests and docs as well?

@August-murr (Collaborator Author) commented Feb 23, 2025

Can you share your code?

Kaggle Notebook
The biggest issue was really Kaggle's 2xT4 setup having so little VRAM.
I did try PEFT, but then I couldn't use it properly with vLLM, so I decided to train the full model instead.

Do you have another simple yet practical use case?

No, nothing simpler than that.

@qgallouedec (Member)

Not sure when you tested it but peft + vllm should be fixed now

@August-murr August-murr changed the base branch from agents to main February 24, 2025 08:22
Comment on lines 64 to 66
def is_agents_available() -> bool:
return _langchain_experimental_available and _vllm_available and _e2b_available


Suggested change
def is_agents_available() -> bool:
return _langchain_experimental_available and _vllm_available and _e2b_available

I would not combine requirements here. Let's be fine-grained about where the deps are used.

@August-murr (Collaborator Author)

@qgallouedec I don't get why the tests failed.
8 tests failed with error:
module 'torch' has no attribute 'hip'

@qgallouedec (Member)

It's because of liger; merging main should solve this.

@kashif (Collaborator) commented Feb 28, 2025

They have a fix in the 0.5.4 version.

@qgallouedec (Member)

Actually, I think it's 0.5.4 that contains the bug; that's why we pinned to 0.5.3: #2952

@qgallouedec (Member)

Don't bother too much with Windows. If a test fails, you can skip it.

@August-murr (Collaborator Author) commented Mar 2, 2025

@qgallouedec is there anything else needed?

processed_prompts = tokenizer.apply_chat_template(conversations, tokenize=False, add_generation_prompt=True)

# Create a new dataset with processed prompts
return dataset.map(lambda x, idx: {prompt_column: processed_prompts[idx]}, with_indices=True)


Will this remove all of the columns? If so this might break things for folks who put the answer in a column to verify the completion in the reward func.

@August-murr (Collaborator Author)

No, it doesn't; that was taken into consideration.
The function only modifies the "prompt" column (or whatever column name you pass in) and keeps the other columns unchanged.
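That contract can be illustrated with a dict-based stand-in for `datasets.Dataset.map` (the real code uses the datasets library; this toy version only shows that untouched columns survive the rewrite):

```python
def map_prompt_column(rows, processed_prompts, prompt_column="prompt"):
    # Rewrite only the targeted column; every other key in each row is
    # copied through unchanged, so e.g. an "answer" column used by a
    # reward function is preserved.
    return [
        {**row, prompt_column: processed_prompts[i]}
        for i, row in enumerate(rows)
    ]
```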

@August-murr (Collaborator Author)

Just added parallel code execution using asyncio in 2197385 to make code generation more scalable.

Right now, we’re working with E2B, but if we find that alternatives like CodeSandbox.io, Modal, or Daytona are cheaper or faster, we’ll create wrappers for those too.
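The parallel-execution pattern described above can be sketched like this. `run_in_sandbox` is a stub standing in for a blocking sandbox round-trip (e.g. an E2B call); the function names here are assumptions for illustration:

```python
import asyncio

def run_in_sandbox(code: str) -> str:
    # Stub for a blocking sandbox round-trip (e.g. E2B code execution).
    return f"ran: {code}"

async def execute_all(snippets):
    loop = asyncio.get_running_loop()
    # Offload each blocking call to a worker thread so the snippets run
    # concurrently; asyncio.gather preserves input order in its results.
    tasks = [loop.run_in_executor(None, run_in_sandbox, s) for s in snippets]
    return await asyncio.gather(*tasks)

results = asyncio.run(execute_all(["print(1)", "print(2)"]))
```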

@August-murr (Collaborator Author)

Not sure when you tested it but peft + vllm should be fixed now

@qgallouedec
Can you share some code for a training run that worked for you? I've been using the example from the docs plus a LoRA config and use_vllm=True, and it hasn't been working. Not for agents, by the way; just training a model with the reward function in the docs.

@qgallouedec (Member)

Thanks for the feedback, what's the traceback?

@August-murr (Collaborator Author)

Thanks for the feedback, what's the traceback?

There is no error; everything's running, but the reward doesn't really seem to improve, and it's all over the place.
Notebook
This exact notebook, without the PEFT config, works perfectly, pretty much nailing it with no response (getting the max reward) after just a few steps. So I'm pretty convinced it has something to do with PEFT and not the reward function, the data, or other hyper-parameters.

@qgallouedec (Member)

Possibly related? #2873

@August-murr (Collaborator Author)

Possibly related? #2873

Can you share the code you used in #2873 (comment)
