-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Agents #2936
base: main
Are you sure you want to change the base?
Agents #2936
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
Super cool PR @August-murr! I'll check in details asap.
Can you share your code?
Definitely!
Do you have another simple while practical use case? |
trl/__init__.py
Outdated
from .agents import ( | ||
E2BExecutor, | ||
LocalExecutor, | ||
generate_agent_responses, | ||
get_code, | ||
prepare_data_for_e2b_agent, | ||
prepare_data_for_local_agent, | ||
read_script, | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this just after
if TYPE_CHECKING:
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
from .utils import ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from .utils import ( | |
from .utils import ( |
trl/agents/utils.py
Outdated
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
from ..import_utils import is_agents_available |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from ..import_utils import is_agents_available | |
from ..import_utils import is_agents_available |
trl/agents/utils.py
Outdated
if is_agents_available(): | ||
from inspect import getsource | ||
from pathlib import Path | ||
from typing import Callable, List, Optional | ||
|
||
from e2b_code_interpreter import Sandbox | ||
from langchain_experimental.utilities import PythonREPL | ||
from vllm import LLM, SamplingParams | ||
else: | ||
raise ImportError( | ||
"Agents utilities are not available. Please install trl with " "`pip install trl[agents]` to use utils" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if is_agents_available(): | |
from inspect import getsource | |
from pathlib import Path | |
from typing import Callable, List, Optional | |
from e2b_code_interpreter import Sandbox | |
from langchain_experimental.utilities import PythonREPL | |
from vllm import LLM, SamplingParams | |
else: | |
raise ImportError( | |
"Agents utilities are not available. Please install trl with " "`pip install trl[agents]` to use utils" | |
) | |
from inspect import getsource | |
from pathlib import Path | |
from typing import Callable, List, Optional | |
if e2b_available(): | |
from e2b_code_interpreter import Sandbox | |
if is_langchain_experimental_available(): | |
from langchain_experimental.utilities import PythonREPL | |
if is_vllm_available(): | |
from vllm import LLM, SamplingParams |
and move
raise ImportError(
"Agents utilities are not available. Please install trl with " "`pip install trl[agents]` to use utils"
)
directly into __init__
or in the functions that need it
trl/agents/utils.py
Outdated
Args: | ||
chat (str): The chat message containing the code snippet. | ||
tools_script (str, optional): A script to prepend to the extracted code snippet. Defaults to None. | ||
parsing_string (str, optional): The string used to identify the start of the code snippet in the chat message. Defaults to "<code>". | ||
Returns: | ||
str: The extracted code snippet, optionally prepended with the tools script. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Args: | |
chat (str): The chat message containing the code snippet. | |
tools_script (str, optional): A script to prepend to the extracted code snippet. Defaults to None. | |
parsing_string (str, optional): The string used to identify the start of the code snippet in the chat message. Defaults to "<code>". | |
Returns: | |
str: The extracted code snippet, optionally prepended with the tools script. | |
Args: | |
chat (`str`): | |
Chat message containing the code snippet. | |
tools_script (`str` or `None`, *optional*, defaults to `None`): | |
A script to prepend to the extracted code snippet. | |
parsing_string (`str`, *optional*, defaults to `"<code>"`): | |
String used to identify the start of the code snippet in the chat message. | |
Returns: | |
`str`: | |
Extracted code snippet, optionally prepended with the tools script. |
trl/agents/utils.py
Outdated
""" | ||
A class to handle code execution in an e2b sandbox environment. | ||
""" | ||
|
||
def __init__(self, api_key: str, dependencies: Optional[List[str]] = None, template: Optional[str] = None): | ||
""" | ||
Initialize the E2BExecutor with API key and optional settings. | ||
|
||
Args: | ||
api_key (str): Your E2B API Key. | ||
dependencies (Optional[List[str]]): A list of dependencies to install. Defaults to None. | ||
template (Optional[str]): Template for the sandbox environment. Defaults to None. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
""" | |
A class to handle code execution in an e2b sandbox environment. | |
""" | |
def __init__(self, api_key: str, dependencies: Optional[List[str]] = None, template: Optional[str] = None): | |
""" | |
Initialize the E2BExecutor with API key and optional settings. | |
Args: | |
api_key (str): Your E2B API Key. | |
dependencies (Optional[List[str]]): A list of dependencies to install. Defaults to None. | |
template (Optional[str]): Template for the sandbox environment. Defaults to None. | |
""" | |
""" | |
A class to handle code execution in an e2b sandbox environment. | |
Initialize the E2BExecutor with API key and optional settings. | |
Args: | |
api_key (`str`): | |
Your E2B API Key. | |
dependencies (`list[str]]` or `None`, *optional*, defaults to `None`): | |
A list of dependencies to install. | |
template (`str` or `None`, *optional*, defaults to `None`): | |
Template for the sandbox environment. | |
""" | |
def __init__(self, api_key: str, dependencies: Optional[List[str]] = None, template: Optional[str] = None): |
trl/agents/utils.py
Outdated
try: | ||
return Path(user_script_path).read_text() | ||
except Exception as e: | ||
raise RuntimeError(f"Error reading the user script: {e}") from e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try: | |
return Path(user_script_path).read_text() | |
except Exception as e: | |
raise RuntimeError(f"Error reading the user script: {e}") from e | |
return Path(user_script_path).read_text() |
Unless it really adds something, I'd avoid this. Unnecessarily wrapping exceptions can obscure tracebacks and make debugging harder
Can you add some tests and docs as well? |
Kaggle Notebook
|
Not sure when you tested it but peft + vllm should be fixed now |
trl/import_utils.py
Outdated
def is_agents_available() -> bool: | ||
return _langchain_experimental_available and _vllm_available and _e2b_available | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def is_agents_available() -> bool: | |
return _langchain_experimental_available and _vllm_available and _e2b_available |
I would not combine requirements here. Let's be fine grained where the deps are used
@qgallouedec I don't get why the tests failed. |
It's because of liger, merging main should solve this |
they have a fix in the 0.5.4 version |
Actually,I think it's 0.5.4 that contains the bug, that's why we pinned to 0.5.3: #2952 |
don't bother too much with windows. It a test fails, you can skip it |
@qgallouedec is there anything else needed?? |
processed_prompts = tokenizer.apply_chat_template(conversations, tokenize=False, add_generation_prompt=True) | ||
|
||
# Create a new dataset with processed prompts | ||
return dataset.map(lambda x, idx: {prompt_column: processed_prompts[idx]}, with_indices=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this remove all of the columns? If so this might break things for folks who put the answer in a column to verify the completion in the reward func.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, it doesn't. That was taken into consideration
the function only modifies the column "prompt" (or whatever column name you input) and keeps the other columns unchanged.
just added parallel code execution using asyncio in 2197385 to make code generation more scalable. Right now, we’re working with E2B, but if we find that alternatives like CodeSandbox.io, Modal, or Daytona are cheaper or faster, we’ll create wrappers for those too. |
@qgallouedec |
Thanks for the feedback, what's the traceback? |
there is no error, Everything's running, but the reward doesn't really seem to improve —the reward is all over the place. |
Possibly related? #2873 |
Can you share the code you used in #2873 (comment) |
How to use:
pip install trl[agents]
The default setting uses LocalExecuter, and I haven't trained any models with E2BExecuter yet.
I separated the data preprocessing from the GRPOTrainer script by using the prepare_data functions. This keeps changes to the trainer script minimal.
Performance:
I faced out-of-memory (OOM) issues and couldn't train an agent for more complex tasks. To show that it works, I trained the Qwen 2.5 0.5B to call code, using a reward function that counted the number of code calls in its output. The results showed an increase from 14% for the base model to 58% after 10 steps and 93% after 20 steps.
While the results look good, they don’t represent a practical use case. I think we should train an agent on a larger scale to achieve specific goals or benchmarks. We can also improve any factors that slow down training or scalability. In the end, we could write a blog post or report to showcase its effectiveness.
@qgallouedec, @lewtun, @edbeeching, @aymeric-roucher
Here are some ideas for training the agent: