-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(skills): add skills to the pandas-ai library #653
Conversation
WalkthroughThis update introduces a new feature to the PandasAI library, allowing users to add custom skills to the agent. The changes include the creation of a Changes
TipsChat with CodeRabbit Bot (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 16
Files ignored due to filter (1)
- mkdocs.yml
Files selected for processing (15)
- docs/examples.md (1 hunks)
- docs/skills.md (1 hunks)
- examples/skills_example.py (1 hunks)
- pandasai/init.py (2 hunks)
- pandasai/agent/init.py (2 hunks)
- pandasai/assets/prompt_templates/generate_python_code.tmpl (1 hunks)
- pandasai/constants.py (1 hunks)
- pandasai/helpers/code_manager.py (8 hunks)
- pandasai/helpers/skills_manager.py (1 hunks)
- pandasai/skills/init.py (1 hunks)
- pandasai/smart_dataframe/init.py (2 hunks)
- pandasai/smart_datalake/init.py (7 hunks)
- tests/prompts/test_generate_python_code_prompt.py (1 hunks)
- tests/skills/test_skills.py (1 hunks)
- tests/test_codemanager.py (6 hunks)
Files skipped from review due to trivial changes (5)
- docs/examples.md
- pandasai/init.py
- pandasai/assets/prompt_templates/generate_python_code.tmpl
- pandasai/constants.py
- tests/prompts/test_generate_python_code_prompt.py
Additional comments (Suppressed): 23
pandasai/smart_dataframe/__init__.py (2)
26-32: The import statement for the
skill
decorator is added. Ensure that theskill
decorator is correctly implemented and available in thepandasai.skills
module.320-327: The
add_skills
method is added to thePandasAI
class. This method allows adding skills to the instance. Ensure that theadd_skills
method is correctly implemented in theSmartDatalake
class and that it correctly handles the skills.examples/skills_example.py (1)
- 1-46: The new hunk introduces a new skill to the pandas-ai library. The skill is a function
plot_salaries
that takes a merged DataFrame and plots the employee salaries against names. The function is decorated with the@skill
decorator, which registers the function as a skill with the provided attributes. The function is then added to anAgent
instance using theadd_skills
method. The agent is then used to chat and execute the skill.The code looks good overall, but there are a few points to consider:
The
plot_salaries
function saves the plot to a file named "temp_chart.png". If this function is called multiple times, the previous plot will be overwritten. Consider generating a unique filename for each plot to avoid overwriting previous plots.The
plot_salaries
function does not return any value. The function is expected to return a string according to the function signature, but it does not. Consider returning a message indicating the success of the operation or the filename of the saved plot.The
YOUR_API_KEY
placeholder in line 39 should be replaced with a valid API key. Make sure to not expose the API key in the code or version control system for security reasons. Consider using environment variables or a secure method to store and retrieve sensitive data like API keys.The
Agent
instance is created with amemory_size
of 10. Ensure that this is the intended size and that it is sufficient for the expected number of dataframes and skills.The
Agent
instance is created with two dataframesemployees_df
andsalaries_df
. Ensure that these dataframes are merged before calling theplot_salaries
skill, as the skill expects a merged dataframe.The
chat
method is called with a hardcoded string. If this is a user input, ensure that it is sanitized and validated before use.pandasai/helpers/skills_manager.py (1)
- 96-98: The setter for
used_skills
allows replacing the entire list of used skills. If this is not intended, consider removing the setter to makeused_skills
read-only.- @used_skills.setter - def used_skills(self, value): - self._used_skills = valuepandasai/agent/__init__.py (2)
1-7: The import statement for the
skill
decorator is added. This is used in the newadd_skills
method in theAgent
class.45-52: The
add_skills
method is introduced to add skills to theAgent
instance. This method simply forwards the skills to theSmartDatalake
instance. Ensure that theadd_skills
method in theSmartDatalake
class is implemented correctly and handles the skills appropriately.pandasai/helpers/code_manager.py (5)
208-220: The
execute_code
method now takes aCodeExecutionContext
object as a parameter instead of aprompt_id
. This change should be verified across the codebase to ensure that all calls to this method have been updated accordingly.238-241: The
execute_code
method now resets the used skills before executing the code. This is a good practice as it ensures that the skills are reset before each code execution, preventing potential issues with stale or incorrect skills being used.258-262: The
execute_code
method now adds the used skills to the environment before executing the code. This is a good practice as it ensures that the skills are available in the environment during code execution.411-414: The
_clean_code
method now takes aCodeExecutionContext
object as a parameter. This change should be verified across the codebase to ensure that all calls to this method have been updated accordingly.439-447: The
_clean_code
method now checks for used skills inside theanalyze_data
function. This is a good practice as it ensures that the skills used in the function are correctly identified and tracked.pandasai/skills/__init__.py (2)
17-21: The
func_def
attribute is a string representation of the function definition. However, it's not clear what the purpose of this attribute is. If it's for debugging or logging, it might be better to useinspect.getsource(skill_function)
to get the actual source code of the function, which would include any comments or other details in the function body. If it's for some other purpose, please clarify.23-37: The
__str__
or__repr__
method on the function object itself, which would be automatically called when the function is printed or converted to a string. If it's for some other purpose, please clarify.tests/test_codemanager.py (5)
1-6: The import of
uuid
has been removed andMagicMock
has been added. Ensure thatuuid
is not used elsewhere in the code or it will cause aNameError
. The addition ofMagicMock
is fine if it's used for mocking objects in the tests.10-16: The import of
CodeExecutionContext
has been added. This is fine if it's used in the tests.71-86: The
exec_context
fixture has been added and is used in thetest_run_code_for_calculations
andtest_run_code_invalid_code
tests. This is a good practice as it allows for better control and isolation of the test environment.108-164: The
exec_context
fixture is used in thetest_clean_code_removes_jailbreak_code
,test_clean_code_remove_environment_defaults
,test_clean_code_whitelist_import
,test_clean_code_raise_bad_import_error
, andtest_remove_dfs_overwrites
tests. This is a good practice as it allows for better control and isolation of the test environment.189-199: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [178-199]
The
exec_context
fixture is used in thetest_custom_whitelisted_dependencies
test. This is a good practice as it allows for better control and isolation of the test environment.pandasai/smart_datalake/__init__.py (5)
23-31: The import of
SkillsManager
andskill
frompandasai.helpers.skills_manager
andpandasai.skills
respectively, indicates the introduction of a new feature to manage skills. Ensure that these modules exist and are correctly implemented.41-47: The import of
CodeExecutionContext
from..helpers.code_manager
suggests that the code execution context is now being managed separately. This could improve the modularity and maintainability of the code.57-63: The addition of
_skills: SkillsManager
as a class attribute in theSmartDatalake
class indicates that skills are now being managed at the instance level. This could enhance the functionality of the class by allowing it to handle skills.102-109: The instantiation of
SkillsManager
in theSmartDatalake
constructor suggests that a newSkillsManager
object is created for each instance ofSmartDatalake
. This could improve the encapsulation and modularity of the code.358-367: The creation of a
CodeExecutionContext
object in the_execute_code
method and passing it to theexecute_code
method ofCodeManager
could improve the modularity and maintainability of the code by separating the execution context from theCodeManager
. However, ensure that theexecute_code
method ofCodeManager
has been updated to accept aCodeExecutionContext
object instead of aprompt_id
.
def get_skill_by_func_name(self, name: str): | ||
""" | ||
Get a skill by its name. | ||
|
||
Args: | ||
name (str): The name of the skill to retrieve. | ||
|
||
Returns: | ||
Skill or None: The skill with the given name, or None if not found. | ||
""" | ||
for skill in self._skills: | ||
if skill.name == name: | ||
return skill | ||
|
||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The get_skill_by_func_name
method could be simplified and made more efficient by using a dictionary to store skills by their names, as suggested above. Here's a suggested change:
- def get_skill_by_func_name(self, name: str):
- for skill in self._skills:
- if skill.name == name:
- return skill
- return None
+ def get_skill_by_func_name(self, name: str):
+ return self._skills.get(name)
pandasai/skills/__init__.py
Outdated
# The @skill decorator to define and register a skill | ||
def skill(name: str, description: str, usage: str): | ||
def decorator(skill_function): | ||
def wrapped_function(*args, **kwargs): | ||
return skill_function(*args, **kwargs) | ||
|
||
wrapped_function.name = skill_function.__name__ | ||
wrapped_function.skill_info = { | ||
"name": name, | ||
"description": description, | ||
"usage": usage, | ||
} | ||
wrapped_function.func_def = ( | ||
"""def pandasai.skills.{funcion_name}{signature}""".format( | ||
funcion_name=wrapped_function.name, | ||
signature=str(inspect.signature(skill_function)), | ||
) | ||
) | ||
|
||
wrapped_function.print = ( | ||
""" | ||
<function> | ||
Name: {name} | ||
Description: {description} | ||
Usage: {usage} | ||
{signature} | ||
</function | ||
""" | ||
).format( | ||
name=name, | ||
description=description, | ||
usage=usage, | ||
signature=wrapped_function.func_def, | ||
) | ||
|
||
return wrapped_function | ||
|
||
return decorator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The skill
decorator is well implemented. It wraps the original function and adds additional attributes to it. However, there's a minor typo in the closing tag of the function
in the print
attribute. It should be </function>
instead of </function
.
</function
</function>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ArslanSaleem there's a small typo here!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like humans are talking. I'll be quiet for now. If you need me again, just tag @coderabbitai in a new comment.
tests/skills/test_skills.py
Outdated
def test_add_skills_with_agent(self, agent: Agent): | ||
# Define skills using the decorator | ||
@skill("SkillA", "Description A", "Usage A") | ||
def skill_a(*args, **kwargs): | ||
return "SkillA Result" | ||
|
||
@skill("SkillB", "Description B", "Usage B") | ||
def skill_b(*args, **kwargs): | ||
return "SkillB Result" | ||
|
||
agent.add_skills(skill_a) | ||
assert len(agent._lake._skills.skills) == 1 | ||
|
||
agent._lake._skills._skills = [] | ||
agent.add_skills(skill_a, skill_b) | ||
assert len(agent._lake._skills.skills) == 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test test_add_skills_with_agent
checks if the add_skills
method of the Agent
class correctly adds skills. However, it does not check if the method correctly handles the case when the same skill is added twice. It would be better to also check for this case to ensure that the method does not add the same skill twice.
+ # Test that adding the same skill again does not increase the number of skills
+ agent.add_skills(skill_a)
+ assert len(agent._lake._skills.skills) == 1
tests/skills/test_skills.py
Outdated
def test_add_skills_with_smartDataframe(self, smart_dataframe: SmartDataframe): | ||
# Define skills using the decorator | ||
@skill("SkillA", "Description A", "Usage A") | ||
def skill_a(*args, **kwargs): | ||
return "SkillA Result" | ||
|
||
@skill("SkillB", "Description B", "Usage B") | ||
def skill_b(*args, **kwargs): | ||
return "SkillB Result" | ||
|
||
smart_dataframe.add_skills(skill_a) | ||
assert len(smart_dataframe._lake._skills.skills) == 1 | ||
|
||
smart_dataframe._lake._skills._skills = [] | ||
smart_dataframe.add_skills(skill_a, skill_b) | ||
assert len(smart_dataframe._lake._skills.skills) == 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test test_add_skills_with_smartDataframe
checks if the add_skills
method of the SmartDataframe
class correctly adds skills. However, it does not check if the method correctly handles the case when the same skill is added twice. It would be better to also check for this case to ensure that the method does not add the same skill twice.
+ # Test that adding the same skill again does not increase the number of skills
+ smart_dataframe.add_skills(skill_a)
+ assert len(smart_dataframe._lake._skills.skills) == 1
tests/skills/test_skills.py
Outdated
def test_code_exec_with_skills_no_use( | ||
self, code_manager: CodeManager, exec_context: MagicMock | ||
): | ||
code = """def analyze_data(dfs): | ||
return {'type': 'number', 'value': 1 + 1}""" | ||
skill1 = MagicMock() | ||
skill1.name = "SkillA" | ||
exec_context._skills_manager._skills = [skill1] | ||
code_manager.execute_code(code, exec_context) | ||
assert len(exec_context._skills_manager.used_skills) == 0 | ||
|
||
def test_code_exec_with_skills(self, code_manager: CodeManager): | ||
code = """def analyze_data(dfs): | ||
plot_salaries() | ||
return {'type': 'number', 'value': 1 + 1}""" | ||
|
||
@skill( | ||
name="Display employee salary", | ||
description="Plots the employee salaries against names", | ||
usage="Displays the plot having name on x axis and salaries on y axis", | ||
) | ||
def plot_salaries() -> str: | ||
return "plot_salaries" | ||
|
||
code_manager._middlewares = [] | ||
|
||
sm = SkillsManager() | ||
sm.add_skills(plot_salaries) | ||
exec_context = CodeExecutionContext(uuid.uuid4(), sm) | ||
code_manager.execute_code(code, exec_context) | ||
|
||
assert len(exec_context._skills_manager.used_skills) == 1 | ||
assert exec_context._skills_manager.used_skills[0] == "plot_salaries" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test test_code_exec_with_skills
checks if the execute_code
method of the CodeManager
class correctly adds used skills. However, it does not check if the method correctly handles the case when the same skill is used twice in the code. It would be better to also check for this case to ensure that the method does not add the same skill twice to the used skills.
+ # Test that using the same skill twice in the code does not increase the number of used skills
+ code = """def analyze_data(dfs):
+ plot_salaries()
+ plot_salaries()
+ return {'type': 'number', 'value': 1 + 1}"""
+ code_manager.execute_code(code, exec_context)
+ assert len(exec_context._skills_manager.used_skills) == 1
+ assert exec_context._skills_manager.used_skills[0] == "plot_salaries"
pandasai/smart_datalake/__init__.py
Outdated
""" | ||
self._code_manager.add_middlewares(*middlewares) | ||
|
||
def add_skills(self, *skills: List[skill]): | ||
""" | ||
Add Skills to PandasAI | ||
""" | ||
self._skills.add_skills(*skills) | ||
|
||
def _start_timer(self): | ||
"""Start the timer""" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The addition of the add_skills
method allows for the addition of skills to the SmartDatalake
instance. This could enhance the functionality of the class by allowing it to handle skills. However, the type hint for the skills
parameter should be skill
instead of List[skill]
as the *
operator is used, which unpacks the arguments from a list.
- def add_skills(self, *skills: List[skill]):
+ def add_skills(self, *skills: skill):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def add_skills(self, *skills): | ||
""" | ||
Add skills to the list of skills. If a skill with the same name | ||
already exists, raise an error. | ||
|
||
Args: | ||
*skills: Variable number of skill objects to add. | ||
""" | ||
for skill in skills: | ||
if any( | ||
existing_skill.name == skill.name for existing_skill in self._skills | ||
): | ||
raise ValueError(f"Skill with name '{skill.name}' already exists.") | ||
|
||
self._skills.extend(skills) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The add_skills
method currently raises an error if a skill with the same name is added. This could be problematic if the user wants to update an existing skill. Consider providing an option to overwrite an existing skill.
- raise ValueError(f"Skill with name '{skill.name}' already exists.")
+ existing_skill = next((s for s in self._skills if s.name == skill.name), None)
+ if existing_skill:
+ if overwrite:
+ self._skills.remove(existing_skill)
+ else:
+ raise ValueError(f"Skill with name '{skill.name}' already exists.")
pandasai/helpers/skills_manager.py
Outdated
for skill in self._skills: | ||
print(skill) | ||
|
||
return any(skill.name == name for skill in self._skills) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The print statement on line 45 seems to be for debugging purposes and should be removed for the final version of the code.
- print(skill)
def prompt_display(self) -> str: | ||
""" | ||
Displays skills for prompt | ||
""" | ||
if len(self._skills) == 0: | ||
return | ||
|
||
return ( | ||
""" | ||
You can also use the following functions, if relevant: | ||
|
||
""" | ||
+ self.__str__() | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The prompt_display
method returns None if there are no skills. This could potentially cause issues if the return value is expected to be a string. Consider returning an empty string or a message indicating that there are no skills.
- return
+ return "No skills available."
CodeRabbit review skipped By default, CodeRabbit only reviews PRs on the default branch. If you wish to have PRs reviewed on additional branches, you can configure that under the repository settings in the UI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few minor changes, great job!
docs/skills.md
Outdated
description="Plots the employee salaries against names", | ||
usage="Displays the plot having name on x axis and salaries on y axis", | ||
) | ||
def plot_salaries(merged_df: pd.DataFrame) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably show a less rigid implementation. What I have in mind is more something that lets the LLM orchestrate and pass the variable accordingly.
For example:
def plot_salaries(names: ..., salaries: ...) -> ...:
plt.bar(names, salaries)
plt.xlabel("Employee Name")
plt.ylabel("Salary")
plt.title("Employee Salaries")
plt.xticks(rotation=45)
plt.savefig("temp_chart.png")
plt.close()
So that basically is the LLM that figures out which data to pass to the skill. The skill shouldn't be specific for one use case/df, but a more "generalist" function that can be used in many similar use cases, letting the LLM orchestrate and figuring out which one is the right one to use (if any).
pandasai/helpers/skills_manager.py
Outdated
bool: True if a skill with the given name exists, False otherwise. | ||
""" | ||
for skill in self._skills: | ||
print(skill) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftover?
pandasai/skills/__init__.py
Outdated
# The @skill decorator to define and register a skill | ||
def skill(name: str, description: str, usage: str): | ||
def decorator(skill_function): | ||
def wrapped_function(*args, **kwargs): | ||
return skill_function(*args, **kwargs) | ||
|
||
wrapped_function.name = skill_function.__name__ | ||
wrapped_function.skill_info = { | ||
"name": name, | ||
"description": description, | ||
"usage": usage, | ||
} | ||
wrapped_function.func_def = ( | ||
"""def pandasai.skills.{funcion_name}{signature}""".format( | ||
funcion_name=wrapped_function.name, | ||
signature=str(inspect.signature(skill_function)), | ||
) | ||
) | ||
|
||
wrapped_function.print = ( | ||
""" | ||
<function> | ||
Name: {name} | ||
Description: {description} | ||
Usage: {usage} | ||
{signature} | ||
</function | ||
""" | ||
).format( | ||
name=name, | ||
description=description, | ||
usage=usage, | ||
signature=wrapped_function.func_def, | ||
) | ||
|
||
return wrapped_function | ||
|
||
return decorator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ArslanSaleem there's a small typo here!
Codecov Report
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. @@ Coverage Diff @@
## release/v1.4 #653 +/- ##
===============================================
Coverage ? 84.94%
===============================================
Files ? 70
Lines ? 3507
Branches ? 0
===============================================
Hits ? 2979
Misses ? 528
Partials ? 0 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Summary by CodeRabbit
New Features:
SkillsManager
class to manage and track custom skills, enhancing the system's organization and efficiency.Improvements:
CodeManager
class to provide additional context for code execution, improving code management.SmartDatalake
class to include skills in the prompt, enhancing user interaction.Documentation:
Tests:
SkillsManager
class,skill
decorator, and their interactions with other classes, ensuring robustness and reliability.Refactor:
CodeManager
class to use aCodeExecutionContext
object, improving code readability and maintainability.