Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(skills): add skills to the pandas-ai library #653

Merged
merged 12 commits into from
Oct 21, 2023

Conversation

ArslanSaleem
Copy link
Collaborator

@ArslanSaleem ArslanSaleem commented Oct 17, 2023

Summary by CodeRabbit

New Features:

  • Introduced a new functionality to add custom skills to the PandasAI agent, enhancing its capabilities and versatility.
  • Implemented a new feature to display employee salaries using a bar plot, improving data visualization.
  • Added a new SkillsManager class to manage and track custom skills, enhancing the system's organization and efficiency.

Improvements:

  • Enhanced the CodeManager class to provide additional context for code execution, improving code management.
  • Updated the SmartDatalake class to include skills in the prompt, enhancing user interaction.

Documentation:

  • Updated documentation with examples demonstrating how to add custom functions to the agent.

Tests:

  • Added new test cases for the SkillsManager class, skill decorator, and their interactions with other classes, ensuring robustness and reliability.

Refactor:

  • Refactored the CodeManager class to use a CodeExecutionContext object, improving code readability and maintainability.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 17, 2023

Walkthrough

This update introduces a new feature to the PandasAI library, allowing users to add custom skills to the agent. The changes include the creation of a SkillsManager class, a skill decorator, and modifications to the CodeManager and SmartDatalake classes. The update also includes relevant documentation and test cases.

Changes

File(s) Summary
docs/examples.md, docs/skills.md Added documentation for the new feature of adding custom skills to the agent.
examples/skills_example.py Demonstrated the usage of the new feature by adding a custom skill to plot employee salaries.
pandasai/.../__init__.py, pandasai/constants.py Updated import statements to include the new skill module.
pandasai/agent/__init__.py, pandasai/smart_dataframe/__init__.py, pandasai/smart_datalake/__init__.py Added a new method add_skills to allow adding custom skills to the agent.
pandasai/helpers/code_manager.py Modified the execute_code method to take a CodeExecutionContext object, which provides additional context for code execution.
pandasai/helpers/skills_manager.py Introduced a new SkillsManager class to manage and track the usage of custom skills.
pandasai/skills/__init__.py Introduced a new skill decorator to define and register custom skills.
tests/.../test_generate_python_code_prompt.py Updated test case to include the new "skills" variable.
tests/skills/test_skills.py, tests/test_codemanager.py Added new test cases for the SkillsManager class, skill decorator, and the interaction between SkillsManager, Agent, and SmartDataframe classes.

🐇💻

Code is hopping, skills are popping,

With each new feature, there's no stopping.

From dataframes to AI, we're non-stop,

In the world of code, we're at the top! 🎉🚀


Tips

Chat with CodeRabbit Bot (@coderabbitai)

  • Mention @coderabbitai in any review comment for bot assistance.
  • Note: Review comments are made on code diffs or files, not on the PR overview.

Pause Incremental Reviews

  • Insert @coderabbitai: ignore in the PR description to halt the bot's ongoing reviews. Remove the line to resume.
  • Additionally, enter @coderabbitai pause to halt the bot's ongoing reviews as any issue or review comment. To
    resume reviews, enter @coderabbitai resume as any issue or review comment.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 16

Commits Files that changed from the base of the PR and between 6e52568 and af4ce21.
Files ignored due to filter (1)
  • mkdocs.yml
Files selected for processing (15)
  • docs/examples.md (1 hunks)
  • docs/skills.md (1 hunks)
  • examples/skills_example.py (1 hunks)
  • pandasai/init.py (2 hunks)
  • pandasai/agent/init.py (2 hunks)
  • pandasai/assets/prompt_templates/generate_python_code.tmpl (1 hunks)
  • pandasai/constants.py (1 hunks)
  • pandasai/helpers/code_manager.py (8 hunks)
  • pandasai/helpers/skills_manager.py (1 hunks)
  • pandasai/skills/init.py (1 hunks)
  • pandasai/smart_dataframe/init.py (2 hunks)
  • pandasai/smart_datalake/init.py (7 hunks)
  • tests/prompts/test_generate_python_code_prompt.py (1 hunks)
  • tests/skills/test_skills.py (1 hunks)
  • tests/test_codemanager.py (6 hunks)
Files skipped from review due to trivial changes (5)
  • docs/examples.md
  • pandasai/init.py
  • pandasai/assets/prompt_templates/generate_python_code.tmpl
  • pandasai/constants.py
  • tests/prompts/test_generate_python_code_prompt.py
Additional comments (Suppressed): 23
pandasai/smart_dataframe/__init__.py (2)
  • 26-32: The import statement for the skill decorator is added. Ensure that the skill decorator is correctly implemented and available in the pandasai.skills module.

  • 320-327: The add_skills method is added to the PandasAI class. This method allows adding skills to the instance. Ensure that the add_skills method is correctly implemented in the SmartDatalake class and that it correctly handles the skills.

examples/skills_example.py (1)
  • 1-46: The new hunk introduces a new skill to the pandas-ai library. The skill is a function plot_salaries that takes a merged DataFrame and plots the employee salaries against names. The function is decorated with the @skill decorator, which registers the function as a skill with the provided attributes. The function is then added to an Agent instance using the add_skills method. The agent is then used to chat and execute the skill.

The code looks good overall, but there are a few points to consider:

  1. The plot_salaries function saves the plot to a file named "temp_chart.png". If this function is called multiple times, the previous plot will be overwritten. Consider generating a unique filename for each plot to avoid overwriting previous plots.

  2. The plot_salaries function does not return any value. The function is expected to return a string according to the function signature, but it does not. Consider returning a message indicating the success of the operation or the filename of the saved plot.

  3. The YOUR_API_KEY placeholder in line 39 should be replaced with a valid API key. Make sure to not expose the API key in the code or version control system for security reasons. Consider using environment variables or a secure method to store and retrieve sensitive data like API keys.

  4. The Agent instance is created with a memory_size of 10. Ensure that this is the intended size and that it is sufficient for the expected number of dataframes and skills.

  5. The Agent instance is created with two dataframes employees_df and salaries_df. Ensure that these dataframes are merged before calling the plot_salaries skill, as the skill expects a merged dataframe.

  6. The chat method is called with a hardcoded string. If this is a user input, ensure that it is sanitized and validated before use.

pandasai/helpers/skills_manager.py (1)
  • 96-98: The setter for used_skills allows replacing the entire list of used skills. If this is not intended, consider removing the setter to make used_skills read-only.
-    @used_skills.setter
-    def used_skills(self, value):
-        self._used_skills = value
pandasai/agent/__init__.py (2)
  • 1-7: The import statement for the skill decorator is added. This is used in the new add_skills method in the Agent class.

  • 45-52: The add_skills method is introduced to add skills to the Agent instance. This method simply forwards the skills to the SmartDatalake instance. Ensure that the add_skills method in the SmartDatalake class is implemented correctly and handles the skills appropriately.

pandasai/helpers/code_manager.py (5)
  • 208-220: The execute_code method now takes a CodeExecutionContext object as a parameter instead of a prompt_id. This change should be verified across the codebase to ensure that all calls to this method have been updated accordingly.

  • 238-241: The execute_code method now resets the used skills before executing the code. This is a good practice as it ensures that the skills are reset before each code execution, preventing potential issues with stale or incorrect skills being used.

  • 258-262: The execute_code method now adds the used skills to the environment before executing the code. This is a good practice as it ensures that the skills are available in the environment during code execution.

  • 411-414: The _clean_code method now takes a CodeExecutionContext object as a parameter. This change should be verified across the codebase to ensure that all calls to this method have been updated accordingly.

  • 439-447: The _clean_code method now checks for used skills inside the analyze_data function. This is a good practice as it ensures that the skills used in the function are correctly identified and tracked.

pandasai/skills/__init__.py (2)
  • 17-21: The func_def attribute is a string representation of the function definition. However, it's not clear what the purpose of this attribute is. If it's for debugging or logging, it might be better to use inspect.getsource(skill_function) to get the actual source code of the function, which would include any comments or other details in the function body. If it's for some other purpose, please clarify.

  • 23-37: The print attribute is a string representation of the function's information. However, it's not clear what the purpose of this attribute is. If it's for debugging or logging, it might be better to implement a __str__ or __repr__ method on the function object itself, which would be automatically called when the function is printed or converted to a string. If it's for some other purpose, please clarify.

tests/test_codemanager.py (5)
  • 1-6: The import of uuid has been removed and MagicMock has been added. Ensure that uuid is not used elsewhere in the code or it will cause a NameError. The addition of MagicMock is fine if it's used for mocking objects in the tests.

  • 10-16: The import of CodeExecutionContext has been added. This is fine if it's used in the tests.

  • 71-86: The exec_context fixture has been added and is used in the test_run_code_for_calculations and test_run_code_invalid_code tests. This is a good practice as it allows for better control and isolation of the test environment.

  • 108-164: The exec_context fixture is used in the test_clean_code_removes_jailbreak_code, test_clean_code_remove_environment_defaults, test_clean_code_whitelist_import, test_clean_code_raise_bad_import_error, and test_remove_dfs_overwrites tests. This is a good practice as it allows for better control and isolation of the test environment.

  • 189-199: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [178-199]

The exec_context fixture is used in the test_custom_whitelisted_dependencies test. This is a good practice as it allows for better control and isolation of the test environment.

pandasai/smart_datalake/__init__.py (5)
  • 23-31: The import of SkillsManager and skill from pandasai.helpers.skills_manager and pandasai.skills respectively, indicates the introduction of a new feature to manage skills. Ensure that these modules exist and are correctly implemented.

  • 41-47: The import of CodeExecutionContext from ..helpers.code_manager suggests that the code execution context is now being managed separately. This could improve the modularity and maintainability of the code.

  • 57-63: The addition of _skills: SkillsManager as a class attribute in the SmartDatalake class indicates that skills are now being managed at the instance level. This could enhance the functionality of the class by allowing it to handle skills.

  • 102-109: The instantiation of SkillsManager in the SmartDatalake constructor suggests that a new SkillsManager object is created for each instance of SmartDatalake. This could improve the encapsulation and modularity of the code.

  • 358-367: The creation of a CodeExecutionContext object in the _execute_code method and passing it to the execute_code method of CodeManager could improve the modularity and maintainability of the code by separating the execution context from the CodeManager. However, ensure that the execute_code method of CodeManager has been updated to accept a CodeExecutionContext object instead of a prompt_id.

docs/skills.md Outdated Show resolved Hide resolved
docs/skills.md Outdated Show resolved Hide resolved
Comment on lines +45 to +59
def get_skill_by_func_name(self, name: str):
"""
Get a skill by its name.

Args:
name (str): The name of the skill to retrieve.

Returns:
Skill or None: The skill with the given name, or None if not found.
"""
for skill in self._skills:
if skill.name == name:
return skill

return None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The get_skill_by_func_name method could be simplified and made more efficient by using a dictionary to store skills by their names, as suggested above. Here's a suggested change:

-    def get_skill_by_func_name(self, name: str):
-        for skill in self._skills:
-            if skill.name == name:
-                return skill
-        return None
+    def get_skill_by_func_name(self, name: str):
+        return self._skills.get(name)

pandasai/helpers/code_manager.py Show resolved Hide resolved
Comment on lines 4 to 41
# The @skill decorator to define and register a skill
def skill(name: str, description: str, usage: str):
def decorator(skill_function):
def wrapped_function(*args, **kwargs):
return skill_function(*args, **kwargs)

wrapped_function.name = skill_function.__name__
wrapped_function.skill_info = {
"name": name,
"description": description,
"usage": usage,
}
wrapped_function.func_def = (
"""def pandasai.skills.{funcion_name}{signature}""".format(
funcion_name=wrapped_function.name,
signature=str(inspect.signature(skill_function)),
)
)

wrapped_function.print = (
"""
<function>
Name: {name}
Description: {description}
Usage: {usage}
{signature}
</function
"""
).format(
name=name,
description=description,
usage=usage,
signature=wrapped_function.func_def,
)

return wrapped_function

return decorator
Copy link
Contributor

@coderabbitai coderabbitai bot Oct 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The skill decorator is well implemented. It wraps the original function and adds additional attributes to it. However, there's a minor typo in the closing tag of the function in the print attribute. It should be </function> instead of </function.

</function
</function>

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ArslanSaleem there's a small typo here!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like humans are talking. I'll be quiet for now. If you need me again, just tag @coderabbitai in a new comment.

Comment on lines 206 to 221
def test_add_skills_with_agent(self, agent: Agent):
# Define skills using the decorator
@skill("SkillA", "Description A", "Usage A")
def skill_a(*args, **kwargs):
return "SkillA Result"

@skill("SkillB", "Description B", "Usage B")
def skill_b(*args, **kwargs):
return "SkillB Result"

agent.add_skills(skill_a)
assert len(agent._lake._skills.skills) == 1

agent._lake._skills._skills = []
agent.add_skills(skill_a, skill_b)
assert len(agent._lake._skills.skills) == 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test test_add_skills_with_agent checks if the add_skills method of the Agent class correctly adds skills. However, it does not check if the method correctly handles the case when the same skill is added twice. It would be better to also check for this case to ensure that the method does not add the same skill twice.

+        # Test that adding the same skill again does not increase the number of skills
+        agent.add_skills(skill_a)
+        assert len(agent._lake._skills.skills) == 1

Comment on lines 223 to 238
def test_add_skills_with_smartDataframe(self, smart_dataframe: SmartDataframe):
# Define skills using the decorator
@skill("SkillA", "Description A", "Usage A")
def skill_a(*args, **kwargs):
return "SkillA Result"

@skill("SkillB", "Description B", "Usage B")
def skill_b(*args, **kwargs):
return "SkillB Result"

smart_dataframe.add_skills(skill_a)
assert len(smart_dataframe._lake._skills.skills) == 1

smart_dataframe._lake._skills._skills = []
smart_dataframe.add_skills(skill_a, skill_b)
assert len(smart_dataframe._lake._skills.skills) == 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test test_add_skills_with_smartDataframe checks if the add_skills method of the SmartDataframe class correctly adds skills. However, it does not check if the method correctly handles the case when the same skill is added twice. It would be better to also check for this case to ensure that the method does not add the same skill twice.

+        # Test that adding the same skill again does not increase the number of skills
+        smart_dataframe.add_skills(skill_a)
+        assert len(smart_dataframe._lake._skills.skills) == 1

Comment on lines 318 to 350
def test_code_exec_with_skills_no_use(
self, code_manager: CodeManager, exec_context: MagicMock
):
code = """def analyze_data(dfs):
return {'type': 'number', 'value': 1 + 1}"""
skill1 = MagicMock()
skill1.name = "SkillA"
exec_context._skills_manager._skills = [skill1]
code_manager.execute_code(code, exec_context)
assert len(exec_context._skills_manager.used_skills) == 0

def test_code_exec_with_skills(self, code_manager: CodeManager):
code = """def analyze_data(dfs):
plot_salaries()
return {'type': 'number', 'value': 1 + 1}"""

@skill(
name="Display employee salary",
description="Plots the employee salaries against names",
usage="Displays the plot having name on x axis and salaries on y axis",
)
def plot_salaries() -> str:
return "plot_salaries"

code_manager._middlewares = []

sm = SkillsManager()
sm.add_skills(plot_salaries)
exec_context = CodeExecutionContext(uuid.uuid4(), sm)
code_manager.execute_code(code, exec_context)

assert len(exec_context._skills_manager.used_skills) == 1
assert exec_context._skills_manager.used_skills[0] == "plot_salaries"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test test_code_exec_with_skills checks if the execute_code method of the CodeManager class correctly adds used skills. However, it does not check if the method correctly handles the case when the same skill is used twice in the code. It would be better to also check for this case to ensure that the method does not add the same skill twice to the used skills.

+        # Test that using the same skill twice in the code does not increase the number of used skills
+        code = """def analyze_data(dfs):
+            plot_salaries()
+            plot_salaries()
+            return {'type': 'number', 'value': 1 + 1}"""
+        code_manager.execute_code(code, exec_context)
+        assert len(exec_context._skills_manager.used_skills) == 1
+        assert exec_context._skills_manager.used_skills[0] == "plot_salaries"

Comment on lines 196 to 207
"""
self._code_manager.add_middlewares(*middlewares)

def add_skills(self, *skills: List[skill]):
"""
Add Skills to PandasAI
"""
self._skills.add_skills(*skills)

def _start_timer(self):
"""Start the timer"""

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addition of the add_skills method allows for the addition of skills to the SmartDatalake instance. This could enhance the functionality of the class by allowing it to handle skills. However, the type hint for the skills parameter should be skill instead of List[skill] as the * operator is used, which unpacks the arguments from a list.

-    def add_skills(self, *skills: List[skill]):
+    def add_skills(self, *skills: skill):

pandasai/smart_datalake/__init__.py Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 5

Commits Files that changed from the base of the PR and between af4ce21 and da513d0.
Files selected for processing (1)
  • pandasai/helpers/skills_manager.py (1 hunks)

Comment on lines +18 to +32
def add_skills(self, *skills):
"""
Add skills to the list of skills. If a skill with the same name
already exists, raise an error.

Args:
*skills: Variable number of skill objects to add.
"""
for skill in skills:
if any(
existing_skill.name == skill.name for existing_skill in self._skills
):
raise ValueError(f"Skill with name '{skill.name}' already exists.")

self._skills.extend(skills)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The add_skills method currently raises an error if a skill with the same name is added. This could be problematic if the user wants to update an existing skill. Consider providing an option to overwrite an existing skill.

-                raise ValueError(f"Skill with name '{skill.name}' already exists.")
+                existing_skill = next((s for s in self._skills if s.name == skill.name), None)
+                if existing_skill:
+                    if overwrite:
+                        self._skills.remove(existing_skill)
+                    else:
+                        raise ValueError(f"Skill with name '{skill.name}' already exists.")

Comment on lines 44 to 47
for skill in self._skills:
print(skill)

return any(skill.name == name for skill in self._skills)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The print statement on line 45 seems to be for debugging purposes and should be removed for the final version of the code.

-            print(skill)

pandasai/helpers/skills_manager.py Show resolved Hide resolved
Comment on lines +81 to +94
def prompt_display(self) -> str:
"""
Displays skills for prompt
"""
if len(self._skills) == 0:
return

return (
"""
You can also use the following functions, if relevant:

"""
+ self.__str__()
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prompt_display method returns None if there are no skills. This could potentially cause issues if the return value is expected to be a string. Consider returning an empty string or a message indicating that there are no skills.

-            return
+            return "No skills available."

pandasai/helpers/skills_manager.py Show resolved Hide resolved
@gventuri gventuri changed the base branch from main to release/v1.4 October 18, 2023 19:32
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 18, 2023

CodeRabbit review skipped

By default, CodeRabbit only reviews PRs on the default branch. If you wish to have PRs reviewed on additional branches, you can configure that under the repository settings in the UI.

Copy link
Collaborator

@gventuri gventuri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few minor changes, great job!

docs/skills.md Outdated
description="Plots the employee salaries against names",
usage="Displays the plot having name on x axis and salaries on y axis",
)
def plot_salaries(merged_df: pd.DataFrame) -> str:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably show a less rigid implementation. What I have in mind is more something that lets the LLM orchestrate and pass the variable accordingly.

For example:

def plot_salaries(names: ..., salaries: ...) -> ...:
    plt.bar(names, salaries)
    plt.xlabel("Employee Name")
    plt.ylabel("Salary")
    plt.title("Employee Salaries")
    plt.xticks(rotation=45)
    plt.savefig("temp_chart.png")
    plt.close()

So that basically is the LLM that figures out which data to pass to the skill. The skill shouldn't be specific for one use case/df, but a more "generalist" function that can be used in many similar use cases, letting the LLM orchestrate and figuring out which one is the right one to use (if any).

bool: True if a skill with the given name exists, False otherwise.
"""
for skill in self._skills:
print(skill)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover?

Comment on lines 4 to 41
# The @skill decorator to define and register a skill
def skill(name: str, description: str, usage: str):
def decorator(skill_function):
def wrapped_function(*args, **kwargs):
return skill_function(*args, **kwargs)

wrapped_function.name = skill_function.__name__
wrapped_function.skill_info = {
"name": name,
"description": description,
"usage": usage,
}
wrapped_function.func_def = (
"""def pandasai.skills.{funcion_name}{signature}""".format(
funcion_name=wrapped_function.name,
signature=str(inspect.signature(skill_function)),
)
)

wrapped_function.print = (
"""
<function>
Name: {name}
Description: {description}
Usage: {usage}
{signature}
</function
"""
).format(
name=name,
description=description,
usage=usage,
signature=wrapped_function.func_def,
)

return wrapped_function

return decorator
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ArslanSaleem there's a small typo here!

@codecov-commenter
Copy link

Codecov Report

❗ No coverage uploaded for pull request base (release/v1.4@710ba3a). Click here to learn what that means.
The diff coverage is n/a.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

@@               Coverage Diff               @@
##             release/v1.4     #653   +/-   ##
===============================================
  Coverage                ?   84.94%           
===============================================
  Files                   ?       70           
  Lines                   ?     3507           
  Branches                ?        0           
===============================================
  Hits                    ?     2979           
  Misses                  ?      528           
  Partials                ?        0           

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@gventuri gventuri merged commit e711c89 into release/v1.4 Oct 21, 2023
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants