refactor(Pipelines): Smart Data Frame Pipeline #735
Conversation
milind-sinaptik commented on Nov 7, 2023
- closes #xxxx (Replace xxxx with the GitHub issue number)
- Tests added and passed if fixing a bug or adding a new feature
- All code checks passed.
pandasai/smart_datalake/__init__.py
Outdated
"output_type_helper", output_type_helper | ||
) | ||
pipeline_context.add_intermediate_value("viz_lib_helper", viz_lib_helper) | ||
pipeline_context.add_intermediate_value("last_reasoning", self._last_reasoning) |
No need to pass this info to the pipeline, as it will only be set and never read within the pipeline
Same for last_answer, last_code_generated
used in pandasai/smart_datalake/code_generator.py
Only last_answer and last_reasoning are never used; getting rid of them.
pandasai/smart_datalake/__init__.py
Outdated
"last_code_generated", self._last_code_generated | ||
) | ||
pipeline_context.add_intermediate_value("get_prompt", self._get_prompt) | ||
pipeline_context.add_intermediate_value("llm", self.llm) |
The llm can be found within the config, no need to pass it as an additional value.
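A minimal sketch of that inside a logic unit (the `kwargs["context"]` access and the `context.config.llm` attribute are assumptions, not confirmed API):

```python
# Sketch: inside a logic unit's execute(), read the llm from the config the
# context already carries, instead of a dedicated intermediate value.
def execute(self, input, **kwargs):
    context = kwargs["context"]
    llm = context.config.llm  # rather than context.get_intermediate_value("llm")
    ...
```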
fixing this
pipeline_context.add_intermediate_value("code_manager", self._code_manager) | ||
pipeline_context.add_intermediate_value( | ||
"response_parser", self._response_parser | ||
) |
I suggest we move the full logic of code_manager and response_parser into whichever logic unit is most appropriate. If it is used elsewhere, we can still replicate the whole logic within the related logic unit.
There is not a lot of logic here; we are only passing the parser object, which in turn does the parsing with its parse method. Have a look at result_parsing.py: the whole logic is moved there.
@milind-sinaptik check pandasai/pipelines/logic_units/output_logic_unit.py to see how we moved ResponseParser.
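For readers following the thread, the pattern looks roughly like this; the import paths and the ResponseParser(context) constructor signature are assumptions based on the file names mentioned here:

```python
from pandasai.pipelines.base_logic_unit import BaseLogicUnit  # path assumed
from pandasai.responses.response_parser import ResponseParser  # path assumed


class ProcessOutput(BaseLogicUnit):
    """Sketch: the logic unit builds its own parser instead of receiving it
    from SmartDatalake as an intermediate value."""

    def execute(self, input, **kwargs):
        context = kwargs["context"]
        # ResponseParser(context) is an assumed constructor signature.
        return ResponseParser(context).parse(input)
```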
context=context,
logger=logger,
steps=[
    CodeGenerator(),
@milind-sinaptik can we add one more logic unit before the code generator to generate the prompt, and use the one that is already there to run/execute the LLM? You can check the example of the already implemented pipeline for the synthetic dataframe.
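Something along these lines (a sketch; PromptGeneration is a placeholder name, only CodeGenerator appears in this diff):

```python
# Sketch of the suggested step order; PromptGeneration is a placeholder name.
pipeline = Pipeline(
    context=context,
    logger=logger,
    steps=[
        PromptGeneration(),  # build the prompt from the context
        CodeGenerator(),     # send it to the LLM and extract the code
    ],
)
```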
done
pipeline_context.add_intermediate_value(
    "output_type_helper", output_type_helper
)
pipeline_context.add_intermediate_value("viz_lib_helper", viz_lib_helper)
@milind-sinaptik we don't need all of these intermediate values to be stored; only some might be needed, and viz_lib comes from the config, which is already present. These helpers can be called inside the logic unit.
done
self.last_result = result
self.logger.log(f"Answer: {result}")
result = GenerateSmartDatalakePipeline(pipeline_context, self.logger).run()
@milind-sinaptik I would say: why not pass query and output_type as a dict to pipeline.run(input)? This way we can reduce the usage of add_intermediate_value.
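A sketch of what that call site could look like, mirroring the run() line above (the key names are taken from this comment, not from merged code):

```python
# Sketch: feed the user request in as pipeline input instead of
# intermediate values.
result = GenerateSmartDatalakePipeline(pipeline_context, self.logger).run(
    {"query": query, "output_type": output_type}
)
```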
pipeline_context.add_intermediate_value(
    "last_code_generated", self._last_code_generated
)
pipeline_context.add_intermediate_value("get_prompt", self._get_prompt)
@milind-sinaptik this get_prompt functionality can be moved to the logic unit that takes care of prompt generation. This way we can completely decouple it from the SmartDatalake.
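Roughly (a sketch; the import paths, the prompt class, and the set_var/dfs API calls are assumptions):

```python
from pandasai.pipelines.base_logic_unit import BaseLogicUnit  # path assumed
from pandasai.prompts.generate_python_code import GeneratePythonCodePrompt  # path assumed


class PromptGeneration(BaseLogicUnit):
    """Sketch: own the prompt construction instead of calling back into
    SmartDatalake._get_prompt."""

    def execute(self, input, **kwargs):
        context = kwargs["context"]
        # set_var and the dfs attribute are assumptions about the prompt
        # and context APIs.
        prompt = GeneratePythonCodePrompt()
        prompt.set_var("dfs", context.dfs)
        return prompt
```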
…a Smart Lake pipeline
Co-authored-by: Sourcery AI <>
@@ -79,6 +78,10 @@ def run(self, data: Any = None) -> Any:
         try:
             for index, logic in enumerate(self._steps):
                 self._logger.log(f"Executing Step {index}: {logic.__class__.__name__}")
+
+                if logic.skip_if is not None and logic.skip_if(self._context):
+                    continue
@milind-sinaptik add a log when a step is skipped, for debugging purposes.
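For example, mirroring the existing "Executing Step" message (a sketch; the exact wording is assumed):

```python
if logic.skip_if is not None and logic.skip_if(self._context):
    self._logger.log(f"Skipping Step {index}: {logic.__class__.__name__}")
    continue
```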
done
@@ -628,11 +439,25 @@ def _validate_output(self, result: dict, output_type: Optional[str] = None):
             )
             raise ValueError("Output validation failed")

-    def _get_output_type_hint(self, output_type: Optional[str]) -> str:
+    def _get_viz_library_type(self) -> str:
@milind-sinaptik I think we don't need this function; it's not used.
this was not added by me
done
    query_exec_tracker=self._query_exec_tracker,
)
pipeline_context.add_intermediate_value("is_present_in_cache", False)
pipeline_context.add_intermediate_value(
@milind-sinaptik I think this output_type_helper does not need to be passed. We simply store Input = { query: str, output_type: str }, and whichever logic unit needs it gets the helper from the factory there.
query is already in the memory. The reason we are not providing it as input is that we would be providing output_type to all seven stages when only one needs it, which would be a bit unnecessary. Also, the output of one stage is the input of the next, so we would have to manipulate the output of each stage as well.
Ok, then instead of passing output_type_helper, store the raw output_type from the chat method and get it from output_type_factory where it is used.
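A sketch of that split (output_type_factory is the factory named in this thread; its signature here is an assumption):

```python
# Sketch: in chat(), store only the raw string...
pipeline_context.add_intermediate_value("output_type", output_type)

# ...and resolve the helper inside the one logic unit that needs it.
output_type_helper = output_type_factory(
    context.get_intermediate_value("output_type")
)
```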
done
)
pipeline_context.add_intermediate_value("get_prompt", self._get_prompt)
pipeline_context.add_intermediate_value("last_prompt_id", self.last_prompt_id)
pipeline_context.add_intermediate_value("skills", self._skills)
PipelineContext has skills in its constructor; they can be fetched from there.
done
"last_code_generated", self._last_code_generated | ||
) | ||
pipeline_context.add_intermediate_value("get_prompt", self._get_prompt) | ||
pipeline_context.add_intermediate_value("last_prompt_id", self.last_prompt_id) |
I think last_prompt_id can live in the query tracker. We need to store things in the right place, so I would say let's completely remove last_prompt_id from SmartDatalake and initialize it in the QueryExecTracker -> start_new_track method.
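A sketch of what that could look like (start_new_track is the method named here; generating the id with uuid4 is an assumption):

```python
import uuid


class QueryExecTracker:
    def start_new_track(self):
        # Sketch: the tracker owns the prompt id instead of SmartDatalake.
        self._last_prompt_id = uuid.uuid4()

    @property
    def last_prompt_id(self) -> uuid.UUID:
        return self._last_prompt_id
```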
done
pipeline_context.add_intermediate_value("get_prompt", self._get_prompt) | ||
pipeline_context.add_intermediate_value("last_prompt_id", self.last_prompt_id) | ||
pipeline_context.add_intermediate_value("skills", self._skills) | ||
pipeline_context.add_intermediate_value("code_manager", self._code_manager) |
We can move the code manager completely into the code-execution logic unit, because it is needed only there, and remove it completely from SmartDataLake.
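A sketch of that move (the import paths, the CodeManager constructor arguments, and the execute_code entry point are all assumptions):

```python
from pandasai.helpers.code_manager import CodeManager  # path assumed
from pandasai.pipelines.base_logic_unit import BaseLogicUnit  # path assumed


class CodeExecution(BaseLogicUnit):
    """Sketch: construct the code manager where it is used, so SmartDatalake
    no longer needs a _code_manager attribute."""

    def execute(self, input, **kwargs):
        context = kwargs["context"]
        # Constructor arguments are assumptions about CodeManager's API.
        code_manager = CodeManager(
            dfs=context.dfs, config=context.config, logger=kwargs.get("logger")
        )
        return code_manager.execute_code(input)  # assumed entry point
```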
done
def _add_result_to_memory(self, result: dict):
pipeline_context = PipelineContext(
@milind-sinaptik I would say construct the PipelineContext and Pipeline in the constructor of SmartDatalake, and in the chat method use it like self._pipeline.run({"query": query, "output_type": output_type}).
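A sketch of the suggested wiring (the import path and the PipelineContext constructor arguments are assumptions):

```python
from pandasai.pipelines.pipeline_context import PipelineContext  # path assumed


class SmartDatalake:
    def __init__(self, dfs, config=None, logger=None):
        # Sketch: build the context and pipeline once, up front.
        self._context = PipelineContext(dfs=dfs, config=config)
        self._pipeline = GenerateSmartDatalakePipeline(self._context, logger)

    def chat(self, query, output_type=None):
        return self._pipeline.run({"query": query, "output_type": output_type})
```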
done
"code": code, | ||
"error_returned": e, | ||
} | ||
error_correcting_instruction = context.get_intermediate_value("get_prompt")( |
@milind-sinaptik why not move the get_prompt logic here, into this function, and set the vars that are necessary for CorrectErrorPrompt?
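i.e. something like this (a sketch; the import path and the set_var API are assumptions):

```python
from pandasai.prompts.correct_error_prompt import CorrectErrorPrompt  # path assumed

# Sketch: build the prompt inline instead of going through get_prompt.
prompt = CorrectErrorPrompt()
prompt.set_var("code", code)
prompt.set_var("error_returned", e)
```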
I tried moving this, but some tests start failing when I move this function. Putting this aside for now.
self._add_result_to_memory(result=result, context=pipeline_context)

result = pipeline_context.query_exec_tracker.execute_func(
    pipeline_context.get_intermediate_value("response_parser").parse, result
@milind-sinaptik the ResponseParser object can be constructed here.
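A sketch of that, mirroring the lines above (the ResponseParser(context) constructor signature is an assumption):

```python
# Sketch: construct the parser at the point of use instead of passing it in
# as an intermediate value.
result = pipeline_context.query_exec_tracker.execute_func(
    ResponseParser(pipeline_context).parse, result
)
```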