Updates to Performance #346

dillonalaird · 2025-01-21T22:38:09Z

Several Updates:

Updated coding prompt to return only final answer and not additional information
If plan_context contains code, then coder will not write code, else coder will write code
get_tool_for_task can take in dictionary of images to differentiate different images (helps with choosing flux as a tool)
Update vision_agent_v2 prompt to not try to fix code on user's behalf
Parallelizes temporal_localization
Add inpainting and temporal localization to categorization request agent

camiloaz

one question, but LGTM

camiloaz · 2025-01-22T00:22:15Z

vision_agent/agent/vision_agent_planner_prompts_v2.py

+- "temporal localization" - localizing the time period an event occurs in a video.
+- "inpainting" - filling in masked parts of an image.


camiloaz · 2025-01-22T00:23:10Z

vision_agent/tools/planner_tools.py

@@ -248,6 +250,7 @@ def get_tool_for_task(
        - VQA
        - Depth and pose estimation
        - Video object tracking
+        - Image inpainting


what about temporal localization?

Oh good catch

hrnn · 2025-01-22T02:07:12Z

vision_agent/agent/vision_agent_planner_prompts_v2.py

@@ -651,22 +653,24 @@ def process_florence2_sam2_video_tracking(frames):
 """

 FINALIZE_PLAN = """
-**Role**: You are an expert AI model that can understand the user request and construct plans to accomplish it.
-
 **Task**: You are given a chain of thoughts, python executions and observations from a planning agent as it tries to construct a plan to solve a user request. Your task is to summarize the plan it found so that another programming agnet to write a program to accomplish the user request.


nit: not related to the pr

Suggested change

**Task**: You are given a chain of thoughts, python executions and observations from a planning agent as it tries to construct a plan to solve a user request. Your task is to summarize the plan it found so that another programming agnet to write a program to accomplish the user request.

**Task**: You are given a chain of thoughts, python executions and observations from a planning agent as it tries to construct a plan to solve a user request. Your task is to summarize the plan it found so that another programming agent to write a program to accomplish the user request.

Good catch!

dillonalaird added 12 commits January 20, 2025 12:15

add temporal localization to query

e785a90

reformatting

3bf808a

run temporal localization in parallel

5c7b32c

fix prompt for coding

6cfdcf6

update sim tools docs

849ff62

coder now generates code if no code provided in plan

214aff2

fix type checks

f03b2eb

fix spelling error

711af25

don't need role for o1

73634a4

allow passing dictionary to choose tool

5bdd211

add inpainting

94be334

fix side case for prompts

8006f3f

dillonalaird requested review from MingruiZhang and hrnn January 21, 2025 22:49

MingruiZhang approved these changes Jan 21, 2025

View reviewed changes

camiloaz approved these changes Jan 22, 2025

View reviewed changes

added temp localization

79c2b29

hrnn reviewed Jan 22, 2025

View reviewed changes

fix typo

ecbb9de

dillonalaird merged commit 8439354 into main Jan 22, 2025
8 checks passed

dillonalaird deleted the update-performance branch January 22, 2025 03:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Updates to Performance #346

Updates to Performance #346

dillonalaird commented Jan 21, 2025 •

edited

Loading

camiloaz left a comment •

edited

Loading

camiloaz Jan 22, 2025

camiloaz Jan 22, 2025

dillonalaird Jan 22, 2025

hrnn Jan 22, 2025

dillonalaird Jan 22, 2025

		- "temporal localization" - localizing the time period an event occurs in a video.
		- "inpainting" - filling in masked parts of an image.

	Task: You are given a chain of thoughts, python executions and observations from a planning agent as it tries to construct a plan to solve a user request. Your task is to summarize the plan it found so that another programming agnet to write a program to accomplish the user request.
	Task: You are given a chain of thoughts, python executions and observations from a planning agent as it tries to construct a plan to solve a user request. Your task is to summarize the plan it found so that another programming agent to write a program to accomplish the user request.

Updates to Performance #346

Updates to Performance #346

Conversation

dillonalaird commented Jan 21, 2025 • edited Loading

camiloaz left a comment • edited Loading

Choose a reason for hiding this comment

camiloaz Jan 22, 2025

Choose a reason for hiding this comment

camiloaz Jan 22, 2025

Choose a reason for hiding this comment

dillonalaird Jan 22, 2025

Choose a reason for hiding this comment

hrnn Jan 22, 2025

Choose a reason for hiding this comment

dillonalaird Jan 22, 2025

Choose a reason for hiding this comment

dillonalaird commented Jan 21, 2025 •

edited

Loading

camiloaz left a comment •

edited

Loading