AI Agents #91

QingyaFan · 2024-11-25T06:52:01Z

什么是 AI Agent

AI Agent 是可以感知环境，并且使用 LLM 和工具来执行各种任务和功能。其可通过外部信息源来克服 LLM 的一些限制，可以计划和执行需要多个步骤或子任务的复杂操作。 ¹

分类

2005 年，Russell and Norvig's 基于感知和执行能力的强弱将 Agents 分为 5 类：¹ （由于时间较早，而当前的agents 进展较快，分类标准不一定适用，这里不做过多解释）

Simple reflex agents
Model-based reflex agents
Goal-based agents
Utility-based agents
Learning agents

2013 年，Weiss 根据 decision-making 的实现方式定义了 4 类 Agents：

Logic-based agents：通过逻辑演绎来决定下一步的执行动作
Reactive agents：预定义了环境情形和执行动作的映射关系
Belief-desire-intention agents：通过操作desire的数据结构（定义不是很明确）
Layered architectures：多层次 decision-making，每个层次都有特定的推理职责

当前，LangGraph 根据 LLM 对流程控制程度不同，LangGraph 将 Agent 大致分为了三类：Router、Tool calling agent、Custom agent architectures。²

Router: 只做一个decision
Tool calling agent：相对于router，有多个步骤进行 decision making，可以在多个tools 中选择合适的去完成任务。这类有三个典型能力特征：Planning、Tool Use、Memory。ReAct 便是一个典型的实现。使用 LangGraph 的 create_react_agent 可以低成本地实现一个这类agent：创建一个和LLM写作并利用tool calling能力的 graph 工作流。³
Custom agent：人为参与流程、并行、子图、Reflection等

ReAct Agent

ReAct 是 Reasoning and Acting，结合了推理和行动能力，模拟人类在解决问题和执行任务时的思维和行动过程。例如，当用户提出一个问题时，ReAct agent 不仅生成文本答案，还可能根据需求自动执行相关操作，如查询天气、预定餐厅等。

实现

在 ReAct 提出时，还没有 Function/Tool Calling 能力，当前，结合 tool-calling 可以很方便地实现 ReAct。

应用场景

智能客服：提供更加智能和人性化的客户支持，能够理解复杂的问题并执行相应的解决步骤。
个人助理：帮助用户管理日常事务，如日程安排、提醒、信息查询等。
自动化办公：辅助完成诸如数据整理、报告生成、邮件回复等办公任务，提高工作效率。
教育辅导：作为智能导师，帮助学生解答问题、提供学习建议和资源。

协议或规范

大家都在讲多 agent 协作完成复杂任务，而协作就得有协议，目前还没有统一的标准，agent-protocol 算是比较多认可的一个协议。它定义了一组API，实现的Agent只需对外提供对应API，并符合规范定义的输入和输出即可。 ⁴ 主要的 API 有：

POST /ap/v1/agent/tasks, 创建 task
POST /ap/v1/agent/tasks/{id}/steps, 触发 task 的下一步

当前 agent-protocol 提供了一些 SDK，主要是 Python、JavaScript/TypeScript 的实现。

使用场景

The scope of possible use cases for agents is vast and ever-expanding.

LlamaIndex也认为Agents 使用场景广泛且使用场景在不断扩大，并且举了几个例子：

Agentic RAG: Agentic RAG是一个基于个性化数据的上下文增强（context-augmented）的研究助理，不仅能回答简单问题，还能辅助复杂的研究任务。⁵ RAG是 Retrieval-Augmented Generation，能对大型语言模型输出进行优化，使其能够在生成响应之前引用训练数据来源之外的权威知识库。LLM 技术的本质在 LLM 响应中引入了不可预测性。此外，LLM 训练数据是静态的，并引入了其所掌握知识的截止日期。RAG是解决其中一些挑战的一种方法。⁶
Report Generation
Customer Support
SQL Agent

现有 Agent 实现

实现的要点

支持推理循环工作流
Tool封装规范

OpenAI Assistant API

OpenAI 的 Assistant API 可以利用 model、tool、file 来响应用户的 Query。当前（2024.12）Assistant API 支持三种类型的 tool：Code Interpreter、File Search、Function Calling. ⁷

Quick start code. ⁸

from openai import OpenAI

client = OpenAI

assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a personal math tutor. Write and run code to answer math questions.",
    tools=[{"type": "code_interpreter"}],
    model="gpt-4o",
)

thread = client.beta.threads.create()

message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="I need to solve the equation `3x + 11 = 14`. Can you help me?"
)

Assistant 表示可以响应用户对话的助手，可以通过 model、instructions 和 tools 配置其使用的LLM、Prompt和工具（Function Calling）
Thread 表示用户与一个或多个Assistants的会话，用户开启一个对话时需要初始化一个Thread对象
Message 表示一次 chat，一个Thread最多 100,000 Message

LlamaIndex Agents

LlamaIndex对Agents的定义：⁵

LlamaIndex 当前支持三类 agent：

Function Calling Agent（integrates with any function calling LLM）
ReAct agent (works across any chat/text completion endpoint)
"Advanced Agents": LLMCompiler, Chain-of-Abstraction, Language Agent Tree Search, and more.

from llama_index.core.tools import FunctionTool
from llama_index.llms.ollama import Ollama
from llama_index.core.agent import ReActAgent
from llama_index.core import Settings
from llm_service import llm

Settings.llm = llm

def multiply(a: int, b: int) -> int:
    return a * b

multiply_tool = FunctionTool.from_defaults(fn=multiply)

agent = ReActAgent.from_tools([multiply_tool], llm=llm, verbose=True)

接下来agent就可以与你对话了：

agent.chat("What is 2123 * 215123")

# 输出
> Running step 98361e05-06fa-4c53-9bb6-833c0f44a24b. Step input: what is 2123 * 215123
Thought: The user wants me to multiply two numbers, let's say A = 2123 and B = 215123. I will use the multiply tool from the previous step.
Action: multiply
Action Input: {'a': 2123, 'b': 215123}
Observation: 456706129
> Running step c457ef99-9d2e-44ef-95b3-bc199494277a. Step input: None
Thought: (Implicit) I can answer without any more tools!
Answer: Answer: The product of the numbers 2123 and 215123 is 456706129.

Lower-Level API

上面用到的 ReActAgent 是 AgentRunner 与 AgentWorker 交互的封装，

todo：Lower-Agent Guide ⁹

自定义Agent

自定义一个Agent最简单的方式是定义一个 stateful function，并用 FnAgentWorker 包装一下。

def multiply_agent_fn(state: dict) -> tuple[dict[str, any], bool]:
    """mock agent input function"""
    if "max_count" not in state:
        raise ValueError("max_count must be specified.")
    
    if "__output__" not in state:
        state["__output__"] = int(state["__task__"].input)
        state["count"] = 0
    else:
        state["__output__"] = state["__output__"] * 2
        state["count"] += 1
    
    is_done = state["count"] >= state["max_count"]

    return state, is_done

from llama_index.core.agent import FnAgentWorker

agent = FnAgentWorker(
    fn=multiply_agent_fn, initial_state={"max_count": 5}
).as_agent()

agent.query("5")

自定义Agent的例子

todo: 数据库查询Agent ¹⁰，可以模仿来创建各个场景的Agent，例如查询Log、Metric等。

Tool 封装

todo

agent 关键组件的 LlamaIndex 实现

If you want to leverage core agentic ingredients in your workflow, LlamaIndex has robust abstractions for every agent sub-ingredient.

Query Planning: Routing, Sub-Questions, Query Transformations.
Function Calling and Tool Use: Check out our OpenAI, Mistral guides as examples.
Memory: Example guide for adding memory to RAG.

routing

Routers are modules that take in a user query and a set of "choices" (defined by metadata), and returns one or more selected choices. ¹¹

todo

Router Query Engine

Router Query Engine ¹²

LangChain Agents

from langchain_core.messages import HumanMessage
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent
from langchain_anthropic import ChatAnthropic
from langchain_community.tools.tavily_search import TavilySearchResults

# create the agent
memory = MemorySaver()
model = ChatAnthropic(model_name="claude-3-sonnet-20240229")
search = TavilySearchResults(max_results=2)
tools = [search]
agent_executor = create_react_agent(model, tools, checkpointer=memory)

# use the agent
config = {"configurable": {"thread_id": "abc123"}}
for chunk in agent_executor.stream(
    {"messages": [HumanMessage(content="whats the weather where I live?")]}
):
    print(chunk)
    print("----")

todo: 如何使用本地或自行部署的LLM服务？

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AI Agents #91

AI Agents #91

QingyaFan commented Nov 25, 2024 •

edited

Loading

AI Agents #91

AI Agents #91

Comments

QingyaFan commented Nov 25, 2024 • edited Loading

什么是 AI Agent

分类

ReAct Agent

实现

应用场景

协议或规范

使用场景

现有 Agent 实现

实现的要点

OpenAI Assistant API

LlamaIndex Agents

Lower-Level API

自定义Agent

自定义Agent的例子

Tool 封装

agent 关键组件的 LlamaIndex 实现

routing

Router Query Engine

LangChain Agents

Footnotes

QingyaFan commented Nov 25, 2024 •

edited

Loading