docs: update Agent documentation (#1333)
Summary:
- [new] Agent concepts (session, turn)
- [new] how to write custom tools
- [new] non-streaming API and how to get outputs
- [update] remaining `memory` -> `rag` rename
- [new] note importance of `instructions`

Test Plan:
read
ehhuang authored Mar 2, 2025
1 parent 46b0a40 commit 52977e5
Showing 6 changed files with 170 additions and 64 deletions.
91 changes: 91 additions & 0 deletions docs/source/building_applications/agent.md
@@ -0,0 +1,91 @@
# Llama Stack Agent Framework

The Llama Stack agent framework is built on a modular architecture that allows for flexible and powerful AI applications. This document explains the key components and how they work together.

## Core Concepts

### 1. Agent Configuration

Agents are configured using the `AgentConfig` class, which includes:

- **Model**: The underlying LLM to power the agent
- **Instructions**: System prompt that defines the agent's behavior
- **Tools**: Capabilities the agent can use to interact with external systems
- **Safety Shields**: Guardrails to ensure responsible AI behavior

```python
from llama_stack_client.types.agent_create_params import AgentConfig
from llama_stack_client.lib.agents.agent import Agent

# Configure an agent
agent_config = AgentConfig(
    model="meta-llama/Llama-3-70b-chat",
    instructions="You are a helpful assistant that can use tools to answer questions.",
    toolgroups=["builtin::code_interpreter", "builtin::rag/knowledge_search"],
)

# Create the agent
agent = Agent(llama_stack_client, agent_config)
```
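
A configuration can also include the safety shields mentioned above. The following is a minimal sketch, assuming a shield with the id `llama_guard` has already been registered with your stack (the identifier is illustrative):

```python
# Same configuration as above, extended with guardrails on input and output
agent_config = AgentConfig(
    model="meta-llama/Llama-3-70b-chat",
    instructions="You are a helpful assistant that can use tools to answer questions.",
    toolgroups=["builtin::code_interpreter", "builtin::rag/knowledge_search"],
    input_shields=["llama_guard"],   # checked before the model sees user input
    output_shields=["llama_guard"],  # checked before the response is returned
)
```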

### 2. Sessions

Agents maintain state through sessions, each of which represents a conversation thread:

```python
# Create a session
session_id = agent.create_session(session_name="My conversation")
```
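
An agent can hold several sessions at once, each keeping its own history. Here is a small sketch reusing the `agent` created above (the session names are illustrative):

```python
# Independent conversation threads for the same agent
support_session = agent.create_session(session_name="Customer support")
research_session = agent.create_session(session_name="Research notes")
```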

### 3. Turns

Each interaction with an agent is called a "turn" and consists of:

- **Input Messages**: What the user sends to the agent
- **Steps**: The agent's internal processing (inference, tool execution, etc.)
- **Output Message**: The agent's response

```python
from llama_stack_client.lib.agents.event_logger import EventLogger

# Create a turn with streaming response
turn_response = agent.create_turn(
    session_id=session_id,
    messages=[{"role": "user", "content": "Tell me about Llama models"}],
)
for log in EventLogger().log(turn_response):
    log.print()
```

### Non-Streaming

You can also get the complete turn in one call by passing `stream=False`; the response then exposes the input messages, the intermediate steps, and the final output message:

```python
from rich.pretty import pprint

# Non-streaming API
response = agent.create_turn(
    session_id=session_id,
    messages=[{"role": "user", "content": "Tell me about Llama models"}],
    stream=False,
)
print("Inputs:")
pprint(response.input_messages)
print("Output:")
pprint(response.output_message.content)
print("Steps:")
pprint(response.steps)
```

### 4. Steps

Each turn consists of multiple steps that represent the agent's thought process:

- **Inference Steps**: The agent generating text responses
- **Tool Execution Steps**: The agent using tools to gather information
- **Shield Call Steps**: Safety checks being performed
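
Reusing the non-streaming `response` from the previous section, the recorded steps can be inspected directly. This is a sketch: the `step_type` values mirror the list above, but treat the exact attributes on each step object as assumptions that may vary between versions:

```python
# Walk through the steps the agent took during the non-streaming turn above
for step in response.steps:
    if step.step_type == "inference":
        print("inference step")
    elif step.step_type == "tool_execution":
        print("tool execution step")
    elif step.step_type == "shield_call":
        print("shield call step")
```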

## Agent Execution Loop


Refer to the [Agent Execution Loop](agent_execution_loop) for more details on what happens within an agent turn.
30 changes: 19 additions & 11 deletions docs/source/building_applications/agent_execution_loop.md
@@ -13,7 +13,7 @@ Each agent turn follows these key steps:

3. **Inference Loop**: The agent enters its main execution loop:
- The LLM receives a user prompt (with previous tool outputs)
- The LLM generates a response, potentially with tool calls
- The LLM generates a response, potentially with [tool calls](tools)
- If tool calls are present:
- Tool inputs are safety-checked
- Tools are executed (e.g., web search, code execution)
@@ -68,6 +68,7 @@ Each step in this process can be monitored and controlled through configurations

```python
from llama_stack_client.lib.agents.event_logger import EventLogger
from rich.pretty import pprint

agent_config = AgentConfig(
model="Llama3.2-3B-Instruct",
@@ -108,14 +109,21 @@ response = agent.create_turn(

# Monitor each step of execution
for log in EventLogger().log(response):
    if log.event.step_type == "memory_retrieval":
        print("Retrieved context:", log.event.retrieved_context)
    elif log.event.step_type == "inference":
        print("LLM output:", log.event.model_response)
    elif log.event.step_type == "tool_execution":
        print("Tool call:", log.event.tool_call)
        print("Tool response:", log.event.tool_response)
    elif log.event.step_type == "shield_call":
        if log.event.violation:
            print("Safety violation:", log.event.violation)
    log.print()

# Using non-streaming API, the response contains input, steps, and output.
response = agent.create_turn(
    messages=[{"role": "user", "content": "Analyze this code and run it"}],
    attachments=[
        {
            "content": "https://raw.githubusercontent.com/example/code.py",
            "mime_type": "text/plain",
        }
    ],
    session_id=session_id,
)

pprint(f"Input: {response.input_messages}")
pprint(f"Output: {response.output_message.content}")
pprint(f"Steps: {response.steps}")
```
1 change: 0 additions & 1 deletion docs/source/building_applications/evals.md
@@ -149,7 +149,6 @@ agent_config = {
        }
    ],
    "tool_choice": "auto",
    "tool_prompt_format": "json",
    "input_shields": [],
    "output_shields": [],
    "enable_session_persistence": False,
18 changes: 10 additions & 8 deletions docs/source/building_applications/index.md
@@ -8,22 +8,24 @@ The best way to get started is to look at this notebook which walks through the

Here are some key topics that will help you build effective agents:

- **[Agent Execution Loop](agent_execution_loop)**
- **[RAG](rag)**
- **[Safety](safety)**
- **[Tools](tools)**
- **[Telemetry](telemetry)**
- **[Evals](evals)**

- **[Agent](agent)**: Understand the components and design patterns of the Llama Stack agent framework.
- **[Agent Execution Loop](agent_execution_loop)**: Understand how agents process information, make decisions, and execute actions in a continuous loop.
- **[RAG (Retrieval-Augmented Generation)](rag)**: Learn how to enhance your agents with external knowledge through retrieval mechanisms.
- **[Tools](tools)**: Extend your agents' capabilities by integrating with external tools and APIs.
- **[Evals](evals)**: Evaluate your agents' effectiveness and identify areas for improvement.
- **[Telemetry](telemetry)**: Monitor and analyze your agents' performance and behavior.
- **[Safety](safety)**: Implement guardrails and safety measures to ensure responsible AI behavior.

```{toctree}
:hidden:
:maxdepth: 1
agent
agent_execution_loop
rag
safety
tools
telemetry
evals
advanced_agent_patterns
safety
```
21 changes: 17 additions & 4 deletions docs/source/building_applications/rag.md
@@ -1,8 +1,8 @@
## Using "Memory" or Retrieval Augmented Generation (RAG)
## Using Retrieval Augmented Generation (RAG)

Memory enables your applications to reference and recall information from previous interactions or external documents.
RAG enables your applications to reference and recall information from previous interactions or external documents.

Llama Stack organizes the memory APIs into three layers:
Llama Stack organizes the APIs that enable RAG into three layers:
- the lowermost APIs deal with raw storage and retrieval. These include Vector IO, KeyValue IO (coming soon) and Relational IO (also coming soon).
- next is the "RAG Tool", a first-class tool as part of the Tools API that allows you to ingest documents (from URLs, files, etc.) with various chunking strategies and query them smartly.
- finally, it all comes together with the top-level "Agents" API that allows you to create agents that can use the tools to answer questions, perform tasks, and more.
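
As a rough sketch of how the lower layers are used, documents can be ingested into a vector database and queried through the RAG tool. The snippet below is illustrative: it assumes a vector db with the id `my_documents` has already been registered, and the `Document` import path and document URL are placeholders to adapt to your setup:

```python
from llama_stack_client.types import Document

# Ingest a document into a registered vector database via the RAG tool
client.tool_runtime.rag_tool.insert(
    documents=[
        Document(
            document_id="doc-1",
            content="https://example.com/pytorch_memory_tuning.html",
            mime_type="text/html",
            metadata={},
        )
    ],
    vector_db_id="my_documents",
    chunk_size_in_tokens=512,
)

# Query the same store directly, without going through an agent
result = client.tool_runtime.rag_tool.query(
    content="How to optimize memory in PyTorch?",
    vector_db_ids=["my_documents"],
)
```
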
@@ -86,7 +86,7 @@ from llama_stack_client.lib.agents.agent import Agent

# Configure agent with memory
agent_config = AgentConfig(
    model="meta-llama/Llama-3.2-3B-Instruct",
    model="meta-llama/Llama-3.3-70B-Instruct",
    instructions="You are a helpful assistant",
    enable_session_persistence=False,
    toolgroups=[
@@ -102,6 +102,19 @@ agent_config = AgentConfig(
agent = Agent(client, agent_config)
session_id = agent.create_session("rag_session")


# Ask questions about documents in the vector db, and the agent will query the db to answer the question.
response = agent.create_turn(
    messages=[{"role": "user", "content": "How to optimize memory in PyTorch?"}],
    session_id=session_id,
)
```

> **NOTE:** the `instructions` field in the `AgentConfig` can be used to guide the agent's behavior. It is important to experiment with different instructions to see what works best for your use case.

You can also pass documents along with the user's message and ask questions about them.
```python
# Initial document ingestion
response = agent.create_turn(
    messages=[
73 changes: 33 additions & 40 deletions docs/source/building_applications/tools.md
@@ -83,15 +83,15 @@ result = client.tool_runtime.invoke_tool(
)
```

#### Memory
#### RAG

The Memory tool enables retrieval of context from various types of memory banks (vector, key-value, keyword, and graph).
The RAG tool enables retrieval of context from various types of memory banks (vector, key-value, keyword, and graph).

```python
# Register Memory tool group
client.toolgroups.register(
toolgroup_id="builtin::memory",
provider_id="memory",
toolgroup_id="builtin::rag",
provider_id="faiss",
args={"max_chunks": 5, "max_tokens_in_context": 4096},
)
```
@@ -102,7 +102,7 @@ Features:
- Context retrieval with token limits


> **Note:** By default, llama stack run.yaml defines toolgroups for web search, code interpreter and memory, that are provided by tavily-search, code-interpreter and memory providers.
> **Note:** By default, llama stack run.yaml defines toolgroups for web search, code interpreter and rag, that are provided by tavily-search, code-interpreter and rag providers.
## Model Context Protocol (MCP) Tools

@@ -125,51 +125,44 @@ MCP tools require:
- Tools are discovered dynamically from the endpoint


## Tools provided by the client
## Adding Custom Tools

These tools are registered along with the agent config and are specific to the agent for which they are registered. The main difference between these tools and the tools provided by the built-in providers is that their execution is handled by the client: the agent transfers the tool call to the client and waits for the result from the client.
When you want to use tools other than the built-in tools, you can implement a python function and decorate it with `@client_tool`.

```python
# Example agent config with client provided tools
config = AgentConfig(
    toolgroups=[
        "builtin::websearch",
    ],
    client_tools=[ToolDef(name="client_tool", description="Client provided tool")],
)
```

To define a custom tool, you need to use the `@client_tool` decorator.

```python
from llama_stack_client.lib.agents.client_tool import client_tool


# Example tool definition
@client_tool
def my_tool(input: int) -> int:
    """
    Runs my awesome tool.

    :param input: some int parameter
    """
    return input * 2
```

> **NOTE:** We employ python docstrings to describe the tool and the parameters. It is important to document the tool and the parameters so that the model can use the tool correctly. It is recommended to experiment with different docstrings to see how they affect the model's behavior.

Once defined, simply pass the tool to the agent config. `Agent` will take care of the rest (calling the model with the tool definition, executing the tool, and returning the result to the model for the next iteration).

```python
# Example agent config with client provided tools
client_tools = [
    my_tool,
]

agent_config = AgentConfig(
    ...,
    client_tools=[client_tool.get_tool_definition() for client_tool in client_tools],
)
agent = Agent(client, agent_config, client_tools)
```

Refer to [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/e2e_loop_with_client_tools.py) for an example of how to use client provided tools.

## Tool Structure

Each tool has the following components:

- `name`: Unique identifier for the tool
- `description`: Human-readable description of the tool's functionality
- `parameters`: List of parameters the tool accepts
  - `name`: Parameter name
  - `parameter_type`: Data type (string, number, etc.)
  - `description`: Parameter description
  - `required`: Whether the parameter is required (default: true)
  - `default`: Default value if any

Example tool definition:

```python
{
    "name": "web_search",
    "description": "Search the web for information",
    "parameters": [
        {
            "name": "query",
            "parameter_type": "string",
            "description": "The query to search for",
            "required": True,
        }
    ],
}
```


## Tool Invocation

Tools can be invoked using the `invoke_tool` method:
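
A minimal call mirrors the `client.tool_runtime.invoke_tool` usage shown earlier in this file; the tool name and arguments below are illustrative:

```python
# Invoke a tool directly through the tool runtime
result = client.tool_runtime.invoke_tool(
    tool_name="web_search",
    kwargs={"query": "latest Llama model release"},
)
print(result.content)
```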
