[Docs Agent] Docs Agent release version 0.4.0 (#533)
* [Docs Agent] Docs Agent release version 0.4.0

- **Multi-modal support:** The Docs Agent CLI supports image, audio,
  and video files as part of a prompt to the Gemini model.
- **Formatted output:** Select the format of Docs Agent CLI's responses
  with the `--response_type json` and `--plaintext` options.
- **Autocomplete script:** The `autocomplete.sh` script is added to
  include Docs Agent CLI commands and options, making it easier and
  faster to use the Docs Agent CLI on a terminal.

* [Docs Agent] Docs Agent release version 0.4.0 (Files missed in the
previous commit)

kyolee415 authored Oct 29, 2024
1 parent 213c6ce commit 4203530
Showing 20 changed files with 2,819 additions and 2,049 deletions.
24 changes: 21 additions & 3 deletions examples/gemini/python/docs-agent/README.md
@@ -26,10 +26,10 @@ check out the [Set up Docs Agent][set-up-docs-agent] section below.

Docs Agent's `agent runtask` command allows you to run pre-defined chains of prompts,
which are referred to as **tasks**. These tasks simplify complex interactions by defining
-a series of steps that the Docs Agent will execute. The tasks are defined in `.yaml` files
-stored in the [`tasks`][tasks-dir] directory of your Docs Agent project. The tasks are
+a series of steps that the Docs Agent CLI will execute. The tasks are defined in `.yaml`
+files stored in the [`tasks`][tasks-dir] directory of your Docs Agent project. The tasks are
designed to be reusable and can be used to automate common workflows, such as generating
-release notes, updating documentation, or analyzing complex information.
+release notes, drafting overview pages, or analyzing complex information.

A task file example:

@@ -101,6 +101,16 @@ The list below summarizes the tasks and features supported by Docs Agent:
agent runtask --task DraftReleaseNotes
```

- **Multi-modal support**: Docs Agent's `agent helpme` command can include image,
audio, and video files as part of a prompt to the Gemini 1.5 model, for example:

```sh
agent helpme Provide a concise, descriptive alt text for this PNG image --file ./my_image_example.png
```

You can use this feature for creating tasks as well. For example, see the
[DescribeImages][describe-images] task.

For more information on Docs Agent's architecture and features,
see the [Docs Agent concepts][docs-agent-concepts] page.

@@ -241,6 +251,13 @@ Clone the Docs Agent project and install dependencies:
**Important**: From this point, all `agent` command lines below need to
run in this `poetry shell` environment.
5. (**Optional**) To enable autocomplete commands and flags related to
Docs Agent in your shell environment, run the following command:
```
source scripts/autocomplete.sh
```
### 5. Edit the Docs Agent configuration file
This guide uses the [open source Flutter documents][flutter-docs-src] as an example dataset,
@@ -458,3 +475,4 @@ Meggin Kearney (`@Meggin`), and Kyo Lee (`@kyolee415`).
[chunking-process]: docs/chunking-process.md
[new-15-mode]: docs/config-reference.md#app_mode
[tasks-dir]: tasks/
[describe-images]: tasks/describe-images-for-alt-text-task.yaml
@@ -235,6 +235,6 @@ function convertDriveFolder(folderName, outputFolderName="", indexFile="") {
insertRichText(sheet, md_chip, "E", row_number);
insertRichText(sheet, folder_chip, "I", row_number);
}
-return gdoc_count, pdf_count, new_file_count, updated_file_count, unchanged_file_count
-}
+return gdoc_count, pdf_count, new_file_count, updated_file_count, unchanged_file_count
+}
30 changes: 30 additions & 0 deletions examples/gemini/python/docs-agent/docs/cli-reference.md
@@ -258,6 +258,20 @@ For example:
agent helpme write a concept doc covering all features in this project? --allfiles ~/my-project --new
```

### Ask the model to print the output in JSON

The command below prints the output from the model in JSON format:

```sh
agent helpme <REQUEST> --response_type json
```

For example:

```sh
agent helpme how do I cook pasta? --response_type json
```
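
To consume this output from a script, one option is to call the CLI from Python and parse its standard output. The sketch below is a minimal, hypothetical wrapper: the helper names `build_helpme_command` and `ask_json` are not part of Docs Agent, and the code assumes that `agent` is on the `PATH` (inside the `poetry shell` environment) and that the command prints a single JSON document to stdout, which this reference does not guarantee.

```python
import json
import subprocess
from typing import List


def build_helpme_command(request: str) -> List[str]:
    """Build the argv list for `agent helpme <REQUEST> --response_type json`."""
    # Hypothetical helper: the request words are passed as separate
    # arguments, the same way a shell would split them.
    return ["agent", "helpme", *request.split(), "--response_type", "json"]


def ask_json(request: str):
    """Run the command and parse stdout as JSON.

    Assumption: stdout contains only one JSON document. If the CLI prints
    anything else, json.loads raises json.JSONDecodeError.
    """
    result = subprocess.run(
        build_helpme_command(request),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)


# Example call (requires a working Docs Agent setup):
#   answer = ask_json("how do I cook pasta?")
```

Building the command as a list, rather than a single shell string, avoids shell quoting problems when the request contains characters such as `?` or `*`.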

### Ask the model to run a pre-defined chain of prompts

The command below runs a task (a sequence of prompts) defined in
@@ -297,6 +311,22 @@ For example:
agent runtask --task IndexPageGenerator --custom_input ~/my_example/docs/development/
```

### Ask the model to print the output in plain text

By default, the `agent runtask` command uses Python's Rich console
to format its output. You can disable it by using the `--plaintext`
flag:

```sh
agent runtask --task <TASK> --plaintext
```

For example:

```sh
agent runtask --task DraftReleaseNotes --plaintext
```
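
Plain-text output is convenient when the result of a task is saved to a file or passed to another tool. The following sketch shows one way this might be scripted from Python. The helper names `build_runtask_command` and `run_task_to_file` are hypothetical, and combining `--custom_input` with `--plaintext` in one call is an assumption rather than documented behavior.

```python
import subprocess
from typing import List, Optional


def build_runtask_command(
    task: str, plaintext: bool = True, custom_input: Optional[str] = None
) -> List[str]:
    """Build the argv list for `agent runtask --task <TASK> [--plaintext]`."""
    command = ["agent", "runtask", "--task", task]
    if custom_input is not None:
        # Assumption: --custom_input can be combined with --plaintext.
        command += ["--custom_input", custom_input]
    if plaintext:
        # Disable Rich console formatting so stdout is plain text.
        command.append("--plaintext")
    return command


def run_task_to_file(task: str, output_path: str) -> None:
    """Run a task with --plaintext and write its stdout to output_path."""
    result = subprocess.run(
        build_runtask_command(task),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    with open(output_path, "w", encoding="utf-8") as output_file:
        output_file.write(result.stdout)


# Example call (requires a working Docs Agent setup):
#   run_task_to_file("DraftReleaseNotes", "release_notes.txt")
```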

## Managing online corpora

### List all existing online corpora
68 changes: 68 additions & 0 deletions examples/gemini/python/docs-agent/docs_agent/agents/docs_agent.py
@@ -17,6 +17,7 @@
"""Docs Agent"""

import typing
import os, pathlib

from absl import logging
import google.api_core
@@ -573,6 +574,73 @@ def ask_content_model_to_fact_check_prompt(self, context: str, prev_response: st
    def generate_embedding(self, text, task_type: str = "SEMANTIC_SIMILARITY"):
        return self.gemini.embed(text, task_type)[0]

    # Generate a response to an image
    def ask_model_about_image(self, prompt: str, image):
        if not prompt:
            prompt = "Describe this image:"
        if self.context_model.startswith("models/gemini-1.5"):
            try:
                # Adding prompt in the beginning allows long contextual
                # information to be added.
                response = self.gemini.generate_content([prompt, image])
            except google.api_core.exceptions.InvalidArgument:
                return self.config.conditions.model_error_message
        else:
            logging.error(f"The {self.context_model} can't read an image.")
            response = None
            exit(1)
        return response

    # Generate a response to audio
    def ask_model_about_audio(self, prompt: str, audio):
        if not prompt:
            prompt = "Describe this audio clip:"
        audio_size = os.path.getsize(audio)
        # Limit is 20MB
        if audio_size > 20000000:
            logging.error(f"The audio clip {audio} is too large: {audio_size} bytes.")
            exit(1)
        # Get the mime type of the audio file and trim the . from the extension.
        # Note: suffix[1:] drops the leading dot; suffix[:1] would keep only the dot.
        mime_type = "audio/" + pathlib.Path(audio).suffix[1:]
        audio_clip = {
            "mime_type": mime_type,
            "data": pathlib.Path(audio).read_bytes()
        }
        if self.context_model.startswith("models/gemini-1.5"):
            try:
                response = self.gemini.generate_content([prompt, audio_clip])
            except google.api_core.exceptions.InvalidArgument:
                return self.config.conditions.model_error_message
        else:
            logging.error(f"The {self.context_model} can't read an audio clip.")
            exit(1)
        return response

    # Generate a response to video
    def ask_model_about_video(self, prompt: str, video):
        if not prompt:
            prompt = "Describe this video clip:"
        video_size = os.path.getsize(video)
        # Limit is 2GB
        if video_size > 2147483648:
            logging.error(f"The video clip {video} is too large: {video_size} bytes.")
            exit(1)
        request_options = {
            "timeout": 600
        }
        # Currently unused: the uploaded file object is passed to the model instead.
        mime_type = "video/" + pathlib.Path(video).suffix[1:]
        video_clip_uploaded = self.gemini.upload_file(video)
        video_clip = self.gemini.get_file(video_clip_uploaded)
        if self.context_model.startswith("models/gemini-1.5"):
            try:
                response = self.gemini.generate_content([prompt, video_clip],
                                                        request_options=request_options)
            except google.api_core.exceptions.InvalidArgument:
                return self.config.conditions.model_error_message
        else:
            logging.error(f"The {self.context_model} can't see video clips.")
            exit(1)
        return response
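
The audio and video helpers above derive a MIME type from the file extension and enforce a size limit before calling the model. The standalone sketch below isolates those two steps so they can be checked without a model or an API key. The names `media_mime_type` and `is_within_limit` are hypothetical, the lowercasing is an addition for illustration, and mapping an extension directly to a MIME subtype is only an approximation (for example, `.mov` files are normally `video/quicktime`). Note that `suffix[1:]` drops the leading dot, whereas `suffix[:1]` would return only the dot.

```python
import pathlib

# Limits taken from the helpers above.
AUDIO_LIMIT_BYTES = 20_000_000      # 20 MB
VIDEO_LIMIT_BYTES = 2_147_483_648   # 2 GiB


def media_mime_type(kind: str, path: str) -> str:
    """Return a MIME type such as 'audio/mp3' for kind='audio', path='clip.mp3'."""
    # Path.suffix includes the leading dot ('.mp3'), so slice from index 1.
    extension = pathlib.Path(path).suffix[1:].lower()
    if not extension:
        raise ValueError(f"{path} has no file extension")
    return f"{kind}/{extension}"


def is_within_limit(size_bytes: int, limit_bytes: int) -> bool:
    """Return True when the file size does not exceed the limit."""
    return size_bytes <= limit_bytes
```

A caller could pass `os.path.getsize(path)` as `size_bytes` and log an error when `is_within_limit` returns `False`, mirroring the checks in the methods above.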

# Function to give an embedding function for gemini using an API key
def embedding_function_gemini_retrieval(api_key, embedding_model: str):
16 changes: 11 additions & 5 deletions examples/gemini/python/docs-agent/docs_agent/interfaces/README.md
@@ -101,10 +101,16 @@ from your `$HOME` directory.
poetry shell
```

-Entering the `poetry shell` environment is **required** for
-running the `agent` command.
+**Important**: You must always enter the `poetry shell` environment
+to run the `agent` command.

-2. Run the `agent helpme` command, for example:
+2. Enable autocomplete for Docs Agent CLI options in your environment:

```
source scripts/autocomplete.sh
```

3. Run the `agent helpme` command, for example:

```
agent helpme how do I cook pasta?
@@ -113,7 +119,7 @@ from your `$HOME` directory.
This command returns the Gemini model's response to your input prompt
`how do I cook pasta?`.

-3. View the list of Docs Agent tasks available in your setup:
+4. View the list of Docs Agent tasks available in your setup:

```
agent runtask
@@ -122,7 +128,7 @@ from your `$HOME` directory.
This command prints a list of Docs Agent tasks that you can run.
(See the `tasks` directory in your local Docs Agent setup.)

-4. Run the `agent runtask` command, for example:
+5. Run the `agent runtask` command, for example:

```
agent runtask --task IndexPageGenerator