
Commit

Update Latest Review
didiforgithub committed Oct 28, 2024
1 parent 92e520d commit f0a3a3f
Showing 15 changed files with 274 additions and 346 deletions.
52 changes: 31 additions & 21 deletions examples/aflow/README.md
@@ -5,7 +5,7 @@ AFlow is a framework for automatically generating and optimizing Agentic Workflo
[Read our paper on arXiv](https://arxiv.org/abs/2410.10762)

<p align="center">
<a href=""><img src="../../docs/resources/AFLOW-performance.jpg" alt="Performance Of AFLOW" title="Performance of AFlow<sub>1</sub>" width="80%"></a>
<a href=""><img src="../../docs/resources/aflow/AFLOW-performance.jpg" alt="Performance Of AFLOW" title="Performance of AFlow<sub>1</sub>" width="80%"></a>
</p>

## Framework Components
@@ -17,7 +17,7 @@ AFlow is a framework for automatically generating and optimizing Agentic Workflo
- **Evaluator**: Assesses workflow performance on given tasks. Provides feedback to guide the optimization process towards more effective workflows. See `metagpt/ext/aflow/scripts/evaluator.py` for details.

<p align="center">
<a href=""><img src="../../docs/resources/AFLOW-method.jpg" alt="Performance Of AFLOW" title="Framework of AFlow <sub>1</sub>" width="80%"></a>
<a href=""><img src="../../docs/resources/aflow/AFLOW-method.jpg" alt="Performance Of AFLOW" title="Framework of AFlow <sub>1</sub>" width="80%"></a>
</p>

## Datasets
@@ -26,39 +26,49 @@ AFlow is a framework for automatically generating and optimizing Agentic Workflo
We conducted experiments on six datasets (HumanEval, MBPP, GSM8K, MATH, HotpotQA, DROP) and provide their evaluation code. The data can be found in this [datasets](https://drive.google.com/uc?export=download&id=1DNoegtZiUhWtvkd2xoIuElmIi4ah7k8e) link, or you can download them using `metagpt/ext/aflow/data/download_data.py`
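If you prefer to fetch the data programmatically, the downloader used by `examples/aflow/optimize.py` can be called directly; a minimal sketch, with argument names mirroring their use in that script:

```python
from metagpt.ext.aflow.data.download_data import download

# Fetch the benchmark datasets and the initial workflow rounds before the first run.
download(["datasets", "initial_rounds"], if_first_download=True)
```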

<p align="center">
<a href=""><img src="../../docs/resources/AFLOW-experiment.jpg" alt="Performance Of AFLOW" title="Comparison between AFlow and other methods <sub>1</sub>" width="80%"></a>
<a href=""><img src="../../docs/resources/aflow/AFLOW-experiment.jpg" alt="Performance Of AFLOW" title="Comparison between AFlow and other methods <sub>1</sub>" width="80%"></a>
</p>

### Custom Datasets
For custom tasks, you can reference the code in the `metagpt/ext/aflow/benchmark` folder. Inherit the `BaseBenchmark` class and implement `evaluate_problem`, `calculate_score`, and `get_result_columns` to add your custom dataset benchmark. Then, add your benchmark name in `metagpt/ext/aflow/scripts/evaluator.py` and `metagpt/ext/aflow/scripts/optimizer.py` to find effective workflows for your custom dataset.
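A minimal sketch of such a subclass is shown below; the exact signatures of `calculate_score` and `get_result_columns`, the `{question, answer}` problem schema, and the `(prediction, cost)` return shape of the workflow callable are assumptions — check the existing benchmarks in `metagpt/ext/aflow/benchmark` for the real interfaces.

```python
from typing import Any, Callable, List, Tuple

from metagpt.ext.aflow.benchmark.benchmark import BaseBenchmark


class MyTaskBenchmark(BaseBenchmark):
    def __init__(self, name: str, file_path: str, log_path: str):
        super().__init__(name, file_path, log_path)

    async def evaluate_problem(self, problem: dict, graph: Callable) -> Tuple[Any, ...]:
        # Run the candidate workflow on one problem and score its output.
        # The (prediction, cost) return shape of `graph` is an assumption.
        prediction, cost = await graph(problem["question"])
        score = self.calculate_score(problem["answer"], prediction)
        return problem["question"], prediction, problem["answer"], score, cost

    def calculate_score(self, expected: str, prediction: str) -> float:
        # Placeholder exact-match scoring; replace with task-specific logic.
        return float(expected.strip() == str(prediction).strip())

    def get_result_columns(self) -> List[str]:
        # Column names for the per-problem results written by the evaluator.
        return ["question", "prediction", "expected", "score", "cost"]
```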

## Quick Start

1. Configure your search in `optimize.py`:
- Open `examples/aflow/optimize.py`
- Set the following parameters:
1. Configure optimization parameters:
- Use command line arguments or modify default parameters in `examples/aflow/optimize.py`:
```python
dataset: DatasetType = "MATH" # Ensure the type is consistent with DatasetType
sample: int = 4 # Sample Count, which means how many workflows will be resampled from generated workflows
question_type: QuestionType = "math" # Ensure the type is consistent with QuestionType
optimized_path: str = "metagpt/ext/aflow/scripts/optimized" # Optimized Result Save Path
initial_round: int = 1 # Corrected the case from Initial_round to initial_round
max_rounds: int = 20 # The max iteration of AFLOW.
check_convergence: bool = True # Whether Early Stop
validation_rounds: int = 5 # The validation rounds of AFLOW.
if_fisrt_optimize = True # You should change it to False after the first optimize.
--dataset MATH # Dataset type (HumanEval/MBPP/GSM8K/MATH/HotpotQA/DROP)
--sample 4 # Sample count - number of workflows to be resampled
--question_type math # Question type (math/code/qa)
--optimized_path PATH # Optimized result save path
--initial_round 1 # Initial round
--max_rounds 20 # Max iteration rounds for AFLOW
--check_convergence # Whether to enable early stop
--validation_rounds 5 # Validation rounds for AFLOW
--if_first_optimize # Set True for first optimization, False afterwards
```
- Adjust these parameters according to your specific requirements and dataset
2. Set up parameters in `config/config2.yaml` (see `examples/aflow/config2.example.yaml` for reference)
3. Set the operator you want to use in `optimize.py` and in `optimized_path/template/operator.py`, `optimized_path/template/operator.json`. You can reference our implementation to add operators for specific datasets
4. When you first run, you can download the datasets and initial rounds by setting `download(["datasets", "initial_rounds"])` in `examples/aflow/optimize.py`

2. Configure LLM parameters in `config/config2.yaml` (see `examples/aflow/config2.example.yaml` for reference)

3. Set up operators in `optimize.py` and in `optimized_path/template/operator.py`, `optimized_path/template/operator.json`. You can reference our implementation to add operators for specific datasets

4. For first-time use, download datasets and initial rounds by setting `download(["datasets", "initial_rounds"])` in `examples/aflow/optimize.py`

5. (Optional) Add your custom dataset and corresponding evaluation function following the [Custom Datasets](#custom-datasets) section

6. (Optional) If you want to use a portion of the validation data, you can set `va_list` in `examples/aflow/evaluator.py`
6. Run `python -m examples.aflow.optimize` to start the optimization process!

7. Run the optimization:
```bash
# Using default parameters
python -m examples.aflow.optimize

# Or with custom parameters
python -m examples.aflow.optimize --dataset MATH --sample 4 --question_type math
```
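Once optimization converges, the best workflow can be evaluated on the test split. The closing comments of `examples/aflow/optimize.py` describe a `"Test"` mode alongside `"Graph"`; a minimal sketch, assuming the `Optimizer` instance created in that script:

```python
# Sketch: after optimizer.optimize("Graph") has finished, switch the mode
# to evaluate the best discovered workflow on the test set. The exact call
# is inferred from the comments at the end of examples/aflow/optimize.py.
optimizer.optimize("Test")
```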

## Reproduce the Results in the Paper
1. We provide the raw data obtained from our experiments (link), including the workflows and prompts generated in each iteration, as well as their trajectories on the validation dataset. We also provide the optimal workflow for each dataset and the corresponding data on the test dataset. You can download these data using `metagpt/ext/aflow/data/download_data.py`.
1. We provide the raw data obtained from our experiments ([download link](https://drive.google.com/uc?export=download&id=1Sr5wjgKf3bN8OC7G6cO3ynzJqD4w6_Dv)), including the workflows and prompts generated in each iteration, as well as their trajectories on the validation dataset. We also provide the optimal workflow for each dataset and the corresponding data on the test dataset. You can download these data using `metagpt/ext/aflow/data/download_data.py`.
2. You can directly reproduce our experimental results by running the scripts in `examples/aflow/experiments`.


8 changes: 0 additions & 8 deletions examples/aflow/experiments/optimize_drop.py
@@ -6,14 +6,6 @@
from metagpt.configs.models_config import ModelsConfig
from metagpt.ext.aflow.scripts.optimizer import DatasetType, Optimizer, QuestionType

# DatasetType, QuestionType, and OptimizerType definitions
# DatasetType = Literal["HumanEval", "MBPP", "GSM8K", "MATH", "HotpotQA", "DROP"]
# QuestionType = Literal["math", "code", "qa"]
# OptimizerType = Literal["Graph", "Test"]

# When you fisrt use, please download the datasets and initial rounds; If you want to get a look of the results, please download the results.
# download(["datasets", "initial_rounds"])

# Crucial Parameters
dataset: DatasetType = "DROP" # Ensure the type is consistent with DatasetType
sample: int = 4 # Sample Count, which means how many workflows will be resampled from generated workflows
8 changes: 0 additions & 8 deletions examples/aflow/experiments/optimize_gsm8k.py
@@ -6,14 +6,6 @@
from metagpt.configs.models_config import ModelsConfig
from metagpt.ext.aflow.scripts.optimizer import DatasetType, Optimizer, QuestionType

# DatasetType, QuestionType, and OptimizerType definitions
# DatasetType = Literal["HumanEval", "MBPP", "GSM8K", "MATH", "HotpotQA", "DROP"]
# QuestionType = Literal["math", "code", "qa"]
# OptimizerType = Literal["Graph", "Test"]

# When you fisrt use, please download the datasets and initial rounds; If you want to get a look of the results, please download the results.
# download(["datasets", "initial_rounds"])

# Crucial Parameters
dataset: DatasetType = "GSM8K" # Ensure the type is consistent with DatasetType
sample: int = 4 # Sample Count, which means how many workflows will be resampled from generated workflows
8 changes: 0 additions & 8 deletions examples/aflow/experiments/optimize_hotpotqa.py
@@ -6,14 +6,6 @@
from metagpt.configs.models_config import ModelsConfig
from metagpt.ext.aflow.scripts.optimizer import DatasetType, Optimizer, QuestionType

# DatasetType, QuestionType, and OptimizerType definitions
# DatasetType = Literal["HumanEval", "MBPP", "GSM8K", "MATH", "HotpotQA", "DROP"]
# QuestionType = Literal["math", "code", "qa"]
# OptimizerType = Literal["Graph", "Test"]

# When you fisrt use, please download the datasets and initial rounds; If you want to get a look of the results, please download the results.
# download(["datasets", "initial_rounds"])

# Crucial Parameters
dataset: DatasetType = "HotpotQA" # Ensure the type is consistent with DatasetType
sample: int = 4 # Sample Count, which means how many workflows will be resampled from generated workflows
8 changes: 0 additions & 8 deletions examples/aflow/experiments/optimize_humaneval.py
@@ -6,14 +6,6 @@
from metagpt.configs.models_config import ModelsConfig
from metagpt.ext.aflow.scripts.optimizer import DatasetType, Optimizer, QuestionType

# DatasetType, QuestionType, and OptimizerType definitions
# DatasetType = Literal["HumanEval", "MBPP", "GSM8K", "MATH", "HotpotQA", "DROP"]
# QuestionType = Literal["math", "code", "qa"]
# OptimizerType = Literal["Graph", "Test"]

# When you fisrt use, please download the datasets and initial rounds; If you want to get a look of the results, please download the results.
# download(["datasets", "initial_rounds"])

# Crucial Parameters
dataset: DatasetType = "HumanEval" # Ensure the type is consistent with DatasetType
sample: int = 4 # Sample Count, which means how many workflows will be resampled from generated workflows
8 changes: 0 additions & 8 deletions examples/aflow/experiments/optimize_math.py
@@ -6,14 +6,6 @@
from metagpt.configs.models_config import ModelsConfig
from metagpt.ext.aflow.scripts.optimizer import DatasetType, Optimizer, QuestionType

# DatasetType, QuestionType, and OptimizerType definitions
# DatasetType = Literal["HumanEval", "MBPP", "GSM8K", "MATH", "HotpotQA", "DROP"]
# QuestionType = Literal["math", "code", "qa"]
# OptimizerType = Literal["Graph", "Test"]

# When you fisrt use, please download the datasets and initial rounds; If you want to get a look of the results, please download the results.
# download(["datasets", "initial_rounds"])

# Crucial Parameters
dataset: DatasetType = "MATH" # Ensure the type is consistent with DatasetType
sample: int = 4 # Sample Count, which means how many workflows will be resampled from generated workflows
8 changes: 0 additions & 8 deletions examples/aflow/experiments/optimize_mbpp.py
@@ -6,14 +6,6 @@
from metagpt.configs.models_config import ModelsConfig
from metagpt.ext.aflow.scripts.optimizer import DatasetType, Optimizer, QuestionType

# DatasetType, QuestionType, and OptimizerType definitions
# DatasetType = Literal["HumanEval", "MBPP", "GSM8K", "MATH", "HotpotQA", "DROP"]
# QuestionType = Literal["math", "code", "qa"]
# OptimizerType = Literal["Graph", "Test"]

# When you fisrt use, please download the datasets and initial rounds; If you want to get a look of the results, please download the results.
# download(["datasets", "initial_rounds"])

# Crucial Parameters
dataset: DatasetType = "MBPP" # Ensure the type is consistent with DatasetType
sample: int = 4 # Sample Count, which means how many workflows will be resampled from generated workflows
64 changes: 37 additions & 27 deletions examples/aflow/optimize.py
@@ -3,25 +3,33 @@
# @Author : didi
# @Desc : Entrance of AFlow.

import argparse

from metagpt.configs.models_config import ModelsConfig
from metagpt.ext.aflow.data.download_data import download
from metagpt.ext.aflow.scripts.optimizer import DatasetType, Optimizer, QuestionType
from metagpt.ext.aflow.scripts.optimizer import Optimizer

# DatasetType, QuestionType, and OptimizerType definitions
# DatasetType = Literal["HumanEval", "MBPP", "GSM8K", "MATH", "HotpotQA", "DROP"]
# QuestionType = Literal["math", "code", "qa"]
# OptimizerType = Literal["Graph", "Test"]

# Crucial Parameters
dataset: DatasetType = "MATH" # Ensure the type is consistent with DatasetType
sample: int = 4 # Sample Count, which means how many workflows will be resampled from generated workflows
question_type: QuestionType = "math" # Ensure the type is consistent with QuestionType
optimized_path: str = "metagpt/ext/aflow/scripts/optimized" # Optimized Result Save Path
initial_round: int = 1 # Corrected the case from Initial_round to initial_round
max_rounds: int = 20 # The max iteration of AFLOW.
check_convergence: bool = True # Whether Early Stop
validation_rounds: int = 5 # The validation rounds of AFLOW.
if_fisrt_optimize = True # You should change it to False after the first optimize.

def parse_args():
parser = argparse.ArgumentParser(description="AFlow Optimizer")
parser.add_argument("--dataset", type=str, default="MATH", help="Dataset type")
parser.add_argument("--sample", type=int, default=4, help="Sample count")
parser.add_argument("--question_type", type=str, default="math", help="Question type")
parser.add_argument(
"--optimized_path", type=str, default="metagpt/ext/aflow/scripts/optimized", help="Optimized result save path"
)
parser.add_argument("--initial_round", type=int, default=1, help="Initial round")
parser.add_argument("--max_rounds", type=int, default=20, help="Max iteration rounds")
parser.add_argument("--check_convergence", type=bool, default=True, help="Whether to enable early stop")
parser.add_argument("--validation_rounds", type=int, default=5, help="Validation rounds")
parser.add_argument("--if_first_optimize", type=bool, default=True, help="Whether this is first optimization")
return parser.parse_args()


# Config llm model, you can modify `config/config2.yaml` to use more llms.
mini_llm_config = ModelsConfig.default().get("gpt-4o-mini")
@@ -37,24 +37,45 @@
"Programmer", # It's for math
]

# Create an optimizer instance
optimizer = Optimizer(
dataset=dataset, # Config dataset
question_type=question_type, # Config Question Type
opt_llm_config=claude_llm_config, # Config Optimizer LLM
exec_llm_config=mini_llm_config, # Config Execution LLM
check_convergence=check_convergence, # Whether Early Stop
operators=operators, # Config Operators you want to use
optimized_path=optimized_path, # Config Optimized workflow's file path
sample=sample, # Only Top(sample) rounds will be selected.
initial_round=initial_round, # Optimize from initial round
max_rounds=max_rounds, # The max iteration of AFLOW.
validation_rounds=validation_rounds, # The validation rounds of AFLOW.
)

if __name__ == "__main__":
args = parse_args()

# Create an optimizer instance
optimizer = Optimizer(
dataset=args.dataset, # Config dataset
question_type=args.question_type, # Config Question Type
opt_llm_config=claude_llm_config, # Config Optimizer LLM
exec_llm_config=mini_llm_config, # Config Execution LLM
check_convergence=args.check_convergence, # Whether Early Stop
operators=operators, # Config Operators you want to use
optimized_path=args.optimized_path, # Config Optimized workflow's file path
sample=args.sample, # Only Top(sample) rounds will be selected.
initial_round=args.initial_round, # Optimize from initial round
max_rounds=args.max_rounds, # The max iteration of AFLOW.
validation_rounds=args.validation_rounds, # The validation rounds of AFLOW.
)

# When you fisrt use, please download the datasets and initial rounds; If you want to get a look of the results, please download the results.
download(["datasets", "initial_rounds"], if_first_download=if_fisrt_optimize)
download(["datasets", "initial_rounds"], if_first_download=args.if_first_optimize)
# Optimize workflow via setting the optimizer's mode to 'Graph'
optimizer.optimize("Graph")
# Test workflow via setting the optimizer's mode to 'Test'
14 changes: 8 additions & 6 deletions metagpt/actions/action_node.py
@@ -19,12 +19,12 @@

from metagpt.actions.action_outcls_registry import register_action_outcls
from metagpt.const import USE_CONFIG_TIMEOUT
from metagpt.ext.aflow.scripts.utils import sanitize
from metagpt.llm import BaseLLM
from metagpt.logs import logger
from metagpt.provider.postprocess.llm_output_postprocess import llm_output_postprocess
from metagpt.utils.common import OutputParser, general_after_log
from metagpt.utils.human_interaction import HumanInteraction
from metagpt.utils.sanitize import sanitize


class ReviewMode(Enum):
@@ -527,7 +527,9 @@ def xml_compile(self, context):
"""
return context

async def code_fill(self, context, function_name=None, timeout=USE_CONFIG_TIMEOUT):
async def code_fill(
self, context: str, function_name: Optional[str] = None, timeout: int = USE_CONFIG_TIMEOUT
) -> Dict[str, str]:
"""
Fill CodeBlock Using ``` ```
"""
@@ -538,21 +540,21 @@ async def code_fill(self, context, function_name=None, timeout=USE_CONFIG_TIMEOU
result = {field_name: extracted_code}
return result

async def single_fill(self, context):
async def single_fill(self, context: str) -> Dict[str, str]:
field_name = self.get_field_name()
prompt = context
content = await self.llm.aask(prompt)
result = {field_name: content}
return result

async def xml_fill(self, context):
async def xml_fill(self, context: str) -> Dict[str, Any]:
"""
使用XML标签填充上下文并根据字段类型进行转换,包括字符串、整数、布尔值、列表和字典类型
Fill context with XML tags and convert according to field types, including string, integer, boolean, list and dict types
"""
field_names = self.get_field_names()
field_types = self.get_field_types()

extracted_data = {}
extracted_data: Dict[str, Any] = {}
content = await self.llm.aask(context)

for field_name in field_names:
14 changes: 9 additions & 5 deletions metagpt/ext/aflow/benchmark/benchmark.py
@@ -3,13 +3,15 @@
import os
from abc import ABC, abstractmethod
from datetime import datetime
from pathlib import Path
from typing import Any, Callable, List, Tuple

import aiofiles
import pandas as pd
from tqdm.asyncio import tqdm_asyncio

from metagpt.logs import logger
from metagpt.utils.common import write_json_file


class BaseBenchmark(ABC):
@@ -18,6 +20,9 @@ def __init__(self, name: str, file_path: str, log_path: str):
self.file_path = file_path
self.log_path = log_path

PASS = "PASS"
FAIL = "FAIL"

async def load_data(self, specific_indices: List[int] = None) -> List[dict]:
data = []
async with aiofiles.open(self.file_path, mode="r", encoding="utf-8") as file:
@@ -55,18 +60,17 @@ def log_mismatch(
"extracted_output": extracted_output,
"extract_answer_code": extract_answer_code,
}
log_file = os.path.join(self.log_path, "log.json")
if os.path.exists(log_file):
with open(log_file, "r", encoding="utf-8") as f:
log_file = Path(self.log_path) / "log.json"
if log_file.exists():
with log_file.open("r", encoding="utf-8") as f:
try:
data = json.load(f)
except json.JSONDecodeError:
data = []
else:
data = []
data.append(log_data)
with open(log_file, "w", encoding="utf-8") as f:
json.dump(data, f, indent=4, ensure_ascii=False)
write_json_file(log_file, data, encoding="utf-8", indent=4)

@abstractmethod
async def evaluate_problem(self, problem: dict, graph: Callable) -> Tuple[Any, ...]:
5 changes: 1 addition & 4 deletions metagpt/ext/aflow/benchmark/humaneval.py
@@ -6,17 +6,14 @@
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_fixed

from metagpt.ext.aflow.benchmark.benchmark import BaseBenchmark
from metagpt.ext.aflow.scripts.utils import sanitize
from metagpt.logs import logger
from metagpt.utils.sanitize import sanitize


class HumanEvalBenchmark(BaseBenchmark):
def __init__(self, name: str, file_path: str, log_path: str):
super().__init__(name, file_path, log_path)

PASS = "PASS"
FAIL = "FAIL"

class TimeoutError(Exception):
pass
