Commit
bug fixes to the evaluate algorithm + speedups for reduce/equivalence. Added diagram to README and a short blurb on how to reproduce our results
maxzuo committed Jun 26, 2024
1 parent a482c8b commit 90b8d60
Showing 12 changed files with 532 additions and 428 deletions.
21 changes: 18 additions & 3 deletions README.md
@@ -32,9 +32,9 @@ rm -rf tmp
## Basic Usage
To evaluate a PDDL problem description, we can use the `planetarium.evaluate` module:
```python
-from planetarium import evaluate
+import planetarium
...
-evaluate.evaluate(gt_pddl_str, pred_pddl_str)
+planetarium.evaluate(gt_pddl_str, pred_pddl_str)
```
The supported domains are `blocksworld` and `gripper`.

@@ -47,6 +47,7 @@ from datasets import load_dataset

dataset = load_dataset("BatsResearch/planetarium")
```
+Here, `dataset["test"]` is the main test set used in the paper. You may evaluate on this set to reproduce our results.

You can reproduce the dataset, the splits, and a report by running the following command:
```bash
@@ -74,4 +75,18 @@ Total number of problems: $132,037$.
| $20$ - $40$ | $10,765$ | $2,112$ |
| $40$ - $60$ | $50,793$ | $9,412$ |
| $60$ - $80$ | $26,316$ | $25,346$ |
-| $80$ - inf | $3,464$ | $2,438$ |
+| $80$ - inf | $3,464$ | $2,438$ |
+
+## How it Works
+Planetarium🪐 compares two PDDL problem descriptions by first transcribing them into a graph representation.
+Graphs help us better detect and manipulate relationships between objects and propositions.
+Next, we build "fully specified" graph representations by adding "trivial" propositions (propositions that do not appear in the problem description but must hold in any state that satisfies that description).
+Finally, we use graph isomorphism to compare the fully specified graph representations of the two PDDL problem descriptions, either comparing the entire problem graph or the individual initial and goal scene graphs.
+This lets us check the correctness of the translation of the natural language description into PDDL, without ever needing to run a planner.
+
+Below is a flowchart providing an overview of the equivalence algorithm:
+
+![Equivalence Algorithm Overview](assets/equivalence.png)
+<p style="text-align: center;">(Left) Two planning problems, in PDDL problem description, real-world scenario, and graph representations. (Center) Fully specified graph representation. (Right) Graph isomorphism.</p>
+
+The key to this algorithm is a specially crafted "fully specify" function, which we build for each domain that we want to support. We provide implementations for the `blocksworld` and `gripper` domains in the `planetarium.oracle` module.
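Reviewer's note: the "fully specify, then compare up to isomorphism" idea in the README text above can be illustrated with a small standalone sketch. It is not the code in `planetarium.oracle` or `planetarium.metric`. The names `fully_specify` and `isomorphic`, the tuple encoding of propositions, the simplified blocksworld rule (every unheld block with nothing on it is `clear`, and the arm is empty when nothing is held), and the assumption that objects may be renamed freely are all made up for illustration. The brute-force search over renamings is only practical for tiny scenes.

```python
from itertools import permutations


def fully_specify(props):
    """Toy blocksworld rule (an assumption, not the library's oracle):
    add clear(x) for every unheld block with nothing on it, and
    arm-empty when no block is held."""
    props = set(props)
    blocks = {arg for p in props for arg in p[1:]}
    covered = {p[2] for p in props if p[0] == "on"}      # blocks with something on top
    held = {p[1] for p in props if p[0] == "holding"}
    props |= {("clear", b) for b in blocks - covered - held}
    if not held:
        props.add(("arm-empty",))
    return frozenset(props)


def isomorphic(a, b):
    """Brute-force check: is there a renaming of objects that maps
    proposition set `a` exactly onto proposition set `b`?"""
    objs_a = sorted({x for p in a for x in p[1:]})
    objs_b = sorted({x for p in b for x in p[1:]})
    if len(objs_a) != len(objs_b) or len(a) != len(b):
        return False
    for perm in permutations(objs_b):
        rename = dict(zip(objs_a, perm))
        renamed = {(p[0], *(rename[x] for x in p[1:])) for p in a}
        if renamed == set(b):
            return True
    return False


# Ground truth spells out the trivial propositions; the prediction omits them.
gt = {("on", "a", "b"), ("on-table", "b"), ("clear", "a"), ("arm-empty",)}
pred = {("on", "x", "y"), ("on-table", "y")}

print(isomorphic(gt, pred))                                # False: raw descriptions differ
print(isomorphic(fully_specify(gt), fully_specify(pred)))  # True: same scene once specified
```

The two descriptions only match after the trivial propositions are filled in, which is why the README stresses the per-domain "fully specify" step.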
Binary file added assets/equivalence.png
4 changes: 2 additions & 2 deletions evaluate.py
@@ -200,8 +200,8 @@ def result():
parseable = True

# reduce and further validate the LLM output
-oracle.reduce(llm_problem_graph.decompose()[0], validate=True)
-oracle.reduce(llm_problem_graph.decompose()[1], validate=True)
+oracle.reduce(llm_problem_graph.init())
+oracle.reduce(llm_problem_graph.goal())
valid = True

problem_graph = builder.build(problem_pddl)
4 changes: 3 additions & 1 deletion planetarium/__init__.py
@@ -1,7 +1,9 @@
-__all__ = ["builder", "downward", "graph", "metric", "oracle"]
+__all__ = ["builder", "downward", "graph", "metric", "oracle", "evaluate"]

from . import builder
from . import downward
from . import graph
from . import metric
from . import oracle
+
+from .evaluate import evaluate
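Reviewer's note: one consequence of `from .evaluate import evaluate` in a package's `__init__.py` is that the package attribute `evaluate` ends up bound to the function rather than to the submodule, which is what lets the README call `planetarium.evaluate(...)` directly. The sketch below reproduces this with a throwaway package. The name `demo_pkg` and its one-line `evaluate` function are hypothetical stand-ins and have nothing to do with the real library.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a temporary package that mirrors the layout in the diff:
# demo_pkg/__init__.py re-exports a function named like its submodule.
with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "demo_pkg"
    pkg_dir.mkdir()
    (pkg_dir / "evaluate.py").write_text(
        "def evaluate(a, b):\n    return a == b\n"
    )
    (pkg_dir / "__init__.py").write_text("from .evaluate import evaluate\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        demo_pkg = importlib.import_module("demo_pkg")
    finally:
        sys.path.remove(tmp)

# The package attribute is the function, so it can be called directly.
print(callable(demo_pkg.evaluate))

# The submodule itself is still reachable through sys.modules.
submodule = sys.modules["demo_pkg.evaluate"]
print(submodule.evaluate is demo_pkg.evaluate)
```

Code that previously relied on `planetarium.evaluate` being a module attribute (for example `planetarium.evaluate.evaluate(...)`) would need to switch to the direct call after a change of this shape.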
4 changes: 2 additions & 2 deletions planetarium/evaluate.py
@@ -3,7 +3,7 @@
from pddl.parser.problem import LenientProblemParser
from pddl.formatter import problem_to_string

-from planetarium import *
+from planetarium import builder, oracle, metric, downward


VALIDATE = os.getenv("VALIDATE", "Validate")
@@ -129,7 +129,7 @@ def evaluate(

if source_graph == target_graph:
equivalent = True
-elif source_graph.decompose()[0] != target_graph.decompose()[0]:
+elif not metric.equals(source_graph.init(), target_graph.init()):
equivalent = False
else:
equivalent = metric.equals(
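Reviewer's note: the hunk above orders its checks from cheapest to most expensive: exact equality first, then an early exit when the initial scenes differ, and only then a further `metric.equals` call whose arguments are cut off in this view. A minimal sketch of that ordering follows. `Problem`, `equivalent`, and `counting_equals` are hypothetical names, and the final comparison on goals is an assumption, since the truncated call is not visible here.

```python
from typing import Callable, NamedTuple


class Problem(NamedTuple):
    """Hypothetical stand-in for a problem graph with init and goal scenes."""
    init: frozenset
    goal: frozenset


def equivalent(
    source: Problem,
    target: Problem,
    equals: Callable[[frozenset, frozenset], bool],
) -> bool:
    if source == target:
        return True   # identical descriptions need no expensive comparison
    if not equals(source.init, target.init):
        return False  # a mismatched initial scene settles the answer early
    # Assumption: the remaining check compares goals; the real call is truncated.
    return equals(source.goal, target.goal)


calls = []


def counting_equals(a: frozenset, b: frozenset) -> bool:
    """Plain equality standing in for an expensive isomorphism check."""
    calls.append((a, b))
    return a == b


p = Problem(frozenset({"on a b"}), frozenset({"on b a"}))
q = Problem(frozenset({"on-table a"}), frozenset({"on b a"}))

same = equivalent(p, p, counting_equals)       # exact match, no expensive calls
calls_after_same = len(calls)
different = equivalent(p, q, counting_equals)  # stops after the init comparison
calls_after_different = len(calls)
```

Counting the calls makes the short-circuiting visible: the expensive comparison is skipped entirely for identical inputs and runs once when the initial scenes already disagree.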
