Add Simple and Advanced Geometry Dataset Generators #29

Open
4 tasks
Schmeitzke opened this issue Jan 30, 2025 · 2 comments · Fixed by #38

@Schmeitzke
Contributor

Schmeitzke commented Jan 30, 2025

Description
We would like to introduce two new dataset generators for geometry:

  1. Simple Geometry Dataset for basic angle-finding tasks (e.g., interior angles of polygons).
  2. Advanced Geometry Dataset for more complex problems (e.g., orthocenter, incircle radius, angle chasing using coordinate geometry).

Proposed Solution

  1. Simple Geometry

    • Task Scope: Generate problems about regular or general convex polygons (e.g., “Find the missing interior angle in a convex polygon”).
    • Implementation:
      • Configuration class (SimpleGeometryConfig) with parameters for number of sides, angle ranges, and dataset size.
      • Dataset class (SimpleGeometryDataset) that uses a known formula (sum of interior angles = 180°(n−2)) to verify correctness.
    • Output: A text prompt (the polygon angles) and a numeric answer (the missing angle).
  2. Advanced Geometry

    • Task Scope: Coordinate-based geometry with SymPy for verification (e.g., orthocenter, incircle radius, angle measures).
    • Implementation:
      • Configuration class (AdvancedGeometryConfig) specifying coordinate ranges, possible tasks, and dataset size.
      • Dataset class (AdvancedGeometryDataset) that:
        • Randomly generates non-degenerate triangles.
        • Selects a task type (orthocenter, incircle, angle measure, etc.).
        • Uses SymPy geometry methods to compute exact/approximate solutions.
    • Output: A text prompt (e.g., triangle coordinates and requested property) and a numeric or symbolic solution.
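
As a rough illustration of the advanced generator, the sketch below draws a non-degenerate triangle with integer coordinates, picks a task, and computes the answer with SymPy's `Point` and `Triangle`. The task names, the `make_item` and `solve` helpers, the coordinate range, and the rounding are assumptions made for this example, not the final API.

```python
import math
import random

from sympy import Point, Triangle

# Hypothetical task names, chosen only for this sketch.
TASKS = ["orthocenter", "incircle_radius", "angle_measure"]


def random_triangle(rng, lo=-10, hi=10):
    """Draw integer vertices until they are not collinear."""
    while True:
        pts = [Point(rng.randint(lo, hi), rng.randint(lo, hi)) for _ in range(3)]
        # Repeated or collinear points would not form a triangle.
        if not Point.is_collinear(*pts):
            return Triangle(*pts)


def solve(tri, task, decimals=2):
    """Compute the answer exactly with SymPy, then round for display."""
    if task == "orthocenter":
        h = tri.orthocenter
        return f"({float(h.x):.{decimals}f}, {float(h.y):.{decimals}f})"
    if task == "incircle_radius":
        # abs() is a guard: polygon area in SymPy is signed by orientation.
        return f"{float(abs(tri.inradius)):.{decimals}f}"
    if task == "angle_measure":
        # tri.angles maps each vertex to its interior angle in radians.
        angle = tri.angles[tri.vertices[0]]
        return f"{math.degrees(float(angle)):.{decimals}f}"
    raise ValueError(f"unknown task: {task}")


def make_item(rng):
    """Build one question/answer pair from a seeded random generator."""
    tri = random_triangle(rng)
    a, b, c = tri.vertices
    task = rng.choice(TASKS)
    coords = f"A=({a.x}, {a.y}), B=({b.x}, {b.y}), C=({c.x}, {c.y})"
    prompts = {
        "orthocenter": "find the coordinates of its orthocenter",
        "incircle_radius": "find the radius of its incircle",
        "angle_measure": "find the measure of the angle at A in degrees",
    }
    question = f"Given triangle ABC with {coords}, {prompts[task]}."
    return {"question": question, "answer": solve(tri, task), "task": task}
```

For the 3-4-5 right triangle with vertices (0, 0), (4, 0), (0, 3), this gives an orthocenter at the right-angle vertex, an incircle radius of 1, and a 90° angle at the first vertex.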

Tasks

  • Create configuration and dataset classes for both simple and advanced geometry.
  • Implement unit tests for deterministic generation, correct structure, and verification logic.
  • Ensure code aligns with existing repository conventions (e.g., ProceduralDataset base class, register_dataset).
  • Document configuration parameters, usage examples, and test coverage.
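
To make the generation and verification tasks concrete for the simple dataset, here is a minimal standalone sketch. It deliberately does not use the ProceduralDataset base class or register_dataset, since their interfaces are not described in this issue. The config fields and the `generate_item` and `verify` helpers are hypothetical names for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class SimpleGeometryConfig:
    # Hypothetical fields; the real config may differ.
    min_sides: int = 3
    max_sides: int = 6
    min_angle: int = 10   # degrees
    max_angle: int = 170  # degrees, below 180 so the polygon stays convex
    seed: int = 42
    size: int = 100


def generate_item(config: SimpleGeometryConfig, idx: int) -> dict:
    """Generate item `idx` deterministically from the config seed."""
    rng = random.Random(config.seed + idx)
    while True:
        n = rng.randint(config.min_sides, config.max_sides)
        known = [rng.randint(config.min_angle, config.max_angle) for _ in range(n - 1)]
        # Sum of interior angles of an n-gon is 180 * (n - 2) degrees.
        missing = 180 * (n - 2) - sum(known)
        # Resample until the missing angle is also in the allowed range.
        if config.min_angle <= missing <= config.max_angle:
            break
    angle_list = ", ".join(str(a) for a in known)
    question = (
        f"A convex polygon has {n} sides. {n - 1} of its interior angles "
        f"measure {angle_list} degrees. Find the missing interior angle in degrees."
    )
    return {
        "question": question,
        "answer": str(missing),
        "metadata": {"n_sides": n, "known_angles": known},
    }


def verify(item: dict) -> bool:
    """Check the stored answer against the interior-angle sum formula."""
    n = item["metadata"]["n_sides"]
    angles = item["metadata"]["known_angles"] + [int(item["answer"])]
    return len(angles) == n and sum(angles) == 180 * (n - 2)
```

A unit test for deterministic generation could then assert that `generate_item(config, i)` returns the same dictionary on repeated calls and that `verify` holds for every index.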

Additional Context

  • The simple dataset builds on existing arithmetic or logic dataset patterns by focusing on basic geometry.
  • The advanced dataset extends geometric reasoning significantly, providing diverse and verifiable tasks for more robust model training.
  • Both datasets will aid in testing a model’s ability to handle numeric computations, symbolic reasoning, and multi-step geometry tasks.
@andreaskoepf
Contributor

Great proposal, please go ahead. I have assigned the task to you.

@andreaskoepf
Contributor

@Schmeitzke Thanks for the PR! We are still missing a good score_answer() implementation for these datasets, e.g. one that returns a score between 0 and 1 depending on how far the answer is from the correct value. The question should also include a short hint about the desired precision of the answer, e.g. how many decimal digits to round to. Please let me know if you are interested in adding the score_answer methods.
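
One possible shape for such a scorer, written as a free function because the exact method signature on the dataset classes is not shown in this thread. The tolerance rule, the exponential decay, and the `precision_hint` helper are assumptions for illustration only.

```python
import math
from typing import Optional


def precision_hint(decimals: int = 2) -> str:
    """Sentence to append to a question so the expected precision is explicit."""
    return f"Round your answer to {decimals} decimal places."


def score_answer(answer: Optional[str], expected: float, decimals: int = 2) -> float:
    """Return a score in [0, 1] that shrinks as the answer moves away from `expected`."""
    if answer is None:
        return 0.0
    try:
        value = float(answer.strip())
    except ValueError:
        return 0.0  # not parseable as a number
    if not math.isfinite(value):
        return 0.0  # reject nan and inf
    error = abs(value - expected)
    # Full credit when the answer matches at the requested rounding precision.
    if error <= 0.5 * 10 ** (-decimals):
        return 1.0
    # Partial credit decays with the error relative to the answer's magnitude.
    scale = max(abs(expected), 1.0)
    return math.exp(-error / scale)
```

With an expected value of 60.0, the answer "60.00" scores 1.0, "61" scores a little below 1, "75" scores lower still, and unparseable input scores 0. A stricter variant could cap the partial credit well below 1 so that near misses are clearly separated from correct answers.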
