Draft: First pass at a BenchmarkV2 that can hold multiple splits. #273

Draft · wants to merge 1 commit into base: main
Conversation

@cwognum (Collaborator) commented Feb 27, 2025

This is currently a draft. Do not merge!

@cwognum cwognum requested a review from Copilot February 27, 2025 21:03
@cwognum cwognum marked this pull request as draft February 27, 2025 21:03
Copilot AI left a comment

PR Overview

This PR implements a draft of BenchmarkV2 that supports multiple splits and updates several related modules such as predictions, splitting, dataset access, evaluation results, and error handling.

  • Introduces BenchmarkV2Prediction for uploadable benchmark evaluations.
  • Refactors splitting functionality by replacing a single split with multiple splits (using SplitV2) and updates associated validators and error messages.
  • Updates dataset, evaluation, and artifact modules to align with the new benchmark and split design.
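
The move from a single split to multiple labeled splits might look roughly like the sketch below. This is a simplified stand-in using stdlib dataclasses: the real SplitV2 and BenchmarkV2Specification in this PR are Pydantic models, and the exact field names and container types here are assumptions, not the actual API.

```python
from dataclasses import dataclass, field


@dataclass
class SplitV2:
    """Simplified stand-in for the SplitV2 in polaris/benchmark/_split_v2.py.

    The PR renames the fields to train/test (over training/test).
    """

    train: list[int] = field(default_factory=list)
    test: list[int] = field(default_factory=list)

    @property
    def max_index(self) -> int:
        """Largest dataset index referenced by this split (-1 if empty)."""
        return max([*self.train, *self.test], default=-1)


@dataclass
class BenchmarkV2Specification:
    """Simplified stand-in: a benchmark holding several labeled splits."""

    splits: dict[str, SplitV2] = field(default_factory=dict)


spec = BenchmarkV2Specification(
    splits={
        "random": SplitV2(train=[0, 1, 2], test=[3, 4]),
        "scaffold": SplitV2(train=[0, 3], test=[1, 2, 4]),
    }
)
# A BenchmarkV2Prediction would then reference a split by its label,
# e.g. split_label="scaffold".
print(max(s.max_index for s in spec.splits.values()))  # 4
```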

Reviewed Changes

  • polaris/predictions/__init__.py: Adds the new BenchmarkV2Prediction model (minor spelling improvements still needed in field descriptions).
  • polaris/evaluate/_predictions.py: Introduces the prepare_predictions function and updates type aliases.
  • polaris/dataset/_subset.py: Refactors __iter__ and adjusts the filter_targets and extend_inputs implementations.
  • polaris/benchmark/_split.py: Renames and refines the split mixin with updated logging messages.
  • polaris/benchmark/__init__.py: Exposes the new SplitV2 in the public API.
  • polaris/_artifact.py: Adds __repr__, _repr_html_, and __str__ methods for better artifact display.
  • polaris/benchmark/_split_v2.py: Renames fields (using train/test over training/test) and revises validators and error messages.
  • polaris/utils/errors.py: Introduces a new InvalidSplitError for more granular error handling.
  • polaris/evaluate/_results.py: Removes the V2 evaluation result classes to consolidate evaluation around V1.
  • polaris/utils/types.py: Narrows the ListOrArrayType alias to include numeric types.
  • polaris/evaluate/__init__.py: Adjusts exports to reflect the removal of the V2 results classes.
  • polaris/benchmark/_benchmark_v2.py: Refactors BenchmarkV2Specification to support multiple splits via a generator return signature.
  • polaris/benchmark/_base.py: Refactors BenchmarkSpecification and removes redundant __str__ implementations.
  • polaris/evaluate/_metric.py: Removes a duplicate prepare_predictions implementation.
  • polaris/competition/__init__.py: Removes the __str__ method in favor of __repr__ for consistency.
  • polaris/evaluate/_metadata.py: Removes the ResultsMetadataV2 definitions in favor of a single metadata format.
  • polaris/dataset/_base.py: Removes an unnecessary __str__ override.

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

description="The benchmark specification that these predictions were generated for."
)
split_label: str = Field(
description="The label of the split specified in the benchmark that these prediction where generated with."

Copilot AI commented Feb 27, 2025

[nitpick] The description for split_label contains a typo ('these prediction where generated with'). Consider updating it to 'the label of the split specified in the benchmark that these predictions were generated with'.

Suggested change
description="The label of the split specified in the benchmark that these prediction where generated with."
description="The label of the split specified in the benchmark that these predictions were generated with."

Copilot is powered by AI, so mistakes are possible. Review output carefully before use.

return self
if max(split.max for split in self.splits) >= dataset_length:

Copilot AI commented Feb 27, 2025

The code uses 'split.max' to determine the maximum index but SplitV2 defines a 'max_index' property. Update the reference to 'split.max_index' to correctly validate the split indices against the dataset length.

Suggested change
if max(split.max for split in self.splits) >= dataset_length:
if max(split.max_index for split in self.splits) >= dataset_length:
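
Applied in context, the corrected bound check might look like the sketch below. The SplitV2 stand-in and the validate_splits helper are illustrative assumptions for this comment; the real validator belongs to BenchmarkV2Specification and SplitV2 is a Pydantic model in polaris/benchmark/_split_v2.py.

```python
from dataclasses import dataclass


@dataclass
class SplitV2:
    """Illustrative stand-in exposing the max_index property the comment refers to."""

    train: list[int]
    test: list[int]

    @property
    def max_index(self) -> int:
        """Largest dataset index referenced by this split."""
        return max(self.train + self.test)


def validate_splits(splits: list[SplitV2], dataset_length: int) -> None:
    # Use max_index (not a nonexistent .max attribute) to check that no
    # split references an index beyond the end of the dataset.
    if max(split.max_index for split in splits) >= dataset_length:
        raise ValueError("A split references an index beyond the dataset length")


splits = [SplitV2(train=[0, 1], test=[2]), SplitV2(train=[3], test=[4])]
validate_splits(splits, dataset_length=5)  # passes: largest index is 4
```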
