Draft: First pass at a BenchmarkV2 that can hold multiple splits. #273
base: main
Conversation
PR Overview
This PR implements a draft of BenchmarkV2 that supports multiple splits and updates several related modules such as predictions, splitting, dataset access, evaluation results, and error handling.
- Introduces BenchmarkV2Prediction for uploadable benchmark evaluations.
- Refactors splitting functionality by replacing a single split with multiple splits (using SplitV2) and updates associated validators and error messages.
- Updates dataset, evaluation, and artifact modules to align with the new benchmark and split design.
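To make the multi-split design concrete, here is a minimal sketch of a benchmark holding several named splits. This is a hypothetical simplification, not the actual polaris API: `SplitV2` and `BenchmarkV2Sketch` below are stand-ins, and the real classes are pydantic models with richer validation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitV2:
    # Hypothetical stand-in for polaris's SplitV2; the real class differs.
    label: str
    train: tuple[int, ...]
    test: tuple[int, ...]

    @property
    def max_index(self) -> int:
        # Highest dataset row index referenced by this split.
        return max(self.train + self.test)


@dataclass
class BenchmarkV2Sketch:
    # A benchmark that carries multiple splits instead of a single one.
    splits: tuple[SplitV2, ...]

    def validate_against(self, dataset_length: int) -> None:
        # Every index in every split must fall inside the dataset.
        if max(s.max_index for s in self.splits) >= dataset_length:
            raise ValueError("Split index out of bounds for dataset")


bench = BenchmarkV2Sketch(splits=(
    SplitV2("scaffold", train=(0, 1, 2), test=(3, 4)),
    SplitV2("random", train=(0, 2, 4), test=(1, 3)),
))
bench.validate_against(5)  # passes: the highest referenced index is 4
```

The key design point is that each split carries its own label, so predictions can later be associated with a specific split (see the `split_label` field reviewed below).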
Reviewed Changes
| File | Description |
|---|---|
| `polaris/predictions/__init__.py` | Adds the new `BenchmarkV2Prediction` model; field descriptions need minor spelling fixes. |
| `polaris/evaluate/_predictions.py` | Introduces the `prepare_predictions` function and updates type aliases. |
| `polaris/dataset/_subset.py` | Refactors `__iter__` and adjusts the `filter_targets` and `extend_inputs` implementations. |
| `polaris/benchmark/_split.py` | Renames and refines the split mixin with updated logging messages. |
| `polaris/benchmark/__init__.py` | Exposes the new `SplitV2` in the public API. |
| `polaris/_artifact.py` | Adds `__repr__`, `_repr_html_`, and `__str__` methods for better artifact display. |
| `polaris/benchmark/_split_v2.py` | Renames fields (`train`/`test` instead of `training`/`test`) and revises validators and error messages. |
| `polaris/utils/errors.py` | Introduces a new `InvalidSplitError` for more granular error handling. |
| `polaris/evaluate/_results.py` | Removes the V2 evaluation result classes to consolidate evaluation around V1. |
| `polaris/utils/types.py` | Narrows the `ListOrArrayType` alias to numeric types. |
| `polaris/evaluate/__init__.py` | Adjusts exports to reflect the removal of the V2 result classes. |
| `polaris/benchmark/_benchmark_v2.py` | Refactors `BenchmarkV2Specification` to support multiple splits via a generator return signature. |
| `polaris/benchmark/_base.py` | Refactors `BenchmarkSpecification` and removes redundant `__str__` implementations. |
| `polaris/evaluate/_metric.py` | Removes a duplicate `prepare_predictions` implementation. |
| `polaris/competition/__init__.py` | Removes the `__str__` method in favor of `__repr__` for consistency. |
| `polaris/evaluate/_metadata.py` | Removes the `ResultsMetadataV2` definitions in favor of a single metadata format. |
| `polaris/dataset/_base.py` | Removes an unnecessary `__str__` override. |
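The new `InvalidSplitError` in `polaris/utils/errors.py` gives callers a specific exception to catch when a split is malformed. A minimal sketch of the idea, assuming a simplified error hierarchy (the class and base names here are illustrative, not polaris's actual definitions):

```python
class PolarisError(Exception):
    """Hypothetical base class standing in for polaris's error hierarchy."""


class InvalidSplitError(PolarisError):
    """Raised when a split references indices outside the dataset."""


def check_indices(indices: list[int], dataset_length: int) -> None:
    # Granular failure mode: callers can catch InvalidSplitError
    # specifically instead of a generic ValueError.
    if max(indices) >= dataset_length:
        raise InvalidSplitError(
            f"Split index {max(indices)} is out of range "
            f"for a dataset of length {dataset_length}"
        )


check_indices([0, 1, 3], dataset_length=5)  # valid: no exception
```

Introducing a dedicated exception type keeps split-validation failures distinguishable from other errors raised during benchmark construction.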
Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.
```python
    description="The benchmark specification that these predictions were generated for."
)
split_label: str = Field(
    description="The label of the split specified in the benchmark that these prediction where generated with."
```
[nitpick] The description for split_label contains a typo ('these prediction where generated with'). Consider updating it to 'the label of the split specified in the benchmark that these predictions were generated with'.
```diff
-    description="The label of the split specified in the benchmark that these prediction where generated with."
+    description="The label of the split specified in the benchmark that these predictions were generated with."
```
```python
    return self
if max(split.max for split in self.splits) >= dataset_length:
```
The code uses 'split.max' to determine the maximum index but SplitV2 defines a 'max_index' property. Update the reference to 'split.max_index' to correctly validate the split indices against the dataset length.
```diff
-if max(split.max for split in self.splits) >= dataset_length:
+if max(split.max_index for split in self.splits) >= dataset_length:
```
This is currently a draft. Do not merge!