Releases: mad-lab-fau/tpcp
v0.25.1 - Fixed Documentation-tests-mixin
[0.25.1] - 2023-10-25
Fixed
- Ignored names in the testing mixin are now correctly ignored in both directions.
I.e. this allows documenting additional parameters as well, not just leaving parameters out.
v0.25 - End of Py3.8 and new validate method
[0.25.0] - 2023-10-24
Added
- The `Scorer` class now has the ability to score datapoints in parallel.
This can be enabled by setting the `n_jobs` parameter of the `Scorer` class to something larger than 1.
(#95)
- The `PyTestSnapshotTest` class now supports comparing dataframes with datetime columns.
(#97)
- The `validate` function was introduced to enable validation of an algorithm on arbitrary data without parameter
optimization.
(#99)
- Fixed the bug that the functions `optimize` and `cross_validate` were crashing when `progress_bar` was deactivated.
- New example about caching.
(#98)
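The idea behind parallel scoring can be sketched with the standard library alone. The snippet below is not tpcp's implementation: `score_datapoint`, `score_all`, and the toy data are hypothetical, and a thread pool merely stands in for whatever backend the `Scorer` uses. It only illustrates that per-datapoint scores are independent of each other and can be computed concurrently while keeping the original order.

```python
from concurrent.futures import ThreadPoolExecutor


def score_datapoint(datapoint):
    """Hypothetical per-datapoint score: the mean of the values."""
    return sum(datapoint) / len(datapoint)


def score_all(datapoints, n_jobs=1):
    """Score every datapoint, optionally using several workers."""
    if n_jobs <= 1:
        return [score_datapoint(d) for d in datapoints]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # map() yields results in input order, independent of completion order.
        return list(pool.map(score_datapoint, datapoints))


datapoints = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
sequential = score_all(datapoints, n_jobs=1)
parallel = score_all(datapoints, n_jobs=2)
```

Both calls should produce the same list of scores; only the way the work is scheduled differs.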
Changed
- In line with NumPy and some other packages, we drop Python 3.8 support.
v0.24.0 - Dataset Improvements
[0.24.0] - 2023-09-08
For all changes in this release see: #85
Deprecated
- The properties `group` and `groups` of the `Dataset` class are deprecated and will be removed in a future
release.
They are replaced by the `group_label` and `group_labels` properties of the `Dataset` class.
This renaming was done to make it clearer that these properties return the labels of the groups and not the
groups themselves.
- The `create_group_labels` method of the `Dataset` class is deprecated and will be removed in a future release.
It is replaced by the `create_string_group_labels` method of the `Dataset` class.
This renaming was done to avoid confusion with the new names for `groups` and `group`.
Added
- Added the `index_as_tuples` method to the `Dataset` class.
It returns the full index of the dataset as a list of named tuples regardless of the current grouping.
This can help to extract the label information of a datapoint when using `group` would require handling multiple
cases, because your code expects the dataset in different grouped versions.
Changed
- BREAKING CHANGE (with Deprecation): The `group` property of the `Dataset` class is now called `group_label`.
- BREAKING CHANGE: The `group_label` property now always returns named tuples of strings
(even for single groups where it used to return strings!).
- BREAKING CHANGE (with Deprecation): The `groups` property of the `Dataset` class is now called `group_labels`.
- BREAKING CHANGE: The `group_labels` property always returns a list of named tuples of strings
(even for single groups where it used to return a list of strings!).
- BREAKING CHANGE: The parameter `groups` of the `get_subset` method of the `Dataset` class is now called
`group_labels` and always expects a list of named tuples of strings.
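The practical effect of the switch from strings to named tuples can be illustrated without tpcp. In the sketch below, `GroupLabel` and the column name `participant` are hypothetical and only mimic the shape of the values described above; the actual tuple type is created by the library.

```python
from collections import namedtuple

# Hypothetical label type for a dataset grouped by a single column "participant".
GroupLabel = namedtuple("GroupLabel", ["participant"])

# Shape of the value before 0.24.0 (single group level): a plain string.
old_style_label = "p1"

# Shape of the value from 0.24.0 on: a named tuple of strings.
new_style_label = GroupLabel(participant="p1")

# Code that compared the label with a string no longer matches ...
matches_old_check = new_style_label == old_style_label
# ... and needs to access the field or compare against a tuple instead.
matches_by_field = new_style_label.participant == "p1"
matches_by_tuple = new_style_label == ("p1",)
```

Existing comparisons against plain strings would silently evaluate to `False` under this assumption, which is why the change is flagged as breaking.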
v0.23.0 - Testing Utils
[0.23.0] - 2023-08-30
Added
- We migrated some testing utilities from other libraries to tpcp and exposed some algorithm test helpers
that previously only existed in the tests folder via the actual tpcp API.
This should make testing algorithms and pipelines developed with tpcp easier.
These new features are now available in the `tpcp.testing` module.
(#89)
v0.22.1 - Fixed `safe_optimize` for GridSearchCV
[0.22.1] - 2023-08-30
Fixed
- The `safe_optimize` parameter of `GridSearchCV` is now correctly used during reoptimization.
Before, it was only forwarded to the `Optimize` wrapper during the actual Grid-Search, but not during the final
reoptimization.
v0.22.0 - Tensorflow support
[0.22.0] - 2023-08-25
Added
- Official support for tensorflow/keras. The custom hash function now manages tensorflow models explicitly.
This makes it possible again to use the `make_action_safe` and `make_optimize_safe` decorators with algorithms and
pipelines that have tensorflow/keras models as parameters.
(#87)
- Added a new example for tensorflow/keras models.
(#87)
v0.20.1: Fix cross-validation regression
[0.20.1] - 2023-07-25
Fixed
- Fixed regression introduced in 0.19.0, which resulted in optimizers not being correctly cloned per fold.
As a result, each CV fold would overwrite the optimizer object of the previous fold.
This did not affect the reported results, but the returned optimizer object was not the one that was used to calculate
the results.
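Why a missing clone leads to this symptom can be shown with a toy example. The sketch below does not use tpcp: `ToyOptimizer` is a hypothetical stand-in for an optimizer that stores its result on itself, and `copy.deepcopy` stands in for the library's cloning.

```python
import copy


class ToyOptimizer:
    """Hypothetical optimizer that stores its result on the instance."""

    def __init__(self):
        self.best_value_ = None

    def optimize(self, train_data):
        self.best_value_ = max(train_data)
        return self


folds = [[1, 5, 3], [9, 2, 4]]

# Without cloning, every fold writes to the same object,
# so the first fold's result is overwritten by the second.
shared_template = ToyOptimizer()
shared = [shared_template.optimize(fold) for fold in folds]
shared_values = [opt.best_value_ for opt in shared]

# With one clone per fold, each returned object keeps its own state.
clone_template = ToyOptimizer()
cloned = [copy.deepcopy(clone_template).optimize(fold) for fold in folds]
cloned_values = [opt.best_value_ for opt in cloned]
```

In the shared case, both list entries refer to the same object and report the value of the last fold.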
v0.20.0 - BREAKING CHANGE: Fix optuna multiprocessing
[0.20.0] - 2023-07-24
Changed
- BREAKING CHANGE: The way all Optuna-based optimizers work has changed.
Instead of passing a function that returns a study, you now need to pass a function that returns the parameters of a
study.
Creating the study is now handled by tpcp internally to avoid issues with multiprocessing.
This results in two changes.
The parameter name for all optuna pipelines has changed from `create_study` to `get_study_params`.
Further, the expected call signature changed, as `get_study_params` now gets a seed as argument.
This seed should be used to initialize the random number generator of the sampler and pruner of a study to ensure
that each process gets a different seed and sampling process.
(#80)

To migrate your code, you need to change the following:

OLD:

    def create_study():
        return optuna.create_study(sampler=RandomSampler(seed=42))

    OptunaSearch(..., create_study=create_study, ...)

NEW:

    def get_study_params(seed: int):
        return dict(sampler=RandomSampler(seed=seed))

    OptunaSearch(..., get_study_params=get_study_params, random_seed=42, ...)
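The reasoning behind passing a seed into `get_study_params` can be sketched with the standard library. The changelog does not describe how tpcp derives the per-process seeds, so `seeds_for_workers` is a hypothetical helper, and `random.Random` is only a stand-in for a seeded Optuna sampler.

```python
import random


def get_study_params(seed: int) -> dict:
    """Return study parameters only; the caller creates the study.

    random.Random is a stand-in for a seeded sampler.
    """
    return {"sampler": random.Random(seed)}


def seeds_for_workers(random_seed: int, n_workers: int) -> list:
    """Hypothetical helper: derive one distinct seed per worker."""
    rng = random.Random(random_seed)
    # sample() draws without replacement, so the seeds are distinct.
    return rng.sample(range(2**31), n_workers)


worker_seeds = seeds_for_workers(random_seed=42, n_workers=3)
worker_params = [get_study_params(seed) for seed in worker_seeds]
first_draws = [params["sampler"].random() for params in worker_params]
```

With a hard-coded `seed=42` inside the factory, as in the old pattern, every process would create an identically seeded sampler. Deriving the seeds from one `random_seed` keeps the run reproducible while giving each worker its own sampling sequence.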
v0.19.0 - Joblib Fixes and better errors
[0.19.0] - 2023-07-06
Added
- All optimization methods that do complicated loops (over parameters or CV-Folds) now raise new custom errors
(`OptimizationError` and `TestError`) if they encounter an error.
These new errors carry further information about the loop iteration in which the error occurred and should make it
easier to debug issues.
- When a scorer fails, we now print the name (i.e. the group) of the datapoint that caused the error.
This should make it easier to debug issues with the scorer.
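The general pattern of attaching loop context to an error can be sketched as follows. The `OptimizationError` class below is defined locally for illustration and is not imported from tpcp; `run_parameter_loop` and `evaluate` are hypothetical as well.

```python
class OptimizationError(Exception):
    """Local stand-in defined for this sketch; not imported from tpcp."""


def run_parameter_loop(parameter_grid, evaluate):
    """Evaluate each parameter set and report the failing iteration."""
    results = []
    for i, params in enumerate(parameter_grid):
        try:
            results.append(evaluate(params))
        except Exception as e:
            # Chain the original exception so its traceback stays available.
            raise OptimizationError(
                f"Evaluation failed in iteration {i} with parameters {params}."
            ) from e
    return results


def evaluate(params):
    """Toy evaluation that fails for x == 0."""
    return 1 / params["x"]


try:
    run_parameter_loop([{"x": 1}, {"x": 0}], evaluate)
except OptimizationError as err:
    message = str(err)
    cause = err.__cause__
```

The wrapping error names the iteration and parameters, while the original exception remains reachable via `__cause__`.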
Changed
- We dropped support for joblib<0.13.0 due to some changes in the API. We only support the new API now, which allowed
us to simplify some of the multiprocessing code.
v0.18.0 - Some more validation
[0.18.0] - 2023-04-13
Fixed
- When `super().__init__()` is called before all parameters of the child class are initialized, we don't get an error
anymore.
Now all classes remember their parameters when they are defined and don't try to access parameters that are not
defined in their own init.
(#69)
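One way to make classes "remember their parameters when they are defined" is to record the names of the `__init__` arguments at class-creation time. The sketch below is an assumption about the general technique, not tpcp's actual code; `Base`, `Child`, and `_own_params` are hypothetical names.

```python
import inspect


class Base:
    _own_params = ()

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Record the parameter names once, when the class is defined.
        sig = inspect.signature(cls.__init__)
        cls._own_params = tuple(n for n in sig.parameters if n != "self")

    def __init__(self):
        # Reading the recorded names does not touch instance attributes,
        # so it is safe even if the child has not set them yet.
        self.param_names_at_init = type(self)._own_params


class Child(Base):
    def __init__(self, window=5, threshold=0.1):
        super().__init__()  # called before the attributes below exist
        self.window = window
        self.threshold = threshold


child = Child()
```

Because the parent only consults the recorded names, the order of `super().__init__()` and the attribute assignments in the child no longer matters in this sketch.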
Changed
- Validation is now performed recursively on all subclasses. Note that, as before, validation is still only performed
once per class.
But with this change, we can also validate base classes that are not used directly.
(#70)
Added
- We now validate whether a child class implements all the parameters of its parent class.
While not strictly necessary, not doing so is a sign of bad design.
It could also lead to issues with tpcp's validation logic.
(#70)
- It is now possible to hook into the validation and perform custom validation of classes.
(#70)
- The dataset class now actively triggers validation and checks if the dataset subclass implements `groupby_cols` and
`subset_index`.
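The check that a child class implements all parameters of its parent class can be illustrated by comparing `__init__` signatures. This is a minimal sketch of the idea under that assumption, not the library's validation code; `init_params`, `missing_parent_params`, and the example classes are hypothetical.

```python
import inspect


def init_params(cls):
    """Return the names of the __init__ parameters of a class."""
    sig = inspect.signature(cls.__init__)
    return {n for n in sig.parameters if n != "self"}


def missing_parent_params(child_cls, parent_cls):
    """List parent parameters that the child's __init__ does not accept."""
    return sorted(init_params(parent_cls) - init_params(child_cls))


class Parent:
    def __init__(self, a=1, b=2):
        self.a = a
        self.b = b


class GoodChild(Parent):
    def __init__(self, a=1, b=2, c=3):
        super().__init__(a=a, b=b)
        self.c = c


class BadChild(Parent):
    def __init__(self, a=1, c=3):
        super().__init__(a=a)
        self.c = c


good_missing = missing_parent_params(GoodChild, Parent)
bad_missing = missing_parent_params(BadChild, Parent)
```

A validation hook could raise an error whenever the returned list is not empty.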