- Fix: Support new concept of sklearn tags, now requiring sklearn >= 1.6.0
- Fix: Docs deployment
- Refactor: Use poetry as packaging tool
- Refactor: Add more typing
- Refactor: Add format checks
- Change `feature_names_in_` and `feature_names_out_` to `np.ndarray` instead of lists.
- Fixed: Support python 3.11, 3.12, 3.13.
- Breaking: Drop support for python 3.7, 3.8 and 3.9
- Breaking: No longer add intercept in contrast coding schemes.
- fixed: Future Warning in Pandas
- fixed: np.NaNs in numpy 2.x
- improved: performance of the hashing encoder (about twice as fast)
- deprecate the `max_sample` parameter, it has no use anymore
- add `process_creation_method` parameter
- use concurrent.futures.ProcessPoolExecutor instead of hand-managed queues
- optimisations to hashlib calls, remove python 2 checks, fork instead of spawn
- fixed: issue 424 - pandas NaNs
- fixed: pandas deprecations and ruff linter suggestions
- fixed: issue 421 - detect pandas categorical type as categorical column
- fixed: issue 414 - broken link
- fixed: issue 412 - timestamp types in ordinal encoder
- fixed: `importlib` instead of `pkg_resources`
- added: ignore option for one-hot-encoding
- fixed: external dependency in unit test
- fixed: gaps in ordinal encoding if nan values are present
- fixed: sklearn compliance: add `feature_names_in_` attribute
- fixed: sklearn compliance: `get_feature_names_out` function has the correct signature
- fixed: add RankHotEncoder in documentation
- fixed: return correct mapping in one hot encoder `category_mapping` property (issue #256)
- refactor: quadratic runtime in ordinal encoder (issue #407)
- added gray encoder
- added thermometer / rank-hot encoder
- introduce compatibility with sklearn 1.2
- compatibility with `feature_names_out_`
- remove boston housing dataset
- drop support for dataframes with non-homogenous data types in column names (i.e. having both string and integer column names)
- compatibility with
- improve performance of hashing encoder
- improve catboost documentation
- fix inverse transform in baseN with special character column names (issue 392)
- fix inverse transform of ordinal encoder with custom mapping (issue 202)
- fix re-fittable polynomial wrapper (issue 313)
- fix numerical stability for target encoding (issue 377)
- change default parameters of target encoding (issue 327)
- drop support for sklearn 0.x
- fix pypi sdist
- Added base class for contrast coding schemes in order to make them more maintainable
- Added hierarchical column feature in target encoder
- Fixed maximum recursion depth bug in hashing encoder
- Introduce base class for encoders
- Introduce tagging system on encoders and use it to parametrize tests
- Drop support for python 3.5 and python 3.6
- Require pandas >=1.0
- Introduce f-strings
- Make BinaryEncoder a BaseNEncoder for base=2
- FutureWarning for TargetEncoder's default parameters
- Made all encoders re-fittable on different datasets (c.f. issue 122)
- Introduced tox.ini file for easier version testing
- Fixed a bug with categorical data type in LeaveOneOut encoder
- Do not install examples as a package on its own
- improved documentation
- fix bug in CatBoost encoder
- fix future warnings with pandas
- added tests for python 3.9 and 3.10 in pipeline
- fix treating np.NaN and python None equal
- only build docs on release
- unified the conversion of input pandas objects that are used internally, including some bugfixes.
- added quantile encoder and summary encoder
- many bugfixes
- added count encoder
- Added generalized linear mixed model encoder
- Added cross-validation wrapper
- Added multi-class wrapper
- Support for pandas >= 1.0.1
- Moved CI to github actions
- Added experimental support for multithreading in hashing encoder
- Support for pandas >=0.24
- Removed support for missing values represented by None due to changes in Pandas 0.24. Use numpy.NaN
- Changed the default setting of Helmert encoder for handle_missing and handle_unknown
- Fixed wrong calculation in m-estimate encoder
- Fixed missing value handling in CatBoost encoder
- Added James-Stein, CatBoost and m-estimate encoders
- Added get_feature_names method
- Refactored treatment of missing and new values
- Speed up the encoders with vectorization
- Improved compatibility with Pandas Series and Numpy Arrays
- Added Weight of Evidence encoder
- Critical bugfix in hashing encoder
- Bugfixes related to missing value imputation
- Category names optionally added to encoded column names for some encoders
- Documentation updates
- Stats models pinned to avoid errors
- Performance enhancements
- Release for zenodo DOI
- Inverse transform implemented for some encoders
- Onehot transform returns same columns always
- Missing value and unknown handling now configurable in all relevant encoders
- Added more sophisticated missing value or unknown category handling to ordinal
- Passing through missing value config from onehot into ordinal
- Onehot will return an extra column when unknown categories are passed in if impute is used.
- Added BaseNEncoder to allow for more flexible alternatives to ordinal, onehot and binary.
- Full support for numpy arrays as input, not just dataframes.
- All encoders handle missing values and are tested for their handling
- Created a onehot encoder that follows the same conventions as the rest of the library instead of using sklearn's.
- Did some basic benchmarking for data compression and memory usage, made some performance improvements
- Changed all docstrings to numpy style and added more documentation
- Moved all logic methods into staticmethods of the transformer classes themselves.
- Added more detailed checks for type and shape of input data in fit and transform
- Support input as list of lists, alongside numpy arrays and pandas dataframes.
- Better handling for missing values in hashing encoder
- Testing enhancements
- Hash type in hashing encoder now defaults to md5 using hashlib, but can be set to any valid hashlib hash
- Added optional parameter to return a numpy array rather than a dataframe from all transformers.
- Immediately return if cols is empty.
- Optionally pass drop_invariant to any encoder to consistently drop columns with 0 variance from the output (based on training set data in fit())
- If None is passed as the cols param, every string column will be encoded (pandas type = object).
- Changed setup.py to not explicitly force reinstalls of other packages
- Bugfixes
- First real usable release, includes sklearn compatible encoders.
- Basic library of encoders, no automated testing.
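
Several entries above concern the hashing encoder, including the note that the hash type defaults to md5 via hashlib but can be set to any valid hashlib hash. The sketch below illustrates that idea with the standard library only. It is not the library's implementation: `hash_row`, its parameters, and the choice to skip `None` values are assumptions made for illustration.

```python
import hashlib


def hash_row(values, n_components=8, hash_method="md5"):
    """Hypothetical helper: bucket a row of categorical values with the hashing trick.

    Each value is hashed with the chosen hashlib algorithm and the digest,
    read as an integer, selects one of ``n_components`` output columns.
    """
    if hash_method not in hashlib.algorithms_available:
        raise ValueError(f"unknown hashlib algorithm: {hash_method}")

    counts = [0] * n_components
    for value in values:
        if value is None:
            # Assumption for this sketch: missing values contribute nothing.
            continue
        # Note: variable-length algorithms (shake_*) need an explicit digest
        # length and are not handled by this sketch.
        digest = hashlib.new(hash_method, str(value).encode("utf-8")).hexdigest()
        counts[int(digest, 16) % n_components] += 1
    return counts


row = ["red", "small", None]
encoded_md5 = hash_row(row)                            # default md5
encoded_sha256 = hash_row(row, hash_method="sha256")   # any fixed-length hashlib hash
```

The output width is fixed by `n_components` regardless of how many distinct categories appear, which is why the same row always maps to the same columns for a given hash method.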