mds_2025_helper_functions

A package to streamline common code chunks executed by students in the UBC MDS program circa 2025.

Functions

compare_model_scores() - a function that takes multiple models and returns a table of mean CV scores for each for easy comparison.
perform_eda() - a function that performs exploratory data analysis on a dataset.
dataset_summary() - a function that generates a comprehensive summary of a dataset, including missing value statistics, feature counts, duplicate rows, and descriptive statistics.
htv() - (Hypothesis Test Visualization) a function that plots the results of a user's hypothesis test, making it easier to understand what happened in the test than reading the numbers alone.
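
To make the idea behind compare_model_scores() concrete, here is a minimal sketch of a mean-CV-score comparison table built directly on scikit-learn. It is not the package's implementation: the helper name mean_cv_scores_sketch, its parameters, and the choice of the iris dataset are assumptions made only for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate


def mean_cv_scores_sketch(models, X, y, cv=5):
    """Hypothetical helper (not part of the package).

    Runs cross-validation for each model and returns a table with one
    column per model and one row per mean metric.
    """
    results = {}
    for name, model in models.items():
        # cross_validate returns a dict of per-fold arrays
        # (fit_time, score_time, test_score, train_score).
        scores = cross_validate(model, X, y, cv=cv, return_train_score=True)
        results[name] = pd.DataFrame(scores).mean()
    return pd.DataFrame(results)


X, y = load_iris(return_X_y=True)
models = {
    "dummy": DummyClassifier(strategy="most_frequent"),
    "logreg": LogisticRegression(max_iter=1000),
}
table = mean_cv_scores_sketch(models, X, y)
print(table)
```

Putting the models in columns keeps the table narrow when only a few models are compared; the package's own function may lay out its results differently.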

Similar packages

While this package extends cross-validation from scikit-learn, there are no known packages that provide CV score comparison similar to compare_model_scores(). The most similar is the summary_cv() function in the CrossPy package, which summarizes CV scores for a single model.
While the ProfileReport class from the ydata-profiling package (formerly pandas-profiling) provides automated exploratory data analysis and reporting, there are no known packages that offer the same flexible, on-demand visualizations and insights as the perform_eda() function. ProfileReport generates detailed HTML reports but lacks the modular, interactive approach that perform_eda() provides for tailoring EDA to specific datasets and workflows.
The dataset_summary() function combines essential dataset insights (missing values, feature types, duplicates, and basic statistics) into one easy-to-use tool. Similar functionality exists in libraries such as pandas-profiling and missingno, but these tools either focus on a single aspect or produce a full-scale exploratory analysis. No known single function consolidates all of these features in one place, which makes dataset_summary() a convenient option for preprocessing workflows.
There is no known function that plots the output of a hypothesis test the way htv() does. Data scientists typically build these plots manually, which is not friendly to learners.
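
For comparison, the individual pieces that dataset_summary() is described as consolidating can each be computed with plain pandas. The sketch below is only an illustration of that idea: the helper name dataset_summary_sketch, its return structure, and the sample data are assumptions, not the package's API.

```python
import pandas as pd


def dataset_summary_sketch(df):
    """Hypothetical helper (not the package's dataset_summary())."""
    return {
        # Number of missing values per column.
        "missing_values": df.isna().sum().to_dict(),
        # Count of numeric versus non-numeric columns.
        "feature_types": {
            "numeric": df.select_dtypes(include="number").shape[1],
            "non_numeric": df.select_dtypes(exclude="number").shape[1],
        },
        # Rows that exactly repeat an earlier row.
        "duplicate_rows": int(df.duplicated().sum()),
        # Descriptive statistics for the numeric columns.
        "descriptive_stats": df.describe(),
    }


df = pd.DataFrame(
    {
        "age": [25, 31, None, 25],
        "city": ["Vancouver", "Toronto", "Vancouver", "Vancouver"],
    }
)
summary = dataset_summary_sketch(df)
print(summary["missing_values"])
print(summary["duplicate_rows"])
```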

Installation

$ pip install mds_2025_helper_functions

Usage

TODO

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

Contributors

Karlygash Zhakupbayeva, Samuel Adetsi, Xi Cu, Michael Hewlett

License

mds_2025_helper_functions was created by Karlygash Zhakupbayeva, Samuel Adetsi, Xi Cu, Michael Hewlett. It is licensed under the terms of the MIT license.

Credits

mds_2025_helper_functions was created with cookiecutter and the py-pkgs-cookiecutter template.
