A package to streamline common code chunks executed by students in the UBC MDS program circa 2025.
- compare_model_scores() - a function that takes multiple models and returns a table of mean CV scores for each for easy comparison.
- perform_eda() - a function to perform exploratory data analysis on a dataset
- dataset_summary() - a function that generates a comprehensive summary of a dataset, including missing value statistics, feature counts, duplicate rows, and descriptive statistics.
- htv() - (Hypothesis Test Visualization) a function that plots the results of a user's hypothesis test, making it easier to understand what happened in the test than numbers alone.
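The idea behind compare_model_scores() can be illustrated with scikit-learn and pandas alone. The following is a minimal sketch, not the package's implementation: the helper name mean_cv_scores, its parameters, and the choice of the iris dataset are assumptions made for illustration only.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def mean_cv_scores(models, X, y, cv=5, scoring=None):
    """Hypothetical helper: one row per model, holding its mean CV score."""
    scores = {
        name: cross_val_score(model, X, y, cv=cv, scoring=scoring).mean()
        for name, model in models.items()
    }
    # A named Series converts cleanly into a one-column table.
    return pd.Series(scores, name="mean_cv_score").to_frame()


X, y = load_iris(return_X_y=True)
models = {
    "dummy": DummyClassifier(strategy="most_frequent"),
    "tree": DecisionTreeClassifier(random_state=0),
    "logreg": LogisticRegression(max_iter=1000),
}
table = mean_cv_scores(models, X, y)
print(table)
```

Keeping the models in a dictionary means the row labels of the table come straight from the keys, so adding another model to the comparison is a one-line change.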
- While this package builds on cross-validation from scikit-learn, there are no known packages that provide a CV score comparison similar to compare_model_scores(). The most similar is the summary_cv() function in the CrossPy package, which summarizes CV scores for a single model.
- While the ProfileReport class from the ydata-profiling package provides automated exploratory data analysis and reporting, there are no known packages that offer the same level of flexible, on-demand visualizations and insights as the perform_eda() function. The most similar functionality is available in pandas-profiling (the earlier name of ydata-profiling), which generates detailed HTML reports but lacks the modular, interactive approach that perform_eda() provides for tailoring EDA to specific datasets and workflows.
- The dataset_summary() function combines essential dataset insights (missing values, feature types, duplicates, and basic statistics) into one comprehensive and easy-to-use tool. While similar functionality exists in libraries like pandas-profiling and missingno, these tools focus on specific aspects or on full-scale exploratory analysis. No known single function consolidates all these features in one place, making dataset_summary() an efficient solution for preprocessing workflows.
- There are no known functions that plot the output of a hypothesis test. Data scientists usually build these plots manually, which is not friendly for learners.
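The kind of consolidated overview described for dataset_summary() can be approximated with a few pandas calls. This is a rough sketch under stated assumptions: the function name quick_summary, the keys of the returned dictionary, and the toy data are hypothetical and do not reflect the package's actual interface or output.

```python
import numpy as np
import pandas as pd


def quick_summary(df):
    """Hypothetical helper: gather a few dataset-level facts in one dict."""
    return {
        "n_rows": len(df),
        "n_columns": df.shape[1],
        # Number of missing values in each column.
        "missing_per_column": df.isna().sum().to_dict(),
        # Rows that exactly repeat an earlier row.
        "n_duplicate_rows": int(df.duplicated().sum()),
        # How many columns there are of each dtype.
        "dtype_counts": df.dtypes.astype(str).value_counts().to_dict(),
        # Descriptive statistics for the numeric columns.
        "describe": df.describe(),
    }


# Toy data: one missing age, and the last row repeats the second row.
df = pd.DataFrame(
    {
        "age": [25, 31, np.nan, 31],
        "city": ["Vancouver", "Kelowna", "Vancouver", "Kelowna"],
    }
)
summary = quick_summary(df)
for key, value in summary.items():
    print(key, value, sep=": ")
```

Returning a plain dictionary keeps each piece individually accessible, which suits preprocessing code that needs, for example, only the duplicate count.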
$ pip install mds_2025_helper_functions
- TODO
Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
Karlygash Zhakupbayeva, Samuel Adetsi, Xi Cu, Michael Hewlett
mds_2025_helper_functions was created by Karlygash Zhakupbayeva, Samuel Adetsi, Xi Cu, Michael Hewlett. It is licensed under the terms of the MIT license.
mds_2025_helper_functions was created with cookiecutter and the py-pkgs-cookiecutter template.