Link to report, Link to GitHub repository, Docker deployment instructions
Multi-horizon time series forecasting on a large dataset of hourly energy consumption values. Implementing a stateful LSTM model and an Inverted Transformer model using PyTorch Lightning, drawing inspiration from multiple existing architectures & making some modifications. Tuning hyperparameters with Optuna, generating forecast intervals with quantile regression, visualizing & comparing predictive performance. Deploying the Transformer model with Docker, with CUDA GPU support.
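As a rough illustration of how quantile regression yields forecast intervals, the sketch below computes the pinball (quantile) loss for a single quantile level in plain Python. The function name `pinball_loss` and its arguments are hypothetical and are not taken from the project code; the project itself trains PyTorch Lightning models, so this is only a minimal stand-in for the idea.

```python
def pinball_loss(y_true, y_pred, quantile):
    """Average pinball (quantile) loss for one quantile level in (0, 1).

    Hypothetical helper for illustration; not the project's implementation.
    """
    if not 0.0 < quantile < 1.0:
        raise ValueError("quantile must be strictly between 0 and 1")
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        error = actual - predicted
        # Under-prediction (error > 0) is weighted by q,
        # over-prediction (error < 0) by (1 - q).
        total += max(quantile * error, (quantile - 1.0) * error)
    return total / len(y_true)
```

Minimizing this loss at, say, the 0.05 and 0.95 levels gives the lower and upper bounds of a nominal 90% forecast interval.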
Link to report, Link to GitHub repository
Multivariate time series classification using sktime and pyts: kNN with DTW distance, ROCKET & Arsenal, WEASEL+MUSE and a PyTorch Lightning convolutional neural network trained on image-transformed data. Visualizing & comparing the performances of all algorithms.
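For readers unfamiliar with the DTW distance used by the kNN classifier, here is a minimal sketch of the classic dynamic-programming recurrence. It assumes univariate sequences, an absolute-difference step cost and no warping window; the name `dtw_distance` is hypothetical, and the sktime and pyts implementations used in the project are more general and far more optimized.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two univariate sequences.

    Illustrative sketch only: absolute-difference cost, no window constraint.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = step + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]
```

Because the alignment may repeat points, two series with the same shape but different speeds can have a distance of zero, which is what makes DTW attractive for nearest-neighbor classification.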
Link to report, Link to GitHub repository
Multivariate time series anomaly detection using PyOD algorithms & the Darts package: K-means clustering, Gaussian Mixture Models, ECOD, Isolation Forest and an autoencoder built with PyTorch Lightning. Visualizing & comparing the results with multiple plots, including 3D interactive Plotly scatterplots.
Link to report, Link to GitHub repository
Imbalanced binary classification with scikit-learn and PyTorch Lightning, on a large dataset of used cars. Comparing logistic regression, SVM and XGBoost trained with class weights against a neural network trained with focal loss. Performing hyperparameter optimization with Optuna. Assessing model performances with classification metrics & a sensitivity analysis based on a business scenario.
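To make the focal loss idea concrete, the following plain-Python sketch computes the binary focal loss over a batch of predicted probabilities. The function name `binary_focal_loss` and the defaults `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration (they are commonly used values), not settings taken from the project.

```python
import math


def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25):
    """Mean binary focal loss for labels in {0, 1} and predicted P(y = 1).

    Hypothetical helper; gamma and alpha defaults are illustrative assumptions.
    """
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        # p_t is the probability the model assigns to the true class.
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t) ** gamma shrinks the loss of easy, well-classified examples.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)
```

With `gamma=0` the modulating factor disappears and the expression reduces to class-weighted binary cross-entropy, which is the link to the class-weighted baselines in the comparison.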
Link to report (part 1), Link to report (part 2), Link to GitHub repository, Link to Kaggle notebook
Time series regression modeling on a dataset of supermarket sales spanning multiple years, with the Darts library in Python. Performing time series decomposition & hybrid modeling, trying statistical methods such as linear regression, AutoARIMA and STL, as well as global deep learning forecasting models.
Best score: 0.42505 RMSLE, placing 61st out of 612 (top 10%) on the leaderboard at submission time (March 2023).
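For reference, RMSLE (root mean squared logarithmic error) can be sketched in a few lines of plain Python. The function name `rmsle` is hypothetical; the sketch assumes non-negative targets and predictions and uses the common `log(1 + x)` form of the metric.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error for non-negative values.

    Illustrative helper; uses log1p so that zero values are handled.
    """
    squared_log_errors = [
        (math.log1p(predicted) - math.log1p(actual)) ** 2
        for actual, predicted in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))
```

Because errors are measured on a log scale, the metric penalizes relative rather than absolute deviations, so a miss on a low-volume item counts roughly as much as a proportional miss on a high-volume one.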
Link to report, Link to GitHub repository
Feature engineering, MRMR feature selection and XGBoost modeling for the Kaggle House Prices Regression competition. Best submission score (September 2022): 0.12143 RMSLE, 271st place, top 8%.
Link to report, Link to GitHub repository
Imbalanced classification modeling on a dataset of loan requests. Hyperparameter tuning, performance benchmarking and performance metrics interpretation with the mlr3 package in R.
Link to report, Link to GitHub repository
Predicting concrete compressive strength using GAM regression, as a non-linear function of the mixture components. Visualization of the results with 3D interactive Plotly plots.
Link to report, Link to GitHub repository
Predicting used car prices using Bayesian Linear Regression, visualizing results and comparing with OLS regression.
Link to report, Link to GitHub repository
Non-hierarchical k-medoids clustering on a dataset of country statistics.
Link to report, Link to GitHub repository
Classification modeling on a large dataset of airline passenger satisfaction, using logistic regression, decision trees and random forests.
Page template forked from evanca