Comparing changes

base repository: loft-br/xgboost-survival-embeddings
base: main
head repository: RigoCorp/xgboost-survival-embeddings
compare: main
Can’t automatically merge. Don’t worry, you can still create the pull request.
  • 18 commits
  • 12 files changed
  • 2 contributors

Commits on Jul 1, 2022

  1. 60fce5b
  2. 078402e
  3. working with the xgboost libraries.

    TODO:
    - fix some errors with categorical variables
    - measure runtimes
    - check results times or probabilities
    Raul committed Jul 1, 2022
    f439d32

Commits on Jul 4, 2022

  1. working with the xgboost libraries.

    TODO:
    - check results times or probabilities
    - integrate in patient-similarity notebook
    Raul committed Jul 4, 2022
    28a2732

Commits on Jul 5, 2022

  1. wip in brier score

    Raul Casaña Eslava committed Jul 5, 2022
    fc0fd94
  2. wip in brier score 2

    Raul Casaña Eslava committed Jul 5, 2022
    2fb1b37

Commits on Jul 6, 2022

  1. minor changes

    Raul committed Jul 6, 2022
    495697d

Commits on Jul 8, 2022

  1. patching some metrics and errors

    Raul committed Jul 8, 2022
    bbad373

Commits on Jul 11, 2022

  1. finished tests on survival xgboost_se

    Raul committed Jul 11, 2022
    e291313
  2. test scikit survival --> no nans admitted!

    Raul committed Jul 11, 2022
    c56b4ba

Commits on Jul 12, 2022

  1. finished testing auc and ibs from sksurv.

    stacked Weibull seems to be the best model
    Raul Casaña Eslava committed Jul 12, 2022
    b2c742a

Commits on Jul 13, 2022

  1. formatting tables and minor changes for the report. Removed xgbse as project, now only the local github project is loaded.

    Raul committed Jul 13, 2022
    9343c3f

Commits on Apr 30, 2024

  1. Update setup.py

    racaes authored Apr 30, 2024
    f29952c

Commits on May 3, 2024

  1. Added some simple example script for debugging.

    Raul Casaña Eslava committed May 3, 2024
    369eb98
  2. Added the variable feature_types to handle categorical data in xgboost when using numpy inputs instead of pandas categorical columns.

    Raul Casaña Eslava committed May 3, 2024
    6fa5bfc
  3. fixed some pending bugs related to predict on last iteration and amended some future warnings in pandas

    Raul Casaña Eslava committed May 3, 2024
    436585c

Commits on May 9, 2024

  1. Added the option to select the number of boosting rounds in KaplanMeyerTree

    Raul Casaña Eslava committed May 9, 2024
    c048a96

Commits on May 27, 2024

  1. Replaced all "enable_categorical: bool = False"

    by "enable_categorical: bool = True"
    Raul Casaña Eslava committed May 27, 2024
    887f3a0
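
The `feature_types` commit in the list above concerns inputs that arrive as plain numeric arrays, where the pandas `category` dtype is no longer available to mark which columns are categorical. The following is a minimal standard-library sketch of that idea, not the code from this change set: the helper `encode_categorical_columns` and the sample values are hypothetical, and only the `"c"`/`"q"` flags mirror the ones built in `custom_tests/debug_warnings.py`.

```python
from typing import Dict, List, Sequence, Tuple


def encode_categorical_columns(
    columns: Dict[str, Sequence],
) -> Tuple[List[List[float]], List[str], Dict[str, Dict[str, int]]]:
    """Hypothetical helper: integer-code string columns and flag them as categorical.

    Returns the row-major numeric table, one flag per column
    ("c" = categorical, "q" = quantitative), and the per-column codebooks.
    """
    encoded_columns: List[List[float]] = []
    feature_types: List[str] = []
    codebooks: Dict[str, Dict[str, int]] = {}

    for name, values in columns.items():
        if all(isinstance(v, str) for v in values):
            # String column: map each distinct label to an integer code.
            codebook = {cat: code for code, cat in enumerate(sorted(set(values)))}
            codebooks[name] = codebook
            encoded_columns.append([float(codebook[v]) for v in values])
            feature_types.append("c")
        else:
            # Anything else is assumed to be numeric already.
            encoded_columns.append([float(v) for v in values])
            feature_types.append("q")

    # Transpose from column-major to row-major, as an array-like input would be.
    rows = [list(row) for row in zip(*encoded_columns)]
    return rows, feature_types, codebooks


# Made-up sample data, for illustration only.
columns = {
    "regime": ["Monarchy", "Civilian Dict", "Monarchy"],
    "start_year": [1946, 1949, 1952],
}
rows, feature_types, codebooks = encode_categorical_columns(columns)
print(feature_types)
print(rows)
```

Once the labels are integer-coded, the array alone no longer says which columns are categorical, so a per-column list such as `feature_types` has to carry that information alongside `enable_categorical=True`.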
Showing with 683 additions and 164 deletions.
  1. +2 −0 .gitignore
  2. +125 −0 custom_tests/debug_warnings.py
  3. +221 −46 examples/benchmarks/benchmarks_support.ipynb
  4. +1 −1 setup.py
  5. +25 −4 xgbse/_base.py
  6. +50 −20 xgbse/_debiased_bce.py
  7. +125 −43 xgbse/_kaplan_neighbors.py
  8. +25 −6 xgbse/_meta.py
  9. +47 −18 xgbse/_stacked_weibull.py
  10. +28 −8 xgbse/converters.py
  11. +18 −10 xgbse/metrics.py
  12. +16 −8 xgbse/non_parametric.py
2 changes: 2 additions & 0 deletions .gitignore
@@ -139,3 +139,5 @@ cython_debug/

.DS_Store
.vscode/

.idea
125 changes: 125 additions & 0 deletions custom_tests/debug_warnings.py
@@ -0,0 +1,125 @@
# from pycox.datasets import metabric
from lifelines.datasets import load_dd
import numpy as np
import pandas as pd
from xgbse import XGBSEStackedWeibull
import matplotlib.pyplot as plt
from xgbse.converters import convert_to_structured
from xgbse import XGBSEKaplanTree, XGBSEBootstrapEstimator
from xgbse.metrics import concordance_index, approx_brier_score, dist_calibration_score
from sklearn.model_selection import train_test_split

plt.style.use('bmh')


# to easily plot confidence intervals
def plot_ci(mean_, upper_ci_, lower_ci_, i=42, title='Probability of survival $P(T \\geq t)$'):
    # plotting mean and confidence intervals for the i-th row (one sample)
    plt.figure(figsize=(12, 4), dpi=120)
    plt.plot(mean_.columns, mean_.iloc[i])
    plt.fill_between(mean_.columns, lower_ci_.iloc[i], upper_ci_.iloc[i], alpha=0.2)

    plt.title(title)
    plt.xlabel('Time [days]')
    plt.ylabel('Probability')
    plt.tight_layout()


df = load_dd()

# splitting to X, T, E format
X = df.drop(['duration', 'observed'], axis=1)
X = X.astype({c: "category" for c in X.columns if X[c].dtype.name == "object"})
feature_types = ["c" if X[c].dtype.name in ["object", "category"] else "q" for c in X.columns]
T = df['duration']
E = df['observed']
y = convert_to_structured(T, E)
ENABLE_CATEGORICAL = True

# splitting between train, and validation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1 / 3, random_state=0)
TIME_BINS = np.arange(T.min(), T.max(), int((T.max() - T.min())/10 + 1))
# TIME_BINS
# ######################################################################################################################
bootstrap_estimator = XGBSEBootstrapEstimator(
    XGBSEStackedWeibull(),
    n_estimators=3
)

# fitting the meta estimator
bootstrap_estimator.fit(X_train, y_train,
                        time_bins=TIME_BINS,
                        enable_categorical=ENABLE_CATEGORICAL,
                        validation_data=(X_test, y_test),
                        early_stopping_rounds=100,
                        verbose_eval=100,
                        feature_types=feature_types)

# predicting
mean_prob, upper_ci_prob, lower_ci_prob = bootstrap_estimator.predict(
    X_test,
    return_ci=True,
    enable_categorical=ENABLE_CATEGORICAL,
    feature_types=feature_types
)

print(f"C-index XGBSEStackedWeibull bootstrap: {concordance_index(y_test, mean_prob)}")
print(f"Avg. Brier Score XGBSEStackedWeibull bootstrap: {approx_brier_score(y_test, mean_prob)}")

d_calib_weibull = dist_calibration_score(y_test, mean_prob, returns='all')
print(f"D-Calibration XGBSEStackedWeibull: {d_calib_weibull}")

# ######################################################################################################################


# xgboost parameters to fit our model
PARAMS_TREE = {
    'objective': 'survival:cox',
    'eval_metric': 'cox-nloglik',
    'tree_method': 'hist',
    'max_depth': 10,
    'booster': 'dart',
    'subsample': 1.0,
    'min_child_weight': 50,
    'colsample_bynode': 1.0
}

# fitting xgbse model
xgbse_model = XGBSEKaplanTree(PARAMS_TREE)
xgbse_model.fit(X_train, y_train, time_bins=TIME_BINS, enable_categorical=ENABLE_CATEGORICAL)

# predicting
mean, upper_ci, lower_ci = xgbse_model.predict(X_test, return_ci=True, enable_categorical=ENABLE_CATEGORICAL)

# print metrics
print(f"C-index: {concordance_index(y_test, mean)}")
print(f"Avg. Brier Score: {approx_brier_score(y_test, mean)}")

# plotting CIs
plot_ci(mean, upper_ci, lower_ci)

#
# %%time
# ######################################################################################################################
# base model as XGBSEKaplanTree
base_model = XGBSEKaplanTree(PARAMS_TREE)

# bootstrap meta estimator
bootstrap_estimator = XGBSEBootstrapEstimator(base_model, n_estimators=100)

# fitting the meta estimator
bootstrap_estimator.fit(X_train, y_train, time_bins=TIME_BINS, enable_categorical=ENABLE_CATEGORICAL)

# predicting
mean, upper_ci, lower_ci = bootstrap_estimator.predict(X_test, return_ci=True, enable_categorical=ENABLE_CATEGORICAL)

# print metrics
print(f"C-index: {concordance_index(y_test, mean)}")
print(f"Avg. Brier Score: {approx_brier_score(y_test, mean)}")

# plotting CIs
plot_ci(mean, upper_ci, lower_ci)

# ######################################################################################################################
# ######################################################################################################################
print("End of script!")
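
The `TIME_BINS` grid in the script above is built with `np.arange` and a step of `int((T.max() - T.min())/10 + 1)`. The sketch below reproduces that arithmetic with the standard library for integer durations, to show what the `+ 1` does. The function name `make_time_bins` and the sample bounds are hypothetical, chosen only for illustration.

```python
from typing import List


def make_time_bins(t_min: int, t_max: int, n_bins: int = 10) -> List[int]:
    """Hypothetical helper mirroring the TIME_BINS expression for integer durations."""
    # The "+ 1" keeps the step at 1 or more and strictly larger than
    # (t_max - t_min) / n_bins, so the grid never has more than n_bins points.
    step = int((t_max - t_min) / n_bins + 1)
    # Like np.arange, range() excludes the upper bound t_max.
    return list(range(t_min, t_max, step))


# Illustrative bounds, not taken from the dataset.
bins = make_time_bins(1, 47)
print(bins)       # [1, 6, 11, 16, 21, 26, 31, 36, 41, 46]
print(len(bins))  # 10
```

Because the upper bound is excluded, the largest observed duration is not itself a grid point; whether that matters depends on how the fitted model's predictions are evaluated.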