Code repository for Research Module with Prof. Riezler (WS22): Natalia, Pablo, Jinghua
A transformer-based self-supervised approach to early sepsis prediction using physiological features and clinical notes.
-
Large Data Files on Google Drive (share link in a private email)
-
Mortality Data for dry run (original) ✅
-
Sepsis Data, three additional features (to be added) ✅
-
smaller set: data_table = oc_table ✅
-
full set: data_table > oc_table ✅
-
Sepsis Data, three additional features + clinical notes (to be added) ☑️
-
smaller set: data_table = oc_table ☑️
-
full set: data_table > oc_table ☑️
-
Each dataset is stored as a pkl file; each pkl loads a data_table (mainly for pretraining, but also used for tuning) and an oc_table (mainly for tuning).
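A minimal loading sketch for the two tables described above. It assumes each pkl holds exactly two objects stored as a tuple, which may not match the real files (they may contain more objects or a different order). `load_tables`, the toy rows, and the file name are hypothetical and only there to make the example runnable.

```python
import pickle
import tempfile
from pathlib import Path


def load_tables(pkl_path):
    """Return (data_table, oc_table) from one pkl file.

    Assumption: the pickle holds exactly these two objects as a tuple.
    The real files may store more (or differently ordered) objects.
    """
    with open(pkl_path, "rb") as handle:
        data_table, oc_table = pickle.load(handle)
    return data_table, oc_table


# Toy stand-ins for the real tables (hypothetical columns and values).
toy_data = [{"patient": 1, "hour": 0.0, "variable": "HR", "value": 82.0}]
toy_oc = [{"patient": 1, "label": 0}]

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = Path(tmp_dir) / "toy_dataset.pkl"
    with open(toy_path, "wb") as handle:
        pickle.dump((toy_data, toy_oc), handle)
    data_table, oc_table = load_tables(toy_path)

print(len(data_table), len(oc_table))
```

Only unpickle files from a trusted source, since `pickle.load` can execute arbitrary code.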
-
Originally planned sepsis patient IDs were not found in the data; updated:
-
Write-up (to reveal specifics per chapter)
-
Experiments (Colab Notebooks)
-
Dry runs and tests to explore the Mortality data, models, environment setups, etc.
-
Forecasting and target prediction without text:
-
Forecasting and target prediction with text:
- tba
-
Document how much RAM is required for each experiment.
- RAM usage often peaks while loading data into matrices for forecasting
- usage is unstable from run to run
- it fluctuates up to a higher level, but not much more is used
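One possible way to record the peak for a single step, sketched with the standard-library `tracemalloc` module. Note the assumption: `tracemalloc` only sees allocations made through Python's allocator, so the number can be lower than the total process RAM shown in the Colab resource indicator. `run_with_peak_memory` and `build_toy_matrix` are hypothetical names; the toy matrix stands in for the real data-loading step.

```python
import tracemalloc


def run_with_peak_memory(fn, *args, **kwargs):
    """Call fn and return (result, peak traced memory in MiB)."""
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _current, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak_bytes / (1024 ** 2)


def build_toy_matrix(rows, cols):
    # Hypothetical stand-in for the data-loading step of an experiment.
    return [[0.0] * cols for _ in range(rows)]


matrix, peak_mib = run_with_peak_memory(build_toy_matrix, 1000, 1000)
print(f"peak traced memory: {peak_mib:.1f} MiB")
```

Because usage varies between runs, it is probably worth logging the peak over several runs rather than a single value.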
-
131 features = 129 physiological features + 2 static features (Age & Gender)
-
3 additional features for sepsis check
-
clinical notes
-
Discuss in the first meeting in January: same features for pretraining and finetuning?
-
For now: Mannheim features ∩ 40 features in Wang et al. ∩ 40 features in PhysioNet Challenge 2019
-
MIMIC-III
-
Mannheim Data
-
Strats (baseline, physiological features only)
-
Strats + Text
-
Wish: More flexible forecasting window!
Baselines:
SEFT?
to be discussed in a later stage
- AUC-ROC (implemented in Strats)
- PhysioNet Challenge 2019 evaluation scheme (link to Python implementation)
-
Significance testing: model with text vs. model without text
-
Ablation Study
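A sketch of how the with-text vs. without-text comparison could be tested on AUC-ROC. The choice of test is an assumption, since none is fixed above: this uses a paired approximate randomization test, which swaps the two models' scores per patient at random and checks how often the shuffled AUC difference is at least as large as the observed one. All names and toy scores are hypothetical.

```python
import random


def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def approximate_randomization(labels, scores_a, scores_b, trials=2000, seed=0):
    """Two-sided p-value for the AUC-ROC difference between two models."""
    rng = random.Random(seed)
    observed = abs(auc_roc(labels, scores_a) - auc_roc(labels, scores_b))
    at_least_as_extreme = 0
    for _ in range(trials):
        shuffled_a, shuffled_b = [], []
        for a, b in zip(scores_a, scores_b):
            # Under the null hypothesis the two models are exchangeable,
            # so each patient's pair of scores is swapped with prob. 0.5.
            if rng.random() < 0.5:
                a, b = b, a
            shuffled_a.append(a)
            shuffled_b.append(b)
        diff = abs(auc_roc(labels, shuffled_a) - auc_roc(labels, shuffled_b))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (at_least_as_extreme + 1) / (trials + 1)


# Hypothetical toy predictions for eight patients.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
scores_without_text = [0.2, 0.6, 0.5, 0.7, 0.4, 0.3, 0.1, 0.8]
scores_with_text = [0.1, 0.3, 0.6, 0.8, 0.2, 0.7, 0.4, 0.9]

p_value = approximate_randomization(labels, scores_with_text, scores_without_text)
print(f"p-value: {p_value:.3f}")
```

The pairwise AUC here is quadratic in the number of patients and only meant for illustration; on the full set a rank-based or library implementation would be the practical choice.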
Overleaf Link (Currently an ACL template)
I also have a parser that converts GitHub markdown tables to LaTeX tables: link
Time Series Forecasting consistent with PhysioNet challenge?
We ask participants to design and implement a working, open-source algorithm that can, based only on the clinical data provided, automatically identify a patient's risk of sepsis and make a positive or negative prediction of sepsis for every time interval.
Tasks to do:
-
reimplement the architecture in Wang et al.
-
in our case: turn the binary classification model into a regression model (MSE loss)
-
code for Wang et al.: pending request
-
code to calculate the PhysioNet challenge utility score
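The switch from binary classification to regression noted above mostly comes down to the output and the loss. A minimal sketch in plain Python contrasting the two losses; the function names and toy values are hypothetical, and the actual regression target is not specified here.

```python
import math


def binary_cross_entropy(targets, probs, eps=1e-12):
    """Mean BCE, the usual loss for a binary classifier with sigmoid output."""
    total = 0.0
    for y, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(targets)


def mean_squared_error(targets, preds):
    """Mean squared error, the loss for the regression variant."""
    return sum((y - p) ** 2 for y, p in zip(targets, preds)) / len(targets)


# Classification: targets are 0/1 labels, outputs are probabilities.
print(binary_cross_entropy([1, 0], [0.9, 0.2]))

# Regression: targets are real values, outputs are unbounded predictions
# (in a network this means dropping the sigmoid on the last layer).
print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```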
Beyond the current approach for time series forecasting
-
Survival Analysis? (Time-to-event Analysis)
-
various other approaches to TSF