Research-Module-WS22

Code repository for Research Module with Prof. Riezler (WS22): Natalia, Pablo, Jinghua

A transformer-based self-supervised approach to early sepsis prediction using physiological features and clinical notes.

Important Links to Data/Resources/Colab Notebooks/Write-up

Large Data Files on Google Drive (share link sent in a private email)
- Mortality Data for dry run (original) ✅
- Sepsis Data, three additional features (to be added) ✅
  - smaller set: data table = oc table ✅
  - full set: data table > oc table ✅
- Sepsis Data, three additional features + clinical notes (to be added) ☑️
  - smaller set: data table = oc table ☑️
  - full set: data table > oc table ☑️

Each dataset is stored as a pickle (.pkl) file. Each pkl file loads a data table (mainly for pretraining, but also used for fine-tuning) and an oc table (mainly for fine-tuning).
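
A minimal loading sketch follows. It assumes that each pickle holds a tuple whose first two elements are the data table and the oc table. Any further elements, such as split indices, are passed through untouched. `load_tables` and the file name in the usage line are hypothetical.

```python
import pickle
from pathlib import Path


def load_tables(pkl_path):
    """Return (data, oc, rest) from one dataset pickle.

    Assumption: the pickle stores a tuple/list whose first element is the
    data table and whose second element is the oc table; any remaining
    elements are returned as a list in `rest`.
    """
    with Path(pkl_path).open("rb") as f:
        data, oc, *rest = pickle.load(f)
    return data, oc, rest
```

Usage: `data, oc, rest = load_tables("sepsis_small.pkl")` (hypothetical file name). Only unpickle files from a trusted source, because loading a pickle can execute arbitrary code.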

~~Train/test/val split by patient id ✅~~
The originally planned sepsis patient IDs were not found in the data; updated:
Write-up (specifics per chapter to follow)
Experiments (Colab Notebooks)
- Dry runs and tests to explore the mortality data, models, environment setups, etc.
  - Initial Dry-run
  - Test Run
- Forecasting and target prediction without text:
  - Sepsis pre-text small
  - Sepsis pre-text large
- Forecasting and target prediction with text:
  - tba

Computing Requirements

Document how much RAM is required for each experiment.

Peak RAM usage is often reached while loading data into matrices for forecasting.

| Exp/Model | System RAM | GPU RAM | GPU | Time | Additional Notes |
|---|---|---|---|---|---|
| Exp 0 (dry run): reproduce Strats for mortality prediction | reached 32.8 GB when loading data, still increasing | 0 | ~ | avg. approx. 1 hour per epoch | peak reached while `loading data into matrices for forecasting` |
| Exp 1 (dry run): Strats mortality with our old IDs | > 35 GB | 0 | ~ | - | session crashed when loading data into matrices for forecasting (RAM limit was 35 GB) |
| `starts_sepsis_small` | reached 35 GB at epoch 15, still increasing | 0 | ~ | avg. approx. 1 hour per epoch; 24 h for forecasting, 24 h for target task | RAM usage increases over time, but not steadily |
| `starts_sepsis_large` | reached 36.8 GB at epoch 0 | 0 | ~ | avg. approx. 1 hour per epoch | expected to increase with more epochs |
| `starts_sepsis_small` with text | | | | | |
| `starts_sepsis_large` with text | | | | | |
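
To fill in the RAM column more consistently, the loading step can be wrapped in a small helper that records peak memory. The sketch below uses the standard-library `tracemalloc` module. It only counts allocations visible to Python's tracer, so it approximates rather than equals the process RAM shown by the Colab resource monitor. `measure_peak` and `build_matrix` are hypothetical names, and `build_matrix` merely stands in for the real "load data into matrices" step.

```python
import tracemalloc


def measure_peak(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, peak_bytes).

    peak_bytes is the peak memory traced by tracemalloc during the call,
    an approximation of (not identical to) the process RAM usage.
    """
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()  # always stop tracing, even if fn raises
    return result, peak


def build_matrix(n_rows, n_cols):
    # Hypothetical stand-in for loading data into matrices for forecasting.
    return [[0.0] * n_cols for _ in range(n_rows)]


if __name__ == "__main__":
    _, peak = measure_peak(build_matrix, 1000, 100)
    print(f"peak traced memory: {peak / 1024**2:.1f} MiB")
```

On Linux, `resource.getrusage(resource.RUSAGE_SELF).ru_maxrss` (reported in kilobytes) gives the process-level peak instead.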

Observations:

Peak RAM usage is often reached while loading data for forecasting.
RAM usage is unstable across runs.
Usage fluctuates to a higher level, but not by much.

Data

Datasets

MIMIC-III
Mannheim Data

Features

131 features = 129 physiological features + 2 static features (Age & Gender)
3 additional features for sepsis check
clinical notes

~~Discuss in the first meeting in January~~
- ~~same features for pretraining and finetuning?~~
~~For now: Mannheim features ^ 40 features in Wang et al. ^ 40 features in PhysioNet Challenge 2019~~

Data Inspection

MIMIC-III
Mannheim Data

Models

Strats (baseline, physiological features only)
Strats + Text
Wish: More flexible forecasting window!
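
One way to make the forecasting window configurable is to treat both the end of the observation period and the forecast horizon as parameters. The sketch below assumes each stay is a list of `(time in hours, variable, value)` triplets, the observation-triplet representation STraTS uses. The function name and the choice of the last value inside the horizon as the target are assumptions, not the Strats implementation.

```python
from typing import Dict, List, Tuple

Triplet = Tuple[float, str, float]  # (time in hours, variable name, value)


def forecasting_sample(
    triplets: List[Triplet], obs_end: float, horizon: float
) -> Tuple[List[Triplet], Dict[str, float]]:
    """Split one stay into model inputs and forecasting targets.

    Inputs: all observations with time < obs_end.
    Targets: per variable, the last value observed in [obs_end, obs_end + horizon).
    Both obs_end and horizon are free parameters, so the window is not fixed.
    """
    inputs = [t for t in triplets if t[0] < obs_end]
    targets: Dict[str, float] = {}
    for time, var, value in sorted(triplets):  # sorted by time first
        if obs_end <= time < obs_end + horizon:
            targets[var] = value  # later observations overwrite earlier ones
    return inputs, targets
```

Sliding `obs_end` over a stay would then yield several training samples per patient instead of one.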

Experiments

~~Baselines:~~

~~SEFT?~~

Evaluation

to be discussed at a later stage

AUC-ROC (implemented in Strats)
PhysioNet Challenge 2019 evaluation scheme (link to Python implementation)
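
AUC-ROC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counting one half. The dependency-free sketch below can be used to sanity-check the value reported by the Strats code. `auc_roc` is a hypothetical helper, and its quadratic cost makes it suitable only for small checks.

```python
from typing import Sequence


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC via the Mann-Whitney statistic (O(n_pos * n_neg))."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # tie counts half
    return wins / (len(pos) * len(neg))
```

For binary labels this should agree with `sklearn.metrics.roc_auc_score`.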

Further Analysis

Significance testing: model with text vs. model without text
~~Ablation Study~~
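
One common choice for comparing the two models on the same test patients is a paired bootstrap over patients. DeLong's test is an alternative for AUC. The sketch below is one possible setup, not a decided protocol. The function names are hypothetical, the p-value is one-sided (is the model with text better?), and AUC is recomputed inline so the block is self-contained.

```python
import random
from typing import Sequence, Tuple


def _auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    # Mann-Whitney form of AUC-ROC; positive/negative ties count 0.5.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_bootstrap_auc(
    labels: Sequence[int],
    scores_text: Sequence[float],
    scores_no_text: Sequence[float],
    n_boot: int = 1000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (observed AUC difference, one-sided bootstrap p-value).

    Each resample draws test patients with replacement and scores the SAME
    patients with both models, so the comparison is paired. The p-value is
    the share of resamples in which the model with text is not better.
    """
    rng = random.Random(seed)
    n = len(labels)
    observed = _auc(labels, scores_text) - _auc(labels, scores_no_text)
    not_better = valid = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if not 0 < sum(y) < n:  # AUC is undefined without both classes
            continue
        valid += 1
        diff = _auc(y, [scores_text[i] for i in idx]) - _auc(y, [scores_no_text[i] for i in idx])
        if diff <= 0:
            not_better += 1
    if valid == 0:
        raise ValueError("no resample contained both classes")
    return observed, not_better / valid
```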

OLD NOTES

MIMIC-III Data

Database: PhysioNet
How to get data
Application form

Keep some documentation on Overleaf?

~~Overleaf Link (Currently an ACL template)~~

I also have a parser that converts GitHub Markdown tables to LaTeX tables: link
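
For readers without access to that parser, here is a minimal sketch of the same kind of conversion. It is not the linked parser, and `markdown_table_to_latex` is a hypothetical name. It assumes a simple table: one header row, one `|---|` delimiter row, no escaped pipes, and left-aligned columns.

```python
def _escape(cell: str) -> str:
    # Escape characters that are special in LaTeX text mode (backslash not handled).
    for ch in "&%$#_":
        cell = cell.replace(ch, "\\" + ch)
    return cell


def markdown_table_to_latex(md: str) -> str:
    """Convert a simple GitHub Markdown table into a LaTeX tabular."""
    rows = [line.strip().strip("|") for line in md.strip().splitlines()]
    cells = [[_escape(c.strip()) for c in row.split("|")] for row in rows]
    header, body = cells[0], cells[2:]  # cells[1] is the |---| delimiter row
    out = ["\\begin{tabular}{" + "l" * len(header) + "}", "\\hline"]
    out.append(" & ".join(header) + " \\\\")
    out.append("\\hline")
    out += [" & ".join(row) + " \\\\" for row in body]
    out += ["\\hline", "\\end{tabular}"]
    return "\n".join(out)
```

Column alignment markers such as `:---:` are ignored here. Reading them from the delimiter row would be the natural next step.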

Problem Setup

Time-series forecasting consistent with the PhysioNet Challenge?

We ask participants to design and implement a working, open-source algorithm that can, based only on the clinical data provided, automatically identify a patient's risk of sepsis and make a positive or negative prediction of sepsis for every time interval.
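
"A prediction for every time interval" implies one label per hourly time step. Our reading of the 2019 challenge's labeling rule is that, for sepsis patients, the label is 1 from 6 hours before sepsis onset (t >= t_sepsis - 6) onward, and 0 before that. Non-sepsis patients are 0 throughout. The sketch below encodes that reading and should be checked against the challenge description. `hourly_sepsis_labels` is a hypothetical helper.

```python
from typing import List, Optional


def hourly_sepsis_labels(n_hours: int, t_sepsis: Optional[int], lead: int = 6) -> List[int]:
    """One binary label per hourly step t = 0 .. n_hours - 1.

    t_sepsis is the onset hour, or None for a non-sepsis patient.
    lead is how many hours before onset the positive label starts.
    """
    if t_sepsis is None:
        return [0] * n_hours
    return [1 if t >= t_sepsis - lead else 0 for t in range(n_hours)]
```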

Tasks:

Reimplement the architecture of Wang et al.
In our case: turn the binary classification model into a regression model trained with MSE loss.
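
Assuming the classifier ends in a sigmoid output trained with binary cross-entropy, the switch to regression amounts to dropping the sigmoid (a linear output) and swapping the loss for mean squared error. The two losses are sketched below in plain Python for comparison. The function names are hypothetical.

```python
import math
from typing import Sequence


def bce_loss(y_true: Sequence[int], p_pred: Sequence[float], eps: float = 1e-7) -> float:
    """Binary cross-entropy: the loss of the original binary classifier."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def mse_loss(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: the loss once the output head predicts a real value."""
    return sum((y - f) ** 2 for y, f in zip(y_true, y_pred)) / len(y_true)
```

Unlike BCE, MSE is sensitive to the scale of the targets, so normalizing the regression targets (e.g. per variable) is usually worth considering.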

Resources

MIMIC data extraction tool
Code for Wang et al.: pending request
Code to calculate the PhysioNet Challenge utility score
PhysioNet Challenge
SOFA to describe organ failure
Time-series forecasting (TSF) with ML

Extended Reading List

More thoughts

Beyond the current approach for time series forecasting

Survival Analysis? (Time-to-event Analysis)
various other approaches to TSF

Computing Resources

bwUniCluster
Jupyter documentation
File System/Data Management
