Research-Module-WS22

Code repository for Research Module with Prof. Riezler (WS22): Natalia, Pablo, Jinghua

A transformer-based self-supervised approach to early sepsis prediction using physiological features and clinical notes.

Important Links to Data/Resources/Colab Notebooks/Write-up

  • Large Data Files on Google Drive (share link in a private email)

    • Mortality Data for dry run (original) ✅

    • Sepsis Data, three additional features (to be added) ✅

      • smaller set: data table = oc table ✅

      • full set: data table > oc table ✅

    • Sepsis Data, three additional features + clinical notes (to be added) ☑️

      • smaller set: data table = oc table ☑️

      • full set: data table > oc table ☑️

Each dataset is stored as a pkl file. Each pkl loads a data table (mainly for pretraining, but also used for tuning) and an oc table (mainly for tuning).
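As a minimal sketch of how one such pkl file could be read, the snippet below assumes the pickle holds a 2-tuple `(data_table, oc_table)`. That layout, the helper names `save_dataset` and `load_dataset`, and the toy rows are assumptions for illustration only. The actual files may be organized differently.

```python
import pickle
import tempfile
from pathlib import Path


def save_dataset(path, data_table, oc_table):
    """Store both tables in one pickle (assumed layout: a 2-tuple)."""
    with open(path, "wb") as f:
        pickle.dump((data_table, oc_table), f)


def load_dataset(path):
    """Return (data_table, oc_table) from a pickle written by save_dataset."""
    with open(path, "rb") as f:
        data_table, oc_table = pickle.load(f)
    return data_table, oc_table


# Toy stand-ins; the real tables and their columns are not specified here.
toy_data = [{"id": 1, "hour": 0.5, "variable": "HR", "value": 88.0}]
toy_oc = [{"id": 1, "label": 0}]

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = Path(tmp) / "toy_dataset.pkl"
    save_dataset(pkl_path, toy_data, toy_oc)
    data_table, oc_table = load_dataset(pkl_path)

print(len(data_table), len(oc_table))
```

Only unpickle files from a trusted source, since `pickle.load` can execute arbitrary code.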

Computing Requirement

Document how much RAM is required for each experiment.

Peak RAM usage is often reached while loading data into matrices for forecasting.

| Exp/Model | System RAM | GPU RAM | GPU time | Additional Notes |
| --- | --- | --- | --- | --- |
| Exp 0 (dry run): reproduce strats for mortality prediction | reached 32.8 GB RAM usage when loading data, and increasing | 0 ~ | avg approx. 1 hour per epoch | loading data into matrices for forecasting |
| Exp 1 (dry run): strats mortality with our old ids | > 35 GB | 0 ~ | - | session crashed when loading data into matrices for forecasting (RAM limit was 35 GB) |
| starts_sepsis_small | reached 35 GB at epoch 15, and increasing | 0 ~ | avg approx. 1 hour per epoch; 24h for forecasting, 24h for target task | (Screenshot 2023-02-28 at 20 04 02) RAM usage increases over time, but not at a constant rate |
| starts_sepsis_large | reached 36.8 GB at epoch 0 | 0 ~ | avg approx. 1 hour per epoch | (Screenshot 2023-02-28 at 20 07 34) expected to increase with more epochs |
| starts_sepsis_small with text | | | | |
| starts_sepsis_large with text | | | | |

Observations:

  • Peak RAM usage is often reached while loading data for forecasting
  • usage is unstable and differs from run to run
  • usage fluctuates up to a higher level, but not much more memory is used overall
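To make the RAM numbers easier to compare across runs, one option is to record the peak around the data-loading step. The following is a minimal sketch using the standard-library `tracemalloc` module. The helper `peak_memory_mb` and the stand-in `build_matrix` are hypothetical names, not part of this repository. Note that `tracemalloc` only sees allocations made through Python's allocator, so it can under-report memory held by native libraries.

```python
import tracemalloc


def peak_memory_mb(step, *args, **kwargs):
    """Run step(*args, **kwargs); return (result, peak MiB traced by tracemalloc)."""
    tracemalloc.start()
    try:
        result = step(*args, **kwargs)
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak_bytes / (1024 ** 2)


def build_matrix(n_rows, n_cols):
    """Hypothetical stand-in for loading data into forecasting matrices."""
    return [[0.0] * n_cols for _ in range(n_rows)]


matrix, peak_mb = peak_memory_mb(build_matrix, 1000, 131)
print(f"peak traced allocations: {peak_mb:.2f} MiB")
```

Logging this value once per run (and per epoch, if usage keeps growing) would give a record to fill in the table instead of reading it off the Colab resource monitor.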

Data

Datasets

Features

  • 131 features = 129 physiological features + 2 static features (Age & Gender)

  • 3 additional features for sepsis check

  • clinical notes

  • Discuss in the first meeting in January

    • same features for pretraining and finetuning?
  • For now: Mannheim features ∩ 40 features in Wang et al. ∩ 40 features in PhysioNet Challenge 2019
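Assuming the feature set for now is meant as the overlap of the three lists, it can be computed with a plain set intersection. The feature names below are placeholders chosen for illustration. The real lists are not given in this README.

```python
# Placeholder names only; the real feature lists are not given in this README.
mannheim = {"heart_rate", "temperature", "lactate", "creatinine"}
wang_et_al = {"heart_rate", "temperature", "lactate", "wbc"}
physionet_2019 = {"heart_rate", "temperature", "wbc", "platelets"}

# Keep only the features that appear in all three sources.
shared = sorted(mannheim & wang_et_al & physionet_2019)
print(shared)
```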

Data Inspection

  • MIMIC-III

  • Mannheim Data

Models

  • Strats (baseline, physiological features only)

  • Strats + Text

  • Wish: More flexible forecasting window!

Experiments

Baselines:

Evaluation

To be discussed at a later stage.

Further Analysis

  • Significance testing: with text vs. without text model

  • Ablation Study
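For the with-text vs. without-text comparison, one possible choice is a paired approximate randomization test on per-example scores. The sketch below is an illustration under assumptions: the function name, the use of per-example correctness as the score, and the toy inputs are all hypothetical, and the test actually used in the project is still to be decided.

```python
import random


def approximate_randomization(scores_a, scores_b, n_shuffles=10000, seed=0):
    """Two-sided paired approximate randomization test on the mean score difference."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same examples")
    rng = random.Random(seed)
    n = len(scores_a)
    observed = abs(sum(scores_a) - sum(scores_b)) / n
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        diff = 0.0
        for a, b in zip(scores_a, scores_b):
            # Swap the two systems' scores for this example with probability 0.5.
            if rng.random() < 0.5:
                a, b = b, a
            diff += a - b
        if abs(diff) / n >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (at_least_as_extreme + 1) / (n_shuffles + 1)


# Toy per-example correctness (1 = correct) for two hypothetical models.
with_text = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
without_text = [1, 0, 0, 1, 0, 0, 1, 1, 0, 0]
p_value = approximate_randomization(with_text, without_text, n_shuffles=2000)
print(f"estimated p-value: {p_value:.3f}")
```

The test is paired because both models are evaluated on the same examples, so each shuffle only swaps the two scores within an example.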

OLD NOTES

MIMIC-III Data

Keep some documentation on Overleaf?

Overleaf Link (Currently an ACL template)

I also have a parser that converts GitHub markdown tables to LaTeX tables: link

Problem Setup

Time Series Forecasting consistent with PhysioNet challenge?

We ask participants to design and implement a working, open-source algorithm that can, based only on the clinical data provided, automatically identify a patient's risk of sepsis and make a positive or negative prediction of sepsis for every time interval.

Task to do:

  • reimplement the architecture in Wang et al.

  • in our case: alter the binary classification model into a regression model, trained with MSE loss
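The change from classification to regression mostly affects the output activation and the loss. As a rough, framework-free sketch: a classifier passes its output through a sigmoid and is trained with binary cross-entropy, while the regression variant uses the raw output with mean squared error. The function names and toy numbers below are illustrative assumptions, and the actual regression target is not specified here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(logits, labels):
    """Binary cross-entropy on sigmoid outputs (classification setup)."""
    eps = 1e-12  # clamp probabilities to avoid log(0)
    total = 0.0
    for z, y in zip(logits, labels):
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def mse_loss(predictions, targets):
    """Mean squared error on raw linear outputs (regression setup)."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Toy model outputs for three examples.
outputs = [2.0, -1.0, 0.5]
binary_labels = [1, 0, 1]            # classification targets
continuous_targets = [1.5, -0.5, 0.0]  # hypothetical regression targets

bce = bce_loss(outputs, binary_labels)
mse = mse_loss(outputs, continuous_targets)
print(f"BCE: {bce:.4f}  MSE: {mse:.4f}")
```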

Resources

Extended Reading List

More thoughts

Beyond the current approach for time series forecasting

  • Survival Analysis? (Time-to-event Analysis)

  • various other approaches to TSF

Computing Resources