Version 1.0 November 2019
patrickthoral committed Nov 25, 2019
0 parents commit 50eb2a4
Showing 23 changed files with 16,516 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .gitattributes
@@ -0,0 +1,2 @@
# Auto detect text files and perform LF normalization
* text=auto
11 changes: 11 additions & 0 deletions .gitignore
@@ -0,0 +1,11 @@
**/config.ini
**/.ipynb_checkpoints
**/__pycache__
**/*.csv
**/*.zip
**/*.7z
**/*.bak
**/*.ps1
**/*.parquet
**/dask-worker-space

21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2019 Patrick Thoral

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
30 changes: 30 additions & 0 deletions README.md
@@ -0,0 +1,30 @@
<img src="img/logo_amds.png" alt="Logo" height="128px"/>

# Welcome
AmsterdamUMCdb is the first freely accessible European intensive care database. It is endorsed by the European Society of Intensive Care Medicine (ESICM) and its Data Science Section. It contains de-identified health data related to tens of thousands of intensive care unit admissions, including demographics, vital signs, laboratory tests and medications.

# Version
The current version of AmsterdamUMCdb is 1.0, released in November 2019. This version contains data related to 23,371 intensive care unit and high dependency unit admissions of adult patients from 2003-2016.

# Requesting Access
The database, although de-identified, still contains detailed information regarding the clinical care of patients. It must therefore be treated with appropriate care and respect and cannot be shared without permission. To request access, go to the [Amsterdam Medical Data Science](https://amsterdammedicaldatascience.nl/) website.

# Facts and Figures
The current database contains data from the clinical patient data management system of the Department of Intensive Care, a mixed medical-surgical ICU, at Amsterdam University Medical Center. The clinical data comprise 23,371 admissions of 20,169 patients admitted from 2003 to 2016, with almost 1.0 billion clinical observations (vitals, clinical scoring systems, device data and lab results) and 5.0 million medication records.

<img src="img/plot_admissions_year.png" alt="Admissions per year category" height="512px"/>
<img src="img/plot_admissions_age.png" alt="Admission per age category" height="512px"/>


# Available tables
The table and field definitions are available from the [AmsterdamUMCdb wiki](https://github.com/AmsterdamUMC/AmsterdamUMCdb/wiki) and from Jupyter Notebooks in the [tables](tables/) folder.

|Table name|Description|
|:---|:---|
|[admissions](https://github.com/AmsterdamUMC/AmsterdamUMCdb/wiki/admissions)|admissions and demographic data of the patients admitted to the ICU or MCU|
|[drugitems](https://github.com/AmsterdamUMC/AmsterdamUMCdb/wiki/drugitems)|medication orders including fluids, (parenteral) feeding and blood transfusions during the stay on the ICU|
|[freetextitems](https://github.com/AmsterdamUMC/AmsterdamUMCdb/wiki/freetextitems)|observations, including laboratory results, that are based on non-numeric (text) data|
|[listitems](https://github.com/AmsterdamUMC/AmsterdamUMCdb/wiki/listitems)|categorical observations, e.g. based on a selection from a list, like type of heart rhythm, ventilatory mode, etc.|
|[numericitems](https://github.com/AmsterdamUMC/AmsterdamUMCdb/wiki/numericitems)| numerical measurements and observations, including vital parameters, data from medical devices, lab results, outputs from drains and foley-catheters, scores etc.|
|[procedureorderitems](https://github.com/AmsterdamUMC/AmsterdamUMCdb/wiki/procedureorderitems)|procedures and tasks, such as performing a chest X-ray, drawing blood and daily ICU nursing care and scoring|
|[processitems](https://github.com/AmsterdamUMC/AmsterdamUMCdb/wiki/processitems)|catheters, drains, tubes, and continuous non-medication processes (e.g. renal replacement therapy, hypothermia induction, etc.)|
85 changes: 85 additions & 0 deletions config.SAMPLE.ini
@@ -0,0 +1,85 @@
################################################################################
# SAMPLE config.ini file for AmsterdamUMCdb
# This configuration file contains settings for the amsterdamumcdb notebooks for
# connecting to databases. Save the file as config.ini in the root of the
# repository
################################################################################

################################################################################
# This section stores the settings for the csv files containing the actual database
################################################################################
[files]
datapath = ./data
admissions = admissions.csv
drugitems = drugitems.csv
freetextitems = freetextitems.csv
listitems = listitems.csv
numericitems = numericitems.csv
procedureorderitems = procedureorderitems.csv
processitems = processitems.csv

################################################################################
# This section stores the settings for connecting to a PostgreSQL server using
# the psycopg2 module.
################################################################################
[psycopg2]
database = postgres
username = postgres
password = postgres
host = 127.0.0.1
port = 5432

################################################################################
# This section stores the settings for connecting to (SQL) database servers
# from different vendors using the pyodbc package. The Amsterdam UMC
# AmsterdamUMCdb project uses Microsoft SQL Server, which is the default
# uncommented connection string. Uncomment one of the other connection strings
# depending on the database server in use. See
# [Connecting to databases](https://github.com/mkleehammer/pyodbc/wiki)
# on the pyodbc GitHub wiki for more information on setting the connection
# strings, including database and OS specific issues.
#
# Note: username/password are not required for Microsoft SQL Server when using
# Windows Authentication with Trusted_Connection
################################################################################
[pyodbc]
hostname = myservername.mydomain.com
database = mydatabase
username = myusername
password = mypassword

#Microsoft SQL Server Connection String using Windows Authentication
connectionstring = (
'DRIVER={ODBC Driver 13 for SQL Server};' #ODBC driver to use
'SERVER='+hostname+';'
'DATABASE='+database+';'
'Trusted_Connection=yes'
)


#Microsoft SQL Server Connection String using username/password
# connectionstring = (
# 'DRIVER={ODBC Driver 13 for SQL Server};' #ODBC driver to use
# 'SERVER='+hostname+';'
# 'DATABASE='+database+';'
# 'UID='+username+';'
# 'PWD='+password+';'
# )

#MySQL
# connectionstring = (
# 'DRIVER={MySQL};'
# 'SERVER='+hostname+';'
# 'DATABASE='+database+';'
# 'UID='+username+';'
# 'PWD='+password+';'
# )

#PostgreSQL
# connectionstring = (
# 'DRIVER={PostgreSQL Unicode(x64)};'
# 'SERVER='+hostname+';'
# 'DATABASE='+database+';'
# 'UID='+username+';'
# 'PWD='+password+';'
# )
8 changes: 8 additions & 0 deletions data/README.md
@@ -0,0 +1,8 @@
<img src="../img/logo_amds.png" alt="Logo" height="128px"/>

# AmsterdamUMCdb - Freely Accessible ICU Database
version 1.0 November 2019
Copyright &copy; 2003-2019 Amsterdam UMC - Amsterdam Medical Data Science

# Data folder
This folder is a placeholder for the AmsterdamUMCdb csv files. Extract the files into this folder so the Jupyter Notebooks can find them without manually changing the paths. You are free to choose another location, but if you do, make sure to modify the [`config.SAMPLE.ini`](../config.SAMPLE.ini) file in the root folder of this repository and save it as `config.ini`.
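
The following is a minimal sketch of how a notebook might resolve the csv locations from the `[files]` section, assuming the layout shown in `config.SAMPLE.ini`. The inline sample text and the helper name `csv_paths` are illustrative assumptions and not part of the repository.

```python
import configparser
from pathlib import Path
from typing import Dict

# Illustrative excerpt of the [files] section; a real notebook would read config.ini instead.
SAMPLE_CONFIG = """
[files]
datapath = ./data
admissions = admissions.csv
numericitems = numericitems.csv
"""


def csv_paths(config_text: str) -> Dict[str, Path]:
    """Map each table name in [files] to its csv path below `datapath`."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    files = parser["files"]
    base = Path(files["datapath"])
    # Every key except `datapath` is treated as a table name.
    return {name: base / filename for name, filename in files.items() if name != "datapath"}


paths = csv_paths(SAMPLE_CONFIG)
for table, path in paths.items():
    print(table, "->", path)
```

To use a different data location, only the `datapath` value needs to change; the table file names stay the same.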
Binary file added img/avatar_amds.png
Binary file added img/avatar_amsterdam_umc.png
Binary file added img/logo_amds.png
Binary file added img/logo_amsterdam_umc.png
Binary file added img/plot_admissions_age.png
Binary file added img/plot_admissions_year.png
49 changes: 49 additions & 0 deletions setup-amsterdamumcdb/README.md
@@ -0,0 +1,49 @@
<img src="../img/logo_amds.png" alt="Logo" height="128px"/>

# AmsterdamUMCdb - Freely Accessible ICU Database
version 1.0 November 2019
Copyright &copy; 2003-2019 Amsterdam UMC - Amsterdam Medical Data Science

# Setup AmsterdamUMCdb
## Requirements
- Access to the AmsterdamUMCdb csv files: request access from [Amsterdam Medical Data Science](https://www.amsterdammedicaldatascience.nl/).
- Operating system: any OS capable of running Python and PostgreSQL, including Windows, macOS and Linux.
- Internal memory: 8 GB should suffice for basic analysis and running the Jupyter notebooks. However, the recommended memory to run both PostgreSQL and the Jupyter Notebooks on the same machine is 16-32 GB.
- Disk space: Downloading and extracting the database files will require 110 GB of hard disk space. In addition, creating the SQL database requires about 128 GB of hard disk space and an additional 144 GB for creating the indices to improve query performance.
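
Adding up the disk space figures above gives 110 + 128 + 144 = 382 GB, assuming the extracted csv files are kept next to the database and everything lives on the same disk. The sketch below checks the free space before starting; the constant and function names are illustrative, and decimal gigabytes are assumed.

```python
import shutil

# Figures from the requirements list, in GB.
CSV_GB = 110       # downloading and extracting the database files
DATABASE_GB = 128  # creating the SQL database
INDEX_GB = 144     # creating the indices
REQUIRED_GB = CSV_GB + DATABASE_GB + INDEX_GB


def free_gb(path: str = ".") -> float:
    """Free space on the disk holding `path`, in decimal gigabytes."""
    return shutil.disk_usage(path).free / 1e9


def has_enough_space(free: float, required: float = REQUIRED_GB) -> bool:
    """True when the free space covers the combined requirement."""
    return free >= required


available = free_gb()
print(f"Required: {REQUIRED_GB} GB, free: {available:.0f} GB, enough: {has_enough_space(available)}")
```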

## 1. Install a Python distribution
We **strongly recommend** installing Python using Anaconda, a popular distribution that includes many useful modules for data science out of the box. Install the latest Python 3.7 distribution from the [Anaconda](https://www.anaconda.com/distribution) distribution page.

## 2. Install PostgreSQL
PostgreSQL is an open source database management system (DBMS), available for most operating systems, including Windows, macOS and Linux. We recommend installing the most recent version of PostgreSQL (version 12) from the PostgreSQL [download](https://www.postgresql.org/download/) page. Make a note of the password you set for the `postgres` superuser. If you did not choose `postgres` as the password, change the settings in the [`config.SAMPLE.ini`](https://github.com/AmsterdamUMC/AmsterdamUMCdb/tree/master/config.SAMPLE.ini) file in the root of the repository and save the file as [`config.ini`](https://github.com/AmsterdamUMC/AmsterdamUMCdb/tree/master/config.ini).

## 3. Install psycopg2 module
To connect to your PostgreSQL server from Python, the [psycopg2](https://pypi.org/project/psycopg2/) package needs to be installed from the Anaconda Prompt/Shell using conda:

> conda install -c anaconda psycopg2
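
Once psycopg2 is installed, a connection could be opened roughly as sketched below. This assumes the `[psycopg2]` section layout from `config.SAMPLE.ini`; the inline sample text and the helper names `connection_kwargs` and `connect` are illustrative and not taken from the notebooks.

```python
import configparser
from typing import Any, Dict

# Illustrative copy of the [psycopg2] section; a real script would read config.ini.
SAMPLE_CONFIG = """
[psycopg2]
database = postgres
username = postgres
password = postgres
host = 127.0.0.1
port = 5432
"""


def connection_kwargs(config_text: str) -> Dict[str, Any]:
    """Translate the [psycopg2] section into keyword arguments for psycopg2.connect."""
    # interpolation=None keeps passwords containing '%' intact.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    section = parser["psycopg2"]
    return {
        "dbname": section["database"],
        "user": section["username"],
        "password": section["password"],
        "host": section["host"],
        "port": section.getint("port"),
    }


def connect(config_text: str):
    """Open a connection; needs psycopg2 and a running PostgreSQL server."""
    import psycopg2  # imported here so the helper above works without the package

    return psycopg2.connect(**connection_kwargs(config_text))


kwargs = connection_kwargs(SAMPLE_CONFIG)
# Print everything except the password.
print({key: value for key, value in kwargs.items() if key != "password"})
```

Calling `connect(SAMPLE_CONFIG)` only succeeds when a PostgreSQL server is reachable with these settings.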

## 4. Clone the AmsterdamUMCdb GitHub repository
Clone or download the [AmsterdamUMCdb](https://github.com/AmsterdamUMC/AmsterdamUMCdb) repository from GitHub.
Follow the instructions on GitHub's online step-by-step guide, if needed: https://help.github.com/en/github/creating-cloning-and-archiving-repositories/cloning-a-repository.

## 5. Download the database files
Download the AmsterdamUMCdb zip file and extract its contents to the `data` folder of the cloned AmsterdamUMCdb repository.

## 6. Create database tables
Start the Jupyter Notebook server from the command line (using Command Prompt on Windows or Terminal on Mac/Linux) by running:

> jupyter notebook

From the Jupyter file browser, open the `setup-amsterdamumc.ipynb` file from the `setup-amsterdamumcdb` folder in the cloned repository. The code in the notebook assumes a default PostgreSQL installation with a database named `postgres` and a user `postgres` with password `postgres`. If your installation differs, change these settings in the [`config.SAMPLE.ini`](https://github.com/AmsterdamUMC/AmsterdamUMCdb/tree/master/config.SAMPLE.ini) file in the root of the repository and save the file as `config.ini`.
To create the tables in the database, run this Jupyter notebook, either cell by cell (▶️ Run) to see what's happening, or use the ⏩ button to automatically perform all steps. An `amsterdamumc` [schema](https://www.postgresql.org/docs/12/ddl-schemas.html) will be created, and all tables will be added to this schema.

## 7. Verify the database
After the notebook has run to completion, the `postgres` database should contain all tables, each with the same number of records as in the released files. The output should state `Verification: PASSED`.
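
The check described above can be sketched as a comparison of row counts per table. This is not the notebook's actual verification code: the helper names are hypothetical, the expected count assumes that the `admissions` table holds one row for each of the 23,371 admissions, and the "actual" count is passed in by hand instead of being queried from a server.

```python
from typing import Dict

SCHEMA = "amsterdamumc"  # schema name mentioned in step 6


def count_query(table: str, schema: str = SCHEMA) -> str:
    """SQL that counts the rows of one table; table names come from a fixed list."""
    return f"SELECT COUNT(*) FROM {schema}.{table};"


def verify(actual: Dict[str, int], expected: Dict[str, int]) -> str:
    """Compare row counts per table and summarise the result."""
    mismatches = {
        table: (actual.get(table), count)
        for table, count in expected.items()
        if actual.get(table) != count
    }
    return "PASSED" if not mismatches else f"FAILED: {mismatches}"


# Assumption: one row per admission, using the figure from the main README.
expected_counts = {"admissions": 23371}

print(count_query("admissions"))
# In practice `actual` would be filled by running count_query() for every table.
print("Verification:", verify({"admissions": 23371}, expected_counts))
```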

## 8. Create database table indices
It's highly recommended to create some useful indices to improve performance for common queries on identifiers like admissionid, itemid and measured times.
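
As a rough illustration of this step, the sketch below generates `CREATE INDEX` statements for the identifier columns named above. It assumes the `amsterdamumc` schema from step 6 and that the chosen table actually has `admissionid` and `itemid` columns; the helper name and the index naming scheme are made up for this example and may differ from what the setup notebook does.

```python
from typing import List

SCHEMA = "amsterdamumc"  # schema name mentioned in step 6


def index_statements(table: str, columns: List[str], schema: str = SCHEMA) -> List[str]:
    """Build one single-column CREATE INDEX statement per column."""
    return [
        f"CREATE INDEX IF NOT EXISTS {table}_{column}_idx ON {schema}.{table} ({column});"
        for column in columns
    ]


statements = index_statements("numericitems", ["admissionid", "itemid"])
for statement in statements:
    print(statement)
```

Each statement can then be executed through an open database connection. A column for the measurement time could be added to the list in the same way once its name has been checked in the table definitions.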

## 9. Jupyter Notebooks
While the indices are being created, the PostgreSQL server should remain available for querying with the notebooks in the [`tables`](https://github.com/AmsterdamUMC/AmsterdamUMCdb/tree/master/tables) folder, although with lower performance. We use plotly (version >4) for interactive plots in some notebooks. Plotly can be installed using conda:

> conda install -c plotly plotly