generated from CDCgov/template
-
Notifications
You must be signed in to change notification settings - Fork 4
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
adds two notebooks to test clearing outputs (#121)
Co-authored-by: Bryan <[email protected]>
- Loading branch information
1 parent
ae4f6f0
commit 43a55ba
Showing
2 changed files
with
950 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,182 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "0d9247a0-f8e3-435b-932b-da2aef3fe9e2", | ||
"metadata": {}, | ||
"source": [ | ||
"# Overview\n", | ||
"\n", | ||
"This notebook is intended is to walk you through the process of getting the ECR Viewer running locally with a fully populated database of dummy LA County data. There are several steps that need to be taken:\n", | ||
"\n", | ||
"1. Clone the PHDI repo\n", | ||
"2. Download the LAC zip file\n", | ||
"3. Convert all of the XML data into FHIR\n", | ||
"4. Start looking for errors" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "4102572c-df5d-47dc-8119-fe7f71aae743", | ||
"metadata": {}, | ||
"source": [ | ||
"### 1. Clone the PHDI repo\n", | ||
"\n", | ||
"If you've got SSH keys set with your GitHub account, then you can run `[email protected]:CDCgov/phdi.git`. Otherwise, you can run `https://github.com/CDCgov/phdi.git`. Either way, you should end up with a directory called `phdi`.\n", | ||
"\n", | ||
"**NOTE**: you should clean up the sample data that comes with the repo by deleting the data in the `phdi/containers/ecr-viewer/seed-scripts/baseECR` and `phdi/containers/ecr-viewer/seed-scripts/baseECR`." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "a22d2cbb-f4d0-4336-9c5a-f8333251eb36", | ||
"metadata": {}, | ||
"source": [ | ||
"### 2. Download the LAC zip file\n", | ||
"\n", | ||
"The data is saved [here](https://drive.google.com/file/d/17d10TmhGHT9fMF5sXONsLOrZTF9F6WWN/view?usp=drive_link) on Google Drive." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "d0477ff9-a8b3-4124-adef-9fc1c9990b5e", | ||
"metadata": {}, | ||
"source": [ | ||
"### 3. Convert all of the XML data into FHIR\n", | ||
"\n", | ||
"There are several steps to complete this process:\n", | ||
"\n", | ||
"1. Extract all of the zip files from the `LAC_DATA` zip archive.\n", | ||
"2. For each zip file in the `LAC_DATA` archive, extract all of the files into a folder named after the zip file.\n", | ||
"3. Fire up the FHIR conversion Docker container.\n", | ||
"4. Loop through each folder and use the eICR and RR XML files to create a FHIR JSON file." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "3648cc77-4b5c-4383-921a-0a48942ff0d2", | ||
"metadata": {}, | ||
"source": [ | ||
"#### 0. Import necessary libraries" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "5e22daca-cb6a-4619-9964-b4a75b903a2c", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import pandas as pd\n", | ||
"from lxml import etree\n", | ||
"from pathlib import Path\n", | ||
"from zipfile import ZipFile" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "729f3800-cf10-45ce-b072-44ee968b95de", | ||
"metadata": {}, | ||
"source": [ | ||
"#### 1. Extract all of the zip files from the `LAC_DATA` zip archive" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "b356a1b9-99d0-45e4-9e87-85ac7faa0b2f", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"infile = Path(\"./LAC_DATA.zip\")\n", | ||
"outfile = Path(\"./LAC_DATA\")\n", | ||
"with ZipFile(infile, \"r\") as zip_file:\n", | ||
" zip_file.extractall(path=\".\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "3e95940d-9fcd-4629-8660-2389a5db00fd", | ||
"metadata": {}, | ||
"source": [ | ||
"#### 2. For each zip file in the archive, extract all of the other files into a folder named after the zip file" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "6e846502-92c4-4b98-bcbd-86a094f2feab", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"all_files = list(outfile.glob(\"*.*\"))\n", | ||
"for f in all_files:\n", | ||
" with ZipFile(f, \"r\") as zip_file:\n", | ||
" # assumes you are in the phdi directory\n", | ||
" zip_file.extractall(path=f\"./containers/ecr-viewer/seed-scripts/baseECR/{f.stem}/\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "c47e3df2-631b-4ce6-aa11-f390ad783ef2", | ||
"metadata": {}, | ||
"source": [ | ||
"#### 3. Launch FHIR Conversion Docker container\n", | ||
"\n", | ||
"Navigate to the `ecr-viewer` subdirectory in the `phdi` directory via `cd containers/ecr-viewer` in your terminal and run the following commands:\n", | ||
"\n", | ||
"```bash\n", | ||
"dockId=$(docker run --rm -d -it -p 8080:8080 \"$(docker build -q ./../fhir-converter/)\")\n", | ||
"echo $dockId\n", | ||
"```\n", | ||
"\n", | ||
"As long as you see a value printed to the screen, your container should be up and running." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "b3f043ac-4bac-4a1a-9f30-efe4f7b1b0b2", | ||
"metadata": {}, | ||
"source": [ | ||
"#### 4. Loop through each folder and extract the files\n", | ||
"\n", | ||
"Change directories into the `seed-scripts` directory and run the following for loop in your terminal:\n", | ||
"\n", | ||
"```bash\n", | ||
"for d in ./baseECR/* ; do\n", | ||
" #first escape \", then /, and finally remove all new lines\n", | ||
" rr=$(sed -e 's/\"/\\\\\"/g ; s=/=\\\\\\/=g ; $!s/$/\\\\n/' \"$d/CDA_RR.xml\" | tr -d '\\n')\n", | ||
" eicr=$(sed 's/\"/\\\\\"/g ; s=/=\\\\\\/=g ; $!s/$/\\\\n/' \"$d/CDA_eICR.xml\" | tr -d '\\n')\n", | ||
" resp=$(curl -l 'http://localhost:8080/convert-to-fhir' --header 'Content-Type: application/json' --data-raw '{\"input_type\":\"ecr\",\"root_template\":\"EICR\",\"input_data\": \"'\"$eicr\"'\",\"rr_data\": \"'\"$rr\"'\"}')\n", | ||
" echo $resp | jq '.response.FhirResource' > \"./fhir_data/$(basename $d).json\"\n", | ||
"done\n", | ||
"```" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "306d7319-e4d5-44ff-9923-ef49fc8fe03b", | ||
"metadata": {}, | ||
"source": [ | ||
"To check that everything worked as expected, you can run `ls -lh ./seed-scripts/fhir-data/ | wc -l`, which should return a value of `1282`. " | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "437822b7-a9e1-4b35-b172-1a518c463709", | ||
"metadata": {}, | ||
"source": [ | ||
"In order to launch the ECR Viewer, you should simply need to run `docker compose up` from the `phdi/containers/ecr-viewer` directory. However, it should also fail. Seeding the database will fail because not all of the data successfully converted, so there are erroneous files.\n", | ||
"\n", | ||
"**This is where the work needs to be picked up.**" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"language_info": { | ||
"name": "python" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
Oops, something went wrong.