Skip to content

Commit

Permalink
adds two notebooks to test clearing outputs (#121)
Browse files Browse the repository at this point in the history
Co-authored-by: Bryan <[email protected]>
  • Loading branch information
bryanbritten and bryanbritten authored Jun 14, 2024
1 parent ae4f6f0 commit 43a55ba
Show file tree
Hide file tree
Showing 2 changed files with 950 additions and 0 deletions.
182 changes: 182 additions & 0 deletions dedupe/notebooks/ECR to FHIR setup.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,182 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "0d9247a0-f8e3-435b-932b-da2aef3fe9e2",
"metadata": {},
"source": [
"# Overview\n",
"\n",
"This notebook is intended is to walk you through the process of getting the ECR Viewer running locally with a fully populated database of dummy LA County data. There are several steps that need to be taken:\n",
"\n",
"1. Clone the PHDI repo\n",
"2. Download the LAC zip file\n",
"3. Convert all of the XML data into FHIR\n",
"4. Start looking for errors"
]
},
{
"cell_type": "markdown",
"id": "4102572c-df5d-47dc-8119-fe7f71aae743",
"metadata": {},
"source": [
"### 1. Clone the PHDI repo\n",
"\n",
"If you've got SSH keys set with your GitHub account, then you can run `[email protected]:CDCgov/phdi.git`. Otherwise, you can run `https://github.com/CDCgov/phdi.git`. Either way, you should end up with a directory called `phdi`.\n",
"\n",
"**NOTE**: you should clean up the sample data that comes with the repo by deleting the data in the `phdi/containers/ecr-viewer/seed-scripts/baseECR` and `phdi/containers/ecr-viewer/seed-scripts/baseECR`."
]
},
{
"cell_type": "markdown",
"id": "a22d2cbb-f4d0-4336-9c5a-f8333251eb36",
"metadata": {},
"source": [
"### 2. Download the LAC zip file\n",
"\n",
"The data is saved [here](https://drive.google.com/file/d/17d10TmhGHT9fMF5sXONsLOrZTF9F6WWN/view?usp=drive_link) on Google Drive."
]
},
{
"cell_type": "markdown",
"id": "d0477ff9-a8b3-4124-adef-9fc1c9990b5e",
"metadata": {},
"source": [
"### 3. Convert all of the XML data into FHIR\n",
"\n",
"There are several steps to complete this process:\n",
"\n",
"1. Extract all of the zip files from the `LAC_DATA` zip archive.\n",
"2. For each zip file in the `LAC_DATA` archive, extract all of the files into a folder named after the zip file.\n",
"3. Fire up the FHIR conversion Docker container.\n",
"4. Loop through each folder and use the eICR and RR XML files to create a FHIR JSON file."
]
},
{
"cell_type": "markdown",
"id": "3648cc77-4b5c-4383-921a-0a48942ff0d2",
"metadata": {},
"source": [
"#### 0. Import necessary libraries"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5e22daca-cb6a-4619-9964-b4a75b903a2c",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from lxml import etree\n",
"from pathlib import Path\n",
"from zipfile import ZipFile"
]
},
{
"cell_type": "markdown",
"id": "729f3800-cf10-45ce-b072-44ee968b95de",
"metadata": {},
"source": [
"#### 1. Extract all of the zip files from the `LAC_DATA` zip archive"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b356a1b9-99d0-45e4-9e87-85ac7faa0b2f",
"metadata": {},
"outputs": [],
"source": [
"infile = Path(\"./LAC_DATA.zip\")\n",
"outfile = Path(\"./LAC_DATA\")\n",
"with ZipFile(infile, \"r\") as zip_file:\n",
" zip_file.extractall(path=\".\")"
]
},
{
"cell_type": "markdown",
"id": "3e95940d-9fcd-4629-8660-2389a5db00fd",
"metadata": {},
"source": [
"#### 2. For each zip file in the archive, extract all of the other files into a folder named after the zip file"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6e846502-92c4-4b98-bcbd-86a094f2feab",
"metadata": {},
"outputs": [],
"source": [
"all_files = list(outfile.glob(\"*.*\"))\n",
"for f in all_files:\n",
" with ZipFile(f, \"r\") as zip_file:\n",
" # assumes you are in the phdi directory\n",
" zip_file.extractall(path=f\"./containers/ecr-viewer/seed-scripts/baseECR/{f.stem}/\")"
]
},
{
"cell_type": "markdown",
"id": "c47e3df2-631b-4ce6-aa11-f390ad783ef2",
"metadata": {},
"source": [
"#### 3. Launch FHIR Conversion Docker container\n",
"\n",
"Navigate to the `ecr-viewer` subdirectory in the `phdi` directory via `cd containers/ecr-viewer` in your terminal and run the following commands:\n",
"\n",
"```bash\n",
"dockId=$(docker run --rm -d -it -p 8080:8080 \"$(docker build -q ./../fhir-converter/)\")\n",
"echo $dockId\n",
"```\n",
"\n",
"As long as you see a value printed to the screen, your container should be up and running."
]
},
{
"cell_type": "markdown",
"id": "b3f043ac-4bac-4a1a-9f30-efe4f7b1b0b2",
"metadata": {},
"source": [
"#### 4. Loop through each folder and extract the files\n",
"\n",
"Change directories into the `seed-scripts` directory and run the following for loop in your terminal:\n",
"\n",
"```bash\n",
"for d in ./baseECR/* ; do\n",
" #first escape \", then /, and finally remove all new lines\n",
" rr=$(sed -e 's/\"/\\\\\"/g ; s=/=\\\\\\/=g ; $!s/$/\\\\n/' \"$d/CDA_RR.xml\" | tr -d '\\n')\n",
" eicr=$(sed 's/\"/\\\\\"/g ; s=/=\\\\\\/=g ; $!s/$/\\\\n/' \"$d/CDA_eICR.xml\" | tr -d '\\n')\n",
" resp=$(curl -l 'http://localhost:8080/convert-to-fhir' --header 'Content-Type: application/json' --data-raw '{\"input_type\":\"ecr\",\"root_template\":\"EICR\",\"input_data\": \"'\"$eicr\"'\",\"rr_data\": \"'\"$rr\"'\"}')\n",
" echo $resp | jq '.response.FhirResource' > \"./fhir_data/$(basename $d).json\"\n",
"done\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "306d7319-e4d5-44ff-9923-ef49fc8fe03b",
"metadata": {},
"source": [
"To check that everything worked as expected, you can run `ls -lh ./seed-scripts/fhir-data/ | wc -l`, which should return a value of `1282`. "
]
},
{
"cell_type": "markdown",
"id": "437822b7-a9e1-4b35-b172-1a518c463709",
"metadata": {},
"source": [
"In order to launch the ECR Viewer, you should simply need to run `docker compose up` from the `phdi/containers/ecr-viewer` directory. However, it should also fail. Seeding the database will fail because not all of the data successfully converted, so there are erroneous files.\n",
"\n",
"**This is where the work needs to be picked up.**"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Loading

0 comments on commit 43a55ba

Please sign in to comment.