adds two notebooks to test clearing outputs (#121)

Co-authored-by: Bryan <[email protected]>
CDCgov · Jun 14, 2024 · 43a55ba · 43a55ba
1 parent ae4f6f0
commit 43a55ba
Show file tree

Hide file tree

Showing 2 changed files with 950 additions and 0 deletions.
diff --git a/dedupe/notebooks/ECR to FHIR setup.ipynb b/dedupe/notebooks/ECR to FHIR setup.ipynb
@@ -0,0 +1,182 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "0d9247a0-f8e3-435b-932b-da2aef3fe9e2",
+   "metadata": {},
+   "source": [
+    "# Overview\n",
+    "\n",
+    "This notebook is intended is to walk you through the process of getting the ECR Viewer running locally with a fully populated database of dummy LA County data. There are several steps that need to be taken:\n",
+    "\n",
+    "1. Clone the PHDI repo\n",
+    "2. Download the LAC zip file\n",
+    "3. Convert all of the XML data into FHIR\n",
+    "4. Start looking for errors"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4102572c-df5d-47dc-8119-fe7f71aae743",
+   "metadata": {},
+   "source": [
+    "### 1. Clone the PHDI repo\n",
+    "\n",
+    "If you've got SSH keys set with your GitHub account, then you can run `[email protected]:CDCgov/phdi.git`. Otherwise, you can run `https://github.com/CDCgov/phdi.git`. Either way, you should end up with a directory called `phdi`.\n",
+    "\n",
+    "**NOTE**: you should clean up the sample data that comes with the repo by deleting the data in the `phdi/containers/ecr-viewer/seed-scripts/baseECR` and `phdi/containers/ecr-viewer/seed-scripts/baseECR`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a22d2cbb-f4d0-4336-9c5a-f8333251eb36",
+   "metadata": {},
+   "source": [
+    "### 2. Download the LAC zip file\n",
+    "\n",
+    "The data is saved [here](https://drive.google.com/file/d/17d10TmhGHT9fMF5sXONsLOrZTF9F6WWN/view?usp=drive_link) on Google Drive."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d0477ff9-a8b3-4124-adef-9fc1c9990b5e",
+   "metadata": {},
+   "source": [
+    "### 3. Convert all of the XML data into FHIR\n",
+    "\n",
+    "There are several steps to complete this process:\n",
+    "\n",
+    "1. Extract all of the zip files from the `LAC_DATA` zip archive.\n",
+    "2. For each zip file in the `LAC_DATA` archive, extract all of the files into a folder named after the zip file.\n",
+    "3. Fire up the FHIR conversion Docker container.\n",
+    "4. Loop through each folder and use the eICR and RR XML files to create a FHIR JSON file."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3648cc77-4b5c-4383-921a-0a48942ff0d2",
+   "metadata": {},
+   "source": [
+    "#### 0. Import necessary libraries"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5e22daca-cb6a-4619-9964-b4a75b903a2c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "from lxml import etree\n",
+    "from pathlib import Path\n",
+    "from zipfile import ZipFile"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "729f3800-cf10-45ce-b072-44ee968b95de",
+   "metadata": {},
+   "source": [
+    "#### 1. Extract all of the zip files from the `LAC_DATA` zip archive"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b356a1b9-99d0-45e4-9e87-85ac7faa0b2f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "infile = Path(\"./LAC_DATA.zip\")\n",
+    "outfile = Path(\"./LAC_DATA\")\n",
+    "with ZipFile(infile, \"r\") as zip_file:\n",
+    "    zip_file.extractall(path=\".\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3e95940d-9fcd-4629-8660-2389a5db00fd",
+   "metadata": {},
+   "source": [
+    "#### 2. For each zip file in the archive, extract all of the other files into a folder named after the zip file"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6e846502-92c4-4b98-bcbd-86a094f2feab",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "all_files = list(outfile.glob(\"*.*\"))\n",
+    "for f in all_files:\n",
+    "    with ZipFile(f, \"r\") as zip_file:\n",
+    "        # assumes you are in the phdi directory\n",
+    "        zip_file.extractall(path=f\"./containers/ecr-viewer/seed-scripts/baseECR/{f.stem}/\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c47e3df2-631b-4ce6-aa11-f390ad783ef2",
+   "metadata": {},
+   "source": [
+    "#### 3. Launch FHIR Conversion Docker container\n",
+    "\n",
+    "Navigate to the `ecr-viewer` subdirectory in the `phdi` directory via `cd containers/ecr-viewer` in your terminal and run the following commands:\n",
+    "\n",
+    "```bash\n",
+    "dockId=$(docker run --rm -d -it -p 8080:8080 \"$(docker build -q ./../fhir-converter/)\")\n",
+    "echo $dockId\n",
+    "```\n",
+    "\n",
+    "As long as you see a value printed to the screen, your container should be up and running."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b3f043ac-4bac-4a1a-9f30-efe4f7b1b0b2",
+   "metadata": {},
+   "source": [
+    "#### 4. Loop through each folder and extract the files\n",
+    "\n",
+    "Change directories into the `seed-scripts` directory and run the following for loop in your terminal:\n",
+    "\n",
+    "```bash\n",
+    "for d in ./baseECR/* ; do\n",
+    "    #first escape \", then /, and finally remove all new lines\n",
+    "    rr=$(sed -e 's/\"/\\\\\"/g ; s=/=\\\\\\/=g ; $!s/$/\\\\n/' \"$d/CDA_RR.xml\" | tr -d '\\n')\n",
+    "    eicr=$(sed 's/\"/\\\\\"/g ; s=/=\\\\\\/=g ; $!s/$/\\\\n/'  \"$d/CDA_eICR.xml\" | tr -d '\\n')\n",
+    "    resp=$(curl -l 'http://localhost:8080/convert-to-fhir' --header 'Content-Type: application/json' --data-raw '{\"input_type\":\"ecr\",\"root_template\":\"EICR\",\"input_data\": \"'\"$eicr\"'\",\"rr_data\": \"'\"$rr\"'\"}')\n",
+    "    echo $resp | jq '.response.FhirResource' > \"./fhir_data/$(basename $d).json\"\n",
+    "done\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "306d7319-e4d5-44ff-9923-ef49fc8fe03b",
+   "metadata": {},
+   "source": [
+    "To check that everything worked as expected, you can run `ls -lh ./seed-scripts/fhir-data/ | wc -l`, which should return a value of `1282`. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "437822b7-a9e1-4b35-b172-1a518c463709",
+   "metadata": {},
+   "source": [
+    "In order to launch the ECR Viewer, you should simply need to run `docker compose up` from the `phdi/containers/ecr-viewer` directory. However, it should also fail. Seeding the database will fail because not all of the data successfully converted, so there are erroneous files.\n",
+    "\n",
+    "**This is where the work needs to be picked up.**"
+   ]
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}