Try out the FHIR Pipelines Controller
The FHIR Pipelines Controller makes it easy to schedule and manage the transformation of data from a FHIR server into a collection of Apache Parquet files. It uses the FHIR Data Pipes JDBC pipeline to run either full or incremental transformations into a Parquet data warehouse.
This guide shows you how to set up the FHIR Pipelines Controller with a test HAPI FHIR server. It assumes you are using Linux, but the steps should work in other environments with minor adjustments.
Clone the fhir-data-pipes GitHub repository using your preferred method. Once it is cloned, open a terminal window and cd to the directory where you cloned it. Later terminal commands assume your working directory is the repository root.
The repository includes a Docker Compose configuration to bring up a HAPI FHIR server configured to use Postgres. You can see an example of configuring a HAPI FHIR server to use Postgres here.
docker-compose -f docker/hapi-compose.yml up
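Once the containers are up, you can optionally confirm the server is reachable by fetching the start of its CapabilityStatement from the standard FHIR metadata endpoint. This is a minimal sketch; port 8091 is the port the compose setup exposes and matches the FHIR endpoint used for the upload step:

```shell
# Optional sanity check: fetch the start of the server's CapabilityStatement.
# Port 8091 is the HAPI FHIR port used throughout this guide.
curl -s http://localhost:8091/fhir/metadata | head -c 200
```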
Next, we will populate the server with test data included in the repo. This uses the FHIR Data Pipes Synthea data uploader.
FHIR_ENDPOINT=http://localhost:8091/fhir
python3 ./synthea-hiv/uploader/main.py HAPI $FHIR_ENDPOINT \
--input_dir ./synthea-hiv/sample_data --cores 32
Depending on your machine, using too many cores may slow it down or cause JDBC connection pool errors on the HAPI FHIR server. Reducing the number of cores should help, at the cost of a longer upload.
The uploader requires the google-auth Python library, which you can install using
pip install --upgrade google-auth
First, open pipelines/controller/config/application.yml in a text editor.
Change fhirServerUrl to be:
fhirServerUrl: "http://localhost:8091/fhir"
Read through the rest of the file to see the other settings; they may remain unchanged. Note the value of dwhRootPrefix, as it determines where the Parquet files are written; you can adjust it if desired. Save and close the file.
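For orientation, the relevant part of application.yml might look like the sketch below after your edit. The dwhRootPrefix value shown is only an illustrative placeholder; keep the value already in your file unless you want the warehouse written elsewhere.

```yaml
# Sketch of the two settings discussed above; all other keys omitted.
fhirServerUrl: "http://localhost:8091/fhir"

# Illustrative placeholder only; the existing value in your file is fine.
dwhRootPrefix: "/tmp/fhir-dwh/controller_DWH"
```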
Next, open pipelines/controller/config/hapi-postgres-config.json in a text editor.
Change databaseHostName to be:
"databaseHostName" : "localhost"
Save and close the file.
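As a hedged sketch, the edited hapi-postgres-config.json might look something like the fragment below. Only databaseHostName needs to change; the other field names and values here are illustrative placeholders, so keep whatever your copy of the file already contains.

```json
{
  "databaseHostName": "localhost",
  "databasePort": "5432",
  "databaseUser": "admin",
  "databasePassword": "admin",
  "databaseName": "hapi"
}
```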
From the terminal run:
cd pipelines/controller/
mvn spring-boot:run
Open a web browser and visit http://localhost:8080. You should see the FHIR Pipelines Control Panel.
Before automatic incremental runs can occur, you must manually trigger a full run. Under the Run Full Pipeline section, click on Run Full. Wait for the run to complete.
The Control Panel shows the options being used by the FHIR Pipelines Controller.
This section corresponds to the settings in the application.yml file.
This section calls out FHIR Data Pipes batch pipeline settings that differ from their default values. These are also mostly derived from application.yml. Use these settings if you want to run the batch pipeline manually.
On your machine, look for the Parquet files created in the directory specified by dwhRootPrefix in the application.yml file. FHIR Data Pipes includes query libraries to help explore the data.
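For a quick first look before using the query libraries, you can list the Parquet files directly from the shell. The path below is a hypothetical example; substitute the directory your dwhRootPrefix setting points at.

```shell
# DWH_ROOT is a hypothetical example path; substitute your dwhRootPrefix
# directory from application.yml.
DWH_ROOT="${DWH_ROOT:-/tmp/fhir-dwh}"
# The || true keeps this quiet if the warehouse has not been written yet.
find "$DWH_ROOT" -type f -name '*.parquet' 2>/dev/null || true
```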