Try out the FHIR Pipelines Controller
The FHIR Pipelines Controller makes it easy to schedule and manage the transformation of data from a FHIR server into a collection of Apache Parquet files. It uses the FHIR Data Pipes JDBC pipeline to run either full or incremental transformations into a Parquet data warehouse.
This guide shows you how to set up the FHIR Pipelines Controller with a test HAPI FHIR server. It assumes you are using Linux, but the steps should work in other environments with minor adjustments.
Clone the fhir-data-pipes GitHub repository using your preferred method. Once it is cloned, open a terminal window and cd to the directory where you cloned it. Later terminal commands assume your working directory is the repository root.
The repository includes a Docker Compose configuration to bring up a HAPI FHIR server configured to use Postgres. You can see an example of configuring a HAPI FHIR server to use Postgres here.
docker-compose -f docker/hapi-compose.yml up
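Once the containers are up, you can optionally confirm the server is reachable by fetching the start of its CapabilityStatement from the standard FHIR metadata endpoint. This is a minimal sketch; port 8091 is the port the compose setup exposes and matches the FHIR endpoint used for the upload step:

```shell
# Optional sanity check: fetch the start of the server's CapabilityStatement.
# Port 8091 is the HAPI FHIR port used throughout this guide.
curl -s http://localhost:8091/fhir/metadata | head -c 200
```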
Next, we will populate the server with test data included in the repo. This uses the FHIR Data Pipes Synthea data uploader.
FHIR_ENDPOINT=http://localhost:8091/fhir
python3 ./synthea-hiv/uploader/main.py HAPI $FHIR_ENDPOINT \
--input_dir ./synthea-hiv/sample_data --cores 32
Depending on your machine, using too many cores may slow it down or cause JDBC connection pool errors on the HAPI FHIR server. Reducing the number of cores should help, at the cost of a longer upload.
The uploader requires the google-auth Python library, which you can install using
pip install --upgrade google-auth
First, open pipelines/controller/config/application.yml in a text editor.
Change fhirServerUrl to be:
fhirServerUrl: "http://localhost:8091/fhir"
Read through the rest of the file to see the other settings; they may remain unchanged. Note the value of dwhRootPrefix, as it determines where the Parquet files are written; you can adjust it if desired. Save and close the file.
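For orientation, the relevant part of application.yml might look like the sketch below after your edit. The dwhRootPrefix value shown is only an illustrative placeholder; keep the value already in your file unless you want the warehouse written elsewhere.

```yaml
# Sketch of the two settings discussed above; all other keys omitted.
fhirServerUrl: "http://localhost:8091/fhir"

# Illustrative placeholder only; the existing value in your file is fine.
dwhRootPrefix: "/tmp/fhir-dwh/controller_DWH"
```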
Next, open pipelines/controller/config/hapi-postgres-config.json in a text editor.
Change databaseHostName to be:
"databaseHostName" : "localhost"
Save and close the file.
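As a hedged sketch, the edited hapi-postgres-config.json might look something like the fragment below. Only databaseHostName needs to change; the other field names and values here are illustrative placeholders, so keep whatever your copy of the file already contains.

```json
{
  "databaseHostName": "localhost",
  "databasePort": "5432",
  "databaseUser": "admin",
  "databasePassword": "admin",
  "databaseName": "hapi"
}
```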
From the terminal run:
cd pipelines/controller/
mvn spring-boot:run
Open a web browser and visit http://localhost:8080. You should see the FHIR Pipelines Control Panel.
Before automatic incremental runs can occur, you must manually trigger a full run. Under the Run Full Pipeline section, click on Run Full. Wait for the run to complete.
The Control Panel shows the options being used by the FHIR Pipelines Controller.
This section corresponds to the settings in the application.yml file.
This section calls out FHIR Data Pipes batch pipeline settings that differ from their default values. These are also mostly derived from application.yml. Use these settings if you want to run the batch pipeline manually.
On your machine, look for the Parquet files created in the directory specified by dwhRootPrefix in the application.yml file. FHIR Data Pipes includes query libraries to help explore the data.
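For a quick first look before using the query libraries, you can list the Parquet files directly from the shell. The path below is a hypothetical example; substitute the directory your dwhRootPrefix setting points at.

```shell
# DWH_ROOT is a hypothetical example path; substitute your dwhRootPrefix
# directory from application.yml.
DWH_ROOT="${DWH_ROOT:-/tmp/fhir-dwh}"
# The || true keeps this quiet if the warehouse has not been written yet.
find "$DWH_ROOT" -type f -name '*.parquet' 2>/dev/null || true
```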