Update Mkdocs Documentation (#1110)
itsiggs authored Jul 18, 2024
1 parent 3d67095 commit 26060a4
Showing 11 changed files with 851 additions and 529 deletions.
4 changes: 2 additions & 2 deletions doc/docs/additional.md
@@ -53,7 +53,7 @@ performance.

FHIR Data Pipes transforms FHIR resources to _"near lossless"_ 'Parquet on FHIR'
representation based on
the ["Simplified SQL Projection of FHIR Resources"](https://github.com/FHIR/sql-on-fhir/blob/master/sql-on-fhir.md) (
the ["Simplified SQL Projection of FHIR Resources"](https://github.com/google/fhir-data-pipes/blob/master/doc/schema.md) (
_'SQL-on-FHIR-v1'_) schema

* The conversion is done using a forked version
@@ -68,7 +68,7 @@ _'SQL-on-FHIR-v1'_) schema

## Monitoring pipelines

The pipelines controller exposes a number of management end-points that can help
The pipelines controller exposes management end-points that can help
with monitoring the health of pipelines.

* The application has been integrated with the Spring Boot Actuator of Spring
124 changes: 15 additions & 109 deletions doc/docs/concepts.md → doc/docs/concepts/concepts.md
@@ -3,7 +3,7 @@
The key concepts that underpin the OHS Analytics components are:

1. **ETL Pipelines:** ETL Pipelines and Controller can be configured to
continuously transform FHIR data into an analytics friendly FHIR-in-Parquet
continuously transform FHIR data into an analytics-friendly Parquet on FHIR
format.
2. **Deployment approaches**: The pipelines are designed to accommodate various
deployment approaches in terms of scalability; from a single machine to a
@@ -24,7 +24,7 @@ FHIR Data Pipes is made up of the **ETL Pipelines** and **Controller** modules *
FHIR data to Apache Parquet files (for data analysis) or another FHIR server (
for data integration).

![FHIR Data Pipes Transform Step Image](images/ETL_FHIR_to_Parquet.png)
![FHIR Data Pipes Transform Step Image](../images/ETL_FHIR_to_Parquet.png)

## ETL Pipelines

@@ -51,12 +51,12 @@ FHIR Data Pipes is designed to fetch FHIR source data in various forms and APIs:
FHIR Resources are transformed into a "Parquet on FHIR" format:

* Uses a forked version
of [Bunsen library](https://github.com/google/fhir-data-pipes/tree/master/bunsen) (_
currently supports STU3 and R4 versions of FHIR).
of [Bunsen library](https://github.com/google/fhir-data-pipes/tree/master/bunsen) (
currently supports STU3 and R4 versions of FHIR)
* Configurable support for FHIR profiles and extensions
* (Optional) In-pipeline 'flattening' of FHIR data
using [ViewDefinition](https://build.fhir.org/ig/FHIR/sql-on-fhir-v2/StructureDefinition-ViewDefinition.html)
resources - [read more](#viewdefinition-resource)
resources - [read more](views#viewdefinition-resource)

### Loading

@@ -78,10 +78,11 @@ full", "incremental", and "merger" pipelines together.
* The Pipelines Controller is built on top of pipelines and shares many of the
same settings
* Using the controller module you can schedule periodic incremental updates or
use a [web control panel](link_to_section_in_adv_guide) to start the pipeline
use the [Web Control Panel](../additional#web-control-panel) to start the
pipeline
manually

## Deployment approaches
## Deployment Approaches

There are a number of different deployment approaches - see table below.

@@ -97,112 +98,17 @@ requirements, and expertise of the team.
| Exploratory data science or ML use cases | Use the generated Parquet files, which are _"near lossless"_, for enhanced data science workflows | Can either use the Parquet or custom schema to power dashboards or reports |
| Push FHIR data to a central FHIR-store (e.g., for a Shared Health Record system) | Use the Pipelines Controller to push from a FHIR source to a FHIR sink | Management of the intermediate Parquet files created as part of the pipelines |

## Query simplification approaches with pre-defined views

The heavily nested nature of FHIR resources requires complex SQL queries that
can make it difficult to work with for analytics use cases. A common approach to
address this is to flatten the data into a set of views (virtual or
materialized) which can then be queried using simpler SQL statements.
## Query Simplification with Flat Views

FHIR Data Pipes provides two approaches for flattening the FHIR resources into
virtual or materialized views:

1. SQL queries to generate virtual views (outside the pipeline)
2.
FHIR [ViewDefinition](https://build.fhir.org/ig/FHIR/sql-on-fhir-v2/StructureDefinition-ViewDefinition.html)

2. FHIR [ViewDefinition](https://build.fhir.org/ig/FHIR/sql-on-fhir-v2/StructureDefinition-ViewDefinition.html)
resources to generate materialized views (within the pipeline)

For both of these approaches, a set of [**"predefined views”
**](https://github.com/google/fhir-data-pipes/tree/master/docker/config/views)
for common resources are provided. These can be modified or extended.

The currently supported list (as of June, 2024) are:

```
Condition
DiagnosticReport
Encounter
Immunization
Location
Medicationrequest
Observation
Organization
Patient
Practitioner
PractitionerRole
Procedure
```

### SQL virtual views

These are samples of more complex SQL-on-FHIR queries for defining flat views
for common FHIR resources. These virtual views are applied outside of the
pipelines in the downstream SQL query engine.

The queries, which have `.sql` suffix, can be found
in [/docker/config/views](https://github.com/google/fhir-data-pipes/tree/master/docker/config/views)
directory (e.g `Patient_flat.sql`).

An example of a flat view for the Observation resource is below:

```sql
CREATE OR REPLACE VIEW flat_observation AS
SELECT O.id AS obs_id, O.subject.PatientId AS patient_id,
OCC.`system` AS code_sys, OCC.code,
O.value.quantity.value AS val_quantity,
OVCC.code AS val_code, OVCC.`system` AS val_sys,
O.effective.dateTime AS obs_date
FROM Observation AS O LATERAL VIEW OUTER explode(code.coding) AS OCC
LATERAL VIEW OUTER explode(O.value.codeableConcept.coding) AS OVCC
```

### ViewDefinition resource

The [SQL-on-FHIR-v2 specification](https://build.fhir.org/ig/FHIR/sql-on-fhir-v2/)
defines a ViewDefinition resource for defining views. Each column in the view is
defined using a FHIRPath expression. There is also an unnesting construct and
support for `constant` and `where` clauses too.

A system (pipeline or library) that implements the “View Layer” of the
specification provides a View Runner that is able to process these FHIR
ViewDefinition Resources over the “Data Layer” (lossless representation of the
FHIR data). The output of this are a set of portable, tabular views that can be
consumed by the “Analytics Layer” which is any number of tools that can be used
to work with the resulting tabular data.

FHIR Data Pipes is
a [reference implementation](https://fhir.github.io/sql-on-fhir-v2/#impls) of
the SQL-on-FHIR-v2 specification:

* The "View Runner" is by default part of the ETL Pipelines and uses the
transformed Parquet files as the “Data Layer”. _This can be extracted to be a
stand-alone component if required_

* When enabled as part of the Pipeline configuration, it will apply the
ViewDefinition resources from
the [views folder](https://github.com/google/fhir-data-pipes/tree/master/docker/config/views)
and materialize the resulting tables to the configured database (e.g., an
instance of PostgresSQL, MySQL, etc.).

* A set of pre-defined ViewDefinitions for common FHIR resources is provided as
part of the default package. These can be adapted, replaced and extended.

* The FHIR Data Pipes provides a simple ViewDefinition Editor which can be used
to explore FHIR ViewDefinitions and apply these to individual FHIR resources.

Once the FHIR data has been transformed via the ETL Pipelines, the resulting
schema is available for querying using a JDBC interface.

### ViewDefinition editor

The ViewDefinition editor provides a way to quickly evaluate ViewDefinition
resources against sample FHIR data. You access it as part of the Web Control
Panel, selecting the "Views" navigation item in the top right corner.

Using the ViewDefinition editor you can:

* Provide an input ViewDefinition (left)
* Apply it to a sample input FHIR resource (right pane)
* View the results in the generated table (top)

![FHIR Data Pipes Image](images/view_definition_editor.png)
For more information on both of these approaches, please
check [Schema and Flat Views](views.md).


190 changes: 190 additions & 0 deletions doc/docs/concepts/views.md
@@ -0,0 +1,190 @@
# Schema and Flat Views

## Overview

The heavily nested nature of FHIR resources and the
_[Parquet on FHIR schema](https://github.com/google/fhir-data-pipes/blob/master/doc/schema.md)_
means that analytics queries often require complex SQL, which makes the data
difficult to work with. A common approach to address this is to flatten the
data into a set of views (virtual or materialized) which can then be queried
using simpler SQL statements.

FHIR Data Pipes provides two approaches for flattening the FHIR resources into
virtual or materialized views:

1. SQL queries to
   generate [virtual views](https://github.com/google/fhir-data-pipes/tree/master/docker/config/views)
   (outside the pipeline)

2. FHIR [ViewDefinition](https://build.fhir.org/ig/FHIR/sql-on-fhir-v2/StructureDefinition-ViewDefinition.html)
resources to generate materialized views (within the pipeline)

For both of these approaches, a set of [**"predefined views"**](https://github.com/google/fhir-data-pipes/tree/master/docker/config/views)
for common FHIR resources is provided. These can be modified or extended.

The currently supported views (as of July 2024) are:

```
Condition
DiagnosticReport
Encounter
Immunization
Location
MedicationRequest
Observation
Organization
Patient
Practitioner
PractitionerRole
Procedure
```

### SQL virtual views

These are samples of more complex SQL-on-FHIR queries for defining flat views
for common FHIR resources. These virtual views are applied outside the
pipelines in a downstream SQL query engine.

The queries, which have a `.sql` suffix, can be found
in the [/docker/config/views](https://github.com/google/fhir-data-pipes/tree/master/docker/config/views)
directory (e.g., `Patient_flat.sql`).

An example of a flat view for the Observation resource is below:

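```sql
-- Flat view of Observation: one row per combination of code.coding and
-- value.codeableConcept.coding entries. OUTER keeps observations whose
-- arrays are empty or missing (the exploded columns are then NULL).
CREATE OR REPLACE VIEW flat_observation AS
SELECT O.id AS obs_id, O.subject.PatientId AS patient_id,
       OCC.`system` AS code_sys, OCC.code,
       O.value.quantity.value AS val_quantity,
       OVCC.code AS val_code, OVCC.`system` AS val_sys,
       O.effective.dateTime AS obs_date
FROM Observation AS O LATERAL VIEW OUTER explode(code.coding) AS OCC
  LATERAL VIEW OUTER explode(O.value.codeableConcept.coding) AS OVCC;
```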
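To make the `LATERAL VIEW OUTER explode` semantics concrete, here is a minimal Python sketch of what the view above does to a single Observation row. It assumes the row is a plain dict shaped like the Parquet on FHIR columns the query reads (`subject.PatientId`, `value.quantity.value`, and so on). The function names and the sample observation are hypothetical and are not part of FHIR Data Pipes.

```python
from typing import Iterator, Optional


def explode_outer(items: Optional[list]) -> list:
    """Mimic OUTER explode: an empty or missing array still yields one (None) entry."""
    return items if items else [None]


def flatten_observation(obs: dict) -> Iterator[dict]:
    """Yield one flat row per (code.coding x value.codeableConcept.coding) pair."""
    value = obs.get("value") or {}
    concept = value.get("codeableConcept") or {}
    for occ in explode_outer((obs.get("code") or {}).get("coding")):
        for ovcc in explode_outer(concept.get("coding")):
            yield {
                "obs_id": obs.get("id"),
                "patient_id": (obs.get("subject") or {}).get("PatientId"),
                "code_sys": occ.get("system") if occ else None,
                "code": occ.get("code") if occ else None,
                "val_quantity": (value.get("quantity") or {}).get("value"),
                "val_code": ovcc.get("code") if ovcc else None,
                "val_sys": ovcc.get("system") if ovcc else None,
                "obs_date": (obs.get("effective") or {}).get("dateTime"),
            }


# Hypothetical observation with two codings and a quantity value (no codeableConcept).
obs = {
    "id": "obs-1",
    "subject": {"PatientId": "p1"},
    "code": {"coding": [
        {"system": "http://loinc.org", "code": "856-9"},
        {"system": "http://example.org/local-codes", "code": "VL"},
    ]},
    "value": {"quantity": {"value": 350000.0}},
    "effective": {"dateTime": "2010-03-01"},
}
rows = list(flatten_observation(obs))
for row in rows:
    print(row["code"], row["val_quantity"], row["val_code"])
```

The two codings produce two rows. `value.codeableConcept.coding` is missing, but OUTER still emits each row, with `val_code` set to `None`. A plain (non-OUTER) explode would drop the observation entirely.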

## Query Simplification

The following example is taken from a [tutorial Jupyter notebook](https://github.com/google/fhir-data-pipes/blob/master/query/queries_large.ipynb).

The following queries count the number of patients who had an observation with
an HIV viral load code (LOINC codes starting with `856`) and a value below
400,000 in the year 2010.

=== "Standalone Query"

```sql
SELECT COUNT(DISTINCT O.subject.PatientId) AS num_patients
FROM Observation AS O LATERAL VIEW explode(code.coding) AS OCC
WHERE OCC.code LIKE '856%'
AND OCC.`system` = 'http://loinc.org'
AND O.value.quantity.value < 400000
AND YEAR(O.effective.dateTime) = 2010;
```
The output relation should have a count of 3074 patients:
```
+---------------+
| num_patients |
+---------------+
| 3074 |
+---------------+
```

=== "Query with Views"

```sql
SELECT COUNT(DISTINCT patient_id) AS num_patients
FROM Observation_flat
WHERE code LIKE '856%'
AND code_sys = 'http://loinc.org'
AND val_quantity < 400000
AND YEAR(obs_date) = 2010;
```
The output relation should have a count of 3074 patients:
```
+---------------+
| num_patients |
+---------------+
| 3074 |
+---------------+
```

This approach keeps the handling of the nested structures and arrays of FHIR
resources inside the `Observation_flat` view definition, so queries against it
only need simple column filters. The results of these queries can then be used
as regular tables for further data analysis in other tools.
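You can try the filter logic of the "Query with Views" tab without a Spark setup. The sketch below is a stand-in that rests on several assumptions. It loads a few hypothetical rows into an in-memory SQLite table named `Observation_flat` using Python's built-in `sqlite3` module. It also replaces `YEAR(obs_date)` with SQLite's `strftime('%Y', obs_date)`, because SQLite has no `YEAR()` function. The rows are invented for illustration, so the result says nothing about the 3074 figure above.

```python
import sqlite3

# Hypothetical rows: (obs_id, patient_id, code_sys, code, val_quantity, obs_date).
ROWS = [
    ("o1", "p1", "http://loinc.org", "856-9", 350000.0, "2010-03-01"),
    ("o2", "p1", "http://loinc.org", "856-9", 120000.0, "2010-09-12"),  # same patient again
    ("o3", "p2", "http://loinc.org", "856-9", 500000.0, "2010-04-02"),  # above threshold
    ("o4", "p3", "http://loinc.org", "856-9", 1000.0, "2011-01-15"),    # wrong year
    ("o5", "p4", "http://loinc.org", "1234-5", 200.0, "2010-06-30"),    # hypothetical other code
]

QUERY = """
    SELECT COUNT(DISTINCT patient_id) AS num_patients
    FROM Observation_flat
    WHERE code LIKE '856%'
      AND code_sys = 'http://loinc.org'
      AND val_quantity < 400000
      AND strftime('%Y', obs_date) = '2010'
"""


def count_patients(conn: sqlite3.Connection) -> int:
    """Run the flat-view query and return the single count value."""
    return conn.execute(QUERY).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Observation_flat (obs_id TEXT, patient_id TEXT, code_sys TEXT,"
    " code TEXT, val_quantity REAL, obs_date TEXT)"
)
conn.executemany("INSERT INTO Observation_flat VALUES (?, ?, ?, ?, ?, ?)", ROWS)
num_patients = count_patients(conn)
print(num_patients)
```

Only `p1` passes all four filters. `COUNT(DISTINCT ...)` counts that patient once, even though two of their observations match.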

### ViewDefinition resource

The [SQL-on-FHIR-v2 specification](https://build.fhir.org/ig/FHIR/sql-on-fhir-v2/)
defines a ViewDefinition resource for defining views. Each column in the view is
defined using a [FHIRPath expression](https://hl7.org/fhirpath/).
There is also an un-nesting construct (`forEach` and `forEachOrNull`) and
support for `constant` and `where` clauses.

!!! info "Note"

* A single ViewDefinition does not join different resources in any way
* Each ViewDefinition defines a tabular view of exactly one resource type

A system (pipeline or library) that implements the “View Layer” of the
specification provides a View Runner that is able to process these FHIR
ViewDefinition Resources over the “Data Layer” (lossless representation of the
FHIR data). The output is a set of portable, tabular views that can be
consumed by the “Analytics Layer”, i.e., any number of tools that can be used
to work with the resulting tabular data.
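As a rough illustration of what a View Runner does, the following Python sketch applies a small ViewDefinition (written as a dict) to in-memory Patient resources. It is not the FHIR Data Pipes View Runner, which works on the Parquet files inside the pipeline. The sketch covers only a tiny subset of the specification: plain dotted paths instead of full FHIRPath, and top-level `select` entries with `column`, `forEach`, and `forEachOrNull`. It deliberately does not handle `where`, `constant`, `unionAll`, or nested `select`. The function names are hypothetical.

```python
from itertools import product
from typing import Any


def eval_path(node: Any, path: str) -> list:
    """Very small stand-in for FHIRPath: dotted member access that flattens arrays."""
    items = [node]
    for part in path.split("."):
        next_items = []
        for item in items:
            if isinstance(item, dict) and part in item:
                value = item[part]
                next_items.extend(value if isinstance(value, list) else [value])
        items = next_items
    return items


def run_select(select: dict, resource: dict) -> list:
    """Evaluate one select clause against a resource, returning partial rows."""
    if "forEach" in select:
        nodes = eval_path(resource, select["forEach"])  # no matches -> no rows
    elif "forEachOrNull" in select:
        nodes = eval_path(resource, select["forEachOrNull"]) or [None]  # keep one null row
    else:
        nodes = [resource]
    rows = []
    for node in nodes:
        row = {}
        for col in select.get("column", []):
            values = eval_path(node, col["path"])
            row[col["name"]] = values[0] if values else None  # simplification: first value only
        rows.append(row)
    return rows


def run_view(view: dict, resources: list) -> list:
    """Apply a ViewDefinition to resources of its type; sibling selects are cross-joined."""
    out = []
    for resource in resources:
        if resource.get("resourceType") != view["resource"]:
            continue  # a ViewDefinition covers exactly one resource type
        partials = [run_select(s, resource) for s in view["select"]]
        for combo in product(*partials):
            merged = {}
            for part in combo:
                merged.update(part)
            out.append(merged)
    return out


patient_view = {
    "resourceType": "ViewDefinition",
    "resource": "Patient",
    "select": [
        {"column": [{"name": "id", "path": "id"}, {"name": "gender", "path": "gender"}]},
        {"forEachOrNull": "name", "column": [{"name": "family", "path": "family"}]},
    ],
}
patients = [
    {"resourceType": "Patient", "id": "p1", "gender": "female",
     "name": [{"family": "Smith"}, {"family": "Jones"}]},
    {"resourceType": "Patient", "id": "p2", "gender": "male"},
    {"resourceType": "Observation", "id": "o1"},
]
view_rows = run_view(patient_view, patients)
for row in view_rows:
    print(row)
```

`p1` has two names, so the `forEachOrNull` clause yields two rows for it. `p2` has no name but still gets one row, with `family` set to `None`. The Observation is skipped because the view targets Patient only.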

FHIR Data Pipes is
a [reference implementation](https://fhir.github.io/sql-on-fhir-v2/#impls) of
the SQL-on-FHIR-v2 specification:

* The "View Runner" is, by default, part of the ETL Pipelines and uses the
transformed Parquet files as the “Data Layer”. _This can be extracted into a
stand-alone component if required._

* When enabled as part of the Pipeline configuration, the "View Runner" will
  apply the ViewDefinition resources from
  the [views folder](https://github.com/google/fhir-data-pipes/tree/master/docker/config/views)
  and materialize the resulting tables to the configured database (e.g., an
  instance of PostgreSQL or MySQL).

* A set of pre-defined ViewDefinitions for common FHIR resources is provided as
part of the default package. These can be adapted, replaced and extended.

* FHIR Data Pipes provides a simple ViewDefinition Editor which can be used
  to explore FHIR ViewDefinitions and apply them to individual FHIR resources.

Once the FHIR data has been transformed via the ETL Pipelines, the resulting
schema is available for querying using a JDBC interface.
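When enabled, the pipeline materializes view output into a configured database such as PostgreSQL or MySQL. The sketch below is a self-contained approximation only. It uses Python's built-in `sqlite3` in place of such a database to show the idea: flat rows produced by a view runner become a table that ordinary SQL can query. The `materialize` helper, the table name `patient_flat`, and the all-`TEXT` column typing are illustrative assumptions, not how FHIR Data Pipes creates its tables.

```python
import sqlite3


def materialize(conn: sqlite3.Connection, table: str, rows: list) -> None:
    """(Re)create `table` from flat row dicts and insert them; all columns are TEXT.

    Identifiers are only double-quoted, not validated, so do not pass untrusted names.
    """
    if not rows:
        return
    columns = list(rows[0])
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(row.get(c) for c in columns) for row in rows],
    )
    conn.commit()


# Hypothetical output of a Patient view (one row per name).
view_rows = [
    {"id": "p1", "gender": "female", "family": "Smith"},
    {"id": "p1", "gender": "female", "family": "Jones"},
    {"id": "p2", "gender": "male", "family": None},
]
conn = sqlite3.connect(":memory:")
materialize(conn, "patient_flat", view_rows)
distinct_patients = conn.execute('SELECT COUNT(DISTINCT id) FROM "patient_flat"').fetchone()[0]
missing_family = conn.execute('SELECT COUNT(*) FROM "patient_flat" WHERE family IS NULL').fetchone()[0]
print(distinct_patients, missing_family)
```

Python's `None` is stored as SQL `NULL`, so the row produced by `forEachOrNull` for a patient without a name stays queryable with `IS NULL`.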

Visit our [interactive playground](https://fhir.github.io/sql-on-fhir-v2/#pg) to
get hands-on experience with the Patient ViewDefinition resource and many
other examples.

### ViewDefinition editor

The ViewDefinition editor provides a way to quickly evaluate ViewDefinition
resources against sample FHIR data. You can access it from
the [Web Control Panel](../additional#web-control-panel) by selecting the "Views"
navigation item in the top right corner.

Using the ViewDefinition editor you can:

* Provide an input ViewDefinition (left pane)
* Apply it to a sample input FHIR resource (right pane)
* View the results in the generated table (top)

![FHIR Data Pipes Image](../images/view_definition_editor.png)

## Output Data Formats

### Conversion to PostgreSQL

To be continued...


### Conversion to Parquet

To be continued...
