Update Mkdocs Documentation (#1110)

google · Jul 18, 2024 · 26060a4
1 parent 3d67095
commit 26060a4
Showing 11 changed files with 851 additions and 529 deletions.
diff --git a/doc/docs/additional.md b/doc/docs/additional.md
@@ -53,7 +53,7 @@ performance.
 
 FHIR Data Pipes transforms FHIR resources to _"near lossless"_ 'Parquet on FHIR'
 representation based on
-the ["Simplified SQL Projection of FHIR Resources"](https://github.com/FHIR/sql-on-fhir/blob/master/sql-on-fhir.md) (
+the ["Simplified SQL Projection of FHIR Resources"](https://github.com/google/fhir-data-pipes/blob/master/doc/schema.md) (
 _'SQL-on-FHIR-v1'_) schema
 
 * The conversion is done using a forked version
@@ -68,7 +68,7 @@ _'SQL-on-FHIR-v1'_) schema
 
 ## Monitoring pipelines
 
-The pipelines controller exposes a number of management end-points that can help
+The pipelines controller exposes management end-points that can help
 with monitoring the health of pipelines.
 
 * The application has been integrated with the Spring Boot Actuator of Spring

diff --git a/doc/docs/concepts.md → doc/docs/concepts/concepts.md b/doc/docs/concepts.md → doc/docs/concepts/concepts.md
@@ -3,7 +3,7 @@
 The key concepts that underpin the OHS Analytics components are:
 
 1. **ETL Pipelines:** ETL Pipelines and Controller can be configured to
-   continuously transform FHIR data into an analytics friendly FHIR-in-Parquet
+   continuously transform FHIR data into an analytics friendly Parquet on FHIR
    format.
 2. **Deployment approaches**: The pipelines are designed to accommodate various
    deployment approaches in terms of scalability; from a single machine to a
@@ -24,7 +24,7 @@ FHIR Data Pipes is made up of the **ETL Pipelines** and **Controller** modules *
 FHIR data to Apache Parquet files (for data analysis) or another FHIR server (
 for data integration).
 
-![FHIR Data Pipes Transform Step Image](images/ETL_FHIR_to_Parquet.png)
+![FHIR Data Pipes Transform Step Image](../images/ETL_FHIR_to_Parquet.png)
 
 ## ETL Pipelines
 
@@ -51,12 +51,12 @@ FHIR Data Pipes is designed to fetch FHIR source data in various forms and APIs:
 FHIR Resources are transformed into a "Parquet on FHIR" format:
 
 * Uses a forked version
-  of [Bunsen library](https://github.com/google/fhir-data-pipes/tree/master/bunsen) (_
-  currently supports STU3 and R4 versions of FHIR).
+  of [Bunsen library](https://github.com/google/fhir-data-pipes/tree/master/bunsen) (
+  currently supports STU3 and R4 versions of FHIR)
 * Configurable support for FHIR profiles and extensions
 * (Optional) In-pipeline 'flattening' of FHIR data
   using [ViewDefinition](https://build.fhir.org/ig/FHIR/sql-on-fhir-v2/StructureDefinition-ViewDefinition.html)
-  resources - [read more](#viewdefinition-resource)
+  resources - [read more](views#viewdefinition-resource)
 
 ### Loading
 
@@ -78,10 +78,11 @@ full", "incremental", and "merger" pipelines together.
 * The Pipelines Controller is built on top of pipelines and shares many of the
   same settings
 * Using the controller module you can schedule periodic incremental updates or
-  use a [web control panel](link_to_section_in_adv_guide) to start the pipeline
+  use the [Web Control Panel](../additional#web-control-panel) to start the
+  pipeline
   manually
 
-## Deployment approaches
+## Deployment Approaches
 
 There are a number of different deployment approaches - see table below.
 
@@ -97,112 +98,17 @@ requirements, and expertise of the team.
 | Exploratory data science or ML use cases                                         | Use the generated Parquet files which are _"near lossless"_ for enhanced data science workflows          | Can either use the Parquet or custom schema to power dashboards or reports                |
 | Push FHIR data to a central FHIR-store (e.g., for a Shared Health Record system) | Use the Pipelines Controller to push from a FHIR source to a FHIR sink                                   | Management of the intermediate Parquet files created as part of the pipelines             | 
 
-## Query simplification approaches with pre-defined views
-
-The heavily nested nature of FHIR resources requires complex SQL queries that
-can make it difficult to work with for analytics use cases. A common approach to
-address this is to flatten the data into a set of views (virtual or
-materialized) which can then be queried using simpler SQL statements.
+## Query Simplification with Flat Views
 
 FHIR Data Pipes provides two approaches for flattening the FHIR resources into
 virtual or materialized views:
 
 1. SQL queries to generate virtual views (outside the pipeline)
-2.
-FHIR [ViewDefinition](https://build.fhir.org/ig/FHIR/sql-on-fhir-v2/StructureDefinition-ViewDefinition.html)
+
+2. FHIR [ViewDefinition](https://build.fhir.org/ig/FHIR/sql-on-fhir-v2/StructureDefinition-ViewDefinition.html)
 resources to generate materialized views (within the pipeline)
 
-For both of these approaches, a set of [**"predefined views”
-**](https://github.com/google/fhir-data-pipes/tree/master/docker/config/views)
-for common resources are provided. These can be modified or extended.
-
-The currently supported list (as of June, 2024) are:
-
-```
-Condition
-DiagnosticReport
-Encounter
-Immunization
-Location
-Medicationrequest
-Observation
-Organization
-Patient
-Practitioner
-PractitionerRole
-Procedure
-```
-
-### SQL virtual views
-
-These are samples of more complex SQL-on-FHIR queries for defining flat views
-for common FHIR resources. These virtual views are applied outside of the
-pipelines in the downstream SQL query engine.
-
-The queries, which have `.sql` suffix, can be found
-in [/docker/config/views](https://github.com/google/fhir-data-pipes/tree/master/docker/config/views)
-directory (e.g `Patient_flat.sql`).
-
-An example of a flat view for the Observation resource is below:
-
-```sql
-CREATE OR REPLACE VIEW flat_observation AS
-SELECT O.id AS obs_id, O.subject.PatientId AS patient_id,
-        OCC.`system` AS code_sys, OCC.code,
-        O.value.quantity.value AS val_quantity,
-        OVCC.code AS val_code, OVCC.`system` AS val_sys,
-        O.effective.dateTime AS obs_date
-      FROM Observation AS O LATERAL VIEW OUTER explode(code.coding) AS OCC
-        LATERAL VIEW OUTER explode(O.value.codeableConcept.coding) AS OVCC
-```
-
-### ViewDefinition resource
-
-The [SQL-on-FHIR-v2 specification](https://build.fhir.org/ig/FHIR/sql-on-fhir-v2/)
-defines a ViewDefinition resource for defining views. Each column in the view is
-defined using a FHIRPath expression. There is also an unnesting construct and
-support for `constant` and `where` clauses too.
-
-A system (pipeline or library) that implements the “View Layer” of the
-specification provides a View Runner that is able to process these FHIR
-ViewDefinition Resources over the “Data Layer” (lossless representation of the
-FHIR data). The output of this are a set of portable, tabular views that can be
-consumed by the “Analytics Layer” which is any number of tools that can be used
-to work with the resulting tabular data.
-
-FHIR Data Pipes is
-a [reference implementation](https://fhir.github.io/sql-on-fhir-v2/#impls) of
-the SQL-on-FHIR-v2 specification:
-
-* The "View Runner" is by default part of the ETL Pipelines and uses the
-  transformed Parquet files as the “Data Layer”. _This can be extracted to be a
-  stand-alone component if required_
-
-* When enabled as part of the Pipeline configuration, it will apply the
-  ViewDefinition resources from
-  the [views folder](https://github.com/google/fhir-data-pipes/tree/master/docker/config/views)
-  and materialize the resulting tables to the configured database (e.g., an
-  instance of PostgresSQL, MySQL, etc.).
-
-* A set of pre-defined ViewDefinitions for common FHIR resources is provided as
-  part of the default package. These can be adapted, replaced and extended.
-
-* The FHIR Data Pipes provides a simple ViewDefinition Editor which can be used
-  to explore FHIR ViewDefinitions and apply these to individual FHIR resources.
-
-Once the FHIR data has been transformed via the ETL Pipelines, the resulting
-schema is available for querying using a JDBC interface.
-
-### ViewDefinition editor
-
-The ViewDefinition editor provides a way to quickly evaluate ViewDefinition
-resources against sample FHIR data. You access it as part of the Web Control
-Panel, selecting the "Views" navigation item in the top right corner.
-
-Using the ViewDefinition editor you can:
-
-* Provide an input ViewDefinition (left)
-* Apply it to a sample input FHIR resource (right pane)
-* View the results in the generated table (top)
-
-![FHIR Data Pipes Image](images/view_definition_editor.png)
+For more information on both of these approaches, please
+check [Schema and Flat Views](views.md).
+
+
diff --git a/doc/docs/concepts/views.md b/doc/docs/concepts/views.md
@@ -0,0 +1,190 @@
+# Schema and Flat Views
+
+## Overview
+
+The heavily nested nature of FHIR resources and the
+_[Parquet on FHIR schema](https://github.com/google/fhir-data-pipes/blob/master/doc/schema.md)_
+requires complex SQL queries, which
+makes the data difficult to work with for analytics use cases. A common
+approach
+to address this is to flatten the data into a set of views (virtual or
+materialized), which can then be queried using simpler SQL statements.
+
+FHIR Data Pipes provides two approaches for flattening the FHIR resources into
+virtual or materialized views:
+
+1. SQL queries to
+   generate [virtual views](https://github.com/google/fhir-data-pipes/tree/master/docker/config/views) (
+   outside the pipeline)
+
+2. FHIR [ViewDefinition](https://build.fhir.org/ig/FHIR/sql-on-fhir-v2/StructureDefinition-ViewDefinition.html)
+resources to generate materialized views (within the pipeline)
+
+For both of these approaches, a set of [**"predefined views"**](https://github.com/google/fhir-data-pipes/tree/master/docker/config/views)
+for common FHIR resources is provided. These can be modified or extended.
+
+The currently supported resources (as of July 2024) are:
+
+```
+Condition
+DiagnosticReport
+Encounter
+Immunization
+Location
+Medicationrequest
+Observation
+Organization
+Patient
+Practitioner
+PractitionerRole
+Procedure
+```
+
+### SQL virtual views
+
+These are samples of more complex SQL-on-FHIR queries for defining flat views
+for common FHIR resources. These virtual views are applied outside the
+pipelines in a downstream SQL query engine.
+
+The queries, which have a `.sql` suffix, can be found
+in the [/docker/config/views](https://github.com/google/fhir-data-pipes/tree/master/docker/config/views)
+directory (e.g., `Patient_flat.sql`).
+
+An example of a flat view for the Observation resource is below:
+
+```sql
+CREATE OR REPLACE VIEW flat_observation AS
+SELECT O.id AS obs_id, O.subject.PatientId AS patient_id,
+        OCC.`system` AS code_sys, OCC.code,
+        O.value.quantity.value AS val_quantity,
+        OVCC.code AS val_code, OVCC.`system` AS val_sys,
+        O.effective.dateTime AS obs_date
+      FROM Observation AS O LATERAL VIEW OUTER explode(code.coding) AS OCC
+        LATERAL VIEW OUTER explode(O.value.codeableConcept.coding) AS OVCC
+```
+
+## Query Simplification
+
+The following example is taken from a tutorial Jupyter notebook available [here](https://github.com/google/fhir-data-pipes/blob/master/query/queries_large.ipynb).
+
+Both queries below count the number of patients who have had an observation
+with a specific code (HIV viral load) and a value below a certain threshold
+in the year 2010.
+
+=== "Standalone Query"
+
+    ```sql
+    SELECT COUNT(DISTINCT O.subject.PatientId) AS num_patients
+      FROM Observation AS O LATERAL VIEW explode(code.coding) AS OCC
+      WHERE OCC.code LIKE '856%%'
+        AND OCC.`system` = 'http://loinc.org'
+        AND O.value.quantity.value < 400000
+        AND YEAR(O.effective.dateTime) = 2010;
+    ```
+    The output relation should have a count of 3074 patients:
+    ```
+    +---------------+
+    | num_patients  |
+    +---------------+
+    | 3074          |
+    +---------------+
+    ```
+
+=== "Query with Views"
+
+    ```sql
+    SELECT COUNT(DISTINCT patient_id) AS num_patients
+      FROM Observation_flat
+      WHERE code LIKE '856%%'
+        AND code_sys = 'http://loinc.org'
+        AND val_quantity < 400000
+        AND YEAR(obs_date) = 2010
+      LIMIT 100;
+    ```
+    The output relation should have a count of 3074 patients:
+    ```
+    +---------------+
+    | num_patients  |
+    +---------------+
+    | 3074          |
+    +---------------+
+    ```
+
+This approach preserves the nested structures and arrays of FHIR resources
+within the `Observation_flat` view. The results of these queries can then be 
+used as arbitrary tables for further data analysis in other tools.
+
+### ViewDefinition resource
+
+The [SQL-on-FHIR-v2 specification](https://build.fhir.org/ig/FHIR/sql-on-fhir-v2/)
+defines a ViewDefinition resource for defining views. Each column in the view is
+defined using a [FHIRPath expression](https://hl7.org/fhirpath/).
+There is also an un-nesting construct, as well as
+support for `constant` and `where` clauses.
+
+!!! info "Note"
+
+    * A single ViewDefinition does not join different resources in any way
+    * Each ViewDefinition defines a tabular view of exactly one resource type
+
+A system (pipeline or library) that implements the “View Layer” of the
+specification provides a View Runner that is able to process these FHIR
+ViewDefinition Resources over the “Data Layer” (lossless representation of the
+FHIR data). The output is a set of portable, tabular views that can be
+consumed by the “Analytics Layer”, which is any number of tools that can be used
+to work with the resulting tabular data.
+
+FHIR Data Pipes is
+a [reference implementation](https://fhir.github.io/sql-on-fhir-v2/#impls) of
+the SQL-on-FHIR-v2 specification:
+
+* The "View Runner" is, by default, part of the ETL Pipelines and uses the
+  transformed Parquet files as the “Data Layer”. _This can be extracted to be a
+  stand-alone component if required_
+
+* When enabled as part of the Pipeline configuration, the "View Runner" will
+  apply the
+  ViewDefinition resources from
+  the [views folder](https://github.com/google/fhir-data-pipes/tree/master/docker/config/views)
+  and materialize the resulting tables to the configured database (e.g., an
+  instance of PostgreSQL, MySQL, etc.).
+
+* A set of pre-defined ViewDefinitions for common FHIR resources is provided as
+  part of the default package. These can be adapted, replaced and extended.
+
+* FHIR Data Pipes provides a simple ViewDefinition Editor which can be used
+  to explore FHIR ViewDefinitions and apply these to individual FHIR resources.
+
+Once the FHIR data has been transformed via the ETL Pipelines, the resulting
+schema is available for querying using a JDBC interface.
+
+Visit the [interactive playground](https://fhir.github.io/sql-on-fhir-v2/#pg) to
+get a hands-on understanding of the Patient ViewDefinition resource and many
+more.
+
+### ViewDefinition editor
+
+The ViewDefinition editor provides a way to quickly evaluate ViewDefinition
+resources against sample FHIR data. You can access it from
+the [Web Control Panel](../additional#web-control-panel) by selecting the "Views"
+navigation item in the top right corner.
+
+Using the ViewDefinition editor you can:
+
+* Provide an input ViewDefinition (left)
+* Apply it to a sample input FHIR resource (right pane)
+* View the results in the generated table (top)
+
+![FHIR Data Pipes Image](../images/view_definition_editor.png)
+
+## Output Data Formats
+
+### Conversion to PostgreSQL
+
+To be continued... 
+
+
+### Conversion to Parquet
+
+To be continued...
+
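
Reviewer sketch (not part of the commit): the `flat_observation` view added in `doc/docs/concepts/views.md` uses `LATERAL VIEW OUTER explode` to turn nested coding arrays into flat rows. The Python below is a minimal illustration of that flattening idea on an in-memory dictionary, assuming a record shaped like the nested columns the SQL reads. The helper names (`explode_outer`, `flatten_observation`), the sample record, and its code values are hypothetical and do not come from the repository.

```python
from typing import Any, Dict, List, Optional


def explode_outer(items: Optional[List[Dict[str, Any]]]) -> List[Optional[Dict[str, Any]]]:
    """Mimic OUTER explode: one entry per array element, or a single
    None when the array is missing or empty (so the parent row survives)."""
    return list(items) if items else [None]


def flatten_observation(obs: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one flat row per (code coding, value coding) pair."""
    value = obs.get("value") or {}
    code_codings = (obs.get("code") or {}).get("coding")
    value_codings = (value.get("codeableConcept") or {}).get("coding")
    rows = []
    for occ in explode_outer(code_codings):
        for ovcc in explode_outer(value_codings):
            rows.append({
                "obs_id": obs.get("id"),
                "patient_id": (obs.get("subject") or {}).get("PatientId"),
                "code_sys": occ.get("system") if occ else None,
                "code": occ.get("code") if occ else None,
                "val_quantity": (value.get("quantity") or {}).get("value"),
                "val_code": ovcc.get("code") if ovcc else None,
                "val_sys": ovcc.get("system") if ovcc else None,
                "obs_date": (obs.get("effective") or {}).get("dateTime"),
            })
    return rows


# Hypothetical sample record; the code values are made up for illustration.
sample_observation = {
    "id": "obs-1",
    "subject": {"PatientId": "pat-1"},
    "code": {"coding": [
        {"system": "http://loinc.org", "code": "856-EXAMPLE"},
        {"system": "http://example.org/local", "code": "VL"},
    ]},
    "value": {"quantity": {"value": 1200.0}},
    "effective": {"dateTime": "2010-05-01"},
}

flat_rows = flatten_observation(sample_observation)
for row in flat_rows:
    print(row)
```

Because the sample has two code codings and no value codings, it yields two rows with `val_code` and `val_sys` set to `None`. This is the behaviour the `OUTER` keyword provides in the SQL view: a plain `explode` would drop the observation entirely when an array is empty.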