Add CDA Parser Documentation (#93)
* Added docs for cda parser

* Update README.md

* README
jenniferjiangkells authored Oct 25, 2024
1 parent ed7c69f commit fdd84af
Showing 2 changed files with 113 additions and 22 deletions.
56 changes: 34 additions & 22 deletions README.md
@@ -21,7 +21,7 @@ First time here? Check out our [Docs](https://dotimplement.github.io/HealthChain

## Features
- [x] 🛠️ Build custom pipelines or use [pre-built ones](https://dotimplement.github.io/HealthChain/reference/pipeline/pipeline/#prebuilt) for your healthcare NLP and ML tasks
- [x] 🏗️ Add built-in [CDA and FHIR parsers](https://dotimplement.github.io/HealthChain/reference/utilities/cda_parser/) to connect your pipeline to interoperability standards
- [x] 🧪 Test your pipelines in full healthcare-context aware [sandbox](https://dotimplement.github.io/HealthChain/reference/sandbox/sandbox/) environments
- [x] 🗃️ Generate [synthetic healthcare data](https://dotimplement.github.io/HealthChain/reference/utilities/data_generator/) for testing and development
- [x] 🚀 Deploy sandbox servers locally with [FastAPI](https://fastapi.tiangolo.com/)
@@ -33,7 +33,7 @@ First time here? Check out our [Docs](https://dotimplement.github.io/HealthChain
- **Built by health tech developers, for health tech developers** - HealthChain is tech stack agnostic, modular, and easily extensible.

## Pipeline
Pipelines provide a flexible way to build and manage processing pipelines for NLP and ML tasks that can easily integrate with complex healthcare systems.

### Building a pipeline

@@ -70,21 +70,39 @@ result = nlp(Document("Patient has a history of heart attack and high blood pres

```python
print(f"Entities: {result.entities}")
```

#### Adding connectors
Connectors give your pipelines the ability to interface with EHRs.

```python
from healthchain.io import CdaConnector
from healthchain.models import CdaRequest

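# `pipeline` is the Pipeline object from the previous example
cda_connector = CdaConnector()

# The same connector handles both ends: CDA in, CDA out
pipeline.add_input(cda_connector)
pipeline.add_output(cda_connector)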

pipe = pipeline.build()

cda_data = CdaRequest(document="<CDA XML content>")
output = pipe(cda_data)
```
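The idea behind connectors can be sketched without HealthChain at all: an adapter converts the external format into the pipeline's internal container on the way in, components operate on that container, and the same adapter converts it back on the way out. Everything below (`Doc`, `ToyCdaConnector`, `ToyPipeline`, `find_hypertension`) is hypothetical and deliberately tiny, with a keyword match standing in for a model; it is not how `CdaConnector` is implemented.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Doc:
    """Hypothetical internal container passed between pipeline components."""
    text: str
    entities: List[str] = field(default_factory=list)


class ToyCdaConnector:
    """Hypothetical connector: XML string in -> Doc, Doc -> XML string out."""

    def input(self, xml: str) -> Doc:
        # Pull the narrative out of a (greatly simplified) XML document
        root = ET.fromstring(xml)
        return Doc(text=root.findtext("text", default=""))

    def output(self, doc: Doc) -> str:
        # Write the text and any extracted entities back out as XML
        root = ET.Element("doc")
        ET.SubElement(root, "text").text = doc.text
        for name in doc.entities:
            ET.SubElement(root, "entity").text = name
        return ET.tostring(root, encoding="unicode")


class ToyPipeline:
    def __init__(self, connector: ToyCdaConnector) -> None:
        self.connector = connector
        self.components: List[Callable[[Doc], Doc]] = []

    def add_component(self, fn: Callable[[Doc], Doc]) -> None:
        self.components.append(fn)

    def __call__(self, xml: str) -> str:
        doc = self.connector.input(xml)    # external format -> container
        for fn in self.components:         # each component transforms the container
            doc = fn(doc)
        return self.connector.output(doc)  # container -> external format


def find_hypertension(doc: Doc) -> Doc:
    # Keyword match standing in for an NLP model
    if "hypertension" in doc.text.lower():
        doc.entities.append("hypertension")
    return doc


pipe = ToyPipeline(ToyCdaConnector())
pipe.add_component(find_hypertension)
result = pipe("<doc><text>History of hypertension.</text></doc>")
print(result)
# <doc><text>History of hypertension.</text><entity>hypertension</entity></doc>
```

Reusing one object for input and output keeps both conversions in one place, which mirrors the `add_input`/`add_output` calls in the README example.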

### Using pre-built pipelines
Pre-built pipelines are use-case-specific, end-to-end workflows that come with connectors and models already built in.

```python
from healthchain.pipeline import MedicalCodingPipeline
from healthchain.models import CdaRequest

# Load the pre-built MedicalCodingPipeline
pipeline = MedicalCodingPipeline.load("./path/to/model")

# Wrap a CDA document in a request and run it through the pipeline
cda_data = CdaRequest(document="<CDA XML content>")
output = pipeline(cda_data)
```


## Sandbox

Sandboxes provide a staging environment for testing and validating your pipeline in a realistic healthcare context.
@@ -102,7 +120,7 @@ Sandboxes provide a staging environment for testing and validating your pipeline
```python
import healthchain as hc

from healthchain.pipeline import SummarizationPipeline
from healthchain.use_cases import ClinicalDecisionSupport
from healthchain.models import Card, CdsFhirData, CDSRequest
from healthchain.data_generator import CdsDataGenerator
```
@@ -111,25 +129,19 @@ from typing import List
```python
@hc.sandbox
class MyCDS(ClinicalDecisionSupport):
    def __init__(self) -> None:
        self.pipeline = SummarizationPipeline.load("./path/to/model")
        self.data_generator = CdsDataGenerator()

    # Sets up an instance of a mock EHR client for the specified workflow
    @hc.ehr(workflow="encounter-discharge")
    def ehr_database_client(self) -> CdsFhirData:
        return self.data_generator.generate()

    # Define your application logic here
    @hc.api
    def my_service(self, data: CDSRequest) -> CDSRequest:
        result = self.pipeline(data)
        return result
```

### Clinical Documentation
@@ -145,7 +157,7 @@ import healthchain as hc

```python
from healthchain.pipeline import MedicalCodingPipeline
from healthchain.use_cases import ClinicalDocumentation
from healthchain.models import CcdData, CdaRequest, CdaResponse

@hc.sandbox
class NotereaderSandbox(ClinicalDocumentation):
```
@@ -161,8 +173,8 @@ class NotereaderSandbox(ClinicalDocumentation):
```python
        return CcdData(cda_xml=xml_string)

    @hc.api
    def my_service(self, data: CdaRequest) -> CdaResponse:
        annotated_ccd = self.pipeline(data)
        return annotated_ccd
```
### Running a sandbox
79 changes: 79 additions & 0 deletions docs/reference/utilities/cda_parser.md
@@ -1 +1,80 @@
# CDA Parser

The `CdaAnnotator` class is responsible for parsing and annotating CDA (Clinical Document Architecture) documents. It extracts information about problems, medications, allergies, and notes from the CDA document, and allows you to add new information to the CDA document.

The CDA parser is used in the [CDA Connector](../pipeline/connectors/cdaconnector.md) module, but can also be used independently.

Internally, `CdaAnnotator` parses CDA documents from XML strings to a dictionary-based representation using `xmltodict` and uses Pydantic for data validation. New problems are added to the CDA document using a template-based approach. It's currently not super configurable, but we're working on it.
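To give a feel for the dictionary-based representation, here is a stdlib-only sketch that mimics a few of `xmltodict`'s conventions: attributes become `@`-prefixed keys, text next to attributes or children goes under `#text`, and repeated child tags collapse into lists. Unlike `xmltodict.parse`, it drops namespace prefixes and ignores tail text, so treat it as an illustration rather than a drop-in replacement.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict


def _local(tag: str) -> str:
    # Drop the "{namespace}" prefix ElementTree adds to tag/attribute names
    return tag.split("}", 1)[-1]


def element_to_dict(elem: ET.Element) -> Any:
    """Convert an element to nested dicts, loosely following xmltodict."""
    node: Dict[str, Any] = {f"@{_local(k)}": v for k, v in elem.attrib.items()}
    for child in elem:
        key = _local(child.tag)
        value = element_to_dict(child)
        if key in node:
            # Repeated tag: promote the existing entry to a list
            if not isinstance(node[key], list):
                node[key] = [node[key]]
            node[key].append(value)
        else:
            node[key] = value
    text = (elem.text or "").strip()
    if text:
        if not node:
            return text  # leaf element holding only text
        node["#text"] = text
    return node or None  # empty element -> None


def xml_to_dict(xml: str) -> Dict[str, Any]:
    root = ET.fromstring(xml)
    return {_local(root.tag): element_to_dict(root)}


print(xml_to_dict(
    '<section><code code="11450-4"/><title>Problems</title>'
    "<entry>a</entry><entry>b</entry></section>"
))
# {'section': {'code': {'@code': '11450-4'}, 'title': 'Problems', 'entry': ['a', 'b']}}
```

A dictionary like this is easy to hand to a validation layer such as Pydantic, which is the role the paragraph above describes.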

Data interacts with the `CdaAnnotator` through `Concept` data models, which are designed to be a system-agnostic intermediary between FHIR and CDA data representations.

([CdaAnnotator API Reference](../../api/cda_parser.md) | [Concept API Reference](../../api/data_models.md#healthchain.models.data.concept))
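The point of an intermediary model is that one object can be rendered into either standard. A rough sketch of that idea follows. `SimpleConcept` and its methods are hypothetical and much simpler than HealthChain's `Concept` models, but the output shapes follow FHIR's `Coding` datatype (`system`, `code`, `display`) and the attributes of a CDA coded element (`code`, `codeSystem`, `codeSystemName`, `displayName`).

```python
from dataclasses import dataclass
from typing import Dict

# code system name -> (FHIR system URI, CDA OID)
CODE_SYSTEMS = {
    "SNOMED CT": ("http://snomed.info/sct", "2.16.840.1.113883.6.96"),
    "LOINC": ("http://loinc.org", "2.16.840.1.113883.6.1"),
}


@dataclass
class SimpleConcept:
    """Hypothetical, simplified stand-in for a system-agnostic concept."""
    code: str
    display: str
    code_system: str = "SNOMED CT"

    def to_fhir_coding(self) -> Dict[str, str]:
        # Shape of a FHIR Coding: system URI + code + display text
        system_uri, _ = CODE_SYSTEMS[self.code_system]
        return {"system": system_uri, "code": self.code, "display": self.display}

    def to_cda_code_attrs(self) -> Dict[str, str]:
        # Attributes of a CDA coded element such as <code> or <value xsi:type="CD">
        _, oid = CODE_SYSTEMS[self.code_system]
        return {
            "code": self.code,
            "codeSystem": oid,
            "codeSystemName": self.code_system,
            "displayName": self.display,
        }


htn = SimpleConcept(code="38341003", display="Hypertension")
print(htn.to_fhir_coding())
print(htn.to_cda_code_attrs())
```

The same code value travels unchanged; only the code-system identifier switches between a URI (FHIR) and an OID (CDA).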

## Usage

### Parsing CDA documents

Parse a CDA document from an XML string:

```python
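from healthchain.cda_parser import CdaAnnotator

# Read the raw CDA XML (the file path is illustrative)
with open("./path/to/cda_document.xml") as f:
    cda_xml_string = f.read()

cda = CdaAnnotator.from_xml(cda_xml_string)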

problems = cda.problem_list
medications = cda.medication_list
allergies = cda.allergy_list
note = cda.note

print([problem.name for problem in problems])
print([medication.name for medication in medications])
print([allergy.name for allergy in allergies])
print(note)
```

The parsed data is available through the `problem_list`, `medication_list`, `allergy_list`, and `note` attributes of the `CdaAnnotator` instance. The three list attributes each return a list of `Concept` data models; `note` holds the content of the document's note section.
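For a sense of what populating those lists involves, the sketch below pulls problem names out of a CDA document with the standard library's ElementTree. It is a simplification under stated assumptions, not `CdaAnnotator`'s logic: it finds the problem section by its LOINC section code (`11450-4`) in the `urn:hl7-org:v3` namespace and reads `displayName` from every coded `<value>` inside it, ignoring templates, statuses, and dates.

```python
import xml.etree.ElementTree as ET
from typing import List

HL7 = "urn:hl7-org:v3"             # CDA default namespace
PROBLEM_SECTION_LOINC = "11450-4"  # LOINC code for a problem list section


def extract_problems(cda_xml: str) -> List[str]:
    """Return the displayName of each coded <value> in the problem section."""
    root = ET.fromstring(cda_xml)
    problems: List[str] = []
    for section in root.iter(f"{{{HL7}}}section"):
        code = section.find(f"{{{HL7}}}code")
        if code is None or code.get("code") != PROBLEM_SECTION_LOINC:
            continue  # not the problem section
        for value in section.iter(f"{{{HL7}}}value"):
            name = value.get("displayName")
            if name:
                problems.append(name)
    return problems


SAMPLE = """<ClinicalDocument xmlns="urn:hl7-org:v3"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <component><structuredBody><component>
    <section>
      <code code="11450-4" codeSystem="2.16.840.1.113883.6.1"/>
      <entry><act><entryRelationship><observation>
        <value xsi:type="CD" code="38341003"
               codeSystem="2.16.840.1.113883.6.96" displayName="Hypertension"/>
      </observation></entryRelationship></act></entry>
    </section>
  </component></structuredBody></component>
</ClinicalDocument>"""

print(extract_problems(SAMPLE))  # ['Hypertension']
```

Medications and allergies keep their coded names in different elements (for example the medication's material code), so each section needs its own lookup rule; that is part of why CDA parsing gets fiddly.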

### Adding new information to the CDA document

The methods currently available for adding new information to the CDA document are:

| Method | Description |
|--------|-------------|
| `.add_to_problem_list()` | Adds a list of [ProblemConcept](../../api/data_models.md#healthchain.models.data.concept.ProblemConcept) |
| `.add_to_medication_list()` | Adds a list of [MedicationConcept](../../api/data_models.md#healthchain.models.data.concept.MedicationConcept) |
| `.add_to_allergy_list()` | Adds a list of [AllergyConcept](../../api/data_models.md#healthchain.models.data.concept.AllergyConcept) |

The `overwrite` parameter in the `add_to_*_list()` methods is used to determine whether to overwrite the existing list or append to it. If `overwrite` is `True`, the existing list will be replaced with the new list. If `overwrite` is `False`, the new list will be appended to the existing list.

Depending on your use case, you may not need to return the original entries from the CDA document you receive. Overwriting them is mostly useful during development, when you don't want the eye-strain of a lengthy CDA document.
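The documented `overwrite` behavior, together with the template-based approach mentioned at the top of this page, can be modelled in a few lines. This is a hypothetical sketch, not HealthChain's implementation: `PROBLEM_ENTRY` is a heavily simplified entry template (real C-CDA problem entries also carry template IDs, status codes, and dates), and `add_entries` only mirrors the replace-or-append semantics described above.

```python
from string import Template
from typing import Dict, List
from xml.sax.saxutils import escape

# Hypothetical, heavily simplified problem entry template
# (assumes the enclosing document declares the xsi namespace)
PROBLEM_ENTRY = Template(
    '<entry><act classCode="ACT" moodCode="EVN">'
    '<entryRelationship typeCode="SUBJ"><observation classCode="OBS" moodCode="EVN">'
    '<value xsi:type="CD" code="$code" codeSystem="$code_system" displayName="$name"/>'
    "</observation></entryRelationship></act></entry>"
)


def add_entries(
    existing: List[str], concepts: List[Dict[str, str]], overwrite: bool = False
) -> List[str]:
    """Render concepts with the template, then replace or extend the list."""
    rendered = [
        # Escape &, <, > and double quotes so values are safe inside attributes
        PROBLEM_ENTRY.substitute({k: escape(v, {'"': "&quot;"}) for k, v in c.items()})
        for c in concepts
    ]
    # overwrite=True discards the existing entries; False appends after them
    return rendered if overwrite else existing + rendered


existing = ["<entry><!-- problem already in the document --></entry>"]
new = [{"name": "Hypertension", "code": "38341003",
        "code_system": "2.16.840.1.113883.6.96"}]

print(len(add_entries(existing, new)))                  # 2
print(len(add_entries(existing, new, overwrite=True)))  # 1
```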

### Exporting the CDA document

```python
xml_string = cda.export(pretty_print=True)
```

The `pretty_print` parameter is optional and defaults to `True`. If `pretty_print` is `True`, the XML string will be formatted with newlines and indentation.
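To see what the flag changes, compare compact and indented serializations of the same XML. This sketch uses the standard library's `xml.etree.ElementTree` (`ET.indent` needs Python 3.9+), not HealthChain's exporter, so the exact whitespace `cda.export()` produces may differ.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<section><title>Problems</title><entry/></section>")

# Compact: the whole document on one line
compact = ET.tostring(root, encoding="unicode")

# Pretty: indent() inserts newlines and two-space indentation in place
ET.indent(root, space="  ")
pretty = ET.tostring(root, encoding="unicode")

print(compact)
print(pretty)
```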

## Example

```python
from healthchain.cda_parser import CdaAnnotator
from healthchain.models import ProblemConcept, MedicationConcept, AllergyConcept

# cda_xml_string holds the raw CDA XML, read as in the parsing example above
cda = CdaAnnotator.from_xml(cda_xml_string)

new_problems = [ProblemConcept(name="New Problem", code="123456")]
new_medications = [MedicationConcept(name="New Medication", code="789012")]
new_allergies = [AllergyConcept(name="New Allergy", code="345678")]

# Add new problems, medications, and allergies
cda.add_to_problem_list(new_problems, overwrite=True)
cda.add_to_medication_list(new_medications, overwrite=True)
cda.add_to_allergy_list(new_allergies, overwrite=True)

# Export the modified CDA document
modified_cda_xml = cda.export()
```

The CDA parser is a work in progress. I'm just gonna be real with you, CDAs are the bane of my existence. If you, for some reason, love working with XML-based documents, please get [in touch](https://discord.gg/UQC6uAepUz)! We have plans to implement more functionality in the future, including allowing configurable templates, more CDA section methods, and using LLMs as a fallback parsing method.
