Commit

Add validation scripts overview

kzollove committed Jan 11, 2024
1 parent 5352248 commit 30b2b06
Showing 2 changed files with 30 additions and 7 deletions.
28 changes: 22 additions & 6 deletions rmd/tooling.Rmd
@@ -53,26 +53,42 @@ This package identifies oncology regimens. Firstly, it identifies all patients w

---

# **Development & Testing**
# **Database Characterization and Validation**

<br>

## Standards Adherence Validation
## Purpose

<br>
Provide a semi-automated and extensible framework for generating, visualizing, and sharing an assessment of an OMOP-shaped database's adherence to the OHDSI Oncology Standard (tables, vocabulary) and the availability and types of oncology data it contains.

## Overview

The star of the framework is an R package. Along with cataloguing an extensible set of queries and analyses used for assessing OMOP-shaped oncology data, the R package provides functionality for the four major processes involved in the framework:

1) Authoring an assessment specification
2) Executing an assessment specification
3) Generating assessment results
4) Visualizing assessment results
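
A rough sketch of driving the four processes from R follows. The function and package names here are illustrative assumptions only, not the package's documented API; see the validation scripts README for the real interface.

```r
# Hypothetical sketch -- function names below are illustrative
# assumptions, not the package's actual API.
library(DBI)

# 1) Author an assessment specification (serialized to JSON)
spec <- author_specification(
  analyses   = c(1234),  # analysis IDs to include
  thresholds = list("1234" = c(bad = 200, good = 500))
)
write_specification(spec, "my_study_spec.json")

# 2) Execute the specification against an OMOP-shaped database
con     <- DBI::dbConnect(RPostgres::Postgres(), dbname = "omop_cdm")
results <- execute_assessment(con, spec = "my_study_spec.json")

# 3) Generate and 4) visualize assessment results
report <- generate_results(results)
visualize_results(report)
```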

### Approach

<br>
_Assessments_ can be executed against an OMOP-shaped database to create a characterization and quality report. They are created from specifications.

_Specifications_ are JSON files that describe an assessment. They are composed by compiling analyses together with threshold values.

_Analyses_ execute a query and return a row count or proportion describing the contents of the database. For example, analysis_id=1234 returns "the number of cancer diagnosis records derived from Tumor Registry source data".

_Thresholds_ provide study-specific context for the results of analyses. An analysis asks, for example, how many cancer diagnoses derived from tumor registry data are in the database. Using thresholds, an assessment author can give ranges for "bad", "questionable", and "good" analysis results as they pertain to their study. An example threshold, encoded as JSON, could express the sentiment: "A database with 0-200 diagnoses from tumor registry data would be unfit for this study, 201-500 diagnoses may be suitable, and over 500 diagnoses will be more than enough."
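
For illustration, the threshold sentiment above might be encoded as JSON along these lines (the field names are assumptions for this sketch, not the framework's actual schema):

```json
{
  "analysis_id": 1234,
  "description": "Cancer diagnosis records derived from Tumor Registry source data",
  "thresholds": {
    "bad":          { "min": 0,   "max": 200 },
    "questionable": { "min": 201, "max": 500 },
    "good":         { "min": 501, "max": null }
  }
}
```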

### Extensibility

This tool is a product of collaboration. See the validation scripts README for detailed instructions on creating analyses (TODO) and using the R package to author and execute assessment specifications (TODO).
<br>


---
# **Development & Testing**

<br>

## Delta Vocabulary Framework

9 changes: 8 additions & 1 deletion validationScripts/README.md
@@ -13,6 +13,13 @@ The star of the framework is an R Package. Along with cataloguing an extensible
3) Generating assessment results
4) Visualizing assessment results

### Approach

_Assessments_ can be executed against an OMOP-shaped database to create a characterization and quality report. They are created from specifications.

_Specifications_ are JSON files that describe an assessment. They are composed by compiling analyses together with threshold values.

_Analyses_ execute a query and return a row count or proportion describing the contents of the database. For example, analysis_id=1234 returns "the number of cancer diagnosis records derived from Tumor Registry source data".

_Thresholds_ provide study-specific context for the results of analyses. An analysis asks, for example, how many cancer diagnoses derived from tumor registry data are in the database. Using thresholds, an assessment author can give ranges for "bad", "questionable", and "good" analysis results as they pertain to their study. An example threshold, encoded as JSON, could express the sentiment: "A database with 0-200 diagnoses from tumor registry data would be unfit for this study, 201-500 diagnoses may be suitable, and over 500 diagnoses will be more than enough."
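
As a sketch, such a threshold might be encoded in a specification's JSON along these lines (the field names here are assumptions, not the framework's actual schema):

```json
{
  "analysis_id": 1234,
  "description": "Cancer diagnosis records derived from Tumor Registry source data",
  "thresholds": {
    "bad":          { "min": 0,   "max": 200 },
    "questionable": { "min": 201, "max": 500 },
    "good":         { "min": 501, "max": null }
  }
}
```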
