RadBench release
suneeta-mall committed Sep 4, 2024
0 parents commit e2b3a12
Showing 29 changed files with 988 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .github/CODEOWNERS
@@ -0,0 +1,2 @@
* @harrison-ai/ai

15 changes: 15 additions & 0 deletions .github/pull_request_template.md.md
@@ -0,0 +1,15 @@
Please follow the Conventional Commits specification for commit types: <https://www.conventionalcommits.org/en/v1.0.0/#specification>
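For example, a PR title such as `feat(datasets): add FRCR mock exam sheets` follows this convention (`type(scope): description`).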

## Proposed changes

Describe your changes here to communicate to the maintainers why we should accept this pull request.

### Focused Review

If there are parts of this PR that need special attention, please mention them here and tag the most appropriate reviewer.

- Does this test cover all the important cases? [TAG PERSON]

**Related issue:**


22 changes: 22 additions & 0 deletions .github/workflows/pages.yml
@@ -0,0 +1,22 @@
name: Docs
on:
  push:
    #branches:
    #  - main
permissions:
  contents: write
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: install publishing dependencies
        run: make install

      - name: Deploy pages
        run: mkdocs gh-deploy --force
        # run: mkdocs /bin/bash -c "HOME=/tmp python -m mkdocs build"
11 changes: 11 additions & 0 deletions .github/workflows/renovate.yml
@@ -0,0 +1,11 @@
on:
  workflow_dispatch:

name: Renovate

jobs:
  check_dependencies:
    name: Check dependencies
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
82 changes: 82 additions & 0 deletions .gitignore
@@ -0,0 +1,82 @@
.coverage
.mypy_cache/
.pip.conf
.pytest_cache/
.pytest_logs/
lightning_logs/
.venv/
.vscode
__pycache__

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
outputs/
artifacts/

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/


# Crash log files
crash.log
*.log

# Envvars environment configuration file
.env
.envrc
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
.direnv
.envrc/
.vscode/
.pip.conf
.requirements-no-hashes.txt
.python-version

# Jupyter Notebook
.ipynb_checkpoints

# Temporary caches
*.so
cache/*
.tmp
site
11 changes: 11 additions & 0 deletions Makefile
@@ -0,0 +1,11 @@
.PHONY: install serve clean
.DEFAULT_GOAL := serve

install:
	pip install -r requirements.txt

serve:
	mkdocs serve

clean:
	git clean -Xdf
36 changes: 36 additions & 0 deletions README.md
@@ -0,0 +1,36 @@
![RadBench Logo](docs/resources/logo_font_azure.png)
# RadBench: Radiology Benchmark Framework

[![Documentation](https://img.shields.io/badge/Documentation-blue?style=flat)](https://harrison-ai.github.io/radbench/)

## Overview

RadBench is a radiology benchmark framework developed by [Harrison.ai](https://harrison.ai/). It is designed to evaluate the performance of Harrison.ai's radiology foundation model, `harrison.rad.1`, against other competitive models in the field. The framework employs a rigorous evaluation methodology across three distinct datasets to ensure that models are thoroughly assessed for clinical relevance, accuracy, and case comprehension. These datasets are:

1. [**RadBench Dataset**](docs/datasets/radbench.md): A new visual question-answering dataset designed by Harrison.ai to benchmark radiology models.

2. [**VQA-RAD Dataset**](docs/datasets/vqa-rad.md): A visual question-answering dataset for radiology, available at [Nature Datasets](https://www.nature.com/articles/sdata2018251).

3. [**Fellowship of the Royal College of Radiologists (FRCR) 2B Examination**](docs/datasets/frcr.md): Examination sheets curated for the FRCR 2B Rapids exam, obtained from third parties to ensure fairness in our evaluation process.



## mkdocs dev

To launch mkdocs locally, follow these instructions:

1. Create a Python environment:
```bash
python3 -m venv .venv
. .venv/bin/activate
```

2. Install the dependencies:
```bash
make install
```

3. Start the serving endpoint:
```bash
make serve
```
498 changes: 498 additions & 0 deletions data/radbench/radbench.csv

Large diffs are not rendered by default.

7 changes: 7 additions & 0 deletions docs/datasets/frcr.md
@@ -0,0 +1,7 @@
![RadBench Logo](../resources/logo_font_azure.png)

# FRCR

Medical specialists undertake rigorous examinations before practising radiology. The Fellowship of the Royal College of Radiologists (FRCR) is one such examination. We used a component of this examination, the FRCR 2B Rapids [@FRCR2B], to benchmark radiology foundation models.

While the actual examinations are kept confidential to prevent leakage, mock FRCR examinations are available on various established educational websites. Our FRCR evaluation dataset comprises 70 FRCR examination sheets procured from these established third-party organisations. We sourced this dataset from third parties to ensure fairness in our evaluation process.
37 changes: 37 additions & 0 deletions docs/datasets/radbench.md
@@ -0,0 +1,37 @@
![RadBench Logo](../resources/logo_font_azure.png)

# RadBench Dataset

The RadBench dataset is a collation of clinically relevant, radiology-specific visual questions and answers (VQA) based on plain-film X-rays. This VQA dataset is clinically comprehensive, covering three or more questions per medical image. The radiology images for this set are sourced from [Medpix](https://medpix.nlm.nih.gov/home) and [Radiopaedia](https://radiopaedia.org/). RadBench is curated by medical doctors with expertise in the relevant fields who interpret these images as part of their clinical duties.


![RadBench Overview](../resources/radbench_overview.jpg)

## Overview

The [RadBench dataset](https://github.com/harrison-ai/radbench/blob/main/data/radbench/radbench.csv) is formatted similarly to VQA-Rad [@Lau2018] to ensure ease of use by the medical and radiology communities. Some key differences are:

* **Rich set of possible answers**: The closed questions in the RadBench dataset have an explicitly defined set of possible answers.
* **Level of correctness**: The set of possible answers for a given question is also ordered by relative correctness, to account for the fact that some options can be more incorrect than others. This ordering also helps with differential diagnosis.
* **Multi-turn questionnaire**: Questions are ordered per case by specificity, meaning that if evaluated in the same context they should be asked in that order. For example, "Is there a fracture in the study?" should be asked before "Which side is the fracture on?", as the second question implies the answer to the first.
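
Below is a minimal sketch of how the CSV might be consumed with pandas while respecting this ordering. The column names (`case_id`, `question`, `possible_answers`) are illustrative assumptions, not the published schema; check `data/radbench/radbench.csv` for the actual headers.

```python
# Illustrative sketch only: column names are assumptions, not the
# published schema -- see data/radbench/radbench.csv for the real headers.
import pandas as pd

df = pd.read_csv("data/radbench/radbench.csv")

# Rows are ordered by specificity within each case, so group by case
# and preserve row order to respect the multi-turn design.
for case_id, case in df.groupby("case_id", sort=False):
    for _, row in case.iterrows():
        # `possible_answers` is assumed to be ordered from most to
        # least correct, per the "level of correctness" property above.
        print(case_id, row["question"], row["possible_answers"])
```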


## Why RadBench?

There has been growing concern within the computer vision and deep learning (CV & DL) communities that we have started to overfit to popular existing benchmarks, such as ImageNet [@abs-2006-07159]. We share this concern and worry that radiology foundation models are perhaps also starting to overfit to VQA-Rad [@Lau2018]. Moreover, existing radiology VQA datasets have several shortcomings:

* Some datasets contain questions and answers automatically generated from noisy labels extracted from radiology reports. This leads to unnatural and ambiguous questions that cannot be adequately answered from the image. For instance:
    * The question `In the given Chest X-Ray, is cardiomegaly present in the upper? (please answer yes/no)` (dataset source: ProbMed [@ProbMed2024]) is anatomically impossible to answer, as cardiomegaly is not divided into `upper` and `lower`.
    * Likewise, in the SLAKE dataset [@SLAKE2021], the question `Where is the brain non-enhancing tumor?` is asked of the image `xmlab470/source.jpg`. However, the image is an axial non-contrast T2 MRI of the brain, from which a 'non-enhancing tumor' cannot be identified. The given answer, `Upper Left Lobe`, is also not a valid anatomical region in the brain; it should be `anterior left frontal lobe`.
* Some existing datasets have been curated by non-medical specialists, leading to questions that may be less relevant to everyday clinical work and pathology.
* Existing datasets do not include more than one image per question, whereas many radiology studies include more than one view. A single image does not allow us to evaluate a model's ability to compare multiple images at once, which is a clinically relevant task.
* Existing datasets do not specify the context in which the images should be used. This matters for RadBench because more than one image can be used in a single question. In RadBench, the `<i>` token denotes the location of an image relative to the surrounding words (more specifically, tokens). This allows specific references to the images in the question, e.g. "the first study" or "the second study". As a result, multi-turn comparison questions can now be asked (see the sketch after this list).
* Existing datasets are not selected for clinically challenging cases where the pathology is visually subtle or rare. RadBench deliberately selects a wide range of pathologies across different anatomical regions, with the intention of including challenging cases.
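
As a concrete illustration of the `<i>` token, the sketch below resolves a two-study question into an interleaved text/image sequence. The question string and file names are hypothetical, invented for this example rather than taken from RadBench.

```python
# Hypothetical example: the question text and image paths are invented
# to illustrate the `<i>` image-location token.
question = (
    "Given the first study <i> and the second study <i>, "
    "has the pleural effusion resolved?"
)
images = ["study_1.png", "study_2.png"]

# Split on the token and interleave each image at its marked position.
parts = question.split("<i>")
assert len(parts) - 1 == len(images), "expect one image per <i> token"

sequence = []
for i, text in enumerate(parts):
    if text.strip():
        sequence.append(("text", text.strip()))
    if i < len(images):
        sequence.append(("image", images[i]))

print(sequence)
# [('text', 'Given the first study'), ('image', 'study_1.png'),
#  ('text', 'and the second study'), ('image', 'study_2.png'),
#  ('text', ', has the pleural effusion resolved?')]
```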




## Acknowledgements

We thank [Medpix](https://medpix.nlm.nih.gov/home) and [Radiopaedia](https://radiopaedia.org/), their respective editorial teams, and their contributors, especially the NIH, Frank Gaillard, Andrew Dixon, and other Radiopaedia.org contributors, for creating such a rich library of cases to test radiology expertise.
8 changes: 8 additions & 0 deletions docs/datasets/vqa-rad.md
@@ -0,0 +1,8 @@
![RadBench Logo](../resources/logo_font_azure.png)

# VQA-Rad

VQA-Rad is a dataset of clinically generated visual questions and answers about radiology images [@Lau2018]. The dataset can be downloaded from [Scientific Data](https://www.nature.com/articles/sdata2018251), from [OSF](https://files.osf.io/v1/resources/89kps/providers/osfstorage/?zip=), or alternatively from [Hugging Face](https://huggingface.co/datasets/flaviagiammarino/vqa-rad).
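
For quick experimentation, a minimal sketch of loading the Hugging Face mirror with the `datasets` library is shown below; the split and field names follow that mirror's dataset card, so verify them against the copy you download.

```python
# Requires: pip install datasets
from datasets import load_dataset

# Hugging Face mirror linked above; splits and fields per its dataset card.
vqa_rad = load_dataset("flaviagiammarino/vqa-rad")

sample = vqa_rad["train"][0]
print(sample["question"], "->", sample["answer"])
# sample["image"] holds the associated radiology image (a PIL Image).
```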



14 changes: 14 additions & 0 deletions docs/index.md
@@ -0,0 +1,14 @@
![RadBench Logo](https://harrison-ai.github.io/radbench/resources/logo_font_azure.png)

# RadBench: Radiology Benchmark Framework


## Overview

RadBench is a radiology benchmark framework developed by [Harrison.ai](https://harrison.ai/). It is designed to evaluate the performance of Harrison.ai's radiology foundation model, `harrison.rad.1`, against other competitive models in the field. The framework employs a rigorous evaluation methodology across three distinct datasets to ensure that models are thoroughly assessed for clinical relevance, accuracy, and case comprehension. These datasets are:

1. [**RadBench Dataset**](/datasets/radbench): A new visual question-answering dataset designed by Harrison.ai to benchmark radiology models.

2. [**VQA-RAD Dataset**](/datasets/vqa-rad): A visual question-answering dataset for radiology, available at [Nature Datasets](https://www.nature.com/articles/sdata2018251).

3. [**Fellowship of the Royal College of Radiologists (FRCR) 2B Examination**](/datasets/frcr): Examination sheets curated for the FRCR 2B Rapids exam, obtained from third parties to ensure fairness in our evaluation process.
13 changes: 13 additions & 0 deletions docs/readme.md
@@ -0,0 +1,13 @@
![RadBench Logo](https://harrison-ai.github.io/radbench/resources/logo_font_azure.png)

# RadBench: Radiology Benchmark Framework

## Overview

RadBench is a radiology benchmark framework developed by [Harrison.ai](https://harrison.ai/). It is designed to evaluate the performance of Harrison.ai's radiology foundation model, `harrison.rad.1`, against other competitive models in the field. The framework employs a rigorous evaluation methodology across three distinct datasets to ensure that models are thoroughly assessed for clinical relevance, accuracy, and case comprehension. These datasets are:

1. [**RadBench Dataset**](/datasets/radbench): A new visual question-answering dataset designed by Harrison.ai to benchmark radiology models.

2. [**VQA-RAD Dataset**](/datasets/vqa-rad): A visual question-answering dataset for radiology, available at [Nature Datasets](https://www.nature.com/articles/sdata2018251).

3. [**Fellowship of the Royal College of Radiologists (FRCR) 2B Examination**](/datasets/frcr): Examination sheets curated for the FRCR 2B Rapids exam, obtained from third parties to ensure fairness in our evaluation process.
51 changes: 51 additions & 0 deletions docs/references/refs.bib
@@ -0,0 +1,51 @@
@article{Lau2018,
  title   = {A dataset of clinically generated visual questions and answers about radiology images},
  author  = {Lau, J. J. and Gayen, S. and Ben Abacha, A. and Demner-Fushman, D.},
  journal = {Scientific Data},
  volume  = {5},
  pages   = {180251},
  year    = {2018},
  url     = {https://www.nature.com/articles/sdata2018251}
}

@article{SLAKE2021,
  title   = {SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical Visual Question Answering},
  author  = {Liu, B. and Zhan, L. and Xu, L. and Ma, L. and Yang, Y. and Wu, X.},
  journal = {CoRR},
  volume  = {abs/2102.09542},
  year    = {2021},
  url     = {https://arxiv.org/abs/2102.09542}
}


@article{ProbMed2024,
  title   = {Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA},
  author  = {Yan, Q. and He, X. and Yue, X. and Wang, X. E.},
  journal = {CoRR},
  volume  = {abs/2405.20421},
  year    = {2024},
  url     = {https://arxiv.org/abs/2405.20421}
}

@online{FRCR2B,
  author  = {{The Royal College of Radiologists}},
  title   = {FRCR Part 2B (Radiology) - CR2B},
  url     = {https://www.rcr.ac.uk/exams-training/rcr-exams/clinical-radiology-exams/frcr-part-2b-radiology-cr2b/},
  urldate = {2024-08-07}
}

@article{abs-2006-07159,
  author  = {Lucas Beyer and Olivier J. H{\'{e}}naff and Alexander Kolesnikov and Xiaohua Zhai and A{\"{a}}ron van den Oord},
  title   = {Are we done with ImageNet?},
  journal = {CoRR},
  volume  = {abs/2006.07159},
  year    = {2020},
  url     = {https://arxiv.org/abs/2006.07159}
}
Binary file added docs/resources/logo.png
4 changes: 4 additions & 0 deletions docs/resources/logo.svg
Binary file added docs/resources/logo_font_azure.png
Binary file added docs/resources/logo_font_black.png
Binary file added docs/resources/logo_font_mint.png
Binary file added docs/resources/logo_font_white.png