Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add configuration for DESPIAD project #572

Merged
merged 53 commits into from
Feb 6, 2025
Merged
Show file tree
Hide file tree
Changes from 50 commits
Commits
Show all changes
53 commits
Select commit Hold shift + click to select a range
54eb64f
Add initial anonymisation config for ct
p-j-smith Dec 12, 2024
2730317
Add initial anonymisation config for pet
p-j-smith Dec 12, 2024
ff2a8c3
Add config for despiad
p-j-smith Dec 12, 2024
f20d5b8
anonymnise all resources before notifying the export api
p-j-smith Dec 12, 2024
d13b916
remove despaid.yaml from project config
p-j-smith Dec 12, 2024
6097314
Merge branch 'main' into paul/despiad-config
p-j-smith Dec 12, 2024
8a4a793
Merge branch 'main' into paul/despiad-config
p-j-smith Dec 16, 2024
01bf793
generate label based on patient id and study count in xnat project
p-j-smith Dec 17, 2024
5a47136
Use pseudo-anonymised StudyInstanceUID for xnat experiment label
p-j-smith Dec 17, 2024
28f92a6
Fix XNAT destination
p-j-smith Dec 17, 2024
901eaf1
Merge branch 'main' into paul/despiad-config
p-j-smith Dec 19, 2024
4fd226a
remove changes related to grouping resources before notifying export api
p-j-smith Dec 19, 2024
c2fcbe4
remove duplicated tags
p-j-smith Dec 19, 2024
5d3c69d
Add series_number_filters and allowed_manufacturers parameters to pix…
p-j-smith Jan 2, 2025
32dd84f
clarify docstring of _import_study_from_raw
p-j-smith Jan 6, 2025
7867eff
Add min_instances_per_series parameter to project config
p-j-smith Jan 6, 2025
7215191
Merge branch 'main' into paul/despiad-config
p-j-smith Jan 6, 2025
70d6794
Merge branch 'main' into paul/despiad-config
p-j-smith Jan 7, 2025
0b90ddf
Merge branch 'main' into paul/despiad-config
p-j-smith Jan 8, 2025
4cd4c7e
Keep study date and patient dob for despiad
p-j-smith Jan 8, 2025
397c6b1
Changes after reviewing the PET data for DESPIAD (#592)
davecash75 Jan 13, 2025
b6bcb3c
Add Radiopharmaceutical Start DateTime to pet.yaml
p-j-smith Jan 13, 2025
ffeb70d
remove blank lines from ct.yaml
p-j-smith Jan 13, 2025
f6095a8
remove tab from config file
p-j-smith Jan 13, 2025
b231a60
filter series number by manufacturer
p-j-smith Jan 14, 2025
5a9c52e
Add allowed_manufacturers for all test configs
p-j-smith Jan 14, 2025
f7d94e1
Count number of instances skipped due to series having too few instances
p-j-smith Jan 15, 2025
1891df0
move get_series_to_skip to dcmd
p-j-smith Jan 15, 2025
9597559
Add philips and carestream as allowed manufacturers for test project
p-j-smith Jan 15, 2025
855f82d
Merge branch 'main' into paul/despiad-config
p-j-smith Jan 15, 2025
5ea6418
Update description of project config in readme
p-j-smith Jan 15, 2025
aa170a6
Check _should_exclude_manufacurer before _should_exclude_series
p-j-smith Jan 21, 2025
f1eed49
filter out instance if manufacturer tag is missing
p-j-smith Jan 21, 2025
46d2109
allow all manufacturers for existing projects
p-j-smith Jan 21, 2025
09748bf
Merge branch 'main' into paul/despiad-config
p-j-smith Jan 21, 2025
5d4232f
Merge branch 'main' into paul/despiad-config
p-j-smith Jan 27, 2025
b6872b9
Add tests for PixlConfig.is_manufacturer_allowed and PixlConfig.is_se…
p-j-smith Jan 27, 2025
6b707cb
Add more tests for _should_exclude_series
p-j-smith Jan 27, 2025
a7c5917
Add tests for test_should_exclude_manufacturer
p-j-smith Jan 27, 2025
7527b62
Add tests for get_series_to_skip
p-j-smith Jan 28, 2025
4f59e79
Don't allow all manufacturers in the template config
p-j-smith Jan 28, 2025
48b3fd7
Set min_instances to 2 for despiad
p-j-smith Jan 28, 2025
332e648
Only allow manufacturer GE MEDICAL SYSTEMS for DESPIAD
p-j-smith Jan 29, 2025
ee5eb87
Keep Number of Time Slices attribute for PET
p-j-smith Jan 29, 2025
2084bdb
set 'pydicom.config.convert_wrong_length_to_UN = True' in dcmd
p-j-smith Jan 30, 2025
fadc695
Add series filters to despiad config
p-j-smith Jan 30, 2025
c1bfc73
Add series number and description filers for despiad
p-j-smith Jan 30, 2025
ad91125
Use ints for series numbers to exclude
p-j-smith Jan 30, 2025
fe18df1
Update default config to exclude series with mip in their description
p-j-smith Jan 30, 2025
9b749e6
Add ^company as an allowed manufacturer when testing anonymisation
p-j-smith Feb 4, 2025
eb4ee95
Use integers for series_number in tests
p-j-smith Feb 4, 2025
ecb04ce
set min_instances_per_series to 2 by default
p-j-smith Feb 4, 2025
caf101a
Set min_instances_per_series to 1 for testing
p-j-smith Feb 6, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -147,6 +147,13 @@ The configuration file defines:

- Project name: the `<project-slug>` name of the Project
- The DICOM dataset modalities to retain (e.g. `["DX", "CR"]` for X-Ray studies)
- The minimum number of instances required by a series (defaults to 1). May be set higher than 1 to filter out
p-j-smith marked this conversation as resolved.
Show resolved Hide resolved
series with a single screenshot containing patient identifiable data
- A list of series description filters (e.g. `['loc', 'pos']`). Series with descriptions matching any of these
filters will be skipped
- A list of allowed manufacturers. By default, no manufacturers are allowed. For each manufacturer:
- A regex to identify the allowed manufacturer (e.g. `^philips`)
- A list of series numbers to exclude for the given manufacturer (e.g. `[3, 4]`)
- The [anonymisation operations](/pixl_dcmd/README.md#tag-scheme-anonymisation) to be applied to the DICOM tags,
by providing a file path to one or multiple YAML files.
We currently allow two types of files:
Expand Down
2 changes: 1 addition & 1 deletion cli/src/pixl_cli/_config.py
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@
"password": config("PIXL_DB_PASSWORD"),
"database": config("PIXL_DB_NAME"),
},
} # type: dict
}


class APIConfig:
Expand Down
13 changes: 13 additions & 0 deletions orthanc/orthanc-anon/plugin/pixl.py
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,7 @@
from pixl_dcmd.dicom_helpers import get_study_info
from pixl_dcmd.main import (
anonymise_dicom_and_update_db,
get_series_to_skip,
parse_validation_results,
write_dataset_to_bytes,
)
Expand Down Expand Up @@ -331,6 +332,7 @@ def _anonymise_study_instances(
Return a list of the bytes of anonymised instances, and the anonymised StudyInstanceUID.
"""
config = load_project_config(project_name)
series_to_skip = get_series_to_skip(zipped_study, config.min_instances_per_series)
anonymised_instances_bytes = []
skipped_instance_counts = defaultdict(int)
dicom_validation_errors = {}
Expand All @@ -339,6 +341,17 @@ def _anonymise_study_instances(
with zipped_study.open(file_info) as file:
logger.debug("Reading file {}", file)
dataset = dcmread(file)

if dataset.SeriesInstanceUID in series_to_skip:
logger.debug(
"Skipping series {} for study {} due to too few instances",
dataset.SeriesInstanceUID,
study_info,
)
key = "DICOM instance discarded as series has too few instances"
skipped_instance_counts[key] += 1
continue

try:
anonymised_instance, instance_validation_errors = _anonymise_dicom_instance(
dataset, config
Expand Down
69 changes: 63 additions & 6 deletions pixl_core/src/core/project_config/pixl_config_model.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@

from __future__ import annotations

import re
from enum import Enum
from pathlib import Path
from typing import Any, Optional
Expand Down Expand Up @@ -61,6 +62,16 @@
modalities: list[str]


class Manufacturer(BaseModel):
"""
An allowed manufacturer for a project.
Also defines which series numbers to exclude for this manufacturer.
"""

regex: str = "no manufacturers allowed ^"
exclude_series_numbers: list[int] = []


class TagOperationFiles(BaseModel):
"""Tag operations files for a project. At least a base file is required."""

Expand Down Expand Up @@ -133,20 +144,66 @@
"""Project-specific configuration for Pixl."""

project: _Project
series_filters: Optional[list[str]] = None
min_instances_per_series: Optional[int] = 1
series_filters: Optional[list[str]] = [] # pydantic makes a deep copy of the empty default list
allowed_manufacturers: list[Manufacturer] = [Manufacturer()]
tag_operation_files: TagOperationFiles
destination: _Destination

def is_series_excluded(self, series_description: str) -> bool:
def is_series_description_excluded(self, series_description: str | None) -> bool:
"""
Return whether this config excludes the series with the given description
Return whether this config excludes the series with the given description.

Do a simple case-insensitive substring check - this data is ultimately typed by a human, and
different image sources may have different conventions for case conversion.

:param series_description: the series description to test
:returns: True if it should be excluded, False if not
"""
if self.series_filters is None or series_description is None:
if not self.series_filters or series_description is None:
return False
# Do a simple case-insensitive substring check - this data is ultimately typed by a human,
# and different image sources may have different conventions for case conversion.

return any(
series_description.upper().find(filt.upper()) != -1 for filt in self.series_filters
)

def is_series_number_excluded(self, manufacturer: str, series_number: str | None) -> bool:
"""
Return whether this config excludes the series with the given number for the given
manufacturer.

:param manufacturer: the manufacturer to test
:param series_number: the series number to test
:returns: True if it should be excluded, False if not
"""
if not self.is_manufacturer_allowed(manufacturer) or series_number is None:
return True

exclude_series_numbers = self._get_manufacturer(manufacturer).exclude_series_numbers
return series_number in exclude_series_numbers

def is_manufacturer_allowed(self, manufacturer: str) -> bool:
"""
Check whether the manufacturer is in the allow-list.

:param manufacturer: name of the manufacturer
:returns: True is the manufacturer is allowed, False if not
"""
for manufacturer_config in self.allowed_manufacturers:
if re.search(rf"{manufacturer_config.regex}", manufacturer, flags=re.IGNORECASE):
return True
return False

def _get_manufacturer(self, manufacturer: str) -> Manufacturer:
"""
Get the manufacturer configuration for the given manufacturer.

:param manufacturer: name of the manufacturer
:returns: Manufacturer configuration
:raises: ValueError: if the manufacturer is not allowed
"""
for manufacturer_config in self.allowed_manufacturers:
if re.search(rf"{manufacturer_config.regex}", manufacturer, flags=re.IGNORECASE):
return manufacturer_config
msg = f"Manufacturer {manufacturer} is not allowed by project {self.project.name}"
raise ValueError(msg)

Check warning on line 209 in pixl_core/src/core/project_config/pixl_config_model.py

View check run for this annotation

Codecov / codecov/patch

pixl_core/src/core/project_config/pixl_config_model.py#L208-L209

Added lines #L208 - L209 were not covered by tests
41 changes: 40 additions & 1 deletion pixl_core/tests/project_config/test_project_config.py
Original file line number Diff line number Diff line change
Expand Up @@ -181,4 +181,43 @@ def test_series_filtering(base_yaml_data, series_filters, test_series_desc, expe
if series_filters is not None:
base_yaml_data["series_filters"] = series_filters
cfg = PixlConfig.model_validate(base_yaml_data)
assert cfg.is_series_excluded(test_series_desc) == expect_exclude
assert cfg.is_series_description_excluded(test_series_desc) == expect_exclude


@pytest.mark.parametrize(
("regex", "manufacturer", "allowed"),
[
("^allowed", "allowed", True),
("allowed", "not-allowed", False),
(None, "allowed", False),
],
)
def test_manufacturer_regex_filtering(base_yaml_data, regex, manufacturer, allowed):
"""Check the allowed manufacturers regex works."""
if regex is not None:
base_yaml_data["allowed_manufacturers"] = [{"regex": "^allowed"}]
cfg = PixlConfig.model_validate(base_yaml_data)
assert cfg.is_manufacturer_allowed(manufacturer) == allowed


@pytest.mark.parametrize(
("manufacturer", "series_number", "expect_exclude"),
[
("allowed", "2", True),
("allowed", "4", False),
("allowed", None, True),
("not-allowed", "4", True),
],
)
def test_manufacturer_series_number_filterings(
base_yaml_data, manufacturer, series_number, expect_exclude
):
"""Check the series number are correctly excluded."""
base_yaml_data["allowed_manufacturers"] = [
{"regex": "^allowed", "exclude_series_numbers": ["1", "2", "3"]}
]
cfg = PixlConfig.model_validate(base_yaml_data)
assert (
cfg.is_series_number_excluded(manufacturer=manufacturer, series_number=series_number)
== expect_exclude
)
75 changes: 71 additions & 4 deletions pixl_dcmd/src/pixl_dcmd/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
import typing
from functools import lru_cache
from io import BytesIO
from zipfile import ZipFile

import requests
from core.exceptions import PixlSkipInstanceError
Expand All @@ -26,7 +27,8 @@
anonymize_dataset,
)
from loguru import logger
from pydicom import DataElement, Dataset, dcmwrite
from pydicom import DataElement, Dataset, dcmread, dcmwrite
import pydicom

from core.project_config.pixl_config_model import PixlConfig
from pixl_dcmd._database import (
Expand All @@ -43,6 +45,10 @@
from pixl_dcmd.dicom_helpers import StudyInfo


# See: https://github.com/pydicom/pydicom/issues/2170
pydicom.config.convert_wrong_length_to_UN = True


def write_dataset_to_bytes(dataset: Dataset) -> bytes:
"""
Write pydicom DICOM dataset to byte array
Expand All @@ -56,14 +62,72 @@
return buffer.read()


def get_series_to_skip(zipped_study: ZipFile, min_instances: int) -> set[str]:
"""
Determine which series to skip based on the number of instances in the series.

If a series has fewer instances than `min_instances`, add it to a set of series to skip.

Args:
zipped_study: ZipFile containing the study
min_instances: Minimum number of instances required to include a series

"""
if min_instances <= 1:
return set()

series_instances = {}
for file_info in zipped_study.infolist():
with zipped_study.open(file_info) as file:
logger.debug("Reading file {}", file)
dataset = dcmread(file)
if dataset.SeriesInstanceUID not in series_instances:
series_instances[dataset.SeriesInstanceUID] = 1
continue
series_instances[dataset.SeriesInstanceUID] += 1

Check warning on line 87 in pixl_dcmd/src/pixl_dcmd/main.py

View check run for this annotation

Codecov / codecov/patch

pixl_dcmd/src/pixl_dcmd/main.py#L87

Added line #L87 was not covered by tests

return {
series for series, count in series_instances.items() if count < min_instances
}


def _should_exclude_series(dataset: Dataset, cfg: PixlConfig) -> bool:
"""
Check whether the dataset series should be exlucded based on its description
and number.
"""
series_description = dataset.get("SeriesDescription")
if cfg.is_series_excluded(series_description):
if cfg.is_series_description_excluded(series_description):
logger.debug("FILTERING OUT series description: {}", series_description)
return True

manufacturer = dataset.get("Manufacturer")
series_number = dataset.get("SeriesNumber")
if cfg.is_series_number_excluded(
manufacturer=manufacturer, series_number=series_number
):
logger.debug(
"FILTERING OUT series number: {} for manufacturer: {}",
series_number,
manufacturer,
)
return True

return False


def _should_exclude_manufacturer(dataset: Dataset, cfg: PixlConfig) -> bool:
manufacturer = dataset.get("Manufacturer")
if manufacturer is None:
logger.debug("FILTERING out as manufacturer tag is missing")
return True

should_exclude = not cfg.is_manufacturer_allowed(manufacturer=manufacturer)
if should_exclude:
logger.debug("FILTERING out manufacturer: {}", manufacturer)
return should_exclude


def anonymise_dicom_and_update_db(
dataset: Dataset,
*,
Expand Down Expand Up @@ -125,9 +189,12 @@
)

# Do before anonymisation in case someone decides to delete the
# Series Description tag as part of anonymisation.
# Series Description or Manufacturer tags as part of anonymisation.
if _should_exclude_manufacturer(dataset, config):
msg = "DICOM instance discarded due to its manufacturer"
raise PixlSkipInstanceError(msg)

Check warning on line 195 in pixl_dcmd/src/pixl_dcmd/main.py

View check run for this annotation

Codecov / codecov/patch

pixl_dcmd/src/pixl_dcmd/main.py#L194-L195

Added lines #L194 - L195 were not covered by tests
if _should_exclude_series(dataset, config):
msg = "DICOM instance discarded due to its series description"
msg = "DICOM instance discarded due to its series description or number"

Check warning on line 197 in pixl_dcmd/src/pixl_dcmd/main.py

View check run for this annotation

Codecov / codecov/patch

pixl_dcmd/src/pixl_dcmd/main.py#L197

Added line #L197 was not covered by tests
raise PixlSkipInstanceError(msg)
if dataset.Modality not in config.project.modalities:
msg = f"Dropping DICOM Modality: {dataset.Modality}"
Expand Down
Loading
Loading