From 5341b2adf6049db47961f28838e6774d5320f579 Mon Sep 17 00:00:00 2001
From: jenniferjiangkells
Contribute to MiADE! Contribution guide
MiADE (Medical information AI Data Extractor) is a set of tools for extracting formattable data from clinical notes stored in electronic health record systems (EHRs). Built with CogStack's MedCAT package. To install MiADE, you need to download the spaCy base model and Med7 model first:
pip install https://huggingface.co/kormilitzin/en_core_med7_lg/resolve/main/en_core_med7_lg-any-py3-none-any.whl\npython -m spacy download en_core_web_md\n
Then, install MiADE:
pip install miade\n
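After running the install commands above, a quick check that the packages are importable can save some debugging. The following is a minimal sketch and not part of MiADE: the helper name `check_installed` is hypothetical, and the module names are assumed to match the package names used in the install commands.

```python
from importlib.util import find_spec
from typing import Dict, Iterable


def check_installed(modules: Iterable[str]) -> Dict[str, bool]:
    """Return a mapping of top-level module name -> whether it can be found."""
    return {name: find_spec(name) is not None for name in modules}


# Module names are assumptions based on the install commands above.
status = check_installed(["miade", "en_core_med7_lg", "en_core_web_md"])
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'MISSING'}")
```

`find_spec` only locates a module without importing it, so the check is cheap even for large models.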
"},{"location":"#license","title":"License","text":"MiADE is licensed under Elastic License 2.0.
The Elastic License 2.0 is a flexible license that allows you to use, copy, distribute, make available, and prepare derivative works of the software, as long as you do not provide the software to others as a hosted or managed service, circumvent any license key functionality, or remove or obscure licensing and copyright notices. For the full license text, see our license page.
"},{"location":"contributing/","title":"Contributing","text":"Contribute to MiADE!
"},{"location":"about/overview/","title":"Project Overview","text":""},{"location":"about/overview/#background","title":"Background","text":"Data about people\u2019s health stored in electronic health records (EHRs) can play an important role in improving the quality of patient care. Much of the information in EHRs is recorded in ordinary language without any restriction on format ('free text'), as this is the natural way in which people communicate. However, if this information were stored in a standardised, structured format, computers would also be able to process it to help clinicians find and interpret information for better and safer decision making. This would enable EHR systems such as Epic, the system in place at UCLH since April 2019, to support clinical decision making. For instance, the system may be able to ensure that a patient is not prescribed medicine that would give them an allergic reaction.
"},{"location":"about/overview/#the-challenge","title":"The challenge","text":"Free text may contain words and abbreviations which may be interpreted in more than one way, such as 'HR', which can mean 'Hour' or 'Heart Rate'. Free text may also contain negations; for example, a diagnosis may be mentioned in the text but the rest of the sentence might say that it was ruled out. Although computers can be used to interpret free text, they cannot always get it right, so clinicians will always have to check the results to ensure patient safety. Expressing information in a structured way can avoid this problem, but has a big disadvantage - it can be time-consuming for clinicians to enter the information. This can mean that information is incomplete, or clinicians are so busy on the computer that they do not have time to listen to their patients.
"},{"location":"about/overview/#meeting-the-need","title":"Meeting the need","text":"The aim of MiADE is to develop a system to support automatic conversion of the clinician\u2019s free text into a structured format. The clinician can check the structured data immediately, before making it a formal part of the patient\u2019s record. The system will record a patient\u2019s diagnoses, medications and allergies in a structured way, using NHS-endorsed clinical data standards (e.g. FHIR and SNOMED CT). It will use a technique called Natural Language Processing (NLP). NLP has been used by research teams to extract information from existing EHRs but has rarely been used to improve the way information is entered in the first place. Our NLP system will continuously learn and improve as more text is analysed and checked by clinicians.
We will first test the system in University College London Hospitals, where a new EHR system called Epic is in place. We will study how effective it is, and how clinicians and patients find it when it is used in consultations. Based on feedback, we will make improvements and install it for testing at a second site (Great Ormond Street Hospital). Our aim is for the system to be eventually rolled out to more hospitals and doctors\u2019 surgeries across the NHS.
"},{"location":"about/team/","title":"Team","text":"The MiADE project is developed by a team of clinicians, developers, AI researchers, and data standard experts at University College London (UCL) and University College London Hospitals (UCLH), in collaboration with the CogStack team at King's College London (KCL).
"},{"location":"api-reference/annotator/","title":"Annotator","text":" Bases: ABC
An abstract base class for annotators.
Annotators are responsible for processing medical notes and extracting relevant concepts from them.
Attributes:
cat (CAT): The MedCAT instance used for concept extraction.
config (AnnotatorConfig): The configuration for the annotator.
Source code in src/miade/annotators.py
class Annotator(ABC):\n \"\"\"\n An abstract base class for annotators.\n\n Annotators are responsible for processing medical notes and extracting relevant concepts from them.\n\n Attributes:\n cat (CAT): The MedCAT instance used for concept extraction.\n config (AnnotatorConfig): The configuration for the annotator.\n \"\"\"\n\n def __init__(self, cat: CAT, config: AnnotatorConfig = None):\n self.cat = cat\n self.config = config if config is not None else AnnotatorConfig()\n\n if self.config.negation_detection == \"negex\":\n self._add_negex_pipeline()\n\n # TODO make paragraph processing params configurable\n self.structured_prob_lists = {\n ParagraphType.prob: Relevance.PRESENT,\n ParagraphType.imp: Relevance.PRESENT,\n ParagraphType.pmh: Relevance.HISTORIC,\n }\n self.structured_med_lists = {\n ParagraphType.med: SubstanceCategory.TAKING,\n ParagraphType.allergy: SubstanceCategory.ADVERSE_REACTION,\n }\n self.irrelevant_paragraphs = [ParagraphType.ddx, ParagraphType.exam, ParagraphType.plan]\n\n def _add_negex_pipeline(self) -> None:\n \"\"\"\n Adds the negex pipeline to the MedCAT instance.\n \"\"\"\n self.cat.pipe.spacy_nlp.add_pipe(\"sentencizer\")\n self.cat.pipe.spacy_nlp.enable_pipe(\"sentencizer\")\n self.cat.pipe.spacy_nlp.add_pipe(\"negex\")\n\n @property\n @abstractmethod\n def concept_types(self):\n \"\"\"\n Abstract property that should return a list of concept types supported by the annotator.\n \"\"\"\n pass\n\n @property\n @abstractmethod\n def pipeline(self):\n \"\"\"\n Abstract property that should return a list of pipeline steps for the annotator.\n \"\"\"\n pass\n\n @abstractmethod\n def process_paragraphs(self):\n \"\"\"\n Abstract method that should implement the logic for processing paragraphs in a note.\n \"\"\"\n pass\n\n @abstractmethod\n def postprocess(self):\n \"\"\"\n Abstract method that should implement the logic for post-processing extracted concepts.\n \"\"\"\n pass\n\n def run_pipeline(self, note: Note, record_concepts: 
List[Concept]) -> List[Concept]:\n \"\"\"\n Runs the annotation pipeline on a given note and returns the extracted concepts.\n\n Args:\n note (Note): The input note to process.\n record_concepts (List[Concept]): The list of concepts from existing EHR records.\n\n Returns:\n The extracted concepts from the note.\n \"\"\"\n concepts: List[Concept] = []\n\n for pipe in self.pipeline:\n if pipe not in self.config.disable:\n if pipe == \"preprocessor\":\n note = self.preprocess(note)\n elif pipe == \"medcat\":\n concepts = self.get_concepts(note)\n elif pipe == \"paragrapher\":\n concepts = self.process_paragraphs(note, concepts)\n elif pipe == \"postprocessor\":\n concepts = self.postprocess(concepts)\n elif pipe == \"deduplicator\":\n concepts = self.deduplicate(concepts, record_concepts)\n\n return concepts\n\n def get_concepts(self, note: Note) -> List[Concept]:\n \"\"\"\n Extracts concepts from a note using the MedCAT instance.\n\n Args:\n note (Note): The input note to extract concepts from.\n\n Returns:\n The extracted concepts from the note.\n \"\"\"\n concepts: List[Concept] = []\n for entity in self.cat.get_entities(note)[\"entities\"].values():\n try:\n concepts.append(Concept.from_entity(entity))\n log.debug(f\"Detected concept ({concepts[-1].id} | {concepts[-1].name})\")\n except ValueError as e:\n log.warning(f\"Concept skipped: {e}\")\n\n return concepts\n\n @staticmethod\n def preprocess(note: Note) -> Note:\n \"\"\"\n Preprocesses a note by cleaning its text and splitting it into paragraphs.\n\n Args:\n note (Note): The input note to preprocess.\n\n Returns:\n The preprocessed note.\n \"\"\"\n note.clean_text()\n note.get_paragraphs()\n\n return note\n\n @staticmethod\n def deduplicate(concepts: List[Concept], record_concepts: Optional[List[Concept]]) -> List[Concept]:\n \"\"\"\n Removes duplicate concepts from the extracted concepts list by strict ID matching.\n\n Args:\n concepts (List[Concept]): The list of extracted concepts.\n record_concepts 
(Optional[List[Concept]]): The list of concepts from existing EHR records.\n\n Returns:\n The deduplicated list of concepts.\n \"\"\"\n if record_concepts is not None:\n record_ids = {record_concept.id for record_concept in record_concepts}\n record_names = {record_concept.name for record_concept in record_concepts}\n else:\n record_ids = set()\n record_names = set()\n\n # Use an OrderedDict to keep track of ids as it preserves original MedCAT order (the order it appears in text)\n filtered_concepts: List[Concept] = []\n existing_concepts = OrderedDict()\n\n # Filter concepts that are in record or exist in concept list\n for concept in concepts:\n if concept.id is not None and (concept.id in record_ids or concept.id in existing_concepts):\n log.debug(f\"Removed concept ({concept.id} | {concept.name}): concept id exists in record\")\n # check name match for null ids - VTM deduplication\n elif concept.id is None and (concept.name in record_names or concept.name in existing_concepts.values()):\n log.debug(f\"Removed concept ({concept.id} | {concept.name}): concept name exists in record\")\n else:\n filtered_concepts.append(concept)\n existing_concepts[concept.id] = concept.name\n\n return filtered_concepts\n\n @staticmethod\n def add_numbering_to_name(concepts: List[Concept]) -> List[Concept]:\n \"\"\"\n Adds numbering to the names of problem concepts to control output ordering.\n\n Args:\n concepts (List[Concept]): The list of concepts to add numbering to.\n\n Returns:\n The list of concepts with numbering added to their names.\n \"\"\"\n # Prepend numbering to problem concepts e.g. 
00 asthma, 01 stroke...\n for i, concept in enumerate(concepts):\n concept.name = f\"{i:02} {concept.name}\"\n\n return concepts\n\n def __call__(\n self,\n note: Note,\n record_concepts: Optional[List[Concept]] = None,\n ) -> List[Concept]:\n \"\"\"\n Runs the annotation pipeline on a given note and returns the extracted concepts.\n\n Args:\n note (Note): The input note to process.\n record_concepts (Optional[List[Concept]]): The list of concepts from existing EHR records.\n\n Returns:\n The extracted concepts from the note.\n \"\"\"\n concepts = self.run_pipeline(note, record_concepts)\n\n if self.config.add_numbering:\n concepts = self.add_numbering_to_name(concepts)\n\n return concepts\n
"},{"location":"api-reference/annotator/#miade.annotators.Annotator.concept_types","title":"concept_types
abstractmethod
property
","text":"Abstract property that should return a list of concept types supported by the annotator.
"},{"location":"api-reference/annotator/#miade.annotators.Annotator.pipeline","title":"pipeline
abstractmethod
property
","text":"Abstract property that should return a list of pipeline steps for the annotator.
"},{"location":"api-reference/annotator/#miade.annotators.Annotator.__call__","title":"__call__(note, record_concepts=None)
","text":"Runs the annotation pipeline on a given note and returns the extracted concepts.
Parameters:
note (Note): The input note to process. Required.
record_concepts (Optional[List[Concept]]): The list of concepts from existing EHR records. Default: None.
Returns:
List[Concept]: The extracted concepts from the note.
Source code in src/miade/annotators.py
def __call__(\n self,\n note: Note,\n record_concepts: Optional[List[Concept]] = None,\n) -> List[Concept]:\n \"\"\"\n Runs the annotation pipeline on a given note and returns the extracted concepts.\n\n Args:\n note (Note): The input note to process.\n record_concepts (Optional[List[Concept]]): The list of concepts from existing EHR records.\n\n Returns:\n The extracted concepts from the note.\n \"\"\"\n concepts = self.run_pipeline(note, record_concepts)\n\n if self.config.add_numbering:\n concepts = self.add_numbering_to_name(concepts)\n\n return concepts\n
"},{"location":"api-reference/annotator/#miade.annotators.Annotator.add_numbering_to_name","title":"add_numbering_to_name(concepts)
staticmethod
","text":"Adds numbering to the names of problem concepts to control output ordering.
Parameters:
concepts (List[Concept]): The list of concepts to add numbering to. Required.
Returns:
List[Concept]: The list of concepts with numbering added to their names.
Source code in src/miade/annotators.py
@staticmethod\ndef add_numbering_to_name(concepts: List[Concept]) -> List[Concept]:\n \"\"\"\n Adds numbering to the names of problem concepts to control output ordering.\n\n Args:\n concepts (List[Concept]): The list of concepts to add numbering to.\n\n Returns:\n The list of concepts with numbering added to their names.\n \"\"\"\n # Prepend numbering to problem concepts e.g. 00 asthma, 01 stroke...\n for i, concept in enumerate(concepts):\n concept.name = f\"{i:02} {concept.name}\"\n\n return concepts\n
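The effect of the `f"{i:02}"` prefix can be seen in a small standalone sketch. The `Concept` dataclass below is a hypothetical stand-in with only the fields needed here (not MiADE's class), and the IDs are arbitrary placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Concept:
    """Hypothetical stand-in for miade's Concept; only id and name are modelled."""
    id: str
    name: str


def add_numbering_to_name(concepts: List[Concept]) -> List[Concept]:
    # Zero-padded two-digit index, so names sort lexically in extraction order.
    for i, concept in enumerate(concepts):
        concept.name = f"{i:02} {concept.name}"
    return concepts


# IDs are arbitrary placeholders, not real codes.
numbered = add_numbering_to_name([Concept("1", "asthma"), Concept("2", "stroke")])
print([c.name for c in numbered])
```

Because the padding is two digits wide, lexical ordering only matches numeric ordering for up to 100 concepts; with more, a prefix such as `100` would sort before `11`.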
"},{"location":"api-reference/annotator/#miade.annotators.Annotator.deduplicate","title":"deduplicate(concepts, record_concepts)
staticmethod
","text":"Removes duplicate concepts from the extracted concepts list by strict ID matching.
Parameters:
concepts (List[Concept]): The list of extracted concepts. Required.
record_concepts (Optional[List[Concept]]): The list of concepts from existing EHR records. Required.
Returns:
List[Concept]: The deduplicated list of concepts.
Source code in src/miade/annotators.py
@staticmethod\ndef deduplicate(concepts: List[Concept], record_concepts: Optional[List[Concept]]) -> List[Concept]:\n \"\"\"\n Removes duplicate concepts from the extracted concepts list by strict ID matching.\n\n Args:\n concepts (List[Concept]): The list of extracted concepts.\n record_concepts (Optional[List[Concept]]): The list of concepts from existing EHR records.\n\n Returns:\n The deduplicated list of concepts.\n \"\"\"\n if record_concepts is not None:\n record_ids = {record_concept.id for record_concept in record_concepts}\n record_names = {record_concept.name for record_concept in record_concepts}\n else:\n record_ids = set()\n record_names = set()\n\n # Use an OrderedDict to keep track of ids as it preserves original MedCAT order (the order it appears in text)\n filtered_concepts: List[Concept] = []\n existing_concepts = OrderedDict()\n\n # Filter concepts that are in record or exist in concept list\n for concept in concepts:\n if concept.id is not None and (concept.id in record_ids or concept.id in existing_concepts):\n log.debug(f\"Removed concept ({concept.id} | {concept.name}): concept id exists in record\")\n # check name match for null ids - VTM deduplication\n elif concept.id is None and (concept.name in record_names or concept.name in existing_concepts.values()):\n log.debug(f\"Removed concept ({concept.id} | {concept.name}): concept name exists in record\")\n else:\n filtered_concepts.append(concept)\n existing_concepts[concept.id] = concept.name\n\n return filtered_concepts\n
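The matching rules in `deduplicate` can be exercised without MedCAT. The sketch below mirrors the logic shown above using a hypothetical stand-in `Concept` dataclass and a plain `dict` (which preserves insertion order) in place of `OrderedDict`; the IDs and names are made-up sample values.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Concept:
    """Hypothetical stand-in for miade's Concept; only id and name are modelled."""
    id: Optional[str]
    name: str


def deduplicate(concepts: List[Concept], record_concepts: Optional[List[Concept]] = None) -> List[Concept]:
    record_concepts = record_concepts or []
    record_ids = {c.id for c in record_concepts}
    record_names = {c.name for c in record_concepts}

    kept: List[Concept] = []
    seen: Dict[Optional[str], str] = {}  # id -> name, in order of first appearance

    for concept in concepts:
        # Concepts with an ID are matched on ID only.
        if concept.id is not None and (concept.id in record_ids or concept.id in seen):
            continue
        # Concepts without an ID fall back to name matching.
        if concept.id is None and (concept.name in record_names or concept.name in seen.values()):
            continue
        kept.append(concept)
        seen[concept.id] = concept.name
    return kept


record = [Concept("1", "asthma")]
extracted = [
    Concept("1", "asthma"),          # already in the record -> removed
    Concept("2", "stroke"),          # new -> kept
    Concept("2", "CVA"),             # same ID as a kept concept -> removed
    Concept(None, "paracetamol"),    # no ID, new name -> kept
    Concept(None, "paracetamol"),    # no ID, same name -> removed
]
print([c.name for c in deduplicate(extracted, record)])
```

One property worth noting in this logic: all concepts without an ID share the single key `None`, so only the most recently kept null-ID name is remembered when checking later null-ID concepts against the extracted list.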
"},{"location":"api-reference/annotator/#miade.annotators.Annotator.get_concepts","title":"get_concepts(note)
","text":"Extracts concepts from a note using the MedCAT instance.
Parameters:
note (Note): The input note to extract concepts from. Required.
Returns:
List[Concept]: The extracted concepts from the note.
Source code in src/miade/annotators.py
def get_concepts(self, note: Note) -> List[Concept]:\n \"\"\"\n Extracts concepts from a note using the MedCAT instance.\n\n Args:\n note (Note): The input note to extract concepts from.\n\n Returns:\n The extracted concepts from the note.\n \"\"\"\n concepts: List[Concept] = []\n for entity in self.cat.get_entities(note)[\"entities\"].values():\n try:\n concepts.append(Concept.from_entity(entity))\n log.debug(f\"Detected concept ({concepts[-1].id} | {concepts[-1].name})\")\n except ValueError as e:\n log.warning(f\"Concept skipped: {e}\")\n\n return concepts\n
"},{"location":"api-reference/annotator/#miade.annotators.Annotator.postprocess","title":"postprocess()
abstractmethod
","text":"Abstract method that should implement the logic for post-processing extracted concepts.
Source code in src/miade/annotators.py
@abstractmethod\ndef postprocess(self):\n \"\"\"\n Abstract method that should implement the logic for post-processing extracted concepts.\n \"\"\"\n pass\n
"},{"location":"api-reference/annotator/#miade.annotators.Annotator.preprocess","title":"preprocess(note)
staticmethod
","text":"Preprocesses a note by cleaning its text and splitting it into paragraphs.
Parameters:
note (Note): The input note to preprocess. Required.
Returns:
Note: The preprocessed note.
Source code in src/miade/annotators.py
@staticmethod\ndef preprocess(note: Note) -> Note:\n \"\"\"\n Preprocesses a note by cleaning its text and splitting it into paragraphs.\n\n Args:\n note (Note): The input note to preprocess.\n\n Returns:\n The preprocessed note.\n \"\"\"\n note.clean_text()\n note.get_paragraphs()\n\n return note\n
"},{"location":"api-reference/annotator/#miade.annotators.Annotator.process_paragraphs","title":"process_paragraphs()
abstractmethod
","text":"Abstract method that should implement the logic for processing paragraphs in a note.
Source code in src/miade/annotators.py
@abstractmethod\ndef process_paragraphs(self):\n \"\"\"\n Abstract method that should implement the logic for processing paragraphs in a note.\n \"\"\"\n pass\n
"},{"location":"api-reference/annotator/#miade.annotators.Annotator.run_pipeline","title":"run_pipeline(note, record_concepts)
","text":"Runs the annotation pipeline on a given note and returns the extracted concepts.
Parameters:
note (Note): The input note to process. Required.
record_concepts (List[Concept]): The list of concepts from existing EHR records. Required.
Returns:
List[Concept]: The extracted concepts from the note.
Source code in src/miade/annotators.py
def run_pipeline(self, note: Note, record_concepts: List[Concept]) -> List[Concept]:\n \"\"\"\n Runs the annotation pipeline on a given note and returns the extracted concepts.\n\n Args:\n note (Note): The input note to process.\n record_concepts (List[Concept]): The list of concepts from existing EHR records.\n\n Returns:\n The extracted concepts from the note.\n \"\"\"\n concepts: List[Concept] = []\n\n for pipe in self.pipeline:\n if pipe not in self.config.disable:\n if pipe == \"preprocessor\":\n note = self.preprocess(note)\n elif pipe == \"medcat\":\n concepts = self.get_concepts(note)\n elif pipe == \"paragrapher\":\n concepts = self.process_paragraphs(note, concepts)\n elif pipe == \"postprocessor\":\n concepts = self.postprocess(concepts)\n elif pipe == \"deduplicator\":\n concepts = self.deduplicate(concepts, record_concepts)\n\n return concepts\n
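The dispatch pattern in `run_pipeline` (ordered step names, a disable list, and silently ignored unknown names) can be illustrated in isolation. This is a simplified sketch under stated assumptions: every step here transforms one list of strings, whereas the real method also threads a `Note` through the preprocessing and paragraph steps. The names `run_steps` and `Step` are hypothetical.

```python
from typing import Callable, Dict, List

Step = Callable[[List[str]], List[str]]


def run_steps(pipeline: List[str], disabled: List[str], steps: Dict[str, Step]) -> List[str]:
    """Apply each named step in order, skipping disabled or unknown names."""
    concepts: List[str] = []
    for name in pipeline:
        if name in disabled:
            continue
        step = steps.get(name)
        if step is None:
            continue  # unknown names are ignored, like an if/elif chain with no else
        concepts = step(concepts)
    return concepts


steps: Dict[str, Step] = {
    "medcat": lambda _: ["asthma", "asthma", "stroke"],   # toy extractor
    "postprocessor": lambda cs: [c.upper() for c in cs],  # toy post-processing
    "deduplicator": lambda cs: list(dict.fromkeys(cs)),   # order-preserving dedup
}
pipeline = ["preprocessor", "medcat", "postprocessor", "deduplicator"]

full = run_steps(pipeline, disabled=[], steps=steps)
no_dedup = run_steps(pipeline, disabled=["deduplicator"], steps=steps)
print(full)
print(no_dedup)
```

Disabling a step simply leaves the previous step's output untouched, which is why the order of names in the pipeline matters.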
"},{"location":"api-reference/concept/","title":"Concept","text":" Bases: object
Represents a concept in the system.
Attributes:
id (str): The unique identifier of the concept.
name (str): The name of the concept.
category (Optional[Enum]): The category of the concept (optional).
start (Optional[int]): The start position of the concept (optional).
end (Optional[int]): The end position of the concept (optional).
dosage (Optional[Dosage]): The dosage of the concept (optional).
linked_concepts (Optional[List[Concept]]): The linked concepts of the concept (optional).
negex (Optional[bool]): The negex value of the concept (optional).
meta_anns (Optional[List[MetaAnnotations]]): The meta annotations of the concept (optional).
debug_dict (Optional[Dict]): The debug dictionary of the concept (optional).
Source code in src/miade/concept.py
class Concept(object):\n \"\"\"Represents a concept in the system.\n\n Attributes:\n id (str): The unique identifier of the concept.\n name (str): The name of the concept.\n category (Optional[Enum]): The category of the concept (optional).\n start (Optional[int]): The start position of the concept (optional).\n end (Optional[int]): The end position of the concept (optional).\n dosage (Optional[Dosage]): The dosage of the concept (optional).\n linked_concepts (Optional[List[Concept]]): The linked concepts of the concept (optional).\n negex (Optional[bool]): The negex value of the concept (optional).\n meta_anns (Optional[List[MetaAnnotations]]): The meta annotations of the concept (optional).\n debug_dict (Optional[Dict]): The debug dictionary of the concept (optional).\n \"\"\"\n\n def __init__(\n self,\n id: str,\n name: str,\n category: Optional[Enum] = None,\n start: Optional[int] = None,\n end: Optional[int] = None,\n dosage: Optional[Dosage] = None,\n linked_concepts: Optional[List[Concept]] = None,\n negex: Optional[bool] = None,\n meta_anns: Optional[List[MetaAnnotations]] = None,\n debug_dict: Optional[Dict] = None,\n ):\n self.name = name\n self.id = id\n self.category = category\n self.start = start\n self.end = end\n self.dosage = dosage\n self.linked_concepts = linked_concepts\n self.negex = negex\n self.meta = meta_anns\n self.debug = debug_dict\n\n if linked_concepts is None:\n self.linked_concepts = []\n\n @classmethod\n def from_entity(cls, entity: Dict) -> Concept:\n \"\"\"\n Converts an entity dictionary into a Concept object.\n\n Args:\n entity (Dict): The entity dictionary containing the necessary information.\n\n Returns:\n The Concept object created from the entity dictionary.\n \"\"\"\n meta_anns = None\n if entity[\"meta_anns\"]:\n meta_anns = [MetaAnnotations(**value) for value in entity[\"meta_anns\"].values()]\n\n return Concept(\n id=entity[\"cui\"],\n name=entity[\n \"source_value\"\n ], # can also use detected_name which is spell 
checked but delimited by ~ e.g. liver~failure\n category=None,\n start=entity[\"start\"],\n end=entity[\"end\"],\n negex=entity[\"negex\"] if \"negex\" in entity else None,\n meta_anns=meta_anns,\n )\n\n def __str__(self):\n return (\n f\"{{name: {self.name}, id: {self.id}, category: {self.category}, start: {self.start}, end: {self.end},\"\n f\" dosage: {self.dosage}, linked_concepts: {self.linked_concepts}, negex: {self.negex}, meta: {self.meta}}} \"\n )\n\n def __hash__(self):\n return hash((self.id, self.name, self.category))\n\n def __eq__(self, other):\n return self.id == other.id and self.name == other.name and self.category == other.category\n\n def __lt__(self, other):\n return int(self.id) < int(other.id)\n\n def __gt__(self, other):\n return int(self.id) > int(other.id)\n
"},{"location":"api-reference/concept/#miade.concept.Concept.from_entity","title":"from_entity(entity)
classmethod
","text":"Converts an entity dictionary into a Concept object.
Parameters:
entity (Dict): The entity dictionary containing the necessary information. Required.
Returns:
Concept: The Concept object created from the entity dictionary.
Source code in src/miade/concept.py
@classmethod\ndef from_entity(cls, entity: Dict) -> Concept:\n \"\"\"\n Converts an entity dictionary into a Concept object.\n\n Args:\n entity (Dict): The entity dictionary containing the necessary information.\n\n Returns:\n The Concept object created from the entity dictionary.\n \"\"\"\n meta_anns = None\n if entity[\"meta_anns\"]:\n meta_anns = [MetaAnnotations(**value) for value in entity[\"meta_anns\"].values()]\n\n return Concept(\n id=entity[\"cui\"],\n name=entity[\n \"source_value\"\n ], # can also use detected_name which is spell checked but delimited by ~ e.g. liver~failure\n category=None,\n start=entity[\"start\"],\n end=entity[\"end\"],\n negex=entity[\"negex\"] if \"negex\" in entity else None,\n meta_anns=meta_anns,\n )\n
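To make the key mapping in `from_entity` concrete, the sketch below extracts the same fields from a sample entity dictionary. The helper `entity_to_fields` is hypothetical, the sample values are made up, and the dictionary contains only the keys that `from_entity` reads; meta annotations are left out because `MetaAnnotations` is not shown here.

```python
from typing import Any, Dict


def entity_to_fields(entity: Dict[str, Any]) -> Dict[str, Any]:
    """Pick out the fields that from_entity passes to the Concept constructor."""
    return {
        "id": entity["cui"],
        "name": entity["source_value"],  # raw text span, not the ~-delimited detected_name
        "start": entity["start"],
        "end": entity["end"],
        "negex": entity.get("negex"),    # None when the negex pipe is not enabled
    }


# Made-up sample values for illustration only.
sample_entity = {
    "cui": "123",
    "source_value": "liver failure",
    "detected_name": "liver~failure",
    "start": 10,
    "end": 23,
    "meta_anns": {},
}
fields = entity_to_fields(sample_entity)
print(fields)
```

Here `entity.get("negex")` is equivalent to the conditional expression in the source: the key is only present when negation detection has been added to the pipeline.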
"},{"location":"api-reference/dosage/","title":"Dosage","text":" Bases: object
Container for drug dosage information
Source code in src/miade/dosage.py
class Dosage(object):\n \"\"\"\n Container for drug dosage information\n \"\"\"\n\n def __init__(\n self,\n dose: Optional[Dose],\n duration: Optional[Duration],\n frequency: Optional[Frequency],\n route: Optional[Route],\n text: Optional[str] = None,\n ):\n self.text = text\n self.dose = dose\n self.duration = duration\n self.frequency = frequency\n self.route = route\n\n @classmethod\n def from_doc(cls, doc: Doc, calculate: bool = True):\n \"\"\"\n Parses dosage from a spacy doc object.\n\n Args:\n doc (Doc): Spacy doc object with processed dosage text.\n calculate (bool, optional): Whether to calculate duration if total and daily dose is given. Defaults to True.\n\n Returns:\n An instance of the class with the parsed dosage information.\n\n \"\"\"\n quantities = []\n units = []\n dose_start = 1000\n dose_end = 0\n daily_dose = None\n total_dose = None\n route_text = None\n duration_text = None\n\n for ent in doc.ents:\n if ent.label_ == \"DOSAGE\":\n if ent._.total_dose:\n total_dose = float(ent.text)\n else:\n quantities.append(ent.text)\n # get span of full dosage string - not strictly needed but nice to have\n if ent.start < dose_start:\n dose_start = ent.start\n if ent.end > dose_end:\n dose_end = ent.end\n elif ent.label_ == \"FORM\":\n if ent._.total_dose:\n # de facto unit is in total dose\n units = [ent.text]\n else:\n units.append(ent.text)\n if ent.start < dose_start:\n dose_start = ent.start\n if ent.end > dose_end:\n dose_end = ent.end\n elif ent.label_ == \"DURATION\":\n duration_text = ent.text\n elif ent.label_ == \"ROUTE\":\n route_text = ent.text\n\n dose = parse_dose(\n text=\" \".join(doc.text.split()[dose_start:dose_end]),\n quantities=quantities,\n units=units,\n results=doc._.results,\n )\n\n frequency = parse_frequency(text=doc.text, results=doc._.results)\n\n route = parse_route(text=route_text, dose=dose)\n\n # technically not information recorded so will keep as an option\n if calculate:\n # if duration not given in text could extract 
this from total dose if given\n if total_dose is not None and dose is not None and doc._.results[\"freq\"]:\n if dose.value is not None:\n daily_dose = float(dose.value) * (round(doc._.results[\"freq\"] / doc._.results[\"time\"]))\n elif dose.high is not None:\n daily_dose = float(dose.high) * (round(doc._.results[\"freq\"] / doc._.results[\"time\"]))\n\n duration = parse_duration(\n text=duration_text,\n results=doc._.results,\n total_dose=total_dose,\n daily_dose=daily_dose,\n )\n\n return cls(\n text=doc._.original_text,\n dose=dose,\n duration=duration,\n frequency=frequency,\n route=route,\n )\n\n def __str__(self):\n return f\"{self.__dict__}\"\n\n def __eq__(self, other):\n return self.__dict__ == other.__dict__\n
"},{"location":"api-reference/dosage/#miade.dosage.Dosage.from_doc","title":"from_doc(doc, calculate=True)
classmethod
","text":"Parses dosage from a spacy doc object.
Parameters:
doc (Doc): spaCy doc object with processed dosage text. Required.
calculate (bool): Whether to calculate duration if total and daily dose are given. Default: True.
Returns:
An instance of the class with the parsed dosage information.
Source code in src/miade/dosage.py
@classmethod\ndef from_doc(cls, doc: Doc, calculate: bool = True):\n \"\"\"\n Parses dosage from a spacy doc object.\n\n Args:\n doc (Doc): Spacy doc object with processed dosage text.\n calculate (bool, optional): Whether to calculate duration if total and daily dose is given. Defaults to True.\n\n Returns:\n An instance of the class with the parsed dosage information.\n\n \"\"\"\n quantities = []\n units = []\n dose_start = 1000\n dose_end = 0\n daily_dose = None\n total_dose = None\n route_text = None\n duration_text = None\n\n for ent in doc.ents:\n if ent.label_ == \"DOSAGE\":\n if ent._.total_dose:\n total_dose = float(ent.text)\n else:\n quantities.append(ent.text)\n # get span of full dosage string - not strictly needed but nice to have\n if ent.start < dose_start:\n dose_start = ent.start\n if ent.end > dose_end:\n dose_end = ent.end\n elif ent.label_ == \"FORM\":\n if ent._.total_dose:\n # de facto unit is in total dose\n units = [ent.text]\n else:\n units.append(ent.text)\n if ent.start < dose_start:\n dose_start = ent.start\n if ent.end > dose_end:\n dose_end = ent.end\n elif ent.label_ == \"DURATION\":\n duration_text = ent.text\n elif ent.label_ == \"ROUTE\":\n route_text = ent.text\n\n dose = parse_dose(\n text=\" \".join(doc.text.split()[dose_start:dose_end]),\n quantities=quantities,\n units=units,\n results=doc._.results,\n )\n\n frequency = parse_frequency(text=doc.text, results=doc._.results)\n\n route = parse_route(text=route_text, dose=dose)\n\n # technically not information recorded so will keep as an option\n if calculate:\n # if duration not given in text could extract this from total dose if given\n if total_dose is not None and dose is not None and doc._.results[\"freq\"]:\n if dose.value is not None:\n daily_dose = float(dose.value) * (round(doc._.results[\"freq\"] / doc._.results[\"time\"]))\n elif dose.high is not None:\n daily_dose = float(dose.high) * (round(doc._.results[\"freq\"] / doc._.results[\"time\"]))\n\n duration = 
parse_duration(\n text=duration_text,\n results=doc._.results,\n total_dose=total_dose,\n daily_dose=daily_dose,\n )\n\n return cls(\n text=doc._.original_text,\n dose=dose,\n duration=duration,\n frequency=frequency,\n route=route,\n )\n
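The daily-dose arithmetic in `from_doc` is `dose * round(freq / time)`. The sketch below reproduces that step and then, as a loudly labelled assumption, divides the total dose by the daily dose to estimate a duration. `parse_duration` is not shown in this document, so the second function is a guess at the intent rather than MiADE's implementation, and the unit of `time` is assumed to be days.

```python
from typing import Optional


def daily_dose(dose_value: float, freq: float, time: float) -> float:
    """Dose per administration times rounded administrations per unit time, as in from_doc."""
    return float(dose_value) * round(freq / time)


def estimated_duration(total_dose: float, per_day: float) -> Optional[float]:
    # ASSUMPTION: duration = total dose / daily dose. parse_duration is not shown
    # in this document, so this is illustrative only.
    if per_day == 0:
        return None
    return total_dose / per_day


# Example: 2 tablets, twice per day, 28 tablets in total.
per_day = daily_dose(dose_value=2, freq=2, time=1)
print(per_day, estimated_duration(28, per_day))
```

One detail to be aware of: Python's built-in `round` rounds halves to the nearest even integer, so a frequency of once every two time units (`freq=1`, `time=2`) rounds to 0 and yields a daily dose of 0.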
"},{"location":"api-reference/dosageextractor/","title":"DosageExtractor","text":"Parses and extracts drug dosage
Attributes:
model (str): The name of the model to be used for dosage extraction.
dosage_extractor (Language): The spaCy pipeline for dosage extraction.
Source code in src/miade/dosageextractor.py
class DosageExtractor:\n \"\"\"\n Parses and extracts drug dosage\n\n Attributes:\n model (str): The name of the model to be used for dosage extraction.\n dosage_extractor (Language): The Spacy pipeline for dosage extraction.\n \"\"\"\n\n def __init__(self, model: str = \"en_core_med7_lg\"):\n self.model = model\n self.dosage_extractor = self._create_drugdoseade_pipeline()\n\n def _create_drugdoseade_pipeline(self) -> Language:\n \"\"\"\n Creates a spacy pipeline with given model (default med7)\n and customised pipeline components for dosage extraction\n\n Returns:\n nlp (spacy.Language): The Spacy pipeline for dosage extraction.\n \"\"\"\n nlp = spacy.load(self.model)\n nlp.add_pipe(\"preprocessor\", first=True)\n nlp.add_pipe(\"pattern_matcher\", before=\"ner\")\n nlp.add_pipe(\"entities_refiner\", after=\"ner\")\n\n log.info(f\"Loaded drug dosage extractor with model {self.model}\")\n\n return nlp\n\n def extract(self, text: str, calculate: bool = True) -> Optional[Dosage]:\n \"\"\"\n Processes a string that contains dosage instructions (excluding drug concept as this is handled by core)\n\n Args:\n text (str): The string containing dosage instructions.\n calculate (bool): Whether to calculate duration from total and daily dose, if given.\n\n Returns:\n The dosage object with parsed dosages in CDA format.\n \"\"\"\n doc = self.dosage_extractor(text)\n\n log.debug(f\"NER results: {[(e.text, e.label_, e._.total_dose) for e in doc.ents]}\")\n log.debug(f\"Lookup results: {doc._.results}\")\n\n dosage = Dosage.from_doc(doc=doc, calculate=calculate)\n\n if all(v is None for v in [dosage.dose, dosage.frequency, dosage.route, dosage.duration]):\n return None\n\n return dosage\n\n def __call__(self, text: str, calculate: bool = True):\n return self.extract(text, calculate)\n
"},{"location":"api-reference/dosageextractor/#miade.dosageextractor.DosageExtractor.extract","title":"extract(text, calculate=True)
","text":"Processes a string that contains dosage instructions (excluding drug concept as this is handled by core)
Parameters:
Name Type Description Default
text
str
The string containing dosage instructions.
required
calculate
bool
Whether to calculate duration from total and daily dose, if given.
True
Returns:
Type Description
Optional[Dosage]
The dosage object with parsed dosages in CDA format, or None if no dose, frequency, route or duration could be parsed.
Source code in src/miade/dosageextractor.py
def extract(self, text: str, calculate: bool = True) -> Optional[Dosage]:\n \"\"\"\n Processes a string that contains dosage instructions (excluding drug concept as this is handled by core)\n\n Args:\n text (str): The string containing dosage instructions.\n calculate (bool): Whether to calculate duration from total and daily dose, if given.\n\n Returns:\n The dosage object with parsed dosages in CDA format.\n \"\"\"\n doc = self.dosage_extractor(text)\n\n log.debug(f\"NER results: {[(e.text, e.label_, e._.total_dose) for e in doc.ents]}\")\n log.debug(f\"Lookup results: {doc._.results}\")\n\n dosage = Dosage.from_doc(doc=doc, calculate=calculate)\n\n if all(v is None for v in [dosage.dose, dosage.frequency, dosage.route, dosage.duration]):\n return None\n\n return dosage\n
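The `None` return is easy to miss: `extract` only hands back a `Dosage` when at least one of dose, frequency, route or duration was parsed. A minimal sketch of that rule follows. It assumes a hypothetical stand-in dataclass `ParsedDosage` (plain strings instead of the richer fields on MiADE's real `Dosage`) and a hypothetical helper name `keep_if_informative`; neither exists in the library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedDosage:
    # Hypothetical stand-in for miade's Dosage; plain strings replace the real field types.
    dose: Optional[str] = None
    frequency: Optional[str] = None
    route: Optional[str] = None
    duration: Optional[str] = None


def keep_if_informative(dosage: ParsedDosage) -> Optional[ParsedDosage]:
    # Mirrors the check in extract(): drop the result when every component is None.
    fields = [dosage.dose, dosage.frequency, dosage.route, dosage.duration]
    if all(value is None for value in fields):
        return None
    return dosage
```

Callers therefore need to check for `None` before reading attributes such as `dosage.dose`.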
"},{"location":"api-reference/medsallergiesannotator/","title":"MedsAllergiesAnnotator","text":" Bases: Annotator
Annotator class for medication and allergy concepts.
This class extends the Annotator base class and provides methods for running a pipeline of annotation tasks on a given note, as well as validating and converting concepts related to medications and allergies.
Attributes:
Name Type Description
valid_meds
List[int]
A list of valid medication IDs.
reactions_subset_lookup
Dict[int, str]
A dictionary mapping reaction IDs to their corresponding subset IDs.
allergens_subset_lookup
Dict[int, str]
A dictionary mapping allergen IDs to their corresponding subset IDs.
allergy_type_lookup
Dict[str, List[str]]
A dictionary mapping allergen types to their corresponding codes.
vtm_to_vmp_lookup
Dict[str, str]
A dictionary mapping VTM (Virtual Therapeutic Moiety) IDs to VMP (Virtual Medicinal Product) IDs.
vtm_to_text_lookup
Dict[str, str]
A dictionary mapping VTM IDs to their corresponding text.
Source code in src/miade/annotators.py
class MedsAllergiesAnnotator(Annotator):\n \"\"\"\n Annotator class for medication and allergy concepts.\n\n This class extends the `Annotator` base class and provides methods for running a pipeline of\n annotation tasks on a given note, as well as validating and converting concepts related to\n medications and allergies.\n\n Attributes:\n valid_meds (List[int]): A list of valid medication IDs.\n reactions_subset_lookup (Dict[int, str]): A dictionary mapping reaction IDs to their corresponding subset IDs.\n allergens_subset_lookup (Dict[int, str]): A dictionary mapping allergen IDs to their corresponding subset IDs.\n allergy_type_lookup (Dict[str, List[str]]): A dictionary mapping allergen types to their corresponding codes.\n vtm_to_vmp_lookup (Dict[str, str]): A dictionary mapping VTM (Virtual Therapeutic Moiety) IDs to VMP (Virtual Medicinal Product) IDs.\n vtm_to_text_lookup (Dict[str, str]): A dictionary mapping VTM IDs to their corresponding text.\n \"\"\"\n\n def __init__(self, cat: CAT, config: AnnotatorConfig = None):\n super().__init__(cat, config)\n self._load_med_allergy_lookup_data()\n\n @property\n def concept_types(self) -> List[Category]:\n \"\"\"\n Returns a list of concept types.\n\n Returns:\n [Category.MEDICATION, Category.ALLERGY, Category.REACTION]\n \"\"\"\n return [Category.MEDICATION, Category.ALLERGY, Category.REACTION]\n\n @property\n def pipeline(self) -> List[str]:\n \"\"\"\n Returns a list of annotators in the pipeline.\n\n The annotators are executed in the order they appear in the list.\n\n Returns:\n [\"preprocessor\", \"medcat\", \"paragrapher\", \"postprocessor\", \"dosage_extractor\", \"vtm_converter\", \"deduplicator\"]\n \"\"\"\n return [\n \"preprocessor\",\n \"medcat\",\n \"paragrapher\",\n \"postprocessor\",\n \"dosage_extractor\",\n \"vtm_converter\",\n \"deduplicator\",\n ]\n\n def run_pipeline(\n self, note: Note, record_concepts: List[Concept], dosage_extractor: Optional[DosageExtractor]\n ) -> List[Concept]:\n \"\"\"\n 
Runs the annotation pipeline on the given note.\n\n Args:\n note (Note): The input note to run the pipeline on.\n record_concepts (List[Concept]): The list of previously recorded concepts.\n dosage_extractor (Optional[DosageExtractor]): The dosage extractor function.\n\n Returns:\n The list of annotated concepts.\n \"\"\"\n concepts: List[Concept] = []\n\n for pipe in self.pipeline:\n if pipe not in self.config.disable:\n if pipe == \"preprocessor\":\n note = self.preprocess(note)\n elif pipe == \"medcat\":\n concepts = self.get_concepts(note)\n elif pipe == \"paragrapher\":\n concepts = self.process_paragraphs(note, concepts)\n elif pipe == \"postprocessor\":\n concepts = self.postprocess(concepts, note)\n elif pipe == \"deduplicator\":\n concepts = self.deduplicate(concepts, record_concepts)\n elif pipe == \"vtm_converter\":\n concepts = self.convert_VTM_to_VMP_or_text(concepts)\n elif pipe == \"dosage_extractor\" and dosage_extractor is not None:\n concepts = self.add_dosages_to_concepts(dosage_extractor, concepts, note)\n\n return concepts\n\n def _load_med_allergy_lookup_data(self) -> None:\n \"\"\"\n Loads the medication and allergy lookup data.\n \"\"\"\n if not os.path.isdir(self.config.lookup_data_path):\n raise RuntimeError(f\"No lookup data configured: {self.config.lookup_data_path} does not exist!\")\n else:\n self.valid_meds = load_lookup_data(self.config.lookup_data_path + \"valid_meds.csv\", no_header=True)\n self.reactions_subset_lookup = load_lookup_data(\n self.config.lookup_data_path + \"reactions_subset.csv\", as_dict=True\n )\n self.allergens_subset_lookup = load_lookup_data(\n self.config.lookup_data_path + \"allergens_subset.csv\", as_dict=True\n )\n self.allergy_type_lookup = load_allergy_type_combinations(self.config.lookup_data_path + \"allergy_type.csv\")\n self.vtm_to_vmp_lookup = load_lookup_data(self.config.lookup_data_path + \"vtm_to_vmp.csv\")\n self.vtm_to_text_lookup = load_lookup_data(self.config.lookup_data_path + 
\"vtm_to_text.csv\", as_dict=True)\n\n def _validate_meds(self, concept) -> bool:\n \"\"\"\n Validates if the concept is a valid medication.\n\n Args:\n concept: The concept to validate.\n\n Returns:\n True if the concept is a valid medication, False otherwise.\n \"\"\"\n # check if substance is valid med\n if int(concept.id) in self.valid_meds.values:\n return True\n return False\n\n def _validate_and_convert_substance(self, concept) -> bool:\n \"\"\"\n Validates and converts a substance concept for allergy.\n\n Args:\n concept: The substance concept to be validated and converted.\n\n Returns:\n True if the substance is valid and converted successfully, False otherwise.\n \"\"\"\n # check if substance is valid substance for allergy - if it is, convert it to Epic subset and return that concept\n lookup_result = self.allergens_subset_lookup.get(int(concept.id))\n if lookup_result is not None:\n log.debug(\n f\"Converted concept ({concept.id} | {concept.name}) to \"\n f\"({lookup_result['subsetId']} | {concept.name}): valid Epic allergen subset\"\n )\n concept.id = str(lookup_result[\"subsetId\"])\n\n # then check the allergen type from lookup result - e.g. 
drug, food\n try:\n concept.category = AllergenType(str(lookup_result[\"allergenType\"]).lower())\n log.debug(\n f\"Assigned substance concept ({concept.id} | {concept.name}) \"\n f\"to allergen type category {concept.category}\"\n )\n except ValueError as e:\n log.warning(f\"Allergen type not found for {concept.__str__()}: {e}\")\n\n return True\n else:\n log.warning(f\"No lookup subset found for substance ({concept.id} | {concept.name})\")\n return False\n\n def _validate_and_convert_reaction(self, concept) -> bool:\n \"\"\"\n Validates and converts a reaction concept to the Epic subset.\n\n Args:\n concept: The concept to be validated and converted.\n\n Returns:\n True if the concept is a valid reaction and successfully converted to the Epic subset,\n False otherwise.\n \"\"\"\n # check if substance is valid reaction - if it is, convert it to Epic subset and return that concept\n lookup_result = self.reactions_subset_lookup.get(int(concept.id), None)\n if lookup_result is not None:\n log.debug(\n f\"Converted concept ({concept.id} | {concept.name}) to \"\n f\"({lookup_result} | {concept.name}): valid Epic reaction subset\"\n )\n concept.id = str(lookup_result)\n return True\n else:\n log.warning(f\"Reaction not found in Epic subset conversion for concept {concept.__str__()}\")\n return False\n\n def _validate_and_convert_concepts(self, concept: Concept) -> Concept:\n \"\"\"\n Validates and converts the given concept based on its metadata annotations.\n\n Args:\n concept (Concept): The concept to be validated and converted.\n\n Returns:\n The validated and converted concept.\n\n \"\"\"\n meta_ann_values = [meta_ann.value for meta_ann in concept.meta] if concept.meta is not None else []\n\n # assign categories\n if SubstanceCategory.ADVERSE_REACTION in meta_ann_values:\n if self._validate_and_convert_substance(concept):\n self._convert_allergy_type_to_code(concept)\n self._convert_allergy_severity_to_code(concept)\n concept.category = Category.ALLERGY\n else:\n 
log.warning(f\"Double-checking if concept ({concept.id} | {concept.name}) is in reaction subset\")\n if self._validate_and_convert_reaction(concept) and (\n ReactionPos.BEFORE_SUBSTANCE in meta_ann_values or ReactionPos.AFTER_SUBSTANCE in meta_ann_values\n ):\n concept.category = Category.REACTION\n else:\n log.warning(\n f\"Reaction concept ({concept.id} | {concept.name}) not in subset or reaction_pos is NOT_REACTION\"\n )\n if SubstanceCategory.TAKING in meta_ann_values:\n if self._validate_meds(concept):\n concept.category = Category.MEDICATION\n if SubstanceCategory.NOT_SUBSTANCE in meta_ann_values and (\n ReactionPos.BEFORE_SUBSTANCE in meta_ann_values or ReactionPos.AFTER_SUBSTANCE in meta_ann_values\n ):\n if self._validate_and_convert_reaction(concept):\n concept.category = Category.REACTION\n\n return concept\n\n @staticmethod\n def add_dosages_to_concepts(\n dosage_extractor: DosageExtractor, concepts: List[Concept], note: Note\n ) -> List[Concept]:\n \"\"\"\n Gets dosages for medication concepts\n\n Args:\n dosage_extractor (DosageExtractor): The dosage extractor object\n concepts (List[Concept]): List of concepts extracted\n note (Note): The input note\n\n Returns:\n List of concepts with dosages for medication concepts\n \"\"\"\n\n for ind, concept in enumerate(concepts):\n next_med_concept = concepts[ind + 1] if len(concepts) > ind + 1 else None\n dosage_string = get_dosage_string(concept, next_med_concept, note.text)\n if len(dosage_string.split()) > 2:\n concept.dosage = dosage_extractor(dosage_string)\n concept.category = Category.MEDICATION if concept.dosage is not None else None\n if concept.dosage is not None:\n log.debug(\n f\"Extracted dosage for medication concept \"\n f\"({concept.id} | {concept.name}): {concept.dosage.text} {concept.dosage.dose}\"\n )\n\n return concepts\n\n @staticmethod\n def _link_reactions_to_allergens(concept_list: List[Concept], note: Note, link_distance: int = 5) -> List[Concept]:\n \"\"\"\n Links reaction concepts 
to allergen concepts based on their proximity in the given concept list.\n\n Args:\n concept_list (List[Concept]): The list of concepts to search for reaction and allergen concepts.\n note (Note): The note object containing the text.\n link_distance (int, optional): The maximum distance between a reaction and an allergen to be considered linked.\n Defaults to 5.\n\n Returns:\n The updated concept list with reaction concepts removed and linked to their corresponding allergen concepts.\n \"\"\"\n allergy_concepts = [concept for concept in concept_list if concept.category == Category.ALLERGY]\n reaction_concepts = [concept for concept in concept_list if concept.category == Category.REACTION]\n\n for reaction_concept in reaction_concepts:\n nearest_allergy_concept = None\n min_distance = inf\n meta_ann_values = (\n [meta_ann.value for meta_ann in reaction_concept.meta] if reaction_concept.meta is not None else []\n )\n\n for allergy_concept in allergy_concepts:\n # skip if allergy is after and meta is before_substance\n if ReactionPos.BEFORE_SUBSTANCE in meta_ann_values and allergy_concept.start < reaction_concept.start:\n continue\n # skip if allergy is before and meta is after_substance\n elif ReactionPos.AFTER_SUBSTANCE in meta_ann_values and allergy_concept.start > reaction_concept.start:\n continue\n else:\n distance = calculate_word_distance(\n reaction_concept.start, reaction_concept.end, allergy_concept.start, allergy_concept.end, note\n )\n log.debug(\n f\"Calculated distance between reaction {reaction_concept.name} \"\n f\"and allergen {allergy_concept.name}: {distance}\"\n )\n if distance == -1:\n log.warning(\n f\"Indices for {reaction_concept.name} or {allergy_concept.name} invalid: \"\n f\"({reaction_concept.start}, {reaction_concept.end})\"\n f\"({allergy_concept.start}, {allergy_concept.end})\"\n )\n continue\n\n if distance <= link_distance and distance < min_distance:\n min_distance = distance\n nearest_allergy_concept = allergy_concept\n\n if 
nearest_allergy_concept is not None:\n nearest_allergy_concept.linked_concepts.append(reaction_concept)\n log.debug(\n f\"Linked reaction concept {reaction_concept.name} to \"\n f\"allergen concept {nearest_allergy_concept.name}\"\n )\n\n # Remove the linked REACTION concepts from the main list\n updated_concept_list = [concept for concept in concept_list if concept.category != Category.REACTION]\n\n return updated_concept_list\n\n @staticmethod\n def _convert_allergy_severity_to_code(concept: Concept) -> bool:\n \"\"\"\n Converts allergy severity to corresponding codes and links them to the concept.\n\n Args:\n concept (Concept): The concept to convert severity for.\n\n Returns:\n True if the conversion is successful, False otherwise.\n \"\"\"\n meta_ann_values = [meta_ann.value for meta_ann in concept.meta] if concept.meta is not None else []\n if Severity.MILD in meta_ann_values:\n concept.linked_concepts.append(Concept(id=\"L\", name=\"Low\", category=Category.SEVERITY))\n elif Severity.MODERATE in meta_ann_values:\n concept.linked_concepts.append(Concept(id=\"M\", name=\"Moderate\", category=Category.SEVERITY))\n elif Severity.SEVERE in meta_ann_values:\n concept.linked_concepts.append(Concept(id=\"H\", name=\"High\", category=Category.SEVERITY))\n elif Severity.UNSPECIFIED in meta_ann_values:\n return True\n else:\n log.warning(f\"No severity annotation associated with ({concept.id} | {concept.name})\")\n return False\n\n log.debug(\n f\"Linked severity concept ({concept.linked_concepts[-1].id} | {concept.linked_concepts[-1].name}) \"\n f\"to allergen concept ({concept.id} | {concept.name}): valid meta model output\"\n )\n\n return True\n\n def _convert_allergy_type_to_code(self, concept: Concept) -> bool:\n \"\"\"\n Converts the allergy type of a concept to a code and adds it as a linked concept.\n\n Args:\n concept (Concept): The concept whose allergy type needs to be converted.\n\n Returns:\n True if the conversion and linking were successful, False 
otherwise.\n \"\"\"\n # get the ALLERGYTYPE meta-annotation\n allergy_type = [meta_ann for meta_ann in concept.meta if meta_ann.name == \"allergy_type\"]\n if len(allergy_type) != 1:\n log.warning(\n f\"Unable to map allergy type code: allergy_type meta-annotation \"\n f\"not found for concept {concept.__str__()}\"\n )\n return False\n else:\n allergy_type = allergy_type[0].value\n\n # perform lookup with ALLERGYTYPE and AllergenType combination\n lookup_combination: Tuple[str, str] = (concept.category.value, allergy_type.value)\n allergy_type_lookup_result = self.allergy_type_lookup.get(lookup_combination)\n\n # add resulting allergy type concept as to linked_concept\n if allergy_type_lookup_result is not None:\n concept.linked_concepts.append(\n Concept(\n id=str(allergy_type_lookup_result[0]),\n name=allergy_type_lookup_result[1],\n category=Category.ALLERGY_TYPE,\n )\n )\n log.debug(\n f\"Linked allergy_type concept ({allergy_type_lookup_result[0]} | {allergy_type_lookup_result[1]})\"\n f\" to allergen concept ({concept.id} | {concept.name}): valid meta model output + allergytype lookup\"\n )\n else:\n log.warning(f\"Allergen and adverse reaction type combination not found: {lookup_combination}\")\n\n return True\n\n def _process_meta_ann_by_paragraph(self, concept: Concept, paragraph: Paragraph):\n \"\"\"\n Process the meta annotations for a given concept and paragraph.\n\n Args:\n concept (Concept): The concept object.\n paragraph (Paragraph): The paragraph object.\n\n Returns:\n None\n \"\"\"\n # if paragraph is structured meds to convert to corresponding relevance\n if paragraph.type in self.structured_med_lists:\n for meta in concept.meta:\n if meta.name == \"substance_category\" and meta.value in [\n SubstanceCategory.TAKING,\n SubstanceCategory.IRRELEVANT,\n ]:\n new_relevance = self.structured_med_lists[paragraph.type]\n if meta.value != new_relevance:\n log.debug(\n f\"Converted {meta.value} to \"\n f\"{new_relevance} for concept ({concept.id} | 
{concept.name}): \"\n f\"paragraph is {paragraph.type}\"\n )\n meta.value = new_relevance\n # if paragraph is probs or irrelevant section, convert substance to irrelevant\n elif paragraph.type in self.structured_prob_lists or paragraph.type in self.irrelevant_paragraphs:\n for meta in concept.meta:\n if meta.name == \"substance_category\" and meta.value != SubstanceCategory.IRRELEVANT:\n log.debug(\n f\"Converted {meta.value} to \"\n f\"{SubstanceCategory.IRRELEVANT} for concept ({concept.id} | {concept.name}): \"\n f\"paragraph is {paragraph.type}\"\n )\n meta.value = SubstanceCategory.IRRELEVANT\n\n def process_paragraphs(self, note: Note, concepts: List[Concept]) -> List[Concept]:\n \"\"\"\n Process the paragraphs in a note and update the list of concepts.\n\n Args:\n note (Note): The note object containing the paragraphs.\n concepts (List[Concept]): The list of concepts to be updated.\n\n Returns:\n The updated list of concepts.\n \"\"\"\n for paragraph in note.paragraphs:\n for concept in concepts:\n if concept.start >= paragraph.start and concept.end <= paragraph.end:\n # log.debug(f\"({concept.name} | {concept.id}) is in {paragraph.type}\")\n if concept.meta:\n self._process_meta_ann_by_paragraph(concept, paragraph)\n\n return concepts\n\n def postprocess(self, concepts: List[Concept], note: Note) -> List[Concept]:\n \"\"\"\n Postprocesses a list of concepts and links reactions to allergens.\n\n Args:\n concepts (List[Concept]): The list of concepts to be postprocessed.\n note (Note): The note object associated with the concepts.\n\n Returns:\n The postprocessed list of concepts.\n \"\"\"\n # deepcopy so we still have reference to original list of concepts\n all_concepts = deepcopy(concepts)\n processed_concepts = []\n\n for concept in all_concepts:\n concept = self._validate_and_convert_concepts(concept)\n processed_concepts.append(concept)\n\n processed_concepts = self._link_reactions_to_allergens(processed_concepts, note)\n\n return processed_concepts\n\n 
def convert_VTM_to_VMP_or_text(self, concepts: List[Concept]) -> List[Concept]:\n \"\"\"\n Converts medication concepts from VTM (Virtual Therapeutic Moiety) to VMP (Virtual Medicinal Product) or text.\n\n Args:\n concepts (List[Concept]): A list of medication concepts.\n\n Returns:\n A list of medication concepts with updated IDs, names, and dosages.\n\n \"\"\"\n # Get medication concepts\n med_concepts = [concept for concept in concepts if concept.category == Category.MEDICATION]\n self.vtm_to_vmp_lookup[\"dose\"] = self.vtm_to_vmp_lookup[\"dose\"].astype(float)\n\n med_concepts_with_dose = []\n # I don't know man...Need to improve dosage methods\n for concept in med_concepts:\n if concept.dosage is not None:\n if concept.dosage.dose:\n if concept.dosage.dose.value is not None and concept.dosage.dose.unit is not None:\n med_concepts_with_dose.append(concept)\n\n med_concepts_no_dose = [concept for concept in concepts if concept not in med_concepts_with_dose]\n\n # Create a temporary DataFrame to match vtmId, dose, and unit\n temp_df = pd.DataFrame(\n {\n \"vtmId\": [int(concept.id) for concept in med_concepts_with_dose],\n \"dose\": [float(concept.dosage.dose.value) for concept in med_concepts_with_dose],\n \"unit\": [concept.dosage.dose.unit for concept in med_concepts_with_dose],\n }\n )\n\n # Merge with the lookup df to get vmpId\n merged_df = temp_df.merge(self.vtm_to_vmp_lookup, on=[\"vtmId\", \"dose\", \"unit\"], how=\"left\")\n\n # Update id in the concepts list\n for index, concept in enumerate(med_concepts_with_dose):\n # Convert VTM to VMP id\n vmp_id = merged_df.at[index, \"vmpId\"]\n if not pd.isna(vmp_id):\n log.debug(\n f\"Converted ({concept.id} | {concept.name}) to \"\n f\"({int(vmp_id)} | {concept.name + ' ' + str(int(concept.dosage.dose.value)) + concept.dosage.dose.unit} \"\n f\"tablets): valid extracted dosage + VMP lookup\"\n )\n concept.id = str(int(vmp_id))\n concept.name += \" \" + str(int(concept.dosage.dose.value)) + 
str(concept.dosage.dose.unit) + \" tablets\"\n # If found VMP match change the dosage to 1 tablet\n concept.dosage.dose.value = 1\n concept.dosage.dose.unit = \"{tbl}\"\n else:\n # If no match with dose convert to text\n lookup_result = self.vtm_to_text_lookup.get(int(concept.id))\n if lookup_result is not None:\n log.debug(\n f\"Converted ({concept.id} | {concept.name}) to (None | {lookup_result}: no match to VMP dosage lookup)\"\n )\n concept.id = None\n concept.name = lookup_result\n\n # Convert rest of VTMs that have no dose for VMP conversion to text\n for concept in med_concepts_no_dose:\n lookup_result = self.vtm_to_text_lookup.get(int(concept.id))\n if lookup_result is not None:\n log.debug(f\"Converted ({concept.id} | {concept.name}) to (None | {lookup_result}): no dosage detected\")\n concept.id = None\n concept.name = lookup_result\n\n return concepts\n\n def __call__(\n self,\n note: Note,\n record_concepts: Optional[List[Concept]] = None,\n dosage_extractor: Optional[DosageExtractor] = None,\n ) -> List[Concept]:\n \"\"\"\n Annotates the given note with concepts using the pipeline.\n\n Args:\n note (Note): The note to be annotated.\n record_concepts (Optional[List[Concept]]): A list of concepts to be recorded.\n dosage_extractor (Optional[DosageExtractor]): A dosage extractor to be used.\n\n Returns:\n The annotated concepts.\n \"\"\"\n concepts = self.run_pipeline(note, record_concepts, dosage_extractor)\n\n if self.config.add_numbering:\n concepts = self.add_numbering_to_name(concepts)\n\n return concepts\n
"},{"location":"api-reference/medsallergiesannotator/#miade.annotators.MedsAllergiesAnnotator.concept_types","title":"concept_types: List[Category]
property
","text":"Returns a list of concept types.
Returns:
Type Description
List[Category]
[Category.MEDICATION, Category.ALLERGY, Category.REACTION]
"},{"location":"api-reference/medsallergiesannotator/#miade.annotators.MedsAllergiesAnnotator.pipeline","title":"pipeline: List[str]
property
","text":"Returns a list of annotators in the pipeline.
The annotators are executed in the order they appear in the list.
Returns:
Type Description
List[str]
[\"preprocessor\", \"medcat\", \"paragrapher\", \"postprocessor\", \"dosage_extractor\", \"vtm_converter\", \"deduplicator\"]
"},{"location":"api-reference/medsallergiesannotator/#miade.annotators.MedsAllergiesAnnotator.__call__","title":"__call__(note, record_concepts=None, dosage_extractor=None)
","text":"Annotates the given note with concepts using the pipeline.
Parameters:
Name Type Description Default
note
Note
The note to be annotated.
required
record_concepts
Optional[List[Concept]]
A list of concepts already in the record, passed to the deduplicator.
None
dosage_extractor
Optional[DosageExtractor]
A dosage extractor to be used.
None
Returns:
Type Description
List[Concept]
The annotated concepts.
Source code in src/miade/annotators.py
def __call__(\n self,\n note: Note,\n record_concepts: Optional[List[Concept]] = None,\n dosage_extractor: Optional[DosageExtractor] = None,\n) -> List[Concept]:\n \"\"\"\n Annotates the given note with concepts using the pipeline.\n\n Args:\n note (Note): The note to be annotated.\n record_concepts (Optional[List[Concept]]): A list of concepts to be recorded.\n dosage_extractor (Optional[DosageExtractor]): A dosage extractor to be used.\n\n Returns:\n The annotated concepts.\n \"\"\"\n concepts = self.run_pipeline(note, record_concepts, dosage_extractor)\n\n if self.config.add_numbering:\n concepts = self.add_numbering_to_name(concepts)\n\n return concepts\n
"},{"location":"api-reference/medsallergiesannotator/#miade.annotators.MedsAllergiesAnnotator.add_dosages_to_concepts","title":"add_dosages_to_concepts(dosage_extractor, concepts, note)
staticmethod
","text":"Gets dosages for medication concepts
Parameters:
Name Type Description Default
dosage_extractor
DosageExtractor
The dosage extractor object
required
concepts
List[Concept]
List of concepts extracted
required
note
Note
The input note
required
Returns:
Type Description
List[Concept]
List of concepts with dosages for medication concepts
Source code in src/miade/annotators.py
@staticmethod\ndef add_dosages_to_concepts(\n dosage_extractor: DosageExtractor, concepts: List[Concept], note: Note\n) -> List[Concept]:\n \"\"\"\n Gets dosages for medication concepts\n\n Args:\n dosage_extractor (DosageExtractor): The dosage extractor object\n concepts (List[Concept]): List of concepts extracted\n note (Note): The input note\n\n Returns:\n List of concepts with dosages for medication concepts\n \"\"\"\n\n for ind, concept in enumerate(concepts):\n next_med_concept = concepts[ind + 1] if len(concepts) > ind + 1 else None\n dosage_string = get_dosage_string(concept, next_med_concept, note.text)\n if len(dosage_string.split()) > 2:\n concept.dosage = dosage_extractor(dosage_string)\n concept.category = Category.MEDICATION if concept.dosage is not None else None\n if concept.dosage is not None:\n log.debug(\n f\"Extracted dosage for medication concept \"\n f\"({concept.id} | {concept.name}): {concept.dosage.text} {concept.dosage.dose}\"\n )\n\n return concepts\n
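Two details of the loop above are worth seeing in isolation: the extractor only runs when the candidate dosage string has more than two whitespace-separated tokens, and whenever it does run, the concept's category is overwritten (medication on success, `None` on failure). The sketch below is an illustration under stated assumptions, not MiADE's implementation: `SimpleConcept`, `attach_dosages` and `fake_extractor` are hypothetical names, and the dosage strings are passed in ready-made because `get_dosage_string` is not shown in this reference.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SimpleConcept:
    # Hypothetical stand-in for miade's Concept.
    name: str
    category: Optional[str] = 'unassigned'
    dosage: Optional[str] = None


def attach_dosages(
    concepts: List[SimpleConcept],
    dosage_strings: List[str],
    extractor: Callable[[str], Optional[str]],
) -> List[SimpleConcept]:
    for concept, dosage_string in zip(concepts, dosage_strings):
        # Same gate as the source: strings of one or two tokens are skipped entirely.
        if len(dosage_string.split()) > 2:
            concept.dosage = extractor(dosage_string)
            # The category is rewritten whenever extraction was attempted.
            concept.category = 'medication' if concept.dosage is not None else None
    return concepts


def fake_extractor(text: str) -> Optional[str]:
    # Stand-in rule: pretend extraction succeeds only when an 'mg' dose is present.
    return text if 'mg' in text else None


concepts = attach_dosages(
    [SimpleConcept('paracetamol'), SimpleConcept('aspirin'), SimpleConcept('penicillin')],
    ['paracetamol 500mg twice daily', 'aspirin daily', 'penicillin caused a rash'],
    fake_extractor,
)
```

In this sketch the third concept ends up with no category because extraction was attempted and failed, while the second keeps its earlier category because its string was too short to try.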
"},{"location":"api-reference/medsallergiesannotator/#miade.annotators.MedsAllergiesAnnotator.convert_VTM_to_VMP_or_text","title":"convert_VTM_to_VMP_or_text(concepts)
","text":"Converts medication concepts from VTM (Virtual Therapeutic Moiety) to VMP (Virtual Medicinal Product) or text.
Parameters:
Name Type Description Default
concepts
List[Concept]
A list of medication concepts.
required
Returns:
Type Description
List[Concept]
A list of medication concepts with updated IDs, names, and dosages.
Source code in src/miade/annotators.py
def convert_VTM_to_VMP_or_text(self, concepts: List[Concept]) -> List[Concept]:\n \"\"\"\n Converts medication concepts from VTM (Virtual Therapeutic Moiety) to VMP (Virtual Medicinal Product) or text.\n\n Args:\n concepts (List[Concept]): A list of medication concepts.\n\n Returns:\n A list of medication concepts with updated IDs, names, and dosages.\n\n \"\"\"\n # Get medication concepts\n med_concepts = [concept for concept in concepts if concept.category == Category.MEDICATION]\n self.vtm_to_vmp_lookup[\"dose\"] = self.vtm_to_vmp_lookup[\"dose\"].astype(float)\n\n med_concepts_with_dose = []\n # I don't know man...Need to improve dosage methods\n for concept in med_concepts:\n if concept.dosage is not None:\n if concept.dosage.dose:\n if concept.dosage.dose.value is not None and concept.dosage.dose.unit is not None:\n med_concepts_with_dose.append(concept)\n\n med_concepts_no_dose = [concept for concept in concepts if concept not in med_concepts_with_dose]\n\n # Create a temporary DataFrame to match vtmId, dose, and unit\n temp_df = pd.DataFrame(\n {\n \"vtmId\": [int(concept.id) for concept in med_concepts_with_dose],\n \"dose\": [float(concept.dosage.dose.value) for concept in med_concepts_with_dose],\n \"unit\": [concept.dosage.dose.unit for concept in med_concepts_with_dose],\n }\n )\n\n # Merge with the lookup df to get vmpId\n merged_df = temp_df.merge(self.vtm_to_vmp_lookup, on=[\"vtmId\", \"dose\", \"unit\"], how=\"left\")\n\n # Update id in the concepts list\n for index, concept in enumerate(med_concepts_with_dose):\n # Convert VTM to VMP id\n vmp_id = merged_df.at[index, \"vmpId\"]\n if not pd.isna(vmp_id):\n log.debug(\n f\"Converted ({concept.id} | {concept.name}) to \"\n f\"({int(vmp_id)} | {concept.name + ' ' + str(int(concept.dosage.dose.value)) + concept.dosage.dose.unit} \"\n f\"tablets): valid extracted dosage + VMP lookup\"\n )\n concept.id = str(int(vmp_id))\n concept.name += \" \" + str(int(concept.dosage.dose.value)) + 
str(concept.dosage.dose.unit) + \" tablets\"\n # If found VMP match change the dosage to 1 tablet\n concept.dosage.dose.value = 1\n concept.dosage.dose.unit = \"{tbl}\"\n else:\n # If no match with dose convert to text\n lookup_result = self.vtm_to_text_lookup.get(int(concept.id))\n if lookup_result is not None:\n log.debug(\n f\"Converted ({concept.id} | {concept.name}) to (None | {lookup_result}: no match to VMP dosage lookup)\"\n )\n concept.id = None\n concept.name = lookup_result\n\n # Convert rest of VTMs that have no dose for VMP conversion to text\n for concept in med_concepts_no_dose:\n lookup_result = self.vtm_to_text_lookup.get(int(concept.id))\n if lookup_result is not None:\n log.debug(f\"Converted ({concept.id} | {concept.name}) to (None | {lookup_result}): no dosage detected\")\n concept.id = None\n concept.name = lookup_result\n\n return concepts\n
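The conversion has three outcomes per medication: an exact match on VTM ID, dose and unit yields a VMP code and a dose of one tablet; no match falls back to a free-text name with no code; and a concept with no text entry is left unchanged. The sketch below shows that decision for a single medication. It deliberately swaps the source's pandas left merge for a plain dictionary keyed on `(vtm_id, dose, unit)`, and every ID and name in it is made up for illustration (these are not real dm+d codes). `convert_vtm`, `VTM_TO_VMP` and `VTM_TO_TEXT` are hypothetical names.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lookup rows standing in for vtm_to_vmp.csv and vtm_to_text.csv.
VTM_TO_VMP: Dict[Tuple[int, float, str], int] = {
    (1001, 500.0, 'mg'): 2001,
    (1001, 250.0, 'mg'): 2002,
}
VTM_TO_TEXT: Dict[int, str] = {1001: 'EXAMPLE DRUG ORAL'}


def convert_vtm(vtm_id: int, name: str, dose: Optional[float], unit: Optional[str]) -> Dict[str, object]:
    if dose is not None and unit is not None:
        vmp_id = VTM_TO_VMP.get((vtm_id, float(dose), unit))
        if vmp_id is not None:
            # VMP match: the strength moves into the name and the dose becomes one tablet.
            return {'id': str(vmp_id), 'name': f'{name} {int(dose)}{unit} tablets', 'dose': 1, 'unit': '{tbl}'}
    text = VTM_TO_TEXT.get(vtm_id)
    if text is not None:
        # No VMP match, or no usable dose: fall back to a free-text name with no code.
        return {'id': None, 'name': text, 'dose': dose, 'unit': unit}
    # Nothing found in either lookup: the concept is left as it was.
    return {'id': str(vtm_id), 'name': name, 'dose': dose, 'unit': unit}
```

The exact-match key means a parsed dose of 750 mg finds no VMP row here and takes the text fallback, which is the same behaviour a left merge gives when the joined `vmpId` is missing.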
"},{"location":"api-reference/medsallergiesannotator/#miade.annotators.MedsAllergiesAnnotator.postprocess","title":"postprocess(concepts, note)
","text":"Postprocesses a list of concepts and links reactions to allergens.
Parameters:
Name Type Description Default
concepts
List[Concept]
The list of concepts to be postprocessed.
required
note
Note
The note object associated with the concepts.
required
Returns:
Type Description
List[Concept]
The postprocessed list of concepts.
Source code in src/miade/annotators.py
def postprocess(self, concepts: List[Concept], note: Note) -> List[Concept]:\n \"\"\"\n Postprocesses a list of concepts and links reactions to allergens.\n\n Args:\n concepts (List[Concept]): The list of concepts to be postprocessed.\n note (Note): The note object associated with the concepts.\n\n Returns:\n The postprocessed list of concepts.\n \"\"\"\n # deepcopy so we still have reference to original list of concepts\n all_concepts = deepcopy(concepts)\n processed_concepts = []\n\n for concept in all_concepts:\n concept = self._validate_and_convert_concepts(concept)\n processed_concepts.append(concept)\n\n processed_concepts = self._link_reactions_to_allergens(processed_concepts, note)\n\n return processed_concepts\n
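The linking step that `postprocess` ends with attaches each reaction to its nearest allergen, provided the two are within a maximum word distance (5 by default in the class source), and then drops all reactions from the top-level list. The sketch below is a simplified illustration: it assumes a hypothetical `Span` class, uses the difference between word indices in place of the library's `calculate_word_distance`, and leaves out the before/after-substance filtering that the real method applies.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    # Hypothetical stand-in for a Concept; pos is a word index, not a character offset.
    name: str
    kind: str  # 'allergy' or 'reaction'
    pos: int
    linked: List[str] = field(default_factory=list)


def link_reactions(spans: List[Span], link_distance: int = 5) -> List[Span]:
    allergies = [s for s in spans if s.kind == 'allergy']
    for reaction in (s for s in spans if s.kind == 'reaction'):
        nearest: Optional[Span] = None
        best = float('inf')
        for allergy in allergies:
            distance = abs(allergy.pos - reaction.pos)
            # Keep the closest allergen that is within the allowed distance.
            if distance <= link_distance and distance < best:
                nearest, best = allergy, distance
        if nearest is not None:
            nearest.linked.append(reaction.name)
    # Reactions never stay in the top-level list, whether or not they were linked.
    return [s for s in spans if s.kind != 'reaction']


result = link_reactions([
    Span('penicillin', 'allergy', 2),
    Span('rash', 'reaction', 4),
    Span('nuts', 'allergy', 12),
    Span('nausea', 'reaction', 30),
])
```

Here `rash` is linked to `penicillin`, while `nausea` is too far from either allergen and is discarded without being linked.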
"},{"location":"api-reference/medsallergiesannotator/#miade.annotators.MedsAllergiesAnnotator.process_paragraphs","title":"process_paragraphs(note, concepts)
","text":"Process the paragraphs in a note and update the list of concepts.
Parameters:
Name Type Description Default
note
Note
The note object containing the paragraphs.
required
concepts
List[Concept]
The list of concepts to be updated.
required
Returns:
Type Description
List[Concept]
The updated list of concepts.
Source code in src/miade/annotators.py
def process_paragraphs(self, note: Note, concepts: List[Concept]) -> List[Concept]:\n \"\"\"\n Process the paragraphs in a note and update the list of concepts.\n\n Args:\n note (Note): The note object containing the paragraphs.\n concepts (List[Concept]): The list of concepts to be updated.\n\n Returns:\n The updated list of concepts.\n \"\"\"\n for paragraph in note.paragraphs:\n for concept in concepts:\n if concept.start >= paragraph.start and concept.end <= paragraph.end:\n # log.debug(f\"({concept.name} | {concept.id}) is in {paragraph.type}\")\n if concept.meta:\n self._process_meta_ann_by_paragraph(concept, paragraph)\n\n return concepts\n
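The containment test decides which paragraph's rules apply to a concept: the concept's whole span must lie inside the paragraph, so a span that straddles a boundary matches nothing. A minimal sketch, assuming paragraphs are given as hypothetical `(type, start, end)` tuples with made-up type labels; it returns the first match, which is equivalent to the source's loop only when paragraphs do not overlap.

```python
from typing import List, Optional, Tuple

# (type, start, end) character offsets; the type labels are hypothetical.
Paragraph = Tuple[str, int, int]


def paragraph_type_for(start: int, end: int, paragraphs: List[Paragraph]) -> Optional[str]:
    for p_type, p_start, p_end in paragraphs:
        # Same containment test as the source: the whole span must sit inside the paragraph.
        if start >= p_start and end <= p_end:
            return p_type
    return None


paragraphs: List[Paragraph] = [('prose', 0, 40), ('med_list', 41, 90)]
```

With these offsets, a concept spanning 45 to 56 falls in the `med_list` paragraph, while one spanning 35 to 45 crosses the boundary and is left untouched.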
"},{"location":"api-reference/medsallergiesannotator/#miade.annotators.MedsAllergiesAnnotator.run_pipeline","title":"run_pipeline(note, record_concepts, dosage_extractor)
","text":"Runs the annotation pipeline on the given note.
Parameters:
Name Type Description Default
note
Note
The input note to run the pipeline on.
required
record_concepts
List[Concept]
The list of previously recorded concepts.
required
dosage_extractor
Optional[DosageExtractor]
The dosage extractor function.
required
Returns:
Type Description
List[Concept]
The list of annotated concepts.
Source code in src/miade/annotators.py
def run_pipeline(\n    self, note: Note, record_concepts: List[Concept], dosage_extractor: Optional[DosageExtractor]\n) -> List[Concept]:\n    \"\"\"\n    Runs the annotation pipeline on the given note.\n\n    Args:\n        note (Note): The input note to run the pipeline on.\n        record_concepts (List[Concept]): The list of previously recorded concepts.\n        dosage_extractor (Optional[DosageExtractor]): The dosage extractor function.\n\n    Returns:\n        The list of annotated concepts.\n    \"\"\"\n    concepts: List[Concept] = []\n\n    for pipe in self.pipeline:\n        if pipe not in self.config.disable:\n            if pipe == \"preprocessor\":\n                note = self.preprocess(note)\n            elif pipe == \"medcat\":\n                concepts = self.get_concepts(note)\n            elif pipe == \"paragrapher\":\n                concepts = self.process_paragraphs(note, concepts)\n            elif pipe == \"postprocessor\":\n                concepts = self.postprocess(concepts, note)\n            elif pipe == \"deduplicator\":\n                concepts = self.deduplicate(concepts, record_concepts)\n            elif pipe == \"vtm_converter\":\n                concepts = self.convert_VTM_to_VMP_or_text(concepts)\n            elif pipe == \"dosage_extractor\" and dosage_extractor is not None:\n                concepts = self.add_dosages_to_concepts(dosage_extractor, concepts, note)\n\n    return concepts\n
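The effect of the `disable` setting on the pipeline can be sketched with plain functions. This is an illustration only: it swaps the if/elif chain for a dictionary lookup, and the `run_steps` helper and the three toy steps are assumptions made for the example rather than MiADE code.

```python
from typing import Callable, Dict, List

Step = Callable[[List[str]], List[str]]


def run_steps(pipeline: List[str], disabled: List[str], steps: Dict[str, Step]) -> List[str]:
    # Apply each named step in order, skipping any that are disabled
    state: List[str] = []
    for name in pipeline:
        if name in disabled:
            continue
        step = steps.get(name)
        if step is not None:
            state = step(state)
    return state


# Toy steps standing in for the real pipeline components
steps: Dict[str, Step] = {
    'medcat': lambda concepts: concepts + ['paracetamol', 'paracetamol'],
    'deduplicator': lambda concepts: sorted(set(concepts)),
    'vtm_converter': lambda concepts: [c.upper() for c in concepts],
}
pipeline = ['medcat', 'deduplicator', 'vtm_converter']

all_steps = run_steps(pipeline, [], steps)
without_dedup = run_steps(pipeline, ['deduplicator'], steps)
print(all_steps, without_dedup)
```

Disabling a step leaves the order of the remaining steps unchanged, which is the same behaviour as the `if pipe not in self.config.disable` check in the listing.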
"},{"location":"api-reference/metaannotations/","title":"MetaAnnotations","text":" Bases: BaseModel
Represents a meta annotation with a name, value, and optional confidence.
Attributes:
Name Type Description
name
str
The name of the meta annotation.
value
Enum
The value of the meta annotation.
confidence
float
The confidence level of the meta annotation.
Source code in src/miade/metaannotations.py
class MetaAnnotations(BaseModel):\n    \"\"\"\n    Represents a meta annotation with a name, value, and optional confidence.\n\n    Attributes:\n        name (str): The name of the meta annotation.\n        value (Enum): The value of the meta annotation.\n        confidence (float, optional): The confidence level of the meta annotation.\n    \"\"\"\n\n    name: str\n    value: Enum\n    confidence: Optional[float]\n\n    @validator(\"value\", pre=True)\n    def validate_value(cls, value, values):\n        enum_dict = META_ANNS_DICT\n        if isinstance(value, str):\n            enum_type = enum_dict.get(values[\"name\"])\n            if enum_type is not None:\n                try:\n                    return enum_type(value)\n                except ValueError:\n                    raise ValueError(f\"Invalid value: {value}\")\n            else:\n                raise ValueError(f\"Invalid mapping for {values['name']}\")\n\n        return value\n\n    def __eq__(self, other):\n        return self.name == other.name and self.value == other.value\n
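The string-to-enum conversion done by `validate_value` can be shown without pydantic. The sketch below is a plain-Python approximation: the `Presence` members and their string values, the `META_ENUMS` table, and the `coerce_value` helper are all hypothetical stand-ins, since the real enum definitions and `META_ANNS_DICT` are not shown on this page.

```python
from enum import Enum
from typing import Any, Dict, Type


class Presence(Enum):
    # Hypothetical members and values, for illustration only
    CONFIRMED = 'confirmed'
    NEGATED = 'negated'


# Hypothetical counterpart of META_ANNS_DICT: annotation name -> enum type
META_ENUMS: Dict[str, Type[Enum]] = {'presence': Presence}


def coerce_value(name: str, value: Any) -> Any:
    # Non-string values are passed through unchanged
    if not isinstance(value, str):
        return value
    enum_type = META_ENUMS.get(name)
    if enum_type is None:
        raise ValueError(f'Invalid mapping for {name}')
    try:
        # Enum lookup by value, e.g. Presence('negated')
        return enum_type(value)
    except ValueError:
        raise ValueError(f'Invalid value: {value}')


converted = coerce_value('presence', 'negated')
print(converted)
```

Note that `__eq__` in the listing compares only `name` and `value`, so two annotations that differ in `confidence` alone are treated as equal.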
"},{"location":"api-reference/note/","title":"Note","text":" Bases: object
Represents a note object.
Attributes:
Name Type Description
text
str
The text content of the note.
raw_text
str
The raw text content of the note.
regex_config
dict
The paragraph type to regex pattern mappings loaded from the file at regex_config_path.
paragraphs
Optional[List[Paragraph]]
A list of paragraphs in the note.
Source code in src/miade/note.py
class Note(object):\n \"\"\"\n Represents a note object.\n\n Attributes:\n text (str): The text content of the note.\n raw_text (str): The raw text content of the note.\n regex_config (str): The path to the regex configuration file.\n paragraphs (Optional[List[Paragraph]]): A list of paragraphs in the note.\n \"\"\"\n\n def __init__(self, text: str, regex_config_path: str = \"./data/regex_para_chunk.csv\"):\n self.text = text\n self.raw_text = text\n self.regex_config = load_regex_config_mappings(regex_config_path)\n self.paragraphs: Optional[List[Paragraph]] = []\n\n def clean_text(self) -> None:\n \"\"\"\n Cleans the text content of the note.\n\n This method performs various cleaning operations on the text content of the note,\n such as replacing spaces, removing punctuation, and removing empty lines.\n \"\"\"\n\n # Replace all types of spaces with a single normal space, preserving \"\\n\"\n self.text = re.sub(r\"(?:(?!\\n)\\s)+\", \" \", self.text)\n\n # Remove en dashes that are not between two numbers\n self.text = re.sub(r\"(?<![0-9])-(?![0-9])\", \"\", self.text)\n\n # Remove all punctuation except full stops, question marks, dash and line breaks\n self.text = re.sub(r\"[^\\w\\s.,?\\n-]\", \"\", self.text)\n\n # Remove spaces if the entire line (between two line breaks) is just spaces\n self.text = re.sub(r\"(?<=\\n)\\s+(?=\\n)\", \"\", self.text)\n\n def get_paragraphs(self) -> None:\n \"\"\"\n Splits the note into paragraphs.\n\n This method splits the text content of the note into paragraphs based on double line breaks.\n It also assigns a paragraph type to each paragraph based on matching patterns in the heading.\n \"\"\"\n\n paragraphs = re.split(r\"\\n\\n+\", self.text)\n start = 0\n\n for text in paragraphs:\n # Default to prose\n paragraph_type = ParagraphType.prose\n\n # Use re.search to find everything before first \\n\n match = re.search(r\"^(.*?)(?:\\n|$)([\\s\\S]*)\", text)\n\n # Check if a match is found\n if match:\n heading = match.group(1)\n 
body = match.group(2)\n else:\n heading = text\n body = \"\"\n\n end = start + len(text)\n paragraph = Paragraph(heading=heading, body=body, type=paragraph_type, start=start, end=end)\n start = end + 2 # Account for the two newline characters\n\n # Convert the heading to lowercase for case-insensitive matching\n if heading:\n heading = heading.lower()\n # Iterate through the dictionary items and patterns\n for paragraph_type, pattern in self.regex_config.items():\n if re.search(pattern, heading):\n paragraph.type = paragraph_type\n break # Exit the loop if a match is found\n\n self.paragraphs.append(paragraph)\n\n def __str__(self):\n return self.text\n
"},{"location":"api-reference/note/#miade.note.Note.clean_text","title":"clean_text()
","text":"Cleans the text content of the note.
This method performs various cleaning operations on the text content of the note, such as replacing spaces, removing punctuation, and removing empty lines.
Source code in src/miade/note.py
def clean_text(self) -> None:\n    \"\"\"\n    Cleans the text content of the note.\n\n    This method performs various cleaning operations on the text content of the note,\n    such as replacing spaces, removing punctuation, and removing empty lines.\n    \"\"\"\n\n    # Replace all types of spaces with a single normal space, preserving \"\\n\"\n    self.text = re.sub(r\"(?:(?!\\n)\\s)+\", \" \", self.text)\n\n    # Remove hyphens that have no digit on either side\n    self.text = re.sub(r\"(?<![0-9])-(?![0-9])\", \"\", self.text)\n\n    # Remove all punctuation except full stops, commas, question marks, hyphens and line breaks\n    self.text = re.sub(r\"[^\\w\\s.,?\\n-]\", \"\", self.text)\n\n    # Remove spaces if the entire line (between two line breaks) is just spaces\n    self.text = re.sub(r\"(?<=\\n)\\s+(?=\\n)\", \"\", self.text)\n
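To see what the four substitutions do to real text, they can be applied to a short sample string. The sketch below copies the regular expressions from the listing into a standalone `clean` function; the function name and the sample sentence are made up for the example.

```python
import re


def clean(text: str) -> str:
    # Collapse runs of non-newline whitespace into one space
    text = re.sub(r'(?:(?!\n)\s)+', ' ', text)
    # Drop hyphens that have no digit on either side
    text = re.sub(r'(?<![0-9])-(?![0-9])', '', text)
    # Drop punctuation other than . , ? - and line breaks
    text = re.sub(r'[^\w\s.,?\n-]', '', text)
    # Empty out lines that contain only whitespace
    text = re.sub(r'(?<=\n)\s+(?=\n)', '', text)
    return text


sample = 'Pt  has chest-pain (mild)! Dose 5-10 mg.'
cleaned = clean(sample)
print(cleaned)  # Pt has chestpain mild Dose 5-10 mg.
```

The hyphen in the dose range `5-10` survives because it sits next to a digit, while the hyphen in `chest-pain` is removed.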
"},{"location":"api-reference/note/#miade.note.Note.get_paragraphs","title":"get_paragraphs()
","text":"Splits the note into paragraphs.
This method splits the text content of the note into paragraphs based on double line breaks. It also assigns a paragraph type to each paragraph based on matching patterns in the heading.
Source code in src/miade/note.py
def get_paragraphs(self) -> None:\n    \"\"\"\n    Splits the note into paragraphs.\n\n    This method splits the text content of the note into paragraphs based on double line breaks.\n    It also assigns a paragraph type to each paragraph based on matching patterns in the heading.\n    \"\"\"\n\n    paragraphs = re.split(r\"\\n\\n+\", self.text)\n    start = 0\n\n    for text in paragraphs:\n        # Default to prose\n        paragraph_type = ParagraphType.prose\n\n        # Use re.search to find everything before first \\n\n        match = re.search(r\"^(.*?)(?:\\n|$)([\\s\\S]*)\", text)\n\n        # Check if a match is found\n        if match:\n            heading = match.group(1)\n            body = match.group(2)\n        else:\n            heading = text\n            body = \"\"\n\n        end = start + len(text)\n        paragraph = Paragraph(heading=heading, body=body, type=paragraph_type, start=start, end=end)\n        start = end + 2  # Account for the two newline characters\n\n        # Convert the heading to lowercase for case-insensitive matching\n        if heading:\n            heading = heading.lower()\n            # Iterate through the dictionary items and patterns\n            for paragraph_type, pattern in self.regex_config.items():\n                if re.search(pattern, heading):\n                    paragraph.type = paragraph_type\n                    break  # Exit the loop if a match is found\n\n        self.paragraphs.append(paragraph)\n
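The offset bookkeeping in `get_paragraphs` is easier to follow on a concrete string. The sketch below is a simplified variant: it uses `str.partition` in place of the heading regex, returns plain tuples instead of `Paragraph` objects, and skips paragraph typing; `split_with_offsets` is a hypothetical helper name.

```python
import re
from typing import List, Tuple


def split_with_offsets(text: str) -> List[Tuple[str, str, int, int]]:
    # Return (heading, body, start, end) for each block separated by blank lines
    results = []
    start = 0
    for block in re.split(r'\n\n+', text):
        # First line is the heading, the rest is the body
        heading, _, body = block.partition('\n')
        end = start + len(block)
        results.append((heading, body, start, end))
        start = end + 2  # assumes the separator was exactly two newlines
    return results


note_text = 'Allergies:\npenicillin - rash\n\nSeen in clinic today.'
blocks = split_with_offsets(note_text)
print(blocks)
```

Because the split pattern accepts two or more newlines while the offset advances by exactly two, the `start` and `end` values line up with the text only when paragraphs are separated by a single blank line, as in this sample.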
"},{"location":"api-reference/noteprocessor/","title":"NoteProcessor","text":"Main processor of MiADE, which extracts, postprocesses, and deduplicates concepts given annotators (MedCAT models), a Note, and existing concepts.
Parameters:
Name Type Description Default
model_directory
Path
Path to directory that contains medcat models and a config.yaml file
required
model_config_path
Path
Path to the model config file. Defaults to None.
None
log_level
int
Log level. Defaults to logging.INFO.
INFO
dosage_extractor_log_level
int
Log level for dosage extractor. Defaults to logging.INFO.
INFO
device
str
Device to run inference on (cpu or gpu). Defaults to \"cpu\".
'cpu'
custom_annotators
List[Annotator]
List of custom annotators. Defaults to None.
None
Source code in src/miade/core.py
class NoteProcessor:\n \"\"\"\n Main processor of MiADE which extract, postprocesses, and deduplicates concepts given\n annotators (MedCAT models), Note, and existing concepts\n\n Args:\n model_directory (Path): Path to directory that contains medcat models and a config.yaml file\n model_config_path (Path, optional): Path to the model config file. Defaults to None.\n log_level (int, optional): Log level. Defaults to logging.INFO.\n dosage_extractor_log_level (int, optional): Log level for dosage extractor. Defaults to logging.INFO.\n device (str, optional): Device to run inference on (cpu or gpu). Defaults to \"cpu\".\n custom_annotators (List[Annotator], optional): List of custom annotators. Defaults to None.\n \"\"\"\n\n def __init__(\n self,\n model_directory: Path,\n model_config_path: Path = None,\n log_level: int = logging.INFO,\n dosage_extractor_log_level: int = logging.INFO,\n device: str = \"cpu\",\n custom_annotators: Optional[List[Annotator]] = None,\n ):\n logging.getLogger(\"miade\").setLevel(log_level)\n logging.getLogger(\"miade.dosageextractor\").setLevel(dosage_extractor_log_level)\n logging.getLogger(\"miade.drugdoseade\").setLevel(dosage_extractor_log_level)\n\n self.device: str = device\n\n self.annotators: List[Annotator] = []\n self.model_directory: Path = model_directory\n self.model_config_path: Path = model_config_path\n self.model_factory: ModelFactory = self._load_model_factory(custom_annotators)\n self.dosage_extractor: DosageExtractor = DosageExtractor()\n\n def _load_config(self) -> Dict:\n \"\"\"\n Loads the configuration file (config.yaml) in the configured model path.\n If the model path is not explicitly passed, it defaults to the model directory.\n\n Returns:\n A dictionary containing the loaded config file.\n \"\"\"\n if self.model_config_path is None:\n config_path = os.path.join(self.model_directory, \"config.yaml\")\n else:\n config_path = self.model_config_path\n\n if os.path.isfile(config_path):\n log.info(f\"Found config 
file {config_path}\")\n else:\n log.error(f\"No model config file found at {config_path}\")\n\n with open(config_path, \"r\") as f:\n config = yaml.safe_load(f)\n\n return config\n\n def _load_model_factory(self, custom_annotators: Optional[List[Annotator]] = None) -> ModelFactory:\n \"\"\"\n Loads the model factory which maps model aliases to MedCAT model IDs and MiADE annotators.\n\n Args:\n custom_annotators (List[Annotators], optional): List of custom annotators to initialize. Defaults to None.\n\n Returns:\n The initialized ModelFactory object.\n\n Raises:\n Exception: If there is an error loading MedCAT models.\n\n \"\"\"\n meta_cat_config_dict = {\"general\": {\"device\": self.device}}\n config_dict = self._load_config()\n loaded_models = {}\n\n # get model {id: cat_model}\n log.info(f\"Loading MedCAT models from {self.model_directory}\")\n for model_pack_filepath in self.model_directory.glob(\"*.zip\"):\n try:\n cat = MiADE_CAT.load_model_pack(str(model_pack_filepath), meta_cat_config_dict=meta_cat_config_dict)\n # temp fix reload to load stop words\n cat.pipe._nlp = spacy.load(\n cat.config.general.spacy_model, disable=cat.config.general.spacy_disabled_components\n )\n cat._create_pipeline(config=cat.config)\n cat_id = cat.config.version[\"id\"]\n loaded_models[cat_id] = cat\n except Exception as e:\n raise Exception(f\"Error loading MedCAT models: {e}\")\n\n mapped_models = {}\n # map to name if given {name: <class CAT>}\n if \"models\" in config_dict:\n for name, model_id in config_dict[\"models\"].items():\n cat_model = loaded_models.get(model_id)\n if cat_model is None:\n log.warning(f\"No match for model id {model_id} in {self.model_directory}, skipping\")\n continue\n mapped_models[name] = cat_model\n else:\n log.warning(\"No model ids configured!\")\n\n mapped_annotators = {}\n # {name: <class Annotator>}\n if \"annotators\" in config_dict:\n for name, annotator_string in config_dict[\"annotators\"].items():\n if custom_annotators is not None:\n for 
annotator_class in custom_annotators:\n if annotator_class.__name__ == annotator_string:\n mapped_annotators[name] = annotator_class\n break\n if name not in mapped_annotators:\n try:\n annotator_class = getattr(sys.modules[__name__], annotator_string)\n mapped_annotators[name] = annotator_class\n except AttributeError as e:\n log.warning(f\"{annotator_string} not found: {e}\")\n else:\n log.warning(\"No annotators configured!\")\n\n mapped_configs = {}\n if \"general\" in config_dict:\n for name, config in config_dict[\"general\"].items():\n try:\n mapped_configs[name] = AnnotatorConfig(**config)\n except Exception as e:\n log.error(f\"Error processing config for '{name}': {str(e)}\")\n else:\n log.warning(\"No general settings configured, using default settings.\")\n\n model_factory_config = {\"models\": mapped_models, \"annotators\": mapped_annotators, \"configs\": mapped_configs}\n\n return ModelFactory(**model_factory_config)\n\n def add_annotator(self, name: str) -> None:\n \"\"\"\n Adds an annotator to the processor.\n\n Args:\n name (str): The alias of the annotator to add.\n\n Returns:\n None\n\n Raises:\n Exception: If there is an error creating the annotator.\n \"\"\"\n try:\n annotator = create_annotator(name, self.model_factory)\n log.info(\n f\"Added {type(annotator).__name__} to processor with config {self.model_factory.configs.get(name)}\"\n )\n except Exception as e:\n raise Exception(f\"Error creating annotator: {e}\")\n\n self.annotators.append(annotator)\n\n def remove_annotator(self, name: str) -> None:\n \"\"\"\n Removes an annotator from the processor.\n\n Args:\n name (str): The alias of the annotator to remove.\n\n Returns:\n None\n \"\"\"\n annotator_found = False\n annotator_name = self.model_factory.annotators[name]\n\n for annotator in self.annotators:\n if type(annotator).__name__ == annotator_name.__name__:\n self.annotators.remove(annotator)\n annotator_found = True\n log.info(f\"Removed {type(annotator).__name__} from processor\")\n 
break\n\n if not annotator_found:\n log.warning(f\"Annotator {type(name).__name__} not found in processor\")\n\n def print_model_cards(self) -> None:\n \"\"\"\n Prints the model cards for each annotator in the `annotators` list.\n\n Each model card includes the name of the annotator's class and its category.\n \"\"\"\n for annotator in self.annotators:\n print(f\"{type(annotator).__name__}: {annotator.cat}\")\n\n def process(self, note: Note, record_concepts: Optional[List[Concept]] = None) -> List[Concept]:\n \"\"\"\n Process the given note and extract concepts using the loaded annotators.\n\n Args:\n note (Note): The note to be processed.\n record_concepts (Optional[List[Concept]]): A list of existing concepts in the EHR record.\n\n Returns:\n A list of extracted concepts.\n\n \"\"\"\n if not self.annotators:\n log.warning(\"No annotators loaded, use .add_annotator() to load annotators\")\n return []\n\n concepts: List[Concept] = []\n\n for annotator in self.annotators:\n log.debug(f\"Processing concepts with {type(annotator).__name__}\")\n if Category.MEDICATION in annotator.concept_types:\n detected_concepts = annotator(note, record_concepts, self.dosage_extractor)\n concepts.extend(detected_concepts)\n else:\n detected_concepts = annotator(note, record_concepts)\n concepts.extend(detected_concepts)\n\n return concepts\n\n def get_concept_dicts(\n self, note: Note, filter_uncategorized: bool = True, record_concepts: Optional[List[Concept]] = None\n ) -> List[Dict]:\n \"\"\"\n Returns concepts in dictionary format.\n\n Args:\n note (Note): Note containing text to extract concepts from.\n filter_uncategorized (bool): If True, does not return concepts where category=None. 
Default is True.\n record_concepts (Optional[List[Concept]]): List of concepts in existing record.\n\n Returns:\n Extracted concepts in JSON-compatible dictionary format.\n \"\"\"\n concepts = self.process(note, record_concepts)\n concept_list = []\n for concept in concepts:\n if filter_uncategorized and concept.category is None:\n continue\n concept_dict = concept.__dict__\n if concept.dosage is not None:\n concept_dict[\"dosage\"] = {\n \"dose\": concept.dosage.dose.dict() if concept.dosage.dose else None,\n \"duration\": concept.dosage.duration.dict() if concept.dosage.duration else None,\n \"frequency\": concept.dosage.frequency.dict() if concept.dosage.frequency else None,\n \"route\": concept.dosage.route.dict() if concept.dosage.route else None,\n }\n if concept.meta is not None:\n meta_anns = []\n for meta in concept.meta:\n meta_dict = meta.__dict__\n meta_dict[\"value\"] = meta.value.name\n meta_anns.append(meta_dict)\n concept_dict[\"meta\"] = meta_anns\n if concept.category is not None:\n concept_dict[\"category\"] = concept.category.name\n concept_list.append(concept_dict)\n\n return concept_list\n
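From the `_load_model_factory` listing above, `config.yaml` is read for up to three top-level keys: `models`, `annotators`, and `general`. The sketch below mirrors that layout as a Python dictionary and reproduces only the alias-to-model matching step. The model ids, aliases, and the `map_models` helper are made up for illustration and are not the project's shipped configuration.

```python
from typing import Any, Dict

# Hypothetical parsed config.yaml; ids and aliases are illustrative only
config: Dict[str, Dict[str, Any]] = {
    'models': {'problems': 'model-id-1', 'meds/allergies': 'model-id-2'},
    'annotators': {'problems': 'ProblemsAnnotator', 'meds/allergies': 'MedsAllergiesAnnotator'},
    'general': {'problems': {'disable': []}},
}

# Stand-in for the {id: model} dictionary built from the model pack files
loaded_models: Dict[str, Any] = {'model-id-1': 'problems model object'}


def map_models(config: Dict[str, Dict[str, Any]], loaded: Dict[str, Any]) -> Dict[str, Any]:
    # Keep only the aliases whose configured id matches a loaded model
    mapped = {}
    for alias, model_id in config.get('models', {}).items():
        model = loaded.get(model_id)
        if model is None:
            continue  # the listing logs a warning and skips the alias here
        mapped[alias] = model
    return mapped


mapped = map_models(config, loaded_models)
print(mapped)
```

An alias whose model id has no matching model pack is dropped from the mapping rather than raising an error.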
"},{"location":"api-reference/noteprocessor/#miade.core.NoteProcessor.add_annotator","title":"add_annotator(name)
","text":"Adds an annotator to the processor.
Parameters:
Name Type Description Default
name
str
The alias of the annotator to add.
required
Returns:
Type Description
None
None
Raises:
Type Description
Exception
If there is an error creating the annotator.
Source code in src/miade/core.py
def add_annotator(self, name: str) -> None:\n    \"\"\"\n    Adds an annotator to the processor.\n\n    Args:\n        name (str): The alias of the annotator to add.\n\n    Returns:\n        None\n\n    Raises:\n        Exception: If there is an error creating the annotator.\n    \"\"\"\n    try:\n        annotator = create_annotator(name, self.model_factory)\n        log.info(\n            f\"Added {type(annotator).__name__} to processor with config {self.model_factory.configs.get(name)}\"\n        )\n    except Exception as e:\n        raise Exception(f\"Error creating annotator: {e}\")\n\n    self.annotators.append(annotator)\n
"},{"location":"api-reference/noteprocessor/#miade.core.NoteProcessor.get_concept_dicts","title":"get_concept_dicts(note, filter_uncategorized=True, record_concepts=None)
","text":"Returns concepts in dictionary format.
Parameters:
Name Type Description Default
note
Note
Note containing text to extract concepts from.
required
filter_uncategorized
bool
If True, does not return concepts where category=None. Default is True.
True
record_concepts
Optional[List[Concept]]
List of concepts in existing record.
None
Returns:
Type Description
List[Dict]
Extracted concepts in JSON-compatible dictionary format.
Source code in src/miade/core.py
def get_concept_dicts(\n    self, note: Note, filter_uncategorized: bool = True, record_concepts: Optional[List[Concept]] = None\n) -> List[Dict]:\n    \"\"\"\n    Returns concepts in dictionary format.\n\n    Args:\n        note (Note): Note containing text to extract concepts from.\n        filter_uncategorized (bool): If True, does not return concepts where category=None. Default is True.\n        record_concepts (Optional[List[Concept]]): List of concepts in existing record.\n\n    Returns:\n        Extracted concepts in JSON-compatible dictionary format.\n    \"\"\"\n    concepts = self.process(note, record_concepts)\n    concept_list = []\n    for concept in concepts:\n        if filter_uncategorized and concept.category is None:\n            continue\n        concept_dict = concept.__dict__\n        if concept.dosage is not None:\n            concept_dict[\"dosage\"] = {\n                \"dose\": concept.dosage.dose.dict() if concept.dosage.dose else None,\n                \"duration\": concept.dosage.duration.dict() if concept.dosage.duration else None,\n                \"frequency\": concept.dosage.frequency.dict() if concept.dosage.frequency else None,\n                \"route\": concept.dosage.route.dict() if concept.dosage.route else None,\n            }\n        if concept.meta is not None:\n            meta_anns = []\n            for meta in concept.meta:\n                meta_dict = meta.__dict__\n                meta_dict[\"value\"] = meta.value.name\n                meta_anns.append(meta_dict)\n            concept_dict[\"meta\"] = meta_anns\n        if concept.category is not None:\n            concept_dict[\"category\"] = concept.category.name\n        concept_list.append(concept_dict)\n\n    return concept_list\n
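One detail of the listing above is worth knowing: in Python, `obj.__dict__` is the object's live attribute dictionary rather than a copy, so writing serialized values into it also changes the object. The sketch below shows a copy-based variant of the enum-to-name step. `FakeConcept`, the one-member `Category` enum, and `to_dict` are hypothetical stand-ins for illustration, not MiADE's classes.

```python
from enum import Enum
from typing import Any, Dict, Optional


class Category(Enum):
    # Hypothetical stand-in for MiADE's Category enum
    PROBLEM = 1


class FakeConcept:
    # Hypothetical stand-in for a Concept; only the fields used here
    def __init__(self, name: str, category: Optional[Category]):
        self.name = name
        self.category = category


def to_dict(concept: FakeConcept) -> Dict[str, Any]:
    # Copy the attribute dictionary so the original object is left untouched
    data = dict(vars(concept))
    if data['category'] is not None:
        # Replace the enum member with its name so the result is JSON-friendly
        data['category'] = data['category'].name
    return data


concept = FakeConcept('asthma', Category.PROBLEM)
as_dict = to_dict(concept)
print(as_dict)
```

After the call, `concept.category` is still the enum member, so the same object can be serialized again or passed on to other code.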
"},{"location":"api-reference/noteprocessor/#miade.core.NoteProcessor.print_model_cards","title":"print_model_cards()
","text":"Prints the model cards for each annotator in the annotators
list.
Each model card includes the name of the annotator's class and its category.
Source code in src/miade/core.py
def print_model_cards(self) -> None:\n    \"\"\"\n    Prints the model cards for each annotator in the `annotators` list.\n\n    Each model card includes the name of the annotator's class and its category.\n    \"\"\"\n    for annotator in self.annotators:\n        print(f\"{type(annotator).__name__}: {annotator.cat}\")\n
"},{"location":"api-reference/noteprocessor/#miade.core.NoteProcessor.process","title":"process(note, record_concepts=None)
","text":"Process the given note and extract concepts using the loaded annotators.
Parameters:
Name Type Description Default
note
Note
The note to be processed.
required
record_concepts
Optional[List[Concept]]
A list of existing concepts in the EHR record.
None
Returns:
Type Description
List[Concept]
A list of extracted concepts.
Source code in src/miade/core.py
def process(self, note: Note, record_concepts: Optional[List[Concept]] = None) -> List[Concept]:\n    \"\"\"\n    Process the given note and extract concepts using the loaded annotators.\n\n    Args:\n        note (Note): The note to be processed.\n        record_concepts (Optional[List[Concept]]): A list of existing concepts in the EHR record.\n\n    Returns:\n        A list of extracted concepts.\n\n    \"\"\"\n    if not self.annotators:\n        log.warning(\"No annotators loaded, use .add_annotator() to load annotators\")\n        return []\n\n    concepts: List[Concept] = []\n\n    for annotator in self.annotators:\n        log.debug(f\"Processing concepts with {type(annotator).__name__}\")\n        if Category.MEDICATION in annotator.concept_types:\n            detected_concepts = annotator(note, record_concepts, self.dosage_extractor)\n            concepts.extend(detected_concepts)\n        else:\n            detected_concepts = annotator(note, record_concepts)\n            concepts.extend(detected_concepts)\n\n    return concepts\n
"},{"location":"api-reference/noteprocessor/#miade.core.NoteProcessor.remove_annotator","title":"remove_annotator(name)
","text":"Removes an annotator from the processor.
Parameters:
Name Type Description Default
name
str
The alias of the annotator to remove.
required
Returns:
Type Description
None
None
Source code in src/miade/core.py
def remove_annotator(self, name: str) -> None:\n    \"\"\"\n    Removes an annotator from the processor.\n\n    Args:\n        name (str): The alias of the annotator to remove.\n\n    Returns:\n        None\n    \"\"\"\n    annotator_found = False\n    annotator_name = self.model_factory.annotators[name]\n\n    for annotator in self.annotators:\n        if type(annotator).__name__ == annotator_name.__name__:\n            self.annotators.remove(annotator)\n            annotator_found = True\n            log.info(f\"Removed {type(annotator).__name__} from processor\")\n            break\n\n    if not annotator_found:\n        log.warning(f\"Annotator {type(name).__name__} not found in processor\")\n
"},{"location":"api-reference/problemsannotator/","title":"ProblemsAnnotator","text":" Bases: Annotator
Annotator class for identifying and processing problems in medical notes.
This class extends the base Annotator class and provides specific functionality for identifying and processing problems in medical notes. It implements methods for loading problem lookup data, processing meta annotations, filtering concepts, and post-processing the annotated concepts.
Attributes:
Name Type Description
cat
CAT
The CAT (Concept Annotation Tool) instance used for annotation.
config
AnnotatorConfig
The configuration object for the annotator.
Properties
concept_types (list): A list of concept types supported by this annotator.
pipeline (list): The list of processing steps in the annotation pipeline.
Source code in src/miade/annotators.py
class ProblemsAnnotator(Annotator):\n \"\"\"\n Annotator class for identifying and processing problems in medical notes.\n\n This class extends the base `Annotator` class and provides specific functionality\n for identifying and processing problems in medical notes. It implements methods\n for loading problem lookup data, processing meta annotations, filtering concepts,\n and post-processing the annotated concepts.\n\n Attributes:\n cat (CAT): The CAT (Concept Annotation Tool) instance used for annotation.\n config (AnnotatorConfig): The configuration object for the annotator.\n\n Properties:\n concept_types (list): A list of concept types supported by this annotator.\n pipeline (list): The list of processing steps in the annotation pipeline.\n \"\"\"\n\n def __init__(self, cat: CAT, config: AnnotatorConfig = None):\n super().__init__(cat, config)\n self._load_problems_lookup_data()\n\n @property\n def concept_types(self) -> List[Category]:\n \"\"\"\n Get the list of concept types supported by this annotator.\n\n Returns:\n [Category.PROBLEM]\n \"\"\"\n return [Category.PROBLEM]\n\n @property\n def pipeline(self) -> List[str]:\n \"\"\"\n Get the list of processing steps in the annotation pipeline.\n\n Returns:\n [\"preprocessor\", \"medcat\", \"paragrapher\", \"postprocessor\", \"deduplicator\"]\n \"\"\"\n return [\"preprocessor\", \"medcat\", \"paragrapher\", \"postprocessor\", \"deduplicator\"]\n\n def _load_problems_lookup_data(self) -> None:\n \"\"\"\n Load the problem lookup data.\n\n Raises:\n RuntimeError: If the lookup data directory does not exist.\n \"\"\"\n if not os.path.isdir(self.config.lookup_data_path):\n raise RuntimeError(f\"No lookup data configured: {self.config.lookup_data_path} does not exist!\")\n else:\n self.negated_lookup = load_lookup_data(self.config.lookup_data_path + \"negated.csv\", as_dict=True)\n self.historic_lookup = load_lookup_data(self.config.lookup_data_path + \"historic.csv\", as_dict=True)\n self.suspected_lookup = 
load_lookup_data(self.config.lookup_data_path + \"suspected.csv\", as_dict=True)\n self.filtering_blacklist = load_lookup_data(\n self.config.lookup_data_path + \"problem_blacklist.csv\", no_header=True\n )\n\n def _process_meta_annotations(self, concept: Concept) -> Optional[Concept]:\n \"\"\"\n Process the meta annotations for a concept.\n\n Args:\n concept (Concept): The concept to process.\n\n Returns:\n The processed concept, or None if it should be removed.\n\n Raises:\n ValueError: If the concept has an invalid negex value.\n \"\"\"\n # Add, convert, or ignore concepts\n meta_ann_values = [meta_ann.value for meta_ann in concept.meta] if concept.meta is not None else []\n\n convert = False\n tag = \"\"\n # only get meta model results if negex is false\n if concept.negex is not None:\n if concept.negex:\n convert = self.negated_lookup.get(int(concept.id), False)\n tag = \" (negated)\"\n elif Presence.SUSPECTED in meta_ann_values:\n convert = self.suspected_lookup.get(int(concept.id), False)\n tag = \" (suspected)\"\n elif Relevance.HISTORIC in meta_ann_values:\n convert = self.historic_lookup.get(int(concept.id), False)\n tag = \" (historic)\"\n else:\n if Presence.NEGATED in meta_ann_values:\n convert = self.negated_lookup.get(int(concept.id), False)\n tag = \" (negated)\"\n elif Presence.SUSPECTED in meta_ann_values:\n convert = self.suspected_lookup.get(int(concept.id), False)\n tag = \" (suspected)\"\n elif Relevance.HISTORIC in meta_ann_values:\n convert = self.historic_lookup.get(int(concept.id), False)\n tag = \" (historic)\"\n\n if convert:\n if tag == \" (negated)\" and concept.negex:\n log.debug(\n f\"Converted concept ({concept.id} | {concept.name}) to ({str(convert)} | {concept.name + tag}): \"\n f\"negation detected by negex\"\n )\n else:\n log.debug(\n f\"Converted concept ({concept.id} | {concept.name}) to ({str(convert)} | {concept.name + tag}):\"\n f\"detected by meta model\"\n )\n concept.id = str(convert)\n concept.name += tag\n else:\n if 
concept.negex:\n log.debug(f\"Removed concept ({concept.id} | {concept.name}): negation (negex) with no conversion match\")\n return None\n if concept.negex is None and Presence.NEGATED in meta_ann_values:\n log.debug(\n f\"Removed concept ({concept.id} | {concept.name}): negation (meta model) with no conversion match\"\n )\n return None\n if Presence.SUSPECTED in meta_ann_values:\n log.debug(f\"Removed concept ({concept.id} | {concept.name}): suspected with no conversion match\")\n return None\n if Relevance.IRRELEVANT in meta_ann_values:\n log.debug(f\"Removed concept ({concept.id} | {concept.name}): irrelevant concept\")\n return None\n if Relevance.HISTORIC in meta_ann_values:\n log.debug(f\"No change to concept ({concept.id} | {concept.name}): historic with no conversion match\")\n\n concept.category = Category.PROBLEM\n\n return concept\n\n def _is_blacklist(self, concept):\n \"\"\"\n Check if a concept is in the filtering blacklist.\n\n Args:\n concept: The concept to check.\n\n Returns:\n True if the concept is in the blacklist, False otherwise.\n \"\"\"\n # filtering blacklist\n if int(concept.id) in self.filtering_blacklist.values:\n log.debug(f\"Removed concept ({concept.id} | {concept.name}): concept in problems blacklist\")\n return True\n return False\n\n def _process_meta_ann_by_paragraph(\n self, concept: Concept, paragraph: Paragraph, prob_concepts_in_structured_sections: List[Concept]\n ):\n \"\"\"\n Process the meta annotations for a concept based on the paragraph type.\n\n Args:\n concept (Concept): The concept to process.\n paragraph (Paragraph): The paragraph containing the concept.\n prob_concepts_in_structured_sections (List[Concept]): The list of problem concepts in structured sections.\n \"\"\"\n # if paragraph is structured problems section, add to prob list and convert to corresponding relevance\n if paragraph.type in self.structured_prob_lists:\n prob_concepts_in_structured_sections.append(concept)\n for meta in concept.meta:\n if 
meta.name == \"relevance\" and meta.value == Relevance.IRRELEVANT:\n new_relevance = self.structured_prob_lists[paragraph.type]\n log.debug(\n f\"Converted {meta.value} to \"\n f\"{new_relevance} for concept ({concept.id} | {concept.name}): \"\n f\"paragraph is {paragraph.type}\"\n )\n meta.value = new_relevance\n # if paragraph is meds or irrelevant section, convert problems to irrelevant\n elif paragraph.type in self.structured_med_lists or paragraph.type in self.irrelevant_paragraphs:\n for meta in concept.meta:\n if meta.name == \"relevance\" and meta.value != Relevance.IRRELEVANT:\n log.debug(\n f\"Converted {meta.value} to \"\n f\"{Relevance.IRRELEVANT} for concept ({concept.id} | {concept.name}): \"\n f\"paragraph is {paragraph.type}\"\n )\n meta.value = Relevance.IRRELEVANT\n\n def process_paragraphs(self, note: Note, concepts: List[Concept]) -> List[Concept]:\n \"\"\"\n Process the paragraphs in a note and filter the concepts.\n\n Args:\n note (Note): The note to process.\n concepts (List[Concept]): The list of concepts to filter.\n\n Returns:\n The filtered list of concepts.\n \"\"\"\n prob_concepts_in_structured_sections: List[Concept] = []\n\n for paragraph in note.paragraphs:\n for concept in concepts:\n if concept.start >= paragraph.start and concept.end <= paragraph.end:\n # log.debug(f\"({concept.name} | {concept.id}) is in {paragraph.type}\")\n if concept.meta:\n self._process_meta_ann_by_paragraph(concept, paragraph, prob_concepts_in_structured_sections)\n\n # if more than set no. 
concepts in prob or imp or pmh sections, return only those and ignore all other concepts\n if len(prob_concepts_in_structured_sections) > self.config.structured_list_limit:\n log.debug(\n f\"Ignoring concepts elsewhere in the document because \"\n f\"more than {self.config.structured_list_limit} concepts exist \"\n f\"in prob, imp, pmh structured sections: {len(prob_concepts_in_structured_sections)}\"\n )\n return prob_concepts_in_structured_sections\n\n return concepts\n\n def postprocess(self, concepts: List[Concept]) -> List[Concept]:\n \"\"\"\n Post-process the concepts and filter out irrelevant concepts.\n\n Args:\n concepts (List[Concept]): The list of concepts to post-process.\n\n Returns:\n The filtered list of concepts.\n \"\"\"\n # deepcopy so we still have reference to original list of concepts\n all_concepts = deepcopy(concepts)\n filtered_concepts = []\n for concept in all_concepts:\n if self._is_blacklist(concept):\n continue\n # meta annotations\n concept = self._process_meta_annotations(concept)\n # ignore concepts filtered by meta-annotations\n if concept is None:\n continue\n filtered_concepts.append(concept)\n\n return filtered_concepts\n
"},{"location":"api-reference/problemsannotator/#miade.annotators.ProblemsAnnotator.concept_types","title":"concept_types: List[Category]
property
","text":"Get the list of concept types supported by this annotator.
Returns:
Type DescriptionList[Category]
[Category.PROBLEM]
"},{"location":"api-reference/problemsannotator/#miade.annotators.ProblemsAnnotator.pipeline","title":"pipeline: List[str]
property
","text":"Get the list of processing steps in the annotation pipeline.
Returns:
Type DescriptionList[str]
[\"preprocessor\", \"medcat\", \"paragrapher\", \"postprocessor\", \"deduplicator\"]
"},{"location":"api-reference/problemsannotator/#miade.annotators.ProblemsAnnotator.postprocess","title":"postprocess(concepts)
","text":"Post-process the concepts and filter out irrelevant concepts.
Parameters:
Name Type Description Defaultconcepts
List[Concept]
The list of concepts to post-process.
requiredReturns:
Type DescriptionList[Concept]
The filtered list of concepts.
Source code insrc/miade/annotators.py
def postprocess(self, concepts: List[Concept]) -> List[Concept]:\n \"\"\"\n Post-process the concepts and filter out irrelevant concepts.\n\n Args:\n concepts (List[Concept]): The list of concepts to post-process.\n\n Returns:\n The filtered list of concepts.\n \"\"\"\n # deepcopy so we still have reference to original list of concepts\n all_concepts = deepcopy(concepts)\n filtered_concepts = []\n for concept in all_concepts:\n if self._is_blacklist(concept):\n continue\n # meta annotations\n concept = self._process_meta_annotations(concept)\n # ignore concepts filtered by meta-annotations\n if concept is None:\n continue\n filtered_concepts.append(concept)\n\n return filtered_concepts\n
"},{"location":"api-reference/problemsannotator/#miade.annotators.ProblemsAnnotator.process_paragraphs","title":"process_paragraphs(note, concepts)
","text":"Process the paragraphs in a note and filter the concepts.
Parameters:
Name Type Description Defaultnote
Note
The note to process.
requiredconcepts
List[Concept]
The list of concepts to filter.
requiredReturns:
Type DescriptionList[Concept]
The filtered list of concepts.
Source code insrc/miade/annotators.py
def process_paragraphs(self, note: Note, concepts: List[Concept]) -> List[Concept]:\n \"\"\"\n Process the paragraphs in a note and filter the concepts.\n\n Args:\n note (Note): The note to process.\n concepts (List[Concept]): The list of concepts to filter.\n\n Returns:\n The filtered list of concepts.\n \"\"\"\n prob_concepts_in_structured_sections: List[Concept] = []\n\n for paragraph in note.paragraphs:\n for concept in concepts:\n if concept.start >= paragraph.start and concept.end <= paragraph.end:\n # log.debug(f\"({concept.name} | {concept.id}) is in {paragraph.type}\")\n if concept.meta:\n self._process_meta_ann_by_paragraph(concept, paragraph, prob_concepts_in_structured_sections)\n\n # if more than set no. concepts in prob or imp or pmh sections, return only those and ignore all other concepts\n if len(prob_concepts_in_structured_sections) > self.config.structured_list_limit:\n log.debug(\n f\"Ignoring concepts elsewhere in the document because \"\n f\"more than {self.config.structured_list_limit} concepts exist \"\n f\"in prob, imp, pmh structured sections: {len(prob_concepts_in_structured_sections)}\"\n )\n return prob_concepts_in_structured_sections\n\n return concepts\n
"},{"location":"user-guide/configuration/","title":"Configurations","text":""},{"location":"user-guide/configuration/#annotator","title":"Annotator","text":"The MiADE processor is configured by a yaml
file that maps a human-readable key for each of your models to a MedCAT model ID and a MiADE annotator class. The config file must be in the same folder as the MedCAT models.
models: The models section maps a human-readable key to the MedCAT model ID to use in MiADE
annotators: The annotators section maps a human-readable key to the Annotator processing class to use in MiADE
general:
  lookup_data_path: Specifies the lookup data to use
  negation_detection: negex (rule-based algorithm) or None (use default MetaCAT models)
  structured_list_limit: Specifies the maximum number of concepts detected in a structured paragraph section. If more than this number of concepts are detected, concepts in prose are ignored (to avoid returning too many concepts, which could be less relevant). Default 0, so this feature is disabled by default.
  disable: Disables specific pipeline components; the API here is similar to spaCy pipelines
  add_numbering: Option to add a number prefix to the concept display names e.g. \"01 Diabetes\"
models:\n problems: f25ec9423958e8d6\n meds/allergies: a146c741501cf1f7\nannotators:\n problems: ProblemsAnnotator\n meds/allergies: MedsAllergiesAnnotator\ngeneral:\n problems:\n lookup_data_path: ./lookup_data/\n negation_detection: None\n structured_list_limit: 0 # if more than this number of concepts in structure section, ignore concepts in prose\n disable: []\n add_numbering: True\n meds/allergies:\n lookup_data_path: ./lookup_data/\n negation_detection: None\n disable: []\n add_numbering: False\n
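As a rough illustration of structured_list_limit, the sketch below mirrors the check made in ProblemsAnnotator.process_paragraphs: once the number of concepts found in structured sections exceeds the limit, only those concepts are returned. The function name and the plain strings standing in for Concept objects are assumptions made for illustration and are not part of the MiADE API.

```python
from typing import List


def apply_structured_list_limit(
    all_concepts: List[str],
    structured_concepts: List[str],
    limit: int,
) -> List[str]:
    # Hypothetical helper: return only the structured-section concepts
    # once their count exceeds the configured limit.
    if len(structured_concepts) > limit:
        return structured_concepts
    # Otherwise keep every concept, including those found in prose.
    return all_concepts


all_concepts = ['heart failure', 'hypothyroidism', 'myocardial infarction', 'swollen ankle']
structured = ['hypothyroidism', 'myocardial infarction']

print(apply_structured_list_limit(all_concepts, structured, limit=1))
print(apply_structured_list_limit(all_concepts, structured, limit=5))
```

With a limit of 1 the two structured concepts win and the prose concepts are dropped; with a limit of 5 the full list is returned unchanged.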
"},{"location":"user-guide/configuration/#lookup-table","title":"Lookup Table","text":"Lookup tables are not packaged with the main MiADE package by default, which leaves you free to customise the postprocessing steps. We provide example lookup data in miade-datasets
which you can download and use.
git clone https://github.com/uclh-criu/miade-datasets.git\n
"},{"location":"user-guide/quickstart/","title":"Quickstart","text":""},{"location":"user-guide/quickstart/#extract-concepts-and-dosages-from-a-note-using-miade","title":"Extract concepts and dosages from a Note using MiADE","text":""},{"location":"user-guide/quickstart/#configuring-the-miade-processor","title":"Configuring the MiADE Processor","text":"NoteProcessor
is the MiADE core. It is initialised with a model directory path that contains all the MedCAT model pack .zip files we would like to use in our pipeline, and a config file that maps an alias to the model IDs (model IDs can be found in MedCAT model_cards
or usually will be in the name) and annotators we would like to use:
config.yaml
models:\n problems: f25ec9423958e8d6\n meds/allergies: a146c741501cf1f7\nannotators:\n problems: ProblemsAnnotator\n meds/allergies: MedsAllergiesAnnotator\n
We can initialise a MiADE NoteProcessor object by passing in the model directory which contains our MedCAT models and config.yaml file:
miade = NoteProcessor(Path(\"path/to/model/dir\"))\n
Once NoteProcessor is initialised, we can add annotators by the aliases we have specified in config.yaml to our processor:
miade.add_annotator(\"problems\", use_negex=True)\nmiade.add_annotator(\"meds/allergies\")\n
When adding annotators, we have the option to add NegSpacy to the MedCAT spaCy pipeline, which implements the NegEx algorithm (Chapman et al. 2001) for negation detection. This allows the models to perform simple rule-based negation detection in the absence of MetaCAT models.
"},{"location":"user-guide/quickstart/#creating-a-note","title":"Creating a Note","text":"Create a Note
object which contains the text we would like to extract concepts and dosages from:
text = \"\"\"\nSuspected heart failure\n\nPMH:\nprev history of Hypothyroidism\nMI 10 years ago\n\n\nCurrent meds:\nLosartan 100mg daily\nAtorvastatin 20mg daily\nParacetamol 500mg tablets 2 tabs qds prn\n\nAllergies:\nPenicillin - rash\n\nReferred with swollen ankles and shortness of breath since 2 weeks.\n\"\"\"\n\nnote = Note(text)\n
"},{"location":"user-guide/quickstart/#extracting-concepts-and-dosages","title":"Extracting Concepts and Dosages","text":"MiADE currently extracts concepts in SNOMED CT. Each concept contains:
name: name of concept
id: concept ID
category: type of concept, e.g. problems, medications
start: start index of concept span
end: end index of concept span
dosage: for medication concepts
negex: NegEx result if configured
meta: meta annotations if MetaCAT models are used
The dosages associated with medication concepts are extracted by the built-in MiADE DosageExtractor, using a combination of the NER model Med7 and the CALIBER rule-based drug dose lookup algorithm. It returns:
dose
duration
frequency
route
The output format is directly translatable to HL7 CDA but can also easily be converted to FHIR.
Putting it all together, we can now extract concepts from our Note
object:
concepts = miade.process(note)\nfor concept in concepts:\n print(concept)\n\n# {name: breaking out - eruption, id: 271807003, category: Category.REACTION, start: 204, end: 208, dosage: None, negex: False, meta: None} \n# {name: penicillin, id: 764146007, category: Category.ALLERGY, start: 191, end: 201, dosage: None, negex: False, meta: None} \n
concepts = miade.get_concept_dicts(note)\nprint(concepts)\n\n# [{'name': 'hypothyroidism (historic)',\n# 'id': '161443002',\n# 'category': 'PROBLEM',\n# 'start': 46,\n# 'end': 60,\n# 'dosage': None,\n# 'negex': False,\n# 'meta': [{'name': 'relevance',\n# 'value': 'HISTORIC',\n# 'confidence': 0.999841570854187},\n# ...\n
"},{"location":"user-guide/quickstart/#handling-existing-records-deduplication","title":"Handling existing records: deduplication","text":"MiADE is built to handle existing medication records from EHR systems that can be sent alongside the note. It will perform basic deduplication matching on id for existing record concepts.
# create list of concepts that already exists in patient record\nrecord_concepts = [\n Concept(id=\"161443002\", name=\"hypothyroidism (historic)\", category=Category.PROBLEM),\n Concept(id=\"267039000\", name=\"swollen ankle\", category=Category.PROBLEM)\n]\n
We can pass in a list of existing concepts from the EHR to MiADE at runtime:
miade.process(note=note, record_concepts=record_concepts)\n
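To make the matching concrete, here is a minimal sketch of deduplication by id, assuming a simplified stand-in for Concept. SimpleConcept and deduplicate_by_id are hypothetical names used only for illustration; the real behaviour lives in the annotator's deduplicate step and may differ in detail.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SimpleConcept:
    # Hypothetical stand-in for miade's Concept, holding only the fields used here.
    id: str
    name: str


def deduplicate_by_id(
    concepts: List[SimpleConcept],
    record_concepts: Optional[List[SimpleConcept]] = None,
) -> List[SimpleConcept]:
    # Ids already present in the patient record.
    record_ids = {c.id for c in (record_concepts or [])}
    seen_ids = set()
    kept = []
    for concept in concepts:
        # Drop concepts already in the record or already kept from this note.
        if concept.id in record_ids or concept.id in seen_ids:
            continue
        seen_ids.add(concept.id)
        kept.append(concept)
    return kept


record = [
    SimpleConcept('161443002', 'hypothyroidism (historic)'),
    SimpleConcept('267039000', 'swollen ankle'),
]
extracted = [
    SimpleConcept('161443002', 'hypothyroidism (historic)'),
    SimpleConcept('764146007', 'penicillin'),
    SimpleConcept('764146007', 'penicillin'),
]

for concept in deduplicate_by_id(extracted, record):
    print(concept.id, concept.name)
```

Only the penicillin concept survives, and only once: the hypothyroidism concept matches the record, and the second penicillin mention repeats an id already kept.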
"},{"location":"user-guide/quickstart/#customising-miade","title":"Customising MiADE","text":""},{"location":"user-guide/quickstart/#training-custom-medcat-models","title":"Training Custom MedCAT Models","text":"MiADE provides command line interface scripts for automatically building MedCAT model packs, unsupervised training, supervised training steps, and the creation and training of MetaCAT models. For more information on MedCAT models, see MedCAT documentation and paper.
The --synthetic-data-path
option allows you to add synthetically generated training data in CSV format to the supervised and MetaCAT training steps. The CSV should have the following format:
# Trains unsupervised training step of MedCAT model\nmiade train $MODEL_PACK_PATH $TEXT_DATA_PATH --tag \"miade-example\"\n
# Trains supervised training step of MedCAT model\nmiade train-supervised $MODEL_PACK_PATH $MEDCAT_JSON_EXPORT --synthetic-data-path $SYNTHETIC_CSV_PATH\n
# Creates BBPE tokenizer for MetaCAT\nmiade create-bbpe-tokenizer $TEXT_DATA_PATH\n
# Initialises MetaCAT models to do training on\nmiade create-metacats $TOKENIZER_PATH $CATEGORY_NAMES\n
# Trains the MetaCAT Bi-LSTM models\nmiade train-metacats $METACAT_MODEL_PATH $MEDCAT_JSON_EXPORT --synthetic-data-path $SYNTHETIC_CSV_PATH\n
# Packages MetaCAT models with the main MedCAT model pack\nmiade add_metacat_models $MODEL_PACK_PATH $METACAT_MODEL_PATH\n
"},{"location":"user-guide/quickstart/#creating-custom-miade-annotators","title":"Creating Custom MiADE Annotators","text":"We can add custom annotators with more specialised postprocessing steps to MiADE by subclassing Annotator
and initialising NoteProcessor
with a list of custom annotators.
Annotator methods include:
.get_concepts(): returns MedCAT output as MiADE Concepts
.add_dosages_to_concepts(): uses the MiADE built-in DosageExtractor to add dosages associated with medication concepts
.deduplicate(): filters duplicate concepts in list
An example custom Annotator class might look like this:
class CustomAnnotator(Annotator):\n def __init__(self, cat: MiADE_CAT):\n super().__init__(cat)\n # we need to include MEDICATIONS in concept types so MiADE processor will also extract dosages\n self.concept_types = [Category.MEDICATION, Category.ALLERGY]\n\n def postprocess(self, concepts: List[Concept]) -> List[Concept]:\n # some example post-processing code\n reactions = [\"271807003\"]\n allergens = [\"764146007\"]\n for concept in concepts:\n if concept.id in reactions:\n concept.category = Category.REACTION\n elif concept.id in allergens:\n concept.category = Category.ALLERGY\n return concepts\n\n def __call__(\n self,\n note: Note,\n record_concepts: Optional[List[Concept]] = None,\n dosage_extractor: Optional[DosageExtractor] = None,\n ):\n concepts = self.get_concepts(note)\n concepts = self.postprocess(concepts)\n # run dosage extractor if given\n if dosage_extractor is not None:\n concepts = self.add_dosages_to_concepts(dosage_extractor, concepts, note)\n concepts = self.deduplicate(concepts, record_concepts)\n\n return concepts\n
Add custom annotator to config file:
config.yaml
models:\n problems: f25ec9423958e8d6\n meds/allergies: a146c741501cf1f7\n custom: a146c741501cf1f7\nannotators:\n problems: ProblemsAnnotator\n meds/allergies: MedsAllergiesAnnotator\n custom: CustomAnnotator\n
Initialise MiADE with the custom annotator:
miade = NoteProcessor(Path(MODEL_DIR), custom_annotators=[CustomAnnotator])\nmiade.add_annotator(\"custom\")\n
"}]}
\ No newline at end of file
+{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Welcome to the MiADE Documentation","text":"MiADE (Medical information AI Data Extractor) is a set of tools for extracting formattable data from clinical notes stored in electronic health record systems (EHRs). Built with Cogstack's MedCAT package.
"},{"location":"#installing","title":"Installing","text":"To install MiADE, you need to download the spacy base model and Med7 model first:
pip install https://huggingface.co/kormilitzin/en_core_med7_lg/resolve/main/en_core_med7_lg-any-py3-none-any.whl\npython -m spacy download en_core_web_md\n
Then, install MiADE:
pip install miade\n
"},{"location":"#license","title":"License","text":"MiADE is licensed under Elastic License 2.0.
The Elastic License 2.0 is a flexible license that allows you to use, copy, distribute, make available, and prepare derivative works of the software, as long as you do not provide the software to others as a managed service or include it in a free software directory. For the full license text, see our license page.
"},{"location":"contributing/","title":"Contributing","text":"Contribute to MiADE! Contribution guide
"},{"location":"about/overview/","title":"Project Overview","text":""},{"location":"about/overview/#background","title":"Background","text":"Data about people\u2019s health stored in electronic health records (EHRs) can play an important role in improving the quality of patient care. Much of the information in EHRs is recorded in ordinary language without any restriction on format ('free text'), as this is the natural way in which people communicate. However, if this information were stored in a standardised, structured format, computers would also be able to process the information to help clinicians find and interpret information for better and safer decision making. This would enable EHR systems such as Epic, the system in place at UCLH since April 2019, to support clinical decision making. For instance, the system may be able to ensure that a patient is not prescribed medicine that would give them an allergic reaction.
"},{"location":"about/overview/#the-challenge","title":"The challenge","text":"Free text may contain words and abbreviations which may be interpreted in more than one way, such as 'HR', which can mean 'Hour' or 'Heart Rate'. Free text may also contain negations; for example, a diagnosis may be mentioned in the text but the rest of the sentence might say that it was ruled out. Although computers can be used to interpret free text, they cannot always get it right, so clinicians will always have to check the results to ensure patient safety. Expressing information in a structured way can avoid this problem, but has a big disadvantage - it can be time-consuming for clinicians to enter the information. This can mean that information is incomplete, or clinicians are so busy on the computer that they do not have time to listen to their patients.
"},{"location":"about/overview/#meeting-the-need","title":"Meeting the need","text":"The aim of MiADE is to develop a system to support automatic conversion of the clinician\u2019s free text into a structured format. The clinician can check the structured data immediately, before making it a formal part of the patient\u2019s record. The system will record a patient\u2019s diagnoses, medications and allergies in a structured way, using NHS-endorsed clinical data standards (e.g. FHIR and SNOMED CT). It will use a technique called Natural Language Processing (NLP). NLP has been used by research teams to extract information from existing EHRs but has rarely been used to improve the way information is entered in the first place. Our NLP system will continuously learn and improve as more text is analysed and checked by clinicians.
We will first test the system in University College London Hospitals, where a new EHR system called Epic is in place. We will study how effective it is, and how clinicians and patients find it when it is used in consultations. Based on feedback, we will make improvements and install it for testing at a second site (Great Ormond Street Hospital). Our aim is for the system to be eventually rolled out to more hospitals and doctors\u2019 surgeries across the NHS.
"},{"location":"about/team/","title":"Team","text":"The MiADE project is developed by a team of clinicians, developers, AI researchers, and data standard experts at University College London (UCL) and the University College London Hospitals (UCLH), in collaboration with the Cogstack at King's College London (KCL).
"},{"location":"api-reference/annotator/","title":"Annotator","text":" Bases: ABC
An abstract base class for annotators.
Annotators are responsible for processing medical notes and extracting relevant concepts from them.
Attributes:
Name Type Descriptioncat
CAT
The MedCAT instance used for concept extraction.
config
AnnotatorConfig
The configuration for the annotator.
Source code insrc/miade/annotators.py
class Annotator(ABC):\n \"\"\"\n An abstract base class for annotators.\n\n Annotators are responsible for processing medical notes and extracting relevant concepts from them.\n\n Attributes:\n cat (CAT): The MedCAT instance used for concept extraction.\n config (AnnotatorConfig): The configuration for the annotator.\n \"\"\"\n\n def __init__(self, cat: CAT, config: AnnotatorConfig = None):\n self.cat = cat\n self.config = config if config is not None else AnnotatorConfig()\n\n if self.config.negation_detection == \"negex\":\n self._add_negex_pipeline()\n\n # TODO make paragraph processing params configurable\n self.structured_prob_lists = {\n ParagraphType.prob: Relevance.PRESENT,\n ParagraphType.imp: Relevance.PRESENT,\n ParagraphType.pmh: Relevance.HISTORIC,\n }\n self.structured_med_lists = {\n ParagraphType.med: SubstanceCategory.TAKING,\n ParagraphType.allergy: SubstanceCategory.ADVERSE_REACTION,\n }\n self.irrelevant_paragraphs = [ParagraphType.ddx, ParagraphType.exam, ParagraphType.plan]\n\n def _add_negex_pipeline(self) -> None:\n \"\"\"\n Adds the negex pipeline to the MedCAT instance.\n \"\"\"\n self.cat.pipe.spacy_nlp.add_pipe(\"sentencizer\")\n self.cat.pipe.spacy_nlp.enable_pipe(\"sentencizer\")\n self.cat.pipe.spacy_nlp.add_pipe(\"negex\")\n\n @property\n @abstractmethod\n def concept_types(self):\n \"\"\"\n Abstract property that should return a list of concept types supported by the annotator.\n \"\"\"\n pass\n\n @property\n @abstractmethod\n def pipeline(self):\n \"\"\"\n Abstract property that should return a list of pipeline steps for the annotator.\n \"\"\"\n pass\n\n @abstractmethod\n def process_paragraphs(self):\n \"\"\"\n Abstract method that should implement the logic for processing paragraphs in a note.\n \"\"\"\n pass\n\n @abstractmethod\n def postprocess(self):\n \"\"\"\n Abstract method that should implement the logic for post-processing extracted concepts.\n \"\"\"\n pass\n\n def run_pipeline(self, note: Note, record_concepts: 
List[Concept]) -> List[Concept]:\n \"\"\"\n Runs the annotation pipeline on a given note and returns the extracted concepts.\n\n Args:\n note (Note): The input note to process.\n record_concepts (List[Concept]): The list of concepts from existing EHR records.\n\n Returns:\n The extracted concepts from the note.\n \"\"\"\n concepts: List[Concept] = []\n\n for pipe in self.pipeline:\n if pipe not in self.config.disable:\n if pipe == \"preprocessor\":\n note = self.preprocess(note)\n elif pipe == \"medcat\":\n concepts = self.get_concepts(note)\n elif pipe == \"paragrapher\":\n concepts = self.process_paragraphs(note, concepts)\n elif pipe == \"postprocessor\":\n concepts = self.postprocess(concepts)\n elif pipe == \"deduplicator\":\n concepts = self.deduplicate(concepts, record_concepts)\n\n return concepts\n\n def get_concepts(self, note: Note) -> List[Concept]:\n \"\"\"\n Extracts concepts from a note using the MedCAT instance.\n\n Args:\n note (Note): The input note to extract concepts from.\n\n Returns:\n The extracted concepts from the note.\n \"\"\"\n concepts: List[Concept] = []\n for entity in self.cat.get_entities(note)[\"entities\"].values():\n try:\n concepts.append(Concept.from_entity(entity))\n log.debug(f\"Detected concept ({concepts[-1].id} | {concepts[-1].name})\")\n except ValueError as e:\n log.warning(f\"Concept skipped: {e}\")\n\n return concepts\n\n @staticmethod\n def preprocess(note: Note) -> Note:\n \"\"\"\n Preprocesses a note by cleaning its text and splitting it into paragraphs.\n\n Args:\n note (Note): The input note to preprocess.\n\n Returns:\n The preprocessed note.\n \"\"\"\n note.clean_text()\n note.get_paragraphs()\n\n return note\n\n @staticmethod\n def deduplicate(concepts: List[Concept], record_concepts: Optional[List[Concept]]) -> List[Concept]:\n \"\"\"\n Removes duplicate concepts from the extracted concepts list by strict ID matching.\n\n Args:\n concepts (List[Concept]): The list of extracted concepts.\n record_concepts 
(Optional[List[Concept]]): The list of concepts from existing EHR records.\n\n Returns:\n The deduplicated list of concepts.\n \"\"\"\n if record_concepts is not None:\n record_ids = {record_concept.id for record_concept in record_concepts}\n record_names = {record_concept.name for record_concept in record_concepts}\n else:\n record_ids = set()\n record_names = set()\n\n # Use an OrderedDict to keep track of ids as it preserves original MedCAT order (the order it appears in text)\n filtered_concepts: List[Concept] = []\n existing_concepts = OrderedDict()\n\n # Filter concepts that are in record or exist in concept list\n for concept in concepts:\n if concept.id is not None and (concept.id in record_ids or concept.id in existing_concepts):\n log.debug(f\"Removed concept ({concept.id} | {concept.name}): concept id exists in record\")\n # check name match for null ids - VTM deduplication\n elif concept.id is None and (concept.name in record_names or concept.name in existing_concepts.values()):\n log.debug(f\"Removed concept ({concept.id} | {concept.name}): concept name exists in record\")\n else:\n filtered_concepts.append(concept)\n existing_concepts[concept.id] = concept.name\n\n return filtered_concepts\n\n @staticmethod\n def add_numbering_to_name(concepts: List[Concept]) -> List[Concept]:\n \"\"\"\n Adds numbering to the names of problem concepts to control output ordering.\n\n Args:\n concepts (List[Concept]): The list of concepts to add numbering to.\n\n Returns:\n The list of concepts with numbering added to their names.\n \"\"\"\n # Prepend numbering to problem concepts e.g.
00 asthma, 01 stroke...\n for i, concept in enumerate(concepts):\n concept.name = f\"{i:02} {concept.name}\"\n\n return concepts\n\n def __call__(\n self,\n note: Note,\n record_concepts: Optional[List[Concept]] = None,\n ) -> List[Concept]:\n \"\"\"\n Runs the annotation pipeline on a given note and returns the extracted concepts.\n\n Args:\n note (Note): The input note to process.\n record_concepts (Optional[List[Concept]]): The list of concepts from existing EHR records.\n\n Returns:\n The extracted concepts from the note.\n \"\"\"\n concepts = self.run_pipeline(note, record_concepts)\n\n if self.config.add_numbering:\n concepts = self.add_numbering_to_name(concepts)\n\n return concepts\n
"},{"location":"api-reference/annotator/#miade.annotators.Annotator.concept_types","title":"concept_types
abstractmethod
property
","text":"Abstract property that should return a list of concept types supported by the annotator.
"},{"location":"api-reference/annotator/#miade.annotators.Annotator.pipeline","title":"pipeline
abstractmethod
property
","text":"Abstract property that should return a list of pipeline steps for the annotator.
"},{"location":"api-reference/annotator/#miade.annotators.Annotator.__call__","title":"__call__(note, record_concepts=None)
","text":"Runs the annotation pipeline on a given note and returns the extracted concepts.
Parameters:
Name Type Description Defaultnote
Note
The input note to process.
requiredrecord_concepts
Optional[List[Concept]]
The list of concepts from existing EHR records.
None
Returns:
Type DescriptionList[Concept]
The extracted concepts from the note.
Source code insrc/miade/annotators.py
def __call__(\n self,\n note: Note,\n record_concepts: Optional[List[Concept]] = None,\n) -> List[Concept]:\n \"\"\"\n Runs the annotation pipeline on a given note and returns the extracted concepts.\n\n Args:\n note (Note): The input note to process.\n record_concepts (Optional[List[Concept]]): The list of concepts from existing EHR records.\n\n Returns:\n The extracted concepts from the note.\n \"\"\"\n concepts = self.run_pipeline(note, record_concepts)\n\n if self.config.add_numbering:\n concepts = self.add_numbering_to_name(concepts)\n\n return concepts\n
"},{"location":"api-reference/annotator/#miade.annotators.Annotator.add_numbering_to_name","title":"add_numbering_to_name(concepts)
staticmethod
","text":"Adds numbering to the names of problem concepts to control output ordering.
Parameters:
Name Type Description Defaultconcepts
List[Concept]
The list of concepts to add numbering to.
requiredReturns:
Type DescriptionList[Concept]
The list of concepts with numbering added to their names.
Source code insrc/miade/annotators.py
@staticmethod\ndef add_numbering_to_name(concepts: List[Concept]) -> List[Concept]:\n \"\"\"\n Adds numbering to the names of problem concepts to control output ordering.\n\n Args:\n concepts (List[Concept]): The list of concepts to add numbering to.\n\n Returns:\n The list of concepts with numbering added to their names.\n \"\"\"\n # Prepend numbering to problem concepts e.g. 00 asthma, 01 stroke...\n for i, concept in enumerate(concepts):\n concept.name = f\"{i:02} {concept.name}\"\n\n return concepts\n
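The numbering relies on Python's zero-padded format specifier. The sketch below applies the same f-string to plain strings rather than Concept objects; number_names is a hypothetical helper name used only for illustration.

```python
from typing import List


def number_names(names: List[str]) -> List[str]:
    # Prefix each name with its zero-padded position, starting from 00,
    # using the same format spec as add_numbering_to_name.
    return [f'{i:02} {name}' for i, name in enumerate(names)]


print(number_names(['asthma', 'stroke', 'hypertension']))
# ['00 asthma', '01 stroke', '02 hypertension']
```

Because the prefix has a fixed width of two digits, sorting the display names alphabetically preserves the original order for up to 100 concepts.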
"},{"location":"api-reference/annotator/#miade.annotators.Annotator.deduplicate","title":"deduplicate(concepts, record_concepts)
staticmethod
","text":"Removes duplicate concepts from the extracted concepts list by strict ID matching.
Parameters:
Name Type Description Defaultconcepts
List[Concept]
The list of extracted concepts.
requiredrecord_concepts
Optional[List[Concept]]
The list of concepts from existing EHR records.
requiredReturns:
Type DescriptionList[Concept]
The deduplicated list of concepts.
Source code insrc/miade/annotators.py
@staticmethod\ndef deduplicate(concepts: List[Concept], record_concepts: Optional[List[Concept]]) -> List[Concept]:\n \"\"\"\n Removes duplicate concepts from the extracted concepts list by strict ID matching.\n\n Args:\n concepts (List[Concept]): The list of extracted concepts.\n record_concepts (Optional[List[Concept]]): The list of concepts from existing EHR records.\n\n Returns:\n The deduplicated list of concepts.\n \"\"\"\n if record_concepts is not None:\n record_ids = {record_concept.id for record_concept in record_concepts}\n record_names = {record_concept.name for record_concept in record_concepts}\n else:\n record_ids = set()\n record_names = set()\n\n # Use an OrderedDict to keep track of ids as it preserves original MedCAT order (the order it appears in text)\n filtered_concepts: List[Concept] = []\n existing_concepts = OrderedDict()\n\n # Filter concepts that are in record or exist in concept list\n for concept in concepts:\n if concept.id is not None and (concept.id in record_ids or concept.id in existing_concepts):\n log.debug(f\"Removed concept ({concept.id} | {concept.name}): concept id exists in record\")\n # check name match for null ids - VTM deduplication\n elif concept.id is None and (concept.name in record_names or concept.name in existing_concepts.values()):\n log.debug(f\"Removed concept ({concept.id} | {concept.name}): concept name exists in record\")\n else:\n filtered_concepts.append(concept)\n existing_concepts[concept.id] = concept.name\n\n return filtered_concepts\n
"},{"location":"api-reference/annotator/#miade.annotators.Annotator.get_concepts","title":"get_concepts(note)
","text":"Extracts concepts from a note using the MedCAT instance.
Parameters:
Name Type Description Defaultnote
Note
The input note to extract concepts from.
requiredReturns:
Type DescriptionList[Concept]
The extracted concepts from the note.
Source code insrc/miade/annotators.py
def get_concepts(self, note: Note) -> List[Concept]:\n \"\"\"\n Extracts concepts from a note using the MedCAT instance.\n\n Args:\n note (Note): The input note to extract concepts from.\n\n Returns:\n The extracted concepts from the note.\n \"\"\"\n concepts: List[Concept] = []\n for entity in self.cat.get_entities(note)[\"entities\"].values():\n try:\n concepts.append(Concept.from_entity(entity))\n log.debug(f\"Detected concept ({concepts[-1].id} | {concepts[-1].name})\")\n except ValueError as e:\n log.warning(f\"Concept skipped: {e}\")\n\n return concepts\n
"},{"location":"api-reference/annotator/#miade.annotators.Annotator.postprocess","title":"postprocess()
abstractmethod
","text":"Abstract method that should implement the logic for post-processing extracted concepts.
Source code insrc/miade/annotators.py
@abstractmethod\ndef postprocess(self):\n \"\"\"\n Abstract method that should implement the logic for post-processing extracted concepts.\n \"\"\"\n pass\n
"},{"location":"api-reference/annotator/#miade.annotators.Annotator.preprocess","title":"preprocess(note)
staticmethod
","text":"Preprocesses a note by cleaning its text and splitting it into paragraphs.
Parameters:
Name | Type | Description | Default
note | Note | The input note to preprocess. | required
Returns:
Type | Description
Note | The preprocessed note.
Source code in src/miade/annotators.py
@staticmethod\ndef preprocess(note: Note) -> Note:\n \"\"\"\n Preprocesses a note by cleaning its text and splitting it into paragraphs.\n\n Args:\n note (Note): The input note to preprocess.\n\n Returns:\n The preprocessed note.\n \"\"\"\n note.clean_text()\n note.get_paragraphs()\n\n return note\n
"},{"location":"api-reference/annotator/#miade.annotators.Annotator.process_paragraphs","title":"process_paragraphs()
abstractmethod
","text":"Abstract method that should implement the logic for processing paragraphs in a note.
Source code in src/miade/annotators.py
@abstractmethod\ndef process_paragraphs(self):\n \"\"\"\n Abstract method that should implement the logic for processing paragraphs in a note.\n \"\"\"\n pass\n
"},{"location":"api-reference/annotator/#miade.annotators.Annotator.run_pipeline","title":"run_pipeline(note, record_concepts)
","text":"Runs the annotation pipeline on a given note and returns the extracted concepts.
Parameters:
Name | Type | Description | Default
note | Note | The input note to process. | required
record_concepts | List[Concept] | The list of concepts from existing EHR records. | required
Returns:
Type | Description
List[Concept] | The extracted concepts from the note.
Source code in src/miade/annotators.py
def run_pipeline(self, note: Note, record_concepts: List[Concept]) -> List[Concept]:\n \"\"\"\n Runs the annotation pipeline on a given note and returns the extracted concepts.\n\n Args:\n note (Note): The input note to process.\n record_concepts (List[Concept]): The list of concepts from existing EHR records.\n\n Returns:\n The extracted concepts from the note.\n \"\"\"\n concepts: List[Concept] = []\n\n for pipe in self.pipeline:\n if pipe not in self.config.disable:\n if pipe == \"preprocessor\":\n note = self.preprocess(note)\n elif pipe == \"medcat\":\n concepts = self.get_concepts(note)\n elif pipe == \"paragrapher\":\n concepts = self.process_paragraphs(note, concepts)\n elif pipe == \"postprocessor\":\n concepts = self.postprocess(concepts)\n elif pipe == \"deduplicator\":\n concepts = self.deduplicate(concepts, record_concepts)\n\n return concepts\n
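The control flow of run_pipeline (walk the pipeline names in order, skip any name listed as disabled, feed each step the output of the previous one) can be sketched in isolation. This is a minimal illustration, not MiADE code: `run_steps` and the toy step functions are hypothetical, and a dictionary lookup replaces the if/elif chain used in the source.

```python
from typing import Callable, Dict, List


def run_steps(
    pipeline: List[str],
    disabled: List[str],
    steps: Dict[str, Callable[[List[str]], List[str]]],
    concepts: List[str],
) -> List[str]:
    for name in pipeline:
        if name in disabled:
            # Disabled steps are skipped; later steps still run.
            continue
        step = steps.get(name)
        if step is not None:
            concepts = step(concepts)
    return concepts


# Toy steps operating on plain strings instead of Concept objects.
pipeline = ['medcat', 'postprocessor', 'deduplicator']
steps = {
    'medcat': lambda _: ['asthma', 'Asthma', 'eczema'],
    'postprocessor': lambda cs: [c.lower() for c in cs],
    'deduplicator': lambda cs: list(dict.fromkeys(cs)),
}

full = run_steps(pipeline, [], steps, [])
no_dedup = run_steps(pipeline, ['deduplicator'], steps, [])
print(full)
print(no_dedup)
```

Disabling a step changes only that step: in the second call the duplicate survives because the deduplicator never runs.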
"},{"location":"api-reference/concept/","title":"Concept","text":" Bases: object
Represents a concept in the system.
Attributes:
Name | Type | Description
id | str | The unique identifier of the concept.
name | str | The name of the concept.
category | Optional[Enum] | The category of the concept (optional).
start | Optional[int] | The start position of the concept (optional).
end | Optional[int] | The end position of the concept (optional).
dosage | Optional[Dosage] | The dosage of the concept (optional).
linked_concepts | Optional[List[Concept]] | The linked concepts of the concept (optional).
negex | Optional[bool] | The negex value of the concept (optional).
meta_anns | Optional[List[MetaAnnotations]] | The meta annotations of the concept (optional).
debug_dict | Optional[Dict] | The debug dictionary of the concept (optional).
Source code in src/miade/concept.py
class Concept(object):\n \"\"\"Represents a concept in the system.\n\n Attributes:\n id (str): The unique identifier of the concept.\n name (str): The name of the concept.\n category (Optional[Enum]): The category of the concept (optional).\n start (Optional[int]): The start position of the concept (optional).\n end (Optional[int]): The end position of the concept (optional).\n dosage (Optional[Dosage]): The dosage of the concept (optional).\n linked_concepts (Optional[List[Concept]]): The linked concepts of the concept (optional).\n negex (Optional[bool]): The negex value of the concept (optional).\n meta_anns (Optional[List[MetaAnnotations]]): The meta annotations of the concept (optional).\n debug_dict (Optional[Dict]): The debug dictionary of the concept (optional).\n \"\"\"\n\n def __init__(\n self,\n id: str,\n name: str,\n category: Optional[Enum] = None,\n start: Optional[int] = None,\n end: Optional[int] = None,\n dosage: Optional[Dosage] = None,\n linked_concepts: Optional[List[Concept]] = None,\n negex: Optional[bool] = None,\n meta_anns: Optional[List[MetaAnnotations]] = None,\n debug_dict: Optional[Dict] = None,\n ):\n self.name = name\n self.id = id\n self.category = category\n self.start = start\n self.end = end\n self.dosage = dosage\n self.linked_concepts = linked_concepts\n self.negex = negex\n self.meta = meta_anns\n self.debug = debug_dict\n\n if linked_concepts is None:\n self.linked_concepts = []\n\n @classmethod\n def from_entity(cls, entity: Dict) -> Concept:\n \"\"\"\n Converts an entity dictionary into a Concept object.\n\n Args:\n entity (Dict): The entity dictionary containing the necessary information.\n\n Returns:\n The Concept object created from the entity dictionary.\n \"\"\"\n meta_anns = None\n if entity[\"meta_anns\"]:\n meta_anns = [MetaAnnotations(**value) for value in entity[\"meta_anns\"].values()]\n\n return Concept(\n id=entity[\"cui\"],\n name=entity[\n \"source_value\"\n ], # can also use detected_name which is spell 
checked but delimited by ~ e.g. liver~failure\n category=None,\n start=entity[\"start\"],\n end=entity[\"end\"],\n negex=entity[\"negex\"] if \"negex\" in entity else None,\n meta_anns=meta_anns,\n )\n\n def __str__(self):\n return (\n f\"{{name: {self.name}, id: {self.id}, category: {self.category}, start: {self.start}, end: {self.end},\"\n f\" dosage: {self.dosage}, linked_concepts: {self.linked_concepts}, negex: {self.negex}, meta: {self.meta}}} \"\n )\n\n def __hash__(self):\n return hash((self.id, self.name, self.category))\n\n def __eq__(self, other):\n return self.id == other.id and self.name == other.name and self.category == other.category\n\n def __lt__(self, other):\n return int(self.id) < int(other.id)\n\n def __gt__(self, other):\n return int(self.id) > int(other.id)\n
"},{"location":"api-reference/concept/#miade.concept.Concept.from_entity","title":"from_entity(entity)
classmethod
","text":"Converts an entity dictionary into a Concept object.
Parameters:
Name | Type | Description | Default
entity | Dict | The entity dictionary containing the necessary information. | required
Returns:
Type | Description
Concept | The Concept object created from the entity dictionary.
Source code in src/miade/concept.py
@classmethod\ndef from_entity(cls, entity: Dict) -> Concept:\n \"\"\"\n Converts an entity dictionary into a Concept object.\n\n Args:\n entity (Dict): The entity dictionary containing the necessary information.\n\n Returns:\n The Concept object created from the entity dictionary.\n \"\"\"\n meta_anns = None\n if entity[\"meta_anns\"]:\n meta_anns = [MetaAnnotations(**value) for value in entity[\"meta_anns\"].values()]\n\n return Concept(\n id=entity[\"cui\"],\n name=entity[\n \"source_value\"\n ], # can also use detected_name which is spell checked but delimited by ~ e.g. liver~failure\n category=None,\n start=entity[\"start\"],\n end=entity[\"end\"],\n negex=entity[\"negex\"] if \"negex\" in entity else None,\n meta_anns=meta_anns,\n )\n
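The mapping performed by from_entity can be shown with plain dictionaries. The sketch below is an assumption-laden illustration: `concept_fields_from_entity` is a hypothetical helper, the entity values are invented, and the meta annotations are passed through as raw values rather than converted to MetaAnnotations objects. The key names (cui, source_value, start, end, meta_anns, negex) are the ones read in the source above.

```python
from typing import Any, Dict, Optional


def concept_fields_from_entity(entity: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    # An empty or missing-value meta_anns mapping is treated as no annotations.
    meta_anns = None
    if entity['meta_anns']:
        meta_anns = list(entity['meta_anns'].values())

    return {
        'id': entity['cui'],
        'name': entity['source_value'],
        'category': None,  # assigned later in the pipeline, not here
        'start': entity['start'],
        'end': entity['end'],
        'negex': entity.get('negex'),  # optional key, defaults to None
        'meta_anns': meta_anns,
    }


# Invented entity for illustration; real MedCAT output carries more keys.
entity = {
    'cui': '123',
    'source_value': 'asthma',
    'start': 12,
    'end': 18,
    'meta_anns': {},
}
fields = concept_fields_from_entity(entity)
print(fields)
```

Because cui, source_value, start, end and meta_anns are read with square brackets, an entity missing any of them raises a KeyError, while negex may be absent.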
"},{"location":"api-reference/dosage/","title":"Dosage","text":" Bases: object
Container for drug dosage information
Source code in src/miade/dosage.py
class Dosage(object):\n \"\"\"\n Container for drug dosage information\n \"\"\"\n\n def __init__(\n self,\n dose: Optional[Dose],\n duration: Optional[Duration],\n frequency: Optional[Frequency],\n route: Optional[Route],\n text: Optional[str] = None,\n ):\n self.text = text\n self.dose = dose\n self.duration = duration\n self.frequency = frequency\n self.route = route\n\n @classmethod\n def from_doc(cls, doc: Doc, calculate: bool = True):\n \"\"\"\n Parses dosage from a spacy doc object.\n\n Args:\n doc (Doc): Spacy doc object with processed dosage text.\n calculate (bool, optional): Whether to calculate duration if total and daily dose is given. Defaults to True.\n\n Returns:\n An instance of the class with the parsed dosage information.\n\n \"\"\"\n quantities = []\n units = []\n dose_start = 1000\n dose_end = 0\n daily_dose = None\n total_dose = None\n route_text = None\n duration_text = None\n\n for ent in doc.ents:\n if ent.label_ == \"DOSAGE\":\n if ent._.total_dose:\n total_dose = float(ent.text)\n else:\n quantities.append(ent.text)\n # get span of full dosage string - not strictly needed but nice to have\n if ent.start < dose_start:\n dose_start = ent.start\n if ent.end > dose_end:\n dose_end = ent.end\n elif ent.label_ == \"FORM\":\n if ent._.total_dose:\n # de facto unit is in total dose\n units = [ent.text]\n else:\n units.append(ent.text)\n if ent.start < dose_start:\n dose_start = ent.start\n if ent.end > dose_end:\n dose_end = ent.end\n elif ent.label_ == \"DURATION\":\n duration_text = ent.text\n elif ent.label_ == \"ROUTE\":\n route_text = ent.text\n\n dose = parse_dose(\n text=\" \".join(doc.text.split()[dose_start:dose_end]),\n quantities=quantities,\n units=units,\n results=doc._.results,\n )\n\n frequency = parse_frequency(text=doc.text, results=doc._.results)\n\n route = parse_route(text=route_text, dose=dose)\n\n # technically not information recorded so will keep as an option\n if calculate:\n # if duration not given in text could extract 
this from total dose if given\n if total_dose is not None and dose is not None and doc._.results[\"freq\"]:\n if dose.value is not None:\n daily_dose = float(dose.value) * (round(doc._.results[\"freq\"] / doc._.results[\"time\"]))\n elif dose.high is not None:\n daily_dose = float(dose.high) * (round(doc._.results[\"freq\"] / doc._.results[\"time\"]))\n\n duration = parse_duration(\n text=duration_text,\n results=doc._.results,\n total_dose=total_dose,\n daily_dose=daily_dose,\n )\n\n return cls(\n text=doc._.original_text,\n dose=dose,\n duration=duration,\n frequency=frequency,\n route=route,\n )\n\n def __str__(self):\n return f\"{self.__dict__}\"\n\n def __eq__(self, other):\n return self.__dict__ == other.__dict__\n
"},{"location":"api-reference/dosage/#miade.dosage.Dosage.from_doc","title":"from_doc(doc, calculate=True)
classmethod
","text":"Parses dosage from a spacy doc object.
Parameters:
Name | Type | Description | Default
doc | Doc | Spacy doc object with processed dosage text. | required
calculate | bool | Whether to calculate duration if total and daily dose is given. Defaults to True. | True
Returns:
Type | Description
 | An instance of the class with the parsed dosage information.
Source code in src/miade/dosage.py
@classmethod\ndef from_doc(cls, doc: Doc, calculate: bool = True):\n \"\"\"\n Parses dosage from a spacy doc object.\n\n Args:\n doc (Doc): Spacy doc object with processed dosage text.\n calculate (bool, optional): Whether to calculate duration if total and daily dose is given. Defaults to True.\n\n Returns:\n An instance of the class with the parsed dosage information.\n\n \"\"\"\n quantities = []\n units = []\n dose_start = 1000\n dose_end = 0\n daily_dose = None\n total_dose = None\n route_text = None\n duration_text = None\n\n for ent in doc.ents:\n if ent.label_ == \"DOSAGE\":\n if ent._.total_dose:\n total_dose = float(ent.text)\n else:\n quantities.append(ent.text)\n # get span of full dosage string - not strictly needed but nice to have\n if ent.start < dose_start:\n dose_start = ent.start\n if ent.end > dose_end:\n dose_end = ent.end\n elif ent.label_ == \"FORM\":\n if ent._.total_dose:\n # de facto unit is in total dose\n units = [ent.text]\n else:\n units.append(ent.text)\n if ent.start < dose_start:\n dose_start = ent.start\n if ent.end > dose_end:\n dose_end = ent.end\n elif ent.label_ == \"DURATION\":\n duration_text = ent.text\n elif ent.label_ == \"ROUTE\":\n route_text = ent.text\n\n dose = parse_dose(\n text=\" \".join(doc.text.split()[dose_start:dose_end]),\n quantities=quantities,\n units=units,\n results=doc._.results,\n )\n\n frequency = parse_frequency(text=doc.text, results=doc._.results)\n\n route = parse_route(text=route_text, dose=dose)\n\n # technically not information recorded so will keep as an option\n if calculate:\n # if duration not given in text could extract this from total dose if given\n if total_dose is not None and dose is not None and doc._.results[\"freq\"]:\n if dose.value is not None:\n daily_dose = float(dose.value) * (round(doc._.results[\"freq\"] / doc._.results[\"time\"]))\n elif dose.high is not None:\n daily_dose = float(dose.high) * (round(doc._.results[\"freq\"] / doc._.results[\"time\"]))\n\n duration = 
parse_duration(\n text=duration_text,\n results=doc._.results,\n total_dose=total_dose,\n daily_dose=daily_dose,\n )\n\n return cls(\n text=doc._.original_text,\n dose=dose,\n duration=duration,\n frequency=frequency,\n route=route,\n )\n
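The daily-dose arithmetic inside from_doc can be worked through on its own. In the sketch below, `daily_dose_sketch` is a hypothetical helper that takes the values the source reads from dose.value, dose.high and the freq and time entries of doc._.results. The source does not state the units of freq and time, so none are assumed here, and the extra requirement that a total dose is present is left out for brevity.

```python
from typing import Optional


def daily_dose_sketch(
    dose_value: Optional[float],
    dose_high: Optional[float],
    freq: int,
    time: int,
) -> Optional[float]:
    # No frequency means no daily dose can be derived.
    if not freq:
        return None
    # The ratio is rounded to a whole number before multiplying.
    doses_per_period = round(freq / time)
    if dose_value is not None:
        return float(dose_value) * doses_per_period
    if dose_high is not None:
        # Fall back to the upper bound when only a dose range is known.
        return float(dose_high) * doses_per_period
    return None


print(daily_dose_sketch(2, None, 3, 1))
print(daily_dose_sketch(None, 4, 2, 1))
print(daily_dose_sketch(1, None, 0, 1))
```

With a single dose of 2 and a freq/time ratio of 3, the daily dose is 2 * 3 = 6.0. Python's built-in round uses round-half-to-even, so a ratio of exactly 0.5 rounds to 0 and would give a daily dose of 0.0.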
"},{"location":"api-reference/dosageextractor/","title":"DosageExtractor","text":"Parses and extracts drug dosage
Attributes:
Name | Type | Description
model | str | The name of the model to be used for dosage extraction.
dosage_extractor | Language | The Spacy pipeline for dosage extraction.
Source code in src/miade/dosageextractor.py
class DosageExtractor:\n \"\"\"\n Parses and extracts drug dosage\n\n Attributes:\n model (str): The name of the model to be used for dosage extraction.\n dosage_extractor (Language): The Spacy pipeline for dosage extraction.\n \"\"\"\n\n def __init__(self, model: str = \"en_core_med7_lg\"):\n self.model = model\n self.dosage_extractor = self._create_drugdoseade_pipeline()\n\n def _create_drugdoseade_pipeline(self) -> Language:\n \"\"\"\n Creates a spacy pipeline with given model (default med7)\n and customised pipeline components for dosage extraction\n\n Returns:\n nlp (spacy.Language): The Spacy pipeline for dosage extraction.\n \"\"\"\n nlp = spacy.load(self.model)\n nlp.add_pipe(\"preprocessor\", first=True)\n nlp.add_pipe(\"pattern_matcher\", before=\"ner\")\n nlp.add_pipe(\"entities_refiner\", after=\"ner\")\n\n log.info(f\"Loaded drug dosage extractor with model {self.model}\")\n\n return nlp\n\n def extract(self, text: str, calculate: bool = True) -> Optional[Dosage]:\n \"\"\"\n Processes a string that contains dosage instructions (excluding drug concept as this is handled by core)\n\n Args:\n text (str): The string containing dosage instructions.\n calculate (bool): Whether to calculate duration from total and daily dose, if given.\n\n Returns:\n The dosage object with parsed dosages in CDA format.\n \"\"\"\n doc = self.dosage_extractor(text)\n\n log.debug(f\"NER results: {[(e.text, e.label_, e._.total_dose) for e in doc.ents]}\")\n log.debug(f\"Lookup results: {doc._.results}\")\n\n dosage = Dosage.from_doc(doc=doc, calculate=calculate)\n\n if all(v is None for v in [dosage.dose, dosage.frequency, dosage.route, dosage.duration]):\n return None\n\n return dosage\n\n def __call__(self, text: str, calculate: bool = True):\n return self.extract(text, calculate)\n
"},{"location":"api-reference/dosageextractor/#miade.dosageextractor.DosageExtractor.extract","title":"extract(text, calculate=True)
","text":"Processes a string that contains dosage instructions (excluding drug concept as this is handled by core)
Parameters:
Name | Type | Description | Default
text | str | The string containing dosage instructions. | required
calculate | bool | Whether to calculate duration from total and daily dose, if given. | True
Returns:
Type | Description
Optional[Dosage] | The dosage object with parsed dosages in CDA format.
Source code in src/miade/dosageextractor.py
def extract(self, text: str, calculate: bool = True) -> Optional[Dosage]:\n \"\"\"\n Processes a string that contains dosage instructions (excluding drug concept as this is handled by core)\n\n Args:\n text (str): The string containing dosage instructions.\n calculate (bool): Whether to calculate duration from total and daily dose, if given.\n\n Returns:\n The dosage object with parsed dosages in CDA format.\n \"\"\"\n doc = self.dosage_extractor(text)\n\n log.debug(f\"NER results: {[(e.text, e.label_, e._.total_dose) for e in doc.ents]}\")\n log.debug(f\"Lookup results: {doc._.results}\")\n\n dosage = Dosage.from_doc(doc=doc, calculate=calculate)\n\n if all(v is None for v in [dosage.dose, dosage.frequency, dosage.route, dosage.duration]):\n return None\n\n return dosage\n
"},{"location":"api-reference/medsallergiesannotator/","title":"MedsAllergiesAnnotator","text":" Bases: Annotator
Annotator class for medication and allergy concepts.
This class extends the Annotator base class and provides methods for running a pipeline of annotation tasks on a given note, as well as validating and converting concepts related to medications and allergies.
Attributes:
Name | Type | Description
valid_meds | List[int] | A list of valid medication IDs.
reactions_subset_lookup | Dict[int, str] | A dictionary mapping reaction IDs to their corresponding subset IDs.
allergens_subset_lookup | Dict[int, str] | A dictionary mapping allergen IDs to their corresponding subset IDs.
allergy_type_lookup | Dict[str, List[str]] | A dictionary mapping allergen types to their corresponding codes.
vtm_to_vmp_lookup | Dict[str, str] | A dictionary mapping VTM (Virtual Therapeutic Moiety) IDs to VMP (Virtual Medicinal Product) IDs.
vtm_to_text_lookup | Dict[str, str] | A dictionary mapping VTM IDs to their corresponding text.
Source code in src/miade/annotators.py
class MedsAllergiesAnnotator(Annotator):\n \"\"\"\n Annotator class for medication and allergy concepts.\n\n This class extends the `Annotator` base class and provides methods for running a pipeline of\n annotation tasks on a given note, as well as validating and converting concepts related to\n medications and allergies.\n\n Attributes:\n valid_meds (List[int]): A list of valid medication IDs.\n reactions_subset_lookup (Dict[int, str]): A dictionary mapping reaction IDs to their corresponding subset IDs.\n allergens_subset_lookup (Dict[int, str]): A dictionary mapping allergen IDs to their corresponding subset IDs.\n allergy_type_lookup (Dict[str, List[str]]): A dictionary mapping allergen types to their corresponding codes.\n vtm_to_vmp_lookup (Dict[str, str]): A dictionary mapping VTM (Virtual Therapeutic Moiety) IDs to VMP (Virtual Medicinal Product) IDs.\n vtm_to_text_lookup (Dict[str, str]): A dictionary mapping VTM IDs to their corresponding text.\n \"\"\"\n\n def __init__(self, cat: CAT, config: AnnotatorConfig = None):\n super().__init__(cat, config)\n self._load_med_allergy_lookup_data()\n\n @property\n def concept_types(self) -> List[Category]:\n \"\"\"\n Returns a list of concept types.\n\n Returns:\n [Category.MEDICATION, Category.ALLERGY, Category.REACTION]\n \"\"\"\n return [Category.MEDICATION, Category.ALLERGY, Category.REACTION]\n\n @property\n def pipeline(self) -> List[str]:\n \"\"\"\n Returns a list of annotators in the pipeline.\n\n The annotators are executed in the order they appear in the list.\n\n Returns:\n [\"preprocessor\", \"medcat\", \"paragrapher\", \"postprocessor\", \"dosage_extractor\", \"vtm_converter\", \"deduplicator\"]\n \"\"\"\n return [\n \"preprocessor\",\n \"medcat\",\n \"paragrapher\",\n \"postprocessor\",\n \"dosage_extractor\",\n \"vtm_converter\",\n \"deduplicator\",\n ]\n\n def run_pipeline(\n self, note: Note, record_concepts: List[Concept], dosage_extractor: Optional[DosageExtractor]\n ) -> List[Concept]:\n \"\"\"\n 
Runs the annotation pipeline on the given note.\n\n Args:\n note (Note): The input note to run the pipeline on.\n record_concepts (List[Concept]): The list of previously recorded concepts.\n dosage_extractor (Optional[DosageExtractor]): The dosage extractor function.\n\n Returns:\n The list of annotated concepts.\n \"\"\"\n concepts: List[Concept] = []\n\n for pipe in self.pipeline:\n if pipe not in self.config.disable:\n if pipe == \"preprocessor\":\n note = self.preprocess(note)\n elif pipe == \"medcat\":\n concepts = self.get_concepts(note)\n elif pipe == \"paragrapher\":\n concepts = self.process_paragraphs(note, concepts)\n elif pipe == \"postprocessor\":\n concepts = self.postprocess(concepts, note)\n elif pipe == \"deduplicator\":\n concepts = self.deduplicate(concepts, record_concepts)\n elif pipe == \"vtm_converter\":\n concepts = self.convert_VTM_to_VMP_or_text(concepts)\n elif pipe == \"dosage_extractor\" and dosage_extractor is not None:\n concepts = self.add_dosages_to_concepts(dosage_extractor, concepts, note)\n\n return concepts\n\n def _load_med_allergy_lookup_data(self) -> None:\n \"\"\"\n Loads the medication and allergy lookup data.\n \"\"\"\n if not os.path.isdir(self.config.lookup_data_path):\n raise RuntimeError(f\"No lookup data configured: {self.config.lookup_data_path} does not exist!\")\n else:\n self.valid_meds = load_lookup_data(self.config.lookup_data_path + \"valid_meds.csv\", no_header=True)\n self.reactions_subset_lookup = load_lookup_data(\n self.config.lookup_data_path + \"reactions_subset.csv\", as_dict=True\n )\n self.allergens_subset_lookup = load_lookup_data(\n self.config.lookup_data_path + \"allergens_subset.csv\", as_dict=True\n )\n self.allergy_type_lookup = load_allergy_type_combinations(self.config.lookup_data_path + \"allergy_type.csv\")\n self.vtm_to_vmp_lookup = load_lookup_data(self.config.lookup_data_path + \"vtm_to_vmp.csv\")\n self.vtm_to_text_lookup = load_lookup_data(self.config.lookup_data_path + 
\"vtm_to_text.csv\", as_dict=True)\n\n def _validate_meds(self, concept) -> bool:\n \"\"\"\n Validates if the concept is a valid medication.\n\n Args:\n concept: The concept to validate.\n\n Returns:\n True if the concept is a valid medication, False otherwise.\n \"\"\"\n # check if substance is valid med\n if int(concept.id) in self.valid_meds.values:\n return True\n return False\n\n def _validate_and_convert_substance(self, concept) -> bool:\n \"\"\"\n Validates and converts a substance concept for allergy.\n\n Args:\n concept: The substance concept to be validated and converted.\n\n Returns:\n True if the substance is valid and converted successfully, False otherwise.\n \"\"\"\n # check if substance is valid substance for allergy - if it is, convert it to Epic subset and return that concept\n lookup_result = self.allergens_subset_lookup.get(int(concept.id))\n if lookup_result is not None:\n log.debug(\n f\"Converted concept ({concept.id} | {concept.name}) to \"\n f\"({lookup_result['subsetId']} | {concept.name}): valid Epic allergen subset\"\n )\n concept.id = str(lookup_result[\"subsetId\"])\n\n # then check the allergen type from lookup result - e.g. 
drug, food\n try:\n concept.category = AllergenType(str(lookup_result[\"allergenType\"]).lower())\n log.debug(\n f\"Assigned substance concept ({concept.id} | {concept.name}) \"\n f\"to allergen type category {concept.category}\"\n )\n except ValueError as e:\n log.warning(f\"Allergen type not found for {concept.__str__()}: {e}\")\n\n return True\n else:\n log.warning(f\"No lookup subset found for substance ({concept.id} | {concept.name})\")\n return False\n\n def _validate_and_convert_reaction(self, concept) -> bool:\n \"\"\"\n Validates and converts a reaction concept to the Epic subset.\n\n Args:\n concept: The concept to be validated and converted.\n\n Returns:\n True if the concept is a valid reaction and successfully converted to the Epic subset,\n False otherwise.\n \"\"\"\n # check if substance is valid reaction - if it is, convert it to Epic subset and return that concept\n lookup_result = self.reactions_subset_lookup.get(int(concept.id), None)\n if lookup_result is not None:\n log.debug(\n f\"Converted concept ({concept.id} | {concept.name}) to \"\n f\"({lookup_result} | {concept.name}): valid Epic reaction subset\"\n )\n concept.id = str(lookup_result)\n return True\n else:\n log.warning(f\"Reaction not found in Epic subset conversion for concept {concept.__str__()}\")\n return False\n\n def _validate_and_convert_concepts(self, concept: Concept) -> Concept:\n \"\"\"\n Validates and converts the given concept based on its metadata annotations.\n\n Args:\n concept (Concept): The concept to be validated and converted.\n\n Returns:\n The validated and converted concept.\n\n \"\"\"\n meta_ann_values = [meta_ann.value for meta_ann in concept.meta] if concept.meta is not None else []\n\n # assign categories\n if SubstanceCategory.ADVERSE_REACTION in meta_ann_values:\n if self._validate_and_convert_substance(concept):\n self._convert_allergy_type_to_code(concept)\n self._convert_allergy_severity_to_code(concept)\n concept.category = Category.ALLERGY\n else:\n 
log.warning(f\"Double-checking if concept ({concept.id} | {concept.name}) is in reaction subset\")\n if self._validate_and_convert_reaction(concept) and (\n ReactionPos.BEFORE_SUBSTANCE in meta_ann_values or ReactionPos.AFTER_SUBSTANCE in meta_ann_values\n ):\n concept.category = Category.REACTION\n else:\n log.warning(\n f\"Reaction concept ({concept.id} | {concept.name}) not in subset or reaction_pos is NOT_REACTION\"\n )\n if SubstanceCategory.TAKING in meta_ann_values:\n if self._validate_meds(concept):\n concept.category = Category.MEDICATION\n if SubstanceCategory.NOT_SUBSTANCE in meta_ann_values and (\n ReactionPos.BEFORE_SUBSTANCE in meta_ann_values or ReactionPos.AFTER_SUBSTANCE in meta_ann_values\n ):\n if self._validate_and_convert_reaction(concept):\n concept.category = Category.REACTION\n\n return concept\n\n @staticmethod\n def add_dosages_to_concepts(\n dosage_extractor: DosageExtractor, concepts: List[Concept], note: Note\n ) -> List[Concept]:\n \"\"\"\n Gets dosages for medication concepts\n\n Args:\n dosage_extractor (DosageExtractor): The dosage extractor object\n concepts (List[Concept]): List of concepts extracted\n note (Note): The input note\n\n Returns:\n List of concepts with dosages for medication concepts\n \"\"\"\n\n for ind, concept in enumerate(concepts):\n next_med_concept = concepts[ind + 1] if len(concepts) > ind + 1 else None\n dosage_string = get_dosage_string(concept, next_med_concept, note.text)\n if len(dosage_string.split()) > 2:\n concept.dosage = dosage_extractor(dosage_string)\n concept.category = Category.MEDICATION if concept.dosage is not None else None\n if concept.dosage is not None:\n log.debug(\n f\"Extracted dosage for medication concept \"\n f\"({concept.id} | {concept.name}): {concept.dosage.text} {concept.dosage.dose}\"\n )\n\n return concepts\n\n @staticmethod\n def _link_reactions_to_allergens(concept_list: List[Concept], note: Note, link_distance: int = 5) -> List[Concept]:\n \"\"\"\n Links reaction concepts 
to allergen concepts based on their proximity in the given concept list.\n\n Args:\n concept_list (List[Concept]): The list of concepts to search for reaction and allergen concepts.\n note (Note): The note object containing the text.\n link_distance (int, optional): The maximum distance between a reaction and an allergen to be considered linked.\n Defaults to 5.\n\n Returns:\n The updated concept list with reaction concepts removed and linked to their corresponding allergen concepts.\n \"\"\"\n allergy_concepts = [concept for concept in concept_list if concept.category == Category.ALLERGY]\n reaction_concepts = [concept for concept in concept_list if concept.category == Category.REACTION]\n\n for reaction_concept in reaction_concepts:\n nearest_allergy_concept = None\n min_distance = inf\n meta_ann_values = (\n [meta_ann.value for meta_ann in reaction_concept.meta] if reaction_concept.meta is not None else []\n )\n\n for allergy_concept in allergy_concepts:\n # skip if allergy is after and meta is before_substance\n if ReactionPos.BEFORE_SUBSTANCE in meta_ann_values and allergy_concept.start < reaction_concept.start:\n continue\n # skip if allergy is before and meta is after_substance\n elif ReactionPos.AFTER_SUBSTANCE in meta_ann_values and allergy_concept.start > reaction_concept.start:\n continue\n else:\n distance = calculate_word_distance(\n reaction_concept.start, reaction_concept.end, allergy_concept.start, allergy_concept.end, note\n )\n log.debug(\n f\"Calculated distance between reaction {reaction_concept.name} \"\n f\"and allergen {allergy_concept.name}: {distance}\"\n )\n if distance == -1:\n log.warning(\n f\"Indices for {reaction_concept.name} or {allergy_concept.name} invalid: \"\n f\"({reaction_concept.start}, {reaction_concept.end})\"\n f\"({allergy_concept.start}, {allergy_concept.end})\"\n )\n continue\n\n if distance <= link_distance and distance < min_distance:\n min_distance = distance\n nearest_allergy_concept = allergy_concept\n\n if 
nearest_allergy_concept is not None:\n nearest_allergy_concept.linked_concepts.append(reaction_concept)\n log.debug(\n f\"Linked reaction concept {reaction_concept.name} to \"\n f\"allergen concept {nearest_allergy_concept.name}\"\n )\n\n # Remove the linked REACTION concepts from the main list\n updated_concept_list = [concept for concept in concept_list if concept.category != Category.REACTION]\n\n return updated_concept_list\n\n @staticmethod\n def _convert_allergy_severity_to_code(concept: Concept) -> bool:\n \"\"\"\n Converts allergy severity to corresponding codes and links them to the concept.\n\n Args:\n concept (Concept): The concept to convert severity for.\n\n Returns:\n True if the conversion is successful, False otherwise.\n \"\"\"\n meta_ann_values = [meta_ann.value for meta_ann in concept.meta] if concept.meta is not None else []\n if Severity.MILD in meta_ann_values:\n concept.linked_concepts.append(Concept(id=\"L\", name=\"Low\", category=Category.SEVERITY))\n elif Severity.MODERATE in meta_ann_values:\n concept.linked_concepts.append(Concept(id=\"M\", name=\"Moderate\", category=Category.SEVERITY))\n elif Severity.SEVERE in meta_ann_values:\n concept.linked_concepts.append(Concept(id=\"H\", name=\"High\", category=Category.SEVERITY))\n elif Severity.UNSPECIFIED in meta_ann_values:\n return True\n else:\n log.warning(f\"No severity annotation associated with ({concept.id} | {concept.name})\")\n return False\n\n log.debug(\n f\"Linked severity concept ({concept.linked_concepts[-1].id} | {concept.linked_concepts[-1].name}) \"\n f\"to allergen concept ({concept.id} | {concept.name}): valid meta model output\"\n )\n\n return True\n\n def _convert_allergy_type_to_code(self, concept: Concept) -> bool:\n \"\"\"\n Converts the allergy type of a concept to a code and adds it as a linked concept.\n\n Args:\n concept (Concept): The concept whose allergy type needs to be converted.\n\n Returns:\n True if the conversion and linking were successful, False 
otherwise.\n \"\"\"\n # get the ALLERGYTYPE meta-annotation\n allergy_type = [meta_ann for meta_ann in concept.meta if meta_ann.name == \"allergy_type\"]\n if len(allergy_type) != 1:\n log.warning(\n f\"Unable to map allergy type code: allergy_type meta-annotation \"\n f\"not found for concept {concept.__str__()}\"\n )\n return False\n else:\n allergy_type = allergy_type[0].value\n\n # perform lookup with ALLERGYTYPE and AllergenType combination\n lookup_combination: Tuple[str, str] = (concept.category.value, allergy_type.value)\n allergy_type_lookup_result = self.allergy_type_lookup.get(lookup_combination)\n\n # add resulting allergy type concept as to linked_concept\n if allergy_type_lookup_result is not None:\n concept.linked_concepts.append(\n Concept(\n id=str(allergy_type_lookup_result[0]),\n name=allergy_type_lookup_result[1],\n category=Category.ALLERGY_TYPE,\n )\n )\n log.debug(\n f\"Linked allergy_type concept ({allergy_type_lookup_result[0]} | {allergy_type_lookup_result[1]})\"\n f\" to allergen concept ({concept.id} | {concept.name}): valid meta model output + allergytype lookup\"\n )\n else:\n log.warning(f\"Allergen and adverse reaction type combination not found: {lookup_combination}\")\n\n return True\n\n def _process_meta_ann_by_paragraph(self, concept: Concept, paragraph: Paragraph):\n \"\"\"\n Process the meta annotations for a given concept and paragraph.\n\n Args:\n concept (Concept): The concept object.\n paragraph (Paragraph): The paragraph object.\n\n Returns:\n None\n \"\"\"\n # if paragraph is structured meds to convert to corresponding relevance\n if paragraph.type in self.structured_med_lists:\n for meta in concept.meta:\n if meta.name == \"substance_category\" and meta.value in [\n SubstanceCategory.TAKING,\n SubstanceCategory.IRRELEVANT,\n ]:\n new_relevance = self.structured_med_lists[paragraph.type]\n if meta.value != new_relevance:\n log.debug(\n f\"Converted {meta.value} to \"\n f\"{new_relevance} for concept ({concept.id} | 
{concept.name}): \"\n f\"paragraph is {paragraph.type}\"\n )\n meta.value = new_relevance\n # if paragraph is probs or irrelevant section, convert substance to irrelevant\n elif paragraph.type in self.structured_prob_lists or paragraph.type in self.irrelevant_paragraphs:\n for meta in concept.meta:\n if meta.name == \"substance_category\" and meta.value != SubstanceCategory.IRRELEVANT:\n log.debug(\n f\"Converted {meta.value} to \"\n f\"{SubstanceCategory.IRRELEVANT} for concept ({concept.id} | {concept.name}): \"\n f\"paragraph is {paragraph.type}\"\n )\n meta.value = SubstanceCategory.IRRELEVANT\n\n def process_paragraphs(self, note: Note, concepts: List[Concept]) -> List[Concept]:\n \"\"\"\n Process the paragraphs in a note and update the list of concepts.\n\n Args:\n note (Note): The note object containing the paragraphs.\n concepts (List[Concept]): The list of concepts to be updated.\n\n Returns:\n The updated list of concepts.\n \"\"\"\n for paragraph in note.paragraphs:\n for concept in concepts:\n if concept.start >= paragraph.start and concept.end <= paragraph.end:\n # log.debug(f\"({concept.name} | {concept.id}) is in {paragraph.type}\")\n if concept.meta:\n self._process_meta_ann_by_paragraph(concept, paragraph)\n\n return concepts\n\n def postprocess(self, concepts: List[Concept], note: Note) -> List[Concept]:\n \"\"\"\n Postprocesses a list of concepts and links reactions to allergens.\n\n Args:\n concepts (List[Concept]): The list of concepts to be postprocessed.\n note (Note): The note object associated with the concepts.\n\n Returns:\n The postprocessed list of concepts.\n \"\"\"\n # deepcopy so we still have reference to original list of concepts\n all_concepts = deepcopy(concepts)\n processed_concepts = []\n\n for concept in all_concepts:\n concept = self._validate_and_convert_concepts(concept)\n processed_concepts.append(concept)\n\n processed_concepts = self._link_reactions_to_allergens(processed_concepts, note)\n\n return processed_concepts\n\n 
def convert_VTM_to_VMP_or_text(self, concepts: List[Concept]) -> List[Concept]:\n \"\"\"\n Converts medication concepts from VTM (Virtual Therapeutic Moiety) to VMP (Virtual Medicinal Product) or text.\n\n Args:\n concepts (List[Concept]): A list of medication concepts.\n\n Returns:\n A list of medication concepts with updated IDs, names, and dosages.\n\n \"\"\"\n # Get medication concepts\n med_concepts = [concept for concept in concepts if concept.category == Category.MEDICATION]\n self.vtm_to_vmp_lookup[\"dose\"] = self.vtm_to_vmp_lookup[\"dose\"].astype(float)\n\n med_concepts_with_dose = []\n # I don't know man...Need to improve dosage methods\n for concept in med_concepts:\n if concept.dosage is not None:\n if concept.dosage.dose:\n if concept.dosage.dose.value is not None and concept.dosage.dose.unit is not None:\n med_concepts_with_dose.append(concept)\n\n med_concepts_no_dose = [concept for concept in concepts if concept not in med_concepts_with_dose]\n\n # Create a temporary DataFrame to match vtmId, dose, and unit\n temp_df = pd.DataFrame(\n {\n \"vtmId\": [int(concept.id) for concept in med_concepts_with_dose],\n \"dose\": [float(concept.dosage.dose.value) for concept in med_concepts_with_dose],\n \"unit\": [concept.dosage.dose.unit for concept in med_concepts_with_dose],\n }\n )\n\n # Merge with the lookup df to get vmpId\n merged_df = temp_df.merge(self.vtm_to_vmp_lookup, on=[\"vtmId\", \"dose\", \"unit\"], how=\"left\")\n\n # Update id in the concepts list\n for index, concept in enumerate(med_concepts_with_dose):\n # Convert VTM to VMP id\n vmp_id = merged_df.at[index, \"vmpId\"]\n if not pd.isna(vmp_id):\n log.debug(\n f\"Converted ({concept.id} | {concept.name}) to \"\n f\"({int(vmp_id)} | {concept.name + ' ' + str(int(concept.dosage.dose.value)) + concept.dosage.dose.unit} \"\n f\"tablets): valid extracted dosage + VMP lookup\"\n )\n concept.id = str(int(vmp_id))\n concept.name += \" \" + str(int(concept.dosage.dose.value)) + 
str(concept.dosage.dose.unit) + \" tablets\"\n # If found VMP match change the dosage to 1 tablet\n concept.dosage.dose.value = 1\n concept.dosage.dose.unit = \"{tbl}\"\n else:\n # If no match with dose convert to text\n lookup_result = self.vtm_to_text_lookup.get(int(concept.id))\n if lookup_result is not None:\n log.debug(\n f\"Converted ({concept.id} | {concept.name}) to (None | {lookup_result}: no match to VMP dosage lookup)\"\n )\n concept.id = None\n concept.name = lookup_result\n\n # Convert rest of VTMs that have no dose for VMP conversion to text\n for concept in med_concepts_no_dose:\n lookup_result = self.vtm_to_text_lookup.get(int(concept.id))\n if lookup_result is not None:\n log.debug(f\"Converted ({concept.id} | {concept.name}) to (None | {lookup_result}): no dosage detected\")\n concept.id = None\n concept.name = lookup_result\n\n return concepts\n\n def __call__(\n self,\n note: Note,\n record_concepts: Optional[List[Concept]] = None,\n dosage_extractor: Optional[DosageExtractor] = None,\n ) -> List[Concept]:\n \"\"\"\n Annotates the given note with concepts using the pipeline.\n\n Args:\n note (Note): The note to be annotated.\n record_concepts (Optional[List[Concept]]): A list of concepts to be recorded.\n dosage_extractor (Optional[DosageExtractor]): A dosage extractor to be used.\n\n Returns:\n The annotated concepts.\n \"\"\"\n concepts = self.run_pipeline(note, record_concepts, dosage_extractor)\n\n if self.config.add_numbering:\n concepts = self.add_numbering_to_name(concepts)\n\n return concepts\n
"},{"location":"api-reference/medsallergiesannotator/#miade.annotators.MedsAllergiesAnnotator.concept_types","title":"concept_types: List[Category]
property
","text":"Returns a list of concept types.
Returns:
Type Description
List[Category]
[Category.MEDICATION, Category.ALLERGY, Category.REACTION]
"},{"location":"api-reference/medsallergiesannotator/#miade.annotators.MedsAllergiesAnnotator.pipeline","title":"pipeline: List[str]
property
","text":"Returns a list of annotators in the pipeline.
The annotators are executed in the order they appear in the list.
Returns:
Type Description
List[str]
[\"preprocessor\", \"medcat\", \"paragrapher\", \"postprocessor\", \"dosage_extractor\", \"vtm_converter\", \"deduplicator\"]
"},{"location":"api-reference/medsallergiesannotator/#miade.annotators.MedsAllergiesAnnotator.__call__","title":"__call__(note, record_concepts=None, dosage_extractor=None)
","text":"Annotates the given note with concepts using the pipeline.
Parameters:
Name Type Description Default
note
Note
The note to be annotated.
required
record_concepts
Optional[List[Concept]]
A list of concepts to be recorded.
None
dosage_extractor
Optional[DosageExtractor]
A dosage extractor to be used.
None
Returns:
Type Description
List[Concept]
The annotated concepts.
Source code in src/miade/annotators.py
def __call__(\n self,\n note: Note,\n record_concepts: Optional[List[Concept]] = None,\n dosage_extractor: Optional[DosageExtractor] = None,\n) -> List[Concept]:\n \"\"\"\n Annotates the given note with concepts using the pipeline.\n\n Args:\n note (Note): The note to be annotated.\n record_concepts (Optional[List[Concept]]): A list of concepts to be recorded.\n dosage_extractor (Optional[DosageExtractor]): A dosage extractor to be used.\n\n Returns:\n The annotated concepts.\n \"\"\"\n concepts = self.run_pipeline(note, record_concepts, dosage_extractor)\n\n if self.config.add_numbering:\n concepts = self.add_numbering_to_name(concepts)\n\n return concepts\n
"},{"location":"api-reference/medsallergiesannotator/#miade.annotators.MedsAllergiesAnnotator.add_dosages_to_concepts","title":"add_dosages_to_concepts(dosage_extractor, concepts, note)
staticmethod
","text":"Gets dosages for medication concepts
Parameters:
Name Type Description Default
dosage_extractor
DosageExtractor
The dosage extractor object
required
concepts
List[Concept]
List of concepts extracted
required
note
Note
The input note
required
Returns:
Type Description
List[Concept]
List of concepts with dosages for medication concepts
Source code in src/miade/annotators.py
@staticmethod\ndef add_dosages_to_concepts(\n dosage_extractor: DosageExtractor, concepts: List[Concept], note: Note\n) -> List[Concept]:\n \"\"\"\n Gets dosages for medication concepts\n\n Args:\n dosage_extractor (DosageExtractor): The dosage extractor object\n concepts (List[Concept]): List of concepts extracted\n note (Note): The input note\n\n Returns:\n List of concepts with dosages for medication concepts\n \"\"\"\n\n for ind, concept in enumerate(concepts):\n next_med_concept = concepts[ind + 1] if len(concepts) > ind + 1 else None\n dosage_string = get_dosage_string(concept, next_med_concept, note.text)\n if len(dosage_string.split()) > 2:\n concept.dosage = dosage_extractor(dosage_string)\n concept.category = Category.MEDICATION if concept.dosage is not None else None\n if concept.dosage is not None:\n log.debug(\n f\"Extracted dosage for medication concept \"\n f\"({concept.id} | {concept.name}): {concept.dosage.text} {concept.dosage.dose}\"\n )\n\n return concepts\n
"},{"location":"api-reference/medsallergiesannotator/#miade.annotators.MedsAllergiesAnnotator.convert_VTM_to_VMP_or_text","title":"convert_VTM_to_VMP_or_text(concepts)
","text":"Converts medication concepts from VTM (Virtual Therapeutic Moiety) to VMP (Virtual Medicinal Product) or text.
Parameters:
Name Type Description Default
concepts
List[Concept]
A list of medication concepts.
required
Returns:
Type Description
List[Concept]
A list of medication concepts with updated IDs, names, and dosages.
Source code in src/miade/annotators.py
def convert_VTM_to_VMP_or_text(self, concepts: List[Concept]) -> List[Concept]:\n \"\"\"\n Converts medication concepts from VTM (Virtual Therapeutic Moiety) to VMP (Virtual Medicinal Product) or text.\n\n Args:\n concepts (List[Concept]): A list of medication concepts.\n\n Returns:\n A list of medication concepts with updated IDs, names, and dosages.\n\n \"\"\"\n # Get medication concepts\n med_concepts = [concept for concept in concepts if concept.category == Category.MEDICATION]\n self.vtm_to_vmp_lookup[\"dose\"] = self.vtm_to_vmp_lookup[\"dose\"].astype(float)\n\n med_concepts_with_dose = []\n # I don't know man...Need to improve dosage methods\n for concept in med_concepts:\n if concept.dosage is not None:\n if concept.dosage.dose:\n if concept.dosage.dose.value is not None and concept.dosage.dose.unit is not None:\n med_concepts_with_dose.append(concept)\n\n med_concepts_no_dose = [concept for concept in concepts if concept not in med_concepts_with_dose]\n\n # Create a temporary DataFrame to match vtmId, dose, and unit\n temp_df = pd.DataFrame(\n {\n \"vtmId\": [int(concept.id) for concept in med_concepts_with_dose],\n \"dose\": [float(concept.dosage.dose.value) for concept in med_concepts_with_dose],\n \"unit\": [concept.dosage.dose.unit for concept in med_concepts_with_dose],\n }\n )\n\n # Merge with the lookup df to get vmpId\n merged_df = temp_df.merge(self.vtm_to_vmp_lookup, on=[\"vtmId\", \"dose\", \"unit\"], how=\"left\")\n\n # Update id in the concepts list\n for index, concept in enumerate(med_concepts_with_dose):\n # Convert VTM to VMP id\n vmp_id = merged_df.at[index, \"vmpId\"]\n if not pd.isna(vmp_id):\n log.debug(\n f\"Converted ({concept.id} | {concept.name}) to \"\n f\"({int(vmp_id)} | {concept.name + ' ' + str(int(concept.dosage.dose.value)) + concept.dosage.dose.unit} \"\n f\"tablets): valid extracted dosage + VMP lookup\"\n )\n concept.id = str(int(vmp_id))\n concept.name += \" \" + str(int(concept.dosage.dose.value)) + 
str(concept.dosage.dose.unit) + \" tablets\"\n # If found VMP match change the dosage to 1 tablet\n concept.dosage.dose.value = 1\n concept.dosage.dose.unit = \"{tbl}\"\n else:\n # If no match with dose convert to text\n lookup_result = self.vtm_to_text_lookup.get(int(concept.id))\n if lookup_result is not None:\n log.debug(\n f\"Converted ({concept.id} | {concept.name}) to (None | {lookup_result}: no match to VMP dosage lookup)\"\n )\n concept.id = None\n concept.name = lookup_result\n\n # Convert rest of VTMs that have no dose for VMP conversion to text\n for concept in med_concepts_no_dose:\n lookup_result = self.vtm_to_text_lookup.get(int(concept.id))\n if lookup_result is not None:\n log.debug(f\"Converted ({concept.id} | {concept.name}) to (None | {lookup_result}): no dosage detected\")\n concept.id = None\n concept.name = lookup_result\n\n return concepts\n
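The fallback order in the method above (exact VTM, dose and unit match first, then free text, otherwise unchanged) can be sketched without pandas. This is a minimal illustration that swaps the DataFrame left merge for a plain dictionary keyed on the same three values. The function name convert, both lookup tables and all ids are hypothetical and are not part of MiADE.

```python
def convert(vtm_id, dose, unit, vmp_lookup, text_lookup):
    # Returns (new_id, text_name): a VMP id when the (id, dose, unit) triple
    # matches, otherwise no id plus the free-text name if one is known,
    # otherwise the VTM id unchanged.
    if dose is not None and unit is not None:
        vmp_id = vmp_lookup.get((vtm_id, float(dose), unit))
        if vmp_id is not None:
            return vmp_id, None
    text = text_lookup.get(vtm_id)
    if text is not None:
        return None, text
    return vtm_id, None


# Made-up ids for illustration only; these are not real dm+d codes.
vmp_lookup = {(111, 500.0, 'mg'): 222}
text_lookup = {111: 'EXAMPLE DRUG (free text)'}

matched = convert(111, 500, 'mg', vmp_lookup, text_lookup)
unmatched = convert(111, 250, 'mg', vmp_lookup, text_lookup)
no_dose = convert(111, None, None, vmp_lookup, text_lookup)
```

Casting the dose to float before the lookup plays the same role as the astype(float) call on the lookup table: integer and float doses compare equal as keys.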
"},{"location":"api-reference/medsallergiesannotator/#miade.annotators.MedsAllergiesAnnotator.postprocess","title":"postprocess(concepts, note)
","text":"Postprocesses a list of concepts and links reactions to allergens.
Parameters:
Name Type Description Default
concepts
List[Concept]
The list of concepts to be postprocessed.
required
note
Note
The note object associated with the concepts.
required
Returns:
Type Description
List[Concept]
The postprocessed list of concepts.
Source code in src/miade/annotators.py
def postprocess(self, concepts: List[Concept], note: Note) -> List[Concept]:\n \"\"\"\n Postprocesses a list of concepts and links reactions to allergens.\n\n Args:\n concepts (List[Concept]): The list of concepts to be postprocessed.\n note (Note): The note object associated with the concepts.\n\n Returns:\n The postprocessed list of concepts.\n \"\"\"\n # deepcopy so we still have reference to original list of concepts\n all_concepts = deepcopy(concepts)\n processed_concepts = []\n\n for concept in all_concepts:\n concept = self._validate_and_convert_concepts(concept)\n processed_concepts.append(concept)\n\n processed_concepts = self._link_reactions_to_allergens(processed_concepts, note)\n\n return processed_concepts\n
"},{"location":"api-reference/medsallergiesannotator/#miade.annotators.MedsAllergiesAnnotator.process_paragraphs","title":"process_paragraphs(note, concepts)
","text":"Process the paragraphs in a note and update the list of concepts.
Parameters:
Name Type Description Default
note
Note
The note object containing the paragraphs.
required
concepts
List[Concept]
The list of concepts to be updated.
required
Returns:
Type Description
List[Concept]
The updated list of concepts.
Source code in src/miade/annotators.py
def process_paragraphs(self, note: Note, concepts: List[Concept]) -> List[Concept]:\n \"\"\"\n Process the paragraphs in a note and update the list of concepts.\n\n Args:\n note (Note): The note object containing the paragraphs.\n concepts (List[Concept]): The list of concepts to be updated.\n\n Returns:\n The updated list of concepts.\n \"\"\"\n for paragraph in note.paragraphs:\n for concept in concepts:\n if concept.start >= paragraph.start and concept.end <= paragraph.end:\n # log.debug(f\"({concept.name} | {concept.id}) is in {paragraph.type}\")\n if concept.meta:\n self._process_meta_ann_by_paragraph(concept, paragraph)\n\n return concepts\n
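The containment test that decides which paragraph a concept belongs to can be tried in isolation. The sketch below assumes only that both objects carry start and end character offsets. Span is a hypothetical stand-in and not one of MiADE's Concept or Paragraph classes.

```python
from dataclasses import dataclass


@dataclass
class Span:
    # Hypothetical stand-in for any object with character offsets;
    # MiADE's own Concept and Paragraph classes carry more fields.
    label: str
    start: int
    end: int


def spans_inside(paragraph, concepts):
    # Keep the concepts whose offsets fall entirely within the paragraph,
    # using the same start/end comparison as process_paragraphs.
    return [c for c in concepts if c.start >= paragraph.start and c.end <= paragraph.end]


paragraph = Span('meds', 10, 40)
concepts = [Span('aspirin', 12, 19), Span('asthma', 45, 51), Span('edge', 35, 41)]
inside = [c.label for c in spans_inside(paragraph, concepts)]
```

A concept that straddles a paragraph boundary, like the one labelled edge here, matches no paragraph under this rule, so its meta-annotations would be left as they are.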
"},{"location":"api-reference/medsallergiesannotator/#miade.annotators.MedsAllergiesAnnotator.run_pipeline","title":"run_pipeline(note, record_concepts, dosage_extractor)
","text":"Runs the annotation pipeline on the given note.
Parameters:
Name Type Description Default
note
Note
The input note to run the pipeline on.
required
record_concepts
List[Concept]
The list of previously recorded concepts.
required
dosage_extractor
Optional[DosageExtractor]
The dosage extractor function.
required
Returns:
Type Description
List[Concept]
The list of annotated concepts.
Source code in src/miade/annotators.py
def run_pipeline(\n self, note: Note, record_concepts: List[Concept], dosage_extractor: Optional[DosageExtractor]\n) -> List[Concept]:\n \"\"\"\n Runs the annotation pipeline on the given note.\n\n Args:\n note (Note): The input note to run the pipeline on.\n record_concepts (List[Concept]): The list of previously recorded concepts.\n dosage_extractor (Optional[DosageExtractor]): The dosage extractor function.\n\n Returns:\n The list of annotated concepts.\n \"\"\"\n concepts: List[Concept] = []\n\n for pipe in self.pipeline:\n if pipe not in self.config.disable:\n if pipe == \"preprocessor\":\n note = self.preprocess(note)\n elif pipe == \"medcat\":\n concepts = self.get_concepts(note)\n elif pipe == \"paragrapher\":\n concepts = self.process_paragraphs(note, concepts)\n elif pipe == \"postprocessor\":\n concepts = self.postprocess(concepts, note)\n elif pipe == \"deduplicator\":\n concepts = self.deduplicate(concepts, record_concepts)\n elif pipe == \"vtm_converter\":\n concepts = self.convert_VTM_to_VMP_or_text(concepts)\n elif pipe == \"dosage_extractor\" and dosage_extractor is not None:\n concepts = self.add_dosages_to_concepts(dosage_extractor, concepts, note)\n\n return concepts\n
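The idea of walking an ordered list of step names and skipping the disabled ones can be shown with a toy example. Note that this sketch uses a dictionary of handlers instead of the if/elif chain in the source above. The names run_steps and handlers, and the string-processing steps, are hypothetical and chosen only so the example runs on its own.

```python
def run_steps(pipeline, disabled, handlers, value):
    # Apply each named step in pipeline order, skipping disabled names
    # and names that have no handler registered.
    for name in pipeline:
        if name in disabled or name not in handlers:
            continue
        value = handlers[name](value)
    return value


# Toy steps standing in for preprocessing, annotation and so on.
handlers = {
    'strip': str.strip,
    'lower': str.lower,
    'dedupe_spaces': lambda text: ' '.join(text.split()),
}

result = run_steps(
    ['strip', 'lower', 'dedupe_spaces'],
    ['lower'],
    handlers,
    '  Take  ASPIRIN daily ',
)
```

Because the order comes from the pipeline list and not from the handler table, reordering or disabling a step needs no change to the loop.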
"},{"location":"api-reference/metaannotations/","title":"MetaAnnotations","text":" Bases: BaseModel
Represents a meta annotation with a name, value, and optional confidence.
Attributes:
Name Type Description
name
str
The name of the meta annotation.
value
Enum
The value of the meta annotation.
confidence
float
The confidence level of the meta annotation.
Source code in src/miade/metaannotations.py
class MetaAnnotations(BaseModel):\n \"\"\"\n Represents a meta annotation with a name, value, and optional confidence.\n\n Attributes:\n name (str): The name of the meta annotation.\n value (Enum): The value of the meta annotation.\n confidence (float, optional): The confidence level of the meta annotation.\n \"\"\"\n\n name: str\n value: Enum\n confidence: Optional[float]\n\n @validator(\"value\", pre=True)\n def validate_value(cls, value, values):\n enum_dict = META_ANNS_DICT\n if isinstance(value, str):\n enum_type = enum_dict.get(values[\"name\"])\n if enum_type is not None:\n try:\n return enum_type(value)\n except ValueError:\n raise ValueError(f\"Invalid value: {value}\")\n else:\n raise ValueError(f\"Invalid mapping for {values['name']}\")\n\n return value\n\n def __eq__(self, other):\n return self.name == other.name and self.value == other.value\n
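The coercion performed by the validator (look up the enum type registered for the annotation name, then convert the string) can be sketched with the standard library alone. Status and ENUM_BY_NAME are hypothetical names invented for this illustration. ENUM_BY_NAME plays the role that META_ANNS_DICT has in the source, whose actual contents are not shown here.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical enum used only for this illustration.
    CONFIRMED = 'confirmed'
    NEGATED = 'negated'


# Hypothetical registry mapping annotation names to enum types.
ENUM_BY_NAME = {'status': Status}


def coerce_value(name, value):
    # Strings are converted to the enum registered under the annotation name;
    # anything else is passed through unchanged, as in validate_value.
    if not isinstance(value, str):
        return value
    enum_type = ENUM_BY_NAME.get(name)
    if enum_type is None:
        raise ValueError(f'Invalid mapping for {name}')
    try:
        return enum_type(value)
    except ValueError:
        raise ValueError(f'Invalid value: {value}') from None


coerced = coerce_value('status', 'negated')
```

An unknown annotation name and an unknown value both surface as ValueError, which is the exception type a pydantic validator is expected to raise.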
"},{"location":"api-reference/note/","title":"Note","text":" Bases: object
Represents a note object.
Attributes:
Name Type Description
text
str
The text content of the note.
raw_text
str
The raw text content of the note.
regex_config
str
The path to the regex configuration file.
paragraphs
Optional[List[Paragraph]]
A list of paragraphs in the note.
Source code in src/miade/note.py
class Note(object):\n \"\"\"\n Represents a note object.\n\n Attributes:\n text (str): The text content of the note.\n raw_text (str): The raw text content of the note.\n regex_config (str): The path to the regex configuration file.\n paragraphs (Optional[List[Paragraph]]): A list of paragraphs in the note.\n \"\"\"\n\n def __init__(self, text: str, regex_config_path: str = \"./data/regex_para_chunk.csv\"):\n self.text = text\n self.raw_text = text\n self.regex_config = load_regex_config_mappings(regex_config_path)\n self.paragraphs: Optional[List[Paragraph]] = []\n\n def clean_text(self) -> None:\n \"\"\"\n Cleans the text content of the note.\n\n This method performs various cleaning operations on the text content of the note,\n such as replacing spaces, removing punctuation, and removing empty lines.\n \"\"\"\n\n # Replace all types of spaces with a single normal space, preserving \"\\n\"\n self.text = re.sub(r\"(?:(?!\\n)\\s)+\", \" \", self.text)\n\n # Remove en dashes that are not between two numbers\n self.text = re.sub(r\"(?<![0-9])-(?![0-9])\", \"\", self.text)\n\n # Remove all punctuation except full stops, question marks, dash and line breaks\n self.text = re.sub(r\"[^\\w\\s.,?\\n-]\", \"\", self.text)\n\n # Remove spaces if the entire line (between two line breaks) is just spaces\n self.text = re.sub(r\"(?<=\\n)\\s+(?=\\n)\", \"\", self.text)\n\n def get_paragraphs(self) -> None:\n \"\"\"\n Splits the note into paragraphs.\n\n This method splits the text content of the note into paragraphs based on double line breaks.\n It also assigns a paragraph type to each paragraph based on matching patterns in the heading.\n \"\"\"\n\n paragraphs = re.split(r\"\\n\\n+\", self.text)\n start = 0\n\n for text in paragraphs:\n # Default to prose\n paragraph_type = ParagraphType.prose\n\n # Use re.search to find everything before first \\n\n match = re.search(r\"^(.*?)(?:\\n|$)([\\s\\S]*)\", text)\n\n # Check if a match is found\n if match:\n heading = match.group(1)\n 
body = match.group(2)\n else:\n heading = text\n body = \"\"\n\n end = start + len(text)\n paragraph = Paragraph(heading=heading, body=body, type=paragraph_type, start=start, end=end)\n start = end + 2 # Account for the two newline characters\n\n # Convert the heading to lowercase for case-insensitive matching\n if heading:\n heading = heading.lower()\n # Iterate through the dictionary items and patterns\n for paragraph_type, pattern in self.regex_config.items():\n if re.search(pattern, heading):\n paragraph.type = paragraph_type\n break # Exit the loop if a match is found\n\n self.paragraphs.append(paragraph)\n\n def __str__(self):\n return self.text\n
"},{"location":"api-reference/note/#miade.note.Note.clean_text","title":"clean_text()
","text":"Cleans the text content of the note.
This method performs various cleaning operations on the text content of the note, such as replacing spaces, removing punctuation, and removing empty lines.
Source code in src/miade/note.py
def clean_text(self) -> None:\n \"\"\"\n Cleans the text content of the note.\n\n This method performs various cleaning operations on the text content of the note,\n such as replacing spaces, removing punctuation, and removing empty lines.\n \"\"\"\n\n # Replace all types of spaces with a single normal space, preserving \"\\n\"\n self.text = re.sub(r\"(?:(?!\\n)\\s)+\", \" \", self.text)\n\n # Remove hyphens that have no digit on either side\n self.text = re.sub(r\"(?<![0-9])-(?![0-9])\", \"\", self.text)\n\n # Remove all punctuation except full stops, commas, question marks, hyphens and line breaks\n self.text = re.sub(r\"[^\\w\\s.,?\\n-]\", \"\", self.text)\n\n # Remove whitespace if the entire line (between two line breaks) is just whitespace\n self.text = re.sub(r\"(?<=\\n)\\s+(?=\\n)\", \"\", self.text)\n
"},{"location":"api-reference/note/#miade.note.Note.get_paragraphs","title":"get_paragraphs()
","text":"Splits the note into paragraphs.
This method splits the text content of the note into paragraphs based on double line breaks. It also assigns a paragraph type to each paragraph based on matching patterns in the heading.
Source code in src/miade/note.py
def get_paragraphs(self) -> None:\n \"\"\"\n Splits the note into paragraphs.\n\n This method splits the text content of the note into paragraphs based on double line breaks.\n It also assigns a paragraph type to each paragraph based on matching patterns in the heading.\n \"\"\"\n\n paragraphs = re.split(r\"\\n\\n+\", self.text)\n start = 0\n\n for text in paragraphs:\n # Default to prose\n paragraph_type = ParagraphType.prose\n\n # Use re.search to find everything before first \\n\n match = re.search(r\"^(.*?)(?:\\n|$)([\\s\\S]*)\", text)\n\n # Check if a match is found\n if match:\n heading = match.group(1)\n body = match.group(2)\n else:\n heading = text\n body = \"\"\n\n end = start + len(text)\n paragraph = Paragraph(heading=heading, body=body, type=paragraph_type, start=start, end=end)\n start = end + 2 # Account for the two newline characters\n\n # Convert the heading to lowercase for case-insensitive matching\n if heading:\n heading = heading.lower()\n # Iterate through the dictionary items and patterns\n for paragraph_type, pattern in self.regex_config.items():\n if re.search(pattern, heading):\n paragraph.type = paragraph_type\n break # Exit the loop if a match is found\n\n self.paragraphs.append(paragraph)\n
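The splitting step can be explored with a small standalone function. This is a variation and not the library's implementation: the source above advances the offset by a fixed two characters after each paragraph, whereas this sketch reads the offsets from the separator matches, so runs of three or more line breaks stay aligned with the text. The name split_paragraphs and the tuple layout are assumptions made for the example, and paragraph typing by heading pattern is left out.

```python
import re

NL = chr(10)  # the line-feed character
BREAK = re.compile(NL + NL + '+')  # two or more consecutive line feeds


def split_paragraphs(text):
    # Return (heading, body, start, end) tuples, taking offsets from the
    # actual separator matches in the text.
    paragraphs = []
    start = 0
    for match in list(BREAK.finditer(text)) + [None]:
        end = match.start() if match else len(text)
        chunk = text[start:end]
        # Everything before the first line break is treated as the heading.
        heading, _, body = chunk.partition(NL)
        paragraphs.append((heading, body, start, end))
        start = match.end() if match else end
    return paragraphs


# Two paragraphs separated by three line breaks.
note = NL.join(['Meds:', 'aspirin 75mg', '', '', 'Plan', 'review'])
parts = split_paragraphs(note)
```

Slicing the note with the start and end of any returned tuple gives back that paragraph's text, which is the property the concept-to-paragraph matching relies on.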
"},{"location":"api-reference/noteprocessor/","title":"NoteProcessor","text":"Main processor of MiADE which extract, postprocesses, and deduplicates concepts given annotators (MedCAT models), Note, and existing concepts
Parameters:
Name Type Description Default
model_directory
Path
Path to directory that contains medcat models and a config.yaml file
required
model_config_path
Path
Path to the model config file. Defaults to None.
None
log_level
int
Log level. Defaults to logging.INFO.
INFO
dosage_extractor_log_level
int
Log level for dosage extractor. Defaults to logging.INFO.
INFO
device
str
Device to run inference on (cpu or gpu). Defaults to \"cpu\".
'cpu'
custom_annotators
List[Annotator]
List of custom annotators. Defaults to None.
None
Source code in src/miade/core.py
class NoteProcessor:\n \"\"\"\n Main processor of MiADE which extract, postprocesses, and deduplicates concepts given\n annotators (MedCAT models), Note, and existing concepts\n\n Args:\n model_directory (Path): Path to directory that contains medcat models and a config.yaml file\n model_config_path (Path, optional): Path to the model config file. Defaults to None.\n log_level (int, optional): Log level. Defaults to logging.INFO.\n dosage_extractor_log_level (int, optional): Log level for dosage extractor. Defaults to logging.INFO.\n device (str, optional): Device to run inference on (cpu or gpu). Defaults to \"cpu\".\n custom_annotators (List[Annotator], optional): List of custom annotators. Defaults to None.\n \"\"\"\n\n def __init__(\n self,\n model_directory: Path,\n model_config_path: Path = None,\n log_level: int = logging.INFO,\n dosage_extractor_log_level: int = logging.INFO,\n device: str = \"cpu\",\n custom_annotators: Optional[List[Annotator]] = None,\n ):\n logging.getLogger(\"miade\").setLevel(log_level)\n logging.getLogger(\"miade.dosageextractor\").setLevel(dosage_extractor_log_level)\n logging.getLogger(\"miade.drugdoseade\").setLevel(dosage_extractor_log_level)\n\n self.device: str = device\n\n self.annotators: List[Annotator] = []\n self.model_directory: Path = model_directory\n self.model_config_path: Path = model_config_path\n self.model_factory: ModelFactory = self._load_model_factory(custom_annotators)\n self.dosage_extractor: DosageExtractor = DosageExtractor()\n\n def _load_config(self) -> Dict:\n \"\"\"\n Loads the configuration file (config.yaml) in the configured model path.\n If the model path is not explicitly passed, it defaults to the model directory.\n\n Returns:\n A dictionary containing the loaded config file.\n \"\"\"\n if self.model_config_path is None:\n config_path = os.path.join(self.model_directory, \"config.yaml\")\n else:\n config_path = self.model_config_path\n\n if os.path.isfile(config_path):\n log.info(f\"Found config 
file {config_path}\")\n else:\n log.error(f\"No model config file found at {config_path}\")\n\n with open(config_path, \"r\") as f:\n config = yaml.safe_load(f)\n\n return config\n\n def _load_model_factory(self, custom_annotators: Optional[List[Annotator]] = None) -> ModelFactory:\n \"\"\"\n Loads the model factory which maps model aliases to MedCAT model IDs and MiADE annotators.\n\n Args:\n custom_annotators (List[Annotators], optional): List of custom annotators to initialize. Defaults to None.\n\n Returns:\n The initialized ModelFactory object.\n\n Raises:\n Exception: If there is an error loading MedCAT models.\n\n \"\"\"\n meta_cat_config_dict = {\"general\": {\"device\": self.device}}\n config_dict = self._load_config()\n loaded_models = {}\n\n # get model {id: cat_model}\n log.info(f\"Loading MedCAT models from {self.model_directory}\")\n for model_pack_filepath in self.model_directory.glob(\"*.zip\"):\n try:\n cat = MiADE_CAT.load_model_pack(str(model_pack_filepath), meta_cat_config_dict=meta_cat_config_dict)\n # temp fix reload to load stop words\n cat.pipe._nlp = spacy.load(\n cat.config.general.spacy_model, disable=cat.config.general.spacy_disabled_components\n )\n cat._create_pipeline(config=cat.config)\n cat_id = cat.config.version[\"id\"]\n loaded_models[cat_id] = cat\n except Exception as e:\n raise Exception(f\"Error loading MedCAT models: {e}\")\n\n mapped_models = {}\n # map to name if given {name: <class CAT>}\n if \"models\" in config_dict:\n for name, model_id in config_dict[\"models\"].items():\n cat_model = loaded_models.get(model_id)\n if cat_model is None:\n log.warning(f\"No match for model id {model_id} in {self.model_directory}, skipping\")\n continue\n mapped_models[name] = cat_model\n else:\n log.warning(\"No model ids configured!\")\n\n mapped_annotators = {}\n # {name: <class Annotator>}\n if \"annotators\" in config_dict:\n for name, annotator_string in config_dict[\"annotators\"].items():\n if custom_annotators is not None:\n for 
annotator_class in custom_annotators:\n if annotator_class.__name__ == annotator_string:\n mapped_annotators[name] = annotator_class\n break\n if name not in mapped_annotators:\n try:\n annotator_class = getattr(sys.modules[__name__], annotator_string)\n mapped_annotators[name] = annotator_class\n except AttributeError as e:\n log.warning(f\"{annotator_string} not found: {e}\")\n else:\n log.warning(\"No annotators configured!\")\n\n mapped_configs = {}\n if \"general\" in config_dict:\n for name, config in config_dict[\"general\"].items():\n try:\n mapped_configs[name] = AnnotatorConfig(**config)\n except Exception as e:\n log.error(f\"Error processing config for '{name}': {str(e)}\")\n else:\n log.warning(\"No general settings configured, using default settings.\")\n\n model_factory_config = {\"models\": mapped_models, \"annotators\": mapped_annotators, \"configs\": mapped_configs}\n\n return ModelFactory(**model_factory_config)\n\n def add_annotator(self, name: str) -> None:\n \"\"\"\n Adds an annotator to the processor.\n\n Args:\n name (str): The alias of the annotator to add.\n\n Returns:\n None\n\n Raises:\n Exception: If there is an error creating the annotator.\n \"\"\"\n try:\n annotator = create_annotator(name, self.model_factory)\n log.info(\n f\"Added {type(annotator).__name__} to processor with config {self.model_factory.configs.get(name)}\"\n )\n except Exception as e:\n raise Exception(f\"Error creating annotator: {e}\")\n\n self.annotators.append(annotator)\n\n def remove_annotator(self, name: str) -> None:\n \"\"\"\n Removes an annotator from the processor.\n\n Args:\n name (str): The alias of the annotator to remove.\n\n Returns:\n None\n \"\"\"\n annotator_found = False\n annotator_name = self.model_factory.annotators[name]\n\n for annotator in self.annotators:\n if type(annotator).__name__ == annotator_name.__name__:\n self.annotators.remove(annotator)\n annotator_found = True\n log.info(f\"Removed {type(annotator).__name__} from processor\")\n 
break\n\n if not annotator_found:\n log.warning(f\"Annotator {type(name).__name__} not found in processor\")\n\n def print_model_cards(self) -> None:\n \"\"\"\n Prints the model cards for each annotator in the `annotators` list.\n\n Each model card includes the name of the annotator's class and its category.\n \"\"\"\n for annotator in self.annotators:\n print(f\"{type(annotator).__name__}: {annotator.cat}\")\n\n def process(self, note: Note, record_concepts: Optional[List[Concept]] = None) -> List[Concept]:\n \"\"\"\n Process the given note and extract concepts using the loaded annotators.\n\n Args:\n note (Note): The note to be processed.\n record_concepts (Optional[List[Concept]]): A list of existing concepts in the EHR record.\n\n Returns:\n A list of extracted concepts.\n\n \"\"\"\n if not self.annotators:\n log.warning(\"No annotators loaded, use .add_annotator() to load annotators\")\n return []\n\n concepts: List[Concept] = []\n\n for annotator in self.annotators:\n log.debug(f\"Processing concepts with {type(annotator).__name__}\")\n if Category.MEDICATION in annotator.concept_types:\n detected_concepts = annotator(note, record_concepts, self.dosage_extractor)\n concepts.extend(detected_concepts)\n else:\n detected_concepts = annotator(note, record_concepts)\n concepts.extend(detected_concepts)\n\n return concepts\n\n def get_concept_dicts(\n self, note: Note, filter_uncategorized: bool = True, record_concepts: Optional[List[Concept]] = None\n ) -> List[Dict]:\n \"\"\"\n Returns concepts in dictionary format.\n\n Args:\n note (Note): Note containing text to extract concepts from.\n filter_uncategorized (bool): If True, does not return concepts where category=None. 
Default is True.\n record_concepts (Optional[List[Concept]]): List of concepts in existing record.\n\n Returns:\n Extracted concepts in JSON-compatible dictionary format.\n \"\"\"\n concepts = self.process(note, record_concepts)\n concept_list = []\n for concept in concepts:\n if filter_uncategorized and concept.category is None:\n continue\n concept_dict = concept.__dict__\n if concept.dosage is not None:\n concept_dict[\"dosage\"] = {\n \"dose\": concept.dosage.dose.dict() if concept.dosage.dose else None,\n \"duration\": concept.dosage.duration.dict() if concept.dosage.duration else None,\n \"frequency\": concept.dosage.frequency.dict() if concept.dosage.frequency else None,\n \"route\": concept.dosage.route.dict() if concept.dosage.route else None,\n }\n if concept.meta is not None:\n meta_anns = []\n for meta in concept.meta:\n meta_dict = meta.__dict__\n meta_dict[\"value\"] = meta.value.name\n meta_anns.append(meta_dict)\n concept_dict[\"meta\"] = meta_anns\n if concept.category is not None:\n concept_dict[\"category\"] = concept.category.name\n concept_list.append(concept_dict)\n\n return concept_list\n
"},{"location":"api-reference/noteprocessor/#miade.core.NoteProcessor.add_annotator","title":"add_annotator(name)
","text":"Adds an annotator to the processor.
Parameters:
Name Type Description Defaultname
str
The alias of the annotator to add.
requiredReturns:
Type DescriptionNone
None
Raises:
Type DescriptionException
If there is an error creating the annotator.
Source code insrc/miade/core.py
def add_annotator(self, name: str) -> None:\n \"\"\"\n Adds an annotator to the processor.\n\n Args:\n name (str): The alias of the annotator to add.\n\n Returns:\n None\n\n Raises:\n Exception: If there is an error creating the annotator.\n \"\"\"\n try:\n annotator = create_annotator(name, self.model_factory)\n log.info(\n f\"Added {type(annotator).__name__} to processor with config {self.model_factory.configs.get(name)}\"\n )\n except Exception as e:\n raise Exception(f\"Error creating annotator: {e}\")\n\n self.annotators.append(annotator)\n
"},{"location":"api-reference/noteprocessor/#miade.core.NoteProcessor.get_concept_dicts","title":"get_concept_dicts(note, filter_uncategorized=True, record_concepts=None)
","text":"Returns concepts in dictionary format.
Parameters:
Name Type Description Defaultnote
Note
Note containing text to extract concepts from.
requiredfilter_uncategorized
bool
If True, does not return concepts where category=None. Default is True.
True
record_concepts
Optional[List[Concept]]
List of concepts in existing record.
None
Returns:
Type DescriptionList[Dict]
Extracted concepts in JSON-compatible dictionary format.
Source code insrc/miade/core.py
def get_concept_dicts(\n self, note: Note, filter_uncategorized: bool = True, record_concepts: Optional[List[Concept]] = None\n) -> List[Dict]:\n \"\"\"\n Returns concepts in dictionary format.\n\n Args:\n note (Note): Note containing text to extract concepts from.\n filter_uncategorized (bool): If True, does not return concepts where category=None. Default is True.\n record_concepts (Optional[List[Concept]]): List of concepts in existing record.\n\n Returns:\n Extracted concepts in JSON-compatible dictionary format.\n \"\"\"\n concepts = self.process(note, record_concepts)\n concept_list = []\n for concept in concepts:\n if filter_uncategorized and concept.category is None:\n continue\n concept_dict = concept.__dict__\n if concept.dosage is not None:\n concept_dict[\"dosage\"] = {\n \"dose\": concept.dosage.dose.dict() if concept.dosage.dose else None,\n \"duration\": concept.dosage.duration.dict() if concept.dosage.duration else None,\n \"frequency\": concept.dosage.frequency.dict() if concept.dosage.frequency else None,\n \"route\": concept.dosage.route.dict() if concept.dosage.route else None,\n }\n if concept.meta is not None:\n meta_anns = []\n for meta in concept.meta:\n meta_dict = meta.__dict__\n meta_dict[\"value\"] = meta.value.name\n meta_anns.append(meta_dict)\n concept_dict[\"meta\"] = meta_anns\n if concept.category is not None:\n concept_dict[\"category\"] = concept.category.name\n concept_list.append(concept_dict)\n\n return concept_list\n
"},{"location":"api-reference/noteprocessor/#miade.core.NoteProcessor.print_model_cards","title":"print_model_cards()
","text":"Prints the model cards for each annotator in the annotators
list.
Each model card includes the name of the annotator's class and its category.
Source code insrc/miade/core.py
def print_model_cards(self) -> None:\n \"\"\"\n Prints the model cards for each annotator in the `annotators` list.\n\n Each model card includes the name of the annotator's class and its category.\n \"\"\"\n for annotator in self.annotators:\n print(f\"{type(annotator).__name__}: {annotator.cat}\")\n
"},{"location":"api-reference/noteprocessor/#miade.core.NoteProcessor.process","title":"process(note, record_concepts=None)
","text":"Process the given note and extract concepts using the loaded annotators.
Parameters:
Name Type Description Defaultnote
Note
The note to be processed.
requiredrecord_concepts
Optional[List[Concept]]
A list of existing concepts in the EHR record.
None
Returns:
Type DescriptionList[Concept]
A list of extracted concepts.
Source code insrc/miade/core.py
def process(self, note: Note, record_concepts: Optional[List[Concept]] = None) -> List[Concept]:\n \"\"\"\n Process the given note and extract concepts using the loaded annotators.\n\n Args:\n note (Note): The note to be processed.\n record_concepts (Optional[List[Concept]]): A list of existing concepts in the EHR record.\n\n Returns:\n A list of extracted concepts.\n\n \"\"\"\n if not self.annotators:\n log.warning(\"No annotators loaded, use .add_annotator() to load annotators\")\n return []\n\n concepts: List[Concept] = []\n\n for annotator in self.annotators:\n log.debug(f\"Processing concepts with {type(annotator).__name__}\")\n if Category.MEDICATION in annotator.concept_types:\n detected_concepts = annotator(note, record_concepts, self.dosage_extractor)\n concepts.extend(detected_concepts)\n else:\n detected_concepts = annotator(note, record_concepts)\n concepts.extend(detected_concepts)\n\n return concepts\n
"},{"location":"api-reference/noteprocessor/#miade.core.NoteProcessor.remove_annotator","title":"remove_annotator(name)
","text":"Removes an annotator from the processor.
Parameters:
Name Type Description Defaultname
str
The alias of the annotator to remove.
requiredReturns:
Type DescriptionNone
None
Source code insrc/miade/core.py
def remove_annotator(self, name: str) -> None:\n \"\"\"\n Removes an annotator from the processor.\n\n Args:\n name (str): The alias of the annotator to remove.\n\n Returns:\n None\n \"\"\"\n annotator_found = False\n annotator_name = self.model_factory.annotators[name]\n\n for annotator in self.annotators:\n if type(annotator).__name__ == annotator_name.__name__:\n self.annotators.remove(annotator)\n annotator_found = True\n log.info(f\"Removed {type(annotator).__name__} from processor\")\n break\n\n if not annotator_found:\n log.warning(f\"Annotator {type(name).__name__} not found in processor\")\n
"},{"location":"api-reference/problemsannotator/","title":"ProblemsAnnotator","text":" Bases: Annotator
Annotator class for identifying and processing problems in medical notes.
This class extends the base Annotator
class and provides specific functionality for identifying and processing problems in medical notes. It implements methods for loading problem lookup data, processing meta annotations, filtering concepts, and post-processing the annotated concepts.
Attributes:
Name Type Descriptioncat
CAT
The CAT (Concept Annotation Tool) instance used for annotation.
config
AnnotatorConfig
The configuration object for the annotator.
Properties:
concept_types (list): A list of concept types supported by this annotator.
pipeline (list): The list of processing steps in the annotation pipeline.
Source code insrc/miade/annotators.py
class ProblemsAnnotator(Annotator):\n \"\"\"\n Annotator class for identifying and processing problems in medical notes.\n\n This class extends the base `Annotator` class and provides specific functionality\n for identifying and processing problems in medical notes. It implements methods\n for loading problem lookup data, processing meta annotations, filtering concepts,\n and post-processing the annotated concepts.\n\n Attributes:\n cat (CAT): The CAT (Concept Annotation Tool) instance used for annotation.\n config (AnnotatorConfig): The configuration object for the annotator.\n\n Properties:\n concept_types (list): A list of concept types supported by this annotator.\n pipeline (list): The list of processing steps in the annotation pipeline.\n \"\"\"\n\n def __init__(self, cat: CAT, config: AnnotatorConfig = None):\n super().__init__(cat, config)\n self._load_problems_lookup_data()\n\n @property\n def concept_types(self) -> List[Category]:\n \"\"\"\n Get the list of concept types supported by this annotator.\n\n Returns:\n [Category.PROBLEM]\n \"\"\"\n return [Category.PROBLEM]\n\n @property\n def pipeline(self) -> List[str]:\n \"\"\"\n Get the list of processing steps in the annotation pipeline.\n\n Returns:\n [\"preprocessor\", \"medcat\", \"paragrapher\", \"postprocessor\", \"deduplicator\"]\n \"\"\"\n return [\"preprocessor\", \"medcat\", \"paragrapher\", \"postprocessor\", \"deduplicator\"]\n\n def _load_problems_lookup_data(self) -> None:\n \"\"\"\n Load the problem lookup data.\n\n Raises:\n RuntimeError: If the lookup data directory does not exist.\n \"\"\"\n if not os.path.isdir(self.config.lookup_data_path):\n raise RuntimeError(f\"No lookup data configured: {self.config.lookup_data_path} does not exist!\")\n else:\n self.negated_lookup = load_lookup_data(self.config.lookup_data_path + \"negated.csv\", as_dict=True)\n self.historic_lookup = load_lookup_data(self.config.lookup_data_path + \"historic.csv\", as_dict=True)\n self.suspected_lookup = 
load_lookup_data(self.config.lookup_data_path + \"suspected.csv\", as_dict=True)\n self.filtering_blacklist = load_lookup_data(\n self.config.lookup_data_path + \"problem_blacklist.csv\", no_header=True\n )\n\n def _process_meta_annotations(self, concept: Concept) -> Optional[Concept]:\n \"\"\"\n Process the meta annotations for a concept.\n\n Args:\n concept (Concept): The concept to process.\n\n Returns:\n The processed concept, or None if it should be removed.\n\n Raises:\n ValueError: If the concept has an invalid negex value.\n \"\"\"\n # Add, convert, or ignore concepts\n meta_ann_values = [meta_ann.value for meta_ann in concept.meta] if concept.meta is not None else []\n\n convert = False\n tag = \"\"\n # only get meta model results if negex is false\n if concept.negex is not None:\n if concept.negex:\n convert = self.negated_lookup.get(int(concept.id), False)\n tag = \" (negated)\"\n elif Presence.SUSPECTED in meta_ann_values:\n convert = self.suspected_lookup.get(int(concept.id), False)\n tag = \" (suspected)\"\n elif Relevance.HISTORIC in meta_ann_values:\n convert = self.historic_lookup.get(int(concept.id), False)\n tag = \" (historic)\"\n else:\n if Presence.NEGATED in meta_ann_values:\n convert = self.negated_lookup.get(int(concept.id), False)\n tag = \" (negated)\"\n elif Presence.SUSPECTED in meta_ann_values:\n convert = self.suspected_lookup.get(int(concept.id), False)\n tag = \" (suspected)\"\n elif Relevance.HISTORIC in meta_ann_values:\n convert = self.historic_lookup.get(int(concept.id), False)\n tag = \" (historic)\"\n\n if convert:\n if tag == \" (negated)\" and concept.negex:\n log.debug(\n f\"Converted concept ({concept.id} | {concept.name}) to ({str(convert)} | {concept.name + tag}): \"\n f\"negation detected by negex\"\n )\n else:\n log.debug(\n f\"Converted concept ({concept.id} | {concept.name}) to ({str(convert)} | {concept.name + tag}):\"\n f\"detected by meta model\"\n )\n concept.id = str(convert)\n concept.name += tag\n else:\n if 
concept.negex:\n log.debug(f\"Removed concept ({concept.id} | {concept.name}): negation (negex) with no conversion match\")\n return None\n if concept.negex is None and Presence.NEGATED in meta_ann_values:\n log.debug(\n f\"Removed concept ({concept.id} | {concept.name}): negation (meta model) with no conversion match\"\n )\n return None\n if Presence.SUSPECTED in meta_ann_values:\n log.debug(f\"Removed concept ({concept.id} | {concept.name}): suspected with no conversion match\")\n return None\n if Relevance.IRRELEVANT in meta_ann_values:\n log.debug(f\"Removed concept ({concept.id} | {concept.name}): irrelevant concept\")\n return None\n if Relevance.HISTORIC in meta_ann_values:\n log.debug(f\"No change to concept ({concept.id} | {concept.name}): historic with no conversion match\")\n\n concept.category = Category.PROBLEM\n\n return concept\n\n def _is_blacklist(self, concept):\n \"\"\"\n Check if a concept is in the filtering blacklist.\n\n Args:\n concept: The concept to check.\n\n Returns:\n True if the concept is in the blacklist, False otherwise.\n \"\"\"\n # filtering blacklist\n if int(concept.id) in self.filtering_blacklist.values:\n log.debug(f\"Removed concept ({concept.id} | {concept.name}): concept in problems blacklist\")\n return True\n return False\n\n def _process_meta_ann_by_paragraph(\n self, concept: Concept, paragraph: Paragraph, prob_concepts_in_structured_sections: List[Concept]\n ):\n \"\"\"\n Process the meta annotations for a concept based on the paragraph type.\n\n Args:\n concept (Concept): The concept to process.\n paragraph (Paragraph): The paragraph containing the concept.\n prob_concepts_in_structured_sections (List[Concept]): The list of problem concepts in structured sections.\n \"\"\"\n # if paragraph is structured problems section, add to prob list and convert to corresponding relevance\n if paragraph.type in self.structured_prob_lists:\n prob_concepts_in_structured_sections.append(concept)\n for meta in concept.meta:\n if 
meta.name == \"relevance\" and meta.value == Relevance.IRRELEVANT:\n new_relevance = self.structured_prob_lists[paragraph.type]\n log.debug(\n f\"Converted {meta.value} to \"\n f\"{new_relevance} for concept ({concept.id} | {concept.name}): \"\n f\"paragraph is {paragraph.type}\"\n )\n meta.value = new_relevance\n # if paragraph is meds or irrelevant section, convert problems to irrelevant\n elif paragraph.type in self.structured_med_lists or paragraph.type in self.irrelevant_paragraphs:\n for meta in concept.meta:\n if meta.name == \"relevance\" and meta.value != Relevance.IRRELEVANT:\n log.debug(\n f\"Converted {meta.value} to \"\n f\"{Relevance.IRRELEVANT} for concept ({concept.id} | {concept.name}): \"\n f\"paragraph is {paragraph.type}\"\n )\n meta.value = Relevance.IRRELEVANT\n\n def process_paragraphs(self, note: Note, concepts: List[Concept]) -> List[Concept]:\n \"\"\"\n Process the paragraphs in a note and filter the concepts.\n\n Args:\n note (Note): The note to process.\n concepts (List[Concept]): The list of concepts to filter.\n\n Returns:\n The filtered list of concepts.\n \"\"\"\n prob_concepts_in_structured_sections: List[Concept] = []\n\n for paragraph in note.paragraphs:\n for concept in concepts:\n if concept.start >= paragraph.start and concept.end <= paragraph.end:\n # log.debug(f\"({concept.name} | {concept.id}) is in {paragraph.type}\")\n if concept.meta:\n self._process_meta_ann_by_paragraph(concept, paragraph, prob_concepts_in_structured_sections)\n\n # if more than set no. 
concepts in prob or imp or pmh sections, return only those and ignore all other concepts\n if len(prob_concepts_in_structured_sections) > self.config.structured_list_limit:\n log.debug(\n f\"Ignoring concepts elsewhere in the document because \"\n f\"more than {self.config.structured_list_limit} concepts exist \"\n f\"in prob, imp, pmh structured sections: {len(prob_concepts_in_structured_sections)}\"\n )\n return prob_concepts_in_structured_sections\n\n return concepts\n\n def postprocess(self, concepts: List[Concept]) -> List[Concept]:\n \"\"\"\n Post-process the concepts and filter out irrelevant concepts.\n\n Args:\n concepts (List[Concept]): The list of concepts to post-process.\n\n Returns:\n The filtered list of concepts.\n \"\"\"\n # deepcopy so we still have reference to original list of concepts\n all_concepts = deepcopy(concepts)\n filtered_concepts = []\n for concept in all_concepts:\n if self._is_blacklist(concept):\n continue\n # meta annotations\n concept = self._process_meta_annotations(concept)\n # ignore concepts filtered by meta-annotations\n if concept is None:\n continue\n filtered_concepts.append(concept)\n\n return filtered_concepts\n
"},{"location":"api-reference/problemsannotator/#miade.annotators.ProblemsAnnotator.concept_types","title":"concept_types: List[Category]
property
","text":"Get the list of concept types supported by this annotator.
Returns:
Type DescriptionList[Category]
[Category.PROBLEM]
"},{"location":"api-reference/problemsannotator/#miade.annotators.ProblemsAnnotator.pipeline","title":"pipeline: List[str]
property
","text":"Get the list of processing steps in the annotation pipeline.
Returns:
Type DescriptionList[str]
[\"preprocessor\", \"medcat\", \"paragrapher\", \"postprocessor\", \"deduplicator\"]
"},{"location":"api-reference/problemsannotator/#miade.annotators.ProblemsAnnotator.postprocess","title":"postprocess(concepts)
","text":"Post-process the concepts and filter out irrelevant concepts.
Parameters:
Name Type Description Defaultconcepts
List[Concept]
The list of concepts to post-process.
requiredReturns:
Type DescriptionList[Concept]
The filtered list of concepts.
Source code insrc/miade/annotators.py
def postprocess(self, concepts: List[Concept]) -> List[Concept]:\n \"\"\"\n Post-process the concepts and filter out irrelevant concepts.\n\n Args:\n concepts (List[Concept]): The list of concepts to post-process.\n\n Returns:\n The filtered list of concepts.\n \"\"\"\n # deepcopy so we still have reference to original list of concepts\n all_concepts = deepcopy(concepts)\n filtered_concepts = []\n for concept in all_concepts:\n if self._is_blacklist(concept):\n continue\n # meta annotations\n concept = self._process_meta_annotations(concept)\n # ignore concepts filtered by meta-annotations\n if concept is None:\n continue\n filtered_concepts.append(concept)\n\n return filtered_concepts\n
"},{"location":"api-reference/problemsannotator/#miade.annotators.ProblemsAnnotator.process_paragraphs","title":"process_paragraphs(note, concepts)
","text":"Process the paragraphs in a note and filter the concepts.
Parameters:
Name Type Description Defaultnote
Note
The note to process.
requiredconcepts
List[Concept]
The list of concepts to filter.
requiredReturns:
Type DescriptionList[Concept]
The filtered list of concepts.
Source code insrc/miade/annotators.py
def process_paragraphs(self, note: Note, concepts: List[Concept]) -> List[Concept]:\n \"\"\"\n Process the paragraphs in a note and filter the concepts.\n\n Args:\n note (Note): The note to process.\n concepts (List[Concept]): The list of concepts to filter.\n\n Returns:\n The filtered list of concepts.\n \"\"\"\n prob_concepts_in_structured_sections: List[Concept] = []\n\n for paragraph in note.paragraphs:\n for concept in concepts:\n if concept.start >= paragraph.start and concept.end <= paragraph.end:\n # log.debug(f\"({concept.name} | {concept.id}) is in {paragraph.type}\")\n if concept.meta:\n self._process_meta_ann_by_paragraph(concept, paragraph, prob_concepts_in_structured_sections)\n\n # if more than set no. concepts in prob or imp or pmh sections, return only those and ignore all other concepts\n if len(prob_concepts_in_structured_sections) > self.config.structured_list_limit:\n log.debug(\n f\"Ignoring concepts elsewhere in the document because \"\n f\"more than {self.config.structured_list_limit} concepts exist \"\n f\"in prob, imp, pmh structured sections: {len(prob_concepts_in_structured_sections)}\"\n )\n return prob_concepts_in_structured_sections\n\n return concepts\n
"},{"location":"user-guide/configuration/","title":"Configurations","text":""},{"location":"user-guide/configuration/#annotator","title":"Annotator","text":"The MiADE processor is configured by a yaml
file that maps a human-readable key for each of your models to a MedCAT model ID and a MiADE annotator class. The config file must be in the same folder as the MedCAT models.
models
: Maps each human-readable alias to the ID of the MedCAT model to use in MiADEannotators
: Maps each human-readable alias to the Annotator
processing class to use in MiADEgeneral
lookup_data_path
: Path to the directory containing the lookup data to usenegation_detection
: negex
(rule-based algorithm) or None
(use default MetaCAT models)structured_list_limit
: Sets a threshold on the number of concepts detected in structured paragraph sections. If more than this number of concepts are found there, concepts in prose are ignored (to avoid returning too many concepts, which could be less relevant). Defaults to 0, which disables this feature.disable
: Disables specific pipeline components; the API here is similar to spaCy pipelinesadd_numbering
: Option to add a number prefix to the concept display names e.g. \"01 Diabetes\"models:\n problems: f25ec9423958e8d6\n meds/allergies: a146c741501cf1f7\nannotators:\n problems: ProblemsAnnotator\n meds/allergies: MedsAllergiesAnnotator\ngeneral:\n problems:\n lookup_data_path: ./lookup_data/\n negation_detection: None\n structured_list_limit: 0 # if more than this number of concepts in structure section, ignore concepts in prose\n disable: []\n add_numbering: True\n meds/allergies:\n lookup_data_path: ./lookup_data/\n negation_detection: None\n disable: []\n add_numbering: False\n
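Before loading any models, it can help to check that the three sections refer to the same aliases. The following is only a sketch and not part of MiADE: check_config is a hypothetical helper, the config is written out as a plain dict instead of being parsed from the yaml file, and the rule that every annotator alias needs a model ID is an assumption drawn from the description above.

```python
from typing import Dict, List

# The example config above as a plain dict (an assumption, so that the
# sketch needs no yaml parser).
config = {
    'models': {'problems': 'f25ec9423958e8d6', 'meds/allergies': 'a146c741501cf1f7'},
    'annotators': {'problems': 'ProblemsAnnotator', 'meds/allergies': 'MedsAllergiesAnnotator'},
    'general': {
        'problems': {'lookup_data_path': './lookup_data/', 'structured_list_limit': 0},
        'meds/allergies': {'lookup_data_path': './lookup_data/'},
    },
}


def check_config(cfg: Dict[str, dict]) -> List[str]:
    # Hypothetical helper: report aliases that are not mapped consistently.
    issues = []
    models = cfg.get('models', {})
    annotators = cfg.get('annotators', {})
    for alias in annotators:
        if alias not in models:
            issues.append(alias + ': annotator has no model ID')
    for alias in cfg.get('general', {}):
        if alias not in annotators:
            issues.append(alias + ': settings given for an unknown annotator')
    return issues


print(check_config(config))
```

An empty list means every alias under annotators and general is accounted for. Any other result names the alias to fix in the config file.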
"},{"location":"user-guide/configuration/#lookup-table","title":"Lookup Table","text":"Lookup tables are not packaged with the main MiADE package by default, so that the postprocessing steps can be customised. We provide example lookup data in miade-datasets
which you can download and use.
git clone https://github.com/uclh-criu/miade-datasets.git\n
"},{"location":"user-guide/quickstart/","title":"Quickstart","text":""},{"location":"user-guide/quickstart/#extract-concepts-and-dosages-from-a-note-using-miade","title":"Extract concepts and dosages from a Note using MiADE","text":""},{"location":"user-guide/quickstart/#configuring-the-miade-processor","title":"Configuring the MiADE Processor","text":"NoteProcessor
is the MiADE core. It is initialised with the path to a model directory containing all the MedCAT model pack .zip files we would like to use in our pipeline, together with a config file that maps each alias to a model ID and to the annotator we would like to use (model IDs can be found in MedCAT model_cards
or are usually part of the model pack name):
config.yaml
models:\n problems: f25ec9423958e8d6\n meds/allergies: a146c741501cf1f7\nannotators:\n problems: ProblemsAnnotator\n meds/allergies: MedsAllergiesAnnotator\n
We can initialise a MiADE NoteProcessor
object by passing in the model directory which contains our MedCAT models and config.yaml
file: miade = NoteProcessor(Path(\"path/to/model/dir\"))\n
Once NoteProcessor
is initialised, we can add annotators by the aliases we have specified in config.yaml
to our processor: miade.add_annotator(\"problems\")\nmiade.add_annotator(\"meds/allergies\")\n
Through the negation_detection setting in the config file, we have the option to add NegSpacy to the MedCAT spaCy pipeline. NegSpacy implements the NegEx algorithm (Chapman et al. 2001) for negation detection, which allows the models to perform simple rule-based negation detection in the absence of MetaCAT models.
"},{"location":"user-guide/quickstart/#creating-a-note","title":"Creating a Note","text":"Create a Note
object which contains the text we would like to extract concepts and dosages from:
text = \"\"\"\nSuspected heart failure\n\nPMH:\nprev history of Hypothyroidism\nMI 10 years ago\n\n\nCurrent meds:\nLosartan 100mg daily\nAtorvastatin 20mg daily\nParacetamol 500mg tablets 2 tabs qds prn\n\nAllergies:\nPenicillin - rash\n\nReferred with swollen ankles and shortness of breath since 2 weeks.\n\"\"\"\n\nnote = Note(text)\n
"},{"location":"user-guide/quickstart/#extracting-concepts-and-dosages","title":"Extracting Concepts and Dosages","text":"MiADE currently extracts concepts in SNOMED CT. Each concept contains:
name
: name of conceptid
: concept IDcategory
: type of concept, e.g. problems, medicationsstart
: start index of concept spanend
: end index of concept spandosage
: for medication conceptsnegex
: Negex result if configuredmeta
: Meta annotations if MetaCAT models are usedThe dosages associated with medication concepts are extracted by the built-in MiADE DosageExtractor
, using a combination of the Med7 NER model and the CALIBER rule-based drug dose lookup algorithm. It returns:
dose
duration
frequency
route
The output format is directly translatable to HL7 CDA but can also easily be converted to FHIR.
Putting it all together, we can now extract concepts from our Note
object:
concepts = miade.process(note)\nfor concept in concepts:\n print(concept)\n\n# {name: breaking out - eruption, id: 271807003, category: Category.REACTION, start: 204, end: 208, dosage: None, negex: False, meta: None} \n# {name: penicillin, id: 764146007, category: Category.ALLERGY, start: 191, end: 201, dosage: None, negex: False, meta: None} \n
concepts = miade.get_concept_dicts(note)\nprint(concepts)\n\n# [{'name': 'hypothyroidism (historic)',\n# 'id': '161443002',\n# 'category': 'PROBLEM',\n# 'start': 46,\n# 'end': 60,\n# 'dosage': None,\n# 'negex': False,\n# 'meta': [{'name': 'relevance',\n# 'value': 'HISTORIC',\n# 'confidence': 0.999841570854187},\n# ...\n
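Because the dictionaries hold only plain values, they can be filtered and serialised with the standard library. The sketch below uses a hand-written sample trimmed from the outputs above, not live MiADE output, and by_category is a hypothetical helper.

```python
import json
from typing import Dict, List

# Trimmed sample of concept dictionaries, copied from the outputs above.
concept_dicts = [
    {'name': 'hypothyroidism (historic)', 'id': '161443002', 'category': 'PROBLEM', 'start': 46, 'end': 60},
    {'name': 'penicillin', 'id': '764146007', 'category': 'ALLERGY', 'start': 191, 'end': 201},
]


def by_category(concepts: List[Dict], category: str) -> List[Dict]:
    # Hypothetical helper: keep only the concepts of one category.
    return [c for c in concepts if c.get('category') == category]


problems = by_category(concept_dicts, 'PROBLEM')
payload = json.dumps(problems, indent=2)
print(payload)
```

The same pattern applies to any category name that appears in the output.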
"},{"location":"user-guide/quickstart/#handling-existing-records-deduplication","title":"Handling existing records: deduplication","text":"MiADE is built to handle existing records from EHR systems, which can be sent alongside the note. It performs basic deduplication by matching extracted concepts against the existing record concepts on id.
# create list of concepts that already exist in the patient record\nrecord_concepts = [\n Concept(id=\"161443002\", name=\"hypothyroidism (historic)\", category=Category.PROBLEM),\n Concept(id=\"267039000\", name=\"swollen ankle\", category=Category.PROBLEM)\n]\n
We can pass in a list of existing concepts from the EHR to MiADE at runtime:
miade.process(note=note, record_concepts=record_concepts)\n
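To make the idea of matching on id concrete, here is a minimal sketch. It is not MiADE's own implementation: SimpleConcept is a stand-in for the MiADE Concept class reduced to two fields, and drop_known is a hypothetical function name.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SimpleConcept:
    # Stand-in for a MiADE Concept, reduced to the fields used here.
    id: str
    name: str


def drop_known(
    extracted: List[SimpleConcept], record: Optional[List[SimpleConcept]] = None
) -> List[SimpleConcept]:
    # Keep concepts whose id is neither in the record nor already kept.
    seen = {c.id for c in record or []}
    kept = []
    for concept in extracted:
        if concept.id in seen:
            continue
        seen.add(concept.id)
        kept.append(concept)
    return kept


record = [SimpleConcept('161443002', 'hypothyroidism (historic)')]
extracted = [
    SimpleConcept('161443002', 'hypothyroidism (historic)'),
    SimpleConcept('764146007', 'penicillin'),
]
print([c.name for c in drop_known(extracted, record)])
```

Only the penicillin concept is kept, because the hypothyroidism id is already present in the record.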
"},{"location":"user-guide/quickstart/#customising-miade","title":"Customising MiADE","text":""},{"location":"user-guide/quickstart/#training-custom-medcat-models","title":"Training Custom MedCAT Models","text":"MiADE provides command line interface scripts for automatically building MedCAT model packs, unsupervised training, supervised training steps, and the creation and training of MetaCAT models. For more information on MedCAT models, see MedCAT documentation and paper.
The --synthetic-data-path
option allows you to add synthetically generated training data in CSV format to the supervised and MetaCAT training steps. The CSV should have the following format:
# Trains unsupervised training step of MedCAT model\nmiade train $MODEL_PACK_PATH $TEXT_DATA_PATH --tag \"miade-example\"\n
# Trains supervised training step of MedCAT model\nmiade train-supervised $MODEL_PACK_PATH $MEDCAT_JSON_EXPORT --synthetic-data-path $SYNTHETIC_CSV_PATH\n
# Creates BBPE tokenizer for MetaCAT\nmiade create-bbpe-tokenizer $TEXT_DATA_PATH\n
# Initialises MetaCAT models to do training on\nmiade create-metacats $TOKENIZER_PATH $CATEGORY_NAMES\n
# Trains the MetaCAT Bi-LSTM models\nmiade train-metacats $METACAT_MODEL_PATH $MEDCAT_JSON_EXPORT --synthetic-data-path $SYNTHETIC_CSV_PATH\n
# Packages MetaCAT models with the main MedCAT model pack\nmiade add_metacat_models $MODEL_PACK_PATH $METACAT_MODEL_PATH\n
"},{"location":"user-guide/quickstart/#creating-custom-miade-annotators","title":"Creating Custom MiADE Annotators","text":"We can add custom annotators with more specialised postprocessing steps to MiADE by subclassing Annotator
and initialising NoteProcessor
with a list of custom annotators.
Annotator
methods include:
.get_concepts()
: returns MedCAT output as MiADE Concepts
.add_dosages_to_concepts()
: uses the MiADE built-in DosageExtractor
to add dosages associated with medication concepts.deduplicate()
: filters duplicate concepts in list An example custom Annotator
class might look like this:
class CustomAnnotator(Annotator):\n def __init__(self, cat: MiADE_CAT):\n super().__init__(cat)\n # we need to include MEDICATIONS in concept types so MiADE processor will also extract dosages\n self.concept_types = [Category.MEDICATION, Category.ALLERGY]\n\n def postprocess(self, concepts: List[Concept]) -> List[Concept]:\n # some example post-processing code\n reactions = [\"271807003\"]\n allergens = [\"764146007\"]\n for concept in concepts:\n if concept.id in reactions:\n concept.category = Category.REACTION\n elif concept.id in allergens:\n concept.category = Category.ALLERGY\n return concepts\n\n def __call__(\n self,\n note: Note,\n record_concepts: Optional[List[Concept]] = None,\n dosage_extractor: Optional[DosageExtractor] = None,\n ):\n concepts = self.get_concepts(note)\n concepts = self.postprocess(concepts)\n # run dosage extractor if given\n if dosage_extractor is not None:\n concepts = self.add_dosages_to_concepts(dosage_extractor, concepts, note)\n concepts = self.deduplicate(concepts, record_concepts)\n\n return concepts\n
Add the custom annotator to the config file:
config.yamlmodels:\n problems: f25ec9423958e8d6\n meds/allergies: a146c741501cf1f7\n custom: a146c741501cf1f7\nannotators:\n problems: ProblemsAnnotator\n meds/allergies: MedsAllergiesAnnotator\n custom: CustomAnnotator\n
Initialise MiADE with the custom annotator:
miade = NoteProcessor(Path(MODEL_DIR), custom_annotators=[CustomAnnotator])\nmiade.add_annotator(\"custom\")\n
"}]}
\ No newline at end of file