Transformers should include concepts that fail to normalize #417

Open
korikuzma opened this issue Jan 20, 2025 · 2 comments
Labels: enhancement (New feature or request), priority:medium (Medium priority)

Comments

@korikuzma
Member

Currently, we only include concepts that succeed in VICC normalization. However, in the CDMs we also want to be able to include concepts that fail to normalize. For these cases, we'll simply add an extension ({"name": "vicc_normalizer_failure", "value": True}).
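
A minimal sketch of tagging a concept that failed normalization, assuming concepts are handled as plain dicts standing in for `MappableConcept` and that extensions live in an `extensions` list. The helper name `add_normalizer_failure_extension` is hypothetical; only the extension name and value come from this issue.

```python
from typing import Any

# Extension described in this issue; the dict shape is a plain-data stand-in
NORMALIZER_FAILURE_EXTENSION = {"name": "vicc_normalizer_failure", "value": True}


def add_normalizer_failure_extension(concept: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of ``concept`` with the normalizer-failure extension appended.

    Hypothetical helper: copies existing extensions so the input is not mutated.
    """
    extensions = list(concept.get("extensions") or [])
    extensions.append(dict(NORMALIZER_FAILURE_EXTENSION))
    return {**concept, "extensions": extensions}
```

Returning a copy keeps the original concept untouched, which makes it easier to compare the pre- and post-transform output in tests.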

I'm not sure yet how we want to handle these concepts in `/search`; @mcannon068nw may have some guidance, and we can create a separate issue for it. For now, we'll skip loading concepts that have this extension into the DB.
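
One way the "skip loading" rule could look, as a sketch that assumes concepts are plain dicts with an optional `extensions` list. Both function names (`has_normalizer_failure`, `concepts_to_load`) are hypothetical and not part of any existing loader.

```python
from collections.abc import Iterable, Iterator
from typing import Any


def has_normalizer_failure(concept: dict[str, Any]) -> bool:
    """True if the concept carries vicc_normalizer_failure with value True."""
    return any(
        ext.get("name") == "vicc_normalizer_failure" and ext.get("value") is True
        for ext in concept.get("extensions") or []
    )


def concepts_to_load(concepts: Iterable[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield only concepts that normalized successfully (hypothetical DB-load filter)."""
    for concept in concepts:
        if not has_normalizer_failure(concept):
            yield concept
```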

@korikuzma korikuzma added enhancement New feature or request priority:medium Medium priority labels Jan 20, 2025
@korikuzma korikuzma self-assigned this Jan 20, 2025
@korikuzma
Member Author

MOA does not have internal identifiers for therapies, genes, or diseases.

In cases where normalization succeeds, we use `f"moa.{norm_resp.{therapy|disease|gene}.id}"` for the `MappableConcept.id`, where `{therapy|disease|gene}` stands for the type of concept being normalized. Examples:

- `moa.normalize.therapy.rxcui:1727455`
- `moa.normalize.disease.ncit:C2926`
- `moa.normalize.gene.hgnc:427`
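
The `{therapy|disease|gene}` shorthand amounts to picking the response attribute named after the concept type. A sketch, assuming the normalizer response exposes `therapy`, `disease`, or `gene` attributes that each carry an `id` (here faked with `SimpleNamespace`); `normalized_concept_id` is a hypothetical name.

```python
from types import SimpleNamespace

CONCEPT_TYPES = {"therapy", "disease", "gene"}


def normalized_concept_id(norm_resp: object, concept_type: str) -> str:
    """Build ``moa.<normalized id>`` from a normalizer response (assumed shape).

    ``norm_resp.<concept_type>.id`` is assumed to look like "normalize.gene.hgnc:427".
    """
    if concept_type not in CONCEPT_TYPES:
        raise ValueError(f"Unsupported concept type: {concept_type}")
    return f"moa.{getattr(norm_resp, concept_type).id}"


# Stand-in for a real normalizer response (hypothetical shape)
resp = SimpleNamespace(gene=SimpleNamespace(id="normalize.gene.hgnc:427"))
print(normalized_concept_id(resp, "gene"))  # moa.normalize.gene.hgnc:427
```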

I'm not sure what we want the id to be for concepts that fail normalization. Currently, I'm just doing:

```python
import re


def _sanitize_name(name: str) -> str:
    """Trim leading and trailing whitespace and replace each run of whitespace
    characters with a single underscore.

    :param name: Name to sanitize
    :return: Sanitized string with whitespace runs replaced by underscores
    """
    return re.sub(r"\s+", "_", name.strip())


# `concept_type` (hypothetical variable) is "therapy", "disease", or "gene";
# `name` is the concept's original, unnormalized name
id_ = f"moa.{concept_type}:{_sanitize_name(name)}"
```

@ahwagner is this okay? Would you like something different? We require an id for these concepts in the database.

@ahwagner
Member

Seems like a reasonable approach to me. 👍
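
Putting the success and fallback cases from this thread side by side, a sketch assuming the caller already has the normalizer's id as a string (or `None` when normalization failed). `concept_id` and its parameters are hypothetical names; `_sanitize_name` mirrors the helper proposed in the issue.

```python
import re
from typing import Optional


def _sanitize_name(name: str) -> str:
    """Trim the name and collapse each whitespace run into a single underscore."""
    return re.sub(r"\s+", "_", name.strip())


def concept_id(concept_type: str, name: str, normalized_id: Optional[str]) -> str:
    """Return the MappableConcept id for a MOA therapy, disease, or gene.

    ``normalized_id`` is the normalizer's id (e.g. "normalize.gene.hgnc:427"),
    or None when normalization failed.
    """
    if normalized_id is not None:
        # Normalization succeeded: prefix the normalizer's id with "moa."
        return f"moa.{normalized_id}"
    # Normalization failed: fall back to the concept type plus the sanitized name
    return f"moa.{concept_type}:{_sanitize_name(name)}"


print(concept_id("therapy", "  Some  Drug\tName ", None))  # moa.therapy:Some_Drug_Name
```

Because whitespace runs collapse, names that differ only in spacing share one fallback id. Assuming normalizer ids always begin with `normalize.`, fallback ids such as `moa.therapy:...` cannot collide with normalized ones.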
