Account for Genome Nexus Timeout issue - check if all the variants in the MAF are annotated #206

Open
rmadupuri opened this issue Jul 26, 2022 · 0 comments


Issue:
When annotating a large MAF, some variants occasionally come back without a protein change, or the annotation times out. Re-running just those failed records produces the amino acid change.

Solution:
It would be nice to add a flag to the annotate subcommand, or to make this the default behaviour: if not all records are successfully annotated on the first attempt, the script keeps re-running the annotator on the remaining unannotated records until further attempts produce no newly annotated records.

Logic:

  • Let the annotation run normally in the first attempt.
  • Subset the variants that failed annotation, i.e., those where Variant_Classification is not in ["Silent", "Intron", "3'UTR", "5'UTR", "3'Flank", "5'Flank", "IGR"] and HGVSp_Short is empty, or where Annotation_Status is FAILED. In other words, split the MAF into annotated and unannotated parts.
  • Re-run the annotation on the unannotated MAF and merge the newly annotated records into the annotated MAF.
  • Repeat until all the variants are annotated or further attempts produce no newly annotated records.
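
The steps above could be sketched roughly as follows. This is only an illustration of the loop, not the linked implementation: `needs_retry`, `annotate_until_stable`, and the `annotate` callable are hypothetical names, records are assumed to be plain dicts keyed by MAF column name, and the failure condition is read as "(classification not in the list and HGVSp_Short empty) or Annotation_Status is FAILED".

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]

# Classifications where an empty HGVSp_Short is expected (list taken from the issue).
NO_PROTEIN_CHANGE = {"Silent", "Intron", "3'UTR", "5'UTR", "3'Flank", "5'Flank", "IGR"}


def needs_retry(rec: Record) -> bool:
    """Return True if the record counts as unannotated under the criteria above."""
    if rec.get("Annotation_Status", "") == "FAILED":
        return True
    missing_protein = not rec.get("HGVSp_Short", "").strip()
    return missing_protein and rec.get("Variant_Classification", "") not in NO_PROTEIN_CHANGE


def annotate_until_stable(
    records: List[Record],
    annotate: Callable[[List[Record]], List[Record]],  # hypothetical annotator wrapper
) -> Tuple[List[Record], List[Record]]:
    """Re-annotate failed records until none remain or an attempt adds nothing new.

    Returns (annotated, still_unannotated); no variant is dropped.
    """
    result = annotate(records)  # first attempt runs normally
    done = [r for r in result if not needs_retry(r)]
    todo = [r for r in result if needs_retry(r)]

    while todo:
        retried = annotate(todo)
        newly_done = [r for r in retried if not needs_retry(r)]
        todo = [r for r in retried if needs_retry(r)]
        if not newly_done:
            break  # no progress in this attempt, so stop retrying
        done.extend(newly_done)  # merge into the annotated part

    return done, todo
```

Note that this sketch does not preserve the original row order; if order matters, the merged records would need to be re-sorted on a stable key before writing the final MAF.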

Refer to the Python implementation for more details: https://github.com/cBioPortal/datahub-study-curation-tools/tree/master/GN-annotation-wrapper
