Vv ensembl dev susmi #660

Peter-J-Freeman · 2024-12-05T13:39:07Z

@John-F-Wagstaff. The main part of this commit is handling variants like NW_011332691.1(NM_012234.6):c.335+1G>C. See issue #657.

This transcript is on the reverse strand and is working. I have added in some tests too.

gitguardian · 2024-12-05T13:39:16Z

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request

| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
| --- | --- | --- | --- | --- |
| 1081033 | Triggered | Generic Password | `d08f447` | db_dockerfiles/vvta/Dockerfile |

🛠 Guidelines to remediate hardcoded secrets

1. Understand the implications of revoking this secret by investigating where it is used in your code.
2. Replace and store your secret safely. Learn here the best practices.
3. Revoke and rotate this secret.
4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future, consider:

- following these best practices for managing and storing secrets, including API keys and other credentials
- installing secret detection on pre-commit to catch secrets before they leave your machine and to ease remediation.

🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

We attempt to push gap variants both left and right when validating, which used to result in up to 50 separate SeqRepo seq fetch calls. This commit replaces them with a single 50-base SeqRepo call. In cases where we trigger gap movement attempts but only attempt 1 move, micro-benchmarking puts the 50bp fetch at about 4% slower than the 1bp fetch, but this commit will at least nearly halve the time spent in SeqRepo seq fetch calls in every other case. The net effect on our input tests is a ~22% decrease in time taken to run, so this is probably a net gain.
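
The batching idea can be sketched roughly as below. This is only an illustration, not the VariantValidator code: `CountingStore`, its `fetch` method, and the two `flank_*` helpers are hypothetical names, and the store is a toy stand-in rather than the real SeqRepo API.

```python
class CountingStore:
    """Hypothetical stand-in for a sequence store; counts fetch calls."""

    def __init__(self, sequence):
        self.sequence = sequence
        self.calls = 0

    def fetch(self, start, end):
        # Half-open, 0-based slice of the stored sequence.
        self.calls += 1
        return self.sequence[start:end]


def flank_per_base(store, start, window=50):
    """Old pattern: one fetch call per base while stepping along."""
    return "".join(store.fetch(i, i + 1) for i in range(start, start + window))


def flank_batched(store, start, window=50):
    """New pattern: a single fetch covering the whole window."""
    return store.fetch(start, start + window)


seq = "ACGT" * 30  # 120 bases of toy sequence, not real data
old_store, new_store = CountingStore(seq), CountingStore(seq)

# Both approaches return the same bases; only the call count differs.
assert flank_per_base(old_store, 10) == flank_batched(new_store, 10)
print(old_store.calls, new_store.calls)
```

The trade-off described above shows up here too: when only one move is needed, the batched version fetches more bases than strictly required, but it never issues more than one call.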

We used to call report_hgvs2vcf once per valid mapping genome. This change reduces that to a single call and simplifies the logic around the genomic variant mapping output at the same time. Normally this will halve the number of report_hgvs2vcf calls.

This patch reduces parsing of hgvs objects from text by creating them directly from the available parts in the hgvs_to_delins_hgvs function in hgvs_utils, replacing the vcfcp_to_hgvsstr function with a new vcfcp_to_hgvs_obj function.
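
As a rough sketch of why building from parts avoids work, the toy model below compares the two routes. Everything here is an assumption for illustration: `Delins`, `parse_delins`, and `delins_from_parts` are hypothetical and much simpler than the hgvs library objects the real functions produce.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Delins:
    """Hypothetical, simplified stand-in for an hgvs delins variant."""
    ac: str        # reference accession
    seq_type: str  # coordinate type, e.g. "g"
    start: int
    end: int
    ref: str       # may be empty when the deleted bases are not stated
    alt: str

    def __str__(self):
        pos = str(self.start) if self.start == self.end else f"{self.start}_{self.end}"
        return f"{self.ac}:{self.seq_type}.{pos}del{self.ref}ins{self.alt}"


_PATTERN = re.compile(r"^([^:]+):([a-z])\.(\d+)(?:_(\d+))?del([ACGT]*)ins([ACGT]+)$")


def parse_delins(text):
    """Slow route: recover the parts by pattern-matching a string."""
    m = _PATTERN.match(text)
    if m is None:
        raise ValueError(f"not a simple delins: {text!r}")
    ac, seq_type, start, end, ref, alt = m.groups()
    return Delins(ac, seq_type, int(start), int(end or start), ref, alt)


def delins_from_parts(ac, seq_type, start, end, ref, alt):
    """Direct route: the parts are already known, so no parsing is needed."""
    return Delins(ac, seq_type, start, end, ref, alt)


direct = delins_from_parts("NC_000009.12", "g", 92474742, 92474742, "", "ATCA")
# Round-tripping through text gives the same object, at extra cost.
assert parse_delins(str(direct)) == direct
```

When the caller already holds the accession, coordinates, and alleles (as in a VCF-to-hgvs conversion), the direct route skips both the string formatting and the parse.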

Input formatting that depends on having a text quibble from user input needs to be completed before we parse into a hgvs object. Prepare for centralising input parsing by moving this kind of formatting before the intended hgvs string->hgvs object parsing point (at the end of the initial_format_conversions function).
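
The ordering constraint can be sketched as a two-stage pipeline: text-only clean-up first, then exactly one parse. The function names and the dictionary result below are hypothetical, and the double-colon fix is just one example of a string-level correction; the real parser and object types differ.

```python
def clean_quibble(text):
    """Text-only fixes that must run while the input is still a string."""
    text = text.strip()
    while "::" in text:  # collapse accidental double colons
        text = text.replace("::", ":")
    return text


def parse_once(text):
    """Hypothetical single parse point: split 'accession:type.posedit'."""
    ac, _, rest = text.partition(":")
    seq_type, _, posedit = rest.partition(".")
    if not (ac and seq_type and posedit):
        raise ValueError(f"cannot parse {text!r}")
    return {"ac": ac, "type": seq_type, "posedit": posedit}


def format_then_parse(quibble):
    # All string-level formatting happens first, then one parse at the end.
    return parse_once(clean_quibble(quibble))


# Illustrative input with a deliberate typo and stray whitespace.
result = format_then_parse(" NM_012234.6::c.335+1G>C ")
print(result)
```

Once the value is an object, string-only fixes like these can no longer be applied cheaply, which is why they have to sit ahead of the parse point.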

Peter-J-Freeman added 4 commits December 2, 2024 15:21

Fixes that overcome NR transcripts with LOC based gene symbols in genes2transcripts functionality

4e2e82e

Fixes the mapping of NC_000009.12:g.92474742delinsATCA back to NM_017680.6:c.153G>T in issue #651

dbfb7b1

Add tweaks to genes2transcripts to handle gene symbols that are updating and HGNC genes with no transcript info openvar/rest_variantValidator#186 and also handle the longer deletions in #651

033d948

Update the code to accept intronic variants in transcripts with alternate alignments in patches vs the primary assembly. Issue #657

45f5d07

Peter-J-Freeman requested a review from John-F-Wagstaff December 5, 2024 13:39

Peter-J-Freeman and others added 24 commits December 5, 2024 14:30

code that deals with protein references with nucleotide variant types

2570119

Update position added for Ter=

cf2cfe1

Update vdb version and test issue 87

36d4237

Changes to the code base to correct some unhandled descriptions e.g. 3 prime UTRs in uncertain positions for the LOVD paper

b3692f8

Expand code to handle expanded repeat syntax in allele descriptions

cc05e3c

add in code to deal with common HGVS early stage typos like double colons

9a7e4b2

Additional code changes to handle failed variants reported by the LOVD team

95d6591

Final commit before merge with parse bypass code

64b5132

Reduce SeqRepo calls on hgvs to VCF mapping

b3cd378

This is used internally for validation as well as when presenting VCF data to users.

Don't re-build existing hgvs objects in mappers

ed61716

This reduces the calls to parse hgvs objects from strings in mappings, by using the existing versions, where they exist already. First pass for improving efficiency/performance of genomic to transcript and transcript to genome mappings.

Add an 'All' valid genomes flag to report_hgvs2vcf

a80c386

This will allow us to simplify the logic where we use report_hgvs2vcf to generate output later.

Reduce unnecessary vcf re-generation from liftover

5ef024c

This also includes features to use existing validated VCF data when provided.

Use existing genomic vcf data in liftover

ca23057

Add validation skip option, use in map fallback

c6d7a8f

Don't double validate the same variant when doing fallback for RSG mapping recovery during hgvs to vcf conversion.

Avoid parsing of hgvs from text hgvs_utils 2

3e9eaee

Add a new hgvs_delins_parts_to_hgvs_obj function and use it instead of parse for the hgvs delins creation inside hgvs_utils.py. This is particularly used in hgvs<->vcf functions.

Add hgvs object span handling to parts>delins func

0f83cd6

Also update docs a bit.

Add object based equivalent to hgvs_dup2indel

70a2476

This avoids going from object to text only to then go back to object output almost all the time, allowing us to skip the unnecessary parser step.

Upgrade delins creation in gapped_mapping.py

cfd3c22

Upgrade delins creation in gapped_mapping.py by using helper functions to directly create delins hgvs objects, as opposed to parsing the objects from text, which has performance costs.

Add a function to build new variants from existing

b48b22b

This function, hgvs_obj_from_existing_edit, is intended for when we are re-creating a variant with a new location but the same edit.
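
A minimal sketch of the "new location, same edit" idea, assuming simplified stand-in types: `Edit`, `Variant`, and `variant_from_existing_edit` are hypothetical names, the accessions are placeholders, and the real function works on hgvs library objects.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Edit:
    ref: str
    alt: str


@dataclass(frozen=True)
class Variant:
    ac: str
    seq_type: str
    start: int
    end: int
    edit: Edit


def variant_from_existing_edit(variant, ac, seq_type, start, end):
    """Copy a variant to a new location, reusing its edit object unchanged."""
    return replace(variant, ac=ac, seq_type=seq_type, start=start, end=end)


# Placeholder accessions and coordinates, not real references.
original = Variant("REF_A", "g", 1000, 1000, Edit("G", "T"))
moved = variant_from_existing_edit(original, "REF_B", "c", 153, 153)

# The edit is shared, not re-parsed or rebuilt from text.
assert moved.edit is original.edit
```

Because only the location fields change, nothing has to be serialised to a string and parsed back just to move the variant.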

Improve hgvs obj handling for complex coordinates

b0f9abe

Don't re-parse hgvs obj from strings MixinConv 1

cc06e5d

Reduce the re-parsing of variants into hgvs delins objects via strings in vvMixinConverters.py, use hgvs_delins_parts_to_hgvs_obj instead.

John-F-Wagstaff and others added 30 commits March 2, 2025 21:21

Move abort for con type variants before parsing

d41d1f7

Return early and reduce indent on non c prot map

fc3e7f8

Exploit early return to reduce indent in prot map

26de224

Further use of returns to reduce indent in protmap

36c8476

Add VVPosEdit for output formatting tweaks

401e0fb

We need slightly different bracket handling on predicted proteins, and different handling on */Ter= too. This allows us to fix the issue centrally rather than re-applying the edits in multiple places.

Add helper functions for hgvs obj protein handling

d38b52b

As yet unused, needed for object handling of protein variant mappings.

Switch protein to use hgvs obj to avoid re-parsing

81551f9

Add test for RNA indel -> prot del case

5e3512b

Move expanded repeat formatting before obj convert

3c1ef7b

Expanded repeat formatting needs to happen before the final hgvs string to hgvs object conversion step. Move it in preparation for switching this conversion to happen only once, inside initial format conversion.

Move methyl syntax suffix handling before obj conv

006c44c

Methyl syntax handling needs to happen before the first conversion into a hgvs object as the parser does not currently handle it.

Update/improve tests for variant format_quibble

ecb52c2

Reduce hgvs obj re-parsing in complex_descriptions

1a1b8e3

Slight output formatting improvement HGVS AA

a9e431c

Improve formatting hgvs output on prot/NA

a5d885b

Improve handling of bad mappings in gapped_mapping

b22dfdc

Fix handling of ref type/source with obj not str

731878f

Make refseq mistake finder work for txt *and* obj

bec39cf

Harden initial convert to obj presence + parser

8572b28

Harden some initial input handling functions to cope with objects instead of strings, and add improved type aware posedit parsing.

Handle t->c shift without declaring as a re-map

08e52c4

Harden against n variant issues in mapping

7a1326a

Add limited allele parser improvements

b797255

Fix use_checking to handle object quibble as well

a491a81

Reduce unneeded exon fetch and re-start in mappers

db70e6b

Switch to early (singular) hgvs str->object parse

85e9474

Add & use handling for checked variants resubmitted

1e5d669

Add methylation to VVPosEdit + PosEdit->VVPosEdit

1188b2a

Add required variables to handle methylation to the VVPosEdit, as well as a simple PosEdit to VVPosEdit function.

Switch from text to VVPosEdit for methylation out

b026ea2

Merge pull request #663 from openvar/JFW_reparse_reduction

ed9ed24

Jfw reparse reduction

update tests where gene symbol HIF1A-AS3 changed to HIF1A-AS1

40e9f66
