Vv ensembl dev susmi #660

Open
wants to merge 82 commits into
base: develop
Commits
4e2e82e
Fixes that overcome NR transcripts with LOC based gene symbols in gen…
Peter-J-Freeman Dec 2, 2024
dbfb7b1
Fixes the mapping of NC_000009.12:g.92474742delinsATCA back to NM_017…
Peter-J-Freeman Dec 3, 2024
033d948
Add tweaks to genes2transcripts to handle gene symbols that are updat…
Peter-J-Freeman Dec 4, 2024
45f5d07
Update the code to accept intronic variants in transcripts with alter…
Peter-J-Freeman Dec 5, 2024
2570119
code that deals with protein references with nucleotide variant types
Peter-J-Freeman Dec 5, 2024
cf2cfe1
Update position added for Ter=
Peter-J-Freeman Dec 5, 2024
36d4237
Update vdb version and test issue 87
Peter-J-Freeman Dec 11, 2024
b3692f8
Changes to the code base to correct some unhandled descriptions e.g. …
Peter-J-Freeman Jan 9, 2025
cc05e3c
Expand code to handle expanded repeat syntax in allele descriptions
Peter-J-Freeman Jan 14, 2025
9a7e4b2
add in code to deal with common HGVS early stage typos like double co…
Peter-J-Freeman Jan 16, 2025
95d6591
Additional code changes to handle failed variants reported by the LOV…
Peter-J-Freeman Jan 17, 2025
64b5132
Final commit before merge with parse bypass code
Peter-J-Freeman Jan 17, 2025
b3cd378
Reduce SeqRepo calls on hgvs to VCF mapping
John-F-Wagstaff Oct 30, 2024
d274486
Attempt to reduce SeqRepo calls when shifting
John-F-Wagstaff Oct 31, 2024
ed61716
Don't re-build existing hgvs objects in mappers
John-F-Wagstaff Nov 5, 2024
a80c386
Add a 'All' valid genomes flag to report_hgvs2vcf
John-F-Wagstaff Nov 7, 2024
446074c
Exploit new report_hgvs2vcf flag for output
John-F-Wagstaff Nov 7, 2024
5ef024c
Reduce unnecessary vcf re-generation from liftover
John-F-Wagstaff Nov 8, 2024
ca23057
Use existing genomic vcf data in liftover
John-F-Wagstaff Nov 8, 2024
c6d7a8f
Add validation skip option, use in map fallback
John-F-Wagstaff Nov 8, 2024
5da3492
Avoid parsing of hgvs from text hgvs_utils 1
John-F-Wagstaff Nov 11, 2024
3e9eaee
Avoid parsing of hgvs from text hgvs_utils 2
John-F-Wagstaff Nov 11, 2024
0f83cd6
Add hgvs object span handling to parts>delins func
John-F-Wagstaff Nov 13, 2024
70a2476
Add object based equivalent to hgvs_dup2indel
John-F-Wagstaff Nov 13, 2024
cfd3c22
Upgrade delins creation in gapped_mapping.py
John-F-Wagstaff Nov 13, 2024
b48b22b
Add a function to build new variants from existing
John-F-Wagstaff Nov 14, 2024
b0f9abe
Improve hgvs obj handling for complex coordinates
John-F-Wagstaff Nov 18, 2024
cc06e5d
Don't re-parse hgvs obj from strings MixinConv 1
John-F-Wagstaff Nov 14, 2024
e140dfe
Don't re-parse hgvs obj from strings MixinConv 2
John-F-Wagstaff Nov 14, 2024
00ac401
Don't re-parse hgvs obj from strings MixinConv 3
John-F-Wagstaff Nov 14, 2024
f602f39
Add extra test for chr_to_rsg
John-F-Wagstaff Mar 2, 2025
8309a01
Allow obj for validateHGVS to avoid re-parse
John-F-Wagstaff Nov 14, 2024
5e536bf
Allow obj for genomic mapper to avoid re-parse
John-F-Wagstaff Nov 14, 2024
440b2c9
Return obj for chr_to_rsg to avoid re-parse
John-F-Wagstaff Nov 14, 2024
4d47b3f
More re-parsing reductions for mappers.py
John-F-Wagstaff Nov 14, 2024
c8cf78b
Reduce trivial re-parsing in vvMixinCore
John-F-Wagstaff Nov 15, 2024
45be50c
Prevent re-parsing of relevant_transcripts output
John-F-Wagstaff Nov 15, 2024
bd7ac94
Reduce re-parsing in vvMixinConverters.py
John-F-Wagstaff Nov 18, 2024
f921881
Do vcf style multi-alt variants without re-parsing
John-F-Wagstaff Nov 18, 2024
554a763
Further reduce re-parsing in format_converters.py
John-F-Wagstaff Nov 18, 2024
9749a3a
Remove last non-input re-parse in liftover
John-F-Wagstaff Nov 18, 2024
400badb
Improve ref-stripping from hgvs obj
John-F-Wagstaff Nov 19, 2024
6559f60
Variant.genomic_g str->hgvs obj, reduce re-parsing
John-F-Wagstaff Nov 19, 2024
490575b
Don't search for gene symbol on chr/transcript
John-F-Wagstaff Nov 19, 2024
7453e27
Remove unneeded re-parse from expanded_repeats
John-F-Wagstaff Nov 20, 2024
5804d54
Reduce re-parsing in gapped_mapping.py
John-F-Wagstaff Nov 22, 2024
d759d11
Remove final hgvs re-parsing from hgvs_utils.py
John-F-Wagstaff Nov 22, 2024
b7977bf
Don't unset ref for = type variants
John-F-Wagstaff Nov 25, 2024
36797e1
Convert hgvs transcript variation to hgvs object
John-F-Wagstaff Nov 25, 2024
8383088
Test improved alt ref != selected genomic ref code
John-F-Wagstaff Feb 12, 2025
3b3e3cf
Remove last hgvs obj re-parses from vvMixinInit.py
John-F-Wagstaff Nov 25, 2024
4d41452
Move hgvs obj conversion up a step in vvMixinCore
John-F-Wagstaff Nov 26, 2024
d41d1f7
Move abort for con type variants before parsing
John-F-Wagstaff Dec 13, 2024
fc3e7f8
Return early and reduce indent on non c prot map
John-F-Wagstaff Nov 26, 2024
26de224
Exploit early return to reduce indent in prot map
John-F-Wagstaff Nov 26, 2024
36c8476
Further use of returns to reduce indent in protmap
John-F-Wagstaff Nov 26, 2024
401e0fb
Add VVPosEdit for output formatting tweaks
John-F-Wagstaff Dec 5, 2024
d38b52b
Add helper functions for hgvs obj protein handling
John-F-Wagstaff Dec 5, 2024
81551f9
Switch protein to use hgvs obj to avoid re-parsing
John-F-Wagstaff Dec 5, 2024
5e3512b
Add test for RNA indel -> prot del case
John-F-Wagstaff Mar 2, 2025
3c1ef7b
Move expanded repeat formatting before obj convert
John-F-Wagstaff Dec 12, 2024
d79c2ed
Move input formatting before object creation point
John-F-Wagstaff Dec 6, 2024
006c44c
Move methyl syntax suffix handling before obj conv
John-F-Wagstaff Dec 13, 2024
ecb52c2
Update/improve tests for variant format_quibble
John-F-Wagstaff Feb 11, 2025
1a1b8e3
Reduce hgvs obj re-parsing in complex_descriptions
John-F-Wagstaff Nov 18, 2024
a9e431c
Slight output formatting improvement HGVS AA
John-F-Wagstaff Dec 23, 2024
a5d885b
Improve formatting hgvs output on prot/NA
John-F-Wagstaff Jan 8, 2025
b22dfdc
Improve handling of bad mappings in gapped_mapping
John-F-Wagstaff Jan 8, 2025
731878f
Fix handling of ref type/source with obj not str
John-F-Wagstaff Jan 8, 2025
bec39cf
Make refseq mistake finder work for txt *and* obj
John-F-Wagstaff Jan 8, 2025
8572b28
Harden initial convert to obj presence + parser
John-F-Wagstaff Jan 9, 2025
08e52c4
Handle t->c shift without declaring as a re-map
John-F-Wagstaff Jan 10, 2025
7a1326a
Harden against n variant issues in mapping
John-F-Wagstaff Jan 13, 2025
b797255
Add limited allele parser improvements
John-F-Wagstaff Jan 10, 2025
a491a81
Fix use_checking to handle object quibble as well
John-F-Wagstaff Jan 10, 2025
db70e6b
Reduce unneeded exon fetch and re-start in mappers
John-F-Wagstaff Jan 13, 2025
85e9474
Switch to early (singular) hgvs str->object parse
John-F-Wagstaff Jan 10, 2025
1e5d669
Add&use handling for checked variants resubmitted
John-F-Wagstaff Jan 10, 2025
1188b2a
Add methylation to VVPosEdit + PosEdit->VVPosEdit
John-F-Wagstaff Jan 20, 2025
b026ea2
Switch from text to VVPosEdit for methylation out
John-F-Wagstaff Jan 20, 2025
ed9ed24
Merge pull request #663 from openvar/JFW_reparse_reduction
Peter-J-Freeman Mar 4, 2025
40e9f66
update tests where gene symbol HIF1A-AS3 changed to HIF1A-AS1
Peter-J-Freeman Mar 4, 2025
312 changes: 250 additions & 62 deletions VariantValidator/modules/complex_descriptions.py

Large diffs are not rendered by default.

63 changes: 45 additions & 18 deletions VariantValidator/modules/expanded_repeats.py
@@ -14,7 +14,7 @@
 import logging
 import copy
 from vvhgvs.assemblymapper import AssemblyMapper
-
+from .hgvs_utils import hgvs_obj_from_existing_edit, hgvs_delins_parts_to_hgvs_obj
 # Set up logger
 logger = logging.getLogger(__name__)

@@ -123,6 +123,7 @@ def parse_repeat_variant(cls, variant_str, build, select_transcripts, validator)
         >>>parse_repeat_variant("NM_024312.4:c.1_10A[10]")
         "NM_024312.4", "c", "1_10" "A", "10", ""
         """
+
         logger.info(f"Parsing variant: parse_repeat_variant({variant_str})")
         # Strip any whitespace
         variant_str = variant_str.strip()
@@ -329,9 +330,14 @@ def get_range_from_single_or_start_pos(self, validator):
         # just dump range and re-build from the first base

         if "_" in self.variant_position:
-            seq_check = validator.hp.parse(f"{self.reference}:{self.prefix}."
-                                           f"{self.variant_position}"
-                                           f"{self.repeat_sequence * int(self.copy_number)}=")
+            start_pos, _sep, end_pos = self.variant_position.partition('_')
+            seq_check = hgvs_delins_parts_to_hgvs_obj(
+                    self.reference,
+                    self.prefix,
+                    start_pos,
+                    f"{self.repeat_sequence * int(self.copy_number)}",
+                    f"{self.repeat_sequence * int(self.copy_number)}",
+                    end=end_pos)

             self.original_position = copy.copy(self.variant_position)
             intronic_genomic_variant = self.no_norm_evm.t_to_g(seq_check)
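
The hunk above stops assembling an HGVS string for `validator.hp.parse` and instead hands the separate parts to `hgvs_delins_parts_to_hgvs_obj`. The range is split with `str.partition`; a minimal sketch of that one step follows. `split_range` is a hypothetical helper name used only for illustration and is not part of the PR.

```python
def split_range(variant_position):
    """Split an HGVS-style range such as '1_10' into (start, end).

    Hypothetical helper: it mirrors the partition('_') call in the diff.
    """
    start_pos, _sep, end_pos = variant_position.partition("_")
    # partition returns an empty tail when no separator is present,
    # so a single position yields end=None here.
    return start_pos, (end_pos or None)


if __name__ == "__main__":
    print(split_range("1_10"))
    print(split_range("34+5_34+7"))
    print(split_range("7"))
```

Because `partition` splits on the first `_` only, this assumes the position contains at most one range separator.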
@@ -359,21 +365,29 @@ def get_range_from_single_or_start_pos(self, validator):
         # Create a variant for mapping to the genome containing the whole repeat, we used
         # to use only the first base of the repeat but this breaks on -1 mapping transcripts
         # with multi-base repeats
-        map_var = f"{self.reference}:{self.prefix}."
         length = len(self.repeat_sequence)
+        pos = self.variant_position
+        end = None
         if length == 1:
-            map_var = map_var +f"{self.variant_position}{self.repeat_sequence}="
+            pos = self.variant_position
         elif '+' in self.variant_position:
             tx_pos,_sep,intron_offset = self.variant_position.partition('+')
             intron_offset = int(intron_offset)
-            map_var = map_var +f"{tx_pos}+{intron_offset}_{tx_pos}+{str(intron_offset+length-1)}"+\
-                    f"{self.repeat_sequence}="
+            pos = f"{tx_pos}+{intron_offset}"
+            end = f"{tx_pos}+{str(intron_offset+length-1)}"
         elif '-' in self.variant_position:
             tx_pos,_sep,intron_offset = self.variant_position.partition('-')
             intron_offset = int(intron_offset)
-            map_var = map_var +f"{tx_pos}-{intron_offset}_{tx_pos}-{str(intron_offset-length+1)}"+\
-                    f"{self.repeat_sequence}="
-        intronic_variant = validator.hp.parse(map_var)
+            pos = f"{tx_pos}-{intron_offset}"
+            end = f"{tx_pos}-{str(intron_offset-length+1)}"
+        intronic_variant = hgvs_delins_parts_to_hgvs_obj(
+                self.reference,
+                self.prefix,
+                pos,
+                self.repeat_sequence,
+                self.repeat_sequence,
+                end=end
+                )
         intronic_genomic_variant = self.no_norm_evm.t_to_g(intronic_variant)
         self.g_strand = validator.hdp.get_tx_exons(intronic_variant.ac, intronic_genomic_variant.ac,
                                                    validator.alt_aln_method)[0][3]
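
The position arithmetic in this hunk can be read in isolation. The sketch below restates it as a standalone function so the start and end of one repeat unit are easy to check by hand. `repeat_unit_span` is a hypothetical name, and the sketch assumes a position without a leading `-` (a 5' UTR coordinate such as `-20-5` is not handled).

```python
def repeat_unit_span(variant_position, repeat_sequence):
    """Return (pos, end) covering one repeat unit, following the diff's logic.

    Hypothetical helper for illustration; end is None when no range is needed.
    """
    length = len(repeat_sequence)
    pos, end = variant_position, None
    if length == 1:
        # A single base needs no end coordinate.
        return pos, end
    if "+" in variant_position:
        tx_pos, _sep, offset = variant_position.partition("+")
        offset = int(offset)
        pos = f"{tx_pos}+{offset}"
        end = f"{tx_pos}+{offset + length - 1}"
    elif "-" in variant_position:
        tx_pos, _sep, offset = variant_position.partition("-")
        offset = int(offset)
        pos = f"{tx_pos}-{offset}"
        # Offsets upstream of an exon shrink towards the exon boundary.
        end = f"{tx_pos}-{offset - length + 1}"
    return pos, end


if __name__ == "__main__":
    print(repeat_unit_span("100+5", "CAG"))
    print(repeat_unit_span("100-5", "CAG"))
    print(repeat_unit_span("100+5", "A"))
```

As in the diff, a multi-base repeat at a purely exonic position falls through both branches and keeps `end` as `None`.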
@@ -676,19 +690,32 @@ def check_exon_pos(exon_pos):
         check_exon_pos(end)
         return


 def convert_tandem(variant, validator, build, my_all):
-    expanded_variant = TandemRepeats.parse_repeat_variant(variant.quibble, build, my_all, validator)
+    try:
+        expanded_variant = TandemRepeats.parse_repeat_variant(variant.quibble, build, my_all, validator)
+    except AttributeError:
+        expanded_variant = TandemRepeats.parse_repeat_variant(variant, build, my_all, validator)
+
     if expanded_variant is False:
         return False
     else:
         expanded_variant_string = expanded_variant.reformat(validator)
-        variant.expanded_repeat = {"variant": expanded_variant_string,
-                                   "position": expanded_variant.variant_position,
-                                   "copy_number": expanded_variant.copy_number,
-                                   "repeat_sequence": expanded_variant.repeat_sequence,
-                                   "reference": expanded_variant.reference,
-                                   "prefix": expanded_variant.prefix}
+        try:
+            variant.expanded_repeat = {"variant": expanded_variant_string,
+                                       "position": expanded_variant.variant_position,
+                                       "copy_number": expanded_variant.copy_number,
+                                       "repeat_sequence": expanded_variant.repeat_sequence,
+                                       "reference": expanded_variant.reference,
+                                       "prefix": expanded_variant.prefix}
+        except AttributeError:
+            expanded_repeat = {"variant": expanded_variant_string,
+                               "position": expanded_variant.variant_position,
+                               "copy_number": expanded_variant.copy_number,
+                               "repeat_sequence": expanded_variant.repeat_sequence,
+                               "reference": expanded_variant.reference,
+                               "prefix": expanded_variant.prefix}
+            return expanded_repeat
         return True
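
With this change `convert_tandem` accepts either a variant object carrying a `quibble` attribute or a plain description string, falling back on `AttributeError`. A minimal sketch of that pattern follows. The `Variant` class and both helper functions are hypothetical stand-ins, not the project's real classes.

```python
class Variant:
    """Hypothetical stand-in for the validator's variant object."""

    def __init__(self, quibble):
        self.quibble = quibble
        self.expanded_repeat = None


def description_of(variant):
    """Return the description text from a Variant-like object or a plain string."""
    try:
        return variant.quibble
    except AttributeError:
        # Plain strings have no .quibble, so use the value itself.
        return variant


def store_or_return(variant, payload):
    """Attach payload to an object; hand it back when the input is a string."""
    try:
        variant.expanded_repeat = payload
    except AttributeError:
        # str instances reject new attributes, so return the payload instead.
        return payload
    return True


if __name__ == "__main__":
    v = Variant("NM_024312.4:c.1_10A[10]")
    print(description_of(v), store_or_return(v, {"copy_number": "10"}))
    print(store_or_return("NM_024312.4:c.1_10A[10]", {"copy_number": "10"}))
```

The sketch keeps each `try` body down to the single attribute access. In the diff the `try` wraps the whole `parse_repeat_variant` call, so an `AttributeError` raised inside the parser would also trigger the string fallback; narrowing the `try` is one way to avoid that, if it matters here.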

