-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dev #18
Conversation
* Use generic MSA name for MSA object, defined in __init__ * Much shorter consensus string generation code, with added unit tests to make sure nothing broken
* Move functions to test interval consistency out of class * Unit test them
* New object-oriented interval partitioning code * Add unit test for small consensus string which led to a bugfix
* Make unit tests work for new interval partitioning code * Use new interval partitioning code in make_prg code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I can't provide a thorough review right now. I should be able to get around to it next week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't say I understand all the changes, but sounds like your testing has been thorough.
Nice changes from a software engineering point of view. Nice and readable. Should make any future work much easier to do!
self.stop += right_delta | ||
|
||
def contains(self, position: int): | ||
return self.start <= position <= self.stop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be nice to add a docstring mentioning what sort of intervals this class stores. i.e. half-open, open, closed. It seems like this interval is a closed interval?
Also, just mentioning this incase you aren't aware of it: https://github.com/chaimleib/intervaltree
I use this for all interval work and it is super nice. I'm not suggesting you replace the class here or anything. Just putting it on your radar if it isn't already in case there is need for it down the line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, i added a docstring
return IntervalType.NonMatch | ||
else: | ||
return IntervalType.Match | ||
@classmethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For future reference @bricoletc this can just be @staticmethod
. Random example: https://github.com/mbhall88/MemoryUnits/blob/18551958dbf51a9f68e8733ba17b880dbecc0da7/memory_units.py#L45
This PR has two main goals:
Added
IntervalPartitioner
object and associated classes, in interval_partitioner.pyThis object replaces the
partition_alignment_into_intervals
function, which was harder to understand,also makes it more easily to modify in order to address Stop producing empty alleles #17 and in future.
Also added more unit tests for interval partitioning.
Before this PR, results in:
AATAATAAT 5 AAATTTTGTATAAACT 6 AAATTTTATATAAACT 6 5 TTACCCTAG
(see the empty third allele`After this PR, results in :
AATAATAA 5 TAAATTTTGTATAAACT 6 TAAATTTTATATAAACT 6 T 5 TTACCCTAG
( the T got prepended)(max_nesting 5, min_match_len 7)
This does not mean #17 is fully resolved, i'll now run on real datasets
Modified
An alignment column of all '-' is no longer counted as a match. I cannot find a good reason for it to be but may be missing something?
Moved subcommand parsing to subcommands/ dir, results in less cluttered main()
get_consensus function shorter and easier to understand + add unit tests for it
refactored interval partition checks (checking each position of alignment is in one and one only interval,
and switching non-match to match intervals where needed) into own functions and unit tested them.
-v
for verbose switch added at top-level command lineOverall I strove to make the main module
make_prg_from_msa
smaller and terser, for easier reading/modifying later: went from 499 to 385 lines.