-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor bwa (resolves #320, resolves #297, resolves #322) #324
Conversation
I will give this a review... |
@fnothaft I can ping you once I remove the needs work label and tests pass. |
That would be superb! |
d75e51c
to
b812b86
Compare
@fnothaft — I spread out the history in an attempt to make it more readable, but its not perfect, so if something looks odd check the finished product. |
Thanks @jvivian; I'll make a pass shortly. |
Thanks! — I'm happy to squash commits or make changes to make the review easier. |
@jvivian that would be great actually. I think it might make sense to squash down:
Thoughts? |
Thanks @fnothaft I'll ping you when I've squashed and rebased. |
b812b86
to
c14ac3f
Compare
@fnothaft —Almost as asked. I had a demon commit (53f) that gave me merge conflicts with missing code anytime I moved commits around it or tried to squash it. |
Demon commit; nice! I'll take a look. |
inputs, | ||
[uuid, | ||
's3://{s3_bucket}/{sequence_dir}/{uuid}_1.fastq.gz'.format(**args), | ||
's3://{s3_bucket}/{sequence_dir}/{uuid}_2.fastq.gz'.format(**args)], | ||
's3://{s3_bucket}/alignment{dir_suffix}'.format(**args), | ||
rg_line).encapsulate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens to rg_line
now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Er. Nevermind. I see that rg_line
is now being passed through inputs
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't rg_line
just in inputs
to start with? If we're refactoring this, mightn't we factor out passing rg_line
as a parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't rg_line just in inputs to start with?
Looking at the code, I'm not sure if it's even defined anywhere in align_and_call
.
rg_line
gets set from uuid_items[1]
, which is taken from uuid_rg
from uuid_list
, but uuid_list
is parsed from the manifest which has no entry for read group... (confused yet?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ace!
# If all inputs provided, download from S3 | ||
download_ref = job.wrapJobFn(download_url_job, inputs.ref, disk='3G') | ||
faidx = job.wrapJobFn(run_samtools_faidx, download_ref.rv()) | ||
# If all the bwa index files are provided |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Semantics, but I would note here that the FAI is considered separate from the BWA indices, so we'll check for that later.
Just made another pass @jvivian! Let me know your thoughts! |
Great feedback @fnothaft , thanks! I'll ping you once tests pass. I also pinged Hannes about the toil-scripts google group. |
@@ -2,8 +2,9 @@ | |||
### Guide: Running the BWA Pipeline using Toil | |||
|
|||
This guide attempts to walk the user through running this pipeline from start to finish. If there are any questions | |||
please contact John Vivian ([email protected]). If you find any errors or corrections please feel free to make a | |||
pull request. Feedback of any kind is appreciated. | |||
please contact the Toil team at: [email protected] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move to project level readme, no? We should have it there, and not duplicate it across all pipeline readmes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I'll do that now.
Dropped two comments; otherwise branch is good to squash down and rebase from my side. |
c783e8c
to
48ffd14
Compare
@hannes-ucsc — This has been reviewed, squashed and rebased, and is ready to merge. |
Hmm, don't like how the commits are organized. I should be able to revert a commit and that should undo exactly one change. I bet you that 48ffd14 touches on all of the above commits. In that sense, I'd rather have just one giant commit. But now that I have to share my Merge button superpowers, diluted beyond recognition to a mere plus or minus one, I don't want to hold this up. Your thoughts, @fnothaft? |
I squashed the commits because I had to rebase and didn't want to On Wed, Jul 6, 2016, 7:00 PM Hannes Schmidt [email protected]
|
I strongly +1 the 1 commit = 1 change approach. That's how we run all of the bdgenomics repositories, and I'd love to have this be an official policy over in BD2KGenomics as well. |
@jvivian, no need to apologize. Organizing changes into commits is an art, especially when it comes to large refactorings. Given what @fnothaft said, I think squashing all commits down to one would be best. I do like your detailed commit messages, so be sure to merge them all together so as to not lose that information. |
48ffd14
to
7b316cc
Compare
Remove email as point of contact. #324 will add the Toil google group as point of contact in parent README.
Add missing uuid to config Fix bad unpacking Add sample URLs to logger Add suffix option to test Remove unused import Move google group line to parent README Add whitespace before map_job Add comment explaining map_job Fix missing trim attribute in test Clean up logic for reference indices Change point of contact to google group Improve legibility with comments on their own lines Unindent FollowOn job Add logging for index creation Add inline explaining alt file can't be generated Clarify that FAI is created separately from BWA indices Add docstring to move_file_job Add inline comment to explain disk space Reverse use of csv Replace run_bwakit arguments with config Fix imports and spacing in bwa_alignment Add /data/ prefix Add missing import Move run_bwa_index and run_samtools_faidx to indexing.py Remove unnecessary renaming Colocate parameter logic Add inline comments for conditionals and rg Provide clarity for rg_line Move required_length to common lib Remove bad print statement Add comments explaining conditional Clarify samples type Clarify file_size option Remove attempt at humor (+5 squashed commits) Squashed commits: Move bwa_kit to tool library (resolves #297) Add bwa_index and samtools_faidx to tool library Indent Job.Runner, add sanity checks options -> args (convention) Print docstring help if no arguments provided Remove config options Change config formatting Add tab-friendly config and manifest names Replace old bwa_kit job with download_sample_and_align Clean imports Move top docstring to main() Fix test call Fix adam_gatk_pipeline's call to download_reference_files Add job version of `move_files` Update README.md Remove deprecated launch scripts Require only reference (resolves #320) Change download_shared_files -> download_reference_files Remove old reference requirements Add single end support for BWA (resolves #322) Edit manifest docs to include single-end Add nargs range for single-end and paired-end Replace parse_config job with parse_manifest function
7b316cc
to
eadc6b5
Compare
@hannes-ucsc — Ready for review/merge |
No description provided.