Add tabix indexing of gzipped VCFs in VariantBuilder #214
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #214      +/-   ##
==========================================
+ Coverage   90.86%   90.96%   +0.09%
==========================================
  Files          18       18
  Lines        2289     2292       +3
  Branches      339      340       +1
==========================================
+ Hits         2080     2085       +5
+ Misses        137      136       -1
+ Partials       72       71       -1
Walkthrough
The pull request introduces modifications to the VCF file handling in the
Actionable comments posted: 0
🧹 Nitpick comments (2)
fgpyo/vcf/builder.py (1)
283-285: Add error handling for tabix indexing.
Add try-except to handle potential indexing failures gracefully.
  if str(path.suffix) == ".gz":
+     try:
          pysam.tabix_index(str(path), preset="vcf")
+     except Exception as e:
+         raise RuntimeError(f"Failed to create tabix index: {e}")
tests/fgpyo/vcf/test_builder.py (1)
255-272: Enhance test coverage with edge cases.
Add tests for:
- Empty VCF file
- Invalid coordinates (e.g., negative positions)
- Regions spanning chromosome boundaries
+ # Test empty VCF
+ empty_vcf = temp_path / "empty.vcf.gz"
+ empty_builder = VariantBuilder()
+ empty_builder.to_path(empty_vcf)
+ with vcf_reader(empty_vcf) as reader:
+     assert len(list(reader.fetch(contig="chr1", start=1, end=1000))) == 0
+
+ # Test invalid coordinates
+ with pytest.raises(ValueError):
+     with vcf_reader(vcf) as reader:
+         list(reader.fetch(contig="chr1", start=-1, end=1000))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
fgpyo/vcf/builder.py (3 hunks)
tests/fgpyo/vcf/test_builder.py (2 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
fgpyo/vcf/builder.py
[warning] 298-298: fgpyo/vcf/builder.py#L298
Added line #L298 was not covered by tests
🔇 Additional comments (2)
fgpyo/vcf/builder.py (2)
17-19: LGTM! Required imports for VCF handling.
298-298: Add test coverage for temporary file creation.
Line not covered by tests.
✅ Verification successful
Temporary file creation is covered by integration tests.
Tests test_zero_sample_vcf_round_trip and test_indexing_gzipped_vcf verify both compressed and uncompressed VCF handling.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify test coverage for temporary file creation
rg -A 5 "NamedTemporaryFile" tests/
Length of output: 4339
Script:
#!/bin/bash
# Find and examine VCF builder tests
fd test_builder.py tests/fgpyo/vcf/ --exec cat {}
Length of output: 14018
I had to find one comment, right? Thank you for adding this!
fgpyo/vcf/builder.py (outdated)
@@ -279,6 +280,9 @@ def to_path(self, path: Optional[Path] = None) -> Path:
        for variant in self.to_sorted_list():
            writer.write(variant)

    if str(path.suffix) == ".gz":
        pysam.tabix_index(str(path), preset="vcf")
Nit because this is on the builder class and we control the file :P
Will this fail if the file is already indexed (idk)? If it isn't block-compressed (probably)?
Can you add to the docstring that indexing will occur if the VCF is gz?
I added to the docstring.
We only partially control the path: a user can request it be written anywhere, and may specifically ask for a non-gzipped path, so I think the test has to remain.
I added force=True, which will cause it to overwrite any existing index (e.g. if the user repeatedly writes different VCFs to the same path). It will fail if the file isn't block-compressed ... but samtools' VCF writing writes bgzip for .gz extensions, so yay?
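The reply above notes that indexing fails when the file is not block-compressed. As a rough illustration of what "block-compressed" means on disk, the sketch below checks whether a file starts with a BGZF block header: a gzip member whose extra field carries a `BC` subfield with a 2-byte payload, as described in the SAM specification. The helper name `looks_like_bgzf` is hypothetical and is not part of fgpyo or pysam, and it inspects only the first block.

```python
import struct
from pathlib import Path


def looks_like_bgzf(path: Path) -> bool:
    """Return True if the file begins with a BGZF block header (hypothetical helper)."""
    with open(path, "rb") as handle:
        header = handle.read(12)
        # Require the gzip magic (1f 8b), the deflate method (08), and the FEXTRA flag bit (04).
        if len(header) < 12 or header[:3] != b"\x1f\x8b\x08" or not header[3] & 0x04:
            return False
        # XLEN is a little-endian uint16 giving the size of the extra field.
        (xlen,) = struct.unpack("<H", header[10:12])
        extra = handle.read(xlen)

    # Walk the extra subfields looking for 'B', 'C' with a 2-byte payload.
    offset = 0
    while offset + 4 <= len(extra):
        si1, si2, slen = struct.unpack_from("<BBH", extra, offset)
        if (si1, si2, slen) == (ord("B"), ord("C"), 2):
            return True
        offset += 4 + slen
    return False
```

A file written by ordinary gzip has no `BC` subfield, so this check returns False for it even though its name ends in ".gz"; that is the case in which tabix indexing would be expected to fail.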
24122f4 to 75523b7
Actionable comments posted: 1
🧹 Nitpick comments (1)
fgpyo/vcf/builder.py (1)
271-276: Add index file location to docstring.
Add that the index file is created at <path>.tbi.
  If the path given ends in ".gz" then the generated file will be bgzipped and
- a tabix index generated for the file with the suffix ".gz.tbi".
+ a tabix index will be generated at <path>.tbi.
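To make the naming in this docstring discussion concrete, here is a small sketch using only `pathlib`. It shows why the `path.suffix == ".gz"` check matches a path such as `calls.vcf.gz`, and where a default tabix index would sit next to it. The helper `tabix_index_path` and the example path are hypothetical, and the sketch assumes the default `.tbi` naming rather than a custom index location.

```python
from pathlib import Path


def tabix_index_path(vcf_path: Path) -> Path:
    """Return the data file's path with ".tbi" appended (hypothetical helper)."""
    return vcf_path.with_name(vcf_path.name + ".tbi")


vcf = Path("out") / "calls.vcf.gz"

# `suffix` is only the final extension, so a ".gz" check matches "*.vcf.gz".
assert vcf.suffix == ".gz"
assert vcf.suffixes == [".vcf", ".gz"]

# The index sits beside the data file, so the full name ends in ".gz.tbi".
index = tabix_index_path(vcf)
assert index.name == "calls.vcf.gz.tbi"
assert index.parent == vcf.parent
```

Both wordings in the thread describe the same file: the index name is the VCF name plus ".tbi", which for a gzipped VCF ends in ".gz.tbi".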
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
fgpyo/vcf/builder.py (4 hunks)
tests/fgpyo/vcf/test_builder.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/fgpyo/vcf/test_builder.py
🧰 Additional context used
🪛 GitHub Check: codecov/patch
fgpyo/vcf/builder.py
[warning] 303-303: fgpyo/vcf/builder.py#L303
Added line #L303 was not covered by tests
🔇 Additional comments (2)
fgpyo/vcf/builder.py (2)
17-19: LGTM! Clean import structure.
303-303: Add test coverage for temporary file creation.
Line creating temporary gzipped VCF isn't tested.
fgpyo/vcf/builder.py (outdated)
if str(path.suffix) == ".gz":
    pysam.tabix_index(str(path), preset="vcf")
Add error handling for tabix indexing.
Indexing will fail if the file isn't block-compressed. Add a try-except.
if str(path.suffix) == ".gz":
- pysam.tabix_index(str(path), preset="vcf")
+ try:
+ pysam.tabix_index(str(path), preset="vcf")
+ except Exception as e:
+ raise ValueError(f"Failed to create tabix index: {e}")
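One way to read the suggestion above is sketched below, with exception chaining added so the underlying error stays attached to the re-raised one. To keep the example runnable without pysam, the indexing call is passed in as `index_fn`, a hypothetical stand-in for something like `pysam.tabix_index(str(path), preset="vcf", force=True)`. The function name `index_if_gzipped` is likewise an assumption and is not the code merged in this PR.

```python
from pathlib import Path
from typing import Callable


def index_if_gzipped(path: Path, index_fn: Callable[[str], None]) -> bool:
    """Index `path` when it ends in ".gz"; return True if indexing was attempted.

    `index_fn` is a hypothetical stand-in for the real indexing call.
    """
    if path.suffix != ".gz":
        return False
    try:
        index_fn(str(path))
    except Exception as e:
        # `from e` records the original exception as __cause__ of the new one.
        raise ValueError(f"Failed to create tabix index for {path}: {e}") from e
    return True
```

Whether wrapping is worthwhile is a judgment call: the wrapper adds the path to the message, but catching `Exception` broadly can also hide the specific error type a caller might want to handle.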
Actionable comments posted: 0
♻️ Duplicate comments (1)
fgpyo/vcf/builder.py (1)
288-290: ⚠️ Potential issue
Add error handling for tabix indexing.
Add try-except to handle non-block-compressed files.
  if str(path.suffix) == ".gz":
-     pysam.tabix_index(str(path), preset="vcf", force=True)
+     try:
+         pysam.tabix_index(str(path), preset="vcf", force=True)
+     except Exception as e:
+         raise ValueError(f"Failed to create tabix index: {e}")
🧹 Nitpick comments (1)
fgpyo/vcf/builder.py (1)
271-276: Add .tbi extension to docstring.
Mention that the tabix index will have the .tbi extension.
- If the path given ends in ".gz" then the generated file will be bgzipped and
- a tabix index generated for the file with the suffix ".gz.tbi".
+ If the path given ends in ".gz" then the generated file will be bgzipped and
+ a tabix index will be generated as a companion file with the suffix ".tbi".
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
fgpyo/vcf/builder.py (4 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
fgpyo/vcf/builder.py
[warning] 303-303: fgpyo/vcf/builder.py#L303
Added line #L303 was not covered by tests
🔇 Additional comments (2)
fgpyo/vcf/builder.py (2)
17-19: LGTM! Clean imports for new functionality.
303-303: Add test coverage for temporary file creation.
Line creating temporary .vcf.gz file needs test coverage.
Run this to verify current coverage:
No description provided.