Feature Request: Ranged Download for BAM and VCF Files #22

Open
berntpopp opened this issue Sep 11, 2024 · 0 comments

Description:

Implement support for ranged downloads of BAM and VCF files using external tools like samtools and tabix. The new feature should allow users to specify genomic regions for download, making the process more efficient when only certain regions are required. This feature will involve creating a new utility script to handle the ranged requests and file management.

Acceptance Criteria:

  1. Tool Verification:

    • The script should first check if the required external tools (samtools and tabix) are available on the system.
    • The script should also check that each tool meets the minimum required version. If a tool is missing or its version is too old, the script should log an error and exit gracefully.
  2. Performing Ranged Requests:

    • After downloading the respective index files locally, the script should perform the ranged requests using:
      • samtools for BAM files.
      • tabix for VCF files.
    • The script should support commands like the following example for samtools:
      ```bash
      samtools view '' chr1:1-100000
      ```
    • The output should contain only the records that overlap the specified genomic region.
  3. File Naming:

    • The resulting files should be renamed by adding the genomic range to the file name in a normalized form (e.g., chr1_1_100000 before the file extension).
    • For example:
      • Input file: LB24-ONTCCMJH308-ready.bam
      • Output file after ranged request: LB24-ONTCCMJH308-ready.chr1_1_100000.bam
  4. Index File Handling:

    • Before performing the ranged requests, ensure the necessary index files (.bai or .tbi) are downloaded and available locally.
  5. Utility Script:

    • Create a new utility script to handle the ranged requests.
    • The script should handle downloading the index files and using the appropriate external commands for processing.
    • The script should log the process, including tool checks, the region being downloaded, and the final file output.
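
The tool verification step in criterion 1 could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the issue does not say which language the utility script uses or which versions are required, so Python, the `MIN_VERSIONS` values, and the helper names `parse_version`, `meets_minimum`, and `check_tool` are all assumptions. It also assumes both tools accept `--version` and print a dotted version number.

```python
import logging
import re
import shutil
import subprocess

log = logging.getLogger("ranged_download")

# Assumed minimum versions; the issue does not specify them.
MIN_VERSIONS = {"samtools": (1, 10), "tabix": (1, 10)}


def parse_version(text):
    """Return the first dotted version number in `text` as a tuple of ints, or None."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(found, required):
    """Compare two version tuples, padding the shorter one with zeros."""
    width = max(len(found), len(required))
    found = found + (0,) * (width - len(found))
    required = required + (0,) * (width - len(required))
    return found >= required


def check_tool(name, required):
    """Return True if `name` is on PATH and reports at least version `required`."""
    if shutil.which(name) is None:
        log.error("%s not found on PATH", name)
        return False
    result = subprocess.run([name, "--version"], capture_output=True, text=True)
    found = parse_version(result.stdout + result.stderr)
    if found is None or not meets_minimum(found, required):
        log.error("%s version %s does not meet required %s", name, found, required)
        return False
    log.info("%s version %s is sufficient", name, found)
    return True
```

A caller could loop over `MIN_VERSIONS.items()` and stop with a non-zero exit status as soon as `check_tool` returns `False`.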

Sample Command Example:

```bash
samtools view '' chr1:1-100000
```
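
Building on the sample command, the naming rule and the command construction might look like the sketch below. Everything here is illustrative: the function names, the supported extensions (`.bam`, `.vcf.gz`), and the choice of Python are assumptions, and `source` stands in for the file path or URL that is left empty in the example above. The sketch adds `-b` and `-o` so that samtools writes BAM to a file; tabix writes uncompressed text to stdout, so the caller would need to redirect and compress it (for example with bgzip).

```python
import re

# Accepts regions such as "chr1:1-100000"; thousands separators are stripped first.
REGION_RE = re.compile(r"^([^:\s]+):(\d+)-(\d+)$")


def normalize_region(region):
    """Turn 'chr1:1-100000' into the file-name-safe form 'chr1_1_100000'."""
    match = REGION_RE.match(region.replace(",", ""))
    if match is None:
        raise ValueError(f"Unrecognized region: {region!r}")
    chrom, start, end = match.groups()
    if int(start) > int(end):
        raise ValueError(f"Region start exceeds end: {region!r}")
    return f"{chrom}_{int(start)}_{int(end)}"


def ranged_filename(name, region):
    """Insert the normalized region before the file extension."""
    tag = normalize_region(region)
    for ext in (".vcf.gz", ".bam"):  # assumed set of supported extensions
        if name.endswith(ext):
            return f"{name[:-len(ext)]}.{tag}{ext}"
    raise ValueError(f"Unsupported file type: {name!r}")


def build_command(source, region, output):
    """Return the argument list for the tool matching the output file type."""
    if output.endswith(".bam"):
        # -b writes BAM rather than SAM; -o names the output file.
        return ["samtools", "view", "-b", "-o", output, source, region]
    # tabix prints to stdout; -h includes the VCF header lines.
    return ["tabix", "-h", source, region]
```

With the file name from the acceptance criteria, `ranged_filename("LB24-ONTCCMJH308-ready.bam", "chr1:1-100000")` yields `LB24-ONTCCMJH308-ready.chr1_1_100000.bam`.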

Additional Notes:

  • Ensure the process is well-logged and user-friendly.
  • The feature should be integrated so that it can be invoked easily from the existing system.
  • Implement error handling for situations where tools are missing or the ranged request fails.
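
One possible shape for the error handling described in these notes is a thin wrapper around the external call that logs the command, reports a missing tool or a non-zero exit code, and lets the caller decide whether to stop. This is a sketch under assumptions: `run_tool` and `run_or_exit` are hypothetical names, and Python is assumed as the script language.

```python
import logging
import subprocess
import sys

log = logging.getLogger("ranged_download")


def run_tool(command):
    """Run an external command; return True on success, log the failure otherwise."""
    log.info("Running: %s", " ".join(command))
    try:
        subprocess.run(command, check=True, capture_output=True, text=True)
    except FileNotFoundError:
        # Raised when the executable itself cannot be found.
        log.error("Tool not found: %s", command[0])
        return False
    except subprocess.CalledProcessError as exc:
        # Raised by check=True when the command exits with a non-zero status.
        log.error("Command failed with exit code %d: %s",
                  exc.returncode, exc.stderr.strip())
        return False
    return True


def run_or_exit(command):
    """Exit with status 1 after logging if the command cannot be run or fails."""
    if not run_tool(command):
        sys.exit(1)
```

Returning a boolean from `run_tool` keeps the wrapper easy to test, while `run_or_exit` gives the script a single place to stop cleanly after the error has been logged.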

Resources:


Tasks:

  1. Check for required tools (samtools, tabix) and versions.
  2. Implement ranged requests for BAM and VCF files.
  3. Handle renaming of files with the specified range.
  4. Ensure index files are downloaded before performing requests.
  5. Create detailed logging for all processes.
@berntpopp berntpopp added the enhancement New feature or request label Sep 11, 2024
@berntpopp berntpopp self-assigned this Sep 11, 2024