Add parser for Oxford Nanopore Mk1C barcoded output #121

Open
pvanheus opened this issue Nov 9, 2021 · 0 comments
Labels
enhancement New feature or request

Comments

pvanheus commented Nov 9, 2021

Describe your idea for a new feature

The Oxford Nanopore Mk1C writes basecalled FASTQ files into this directory structure:

basecalling/
   sequencing_summary.txt
   fail/
   pass/
     barcode01/
     barcode02/
     barcode03/
     ...
     unclassified/

where sequencing_summary.txt is a TSV file with a row for each read, giving the run ID, barcode, barcode kit and other information. Each barcodeXX directory contains files like this:

fastq_runid_3766493bc5400568a050ace2e0fd4b3a4040cfca_9_0.fastq.gz
fastq_runid_3766493bc5400568a050ace2e0fd4b3a4040cfca_8_0.fastq.gz

where each read in the FASTQ has a header like:

@a7b5abc5-0903-48b6-939d-4f9bbce75011 runid=3766493bc5400568a050ace2e0fd4b3a4040cfca sampleid=no_sample read=190698 ch=308 start_time=2021-09-21T20:15:13Z barcode=barcode01
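A header in this form can be split into a read ID and a set of key=value fields. The following is a minimal sketch in Python, assuming every header follows the layout of the single example above; the function name `parse_read_header` is made up for illustration and is not part of the IRIDA Uploader.

```python
def parse_read_header(header_line):
    """Split a FASTQ header line into (read_id, {key: value}).

    Assumes the layout of the example header: '@<read id>' followed by
    whitespace-separated key=value tokens.
    """
    if not header_line.startswith("@"):
        raise ValueError("not a FASTQ header line: {!r}".format(header_line))
    tokens = header_line[1:].split()
    if not tokens:
        raise ValueError("empty FASTQ header line")
    read_id = tokens[0]
    attributes = {}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if sep:  # silently skip tokens that are not key=value
            attributes[key] = value
    return read_id, attributes


header = (
    "@a7b5abc5-0903-48b6-939d-4f9bbce75011 "
    "runid=3766493bc5400568a050ace2e0fd4b3a4040cfca sampleid=no_sample "
    "read=190698 ch=308 start_time=2021-09-21T20:15:13Z barcode=barcode01"
)
read_id, attributes = parse_read_header(header)
print(read_id)                # a7b5abc5-0903-48b6-939d-4f9bbce75011
print(attributes["barcode"])  # barcode01
```

All values are kept as strings; converting `read`, `ch` or `start_time` to richer types is left to the caller.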

The run ID seems to be a hash or UUID of some sort (in the example above it is 40 hexadecimal characters long). It might be usable as a sequence identifier, or perhaps, as with Git commit IDs, the first 7 characters would be enough. Each barcode directory contains gzipped FASTQ files whose names are built from the run ID followed by two numbers, in the form X_Y.
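As a rough illustration of the naming scheme, the sketch below pulls the run ID and the two numbers out of a filename and shortens the run ID to its first 7 characters. The regular expression is inferred from the two example filenames only, so it is an assumption rather than a documented format, and the names `FASTQ_NAME`, `parse_fastq_name` and `short_run_id` are hypothetical.

```python
import re

# Assumed pattern, inferred from the two example filenames in this issue:
# fastq_runid_<hex run ID>_<X>_<Y>.fastq.gz
FASTQ_NAME = re.compile(
    r"^fastq_runid_(?P<run_id>[0-9a-f]+)_(?P<x>\d+)_(?P<y>\d+)\.fastq\.gz$"
)


def parse_fastq_name(filename):
    """Return (run_id, x, y) for a matching filename, or None otherwise."""
    match = FASTQ_NAME.match(filename)
    if match is None:
        return None
    return match.group("run_id"), int(match.group("x")), int(match.group("y"))


def short_run_id(run_id, length=7):
    """Shorten a run ID the way Git abbreviates commit IDs."""
    return run_id[:length]


name = "fastq_runid_3766493bc5400568a050ace2e0fd4b3a4040cfca_9_0.fastq.gz"
run_id, x, y = parse_fastq_name(name)
print(short_run_id(run_id), x, y)  # 3766493 9 0
```

A 7-character prefix is not guaranteed to be unique across runs, so a real parser might want to check for collisions before relying on it.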

A parser that feeds this data format into the IRIDA Uploader would be very useful. The parser could optionally read a SampleList.csv placed in the top-level (basecalling) directory to associate sample IDs with barcode IDs. Alternatively, sample IDs could be generated from the run ID and the barcode ID. The contents of each barcodeXX directory should be concatenated into a single gzipped FASTQ file before upload to IRIDA.
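The steps described here could be sketched roughly as follows, using only the Python standard library. This is not the IRIDA Uploader parser API: every function name is hypothetical, and the `Barcode` and `Sample_Name` column names for SampleList.csv are assumptions, since the issue does not define that file's layout. The concatenation relies on the fact that gzip files appended byte for byte form a valid multi-member gzip file.

```python
import csv
import shutil
from pathlib import Path


def list_barcode_dirs(basecalling_dir, include_unclassified=False):
    """Return the sorted subdirectories of <basecalling_dir>/pass."""
    pass_dir = Path(basecalling_dir) / "pass"
    dirs = sorted(d for d in pass_dir.iterdir() if d.is_dir())
    if not include_unclassified:
        dirs = [d for d in dirs if d.name != "unclassified"]
    return dirs


def load_sample_names(basecalling_dir):
    """Return {barcode: sample name} from an optional SampleList.csv.

    The 'Barcode' and 'Sample_Name' columns are assumed, not specified.
    """
    sample_list = Path(basecalling_dir) / "SampleList.csv"
    if not sample_list.is_file():
        return {}
    with sample_list.open(newline="") as handle:
        return {row["Barcode"]: row["Sample_Name"] for row in csv.DictReader(handle)}


def default_sample_id(run_id, barcode, length=7):
    """Fallback sample ID built from a shortened run ID and the barcode."""
    return "{}_{}".format(run_id[:length], barcode)


def concatenate_barcode(barcode_dir, output_path):
    """Append every *.fastq.gz in barcode_dir to output_path, byte for byte.

    Returns the number of files that were concatenated.
    """
    parts = sorted(Path(barcode_dir).glob("*.fastq.gz"))
    with open(output_path, "wb") as out:
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)
    return len(parts)
```

A caller would loop over `list_barcode_dirs("basecalling")`, look each directory name up in the mapping from `load_sample_names`, fall back to `default_sample_id` when there is no entry, and write one concatenated file per barcode. Copying bytes avoids decompressing and recompressing the reads; the output file is best written outside the barcode directory so that it is not picked up on a later run.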

Additional information

For implementation, consult the existing code in the parsers directory.

@pvanheus pvanheus added the enhancement New feature or request label Nov 9, 2021