Add parser for Oxford Nanopore Mk1C barcoded output #121

pvanheus · 2021-11-09T13:33:03Z

Describe your idea for a new feature

The Oxford Nanopore Mk1C outputs basecalled fastq into this directory structure:

basecalling/
   sequencing_summary.txt
   fail/
   pass/
     barcode01/
     barcode02/
     barcode03/
     ...
     unclassified/

where sequencing_summary.txt is a TSV file with one row per read, recording the run ID, barcode, barcode kit and other information. Each barcodeXX directory contains files like this:

fastq_runid_3766493bc5400568a050ace2e0fd4b3a4040cfca_9_0.fastq.gz
fastq_runid_3766493bc5400568a050ace2e0fd4b3a4040cfca_8_0.fastq.gz

where each read in the FASTQ has a header like:

@a7b5abc5-0903-48b6-939d-4f9bbce75011 runid=3766493bc5400568a050ace2e0fd4b3a4040cfca sampleid=no_sample read=190698 ch=308 start_time=2021-09-21T20:15:13Z barcode=barcode01
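As a rough illustration of how a parser might read that header, here is a minimal Python sketch. It assumes every field after the read ID is a whitespace-separated key=value pair, as in the example above; the function name parse_read_header is hypothetical and not part of the IRIDA Uploader.

```python
def parse_read_header(line: str) -> dict:
    """Split a FASTQ header line into the read ID and its key=value fields.

    Assumes the layout shown in the example header: '@<read id>' followed by
    whitespace-separated key=value pairs.
    """
    if not line.startswith("@"):
        raise ValueError("not a FASTQ header line")
    # The first token is the read ID; the rest are key=value pairs.
    read_id, *pairs = line[1:].split()
    fields = {"read_id": read_id}
    for pair in pairs:
        key, _, value = pair.partition("=")
        fields[key] = value
    return fields


header = (
    "@a7b5abc5-0903-48b6-939d-4f9bbce75011 "
    "runid=3766493bc5400568a050ace2e0fd4b3a4040cfca sampleid=no_sample "
    "read=190698 ch=308 start_time=2021-09-21T20:15:13Z barcode=barcode01"
)
fields = parse_read_header(header)
print(fields["runid"], fields["barcode"])
```

All values are kept as strings; a real parser would decide which ones (read, ch, start_time) need converting.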

The run ID seems to be a hash or UUID of some sort. It might be usable as a sequence identifier or, as with Git commit IDs, its first 7 characters could be used. Each barcode directory contains gzipped FASTQ files whose names are built from the run ID and a pair of numbers, X_Y.
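One possible way to build an identifier from a shortened run ID, sketched in Python. The function name make_sample_id, the underscore separator and the default prefix length of 7 are all assumptions for illustration; nothing here checks whether a 7-character prefix is actually unique across runs.

```python
def make_sample_id(run_id: str, barcode: str, prefix_len: int = 7) -> str:
    """Combine a shortened run ID with a barcode name (hypothetical scheme)."""
    if len(run_id) < prefix_len:
        raise ValueError("run ID is shorter than the requested prefix")
    return f"{run_id[:prefix_len]}_{barcode}"


sample_id = make_sample_id("3766493bc5400568a050ace2e0fd4b3a4040cfca", "barcode01")
print(sample_id)  # 3766493_barcode01
```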

A parser that feeds this data format into the IRIDA Uploader would be very useful. It could optionally use a SampleList.csv placed in the top-level (basecalling) directory to associate sample IDs with barcode IDs. Alternatively, sample IDs could be generated from the run ID and the barcode ID. The contents of each barcodeXX directory should be concatenated into a single gzipped FASTQ file before upload to IRIDA.
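The concatenation step could look something like the Python sketch below. It relies on the fact that the gzip format allows several compressed members to be appended into one valid file, so the inputs are copied byte for byte without decompressing. The function names, the choice to process only pass/barcode* directories (skipping fail/ and unclassified/), and the <barcode>.fastq.gz output naming are assumptions, not existing IRIDA Uploader behaviour.

```python
import shutil
from pathlib import Path


def concatenate_barcode_dir(barcode_dir: Path, output_path: Path) -> int:
    """Append every *.fastq.gz in barcode_dir to output_path; return the file count."""
    parts = sorted(barcode_dir.glob("*.fastq.gz"))
    if not parts:
        return 0  # nothing to merge, so no output file is created
    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Concatenated gzip members still form a valid gzip file.
                shutil.copyfileobj(src, out)
    return len(parts)


def concatenate_pass_reads(basecalling_dir: Path, output_dir: Path) -> dict:
    """Merge each pass/barcodeXX directory into output_dir/barcodeXX.fastq.gz."""
    output_dir.mkdir(parents=True, exist_ok=True)
    merged = {}
    for barcode_dir in sorted((basecalling_dir / "pass").glob("barcode*")):
        if not barcode_dir.is_dir():
            continue
        output_path = output_dir / f"{barcode_dir.name}.fastq.gz"
        if concatenate_barcode_dir(barcode_dir, output_path):
            merged[barcode_dir.name] = output_path
    return merged
```

Calling concatenate_pass_reads(Path("basecalling"), Path("merged")) would return a mapping from barcode name to merged file, which a parser could then pair with sample IDs taken from SampleList.csv or generated from the run ID.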

Additional information

For implementation, consult the existing code in the parsers directory.


pvanheus added the enhancement New feature or request label Nov 9, 2021
