Describe your idea for a new feature
The Oxford Nanopore Mk1C outputs basecalled FASTQ files into this directory structure:
where sequencing_summary.txt is a TSV with one row per read, listing the run ID, barcode, barcode kit and other info, and each barcodeXX directory contains files like this:
Each read in the FASTQ files has a header like:
@a7b5abc5-0903-48b6-939d-4f9bbce75011 runid=3766493bc5400568a050ace2e0fd4b3a4040cfca sampleid=no_sample read=190698 ch=308 start_time=2021-09-21T20:15:13Z barcode=barcode01
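If the parser needs the run ID and barcode for each read, it can take them from this header. Below is a minimal Python sketch. It assumes every header consists of the read ID followed by space-separated key=value fields with no spaces inside values, which is based on the single example above. `parse_read_header` is a hypothetical helper name.

```python
from typing import Dict


def parse_read_header(header: str) -> Dict[str, str]:
    """Split a FASTQ header into the read ID and its key=value fields.

    Assumption: values never contain spaces (based on one example header).
    """
    line = header.strip()
    if not line.startswith("@"):
        raise ValueError(f"not a FASTQ header line: {header!r}")
    read_id, *fields = line[1:].split()
    info = {"read_id": read_id}
    for field in fields:
        key, sep, value = field.partition("=")
        if sep:  # keep only tokens of the form key=value
            info[key] = value
    return info


example = ("@a7b5abc5-0903-48b6-939d-4f9bbce75011 "
           "runid=3766493bc5400568a050ace2e0fd4b3a4040cfca sampleid=no_sample "
           "read=190698 ch=308 start_time=2021-09-21T20:15:13Z barcode=barcode01")
fields = parse_read_header(example)
print(fields["runid"], fields["barcode"])
# prints: 3766493bc5400568a050ace2e0fd4b3a4040cfca barcode01
```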
The run ID looks like a hash of some sort. It is 40 hexadecimal characters long, the same length as a Git commit ID (a SHA-1 hash), which is longer than a UUID. It might be usable as an identifier for the sequencing run, or, as with Git commit IDs, its first 7 characters could serve as a short form. Each barcode directory contains gzipped FASTQ files whose filenames are built from the run ID and a pair of numbers (X_Y).
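Abbreviating the run ID Git-style could look like the sketch below. It assumes every run ID is 40 lowercase hexadecimal characters, as in the example header. The helper names are hypothetical. The idea of lengthening the prefix when two runs would collide resembles how Git lengthens ambiguous abbreviations, but it is an illustration and not part of any existing code.

```python
import re
from typing import Iterable

# Assumption: run IDs are 40 lowercase hex characters, as in the one example header.
RUN_ID_PATTERN = re.compile(r"[0-9a-f]{40}")


def short_run_id(run_id: str, length: int = 7) -> str:
    """Abbreviate a run ID to its first `length` characters, Git-style."""
    if not RUN_ID_PATTERN.fullmatch(run_id):
        raise ValueError(f"unexpected run ID format: {run_id!r}")
    return run_id[:length]


def unique_prefix_length(run_ids: Iterable[str], min_length: int = 7) -> int:
    """Smallest prefix length >= min_length that keeps all distinct run IDs distinct."""
    ids = set(run_ids)
    for run_id in ids:
        short_run_id(run_id)  # validates the format of every ID
    for length in range(min_length, 40):
        if len({run_id[:length] for run_id in ids}) == len(ids):
            return length
    return 40  # distinct 40-character IDs are always distinct at full length


print(short_run_id("3766493bc5400568a050ace2e0fd4b3a4040cfca"))  # prints: 3766493
```

A fixed 7-character prefix is usually enough to tell runs apart. Checking uniqueness across all runs in an upload avoids silently merging two runs that happen to share a prefix.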
A parser that feeds data in this format into the IRIDA Uploader would be very useful. The parser could optionally read a SampleList.csv placed in the top-level (basecalling) directory to associate sample IDs with barcode IDs. Alternatively, sample IDs could be generated from the run ID and the barcode ID. The contents of each barcodeXX directory should be concatenated into a single gzipped FASTQ file before upload to IRIDA.
Additional information
For implementation, consult the existing code in the parsers directory.
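The core steps of the proposed parser (resolve a sample ID per barcode, then concatenate that barcode's files) might be sketched in Python as follows. Everything here is hypothetical:

- the function names;
- the SampleList.csv columns (`Sample_Name` and `Barcode` are placeholders, since no layout has been defined);
- the `<first 7 characters of run ID>_<barcode>` fallback ID;
- the assumption that the barcodeXX directories sit directly under one known directory.

It does not use the IRIDA Uploader's parser API. It only produces one concatenated file per sample that such a parser could hand off.

```python
import csv
import shutil
from pathlib import Path
from typing import Dict, Iterable, Optional


def load_barcode_map(sample_list: Path,
                     sample_col: str = "Sample_Name",
                     barcode_col: str = "Barcode") -> Dict[str, str]:
    """Map barcode ID -> sample ID. The column names are placeholders."""
    with sample_list.open(newline="") as handle:
        return {row[barcode_col]: row[sample_col] for row in csv.DictReader(handle)}


def sample_id_for(barcode: str, run_id: str,
                  barcode_map: Optional[Dict[str, str]] = None) -> str:
    """Prefer the SampleList.csv mapping; otherwise derive an ID from run ID + barcode."""
    if barcode_map and barcode in barcode_map:
        return barcode_map[barcode]
    return f"{run_id[:7]}_{barcode}"


def concatenate_gzip_files(parts: Iterable[Path], out_path: Path) -> None:
    """Append gzipped files byte-for-byte; the result is a multi-member gzip file."""
    with out_path.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)


def prepare_uploads(fastq_root: Path, run_id: str, out_dir: Path,
                    sample_list: Optional[Path] = None) -> Dict[str, Path]:
    """Write one <sample_id>.fastq.gz per non-empty barcodeXX directory in fastq_root."""
    barcode_map = load_barcode_map(sample_list) if sample_list else None
    out_dir.mkdir(parents=True, exist_ok=True)
    uploads: Dict[str, Path] = {}
    for barcode_dir in sorted(fastq_root.glob("barcode[0-9][0-9]")):
        if not barcode_dir.is_dir():
            continue  # skip stray files that match the name pattern
        parts = sorted(barcode_dir.glob("*.fastq.gz"))  # sorted for deterministic output
        if not parts:
            continue  # no reads for this barcode
        sample_id = sample_id_for(barcode_dir.name, run_id, barcode_map)
        if sample_id in uploads:
            raise ValueError(f"two barcodes map to sample {sample_id!r}")
        out_path = out_dir / f"{sample_id}.fastq.gz"
        concatenate_gzip_files(parts, out_path)
        uploads[sample_id] = out_path
    return uploads
```

The gzipped parts are joined without decompressing them. A gzip file may consist of several concatenated members (RFC 1952), and standard readers such as zcat and Python's gzip module read those members as one continuous stream.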