Add sanity checks in order to prevent pipeline termination errors #161

Open
LordRust opened this issue Jul 11, 2023 · 0 comments
Assignees
Labels
bug, good first issue, medium

Comments

@LordRust
Contributor

After running 2000+ MRSA samples through the pipeline, some exited with errors. Most of these could be prevented by running some initial sanity checks on the fastq files and excluding a failing isolate from further processing, since the whole batch of samples currently stops when one error is encountered.

  • SKESA exits with "Reads are too short for selected minimal kmer length" if the fastq files only have very short reads. This typically indicates a failed sequencing run, so in most cases it should be enough to check that the fastq files have some minimum number of reads.
  • Is the file that is referenced in the csv actually there? It might sound a bit too pedantic to check, but attached storage servers could be down, automated starts of analysis would fail if the demultiplexing does not produce a file for one sample in a run, etc.
  • In some samples a trailing space had snuck into the "id" column of the csv. This crashes the pipeline at a later stage since it is not built to handle whitespace in filenames. Stripping the whitespace from both ends of the id should be fine, because no one should sort files by leading or trailing whitespace (like my old colleague Flemming did).
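
A rough sketch of what the three checks above could look like as a pre-flight step on the samplesheet. This is only an illustration, not how the pipeline does it: the function names, the `read1`/`read2` column names and the `MIN_READS` threshold are all assumptions (only the "id" column is taken from the description above). Samples that fail a check are collected with a reason instead of stopping the batch.

```python
import csv
import gzip
from pathlib import Path

MIN_READS = 1000  # assumed threshold; tune to what counts as a failed run


def count_reads(fastq_path, limit):
    """Count FASTQ records (4 lines each), stopping early once `limit` is reached."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    lines = 0
    with opener(fastq_path, "rt") as handle:
        for lines, _ in enumerate(handle, start=1):
            if lines >= limit * 4:
                break
    return lines // 4


def check_samplesheet(csv_path, read_columns=("read1", "read2"), min_reads=MIN_READS):
    """Return (passed, rejected): cleaned rows and (id, reason) pairs.

    `read_columns` are hypothetical column names holding the fastq paths.
    """
    passed, rejected = [], []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            # Strip leading/trailing whitespace from the id.
            row["id"] = row["id"].strip()
            reason = None
            for column in read_columns:
                path = Path(row[column])
                # The referenced file must actually exist.
                if not path.is_file():
                    reason = f"missing file: {path}"
                    break
                # Require a minimum number of reads.
                if count_reads(path, min_reads) < min_reads:
                    reason = f"too few reads in {path.name}"
                    break
            if reason is None:
                passed.append(row)
            else:
                rejected.append((row["id"], reason))
    return passed, rejected
```

The rejected list could be written to a log or report so that the failing isolates are visible, while only the rows in `passed` go on to further processing. Note that a read-count check is only a proxy for the SKESA error: it would not catch a file with many reads that are all very short.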
@ryanjameskennedy ryanjameskennedy added the high High priority label Aug 14, 2023
@ryanjameskennedy ryanjameskennedy added the good first issue Good issue to tackle for new contributors label Aug 29, 2023
@ryanjameskennedy ryanjameskennedy added this to the Pipeline efficiency milestone Nov 30, 2023
@ryanjameskennedy ryanjameskennedy added medium Medium priority and removed high High priority labels Nov 30, 2023
@ryanjameskennedy ryanjameskennedy self-assigned this Jan 9, 2025
@ryanjameskennedy ryanjameskennedy added improvement bug Something isn't working medium Medium priority and removed medium Medium priority improvement labels Jan 9, 2025