`samplesheet` format is not recognised #65

bounlu · 2024-07-15T09:24:22Z

Description of the bug

The input samplesheet format provided in the documentation does not work for me. I tried deleting subject_id and sample_name individually and it still failed, but it worked when I deleted both for all samples.

Command used and terminal output

$ ./run_oncoanalyser.sh

 N E X T F L O W   ~  version 24.06.0-edge

Pulling nf-core/oncoanalyser ...
 Already-up-to-date
Launching `https://github.com/nf-core/oncoanalyser` [big_church] DSL2 - revision: 2f86f87702 [dev]

ERROR ~ missing 'library_id' info field for compass TUMOR/DNA

 -- Check '.nextflow.log' file for details
$ ./run_oncoanalyser.sh

 N E X T F L O W   ~  version 24.06.0-edge

Pulling nf-core/oncoanalyser ...
 Already-up-to-date
Launching `https://github.com/nf-core/oncoanalyser` [curious_maxwell] DSL2 - revision: 2f86f87702 [dev]

ERROR ~ got unexpected subject name for compass 220123565: 220324109

 -- Check '.nextflow.log' file for details
$ ./run_oncoanalyser.sh

 N E X T F L O W   ~  version 24.06.0-edge

Pulling nf-core/oncoanalyser ...
 Already-up-to-date
Launching `https://github.com/nf-core/oncoanalyser` [stoic_leakey] DSL2 - revision: 2f86f87702 [dev]

ERROR ~ got unexpected sample name for compass TUMOR/DNA: 220324109_3_umi

 -- Check '.nextflow.log' file for details
$ ./run_oncoanalyser.sh

 N E X T F L O W   ~  version 24.06.0-edge

Pulling nf-core/oncoanalyser ...
 Already-up-to-date
Launching `https://github.com/nf-core/oncoanalyser` [jolly_archimedes] DSL2 - revision: 2f86f87702 [dev]

ERROR ~ got unexpected sample name for compass TUMOR/DNA: 220324109

 -- Check '.nextflow.log' file for details
$ ./run_oncoanalyser.sh

 N E X T F L O W   ~  version 24.06.0-edge

Pulling nf-core/oncoanalyser ...
 Already-up-to-date

Relevant files

group_id,subject_id,sample_id,sample_type,sequence_type,filetype,info,filepath
my_project,220836103,220836103_3_umi,tumor,dna,fastq,library_id:12;lane:L02,/data/220836103_1.fq.gz;/data/220836103_2.fq.gz
my_project,220324109,220024109_3_umi,tumor,dna,fastq,library_id:13;lane:L02,/data/220024109_1.fq.gz;/data/220024109_2.fq.gz
my_project,220324466,220024466_3_umi,tumor,dna,fastq,library_id:14;lane:L02,/data/220024466_1.fq.gz;/data/220024466_2.fq.gz
my_project,220325489,220024489_2_umi,tumor,dna,fastq,library_id:15;lane:L02,/data/220024489_1.fq.gz;/data/220024489_2.fq.gz
my_project,220326755,220024755_2_umi,tumor,dna,fastq,library_id:16;lane:L02,/data/220024755_1.fq.gz;/data/220024755_2.fq.gz
my_project,220327052,220025052_2_umi,tumor,dna,fastq,library_id:17;lane:L02,/data/220025052_1.fq.gz;/data/220025052_2.fq.gz

System information

Nextflow version: 24.06.0-edge
Hardware: Server
Executor: local
Container engine: Docker
OS: Linux
Version of nf-core/oncoanalyser: dev


scwatts · 2024-07-15T09:52:21Z

Thanks for the report @bounlu. Can you confirm that the samplesheet is definitely being provided to oncoanalyser in your execution script as --input /path/to/samplesheet.csv?

If you can provide the exact full command used to invoke oncoanalyser (e.g. nextflow run nf-core/oncoanalyser ...) along with the referenced inputs (samplesheet, config), I'll be able to help further.

bounlu · 2024-07-16T01:25:27Z

Yes, the samplesheet is being provided to oncoanalyser: every time I change something in it, I get a different error.

Here is the full command:

#!/bin/bash

nextflow run nf-core/oncoanalyser \
-latest \
-profile docker \
--mode 'targeted' \
--genome 'GRCh38_hmf' \
--panel 'tso500' \
--input '/home/github/nf-core/samplesheet_oncoanalyser.csv' \
--outdir '/data/nextflow/oncoanalyser/my_project/results/' \
-work-dir '/data/nextflow/oncoanalyser/my_project/work/' \
-c '/home/github/nf-core/custom_local.config' \
-r dev \
-resume

I already provided the samplesheet above, and the custom_local.config file has no issues since I use the same one for all my runs.

scwatts · 2024-07-16T02:03:31Z

Thanks for the extra info. I noticed that the error messages above reference compass entries that aren't present in the samplesheet you provided. Putting that aside, I've now tested your samplesheet and can see what is going wrong.

The samplesheet isn't considered valid as there are multiple tumor DNA samples given for a single analysis group, which is determined by values in the group_id column and in this case is my_project.

Since all of your tumor DNA samples are singletons, you can fix your samplesheet by setting a unique group_id value for each, e.g.:

group_id,subject_id,sample_id,sample_type,sequence_type,filetype,info,filepath
220123565,220123565,220836103_3_umi,tumor,dna,fastq,library_id:12;lane:L02,/data/220836103_1.fq.gz;/data/220836103_2.fq.gz
220324109,220324109,220024109_3_umi,tumor,dna,fastq,library_id:13;lane:L02,/data/220024109_1.fq.gz;/data/220024109_2.fq.gz
220324466,220324466,220024466_3_umi,tumor,dna,fastq,library_id:14;lane:L02,/data/220024466_1.fq.gz;/data/220024466_2.fq.gz
220325489,220325489,220024489_2_umi,tumor,dna,fastq,library_id:15;lane:L02,/data/220024489_1.fq.gz;/data/220024489_2.fq.gz
220326755,220326755,220024755_2_umi,tumor,dna,fastq,library_id:16;lane:L02,/data/220024755_1.fq.gz;/data/220024755_2.fq.gz
220327052,220327052,220025052_2_umi,tumor,dna,fastq,library_id:17;lane:L02,/data/220025052_1.fq.gz;/data/220025052_2.fq.gz
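To catch this before launching a run, a small pre-flight check can flag any group_id that holds more than one tumor DNA sample. This is a minimal sketch, not oncoanalyser's own validation code: the helper name find_conflicting_groups is hypothetical, and it assumes only the column names in the header shown above. It counts distinct sample_id values rather than rows, on the assumption that one sample may legitimately span several FASTQ rows (e.g. lanes).

```python
import csv
import io
from collections import defaultdict


def find_conflicting_groups(samplesheet_text):
    """Return {group_id: [tumor DNA sample_ids]} for every group with more than one.

    Hypothetical helper, not part of oncoanalyser: it only mirrors the rule that an
    analysis group may contain at most one tumor DNA sample.
    """
    tumor_dna = defaultdict(set)
    for row in csv.DictReader(io.StringIO(samplesheet_text)):
        if row["sample_type"] == "tumor" and row["sequence_type"] == "dna":
            # A set, so several FASTQ rows (e.g. lanes) of one sample count once
            tumor_dna[row["group_id"]].add(row["sample_id"])
    return {group: sorted(ids) for group, ids in tumor_dna.items() if len(ids) > 1}


HEADER = "group_id,subject_id,sample_id,sample_type,sequence_type,filetype,info,filepath\n"

# First two rows of the original samplesheet: both share group_id my_project
before = HEADER + (
    "my_project,220836103,220836103_3_umi,tumor,dna,fastq,library_id:12;lane:L02,"
    "/data/220836103_1.fq.gz;/data/220836103_2.fq.gz\n"
    "my_project,220324109,220024109_3_umi,tumor,dna,fastq,library_id:13;lane:L02,"
    "/data/220024109_1.fq.gz;/data/220024109_2.fq.gz\n"
)
# Same rows with a unique group_id per tumor sample, as in the corrected samplesheet
after = HEADER + (
    "220123565,220123565,220836103_3_umi,tumor,dna,fastq,library_id:12;lane:L02,"
    "/data/220836103_1.fq.gz;/data/220836103_2.fq.gz\n"
    "220324109,220324109,220024109_3_umi,tumor,dna,fastq,library_id:13;lane:L02,"
    "/data/220024109_1.fq.gz;/data/220024109_2.fq.gz\n"
)

print(find_conflicting_groups(before))  # {'my_project': ['220024109_3_umi', '220836103_3_umi']}
print(find_conflicting_groups(after))   # {}
```

An empty result only means this one rule is satisfied; oncoanalyser applies its own, broader checks when it parses the samplesheet.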

I also see that the entry with subject_id 220123565 has a different sample_id pattern from the others. I'm not sure whether this is intentional, but I figured it's worth pointing out just in case.

> Input samplesheet format provided on the Documentation does not work for me

Are you finding that the exact samplesheet given in the documentation isn't working or that you hadn't been able to use it as a template successfully to create your own?

bounlu · 2024-07-16T02:34:27Z

Thanks a lot for the quick reply. I intentionally changed the sample IDs and names to anonymise the information, hence the naming irregularities you observed.

I think what I needed is this:

> The samplesheet isn't considered valid as there are multiple tumor DNA samples given for a single analysis group, which is determined by values in the group_id column and in this case is my_project.

This explains the error I got; I will try assigning a unique group_id to each sample.

Thanks for the help.

scwatts · 2024-07-16T05:54:38Z

No worries. To complete the explanation regarding grouping: in other cases you may want multiple samples to be part of the same analysis group, e.g. a WGS tumor/normal pair must be provided under the same group_id value, otherwise they'd be treated separately as individual tumor-only and normal-only samples.
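The effect of group_id on pairing can be restated as a few lines of Python. This is an illustrative sketch only, not oncoanalyser's internal logic: classify_groups is a hypothetical helper, the PATIENT1 rows and BAM paths are made up, and the info column is omitted because the sketch never reads it. It labels each group by the DNA sample types it contains, so the same tumor and normal rows come out as one paired analysis or as two single-sample analyses depending only on their group_id values.

```python
import csv
import io
from collections import defaultdict


def classify_groups(samplesheet_text):
    """Label each group_id by the DNA sample types it contains.

    Illustrative only: this is not oncoanalyser's internal code, it just restates how
    grouping by group_id decides between paired and single-sample analyses.
    """
    present = defaultdict(set)
    for row in csv.DictReader(io.StringIO(samplesheet_text)):
        if row["sequence_type"] == "dna":
            present[row["group_id"]].add(row["sample_type"])
    labels = {}
    for group, types in present.items():
        if types == {"tumor", "normal"}:
            labels[group] = "tumor/normal"
        elif types == {"tumor"}:
            labels[group] = "tumor-only"
        elif types == {"normal"}:
            labels[group] = "normal-only"
        else:
            labels[group] = "other"
    return labels


HEADER = "group_id,subject_id,sample_id,sample_type,sequence_type,filetype,filepath\n"

# Hypothetical WGS pair sharing one group_id
paired = HEADER + (
    "PATIENT1,PATIENT1,PATIENT1_T,tumor,dna,bam,/data/PATIENT1_T.bam\n"
    "PATIENT1,PATIENT1,PATIENT1_N,normal,dna,bam,/data/PATIENT1_N.bam\n"
)
# The same pair split across two group_id values
split = HEADER + (
    "PATIENT1_T,PATIENT1,PATIENT1_T,tumor,dna,bam,/data/PATIENT1_T.bam\n"
    "PATIENT1_N,PATIENT1,PATIENT1_N,normal,dna,bam,/data/PATIENT1_N.bam\n"
)

print(classify_groups(paired))  # {'PATIENT1': 'tumor/normal'}
print(classify_groups(split))   # {'PATIENT1_T': 'tumor-only', 'PATIENT1_N': 'normal-only'}
```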

scwatts · 2024-08-01T05:13:42Z

Closing this as resolved; please reopen if needed.

davidmasp · 2024-10-03T08:24:04Z

Hi Stephen, this is very useful! I also stumbled into this issue. Thanks for the clear explanation.

I have a question about this:

> in other cases you may want multiple samples to be part of the same analysis group e.g. a WGS tumor/normal pair must be provided under the same group_id value otherwise they'd be treated separately as individual tumor-only and normal-only samples.

Does this mean that, in the case of Normal1 - T1 and Normal1 - T2, the FASTQs from the normal sample will be processed twice, or does the pipeline account for the fact that the files have the same path, etc.? Hope this makes sense.

Thanks again

scwatts · 2024-10-17T22:01:42Z

Hi @davidmasp, in the example you gave the normal sample will be processed twice.

I plan to improve this aspect, and the starting point would be alignment and post-processing, since these are the most resource-intensive steps. You can effectively do this manually, though it is a bit of work: first run only alignment and MarkDups with each normal listed once in the samplesheet, then run oncoanalyser with the generated BAMs as you had intended.
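As a first step toward the manual workaround, it helps to know which normals are actually reused. The sketch below is hypothetical (shared_normal_inputs is not an oncoanalyser feature, and the P1/T1/T2/Normal1 rows are invented to match the question): it maps each normal DNA filepath to the group_id values that reference it and keeps only paths shared by more than one group. Those are the normals that would otherwise be processed once per group and so are candidates for a single alignment run.

```python
import csv
import io
from collections import defaultdict


def shared_normal_inputs(samplesheet_text):
    """Map each normal DNA filepath to the groups that reference it, if more than one.

    Hypothetical helper, not an oncoanalyser feature: it flags normals that the
    pipeline would process once per group, i.e. candidates for a one-off alignment.
    """
    groups_by_path = defaultdict(set)
    for row in csv.DictReader(io.StringIO(samplesheet_text)):
        if row["sample_type"] == "normal" and row["sequence_type"] == "dna":
            groups_by_path[row["filepath"]].add(row["group_id"])
    return {path: sorted(groups) for path, groups in groups_by_path.items() if len(groups) > 1}


HEADER = "group_id,subject_id,sample_id,sample_type,sequence_type,filetype,info,filepath\n"
N1 = "/data/Normal1_1.fq.gz;/data/Normal1_2.fq.gz"

# Hypothetical Normal1 - T1 and Normal1 - T2 layout from the question above
samplesheet = HEADER + (
    "P1_T1,P1,T1,tumor,dna,fastq,library_id:1;lane:L01,/data/T1_1.fq.gz;/data/T1_2.fq.gz\n"
    f"P1_T1,P1,Normal1,normal,dna,fastq,library_id:3;lane:L01,{N1}\n"
    "P1_T2,P1,T2,tumor,dna,fastq,library_id:2;lane:L01,/data/T2_1.fq.gz;/data/T2_2.fq.gz\n"
    f"P1_T2,P1,Normal1,normal,dna,fastq,library_id:3;lane:L01,{N1}\n"
)

print(shared_normal_inputs(samplesheet))
# {'/data/Normal1_1.fq.gz;/data/Normal1_2.fq.gz': ['P1_T1', 'P1_T2']}
```

Keying on filepath follows the question's framing; keying on sample_id instead would also catch a normal whose FASTQs are split across several lane rows.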

bounlu added the bug Something isn't working label Jul 15, 2024

scwatts closed this as completed Aug 1, 2024
