samplesheet format is not recognised #65

Closed
bounlu opened this issue Jul 15, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@bounlu

bounlu commented Jul 15, 2024

Description of the bug

The input samplesheet format given in the documentation does not work for me. I tried deleting subject_id and sample_name individually and it still failed, but it worked when I deleted both for all samples.

Command used and terminal output

$ ./run_oncoanalyser.sh

 N E X T F L O W   ~  version 24.06.0-edge

Pulling nf-core/oncoanalyser ...
 Already-up-to-date
Launching `https://github.com/nf-core/oncoanalyser` [big_church] DSL2 - revision: 2f86f87702 [dev]

ERROR ~ missing 'library_id' info field for compass TUMOR/DNA

 -- Check '.nextflow.log' file for details
$ ./run_oncoanalyser.sh

 N E X T F L O W   ~  version 24.06.0-edge

Pulling nf-core/oncoanalyser ...
 Already-up-to-date
Launching `https://github.com/nf-core/oncoanalyser` [curious_maxwell] DSL2 - revision: 2f86f87702 [dev]

ERROR ~ got unexpected subject name for compass 220123565: 220324109

 -- Check '.nextflow.log' file for details
$ ./run_oncoanalyser.sh

 N E X T F L O W   ~  version 24.06.0-edge

Pulling nf-core/oncoanalyser ...
 Already-up-to-date
Launching `https://github.com/nf-core/oncoanalyser` [stoic_leakey] DSL2 - revision: 2f86f87702 [dev]

ERROR ~ got unexpected sample name for compass TUMOR/DNA: 220324109_3_umi

 -- Check '.nextflow.log' file for details
$ ./run_oncoanalyser.sh

 N E X T F L O W   ~  version 24.06.0-edge

Pulling nf-core/oncoanalyser ...
 Already-up-to-date
Launching `https://github.com/nf-core/oncoanalyser` [jolly_archimedes] DSL2 - revision: 2f86f87702 [dev]

ERROR ~ got unexpected sample name for compass TUMOR/DNA: 220324109

 -- Check '.nextflow.log' file for details
$ ./run_oncoanalyser.sh

 N E X T F L O W   ~  version 24.06.0-edge

Pulling nf-core/oncoanalyser ...
 Already-up-to-date

Relevant files

group_id,subject_id,sample_id,sample_type,sequence_type,filetype,info,filepath
my_project,220836103,220836103_3_umi,tumor,dna,fastq,library_id:12;lane:L02,/data/220836103_1.fq.gz;/data/220836103_2.fq.gz
my_project,220324109,220024109_3_umi,tumor,dna,fastq,library_id:13;lane:L02,/data/220024109_1.fq.gz;/data/220024109_2.fq.gz
my_project,220324466,220024466_3_umi,tumor,dna,fastq,library_id:14;lane:L02,/data/220024466_1.fq.gz;/data/220024466_2.fq.gz
my_project,220325489,220024489_2_umi,tumor,dna,fastq,library_id:15;lane:L02,/data/220024489_1.fq.gz;/data/220024489_2.fq.gz
my_project,220326755,220024755_2_umi,tumor,dna,fastq,library_id:16;lane:L02,/data/220024755_1.fq.gz;/data/220024755_2.fq.gz
my_project,220327052,220025052_2_umi,tumor,dna,fastq,library_id:17;lane:L02,/data/220025052_1.fq.gz;/data/220025052_2.fq.gz
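
For reference, the info and filepath columns in the rows above each pack several values into one field, separated by semicolons. The following is a minimal Python sketch of how such fields could be unpacked and checked. The helper names parse_info and parse_fastq_paths are hypothetical, and treating both library_id and lane as required is an assumption (only library_id is named in the error output above). It is not oncoanalyser's own parsing code.

```python
from __future__ import annotations


def parse_info(info: str) -> dict[str, str]:
    """Split an info string such as 'library_id:12;lane:L02' into a dict."""
    fields: dict[str, str] = {}
    for item in info.split(";"):
        if not item:
            continue  # tolerate a trailing or doubled ';'
        key, _, value = item.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def parse_fastq_paths(filepath: str) -> list[str]:
    """Split a ';'-separated filepath value into individual paths."""
    return [path for path in filepath.split(";") if path]


# Values copied from the first data row of the samplesheet above.
row_info = "library_id:12;lane:L02"
row_filepath = "/data/220836103_1.fq.gz;/data/220836103_2.fq.gz"

info = parse_info(row_info)
# Assumption: both keys are expected for FASTQ entries.
missing = [key for key in ("library_id", "lane") if key not in info]
fastqs = parse_fastq_paths(row_filepath)

print(info)
print("missing info fields:", missing)
print(fastqs)
```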

System information

Nextflow version: 24.06.0-edge
Hardware: Server
Executor: local
Container engine: Docker
OS: Linux
Version of nf-core/oncoanalyser: dev

bounlu added the bug label on Jul 15, 2024
@scwatts
Collaborator

scwatts commented Jul 15, 2024

Thanks for the report @bounlu. Can you confirm that the samplesheet is definitely being provided to oncoanalyser in your execution script as --input /path/to/samplesheet.csv?

If you can provide the exact full command used to invoke oncoanalyser (e.g. nextflow run nf-core/oncoanalyser ...) along with the referenced inputs (samplesheet, config), I'll be able to help further.

@bounlu
Author

bounlu commented Jul 16, 2024

Yes, the samplesheet is being provided to oncoanalyser: every time I change something in the samplesheet, I get a different error.

Here is the full command:

#!/bin/bash

nextflow run nf-core/oncoanalyser \
-latest \
-profile docker \
--mode 'targeted' \
--genome 'GRCh38_hmf' \
--panel 'tso500' \
--input '/home/github/nf-core/samplesheet_oncoanalyser.csv' \
--outdir '/data/nextflow/oncoanalyser/my_project/results/' \
-work-dir '/data/nextflow/oncoanalyser/my_project/work/' \
-c '/home/github/nf-core/custom_local.config' \
-r dev \
-resume

I already provided the samplesheet above, and the custom_local.config file has no issues, as I use the same file in all cases.

@scwatts
Collaborator

scwatts commented Jul 16, 2024

Thanks for the extra info. I noticed that the error messages above reference compass entries that aren't present in the samplesheet you provided. Putting that aside, I've now tested your samplesheet and can see what is going wrong.

The samplesheet isn't considered valid as there are multiple tumor DNA samples given for a single analysis group, which is determined by values in the group_id column and in this case is my_project.
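
The rule described above can be checked before launching the pipeline. Below is a minimal Python sketch, not oncoanalyser's own validation logic: it counts distinct tumor DNA sample_id values per group_id and reports any group with more than one. The function name find_overloaded_groups is hypothetical, and the embedded samplesheet is trimmed to two rows from the report.

```python
from __future__ import annotations

import csv
import io
from collections import defaultdict

# Two rows taken from the samplesheet in the original report.
SAMPLESHEET = """\
group_id,subject_id,sample_id,sample_type,sequence_type,filetype,info,filepath
my_project,220836103,220836103_3_umi,tumor,dna,fastq,library_id:12;lane:L02,/data/220836103_1.fq.gz;/data/220836103_2.fq.gz
my_project,220324109,220024109_3_umi,tumor,dna,fastq,library_id:13;lane:L02,/data/220024109_1.fq.gz;/data/220024109_2.fq.gz
"""


def find_overloaded_groups(handle) -> dict[str, list[str]]:
    """Return groups that contain more than one distinct tumor DNA sample."""
    tumor_dna = defaultdict(set)
    for row in csv.DictReader(handle):
        if (row["sample_type"], row["sequence_type"]) == ("tumor", "dna"):
            # Use a set so several rows for the same sample_id count once.
            tumor_dna[row["group_id"]].add(row["sample_id"])
    return {
        group_id: sorted(sample_ids)
        for group_id, sample_ids in tumor_dna.items()
        if len(sample_ids) > 1
    }


problems = find_overloaded_groups(io.StringIO(SAMPLESHEET))
for group_id, sample_ids in problems.items():
    print(f"{group_id}: {len(sample_ids)} tumor DNA samples ({', '.join(sample_ids)})")
```

To check a real file, the same function could be called with an open file handle instead of the in-memory string.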

Since all of your tumor DNA samples are singletons, you can fix your samplesheet by setting a unique group_id value for each, e.g.:

group_id,subject_id,sample_id,sample_type,sequence_type,filetype,info,filepath
220123565,220123565,220836103_3_umi,tumor,dna,fastq,library_id:12;lane:L02,/data/220836103_1.fq.gz;/data/220836103_2.fq.gz
220324109,220324109,220024109_3_umi,tumor,dna,fastq,library_id:13;lane:L02,/data/220024109_1.fq.gz;/data/220024109_2.fq.gz
220324466,220324466,220024466_3_umi,tumor,dna,fastq,library_id:14;lane:L02,/data/220024466_1.fq.gz;/data/220024466_2.fq.gz
220325489,220325489,220024489_2_umi,tumor,dna,fastq,library_id:15;lane:L02,/data/220024489_1.fq.gz;/data/220024489_2.fq.gz
220326755,220326755,220024755_2_umi,tumor,dna,fastq,library_id:16;lane:L02,/data/220024755_1.fq.gz;/data/220024755_2.fq.gz
220327052,220327052,220025052_2_umi,tumor,dna,fastq,library_id:17;lane:L02,/data/220025052_1.fq.gz;/data/220025052_2.fq.gz
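
For a larger samplesheet, the same change could be scripted. This is a minimal sketch, assuming every row is a singleton tumor sample with its own subject_id (as in this thread), so that copying subject_id into group_id gives one group per sample. The function name assign_unique_groups is hypothetical. It would not be appropriate for a samplesheet in which several samples are meant to be analysed together.

```python
import csv
import io

# Two rows taken from the samplesheet in the original report.
EXAMPLE = """\
group_id,subject_id,sample_id,sample_type,sequence_type,filetype,info,filepath
my_project,220836103,220836103_3_umi,tumor,dna,fastq,library_id:12;lane:L02,/data/220836103_1.fq.gz;/data/220836103_2.fq.gz
my_project,220324109,220024109_3_umi,tumor,dna,fastq,library_id:13;lane:L02,/data/220024109_1.fq.gz;/data/220024109_2.fq.gz
"""


def assign_unique_groups(src, dst) -> None:
    """Copy a samplesheet, replacing each group_id with that row's subject_id."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["group_id"] = row["subject_id"]
        writer.writerow(row)


fixed = io.StringIO()
assign_unique_groups(io.StringIO(EXAMPLE), fixed)
print(fixed.getvalue(), end="")
```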

I also see that the entry with subject_id of 220123565 has a different sample_id pattern compared to the others, not sure whether this is intentional but figure it's worth pointing out just in case.

> Input samplesheet format provided on the Documentation does not work for me

Are you finding that the exact samplesheet given in the documentation isn't working or that you hadn't been able to use it as a template successfully to create your own?

@bounlu
Author

bounlu commented Jul 16, 2024

Thanks a lot for the quick reply. I intentionally changed the sample IDs and names to obfuscate the information, hence the naming irregularities you observed.

I think what I needed is this:

> The samplesheet isn't considered valid as there are multiple tumor DNA samples given for a single analysis group, which is determined by values in the group_id column and in this case is my_project.

This explains the error I got. I will try assigning a unique group_id to each sample.

Thanks for the help.

@scwatts
Collaborator

scwatts commented Jul 16, 2024

No worries. To complete the explanation regarding grouping: in other cases you may want multiple samples to be part of the same analysis group. For example, a WGS tumor/normal pair must be provided under the same group_id value; otherwise the two samples would be treated separately as individual tumor-only and normal-only samples.
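
To illustrate the grouping behaviour described above, here is a minimal Python sketch that labels each group_id by the DNA sample types it contains. The group, subject, and sample IDs, the file paths, and the function name classify_groups are all hypothetical, and the sketch only mirrors the description in this comment, not the pipeline's implementation.

```python
from __future__ import annotations

import csv
import io
from collections import defaultdict

# Hypothetical samplesheet: one tumor/normal pair sharing a group_id,
# plus a tumor and a normal that were given different group_id values.
SAMPLESHEET = """\
group_id,subject_id,sample_id,sample_type,sequence_type,filetype,info,filepath
pair_a,subject_a,subject_a_T,tumor,dna,fastq,library_id:1;lane:L01,/data/a_T_1.fq.gz;/data/a_T_2.fq.gz
pair_a,subject_a,subject_a_N,normal,dna,fastq,library_id:2;lane:L01,/data/a_N_1.fq.gz;/data/a_N_2.fq.gz
solo_t,subject_b,subject_b_T,tumor,dna,fastq,library_id:3;lane:L01,/data/b_T_1.fq.gz;/data/b_T_2.fq.gz
solo_n,subject_b,subject_b_N,normal,dna,fastq,library_id:4;lane:L01,/data/b_N_1.fq.gz;/data/b_N_2.fq.gz
"""


def classify_groups(handle) -> dict[str, str]:
    """Label each group by which DNA sample types appear under its group_id."""
    types_by_group = defaultdict(set)
    for row in csv.DictReader(handle):
        if row["sequence_type"] == "dna":
            types_by_group[row["group_id"]].add(row["sample_type"])

    labels = {}
    for group_id, sample_types in types_by_group.items():
        if {"tumor", "normal"} <= sample_types:
            labels[group_id] = "tumor/normal"
        elif "tumor" in sample_types:
            labels[group_id] = "tumor-only"
        elif "normal" in sample_types:
            labels[group_id] = "normal-only"
        else:
            labels[group_id] = "other"
    return labels


labels = classify_groups(io.StringIO(SAMPLESHEET))
for group_id, label in labels.items():
    print(f"{group_id}: {label}")
```

In this hypothetical sheet, subject_b's tumor and normal end up in separate groups, which is the situation the comment warns about.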

@scwatts
Collaborator

scwatts commented Aug 1, 2024

Closing this as resolved. Please reopen if needed.

scwatts closed this as completed on Aug 1, 2024
@davidmasp

Hi Stephen, this is very useful! I also stumbled into this issue. Thanks for the clear explanation.

I have a question about this:

> in other cases you may want multiple samples to be part of the same analysis group e.g. a WGS tumor/normal pair must be provided under the same group_id value otherwise they'd be treated separately as individual tumor-only and normal-only samples.

Does this mean that for the pairs Normal1/T1 and Normal1/T2, the FASTQs from the normal sample will be processed twice, or does the pipeline account for the fact that the files have the same path, etc.? I hope this makes sense.

Thanks again

@scwatts
Collaborator

scwatts commented Oct 17, 2024

Hi @davidmasp, in the example you gave the normal sample will be processed twice.

I plan to improve this, starting with alignment and post-processing, since these are the most resource-intensive steps. You can effectively do this manually, though it is a bit of work: run only alignment and MarkDups with a samplesheet that lists each normal once, then run oncoanalyser on the generated BAMs with the grouping you had intended.
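
Before splitting the work manually, it may help to see which input files are listed under more than one group and would therefore be processed once per group. The following is a minimal Python sketch; the IDs, paths, and the function name shared_inputs are hypothetical, modelled on the Normal1/T1 and Normal1/T2 case from the question.

```python
from __future__ import annotations

import csv
import io
from collections import defaultdict

# Hypothetical samplesheet: the same normal is paired with two tumors
# by listing it under two different group_id values.
SAMPLESHEET = """\
group_id,subject_id,sample_id,sample_type,sequence_type,filetype,info,filepath
g1,subj1,T1,tumor,dna,fastq,library_id:1;lane:L01,/data/T1_1.fq.gz;/data/T1_2.fq.gz
g1,subj1,Normal1,normal,dna,fastq,library_id:3;lane:L01,/data/Normal1_1.fq.gz;/data/Normal1_2.fq.gz
g2,subj1,T2,tumor,dna,fastq,library_id:2;lane:L01,/data/T2_1.fq.gz;/data/T2_2.fq.gz
g2,subj1,Normal1,normal,dna,fastq,library_id:3;lane:L01,/data/Normal1_1.fq.gz;/data/Normal1_2.fq.gz
"""


def shared_inputs(handle) -> dict[str, list[str]]:
    """Map each input path to its groups, keeping paths used by several groups."""
    groups_by_path = defaultdict(set)
    for row in csv.DictReader(handle):
        for path in row["filepath"].split(";"):
            if path:
                groups_by_path[path].add(row["group_id"])
    return {
        path: sorted(groups)
        for path, groups in groups_by_path.items()
        if len(groups) > 1
    }


shared = shared_inputs(io.StringIO(SAMPLESHEET))
for path, groups in shared.items():
    print(f"{path} appears in groups: {', '.join(groups)}")
```

The paths reported here are the candidates for the one-off alignment and MarkDups pass described above.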
