MERGE_CFF not performed for certain samples #135

Open
anoronh4 opened this issue Jan 29, 2025 · 0 comments
Labels
bug Something isn't working

Comments

@anoronh4
Collaborator

Description of the bug

Currently MERGE_CFF does not launch for certain samples, even though the upstream processes STARFUSION_TO_CFF, ARRIBA_TO_CFF and FUSIONCATCHER_TO_CFF have all completed for those samples. The issue seems to occur here:

MERGE_CFF(
    ARRIBA_TO_CFF.out.cff
        .map{ meta, file -> [meta, file]}
        .mix(
            FUSIONCATCHER_TO_CFF.out.cff
                .map{ meta, file -> [meta, file]}
        ).mix(
            STARFUSION_TO_CFF.out.cff
                .map{ meta, file -> [meta, file]}
        ).groupTuple(by:[0],size:numcallers),
)

The mix operators work correctly, but the groupTuple operation does not emit the expected result every time. Confusingly, resumed runs end up with a different set of samples running MERGE_CFF, while other samples never launch the task.

After some troubleshooting I found that the error originates from inconsistent values in the meta map, specifically read_group and fastq_pair_id. Since the meta map is used as the key in the groupTuple operation, it needs to be identical across all callers' outputs for a given sample. The values are inconsistent because of how the reads are grouped in another subworkflow:

grouped_reads = ungrouped_reads
    .map{ meta, reads ->
        def read_group = meta.read_group
        def fastq_pair_id = meta.fastq_pair_id
        def meta_clone = meta.clone().findAll { !["read_group","fastq_pair_id"].contains(it.key) }
        meta_clone.id = meta.sample
        [groupKey(meta_clone,meta.fq_num), reads, read_group, fastq_pair_id]
    }.groupTuple(by:[0])
    .map{ meta, reads, read_group, fastq_pair_id ->
        meta = meta + [read_group:read_group.join(','), fastq_pair_id:fastq_pair_id.join(',')]
        [meta, reads.flatten()]
    }

Here read_group and fastq_pair_id in the meta map are replaced after grouping with strings that concatenate the individual read group and fastq pair ID values. The values are sometimes not concatenated in the same order, so the resulting meta maps differ. Calling sort() on each list before join(',') is expected to fix the problem.
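A minimal sketch of that fix, assuming only the final .map of the grouping snippet above needs to change. It has not been run against the pipeline. sort(false) is the Groovy collection method that returns a sorted copy rather than sorting in place, and merged_meta is a hypothetical local name introduced here for clarity.

```groovy
// Replacement for the final .map in the grouped_reads chain (sketch, untested).
.map{ meta, reads, read_group, fastq_pair_id ->
    // Sort each list before joining so the concatenated string no longer
    // depends on the order in which groupTuple collected the items.
    def merged_meta = meta + [
        read_group: read_group.sort(false).join(','),
        fastq_pair_id: fastq_pair_id.sort(false).join(',')
    ]
    [merged_meta, reads.flatten()]
}
```

Note that the two lists are sorted independently here, so the n-th read group in the joined string is not guaranteed to correspond to the n-th fastq pair ID or to the order of the files in reads. If downstream steps rely on that positional correspondence, the entries would have to be sorted together instead.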

Command used and terminal output

Relevant files

No response

System information

No response

anoronh4 added the bug label Jan 29, 2025