add subworkflow for functional enrichment analysis #7254

suzannejin · 2024-12-20T10:57:51Z

PR checklist

Closes #XXX

…ow locally implemented in differentialabundance. The original code was the result of the effort from multiple collaborators:
Co-authored-by: Björn Langer
Co-authored-by: Cristina Araiz
Co-authored-by: Thomas Tams
Co-authored-by: Breeshey Roskams-Hieter

suzannejin · 2025-01-20T11:52:40Z

@pinin4fjords @grst @nschcolnicov @mirpedrol @JoseEspinosa Hello! Here is the functional analysis subworkflow! Let me know what you think, especially in terms of the optional inputs/outputs that would be needed, given that different functional analysis tools might need different inputs and produce different outputs.

On the other hand, the gprofiler2 test always fails in the CI here when using conda, as it produces PNG files with a different md5. However, when I run it in Gitpod with conda, it always passes. Any idea why? @pinin4fjords @nschcolnicov

pinin4fjords · 2025-01-20T13:05:46Z

Thanks @suzannejin ! I'll have a look closely soon.

PNGs are not snapshottable: as you've noted, they change across systems. The best you can do with those is check the file name.
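For reference, a minimal sketch of what that can look like in an nf-test `then` block. It assumes (hypothetically) that the plots are emitted on a channel named `plot_png` as `[ meta, png ]` tuples; only the file names go into the snapshot, so the test still fails if a plot goes missing or is renamed, but not when its pixels differ.

```groovy
// Hypothetical nf-test `then` block; `plot_png` is an assumed emit name
// carrying [ meta, png ] tuples, not necessarily the subworkflow's real one.
then {
    assertAll(
        { assert workflow.success },
        { assert snapshot(
            // A rendered PNG's md5 can differ between systems (e.g. conda on
            // CI vs. Gitpod), so snapshot only the file name of each plot.
            workflow.out.plot_png.collect { file(it[1]).name }
        ).match() }
    )
}
```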

mirpedrol · 2025-01-20T15:34:34Z

subworkflows/nf-core/differential_functional_enrichment/main.nf

+    // here we define the input channel for the GSEA section
+
+    def criteria = multiMapCriteria { meta_input, input, meta_exp, samplesheet, featuresheet, features_id, features_symbol, meta_contrasts, variable, reference, target ->
+        def analysis_method = meta_input.method


Is this needed if we merge meta_input, which already contains the method key?

You are right! This was left over from a previous implementation that needed it explicitly, but I can remove it now.

mirpedrol · 2025-01-20T15:44:09Z

subworkflows/nf-core/differential_functional_enrichment/main.nf

+        ch_gene_sets.collect(),
+        ch_background.collect()


Is using collect() correct? Won't it put everything (meta + files) into one single list?

The reason for using .collect() is that when ch_input has more than one element but we only provide one ch_gene_sets element, the module should still run as many times as there are ch_input elements.

I assume here that we are only going to provide a single gene_sets file; this is likely the behavior for gprofiler2 and grea. For GSEA, the differentialabundance pipeline can take more than one gene_sets file. Is there any reason for this difference? And how would you deal with this? @pinin4fjords

I have just replaced collect() by combining the input with gene_sets beforehand. This should work for either scenario :)
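To illustrate the difference discussed above, a minimal standalone sketch with hypothetical contrast IDs and file names (plain strings stand in for files). `collect()` returns a value channel, so it is reused for every `ch_input` item, but with its default `flat: true` it also merges all `[ meta, gmt ]` tuples into one list; `combine()` instead pairs each input with each gene set file while keeping every tuple's own meta map.

```groovy
// Minimal sketch; IDs and file names are hypothetical placeholders.
workflow {
    ch_input     = Channel.of(
        [ [id:'A_vs_B'], 'A_vs_B.de.tsv' ],
        [ [id:'C_vs_D'], 'C_vs_D.de.tsv' ]
    )
    ch_gene_sets = Channel.of(
        [ [id:'hallmark'], 'h.all.gmt' ],
        [ [id:'kegg'],     'kegg.gmt' ]
    )

    // collect(): a single value item; flat: true (the default) flattens
    // the tuples, so meta maps and files end up in one list.
    ch_gene_sets
        .collect()
        .view { "collected: $it" }

    // combine(): Cartesian product, i.e. one item per (input, gene set)
    // pair, each keeping its own meta map.
    ch_input
        .combine(ch_gene_sets)
        .map { meta_input, input, meta_gmt, gmt ->
            [ meta_input + [gene_sets: meta_gmt.id], input, gmt ]
        }
        .view { "combined: $it" }
}
```

With two inputs and two gene set files the `combine()` branch emits four items, so a downstream process runs once per pair; the same code also works unchanged with a single gene set file.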

subworkflows/nf-core/differential_functional_enrichment/tests/all.config

grst · 2025-01-23T14:04:36Z

subworkflows/nf-core/differential_functional_enrichment/main.nf

+    ch_input                            // [ meta_input, input file, method to run ]
+
+    // gene sets and background
+    ch_gene_sets                        // [ meta_gmt, gmt file ]


Would this necessarily need to be a GMT file?
E.g. decoupler uses weighted gene sets for PROGENy and collecTRI analyses that are typically provided as a long-form data frame. If we included deconvolution as a functional analysis tool, it would typically use a signature matrix.

It really depends a bit on the scope of this subworkflow. But if the plan is to support a wide range of functional analysis tools as suggested in nf-core/differentialabundance#367, it would be good to keep this generic.

Agreed, but I'm always wary of premature over-engineering. I don't object to gmt in the first instance.

Basically, my suggestion comes down to making the gene sets + background method-specific. The current workflow logic already doesn't care whether it's a GMT file or not; that is only specified in the comment.

Do you think it is possible to standardize the gene set input format for all methods, and then have each module convert it into the specific format needed by its method?

Otherwise, I imagine it would become confusing from the pipeline user's perspective to have to provide the input gene sets in a certain format depending on the chosen method, etc.

The "long-form dataframe" format used by decoupler is quite universal. It can cover signature matrices, weighted and signed gene sets.

This is also the format that can be obtained from omnipathdb via its API. Omnipathdb contains most of the commonly used signatures, such as MSigDB, GO, Dorothea and Progeny. That way, users wouldn't need to obtain the gene sets themselves, but could just specify the name.

I agree that a common format makes sense, but I might still want to couple certain signatures to certain methods and not necessarily run all-vs-all. E.g. with PROGENy signatures, I'd typically use the recommended MLM algorithm in decoupler, while with MSigDB signatures, I'd rather use GSEA.

Not entirely sure yet if deconvolution is in the scope of this subworkflow, but there the methods are often shipped together with a signature matrix, so they wouldn't require any input signature at all.

I see your point. It would be really nice to include omnipathdb; then the coupling between signature and method could be automated based on the name.
For now, we could have a method-specific gene sets channel, just like the input channel.

I'm not thrilled by the prospect of a multiplicity of gene_set inputs per method.

@suzannejin Maybe the gene sets should actually be part of ch_input. That way, there could be method-specific gene set files in that channel, and we wouldn't need multiple input channels
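A minimal sketch of that idea, assuming a hypothetical tuple layout `[ meta_input, input, method, gene_sets, background ]` for `ch_input` (the method kept as a separate element, as in the current `// [ meta_input, input file, method to run ]` comment) and `[]` as the placeholder for an absent optional file. Each method's gene set file travels with its own input, and a single `branch` routes tuples to the tools.

```groovy
// Minimal sketch; tuple layout, IDs and file names are hypothetical.
workflow {
    ch_input = Channel.of(
        [ [id:'A_vs_B'], 'A_vs_B.de.tsv', 'gprofiler2', 'h.all.gmt',   'background.txt' ],
        [ [id:'A_vs_B'], 'A_vs_B.de.tsv', 'gsea',       'c2.kegg.gmt', [] ]
    )

    ch_by_method = ch_input
        // Fold the method into the meta map so outputs from different tools
        // run on the same contrast remain distinguishable downstream.
        .map { meta, input, method, gene_sets, background ->
            [ meta + [method: method], input, gene_sets, background ]
        }
        // One branch per supported tool; tuples matching none are dropped.
        .branch { meta, input, gene_sets, background ->
            gprofiler2: meta.method == 'gprofiler2'
            gsea:       meta.method == 'gsea'
            grea:       meta.method == 'grea'
        }

    ch_by_method.gprofiler2.view { "gprofiler2: $it" }
    ch_by_method.gsea.view { "gsea: $it" }
}
```

Because the gene sets are part of each tuple, adding a method with its own signature format would not require another channel in the subworkflow interface.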

grst · 2025-01-23T14:09:32Z

subworkflows/nf-core/differential_functional_enrichment/main.nf

+    ch_background                       // [ meta_background, background file ]
+
+    // other - for the moment these files are only needed for GSEA
+    ch_contrasts                        // [ meta_contrast, contrast_variable, reference, target ]


(1) When providing results from differential expression analysis, a contrast is not needed.
(2) When providing gene expression as input, a differential test is necessary, which is not unlike DE analysis (it needs a model and a contrast definition).

IMO it would make sense to keep differential testing entirely out of this workflow. In case (1), it's not needed, and in case (2), a sample x signature matrix is produced. This matrix could just be fed into a differential analysis workflow (e.g. limma) again.

Like that we can keep the complexity of this subworkflow low.

I'm aware that the gsea module does support providing a contrast directly when using gene expression data. However, I believe that there's no point in using this mode.

In this mode, GSEA just computes a ranking metric (signal2noise, t-statistic, ...) from these variables anyway (see docs), so we might as well provide a fold change or DE statistic directly, with the advantage that we can give the DE method a full model definition, including covariates.
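For reference, the two per-gene ranking metrics mentioned above are defined in the GSEA documentation essentially as follows (GSEA additionally applies some safeguards to small standard deviations, not shown here), with $A$ and $B$ the two phenotype classes, $\mu$ and $\sigma$ the class mean and standard deviation of the gene's expression, and $n$ the class size. In the preranked mode, a log fold change or DE statistic from a model with covariates would simply take the place of $S$.

$$
S_{\text{signal2noise}} = \frac{\mu_A - \mu_B}{\sigma_A + \sigma_B}
\qquad
S_{\text{t-test}} = \frac{\mu_A - \mu_B}{\sqrt{\sigma_A^2 / n_A + \sigma_B^2 / n_B}}
$$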

I'm aware that the gsea module does support providing a contrast directly when using gene expression data. However, I believe that there's no point in using this mode.

Fair point, but it's what diff. ab. does right now. I'd recognised the possibility of a switch, but hadn't got round to it: nf-core/differentialabundance#36

Fair point, though, that we could model the current matrix-driven GSEA as part of the differential subworkflow, alongside LIMMA et al.

I wanted the current subworkflow to be able to reproduce exactly how the modules are used in the DA pipeline, hence GSEA takes gene expression data instead of DE results. Not sure why this mode was chosen for the pipeline, maybe @pinin4fjords can step in here?

But yes, I do agree that it would be conceptually cleaner if the subworkflow just took DE output as input, including for GSEA.

I'd be in favor of taking this chance to streamline the workflow, now that we are changing quite a few things anyway. We can help with implementing a module for preranked GSEA if required¹.

(alternatively, the decoupler module is kind of ready, and it comes with a very fast GSEA implementation -- it should generate the same results in terms of scores and p-values, but it doesn't produce all outputs such as the "leading edge" plots).

Footnotes

¹ In a few weeks. We are waiting for the contract extension with our external developers to be signed.

Not to be a pain, but I do like those leading edge plots; they are useful.

I'm not against a preranked GSEA module. But what I really would like to get rid of is the contrast specification in this subworkflow.

I would actually like to start integrating the subworkflows into the current DA pipeline, hopefully next week.
If the switch to preranked GSEA will take some time, maybe we could first agree on the current subworkflow version? I think it could serve as a nice starting point, and you are welcome to build on it afterwards.

pinin4fjords

A few minor comments; it's going the right way.

IMO having GSEA in here is fine. I can imagine other methods in here might use these inputs in future; they can be optional, and 'pre-ranked' methods can ignore them.

I do think the gene_sets should be part of the iterable input if they're going to be method-specific, to save us adding method-wise gene set channels to the interface.

pinin4fjords · 2025-01-24T13:51:51Z

subworkflows/nf-core/differential_functional_enrichment/meta.yml

+        - meta_input:
+            type: map
+            description: Metadata map
+        - input:


Can you expand on what is meant by 'input' here please?

pinin4fjords · 2025-01-24T13:58:20Z

subworkflows/nf-core/differential_functional_enrichment/main.nf

+    ch_input                            // [ meta_input, input file, method to run ]
+
+    // gene sets and background
+    ch_gene_sets                        // [ meta_gmt, gmt file ]


I'm not thrilled by the prospect of a multiplicity of gene_set inputs per method.

@suzannejin Maybe the gene sets should actually be part of ch_input. That way, there could be method-specific gene set files in that channel, and we wouldn't need multiple input channels

pinin4fjords · 2025-01-24T14:00:04Z

subworkflows/nf-core/differential_functional_enrichment/main.nf

+
+    // gsea-specific outputs
+    gsea_report           = GSEA_GSEA.out.report_tsvs_ref
+                                .join(GSEA_GSEA.out.report_tsvs_target)


strange indent

pinin4fjords · 2025-01-24T14:00:27Z

subworkflows/nf-core/differential_functional_enrichment/main.nf

+
+    // tool versions
+    versions              = ch_versions
+                                .mix(GPROFILER2_GOST.out.versions)


strange indents

pinin4fjords · 2025-01-24T14:00:44Z

subworkflows/nf-core/differential_functional_enrichment/main.nf

+                                .mix(CUSTOM_TABULARTOGSEACLS.out.versions)
+                                .mix(CUSTOM_TABULARTOGSEACHIP.out.versions)
+                                .mix(GSEA_GSEA.out.versions)
+                                .mix(PROPR_GREA.out.versions)


Why are there no emissions?

suzannejin and others added 19 commits December 11, 2024 16:47

[functional_enrichment] create a first template

4ea1b92

Merge branch 'nf-core:master' into functional_analysis

a434377

[functional_analysis] simplify code for grea and gprofiler2

318898d

[functional_analysis] add basic test for deseq2+gprofiler2

e8a1f7f

[functional_analysis] pass test for deseq2 + gprofiler

b3ce956

[functional_analysis] add test for limma-voom + gprofiler2

c0833c3

Merge branch 'nf-core:master' into functional_analysis

aa95ebe

Merge branch 'nf-core:master' into functional_analysis

88ff6c8

[functional_analysis] add optional inputs and set them to null. Update snapshot

9837349

[functional_analysis] count elements in channel as condition to run a method or not

c9b8a88

[functional_analysis] updated the code to handle gsea stuff. tested gprofiler2 that works when empty channels are given as Channel.of([], []), but stop working when empty channels are given as null, as combine/join methods cannot work on null

643b7f7

[functional_analysis] deseq2+gprofiler2 works

e86ece4

[functional_analysis] added test for gsea, but still need to make it work. Currently it is not running through GSEA modules -> check input channel

7030f37

[functional_analysis] last changes, need to solve bugs

a7b7bd9

[functional_analysis] GSEA works now. Added snapshot. Still need to clean code

a36f2fd

[functional_analysis] add comments

8d9c31a

[functional_analysis] clean the code related to empty optional inputs

6c3df56

[functional_analysis] add test for limmavoom+gsea

e8f105c

suzannejin mentioned this pull request Dec 20, 2024

Create a subworkflow for functional analysis methods nf-core/differentialabundance#384

Open

4 tasks

suzannejin added 10 commits December 20, 2024 11:31

[functional_analysis] fill meta.yml

e3eb9df

correct errata

99f81ba

Merge branch 'master' into functional_analysis

294b1de

Merge branch 'master' into functional_analysis

e175066

update meta.yml

77ed5ea

update tests and snapshots

64ac34f

Merge branch 'master' into functional_analysis

01b1791

remove weird addition of module

df8945f

fix the tests that runs gprofiler2. Checked that it produces same results as the module itself too.

954fde6

updated gsea test. Need to check why output files are empty though

b187583

suzannejin added 12 commits January 16, 2025 14:17

update snapshot

cac0b07

update gprofiler2_gost meta and test

2b78395

update gsea_gsea meta and test

709abba

add view to check output

aeedb60

modify the tests to have inputs without 'method' in meta

4b3013b

replace meta.remove by meta - [...]

65ea9e5

update test snapshots

5454a7d

update meta

f536a4e

Merge branch 'master' into functional_analysis

05edb99

update test snapshots as custom modules were updated in nf-core/modules

85d60f5

Merge branch 'master' into functional_analysis

5888db2

remove view

3003d04

mirpedrol reviewed Jan 21, 2025


suzannejin added 2 commits January 21, 2025 11:21

Merge branch 'master' into functional_analysis

d795904

modify gprofiler2 snapshot to assert unstable png

d88df5d

suzannejin marked this pull request as ready for review January 21, 2025 11:29

suzannejin added 3 commits January 21, 2025 11:32

update test in gprofiler2

3b3a05f

Merge branch 'master' into functional_analysis

1b49463

replace collect

99d5669

grst reviewed Jan 23, 2025


Merge branch 'master' into functional_analysis

c4e4bca

pinin4fjords reviewed Jan 24, 2025


suzannejin added 6 commits January 24, 2025 15:03

Simplify input channels by adding genesets and background to ch_input.

25f94ed

add stub to test and update snapshots.

6a4b1c7

fix indent

dbb6637

fix small bug

4f7cae8

add comments and update meta

6e5e1a6

add stub snapshot

3f10e16
