Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Respect given memory parameter in BBTools processes #36

Closed
mikemc opened this issue Jul 1, 2024 · 10 comments
Closed

Respect given memory parameter in BBTools processes #36

mikemc opened this issue Jul 1, 2024 · 10 comments
Assignees
Labels
bug Something isn't working priority_2

Comments

@mikemc
Copy link
Member

mikemc commented Jul 1, 2024

See #29 for a description and illustration of the issue. The fix involves generating the correct bbtools/java memory parameter string from the workflow's task.memory parameter.

Applies to 'RUN:DEDUP:CLUMPIFY_PAIRED' and other processes that use BBTools scripts

@mikemc mikemc added the bug Something isn't working label Jul 1, 2024
@mikemc mikemc self-assigned this Jul 1, 2024
@willbradshaw
Copy link
Contributor

willbradshaw commented Jul 1, 2024 via email

@mikemc
Copy link
Member Author

mikemc commented Jul 1, 2024

@willbradshaw a few clarification questions that will help me figure this out:

First, I notice sometimes we have params.mem appearing in the process definition, e.g.

process KRAKEN {
    label "Kraken2"
    cpus 16
    memory "${params.mem}"
    input:

can you explain what that is doing and if it is related to task.memory?

Second, to help me determine how to translate the resources.config memory parameters to those needed by Java (e.g. 30g, 1600m), could you clarify where the format used in the resources.config for memory (e.g. "32.GB") comes from or what strings are allowed? (e.g. is "32.5GB" allowed?)

Third, is there an example you can point to where the resources.config memory parameters are being used (if the KRAKEN instance above is not one)?

If they aren't being used at all yet, then perhaps it would make sense to require that they are specified in the java format to avoid needing to translate from "32.5GB" to "32500m" or "32g" ("-XmX32.5g" is not valid)

@mikemc mikemc changed the title Respect given memory parameter in 'RUN:DEDUP:CLUMPIFY_PAIRED' Respect given memory parameter in BBTools processes Jul 1, 2024
@willbradshaw
Copy link
Contributor

can you explain what that is doing and if it is related to task.memory?

params.mem here is an arbitrary value that's being passed to the process when the module is imported (using the addParams statement). I've called it mem in this case but I could call it anything and it would still work.

task.memory is a specific value that's declared via the memory process directive and specifies the amount of memory allocated by Nextflow to the process. In this case, params.mem is being used to allow the value of task.memory to be configured when the module is imported.

Second, to help me determine how to translate the resources.config memory parameters to those needed by Java (e.g. 30g, 1600m), could you clarify where the format used in the resources.config for memory (e.g. "32.GB") comes from or what strings are allowed? (e.g. is "32.5GB" allowed?)

Good question, I had to look this up. I think this is the relevant part of the Nextflow documentation, which states that "You can create a memory unit by adding a unit suffix to an integer, e.g. 1.GB". So I think we can safely assume it will be an integer value, with a 1:1 mapping between unit suffixes in nextflow and those in Java.

Third, is there an example you can point to where the resources.config memory parameters are being used (if the KRAKEN instance above is not one)?

As I understand it, it's being used in every process, including KRAKEN; it specifies the amount of memory that process is allowed to use. The problem in the BBTools processes isn't that the memory parameters aren't being used. Rather, it's that the process is only allocated the amount of memory specified by the relevant memory directive, but is trying to claim more than that and so failing.

@mikemc
Copy link
Member Author

mikemc commented Jul 2, 2024

Thanks @willbradshaw , this is very helpful!

@mikemc
Copy link
Member Author

mikemc commented Jul 2, 2024

I looked to see how https://github.com/nf-core/mag handles this issue and found this example,

    bbnorm.sh \\
        $input \\
        $output \\
        $args \\
        threads=$task.cpus \\
        -Xmx${task.memory.toGiga()}g \\
        &> ${prefix}.bbnorm.log

link

Apparently the Nextflow DSL defines a MemoryUnit class that has methods (https://www.nextflow.io/docs/latest/script.html#memoryunit) to help with the conversion, including toGiga() which is used above.

So my plan is to simply swap in -Xmx${task.memory.toGiga()}g everywhere. I'll do this once I get the full pipeline working for testing purposes.

notes for future reference

  • I made the edits by running grep -r "-Xmx30g" ./ to find the relevant files and then, within each file, running the following substitution command within neovim: :%s/-Xmx\d\+g/-Xmx${task.memory.toGiga()}g/g
  • previously the memory settings were hardcoded to 30g for clumpify and bbduk, and 60g for bbmap, and now we just set them to the nextflow task.memory without checking if they are lower than these previous values.

mikemc added a commit that referenced this issue Jul 2, 2024
mikemc added a commit that referenced this issue Jul 25, 2024
@jtjvanlunenburg
Copy link

Memory config can be unlimited, e.g. mem = 0, in which case task.memory is a null object and there is no toGiga method (nor a .giga prop)? How would you work around this scenario?

@willbradshaw
Copy link
Contributor

@mikemc did you end up getting any further with this? Would you like me to include it in the list of issues for Harmon to work on in Q4?

@mikemc
Copy link
Member Author

mikemc commented Sep 25, 2024

I consider the fix I describe above, implemented in 89b2c75, as fixing the issue and is what @evanfields has been using. I think it's an improvement over hard-setting the memory and would suggest making this fix. (This is also the behavior you'd get if you moved to using nf-core BBTools modules). I think I didn't make a pull request because you were in the midst of refactoring.

Regarding @jtjvanlunenburg 's question, I'm not really sure if unlimited memory is something you need to handle, but either way I'm guessing making the workflow support setting unlimited memory should be its own new issue and not block fixing the immediate issue.

@willbradshaw
Copy link
Contributor

Cool, I'll add it to the list. I agree handling unlimited memory isn't a major priority.

@willbradshaw
Copy link
Contributor

This is now implemented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working priority_2
Projects
None yet
Development

No branches or pull requests

3 participants