Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove internal workflows and scripts #742

Merged
merged 4 commits into from
Jan 21, 2025
Merged

Remove internal workflows and scripts #742

merged 4 commits into from
Jan 21, 2025

Conversation

mwalker174
Copy link
Collaborator

Moves workflows and scripts considered legacy code or that are otherwise not necessary to run gatk-sv. All of these files have been relocated to gatk-sv-internal.

Two WDLs had to be modified:

  • Moved needed utility functions out of TasksBenchmark.wdl into Vapor.wdl
  • Removed the "ReviseBase" option from GatherSampleEvidence.wdl

Some workflows may still be in regular use. They will still be available through Dockstore in previous versions.

We may also consider moving str related code out as well in a future PR.

Delete a few more files; update vapor

Move more stuff

Move out igv docker, ApplyManualVariantFilter, and some unused input templates
Copy link
Collaborator

@epiercehoffman epiercehoffman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks generally good. I double-checked all the WDLs for remaining dependencies and they were fine.

My biggest question is how we want to maintain utility workflows in the internal repo that depend on tasks in the main repo - ex. ConcatTextFiles and SubsetVcfBySamples rely on TasksMakeCohortVcf and Utils respectively, and as such are currently broken in gatk-sv-internal. Is it worthwhile to copy the tasks WDLs over, or to create an InternalTasks.wdl to copy specific tasks? I don't like to duplicate code but with two repos I think there isn't much of an alternative.

@@ -1,7 +0,0 @@
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like none of the JSON templates got moved to the new repo. I'm not sure any of the current ones are super necessary to keep around, but we might want the ability/setup to store and validate them in the future

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I did move example jsons over but not templates because they could get outdated. I think if we ever wanted to restore the templates we would just go back in the git history here instead.

@@ -1,313 +0,0 @@
#!/bin/python
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have shared these scripts with other groups and have links to them in the onboarding doc & slides. We'll need to update our onboarding materials and we may want to put a note in the README or somewhere to check the internal repo

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we should update internal onboarding documentation. gatk-sv-interrnal is a private repo though, so we should not point users to it.

Copy link
Collaborator Author

@mwalker174 mwalker174 Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the link in the onboarding doc

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I missed that it's private. I think it needs to be public so we can use it with dockstore

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake, it is public. I still wouldn't want to refer users there since some code may not be maintained

@@ -1,117 +0,0 @@
#!/bin/python
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed you're only keeping the v2 scripts in the new internal repo - in which case this one can be deleted as get_cromwell_resource_usage2.sh renders it obsolete

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Obsolete or redundant? It seems like they do similar things but it might be a good idea to have a python implementation available.

@@ -1,132 +0,0 @@
#!/bin/python
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This script is referred to in website/docs/advanced/build_ref_panel.md - should update docs accordingly

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes okay, I am deleting the documentation on another branch mw_ss_mode_v1, which replaces reference panel generation with a Terra method, and copied it into the script header over in the other repo.

@@ -1,43 +0,0 @@
version 1.0

import "TasksMakeCohortVcf.wdl" as tasks
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the plan for utility workflows like this that we may want to use again, but rely on WDLs in the gatk-sv repo that haven't been moved to gatk-sv-internal? ie. here ConcatTextFiles needs TasksMakeCohortVcf.wdl - but we may not want to maintain two versions

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I will copy those dependencies over into gatk-sv-internal since it's possible that they could get desynced with the dependent WDLs over time. I'm not worried about having a little redundant code over there if they are just a few core utility tasks.

@@ -1,159 +0,0 @@
version 1.0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mentioned in website/docs/gs/input_files.md so should update docs

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I think that the ID generating script is critical so I have moved it into /scripts/inputs and updated the link in the doc.

@mwalker174 mwalker174 merged commit 53419e1 into main Jan 21, 2025
11 checks passed
@mwalker174 mwalker174 deleted the mw_release_cleanup branch January 21, 2025 15:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants