
Configs v2 #421

Merged (2 commits into main on Aug 1, 2023)

Conversation

luis-mueller (Contributor)

Changelogs

This PR proposes a new structure for the hydra configs with a focus on the core aspects of Graphium (accelerators, benchmarking, and pre-training/fine-tuning). The changes here might also directly help with #414. The PR also adds some documentation at the graphium/expts/hydra-configs level. I'd be happy to also add it to the documentation page. For more information about the proposed changes, read here.

Tested on GPU for GIN and GCN models on ToyMix.
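To make the proposed composition concrete, here is a minimal sketch of how a main.yaml under expts/hydra-configs could pull the groups together for the GCN-on-ToyMix run above. The group and file names are inferred from this thread and may not match the actual files in the PR:

```yaml
# Hypothetical expts/hydra-configs/main.yaml
defaults:
  # Accelerators
  - accelerator: gpu
  # Benchmarking
  - model: gcn
  # Pre-training/fine-tuning
  - architecture: toymix
  - tasks: toymix
  - training: toymix
  # Specializations, resolved via hydra's defaults-list interpolation,
  # e.g. training/accelerator/toymix_gpu.yaml
  - training/accelerator: ${training}_${accelerator}
  - training/model: ${training}_${model}
  - _self_
```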


Checklist:

  • Was this PR discussed in an issue? It is recommended to first discuss a new feature in a GitHub issue before opening a PR.
  • Add tests to cover the fixed bug(s) or the newly introduced feature(s) (if appropriate).
  • Update the API documentation if a new function is added, or an existing one is deleted.
  • Write concise and explanatory changelogs above.
  • If possible, assign one of the following labels to the PR: feature, fix or test (or ask a maintainer to do it for you).

See also the discussion related to this PR.


codecov bot commented Jul 30, 2023

Codecov Report

Merging #421 (16c2770) into main (b0d4fd5) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main     #421   +/-   ##
=======================================
  Coverage   66.77%   66.77%           
=======================================
  Files          82       82           
  Lines        7819     7819           
=======================================
  Hits         5221     5221           
  Misses       2598     2598           
| Flag | Coverage Δ |
|------|------------|
| unittests | 66.77% <ø> (ø) |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Components | Coverage Δ |
|------------|------------|
| ipu | 47.70% <ø> (ø) |

WenkelF (Collaborator) commented Jul 31, 2023

This looks nice @luis-mueller! If I understand correctly, finetuning this model on lipophilicity would look like this:

# Accelerators
-accelerator: ipu

# Pre-training/fine-tuning
-architecture: toymix (would only work if datamodule/args/task_specific_args was in tasks)
-tasks: lipophilicity
-training: lipophilicity
-optional finetuning: lipophilicity

# Benchmarking
-model: gcn

# Specializations
-training/accelerator: ${training}_${accelerator}
-training/model: ${training}_${model}

A few thoughts/questions:

  • We could pass additional information necessary for finetuning (which pretrained model, freezing, etc.) as an optional default
  • If we wanted to finetune from a part other than a task_head, it would be done by moving parts from architecture to tasks, if I understand correctly.
  • Shouldn't datamodule/args/task_specific_args be in the tasks config instead of architecture?
  • Would it make sense to make featurization a specialization of architecture, i.e., architecture/featurization? We could then always reuse the one from the pretrained model when finetuning.
  • Should we use the same main for benchmarking and finetuning?

luis-mueller (Contributor, Author)

@WenkelF: Thanks for spotting the task_specific_args; they are now under tasks.
Regarding your config, yes, this is how the fine-tuning could look from a config perspective. However, I would refrain from changing the main.yaml and rather override it in a new config file or, preferably, in the command line. Consider that the goal is to make the fine-tuning "switch" very easy, so it should be possible to make all changes in the CLI.
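For illustration, the fine-tuning "switch" could then be a plain command-line override along the lines of `python main.py tasks=lipophilicity training=lipophilicity finetuning=lipophilicity` (the entry point and config names here are hypothetical; in hydra, `group=option` overrides a group that is already in the defaults list, while `+group=option` adds one that is not).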

Regarding your comments:

> We could pass additional information necessary for finetuning (which pretrained model, freezing, etc.) as an optional default

Agreed. This would go well with the claim that we pre-configure the use-case of pre-training/fine-tuning (see my docs). Once the finetuning configs are finalized, I'd be happy to add it to the documentation.
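As a sketch of that optional default (assuming a finetuning config group that does not yet exist in this PR), the group could sit in the defaults list with a null value so nothing is loaded during pre-training, and a concrete config is selected only when fine-tuning:

```yaml
# Hypothetical addition to main.yaml
defaults:
  - finetuning: null   # off by default; enable with e.g. `finetuning=lipophilicity`

# Hypothetical finetuning/lipophilicity.yaml
# pretrained_model: <name or path of the pre-trained checkpoint>
# freeze_pretrained: true   # e.g. freeze everything except the new task head
```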

> If we wanted to finetune from a part other than a task_head, it would be done by moving parts from architecture to tasks, if I understand correctly.

Yes, but keep in mind that this would only work for that one setup. If we want to provide a re-usable structure, we would also have to logically separate more parts (e.g., the encoders, gnn, pre-nn, etc.). This is of course a trade-off between flexibility and practicality, as composing each part could be cumbersome. I will try to think of a solution that can make this trade-off easier.
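To make the trade-off concrete, a more granular composition could look roughly like the sketch below. These group names are purely illustrative and are not part of this PR:

```yaml
# Hypothetical, more granular defaults
defaults:
  - architecture/pre_nn: default
  - architecture/encoders: default
  - architecture/gnn: gcn
  - tasks/task_heads: toymix
# Fine-tuning from a level other than a task head would then mean swapping
# only the groups below that level, at the cost of composing more pieces.
```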

> Shouldn't datamodule/args/task_specific_args be in the tasks config instead of architecture?

Yes, I fixed that already. Thanks for pointing this out 🙌

> Would it make sense to make featurization a specialization of architecture, i.e., architecture/featurization? We could then always reuse the one from the pretrained model when finetuning.

The way it is organized now should already support this use case. Since featurization is defined on the architecture level, it would be re-used when loading the architecture (irrespective of the tasks).
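In other words (a hypothetical layout, with keys abbreviated), featurization sits inside the architecture config, so swapping the tasks config leaves it untouched:

```yaml
# Hypothetical architecture/toymix.yaml
featurization:
  # ... featurization options, reused whenever this architecture is loaded ...
pre_nn:
  # ...
gnn:
  # ...
```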

> Should we use the same main for benchmarking and finetuning?

IMO that would be ideal, and it would showcase the flexibility of our config structure. From what I have seen in your PR, most of the code is re-used, correct? You can always add additional code that only runs when finetuning is set.

DomInvivo (Collaborator) left a comment


Nice changes! This would help make everything cleaner.

WenkelF (Collaborator) commented Aug 1, 2023

> From what I have seen in your PR, most of the code is re-used, correct?

Yes, at least the first version of finetuning will be very similar. Moving forward, only the architecture part might change a bit.

One thing I thought about (assuming architecture was separated from featurization) was to make architecture optional and have it inferred from the pretrained model if we are finetuning. This could be integrated into the function load_architecture in graphium/data/_loader.py.

Let's say we have pretrained many models on LargeMix and want to finetune a specific one for a new task. Then we only need to provide the name of the pretrained model, e.g., by looking at the leaderboard, and don't need to find the config of the pretrained model.

Would that be a useful feature?
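As a sketch of what such a config could reduce to under this proposal (the keys and model name below are made up for illustration):

```yaml
# Hypothetical finetuning config for a new task
pretrained_model: largemix_gcn    # name as shown on the leaderboard (made up)
task: my_new_task
# No architecture section: it would be inferred from the pre-trained model,
# e.g. inside load_architecture in graphium/data/_loader.py
```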
