
"Added notebooks that are colab accessible, they should both be runnable from within colab. " #2

Merged: 10 commits into main on Aug 29, 2024

Conversation

John-Peters-UW (Collaborator)

To run:

  • Open Colab
  • Paste in the link to the notebooks (the GitHub links should work, I think)
  • Press "Run all", or modify the inputs and run the notebooks to play with them

agitter (Member) left a comment

The notebooks are great and overall worked well for me. I'd like to get some feedback from others who are less familiar with how METL works.

We will need to add these to the notebooks README.md and explain how to run them in Colab. We can provide a direct link to open these notebooks in Colab. ColabFold uses https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb for example. We should document that the T4 GPU runtime is sufficient to run both notebooks. Is it necessary to connect the runtime in Colab before starting?
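As a sketch of that direct-link idea, an "Open in Colab" badge in the README could look like the snippet below; the `<user>/<repo>/<branch>` path segments are placeholders, not confirmed locations:

```markdown
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/<user>/<repo>/blob/<branch>/notebooks/colab_finetuning.ipynb)
```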

Will we keep the original finetuning.ipynb as well? We probably can as long as we explain their different purposes.

John-Peters-UW (Collaborator, Author) commented Aug 19, 2024

Good feedback :)

> We will need to add these to the notebooks README.md and explain how to run them in Colab. We can provide a direct link to open these notebooks in Colab. ColabFold uses https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb for example. We should document that the T4 GPU runtime is sufficient to run both notebooks. Is it necessary to connect the runtime in Colab before starting?

From my understanding, you have to connect to it manually if you want to use the GPU. I think that's so they don't automatically start billing you when you allocate something like A100s.

Adding something to the README (and the model card) is needed too, I think.

John-Peters-UW (Collaborator, Author)

> Will we keep the original finetuning.ipynb as well? We probably can as long as we explain their different purposes.

Yeah, I think we should keep them separate. If you're running the normal one, you don't need the notebook cloning METL again (it's assumed to be cloned already) or force-downloading all of the dependencies every time you run it.

"\n",
"For demonstration purposes, this repository contains the [avGFP dataset](https://github.com/gitter-lab/metl/tree/main/data/dms_data/avgfp) from [Sarkisyan et al. (2016)](https://doi.org/10.1038/nature17995).\n",
"See the [metl-pub](https://github.com/gitter-lab/metl-pub) repository to access the other experimental datasets we used in our preprint.\n",
"See the README in the [dms_data](https://github.com/gitter-lab/metl/tree/main/data/dms_data/avgfp/data/dms_data) directory for information about how to use your own experimental dataset."
agitter (Member):

This global link and some others are broken. Some are still relative. Please double check them all.

John-Peters-UW (Collaborator, Author) commented Aug 23, 2024

@agitter The broken links were links to Python or ipynb files that are broken in the original notebooks as well.

Additionally, the broken global links are actually linking to pages that no longer exist. The only way to get to those files are by going to older commits, when they did exist, like the other notebook does.

I do not know what you want to do with that data. I fixed the ipynb links, but I will wait on the broken global links because those files are no longer in the current version of the main repository.

An example of that is this link here for the dms_data that is in the notebook: https://github.com/gitter-lab/metl/tree/main/data/dms_data/avgfp/data/dms_data

But the actual file was deleted, though it can be found in an earlier commit: https://github.com/gitter-lab/metl/tree/45237bbc4c6cfae1d4657930bdb732ec41a3fc79/data/dms_data

Are we still supporting this? Does it need to be referenced or talked about? And if it does, why is it not included in the current version of main?

I'm confused about these, so I am leaving them, though I did fix the broken links to Python files and notebooks (which were also broken in the normal finetuning notebook).

agitter (Member):

Got it. This is a question for @samgelman.

samgelman (Collaborator) commented Aug 26, 2024

agitter (Member) commented Aug 23, 2024

I made another pass through the notebooks. Some of the original comments are still unresolved, including some small typo corrections. I'd like to add these to the README table before merging.

@samgelman do you want to test these in Colab before we merge?

John-Peters-UW (Collaborator, Author)

> I made another pass through the notebooks. Some of the original comments are still unresolved, including some small typo corrections. I'd like to add these to the README table before merging.
>
> @samgelman do you want to test these in Colab before we merge?

Were the typos the links, or were there other things that you saw? I can't seem to find anything else. I thought I got all of the typo corrections you put out in the previous round of comments.

agitter (Member) commented Aug 23, 2024

> Were the typos the links, or were there other things that you saw? I can't seem to find anything else.

I think a lot of my small comments on the pretraining notebook were masked by GitHub as outdated but weren't incorporated. Try searching "To zero based helper functoin" as one example.

John-Peters-UW (Collaborator, Author)

> Were the typos the links, or were there other things that you saw? I can't seem to find anything else.
>
> I think a lot of my small comments on the pretraining notebook were masked by GitHub as outdated but weren't incorporated. Try searching "To zero based helper functoin" as one example.

Should've gotten them now. There are a few questions you can answer in the unresolved comments above, though.

samgelman (Collaborator)

I've reviewed both notebooks, and they are working great!

I have a few minor comments, for the prediction notebook:

  • For the "Variant text input" cell, specify whether the example variants are 0-based or 1-based indexing
  • For the "Variant file upload" cell, how should the JSON file be formatted? It might be simpler for biologists to be able to provide a list of variants in a text file, one on each line, rather than formatting a JSON file.
  • For the "METL predicting loop" cell, the output predictions are labeled as logits, but I'm not sure if that is the correct way to describe them. I usually see logits in the context of classifier networks, and the term refers to the outputs from the final layer of the network, before the softmax / activation. I think I've also seen the term to describe intermediate network layer outputs. For us, I think "predictions" or simply "outputs" would be more accurate.

For the finetuning notebook, there are some broken links. It looks like there may have been a find-and-replace, but the replacement text erroneously added "data/dms_data/avgfp" to the path.

I suggest fixing the broken links, and then other changes can be discussed / added later in a separate PR (especially the JSON vs. text file upload, I don't think that's a critical issue that should block the merge).

Great job!

samgelman merged commit c4dd1b3 into main on Aug 29, 2024
4 of 5 checks passed
agitter deleted the colab_notebooks branch on August 30, 2024, 22:20
agitter (Member) commented Aug 31, 2024

I reran both notebooks and made a few small changes in 9912989. They both look excellent.
