Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Input validation wrapper class initial revision #9

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

John-Peters-UW
Copy link
Collaborator

No description provided.

@John-Peters-UW
Copy link
Collaborator Author

John-Peters-UW commented Aug 30, 2024

This will break a lot of things until the flags are placed... everywhere. But I'll list out some features right here:

  • model_encoder.py has a little bit of code for validating the inputs given to METL.
    • mutation position is valid and lines up with currently places amino acid
    • PDB file (when run in strict mode) is compared to wild type to ensure match. Strict mode is enabled by default on UUID and INDENT read, but is off for checkpoint reads. Strict mode just means PDB file and wt must match exactly
    • change indexing method exists, so when loading models you can decide on load if you want 0 or 1 based indexing. This changes how the errors are reported as well, so if a user is using 1 based indexing the reports for what variant is wrong will be in 1 based indexing
    • verify if PDB file is needed by pre-checking on load if ra.RelativeTransformerEncoder is in the model, and if it is reports later in the code that you need to add a PDB file before forward is run.
    • in main, params have been added to all loading methods to allow for "raw" loading which returns the old model, encoder
    • indexing parameter allows indexing for the model to be set, setting it to 1 will convert all variants to 1 based before feeding them into METL

I think I covered the things I added here

@John-Peters-UW
Copy link
Collaborator Author

New API example is in the test_validation notebook I added temp for now so that
I could test. Here is a link to it:

https://github.com/gitter-lab/metl-pretrained/blob/ccccc6936fd95bd7295dc847abf81f16ff2c5b6d/notebooks/test_validation.ipynb

Copy link
Member

@agitter agitter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overal design looks like it worked out very well. I have some questions to make sure I'm following the details correctly. I'm not sure all the variant validation is currently handling all possible edge cases.

requirements.txt Outdated Show resolved Hide resolved
torch>=1.11.0
biopython==1.84
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other packages are all specified with >= versions. Should we use that here too or pin everything to a single version?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer pinning the versions if they work. There's no reason to risk a package updating in a way that breaks things when it works as is.

Unless there's a security update, then you have to refactor but I'd like to just pin versions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also tend to pin to specific versions. Let's grab specific versions from what was installed in the latest GitHub Actions run:

Successfully installed metl-pretrained-0.1
Package                  Version
------------------------ -----------
biopandas                0.5.1
biopython                1.84
filelock                 3.15.4
fsspec                   2024.6.1
Jinja2                   3.1.4
looseversion             1.1.2
MarkupSafe               2.1.5
metl-pretrained          0.1
mmtf-python              1.1.3
mpmath                   1.3.0
msgpack                  1.0.8
networkx                 3.3
numpy                    1.26.4
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        9.1.0.70
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-nccl-cu12         2.20.5
nvidia-nvjitlink-cu12    12.6.68
nvidia-nvtx-cu12         12.1.105
pandas                   2.2.2
pip                      24.2
python-dateutil          2.9.0.post0
pytz                     4.1
scipy                    1.14.1
setuptools               74.1.1
six                      1.16.0
sympy                    1.13.2
torch                    2.4.0
triton                   3.0.0
typing_extensions        4.12.2
tzdata                   2024.1

metl/main.py Show resolved Hide resolved
metl/models.py Outdated Show resolved Hide resolved
metl/model_encoder.py Show resolved Hide resolved
metl/model_encoder.py Outdated Show resolved Hide resolved
metl/model_encoder.py Outdated Show resolved Hide resolved
metl/model_encoder.py Outdated Show resolved Hide resolved
metl/model_encoder.py Show resolved Hide resolved
notebooks/test_validation.ipynb Outdated Show resolved Hide resolved
metl/model_encoder.py Outdated Show resolved Hide resolved
Copy link
Member

@agitter agitter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The design is looking good. The GitHub Actions tests fail because they have not been updated. We may be ready to update that example code to use the new model encoder, or we could wait until @samgelman reviews it.

It would also be a good idea to add a couple tests showing that errors are thrown when bad inputs are provided.

@agitter
Copy link
Member

agitter commented Oct 8, 2024

@John-Peters-UW is there some minimal number of updates we can make to get this ready to merge? I like the design updates and would like to have it integrated into the main branch before we resubmit the manuscript, which is still ~weeks away.

@John-Peters-UW
Copy link
Collaborator Author

@John-Peters-UW is there some minimal number of updates we can make to get this ready to merge? I like the design updates and would like to have it integrated into the main branch before we resubmit the manuscript, which is still ~weeks away.

@agitter Not gonna lie, I completely forgot about this 😂. I was in Boston for a conference for a week, then Madison had a conference I had to participate it which took another week and then I was in full catchup mode with courses. I have an exam tomorrow 😬, but after that we can finish this off pretty quickly and have it ready to merge after some reviews next week.

It won't take a lot of time, but after Wednesday I should have time for research now and stuff like this 🙏. I've been like 90% school work / 10% research for the last week an a half. I'll mention it in my meeting with Juan just to block off some time so that I can get this done for ya'll - He won't mind, we're going pretty slow right now and don't really have any deadlines.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants