Input validation wrapper class initial revision #9

John-Peters-UW · 2024-08-30T20:20:03Z

No description provided.

John-Peters-UW · 2024-08-30T20:25:11Z

This will break a lot of things until the flags are placed... everywhere. But I'll list out some features right here:

model_encoder.py has a little bit of code for validating the inputs given to METL.
- mutation position is valid and lines up with currently places amino acid
- PDB file (when run in strict mode) is compared to wild type to ensure match. Strict mode is enabled by default on UUID and INDENT read, but is off for checkpoint reads. Strict mode just means PDB file and wt must match exactly
- change indexing method exists, so when loading models you can decide on load if you want 0 or 1 based indexing. This changes how the errors are reported as well, so if a user is using 1 based indexing the reports for what variant is wrong will be in 1 based indexing
- verify if PDB file is needed by pre-checking on load if ra.RelativeTransformerEncoder is in the model, and if it is reports later in the code that you need to add a PDB file before forward is run.
- in main, params have been added to all loading methods to allow for "raw" loading which returns the old model, encoder
- indexing parameter allows indexing for the model to be set, setting it to 1 will convert all variants to 1 based before feeding them into METL

I think I covered the things I added here

John-Peters-UW · 2024-08-30T20:28:37Z

New API example is in the test_validation notebook I added temp for now so that
I could test. Here is a link to it:

https://github.com/gitter-lab/metl-pretrained/blob/ccccc6936fd95bd7295dc847abf81f16ff2c5b6d/notebooks/test_validation.ipynb

agitter

The overal design looks like it worked out very well. I have some questions to make sure I'm following the details correctly. I'm not sure all the variant validation is currently handling all possible edge cases.

requirements.txt

agitter · 2024-08-31T01:55:50Z

requirements.txt

+torch>=1.11.0
+biopython==1.84


The other packages are all specified with >= versions. Should we use that here too or pin everything to a single version?

I prefer pinning the versions if they work. There's no reason to risk a package updating in a way that breaks things when it works as is.

Unless there's a security update, then you have to refactor but I'd like to just pin versions.

I also tend to pin to specific versions. Let's grab specific versions from what was installed in the latest GitHub Actions run:

Successfully installed metl-pretrained-0.1 Package Version ------------------------ ----------- biopandas 0.5.1 biopython 1.84 filelock 3.15.4 fsspec 2024.6.1 Jinja2 3.1.4 looseversion 1.1.2 MarkupSafe 2.1.5 metl-pretrained 0.1 mmtf-python 1.1.3 mpmath 1.3.0 msgpack 1.0.8 networkx 3.3 numpy 1.26.4 nvidia-cublas-cu12 12.1.3.1 nvidia-cuda-cupti-cu12 12.1.105 nvidia-cuda-nvrtc-cu12 12.1.105 nvidia-cuda-runtime-cu12 12.1.105 nvidia-cudnn-cu12 9.1.0.70 nvidia-cufft-cu12 11.0.2.54 nvidia-curand-cu12 10.3.2.106 nvidia-cusolver-cu12 11.4.5.107 nvidia-cusparse-cu12 12.1.0.106 nvidia-nccl-cu12 2.20.5 nvidia-nvjitlink-cu12 12.6.68 nvidia-nvtx-cu12 12.1.105 pandas 2.2.2 pip 24.2 python-dateutil 2.9.0.post0 pytz 4.1 scipy 1.14.1 setuptools 74.1.1 six 1.16.0 sympy 1.13.2 torch 2.4.0 triton 3.0.0 typing_extensions 4.12.2 tzdata 2024.1

metl/main.py

metl/models.py

metl/model_encoder.py

notebooks/test_validation.ipynb

metl/model_encoder.py

agitter

The design is looking good. The GitHub Actions tests fail because they have not been updated. We may be ready to update that example code to use the new model encoder, or we could wait until @samgelman reviews it.

It would also be a good idea to add a couple tests showing that errors are thrown when bad inputs are provided.

agitter · 2024-10-08T14:17:59Z

@John-Peters-UW is there some minimal number of updates we can make to get this ready to merge? I like the design updates and would like to have it integrated into the main branch before we resubmit the manuscript, which is still ~weeks away.

John-Peters-UW · 2024-10-08T17:58:46Z

@John-Peters-UW is there some minimal number of updates we can make to get this ready to merge? I like the design updates and would like to have it integrated into the main branch before we resubmit the manuscript, which is still ~weeks away.

@agitter Not gonna lie, I completely forgot about this 😂. I was in Boston for a conference for a week, then Madison had a conference I had to participate it which took another week and then I was in full catchup mode with courses. I have an exam tomorrow 😬, but after that we can finish this off pretty quickly and have it ready to merge after some reviews next week.

It won't take a lot of time, but after Wednesday I should have time for research now and stuff like this 🙏. I've been like 90% school work / 10% research for the last week an a half. I'll mention it in my meeting with Juan just to block off some time so that I can get this done for ya'll - He won't mind, we're going pretty slow right now and don't really have any deadlines.

John-Peters-UW added 2 commits August 23, 2024 15:33

"Validation cache"

b11a075

Initial validation wrapper completed

ccccc69

John-Peters-UW requested review from samgelman and agitter August 30, 2024 20:20

agitter reviewed Aug 31, 2024

View reviewed changes

Updated repo with changes from PRbefore labor day

ba7e603

agitter reviewed Sep 6, 2024

View reviewed changes

metl/model_encoder.py Outdated Show resolved Hide resolved

agitter reviewed Sep 6, 2024

View reviewed changes

Lots of tests... Some small bug fixes

a2ac01b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Input validation wrapper class initial revision #9

Input validation wrapper class initial revision #9

John-Peters-UW commented Aug 30, 2024

John-Peters-UW commented Aug 30, 2024 •

edited

Loading

John-Peters-UW commented Aug 30, 2024

agitter left a comment

agitter Aug 31, 2024

John-Peters-UW Sep 3, 2024

agitter Sep 6, 2024

agitter left a comment

agitter commented Oct 8, 2024

John-Peters-UW commented Oct 8, 2024

		torch>=1.11.0
		biopython==1.84

Input validation wrapper class initial revision #9

Are you sure you want to change the base?

Input validation wrapper class initial revision #9

Conversation

John-Peters-UW commented Aug 30, 2024

John-Peters-UW commented Aug 30, 2024 • edited Loading

John-Peters-UW commented Aug 30, 2024

agitter left a comment

Choose a reason for hiding this comment

agitter Aug 31, 2024

Choose a reason for hiding this comment

John-Peters-UW Sep 3, 2024

Choose a reason for hiding this comment

agitter Sep 6, 2024

Choose a reason for hiding this comment

agitter left a comment

Choose a reason for hiding this comment

agitter commented Oct 8, 2024

John-Peters-UW commented Oct 8, 2024

John-Peters-UW commented Aug 30, 2024 •

edited

Loading