
About secondary structure and block adjacency tensors in the newly published snake venom toxin binder design paper #313

Open
RodenLuo opened this issue Jan 21, 2025 · 1 comment

@RodenLuo

Hi,

In the Methods section of this paper, the secondary structure and block adjacency tensors are described as one-hot tensors:

an [L,4] secondary one-hot tensor (0 = α-helix, 1 = β-strand, 2 = loop and 3 = masked secondary structure identity) to indicate the secondary structure classification of each residue in the binder–target complex
an [L,L,3] adjacency one-hot tensor (0 = non-adjacent, 1 = adjacent and 2 = masked adjacency) to indicate interacting partner residues for each residue in the binder–target complex

However, in the examples/target_folds example, these tensors are label-encoded rather than one-hot encoded. I also used the provided helper script to generate these inputs, and its outputs are label-encoded as well. Of note, the generated secondary structure encoding is stored as floats rather than as ints like the shipped example. Please see the output at the end.

I wonder whether the paper uses a different version of RFdiffusion, and whether the run commands and inputs for the paper's case studies could be added to the repository. I believe a reproduction guide for this paper would greatly benefit the research community. Many thanks!

>>> import torch
>>> target_ss_path = "target_folds/insulin_target_ss.pt"
>>> target_adj_path = "target_folds/insulin_target_adj.pt"
>>> target_ss = torch.load(target_ss_path)
>>> target_adj = torch.load(target_adj_path)
>>> 
>>> target_ss
tensor([2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 2, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2,
        2, 2, 2, 2, 2, 2])
>>> target_ss.shape
torch.Size([150])
>>> target_adj.shape
torch.Size([150, 150])
>>> target_adj
tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]])
>>> torch.unique(target_adj)
tensor([0., 1.])
>>> torch.unique(target_ss)
tensor([0, 1, 2])

### ------- script used to generate fold-conditioning inputs -------
### ./helper_scripts/make_secstruc_adj.py --input_pdb ./examples/input_pdbs/2KL8.pdb --out_dir fold_conditioning_input_test

>>> target_ss_path = "/home/RFdiffusion/fold_conditioning_input_test/2KL8_ss.pt"
>>> target_adj_path = "/home/RFdiffusion/fold_conditioning_input_test/2KL8_adj.pt"
>>> target_ss = torch.load(target_ss_path)
>>> target_adj = torch.load(target_adj_path)
>>> target_ss
tensor([2., 1., 1., 1., 1., 1., 1., 1., 2., 2., 2., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 2., 2., 2., 1., 1., 1., 1., 1.,
        1., 2., 2., 2., 2., 1., 1., 1., 1., 1., 1., 2., 2., 2., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 2., 2., 2., 1., 1.,
        1., 1., 1., 1., 1., 2., 2.])
>>> target_ss.shape
torch.Size([79])
>>> target_adj.shape
torch.Size([79, 79])
>>> target_adj
tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 1., 0., 0.],
        [0., 0., 0.,  ..., 1., 0., 0.],
        ...,
        [0., 1., 1.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]])
>>> torch.unique(target_adj)
tensor([0., 1.])
>>> torch.unique(target_ss)
tensor([0., 1., 2.])
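
As a side note, if the one-hot layout from the paper is ever needed, the label-encoded tensors above can be expanded with torch.nn.functional.one_hot. A minimal sketch, assuming the label conventions quoted from the paper (0/1/2 = helix/strand/loop for ss, 0/1 = non-adjacent/adjacent for adj, with the extra class reserved for the mask):

>>> import torch
>>> import torch.nn.functional as F
>>> ss = torch.load("target_folds/insulin_target_ss.pt").long()    # [L], values in {0, 1, 2}
>>> adj = torch.load("target_folds/insulin_target_adj.pt").long()  # [L, L], values in {0, 1}
>>> ss_onehot = F.one_hot(ss, num_classes=4).float()    # [L, 4]; class 3 (mask) unused here
>>> adj_onehot = F.one_hot(adj, num_classes=3).float()  # [L, L, 3]; class 2 (mask) unused here
>>> ss_onehot.shape, adj_onehot.shape
(torch.Size([150, 4]), torch.Size([150, 150, 3]))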
@RodenLuo (Author)

If I understand the README and the paper's Methods correctly, the difference is actually the following.

In the README's fold-conditioning section with a target structure, the secondary structure and block adjacency tensors for the target and the scaffold are set independently: scaffoldguided.target_ss and scaffoldguided.target_adj cover the target, while scaffoldguided.scaffold_dir holds the ss and adj tensors for the scaffold. I checked the provided examples/ppi_scaffolds and found that some scaffolds are shorter than insulin_target.pdb, which means the scaffold tensors do not include the target itself.
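
For concreteness, the README's scaffold-guided PPI example passes these inputs roughly as follows (paths as in the repo's examples; the exact flags may differ between versions, so treat this as a sketch rather than the canonical invocation):

./scripts/run_inference.py scaffoldguided.scaffold_guided=True scaffoldguided.target_pdb=True \
    scaffoldguided.target_path=input_pdbs/insulin_target.pdb \
    scaffoldguided.target_ss=target_folds/insulin_target_ss.pt \
    scaffoldguided.target_adj=target_folds/insulin_target_adj.pt \
    scaffoldguided.scaffold_dir=./examples/ppi_scaffolds/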

In the new paper, however, these tensors describe the "binder–target complex" as a whole.

This raises a few questions:

  1. How do we set the adjacency values between the binder's residues and the target's? (A hypothetical sketch follows after this list.)
  2. With such tensors generated, in either one-hot or label encoding, how do we feed them into the RFdiffusion model?
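
To make question 1 concrete, here is a purely hypothetical sketch of what a complex-level adjacency tensor might look like. The binder-first ordering and the binder length are my assumptions; the label values follow the paper's stated convention (0 = non-adjacent, 1 = adjacent, 2 = masked):

import torch

L_binder, L_target = 80, 150   # hypothetical binder length; the insulin target has 150 residues
L = L_binder + L_target

# Start with everything masked, then drop in the known target-target block.
adj = torch.full((L, L), 2, dtype=torch.long)
target_adj = torch.load("target_folds/insulin_target_adj.pt").long()
adj[L_binder:, L_binder:] = target_adj
# The binder-binder and binder-target blocks stay masked (2), which would leave
# those contacts unconstrained; any known contacts could be set to 1 instead.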

Many thanks for any help.
