
About secondary structure and block adjacency tensors in the newly published snake venom toxin binder design paper #313

Open
RodenLuo opened this issue Jan 21, 2025 · 1 comment

@RodenLuo

Hi,

In the Methods section of this paper, the secondary structure and block adjacency tensors are described as one-hot tensors:

an [L,4] secondary one-hot tensor (0 = α-helix, 1 = β-strand, 2 = loop and 3 = masked secondary structure identity) to indicate the secondary structure classification of each residue in the binder–target complex
an [L,L,3] adjacency one-hot tensor (0 = non-adjacent, 1 = adjacent and 2 = masked adjacency) to indicate interacting partner residues for each residue in the binder–target complex

However, in the examples/target_folds example, these tensors are label-encoded rather than one-hot encoded. I also used the provided helper script to generate these inputs, and its outputs are label-encoded as well. Of note, the generated secondary structure encoding is stored as floats rather than as ints like the shipped example. Please see the output at the end.

I wonder whether the paper uses a different version of RFdiffusion, and whether the run commands and inputs for the paper's case studies could be added to the repository. I believe a reproduction guide for this paper would greatly benefit the research community. Many thanks!

>>> import torch
>>> target_ss_path = "target_folds/insulin_target_ss.pt"
>>> target_adj_path = "target_folds/insulin_target_adj.pt"
>>> target_ss = torch.load(target_ss_path)
>>> target_adj = torch.load(target_adj_path)
>>> 
>>> target_ss
tensor([2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 2, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2,
        2, 2, 2, 2, 2, 2])
>>> target_ss.shape
torch.Size([150])
>>> target_adj.shape
torch.Size([150, 150])
>>> target_adj
tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]])
>>> torch.unique(target_adj)
tensor([0., 1.])
>>> torch.unique(target_ss)
tensor([0, 1, 2])

### ------- script used to generate fold-conditioning inputs -------
### ./helper_scripts/make_secstruc_adj.py --input_pdb ./examples/input_pdbs/2KL8.pdb --out_dir fold_conditioning_input_test

>>> target_ss_path = "/home/RFdiffusion/fold_conditioning_input_test/2KL8_ss.pt"
>>> target_adj_path = "/home/RFdiffusion/fold_conditioning_input_test/2KL8_adj.pt"
>>> target_ss = torch.load(target_ss_path)
>>> target_adj = torch.load(target_adj_path)
>>> target_ss
tensor([2., 1., 1., 1., 1., 1., 1., 1., 2., 2., 2., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 2., 2., 2., 1., 1., 1., 1., 1.,
        1., 2., 2., 2., 2., 1., 1., 1., 1., 1., 1., 2., 2., 2., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 2., 2., 2., 1., 1.,
        1., 1., 1., 1., 1., 2., 2.])
>>> target_ss.shape
torch.Size([79])
>>> target_adj.shape
torch.Size([79, 79])
>>> target_adj
tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 1., 0., 0.],
        [0., 0., 0.,  ..., 1., 0., 0.],
        ...,
        [0., 1., 1.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]])
>>> torch.unique(target_adj)
tensor([0., 1.])
>>> torch.unique(target_ss)
tensor([0., 1., 2.])
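
As a side note, if the one-hot layout from the paper is ever needed, the label-encoded tensors above can be expanded with torch.nn.functional.one_hot. A minimal sketch, assuming the label conventions quoted from the paper (0/1/2 = helix/strand/loop for ss, 0/1 = non-adjacent/adjacent for adj, with the extra class reserved for the mask):

>>> import torch
>>> import torch.nn.functional as F
>>> ss = torch.load("target_folds/insulin_target_ss.pt").long()    # [L], values in {0, 1, 2}
>>> adj = torch.load("target_folds/insulin_target_adj.pt").long()  # [L, L], values in {0, 1}
>>> ss_onehot = F.one_hot(ss, num_classes=4).float()    # [L, 4]; class 3 (mask) unused here
>>> adj_onehot = F.one_hot(adj, num_classes=3).float()  # [L, L, 3]; class 2 (mask) unused here
>>> ss_onehot.shape, adj_onehot.shape
(torch.Size([150, 4]), torch.Size([150, 150, 3]))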
@RodenLuo (Author)

If I understand the README and the paper's Methods correctly, the difference is actually the following.

In the README's fold-conditioning section with a target structure, the secondary structure and block adjacency tensors for the target and the scaffold are set independently: scaffoldguided.target_ss and scaffoldguided.target_adj cover the target, while scaffoldguided.scaffold_dir holds the ss and adj tensors for the scaffold. I checked the provided examples/ppi_scaffolds and found that some scaffolds are shorter than insulin_target.pdb, which means the scaffold tensors do not include the target itself.
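
For concreteness, the README's scaffold-guided PPI example passes these inputs roughly as follows (paths as in the repo's examples; the exact flags may differ between versions, so treat this as a sketch rather than the canonical invocation):

./scripts/run_inference.py scaffoldguided.scaffold_guided=True scaffoldguided.target_pdb=True \
    scaffoldguided.target_path=input_pdbs/insulin_target.pdb \
    scaffoldguided.target_ss=target_folds/insulin_target_ss.pt \
    scaffoldguided.target_adj=target_folds/insulin_target_adj.pt \
    scaffoldguided.scaffold_dir=./examples/ppi_scaffolds/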

In the new paper, however, these tensors describe the "binder–target complex" as a whole.

This raises a few questions:

  1. How do we set the adjacency values between the binder's residues and the target's? (A hypothetical sketch follows after this list.)
  2. With such tensors generated, in either one-hot or label encoding, how do we feed them into the RFdiffusion model?
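
To make question 1 concrete, here is a purely hypothetical sketch of what a complex-level adjacency tensor might look like. The binder-first ordering and the binder length are my assumptions; the label values follow the paper's stated convention (0 = non-adjacent, 1 = adjacent, 2 = masked):

import torch

L_binder, L_target = 80, 150   # hypothetical binder length; the insulin target has 150 residues
L = L_binder + L_target

# Start with everything masked, then drop in the known target-target block.
adj = torch.full((L, L), 2, dtype=torch.long)
target_adj = torch.load("target_folds/insulin_target_adj.pt").long()
adj[L_binder:, L_binder:] = target_adj
# The binder-binder and binder-target blocks stay masked (2), which would leave
# those contacts unconstrained; any known contacts could be set to 1 instead.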

Many thanks for any help.
