Large ligands, and how do I specify bond restraints for ligands? #284
Hi @VoyageHSSS
The initial step for ligands is generating a reference conformer, but RDKit can't generate one after 10k attempts, so it is excluded from the model inputs. The PubChem page also doesn't provide a reference conformer, likely for the same reason. Please open a discussion in RDKit about whether there is a way to generate a reference conformer for this molecule.
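To check whether RDKit can embed a given ligand at all, one option is to request a single 3D conformer with ETKDGv3 and, if that fails, retry from random starting coordinates, a setting sometimes suggested for large, flexible molecules. This is a minimal sketch under those assumptions, not Chai-1's own conformer code; the helper name `embed_reference_conformer` and the fallback strategy are hypothetical.

```python
from typing import Optional

from rdkit import Chem
from rdkit.Chem import AllChem


def embed_reference_conformer(smiles: str, seed: int = 42) -> Optional[Chem.Mol]:
    """Hypothetical helper: try to embed one 3D conformer, or return None."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"RDKit could not parse SMILES: {smiles}")
    mol = Chem.AddHs(mol)  # explicit hydrogens for a more realistic geometry

    params = AllChem.ETKDGv3()
    params.randomSeed = seed  # reproducible embedding
    if AllChem.EmbedMolecule(mol, params) == 0:  # returns conformer id 0, or -1 on failure
        return mol

    # Fallback (assumption): start distance geometry from random coordinates
    # instead of the default eigenvalue-based initial coordinates.
    params.useRandomCoords = True
    if AllChem.EmbedMolecule(mol, params) == 0:
        return mol
    return None


if __name__ == "__main__":
    conf_mol = embed_reference_conformer("CCO")
    print("embedded" if conf_mol is not None else "no conformer")
```

If this returns `None` for your ligand, RDKit most likely cannot produce a reference conformer for it with these settings either.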
We use atom numbering from RDKit; it can differ from other numbering schemes.
Yup, that's how we expect you to deal with it.
All parameters matter. I recommend tweaking them in this order: restraints, then the number of diffusion samples, then the number of trunk recycles.
Thank you so much for your patient reply. I will try more diffusion samples and trunk recycles! Regarding the restraints, I still have some questions. After generating the mol object using `Chem.MolFromSmiles("smiles")`, the atom of the first ligand that needs to be restrained is numbered C18, and the atom of the second ligand that needs to be restrained is numbered O26. In the restraints file, the second line is set as
Something like the following should work:
If it doesn't, we'll need an example to debug.
Here is my input_fasta:
Thank you very much, this has been bothering me for a long time.
Your indices are off: e.g. the first molecule doesn't have 18 carbons, and the second doesn't have 26 oxygens. Again, see how RDKit enumerates atoms. When I run your example with something more reasonable, like this:
the inputs are processed normally.
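A quick sanity check before writing a restraint is to confirm that the atom name actually exists under per-element numbering, where each element is counted separately starting from 1. The sketch below assumes that scheme; `atom_name_exists` is a hypothetical helper, not part of chai-lab, and it takes plain element symbols so it runs without RDKit.

```python
import re
from collections import Counter
from typing import Iterable

# Atom names like "C18" or "Cl2": element symbol followed by a 1-based index.
_NAME_RE = re.compile(r"^([A-Z][a-z]?)(\d+)$")


def atom_name_exists(element_symbols: Iterable[str], atom_name: str) -> bool:
    """Hypothetical check: does atom_name exist under per-element numbering?"""
    match = _NAME_RE.match(atom_name)
    if match is None:
        raise ValueError(f"Unexpected atom name format: {atom_name!r}")
    element, index = match.group(1), int(match.group(2))
    counts = Counter(element_symbols)  # number of atoms of each element
    return 1 <= index <= counts[element]  # Counter returns 0 for absent elements


if __name__ == "__main__":
    symbols = ["O", "C", "N", "C", "O"]  # toy molecule, heavy atoms only
    print(atom_name_exists(symbols, "C2"))   # True: there are two carbons
    print(atom_name_exists(symbols, "C18"))  # False: only two carbons
```

With RDKit, the symbol list can come from `[atom.GetSymbol() for atom in mol.GetAtoms()]`.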
Here is my code for getting the atom IDs, its output, and my Chai-1 code. I thought it was a chain ID issue, but even after changing the chain, the problem still persists. My Chai-1 version is 0.5.1. Could it be a version issue?
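This is the code we use for indexing: chai-lab/chai_lab/data/sources/rdkit.py, lines 162 to 166 at commit c813769 (https://github.com/chaidiscovery/chai-lab/blob/c8137690c66565b433cfbf8b97df351443822684/chai_lab/data/sources/rdkit.py#L162-L166).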
It enumerates each type of atom independently, starting from one, e.g. O1, O2, O3 or C1, C2, C3. Here is the code for enumeration:

```python
from collections import defaultdict

from rdkit import Chem
from rdkit.Chem import Draw
from matplotlib import pyplot as plt

mol = Chem.MolFromSmiles('O=C(N1)N([C@H]2[C@H](O)[C@H](O)[C@@H](COP(O)(OP(O)(O[C@@H]3[C@H](O)[C@@H](O)[C@H](O)[C@@H](CO)O3)=O)=O)O2)C=CC1=O')

# Count atoms per element so that names restart at 1 for each element.
element_counter: dict = defaultdict(int)
for atom in mol.GetAtoms():
    elem = atom.GetSymbol()
    element_counter[elem] += 1  # the first atom of each element gets index 1
    name = elem + str(element_counter[elem])
    atom.SetProp("name", name)  # will be used by downstream code
    atom.SetProp("atomNote", name)  # for plotting

img_substrate = Draw.MolToImage(mol, size=(1200, 1200))
plt.imshow(img_substrate)
plt.axis("off")
plt.show()
```

And the result for your molecule: (image: the molecule drawn with each atom annotated by its name)
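When writing restraints it can also help to have the mapping the other way round, from a name such as `O2` back to RDKit's atom index (the order of `mol.GetAtoms()`). The sketch below applies the same per-element counting as the enumeration code, but to a plain list of element symbols; `name_atoms` is a hypothetical helper, not chai-lab's API, and it assumes the atom order matches what chai-lab sees (same SMILES, no added hydrogens).

```python
from collections import defaultdict
from typing import Dict, List


def name_atoms(element_symbols: List[str]) -> Dict[str, int]:
    """Map per-element atom names (C1, C2, O1, ...) to 0-based atom indices."""
    counter: Dict[str, int] = defaultdict(int)
    name_to_index: Dict[str, int] = {}
    for index, elem in enumerate(element_symbols):
        counter[elem] += 1  # the first atom of each element becomes <elem>1
        name_to_index[f"{elem}{counter[elem]}"] = index
    return name_to_index


if __name__ == "__main__":
    mapping = name_atoms(["O", "C", "N", "C", "O"])
    print(mapping)  # {'O1': 0, 'C1': 1, 'N1': 2, 'C2': 3, 'O2': 4}
```

With RDKit, pass `[atom.GetSymbol() for atom in mol.GetAtoms()]`; the returned index can then be used with `mol.GetAtomWithIdx(...)` to inspect the chosen atom.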
Thank you very much, I will try this code!
(Sent by email in reply to Alex Rogozhnikov's comment of Mon, Jan 20, 2025, 03:39.)
This code should be in the docs; it's too useful to be hidden in the issues ^^
I highly recommend adding a parameter to generate multiple predicted structures (optional for users).
Finally, I look forward to your reply!