about Preprocessing of data #3

Open
anny0316 opened this issue Oct 24, 2022 · 8 comments

Comments

@anny0316

Hello, I have a question. Running get_fragment_vocab.py already saves the fragment vocabulary, so why does get_training_data.py extract the fragments again, align them with the previously saved fragments, and compute a rotation matrix? What is the purpose of this step? Thank you very much.

@longlongman
Owner

Running get_fragment_vocab.py gives us the fragment vocabulary. Because we want to use the vocabulary to rebuild a 3D molecule, we align the saved fragments with the molecule. Specifically, a molecule is first cut into fragments, the same way as when building the vocabulary. We then look up each molecule fragment in the vocabulary to get its vocabulary index. For each molecule fragment, we retrieve the saved fragment with that index and align it to the molecule fragment to get the corresponding translation vector and rotation matrix. Applying this transformation to the saved fragment rebuilds the corresponding molecule fragment.
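
The alignment step described above can be illustrated with a minimal sketch. It is not the code from get_training_data.py: it assumes the atoms of the saved fragment and the molecule fragment are already listed in matching order, and it uses the Kabsch algorithm with NumPy as one common way to obtain a rotation matrix and translation vector. The function name `kabsch_align` and the toy coordinates are hypothetical.

```python
import numpy as np


def kabsch_align(saved, target):
    """Return (R, t) so that saved @ R.T + t best matches target in the
    least-squares sense. Assumes both arrays are (N, 3) with matching atom order."""
    saved = np.asarray(saved, dtype=float)
    target = np.asarray(target, dtype=float)
    saved_center = saved.mean(axis=0)
    target_center = target.mean(axis=0)
    # Covariance between the two centered point sets.
    H = (saved - saved_center).T @ (target - target_center)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation, not a reflection.
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = target_center - R @ saved_center
    return R, t


# Toy "saved fragment" from the vocabulary (made-up coordinates).
saved_fragment = np.array([
    [0.0, 0.0, 0.0],
    [1.4, 0.0, 0.0],
    [0.0, 1.4, 0.0],
    [0.0, 0.0, 1.0],
])

# Build a toy "molecule fragment" by rotating and translating the saved one.
theta = np.pi / 3
true_R = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
true_t = np.array([1.0, -2.0, 0.5])
molecule_fragment = saved_fragment @ true_R.T + true_t

# Recover the transformation and rebuild the molecule fragment from the saved one.
R, t = kabsch_align(saved_fragment, molecule_fragment)
rebuilt = saved_fragment @ R.T + t
```

Storing only the vocabulary index together with `R` and `t` is then enough to reconstruct the fragment's pose in the molecule, which matches the motivation given in the answer.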

@anny0316
Author

Thank you, I get it.

@anny0316
Author

anny0316 commented Nov 8, 2022

Hello, when I used CAVITY to detect pockets for a given protein, I found that it reports many pockets for a single protein. In this experiment, do all pockets detected for a protein need to be considered? Thank you very much.

@longlongman
Owner

For each target protein, we only use the cavity with the best druggability score (the score is also provided by CAVITY).
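
As a rough illustration of that selection rule, the sketch below picks one cavity per protein. It assumes the druggability scores have already been read from the CAVITY output into Python objects and that a higher score is better. The `Cavity` type, its field names, and the scores are hypothetical, and parsing the CAVITY files is not shown.

```python
from typing import NamedTuple


class Cavity(NamedTuple):
    index: int         # hypothetical identifier of the detected cavity
    drug_score: float  # hypothetical field holding the druggability score


def pick_best_cavity(cavities):
    """Return the cavity with the highest druggability score."""
    if not cavities:
        raise ValueError("no cavities detected for this protein")
    return max(cavities, key=lambda c: c.drug_score)


# Made-up scores for three cavities detected on one protein.
cavities = [Cavity(1, -512.0), Cavity(2, 1034.0), Cavity(3, 87.5)]
best = pick_best_cavity(cavities)
```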

@anny0316
Author

anny0316 commented Nov 9, 2022

OK, many thanks to Siyu.

@anny0316
Author

Hello Siyu,

  1. After using CAVITY to generate the cavity PDB files, what is the difference between “thischains_vacant_xx.pdb” and “thischains_cavity_xx.pdb”? I think “thischains_cavity_xx.pdb” should be used in our experiment. Is that right?
  2. In sketching.py, the features selected to get "sample_n_o_f" for the protein are "for xyz in feature_dict[(7.0,)] + feature_dict[(8.0,)] + feature_dict[(9.0,)]". Why not choose another feature, such as feature_dict[(6.0,)]?

I'm looking forward to your reply. Thank you.

@longlongman
Owner

Q: What is the difference between “thischains_vacant_xx.pdb” and “thischains_cavity_xx.pdb”?
A: “thischains_vacant_xx.pdb” gives us the cavity in volume form (including the surface). “thischains_cavity_xx.pdb” gives us only the surface of the cavity.

Q: Should “thischains_cavity_xx.pdb” be used?
A: No, we use “thischains_vacant_xx.pdb”, because we need to calculate the volume of sampled molecular shapes.

Q: Why not choose another feature?
A: As mentioned in Appendix 2.2 Chemical Information Driven Design, we also explore the potential of integrating chemical information of proteins into drug design. Briefly speaking, based on hydrogen bond acceptor-donor rules, we put the fragment with more hydrogen atoms into the pocket region with more oxygen, nitrogen, and fluorine atoms (feature_dict[(7.0,)], feature_dict[(8.0,)], feature_dict[(9.0,)]).
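
The heuristic in the last answer can be sketched as follows. This is a simplified illustration, not the logic in sketching.py. It assumes that the `feature_dict` keys are atomic numbers (7 = N, 8 = O, 9 = F) and that each value is a list of xyz coordinates. The region centers, the radius, the fragment names, and the pairing rule are all hypothetical.

```python
N_O_F_KEYS = [(7.0,), (8.0,), (9.0,)]  # assumed to be N, O, F by atomic number


def collect_n_o_f(feature_dict):
    """Gather the coordinates of all N, O, and F atoms in the pocket.
    Missing keys are treated as empty lists (a choice made for this sketch)."""
    points = []
    for key in N_O_F_KEYS:
        points.extend(feature_dict.get(key, []))
    return points


def count_within(points, center, radius):
    """Count the points lying within `radius` of `center`."""
    r2 = radius * radius
    return sum(
        1 for p in points
        if sum((a - b) ** 2 for a, b in zip(p, center)) <= r2
    )


# Toy pocket: two carbons, one nitrogen, two oxygens (made-up coordinates).
feature_dict = {
    (6.0,): [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    (7.0,): [(0.5, 0.0, 0.0)],
    (8.0,): [(0.0, 1.0, 0.0), (10.0, 0.0, 0.0)],
}
sample_n_o_f = collect_n_o_f(feature_dict)

# Hypothetical pocket regions and fragments with their hydrogen counts.
region_centers = {"A": (0.0, 0.0, 0.0), "B": (10.0, 0.0, 0.0)}
fragment_h_counts = {"frag_1": 2, "frag_2": 5}

region_counts = {
    name: count_within(sample_n_o_f, center, radius=3.0)
    for name, center in region_centers.items()
}

# Pair the region richest in N/O/F with the fragment richest in hydrogens.
regions_ranked = sorted(region_counts, key=region_counts.get, reverse=True)
fragments_ranked = sorted(fragment_h_counts, key=fragment_h_counts.get, reverse=True)
assignment = dict(zip(regions_ranked, fragments_ranked))
```

Carbon atoms (the `(6.0,)` key under the same assumption) are left out here because they do not enter the acceptor-donor rule the answer describes.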

Moreover, you can check out issue #2, where I have already answered some common questions.

@anny0316
Author

Hello Siyu, thank you very much.
