Second enumerate example returns different output #133

Open
olivertomic opened this issue Apr 19, 2020 · 2 comments

Comments

@olivertomic

In the second example, the second line contains one command too many: >>> peptide_backbone = PeptideBuilder(2) should be on its own line.

When I run this example, I get different outputs.

from cocktail_shaker import Cocktail
from cocktail_shaker import PeptideBuilder
peptide_backbone = PeptideBuilder(2)
cocktail = Cocktail(peptide_backbone,ligand_library = ['Br', 'I'])
combinations = cocktail.shake()
print (combinations)
['NC(Br)C(=O)NC(I)C(=O)NCC(=O)O', 'NC(I)C(=O)NC(Br)C(=O)NCC(=O)O']
enumerations = cocktail.enumerate(enumeration_complexity='low')
print (len(enumerations))
20
enumerations = cocktail.enumerate(enumeration_complexity='med')
print (len(enumerations))
186
enumerations = cocktail.enumerate(enumeration_complexity='high')
print (len(enumerations))
1789

My output is the following (note that the numerical outputs are different):

Generating 2 Compounds...
['NC(Br)C(=O)NC(I)C(=O)NCC(=O)O', 'NC(I)C(=O)NC(Br)C(=O)NCC(=O)O']
Enumerating 2 Compounds....
192
Enumerating 2 Compounds....
20
Enumerating 2 Compounds....
1792

@Sulstice
Owner

Ah, I need to note something in the documentation here. What ends up happening is a fairly glaring flaw in the algorithm I implemented rather than a bug in the code itself. The enumeration happens something like this:

molecule = Chem.MolFromSmiles(self.combinations[i])
smiles_enumerated = Chem.MolToSmiles(molecule, doRandom=True)
if dimensionality == '1D' and smiles_enumerated not in enumerated_molecules:
    enumerated_molecules.append(smiles_enumerated)

So the algorithm itself is inconsistent: we generate a random representation of the SMILES string and then check whether it already exists in our list of unique results. The number of attempts depends on the complexity value the user passes in.
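
The run-to-run variation described above can be reproduced without RDKit. What follows is a minimal sketch under stated assumptions, not the cocktail_shaker implementation: enumerate_variants, tokens, attempts, and rng are hypothetical names, and a random shuffle of a short token list stands in for Chem.MolToSmiles(molecule, doRandom=True). It shows why "sample at random, keep only unseen results" yields a different count on each run, and how passing an explicitly seeded generator would make the count reproducible.

```python
import random


def enumerate_variants(tokens, attempts, rng):
    """Collect the distinct random orderings of `tokens` seen in `attempts` tries.

    Hypothetical stand-in for the enumeration loop: each shuffle plays the
    role of one randomized SMILES string.
    """
    unique = []   # keeps first-seen order, like enumerated_molecules
    seen = set()  # fast membership test
    for _ in range(attempts):
        candidate = "".join(rng.sample(tokens, len(tokens)))
        if candidate not in seen:
            seen.add(candidate)
            unique.append(candidate)
    return unique


if __name__ == "__main__":
    tokens = list("ABCDE")  # 120 possible orderings

    # Unseeded generators: the number of unique results can differ per run.
    print(len(enumerate_variants(tokens, 100, random.Random())))
    print(len(enumerate_variants(tokens, 100, random.Random())))

    # Same seed: the two results are identical.
    first = enumerate_variants(tokens, 100, random.Random(42))
    second = enumerate_variants(tokens, 100, random.Random(42))
    print(first == second)
```

In this sketch the number of attempts plays the role of the complexity setting: more attempts find more unique representations, but the exact count depends on which duplicates happen to be drawn.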

I'm going to label this as a bug and as something to enhance in future releases.

@Sulstice
Owner

I'll also note in the documentation that your results may vary compared to the example.

@Sulstice Sulstice self-assigned this Apr 19, 2020
@Sulstice Sulstice added the enhancement New feature or request label Apr 19, 2020