"Calculate Properties" for macromolecules #2552

Closed
ljubica-milovic opened this issue Oct 14, 2024 · 13 comments · Fixed by #2743

@ljubica-milovic (Collaborator) commented Oct 14, 2024

Issue on Ketcher side: #5727

Below is a list of properties that should be included:

| Name | Symbol | Unit | Explanation | Type of biomolecule | Note |
|---|---|---|---|---|---|
| Molecular mass | M | kDa (Da and MDa rarely used; 1 Da = 1 g/mol) | The mass of one mole (6.022 × 10^23 molecules) of the substance | Any | |
| Molecular formula | | | Information about the atoms, and the number of each, that make up a specific structure | Any | |
| Isoelectric point | pI | Dimensionless | pH at which the protein carries no net charge | Peptide | Median of all pKa values for that peptide |
| Melting temperature | Tm (m in subscript) | °C | Temperature at which half of the nucleic acid is denatured | RNA/DNA | |
| Extinction coefficient | ε (lowercase epsilon) | M⁻¹·cm⁻¹ (M = mol/L) | Measure of how much light of a specific wavelength the substance absorbs; constant for a given substance and λ | Peptide | We will have ε only for λ = 280 nm |
| Hydrophobicity | | | "Fear of water": how much certain residues dislike a water environment; it can influence many things, such as solubility or the ability to pass through the cell membrane | Peptide | The output should not be a number but a graph, with the x-axis representing residue numbers and the y-axis the hydrophobicity of every amino acid residue |
| Monomer count | | | How many monomers of each type there are in the biomolecule, for example 3 alanines, 5 cysteines, etc. | Peptide/RNA/DNA | |

Molecular mass

Any one structure in macromolecules mode

  1. The mass of one monomer is the mass of the structure minus the mass of leaving group atom(s) if an attachment point is occupied.

M(Cys) = 121.16 g/mol
M(Cys, R1 occupied) = M(Cys) - M(H) = 120.15 g/mol
M(Cys, R1 ∧ R2 occupied) = M(Cys) - M(H) - M(OH) = 103.14 g/mol

  2. The molecular mass of the whole polymer is the sum of the molecular masses of its monomers plus the molecular masses of any small molecules connected to it.
  3. Indigo should return one numerical value in kDa (1 kDa = 1000 g/mol).
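A minimal sketch of the mass rules above, assuming each monomer is supplied as its free (unconnected) molar mass plus the list of leaving groups removed for its occupied attachment points. The helper names and the leaving-group table (H = 1.008 g/mol, OH = 17.007 g/mol, from standard atomic weights) are illustrative assumptions, not Indigo's API; real monomer templates define their own leaving groups.

```python
# Hypothetical leaving-group masses in g/mol (standard atomic weights: H = 1.008, O = 15.999).
LEAVING_GROUP_MASS = {"H": 1.008, "OH": 17.007}


def monomer_mass(free_mass: float, occupied_leaving_groups: list[str]) -> float:
    """Monomer mass (g/mol) minus the leaving groups of its occupied attachment points."""
    return free_mass - sum(LEAVING_GROUP_MASS[g] for g in occupied_leaving_groups)


def polymer_mass_kda(monomers, small_molecule_masses=()) -> float:
    """Sum of monomer masses plus attached small molecules, in kDa (1 kDa = 1000 g/mol).

    monomers: iterable of (free_mass_g_per_mol, [leaving groups of occupied attachment points]).
    """
    total = sum(monomer_mass(m, groups) for m, groups in monomers)
    total += sum(small_molecule_masses)
    return total / 1000.0


# Cys-Cys dipeptide: the first Cys loses OH at R2, the second loses H at R1.
print(round(polymer_mass_kda([(121.16, ["OH"]), (121.16, ["H"])]), 4))
```

The Cys-Cys example gives about 0.2243 kDa (224.305 g/mol before conversion).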

Molecular formula

Any one structure in macromolecules mode

  1. The atoms of monomers should be counted individually, taking into account occupied attachment points.
  2. The molecular formula of the whole polymer is the total count of each element over all monomers that make up the polymer, plus the molecular formula of any small molecules connected to those monomers.
  3. Indigo should return a formula listing each element with its count in subscript. Elements should be ordered: carbon first, hydrogen second, and all other elements alphabetically.

For example: C2H5O
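A minimal sketch of the counting and ordering rules above, assuming each monomer is given as a dict of element counts for the free monomer plus a dict of the leaving-group atoms removed at its occupied attachment points. Both function names are hypothetical. The ordering follows the text literally (C, then H, then alphabetical), and counts are emitted as plain digits; rendering them as subscripts is left to the UI.

```python
from collections import Counter


def formula_string(counts: Counter) -> str:
    """Carbon first, hydrogen second, then all other elements alphabetically."""
    present = [e for e, n in counts.items() if n > 0]
    order = [e for e in ("C", "H") if e in present]
    order += sorted(e for e in present if e not in ("C", "H"))
    # A count of 1 is written without a digit.
    return "".join(e if counts[e] == 1 else f"{e}{counts[e]}" for e in order)


def polymer_formula(monomers, small_molecules=()) -> str:
    """monomers: iterable of (atoms of free monomer, atoms lost at occupied attachment points)."""
    total = Counter()
    for atoms, leaving in monomers:
        total.update(atoms)      # add the free monomer's atoms
        total.subtract(leaving)  # remove leaving-group atoms of occupied attachment points
    for atoms in small_molecules:
        total.update(atoms)
    return formula_string(total)


CYS = {"C": 3, "H": 7, "N": 1, "O": 2, "S": 1}
print(polymer_formula([(CYS, {"O": 1, "H": 1}), (CYS, {"H": 1})]))  # C6H12N2O3S2
```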


Isoelectric point

Only peptides

Peptide is any chain that has one or more amino acids in the backbone.

  1. pKa values for all ionizable groups of all monomers should be determined, ignoring the leaving group atoms if an attachment point is occupied.
  2. pI should be the median (not the mean!!!) of all pKa values for all groups of that polymer.
  3. Indigo should return one numerical value.
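A minimal sketch of the median rule, assuming the pKa values of all ionizable groups have already been collected into a list (determining them per monomer is not shown). The function name is hypothetical. For an even number of values, `statistics.median` averages the two central values; that matches the option raised later in this thread but is an assumption, not a confirmed requirement.

```python
from statistics import median


def isoelectric_point(pka_values: list[float]) -> float:
    """pI as the median (not the mean) of all pKa values of the peptide's ionizable groups."""
    if not pka_values:
        raise ValueError("the peptide has no ionizable groups")
    # With an even count, statistics.median returns the mean of the two central values.
    return median(pka_values)


print(isoelectric_point([1, 2, 3, 4, 5, 6, 8, 9]))  # 4.5
```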

Melting temperature

Only for two chains of RNA/DNA where every base is connected via a hydrogen bond to a base from the other chain

  1. Variables for the equation are:
    - SP: strength parameter per base;
    - L: length of the nucleotide sequence;
    - UPC: molar (mol/L) concentration of unipositive cations; the user can enter the value in mM, and the default is the average physiological value, 140 mM;
    - NAC: molar (mol/L) concentration of the nucleotide strands; the user should enter the value in μM or nM.

  2. Bases C, T and U are pyrimidines (Y); bases A and G are purines (R). Indigo should read only one chain, starting from the 5' end, observe consecutive pairs of nucleotides, and assign each pair a strength parameter (see the table below). Dividing the sum of the strength parameters by the number of bases gives the strength parameter per base (SP).

Let's say we have a double-stranded DNA with the sequence of one strand being 5'-GACGAATGCT-3'.
First we observe the pair GA; from the table below we get 8.
The remaining pairs give AC = 10, CG = 10, GA = 8, AA = 5, AT = 7, TG = 7, GC = 13, CT = 8.
For a ten-nucleotide sequence we have 9 nucleotide pairs whose strength parameters sum to 76, so the strength per base (SP) is 76 / 10 = 7.6.

  3. The equation for the melting temperature is as follows (UPC and NAC in mol/L):

$$T_m\,[^\circ\mathrm{C}] = 7.35\,\mathrm{SP} + 17.34\,\ln(L) + 4.96\,\ln(\mathrm{UPC}) + 0.89\,\ln(\mathrm{NAC}) - 25.42$$

  4. Indigo should return one numerical value.

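| RY | YY | RR | YR |
|---|---|---|---|
| GC = 13 | CC = 11 | GG = 11 | CG = 10 |
| AC = 10 | TC/UC = 8 | AG = 8 | TG/UG = 7 |
| GT/GU = 10 | CT/CU = 8 | GA = 8 | CA = 7 |
| AT/AU = 7 | TT/UU/TU/UT = 5 | AA = 5 | TA/UA = 4 |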

A, C, G, T, and U are to be considered natural analogues.
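A minimal sketch of the melting-temperature steps above for one strand given as natural-analogue letters read 5' to 3'. The function names are hypothetical, and taking NAC in μM is an assumption (the text allows μM or nM). The dictionary uses GC = 13 as the purine-pyrimidine (RY) entry, which is the value the worked example needs, and writes U as T because every U entry in the table has the same value as its T counterpart.

```python
import math

# Strength parameters for 5'->3' dinucleotides (U written as T).
STRENGTH = {
    "GC": 13, "AC": 10, "GT": 10, "AT": 7,   # RY
    "CC": 11, "TC": 8, "CT": 8, "TT": 5,     # YY
    "GG": 11, "AG": 8, "GA": 8, "AA": 5,     # RR
    "CG": 10, "TG": 7, "CA": 7, "TA": 4,     # YR
}


def strength_per_base(seq: str) -> float:
    """Sum of the strength parameters of consecutive pairs, divided by the number of bases."""
    s = seq.upper().replace("U", "T")
    pairs = (s[i:i + 2] for i in range(len(s) - 1))
    return sum(STRENGTH[p] for p in pairs) / len(s)  # KeyError for X/ambiguous bases


def melting_temperature(seq: str, nac_uM: float, upc_mM: float = 140.0) -> float:
    """Tm in °C; UPC defaults to the physiological 140 mM."""
    sp = strength_per_base(seq)
    upc = upc_mM * 1e-3  # mM -> mol/L
    nac = nac_uM * 1e-6  # uM -> mol/L
    return (7.35 * sp + 17.34 * math.log(len(seq)) + 4.96 * math.log(upc)
            + 0.89 * math.log(nac) - 25.42)


print(round(melting_temperature("GACGAATGCT", nac_uM=1.0), 1))  # 48.3
```

For the worked example, SP = 7.6; with the default 140 mM cations and a 1 μM strand concentration, the sketch gives about 48.3 °C.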


Extinction coefficient

Only peptides

  1. For peptides, the extinction coefficient at λ = 280 nm (in M⁻¹·cm⁻¹) is $\varepsilon = 5500\,N(W) + 1490\,N(Y) + 125\,N(C)$, where N(W), N(Y), and N(C) are the numbers of tryptophans, tyrosines, and cysteines.
  2. Indigo should return one numerical value.
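A minimal sketch of the formula above, assuming the peptide is passed as a string of natural-analogue letters (the function name is hypothetical). Only W, Y and C contribute; every other letter, including X, adds nothing.

```python
def extinction_coefficient_280(analogues: str) -> int:
    """ε at 280 nm in M^-1 cm^-1 from the counts of W, Y and C natural analogues."""
    s = analogues.upper()
    return 5500 * s.count("W") + 1490 * s.count("Y") + 125 * s.count("C")


print(extinction_coefficient_280("WYC"))  # 7115
```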

Hydrophobicity

Only peptides

  1. Indigo should return a list with the x-axis values representing the amino acid number (skipping non-amino acids, ambiguous amino acids, and amino acids with natural analogue X) and the y-axis values representing the hydrophobicity coefficient of that amino acid (see the table below).

| Natural analogue | Coefficient | Natural analogue | Coefficient | Natural analogue | Coefficient | Natural analogue | Coefficient |
|---|---|---|---|---|---|---|---|
| A | 0.616 | G | 0.501 | M | 0.738 | S | 0.359 |
| C | 0.680 | H | 0.165 | N | 0.236 | T | 0.450 |
| D | 0.028 | I | 0.943 | P | 0.711 | V | 0.825 |
| E | 0.043 | K | 0.283 | Q | 0.251 | W | 0.878 |
| F | 1.000 | L | 0.943 | R | 0.000 | Y | 0.880 |
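A minimal sketch of the profile, assuming the peptide is given as natural-analogue letters and that skipped monomers do not consume an index, so kept residues are numbered 1, 2, 3, ... consecutively (the numbering shown in the AXSMGA example later in this thread). The function name is hypothetical; letters without a coefficient in the table (X, and also O and U) are skipped.

```python
HYDROPHOBICITY = {
    "A": 0.616, "C": 0.680, "D": 0.028, "E": 0.043, "F": 1.000,
    "G": 0.501, "H": 0.165, "I": 0.943, "K": 0.283, "L": 0.943,
    "M": 0.738, "N": 0.236, "P": 0.711, "Q": 0.251, "R": 0.000,
    "S": 0.359, "T": 0.450, "V": 0.825, "W": 0.878, "Y": 0.880,
}


def hydrophobicity_profile(analogues) -> list[tuple[int, float]]:
    """(x, y) points: x = index among kept residues, y = hydrophobicity coefficient."""
    points = []
    for a in analogues:
        coeff = HYDROPHOBICITY.get(a)
        if coeff is None:
            continue  # skip X, ambiguous and non-amino-acid monomers
        points.append((len(points) + 1, coeff))
    return points


print(hydrophobicity_profile("AXSMGA"))
```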

Monomer count

Peptides, RNA, DNA

  1. For peptides, every monomer should be sorted into one of 21 categories (see the table below) and counted.

Peptide is any chain that has one or more amino acids in the backbone.

  2. For RNA/DNA, only bases that are part of a nucleotide/nucleoside should be sorted into one of 6 categories (see the table below) and counted.

RNA/DNA is any chain that has one sugar in the backbone and a base connected to it via R3 (sugar) - R1 (base).

  3. Indigo should return a list containing the number of monomers in each category.
  • For peptides:

| Symbol | Monomers | Symbol | Monomers | Symbol | Monomers |
|---|---|---|---|---|---|
| A | Alanine, and all other amino acids with natural analogue A | I | Isoleucine, and all other amino acids with natural analogue I | R | Arginine, and all other amino acids with natural analogue R |
| C | Cysteine, and all other amino acids with natural analogue C | K | Lysine, and all other amino acids with natural analogue K | S | Serine, and all other amino acids with natural analogue S |
| D | Aspartic acid, and all other amino acids with natural analogue D | L | Leucine, and all other amino acids with natural analogue L | T | Threonine, and all other amino acids with natural analogue T |
| E | Glutamic acid, and all other amino acids with natural analogue E | M | Methionine, and all other amino acids with natural analogue M | V | Valine, and all other amino acids with natural analogue V |
| F | Phenylalanine, and all other amino acids with natural analogue F | N | Asparagine, and all other amino acids with natural analogue N | W | Tryptophan, and all other amino acids with natural analogue W |
| G | Glycine, and all other amino acids with natural analogue G | P | Proline, and all other amino acids with natural analogue P | Y | Tyrosine, and all other amino acids with natural analogue Y |
| H | Histidine, and all other amino acids with natural analogue H | Q | Glutamine, and all other amino acids with natural analogue Q | Other | Amino acids with natural analogues O, U, and X; ambiguous amino acids; all non-amino acid monomers |

  • For RNA/DNA:

| Symbol | Bases |
|---|---|
| A | Adenine, and all other bases with natural analogue A |
| C | Cytosine, and all other bases with natural analogue C |
| G | Guanine, and all other bases with natural analogue G |
| T | Thymine, and all other bases with natural analogue T |
| U | Uracil, and all other bases with natural analogue U |
| Other | Bases with natural analogue X; ambiguous bases |
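A minimal sketch of the category counting, assuming the caller has already reduced each monomer (or each base, for RNA/DNA) to its natural-analogue letter, using `None` for non-amino-acid monomers and ambiguous entries. The names are hypothetical. Anything outside the named categories, including O, U and X for peptides, lands in "Other", and all categories are reported even when their count is zero.

```python
PEPTIDE_CATEGORIES = list("ACDEFGHIKLMNPQRSTVWY")  # 20 named categories + "Other" = 21
NUCLEIC_CATEGORIES = list("ACGTU")                 # 5 named categories + "Other" = 6


def count_monomers(analogues, categories) -> dict[str, int]:
    """Count monomers per category; unknown, ambiguous or missing analogues go to 'Other'."""
    counts = {c: 0 for c in categories}
    counts["Other"] = 0
    for a in analogues:
        counts[a if a in categories else "Other"] += 1
    return counts


print(count_monomers("ACGUUX", NUCLEIC_CATEGORIES))
```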
ljubica-milovic changed the title from [DRAFT] "Calculate Properties" for macromolecules to "Calculate Properties" for macromolecules on Oct 15, 2024
AliaksandrDziarkach self-assigned this on Jan 20, 2025

@AliaksandrDziarkach (Collaborator)

Regarding "Peptide is any chain that has one or more amino acids in the backbone": should these structures be considered peptides?

Seven example structures, numbered 1 to 7, were attached as images.

@ljubica-milovic (Collaborator, Author) commented Jan 20, 2025

  1. If the bond is R1-R2 the chain is a peptide with two monomers. If the bond is not R1-R2 there are two chains (each with one monomer) and one of them is a peptide.
  2. H-bonds are always considered as side chain connections, so here we have two chains - one peptide, one CHEM.
  3. One chain; it can be considered both a peptide and a nucleic acid. (The idea is to have two tabs, one for each kind of biopolymer, so a chain being considered both at the same time is not a problem for this functionality.)
  4. Two chains - one peptide, one nucleic acid
  5. Two chains - one peptide, one nucleic acid
  6. Two chains - one peptide, one nucleic acid
  7. Two chains - both peptides

@AliaksandrDziarkach (Collaborator)

Should two chains of RNA/DNA where every base is connected via a hydrogen bond to a base from the other chain be considered as one chain for calculating molecular mass and monomer count?

@ljubica-milovic (Collaborator, Author)

Is it possible to consider the whole double helix as one structure if it is selected as a whole, and a single strand as one structure if only that strand is selected?

@AliaksandrDziarkach (Collaborator)

I think Ketcher should send only the selected monomers. This will remove ambiguity.

@AliaksandrDziarkach (Collaborator)

For melting temperature:

  1. Both chains should contain the same number of nucleotides.
  2. Natural analogues of the bases are used in the calculation.
  3. No peptides or CHEMs should be connected.
  4. Bases should be connected by hydrogen bonds in the correct order, i.e. 1-1, 2-2, ..., n-n.

@ljubica-milovic (Collaborator, Author)

Because counting starts from the 5' end of each strand, bases should be connected 1 to n, 2 to n-1, and so on.
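A minimal sketch of the duplex check discussed in the last two comments (equal strand lengths, and base i from the 5' end of one strand hydrogen-bonded to base n + 1 - i of the other). Representing the bonds as 1-based position pairs and the function name are hypothetical choices for illustration; the other conditions (natural analogues only, no peptides or CHEMs attached) are not checked here.

```python
def is_antiparallel_duplex(len_a: int, len_b: int, h_bonds) -> bool:
    """h_bonds: (i, j) pairs, 1-based positions counted from the 5' end of strands A and B."""
    if len_a != len_b:
        return False  # both chains must contain the same number of nucleotides
    n = len_a
    expected = {(i, n + 1 - i) for i in range(1, n + 1)}  # 1 to n, 2 to n-1, ...
    return set(h_bonds) == expected


print(is_antiparallel_duplex(3, 3, [(1, 3), (2, 2), (3, 1)]))  # True
```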

@AliaksandrDziarkach (Collaborator)

What about nucleotides? Should a structure like the one shown in the attached image be considered a double chain for melting temperature?

@AliaksandrDziarkach (Collaborator)

For hydrophobicity, will the result be a vector of pairs (amino acid index, corresponding coefficient)?
E.g., with A = 0.616, G = 0.501, M = 0.738 and S = 0.359, for AXSMGA it will be (1, 0.616), (2, 0.359), (3, 0.738), (4, 0.501), (5, 0.616)?

@ljubica-milovic (Collaborator, Author)

Unsplit nucleotides are treated as normal nucleotides with the appropriate natural analogue for the purposes of this ticket.
As for hydrophobicity, that is correct: users on the Ketcher side will see a graph with the x-axis being the monomer count and the y-axis the hydrophobicity coefficient.

@AliaksandrDziarkach (Collaborator) commented Jan 22, 2025

For the isoelectric point, what value should be used when the number of values is even? The mean of the two central values?
E.g., for [1, 2, 3, 4, 5, 6, 8, 9] it would be (4 + 5)/2 = 4.5?

@AliaksandrDziarkach (Collaborator)

Microstructures connected without attachment points: should they be considered part of the backbone if connected to R1 or R2?

@ljubica-milovic (Collaborator, Author)

For the purposes of calculation, I would say yes.
For other purposes (layout, for example), we should decide together later.

AliaksandrDziarkach added a commit that referenced this issue on Jan 28, 2025: "Added code to calculate properties. Added UTs"
AliaksandrDziarkach linked a pull request on Jan 28, 2025 that will close this issue