Out of frame protein sequences #4

Closed
ayavinash opened this issue May 28, 2015 · 1 comment

@ayavinash

Problem: proteins whose translation start site is uncertain come out as out-of-frame sequences.
Proposed solution: the frame of the first exon should be taken into account when generating the CDS.

from pyGeno.Genome import *
from pyGeno.tools.UsefulFunctions import translateDNA, showDifferences

refGenome = Genome(name="GRCh38.80")
refProt = refGenome.get(Protein, id="ENSP00000349216")[0]
print("pyGeno")
print(refProt.sequence)

# GENCODE translation of the same protein; the line breaks are stripped below
gencode_seq = "".join("""
XHIRIMKRRVHTHWDVNISFREASCSQDGNLPTLISSVHRSRHLVMPEHQS
RCEFQRGSLEIGLRPAGDLLGKRLGRSPRISSDCFSEKRARSESPQEALLLPRELGPSMAPEDHYRRLV
SALSEASTFEDPQRLYHLGLPSHDLLRVRQEVAAAALRGPSGLEAHLPSSTAGQRRKQGL
AQHREGAAPAAAPSFSERELPQPPPLLSPQNAPHVALGPHLRPPFLGVPSALCQTPGYGF
LPPAQAEMFAWQQELLRKQNLARLELPADLLRQKELESARPQLLAPETALRPNDGAEELQ
RRGALLVLNHGAAPLLALPPQGPPGSGPPTPSRDSARRAPRKGGPGPASARPSESKEMTG
ARLWAQDGSEDEPPKDSDGEDPETAAVGCRGPTPGQAPAGGAGAEGKGLFPGSTLPLGFP
YAVSPYFHTGAVGGLSMDGEEAPAPEDVTKWTVDDVCSFVGGLSGCGEYTRVFREQGIDG
ETLPLLTEEHLLTNMGLKLGPALKIRAQVARRLGRVFYVASFPVALPLQPPTLRAPEREL
GTGEQPLSPTTATSPYGGGHALAGQTSPKQENGTLALLPGAPDPSQPLC
""".split())
print("GENCODE")
print(gencode_seq)

# frame of the first exon: number of leading bases before the first complete codon
first_exon_frame = refProt.transcript.exons[0].frame
print(first_exon_frame)

# drop the stop codon, translate in the shifted frame, and mark the partial first codon with "X"
new_seq = "X" + translateDNA(refProt.transcript.cDNA[0:-3], frame="f" + str(1 + first_exon_frame))
print("Corrected sequence")
print(new_seq)
print(showDifferences(gencode_seq, new_seq))
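
For readers without a pyGeno installation, the effect of the first-exon frame can be sketched in plain Python. This is only an illustration, not pyGeno's implementation: `translate_with_phase` is a hypothetical helper, the codon table is the standard genetic code, and prefixing "X" for an incomplete leading codon is an assumption modelled on the GENCODE sequence above.

```python
# Standard genetic code, built from the conventional T, C, A, G base ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_with_phase(dna, phase=0):
    """Translate `dna` after skipping `phase` leading bases (0, 1 or 2).

    Hypothetical helper for illustration only; not part of pyGeno.
    """
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1 or 2")
    dna = dna.upper()
    protein = []
    # Walk complete codons only; trailing bases that do not fill a codon are ignored.
    for i in range(phase, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    # Assumption: an incomplete leading codon is reported as a single "X".
    prefix = "X" if phase else ""
    return prefix + "".join(protein)


# Two stray bases ("CA") precede the first complete codon of ATG AAA TGG.
cdna = "CAATGAAATGG"
print(translate_with_phase(cdna, phase=0))  # read from base 0: out of frame
print(translate_with_phase(cdna, phase=2))  # skip the partial codon: in frame
```

Reading from base 0 hits a premature stop codon, while skipping the two leading bases recovers the intended residues. This is the same kind of discrepancy as the one between the pyGeno and GENCODE sequences above.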

@tariqdaouda
Owner

Hi,

Thanks for the issue. By default, pyGeno abides by the information provided by Ensembl. But if you know which proteins you are interested in, pyGeno provides tools for translating them in the reading frame of your choice.

If you don't want to apply an offset, simply do:

import pyGeno.tools.UsefulFunctions as uf

uf.translateDNA(refProt.transcript.cDNA, frame = "f2")

If you want to apply an offset:

import pyGeno.tools.UsefulFunctions as uf

#get the exons
exons = refProt.transcript.exons

#apply the offset to the first exon
e = exons[0]
CDS1 = refProt.chromosome.getSequence(e.CDS_start - 1, e.CDS_end)

#this will contain the shifted sequence
protSeq = [CDS1]
#loop through the other exons
for e in exons[1:]:
  protSeq.append(e.CDS)

#concatenate the sequences
protSeq = ''.join(protSeq)

uf.translateDNA(protSeq, frame = "f2")
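
The assembly step can also be sketched without pyGeno, using plain strings in place of exon objects. The names `assemble_cds` and `split_codons` are hypothetical. The sketch assumes the offset is the number of leading bases to drop from the joined coding sequence, and that non-coding exons contribute an empty or missing piece.

```python
def assemble_cds(exon_cds_pieces, first_exon_phase=0):
    """Join per-exon coding sequences into one codon-aligned CDS.

    Hypothetical helper for illustration only; not part of pyGeno.
    `first_exon_phase` is the number of leading bases (0, 1 or 2) that
    belong to an incomplete first codon and are dropped.
    """
    if first_exon_phase not in (0, 1, 2):
        raise ValueError("first_exon_phase must be 0, 1 or 2")
    # Skip exons without coding sequence (None or empty string).
    pieces = [p for p in exon_cds_pieces if p]
    trimmed = "".join(pieces)[first_exon_phase:]
    # Drop trailing bases that do not form a complete codon.
    return trimmed[: len(trimmed) - len(trimmed) % 3]


def split_codons(cds):
    """Split a codon-aligned sequence into a list of triplets."""
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


# The first exon starts with two bases of a partial codon ("CA");
# the codon AAA spans the exon boundary.
exons = ["CAATGA", "AATGG"]
cds = assemble_cds(exons, first_exon_phase=2)
print(split_codons(cds))
```

Joining the exons before trimming matters because a codon can span an exon boundary, so only the start of the first exon and the end of the joined sequence need adjusting.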

tariqdaouda pushed a commit that referenced this issue Jan 8, 2018