Out of frame protein sequences #4

ayavinash · 2015-05-28T11:29:24Z

Problem: Proteins whose translation start site is uncertain yield out-of-frame sequences.
Solution: The frame of the first exon should somehow be taken into account when generating the CDS.

from pyGeno.Genome import *
from pyGeno.tools.UsefulFunctions import translateDNA, showDifferences

refGenome=Genome(name="GRCh38.80")
refProt=refGenome.get(Protein,id="ENSP00000349216")[0]
print "pyGeno"
print refProt.sequence
# triple-quoted so the sequence can span several lines; the newlines are stripped afterwards
gencode_seq="""XHIRIMKRRVHTHWDVNISFREASCSQDGNLPTLISSVHRSRHLVMPEHQS
RCEFQRGSLEIGLRPAGDLLGKRLGRSPRISSDCFSEKRARSESPQEALLLPRELGPSMAPEDHYRRLV
SALSEASTFEDPQRLYHLGLPSHDLLRVRQEVAAAALRGPSGLEAHLPSSTAGQRRKQGL
AQHREGAAPAAAPSFSERELPQPPPLLSPQNAPHVALGPHLRPPFLGVPSALCQTPGYGF
LPPAQAEMFAWQQELLRKQNLARLELPADLLRQKELESARPQLLAPETALRPNDGAEELQ
RRGALLVLNHGAAPLLALPPQGPPGSGPPTPSRDSARRAPRKGGPGPASARPSESKEMTG
ARLWAQDGSEDEPPKDSDGEDPETAAVGCRGPTPGQAPAGGAGAEGKGLFPGSTLPLGFP
YAVSPYFHTGAVGGLSMDGEEAPAPEDVTKWTVDDVCSFVGGLSGCGEYTRVFREQGIDG
ETLPLLTEEHLLTNMGLKLGPALKIRAQVARRLGRVFYVASFPVALPLQPPTLRAPEREL
GTGEQPLSPTTATSPYGGGHALAGQTSPKQENGTLALLPGAPDPSQPLC""".replace("\n","")
print "GENCODE"
print gencode_seq
first_exon_frame=refProt.transcript.exons[0].frame
print first_exon_frame
new_seq= "X"+translateDNA(refProt.transcript.cDNA[0:-3],frame="f"+str(1+first_exon_frame))
print "Corrected sequence"
print new_seq
print showDifferences(gencode_seq,new_seq)
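To make the frame handling above concrete without a pyGeno installation, here is a minimal, library-free sketch. It assumes the first exon's frame follows the GTF convention (0, 1 or 2 bases to skip before the first complete codon), which is how the snippet above uses it when building "f" + str(1 + frame). The function name `translate_with_phase` and the toy sequence are hypothetical and not part of pyGeno. Unlike the snippet above, it only prepends "X" when the first codon is actually incomplete.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_with_phase(cdna, phase):
    """Translate cdna after skipping `phase` leading bases (0, 1 or 2).

    When phase is non-zero the skipped bases belong to an incomplete codon,
    so the protein is prefixed with 'X'. Codons containing characters other
    than A, C, G, T are also reported as 'X'. A trailing partial codon is
    dropped.
    """
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1 or 2")
    seq = cdna.upper()[phase:]
    usable = len(seq) - len(seq) % 3
    protein = "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )
    return ("X" if phase else "") + protein


# Toy cDNA whose first base belongs to an incomplete codon (phase 1).
toy_cdna = "GATGGCCAAATAA"
in_frame = translate_with_phase(toy_cdna, 1)
out_of_frame = translate_with_phase(toy_cdna, 0)
```

With the toy sequence, phase 1 reads ATG GCC AAA TAA, while phase 0 reads GAT GGC CAA ATA and drops the trailing base, which is the kind of out-of-frame result described in this issue.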

tariqdaouda · 2015-06-02T06:15:38Z

Hi,

Thanks for the issue. By default, pyGeno abides by the information provided by Ensembl. But if you know which proteins you are interested in, pyGeno provides tools for translating them in the reading frame of your choosing.

If you don't want to apply an offset, simply do:

import pyGeno.tools.UsefulFunctions as uf

uf.translateDNA(refProt.transcript.cDNA, frame = "f2")

If you want to apply an offset:

import pyGeno.tools.UsefulFunctions as uf

#get the exons
exons = refProt.transcript.exons

#apply the offset to the first exon
e = exons[0]
CDS1 = refProt.chromosome.getSequence(e.CDS_start -1, e.CDS_end)

#this will contain the shifted sequence
protSeq = [CDS1]
#loop through the other exons
for e in exons[1:]:
  protSeq.append(e.CDS)

#concatenate the sequences
protSeq = ''.join(protSeq)

uf.translateDNA(protSeq, frame = "f2")
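The exon-joining step above can be sketched on plain strings, which makes it easy to check what the shifted sequence looks like before translating it. This is only an illustration under stated assumptions: the chromosome is a Python string, coordinates are 0-based and half-open, and the transcript is on the forward strand. pyGeno's own coordinate conventions are not verified here, and `shifted_cds`, `chromosome` and `exon_bounds` are hypothetical names.

```python
def shifted_cds(chromosome, exon_bounds, offset=1):
    """Join the coding pieces of each exon, extending the first one upstream.

    chromosome  -- the chromosome sequence as a plain string
    exon_bounds -- list of (cds_start, cds_end) pairs, 0-based, half-open,
                   in transcript order on the forward strand
    offset      -- number of extra bases to take upstream of the first exon
    """
    first_start, first_end = exon_bounds[0]
    if offset < 0 or first_start - offset < 0:
        # A negative slice start would silently wrap around in Python.
        raise ValueError("offset reaches past the start of the chromosome")
    pieces = [chromosome[first_start - offset:first_end]]
    # The remaining exons are taken as they are.
    pieces.extend(chromosome[start:end] for start, end in exon_bounds[1:])
    return "".join(pieces)


# Toy chromosome: two coding pieces separated by a lowercase intron.
chromosome = "TTGATGGCCgtagAAATAATT"
exon_bounds = [(3, 9), (13, 19)]

unshifted = shifted_cds(chromosome, exon_bounds, offset=0)
shifted = shifted_cds(chromosome, exon_bounds, offset=1)
```

The shifted sequence carries one extra leading base, so translating it from its second base gives the same protein as translating the unshifted sequence from its first base.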

Merge with bloody

tariqdaouda closed this as completed Jun 5, 2015

tariqdaouda pushed a commit that referenced this issue Jan 8, 2018

Merge pull request #4 from tariqdaouda/bloody

dd65915

Merge with bloody
