select_uniprot_variants cannot handle mismatches with the canonical transcript for a given UniProt accession #15

Open
stuartmac opened this issue Feb 8, 2016 · 4 comments

Comments

@stuartmac
Member

```
from variants.to_table import select_uniprot_variants
select_uniprot_variants('P11586')
2016-02-08 09:36:41,670 - INFO - Starting new HTTP connection (1): www.uniprot.org
2016-02-08 09:36:41,994 - DEBUG - "GET /uniprot/P11586 HTTP/1.1" 200 None
2016-02-08 09:36:42,336 - INFO - Starting new HTTP connection (1): www.uniprot.org
2016-02-08 09:36:42,386 - DEBUG - "GET /uniprot/?query=accession%3AP11586&contact=&columns=organism%2Csequence&format=tab HTTP/1.1" 200 None
2016-02-08 09:36:42,390 - INFO - Starting new HTTP connection (1): rest.ensembl.org
2016-02-08 09:36:42,934 - DEBUG - "GET /xrefs/symbol/homo_sapiens/P11586 HTTP/1.1" 200 222
2016-02-08 09:36:42,937 - INFO - Starting new HTTP connection (1): rest.ensembl.org
2016-02-08 09:36:43,474 - DEBUG - "GET /sequence/id/ENSP00000450560?type=protein HTTP/1.1" 200 261
2016-02-08 09:36:43,475 - WARNING - Sequences don't match! skipping... ENSP00000450560
2016-02-08 09:36:43,477 - INFO - Starting new HTTP connection (1): rest.ensembl.org
2016-02-08 09:36:44,083 - DEBUG - "GET /sequence/id/ENSP00000216605?type=protein HTTP/1.1" 200 935
2016-02-08 09:36:44,084 - WARNING - Sequences don't match! skipping... ENSP00000216605
Traceback (most recent call last):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2883, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-13-974825407562>", line 1, in <module>
    select_uniprot_variants('P11586')
  File "/Users/smacgowan/PycharmProjects/ProteoFAV/variants/to_table.py", line 444, in select_uniprot_variants
    table = pd.concat(tables)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/tools/merge.py", line 812, in concat
    copy=copy)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/tools/merge.py", line 845, in __init__
    raise ValueError('No objects to concatenate')
ValueError: No objects to concatenate
```

This happened because both EnsEMBL proteins were skipped (leaving pd.concat with nothing to concatenate) due to a mismatch between the EnsEMBL and UniProt sequences:

[Image: EnsEMBL vs UniProt sequence mismatch]
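The log suggests this control flow: each EnsEMBL protein found for the accession is compared with the UniProt sequence, every mismatch is skipped, and when all candidates are skipped the list passed to `pd.concat` is empty, which pandas rejects with `ValueError: No objects to concatenate`. The sketch below reproduces that flow under stated assumptions; `strict_matches` and `require_matches` are hypothetical helpers, not functions from `variants/to_table.py`, and the sequences are toy data.

```python
import logging

log = logging.getLogger(__name__)


def strict_matches(uniprot_seq, ensembl_seqs):
    """Return the Ensembl protein IDs whose sequence is identical to UniProt's.

    Hypothetical helper: ensembl_seqs maps Ensembl protein ID -> sequence.
    Mirrors the "Sequences don't match! skipping..." behaviour in the log.
    """
    kept = []
    for ensp_id, seq in ensembl_seqs.items():
        if seq != uniprot_seq:
            log.warning("Sequences don't match! skipping... %s", ensp_id)
            continue
        kept.append(ensp_id)
    return kept


def require_matches(kept, accession):
    """Fail early with a specific message instead of pandas' generic one.

    Hypothetical guard: without it, concatenating an empty list of
    per-transcript tables raises "No objects to concatenate".
    """
    if not kept:
        raise ValueError(
            "No Ensembl protein sequence matches UniProt %s" % accession)
    return kept


# Toy data: both candidates differ from the UniProt sequence, as in the log.
uniprot = "MAEKT"
candidates = {"ENSP_A": "MAEKS", "ENSP_B": "MAE"}
kept = strict_matches(uniprot, candidates)
print(kept)  # []
```

Calling `require_matches(kept, 'P11586')` on that empty result raises a `ValueError` that names the accession, which points at the sequence mismatch rather than at pandas.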

We can work around it with a 'permissive' EnsEMBL-UniProt comparison that only checks that the sequences have the same length and logs the number of mismatches.
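A minimal sketch of the permissive check described above, assuming both sequences are plain strings of one-letter amino-acid codes. `count_mismatches` and `permissive_match` are hypothetical names used for illustration, not the project's implementation.

```python
import logging

log = logging.getLogger(__name__)


def count_mismatches(seq_a, seq_b):
    """Count positions at which two equal-length sequences differ."""
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def permissive_match(uniprot_seq, ensembl_seq, ensembl_id):
    """Accept an Ensembl sequence if it has the same length as UniProt's.

    Residue-level differences do not cause a rejection; they are only
    counted and logged so that they can be reviewed later.
    """
    if len(uniprot_seq) != len(ensembl_seq):
        log.warning("Lengths differ (%d vs %d), skipping... %s",
                    len(uniprot_seq), len(ensembl_seq), ensembl_id)
        return False
    n = count_mismatches(uniprot_seq, ensembl_seq)
    if n:
        log.warning("%d mismatch(es) between UniProt and %s", n, ensembl_id)
    return True


print(permissive_match("MAEKT", "MAEKS", "ENSP_A"))  # True, logs 1 mismatch
print(permissive_match("MAEKT", "MAE", "ENSP_B"))    # False, lengths differ
```

Requiring equal length keeps residue numbering comparable position by position, while tolerating point differences between the two references. Note that equal length alone does not rule out offsetting insertions and deletions, so the logged mismatch count is worth checking.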

@biomadeira
Collaborator

@stuartmac we have a method, compare_uniprot_ensembl_sequence, that would do this, but it doesn't log the number of mismatches yet.
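Besides the count, a log message can also say where the sequences differ. A hypothetical sketch, assuming equal-length sequences and 1-based positions written as compact "A123V"-style labels (UniProt residue, position, EnsEMBL residue); `describe_mismatches` is an illustrative name and does not reflect the signature of `compare_uniprot_ensembl_sequence`.

```python
def describe_mismatches(uniprot_seq, ensembl_seq):
    """List differences as '<UniProt residue><position><Ensembl residue>'.

    Positions are 1-based, matching UniProt residue numbering.
    Assumes both sequences have the same length.
    """
    if len(uniprot_seq) != len(ensembl_seq):
        raise ValueError("sequences must have the same length")
    return ["%s%d%s" % (u, i, e)
            for i, (u, e) in enumerate(zip(uniprot_seq, ensembl_seq), start=1)
            if u != e]


print(describe_mismatches("MAEKT", "MAEKS"))  # ['T5S']
```

The number of mismatches is then simply the length of the returned list.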

@stuartmac
Member Author

@biomadeira I implemented one yesterday on my branch (commit ca1bdd4).

@stuartmac
Member Author

@biomadeira @tbrittoborges That's interesting: the fix was pulled into master even though I opened the pull request before I made that change. So a branch is merged in its state at the time of the merge, not at the time the pull request was opened (?).

@tbrittoborges
Collaborator

I accepted the PR yesterday.
