Transcript analysis (Grch37/38) - Log #31

Open
zhx828 opened this issue Mar 26, 2020 · 6 comments

zhx828 commented Mar 26, 2020

biomart mapping file, genes that do not have an Entrez ID

genes not in cBioPortal

Hugo symbol does not match cBioPortal

Problem (mismatch) transcripts

problem_transcripts.txt

gene protein length check (genes without a protein length do not have Pfam data, and vice versa)

OncoKB issues

The good thing is that, for both 37 and 38, they are using the same transcript.
But there are still two issues:

  • the transcript GN uses and the one OncoKB uses are different (I used the msk-transcript column from GN)
  • some of the Hugo symbols are different
    grch37_mismatch_gn_oncokb.txt
zhx828 changed the title from "Transcript analysis - Log" to "Transcript analysis (Grch37/38) - Log" on Mar 26, 2020
Member

inodb commented Mar 27, 2020

This is great! Thank you so much!

Member

inodb commented Apr 2, 2020

Thanks again @zhx828 !

A few questions

  • What do you mean by:

    ones without protein length do not have pfam, vice versa

  • Do we currently have grch38 loaded in the main cBioPortal? I think there is a seed database, but I'm not sure if it's actually loaded. What does it mean exactly when a grch38 gene is not in cBioPortal?

  • What are problem mismatch transcripts? Do they mismatch between grch37 and grch38? Is that based on just the first part of the ID? E.g. for ENSTx.y, is x the same between grch37 and grch38, or does it include the y version part? Note that the protein length might change when .y changes between grch37 and grch38 (in most cases it does not, but occasionally it does).

Author

zhx828 commented Apr 2, 2020

  • What do you mean by: ones without protein length do not have pfam, vice versa

If you look at the *_info.txt files, the genes that do not have a protein length do not have Pfam data either. So we only need to look at one of the two factors.
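
A minimal sketch of how that claim could be verified, assuming the *_info.txt files are tab-separated with columns named `hugo_symbol`, `protein_length`, and `pfam`. Those column names and the sample rows are hypothetical; the real files may be laid out differently.

```python
import csv
import io

def inconsistent_genes(info_text):
    """Return genes where exactly one of protein length / Pfam data is missing."""
    reader = csv.DictReader(io.StringIO(info_text), delimiter="\t")
    bad = []
    for row in reader:
        has_length = bool((row.get("protein_length") or "").strip())
        has_pfam = bool((row.get("pfam") or "").strip())
        # The two fields should be present together or absent together.
        if has_length != has_pfam:
            bad.append(row["hugo_symbol"])
    return bad

# Made-up rows, only to illustrate the check.
sample = (
    "hugo_symbol\tprotein_length\tpfam\n"
    "GENE_A\t393\tPF_EXAMPLE\n"
    "GENE_B\t\t\n"
    "GENE_C\t120\t\n"
)
print(inconsistent_genes(sample))
```

An empty result would confirm that checking protein length alone is enough.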

  • Do we currently have grch38 loaded in the main cBioPortal? I think there is a seed database, but I'm not sure if it's actually loaded. What does it mean exactly when a grch38 gene is not in cBioPortal?

I just pulled the genes from the portal and ran through both versions to see whether these Entrez genes are in cBioPortal. I didn't compare between 37 and 38, though. They might be identical. I think this is mainly for Ramya to finalize the portal gene table.
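
This kind of check can be sketched as a set difference per assembly. The function name, variable names, and Entrez IDs below are made up for illustration, and the direction of the comparison (mapping file against portal) is an assumption.

```python
def missing_from_portal(mapping_entrez_ids, portal_entrez_ids):
    """Return Entrez IDs found in a mapping file but absent from the portal gene table."""
    return sorted(set(mapping_entrez_ids) - set(portal_entrez_ids))

# Placeholder IDs, not real genes.
portal = {1, 2, 3}
grch37 = [1, 2, 4]
grch38 = [1, 3, 5]

for name, ids in (("grch37", grch37), ("grch38", grch38)):
    print(name, missing_from_portal(ids, portal))
```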

  • What are problem mismatch transcripts? Do they mismatch between grch37 and grch38? Is that based on just the first part of the ID? E.g. for ENSTx.y, is x the same between grch37 and grch38, or does it include the y version part? Note that the protein length might change when .y changes between grch37 and grch38 (in most cases it does not, but occasionally it does).

I did not check y. I only compared x to see whether they are the same.
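
A sketch of an x-only comparison, assuming each assembly is represented as a dict from gene symbol to versioned transcript ID. The data structure, function names, and IDs are hypothetical.

```python
def stable_id(transcript_id):
    """Strip the version suffix: 'ENSTx.y' -> 'ENSTx'."""
    return transcript_id.split(".", 1)[0]

def mismatched_transcripts(grch37, grch38):
    """Return genes whose transcript differs between assemblies, ignoring the version."""
    mismatches = {}
    for gene in sorted(grch37.keys() & grch38.keys()):
        if stable_id(grch37[gene]) != stable_id(grch38[gene]):
            mismatches[gene] = (grch37[gene], grch38[gene])
    return mismatches

# Made-up transcript IDs for illustration.
grch37 = {"GENE_A": "ENST00000000001.2", "GENE_B": "ENST00000000002.1"}
grch38 = {"GENE_A": "ENST00000000001.5", "GENE_B": "ENST00000000009.1"}
print(mismatched_transcripts(grch37, grch38))
```

Here GENE_A is not reported because only its version part differs.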

@inodb

Member

inodb commented Apr 2, 2020

Thanks so much @zhx828 !

I did not check y. I only compared x to see whether they are the same.

I see. So there are a few corner cases where, even though the ID matches, the length might not be the same. Yeah, it's weird 🙂. So it's good to check whether the length matches as well, for starters at least for OncoKB annotated genes.
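
A minimal sketch of such a length check, assuming each assembly is available as a dict from versioned transcript ID to protein length. The input shape, function names, IDs, and lengths are all made up for illustration.

```python
def stable_id(transcript_id):
    """Strip the version suffix: 'ENSTx.y' -> 'ENSTx'."""
    return transcript_id.split(".", 1)[0]

def length_changes(grch37, grch38):
    """Return transcripts present in both assemblies whose protein length differs."""
    by_stable_38 = {stable_id(t): (t, n) for t, n in grch38.items()}
    changed = []
    for t37, len37 in grch37.items():
        hit = by_stable_38.get(stable_id(t37))
        # Same stable ID in both assemblies, but a different protein length.
        if hit is not None and hit[1] != len37:
            changed.append((t37, len37, hit[0], hit[1]))
    return changed

# Made-up IDs and lengths.
grch37 = {"ENST00000000001.2": 393, "ENST00000000002.1": 120}
grch38 = {"ENST00000000001.5": 393, "ENST00000000002.3": 125}
print(length_changes(grch37, grch38))
```

Restricting the input dicts to OncoKB annotated genes first would keep the initial pass small.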

Author

zhx828 commented Apr 2, 2020

Thanks so much @zhx828 !

I did not check y. I only compared x to see whether they are the same.

I see. So there are a few corner cases where, even though the ID matches, the length might not be the same. Yeah, it's weird 🙂. So it's good to check whether the length matches as well, for starters at least for OncoKB annotated genes.

Cool, will do. Thanks!

Author

zhx828 commented Jun 8, 2020

This is related to genome-nexus/genome-nexus#306
