feat(Execution workflow): CSI-UTR - report Format 04 BED file in workflow (per-PAS fractional relative usage) #388

Open
Tracked by #382
SamBryce-Smith opened this issue Jul 18, 2022 · 4 comments

@SamBryce-Smith
Collaborator

SamBryce-Smith commented Jul 18, 2022

Parent issue - #382

Per the updated execution workflow output specifications, we need CSI-UTR to report the per-PAS fractional relative usage in a Format 04 BED file.

CSI-UTR calculates several relative usage metrics, but the one that fits the Format 04 convention is the 'PSI' metric, which is the percentage usage of a poly(A) site relative to the total expression of all PAS in that gene/terminal exon. If PSI is reported on the percentage scale (0-100), it must be converted to a fraction by dividing by 100.
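A minimal sketch of that conversion in Python. `psi_to_fraction` is a hypothetical helper name (not part of CSI-UTR), and the range check is an added safeguard so that a value already on the 0-1 scale is not divided by 100 twice without anyone noticing.

```python
def psi_to_fraction(psi: float, percent_scale: bool) -> float:
    """Convert a PSI value to a fraction in [0, 1].

    percent_scale=True: value is on the 0-100 scale and is divided by 100.
    percent_scale=False: value is already on the 0-1 scale and is returned unchanged.
    """
    fraction = psi / 100.0 if percent_scale else psi
    # Hypothetical safeguard: a fractional relative usage must lie in [0, 1].
    if not 0.0 <= fraction <= 1.0:
        raise ValueError(f"PSI out of range after conversion: {psi!r}")
    return fraction


print(psi_to_fraction(50.5, percent_scale=True))
print(psi_to_fraction(0.505286, percent_scale=False))
```

Passing the scale explicitly (rather than guessing from the magnitude) avoids misreading a genuine fractional value such as 0.9 as a percentage.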

@SamBryce-Smith SamBryce-Smith changed the title CSI-UTR Execution workflow: CSI-UTR - report Format 04 BED file in workflow (per-PAS fractional relative usage) Jul 18, 2022
@yuukiiwa
Collaborator

yuukiiwa commented Aug 4, 2022

Hi @SamBryce-Smith and @faricazjj, CSI-UTR's differential analysis output reports the PSI values for both samples on a 0-1 scale, so they do not need to be divided by 100. Should I split the differential analysis output into two quantification BED files, one per sample?
Here is the example differential analysis output from CSI-UTR/TestCases.md:

```
CSI                                            ENSGENE          GENE_SYM  PSI1 (LOAD)  PSI2 (Control)  deltaPSI (LOAD-Control)  P-value   FDR
ENSG00000189241:116278517_116277027-116276921  ENSG00000189241  TSPYL1    0.068122     0.089263        -0.021141                5e-05     0.00679553001277139
ENSG00000189241:116278517_116277126-116277027  ENSG00000189241  TSPYL1    0.086873     0.109091        -0.022218                0.000114  0.012597769470405
ENSG00000189241:116278517_116278517-116277246  ENSG00000189241  TSPYL1    0.505286     0.456247        0.049039                 0         0
ENSG00000100796:91458759_91458759-91458147     ENSG00000100796  PPP4R3A   0.580057     0.443966        0.136091                 0.000605  0.0425812764550264
ENSG00000174684:66346049_66345714-66345577     ENSG00000174684  B4GAT1    0.314256     0.342451        -0.028195                0.000374  0.0302434133738602
ENSG00000174684:66346049_66346049-66345844     ENSG00000174684  B4GAT1    0.215083     0.177637        0.037446                 0         0
ENSG00000119314:112223851_112219214-112219045  ENSG00000119314  PTBP3     0.134671     0.074944        0.059727                 3e-06     0.000676385593220339
ENSG00000196652:99532247_99532615-99532684     ENSG00000196652  ZKSCAN5   0.036234     0.11213         -0.075896                6.2e-05   0.00795888540410133
ENSG00000126785:63291182_63291740-63291852     ENSG00000126785  RHOJ      0.047283     0.178141        -0.130858                0.000583  0.0415550529135968
ENSG00000115310:54973156_54972313-54972195     ENSG00000115310  RTN4      0.08397      0.101558        -0.017588                0         0
ENSG00000115310:54973156_54972352-54972313     ENSG00000115310  RTN4      0.144597     0.174865        -0.030268                0         0
ENSG00000115310:54973156_54972948-54972890     ENSG00000115310  RTN4      0.089743     0.076191        0.013552                 6.6e-05   0.00830211347517731
```

Thanks!
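A rough Python sketch of the proposed split, assuming the real output file is tab-separated with the column headers shown in the example (`PSI1 (LOAD)`, `PSI2 (Control)`). The function names, the `intervals` mapping and the BED6 layout with fractional usage in the score column are all hypothetical: the chromosome, coordinates and strand of each PAS cannot be read from this table alone (that is still an open question in this thread), so the caller has to supply them, and the exact column layout should be checked against the Format 04 specification.

```python
import csv
import io
from typing import Dict, List, Tuple

# Assumed column names, copied from the example output; "LOAD"/"Control" are condition labels.
PSI_COLUMNS = {"LOAD": "PSI1 (LOAD)", "Control": "PSI2 (Control)"}


def split_by_condition(tsv_text: str) -> Dict[str, List[Tuple[str, float]]]:
    """Split a tab-separated CSI-UTR differential table into per-condition (CSI, PSI) lists."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    per_condition: Dict[str, List[Tuple[str, float]]] = {cond: [] for cond in PSI_COLUMNS}
    for row in reader:
        for cond, column in PSI_COLUMNS.items():
            # PSI is already on the 0-1 scale here, so it is not divided by 100.
            per_condition[cond].append((row["CSI"], float(row[column])))
    return per_condition


def to_bed_lines(
    records: List[Tuple[str, float]],
    intervals: Dict[str, Tuple[str, int, int, str]],
) -> List[str]:
    """Hypothetical BED6 writer; intervals maps CSI -> (chrom, start, end, strand)."""
    lines = []
    for csi, psi in records:
        chrom, start, end, strand = intervals[csi]
        # Assumed layout: name = CSI identifier, score = fractional relative usage.
        lines.append(f"{chrom}\t{start}\t{end}\t{csi}\t{psi}\t{strand}")
    return lines
```

Each condition's list can then be written to its own BED file, which gives the "two quantification BEDs" described above, once a source for the PAS coordinates has been settled.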

@faricazjj
Collaborator

@yuukiiwa Thanks for looking into this! :D From what I understand of the output, we could split the differential analysis output into two quantification BED files, one per condition. But I'm going to tag @mrgazzara here for extra input :p I have two questions:

  1. This tool needs two conditions with two replicates per condition, and its output is relative usage per condition. Is this still something we want to implement now, given that the other tools calculate relative usage per sample rather than per condition?
  2. Where in the output file do we extract the PAS from?
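On question 2: the only positional information visible in the example output is inside the `CSI` identifier. Below is a Python sketch that pulls it apart, assuming the pattern `GENEID:A_B-C` seen in every example row. What A, B and C denote (and whether B > C indicates the minus strand, as the TSPYL1 vs. ZKSCAN5 rows might suggest) is not stated in this thread and would need to be checked against the CSI-UTR documentation; the chromosome is not in the identifier at all. `parse_csi` and `CsiId` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# Assumed pattern, inferred only from the example rows: GENEID:A_B-C with integer A, B, C.
CSI_PATTERN = re.compile(r"^(?P<gene>[^:]+):(?P<a>\d+)_(?P<b>\d+)-(?P<c>\d+)$")


@dataclass(frozen=True)
class CsiId:
    gene_id: str
    a: int  # meaning not documented in this thread
    b: int  # one end of the interval (order may depend on strand; unverified)
    c: int  # other end of the interval

    def interval(self) -> Tuple[int, int]:
        """Return (low, high) of B and C regardless of the order they were written in."""
        return (min(self.b, self.c), max(self.b, self.c))


def parse_csi(csi: str) -> CsiId:
    """Split a CSI identifier into its gene ID and three integer fields."""
    match = CSI_PATTERN.match(csi)
    if match is None:
        raise ValueError(f"Unexpected CSI identifier: {csi!r}")
    return CsiId(match["gene"], int(match["a"]), int(match["b"]), int(match["c"]))


print(parse_csi("ENSG00000189241:116278517_116277027-116276921").interval())
```

Even with this parsed, a PAS position for the BED file would still require the chromosome (for example from a gene annotation) and a decision on which end of the interval represents the cleavage site.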

@mrgazzara
Collaborator

I will have to look into this a little further. The usual way to get individual-sample quantification from tools like this, which require multiple conditions (because they are focused on differential usage), is to run the same sample against itself. The requirement to also have replicates might be a dealbreaker. I need to read the paper to check.

@faricazjj
Collaborator

@mrgazzara When I implemented it, I think I tried running it with the same sample against itself, giving the two conditions different names and using that same sample for both replicates (also with different "replicate" names), but it errored out. The only way I could run it was with distinct replicates.

@ninsch3000 ninsch3000 changed the title Execution workflow: CSI-UTR - report Format 04 BED file in workflow (per-PAS fractional relative usage) feat [Execution workflow]: CSI-UTR - report Format 04 BED file in workflow (per-PAS fractional relative usage) Aug 9, 2023
@ninsch3000 ninsch3000 changed the title feat [Execution workflow]: CSI-UTR - report Format 04 BED file in workflow (per-PAS fractional relative usage) feat(Execution workflow): CSI-UTR - report Format 04 BED file in workflow (per-PAS fractional relative usage) Aug 9, 2023