docs(Execution workflow): update output file specifications for reporting region-level 'polyA site usage scores' #380

Open
2 tasks
SamBryce-Smith opened this issue Jul 13, 2022 · 0 comments
Assignees
Labels
documentation Improvements or additions to documentation

SamBryce-Smith commented Jul 13, 2022

Related to #372 and its PR #373

Tools such as LABRAT and APAlyzer report their own metrics to represent relative PAS usage within a gene or terminal exon, rather than the relative usage of each individual polyA site within the region. We want to be able to consider tools like these for the relative quantification benchmarking event.

Tasks:

  • Update execution_workflows/execution_output_specification.md - specify an execution workflow output file format to capture the region-level 'score' for relative PAS usage.
  • Add example output files corresponding to the new specification in execution_workflows/example_output_files.

We are currently prioritising the implementation of the fractional PAS usage quantification benchmark. For now, this issue serves as a reminder to come back to this if we can, with references to previous work and discussions!


Previous suggestion in PR #373

This generated some discussion and was probably not the consensus view. See comments from @dominikburri and my response.

Format 05

This BED file contains positions of regions (e.g. terminal exons of genes, whole genes) and the relative usage values for each identified region in the score column.

Fields:

  • chrom - the name of the chromosome
  • chromStart - the starting position of the feature in the chromosome; this corresponds to the first nucleotide of the region (e.g. terminal exon, gene); the starting position is 0-based, i.e. the first base on the chromosome is numbered 0
  • chromEnd - the ending position of the feature in the chromosome; this corresponds to the last nucleotide of the region.
  • name - defines the name of the identified region. It's recommended to use a conventional identifier (e.g. Ensembl transcript ID, gene ID)
  • score - relative usage value for the identified region
  • strand - defines the strand; either "." (=no strand) or "+" or "-".
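
To make the proposal concrete, here is a minimal sketch of how one line in this format could be parsed and written back. It assumes the six fields are tab-separated in the order listed above and that the score is a plain number; `RegionUsage`, `parse_region_line`, `format_region_line` and the `geneA` identifier are hypothetical names for illustration, not part of the APAeval codebase. No range check is applied to the score, since the region-level metrics differ between tools.

```python
from typing import NamedTuple


class RegionUsage(NamedTuple):
    """One region-level record with the six fields listed above (hypothetical type)."""
    chrom: str
    chrom_start: int  # 0-based start of the region
    chrom_end: int
    name: str         # e.g. a gene or transcript identifier
    score: float      # tool-specific relative usage value
    strand: str       # "+", "-" or "."


VALID_STRANDS = {"+", "-", "."}


def parse_region_line(line: str) -> RegionUsage:
    """Parse one tab-separated line into a RegionUsage record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 tab-separated fields, got {len(fields)}")
    chrom, start, end, name, score, strand = fields
    if strand not in VALID_STRANDS:
        raise ValueError(f"invalid strand: {strand!r}")
    record = RegionUsage(chrom, int(start), int(end), name, float(score), strand)
    if record.chrom_start < 0 or record.chrom_end < record.chrom_start:
        raise ValueError("invalid coordinates")
    return record


def format_region_line(record: RegionUsage) -> str:
    """Write a record back as a tab-separated line (no trailing newline)."""
    return "\t".join(str(value) for value in record)


if __name__ == "__main__":
    # Made-up example line; the values are illustrative only.
    example = "chr1\t1000\t2500\tgeneA\t0.75\t+\n"
    rec = parse_region_line(example)
    print(rec)
    print(format_region_line(rec))
```

A check like this could be run over the example output files once they are added, but the exact validation rules would need to follow whatever specification is eventually agreed.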
@SamBryce-Smith SamBryce-Smith added the documentation Improvements or additions to documentation label Jul 13, 2022
@SamBryce-Smith SamBryce-Smith self-assigned this Jul 13, 2022
@ninsch3000 ninsch3000 changed the title Execution workflows: update output file specifications for reporting region-level 'polyA site usage scores' docs [Execution workflows]: update output file specifications for reporting region-level 'polyA site usage scores' Aug 9, 2023
@ninsch3000 ninsch3000 changed the title docs [Execution workflows]: update output file specifications for reporting region-level 'polyA site usage scores' docs(Execution workflow): update output file specifications for reporting region-level 'polyA site usage scores' Aug 9, 2023