##### EXPRESSION_BY_FEATURE REPOSITORY DESCRIPTION
A single gene frequently encodes multiple overlapping transcripts, and genes themselves occasionally overlap. Classifying genomic regions as exonic, intronic, or another related category can therefore be ambiguous.
fix_genic_features.py examines a transcript annotation that was downloaded from the UCSC Table Browser in GTF format, then annotated with gene names and manually curated to improve the annotation of non-coding transcripts. See curating_RefSeq_GTF for a description of the annotation and curation procedures.
fix_genic_features.py resolves ambiguities in the genomic annotation on a per-gene basis. The program applies a hierarchy principle which, for most protein-coding genes, makes it possible to assign the following six types of non-overlapping genomic features:
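
For readers unfamiliar with the input format, here is a minimal sketch of reading one GTF line into a structured record. It relies only on the standard nine tab-separated GTF columns; the names `GtfRecord` and `parse_gtf_line` and the sample line are hypothetical and are not taken from fix_genic_features.py.

```python
import re
from typing import Dict, NamedTuple


class GtfRecord(NamedTuple):
    """One GTF line (hypothetical container, not the repository's own type)."""
    seqname: str
    source: str
    feature: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    score: str
    strand: str
    frame: str
    attributes: Dict[str, str]


# Matches attribute pairs such as: gene_id "G1";
_ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"\s*;')


def parse_gtf_line(line: str) -> GtfRecord:
    """Split a GTF line into its nine columns and parse the attribute column."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    attributes = dict(_ATTR_RE.findall(fields[8]))
    return GtfRecord(fields[0], fields[1], fields[2], int(fields[3]),
                     int(fields[4]), fields[5], fields[6], fields[7], attributes)


if __name__ == "__main__":
    # Made-up line for illustration only.
    example = 'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";'
    print(parse_gtf_line(example))
```
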
- upstream genic regions
- 3'UTRs
- CDS
- introns
- 5'UTRs
- downstream genic regions
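
The hierarchy principle can be illustrated with a minimal sketch that flattens overlapping labelled intervals of one gene into non-overlapping segments, keeping the highest-priority label wherever several overlap. The priority order in `PRIORITY` is an assumption chosen for illustration; the order actually used by fix_genic_features.py is described in its own documentation. The function name `resolve_features` and the half-open coordinate convention are likewise hypothetical.

```python
from typing import List, Tuple

# ASSUMED priority order, highest first. Illustrative only; not necessarily
# the hierarchy implemented in fix_genic_features.py.
PRIORITY = ["CDS", "5'UTR", "3'UTR", "intron", "upstream", "downstream"]
RANK = {name: i for i, name in enumerate(PRIORITY)}

Interval = Tuple[int, int, str]  # (start, end, feature), half-open [start, end)


def resolve_features(intervals: List[Interval]) -> List[Interval]:
    """Return non-overlapping segments; the highest-priority feature wins."""
    # Every interval boundary is a potential change point.
    points = sorted({p for start, end, _ in intervals for p in (start, end)})
    resolved: List[Interval] = []
    for left, right in zip(points, points[1:]):
        # Features that fully cover this elementary segment.
        covering = [f for start, end, f in intervals
                    if start <= left and right <= end]
        if not covering:
            continue  # gap between features
        best = min(covering, key=RANK.__getitem__)
        # Merge with the previous segment when it is adjacent and identical.
        if resolved and resolved[-1][2] == best and resolved[-1][1] == left:
            resolved[-1] = (resolved[-1][0], right, best)
        else:
            resolved.append((left, right, best))
    return resolved


if __name__ == "__main__":
    # Two transcripts of one gene: an intron of one overlaps the CDS of the other.
    example = [(0, 100, "intron"), (50, 150, "CDS"), (150, 200, "3'UTR")]
    print(resolve_features(example))
```

With the assumed order, positions 50 to 100 are covered by both an intron and a CDS and are reported as CDS, so the output contains each position at most once.
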
The output of fix_genic_features.py is a whole-genome annotation. This annotation makes it possible to quickly characterize every unique alignment of an NGS library with the rpkm_genic_features.bxt5.py program, and then to quantify the expression of every gene and of every feature type with summarize_features.py.
Detailed documentation of the programs in this repository is in the doc/ directory:
fix_genic_features.py documentation
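
As a rough illustration of the quantification step, the sketch below computes RPKM (reads per kilobase of feature per million mapped reads) for each gene and feature type. It uses the standard RPKM definition; the function names, the record layout and the example numbers are hypothetical and do not reflect the actual interfaces of rpkm_genic_features.bxt5.py or summarize_features.py.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (gene, feature_type, length_bp, read_count)
Record = Tuple[str, str, int, int]


def rpkm(read_count: int, length_bp: int, total_mapped_reads: int) -> float:
    """Standard RPKM: reads * 1e9 / (feature length in bp * total mapped reads)."""
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total read count must be positive")
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def summarize_by_feature_type(records: Iterable[Record],
                              total_mapped_reads: int
                              ) -> Dict[Tuple[str, str], float]:
    """Pool segments of the same gene and feature type, then compute RPKM."""
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    lengths: Dict[Tuple[str, str], int] = defaultdict(int)
    for gene, feature_type, length_bp, read_count in records:
        counts[(gene, feature_type)] += read_count
        lengths[(gene, feature_type)] += length_bp
    return {key: rpkm(counts[key], lengths[key], total_mapped_reads)
            for key in counts}


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    example = [
        ("GENE_A", "CDS", 1000, 300),
        ("GENE_A", "CDS", 1000, 200),
        ("GENE_A", "intron", 5000, 50),
    ]
    for key, value in summarize_by_feature_type(example, 10_000_000).items():
        print(key, value)
```

Pooling read counts and lengths before dividing, rather than averaging per-segment RPKM values, keeps long and short segments of the same feature type weighted by their size.
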