
This repository contains the scripts used for genome assembly gap filling and annotation


Mjaraespejo/Codes_GenomeAnnotation


Codes_GenomeAnnotation

This repository contains multiple bash, Python and R scripts used for genome assembly gap filling and annotation. The gap filling leveraged PacBio HiFi long reads to improve genome completeness. The protein-coding gene annotation was built using multi-stage RNA-seq libraries.

In 1_1_Assembly_gapfilling.ipynb are the files and scripts used:

In 1_2_GenomeAssembly_analysis.ipynb are the files and scripts to:

In 2_1_Genome_annotation_TE.ipynb are the files and scripts to:

In 2_2_TEannotation_analysis.ipynb are the files and scripts to:

  • Analyse and plot the representation of TE families
  • Analyse and graph the prevalence of TE families within the genome
  • Analyse and graph TE distribution in the genome
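As a rough illustration of the prevalence analysis listed above, the sketch below computes the fraction of the genome covered by each TE family. It is not taken from the notebook: the function name `te_family_prevalence`, the `(family, start, end)` record layout, the family labels and the genome size are all assumptions made for this example, and overlapping copies are simply summed rather than merged.

```python
from collections import defaultdict


def te_family_prevalence(annotations, genome_size):
    """Return {family: fraction of the genome covered by that family}.

    `annotations` is an iterable of (family, start, end) tuples with
    1-based inclusive coordinates (a hypothetical layout). Overlapping
    copies are counted twice; merge intervals first if that matters.
    """
    covered = defaultdict(int)
    for family, start, end in annotations:
        covered[family] += end - start + 1
    return {family: bp / genome_size for family, bp in covered.items()}


# Toy data, invented for illustration only.
annotations = [
    ("LINE/RTE", 1, 500),
    ("DNA/hAT", 1001, 1250),
    ("LINE/RTE", 2001, 2250),
]
prevalence = te_family_prevalence(annotations, genome_size=10_000)

# Print families from most to least prevalent.
for family, fraction in sorted(prevalence.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{family}\t{fraction:.2%}")
```

The resulting dictionary can be passed directly to a plotting library to draw the bar charts the notebook describes.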

In 2_3_Genome_annotation_CodingGenes.ipynb are the steps to run the transcriptome assembly based on 79 poly(A) RNA-seq libraries. Briefly, we integrated the 79 RNA-seq samples and the current Parhyale reference annotation into a StringTie2 annotation pipeline. Some analyses include:

In 2_4_CodingGenes_Annotation_analysis.ipynb are the steps to run the identification of protein-coding genes. The analyses run include:

  • Reduce redundancy of annotated transcripts using CD-HIT (https://www.bioinformatics.org/cd-hit/)
  • ORF identification with TransDecoder
  • Pfam and BlastP searches to enable homology-based coding region identification
  • Run Pfam search on longest ORFs
  • Run BlastP search on longest ORFs
  • Final coding region predictions
  • Select a representative transcript per gene
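The last step above, picking one transcript per gene, can be sketched as follows. The selection criterion used in the notebook is not stated here, so this example assumes the transcript with the longest predicted ORF is kept; the function name `select_representatives`, the `(gene_id, transcript_id, orf_length)` layout and the identifiers are hypothetical.

```python
def select_representatives(transcripts):
    """Return {gene_id: transcript_id}, keeping the longest ORF per gene.

    `transcripts` is an iterable of (gene_id, transcript_id, orf_length)
    tuples (an assumed layout). On ties, the first transcript seen wins.
    """
    best = {}
    for gene_id, transcript_id, orf_length in transcripts:
        current = best.get(gene_id)
        if current is None or orf_length > current[1]:
            best[gene_id] = (transcript_id, orf_length)
    return {gene_id: transcript_id for gene_id, (transcript_id, _) in best.items()}


# Toy data, invented for illustration only.
transcripts = [
    ("gene1", "gene1.t1", 300),
    ("gene1", "gene1.t2", 450),
    ("gene2", "gene2.t1", 120),
]
representatives = select_representatives(transcripts)
print(representatives)
```

If a different criterion is preferred (for example expression level or homology support), only the comparison inside the loop needs to change.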

In 2_5_BLAST_and_Orthofinder.ipynb are the steps to run the annotation of protein-coding genes using BLAST and OrthoFinder. The analyses run include:

In 2_8_Process_MappingStats.ipynb are the steps to summarise RNA-seq mapping metrics.
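A minimal sketch of what such a summary might look like is given below. It assumes the per-sample read counts have already been extracted from the aligner logs into a dictionary; the function name `summarise_mapping`, the sample names and the counts are invented for this example and do not come from the notebook.

```python
from statistics import mean


def summarise_mapping(counts):
    """Summarise mapping rates from {sample: (total_reads, mapped_reads)}."""
    rates = {sample: mapped / total for sample, (total, mapped) in counts.items()}
    return {
        "per_sample": rates,                      # mapping rate for each library
        "mean_rate": mean(rates.values()),        # average across libraries
        "min_sample": min(rates, key=rates.get),  # library with the lowest rate
    }


# Toy data, invented for illustration only.
counts = {
    "sample_A": (1_000_000, 900_000),
    "sample_B": (2_000_000, 1_500_000),
}
summary = summarise_mapping(counts)
print(f"mean mapping rate: {summary['mean_rate']:.1%}")
print(f"lowest-mapping library: {summary['min_sample']}")
```

Flagging the lowest-mapping library is a simple way to spot samples that may need to be inspected or excluded before annotation.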

Please email [email protected] if there is any problem, thanks! (Manuel)

This is a tool to view the html files.
