-
Notifications
You must be signed in to change notification settings - Fork 0
bioinfocao/pymaf
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Take known or novel transcripts GTF as input, Return transcripts' fasta sequence extracted from multiple species whole genome alignments MAF file (such as UCSC multiz100way) Requirements: ucsc-mafsinregion. Download and install as: http://hgdownload.cse.ucsc.edu/admin/exe/ https://anaconda.org/bioconda/ucsc-mafsinregion Usage: python GTF2Fasta.py -i inout known/novel GTF file -m maf files directory # multiz MAF files dir (chr1.maf.gz ... chrY.maf.gz) -n yes/no filterChrNNN # limit to chrN,chrNN,chrNNN. Any chromosome name longer than 6 letters are skipped. default is yes. -z yes/no # the MAF file in zipped format (as *.maf.gz this is default) file or in unzipped foramt .maf (no) -o output_fasta_dir # the dir for transcripts fasta output, if omitted create extracted_fasta_files folder under input GTF file. -t transcript_catMAF_dir # if not specified will put in output_fasta_dir folder -f yes/no # make fresh transcript_catMAF file? if no, will skip maf file that already exist (size>1). default is no. -s mm10,.../all/multiz60way29mammal/multiz100way58mammal # selected species names, default is keep all species. multiz60way29mammal means 29 mammals, multiz100way58mammal means 58 mammals(see below), all means keep all species, which is default. Please note that not all species' data will exist for every maf file -c transcripts_gtf_to_fasta_cmds.sh # output .sh cmd file, without this argument cmds will be excuate automatically. -g yes/no # only keep MAF files that is at least have another two species' fasta sequence longer than 1/10 of the frist(reference) species, default is yes. -d yes/no # delete temporary nonoverlapExons_bedFiles_ files, default is yes -v 0/1/2 # verbose level default is 1, 0 means nothing, 2 is detail. Example: python GTF2Fasta.py -i hg38_RefSeq.gtf -m ~/data/refgenomes/multiz100way_hg38 -o ~/data/hg38_transcript_catMAF_fasta/ -c hg38_RefSeq_gtf_to_fasta_cmds.sh
About
take transcript GTF as input and return transcripts fasta sequence extracted from MAF file
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published