Building Custom Genome Instructions #449
How to build genome database

Install pipeline's Conda environment.

    $ bash scripts/uninstall_conda_env.sh  # to remove any existing pipeline env
    $ conda activate encd-atac

How to build genome database for your own genome (.fasta.gz)

Get a URL for a gzipped blacklist BED file for your genome. If you don't have one then skip this step. An example blacklist for hg38 is here.

Find the following lines in scripts/build_genome_data.sh and modify them as follows.

- Give a good name, [YOUR_OWN_GENOME], for your genome.
- For MITO_CHR_NAME, use the correct mitochondrial chromosome name of your genome (e.g. chrM or MT).
- For REGEX_BFILT_PEAK_CHR_NAME, a Perl-style regular expression must be used to keep only regular chromosome names in the blacklist-filtered (.bfilt.) peak files. These .bfilt. peak files are considered the final peak output of the pipeline, and the peak BED files for genome browser tracks (.bigBed and .hammock.gz) are converted from these .bfilt. peak files. Chromosome name filtering with REGEX_BFILT_PEAK_CHR_NAME will be done even without the blacklist itself.

    ...
    elif [[ $GENOME == "YOUR_OWN_GENOME" ]]; then
      # Perl style regular expression to keep regular chromosomes only.
      # this reg-ex will be applied to peaks after blacklist filtering (b-filt) with "grep -P".
      # so that b-filt peak file (.bfilt.*Peak.gz) will only have chromosomes matching with this pattern
      # this reg-ex will work even without a blacklist.
      # you will still be able to find a .bfilt. peak file
      REGEX_BFILT_PEAK_CHR_NAME="chr[\dXY]+"
      # mitochondrial chromosome name (e.g. chrM, MT)
      MITO_CHR_NAME="chrM"
      # URL for your reference FASTA (fasta, fasta.gz, fa, fa.gz, 2bit)
      REF_FA="https://some.where.com/your.genome.fa.gz"
      # 3-col blacklist BED file to filter out overlapping peaks from b-filt peak file (.bfilt.*Peak.gz file).
      # leave it empty if you don't have one
      BLACKLIST=

    $ bash scripts/build_genome_data.sh [YOUR_OWN_GENOME] [DESTINATION_DIR]
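For a custom species it may help to preview which chromosome names a candidate REGEX_BFILT_PEAK_CHR_NAME would keep before building anything. The following is only a sketch, not the pipeline's own code: the sample peak lines are made up, and anchoring the pattern to the whole first column with `^...\t` is my assumption (the instructions above only say the regex is applied with "grep -P"). It needs a grep built with PCRE support, e.g. GNU grep.

```shell
# Candidate value, copied from the example in the instructions above.
regex='chr[\dXY]+'

# Hypothetical 3-column peak lines; names and coordinates are made up.
peaks=$(printf 'chr1\t100\t200\nchrX\t5\t50\nchrM\t10\t20\nchrUn_KI270742v1\t1\t9\nchr2_random\t3\t8\n')

# Assumption: match the entire chromosome column (line start up to the first tab),
# so that a name such as chr2_random is not kept through a partial match on "chr2".
kept=$(printf '%s\n' "$peaks" | grep -P "^${regex}\t")

# Show the lines that would survive the chromosome-name filter.
printf '%s\n' "$kept"
```

With these sample lines only the chr1 and chrX entries survive. If your assembly uses names without a "chr" prefix, or scaffold names you want to keep, adjust the regex and rerun the same check against a few lines of your own peak file.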
I have been using the pipeline with singularity for data coming from human and mouse, and I would now like to use it for data coming from a custom species. How do I build a custom genome with the latest version of the pipeline? Thanks so much!