Contact: hloucks@ucsc.edu
This workflow runs the FCS-GX and FCS-adaptor screens on an assembly. It outputs the reports generated by both screens, plus a gzipped FASTA with contaminant sequences removed and adapters hard masked.
- Assembly
- FCS-GX database - download instructions here (validation step before running)
If running locally, the inputs.json file should look something like this:
{
  "RunFCS.assembly": "Assembly.fa.gz",
  "RunFCS.blast_div": "/test-only/test-only.blast_div.tsv.gz",
  "RunFCS.GXI": "/test-only/test-only.gxi",
  "RunFCS.GXS": "/test-only/test-only.gxs",
  "RunFCS.manifest": "/test-only/test-only.manifest",
  "RunFCS.metaJSON": "/test-only/test-only.meta.jsonl",
  "RunFCS.seq_info": "/test-only/test-only.seq_info.tsv.gz",
  "RunFCS.taxa": "/test-only/test-only.taxa.tsv",
  "RunFCS.diskSizeGBGX": 500,
  "RunFCS.diskSizeGBAdapter": 32,
  "RunFCS.threadCount": 20,
  "RunFCS.preemptible": 1
}
The workflow localizes all of the database files listed above. The readme file that ships with the database is not an input and can be ignored.
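Since all seven database inputs share one directory and one file prefix, the inputs.json can be generated rather than typed by hand. The following is a minimal Python sketch, not part of the workflow itself. The function names `build_inputs` and `missing_files` are hypothetical helpers; the input keys, file suffixes, and resource values are copied from the example above and should be adjusted for your run.

```python
import json
import os

# Database file suffixes, copied from the example inputs.json above.
DB_SUFFIXES = {
    "RunFCS.blast_div": ".blast_div.tsv.gz",
    "RunFCS.GXI": ".gxi",
    "RunFCS.GXS": ".gxs",
    "RunFCS.manifest": ".manifest",
    "RunFCS.metaJSON": ".meta.jsonl",
    "RunFCS.seq_info": ".seq_info.tsv.gz",
    "RunFCS.taxa": ".taxa.tsv",
}


def build_inputs(assembly, db_dir, db_prefix):
    """Return an inputs dict for the workflow (hypothetical helper)."""
    inputs = {"RunFCS.assembly": assembly}
    for key, suffix in DB_SUFFIXES.items():
        inputs[key] = db_dir.rstrip("/") + "/" + db_prefix + suffix
    # Resource settings taken from the example above; tune as needed.
    inputs["RunFCS.diskSizeGBGX"] = 500
    inputs["RunFCS.diskSizeGBAdapter"] = 32
    inputs["RunFCS.threadCount"] = 20
    inputs["RunFCS.preemptible"] = 1
    return inputs


def missing_files(inputs):
    """List the input keys whose file path does not exist locally."""
    return [
        key
        for key, value in inputs.items()
        if isinstance(value, str) and not os.path.isfile(value)
    ]


if __name__ == "__main__":
    inputs = build_inputs("Assembly.fa.gz", "/test-only", "test-only")
    with open("inputs.json", "w") as handle:
        json.dump(inputs, handle, indent=2)
    for key in missing_files(inputs):
        print("missing file for", key, "->", inputs[key])
```

Checking for missing files before launching is only meaningful for local runs, where the paths refer to the local filesystem.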
- Assembly.clean.fasta.gz - assembly with contaminant contigs/scaffolds removed
- Assembly.contam.fasta.gz - FASTA file containing the contaminant contigs/scaffolds
- Assembly.fcs_gx_report.txt - FCS-GX report of the genomic contamination
- Assembly.fa.adapterClean.fa.gz - cleaned assembly with adapter sequences hard masked
- fcs_adaptor_report.txt - report of the adapter contamination identified
- This workflow is hard-coded for human assemblies.
- As of 6/13/23, the output of FCS-adapter is labeled .gz but is not actually gzipped, and this workflow is written around that behavior. If FCS-adapter fixes this, the workflow will need to be updated.
- Output file names often include ".fa" because of the naming convention used by the FCS-GX screen.
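Because a .gz extension does not guarantee gzip content (see the note above), it can help to check a file's first two bytes before passing it to downstream tools. This is a minimal Python sketch under that assumption; `is_gzipped` and `ensure_gzipped` are hypothetical helper names, not part of the workflow or of FCS.

```python
import gzip
import shutil
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def is_gzipped(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def ensure_gzipped(path):
    """Compress the file in place if it is not gzipped.

    Returns True if the file was compressed, False if it already was.
    """
    path = Path(path)
    if is_gzipped(path):
        return False
    tmp = path.with_name(path.name + ".tmp")
    with open(path, "rb") as src, gzip.open(tmp, "wb") as dst:
        shutil.copyfileobj(src, dst)
    tmp.replace(path)  # swap the compressed copy into place
    return True
```

For example, `is_gzipped("Assembly.fa.adapterClean.fa.gz")` reports whether that output is really compressed, and `ensure_gzipped` on the same path compresses it only when needed.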