metaGEM parser
The purpose of this page is to explain the inner workings of the metaGEM.sh parser, which is designed to simplify metaGEM's user experience/interface. Note that the procedures described below are carried out automatically by the metaGEM.sh parser file, which orchestrates the submission of jobs on the cluster.
Most importantly, the parser:
- Configures the Snakefile to execute the user-defined rule
- Configures the config.yaml file to set the root path
- Configures the cluster_config.json file based on user input
- Submits jobs
Each of these operations is discussed in further detail below.
Please see the tutorial for a demonstration of how to use the metaGEM.sh parser.
The metaGEM.sh parser modifies the input of rule all at the top of the Snakefile so that it matches the wildcard-expanded output of the desired rule. This is necessary because target rules cannot contain wildcards in Snakemake. In short, the metaGEM.sh parser stores a string for each recognized user input task, such as fastp, megahit, concoct, etc. The parser checks the user input for the --task|-t flag and compares it to a list of recognized tasks. If the user input matches a recognized task, then a task-specific string is defined as shown below:
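# Excerpt from the task-matching chain in metaGEM.sh
elif [ "$task" == "fastp" ]; then
    # Target files that rule all should request for the fastp task
    string='expand(config["path"]["root"]+"/"+config["folder"]["qfiltered"]+"/{IDs}/{IDs}_1.fastq.gz", IDs = IDs)'
    if [ "$local" == "true" ]; then
        submitLocal
    else
        submitCluster
    fi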
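The same task-to-string lookup can be sketched as a standalone function. This is only an illustration, not the parser's actual code: the function name task_to_string is hypothetical, and only the fastp entry is taken from the excerpt above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of metaGEM.sh): print the rule-all target
# string for a recognized task, or fail for an unrecognized one.
task_to_string() {
    case "$1" in
        fastp)
            # String copied from the excerpt above
            printf '%s\n' 'expand(config["path"]["root"]+"/"+config["folder"]["qfiltered"]+"/{IDs}/{IDs}_1.fastq.gz", IDs = IDs)'
            ;;
        *)
            echo "Task not recognized: $1" >&2
            return 1
            ;;
    esac
}

# Usage: only continue when the task is recognized
if string=$(task_to_string "fastp"); then
    echo "Target for rule all: $string"
fi
```

A case statement keeps each task and its string on adjacent lines, which may be easier to extend than a long elif chain.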
The submitLocal or submitCluster function then uses this task-specific string to overwrite the input of rule all on line 22 of the Snakefile:
# Parse Snakefile rule all (line 22 of Snakefile) input to match output of desired target rule stored in "$string". Note: Hardcoded line number.
echo "Parsing Snakefile to target rule: $task ... "
sed -i "22s~^.*$~ $string~" Snakefile
Please refer to the Snakefile wiki page for more information.
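Because the line number is hardcoded, the replacement silently does nothing if the Snakefile is shorter than expected. The sketch below wraps the same sed idiom in a hypothetical function, replace_line, that checks the file length first. It assumes GNU sed (for -i without a suffix) and a replacement text free of the characters ~, & and \, which are special inside the sed expression. The three-line demo file is made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: overwrite line number $2 of file $1 with text $3.
replace_line() {
    local file=$1 lineno=$2 text=$3 total
    total=$(wc -l < "$file")
    if [ "$lineno" -gt "$total" ]; then
        echo "Error: $file has only $total lines, cannot replace line $lineno" >&2
        return 1
    fi
    # Same idiom as above: replace the whole addressed line, using ~ as delimiter
    sed -i "${lineno}s~^.*\$~${text}~" "$file"
}

# Demo on a throwaway file instead of the real Snakefile
demo=$(mktemp)
printf 'rule all:\n    input:\n        placeholder\n' > "$demo"
replace_line "$demo" 3 '        expand("{IDs}_1.fastq.gz", IDs = IDs)'
cat "$demo"
```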
In order for metaGEM to find its way around your files/cluster, the root directory in the config.yaml file must be set to the current directory, i.e. whatever location you are running metaGEM from. This is done automatically every time you run the metaGEM.sh parser, using the following code:
# Set root folder
echo "Setting current directory to root ... "
root=$(pwd)
sed -i "2s~/.*$~$root~" config.yaml # hardcoded line for root, change the number 2 if any new lines are added to the start of config.yaml
Please refer to the config.yaml wiki page for more information.
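To see what the root substitution does, it can be tried on a throwaway file. The config layout below is an assumption made for the demo, and set_root is a hypothetical wrapper around the sed command shown above. GNU sed is assumed.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: on line 2 of file $1, replace everything from the
# first "/" to the end of the line with the path given in $2.
set_root() {
    local file=$1 root=$2
    sed -i "2s~/.*\$~${root}~" "$file"
}

# Demo config with an assumed layout; only line 2 matters here
demo_config=$(mktemp)
printf 'path:\n    root: /old/location\nfolder:\n    data: dataset\n' > "$demo_config"

set_root "$demo_config" "$(pwd)"
sed -n 2p "$demo_config"   # shows the root line after the substitution
```

Since the pattern starts at the first "/", the key and indentation before the path are preserved. The substitution only matches if line 2 already contains an absolute path.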
Finally, the parser prepares the cluster_config.json file for job submission by setting the desired resources as defined by the user inputs --cores|-c, --mem|-m, and --hours|-h. Please refer to the cluster_config.json wiki page for more information.
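This page does not show how the resources are written, so the following is only a rough sketch of one way to do it with sed. The key names cores, mem_gb and hours, the file layout, and the function set_resources are all hypothetical. The real cluster_config.json may use different keys and units. GNU sed is assumed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set numeric resource values in a JSON file.
# Key names are placeholders, NOT the real cluster_config.json keys.
set_resources() {
    local file=$1 cores=$2 mem_gb=$3 hours=$4
    sed -i \
        -e "s~\"cores\" *: *[0-9]*~\"cores\": ${cores}~" \
        -e "s~\"mem_gb\" *: *[0-9]*~\"mem_gb\": ${mem_gb}~" \
        -e "s~\"hours\" *: *[0-9]*~\"hours\": ${hours}~" \
        "$file"
}

# Demo on a throwaway file with an assumed layout
demo_cluster=$(mktemp)
cat > "$demo_cluster" <<'EOF'
{
    "__default__": {
        "cores": 1,
        "mem_gb": 4,
        "hours": 1
    }
}
EOF

set_resources "$demo_cluster" 8 32 12
cat "$demo_cluster"
```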
After configuring the above-mentioned files, the metaGEM.sh parser displays the modified config.yaml and cluster_config.json files for user verification. Once the user confirms that the files are properly configured, metaGEM.sh performs a dry run of the jobs, which checks that everything is properly configured and that rule dependencies can be resolved. You will generally get an error message here if the wildcards or Snakefile are not properly configured. After the user verifies that the dry run expanded the wildcards as expected, the jobs are submitted to the cluster workload manager (or run on the local machine if --local|-l was given).
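The verify, dry run, submit sequence described above can be sketched as follows. The functions confirm and run_workflow are hypothetical and the prompts are invented; the sketch relies only on Snakemake's -n (dry run) flag, and leaves the actual submission command out because it is not shown on this page.

```bash
#!/usr/bin/env bash
# Hypothetical helper: ask a yes/no question, succeed only on y/yes.
confirm() {
    local prompt=$1 answer
    read -r -p "$prompt [y/N] " answer
    case "$answer" in
        [yY]|[yY][eE][sS]) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical outline of the flow; stops at the first "no" or failed dry run.
run_workflow() {
    cat config.yaml cluster_config.json
    confirm "Are the config files set up correctly?" || return 1
    snakemake -n || return 1   # dry run: resolves rules and wildcards, runs nothing
    confirm "Did the dry run expand the wildcards as expected?" || return 1
    echo "Submitting jobs ..."
    # The submission command itself is omitted here.
}
```

Defaulting to "no" on any other answer means an accidental keypress does not submit jobs.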
You may also refer to the metaGEM.sh help message for information regarding flags and available tasks:
_________________________________________________________________________/\\\\\\\\\\\\___/\\\\\\\\\\\\\\\___/\\\\____________/\\\\_
_______________________________________________________________________/\\\//////////___\/\\\///////////___\/\\\\\\________/\\\\\\_
__________________________________________/\\\________________________/\\\______________\/\\\______________\/\\\//\\\____/\\\//\\\_
____/\\\\\__/\\\\\________/\\\\\\\\____/\\\\\\\\\\\___/\\\\\\\\\_____\/\\\____/\\\\\\\__\/\\\\\\\\\\\______\/\\\\///\\\/\\\/_\/\\\_
__/\\\///\\\\\///\\\____/\\\/////\\\__\////\\\////___\////////\\\____\/\\\___\/////\\\__\/\\\///////_______\/\\\__\///\\\/___\/\\\_
_\/\\\_\//\\\__\/\\\___/\\\\\\\\\\\______\/\\\_________/\\\\\\\\\\___\/\\\_______\/\\\__\/\\\______________\/\\\____\///_____\/\\\_
_\/\\\__\/\\\__\/\\\__\//\\///////_______\/\\\_/\\____/\\\/////\\\___\/\\\_______\/\\\__\/\\\______________\/\\\_____________\/\\\_
_\/\\\__\/\\\__\/\\\___\//\\\\\\\\\\_____\//\\\\\____\//\\\\\\\\/\\__\//\\\\\\\\\\\\/___\/\\\\\\\\\\\\\\\__\/\\\_____________\/\\\_
_\///___\///___\///_____\//////////_______\/////______\////////\//____\////////////_____\///////////////___\///______________\///__
Usage: bash metaGEM.sh [-t|--task TASK]
                       [-j|--nJobs NUMBER OF JOBS]
                       [-c|--cores NUMBER OF CORES]
                       [-m|--mem GB RAM]
                       [-h|--hours MAX RUNTIME]
                       [-l|--local]

Snakefile wrapper/parser for metaGEM.

Options:
  -t, --task      Specify task to complete:

                    SETUP
                      createFolders
                      downloadToy
                      organizeData

                    WORKFLOW
                      fastp
                      megahit
                      crossMap
                      concoct
                      metabat
                      maxbin
                      binRefine
                      binReassemble
                      extractProteinBins
                      carveme
                      memote
                      organizeGEMs
                      smetana
                      extractDnaBins
                      gtdbtk
                      abundance
                      grid
                      prokka
                      roary

                    VISUALIZATION (in development)
                      qfilterVis
                      assemblyVis
                      binningVis
                      taxonomyVis
                      modelVis
                      interactionVis
                      growthVis

  -j, --nJobs     Specify number of jobs to run in parallel
  -c, --nCores    Specify number of cores per job
  -m, --mem       Specify memory in GB required for job
  -h, --hours     Specify number of hours to allocate to job runtime
  -l, --local     Run jobs on local machine for non-cluster usage
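For readers curious how flags like these are typically read in bash, here is a minimal sketch of a while/case argument loop. It is not the parser's actual code: parse_args and the variable names are hypothetical, and both --cores and --nCores are accepted only because the help message above shows both spellings.

```bash
#!/usr/bin/env bash
# Hypothetical argument loop for the flags listed in the help message.
parse_args() {
    task="" njobs="" cores="" mem="" hours="" local_run="false"
    while [ $# -gt 0 ]; do
        case "$1" in
            -l|--local) local_run="true"; shift; continue ;;   # flag without a value
            -t|--task|-j|--nJobs|-c|--cores|--nCores|-m|--mem|-h|--hours) ;;
            *) echo "Unknown option: $1" >&2; return 1 ;;
        esac
        # Every remaining option needs a value after it
        [ $# -ge 2 ] || { echo "Missing value for $1" >&2; return 1; }
        case "$1" in
            -t|--task)           task=$2 ;;
            -j|--nJobs)          njobs=$2 ;;
            -c|--cores|--nCores) cores=$2 ;;
            -m|--mem)            mem=$2 ;;
            -h|--hours)          hours=$2 ;;
        esac
        shift 2
    done
}

# Usage example
parse_args --task fastp -j 2 -c 4 -m 16 -h 3
echo "task=$task jobs=$njobs cores=$cores mem=${mem}GB hours=$hours local=$local_run"
```

Checking for a missing value before shift 2 avoids an endless loop when an option is given as the last argument.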
- Quality filter reads with fastp
- Assembly with megahit
- Draft bin sets with CONCOCT, MaxBin2, and MetaBAT2
- Refine & reassemble bins with metaWRAP
- Taxonomic assignment with GTDB-tk
- Relative abundances with bwa
- Reconstruct & evaluate genome-scale metabolic models with CarveMe and memote
- Species metabolic coupling analysis with SMETANA