For questions about input file formats, see the example folder.
For test runs with files in the example folder, users will need to modify the directory paths specified in 'fin_matrices', etc.
When starting from the standard output of rMATS, users should use this step to 1) reformat splice junction counts into a PSI (percent spliced-in) value matrix, 2) index the PSI matrix, and 3) move it to the IRIS database for screening (when -d is enabled).
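For reference, rMATS defines the inclusion level of an event in a sample by length-normalizing the junction counts: PSI = (I/l_I) / (I/l_I + S/l_S), where I and S are the inclusion and skipping junction counts and l_I and l_S are the effective lengths of the two forms (the IncFormLen and SkipFormLen columns in rMATS output). The sketch below illustrates that definition only; it is not the IRIS formatting code, and the function names are hypothetical.

```python
from typing import List, Optional


def inclusion_level(ijc: int, sjc: int, inc_len: int, skip_len: int) -> Optional[float]:
    """Length-normalized PSI for one sample; None when the event has no reads."""
    inc = ijc / inc_len
    skip = sjc / skip_len
    if inc + skip == 0:
        return None
    return inc / (inc + skip)


def psi_row(ijc_field: str, sjc_field: str, inc_len: int, skip_len: int) -> List[Optional[float]]:
    """Turn comma-separated per-sample counts (as in the rMATS IJC_SAMPLE_1 and
    SJC_SAMPLE_1 columns) into one PSI value per sample."""
    ijcs = [int(x) for x in ijc_field.split(",")]
    sjcs = [int(x) for x in sjc_field.split(",")]
    return [inclusion_level(i, s, inc_len, skip_len) for i, s in zip(ijcs, sjcs)]
```

For example, `psi_row("30,0", "10,0", 2, 1)` gives roughly 0.6 for the first sample and None for the second, which has no supporting reads.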
IRIS formatting -h
usage: IRIS formatting [-h] -t {SE,RI,A3,A5} -n DATA_NAME -s {1,2}
[-c COV_CUTOFF] [-e] [-d IRIS_DB_PATH]
rmats_mat_path_manifest rmats_sample_order
required arguments:
rmats_mat_path_manifest
txt manifest of path(s) to rMATS output folder(s)
rmats_sample_order txt manifest of corresponding rMATS input sample order file(s)
Required input for rMATS
-t {SE,RI,A3,A5}, --splicing_event_type {SE,RI,A3,A5}
String of splicing event types based on rMATS definition (SE,RI,A3,A5)
Used to name output file
-n DATA_NAME, --data-name DATA_NAME
Defines dataset name (disease state, study name, group name etc.)
Used during IRIS screening
-s {1,2}, --sample-name-field {1,2}
Specifies sample name field for each sample in sample order file(s)
listed by "rmats_sample_order" (1- BAM file name, 2- BAM folder name)
optional arguments:
-h, --help Shows help message and exits
-c COV_CUTOFF, --cov-cutoff COV_CUTOFF
Average coverage filter for merged matrix (default is 10)
-e, --merge-events-only
Do not perform matrix merge, only merge events list
-d IRIS_DB_PATH, --iris-db-path IRIS_DB_PATH
Path to IRIS database
Formatted/indexed AS matrices are stored here and used during IRIS screening
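The two positional inputs are plain-text manifests, and -s controls how sample names are read from the sample order file(s). A minimal sketch, assuming one path per manifest line and a sample order file in the rMATS --b1/--b2 layout (BAM paths separated by commas); stripping the .bam extension for -s 1 is also an assumption, and the helper names are hypothetical:

```python
import os
from typing import List


def sample_names(sample_order_text: str, field: int) -> List[str]:
    """Derive sample names from comma-separated BAM paths.

    field=1: BAM file name (extension stripped, an assumption).
    field=2: name of the folder that holds the BAM.
    """
    bams = [p.strip() for p in sample_order_text.strip().split(",") if p.strip()]
    if field == 1:
        return [os.path.splitext(os.path.basename(p))[0] for p in bams]
    if field == 2:
        return [os.path.basename(os.path.dirname(p)) for p in bams]
    raise ValueError("field must be 1 or 2")


def write_manifest(paths: List[str], manifest_path: str) -> None:
    """Write a manifest with one path per line (assumed layout)."""
    with open(manifest_path, "w") as fh:
        fh.write("\n".join(paths) + "\n")
```

With `-s 2`, BAMs named identically inside per-sample folders (e.g. `/data/s1/Aligned.bam`) still yield distinct names (`s1`).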
This step takes a user-defined screening parameter file (see example), performs comparisons against reference databases, and returns tumor-associated, tumor-recurrent, and tumor-specific AS events based on user-defined criteria.
When the -t option is enabled, the screening step translates identified tumor AS events into peptide sequences that can be used in the prediction step.
IRIS screening -h
usage: IRIS screening [-h] [-o OUTDIR] [-t] parameter_fin
required arguments:
parameter_fin File of IRIS screening parameters
-o OUTDIR, --outdir OUTDIR
Directory of IRIS screening results
optional arguments:
-h, --help Shows help message and exits
-t, --translating Translates IRIS-screened tumor splice junctions into peptides
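Screening criteria are set in the parameter file, and the statistics IRIS applies are not described here. Purely as a toy illustration of comparing a tumor cohort against reference panels, the hypothetical helper below flags a panel when enough tumor samples deviate from the panel's mean PSI; the thresholds and the rule itself are assumptions, not IRIS's method:

```python
from statistics import mean
from typing import Dict, List


def screen_event(tumor_psi: List[float], reference_psi: Dict[str, List[float]],
                 delta_cutoff: float = 0.1, min_fraction: float = 0.5) -> Dict[str, bool]:
    """Toy screen of one AS event against several reference panels.

    A panel is flagged when at least `min_fraction` of tumor samples differ
    from the panel's mean PSI by more than `delta_cutoff`. Both thresholds
    are hypothetical placeholders for criteria set in the parameter file.
    """
    flags = {}
    for panel, ref in reference_psi.items():
        ref_mean = mean(ref)
        n_hits = sum(1 for psi in tumor_psi if abs(psi - ref_mean) > delta_cutoff)
        flags[panel] = n_hits / len(tumor_psi) >= min_fraction
    return flags
```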
This step takes the screening results, annotates AS events associated with extracellular domains, and predicts HLA-binding epitopes to discover immunotherapy targets.
Predicting HLA-binding epitopes with IRIS is a massive job that requires a computing cluster running the SGE (Sun Grid Engine) system. The 'prediction' step creates qsub scripts for job array submission.
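The qsub scripts that 'prediction' writes are generated by IRIS itself. As a generic sketch of the SGE job-array mechanism, the hypothetical helper below builds a script in which `#$ -t 1-N` requests N tasks and each task reads its own index from the SGE_TASK_ID environment variable; the command it runs is a placeholder:

```python
def make_array_script(job_name: str, n_tasks: int, command_template: str) -> str:
    """Build an SGE job-array script.

    `#$ -t 1-N` asks SGE for N tasks; `{task}` in command_template is
    replaced by ${SGE_TASK_ID}, which the shell expands per task at run time.
    """
    if n_tasks < 1:
        raise ValueError("n_tasks must be >= 1")
    lines = [
        "#!/bin/bash",
        f"#$ -N {job_name}",
        "#$ -cwd",
        f"#$ -t 1-{n_tasks}",
        command_template.format(task="${SGE_TASK_ID}"),
    ]
    return "\n".join(lines) + "\n"
```

Saved to a file, such a script is submitted once with `qsub`, and SGE then launches the N tasks.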
IRIS prediction -h
usage: IRIS prediction [-h] [-p PARAMETER_FIN] [--iedb-local IEDB_LOCAL]
[-c DELTAPSI_COLUMN] [-d DELTAPSI_CUT_OFF] -m MHC_LIST
[--extracellular-anno-by-junction]
IRIS_screening_result_path
required arguments:
IRIS_screening_result_path
Input AS event coordinates and PSI values
-p PARAMETER_FIN, --parameter-fin PARAMETER_FIN
File of parameters used in IRIS screening
--iedb-local IEDB_LOCAL
Specify local IEDB location (if installed)
-m MHC_LIST, --mhc-list MHC_LIST
List of HLA/MHC types among samples
HLA type follows seq2HLA format
optional arguments:
-h, --help Shows help message and exits
-c DELTAPSI_COLUMN, --deltaPSI-column DELTAPSI_COLUMN
Column of deltaPSI value in matrix, 1-based (default is 5th column)
-d DELTAPSI_CUT_OFF, --deltaPSI-cut-off DELTAPSI_CUT_OFF
Defines cutoff of deltaPSI (or other metric) to select tumor-enriched
splice form (default is 0)
--extracellular-anno-by-junction
By default, CAR-T targets are annotated by association of event
with extracellular domain
This option annotates target based on a junction (not recommended)
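The -c and -d options work on one column of each result row. A minimal sketch of that selection, assuming a tab-delimited row, a 1-based column index as described above, and the sign convention that positive deltaPSI favors the inclusion form (an assumption; check it against the direction of your comparison). The helper name is hypothetical:

```python
from typing import Optional


def tumor_form(line: str, deltapsi_column: int = 5, cutoff: float = 0.0) -> Optional[str]:
    """Pick the tumor-enriched splice form for one tab-delimited row.

    deltapsi_column is 1-based. Assumed convention: deltaPSI > cutoff means
    inclusion form, deltaPSI < -cutoff means skipping form, otherwise None.
    """
    fields = line.rstrip("\n").split("\t")
    delta = float(fields[deltapsi_column - 1])
    if delta > cutoff:
        return "inclusion"
    if delta < -cutoff:
        return "skipping"
    return None
```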
IRIS epitope_post -h
usage: IRIS epitope_post [-h] -p PARAMETER_FIN -o OUTDIR -m MHC_BY_SAMPLE
[-e GENE_EXP_MATRIX] [--ic50-cut-off IC50_CUT_OFF]
required arguments:
-p PARAMETER_FIN, --parameter_fin PARAMETER_FIN
File of parameters used in IRIS screening
-o OUTDIR, --outdir OUTDIR
Directory of IRIS screening results
-m MHC_BY_SAMPLE, --mhc-by-sample MHC_BY_SAMPLE
Tab-delimited matrix of HLA/MHC type vs. samples
HLA type follows seq2HLA format
-e GENE_EXP_MATRIX, --gene-exp-matrix GENE_EXP_MATRIX
Tab-delimited matrix of gene expression vs. samples
optional arguments:
-h, --help Shows help message and exits
--ic50-cut-off IC50_CUT_OFF
Specifies IC50 cut-off to define HLA-binding epitopes (default is 500)
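epitope_post relates predicted binders to the samples that carry each HLA type. A sketch of that join, assuming a matrix whose header row lists sample IDs and whose other rows hold an allele followed by 0/1 flags, and assuming IC50 values in nM with binders strictly below the cut-off; the layout and helper names are hypothetical:

```python
from typing import Dict, List, Set, Tuple


def parse_mhc_by_sample(text: str) -> Dict[str, Set[str]]:
    """Map each HLA allele to the set of samples flagged 1 for it."""
    lines = [l for l in text.strip().splitlines() if l]
    samples = lines[0].split("\t")[1:]
    carriers = {}
    for row in lines[1:]:
        allele, *flags = row.split("\t")
        carriers[allele] = {s for s, f in zip(samples, flags) if f.strip() == "1"}
    return carriers


def samples_with_binders(predictions: List[Tuple[str, str, float]],
                         carriers: Dict[str, Set[str]],
                         ic50_cutoff: float = 500.0) -> Dict[str, Set[str]]:
    """For each epitope predicted to bind (IC50 < cutoff), collect the samples
    whose HLA type matches the predicted allele.
    predictions: (epitope, allele, ic50_nM) tuples."""
    hits: Dict[str, Set[str]] = {}
    for epitope, allele, ic50 in predictions:
        if ic50 < ic50_cutoff:
            hits.setdefault(epitope, set()).update(carriers.get(allele, set()))
    return hits
```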
When starting from FASTQ files, users should use this step to perform RNA-Seq alignment and quantification. This module uses STAR and Cufflinks, and it processes only one sample (which can consist of multiple FASTQ files) per run. Users are advised to run this module in parallel on an SGE cluster.
IRIS process_rnaseq -h
usage: IRIS process_rnaseq [-h] --starGenomeDir STARGENOMEDIR --gtf GTF -p
SAMPLEID_OUTDIR [--db-length DB_LENGTH] [--mapping]
[--quant] [--sort]
readsFilesRNA
required arguments:
--starGenomeDir STARGENOMEDIR
Path to STAR-indexed reference genome
Passes to "genomeDir" parameter in STAR
--gtf GTF Genome annotation file.
-p SAMPLEID_OUTDIR, --sampleID-outdir SAMPLEID_OUTDIR
Output directory, where sample ID will be used as output folder name
--db-length DB_LENGTH
Passes to "sjdbOverhang" parameter in STAR (default is 100)
readsFilesRNA Specifies path to paired-end FASTQ files for sample
Files separated by ","
optional arguments:
-h, --help Shows help message and exits
--mapping Only perform reads mapping
--quant Only perform gene expression and AS quantification
--sort Only perform BAM file sorting
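For orientation, the sketch below assembles a STAR command of the kind this step wraps, mapping --starGenomeDir to STAR's --genomeDir and --db-length to --sjdbOverhang (the STAR manual recommends read length minus 1 as the ideal value). It assumes readsFilesRNA holds exactly one R1,R2 pair; passing --gtf as --sjdbGTFfile, how IRIS handles more files, and the helper name are all assumptions:

```python
from typing import List


def star_command(genome_dir: str, gtf: str, reads: str, out_prefix: str,
                 db_length: int = 100, threads: int = 4) -> List[str]:
    """Assemble a STAR alignment command as an argument list.

    `reads` is assumed to be "R1.fastq,R2.fastq"; STAR takes mate 1 and
    mate 2 as two separate values of --readFilesIn.
    """
    files = [f for f in reads.split(",") if f]
    if len(files) != 2:
        raise ValueError("expected exactly two FASTQ paths (R1,R2)")
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--sjdbGTFfile", gtf,
        "--sjdbOverhang", str(db_length),
        "--readFilesIn", files[0], files[1],
        "--outFileNamePrefix", out_prefix,
    ]
```

If STAR is on PATH, `subprocess.run(star_command(...), check=True)` would execute it.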
After running 'process_rnaseq', users can use this step to prepare files for running rMATS-turbo in parallel on an SGE cluster.
IRIS makeqsub_rmats -h
usage: IRIS makeqsub_rmats [-h] --rMATS-path RMATS_PATH --bam-dir BAM_DIR
--gtf GTF --read-length READ_LENGTH
required arguments:
--rMATS-path RMATS_PATH
Path to rMATS-turbo script
--bam-dir BAM_DIR Path to the parent directory of the folders containing BAM files generated by 'process_rnaseq'
--gtf GTF Genome annotation file
--read-length READ_LENGTH
Passes to "readLength" parameter in rMATS-turbo
optional arguments:
-h, --help Shows help message and exits
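rMATS-turbo takes each sample group as a text file (--b1/--b2) listing BAM paths separated by commas. A minimal sketch of collecting BAMs one level below --bam-dir and writing such a file; the *.bam pattern and the helper names are assumptions:

```python
import glob
import os
from typing import List


def collect_bams(bam_dir: str, pattern: str = "*.bam") -> List[str]:
    """Find BAM files exactly one level below bam_dir (bam_dir/<sample>/<file>.bam)."""
    return sorted(glob.glob(os.path.join(bam_dir, "*", pattern)))


def write_b1(bams: List[str], out_path: str) -> None:
    """Write BAM paths as a single comma-separated line, the rMATS --b1/--b2 format."""
    with open(out_path, "w") as fh:
        fh.write(",".join(bams))
```

BAMs sitting directly in bam_dir are ignored, matching the "one level higher" layout described above.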
Once all samples of interest have been processed with 'process_rnaseq', users can use this step to generate a gene expression matrix, which is used for annotation in downstream IRIS prediction and/or proteomics reports.
IRIS exp_matrix -h
usage: IRIS exp_matrix [-h] [--exp-cutoff EXP_CUTOFF] -o OUTDIR -n DATA_NAME
gene_exp_file_list
required arguments:
gene_exp_file_list txt manifest of path(s) of cufflinks gene expression output(s)
-n DATA_NAME, --data-name DATA_NAME
Name of dataset (disease state, study name, group name, etc.)
optional arguments:
-h, --help Shows help message and exits
--exp-cutoff EXP_CUTOFF
Gene expression cut-off based on FPKM (default is 1)
-o OUTDIR, --outdir OUTDIR
Output directory for IRIS exp_matrix
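Cufflinks writes per-gene FPKM to genes.fpkm_tracking, a tab-delimited table with gene_id and FPKM columns. A sketch of merging several of those tables into a gene-by-sample matrix; the filter shown (keep a gene if its FPKM reaches the cut-off in at least one sample) is an assumption, IRIS's exact rule may differ, and the helper names are hypothetical:

```python
from typing import Dict, List, Tuple


def read_fpkm_tracking(text: str) -> Dict[str, float]:
    """Parse a genes.fpkm_tracking table into {gene_id: FPKM}.
    A repeated gene_id keeps the value from its last row."""
    lines = text.strip().splitlines()
    header = lines[0].split("\t")
    gid, fpkm = header.index("gene_id"), header.index("FPKM")
    out = {}
    for line in lines[1:]:
        f = line.split("\t")
        out[f[gid]] = float(f[fpkm])
    return out


def expression_matrix(per_sample: Dict[str, Dict[str, float]],
                      cutoff: float = 1.0) -> Tuple[List[str], List[list]]:
    """Merge per-sample FPKM into rows of [gene, value per sample].
    Genes missing from a sample get 0.0; genes below cutoff everywhere are dropped."""
    samples = sorted(per_sample)
    genes = sorted({g for d in per_sample.values() for g in d})
    rows = []
    for g in genes:
        vals = [per_sample[s].get(g, 0.0) for s in samples]
        if max(vals) >= cutoff:
            rows.append([g] + vals)
    return samples, rows
```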
This step is incorporated into 'formatting'. For users who already have a matrix of AS PSI values (generated by rMATS or another tool), this command can complete the indexing and other steps needed to prepare for IRIS screening.
IRIS indexing -h
usage: IRIS indexing [-h] -n DATA_NAME [-d DB_DIR] splicing_matrix
required arguments:
splicing_matrix Tab-delimited matrix of splicing events (row) vs. sample IDs (col)
-n DATA_NAME, --data-name DATA_NAME
Name of data matrix (disease state, study name, group name, etc.) being
formatted & indexed
Used by IRIS during screening
optional arguments:
-h, --help Shows help message and exits
-d DB_DIR, --db-dir DB_DIR
Directory of IRIS database
Program creates a folder in this directory for IRIS to recognize
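The index format IRIS writes is not described here. As a generic illustration of why indexing helps with a large events-by-samples matrix, the sketch records the byte offset of each row so that one event can be read without scanning the whole file; it assumes the first column holds the event ID, uses hypothetical helper names, and is not IRIS's implementation:

```python
from typing import BinaryIO, Dict, List


def build_offset_index(fh: BinaryIO) -> Dict[str, int]:
    """Map each event ID (first column) to the byte offset of its row.
    The header line (event ID + sample IDs) is skipped."""
    index = {}
    fh.seek(0)
    fh.readline()
    while True:
        pos = fh.tell()
        line = fh.readline()
        if not line:
            break
        index[line.split(b"\t", 1)[0].decode()] = pos
    return index


def fetch_row(fh: BinaryIO, index: Dict[str, int], event_id: str) -> List[str]:
    """Seek directly to one event's row and return its fields."""
    fh.seek(index[event_id])
    return fh.readline().decode().rstrip("\n").split("\t")
```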
IRIS translation -h
usage: IRIS translation [-h] -g REF_GENOME -o OUTDIR [-c DELTAPSI_COLUMN]
[-d DELTAPSI_CUT_OFF] [--no-tumor-form-selection]
as_input
required arguments:
as_input Inputs AS event coordinates and PSI values
-g REF_GENOME, --ref-genome REF_GENOME
Specifies reference genome (FASTA format) location
-o OUTDIR, --outdir OUTDIR
Defines IRIS translation output directory
optional arguments:
-h, --help Shows help message and exits
-c DELTAPSI_COLUMN, --deltaPSI-column DELTAPSI_COLUMN
Column of deltaPSI value in matrix, 1-based (default is 5th column)
-d DELTAPSI_CUT_OFF, --deltaPSI-cut-off DELTAPSI_CUT_OFF
Defines cutoff of deltaPSI (or other metric) used to select tumor-enriched
splice form (default is 0)
--no-tumor-form-selection
Translates splicing junctions derived from both skipping and inclusion forms
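Translation turns junction-spanning nucleotide sequences into peptides. A minimal three-frame sketch using the standard genetic code, assuming the sequence is already on the coding strand (minus-strand events would need reverse-complementing first); the helper names are hypothetical and this is not IRIS's translation code:

```python
from itertools import product
from typing import List

# Standard genetic code with codons enumerated in TCAG order; "*" marks a stop.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(seq: str, frame: int = 0) -> str:
    """Translate complete codons from `frame` (0, 1 or 2); unknown codons become X."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def three_frame_peptides(junction_seq: str) -> List[str]:
    """Translate in all three forward frames, keeping each frame's
    peptide up to (not including) the first stop codon."""
    return [translate(junction_seq, f).split("*")[0] for f in range(3)]
```

For example, `three_frame_peptides("ATGGCCTAA")` returns `["MA", "WP", "GL"]`.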
This step uses the RNA-Seq FASTQ files of a sample to infer its HLA type.
IRIS seq2hla -h
usage: IRIS seq2hla [-h] -b SEQ2HLA_PATH -p SAMPLEID_OUTDIR readsFilesCaseRNA
required arguments:
-b SEQ2HLA_PATH, --seq2hla-path SEQ2HLA_PATH
Path to seq2hla folder
-p SAMPLEID_OUTDIR, --sampleID-outdir SAMPLEID_OUTDIR
Output directory, where sample ID will be used as output folder name
readsFilesCaseRNA Tumor sample paired-end fastq files, separated by ","
optional arguments:
-h, --help Shows help message and exits
This module wraps IEDB prediction tools to predict peptide-HLA binding. The 'prediction' and 'epitope_post' modules can make qsub submissions to run this module in parallel and summarize the results into one TCR target report.
IRIS pep2epitope -h
usage: IRIS pep2epitope [-h] [-e EPITOPE_LEN_LIST] [-a HLA_ALLELE_LIST] -o
OUTDIR [--iedb-local IEDB_LOCAL]
[--ic50-cut-off IC50_CUT_OFF]
junction_pep_input
required arguments:
junction_pep_input Inputs AS event coordinates and PSI values
-e EPITOPE_LEN_LIST, --epitope-len-list EPITOPE_LEN_LIST
Epitope length for prediction (default is 9,10,11)
-a HLA_ALLELE_LIST, --hla-allele-list HLA_ALLELE_LIST
List of HLA types (default is HLA-A*01:01, HLA-B*08:01, HLA-C*07:01)
-o OUTDIR, --outdir OUTDIR
Define output directory of pep2epitope
--iedb-local IEDB_LOCAL
Specify local IEDB location (if installed)
--ic50-cut-off IC50_CUT_OFF
Cut-off based on median value of consensus-predicted IC50 values (default is 500)
optional arguments:
-h, --help Shows help message and exits
```
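The candidate epitopes are sub-peptides of the junction peptide at each length in -e, and the IC50 filter uses a median. A sketch under stated assumptions: the median is taken over the per-method IC50 values (in nM) reported for each epitope-allele pair, pairs strictly below the cut-off are kept, and the helper names are hypothetical:

```python
from statistics import median
from typing import Dict, Iterable, List, Tuple


def kmers(peptide: str, lengths: Iterable[int] = (9, 10, 11)) -> List[str]:
    """All sub-peptides of the requested lengths (candidate epitopes)."""
    return [peptide[i:i + k] for k in lengths for i in range(len(peptide) - k + 1)]


def binders(ic50_by_method: Dict[Tuple[str, str], List[float]],
            cutoff: float = 500.0) -> List[Tuple[str, str, float]]:
    """Keep (epitope, allele, median IC50) for pairs whose median IC50
    across prediction methods is below the cutoff."""
    hits = []
    for (epitope, allele), values in ic50_by_method.items():
        m = median(values)
        if m < cutoff:
            hits.append((epitope, allele, m))
    return sorted(hits)
```

An 11-residue peptide yields 3 + 2 + 1 = 6 candidates at the default lengths 9, 10, and 11.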