VNtyper 2.0 is an advanced pipeline designed to genotype MUC1 coding Variable Number Tandem Repeats (VNTR) in Autosomal Dominant Tubulointerstitial Kidney Disease (ADTKD-MUC1) using Short-Read Sequencing (SRS) data. This version integrates enhanced variant calling algorithms, robust logging mechanisms, and streamlined installation processes to provide researchers with a powerful tool for VNTR analysis.
- We have developed a web server to provide free access to VNtyper, which runs in the background for ease of use. Access it through the following link: vntyper-online
- Features
- Installation
- Usage
- Pipeline Overview
- Dependencies
- Pipeline Logic Diagram
- Results
- Notes
- Citations
- Contributing
- License
- Contact
Variant Calling Algorithms:
- Kestrel: Mapping-free genotyping using k-mer frequencies.
- code-adVNTR (optional): Profile-HMM-based method for VNTR genotyping.
- SHARK (optional, FASTQ-only): Rapid filtering and read extraction for MUC1 region in exome/whole-genome data.
Comprehensive Logging:
- Logs both to the console and a dedicated log file.
- Generates MD5 checksums for all downloaded and processed files.
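The checksums can also be checked by hand. The following is a minimal sketch, assuming GNU coreutils `md5sum` is installed; the helper names `md5_of` and `verify_md5` are hypothetical and are not part of VNtyper, whose own checksum output format may differ.

```shell
#!/bin/sh
# Print the MD5 digest of a file (digest only, without the file name).
# Assumes GNU coreutils `md5sum`; this is not VNtyper's own code.
md5_of() {
    md5sum "$1" | cut -d ' ' -f 1
}

# Compare a file against an expected digest.
# Returns 0 when the digests match and 1 otherwise.
verify_md5() {
    [ "$(md5_of "$1")" = "$2" ]
}
```

A call such as `verify_md5 /path/to/file <expected-digest> && echo OK` confirms that a file was not corrupted during transfer.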
Flexible Installation:
- Supports installation via pip using setup.py.
- Provides Conda environment setup for easy dependency management.
Subcommands:
install-references
pipeline
fastq
bam
kestrel
report
cohort
online
VNtyper 2.0 can be installed using either pip with setup.py or via Conda environments for streamlined dependency management.
Clone the repository and install with pip:
mkdir vntyper
git clone https://github.com/hassansaei/vntyper.git
cd vntyper
pip install .
VNtyper 2.0 offers multiple subcommands that can be used depending on your input data and requirements. Below are the main subcommands available:
To run the entire pipeline using a BAM file:
vntyper --config-path /path/to/config.json pipeline \
--bam /path/to/sample.bam \
--output-dir /path/to/output/dir \
--threads 4 --fast-mode
Alternatively, using paired-end FASTQ files:
vntyper --config-path /path/to/config.json pipeline \
--fastq1 /path/to/sample_R1.fastq.gz \
--fastq2 /path/to/sample_R2.fastq.gz \
--output-dir /path/to/output/dir \
--threads 4 --fast-mode
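For several samples, the FASTQ invocation above can be generated in a loop. This is a hedged sketch rather than a VNtyper feature: the function name `print_vntyper_commands` is hypothetical, and it assumes files are named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz`. It only prints the commands (a dry run), using flags that appear in the examples above.

```shell
#!/bin/sh
# Print one `vntyper pipeline` command per paired-end sample found in a directory.
# Assumes files are named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz (hypothetical layout).
# Commands are only printed, not executed; paths containing spaces would need extra quoting.
print_vntyper_commands() {
    fastq_dir=$1   # directory holding the FASTQ pairs
    out_root=$2    # parent directory for per-sample output folders
    config=$3      # path to config.json
    for r1 in "$fastq_dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue              # no match: the glob stays literal
        r2=${r1%_R1.fastq.gz}_R2.fastq.gz     # derive the mate file name
        if [ ! -e "$r2" ]; then
            echo "skipping $r1: missing mate file" >&2
            continue
        fi
        sample=$(basename "$r1" _R1.fastq.gz)
        echo "vntyper --config-path $config pipeline --fastq1 $r1 --fastq2 $r2 --output-dir $out_root/$sample --threads 4"
    done
}
```

Review the printed commands first; piping the output to `sh` then runs the samples one after another.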
The adVNTR genotyping is optional and skipped by default. To enable it, use the --extra-modules advntr option.
New: To enable SHARK filtering on FASTQ reads before the usual QC and alignment (for improved MUC1 detection), add shark to the --extra-modules flag (e.g., --extra-modules shark):
vntyper --config-path /path/to/config.json pipeline \
--fastq1 /path/to/sample_R1.fastq.gz \
--fastq2 /path/to/sample_R2.fastq.gz \
--extra-modules shark \
--threads 4 \
--output-dir /path/to/output/dir
- SHARK will run first on the raw FASTQ files to extract and filter reads covering the MUC1 VNTR region.
- Important: SHARK is only supported in FASTQ mode. If you use --extra-modules shark together with --bam or --cram, VNtyper will exit gracefully with a warning.
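A wrapper script can catch this combination before launching a job. The sketch below is illustrative only and is not VNtyper's internal validation; the function name `check_shark_args` is hypothetical, and it recognises only the single-module form `--extra-modules shark`.

```shell
#!/bin/sh
# Pre-flight check for a wrapper script: refuse `--extra-modules shark` with --bam/--cram.
# Illustrative only; VNtyper performs its own validation internally.
# Only the form `--extra-modules shark` (a single module) is recognised here.
check_shark_args() {
    want_shark=no
    has_alignment=no
    prev=
    for arg in "$@"; do
        case $arg in
            --bam|--cram) has_alignment=yes ;;
            shark) [ "$prev" = "--extra-modules" ] && want_shark=yes ;;
        esac
        prev=$arg
    done
    if [ "$want_shark" = yes ] && [ "$has_alignment" = yes ]; then
        echo "SHARK is only supported with FASTQ input" >&2
        return 1
    fi
    return 0
}
```

Calling `check_shark_args "$@" || exit 1` at the top of a submission script stops an invalid job before it is queued.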
A Docker image for VNtyper 2.0 is provided and can be pulled and used as follows:
# pull the docker image
docker pull saei/vntyper:main
# run the pipeline using the docker image
docker run -w /opt/vntyper --rm \
-v /local/input/folder/:/opt/vntyper/input \
-v /local/output/folder/:/opt/vntyper/output \
saei/vntyper:main \
vntyper pipeline \
--bam /opt/vntyper/input/filename.bam \
-o /opt/vntyper/output/filename/
An Apptainer image can be generated from the Docker image as follows:
# create the apptainer sif image
apptainer pull docker://saei/vntyper:main
# run the pipeline using the apptainer image
apptainer run --pwd /opt/vntyper \
-B /local/input/folder/:/opt/vntyper/input \
-B /local/output/folder/:/opt/vntyper/output \
vntyper_main.sif vntyper pipeline \
--bam /opt/vntyper/input/filename.bam \
-o /opt/vntyper/output/filename/
To install the reference files with the install-references subcommand:
vntyper --config-path /path/to/config.json install-references \
--output-dir /path/to/reference/install \
--skip-indexing # Optional: skip BWA indexing if needed
To generate the report for an existing output directory with the report subcommand:
vntyper --config-path /path/to/config.json report \
--output-dir /path/to/output/dir
VNtyper 2.0 integrates multiple steps into a streamlined pipeline. The following is an overview of the steps involved:
- FASTQ Quality Control: Raw FASTQ files are checked for quality.
- (Optional) SHARK Filtering: If shark is specified in --extra-modules, raw FASTQ reads are first filtered to extract MUC1-specific reads (especially relevant for exome or large WGS datasets).
- Alignment: Reads are aligned using BWA (if FASTQ files are provided).
- Kestrel Genotyping: Mapping-free genotyping of VNTRs.
- (Optional) adVNTR Genotyping: Profile-HMM-based method for VNTR genotyping (requires additional setup).
- Summary Report Generation: A final HTML report is generated to summarize the results.
VNtyper 2.0 relies on several tools and Python libraries. Ensure that the following dependencies are available in your environment:
- Python >= 3.9
- BWA
- Samtools
- Fastp
- Pandas
- Numpy
- Biopython
- Pysam
- Jinja2
- Matplotlib
- Seaborn
- IGV-Reports
You can easily set up these dependencies via the provided Conda environment file.
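Before running the pipeline, it can help to confirm that the command-line tools are on `PATH`. This is a minimal sketch with a hypothetical helper, `missing_tools`; the executable names in the usage line (`python3`, `bwa`, `samtools`, `fastp`) are the usual ones for these tools and are assumed here rather than taken from VNtyper.

```shell
#!/bin/sh
# Report which of the given executables are missing from PATH.
# Prints the missing names separated by spaces; prints an empty line if all are found.
missing_tools() {
    missing=
    for tool in "$@"; do
        # `command -v` is the POSIX way to test whether a command can be resolved
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    echo "${missing# }"
}

# Example call (executable names are assumptions):
#   missing_tools python3 bwa samtools fastp
```

Python libraries can be checked the same way with a one-line import, for example `python3 -c "import pandas, pysam"`.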
Below is a logical overview of the VNtyper pipeline:
graph TD
A[Input: FASTQ/BAM] -->|Quality Control| B[Alignment BWA]
B -->|Genotyping| C[Kestrel]
C --> D[Optional: adVNTR]
D --> E[Generate Summary Report]
E --> F[Output: VCF, Summary HTML]
Once the pipeline completes, you will have:
- BAM or FASTQ slices containing MUC1-specific reads.
- VCF files or TSV files with genotyping results (for Kestrel and optional adVNTR).
- HTML summary report detailing coverage stats, genotyping calls, and relevant logs.
- This tool is for research use only.
- Ensure high-coverage WES/WGS or targeted data is used to genotype MUC1 VNTR accurately.
- For questions or issues, refer to the GitHub repository for support.
If you use VNtyper 2.0 in your research, please cite the following:
- Saei H, Morinière V, Heidet L, et al. VNtyper enables accurate alignment-free genotyping of MUC1 coding VNTR using short-read sequencing data. iScience. 2023.
- Audano PA, Ravishankar S, et al. Mapping-free variant calling using haplotype reconstruction from k-mer frequencies. Bioinformatics. 2018.
- Park J, Bakhtiari M, et al. Detecting tandem repeat variants in coding regions using code-adVNTR. iScience. 2022.
We welcome contributions to VNtyper. Please refer to the CONTRIBUTING.md file for guidelines.
VNtyper is licensed under the BSD 3-Clause License. See the LICENSE file for more details.
For questions or issues, please open an issue on GitHub or email the corresponding authors listed in the manuscript.