Work on testing pipeline #5

Merged
merged 24 commits into from
Jun 7, 2024
3 changes: 1 addition & 2 deletions .github/data/reads_to_simulate.csv
Original file line number Diff line number Diff line change
@@ -1,2 +1 @@
NC026423.1,.github/data/assemblies/NC_026423.1.fa
NC026431.1,.github/data/assemblies/NC_026431.1.fa
MK58361X-H3N2,.github/data/assemblies/MK58361X-H3N2.fa
6 changes: 6 additions & 0 deletions .github/scripts/check_outputs.py
@@ -38,6 +38,12 @@ def check_expected_files_exist(output_dir, sample_ids):
    """
    for sample_id in sample_ids:
        expected_files = [
            f"fastq/fluviewer-nf-v0.2/{sample_id}/{sample_id}_fluviewer_alignment.bam",
            f"fastq/fluviewer-nf-v0.2/{sample_id}/{sample_id}_fluviewer_alignment.bam.bai",
            f"fastq/fluviewer-nf-v0.2/{sample_id}/{sample_id}_fluviewer_depth_of_cov.png",
            f"fastq/fluviewer-nf-v0.2/{sample_id}/{sample_id}_fluviewer_mapping_refs.fa",
            f"fastq/fluviewer-nf-v0.2/{sample_id}/{sample_id}_fluviewer_report.tsv",
            f"fastq/fluviewer-nf-v0.2/{sample_id}/{sample_id}_genoflu.tsv",
        ]

        for expected_file in expected_files:
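The entries added in this hunk extend the per-sample list of files the output check expects. A minimal sketch of that kind of check follows, assuming the `fastq/fluviewer-nf-v0.2/<sample_id>/` layout shown in the hunk. `find_missing_files` and `SUFFIXES` are hypothetical names used for illustration and are not part of the PR's `check_outputs.py`.

```python
import os

# File-name suffixes copied from the hunk above; the list name is hypothetical.
SUFFIXES = [
    "_fluviewer_alignment.bam",
    "_fluviewer_alignment.bam.bai",
    "_fluviewer_depth_of_cov.png",
    "_fluviewer_mapping_refs.fa",
    "_fluviewer_report.tsv",
    "_genoflu.tsv",
]


def find_missing_files(output_dir, sample_ids, subdir="fastq/fluviewer-nf-v0.2"):
    """Return the expected per-sample paths (relative to output_dir) that do not exist."""
    missing = []
    for sample_id in sample_ids:
        for suffix in SUFFIXES:
            relative_path = os.path.join(subdir, sample_id, sample_id + suffix)
            if not os.path.isfile(os.path.join(output_dir, relative_path)):
                missing.append(relative_path)
    return missing
```

Returning the list of missing paths, rather than a single boolean, makes a failed CI run easier to diagnose from the log.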
12 changes: 10 additions & 2 deletions .github/scripts/download_assemblies.sh
@@ -2,5 +2,13 @@

mkdir -p .github/data/assemblies

curl -o .github/data/assemblies/NC_026423.1.fa "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?retmode=text&id=NC_026423.1&db=nucleotide&rettype=fasta"
curl -o .github/data/assemblies/NC_026431.1.fa "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?retmode=text&id=NC_026431.1&db=nucleotide&rettype=fasta"
curl -o .github/data/assemblies/MK583610.1_segment_1_PB2_H3N2.fa "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?retmode=text&id=MK583610.1&db=nucleotide&rettype=fasta"
curl -o .github/data/assemblies/MK583611.1_segment_2_PB1_H3N2.fa "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?retmode=text&id=MK583611.1&db=nucleotide&rettype=fasta"
curl -o .github/data/assemblies/MK583612.1_segment_3_PA_H3N2.fa "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?retmode=text&id=MK583612.1&db=nucleotide&rettype=fasta"
curl -o .github/data/assemblies/MK583613.1_segment_4_HA_H3N2.fa "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?retmode=text&id=MK583613.1&db=nucleotide&rettype=fasta"
curl -o .github/data/assemblies/MK583614.1_segment_5_NP_H3N2.fa "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?retmode=text&id=MK583614.1&db=nucleotide&rettype=fasta"
curl -o .github/data/assemblies/MK583615.1_segment_6_NA_H3N2.fa "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?retmode=text&id=MK583615.1&db=nucleotide&rettype=fasta"

cat .github/data/assemblies/MK58361*.fa > .github/data/assemblies/MK58361X-H3N2.fa

rm .github/data/assemblies/MK58361*.1_segment_*.fa
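The six `curl` lines differ only in the accession, segment number and gene name. The sketch below shows how the same URLs and file names could be generated from a table. The accession-to-segment mapping is copied from the script; the function and constant names are hypothetical and nothing here is part of the PR.

```python
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Accession -> (segment number, gene), as listed in download_assemblies.sh.
H3N2_SEGMENTS = {
    "MK583610.1": (1, "PB2"),
    "MK583611.1": (2, "PB1"),
    "MK583612.1": (3, "PA"),
    "MK583613.1": (4, "HA"),
    "MK583614.1": (5, "NP"),
    "MK583615.1": (6, "NA"),
}


def efetch_fasta_url(accession):
    """Build the efetch URL with the same query parameters the script uses."""
    params = {"retmode": "text", "id": accession, "db": "nucleotide", "rettype": "fasta"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def segment_filename(accession):
    """Reproduce the per-segment file name pattern used by the script."""
    number, gene = H3N2_SEGMENTS[accession]
    return f"{accession}_segment_{number}_{gene}_H3N2.fa"
```

With a table like this, adding another segment would be a one-line change instead of another copied `curl` command.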
7 changes: 7 additions & 0 deletions .github/scripts/download_fluviewer_db.sh
@@ -0,0 +1,7 @@
#!/bin/bash

mkdir -p .github/data/fluviewer_db

wget -O .github/data/fluviewer_db/FluViewer_db_v_0_1_8.fa.gz https://raw.githubusercontent.com/KevinKuchinski/FluViewer/main/FluViewer_db_v_0_1_8.fa.gz

gunzip .github/data/fluviewer_db/FluViewer_db_v_0_1_8.fa.gz
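The `gunzip` step leaves an uncompressed `.fa` file and removes the `.gz`. For reference, a rough Python equivalent of that step using only the standard library could look like the sketch below. `gunzip_file` is a hypothetical helper; the PR itself uses the shell commands shown above.

```python
import gzip
import os
import shutil


def gunzip_file(gz_path):
    """Decompress gz_path alongside itself and delete the .gz, similar to `gunzip`."""
    if not gz_path.endswith(".gz"):
        raise ValueError(f"expected a path ending in .gz, got {gz_path!r}")
    out_path = gz_path[: -len(".gz")]
    # Stream the data so a large database is never held fully in memory.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(gz_path)
    return out_path
```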
21 changes: 18 additions & 3 deletions .github/scripts/run_pipeline.sh
@@ -2,11 +2,26 @@

set -eo pipefail

sed -i 's/cpus = 8/cpus = 4/g' nextflow.config
sed -i "s/memory = '32 GB'/memory = '2 GB'/g" nextflow.config
source ${HOME}/.bashrc

eval "$(conda shell.bash hook)"

nextflow run main.nf \
conda activate base

# Check for a sign that we're in the GitHub Actions environment.
# Prevents these settings from being applied in other environments.
if [ -z "${GITHUB_ACTIONS}" ]; then
    echo "Not in GitHub Actions environment. Will not modify nextflow.config or FluViewer.nf."
else
    echo "In GitHub Actions environment. Modifying nextflow.config and FluViewer.nf."
    sed -i 's/cpus = 8/cpus = 4/g' nextflow.config
    sed -i '/memory/d' modules/FluViewer.nf
fi

nextflow -log artifacts/nextflow.log \
    run main.nf \
    -profile conda \
    --cache ${HOME}/.conda/envs \
    --fastq_input .github/data/fastq \
    --db .github/data/fluviewer_db/FluViewer_db_v_0_1_8.fa \
    --outdir .github/data/test_output
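The guard added to `run_pipeline.sh` applies the resource overrides only when `GITHUB_ACTIONS` is set to a non-empty value (GitHub Actions sets it to `true`). The sketch below restates that logic in Python so the two branches can be tested in isolation. All function names are hypothetical, and the sketch works on strings rather than editing files in place the way `sed -i` does.

```python
def in_github_actions(env):
    """Mirror `[ -z "${GITHUB_ACTIONS}" ]`: unset and empty both count as not in CI."""
    return bool(env.get("GITHUB_ACTIONS", ""))


def reduce_cpus(config_text):
    """Equivalent of: sed 's/cpus = 8/cpus = 4/g'."""
    return config_text.replace("cpus = 8", "cpus = 4")


def drop_memory_lines(module_text):
    """Equivalent of: sed '/memory/d' (delete every line containing 'memory')."""
    kept = [line for line in module_text.splitlines(keepends=True) if "memory" not in line]
    return "".join(kept)


def apply_ci_overrides(config_text, module_text, env):
    """Return both texts unchanged outside GitHub Actions, adjusted inside it."""
    if not in_github_actions(env):
        return config_text, module_text
    return reduce_cpus(config_text), drop_memory_lines(module_text)
```

Passing the environment as a plain mapping (for example `os.environ`) keeps the check easy to exercise locally without exporting variables.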
2 changes: 1 addition & 1 deletion .github/scripts/simulate_reads.sh
@@ -13,7 +13,7 @@ while IFS=',' read -r sample_id assembly; do
art_illumina \
--paired \
--in ${assembly} \
--fcov 12 \
--fcov 500 \
--len 150 \
--mflen 400 \
--sdev 100 \
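Raising `--fcov` from 12 to 500 increases the number of simulated reads roughly in proportion. As a back-of-the-envelope sketch, assuming fold coverage is approximately total read bases divided by reference length, the number of read pairs can be estimated as below. The 3,000 nt reference length in the usage note is a made-up round number, and `approx_read_pairs` is a hypothetical helper, not something ART provides.

```python
def approx_read_pairs(reference_length, fold_coverage, read_length):
    """Estimate read pairs needed so that total read bases ~= coverage * reference length.

    Each pair contributes two reads of read_length bases. This is an
    approximation for intuition only, not ART's exact read-count logic.
    """
    return (reference_length * fold_coverage) // (2 * read_length)
```

For a hypothetical 3,000 nt reference with 150 nt reads, `approx_read_pairs(3000, 12, 150)` gives 120 pairs, while `approx_read_pairs(3000, 500, 150)` gives 5000 pairs.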
50 changes: 50 additions & 0 deletions .github/workflows/tests.yml
@@ -0,0 +1,50 @@
on:
  pull_request:
    branches:
      - main
  push:
    branches:
      - main
  workflow_dispatch:
name: Tests
jobs:
  test:
    strategy:
      fail-fast: false
      matrix:
        nextflow_version:
          - "21.04.3"
          # - "23.10.1" <- Failing due to 'conda.useMamba = true'. Issue is in test environment. Revisit
    name: Run tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@master
      - name: Create Artifacts Directory
        run: mkdir artifacts
      - name: Install Miniconda
        run: bash .github/scripts/install_conda.sh
      - name: Install Nextflow
        env:
          NXF_VER: ${{ matrix.nextflow_version }}
        run: bash .github/scripts/install_nextflow.sh
      - name: Create ART Read-Simulation Environment
        run: bash .github/scripts/create_art_environment.sh
      - name: Download Assemblies
        run: bash .github/scripts/download_assemblies.sh
      - name: Simulate Reads
        run: bash .github/scripts/simulate_reads.sh
      - name: Run Pipeline
        run: bash .github/scripts/run_pipeline.sh
      - name: Create Output Checking Environment
        run: bash .github/scripts/create_output_checking_environment.sh
      - name: Check Outputs
        run: bash .github/scripts/check_outputs.sh
      - name: Prepare Artifacts
        if: always()
        run: bash .github/scripts/prepare_artifacts.sh
      - name: Upload Artifacts
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: artifacts-BCCDC-PHL-fluviewer-nf-nextflow-v${{ matrix.nextflow_version }}-${{ github.run_id }}.${{ github.run_attempt }}
          path: artifacts
11 changes: 8 additions & 3 deletions .gitignore
@@ -1,11 +1,16 @@
.github/data/assemblies
.github/data/fastq
.github/data/fluviewer_db
.nextflow*
work
test*
test_input
test_output
test_data/
./test*
ref/
input_test/
output_test/
Validation_notes.md
.Rproj.user
__pycache__/
assets/genoflu/GenoFLU
*/__pycache__/*.pyc
assets/genoflu/GenoFLU
41 changes: 26 additions & 15 deletions ReadMe.md
@@ -1,6 +1,6 @@
[![Tests](https://github.com/BCCDC-PHL/fluviewer-nf/actions/workflows/tests.yml/badge.svg)](https://github.com/BCCDC-PHL/fluviewer-nf/actions/workflows/tests.yml)

# FluViewer-nf
# fluviewer-nf

This is a Nextflow pipeline for running the FluViewer analysis tool (https://github.com/KevinKuchinski/FluViewer) and other custom modules to obtain consensus sequences, HA and NA subtypes, clade calls, and amino acid mutations for Influenza A WGS.

@@ -67,7 +67,13 @@ For a full list of optional arguments, see: https://github.com/KevinKuchinski/Fl

**Example command:**
```
nextflow run FluViewer_installation/main.nf -r 0.1.0 -profile --cache ~/.conda/envs/ --fastq_input flu_A_reference_collection/ --db ref/FluViewer_db_full_20220915.fasta --outdir [outdir]
nextflow run BCCDC-PHL/fluviewer-nf \
-r v0.2.2 \
-profile conda \
--cache ~/.conda/envs \
--fastq_input /path/to/your_fastqs \
--db /path/to/FluViewer_db.fa \
--outdir /path/to/output_dir
```
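When the pipeline is launched from a script rather than typed by hand, the example command above can be assembled as an argument list. The sketch below uses only the flags shown in the README example. `build_nextflow_command` is a hypothetical helper and the paths in the usage note are placeholders.

```python
def build_nextflow_command(fastq_input, db, outdir, revision="v0.2.2", cache="~/.conda/envs"):
    """Assemble the README's example invocation as an argv list.

    Note: when the list is run without a shell, `~` in `cache` is not
    expanded; pass an absolute path or call os.path.expanduser first.
    """
    return [
        "nextflow", "run", "BCCDC-PHL/fluviewer-nf",
        "-r", revision,
        "-profile", "conda",
        "--cache", cache,
        "--fastq_input", fastq_input,
        "--db", db,
        "--outdir", outdir,
    ]
```

A caller could then run `subprocess.run(build_nextflow_command("/path/to/your_fastqs", "/path/to/FluViewer_db.fa", "/path/to/output_dir"), check=True)`.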

## Output
@@ -121,27 +127,32 @@ Output for each run includes:
For each pipeline invocation, each sample will produce a `provenance.yml` file with the following contents. Note that the example below is contrived.

```yml
- process_name: FluViewer
tool_name: FluViewer
tool_version: FluViewer v0.0.2
database used: FluViewer_db_full_20220915.fasta
database_path: /home/{USER}/Flu/ref/FluViewer_db_full_20220915.fasta
database sha256: 55b33afa21ad44ed1e6db896cf420fae6b1524c0ad205775a1ce9dd11595905d
- pipeline_name: BCCDC-PHL/FluViewer-nf
pipeline_version: 0.2.2
timestamp_analysis_start: 2023-11-21T05:43:25.541743
- input_filename: {Sample}_R1.fastq.gz
input_path: /home/{USER}/Flu/test_data/test_production_run/{Sample}_R1.fastq.gz
sha256: 47380e49f10374660a2061d3571efe5339401484e646c2b47896fa701dbcf0a8
- input_filename: {Sample}_R2.fastq.gz
input_path: /home/{USER}/Flu/test_data/test_production_run/{Sample}_R2.fastq.gz
sha256: 39c95fd26af111ee9a6caeb840a7aced444b657550efea3ab7f74add0b30f69d
- process_name: fastp
tool_name: fastp
tool_version: 0.23.1
tools:
- tool_name: fastp
tool_version: 0.23.1
- process_name: cutadapt
tool_name: cutadapt
tool_version: 4.1
- pipeline_name: BCCDC-PHL/FluViewer-nf
pipeline_version: 0.2.0
- timestamp_analysis_start: 2023-11-21T05:43:25.541743
tools:
- tool_name: cutadapt
tool_version: 4.1
- process_name: fluviewer
tools:
- tool_name: FluViewer
tool_version: FluViewer v0.0.2
databases:
- database_name: FluViewer_db_full_20220915.fasta
database_path: /home/{USER}/Flu/ref/FluViewer_db_full_20220915.fasta
database_sha256: 55b33afa21ad44ed1e6db896cf420fae6b1524c0ad205775a1ce9dd11595905d

- process_name: nextclade
tool_name: nextclade
tool_version: 2.9.1
Binary file removed bin/__pycache__/tools.cpython-310.pyc
Binary file not shown.
2 changes: 1 addition & 1 deletion environments/main.yml → environments/environment.yml
@@ -1,4 +1,4 @@
name: FluViewer-nf
name: fluviewer-nf
channels:
- conda-forge
- bioconda
4 changes: 2 additions & 2 deletions environments/fluviewer.yml
@@ -1,4 +1,4 @@
name: FluViewer
name: fluviewer-nf-FluViewer
channels:
- conda-forge
- bioconda
@@ -209,4 +209,4 @@ dependencies:
- zlib=1.2.13=hd590300_5
- zstd=1.5.5=hfc55251_0
- pip:
- fluviewer
- FluViewer==0.1.11
4 changes: 2 additions & 2 deletions environments/nextclade.yml
@@ -1,7 +1,7 @@
name: FluViewer-nf
name: fluviewer-nf-nextclade
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- nextclade=2.9.1
- nextclade=2.9.1