Master html creation with timestamps #43

Open
wants to merge 24 commits into
base: dev
98895ba
Add py script that generates master html
ryanjameskennedy Dec 16, 2024
05cc1ac
Add basic master html template
ryanjameskennedy Dec 17, 2024
e1b9079
Add basic master html template
ryanjameskennedy Dec 17, 2024
4f18949
Add bootstrap to master html template
ryanjameskennedy Dec 17, 2024
a47a664
Add cards and seqrun date
ryanjameskennedy Dec 19, 2024
d6b6f40
Add generate_master_html module
ryanjameskennedy Dec 19, 2024
25caede
Add generate_master_html module to gmsemu.nf workflow
ryanjameskennedy Dec 19, 2024
11df34f
Add generate_master_html to configs
ryanjameskennedy Dec 19, 2024
b8829d8
Add cmd.config
ryanjameskennedy Dec 20, 2024
917da0e
Add search for date_id
ryanjameskennedy Jan 2, 2025
a4bfa1c
Add nested header to master.html and remove fastqc
ryanjameskennedy Jan 22, 2025
eba0557
Merge pull request #2 from SMD-Bioinformatics-Lund/35-generate-master…
ryanjameskennedy Jan 23, 2025
7ea775e
Merge branch 'dev' into main
ryanjameskennedy Jan 31, 2025
0e37f44
Merge dev branch into main w resolved conflicts
ryanjameskennedy Jan 31, 2025
1bca118
Change pipeline execution output filenames
ryanjameskennedy Feb 3, 2025
112465a
Update generate_master_html to include timestap as input variable
ryanjameskennedy Feb 4, 2025
0dcbacc
Rm MERGE_BARCODES publishDir for unnecessary publishing of reads to s…
ryanjameskennedy Feb 4, 2025
9722dc0
Add params.trace_timestamp
ryanjameskennedy Feb 4, 2025
6050b9b
Update CHANGELOG re generate_master_html
ryanjameskennedy Feb 4, 2025
5836b23
Add changelog_update_reminder GA workflow
ryanjameskennedy Feb 4, 2025
af8db01
Fix params.trace_timestamp in GENERATE_MASTER_HTML process
ryanjameskennedy Feb 4, 2025
a542cb1
Update CHANGELOG re changelog_update_reminder GA workflow
ryanjameskennedy Feb 4, 2025
57145f1
Provide option to save_merged_reads
ryanjameskennedy Feb 12, 2025
b8d1ae1
Add toggling of publishDir for merged reads
ryanjameskennedy Feb 13, 2025
14 changes: 14 additions & 0 deletions .github/workflows/changelog_update_reminder.yml
@@ -0,0 +1,14 @@
name: "Changelog update reminder"
on:
  pull_request:
    types: [opened, synchronize, reopened, ready_for_review, labeled, unlabeled]

jobs:
  changelog:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dangoslen/changelog-enforcer@v3
        with:
          changeLogPath: 'CHANGELOG.md'
          skipLabel: 'Skip-Changelog'
16 changes: 16 additions & 0 deletions CHANGELOG.md
@@ -4,8 +4,24 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added

- Added a `generate_master_html` python script that creates a `master.html` file containing a table of samples with links to each html output file
- Added the respective `GENERATE_MASTER_HTML` process
- Added `cmd.config`
- Added `params.trace_timestamp` to `nextflow.config`
- Added `changelog_update_reminder` GA workflow
- Added optional ability to save merged reads

### Fixed

### Changed

- Provided option to `save_merged_reads`

## [v0.1.0]

### Added
72 changes: 72 additions & 0 deletions assets/master_template.html
@@ -0,0 +1,72 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>16S Samples Report</title>
<link href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css" rel="stylesheet">
</head>
<body>
<div class="container my-5">
<div class="card">
<div class="card-header text-white bg-primary">
<h2 class="card-title mb-0">Sample Report</h2>
</div>
<div class="card-body">
<div class="table-responsive">
<table class="table table-bordered table-striped table-hover">
<thead class="table-success">
<tr>
<th rowspan="2">Sample ID</th>
<th colspan="1" class="text-center">Results</th>
<th colspan="1" class="text-center">QC</th>
<th colspan="8" class="text-center">NanoPlot</th>
<th colspan="3" class="text-center">Pipeline</th>
</tr>
<tr>
<th class="text-center">Krona</th>
<th class="text-center">MultiQC Report</th>
<th class="text-center">Length vs Quality Scatter (Dot)</th>
<th class="text-center">Length vs Quality Scatter (KDE)</th>
<th class="text-center">Report</th>
<th class="text-center">Non-weighted Histogram</th>
<th class="text-center">Non-weighted Log-transformed Histogram</th>
<th class="text-center">Weighted Histogram</th>
<th class="text-center">Weighted Log-transformed Histogram</th>
<th class="text-center">Yield by Length</th>
<th class="text-center">Execution Report</th>
<th class="text-center">Execution Timeline</th>
<th class="text-center">DAG</th>
</tr>
</thead>
<tbody>
{% for sample_id in sample_ids %}
<tr>
<td>{{ sample_id }}</td>
<td><a href="./krona/{{ sample_id }}_T1_krona.html">Krona</a></td>
<td><a href="./multiqc/multiqc_report.html">MultiQC</a></td>
<td><a href="./nanoplot/{{ sample_id }}_T1_nanoplot_unprocessedLengthvsQualityScatterPlot_dot.html">Dot Scatter Plot</a></td>
<td><a href="./nanoplot/{{ sample_id }}_T1_nanoplot_unprocessedLengthvsQualityScatterPlot_kde.html">KDE Scatter Plot</a></td>
<td><a href="./nanoplot/{{ sample_id }}_T1_nanoplot_unprocessedNanoPlot-report.html">NanoPlot Report</a></td>
<td><a href="./nanoplot/{{ sample_id }}_T1_nanoplot_unprocessedNon_weightedHistogramReadlength.html">Non-weighted Histogram</a></td>
<td><a href="./nanoplot/{{ sample_id }}_T1_nanoplot_unprocessedNon_weightedLogTransformed_HistogramReadlength.html">Non-weighted Log-transformed Histogram</a></td>
<td><a href="./nanoplot/{{ sample_id }}_T1_nanoplot_unprocessedWeightedHistogramReadlength.html">Weighted Histogram</a></td>
<td><a href="./nanoplot/{{ sample_id }}_T1_nanoplot_unprocessedWeightedLogTransformed_HistogramReadlength.html">Weighted Log-transformed Histogram</a></td>
<td><a href="./nanoplot/{{ sample_id }}_T1_nanoplot_unprocessedYield_By_Length.html">Yield by Length</a></td>
<td><a href="./pipeline_info/execution_report_{{ timestamp }}.html">Execution Report</a></td>
<td><a href="./pipeline_info/execution_timeline_{{ timestamp }}.html">Execution Timeline</a></td>
<td><a href="./pipeline_info/pipeline_dag_{{ timestamp }}.html">Pipeline DAG</a></td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
</div>
<div class="card-footer text-muted">
Sequenced on {{ seqrun_date }}
</div>
</div>
</div>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/js/bootstrap.bundle.min.js"></script>
</body>
</html>
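
The link targets in the template follow a fixed naming scheme, so they can be checked against the output directory before `master.html` is published. The sketch below is not part of this PR: `expected_links` is a hypothetical helper that rebuilds a subset of the relative paths for one sample, assuming the `_T1_` infix and the directory layout used in the template above.

```python
# Hypothetical helper, not part of the PR: rebuilds some of the relative
# link targets that master_template.html renders for a single sample.
NANOPLOT_SUFFIXES = [
    "LengthvsQualityScatterPlot_dot",
    "LengthvsQualityScatterPlot_kde",
    "NanoPlot-report",
    "Yield_By_Length",
]


def expected_links(sample_id: str, timestamp: str) -> list[str]:
    """Return relative paths that master.html links to for one sample."""
    # Per-sample Krona plot and the shared MultiQC report.
    links = [f"krona/{sample_id}_T1_krona.html", "multiqc/multiqc_report.html"]
    # A subset of the NanoPlot outputs named in the template.
    links += [
        f"nanoplot/{sample_id}_T1_nanoplot_unprocessed{suffix}.html"
        for suffix in NANOPLOT_SUFFIXES
    ]
    # Pipeline execution files carry the run timestamp in their names.
    links += [
        f"pipeline_info/{name}_{timestamp}.html"
        for name in ("execution_report", "execution_timeline", "pipeline_dag")
    ]
    return links
```

Joining each returned path onto the output directory and testing whether the file exists would flag broken links, for example when the timestamp passed to the script differs from the one used in the execution report filenames.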
115 changes: 115 additions & 0 deletions bin/generate_master_html.py
@@ -0,0 +1,115 @@
#!/usr/bin/env python

"""Generate a master html template."""

import os
import re
import argparse
import pandas as pd
from jinja2 import Template
from datetime import datetime

description = '''
------------------------
Title: generate_master_html.py
Date: 2024-12-16
Author(s): Ryan Kennedy
------------------------
Description:
This script creates a master html file that links to all html files that were output by the EMU pipeline.

List of functions:
find_date_in_string, get_sample_ids, generate_master_html.

List of standard modules:
os, re, argparse, datetime.

List of "non standard" modules:
pandas, jinja2.

Procedure:
1. Get sample IDs by parsing samplesheet csv.
2. Extract the sequencing run date from the samplesheet filepath.
3. Render html using template.
4. Write out master.html file.

-----------------------------------------------------------------------------------------------------------
'''

usage = '''
-----------------------------------------------------------------------------------------------------------
Generates a master html file that points to all html files.
Executed using: python3 ./generate_master_html.py -c <Samplesheet_CSV_Filepath> -m <Master_HTML_Template_Filepath> -t <Pipeline_Execution_Timestamp>
-----------------------------------------------------------------------------------------------------------
'''

parser = argparse.ArgumentParser(
description=description,
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog=usage
)
parser.add_argument(
'-v', '--version',
action='version',
version='%(prog)s 0.0.1'
)
parser.add_argument(
'-c', '--csv',
help='input samplesheet csv filepath',
metavar='SAMPLESHEET_CSV_FILEPATH',
dest='csv',
required=True
)
parser.add_argument(
'-m', '--html',
help='input master html template filepath',
metavar='MASTER_HTML_TEMPLATE_FILEPATH',
dest='html',
required=True
)
parser.add_argument(
'-t', '--timestamp',
help='pipeline execution timestamp',
metavar='PIPELINE_EXECUTION_TIMESTAMP',
dest='timestamp',
required=True
)

args = parser.parse_args()

def find_date_in_string(input_string, date_pattern):
    """Search for a date within a given string."""
    # Default to a placeholder so the report footer is never left empty.
    date = "(No date found)"
    match = re.search(date_pattern, input_string)
    if match:
        date_regex = match.group(1)
        if len(date_regex) == 8:
            # Reformat YYYYMMDD as DD-MM-YYYY.
            date = datetime.strptime(date_regex, "%Y%m%d").strftime("%d-%m-%Y")
        elif len(date_regex) > 8:
            date = date_regex
    return date


def get_sample_ids(samplesheet_csv):
    """Get sample ids from the `sample` column of the samplesheet csv."""
    df = pd.read_csv(samplesheet_csv)
    sample_ids = df['sample'].tolist()
    return sample_ids


def generate_master_html(template_html_fpath, sample_ids, seqrun_date, timestamp):
    """Render the master html from a jinja2 template file."""
    with open(template_html_fpath, "r") as file:
        master_template = file.read()
    template = Template(master_template)
    rendered_html = template.render(sample_ids=sample_ids, seqrun_date=seqrun_date, timestamp=timestamp)
    return rendered_html


def main():
    """Build master.html from the samplesheet, template and timestamp."""
    sample_ids = get_sample_ids(args.csv)
    # The run date is taken from a /YYYYMMDD_ component of the samplesheet path.
    seqrun_date = find_date_in_string(args.csv, r'/(\d{8})_')
    rendered_html = generate_master_html(args.html, sample_ids, seqrun_date, args.timestamp)
    with open("master.html", "w") as fout:
        fout.write(rendered_html)


if __name__ == "__main__":
    main()
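
The sequencing run date depends entirely on the samplesheet path containing a `/YYYYMMDD_` component. As a minimal standalone sketch of that extraction (standard library only; `seqrun_date_from_path` and the example paths are hypothetical and not part of the PR), the behaviour for a matching and a non-matching path looks like this:

```python
import re
from datetime import datetime


def seqrun_date_from_path(csv_path: str) -> str:
    """Extract a YYYYMMDD run date from a path and format it as DD-MM-YYYY."""
    # Same pattern as the one passed in main(): eight digits between "/" and "_".
    match = re.search(r"/(\d{8})_", csv_path)
    if not match:
        return "(No date found)"
    return datetime.strptime(match.group(1), "%Y%m%d").strftime("%d-%m-%Y")


# Hypothetical paths, for illustration only.
print(seqrun_date_from_path("/data/20241216_run01/samplesheet.csv"))  # 16-12-2024
print(seqrun_date_from_path("samplesheet.csv"))  # (No date found)
```

Because the pattern requires a leading `/`, a relative path such as `20241216_run01/samplesheet.csv` would not match, so the samplesheet presumably needs to be passed with a path that includes the dated run directory after a slash.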
30 changes: 30 additions & 0 deletions conf/cmd.config
@@ -0,0 +1,30 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Nextflow config file for the CMD high performance profile
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Defines the executor, queue, database path and resource limits used by the CMD profile.

    Use as follows:
        nextflow run nf-core/gmsemu -profile <profile>,<docker/singularity> --outdir <OUTDIR>

----------------------------------------------------------------------------------------
*/

// Executor settings belong to the process scope, not to params
process {
    executor = 'slurm'
    queue    = 'low'
}

params {
    config_profile_name        = 'cmd profile'
    config_profile_description = 'CMD High performance profile'

    // Databases
    db = '/fs1/pipelines/gms_16S-dev/assets/databases/emu_database'

    // Resource limits
    max_cpus   = 60
    max_memory = '300.GB'
    max_time   = '48.h'

    // Reads
    save_merged_reads = false

}
16 changes: 12 additions & 4 deletions conf/modules.config
Collaborator

Maybe we could keep that part and instead add a flag that gives you the option to save the merged files?
It seems the files were still saved, though, in a directory called "merge".
Still, I think a flag for this is a good idea.

@@ -22,17 +22,17 @@ process {
         publishDir = [
             path: { "${params.outdir}/fastq_pass_merged" },
             mode: params.publish_dir_mode,
-            pattern: 'fastq_pass_merged'
+            pattern: 'fastq_pass_merged',
+            enabled: params.save_merged_reads
         ]
     }

-
     withName: MERGE_BARCODES_SAMPLESHEET {
         publishDir = [
             path: { "${params.outdir}/fastq_pass_merged" },
             mode: params.publish_dir_mode,
-            pattern: 'fastq_pass_merged'
-            // pattern: '*fastq.gz'
+            pattern: 'fastq_pass_merged',
+            enabled: params.save_merged_reads
         ]
     }

@@ -44,6 +44,14 @@ process {
         ]
     }

+    withName: GENERATE_MASTER_HTML {
+        publishDir = [
+            path: { "${params.outdir}/" },
+            mode: params.publish_dir_mode,
+            pattern: 'master.html'
+        ]
+    }
+
     withName: NANOPLOT1 {
         publishDir = [
             path: { "${params.outdir}/nanoplot" },
19 changes: 19 additions & 0 deletions modules/local/generate_master_html/main.nf
@@ -0,0 +1,19 @@
process GENERATE_MASTER_HTML {
// Software MUST be pinned to channel (i.e. "bioconda"), version (i.e. "1.10").
// For Conda, the build (i.e. "pyhdfd78af_1") must be EXCLUDED to support installation on different operating systems.
conda "conda-forge::nf-core=3.0.2"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/nf-core:3.0.2--pyhdfd78af_1':
'quay.io/biocontainers/nf-core:3.0.2' }"

input:
path csv

output:
path 'master.html', emit: master_html

script:
"""
generate_master_html.py --csv ${csv} --html ${params.master_template} --timestamp ${params.trace_timestamp}
"""
}
54 changes: 54 additions & 0 deletions modules/local/generate_master_html/meta.yml
@@ -0,0 +1,54 @@
name: "generate_master_html"
description: Generates a master.html file containing a table of samples with links to each html output file.
keywords:
  - html
  - report
  - 16S

tools:
  - "generate_master_html":
      description: "Local python script (bin/generate_master_html.py) that renders master.html from a jinja2 template."

input:
  - csv:
      type: file
      description: Samplesheet csv file containing a `sample` column
      pattern: "*.csv"

output:
  - master_html:
      type: file
      description: Master html file that links to the html output files of each sample
      pattern: "master.html"

authors:
  - "@ryanjameskennedy"