PD-2559 and PD-2558: UMI filtering and cell barcode correction #1263

Merged (10 commits) on Apr 25, 2024
11 changes: 8 additions & 3 deletions pipelines/skylab/multiome/Multiome.changelog.md
@@ -1,15 +1,20 @@
# 3.4.3
2024-04-24 (Date of Last Commit)

* Updated the input parameters for STARsolo in STARsoloFastq task. These include the parameters: soloCBmatchWLtype, soloUMIdedup and soloUMIfiltering
* Added "Uniform" as the default string for STARsolo multimapping parameters

# 3.4.2
20240403 (Date of Last Commit)
2024-04-03 (Date of Last Commit)
* Modified adaptor trimming in Paired-tag WDL; this does not impact Multiome

# 3.4.1
20240326 (Date of Last Commit)
2024-03-26 (Date of Last Commit)

* Updated the median umi per cell metric for STARsolo library-level metrics

# 3.4.0
20240315 (Date of Last Commit)
2024-03-15 (Date of Last Commit)

* Added cell metrics to the library-level metrics

2 changes: 1 addition & 1 deletion pipelines/skylab/multiome/Multiome.wdl
@@ -7,7 +7,7 @@ import "https://raw.githubusercontent.com/broadinstitute/CellBender/v0.3.0/wdl/c

workflow Multiome {

String pipeline_version = "3.4.2"
String pipeline_version = "3.4.3"

input {
String input_id
9 changes: 7 additions & 2 deletions pipelines/skylab/optimus/Optimus.changelog.md
@@ -1,11 +1,16 @@
# 6.6.2
2024-04-24 (Date of Last Commit)

* Updated the input parameters for STARsolo in STARsoloFastq task. These include the parameters: soloCBmatchWLtype, soloUMIdedup and soloUMIfiltering
* Added "Uniform" as the default string for STARsolo multimapping parameters

# 6.6.1
20240326 (Date of Last Commit)
2024-03-26 (Date of Last Commit)

* Updated the median umi per cell metric for STARsolo library-level metrics

# 6.6.0
20240315 (Date of Last Commit)
2024-03-15 (Date of Last Commit)

* Added cell metrics to the library-level metrics

4 changes: 2 additions & 2 deletions pipelines/skylab/optimus/Optimus.wdl
@@ -30,7 +30,7 @@ workflow Optimus {
File tar_star_reference
File annotations_gtf
File? mt_genes
String? soloMultiMappers
String? soloMultiMappers = "Uniform"

# Chemistry options include: 2 or 3
Int tenx_chemistry_version
@@ -65,7 +65,7 @@ workflow Optimus {
# version of this pipeline


String pipeline_version = "6.6.1"
String pipeline_version = "6.6.2"


# this is used to scatter matched [r1_fastq, r2_fastq, i1_fastq] arrays
6 changes: 6 additions & 0 deletions pipelines/skylab/paired_tag/PairedTag.changelog.md
@@ -1,3 +1,9 @@
# 0.5.1
2024-04-12 (Date of Last Commit)

* Updated the input parameters for STARsolo in STARsoloFastq task. These include the parameters: soloCBmatchWLtype, soloUMIdedup and soloUMIfiltering
* Added "Uniform" as the default string for STARsolo multimapping parameters

# 0.5.0
20240403 (Date of Last Commit)

2 changes: 1 addition & 1 deletion pipelines/skylab/paired_tag/PairedTag.wdl
@@ -5,7 +5,7 @@ import "../../../pipelines/skylab/optimus/Optimus.wdl" as optimus
import "../../../tasks/skylab/H5adUtils.wdl" as H5adUtils
import "../../../tasks/skylab/PairedTagUtils.wdl" as Demultiplexing
workflow PairedTag {
String pipeline_version = "0.5.0"
String pipeline_version = "0.5.1"

input {
String input_id
9 changes: 7 additions & 2 deletions pipelines/skylab/slideseq/SlideSeq.changelog.md
@@ -1,10 +1,15 @@
# 3.1.5
2024-04-12 (Date of Last Commit)

* Updated the input parameters for STARsolo in STARsoloFastq task. These include the parameters: soloCBmatchWLtype, soloUMIdedup and soloUMIfiltering

# 3.1.4
20240326 (Date of Last Commit)
2024-03-26 (Date of Last Commit)

* Updated the median umi per cell metric for STARsolo library-level metrics

# 3.1.3
20240315 (Date of Last Commit)
2024-03-15 (Date of Last Commit)

* Added cell metrics to the library-level metrics CSV; this does not impact the slide-seq pipeline

2 changes: 1 addition & 1 deletion pipelines/skylab/slideseq/SlideSeq.wdl
@@ -23,7 +23,7 @@ import "../../../tasks/skylab/MergeSortBam.wdl" as Merge

workflow SlideSeq {

String pipeline_version = "3.1.4"
String pipeline_version = "3.1.5"

input {
Array[File] r1_fastq
@@ -1,10 +1,15 @@
# 1.3.4
2024-04-12 (Date of Last Commit)

* Updated the input parameters for STARsolo in STARsoloFastq task. These include the parameters: soloCBmatchWLtype, soloUMIdedup and soloUMIfiltering

# 1.3.3
20240326 (Date of Last Commit)
2024-03-26 (Date of Last Commit)

* Updated the median umi per cell metric for STARsolo library-level metrics

# 1.3.2
20240315 (Date of Last Commit)
2024-03-15 (Date of Last Commit)

* Added cell metrics to the library-level metrics CSV; this does not impact the Single-nucleus Multi Sample Smartseq pipeline

@@ -40,7 +40,7 @@ workflow MultiSampleSmartSeq2SingleNucleus {
String? input_id_metadata_field
}
# Version of this pipeline
String pipeline_version = "1.3.3"
String pipeline_version = "1.3.4"

if (false) {
String? none = "None"
82 changes: 23 additions & 59 deletions tasks/skylab/StarAlign.wdl
@@ -252,7 +252,7 @@ task STARsoloFastq {
}

command <<<
set -e
set -e

UMILen=10
CBLen=16
@@ -292,79 +292,43 @@ task STARsoloFastq {
## single cell or whole cell
COUNTING_MODE="Gene"
echo "Running in ~{counting_mode} mode. The Star parameter --soloFeatures will be set to $COUNTING_MODE"
STAR \
--soloType Droplet \
--soloStrand ~{star_strand_mode} \
--runThreadN ~{cpu} \
--genomeDir genome_reference \
--readFilesIn "~{sep=',' r2_fastq}" "~{sep=',' r1_fastq}" \
--readFilesCommand "gunzip -c" \
--soloCBwhitelist ~{white_list} \
--soloUMIlen $UMILen --soloCBlen $CBLen \
--soloFeatures $COUNTING_MODE \
--clipAdapterType CellRanger4 \
--outFilterScoreMin 30 \
--soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts \
--soloUMIdedup 1MM_Directional_UMItools \
--outSAMtype BAM SortedByCoordinate \
--outSAMattributes UB UR UY CR CB CY NH GX GN sF \
--soloBarcodeReadLength 0 \
--soloCellReadStats Standard \
~{"--soloMultiMappers " + soloMultiMappers}
elif [[ "~{counting_mode}" == "sn_rna" ]]
then
## single nuclei
if [[ ~{count_exons} == false ]]
then
COUNTING_MODE="GeneFull_Ex50pAS"
echo "Running in ~{counting_mode} mode. Count_exons is false and the Star parameter --soloFeatures will be set to $COUNTING_MODE"
STAR \
--soloType Droplet \
--soloStrand ~{star_strand_mode} \
--runThreadN ~{cpu} \
--genomeDir genome_reference \
--readFilesIn "~{sep=',' r2_fastq}" "~{sep=',' r1_fastq}" \
--readFilesCommand "gunzip -c" \
--soloCBwhitelist ~{white_list} \
--soloUMIlen $UMILen --soloCBlen $CBLen \
--soloFeatures $COUNTING_MODE \
--clipAdapterType CellRanger4 \
--outFilterScoreMin 30 \
--soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts \
--soloUMIdedup 1MM_Directional_UMItools \
--outSAMtype BAM SortedByCoordinate \
--outSAMattributes UB UR UY CR CB CY NH GX GN sF \
--soloBarcodeReadLength 0 \
--soloCellReadStats Standard \
~{"--soloMultiMappers " + soloMultiMappers}
else
COUNTING_MODE="GeneFull_Ex50pAS Gene"
echo "Running in ~{counting_mode} mode. Count_exons is true and the Star parameter --soloFeatures will be set to $COUNTING_MODE"
STAR \
--soloType Droplet \
--soloStrand ~{star_strand_mode} \
--runThreadN ~{cpu} \
--genomeDir genome_reference \
--readFilesIn "~{sep=',' r2_fastq}" "~{sep=',' r1_fastq}" \
--readFilesCommand "gunzip -c" \
--soloCBwhitelist ~{white_list} \
--soloUMIlen $UMILen --soloCBlen $CBLen \
--soloFeatures $COUNTING_MODE \
--clipAdapterType CellRanger4 \
--outFilterScoreMin 30 \
--soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts \
--soloUMIdedup 1MM_Directional_UMItools \
--outSAMtype BAM SortedByCoordinate \
--outSAMattributes UB UR UY CR CB CY NH GX GN sF \
--soloBarcodeReadLength 0 \
--soloCellReadStats Standard \
~{"--soloMultiMappers " + soloMultiMappers}
echo "Running in ~{counting_mode} mode. Count_exons is true and the Star parameter --soloFeatures will be set to $COUNTING_MODE"
fi
else
echo Error: unknown counting mode: "~{counting_mode}". Should be either sn_rna or sc_rna.
exit 1;
fi

STAR \
--soloType Droplet \
--soloStrand ~{star_strand_mode} \
--runThreadN ~{cpu} \
--genomeDir genome_reference \
--readFilesIn "~{sep=',' r2_fastq}" "~{sep=',' r1_fastq}" \
--readFilesCommand "gunzip -c" \
--soloCBwhitelist ~{white_list} \
--soloUMIlen $UMILen --soloCBlen $CBLen \
--soloFeatures $COUNTING_MODE \
--clipAdapterType CellRanger4 \
--outFilterScoreMin 30 \
--soloCBmatchWLtype 1MM_multi \
--soloUMIdedup 1MM_CR \
--outSAMtype BAM SortedByCoordinate \
--outSAMattributes UB UR UY CR CB CY NH GX GN sF \
--soloBarcodeReadLength 0 \
--soloCellReadStats Standard \
~{"--soloMultiMappers " + soloMultiMappers} \
--soloUMIfiltering MultiGeneUMI_CR

echo "UMI LEN " $UMILen

touch barcodes_sn_rna.tsv
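The StarAlign.wdl hunk replaces three near-identical STAR invocations with branch logic that only sets COUNTING_MODE, followed by a single shared STAR call. A minimal bash sketch of that pattern follows. The function names `select_solo_features` and `build_solo_flags` are hypothetical and do not appear in the PR, and the sketch only prints the flags that differ or were changed here rather than running STAR. The flag values (`1MM_multi`, `1MM_CR`, `MultiGeneUMI_CR`) are copied from the diff.

```shell
# Hypothetical helper (not in the PR): map the WDL inputs counting_mode and
# count_exons to the value the task passes to --soloFeatures.
select_solo_features() {
  local counting_mode="$1" count_exons="$2"
  case "$counting_mode" in
    sc_rna)
      # single cell or whole cell
      echo "Gene"
      ;;
    sn_rna)
      # single nuclei: optionally count exons in addition to full gene bodies
      if [[ "$count_exons" == "false" ]]; then
        echo "GeneFull_Ex50pAS"
      else
        echo "GeneFull_Ex50pAS Gene"
      fi
      ;;
    *)
      echo "Error: unknown counting mode: $counting_mode. Should be either sn_rna or sc_rna." >&2
      return 1
      ;;
  esac
}

# Hypothetical helper (not in the PR): print the solo flags that the single
# shared STAR call would receive after the branches have run.
build_solo_flags() {
  local features
  features="$(select_solo_features "$1" "$2")" || return 1
  echo "--soloFeatures $features --soloCBmatchWLtype 1MM_multi --soloUMIdedup 1MM_CR --soloUMIfiltering MultiGeneUMI_CR"
}

build_solo_flags sn_rna true
```

Keeping one STAR call means a parameter change such as the new `--soloUMIfiltering` flag is made in one place instead of three, which is where the 59 deleted lines in this hunk come from.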