Merge pull request #625 from KristinaGomoryova/scp_galaxy
galaxy wrapper for the scp tool
hechth authored Jan 22, 2025
2 parents 129488b + 5adfebd commit a0a1a3d
Showing 17 changed files with 1,933 additions and 0 deletions.
10 changes: 10 additions & 0 deletions tools/bioconductor-scp/.shed.yml
@@ -0,0 +1,10 @@
owner: recetox
remote_repository_url: "https://github.com/RECETOX/galaxytools/tree/master/tools/bioconductor-scp"
homepage_url: "https://uclouvain-cbio.github.io/scp/index.html"
categories:
- Proteomics
- Single Cell
description: "scp is a package for single-cell proteomics data processing."
long_description: "scp is an R package for the analysis of mass spectrometry-based single cell proteomics data. It builds on the QFeatures package and allows aggregation to peptide or protein level, data transformation such as log2 transformation or normalization, batch correction and imputation of missing values. It also provides several quality control metrics."
type: unrestricted
name: bioconductor_scp
515 changes: 515 additions & 0 deletions tools/bioconductor-scp/bioconductor_scp.xml

Large diffs are not rendered by default.

38 changes: 38 additions & 0 deletions tools/bioconductor-scp/help.xml
@@ -0,0 +1,38 @@
<macros>

<token name="@GENERAL_HELP@">
Scp Help section
===================

Overview
--------
The `scp` tool facilitates the processing of mass spectrometry-based single-cell proteomics (SCP) data. It builds on the `scp` R package developed in the laboratory of Prof. Laurent Gatto and provides functions for filtering at the peptide-spectrum match (PSM), peptide and protein levels, as well as for normalization, transformation and imputation of missing values.

The source code of the `scp` R package is available on GitHub_ and Bioconductor_; problems can be reported through the package's issues_ tracker.

.. _GitHub: https://github.com/UCLouvain-CBIO/scp/
.. _issues: https://github.com/UCLouvain-CBIO/scp/issues
.. _Bioconductor: https://www.bioconductor.org/packages/release/bioc/html/scp.html

Workflow
--------

The scp workflow currently supports the processing of MaxQuant results and requires two input files:

- evidence.txt file (output from MaxQuant)
- sampleAnnotation file (provided by the user). The sampleAnnotation file is a metadata file that describes the individual samples (quantification column names, batches, sample types, etc.). Note that the run identifier column MUST be present in both the evidence and sampleAnnotation files.

The workflow starts at the PSM level. First, the data are filtered extensively to keep only the most reliable identifications: reverse sequences and potential contaminants are removed, as are PSMs that fall below a parental ion fraction threshold or fail a q-value threshold. Batches with very few features are also excluded.

Subsequently, PSMs are aggregated to the peptide level. At the peptide level, a further filter is applied based on the median relative intensity or the median coefficient of variation (CV). Peptide-level intensities are then normalized and log2-transformed.

The peptide intensities are then aggregated to the protein level, where they are normalized again and missing values are imputed.

Because batch effects are unavoidable in single-cell data, the tool offers two methods for batch correction: ComBat (from the sva package) and removeBatchEffect() from the limma package.

Finally, dimensionality reduction is performed using PCA or UMAP (computed on the PCA components). The PCA and UMAP plots are provided together with the optional quality control plots in the `Plots` collection.

The final log2-transformed, normalized, imputed and batch-corrected data are provided, with the option to also export intermediate results.

Because handling the internal data formats is complex, we opted for a single form with pre-defined settings for the whole processing pipeline. However, we strongly recommend inspecting the QC plots and intermediate results and adjusting the workflow settings accordingly.
</token>
</macros>
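The help text above requires the run identifier column to be present in both input files. This can be checked before running the tool. What follows is a minimal base-R sketch, assuming both tables are already loaded as data frames; the helper name `check_run_column` and its `run_col` argument are hypothetical and are not part of the wrapper or of the `scp` package.

```r
# Hypothetical helper (not part of the scp wrapper): verify that the run
# identifier column exists in both tables and report annotated runs that
# have no matching rows in the evidence table.
check_run_column <- function(evidence, annotation, run_col = "runCol") {
  absent <- c(
    evidence = !run_col %in% colnames(evidence),
    sampleAnnotation = !run_col %in% colnames(annotation)
  )
  if (any(absent)) {
    stop(
      "Column '", run_col, "' is missing from: ",
      paste(names(absent)[absent], collapse = ", ")
    )
  }
  # Runs listed in the annotation but absent from the evidence table
  setdiff(unique(annotation[[run_col]]), unique(evidence[[run_col]]))
}

# Toy tables (intensities are made up); run names are taken from the
# test sampleAnnotation file
evidence <- data.frame(
  runCol = c("190222S_LCA9_X_FP94BM", "190321S_LCA10_X_FP97AG"),
  Reporter.intensity.1 = c(1200, 950)
)
annotation <- data.frame(
  runCol = c("190222S_LCA9_X_FP94BM", "190914S_LCB3_X_16plex_Set_21"),
  quantCols = "Reporter.intensity.1",
  SampleType = "Carrier"
)
unmatched <- check_run_column(evidence, annotation)
print(unmatched)
```

Returning the unmatched run names, rather than stopping on them, leaves it to the caller to decide whether an annotated run without evidence rows is an error or merely a run that was filtered out upstream.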
239 changes: 239 additions & 0 deletions tools/bioconductor-scp/macros.xml

Large diffs are not rendered by default.

Binary file added tools/bioconductor-scp/test-data/PCA.pdf
Binary file not shown.
Binary file added tools/bioconductor-scp/test-data/QC_boxplot.pdf
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added tools/bioconductor-scp/test-data/QC_medianCV.pdf
Binary file not shown.
Binary file added tools/bioconductor-scp/test-data/QC_plot_SCR.pdf
Binary file not shown.
Binary file not shown.
Binary file added tools/bioconductor-scp/test-data/UMAP.pdf
Binary file not shown.
1,000 changes: 1,000 additions & 0 deletions tools/bioconductor-scp/test-data/evidence_subset.txt

Large diffs are not rendered by default.

65 changes: 65 additions & 0 deletions tools/bioconductor-scp/test-data/sampleAnnotation.txt
@@ -0,0 +1,65 @@
runCol quantCols SampleType lcbatch sortday digest
1 190222S_LCA9_X_FP94BM Reporter.intensity.1 Carrier LCA9 s8 N
2 190222S_LCA9_X_FP94BM Reporter.intensity.2 Reference LCA9 s8 N
3 190222S_LCA9_X_FP94BM Reporter.intensity.3 Unused LCA9 s8 N
4 190222S_LCA9_X_FP94BM Reporter.intensity.4 Monocyte LCA9 s8 N
5 190222S_LCA9_X_FP94BM Reporter.intensity.5 Blank LCA9 s8 N
6 190222S_LCA9_X_FP94BM Reporter.intensity.6 Monocyte LCA9 s8 N
7 190222S_LCA9_X_FP94BM Reporter.intensity.7 Macrophage LCA9 s8 N
8 190222S_LCA9_X_FP94BM Reporter.intensity.8 Macrophage LCA9 s8 N
9 190222S_LCA9_X_FP94BM Reporter.intensity.9 Macrophage LCA9 s8 N
10 190222S_LCA9_X_FP94BM Reporter.intensity.10 Macrophage LCA9 s8 N
11 190222S_LCA9_X_FP94BM Reporter.intensity.11 Macrophage LCA9 s8 N
12 190222S_LCA9_X_FP94BM Reporter.intensity.12 Unused LCA9 s8 N
13 190222S_LCA9_X_FP94BM Reporter.intensity.13 Unused LCA9 s8 N
14 190222S_LCA9_X_FP94BM Reporter.intensity.14 Unused LCA9 s8 N
15 190222S_LCA9_X_FP94BM Reporter.intensity.15 Unused LCA9 s8 N
16 190222S_LCA9_X_FP94BM Reporter.intensity.16 Unused LCA9 s8 N
17 190321S_LCA10_X_FP97AG Reporter.intensity.1 Carrier LCA10 s8 Q
18 190321S_LCA10_X_FP97AG Reporter.intensity.2 Reference LCA10 s8 Q
19 190321S_LCA10_X_FP97AG Reporter.intensity.3 Unused LCA10 s8 Q
20 190321S_LCA10_X_FP97AG Reporter.intensity.4 Macrophage LCA10 s8 Q
21 190321S_LCA10_X_FP97AG Reporter.intensity.5 Monocyte LCA10 s8 Q
22 190321S_LCA10_X_FP97AG Reporter.intensity.6 Macrophage LCA10 s8 Q
23 190321S_LCA10_X_FP97AG Reporter.intensity.7 Macrophage LCA10 s8 Q
24 190321S_LCA10_X_FP97AG Reporter.intensity.8 Macrophage LCA10 s8 Q
25 190321S_LCA10_X_FP97AG Reporter.intensity.9 Macrophage LCA10 s8 Q
26 190321S_LCA10_X_FP97AG Reporter.intensity.10 Macrophage LCA10 s8 Q
27 190321S_LCA10_X_FP97AG Reporter.intensity.11 Macrophage LCA10 s8 Q
28 190321S_LCA10_X_FP97AG Reporter.intensity.12 Unused LCA10 s8 Q
29 190321S_LCA10_X_FP97AG Reporter.intensity.13 Unused LCA10 s8 Q
30 190321S_LCA10_X_FP97AG Reporter.intensity.14 Unused LCA10 s8 Q
31 190321S_LCA10_X_FP97AG Reporter.intensity.15 Unused LCA10 s8 Q
32 190321S_LCA10_X_FP97AG Reporter.intensity.16 Unused LCA10 s8 Q
33 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.1 Carrier LCB3 s9 R
34 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.2 Reference LCB3 s9 R
35 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.3 Unused LCB3 s9 R
36 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.4 Unused LCB3 s9 R
37 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.5 Macrophage LCB3 s9 R
38 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.6 Macrophage LCB3 s9 R
39 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.7 Blank LCB3 s9 R
40 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.8 Monocyte LCB3 s9 R
41 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.9 Macrophage LCB3 s9 R
42 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.10 Monocyte LCB3 s9 R
43 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.11 Blank LCB3 s9 R
44 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.12 Macrophage LCB3 s9 R
45 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.13 Macrophage LCB3 s9 R
46 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.14 Macrophage LCB3 s9 R
47 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.15 Macrophage LCB3 s9 R
48 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.16 Macrophage LCB3 s9 R
49 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.1 Blank LCA10 s8 NA
50 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.2 Blank LCA10 s8 NA
51 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.3 Blank LCA10 s8 NA
52 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.4 Blank LCA10 s8 NA
53 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.5 Blank LCA10 s8 NA
54 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.6 Blank LCA10 s8 NA
55 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.7 Blank LCA10 s8 NA
56 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.8 Blank LCA10 s8 NA
57 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.9 Blank LCA10 s8 NA
58 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.10 Blank LCA10 s8 NA
59 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.11 Blank LCA10 s8 NA
60 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.12 Blank LCA10 s8 NA
61 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.13 Blank LCA10 s8 NA
62 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.14 Blank LCA10 s8 NA
63 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.15 Blank LCA10 s8 NA
64 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.16 Blank LCA10 s8 NA
66 changes: 66 additions & 0 deletions tools/bioconductor-scp/utils.r
@@ -0,0 +1,66 @@
# Export intermediate results
# Function to export a single assay together with its row metadata
export_assay_with_metadata <- function(qf, assay_name) {
  # Extract the assay data and the row (feature) metadata
  assay_data <- SummarizedExperiment::assay(qf[[assay_name]])
  row_metadata <- as.data.frame(SummarizedExperiment::rowData(qf[[assay_name]]))
  # Combine row metadata with assay data
  export_data <- cbind(RowNames = rownames(assay_data), row_metadata, as.data.frame(assay_data))
  # Save the table as a tab-separated text file
  output_file <- file.path("outputs", paste0(assay_name, "_export.txt"))
  write.table(export_data, output_file, row.names = FALSE, sep = "\t", quote = FALSE)
}

# Export all assays
export_all_assays <- function(qf) {
  # Fixed list of assays to export (rather than every assay in the object)
  assay_names <- c("peptides", "peptides_norm", "peptides_log", "proteins", "proteins_norm", "proteins_imptd")
  # Create the output directory; do not warn if it already exists
  dir.create("outputs", showWarnings = FALSE)
  # Export each assay
  for (assay_name in assay_names) {
    export_assay_with_metadata(qf, assay_name)
  }
}

# Plot the QC boxplots
# Note: when is_log2 is TRUE, the values are log2-transformed before plotting
create_boxplots <- function(scp, i, is_log2, name) {
  sce <- scp[[i]]
  # Assay values in wide format, one column per sample
  assay_data <- as.data.frame(SummarizedExperiment::assay(sce)) |>
    tibble::rownames_to_column("FeatureID")
  # Sample annotation, keyed by sample identifier
  col_data <- as.data.frame(SummarizedExperiment::colData(scp)) |>
    tibble::rownames_to_column("SampleID")
  # Reshape to long format: one row per feature and sample
  long_data <- assay_data |>
    tidyr::pivot_longer(
      cols = -FeatureID,
      names_to = "SampleID",
      values_to = "Value"
    )
  # Attach the sample annotation (runCol, SampleType, ...)
  long_data <- long_data |>
    dplyr::left_join(col_data, by = "SampleID")
  if (isTRUE(is_log2)) {
    long_data$Value <- log2(long_data$Value)
  }
  long_data |>
    # Drop missing values (both NA and NaN)
    dplyr::filter(!is.na(Value)) |>
    ggplot2::ggplot(ggplot2::aes(x = runCol, y = Value, fill = SampleType)) +
    ggplot2::geom_boxplot() +
    ggplot2::theme_bw() +
    ggplot2::labs(
      title = name,
      x = "Run",
      y = "Log2 intensity"
    ) +
    ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 45, hjust = 1))
}

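# Heatmap of detected (black) versus missing or zero (white) values
plot_heatmap <- function(scp, i) {
  sce <- scp[[i]]
  heatmap_mat <- as.matrix(SummarizedExperiment::assay(sce))
  # Treat missing values as not detected
  heatmap_mat[is.na(heatmap_mat)] <- 0
  # Binarize: 1 for a positive intensity, 0 otherwise
  heatmap_bin <- ifelse(heatmap_mat > 0, 1, 0)
  # Shorten the column labels; fixed = TRUE matches the prefix literally
  colnames(heatmap_bin) <- gsub("Reporter.intensity.", "", colnames(heatmap_bin), fixed = TRUE)
  stats::heatmap(heatmap_bin, scale = "none", col = c("white", "black"), labRow = FALSE, margins = c(10, 5), cexCol = 0.5)
}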
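
The binarization step in `plot_heatmap` can be illustrated on a small matrix without a QFeatures object. This is a base-R sketch only; the matrix values and feature names (`pep1` to `pep3`) are made up for illustration, and no plot is drawn.

```r
# Toy intensity matrix (hypothetical values): features in rows,
# reporter channels in columns; NA marks a missing measurement.
mat <- matrix(
  c(10, NA, 0, 5, NA, 3),
  nrow = 3,
  dimnames = list(
    c("pep1", "pep2", "pep3"),
    c("Reporter.intensity.1", "Reporter.intensity.2")
  )
)

# Same steps as in plot_heatmap: missing values count as not detected,
# and any positive intensity counts as detected.
mat[is.na(mat)] <- 0
bin <- ifelse(mat > 0, 1, 0)

# Strip the literal channel prefix from the column labels
colnames(bin) <- gsub("Reporter.intensity.", "", colnames(bin), fixed = TRUE)

print(bin)
# Number of detected features per channel
detected <- colSums(bin)
print(detected)
```

Because zeros and missing values both map to 0, the resulting heatmap shows presence or absence only and carries no information about intensity.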
