Merge pull request #625 from KristinaGomoryova/scp_galaxy
galaxy wrapper for the scp tool
Showing 17 changed files with 1,933 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
owner: recetox | ||
remote_repository_url: "https://github.com/RECETOX/galaxytools/tree/master/tools/bioconductor-scp" | ||
homepage_url: "https://uclouvain-cbio.github.io/scp/index.html" | ||
categories: | ||
- Proteomics | ||
- Single Cell | ||
description: "scp is a package for processing single-cell proteomics data." | ||
long_description: "scp is an R package for the analysis of mass spectrometry-based single cell proteomics data. It builds on the QFeatures package and allows aggregation to peptide or protein level, data transformation such as log2 transformation or normalization, batch correction and imputation of missing values. It also provides several quality control metrics." | ||
type: unrestricted | ||
name: bioconductor_scp |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
<macros> | ||
|
||
<token name="@GENERAL_HELP@"> | ||
Scp Help section | ||
=================== | ||
|
||
Overview | ||
-------- | ||
The `scp` tool processes mass spectrometry-based single-cell proteomics (SCP) data. It builds on the `scp` R package developed in the laboratory of Prof. Laurent Gatto and provides functions for filtering at the peptide-spectrum match (PSM), peptide or protein level, as well as normalization, transformation and imputation of missing values. | ||
|
||
The source code can be found in the following GitHub repository or on Bioconductor: | ||
.. _GitHub: https://github.com/UCLouvain-CBIO/scp/ | ||
.. _issues: https://github.com/UCLouvain-CBIO/scp/issues | ||
.. _Bioconductor: https://www.bioconductor.org/packages/release/bioc/html/scp.html | ||
|
||
Workflow | ||
-------- | ||
|
||
The scp workflow currently supports the processing of MaxQuant results and requires two input files: | ||
|
||
- evidence.txt file (output from MaxQuant) | ||
- sampleAnnotation file (provided by the user). The sampleAnnotation file is a metadata file describing the individual samples (quantification column names, batches, sample types, etc.). Please note that the run identifier column MUST be present in both the evidence and sampleAnnotation files. | ||
|
||
The workflow starts at the PSM level. First, the data are filtered extensively to keep only the most reliable identifications: reverse sequences and potential contaminants are removed, as are PSMs below a certain parental ion fraction threshold or not passing a q-value threshold. Batches with very few features are also excluded. | ||
|
||
Subsequently, PSMs are aggregated to the peptide level. At the peptide level, further filtering is applied based on the median relative intensity or the median coefficient of variation (CV). Peptide-level intensities are then normalized and log2 transformed. | ||
|
||
These intensities are then aggregated to the protein level, where they undergo another round of normalization and imputation of missing values. | ||
|
||
Because batch effects are unavoidable in single-cell data, scp offers two methods for batch correction: ComBat and removeBatchEffect() from the limma package. | ||
|
||
Finally, dimension reduction using PCA or UMAP (computed on the PCA components) is provided. The PCA and UMAP plots are provided alongside the (optional) quality control plots within the `Plots` collection. | ||
|
||
The final log2-transformed, normalized, imputed and batch-corrected data are provided, with the option to also export intermediate results. | ||
|
||
Because of the internal complexity of handling the data formats, we opted for a single form with pre-defined settings for the whole processing pipeline. However, we highly recommend checking the QC plots and intermediate results and adjusting the workflow settings accordingly. | ||
</token> | ||
</macros> |
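The peptide-level "normalize, then log2 transform" step described in the help text can be illustrated with a minimal base R sketch. The toy matrix and the choice of dividing each column by its median are assumptions made for illustration; the help text does not state which normalization the tool applies, and the tool itself works on QFeatures objects rather than plain matrices.

```r
# Toy peptide-by-cell intensity matrix (made-up values, not from the test data)
peptides <- matrix(
  c(100, 200, 400,
     10,  20,  40,
     50, 100,  NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("pepA", "pepB", "pepC"), c("cell1", "cell2", "cell3"))
)

# Assumed normalization: divide each column (cell) by its median intensity,
# ignoring missing values
col_medians <- apply(peptides, 2, median, na.rm = TRUE)
peptides_norm <- sweep(peptides, 2, col_medians, FUN = "/")

# log2 transformation; missing values stay missing
peptides_log <- log2(peptides_norm)
```

After this step, a value of 0 in `peptides_log` means the peptide sits exactly at the median intensity of its cell.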
tools/bioconductor-scp/test-data/evidence_subset.txt (1,000 additions & 0 deletions; large diff not rendered)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
runCol quantCols SampleType lcbatch sortday digest | ||
1 190222S_LCA9_X_FP94BM Reporter.intensity.1 Carrier LCA9 s8 N | ||
2 190222S_LCA9_X_FP94BM Reporter.intensity.2 Reference LCA9 s8 N | ||
3 190222S_LCA9_X_FP94BM Reporter.intensity.3 Unused LCA9 s8 N | ||
4 190222S_LCA9_X_FP94BM Reporter.intensity.4 Monocyte LCA9 s8 N | ||
5 190222S_LCA9_X_FP94BM Reporter.intensity.5 Blank LCA9 s8 N | ||
6 190222S_LCA9_X_FP94BM Reporter.intensity.6 Monocyte LCA9 s8 N | ||
7 190222S_LCA9_X_FP94BM Reporter.intensity.7 Macrophage LCA9 s8 N | ||
8 190222S_LCA9_X_FP94BM Reporter.intensity.8 Macrophage LCA9 s8 N | ||
9 190222S_LCA9_X_FP94BM Reporter.intensity.9 Macrophage LCA9 s8 N | ||
10 190222S_LCA9_X_FP94BM Reporter.intensity.10 Macrophage LCA9 s8 N | ||
11 190222S_LCA9_X_FP94BM Reporter.intensity.11 Macrophage LCA9 s8 N | ||
12 190222S_LCA9_X_FP94BM Reporter.intensity.12 Unused LCA9 s8 N | ||
13 190222S_LCA9_X_FP94BM Reporter.intensity.13 Unused LCA9 s8 N | ||
14 190222S_LCA9_X_FP94BM Reporter.intensity.14 Unused LCA9 s8 N | ||
15 190222S_LCA9_X_FP94BM Reporter.intensity.15 Unused LCA9 s8 N | ||
16 190222S_LCA9_X_FP94BM Reporter.intensity.16 Unused LCA9 s8 N | ||
17 190321S_LCA10_X_FP97AG Reporter.intensity.1 Carrier LCA10 s8 Q | ||
18 190321S_LCA10_X_FP97AG Reporter.intensity.2 Reference LCA10 s8 Q | ||
19 190321S_LCA10_X_FP97AG Reporter.intensity.3 Unused LCA10 s8 Q | ||
20 190321S_LCA10_X_FP97AG Reporter.intensity.4 Macrophage LCA10 s8 Q | ||
21 190321S_LCA10_X_FP97AG Reporter.intensity.5 Monocyte LCA10 s8 Q | ||
22 190321S_LCA10_X_FP97AG Reporter.intensity.6 Macrophage LCA10 s8 Q | ||
23 190321S_LCA10_X_FP97AG Reporter.intensity.7 Macrophage LCA10 s8 Q | ||
24 190321S_LCA10_X_FP97AG Reporter.intensity.8 Macrophage LCA10 s8 Q | ||
25 190321S_LCA10_X_FP97AG Reporter.intensity.9 Macrophage LCA10 s8 Q | ||
26 190321S_LCA10_X_FP97AG Reporter.intensity.10 Macrophage LCA10 s8 Q | ||
27 190321S_LCA10_X_FP97AG Reporter.intensity.11 Macrophage LCA10 s8 Q | ||
28 190321S_LCA10_X_FP97AG Reporter.intensity.12 Unused LCA10 s8 Q | ||
29 190321S_LCA10_X_FP97AG Reporter.intensity.13 Unused LCA10 s8 Q | ||
30 190321S_LCA10_X_FP97AG Reporter.intensity.14 Unused LCA10 s8 Q | ||
31 190321S_LCA10_X_FP97AG Reporter.intensity.15 Unused LCA10 s8 Q | ||
32 190321S_LCA10_X_FP97AG Reporter.intensity.16 Unused LCA10 s8 Q | ||
33 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.1 Carrier LCB3 s9 R | ||
34 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.2 Reference LCB3 s9 R | ||
35 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.3 Unused LCB3 s9 R | ||
36 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.4 Unused LCB3 s9 R | ||
37 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.5 Macrophage LCB3 s9 R | ||
38 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.6 Macrophage LCB3 s9 R | ||
39 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.7 Blank LCB3 s9 R | ||
40 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.8 Monocyte LCB3 s9 R | ||
41 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.9 Macrophage LCB3 s9 R | ||
42 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.10 Monocyte LCB3 s9 R | ||
43 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.11 Blank LCB3 s9 R | ||
44 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.12 Macrophage LCB3 s9 R | ||
45 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.13 Macrophage LCB3 s9 R | ||
46 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.14 Macrophage LCB3 s9 R | ||
47 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.15 Macrophage LCB3 s9 R | ||
48 190914S_LCB3_X_16plex_Set_21 Reporter.intensity.16 Macrophage LCB3 s9 R | ||
49 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.1 Blank LCA10 s8 NA | ||
50 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.2 Blank LCA10 s8 NA | ||
51 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.3 Blank LCA10 s8 NA | ||
52 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.4 Blank LCA10 s8 NA | ||
53 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.5 Blank LCA10 s8 NA | ||
54 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.6 Blank LCA10 s8 NA | ||
55 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.7 Blank LCA10 s8 NA | ||
56 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.8 Blank LCA10 s8 NA | ||
57 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.9 Blank LCA10 s8 NA | ||
58 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.10 Blank LCA10 s8 NA | ||
59 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.11 Blank LCA10 s8 NA | ||
60 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.12 Blank LCA10 s8 NA | ||
61 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.13 Blank LCA10 s8 NA | ||
62 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.14 Blank LCA10 s8 NA | ||
63 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.15 Blank LCA10 s8 NA | ||
64 190321S_LCA10_X_FP97_blank_01 Reporter.intensity.16 Blank LCA10 s8 NA |
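Since the run identifier column must be present in both the evidence and the sample annotation files, a quick consistency check before running the tool can be useful. The sketch below uses base R only; the helper `check_run_ids()` and the toy data frames are hypothetical and not part of the wrapper, and the column name `runCol` is taken from the annotation table above (the evidence file may name this column differently).

```r
# Hypothetical helper: report run identifiers that appear in only one table
check_run_ids <- function(evidence, annotation, run_col = "runCol") {
  # The run identifier column must exist in both tables
  stopifnot(
    run_col %in% colnames(evidence),
    run_col %in% colnames(annotation)
  )
  list(
    missing_in_annotation = setdiff(unique(evidence[[run_col]]), annotation[[run_col]]),
    missing_in_evidence = setdiff(unique(annotation[[run_col]]), evidence[[run_col]])
  )
}

# Toy inputs (run names borrowed from the annotation table, plus one made-up run)
evidence <- data.frame(
  runCol = c("190222S_LCA9_X_FP94BM", "190321S_LCA10_X_FP97AG", "some_other_run"),
  stringsAsFactors = FALSE
)
annotation <- data.frame(
  runCol = rep(c("190222S_LCA9_X_FP94BM", "190321S_LCA10_X_FP97AG"), each = 2),
  quantCols = rep(c("Reporter.intensity.1", "Reporter.intensity.2"), times = 2),
  stringsAsFactors = FALSE
)

res <- check_run_ids(evidence, annotation)
```

Runs listed in `res$missing_in_annotation` have PSMs but no sample metadata, so they would need to be annotated or dropped before processing.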
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
# Export intermediate results | ||
# Function to export a single assay with metadata | ||
export_assay_with_metadata <- function(qf, assay_name) { | ||
# Extract assay data, row metadata, and col metadata | ||
assay_data <- SummarizedExperiment::assay(qf[[assay_name]]) | ||
row_metadata <- as.data.frame(SummarizedExperiment::rowData(qf[[assay_name]])) | ||
col_metadata <- as.data.frame(SummarizedExperiment::colData(qf)) | ||
# Combine row metadata with assay data | ||
export_data <- cbind(RowNames = rownames(assay_data), row_metadata, as.data.frame(assay_data)) | ||
# Save the table to a tab-separated text file | ||
output_file <- file.path("outputs", paste0(assay_name, "_export.txt")) | ||
write.table(export_data, output_file, row.names = FALSE, sep = "\t", quote = FALSE) | ||
} | ||
|
||
# Export all assays | ||
export_all_assays <- function(qf) { | ||
# Get the names of all assays | ||
# assay_names <- names(assays(qf)) | ||
assay_names <- c("peptides", "peptides_norm", "peptides_log", "proteins", "proteins_norm", "proteins_imptd") | ||
dir.create("outputs", showWarnings = FALSE) | ||
# Export each assay | ||
for (assay_name in assay_names) { | ||
export_assay_with_metadata(qf, assay_name) | ||
} | ||
} | ||
|
||
# Plot the QC boxplots | ||
create_boxplots <- function(scp, i, is_log2, name) { | ||
sce <- scp[[i]] | ||
assay_data <- as.data.frame(SummarizedExperiment::assay(sce)) |> | ||
tibble::rownames_to_column("FeatureID") | ||
col_data <- as.data.frame(SummarizedExperiment::colData(scp)) |> | ||
tibble::rownames_to_column("SampleID") | ||
long_data <- assay_data |> | ||
tidyr::pivot_longer( | ||
cols = -FeatureID, | ||
names_to = "SampleID", | ||
values_to = "Value" | ||
) | ||
long_data <- long_data |> | ||
dplyr::left_join(col_data, by = "SampleID") | ||
if (isTRUE(is_log2)) { | ||
long_data$Value <- log2(long_data$Value) | ||
} | ||
long_data |> | ||
dplyr::filter(!is.na(Value)) |> | ||
ggplot2::ggplot(ggplot2::aes(x = runCol, y = Value, fill = SampleType)) + | ||
ggplot2::geom_boxplot() + | ||
ggplot2::theme_bw() + | ||
ggplot2::labs( | ||
title = name, | ||
x = "Run", | ||
y = "Log2 intensity" | ||
) + | ||
ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 45, hjust = 1)) | ||
} | ||
|
||
# Heatmap | ||
plot_heatmap <- function(scp, i) { | ||
sce <- scp[[i]] | ||
heatmap_mat <- as.matrix(SummarizedExperiment::assay(sce)) | ||
heatmap_mat[is.na(heatmap_mat)] <- 0 | ||
heatmap_bin <- ifelse(heatmap_mat > 0, 1, 0) | ||
colnames(heatmap_bin) <- gsub("Reporter.intensity.", "", colnames(heatmap_bin), fixed = TRUE) | ||
heatmap(heatmap_bin, scale = "none", col = c("white", "black"), labRow = FALSE, margins = c(10, 5), cexCol = 0.5) | ||
} |
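The presence/absence binarization that `plot_heatmap()` performs before drawing can be checked on a small matrix without any plotting. This is a sketch on made-up values; it passes `fixed = TRUE` to `gsub()` so that the dots in the prefix are matched literally rather than as regular-expression wildcards.

```r
# Toy protein-by-channel matrix (made-up values); NA marks a missing value
mat <- matrix(
  c(5, NA, 0, 12),
  nrow = 2,
  dimnames = list(
    c("prot1", "prot2"),
    c("Reporter.intensity.1", "Reporter.intensity.2")
  )
)

# Treat missing values as absent, then binarize: 1 = detected, 0 = not detected
mat[is.na(mat)] <- 0
bin <- ifelse(mat > 0, 1, 0)

# Shorten the column labels to the bare channel number
colnames(bin) <- gsub("Reporter.intensity.", "", colnames(bin), fixed = TRUE)
```

Note that a true zero intensity and a missing value both end up as 0 here, so the heatmap shows detection patterns rather than quantitative differences.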