Seurat 4 + seurat convert + mappings tools (#251)
* Seurat convert passing 4/5 tests - no data uploaded yet

* Passing AnnData reading test

* Ready for trying tests locally

* Passing all Seurat tests with planemo locally

* Profile and version history

* Almost all tests passing with Seurat 4

* All seurat 4 tests passing locally

* Seurat UMAP passing test and macro mapper

* Seurat AnnData Scanpy 1.8.2 test data retrieval and test hidden

* Point to nature methods paper

(cherry picked from commit 1ea601d)

* Seurat integration and macro WIP

* fix loom and others multi-inputs

* Seurat integration test data, integration passing lints and tests, tags for umap

* Remove repeated options

* Seurat map query (all done last night)

* Select integration features passing planemo tests

* Integration passing tests after adding file option for int. features

* Seurat plot

* Plot linting passes, initial testing, fixes

* Working plots on UI with adequate namings

* Seurat plot passing planemo test locally

* DoHeatmap with tests

* Hover locator

* Sanitise potential injections

* Please linter warning

* History and linter pleasing

* Remove AnnData as input and make sure it is as output

* Documentation

* Fix version in macro

* AnnData is a valid input

* Seurat dimplot test expected size

* Seurat plot test sizes fixes

* Fix plotting labels for linting

* Fix input files for seurat_map

* Try with conditional nesting for linting

* Fix test data downloads

* Change map-query missing input

* Scale data multiple regress out vars

* Map query refdata param changes

* Fix scale data vars to regress

* Use EBI OC query link for classify query

* Size comparison for seurat map query
pcm32 authored Mar 2, 2024
1 parent ee197a8 commit 0264c35
Showing 15 changed files with 1,986 additions and 29 deletions.
2 changes: 1 addition & 1 deletion tools/tertiary-analysis/seurat/.shed.yml
@@ -8,7 +8,7 @@ long_description: |
Seurat is an R package designed for QC, analysis, and exploration of single cell RNA-seq data. Seurat aims to enable users to identify and interpret sources of heterogeneity from single cell transcriptomic measurements, and to integrate diverse types of single cell data.
name: suite_seurat
owner: ebi-gxa
remote_repository_url: https://github.com/ebi-gene-expression-group/container-galaxy-sc-tertiary/
homepage_url: https://github.com/ebi-gene-expression-group/container-galaxy-sc-tertiary/
type: unrestricted
auto_tool_repositories:
name_template: "{{ tool_id }}"
90 changes: 90 additions & 0 deletions tools/tertiary-analysis/seurat/extra/macro_mapper_seurat.yaml
@@ -0,0 +1,90 @@
---
- option_group:
- input-object-file
- input-format
pre_command_macros:
- INPUT_OBJ_PREAMBLE
post_command_macros:
- INPUT_OBJECT
input_declaration_macros:
- input_object_params
- option_group:
- output-object-file
- output-format
post_command_macros:
- OUTPUT_OBJECT
input_declaration_macros:
- output_object_params
output_declaration_macros:
- output_files
- option_group:
- input-object-files
- input-format
pre_command_macros:
- INPUT_OBJS_PREAMBLE
post_command_macros:
- INPUT_OBJECTS
input_declaration_macros:
- input_object_params:
multiple: true
- option_group:
- reference-object-files
- reference-format
pre_command_macros:
- REFERENCE_OBJS_PREAMBLE
post_command_macros:
- REFERENCE_OBJECTS
input_declaration_macros:
- input_object_params:
varname: reference
multiple: true
optional: true
- option_group:
- reference-object-file
- reference-format
pre_command_macros:
- REFERENCE_OBJ_PREAMBLE
post_command_macros:
- REFERENCE_OBJECT
input_declaration_macros:
- input_object_params:
varname: reference
- option_group:
- anchors-object-file
- anchors-format
pre_command_macros:
- ANCHORS_OBJ_PREAMBLE
post_command_macros:
- ANCHORS_OBJECT
input_declaration_macros:
- input_object_params:
varname: anchors
- option_group:
- query-object-file
- query-format
pre_command_macros:
- QUERY_OBJ_PREAMBLE
post_command_macros:
- QUERY_OBJECT
input_declaration_macros:
- input_object_params:
varname: query
- option_group:
- plot-out
post_command_macros:
- OUTPUT_PLOT
output_declaration_macros:
- plot_output_files_format:
format: png
- plot_output_files_format:
format: pdf
- plot_output_files_format:
format: eps
- plot_output_files_format:
format: jpg
- plot_output_files_format:
format: ps
- plot_output_files_format:
format: tiff
- plot_output_files_format:
format: svg
26 changes: 26 additions & 0 deletions tools/tertiary-analysis/seurat/get_test_data.sh
@@ -15,8 +15,23 @@ MARKERS_LINK='https://drive.google.com/uc?export=download&id=18OmWNc7mF-4pzH6DQk

LOOM_LINK='https://drive.google.com/uc?export=download&id=1qNk5cg8hJG3Nv1ljTKmUEnxTOf11EEZX'
H5AD_LINK='https://drive.google.com/uc?export=download&id=1YpE0H_t_dkh17P-WBhPijKvRiGP0BlBz'

H5AD_SC182_LINK='https://drive.google.com/uc?export=download&id=16PUJ2KAkXT8F1UkfqU-9LWoOJUkUG1rp'
SCE_LINK='https://drive.google.com/uc?export=download&id=1UKdyf3M01uAt7oBg93JfmRvNVB_jlUKe'

# Seurat v4 exclusives
IFNB_BASE_FILE='ifnb_'

IFNB_CTRL_INT_LINK='https://drive.google.com/uc?export=download&id=15E_MLz-UclJYInNaA7YKLhLo5W-qlykL'
IFNB_STIM_INT_LINK='https://drive.google.com/uc?export=download&id=14iKgCJGPk16dEmpJJF-Gp_lBDcOdo-54'

## Classify and UMAP mapping
CLASSIFY_QUERY_LINK='https://oc.ebi.ac.uk/s/MlEDILFYRrvkS6E/download'
CLASSIFY_RESULTS_ANCHORS_OBJECT_LINK='https://drive.google.com/uc?export=download&id=1Xtv4K_CxIU1cJ8RjJ7NTvzLQkLvc8a3i'
# UMAP_RESULT_OBJECT_LINK='https://oc.ebi.ac.uk/s/k4MdM07y9DAnurp/download'
UMAP_RESULT_OBJECT_LINK='https://oc.ebi.ac.uk/s/D1z4z2ef1e3dyc3/download'


function get_data {
local link=$1
local fname=$2
@@ -28,6 +43,7 @@ function get_data {
}

# get matrix data
mkdir -p test-data
pushd test-data
get_data "$MTX_LINK" mtx.zip
unzip mtx.zip
@@ -49,3 +65,13 @@ rm -f $BASENAME_FILE"-markers.csv.zip"
get_data "$LOOM_LINK" "${BASENAME_FILE}_loom.h5"
get_data "$SCE_LINK" "${BASENAME_FILE}_sce.rds"
get_data "$H5AD_LINK" "${BASENAME_FILE}.h5ad"

get_data "$H5AD_SC182_LINK" "${BASENAME_FILE}_sc182.h5ad"

get_data "$IFNB_CTRL_INT_LINK" "${IFNB_BASE_FILE}ctrl_norm_fvg.rds"
get_data "$IFNB_STIM_INT_LINK" "${IFNB_BASE_FILE}stim_norm_fvg.rds"

get_data "$CLASSIFY_QUERY_LINK" "Classify_query.rds"
get_data "$CLASSIFY_RESULTS_ANCHORS_OBJECT_LINK" "Classify_anchors.rds"
get_data "$UMAP_RESULT_OBJECT_LINK" "UMAP_result_integrated.rds"
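
Review note: a rerun-friendly variant of the download step could be sketched as follows. This is only an illustration under stated assumptions. `fetch_if_missing` and the `FETCH_CMD` override are hypothetical names that do not appear in `get_test_data.sh`, and the `wget -q -O` default is a guess, since the body of `get_data` is not shown in this diff.

```shell
# Hypothetical helper: NOT the get_data function from get_test_data.sh.
# Usage: fetch_if_missing LINK FILENAME
# FETCH_CMD (assumed hook) names a command invoked as: $FETCH_CMD FILENAME LINK
# Note: POSIX sh has no 'local', so link, fname and tmp are global variables.
fetch_if_missing() {
    link=$1
    fname=$2
    # A non-empty file is treated as an earlier successful download.
    if [ -s "$fname" ]; then
        echo "skip: $fname already present"
        return 0
    fi
    tmp="$fname.part"
    # Download to a temporary name so an interrupted transfer is never
    # mistaken for a complete file on the next run.
    if ${FETCH_CMD:-wget -q -O} "$tmp" "$link"; then
        mv "$tmp" "$fname"
    else
        rm -f "$tmp"
        echo "error: could not download $link" >&2
        return 1
    fi
}
```

It would be called in the same way as the existing helper, for example `fetch_if_missing "$H5AD_SC182_LINK" "${BASENAME_FILE}_sc182.h5ad"`. Quoting both arguments keeps the `?` and `&` characters in the links from being interpreted by the shell.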

165 changes: 165 additions & 0 deletions tools/tertiary-analysis/seurat/scripts/seurat-scale-data.R
@@ -0,0 +1,165 @@
#!/usr/bin/env Rscript

# Load optparse, which we need to check inputs

suppressPackageStartupMessages(require(optparse))

# Load common functions

suppressPackageStartupMessages(require(workflowscriptscommon))

# parse options

option_list = list(
make_option(
c("-i", "--input-object-file"),
action = "store",
default = NA,
type = 'character',
help = "File name in which a serialized R matrix object may be found."
),
make_option(
c("--input-format"),
action = "store",
default = "seurat",
type = 'character',
help = "Either loom, seurat, anndata or singlecellexperiment for the input format to read."
),
make_option(
c("--output-format"),
action = "store",
default = "seurat",
type = 'character',
help = "Either loom, seurat, anndata or singlecellexperiment for the output format."
),
make_option(
c("-e", "--genes-use"),
action = "store",
default = NULL,
type = 'character',
help = "File with gene names to scale/center (one gene per line). Default is all genes in object@data."
),
make_option(
c("-v", "--vars-to-regress"),
action = "store",
default = NULL,
type = 'character',
help = "Comma-separated list of variables to regress out (previously latent.vars in RegressOut). For example, nUMI, or percent.mito."
),
make_option(
c("-m", "--model-use"),
action = "store",
default = 'linear',
type = 'character',
help = "Use a linear model or generalized linear model (poisson, negative binomial) for the regression. Options are 'linear' (default), 'poisson', and 'negbinom'."
),
make_option(
c("-u", "--use-umi"),
action = "store",
default = FALSE,
type = 'logical',
help = "Regress on UMI count data. Default is FALSE for linear modeling, but automatically set to TRUE if model.use is 'negbinom' or 'poisson'."
),
make_option(
c("-s", "--do-not-scale"),
action = "store_true",
default = FALSE,
type = 'logical',
help = "Skip the data scale."
),
make_option(
c("-c", "--do-not-center"),
action = "store_true",
default = FALSE,
type = 'logical',
help = "Skip data centering."
),
make_option(
c("-x", "--scale-max"),
action = "store",
default = 10,
type = 'double',
help = "Max value to return for scaled data. The default is 10. Setting this can help reduce the effects of genes that are only expressed in a very small number of cells. If regressing out latent variables and using a non-linear model, the default is 50."
),
make_option(
c("-b", "--block-size"),
action = "store",
default = 1000,
type = 'integer',
help = "Default size for number of genes to scale at in a single computation. Increasing block.size may speed up calculations but at an additional memory cost."
),
make_option(
c("-d", "--min-cells-to-block"),
action = "store",
default = 1000,
type = 'integer',
help = "If object contains fewer than this number of cells, don't block for scaling calculations."
),
make_option(
c("-n", "--check-for-norm"),
action = "store",
default = TRUE,
type = 'logical',
help = "Check to see if data has been normalized, if not, output a warning (TRUE by default)."
),
make_option(
c("-o", "--output-object-file"),
action = "store",
default = NA,
type = 'character',
help = "File name in which to store serialized R object of type 'Seurat'."
)
)

opt <- wsc_parse_args(option_list, mandatory = c('input_object_file', 'output_object_file'))

# Check parameter values

if ( ! file.exists(opt$input_object_file)){
stop((paste('File', opt$input_object_file, 'does not exist')))
}

if (! is.null(opt$genes_use)){
if (! file.exists(opt$genes_use)){
stop((paste('Supplied genes file', opt$genes_use, 'does not exist')))
}else{
genes_use <- readLines(opt$genes_use)
}
}else{
genes_use <- NULL
}

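# break up opt$vars_to_regress into a character vector, but only if it was
# supplied: strsplit() fails on the NULL default
if (! is.null(opt$vars_to_regress)){
opt$vars_to_regress <- unlist(strsplit(opt$vars_to_regress, ","))
}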

# Now we're happy with the arguments, load Seurat and do the work

suppressPackageStartupMessages(require(Seurat))
if(opt$input_format == "loom" | opt$output_format == "loom") {
suppressPackageStartupMessages(require(SeuratDisk))
} else if(opt$input_format == "singlecellexperiment" | opt$output_format == "singlecellexperiment") {
suppressPackageStartupMessages(require(scater))
}

# Input from serialized R object

seurat_object <- read_seurat4_object(input_path = opt$input_object_file, format = opt$input_format)
# https://stackoverflow.com/questions/9129673/passing-list-of-named-parameters-to-function
# might be useful
scaled_seurat_object <- ScaleData(seurat_object,
features = genes_use,
vars.to.regress = opt$vars_to_regress,
model.use = opt$model_use,
use.umi = opt$use_umi,
do.scale = !opt$do_not_scale,
do.center = !opt$do_not_center,
scale.max = opt$scale_max,
block.size = opt$block_size,
min.cells.to.block = opt$min_cells_to_block,
verbose = FALSE)


# Output to a serialized R object
write_seurat4_object(seurat_object = scaled_seurat_object,
output_path = opt$output_object_file,
format = opt$output_format)
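
Review note: for orientation, the options declared in this script can be combined into a call such as the one built below. This is a sketch and not part of the commit. `scale_data_cmd` is a hypothetical helper that only prints the command line, using flag names taken from the option list above, and the file names passed to it are placeholders.

```shell
# Hypothetical helper: prints a seurat-scale-data.R command line, it does not run it.
# Usage: scale_data_cmd INPUT OUTPUT [VARS]
# VARS is a comma-separated list, as expected by --vars-to-regress.
# Paths containing spaces are not handled by this simple string building.
scale_data_cmd() {
    input=$1
    output=$2
    vars=$3
    cmd="seurat-scale-data.R --input-object-file $input --input-format seurat"
    # Leave the flag out entirely when no variables were given.
    if [ -n "$vars" ]; then
        cmd="$cmd --vars-to-regress $vars"
    fi
    cmd="$cmd --output-object-file $output --output-format seurat"
    echo "$cmd"
}
```

For example, `scale_data_cmd in.rds scaled.rds nCount_RNA,percent.mt` prints a call that regresses out two variables. Those two names are assumptions used for illustration and must match metadata columns that exist in the object being scaled.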
35 changes: 25 additions & 10 deletions tools/tertiary-analysis/seurat/seurat_convert.xml
@@ -1,11 +1,12 @@
-<tool id="seurat_convert" name="Seurat 3 converter" profile="18.01" version="@SEURAT_VERSION@+galaxy0">
+<tool id="seurat_convert" name="Seurat 4 converter" profile="18.01" version="@SEURAT_VERSION@+galaxy0">
<description>translates different single cell formats</description>
<macros>
<import>seurat_macros.xml</import>
</macros>
<expand macro="requirements" />
<expand macro="version" />
<command detect_errors="exit_code"><![CDATA[
+@INPUT_OBJ_PREAMBLE@
seurat-convert.R
@INPUT_OBJECT@
@OUTPUT_OBJECT@
@@ -76,33 +77,47 @@ seurat-convert.R
<param name="format" value="rds_seurat"/>
<output name="rds_seurat_file">
<assert_contents>
-<has_size value="2965562" delta="200000"/>
+<has_size value="3761959" delta="200000"/>
</assert_contents>
</output>
</test>
+<!-- <test>
+<conditional name="input">
+<param name="format" value="anndata"/>
+<param name="anndata_file" value="E-MTAB-6077-3k_features_90_cells_sc182.h5ad" ftype="h5ad"/>
+</conditional>
+<param name="format" value="rds_seurat"/>
+<output name="rds_seurat_file">
+<assert_contents>
+<has_size value="3761959" delta="200000"/>
+</assert_contents>
+</output>
+</test> -->

</tests>
<help><![CDATA[
.. class:: infomark
**What it does**
-This tool uses Seurat 3 to convert formats. Possible inputs are:
+@SEURAT_INTRO@
+This tool uses Seurat 4 to convert formats. Possible inputs are:
-* Seurat 3
-* Loom (probably earlier than Loom 3.0)
-* AnnData (contemporary versions to Seurat 3, most likely up to AnnData 0.6.22.post1)
+* Seurat 3/4
+* Loom (versions contemporary to Seurat 4)
+* AnnData (contemporary versions to Seurat 4)
* Single Cell Experiment
Possible outputs are:
-* Seurat 3
-* Loom (as produced by loomR package)
+* Seurat 4
+* Loom
* Single Cell Experiment
For newer versions of AnnData and Loom, please try the Seurat 4 version of this tool.
@SEURAT_INTRO@
.. _Seurat: https://www.nature.com/articles/nbt.4096
.. _Satija Lab: https://satijalab.org/seurat/
2 changes: 1 addition & 1 deletion tools/tertiary-analysis/seurat/seurat_dim_plot.xml
@@ -69,7 +69,7 @@
<param name="rds_seurat_file" value="E-MTAB-6077-3k_features_90_cells-tsne.rds" ftype="rdata" />
<output name="output_image_file" >
<assert_contents>
-<has_size value="18122" delta="2000"/>
+<has_size value="18122" delta="4000"/>
</assert_contents>
</output>
</test>