Merge pull request #104 from nukappa/master
Updated docs for explaining openst functionality
nukappa authored Apr 17, 2024
2 parents cdcfc43 + dd49d66 commit 505a358
Showing 4 changed files with 202 additions and 101 deletions.
193 changes: 109 additions & 84 deletions docs/config.rst
@@ -61,6 +61,15 @@ Spacemake provides the following barcode\_flavors out of the box:
default:
cell: "r1[0:12]"
UMI: "r1[12:20]"
openst:
cell: "r1[2:27]"
UMI: "r2[0:9]"
sc_10x_v2:
cell: "r1[0:16]"
UMI: "r1[16:26]"
seq_scope:
UMI: "r2[0:9]"
cell: "r1[0:20]"
slide_seq_14bc:
cell: "r1[0:14]"
UMI: "r1[14:23]"
@@ -70,12 +79,6 @@ Spacemake provides the following barcode\_flavors out of the box:
visium:
cell: "r1[0:16]"
UMI: "r1[16:28]"
sc_10x_v2:
cell: "r1[0:16]"
UMI: "r1[16:26]"
seq_scope:
UMI: "r2[0:9]"
cell: "r1[0:20]"
To list the currently available ``barcode_flavor``-s, type::
@@ -169,52 +172,72 @@ Provided run\_mode(s)

.. code-block:: yaml
default:
n_beads: 100000
umi_cutoff: [100, 300, 500]
clean_dge: False
detect_tissue: False
polyA_adapter_trimming: True
count_intronic_reads: True
count_mm_reads: False
mesh_data: False
mesh_type: 'circle'
mesh_spot_diameter_um: 55
mesh_spot_distance_um: 100
spatial_barcode_min_matches: 0
visium:
n_beads: 10000
umi_cutoff: [1000]
clean_dge: False
detect_tissue: True
polyA_adapter_trimming: False
count_intronic_reads: False
count_mm_reads: True
slide_seq:
n_beads: 100000
umi_cutoff: [50]
clean_dge: False
detect_tissue: False
scRNA_seq:
n_beads: 10000
umi_cutoff: [500]
detect_tissue: False
polyA_adapter_trimming: True
count_intronic_reads: True
count_mm_reads: False
seq_scope:
clean_dge: false
count_intronic_reads: false
count_mm_reads: false
detect_tissue: false
mesh_data: true
mesh_spot_diameter_um: 10
mesh_spot_distance_um: 15
mesh_type: hexagon
n_beads: 1000
umi_cutoff:
- 100
- 300
default:
clean_dge: false
count_intronic_reads: true
count_mm_reads: false
detect_tissue: false
mesh_data: false
mesh_spot_diameter_um: 55
mesh_spot_distance_um: 100
mesh_type: circle
n_beads: 100000
polyA_adapter_trimming: true
spatial_barcode_min_matches: 0
umi_cutoff:
- 100
- 300
- 500
openst:
clean_dge: false
count_intronic_reads: true
count_mm_reads: true
detect_tissue: false
mesh_data: true
mesh_spot_diameter_um: 7
mesh_spot_distance_um: 7
mesh_type: hexagon
n_beads: 100000
polyA_adapter_trimming: true
spatial_barcode_min_matches: 0.1
umi_cutoff:
- 100
- 250
- 500
scRNA_seq:
count_intronic_reads: true
count_mm_reads: false
detect_tissue: false
n_beads: 10000
umi_cutoff:
- 500
seq_scope:
clean_dge: false
count_intronic_reads: false
count_mm_reads: false
detect_tissue: false
mesh_data: true
mesh_spot_diameter_um: 10
mesh_spot_distance_um: 15
mesh_type: hexagon
n_beads: 1000
umi_cutoff:
- 100
- 300
slide_seq:
clean_dge: false
detect_tissue: false
n_beads: 100000
umi_cutoff:
- 50
visium:
clean_dge: false
count_intronic_reads: false
count_mm_reads: true
detect_tissue: true
n_beads: 10000
umi_cutoff:
- 1000
.. note::
If a sample has no ``run_mode`` provided, the ``default`` will be used
@@ -259,48 +282,50 @@ Configure pucks

.. _configure-puck:

Each spatial sample, needs to have a ``puck``. The ``puck`` sample-variable will define the
dimensionality of the underlying spatial structure, which then spacemake will use
during the autmated analysis and plotting.
Each spatial sample is associated with a ``puck``. The ``puck`` variable defines the
dimensionality of the underlying spatial structure, which spacemake uses
during the automated analysis and plotting, as well as the binning (meshing) of
the data when selected in the ``run_mode``.

Each puck has the following variables:

- ``width_um``: the width of the puck, in microns
- ``spot_diameter_um``: the diameter of a bead on this puck, in microns.
- ``barcodes`` (optional): the path to the barcode file, containing the cell\_barcode
and (x,y) position for each. This is handy, when several pucks have the same barcodes,
such as for 10x visium.
and (x,y) position for each. This is handy when several pucks have the same barcodes,
such as for 10x Visium.
- ``coordinate_system`` (optional): the path to the coordinate system file, containing puck
IDs and the (x,y,z) position for each, in global coordinates. This coordinate system is analogous
to the global coordinate system for image stitching. When specified, this 'stitching' is
automatically performed on ``puck``-s with spatial information
automatically performed on ``puck``-s with spatial information.
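The idea behind a global coordinate system can be illustrated with a minimal sketch. It assumes that stitching amounts to translating each spot's local (x, y) position by the global offset of its puck; the function name ``stitch`` and the input layout are hypothetical and are not taken from the spacemake code base, which may apply additional steps.

```python
def stitch(local_spots, puck_offsets):
    """Translate local spot positions into global coordinates.

    local_spots:  iterable of (puck_id, x, y) in puck-local coordinates
    puck_offsets: dict mapping puck_id -> (x0, y0) global offset
                  (hypothetical layout, for illustration only)
    """
    stitched = []
    for puck_id, x, y in local_spots:
        x0, y0 = puck_offsets[puck_id]
        # Shift the local position by the offset of the puck it belongs to.
        stitched.append((puck_id, x + x0, y + y0))
    return stitched


offsets = {"tile_1": (0, 0), "tile_2": (1200, 0)}
spots = [("tile_1", 10, 20), ("tile_2", 10, 20)]
print(stitch(spots, offsets))
```

In this sketch, two spots with the same local position end up 1200 units apart because their pucks sit side by side in the global frame.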


Provided pucks
^^^^^^^^^^^^^^

.. code-block:: yaml
default:
width_um: 3000
spot_diameter_um: 10
visium:
barcodes: 'puck_data/visium_barcode_positions.csv'
width_um: 6500
spot_diameter_um: 55
seq_scope:
width_um: 1000
spot_diameter_um: 1
slide_seq:
width_um: 3000
spot_diameter_um: 10
openst:
width_um: 1200
spot_diameter_um: 0.6
coordinate_system: 'puck_data/openst_coordinate_system.csv'
as you can see, the ``visium`` puck comes with a ``barcodes`` variable, which points to
``puck_data/visium_barcode_positions.csv``; similarly, the ``openst`` puck comes with
default:
coordinate_system: ''
spot_diameter_um: 10
width_um: 3000
openst:
coordinate_system: puck_data/openst_coordinate_system.csv
spot_diameter_um: 0.6
width_um: 1200
seq_scope:
spot_diameter_um: 1
width_um: 1000
slide_seq:
spot_diameter_um: 10
width_um: 3000
visium:
barcodes: puck_data/visium_barcode_positions.csv
spot_diameter_um: 55
width_um: 6500
The ``visium`` puck comes with a ``barcodes`` variable, which points to
``puck_data/visium_barcode_positions.csv``. Similarly, the ``openst`` puck comes with
a ``coordinate_system`` variable, pointing to ``puck_data/openst_coordinate_system.csv``.

Upon initiation, these files will automatically be placed there by spacemake
@@ -324,18 +349,18 @@ Add a new puck
Custom snakemake rules
^^^^^^^^^^^^^^^^^^^^^^
----------------------

As of version ``0.7`` it is now add custom snakemake rules to your spacemake workflow. Simply
add the following line to the ``config.yaml`` in your spacemake root folder:
As of version ``0.7`` it is now possible to add custom snakemake rules to your spacemake workflow.
Simply add the following line to the ``config.yaml`` in your spacemake root folder:

.. code-block:: yaml
custom_rules: /path/to/my_own_custom_snakefile.smk
Within your custom code you can import spacemake modules and have access to internal variables.
If you need to make spacemake aware of new top-level targets that have to be made, you can register a
callback
Within your custom code, you can import spacemake modules and have access to internal variables.
If you need to make spacemake aware of new top-level targets that have to be made,
you can register a callback:

.. code-block:: python
106 changes: 91 additions & 15 deletions docs/quick-start/index.rst
@@ -41,15 +41,15 @@ More info :ref:`here <configure-species>`.
index provided has the same version of STAR as the command-line STAR. If this is
not the case, an error will be raised.

Visium quick start
------------------
Open-ST quick start
-------------------

Step 1: add a Visium sample
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Step 1: add an Open-ST sample
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

After :ref:`spacemake has been initialized <initialization>`, a `Visium`_ sample can be added.
After :ref:`spacemake has been initialized <initialization>`, an `Open-ST` sample can be added.

To add a `Visium`_ sample, type in terminal:
To add an `Open-ST` sample, type in the terminal:

.. code-block:: console
@@ -59,15 +59,10 @@ To add a `Visium`_ sample, type in terminal:
--R1 <path_to_R1.fastq.gz> \ # single R1 or several R1 files
--R2 <path_to_R2.fastq.gz> \ # single R2 or several R2 files
--species <species> \
--puck visium \
--run_mode visium \
--barcode_flavor visium
Above we add a new visium project with ``puck, run_mode, barcode_flavor`` all set to ``visium``.

This is possible as spacemake comes with pre-defined variables, all suited for visium. The visium ``run_mode`` will process the
sample in the same way as `spaceranger <https://support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/what-is-space-ranger>`_ would: intronic reads will not be counted, multi-mappers (where the multi-mapping read maps only to one CDS or UTR region) will be counted,
3' polyA stretches will not be trimmed from Read2.
--puck openst \
--puck_barcode_file <path_to_puck_barcode_file.tsv.gz> \
--run_mode openst \
--barcode_flavor openst
.. note::

@@ -84,6 +79,87 @@ sample in the same way as `spaceranger <https://support.10xgenomics.com/spatial-

The important thing is to always keep the order consistent between the two mates.

.. note::
With Open-ST data, each sample covers a *piece* of capture area,
which contains at least one tile (``puck``).

Thus, we need to provide ``--puck_barcode_file``, since each puck has different barcodes (unlike Visium samples).
This file should be comma- or tab-separated, with the column names in the first row. Acceptable column names are:

- ``cell_bc``, ``barcodes`` or ``barcode`` for cell-barcode
- ``xcoord`` or ``x_pos`` for x-positions
- ``ycoord`` or ``y_pos`` for y-positions

These can be generated using the `openst package <https://rajewsky-lab.github.io/openst/latest/computational/preprocessing_sequencing/>`_.
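Before adding a sample, it can be useful to check that a barcode file has a usable header. The following is a minimal sketch, assuming only the accepted column names listed above; ``check_header`` and ``check_file`` are hypothetical helpers and do not reflect how spacemake itself parses the file.

```python
import gzip

# Accepted column names, as listed above.
CELL_COLS = {"cell_bc", "barcodes", "barcode"}
X_COLS = {"xcoord", "x_pos"}
Y_COLS = {"ycoord", "y_pos"}


def check_header(header_line):
    """Return the (cell, x, y) column names found in a header line.

    Raises ValueError if any of the three columns is missing or ambiguous.
    """
    # Guess the delimiter: tab if one is present, otherwise comma.
    delimiter = "\t" if "\t" in header_line else ","
    columns = [c.strip() for c in header_line.split(delimiter)]
    found = []
    for label, accepted in (("cell barcode", CELL_COLS), ("x", X_COLS), ("y", Y_COLS)):
        matches = [c for c in columns if c in accepted]
        if len(matches) != 1:
            raise ValueError(f"expected exactly one {label} column, found {matches}")
        found.append(matches[0])
    return tuple(found)


def check_file(path):
    """Check the first line of a plain or gzip-compressed barcode file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return check_header(handle.readline())


print(check_header("cell_bc\txcoord\tycoord\n"))
```

The sketch only inspects the header; it does not validate the barcodes or coordinates in the remaining rows.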

It is typically unknown *a priori* which tiles fall under a sample, so in principle all available
``puck_barcode_files`` would need to be specified under ``--puck_barcode_file``. To generate output files
and reports only for the relevant tiles per sample, we provide the variable ``spatial_barcode_min_matches``
under ``run_mode``, as the minimum proportion of spatial barcodes that a tile has in common with
the sample reads to be considered during quantification and downstream analysis.

The default threshold (0.1, i.e., at least 10% of barcodes per tile are present in the sample) was chosen
empirically, and typically keeps all relevant tiles. Spacemake will modify the ``project_df`` for each sample
to keep only those tiles that passed filters. So, if you see that some tiles might be missing
because this threshold was too high, you can update the sample to add missing tiles, and then rerun
spacemake using a lower ``spatial_barcode_min_matches``.
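The effect of the threshold can be sketched in a few lines. This assumes the proportion is computed as the fraction of a tile's barcodes that also occur among the sample's barcodes, as described above; the function names and the in-memory data layout are hypothetical, and the actual spacemake implementation may differ.

```python
def tile_match_fraction(tile_barcodes, sample_barcodes):
    """Fraction of a tile's barcodes that are also seen in the sample."""
    tile = set(tile_barcodes)
    if not tile:
        return 0.0
    return len(tile & set(sample_barcodes)) / len(tile)


def select_tiles(tiles, sample_barcodes, min_matches=0.1):
    """Keep the names of tiles whose match fraction reaches min_matches.

    tiles: dict mapping tile name -> iterable of barcodes (hypothetical layout)
    """
    sample = set(sample_barcodes)
    return [
        name
        for name, barcodes in tiles.items()
        if tile_match_fraction(barcodes, sample) >= min_matches
    ]


tiles = {
    "tile_a": ["AAAA", "CCCC", "GGGG", "TTTT"],  # 2 of 4 barcodes seen in the sample
    "tile_b": ["ACAC", "AGAG", "ATAT", "CACA"],  # none seen in the sample
}
sample = ["AAAA", "CCCC", "TGTG"]
print(select_tiles(tiles, sample, min_matches=0.1))
```

Lowering ``min_matches`` keeps more tiles, at the cost of also keeping tiles that share only a few barcodes with the sample.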

The above will add a new Open-ST project with ``barcode_flavor, run_mode, puck`` all
set to ``openst``.

The structure in ``barcode_flavor`` assumes that sequencing of the library is paired-end,
with Read1 having the spot barcodes, and Read2 containing the UMI (first 9 nucleotides) and
the sequence to be aligned to the genome. You can see the details by running
``spacemake config list_barcode_flavors``.
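To make the read layout concrete, here is a minimal sketch of how the ``openst`` specification ``cell: "r1[2:27]"`` and ``UMI: "r2[0:9]"`` could be applied to a read pair. It assumes Python-style half-open slicing, which is consistent with the UMI being the first 9 nucleotides of Read2. The ``extract`` helper is hypothetical and only handles a single ``rN[start:end]`` slice; it is not spacemake's own parser.

```python
import re

# The openst barcode_flavor as listed in the configuration docs.
OPENST_FLAVOR = {"cell": "r1[2:27]", "UMI": "r2[0:9]"}

# Matches a single slice such as "r1[2:27]" (illustrative subset only).
_SPEC = re.compile(r"^(r[12])\[(\d+):(\d+)\]$")


def extract(spec, r1, r2):
    """Apply one slice specification to a read pair and return the subsequence."""
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError(f"unsupported specification: {spec!r}")
    read = match.group(1)
    start, end = int(match.group(2)), int(match.group(3))
    return {"r1": r1, "r2": r2}[read][start:end]


# Toy read pair: 2 leading bases, a 25-nt spot barcode, then trailing bases.
r1 = "NN" + "ACGTA" * 5 + "TTT"
r2 = "GATTACAGA" + "CCCCGGGGTTTT"

cell = extract(OPENST_FLAVOR["cell"], r1, r2)
umi = extract(OPENST_FLAVOR["UMI"], r1, r2)
print(len(cell), umi)
```

With this layout the spot barcode is 25 nucleotides long (positions 2 to 26 of Read1), and the remainder of Read2 after the UMI is what gets aligned to the genome.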

The ``run_mode`` is tailored to Open-ST samples and it
(i) counts intronic reads, (ii) includes multi-mappers, (iii) bins the data
into regular hexagons of a 7 um side, and (iv) performs automated analyses for
three UMI thresholds [100, 250, 500]. You can see the details by running
``spacemake config list_run_modes``.

The ``puck`` variable is used to correctly set the size of the capture area. This
is required for accurately plotting the QC sheets and the results of the automated
analyses, but also for binning the data from the spot level into regular hexagons.
The ``openst`` puck is set to have a ``spot_diameter_um`` of 0.6 um and a ``width_um`` of 1,200 um.
The ``coordinate_system`` that ships with ``spacemake`` is used for stitching
multiple tiles into a single area. To adapt the file for your data, rename the
tile names found in the ``puck_id`` column (default value: ``fc_1_L*_tile_*``) to the
prefix of your data.

Step 2: running spacemake
^^^^^^^^^^^^^^^^^^^^^^^^^

.. include:: run_spacemake.rst

Visium quick start
------------------

Step 1: add a Visium sample
^^^^^^^^^^^^^^^^^^^^^^^^^^^

After :ref:`spacemake has been initialized <initialization>`, a `Visium`_ sample can be added.

To add a `Visium`_ sample, type in terminal:

.. code-block:: console
spacemake projects add_sample \
--project_id <project_id> \
--sample_id <sample_id> \
--R1 <path_to_R1.fastq.gz> \ # single R1 or several R1 files
--R2 <path_to_R2.fastq.gz> \ # single R2 or several R2 files
--species <species> \
--puck visium \
--run_mode visium \
--barcode_flavor visium
Above we add a new visium project with ``puck, run_mode, barcode_flavor`` all set to ``visium``.

This is possible as spacemake comes with pre-defined variables, all suited for visium. The visium ``run_mode`` will process the
sample in the same way as `spaceranger <https://support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/what-is-space-ranger>`_ would: intronic reads will not be counted, multi-mappers (where the multi-mapping read maps only to one CDS or UTR region) will be counted,
3' polyA stretches will not be trimmed from Read2.

To see the values of these predefined variables, check out the :ref:`configuration <Configuration>` docs.

**To add several visium samples at once, follow** :ref:`the tutorial here <add-several-samples>`
2 changes: 1 addition & 1 deletion environment.yaml
@@ -5,7 +5,7 @@ channels:
- bioconda
- defaults
dependencies:
- python>=3.6
- python>=3.6,<3.12
- snakemake>=5.32.0,<6.4.0
- star>=2.7.1a
- samtools>=1.13
2 changes: 1 addition & 1 deletion spacemake/snakemake/scripts/filter_mm_reads.py
@@ -55,7 +55,7 @@ def is_exonic(aln):
delta_seconds = (finish_time - start_time).seconds
# restart time
start_time = finish_time
print(f'Processed 1 millon records in {delta_seconds} seconds, total records processed {counter}. current time: {finish_time}')
print(f'Processed 1 million records in {delta_seconds} seconds, total records processed {counter}. current time: {finish_time}')

mapped_number = aln.get_tag('NH')
