diff --git a/docs/config.rst b/docs/config.rst index c9013473..b794c760 100644 --- a/docs/config.rst +++ b/docs/config.rst @@ -61,6 +61,15 @@ Spacemake provides the following barcode\_flavors out of the box: default: cell: "r1[0:12]" UMI: "r1[12:20]" + openst: + cell: "r1[2:27]" + UMI: "r2[0:9]" + sc_10x_v2: + cell: "r1[0:16]" + UMI: "r1[16:26]" + seq_scope: + UMI: "r2[0:9]" + cell: "r1[0:20]" slide_seq_14bc: cell: "r1[0:14]" UMI: "r1[14:23]" @@ -70,12 +79,6 @@ Spacemake provides the following barcode\_flavors out of the box: visium: cell: "r1[0:16]" UMI: "r1[16:28]" - sc_10x_v2: - cell: "r1[0:16]" - UMI: "r1[16:26]" - seq_scope: - UMI: "r2[0:9]" - cell: "r1[0:20]" To list the currently available ``barcode_flavor``-s, type:: @@ -169,52 +172,72 @@ Provided run\_mode(s) .. code-block:: yaml - default: - n_beads: 100000 - umi_cutoff: [100, 300, 500] - clean_dge: False - detect_tissue: False - polyA_adapter_trimming: True - count_intronic_reads: True - count_mm_reads: False - mesh_data: False - mesh_type: 'circle' - mesh_spot_diameter_um: 55 - mesh_spot_distance_um: 100 - spatial_barcode_min_matches: 0 - visium: - n_beads: 10000 - umi_cutoff: [1000] - clean_dge: False - detect_tissue: True - polyA_adapter_trimming: False - count_intronic_reads: False - count_mm_reads: True - slide_seq: - n_beads: 100000 - umi_cutoff: [50] - clean_dge: False - detect_tissue: False - scRNA_seq: - n_beads: 10000 - umi_cutoff: [500] - detect_tissue: False - polyA_adapter_trimming: True - count_intronic_reads: True - count_mm_reads: False - seq_scope: - clean_dge: false - count_intronic_reads: false - count_mm_reads: false - detect_tissue: false - mesh_data: true - mesh_spot_diameter_um: 10 - mesh_spot_distance_um: 15 - mesh_type: hexagon - n_beads: 1000 - umi_cutoff: - - 100 - - 300 + default: + clean_dge: false + count_intronic_reads: true + count_mm_reads: false + detect_tissue: false + mesh_data: false + mesh_spot_diameter_um: 55 + mesh_spot_distance_um: 100 + mesh_type: circle + 
n_beads: 100000 + polyA_adapter_trimming: true + spatial_barcode_min_matches: 0 + umi_cutoff: + - 100 + - 300 + - 500 + openst: + clean_dge: false + count_intronic_reads: true + count_mm_reads: true + detect_tissue: false + mesh_data: true + mesh_spot_diameter_um: 7 + mesh_spot_distance_um: 7 + mesh_type: hexagon + n_beads: 100000 + polyA_adapter_trimming: true + spatial_barcode_min_matches: 0.1 + umi_cutoff: + - 100 + - 250 + - 500 + scRNA_seq: + count_intronic_reads: true + count_mm_reads: false + detect_tissue: false + n_beads: 10000 + umi_cutoff: + - 500 + seq_scope: + clean_dge: false + count_intronic_reads: false + count_mm_reads: false + detect_tissue: false + mesh_data: true + mesh_spot_diameter_um: 10 + mesh_spot_distance_um: 15 + mesh_type: hexagon + n_beads: 1000 + umi_cutoff: + - 100 + - 300 + slide_seq: + clean_dge: false + detect_tissue: false + n_beads: 100000 + umi_cutoff: + - 50 + visium: + clean_dge: false + count_intronic_reads: false + count_mm_reads: true + detect_tissue: true + n_beads: 10000 + umi_cutoff: + - 1000 .. note:: If a sample has no ``run_mode`` provided, the ``default`` will be used @@ -259,21 +282,22 @@ Configure pucks .. _configure-puck: -Each spatial sample, needs to have a ``puck``. The ``puck`` sample-variable will define the -dimensionality of the underlying spatial structure, which then spacemake will use -during the autmated analysis and plotting. +Each spatial sample is associated with a ``puck``. The ``puck`` variable defines the +dimensionality of the underlying spatial structure, which spacemake uses +during the automated analysis and plotting, as well as the binning (meshing) of +the data when selected in the ``run_mode``. Each puck has the following variables: - ``width_um``: the width of the puck, in microns - ``spot_diameter_um``: the diameter of bead on this puck, in microns. - ``barcodes`` (optional): the path to the barcode file, containing the cell\_barcode - and (x,y) position for each. 
This is handy, when several pucks have the same barcodes, - such as for 10x visium. + and (x,y) position for each. This is handy when several pucks have the same barcodes, + such as for 10x Visium. - ``coordinate_system`` (optional): the path to the coordinate system file, containing puck IDs and the (x,y,z) position for each, in global coordinates. This coordinate system is analogous to the global coordinate system for image stitching. When specified, this 'stitching' is - automatically performed on ``puck``-s with spatial information + automatically performed on ``puck``-s with spatial information. Provided pucks @@ -281,26 +305,27 @@ Provided pucks .. code-block:: yaml - default: - width_um: 3000 - spot_diameter_um: 10 - visium: - barcodes: 'puck_data/visium_barcode_positions.csv' - width_um: 6500 - spot_diameter_um: 55 - seq_scope: - width_um: 1000 - spot_diameter_um: 1 - slide_seq: - width_um: 3000 - spot_diameter_um: 10 - openst: - width_um: 1200 - spot_diameter_um: 0.6 - coordinate_system: 'puck_data/openst_coordinate_system.csv' - -as you can see, the ``visium`` puck comes with a ``barcodes`` variable, which points to -``puck_data/visium_barcode_positions.csv``; similarly, the ``openst`` puck comes with + default: + coordinate_system: '' + spot_diameter_um: 10 + width_um: 3000 + openst: + coordinate_system: puck_data/openst_coordinate_system.csv + spot_diameter_um: 0.6 + width_um: 1200 + seq_scope: + spot_diameter_um: 1 + width_um: 1000 + slide_seq: + spot_diameter_um: 10 + width_um: 3000 + visium: + barcodes: puck_data/visium_barcode_positions.csv + spot_diameter_um: 55 + width_um: 6500 + +The ``visium`` puck comes with a ``barcodes`` variable, which points to +``puck_data/visium_barcode_positions.csv``. Similarly, the ``openst`` puck comes with a ``coordinate_system`` variable, pointing to ``puck_data/openst_coordinate_system.csv``. 
Upon initiation, these files will automatically be placed there by spacemake @@ -324,18 +349,18 @@ Add a new puck Custom snakemake rules -^^^^^^^^^^^^^^^^^^^^^^ +---------------------- -As of version ``0.7`` it is now add custom snakemake rules to your spacemake workflow. Simply -add the following line to the ``config.yaml`` in your spacemake root folder: +As of version ``0.7`` it is now possible to add custom snakemake rules to your spacemake workflow. +Simply add the following line to the ``config.yaml`` in your spacemake root folder: .. code-block:: yaml custom_rules: /path/to/my_own_custom_snakefile.smk -Within your custom code you can import spacemake modules and have access to internal variables. -If you need to make spacemake aware of new top-level targets that have to be made, you can register a -callback +Within your custom code, you can import spacemake modules and have access to internal variables. +If you need to make spacemake aware of new top-level targets that have to be made, +you can register a callback .. code-block:: python diff --git a/docs/quick-start/index.rst b/docs/quick-start/index.rst index d3e25981..6c58515b 100644 --- a/docs/quick-start/index.rst +++ b/docs/quick-start/index.rst @@ -41,15 +41,15 @@ More info :ref:`here `. index provided has the same version of STAR as the command-line STAR. If this is not the case, an error will be raised. -Visium quick start ------------------- +Open-ST quick start +------------------- -Step 1: add a Visium sample -^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Step 1: add an Open-ST sample +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -After :ref:`spacemake has been initialized `, a `Visium`_ sample can be added. +After :ref:`spacemake has been initialized `, an `Open-ST` sample can be added. -To add a `Visium`_ sample, type in terminal: +To add an `Open-ST` sample, type in the terminal: .. 
code-block:: console @@ -59,15 +59,10 @@ To add a `Visium`_ sample, type in terminal: --R1 \ # single R1 or several R1 files --R2 \ # single R2 or several R2 files --species \ - --puck visium \ - --run_mode visium \ - --barcode_flavor visium - -Above we add a new visium project with ``puck, run_mode, barcode_flavor`` all set to ``visium``. - -This is possible as spacemake comes with pre-defined variables, all suited for visium. The visium ``run_mode`` will process the -sample in the same way as `spaceranger `_ would: intronic reads will not be counted, multi-mappers (where the multi-mapping read maps only to one CDS or UTR region) will be counted, -3' polyA stretches will not be trimmed from Read2. + --puck openst \ + --puck_barcode_file \ + --run_mode openst \ + --barcode_flavor openst .. note:: @@ -84,6 +79,87 @@ sample in the same way as `spaceranger `_. + + It is typically unknown *a priori* which tiles fall under a sample, so in principle all available + ``puck_barcode_files`` would need to be specified under ``--puck_barcode_file``. To generate output files + and reports only for the relevant tiles per sample, we provide the variable ``spatial_barcode_min_matches`` + under ``run_mode``, as the minimum proportion of spatial barcodes that a tile has in common with + the sample reads to be considered during quantification and downstream analysis. + + The default threshold (0.1, i.e., at least 10% of barcodes per tile are present in the sample) was chosen + empirically, and typically keeps all relevant tiles. Spacemake will modify the ``project_df`` for each sample + to keep only those tiles that passed filters. So, if you see that some tiles might be missing + because this threshold was too high, you can update the sample to add missing tiles, and then rerun + spacemake using a lower ``spatial_barcode_min_matches``. + +The above will add a new Open-ST project with ``barcode_flavor, run_mode, puck`` all +set to ``openst``. 
+ +The structure in ``barcode_flavor`` assumes that sequencing of the library is paired-end, +with Read1 having the spot barcodes, and Read2 containing the UMI (first 9 nucleotides) and +the sequence to be aligned to the genome. You can see the details by running +``spacemake config list_barcode_flavors``. + +The ``run_mode`` is tailored to Open-ST samples and it +(i) counts intronic reads, (ii) includes multi-mappers, (iii) bins the data +into regular hexagons with a 7 um side, and (iv) performs automated analyses for +three UMI thresholds [100, 250, 500]. You can see the details by running +``spacemake config list_run_modes``. + +The ``puck`` variable is used to correctly set the size of the capture area. This +is required for accurately plotting the QC sheets and the results of the automated +analyses, but also for binning the data from the spot level into regular hexagons. +The ``openst`` puck is set to have a ``spot_diameter_um`` of 0.6 um and a ``width_um`` of 1,200 um. +The ``coordinate_system`` that ships with ``spacemake`` is used for stitching +multiple tiles into a single area. To adapt the file for your data, rename the +tile names found in the ``puck_id`` column (default value: ``fc_1_L*_tile_*``) to the +prefix of your data. + +Step 2: running spacemake +^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. include:: run_spacemake.rst + +Visium quick start +------------------ + +Step 1: add a Visium sample +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +After :ref:`spacemake has been initialized `, a `Visium`_ sample can be added. + +To add a `Visium`_ sample, type in the terminal: + +.. code-block:: console + + spacemake projects add_sample \ + --project_id \ + --sample_id \ + --R1 \ # single R1 or several R1 files + --R2 \ # single R2 or several R2 files + --species \ + --puck visium \ + --run_mode visium \ + --barcode_flavor visium + +Above we add a new visium project with ``puck, run_mode, barcode_flavor`` all set to ``visium``. 
+ +This is possible as spacemake comes with pre-defined variables, all suited for visium. The visium ``run_mode`` will process the +sample in the same way as `spaceranger `_ would: intronic reads will not be counted, multi-mappers (where the multi-mapping read maps only to one CDS or UTR region) will be counted, +3' polyA stretches will not be trimmed from Read2. + To see the values of these predefined variables check out the :ref:`configuration ` docs. **To add several visium samples at once, follow** :ref:`the tutorial here ` diff --git a/environment.yaml b/environment.yaml index 44530b3b..d2e77eed 100644 --- a/environment.yaml +++ b/environment.yaml @@ -5,7 +5,7 @@ channels: - bioconda - defaults dependencies: - - python>=3.6 + - python>=3.6,<3.12 - snakemake>=5.32.0,<6.4.0 - star>=2.7.1a - samtools>=1.13 diff --git a/spacemake/snakemake/scripts/filter_mm_reads.py b/spacemake/snakemake/scripts/filter_mm_reads.py index 1bd587cb..139a3a01 100644 --- a/spacemake/snakemake/scripts/filter_mm_reads.py +++ b/spacemake/snakemake/scripts/filter_mm_reads.py @@ -55,7 +55,7 @@ def is_exonic(aln): delta_seconds = (finish_time - start_time).seconds # restart time start_time = finish_time - print(f'Processed 1 millon records in {delta_seconds} seconds, total records processed {counter}. current time: {finish_time}') + print(f'Processed 1 million records in {delta_seconds} seconds, total records processed {counter}. current time: {finish_time}') mapped_number = aln.get_tag('NH')
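
Note on the new ``openst`` barcode flavor (``cell: "r1[2:27]"``, ``UMI: "r2[0:9]"``): the sketch below illustrates how such slice strings could map onto a read pair. It is only an illustration, not spacemake's actual parsing code. The helper ``extract_segment``, the regular expression, and the example reads are hypothetical, and the sketch assumes every spec has the simple form ``rN[a:b]`` with Python-style half-open slicing.

```python
import re

# Hypothetical pattern for specs of the simple form "r1[2:27]" or "r2[0:9]".
# Assumption: spacemake may accept richer expressions; this sketch does not.
SPEC_PATTERN = re.compile(r"^(r[12])\[(\d+):(\d+)\]$")


def extract_segment(spec, r1, r2):
    """Return the part of Read1 or Read2 selected by a slice spec."""
    match = SPEC_PATTERN.match(spec)
    if match is None:
        raise ValueError(f"unsupported barcode spec: {spec!r}")
    read_name, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    read = {"r1": r1, "r2": r2}[read_name]
    # Half-open slice, as in Python: positions start .. end-1
    return read[start:end]


# Values copied from the openst barcode_flavor in the diff above.
openst_flavor = {"cell": "r1[2:27]", "UMI": "r2[0:9]"}

# Made-up reads: 2 leading bases, a 25-nt spot barcode, then padding on Read1;
# a 9-nt UMI followed by the sequence to be aligned on Read2.
read1 = "NN" + "A" * 25 + "GGGG"
read2 = "CCCCCCCCC" + "T" * 30

cell_barcode = extract_segment(openst_flavor["cell"], read1, read2)
umi = extract_segment(openst_flavor["UMI"], read1, read2)
print(len(cell_barcode), umi)
```

Under these assumptions the ``openst`` flavor yields a 25-nt spot barcode from Read1 (skipping its first two bases) and a 9-nt UMI from the start of Read2, which matches the description added to the quick-start text.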