FeatureSelectionBenchMarks

This repository contains the code for a benchmark of feature (gene) selection methods on scRNA-seq and spatial transcriptomics data.

Software features

After a short configuration step, you can run the whole benchmark (data loading, quality control, feature selection, and cell clustering or domain detection) in a single line of code:

from benchmark.run_benchmark import run_bench


# configure the dataset information
data_cfg = {
    'your_data_name': {
        'adata_path': 'path/to/h5ad/file',
        'annot_key': 'annotation_name',
    }}
# configure feature selection methods and numbers of selected features
fs_cfg = {'feature_selection_method': [1000, 2000]}
# configure clustering methods and numbers of runs
cl_cfg = {'clustering_method': 2}
# run the benchmark in one line of code
run_bench(data_cfg, fs_cfg, cl_cfg, modality='scrna', metrics=['ARI', 'NMI'])

The evaluation results are saved automatically as an XLSX file in the working directory, with a name like this:

2023-02 14_54_32 scrna.xlsx
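
The `metrics` argument above names ARI and NMI. To illustrate what ARI measures when cluster labels are compared with an annotation, here is a minimal standalone sketch. It is not the benchmark's own implementation; the function name `adjusted_rand_index` and the toy labels are assumptions made for this example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells (illustrative)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same cells")
    n = len(labels_true)
    if n < 2:
        return 1.0  # fewer than two cells: nothing to disagree about

    # Count cells per (annotation, cluster) pair and per label on each side.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of cell pairs that are grouped together in each view.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster on both sides
    return (sum_pairs - expected) / (max_index - expected)


# Toy data: cluster ids differ from annotation names, but the grouping is identical.
annotation = ["T", "T", "B", "B", "NK", "NK"]
clusters = [0, 0, 1, 1, 2, 2]
print(adjusted_rand_index(annotation, clusters))  # 1.0
```

ARI is invariant to how clusters are named, which is why integer cluster ids can be scored directly against string annotations.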

Other software features are:

  • Automatically save the results of each step (preprocessed data, selected features, and cluster labels)
  • Reload the cached genes and cluster labels when you use the same data (specified by the data name)
  • Support custom feature selection and cell clustering/domain detection methods
  • Present detailed and pretty logging messages based on rich and loguru (see examples in tutorial)
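
The idea of reloading cached results by data name can be sketched as follows. This is only an illustration of the concept and makes no claim about how the repository stores its cache: the helper `cached_step`, the JSON file layout, and the gene names are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def cached_step(cache_dir, data_name, step_name, compute):
    """Return the cached result for (data_name, step_name), computing it once if absent."""
    path = Path(cache_dir) / data_name / f"{step_name}.json"
    if path.exists():
        # Same data name and step seen before: reuse the stored result.
        return json.loads(path.read_text())
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result))
    return result


calls = []


def select_genes():
    calls.append("run")  # record that the expensive step actually ran
    return ["GeneA", "GeneB"]  # placeholder gene names


with tempfile.TemporaryDirectory() as cache_dir:
    first = cached_step(cache_dir, "your_data_name", "selected_features", select_genes)
    second = cached_step(cache_dir, "your_data_name", "selected_features", select_genes)

print(first == second, len(calls))
```

Because the cache key in this sketch is the data name, reusing a name for a different dataset would return stale results; choosing a distinct name per dataset avoids that.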

Currently supported methods

scRNA-seq

Feature selection

Name         Language  Reference
GeneClust    Python    paper
vst          Python    paper
mvp          Python    paper
triku        Python    paper
GiniClust3   Python    paper
SC3          Python    paper
scran        R         paper
FEAST        R         paper
M3Drop       R         paper
scmap        R         paper
deviance     R         paper
sctransform  R         paper

Cell clustering

Name    Language  Reference
SC3s    Python    paper
Seurat  R         paper
SHARP   R         paper
TSCAN   R         paper
CIDR    R         paper

Spatial transcriptomics

Feature selection

Name       Language  Reference
SpatialDE  Python    paper
SPARK-X    R         paper
Giotto     R         paper

Domain detection

Name     Language  Reference
SpaGCN   Python    paper
stLearn  Python    paper
STAGATE  Python    paper

Requirements

R packages

This benchmark is written in Python and calls R functions through rpy2. To use a method implemented in R, install the corresponding R package first.

Python packages

  • anndata>=0.8.0
  • numpy>=1.21.6
  • setuptools>=59.5.0
  • anndata2ri>=1.1
  • sc3s>=0.1.1
  • scanpy>=1.9.1
  • loguru>=0.6.0
  • rpy2>=3.5.6
  • sklearn>=0.0.post2
  • scikit-learn>=1.2.0
  • SpaGCN>=1.2.5
  • torch>=1.13.1
  • stlearn>=0.4.11
  • pandas>=1.5.2
  • opencv-python>=4.6.0
  • scipy>=1.9.3
  • rich>=13.0.0
  • triku>=2.1.4
  • statsmodels>=0.13.5
  • SpatialDE>=1.1.3
  • STAGATE_pyG>=1.0.0
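
To see which of the packages listed above are already installed at a sufficient version, a small check along these lines may help. It is a sketch using only the standard library; the helper names are made up for this example, and the version comparison is deliberately simplified (it looks only at leading numeric components), so a dedicated tool such as the `packaging` library is preferable for anything stricter.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_requirement(line):
    """Split 'name>=1.2.3' into ('name', '1.2.3')."""
    name, _, minimum = line.partition(">=")
    return name.strip(), minimum.strip()


def version_key(text):
    """Turn '1.21.6' into (1, 21, 6); stops at the first non-numeric part."""
    key = []
    for part in text.split("."):
        match = re.match(r"\d+", part)
        if match is None:
            break
        key.append(int(match.group()))
    return tuple(key)


def meets_minimum(installed, minimum):
    a, b = version_key(installed), version_key(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.9' and '1.9.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check(requirements):
    """Map each package name to 'ok', 'missing', or a 'too old' message."""
    report = {}
    for line in requirements:
        name, minimum = parse_requirement(line)
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


print(check(["numpy>=1.21.6", "scanpy>=1.9.1", "rpy2>=3.5.6"]))
```

The list passed to `check` can be extended with the remaining entries from the requirements above.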

Installation

git clone https://github.com/ToryDeng/FeatureSelectionBenchmarks
cd FeatureSelectionBenchmarks/
python3 -m pip install --user .

Tutorial