Benchmarking feature selection methods for scRNA-seq and spatially resolved transcriptomics

FeatureSelectionBenchMarks

This repository contains the code for benchmarking feature (gene) selection methods on scRNA-seq and spatial transcriptomics data.

Software features

After a simple configuration, you can run the whole benchmark (data loading, quality control, feature selection, and cell clustering/domain detection) in a single line of code:

from benchmark.run_benchmark import run_bench


# configure the dataset information
data_cfg = {
    'your_data_name': {
        'adata_path': 'path/to/h5ad/file',
        'annot_key': 'annotation_name',
    }}
# configure feature selection methods and numbers of selected features
fs_cfg = {'feature_selection_method': [1000, 2000]}
# configure clustering methods and numbers of runs
cl_cfg = {'clustering_method': 2}
# run the benchmark in one line of code
run_bench(data_cfg, fs_cfg, cl_cfg, modality='scrna', metrics=['ARI', 'NMI'])
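
Since a long benchmark run can fail late on a malformed configuration, it may help to check the three dictionaries first. The sketch below is not part of this repository: `validate_configs` is a hypothetical helper, and the rules it enforces (both `adata_path` and `annot_key` are required strings, feature numbers are positive integers, run counts are at least 1) are assumptions inferred from the example above.

```python
def validate_configs(data_cfg, fs_cfg, cl_cfg):
    """Return a list of problems; an empty list means the configs look well-formed.

    Hypothetical helper: the rules are inferred from the README example,
    not taken from the benchmark's own code.
    """
    problems = []
    # each dataset entry is assumed to need a file path and an annotation key
    for name, info in data_cfg.items():
        if not isinstance(info, dict):
            problems.append(f"data_cfg[{name!r}] should be a dict")
            continue
        for key in ('adata_path', 'annot_key'):
            if not isinstance(info.get(key), str) or not info[key]:
                problems.append(f"data_cfg[{name!r}] needs a non-empty string {key!r}")
    # each feature selection method maps to a list of feature numbers
    for method, n_features in fs_cfg.items():
        is_valid = (
            isinstance(n_features, (list, tuple))
            and len(n_features) > 0
            and all(isinstance(n, int) and n > 0 for n in n_features)
        )
        if not is_valid:
            problems.append(f"fs_cfg[{method!r}] should be a non-empty list of positive integers")
    # each clustering method maps to a number of runs
    for method, n_runs in cl_cfg.items():
        if not isinstance(n_runs, int) or n_runs < 1:
            problems.append(f"cl_cfg[{method!r}] should be an integer >= 1")
    return problems
```

Calling `validate_configs(data_cfg, fs_cfg, cl_cfg)` before `run_bench` and stopping if the returned list is non-empty keeps configuration mistakes separate from errors raised inside the benchmark.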

The evaluation results are automatically saved as an XLSX file in the working directory, with a name like this:

2023-02 14_54_32 scrna.xlsx
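
Because each run writes a new timestamped file, a small helper can pick up the most recent one. This is a minimal sketch using only the standard library; `latest_result_file` is a hypothetical name, and it assumes that result file names end with the modality followed by `.xlsx`, as in the example above.

```python
from pathlib import Path


def latest_result_file(directory='.', modality='scrna'):
    """Return the most recently modified '*<modality>.xlsx' file, or None if there is none.

    Assumption: result files end with the modality name, as in
    '2023-02 14_54_32 scrna.xlsx'.
    """
    candidates = sorted(
        Path(directory).glob(f'*{modality}.xlsx'),
        key=lambda path: path.stat().st_mtime,  # order by modification time
    )
    return candidates[-1] if candidates else None
```

The returned path can then be opened with any spreadsheet reader; the layout of the sheets is not described here, so inspect the file before relying on specific columns.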

Other software features are:

  • Automatically save the results of each step (preprocessed data, selected features, and cluster labels)
  • Reload the cached genes and cluster labels when you use the same data (specified by the data name)
  • Support custom feature selection and cell clustering/domain detection methods
  • Present detailed and pretty logging messages based on rich and loguru (see examples in tutorial)

Currently supported methods

scRNA-seq

Feature selection

Name Language Reference
GeneClust Python paper
vst Python paper
mvp Python paper
triku Python paper
GiniClust3 Python paper
SC3 Python paper
scran R paper
FEAST R paper
M3Drop R paper
scmap R paper
deviance R paper
sctransform R paper

Cell clustering

Name Language Reference
SC3s Python paper
Seurat R paper
SHARP R paper
TSCAN R paper
CIDR R paper

Spatial transcriptomics

Feature selection

Name Language Reference
SpatialDE Python paper
SPARK-X R paper
Giotto R paper

Domain detection

Name Language Reference
SpaGCN Python paper
stLearn Python paper
STAGATE Python paper

Requirements

R packages

This benchmark is written in Python and calls R functions through rpy2. To use a method implemented in R, install the corresponding R package first.

Python packages

  • anndata>=0.8.0
  • numpy>=1.21.6
  • setuptools>=59.5.0
  • anndata2ri>=1.1
  • sc3s>=0.1.1
  • scanpy>=1.9.1
  • loguru>=0.6.0
  • rpy2>=3.5.6
  • sklearn>=0.0.post2
  • scikit-learn>=1.2.0
  • SpaGCN>=1.2.5
  • torch>=1.13.1
  • stlearn>=0.4.11
  • pandas>=1.5.2
  • opencv-python>=4.6.0
  • scipy>=1.9.3
  • rich>=13.0.0
  • triku>=2.1.4
  • statsmodels>=0.13.5
  • SpatialDE>=1.1.3
  • STAGATE_pyG>=1.0.0
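
To see which of these packages are already present before installing, the standard library can report installed versions. This is a minimal sketch, not part of the repository: `installed_versions` is a hypothetical helper, and it only reports versions rather than comparing them against the minimums listed above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == '__main__':
    # a few names from the list above; extend as needed
    report = installed_versions(['anndata', 'scanpy', 'rpy2', 'scikit-learn'])
    for name, ver in report.items():
        print(f"{name}: {ver or 'not installed'}")
```

Names must be distribution names as published on PyPI (for example `scikit-learn`, not the import name `sklearn`).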

Installation

git clone https://github.com/ToryDeng/FeatureSelectionBenchmarks
cd FeatureSelectionBenchmarks/
python3 setup.py install --user

Tutorial
