LSENS-BMI-EPFL/ephys_preprocessing

EphysUtils

Pipeline to preprocess extracellular electrophysiology Neuropixels data acquired using SpikeGLX. 🐁🔌

Notes about this pipeline ♻️

Overview of the pipeline 📑

graph LR
    Step1["1- Event extraction <br/> filtering"] -.-> Step2["2- Coil artifact <br/> correction"]
    Step2 -.-> Step3(["3- Optional: <br/> chunk zeroing"])
    Step3 -.-> Step4["4- Spike sorting <br/> & quality metrics"]
    Step4 -.-> Step5["5- Data stream <br/> synchronization"]
    Step5 -.-> Step6["6- Mean waveform <br/> & metrics"]
    Step6 -.-> Step7["7- LFP <br/> analysis"]

Summary of the main steps

  • Event extraction (CatGT): extracts the times of TTL pulses acquired with the NI card from the nidq.bin output file of SpikeGLX
  • Filtering (CatGT): common median referencing by default
  • Coil artifact correction (TPrime):
    1. synchronize extracted coil/whisker stimulation times to each IMEC probe base time
    2. at each artifact time, replace the samples spanning the artifact duration (3 ms by default) with the mean voltage just before the artifact, on all channels
    3. create a copy of the .ap/.meta files with the "corrected" suffix
  • Chunk zeroing (OverStrike): zero-out entire chunks of data in the recordings when there is unsalvageable noise
  • Spike sorting (Kilosort): spike sorting algorithm for neuron identification; calls Kilosort 2.0 through the MATLAB engine for Python (see below)
  • Quality metrics: runs the Bombcell (CortexLab) quality metrics pipeline through the MATLAB engine, with these modified defaults:
    • Plotting is set to off (otherwise one plot per cluster is generated); set it to True for initial debugging/inspection
    • Further splitting of non-somatic units into mua/good is set to False
    • Computation of drift estimation/ephys properties is set to False (not immediately necessary)
  • Data stream synchronization (TPrime): synchronizes task event times (e.g. trial starts) and spike times to the time base of a reference stream (default is the first IMEC probe clock)
  • Mean waveform estimation (C_Waves): efficient parsing of raw recordings to extract single spike waveforms to compute mean waveforms for each cluster
  • Mean waveform metrics: calculates waveform metrics such as peak-to-trough duration (note: bombcell uses template waveforms for peaks/troughs, but can also use raw mean waveforms and their metrics)
  • LFP analysis: performs depth estimation on LFP data
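The coil artifact correction described above (replace each artifact window with the mean voltage just before it) can be sketched in a few lines of NumPy. This is only an illustration, not the pipeline's actual code: the function name blank_artifacts, the 30 kHz sampling rate, and the 1 ms baseline window are assumptions, and the artifact times are assumed to be already expressed as sample indices on the probe's time base.

```python
import numpy as np


def blank_artifacts(data, artifact_samples, fs=30000.0,
                    artifact_ms=3.0, baseline_ms=1.0):
    """Return a copy of `data` with each artifact window overwritten.

    data             : array of shape (n_samples, n_channels)
    artifact_samples : artifact onset times, as sample indices
    fs               : sampling rate in Hz (assumed 30 kHz here)
    artifact_ms      : artifact duration to replace (3 ms default)
    baseline_ms      : pre-artifact window used for the mean (assumption)
    """
    out = data.copy()
    n_samples = data.shape[0]
    n_art = int(round(artifact_ms * 1e-3 * fs))
    n_base = int(round(baseline_ms * 1e-3 * fs))

    for onset in np.asarray(artifact_samples, dtype=int):
        start = max(int(onset), 0)
        stop = min(start + n_art, n_samples)
        base_start = max(start - n_base, 0)
        # Skip onsets with no preceding data or outside the recording.
        if base_start == start or start >= stop:
            continue
        # Per-channel mean voltage just before the artifact.
        baseline = data[base_start:start].mean(axis=0)
        if np.issubdtype(data.dtype, np.integer):
            baseline = np.rint(baseline)
        # Broadcast the baseline over every sample of the artifact window.
        out[start:stop] = baseline.astype(data.dtype)
    return out
```

The baseline is read from the original array rather than the corrected copy, so closely spaced artifacts do not propagate already-replaced values.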

Execution time ⏱️: for a recording of ~1h with 4 probes inserted deep (~3mm) and saving the entire default bank 0, the entire pipeline takes about 12-24 hours on a local machine. This depends heavily on the recording itself. Spike sorting, CatGT and C_Waves take the longest.

Installation 🖥️

Setting up

  • You must have a GPU for spike sorting
  • You must have installed Kilosort, e.g. Kilosort2.0 (from here: https://github.com/jamesjun/Kilosort2), with a compatible MATLAB version (e.g. R2021b)
  • You must have installed CatGT, TPrime, C_Waves and OverStrike
  • You must have cloned npy-matlab and bombcell e.g. in users/Github/

Environments

  1. Install the provided ephys_utils conda environment:
  • conda env create -f environment.yml or conda create --name ephys_utils --file requirements.txt
  2. Install MATLAB (e.g. R2021b) and specify the MATLAB version to use when calling the MATLAB engine in Python:
  • In the MATLAB command window, type matlabroot to get the root path
  • In a terminal, go to <matlabroot>\extern\engines\python, then run python setup.py install
  • If that did not work, see: https://ch.mathworks.com/matlabcentral/answers/1998578-invalid-version-r2021-when-installing-for-python-3-7-3-9. That is, first run: python -m pip install --upgrade setuptools
  • Example for R2021b: run python -m pip install matlabengine==9.11.21
  • Note: if you cannot use the MATLAB engine to run Kilosort, run Kilosort separately in MATLAB directly, then continue with the remaining steps of this pipeline.
  3. Copy the file run_main_kilosort.m from this repo (in matlab) to the repo where you have installed Kilosort2, and update in that file:
  • path to kilosort folder
  • path to npy-matlab
  • path to config files
  4. Copy the file run_bombcell.m from this repo (in matlab) to the repo where you have installed bombcell, and update in that file:
  • path to bombcell folder
  • path to config files
  • Note: bombcell's main script has changed, so you need to adapt the script to the new version of bombcell; note also that some bombcell functions are commented out in this pipeline.
  5. Install Phy (optional, for data visualization)
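Before launching the pipeline, it can help to confirm that the MATLAB engine installed in the steps above is importable from the active Python environment. The sketch below is an assumption about how one might check this, not code from this repo; the helper names matlab_engine_available and start_engine_with_paths are hypothetical, and the folders passed to addpath are whatever Kilosort, npy-matlab and bombcell locations you use locally.

```python
def matlab_engine_available() -> bool:
    """Return True if the MATLAB engine for Python can be imported."""
    try:
        import matlab.engine  # noqa: F401
    except Exception:
        # Missing package, or an install that does not match MATLAB.
        return False
    return True


def start_engine_with_paths(paths):
    """Start a MATLAB session and add each folder to its search path.

    `paths` is a list of local folders (hypothetical examples: the
    Kilosort2, npy-matlab and bombcell clones).
    """
    import matlab.engine

    eng = matlab.engine.start_matlab()
    for folder in paths:
        # nargout=0 because addpath is called only for its side effect.
        eng.addpath(str(folder), nargout=0)
    return eng
```

If matlab_engine_available() returns False, the note above applies: run Kilosort directly in MATLAB and continue with the later steps. When the engine does start, call eng.quit() once the MATLAB work is done.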

Usage ⚡

The pipeline is split into two main scripts, with an optional inspection step in between:

  1. preprocess_spikesort.py: performs Steps 1 to 4 -> specify the raw data input folder path in lab server data/
  2. optionally, inspect spike sorting and curation results using Phy and Phy's environment
    • conda activate phy2
    • run phy template-gui params.py in the Kilosort output folder (note: edit params.py to point to the .ap.bin file if you want to see TraceView or single waveforms)
  3. preprocess_sync.py: performs Steps 5 to 7 -> specify the processed data input folder path in lab server analysis/FirstName_LastName/data
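The params.py edit mentioned for Phy can be done by hand, or scripted along the following lines. This is a minimal sketch that assumes params.py contains a dat_path = ... line, as Kilosort-generated files typically do; the function name set_dat_path and the file names in the usage comment are hypothetical.

```python
import re
from pathlib import Path


def set_dat_path(params_file, ap_bin_path):
    """Point the `dat_path` entry of a Phy params.py at a raw .ap.bin file.

    All other lines are kept unchanged. If no `dat_path` line exists,
    one is appended. Returns the line that was written.
    """
    params_file = Path(params_file)
    lines = params_file.read_text().splitlines()
    # repr() yields a valid Python string literal, including for
    # Windows paths that contain backslashes.
    new_line = f"dat_path = {str(ap_bin_path)!r}"

    found = False
    for i, line in enumerate(lines):
        if re.match(r"\s*dat_path\s*=", line):
            lines[i] = new_line
            found = True
    if not found:
        lines.append(new_line)

    params_file.write_text("\n".join(lines) + "\n")
    return new_line


# Hypothetical usage:
# set_dat_path("kilosort_output/params.py", "/data/rec_imec0.ap.bin")
```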

The output of this pipeline can then be used to create NWB files using the NWB_converter, in particular the ephys_to_nwb.py converter.

How to contribute ✨

  1. Let's discuss changes/fixes
  2. Make a branch, implement changes
  3. Make a pull request and ask a user to review it!
  4. Merge & inform other users 🙂

Possible future improvements (and ideas) 🗻

  • Adaptation/robustness for Neuropixels 2.0 probe specifications and metadata (although most tools do take care of different metadata files)
  • Kilosort 4.0 called from Python directly (if performance is judged satisfactory)
  • Integration of SpikeInterface tool(s)
  • More LFP analyses...?
  • etc.
