========================== Spike Sorting and Curation ========================== SpikeLab includes a spike-sorting pipeline in the ``spikelab.spike_sorting`` sub-package. It supports three sorting algorithms — Kilosort2 (Pachitariu et al. 2016), Kilosort4 (Pachitariu et al. 2024), and RT-Sort (Van der Molen et al. 2024) — behind a unified interface built on SpikeInterface (Buccino et al. 2020). The pipeline returns curated :class:`~spikelab.SpikeData` objects ready for downstream analysis. SpikeLab also provides standalone curation methods that can be used on any ``SpikeData`` object, whether it came from the sorting pipeline or from an external source. **References:** - Pachitariu, M., Steinmetz, N., Kadir, S., Carandini, M. & Harris, K. D. Kilosort: realtime spike-sorting for extracellular electrophysiology with hundreds of channels. *bioRxiv* (2016). - Pachitariu, M., Sridhar, S., Pennington, J. & Stringer, C. Spike sorting with Kilosort4. *Nature Methods* 21, 914--921 (2024). - Van der Molen, T., Lim, M., Bartram, J. et al. RT-Sort: An action potential propagation-based algorithm for real time spike detection and sorting with millisecond latencies. *PLoS ONE* 19, e0312438 (2024). - Buccino, A. P., Hurwitz, C. L., Garcia, S. et al. SpikeInterface, a unified framework for spike sorting. *eLife* 9, e61834 (2020). .. contents:: On this page :local: :depth: 2 Spike Sorting ------------- Prerequisites ^^^^^^^^^^^^^ The sorting pipeline requires external dependencies that are **not** installed with SpikeLab by default: - **Kilosort2** requires MATLAB (R2019b+) and the `Kilosort2 repository `_. A Docker variant is available that removes the MATLAB requirement. - **Kilosort4** is pure Python but requires ``torch`` and ``kilosort``. A Docker variant is also available. - **RT-Sort** requires ``torch`` for its neural-network spike detection model. For Maxwell Biosystems ``.h5`` files the HDF5 decompression plugin must also be installed; follow the instructions printed by the loader if the plugin is missing. Basic usage ^^^^^^^^^^^ The main entry point is :func:`~spikelab.spike_sorting.sort_recording`, which accepts a list of recording files and returns a list of :class:`~spikelab.SpikeData` objects: .. code-block:: python from spikelab.spike_sorting import sort_recording results = sort_recording( recording_files=["session1.raw.h5"], sorter="kilosort4", ) sd = results[0] print(sd.N, "units") print(sd.length / 1000, "seconds") The ``sorter`` parameter selects the algorithm: ``"kilosort2"``, ``"kilosort4"``, or ``"rt_sort"``. Configuration and presets ^^^^^^^^^^^^^^^^^^^^^^^^^ The pipeline is configured via a :class:`~spikelab.spike_sorting.config.SortingPipelineConfig` dataclass composed of sub-configs for recording I/O, sorting, curation, waveform extraction, and execution. Pre-built presets provide sensible defaults: .. code-block:: python from spikelab.spike_sorting.config import KILOSORT4 results = sort_recording( recording_files=["session1.raw.h5"], config=KILOSORT4, ) To customise a preset, use the ``override`` method: .. code-block:: python config = KILOSORT4.override( fr_min=0.1, # stricter minimum firing rate (Hz) snr_min=6.0, # stricter SNR threshold n_jobs=16, total_memory="32G", ) results = sort_recording(["session1.raw.h5"], config=config) Individual parameters can also be passed directly as keyword arguments to ``sort_recording``, which builds a config internally: .. code-block:: python results = sort_recording( recording_files=["session1.raw.h5"], sorter="kilosort2", kilosort_path="/opt/Kilosort2", fr_min=0.1, n_jobs=8, ) Multi-stream recordings ^^^^^^^^^^^^^^^^^^^^^^^ For multi-stream files (e.g. Maxwell multi-well ``.raw.h5``), use :func:`~spikelab.spike_sorting.sort_multistream`: .. code-block:: python from spikelab.spike_sorting import sort_multistream stream_results = sort_multistream( recording="multiwell.raw.h5", stream_ids=["well000", "well001"], sorter="kilosort4", ) for stream_id, sds in stream_results.items(): print(f"{stream_id}: {sds[0].N} units") This returns a dict mapping stream IDs to lists of ``SpikeData`` objects. Reusing intermediate results ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The pipeline caches intermediate files (binary recordings, sorting output, waveforms). Control which stages are re-run with the ``recompute_*`` flags: .. code-block:: python results = sort_recording( recording_files=["session1.raw.h5"], sorter="kilosort4", recompute_recording=False, # reuse existing binary recompute_sorting=True, # force re-sort reextract_waveforms=True, # force waveform re-extraction ) See the :doc:`API reference ` for the full configuration options. When the same recording is targeted by two sorts at once, the second raises :class:`~spikelab.spike_sorting._exceptions.ConcurrentSortError` thanks to a per-recording lock file in the intermediate folder; stale locks left behind by a crashed previous run are reclaimed automatically. Built-in safeguards ^^^^^^^^^^^^^^^^^^^ The pipeline runs a set of preflight checks before each sort and live resource watchdogs during it — for free disk, host RAM, GPU memory, sorter log inactivity, and kernel I/O stalls. These are on by default and rarely need attention: a successful sort produces no extra noise, and a failure surfaces as one of the classified exceptions described in :ref:`sort-failures` (e.g. :class:`~spikelab.spike_sorting._exceptions.HostMemoryWatchdogError`). After a successful sort, a human-readable ``sorting_report.md`` is written next to the results, summarising configuration, timings, curation outcome, and per-unit quality stats. A machine-readable ``recording_report.json`` is written alongside it. To tune thresholds (e.g. raise the GPU watchdog's abort percentage, disable the sleep blocker on a workstation that genuinely needs to suspend), pass overrides via the :class:`~spikelab.spike_sorting.config.ExecutionConfig` sub-config of ``SortingPipelineConfig`` — see the autodoc for the full list of knobs. Pipeline canary ^^^^^^^^^^^^^^^ For long sorts, you can opt-in to a short smoke test that runs the configured backend on the first N seconds of each recording before committing to the full sort. This catches "first-time" failures — broken Docker image, missing CUDA kernel, MEX compile errors — in seconds rather than hours. .. code-block:: python from spikelab.spike_sorting.config import KILOSORT4 config = KILOSORT4.override(canary_first_n_s=30.0) results = sort_recording(["session1.raw.h5"], config=config) A canary failure that maps to a :class:`~spikelab.spike_sorting._exceptions.SpikeSortingClassifiedError` short-circuits the full sort with that classified exception. Unexpected canary failures (e.g. canary OOM in a tiny window) are logged but **not** propagated — the live watchdogs handle resource-shaped issues during the real run. Recordings shorter than ``canary_first_n_s`` skip the canary. The canary is off by default (``canary_first_n_s=0.0``). .. _stim-artifact-removal: Stimulation Artifact Removal ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ When sorting recordings with electrical stimulation, stim artifacts must be removed before spike detection. SpikeLab provides :func:`~spikelab.spike_sorting.stim_sorting.artifact_removal.remove_stim_artifacts` for this purpose. Two methods are available: - **polynomial** (default) — fits a low-order polynomial to the post-saturation artifact tail and subtracts it, preserving neural spikes that ride on the artifact decay. Saturated samples are blanked (zeroed). - **blank** — zeros the entire artifact window. Simpler but discards any spikes within the window. Stim times logged by the hardware may not align exactly with the artifact in the recording. Use :func:`~spikelab.spike_sorting.stim_sorting.recentering.recenter_stim_times` to find the true artifact onset: .. code-block:: python from spikelab.spike_sorting.stim_sorting.recentering import recenter_stim_times from spikelab.spike_sorting.stim_sorting.artifact_removal import remove_stim_artifacts # Recenter stim times to the actual artifact onset corrected_times = recenter_stim_times( traces, # (channels, samples) raw voltage stim_times_ms, # logged stim times in ms fs_Hz=20000, peak_mode="down_edge", # alignment mode for biphasic pulses ) # Remove artifacts cleaned, blanked_mask = remove_stim_artifacts( traces, corrected_times, fs_Hz=20000, method="polynomial", artifact_window_ms=10.0, # max tail duration after desaturation poly_order=3, # cubic polynomial (default) ) The ``peak_mode`` parameter controls how each artifact is aligned: - ``"abs_max"`` — largest absolute voltage (monophasic pulses) - ``"down_edge"`` — positive-to-negative zero-crossing (biphasic anodic-first) - ``"up_edge"`` — negative-to-positive zero-crossing (biphasic cathodic-first) - ``"pos_peak"`` / ``"neg_peak"`` — largest positive or most negative voltage When multiple stim events occur in rapid succession (e.g. burst or paired-pulse protocols), the module automatically detects whether the signal has returned to baseline between events and extends the blanking region across the entire burst if needed. Multi-pulse alignment (``multi_peak``) """""""""""""""""""""""""""""""""""""" When a recentering window may contain several pulses (paired-pulse, short burst, high-frequency train), pass ``multi_peak=True`` to lock onto a specific pulse rather than the strongest one. Available on :func:`~spikelab.spike_sorting.stim_sorting.recentering.recenter_stim_times`, :func:`~spikelab.spike_sorting.stim_sorting.sort_stim_recording`, and :func:`~spikelab.spike_sorting.stim_sorting.preprocess_stim_artifacts`. Parameters: - ``multi_peak`` (default ``False``) — opt-in to multi-pulse-aware alignment. - ``multi_peak_select`` (default ``"first"``) — ``"first"`` for first-pulse alignment (typical for train PSTHs); ``"last"`` for after-train rebound. - ``multi_peak_threshold`` (default ``0.6``) — minimum amplitude as a fraction of the strongest peak in the search window for a peak to count as a real pulse. - ``multi_peak_min_separation_ms`` (default ``2.0``) — minimum spacing between candidate peaks; prevents a single broad pulse from being counted twice. .. code-block:: python corrected_times = recenter_stim_times( traces, stim_times_ms, fs_Hz=20000, peak_mode="down_edge", multi_peak=True, multi_peak_select="first", multi_peak_threshold=0.6, multi_peak_min_separation_ms=2.0, ) The polynomial detrend approach is related to SALPA, adapted for offline use: - Wagenaar, D. A. & Potter, S. M. Real-time multi-channel stimulus artifact suppression by local curve fitting. *J Neurosci Methods* 120, 113--120 (2002). Unit Curation ------------- SpikeLab provides curation methods that filter units by quality metrics. These work on any :class:`~spikelab.SpikeData` object — not just output from the sorting pipeline. Each curation method returns a tuple ``(sd_curated, result_dict)`` where ``result_dict`` contains: - ``"metric"`` — ``np.ndarray (N,)`` with the per-unit metric for **all** original units. - ``"passed"`` — ``np.ndarray (N,)`` boolean mask of units that passed. Individual criteria ^^^^^^^^^^^^^^^^^^^ Apply a single quality criterion at a time: .. code-block:: python # Remove units with fewer than 50 spikes sd_curated, res = sd.curate_by_min_spikes(min_spikes=50) # Remove units below 0.1 Hz firing rate sd_curated, res = sd.curate_by_firing_rate(min_rate_hz=0.1) # Remove units with > 1% ISI violations sd_curated, res = sd.curate_by_isi_violations( max_violation=1.0, threshold_ms=1.5, ) # Remove units with low SNR sd_curated, res = sd.curate_by_snr(min_snr=5.0) # Remove units with inconsistent waveforms sd_curated, res = sd.curate_by_std_norm(max_std_norm=1.0) SNR and waveform consistency (``curate_by_snr``, ``curate_by_std_norm``) require that the ``SpikeData`` object has ``raw_data`` attached. If the metrics have not been pre-computed, call :meth:`~spikelab.SpikeData.compute_waveform_metrics` first: .. code-block:: python sd, metrics = sd.compute_waveform_metrics() sd_curated, res = sd.curate_by_snr(min_snr=5.0) Merge-based deduplication ^^^^^^^^^^^^^^^^^^^^^^^^^ When spike sorting produces duplicate units for the same neuron recorded on adjacent electrodes, you can merge them using :meth:`~spikelab.SpikeData.curate_by_merge_duplicates`. This finds spatially nearby unit pairs, filters by ISI violation rate and waveform cosine similarity, then greedily merges accepted pairs: .. code-block:: python sd_merged, res = sd.curate_by_merge_duplicates( dist_um=24.8, # max inter-electrode distance in um cosine_threshold=0.5, # min waveform similarity max_isi_increase=0.04, # reject merge if ISI violations rise too much ) n_absorbed = (~res["passed"]).sum() print(f"Merged {n_absorbed} duplicate units") This requires ``neuron_attributes`` with position and ``avg_waveform`` entries, which are set automatically when the ``SpikeData`` comes from the sorting pipeline. Batch curation ^^^^^^^^^^^^^^ To apply multiple criteria in one call, use :meth:`~spikelab.SpikeData.curate`. Only criteria with non-``None`` values are applied: .. code-block:: python sd_curated, results = sd.curate( min_spikes=50, min_rate_hz=0.1, isi_max=1.0, min_snr=5.0, ) # results contains per-criterion entries for criterion, res in results.items(): n_removed = (~res["passed"]).sum() print(f"{criterion}: removed {n_removed} units") Curation history ^^^^^^^^^^^^^^^^ For reproducibility, build a serialisable record of what was removed and why: .. code-block:: python history = sd.build_curation_history( sd_original=sd_raw, sd_curated=sd_curated, results=results, ) The returned dict is JSON-serialisable and can be stored in a workspace or saved alongside the curated data. Visualising sorted units ^^^^^^^^^^^^^^^^^^^^^^^^ To inspect the spatial waveform footprint of one or more sorted units across the channel grid, use :func:`~spikelab.spikedata.plot_utils.plot_unit_footprints` (or the :meth:`~spikelab.SpikeData.plot_unit_footprints` convenience method on ``SpikeData``). Channels whose per-channel peak-to-peak amplitude exceeds a configurable threshold are drawn at their (x, y) positions, with the primary channel highlighted; each unit gets its own subplot in an auto-sized grid. Useful for sanity-checking footprint locality and spotting duplicated/over-merged units after sorting. Sorting Concatenated Recordings -------------------------------- When a directory containing multiple recording files is passed to ``sort_recording``, the pipeline concatenates them into a single recording for sorting. The returned ``SpikeData`` objects are automatically split back into per-recording epochs. When a list of recording paths is passed instead, each file is processed sequentially without concatenation. If you need to re-split a concatenated ``SpikeData`` manually (e.g. after loading a saved pickle that was not yet split), use :meth:`~spikelab.SpikeData.split_epochs`. This requires ``rec_chunks_ms`` in ``metadata`` (set automatically by the sorting pipeline) and time-shifts each epoch to start at t=0: .. code-block:: python epoch_sds = sd.split_epochs() for i, epoch_sd in enumerate(epoch_sds): print(f"Epoch {i}: {epoch_sd.N} units, {epoch_sd.length:.0f} ms") .. _sort-failures: Handling Sort Failures ---------------------- When a sorting run fails, SpikeLab can classify the failure into one of three categories so that batch scripts can implement skip/retry/stop policies without parsing generic error messages. The three categories are: - **BiologicalSortFailure** -- the recording itself cannot be sorted (too silent, all channels bad, no waveforms to compute metrics on). Recommended policy: mark the target as not-sortable and move on. - **EnvironmentSortFailure** -- the host environment or container runtime is misconfigured (Docker down, HDF5 plugin missing). Recommended policy: stop and fix the environment. - **ResourceSortFailure** -- the job exhausted a machine resource (GPU out of memory). Recommended policy: retry with reduced parameters. All three inherit from :class:`~spikelab.spike_sorting._exceptions.SpikeSortingClassifiedError`, which in turn inherits from ``RuntimeError``. Each sorter has a dedicated classifier that inspects both sorter logs and exception chains to identify known failure signatures: - ``classify_ks2_failure`` — Kilosort2 - ``classify_ks4_failure`` — Kilosort4 - ``classify_rt_sort_failure`` — RT-Sort .. code-block:: python from spikelab.spike_sorting._classifier import ( classify_ks2_failure, classify_ks4_failure, classify_rt_sort_failure, ) from spikelab.spike_sorting._exceptions import ( BiologicalSortFailure, EnvironmentSortFailure, ResourceSortFailure, ) # Pick the classifier matching your sorter classify = classify_ks4_failure # or classify_ks2_failure, classify_rt_sort_failure try: results = sort_recording(["session1.raw.h5"], sorter="kilosort4") except Exception as exc: classified = classify(output_folder, exc) if classified is not None: if isinstance(classified, BiologicalSortFailure): print(f"Skipping (biology): {classified}") elif isinstance(classified, EnvironmentSortFailure): raise # stop the batch elif isinstance(classified, ResourceSortFailure): print(f"Retry with smaller batch: {classified}") else: raise # unrecognised failure Specific exception classes carry diagnostic attributes. For example, :class:`~spikelab.spike_sorting._exceptions.InsufficientActivityError` exposes ``threshold_crossings``, ``units_at_failure``, and ``nspks_at_failure`` parsed from sorter logs. RT-Sort additionally raises :class:`~spikelab.spike_sorting._exceptions.ModelLoadingError` when the detection model cannot be loaded. See the :doc:`API reference ` for the full exception hierarchy. Watchdog and lock failures ^^^^^^^^^^^^^^^^^^^^^^^^^^ The classified-error hierarchy includes additional subclasses raised by the resource watchdogs and the per-recording sort lock. They surface through ``sort_recording`` like any other classified failure and carry a ``RecordingResult.status`` string for batch scripts to dispatch on: .. list-table:: :header-rows: 1 :widths: 30 20 50 * - Exception - ``status`` - Recommended remediation * - :class:`~spikelab.spike_sorting._exceptions.HostMemoryWatchdogError` - ``oom_host_ram`` - Reduce ``n_jobs`` / ``total_memory``, or raise ``host_ram_abort_pct``. * - :class:`~spikelab.spike_sorting._exceptions.GpuMemoryWatchdogError` - ``oom_gpu`` - Halve the sorter's per-batch knob (``NT`` for KS2, ``batch_size`` for KS4, ``num_processes`` for RT-Sort) and retry. The pipeline does this automatically up to ``oom_retry_max`` times. * - :class:`~spikelab.spike_sorting._exceptions.GpuThermalWatchdogError` - ``gpu_thermal`` - GPU temperature crossed the abort threshold. Pause until the device cools; check airflow, ambient temperature, and heatsink dust. A persistent trip indicates a cooling failure. * - :class:`~spikelab.spike_sorting._exceptions.SorterTimeoutError` - ``sorter_timeout`` - Sorter log went silent past the inactivity tolerance. Raise ``sorter_inactivity_max_s`` for unusually long preprocessing, or investigate a hung subprocess. * - :class:`~spikelab.spike_sorting._exceptions.DiskExhaustionError` - ``disk_exhausted`` - Free space on the intermediate / results volume. Inspect ``disk_exhaustion_report.json`` next to the results. * - :class:`~spikelab.spike_sorting._exceptions.IOStallError` - ``io_stall`` - The kernel I/O byte counters stopped advancing. Usually a hung network share or failing disk; investigate before retrying. * - :class:`~spikelab.spike_sorting._exceptions.ConcurrentSortError` - ``concurrent_sort`` - Another process is already sorting this recording (per-recording lock file). Wait for it to finish, or remove the stale lock if the previous sort crashed.