Sort Spikes

This analysis groups spikes into clusters based on the similarity of their shapes.

The clustering algorithm in this analysis is similar to the one used in the clustering step of the SpyKING CIRCUS spike sorting toolbox (see Reference below).

Parameters

Parameter	Description
Percent of Waveforms in Neighborhood	Percent of waveforms to use when calculating the density of data points around each waveform in the PCA projection space. See Algorithm below.
Maximum Initial Number of Clusters	Upper limit on the number of clusters identified in the initial clustering step. Similar clusters are merged later.
Cluster Merge Threshold	Threshold to be used when merging clusters. See Algorithm below.
Confidence Level for Outliers	Confidence level (in percent) for detecting outliers. See Algorithm below.
Create Neurons	An option to create neuron variables for each cluster (waveform variables for each cluster are always created).

Summary of Numerical Results

The following information is available in the Summary of Numerical Results:

Column	Description
Variable	Variable name.
XMin	X axis minimum in the PCA projection space.
XMax	X axis maximum in the PCA projection space.
YMin	Y axis minimum in the PCA projection space.
YMax	Y axis maximum in the PCA projection space.

Algorithm

The program selects the waveforms that fall within the specified time range and interval filter.

Principal components are calculated using the N selected waveforms of the given waveform variable.

First, the matrix of covariances between waveform points (c[t, s]) is calculated:

c[t, s] = covariance between vectors waveform_value[t, *] and waveform_value[s, *], where s, t = 1, ..., number_of_points_in_each_waveform.

Then, the eigenvalues and eigenvectors are calculated for the matrix c[t, s]. The eigenvectors (principal components) are sorted according to their eigenvalues. The first principal component has the largest eigenvalue.

The analysis graph shows a scatter plot in which x and y are the projections of the selected waveforms onto the first two principal components (a projection is the sum of the products waveform_value[t]*principal_component_value[t] over all t).

The points in the PCA projection space are then used for cluster analysis.
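The PCA steps above can be sketched in Python with NumPy. This is only an illustration, not the program's actual implementation. The function name pca_projections and the (N, T) array layout are assumptions. The sketch follows the projection formula in the text literally, so waveforms are not mean-centered before projection; whether the program centers them is not stated.

```python
import numpy as np

def pca_projections(waveforms, n_components=2):
    """Project waveforms onto their leading principal components.

    waveforms: array of shape (N, T), one waveform per row (assumed layout).
    Returns the (N, n_components) projections and all eigenvalues,
    sorted from largest to smallest.
    """
    waveforms = np.asarray(waveforms, dtype=float)
    # c[t, s]: covariance between waveform points t and s (T x T matrix).
    c = np.cov(waveforms, rowvar=False)
    # eigh is appropriate because a covariance matrix is symmetric;
    # it returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(c)
    order = np.argsort(eigenvalues)[::-1]
    components = eigenvectors[:, order[:n_components]]
    # Projection = sum over t of waveform_value[t] * component_value[t].
    projections = waveforms @ components
    return projections, eigenvalues[order]
```

Columns 0 and 1 of the returned projections correspond to the x and y coordinates of the scatter plot.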

For each point, the mean distance R to the nearest S points is calculated, where

S = Number_of_waveforms * Percent_of_Waveforms_in_Neighborhood/100

Then, for each data point, the distance D to the nearest point with a lower R (that is, a higher density) is calculated.
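A minimal sketch of the R and D calculations follows. Several details are assumptions not stated in the text: Euclidean distance, rounding S to the nearest integer (and keeping it between 1 and N-1), excluding the point itself from its own neighborhood, and assigning the densest point (which has no denser neighbor) the largest distance to any other point. The function name is hypothetical.

```python
import numpy as np

def local_density_and_separation(points, percent_in_neighborhood):
    """Return (R, D) for points of shape (N, 2) in the PCA projection space."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # S = Number_of_waveforms * Percent_of_Waveforms_in_Neighborhood / 100
    s = int(round(n * percent_in_neighborhood / 100.0))
    s = max(1, min(s, n - 1))  # assumed clamping
    # Full pairwise distance matrix: O(N^2) memory, fine for a sketch.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Column 0 of each sorted row is the zero distance to the point itself.
    sorted_dist = np.sort(dist, axis=1)
    R = sorted_dist[:, 1:s + 1].mean(axis=1)
    D = np.empty(n)
    for i in range(n):
        denser = R < R[i]  # points with lower R, i.e. higher density
        if denser.any():
            D[i] = dist[i, denser].min()
        else:
            D[i] = dist[i].max()  # assumed convention for the densest point
    return R, D
```

A small R marks a point in a dense region, and a large D marks a point that is far from any denser point.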

The intuition behind the algorithm is that cluster centroids should be points with a high density (i.e. low R) that are far from any point with a higher density (high D).

The M points (where M = Maximum_Initial_Number_of_Clusters) with the highest ratios D/R are taken as the initial cluster centroids. Each remaining point is then assigned to the same cluster as the closest point with a higher density (lower R).
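The centroid selection and assignment step might look like the sketch below, which takes precomputed R and D arrays as input. The function name, the Euclidean distance, the tie handling (points with equal R are ordered by index), and the fallback for a densest point that is not a centroid are all assumptions made for illustration.

```python
import numpy as np

def assign_clusters(points, R, D, max_clusters):
    """Return (labels, centroid_indices) for the initial clustering."""
    points = np.asarray(points, dtype=float)
    R = np.asarray(R, dtype=float)
    D = np.asarray(D, dtype=float)
    # Guard against division by zero when duplicate points give R == 0.
    ratio = D / np.maximum(R, np.finfo(float).tiny)
    # The M points with the highest D/R become the initial centroids.
    centroids = np.argsort(ratio)[::-1][:max_clusters]
    labels = np.full(len(points), -1)
    labels[centroids] = np.arange(len(centroids))
    # Visit points from densest (lowest R) to sparsest, so that every
    # denser point already has a label when it is needed.
    order = np.argsort(R, kind="stable")
    for rank, i in enumerate(order):
        if labels[i] >= 0:
            continue  # centroids keep their own label
        # Candidates: denser points, or the centroids if none exist.
        candidates = order[:rank] if rank > 0 else centroids
        d = np.linalg.norm(points[candidates] - points[i], axis=1)
        labels[i] = labels[candidates[np.argmin(d)]]
    return labels, centroids
```

Processing points in order of decreasing density lets each label propagate outward from a centroid in a single pass.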

Normalized distances Gamma between clusters are calculated according to equation (2) of the publication describing the SpyKING CIRCUS algorithm (see Reference below). Pairs of clusters with Gamma less than Cluster_Merge_Threshold are merged.

For each cluster, the Confidence_Level_for_Outliers percentile P of the R values of all the points in the cluster is estimated using the bootstrap. Data points with R values exceeding P are marked as outliers, and the waveforms corresponding to outliers are assigned as unsorted.
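The outlier step can be illustrated as follows. The text does not say how the bootstrap estimate is formed, so this sketch assumes one common approach: resample each cluster's R values with replacement, compute the percentile of each resample, and average the results. The function name and the n_bootstrap and seed parameters are hypothetical.

```python
import numpy as np

def mark_outliers(R, labels, confidence_level, n_bootstrap=1000, seed=0):
    """Return a boolean array that is True for points marked as outliers.

    confidence_level is in percent, e.g. 95.
    """
    R = np.asarray(R, dtype=float)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    outlier = np.zeros(len(R), dtype=bool)
    for cluster in np.unique(labels):
        idx = np.flatnonzero(labels == cluster)
        r = R[idx]
        # Bootstrap: resample the cluster's R values with replacement.
        samples = rng.choice(r, size=(n_bootstrap, len(r)), replace=True)
        # Assumed estimator: mean of the per-resample percentiles.
        p = np.percentile(samples, confidence_level, axis=1).mean()
        # Points in sparse regions (R above P) are treated as outliers.
        outlier[idx] = r > p
    return outlier
```

Waveforms whose entry in the returned array is True would be left unsorted.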

Reference

Pierre Yger et al. A spike sorting toolbox for up to thousands of electrodes validated with ground truth recordings in vitro and in vivo. eLife 2018 Mar 20;7:e34518.