1 Introduction: Brain as a Source of Big Data

In the age of the Internet of Things (IoT), with millions of interconnected sensors spewing out data, we are facing a data deluge, and there is a need for solutions to store and process this data. A unique set of IoT applications relates to the human body, in particular wearables and implantables that collect data from the human brain for neuroscientific research, prostheses or medical interventions [1–6]. The study of the human brain is one of the most important frontiers in scientific research today, with several billion-dollar efforts worldwide to understand more about the brain [7, 8]. To get an idea of the scale of data generated by the brain, we first note from anatomy that the average adult human cortex has approximately \(10^{11}\) neurons, widely regarded as the fundamental computational units of the brain, with \(10^{14}\) synapses or interconnections [9]. Assuming average cortical firing rates of 1–10 Hz [10] (a neural firing or discharge refers to a digital-like pulse, also called a spike or action potential), the human brain generates at least \(10^{11}\) spikes or events per second and about \(10^{14}\) synaptic operations per second. Assigning a unique address or identifier to each neuron would need \(b_{\mathrm{addr}} = \lceil \log_2(10^{11}) \rceil = 37\) bits; rounding this down to about 35 bits for an order-of-magnitude estimate, the data rate generated by the brain is a whopping 3.5 Terabits/second. To put this in perspective, the exponential growth of data has put internet data in the exascale (\(10^{18}\) bits). At 3.5 Terabits/second, one human brain generates the same amount of data in about \(3 \times 10^{5}\) s, or a little over 3 days! Of course, this is an extreme case: we are not aiming to store all the neural firings of a human brain over a lifetime (at least not at this moment), and neither do we currently possess the technology to access this data (although recording from ever more neurons is one of the prime goals of the BRAIN Initiative). It nevertheless helps to give an idea of the scale of the problem. Figure 14.1 shows the rapid scaling of data generated from a single human brain over time.
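
The back-of-the-envelope estimate above can be reproduced with a few lines of Python. This is only a sketch using the figures quoted in the text (\(10^{11}\) neurons, the lower firing rate of 1 Hz, \(10^{18}\) bits of internet data); the function and constant names are illustrative. It also shows that the exact address width is 37 bits, slightly more than the rounded 35 bits behind the 3.5 Terabits/second figure.

```python
import math

N_NEURONS = 1e11       # cortical neurons, as quoted in the text
FIRING_RATE_HZ = 1.0   # lower end of the 1-10 Hz range
INTERNET_BITS = 1e18   # "exascale" figure quoted in the text


def brain_data_rate(bits_per_event):
    """Bits per second if every spike is tagged with a neuron address."""
    return N_NEURONS * FIRING_RATE_HZ * bits_per_event


exact_bits = math.ceil(math.log2(N_NEURONS))  # exact address width
rate_text = brain_data_rate(35)               # rounded figure used in the text
rate_exact = brain_data_rate(exact_bits)      # with the exact address width

seconds_to_exascale = INTERNET_BITS / rate_text
days_to_exascale = seconds_to_exascale / 86400.0

print(f"address bits: {exact_bits}")
print(f"data rate: {rate_text / 1e12:.1f} to {rate_exact / 1e12:.1f} Tbit/s")
print(f"time to reach 1e18 bits: {days_to_exascale:.1f} days")
```

Using the upper firing rate of 10 Hz instead would raise the rate tenfold and shorten the time accordingly.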

Fig. 14.1 The brain as a source of big data: a single human brain generates data at a rate of 3.5 Terabits/second. The total data can reach exabyte scale within a year

Just like any other big-data application, the data generated by the brain poses problems of storage and manipulation. An added problem in this case stems from the strict power dissipation requirement on electronics implanted within the brain to collect the data. Any electronics in contact with cortical tissue must not dissipate a heat flux above 80 mW/cm² [11, 12], which keeps the temperature rise below 1 °C and avoids damaging the neural tissue. Instead of implants, another option is to collect data non-invasively through EEG from the scalp. However, EEG provides a highly filtered (both spatially and temporally) picture of brain activity and is not informative enough for tasks with many degrees of freedom, such as control of upper limb prostheses [13, 14]. Therefore, in the rest of this chapter, we only consider neural recording from implanted electrodes, which can provide enough information for dexterous motor control.

2 The Nature of Neural Data

The signals recorded by neural implants are typically obtained through microelectrode arrays such as the Utah or Michigan arrays [15–17]. The neural signals can be broadly divided into two categories: (1) Local Field Potentials (LFP), which are 1–10 mV in amplitude, occupy a bandwidth of 1–100 Hz and are produced by the combined activity of groups of neurons, and (2) neural spikes or action potentials, which are much smaller (10–100 μV in amplitude) but occupy a much larger bandwidth of ≈ 0.2–5 kHz. While both signals carry useful information [18, 19], most studies on neural prosthetics that require fine motor manipulation use neural spikes [20–23]. In this chapter, we will therefore focus on neural recording systems for sensing and transmitting neural spikes. Unlike LFP signals, where the amplitude is informative, spikes are believed to behave like digital signals [24]: the amplitude is non-informative, but the timing and firing rate of spikes are important. An example of a spike recorded from the pre-frontal cortex of a rat is shown in Fig. 14.2.

Fig. 14.2 A neural spike recorded from the pre-frontal cortex of a rat. Neural spikes typically have a small amplitude of ≈ 10–100 μV while occupying a large bandwidth of ≈ 0.2–5 kHz

3 System Architectures for Neural Spike Recording Systems: Neuromorphic Compression Schemes

The different blocks comprising a typical neural recording system are shown in Fig. 14.3a. In a typical system, the neural signal is amplified by a low-noise amplifier (LNA) [25–29], followed by an optional variable gain amplifier (VGA) and finally an analog-to-digital converter (ADC) [29–32] before being transmitted wirelessly. We can estimate the data rate for such a system under some mild assumptions. Denoting the number of recording channels as \(N_{\mathrm{chan}}\), and the ADC sampling rate and bit resolution as \(f_{\mathrm{ADC}}\) and \(b_{\mathrm{ADC}}\), respectively, the data rate \(R_{\mathrm{typ}}\) of a typical neural implant is given by:

$$\displaystyle{ R_{\mathrm{typ}} = N_{\mathrm{chan}} \times f_{\mathrm{ADC}} \times b_{\mathrm{ADC}} }$$
(14.1)

As an example, for moderate values of \(N_{\mathrm{chan}} = 100\), \(f_{\mathrm{ADC}} = 20\) kHz, and \(b_{\mathrm{ADC}} = 10\) bits, we get \(R_{\mathrm{typ}} = 20\) Mbps. This is a huge data rate that will drain an implant’s battery in a matter of hours, given typical energy requirements of ≈ 50–1000 pJ/bit for wireless transmitters [33–36]. Hence, it is imperative to compress the data and reduce the concomitant power dissipation so that the neural recording system can be scaled in future to thousands or millions of channels. One possible way to do this is to take inspiration from the brain: in the absence of the implant, the brain would have processed the thousands of neural spikes recorded by the implant and given a refined command to the next region. Similarly, we can use electronics to perform this signal processing on the implant, thus reducing the bandwidth of data to be transmitted. Figure 14.3 shows three different modes of compression based on the amount of signal processing kept on the implant. There is a trade-off between the extra area and energy expended on in-implant signal processing and the energy saved through reduced transmission. Clearly, it is not beneficial if the added signal processing circuits burn as much energy as is saved by the reduced data rate!
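
To make the cost of raw transmission concrete, the sketch below evaluates Eq. (14.1) for the example values and multiplies the result by the quoted 50–1000 pJ/bit transmitter energy. The function names are illustrative; only the numbers come from the text.

```python
def typical_data_rate(n_chan, f_adc_hz, b_adc_bits):
    """Eq. (14.1): raw data rate in bits per second."""
    return n_chan * f_adc_hz * b_adc_bits


def transmit_power_w(rate_bps, energy_per_bit_j):
    """Average transmitter power for a given data rate and energy per bit."""
    return rate_bps * energy_per_bit_j


r_typ = typical_data_rate(100, 20_000, 10)   # bits per second
p_low = transmit_power_w(r_typ, 50e-12)      # best case, 50 pJ/bit
p_high = transmit_power_w(r_typ, 1000e-12)   # worst case, 1000 pJ/bit

print(f"R_typ = {r_typ / 1e6:.0f} Mbps")
print(f"transmit power = {p_low * 1e3:.0f} to {p_high * 1e3:.0f} mW")
```

Under these figures the transmitter alone would consume between 1 mW and 20 mW, which illustrates why on-implant compression is attractive.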

Fig. 14.3 Block diagram of a typical neural recording system, which senses, digitizes and wirelessly transmits the neural data. As an alternative to sending raw data, different neuromorphic schemes may be used, as shown, to achieve different rates of compression. (a) Typical. (b) Mode 1. (c) Mode 2. (d) Mode 3

One way to perform the processing at very low energy/area overheads is to use neuro-inspired analog circuits, sometimes also called ‘neuromorphic’ circuits following Carver Mead’s seminal paper [37]. Mead and others [38] have shown that analog circuits require less energy and area than digital counterparts when processing signals at a low resolution, typically ≤ 8 bits. The brain uses a similar principle, computing with analog quantities such as charge, currents and ionic concentrations, and this is cited as one of the reasons for its power efficiency. Analog computation is hence well suited for processing noisy sensory signals, where precision is limited by the input signal-to-noise ratio. In the rest of the chapter, we will explore several such schemes to compress neural recording data by extracting information from it.

3.1 Compression Mode 1: Spike Detection

The first scheme is inspired by a communication protocol used in neuromorphic chips. Several neuromorphic sensors and neural networks have been designed using brain-inspired analog processing principles [39–44], while noise-robust digital pulses are used for communication [45, 46]. Since digital communication is much faster (∼10 Gbps) than the average firing rate of a neuron (∼10 Hz), the firing information of multiple neurons can be multiplexed on the same serial bus, where the identity of the source neuron is encoded in a simultaneously transmitted digital address. This protocol is referred to as Address Event Representation (AER) and allows neuromorphic spiking chips to communicate data from N neurons using only \(\lceil \log_2 N \rceil\) wires.
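
The idea behind AER can be illustrated in software. The following is a minimal sketch, not a model of any particular chip: spikes from several neurons are merged into one time-ordered stream of (time, address) events, and the receiver recovers the individual spike trains from the addresses. All function names are hypothetical.

```python
import math


def address_bits(n_neurons):
    """Number of address wires needed to identify one of n_neurons sources."""
    return max(1, math.ceil(math.log2(n_neurons)))


def aer_encode(spike_times):
    """Merge per-neuron spike times into one time-sorted event stream.

    spike_times maps a neuron id to a list of spike times in seconds.
    Each event is a (time, address) pair.
    """
    events = [(t, nid) for nid, times in spike_times.items() for t in times]
    events.sort()
    return events


def aer_decode(events, n_neurons):
    """Rebuild the per-neuron spike trains from the shared event stream."""
    trains = {nid: [] for nid in range(n_neurons)}
    for t, nid in events:
        trains[nid].append(t)
    return trains


spikes = {0: [0.010, 0.120], 1: [0.050], 2: []}
events = aer_encode(spikes)
recovered = aer_decode(events, n_neurons=3)
print(events)
print(address_bits(100), "address bits for 100 sources")
```

Because a silent neuron contributes no events, the bus traffic scales with the number of spikes rather than with the number of neurons.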

The AER scheme can be adopted for neural implants as well since, in many cases, we are only interested in knowing the occurrence of spikes. In that case, circuits are needed to distinguish spikes from background noise; these are called spike detectors. Figure 14.3b denotes this scheme as Mode 1, with three possible variants. The earliest such detectors are based on simple thresholding circuits [24], where it is assumed that the amplitude of the spike is larger than the background noise by a certain amount. A feedback loop is used to track the baseline noise level, and the spike detection threshold is set to a multiple of this value. However, this method was found to produce many false positives in noisy conditions, and hence an improved detection method using a non-linear energy operator (NEO) has been proposed. The NEO is defined as:

$$\displaystyle{ \mathrm{NEO}(V ) = \left (\frac{dV } {dt} \right )^{2} -\frac{d^{2}V } {dt^{2}} \cdot V }$$
(14.2)

Several analog implementations of the NEO scheme have been reported [47–50], and an example of spike detection waveforms from the implementation in [47] is shown in Fig. 14.4. We refer to this method as Mode 1-A.
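
For intuition, Eq. (14.2) can be approximated on sampled data. The sketch below uses the standard discrete-time form of the operator, \(\psi[n] = x[n]^2 - x[n-1]\,x[n+1]\), and a threshold set to a multiple of the mean NEO output. The threshold rule and the scale factor of 8 are assumptions made for this example; the analog implementations cited above operate in continuous time.

```python
def neo(samples):
    """Discrete-time NEO: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The first and last samples have no two-sided neighbours and are set to 0.
    """
    psi = [0.0] * len(samples)
    for n in range(1, len(samples) - 1):
        psi[n] = samples[n] ** 2 - samples[n - 1] * samples[n + 1]
    return psi


def detect_spikes(samples, scale=8.0):
    """Return sample indices where the NEO output exceeds scale * mean(NEO)."""
    psi = neo(samples)
    threshold = scale * sum(psi) / len(psi)
    return [n for n, value in enumerate(psi) if value > threshold]


# Toy signal: alternating +/-1 "noise" with a spike centred on sample 10.
signal = [1.0 if n % 2 == 0 else -1.0 for n in range(20)]
signal[9] += 20.0
signal[10] += 50.0
signal[11] += 20.0

print(detect_spikes(signal))
```

The operator emphasises samples that are both large and rapidly changing, which is why it tends to separate spikes from slowly varying or low-amplitude background activity.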

Fig. 14.4 Input noisy neural signal and corresponding digital spike detection output from the implementation in [47]. Only the detection result can be transmitted, thus eliminating background data

The spike detection method discards all information about the amplitude and shape of the neural spike. This information may, however, be useful at a later stage to decide the identity of the source neuron. Hence, two other variants of the previously mentioned detection scheme have been commonly used. In some cases [51, 52], the authors use a regular spike detector to trigger the capture of a pre-defined number of samples of the neural spike signal so that all the features of the wave shape are retained for future extraction. We refer to this method as Mode 1-B. The other prevalent approach is to extract the relevant features (such as maximum, minimum, temporal width, derivative extrema) from the neural spike waveform when triggered by the spike detector [36, 48, 53–56]. Only these features are then digitized and transmitted, providing a good trade-off between data reduction and signal information retention. We refer to this as Mode 1-C.

We can now derive the data rates \(R_{1-A}\), \(R_{1-B}\), and \(R_{1-C}\) required by each of the compression schemes. Denoting the number of biological neurons recorded by the sensor as \(N_{\mathrm{neu}}\) (different from \(N_{\mathrm{chan}}\)) and the firing rate of each neuron as \(f_{\mathrm{bio}}\), we can write the equations as:

$$\displaystyle{ R_{1-A} = N_{\mathrm{neu}} \times f_{\mathrm{bio}} \times \left \lceil \mathrm{log}_{2}(N_{\mathrm{chan}})\right \rceil }$$
(14.3)
$$\displaystyle{ R_{1-B} = N_{\mathrm{neu}} \times f_{\mathrm{bio}} \times f_{\mathrm{ADC}} \times b_{\mathrm{ADC}} \times t_{\mathrm{spk}} }$$
(14.4)
$$\displaystyle{ R_{1-C} = N_{\mathrm{neu}} \times f_{\mathrm{bio}} \times N_{f} \times b_{\mathrm{ADC}} }$$
(14.5)

where \(t_{\mathrm{spk}}\) denotes the time span of the neural signal transmitted per spike in Mode 1-B, \(N_{f}\) denotes the number of features extracted in Mode 1-C and the other variables have the same meaning as defined earlier. We can estimate the degree of compression by assuming some nominal values of the parameters: \(N_{\mathrm{neu}} = 200\), \(f_{\mathrm{bio}} = 10\) Hz, \(t_{\mathrm{spk}} = 3\) ms, \(N_{f} = 4\), \(N_{\mathrm{chan}} = 100\), \(f_{\mathrm{ADC}} = 20\) kHz and \(b_{\mathrm{ADC}} = 10\) bits. Then the three data rates become \(R_{1-A} = 14\) kbps, \(R_{1-B} = 1.2\) Mbps (60 samples of 10 bits for each of the 2000 spikes per second) and \(R_{1-C} = 80\) kbps. Compared to the typical data rate of 20 Mbps, these modes offer compression factors of ≈ 1400×, ≈ 17× and 250×, respectively.
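
Equations (14.3)–(14.5) are easy to check numerically. The sketch below evaluates them literally with the nominal parameter values listed above; the function names are illustrative. Evaluated this way, Eq. (14.4) gives about 1.2 Mbps for Mode 1-B.

```python
import math


def rate_mode_1a(n_neu, f_bio, n_chan):
    """Eq. (14.3): one channel address per detected spike."""
    return n_neu * f_bio * math.ceil(math.log2(n_chan))


def rate_mode_1b(n_neu, f_bio, f_adc, b_adc, t_spk):
    """Eq. (14.4): a digitized waveform snippet of length t_spk per spike."""
    return n_neu * f_bio * f_adc * b_adc * t_spk


def rate_mode_1c(n_neu, f_bio, n_f, b_adc):
    """Eq. (14.5): n_f digitized features per spike."""
    return n_neu * f_bio * n_f * b_adc


r_typ = 100 * 20_000 * 10                       # Eq. (14.1), 20 Mbps
r_1a = rate_mode_1a(200, 10, 100)
r_1b = rate_mode_1b(200, 10, 20_000, 10, 3e-3)
r_1c = rate_mode_1c(200, 10, 4, 10)

for name, rate in (("1-A", r_1a), ("1-B", r_1b), ("1-C", r_1c)):
    print(f"Mode {name}: {rate / 1e3:.0f} kbps, compression {r_typ / rate:.0f}x")
```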

3.2 Compression Mode 2: Spike Sorting

The next possible scenario for compression is to use the features of the spike waveform to separate or classify each different wave shape into its own category, representing a different source neuron. This method of assigning a unique identifier to each distinct neural spike shape recorded on the same channel is called ‘spike sorting’ [57, 58]. Each category, in which spikes have a similar shape, is believed to be generated by one neuron. The reasoning behind spike sorting is that the shape of spikes generated by a neuron and recorded by an electrode is stereotypical, determined by the morphology of the dendritic tree of the neuron and the transmission pathway to the electrode. It is therefore believed that the spike shapes of different neurons are distinct from each other and do not change over time, or at least not over a significant amount of time. Though some work has demonstrated that spike sorting may not be necessary for robust decoding performance [59, 60], the majority of work today still uses spike sorting to squeeze out as much information as possible from the neural recording implant.

Some authors have integrated a spike sorting classifier on the implant [61, 62]. While there are some implementations that have used supervised methods similar to template matching [63], most other approaches [64, 65] use unsupervised clustering techniques due to the advantage of not needing explicit training sessions. Figure 14.5 depicts the typical steps involved in spike sorting. After sorting, only the distinct identifier of the source neuron needs to be sent resulting in huge compression. We can estimate this data rate in Mode 2 as:

$$\displaystyle{ R_{2} = N_{\mathrm{neu}} \times f_{\mathrm{bio}} \times \left \lceil \mathrm{log}_{2}(N_{\mathrm{neu}})\right \rceil }$$
(14.6)

where the symbols have the same meaning as defined earlier. Using the same parameter values as in Sect. 14.3.1, we can estimate the data rate for this mode to be \(R_{2} = 16\) kbps, equivalent to a compression of ≈ 1000× compared to the typical case.

Fig. 14.5 The steps involved in spike sorting include feature extraction followed by unsupervised clustering to separate the neural spikes into distinct categories according to their shape

3.3 Compression Mode 3: Intention Decoding

The final and most advanced mode of compression is attained when the last stage of signal processing, decoding intentions from the recorded multi-channel spike train, is also integrated in the implant. This is shown in Fig. 14.3d as Mode 3. In this chapter, we focus on systems for motor prostheses only; hence, in this case, intentions refer to ‘motor’ intentions or the desire to move a limb. The foundations of current decoding algorithms can be traced back to the work of Georgopoulos and his colleagues [66, 67]. Their experiments revealed that the activity of some neurons in the motor cortex is tuned as a sinusoidal function of the direction of arm movement relative to a preferred direction, at which the activity reaches its maximum. They therefore proposed to represent each neuron by a vector indicating its preferred direction. The population vector is obtained as a linear combination of the preferred direction vectors of all neurons in the group, each weighted by the neuron’s firing rate over a short time period of tens of milliseconds, leading to a prediction of the velocity of the upcoming arm movement [68].
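
The population vector idea can be sketched in two dimensions. The example below is a simplified illustration under stated assumptions, not the procedure of [66–68]: each neuron is given a cosine tuning curve with a hypothetical baseline and modulation depth, the weights are the firing rates with the baseline subtracted, and only the movement direction (not the velocity) is decoded.

```python
import math


def cosine_tuned_rate(direction, preferred, baseline=10.0, depth=8.0):
    """Firing rate (Hz) of a neuron with cosine tuning around `preferred`."""
    return baseline + depth * math.cos(direction - preferred)


def population_vector(rates, preferred_dirs, baseline=10.0):
    """Sum the preferred-direction unit vectors weighted by (rate - baseline).

    Returns the angle of the resulting vector in radians.
    """
    px = sum((r - baseline) * math.cos(p) for r, p in zip(rates, preferred_dirs))
    py = sum((r - baseline) * math.sin(p) for r, p in zip(rates, preferred_dirs))
    return math.atan2(py, px)


# Eight neurons with uniformly spaced preferred directions (an assumption).
preferred = [2.0 * math.pi * k / 8 for k in range(8)]
true_direction = math.pi / 3
rates = [cosine_tuned_rate(true_direction, p) for p in preferred]
decoded = population_vector(rates, preferred)

print(f"true {math.degrees(true_direction):.1f} deg, "
      f"decoded {math.degrees(decoded):.1f} deg")
```

With noiseless rates and uniformly spaced preferred directions, the decoded angle matches the true one; with real, noisy spike counts the estimate improves as more neurons are included.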

Current state-of-the-art decoding algorithms for mapping population activity into motor intention can be categorized into two broad subgroups: inferential decoders [69–71] and classifiers [1, 20, 72]. However, most of these algorithms are run on bulky computers with wires connecting to the patient, which impairs free movement and poses a risk of infection. Recently, some approaches have been proposed for custom, low-power, compact hardware implementations of decoding algorithms [73–75], of which only one has shown measured results from a low-power integrated circuit [76], decoding motor intentions for dexterous finger movement as done in [20]. In the rest of the chapter, we elaborate on the details of this design, show the decoding performance and estimate the data compression achievable using this scheme.

3.3.1 Algorithm: Extreme Learning Machine

The machine learning algorithm used in this work is the Extreme Learning Machine (ELM) [77, 78]. It is a two-layer neural network (Fig. 14.6) where the first layer of weights from inputs to hidden neurons (\(w_{ij}\) denotes the weight from the i-th input to the j-th hidden neuron) is fixed and random. Only the weights in the second layer, from the hidden neurons to the output neurons, need to be trained. Using \(\beta_{ki}\) to denote the weight from the i-th hidden neuron to the k-th output neuron, we can express the k-th output \(o_{k}\) as:

Fig. 14.6 Extreme Learning Machine (ELM) is a two-layer neural network where the weights of the first layer are random and fixed. Only second layer weights are tuned according to the task

$$\displaystyle\begin{array}{rcl} & & o_{k} =\sum _{i=1}^{L}\beta _{ki}\,g(\boldsymbol{w}_{i},\boldsymbol{x},b_{i}) =\sum _{i=1}^{L}\beta _{ki}h_{i} =\boldsymbol{h}^{T}\boldsymbol{\beta }_{k} \\ & & \boldsymbol{w}_{i},\boldsymbol{x} \in \mathfrak{R}^{D};\text{ }\beta _{ki},b_{i} \in \mathfrak{R};\text{ }\boldsymbol{h},\boldsymbol{\beta }_{k} \in \mathfrak{R}^{L}{}\end{array}$$
(14.7)

where \(\boldsymbol{x}\) denotes the D-dimensional input vector, \(\boldsymbol{h}\) is the L-dimensional output of the hidden layer, \(g()\) is the non-linear activation function of the hidden layer and \(b_{i}\) denotes the bias of the i-th hidden layer neuron. One of the commonly used activation functions is the additive node, where \(h_{i} = g(\boldsymbol{w}_{i}^{T}\boldsymbol{x} + b_{i})\) and \(g: \mathfrak{R}\rightarrow \mathfrak{R}\) is any non-linear function with finitely many discontinuities. While the outputs \(o_{k}\) can be used directly for regression, for classification we assign the input sample to the class belonging to the output neuron with the highest value.

The second layer weights can be obtained by a direct solution instead of the iterative methods, such as back propagation, typically used for multi-layer neural networks; hence, the training time for ELM based systems is much smaller. The output weights for each of the C classes can be optimized separately using the same hidden layer values. Suppose there are p samples and let H denote the p × L hidden layer matrix where each row stores the output of the hidden neurons for one sample. Further, let \(\boldsymbol{T}_{k} \in \mathfrak{R}^{p}\) denote the target or desired values for the k-th output neuron. Then, the ideal weights \(\hat{\boldsymbol{\beta }}_{k}\) for the k-th output neuron are obtained as the solution of the following optimization problem [78]:

$$\displaystyle{ \hat{\boldsymbol{\beta }}_{k} =\mathop{ \mathrm{arg\,min}}\limits _{\beta _{k}}\|H\boldsymbol{\beta }_{k} -\boldsymbol{ T}_{k}\|_{2} +\gamma \|\boldsymbol{\beta } _{k}\|_{2} }$$
(14.8)

where the second term in the equation is needed for regularization and γ is optimized on the validation set as a hyper-parameter. Closed form solutions to the value of \(\boldsymbol{\beta }_{k}\) can be obtained in two different ways for the cases where the number of training samples is less or more than the number of hidden neurons [78].
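
The training procedure can be sketched in plain Python. This is a minimal illustration under explicit assumptions, not the implementation of [76–78]: the norms in Eq. (14.8) are taken to be squared, so that the problem becomes ridge regression with the closed form \(\hat{\boldsymbol{\beta}}_{k} = (H^{T}H + \gamma I)^{-1}H^{T}\boldsymbol{T}_{k}\); the activation is a sigmoid additive node; and the toy data, weight ranges and all function names are hypothetical.

```python
import math
import random


def hidden_layer(x, weights, biases):
    """Additive sigmoid nodes: h_i = g(w_i . x + b_i)."""
    return [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(w_i, x)) + b_i)))
            for w_i, b_i in zip(weights, biases)]


def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def train_output_weights(H, t_k, gamma):
    """Ridge solution beta_k = (H^T H + gamma I)^-1 H^T t_k (assumed form)."""
    n_hidden = len(H[0])
    a = [[sum(row[i] * row[j] for row in H) + (gamma if i == j else 0.0)
          for j in range(n_hidden)] for i in range(n_hidden)]
    rhs = [sum(row[i] * t for row, t in zip(H, t_k)) for i in range(n_hidden)]
    return solve(a, rhs)


def classify(h, betas):
    """Pick the output neuron with the highest value o_k = h . beta_k."""
    outputs = [sum(hi * bi for hi, bi in zip(h, beta)) for beta in betas]
    return outputs.index(max(outputs))


# Toy two-class problem with D = 2 inputs and L = 20 random hidden neurons.
random.seed(0)
D, L = 2, 20
W = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(L)]  # fixed, random
b = [random.uniform(-1, 1) for _ in range(L)]
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 1, 1, 0]

H = [hidden_layer(x, W, b) for x in X]
betas = [train_output_weights(H, [1.0 if y == k else 0.0 for y in labels], 1e-6)
         for k in range(2)]
predicted = [classify(h, betas) for h in H]
print(predicted)
```

The L × L form used here is valid whether p is smaller or larger than L; the equivalent p × p form, \(H^{T}(HH^{T} + \gamma I)^{-1}\boldsymbol{T}_{k}\), is simply cheaper when there are fewer samples than hidden neurons.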

To apply this neural network to neural decoding, the authors use an approach similar to [20], with the Artificial Neural Network replaced by an ELM. The ELM decodes the onset time as well as the type of movement from the asynchronous neural spikes every \(T_{s} = 20\) ms. First, the instantaneous firing rate \(r_{i}(t_{k})\) of each biological neuron at time \(t_{k}\) is computed by counting the number of spikes in a time window \(T_{w} = 100\) ms. Then, the input feature vector to the ELM at time \(t_{k}\) is defined by:

$$\displaystyle{ \boldsymbol{x}(t_{k}) = [r_{1}(t_{k}),r_{2}(t_{k})\ldots r_{D}(t_{k})] }$$
(14.9)

The total number of output neurons C in this case is equal to M + 1, where there are M movement types and one extra neuron is used to detect the onset time of movement. For training, the last output, for onset time, is trained on the entire dataset while the others are trained only on neural data during movement. Also, the last neuron is trained to solve a regression problem where the target function is trapezoidal: it gradually rises from 0 to 1 to mimic the gradually increasing activity of biological neuron ensembles. To reduce false positives in detecting movement onset, further processing is done on this ‘primary’ output by voting across the decisions for several consecutive time samples [76] to produce the post-processed output. Another special signal processing feature of the IC is the ability to include time-delayed versions of neuronal activity as additional inputs to the ELM, i.e., the number of inputs D to the ELM may be larger than the number of biological neurons N. This feature, referred to as time-delay based dimension increase (TDBDI), is especially useful for chronic implants, where the signal quality from many probes degrades with time due to scarring and fibrotic encapsulation.
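
The feature construction of Eq. (14.9), together with the time-delayed inputs, can be sketched as follows. Spike counts in a 100 ms window stand in for the firing rates, and the single extra delay of 20 ms is a hypothetical choice made for this example; the text does not specify which delays the IC uses. Function names are illustrative.

```python
def firing_rate_features(spike_trains, t_k, window=0.100):
    """Spike count of each neuron in the window (t_k - window, t_k]."""
    return [sum(1 for t in train if t_k - window < t <= t_k)
            for train in spike_trains]


def tdbdi_features(spike_trains, t_k, delays=(0.0, 0.020), window=0.100):
    """Concatenate the feature vectors at t_k and at the delayed times t_k - d.

    With N neurons and len(delays) delays, the input dimension becomes
    D = N * len(delays).
    """
    x = []
    for d in delays:
        x.extend(firing_rate_features(spike_trains, t_k - d, window))
    return x


# Two neurons, spike times in seconds.
trains = [[0.010, 0.050, 0.150], [0.120, 0.155]]
x_plain = firing_rate_features(trains, t_k=0.160)
x_tdbdi = tdbdi_features(trains, t_k=0.160)
print(x_plain, x_tdbdi)
```

In a running decoder this computation would be repeated every \(T_{s} = 20\) ms, so consecutive windows overlap by 80 ms.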

The main reason for choosing the ELM algorithm is that most of the multiplications to be done in this architecture are the D × L random scalings in the first stage, which can be done at very low energy and area using analog neuromorphic circuits. The mismatch-induced errors [79] in analog circuits are not a problem in this case; instead, they can serve as part of the random coefficients. To get high accuracy, the trainable weights of the second stage can be implemented using digital circuits. This does not degrade system-level energy efficiency as long as \(D \gg C\), which ensures that the number of multiplications in the second stage (L × C) is much smaller than that in the first stage (D × L). The circuit implementation of this algorithm is shown next.

3.3.2 Chip Architecture

The system architecture for the neuromorphic ELM chip is shown in Fig. 14.7. Since biological firing rates are sparse, the AER protocol described in Sect. 14.3.1 is used to send the neural spikes to the desired channel based on the address or identity of the source neuron. Then, the input handling circuits (IHC) compute an average firing rate using digital circuits in two steps (Fig. 14.8a). First, a counter estimates the instantaneous firing rate by counting the number of spikes in a time interval \(T_{s}\). Then a moving average circuit finds the average firing rate over a time window \(T_{w}\). This digital number is then converted to an analog current \(I_{\mathrm{DAC}}\) using a digital-to-analog converter (DAC) so that the following steps can be implemented in the analog domain. The major task of multiplication by a random number is performed by the synapse, a current mirror comprising identical minimum-sized transistors. Ideally, without statistical variations, the current mirror would produce the same output current as its input. However, due to mismatch and sub-threshold operation of the transistors, the output current from a mirror is given by:

$$\displaystyle{ I_{\mathrm{out}} = e^{\varDelta V _{T}/U_{T} }I_{\mathrm{in}} }$$
(14.10)

where \(\Delta V_{T}\) denotes the threshold voltage mismatch between the two mirror transistors and \(U_{T}\) denotes the thermal voltage. In this architecture, the diode-connected transistor of each row is shared, while each synapse consists of just a single mirror transistor. Hence, the weight of the synapse connecting the i-th input to the j-th neuron is given by \(w_{ij} = e^{\varDelta V _{T,ij}/U_{T}}\). The sum of these currents is obtained by simply wiring the drains of the mirror transistors together. Finally, this current is converted to the hidden layer output by passing it through the neuron circuit shown in Fig. 14.8b. The neuron is a current controlled oscillator (CCO) whose frequency of oscillation is given by:

$$\displaystyle{ f_{\mathrm{CCO}} = \frac{I_{\mathrm{in}} - I_{\mathrm{leak}}} {C_{f} \times V_{DD}} }$$
(14.11)

This equation is valid as long as \(I_{\mathrm{in}} \ll I_{\mathrm{rst}}\), where \(I_{\mathrm{rst}}\) denotes the reset current flowing through transistor M3 when turned fully on. The current \(I_{\mathrm{leak}}\) serves the function of the bias term \(b_{i}\) in Eq. (14.7). Similar to the weights \(w_{ij}\), the leak currents also follow a log-normal distribution. The digital pulses from the CCO are used to clock a counter, which is enabled along with the neuron for \(T_{\mathrm{en}}\) seconds. Also, the counter can be stopped at a digitally programmable count value \(h_{\mathrm{max}}\), which provides a saturating nonlinearity. Hence, the hidden layer output after the counter can be expressed as:

$$\displaystyle\begin{array}{rcl} h& =& f_{\mathrm{CCO}}T_{\mathrm{en}}\,\text{ if }\,f_{\mathrm{CCO}}T_{\mathrm{en}} <h_{\mathrm{max}} \\ & =& h_{\mathrm{max}}\text{ otherwise.} {}\end{array}$$
(14.12)

Fig. 14.7 Overall architecture of the ELM based decoder IC, which has a decoder to pass input spikes to the desired channel, input handling circuits (IHC) to calculate the average firing rate of spikes as a feature, a synapse array to create the random weighting of inputs needed in stage 1 of ELM and an array of hidden neurons

Fig. 14.8 (a) The IHC block comprises a counter and a moving average circuit to compute the average firing rate in the digital domain. The DAC then converts the digital number to an analog current. (b) The neuron is made of a current controlled oscillator (CCO) that clocks a counter (not shown)
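
Equations (14.10)–(14.12) can be combined into a simple behavioural model of one hidden layer evaluation. The sketch below is illustrative only: the thermal voltage of about 25.8 mV, the clamping of the oscillator frequency at zero when the input current is below the leak current, the flooring of the count to an integer, and every numerical value in the usage example are assumptions, not measured chip parameters.

```python
import math
import random

U_T = 0.0258  # thermal voltage in volts near room temperature (assumed)


def mismatch_weights(n_in, n_hidden, sigma_vt=0.017, seed=0):
    """Log-normal synaptic weights w_ij = exp(dV_T,ij / U_T), Eq. (14.10).

    dV_T is drawn from a zero-mean Gaussian with standard deviation sigma_vt.
    """
    rng = random.Random(seed)
    return [[math.exp(rng.gauss(0.0, sigma_vt) / U_T) for _ in range(n_hidden)]
            for _ in range(n_in)]


def hidden_output(i_in, i_leak, c_f, v_dd, t_en, h_max):
    """CCO frequency (Eq. 14.11) counted for t_en and saturated (Eq. 14.12)."""
    f_cco = max(0.0, (i_in - i_leak) / (c_f * v_dd))  # no oscillation below leak
    return min(int(f_cco * t_en), h_max)


def hidden_layer(i_dac, weights, i_leak, c_f, v_dd, t_en, h_max):
    """Sum the mirrored currents on each column and pass them through a neuron."""
    outputs = []
    for j in range(len(weights[0])):
        i_sum = sum(i_dac[i] * weights[i][j] for i in range(len(i_dac)))
        outputs.append(hidden_output(i_sum, i_leak[j], c_f, v_dd, t_en, h_max))
    return outputs


# Hypothetical operating point: 4 inputs, 3 hidden neurons.
w = mismatch_weights(n_in=4, n_hidden=3)
h = hidden_layer(i_dac=[1e-9, 2e-9, 0.0, 0.5e-9], weights=w,
                 i_leak=[0.1e-9] * 3, c_f=1e-12, v_dd=1.0,
                 t_en=5e-3, h_max=64)
print(h)
```

Because the exponent is Gaussian, the weights produced by `mismatch_weights` are log-normally distributed, which is the property the architecture relies on for its random first-layer weights.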

3.3.3 Measurement Results

The chip described above was fabricated in a 0.35 μm CMOS process. With 128 input channels and 128 hidden neurons, the die size of this chip was 4.95 × 4.95 mm². An example of the mismatch is shown in the variability in measured tuning curves of the hidden neurons (Fig. 14.9) when the input spike frequency of only one channel is varied. A more detailed characterization of the mismatch across the entire synaptic array is shown in Fig. 14.10a. This figure is obtained by giving a fixed input frequency to each channel one by one and recording the hidden neuron firing frequency. These weights are fit to a log-normal distribution in Fig. 14.10b, implying an underlying Gaussian distribution of \(\Delta V_{T}\). Across eight different dies, the mean of the Gaussian distribution varies from −0.1 to 0.57 mV and the standard deviation varies from 16.2 to 17.6 mV.

Fig. 14.9 Measured transfer curves of the 128 hidden layer neurons on the chip obtained by sweeping the input spike frequency of one of the channels. The variation of the curves is due to statistical variations in the chip

Fig. 14.10 (a) A map of the threshold variation across the 128 × 128 synaptic current mirror transistors on one of the dies. (b) The weights due to mismatch fit a log-normal distribution as expected

The authors in [76] have applied the IC for decoding flexion and extension of fingers and wrist from neural activity recorded from the M1 region of a non-human primate. The experiment with the monkey is described in detail in [20]. In brief, monkeys are trained to move individual fingers and wrist based on visual input while simultaneously, a single-unit recording device implanted in the motor cortex is used to record the brain activity. This data contains information about the monkey’s motor intention and is used for the decoding. The entire data set has experiments performed on three monkeys. This pre-recorded data was fed into the IC and the hardware performance has been benchmarked with software decoding results reported in [20].

Figure 14.11 shows an example of the decoding being performed for three different trials. The bottom panel of the figure shows neural spikes obtained after sorting from N = 40 M1 neurons. The middle panel shows the onset detection, while the top panel shows the predicted movement type. The authors reported that the decoding accuracy increases to ≈ 96%, on par with software results, for a hidden layer size of L = 60 neurons. It is also important to see how the decoding accuracy degrades when fewer biological M1 neurons are available for recording. This is shown in Fig. 14.12 for 8 different samples of the IC. It can be seen that using delayed samples to increase the input dimension (TDBDI) helps in boosting decoding accuracy for all samples. The improvement is especially significant when the number of M1 neurons is small, which clearly shows the benefit of TDBDI for chronic implants. For this IC, the authors report a power dissipation of 414 nW for the case of D = 40 and L = 60, resulting in an ultra-low energy per operation of 3.45 pJ/MAC, where MAC refers to multiply and accumulate. This is much smaller than that of recently reported digital multipliers, which require 16–70 pJ/MAC [80–82].
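
The reported energy per operation can be related to the other figures in the text. The text does not state how the value was derived, so the sketch below makes two assumptions: one classification is performed every \(T_{s} = 20\) ms (50 per second), and only the D × L first-stage multiplications are counted as MACs. Under these assumptions the reported figure is reproduced.

```python
def energy_per_mac(power_w, d_inputs, l_hidden, decisions_per_s):
    """Energy per multiply-accumulate, counting D x L MACs per decision."""
    macs_per_second = d_inputs * l_hidden * decisions_per_s
    return power_w / macs_per_second


e_mac = energy_per_mac(power_w=414e-9, d_inputs=40, l_hidden=60,
                       decisions_per_s=50.0)
print(f"{e_mac * 1e12:.2f} pJ/MAC")
```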

Fig. 14.11 Example of a neural decoding trial where the chip uses L = 60 hidden layer neurons to decode the onset time and type of movement from N = 40 biological neurons recorded from the M1 region of a non-human primate. 12 types of movement are considered here: flexion and extension of five fingers and wrist

Fig. 14.12 Using the time delayed samples for extra information helps in increasing decoding accuracy, especially when the number of biological M1 neurons is small. The results are verified from 8 chips

We can now estimate the amount of data compression achievable in this mode of operation with an integrated neural decoder. At the beginning of a session, the system needs to transmit data at the rate \(R_{\mathrm{typ}}\), \(R_{1}\) or \(R_{2}\), depending on the mode used. This data is used for training. Once trained, however, the data rate \(R_{3}\) to be transmitted is given by:

$$\displaystyle{ R_{3} = f_{\mathrm{deco}} \times \lceil \mathrm{log}_{2}(C)\rceil }$$
(14.13)

where C is the number of classes of movement and \(f_{\mathrm{deco}}\) is the rate of classification. As an example, for the case described earlier with \(f_{\mathrm{deco}} = 50\) Hz and C = 13, \(R_{3} = 200\) bps, a compression factor of \(10^{5}\) over \(R_{\mathrm{typ}}\), showing the huge potential of compression obtainable this way.
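
As a quick numerical check, the sketch below evaluates Eqs. (14.6) and (14.13) with the parameter values used in the text and compares both against the raw rate of Eq. (14.1). The function names are illustrative.

```python
import math


def rate_mode_2(n_neu, f_bio):
    """Eq. (14.6): one neuron identifier per sorted spike."""
    return n_neu * f_bio * math.ceil(math.log2(n_neu))


def rate_mode_3(f_deco, n_classes):
    """Eq. (14.13): one class label per decoding step."""
    return f_deco * math.ceil(math.log2(n_classes))


r_typ = 100 * 20_000 * 10        # Eq. (14.1), bits per second
r_2 = rate_mode_2(200, 10)
r_3 = rate_mode_3(50, 13)

print(f"Mode 2: {r_2} bps, compression {r_typ / r_2:.0f}x")
print(f"Mode 3: {r_3} bps, compression {r_typ / r_3:.0f}x")
```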

4 Conclusion and Discussions

Implantable brain machine interfaces are an emerging area of research which can be used by patients with motor disabilities to interact naturally with prosthetics or devices such as wheelchairs. More broadly, neural implants can be used to treat other neural diseases such as Parkinson’s, epilepsy or depression. In this chapter, we showed that the difficulty of scaling neural implants to thousands of channels in the future stems from wireless transmission rates rising to the order of 200 Mbps. It was also shown that it is possible to achieve variable rates of compression, from 10 to \(10^{5}\), by incorporating more processing steps into the implanted chip as opposed to leaving them to the receiver module outside the body. To make this viable, the processing has to be done at ultra-low power so that the power budget of the implant is not exceeded.

Neuromorphic or neuro-inspired analog circuits provide a viable alternative for reducing power dissipation beyond what is achievable with current digital circuits. In this chapter, we presented an extensive survey of the different levels of compression that are achievable when integrating spike detection, sorting or intention decoding within the neural implant. The most promising scheme for future large-scale implants, intention decoding, was described in detail, from the algorithm to the chip architecture and the sub-circuits. In the long term, we envision that as brain sensing technologies mature so that thousands of neurons can be simultaneously probed, integrated machine learners for intention decoding will become a common feature for managing the ‘big data’ originating from neural implants. However, to allow chronic or long-term recording using such devices, some challenges still need to be overcome. One of the major issues in long-term recordings is parameter drift, such as a change of probe impedance due to scarring or gliosis. Though the current solution has the TDBDI feature to counter this, there is no automatic strategy for detecting when to apply it and to which channels. This is a topic that deserves more attention in future. Also, the current method of training the machine learner used a trial structure where the time of movement was known. In real-life operation, there will not be any such precise temporal markers and the training algorithm has to be modified to suit this. One promising possibility is reinforcement learning based training [83], but more work is needed in this direction. Lastly, the current training paradigm used data from a monkey performing actual movements. To move to prosthetic control using imagined movements only, there will be an aspect of visual feedback that will alter the neural data recorded by the chip, a phenomenon referred to as ‘closed-loop’ decoder training. In this case, we have to retrain the machine learner iteratively over several closed-loop experimental trials, and the convergence of such training for ELM based decoders is an open avenue for research.