28.1 Introduction

In 1978 NIH funding levels were low and jobs were few. I had just finished my second postdoctoral position and was looking for a job, with little success. One available position involved working on a new and controversial device called a cochlear implant (CI) that electrically stimulated the auditory nerve. At that time there were only a few patients in the world with the experimental device and no commercial products. The CI was intriguing to me because I had been trying to connect auditory psychophysics to physiology. Although I was trained in psychophysics, I was interested in auditory physiology and was trying to develop models relating aspects of nerve firing to perception. CIs provided a great opportunity for testing such models: we could now activate highly unnatural patterns of neural activity (in both space and time) and see whether we could predict the perceptual outcome. How would loudness and pitch relate to the parameters of electric stimulation? How could we translate acoustic sound into an electric pattern that would produce the desired loudness, pitch, or speech percept?

But many colleagues warned me not to take this position. They were concerned that putting electronic stimulators in human volunteers was borderline unethical. The long-term consequences of electrical stimulation were not well understood. Animal experiments appeared to show that stimulation was safe, but they were mostly short-term experiments. What might happen to the cochlea and nerve after decades of electrical stimulation? Might the electronics leak poisonous chemicals into the body? And how could it possibly work? The very thought of replacing the highly complex micromechanical system of the cochlea with a handful of electrodes seemed preposterous. How could we ever expect to get useful hearing by replacing the 3000–4000 hair cells and 30,000 independent nerve fibers with 12 electrodes activating broad areas of neurons all at once? But I took the position at UC San Francisco anyway, working on early CIs. My part in the project was psychophysics, although in the case of implants it should perhaps be called psychoelectrics, because we were quantitatively relating electric current to perceptual magnitudes. Now, more than 30 years later, the CI, a device whose development some felt was ethically questionable, has turned out to be the most successful prosthesis ever developed. More than 200,000 people worldwide have received a CI, and most new recipients are young children.

CIs provide far better speech understanding than most auditory neuroscientists ever expected. In the 1970s, when cochlear implants were first developed, most predictions were that implants would provide sound awareness and an aid to lip-reading; few researchers thought they would ever allow people to converse on the telephone. Most thought that the relatively crude representation across a small number of electrodes would not be sufficient to convey the complex spectral-temporal patterns of speech. Auditory research greatly underestimated the power of the brain’s pattern recognition. Today, most people with cochlear implants can understand more than 80 % of words in sentences using only the sound from their implant, and many can understand sentences at 100 % correct. When that level of sound-only performance is combined with lip-reading and the predictability of normal conversation, many CI recipients can function at a high level in the hearing world. Most children with CIs are able to attend mainstream schools and function almost normally in a hearing world.

Auditory brain stem implants (ABIs) are similar to CIs in that they stimulate auditory neurons to restore hearing sensations. However, ABIs target the cochlear nucleus in the brain stem in patients who have no remaining auditory nerve and so cannot use a CI. When ABIs were first introduced, auditory researchers again underestimated their potential. The initial ABI patients received sound awareness and a significant boost to lip-reading, but little open set speech understanding. Recent results in adults and children show that high levels of open set speech recognition are possible with ABIs, even though they bypass the auditory nerve and directly stimulate the cochlear nucleus in a non-tonotopic way.

Why have auditory researchers consistently underestimated the potential of auditory prostheses? How can auditory prostheses work so well? This chapter briefly reviews the past and present of CIs and ABIs and speculates about physiological mechanisms that might underlie the pattern of results.

28.2 Cochlear Implants

Early CIs stimulated the auditory nerve with a single electrode placed on the round window or into the scala tympani. These early devices were well accepted by patients in spite of their limited ability to convey tonotopic information. The fundamental frequency of the voice was conveyed through periodicity, and the overall speech envelope was conveyed by the low-frequency (<20 Hz) modulation of the electrical stimulus. These temporal patterns were sufficient to provide sound awareness and discrimination of some common environmental sounds based on their temporal patterns (e.g., telephone ringing vs. dog barking). In addition, the periodicity, when combined with lip-reading in face-to-face communication, allowed much more fluid conversation than lip-reading alone. In general, patients obtained a 30–50 percentage point improvement in speech understanding with their implant compared to lip-reading alone. Although the prosthetic information was highly limited, the early CIs were enthusiastically received by postlingually deaf adults. Even these early CIs broke down the social isolation many deaf people feel in the absence of sound: they provided sound awareness, some discrimination and identification of sounds, and a significant boost in the ease of face-to-face conversation.

Multichannel CIs added electrodes distributed along the tonotopic axis of the scala tympani to convey more information about spectral shape and spectral transitions. With early multichannel CIs, performance immediately improved compared to single-electrode CIs: users were able to recognize about 20 % of words in sentences without lip-reading. The additional spectral shape and spectral transition cues were crude compared to the detailed spectral information available in a normal cochlea, but they were sufficient to allow CI users to recognize words. With this additional level of auditory quality, CI patients were able to follow face-to-face conversations with ease.

Signal processing algorithms improved over the 1980s and early 1990s, and CI performance continued to improve. By the mid-1990s most CI recipients were able to understand about 50 % of the words in sentences. This level of word recognition is enough to allow conversations on the telephone: owing to the redundant and predictable nature of normal conversation, 50 % word recognition, combined with knowledge of the talker's personal speaking style and of the conversation topic, was enough to allow relatively good conversation over the telephone (or with a person whose lips are not clearly visible).

The signal processing advance that allowed this improvement was a change in philosophy. Initial signal processing strategies assumed that implants could convey only a limited amount of information, and so algorithms were developed to extract the most important features of speech from the running speech stream and code only those features into the CI. Such a strategy can work reasonably well in quiet listening conditions but breaks down badly in noise, because computer algorithms are not very good at reliably extracting speech cues in noisy conditions. In addition, the coded representation of the key speech features was presented in an unnatural manner across the electrodes and so constituted a new pattern of information that had to be at least partially learned by the listener. A large improvement in performance came with the introduction of the continuous interleaved sampling (CIS) strategy and the similar Advanced Combination Encoder (ACE) strategy. Electric pulses were interleaved in time across electrodes to avoid the complex interaction of simultaneously presented electric fields. These strategies simply filtered the sound into multiple frequency bands and then presented, on the electrode assigned to each band, a pulse train whose amplitude followed the time-varying energy in that band. The resulting pattern of stimulation was still very crude compared to that of a normal cochlea, but it was unselected in the sense that speech features were not explicitly extracted and presented; instead, the brain's own speech feature extraction was allowed to work on the CI stimulation pattern. Although this pattern was probably shifted and distorted in frequency relative to the normal cochlea's tonotopic representation, and was a very coarse representation of spectral and temporal fine structure, most CI listeners were able to adapt their pattern recognition to the shifted pattern after a few months. The tonotopic patterns, though coarse, were sufficient to identify 40–50 % of random words presented without lip-reading cues. Further improvements in signal processing have raised performance so that modern multichannel CIs provide more than 80 % recognition of words in sentences (Spahr et al., 2007). This result shows that spectral and temporal fine structure is not necessary for speech recognition in quiet.
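
To make the CIS-style processing chain concrete, the sketch below implements the band-splitting and envelope-extraction steps described above in Python. It is only an illustrative outline, not any manufacturer's implementation; the band edges, filter orders, envelope cutoff, and the use of NumPy/SciPy are assumptions chosen for the example.

import numpy as np
from scipy.signal import butter, sosfiltfilt

def cis_envelopes(signal, fs, n_bands=8, f_lo=200.0, f_hi=7000.0, env_cut=400.0):
    """Split a sound into log-spaced bands and return each band's envelope.

    In a CIS-type processor each envelope would then amplitude-modulate a
    fixed-rate pulse train on the electrode assigned to that band, with the
    pulses interleaved in time so that no two electrodes are stimulated
    simultaneously.
    """
    edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_bands + 1)
    env_filter = butter(2, env_cut, btype="low", fs=fs, output="sos")
    envelopes = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_filter = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(band_filter, signal)
        # Half-wave rectification followed by low-pass filtering approximates
        # the temporal envelope of the band.
        env = sosfiltfilt(env_filter, np.maximum(band, 0.0))
        envelopes.append(np.clip(env, 0.0, None))
    return edges, np.array(envelopes)

# Example: 8-band envelopes of one second of a 120-Hz tone with a slow (4-Hz)
# amplitude modulation, standing in for a voiced speech-like sound.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 120 * t) * (1.0 + 0.5 * np.sin(2 * np.pi * 4 * t))
edges, envs = cis_envelopes(x, fs)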

Research has shown that performance increases as the number of spectral bands increases, and that as few as four bands are sufficient for high levels of speech recognition in quiet (Shannon et al., 1995). More bands are necessary for speech understanding in a variety of other conditions: as the complexity of the speech material increases (Shannon et al., 2004); as noise interference increases (Fu et al., 1998); or when language familiarity is underdeveloped, as in young children (Eisenberg et al., 2000) or people listening in a second language (Padilla, 2003). It appears that the brain is highly overtrained in the recognition and categorization of speech patterns in our native language from millions of repetitions over a hearing lifetime. Under optimal listening conditions speech can be recognized with surprisingly little spectro-temporal detail; as listening conditions deteriorate, more fine structure is necessary.

One of the remaining puzzles about CI results is the wide variation in individual performance. It was once thought that a large part of the individual variability in outcomes was due to poor fitting of device parameters. The standard clinical fitting procedure may produce a good fit for some patients but not for others; unlike the fitting of prescription lenses for vision problems, the individualized fitting of CIs is not yet well developed. It was hoped that improvements in individual fitting would convert poor CI users into good users, while already good users might get little or no benefit from fine-tuning of implant parameters. This has not turned out to be the case. When individual customization of fitting parameters is applied, the scores of all patients improve. Although poorly performing patients do show improvement with better parameter adjustments, rarely has a patient with a poor outcome been converted into one with a good outcome (Wilson et al., 1993).

Individual differences in implant performance have also been resistant to training. It was thought that brain plasticity could overcome some of the deficiencies in individual CI parameter fits and that training on speech materials would shape the brain's experience to improve performance (Wilson et al., 1993). Training, like individual parameter adjustment, improves performance for all patients (Fu & Galvin, 2008; Zhang et al., 2012); it does not produce a differentially larger improvement in patients with poor outcomes.

This pattern of outcomes presents a puzzle: what is the source of the large individual variability in outcomes? If it is not fine adjustments in customizing the device to the individual patient, and it is not something that can be learned, what is it? The differences in outcomes might be due to differences in the underlying pathology of the deafness, possibly related to the degree and uniformity of the surviving nerve population. For another perspective on individual variability, we next look at outcomes with the ABI.

28.3 Auditory Brain Stem Implant

The ABI is similar to the CI but is designed to stimulate the cochlear nucleus in the brain stem. It was originally developed for patients with neurofibromatosis type 2 (NF2), a genetic disorder that produces bilateral tumors on the vestibular portion of the eighth cranial nerve (VIIIn). These patients are deafened when tumor removal severs both the auditory and vestibular branches of the VIIIn; because no auditory nerve remains, their deafness cannot be helped by a CI.

The first ABI surgery was performed in 1979 by Bill House and Bill Hitselberger at the House Research Institute (Hitselberger et al., 1984), and that first patient has used the device every waking hour since that time. The device evolved over time to have multiple electrodes (Brackmann et al., 1993; Shannon et al., 1993) and was first commercialized by Cochlear Corporation in 1992. FDA approval was received in 2000, and an ABI device is now also available from MedEl Corporation. As of 2012 there are more than 1100 ABI patients worldwide, most of whom lost their auditory nerves to bilateral tumors (NF2). Overall, the ABI provides sound awareness, environmental sound discrimination, and some limited recognition of words (Lenarz et al., 2001; Nevison et al., 2002; Otto et al., 2002). Although psychophysical measures of ABI performance were similar to those seen in CIs (Shannon & Otto, 1990), speech recognition was significantly poorer.

The difference between CI and ABI outcomes may provide some insight into the function of the auditory system. The early ABI results suggested that auditory implants might have reached a point of diminishing returns: activation of the cochlear nucleus may produce more complex and less tonotopically organized patterns of auditory activation that do not support speech recognition. Stimulation at the level of the cochlear nucleus might bypass too much critical intrinsic processing, so that more central auditory structures do not receive the fundamental information they need. Another possibility was that the surface array was not sufficiently tonotopically selective. Electric stimulation on the surface of the cochlear nucleus produces mostly low pitch sensations because high-frequency neurons lie below the surface, and ABI electrodes can interfere with each other because there is considerable overlap in the neural populations activated by adjacent electrodes.

28.4 PABI: Penetrating Electrode ABI

It was thought that the limiting factor in ABI performance was that the surface electrode array made poor contact with the tonotopic dimension of the human cochlear nucleus, which does not project to the surface of the nucleus. Surface electrodes primarily access low-frequency neurons, and most ABI patients commented that the sound quality was low in pitch and sounded “muffled.” To gain access to the higher-frequency neurons lying below the surface of the nucleus, the ABI device was modified to include an array of 10 penetrating microelectrodes, with the goal of selectively activating the high-frequency tonotopic layers of the posterior ventral cochlear nucleus (PVCN) beneath the surface. This penetrating-electrode ABI (PABI) was developed over a period of 15 years, including electrode design, materials selection and biocompatibility testing, and animal experiments on the long-term safety of insertion and stimulation. Animal studies showed that insertion and stimulation of microelectrodes in the PVCN was safe, and that stimulation of electrodes at different depths could activate different tonotopic regions, as measured in the inferior colliculus (IC; McCreery et al., 1998). Clinical trials in humans were initiated in the fall of 2003, and 10 patients were implanted with the PABI device.

The penetrating array produced auditory sensations in eight of the ten patients, with threshold levels of less than 1 nanocoulomb (nC), indicating good positioning in the PVCN. Classical psychophysical measures from the penetrating electrodes were quantitatively similar to those measured with CIs and surface-electrode ABIs. No interaction could be measured between penetrating electrodes, confirming good spatial selectivity and a small area of excitation. Patients commented that the perception elicited by the penetrating electrodes was “clean and sharp and high pitch.” In one case the patient temporarily retained acoustic hearing in the contralateral ear, and it was possible to match the pitch of each PABI electrode to acoustic tones in the nonimplanted ear, so that acoustic frequency information could be mapped to the correct electrode places. In spite of successful implantation and the achievement of the targeted psychophysical goals, speech performance with the PABI has been no better than with the surface-electrode ABI (Otto et al., 2008). Even highly selective microstimulation of the cochlear nucleus was not sufficient to allow good speech recognition. Again, it appeared that stimulation of the cochlear nucleus, even with selective microstimulation, might bypass too much important neural processing, so that more central auditory nuclei do not receive sufficient information. However, the picture changed dramatically in the 2000s.
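
For readers unfamiliar with charge units, the stimulus charge per phase is simply the product of current and phase duration (Q = I × t). Assuming, purely for illustration, a 100 µs/phase biphasic pulse, a 1 nC threshold corresponds to a current of only 10 µA (10 µA × 100 µs = 1 nC); the actual pulse durations used with the PABI may differ.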

28.5 ABI in Nontumor Adults

Vittorio Colletti, a surgeon in Verona, Italy, provided the ABI to patients who lost their VIIIn from causes other than NF2, such as head trauma, severe ossification that obliterated the nerve, and neurodegenerative disease (Colletti et al., 2002, 2004). These patients did not have tumors, but they had no auditory nerve and so were not candidates for a CI. Colletti's initial results showed excellent open set speech recognition in some of these nontumor (NT) patients. His results were met with considerable skepticism because ABI results in NF2 patients had never reached high levels of open set speech recognition. Independent testing verified Colletti's claims and showed that these patients also had a better ability to detect small sinusoidal modulations in electric stimuli (Colletti & Shannon, 2005). Several of Colletti's NT ABI patients were able to achieve speech recognition scores near 100 % correct for simple sentences presented in quiet, and several could converse on the telephone as well as CI patients. One patient, an independent contractor, used a cell phone as his primary means of business contact.

This exciting result showed that electric stimulation of the human cochlear nucleus could provide functional hearing comparable to that of CIs, even though the ABI has less access to the tonotopic gradient of the auditory system. The difference in ABI performance between NF2 and NT patients suggested that the difference was related to etiology: surgical removal of the NF2 tumor may damage some neural structure that is important for speech perception. Most psychophysical measures were similar between NF2 and NT ABI patients, but modulation detection was clearly better in the NT ABI patients and was significantly correlated with speech recognition. What physiological structure might be related to speech recognition, play a role in modulation detection, and lie where tumor removal could damage it? We will return to this question after reviewing two more patient populations and their results with the ABI.

28.6 ABI in Children

Based on his success with ABIs in NT adults, Colletti began a program to implant the ABI in children born without an auditory nerve. These children have a developmental or genetic abnormality in which the cochlea and/or auditory nerve fails to develop. Sometimes a full cochlea develops without an auditory nerve, and sometimes a nerve is present with a badly malformed cochlea. If a nerve is present, the child may be suitable for a CI, provided the electrode array can be positioned near the nerve in the abnormal cochlea. Other children may have had hearing at birth but developed severe ossification following meningitis. In some cases the ossification is so severe that not only is the cochlea filled with new bone, but the growth also continues to invade the modiolus and internal auditory meatus, obliterating the auditory nerve. Such children would have had some experience with hearing early in life but lost it as the cochlea and then the cochlear nerve were damaged by bone growth. Some of these children may have received a CI, but it is now well known that children with this etiology do poorly or obtain no benefit at all from a CI (Buchman et al., 2011). It was initially controversial to place an ABI in these children because ABI placement requires a transdural craniotomy to reach the brain stem. However, several cases showed excellent auditory development with the ABI (Colletti & Zoccante, 2008; Colletti et al., 2012), with developmental trajectories comparable to those of congenitally deaf children with CIs (Eisenberg et al., 2008). Complications from surgery were minimal (Colletti et al., 2010). Several children developed open set sound recognition sufficient to attend mainstream schools. As of 2012 there are more than 100 children with ABIs in the world, and the number is growing rapidly. Again, these results show that the information delivered by the ABI to the brain stem is not only sufficient for experienced adult brains to recognize speech patterns, but is also sufficient for a completely naïve child's brain to learn these patterns from the beginning.

28.7 New ABI Outcomes in NF2

One more twist in the ABI story is important before considering the potential physiological basis for these good results. Early ABI results showed useful but limited speech performance in NF2 patients. Following the excellent results in NT adults and children, it was thought that NF2 tumor removal must damage some critical structure or pathway that remains intact in the NT patient populations. However, some surgeons then began to report CI-like auditory performance even in NF2 ABI patients (Skarzynski et al., 2000; Behr et al., 2007). Some of these patients could understand sentences at 100 % correct and could even understand 50 % of the words in sentences at a speech-to-noise ratio of +3 dB, a level that is rarely achieved even by CI patients. Many of these patients could converse on the phone without difficulty. If the original interpretation was correct that tumor removal was damaging some structure in the brain stem, how were these surgeons able to remove similar tumors without such damage? Or was there another explanation for the good outcomes of these patients?

28.8 The Auditory Midbrain Implant: Electrical Stimulation of the Inferior Colliculus

The difference in outcomes between NT and NF2 ABI patients suggests that NF2 tumors and/or their removal can damage auditory pathways in ways that interfere with speech recognition. Assuming that this damage is local to the cochlear nucleus (CN), it may be possible to bypass the CN region and produce better speech recognition by stimulating higher neural centers of the brain stem and midbrain. The inferior colliculus (IC) is a prime candidate for such stimulation because it has a regular and well-documented tonotopic structure and is surgically accessible. If good speech recognition can be achieved from nontonotopic activation of the cochlear nucleus, and if speech pattern recognition is flexible enough that top-down processing can use highly unnatural patterns of activation in the auditory nerve and cochlear nucleus, then it might be possible to achieve similar success by stimulating the IC. At present, two implants are under development to provide electrical stimulation of the IC: (1) the inferior colliculus implant (ICI), which uses an ABI 12-electrode array placed on the surface of the IC, and (2) the auditory midbrain implant (AMI), which uses a penetrating 21-electrode array. The first patient to receive the ICI was implanted in December 2005 (Colletti et al., 2007), and five patients have now received the AMI (Lim et al., 2009). Most ICI and AMI patients receive sound sensations from stimulation, and many hear different pitches across the electrode array, indicating that the arrays do access different tonotopic regions of the IC. However, although it is possible to place electrodes in or on the IC and achieve tonotopic activation, no significant speech recognition has been observed from electric stimulation of the IC. Patients receive useful auditory information from these devices, but not open set speech understanding.

28.9 Auditory Neuropathy

Another group of patients of interest is those diagnosed with auditory neuropathy (AN) (Starr et al., 1991). Although AN may represent more than one pathology, results suggest that, like poorer-performing CI and ABI users, AN patients exhibit poor modulation sensitivity and poor speech recognition (Zeng et al., 1999, 2005). It is possible that the pathology causing at least some types of AN lies in the hair cell-neuron synapse or in the VIIIn itself. If so, a CI might not provide benefit, but an ABI might. However, if the pathology causing AN lies in damage to the putative neural subsystem in the CN that is critical for speech recognition (as proposed for NF2 patients), then an ABI may provide only limited benefit.

28.10 Possible Physiological Substrates of Speech Recognition

Now that there are more than 30 years of experience with electrical stimulation of the auditory system, it is possible to consider the neural underpinnings of the pattern of results observed. There is large variation in performance across CI users, but most patients can achieve high levels of open set speech recognition, including the ability to converse easily on the telephone. Similarly excellent speech recognition in some ABI patients shows that it is possible to achieve excellent open set speech recognition from an ABI stimulating the cochlear nucleus, even after NF2 tumor removal. The lack of good speech recognition from stimulation of the IC suggests that we may have reached a point of diminishing returns; it is possible that activation of the IC bypasses too much intrinsic processing in the auditory brain stem. The remainder of this chapter considers physiological mechanisms that might underlie this pattern of results.

The dichotomy in ABI patient outcomes provides considerable leverage on a key question in auditory processing: is there a specialized physiological pathway for speech recognition? The differences between these patient groups appear to be subtle: both groups have no functioning auditory nerve and no known central pathology, both are implanted with the same ABI device, and both use the same stimulation strategy. Preliminary psychophysical results show that both groups have similar threshold levels, similar degrees of electrode selectivity, and similar pitch and loudness ranges (Shannon & Otto, 1990; Shannon & Colletti, 2005). The most significant performance difference (besides speech recognition) is in modulation detection: ABI patients with good speech recognition have significantly better modulation detection thresholds (MDTs) than ABI patients with poor speech recognition, regardless of etiology (Colletti & Shannon, 2005). MDTs were significantly correlated with vowel recognition and sentence recognition in both ABI and CI users (Fu, 2002). Thus, whatever physiological difference exists seems to affect both speech recognition and modulation detection, but not other perceptual measures.
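
MDTs in implant users are commonly measured with sinusoidally amplitude-modulated (SAM) pulse trains, in which the listener must distinguish a modulated train from an unmodulated train as the modulation depth is reduced. The sketch below generates the per-pulse amplitudes of such a stimulus; the pulse rate, modulation frequency, and depth are example values chosen for illustration, not those of any cited study.

import numpy as np

def sam_pulse_amplitudes(carrier_rate=1000.0, mod_freq=100.0, mod_depth=0.2,
                         duration=0.3, ref_amp=1.0):
    """Amplitude of each pulse in a sinusoidally amplitude-modulated train.

    In an MDT experiment the modulation depth is typically reduced adaptively
    until the modulated and unmodulated trains can no longer be discriminated.
    """
    pulse_times = np.arange(0.0, duration, 1.0 / carrier_rate)
    modulation = 1.0 + mod_depth * np.sin(2.0 * np.pi * mod_freq * pulse_times)
    return pulse_times, ref_amp * modulation

# Example: a 300-ms, 1000-pulse-per-second train modulated at 100 Hz with 20 % depth.
times, amps = sam_pulse_amplitudes()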

One possible explanation for the difference between good and poor ABI outcomes is that the NF2 tumor and/or its removal damages a neural system that is critical for speech recognition. The most likely causes of damage during tumor removal are (1) physical damage to brain stem neurons, (2) anoxia related to venous bleeding, and (3) excitotoxicity from the electrocautery used to stop bleeding. If the presence and/or removal of tumors indeed damages a specific cell type or region of the CN, and that damage decreases speech understanding, this would be an important new finding and would advance our understanding of the role of peripheral physiology in complex perception. The recent reports of excellent speech recognition in NF2 patients suggest that surgical removal of NF2 tumors does not always result in such damage. One common element among the NF2 ABI patients who achieved high levels of speech recognition is that most had surgery in the semi-sitting position, which lowers venous pressure in the tumor region so that little or no cautery is needed during tumor removal. Local anoxia or excitotoxicity would likely affect cells near the surface of the cochlear nucleus. If damage to the surface of the CN is causing a large difference in outcomes, what type of cells might be damaged?

NF2 tumors are benign schwannomas that originate near the Schwann cell/glial junction on the vestibular branch of the VIIIn, near the medial opening of the internal auditory meatus. As they grow, these vestibular schwannomas balloon into the cerebello-pontine angle, and tumors larger than 2 cm typically contact the surface of the brain stem. Although benign, NF2 tumors produce an angiogenesis factor on their surface that attracts vascular supply from the surface of the brain stem, in this case the surface of the CN. The vascular supply of the CN in this region branches off the posterior-inferior cerebellar artery (PICA); CN vessels travel along the surface and then dive into the interior of the nucleus. Tumor angiogenesis thus entangles the tumor's blood supply with the blood supply to the surface of the cochlear nucleus. The mere existence of the tumor and the shared vasculature may not impair the functioning of the CN, because some patients with 4- to 5-cm tumors retain normal hearing and speech understanding prior to surgical removal. Tumor removal and surgical cautery, however, may damage CN cells that share blood supply with the tumor, either through anoxia or through excitotoxicity.

The small cell cap (SCC) of the cochlear nucleus is a candidate for such vascular/excitotoxic damage, or for direct mechanical damage, because of its location on the surface of the CN. Physiologically, the SCC predominantly receives input from primary auditory neurons with high thresholds and low spontaneous rates (SRs). According to Liberman (1978, 1991): “The small cell cap was almost exclusively innervated by low- and medium-SR fibers, i.e., those with the highest acoustic thresholds.” Although the SCC is not well characterized, it is thought to project to the medial olivocochlear (MOC) system (Ye et al., 2000) and possibly to play a role in intensity coding because of the wide dynamic range (DR) of its neurons (Ghoshal & Kim, 1996, 1997). The low-SR auditory neurons that project to the SCC also show little saturation with level and exhibit wide DRs (Sachs & Abbas, 1974; Winter et al., 1990).

High-threshold, low-SR neurons may provide the basis for rate coding of spectral profiles because they can preserve spectral profiles at moderate loudness levels without saturating (Sachs & Young, 1979). Low spontaneous rate (LSR) neurons and cells in the SCC are also known to code modulation well because of their large dynamic range. Phylogenetically, the SCC is a small region of the cochlear nucleus with an unknown function. The small cells probably cannot project to remote target sites because they are too small to metabolically support a long axon. In most mammals the SCC is quite small; it is hypertrophied only in humans and porpoises.

Figure 28.1 shows a comparison of the SCC in cats, humans, and porpoises; note the large difference in relative size across species. Even other primates have a relatively small SCC compared to humans and cetaceans (Moore & Osen, 1979). Is the SCC a recent evolutionary structure specialized for complex pattern perception? Central mechanisms that are selectively attentive to these neurons might provide a specialized pathway for coding complex pattern information (modulation of firing rate vs. tonotopic place). The SCC is the primary target of the initial synapses of the LSR neural population and so could provide a physiological subsystem specialized for spectral pattern processing. Damage to the SCC as a consequence of tumor removal might explain the difference in speech understanding between ABI patients who can understand speech and those who cannot. A convergent piece of evidence is that modulation detection is correlated with speech pattern recognition, and SCC neurons are well suited to coding modulation because of their large dynamic range (Ghoshal & Kim, 1996, 1997). Loss of low spontaneous rate VIIIn fibers, or of the SCC neurons they innervate, may contribute to the loss of speech recognition even when electrical thresholds, dynamic ranges, and other psychophysical measures appear normal. If non-SCC neurons are still intact and are stimulated by the ABI, they could still produce auditory sensations but might not contribute to speech recognition.

Fig. 28.1 Comparison of the SCC (shown in yellow) in cats, humans, and porpoises. The figure is a composite based on figures from Osen and Jansen (1965) and Moore (1987); coloring of the SCC courtesy of Jean Moore. Note the large difference in relative size across species.

Alternatively, onset-chopper cells (OCCs) in the CN are also candidates for structural damage. OCCs are known to enhance modulation relative to the VIIIn (Rhode & Greenberg, 1994; Frisina, 2001) and therefore could well be a physiological substrate important to both modulation detection and speech recognition. OCCs are metabolically more labile than other CN cells because their large size makes them more susceptible to transient anoxia, which almost certainly occurs during surgical cautery. However, OCCs are so broadly distributed in the ventral cochlear nucleus (VCN) that they would not be any more susceptible to mechanical surgical injury than other large cells.

Whether the physiological difference between good and poor ABI outcomes is due to damage to the SCC, to OCCs, or to some other cell type is not crucial for the concept of a specialized speech system. A simple physiological difference may or may not explain the large difference in performance between the two groups. All ABI patients presumably have intact central auditory processing; all had normal speech recognition before losing their hearing, and all use the same electrode array and similar speech processing strategies. Some ABI recipients have audiologically normal hearing right up to the tumor removal surgery. And in a few cases, temporarily retained normal acoustic hearing in the contralateral ear allowed acoustic and electric stimulation to be balanced in pitch and loudness, so that acoustic frequencies could be assigned to tonotopically appropriate neural populations. In spite of all this, most of these patients are not able to recognize speech with the ABI even though they had only a short period of deafness. But some (as many as 35 % in some clinics) can understand simple sentences at nearly 100 % correct and can converse on the telephone. The large difference in performance, combined with the seemingly minor differences in etiology and pathology, suggests that damage to a specific physiological mechanism may be at the root of this dichotomy in ABI patient outcomes. Whatever the physiological underpinnings of the perceptual differences between these patient groups, it is important to characterize the perceptual capabilities of these patients comprehensively. If the SCC/LSR hypothesis is correct, research with NF2 and NT ABI patients may illuminate physiological substrates and pathways for speech pattern recognition that are independent of other auditory processing.

28.11 An Acoustic Fovea?

Consider an analogy between the auditory and visual systems. Assume for a moment that the low spontaneous rate (LSR)/high spontaneous rate (HSR) distinction is analogous to the differential contribution of rods and cones to vision. Cones make up only about 5 % of retinal photoreceptors but perform a large part of visual pattern recognition; LSR neurons make up only 5–10 % of auditory neurons. Rods are specialized for high sensitivity and low thresholds, as are HSR auditory neurons. Rods and HSR neurons are evolutionarily important because they allow early detection of predators and detection of prey at low light or sound levels. In contrast, retinal cones and LSR auditory neurons are less sensitive but have a larger dynamic range of responsiveness. These systems are evolutionarily younger and may represent a more recent adaptation for processing more complex patterns of sensory information. It is known that complex pattern recognition, such as reading and face recognition, is poor in the visual periphery, where the receptors are mostly rods, and people who lose their foveal cones have great difficulty reading or recognizing faces. We hypothesize that a loss of LSR neurons (or of the SCC neurons to which they project) may likewise result in a loss of speech pattern recognition. Whether the cause is an intrinsic difference or experience, rods and cones have different functions in visual processing. It is possible that the LSR auditory neurons represent an “auditory fovea,” specialized for complex pattern processing rather than for sensitivity. Differences in speech recognition across implant users may then reflect differences in the health of this LSR/SCC system. It may even explain some aspects of auditory neuropathy: loss of speech recognition and poor modulation sensitivity despite good threshold sensitivity. We should consider that there may be multiple processing pathways in the auditory system as early as the auditory nerve and brain stem. Maybe there is an auditory fovea.

28.12 Summary

At present, there remains great variability in CI patient outcomes. Although most CI recipients show high levels of speech recognition, some do not. It has been assumed that this variability relates to patients' neural survival or to nonoptimized speech processor settings. However, studies in which speech processor parameters were varied have shown that the relative performance levels of CI patients are preserved across parameter manipulations: no matter what processing parameters were tested, the top-performing patients always performed best and the poorest-performing patients performed worst (Wilson et al., 1993). Thus, optimizing speech processing for individual patients did not reduce the variability in outcomes. This result suggests that there is a physiological basis for the differences in performance across patients.

Consider the possibility that there are two different sources of variability in CI patient outcomes: damage to the VIIIn and/or damage to the putative speech-specific pathway. If poor performance in CI patients is due to damage to the VIIIn, the ABI may provide some benefit. The good speech recognition of NT ABI patients (who have no functioning auditory nerve) suggests that the ABI may offer a new option for patients who receive little benefit from the CI. Indeed, greatly improved speech recognition was observed in several NT ABI patients who had previously received little benefit from their CI (Colletti et al., 2002, 2004). However, if poor performance in CI patients is due to the loss of a more central speech pathway, the ABI may provide no more benefit than the CI.

One hypothesis is that the SCC of the cochlear nucleus is a possible physiological substrate for such a speech pathway. It is known that the SCC primarily receives input from LSR auditory neurons, and recent results (Kujawa & Liberman, 2009; Lin et al., 2011) show that LSR neurons are more susceptible to acoustic overstimulation than other neurons. It is possible that the LSR–SCC system is essential for speech pattern processing. People with damage to either LSR neurons or SCC neurons may still have normal auditory thresholds and normal psychophysical performance mediated by high spontaneous rate neurons. But if the LSR–SCC system is damaged, they might have poor modulation detection and poor speech pattern recognition, the pattern exhibited by AN patients and by poor users of CI and ABI devices.

Auditory research has traditionally underestimated the role of the brain in processing complex patterns of information from the cochlea. In the 1960s, auditory neuroscientists were convinced that CIs would not work because the complexity of cochlear processing could not be replaced by a few stimulating electrodes. At that time, most auditory researchers were fixated on the complexities of cochlear processing and thought that the highly unnatural patterns of neural activation produced by electrical stimulation would allow only rudimentary auditory sensations. Now it is clear that central processing of complex patterns of sensory information allows high levels of speech recognition, even though the peripheral pattern of activation is spectrally impoverished and highly unnatural. High levels of speech recognition have now been documented even from stimulation of the cochlear nucleus, with a pattern of electric activation that is far more unnatural than that produced by a CI. Some factor seems to limit the ability of many NF2 ABI patients to synthesize speech from the stimulation patterns provided by the ABI. Because there is no known central manifestation of NF2, the problem is most likely localized to the CN. This suggests that some physiological mechanism, structure, or pathway in the CN may be damaged during NF2 tumor removal. Without this pathway, speech understanding and modulation detection are poor even in the presence of relatively normal psychophysical abilities.

This chapter has proposed a hypothetical processing pathway that may be essential for speech recognition: the low spontaneous rate auditory neurons connecting to the small cell cap of the cochlear nucleus. Damage to such a pathway could underlie the pattern of poor speech recognition and poor modulation detection documented in patients with auditory neuropathy and in poorly performing CI and ABI patients.

Whether or not the specific mechanisms proposed here are correct is of little importance. There are two principal messages in this chapter. The first is that patient outcomes can provide important leverage on the neuroscience of auditory processing: quantitative study of pathologies and functional differences can suggest underlying mechanisms. We suggest that the linkage between patient pathology and auditory neuroscience is underutilized and can provide leverage on scientific questions. The second is that, in spite of widespread acceptance that auditory processing is “massively parallel,” most theories of speech processing are serial/sequential. As a field we need to develop better insights into, and models of, parallel processing in the auditory system.