Abstract
Speech contains temporal structure that the brain must analyze to enable linguistic processing. To investigate the neural basis of this analysis, we used sound quilts, stimuli constructed by shuffling segments of a natural sound, approximately preserving its properties on short timescales while disrupting them on longer scales. We generated quilts from foreign speech to eliminate language cues and manipulated the extent of natural acoustic structure by varying the segment length. Using functional magnetic resonance imaging, we identified bilateral regions of the superior temporal sulcus (STS) whose responses varied with segment length. This effect was absent in primary auditory cortex and did not occur for quilts made from other natural sounds or acoustically matched synthetic sounds, suggesting tuning to speech-specific spectrotemporal structure. When examined parametrically, the STS response increased with segment length up to ∼500 ms. Our results identify a locus of speech analysis in human auditory cortex that is distinct from lexical, semantic or syntactic processes.
References
Stevens, K.N. Acoustic Phonetics (MIT Press, 2000).
Poeppel, D., Idsardi, W.J. & van Wassenhove, V. Speech perception at the interface of neurobiology and linguistics. Phil. Trans. R. Soc. Lond. B 363, 1071–1086 (2008).
Scott, S.K., Blank, C.C., Rosen, S. & Wise, R.J. Identification of a pathway for intelligible speech in the left temporal lobe. Brain 123, 2400–2406 (2000).
Hickok, G. & Poeppel, D. The cortical organization of speech processing. Nat. Rev. Neurosci. 8, 393–402 (2007).
Rauschecker, J.P. & Scott, S.K. Maps and streams in the auditory cortex: nonhuman primates illuminate human speech processing. Nat. Neurosci. 12, 718–724 (2009).
Binder, J.R. et al. Human temporal lobe activation by speech and non-speech sounds. Cereb. Cortex 10, 512–528 (2000).
Liebenthal, E., Binder, J.R., Spitzer, S.M., Possing, E.T. & Medler, D.A. Neural substrates of phonemic perception. Cereb. Cortex 15, 1621–1631 (2005).
Obleser, J., Zimmermann, J., Van Meter, J. & Rauschecker, J.P. Multiple stages of auditory speech perception reflected in event-related fMRI. Cereb. Cortex 17, 2251–2257 (2007).
Wild, C.J., Davis, M.H. & Johnsrude, I.S. Human auditory cortex is sensitive to the perceived clarity of speech. Neuroimage 60, 1490–1502 (2012).
Giraud, A.L. et al. Contributions of sensory input, auditory search and verbal comprehension to cortical activity during speech processing. Cereb. Cortex 14, 247–255 (2004).
Obleser, J., Eisner, F. & Kotz, S.A. Bilateral speech comprehension reflects differential sensitivity to spectral and temporal features. J. Neurosci. 28, 8116–8123 (2008).
Zatorre, R.J. & Belin, P. Spectral and temporal processing in human auditory cortex. Cereb. Cortex 11, 946–953 (2001).
Schönwiesner, M., Rübsamen, R. & von Cramon, D.Y. Hemispheric asymmetry for spectral and temporal processing in the human antero-lateral auditory belt cortex. Eur. J. Neurosci. 22, 1521–1528 (2005).
Boemio, A., Fromm, S., Braun, A. & Poeppel, D. Hierarchical and asymmetric temporal sensitivity in human auditory cortices. Nat. Neurosci. 8, 389–395 (2005).
Overath, T., Kumar, S., von Kriegstein, K. & Griffiths, T.D. Encoding of spectral correlation over time in auditory cortex. J. Neurosci. 28, 13268–13273 (2008).
Overath, T., Zhang, Y., Sanes, D.H. & Poeppel, D. Sensitivity to temporal modulation rate and spectral bandwidth in the human auditory system: fMRI evidence. J. Neurophysiol. 107, 2042–2056 (2012).
Greenberg, S. A multi-tier framework for understanding spoken language. in Listening to Speech: An Auditory Perspective (eds. Greenberg, S. & Ainsworth, W.A.) 411–433 (Lawrence Erlbaum, 2006).
Rosen, S. Temporal information in speech: acoustic, auditory and linguistic aspects. Phil. Trans. R. Soc. Lond. B 336, 367–373 (1992).
Efros, A.A. & Leung, T.K. Texture synthesis by non-parametric sampling. in IEEE Int. Conf. Comp. Vis. 1033–1038 (1999).
Grill-Spector, K. et al. A sequence of object-processing stages revealed by fMRI in the human occipital lobe. Hum. Brain Mapp. 6, 316–328 (1998).
Lerner, Y., Honey, C.J., Silbert, L.J. & Hasson, U. Topographic mapping of a hierarchy of temporal receptive windows using a narrated story. J. Neurosci. 31, 2906–2915 (2011).
Pallier, C., Devauchelle, A.-D. & Dehaene, S. Cortical representation of the constituent structure of sentences. Proc. Natl. Acad. Sci. USA 108, 2522–2527 (2011).
Abrams, D.A. et al. Decoding temporal structure in music and speech relies on shared brain resources but elicits different fine-scale spatial patterns. Cereb. Cortex 21, 1507–1518 (2011).
Giraud, A.L. et al. Representation of the temporal envelope of sounds in the human brain. J. Neurophysiol. 84, 1588–1598 (2000).
Harms, M.P., Guinan, J.J., Sigalovsky, I.S. & Melcher, J.R. Short-term sound temporal envelope characteristics determine multisecond time patterns of activity in human auditory cortex as shown by fMRI. J. Neurophysiol. 93, 210–222 (2005).
McDermott, J.H. & Simoncelli, E.P. Sound texture perception via statistics of the auditory periphery: evidence from sound synthesis. Neuron 71, 926–940 (2011).
Shannon, R.V., Zeng, F.G., Kamath, V., Wygonski, J. & Ekelid, M. Speech recognition with primarily temporal cues. Science 270, 303–304 (1995).
Davis, M.H. & Johnsrude, I. Hierarchical processing in spoken language comprehension. J. Neurosci. 23, 3423–3431 (2003).
Fedorenko, E., Hsieh, P.J., Nieto-Castanon, A., Whitfield-Gabrieli, S. & Kanwisher, N. New method for fMRI investigations of language: defining ROIs functionally in individual subjects. J. Neurophysiol. 104, 1177–1194 (2010).
Lashkari, D., Vul, E., Kanwisher, N.G. & Golland, P. Discovering structure in the space of fMRI selectivity profiles. Neuroimage 50, 1085–1098 (2010).
Formisano, E., De Martino, F., Bonte, M. & Goebel, R. “Who” is saying “what”? Brain-based decoding of human voice and speech. Science 322, 970–973 (2008).
Mesgarani, N., Cheung, C., Johnson, K. & Chang, E.F. Phonetic feature encoding in human superior temporal gyrus. Science 343, 1006–1010 (2014).
Kanwisher, N., McDermott, J. & Chun, M.M. The fusiform face area: a module in human extrastriate cortex specialized for face perception. J. Neurosci. 17, 4302–4311 (1997).
Price, C., Thierry, G. & Griffiths, T. Speech-specific auditory processing: where is it? Trends Cogn. Sci. 9, 271–276 (2005).
Schirmer, A., Fox, M.P. & Grandjean, D. On the spatial organization of sound processing in the human temporal lobe: a meta-analysis. Neuroimage 63, 137–147 (2012).
Ghitza, O. On the role of theta-driven syllabic parsing in decoding speech: intelligibility of speech with a manipulated modulation spectrum. Front. Psychol. 3, 238 (2012).
Rauschecker, J.P. Cortical processing of complex sounds. Curr. Opin. Neurobiol. 8, 516–521 (1998).
Norman-Haignere, S., Kanwisher, N. & McDermott, J.H. Cortical pitch regions in humans respond primarily to resolved harmonics and are located in specific tonotopic regions of anterior auditory cortex. J. Neurosci. 33, 19451–19469 (2013).
Belin, P., Zatorre, R.J., Lafaille, P., Ahad, P. & Pike, B. Voice-selective areas in human auditory cortex. Nature 403, 309–312 (2000).
Liebenthal, E., Desai, R.H., Humphries, C., Sabri, M. & Desai, A. The functional organization of the left STS: a large scale meta-analysis of PET and fMRI studies of healthy adults. Front. Neurosci. 8, 289 (2014).
Peelle, J.E. The hemispheric lateralization of speech processing depends on what “speech” is: a hierarchical perspective. Front. Hum. Neurosci. 6, 309 (2012).
Cogan, G.B. et al. Sensory-motor transformations for speech occur bilaterally. Nature 507, 94–98 (2014).
McGettigan, C. et al. An application of univariate and multivariate approaches in FMRI to quantifying the hemispheric lateralization of acoustic and linguistic processes. J. Cogn. Neurosci. 24, 636–652 (2012).
Voss, R.F. & Clarke, J. 1/f noise in music and speech. Nature 258, 317–318 (1975).
Attias, H. & Schreiner, C.E. Temporal low-order statistics of natural sounds. in Advances in Neural Information Processing Systems, Vol. 9 (eds. Mozer, M.C., Jordan, M.I. & Petsche, T.) 27–33 (MIT Press, 1997).
Meyer, M., Alter, K., Friederici, A.D., Lohmann, G. & von Cramon, D.Y. fMRI reveals brain regions mediating slow prosodic modulations in spoken sentences. Hum. Brain Mapp. 17, 73–88 (2002).
Humphries, C., Sabri, M., Lewis, K. & Liebenthal, E. Hierarchical organization of speech perception in human auditory cortex. Front. Neurosci. 8, 406 (2014).
Turken, A.U. & Dronkers, N.F. The neural architecture of the language comprehension network: converging evidence from lesion and connectivity analyses. Front. Syst. Neurosci. 5, 1 (2011).
Lau, E.F., Phillips, C. & Poeppel, D. A cortical network for semantics: (de)constructing the N400. Nat. Rev. Neurosci. 9, 920–933 (2008).
Petkov, C.I., Logothetis, N. & Obleser, J. Where are the human speech and voice regions, and do other animals have anything like them? Neuroscientist 15, 419–429 (2009).
Desmond, J.E. & Glover, G.H. Estimating sample size in functional MRI (fMRI) neuroimaging studies: Statistical power analyses. J. Neurosci. Methods 118, 115–128 (2002).
Moulines, E. & Charpentier, F. Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Commun. 9, 453–467 (1990).
Brainard, D.H. The psychophysics toolbox. Spat. Vis. 10, 433–436 (1997).
Friston, K.J. et al. Statistical parametric maps in functional imaging: a general linear approach. Hum. Brain Mapp. 2, 189–210 (1995).
Rademacher, J. et al. Probabilistic mapping and volume measurement of human primary auditory cortex. Neuroimage 13, 669–683 (2001).
Westbury, C.F., Zatorre, R.J. & Evans, A.C. Quantifying variability in the planum temporale: a probability map. Cereb. Cortex 9, 392–405 (1999).
Brett, M., Anton, J.-L., Valabregue, R. & Poline, J.-B. Region of interest analysis using an SPM toolbox (abstract). Neuroimage 16 (suppl. 2) (2002).
Acknowledgements
The authors thank K. Doelling for assistance with data collection, G. Lewis for extensive help with visualization of the results using FreeSurfer, E. Fedorenko for assistance with the parcellation algorithm, D. Ellis for implementing the PSOLA algorithm for segment concatenation, the volunteers who kindly allowed us to record their speech, T. Schofield, N. Kanwisher and J. Golomb for helpful discussions, and N. Ding, A.-L. Giraud, E. Fedorenko, S. Norman-Haignere and J. Simon for helpful comments on earlier drafts of the manuscript. This work was supported by US National Institutes of Health grant 2R01DC05660 to D.P., a GRAMMY Foundation Research Grant to J.M.Z., and a McDonnell Scholar Award to J.H.M.
Author information
Authors and Affiliations
Contributions
T.O., J.H.M., J.M.Z. and D.P. designed the experiments, interpreted the data and wrote the manuscript. J.H.M. designed the quilting algorithm and generated the stimuli. T.O. and J.M.Z. acquired the fMRI data. J.H.M. acquired the behavioral data. T.O. and J.H.M. analyzed the data.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Glassbrain projection of the group L30 > L960 functional localizer contrast.
The results are shown at a statistical significance threshold of p < 0.001 (uncorrected for multiple comparisons).
Supplementary Figure 2 Frequency power spectra for speech and modulation control sounds.
Frequency power spectra for speech (solid) and modulation control sounds (dashed) quilted with 30 ms (red) and 960 ms (blue) segment durations.
Supplementary Figure 3 Replicability across scanning sessions.
Replicability across scanning sessions for the 12 participants who were scanned between two and four times. The graphs plot the BOLD response, normalized by the response to the 960 ms functional localizer condition, in each participant's right- and left-hemisphere individual fROIs (red and blue, respectively) for each scanning session (dashed, dash-dotted, dotted lines) that included the original speech quilts (i.e., not compressed speech quilts). The solid line plots the average across scanning sessions. Note that the majority of participants exhibit a plateau at around the 480 ms segment duration.
Supplementary Figure 4 Clustering analysis of voxel response profiles (ref. 30).
a) A permutation test revealed that only the top-ranked cluster (out of nine) was statistically significant (red line segment). b) Response profile of the top-ranked cluster for the eight experimental conditions (L30, L960, S30 to S960). Note that the data were mean-centered, which is why the response profile is negative for conditions yielding a low response. c) Rendering of this cluster on coronal cross-sections of our participants' average structural image (y = -38, -30, -22, -14, -6, 2, 10), thresholded at a voxel-to-functional-system assignment probability of r ≥ 0.7.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–4 and Supplementary Tables 1 and 2 (PDF 3438 kb)
Rights and permissions
About this article
Cite this article
Overath, T., McDermott, J., Zarate, J. et al. The cortical analysis of speech-specific temporal structure revealed by responses to sound quilts. Nat Neurosci 18, 903–911 (2015). https://doi.org/10.1038/nn.4021
This article is cited by
- Auditory-motor synchronization and perception suggest partially distinct time scales in speech and music. Communications Psychology (2024)
- Spontaneous emergence of rudimentary music detectors in deep neural networks. Nature Communications (2024)
- A modality-independent proto-organization of human multisensory areas. Nature Human Behaviour (2023)
- Model metamers reveal divergent invariances between biological and artificial neural networks. Nature Neuroscience (2023)
- Joint, distributed and hierarchically organized encoding of linguistic features in the human auditory cortex. Nature Human Behaviour (2023)