The Auditory English Lexicon Project: A multi-talker, multi-region psycholinguistic database of 10,170 spoken words and nonwords

Goh, Winston D.; Yap, Melvin J.; Chee, Qian Wen

doi:10.3758/s13428-020-01352-0

Published: 14 April 2020

Volume 52, pages 2202–2231 (2020)

Behavior Research Methods

Abstract

The Auditory English Lexicon Project (AELP) is a multi-talker, multi-region psycholinguistic database of 10,170 spoken words and 10,170 spoken nonwords. Six tokens of each stimulus were recorded as 44.1-kHz, 16-bit, mono WAV files by native speakers of American, British, and Singapore English, with one female and one male talker for each variety. Intelligibility norms, as determined by average identification scores and confidence ratings from between 15 and 20 responses per token, were obtained from 561 participants. Auditory lexical decision accuracies and latencies, with between 25 and 36 responses per token, were obtained from 438 participants. The database also includes a variety of lexico-semantic variables and structural indices for the words and nonwords, as well as participants’ individual difference measures such as age, gender, language background, and proficiency. Taken together, the AELP contains a total of 122,040 sound files and over 4 million behavioral data points. We describe some of the characteristics of this database. This resource is freely available from a website (https://inetapps.nus.edu.sg/aelp/) hosted by the Department of Psychology at the National University of Singapore.
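The file count in the abstract follows directly from the design: 10,170 words plus 10,170 nonwords, each recorded by six talkers (three regional varieties, one female and one male talker each), gives 20,340 × 6 = 122,040 files. The minimal sketch below checks that arithmetic and shows one way to confirm that a downloaded token has the stated format (44.1 kHz, 16-bit, mono WAV) using Python's standard `wave` module. The region and gender labels and the helper names are illustrative assumptions, not part of the AELP's own file-naming scheme or tools.

```python
import itertools
import wave

# Design as described in the abstract; the label strings are illustrative only.
REGIONS = ("American", "British", "Singapore")
GENDERS = ("female", "male")
N_WORDS = 10_170
N_NONWORDS = 10_170


def expected_file_count() -> int:
    """Every stimulus is recorded once by each region-by-gender talker."""
    talkers = list(itertools.product(REGIONS, GENDERS))  # 3 x 2 = 6 talkers
    return (N_WORDS + N_NONWORDS) * len(talkers)


def matches_aelp_format(path: str) -> bool:
    """True if the WAV file is 44.1 kHz, 16-bit (2 bytes per sample), mono."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == 44_100
            and wav.getsampwidth() == 2
            and wav.getnchannels() == 1
        )


if __name__ == "__main__":
    print(expected_file_count())  # 122040
```

Checking the header with `wave` rather than trusting file extensions is a cheap safeguard when stimuli are batch-downloaded and later resampled or converted by other software.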

Recognizing printed and spoken words, which people do with remarkable ease, is one of the most impressive and important things humans do. Consequently, the processes underlying isolated word recognition and processing have been extensively studied (Balota, Yap, & Cortese, 2006; Dahan & Magnuson, 2006). Words are also among the most commonly used stimuli in cognitive and experimental psychology (Balota et al., 2007). Researchers have accumulated a great deal of information about how the different statistical properties of words (e.g., frequency of occurrence, imageability, number of letters or phonemes) influence how quickly and accurately people can recognize words, and how they influence other cognitive processes, such as memory.

However, the overwhelming majority of experiments that have used word stimuli have focused on the processing of printed words. Methodologically, developing and presenting spoken word stimuli is far more labor-intensive and complex than developing and presenting printed ones. For example, each auditory token must be recorded by one or more speakers, and the resulting sound file must be edited to isolate the word, normalized, and tested for intelligibility before it can be used. In this light, it is perhaps unsurprising that empirical and theoretical developments have been more rapid and extensive in visual than in auditory word recognition research (see also Tucker, Brenner, Danielson, Kelley, Nenadić, & Sims, 2019). Notably, the English Lexicon Project (ELP; Balota et al., 2007), a behavioral and descriptive repository of visual word recognition data, has contributed to these developments.
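This passage lists normalization as an editing step without specifying a method. As one illustration only, the sketch below applies peak normalization to 16-bit PCM samples: every sample is multiplied by the same gain so that the loudest sample reaches a chosen fraction of full scale. The function name and the default target of 90% of full scale are assumptions made for this example; RMS (loudness) normalization is a common alternative when tokens should sound equally loud rather than share the same peak.

```python
from array import array

INT16_MIN = -32768  # most negative signed 16-bit sample value
INT16_MAX = 32767   # most positive signed 16-bit sample value


def peak_normalize(samples: array, target: float = 0.9) -> array:
    """Scale signed 16-bit samples so the largest absolute sample becomes
    target * INT16_MAX. Silent input is returned unchanged (as a copy)."""
    if samples.typecode != "h":
        raise ValueError("expected signed 16-bit samples (typecode 'h')")
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return array("h", samples)
    gain = target * INT16_MAX / peak
    # Round each scaled sample and clip it to the valid 16-bit range.
    return array(
        "h",
        (max(INT16_MIN, min(INT16_MAX, round(s * gain))) for s in samples),
    )
```

Because a single gain is applied to the whole token, relative amplitudes within the word are preserved; only the overall level changes.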

This paper describes the Auditory English Lexicon Project (AELP), which was conceived to address these constraints by developing a very large and well-characterized set of auditory word and nonword tokens that have been rigorously normed for intelligibility. These tokens are freely available to the research community via a website (https://inetapps.nus.edu.sg/aelp/) and can be used in any experiment involving the presentation of spoken words and/or nonwords. In the following sections, we briefly review the theoretical importance of auditory word processing for understanding cognitive processes, existing spoken word databases, and the megastudy approach and recent auditory megastudies, before turning to the AELP.

Auditory word processing

Listening and reading share essentially the same goal, retrieving the meaning of the stimulus, but effects do not always generalize across modalities, suggesting that the mechanisms underlying lexical processing may differ fundamentally depending on the medium. For example, dense phonological neighborhoods consistently slow spoken word processing, whereas orthographic neighborhoods exert inconsistent effects in visual word recognition (Andrews, 1997). Semantic richness effects, the general finding that words with richer semantic representations are processed more efficiently (Pexman, 2012), have been shown to be smaller in auditory than in visual word recognition (Goh, Yap, Lau, Ng, & Tan, 2016). These dissociations between visual and spoken word recognition suggest that recognizing speech may focus first on resolving phonological similarities (Goh, Suárez, Yap, & Tan, 2009; Luce & Pisoni, 1998), so that any advantage for semantically richer words is attenuated in the face of greater word-form competition.
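To make the notion of a phonological neighborhood concrete: under the definition commonly used in this literature (e.g., Luce & Pisoni, 1998), two words are neighbors if one can be turned into the other by substituting, adding, or deleting a single phoneme, and a word's neighborhood density is the number of such neighbors in the lexicon. The sketch below counts neighbors over a tiny, made-up lexicon; the ASCII phoneme codes and the function names are hypothetical, and this is not a description of how the AELP computed its own neighborhood measures.

```python
from collections.abc import Sequence


def is_neighbor(a: Sequence[str], b: Sequence[str]) -> bool:
    """True if a and b differ by exactly one phoneme substitution,
    addition, or deletion. Identical words are not neighbors."""
    if len(a) == len(b):
        # Substitution: exactly one position differs.
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    # Addition/deletion: dropping one phoneme from the longer word
    # must yield the shorter word.
    return any(
        tuple(longer[:i]) + tuple(longer[i + 1:]) == tuple(shorter)
        for i in range(len(longer))
    )


def neighborhood_density(word: Sequence[str], lexicon: list[Sequence[str]]) -> int:
    """Number of lexicon entries that are phonological neighbors of word."""
    return sum(is_neighbor(word, other) for other in lexicon)


# Hypothetical toy lexicon: one string per phoneme, English gloss in comments.
LEXICON = [
    ("k", "ae", "t"),       # cat
    ("b", "ae", "t"),       # bat
    ("k", "ae", "p"),       # cap
    ("k", "ow", "t"),       # coat
    ("ae", "t"),            # at
    ("s", "k", "ae", "t"),  # scat
    ("d", "ao", "g"),       # dog
]

print(neighborhood_density(("k", "ae", "t"), LEXICON))  # 5
print(neighborhood_density(("d", "ao", "g"), LEXICON))  # 0
```

In this toy lexicon "cat" sits in a dense neighborhood (bat, cap, coat, at, scat) while "dog" has no neighbors; on the account summarized above, the denser neighborhood means more word-form competitors during spoken word recognition.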

Research has also shown that speech perception may be a talker-contingent process (Nygaard, Sommers, & Pisoni, 1994), and that indexical properties of spoken words – gender, accent, and other unique aspects of the talker’s voice – are encoded and retained in memory (Goh, 2005; Goldinger, 1996b). Talker variability in the input enhances perceptual learning and word recognition in both adults (Logan, Lively, & Pisoni, 1991; Pisoni & Lively, 1995) and infants (Singh, 2008). These findings implicate the encoding of indexical information in long-term memory and provide support for an episodic mental lexicon (Goldinger, 1998).

In other cognitive domains, the short-term memory (STM) literature documents a well-known auditory advantage, with several findings implicating the primacy of auditory codes in STM. Examples include better memory for auditorily than visually presented lists, especially in the recency region (the modality effect; Crowder, 1971; Penney, 1989); attenuation of the recency effect when an irrelevant speech sound is played at the end of list presentation (the suffix effect; Crowder & Morton, 1969); and fewer false memories for auditorily than visually presented lists of semantic associates (Olszewska, Reuter-Lorenz, Munier, & Bendler, 2015), but the reverse for phonological associates (Lim & Goh, 2019).

These selected examples highlight important findings that distinguish studies using auditory stimuli from those using visual stimuli, as well as findings that emerge when auditory tokens are produced by multiple talkers. They point to the utility of having a large and easily accessible database of auditory tokens for experimental research.

Spoken word databases

As noted earlier, a significant bottleneck in auditory word recognition research is the difficulty of developing auditory stimuli. The vast majority of existing speech databases comprise recordings of sentences, connected speech, and dialogue (e.g., the TIMIT Acoustic-Phonetic Continuous Speech Corpus, Garofolo et al., 1993; the British National Corpus, 2007). These are generally unsuitable for research using isolated spoken words. Some large isolated-word databases are tied to very specific contexts (e.g., 3,000 names of Japanese railroad stations; Makino, Abe, & Kido, 1988). Hence, many researchers who use auditory tokens prepare their own stimuli from scratch for most new studies.

In 2014, at the initial stages of the current project, there were no large spoken word databases readily available. Since then, three have been published and are summarized in Table 1.

Table 1 List of large spoken word databases published after 2014

Appendix 1 Familiarity rating scale

Score	Label
1	I have never seen the word before.
2	I think that I might have seen the word somewhere before.
3	I am somewhat sure that I have seen the word before, but am not certain.
4	I have definitely seen the word before, but I don’t know its meaning.
5	I am certain that I have seen the word before, but only have a vague idea about its meaning.
6	I think I might know the meaning of the word, but am not certain that the meaning I know is correct.
7	I recognize the word and am confident that I know at least one meaning.
