Introduction

In the last decade, a growing body of research on the relationships between language and affect has shown that the emotional content of words affects comprehension processes, challenging semantic models of word recognition and text comprehension that typically have not considered this important aspect (Jacobs, 2011; Jacobs et al., 2015). More specifically, emotionally laden words are recognized faster and have processing priority when compared to neutral words. Emotionally laden words also show larger amplitudes of the event-related potential (ERP) components associated with emotional-stimulus processing; furthermore, their processing is subserved by a network of brain regions functionally associated with affective states, as has been revealed by brain-imaging studies (for reviews, see Citron, 2012; Kissler, Assadollahi, & Herbert, 2006). Texts containing emotional information specifically activated the bilateral ventromedial prefrontal cortex (vmPFC) and the left amygdala, both regions associated with emotion processing, whereas texts containing chronological or spatial information activated different and distinct networks (Ferstl, Rinck, & von Cramon, 2005; Ferstl & von Cramon, 2007). In addition, emotionally negative texts activated brain regions associated with theory of mind as well as the vmPFC more strongly than did texts containing emotionally neutral information (Altmann, Bohrn, Lubrich, Menninghaus, & Jacobs, 2012).

According to dimensional models of affect, valence describes the extent to which a stimulus is positive or negative, and arousal refers to its degree of physiological activation (i.e., how calming or exciting/agitating a stimulus is; Lang, Bradley, & Cuthbert, 1997; Reisenzein, 1994; Russell, 2003). These dimensions typically show a quadratic relationship, whereby highly positive and negative stimuli are also highly arousing, whereas emotionally neutral stimuli tend to be low in arousal (e.g., Bradley & Lang, 1999; Võ et al., 2009). Several single-word corpora have also suggested that negative word meanings are more arousing than positive ones (e.g., Citron, Weekes, & Ferstl, 2014b; Schmidtke, Schröder, Jacobs, & Conrad, 2014; Võ et al., 2009). Although correlated, these two dimensions also show partial distinction, as evidenced by rating as well as neuroimaging studies (cf. Citron et al., 2014b; Lewis, Critchley, Rotshtein, & Dolan, 2007).

Emotion and figurative language

To date, most of the psychological and neuroscientific research on the relationship between language and emotion has been centered on literal language, despite the pervasiveness of nonliteral expressions in everyday communication. In fact, estimates based on simple frequency counts showed that people use approximately six nonliteral expressions per minute of discourse (Pollio, Barlow, Fine, & Pollio, 1977). Furthermore, at least in American English, there seem to be as many fixed expressions as there are words (Jackendoff, 1995). Nevertheless, the role of figurative language in conveying affect is still poorly understood (Citron & Goldberg, 2014; Fainsilber & Ortony, 1987; Fussell & Moss, 1998; Schrott & Jacobs, 2011).

Generally speaking, a word’s or sentence’s meaning is considered figurative if the conveyed meaning differs from the literal meaning assigned to the word(s)—that is, when the speaker/hearer has to go beyond the conventional word meaning and construct the intended sentential meaning by also using knowledge stored in semantic memory: For example, the interpretation of She had a rough day requires the reader to assign to rough the nonliteral meaning difficult or straining, rather than the literal reference to texture. Figurative language is formed by a variety of different types of expressions (e.g., metaphors, proverbs, idioms, and oxymora). The present study tested the affective and psycholinguistic characteristics of idioms (e.g., to spill the beans), which are in many languages the most frequent instance of figurative language. Idioms are strings of words whose global meaning cannot generally be inferred solely on the basis of the meaning of the constituent words, and therefore has to be retrieved from semantic memory. The relationship between lexical items and phrasal meaning is to a large extent arbitrary and learned, although this does not imply that individual lexical items do not constrain the semantic and syntactic operations that an idiom can undergo while still retaining a figurative interpretation (for an overview, see Cacciari, 2014). The idiomatic meaning and the default idiom structure are stored in long-term semantic memory together with word meanings, concepts, and many other types of multiword strings. Idioms are different from metaphors (although some idioms can diachronically come from metaphors), since metaphors (even the most frozen ones) do not possess a unique standardized meaning and can convey more than one meaning depending on context. This also occurs in highly conventionalized metaphors such as Bob is an elephant, which can mean that he is clumsy, extremely big, a blunderer, and so forth.
Idioms indeed have a unique meaning that can be specialized but not radically modified by context. As Konopka and Bock (2009) pointed out, speakers cannot retrieve and productively combine words online to create an idiomatic expression. Some idioms allow for some forms of variation—for example, She did not spill a single bean, The beans have been spilled, and I’m going to spill the beans, none of which change the idiomatic meaning of “to reveal a secret.” However, this should not be confounded with true metaphorical language. In contrast, we can create a metaphor on the fly, although not necessarily an apt one. Basically, metaphors concern categorization processes, whereas idioms require meaning retrieval from semantic memory (Cacciari & Glucksberg, 1994; Cacciari & Papagno, 2012; Glucksberg, 2001). Idioms, then, differ from proverbs—for example, You can’t get blood from a stone or Two wrongs don’t make a right—since proverbs are temporally undefined full sentences, signaled by specific grammatical, phonetic, and/or rhetorical patterns, or by a binary structure (theme/comment); in general, proverbs are literally and figuratively true statements (Ferretti, Schwint, & Katz, 2007; Turner & Katz, 1997).

Why do we use figurative rather than—or together with—literal language to speak about affect? This seemingly easy question has received few answers. Pioneering work by Fainsilber and Ortony (1987) has shown that figurative language is preferred to literal language in oral descriptions of autobiographical emotional experiences. In particular, participants used significantly more metaphors for describing how they felt during a specific event than what they did in the same circumstances. Furthermore, participants used more figurative expressions when asked to describe emotionally intense events than mildly intense ones (Fainsilber & Ortony, 1987; Fussell & Moss, 1998). Discourse analysis has shown that idiomatic expressions are preferred when speakers express complaints (Drew & Holt, 1988, 1998), presumably to elicit empathy in the addressee and thus become more convincing. Specifically, speakers were more likely to use idiomatic than literal expressions when summarizing their complaints in the presence of nonempathic interlocutors (Drew & Holt, 1988) and in topic transitions (Drew & Holt, 1998). Recent brain-imaging evidence has shown that nonliteral sentences evoked stronger implicit emotional responses than literal sentences (Bohrn, Altmann, & Jacobs, 2012). Similarly, a study on taste metaphors showed that metaphorical sentences elicited enhanced activation of the amygdala compared to their literal counterparts, which were matched for valence and arousal (Citron & Goldberg, 2014).

A few neurocognitive studies to date have controlled the affective characteristics of figurative as compared to literal language, with a predominant interest in metaphors (Bohrn, Altmann, Lubrich, Menninghaus, & Jacobs, 2012; Citron & Goldberg, 2014; Forgács et al., 2012; Forgács, Lukács, & Pléh, 2014). However, to the best of our knowledge, no study has yet provided descriptive norms of affective variables for the most common among figurative expressions—namely, idioms. To start filling this gap, the present descriptive study offers norms of affective and psycholinguistic properties for a set of 619 German idiomatic expressions (see also Fellbaum & Geyken, 2005, for a linguistically descriptive database). These data provide a structured tool for selecting experimental stimuli for future studies investigating the role of affect in nonliteral language. We chose idioms rather than other types of figurative expressions because idioms are frequent, highly conventionalized nonliteral strings of words with shared meanings. Thus, native speakers of a target language can easily rate many of their properties. Furthermore, these expressions are often semantically and syntactically flexible and can be embedded in different contexts without losing or changing their core meaning, and this allows idioms to be employed in many different experimental designs.

Descriptive norms of idiomatic expressions already exist for a few languages, including American English (Abel, 2003; Cronk, Lima, & Schweigert, 1993; Libben & Titone, 2008; Titone & Connine, 1994b), Italian (Tabossi, Arduino, & Fanari, 2011), and French (Bonin, Méot, & Bugaiska, 2013; Caillies, 2009). These databases provide ratings for the psycholinguistic properties of idioms (e.g., ambiguity, familiarity, and knowledge of the idiomatic meaning), and some studies also include reaction time data from different tasks—namely, online reading comprehension (Bonin et al., 2013), offline and online meaningfulness judgment (Libben & Titone, 2008), and self-paced reading (Cronk et al., 1993). However, these studies have not considered whether idiomatic meanings conveyed an affective content (differentiated for valence and arousal), or whether their meaning was concrete or abstract. Hence, the present study not only offers norms for the main psycholinguistic variables affecting idiom processing in German, but also provides new data on variables ignored in prior normative studies. In the present study, we also aimed to explore the relationships within affective variables and between these variables and the psycholinguistic norms.

The psycholinguistic characteristics of idioms

Several models have been proposed to account for idiom comprehension (for overviews, see Cacciari, 2014; Cacciari, Padovani, & Corradini, 2007; Libben & Titone, 2008; Titone & Connine, 1994a). According to lexical look-up models, idioms are fixed expressions listed in the mental lexicon, either together with other lexical units (Swinney & Cutler, 1979) or in a separate list (Bobrow & Bell, 1973). In this model, linguistic processing of the string and retrieval of the idiomatic meaning proceed in parallel, with the retrieval of the idiomatic meaning being faster than the computation of its literal meaning. According to the configuration hypothesis (Cacciari & Tabossi, 1988; Vespignani, Canal, Molinaro, Fonda, & Cacciari, 2009), idioms are processed word by word, like any other piece of language, until enough information has accumulated to render the sequence of words identifiable as—or highly expected to be—an idiom. At this point, the idiomatic meaning is retrieved.

The recent studies on idiom processing that led to nonlexical models highlighted that idioms differ in many respects, and that studies on idiom comprehension must take this variability into account in order to satisfactorily account for their comprehension. In fact, many psycholinguistic properties have been shown to affect idiom processing (for overviews, see Cacciari, 2014; Cacciari & Glucksberg, 1994; Libben & Titone, 2008). In this study, we collected descriptive norms of the most important of these variables, together with norms for concreteness, valence and arousal of the idiomatic meaning. Below we provide a definition for each of the psycholinguistic and affective characteristics investigated (familiarity, knowledge, confidence, ambiguity, semantic transparency, figurativeness, concreteness, valence, and arousal), accompanied by a summary of previous results. The details concerning data collection are presented in the “Method” section.

  1. (a)

    Familiarity refers to the subjective frequency of exposure to idioms (Titone & Connine, 1994b)—namely, how often one has read or heard an idiom. Familiarity may differ from objective frequency estimates, which are based on written and spoken databases. Subjective frequency estimates may provide information that helps the choice of the experimental materials of idiom studies for several reasons: (1) Idiomatic expressions are not necessarily processed word by word; (2) very common idioms may contain nonfrequent or old-fashioned words but nonetheless be understood very quickly and easily; (3) an idiom’s meaning is not necessarily associated with the meaning of its constituent words, and as such, the frequency of its constituent words may play less of a role than in the case of literal sentences. Highly familiar idioms have been shown to be comprehended faster and more accurately than less familiar ones (Cronk et al., 1993; Cronk & Schweigert, 1992; Libben & Titone, 2008). Previous norming studies reported highly positive correlations between familiarity, intended as subjective frequency of exposure, other-based familiarity [i.e., an estimate of how well others know the idiom; Tabossi et al. (2011) and Bonin et al. (2013) operationally defined this notion as “how well you think the expression is known by people like you, independently of whether you know it” (Tabossi et al., 2011, p. 115)], confidence about one’s own knowledge, and knowledge of the idiomatic meaning (Bonin et al., 2013; Libben & Titone, 2008; Tabossi et al., 2011; Titone & Connine, 1994b). In general, the vast majority of idioms are estimated to have been acquired at approximately 9 years of age (Libben & Titone, 2008; Tabossi et al., 2011). Familiar and well-known idioms are also estimated to have been acquired earlier. However, Bonin et al. 
(2013) reported that the estimated age of acquisition was a better predictor of comprehension times than familiarity, with faster reaction times to idioms that were acquired earlier.

    The subjective estimate of the frequency of exposure to idiomatic expressions is a better predictor of idiom processing than is a measure of frequency obtained by combining the single frequencies of an idiom’s constituent words (Bonin et al., 2013; Libben & Titone, 2008). This may reflect the fact that idiomatic meanings are often arbitrarily related to the meanings of their constituent words and that sometimes familiar idioms contain words that are no longer used out of the idiomatic context. For example, Flausen is only used in Flausen im Kopf haben, meaning to have nonsense/weird ideas in mind in German. Finally, the reliability of estimates of other-based familiarity (Bonin et al., 2013; Tabossi et al., 2011) can be problematic, since it is more likely that participants can reliably estimate their own frequency of exposure to an idiomatic expression than how well other people know such expressions (Cronk et al., 1993; Libben & Titone, 2008; Titone & Connine, 1994b).

  2. (b)

    Knowledge of idiomatic meaning refers to whether or not the correct idiomatic meaning is known. Some studies (Tabossi et al., 2011) tested this variable by asking participants to provide a written explanation of the idiomatic meaning, whereas others measured the extent to which participants were confident about their own knowledge of the idiomatic meaning (Bonin et al., 2013; Libben & Titone, 2008; Titone & Connine, 1994b). However, the latter procedure does not necessarily provide a reliable measure of the actual idiom knowledge. In fact, speakers may be very confident about their incorrect knowledge of an idiomatic meaning, particularly for less-familiar idioms. Since knowledge of the idiomatic meaning and confidence do differ, we tested these two variables separately. Confidence about the knowledge of the idiomatic meaning was rated before participants wrote down a definition of the idiom meaning. Confidence about one’s own knowledge as well as other-based familiarity have been shown to speed up online reading comprehension times (Bonin et al., 2013).

  3. (c)

    Ambiguity (also referred to as literality) refers to whether an idiom string also has a semantically plausible literal meaning (Cronk et al., 1993). In fact, some idioms are ambiguous, since they have both a literally plausible and an idiomatic meaning (e.g., kick the bucket can describe a literally plausible action, beyond the idiomatic meaning “to die”). In some normative studies (e.g., Bonin et al., 2013; Tabossi et al., 2011), participants have been asked to rate how often they came across an idiom used in a literal sense. To avoid any bias due to the preponderant figurative use of idiom strings, we did not ask participants to provide ratings of ambiguity. Rather, the experimenters divided the German idioms into ambiguous and unambiguous idioms on the basis of the presence versus absence of a semantically plausible literal meaning. Idiom ambiguity generally shows a less consistent pattern of correlations with other variables across studies: Ambiguity correlated negatively with other-based familiarity (Tabossi et al., 2011) and with confidence (Libben & Titone, 2008), and positively with subjective frequency (Bonin et al., 2013; Cronk et al., 1993). This suggests that participants rated the literal meaning as being more plausible when others were supposed to be familiar with the idiomatic meaning, and also when idioms had a high subjective frequency. Idiom ambiguity significantly predicted accuracy (of participants’ meaningfulness judgments), in that responses were more accurate to idioms with literally plausible meanings than to idioms without a literal counterpart (Libben & Titone, 2008). However, it still is unclear whether literally plausible and literally implausible idioms are comprehended with similar ease and through the same processes (Cacciari, 2014). 
Reaction time studies have suggested that the figurative meaning of literally plausible idioms is accessed faster than that of implausible idioms (Cronk & Schweigert, 1992; Libben & Titone, 2008; Mueller & Gibbs, 1987). This has been supported by a case study on semantic dementia (Papagno & Cacciari, 2010) and a study on aphasic patients (Papagno & Caporali, 2007), but has been contradicted by a different study on aphasic patients (Cacciari et al., 2006) that reported more impaired comprehension of literally plausible than of implausible idioms, possibly due to difficulty in inhibiting the literal meaning.

    In the present study, we specifically tested whether the psycholinguistic and affective properties of literally plausible and implausible idioms differed and/or showed different patterns of correlations.

  4. (d)

    Semantic transparency refers to the ease with which the figurative meaning of an idiom can be inferred from the constituent words’ literal meanings. For example, keep in touch is a relatively transparent idiom, since its figurative meaning “maintain social contact with someone” can be easily inferred from the constituent words. In contrast, the idiomatic meaning of kick the bucket is opaque, and the figurative meaning “to die” must be learned. This variable is quite problematic and unstable, because transparency estimates are based on readers’/listeners’ intuitions that derive from the knowledge of the idiomatic meaning. In fact, familiar idioms tend to be perceived as more transparent than unfamiliar ones, because they are repeatedly used with that stipulated meaning (Keysar & Bly, 1995). Studies reporting faster responses to semantically transparent, or decomposable, idioms than to nontransparent ones (Gibbs, Nayak, & Cutting, 1989) have predominantly used offline paradigms, which measure late interpretative phases rather than real-time comprehension processes (Tabossi, Fanari, & Wolf, 2008). Some studies have reported that the more semantically transparent an idiom is, the more familiar (Abel, 2003), the less ambiguous (Libben & Titone, 2008), and the better known (Tabossi et al., 2011) it seems to be. However, no correlation between semantic transparency and familiarity was found by Tabossi et al. (2011). This lack of consensus may also be due to the high variability in individual participants’ ratings of semantic transparency (Cacciari & Glucksberg, 1995; Levorato & Cacciari, 1999; Tabossi et al., 2011).

  5. (e)

    Figurativeness refers to the extent to which an idiomatic expression is perceived as expressing a nonliteral meaning. For instance, the Italian idiom dormire come un ghiro (“to sleep as a dormouse,” in English to sleep like a log) denotes a way of sleeping—an action literally expressed by the verb—and may be perceived as less figurative than idioms in which the verb changes its basic meaning, as for instance in to get cold feet (i.e., to become afraid and to refrain from doing something). Although this variable has not been considered in previous normative studies on idioms, we decided to include it in order to investigate whether the perceived degree of idiomaticity correlates with other variables.

  6. (f)

    Length is measured by either the number of composing words or the number of letters. Evidence has shown that, all other things being equal, the meaning of short idioms (i.e., with few words) is not yet available at the string offset, unless the prior context creates a bias toward an idiomatic interpretation. In contrast, the idiomatic meaning of long idioms is available at the string offset, regardless of the preceding context (Fanari, Cacciari, & Tabossi, 2010). Idiom length (in letters) significantly explained most of the variance in reading times (Bonin et al., 2013).

  7. (g)

    Concreteness refers to the extent to which an idiomatic meaning refers to a state or event that one can experience in one or more sensory modalities (cf. Paivio, 2007; Paivio, Yuille, & Madigan, 1968). This characteristic has not been assessed in previous normative studies of idioms, despite the fact that a vast literature has shown that concrete words are more easily accessed and processed than abstract words (e.g., Adorni & Proverbio, 2012; Zhang, Guo, Ding, & Wang, 2006). Concreteness is sometimes confused with imageability, which instead refers to the ability to create a mental image of a word (Paivio, 2007). Imageability also facilitates word processing (e.g., Sabsevitz, Medler, Seidenberg, & Binder, 2005). Emotionally valenced abstract words are rated as more imageable than neutral abstract words (Altarriba & Bauer, 2004). Imageability differs from concreteness in that even abstract concepts may be imageable (e.g., joy), whereas some concrete concepts (e.g., sloth) may be less so. Imageability and concreteness are usually positively correlated, and most of the variance they explain tends to overlap. We decided to measure only concreteness, since idiomatic meanings may be rather difficult to imagine in their nonliteral sense, due to the interference of the literal meaning of the constituent words (Cacciari & Glucksberg, 1995).

  8. (h)

    Emotional valence describes the extent to which a stimulus is positive or negative (Russell, 1980). Since normative studies on idiomatic expressions have not rated this variable (and arousal), we briefly review the literature on single words. Once a range of psycholinguistic variables have been controlled, emotionally valenced words have processing priority as compared to neutral words, leading to faster reaction times and higher accuracy in a variety of tasks (e.g., Citron, Weekes, & Ferstl, 2014a; Kousta, Vinson, & Vigliocco, 2009; Larsen, Mercer, & Balota, 2006). Furthermore, emotionally valenced words elicit a larger amplitude of ERP components associated with the processing of emotional stimuli (i.e., the early posterior negativity (EPN) and the late positive component (LPC)). Such words also elicit enhanced activation of brain regions associated with emotion processing (for an overview, see Citron, 2012). The results concerning the polarity of valence (positive vs. negative) have been mixed. Some studies reported processing facilitation and enhanced brain activity in response to positive but not to negative words, once the level of arousal had been matched (Citron, Gray, Critchley, Weekes, & Ferstl, 2014; Herbert et al., 2009; Kuchinke et al., 2005; Recio, Conrad, Hansen, & Jacobs, 2014). In contrast, other studies reported no difference between positive and negative words (Citron et al., 2014a; Kousta et al., 2009; Larsen et al., 2006), unless a block design was used (Algom, Chajut, & Lev, 2004; Nasrallah, Carmel, & Lavie, 2009).

  9. (i)

    Emotional arousal describes the excitation potential of a stimulus, independently of whether it is positive or negative (Barrett & Russell, 1998). Arousal ratings of words typically show a quadratic, U-shaped relationship with valence ratings (e.g., Bradley & Lang, 1999; Võ et al., 2009): The more emotionally valenced a word is, the more arousing it typically is. However, note that this overall U-shaped distribution involves a particularly strong negative linear correlation of arousal with valence within the domain of negative words, which sometimes leads to an overall negative linear correlation (e.g., Citron et al., 2014b; Schmidtke et al., 2014; Võ et al., 2009). Highly arousing words are processed faster and more accurately and elicit stronger neural responses than nonarousing words, when valence is kept constant (Bayer, Sommer, & Schacht, 2012; Hofmann, Kuchinke, Tamm, Võ, & Jacobs, 2009; Recio et al., 2014). Nevertheless, emotional valence seems to be a stronger predictor of response speed and accuracy than arousal (Estes & Adelman, 2008; Kousta et al., 2009).

The present study

The aims of this study were (1) to provide descriptive norms for psycholinguistic and affective properties of a large set of German idioms, and (2) to explore the relationships between psycholinguistic and affective properties of idioms. Toward this aim, 624 idioms (see Table 1 for examples) were rated, using Likert scales, for emotional valence, arousal, familiarity, semantic transparency, figurativeness, and concreteness. Knowledge of the idiomatic meaning was assessed by asking participants to write down an explanation of each idiom’s meaning and then to rate their confidence. Ambiguity was categorically determined by the experimenters.

Table 1 Examples of idioms from our database

In terms of the psycholinguistic variables tested in previous normative studies, we expected to replicate the positive correlations between familiarity, knowledge, and confidence (Bonin et al., 2013; Libben & Titone, 2008; Tabossi et al., 2011). Since concreteness, figurativeness, length in words and in letters, and valence and arousal have not yet been tested for idioms, we did not have a priori predictions, but rather we explored their possible correlations. We also tested for the first time whether the properties of ambiguous and unambiguous idioms differ, and how they are correlated. This might help clarify which psycholinguistic properties underlie their differences, if any. Finally, as for the relationships between affective variables, we expected to replicate the results based on single words (e.g., Bradley & Lang, 1999; Citron et al., 2014b; Schmidtke et al., 2014)—namely, a quadratic relationship between valence and arousal (i.e., the more highly valenced an idiom, the more arousing it is) and a negative linear relationship (i.e., negative idioms are rated as more arousing than positive idioms).
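
The two predictions about valence and arousal can be written compactly as a second-order polynomial regression. This is only an illustrative sketch of a conventional way to express such a relationship, not necessarily the analysis carried out in the present study; the coefficient symbols and the item index i are notation introduced here for illustration.

```latex
\text{arousal}_i = \beta_0 + \beta_1\,\text{valence}_i + \beta_2\,\text{valence}_i^{2} + \varepsilon_i
```

With valence coded so that 0 is neutral, the quadratic (U-shaped) prediction corresponds to \beta_2 > 0, that is, arousal rising toward both the very negative and the very positive ends of the scale. The negative linear prediction corresponds to \beta_1 < 0, that is, negative idioms being rated as more arousing than positive ones.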

Method

Materials

Idiom selection

A total of 624 idiomatic expressions were selected from different Web sources (http://german.about.com, www.spruecheportal.de, and www.staff.uni-marburg.de/~naeser/idiom-ak.htm; Udem, 2001) and from a list of figurative expressions collected by Verena Simon during the “Bilingualism and Affectivity in Reading” project at the Cluster of Excellence “Languages of Emotion.” The criteria for identifying a figurative expression as an idiom were as follows: It consists of a verb phrase (VP) with one or more arguments—for example, to spill (VP) the beans (dir obj), to give (VP) someone (indir obj) a hard time (dir obj); the verb can be inflected for person and time; its meaning is conventionalized (to distinguish it from a metaphor); and it should not be formed by an entire sentence that cannot be altered, as in proverbs—for example, A man’s home is his castle. Because we expected that the variables to be rated (i.e., emotional valence, arousal, familiarity, concreteness, figurativeness, semantic transparency, confidence, and knowledge) would generalize to inflected forms, we only presented idioms in the standard, infinitival form. Lengths in letters and in words were calculated with Excel. The lengths of the 619 idioms that were retained (please refer to the Data Analysis section) ranged from two to nine words, and from nine to 43 letters.
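
The two length measures can be reproduced outside a spreadsheet. The following is a minimal sketch in Python rather than the Excel procedure used here; the function name `idiom_lengths` is hypothetical, and it assumes that words are separated by whitespace and that only alphabetic characters count as letters (the exact counting rule is not specified in the text).

```python
def idiom_lengths(idiom: str) -> tuple[int, int]:
    """Return (number of words, number of letters) for an idiom string."""
    # Assumption: words are whitespace-separated tokens.
    words = idiom.split()
    # Assumption: spaces, hyphens, and punctuation are not counted as letters.
    letters = sum(ch.isalpha() for ch in idiom)
    return len(words), letters


print(idiom_lengths("to spill the beans"))      # (4, 15)
print(idiom_lengths("Flausen im Kopf haben"))   # (4, 18)
```

`str.isalpha` also accepts German umlauts and ß, so the same count applies to the German idiom strings.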

Variables determined by the experimenters

Ambiguity—that is, whether an idiom had or did not have a semantically plausible literal meaning—was first established separately by the first, third, and fourth authors, who examined each of the 624 idioms and classified them as ambiguous or unambiguous, depending on whether a literal interpretation of the idiom is plausible or not. For instance, “to bite into the sour apple” (idiom no. 001) describes an event that can actually happen, and therefore we considered this an ambiguous idiom. On the other hand, “to have someone in the pear” (idiom no. 003) does not describe a plausible event, and therefore we classified it as an unambiguous idiom. Then, the individual decisions were compared; possible differences were discussed until an agreement on the categorization was reached; and a categorical variable (ambiguous vs. nonambiguous) was created.
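
The classify-then-compare step described above can be sketched as follows. This is an illustrative Python sketch only: the function name `split_by_agreement` is hypothetical, the votes shown are invented for the example (True meaning "literal reading judged plausible"), and the third entry is a placeholder rather than an item from the database.

```python
def split_by_agreement(ratings):
    """Separate unanimous classifications from items needing discussion.

    ratings maps an idiom to the three raters' independent judgments
    (True = ambiguous, i.e., a plausible literal meaning exists).
    """
    agreed = {}
    to_discuss = []
    for idiom, votes in ratings.items():
        if len(set(votes)) == 1:
            # All three raters gave the same judgment.
            agreed[idiom] = votes[0]
        else:
            # Any disagreement is resolved later by discussion.
            to_discuss.append(idiom)
    return agreed, to_discuss


# Hypothetical votes, for illustration only.
ratings = {
    "to bite into the sour apple": (True, True, True),
    "to have someone in the pear": (False, False, False),
    "placeholder idiom": (True, False, True),
}
agreed, to_discuss = split_by_agreement(ratings)
print(agreed)
print(to_discuss)
```

Items in `to_discuss` would then be settled by consensus, as in the procedure above, before the final categorical variable is created.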

Procedure

Instructions for the rating task

Instructions were presented in written form and contained a definition of each of the variables to be rated, some examples of sentences rated with extreme values, an explanation of the Likert scale, and an explanation of the labels of the extreme and middle points. The original German instructions, an English translation, and a screenshot of one of the questionnaires may be found in Appendix A. Familiarity referred to the frequency with which the participant had heard or read the idiom. The rating scale ranged from 1 (never heard/read) to 7 (often heard/read). Semantic transparency referred to the extent to which the figurative meaning of an idiom could be inferred from the meaning of its constituent words. The scale ranged from 1 (semantically transparent) to 7 (semantically opaque). Figurativeness was actually labeled Metaphoricity in the instructions. This was because the average German participant is more familiar with the concept of metaphor or metaphorical meaning than with figurative expression or idiom. In fact, in current German, “metaphoric” is used as a synonym of “figurative” (Bibliographisches Institut GmbH, 2013), despite the different and more specific uses of these terms in linguistics (e.g., Cacciari & Glucksberg, 1994; but please see the “Introduction”). In this way, we aimed to increase participants’ understanding of their task. The Figurativeness/Metaphoricity scale referred to how much the meaning of an idiom was perceived as nonliteral, on a scale ranging from 1 (not at all figurative/metaphorical) to 7 (very figurative/metaphorical). Concreteness referred to the extent to which the figurative meaning could be experienced with one or more sensory modalities. The scale ranged from 1 (totally abstract) to 7 (totally concrete). Confidence was measured by asking participants to rate their knowledge of the idiomatic meaning on a scale from 1 (“I don’t know the meaning at all”) to 7 (“I know the meaning very well”).
Knowledge of the idiomatic meaning was assessed right after confidence by asking participants to write down the idiomatic meaning. Emotional valence referred to the extent to which the idiomatic meaning was positive or negative. The scale ranged from –3 (very negative) to +3 (very positive), through 0 (neutral). Arousal referred to the extent to which the idiomatic meaning was stimulating, on a scale from not stimulating at all to very exciting or agitating, independently of whether it was positive or negative. The rating scale ranged from 1 (not at all arousing) to 7 (very arousing). At the end of each scale, the option unknown was given.

Questionnaires

Online questionnaires were created using SurveyMonkey. Six separate questionnaires were used to measure the emotional valence, arousal, familiarity, concreteness, figurativeness, and semantic transparency of the entire set of 624 idioms. Each of these questionnaires contained the full set of idioms. Another questionnaire measured participants’ confidence about their own knowledge of the idiomatic meaning (through ratings), as well as their actual knowledge (through written definitions). Since this task required more time to be completed, we split the total number of idioms into two parts. Hence, each confidence/knowledge questionnaire contained only half of the idioms.

Eight randomized orders of the 624 idioms were first created. Then, each variable to be rated (except confidence and knowledge) was randomly assigned to four different randomizations, therefore creating four versions of each questionnaire for each variable. The confidence/knowledge variables were randomly assigned to three different randomizations, and each version was split in two halves, therefore creating six distinct questionnaires, each containing half of the stimuli.
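The assignment scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_questionnaires`, the variable labels, the seed, and the use of integer idiom identifiers are hypothetical and do not reproduce the procedure actually run in SurveyMonkey.

```python
import random

N_IDIOMS = 624
# Hypothetical labels for the six variables rated on the full idiom set.
RATED_VARIABLES = ["valence", "arousal", "familiarity",
                   "concreteness", "figurativeness", "transparency"]

def build_questionnaires(seed=0):
    """Return (rated, conf_know): versions per rated variable and the
    six half-length confidence/knowledge questionnaires."""
    rng = random.Random(seed)
    idioms = list(range(1, N_IDIOMS + 1))

    # Eight randomized orders of the full idiom set.
    orders = []
    for _ in range(8):
        order = idioms[:]
        rng.shuffle(order)
        orders.append(order)

    # Each rated variable is assigned four of the eight orders,
    # giving four versions of its questionnaire.
    rated = {var: rng.sample(orders, 4) for var in RATED_VARIABLES}

    # Confidence/knowledge: three orders, each split into two halves,
    # giving six questionnaires with half of the idioms each.
    half = N_IDIOMS // 2
    conf_know = []
    for order in rng.sample(orders, 3):
        conf_know.append(order[:half])
        conf_know.append(order[half:])
    return rated, conf_know
```

Each pair of consecutive confidence/knowledge questionnaires covers the full idiom set exactly once, which is why those six questionnaires contain unique combinations of idioms.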

Each variable was rated by at least 30 participants, who were each randomly assigned to a specific variable. Each participant was allowed to complete more than one questionnaire (each one on a different variable) and was rewarded accordingly. The online survey lasted approximately one and a half hours.

Participants

A total of 249 native speakers of German from the Berlin area (131 women, 118 men), between 19 and 67 years of age (Median = 30), completed the online survey. Participants were recruited through an online newsletter from the “Languages of Emotion” research cluster, to which our participant pool subscribed themselves. Our group included students from different faculties studying in Berlin and Potsdam, as well as unemployed or self-employed people. As a reward, participants were either paid €8 or given a lottery ticket for a raffle through which 20 Amazon vouchers worth €20 were awarded. One of the questionnaires was randomly assigned to each participant and accessed by him/her upon receipt of a unique URL via e-mail. We report here only data from participants who completed at least 9/10 of the questions on each questionnaire and completed them accurately.

We compared gender proportion and mean age across the different subgroups of participants who rated the different variables, to make sure that these demographic characteristics were balanced across variables. The analyses showed no significant differences in either gender proportion [χ²(6) = 4.25, n.s.] or mean age [F(6, 248) = 1.39, n.s.] (see Table 2 for descriptive statistics).

Table 2 Gender proportion and age of the participants across all rated variables

Data analysis

We calculated the mean rating and standard deviation of each idiom for emotional valence, arousal, familiarity, semantic transparency, figurativeness, concreteness, and confidence. Knowledge of the idiomatic meaning was calculated by counting the number of correct definitions for each idiom. To determine the correctness of these definitions, the first author, together with a native German-speaking linguist, went through all definitions and compared them with the definitions given in our database (by a professional translator), as well as with the German definitions given at the “Redensarten-Index” webpage (Udem, 2001). Definitions that correctly matched the idiomatic meaning were considered correct. For example, the idiom seine Worte abwägen (i.e., “to weigh one’s words,” idiom no. 289) means “to think about something carefully before speaking”; in this case, we categorized as correct definitions such as “to think before speaking” or “to think carefully before speaking.”

In addition, we calculated the total number of valid responses relative to each variable for each idiom (i.e., either ratings or definitions), the percentage of “unknown” responses, and the number of omissions. An additional variable was created by calculating the square of emotional valence, which was called valence2. This variable represents the degree of absolute emotionality of a stimulus, independent of its polarity, that is, of whether its linear valence is positive or negative. Valence2 enabled us to explore quadratic relationships between emotional valence and other variables.
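As a minimal sketch of these per-idiom summaries, the following Python functions compute the mean, standard deviation, number of valid responses, percentage of “unknown” responses, and number of omissions for one idiom, as well as the squared valence. The encoding of responses (numbers for ratings, the string "unknown", and None for an omission) and the function names are assumptions made for illustration, as is the choice to square the per-idiom mean valence.

```python
from statistics import mean, stdev

def summarize_idiom(responses):
    """Summarize one idiom's responses for one variable.

    Assumed (hypothetical) encoding: numbers are ratings, the string
    "unknown" is the 'unknown' option, and None is an omission.
    """
    valid = [r for r in responses if isinstance(r, (int, float))]
    n_unknown = sum(1 for r in responses if r == "unknown")
    n_omitted = sum(1 for r in responses if r is None)
    return {
        "mean": mean(valid),
        "sd": stdev(valid),  # sample standard deviation
        "n_valid": len(valid),
        "pct_unknown": 100 * n_unknown / len(responses),
        "n_omitted": n_omitted,
    }

def valence_squared(mean_valence):
    """Absolute emotionality: identical for -2 and +2 on the -3..+3 scale."""
    return mean_valence ** 2
```

Squaring removes the sign, so an idiom with a mean valence of –2 and one with a mean valence of +2 receive the same value.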

Upon inspection of the idioms and rated variables, we found that for one idiom all participants reported not knowing its meaning in the confidence/knowledge questionnaire. We therefore excluded it from our database. We also found that four idioms appeared twice: One pair had identical items, two pairs had slightly different forms (i.e., the presence of an additional word that was not essential to constitute the idiomatic meaning), and one pair had a right and a wrong version. Thus, we eliminated one of the identical items in the first pair (chosen randomly), the least frequent items in the other two pairs, and the wrong item in the last pair, therefore retaining a total of 619 idiomatic expressions.

Distribution of variables and statistical methods

For each variable obtained, we calculated the mean value, standard deviation, median value, minimum, maximum, and the mean percentages of valid responses, of “unknown” responses, and of omissions. For the length measures, we also calculated the first five values. Most of the variables were not normally distributed. Length in letters, length in words, emotional arousal, and concreteness were slightly positively skewed, and confidence was negatively skewed; these variables were successfully logarithmically transformed. However, other variables could not be transformed successfully: Familiarity and figurativeness were slightly negatively skewed; knowledge of the idiomatic meaning was strongly negatively skewed (given that we sampled relatively common idiomatic expressions that are almost all well known by native speakers); semantic transparency had a platykurtic distribution; emotional valence had a (natural) bimodal distribution; and valence2 had a quadratic distribution (and was hence strongly skewed). In order to make up for the lack of normality, we applied a bootstrapping technique to all our parametric statistical analyses (1,000 samples, 95% percentile confidence interval); this procedure allows for the estimation of the sampling distribution of almost any statistic through resampling the observed data, and is therefore distribution-independent (cf. Efron & Tibshirani, 1993).
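The percentile bootstrap can be illustrated with a short Python sketch. It resamples item pairs with replacement 1,000 times and reads the 95% interval off the sorted estimates. The function names, the seed, and the use of a simple Pearson correlation as the statistic are assumptions for illustration; the sketch does not reproduce the statistical software used for the reported analyses.

```python
import random
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def bootstrap_ci(x, y, stat=pearson_r, n_samples=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for stat(x, y).

    Items (pairs) are resampled with replacement; the interval bounds
    are the empirical quantiles of the resampled estimates.
    """
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(stat([x[i] for i in idx], [y[i] for i in idx]))
    estimates.sort()
    tail = (1 - level) / 2
    lower = estimates[round(tail * n_samples)]
    upper = estimates[round((1 - tail) * n_samples) - 1]
    return lower, upper
```

Because the interval is taken directly from the resampled estimates, no assumption about the shape of the underlying distribution is needed.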

Relationships among variables

We calculated Pearson partial correlations between each linear affective variable and each other variable, as well as between each pair of nonaffective variables, by partialing out the effects of all remaining ones. On the basis of the literature (Bradley & Lang, 1999; Citron et al., 2014b), emotional valence and arousal ratings were plotted against each other and showed a quadratic relationship (see Fig. 1). Therefore, we also computed a quadratic regression predicting arousal ratings from valence: In a first step, we entered all variables of no interest in order to partial out their effects, and in a second step we entered valence and valence2 as predictors (i.e., the quadratic regression equation). Finally, we conducted quadratic regressions for any nonaffective variable that correlated significantly with valence, to explore whether a quadratic function would better explain their relationship. Significant partial correlations up to ± .1 are referred to as “small correlations,” between ± .1 and ± .3 as “moderate correlations,” and between ± .3 and ± .5 as “large correlations.”
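The two-step quadratic regression can be sketched as follows. The Python code fits ordinary least squares through the normal equations, first with the covariates only and then with valence and valence2 added, and derives the F statistic for the change in R² from the standard formula for nested models. All function names are hypothetical, the solver is a plain Gauss-Jordan elimination chosen to keep the example self-contained, and bootstrapping is omitted.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def r_squared(predictors, y):
    """R^2 of an OLS fit of y on the predictor columns plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + [[float(v) for v in c] for c in predictors]
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(bk * c[i] for bk, c in zip(beta, cols)) for i in range(n)]
    my = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def quadratic_step(covariates, valence, arousal):
    """Step 1: covariates only. Step 2: add valence and valence squared."""
    n = len(arousal)
    r2_step1 = r_squared(covariates, arousal)
    valence2 = [v * v for v in valence]
    r2_step2 = r_squared(covariates + [valence, valence2], arousal)
    k_added = 2  # valence and valence2
    df_resid = n - (len(covariates) + k_added) - 1
    f_change = ((r2_step2 - r2_step1) / k_added) / ((1 - r2_step2) / df_resid)
    return r2_step1, r2_step2, f_change, df_resid
```

With n items and p predictors in the second step, the residual degrees of freedom are n – p – 1, and the numerator degrees of freedom equal the number of predictors added in that step.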

Fig. 1 Distribution of our idioms’ affective properties. Emotional valence ratings (–3 = very negative, +3 = very positive, 0 = neutral) plotted against arousal ratings (1 = not at all arousing, 7 = very highly arousing)

Ambiguous versus unambiguous idioms

Since we were interested in possible differences between these two types of idioms, rather than in the relationship between ambiguity and other variables, we calculated descriptive statistics for each continuous variable (broken down by idioms’ ambiguity) and ran t tests to compare the two conditions. We also recalculated the correlations just described, separately for ambiguous and unambiguous idioms.

Reliability analysis

We conducted a reliability analysis based on internal consistency (Cronbach’s alpha), also referred to as intraclass correlations. We chose this analysis because it represents a more reliable measure than the split-half procedure (also referred to as product-moment correlations), as outlined, for example, in Cicchetti (1994). We used the raw ratings from each participant as a different variable and the single idioms as cases and obtained different Cronbach’s alpha values for emotional valence, arousal, familiarity, concreteness, figurativeness, and semantic transparency. However, the confidence and knowledge questionnaires were organized differently: Three different randomizations of all idioms were first applied, and then the randomized idioms were split into two halves. Thus, the six resulting questionnaires had to be analyzed separately (with sample sizes of nine to 12 participants only), since each of them contained a unique combination of idioms.
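Cronbach’s alpha for this layout, with idioms as cases and raters as the “items” of the scale, can be computed as in the following Python sketch. The function name and the assumption of a complete ratings matrix (no omissions or “unknown” responses) are ours; handling of missing values is left out.

```python
from statistics import variance

def cronbach_alpha(ratings):
    """Cronbach's alpha for a complete cases-by-raters matrix.

    ratings[i][j] is the rating of idiom i (case) by rater j, where
    each rater is treated as one 'item' of the scale.
    """
    k = len(ratings[0])  # number of raters
    rater_vars = [variance([row[j] for row in ratings]) for j in range(k)]
    total_var = variance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - sum(rater_vars) / total_var)
```

Alpha approaches 1 when raters order the idioms in the same way, and it tends to increase with the number of raters, which is one reason why small rater groups can yield lower values.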

Results and discussion

The full list of 619 idiomatic expressions with their literal translation in English, their idiomatic meanings, and the means and standard deviations for all variables are reported in the Supplementary Material.

Descriptive statistics for each variable are reported in Table 3. The mean emotional valence and arousal values across idioms varied from very negative to very positive values and from very low to very high arousal values (please refer to Fig. 1). The median valence value was negative, suggesting a higher proportion of negatively than positively valenced items. Overall, the idioms were rated as being familiar and having predominantly abstract meanings. Furthermore, they were evaluated as moderately figurative and semantically transparent. Knowledge of the idiomatic meaning was high (Median = 94%), as was confidence (Median = 6.23), suggesting that the selected idioms were known and frequently used (see the Familiarity paragraph above). The mean percentages of omissions (1.84%) and of “I don’t know” responses (1.80%) were very low. However, the percentages slightly increased for the variables confidence and idiom knowledge, suggesting that, when explicitly asked about their knowledge of the idiomatic meanings, participants admitted that they did not know some of them.

Table 3 Descriptive statistics for each rated or calculated variable

In the following subsections, we report the partial correlations between the psycholinguistic and affective variables. A matrix of simple linear correlations is reported in Appendix B, Table C1.

Relationships between affective variables

The idiom list contained more than twice as many idioms with negative valence (N = 422) as with positive valence (N = 194; three idioms had valence = 0). This difference is unlikely to be due to the specific sample selected. Rather, it may reflect the fact that, since idiomatic expressions typically provide an indirect form of communication (Cacciari, 1998; Drew & Holt, 1988), they are preferred over literal expressions for negative statements.

In the quadratic regression predicting arousal from valence ratings, the first model, including all psycholinguistic variables, accounted for 12% of the variance (R² = .12, r = .35), F(8, 610) = 10.70, p < .001, whereas the second model, also including valence and valence squared, accounted for an additional 26% (R² = .38, r = .62), F(2, 608) = 128.29, p < .001, with both valence and valence2 as significant predictors. [The regression line was as follows: estimated arousal = 0.15 × familiarity + 0.13 × concreteness + 0.29 × figurativeness + 0.15 × semantic transparency – 0.13 × valence + 0.48 × valence2.] Thus, the more emotionally valenced an idiomatic meaning was, the higher its level of arousal; the quadratic relationship between valence and arousal can be seen in Fig. 1. This result is in line with the typical U-shaped relationship between emotional dimensions that has repeatedly been found for single words (e.g., Bradley & Lang, 1999; Citron et al., 2014b; Schmidtke et al., 2014; Võ et al., 2009). In addition, higher levels of arousal were attributed to negative than to positive idioms [t(614) = 5.89, p < .0001], a result often reported for single words (Citron et al., 2014b; Schmidtke et al., 2014; Võ et al., 2009).

Correlations between affective and psycholinguistic variables

In what follows, we present statistically significant partial correlations between variables (α = .05) (see the specific tables for Pearson’s r and p values).

Arousal and figurativeness, concreteness, semantic transparency, familiarity

A first interesting result was a moderate positive partial correlation between arousal and figurativeness (see Table 4 for correlations) that suggests an association between the nonliterality of the string and the attributed emotional–physiological intensity.

Table 4 Linear partial correlations between affective and psycholinguistic variables

A second interesting result was a small positive partial correlation between arousal and concreteness: The more concrete an idiom’s meaning was, the more emotionally arousing it was rated. The fact that concrete concepts have direct reference to one or more sensory modalities may have led to higher ratings of physiological arousal. A positive correlation between arousal and imageability was reported by Citron et al. (2014b) for single words. Our study and Citron et al.’s (2014b) study showed positive correlations within samples that contained mostly abstract items. However, larger word corpora with more balanced distributions of concrete and abstract words showed a negative quadratic correlation between arousal and concreteness (Montefinese, Ambrosini, Fairfield, & Mammarella, 2014) and negative linear and quadratic correlations between arousal and imageability (Schmidtke et al., 2014). Therefore, it seems that more research on the relationship between arousal and concreteness in idioms is needed to confirm and generalize our result.

Arousal also showed a moderate positive partial correlation with semantic transparency, in that the more transparent the meaning of an idiom was, the more arousing it was rated. It may be easier to attribute high arousal values to idioms in which the literal meaning of the constituent words clearly contributes to the idiomatic interpretation.

Finally, arousal also had a moderate positive partial correlation with familiarity, in line with the results from single words (Montefinese et al., 2014).

Valence and familiarity

Emotional valence had a moderate positive partial linear correlation and a significant quadratic relationship with familiarity. The first regression model, which included all psycholinguistic variables and arousal, accounted for 19% of the variance (R² = .19, r = .44), F(8, 610) = 17.85, p < .0001, whereas the second model, which also included valence and valence squared, accounted for an additional 3% (R² = .22, r = .47), F(2, 608) = 11.17, p < .001. [The regression line was as follows: estimated familiarity = 0.19 × arousal (log 10) + 0.38 × knowledge – 0.23 × length in letters (log 10) + 0.11 × valence – 0.14 × valence2.]

In sum, the more positive an idiom, the more familiar it was rated, and the more highly valenced an idiom, the less familiar it was rated. This linear relationship is in line with previous findings on emotional words (Citron et al., 2014b), and the quadratic relationship confirms the first result: In fact, in the present corpus the most highly valenced idioms were mostly negative. This may reflect the bias hypothesized by Citron et al. (2014b) in a normative study, according to which participants may be more prone to declare that they are familiar with positive than with negative concepts. However, for idioms, there is a much smaller number of items that convey positive than negative meanings, possibly leading to a higher frequency of use of positive than of negative idioms.

Emotional valence also showed a small positive partial linear correlation with length in words, but no significant quadratic relationship (R² change = .004), F(2, 608) = 2.71, n.s.

Correlations between nonaffective variables

Familiarity, confidence, and knowledge

Familiarity had a large partial positive correlation with knowledge of the idiomatic meaning, suggesting that the more familiar an idiom is, the better it is known; however, we found no significant correlation with confidence (please refer to Table 5). Furthermore, knowledge and confidence were not correlated. These results suggest that measuring participants’ knowledge of idiomatic meanings, without controlling whether or not their knowledge was correct (Bonin et al., 2013; Libben & Titone, 2008; Titone & Connine, 1994a), may be problematic. Only Tabossi et al. (2011) have measured idiom knowledge by asking participants to explain the idiomatic meaning. They also found a large positive correlation between knowledge and familiarity (r = .49). However, it should be noted that Tabossi et al. did not test the subjective frequency of idioms, but rather only other-based familiarity (see the Familiarity paragraph in the introduction), therefore obtaining potentially different estimates.

Table 5 Linear partial correlations between nonaffective variables

Figurativeness, concreteness, and semantic transparency

Figurativeness had a large partial negative correlation with semantic transparency and a moderate negative partial correlation with concreteness. These results suggest that the more figurative a meaning was, the less semantically transparent and the less concrete it was rated. Semantic transparency refers to how easily the idiomatic meaning can be inferred from the literal meanings of the constituent words. Therefore, a transparent idiom may be perceived as less figurative than a semantically opaque one. The negative partial correlation between figurativeness and concreteness suggests that the less abstract an idiom was, the less figurative it was considered. It should be noted that we had more abstract than concrete idioms overall, and that this reflects the fact that typically idioms refer to abstract events. Finally, semantic transparency showed a moderate positive partial correlation with concreteness.

Length, familiarity, semantic transparency, and figurativeness

Finally, length in letters had a moderate partial negative correlation with familiarity. The tendency of short idioms to be more frequently encountered or produced is in line with what is typically found for single words (Bird, Franklin, & Howard, 2001; Citron et al., 2014b; Stadthagen-Gonzalez & Davis, 2006). Furthermore, idiom length in words showed positive moderate and small partial correlations with figurativeness and semantic transparency, respectively. Perhaps the more semantic information participants had, the easier it was to rate the degree of figurativeness and semantic transparency of the idiom. In contrast, short idioms convey less semantic information, and their interpretation can be less dependent on the literal meaning of the word or word string.

Ambiguous versus unambiguous idioms

Ambiguous idioms had slightly lower valence2 [t(602.63) = 3.42, p < .01] (i.e., were less emotionally valenced, independently of whether positively or negatively) and lower arousal mean values [t(616.41) = 3.35, p < .01] than unambiguous idioms (please refer to Table 6 for descriptive statistics). Furthermore, ambiguous idioms were rated as significantly more concrete [t(617) = 8.85, p < .001] than unambiguous ones, and were less correctly defined [t(617) = 2.79, p < .01]. How can we interpret these differences? One possibility is that, since ambiguous idioms also have a literal plausible meaning, they may more easily evoke a concrete (literal) meaning as well as a more abstract, idiomatic meaning. This would be consistent with the results of a study by Cacciari and Glucksberg (1995), in which participants were asked to produce and describe a mental image for a set of semantically ambiguous idioms. The images they obtained overwhelmingly reflected the literal meanings of the idiomatic strings, rather than the idiomatic meanings. In addition, these literally oriented mental images interfered with idiomatic paraphrase verification times.

Table 6 Descriptive statistics of each rated or calculated variable, broken down by meaning type: Ambiguous versus unambiguous

Since unambiguous idioms only possess a figurative interpretation, the fact that they are rated as more arousing and emotionally valenced than ambiguous idioms fits nicely with recent neuroimaging data showing that figurative formulations are more emotionally engaging than their literal counterparts (Citron & Goldberg, 2014).

We did not observe significant differences in the correlations when they were calculated separately for ambiguous and unambiguous idioms.

Reliability analysis

The analyses showed high reliability of the measures of the variables for which we had 30 or more raters (see Table 7). However, we could not obtain high alpha values for the confidence and knowledge variables, due to the small sample sizes (please see Appendix B, Table C2, for more details).

Table 7 Measures of internal consistency (Cronbach’s α) for variables with a minimum sample size of N = 30

General discussion

The aims of this study were to provide norms for psycholinguistic and affective properties of a large number of German idioms and to explore, for the first time, the relationships between affective and psycholinguistic properties. In what follows, we summarize the main results of this descriptive study, starting from the affective characteristics of the 619 German idioms, since their investigation represents the major contribution of this study.

We found that the more emotionally valenced an idiomatic meaning was, the higher its level of arousal, with negative idioms being evaluated as leading to a higher level of arousal than positive idioms. Although interesting, this result may have been partly influenced by the composition of our idiom list, wherein more than two thirds of the items had negative valence (422 out of 619). However, these results may also reflect the fact that nonliteral language tends to be preferred over literal language when speakers make negative statements (cf. Cacciari, 1998; Drew & Holt, 1988). This is also indirectly supported by the fact that the more figurative an idiom was, the more “arousing” it was rated. Additionally, the concreteness of the idiomatic meaning was positively correlated with emotional arousal, presumably reflecting the fact that concrete concepts with a direct reference to sensory modalities may be seen as more linked to physiological states.

However, since studies on single words have already consistently shown that a high proportion of negative words tend to elicit higher arousal ratings (e.g., Citron et al., 2014b; Võ et al., 2009), our results may not be idiom-specific, but rather reflect a general feature of language.

The observed positive correlation between semantic transparency and arousal may reflect the fact that it is easier to attribute high arousal values to idioms in which the literal meaning of the constituent words clearly contributes to the idiomatic interpretation. If so, it would be necessary to separate the arousal value of the single words from that of the entire idiom.

Emotional valence had a positive linear correlation and a negative quadratic correlation with familiarity. Since the number of idiomatic expressions used to convey positive concepts is much smaller than the number of negative idiomatic expressions, positive idioms may be more frequently used than negative ones.

We now turn to the relationships between psycholinguistic variables. Familiarity (i.e., subjective frequency) was positively correlated with knowledge of the idiomatic meaning, confirming previous findings (Bonin et al., 2013; Libben & Titone, 2008; Tabossi et al., 2011; Titone & Connine, 1994b). However, unlike in previous studies, we did not observe any significant correlation between familiarity and confidence or between knowledge and confidence. These discrepancies may reflect, at least in part, differences in the ways in which these variables were measured. For example, Tabossi et al. (2011) tested idiom knowledge by asking participants to write down the idiomatic meaning (as in our study) and measured other-based familiarity. In contrast, other studies reported an estimate of participants’ own knowledge (i.e., their confidence) and measured familiarity in terms of subjective frequency (Bonin et al., 2013; Libben & Titone, 2008). Our results suggest that confidence may not necessarily be a reliable measure of the actual knowledge of an idiomatic meaning, which is better captured by asking participants to write the meaning down. Furthermore, idiom familiarity, conceptualized in terms of subjective frequency of exposure to an idiom, can provide a more reliable measure than other-based familiarity.

The perceived level of figurativeness of an idiom was negatively correlated with concreteness and semantic transparency; specifically, the more idiomatic a meaning was, the less semantically transparent and concrete it was rated. In sum, the meanings of most idiomatic strings were unrelated to the literal meaning of the constituent words, and predominantly conveyed abstract contents. Shorter idioms were perceived as being more familiar, but longer idioms provided more semantic information than shorter ones, facilitating the evaluation of idioms’ figurativeness as well as semantic transparency.

Semantic transparency was not correlated with familiarity (in line with Tabossi et al., 2011; but see Abel, 2003) or with idiom knowledge (unlike Tabossi et al., 2011). This seems to reflect the fact that the idiomatic meaning of known, familiar idioms is stored in semantic memory and retrieved regardless of the fact that we detect a clear relationship between the component word meanings and the global figurative interpretation of the string (Bonin et al., 2013; Libben & Titone, 2008; Titone & Connine, 1994b). It should also be mentioned that the notion of semantic transparency is hard to capture, varies across participants, and often reflects a post-hoc attribution of a link between the idiomatic meaning that we have already apprehended and the individual words (Cacciari, 2014).

Ambiguous idioms were overall less emotionally salient (i.e., were rated as less valenced and arousing), less correctly defined, and more linked to concrete, sensory-based information than were unambiguous idioms. Since people cannot bypass the meanings of the constituent words en route to accessing (or generating) the idiom’s figurative meaning, this might represent a source of possible interference in ambiguous idioms, leading to wrong meaning definitions. However, when the correct meaning of ambiguous idioms is known, this may evoke a more intense emotional response with a more direct link to sensory domains.

Finally, this descriptive study provides a useful tool for researchers interested in exploring the relationships between figurative language and affect using German figurative expressions with empirically determined variables. To our knowledge, this is the first descriptive study on idioms that provides ratings for affective variables, beyond other psycholinguistic variables. It also shows high reliability—that is, internal consistency. In addition, variables such as concreteness and figurativeness were not tested in previous idiom norms, although they have been shown to correlate with affective and psycholinguistic properties of idioms. Thus, all of these variables should be taken into account when designing experiments on idiom processing.