Introduction

Arbitrariness is a core property of natural language in that most words tend to bear no obvious relationship to their referents (Saussure, 1959; Hockett, 1960). For example, there is nothing red about the word red, and the word big is itself rather small. However, non-arbitrary links between the forms of words and their meanings are not unknown in natural language. Widely referred to as sound symbolism (Hinton, Nichols & Ohala, 1994), the potential for words to ‘naturally’ denote their meanings was described as early as Plato’s Cratylus dialogue (Reeve, 1998), and has been examined in both Psychology (e.g. Werner, 1957; Werner & Wapner, 1952; Marks, 1996) and Linguistics (e.g. Sapir, 1929; Jespersen, 1933; Newman, 1933; Brown, Black & Horowitz, 1955; Nuckolls, 1999; Imai, Kita, Nagumo & Okada, 2008; Nygaard, Cook & Namy, 2009). This paper focuses on experimental approaches to one particular type of sound symbolism: associations between non-words and abstract shapes. These word–shape associations are best known through a phenomenon often called the bouba–kiki effect (Ramachandran & Hubbard, 2001). In this effect, participants show striking agreement in their preferred labels for shapes in forced-choice naming tasks. In Ramachandran and Hubbard (2001, 2005), for example, American college students were asked to label a spiky shape and a rounded shape using either the word bouba or the word kiki (see Fig. 1).

Fig. 1

Abstract shapes from Ramachandran and Hubbard (2001). Subjects tend to match the spiky shape (left) to the name kiki and the rounded shape (right) to the name bouba

Up to 98 % of respondents chose the word kiki for the spiky abstract shape, and bouba for the rounded abstract shape, and such biases have been reported using a range of variations on this paradigm dating back almost a century (Fischer, 1922; Uznadze, 1924; Köhler, 1929, 1947; Fox, 1935; Irwin & Newland, 1940; Westbury, 2005; Maurer, Pathman & Mondloch, 2006; Ahlner & Zlatev, 2010; Nielsen & Rendall, 2011, 2012; Aveyard, 2012; Parise & Spence, 2012).

The bouba–kiki effect was first established in the 1920s by Köhler (1929, 1947) using the non-words takete and maluma to label spiky and rounded shapes, respectively (Fig. 2). Köhler noted an overwhelming preference for this pattern of naming, reporting that “most people answer without hesitation” (Köhler, 1947, p. 224). Indeed, Köhler considers this association so obvious that he never explicitly states which shape matches the word takete and which matches maluma (Köhler, 1929, 1930, 1947). This finding has been cited and replicated repeatedly during the 20th and 21st centuries, and the effect is generally accepted as a robust, pervasive, shared cross-sensory bias to pair linguistic sounds with visual forms. The effect has been found repeatedly in the explicit labelling of shapes with non-words (e.g. Ramachandran & Hubbard, 2001; D’Onofrio, 2013), and also with measures such as learning accuracy (e.g. Nielsen & Rendall, 2011; Monaghan, Mattock, & Walker, 2012) and facilitated processing of congruent pairings (e.g. faster reaction times to bouba paired with a rounded shape; Parise & Spence, 2012; Kovic, Plunkett & Westerman, 2010; Westbury, 2005).

Fig. 2

Maluma and takete shapes originally used by Köhler (1929, 1947)

As noted above, explanations of the bouba–kiki effect are most often phrased in terms of iconic cross-sensory mechanisms. Broadly, they suggest that the effect arises from matching properties inherent in the sound form of non-words, or in their motor articulations, directly to properties of the abstract shapes. For example, Ramachandran and Hubbard describe the matching of spiky shapes to the word kiki as mapping shapes to the “sharp phonemic inflections of the sound kiki” (Ramachandran & Hubbard, 2001, p. 19). Ramachandran and Hubbard extend their explanation by suggesting that the phonemic sounds of words might map to the listener’s articulatory motor representation (via mirror neurons; Rizzolatti & Craighero, 2004), and that this knowledge of motor movements might then map onto a shape. This account nonetheless remains a fundamentally iconic, cross-sensory one, linking proprioceptive knowledge of linguistic sound to visual features. Similarly, Kovic and colleagues (2010) describe the effect in terms of the “round sounding” or “sharp sounding” articulation of phonemes (e.g. dom as “round sounding” and shick as “sharp sounding” in their particular materials; Kovic et al., 2010, p. 22). In summary, both types of account treat iconic cross-sensory mechanisms as key to the bouba–kiki effect, mapping either from sound to shape directly or via articulatory proprioception.

In the current article, we challenge the notion of a purely iconic cross-sensory account of the bouba–kiki phenomenon. Instead, we suggest that in literate participants in particular, the phenomenon is heavily mediated by the symbolic, culturally acquired shapes of letters. The similarity between orthography and the abstract target shapes can be seen in Fig. 3, in which we superimpose the letters B and K onto the ‘bouba-preferred’ and ‘kiki-preferred’ abstract shapes, respectively. We show that the bouba–kiki effect among literate participants (who make up the majority of subjects previously tested in the literature) is predominantly mediated not by matching properties of a non-word’s sound to properties of a shape, but by mapping letter shape in the written form of a non-word to an abstract shape. In this way, spiky abstract shapes are matched to non-words containing angular letters, and rounded shapes matched to non-words containing curved letters—regardless of the acoustic or articulatory properties of the non-words.

Fig. 3

Orthographic similarity between the graphemes B and K and the shapes that tend to be named, respectively, bouba and kiki

Although many investigations of the bouba–kiki effect have tacitly acknowledged the potential for orthography to mediate responses, few studies have actually examined this in detail and given due consideration to its potential to confound results. The few studies that do mention orthography tend to claim it does not play a role in the bouba–kiki effect. Evidence for this claim has thus far come from three distinct areas of study: examining the bouba–kiki effect in pre-literate children, examining the bouba–kiki effect cross-culturally, and controlling methodologically for the potential influence of orthography. Before presenting our own study, we review these three types of study below.

Non-literate children, cross-cultural studies and methodological balancing

One way to assess the role of orthography in the bouba–kiki effect is to examine it in children who are not yet literate. If children show the bouba–kiki effect before acquiring knowledge of orthography, this would suggest that the effect is driven by associations between sound properties of the words and visual properties of the shapes. Early research was inconclusive in this regard. The earliest bouba–kiki study with children, Irwin and Newland (1940), found no bouba–kiki effect in children under 6 years; however, literacy levels were not clearly reported. More recently, Maurer et al. (2006) did show the bouba–kiki effect in children aged 2:8 years, who had “not yet mastered the correspondence between the sound of letters and their grapheme” (p. 317; see also Spector & Maurer, 2013). On this basis, it might initially appear that the effect is mediated by sound–shape associations and is independent of orthography. However, there are two possible issues with this interpretation.

First, although children at 2:8 years may not have mastered all sound–letter correspondences, we have found that they may nonetheless have acquired enough of them to use letter shape in a bouba–kiki task. A number of children within this age range are able to visually identify the letter B, the letter K, or both, and this ability corresponds roughly with their success in a bouba–kiki task (Cuskley, 2013a). In other words, well before children are fully literate, they are, to some extent, graphemically aware, and this awareness might be sufficient to mediate basic word–shape associations, particularly in a forced-choice task. Second, although the children tested by Maurer et al. may have been graphemically unaware, their attention was purposefully and overtly directed towards the mouth of the experimenter during non-word articulation, so that they could clearly observe the rounding of the lips in bouba-type non-words. As Maurer et al. themselves point out in the closing lines of their study, it is not possible to “disentangle whether the child matched the sound to a shape based on its sound, [or] the shape of the experimenter’s lips as she spoke the word” (Maurer et al., 2006, p. 321). In other words, even if the results were not driven by letter shape, they could have been driven by another visual matching process, in this case between lip shape and abstract shape. Either way, this may not be evidence for a purely iconic, cross-sensory ‘sound symbolic’ phenomenon, a point clearly outlined by Maurer et al. (2006) but often overlooked in subsequent interpretations of the study.

However, there is evidence for other forms of cross-modal association in much younger infants (e.g. for shape and pitch, Walker et al., 2010; or size and vowel quality, Peña et al., 2011), and a more recent study ostensibly shows the bouba–kiki effect specifically in a small sample of infants. Ozturk et al. (2013) used a preferential looking task (Teller, 1979) and showed that four-month-old infants apparently gazed significantly longer at “incongruent” word–shape pairings (e.g. the syllable ki playing over a rounded shape) than at “congruent” pairings (e.g. the syllable ki playing over a spiky shape). However, Fort, Weiß, Martin and Peperkamp (2013) failed to replicate this result using a similar preferential looking paradigm, a larger sample of infants, and more diverse stimuli.

In all, the results of bouba–kiki studies with pre-literate children are decidedly mixed. The earliest study, Irwin and Newland (1940), reports no significant word–shape associations prior to the age of six. A more recent, systematic study has shown the effect in children as young as 2:8 years (Maurer et al., 2006); however, there are at least two viable visual matching mechanisms at work which could mediate the effect in this case: graphemic awareness and explicit mouth shape. Finally, only two recent studies have examined the effect in infants, with the second failing to replicate the first (even using more stimuli and a larger sample). In summary, a central influence of literacy on word–shape associations could explain why the effect is difficult to find in pre-literate children.

A second way to test whether orthography plays a role in the bouba–kiki effect would be to test adult participants who are either non-literate or who use diverse writing systems. However, virtually every investigation of the bouba–kiki effect has described studies conducted with literate adults who speak English or other Indo-European languages that use Roman orthography (Ramachandran & Hubbard, 2001, 2005; Nielsen & Rendall, 2011, 2012; Parise & Spence, 2012; Aveyard, 2012). A small handful of studies have taken a wider cultural scope, but until recently (see Bremner et al., 2013), such studies either lacked detail (Uznadze, 1924), suffered from interference from the Roman alphabet (Davis, 1961), or reported not finding the effect at all (Rogers & Ross, 1975; see Cuskley, 2013a, for a review). For example, although it has been widely reported that Köhler (1947) showed a bouba–kiki effect with a non-literate population from Tenerife, our own extensive search has yielded no detailed original report of this study at all (described in Cuskley, 2013a; Simner, 2011; repeated by Bremner et al., 2013). The recurrent mention in the literature of an apparently missing experiment may have contributed to an overestimation of the cross-cultural strength of the effect.

A report of the effect by Davis (1961) among Swahili speakers in central Africa is also widely used to argue against orthographic influence in the bouba–kiki effect. However, key facts about Davis’ procedure are often overlooked. For example, the study used the Roman alphabet to elicit participants’ responses (the Roman alphabet is used for written Swahili, and participants were directed to write down their responses), creating the potential for orthographic influence (Simner, 2011; Cuskley, 2013a). A later study testing the Songe of Papua New Guinea, who were likely illiterate, reported not finding the effect at all (Rogers & Ross, 1975). Finally, one other study showing similarities between speakers of Urdu and English in an adapted version of the bouba–kiki task (O’Boyle, Miller, & Rahmani, 1987) might also be accounted for by orthographic influence. Although Urdu has its own non-Roman script, it also has a Romanised version (Roman Urdu), and there is no information about whether participants were familiar with it. Perhaps more importantly, these Urdu speakers had all been resident in the United States for some time prior to testing (at least 6 months and up to two years), and so would likely have had reasonable knowledge of the Roman alphabet, at least sufficient to constitute graphemic awareness.

Only one recent study (Bremner et al., 2013) has unambiguously shown the effect in a non-literate, non-western population, in Namibia, using the procedure from Ramachandran and Hubbard (2001). This study shows the effect existing without the apparent influence of orthography, indicating that iconic cross-sensory (sound to shape) mechanisms certainly have the potential to play a role in the bouba–kiki effect. Yet there is still little information about exactly what role orthography may play for the large majority of literate subjects who have been tested in the broader literature. It is interesting to note, for example, that Bremner et al. (2013) found a lower incidence of the effect (82 %) among Namibian subjects than the 95–98 % among (literate) Americans reported in Ramachandran and Hubbard (2001, 2005). Lastly, the non-word stimuli of Bremner and colleagues were limited to the words bouba and kiki. These stimuli are not systematically constrained in terms of their sounds: the words differ in place of articulation, voicing, vowel quality, and reduplication. This is in marked contrast to other recent studies, which have made broader efforts to examine specific properties of linguistic sound underlying the effect in a controlled way (e.g. D’Onofrio, 2013; Aveyard, 2012; Nielsen & Rendall, 2011; Ahlner & Zlatev, 2010; Westbury, 2005). Thus, Bremner et al. (2013) show that a non-literate population makes some broad word–shape associations, but it remains unclear which specific phonetic qualities underlie these associations.

Finally, the potential for orthographic confounds has also been addressed to some extent methodologically, but never fully investigated. Some studies have sought to avoid orthographic confounds by presenting stimuli only in the auditory modality (e.g. Nielsen & Rendall, 2011). However, orthographic information is immediately available to literate language users even during speech comprehension: phonological processing in literate subjects activates graphemic representations (Stone, Vanhoy, & Van Orden, 1997; Ziegler & Ferrand, 1998; Slowiaczek, Soltano, Wieting, & Bishop, 2003).

Nielsen and Rendall (2012) made the first attempt to systematically control for orthographic angularity. They found that non-words containing sonorants were more often paired with rounded shapes than non-words containing obstruents (see also Ahlner & Zlatev, 2010). To remove orthographic influence, Nielsen and Rendall (2012) capitalised the first letter of their non-words, thereby changing the shape of some sonorants from rounded to angular (e.g. from m to M).

One final study attempted to rule out orthographic confounds in an entirely implicit task, in which participants never explicitly used non-words to label shapes. Westbury (2005; see also Parise & Spence, 2012) used a ‘framing’ lexical decision paradigm to suggest a sound symbolic link between words and shapes. In this type of task, non-words are presented inside shape frames, and participants’ reaction times to the words are measured. Westbury (2005) found that lexical decisions were faster when written non-words containing stop consonants (e.g. kide) were presented in spiky rather than rounded frames (and vice versa for continuant consonants). The potential for an orthographic confound was addressed with a secondary test: participants indicated whether a target was a letter or a digit, and were no faster for trials such as p in a rounded frame, nor for k in a spiky frame. Surprisingly, this suggests that readers are unaware of the shape of letters in one task while responding to the shapes of frames in another. Nonetheless, these results would suggest that phonological effects can exist without orthographic influences. However, this evidence comes only from an implicit task (Westbury, 2005), while the role of orthography in explicit tasks, which form the overwhelming majority of the literature on the bouba–kiki effect, has not been conclusively explored.

In summary, despite a prevailing view in the literature that the bouba–kiki effect is driven by iconic cross-sensory mechanisms, the role of the culturally acquired sound–shape associations inherent in literacy remains unclear, at least in the commonly used explicit labelling task. Existing evidence for the effect in pre-literate children may be explained by visual matching strategies (partial orthographic matching via graphemic awareness, or lip-shape matching), and evidence for the effect in pre-lexical infants requires further study given the mixed results outlined above. The cross-cultural strength of the effect has historically been overstated, and methodological attempts to avoid orthographic influences have tried to explain these influences away rather than examine them. Only one recent study has definitively found the effect among non-literate adults (Bremner et al., 2013). Westbury (2005) shows that phonological effects may exist independently of orthography, at least in an implicit task. In the experiments below, we assess whether phonological effects persist in more explicit paradigms when orthographic influences are controlled for. Furthermore, we ask whether phonological influences are in fact fully overridden by orthographic effects in written tasks, using the type of literate participant most often tested in the literature.

In our two studies below, we examine orthographic and phonological influences on object naming by presenting a rounded and a spiky abstract shape with a variety of non-words, and requiring literate, adult participants to rate the goodness of fit between each shape–non-word pair. We carefully chose non-words based not only on their phonological form but also on their orthographic angularity, and used a rating task rather than a more confined forced-choice task. By measuring explicit labelling with a continuous variable, we are able to ascertain if preferences are stronger for particular shape–non-word pairings, whereas in a classic forced-choice task, results are conflated (i.e. a strong kiki-spiky preference would manifest as a bouba-round preference automatically).

We aimed to make a detailed contrast of letter shape with letter sound. Our studies manipulated only the consonants, as previous studies have found stronger effects of consonants than of vowels in shape–non-word associations (e.g. Nielsen & Rendall, 2012; but see Ozturk et al., 2013). We predict that non-words with curved letters will be matched with rounded abstract shapes, and those with angular letters with spiky abstract shapes. We also explore a possible phonological influence by contrasting voicing in our items. Voicing is an example of a contrast in sonority (Carr, 2012), a broad contrast underlying several studies showing that shape–non-word associations are driven by voicing (D’Onofrio, 2013), the stop/continuant distinction (Westbury, 2005; Aveyard, 2012), and obstruency (Nielsen & Rendall, 2012, 2013; Ahlner & Zlatev, 2010). We compare these orthographic and phonological factors using written (Experiment 1) and auditory (Experiment 2) presentation of non-word items. This also allows us to test earlier assumptions (e.g. Davis, 1961; Nielsen & Rendall, 2011) that auditory presentation avoids the influence of orthography. We predict that orthographic influences will be found in both modalities. Moreover, we predict that if phonological effects are found, they are more likely to appear in the auditory task, whereas written tasks will be dominated by orthographic influences.

Experiment 1: word–shape associations in a visual/auditory task

Methods

Participants

Forty-one participants were opportunistically recruited from the University of Edinburgh community to perform a short pencil-and-paper task lasting approximately 10 min. All subjects were monolingual native English speakers.

Materials

We created eight non-words in a CVCV structure, in which the vowel was always e and only consonants were manipulated. Given earlier findings that vowels do not drive associations as strongly as consonants (Nielsen & Rendall, 2011), and given the lack of contrast in curvature among English vowel graphemes (i is the only angular vowel, but also the only high front vowel, producing a confound), we do not examine variation among vowels.

Our non-words were designed to contrast both orthographic and phonological features: orthographically, half our items were angular and half curved (see below). Phonologically, our items also contrasted systematically in whether their consonants were voiced or voiceless. Angular items contained consonants whose graphemes include no curved lines, while curved items contained consonants whose graphemes include one or more curves. We evaluated this objectively by counting the number of straight lines and the number of curved lines in each consonant grapheme (vowels were held constant and are not considered). Using this method, the consonant graphemes in our curved items (s, f, d, g) contain two angular features and five rounded features, while those in our angular items (z, v, t, k) contain ten angular features and zero rounded features. Within English orthography, there are only eight items that satisfy this crossing of orthographic angularity and voicing; for example, the graphemes for the voiced/voiceless pair /b/ and /p/ are both curved (Footnote 1). Table 1 shows our full list of items and their orthographic and phonological features.
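The straight/curved feature count described above can be made concrete in a few lines. This is an illustrative sketch only: the text reports group totals, so the per-letter counts are our assumption, and the split among the curved graphemes s, f, d and g in particular is hypothetical, chosen simply to reproduce the reported two angular and five rounded features. The sketch also checks that the eight consonants fill the 2 × 2 orthography × voicing design with two consonants per cell.

```python
# Per-grapheme (straight, curved) stroke counts for lowercase Futura.
# ASSUMPTION: the paper reports only group totals. The angular counts below
# sum to its ten angular features; the split among s, f, d, g is hypothetical,
# chosen only to reproduce the reported 2 angular and 5 rounded features.
STROKES = {
    "z": (3, 0), "v": (2, 0), "t": (2, 0), "k": (3, 0),  # angular graphemes
    "s": (0, 1), "f": (1, 1), "d": (1, 1), "g": (0, 2),  # curved (split assumed)
}
VOICED = {"z", "v", "d", "g"}


def feature_totals(letters):
    """Return (straight, curved) stroke totals over a group of graphemes."""
    straight = sum(STROKES[c][0] for c in letters)
    curved = sum(STROKES[c][1] for c in letters)
    return straight, curved


def is_angular(letter):
    """A grapheme counts as angular only if it contains no curved strokes."""
    return STROKES[letter][1] == 0


def design_cells():
    """Group the eight consonants into (orthography, voicing) cells."""
    cells = {}
    for c in STROKES:
        key = ("angular" if is_angular(c) else "curved",
               "voiced" if c in VOICED else "voiceless")
        cells.setdefault(key, []).append(c)
    return cells


print(feature_totals("sfdg"))  # curved items: (2, 5)
print(feature_totals("zvtk"))  # angular items: (10, 0)
print(design_cells())
```

Treating "angular" as "no curved strokes at all" mirrors the definition in the text; a graded index such as straight / (straight + curved) would be an alternative, but it is not what the design uses.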

Table 1 Target non-words crossed by orthographic and phonological features

Procedure

Our words were presented in the Futura font, in which the letter t has no curvature. Each non-word was paired with both a rounded abstract shape and a spiky abstract shape (see Fig. 4, below). The task was presented to participants in a four-page booklet. Each page had two words that contrasted in voicing and angularity, such that each of the word pairs in Table 1 occurred on one page. The order of pages was counterbalanced across participants such that each word pair occurred in each position (first, second, third, and fourth page). Before each participant completed the booklet, the experimenter read aloud each of the non-words, instructing participants to attend to pronunciation (the grapheme e was consistently pronounced /ɛ/). We aimed to encourage participants to perform the task by considering the sounds of the words, rather than their visual form alone (if participants only saw the words and never heard them, this might force a visual matching strategy). Participants were directed to rate how well they thought each word matched the shape accompanying it, using the 7-point Likert scale provided (where 1 = bad match and 7 = good match).
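The text does not say how page order was counterbalanced. One simple scheme that guarantees each word pair appears in each position is a cyclic Latin square, sketched below under that assumption; the page labels are placeholders, since Table 1's actual pairs are not reproduced here, and `booklet_for` is a hypothetical helper that hands orders out in rotation.

```python
# Hypothetical page labels: Table 1's actual word pairs are not reproduced here.
PAGES = ["pair 1", "pair 2", "pair 3", "pair 4"]


def latin_square_orders(items):
    """Cyclic Latin square: row r presents items[r], items[r+1], ... (wrapping)."""
    n = len(items)
    return [[items[(row + pos) % n] for pos in range(n)] for row in range(n)]


def booklet_for(participant_index, orders):
    """Assign booklet orders to participants in rotation."""
    return orders[participant_index % len(orders)]


orders = latin_square_orders(PAGES)
for row in orders:
    print(row)
```

With 41 participants, rotation gives one order to 11 participants and each of the others to 10. A cyclic square balances serial position only; if carry-over from the immediately preceding page were a concern, a Williams design would be the usual alternative.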

Fig. 4

Shapes used for Experiments 1 & 2

Results

Participants’ ratings for each pairing were collapsed across the items sharing the same orthographic and voicing category (e.g. d and g). Figure 5 shows the mean rating for each word type.
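A minimal sketch of the collapsing step, assuming long-format data with one row per participant, consonant and shape (a layout we assume; the data format is not described here). Items are keyed by their manipulated consonant because the full non-words of Table 1 are not reproduced in this section.

```python
from collections import defaultdict
from statistics import mean

# (orthography, voicing) category of each consonant, following the design above.
CATEGORY = {
    "d": ("curved", "voiced"), "g": ("curved", "voiced"),
    "s": ("curved", "voiceless"), "f": ("curved", "voiceless"),
    "z": ("angular", "voiced"), "v": ("angular", "voiced"),
    "t": ("angular", "voiceless"), "k": ("angular", "voiceless"),
}


def cell_means(ratings):
    """Average 1-7 ratings into (shape, orthography, voicing) cells.

    `ratings` is an iterable of (participant, consonant, shape, rating) tuples;
    the two consonants sharing a category (e.g. d and g) are pooled.
    """
    cells = defaultdict(list)
    for _participant, consonant, shape, rating in ratings:
        orthography, voicing = CATEGORY[consonant]
        cells[(shape, orthography, voicing)].append(rating)
    return {cell: mean(values) for cell, values in cells.items()}
```

For example, toy rows `(1, "d", "rounded", 7)` and `(1, "g", "rounded", 5)` (invented values, not study data) pool into a single curved/voiced/rounded cell with mean 6.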

Fig. 5

Results from Experiment 1. Error bars represent standard deviations

A three-way 2 × 2 × 2 ANOVA (Footnote 2) was performed (rounded/spiky shape × curved/angular orthography × voiced/voiceless consonant) and did not reveal any main effects [all F’s (1, 292) < 1, all p’s > 0.05], indicating that participants showed no overall preference for a particular shape or word type. The absence of main effects is expected: participants were asked to rate the goodness of fit of word–shape pairs, so we did not anticipate that they would prefer any one word type or shape over the others.

There was a significant interaction between shape and orthographic angularity [F(1, 292) = 671.38, p < 0.001]: items with curved graphemes were rated significantly higher with the rounded shape than with the spiky shape, and items with angular graphemes were rated significantly higher with the spiky shape than with the rounded shape (see Fig. 5). Moreover, a large effect size (\(\eta_{p}^{2}\) = 0.509) indicates that the interaction between grapheme angularity and shape roundedness accounted for over half of the variance in word–shape ratings of fit. There was no significant interaction between voicing and shape [F(1, 292) = 0.081, p (uncorrected) = 0.8], indicating that voicing was not a significant factor in rating fit between words and shapes. All other interactions were non-significant (all F’s < 1, all p’s > 0.05), including the three-way interaction between all factors (Footnote 3).
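In a 2 × 2 slice of the design, the shape × orthography interaction amounts to asking whether the rounded-minus-spiky difference is larger for curved items than for angular items. The sketch below computes that difference of differences from cell means averaged over voicing; the numbers are toy values chosen to show a crossover, not data from Experiment 1. For balanced data, the interaction sum of squares equals n·c²/4, where c is this contrast and n the number of observations per cell; the identity does not hold exactly for unbalanced data.

```python
def interaction_contrast(means):
    """Shape x orthography interaction as a difference of differences.

    `means` maps (shape, orthography) -> mean rating, averaged over voicing.
    Positive values mean curved items gain more from the rounded shape
    (relative to the spiky one) than angular items do.
    """
    curved_gain = means[("rounded", "curved")] - means[("spiky", "curved")]
    angular_gain = means[("rounded", "angular")] - means[("spiky", "angular")]
    return curved_gain - angular_gain


def interaction_ss(contrast, n_per_cell):
    """Sum of squares for a 2 x 2 interaction in a balanced design: n * c**2 / 4."""
    return n_per_cell * contrast ** 2 / 4


# Toy cell means (NOT data from Experiment 1), chosen to show a crossover.
toy = {("rounded", "curved"): 6.0, ("spiky", "curved"): 2.0,
       ("rounded", "angular"): 2.0, ("spiky", "angular"): 6.0}
c = interaction_contrast(toy)  # (6 - 2) - (2 - 6) = 8.0
print(c, interaction_ss(c, n_per_cell=1))  # 8.0 16.0
```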

A comparison with earlier bouba–kiki results and careful consideration of the mean ratings strongly support the interpretation that orthography is the strongest influence on shape–non-word ratings. In particular, the ratings given to items containing the sound features implicated in previous studies are much better explained by orthography than by an interaction between voicing and stop/continuant status.

Previous bouba–kiki studies targeting particular sound features have found that voiced (D’Onofrio, 2013) and continuant (Nielsen & Rendall, 2012; Westbury, 2005) non-words are more likely to be paired with rounded shapes (and, conversely, voiceless and stop items with spiky shapes). Given this, we can predict exactly how the results should look if some interaction between voicing and stop/continuant status, rather than orthography, were the mechanism underlying our results.

For example, if both voicing and stop/continuant status were influential but voicing was dominant, we should expect z/v to be rated most highly with the rounded shape (since these consonants are both voiced and continuant), followed by d/g, then s/f, and finally t/k. On the other hand, if continuant status were dominant, z/v would still take the top spot in rounded-shape ratings, but would be followed by s/f and trailed by d/g and t/k. (In each case, the reverse pattern would hold for the spiky shape.) However, as Fig. 5 shows, our results match neither of these patterns. Instead, d/g is rated most highly with the rounded shape, followed by s/f, and then by z/v and t/k. This pattern cannot be accounted for by any reasonable interaction between voicing and stop/continuant status, given what we know about the direction of these associations. Therefore, our results support an interpretation in which orthographic angularity drives the fit between shapes and non-words in a written task.
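The ranking argument above can be checked mechanically. The sketch below scores each consonant pair as a weighted sum of cues; the 2:1 weights are an arbitrary assumption used only to encode which cue is "dominant", and the observed rounded-shape order is the one reported for Experiment 1. A model counts as consistent if the observed ranking never places a pair above one that the model strictly prefers (ties are allowed in either order).

```python
# Cue values per consonant pair, following the voicing and stop/continuant
# classes discussed above; "curved" marks curved graphemes.
FEATURES = {
    "z/v": {"voiced": 1, "continuant": 1, "curved": 0},
    "d/g": {"voiced": 1, "continuant": 0, "curved": 1},
    "s/f": {"voiced": 0, "continuant": 1, "curved": 1},
    "t/k": {"voiced": 0, "continuant": 0, "curved": 0},
}

# ASSUMPTION: 2:1 weights are arbitrary; they only encode which cue dominates.
MODELS = {
    "voicing-dominant": {"voiced": 2, "continuant": 1},
    "continuant-dominant": {"voiced": 1, "continuant": 2},
    "orthography": {"curved": 1},
}

# Rounded-shape order reported for Experiment 1 (highest rating first).
OBSERVED_ROUNDED = ["d/g", "s/f", "z/v", "t/k"]


def scores(weights):
    """Predicted preference for the rounded shape: weighted sum of cues."""
    return {pair: sum(w * cues[name] for name, w in weights.items())
            for pair, cues in FEATURES.items()}


def predicted_order(weights):
    """Pairs sorted from most to least preferred with the rounded shape."""
    s = scores(weights)
    return sorted(s, key=lambda pair: -s[pair])


def consistent(observed, weights):
    """False if any observed ranking reverses a strict predicted preference."""
    s = scores(weights)
    return all(s[lower] <= s[higher]
               for i, higher in enumerate(observed)
               for lower in observed[i + 1:])


for name, weights in MODELS.items():
    print(name, predicted_order(weights), consistent(OBSERVED_ROUNDED, weights))
```

Under these assumptions only the orthography model is consistent with the observed order; both sound-based models require z/v to come first.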

Discussion

These results suggest that preferences in matching non-words and shapes in literate adults are driven primarily by orthographic angularity, particularly in explicit written tasks. Although participants were given the relevant pronunciations of the words as well as their written forms, orthographic angularity was the only significant factor influencing ratings of fit between shapes and words. Words containing curved graphemes were rated more favourably with rounded shapes, and words containing angular graphemes were rated more favourably with spiky shapes. The influence of voicing and of stop/continuant status was non-significant (i.e. neither interacted with shape), suggesting that participants did not consider the sounds of words when rating their fit with abstract shapes. In Experiment 2, we repeat the task using a purely auditory procedure to determine whether orthographic effects remain, and to examine whether any phonological effects emerge.

Experiment 2: word–shape associations in an auditory task

Methods

Participants

Thirty-six participants were recruited from the University of Edinburgh community via our online student employment board, and were paid £1.50 for the 10-min task. These participants were paid since the computer-based nature of the task required them to travel to the lab.

Materials and procedure

Our materials and procedure were identical to Experiment 1, with the following exceptions. The eight target non-words were pre-recorded at studio quality, with even stress and pitch, by a trained phonetician. The task was presented on a MacBook computer using a standard graphical user interface programmed in Tcl/Tk. Auditory stimuli were played through Sennheiser PXC250 headphones at a constant volume. Each abstract shape (rounded and spiky) was presented on screen with an accompanying 1–7 Likert scale below it, set using a mouse-controlled slider. Each trial began with the participant hearing a word. Participants then rated how well the word matched each shape using the Likert scales (from 1 = bad match to 7 = good match). Participants could replay the non-word within a trial as many times as they wished. Clicking to submit the ratings for a word played the next word and reset both Likert scales to the centre. This procedure was repeated for all eight words, in a random order for each participant.
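The trial sequence described above (one random order per participant, both sliders starting at the scale midpoint) can be sketched without any GUI code. This is not the original Tcl/Tk program: `run_session` and `get_rating` are hypothetical names, with `get_rating` standing in for the slider interface and audio replay, and the word labels are placeholders because Table 1 is not reproduced here.

```python
import random

SCALE_MIN, SCALE_MAX = 1, 7
CENTRE = (SCALE_MIN + SCALE_MAX) // 2  # 4: both sliders reset here on every trial


def run_session(words, get_rating, seed=None):
    """Present each word once, in a fresh random order, and collect two ratings.

    `get_rating(word, shape, start)` is a hypothetical stand-in for the slider UI;
    it receives the slider's starting value and returns the submitted rating.
    """
    rng = random.Random(seed)
    order = list(words)
    rng.shuffle(order)  # new random order for each participant
    trials = []
    for word in order:
        trial = {"word": word}
        for shape in ("rounded", "spiky"):
            rating = get_rating(word, shape, CENTRE)
            if not SCALE_MIN <= rating <= SCALE_MAX:
                raise ValueError(f"rating {rating} is outside the 1-7 scale")
            trial[shape] = rating
        trials.append(trial)
    return trials


# Placeholder labels: the real items are listed in Table 1.
WORDS = [f"word{i}" for i in range(1, 9)]
# A participant who never moves either slider submits the midpoint every time.
session = run_session(WORDS, lambda word, shape, start: start, seed=1)
```

Using a per-participant `random.Random` instance keeps each session reproducible from its seed without touching the module-level generator.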

Results

Our data are shown in Fig. 6. As in Experiment 1, a three-way 2 × 2 × 2 ANOVA was performed (rounded/spiky shape × curved/angular orthography × voiced/voiceless consonant).

Fig. 6

Results from Experiment 2. Error bars represent standard deviations

As in Experiment 1, there were no significant main effects [all F’s (1, 257) < 3, all p’s > 0.05], indicating that no particular type of item was generally preferred; this result is expected since the task involved rating the fit between items. Two significant interactions were observed. First, a significant interaction between shape and orthographic angularity [F(1, 257) = 113.87, p < 0.001] (Footnote 4) indicates that letter curvature influenced ratings as in Experiment 1, even in a purely auditory task. Second, there was a significant interaction between shape and voicing [F(1, 257) = 32.55, p < 0.001], indicating that voicing also played a role in matching words to shapes: voiced items were rated more favourably with the rounded shape, and voiceless items more favourably with the spiky shape. Estimates of effect size allow some additional comparison of these interactions. The interaction between shape and orthographic angularity accounted for more variance in ratings (\(\eta_{p}^{2}\) = 0.167) than the interaction between shape and voicing (\(\eta_{p}^{2}\) = 0.054; Footnote 5).

As in the first experiment, the ordering of ratings for each shape indicates that the two-way interactions between (1) shape and orthography and (2) shape and voicing are the best candidates for genuine effects, rather than any other effects observed in post hoc analyses. First, interactions involving fewer factors provide a more parsimonious explanation. Perhaps more importantly, results from earlier bouba–kiki studies would predict that voiced continuants should be rated more highly with the rounded shape than voiced stops or voiceless continuants. Yet our results again show that d/g and s/f garnered the highest ratings for the rounded shape (rather than z/v followed by d/g). This indicates that voicing and orthography take precedence over stop/continuant status. Stop/continuant status did have a small effect in the expected direction in post hoc ANOVAs that included this contrast in lieu of orthography or voicing [F(1, 257) = 4.21, p = 0.04 uncorrected, p = 0.48 corrected]. However, this effect does not survive correction, and it accounted for less than 1 % of the variance in ratings (\(\eta_{p}^{2}\) = 0.007).
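The correction method is not named in this section. The reported values (p = 0.04 uncorrected, p = 0.48 corrected) are consistent with a Bonferroni correction across 12 comparisons, so the sketch below assumes that; the factor of 12 is our inference from the ratio 0.48 / 0.04, not a stated detail of the analysis.

```python
def bonferroni(p, m):
    """Bonferroni-adjusted p value for one of m comparisons (capped at 1)."""
    return min(1.0, p * m)


def significant(p, m, alpha=0.05):
    """Equivalent test: compare the raw p value with alpha / m."""
    return p < alpha / m


M = 12  # ASSUMPTION: inferred from 0.48 / 0.04; the number of tests is not stated
print(bonferroni(0.04, M), significant(0.04, M))
```

The two functions express the same decision: scaling p up by m and comparing with alpha is equivalent to comparing the raw p with alpha / m, here about 0.0042, which 0.04 does not reach.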

Discussion

These results show that even in a purely auditory task, the curvature of letters in a non-word’s written form strongly influences associations between non-words and abstract shapes among literate adults. Non-words with curved orthography tended to be rated more highly with rounded shapes, and non-words with angular orthography tended to be rated more highly with spiky shapes. There was also some phonological influence: rounded shapes were matched more strongly with voiced consonants, and spiky shapes with voiceless consonants. However, effects of voicing were secondary to the influence of orthography, which accounted for more variance in shape ratings than voicing did. Stop consonants were rated slightly more highly with spiky shapes, but this very small effect of stop/continuant status did not survive correction, suggesting that overall sonority may drive associations where phonological factors are in play. Below we discuss the implications of our findings from both studies in the broader context of the bouba–kiki literature.

General discussion

We have examined a class of naming bias widely known as the bouba–kiki effect, in which shapes are preferentially labelled with certain non-words that apparently ‘fit’ their referent in an iconic way (e.g. round shapes labelled bouba and spiky shapes labelled kiki). In the existing literature, this phenomenon has been overwhelmingly attributed to iconic cross-sensory associations; in other words, some natural goodness of fit between the sound properties of words and shapes is taken to drive associations. Our contribution has been to investigate in detail whether and how orthography plays a role in the bouba–kiki effect, and whether phonological features still hold sway when this factor is considered. We have suggested that the bouba–kiki effect in literate subjects might be predominantly mediated by the pairing of rounded shapes with words that contain rounded letters, and of spiky shapes with words that contain angular letters. Our literature review showed that previous arguments against orthographic influence in the bouba–kiki effect were inconclusive. In the case of studies with illiterate adults, misrepresentation of early research led to a tendency to dismiss the potential for orthographic influence, although Bremner et al. (2013) have now provided a more definitive study, albeit with a limited stimulus set. Studies with pre-literate children have been few: those involving explicit labelling have procedural confounds that may allow visual matching strategies (Maurer et al., 2006), and results from implicit preferential looking with infant subjects have been mixed (Ozturk et al., 2013; Fort et al., 2013).

Experiment 1 tested word–shape associations by asking literate adult participants to rate how well non-words matched abstract shapes. Non-words were presented in written form but accompanied by spoken versions, to avoid forcing a visual strategy and to ensure that the sounds of the non-words were interpreted consistently. We found that orthographic angularity was the sole significant factor influencing ratings: participants overlooked phonological features and matched words containing angular letters to spiky shapes (and words containing curved letters to rounded shapes). Experiment 2 presented the same task in a purely auditory form and still showed a strong influence of orthography on ratings of fit between non-words and shapes. Experiment 2 also showed a weaker phonological effect: rounded shapes were preferentially paired with words containing voiced consonants, and spiky shapes with words containing voiceless consonants. We were able to capture these subtle phonological effects only by measuring scalar goodness of fit between non-words and shapes, rather than by using a more traditional forced-choice task. In summary, although the dominant strategy was to match visual features of graphemes to abstract shapes, we also found a more modest influence of phonology in auditory non-word–shape ratings. Together, these results show that symbolic, culturally acquired associations between letter shape and sound are the primary driving force among literate participants in a bouba–kiki task, while iconic associations between sound and shape constitute a weaker force that disappears entirely in a written task.

In light of these results, we would argue that the bouba–kiki effect in literate Western participants (that is, the majority of reports in the literature) is not the strongest evidence for cross-sensory sound symbolism. Rather, the bouba–kiki effect in particular is heavily mediated by simple visual matching strategies that exploit similarities between shapes and letters. In other words, a culturally acquired, symbolic cross-modal association between linguistic sound and letter shape plays a strong role in the task. We have argued that certain studies purporting to have ruled out orthographic influences may not have succeeded in doing so, whether they tested non-literate children (who may in fact have been partially graphemically aware), tested participants cross-culturally (who may in fact have been familiar with the Roman alphabet), or attempted to factor out orthography methodologically (where confounds nonetheless remained).

While it seems highly likely that iconic cross-sensory associations between linguistic sound and shape angularity exist (e.g. as demonstrated in Bremner et al., 2013; Ozturk et al., 2013), our data strongly suggest that, at least in literate subjects familiar with the Roman alphabet, acquired orthographic knowledge overshadows more basic cross-sensory associations. In some cases this influence may reinforce iconic cross-modal associations (e.g. /d/ is voiced and is written with a curved letter), whereas in other cases orthographic influence may override more basic associations (e.g. although /z/ is voiced and continuant, it is reliably rated highly with spiky shapes, in line with its angular letter form).

Our results can inform future bouba–kiki studies in three important ways. First, our auditory task shows that auditory presentation alone cannot eliminate potential effects of orthography. Second, future studies should steer away from single-trial forced-choice methodologies modelled on the original takete–maluma study (Köhler, 1929), as these may be unable to separate orthographic and phonological effects. Indeed, several recent studies have moved in this direction (e.g. Nielsen & Rendall, 2012; D’Onofrio, 2013). Finally, our data show that the bouba–kiki effect in literate subjects is driven primarily by orthographic angularity, which completely obscures iconic cross-sensory effects in a written task and largely overshadows more modest phonological influences even in an auditory task. Extensions of this finding could use other alphabetic systems, or the different mappings between sound and letter shape found in other languages that use the Roman alphabet, to study in increasing detail the relative contributions of phonology and orthography to the effect.

The influence of literacy on word–shape associations may even extend beyond visual orthographic form. Specifically, the level of phonemic awareness necessary to access specific phonological features may itself be a consequence of literacy. Literacy has significant effects on metalinguistic awareness, particularly phonological awareness: conscious access to the individual sound segments of a language, which is drastically enhanced by learning an alphabetic script. Lukatela, Carello, Shankweiler, and Liberman (1995) demonstrated this using a phoneme monitoring task, in which participants listen to words and must identify the total number of sounds in each word. They found that illiterate participants were significantly less accurate on this task, showing that their phonemic awareness is less fine-tuned than that of literate participants (see also Cheung, Chen, Yip Lai, Wong, & Hills, 2001; Cheung & Chin, 2004). This may mean that illiterate participants respond to the whole word form more than to consonant or vowel features in isolation. Indeed, Ward and Simner (2003) have shown that phonemic awareness plays a role in another cross-sensory phenomenon, lexical-gustatory synaesthesia, in which similar phonological features induce similar tastes across words.

This interpretation is supported by the fact that there is still no cross-cultural evidence for associations between shapes and specific features of sounds. Bremner et al.’s (2013) study used only two non-words, which differed along several phonological features (voicing, vowel quality, reduplication), so it is difficult to tell whether their participants responded to the whole word form or to specific sound features. Furthermore, Ozturk et al.’s (2013) study failed to find preferential looking effects in infants when varying minimal properties of non-words: infants demonstrated looking preferences only when words differed in both vowel and consonant, not when either differed in isolation (i.e. they showed the effect for kiki vs bubu, but not for kuku vs bubu; Fort et al., 2013). In summary, it may be that associations between specific phonological features and shape are only possible with the enhanced phonemic awareness that comes with literacy. Without it, the evidence suggests, participants may still make shape–non-word associations but will respond more to the gestalt word form, making it difficult to identify iconic associations between very specific phonological features and shape.

A fuller understanding of how phonological and orthographic influences interact in the bouba–kiki effect, or of the extent to which orthographic influences dominate phonological ones, is an issue for further study. Some authors have suggested that sound–shape correspondences may be borne out in alphabetic systems themselves (Koriat, 1977), in that letters representing articulatorily “round” sounds (i.e. bilabial sounds or rounded vowels) tend to be more curved. This is reflected in the persistent difficulty of choosing non-words whose consonants differ only in sound, without co-occurring contrasts in orthographic angularity. This consideration limited the number of non-word items examined in our own studies, which were heavily constrained to maintain distinct phonological and orthographic contrasts, all the while working within the confines of English. One area for future study would be to extend Koriat’s (1977) work to examine the extent to which alphabetic scripts may in fact reflect basic word–shape associations, which could mean that orthography itself leverages, and in turn reinforces, such associations.

What is clear from our studies and previous work is that language users take cues from the word form (visual and/or acoustic) and respond to these cues when inferring the meanings of words (e.g. Berlin, 1994; Imai et al., 2008). The bouba–kiki studies reviewed in this paper have played an important role in revitalising interest in the question of arbitrariness in language (e.g. see Inglis-Arkell, 2010; Robson, 2011). Previous studies such as Ramachandran and Hubbard (2001) have made useful contributions by bringing this phenomenon to light and by inspiring a wider literature examining cross-sensory naming biases more generally (e.g. for taste, Simner, Cuskley, & Kirby, 2010; Gallace, Boschin, & Spence, 2011; for motion, Cuskley, 2013b; and from an evolutionary perspective, Cuskley & Kirby, 2013; Cuskley, 2013a).

Many open questions remain about the bouba–kiki effect, including its relationship with other sensory phenomena such as synaesthesia (e.g. see Cuskley & Kirby, 2013) and the relative contribution of higher-order processes such as analogical reasoning and metaphor interpretation (e.g. Marks, 1996). Our studies have shed light on the potential mediating influence of symbolic, culturally acquired cross-modal associations, such as those inherent in learning an alphabet. This provides a fuller understanding of the role of acquired associations between linguistic sound and letter shape in the bouba–kiki effect, illuminating another part of the range of factors that may play a role in naming biases more generally.