Introduction

Languages and scripts are highly complex systems that vary in terms of the devices and mechanisms that are available to encode our thoughts, ideas, perceptions and experiences. Each language has a unique system of lexical and grammatical devices that carve up or partition reality in different ways. Some languages have rich rhetorical choices for representing particular sensory experiences. For example, Lao speakers have codability (more consistently nameable categories) for certain odours but have fewer terms for colours than English speakers (Majid et al., 2018). Similarly, scripts that are used to write these languages have unique visual features that encode aspects of the particular language. Of particular note is that learning to read builds on this primary language system. From an early stage in development, children learn to speak and have knowledge and understanding of their spoken language(s). At a later stage when children learn to read, they gradually learn how the visual features of the script interface with their spoken language.

Script relativity is an extension of the much-debated linguistic relativity hypothesis. According to the traditional Whorfian or linguistic relativity hypothesis, speakers of different languages have different ways of conceptualising or perceiving the world due to linguistic variation (Whorf, 1956). A relatively weaker version adopts the view that distinctions in a language that are grammaticized, obligatory, or habitually used have a channelling effect on attention towards particular functions of these forms while actually speaking or preparing to speak (Berman & Slobin, 1994; Slobin, 1996, 2003; Strömqvist & Verhoeven, 2004). This version has been termed thinking-for-speaking (Slobin, 1996). Aligned with the thinking-for-speaking view, Levelt’s (1989) speech production model proposes that language-specific demands on the formulation of messages shape the preparation of encodable messages even before the activation of specific lexical items. Thus, according to this model, there are early detectable differences in attention allocation in speakers of different languages as they prepare to describe an event or action. An alternative, stronger Whorfian perspective is that there are nonlinguistic, longer-term cognitive consequences of this linguistic diversity. Notably, advances in technology are providing a more systematic and scientific approach to investigating these language-on-cognition effects. Influential research has used the latest methods and technologies, including neural imaging and eye tracking, to investigate these effects (e.g., Athanasopoulos & Casaponsa, 2020; Athanasopoulos et al., 2016; Lupyan et al., 2020; Maier & Abdel Rahman, 2018; Majid et al., 2018; Meyer et al., 2019; Papafragou & Grigoroglou, 2019). Eye tracking technology has revealed that preparing to describe motion events can lead to distinct shifts of attention in speakers of languages that vary in terms of manner- and path-salience.
In satellite-framed languages (e.g., English, German), manner is typically encoded in the verb (e.g., run) and path is expressed using a variety of other satellite devices (e.g., in or out). In contrast, in verb-framed languages (e.g., French, Spanish), path tends to be encoded in the verb (e.g., entrer, sortir) and manner is expressed using additional lexical devices (e.g., entrer en courant ‘enter while running’). Languages are considered to fall on a continuum in relation to manner- and path-salience (Soroli & Verkerk, 2017). Greek, for example, has satellite- and verb-framed characteristics (Soroli & Verkerk, 2017; Talmy, 2000), where both patterns are equally frequent. Tracking of eye movements has revealed that when preparing to describe short animations shown on a computer screen (e.g., a person skating towards a snowman), Greek speakers tended to attend to manner of motion (e.g., skating) while English speakers focused on the path endpoint (e.g., the snowman) (Papafragou & Grigoroglou, 2019; Papafragou et al., 2008). A factor that plays a role in language-on-cognition effects is the varied allocation of attention due to contrasting linguistic features in different language domains, which in turn may affect other observable cognitive processes.

Recently, the linguistic relativity hypothesis has been applied not only to spoken languages but also to the diverse scripts used to represent them. Pae (2020) has referred to these script-on-cognition effects as the ‘script relativity hypothesis’ (i.e., the script we read in affects our cognition). The aim of this relatively new area of research is to examine whether variation in writing systems affects cognition and, if so, in what way. There is enormous variation in the characteristics exhibited by the different writing systems of the world. In order to capture this diversity, Daniels and Share (2018) have developed an inclusive, multiple-dimensions-of-complexity framework that covers the full spectrum of writing systems rather than just the more traditionally studied European orthographies. This framework includes characteristics such as spatial arrangement and nonlinearity, historical change, spelling constancy, omission of phonological elements, allography, dual-purpose letters, ligaturing, visual complexity and inventory size (Daniels & Share, 2018; Share & Daniels, 2016). According to Daniels and Share (2018), the unique characteristics and complexity of a script can be characterised in terms of a combination of some or all of these dimensions.

This review focuses on two areas of variation: (1) the spatial arrangement and layout of the script and (2) the presence or omission of phonological elements, namely lexical tone. Both are prominent script features and strong candidates for producing script relativity effects. With regard to the spatial layout of scripts, interword spacing and linear versus nonlinear configuration will be examined. Roman script has a serial or linear configuration, where the text is read from left to right, whereas other scripts (e.g., Thai, Sinhala, Devanagari, Kannada and Korean Hangul) have nonlinear configurations, as characters or letters may occur above or below the main text line or in blocks. Reading densely crowded nonlinear scripts may heighten visuo-perceptual abilities in comparison to reading linear scripts (e.g., Roman script).

In many languages around the world, lexical tone plays a crucial role in both spoken and written language (Yip, 2002). Chinese characters (e.g., 我 ‘I’) give no explicit orthographic indication of tone, whereas tone is marked in pinyin (wǒ ‘I’) and in the scripts used for Thai (แม่ ‘mother’), Lao (ແມ່ ‘mother’), Myanmar ( ‘grandpa’) and Vietnamese (tuần ‘week’). Tonal languages vary both in the complexity of their tone systems and in whether their scripts orthographically encode tone. This variation may produce differences in tone sensitivity and auditory perceptual skills between readers of tonal languages that orthographically encode tones and readers of tonal languages that do not.

The question posed here is whether the habitual experience of reading a script with a particular spatial layout or with variation in lexical tone differentially affects attention, and in turn, other cognitive processes. We can draw on the wealth of research that has been conducted on linguistic relativity and apply that expertise and knowledge to script relativity. There may be a relatively weak effect such that differences due to script variation occur only while an individual is engaged in the process of reading, or alternatively, a stronger version may emerge resulting in longer-term nonlinguistic cognitive consequences.

We will examine some of the distinctions between language and reading prior to evaluating empirical evidence that is relevant to script-on-cognition effects due to spatial layout and lexical tone variation. As this is a relatively new area of research, the methods used in the studies reviewed were not designed to examine the specific issue of script-on-cognition effects.

Comparison of language and reading

One of the critical distinctions between the acquisition of language and the act of reading is that the human brain has adapted and evolved over time for language processing, whereas reading is a relatively recent cultural invention and, consequently, has no brain region that evolved specifically for it. Instead, the processes involved in reading have been found to piggyback onto pre-existing regions of the visual cortex that are typically used for recognising objects and faces (e.g., Dehaene, 2005; Dehaene & Cohen, 2007; Dehaene et al., 2005, 2010, 2015; Dehaene-Lambertz et al., 2018). Moreover, most children who are not neurologically or hearing impaired learn to speak at about the same age in the very diverse languages of the world. In contrast, reading and writing need to be directly taught to the majority of children. Literacy builds on the child’s knowledge of their spoken language and is thus a secondary system in relation to language. Furthermore, written text is inherently linked to the phonology and meaning of the particular spoken language. Literacy develops at different rates depending on the depth of the orthography a child is learning to read.

From an early stage in development, language begins to shape children’s cognitive processes. Infants gradually transition from being universal listeners to tuning into the language-specific speech categories of their language during the first year of life (e.g., Eimas et al., 1971; Kuhl, 2011; Werker & Tees, 1984). Subsequently, children learn how their language maps onto different aspects of their experienced world. In fact, children learn how their language partitions the semantic domain from a very early stage of development (e.g., Bowerman & Choi, 2003; Choi & Bowerman, 1991; Choi et al., 1999). Children tune into the obligatory or habitually used categories of their language and represent events in the style of their particular language from an early age and in a typical way that their language encodes experience (Bowerman & Choi, 1994; Choi & Bowerman, 1991; Choi et al., 1999). Thus, the child learns ‘language-specific patterns of thinking for speaking’ from a very early stage of development (Slobin, 1996, p. 77).

Similarly, when learning to read a particular script, albeit at a later stage in development, attention needs to focus on how the visual features of the particular script interface with the spoken language, and what aspects of the spoken language are explicitly encoded by the script. The overall aim is to extract information and discern the meaning from what is written in that script on the page or screen (Perfetti & Dunlap, 2008). The numerous orthographies around the world differ substantially in how they achieve this and in what aspects of the language are encoded by the written form. Thus, in order to effectively read and comprehend a text, readers need to tune in and attend to the distinctive visual features of the script that interface with their language. Consequently, attentional resources are differentially allocated depending on the specific characteristics or features of the orthography and the role that they play in forming a coherent mental representation of the words or text. Over time, the habitual process of reading a script leads to an automatic lexical level of processing of visual words in children and adults, as illustrated by the interference effects found in the classic Stroop paradigm (Stroop, 1935), where participants name the font colour of the word but are told not to read the actual colour word (e.g., RED in green font). This habitually learned behaviour may lead to substantial variation in cognitive processing in scripts with distinctive features. Just as linguistic relativity postulates that habitual language use results in a unique set of habitual thought and thinking patterns, habitual reading of a particular script may have comparable effects on cognition and mental processes.
Research showing that becoming literate has profound effects on the brain and associated neural and cortical networks (e.g., Dehaene, 2011; Dehaene et al., 2010, 2015; Dehaene-Lambertz et al., 2018; Huettig et al., 2018) lends support to this possibility. Learning to read, even in previously illiterate adults, restructures and rewires the brain to accommodate the particular writing system. Common and script-specific regions of the brain are activated when reading different scripts, driven by the demands of the script and its language (e.g., Bolger et al., 2005; Kumar & Padakannaya, 2019; Paulesu et al., 2000; Seghier et al., 2014; Sun et al., 2011).

Spatial arrangement or layout of scripts

In relation to the contrasting spatial arrangement or layout of scripts (Daniels & Share, 2018), robust effects of reading or writing direction (i.e., left-to-right, right-to-left, up-down) on nonlinguistic cognition have been established. There has been ample research demonstrating the effects of reading or writing direction on nonlinguistic cognition, for example, in numerical cognition tasks (e.g., Azhar et al., 2020; Göbel, 2015; Singh et al., 2000), spatial and scanning biases in drawing (Faghihi et al., 2018, 2019; Padakannaya et al., 2002; Tosun & Vaid, 2014; Vaid, 1995) and aesthetic preference biases (Friedrich & Elias, 2016). Other script-specific contrasts in spatial layout are unlikely to have such an impact as the overt behaviour of reading or writing direction.

Reading with and without interword spaces

Some Asian scripts such as Thai, Chinese, Japanese, Lao, Khmer, Balinese, Sundanese, Tibetan and Myanmar do not have salient interword spaces that demarcate word boundaries, that is, where a word begins and ends. Historically, interword spaces were not introduced into alphabetic scripts until about the eighth to tenth centuries (Manguel, 1996; Saenger, 1997). Some European languages such as Finnish and German still have relatively long unspaced compound words (e.g., Windschutzscheibenwischer ‘windscreen wipers’) (see Inhoff et al., 2000, for research on German compound words). The importance of interword spaces, as illustrated by the current Roman script text, is that they form clear parafoveal word segmentation cues so that eye movements (saccades) can be readily targeted close to the centre or Optimal Viewing Position (OVP) (O’Regan, 1990) of the next word to be read. The lack of these salient visual word segmentation cues in scripts implies that during normal reading there is a degree of ambiguity in relation to which word a given letter belongs to (an example to illustrate this difficulty: interwordspacesserveanimportantfunctioninromanscript). Furthermore, there is an additional load due to perceptual crowding in the relatively denser, more tightly knit text. This can be particularly challenging for beginning readers (Pan et al., 2020). Typically, when children are learning to read scripts without these salient visual cues, interword spaces are inserted, but by the second year of learning to read, children have to learn alternative, script-specific means of segmentation. Adult readers of these scripts have had a lifetime of reading text without spaces and so have adapted to reading in this format. This is the normal layout that readers are accustomed and habituated to reading. In scripts without these salient segmentation cues, other orthography-specific cues need to be identified and utilised by the reader to segment words.
Thus, there is an additional in-built step or process involved in reading an unspaced script, which involves demarcating where words begin and end using alternative script-specific segmentation cues. If we compare the demands of reading scripts without interword spaces with the demands of reading scripts that have them, somewhat different cognitive processes emerge. These processes also vary depending on the particular unspaced script.

Eye tracking technology offers an informative tool to investigate the underlying cognitive processes and mechanisms involved in reading scripts with and without interword spaces (Rayner, 1998). Using this technology enables us to monitor eye movement measures (e.g., first fixation, gaze duration, total fixation time) and to gain a detailed picture of how word processing and reading unfolds over time (Juhasz et al., 2005). Not surprisingly in skilled Roman script readers, who have had a lifetime of experience of reading spaced text, when these salient boundary cues are removed, both eye movements and word recognition are substantially disrupted (Morris et al., 1990; Rayner et al., 1998; Spragins et al., 1976). Reading is typically slowed down by 40 to 70%, disrupting both the way the eyes move (saccades) through the text and the word recognition process (Rayner et al., 1996; Rayner et al., 1998). This effect has been found to be more deleterious when reading relatively unfamiliar or low-frequency words (e.g., aorta) in the unspaced condition than when reading length-matched relatively familiar or high-frequency words (e.g., party). This lexical frequency effect is interpreted as signifying that removal of spaces interferes with the word recognition process. Eye movements are also disrupted as indicated by substantial changes in where the saccades land on the target words. Readers in normal spaced text typically land a bit to the left of the middle of the word (the Preferred Viewing Location: PVL), whereas when spaces are removed, they tend to land close to the beginning of the word. Readers tend to adopt a more cautious approach when reading unspaced text with additional forward and regressive eye movements that are shorter in length (Mirault et al., 2019). Typically, readers have longer fixations on words when reading unspaced text, which is likely due to the absence of familiar word initial and ending cues. 
This research on reading in Roman script highlights the important function that spaces serve when reading this particular script.
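To make the eye movement measures mentioned above more concrete, the following Python sketch shows one common way of deriving first fixation duration, gaze duration and total fixation time for a target word from an ordered sequence of fixations. It is an illustrative simplification, not the analysis pipeline of any study reviewed here: the `Fixation` record and the `reading_measures` function are hypothetical names, and the sketch assumes that each fixation has already been assigned to a word and ignores complications such as word skipping during first pass.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Fixation:
    word_index: int   # index of the word the fixation landed on (assumed pre-assigned)
    duration_ms: int  # fixation duration in milliseconds


def reading_measures(fixations: Iterable[Fixation], target: int) -> dict:
    """Compute three standard word-level measures for the target word.

    first_fixation: duration of the first fixation on the target word.
    gaze_duration:  sum of fixations from first entering the word until
                    the eyes first leave it (first-pass time).
    total_time:     sum of all fixations on the word, including rereading.
    """
    first_fixation: Optional[int] = None
    gaze_duration = 0
    total_time = 0
    in_first_pass = False

    for fix in fixations:
        if fix.word_index == target:
            total_time += fix.duration_ms
            if first_fixation is None:
                # First time the eyes land on the target: first pass begins.
                first_fixation = fix.duration_ms
                in_first_pass = True
            if in_first_pass:
                gaze_duration += fix.duration_ms
        elif in_first_pass:
            # The eyes left the target word, so the first pass is over.
            in_first_pass = False

    return {
        "first_fixation": first_fixation,
        "gaze_duration": gaze_duration,
        "total_time": total_time,
    }
```

Under these definitions, an effect that appears in gaze duration and total fixation time but not in first fixation duration points to later stages of word processing rather than to initial eye guidance.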

In order to investigate the cognitive processes involved in reading a script that does not normally have these highly salient visual segmentation cues, Winskel et al. (2009) examined the eye movements of Thai-English bilinguals when reading both Thai and English with and without interword spaces. Thai has an alphabetic script that is read from left-to-right. As in previous studies on Roman script, the frequency of critical target words in the experimental sentences was manipulated, as word frequency is related to the ease or difficulty of processing a word (Radach & Kennedy, 2004; Rayner, 1998). When reading Thai, results revealed that the movement of the eyes through the text was not affected, as first fixation durations and initial landing positions were not different when reading spaced or unspaced sentences. First fixation landing positions in both the spaced and unspaced condition were at the PVL. Thus, eye guidance (word targeting and lexical segmentation) was neither facilitated nor disrupted by the insertion of spaces. However, the comprehension of words was facilitated by insertion of spaces as indicated by the refixation measures (gaze duration and total fixation time), which were significantly shorter in duration on the target words in the spaced than unspaced sentences. As expected, removal of spaces severely disrupted reading English sentences in both bilinguals and English monolinguals: effects were accentuated in the bilinguals due to their lower English language proficiency. In this study, Thai monolinguals were not used. This would have been a useful addition as it would enable us to specifically examine the effects of reading Thai script in comparison to the Thai-English bilinguals and English monolinguals.

Potential language-specific candidates for word or syllable segmentation in Thai are the salient nonaligned initial vowels (e.g., โ) that occur prior to the consonant at the beginning of the syllable (e.g., โรค is written as /o:rk/ but spoken as /ro:k/ ‘disease’), as they possibly form salient syllabic segmentation cues. Thai is also a nonlinear script with tone markers and vowel diacritics that occur above or below the initial consonant in the syllable or lexeme (e.g., ผู้หญิง /phû:jĩŋ/ ‘female’), which may form effective syllable or morpheme segmentation patterns for the skilled reader. Support for this idea comes from the finding that when the tone markers for a target word were viewed in the parafovea prior to fixating that word, subsequent fixation durations on the target word were shorter (Winskel, 2009). Initial character or letter frequency may also play a role in lexical segmentation in adults and children (Kasisopa et al., 2013, 2016). There is evidence that young Thai readers use first character frequency to land their eyes close to the PVL. It was found that children tended to land their eyes further into words, close to the word centre if the word began with a high-frequency character.

These results on Thai form interesting comparisons with studies conducted on other unspaced scripts, namely Chinese and Japanese. Similar to Thai, it was found that sentences with an unfamiliar word spaced format were as easy to read as visually familiar unspaced text in Chinese (Bai et al., 2008). However, a more complex picture emerged for Japanese with its mixed script composed of Hiragana, Katakana, and Kanji. When reading Japanese Hiragana-only script, Sainio et al. (2007) found similar results to English, as spaces tended to facilitate both eye movement targeting and word recognition. Interestingly, when reading Kanji-Hiragana script, similar to Thai, initial saccade landing positions were not affected by the spacing manipulation. However, the PVL for Thai and Japanese were not the same. In Japanese, the PVL was found to be at the word beginning, which is typically occupied by a perceptually salient Kanji character, whereas for Thai the PVL was observed to occur a bit to the left of the middle of the word. In Japanese text, content words tend to begin with a Kanji character whereas function words always begin with a Hiragana character. This mechanism is known as the Kanji targeting strategy. Japanese children have been found to develop a similar Kanji targeting strategy as adult readers (Jincho et al., 2014). Thus, we can see that script-specific characteristics channel attention in different ways in these unspaced scripts. In a recent study, Yan et al. (2019) found that targeting of eye movements in Grade 5 Chinese readers was still under development in comparison to Roman script readers. They attributed this relatively delayed development to the linguistic properties of Chinese including the lack of interword spaces and word boundary ambiguity.

We can see that attention allocation, as revealed by detailed eye movement measures, varies while reading scripts with and without spaces. This supports a weaker version of the script relativity hypothesis, similar in this respect to the Papafragou et al. (2008) language-based study. In relation to a stronger version of script relativity, readers of such densely packed scripts likely develop highly honed perceptual skills, which may transfer to other modalities or nonlinguistic tasks. However, in order to examine this, we need to carefully design experiments with this specific goal in mind.

Linearity-nonlinearity of spatial layout of scripts

Roman script has a serial or linear configuration whereas Brahmi-derived alphasyllabaries such as Thai, Sinhala, Devanagari and Kannada, typically have nonlinear configurations. Thai, a member of this family of scripts, has vowels that can occur above or below the main text line or either side of the consonant. Moreover, the phonological representation of vowels may not adhere to the orthographic sequence (e.g., โรค is written as /o:rk/ but spoken as /ro:k/ ‘disease’ or with an English example: ‘odg’ is read as /dog/). Due to these combined characteristics, Thai script is relatively dense or crowded and exerts distinct challenges to the child learning to read and spell Thai (Winskel & Iemwanthong, 2010).

Differences have been found between readers of linear and nonlinear scripts when processing strings of letters versus symbols in a two-alternative-forced-choice (2AFC) procedure. In that task, participants are asked to identify which of two characters has previously been briefly presented to them in a five-character array (e.g., B, D, F, G, K). This method allows identification accuracy to be measured at all positions in a string of five letters, digits or symbols. For Roman script readers, typically a W-shaped function for Roman letters and a Λ-shaped function for symbols is found (Tydgat & Grainger, 2009). In other words, accuracy is higher at the initial, middle and final positions in an array of five letters, but only at the middle position in an array of five symbols. Thus, there are different patterns in response to letters versus symbols in Roman script readers. In contrast to the Roman script readers, Thai readers responded similarly to Roman letters, Thai letters and symbols (Winskel et al., 2014). It was suggested that this could be due to an adaptive specialized process occurring when reading this visually complex and crowded nonlinear script without interword spaces. Experience of reading Thai may result in smaller receptive field sizes developing as reading skills become more honed in this extremely crowded letter environment.
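The serial position functions described above can be illustrated with a short Python sketch that computes identification accuracy at each of the five positions and applies a rough check for a W-shaped or Λ-shaped profile. This is a hedged illustration only: the function names, the trial format (a 1-based position paired with a correct or incorrect response) and the shape heuristic are assumptions made here and are not the statistical analyses used by Tydgat and Grainger (2009) or Winskel et al. (2014).

```python
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple


def accuracy_by_position(
    trials: Iterable[Tuple[int, bool]], n_positions: int = 5
) -> List[Optional[float]]:
    """Return proportion correct for positions 1..n_positions.

    Each trial is (position, correct), where position is the 1-based
    location of the probed character in the array.
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for position, correct in trials:
        counts[position] += 1
        hits[position] += int(correct)
    return [
        hits[p] / counts[p] if counts[p] else None
        for p in range(1, n_positions + 1)
    ]


def describe_shape(acc: List[float]) -> str:
    """Crude descriptive label for a five-position accuracy profile.

    Assumes all five positions have data. 'W' means peaks at positions
    1, 3 and 5; 'Lambda' means a single peak at position 3.
    """
    p1, p2, p3, p4, p5 = acc
    if p1 > p2 < p3 > p4 < p5:
        return "W"
    if p1 <= p2 < p3 > p4 >= p5:
        return "Lambda"
    return "other"
```

A reader group that shows the same label for letters and for symbols would correspond to the pattern reported for Thai readers, whereas different labels for the two stimulus types would correspond to the pattern reported for Roman script readers.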

When we conducted a study with readers of Sinhala, another nonlinear Brahmi-derived script but one that has interword spaces (Jayawardena & Winskel, 2016), we found discrepancies in the serial string identification patterns for native Sinhala and Thai readers. In contrast to the Thai readers, Sinhala readers displayed distinctive patterns when responding to letters and symbols, similar in that respect to the Roman script readers. This disparity in results could be due to the lack of interword spaces in Thai, where readers have the additional task of segmenting words using other cues (Winskel et al., 2009). However, we did find a heightened attentional response to initial letter positions (rather than just initial letter position as occurs in Roman script) in Thai and Sinhala readers (Jayawardena & Winskel, 2016; Winskel et al., 2014). In both Thai and Sinhala scripts, the critical initial phonological letter of a word that is crucial for lexical access may occur in first, second or even third positions due to their shared nonaligned vowel characteristic (where orthographic order does not necessarily correspond to phonological order). Thai has five of these types of vowels and Sinhala has one commonly occurring vowel (the Kombuwa). This means that the consonant that the vowel modifies can be written in second or third position and yet, when the word is read, the phonological form of the consonant occurs first. These results indicate that this variation in script-specific features affects attention allocation when converting orthographic code to phonological code.

Variation in lexical tone

Tone languages represent a large proportion of the spoken languages of the world (Yip, 2002). In Asian languages such as Chinese, Vietnamese, Myanmar (Burmese), Hmong, Lisu, Lao, Punjabi and Thai, lexical tone forms an integral aspect of the syllable and serves an essential function in distinguishing meanings of words with identical phonological structures. Tonal languages such as Thai, Vietnamese and Chinese have many tone homophones, that is, words that differ only in lexical tone. When a child learns a tonal language, each new word is learned as a combination of a syllable comprising the vowels and consonants with a particular tone of the language, that is, the tone is an integral element of the word or morpheme. It is also important to note that lexical tone forms vary substantially across tone languages and even within a language and its regional dialects (e.g., Abramson, 2014; Hyman, 2016; Remijsen, 2016). For example, Standard Thai (based on the Central dialect of Bangkok), the official language of Thailand, has five tones but most dialects have six tones (Abramson, 2014). Furthermore, tonal languages can vary in terms of characteristics and complexity; for example, Cantonese and Vietnamese have six tones, Thai has five tones, Mandarin has four tones and Punjabi has three tones.

From a linguistic relativity perspective, there is evidence that the experience of speaking a tonal language has an effect on cognition. Tone language experience has been found to facilitate music processing. For example, Bidelman et al. (2013) found that English-speaking musicians and Cantonese-speaking non-musicians were similar in their pitch discrimination sensitivity, whereas English-speaking non-musicians had significantly lower performance than either of those groups. In a more recent study, Zhang et al. (2020) suggest that listeners who speak a tonal language such as Mandarin Chinese may be able to take greater advantage of talker sex cues than listeners who do not speak a tonal language.

Lexical tone is likely to play a more prominent role in spoken language than in reading, both because of its inherent characteristics and because language is the primary system that reading builds on. Nevertheless, reading may contribute to these language-specific effects. After all, print is linked to the phonology and meaning of the spoken language. Phonology encompasses segmental (vowels and consonants) and suprasegmental (tone) information and has been shown to play a key role in activating word meanings when reading (e.g., Coltheart, 2000; DeMarco et al., 2017; Jared et al., 1999; Ryherd et al., 2018; Tan & Perfetti, 1999).

Tone awareness, the ability to reflect upon and manipulate tones, plays an essential role in reading tonal languages. Research on Chinese children’s reading has shown an association between lexical tone awareness and visual word recognition (e.g., Cheung et al., 2009; McBride-Chang, Tong, et al., McBride-Chang, Shu, et al., 2008; Shu et al., 2008; Tong et al., 2015; Wang et al., 2005). Several studies have reported the unique contribution of lexical tone awareness to reading performance when statistically controlling for reading-related abilities such as phonological awareness and RAN (e.g., Shu et al., 2008; Tong et al., 2015). For example, Tong et al. (2015) have shown in beginning readers that awareness of different lexical tones explained unique variance in Chinese character reading after syllable and onset awareness and morphological awareness were controlled for. The relationship between lexical tone awareness and reading has been shown to be more robust than the association between phonemic awareness and reading (McBride-Chang, Lam, et al., 2008). Moreover, tonal awareness has been shown to differentiate good and poor Chinese readers (Cheung et al., 2009; Ding et al., 2015; Li & Ho, 2011a, 2011b; Wang et al., 2012). Notably, there has been a lack of research on tonal languages, especially in relation to word identification and reading. Moreover, the majority of research has been conducted on Chinese; hence there is a plethora of opportunities for further research in lesser studied tonal languages.

In Chinese characters, neither segmental nor tonal information is explicitly represented, whereas both are represented in pinyin, which uses Roman alphabet letters in conjunction with lexical tone marks. Four tone marks are placed over the vowels to represent the four tones in pinyin (Wang et al., 2016; Zhang, Georgiou, et al., 2020). Yin et al. (2011) found that learning pinyin facilitated children’s tone awareness in mainland China. According to Zhang et al. (2020), learning pinyin is likely to sensitise children to both segmental (i.e., onset and rime, phoneme) and suprasegmental (i.e., tones) information.

The classic Stroop task (Stroop, 1935) has been used to investigate whether phonological and suprasegmental information is automatically activated in visual word recognition in adult readers of Mandarin Chinese (Li et al., 2013; Spinks et al., 2000). In these types of tasks, different pairings of phonologically similar and dissimilar segmental and tonal information are combined. Li et al. (2014) modified the colour naming task used by Spinks et al. (2000) to include an extra stimulus type that shared the same tone but not the same syllable with the colour character (S–T+, where S denotes the syllable segment, T the tone, and + or – a match or mismatch with the colour character; e.g., 红 /hong2/, ‘red’ vs. 瓶 /ping2/, ‘bottle’). They found significant Stroop facilitation in these same-tone trials. They also found that the Stroop effect was stronger for S+T– trials (e.g., 轰 /hong1/ ‘boom’) than for S–T+ trials, and was similar between S+T+ trials (e.g., 洪 /hong2/ ‘flood’) and S+T– trials. Based on these results, Li et al. (2014) concluded that both tonal and segmental information contribute to lexical access; however, segmental information plays a more prominent role than tonal information in visual word recognition in Mandarin Chinese.

In contrast to Chinese, Thai has both segmental and suprasegmental information visually represented in its orthography. Thai has five tones (i.e., five different F0 contours), conceptualised as high, mid, falling, rising and low, and four tone markers (maj3 e:k1, maj3 tho:0, maj3 tri:0, and maj3 tçat1ta1wa:0) that occur orthographically as diacritics above the initial consonant of the syllable. Determining the tone of a syllable is complex, as it is influenced by a combination of the class of the initial consonant, the type of syllable (open or closed), the tone marker, and the length of the vowel (for further detail refer to Winskel, 2014; Winskel & Iemwanthong, 2010; Winskel & Ratitamkul, 2019). Examples in Thai are ขาว /khã:w/ (white) with rising tone, ข่าว /khà:w/ (news) with low tone, and ข้าว /khâ:w/ (rice) with falling tone.

Similar to Li et al. (2014), Winskel et al. (2017) also included a condition where segmental information was different but the tone was the same as the target colour word (S–T+; e.g., ผอม /phɔ̃:m/ ‘thin’ was used for ขาว /khã:w/ ‘white’). Results revealed that tonal information in isolation was not activated, as the S–T+ word condition (where the tone was the same as the target colour word) did not differ from the neutral control word condition (S–T–, where the tone was not the same; e.g., กาว /ka:w/ ‘glue’). In fact, tonal information only had an interference effect when the orthographic syllable segment (including initial consonant) was the same as the colour target word (i.e., S+T+ compared with S+T–; e.g., ขาว /khã:w/ ‘white’ vs. ข่าว /khà:w/ ‘news’). Thus, tonal information had a constraining effect when both segmental and tonal information corresponded to the colour target word. This concurs with previous research conducted on Thai using a different paradigm, masked priming (Winskel & Perea, 2014). In that study, there was an additive orthographic facilitative effect when segmental information (the identical initial consonant) was paired with tonal information (the identical tone marker). This disparity in the role of tone in Thai and Mandarin Chinese is likely due to orthography-specific differences between the two. In Thai, segmental and tonal information are both represented orthographically, and homophony is not as extensive as in Chinese. This may result in tone being more readily activated when reading Chinese than Thai. As Vietnamese has both segmental and suprasegmental information explicitly coded and a high degree of monosyllabic homophones or homophonic ‘syllabemes’ (Pham & Baayen, 2015), it would form an informative comparison.

In general, results from both the adapted Stroop and masked priming paradigms indicate that segmental information is more readily activated than tonal information in Thai and Chinese and that tonal information plays a more secondary role in early lexical processing. Thus, it appears that tonal information comes into play at a later stage in lexical processing. This seems to be a common pattern in the limited experimental research that has so far been conducted on visual word recognition. As pointed out by Tong et al. (2007), each tone is associated with more words than each segment. Consequently, tonal information exerts fewer constraints on word recognition than segmental information, which could account for these skewed results. However, additional research needs to be conducted on other tonal languages to substantiate these claims.

Limitations and future research directions

The aim of this review was to examine the script relativity hypothesis in relation to variation in spatial layout (interword spacing and linear-nonlinear configuration) and lexical tone, and whether that variation affected cognition in significant or measurable ways. We have drawn on the extensive literature on the linguistic relativity hypothesis, where relatively weaker and stronger versions of the hypothesis have been proposed. According to the weaker version of the hypothesis, differences occur only while the speaker is actually involved in the process of speaking or reading, whereas in a stronger version, there are longer-term nonlinguistic cognitive consequences. A prime example supporting a stronger or more robust version of the hypothesis is habitually reading or writing in a particular direction (e.g., left-to-right, right-to-left or top-to-bottom; see Azhar et al., 2020; Göbel, 2015; Singh et al., 2000). The consequences of other script-specific contrasts may not have such an impact, or may be less readily observed or detected, in comparison to the overt habitual behaviour of reading or writing direction.

The research examining eye movements of readers while reading scripts with and without interword spaces has revealed a difference in attention allocation, as reflected by eye movement measures. These findings support a weaker version of the script relativity hypothesis, where there is a channelling of attention due to script-specific features while actually reading, similar in this respect to the Papafragou et al. (2008) language-based study on expression of motion events in satellite- and verb-framed languages. The overriding question is whether this habitual experience of reading with interword spacing variation affects nonlinguistic cognition in significant ways. In scripts without interword spaces, the text is more densely packed and crowded than in spaced scripts such as Roman script. This is exacerbated when the script has a nonlinear configuration, as occurs in Thai. In a two-alternative forced-choice (2AFC) identification task, Thai readers were found to process strings of letters (Thai and Roman script) and symbols in a similar manner (Winskel et al., 2014). This contrasts with Roman script and Sinhala readers, who typically showed a distinctive Λ-shaped pattern when identifying strings of symbols (Jayawardena & Winskel, 2016; Tydgat & Grainger, 2009; Winskel et al., 2014). This could reflect the heightened visuo-perceptual skills that develop in Thai readers due to habitually reading this compact, nonlinear and unspaced script. There was also a heightened attentional response to the initial letter positions (rather than just the single initial letter position, as occurs in Roman script) in Thai and Sinhala readers (Jayawardena & Winskel, 2016; Winskel et al., 2014). This is in line with the nonaligned vowel characteristic of Thai and Sinhala (e.g., ‘odg’ is read as /dog/). These results indicate that this shared script-specific feature affects attention allocation when converting orthographic code to phonological code. This contrasts with Roman script readers, for whom the initial letter position has a special role in word recognition in comparison to internal letters and, consequently, attention is preferentially channelled to the critical initial letter (e.g., Chambers, 1979; Estes et al., 1976; Gómez et al., 2008; Jordan et al., 2003; Perea, 1998; Rayner & Kaiser, 1975; Tydgat & Grainger, 2009; White et al., 2008).

In relation to a stronger version of script relativity, readers of perceptually crowded nonlinear scripts through the habitual process of reading may develop highly honed perceptual skills. These heightened visuo-perceptual skills may transfer to other modalities or nonlinguistic domains. However, in order to examine that, we need to design experiments with this specific goal in mind. One potential approach is to compare the perceptual skills of readers of linear and nonlinear scripts on visual search (see Wolfe, 2020) or change blindness (see Simons & Rensink, 2005) tasks involving detection of a target among distractors consisting of symbols, shapes or patterns. In order to distinguish between the effects of variation in these script features, it would also be informative to compare perceptual skills of biscriptals literate in a linear script (e.g., Roman script) and either a nonlinear script with interword spaces (e.g., Sinhala, Kannada, Devanagari) or a nonlinear script without interword spaces (e.g., Thai). Monoscriptals of the respective linear and nonlinear scripts could also be included in the design. This would allow for the relative contribution of linearity and interword spacing variation in different scripts to be delineated.

Lexical tone forms a critical feature in both speaking and reading in tone languages. Notably, lexical tone is likely to play a more prominent role in spoken than in written language due to its inherent phonological characteristics. Consequently, script-specific characteristics may play a secondary role and contribute to linguistic relativity effects. A challenge for future research is to distinguish between linguistic and script-specific effects. With that goal in mind, studies could be conducted with people who can speak a language (e.g., Chinese, Thai, Vietnamese) but who are either literate or illiterate in its written form. The relative contribution of linguistic and script-specific effects of tone variation on cognition could be investigated using this approach.

Tone languages vary in terms of both their relative complexity and whether they orthographically encode this feature. In terms of complexity, Cantonese and Vietnamese have six tones, Thai has five tones, Mandarin has four tones and Punjabi has three tones. We could expect readers of more complex tonal languages that partition auditory space into a greater number of categories to be more sensitive to tone and have enhanced auditory perceptual skills, which in turn may transfer to other nonlinguistic modalities.

In Thai, Myanmar and Vietnamese, tone is orthographically represented by diacritics. Learning to read scripts where suprasegmental information is explicitly represented may sensitise readers to tonal information in comparison to readers of Chinese, where tone is not explicitly expressed in the script. This may result in readers of scripts such as Thai, Myanmar and Vietnamese showing heightened script-on-cognition effects in particular nonlinguistic domains. Moreover, tone diacritics in Vietnamese are transparent in comparison to the more complex and opaque tone marking in Thai. Based on these contrasting features, we may expect to find a greater impact of tone diacritics on perceptual skills in Vietnamese readers than in Thai readers.

Conclusion

The aim of this review was to examine the script relativity hypothesis in relation to variation in spatial layout and lexical tone and whether that variation affected cognition in significant ways. In general, the research reviewed reveals that there is a selective channelling of attention while reading scripts that vary in these characteristics. These findings support a weaker version of the script relativity hypothesis. An important consideration is that as language is the primary system and reading builds on that system, script-specific effects may be subtle or difficult to disentangle from language-specific effects. Moreover, the studies reviewed were not designed specifically to investigate the script relativity hypothesis. In order to investigate whether there are off-line, longer-term cognitive consequences of this script variation, carefully designed studies need to be conducted with this overriding goal in mind. Advances in technology are providing us with useful tools for investigating these often quite subtle contrasting effects on attention, and in turn other cognitive processes. Whether variation in attention allocation due to scripts that vary in terms of spatial layout (interword spaces and linearity) and lexical tone translates into more profound, longer-term nonlinguistic cognitive consequences remains open to debate. Future research needs to include other lesser-studied languages and their scripts so that we can ascertain which cognitive patterns or processes are common and which are shaped by variation in script-specific features.