1 Introduction

This chapter describes a research project that examines the distribution of language structures as reflected in actual language use in Arabic diglossia. Specifically, it examines grammatical differences between varieties and modalities in Arabic as reflected in narrative text production. With respect to variety distinctions, the study compares Palestinian Arabic (hereafter PA), the spoken variety used for everyday speech, and Modern Standard Arabic (hereafter MSA); with respect to modality distinctions, we compare spoken texts produced in PA and in MSA, on the one hand, with texts produced in written MSA, on the other. To illustrate, two variety-related differences are demonstrated in the sentences in (1) below. The two sentences were produced by the same 4th grade Arabic speaker depicting the same event, once in PA (a) and once in spoken as well as written MSA (b).

(1)

a. ʔibin sˁaffi waqqaʕni ʕa-l-ʔardˁ (Spoken PA)

 ‘mate class-my made-me-fall on-the-ground’

b. ʔibn sˁaffi ʔawqaʕani ʔardˁ-an (Spoken and Written MSA)

 ‘mate class-my made-me-fall ground-accusative case’

 “My classmate made me fall to the ground”

 (Ahmad, 4th grade)

(1a) was produced in PA, while (1b) was produced twice in MSA, once in the written and once in the spoken modality. Two differences mark the variety distinction between these utterances. The first is the morphological form of the verb ‘made X fall’: both verbs share the consonantal root w-q-ʕ but surface in two different word patterns, CaCCaC in PA (waqqaʕ) and ʔaCCaC in MSA (ʔawqaʕ). The second is the morpho-syntactic form of the phrase ‘to the ground’. In PA the noun ʔardˁ ‘ground’ takes the definite article l and is part of a prepositional phrase headed by the preposition ʕa ‘on’, while in MSA it is indefinite and takes the adverbial case marker suffix -an. The current study examines such grammatical differences between varieties, as well as between modalities within the same MSA variety (spoken versus written), as manifested in actual text production by schoolchildren and adults.

The development of writing and written language has over the past few decades become a topic of interest for theoretically motivated research going beyond primarily pedagogical or clinical concerns (see, for example, Bereiter & Scardamalia, 1987; Halliday, 1989; Olson, 1994). A major impetus to such research in recent years has been the flourishing of the domain of “later language development” across the school years from middle childhood to adolescence (Berman, 2007; Nippold, 1998; Tolchinsky, 2004). Such studies underscore the lengthy developmental route to discourse-embedded command of linguistic forms and structures in general (Berman & Slobin, 1994) and the attainment of “linguistic literacy” in particular (Ravid & Tolchinsky, 2002). Importantly, recent research goes well beyond questions of writing as a notational system, the center of interest in the well-studied field of “emergent literacy”, with the goal of investigating acquisition of the written language as a special style of discourse.

Studying language acquisition from the perspective of developing linguistic literacy is of crucial importance in contexts where there is a substantial linguistic distance between the spoken and written forms of language. Such a situation is particularly acute in the case of Arabic diglossia (Ferguson, 1959) where speech and writing typically involve use of two linguistic systems that differ markedly (Saiegh-Haddad, 2012, 2017, 2018) to a point where they have been shown to be cognitively represented as two distinct languages (Ibrahim & Aharon-Peretz, 2005; Khamis-Dakwar & Froud, 2007).

The goal of the project discussed in this chapter is to investigate the developing linguistic abilities of native speakers of urban dialects of PA from middle childhood to adolescence, as well as a group of adult students at universities/colleges, as reflected in the texts they construct in different genres (narrative and expository) in both Spoken Arabic, in our case PA, and in MSA in both speech and writing.

2 Language Development in Arabic Diglossia

2.1 Diglossia

In all literate societies, spoken and written languages are used in different socio-cultural contexts, and the two forms of linguistic expression tend to be associated with different communicative conditions and distinct processing constraints, involving such factors as clarity, speed, and effort in online versus offline output (Chafe, 1994; Olson, 1994; Slobin, 1977; Strömqvist et al., 2004). Yet, what appears unique to diglossia (although possibly applying to some extent in some other sociolinguistically analogous situations) is that the spoken and written language varieties are so remarkably distinct in lexicon, phonology, morphology, and syntax, that preliterate children find it very difficult, and in some cases impossible, to understand a story, or even an isolated utterance, when it is presented to them in the standard language.

Native Arabic-speaking children are born into a linguistic context called “diglossia”, which is “a relatively stable language situation in which, in addition to the primary dialects of the language there is a very divergent, highly codified (often grammatically more complex) superposed variety, which is largely learned by formal education and is used for most written and formal spoken purposes but is not used by any sector of the community for ordinary conversation” (Ferguson, 1959: 345). Though Ferguson proposes a dichotomy between a spoken and a written variety, the linguistic situation in Arabic diglossia has been described in terms of levels, or a continuum, with speakers shifting between as many as four (Meiseles, 1980) or five (Badawi, 1973) varieties, ranging between colloquial/vernacular and literary/standard forms, resulting in levels that are neither fully standard nor fully colloquial. As such, there are “gradual transitions” (Blanc, 1960) between the various varieties, and “theoretically an infinite number of levels” (Basiouny, 2009: 15).

In diglossic Arabic, children start out speaking a local variety of Spoken Arabic (hereafter SpA), the one used in their immediate environment, at home and in the neighborhood; once they enter school, at the age of 6, they are formally and extensively exposed to Modern Standard Arabic as the language of reading and writing, while Spoken Arabic remains the language of informal speech. Academic school-related speech is conducted in a semi-standard variety, Educated Spoken Arabic (Badawi, 1973), except in Arabic lessons, where MSA is more dominant, at least in aspiration (Amara, 1995). Outside the school milieu, there is a similarly stable co-existence of the two major varieties, each functioning for distinct spheres of social communication. Spoken Arabic is used by all native speakers, young and old, educated and uneducated, for informal and intimate verbal interaction in the home, at work, in the community, and so forth. On the other hand, MSA, alternating with Educated SpA, is expected to be used for formal oral interactions, such as giving a speech or a lecture, and for writing (however, see Abu Elhija, 2012; Al-Khatib & Sabbah, 2008; Haggan, 2007; Mostari, 2009; Palfreyman & Al-Khalil, 2007; Warschauer et al., 2002 for use of Spoken Arabic in electronic writing in Arabic). Thus, while Spoken Arabic is undoubtedly the primary language of spoken usage, native speakers of Arabic, including young children, are actively and constantly engaged with MSA as well. They pray, do their homework, and study for their exams in MSA, and they also watch certain TV programs and dubbed series in MSA. In sum, besides proficiency in using Spoken Arabic, linguistic proficiency in Arabic involves concurrent proficiency in using MSA, from an early age, both for reading and writing, and also for speech.

2.2 Linguistic Differences in Arabic Diglossia

Arabic diglossia was established, at the latest, with the standardization of Arabic in the eighth and ninth centuries A.D., when the early grammarians produced a set of norms for the written form of the language that they called fusha. Over the course of many years, the continued use of this favored set of written linguistic norms led to substantial differences between the dynamic spoken varieties and the fixed written form, making the two linguistically distant, and engendered the notion that the written standard was the ‘real language’, while the other varieties were ‘degenerate’ and ‘corrupt’ versions (Maamouri, 1998). The linguistic distance between the spoken and the written varieties of Arabic is evident in all areas of structure and usage, including not only lexicon and phonology, but also syntax and morphology, as documented in a range of studies in the past several decades (see, for example, Eid, 1990; Geva-Kleinberger, 2000; Hary, 1996; Henkin, 2010; Ibrahim, 1983; Kay, 1994; Levin, 1995; Meiseles, 1980; Rosenhouse, 2007, 2014; Myhill, 2014; Saiegh-Haddad & Henkin-Roitfarb, 2014; Saiegh-Haddad & Armon-Lotem, forthcoming; Versteegh, 1997, 2001; Wright, 1889).

Phonological differences between the two varieties are apparent in their phonemic and syllabic structure, phonotactic constraints, syllable weight and stress patterns (Aquil, 2011; Broselow, 1979; Jastrow, 2004; Watson, 1999, 2002). Morphologically, MSA and SpA differ markedly in inflectional categories, such as the absence in SpA of final short vowels indicating case and mood and of the preponderance of the genitive-accusative forms of duals and so-called “sound masculine plurals” (Holes, 1995, 2004). MSA has a rich morphological system of grammatical agreement contrasting with a far less varied and less complex system of agreement marking in SpA (Aoun et al., 1994; Aoun et al., 2010; Benmamoun, 2000; Brustad, 2000). Derivational morphology also reveals differences between the two varieties, primarily in the distribution and frequency of verbal patterns, with some patterns being less frequent and productive in MSA than in SpA (Benmamoun, 1991; Blanc, 1970; Bolozky & Saad, 1983; Fassi Fehri, 1994; Rosenhouse, 2002; Shawarbah, 2007; Younes, 2000). For example, the verb pattern aCCaC (Pattern IV) is hardly productive in PA, with a dictionary search revealing only 75 aCCaC verbs in PA, only 3.5% of all PA verbs (Laks, 2011, 2018). Syntactically, SpA and MSA vary in clausal word order; with VSO as the typical word order of MSA as against SVO in SpA (Bolotin, 1995; Fassi Fehri, 1993; Mohammad, 1989, 2000; Shlonsky, 1997). SpA, on the other hand, has a more complex system of lexical categories (parts of speech) than MSA, including an autonomous system of adverbs. The two varieties also differ in use of nominal constructions, with nominalizations being far more common in MSA than SpA (Laks & Berman, 2014; Rosenhouse, 1990, 2008). Moreover, at the intersection of morphology and syntax, the two varieties differ in processes of passivization with use of passive verbs being far more common in MSA than in SpA (Hallman, 2002; Holes, 1998; Laks, 2013). 
Lexically, SpA and MSA feature overlapping yet different lexicons, with approximately 80% of the words in the spoken lexicon of young children (a dialect of PA) having different lexical and lexico-phonological forms in MSA (Saiegh-Haddad & Spolsky, 2014).

Given the linguistic distance between SpA and MSA and the basic complementary distribution of how words and structures pattern in the two varieties, a given linguistic form can generally be identified as belonging to either SpA or MSA, with certain forms common to both varieties. For example, inflectional endings marking case and mood are used only in MSA, never in SpA, and negation relies on different sets of negation particles in SpA and MSA. On the other hand, processes of noun pluralization are similar in SpA and MSA, yet the same word may be pluralized differently in the two varieties (Saiegh-Haddad et al., 2011).

These linguistic differences have clear implications for language development in general and for acquisition of linguistic literacy in particular. Yet, the literature to date is almost totally lacking in psycholinguistic developmental research measuring (rather than only describing) linguistic differences between the two varieties of Arabic and investigating the consequences of such differences for language acquisition and usage. One exception is a recent study measuring the lexical distance between SpA and MSA in a dialect of Palestinian Arabic used in Central Israel: about 40% of the words in the spoken lexicon of kindergarten children had completely different lexical forms in MSA; another 40% consisted of partial cognates that had overlapping yet different forms in the two varieties (with differences ranging from one to seven phonological parameters, including phoneme substitution, addition, and deletion); and only about 20% had the same lexico-phonological form in both SpA and MSA (Saiegh-Haddad & Spolsky, 2014). The fact that only 20% of the words used by children aged 4–6 years maintain an identical surface lexical form in MSA is a compelling result, particularly in light of the finding that children found it difficult to recognize the lexical relatedness between SpA/MSA partial cognates even when the gap between the two forms consisted of a single phoneme (Saiegh-Haddad, 2011; Saiegh-Haddad & Haj, 2018). These findings support the results of earlier studies demonstrating the difficulty encountered by preschool children as well as by adolescents speaking the same variety of Palestinian Arabic in operating on the phonological structure of MSA words – such as recognizing, isolating, or encoding a phoneme – when the same word had a different phonological form from that used in their SpA vernacular (Saiegh-Haddad, 2003, 2004, 2005, 2012; Saiegh-Haddad et al., 2011, 2020).
These results have been argued to be related to the quality of phonological representations in the lexicon (Saiegh-Haddad & Haj, 2018) and were shown to have cross-dialectal validity (Saiegh-Haddad, 2007). Further evidence from the scant research available in this domain demonstrating the difficulty schoolchildren have with linguistic structures that do not exist in their spoken vernacular is provided by the forced-choice grammaticality judgment study of Khamis-Dakwar et al. (2012) among schoolchildren, native speakers of Palestinian SpA, when presented with MSA linguistic structures. Recently, Laks and Berman (2014) measured morpho-syntactic differences between SpA and MSA as reflected in the speech and writing of adult native speakers of Jordanian Arabic; they found clear inter-modality linguistic differences on a range of linguistic structures, including case marking, adverbials, dual forms, copula construction, nominalizations, aspect, and modalized prepositions.

3 Text Production as a Window on Language Development

Authentic, unedited text production has proven a valuable and methodologically valid tool for elicitation of a broad range of reliable data on language acquisition and development in different languages and contexts, both spoken (see Berman & Slobin, 1994; Labov, 1972) and written (Berman, 2005; Berman & Verhoeven, 2002; van Hell et al., 2008; Verhoeven & van Hell, 2008). Such studies yielded robust, age-sensitive data across a range of linguistic dimensions, including carefully controlled comparisons of spoken and written usages in different languages (for Hebrew – Berman & Ravid, 2009; for French – Jisa, 2004; for Swedish – Strömqvist et al., 2004; Johansson, 2009). For example, in lexical usage, written texts differed from their spoken counterparts both in lexical density (the proportion of content words to total number of words) and lexical diversity (the ratio of different words to total number of words, the so-called type-token ratio), with written texts more lexically dense and diverse. Such differences between the lexical properties of texts in speech and writing emerged as significant in all age groups included in the large-scale crosslinguistic project in which the current study is anchored, including 9-to-10-year-old 4th-graders, 12–13-year-old middle school students, as well as high-school 11th-graders and university graduate level adults (Strömqvist et al., 2002). Other studies in this same framework that compared written and spoken texts in English and Hebrew demonstrated a range of differences in lexical usage (Berman & Nir, 2011a, b), in reliance on non-referential auxiliary material like repetitions, disfluencies and discourse markers (Ravid & Berman, 2006), as well as in level of usage or linguistic register (Ravid & Berman, 2009).
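The two lexical measures defined above can be made concrete with a minimal Python sketch. The function names and the toy token list are illustrative assumptions and do not reproduce the procedure of the cited studies; in particular, the decision as to which words count as content words is represented here by a hand-assigned flag.

```python
def lexical_density(tokens):
    """Proportion of content words among all word tokens.

    `tokens` is a list of (word, is_content_word) pairs.
    """
    if not tokens:
        raise ValueError("empty text")
    return sum(1 for _, is_content in tokens if is_content) / len(tokens)


def lexical_diversity(tokens):
    """Type-token ratio: number of different words / total number of words."""
    if not tokens:
        raise ValueError("empty text")
    words = [word for word, _ in tokens]
    return len(set(words)) / len(words)


# Toy example: a hand-tagged English sentence, invented for illustration only.
text = [("my", False), ("classmate", True), ("made", True), ("me", False),
        ("fall", True), ("to", False), ("the", False), ("ground", True),
        ("and", False), ("my", False), ("classmate", True), ("laughed", True)]

density = lexical_density(text)      # 6 content words out of 12 tokens
diversity = lexical_diversity(text)  # 10 different words out of 12 tokens
```

A raw type-token ratio is sensitive to text length, so comparisons between texts of different lengths call for some form of length control.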

Beyond language variety, an important variable in examining text production abilities is that of genre. Narrative texts, arguably the most universal and earliest acquired type of extended discourse, were the first to be employed in this domain, providing important age-sensitive insights into children’s grammatical and lexical knowledge from young pre-school age and into adolescence (Berman, 1997; Berman & Slobin, 1994; Hickmann, 2003; Labov, 1972). Subsequent research, including studies referred to above, went beyond narrative discourse to examine non-narrative, expository type discourse, largely, though not exclusively, in the written medium (e.g., Berman & Nir, 2009; Jisa & Viguié, 2005; Ravid, 2005). Such studies, without exception, demonstrate the early developing psychological reality of the distinction between narrative and expository genres of expression in the linguistic forms of expression selected for each of these contrasting types of discourse. The psychological reality of genre effects was manifest in linguistic expression across different languages and in a variety of linguistic constructions, including verbal structures (Ragnarsdottir et al., 2001), subject-NP patterning (Ravid et al., 2002), and lexical usage (Strömqvist et al., 2002). For example, narrative texts triggered use of the past tense and of perfective aspect (where relevant) whereas expository texts were associated with reliance on timeless present tense and irrealis mood (Reilly et al., 2002). Clausal constructions also varied with genre with a higher proportion of copular and existential constructions in expository than in narrative texts, and use of personal pronouns in narratives as against impersonal pronouns and noun phrases with lexical heads in expository texts (Ravid et al., 2002). Inter-genre contrasts also reveal a certain “developmental paradox” (Berman & Nir-Sagiv, 2007). 
While schoolchildren find it harder to cope with the task of producing expository texts on an abstract topic, they invariably use high-register, more formal and less everyday means of linguistic expression in lexicon and syntax in producing such texts compared with narratives.

4 Arabic Diglossia in Text Production

The complex linguistic context in diglossia means that language development in Arabic can only be understood by a carefully controlled examination of acquisition of SpA oral skills, as well as MSA proficiency in both speech and writing, along with investigating the relationship between students’ linguistic command of SpA and MSA. Such an investigation has not yet been undertaken in research on Arabic language or literacy acquisition. Moreover, despite the rich body of research comparing written and spoken text production, to the best of our knowledge, extended text production has not been exploited as a means of examining diglossia as a special factor along the lines envisaged in the proposed study. The major goal of this study is, consequently, to measure linguistic ability in three varieties of language use – in SpA speech, MSA speech and MSA writing across the school grades and among adult speakers.

A major aim of this study is to trace the development of linguistic skills in these different varieties of the language, from early stages of formal tuition in MSA to near-completion of high schooling, by applying parallel procedures across large groups of native speakers at five different levels of age/schooling (4th, 7th, 9th, and 11th grades, and adults). The goal here is to trace the role of Arabic diglossia in children’s linguistic skills at different stages of linguistic, cognitive, and social development, on the one hand, and, on the other, to examine the impact of formal schooling and increased exposure to MSA on their ability to differentiate between the two varieties and to adjust their language to different communicative situations and different contexts of use (in speaking or writing, in telling a story or discussing an abstract topic).

A related goal is to test the extent to which speakers at different ages are sensitive to the fact that the socio-functional complementarity between SpA and MSA has resulted in the alignment of text genre with language variety, especially in the school context. Thus, for example, exposition, as a primarily academic genre, is typically formulated in MSA, or a semi-standard variety like Educated SpA, even when spoken. A further goal of the study is, consequently, to evaluate students’ linguistic expression in both narrative and expository texts in both SpA and MSA, where MSA will be elicited in both the spoken and written modalities. This innovative three-way comparison should yield important insights into how, across different phases of development, students differentiate between different varieties of their native language in keeping with communicative circumstances and genre-dependent level of formality. This novel direction of the proposed study is particularly important in the case of Arabic, where modality (speech/writing) is typically equated with language variety (SpA/MSA) and where all school-based language use, even in the early grades of elementary school, is expected to be in MSA.

The study provides insights into a range of to date largely unexplored issues in various domains of linguistic and psycholinguistic research: Arabic diglossia in general, acquisition of Arabic as a first language, later language development in conditions of diglossia, and the interrelations between the variables of age/schooling level (grade-school, middle school, high school), variety of Arabic (SpA/MSA), modality (speech/writing), and the type of discourse genre (narrative/expository). Thus, it is expected to have significant theoretical and practical implications. Theoretically, it will identify and measure the linguistic differences between SpA and MSA in development and in interaction with modality and genre as they are used in actual text production. Practically, the findings are expected to have significant implications for instruction and assessment in Arabic as the first language as well as in Arabic as a foreign language. Moreover, although SpA dialects differ markedly from one place to another, linguistic commonalities obtain between them, especially in the domain of morpho-syntax, while importantly, all speakers of all dialects use a single highly uniform standard Arabic form. The findings of this study should thus have implications for language development and instruction among speakers of other SpA dialects beyond the Palestinian dialect dealt with here, and could constitute a point of departure for examining the same variables in other dialects.

4.1 Working Hypotheses

(a) Given the linguistic distance between SpA and MSA, it was predicted that participants would use different linguistic structures in (spoken) PA, on the one hand, and in MSA, both spoken and written, on the other. Linguistic forms that are typically associated with PA were expected to be employed in speech in PA but not in MSA speech or writing. At the same time, we expected to see some use of PA linguistic structures in MSA speech and in MSA writing, but more in speech than in writing. This is due to the cognitive constraints of producing online speech and the difficulty of attending concurrently to both form and content in the course of unmonitored spoken output, especially for the younger participants.

(b) Due to the strong alignment of genre with language variety in the case of Arabic, we expected to find a heavier reliance on MSA in expository than in narrative texts in both speech and writing. Nonetheless, we expected writing, regardless of genre, to be closer to the MSA end of the continuum than speaking.

(c) Since narrative is an early acquired genre, we expected text construction abilities and modality-appropriate linguistic expression to emerge earlier and to be better consolidated in narrative than in expository texts in all three assignments: PA, MSA speech and MSA writing. Inter-genre differences were expected to diminish with age, as the modality factor becomes more dominant.

(d) Since acquisition of MSA and knowledge about the appropriate deployment of linguistic forms in this more formal variety is strongly impacted by increased age/schooling, we expected to see a developmental progression in linguistic expression over time in all assignments: PA, MSA speech, and MSA writing. We expected to find stronger three-way interrelations and distinctiveness over time, such that proficiency in linguistic expression would improve in tandem with increased adaptability to the demands of modality and genre across all three assignments as a function of age/schooling development.

4.2 Experimental Design and Method

The principles underlying the proposed study derive from a “form/function” approach to language acquisition, with a focus on how linguistic forms are deployed in the service of discourse functions such as reference to space, time, and person (Berman, 1990, 1996, 1997; Hickmann, 2003; Karmiloff-Smith, 1979; Slobin, 1990, 1991, 1996; Berman & Slobin, 1994). The current study is methodologically grounded in the framework of an international cross-linguistic research project on “Developing Literacy in Different Contexts and Different Languages” (funded by the Spencer Foundation, Chicago, Ruth Berman PI) that investigated the text construction abilities of schoolchildren and university graduate students in seven different countries (as described in Berman, 2008; Berman & Verhoeven, 2002). This study yielded rich research results that were reported in many publications, demonstrating the validity of its design. The current study largely replicated the design and procedures applied successfully in this large-scale project to enable directly comparable analyses of parallel texts – elicited on a shared topic (of interpersonal conflict) in both speech and writing, and in both narrative and expository genres across participants at four different levels of age and schooling. These procedures also provide a unique basis for applying analyses relating (linguistic) forms to (discourse) functions anchored in extended texts produced in different discourse genres. Our design differs from and goes beyond the “source” study to take into account the special sociolinguistic circumstances of the Arabic language, as follows. First, it evaluates linguistic usage not only in spoken PA and written MSA, but also in spoken MSA. Second, given the multi-faceted nature of linguistic distance between PA and MSA, the study addresses not only syntactic and lexical, but also phonological and morphological features of the linguistic expression in the two varieties, modalities and genres.

4.2.1 Participants, Materials and Procedure

Closely comparable written and spoken texts were produced by middle class children, adolescents and adults who are native PA speakers from Kufur Qareʕ in 5 age/grade level groups: 4th grade (9–10 years), 7th grade (12–13 years), 9th grade (14–15 years), 11th grade (16–17 years) and adults (25–35 years, university/college students). These age/grade level groups were targeted because studies have shown that during this period, between mid-childhood across adolescence, language usage changes significantly in comparison to what has been observed for younger children. A total of 150 participants produced narratives and expository texts in the two modalities and varieties of Arabic yielding 6 texts: three expository texts produced in PA, MSA-SP and MSA-W as well as three narratives produced in PA, MSA-SP and MSA-W. There were 30 students in each group and the pool of data consisted of a total of 900 texts (5 groups × 30 subjects × 6 texts).

Each participant was asked to produce both a narrative and an expository text in MSA in the two modalities, spoken and written, and the same texts in (spoken) PA, yielding a total of 6 texts per participant: PA Oral Narrative; PA Oral Expository; MSA Oral Narrative; MSA Oral Expository; MSA Written Narrative; MSA Written Expository. To elicit PA oral texts, participants were instructed to use PA as they would do in speaking to a friend. To elicit MSA oral texts, they were told to use MSA as if they were giving an oral presentation in class. To elicit MSA writing, they were asked to write as they would normally do. Order of assignment was counter-balanced across the six tasks. To elicit the narrative and expository texts, the study employed the same three-minute wordless video clip as was employed in the cross-linguistic “source” project. The film depicts a variety of short scenes of interpersonal conflict in a school setting. Participants were shown the film in a quiet room in their school and were immediately asked to talk and write about “problems between people”. To elicit narratives, participants were asked to tell a story about an incident or situation in which they had experienced problems with someone and to write it down, while to elicit expository texts, they were instructed to discuss the subject of “problems between people” by giving a talk and writing a composition on the subject.
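The elicitation design described above can be laid out as a small grid. The following Python sketch only restates the counts given in the text (5 groups, 30 participants per group, 3 variety-modality conditions, 2 genres); the variable names are illustrative assumptions, and the sketch does not model the counter-balancing scheme, whose details are not specified here.

```python
from itertools import product

# Age/grade level groups and group size, as reported in the text.
GROUPS = ["4th grade", "7th grade", "9th grade", "11th grade", "adults"]
PARTICIPANTS_PER_GROUP = 30

# Variety-modality conditions and genres.
CONDITIONS = ["PA spoken", "MSA spoken", "MSA written"]
GENRES = ["narrative", "expository"]

# Every participant produces one text per (condition, genre) cell.
tasks = list(product(CONDITIONS, GENRES))

total_participants = len(GROUPS) * PARTICIPANTS_PER_GROUP
total_texts = total_participants * len(tasks)

for condition, genre in tasks:
    print(f"{condition:12s} {genre}")
print(total_participants, "participants,", total_texts, "texts")
```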

4.2.2 Transcription and Coding

All texts were transcribed and divided into clauses as specified in Berman and Slobin (1994, pp. 660–662), and the measures of analysis in large part follow the principles adopted by the cross-linguistic “source” project. Data segmentation of both spoken and written texts takes into account three main linguistic units: words, clauses, and “clause-packages” – the latter as specified in analyses of English, Hebrew, and Spanish data in the crosslinguistic project (Berman & Nir, 2009). Elicited texts were transcribed in broad phonemic transcription using the CHILDES program (MacWhinney, 2000). The main categories of analysis for comparison were coded in separate tiers. The categories selected for coding are as follows: verbs were coded according to root, pattern, transitivity and semantic function (Berman, 1978, 2003; Ravid et al., 2016); auxiliary verbs were coded for root, pattern and function (e.g. aspect); nouns were coded according to definiteness, gender, number and syntactic case marker (if any); nominalizations were coded according to their nominal patterns, their semantic function and their syntactic position; and adjectives were coded for gender, number, morphological type (e.g. affixation vs. patterns), semantic classification and their syntactic position (attributive vs. predicative).
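To make the coding categories more tangible, the verb and noun codes listed above can be pictured as simple records. The Python sketch below is purely illustrative: the class names, field names and value labels are assumptions and do not reproduce the project's actual coding tiers. The example values are taken from sentence (1b).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VerbCode:
    """Hypothetical record for one coded verb."""
    form: str
    root: str
    pattern: str
    transitivity: str
    semantic_function: str


@dataclass(frozen=True)
class NounCode:
    """Hypothetical record for one coded noun."""
    form: str
    definite: bool
    gender: str
    number: str
    case_marker: Optional[str] = None  # None when no case marker is present


# Illustrative coding of the MSA verb and noun in example (1b).
verb = VerbCode(form="ʔawqaʕani", root="w-q-ʕ", pattern="ʔaCCaC",
                transitivity="transitive", semantic_function="causative")
noun = NounCode(form="ʔardˁ-an", definite=False, gender="feminine",
                number="singular", case_marker="-an")
```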

5 Results

In this section, we present results from two types of data elicited from narrative texts: the first involves the distribution of verbal patterns; the second involves voluntary usage of syntactic case markers. Both linguistic indicators have been found to distinguish between varieties and modalities among native Arabic-speaking schoolchildren and adults.

5.1 Distribution of Verbal Patterns

Semitic morphology relies heavily on non-concatenative morphology, which involves forming words in configurations named ‘patterns’. The pattern determines the phonological shape of the verb (Bolozky, 1978; Berman, 1978, 1987; Schwarzwald, 1981, 2002; Junger, 1987; Bat-El, 1989, 2011, 2017; Ravid, 1990, 2008; Aronoff, 1994, 2007; Holes, 1995; Ussishkin, 1999; Benmamoun, 2003; Izre’el, 2010; among many others). Verbal patterns differ from one another mainly in the type of semantic and syntactic properties of the verbs they host (Ariel, 1971; Berman, 1978; Bolozky & Saad, 1983; Wittig, 1990; Guerssel & Lowenstamm, 1996; Benmamoun, 2000, 2003; Doron, 2003; Goldenberg, 1998; Schwarzwald, 2002; Younes, 2000; Hallman, 2006; Henkin, 2009, 2010; Glanville, 2011; Tucker, 2011; Shawarbah, 2012; Ouhalla, 2014). For example, the root k-s-r ‘to break’ can be configured in two distinct patterns: CaCaC as a transitive verb, kasar ‘break X’ (transitive-causative), and inCaCaC as an intransitive verb, inkasar ‘be/get broken’ (intransitive-inchoative). The distribution of verbal patterns in Hebrew has been examined within different frameworks, including verb innovation (Bolozky, 1978, 1999; Ravid, 1990; Berman, 1987, 2003; Laks, 2018), language variation and change (Schwarzwald, 1981, 2002; Ravid, 1995, 2003, 2004; Bat-El, 2002, 2019; Laks, 2013; Ravid et al., 2016), acquisition (Berman, 1980, 1982, 1993; Armon-Lotem & Berman, 2003; Ravid, 2011; Ravid et al., 2016; Ravid & Vered, 2017) and different types of elicited texts (Berman & Slobin, 1994; Ravid, 2004; Berman et al., 2011; Ashkenazi et al., 2016; Levie et al., 2020). Fewer studies have examined psycholinguistic aspects of verbal patterns in Arabic as they are employed in actual text production (DeMiller, 1988; Shawarbah, 2007; Ford, 2009; Henkin, 2009; Benmamoun, 2003; Dank, 2011).
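The way a root combines with a pattern can be illustrated with a short Python sketch. This is a deliberately simplified assumption rather than a model of Arabic morphology: the hypothetical function apply_pattern merely fills each C slot of a pattern with the next root consonant, and it rejects patterns whose number of C slots differs from the number of root consonants (as in CaCCaC, where a root consonant is doubled).

```python
def apply_pattern(root, pattern):
    """Fill the C slots of `pattern` with the consonants of `root`.

    `root` is written with hyphens between consonants, e.g. "k-s-r".
    Only patterns with exactly one C slot per root consonant are handled;
    patterns that double a root consonant are outside this sketch.
    """
    radicals = root.split("-")
    if pattern.count("C") != len(radicals):
        raise ValueError("pattern slots and root consonants do not match")
    remaining = iter(radicals)
    return "".join(next(remaining) if ch == "C" else ch for ch in pattern)


print(apply_pattern("k-s-r", "CaCaC"))    # kasar
print(apply_pattern("k-s-r", "inCaCaC"))  # inkasar
print(apply_pattern("w-q-ʕ", "ʔaCCaC"))   # ʔawqaʕ
```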

While the literature provides a classification of the functions of Arabic verbal patterns (see Ryding, 2005), there has been little research on the psychological reality of these classifications and the scope of their usage in actual text production. In one such study, Laks et al. (2019) examined the distribution of verbal patterns in PA narrative texts produced by 30 adult speakers. We showed that although all ten verbal patterns are in principle available for verb formation, they differ in frequency of use in text production even when they convey similar semantic functions. However, Laks et al. (2019) examined PA only. The current study extends this question to MSA. We present here some of the data reported in Laks et al. (2019) as well as new data based on spoken MSA and written MSA texts, in order to probe differences in the distribution of verbal patterns according to both variety (PA vs. MSA) and modality (spoken PA and spoken MSA, on the one hand, vs. written MSA, on the other). Both types of data are presented here as one pool. Texts were transcribed and verbs were coded according to root, verbal pattern, semantic type and transitivity. Tables 1 and 2 below summarize the distribution of patterns by type and token frequency and percentage out of the total number of patterns in the corpus.

Table 1 Distribution of Arabic verbal patterns in types by variety-modality
Table 2 Distribution of Arabic verbal patterns in tokens by variety-modality
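The type and token percentages summarized in Tables 1 and 2 can be derived from the coded verbs along the following lines. This is a minimal sketch, not the procedure actually used in the study: the function name pattern_distribution, the tuple layout (condition, root, pattern) and the toy records are all hypothetical, and a verb type is assumed here to be a unique root-pattern combination within a condition.

```python
from collections import Counter


def pattern_distribution(coded_verbs):
    """Return {condition: {pattern: (type_pct, token_pct)}}.

    coded_verbs: iterable of (condition, root, pattern) tuples, one per
    verb occurrence in the transcribed texts (hypothetical layout).
    """
    tokens = {}  # condition -> Counter of pattern occurrences
    types = {}   # condition -> set of distinct (root, pattern) pairs
    for condition, root, pattern in coded_verbs:
        tokens.setdefault(condition, Counter())[pattern] += 1
        types.setdefault(condition, set()).add((root, pattern))

    result = {}
    for condition, token_counts in tokens.items():
        # Count each distinct root-pattern pair once per condition.
        type_counts = Counter(p for _, p in types[condition])
        n_types = sum(type_counts.values())
        n_tokens = sum(token_counts.values())
        result[condition] = {
            p: (round(100 * type_counts[p] / n_types, 1),
                round(100 * token_counts[p] / n_tokens, 1))
            for p in token_counts
        }
    return result


# Invented toy records, NOT data from the study.
toy = [
    ("PA", "k-s-r", "CaCaC"),
    ("PA", "k-s-r", "CaCaC"),
    ("PA", "k-s-r", "CaCaC"),
    ("PA", "f-h-m", "CaCCaC"),
    ("MSA-W", "f-h-m", "ʔaCCaC"),
]
dist = pattern_distribution(toy)
print(dist["PA"]["CaCaC"])  # (type %, token %) for CaCaC in the toy PA data
```

Keeping type and token counts separate matters because a pattern that hosts a few very frequent verbs can have a high token share but a modest type share.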

As can be seen from Tables 1 and 2, CaCaC is the most productive pattern in text production in both modalities and varieties, and with respect to both type and token. In PA, it constitutes 41% of the total number of verb types and 59% of the total number of tokens. In spoken MSA, it constitutes 34% of the total number of verb types and 50% of the total number of tokens, and in written MSA it constitutes 38% of the total number of verb types and 51% of the total number of tokens. The CaCaC pattern is followed in frequency by CaCCaC and tCaCCaC, which constitute between 12% and 19% of the verb types and between 7% and 12% of the tokens, depending on modality and variety. The remaining patterns are less frequent, and each constitutes less than 10% of the verb types and tokens. Thus, CaCaC is the most frequent pattern and it hosts most basic verbs in both PA and MSA (Holes, 1995). This stands in contrast to studies on verb innovation (Laks, 2018), which demonstrate that CaCaC is hardly used in the formation of new verbs, and that CaCCaC and tCaCCaC are used almost exclusively for this purpose. The data in Tables 1 and 2 also show that iCCaCC is not used at all, and that inCaCaC and istaCCaC are rarely used.

The results above reveal variety-related differences, where some patterns are more typical of one variety than of the other. As shown in Table 1, CaCaC and CaCCaC are more dominant in PA than in MSA, both spoken and written. MSA texts, in contrast, demonstrate greater variation in the distribution of verbal patterns. The iCtaCaC pattern is more frequent in MSA. It constitutes 12% of verb types in spoken MSA and 14% in written MSA, in comparison to only 8% in PA. Similarly, the Ca:CaC pattern constitutes 8% of the verb types in spoken and written MSA, and 5% in PA. Similar tendencies were also found with respect to verb tokens, as shown in Table 2.

Differences between varieties and modalities are also reflected in some of the semantic functions of verbal patterns. We demonstrate this below with respect to the expression of causativity. As shown in (2) below, the same participant used the same consonantal root f-h-m ‘understand’ in two different patterns to denote the causative verb ‘make understand’: CaCCaC in PA, and ʔaCCaC in MSA.

(2)

a. PA: u-fahhamtoh inno: ha:ðˁa il-iʃi ɣalatˁ

 

 ‘I made him understand that this thing is wrong’

 

b. MSA-S: fa-ʔafhamtuhu wijhat naðˁari:

 

 ‘I made him understand my point of view’

 

c. MSA-W: wa-ʔafhamtuhu wijhat naðˁari:

 

 ‘I made him understand my point of view’

 

 (Aref, Adult)

Tables 3 and 4 summarize the distribution of causative verbs across patterns in type and token counts. In PA, the most typical pattern of causative verbs was CaCCaC, whose verbs made up 63% of the total types and 58% of the total tokens. Causative verbs were also common in CaCaC, making up 17% and 23.5% of types and tokens, respectively. A further 17% of the causative verb types were found in ʔaCCaC, but they made up only 13% of the tokens. In contrast, MSA texts, and especially written MSA, demonstrated greater variation with respect to the distribution of causative verbs in other patterns. CaCCaC hosted 48% of the causative verb types in spoken MSA and only 29% in written MSA. In spoken MSA, 30% of the causative verb types were in CaCaC, while in written MSA there was even greater variation between ʔaCCaC (38%) and CaCaC (24%). Similar tendencies were also found with respect to tokens, as shown in Table 4.

Table 3 Distribution of causative verbal patterns in types by variety-modality
Table 4 Distribution of causative verbal patterns in tokens by variety-modality

Taken together, the results above shed light on the distribution of verbal patterns in text production in PA and MSA. Their distribution can be used as a linguistic tool to measure the differences between Arabic modalities and varieties. The most noticeable morphological differences are between varieties, where spoken MSA and written MSA pattern closely together and differ from PA. At the same time, the most prominent differences are between PA on the one hand and written MSA on the other, with spoken MSA occupying an intermediate position.

5.2 Case Markers

Overt case markers are commonly regarded as a key feature of the difference between MSA and spoken, or colloquial, Arabic. This is because these markers have disappeared from all spoken varieties (Maamouri, 1998), remaining only in a few lexicalized forms such as shukran ‘thank you’ or ahlan ‘welcome’. Only MSA marks case on nouns and adjectives by suffixation (Holes, 1995; Saiegh-Haddad & Henkin-Roitfarb, 2014, among others). Thus, knowledge of case marking is not acquired naturally but is learned mainly at school. Moreover, case markers are only obligatory in writing/spelling in Arabic when they involve letters rather than short vowels or nunation, both of which are represented as optional diacritics. This means that case markers are often not encoded in written MSA because the default orthography of Arabic is the unvowelized orthography, which uses only letters and omits diacritics. Case markers are often not encoded in spoken MSA either because speakers prefer to use the pausal forms and to dispense with word-final inflections. This is mainly because: (a) they do not master the complex system of case marking, and (b) case marking does not alter the meaning of the word (Saiegh-Haddad & Henkin-Roitfarb, 2014).

In a previous study, Laks & Berman (2014) examined narrative text production among adult speakers in colloquial Jordanian Arabic and written MSA. The texts analyzed in that study did not display evidence for actual use of case marking in either variety. Instances in which case was explicitly used were restricted to accusative case in adverbials, e.g. qari:b-an ‘soon’ (cf. qari:b ‘close’), and to “sound” masculine plurals and dual forms, where case-assignment is marked by the addition of one or more (consonant) letters as bound suffixes in direct objects, adverbials, duals and copular constructions.

Examination of our data produced by school graders reveals tendencies similar to those found for adults in Laks & Berman’s (2014) study. Overt case markers were rarely used in both spoken and written MSA texts. At the same time, interestingly, younger participants stood out in their degree of usage of case markers, with 4th graders using case markers to the greatest extent. This could be explained by the fact that 4th graders have been reading the fully vowelized orthography since the 1st grade and conceive of the system of case marking as an indispensable part of MSA. Exposure to the unvowelized orthography in the Arab school system in Israel happens mainly after the fourth grade, and hence these children are still immersed in the vowelized orthography and are used to reading and writing in it. The usage of case markers decreased with age and was almost completely absent in the texts produced by adults.

Examination of the data reveals some degree of variation with respect to the usage of case markers. The following example (3) demonstrates a case where the same noun bayt ‘house’ is used in the same syntactic position once without and once with a case marker (in this case -i).

(3)

rajaʕtu ʔila l-bayt…wa-ʕindama rajaʕtu ʔila l-bayt-i ….

 

‘I returned to the house…and when I returned to the house …..’

 

(Majd, 4th grade)

Differences in degree of usage of case markers were also found according to modality. Case markers were more common in spoken MSA texts than in written MSA texts. This is noteworthy given that MSA is typically more written than spoken, and case markers are typically learned in the context of written language. We believe that case markers were used more in spoken texts because participants wanted to over-distinguish between spoken MSA and spoken PA and to mark MSA as the more formal variety. Another explanation is that the usage of case marking is obscured in writing because written MSA does not usually encode diacritics.

Morpho-syntactic factors also appear to be associated with the use of case markers and lack thereof. We focus here on the distribution of case markers in 4th graders’ texts, where the usage of case markers was the highest. An analysis of the data shows that the distribution can be partially predicted based on systematic structural guidelines.

(i) Lexical category. Case markers are more common on nouns than adjectives. This might suggest that case is perceived as more typical of nouns. As shown in (4), the noun walad ‘boy’ is used twice: once with the nominative case marker -un and once without it. In contrast, neither adjective in the same sentence marks case.

(4)

qa:la li: [eh] ʔannaka walad-un (N) sayyiʔ (Adj) wa-ʔanta [eh] walad (N) ʔana:ni: (Adj)

 

‘(he) said to me you are a bad boy and you are a selfish boy’

 

(Mahmud, 4th grade)

(ii) Definiteness. Indefinite nouns demonstrate a higher ratio of case marking. As shown in (5), the speaker used the indefinite noun film with a case marker, but the definite noun without one. This may also be related to an orthographic characteristic, namely the use of the letter alif in the orthographic representation of accusative case.

(5)

ʔara:dat ʔan nuša:hid film-an…. kunna nuša:hid al-film

 

‘(she) wanted/liked that we watch a film....we were watching the film...’

 

(Aseel, 4th grade)

(iii) Syntactic position. Subjects of sentences tended to be marked for case more often. As shown in (6) below, the noun ʔawla:d ‘children’ is the subject of the sentence and receives the nominative case marker -u, while the noun malʕab ‘courtyard’ is a direct object and does not receive the accusative case marker -a.

(6)

wa-ʕindama ʔaxaða al-ʔawla:d-u al-malʕab

 

‘and when the children took over the courtyard’

 

(Laian, 4th grade)

(iv) Bound morphemes. Most nouns with possessive clitics received case markers. This is probably because, when clitics are appended, case markers become an internal part of the word, and not pronouncing the case marker would result in a consonant cluster that is not licensed, or not typical of MSA phonology (baytuna/baytana/baytina vs. baytna ‘our house’). As shown in (7) below, the noun ʔuxt ‘sister’ receives the possessive clitic -ha ‘her’ and the case marker -u surfaces between the two morphemes.

(7)

baʕda ða:lik ʔatat ʔuxt-u-ha

 

‘then her sister came’

 

(Lana, 4th grade)

These tendencies accord with previous studies that examined spoken MSA in formal speeches and interviews of adults (Meiseles, 1977; Parkinson, 1994; Magidow, 2012; Hallberg, 2016).

To sum up, as expected, texts produced in PA were not found to include case markers. This suggests that the usage of case markers (or lack thereof) may serve as a distinctive feature differentiating the two varieties (PA vs. MSA) and modalities (spoken MSA vs. written MSA) as they are actually used by speakers. However, this feature is most prominent in the early grades because the use of case markers is encoded in school textbooks and is explicitly taught; its use decreases with development and gradually ceases to serve as a tool for distinguishing between modalities and varieties. The results also showed that usage of case marking may be predicted by structural properties such as lexical category, definiteness, syntactic position and the usage of bound possessive morphemes.

6 Conclusion and Future Directions

This chapter is a description of a research project that examines the distribution of language structures as reflected in actual language use in Arabic diglossia according to variety and modality distinctions. We presented results from two domains: verbal patterns and case markers. The distribution of verbal patterns reveals both variety- and modality-related differences. While the same three patterns are the most frequently used in both varieties and modalities, some interesting differences emerged. In both varieties and modalities, CaCaC was the most productive pattern, followed by CaCCaC and tCaCCaC. The other patterns were less frequently used overall. At the same time, some of the less frequently used patterns, like Ca:CaC and iCtaCaC, were clearly sensitive to variety and were more frequent in spoken and written MSA than in PA.

The voluntary usage of syntactic case markers is another manifestation of the language users’ sensitivity to variety and modality distinctions in Arabic. Voluntary use of case markers distinguished between PA and MSA among young children in particular. Case markers, which are typically only expected to be used in MSA, were used more in spoken MSA than in written MSA. We think this might be related to the participants’ intent to over-distinguish between spoken MSA and (spoken) PA and to mark MSA as the more formal variety. Alternatively, case markers are represented in Arabic mostly using diacritical marks, and because the default orthography of Arabic is the unvowelized one, writers tend to omit these markers from their written texts more often than when speaking MSA. Moreover, because the use of case markers, or lack thereof, determines the phonological structure of the word, especially when they appear word-internally (as before clitics), their use may be phonologically driven. More research is needed to test this hypothesis directly.

This chapter has shed light on morphological differences in Arabic text production according to differences in varieties and modalities. The data reported in the current study are a first step in showing that such differences may be to a large degree predictable, and they suggest that certain linguistic constructions are more typical of one variety/modality condition than another. It remains to be seen whether other linguistic structures, such as nominal and adjectival patterns, syntactic agreement, word order, text length and syntactic complexity, also pattern systematically with differences in modality and variety.