Introduction

Children practice writing and have some ideas about the writing system before they systematically learn to read or write (Gerde et al., 2012; Tolchinsky, 2003). When writing words without formal instruction, children often use unlicensed letters that have no connection to the word that they are writing (see Fig. 2). These letters are referred to as unlicensed because they are licensed neither by phoneme-grapheme rules nor by orthographic representations in the mental lexicon. The current study investigated the use of these letters throughout children’s early writing development in Hebrew, an abjad writing system. We aimed to learn about the role of statistical learning in predicting specific categories of unlicensed letters during early writing development. We also studied how children’s writing level, age, gender, length of the first name, and socioeconomic status (SES, represented by the mother’s education) predict children’s use of unlicensed letters in each of these categories.

Early writing development

Before the formal study of writing, children participate in writing activities such as writing their name or writing a birthday card (e.g., Neumann et al., 2009; Puranik & Al Otaiba, 2012; Puranik & Lonigan, 2011). When children practice early writing, they first learn the basics of their writing system (e.g., words’ length and letter shapes), and gradually convert the sounds of the word (phoneme/sub-syllable) into the appropriate letters and eventually build the letters into words (Jones, 2015; Levin & Aram, 2013; Martins et al., 2013; Tolchinsky, 2005). This process is positively associated with the development of other literacy skills (e.g., Kessler et al., 2013; Martins et al., 2013; Ouellette & Sénéchal, 2008, 2017).

When studying children’s writing development, Levin and Bus (2003) and Levin et al. (2005) described a developmental scale that begins with nonrepresentational scribbles, progresses to letterlike scribbles, and then shifts to using letters with no phonological connection (“random” letters, which we refer to in our study as unlicensed letters). Later, children transition into a basic phonetic representation of the word, in which they use at least a single letter that is phonologically related to the target word, accompanied by unlicensed letters. Gradually, children use more phonologically appropriate letters in their writing, and words become readable. They eventually write all the letters with a phonological connection to the word that they are spelling, even if the spelling is somewhat incorrect (e.g., spelling mistakes). Finally, children use conventional spelling.

Between the stage of using letters with no phonological connection and the stage of using letters that are phonologically related to the word, there is a bridge in which children add unlicensed letters to their writing. These letters are not related to the word that the child aims to write but are used to varying degrees alongside other phonologically correct letters. Sometimes the child uses only unlicensed letters to write a word. For example, a child can write the word גזר GZR (carrot), pronounced gezer, as a string of letters that are entirely unrelated phonologically, like ליבלד; as letters that are partly phonologically related, like גדק, where the letter ג is phonologically related but the other two letters are not; or as גסלמ, where the letters ג and ס are phonologically related (we accepted phonologically close letters as phonologically related) and the other two letters are not. In the first example, all the letters are considered unlicensed; in the second and third examples, only the two letters that are not related to the word are considered unlicensed letters. In the current study, we examined the characteristics of these unlicensed letters in Hebrew. We investigated the role of statistical learning in predicting characteristics of preschoolers’ use of unlicensed letters throughout their writing development.

Statistical learning and early writing development

Statistical learning is a form of implicit learning that occurs due to the internalization of patterns in the environment without conscious effort. The more frequent a pattern is, the better it is internalized (Perruchet & Pacton, 2006; Saffran et al., 2004). Statistical learning requires pattern recognition and relies on the human ability to identify rules and regularity in everyday life. Sensitivity to the regularities and structures is the building block for our expectations in various areas such as cognition, social behavior, visual domains, and language (Aslin & Newport, 2012; Monroy et al., 2017).

In the context of language, salient regular visual characteristics of the local orthography are those that children internalize (Turk-Browne et al., 2009). Statistical learning does not require learners to understand why they make certain writing choices. Rather, learners unknowingly pick up patterns in their written environment and use them in their spelling (Kessler et al., 2013). For example, children are exposed to letters in their names from a young age. Due to this exposure, they tend to overuse these letters in their spellings when writing different words (Treiman et al., 2001). In languages that capitalize the first letter of first names, children show a special preference for the first (capitalized) letter of their name and use it frequently, even when it is not correct phonologically (Both-de Vries & Bus, 2008). Similarly, letters with high frequency in the local orthography tend to be over-represented in children’s early spellings (Pollo et al., 2009). In languages with a high proportion of consonants to vowels such as English, children include more consonant letters in their early writing than in languages with a low consonant to vowel ratio, such as Portuguese (Pollo et al., 2005). Despite the growing evidence that statistical learning forms some of the characteristics of early spellings, research has mainly stemmed from Romance and Germanic languages like Portuguese, French, and English (see Read & Treiman, 2013). In the current study, we explored a Semitic script, that of Hebrew, which is an abjad writing system.

The Hebrew writing system

The Hebrew writing system is an abjad consonantal script. It consists of 22 consonant letters that are written from right to left. Five letters (final letters) have an allograph when placed as the final letter of a word (Ravid, 2012). Four letters can take the role of both consonant and vowel. For example, in the word שיר (‘song’), the letter < י > takes the role of a vowel, while in the word יום (‘day’) it represents a consonant. However, these letters usually indicate a vowel. The dual role of these letters makes them highly frequent in the Hebrew script, albeit difficult for young spellers to use correctly (Levin et al., 2013). Vowels are only partly represented by letters, and so words are relatively short (Frost et al., 1987) and generally range from 2 to 6 letters, as in the phrases “hello mother” (שלום אמא), “good morning” (בוקר טוב), or “give me” (תן לי) (for an example of the number of letters per word in a corpus of children’s books, see the Method section).

The Hebrew alphabet is known as a square script—a block alphabet with many 90° angle letters. Letters tend to have high visual similarity to one another. They have more horizontal and vertical strokes and fewer curves and diagonals than letters in the Latin script (Shatil et al., 2000; Treiman et al., 2012). As in other languages, Hebrew-speaking children use unlicensed letters in their early writing, gradually moving towards phonologically correct spelling with unlicensed letters slowly decreasing (Aram & Levin, 2001; Aram, 2005; Aram et al., 2014; Aram & Chorowicz Bar-Am, 2016; Aram et al., 2016).

In the current paper, we studied five categories of unlicensed letters. We focused on letters that are present/absent in the child’s first name, letters that are more/less frequent in Hebrew texts, letters that spell consonants/vowels, letters that are visually similar/dissimilar, and letters that are more/less difficult to graphically produce. We explored the role of children’s writing measures (writing level and mean sum of written letters per word) and individual indices (age, gender, mother’s education as a proxy for family SES and length of the child’s first name) in predicting the amount of children’s use of each of the abovementioned five categories of unlicensed letters during early writing development.

Children’s writing measures

Writing level

Children’s writing level is related to their use of unlicensed letters. During the process of early writing development, a higher level of writing is associated with the lower use of letters from the child’s first name and higher use of phonologically representing letters (Both-de Vries & Bus, 2008; Share & Levin, 1999). Since preschoolers keep using some unlicensed letters in their spelling as their general writing level progresses (Aram & Levin, 2001; Tolchinsky et al., 2012), we were interested in the amount of use of each of the five categories of unlicensed letters throughout writing development.

Number of letters per word

Another aspect that reflects children’s understanding of their orthography is the number of letters they write to represent a word. The more children know about their orthography, the more the number of letters in their written products resembles the mean length of words commonly used in their language (Pollo et al., 2009). Aram and Levin (2001) found that children who were more advanced in their literacy skills (letter knowledge, phonological awareness, and orthographic awareness) were more attuned to the phonological length of Hebrew words in their early writing. They very rarely used only one letter to represent a word and frequently wrote about 3–5 letters per word, even when all the letters were unlicensed and not related to the target word. Children with lower literacy skills used up to 10 letters to write one word, far from the mean letters per word in Hebrew.

Individual indices

Age

Children’s early literacy skills, including spelling, are related to age (e.g., Puranik & Lonigan, 2011). With the increase in age, children are more exposed to their orthography, have more experience, and learn more from their parents and teachers. For example, the nonphonological spellings of older children include a wider variety of letters compared with younger children (Treiman et al., 2007).

Gender

Some studies suggest that girls outperform boys in early literacy skills (Lee & Al Otaiba, 2015; McTigue et al., 2020). Additionally, a few studies found that girls are more interested in literacy than boys (Alexander et al., 2008; Baroody & Diamond, 2013; Meece et al., 2006; Peterson & Parr, 2012) and progress more in their reading during kindergarten (Chatterji, 2006).

Length of child’s first name

The length of the child’s first name may be related to spelling patterns due to the size of the pool of familiar letters from which children can choose. Preschoolers with a long first name tend to be familiar with more letter names and practice writing more letters than children whose names are short (Diamond et al., 2008; Puranik & Lonigan, 2011; Welsch et al., 2003).

Socioeconomic status (SES)

Children’s literacy skills are related to their socioeconomic background across cultures and orthographies (e.g., Aram et al., 2014; Ergül et al., 2017). Families from lower SES practice fewer literacy activities with their children, and parent–child interactions are characterized by lower levels of literacy support (Aram & Levin, 2001; Puranik & Al Otaiba, 2012; Robins et al., 2014). The spelling skills of children from a higher SES are significantly higher than those of children from a lower SES (Lee & Al Otaiba, 2015). Maternal education is a widely used indicator of SES (Dingemann et al., 2019), as it is associated with other indicators such as income (Rendall et al., 2021), and it is positively related to children’s literacy skills (e.g., Vernon‐Feagans et al., 2020; Younger et al., 2019). Thus, we used maternal education as an indicator of family SES.

We opted to learn how a child’s writing level, age, gender, SES (measured via mother’s education), and length of the child’s first name predict the amount of use of each of five categories of unlicensed letters: Letters from the child’s first name, letters that are more/less frequent in Hebrew texts, consonant/vowel letters, letters that are visually similar/dissimilar, and letters that are more/less difficult to graphically produce.

Based on the literature, we hypothesized that children’s use of unlicensed letters when spelling words in Hebrew would include greater use of letters from their first name, letters with high frequency in Hebrew texts, as well as greater use of consonants and letters that are easy to produce graphically. Unlike some other scripts, a capital letter at the beginning of one’s first name is not used in Hebrew; therefore, we asked if Hebrew-speaking children would show greater use of the first letter of their first name in comparison to other letters. In addition, acknowledging the high degree of visual similarity among letters in Hebrew, we asked if when using unlicensed letters in their spelling, children would use more letters that are visually similar to other letters than letters that are visually dissimilar. We aimed to study these five categories of unlicensed letters that children use at different writing levels (writing level and mean sum of letters per word) and across individual indices (age, gender, SES, and length of the first name).

Method

Participants

In this study, we used the writing outputs of Hebrew-speaking preschool children who participated in three studies in which they were asked to write between six and eight words (nouns) that are part of young children’s vocabulary (Table S1). Participants were 152 children (85 girls and 67 boys) recruited through convenience sampling. Their parents had volunteered to participate and reported that their children had no developmental disabilities. Children’s age ranged from 4 to 7 years (M = 63.9 months; SD = 6.90). They lived mainly in the center of Israel and attended different preschools. In Israel, preschools are separate from elementary schools. Formal reading and writing instruction begins in first grade. Preschool teachers focus on language and communication skills, read books to children, and introduce the alphabet, with little time devoted to writing activities (Aram et al., 2014). Mothers’ education ranged from high school diploma (27.1%), through BA (45.2%), to MA and Ph.D. (27.7%).

The analysis of the children’s 1205 words revealed that 115 of the word writing products (9.54%) were only scribbles (i.e., signs that are not identified as a Hebrew letter; see Fig. 1) and 357 writing products (29.63%) were words fully spelled phonologically or conventionally. This study did not analyze these 472 writing products (scribbles, phonologically or conventionally spelled).

Fig. 1 Examples of children’s writing on the 7-point writing development scale: the children wrote the word צלחת (CLĦT, tsalaxat, ‘plate’)

The study focused on the use of unlicensed letters. Hence, we analyzed only written words that contained unlicensed letters (N = 733 written words). These 733 words included 2109 unlicensed letters, that is, recognizable letters that have no connection to the target word. These letters appeared either as the only letters in the word or alongside phonologically related letters (see Fig. 2). In the current study, we only analyzed the characteristics of these unlicensed letters. We studied whether they are present or absent in the child’s first name, more or less frequent in Hebrew texts, vowel or consonant letters, visually similar or dissimilar to other letters, and easy or difficult to produce graphically.
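The sample accounting above can be checked with a few lines of arithmetic. The sketch below uses only the counts reported in the text; the variable and function names are ours and purely illustrative.

```python
# Counts reported in the Participants section.
total_products = 1205      # all word-writing products
scribbles = 115            # products that were only scribbles
fully_spelled = 357        # products spelled phonologically or conventionally
unlicensed_letters = 2109  # unlicensed letters in the analyzed words

excluded = scribbles + fully_spelled
analyzed = total_products - excluded


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


print(excluded, analyzed)                       # 472 733
print(pct(scribbles, total_products))           # 9.54
print(pct(fully_spelled, total_products))       # 29.63
print(round(unlicensed_letters / analyzed, 2))  # 2.88 unlicensed letters per analyzed word
```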

Fig. 2 Spelling development in Hebrew: examples of use of unlicensed letters

Procedure

M.A. students in Education collected the data. After the parents signed a consent form, the student came to the child’s home and asked the child to write the words independently. The student gave the child an A4 paper and a pencil, showed a drawing of an object, and asked the child to write the word. The oral instructions were straightforward, for example: “Please write the word /tsalaxat/ (plate).” To decipher the children’s spellings and decide which letter was written and whether it was an unlicensed letter, we had to reach an agreement regarding the identification of the letters. Inter-rater reliability was assessed by the leading researcher and three M.A. students in Education. All raters coded 884 letters that appeared in 250 words written by 38 children (20% of the data). Each rater was given pictures of all the scanned writing outputs of a particular participant (including a scan of the written form of the child’s first name). Raters were told that the letters did not necessarily match the target words and that preschoolers might write letters in “mirror writing,” tilted, or imperfectly formed. The raters were only required to identify the letters by name in each writing output. A high percentage of agreement was achieved (inter-rater reliability of 94%). Any coding discrepancies between the raters were resolved through discussion.
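As an illustration of the agreement index mentioned above, a simple percentage of agreement between two raters could be computed as follows. This is a minimal sketch under our own assumptions: the text does not specify how agreement was aggregated across the four raters, and the letter names in the example are hypothetical, not study data.

```python
def percent_agreement(codings_a, codings_b):
    """Percentage of items on which two raters named the same letter."""
    if len(codings_a) != len(codings_b):
        raise ValueError("both raters must code the same items")
    if not codings_a:
        raise ValueError("at least one coded item is required")
    matches = sum(a == b for a, b in zip(codings_a, codings_b))
    return 100 * matches / len(codings_a)


# Hypothetical codings of five written letters, identified by letter name.
rater_1 = ["gimel", "dalet", "qof", "lamed", "mem"]
rater_2 = ["gimel", "resh", "qof", "lamed", "mem"]

print(percent_agreement(rater_1, rater_2))  # 80.0
```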

Measures

Writing level and number of written letters per word

We evaluated the child’s general writing level on a 7-point scale adapted from Levin et al. (1996) and Levin and Bus (2003). This scale was successfully used in previous studies in Hebrew (Aram, 2007; Aram et al., 2021). The scale ranged from scribbles and pseudo letters, through the use of only unlicensed letters and basic consonantal spellings (one consonant letter that is phonologically represented), partial and full consonantal spellings, to conventional spelling, including vowel letters (see Fig. 1). Higher scores indicated a more advanced writing level. The mean score across the words served as the word writing score. Inter-judge reliability computed on the scores of the writings of 20% of the sample by two independent judges (M.A. students in Special Education), resulted in a highly significant Cohen's κ of 0.90. Reliability across items was excellent (Cronbach’s α = 0.99). The words that the children were asked to write included a mean of 3.35 letters per word. The letters that the child wrote for each word were counted, and the mean score across the words served as the number of written letters per word score (Table S1).
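For readers unfamiliar with Cohen's κ, the sketch below shows how the statistic corrects observed agreement for agreement expected by chance. The scores are hypothetical 7-point ratings invented for illustration; they are not the study's data, and the function name is ours.

```python
from collections import Counter


def cohens_kappa(scores_a, scores_b):
    """Cohen's kappa for two judges who scored the same items."""
    n = len(scores_a)
    if n == 0 or n != len(scores_b):
        raise ValueError("both judges must score the same non-empty set of items")
    # Observed proportion of exact agreement.
    observed = sum(a == b for a, b in zip(scores_a, scores_b)) / n
    # Chance agreement from each judge's marginal score distribution.
    freq_a, freq_b = Counter(scores_a), Counter(scores_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical writing-level scores (1-7) given by two judges to eight words.
judge_1 = [1, 2, 3, 3, 4, 5, 6, 7]
judge_2 = [1, 2, 3, 4, 4, 5, 6, 7]

print(round(cohens_kappa(judge_1, judge_2), 2))  # 0.85
```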

Characteristics of unlicensed letters

We were interested in children’s use of different letters as unlicensed letters in Hebrew, focusing on five characteristics: Child's name, frequency in texts, consonants and vowels, visual similarity, and ease of production. For each characteristic, we grouped letters into categories (e.g., frequent vs. infrequent letters), studied the chance of using unlicensed letters from the category and how age, gender, SES, and length of the first name predict the use of each category.

Letters from the child’s first name

The number of times the children used any letter from their first name or the first letter of their first name as an unlicensed letter was counted.

Letters’ frequency in Hebrew texts

Each letter that was used in children’s spellings as an unlicensed letter was categorized as belonging to one of four frequency groups (low, medium–low, medium–high, high) based on its frequency of appearance in children’s books. The Hebrew letters were categorized into these groups using a letter counter. We used this counter to calculate the frequency of Hebrew letters in eight popular narrative children’s books (e.g., Itamar Meets a Rabbit by David Grossman or Frog is Frog by Max Velthuijs). The number of words ranged from 330 to 728 (M = 528, SD = 143), and the number of letters ranged from 1387 to 3676 (M = 2342, SD = 750). The number of letters per word ranged between 2 and 7 (M = 3.68, SD = 1.20). The correlation between our categorization of letters’ frequency and the letters’ frequency count in Israeli adults’ newspapers (Shoken & Shor, 2010) was high (r = 0.90, p < 0.001). Letters’ frequency was categorized as: low (n = 6 letters, each accounting for 0.1–1% of letters); medium–low (n = 7 letters, 1.1–3%); medium–high (n = 8 letters, 3.1–6%); and high (n = 6 letters, 6.1–10.65%) (see Table S2).
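The letter-counting and grouping step described above might look roughly like the following sketch. It is not the counter used in the study: the function names are ours, the sample text is a toy phrase rather than a book, and treating each group's upper limit (1%, 3%, 6%) as an inclusive cut-off is our assumption about how boundary values were handled.

```python
from collections import Counter

# Upper bounds (in percent) of the first three groups; anything above is "high".
BOUNDS = [(1.0, "low"), (3.0, "medium-low"), (6.0, "medium-high")]


def letter_percentages(text):
    """Percentage of each Hebrew letter (alef to tav, including final forms)."""
    letters = [ch for ch in text if "\u05d0" <= ch <= "\u05ea"]
    counts = Counter(letters)
    total = len(letters)
    return {ch: 100 * n / total for ch, n in counts.items()}


def frequency_group(percent):
    """Assign a letter's percentage to one of the four frequency groups."""
    for upper, label in BOUNDS:
        if percent <= upper:
            return label
    return "high"


# Toy input: a two-word phrase, far too short to estimate real frequencies.
percentages = letter_percentages("שלום אמא")
for letter, percent in sorted(percentages.items()):
    print(letter, round(percent, 1), frequency_group(percent))
```

In practice the percentages would be computed over the full text of the books, and the resulting groups could then be compared against an independent count such as the newspaper corpus mentioned above.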

Consonants and vowel letters

Each unlicensed letter used in the children’s spellings was categorized as being a consonant or vowel letter. Note that all the vowel letters in the words in our study served only as vowel letters [except for the letter /Y/ in the word yad (‘hand’); see Table S1].

Letter’s visual similarity versus dissimilarity

Each unlicensed letter used in the children’s spellings was categorized as either visually similar to other letters or visually dissimilar. For example, the letters have some visual similarities, while the letters do not look like other letters. To determine this, a visual similarity test was used (based on Treiman, 2006). The test was administered via email to 90 adults (60 Hebrew speakers and 30 English speakers). Participants were asked to rate pairs of letters for their level of similarity on a 7-point scale (0 = no similarity to 7 = very similar). Pairs of letters with an average score greater than 3.5 were defined as visually similar. Pairs of letters with an average score lower than 3.5 were defined as visually dissimilar. The test showed good reliability (Cronbach’s α = 0.82). A comparison of the Hebrew- and English-speaking groups showed no significant differences for the similar (t(88) = 0.712, p > 0.05) or the dissimilar letters (t(88) = 0.534, p > 0.05). The letters that show visual similarity (ב, ד, ה, ו, ז, ח, י, ך, כ, ק, נ, ן, ר, ת) have at least one letter that is visually similar to them. The dissimilar letters were . We ran a repeated-measures ANOVA to verify the difference between the two groups of letters. We found a significant difference (F(1, 89) = 229.33, p < 0.001) between the letters rated as having high visual similarity (M = 4.20, SD = 1.04) and the letters rated as visually dissimilar (M = 2.10, SD = 0.086).

Letter’s complexity of graphic production

Each unlicensed letter used in the children’s spellings was categorized into one of three levels of graphic production (easy, medium, and difficult). Graphically writing letters is a demanding, acquired fine motor skill. According to the literature, children’s handwriting starts with vertical strokes, followed by horizontal strokes and circles. Last to appear are the diagonal lines (Feder & Majnemer, 2007). After controlling the direction of single lines children learn how to use a combination of lines. When studying children’s copying skills, the hardest shapes for preschoolers are the square and the triangle (Beery & Buktenica, 1989). Following these ideas, we based our scoring on Shatil’s (1993) mapping of the Hebrew letters according to psychographic development to three groups of letters according to the number of strokes, their directions, and combinations: Easy—one stroke, mainly vertical lines ; Medium—two strokes, vertical and horizontal lines ; and Difficult—two or three strokes that include diagonal lines (, א, ש, צ, ע, מ, ל, ט, ז, ג).

Data analysis

For baseline comparisons, we used the generalized estimating equations (GEE) procedure to test differences in letter counts across characteristics (e.g., the number of times similar or dissimilar letters appeared in the child's word writing). Specifically, we applied the negative binomial distribution (NBD; SPSS V.25.0) to these letter counts. The NBD generalizes the Poisson distribution by relaxing the assumption that the mean equals the variance (Hilbe, 2017). A count outcome takes discrete values and is usually adjusted for a total count when this total varies from one observation to another. The advantage of this statistical procedure is twofold: it provides tests of effects on the predicted letter counts, i.e., differences between two or more letter characteristics that appear in the child's written words (Wald's chi-square test), and it provides predicted marginal means, which are the predicted frequencies of each tested letter characteristic. In our study, each child wrote a few words. The GEE procedure integrates two words per child, that is, the child's name and the non-name word (Horton & Lipsitz, 1999), namely, the count of letters in the name and the count of letters in other words.
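To illustrate why a negative binomial model can be preferable to a Poisson model for counts like these, the sketch below compares the mean and variance of a set of counts and derives a rough dispersion estimate. It is not the SPSS GEE procedure used in the study: the counts are hypothetical, the function name is ours, and the method-of-moments estimate under the common parameterization Var = μ + αμ² is an assumption made for illustration.

```python
from statistics import mean, variance


def dispersion_summary(counts):
    """Return (mean, sample variance, method-of-moments alpha) for count data.

    Under a Poisson model the variance equals the mean (alpha = 0).
    Under the negative binomial form assumed here, Var = mu + alpha * mu**2.
    """
    mu = mean(counts)
    if mu == 0:
        raise ValueError("alpha is undefined when all counts are zero")
    var = variance(counts)
    alpha = max(0.0, (var - mu) / mu**2)
    return mu, var, alpha


# Hypothetical per-child counts of one letter category (not study data).
counts = [0, 1, 1, 2, 3, 5, 8, 12]
mu, var, alpha = dispersion_summary(counts)
print(mu, round(var, 2), round(alpha, 2))  # 4 17.14 0.82
```

A variance well above the mean, as in this toy example, is the overdispersion pattern that the Poisson equal-mean-and-variance assumption cannot accommodate.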

The more advanced analysis aimed to explain children’s use of various categories of unlicensed letters (e.g., vowel letters, high-frequency letters, graphically similar letters) by the child’s writing level, personal variables (e.g., age, gender), and the target words (Baayen et al., 2008). To study this question, we used the HLM 7.01 program (Raudenbush et al., 2013). To account for both word-level and child-level features, we estimated a two-level random-effects model in which level 1 was assigned to words within children and level 2 was assigned to children. Specifically, at level 1, we entered the child’s writing measures (writing level and mean sum of written letters) as assessed per word. At level 2, we included the child’s variables (age, gender, mother’s education, and length of the first name). The HLM program improved the sensitivity of the model by allowing a random slope, whereas GEE was limited to a random intercept only. A random slope means that regression coefficients vary randomly across children; for example, the association between the sum of letters per word and the outcome could vary across children. In the original analyses, we included a third level comprising the target word. We excluded it from the final modeling framework and retained a two-level analysis because of the small variance at the third level. In this analysis, we assumed a normal distribution, yet the original letter count distributions were asymmetric, with higher frequencies of smaller counts and lower frequencies of higher counts. Thus, a natural log transformation was applied to these letter counts after adding a minimal constant to avoid the undefined ln(0). Our interpretation of the estimates focused mainly on direction: a negative sign indicated a negative association between the independent and the outcome variables, and a positive sign indicated a positive association. In log-transformed models, exponentiated estimates should be interpreted as the percent change in the letter counts in response to a one-unit change in the independent variable ((exp(b) − 1) × 100), and this interpretation refers to additional effects beyond the intercept.
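The percent-change interpretation can be made concrete with a short calculation. The sketch below applies the formula (exp(b) − 1) × 100 to a coefficient of b = 0.52, the value reported in the Results for letters per word in the first-name model; the function name is ours.

```python
import math


def percent_change(b):
    """Percent change in the outcome per one-unit increase in the predictor,
    for a coefficient b estimated on a natural-log-transformed outcome."""
    return (math.exp(b) - 1) * 100


print(round(percent_change(0.52)))  # 68
print(percent_change(0.0))          # 0.0
```

Because a small constant was added before the log transformation, such percentages are best read as approximations rather than exact effects on the raw counts.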

Results

Descriptive statistics

Writing level and number of written letters per word

Evaluated on a 7-point scale, children’s average writing level was 3.60 (SD = 0.94), reflecting frequent use of unlicensed letters and the beginning of phonological spelling (writing one or more phonologically correct letters in a word). It should be noted that even children who wrote at a higher writing level (basic or partial consonantal spelling) still used some unlicensed letters (for example, see Fig. 1). Studying children’s spellings, we found that all the outputs included at least two letters. On average, children wrote 3.63 letters per word (SD = 1.53), of which 0.75 letters per word were phonologically correct (SD = 1.06) and 2.88 letters per word were unlicensed (SD = 1.76).

The spelling outputs revealed a range of use of unlicensed letters from one or two per word (47.30%), through three or four (38.10%) and five or six (10.50%), to seven or more letters in one written word (4.10%), even though the longest word the children had to write consisted of only six letters. The ratio was roughly one phonologically correct letter for every four unlicensed letters. Children who used unlicensed letters in their spelling tended to use them more often than phonologically correct letters (see Fig. 2 for examples).

We counted the appearance of each of the Hebrew letters (including final letters) in the children’s unlicensed letters (2109 letters) to learn which letters appeared more frequently. We found that the first letter in the Hebrew script, א, was most frequently used (n = 226), followed by ה (n = 223), while the least used letters were ץ (n = 1) and ך (n = 2).

Predicting characteristics of unlicensed letter use

Table 1 presents the GEE modeling results; these are the predicted frequencies of use of each of the five categories of unlicensed letters based on children’s outputs.

Table 1 Generalized estimating equations modeling results: differences in frequencies of use of unlicensed letters by their characteristics in children’s spellings (N = 2109 unlicensed letters)

Table 1 shows that Hebrew-speaking children were generally less likely to use letters from their names (M = 6.24; SD = 0.60) as unlicensed letters compared with other letters (M = 7.66; SD = 0.53). They were also less likely to use the first letter of their name (M = 1.85; SD = 0.22) compared with other letters in their name (M = 4.39; SD = 0.46). Children tended to use more consonants (M = 8.12; SD = 0.60) than vowels (M = 5.77; SD = 0.50), and more visually similar letters (M = 8.40; SD = 0.63) than visually dissimilar letters (M = 5.50; SD = 0.45). Post-hoc tests indicated that children tended to use letters that were easy to produce graphically (M = 4.87; SD = 0.42) more than letters that were moderately complex to produce (M = 4.00; SD = 0.31); however, this difference did not reach significance (p = 0.059).

To learn about the correspondence between how frequently the children used specific letters as unlicensed letters in their spellings and the letters’ frequency in Hebrew texts, we summed up the letters’ appearances in the children’s outputs. We found that the children’s mean use of letters with high, medium–high, medium–low, and low frequency in Hebrew texts as unlicensed letters was 7.60 (SD = 0.60), 3.73 (SD = 0.33), 1.86 (SD = 0.21), and 0.69 (SD = 0.11), respectively. There was a strong Spearman’s correlation (rs = 0.91; p < 0.001) between the letters’ frequency in Hebrew texts and their frequency of appearance as unlicensed letters in the children’s spellings. That is, the more frequently the letters appeared in Hebrew texts, the more frequently they were used as unlicensed letters in the children’s spellings, and vice versa.
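A rank correlation of the kind reported above can be sketched in a few lines. The example computes Spearman's coefficient as the Pearson correlation of average ranks; the function names are ours, and the five pairs of values are hypothetical placeholders, not the study's per-letter data.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for five letters: percentage in texts, and count as unlicensed letters.
text_percent = [10.2, 6.5, 4.0, 2.1, 0.4]
unlicensed_count = [200, 220, 90, 40, 5]
print(round(spearman(text_percent, unlicensed_count), 2))  # 0.9
```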

Table 2 presents the results regarding the role of the child’s word writing characteristics (writing level and mean sum of written letters—first level) and personal variables (age, gender, mother’s education, and length of the first name—second level) in predicting the percent change of using each of the five categories of unlicensed letters that we studied via HLMs. The estimated coefficients give the direction in which the independent factor affects the outcome number.

Table 2 Two-level modeling results predicting characteristics of unlicensed letters in children's spelling

Counts of letters from the child’s first name

The conditional model provided a significantly better fit to the data than the baseline model (∆ Deviance = 47.74, p < 0.001). In the conditional model, a lower writing level (b = −1.49, p < 0.001), more letters per word (b = 0.52, p < 0.001), younger age (b = −0.13, p < 0.05), and a longer name (b = 1.60, p < 0.001) were significantly related to a higher usage of unlicensed letters from the child’s first name. In other words, a one-unit increase in writing level was associated with a 75% decrease in the use of first-name letters, and each additional letter in the word was associated with a 68% increase in the use of first-name letters.

Letters’ frequency in Hebrew texts

Overall, the conditional model provided a significantly better fit to the data than the baseline models for high (∆ Deviance = 115.8, p < 0.001) and low (∆ Deviance = 24.30, p < 0.001) frequency letters.

In the conditional model, a higher writing level was associated with lower use of both high (b =  − 1.00, p < 0.001) and low frequency letters (b =  − 0.52, p < 0.001). A higher total mean sum of written letters per word was associated with greater use of both high (b = 0.90, p < 0.001) and low frequency (b = 0.17, p < 0.05) letters as unlicensed letters. Older children used fewer high frequency letters (b =  − 0.05, p < 0.05) and more low frequency letters as unlicensed letters.

Counts of consonants and vowels

The conditional models provided a significantly better fit to the data than the baseline models for consonants (∆ Deviance = 143.5, p < 0.001) and vowels (∆ Deviance = 114.20, p < 0.001). This means that adding the level 1 and level 2 explanatory variables improved the model fit for both the counts of consonants and vowels.
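The model comparison above can be read as a likelihood-ratio test, in which the drop in deviance is compared with a chi-square distribution whose degrees of freedom equal the number of added parameters. The sketch below assumes six added parameters (one per explanatory variable: writing level, letters per word, age, gender, mother's education, and name length). That count is an assumption, since the text does not report the degrees of freedom. The helper `chi2_sf` is written here with the standard library only and handles integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with
    integer df >= 1, using the recurrence Q(x | df + 2) = Q(x | df) + term."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, nu = math.erfc(math.sqrt(half)), 1   # closed form for df = 1
    else:
        q, nu = math.exp(-half), 2              # closed form for df = 2
    while nu < df:
        q += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1.0)
        nu += 2
    return q


ASSUMED_DF = 6  # hypothetical: one parameter per added explanatory variable

# Deviance differences reported for the consonant and vowel models.
for label, delta in [("consonants", 143.5), ("vowels", 114.20)]:
    p = chi2_sf(delta, ASSUMED_DF)
    print(f"{label}: delta deviance = {delta}, p < 0.001: {p < 0.001}")
```

With deviance differences this large, the conclusion of a significantly better fit does not depend on the exact degrees of freedom assumed here.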

In the conditional models, a higher writing level was associated with lower use of both consonant (b =  − 1.60, p < 0.001) and vowel (b =  − 0.94, p < 0.001) letters as unlicensed letters. A higher mean sum of written letters per word was associated with greater use of both consonants (b = 0.30, p < 0.001) and vowels (b = 1.00, p < 0.001) as unlicensed letters. Older children used more consonants (b = 0.06, p < 0.01) and fewer vowels (b =  − 0.10, p < 0.001).

Letters with visual similarity versus visual dissimilarity

The conditional models provided a significantly better fit to the data than the baseline models for visually similar (∆ Deviance = 122.67, p < 0.001) and visually dissimilar (∆ Deviance = 87.11, p < 0.001) letters. A higher writing level was associated with lower use of both visually similar (b =  − 1.20, p < 0.001) and dissimilar (b =  − 1.50, p < 0.001) letters as unlicensed letters. A higher mean sum of written letters per word was associated with greater use of both visually similar (b = 0.60, p < 0.001) and visually dissimilar (b = 0.50, p < 0.001) letters as unlicensed letters. Older children used fewer visually similar letters (b =  − 0.05, p < 0.01) and more visually dissimilar letters (b = 0.09, p < 0.05) as unlicensed letters. Girls were more likely than boys to use visually similar letters (b = 0.83, p < 0.01).

Letters’ complexity of graphical production

The conditional models provided a significantly better fit to the data than the baseline models for the use of letters with easy (∆ Deviance = 133.10, p < 0.001), moderate (∆ Deviance = 79.80, p < 0.001), and difficult (∆ Deviance = 81.60, p < 0.001) levels of graphical production. A higher writing level was associated with lower use of unlicensed letters with easy (b =  − 1.50, p < 0.001), moderate (b =  − 1.10, p < 0.001), and difficult (b =  − 1.46, p < 0.001) levels of graphical production. A higher mean sum of written letters per word was associated with greater use of unlicensed letters with easy (b = 0.96, p < 0.001), moderate (b = 0.78, p < 0.001), and difficult (b = 0.58, p < 0.001) graphical production. Older children used more unlicensed letters with difficult graphical production (b = 0.09, p < 0.05).

Discussion

Studying literacy development within a wide range of writing systems promotes our understanding of their uniqueness and provides implications for children’s learning (Share, 2020). In the current study, we examined the nature of unlicensed letters in Hebrew, that is, written letters that have no exact or close phonological connection to the letters of the word that the child is writing. Before formally learning to write, children start to use unlicensed letters with varying frequency (from solely using these letters to using one or two letters alongside phonologically correct letters). In this study, we (a) explored characteristics of unlicensed letters in children’s early spellings in Hebrew, and (b) evaluated the role of children’s level of word writing and individual indices (age, gender, family SES, and length of the first name) in predicting the use of these categories of letters. Results indicated that children’s early spelling products contained significantly more unlicensed letters that have a high frequency in Hebrew texts, consonant letters, letters that are visually similar to other letters, and letters that are easy to produce graphically. However, children showed no general preference for letters from their first name, including the first letter of the name. The child’s writing level, as well as their age, gender, and length of the first name (but not SES), uniquely predicted the use of the studied characteristics of unlicensed letters in early spellings.

Children’s writing measures significantly predicted each of the five categories of unlicensed letters. Specifically, a higher writing level and a lower mean sum of written letters per word in the children’s spelling products corresponded with generally lower use of unlicensed letters. This negative link is in accordance with the idea that higher writing skills go hand-in-hand with an increase in children’s ability to encode phonemes into graphemes and use fewer unlicensed letters (Both-de Vries & Bus, 2008). Since words in Hebrew are generally short and include 2–6 letters (Velan & Frost, 2011), and the mean number of letters per word in the present study was 3.35, the use of more letters per word indicates a lower level of understanding of the writing system. This result supports a previous study showing that Hebrew-speaking children with higher early literacy skills systematically used fewer letters per word (3–5) in their spellings, even if all of them were unlicensed letters, than children with lower literacy skills, who used more letters (6–7) (Aram & Levin, 2001).

Children’s age also predicted characteristics of unlicensed letter use. Older age was linked to lower usage of letters from the child’s first name, letters with high frequency, and vowel letters, as well as greater use of low-frequency letters, consonant letters, letters with high visual dissimilarity, and letters with difficult graphical production as unlicensed letters. These findings align with the idea that age predicts writing level; thus, older children tended to use more sophisticated characteristics of unlicensed letters. Older children (who thus have more exposure to Hebrew orthography) used unlicensed letters that are different from the letters in their first name, fewer vowel letters that are easy to graphically produce in Hebrew, more consonant letters, as the orthography is consonant based, and more letters with low frequency and high visual dissimilarity. All of these characteristics require a better understanding of the writing system.

Characteristics of unlicensed letters

Children’s first name

Based on the results, Hebrew-speaking children did not overuse letters from their first name compared with the rest of the alphabet. Yet, the length of the child’s name predicted the use of letters from the first name as unlicensed letters, with a longer first name associated with greater use of letters from the first name as unlicensed letters. Since children use their names as their first pool of letters (Read & Treiman, 2013), the more letters one’s first name contains, the greater the pool of letters and possibilities, which is reflected in the greater use of letters from the first name as unlicensed letters.

Based on the statistical learning principle of exposure, as well as previous findings (Bloodgood, 1999; Kessler et al., 2013; Pollo et al., 2009), we expected children to use letters from their first names more frequently than any other letters. Nonetheless, no such pattern was found. A possible explanation for this finding may be the relatively short length of first names in Hebrew compared with English names (Treiman et al., 2007). In our study, most of the children’s first names ranged from two or three (e.g., גל or תמר) to four letters (e.g., אלון). Six children (out of 152) had the longest names at five letters (e.g., אריאל). In their early writing, before understanding phoneme-grapheme relations, children tend to use a variety of letters. They assume that the same letter cannot appear multiple times in a word, especially not one after the other (Ferreiro & Teberosky, 1982). We think that young Hebrew-speaking children try to diversify, and so if their name is short, they use other letters as well.

Also, unlike in the Latin alphabet (Treiman & Kessler, 2011), in Hebrew the first letter of the name is not capitalized and thus does not draw more visual attention. We suggest that when the first letter is the same size as the rest of the letters in the child’s name, children tend to use all the letters from their name with similar frequency. This pattern in Hebrew is different from other languages in which the capitalized first letter is used more frequently as an unlicensed letter, probably due to its salience (Both-de Vries & Bus, 2008; Levin & Aram, 2005; Treiman & Kessler, 2004, 2013).

Frequent/infrequent letters and consonants/vowels

Like native speakers of languages such as English and Portuguese (Kessler et al., 2013; Pollo et al., 2009), Hebrew-speaking children used significantly more letters with high frequency in texts as unlicensed letters. This finding is consistent with the statistical learning approach, which states that young readers capture orthographic regularities based on the relative frequency of their occurrence in written texts (Chetail, 2015).

Children in our sample used more letters that represent consonants than letters that represent vowels. Indeed, while using more consonants, they maintained a similar ratio of consonants to vowels as observed in Hebrew texts (Levin et al., 2013). Our results support those of Pollo et al. (2005), who found that Portuguese-speaking children tended to use more vowel letters than English-speaking children did, which is consistent with the higher percentage of vowels in Portuguese compared with English. Hebrew orthography is primarily consonantal, with consonantal phonemes fully represented by letters and vowels represented in a deficient manner (Shatil et al., 2000). It is plausible that even when spelling with unlicensed letters, children are affected by this consonantal nature of the Hebrew language.

Visual similarity versus visual dissimilarity

Children’s unlicensed letters contained more letters that are visually similar to other letters than letters that are visually dissimilar. This finding exemplifies the role of statistical learning in detecting visual regularities (Turk-Browne et al., 2009). The exposure to print in literate societies enables preschoolers to learn implicitly about the visual characteristics of written words. These findings are consistent with studies in which children also focused on the visual similarity of letters (Bourke et al., 2014; Treiman & Kessler, 2011; Treiman et al., 2012, 2014). While naming visually similar letters is a difficult task for preschoolers (Bourke et al., 2014; Treiman, 2006), using letters that are visually similar to other letters as unlicensed letters in their writing is easier. Since children learn from an early age to identify letters as members of a set, visual similarity seems to be one way that they link letters together (Lavine, 1977). By forming sets of patterns, children can make slight changes and write different letters.

Girls were more likely than boys to use visually similar letters. Beyond this single result, we found no gender differences. The results regarding gender differences in early literacy are divided, with some studies showing differences favoring girls’ literacy achievements and others showing no differences (McTigue et al., 2020). Our findings support the studies that did not find differences between genders in terms of literacy skills like writing letters and words (Puranik et al., 2013; Ritchey, 2008), as well as in the quality of graphically forming the letters (Graham et al., 2001, 2006; Weintraub & Graham, 2000).

Complexity of letters’ graphical production

Children’s use of unlicensed letters contained significantly more letters that were graphically easy to form than letters that were moderately complex to form. However, contrary to our assumption, no difference was found between letters that are easy and difficult to form. It is possible that the division of letters based on the psychographic theory (Shatil, 1993) is insufficient to explain the choices children make when using unlicensed letters. Some letters such as א (A) or ב (B) are more difficult to produce graphically, yet these letters are very familiar to children due to their location as the first letters of the alphabet, and children use them frequently. This result is similar to studies that showed children favoring the first letters of the English ABC (Justice et al., 2006; Treiman et al., 2012). In addition, most Hebrew letters are composed of a vertical line and one or more attached horizontal lines (e.g., ר, ד, ך, ז). This means that the difference in graphical production is not great, which may be why it is not reflected in the use of unlicensed letters.

Finally, SES did not contribute to any of the unlicensed letters’ characteristics. Although previous studies have demonstrated the role of children’s SES in writing development (Elimelech et al., 2020; Aram et al., 2014), the current findings suggest that statistical learning is less related to SES measures and more to the specifics of the orthography and other personal variables. Another possible explanation is that the sample was not varied enough, as about 73% of the mothers held a university degree compared to the national average of 55% (Taub Center, 2019).

Limitations and future direction

This study is the first to provide a window into children’s statistical learning and patterns of use of unlicensed letters during spelling development in Hebrew. However, some limitations should be noted. Participants’ SES background was relatively homogeneous, the children wrote different words, and linguistic measures such as vocabulary and other early literacy measures (e.g., letter knowledge, phonological awareness) that might have also accounted for the characteristics of their writing were not taken into account. Also, we used categorical rather than continuous scales. Some information was possibly lost in treating variables in such a categorical way.

Future studies should include a more heterogeneous sample and a smaller pool of target words while maintaining a clear distinction between consonant-only words versus consonant and vowel words. The measures should be continuous, and beyond writing measures, it may be useful to include linguistic measures to learn further about the contributions of these characteristics to the nature of children's use of unlicensed letters.

In conclusion, this study supports the role of statistical learning in writing development in Hebrew, an abjad writing system. It exemplifies the non-random way in which unlicensed letters are used, which corresponds with both universal features (e.g., frequent letters in the orthography) and language-specific features of Hebrew orthography (e.g., consonant vs. vowel letters, letters with visual similarity vs. visual dissimilarity).

Finally, our results could alert policymakers in the field of early literacy to the role of statistical learning and orthography in children’s literacy development in Hebrew. Sometimes, literacy teaching methods are based on literacy studies in English and are less adapted to the specifics of different orthographies (e.g., Lipka et al., 2016). Children need adults’ direction and support to learn to write and read (Ehri et al., 2001). Adults (parents and teachers) can use the fact that children use characteristics of unlicensed letters in their spelling to initiate conversations to support children’s literacy development. Adults who are engaged in writing activities with young children should be aware of children’s statistical learning and the nature of orthography-specific characteristics. Adults asking questions regarding letters when writing with children may promote children’s understanding of the writing system.