Introduction

Lexical errors are the most numerous among EFL/ESL learner errors (Agustín-Llach, 2011) and, at the same time, the most complex (Agustín-Llach, 2017). Agustín-Llach (2011, 2017) emphasized that lexical errors are not random, but follow certain patterns and can be attributed to systematically recurring causes. Errors can be viewed either negatively or positively. Behaviourist approaches regard errors as negative verbal behaviour to be avoided lest it be learned, while cognitivist and other contemporary approaches welcome errors as evidence of learning (Richards & Rodgers, 2014). To a needs analyst, errors present a wealth of information about the learners and the level of their target language (TL) knowledge, as they pinpoint the vocabulary that might be either too difficult for the learner or not adequately taught or learned (Thornbury, 2002).

The first language of the learner might play a part in error formation (Dodigovic, Ma, & Jing, 2017). Hence it is vital to analyse learner errors in light of their first language (L1). Only a handful of such studies have been conducted in the Armenian English as a Foreign Language (EFL) context. Those mostly concentrate on negative transfer from L1 (Aleksanyan, 2010; Levonyan, 2015; Yerznkyan & Chalabyan, 2015). However, none of them specifically targets lexical errors. Therefore, this study seeks to remedy the situation by conducting an analysis of lexical errors in fairly advanced Armenian learners of English.

It is worth mentioning that Corder (1967) separates mistakes from errors because he finds mistakes of “no significance to the process of language learning”. In contrast, errors “provide evidence of the system of the language” that the L2 learner is using. James (1998) suggests another criterion for the distinction between mistakes and errors, which is self-correction. Here, mistakes can be corrected by the learner, if he or she is made aware of them, whereas errors cannot be self-corrected. The present study endorses James’ (1998) criterion of distinction and focuses on genuine errors only by compiling a corpus of academic writing which has already undergone the process of editing, most commonly based on self-correction. Thus, whatever erroneous language remains can be seen as genuine error, rather than mere mistake.

Background

Any inventory of errors would be useless if not grouped around some kind of taxonomy. According to Agustín-Llach (2011), a taxonomy helps describe the data coherently and analyse it systematically. In addition to bringing a sense of order and the potential to quantify the findings, error taxonomies, if adequately postulated, can expose regularities in learner language and learning processes (Chan, 2010). The present study is no exception. Hence this section examines the most common premises on which such taxonomies are built.

Form and meaning have always been considered important aspects of words. Thus, James (1998) classified lexical errors into two main categories, formal and semantic. This classification was later adapted by Hemchua and Schmitt (2006), who devised 24 subcategories of the lexical errors found in their study under these two main categories; in their data, semantic errors were twice as numerous as formal ones. According to this source, the three broad subcategories of formal errors are formal misselection, misformations and distortions, while the subcategories of semantic errors are confusion of sense relations, collocation errors, connotation errors and stylistic errors.

The learners’ first language is another criterion for an error taxonomy (Kaweera, 2013). While Lado (1957) built his contrastive analysis on the idea that L1 has the potential to negatively impact L2, Selinker (1972) considered the concept of interlanguage, i.e. the patterns of learner language. In research literature, interlanguage is often associated with the learners’ L1. Thus, Yip (1995) devotes an entire volume to the description of the Chinese-English interlanguage. Interlanguages are sometimes compared with each other. An example of this is a study of lexical errors by Meara and English (1986), which aimed to establish the effectiveness of English dictionaries both as an error correction tool for beginner-level EFL learners and as a tool for lexicographers to develop more effective materials. The study suggests that there are systematic differences between the errors made by students with various L1 backgrounds. For example, the findings revealed that the proportion of totally wrong words was very high among Chinese and Indonesian learners, whereas the same learners showed a very low proportion of semantically related errors, compared to all other first languages in the sample. In other words, the proportion of error types varies significantly with the learner’s L1.

Echoing the framework of error analysis (Corder, 1967), errors can also be classified according to their source, which is found either in L1 or within the learner’s L2 itself. This taxonomy mainly differentiates between interlingual errors, caused by L1 interference or transfer (Corder, 1967), and intralingual or developmental errors, caused by faulty generalization from L2 rules, their inadequate use or incorrect hypotheses about how L2 works (Richards, 1971; Chan, 2010). Building on this underpinning idea, Carrió-Pastor and Mestre-Mestre (2014) subdivided lexical errors into interlingual, intralingual and conceptual. Among the first category are calques, adaptations and unnecessary borrowings. The second category comprises erroneous collocation, coinage, omission of parts of words, misformation and misordering of words, while the final category consists of confusion about the meaning and form of a word, as well as the use of a general word or a near synonym. It is interesting that these authors classify collocation errors under the category of intralingual errors, whereas Dodigovic et al. (2017) identify transfer as a source of collocation errors in Chinese student writing, which would place collocation errors in the interlingual category. Furthermore, Dodigovic et al. (2017) identify transfer by word polysemy as the cause of some of the incorrect choices of near synonyms, which also differs from Carrió-Pastor and Mestre-Mestre’s (2014) taxonomy.

Agustín-Llach (2011) proposes an error taxonomy which comprises misspellings, borrowings, coinages, calques, misselection and semantic confusion, and which was amended in 2017 to include borrowings, lexical adaptation, semantic confusion, wrong cognates, spelling problems and construction errors. On the surface, both versions look very similar to the taxonomy of Carrió-Pastor and Mestre-Mestre (2014), except for one major difference, which is the absence of the underlying division of all lexical errors into interlingual, intralingual and conceptual. Perhaps one of the reasons for this difference is Carrió-Pastor and Mestre-Mestre’s (2014) decision to take the conceptual errors outside the scope of both interlingual and intralingual aspects of lexical errors, which has the potential to become a stumbling block in some linguistic contexts, such as the Chinese one (Dodigovic et al., 2017).

Language transfer in itself constitutes a way to classify lexical errors, including both its positive and negative aspects (Dodigovic et al., 2017; Agustín-Llach, this volume). Wang (2011), who investigated Chinese L1 transfer in the acquisition process of light verbs (such as do, make, give, take or have) + noun collocations among 150 intermediate level non-English college students, found that 61.84% of such collocations are due to either negative or positive transfer from Chinese.

Relevant research (Hemchua & Schmitt, 2006; Zhou, 2010; Xia, 2013) suggests that lexical errors occur not only at the single word level, but also at the collocation and multi-word unit (MWU) levels (Gray & Biber, 2013). Among those are lexical transfer errors, which have been identified at every level (Yang, Ma, & Cao, 2013; Li, 2005; Yamashita & Jiang, 2010), although not necessarily all within the same study. Dodigovic et al. (2017) base their study of Chinese learner errors on these three lexical levels. On the understanding that most single item transfer-related errors are based on the polysemy of L1 words, they differentiate between polysemy, collocation and MWU errors. They find that half (50%) of Chinese lexical errors in English are caused by the polysemy of L1 words, leading to the choice of an inadequate translation equivalent in English. Agustín-Llach (this volume) provides more information on lexical transfer.

Nativelike expression, or the depth of lexical knowledge, is also known to have been used as a criterion. In this respect, adequate use of collocations is often regarded as an attribute of nativelike language command (Nation, 2001; McGarrell & Nguien, 2017). A study by Yamashita and Jiang (2010) therefore focuses on this aspect of learner language. To investigate the influence of L1 on the acquisition of L2 collocations, Yamashita and Jiang (2010) examined the accuracy and speed with which speakers of English as L1 and Japanese L1 EFL and ESL learners processed congruent and incongruent collocations. In congruent collocations, the lexical components are similar in L1 and L2, while in incongruent collocations, the lexical components differ between L1 and L2. The results revealed a significant difference between the two learner groups: Japanese EFL learners needed more time and made more errors when choosing the incongruent collocations, while Japanese ESL learners needed less time to respond to both types of collocation, but again had more difficulty with the incongruent ones.

Finally, aspects of word knowledge (Nation, 2006; Dodigovic, 2005) can be used to build error taxonomies. Agustín-Llach (2017) acknowledges this when allowing for construction error, which refers to the way a word impacts the choice of phrase or clause construction, and is one of the important aspects of word knowledge (Nation, 2006). In a similar vein, Dodigovic, Li, Chen, and Guo (2014) suggested classifying lexical errors under six criteria related to what can be known about a word, such as its meaning, form, function and spelling. Their taxonomy, applied to the academic vocabulary used in the writing of Chinese EFL university students, comprises Context, Collocation, Word Form, Structure, Part of Speech and Spelling. Context here relates to meaning. The other categories appear to be self-explanatory.

The above taxonomy seems to have the potential of being exceedingly useful to language teachers, as it draws their attention to the aspects of word knowledge most commonly missed by learners. This in turn can lead to the adjustment of curricula and teaching foci, bringing about possible advancement in vocabulary learning. Especially in Armenia, a small Eurasian country plagued by budgetary concerns, understanding the most frequently erroneous aspects of word knowledge, particularly when produced by advanced learners, can to some extent help regroup the existing resources in language education. Therefore, one of the aims of this study is to identify the aspects of word knowledge most commonly found erroneous in the academic writing of Armenian tertiary students. Another aim is to offer practical suggestions for pedagogical action toward the prevention and remediation of such errors.

Accordingly, the study attempts to answer the following research questions:

  1. What are the most frequent lexical errors in the academic writing of Armenian EFL students?

  2. What are the causes of these errors?

  3. Which English words are most prone to errors?

Methodology

The present study is descriptive in nature, seeking to explore the lexical errors in the writing of Armenian EFL students and to establish their possible causes. It is mostly based on qualitative analysis, which was subsequently quantified by tallying the occurrences of every error type.

Data Collection

In this study, essays written by 39 freshman-year students in the English Communication (EC) department of the American University of Armenia, all with Armenian as L1, were collected as the source for a learner corpus. The corpus comprises 28,602 tokens. The essays were written in response to one of the students’ first assignments, and the topics ranged from social media and sexual harassment to the educational system in Armenia and self-reflection on essay writing techniques. The instructors of the course had informed the students that their papers might be used as an empirical data source for a research study, and the students gave their verbal consent.

Data Analysis

The data analysis followed three steps: identifying the errors, describing and classifying them according to a taxonomy (Dodigovic et al., 2014), and determining their likely source (interlingual or intralingual).

The essays were checked manually, sentence by sentence, and all possible lexical errors were extracted by the researcher. LEXTUTOR (Compleat Lexical Tutor. Retrieved from https://lextutor.ca/) was used as a point of reference for double checking the collocations and as a tool to enhance the overall analysis. Several online dictionaries, including the Online Collocation Dictionary and the Cambridge Learner’s Dictionary, were used to check the meanings, synonyms and collocations of words.

After proofreading by the researcher, the findings were verified and approved by an experienced researcher and a native speaker. To group errors of the same pattern, the researcher initially coded them as wrong word meaning, wrong collocation, wrong word form, synonym confusion, etc. Here is the list of error types with brief explanations:

  • Context—indicates wrong word choice, including wrong meaning, synonym confusion, opposite meaning

  • Collocation—indicates wrong collocations, including fixed phrases and lexical chunks

  • Word Form—indicates wrong word form, including wrong form of plural/singular, comparative forms of the adjective

  • Structure—indicates words or phrases that require certain structure, including erroneous usage of prepositions with certain words

  • Part of Speech (PS)—indicates the use of one part of speech instead of another.

  • Spelling—indicates misspelled words

Errors were classified according to the taxonomy presented by Dodigovic et al. (2014). Once an error list was generated, each error was described in terms of L1 or L2 influence, depending on whether it was deemed to be interlingual or intralingual. Errors in each category were then tallied and their percentages calculated accordingly. Moreover, the percentages of interlingual (L1) and intralingual (L2) errors were calculated for each of the categories.
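The tallying step described above can be illustrated with a minimal sketch. It assumes that each coded error is stored as a (category, source) pair; the sample records and the function name `tally` are hypothetical and serve illustration only, and they are neither data nor tooling from the present study.

```python
from collections import Counter

# Hypothetical coded errors as (category, source) pairs.
# "L1" marks an interlingual error, "L2" an intralingual one.
# These records are invented for illustration, not taken from the corpus.
coded_errors = [
    ("Context", "L1"),
    ("Context", "L1"),
    ("Context", "L2"),
    ("Collocation", "L1"),
    ("Word Form", "L2"),
    ("Spelling", "L2"),
    ("Structure", "L1"),
    ("Part of Speech", "L2"),
]


def tally(errors):
    """Count errors per category and compute percentage shares.

    Returns a dict mapping each category to its count, its share of
    all errors, and the L1/L2 split within that category.
    """
    total = len(errors)
    by_category = Counter(category for category, _ in errors)
    by_pair = Counter(errors)
    summary = {}
    for category, count in by_category.items():
        l1 = by_pair[(category, "L1")]
        l2 = by_pair[(category, "L2")]
        summary[category] = {
            "count": count,
            "pct_of_total": round(100 * count / total, 2),
            "pct_L1": round(100 * l1 / count, 2),
            "pct_L2": round(100 * l2 / count, 2),
        }
    return summary


summary = tally(coded_errors)
# Print categories from most to least frequent.
for category, row in sorted(summary.items(), key=lambda kv: -kv[1]["count"]):
    print(category, row)
```

Keeping the source label next to the category in one record means that the category percentages and the interlingual/intralingual split are derived from the same list, so the two sets of totals cannot drift apart.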

Results

As indicated in Table 8.1, the corpus comprised 28,065 tokens, in which a total of 279 lexical errors were detected. Errors falling under the category Context, including wrong word choice, wrong meaning of a word and synonym confusion, have the highest frequency (110 total, 39.56%). Part of Speech, on the other hand, presents the lowest percentage (7.19%). The next largest portion of errors belongs to the category Word Form (19.06%), followed by Collocation (12.58%) and Structure (11.15%). Spelling errors (10.43%) were only slightly fewer than Structure errors, partly because of the unnecessary capitalization of words such as government, globalization, sophomore and freshman, which are repeatedly capitalized in the corpus.

Table 8.1 Descriptive statistics of lexical errors according to their categories

An example for each error type is given in Table 8.2.

Table 8.2 Taxonomy of lexical errors

In addition, the current study aimed to identify the words that are most prone to errors, in order to understand the nature of those errors and the source which triggered them. Among the nine most frequently misused words, five occur as verbs (make, put, protest, connect and distribute) and five as nouns (network, protest, addict, connect and need).

The most frequent aspects of word knowledge that proved erroneous in this study were Context and Collocation, while the least frequent ones were Structure and Spelling.

The bar chart in Fig. 8.1 shows the 9 words that are most prone to errors according to the number of their erroneous appearances in the learner corpus. The result shows that the verb make had the highest number (9) of erroneous uses among the verbs. Next, with five errors each, come the verb put and the content words network and protest. It is interesting that the latter is often incorrect, whether used as a verb or as a noun. Moreover, all of the above are among the 3000 most frequent words of the English language.

Fig. 8.1 Frequency of words most prone to errors

Table 8.3 gives a detailed summary of the findings from the learner corpus on the words most prone to errors, showing the source of error, the category in which each error appeared and the number of occurrences by source and by category. The results show that interlingual errors, labelled as L1 negative transfer, are dominant in six out of nine words. Furthermore, the highest number of erroneous appearances for these words was detected in Context and Collocation (14 and 10, respectively), whereas the lowest counts are in the categories of Structure and Spelling (two errors in each).

Table 8.3 Findings on words most prone to errors

Table 8.4 depicts the results of causes for each category of errors by way of their distribution between Interlingual (L1) and Intralingual (L2) errors. In all criteria L2 errors exceed the number of L1 errors, except for Spelling. The highest number of Interlingual errors is found in the category of Context (84), where the Armenian EFL writers often literally translate L1 expressions into L2.

Table 8.4 Number of lexical errors according to the source of their cause

Discussion

It is ironic that structure and spelling, which in this study emerge as the two least frequent lexical error categories, are the notions most frequently focused on in the Armenian ESL classroom (Ohanyan, 2018). According to James (1998), such errors belong to the formal category. In contrast, the highest number of errors is in the categories of context and collocation, which according to James (1998) as well as Hemchua and Schmitt (2006) count as semantic errors. This would suggest that aspects of word knowledge are a viable way of differentiating between errors (Dodigovic et al., 2014). Some deliberations on the errors in the top two categories are presented below.

In the current learner corpus, wrong word meaning, wrong word choice, confusion of sense relation and synonym confusion were considered under the umbrella of Context. As in Hemchua and Schmitt (2006), these semantic errors outnumber those in all of the other categories (111 errors in total), out of which 83 were due to negative transfer from L1 and 27 to L2. Here is an example:

  • The matter with false used prepositions also comes from the dissimilarity of Armenian and English languages.

First, one possible explanation is the influence of Armenian with respect to polysemous L1 words, an analogy to which is found among Chinese EFL writers (Dodigovic et al., 2017; Wang, 2011), or the direct translation of Armenian words into English, analogous to what is found among Thai EFL writers (Kaweera, 2013). For example, the verb provide is a polysemous word in Armenian used to indicate different meanings in different contexts (provide education: krtoutyun tal; provide opportunities: hnaravoroutyoun tramadrel), which is not the case in English. Unfortunately, the use of the Grammar-Translation method in the Armenian English classroom (Ohanyan, 2018) provides fertile ground for the influence of L1 on TL.

Second, the students whose essays make up the corpus are studying at an English medium university and are required to compose well-written texts. Their writing instructors often urge them to consider the choice of synonyms in order to add lexical variety to their composition. This might lead to the choice of what seems to be a more sophisticated word, which would make this an intralingual error. This differs from the interpretation of Carrió-Pastor and Mestre-Mestre (2014), who would classify this as a conceptual error. Here is an example:

  • Children grow up hand in hand with the abrogating* effects of social networks. (harmful)

Third, there are cases where a wrong choice of preposition distorted the meaning of a word and was counted as a context error rather than a structural one. In the majority of cases, errors of this kind were caused by negative L1 transfer, as in the following:

  • I have made a checklist of several points, which I must always have under hand, when I write an essay (at hand).

Finally, context errors could be a result of wrong word choices, because of simply not knowing the word in L2, for example:

  • While others possess that it contributes getting an addiction and enhance the chances to restless nights (insist).

The stage of error classification revealed that collocation errors were the hardest to isolate, because many of them could equally well be counted as context errors. Their relatively high frequency (12.58% of total errors) is a good indicator of the frequent use of collocations. The frequency of collocation errors found in the current corpus is almost the same as the one (12.12%) reported by Shalaby (2009), but much lower than the frequency of collocation errors (26.05%) indicated by Hemchua and Schmitt (2006). This might well be due to the differing methodologies in the two studies. Whereas the present study used edited student papers, written over a period of time in a setting in which various aids were available, in the Hemchua and Schmitt (2006) study the participants wrote their papers under controlled conditions, with no dictionaries available. The different L1s of the participants in the two studies, Thai and Armenian, might also have been responsible for this difference, if examined in light of Meara and English (1986).

There are several examples of wrong collocations with the words information, time and knowledge that are a direct translation from L1 (Kaweera, 2013), which indicates the limited lexical competence of the students. Also, it seems that learners are inclined to overuse those collocations that they feel safe with (Chan, 2010). For instance, students have used “right consideration” instead of “careful consideration”.

Collocation and context-related errors might also explain the fact that the most frequently misused words, such as make or put, are found among the 3000 most frequent words, which the participants would have encountered in the early stages of learning English. The fact that they use these words productively suggests some knowledge of them, but without adequate depth (Nation, 2006). Most likely, they based their perceptions of these words on what they knew about their L1 translation equivalents (Dodigovic et al., 2017).

Thus, it seems that collocation errors are predominantly interlingual in nature, which corresponds to the findings by Yamashita and Jiang (2010), although there are intralingual reasons as well, which to some extent conforms to the deliberations of Carrió-Pastor and Mestre-Mestre (2014). In any case, the results suggest that the learners need more time and a vast amount of exposure to authentic texts in order to make collocations a part of their lexical repertoire (McGarrell & Nguien, 2017). Word meanings should equally be studied in context (Nation, 2006), rather than from lists in isolation, such as might sometimes be the case in Armenia (Ohanyan, 2018).

Regarding the most frequently misused words, the top two are make and put, both belonging to the category of the so-called light verbs, which according to Wang (2011) are frequently misused in collocations by Chinese L2 learners. In fact, Wang (2011) found that a large majority of the learners’ uses of English light verb + noun collocations could be traced to either positive or negative transfer from L1. Similarly, most erroneous collocations in this study are interlingually caused and contain a light verb, with a noun being the second most frequently misused part of speech. Overall, the trends in the most frequent error types as well as the most frequently misused words seem to echo those found in previous research. They also suggest that the issue at hand is the depth of vocabulary knowledge.

Conclusions

The results of this study revealed that among the six categories of errors, Context errors are the most common, and among those errors most are wrong word choice, synonym confusions or literal translations. Thus, the number of context or semantic errors is twice the number of word form errors, followed by word structure, which indicates that there may be lexicogrammatical errors due to the lack of adequate input, extensive output or constructive feedback.

Also, it became evident that there were considerably more interlingual (176) than intralingual errors (103), which means that L1 is one of the main causes of lexical errors in the written production of the Armenian learners of English. However, the results also showed that there is virtually no difference between the two sources of errors with regard to Word Form (25 L1 vs 27 L2 errors).

In general, it can be concluded that high frequency words are most prone to errors, which suggests that depth of knowledge has suffered somewhere along the vocabulary acquisition path.

Pedagogical Implications

The above conclusions are telling. In line with Ohanyan’s (2018) study, they seem to suggest that there are deficiencies in the way vocabulary is taught that could and should be rectified. One of the main issues might be excessive focus on word form in the EFL classroom, at the expense of the much needed focus on meaning (Nation, 1990). The fact that Armenian as L1 seems to be responsible for the majority of lexical errors is well in line with Ohanyan’s (2018) finding that the Grammar-Translation Method is the main approach to EFL teaching in Armenian public schools. It could be argued that indiscriminately using translation in the learning process might lead to the habit of basing all production on an L1 model (Dodigovic et al., 2017).

For this reason, the learners should be made aware of the fact that there is no exact one-to-one L2 equivalent for each L1 word. Thus, the use of bilingual dictionaries should gradually decrease (Schmitt, 2008), especially with students of higher L2 proficiency. Likewise, over-reliance on translation may hinder EFL learners from developing an independent L2 lexicon, because the learners will try to access the word through its L1 equivalent rather than directly (Thornbury, 2002). That is why the use of monolingual learners’ dictionaries of English should be encouraged, especially those which are reliable and model the use of words in authentic sentences. It is also very important to encourage students to use collocation dictionaries and concordances, particularly those that can be generated using tools such as the Compleat Lexical Tutor (lextutor.ca), with its helpful analytical tools and a wide range of authentic corpora.

In addition, words should be studied in context (Nation, 2006). Decontextualized memorization of words from word lists and drilling can be a useful part of the learning process, which nonetheless relies on a limited range of learning strategies (Schmitt, 2000). In contrast, the learner’s active involvement in word processing is required, since the higher the learners’ involvement in accessing a word, the more memorable it becomes (Thornbury, 2002). A similar view is held by researchers such as Ferris (1999), Ghandi and Maghsoudi (2014), Kurzer (2018) and Sheen (2007), who support the effectiveness of indirect feedback over direct feedback. One of the major arguments for this is the deeper level of the learner’s involvement in the process of self-editing or task revision, which in turn results in better writing performance.

Moreover, both size and depth of vocabulary play an important role in language proficiency. The present study has indicated that the size of the participants’ vocabulary might be greater than its depth. Thus teaching should shift from size to depth, by reinforcing “situational presentation” (Thornbury, 2002, p. 81), including contextualized learning based on learners’ own experiences, as well as repeating chunks and collocations in different contexts, so that the learner gains competence in using the words in a range of contexts (Thornbury, 2002). The activities that can be used to this end are information-transfer and information-gap activities, such as turning diagrammatic representations (graphs, plans and maps) into text. Synthesizing and summarizing information from different sources can also be effective in learning vocabulary (Nation, 2006).

Furthermore, paying more attention to teaching the word form and spelling explicitly can be more effective not only for that specific lexical item, but for learning additional vocabulary items, such as polysemous meaning senses (Dodigovic et al., 2017). As the sound, stress and overall syllable structure of the word determine the way it is stored in the learner’s mental lexicon, it is important to highlight the word’s shape and stress in its spoken form (Thornbury, 2002), using techniques such as listening drills or chorus mumble drills and phonemic script. Hopefully, the teachers can be empowered to follow through with the above recommendations.