1 Introduction

Argumentation skill is defined as “a verbal and social activity of reason aimed at increasing (or decreasing) the acceptability of a controversial standpoint for the listener or reader by putting forward a constellation of propositions intended to justify (or refute) the standpoint before a rational judge” (van Eemeren et al. 1996, p. 5). The need to develop argumentation skills through education has been increasingly recognized as essential in academic contexts (Németh and Kormos 2001; Rapanta et al. 2013; Toulmin 1958, 2003; Wolfe 2011). The last two decades have witnessed a growing interest in argumentative reading and writing at university level (Helms-Park and Stapleton 2003; Newell et al. 2011; Varghese and Abraham 1998). Readers’ ability to identify, express, and evaluate the underlying structure of an argument (such as claims, data, and warrants) and writers’ ability to analyze, compose, and judge an academically sound argument are both key components of academic success at undergraduate and graduate levels (Muller Mirza and Perret-Clermont 2009; Newell et al. 2011).

The importance of argumentative writing is further underscored by the recent surge of English as a Foreign Language (EFL) learners, including Iranian EFL learners, applying for graduate studies in English-medium universities. Their English argumentative writing abilities are often tested through internationally recognized tests such as TOEFL, IELTS, and GRE (Educational Testing Service 2009). In the same vein, graduate student writers are expected to produce research papers in which they critically engage with the literature and evaluate the status quo on their topic of interest. Poor argumentative skills have been attributed to learners’ unpreparedness, unawareness, and limited skills. Unfamiliarity with the typical structure of English argumentative writing may result in inadequate and poorly reasoned papers in English (Lunsford 2002; Varghese and Abraham 1998; Wingate 2012).

The work of the British philosopher Toulmin (2003, p. vii) in describing the structure of a basic argument has been of central importance in English as a first language contexts (Ong and Zhang 2010), though it has received little attention in L2 settings. His interpretative framework has been widely used in accounting for “the various elements marking the progress of an argument in English argumentative writing” (Qin and Karabacak 2010, p. 455) and in teaching the elements of an argument as well as measuring learners’ knowledge of argumentation (Chambliss 1995). Toulmin’s (2003, p. 90) ‘model of argumentation’ consists of six linguistically and semantically interrelated components: claim (a debatable assertion), data (the evidence to support the claim), warrant (assumptions, beliefs and principles of the author), qualifier (the degree of reliance on conclusions arising from arguments), backing (for strengthening the warrant), and rebuttal (the circumstances under which the claim would not be true).

In recent years, the Toulminian model of argumentation and its variations have motivated a series of studies that fall into two strands, one pertaining to the assessment of the structural components (Nussbaum and Schraw 2007; Qin and Karabacak 2010) and the other specifically focusing on the soundness of the produced arguments (Stapleton and Wu 2015; Wolfe et al. 2009). The former involves description and analysis of the surface structure of arguments. Within this perspective, the strength of an argument is based on the presence or absence of specific combinations of Toulmin components. For example, written scripts employing more elements like counterarguments and rebuttals are assumed to be of better overall quality and more convincing than those with fewer of these elements (Qin and Karabacak 2010). More specifically, the quality of an argument was rated on the basis of presence or absence of possible opposing views, overall language use, overall argument effectiveness, and overall structure. Examining these features, though helpful, may not be particularly relevant for assessing argument soundness as it hinges too much on structural elements of argumentation at the expense of quality of logic and evidence (Crammond 1998; Nussbaum and Kardash 2005; Simon 2008). In other words, this holistic scoring attends more to the strengths of writing than to its deficiencies (Weigle 2002; White 1985).

The soundness of written arguments has been less acknowledged in the literature (Clark and Sampson 2007; Qin and Karabacak 2010; Rusfandi 2015; Simon 2008; Stapleton and Wu 2015) and thus warrants further investigation. The above literature underscores the need to investigate argumentation elements across various learning and cultural situations, and the extent to which the use of these elements is recognized in EFL students’ writings in their educational effort to learn-to-argue. More specifically, there is a paucity of empirical research into the quality of reasoning in learners’ written arguments (Kuhn and Reiser 2005; Sampson and Clark 2008; Stapleton and Wu 2015; Zeidler 1997). As Stapleton and Wu (2015) argue, “the surface structure, or shell of the argument, may appear appropriate or even exemplary, but the actual substance could still be exceedingly weak” (p. 12). Accordingly, a strong incentive for this study is to examine the relationship between the structure of the arguments (in terms of Toulmin’s argumentative elements) produced by learners on the one hand, and the soundness of their arguments on the other (Qin and Karabacak 2010; Rapanta et al. 2013). The findings from this study can provide context-specific implications for the impact of university education on the development of logic and reasoning (as one of the main competencies in higher education) among advanced EFL learners of English. The findings can also provide input on how to help L2 graduates develop sound arguments. It is hoped that the findings provide evidence on EFL learners’ critical thinking practices and how they integrate structure and meaning in their reasoning.

1.1 Research Questions

In this project, we initially describe the rhetorical organization of the argumentative essays (Research Question 1) produced by Iranian graduate EFL learners in terms of how they position their arguments. Based on the location of writers’ point of view, argumentative papers can be categorized in four ways (Kubota 1998). They may be deductive (when a point of view is presented at the beginning of a paper), inductive (when a point of view is presented at the end of a paper), both (when both deductive and inductive points of view are presented), and off (when a writer fails to take a clear position in a paper).
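
The four-way categorization described above can be written as a small decision rule. The following is a minimal sketch, not the coding procedure used in this study; the function name and the two boolean inputs (whether a rater located a clear point of view at the beginning or at the end of a paper) are assumptions introduced for illustration.

```python
def organization_type(position_at_start: bool, position_at_end: bool) -> str:
    """Map where a writer's point of view appears to one of the four types.

    Both arguments are hypothetical rater judgments, not fields from the
    study's data set.
    """
    if position_at_start and position_at_end:
        return "both"
    if position_at_start:
        return "deductive"
    if position_at_end:
        return "inductive"
    return "off"  # no clear position anywhere in the paper
```

A paper whose thesis appears only in the closing paragraph would be passed as `organization_type(False, True)` and labeled inductive.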

Chen (2001a, 2008) reported that Chinese EFL students used inductive patterns in discourse when writing English papers. However, Hirose (2003), as well as Kobayashi and Rinnert (2008), showed that Japanese EFL students used deductive patterns in discourse when writing English argumentative papers. Further, Rashidi and Dastkhezr (2009) found that Iranian intermediate EFL students tended to present the main ideas at the beginning of their papers, as did the undergraduate Turkish EFL learners in Uysal (2008). Overall, except for Chinese learners, these studies provide some evidence for a dominantly deductive pattern in the argumentative discourse of Asian EFL students.

Second, drawing on Toulmin’s model of argumentation, we attempt to analyze the typical structure of these essays (Question 2 below). The results can help us identify the elements of the developed arguments (e.g., claims, grounds, and warrants), and thus statistically analyze the relationship between the use of these components and the overall quality of the arguments. Finally, we analyze the highly structured essays (in terms of overall quality) for the degree of soundness (i.e. acceptability and relevance) of their arguments. More specifically, the following questions will be investigated:

  1. What is the overall organizational structure of English argumentative essays written by Iranian graduate EFL learners?

  2. Is there any statistically significant relationship between the use of argumentation elements as evidenced by the revised Toulmin model and the overall quality of English argumentative essays?

  3. To what extent are well-structured arguments qualitatively sound in their reasoning?

2 Review of the Related Literature

2.1 Toulminian-Inspired Studies on Argumentative Writing

Second language argumentative writing research addresses a wide range of issues such as learners’ perceptions of argumentative writing and the associated instruction they have received (Wingate 2012); comparison of organizational structures of argumentative writing papers across different languages (Kubota 1998; Uysal 2008); the effect of task complexity on students’ argumentative writing performance (Ong and Zhang 2010); the relationship between learners’ academic achievement and their performance in argumentative writing skills (Preiss et al. 2013; Stapleton 2001); instructional components designed to enhance argumentative writing quality (Varghese and Abraham 1998); and the investigation of how different kinds of arguments are situated in academic contexts (Wolfe 2011).

In the last few decades, an extensive body of research has accumulated in the field of argumentative writing inspired by the works of Toulmin (1958, 2003, p. vii). One set of studies sought to use Toulmin’s model as an instructional and methodological instrument to teach argumentative writing both in L1 and L2 contexts (Bacha 2010; Butler and Britt 2011; Lunsford 2002; Varghese and Abraham 1998; Wingate 2012). In these studies, the effectiveness of an explicit instructional approach, the type and quality of instruction learners received, and creating argumentative writing tutorial environments in classrooms have been explored.

A second set of studies has particularly focused on the application of Toulmin’s theoretical framework in analyzing argumentative writing (e.g., Németh and Kormos 2001; Nussbaum and Kardash 2005; Wolfe 2011). In these studies, mostly using small sample sizes, the frequency of use of Toulmin’s elements across various levels of expertise, the contribution of those elements to the overall quality of argumentative writing, and the influence of goal specification on the use of elements of argument structures have been investigated. It is noteworthy, however, that these studies have been mostly conducted in L1 contexts and very few in L2 contexts (e.g., Qin and Karabacak 2010).

A third set of studies has focused on the soundness of arguments in terms of acceptability, relevance, and adequacy. They have proposed and used some criteria for measuring the soundness of arguments (Hughes and Lavery 2008; Means and Voss 1996). These criteria comprise acceptability (an assertion which is logical to accept as true), relevance (a reason which supports a conclusion), and adequacy (all premises provide enough support to explain a belief in the conclusion). These studies have provided an account of the nature and quality of student-generated written arguments, different schemes and frameworks for evaluating the quality of reasoning, and some deficiencies in operationalizing the quality of reasoning (Kelly and Takao 2002; Lawson 2003; Sandoval 2003; Schwarz et al. 2003; Zohar and Nemet 2002). This study draws on the last two strands of research in argumentation, i.e. examining the structural components of argumentation, as well as the quality and/or soundness of the produced arguments in graduate EFL students’ writings.

2.2 The Overall Quality of Argumentative Papers

Very few studies have investigated proficient EFL learners’ argumentative writing structure in terms of how a position in an argument is put forward, what types of reasons are specified to support the position, and whether any opposing point of view is offered and refuted (Qin and Karabacak 2010). Answering these questions will inform “… designing of instructional materials and planning of classroom activities for L2 argumentative writing instruction” (Qin and Karabacak 2010, p. 455). Hence, the second question of this study addresses the structural elements of the developed arguments in view of the revised Toulmin model by examining advanced EFL writers’ essays in terms of using argument elements. Then, it examines the relationship between the use of these elements and the overall quality of the developed arguments (see 4.2 for more details).

2.3 Soundness of Arguments

Toulminian-inspired studies have been criticized for overemphasizing “structural elements of argumentation at the expense of quality of logic and evidence” (Stapleton and Wu 2015, p. 13). Several frameworks have been developed to deal with the issue of quality of reasoning in terms of logic and evidence (i.e., argument soundness). For example, Schwarz et al. (2003) assessed quality of arguments in terms of the overall number of reasons, argument types, the acceptability of an argument based on the logical structure, number of reasons supporting counterarguments, and types of reasons.

Other studies focused on domain-specific criteria for evaluating the quality of arguments. For example, Zohar and Nemet (2002) noted that good arguments ‘include true, reliable, and multiple justifications’ (p. 40). In this way, arguers are likely to generate a simple argument by constructing a claim with at least a single relevant justification. Their framework, however, did not address content issues such as the adequacy, usefulness, and accuracy of a claim. Similarly, Takao and Kelly (2003) devised an analytic scheme to analyze argument quality based on the relative epistemic status of the propositions. Initially, a researcher needs to identify the propositions in an argument and then categorize them based on epistemic level. Sandoval and Millwood (2005) measured students’ arguments in terms of field-dependent criteria. They focused specifically on the conceptual and epistemological quality of students’ arguments. Although this framework provides ‘the highest mechanical specificity in terms of content quality’, it offers less explicit focus on the structure of arguments (Sampson and Clark 2008). Erduran et al. (2004) developed an analytical framework for assessing the quality of arguments in terms of argument complexity level, operationalized in terms of the presence and nature of rebuttals.

Surveying the above literature on the overall quality of written arguments reveals several gaps in research on argumentation in L2 writing. First, considerable attention has been given to ‘the shell of the argument’ (see Qin and Karabacak 2010), and thus more rigorously designed empirical studies to verify the quality of student-generated arguments are needed. Stapleton and Wu (2015) maintain that “both surface structure and substance need to be considered when assessing the overall quality of an argument essay” (p. 14). They discovered patterns of inadequacies in the reasoning and substance of well-structured essays. Further, Sampson and Clark (2008) emphasize the need to examine the connection between structural components and quality features such as relevance, sufficiency, and accuracy of their content. Hence, the third question of this study explores the soundness of the well-structured arguments.

3 Method

3.1 Participants

A total of 250 male and female Iranian graduate learners of TEFL (Teaching English as a Foreign Language) took part in the study. TEFL is considered a social science, and these graduate learners were taking different courses related to learning and teaching of English. They had all obtained a BA degree in English language and literature or in English translation and had passed the Iranian national matriculation exam for entering university. This national exam measures the participants’ English language proficiency as well as their knowledge of issues in TEFL (i.e. language teaching/learning principles, language testing, and educational linguistics). The participating volunteers came from eleven reputable state universities across the country. Admission to these universities is more competitive than to private universities. Thus, the entrants are all assumed to be highly proficient in English. Their formal writing experience is basically limited to two obligatory undergraduate courses, i.e. ‘Principles of Writing’ and ‘Essay Writing’. In their graduate program, they all take ‘Academic Writing’ to help them develop academically sound texts (e.g., term papers, review papers, and dissertations). In most of their modules, these graduate students are encouraged to position their arguments, evaluate the current literature in their assignments, and engage with their audience. Although developing sound arguments is highly appreciated, argumentation as a skill may not be explicitly taught in their curriculum. All the state universities across the country with MA in TEFL departments were targeted and contacted. Eleven of them eventually agreed to participate in the study. The number of the participants ranged from 7 to 28 in these universities, and their age ranged from 23 to 43. The instructors in these departments agreed to cooperate and get the consent of their students to participate in the study. Participation was voluntary, and the students were notified about the purpose of the study and that they could withdraw from the study at any time. The final sample comprised 150 participants, as some did not complete the writing task as requested and some withdrew from the study (see Table 1).

Table 1 Distribution of the study participants (N = 150)

3.2 Writing Task

The participants were asked to write an argumentative essay in English on a social issue. To select the appropriate topic, we referred to the online database “Opposing Viewpoint Resource Center” published by Thomson Higher Education (http://gale.cengage.com/Opposing Viewpoints). This database is a repository of different controversial topics. The researchers chose 11 topics which seemed appropriate for the purposes of the study. To justify the choice of topic, 14 experienced writing instructors were called upon to rate the selected topics on a five-point Likert scale questionnaire, ranging from 1 (the least interesting) to 5 (the most interesting). Based on the instructors’ ratings, the topic ‘Iran poses a serious threat to the United States vs. Iran does not pose a serious threat to the United States’ was finally selected. Nine of the 14 raters selected this topic as appropriate for the purposes of the study.

It was also assumed that the participants have adequate background knowledge on the issue given the massive amount of exposure to the media and public debate about the issue. However, the chosen topic might be emotionally charged for the respondents as all of them came from Iran. To control for this bias, the wording of the topic was reversed for half of the participants. This so-called split-ballot technique is utilized with the expectation that two alternative phrasings of the same topic may yield a more valid picture than will a single phrasing (Revilla and Saris 2013). The writing task involved simple and clearly-worded instructions on how to do the task. It required learners to develop well-organized arguments explaining and supporting their views, and making their position clear on the issue. This was followed by the essay prompt and three blank pages appended.

3.3 Procedure

Prior to data collection, an informed consent letter was given to the instructors and students to participate in the study. The instructors were briefed on the purposes of the study and the data collection procedures. Students were reassured that all the data will be treated confidentially and used for research purposes only, and they could withdraw from the study if they wanted to. All the respondents were given a writing package with simple and consistent instructions on how to do the task. The allotted time (50 min) for writing the essay was decided based on piloting the topic with a small sample of participants similar to the target group. They were asked to develop a balanced argument of at least 400 words in a session based on their background knowledge and personal experience on the selected topic.

The final participants were briefed on avoiding using biased, emotionally-charged or sketchy arguments. They were asked to present opposing views on the issue and come up with their own clear points of view. A uniform procedure for data collection was followed across all the eleven universities. All the instructors were given a script explaining explicitly how to administer the task, and were also contacted and briefed on potential questions they might have about the script and administration procedures. The collected papers were rated by two experienced writing instructors both holistically and analytically.

4 Data Analysis

4.1 The Rhetorical Organization of Argumentative Essays

Results of the chi-square goodness-of-fit test for total frequencies revealed that there were significant differences in the rhetorical pattern of the written arguments (χ2 = 45.3, df = 3, p = .00). Results for question one on the type of the developed arguments showed that about 82% (n = 123) of the participants (see Fig. 1) stated their positions clearly either at the beginning (deductive) or at the end (inductive) of the essay. Almost half of the essays (49%, n = 73) presented a clear point of view deductively supported by reasons and pieces of evidence. Around 19% (n = 28) of the essays were Off-type, i.e., they failed to adopt a clear point of view and their stance on the topic was neutral. They stated the pros and cons of the topic without taking a stance. A small portion of the essays (18%, n = 27) had an Inductive rhetorical organization, which summarized the writer’s thesis at the end, preceded by supporting reasons and pieces of evidence. Moreover, a smaller percentage was of the Both type (15%, n = 22), in which the writer stated the topic of interest at the outset and the writer’s stance was maintained until the very end of the essay. In sum, the essay types were Deductive (49%), Off (19%), Inductive (18%), and Both Inductive and Deductive (15%) in terms of the overall rhetorical organization.

Fig. 1 The percentage of English argumentative essay types
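
The goodness-of-fit statistic reported in this section can be checked from the four category counts given in the text. The sketch below assumes, since the source does not state it, that the null hypothesis assigns equal expected frequencies to the four essay types; the function name is illustrative only.

```python
def chi_square_gof(observed):
    """Chi-square goodness-of-fit statistic against equal expected counts.

    Assumption: the null hypothesis spreads essays evenly over the categories.
    """
    total = sum(observed)
    expected = total / len(observed)
    return sum((count - expected) ** 2 / expected for count in observed)


# Category counts as reported in Sect. 4.1 (N = 150).
essay_counts = {"Deductive": 73, "Off": 28, "Inductive": 27, "Both": 22}

statistic = chi_square_gof(list(essay_counts.values()))
degrees_of_freedom = len(essay_counts) - 1
```

Under the equal-frequency assumption the statistic comes to about 45.36 with three degrees of freedom, which is in line with the reported value of 45.3.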

4.2 Argument Elements and Overall Argument Quality

The second question of the study involved two stages: structural analysis of the arguments, and examining the correlation between the use of argument elements and overall argument quality. To examine the essays structurally, elements of arguments (i.e. claim, counterclaim, and rebuttals and their associated supporting reasons such as rebuttal claim/data) were identified and their frequencies were calculated. We drew on Qin and Karabacak’s (2010) rubric for identifying these elements, which was originally based on Toulmin (2003), Nussbaum and Kardash (2005), and Nussbaum and Schraw (2007; see “Appendix 1” for definitions and examples from the corpus). This rubric has been shown to be reliable in identifying argument elements (Qin and Karabacak 2010). Inter-rater reliability of rated essays for claim, data, counter-argument claim, counter-argument data, rebuttal claim, and rebuttal data was .91, .96, .86, .84, .85, and .87 respectively, and the overall inter-rater reliability was .87. In case of any discrepancy in the identification of argumentative elements between the raters, differences were negotiated until a consensus was achieved. We also drew upon the semantic structure and linguistic elements in the produced texts following Qin and Karabacak (2010). Claims, for example, were identified through elements such as in my opinion, I believe, and I think; data through prepositional phrases such as for that reason and subordinators such as because; counterargument and rebuttal through certain phrases such as Some people claim that… However; It is said that… but; even though; despite; and although.
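
The linguistic cues listed above lend themselves to a simple first-pass screen. The sketch below is only an illustration under stated assumptions: the elements in this study were identified by human raters using the rubric, and the dictionary layout, label names, and function name here are hypothetical. A cue match flags a sentence as a candidate and does not decide its final coding.

```python
import re

# Cue phrases taken from the description above; the grouping into three
# labels is an assumption made for this sketch.
MARKERS = {
    "claim": ("in my opinion", "i believe", "i think"),
    "data": ("for that reason", "because"),
    "counterargument_or_rebuttal": (
        "some people claim that", "it is said that", "however",
        "but", "even though", "despite", "although",
    ),
}


def _compile(cues):
    # Word boundaries keep short cues such as "but" from matching inside
    # longer words such as "contribute".
    joined = "|".join(re.escape(cue) for cue in cues)
    return re.compile(r"\b(?:" + joined + r")\b", re.IGNORECASE)


PATTERNS = {label: _compile(cues) for label, cues in MARKERS.items()}


def candidate_elements(sentence):
    """Return the labels whose cue phrases occur in the sentence."""
    return [label for label, pattern in PATTERNS.items()
            if pattern.search(sentence)]
```

A sentence that contains both "I believe" and "because" would be flagged for both claim and data, leaving the rater to decide how to segment it.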

The results show that graduate EFL writers tend to use all the elements of argument structure in their writing (see Table 2). As can be seen, the highest mean scores relate to the fundamental elements of the Toulmin model, i.e. ‘data’ and ‘claims’. The lowest means relate to secondary elements (i.e. counterargument claims, counterargument data, rebuttal claim, and rebuttal data). The rank-ordered frequency of use of the secondary elements is as follows: counterargument claim > rebuttal claim > counterargument data > rebuttal data.

Table 2 Frequency of use of argument elements

The chi-square test on the relationship between the use of primary (Mean = 3.58; SD = 1.36) and secondary (Mean = 1.06; SD = 1.19) elements indicated that the primary (χ2 = 150.840, df = 8, p = .00) and secondary elements (χ2 = 92.497, df = 4, p = .00) tend to be systematically related, i.e. the frequency of use of the two types of elements is significantly correlated. Further, writers tend to use significantly more primary than secondary elements.

Similarly, the chi-square test of independence indicated that counterargument claim and rebuttal claim (χ2 = 66.21, df = 3, p = .00), and counterargument data and rebuttal data (χ2 = 1.47, df = 2, p = .00), tend to be systematically related as well, i.e. the frequency of use of the two and their corresponding elements is significantly correlated in advanced EFL writing. Moreover, these writers employed significantly more claims than data (82% counterargument claim vs. 24% counterargument data).

To answer the second part of question 2, overall argument quality of the essays was examined following Nussbaum and Schraw (2007), and Qin and Karabacak (2010). This involved grading the essays holistically by two raters in terms of ‘the overall argument effectiveness’, ‘the presence or absence of the possible opposing views’, ‘overall structure’, and ‘overall language use’. These criteria served as general indicators of an effective argument. The raters were instructed not to favor any one of these dimensions of quality when rating the papers (Qin and Karabacak 2010). To ensure consistency of ratings, the two raters scored 20 randomly selected essays using the rubric. Then, the authors discussed vague points of the rubric until consensual agreement was reached. The inter-rater reliability was found to be .88.

The Pearson product-moment correlation coefficient revealed that the essay scores of overall quality correlated significantly and positively with the uses of the six elements of arguments (see Tables 3 and 4).

Table 3 Descriptive statistics for the overall quality of the essays
Table 4 Correlation between the uses of argument elements and overall writing quality

Among the argument elements, ‘data’ showed the strongest significant correlation with overall writing performance (r = .47). All other elements also correlated significantly and positively with participants’ writing quality: claim (r = .32, p < .01), followed by rebuttal claim (r = .28, p < .01), counterargument claim (r = .24, p < .05), counterargument data (r = .19, p < .05), and rebuttal data (r = .19, p < .05). This means that the more these elements are used, the higher the overall quality of the argumentative essays tends to be.
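
For readers who wish to see how such coefficients are obtained, the Pearson product-moment correlation between element counts and holistic scores can be sketched as follows. The function name and the two short lists are hypothetical and serve only to illustrate the computation; they are not data from this study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical example: number of 'data' elements per essay and the
# holistic quality score each essay received.
data_counts = [1, 2, 3, 4, 5]
quality_scores = [2, 4, 5, 4, 5]
r_data_quality = pearson_r(data_counts, quality_scores)
```

The significance test that accompanies each coefficient is not reproduced here; it would additionally require the sample size and a reference distribution.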

4.3 Analysis of Argument Soundness

We found that the above criteria for measuring overall argument quality per se do not take account of the criteria (such as acceptability, relevance, and adequacy) for measuring soundness of arguments (Means and Voss 1996; Rapanta et al. 2013; Rusfandi 2015; Stapleton and Wu 2015). Thus, to answer the third study question, we adopted an integrative analytic approach following Stapleton and Wu (2015) to attend to both the structure and substance of written arguments (see “Appendix 2”). They maintain that “for an argumentative essay to be persuasive, not only must it follow surface structure by including alternative viewpoints and showing their weaknesses, but it must also support claims with good quality reasons that convince others” (p. 22).

The rubric used contains descriptors of the surface structure (i.e. surface elements of arguments described in 4.2), and the quality of supporting reasons demonstrating the magnitude of soundness. It involves two broad levels, and a total of five sublevels with attributed scores. In this rubric, the six elements of argumentation are differentially weighted: a scale of 0 to 5 for claim, a scale of 0 to 10 for each of the two categories of counter-argument claim and rebuttal claim, and a scale of 0, 10, 15, 20, and 25 for each of the three categories of data. Higher scores were given to the categories of data because they demand a higher level of critical thinking and argumentation skills (Stapleton and Wu 2015).
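
The weighting scheme just described can be made concrete with a short scoring sketch. It is not the rubric itself (see “Appendix 2”): the element keys, the function name, and the treatment of the claim scales as continuous ranges are assumptions made for illustration. Only the score ranges are taken from the description above.

```python
# Maximum scores for the three claim-type elements (ranges from the text).
CLAIM_MAXIMA = {"claim": 5, "counterargument_claim": 10, "rebuttal_claim": 10}
# Permitted score levels for each of the three data categories.
DATA_LEVELS = (0, 10, 15, 20, 25)
DATA_ELEMENTS = ("data", "counterargument_data", "rebuttal_data")


def soundness_total(scores):
    """Validate per-element scores and return their sum.

    `scores` is a hypothetical dict keyed by element name; elements that
    are absent from a script are scored 0.
    """
    total = 0
    for element, maximum in CLAIM_MAXIMA.items():
        value = scores.get(element, 0)
        if not 0 <= value <= maximum:
            raise ValueError(f"{element} must lie between 0 and {maximum}")
        total += value
    for element in DATA_ELEMENTS:
        value = scores.get(element, 0)
        if value not in DATA_LEVELS:
            raise ValueError(f"{element} must be one of {DATA_LEVELS}")
        total += value
    return total
```

With these ranges the maximum attainable total is 5 + 10 + 10 + 3 × 25 = 100, and three quarters of it comes from the data categories, which reflects the heavier weighting of supporting reasons.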

To analyze the reasoning quality of the arguments, scripts were selected as follows. First, following Nussbaum and Kardash (2005) and Nussbaum and Schraw (2007), we selected only those scripts that received high scores in the second phase of analysis explained in 4.2. That is, essays whose overall argument quality was at least one standard deviation above the mean were selected (Crossley et al. 2014; McNamara et al. 2013). The reason for this ‘best evidence selection’ is to discover whether essays rated high in terms of argument quality in the second phase would still be highly rated in terms of argument relevance and soundness. This filtering resulted in 40 scripts (out of 150). Then, each script was scored independently based on Stapleton and Wu’s (2015) analytic scoring rubric for argumentative soundness. To ensure that student-articulated scripts were assessed reliably, all the 40 scripts were coded independently by two raters, who were briefed to adhere to the rubric. Points of disagreement were resolved between raters through discussion until consensual agreement was reached. The overall agreement between them was .91. Analysis of the 40 high-rated scripts revealed three argumentative profiles presented below.
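
The selection step described above, keeping only scripts that score one standard deviation above the mean, can be sketched as follows. The function name, the script identifiers, and the scores are hypothetical. The use of the sample standard deviation and of an inclusive cutoff are also assumptions, as the text does not specify either.

```python
from statistics import mean, stdev


def select_high_scoring(scores):
    """Return ids of scripts scoring at least one SD above the mean.

    `scores` maps a script id to its overall argument-quality score.
    Assumptions: sample standard deviation, inclusive cutoff.
    """
    values = list(scores.values())
    cutoff = mean(values) + stdev(values)
    return sorted(sid for sid, score in scores.items() if score >= cutoff)


# Hypothetical quality scores for five scripts.
example_scores = {"s1": 2, "s2": 3, "s3": 3, "s4": 4, "s5": 8}
selected = select_high_scoring(example_scores)
```

In the hypothetical example the mean is 4 and the sample standard deviation is roughly 2.35, so only the script scored 8 passes the cutoff.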

4.3.1 Profile 1: One-Sided Good Surface Structure But Weak Argument Quality: Failure to Include Counterargument

The first profile represents scripts (n = 20) with a one-sided argument in terms of structure, as they simply depicted a claim with at least one reason. The arguments were weak in terms of soundness, as very few of the reasons provided (20%, n = 4) were acceptable and relevant to the claim. Table 5 presents the frequencies of the data with the corresponding claims in the selected scripts. The quality of the supporting reasons for the claim ranged from no relevant reason to multiple sound reasons. As shown in Table 5, the majority of the one-sided arguers (60%, n = 12) supplied a claim with one or two reasons for each claim, although some reasons were sound and acceptable and others weak and irrelevant. A small percentage of the arguers (15%, n = 3) provided only one reason for the claim and the reasons provided were weak and irrelevant. However, about 15% (n = 3) of the writers provided multiple reasons most of which were acceptable and free of irrelevancies for the claim. Only a small number (5%, n = 1) of the arguers failed to provide an acceptable reason. Interestingly, only 5% of the arguers succeeded in producing very strong data for supporting claims. In sum, 20% (n = 4) of the scripts in this cluster received higher scores for soundness, i.e. used multiple reasons that were relevant, acceptable, and justified.

Table 5 Frequency of features of one-sided arguments used in the scripts (N = 20)

4.3.2 Profile 2: Two-Sided Good Surface Structure But Weak Argument Quality (No Rebuttals)

The second profile represents a two-sided argument as it includes argument-counterarguments (without rebuttal). Structurally speaking, the scripts containing these features (n = 12) present not only the writers’ assertion and its corresponding reasons but also the possible opposing views with their corresponding justification.

Table 6 shows that the arguers supplied a claim (33%, n = 4) and a counter-argument claim (25%, n = 3) with one to two reasons for each claim. Almost 25% (n = 3) of the writers provided multiple reasons, most of which were acceptable and free of irrelevancies for the claim; over 8% (n = 1) of the arguers provided multiple acceptable and sound reasons for the counter-argument claim. For the claim and the counter-argument claim, 25% and 33.3% of the arguers, respectively, provided only one reason, and that reason was weak and irrelevant. Interestingly, multiple data in both claim and counter-argument claim components had the lowest frequencies: 0 and 8%, respectively. Less than 10% (n = 1) of the arguers succeeded in producing very strong data for supporting counterargument claim(s), i.e. they used multiple reasons that were relevant and acceptable for the claim and counterclaim. None of the arguers succeeded in producing very strong data for supporting the claim.

Table 6 Frequencies of features of two-sided arguments used in the scripts (N = 12)

4.3.3 Profile 3: Two-Sided Good Surface Structure But Weak Argument Quality (Rebuttal(s) Included)

This profile represents a high level of structural quality as it consists of all six elements of argumentation discussed before. However, the overall reasoning quality is still far from satisfactory. As can be seen in Table 7, almost all the claims, counterargument claims, and rebuttal claims were supported with low frequencies of data. Most arguers provided only one reason for the claim, counter-argument claim, and rebuttal claim, and that reason was weak and irrelevant. About 13% of the arguers (n = 1) provided multiple reasons which were mostly acceptable and free of irrelevancies for the claims and counter-argument claims; the arguers failed to provide sound reasons for the rebuttal claim. This suggests that ‘the more, the better’ does not always hold in developing sound arguments.

Table 7 Frequencies of features of two-sided arguments used in the scripts (N = 8)

5 Discussion

The graduate learners organized most of their arguments in a deductive fashion. In line with most of the previous research (Hirose 2003; Kobayashi and Rinnert 2008; Rashidi and Dastkhezr 2009), their writing mainly falls into the dominant categories of Deductive, Off, Inductive, and Both deductive and inductive patterns, respectively. The majority of the participants (82%) stated their positions clearly at the beginning (deductively, 49%), at the end (inductively, 18%), or in both places (Both, 15%). These advanced learners seem to adhere to English writers’ use of a deductive pattern. Their formal training over the years and academic writing experience in English might lead them to approximate English writers’ style in terms of directness in their essays. It could also be due to the dominant product-based mode of writing instruction in Iran at tertiary level, in which more emphasis is placed on stating the main claim at the beginning (Abdollahzadeh 2010). The rather minimal employment of the inductive pattern might show the impact of schooling on raising the writers’ awareness of this pattern as a non-English pattern (Chen 2008; Husin and Ariffin 2012). The rather low use of the Both pattern, on the other hand, may be attributed to the writers’ poor development of the arguments.

Participants employed all the elements of written argumentation; however, the extent of using basic and secondary components, though interdependent, was significantly different. The majority of the essays included basic elements, namely, the writer’s opinion (claim) and supporting evidence (data). These elements are the most preferred ones for learners (De Bernardi and Antolini 1996; Lunsford 2002; Qin and Karabacak 2010; Varghese and Abraham 1998). The current findings reveal a predisposition among Iranian graduate EFL writers not to present many counterarguments and rebuttals in their written argumentation, despite the fact that good arguments involve counterarguments and rebuttals to augment writing quality (Nussbaum and Kardash 2005; Wolfe et al. 2009). About half of the argumentative essays applied some form of rebuttals and counterarguments. A probable explanation might be the cognitive constraints of developing the secondary elements, as they are more complex to produce (Coirier et al. 1999; Wolfe et al. 2009), together with the writers’ lack of experience and awareness of the effectiveness of these elements in argument quality. Further, they might perceive secondary elements as optional or unnecessary for writing argumentatively, even though counterarguments play a vital role in argumentation structure (Toulmin 2003). A significant majority of them failed to represent a critical and reflective positioning towards the topic, which could subsequently affect the quality of their arguments in their academic assignments as they attempt to argue to learn and later join their respective academic community. We found that despite good surface structure, many reasons provided by the arguers were weak, echoing Sadler’s (2004) contention that arguers might not typically “display high-quality written argumentation as defined by an ability to articulate and defend contentious positions” (p. 523).

Using an integrated assessment framework, we assessed arguments in terms of substance and structure (Sampson and Clark 2008; Stapleton and Wu 2015). The analysis revealed several patterns of argumentative behavior. First, the selected scripts (n = 40) were grouped in terms of the occurrence of double surface structure (cluster 1: claim–data), quadruple surface structure (cluster 2: claim–data–counterargument claim–counterargument data), and sextuple surface structure (cluster 3: claim–data–counterargument claim–counterargument data–rebuttal claim–rebuttal data) combinations.
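The structural grouping described above can be expressed as a simple rule over the argument elements found in a script. The following is only a sketch under stated assumptions: the element labels and the function name are hypothetical, and scripts with a partial structure (for example, a counterargument claim without counterargument data) are left unclassified because the text does not say how such cases were handled.

```python
# Hypothetical labels for the six surface-structure elements.
BASIC = {"claim", "data"}
COUNTER = {"counterargument_claim", "counterargument_data"}
REBUTTAL = {"rebuttal_claim", "rebuttal_data"}


def classify_profile(elements):
    """Map the set of elements coded in a script to cluster 1, 2, or 3.

    Returns None when the combination matches none of the three
    structures described in the text (an assumption of this sketch).
    """
    present = set(elements)
    if not BASIC <= present:
        return None  # no complete claim-data pair
    has_counter = COUNTER <= present
    has_rebuttal = REBUTTAL <= present
    if has_counter and has_rebuttal:
        return 3  # sextuple structure
    if has_counter and not (present & REBUTTAL):
        return 2  # quadruple structure, no rebuttal
    if not (present & COUNTER) and not (present & REBUTTAL):
        return 1  # double structure: claim and data only
    return None  # partial structures are not covered by the text
```

A rule of this kind captures surface structure only; the soundness of each element still has to be rated separately against the rubric.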

The first argumentative writing profile (cluster 1) showed structurally well-designed essays which were significantly low in terms of argumentation quality. All these scripts were described as one-sided arguments, containing claim(s) and one or more reasons. The arguers here tended to support their claim(s) and maintain their position with some reasons, ranging from one to multiple pieces of evidence (Sampson and Clark 2008). However, the claim-data argument is the least sophisticated form of an argument (Rusfandi 2015). Despite a well-designed surface structure, most of the arguers failed to provide relevant and acceptable reasons for the corresponding claim(s), indicating that student-generated arguments often lack substance, including components that are inaccurate and/or irrelevant in terms of quality (Rapanta et al. 2013; Schwarz et al. 2003; Simon 2008; Sampson and Clark 2008).

The second writing profile was also well-constructed structurally but tended to be weak in terms of quality. The arguers in this profile tended to support their claim(s) and counter-argument claims with data (from one to multiple pieces of evidence), omitting rebuttals and maintaining their position. Structurally, with increasing argument components, the written arguments become more complex and sophisticated. Despite producing more complex and sophisticated arguments compared to those in the first cluster, the students were not able to adequately align rebuttals with the counterarguments and thus failed to refute them. This profile cannot be considered strong as there was no indication of including rebuttals as essential components of better quality arguments (Erduran et al. 2004). This could be attributed to the complex nature of the argument-counterargument structure in L2 (Qin and Karabacak 2010) as well as “risk avoidance, lack of confidence, and reformulation difficulties in producing argument-counter-argument claims and supported data” (Kobayashi and Rinnert 2008, p. 35). There is some evidence that arguers who did not provide a counter-argument in their English L2 essays included one when they wrote in L1 (Kobayashi and Rinnert 2008). Further research can help us discover the extent to which transfer of argumentation strategies occurs in L2 argumentative production. The impact of schooling and L1 educational and writing culture could be another factor accounting for L1–L2 differences in argumentative development.

The final argumentative profile included rebuttals (claim and data) and was far more effective than those without rebuttals. Nonetheless, although appropriate surface structures were followed, the overall reasoning quality of argumentation was still far from satisfactory. The findings demonstrate that good surface structure cannot necessarily guarantee well thought-out logical structure. Thus, although they were linguistically competent, a significant majority of the graduate students still could not produce arguments backed by evidence and counter-evidence. Raising graduate students’ consciousness about the value of justifying claims, inquiring into the argument, and attending to opposing perspectives could be an important pedagogical practice to pursue in both EFL and L1 contexts (Sadler 2004).

6 Conclusion

The findings provide empirical evidence uncovering the gap between structure and substance in linguistically advanced student argumentative discourse. The reasoning quality of the student-generated arguments was generally weak, though the frequency of use of argument elements was rather high. Forming a good surface structure and sound argument quality is believed to contribute substantially to persuasiveness of argumentative writing (Sampson and Clark 2008; Stapleton and Wu 2015). Assessing written productions of students, then, requires using integrative evaluation criteria in which more weight is given to arguments discussing alternative viewpoints, while taking account of good quality reasons to support claims in an attempt to convince the audience.

Given that enhancing argumentative skills is an ambitious educational goal which needs time and practice (Means and Voss 1996; Sadler 2004), writing programs and writing-across-the-curriculum instructors need to create ample opportunities for learners to take part in argumentative practices in which they can evaluate and provide justified explanations for their claims and assertions, attend to contradictory claims, and form counterarguments and rebuttals (Sampson and Clark 2008; Rusfandi 2015). This argumentative intervention can develop EFL learners’ critical thinking skills as people seem to learn better when they argue (Kuhn 2008), and thus help them understand the epistemic nature of knowledge and participate more effectively in their respective scientific discourse. After all, learners need to appreciate how to produce cognitively mature and contextualized arguments using rebuttals, counterarguments and qualifiers (Schwarz 2009).

We mainly explored the persuasive discourse of advanced EFL learner writers. Students’ topical knowledge or lack thereof in the discipline was not investigated in this study. Further, we know that textual practices are contextualized within the genres unique to different disciplinary communities where knowledge-making and knowledge-sharing are central. The implication for disciplinary writing is that understanding ways of knowing in different sciences is informed by the ‘tool kit’ acquired through learning to argue (Newell et al. 2015). Therefore, students can use this toolkit to understand what knowing means in their discipline, and later argue to learn their way into the discipline’s texts and ways of knowing (Carter 2007).

There might be other factors impacting the quality and structure of the students’ argumentative essays, such as the use of metadiscourse, evaluation markers, hedging, argumentative strategies across L1 and L2, as well as norms of suitable argumentation behavior across various disciplines (Sampson and Clark 2008; Uccelli et al. 2013). Examining these issues was outside the scope of this study. Future research can determine the contributory role of these elements in producing sound arguments.