1 Introduction

Since almost all activities at the university are connected in some way to academic writing, the amount of information related to student writing is vast. New methodologies, from such areas as corpus linguistics, writing research, or academic literacy, can shed light on a range of areas relevant to university writing teaching and learning: Which are the typical genres of a certain discipline? How and what do students learn with the help of a certain genre? Which linguistic features shape the profile of a text, and to what extent are they identifiable? How do the rhetorical features that characterize academic writing differ from one discipline to another?

Academic writing teaching and corpus linguistics are increasingly joining forces in interdisciplinary approaches that analyze, evaluate, and optimize student writing (Aull 2015; Biber et al. 2007; Flowerdew 2005; Flowerdew and Forest 2015; Gotti and Giannoni 2014; Hyland 2009; Nesi and Gardner 2012; Römer and O’Donnell 2011; Swales 2004; Upton and Connor 2001). In the Romanian university context, such approaches have never been used extensively, let alone integrated.

2 The Context: Academic Writing in Romania

Since the fall of Communism, Romania has gone through a long process of transition from Stalinist norms in education to approaches to higher education developed in Western countries (see also chapters “Academic writing at Babeș-Bolyai University. A Case Study”, “Institutional Writing Support in Romania: Setting Up a Writing Center at the West University of Timișoara”, and “Perceptions About “Good Writing” and “Writing Competences” in Romanian Academic Writing Practices: A Questionnaire Study”). In 25 years of continuous reforms in higher education, connected with the rise in the number of disciplines and specializations (Chitez 2014, pp. 21–23) and the growing importance of English, the need to adapt to new writing requirements has emerged:

For cultures such as Italy and Romania, it is more a problem of introducing such new genres that interfere less with traditional ones but reach acceptance and provide a certain degree of comfort for all actors. Genre awareness and a deeper understanding of what genres accomplish in education are factors that supposedly play a crucial role for creating new teaching directives in the future. (Chitez and Kruse 2012, p. 175)

At Romanian universities, neither academic genres nor academic writing is taught explicitly. Faculty do not, in general, address issues of genre writing or writing process challenges because of a widely accepted view that learning to write belongs to elementary and secondary education (ibid.). In fact, the majority of informative materials on written genres in the Romanian educational context refer to three primary categories: creative writing (e.g., compunerea, “composition”), argumentative writing (e.g., comentariul literar, “literary commentary”), and formal writing (e.g., scrisoare, “letter”; ibid., pp. 172–173). At the university level, it is the disciplinary setting that influences academic writing performance: Students in the humanities, especially those studying foreign languages, have a greater chance of being exposed to structured training in genre use and production than students in the engineering disciplines, for instance. Almost all foreign-language departments offer practical modules in which students learn how to write common genres such as stories, journal articles, and formal letters. Typically, such courses take place in foreign languages but not in Romanian, so it may be a greater challenge for Romanian students to handle academic texts in their mother tongue than in a foreign language.

Only in the recent past have academic writing research and support initiatives in Romania been launched. The SNF-SCOPES project LIDHUM (see chapter “Studying and Developing Local Writing Cultures: An Institutional Partnership Project Supporting Transition in Eastern Europe’s Higher Education”), coordinated by Otto Kruse and Mădălina Chitez, was one of the few initiatives in the field. The statistical analysis performed after the implementation of the EUWRIT survey (Chitez et al. 2015a) indicated that the repertoire of educational genres is similar in Eastern European countries but some genres are culture specific: for example, the seminar paper in Romania (Bekar et al. 2015, p. 130). Another project, OPEN RES, was conducted at the Babeș-Bolyai University of Cluj and has resulted in several publications on academic writing training. Scattered publications on academic writing in Romania cannot easily be accessed (e.g., Andronescu 1997; Pavlenko and Bojan 2014) or need to be reinforced by extensive empirical evidence (e.g., Frăţila (Pungă) 2006).

3 Romanian Text Corpora

In Romania, corpus linguistics has been given very little attention compared to what it has received in other international research contexts, especially North America and Western Europe. It should be emphasized that by corpus linguistics we are referring to the research discipline that uses linguistic evidence extracted from electronic linguistic databases in order to conduct hypothesis-driven or explorative linguistic research. The alternative discipline, computational linguistics, shares with the field of corpus linguistics the key element of linguistic database construction, but its primary aim is to develop techniques of natural language processing, so its focus lies on information technology methods rather than on linguistic theory. Certainly, the two disciplines can overlap to a lesser or greater extent according to the underlying research questions in a project or a case study investigation.

Several research projects have resulted in compilations of corpora, which have also contributed to relevant linguistic analyses for the Romanian context.

One of the few larger collections of student texts is the Romanian Corpus of Learner English (RoCLE) databank (see Chitez 2014). This corpus complies with the general collection norms of the ICLE corpora (Granger et al. 2009), thus including specific genres (argumentative essays and/or literature essays) written in English by native speakers of Romanian. Informants are students majoring or minoring in English and enrolled in their third or fourth year of study. The corpus contains 352 texts totaling 201,551 words. It is used for the description of the salient features of student academic written discourse in English as a foreign language (more details below).

Lately, other small-scale corpora in the area of applied linguistics have been constructed that reflect the current use of language (either native or foreign language). For example, Herteg and Popescu (2013) proposed the use of comparable corpora consisting of English and Romanian newspaper business texts compiled by students (15,000 words compiled by each student in each language) for the extraction of business collocations in both languages.

On the other hand, Romanian corpus-based computational linguistic projects have been successfully conducted for several years now. The institution consistently active in corpus collection processes is the Romanian Academy Research Institute for Artificial Intelligence “Mihai Drăgănescu” (RACAI) led by Academician Dan Tufis, who has gained national and international recognition in the field (see Macoveiciuc and Kilgarriff 2010). Table 1 below almost exclusively includes (with the exception of RoWaC) RACAI corpora.

Table 1 Romanian corpora

4 At the Confluence of Academic Writing and Corpus Linguistics: Three Examples

4.1 Linguistic Fields

4.1.1 Contrastive Linguistics

There is a consistent body of corpus-based contrastive research on academic discourse in numerous languages (d’Angelo 2012; Fløttum et al. 2006; Johansson 2007; Mauranen 1993, 1994; Siepmann 2005). Many such studies make use of parallel corpora to investigate translation challenges (e.g., Mikhailov and Cooper 2016). By aligning the source and target texts, translators (or language learners) can better understand the mechanism and options of linguistic equivalence. In fact, “corpora have perhaps strengthened the trend away from word-equivalence to phrasal equivalence” (Krishnamurthy 2006, p. 253), which makes them also interesting to the academic writing contrastive field, given the importance of rhetorical appropriateness in genre use. Often followed by “further research with monolingual corpora in both languages” (Mauranen 2002, p. 182), translation-related research can either take the form of a genuine contrastive study or turn into an in-depth analysis of salient linguistic phenomena (Ebeling et al. 2013). Other corpus-based studies make use of independent comparable corpora to look at particular linguistic features, such as quantity approximation in De Cock and Goossens’ study (2013). In general, areas that benefit the most from corpus-based contrastive analyses seem to be bilingual and monolingual lexicography (Granger and Lefer 2013, p. 158).

There has, however, been little theoretical contrastive research on academic writing in Romanian versus other languages (see Chitoran 2013, for English-Romanian contrastive analyses). A few contrastive remarks have been offered by Chitez’s (2014) corpus-based study on several grammatical topics (articles, genitive, prepositions).

4.1.2 Academic Phraseology

Formulaic language, or phraseology (Granger and Meunier 2008; Stubbs 2001; Wray 2008), has long centered on the concept of lexicogrammar. As McEnery and Gabrielatos (2006, p. 41) point out, some linguists (Halliday 1991, 1992) prefer the term lexicogrammar because it is quite difficult to separate lexis from grammar, as they appear to be “the same thing seen by different observers” (Halliday 1992, p. 62). However, researchers tend to position their theories closer to one of the two ends of the lexico-grammatical continuum: lexical (Stubbs 2001; Halliday 1992) or grammatical (Sinclair 1966, 2004). In time, linguists have generally agreed that the notions of collocation (Biber et al. 1999, 2009; Ellis 1996; Firth 1957; Goldberg 2006; Römer 2009; Stefanowitsch and Gries 2003), lexical bundle (Biber et al. 1999, 2004; Biber 2006; Biber and Barbieri 2007), chunks (Wray 2008), and n-gram (Jarvis et al. 2012; Jarvis and Paquot 2012) are the key elements in lexicogrammatical approaches. In corpus linguistics, the terms collocation and phrase are often used interchangeably. Moreover, multiple studies have also indicated a certain degree of correlation between the users’ language competence and the phraseology profile of their discourse (Granger and Bestgen 2014; Laufer and Waldman 2011; Levitzky-Aviad and Laufer 2013; Nattinger and Decarrico 1992; Nesselhauf 2003). Academic writing research has taken on this awareness (see Bondi 2014; Charles et al. 2009) and integrated it into applied linguistics studies focusing on the compilation of the Academic Word List (Coxhead 2000; Nation 2001), the Academic Collocation List (Ackermann and Chen 2013; Durrant 2009; Laufer and Waldman 2011; Simpson-Vlach and Ellis 2010), and the Academic Phrasebank (Chitez et al. 2015b; Morley 2005).
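To make the notions of n-gram and lexical bundle more concrete, the following minimal Python sketch extracts recurrent word sequences from a small set of texts. It is an illustration only: the function names, the thresholds (min_freq, min_texts), the simple tokenizer, and the two sample sentences are our assumptions and are not taken from any of the studies cited above, which typically apply frequency cut-offs normalized per million words to much larger corpora.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (deliberately simple)."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def lexical_bundles(texts, n=3, min_freq=2, min_texts=2):
    """Keep n-grams that reach a raw frequency threshold and occur in several texts.

    The thresholds are illustrative assumptions, not values from the literature.
    """
    freq = Counter()    # total occurrences of each n-gram
    spread = Counter()  # number of texts in which each n-gram occurs
    for text in texts:
        grams = ngrams(tokenize(text), n)
        freq.update(grams)
        spread.update(set(grams))
    return {" ".join(g): c for g, c in freq.items()
            if c >= min_freq and spread[g] >= min_texts}

# Invented sample sentences, used only to show the mechanics.
sample_texts = [
    "On the other hand, the results of the study are clear.",
    "The results of the survey, on the other hand, are mixed.",
]
bundles = lexical_bundles(sample_texts, n=3)
print(bundles)
```

Requiring a sequence to occur in more than one text is a simple way of avoiding sequences that are frequent only because a single writer repeats them.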

4.1.3 Move Analysis

In genre research, certain structural patterns can be found repeatedly. These structures may be called conventional and are present as such in many instructional manuals and guides, but they also have functional meaning in organizing the discourse. Many studies (see also chapter “Research Articles as a Means of Communicating Science: Polish and Global Conventions”) have followed the research line of Swales (2004), who defined a sequence of moves and steps in which authors of research articles position themselves within a research field by first “establishing a territory” then defining a “niche” that they then, in a third step, “occupy.” Variations of this Creating a Research Space (CARS) model have been detected in research articles from many different cultures. The highly formalized evaluation model for move structures within the whole research article (not only the introduction) from Kanoksilapatham (2005), which follows the introduction, methods, results, and discussion (IMRAD) structure of the research article, can serve as a model for the coding of complex move text structures.

Bondi (2009), for example, followed a rather open comparison of Italian and English historical and economic discourse searching for differences in genre characteristics. She found large variations not only between disciplines and languages, but also between different approaches within the disciplines of the same language. She also analyzed statements of purpose in historical and business discourse. Such statements are often in fixed phraseological patterns (“in this paper,” “this paper is,” “of this paper,” “purpose of this,” “this paper examines,” etc.) and are therefore easily accessible in searches.
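Because such statements of purpose rely on fixed word sequences, they can be retrieved with a plain string search. The sketch below counts the five patterns quoted above in a text. It is a hedged illustration rather than a reconstruction of Bondi's procedure: the function name count_patterns and the sample sentence are hypothetical.

```python
import re
from collections import Counter

# The five phraseological patterns quoted in the paragraph above.
PURPOSE_PATTERNS = [
    "in this paper",
    "this paper is",
    "of this paper",
    "purpose of this",
    "this paper examines",
]

def count_patterns(text, patterns=PURPOSE_PATTERNS):
    """Count whole-word occurrences of each pattern, ignoring case and line breaks."""
    normalized = re.sub(r"\s+", " ", text.lower())
    counts = Counter()
    for pattern in patterns:
        regex = r"\b" + re.escape(pattern) + r"\b"
        counts[pattern] = len(re.findall(regex, normalized))
    return counts

# Invented sample sentence, not taken from any corpus.
sample = ("In this paper we compare two genres. "
          "The purpose of this paper is to describe them.")
hits = count_patterns(sample)
for pattern, n in hits.items():
    print(f"{pattern}: {n}")
```

Note that overlapping patterns (for example, “purpose of this” and “of this paper”) are counted independently, so the totals should not be summed as if they were distinct statements.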

In Romanian academic genres, the open approach seems more appropriate since we cannot assume that Romanian students follow the standard IMRAD sequence in their research-related academic genres, given the fact that the typical Romanian university genres are much more variable and less conventionalized than research articles. However, we should be aware that the basic linguistic features of academic genres are identifiable.

4.2 How Can Corpora Be Used? Exemplifications from the Romanian Corpus of Learner English

4.2.1 Data

The examples in the present study are extracted from the RoCLE (see Chitez 2014) (Table 2).

Table 2 Topic distribution in RoCLE

The proportion of argumentative essays is around 75% of the total number of texts while literature essays make up 25% of the databank. In this way, some research questions can be addressed concerning the rhetorical patterns in either register settings (formality level in argumentation) or disciplinary settings (literature studies).

4.2.2 Example 1: Romanian-English Collocation Pattern Transfer

In her study, Chitez (ibid.) has identified several grammatical areas with potential for language and academic writing teaching: articles, genitives, and prepositions. For example, the use of the prepositions in, on, and to in collocation patterns is one of the cases with a high risk of Romanian-English interference (Table 3):

Table 3 Romanian versus English prepositional collocation patterns in RoCLE

What academic writing experts can learn from the analysis of preposition collocation lists is that Romanian students might use academic writing phraseology incorrectly (e.g., “in special”) mainly because they translate it literally from Romanian (e.g., Ro: în special).

Another example is the collocation pattern formed by the demonstrative this followed by a singular common noun. It has been shown (Chitez 2014, p. 116) that Romanian students overuse expressions such as this situation, this way, this kind, and this movement. Considering that demonstrative anaphors are relevant markers of scientific writing (Lundquist 2007), this phenomenon can also be further investigated and exploited for the benefit of Romanian students writing in English.
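A rough way of retrieving candidates for the this + noun pattern is sketched below. Since no part-of-speech tagger is used, the sketch simply counts the word that follows this and discards a short, hand-made list of frequent non-nouns. Both the stoplist and the function name this_plus_word are our assumptions, and a tagged corpus would be needed for reliable results.

```python
import re
from collections import Counter

# Hand-made stoplist of words that often follow "this" without being nouns.
# It is an assumption for illustration and is far from complete.
NON_NOUN_FOLLOWERS = {
    "is", "was", "has", "will", "can", "would", "may",
    "does", "did", "and", "or", "in", "to", "of",
}

def this_plus_word(text):
    """Count 'this + following word' pairs, skipping the stoplisted followers."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for first, second in zip(tokens, tokens[1:]):
        if first == "this" and second not in NON_NOUN_FOLLOWERS:
            counts[f"this {second}"] += 1
    return counts

# Invented sample text, not taken from RoCLE.
sample = ("This situation is difficult. In this way we learn. "
          "This is why this situation matters.")
pairs = this_plus_word(sample)
print(pairs.most_common())
```

Comparing the resulting counts, normalized by corpus size, between a learner corpus and a native-speaker corpus would be one way of checking for overuse.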

4.2.3 Example 2: Genre-Related Academic Phraseology

Two types of texts are included in RoCLE: argumentative and literature essays. It is interesting to look at the types of phrases that appear frequently in such genres (Fig. 1).

Fig. 1 The use of the first person pronoun I in collocation patterns in argumentative essays

Surprisingly, students use the first person pronoun more frequently in argumentative essays (0.439% of the total number of words in RoCLE-ARG) than in literature essays (0.272% of the total number of words in RoCLE-LIT). The patterns most frequently encountered in argumentation are: I think, I know, I believe, I want, I consider, and I agree. In literature essays, students use past tense constructions such as I read, I found, I used, or I loved, which lend a rather narrative style to the literature text.
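The percentages reported above express the number of occurrences of I relative to the total number of words in a sub-corpus. The following sketch shows how such a share, together with the list of I + word patterns, could be computed for any text. The function name first_person_profile and the sample sentence are hypothetical, and the tokenizer is a simplification (contracted forms such as I'm are treated as a single token and therefore not counted).

```python
import re
from collections import Counter

def first_person_profile(text):
    """Return the share of 'I' among all tokens (in percent) and the 'I + word' patterns."""
    tokens = re.findall(r"[A-Za-z']+", text)
    total = len(tokens)
    i_count = 0
    followers = Counter()
    for idx, tok in enumerate(tokens):
        if tok == "I":  # case-sensitive: only the capitalized pronoun is counted
            i_count += 1
            if idx + 1 < total:
                followers[f"I {tokens[idx + 1].lower()}"] += 1
    share = 100 * i_count / total if total else 0.0
    return share, followers

# Invented sample text, not taken from RoCLE.
sample = "I think this is true. I believe it, and I think you agree."
share, patterns = first_person_profile(sample)
print(f"Share of I: {share:.2f}% of all tokens")
print(patterns.most_common())
```

Applied separately to an argumentative and a literature sub-corpus, the two shares can be compared directly because both are normalized by sub-corpus size.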

4.2.4 Example 3: Move-Analysis Indicators in RoCLE Essays

In order to assess rhetorical moves in academic writing, a standard pattern of rhetorical moves has to be defined for the genre. In the case of argumentative essays, we will consider one of the structural sequences proposed in the literature: situation, problem, solution, and evaluation (see Tirkkonen-Condit 1985). The multiword phrases associated with each of the moves can be searched for and extracted from the texts. In the case of the evaluation move, we checked the use of conclusion/conclude or similar expressions and noticed the following patterns:

(a) In most argumentative essays, students use a concluding phrase to mark the evaluation move in their text, with frequent expressions being built around markers such as [to conclude]/[in conclusion]/[concluding], [thus,], [so,], [all (in all)], and [these facts], or in rare constructions such as [to cap it all] and [taking everything into account].

(E1) Prove your independence to yourself and face today or tomorrow without cell phone and you will feel differently. <ICLE-RO-AIC-0002.2>

(E2) If we remain silent and passive it will become one. But our immediate reaction and vivid interest will struggle against it and it will set forth an example and model for the others to follow. <ICLE-RO-AIC-0003.1>

(E3) We should meditate on this problem, we should understand that if there are people who would give anything to spend yet another moment with their loved ones, why, and how could you decide to put an end to what God gave us: Life! <ICLE-RO-AIC-0008.1>

(b) If the markers are not present, the alternative rhetorical phenomenon is the “recommendation,” either in the form of second person addressing or as a collective “we” formula (see E1–E3).
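The two observations above can be turned into a simple search routine. The sketch below first looks for one of the concluding markers listed in (a) and, if none is found, checks for second person or collective we forms as a rough indicator of the recommendation strategy described in (b). It is a heuristic illustration under our own assumptions: the function name classify_ending, the cue list, and the order of the checks are hypothetical and do not reproduce the manual analysis reported here.

```python
import re

# Concluding markers taken from pattern (a) above.
CONCLUDING_MARKERS = [
    "to conclude", "in conclusion", "concluding", "thus,", "so,",
    "all in all", "these facts", "to cap it all",
    "taking everything into account",
]

# Assumed surface cues for the "recommendation" strategy in pattern (b).
RECOMMENDATION_CUES = {"you", "your", "yourself", "we"}

def classify_ending(final_paragraph):
    """Label a closing passage as 'concluding marker', 'recommendation', or 'unmarked'."""
    text = final_paragraph.lower()
    for marker in CONCLUDING_MARKERS:
        # The lookbehind prevents "so," from matching inside "also,".
        if re.search(r"(?<!\w)" + re.escape(marker), text):
            return "concluding marker"
    tokens = set(re.findall(r"[a-z']+", text))
    if tokens & RECOMMENDATION_CUES:
        return "recommendation"
    return "unmarked"

# E1 is quoted from the example above; the other sentences are invented.
e1 = ("Prove your independence to yourself and face today or tomorrow "
      "without cell phone and you will feel differently.")
print(classify_ending(e1))
print(classify_ending("In conclusion, mobile phones are useful."))
print(classify_ending("Also, it rains."))
```

Such a routine can only pre-sort passages for manual inspection, since pronouns like you and we also occur outside recommendations.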

5 Pedagogical Recommendations

There are multiple ways by which the data extracted from corpora can be introduced into teaching scenarios. For the specific task of supporting student academic writing, a corpus can be implemented as follows:

Corpora Can Facilitate Induced-Learning Writing-Related Tasks

Students can be given the task of analyzing databanks either in one language (Romanian) or in contrast (e.g., Romanian versus English L2; English L1 versus English L2) in order to identify salient features of academic writing use (see the example of an exercise in contrastive phraseology in English and Romanian below).

Please use the corpus databank of Romanian Learner English (RoCLE corpus) in order to extract academic phraseology containing the following keywords: author, paper, intend, follow, important. Do the same with the corpus databank of Native Speaker English (LOCNESS corpus). Compare lists of phrases and identify interferences and/or transfer from native language Romanian into English.
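An exercise of this kind could be supported by a short script such as the one sketched below, which extracts the immediate context of a keyword from two sets of texts and lists the phrases found only in the learner data. The sketch is an assumption-laden illustration: the function name keyword_phrases, the window size, and the two sample sentences are invented, and neither RoCLE nor LOCNESS data are used here.

```python
import re
from collections import Counter

def keyword_phrases(texts, keyword, window=1):
    """Count phrases made of the keyword plus `window` words on each side."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, tok in enumerate(tokens):
            if tok == keyword:
                start = max(0, i - window)
                counts[" ".join(tokens[start:i + window + 1])] += 1
    return counts

# Invented stand-ins for a learner corpus and a native-speaker corpus.
learner_texts = [
    "The author wants to show that the paper is important in special for students.",
]
native_texts = [
    "The author wants to show that the paper is especially important for students.",
]

learner = keyword_phrases(learner_texts, "important")
native = keyword_phrases(native_texts, "important")

# Phrases attested only in the learner data are candidates for transfer from Romanian.
only_learner = set(learner) - set(native)
print(sorted(only_learner))
```

Students would still need to judge each candidate phrase themselves, since a phrase absent from a small native-speaker sample is not necessarily an error.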

Corpora as Control for Reference Instruments

Students can be given the task of analyzing databanks in English L1 and English L2 in order to check vocabulary/phraseology listed in dictionary entries or academic phrase banks.

Corpora as Support for Academic Writing Tools

Students can have access to a specialized corpus within an electronic academic writing tool (see Chitez et al. 2015b) that can give them immediate support with linguistic problems encountered during the actual writing process.

6 Conclusions

We have used the RoCLE to exemplify the conception of corpus-based exercises to be completed in an academic writing class. A corpus of EFL texts containing specific types of genres (e.g., argumentative and literature essays) can be valuable to language tutors dealing with these text types.

As has been shown in this viewpoint paper, if we look at several highly used prepositions, there are numerous cases of collocation-pattern transfers from native Romanian into English. The analysis can be replicated for other grammatical elements or units. The RoCLE can also be employed for genre-specific analyses, such as the typology and use of phraseology in argumentative versus literature essays. In our case analysis, we showed that authorship (use of personal pronoun I) is rendered through genre-specific expressions. In the third example, we looked at the evaluation move in the sub-corpus of argumentative essays (RoCLE-ARG). Results showed frequent constructions (e.g., in conclusion) or multi-word Romanian borrowings (e.g., taking everything into account) and the tendency to replace overt concluding with recommending strategies.

The final recommendations on how to use corpora to improve students’ academic writing include free corpus consultation to increase awareness of target-language phraseology and vocabulary and comparison between corpora and dictionaries for lexical accuracy. A further application of corpora for academic writing, even if it is beyond the scope of the present article, would be to integrate corpora into academic writing tools.

It is essential to highlight that students can benefit from the use of corpora simply by learning to access, use, or create them. It is the teachers’ task to guide the learning process towards areas that are relevant for the academic writing field, such as the ones exemplified in this paper (contrastive collocation patterns, genre-specific phraseology, and rhetoric move constructions). Thus, the applicability of corpora is linked to the teacher’s creativity and openness to new methodologies in the classroom. In the Romanian EFL context, such approaches would be quite innovative, triggering or resulting in meaningful research in both corpus linguistics and academic writing or leading to unexplored but effective teaching strategies.