
1 Introduction

Pragmatics, in its broad sense, is related to and depends on the choice of linguistic features in a text to achieve meaning in discourse. Translational language has been shown to exhibit a variety of linguistic properties which indicate that it is a “third code”, distinct from both the source and the target language (Frawley 1984). As Hansen and Teich (2001: 44) observe, “It is commonly assumed in translation studies that translations are specific kinds of texts that are not only different from their original source language (SL) texts, but also from comparable original texts in the same language as the target language (TL)”. Recent studies of linguistic features at the lexical, syntactic and discourse levels, mainly based on translated English, have motivated the formulation of translation universal (TU) hypotheses such as simplification, explicitation, normalisation, sanitisation, under-representation, levelling out/convergence, and source language shining through.

Simplification refers to the “tendency to simplify the language used in translation” (Baker 1996: 181–182), and as a result translated language is expected to be simpler than non-translated target language lexically, syntactically and/or stylistically (cf. also Blum-Kulka and Levenston 1983; Laviosa-Braithwaite 1997; Laviosa 1998). Explicitation is manifested by the tendency in translations to “spell things out rather than leave them implicit” (Baker 1996: 180), for example, through more frequent use of connectives and increased cohesion (cf. also Pym 2005; Chen 2006; He 2003; Dai and Xiao 2010; Xiao 2010). Normalisation means that translational language displays a “tendency to exaggerate features of the target language and to conform to its typical patterns” so that translated texts tend to avoid creative language use and thus appear more “normal” than non-translated texts (Baker 1996: 183).

The TU hypothesis of sanitisation suggests that translated texts, with lost or reduced connotational and hidden meaning, are “somewhat ‘sanitised’ versions of the original” (Kenny 1998: 515). Under-representation, also known as the “unique items hypothesis”, means that linguistic features that are unique to the target language, i.e. that do not exist or are rarely used in the source language, may be under-represented in translations in comparison with comparable non-translated texts in the target language (Mauranen 2007: 41–42; Xiao 2012). Levelling out refers to “the tendency of translated text to gravitate towards the centre of a continuum” (Baker 1996: 184), which Laviosa (2002: 72) calls “convergence”, i.e. the “relatively higher level of homogeneity of translated texts with regard to their own scores on given measures of universal features.”

Another common feature of translations, and the focus of the present study, is captured by the “source language shining through” hypothesis put forward by Teich (2003: 207), which states that “[in] a translation into a given target language (TL), the translation may be oriented more towards the source language (SL), i.e. the SL shines through.” For example, Teich (2003: 207) finds that English-to-German and German-to-English translations both exhibit “a mixture of TL normalisation and SL shining through”.

Hopkinson (2007: 13) also notes, regarding translation from Czech (L1) into English (L2), that “[the] product of L1 – L2 translation will thus usually contain examples of what is colloquially termed ‘translationese’, i.e. a non-standard version of the target language that is to a greater or lesser extent affected by the source language.” His analysis focuses on three key factors in interference: poor reference materials, translators’ generalisations of false hypotheses, and systemic-structural differences between Czech and English. The examples analysed cover interference in lexis, word-formation, grammar and syntax. However, his analysis is framed entirely within the interlanguage model and does not address interference from the source language to the target language in translation.

Indeed, source language interference is prevalent in translation. As Toury (1979: 226) notes, “virtually no translation is completely devoid of formal equivalents, i.e., of manifestations of interlanguage.” According to Toury’s (1995: 275–276) “law of interference”:

In translation, phenomena pertaining to the make-up of the source text tend to be transferred to the target text. […] The more the make-up of a text is taken as a factor in the formulation of its translation, the more the target text can be expected to show traces of interference.

Toury’s law gives a vivid description of this feature of translations and casts new light on translation studies. However, Toury does not explicitly elaborate on his law of interference (cf. Teich 2003). Teich suggests that one of the factors that makes translations different from comparable native texts in the target language is that the source language, to a greater or lesser extent, “shines through” in translation. In other words, the language used in translation is not as idiomatic and prototypical as that of texts originally composed in the same language, because translated language contains deviations from general TL patterns, with the SL as the source of such deviations.

Nevertheless, the TU hypothesis of source language interference has not attracted much attention in translation studies, possibly because TU research has until recently focused on, or indeed been confined to, translation between closely related European languages, which may display less marked contrasts than typologically different languages. As English, German and Czech all belong to the Indo-European language family, the studies reviewed above arguably provide only limited evidence for generalising source language shining through as a “universal” feature of translation.

This article seeks to approach the phenomenon of source language interference on the basis of evidence from two genetically distant languages, namely English and Chinese, with the aim of answering the following two questions:

  1. Is the phenomenon of source language interference also observable in translational Chinese?

  2. If so, to what extent does source language interference occur in English-to-Chinese translation?

In addressing these research questions, the present study investigates source language interference in translated texts, at both lexical and grammatical levels, in English-to-Chinese translation on the basis of comparable corpora and parallel corpora of the two languages. The evidence from the two genetically distant languages is of critical importance in generalising source language interference as a potential translation universal.

Following this introduction, the chapter first introduces the research method and corpora used in this study (Sect. 2). Sections 3 and 4 are respectively concerned with contrastive analyses of a range of linguistic features at lexical and grammatical levels, in translational and native Chinese, which demonstrate evidence of source language interference in English-to-Chinese translation. A case study of passive constructions is also undertaken on the basis of parallel corpus data in an attempt to quantify the extent of source language interference. Section 5 concludes the study by summarising the major research findings.

2 Research Method and Data

In order to address the research questions set out in Sect. 1, the present study takes a composite approach that integrates monolingual comparable corpus analysis and parallel corpus analysis, as advocated in McEnery and Xiao (2002). The monolingual comparable corpus approach compares matching corpora of translated and native Chinese in an attempt to uncover salient features of translations, while the parallel corpus approach compares source and target texts in English-to-Chinese parallel corpora to establish the extent of source language interference, i.e. the extent to which the features of translated texts are transferred from the source language. Four corpora are used in this study, two comparable corpora and two parallel corpora, which are introduced below.

The monolingual comparable corpora are the Lancaster Corpus of Mandarin Chinese (LCMC) and the ZJU Corpus of Translational Chinese (ZCTC), which represent native and translational Chinese respectively. LCMC is designed as a Chinese match for the FLOB corpus of British English (Hundt et al. 1998) for use in cross-linguistic contrast of English and Chinese (McEnery and Xiao 2004), while ZCTC is created as a translational counterpart of LCMC with the explicit aim of studying features of translational Chinese (Xiao et al. 2010).

Each of these two Chinese corpora contains ca. one million words in 500 text samples of approximately 2,000 words, taken proportionally from 15 text categories published in China in the 1990s, as shown in Table 1. As can be seen, the two corpora are comparable in terms of both overall size and the proportions of different genres. English is the source language of about 99 % of the text samples in the ZCTC corpus, which also includes a small number of texts translated from other languages to mirror the reality of translation practice in China.

Table 1 LCMC and ZCTC corpus design

Of the 15 genres covered in the corpora, text categories A–C are press material; D–H represent general prose; J is academic writing; and K–R represent various types of fiction. These registers can be further merged into two broad text categories, namely non-literary (A–J) and literary (K–R). The contrastive analyses presented in the following sections are based on finer-grained genres or broader text categories as appropriate.

In addition to these comparable corpora of Chinese, two English-Chinese parallel corpora are also used in a case study that attempts to determine the extent of source language transfer of passive constructions in English-to-Chinese translation. They are the Babel English-Chinese Parallel Corpus (Babel) and the General Chinese-English Parallel Corpus (GCEPC), which are both annotated with part-of-speech information for English and Chinese texts and aligned at the sentence level.

The Babel corpus consists of 327 English articles and their translations into Mandarin Chinese. Of these, 115 texts were collected from the bilingual magazine World of English between October 2000 and February 2001, while the remaining 212 texts were collected from Time magazine between September 2000 and January 2001. The corpus contains a total of 253,633 English words in the source texts and 287,462 Chinese words in the translations (see Xiao 2005). As the corpus comprises mixed genres that are not encoded in the corpus, it can be used to investigate translation patterns in English-to-Chinese translation but not to explore genre variation.

The GCEPC corpus, created by Beijing Foreign Studies University, allows for the study of such variation. It is the largest existing parallel corpus of English and Chinese, containing approximately 20 million English words and Chinese characters. It is a bidirectional parallel corpus comprising four subcorpora, namely Chinese-to-English Literature, Chinese-to-English Non-literature, English-to-Chinese Literature, and English-to-Chinese Non-literature (Wang 2004; Wang and Qin 2010). As we are interested in how Chinese translations are affected by English source texts, only the two English-to-Chinese subcorpora are used, amounting to 12 million words/characters, of which 60 % belong to English-to-Chinese Literature and 40 % to English-to-Chinese Non-literature (cf. Wang 2004: 40).

Having introduced the research method and corpora used, we will move on to explore, in the sections that follow, the features of translational Chinese that are of relevance to the investigation of source language interference. We will first consider linguistic features at lexical level.

3 Lexical Features

This section compares four lexical features of translational and native Chinese as represented in the ZCTC and LCMC corpora, namely mean word length, prefixes and suffixes, pronouns, and word clusters. Mean word length is considered because it is often a lexical indicator of text readability and is thus related to the simplification hypothesis. Affixes are included because Chinese is a non-inflectional language and is therefore expected to be less productive in its use of prefixes and suffixes than English, the source language of the translated texts in the translational Chinese corpus ZCTC. As pronouns are a linguistic device for achieving cohesion, their frequency of use reflects the extent of explicitation. Finally, frequent use of word clusters is sometimes associated with the translational tendency to strive for fluency at the discourse level (e.g. Baker 2004).

3.1 Mean Word Length

Mean word length is a basic statistic in text analysis which is readily available in a wordlist generated using WordSmith. It has also been used in translation studies to compare native and translated Chinese texts. For example, Wang and Qin (2010: 168) observe that the mean word length is marginally greater in translated Chinese than in native Chinese, with a higher proportion of monosyllabic words in native Chinese and a higher proportion of disyllabic words in translated texts. This observation is supported by our data.

As illustrated in Fig. 1, the mean word length in translated Chinese is slightly greater than in native Chinese (1.59 vs. 1.57), though the difference is not statistically significant. The same tendency holds in both non-literary (1.63 vs. 1.61) and literary (1.47 vs. 1.42) texts. The contrast between native and translated Chinese is more marked in literary texts, possibly because these genres contain more proper nouns, such as transliterated personal names and place names, which are longer than comparable words in native Chinese.
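To make the measure concrete, the following Python function is a minimal sketch of how mean word length can be computed from a word-segmented Chinese text. It assumes that tokens are separated by whitespace, that punctuation tokens are ignored, and that one character corresponds to one syllable; the function names and the sample sentence are hypothetical, and WordSmith's own word list statistics may be computed differently.

```python
import unicodedata


def is_punctuation(token: str) -> bool:
    """True if every character of the token is in a Unicode punctuation category."""
    return all(unicodedata.category(ch).startswith("P") for ch in token)


def mean_word_length(segmented_text: str) -> float:
    """Mean word length in characters for a whitespace-segmented Chinese text.

    Hypothetical helper: assumes one character = one syllable, so the result
    approximates mean length in syllables. Punctuation tokens are skipped.
    """
    words = [tok for tok in segmented_text.split() if not is_punctuation(tok)]
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)


# Hypothetical segmented sentence (not taken from LCMC or ZCTC).
sample = "我 今天 见 了 他 ， 精神 分外 爽快 。"
print(mean_word_length(sample))  # 12 characters / 8 words = 1.5
```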

Fig. 1 Mean word length in LCMC and ZCTC

Table 2 shows the distribution of words of various lengths in LCMC and ZCTC across the two broad text categories, literary and non-literary, together with their mean scores. As the subcorpora differ in size, relative frequencies (percentages) are compared. As can be seen, whether the native and translational corpora are taken as a whole or the two broad text categories are considered separately, monosyllabic and quadrisyllabic words are generally more common in native Chinese (the exception being quadrisyllabic words in literary texts). A keyword analysis suggests that monosyllabic words are more common in LCMC because native Chinese texts make more frequent use of Chinese surnames, which are typically monosyllabic, as well as high-frequency monosyllabic words such as 元 yuan ‘Chinese currency unit’ and 党 dang ‘(Communist) Party’. Many monosyllabic function words, however, are used more frequently in the translational corpus, e.g. the structural auxiliary 的 de and the personal pronouns 你 ni ‘you’, 我 wo ‘I, me’ and 她 ta ‘she, her’, which are all negative keywords in LCMC in relation to ZCTC. Quadrisyllabic words are more common in LCMC because non-literary texts in native Chinese make significantly more frequent use of idioms (see Xiao and Dai 2010), which typically consist of four characters. In contrast, disyllabic and trisyllabic words are more frequent in translated texts. The translational tendency towards long words is particularly marked for words of five or more syllables, though such words are infrequent in both native and translated Chinese.
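A sketch of how percentages like those in Table 2 can be derived from a list of word tokens is given below. The function name and the sample words are hypothetical; pooling all words of five or more characters into a single bucket mirrors the grouping used in the discussion, and length in characters is again assumed to stand in for length in syllables.

```python
from collections import Counter


def length_distribution(words: list[str], max_len: int = 5) -> dict[str, float]:
    """Percentage of word tokens at each length in characters (hypothetical helper).

    Words of max_len or more characters are pooled into one bucket labelled
    f"{max_len}+". Percentages rather than raw counts are returned so that
    subcorpora of different sizes can be compared.
    """
    counts = Counter(min(len(w), max_len) for w in words)
    total = sum(counts.values())
    labels = {n: (f"{n}+" if n == max_len else str(n)) for n in counts}
    return {labels[n]: 100 * counts[n] / total for n in sorted(counts)}


# Hypothetical tokens; 阿尔伯克基 (Albuquerque) stands in for a transliterated place name.
words = ["我", "今天", "见", "了", "他", "精神", "分外", "爽快", "心悦神怡", "阿尔伯克基"]
print(length_distribution(words))
# {'1': 40.0, '2': 40.0, '4': 10.0, '5+': 10.0}
```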

Table 2 Proportions of words of various lengths in LCMC and ZCTC

A key part-of-speech analysis shows that transliterated foreign personal names and place names, which are typically much longer than Chinese names, are on the key part-of-speech list of ZCTC in relation to LCMC. While the higher proportion of monosyllabic words in native Chinese makes the contrast in the mean word lengths in native and translational Chinese less marked, the inevitably more frequent but less varied use of transliterated foreign names in the translation process still results in a marginally greater mean word length in translated Chinese texts. In this sense, the general tendency in translational Chinese to use slightly longer words can be taken as evidence of source language interference.

3.2 Prefixes and Suffixes

A key part-of-speech analysis of the translational Chinese corpus in relation to the native Chinese corpus suggests that the suffix tag (K) is a key part-of-speech in the translational corpus. As can be seen in Fig. 2, which compares the frequencies of prefixes and suffixes in LCMC and ZCTC, there is a marked contrast between the two corpora in their frequency of affixes. The log-likelihood test (LL) indicates that the difference is highly significant (LL = 23.01, p < 0.001).
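The log-likelihood statistic reported here and in later sections can be sketched as follows. This is the common two-corpus formulation used in corpus linguistics, in which each observed frequency is compared with the frequency expected if the item were distributed in proportion to corpus size; the function name is our own (hypothetical), and the tools used in the study may differ in detail, for example in how zero frequencies are handled.

```python
import math


def log_likelihood(freq1: int, size1: int, freq2: int, size2: int) -> float:
    """Two-corpus log-likelihood (G2) for one item (hypothetical helper).

    freq1, freq2: observed frequencies of the item in corpus 1 and corpus 2.
    size1, size2: total number of tokens in each corpus.
    """
    total_freq = freq1 + freq2
    total_size = size1 + size2
    # Expected frequencies if the item occurred at the same rate in both corpora.
    expected1 = size1 * total_freq / total_size
    expected2 = size2 * total_freq / total_size
    ll = 0.0
    for observed, expected in ((freq1, expected1), (freq2, expected2)):
        if observed > 0:  # the term 0 * ln(0) is taken as 0
            ll += observed * math.log(observed / expected)
    return 2 * ll
```

The result is compared against the chi-square distribution with one degree of freedom, whose critical values are 3.84 (p < 0.05), 6.63 (p < 0.01) and 10.83 (p < 0.001).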

Fig. 2 Prefixes and suffixes in LCMC and ZCTC

Because Chinese is not a morphologically inflectional language, the more frequent use of prefixes and suffixes in translated Chinese texts is arguably a result of source language interference. This finding is in accord with Wang and Qin’s (2010: 175) observation that some morphemes in translated texts, e.g. suffixes such as -xing (a nominal suffix indicating property, similar to -ness / -ity in English), are so productive in Chinese translations because of the influence of English source texts that there is a tendency for them to replace the original expressions in native Chinese.

3.3 Pronouns

Among all parts of speech, pronouns display the most marked contrast between LCMC and ZCTC, with 49,582 and 70,401 instances in the two corpora respectively (LL = 3,707.69, p < 0.001). As pronouns serve to make discourse more cohesive (cf. Xiao 2012), translational language is hypothesised to make more frequent use of them. This section tests this hypothesis by exploring the overall distribution of pronouns in LCMC and ZCTC.

Figure 3 shows the overall distribution of pronouns in the two corpora. As can be seen, pronouns are distributed in a similar pattern in native and translated Chinese, with the most frequent use in fiction, followed by general prose and news, and the least frequent use in academic prose. At the same time, translated Chinese makes more frequent use of pronouns, whether the two corpora are taken as a whole or individual registers are considered separately. All differences are statistically significant (p < 0.001) according to log-likelihood tests.

Fig. 3 Pronouns in LCMC and ZCTC

The significantly more frequent use of pronouns (especially personal and demonstrative pronouns) can be taken as an indicator of translational explicitation (cf. Xiao 2012). Conversely, the relatively high frequency of pronouns in ZCTC in comparison with LCMC can also be regarded as a reflection of source language interference. This is because in native Chinese, unlike in English, the grammatical subject can be dropped once it has been established in the discourse, whereas the subject in an English source text is likely to be transferred to the translated text. This point is well illustrated in example (1a), an excerpt from A Madman’s Diary by the renowned Chinese writer Lu Xun.

(1a)

我看不见他，已经三十多年了；今天[我]见了，精神分外爽快。[我]才知道以前的三十多年，[我]全是发昏，然而[我]须十分小心。

Wo kanbujian ta, yijing sanshi duo nian le; jintian [wo] jian le, jingshen fenwai shuangkuai. [Wo] cai zhidao yiqian de sanshi duo nian, [wo] quan shi fahun, ran’er [wo] xu shifen xiaoxin.

(1b)

I have not seen it for over thirty years, so today when I saw it I felt in unusually high spirits. I begin to realize that during the past thirty odd years I have been in the dark; but now I must be extremely careful.

In example (1a), which was originally written in Chinese, the subject pronoun 我 wo ‘I’ is dropped after its first occurrence. Although the passage comprises more than one sentence, the subject pronoun in the first sentence holds the ensuing discourse together, so a competent Chinese speaker would have little difficulty understanding the passage. However, if the same message were translated into Chinese from English (1b), the translator would be very likely, under the influence of the English source text, to include all of the dropped subjects, shown in brackets in (1a). This is because English and Chinese have different conventions for using pronouns: English tends to repeat personal pronouns, whereas such repetition is dispreferred in Chinese, so that where a pronoun is repeated in an English text, Chinese either drops the pronoun or repeats a noun instead (cf. Liu 1991: 371).

3.4 Word Clusters

Word clusters are fixed and semi-fixed formulaic expressions based on collocations; they are also known as ‘lexical bundles’, ‘multiword units’, ‘prefabs’ and ‘n-grams’, among other terms. Scott (2009: 286) observes that “all words have a tendency to cluster together with some others”. Word clusters are defined purely structurally, on the basis of co-occurrence, with no regard to their semantic content. They can be computed automatically using corpus exploration tools such as WordSmith (Scott 2009). Generally speaking, the frequency of word clusters drops sharply as their length grows: for example, 4-word clusters are significantly less frequent than 3-word clusters, which are in turn substantially less frequent than 2-word clusters. Whether a word cluster counts as significant is usually determined by its recurring rate, i.e. a minimum frequency threshold such as 5 or 10 occurrences per million words. Another useful parameter in computing word clusters is their coverage rate, which measures how widely a word cluster is distributed across a corpus. It is expressed as the percentage of text samples in the corpus that contain a particular word cluster.
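The two parameters just described, recurring rate and coverage rate, can be illustrated with the following Python sketch. It assumes each text sample is already a list of word tokens and counts contiguous n-word sequences, treating punctuation and sentence boundaries no differently from words, a simplification that WordSmith and other tools may not share; the function name and the sample texts are hypothetical.

```python
from collections import Counter


def word_clusters(
    texts: list[list[str]], n: int, min_freq: int = 1, min_coverage: float = 0.0
) -> dict[tuple[str, ...], tuple[int, float]]:
    """Hypothetical helper: n-word clusters with (frequency, coverage %).

    min_freq: recurring-rate threshold (minimum number of occurrences).
    min_coverage: minimum percentage of text samples containing the cluster.
    """
    freq = Counter()      # total occurrences across all text samples
    doc_freq = Counter()  # number of text samples containing each cluster
    for tokens in texts:
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        freq.update(grams)
        doc_freq.update(set(grams))  # count each cluster at most once per sample
    result = {}
    for gram, count in freq.items():
        coverage = 100 * doc_freq[gram] / len(texts)
        if count >= min_freq and coverage >= min_coverage:
            result[gram] = (count, coverage)
    return result


# Four hypothetical text samples.
texts = [
    ["我们", "认为", "这", "是", "一个", "问题"],
    ["这", "是", "一个", "好", "主意"],
    ["他", "说", "这", "是", "真的"],
    ["我们", "走", "吧"],
]
print(word_clusters(texts, 2, min_freq=2))
# {('这', '是'): (3, 75.0), ('是', '一个'): (2, 50.0)}
```

Raising `min_coverage` (for example to 30 % for 2-word clusters) keeps only clusters spread across many samples, which is the kind of threshold behind the coverage comparisons in this section.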

In translation studies, Baker (2004) and Nevalainen (2005, cited in Mauranen 2007) both find that recurring word clusters are more commonly used in translations than in non-translated texts. This finding echoes Baroni and Bernardini’s (2003: 379) observations based on their investigation of collocations in translated and native texts, which further distinguish between two types of repetition pattern:

[…] translated language is repetitive, possibly more repetitive than original language. Yet the two differ in what they tend to repeat: translations show a tendency to repeat structural patterns and strongly topic-dependent sequences, whereas originals show a higher incidence of topic-independent sequences, i.e. the more usual lexicalised collocations in the language.

Xiao (2011) notes that this finding also applies to translational Chinese, showing that word clusters of two to six words are significantly more frequent in translational Chinese than in native Chinese. The more frequent use of word clusters in the translational corpus is also evidenced by a keyword cluster analysis. It is probably a result of the translation process, in which “translators are likely to opt for safe, typical patterns of the target language and shy away from creative or playful uses”, so that translators tend to make heavy use of “pre-packed, recurring stretches of language” (Baker 2007: 14). In our view, however, an equally plausible explanation for the more frequent use of word clusters in translational Chinese is that translations are influenced by the English source language.

As can be seen in Fig. 4, which compares the use of 2–4-word clusters in the three comparable corpora (clusters of more than four words are infrequent in these million-word corpora and are thus excluded from the graph), all three types of word clusters are most frequent in FLOB and least frequent in LCMC, with ZCTC in between.

Fig. 4 Word clusters in Chinese and English

In addition to their significantly higher frequencies in translational Chinese, word clusters show two other interesting characteristics. On the one hand, high-frequency word clusters (defined here as those accounting for at least 0.01 % of the respective corpus) are more common in Chinese translations. As can be seen in Fig. 5, the number of high-frequency word clusters in ZCTC (a total of 413, including 403 2-word clusters and ten 3-word clusters) is greater than that in LCMC (a total of 291, including 287 2-word clusters and four 3-word clusters), a statistically significant difference (LL = 21.96, p < 0.001). Given that translated Chinese tends to use high-frequency words (Xiao 2010), it is hardly surprising to find more common use of high-frequency word clusters in ZCTC. While word clusters in different languages cannot be directly compared, it is also interesting to note in Fig. 5 that the use of high-frequency word clusters in the translational Chinese corpus ZCTC is closer to that in the English corpus FLOB, which yields 522 high-frequency clusters (498 2-word clusters and 24 3-word clusters).
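As a rough worked example of the 0.01 % cut-off, assume a corpus of N = 1,000,000 tokens; LCMC and ZCTC each contain ca. one million words, so this round figure is an approximation rather than their exact size. A cluster then counts as high-frequency if it occurs at least

```latex
f_{\min} = 0.01\,\% \times N = 10^{-4} \times 10^{6} = 100
```

times in the corpus.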

Fig. 5 High-frequency word clusters in English and Chinese

On the other hand, word clusters have a much wider coverage in translated Chinese in comparison with native Chinese, which is possibly a result of the influence of English (see Figs. 6 and 7). As can be seen in the figures, because of the low overall frequencies of 2-word clusters with a minimum coverage rate of 50 % (18, 20 and 44 instances in LCMC, ZCTC and FLOB respectively) and 3-word clusters with a minimum coverage rate of 20 % (zero, four and 13 instances respectively), their frequencies are quite similar in the three corpora. However, there is a marked contrast in the frequencies of 2-word clusters with a minimum coverage rate of 30 % (35, 65 and 158 instances respectively) and 3-word clusters with a minimum coverage rate of 10 % (eight, 23 and 90 instances respectively) in the three corpora. This contrast displays an accelerating tendency as the coverage rate drops: there are 101, 170 and 332 occurrences of 2-word clusters with a minimum coverage rate of 20 %, and 61, 132 and 425 instances of 3-word clusters with a minimum coverage rate of 5 %, in the native and translated Chinese corpora and the comparable English corpus respectively.

Fig. 6 Coverage of 2-word clusters in English and Chinese

Fig. 7 Coverage of 3-word clusters in English and Chinese

The higher frequency and wider coverage of word clusters in translational Chinese suggest that translators demonstrate a higher propensity for striving for fluency than writers of native Chinese texts. Translators are also likely to be under the influence of English, the principal source language of the ZCTC corpus.

This section has explored four lexical features in the native and translational Chinese corpora, namely mean word length, affixes, pronouns, and high-frequency, high-coverage word clusters. The results show that translated texts have a slightly greater mean word length and make significantly more frequent use of affixes, pronouns and word clusters. While alternative accounts are plausible (e.g. translational explicitation for the overuse of pronouns in translated Chinese), these lexical features collectively provide evidence in support of the TU hypothesis of source language interference at the lexical level. The next section explores three linguistic features at the grammatical level.

4 Grammatical Features

This section examines three grammatical features, namely mean sentence segment length, the predicative 是 shi (“be”) structure, and the 被 bei passive construction.

4.1 Mean Sentence Segment Length

Mean sentence length has often been used as a parameter in research on translational language, but different studies have reported different results. For example, Malmkjær (1997) observes that the use of stronger punctuation in translation entails shorter sentences in translational language, while Laviosa (1998) notes that the mean sentence length is lower in translated newspaper articles than in comparable original texts but higher in translated literature than in original narrative texts. According to Xiao (2010), while the mean sentence length is slightly greater in LCMC than in ZCTC, the difference is not statistically significant (t = −1.41 for 28 d.f., p = 0.17).
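The reported 28 degrees of freedom are consistent with an independent-samples t-test on two groups of 15 values, for instance one mean sentence length per text category in each corpus; this is our reading of the design, not a detail given in the source. A minimal sketch of Student's t statistic with pooled variance, using only the Python standard library and a hypothetical function name, is:

```python
import math
from statistics import mean, variance


def pooled_t_test(sample1: list[float], sample2: list[float]) -> tuple[float, int]:
    """Hypothetical helper: Student's independent-samples t assuming equal variances.

    Returns (t, df) with df = n1 + n2 - 2; a negative t means sample1 has the
    smaller mean.
    """
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance (statistics.variance divides by n - 1).
    pooled = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    std_err = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(sample1) - mean(sample2)) / std_err, df
```

For a p-value, `scipy.stats.ttest_ind` with `equal_var=True` returns the same statistic together with a two-sided p-value.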

In native Chinese texts, structurally complete sentences do not always end with full stops, because commas are often used in place of full stops. In translated Chinese texts, by contrast, full stops in English source texts tend to be transferred into the translations, which explains why full stops are significantly less frequent (LL = 202.29, p < 0.001) but commas are substantially more common (LL = 2,555.28, p < 0.001) in LCMC, as shown in Fig. 8.

Fig. 8 Full stops and commas in LCMC and ZCTC

Chen (1994) finds that three quarters of sentences ending with a full stop or semicolon contain two or more structurally complete sentence segments. For instance, in example (2)¹, the Chinese version contains only one sentence but actually expresses three relatively complete meanings; accordingly, three sentences are used in the English version.

(2a)

人们大多通过电影认识叶锦添，(2) 尤其在2001年凭着《卧虎藏龙》中典雅清幽的东方意象，夺下华人世界第一座奥斯卡最佳美术指导奖后，(3) 各地蜂拥而至的邀约，更快速将他推向全球舞台。

Renmen daduo tongguo dianying renshi Ye Jintian, (2) youqi zai 2001 nian ping-zhe “Wo Hu Cang Long” zhong dianya qingyou de dongfang yixiang, duoxia huaren shijie diyi zuo Aosika zui jia meishu zhidao jiang hou, (3) gedi fengyong’erzhi de yaoyue, geng kuaisu jiang ta tuixiang quanqiu wutai.

(2b)

Most people know Tim Yip through films. (2) In particular, in 2001 he became the first ever person from the Chinese world to win the US Academy Award for Best Art Direction, received for the elegant Oriental imagery he brought to Crouching Tiger, Hidden Dragon. (3) Since then, demand for his services has gone into hyperdrive, accelerating the spread of his fame and appeal worldwide.

Hence, Wang and Qin (2010: 169) argue that for languages characterised by parataxis, such as Chinese (Liu 1991), sentence segment length is more meaningful than sentence length. This section compares the mean sentence segment lengths in native and translated Chinese. In this study, the number of sentence segments is the sum of the number of sentences and the number of commas. Figure 9 compares the mean sentence segment lengths of native and translated texts. Clearly, the mean sentence segment length is greater in translated Chinese than in native Chinese in all four registers, particularly in academic prose. This finding is in line with Wang and Qin’s (2010: 169) observations of literary and non-literary translations. One possible explanation is source language interference: the mean sentence segment length is greater in English than in Chinese (25.59 words in FLOB but only 13 words in LCMC), and the corresponding registers in English (e.g. academic writing) customarily make use of long sentences.
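The definition of sentence segments used here (sentences plus commas) can be sketched in Python as follows. The sketch assumes a whitespace-segmented text, approximates the number of sentences by counting the sentence-final marks 。, ！ and ？, treats only the full-width comma ， as an additional segment break, and excludes all punctuation tokens from the word count; these choices and the function names are hypothetical, not the study's documented procedure. The sample text is adapted from example (1a).

```python
import unicodedata

SENTENCE_FINAL = {"。", "！", "？"}  # assumed sentence-final marks
COMMA = "，"  # assumed: only the full-width comma adds a segment break


def is_punctuation(token: str) -> bool:
    """True if every character of the token is Unicode punctuation."""
    return all(unicodedata.category(ch).startswith("P") for ch in token)


def mean_segment_length(segmented_text: str) -> float:
    """Hypothetical helper: mean sentence segment length in words.

    Segments = sentences + commas, with sentences approximated by the number
    of sentence-final punctuation tokens. Punctuation is not counted as words.
    """
    tokens = segmented_text.split()
    n_sentences = sum(tok in SENTENCE_FINAL for tok in tokens)
    n_commas = sum(tok == COMMA for tok in tokens)
    n_words = sum(not is_punctuation(tok) for tok in tokens)
    segments = n_sentences + n_commas
    return n_words / segments if segments else 0.0


sample = "我 看不见 他 ， 已经 三十 多 年 了 ； 今天 见 了 ， 精神 分外 爽快 。"
print(round(mean_segment_length(sample), 2))  # 14 words / 3 segments ≈ 4.67
```

Note that under this definition the semicolon ； is neither a word nor a segment break, so it only shortens the word count.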

Fig. 9 Mean sentence segment lengths in LCMC and ZCTC

4.2 The Predicative shi Structure

The predicative 是 shi (“be”) is the most frequently used verb and also the second most frequent word in the Chinese language (Xiao et al. 2009). The predicative structure is a sentence with the predicative shi as the main predicate. This section compares the distribution of the predicative structure in native and translational Chinese.

Figure 10 shows the normalised frequencies (per 100,000 words) of the predicative structure in LCMC and ZCTC. As can be seen, when the two corpora are taken as a whole, the shi structure is significantly more frequent in translated texts (LL = 16.96, p < 0.001). The structure is also more frequent in translations in both the literary and non-literary subcorpora, though the contrast in literary texts is less marked than in non-literary texts, possibly because the predicative structure is very frequently used in both original and translated literary Chinese texts.
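Normalisation to a common base makes frequencies from subcorpora of different sizes comparable. With f the raw frequency of the structure and N the number of tokens in the (sub)corpus, the normalised frequency per 100,000 words is

```latex
f_{\text{norm}} = \frac{f}{N} \times 100\,000
```

so that, for a hypothetical subcorpus of 500,000 words containing 250 instances, f_norm = 250 / 500,000 × 100,000 = 50.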

Fig. 10 The predicative shi structure in LCMC and ZCTC

The more frequent use of the predicative shi structure in translated Chinese is a result of transfer of the copular verb be from English source texts. Like the predicative shi in Chinese, the copular be is a high-frequency verb in English, but it is used in a broader range of contexts than shi. For example, be can be used as a main verb or an auxiliary, whereas the predicative shi is not used as an auxiliary. Consequently, English-speaking learners of Chinese tend to overuse the predicative shi structure, e.g. *我是饿了 Wo shi e le ‘I am hungry’; *我今年是 20 岁 Wo jinnian shi ershi sui ‘I am 20 years old this year’. In examples like these, native Chinese speakers would not use the predicative shi, but would rather say 我饿了 Wo e le and 我今年20岁 Wo jinnian ershi sui, unless they want to use the predicative structure for emphasis. This is because the correspondence between word classes and syntactic functions is less rigid in Chinese than in English, so adjectives and even nouns can be used directly as predicates without the predicative verb shi, whereas in English the copular verb is mandatory in such cases. Like the Chinese interlanguage of English-speaking learners, Chinese texts translated from English are characterised by overuse of the predicative shi structure, as illustrated in the following (a) examples cited from the ZCTC corpus.

(3a)
…从来都不是容易的。
…conglai dou bu shi rongyi de
…ever all not SHI easy DE
‘…has never been easy.’

(3b)
…从来都不容易。
…conglai dou bu rongyi
…ever all not easy
‘…has never been easy.’

(4a)
这种心悦神怡的感觉是非常美妙的。
zhe zhong xinyueshenyi de ganjue shi feichang meimiao de
this kind joyful DE feeling SHI very beautiful DE
‘This kind of joyful feeling is very beautiful.’

(4b)
这种心悦神怡的感觉非常美妙。
zhe zhong xinyueshenyi de ganjue feichang meimiao
this kind joyful DE feeling very beautiful
‘This kind of joyful feeling is very beautiful.’

(5a)
其效果也是不错的。
qi xiaoguo ye shi bu-cuo de
its effect also SHI not-bad DE
‘Its effect is also not bad.’

(5b)
其效果也不错。
qi xiaoguo ye bu-cuo
its effect also not-bad
‘Its effect is also not bad.’

In the above examples, although using the predicative shi is not grammatically incorrect, (3a–5a) are certainly less natural and idiomatic than (3b–5b) when there is no need for emphasis or contextual contrast; these sentences simply read like translations.

The LCMC and ZCTC corpora contain 456 and 578 instances respectively of the pattern “shi + (adverb) + adjective + DE + punctuation mark”, a typical predicative structure in Chinese. The difference between the two corpora is statistically significant (LL = 12.19, p < 0.001), suggesting that the predicative structure is used more frequently in Chinese translations. To what extent, then, is the structure transferred from the English source texts? The Babel parallel corpus shows 568 occurrences of the structure “be + adjective”, of which 197 (34.7 %) are translated into Chinese as the predicative shi structure, i.e. more than one third of the total. Examples similar to (3a–5a) are abundant in the Babel corpus. These are clearly a result of source language interference.
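A corpus query for the pattern “shi + (adverb) + adjective + DE + punctuation mark” could be sketched over POS-tagged tokens as below. The tag labels (`d` for adverbs, `a` for adjectives, `w` for punctuation) are assumptions loosely modelled on common Chinese tagsets, and the optional adverb slot is read as at most one adverb; the actual query used for LCMC and ZCTC may differ.

```python
# Hypothetical tag labels; adjust to the tagset of the corpus being searched.
ADV, ADJ, PUNCT = "d", "a", "w"


def find_shi_adj_de(tagged):
    """Return indices of 是 that start shi + (adverb) + adjective + 的 + punctuation.

    tagged: list of (word, tag) pairs for one text.
    """
    hits = []
    for i, (word, _tag) in enumerate(tagged):
        if word != "是":
            continue
        j = i + 1
        if j < len(tagged) and tagged[j][1] == ADV:  # optional single adverb
            j += 1
        if (j + 2 < len(tagged)
                and tagged[j][1] == ADJ
                and tagged[j + 1][0] == "的"
                and tagged[j + 2][1] == PUNCT):
            hits.append(i)
    return hits


# Example (4a), with illustrative tags.
example_4a = [("这", "r"), ("种", "q"), ("心悦神怡", "i"), ("的", "u"),
              ("感觉", "n"), ("是", "v"), ("非常", "d"), ("美妙", "a"),
              ("的", "u"), ("。", "w")]
print(find_shi_adj_de(example_4a))  # [5]
```

Summing the hits over each corpus yields raw frequencies of the kind reported above (456 and 578), which can then be compared with a log-likelihood test.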

4.3 The bei Passive Construction

This section considers Chinese passives marked with bei. Passives that profile the agent are conventionally called long passives, while those that do not are known as short passives (Xiao et al. 2006). Bei passives can take either the long or the short form.

Figure 11 shows the proportions of short and long passives in native and translated Chinese; for comparison, the corresponding figures for the native English corpus FLOB are also included. As can be seen, although short passives are more frequent than long passives in both native and translated Chinese, the proportion of short passives in ZCTC is significantly greater than in LCMC (LL = 63.1, p < 0.001). The higher proportion of short passives in translated Chinese is clearly a result of source language interference, because the short passive is the statistical norm of passive use in English (see Xiao et al. 2006), accounting for over 90 % of the total, as shown in Fig. 11. In English, the passive is an expressive strategy used when the agent is unknown or need not be mentioned. In Chinese, in contrast, three of the five syntactic passive markers (wei…suo, jiao, rang) can only occur in long passives, while the proportions of short passives for the other two (60.6 % for bei and 57.5 % for gei) are considerably lower than that of English passives (Xiao et al. 2006). As earlier Chinese grammarians Lü and Zhu (1979) and Wang (1985) noted, the agent must be included in the Chinese passive, though this constraint has since become more relaxed. When it is hard to identify the agent, vague expressions such as ren ‘person, someone’ or renmen ‘people’ are specified as the agent, a practice that seldom occurs in English passive use. Where English uses the passive without profiling the agent, Chinese tends to avoid the passive.
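The distinction between short and long bei passives can be approximated automatically. The heuristic below is an assumption, not the procedure of Xiao et al. (2006): it treats 被 immediately followed by a verb as a short passive and 被 followed by anything else (typically an agent noun phrase) as a long passive. Because it matches only the standalone token 被, lexicalised words such as 被告 beigao are not counted, provided the segmenter keeps them as single words.

```python
# Hypothetical verb tag; adjust to the corpus tagset.
VERB_TAGS = {"v"}


def classify_bei(tagged, i):
    """Heuristically label the 被 at index i as 'short' or 'long'."""
    if i + 1 >= len(tagged):
        return None  # 被 at the very end: cannot classify
    return "short" if tagged[i + 1][1] in VERB_TAGS else "long"


def short_passive_share(tagged):
    """Proportion of classifiable bei passives that are short."""
    labels = [classify_bei(tagged, i)
              for i, (word, _tag) in enumerate(tagged) if word == "被"]
    labels = [label for label in labels if label is not None]
    if not labels:
        return None
    return labels.count("short") / len(labels)


# Constructed examples with illustrative tags.
short = [("他", "r"), ("被", "p"), ("逮捕", "v"), ("了", "u")]
long_ = [("他", "r"), ("被", "p"), ("警察", "n"), ("逮捕", "v"), ("了", "u")]
print(classify_bei(short, 1), classify_bei(long_, 1))  # short long
print(short_passive_share(short + long_))  # 0.5
```

An adverb between 被 and the verb would make this rule misclassify a short passive as long, so a real count would need a more careful rule or manual checking.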

Fig. 11 Short and long forms of bei passives

Figure 12 compares the pragmatic meanings expressed by bei passives in LCMC and ZCTC and by English be passives in FLOB. As can be seen, the proportions of the pragmatic meaning categories differ significantly across the three corpora (LL = 212.28 for 2 d.f., p < 0.001), with the translated Chinese corpus positioned between the native Chinese and native English corpora, and with particularly marked contrasts in the neutral and negative categories. Passives have different functions in English and Chinese. English passives primarily mark a formal, objective and impersonal style and are thus pragmatically neutral, whereas Chinese passives constitute an “inflictive voice” that tends to express a negative pragmatic meaning, evaluating the event being described as undesirable, unfavourable or adversative (Xiao et al. 2006). This is because the prototypical passive marker bei is derived from a verb in ancient Chinese meaning ‘suffer’. Consequently, many disyllabic words with bei in modern Chinese refer to something undesirable, e.g. beibu ‘be arrested’, beifu ‘be captured’, beigao ‘the accused’, beihai ‘be victimised’, and beipo ‘be forced’, though the semantic constraint on passive use in modern Chinese is no longer as rigid as before (Xiao et al. 2006).
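A log-likelihood test across several corpora and meaning categories, as reported for Fig. 12, is a G-test on a contingency table. The sketch below computes G² and its degrees of freedom, (rows − 1) × (columns − 1), for any table of counts; the counts behind Fig. 12 are not given in the text, so the example tables are purely hypothetical.

```python
import math


def g_test(table):
    """Return (G2, degrees of freedom) for an r x c table of counts.

    table: one row per corpus, one column per category
    (e.g. negative / neutral / positive pragmatic meaning).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # 0 * ln(0) is taken to be 0
                expected = row_totals[i] * col_totals[j] / grand_total
                g += observed * math.log(observed / expected)
    df = (len(table) - 1) * (len(col_totals) - 1)
    return 2 * g, df


# Hypothetical counts: identical distributions give G2 = 0.
print(g_test([[10, 20, 30], [10, 20, 30]]))  # (0.0, 2)
```

The two-corpus LL for a single item is the special case of a 2 × 2 table (item versus all other words in each corpus).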

Fig. 12 Pragmatic meanings expressed by bei passives

Native and translated Chinese texts also differ in the frequency of passives and in their distribution across genres. Figure 13 shows the normalised frequencies of passives in different genres in the two Chinese corpora. The overall mean frequency of passives is significantly greater in translated Chinese than in native Chinese (LL = 69.59, p < 0.001). Given that passives are over ten times as frequent in English as in Chinese (Xiao et al. 2006: 141–142), it is hardly surprising that the translated texts in ZCTC (99 % translated from English) make more frequent use of passives than original Chinese writing. The most marked contrasts between native and translated Chinese in the distribution of passives are found in reports and official documents (H), press reviews (C) and academic prose (J), where passives are significantly more common in translated Chinese, and in detective stories (L), where passives are substantially more frequent in native Chinese.

Fig. 13 Distribution of bei passives in LCMC and ZCTC

Such distribution patterns of passives in native and translational Chinese are closely related to the different functions of passives in Chinese and English, the overwhelmingly dominant source language in our translational corpus. Since mystery and detective fiction (L) is largely concerned with victims who suffer various kinds of inflictive events, which are usually described using passives in Chinese, it is hardly surprising that the inflictive voice is more common in this genre in native Chinese. On the other hand, the expository genres showing the most marked contrast between translational and native Chinese, namely reports and official documents (H), press reviews (C) and academic prose (J), are all genres of formal writing that make greater use of passives in English. When texts of such genres are translated into Chinese, passives tend to be carried over and overused because of source language interference or shining through, although a native speaker of Chinese would not normally use the passive to express similar meanings. For example, the translated sentence 该 证书 就 必须 被 颁发 (this certificate then must PASSIVE issue) (ZCTC_H) is clearly a direct translation of the English passive “Then the certificate must be issued”. To express this meaning, a native Chinese speaker would very likely avoid the passive: 该 证书 就 必须 颁发 (this certificate then must issue) (Xiao and Dai 2010; Xiao 2010: 28).

The differences between native and translated Chinese in their use of bei passives discussed above can reasonably be regarded as the result of source language interference arising from cross-linguistic differences between English and Chinese (Dai and Xiao 2011). To what extent, then, does source language interference occur in English-to-Chinese translation, i.e. to what extent are bei passives in Chinese translations transferred from English source texts? The remainder of this section addresses this question on the basis of English-Chinese parallel corpora.

A search for the Chinese passive marker bei in the Babel parallel corpus returned 526 instances in the Chinese translations, which can be divided into two categories according to whether a passive form is used in the English source text. A total of 446 instances are transferred from passives in the English source texts (including the structure of be or other copular verbs such as get, become, feel, look, remain and seem followed by a past participle). For the remaining 80 instances, no passive form is used in the English source texts (cf. Dai and Xiao 2011). Thus the majority of passives in the Chinese translations (446 of 526, about 85 %) are transferred from English source texts, a finding in line with Teich (2003: 196). Furthermore, even those passives in Chinese translations that are not directly carried over from English passives can be traced back to the influence of English source texts (e.g. past participial constructions).
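Deciding whether a Chinese bei passive counts as transferred can be partly automated on sentence-aligned, POS-tagged data. In the sketch below, which is an assumption rather than the authors' procedure, an English source sentence counts as passive if one of the verbs listed above (be, get, become, feel, look, remain, seem) is followed, possibly after adverbs, by a past participle tagged `VBN` in the Penn Treebank tagset. Every 被 in the aligned Chinese sentence is then attributed to that sentence-level decision, a simplification that manual checking would need to refine. The function names and the second example pair are hypothetical.

```python
# Word forms of the verbs named in the text.
COPULA_FORMS = {
    "be", "am", "is", "are", "was", "were", "been", "being",
    "get", "gets", "got", "gotten", "getting",
    "become", "becomes", "became", "becoming",
    "feel", "feels", "felt", "feeling",
    "look", "looks", "looked", "looking",
    "remain", "remains", "remained", "remaining",
    "seem", "seems", "seemed", "seeming",
}
ADVERB_TAGS = {"RB", "RBR", "RBS"}  # Penn Treebank adverb tags


def has_passive(tagged_en):
    """True if a copula form is followed (after optional adverbs) by a VBN."""
    for i, (word, _tag) in enumerate(tagged_en):
        if word.lower() not in COPULA_FORMS:
            continue
        j = i + 1
        while j < len(tagged_en) and tagged_en[j][1] in ADVERB_TAGS:
            j += 1
        if j < len(tagged_en) and tagged_en[j][1] == "VBN":
            return True
    return False


def transfer_counts(pairs):
    """Count 被 tokens in Chinese translations as (transfer, non_transfer).

    pairs: list of (tagged English source sentence, Chinese token list).
    """
    transfer = non_transfer = 0
    for tagged_en, zh_tokens in pairs:
        n_bei = zh_tokens.count("被")
        if has_passive(tagged_en):
            transfer += n_bei
        else:
            non_transfer += n_bei
    return transfer, non_transfer


pairs = [
    ([("Then", "RB"), ("the", "DT"), ("certificate", "NN"),
      ("must", "MD"), ("be", "VB"), ("issued", "VBN")],
     ["该", "证书", "就", "必须", "被", "颁发"]),
    # Constructed pair: active English source, passive Chinese translation.
    ([("Police", "NNS"), ("arrested", "VBD"), ("him", "PRP")],
     ["他", "被", "警察", "逮捕", "了"]),
]
print(transfer_counts(pairs))  # (1, 1)
```

The sketch only checks the sentence pair as a whole; it does not verify that the English passive is the clause actually rendered by the Chinese 被 clause.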

As noted earlier, there is considerable variation in the distribution of passives across genres. In expository genres, passives are significantly more frequent in translational Chinese, while the contrast is less marked in imaginative genres. This suggests that literary and non-literary texts behave differently in their use of passives in English-to-Chinese translation. As Babel is a corpus of mixed genres, it cannot be used to investigate how source language interference varies between literary and non-literary texts. To explore this, we compare the distribution of passives in the English-to-Chinese Literature and English-to-Chinese Non-literature components of the GCEPC parallel corpus.

Figure 14 compares the frequencies of transfer and non-transfer of passives from the source texts in the literary and non-literary subcorpora. Of the 553 instances of passives in the literary component, 405 (73 %) are derived from English passives; of the 768 occurrences in the non-literary component, 712 (93 %) are transferred from English source texts. This means that in English-to-Chinese translation, source language transfer of passive constructions is more likely to occur in non-literary than in literary translation. This is because a large part of non-literary translation involves English genres that make heavy use of passives, for example official documents and scientific writing.
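The transfer rates cited in this section follow directly from the counts reported for Babel and GCEPC; the short computation below simply reproduces the percentages (the function name `transfer_rate` is illustrative).

```python
def transfer_rate(transferred, total):
    """Percentage of Chinese passives that correspond to English source passives."""
    return 100 * transferred / total


# (transferred, total) counts as reported in this section.
counts = {
    "Babel (mixed genres)": (446, 526),
    "GCEPC literary": (405, 553),
    "GCEPC non-literary": (712, 768),
}
for name, (transferred, total) in counts.items():
    print(f"{name}: {transfer_rate(transferred, total):.1f} %")
# Babel (mixed genres): 84.8 %
# GCEPC literary: 73.2 %
# GCEPC non-literary: 92.7 %
```

Rounded to whole numbers, these give the 85 %, 73 % and 93 % quoted in the text.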

Fig. 14 Source language interference in literary and non-literary texts

5 Conclusions

This article has investigated source language interference in English-to-Chinese translation through a contrastive study of a range of lexical and grammatical features in translational Chinese in relation to comparable native Chinese. The lexical features investigated include mean word length, affixes, pronouns and word clusters, while at the grammatical level mean sentence segment length, the predicative shi structure and passive constructions are considered. The results show source-induced differences between translational and native Chinese at both lexical and grammatical levels, indicating that source language interference is observable in English-to-Chinese translation. This study has thus uncovered a fresh body of evidence from translation between two genetically distant languages, English and Chinese, which supports the hypothesis of source language interference or shining through that has previously been studied only in closely related languages including English, German and Czech. Our case study of the translation of passive constructions in English-to-Chinese parallel corpora suggests that source language interference or shining through occurs in about 85 % of cases in data of mixed genres, with a higher transfer rate of 93 % for non-literary translation compared with 73 % for literary translation.

Given the typological distance between the two languages involved, the evidence revealed in this study is of critical importance if source language interference or shining through is to be generalised as a universal feature of translation. Future research in this area will benefit from investigating a wider range of linguistic features of translational Chinese, and more language pairs as appropriate corpus resources become available. As regards the parallel corpus approach, this study has only considered the direction of English-to-Chinese translation. As source language interference is asymmetrical across translation directions because of cross-linguistic differences, it will also be worth investigating Chinese-to-English translation.