
Introduction: English-Original and Polish Translations of Lolita

Lolita (Nabokov 1955) is one of the best-known novels by Vladimir Nabokov, and it firmly established him as an outstanding American novelist. Because its subject matter was highly controversial at the time, Nabokov was unable to find a publisher for the novel in the United States; instead, the book was published in France in 1955 by Olympia Press. The first American edition was issued by G. P. Putnam’s Sons in New York only in 1958 (Boyd 1995: xlv). The novel is written in a highly artistic, masterful and precise style, which made Nabokov one of the most brilliant and idiosyncratic stylists of English (Stiller 1991: 421).

The history of the translation of Lolita into Polish has been quite turbulent. The first attempt to translate the novel was made in 1958 in Tel Aviv, where a translation by an anonymous journalist was published in the Polish-language weekly Przegląd. According to Stiller (1991: 434), the author of this abridged version, which covered only three quarters of the length of the English original, was the journalist Moshe Balsam. Stiller (1991: 434) further writes that in the following years more fragments of the novel were translated into Polish and published in periodicals: in the weekly Przekrój, where fragments translated by Juliusz Kydryński came out in 1959; in the Polish émigré weekly Wiadomości, printed in London, where fragments translated by Jerzy Tepa appeared in 1961; in the weekly Odra, which published fragments translated from Russian by Eugenia Siemaszkiewicz in 1974; and in the weekly Tygodnik Kulturalny, where the first seventeen chapters of the book, translated by Robert Stiller, were printed in 1987. The first full and unabridged Polish translation of Lolita appeared in 1991; it was the work of Robert Reuven Stiller.

The translation by Stiller is accompanied by an extensive commentary on the project (Stiller 1991). The translator states that four sources provided the basis for his translation, namely: (1) The Annotated ‘Lolita’ by A. Appel, Jr. and Nabokov, an annotated text of the novel with commentaries and notes explaining Nabokov’s references, puns, and archaic, foreign and invented words as they appear in the English Lolita; (2) Keys to ‘Lolita’ by Carl A. Proffer, another extensive commentary on the novel; (3) the original English-language version of Lolita; and (4) Nabokov’s own Russian translation of the novel. As a result, Stiller’s translation is based on both the English and the Russian versions of the novel (Stiller 1991: 435–436).

The second full translation of the novel into Polish was completed six years later, in 1997, by Michał Kłobukowski. Stiller (1997), however, added some piquancy to its appearance: immediately after Kłobukowski’s translation of Lolita was released onto the market, Stiller (1997: 6–7) published an incriminating article in the literary journal Wiadomości Kulturalne, accusing Kłobukowski of glaring incompetence and of plagiarizing Stiller’s own translation of Lolita.

Thus, it is believed that the tempestuous history of the English original of Lolita, as well as the stormy verbal duel between the two Polish translators of the novel, make this particular text an even more interesting object of comparative study.

Universal Features of Translation

In her seminal paper, Baker (1995: 243) puts forward the idea of universal features of translation, or translation universals: specific textual characteristics (e.g. lexical, grammatical or stylistic) typical of translated texts, irrespective of the languages involved in the translation process. Further, Baker posits a number of hypotheses on the differences between translational and non-translational language, e.g. that translations tend to be more explicit in lexis and syntax than non-translated texts, that their content and form are simplified compared with non-translated texts, and that the language used in translation is more conventional and less creative than that used in non-translated texts (Baker 1995).

In the same vein, Kenny (2001: 53–54) claims that translations exhibit a distribution of lexical items that distinguishes them from original texts in the same language, and that this distribution is symptomatic of specific translation strategies or tendencies, such as explicitation, simplification, normalization, sanitization and levelling-out. According to Olohan (2004: 92), these patterns are specific to translations and are seen to be more typical of translational than of non-translational language. In addition, characteristics of translational language are a product of constraints inherent in the translation process and do not vary across cultures (Olohan 2004: 92). Thus, it is essential to study linguistic patterns which are specific to translated texts, irrespective of source and target languages (Laviosa-Braithwaite 1995: 153). Finally, Kenny (2001: 54) hypothesizes that translation universals have predictive power: if one accepts that some lexical or stylistic characteristic constitutes a translation universal, one may predict that characteristic in instances or samples of translation that one has not yet encountered.

Therefore, in this study, the English original of Lolita (henceforth ‘ENL’) will be compared with its two independent Polish translations by Stiller and Kłobukowski (henceforth ‘PLS’ and ‘PLK’, respectively) to identify differences in text length, sentence length and number of repetitions (conciseness of style), and to find traces, if any, of translation universals.

For the purposes of this study, the typology of translation universals (TUs) proposed by Chesterman (2004: 6–7) was applied. Chesterman distinguishes between two types of TUs: S-universals, which concern the relation between translations and their source texts, and T-universals, which concern comparisons of translational and non-translational texts (i.e. target-language texts that are not translations). As this article compares the English source text with its two Polish translations, it searches for S-universals.

Methodology, Research Material, Tools and Stages of the Analysis

In order to answer the study questions outlined above, a corpus-driven methodology was applied. The corpus-based approach always works within commonly accepted theoretical frameworks, i.e. it is theoretically committed, which implies prior classification of linguistic data. In the corpus-driven approach adopted here, by contrast, the English original and the Polish translations of Lolita were not adjusted to fit any predefined categories or theoretical schemata. Instead, the study questions were addressed through empirical analysis of frequency distributions of words and recurrent patterns of language use in the three texts. As a result, the novels were compared through bottom-up observation of empirical linguistic data, which are presented in quantitative terms and, where necessary, supplemented with qualitative observations.

According to Hoover (2004: 517–533), the aim of such quantitative approaches to literature is to represent elements or characteristics of literary texts numerically, applying the powerful, accurate and widely accepted methods of mathematics to measurement, classification and analysis. Furthermore, the availability of texts in electronic format has made quantitative approaches more attractive as innovative ways of reading quantities of text that overwhelm traditional modes of reading (Hoover 2004: 517–533). It is therefore believed that quantitative approaches, such as the corpus-driven one presented in this study, enable one to study translational style and its variations from a different perspective, and to put forward more fine-grained hypotheses or research questions to be addressed in future qualitative studies.

The texts used in the analysis, i.e. the English original and its two Polish translations, were purchased in paper format in bookstores and converted into a machine-readable format supported by the software used throughout the study. To that end, the texts were manually scanned and processed with OCR. The scanned texts were then repeatedly proofread to ensure spelling accuracy and verified against the paper versions. At that stage, any misrecognized characters were corrected using a spellchecker or the search-and-replace facility of a word processor. Finally, the texts were saved as plain-text files.

The corpus-driven analysis conducted in this study was facilitated by the use of the computer software WordSmith Tools 4.0 developed by Scott (2004), which is a suite of programs custom-designed for text analysis.

Corpus-Driven Comparison of Stylometric Indicators

The corpus-driven analyses used in this study encompass comparisons of descriptive statistics, which present basic stylometric indicators of style: the number of running words (text length), the number of distinct words (vocabulary used), TTR and STTR (measures of lexical variety), and the number and length of sentences. The study ends with a comparison of frequency profiles and frequency spectra, which give insight into the distribution of top-frequency and bottom-frequency words, respectively.
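The word-based indicators listed above can be computed from a plain-text file roughly as follows. This is a minimal Python sketch, not the WordSmith Tools procedure used in the study: the tokenizer regex `TOKEN_RE` and the function names are illustrative assumptions, word forms are lower-cased but not lemmatized, and TTR is expressed as a percentage.

```python
import re
from collections import Counter

# Illustrative tokenizer (an assumption): runs of word characters, allowing
# internal apostrophes and hyphens. Python's \w covers Polish letters such as
# "ł" or "ż". WordSmith Tools applies its own rules, so counts will differ.
TOKEN_RE = re.compile(r"\w+(?:['’-]\w+)*")


def tokenize(text: str) -> list[str]:
    """Lower-cased running words; no lemmatization is applied."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def descriptive_stats(text: str) -> dict[str, float]:
    tokens = tokenize(text)
    types = Counter(tokens)                 # distinct word forms -> frequency
    n_tokens, n_types = len(tokens), len(types)
    return {
        "tokens": n_tokens,                                    # text length
        "types": n_types,                                      # vocabulary used
        "ttr_percent": 100 * n_types / n_tokens,               # type/token ratio
        "mean_type_frequency": n_tokens / n_types,             # tokens per type
        "mean_word_length": sum(map(len, tokens)) / n_tokens,  # characters per token
        "characters": len(text),                               # incl. spaces, punctuation
    }


print(descriptive_stats("The cat saw the dog."))
```

Because every inflected form counts as its own type here, the same code applied to a Polish text yields more types per token than for English, which is the inflation discussed in the next paragraph.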

Descriptive Statistics

Descriptive statistics describe linguistic data in quantitative terms and present basic indicators of style and lexical richness (Olohan 2004: 78–81). Hence, they provide a holistic view of the English original of Lolita and its two Polish translations by Stiller and Kłobukowski (ENL, PLS and PLK, respectively). Their characteristics are presented in Table 1.

Table 1 Descriptive statistics for ENL, PLS and PLK

Since no lemmatization was conducted on ENL, PLS or PLK, indicators such as the number of types, TTR and STTR are inflated for the Polish translations and thus cannot serve as a reliable basis for comparison between the English and the Polish texts. This is because the texts represent typologically different languages: English, which is highly analytical, and Polish, which is morphologically more synthetic. Nevertheless, since Sinclair (1991: 8) argues that each distinct inflectional form is potentially a unique lexical unit, the issue of non-lemmatization was set aside, and the study focused on the remaining indicators, which are relevant and valid irrespective of the typological differences between the two languages.
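The STTR offsets the dependence of the raw TTR on text length by averaging the TTR over equal-sized chunks of running words. The sketch below assumes non-overlapping chunks of 1,000 tokens (taken here to mirror WordSmith's default standardization size) and, as a further assumption, ignores a trailing chunk shorter than that. Applied to non-lemmatized Polish tokens, it still counts each inflected form as a separate type, which is why the values for PLS and PLK are inflated relative to ENL.

```python
def sttr(tokens: list[str], chunk: int = 1000) -> float:
    """Standardized type/token ratio (%): the mean TTR of consecutive,
    non-overlapping chunks of `chunk` tokens.

    Assumption: a trailing chunk shorter than `chunk` is ignored, so that
    every TTR entering the mean is computed on the same number of tokens.
    """
    n_full = len(tokens) // chunk
    if n_full == 0:
        raise ValueError("text is shorter than one chunk")
    ratios = [
        len(set(tokens[i * chunk:(i + 1) * chunk])) / chunk
        for i in range(n_full)
    ]
    return 100 * sum(ratios) / n_full


# Toy example with 4-token chunks: TTRs of 50 % and 100 %; the final "g" is ignored.
print(sttr(["a", "b", "a", "b", "c", "d", "e", "f", "g"], chunk=4))
```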

As far as the length of the original and the translations is concerned, one may arrive at the following conclusions. Firstly, the data show that both Polish translations are shorter than the source text in terms of the number of running words, or tokens (112,230 versus 101,130 and 95,936 in ENL, PLS and PLK, respectively), which yields original-to-translation ratios of 1.11 and 1.17. In other words, Stiller required, on average, 901 words to translate 1,000 English words of the original, whereas Kłobukowski required only about 855 words to do the same. On the surface, this finding contradicts the hypothesis of explicitation in translation. According to Nida and Taber (1974: 163), one should expect a translation to be longer than the original because translators tend to explicitate phenomena which do not exist in the target language. In terms of running words, this assumption is not borne out in this study.

On the other hand, if one considers the size of ENL, PLS and PLK measured in characters, the results are the opposite: the English original is shorter than both Polish translations (1,261,546 versus 1,370,082 and 1,331,058 characters in ENL, PLS and PLK, respectively). This yields an original-to-translation ratio of 0.92 for PLS and 0.95 for PLK.
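The ratios in the two preceding paragraphs follow from simple divisions of the reported counts. The short Python sketch below reproduces them from the figures given in the text; the dictionary and function names are illustrative only. Run as is, it prints 1.11 and 1.17 for running words, 0.92 and 0.95 for characters (conventional rounding to two decimals), and 901 and 855 target words per 1,000 ENL words.

```python
# Counts as reported in the text (running words and characters).
TOKENS = {"ENL": 112_230, "PLS": 101_130, "PLK": 95_936}
CHARS = {"ENL": 1_261_546, "PLS": 1_370_082, "PLK": 1_331_058}


def length_ratios(counts: dict[str, int], source: str = "ENL") -> dict[str, float]:
    """Original-to-translation ratio for every text other than `source`."""
    return {k: counts[source] / v for k, v in counts.items() if k != source}


def words_per_1000(counts: dict[str, int], source: str = "ENL") -> dict[str, float]:
    """Target-text units needed, on average, per 1,000 source-text units."""
    return {k: 1000 * v / counts[source] for k, v in counts.items() if k != source}


for label, counts in (("running words", TOKENS), ("characters", CHARS)):
    print(label, {k: round(r, 2) for k, r in length_ratios(counts).items()})
print("words per 1,000 ENL words",
      {k: round(v) for k, v in words_per_1000(TOKENS).items()})
```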

Overall, this case shows that comparing the length of texts written in different languages on the basis of the number of running words is misleading; the number of characters, including letters, digits, punctuation and spaces, constitutes a more reliable indicator in such comparisons (Mikhailov 2003: 167), particularly when one compares texts written in typologically or genetically unrelated languages.

The explanation for this discrepancy lies in typological differences in morphology. The frequent use of articles in English means that the number of running words in an English text is higher than in its translation into a language without articles, such as Polish. On the other hand, it is doubtful that every English utterance is longer than an analogous Polish utterance (particularly in a translation situation involving real texts). A more important observation is that a synthetic language (such as Polish) uses more synthetic, i.e. longer, word forms, while a more analytical language (in this case English) uses less synthetic, i.e. shorter, word forms owing to its poorer inflection. This difference is reflected in the mean word length, which is 4.40 characters in ENL and 5.50 and 5.56 characters in PLS and PLK, respectively.

Therefore, one is led to conclude that, measured in characters, the Polish translations of Lolita are longer than the English original. However, it remains debatable whether this pattern is typical of English-to-Polish translation in general. The findings of a number of stylometric studies of originals and translations (Englund Dimitrova 1993, 1994; Mikhailov and Villikka 2001; Mikhailov 2003; Scarpa 2006; Rybicki 2007) show that the length of a translation relative to its source text varies depending on the language pair and the direction of translation. Also, Baker (2000) suggests that this variation in original-to-translation ratios is due to translators’ individual styles or idiolects. As a result, further studies conducted on larger parallel English-Polish corpora, containing texts representing different genres and types, are necessary to validate the universalist claim that Polish translations from English tend to be longer than their source texts.

Also, the data presented in Table 1 show that the translation by Stiller is considerably longer than the one by Kłobukowski (by 5,194 running words or 39,024 characters). Further, given that Kłobukowski’s shorter translation has a higher number of word types than Stiller’s longer one, one can conclude that PLS has more repetitions than PLK. This observation is further corroborated by the mean frequency of a word type, which is higher in PLS (3.51 versus 3.32 in PLK). Ultimately, it shows that PLK has a higher lexical density than PLS.

As regards lexical density measured by the STTR, the data show that Kłobukowski’s translation is lexically richer than Stiller’s. On average, there are 700 word types per 1,000 word tokens in PLK, whereas in PLS there are 660 word types. This means that PLK is lexically more complex and specific and has fewer repetitions than PLS.

The data on the number of sentences (5,628 versus 5,529 in PLS and PLK, respectively) and the mean sentence length (17.96 versus 17.35 tokens in PLS and PLK, respectively) show that Stiller used 99 more sentences, which are at the same time slightly longer than those used by Kłobukowski. Further, the fact that Stiller uses 5,194 more words and longer sentences in his translation may mean that Kłobukowski’s translated sentences are more concise and terse than Stiller’s more explicit ones. On the other hand, the number of sentences in the English original (5,549) shows that Kłobukowski was more consistent in translating sentence for sentence (PLK has only 20 fewer sentences than ENL), whereas Stiller exhibited more flexibility in this respect: overall, there are 79 more sentences in PLS than in ENL. Stiller’s manipulation of the number of sentences is further confirmed by the higher standard deviation of sentence length in PLS (18.97, as compared with 17.75 in PLK). Thus, it is possible to put forward the hypothesis that Stiller’s translated sentences are more explicit and precise than Kłobukowski’s more concise and terse sentences.
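The sentence-level indicators compared above (number of sentences, mean sentence length and its standard deviation) can be sketched in Python as follows. The sentence-boundary rule is a deliberately naive assumption, the use of the sample (n − 1) standard deviation is likewise an assumption rather than a statement about what WordSmith reports, and `sentence_stats` is an illustrative name.

```python
import re
import statistics

# Illustrative sentence splitter (an assumption): a sentence ends at ".", "!"
# or "?" followed by whitespace. Abbreviations, ellipses and dialogue would
# need extra rules; WordSmith Tools uses its own sentence-boundary settings.
SENT_END = re.compile(r"(?<=[.!?])\s+")
TOKEN_RE = re.compile(r"\w+(?:['’-]\w+)*")


def sentence_stats(text: str) -> tuple[int, float, float]:
    """Number of sentences, mean sentence length and its sample standard
    deviation, both measured in tokens."""
    lengths = [len(TOKEN_RE.findall(s)) for s in SENT_END.split(text.strip())]
    lengths = [n for n in lengths if n > 0]          # drop empty fragments
    if not lengths:
        raise ValueError("no sentences found")
    sd = statistics.stdev(lengths) if len(lengths) > 1 else 0.0
    return len(lengths), statistics.mean(lengths), sd


# Sentence lengths 3, 4 and 1 tokens.
print(sentence_stats("I saw her. She ran away fast! Why?"))
```

A higher standard deviation at a similar mean, as in PLS, indicates that sentence lengths are spread more widely around the average.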

Taking into consideration the mean sentence length in the English original, which is 20.22 tokens, the corresponding figures for PLS and PLK show that both translators employed faithful sentence-for-sentence translation and used long-form constructions (Stiller in particular). As the mean sentence length in Polish prose is 11.90 tokens (Ruszkowski 2004: 34), the data show that both Stiller’s and Kłobukowski’s sentences are atypical and differ from those in non-translational texts, i.e. typical Polish novels.

Comparison of Wordlists

In order to compare ENL, PLS and PLK in terms of the type, range and distribution of the most frequently used vocabulary, wordlists were generated for the three texts. As a rule, wordlists are dominated by top-frequency grammatical words, which makes it difficult to identify lexical differences between the original and the two translations that could be markers of the translators’ styles. To remedy this, grammatical words were removed from the top-frequency items, and the most frequently used lexical (content) words in ENL, PLS and PLK are presented instead. The filtered wordlist with the 25 top-frequency lexical words is presented in Table 2.
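The filtering step described above amounts to removing stoplisted grammatical words before ranking by frequency. In this minimal Python sketch the stoplist `STOPWORDS` is a hypothetical, deliberately tiny example; the study does not publish its list, and the choice of which words count as grammatical directly affects the resulting table.

```python
from collections import Counter

# Hypothetical mini-stoplist of grammatical (function) words; a real study
# needs full English and Polish lists.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "i", "you", "he",
             "she", "it", "her", "his", "my", "that", "w", "się", "nie", "na", "że"}


def top_content_words(tokens: list[str], n: int = 25,
                      stopwords: set[str] = STOPWORDS) -> list[tuple[str, int]]:
    """The n most frequent tokens after removing stoplisted words."""
    return Counter(t for t in tokens if t not in stopwords).most_common(n)


print(top_content_words("the cat and the dog and the cat".split(), n=2))
```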

Table 2 Wordlists with top-frequency content words in ENL, PLS and PLK

As, at least hypothetically, the three texts convey the same information, it is not surprising that most content words overlap in the source text and its translations. These words include, among others, the names of the protagonists (Lolita, Lo, Charlotte, Humbert). However, the data also show that some differences between the source text and its translations result from typological differences between the two language systems; e.g. the more analytical English morphology inflates the frequencies of the most frequently used word types as compared with their lower values in the Polish texts. For example, the high aggregate frequency (1,791) of the verb forms was, were and have in ENL results not only from their functioning as inflectional forms of the verbs to be and to have, but also from their use as auxiliary verbs in multiple grammatical tenses. This explains their higher frequency as compared with the aggregate frequency (532 and 498 occurrences in PLS and PLK, respectively) of the corresponding verb forms in Polish, e.g. było, była, był, byli, byłaś, byłeś, byliśmy, byliście, byłyśmy, byłyście. Also, one may notice the high frequency of broad-meaning English verb forms, such as said and made, which have no potential equivalents among the top-frequency content words in PLS and PLK.
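The aggregate frequencies mentioned above are sums over a group of word forms, which is a manual, partial substitute for lemmatization. The Python sketch below shows the operation on toy input; `aggregate_frequency` is an illustrative name, and the Polish form set is the non-exhaustive list given in the paragraph.

```python
from collections import Counter

# Polish past-tense forms of "być" ('to be') as listed in the text; the list
# is illustrative rather than exhaustive.
BYC_PAST = {"było", "była", "był", "byli", "byłaś", "byłeś",
            "byliśmy", "byliście", "byłyśmy", "byłyście"}


def aggregate_frequency(tokens: list[str], forms: set[str]) -> int:
    """Total frequency of a group of word forms (a manual, partial lemma)."""
    counts = Counter(tokens)
    return sum(counts[form] for form in forms)


toy = "był dom i była noc i było zimno był".split()
print(aggregate_frequency(toy, BYC_PAST))
```

The same function with the set `{"was", "were", "have"}` gives the English aggregate; forms absent from the text simply contribute zero.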

The above examples also point to one of the specific problems of translation between English and Slavic languages (e.g. Polish or Russian). Extending the assumption made by Comrie (1981: 31–79) with reference to Russian, it seems that Polish is semantically more explicit (i.e. its words carry more specific meaning distinctions) than English, which in turn is more ambiguous and vague in its surface forms. Hence, English largely depends on pragmatic and contextual information to specify the exact interpretation of its broad-meaning linguistic forms (e.g. the past-tense reporting verb said). According to Piotrowski (1994: 95–96), although English has both broad-meaning and specific lexemes, users of English tend to choose the broad-meaning ones. Users of Russian and other Slavic languages, on the other hand, tend to choose specific lexemes, which is why they regard texts with multiple repetitions as plain, simple, or even badly written (Piotrowski 1994: 96). As regards translation, the consequence may be that translating English reporting verbs (or broad-meaning English lexemes in general) requires more lexical words in Polish to produce a natural and acceptable translation.

Table 2 also reveals some characteristic features of the Polish translations. It shows that most top-frequency lexical words overlap in PLS and PLK. The exceptions are words such as znów (‘again’), wciąż (‘still’), czasu (‘time’, genitive singular), Charlotte and dwa (‘two’), which are over-represented in PLS, and words such as potem (‘then’, ‘afterwards’), ma (‘has’), teraz (‘now’), Lolity (genitive singular of Lolita) and być (‘to be’), which are over-represented in PLK. Whereas Stiller transferred the proper name Charlotte unchanged into the Polish text, Kłobukowski used Charlotta, a form partly adapted to the Polish noun declension system. This is the only character name that differs between the two translations; the remaining names are the same in both texts. Overall, one is led to conclude that Stiller’s translation has more repetitions among top-frequency grammatical words, which may point to its sentences being more explicit and precise than Kłobukowski’s. However, the two translations are similar in terms of high-frequency lexical words.

Frequency Profiles

In order to determine whether the English original or the Polish translations of Lolita have more repetitions and lower lexical variety in terms of top-frequency words, the frequency profile proposed by Baroni (2009: 805–806) was used. The frequency profile is obtained by replacing the words in a frequency list (compiled with WordSmith Tools 4.0) with their frequency-based ranks, i.e. by assigning rank 1 to the most frequent word, rank 2 to the second most frequent word, rank 3 to the third most frequent word, and so on. It thus shows which frequency (f) the word type at a given frequency-based rank (r) has. In this study, however, the typical frequency profile was modified: the frequency information was replaced with the cumulative percentage of the total word count (%cW) corresponding to each frequency-based rank. The results are presented in Table 3.
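The modified frequency profile described above (rank r, frequency f and cumulative %cW) can be sketched in Python as below; a second helper returns the smallest rank at which %cW reaches a threshold such as 50 %. Both function names are illustrative assumptions, and tokens are assumed to be already normalized in the same way for all three texts.

```python
from collections import Counter
from itertools import accumulate


def frequency_profile(tokens: list[str]) -> list[tuple[int, int, float]]:
    """Rows of (rank r, frequency f, cumulative % of total word count %cW).

    Rank 1 is the most frequent type. Counter.most_common orders tied types
    by first occurrence; ties change which type holds a rank, not %cW.
    """
    freqs = [f for _, f in Counter(tokens).most_common()]
    total = len(tokens)
    return [(r, f, 100 * cum / total)
            for r, (f, cum) in enumerate(zip(freqs, accumulate(freqs)), start=1)]


def types_to_reach(profile: list[tuple[int, int, float]], percent: float) -> int:
    """Smallest rank r at which %cW reaches `percent` (e.g. 50)."""
    for r, _, cum_percent in profile:
        if cum_percent >= percent:
            return r
    raise ValueError("threshold not reached")


profile = frequency_profile("a a a a b b b c c d".split())
print(profile)
print(types_to_reach(profile, 50))
```

Reading the %cW value at rank 100 corresponds to the "100 top-frequency words" comparison, and the threshold helper to the number of types covering 50 % of the text.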

Table 3 Frequency profiles for top-frequency word types in ENL, PLS and PLK

Although the data in Table 3 show that the English Lolita (and possibly English texts in general) has more repetitions and lower lexical variety among top-frequency words, this is largely due to the lack of lemmatization. Furthermore, the typological difference in morphology reinforces the above observation: articles and prepositions, which are frequently used in English, are treated as separate words, while in Polish various endings, prefixes and suffixes are bound to stems or roots, which lowers the frequencies of Polish word forms. Thus, it is not surprising that the English text, as compared with the Polish ones, is dominated by top-frequency words: the 100 top-frequency words constitute almost 50 % of the total number of words used in ENL, while in PLS and PLK the corresponding values are 36 % and 32 %, respectively. This observation may therefore be interpreted as an S-universal.

As regards the differences between the Polish translations, in Stiller’s translation 549 word types account for 50 % of the total word count, while in Kłobukowski’s translation this threshold is reached at 758 word types. The data thus show that the translations are not uniform in this respect, as PLK is unusually rich and considerably more varied lexically: it needs 209 more word types than PLS to account for 50 % of the total word count.

Frequency Spectra

According to Baroni (2009: 806), frequency spectra enable one to determine how many word types (w) in a frequency list have a particular frequency (f), denoted w(f). As creative or author-specific vocabulary usually occurs in a text with low frequencies, frequency spectra can be used to study lexical variety and the degree of repetition among bottom-frequency words. As a rule, a text is lexically more varied if the proportion of bottom-frequency words in the total word count (%W) is higher. For the purposes of this study, the number of word types (w) corresponding to a particular frequency (f) in the frequency spectra was replaced with the cumulative percentage of the vocabulary (%cV) and the cumulative percentage of the total word count (%cW) for word types with frequencies 1–25. The results are presented in Tables 4, 5 and 6 below.
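The spectrum described above, extended with %cV and %cW, can be computed as in the following Python sketch. The function name and the row layout are illustrative assumptions; the row for f = 1 gives the hapax legomena and their shares of the vocabulary and of the running words.

```python
from collections import Counter


def frequency_spectrum(tokens: list[str], max_f: int = 25) -> list[tuple[int, int, float, float]]:
    """Rows of (f, w(f), %cV, %cW) for frequencies 1..max_f.

    w(f): number of types occurring exactly f times.
    %cV:  cumulative % of all types whose frequency is <= f.
    %cW:  cumulative % of all running words covered by those types.
    """
    type_freqs = Counter(tokens)              # type -> frequency
    spectrum = Counter(type_freqs.values())   # frequency f -> w(f)
    n_types, n_tokens = len(type_freqs), len(tokens)
    rows, cum_types, cum_tokens = [], 0, 0
    for f in range(1, max_f + 1):
        w = spectrum.get(f, 0)
        cum_types += w
        cum_tokens += f * w                   # each such type contributes f tokens
        rows.append((f, w, 100 * cum_types / n_types, 100 * cum_tokens / n_tokens))
    return rows


for row in frequency_spectrum("a a a b b c d e".split(), max_f=3):
    print(row)
```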

Table 4 Frequency spectrum for ENL
Table 5 Frequency spectrum for PLS
Table 6 Frequency spectrum for PLK

When interpreting the above data, it is paramount to remember that some of the differences are attributable to the different language systems: the more analytical English (with poor inflection) versus the more synthetic Polish (with rich inflection), in which each inflectional form of a word (e.g. the genitive, accusative or locative case of a noun, in the singular or plural, feminine or masculine) is counted as a separate word type. This problem is typical of operating with non-lemmatized types and tokens in highly inflectional languages such as Polish. In view of the above, one is in a better position to understand the discrepancies in the data.

As the data in Tables 4, 5 and 6 illustrate, the Polish translations of Lolita appear to be considerably more creative lexically than the English original. Although this claim is based on the distribution of word types rather than lemmas, it is clear that the Polish texts contain more low-frequency words, among which creative and author-specific vocabulary is usually found (Kenny 2001: 127–134).

Firstly, as regards the number of hapax legomena (i.e. words which occur in a text only once), ENL has 6,984 hapax legomena, which account for 49.91 % of the total vocabulary (%V) and 6.22 % of the total word count (%W). PLS and PLK, on the other hand, have 19,560 and 19,586 hapaxes, respectively, which account for approximately 68 % of the total vocabulary (%V) and 20 % of the total word count (%W). Statistically, this means that every 16th running word is a hapax legomenon in ENL, while in PLS and PLK it is every 5th word, under the admittedly unrealistic assumption that words are evenly distributed in a text. In terms of overall vocabulary, hapax legomena constitute almost 50 % of the lexis of ENL, while in the Polish translations they account for almost 70 % of all distinct words used.
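The hapax shares and the "every n-th word" figures follow directly from the reported counts. The Python sketch below recomputes them (the names are illustrative); it gives about 6.22 % and one hapax per 16.1 running words for ENL, 19.34 % and 5.2 for PLS, and 20.42 % and 4.9 for PLK, again assuming an even spread of hapaxes through the text.

```python
# Hapax legomena and running-word totals as reported in the text.
HAPAXES = {"ENL": 6_984, "PLS": 19_560, "PLK": 19_586}
TOKENS = {"ENL": 112_230, "PLS": 101_130, "PLK": 95_936}


def hapax_rates(hapaxes: dict[str, int],
                tokens: dict[str, int]) -> dict[str, tuple[float, float]]:
    """(%W of hapaxes, running words per hapax) for each text.

    The second value assumes, unrealistically, that hapaxes are spread
    evenly through the text.
    """
    return {k: (100 * h / tokens[k], tokens[k] / h) for k, h in hapaxes.items()}


for text, (share, every) in hapax_rates(HAPAXES, TOKENS).items():
    print(f"{text}: {share:.2f} % of running words; one hapax per {every:.1f} words")
```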

As regards all word types with frequencies 1–25, the data show that the Polish translations have fewer repetitions and higher lexical variety among bottom-frequency words than ENL (these word types account for nearly 35 % of the total word count in ENL and almost 55 % in PLS and PLK). Although this relationship can be treated as another S-universal in English-to-Polish literary translation, it is not known how far the result is influenced by the lack of lemmatization of the English and, in particular, the Polish data. Finally, the data show that Stiller’s translation is more varied lexically than Kłobukowski’s as regards the number of low-frequency words (i.e. those with frequencies 1–25).

Conclusions

The aim of the study presented in this article was to compare, using a corpus-driven methodology, the English original and the two Polish translations of Lolita by Stiller and Kłobukowski in terms of text length, sentence length, number of repetitions (conciseness of style), and the frequencies and distribution of both word types (distinct words) and word tokens (running words). A further aim was to find traces, if any, of translation universals (S-universals, after Chesterman 2004) attested in the Polish translations.

Descriptive statistics revealed that the Polish translations of Lolita are longer than the English original when length is measured in characters, even though length measured in running words indicates otherwise. It remains a contested issue, however, whether this pattern is typical of English-to-Polish literary translation in general. It was also revealed that the sentences in the Polish translations are shorter, and thus more concise and terse, than those in the English original. Hence, explicitation as an S-universal was not confirmed for these particular translations. On the other hand, the sentences in the translations are longer than typical sentences in Polish prose, which indicates that the translators used faithful sentence-for-sentence translation and long-form syntactic constructions. A comparison of lexical density showed that Kłobukowski’s translation is overall lexically richer than Stiller’s.

The comparison of wordlists showed that Stiller’s translation has more repetitions among top-frequency grammatical words, which may point to its sentences being more explicit and precise than Kłobukowski’s. It also revealed that the source text and its two translations are largely similar in terms of high-frequency lexical words, except for discrepancies due to typological differences between the morphology of the two languages, which were described in greater detail above and which point to lexical explicitation in English-to-Polish translation.

Finally, the comparison of frequency profiles and frequency spectra demonstrated that the English text, as compared with the Polish ones, is dominated by top-frequency words, an observation which may be interpreted as another S-universal, and that Kłobukowski’s translation is lexically more varied in its use of top-frequency words than Stiller’s, which has more bottom-frequency words. It was also found that the Polish translations have fewer repetitions and higher lexical variety among bottom-frequency words than the English original.

To conclude, it seems that further qualitative research should be conducted to provide concrete illustrations of both typical and anomalous cases glossed over in the quantitative text analysis presented above. This is vital, since it is still unknown which factors affect the basic stylometric indicators presented throughout this study, and to what extent. The impact of the source and target languages, the direction of translation, genre-specific characteristics, text type, register characteristics, the translator’s and the author’s idiolects and ideologies, and the source-language and target-language cultures on basic stylometric indicators and, more generally, on the scope and character of translation universals still remains a debatable issue and constitutes a largely unexplored research area, particularly in the case of English-to-Polish literary translation.