1. Purpose and Significance
Conventional translation studies usually rely on qualitative and cognitive
approaches, which leaves their analyses short of empirical evidence and data.
In the West, several influential papers by Mona Baker marked the inception of
corpus-based translation studies, and a large body of insightful research has
followed over the last twenty years. In China, however, this quantitative
method was not adopted until the 21st century, and domestic research on corpora
and related studies is still in its infancy. The purpose of this thesis is to
use quantitative methods, supported by computer-processed parallel corpora, to
compare and contrast the stylistic features of two English versions of Lu Xun’s
short story “The New Year’s Sacrifice”: one translated by Yang Xianyi and Gladys
Yang, the other by William A. Lyell. The thesis applies a recent approach to the
contrastive study of translations, one that draws on statistics and data
analysis for stylistic and linguistic evidence from the two selected
translations, with the aim of providing more objective findings than
conventional qualitative translation studies. This will be the first thesis on
translation studies from the perspective of Corpus Linguistics by a student from
Zhejiang University City College.
2. Current Literature Review
The basic concepts and methodology come from several classic textbooks in
Corpus Linguistics, Translation Studies and Stylistics. Graeme Kennedy’s An
Introduction to Corpus Linguistics, John Sinclair’s Corpus, Concordance,
Collocation, and Anthony Woods’s Statistics in Language Studies are essential
introductory reading and together present a panoramic theoretical overview of
Corpus Linguistics. Concerning Stylistics, Geoffrey Leech and Michael Short’s
Style in Fiction: A Linguistic Introduction to English Fictional Prose and David
Hoover’s Some Approaches to Corpus Stylistics show the theoretical background
and practical steps of conventional stylistic case studies. For Translation
Studies, Yang Xiaorong’s Introduction to Translation Criticism is a widely
acknowledged book on the philosophy and methodology of the field. Combining
Corpus Linguistics, Translation Studies and Stylistics promises clear
advantages over using any one of them alone.
The adoption of corpora in translation studies began in the early 1990s.
“Recognizing the potential for using corpora in translation research in the
early 1990s, Mona Baker made a number of suggestions on which much corpus-based
analysis of translation has subsequently been founded” (Olohan 35-56). David
Hoover, in his paper Some Approaches to Corpus Stylistics, claims that corpus
stylistics can be valuable to literary studies. He says that with the help of
corpus linguistics we can explore the degree to which an author has succeeded in
creating distinct individual voices within a novel, which is of great
importance in stylistic studies. Hoover also holds that corpora can be equally
useful for analyzing short texts, which provides a theoretical framework for
the use of a self-built corpus in stylistic studies.
Baker’s suggestions were introduced into China by Zhang Meifang (张美芳) in
2002. Corpus-based translation studies made great progress in the following
years, which provides valuable inspiration and suggestions for this thesis.
Among the relevant studies of recent years, two papers and one book are of
great reference value. The book is Introducing Corpora in Translation Studies
by Maeve Olohan, one of Mona Baker’s colleagues, which provides several basic
case studies on using corpora in translation studies and stylistic contrasts.
The two papers are The Use of Parallel Corpus in Translation Criticism: Taking
Different Versions of Bacon’s Of Studies as an Example by Xu Xin (徐欣), and
Corpus-based Version Analysis: Taking Pride and Prejudice as an Example by Xu
Wei (徐伟). Sharing almost the same methodology and research steps, both papers
use a corpus to measure the source text and the translations with some very
basic, surface-level statistics: word count, sentence count, average sentence
length, word lists, keywords, type-token ratio, etc. Their findings are
inspiring, but the papers merely present the data rather than using it as a
source of evidence for deeper findings in stylistic and
In 2010 Corpus Linguistics boomed in China, and several important papers
contributed to remarkable developments in text mark-up, data analysis, theory
Wan Lifang’s (万丽芳) Lexical Diversity Research on L2 English Majors’
Writing, though not concerned with translation studies, presents a quantitative
method for measuring lexical diversity, that is, lexical variation, lexical
diversity and error dimension, in English texts. This method could be adopted
in this thesis to examine the lexical differences between the L2 translator
Yang Xianyi and the L1 translator William A. Lyell. Xiao Zhonghua (肖忠华) and
Dai Guangrong’s (戴光荣) Finding “The Third Code”: A Corpus-based Research on
Translation Universality measures some typical English features such as
conjunctions and the passive voice. These two features are among the main
points of difference between Chinese and English, and the statistics may show
to what extent L2 translators are influenced by their first language. Xing
Fukun (邢富坤) and Cheng Dongyuan’s (程东元) paper The Readability Research Based
on the Language Statistical Model, written within the framework of readability
theory, interprets some commonly reported data, such as word count, sentence
count and average sentence length. With the help of this paper, such basic data
can be interpreted and studied in a broader theoretical context. Last but not
least, some panoramic reviews of the development of corpus-based translation
studies and stylistics also appeared this year, for example, Zhou Xiaoling
(周小玲) and Jiang Jiansong’s (蒋坚松) The Development of Corpus-based
Translator’s Stylistic Research Abroad (2000-2009) and David Hoover’s Some
Approaches to Corpus Stylistics.
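The surface statistics that recur in the papers reviewed above (word count,
sentence count, average sentence length and type-token ratio) can be sketched
in a few lines. This is only an illustration under stated assumptions: the
regular-expression tokenizer and the naive sentence splitter are simplifications
rather than the procedure of any of the cited studies, and the sample sentence
is invented, not taken from either translation.

```python
import re

def basic_stats(text):
    """Return token, type and sentence counts plus two derived ratios."""
    # Lowercased word tokens; apostrophes and hyphens stay inside a word.
    tokens = re.findall(r"[a-z]+(?:['’-][a-z]+)*", text.lower())
    # Naive sentence split on terminal punctuation (an assumption; real
    # corpora need a splitter that handles abbreviations and quotations).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    types = set(tokens)
    return {
        "tokens": len(tokens),
        "types": len(types),
        "sentences": len(sentences),
        "avg_sentence_length": len(tokens) / len(sentences) if sentences else 0.0,
        "ttr": len(types) / len(tokens) if tokens else 0.0,
    }

# Invented sample text, used only to exercise the function.
sample = "The snow fell. The snow fell all night, and the town was silent."
stats = basic_stats(sample)
print(stats)
```

Run on each translation separately, such a function yields comparable figures,
which still have to be interpreted qualitatively.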
3. Key Points and Possible Difficulties
First, as one of the objectives of this thesis is to compile a self-built
parallel corpus, all the necessary steps in compiling a small-scale corpus must
be completed, such as Optical Character Recognition (OCR), text proofreading and
text mark-up, which will be time-consuming. Great patience, careful handling
and solid linguistic knowledge must be involved in this
Second, some technical difficulties may arise in later stages of the
research, such as software compatibility, hardware stability and a unified
encoding format. Most of the mainstream corpus software available at this time
was developed in the late 1990s and early 2000s for the Windows NT 5.0 family or
even earlier systems. As a contingency plan, virtualization technology has been
tested successfully to ensure that the software runs properly.
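The unified encoding format mentioned above could be handled by converting
every text file to UTF-8 before it is loaded into the corpus software. The
sketch below is a minimal illustration, not a tested part of the project: the
list of candidate encodings (UTF-8, GB18030 for the Chinese source text, and
Windows-1252 for the English texts) is an assumption about how the files were
saved, and the function names are hypothetical.

```python
from pathlib import Path

# Assumed candidate encodings, tried in order. GB18030 accepts most byte
# sequences, so it must come after UTF-8; anything listed after it is only
# a last resort.
CANDIDATES = ("utf-8-sig", "gb18030", "cp1252")

def decode_bytes(raw, candidates=CANDIDATES):
    """Decode raw bytes with the first candidate encoding that succeeds."""
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")

def normalize_file(src, dst):
    """Rewrite src as UTF-8 (no BOM, LF line endings) at dst."""
    text, enc = decode_bytes(Path(src).read_bytes())
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    Path(dst).write_bytes(text.encode("utf-8"))
    return enc  # report which encoding was detected, for the project log
```

Because a wrong guess decodes without error but produces garbled characters,
the converted files would still need to be proofread by eye.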
Third, data analysis is another critical aspect. How can the validity of the
data be tested? How can we determine which statistical method gives the most
“near-fact” evidence? Will a Manipulation of Sparse Array be needed if the
sample size is too small? All these questions must be answered in the next
several months. Data, however, is not an end in itself, even if it is
accurate and revealing. As Kennedy states in his An Introduction to
Corpus Linguistics, “Corpus Linguistics is not an end in itself but is one
source of evidence for improving descriptions of the structure and use of
languages” (Kennedy 1). This thesis will treat data as a source of evidence
rather than as a destination. Finally, the data will be used to compare and
contrast the stylistic features of the two translations. In each section the
data must be analyzed by conventional qualitative methods to show its
significance: what does the data mean, and how can it be interpreted within
conventional qualitative theory? In order to achieve this goal, some case
studies in corpus
4. Approaches and Methods
This thesis will be based on the collection and analysis of data from a
small-scale self-built parallel corpus, combining quantitative and qualitative
methods to reach a final conclusion on the stylistic contrasts between the
source text (ST) and the target texts (TTs). To reach this goal, the ST and TTs
must be made machine-readable and proofread, and a parallel corpus must be
compiled and marked up.
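As an illustration of what compiling and marking up such a corpus might
involve, the sketch below stores each aligned unit (one ST segment and its two
TT counterparts) as a small XML record. The element and attribute names
(corpus, unit, st, tt, translator) and the labels "A" and "B" are hypothetical
choices made for this example, not an established mark-up scheme, and the
placeholder strings stand in for real aligned segments.

```python
import xml.etree.ElementTree as ET

def build_parallel_corpus(aligned_units):
    """Build an XML tree from (source, translation A, translation B) triples."""
    root = ET.Element("corpus")
    for i, (st, tt_a, tt_b) in enumerate(aligned_units, start=1):
        unit = ET.SubElement(root, "unit", id=str(i))
        ET.SubElement(unit, "st", lang="zh").text = st
        ET.SubElement(unit, "tt", lang="en", translator="A").text = tt_a
        ET.SubElement(unit, "tt", lang="en", translator="B").text = tt_b
    return root

def translation_text(root, translator):
    """Collect one translator's segments in corpus order."""
    return [tt.text for tt in root.iter("tt") if tt.get("translator") == translator]

# Placeholder strings, not real text from the story or its translations.
units = [
    ("ST segment 1", "TT-A segment 1", "TT-B segment 1"),
    ("ST segment 2", "TT-A segment 2", "TT-B segment 2"),
]
corpus = build_parallel_corpus(units)
print(ET.tostring(corpus, encoding="unicode"))
```

Keeping the three texts in one aligned structure allows each translation to be
extracted as a whole for statistics, or read side by side with the ST for
qualitative comparison.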
5. A Tentative Outline
2. “The New Year’s Sacrifice” and its Two English Versions
3. Approaches and Methods: Self-built Parallel Corpus
3.1. Theoretical Background
3.2. Study Procedure
4. A Corpus-based Stylistic Contrast between the ST and TTs
4.2. Lexical Diversity
4.2.1. Lexical Density by TTR and Standardized TTR
4.2.2. Lexical Variation by Uber Index
4.2.3. Lexical Sophistication by Word Frequency Profile
4.3. Differences in Translational Language
4.3.1. Conjunction Use
4.3.2. Passive Voice
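The lexical measures named in items 4.2.1 and 4.2.2 of the outline can be
sketched as follows. The standardized TTR is computed here as the mean TTR of
consecutive fixed-size chunks, and the Uber index as U = (log N)^2 / (log N -
log V), where N is the number of tokens and V the number of types. The chunk
size of 1,000 tokens and the use of base-10 logarithms are assumptions of this
sketch and would have to match whatever software is finally used.

```python
import math

def ttr(tokens):
    """Plain type-token ratio."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def standardized_ttr(tokens, chunk_size=1000):
    """Mean TTR over consecutive full chunks; leftover tokens are dropped."""
    chunks = [tokens[i:i + chunk_size]
              for i in range(0, len(tokens) - chunk_size + 1, chunk_size)]
    if not chunks:
        return ttr(tokens)  # text shorter than one chunk: fall back to TTR
    return sum(ttr(c) for c in chunks) / len(chunks)

def uber_index(tokens):
    """Uber index U = (log N)^2 / (log N - log V), base-10 logs assumed."""
    n, v = len(tokens), len(set(tokens))
    if n == 0 or n == v:
        raise ValueError("Uber index is undefined when N is 0 or V equals N")
    return math.log10(n) ** 2 / (math.log10(n) - math.log10(v))

# Toy data: 100 tokens drawn from 10 types, so log N = 2 and log V = 1.
toy = [str(i % 10) for i in range(100)]
print(ttr(toy), standardized_ttr(toy, chunk_size=10), uber_index(toy))
```

The toy data shows why standardization matters: plain TTR falls as a text gets
longer, whereas the chunked measure stays comparable across texts of different
lengths, such as two translations of the same story.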
6. Work Schedule (The date indicated is the deadline for each stage)
24th Oct. - 14th Nov. 2011: Meet supervisors and discuss possible topics and
references.
17th Nov. 2011: Decide on the topic and start writing the Research Proposal
and Literature Review.
12th Dec. 2011: Defend the Research Proposal.
30th Dec. 2011: Finish the Literature Review.
13th Feb. 2012: Finish the first draft of the thesis.
13th Apr. 2012: Finish the second draft based on the feedback.
28th Apr. 2012: Finish the final draft.
7th May 2012: Defend the thesis.
25th May 2012: Finish the follow-up work.
Baker, Mona. “Corpus Linguistics and Translation Studies: Implications
and Applications” [A]. Text and Technology [C]. Ed. Mona Baker. Amsterdam: John
Benjamins, 1993. 233-250.
Baker, Mona. “Corpora in Translation Studies: An Overview and Some
Suggestions for Future Research” [J]. Target, 1995, (2): 223-243.
Hoover, David. “Some Approaches to Corpus Stylistics” [J]. Journal of
Foreign Languages, 2010, (2): 67-81.
Kennedy, Graeme. An Introduction to Corpus Linguistics [M]. Essex:
Leech, Geoffrey and Short, Michael. Style in Fiction: A Linguistic
Introduction to English Fictional Prose [M]. Essex: Longman, 1982.
Lyell, William A. Diary of a Madman and Other Stories [M]. New York:
Penguin Classics, 1973.
Olohan, Maeve. Introducing Corpora in Translation Studies [M]. Oxford:
Sinclair, John. Corpus, Concordance, Collocation [M]. Oxford: Oxford
University Press, 1991.
Woods, Anthony, Paul Fletcher and Arthur Hughes. Statistics in Language
Studies [M]. Cambridge: Cambridge University Press, 1986.
Yang, Xianyi and Gladys Yang. The New-Year Sacrifice and Other Stories
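[M]. Hong Kong: The Chinese University Press, 2002.
Lu, Weizhong (卢卫中) and Xia Yun (夏云). “Corpus Stylistics: A New Approach to
Literary Stylistic Studies” [J]. Journal of Foreign Languages, 2010, (1): 47-53.
Lu, Xun (鲁迅). Wandering [M]. Beijing: People’s Literature Publishing House,
1979.
Wan, Lifang (万丽芳). “Lexical Diversity Research on L2 English Majors’
Writing” [J]. Foreign Language World, 2010, (1): 40-46.
Xiao, Weiqing (肖维青). “Self-built Corpora and Translation Criticism” [J].
Foreign Languages Research, 2005, (4): 60-65.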