Classical Chinese Language and Translation

Transliteration Extraction from Classical Chinese Buddhist Literature Using Conditional Random Fields with Language Models

Cognates and loanwords play important roles in research on language origins and cultural interchange. Therefore, extracting plausible cognates or loanwords from historical literature is a key issue in historical linguistics. Loanwords are usually adopted from other languages through transliteration. In Chinese historical literature, the characters used to transliterate the same loanword may vary because of different translation eras or different Chinese language/dialect preferences among translators. For example, the translation of Buddhist scriptures from Sanskrit into classical Chinese occurred mainly from the 1st century to the 10th century. In these works, the same Sanskrit word may be transliterated into different Chinese loanword forms. For instance, the surname of the Buddha, Gautama, is transliterated into several different forms, such as “瞿曇” (qü-tan) or “喬答摩” (qiao-da-mo), and the name “Culapanthaka” has several different Chinese transliterations, such as “朱利槃特” (zhu-li-pan-te) and “周利槃陀伽” (zhou-li-pan-tuo-qie). To assist researchers in historical linguistics and other digital humanities fields, an approach to extracting transliterations from classical Chinese texts is necessary.

English to Chinese Translation: How Chinese Character Matters

Ancient Chinese (or Classical Chinese, 文言文) can be conveniently split into characters, because most characters in ancient Chinese can still be understood by someone who knows only modern Chinese (or Written Vernacular Chinese, 白话文) words. For example, “三人行,则必有我师焉。” is one of the popular sentences in the Analects (论语), and its corresponding modern Chinese words and English meaning are shown in TABLE 1. From the table, we can see that the characters in ancient Chinese have independent meanings, whereas most characters in modern Chinese do not and must combine into words to make sense. If we split modern Chinese sentences into characters, the semantic meaning carried by the words is partially lost. Whether this semantic function of Chinese words can be partly replaced by the alignment model and Language Model (LM) of character-based SMT will be shown in this paper.
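
The character-level splitting described above can be illustrated with a minimal Python sketch. The function name `split_into_characters` is hypothetical, and dropping punctuation and whitespace is an assumption made here for illustration; the paper's own preprocessing is not shown in this excerpt.

```python
import unicodedata


def split_into_characters(sentence: str) -> list[str]:
    """Split a sentence into single-character tokens.

    Assumption: punctuation (Unicode categories starting with "P")
    and whitespace are dropped rather than kept as tokens.
    """
    return [
        ch
        for ch in sentence
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    ]


# The Analects sentence quoted in the passage above.
tokens = split_into_characters("三人行,则必有我师焉。")
print(" ".join(tokens))
```

Each resulting token is one character, which is the unit a character-based SMT system would align and model instead of segmented words.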

Generating Chinese Classical Poems Based on Images

Inspired by work in machine translation and object detection, K. Xu et al. [12] present an attention-based model that can automatically learn to describe the contents of images. They train the model either in a deterministic manner using standard backpropagation techniques or stochastically by maximizing a variational lower bound. K. Simonyan and A. Zisserman [13] use an architecture with very small (3×3) convolution filters to evaluate the effect of network depth comprehensively, showing that a significant improvement over prior-art configurations can be achieved by pushing the depth to 16-19 layers. Combining computer vision and natural language processing, Xiaobing [14], developed by Microsoft, can compose a modern Chinese poem given a picture. The system has learned from the modern poetry of 519 poets since 1920 and has been trained more than 10000 times.

Generating Classical Chinese Poems from Vernacular Chinese

Compared with supervised machine translation approaches (Cho et al., 2014; Bahdanau et al., 2015), unsupervised machine translation (Lample et al., 2018a,b) does not rely on human-labeled parallel corpora for training. This technique has been shown to greatly improve the performance of translation systems for low-resource languages (e.g., English-Urdu translation). The unsupervised machine translation (UMT) framework has also been applied to various other tasks, e.g. image captioning (Feng et al., 2019), text style transfer (Zhang et al., 2018), speech-to-text translation (Bansal et al., 2017) and clinical text simplification (Weng et al., 2019). The UMT framework makes it possible to apply neural models to tasks where limited human-labeled data is available. However, in previous tasks that adopt the UMT framework, the abstraction levels of the source and target languages are the same. This is not the case for our task.

A Classical Chinese Corpus with Nested Part of Speech Tags

We largely adopted the tagset of the Penn Chinese Treebank. As the standard most familiar to the computational linguistics community, this tagset has been used in annotating a large volume of modern Chinese texts, offering us the possibility of leveraging existing modern Chinese annotations as training data as we seek automatic methods to expand our corpus. For the most part, the Penn tagset can be adopted for classical Chinese in a straightforward manner. For example, the tag PN (pronoun) is used not for the modern Chinese pronouns 我 wo ‘I’ and 你 ni ‘you’ but for the classical equivalents 吾 wu ‘I’ and 爾 er ‘you’. Similarly, the tag SP (sentence-final particles) is applied not to the modern Chinese particles 吧 ba or 呀 a but to their classical counterparts 耳 er and 也 ye. In other cases, we have identified roughly equivalent word classes in classical Chinese. To illustrate, although the classical language has no prepositions in the modern sense, the P (preposition) tag is retained for words known as coverbs (Pulleyblank, 1995). A few tags specific to modern Chinese are discarded; these include DER, DEV, and FW (see Table 1).
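
As a rough illustration of the tag assignments just described, the following Python sketch applies a tiny lookup lexicon built only from the four classical words named in the passage. The names `CLASSICAL_LEXICON`, `DISCARDED_TAGS`, and `tag_characters`, and the `UNK` placeholder, are hypothetical; the corpus itself was not necessarily annotated by dictionary lookup, and these characters can take other tags in other contexts.

```python
# Tag assignments copied from the examples in the passage above.
CLASSICAL_LEXICON = {
    "吾": "PN",  # classical 'I' (pronoun)
    "爾": "PN",  # classical 'you' (pronoun)
    "耳": "SP",  # sentence-final particle
    "也": "SP",  # sentence-final particle
}

# Tags the passage names as discarded for classical Chinese.
DISCARDED_TAGS = {"DER", "DEV", "FW"}


def tag_characters(text, lexicon=CLASSICAL_LEXICON, unknown="UNK"):
    """Return (character, tag) pairs using a plain dictionary lookup.

    "UNK" is a placeholder for characters missing from the toy lexicon;
    it is not part of the Penn tagset.
    """
    tagged = []
    for ch in text:
        tag = lexicon.get(ch, unknown)
        if tag in DISCARDED_TAGS:
            raise ValueError(f"tag {tag} is not used for classical Chinese")
        tagged.append((ch, tag))
    return tagged


print(tag_characters("吾也"))
```

A real tagger would need context to disambiguate; the lookup only shows how the retained Penn tags attach to classical word forms.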

English language, culture and Translation

The Interpreting and Translation Section has significant experience in managing various short and long-term projects related to the provision of consultancy and tailor made training. The Section has designed and delivered individual and group training, tailored to the changing needs of various clients and offering flexible delivery and high quality of training. Recent and current clients include UN, UNDP, Moscow State University, European Parliament, Institute of Translation and Interpreting, Government of Iraq, Foreign & Commonwealth Office, Ministry of Defence, London Metropolitan Police, oil and gas companies in Kazakhstan among other private and public organisations.

The China which is here : translating classical Chinese poetry

Ts'ai, Ting-kan, Chinese Poems in English Rhymes. Chicago: University of Chicago Press, 1932.
Turner, John, A Golden Treasury of Chinese Poetry. Hong Kong: Renditions Book, 1989.
Waley, Arth[r]

Glimpses of Ancient China from Classical Chinese Poems

There has been increasing interest in corpus-based research on historical languages (Crane & Lüdeling, 2012). Large-scale corpora for Classical Chinese include the Academia Sinica Ancient Chinese Corpus (Wei et al., 1997), the corpus of the Centre for Chinese Linguistics at Peking University, the Chinese Ancient Text Database at the Chinese University of Hong Kong (Ho, 2002), and the Sheffield Corpus of Chinese (Hu et al., 2005). Linguistic annotations, if available in these corpora, are limited to part-of-speech (POS) tags. With this constraint, most previous corpus-based studies focused on character frequency distribution (Zhū, 2004; Zh ng, 2004; Qín, 2005), including a concordance for the Complete Tang Poems (Shǐ, 1990).

Vietnamese to Chinese Machine Translation via Chinese Character as Pivot

In this work, we focus on machine translation (MT) for language pairs with few parallel corpora but rich linguistic connections, using Vietnamese and Chinese as a case study. To exploit the linguistic characteristics shared by the language pair, the common written form, the Chinese character, is adopted as a translation bridge. Being the oldest continuously used writing system in the world, Chinese characters are logograms that are still used to write Chinese (汉字 / 漢字 in Chinese, hànzì in Chinese pinyin) and Japanese (kanji). Such characters were used but are currently less frequently used in Korean (hanja), and were also used in Vietnamese (chữ Hán). All the countries that were historically under the influence of Chinese language and culture are unofficially referred to as the Chinese character cultural sphere or Sinosphere. These two terms are often used interchangeably but have different denotations (Matisoff, 1990). An example of Chinese character writing in different regions is shown in Figure 1.

Query Translation in Chinese English Cross Language Information Retrieval

The Chinese query is translated into English by looking up, in a Chinese-English dictionary, the English senses of each Chinese query term and of the words in its associated word list. The procedur[r]

Stylistic Grammars in Language Translation

Stylistic Grammars in Language Translation. Chrysanne DiMarco and Graeme Hirst, Department o[.]

Translation Divergences in Chinese–English Machine Translation: An Empirical Investigation

In this article, we conduct an empirical investigation of translation divergences between Chinese and English using a parallel treebank. In order to semi-automatically identify and categorize the translation divergences, we first devise a hierarchical alignment scheme between Chinese and English parse trees that eliminates conflicts and redundancies between word alignments and syntactic parses to prevent the generation of spurious translation divergences. Using this hierarchically aligned Chinese–English parallel treebank that we call HACEPT, we are able to semi-automatically identify translation divergences, classify them into seven types, and quantify each type of translation divergence. Our results show that the translation divergences are much broader than previously described in studies that are largely based on anecdotal evidence and linguistic knowledge. Our results also quantitatively demonstrate that some high-profile translation divergences that motivate previous research are actually very rare in our data, whereas other translation divergences that have previously received little attention actually exist in large quantities. We show that the type of syntax-based translation rules currently used in state-of-the-art SMT systems can be automatically extracted from HACEPT and they are expressive enough to capture the translation divergences. We also point out that existing treebanks are not optimal for extracting such translation rules. We also discuss the implications of our study to attempts to bridge translation divergence by devising shared semantic representations across languages. We show that although it is possible to bridge some translation divergences with semantic representations, other translation divergences are open-ended and building a semantic representation that captures all possible translation divergences may be impractical.

Adaptive Language and Translation Models for Interactive Machine Translation

Cache-based language models were introduced by Kuhn and de Mori (1990) for the dynamic adaptation of speech language models. These models, inspired by the memory caches on modern computer architectures, are motivated by the principle of locality, which states that a program tends to repeatedly use memory cells that are physically close. Similarly, when speaking or writing, humans tend to use the same words and phrase constructs from paragraph to paragraph and from sentence to sentence. This leads us to believe that, when processing a document, the part of a document that is already processed (e.g. for speech recognition, translation or text prediction) gives us very useful information for future processing in the same document or in other related documents.
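
The locality idea above can be sketched in Python as a unigram model that interpolates a fixed background distribution with counts from a sliding cache of recently seen words. This is a deliberately simplified illustration, not the model of Kuhn and de Mori or of this paper: the class name `CacheUnigramLM`, the linear interpolation, and the `cache_size` and `cache_weight` parameters are all assumptions made here.

```python
from collections import Counter, deque


class CacheUnigramLM:
    """Static unigram probabilities interpolated with a recency cache."""

    def __init__(self, static_probs, cache_size=200, cache_weight=0.2):
        self.static_probs = static_probs        # background P(word)
        self.cache = deque(maxlen=cache_size)   # most recent words
        self.counts = Counter()                 # counts of words in cache
        self.cache_weight = cache_weight        # interpolation weight

    def observe(self, word):
        """Record a processed word, evicting the oldest one if full."""
        if len(self.cache) == self.cache.maxlen:
            self.counts[self.cache[0]] -= 1     # word about to be evicted
        self.cache.append(word)
        self.counts[word] += 1

    def prob(self, word):
        """Mix the static probability with the cache relative frequency."""
        static_p = self.static_probs.get(word, 0.0)
        if not self.cache:
            return static_p
        cache_p = self.counts[word] / len(self.cache)
        return (1 - self.cache_weight) * static_p + self.cache_weight * cache_p


lm = CacheUnigramLM({"the": 0.5, "translation": 0.1}, cache_size=4, cache_weight=0.5)
before = lm.prob("translation")
lm.observe("translation")
after = lm.prob("translation")
print(before, after)
```

After a word has been observed, its probability rises for as long as it stays in the cache, which mirrors the claim that already processed text is informative for what follows.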

Automated Evaluation of Language Translation

The path to a systematic picture of MT evaluation is long and intensive. While it is difficult to write a comprehensive overview of the MT evaluation literature, certain tendencies and trends should be mentioned. First, throughout the history of evaluation, two aspects – often called quality and fidelity – stand out. MT researchers in particular often feel that if a system produces syntactically and lexically well-formed sentences after translation (i.e., high quality output) that do not distort the meaning (semantics) of the input (i.e., high fidelity), then the evaluation is good and sufficient. System developers and real-world users often add further evaluation measures, notably system extensibility (how easy it is for a user to add new words, grammar, and transfer rules), coverage (specialization of the system to the domains of interest), and price. In fact, as discussed in (Church and Hovy, 1993), for some real world applications quality may take a back seat to these factors. Various ways of measuring quality have been proposed, some focusing on specific syntactic constructions like relative clauses, number agreement etc. (Flanagan, 1994), others asking judges to rate each sentence as a whole on an N-point scale (White et al., 1992 1994; Doyon et al., 1998), and others automatically measuring the perplexity of a target text against an n-gram language model of ideal translations (Papineni et al., 2001). The amount of agreement among such measures has never been taken into account. Fidelity requires bilingual judges, and is usually measured on an N-point scale by having judges rate how well each portion of the system's output expresses the content of an equivalent portion of one or more ideal human translations (White et al., 1992 1994; Doyon et al., 1998).
One proposal for measuring quality automatically is to project both system output and a number of ideal human translations into a vector space, and then measure how far the system's translation deviates from the mean of the human
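
The vector-space proposal in the truncated passage above can be sketched in Python under explicit assumptions: the excerpt does not say which vector space or distance is used, so this example assumes bag-of-words count vectors and cosine distance. All function names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Project a sentence onto word-count coordinates (an assumption)."""
    return Counter(sentence.lower().split())


def mean_vector(vectors):
    """Component-wise mean of a list of sparse count vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {word: count / len(vectors) for word, count in total.items()}


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is empty."""
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def deviation_from_references(system_output, references):
    """Distance between the system output and the mean reference vector."""
    ref_mean = mean_vector([bag_of_words(r) for r in references])
    return cosine_distance(bag_of_words(system_output), ref_mean)


references = ["the cat sat on the mat", "a cat sat on the mat"]
close = deviation_from_references("the cat sat on the mat", references)
far = deviation_from_references("stock prices fell sharply", references)
print(close < far)
```

A smaller deviation means the output lies nearer the centre of the human translations; an output sharing no words with any reference gets the maximum distance of 1.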

Machine Translation for Language Preservation

Statistical machine translation has been remarkably successful for the world’s well-resourced languages, and much effort is focussed on creating and exploiting rich resources such as treebanks and wordnets. Machine translation can also support the urgent task of documenting the world’s endangered languages. The primary object of statistical translation models, bilingual aligned text, closely coincides with interlinear text, the primary artefact collected in documentary linguistics. It ought to be possible to exploit this similarity in order to improve the quantity and quality of documentation for a language. Yet there are many technical and logistical problems to be addressed, starting with the problem that – for most of the languages in question – no texts or lexicons exist. In this position paper, we examine these challenges, and report on a data collection effort involving 15 endangered languages spoken in the highlands of Papua New Guinea.

A Statistical Approach to Language Translation

A STATISTICAL APPROACH TO LANGUAGE TRANSLATION. P. BROWN, J. COCKE, S. DELLA PIETRA, V. DELLA PI[.]

How To Teach A Chinese Language

American Educational Research Journal, Computer Assisted Language Instruction Consortium Journal, Language Learning & Technology, Learning and Individual Differences, Journal of Educational Computing Research, Journal of Literacy Research, SAGE Open, Science, System: An International Journal of Educational Technology and Applied Linguistics, TESOL Quarterly, Educational Research and Evaluation 2011-date Proposal Review for Conferences:

Chinese Native Language Identification

To see how these models perform across languages, we also compare the results against the TOEFL11 corpus used in the NLI2013 shared task. We perform the same experiments on that dataset using the English CoreNLP models, Penn Treebank PoS tagset and a set of 400 English function words. Figure 2 shows the results side by side. Remarkably, we see that the model results closely mirror each other across corpora. This is a highly interesting finding from our study that merits further investigation. There is a systematic pattern occurring across data from learners of completely different L1-L2 pairs. This suggests that manifestations of CLI via surface phenomena occur at the same levels and patternings regardless of the L2. Cross-language studies can help researchers in linguistics and cognitive science to better understand the SLA process and language transfer effects. They can enhance our understanding of how language is processed in the brain in ways that are not possible by just studying monolinguals or single L1-L2 pairs, thereby providing us with important insights that increase our knowledge and understanding of the human language faculty.

Classical and modern Arabic corpora: Genre and language change

Classical Arabic texts, in particular the Quran and Hadith, are a specialised genre. The Classical Arabic Quran has been analysed, translated, interpreted and annotated by scholars for over a thousand years, resulting in many knowledge sources for rich corpus linguistic annotation. Modern Standard Arabic is the common written standard used throughout the Arab world; but our research with Arabic corpora has covered wider genre and language variation. AI researchers at Leeds University have collaborated with Arabic linguists to develop a number of Classical Arabic corpus resources: the Quranic Arabic Corpus with several layers of linguistic annotation; the QurAna Quran pronoun anaphoric co-reference corpus; the QurSim Quran verse similarity corpus; the Qurany Quran corpus annotated with English translations and verse topics; the Boundary-Annotated Quran Corpus; the Quran Question and Answer Corpus; the Multilingual Hadith Corpus; the King Saud University Corpus of Classical Arabic; and the Corpus for teaching about Islam. We have also developed Modern Arabic corpus resources spanning a range of genres and language types: Arabic By Computer; the Corpus of Contemporary Arabic; the Arabic Internet Corpus; the World Wide Arabic Corpus; the Arabic Discourse Treebank; the Arabic Learner Corpus; the Arabic Children’s Corpus; and the Arabic Dialect Text Corpus. Modern Arabic corpus researchers harvest online news, web-pages, and internet social media; these might seem to differ markedly from religious texts in language and genre. However, Quran verses are short text snippets, analogous to Twitter tweets or Amazon customer reviews. Quran verses annotated with analyses derived from traditional exegesis or scholars’ commentaries can provide rich training data for supervised Machine Learning of language models in Artificial Intelligence research.
So, the language of the Quran may still inform Modern Arabic corpus linguistics and artificial intelligence research, and the development of Modern Arabic text analytics tools.

English Translation of Chinese Tea Terminology from the Perspective of Translation Ethics

DOI: 10.4236/ojml.2019.93017, Open Journal of Modern Linguistics

mistakes, most of the mistranslation is caused either by the translator’s failure to understand the source text, or by the translator’s unawareness of cultural differences and of translation service. As a result, the Chinese tea named “大红袍” has been translated in various ways, such as “Dahongpao Tea”, “Big Hongpao Tea”, “Clovershrub Tea” and “Red Robe Tea”, which may mislead customers into mistaking them for different types of tea; “六安瓜片 (Lu’an Melon Seeds Tea)” is transliterated as “Lu An Gua Pian”, which does not make any sense to foreign customers. In effect, “瓜片 (Melon Seeds)” means that the tea leaves are shaped like melon seeds. Therefore, if the tea translation adopts only transliteration, it may not only lose the connotation of the tea name but also leave the reader unable to understand it. In view of these problems, it is of great significance to explore effective strategies for tea terminology translation.
