Top 20 Has Machine Translation Achieved Human Parity? A Case for Document level Evaluation

Has Machine Translation Achieved Human Parity? A Case for Document level Evaluation

... neural machine translation achieves parity with professional human translation on the WMT Chinese–English news translation ...the evaluation of single sentences and ...

6

Extending Machine Translation Evaluation Metrics with Lexical Cohesion to Document Level

... with human assessments at the system level even when used alone, highly comparable to BLEU and ...the document level, however, they are not as good as the ...the document and system ...

9

Improving Evaluation of Document level Machine Translation Quality Estimation

... Quality control is carried out by inclusion of pairs of genuine MT outputs and automatically degraded versions of them (bad references) within 100-translation HITs, before a difference of means significance test ...

6

Accurate Evaluation of Segment level Machine Translation Metrics

... for evaluation of MT systems and document-level metrics have been identified (Koehn, 2004; Graham and Baldwin, 2014; Graham et ...test has been proposed for segment-level metrics, ...

9

Hierarchical Modeling of Global Context for Document Level Neural Machine Translation

... We integrate our proposed HM-GDC into the original Transformer model implemented by OpenNMT (Klein et al., 2017). Following the Transformer model (Vaswani et al., 2017), the hidden size and filter size are set to ...

10

Document Level Machine Translation Evaluation with Gist Consistency and Text Cohesion

... a document. They propose to build document-level MT metrics by integrating cohesion score based on lexical cohesion ...than human translation when the MT model is especially trained on ...

8

Attaining the Unattainable? Reassessing Claims of Human Parity in Neural Machine Translation

... that machine translation (MT) has reached human parity for the translation of news from Chinese into English, using pairwise ranking and considering three variables that were not ...

11

Cache based Document level Statistical Machine Translation

... However, document-level translation has drawn little attention from the SMT research ...of document boundaries (Tam, ...ture document-level information and model it via ...

11

Bilingual Lexical Cohesion Trigger Model for Document Level Machine Translation

... cohesion has been explored in the literature of both linguistics and computational ...is achieved through word choices in a text by Halliday and Hasan ...of machine translation output, ...

5

Using a Graph based Coherence Model in Document Level Machine Translation

... Meteor has been shown to have a higher correlation with human judgements than BLEU (Lavie et ...tive evaluation metric for our ...reference translation, the metric is interesting with regard ...

10

When and Why is Document level Context Useful in Neural Machine Translation?

... Alternatively, multi-encoder approaches encode each additional sentence separately. The model learns representations solely of the context sentences which are then integrated into the baseline model architecture. ...

11

Human Evaluation of Neural Machine Translation: The Case of Deep Learning

... It is important to note that these typologies were established before the creation of NMT, and it could therefore be argued that they concentrate mostly on features for which more recent MT systems are not likely to ...

11

Docent: A Document Level Decoder for Phrase Based Statistical Machine Translation

... DP-based SMT decoders have a parameter called distortion limit that limits the difference in word order between the input and the MT output. In DP search, this is formally considered to be a parameter of the search ...

6

Automatic Evaluation of Chinese Translation Output: Word Level or Character Level?

... English-to-Chinese translation task evaluated 127 documents with 1,830 ...segment has 4 reference translations and the system translations of 11 MT systems, released in the corpus ...scale. Human ...

6

Mining question-answer pairs from web forum: a survey of challenges and resolutions

... Internet forums, which are also known as discussion boards, are popular web applications. Members of the board discuss issues and share ideas to form a community within the board, and as a result generate huge amount of ...

9

Superiority Of Graph-Based Visual Saliency (GVS) Over Other Image Segmentation Methods

... emphasis has focused primarily on computer vision analysis, which is also often time hard to isolate the one most suitable among a given set of ...image evaluation models (PRI, VOI, GCE and DBE) were tested ...

8

Searching for Context: a Study on Document Level Labels for Translation Quality Estimation

... literal translation of “This is wrong” - by “Das ist nicht gut”, which fits better into the ...eral translation of “Here, this layer is thin” - to “Hier ist die Anzahl solcher Menschen gering”, a ...

8

Comparing a Hand crafted to an Automatically Generated Feature Set for Deep Learning: Pairwise Translation Evaluation

... Word embeddings are very important in our model, because they allow us to model the relations between the two translations and the reference. In this work, we created and trained our own embeddings between the two MT ...

9

Automatically Evaluating Answers to Definition Questions

... Once this answer key of vital/okay nuggets is created, the assessor then manually scores each run. For each system response, he or she decides whether or not each nugget is present. Assessors do not simply perform ...

8

Continuous Measurement Scales in Human Evaluation of Machine Translation

... Table 4 shows a breakdown by target language of the proportion of judgments collected whose scores met the significance threshold of p < 0.05. Results appear at first to have shockingly low levels of high quality ...

9