Top 20 Dependency-Based Self-Attention for Transformer NMT

Dependency-Based Self-Attention for Transformer NMT

... the Transformer model (Vaswani et ... (RNN)-based models (Sutskever et ... garnered attention from MT researchers. The Transformer model computes the strength of a relationship between two ...

8

Design and Implementation of Consecutive Interpreting System Based on Transformer NMT Model

... multi-head self-attention mechanism, and the second is a simple, position-wise fully connected feed-forward ... multi-head attention over the output of the encoder stack ... in NMT, ...

8

Tilde’s Machine Translation Systems for WMT 2018

... when NMT systems first showed to achieve significantly better results than statistical machine translation (SMT) systems (Bojar et ... for NMT have changed on a yearly (and even more frequent) ... shallow ...

9

IITP MT System for Gujarati English News Translation Task at WMT 2019

... network based encoder-decoder NMT architecture (Cho et ... the self-attention to better encode a sequences. Self-attention is used in the architecture to calculate ...

5

Attending to Future Tokens for Bidirectional Sequence Generation

... bidirectional self-attention module where every token can attend to every other ... a Transformer is not restricted to sequential ... in Transformer or LSTM-based models (Gu et ...

10

How Much Attention Do You Need? A Granular Analysis of Neural Machine Translation Architectures

... and self-attention (Vaswani et ... the dependency between source language time steps, leading to considerable speed-ups in training time and improvements in ... The Transformer, however, ...

10

Knowledge Enriched Transformer for Emotion Detection in Textual Conversations

... Knowledge-Enriched Transformer (KET) to effectively incorporate contextual information and external knowledge bases to address the aforementioned ... The Transformer (Vaswani et ... The ...

12

On the Relation between Position Information and Sentence Length in Neural Machine Translation

... CNN, Transformer allows us to change the position information ... vanilla Transformer, the modified Transformer using self-attention with relative positional encodings (Shaw et ... improves ...

11

Look Harder: A Neural Machine Translation Model with Hard Attention

... hard-attention based Transformer model and the original soft-attention based Transformer model indicates the effectiveness of selecting a few relevant source tokens for each ...

7

Tensorized Self Attention: Efficiently Modeling Pairwise and Global Dependencies Together

... tensorized self-attention (MTSA), for context ... pairwise dependency is captured by an efficient dot-product based token2token self-attention, while the global dependency ...

11

Joey NMT: A Minimalist NMT Toolkit for Novices

... Joey NMT, a minimalist neural machine translation toolkit based on PyTorch that is specifically designed for ... Joey NMT provides many popular NMT features in a small and simple code base, so ...

6

Dependency-Based Relative Positional Encoding for Transformer NMT

... dependency-based NMT model that uses dependency trees for both source and target ... the Transformer, but did not improve the Transformer’s ... the Transformer model so that it incorpo- ...

8

Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search

... interaction, at each iteration we chose a phrase of up to three tokens from the reference translation which does not appear in the current MT hypotheses. In the strict setting, the complete phrase must be missing ...

12

Moon IME: Neural based Chinese Pinyin Aided Input Method with Customizable Association

... There are variable referential natural language processing studies (Cai et al., 2018; Li et al., 2018b; He et al., 2018; Li et al., 2018a; Zhang et al., 2018a; Cai et al., 2017a,b) for IME development to refer to. Most ...

6

A Multiscale Visualization of Attention in the Transformer Model

... When specific neurons are linked to a tangible outcome, it presents an opportunity to intervene in the model (Bau et al., 2019). By altering the relevant neurons, or by modifying the model weights that determine these ...

6

Massive Exploration of Neural Machine Translation Architectures

... Table 6 shows the effect of varying beam widths and adding length normalization penalties. A beam width of 1 corresponds to greedy search. We found that a well-tuned beam search is crucial to achieving good results, and ...

10

Bilingual GAN: A Step Towards Parallel Text Generation

... space based GAN methods and attention based sequence to sequence models have achieved impressive results in text generation and unsupervised machine translation respec- ... space based model capable ...

10

Analyzing the Structure of Attention in a Transformer Language Model

... The Transformer is a fully attention-based alternative to recurrent networks that has achieved state-of-the-art results across a range of NLP ... of attention in a Transformer language ...

14

Relation Classification Using Segment Level Attention based CNN and Dependency based RNN

... Traditional supervised approaches can be divided into feature-based methods and kernel methods. Feature-based methods focus on extracting and combining relevant features. Rink and Harabagiu (2010) ...

6

The AMU UEDIN Submission to the WMT16 News Translation Task: Attention based NMT Models as Feature Functions in Phrase based SMT

... • N-best list extraction is more difficult, as hypotheses that have been recombined do not display correct cumulative sums for the NMT-feature scores. The one-best translation is always correctly scored as it ...

7