NLP, Machine Translation, Machine Learning

Top PDF NLP, Machine Translation, Machine Learning:

A Machine Learning Method to Distinguish Machine Translation from Human Translation

In statistical machine translation systems, performance depends on the language model (LM) and the translation model. Traditional Back-off n-gram LMs (BNLMs) (Chen and Goodman, 1996; Chen and Goodman, 1999; Stolcke, 2002) have been widely used for probability estimation, and BNLMs also show up in many other NLP tasks (Jia and Zhao, 2014; Zhang et al., 2012; Xu and Zhao, 2012). Recently, a better probability estimation method, Continuous-Space Language Models (CSLMs), especially Neural Network Language Models (NNLMs) (Bengio et al., 2003; Schwenk et al., 2006; Schwenk, 2007; Le et al., 2011), has been adopted in SMT tasks (Son et al., 2010; Son et al., 2012; Wang et al., 2013; Wang et al., 2015; Wang et al., 2014). Neural Network Translation Models (NNTMs) have also shown success in SMT (Kalchbrenner and Blunsom, 2013; Blunsom et al., 2014; Devlin et al., 2014). However, the high computational cost of CSLMs makes decoding with them directly difficult. This motivates the n-best reranking method used in this paper (Schwenk et al., 2006; Son et al., 2012).
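For illustration, here is a minimal Python sketch of the n-best reranking idea this excerpt refers to: the decoder produces a k-best list cheaply, and the expensive neural LM rescores only those k hypotheses. The `nnlm_logprob` scorer, the interpolation weight, and the list format are hypothetical placeholders, not the paper's setup.

```python
# Minimal n-best reranking sketch (illustrative names; nnlm_logprob is an
# assumed function returning a sentence-level log-probability).

def rerank_nbest(nbest, nnlm_logprob, lm_weight=0.5):
    """Rescore an n-best list: combine each hypothesis's decoder model
    score with a neural LM score computed only for these k candidates."""
    rescored = []
    for hyp, decoder_score in nbest:
        total = decoder_score + lm_weight * nnlm_logprob(hyp)
        rescored.append((total, hyp))
    return max(rescored)[1]  # hypothesis with the highest combined score

# Usage: best = rerank_nbest([("la maison bleue", -4.2),
#                             ("la maison bleu", -4.0)], my_nnlm.score)
```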

Learning for Semantic Parsing with Statistical Machine Translation

(Air Travel Information Service) (Miller et al., 1996; Papineni et al., 1997; Macherey et al., 2001), in which a typical MR is only a single semantic frame. Learning methods have been devised that can generate MRs with a complex, nested structure (cf. Figure 1). However, these methods are mostly based on deterministic parsing (Zelle and Mooney, 1996; Kate et al., 2005), which lacks the robustness that characterizes recent advances in statistical NLP. Other learning methods involve the use of fully-annotated augmented parse trees (Ge and Mooney, 2005) or prior knowledge of the NL syntax (Zettlemoyer and Collins, 2005) in training, and hence require extensive human effort when porting to a new domain or language.

Preference Learning for Machine Translation

Mira was first applied to a structured NLP problem by McDonald et al. [2005]. In SMT, the algorithm has received most interest due to its appeal as an online learning algorithm [Arun and Koehn, 2007; Watanabe, 2012; Watanabe et al., 2007b,a], and, thanks to its efficiency, for enabling the use of sparse features [Chiang et al., 2009, 2008; Hasler et al., 2011; Eidelman, 2012]. Since Mira can be implemented as an online algorithm, it also allows for parallelization [Eidelman et al., 2013c,b]. Batch variants of the Mira algorithm have also been explored for SMT [Zhao and Huang, 2013; Cherry and Foster, 2012]. As we have shown in our presentation of Mira for SMT, hope and fear derivations are a way of defining effective constraints. However, by using k-best lists as a stand-in for the full search space, some fidelity is lost, which is why Chiang [2012] proposes a cost-augmented inference approach to search for constraints in a larger space. Wisniewski and Yvon [2013] present another variant of the constraints, and Eidelman et al. [2013a] propose a variant of the margin definition in Mira. Tan et al. [2013] propose an algorithm which, similar to Mert, optimizes the exact corpus-level BLEU score. In general structured prediction for SMT, approaches that include the search procedure in learning have been explored thoroughly: Zhang et al. [2008] present an application of search-based structured prediction [Daumé et al., 2009] for SMT, and in another line of work, violation-fixing approaches are presented, which counteract incorrect updates caused by search errors [Huang et al., 2012; Yu et al., 2013; Liu and Huang, 2014; Zhang et al., 2013; Zhao et al., 2014].
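As a concrete reference point for the hope/fear constraints mentioned above, here is a hedged sketch of one Mira-style update over a k-best list, following the common formulation rather than any specific toolkit; feature vectors and sentence-level BLEU are assumed given.

```python
import numpy as np

def mira_update(w, kbest, C=0.01):
    """One hope/fear Mira update. kbest: list of (feat_vec, sent_bleu)."""
    hope = max(kbest, key=lambda h: np.dot(w, h[0]) + h[1])  # high score+BLEU
    fear = max(kbest, key=lambda h: np.dot(w, h[0]) - h[1])  # high score-BLEU
    delta = hope[0] - fear[0]
    # margin violation: hope should out-score fear by their BLEU difference
    loss = (hope[1] - fear[1]) - np.dot(w, delta)
    if loss > 0 and np.dot(delta, delta) > 0:
        eta = min(C, loss / np.dot(delta, delta))  # clipped step size
        w = w + eta * delta
    return w
```

The clipped step size is what distinguishes Mira from a plain perceptron update: it makes the smallest change to w that satisfies the violated constraint, capped by the aggressiveness parameter C.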

Machine Learning for Hybrid Machine Translation

We describe a substitution-based system for hybrid machine translation (MT) that has been extended with machine learning components controlling its phrase selection. The approach is based on a rule-based MT (RBMT) system which creates template translations. Based on the rule-based generation parse tree and target-to-target alignments, we identify the set of "interesting" translation candidates from one or more translation engines which could be substituted into our translation templates. The substitution process is either controlled by the output of a binary classifier trained on feature vectors from the different MT engines, or depends on weights for the decision factors, which have been tuned using MERT. We observe improvements in terms of BLEU scores over a baseline version of the hybrid system.
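To make the classifier-controlled substitution concrete, here is a hedged Python sketch under assumed interfaces (slot/candidate objects and a generic feature set such as engine confidence or alignment quality); it is an illustration of the technique, not the authors' implementation.

```python
from sklearn.linear_model import LogisticRegression

def train_selector(train_features, train_labels):
    """train_labels: 1 if substituting this candidate improved the output."""
    clf = LogisticRegression()
    clf.fit(train_features, train_labels)
    return clf

def substitute(clf, template_slots):
    """Fill each template slot with the candidate phrase the classifier
    is most confident about, if any clears the decision threshold."""
    for slot in template_slots:
        scored = [(clf.predict_proba([c.features])[0][1], c)
                  for c in slot.candidates]
        prob, best = max(scored, key=lambda x: x[0])
        if prob > 0.5:                      # only substitute when confident
            slot.fill(best.phrase)
```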

Learning Deep Transformer Models for Machine Translation

The residual network is the most common approach to learning deep networks, and plays an important role in Transformer. In principle, residual networks can be seen as instances of an ordinary differential equation (ODE), behaving like the forward Euler discretization with an initial value (Chang et al., 2018; Chen et al., 2018b). Euler's method is probably the most popular first-order solution to an ODE. But it is not yet accurate enough. A possible reason is that only one previous step is used to predict the current value
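The ODE view is easy to state in code. Below is a small sketch, with F as a stand-in for a Transformer sub-layer: a residual connection y + F(y) is exactly one forward-Euler step of dy/dt = F(y) with step size 1, while a two-step scheme (here an Adams-Bashforth-style update, included only to illustrate "using more than one previous step") can be more accurate.

```python
import numpy as np

def F(y):                        # placeholder "sub-layer" function
    return 0.1 * np.tanh(y)

def euler_step(y):               # residual connection = forward Euler
    return y + F(y)

def two_step(y_prev, y):         # 2nd-order Adams-Bashforth: uses two
    return y + 1.5 * F(y) - 0.5 * F(y_prev)   # previous evaluations
```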

A Study of Reinforcement Learning for Neural Machine Translation

We adopt the Transformer model with the transformer_big setting as defined in (Vaswani et al., 2017) for Zh-En and En-Zh translation, which achieves SOTA translation quality on several other datasets. For En-De translation, we utilize the transformer_base_v1 setting. These settings are exactly the same as those used in the original paper, except that we set the layer_prepostprocess_dropout for Zh-En and En-Zh translation to 0.05. The optimizer used for MLE training is Adam (Kingma and Ba, 2015) with an initial learning rate of 0.1, and we follow the same learning rate schedule as (Vaswani et al., 2017). During training, roughly 4,096 source tokens and 4,096 target tokens are paired in one mini-batch. Each model is trained using 8 NVIDIA Tesla M40 GPUs. For RL training, the model is initialized with the parameters of the MLE model (trained with only bilingual data), and we continue training it with a learning rate of 0.0001. As in (Bahdanau et al., 2017), to calculate the BLEU reward, we start all n-gram counts from 1 instead of 0 and
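A minimal sketch of the smoothed sentence-level BLEU reward described here, assuming the "start counts from 1" smoothing the excerpt names; tokenization and clipping details of the actual system may differ.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def smoothed_bleu(hyp, ref, max_n=4):
    """Sentence-level BLEU where every n-gram match/total count starts at 1,
    so short or imperfect samples still yield a non-zero reward signal."""
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        match = sum(min(c, r[g]) for g, c in h.items())  # clipped matches
        total = sum(h.values())
        log_prec += math.log((match + 1) / (total + 1))  # counts start at 1
    bp = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))  # brevity penalty
    return bp * math.exp(log_prec / max_n)
```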

Learning Mechanism in Machine Translation System “PIVOT”

Learning Mechanism in Machine Translation System "PIVOT" (Mitsugu Miura, Mikito Hirata, Nami Hoshin[.])

Semi-Supervised Learning for Neural Machine Translation

However, most existing NMT approaches suffer from a major drawback: they heavily rely on parallel corpora for training translation models. This is because NMT directly models the probability of a target-language sentence given a source-language sentence and does not have a separate language model like SMT (Kalchbrenner and Blunsom, 2013; Sutskever et al., 2014; Bahdanau et al., 2015). Unfortunately, parallel corpora are usually only available for a handful of resource-rich languages and are restricted to limited domains such as government documents and news reports. In contrast, SMT is capable of exploiting abundant target-side monolingual corpora to boost the fluency of translations. Therefore, the unavailability of large-scale, high-quality, and wide-coverage parallel corpora hinders the applicability of NMT.

Non-linear Learning for Statistical Machine Translation

Modern statistical machine translation (SMT) systems usually use a linear combination of features to model the quality of each translation hypothesis. The linear combination assumes that all the features are in a linear relationship and constrains each feature to interact with the rest in a linear manner, which might limit the expressive power of the model and lead to an under-fitted model on the current data. In this paper, we propose non-linear modeling of the quality of translation hypotheses based on neural networks, which allows more complex interactions between features. A learning framework is presented for training the non-linear models. We also discuss possible heuristics for designing the network structure which may improve the non-linear learning performance. Experimental results show that, with the basic features of a hierarchical phrase-based machine translation system, our method produces translations that are better than those of a linear model.
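The contrast between the two scoring schemes fits in a few lines. In this sketch the layer sizes and parameterization are illustrative, not the paper's: the linear scorer is a dot product, while the non-linear scorer passes the same features through a hidden layer so they can interact.

```python
import numpy as np

def linear_score(w, f):
    """Standard SMT hypothesis score: a linear combination of features."""
    return np.dot(w, f)

def nonlinear_score(params, f):
    """One-hidden-layer network over the same features: the tanh layer
    lets features interact non-linearly before the final combination."""
    W1, b1, w2 = params
    h = np.tanh(W1 @ f + b1)
    return np.dot(w2, h)
```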

Multi-agent Learning for Neural Machine Translation

To resolve the second problem, we simplify the many-to-many learning to one-to-many (one teacher vs. many students) learning, extending ensemble knowledge distillation (Fukuda et al., 2017; Freitag et al., 2017; Liu et al., 2018; Zhu et al., 2018). During training, each agent performs better by learning from the ensemble model (teacher) of all agents, integrating knowledge distillation (Hinton et al., 2015; Kim and Rush, 2016) into the training objective. This procedure can be viewed as introducing an additional regularization term into the training objective, with which each agent can gradually learn the advantages of the ensemble model. With this method, each agent is optimized not only to maximize the likelihood of the training data, but also to minimize the divergence between its own model and the ensemble model.
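A hedged sketch of an objective of this shape: data likelihood plus a distillation term pulling the agent toward the teacher (ensemble) distribution. The interpolation weight alpha and the array interfaces are assumptions for illustration, not the paper's exact loss.

```python
import numpy as np

def agent_loss(agent_logp, ensemble_p, gold_ids, alpha=0.5):
    """agent_logp: [T, V] agent log-probs per target position;
    ensemble_p: [T, V] teacher probs; gold_ids: [T] gold token indices."""
    # negative log-likelihood of the gold tokens (data term)
    nll = -np.mean(agent_logp[np.arange(len(gold_ids)), gold_ids])
    # cross-entropy to the teacher distribution (distillation regularizer)
    distill = -np.mean(np.sum(ensemble_p * agent_logp, axis=-1))
    return (1 - alpha) * nll + alpha * distill
```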

Learning to Actively Learn Neural Machine Translation

We present a framework to learn the sentence selection policy most suitable and effective for the NMT task at hand. This is in contrast to the majority of work in AL-MT, where hard-coded heuristics are used for query selection (Haffari and Sarkar, 2009; Bloodgood and Callison-Burch, 2010). More concretely, we learn the query policy based on a high-resource language pair sharing similar characteristics with the low-resource language pair of interest. Once trained, the policy is applied to the language pair of interest, capitalising on the learned signals for effective query selection. We make use of imitation learning (IL) to train the query policy. Previous work has shown that the IL approach leads to more effective policy learning (Liu et al., 2018) compared to reinforcement learning (RL) (Fang et al., 2017). Our proposed method effectively trains AL policies for the batch queries needed for NMT, as opposed to previous work on single query selection.
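A schematic sketch of the general pattern (not the paper's algorithm): on the high-resource pair an expensive oracle picks the best batch to label, and a cheap classifier is trained to imitate those choices from sentence features, to be reused later on the low-resource pair. The oracle, featurizer, and policy interfaces are all assumed placeholders.

```python
def imitation_rounds(pool, policy, oracle_batch, featurize, rounds=5, k=100):
    """Train a batch-query policy by imitating an oracle's selections."""
    examples = []
    for _ in range(rounds):
        best = oracle_batch(pool, k)          # expensive oracle on hi-res pair
        # label every pool sentence: was it in the oracle's chosen batch?
        examples += [(featurize(s), s in best) for s in pool]
        policy.fit(*zip(*examples))           # imitate all oracle choices so far
        pool = [s for s in pool if s not in best]
    return policy                             # later applied to the low-res pair
```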

Active Learning and Crowd-Sourcing for Machine Translation

Active learning has been applied to statistical parsing (Hwa, 2004; Tang et al., 2001) to improve sample selection for manual annotation. In the case of MT, active learning has remained largely unexplored. Some attempts include training multiple statistical MT systems on varying amounts of data, and exploring a committee-based selection for re-ranking the data to be translated and included for re-training (Callison-Burch, 2003). But this does not apply to training in a low-resource scenario where data is scarce. Recent work discussed multiple query selection strategies for a statistical phrase-based translation system (Haffari et al., 2009). Their framework requires source text to be translated by the system, and the translated data is used in a self-training setting to train MT models. (Gangadharaiah et al., 2009) use a pool-based strategy that maximizes a measure of expected future improvement to sample instances from a large parallel corpus. Their goal is to select the most informative sentence pairs to build an MT system, and hence they assume the existence of target-side translations along with the source sides. We, however, are interested in selecting the most informative sentences to reduce the effort and cost involved in translation.
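As a generic point of comparison for the selection strategies surveyed here, a minimal pool-based uncertainty-sampling sketch: rank untranslated source sentences by the model's confidence in its own output and send the least confident to translators. The `translate_with_score` interface is an assumption, and this stand-in is not any of the cited methods.

```python
def select_for_translation(pool, model, k=50):
    """Return the k source sentences the model is least confident about."""
    def confidence(src):
        hyp, logp = model.translate_with_score(src)  # assumed interface
        return logp / max(len(hyp), 1)               # length-normalized
    return sorted(pool, key=confidence)[:k]          # least confident first
```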

Undirected Machine Translation with Discriminative Reinforcement Learning

Because UMT prunes all but the single chosen action at each step, both choosing a good inference order and choosing a correct action reduce to a single choice of what action to take next. To learn this decoding policy, we propose a novel Discriminative Reinforcement Learning (DRL) framework. DRL is used to train models that incrementally construct structured output using a local discriminative function, with the goal of optimizing a global loss function. We apply DRL to learn the UMT scoring function's parameters, using the BLEU score as the global loss function. DRL learns a weight vector for a linear classifier that discriminates between decisions based on which one leads to a complete translation-derivation with a better BLEU score. Promotions/demotions of translations are performed by applying a Perceptron-style update on the sequence of decisions that produced the translation, thereby training local decisions to optimize the global BLEU score of the final translation, while keeping the efficiency and simplicity of the Perceptron algorithm (Rosenblatt, 1958; Collins, 2002).
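A rough sketch of the Perceptron-style update described above, under assumed interfaces: given two complete derivations whose final BLEU scores differ, promote the feature vectors of the better derivation's local decisions and demote the other's. This is an illustration of the update shape, not the paper's exact procedure.

```python
import numpy as np

def drl_update(w, good_decisions, bad_decisions, lr=1.0):
    """Each argument: list of per-decision feature vectors along a complete
    derivation; 'good' is the one that led to the higher-BLEU translation."""
    for f_good, f_bad in zip(good_decisions, bad_decisions):
        w = w + lr * (f_good - f_bad)   # promote / demote local decisions
    return w
```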

Response-based Learning for Grounded Machine Translation

An application of structured prediction to SMT involves more than a straightforward replacement of labeled output structures by reference translations. Firstly, update rules that require computing a feature representation for the reference translation are suboptimal in SMT, because human-generated reference translations often cannot be generated by the SMT system. Such "unreachable" gold-standard translations need to be replaced by "surrogate" gold-standard translations that are close to the human-generated translations and still lie within the reach of the SMT system. Computation of the distance to the reference translation usually involves cost functions based on sentence-level BLEU (Nakov et al. (2012), inter alia) and incorporates the current model score, leading to the various ramp loss objectives described in Gimpel and Smith (2012).
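For reference, one member of the ramp-loss family alluded to here (the hope/fear form) can be written as below; cost(y) is typically 1 minus sentence-level BLEU against the reference, and the first max picks the "surrogate" gold standard. This is one common variant, not the specific objective of the paper.

```latex
% Hope/fear ramp loss: the first term rewards the reachable translation
% that is both high-scoring and low-cost (the surrogate gold standard),
% the second penalizes the high-scoring but high-cost competitor.
\ell(w) = -\max_{y}\bigl[\, w^\top f(x,y) - \lambda\,\mathrm{cost}(y) \,\bigr]
          +\max_{y}\bigl[\, w^\top f(x,y) + \lambda\,\mathrm{cost}(y) \,\bigr]
```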

Machine Translation

Commercial systems exist (e.g., ALPS); FAMT spinoffs could reduce costs in the near term; status and prospects.[r]

Name Translation in Statistical Machine Translation: Learning When to Transliterate

two reasons. First, although names are important to human readers, automatic MT scoring metrics (such as BLEU) do not encourage researchers to improve name translation in the context of MT. Names are vastly outnumbered by prepositions, articles, adjectives, common nouns, etc. Second, name translation is a hard problem: even professional human translators have trouble with names. Here are four reference translations taken from the same corpus, with mistakes underlined:

Can Machine Learning Algorithms Improve Phrase Selection in Hybrid Machine Translation?

We describe a substitution-based, hybrid machine translation (MT) system that has been extended with a machine learning component controlling its phrase selection. Our approach is based on a rule-based MT (RBMT) system which creates template translations. Based on the generation parse tree of the RBMT system and standard word alignment computation, we identify potential “translation snippets” from one or more translation engines which could be substituted into our translation templates. The substitution process is controlled by a binary classifier trained on feature vectors from the different MT engines. Using a set of manually annotated training data, we are able to observe improvements in terms of BLEU scores over a baseline version of the hybrid system.

Machine Translationness: Machine-likeness in Machine Translation Evaluation

We evaluated the method by analysing the Pearson correlation between MTS scores and human quality perception. The MTS variable should decrease as the human rating increases (the higher the quality, the lower the MTS). Therefore, the coefficient should be negative, away from 0 and tending to -1. Three people were asked to assess 196 WordNet glosses machine-translated from English into Spanish. We were interested in the linguistic intuition of monolingual ordinary readers in detecting disfluent and inaccurate translations, so we did not need bilingual evaluators to judge disfluent translations or the inaccuracy of odd and absurd translations. We found that a standard ARPA scale was suitable for our experiment. This scale has five points: 1 - Incomprehensible, 2 - Disfluent, 3 - Non-native, 4 - Good, 5 - Flawless Spanish (the language of our experiment). Although the scale is for fluency, we considered that translations with MTness instances that affected accuracy could be rated as incomprehensible.
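The correlation check itself is a one-liner; here is a minimal sketch with placeholder data (the arrays are illustrative, not the study's measurements).

```python
from scipy.stats import pearsonr

# MTS per gloss vs. its human rating on the 1-5 ARPA-style scale;
# a coefficient near -1 supports "higher quality, lower MTS".
mts_scores   = [0.82, 0.10, 0.45, 0.05]   # placeholder values
human_rating = [1,    5,    3,    4]      # 1=incomprehensible ... 5=flawless

r, p_value = pearsonr(mts_scores, human_rating)
print(f"Pearson r = {r:.2f} (expected: negative, tending to -1)")
```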

Optimizing the relevancy of Predictions using Machine Learning and NLP of Search Query

This method relies heavily on the efficiency, reliability, and correctness of the Semantic Web database, and on the communication latencies between the user and the search engine. Since the modern high-performance web can execute distributed and parallel computations, the performance of existing web servers will suffice for this type of query processing. The Semantic Web database is a dynamic database with a huge collection of knowledge populated into it. It consists of relationships among various members and the various meanings of a particular word, of which only one will fit a particular context in a query. To obtain satisfactory performance, the network communication rates must be sufficiently high, meaning that an Internet connection with a transfer rate below a minimum threshold will not produce a timely response. As per the report of Akamai, a global content delivery network, the average Internet download speed of the world is 3.1 Mbps (rising 4% from the previous quarter). The query processing model discussed in this paper would require at least a 1 Mbps Internet connection to accept a query, process it, and provide appropriate suggestions in a minimal time frame. Concerning predictions within a session, the system realistically assumes that the user takes some time to find the required information in a web page (one of the results of his first query in the session), and that the model will take some time to perform NLP, obtain relevant predictions, compare them with existing patterns, and return the eventual predictions to the user as he starts to type his next search query. The advantage of this model is that these computations occur in the background, and the probability that the time taken to compute the NLP and return the end results exceeds the time between consecutive searches is very low.

Learning from Parenthetical Sentences for Term Translation in Machine Translation

State-of-the-art term translation knowledge extraction methods tend to take the Internet as a big corpus (Ren et al., 2010). The most important assumption behind these methods is that the corresponding translation for every source term must exist somewhere on the web. The term translation pair extraction problem is then converted into the task of finding these translations on the web and extracting them correctly. As a result, besides terms, various other fragments, including multi-word expressions, will be extracted owing to the lack of term recognition. Not surprisingly, this increases system workload and directly reduces the quality of term translation.
