First, we evaluate our model quantitatively in terms of automatic metrics such as BLEU (Papineni et al., 2002) and ROUGE (Lin, 2004), which have been widely used in previous work on paraphrase generation. In addition, we include iBLEU (Sun and Zhou, 2012), which penalizes repeating the source sentence in its paraphrase. We use the same hyper-parameter as in their original work. We compare DNPG with four existing neural models: ResidualLSTM (Prakash et al., 2016), VAE-SVG-eq (Gupta et al., 2017), pointer-generator (See et al., 2017) and the Transformer (Vaswani et al., 2017), the latter two of which have been reported as the state-of-the-art models in Li et al. (2018) and Wang et al. (2018), respectively. For a fair comparison, we also include a Transformer model with copy mechanism. Table 4 shows the performance of the models, indicating that DNPG achieves competitive performance in terms of all the automatic metrics among all the models. In particular, DNPG performs similarly to the vanilla Transformer model on the Quora dataset, while performing significantly better on WikiAnswers. The reason may be that DNPG is more robust to noise, since it can process the paraphrase in an abstractive way. It also validates our assumption that paraphrasing can be decomposed in terms of granularity. When high-quality training data is available, the Transformer-based models significantly outperform the LSTM-based models.
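To make the iBLEU penalty concrete, the following is a minimal sketch, assuming the usual formulation iBLEU = α · BLEU(candidate, reference) − (1 − α) · BLEU(candidate, source). The function `simple_bleu` is a toy stand-in (clipped 1- and 2-gram precision with a brevity penalty), not the scorer used in the experiments, and the value of `alpha` passed below is illustrative only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Toy sentence-level BLEU: clipped n-gram precision plus brevity penalty."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        total = sum(cand.values())
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or overlap == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(overlap / total))
    brevity = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return brevity * math.exp(sum(log_precisions) / max_n)


def ibleu(candidate, reference, source, alpha):
    """Reward overlap with the reference, penalize overlap with the source."""
    return (alpha * simple_bleu(candidate, reference)
            - (1 - alpha) * simple_bleu(candidate, source))


source = "how do i learn python".split()
reference = "what is the best way to learn python".split()

# Copying the source scores lower than reproducing the reference.
copy_score = ibleu(source, reference, source, alpha=0.8)
good_score = ibleu(reference, reference, source, alpha=0.8)
```

Under this sketch, a candidate that merely copies the source is pushed down by the second term even when it shares some n-grams with the reference.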
Prior work on controllable text generation usually assumes that the controlled attribute can take on one of a small set of values known a priori. In this work, we propose a novel task, where the syntax of a generated sentence is controlled instead by a sentential exemplar. To evaluate quantitatively with standard metrics, we create a novel dataset with human annotations. We also develop a variational model with a neural module specifically designed for capturing syntactic knowledge and several multitask training objectives to promote disentangled representation learning. Empirically, the proposed model is observed to achieve improvements over baselines and learn to capture desirable characteristics.
Zhao et al. (2008) utilize multiple resources to strengthen a log-linear SMT model. Recently, deep neural models have also been applied to paraphrase generation due to their great success on natural language processing tasks. Prakash et al. (2016) design a deep stacked network with residual connections, Gupta et al. (2018) propose a conditional variational auto-encoder which can produce multiple paraphrases, Iyyer et al. (2018) learn a model to generate syntactically controlled paraphrases, and Li et al. (2018) propose a generator-evaluator architecture coupled with deep reinforcement learning. Although similar to the architecture proposed in Li et al. (2018), our framework targets a different goal. Our work focuses on generating multiple diverse paraphrases, while theirs is dedicated to improving the quality of the top generated paraphrase. The main difference is that our approach utilizes a generator discriminator to encourage more diverse paraphrases, which is essential for diversity.
Diverse text generation. A few works have explored producing diverse generations by changing decoding schemes or introducing random noise. Methods that change decoding schemes are orthogonal and complementary to our work. Li et al. (2016) modify the score function in decoding to encourage usage of novel words and penalize words after the same partially generated sentence. Dai et al. (2017) utilize a conditional generative adversarial network (GAN) to generate diverse image captions according to the input noise. Gupta et al. (2018) employ a variational auto-encoder framework with both encoder and decoder conditioned on the source input. Shi et al. (2018) employ inverse reinforcement learning for unconditional diverse text generation. Xu et al. (2018a) propose a modified GAN to generate diverse and informative outputs for different inputs. Xu et al. (2018b) train a generator with different embeddings to generate multiple paraphrases, where only the decoder embedding with the lowest cross entropy is updated.
The task of paraphrase generation has many important applications in NLP. It can be used to generate adversarial examples of input text, which can then be used to train neural networks so that they become less susceptible to adversarial attack (Iyyer et al., 2018). For knowledge-based QA systems, a paraphrasing step can produce multiple variations of a user query and match them with knowledge base assertions, enhancing recall (Yin et al., 2015; Fader et al., 2014). Relation extraction can also benefit from incorporating paraphrase generation into its processing pipeline (Romano et al., 2006). Manually annotating translation references is expensive, and automatically generating references through paraphrasing has been shown to be effective for evaluation of machine translation (Zhou et al., 2006; Kauchak and Barzilay, 2006).
In this work, we propose taking a data-driven approach to train a model that can conduct evaluation in learning for paraphrase generation. The framework contains two modules, a generator (for paraphrase generation) and an evaluator (for paraphrase evaluation). The generator is a Seq2Seq learning model with attention and copy mechanism (Bahdanau et al., 2015; See et al., 2017), which is first trained with cross entropy loss and then fine-tuned using policy gradient, with supervision from the evaluator as rewards. The evaluator is a deep matching model, specifically a decomposable attention model (Parikh et al., 2016), which can be trained by supervised learning (SL) when both positive and negative examples are available as training data, or by inverse reinforcement learning (IRL), with outputs from the generator as supervision, when only positive examples are available. In the latter setting, for the training of the evaluator using IRL, we develop a novel algorithm based on the max-margin IRL principle (Ratliff et al., 2006). Moreover, the generator can be further trained with non-parallel data, which is particularly effective when the amount of parallel data is small.
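The fine-tuning step can be illustrated with a minimal REINFORCE-style sketch, assuming the evaluator returns a single scalar reward for each sampled paraphrase. The function name and the `baseline` term are illustrative assumptions, not details taken from the framework described above.

```python
import math


def policy_gradient_loss(token_log_probs, reward, baseline=0.0):
    """REINFORCE-style loss for one sampled paraphrase.

    token_log_probs: log-probabilities the generator assigned to each sampled token.
    reward: scalar score from the evaluator (assumed interface).
    baseline: optional variance-reduction term (an assumption of this sketch).
    """
    advantage = reward - baseline
    # Minimizing this loss raises the likelihood of samples with positive advantage.
    return -advantage * sum(token_log_probs)


# Usage: a two-token sample with probabilities 0.5 and 0.25, rewarded with 1.0.
log_probs = [math.log(0.5), math.log(0.25)]
loss = policy_gradient_loss(log_probs, reward=1.0)
```

In an actual training loop the log-probabilities would come from the generator and the loss would be differentiated with respect to its parameters; this sketch only shows how the reward scales the sequence log-likelihood.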
prediction ever since it was proposed in (Bengio et al., 2015). Similar to IL, reinforcement learning, particularly with neural network models, has been widely used in many different domains, such as coreference resolution (Yin et al., 2018), document summarization (Chen and Bansal, 2018), and machine translation (Wu et al., 2018).
language inference and has inspired recent work on similar tasks (Chen et al., 2016; Kim et al., 2017). We present two contributions. First, to mitigate data sparsity, we modify the input representation of the decomposable attention model to use sums of character n-gram embeddings instead of word embeddings. We show that this model trained on the Quora dataset produces comparable or better results with respect to several complex neural architectures, all using pretrained word embeddings. Second, to significantly improve our model performance, we pretrain all our model parameters on the noisy, automatically collected question-paraphrase corpus Paralex (Fader et al., 2013), followed by fine-tuning the parameters on the Quora dataset. This two-stage training procedure achieves the best result on the Quora dataset to date, and is also significantly better than learning only the character n-gram embeddings during the pretraining stage.
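The idea of representing a word as a sum of character n-gram embeddings can be sketched as follows. This is a minimal illustration only: the boundary marker `#`, the hashing of n-grams into a fixed number of buckets, the randomly initialized table, and the values of `DIM` and `BUCKETS` are all assumptions of this sketch rather than details given above.

```python
import random
import zlib

DIM = 8         # embedding size (illustrative)
BUCKETS = 1000  # number of hashed n-gram slots (illustrative)

random.seed(0)
# Randomly initialized table; in a real model these rows would be learned.
table = [[random.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(BUCKETS)]


def char_ngrams(word, n=3):
    """Character n-grams of a word padded with boundary markers."""
    padded = "#" + word + "#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def word_vector(word, n=3):
    """Represent a word as the sum of its character n-gram embeddings."""
    vec = [0.0] * DIM
    for gram in char_ngrams(word, n):
        row = table[zlib.crc32(gram.encode("utf-8")) % BUCKETS]
        vec = [a + b for a, b in zip(vec, row)]
    return vec


# A rare or misspelled word still receives a vector built from its n-grams.
v = word_vector("paraphraze")
```

Because related word forms share many n-grams, they share many summands, which is one way such a representation can mitigate data sparsity.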
In this paper, we investigate whether multilingual neural translation models learn stronger semantic abstractions of sentences than bilingual ones. We test this hypothesis by measuring the perplexity of such models when applied to paraphrases of the source language. The intuition is that an encoder produces better representations if a decoder is capable of recognizing synonymous sentences in the same language even though the model is never trained for that task. In our setup, we add 16 different auxiliary languages to a bidirectional bilingual baseline model (English-French) and test it with in-domain and out-of-domain paraphrases in English. The results show that the perplexity is significantly reduced in each of the cases, indicating that meaning can be grounded in translation. This is further supported by a study on paraphrase generation that we also include at the end of the paper.
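For reference, perplexity is the exponential of the average negative log-probability that a model assigns to the tokens of a target sentence. The sketch below assumes the per-token probabilities have already been obtained from the decoder; the function name is illustrative.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sequence given the model's probability for each token."""
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


# A model that assigns higher probability to the paraphrase tokens has lower perplexity.
confident = perplexity([0.9, 0.9, 0.9])
uncertain = perplexity([0.1, 0.1, 0.1])
```

A lower value on paraphrases means the decoder finds the paraphrased sentence less surprising given the encoded source.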
Paraphrase modeling has been viewed as a machine translation (MT) task in previous work. Approaches include ones based on statistical machine translation (SMT) (Quirk et al., 2004; Bannard and Callison-Burch, 2005) as well as syntax-based SMT (Callison-Burch, 2008). Mallinson et al. (2017) showed that neural machine translation (NMT) systems perform better than phrase-based MT systems in paraphrase generation tasks. Wang et al. (2018) show that paraphrase generation using the Transformer leads to better performance compared to two other state-of-the-art techniques, a stacked residual LSTM (Prakash et al., 2016) and a nested variational LSTM (Gupta et al., 2018). Yu et al. (2016) showed that text generation can be achieved using a generative network, where the generator is modeled as a stochastic policy. Later, the model was explored and compared to maximum likelihood estimation, as well as scheduled sampling, in Kawthekar et al. (2017). The authors noted that training generative adversarial networks (GANs) is a hard problem for textual input due to its discrete nature, which makes it difficult for models to learn from small updates. Iyyer et al. (2018) proposed syntactically controlled paraphrase networks, based on an encoder-decoder model, to generate syntactically adversarial examples.
In machine translation, all words appearing in an input sentence must be rewritten in the target language. However, paraphrase generation does not require rewriting of all words. When some criteria are provided, words in the input sentence that do not satisfy the criteria are identified and rewritten. For example, the criterion for text simplification is textual complexity, and complex words are rewritten as simpler synonymous words. Owing to the characteristics of the task, where only a limited portion of an input sentence needs to be rewritten, previous methods based on machine translation often perform conservatively and fail to produce necessary rewrites (Zhang and Lapata, 2017; Niu et al., 2018). To solve the problem of conservative paraphrasing that copies many parts of the input sentence, we propose a neural model for paraphrase generation that first identifies words in the source sentence requiring paraphrasing. Subsequently, these words are paraphrased by negative lexically constrained decoding, which avoids outputting them as they are.
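The effect of a negative lexical constraint can be illustrated with a deliberately simplified greedy decoder that never emits a banned word. This is only a sketch under stated assumptions: the per-step score dictionaries stand in for a real decoder's output distribution, and greedy search is used here for brevity in place of whatever search procedure the full model employs.

```python
def constrained_greedy_decode(step_scores, banned):
    """Pick the best-scoring token at each step, skipping banned tokens.

    step_scores: list of {token: score} dicts, one per decoding step
                 (a stand-in for a real decoder's output).
    banned: set of source words identified as requiring paraphrasing.
    """
    output = []
    for scores in step_scores:
        allowed = {tok: s for tok, s in scores.items() if tok not in banned}
        if not allowed:
            raise ValueError("every candidate at this step is banned")
        output.append(max(allowed, key=allowed.get))
    return output


steps = [
    {"the": 0.9, "a": 0.1},
    {"difficult": 0.7, "hard": 0.3},  # the complex word would win unconstrained
    {"task": 1.0},
]
result = constrained_greedy_decode(steps, banned={"difficult"})
```

Without the constraint the decoder would copy "difficult" from the input; with it, the next-best candidate is produced instead.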
Recent neural paraphrasing systems (Prakash et al. 2016; Gupta et al. 2018) adopt automatic evaluation measures commonly used in MT, citing good correlation with human judgment (Madnani and Tetreault 2010; Wubben, van den Bosch, and Krahmer 2010): BLEU (Papineni et al. 2002) encourages exact match between source and prediction by n-gram overlap; METEOR (Lavie and Agarwal 2007) also uses WordNet stems and synonyms; and TER (Snover et al. 2006) includes the number of edits between source and prediction. Such automatic metrics enable fast development cycles, but they are not sufficient for final quality assessment. For example, Chen and Dolan (2011) point out that MT metrics reward homogeneous predictions to the training target, which conflicts with a qualitative goal of good human-level paraphrasing: variation in wording. Chaganty, Mussmann, and Liang (2018) show that predictions receiving low scores from these metrics are not necessarily poor quality according to human evaluation. We thus complement these automated metrics with crowdsourced human evaluation.
duce a paraphrase of the sentence with the desired syntax. We show it is possible to create training data for this task by first doing backtranslation at a very large scale, and then using a parser to label the syntactic transformations that naturally occur during this process. Such data allows us to train a neural encoder-decoder model with extra inputs to specify the target syntax. A combination of automated and human evaluations shows that SCPNs generate paraphrases that follow their target specifications without decreasing paraphrase quality when compared to baseline (uncontrolled) paraphrase systems. Furthermore, they are more capable of generating syntactically adversarial examples that both (1) “fool” pretrained models and (2) improve the robustness of these models to syntactic variation when used to augment their training data.
Progress in statistical paraphrase generation has been hindered for a long time by the lack of large monolingual parallel corpora. In this paper, we adapt the neural machine translation approach to paraphrase generation and perform transfer learning from the closely related task of entailment generation. We evaluate the model on the Microsoft Research Paraphrase (MSRP) corpus and show that the model is able to generate sentences that capture part of the original meaning, but fails to pick up on important words or to show large lexical variation.
Depending on the level of granularity, there can be different types of paraphrasing such as: lexical (e.g. <automobile, car>), phrasal (e.g. <carry on, persist in>), and sentential (e.g. <The book was interesting, I enjoyed reading the book>) (Madnani and Dorr, 2010). Earlier work related to clinical-domain specific paraphrasing uses some unsupervised textual similarity measures to generate/extract lexical and phrasal paraphrases from monolingual parallel and comparable corpora (Elhadad and Sutaria, 2007; Deléger and Zweigenbaum, 2009). Prud’hommeaux and Roark (2015) propose a graph-based word alignment algorithm to examine neurological disorders through analysis of spoken language data. Another loosely related recent work adopts a semi-supervised word embedding model for medical synonym extraction (Wang et al., 2015) that can be regarded as the simplest form of a lexical paraphrase extraction task. Our work is the first to propose a neural network-based architecture that can model word/character sequences to essentially address all granularities of paraphrase generation for the clinical domain.
Applications of deep learning for paraphrase generation tasks have not been rigorously explored. We considered several sources as potential large datasets. Recently, Weiting et al. (2015) took the PPDB dataset (size XL) and annotated phrases based on their paraphrasability. This dataset is called Annotated-PPDB and contains 3000 pairs in total. They also introduced another dataset called ML-Paraphrase for the purpose of evaluating bigram paraphrases. This dataset contains 327 instances. The Microsoft Research Paraphrase Corpus (MSRP) (Dolan et al., 2005) is another widely used dataset for paraphrase detection. MSRP contains 5800 pairs of sentences (obtained from various news sources) accompanied by human annotations. These datasets are too small; therefore, we did not use them for training our deep learning models.
Metaphor is an increasingly studied phenomenon in computational linguistics. But while metaphor detection has received considerable attention in the NLP literature (Dunn et al., 2014; Veale et al., 2016) and in corpus linguistics (Krennmayr, 2015) in recent years, not much work has focused on the task of metaphor paraphrasing, that is, assigning an appropriate interpretation to a metaphorical expression. Moreover, there are few (if any) annotated corpora of metaphor paraphrases (Shutova and Teufel, 2010). The main papers in this area are Shutova (2010), and Bollegala and Shutova (2013). The first applies a supervised method combining WordNet and distributional word vectors to produce the best paraphrase of a single verb used metaphorically in a sentence. The second approach, conceptually related to the first, builds an unsupervised system that, given a sentence with a single metaphorical verb and a set of potential paraphrases, selects the most accurate candidate through a combination of mutual information scores and distributional similarity.
follow Madnani et al. (2012), who used MT metrics for paraphrase identification, and experiment with 8 MT metrics as features for our binary classifier. In addition, we experiment with a binary feature which checks if the sampled paraphrase preserves named entities from the input sentence. We use WEKA (Hall et al., 2009) to replicate the classifier of Madnani et al. (2012) with our new feature. We tune the feature set for our classifier on the development data.
3 Semantic Parsing using Paraphrasing
In this section we describe how the paraphrase algorithm is used for converting natural language to Freebase queries. Following Reddy et al. (2014), we formalize the semantic parsing problem as a graph matching problem, i.e., finding the Freebase subgraph (grounded graph) that is isomorphic to the input question semantic structure (ungrounded graph). This formulation has a major limitation that can be alleviated by using our paraphrase generation algorithm. Consider the question What language do people in Czech Republic speak?. The ungrounded graph corresponding to this question is shown in Figure 3(a). The Freebase grounded graph which results in the correct answer is shown in Figure 3(d). Note that these two graphs are non-isomorphic, making it impossible to derive the correct grounding from the ungrounded graph. In fact, at least 15% of the examples in our development set fail to satisfy the isomorphism assumption. In order to address this problem, we use paraphrases of the input question to generate additional ungrounded graphs, with the aim that one of those paraphrases will have a structure isomorphic to the correct grounding. Figure 3(b) and Figure 3(c) are two such paraphrases which can be converted to Figure 3(d) as described in §3.2.
evaluate three previously proposed paraphrase generation techniques, which range from very simple approaches that make use of little-to-no NLP or language-dependent resources to more sophisticated ones that heavily rely on such resources. Our evaluation helps develop a better understanding of the strengths and weaknesses of each type of approach. The evaluation also brings to light additional properties, including the number of redundant paraphrases generated, that future approaches and evaluations may want to consider more carefully.
of the data in the DAESO corpus consists of headline clusters scraped from Google News in the period April–August 2006. Google News uses clustering algorithms that consider the full text of each news article, as well as other features such as temporal and category cues, to produce sets of topically related articles. The scraper stores the headline and the first 150 characters of each news article scraped from the Google News Website. Roughly 13,000 clusters were retrieved. It is clear that although clusters deal roughly with one subject, the headlines can represent quite a different perspective on the content of the article; certain headlines are paraphrases, others are clearly not. To obtain only paraphrase pairs, the clusters need to be more coherent. In the DAESO project 865 clusters were manually subdivided into sub-clusters of headlines that show clear semantic overlap.