Conclusions and Future Work - Exploiting target syntax with structured decoding

We propose an translation model that exploits target side structural syntax with a strictly top-down tree-structured decoder called Doubly Recurrent Neural Networks (DRNN). We incorporate DRNN into an encoder-decoder NMT architecture, coupled with attention on tree and our novel syntactic connections on tree structures. Our experiments show that our proposed models can outperform strong sequential NMT baseline on natural language translation tasks, and reach a new state-of-the-art on Django code generation task. Our tree- structured decoder can also produce syntactically valid parse trees when compared against a highly trained syntactic parser output. It can also automatically learn programming language syntax, without having to resort to rule-based constraints like previous approaches.

In the future we hope to incorporate source side syntax information into the model. We also plan to explore the applications of SynC in the area of neural machine translation with more structured attention mechanisms, and even potentially a hybrid phrase-based NMT systems with SynC, in which the model can benefit from SynC to be more extensible when handling larger vocabulary sizes.

For code generation, the lack of high quality corpora with scale comparable to natural language corpora remains a problem. A more detailed study on the effects of sketching can also help us understand the limits of current models.

Bibliography

Roee Aharoni and Yoav Goldberg. 2017. Towards string-to-tree neural machine translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, Van- couver, Canada, pages 132–140. https://doi.org/10.18653/v1/P17-2021.

David Alvarez-Melis and Tommi S Jaakkola. 2017. Tree-structured decoding with doubly- recurrent neural networks. In International Conference on Learning Representations.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. CoRR abs/1409.0473.

Joost Bastings, Ivan Titov, Wilker Aziz, Diego Marcheggiani, and Khalil Simaan. 2017. Graph convolutional encoders for syntax-aware neural machine translation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Copenhagen, Denmark, pages 1957–1967. https://doi.org/10.18653/v1/D17-1209.

Yoshua Bengio, R´ejean Ducharme, Pascal Vincent, and Christian Janvin. 2003. A neural probabilistic language model. J. Mach. Learn. Res. 3:1137–1155. http://dl.acm.org/citation.cfm?id=944919.944966.

Danqi Chen and Christopher Manning. 2014. A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Doha, Qatar, pages 740–750. https://doi.org/10.3115/v1/D14-1082.

David Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05). Association for Computational Linguistics, Ann Arbor, Michigan, pages 263–270. https://doi.org/10.3115/1219840.1219873.

Do Kook Choe and Eugene Charniak. 2016. Parsing as language modeling. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Austin, Texas, pages 2331–2336. https://doi.org/10.18653/v1/D16-1257.

Li Dong and Mirella Lapata. 2016. Language to logical form with neural attention. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, pages 33–43. https://doi.org/10.18653/v1/P16-1004.

Li Dong and Mirella Lapata. 2018. Coarse-to-fine decoding for neural semantic parsing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, pages 731–742. http://aclweb.org/anthology/P18-1068.

Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, Beijing, China, pages 334–343. https://doi.org/10.3115/v1/P15-1033.

Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros, and Noah A. Smith. 2016. Recur- rent neural network grammars. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, San Diego, California, pages 199–209. https://doi.org/10.18653/v1/N16-1024.

Akiko Eriguchi, Kazuma Hashimoto, and Yoshimasa Tsuruoka. 2016. Tree-to-sequence attentional neural machine translation. In 54th Annual Meeting of the Association for Computational Linguistics.

Akiko Eriguchi, Yoshimasa Tsuruoka, and Kyunghyun Cho. 2017. Learning to parse and translate improves neural machine translation. In 55th Annual Meeting of the Association for Computational Linguistics.

Michel Galley, Jonathan Graehl, Kevin Knight, Daniel Marcu, Steve DeNeefe, Wei Wang, and Ignacio Thayer. 2006. Scalable inference and training of context-rich syntactic translation models. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics. Association for Computational Linguistics, pages 961–968.

Christoph Goller and Andreas K¨uchler. 1996. Learning task-dependent distributed representations by backpropagation through structure. In In Proc. of the ICNN-96. IEEE, pages 347–352.

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press. http://www.deeplearningbook.org.

Jiatao Gu, Zhengdong Lu, Hang Li, and Victor O.K. Li. 2016. Incorporating copying mechanism in sequence-to-sequence learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, pages 1631–1640. https://doi.org/10.18653/v1/P16-1154. Martin T Hagan, Howard B Demuth, Mark H Beale, and Orlando De Jes´us. 1996. Neural

network design, volume 20. Pws Pub. Boston.

Donald O. Hebb. 1949. The organization of behavior: A neuropsychological theory. Wiley, New York.

K. Hornik, M. Stinchcombe, and H. White. 1989. Multilayer feedforward networks are universal approximators. Neural Netw. 2(5):359–366. https://doi.org/10.1016/0893- 6080(89)90020-8.

Hideki Isozaki, Tsutomu Hirao, Kevin Duh, Katsuhito Sudoh, and Hajime Tsukada. 2010. Automatic evaluation of translation quality for distant language pairs. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Cambridge, MA, pages 944–952. https://www.aclweb.org/anthology/D10-1092.

Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Jayant Krishnamurthy, and Luke Zettlemoyer. 2017. Learning a neural semantic parser from user feedback. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, pages 963–973. https://doi.org/10.18653/v1/P17-1089.

Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, and Luke Zettlemoyer. 2018. Mapping language to code in programmatic context. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Lin- guistics, pages 1643–1652. http://aclweb.org/anthology/D18-1192.

Nal Kalchbrenner and Phil Blunsom. 2013. Recurrent continuous translation models. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Seattle, Washington, USA, pages 1700–1709. https://www.aclweb.org/anthology/D13-1176.

Urvashi Khandelwal, He He, Peng Qi, and Dan Jurafsky. 2018. Sharp nearby, fuzzy far away: How neural language models use context. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Melbourne, Australia, pages 284– 294. https://www.aclweb.org/anthology/P18-1027.

Diederik P Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In International Conference on Learning Representations.

Dan Klein and Christopher D. Manning. 2003. Accurate unlexicalized parsing. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1. Association for Computational Linguistics, Stroudsburg, PA, USA, ACL ’03, pages 423–430. https://doi.org/10.3115/1075096.1075150.

Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1. Association for Computational Linguistics, Stroudsburg, PA, USA, NAACL ’03, pages 48–54. https://doi.org/10.3115/1073445.1073462.

Yann LeCun, Bernhard E. Boser, John S. Denker, Donnie Henderson, R. E. Howard, Wayne E. Hubbard, and Lawrence D. Jackel. 1990. Handwritten digit recognition with a back-propagation network. In D. S. Touretzky, editor, Advances in Neural Information Processing Systems 2, Morgan-Kaufmann, pages

396–404. http://papers.nips.cc/paper/293-handwritten-digit-recognition-with-a-backpropagation-network.pdf.

Wang Ling, Phil Blunsom, Edward Grefenstette, Karl Moritz Hermann, Tom´aˇs Koˇcisk´y, Fumin Wang, and Andrew Senior. 2016. Latent predictor networks for code generation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, pages 599–609. https://doi.org/10.18653/v1/P16-1057.

Minh-Thang Luong, Quoc V Le, Ilya Sutskever, Oriol Vinyals, and Lukasz Kaiser. 2016. Multi-task sequence to sequence learning. In International Conference on Learning Representations.

Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Association for Computational Lin- guistics, pages 1412–1421. https://doi.org/10.18653/v1/D15-1166.

Tomáˇs Mikolov, Martin Karafiát, Lukáˇs Burget, Jan ˇCernock`y, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In Eleventh annual conference of the international speech communication association.

Dipendra Misra, Ming-Wei Chang, Xiaodong He, and Wen-tau Yih. 2018. Policy shaping and generalized update equations for semantic parsing from denotations. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pages 2442–2452. http://aclweb.org/anthology/D18-1266.

Vijayaraghavan Murali, Letao Qi, Swarat Chaudhuri, and Chris Jermaine. 2017. Neu- ral sketch learning for conditional program generation. In International Conference on Learning Representations.

Graham Neubig, Yoav Goldberg, and Chris Dyer. 2017. On-the-fly operation batching in dynamic computation graphs. In CoRR. volume abs/1705.07860. http://arxiv.org/abs/1705.07860.

Yusuke Oda, Hiroyuki Fudaba, Graham Neubig, Hideaki Hata, Sakriani Sakti, Tomoki Toda, and Satoshi Nakamura. 2015. Learning to generate pseudo-code from source code using statistical machine translation (t). In 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, pages 574–584.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of 40th Annual Meeting of the Association for Computational Linguistics. Associa- tion for Computational Linguistics, Philadelphia, Pennsylvania, USA, pages 311–318. https://doi.org/10.3115/1073083.1073135.

Slav Petrov. 2010. Products of random latent variable grammars. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the

Frank Rosenblatt. 1958. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological review 65(6):386.

Beatrice Santorini. 1990. Part-Of-Speech tagging guidelines for the Penn Treebank project (3rd revision, 2nd printing). Technical report, Department of Linguistics, University of Pennsylvania, Philadelphia, PA, USA.

Mike Schuster and Kuldip K Paliwal. 1997. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing 45(11):2673–2681.

Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get to the point: Summa- rization with pointer-generator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, pages 1073–1083. https://doi.org/10.18653/v1/P17-1099.

Rico Sennrich and Barry Haddow. 2016. Linguistic input features improve neural machine translation. In In Proceedings of the First Conference on Machine Translation.

Felix Stahlberg, Eva Hasler, Aurelien Waite, and Bill Byrne. 2016. Syntactically guided neural machine translation. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, Berlin, Germany, pages 299–305. https://doi.org/10.18653/v1/P16-2049.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2. MIT Press, Cambridge, MA, USA, NIPS’14, pages 3104– 3112. http://dl.acm.org/citation.cfm?id=2969033.2969173.

Kai Sheng Tai, Richard Socher, and Christopher D Manning. 2015. Improved semantic representations from tree-structured long short-term memory networks. In Proceedings of the 53rd Annual Meeting on Association for Computational Linguistics.

Liling Tan, Jon Dehdari, and Josef van Genabith. 2015. An awkward disparity between BLEU / RIBES scores and human judgements in machine translation. In Proceedings of the 2nd Workshop on Asian Translation (WAT2015). Workshop on Asian Translation, Kyoto, Japan, pages 74–81. https://www.aclweb.org/anthology/W15-5009.

Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015a. Pointer networks. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, Curran Associates, Inc., pages 2692–2700. http://papers.nips.cc/paper/5866-pointer-networks.pdf.

Oriol Vinyals, Lukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, and Geoffrey Hinton. 2015b. Grammar as a foreign language. In Advances in Neural Information Processing Systems. pages 2773–2781.

Pengcheng Yin, Bowen Deng, Edgar Chen, Bogdan Vasilescu, and Graham Neubig. 2018. Learning to mine aligned code and natural language pairs from stack overflow. In Proceedings of the 15th International Conference on Mining Software Repositories. ACM, New York, NY, USA, MSR ’18, pages 476–486. https://doi.org/10.1145/3196398.3196408.

Pengcheng Yin and Graham Neubig. 2017. A syntactic neural model for general-purpose code generation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Lin- guistics, pages 440–450. https://doi.org/10.18653/v1/P17-1041.

Xingxing Zhang, Liang Lu, and Mirella Lapata. 2015. Tree recurrent neural networks with application to language modeling. CoRR, abs/1511.00060 .

Andreas Zollmann and Ashish Venugopal. 2006. Syntax augmented machine translation via chart parsing. In Proceedings on the Workshop on Statistical Machine Translation. pages 138–141.

In document Exploiting target syntax with structured decoding (Page 49-55)