
2.2 Deep Representation Learning

2.2.3 Cross-Lingual Word Embeddings

One subfield of word representation learning that is pertinent to this dissertation is the learning of cross-lingual word representations. With the rise of distributed word representations, it has also become popular to learn distributed cross-lingual lexical representations, namely cross-lingual word embeddings. Cross-lingual word embeddings project words from two or more languages into a single semantic space so that words with similar meanings reside close to each other regardless of language.

Traditionally, most methods for learning cross-lingual word embeddings were supervised, relying on cross-lingual supervision such as bilingual dictionaries (Klementiev et al., 2012; Mikolov et al., 2013b), or parallel corpora (Zou et al., 2013; Gouws et al., 2015). Some methods attempted to alleviate the dependence on such cross-lingual supervision by only requiring comparable sentences or even documents (Vulić and Moens, 2015).
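As a minimal sketch of dictionary-based supervision, the example below learns a linear map between two embedding spaces by least squares, in the spirit of the translation-matrix approach of Mikolov et al. (2013b). Everything here is an assumption made for illustration: the embeddings are synthetic (the source space is a rotated copy of the target space), the "dictionary" is simply the first 15 row-aligned word pairs, and the helper names `learn_linear_map` and `cosine_nearest` are hypothetical.

```python
import numpy as np

def learn_linear_map(src_vecs, tgt_vecs):
    """Least-squares W minimizing ||src_vecs @ W - tgt_vecs||_F (hypothetical helper)."""
    W, *_ = np.linalg.lstsq(src_vecs, tgt_vecs, rcond=None)
    return W

def cosine_nearest(query, candidates):
    """Index of the candidate row with the highest cosine similarity to query."""
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return int(np.argmax(c @ q))

# Synthetic toy data (an assumption, not real embeddings): row i of `src`
# and row i of `tgt` stand for a translation pair.
rng = np.random.default_rng(0)
dim, n_words = 4, 20
tgt = rng.normal(size=(n_words, dim))
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # random orthogonal matrix
src = tgt @ rotation.T  # source space is a rotated copy of the target space

# The first 15 pairs play the role of the bilingual dictionary.
train = slice(0, 15)
W = learn_linear_map(src[train], tgt[train])

# Map every source word into the target space and check that each held-out
# word retrieves its own translation as the nearest neighbor.
mapped = src @ W
held_out_hits = [cosine_nearest(mapped[i], tgt) == i for i in range(15, n_words)]
print(all(held_out_hits))
```

Because the toy source space is an exact rotation of the target space, the learned map recovers the alignment perfectly. Real embedding spaces are only approximately related by a linear map, so retrieval accuracy on actual language pairs is far from perfect.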

On the other hand, there are efforts focusing on the multilingual case to learn a shared embedding space across more than two languages (Ammar et al., 2016; Duong et al., 2017), resulting in a set of multilingual word embeddings (MWEs).

In comparison, the standard cross-lingual embeddings between two languages are referred to as bilingual word embeddings (BWEs). See the survey by Ruder et al. (2017) for more detailed coverage of existing research on cross-lingual word embedding induction.

CHAPTER 3

LANGUAGE-ADVERSARIAL TRAINING FOR CROSS-LINGUAL MODEL TRANSFER

In this chapter, we propose the Language-Adversarial Training technique for the cross-lingual model transfer problem, which learns a language-invariant hidden feature space to achieve better cross-lingual generalization using only unlabeled monolingual texts from the source language and the target. It is a pioneering effort towards removing type II supervision (cross-lingual resources) from cross-lingual model transfer.

This chapter is based on Chen et al. (2016) and Chen et al. (2018b).

In this chapter, we focus on a simple yet fundamental NLP task, text classification, and present our language-adversarial training approach in the context of cross-lingual text classification (CLTC). In the text classification task, the input is a piece of text (a sentence or document), and the output is chosen from a set of predetermined categories. For instance, one may want to classify a product review into five categories corresponding to its star rating (1-5).

Similar to all cross-lingual transfer learning tasks, the goal of CLTC is to leverage the abundant resources of a source language (likely English, denoted as SOURCE) in order to build text classifiers for a low-resource target language (TARGET). Our model is able to tackle the more challenging unsupervised CLTC setting, where no target language annotations (type I supervision) are available. Our method also remains superior in the semi-supervised setting, where a small amount of type I supervision is available (see Section 3.3.3).

3.1 Related Work

Cross-lingual Text Classification is motivated by the lack of high-quality labeled data in many non-English languages (Bel et al., 2003; Mihalcea et al., 2007; Wan, 2008; Banea et al., 2008, 2010; Prettenhofer and Stein, 2010). Our work is comparable to these in objective but very different in method. Most previous works are resource-based methods that are directly centered around some kind of type II supervision (cross-lingual resources), such as machine translation systems, parallel corpora, or bilingual lexica, in order to transfer the knowledge learned from the source language into the target. For instance, some recent efforts make direct use of a parallel corpus either to learn a bilingual document representation (Zhou et al., 2016) or to conduct cross-lingual distillation (Xu and Yang, 2017).

Domain Adaptation tries to learn effective classifiers for which the training and test samples are from different underlying distributions (Blitzer et al., 2007; Pan et al., 2011; Glorot et al., 2011; Chen et al., 2012; Liu et al., 2015). This can be thought of as a generalization of cross-lingual text classification. However, one main difference is that, when applied to text classification tasks, most of this domain adaptation work assumes a common feature space such as a bag-of-words representation, which is not available in the cross-lingual setting. See Section 3.3.2 for experiments on this. In addition, most works in domain adaptation evaluate on adapting product reviews across domains (e.g., books to electronics), where the divergence in distribution is less significant than that between two languages.
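To make the point about a missing common feature space concrete, the following minimal sketch compares bag-of-words features across domains and across languages. The three review snippets are invented for illustration, and the helper names `bag_of_words` and `shared_features` are hypothetical. Real language pairs may share a few surface tokens (names, numbers, loanwords), so the empty overlap shown here is an idealized case.

```python
from collections import Counter

def bag_of_words(text):
    """Whitespace-tokenized, lowercased bag-of-words counts (hypothetical helper)."""
    return Counter(text.lower().split())

def shared_features(bow_a, bow_b):
    """Vocabulary items (feature dimensions) that two bags of words have in common."""
    return set(bow_a) & set(bow_b)

# Toy review snippets, invented for this illustration.
books_en = "this book is great and the story is great"
electronics_en = "this camera is great and the battery is fine"
books_de = "dieses buch ist toll und die geschichte ist toll"

# Same language, different domains: many feature dimensions are shared.
cross_domain = shared_features(bag_of_words(books_en), bag_of_words(electronics_en))

# Same domain, different languages: no feature dimension is shared.
cross_lingual = shared_features(bag_of_words(books_en), bag_of_words(books_de))

print(sorted(cross_domain))
print(sorted(cross_lingual))
```

In the cross-domain pair, a classifier trained on book reviews can still read features such as "great" in electronics reviews. In the cross-lingual pair the two feature spaces are disjoint, so a bag-of-words model trained on one language has nothing to attend to in the other.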

Adversarial Networks are a family of neural network models that have two or more components with competing objectives, and have enjoyed much success in computer vision (Goodfellow et al., 2014; Ganin et al., 2016). A line of work in image generation has used architectures similar to ours, pitting a neural image generator against a discriminator that learns to classify real versus generated images (Goodfellow et al., 2014). More relevant to this work, adversarial architectures have produced state-of-the-art results in unsupervised domain adaptation for image object recognition: Ganin et al. (2016) train with many labeled source images and unlabeled target images, similar to our setup. In addition, other recent work (Arjovsky et al., 2017; Gulrajani et al., 2017) proposes improved methods for training Generative Adversarial Nets. In Chen et al. (2016), we proposed language-adversarial training, the first adversarial neural net for cross-lingual NLP, which will be described in this chapter. As of the writing of this dissertation, there are many more recent works that adopt adversarial training for cross-lingual NLP tasks, such as cross-lingual text classification (Xu and Yang, 2017), cross-lingual word embedding induction (Zhang et al., 2017; Lample et al., 2018) and cross-lingual question similarity reranking (Joty et al., 2017).
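The notion of competing objectives can be illustrated with the value function of the original Generative Adversarial Nets formulation (Goodfellow et al., 2014), V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))], which the discriminator tries to maximize and the generator tries to minimize. The sketch below only evaluates this quantity on hand-picked discriminator outputs. It is not the language-adversarial training objective described in this chapter, and the function name `gan_value` and the probability values are assumptions chosen for illustration.

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Empirical GAN value: mean log D(x) + mean log(1 - D(G(z))).

    d_real: discriminator probabilities of "real" on real samples.
    d_fake: discriminator probabilities of "real" on generated samples.
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

# A discriminator that separates real from generated samples well
# (hand-picked probabilities): the value is close to its upper bound of 0.
sharp = gan_value(d_real=[0.99, 0.99], d_fake=[0.01, 0.01])

# A discriminator reduced to chance, as happens when generated samples are
# indistinguishable from real ones: the value equals -log 4.
chance = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])

print(sharp, chance)
```

The discriminator benefits from pushing the value up towards 0, while the generator benefits from driving it down towards -log 4, the point at which the discriminator can no longer tell the two sample sources apart.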

In document Learning Deep Representations for Low-Resource Cross-Lingual Natural Language Processing (Page 34-38)