subword unit

Neural Text Normalization with Subword Units

... with subword units, one can add or concatenate each subword’s embedding with its corresponding linguistic feature embedding and feed a combined embedding to the bi-LSTM ...the subword model with ...

7

Neural Machine Translation of Rare Words with Subword Units

... of subword models can actually improve perfor- ...alignable subword units, although the segmentation algorithm cannot rely on the target text at ...

11

A Systematic Study of Leveraging Subword Information for Learning Word Representations

... internal subword-level structure, subword-agnostic word representation models do not take these structure features into account and are effectively unable to represent rare words accurately, or unseen ...

21

Subword-level Word Vector Representations for Korean

... by subword level embedding such as character em- ...into subword units and using them as inputs improves performance for downstream NLP such as text classification (Zhang et ...

10

Enriching Word Vectors with Subword Information

... ging (Ling et al., 2015) and parsing (Ballesteros et al., 2015). Another family of models are convolutional neural networks trained on characters, which were applied to part-of-speech tagging (dos Santos and ...

12

Mimicking Word Embeddings using Subword RNNs

... Compositional models for embedding rare and unseen words. Several studies make use of morphological or orthographic information when training word embeddings, enabling the prediction of embeddings for unseen words ...

11

Reusing Weights in Subword-aware Neural Language Models

... on subword-aware neural language modeling we refer the reader to the paper by Vania and Lopez (2017), where the authors systematically compare different subword units (characters, character trigrams, BPE, ...

11

SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing

... language-independent subword tokenizer and detokenizer designed for Neural-based text processing, including Neural Machine ...for subword units. While existing subword segmentation tools assume that ...

6

BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages

... comparing subword approaches. This is an interesting task for subword evaluation, since many rare, long-tail entities do not have good representations in common token-based pre-trained embeddings such as ...

5

Supersense Tagging with a Combination of Character, Subword, and Word-level Representations

... or subword-level approaches, there has been lack of studies on ways to combine different levels of features, namely character, subword, and word-level fea- ...of subword units have not even been ...

5

Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates

... unique subword sequences, subword segmentation is potentially ambiguous and multiple segmentations are possible even with the same ...method, subword regularization, which trains the model with ...

10

Codeswitching language identification using Subword Information Enriched Word Vectors

... Our work covers only the EN-ES language pair. We use FastText (Bojanowski et al., 2016) to train a subword information enhanced word vectors model from the datasets of the shared task. We then use these vectors ...

5

Incorporating Subword Information into Matrix Factorization Word Embeddings

... impact subword information has on in-vocabulary (IV) word representations, we run intrinsic evaluations consisting of word similarity and word analogy ...of subword information results in similar gains ...

6

Subword-level Composition Functions for Learning Word Embeddings

... In this section we test the ability of subword-level embeddings to predict what affix is present in a morphologically complex word. We use the dataset gathered by (Lazaridou et al., 2013), which contains 6549 ...

11

Subword-Level Language Identification for Intra-Word Code-Switching

... German–Turkish The German–Turkish Twitter Corpus (Çetinoğlu and Çöltekin, 2016) consists of 1029 tweets with 17K tokens. They are manually normalized, tokenized, and annotated with language IDs. The language ID ...

7

Entropy-Based Subword Mining with an Application to Word Embeddings

... 1) subword pattern mining 2) subword ...a subword vocabulary, and then using these subwords to hierarchically segment each word in the ...candidate subword boundaries, we identify candidate ...

10

Subword Encoding in Lattice LSTM for Chinese Word Segmentation

... We examined the effectiveness of subwords for neural CWS. Subwords are deduced using BPE, and then integrated into a character-based neural segmentor through lattice LSTM. Results on four benchmarks show that ...

6

Meaningless yet meaningful: Morphology grounded subword-level NMT

... • Main advantage of BPE is solving OOV problem in two ways: i) some segmentations are almost morphological segmentation, and ii) some segmentations are nearly character-level segmentations. As a result, OOV words are ...

6

Subword and Spatiotemporal Models for Identifying Actionable Information in Haitian Kreyol

... Crisis-affected populations are often able to maintain digital communications but in a sudden-onset crisis any aid organizations will have the least free resources to process such communications. Information that aid ...

10

Discriminating between Similar Languages using Weighted Subword Features

... In view of this I document work on a refined version of the Bayesline (Tan et al., 2014) which has been referenced in the last shared task (Barbaresi, 2016a) and which has now been used in official competition. After ...

6
