Top 20: A Portuguese Native Language Identification Dataset

A Portuguese Native Language Identification Dataset

... aforementioned Portuguese learner corpora contain very useful data for research, particularly for Native Language Identification (NLI), a task that has received much attention in recent ...the ...

Improving Native Language Identification by Using Spelling Errors

... Spelling errors have been used as features for NLI since Koppel et al. (2005). They considered syntax errors and eight types of spelling errors such as repeated letters, missing letters, and inversion of letters. The ...

The Role of Emotions in Native Language Identification

... ICLE (Granger et al., 2009): the ICLEv2 dataset consists of essays written by highly-proficient non-native college-level students of English. We used a 7-language subset of the corpus normalized ...

Fusion of Simple Models for Native Language Identification

... To provide some additional complementarity in the final ensemble, a Naive Bayes model was trained on the same features. Despite its simplicity, the model became competitive after introducing the n-gram features. ...

Ensemble Methods for Native Language Identification

... standardized dataset and evaluation metric allowed for the effective comparison of different models, and the results confirmed the usefulness of SVMs for NLI (Tetreault et ...

Improving Native Language Identification with TF-IDF Weighting

... The dataset used for the shared task is called TOEFL11 (Blanchard et ...Foreign Language (TOEFL). The essays are written by 11 native language speakers ...glish language proficiency level ...

A Report on the 2017 Native Language Identification Shared Task

... tive language of an ESL (English as a Second Language) writer based on a sample essay, although NLI has also been shown to work on other languages (Malmasi and Dras, ...2013 Native Language Iden- ...

CIC-FBK Approach to Native Language Identification

... The dataset used in the NLI Shared Task 2017 is composed of English essays written by non-native learners in a standardized assessment of English proficiency for academic ...

Native Language Identification on Text and Speech

... a dataset containing essays and spoken responses in form of transcriptions and acoustic features (iVectors) by non-native English speakers of eleven native languages taking a standardized ...

Exploring Optimal Voting in Native Language Identification

... The performance of our three submissions on the test set is shown in Table 2. The first outcome is that the single model based on raw text character 6-grams performs very significantly above the organizer-provided ...

Native Language Identification with User Generated Content

... of native language identification in the context of user generated content (UGC) in online ...this dataset, we define three closely-related tasks: (i) distinguishing between native and ...

Experimental Results on the Native Language Identification Shared Task

... TOEFL11 dataset (Blanchard et ...The dataset contains essays written in English from native speakers of 11 languages (Arabic, Chinese, French, German, Hindi, Italian, Japanese, Korean, Spanish, ...

Robust, Lexicalized Native Language Identification

... the dataset play some role, though the most obvious improvement came from the use of our bias adaptation technique, which uses a small amount of data from a test corpus to improve the model; this was particularly ...

Generating a Lexicon of Errors in Portuguese to Support an Error Identification System for Spanish Native Learners

... resourced language in what concerns foreign language ...Spanish native speakers learning Portuguese, we developed an approach to automatically generate a lexicon of wrong words, reproducing ...

Chinese Native Language Identification

... that dataset using the English CoreNLP models, Penn Treebank PoS tagset and a set of 400 English function ...and language transfer ...how language is processed in the brain in ways that are not ...

Oracle and Human Baselines for Native Language Identification

... One possible approach to estimating an upper-bound for classification accuracy, and one that we employ here, is the use of an “Oracle” classifier. This method has previously been used to analyze the limits of majority ...

A Multi- versus a Single-classifier Approach for the Identification of Modality in the Portuguese Language

... unbalanced nature of the dataset a weighted average approach was calculated for each metric. We use support vector machines as our classifier and experimented with various SVM kernels to find the optimal ...

From Language to Family and Back: Native Language and Language Family Identification from English Text

... for native language attribution is introduced by Koppel et ...a dataset of authors of five different native languages taken from ICLEv1 (Granger et ...

Neural Networks and Spelling Features for Native Language Identification

... In isolation, the ResNet system yields a relatively high F1 score of 80.16. This indicates that, although simpler methods yield better results for this task, deep neural networks are also applicable. However, further ...

Native Tongues, Lost and Found: Resources and Empirical Evaluations in Native Language Identification

... In our study of the effect of training on one corpus and testing on another, we carry out experiments on pairs of corpora that consist of the same sets of languages. We evaluate first on the ICLE-NLI vs TOEFL7 corpora (7 ...
