Top 20 Cross-corpus Native Language Identification via Statistical Embedding

Cross-corpus Native Language Identification via Statistical Embedding

... 2017 Native Language Identification Shared Task (Malmasi et ...tive language of written texts, alongside a second task on spoken transcripts and low dimensional audio file representations as data ...

Embedding Web-Based Statistical Translation Models in Cross-Language Information Retrieval

... the language of the searcher and the language in which documents are written represents a serious ...foreign language, a CLIR system might still be ...

Chinese Native Language Identification

... TOEFL11 corpus used in the NLI2013 shared ...CLI via surface phenomena occur at the same levels and patternings regardless of the ...L2. Cross-language studies can help researchers in ...

Feature Extraction for Native Language Identification Using Language Modeling

... of Native Language Identification ...the native language of authors of English texts written by non-native English speak- ...the language modeling approach and employs ...

Robust, Lexicalized Native Language Identification

... training corpus in ...using cross-validation, our Lang-8 trained model does reasonably well in both our testing corpora; the results are fairly consistent, and the difference can be attributed to the ...

Classifier Stacking for Native Language Identification

... with best individual performance, adding one and another. We do not include a feature type if it does not improve the accuracy on cross validation, and thus we cannot guarantee the optimization of performance ...

Native Language Identification with PPM

... The character-based PPM models were used for spam detection, source-based text classification and classification of multi-modal data streams that included texts. In Bratko et al. (2006), the character-based PPM models ...

Native Language Identification with User Generated Content

... under cross-validation on the TOEFL dataset, which includes 11 native languages (with a rather diverse distribution of language families), was ...ASK corpus of learners of Norwegian ...

Arabic Native Language Identification

... and cross-linguistic ...potential language transfer hypotheses from the writings of English learners (Malmasi and Dras, ...tated corpus, which was not the case in this study, the annotations could ...

Can characters reveal your native language? A language independent approach to native language identification

... authorship identification or plagiarism detection is to rely on features like words, part-of-speech tags, stems, or some other high-level linguistic ...of native language ...in native lan- ...

Large-Scale Native Language Identification with Cross-Corpus Evaluation

... Directions for future work are manifold. The next phase of this research will focus on developing tools to derive and browse ranked lists of the most discriminative cross-corpus features, which will then ...

A study of N-gram and Embedding Representations for Native Language Identification

... This corpus was used in the first Native Language Identification shared task (Tetreault et al., 2013). 29 teams participating in the task, and wide range of lexical and syntactic feature ...

Native Language Identification With Classifier Stacking and Ensembles

... using language models for this task and in their system used language model perplexity scores based on lexical 5-grams from each language in the ...conducted cross-corpus evaluation, ...

Exploring Adaptor Grammars for Native Language Identification

... We examine two different approaches in this paper. We first utilise adaptor grammars for discovery of high performing ‘quasi-syntactic collocations’ of arbitrary length as mentioned above and use them as classification ...

Oracle and Human Baselines for Native Language Identification

... The cross-validation and test results are very similar, with the oracle accuracy at 95%, suggesting that for each document there is in most cases at least one feature type that correctly predicts ...

Maximizing Classification Accuracy in Native Language Identification

... a corpus of texts consisting of 11,000 essays written by nonnative English speakers as part of a high-stakes test of general proficiency in academic ...The corpus is perfectly balanced in terms of its ...

Measuring Feature Diversity in Native Language Identification

... Such analyses can help us better understand the linguistic properties of features and guide interpretation of the results. This knowledge can also be useful in creating classifier ensembles. One goal in creating such ...

Exploiting Parse Structures for Native Language Identification

... Syntactic features, in contrast, in particular those that capture grammatical errors, which might potentially be useful for this task, have received little attention. Koppel et al. (2005) did suggest using syntactic ...

CIC-FBK Approach to Native Language Identification

... Syntactic n-grams can be used in any task where traditional n-grams are applied. They allow to introduce syntactic information into machine-learning methods (obviously, at cost of previous syntactic parsing). ...

Stemming Algorithms: A Comparative Study and their Analysis

... Terms and their corresponding stems can also be stored in a table. Stemming is then done via lookups in the table. One way to do stemming is to store a table of all index terms and their stems. Terms from queries ...

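
The table-lookup stemming mentioned in the last excerpt can be sketched roughly as follows. This is only an illustration under stated assumptions: the table contents, the names STEM_TABLE, stem and stem_query, and the fallback of returning an unknown term unchanged (lowercased) are all hypothetical and are not taken from the excerpted paper.

```python
# Hypothetical stem table: each index term maps to its stored stem.
# The entries are illustrative only.
STEM_TABLE = {
    "connected": "connect",
    "connecting": "connect",
    "connection": "connect",
    "connections": "connect",
}


def stem(term, table=STEM_TABLE):
    """Look up the stem of a term; fall back to the lowercased term itself.

    The fallback is an assumption made for this sketch.
    """
    key = term.lower()
    return table.get(key, key)


def stem_query(query, table=STEM_TABLE):
    """Stem every whitespace-separated term of a query via table lookup."""
    return [stem(term, table) for term in query.split()]
```

A call such as stem_query("connecting networks") would map the first term through the table and leave the second, which has no entry, unchanged. A lookup table of this kind is fast but only covers terms that were stored in advance.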