Top 20 Experiments in Sentence Language Identification with Groups of Similar Languages

Experiments in Sentence Language Identification with Groups of Similar Languages

... Manual Feature Selection: We also used manual feature selection, selecting features to use in the classifiers from lists published on Wikipedia comparing the two languages. Of course some of the features in lists ...

Experiments in Discriminating Similar Languages

... two groups (Lui et ...in sentence length, DSLCC v.1.0 was fairly similar to this year’s corpus, and helped more than independently acquired mate- ...

Experiments in Cuneiform Language Identification

... dialect identification but also on other NLP tasks related to language and dialect variation ...the language and dialect identification competitions organized at VarDial focused on ...

Discriminating Similar Languages: Evaluations and Explorations

... of-the-art language identification systems trained to recognize similar languages and language varieties using the results of the first two DSL shared ...which groups of ...

Code Mixing: A Challenge for Language Identification in the Language of Social Media

... form language identification using artificial multilingual data, created by randomly sampling text segments from monolingual ...word-level language identification. A dataset of 30 ...

Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task

... each language/variety ...two groups may be due to different criteria the two annotators used, the differences inside groups show important trends, ...

Word-length algorithm for language identification of under-resourced languages

... different languages and for clustering based on string kernels; however, issues of resource-poor languages were not ...the language identification task and showed state-of-the-art results using string ...

Twitter Language Identification Of Similar Languages And Dialects Without Ground Truth

... to language while minimizing information with respect to data ...97 languages, including the specific languages that we use for this ...supported languages. For example, with ...

The NRC System for Discriminating Similar Languages

... Training on groups B to E is faster because we only need one SVM model per feature space. In addition, for group C, only one model is necessary because no vote outperforms the best model. Training the best model ...

Language Discrimination and Transfer Learning for Similar Languages: Experiments with Feature Combinations and Adaptation

... As a result, the scores of the models with no adaptation in Figure 3 drop drastically when they are trained on the training set, and tested on a test set with utterances from different speakers. On the GDI data, this is ...

A Simple Baseline for Discriminating Similar Languages

... Some research suggests that word-based features can even outperform character-based approaches. For Brazilian vs European Portuguese, Zampieri and Gebre (2012) found that word unigrams gave very similar ...

Slavic Forest, Norwegian Wood

... For example, CS contains a language-specific nummod:gov deprel, which never occurs in SK. We do not want the parser to learn to assign that deprel, because we are not going to score on such relations. Hence, we ...

Crawling microblogging services to gather language classified URLs Workflow and case study

... The relatively low number of results for Russian may be explained by weaknesses of langid.py with deviations of encoding standards. Indeed, a few tweaks are necessary to correct the biases of the software in its ...

Short Term Projects, Long Term Benefits: Four Student NLP Projects for Low Resource Languages

... high-resource languages (Xia and Lewis, 2007) to maximizing annotation effort (Garrette and Baldridge, ...high-resource languages (Bender, 2011); for example, much work on computational syntax uses models ...

Stacked Sentence Document Classifier Approach for Improving Native Language Identification

... L1 sentence classifier that is aimed at classifying the native language of each sentence of a ...L1 sentence classifier are used as features by the L1 document ...the sentence ...

Approaching Difficulties of Teaching Language Complexes by Example of GAS and BCS

... three languages is very clear at this point: Serbians say pevačica, referring to a person who is a ...two languages one is speaking and from which culture the speaker originates as the Serbs speak ekavica ...

Syriac Language Maintenance among the Assyrians of Iraq

... Perhaps, due to the times of political conflicts which have taken place in the country for years, the Assyrians see that none of the former governments has given them the full right to teach and practice their ethnic ...

Phrase Based Approach for Adaptive Tokenization

... We use the training and testing sets from the second international Chinese word segmentation bakeoff (Emerson, 2005), which are freely available and most widely used in evaluations. There are two corpora in simplified ...

Language Identification: The Long and the Short of the Matter

... Language identification is the task of identifying the language a given document is written ...on experiments across three separate datasets and a range of tokenisation ...of ...

Experiments on Sentence Boundary Detection

... Feature: Preceding word; Probability preceding word ends a sentence; Part of speech tag assigned to preceding word; Probability that part of speech tag (feature 3) is assigned to last word in [r] ...
