Top PDFs: Dutch corpus

The Spoken Dutch Corpus. Overview and First Evaluation

... Spoken Dutch Corpus project obtains data through other projects, as in the case of the private interviews that have been recorded within the project The pronunciation of Standard ...

Experiences from the Spoken Dutch Corpus Project

... Spoken Dutch Corpus (Corpus Gesproken Nederlands; CGN) project aims to develop a corpus of 1,000 hours of speech originating from adult speakers of standard ... The corpus is to serve ...

Using the Spoken Dutch Corpus for type-logical grammar induction

... The dependency-based annotation format employed within the Spoken Dutch Corpus (CGN) project (van der Wouden et al., 2002) has been designed in such a way as to enable a transparent mapping to the ...

Orthographic Transcription of the Spoken Dutch Corpus

... Spoken Dutch Corpus, the problems encountered in making that specification and the evaluation experiments that were carried out to assess the transcription efficiency and the inter-transcriber ... Spoken ...

CLiPS Stylometry Investigation (CSI) corpus: A Dutch corpus for the detection of age, gender, personality, sentiment and deception in text

... (CSI) corpus, a new Dutch corpus containing reviews and essays written by university ... The corpus currently contains about 305,000 tokens spread over 749 ... The corpus will be made ...

Word Segmentation in the Spoken Dutch Corpus

... This paper describes the aims of the word segmentation in the Spoken Dutch Corpus (Corpus Gesproken Nederlands, CGN), and the procedures to create it. For one million words, a manually verified ...

Syntactic Analysis in the Spoken Dutch Corpus (CGN)

... the corpus) and Utrecht (for the Dutch part), the syntactic annotation tool Annotate, developed by DFKI Saarbruecken, is used (Plaehn, 1998; Brants, ...

Part of Speech Tagging and Lemmatisation for the Spoken Dutch Corpus

... the Dutch prepositions, for instance, are not only used to introduce an NP or some other complement, but can also be used without adjacent complement, as a consequence of stranding or intransitive ...

Cross linguistic differences and similarities in image descriptions

... trilingual corpus of described ... new corpus of Dutch descriptions (Section ... across Dutch, US English, and German (Section ... the Dutch corpus available online and we also ...

The D-TUNA Corpus: A Dutch Dataset for the Evaluation of Referring Expression Generation Algorithms

... D-TUNA corpus, which is the first semantically annotated corpus of referring expressions in ... the corpus addresses several other research goals. Firstly, the corpus contains both written and ...

From D-Coi to SoNaR: a reference corpus for Dutch

... reference corpus of written ... reference corpus of written Dutch, a pilot project was ... The Dutch Corpus Initiative project or D-Coi was highly successful in that it not only realized ...

Linguistic Problems Based on Text Corpora

... The genre of self-contained linguistic problems appeared long before the onset of corpus linguistics. The authors of most problems either constructed phrases or sentences on their own, or (much less commonly) ...

Grammar Driven versus Data Driven: Which Parsing System Is More Affected by Domain Shifts?

... For parsing, most previous work on domain adaptation has focused on data-driven systems (Gildea, 2001; McClosky et al., 2006; Dredze et al., 2007), i.e. systems employing (constituent or dependency based) treebank ...

Collection of a corpus of Dutch SMS

... available corpus of Dutch text messages containing data originating from the Netherlands and ... This corpus has been collected in the framework of the SoNaR project and constitutes a viable part of ...

Integrating Linguistic Knowledge in Passage Retrieval for Question Answering

... The passage retrieval component in Joost includes an interface to seven off-the-shelf IR systems. One of the systems supported is Lucene from the Apache Jakarta project (Jakarta, 2004). Lucene is a widely used ...

The Multilingual Affective Soccer Corpus (MASC): Compiling a biased parallel corpus on soccer reportage in English, German and Dutch

... The reports are saved as plain text files in UTF-8 coding in separate folders according to which subcorpus and category (WIN, LOSS, TIE) they belong to. The metadata for the three main subcorpora is split into ...

Data Collection and IPR in Multilingual Parallel Corpora. Dutch Parallel Corpus
