Web corpus

What kind of corpus is a web corpus?

... NoWaC corpus. We have compared this web corpus with one corpus of spoken language and one of written ...the web corpus sides with the written corpus, not the spoken ...

8

Focused Web Corpus Crawling

... e., web corpus crawling, a document with a high weight can simply be defined as one which is not removed from the corpus by the post-processing tools due to low linguistic quality and/or a ...

7

The development of a web corpus of Hindi language and corpus based comparative studies to Japanese

... a web corpus of spoken Hindi (COSH), one of the Indo-Aryan languages spoken mainly in the Indian ...the web corpus and the special ...the corpus, especially in pragmatics and semantics, ...

10

A High-Quality Web Corpus of Czech

... new corpus to other resources ...acquire web data that are as similar to currently available corpora as ...the web corpus differ from those in reference corpus a ...our web ...

5

A Figure of Merit for the Evaluation of Web Corpus Randomness

... the Web using automated queries to search engines (Ghani et ...a corpus that contains, mostly, pages in the relevant language, but they did not evaluate the results in terms of quality or ...a ...

8

Automatic Classification by Topic Domain for Meta Data Generation, Web Corpus Evaluation, and Corpus Comparison

... crawled web corpus (Schäfer and Bildhauer, 2012; Schäfer, 2015), henceforth ...a corpus composed predominantly of newspaper texts (Kupietz et ...

6

Building a Korean Web Corpus for Analyzing Learner Language

... Other possibilities There are other ways to increase the size of a web corpus using BootCaT. First, one can increase the number of returned pages for a particular query. We set the limit at 20, as ...

9

PaddyWaC: A Minimally Supervised Web Corpus of Hiberno English

... another web-corpus of Hiberno-English that is in development (Crúbadán, Scannell, personal communication) that relies on domain filtering of crawled ...

8

Comprehensive Annotation of Multiword Expressions in a Social Web Corpus

... Multiword expressions (MWEs) are quite frequent in languages such as English, but their diversity, the scarcity of individual MWE types, and contextual ambiguity have presented obstacles to corpus-based studies ...

7

Building a fine-grained subjectivity lexicon from a web corpus

... Our method consists of a preprocessing step where the original files of different source formats are converted into structured text format, converting them into lower case, cleaning them by removing hyperlinks, html ...

7

The American National Corpus: More Than the Web Can Provide

... the web as a source of corpus materials has caused some in the language processing community to suggest that the development of a corpus of American English is ...a corpus compiled of ...

6

‘BonTen’ – Corpus Concordance System for ‘NINJAL Web Japanese Corpus’

... Japanese web corpus named ‘NINJAL Web Japanese Corpus’ (hereafter ‘NWJC’)(Asahara et ...the web-based corpus concordance system ‘BonTen’ – Brahman 1 for ...the web-based ...

5

C4Corpus: Multilingual Web size Corpus with Free License

... multilingual web crawl available to ...for Web corpus processing and bring them under the unified framework based on Hadoop platform in order to scale up to ...on Web corpus ...

9

Towards Universal Web Parsebanks

... the web compared to most of the UD ...create web corpora for such languages that would represent a substantial fraction of the web in that language, and even if such a web corpus ...

10

Collecting Semantic Data from Mechanical Turk for a Lexical Knowledge Resource in a Text to Picture Generating System

... 5-gram web corpus (LDC2006T13), by counting the frequency of each target and response word in unigram and bigram portions of the corpus and then the number of times the two words co-occur within a ...

5

Web Text Corpus for Natural Language Processing

... the Web Corpus, web-related collocations such as home page and search home ...word Corpus, there are 53 matches from the Web ...the Web Corpus results both contain the ...

8

Solving Relational Similarity Problems Using the Web as a Corpus

... We present a simple linguistically-motivated method for characterizing the semantic relations that hold between two nouns. The approach leverages the vast size of the Web in order to build lexically-specific ...

9

Linked Open Corpus Models, Leveraging the Semantic Web for Adaptive Hypermedia

... open corpus content, however, to add an open corpus resource to the system, a comparative analysis must be conducted between the new resource and every existing resource in the ...open corpus ...

9

Corpus based Semantic Lexicon Induction with Web based Corroboration

... Table 3 shows the 10 top-ranked candidates for each semantic category based on the Max of Seeds scores. The table illustrates that this scoring function does a good job of identifying semantically correct words, ...

9

Construction of Text Summarization Corpus for the Credibility of Information on the Web

... We will describe analysis of the recorded data. First, we will describe the rate of agreement in the result of narrowing steps between two summarizers. The calculation of the agreement rate for the step of collecting ...

7
