Top 20 Rule Based Normalization of Historical Texts

Rule Based Normalization of Historical Texts

... Epsilon identity rules In the system developed so far, insertion rules are rather difficult to handle. In generating modern wordforms by means of the rewrite rules (see Sec. 5), they tend to apply in an ...

9

Normalizing Medieval German Texts: from rules to deep learning

... contains texts issued on Swiss territory from the early Middle Ages up to ... with texts written between 1450 and 1550, which corresponds to the Early New High German ... 2500 historical-modern word ...

6

Normalization of Kazakh Texts

... is based on applying a noisy channel model (Damerau, 1964), which consists of a source model and a channel ... method based on neural system ... was based on replacement rules as a regular expression ...

6

Evaluating the Noisy Channel Model for the Normalization of Historical Texts: Basque, Spanish and Slovene

... the normalization of historical texts using a combination of weighted finite-state transducers and language ... the normalization of dialectal texts and tested the method against a 17th ...

6

A Large Scale Comparison of Historical Text Normalization Systems

... for historical text normalization so far, covering eight languages from different language families (English, German, Hungarian, Icelandic, Spanish, Portuguese, Slovene, and Swedish) as well as different ...

14

Semi Supervised Maximum Entropy Based Approach to Acronym and Abbreviation Normalization in Medical Texts

... For the problem at hand, one way to take context into account is to encode the type of discourse in which the abbreviation occurs, where discourse is defined narrowly as the type of the medical document and the medical ...

8

Combining Phonology and Morphology for the Normalization of Historical Texts

... In order to learn the changes that occur within the selected word pairs, the previously mentioned Phonetisaurus tool was used. This tool is a WFST-driven grapheme-to-phoneme (g2p) framework suitable for rapid ...

6

Using Comparable Collections of Historical Texts for Building a Diachronic Dictionary for Spelling Normalization

... of historical written resources can help overcoming typical challenges posed by heritage texts enhancing spelling normalization, POS-tagging and subsequent diachronic linguistic anal- ... of ...

6

Comparing Rule based and SMT based Spelling Normalisation for English Historical Texts

... for historical texts poses several challenges, as earlier stages of languages are ... First, historical variants not only differ from present-day ...

7

Evaluating Inter Annotator Agreement on Historical Spelling Normalization

... be based on entire texts rather than isolated pairs of normalizations, since expected agreement cannot be calculated for isolated pairs and, hence, a comparison with our scores would not easily be ...

10

Identifying Prepositional Phrases in Chinese Patent Texts with Rule-based and CRF Methods

... a rule-based method and Conditional Random Field (CRF) method to recognize the ... patent texts possess kinds of common and fixed structures and expressions, which are more suitable for rule- ...

7

Lexicon Construction and Corpus Annotation of Historical Language with the CoBaLT Editor

... for texts encoded according to the TEI P5 guidelines, which is why the GermanC team spent a lot of time on writing scripts to deal with formatting ... for historical corpus development we could find is ...

6

Learning attention for historical text normalization by learning to pronounce

... Automated processing of historical texts often relies on pre-normalization to modern word forms. Training encoder-decoder architectures to solve such problems typically requires a lot of training ...

13

Correcting Whitespace Errors in Digitized Historical Texts

... Our research is motivated by our own experience working with historical texts. We were fortunate to obtain access to a manually-digitized corpus of nineteenth-century newspapers from the United States. ...

6

Morphological analyzer for classical tamil text: A rule based approach

... Morphological Analyzer is the essential and basic tool for building any language processing application. Morphological Analysis is the process of providing grammatical information of a word given its suffix. ...

6

Evaluation of a multi-arm multi-stage Bayesian design for phase II drug selection trials – an example in hemato-oncology

... Adaptive designs for clinical trials that use features that change or “adapt” in response to information generated during the trial to be more efficient than standard approaches [1] have been the focus of an abundant ...

15

Blood in the shower : a visual history of menstruation and clean bodies

... themselves through the branding strategies pioneered by Procter & Gamble and other multinational corporations in the late-twentieth-century through consumer market research (McCraw, 2009). Market surveys indicate ...

10

Deep Neural Models for Medical Concept Normalization in User Generated Texts

... knowledge-based system for mapping texts to UMLS identifiers is MetaMap (Aronson, ... linguistic-based system uses lexical lookup and variants by associating a score with phrases in a ...

7

Temporal classification for historical Romanian texts

... In this paper we look at a task at border of natural language processing, historical linguistics and the study of language development, namely that of identifying the time when a text was written. We use machine ...

5

OCR and post correction of historical Finnish texts

... why historical documents still pose a challenge for OCR are: fonts differ in different materials, lack of orthographic standard (same words spelled differently), material quality (some documents can have ...

7