Top PDF Improved Semantic Representation for Domain Specific Entities

Improved Semantic Representation for Domain Specific Entities

[...] embeddings with the help of knowledge derived from other resources (Yu and Dredze, 2014; Bian et al., 2014; Faruqui et al., 2015) or by including arbitrary contexts in the training process (Levy and Goldberg, 2014). However, most of these techniques still suffer from another deficiency of word embeddings that they inherit from their count-based ancestors: they conflate the different meanings of a word into a single vector representation. Attempts have been made to tackle the meaning conflation issue of word-level representations. A series of approaches cluster the context of a word prior to representation (Reisinger and Mooney, 2010; Huang et al., 2012; Neelakantan et al., 2014) whereas others exploit lexical knowledge bases for sense-specific information (Rothe and Schütze, 2015; Chen et al., 2014; Iacobacci et al., 2015; Camacho-Collados et al., 2015).

An Improved Corpus Comparison Approach to Domain Specific Term Recognition

Terms are linguistic representations of domain specific concepts to encode our special knowledge about a subject field. Emerging pattern (EP) (Dong and Li, 1999) presents a similar idea to corpus comparison in the field of database for knowledge discovery. EPs are defined as itemsets whose growth rates, i.e., the ratios of their supports in one dataset over those in another, are larger than a predefined threshold. When applied to datasets with classes (e.g., cancerous vs. normal tissues, poisonous vs. edible mushrooms), EPs can capture significant differences or useful contrasts between the classes in terms of their growth rates. In principle, the larger the growth rates, the more significant the patterns. This approach has been successfully deployed in several applications of data mining, e.g., Li and Wong (2002) on identification of good diagnostic gene groups from gene expression profiles.
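
The growth-rate definition above can be illustrated with a minimal Python sketch. It is not the implementation from the paper or from Dong and Li (1999): the function names, the toy mushroom transactions, and the brute-force enumeration of small itemsets are assumptions made only for illustration. Support is taken as the fraction of transactions containing the itemset, and a pattern absent from the background but present in the target is given an infinite growth rate.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def growth_rate(itemset, background, target):
    """Ratio of the support in `target` to the support in `background`."""
    s_bg = support(itemset, background)
    s_tg = support(itemset, target)
    if s_bg == 0.0:
        # Absent from the background: infinite if present in the target, else 0.
        return float("inf") if s_tg > 0.0 else 0.0
    return s_tg / s_bg


def emerging_patterns(background, target, threshold, max_size=2):
    """Brute-force search over itemsets of up to `max_size` items (toy scale only)."""
    items = sorted(set().union(*target)) if target else []
    found = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            rate = growth_rate(itemset, background, target)
            if rate > threshold:
                found[itemset] = rate
    return found


# Hypothetical toy data: each transaction is the set of attributes of one mushroom.
edible = [{"white", "smooth"}, {"brown", "smooth"}, {"white", "smooth"}, {"brown", "scaly"}]
poisonous = [{"red", "scaly"}, {"white", "scaly"}, {"red", "smooth"}, {"red", "scaly"}]

patterns = emerging_patterns(edible, poisonous, threshold=2.0)
for itemset, rate in sorted(patterns.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), rate)
```

In this toy setting the itemset {"scaly"} has support 0.25 among edible and 0.75 among poisonous mushrooms, so its growth rate is 3.0 and it passes a threshold of 2.0. Real EP miners avoid the exhaustive enumeration used here.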

SEE: structured representation of scientific evidence in the biomedical domain using Semantic Web techniques

Scientific findings are now routinely published as resources on the World Wide Web. Besides electronic versions of natural language texts, more and more information from both new and legacy sources becomes available through databases [1] and web services [2], which provide, through structured formats and interfaces, consolidated views of and programmatic access to biomedical data. Semantic web technologies and standards in particular offer, by virtue of their well-defined semantics and broad applicability, potent means for the computational integration and analysis of biomedical data from heterogeneous and distributed sources on a large scale [3-5]. Accordingly, the Resource Description Framework (RDF, [6]) is increasingly employed to represent and disseminate new and legacy biomedical data [7,8], and biomedical ontologies specified in the Web Ontology Language (OWL, [9]) are being developed to encode domain-specific knowledge and annotate data from biomedical investigations [10-12]. As with any other means for communicating scientific results, findings encoded in semantic web formats need to be accompanied by an account of how they have been established to evaluate their relevance. Towards this end different models, tools and methods have been proposed: for representing and evaluating research hypotheses [13,14], contextualization [15], models of discourse [16], of argument [17], extended means for annotation [18,19], or specific container formats [20]. There is, however, currently no dedicated model supporting a coherent, extensible and semantic-web compatible representation of all those aspects routinely considered by a researcher inspecting the evidence for a given scientific finding, i.e. a representation of (i) the experimental and computational methods and settings that were used to establish the observational results and process the data, (ii) the reasoning including additional findings and assumptions used to infer the result in question, and (iii) information sources and agents through which the corresponding views were communicated and propagated.

GenNext: A Consolidated Domain Adaptable NLG System

We introduce GenNext, an NLG system designed specifically to adapt quickly and easily to different domains. Given a domain corpus of historical texts, GenNext allows the user to generate a template bank organized by semantic concept via derived discourse representation structures in conjunction with general and domain-specific entity tags. Based on various features collected from the training corpus, the system statistically learns template representations and document structure and produces well-formed texts (as evaluated by crowdsourced and expert evaluations). In addition to domain adaptation, GenNext’s hybrid approach significantly reduces complexity as compared to traditional NLG systems by relying on templates (consolidating micro-planning and surface realization) and minimizing the need for domain experts. In this description, we provide details of GenNext’s theoretical perspective, architecture and evaluations of output.

Towards a Domain Independent Semantics: Enhancing Semantic Representation with Construction Grammar

This issue of scalability and generalizability across genres could possibly be improved by linking semantics more directly with syntax, as theorized by Construction Grammar (CxG) (Fillmore et al., 1988; Goldberg, 1995; Kay, 2002; Michaelis, 2004; Goldberg, 2006). This theory suggests that the meaning of a sentence arises not only from the lexical items but also from the patterned structures or constructions they sit in. The meaning of a given phrase, a sentence, or an utterance, then, arises from the combination of lexical items and the syntactic structure in which they are found, including any patterned structural configurations (e.g. patterns of idiomatic expressions such as “The Xer, the Yer”, as in “The bigger, the better”) or recurring structural elements (e.g. function words such as determiners, particles, conjunctions, and prepositions). That is, instead of focusing solely on the semantic label of words, as is done in SRL and in many traditional theories in Linguistics, CxG brings more into focus the interplay of lexical items and syntactic forms or structural patterns as the source of meaning.

Inducing Domain Specific Semantic Class Taggers from (Almost) Nothing

Next, we measured the impact of the one-semantic-class-per-discourse heuristic, shown as XCat+OSCPD I.40. From Table 1, it appears that OSCPD produced mixed results: recall increased by 1-4 points for DIS/SYM, DRUG, HUMAN, and OTHER, but precision was inconsistent, improving by +4 for TEST but dropping by -8 for DRUG. However, this single snapshot in time does not tell the full story. Figure 2 shows the performance of the classifiers during the course of bootstrapping. The OSCPD heuristic produced a steeper learning curve, and consistently improved performance until the last few iterations when its performance dipped. This is probably due to the fact that noise gradually increases during bootstrapping, so incorrect labels are more likely and OSCPD will compound any mistakes by the classifier. A good future strategy might be to use the OSCPD heuristic only during the early stages of bootstrapping when the classifier’s decisions are most reliable.

Scientific Representation, Denotation, and Fictional Entities

mapping the elements of the language into a domain of independent entities endowed with their own properties. Hence, take a set of sentences in some particular language; the ‘interpretative mapping’, on this account, provides them with a ‘semantics’ under which they may be said to be true or false. But it is doubtful that this is the same ‘interpretation’ that is involved in the DDI account, since to the extent that the model source contains sentences at all, they already come fully interpreted in terms of the model itself. It seems more appropriate to think of it as an instance of ‘application’: it applies the model source to the target in order to derive results of interest regarding the target itself. Now, there is no doubt that the application of the model is constrained by the relation of denotation established in the first stage of the DDI account, but it also brings a large degree of freedom in two respects at least. Firstly, the denotation relation by itself does not stipulate which parts of the target object correspond to which parts of the source object, and there is always plenty of leeway at this point. In the Galileo example the mere fact that the geometrical diagram denotes the kinematical situation does not settle which parts of the diagram stand for which parts of the kinematics. But more importantly the mere fact of denotation does not determine how the source is to be conceived in the first place, i.e. how it is to be divided into parts that can then be related to the target. And it is, however, clear that the application of the source to the target does require a partition of the source into relevant parts and properties (a “structure”), and the relating of such “structure” to a similar “structure” of parts and properties in the target. Thus in Galileo’s modelling example, the geometrical diagram must clearly distinguish vertical and horizontal lines at every point, and the area therein comprised. Similarly the kinematical problem must clearly identify time intervals, speed of motion at every instant, and constant or accelerated motion across the interval. Etc.

Automatic Expansion of Feature Level Opinion Lexicons

Other works use the lexical resource WordNet (Fellbaum, 1998) to compute the semantic orientation of a given word or phrase. For example, in (Kamps et al., 2004), a distance function between words is defined using WordNet synonymy relations, so the semantic orientation of a word is calculated from the distance to a positive seed (“good”) and a negative seed (“bad”). Other works use a bigger set of seeds and the synonyms/antonyms sets from WordNet to build an opinion lexicon incrementally (Hu and Liu, 2004a; Kim and Hovy, 2004). In other works (Esuli and Sebastiani, 2006; Baccianella et al., 2010; Esuli and Sebastiani, 2005), the basic assumption is that if a word is semantically oriented in one direction, then the words in its gloss (i.e. textual definitions) tend to be oriented in the same direction. Two big sets of positive and negative words are built, starting from two initial sets of seed words and growing them using the synonymy and antonymy relations in WordNet. For every word in those sets, a textual representation is obtained by collecting all the glosses of that word. These textual representations are transformed into vectors by standard text indexing techniques, and a binary classifier is trained using these vectors. The same assumption about words and their glosses is made by Esuli and Sebastiani (2007), but the relation between words and glosses is used to build a graph representation of WordNet. Given a few seeds as input, two scores of positivity and negativity are computed, using a random-walk ranking algorithm similar to PageRank (Page et al., 1998). As a result of these works, an opinion lexicon named SentiWordNet (Baccianella et al., 2010) is publicly available. We are also using a ranking algorithm in our expansion method, but applying it to a differently built, domain-specific graph of terms.
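
To make the idea of a seeded random-walk ranking concrete, here is a minimal Python sketch of a PageRank-style walk that restarts at seed words. It is an illustration under stated assumptions, not the algorithm of Esuli and Sebastiani (2007) nor the expansion method of this paper: the function name `seeded_rank`, the damping value, and the tiny hand-made term graph are all hypothetical.

```python
def seeded_rank(graph, seeds, damping=0.85, iterations=50):
    """PageRank-style scores for a walk that restarts at the seed nodes.

    `graph` maps each node to a list of neighbours; every neighbour must
    also appear as a key. Scores sum to 1 after every iteration.
    """
    nodes = sorted(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(restart)
    for _ in range(iterations):
        # With probability (1 - damping) the walker jumps back to a seed.
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            neighbours = graph[n]
            if not neighbours:
                # Dangling node: send its mass back to the seeds.
                for m in nodes:
                    nxt[m] += damping * score[n] * restart[m]
                continue
            share = damping * score[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        score = nxt
    return score


# Hypothetical toy graph of related terms (edges listed in both directions).
terms = {
    "good": ["nice", "fine"],
    "nice": ["good", "fine"],
    "fine": ["good", "nice", "poor"],
    "poor": ["fine", "bad", "awful"],
    "bad": ["poor", "awful"],
    "awful": ["bad", "poor"],
}

positivity = seeded_rank(terms, seeds={"good"})
negativity = seeded_rank(terms, seeds={"bad"})
for word in sorted(terms):
    label = "positive" if positivity[word] > negativity[word] else "negative"
    print(f"{word}: {label}")
```

Running the walk twice, once from a positive seed and once from a negative seed, gives each term a pair of scores. Comparing them is one simple way to assign an orientation; how the two scores are combined in practice is a design choice not specified here.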

Ontology based Semantic Search Engine for Cancer

RDF is a metadata language that does not provide special vocabulary for describing the resources. It is often essential to be able to describe more of a subject than saying it is a resource. Some form of classification for these resources is often required to provide a more precise and correct mapping of the world. The basic idea behind the Semantic Web is to provide meaning of resources, as defined in the Knowledge Representation domain: "knowledge is descriptive and can be expressed in a declarative form" [16]. The formalization of knowledge in declarative form begins with a conceptualization. This formalization includes the objects presumed or hypothesized to exist in the world. This is why RDF Schema (RDFS) was introduced as a language that provides formal conceptualization of the world. RDF Schema semantically extends RDF to enable us to talk about classes of resources, and the properties that will be used with them. The RDF schema defines the terms that will be used in RDF statements and gives specific meanings to them. It provides mechanisms for describing groups of related resources and the relationships between these resources. Meaning in RDF is expressed through reference to the schema. RDFS consists of a collection of RDF resources that can be used to describe properties of other RDF resources; this makes it a simple ontology language that captures more semantics than pure RDF. The most important resources described in RDFS are:

Semantic Web Domain Knowledge Representation Using Software Engineering Modeling Technique

to the UML infrastructure has been proposed. The metamodel MOF (Meta-Object Facility) has been introduced (Baclawski et al., 2001; Baclawski, Kokar & Aronson, 2001). The architecture followed by this meta-metamodel is called Model Driven Architecture (MDA). The MDA divorces implementation details from business functions. Thus it is not necessary to repeat the process of modeling an application or system functionality and behaviour each time a new technology comes along. It is also making UML more and more formal, which is useful when it comes to using it for automatic inferencing by agents. A complete MDA specification consists of a platform independent model like UML plus one or more PSMs (Platform-Specific Models) (Anneke, 2003). The problem of transformation between ontology and MDA-based languages is solved using XSLT. But still no commercial MDA tools are available which can process the models at the M3 and M2 layers. The existing UML tools support the models up to the M1 layer very well (Djuric et al., 2004).

LexScore: A Semantic Approach to Scoring Domain Specific Sentiment Lexicons

Lingjia Deng, Janyce Wiebe, and Yoonjung Choi. 2014. Joint inference and disambiguation of implicit sentiments via implicature constraints. In COLING. Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, and Bernhard Schölkopf. 2004. Learning with local and global consistency. In NIPS. Duyu Tang, Furu Wei, Bing Qin, Ming Zhou, and Ting Liu. 2014. Building Large-Scale Twitter-Specific Sentiment Lexicon: A Representation Learning Approach. In COLING.

Finding relevant semantic association paths through user-specific intermediate entities

Nowadays e-learning systems [2] have become very popular in higher education. In an e-learning environment, a user may want to know the relationships between two concepts or entities, for example, how person ‘X’ is related to person ‘Y’. To answer this question, we need to find the semantic association paths between person ‘X’ and person ‘Y’. Since person ‘X’ may be related to person ‘Y’ through one or more intermediate entities, the result set may be too large, depending on the size of the RDF graph, and most of the relationships may be irrelevant to the user. In order to filter out the irrelevant paths, the user may introduce one more well-known intermediate entity. For example, if the user wants to find the relationship paths between person ‘X’ and person ‘Y’ with respect to person ‘Z’, he/she can introduce ‘Z’ as an intermediate entity to find the paths. Now, we get only paths between entity ‘X’ and entity ‘Y’ that pass through the entity ‘Z’. So, the size of the resultant path set can be reduced and relevancy may be improved. The challenge here is to find one or more useful paths in a quick and efficient way. A good path can bring valuable information about the relation between the two entities.
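
The filtering idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of the paper: the graph is a plain directed adjacency dictionary rather than an RDF store, the bound on path length and the names `simple_paths` and `paths_via` are hypothetical, and the exhaustive enumeration shown here does not address the efficiency challenge the text mentions.

```python
def simple_paths(graph, start, goal, max_edges=4):
    """Yield every cycle-free path from `start` to `goal` with at most `max_edges` edges."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            yield path
            continue
        if len(path) - 1 >= max_edges:
            continue  # path is already as long as allowed
        for nxt in graph.get(node, []):
            if nxt not in path:  # never revisit a node on the same path
                stack.append((nxt, path + [nxt]))


def paths_via(graph, start, goal, via, max_edges=4):
    """Keep only the paths that pass through `via` as an intermediate entity."""
    return [p for p in simple_paths(graph, start, goal, max_edges) if via in p[1:-1]]


# Hypothetical toy graph: an edge means "is related to".
relations = {
    "X": ["A", "Z"],
    "A": ["Y"],
    "Z": ["Y", "B"],
    "B": ["Y"],
    "Y": [],
}

all_paths = sorted(simple_paths(relations, "X", "Y"))
filtered = sorted(paths_via(relations, "X", "Y", via="Z"))
print(len(all_paths), "paths in total")
for path in filtered:
    print(" -> ".join(path))
```

In this toy graph three paths connect ‘X’ and ‘Y’, and naming ‘Z’ as the intermediate entity keeps the two that pass through it.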

Recovering Implicit Information

Implicit entities may be either empty syntactic constituents in sentence fragments or unfilled semantic roles associated with domain-specific verb decompositions, in this way the task [r]

Representation and Inference for Open-Domain QA: Strength and Limits of two Italian Semantic Lexicons

In the prototype, the most exploited type of semantic relation is hyperonymy. Nevertheless, the observation of system failures shows that, even if links driven by the hyperonymy relation are the most exploited in our prototype, they are not completely reliable. What seems to happen is that the IS-A relation has become a sort of repository of different aspects of meaning, aspects that collapse into the same label, losing their important distinctions. An important reference for this kind of consideration is the work done by Guarino and Gangemi’s research group, which resulted in the OntoClean methodology (Gangemi et al., 2001). One of the problems raised by analysing WordNet with OntoClean is what is called the IS-A overloading phenomenon. In our computational lexicons there is an overexploitation of the IS-A expressive means, used to express purpose, function, origin, material, part-whole information etc. For example, when the system tries to determine the specificity of the keyword ingrediente (ingredient) in the question Qual è un ingrediente base della cucina giapponese?, it fails with both IWN and SIMPLE-CLIPS. As a matter of fact, the two semantic lexicons represent the word meaning ingrediente as a synset and a SemU without hyponyms, at the same level as other substances such as cibo (food), insetticida (insect-powder), cemento (cement), etc. In this

The semantic representation of spatial configurations: a conceptual motivation for generation in Machine Translation

Model-based documentation

With the capabilities described in Sections 3.1 and 3.2, MBSE and S1000D-based MBD can partially address Problems 1, 2 and 4. With respect to Problem 1, both the MBSE implementation using SysML and the S1000D implementation lack support for creating engineering (product) design models with CAD design tools. Without the inclusion of engineering design models, documentation often remains incomprehensible and incomplete. To overcome this issue, a CAD design tool can be integrated with these implementations. In connection with Problems 2 and 4, search on the content of these implementations is limited to keyword matching, without taking into account the meaning of the data. In addition, inference capability is missing in these implementations. To enable meaning- and inference-dependent search, a domain ontology can be used [20].

ERNIE: Enhanced Language Representation with Informative Entities

Although pre-trained language representation models have achieved promising results and worked as a routine component in many NLP tasks, they neglect to incorporate knowledge information for language understanding. As shown in Figure 1, without knowing Blowin’ in the Wind and Chronicles: Volume One are song and book respectively, it is difficult to recognize the two occupations of Bob Dylan, i.e., songwriter and writer, on the entity typing task. Furthermore, it is nearly impossible to extract the fine-grained relations, such as composer and author on the relation classification task. For the existing pre-trained language representation models, these two sentences are syntactically ambiguous, like “UNK wrote UNK in UNK”. Hence, considering rich knowledge information can lead to better language understanding and accordingly benefits various knowledge-driven applications, e.g. entity typing and relation classification.

Design of a Knowledge Based Report Generator

Three fundamental principles of the technique are its use of domain-specific semantic and linguistic knowledge, its use of macro-level semantic and linguistic constructs such as whole me[r]

Legal regulation of representation of foreign business entities

Rossylna O. V. Legal regulation of representation of foreign business entities. This article is devoted to the legal regulation of foreign business entities. The aim of the research is to analyze the legal status of persons who can represent the interests of foreign business entities in business procedure. Conditions of a consul's procedural representation are determined. If a business entity is unable to protect its rights and interests, the consul represents its interests at court until it can appoint a representative or is able to defend its rights and interests personally. The consul performs these functions personally or authorizes another consular official to perform them. The consul performs these duties without a warrant of authority. The legal status that a foreign attorney has in Ukraine, and foreign experience in this issue, are analyzed in detail. The Parliament enshrined the model of «open doors», i.e. a model of almost unlimited access to the market of paid legal services for foreign attorneys. The author notes that such a simplified procedure for the admission of foreign lawyers to the practice of advocacy does not always promote high standards of legal aid. Legal status of Ukraine's attorney depends on the policy of a particular State to the model of foreign attorney's admission to the market of legal services.

Comparison of MetaMap and cTAKES for entity extraction in clinical notes

In this paper, we compared the automatic extraction of 14 obesity comorbidities using MetaMap and cTAKES. Automatic extraction was compared to manual annotation by experts. The result of the experiments we conducted proved that cTAKES slightly outperforms MetaMap, but this situation could change considering other configuration options that each tool has such as the abbreviations list in the MetaMap tool. Moreover, we worked with two types of aggregations: aggregation of CUIs with the same semantic type and aggregation of CUIs with different semantic types. These groups improve the results. Hence, the use of cTAKES or even MetaMap, using the proposed aggregations, can represent a good strategy to replace the manual extraction of medical entities.