Distributional semantic modelling of word meaning has become a popular method for building pretrained lexical representations for downstream Natural Language Processing (NLP) tasks (Baroni and Lenci, 2010; Mikolov et al., 2013; Pennington et al., 2014; Levy and Goldberg, 2014; Bojanowski et al., 2017). In this approach, meaning is encoded in a dense vector space model, such that words (or concepts) whose vector representations are spatially close together are similar in meaning.
As motivated in the previous section, the main objective of this study is to develop a framework for semantic modelling of German adjective-noun collocations. To assess the applicability of FrameNet for modelling collocations, we have investigated eleven frames for nouns from various semantic fields (see Table 1). The corresponding semantic fields were assigned according to the information from the German wordnet GermaNet, and the estimates of the degree of concreteness of the chosen nouns are provided by the MRC Psycholinguistic Database (Wilson, 1988). The nominal bases were chosen on the basis of frequency and richness of collocates. The stage of choosing the candidates for modelling showed that there are significant differences in the behaviour of concrete and abstract nouns: the latter have a greater number and a richer variety of collocates (see Table 2). As explained in the previous section, we employ English FrameNet for German collocations. Semantic Frames in FrameNet describe non-linguistic concepts and deal with meanings rather than with particular lexical units in a language. Thus, a correct translation of the target German word into English makes it possible to apply the information contained in the English FrameNet to German data. In collocations, it is only the collocate (the adjective) that is language-specific, and thus problematic to translate. However, we consider the semantically transparent base (the noun) to be the frame-evoking word, and such words do not cause any difficulties for translation.
In parallel, we have been working on controlled vocabularies using the Simple Knowledge Organization System (SKOS) for structuring and organizing significant semantic components. “SKOS is an area of work developing specifications and standards to support the use of knowledge organization systems (KOS) such as thesauri, classification schemes, subject heading systems and taxonomies within the framework of the Semantic Web.” SKOS adapts principles such as equivalence and hierarchical and associative relationships for expressing semantic structures, but in contrast to traditional approaches, this standard makes embedded knowledge explicit in a formalized, machine-understandable way. SKOS uses the Resource Description Framework (RDF), another W3C standard, which provides a model for creating statements about resources and their properties in the form of subject-predicate-object expressions, also known as RDF triples. The notions the tribe (subject) has the name (predicate) Wedda (object) and the Wedda (subject) live in (predicate) Ceylon (object) are simple examples. “Asserting an RDF triple says that some relationship, indicated by the predicate, holds between the resources denoted by the subject and object. The statement corresponding to an RDF triple is known as an RDF statement.” Resources, which can range from physical objects to abstract concepts, or other entities like numbers and strings, can be the subject and/or object of multiple statements, thus forming a semantic network. All of the components involved are identified by Uniform Resource Identifiers (URIs), which enables information systems to retrieve entities and to interpret how they are interrelated. In addition, this allows resources from different datasets to be combined in order to enhance semantic expressiveness. Thus, SKOS/RDF supports the publication, alignment, exchange, and reuse of machine- as well as human-readable vocabularies, e.g. as Linked Data.
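The subject-predicate-object structure described above can be sketched in plain Python (a toy illustration, not a SKOS/RDF library; the `ex:` identifiers are invented placeholders standing in for real URIs):

```python
# Minimal sketch of RDF-style triples as Python tuples forming a small
# semantic network. Identifiers below are hypothetical, not real URIs.
triples = {
    ("ex:Wedda_tribe", "ex:hasName", "Wedda"),
    ("ex:Wedda_tribe", "ex:livesIn", "ex:Ceylon"),
}

def objects_of(subject, predicate, graph):
    """Return all objects linked to `subject` via `predicate`."""
    return {o for s, p, o in graph if s == subject and p == predicate}

print(objects_of("ex:Wedda_tribe", "ex:hasName", triples))  # {'Wedda'}
```

Because any resource can appear as subject or object of multiple triples, adding further tuples to the same set grows the network, which is the basis for combining datasets.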
The main means to enter the problem was the "Activity and Entity Cycle Input". This was chosen since an activity cycle diagram for the problem was given. The natural language interface was most useful for arrival mechanism entry. By simply saying "x arrives every y minutes", SASIM creates a door mechanism (where an entity arrives, based on a statistical distribution, from an indefinitely large pool of entities). The considerable size of the network presented a few memory and location problems. For example there are 25 activities, each having a link with the object "ACTIVITY". It is not easy to show all these links on a small PC screen, while still maintaining legibility. The full semantic network can be seen in appendix 4.
We note that the majority of annotations were document-level annotations and all entities were normalized. In many cases, the semantic modelling of annotations was straightforward. We discuss several semantic interoperability issues that we faced. First, there are several inconsistencies in annotations: (1) The OMM Impact corpus provides impact sentence annotations and, in contrast to the Enzyminer corpus, it does not distinguish between different types of impacts or protein properties. (2) The DHLA corpus does not annotate text sentences with impact facts, but represents them on the document level (see Table 3). (3) Not all mutations in the Enzyminer corpus are grounded to UniProt identifiers. The corpora also differ in their text fragment boundaries. For example, in the OMM Impact corpus sentences with multiple impact mentions are annotated multiple times, whereas in the Enzyminer corpus only relevant sentence fragments are annotated. Another issue we faced was resolution to different identifiers, e.g. proteins are normalized to EC numbers in the OMM Impact corpus and to UniProt IDs in other corpora. Moreover, we observed some inconsistency in the annotations of categories of protein properties. For example, the Enzyminer corpus is annotated with protein molecular functions (normalized to Gene Ontology classes) and a couple of kinetic properties such as the catalytic rate constant (Kcat) and the Michaelis constant (Km). In contrast, the OMM Impact corpus is annotated with sentences containing more protein properties, such as the maximal speed of activity (Vmax), the dissociation constant (Kd) and thermostability.
The recipe-based modelling methodology rests on the fact that relevant knowledge exists in specific domains and can be reused to promote learning and reduce design cycle times. For example, a product model has embedded process knowledge which can be extracted and used to produce a first-hand process model. Likewise, process models have implicit resource knowledge which can be used as the basis for creating resource models. An ability to synchronise these bodies of knowledge will influence next-generation approaches to manufacturing systems modelling science. This will help reduce the amount of effort required for planning and designing new products, processes and resources. This is the thinking behind the recipe-based semantic modelling methodology for manufacturing systems design. Basically, the concept relies on the derivation of a library of ‘pre-defined product-process configuration recipes’ which can be semantically matched to a set of production resource requirements, so that, based on semantic rules, logic and appropriate matching of ‘concepts’, possible solutions can be pulled from existing databases of recipes and their associated modelling libraries. A manufacturing system recipe refers to predefined ‘patterns of resource solutions’ matching the product and process requirements of a given production system. The basic idea behind enacting such an approach is to provide current and future designers with abstract descriptions of reusable components (or building blocks) of manufacturing systems and also allow them to select among predicted suitable sets of resource systems (people, machines and computers). This is considered important because when manufacturing systems’ requirements change, the resource systems can dynamically be reconfigured to meet the new requirements.
To a large extent, recipes of manufacturing systems solutions comprise various systems of layouts, people, production and assembly machines, utility systems and computers, which are often configured according to different organisational structures, constraints, demand and data so that they function appropriately to meet product-process requirements. This is achieved through a common high-level semantic language which acts as a communication backbone between product, process and resource sets of data. Such recipes dwell on the models and methods required to enable the convergence of meaning across the life cycle of virtual system development.
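As a rough illustration of the matching idea (not the authors' implementation; all recipe names and capability labels below are invented), a recipe library can be queried by checking which pre-configured resource solutions cover a stated set of product-process requirements:

```python
# Hedged sketch of recipe-based matching: each "recipe" pairs a set of
# product-process capabilities with a pre-configured resource solution.
# Names and labels are illustrative assumptions, not real data.
RECIPES = {
    "cell_A": {"capabilities": {"milling", "drilling"},
               "resources": ["CNC mill", "drill press", "operator"]},
    "cell_B": {"capabilities": {"milling", "welding"},
               "resources": ["CNC mill", "weld robot"]},
}

def match_recipes(requirements, recipes=RECIPES):
    """Return names of recipes whose capabilities cover all requirements."""
    return [name for name, r in recipes.items()
            if requirements <= r["capabilities"]]

print(match_recipes({"milling", "welding"}))  # ['cell_B']
```

When requirements change, re-running the match against the library is what allows the resource configuration to be swapped dynamically; a production-grade version would replace set containment with ontology-based semantic rules.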
The semantic modelling of tourism information enables intelligent tourism information systems to provide personalized services. An intelligent tourism information system includes an ontology-driven subject domain and a repository of tourism information. It is adaptive to the user’s needs (e.g. a user may require information about transportation, restaurants, accommodation, services, weather, events, itinerary tips, shopping, nightlife, daily excursions, car rental, sport activities, and so on). Information management tasks are annotated in terms of subject domain concepts, which are used as a basis for implementing the intelligent system’s adaptive behavior. The system’s adaptive behavior towards users’ needs is obtained by attaching semantic metadata to its information modules. To achieve this, the tourism concept ontologies being used must also be aligned with the ontologies defining the system’s context and the user’s profile. The system’s adaptability requires the tourism information of the knowledge base to be modeled using multiple descriptions (viz. using various templates associated with the user’s needs). In the LA_DMS project, Kanellopoulos et al. (2005) proposed a layer-based approach for the semantic labeling of tourism destination information. The layers of their semantic labeling reflect a higher level of semantics and constitute sub-models, such as a tourism destination model, a user model (user preferences) and a machine model (e.g. presentation properties). As a result, the LA_DMS model enables DMS to provide personalized information services for tourism destinations.
Abstract. Despite the large number of Natural Language Interfaces to Databases (NLIDB) that have been implemented, they do not guarantee a correct response to 100% of queries. In this paper, we present a way of semantically modelling the elements that constitute the knowledge of an NLIDB, with the aim of increasing the number of correctly answered queries. We design semantic representations in order to: a) model any relational database schema and its relationship with natural language, and b) add metadata to natural language words to enable our NLIDB to interpret natural language queries that contain superlatives. We configured our NLIDB on a relational database migrated from Geobase and used the Geoquery250 corpus to evaluate its performance. We compare its performance with the interfaces ELF, Freya and NLP-Reduce. The results indicate that our proposal allowed our NLIDB to obtain the best performance.
We have introduced a method to identify and model the salient features from a given domain as directions in a semantic space. Our method is based on the observation that there is a trade-off between accurately modelling similarity in a vector space and faithfully modelling features as directions. In particular, we introduced a post-processing step, modifying the initial semantic space, which allows us to find higher-quality directions. We provided qualitative examples that illustrate the effect of this fine-tuning step, and quantitatively evaluated its performance in a number of different domains, and for different types of semantic space representations. We found that after fine-tuning, the feature directions model the objects in a more meaningful way. This was shown in terms of an improved performance of low-depth decision trees in natural categorization tasks. However, we also found that when the considered categories are too specialized, the fine-tuning method was less effective, and in some cases even led to a slight deterioration of the results. We speculate that performance could be improved for such categories by integrating domain knowledge into the fine-tuning method.
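The idea of modelling a feature as a direction can be illustrated with a toy sketch (invented vectors, not the paper's data or method): once a direction is fixed, objects are ranked by the dot product of their vectors with it.

```python
# Toy illustration of "features as directions": rank objects by how far
# their vectors extend along a hypothetical "scary" direction.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Invented 3-d vectors for three objects in a movie domain.
objects = {"horror_film": (0.9, 0.1, 0.2),
           "comedy_film": (0.1, 0.8, 0.3),
           "thriller":    (0.7, 0.2, 0.4)}
scary_direction = (1.0, 0.0, 0.1)  # assumed feature direction

ranked = sorted(objects, key=lambda o: dot(objects[o], scary_direction),
                reverse=True)
print(ranked)  # most to least "scary"
```

A low-depth decision tree over such projections is then just a threshold on the dot product, which is why meaningful directions translate into better shallow trees.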
To populate the preferences of our user profile model, we make use of existing semantic annotators that are able to extract the subset of concepts expressed by users in their posts. At the moment we use TextRazor to extract these concepts, but other systems, such as AlchemyAPI or DBpedia Spotlight, could also be used. Note that TextRazor extracts concepts from DBpedia and Freebase, to our knowledge two of the most complete knowledge bases to date. Concepts with a confidence score lower than 3, on a scale from 0.5 to 10, are discarded. The user’s preference level for a concept is based on a sentiment analysis of the content. The SentiCircles sentiment analysis approach is used to compute the sentiment of the extracted concepts.
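The confidence-based filtering step described above amounts to a simple threshold; a minimal sketch with invented concept names and scores:

```python
# Sketch of the filtering step: annotator output is a list of
# (concept, confidence) pairs, and concepts scoring below 3 on the
# 0.5-10 scale are discarded. The data below is invented.
extracted = [("Semantic_Web", 7.2), ("Ontology", 2.1), ("DBpedia", 4.8)]

kept = [concept for concept, score in extracted if score >= 3]
print(kept)  # ['Semantic_Web', 'DBpedia']
```

The surviving concepts are what subsequently receive a sentiment-derived preference level.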
Stirling’s diversity framework, with three basic properties (variety, balance and disparity), is used. A more thorough discussion of the model can be found in . Domain diversity of the comments is the focus here. An ontology is used to represent the domain, and each comment is linked to a set of ontology entities (e.g. through semantic tagging). Given an ontology Ω representing the domain, a pool of comments linked to a set of entities E from Ω, and a class in the ontology taxonomy T providing the entry category, diversity is calculated as follows.
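The formula itself is not reproduced in this excerpt, but Stirling's heuristic is commonly written as a sum over category pairs of their disparity weighted by the categories' proportions. A minimal sketch under that assumption (the α = β = 1 special case, with invented category shares and distances):

```python
import itertools

# Hedged sketch of Stirling's diversity heuristic combining variety,
# balance and disparity: sum over unordered category pairs of
# d_ij * p_i * p_j. Shares and disparities below are toy values.
def stirling_diversity(shares, disparity):
    """shares: category -> proportion; disparity: pair -> distance in [0, 1]."""
    return sum(disparity[frozenset((i, j))] * shares[i] * shares[j]
               for i, j in itertools.combinations(shares, 2))

shares = {"ecology": 0.5, "economy": 0.3, "law": 0.2}
disparity = {frozenset(("ecology", "economy")): 0.8,
             frozenset(("ecology", "law")): 0.9,
             frozenset(("economy", "law")): 0.4}
print(round(stirling_diversity(shares, disparity), 3))  # 0.234
```

In the setting above, shares would come from how the pool of comments distributes over subclasses of the entry category, and disparities from a distance measure over the ontology taxonomy.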
The main characteristic of using HMM-LDA classes for feature selection is that the set of words in the syntactic classes and the set of words in the semantic class are not disjoint. In fact, there is quite a large overlap. In this and the next subsections, we discuss ways to remedy and even exploit this situation to achieve a higher level of accuracy. In the Pang et al. movie review data, there is about 35% overlap between words in the syntactic and semantic classes for η = 0.9. Our first systematic approach attempts to gain better accuracy by lowering the ratio of semantic words in the final feature set.
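The overlap, and the simplest way of lowering the ratio of semantic words (dropping those that also occur in syntactic classes), can be sketched with toy word sets rather than actual HMM-LDA output:

```python
# Toy word sets standing in for HMM-LDA class assignments; the real
# classes are induced from the corpus, not hand-picked like this.
syntactic = {"the", "of", "very", "good", "plot"}
semantic = {"good", "plot", "actor", "thriller", "boring"}

overlap = syntactic & semantic
ratio = len(overlap) / len(semantic)
print(sorted(overlap), round(ratio, 2))  # ['good', 'plot'] 0.4

# One way to lower the ratio of overlapping semantic words in the
# final feature set: keep only purely semantic words.
features = semantic - syntactic
print(sorted(features))  # ['actor', 'boring', 'thriller']
```

Whether removal or some weighted treatment of the overlapping words works better is exactly the empirical question the subsections explore.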
WSMO is one of the most important technologies for Semantic Web services. It complements the existing syntactic Web service standards by providing a conceptual model and language for the semantic markup describing all relevant aspects of general services which are accessible through a Web service interface. This paper has presented an Object-Z semantics for WSMO, whereby the WSMO constructs are modeled as objects. The advantage of this approach is that the abstract syntax and the static and dynamic semantics for each WSMO construct are grouped together and captured in an Object-Z class; hence the language model is structural, concise and easily extensible. Subsequent work will address and complete the dynamic semantics of WSMO. We believe this Object-Z specification can provide a useful document for developing support tools for WSMO.
Intentions comprise semantic relationships that express a human’s goal-oriented private states of mind, including intents, objectives, aims, and purposes. As a relation, an intention encodes information that might not be explicitly stated in text, and its detection might require inference and human judgment. The answer to the question What was Putin trying to achieve by increasing military cooperation with North Korea? is found in the sentence Putin is attempting to restore Russia’s influence in the East Asian region. Extracting the exact answer, to restore Russia’s influence in the East Asian region, becomes easier if this is recognized as Putin’s intention, which matches the question’s expected answer.
populate the KB. The KB consists of a hierarchy of reusable and usable ontologies that together generically model a coral reef ecosystem in a “computer-understandable” form. The ontologies range from informal through to formal and, when coupled to datasets, derive inferences from data to “ask” the KB questions for semantic correlation, synthesis and analysis. The ontology design leverages the scalable and autonomic characteristics of semantic technologies such as modularity, reuse and the ability to link latent connections in data through complex logic systems.
The presence of ‘necessarily’ in this sentence is somewhat redundant, in that its meaning is incorporated by the notion of entailment. I insist, however, on emphasising it to indicate that semantic properties that are accidentally associated with the meaning of a particular use of a verb will not be annotated. Dowty points out that entailments of the predicate must be distinguished from what follows from any one sentence as a whole (e.g. entailments that may arise from NP meanings) (Dowty, 1991:572, footnote 16). For example, in the sentence Mary slapped John, assuming that John is a human entity, it follows from the meaning of the sentence that John will perceive something as a result of the action of slapping. But this ‘entailment’ is not intrinsically tied to the meaning of slap, because the sentences Mary slapped the table or Mary slapped the corpse are also felicitous. That is, sentience of the direct object is not an essential component of the semantics of slap, in the way it is for a verb like awaken. The sentences Mary awakened the table and Mary awakened the corpse are clearly anomalous. True entailments of predicators (which are the ones that will be annotated) must be detectable in every possible environment in which the predicator is used.
SocialSensor 2 is a 3-year FP7 European Integrated Project aiming to tackle some of the challenges outlined above and offer solutions as well as improvements. In the project framework, new techniques for analysis, aggregation and real-time search of user- generated content will be developed, in order to extract useful information and make it available for use in different applications. Innovative solutions from the fields of information extraction and retrieval, social network analysis, user modelling, semantic web services, and media adaptation, delivery and presentation, will compose a software platform that crawls and analyses multimedia UGC from the social web, combines it with professional content, and makes it searchable for professional users, but also recommends, delivers and presents it to media consumers depending on their context and their personal profile. To achieve this, crucial issues have to be tackled, such as the sheer data volume, its heterogeneity and low quality.
In order to model effects like these, we need to extend existing models of sentence processing by introducing a semantic dimension. Possible ways of integrating different sources of information have been presented e.g. by McRae et al. (1998) and Narayanan and Jurafsky (2002). Our aim is to formulate a model that reliably predicts human plausibility judgements from corpus resources, in parallel to the standard practice of basing the syntax component of psycholinguistic models on corpus probabilities or even probabilistic treebank grammars. We can then use both the syntactic likelihood and the semantic plausibility score to predict the preferred syntactic alternative, thus accounting for the effects shown e.g. by McRae et al. (1998).
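A minimal sketch of how the two scores might be combined (invented probabilities, not the authors' actual model): add the log syntactic likelihood to the log semantic plausibility and pick the alternative with the highest total.

```python
import math

# Hedged sketch: each syntactic alternative gets a combined score
# log(syntactic likelihood) + log(semantic plausibility). The
# alternative names and numbers below are illustrative only.
def preferred(alternatives):
    """alternatives: name -> (syntactic_prob, semantic_plausibility)."""
    return max(alternatives,
               key=lambda a: math.log(alternatives[a][0])
                             + math.log(alternatives[a][1]))

alts = {"main_clause":      (0.6, 0.2),
        "reduced_relative": (0.4, 0.7)}
print(preferred(alts))  # reduced_relative
```

Here the syntactically less likely reading wins because its semantic plausibility outweighs the syntactic preference, which is the kind of interaction the cited ambiguity-resolution effects exhibit; a real model would of course weight the two components empirically rather than equally.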
Software systems such as Kepler, Taverna and Triana are tools that allow scientists to capture scientific workflows. The software chosen for the data flow implementation of the Semantic Reef architecture is Kepler, which is an open-source scientific workflow tool [16, 19]. The choice of Kepler as the workflow system was motivated predominantly by its flexibility in workflow design and manipulation. As shown in a taxonomic study of workflow systems by Yu, Kepler is a user-directed system that supports flexible data movement methods. These methods include: a centralised approach, where data is transferred between resources via a central point; a mediated approach, where the locations of the data are managed by a distributed data management system; and a peer-to-peer approach, where data is transferred between processing resources. The flexible data movement supported by Kepler workflows enables access to a diverse range of data resources, such as the distributed data repositories and streaming sensor data required to populate the ontologies within the KB.
ABSTRACT: Pattern mining is an important research area in data mining and knowledge discovery. The data mining concept is used in the field of information filtering for generating users’ information needs from a collection of documents. Topic modelling has become one of the most popular probabilistic text modelling techniques and has been quickly adopted by the machine learning and text mining communities. The most important contribution of topic modelling is that it can automatically classify documents in a collection by choosing a number of topics, representing every document with multiple topics and their corresponding distribution. Patterns are always more discriminative than single terms for describing documents, so selecting the most discriminative and representative patterns from the huge number of discovered patterns becomes crucial. Topic modelling provides a suitable way to analyse large amounts of unclassified text. In Maximum Pattern Based Topic Modelling (MPBTM), user information is represented in terms of patterns, and the semantic features of patterns are considered in the document modelling. We therefore propose an efficient ranking method that uses MPBTM to analyse patterns semantically. The Open English NLP library is used for filtering the semantic meanings of patterns from the collections of topics. The main features of the proposed model include: (1) each topic is represented by patterns; (2) the Open English 2.0 NLP library is proposed for further information filtering; and (3) a more accurate document modelling method for ranking is provided.