In recent years, distributional models have been extended to handle the semantic composition of words into phrases and longer constituents (Baroni and Zamparelli 2010; Mitchell and Lapata 2010; Socher et al. 2012), building on work that the Cognitive Science community had started earlier on (Foltz 1996; Kintsch 2001, among others). Although these models still do not account for the full range of composition phenomena that have been examined in Formal Semantics, they do encode relevant semantic information, as shown by their success in demanding semantic tasks such as predicting sentence similarity (Marelli et al. 2014). Compositional Distributional Semantics allows us to model semantic phenomena that are very challenging for Formal Semantics and more generally symbolic approaches, especially concerning content words. Consider polysemy: In the first three sentences in Figure 1(a), postdoc refers to human beings, whereas in the fourth it refers to an event. Composing postdoc with an adjective such as tall will highlight the human-related information in the noun vector, bringing it closer to person, whereas composing it with long will highlight its eventive dimensions, bringing it closer to time (Baroni and Zamparelli 2010; Boleda et al. 2013; as well as Asher et al. and Weir et al., this issue); crucially, in both cases, the information relating to research activities will be preserved. Note that in this way DS can account for polysemy without sense enumeration (Kilgarriff 1992; Pustejovsky 1995).
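To make the tall postdoc / long postdoc example concrete, here is a minimal numpy sketch of lexical-function composition in the spirit of Baroni and Zamparelli (2010). The dimensions, vectors and adjective matrices are invented toy values chosen only to illustrate the effect; real models use high-dimensional corpus-derived vectors and corpus-estimated matrices.

```python
import numpy as np

# Toy illustration of lexical-function composition: adjectives are
# matrices that map noun vectors to phrase vectors. All values invented.
# Toy dimensions: [human, event, research, duration]
postdoc = np.array([0.6, 0.4, 0.9, 0.3])   # noun vector

# "tall" amplifies human-related dimensions, "long" amplifies
# eventive/durative dimensions; both preserve the research dimension.
tall = np.diag([1.5, 0.5, 1.0, 0.5])
long_ = np.diag([0.5, 1.5, 1.0, 1.5])

tall_postdoc = tall @ postdoc   # should move closer to "person"
long_postdoc = long_ @ postdoc  # should move closer to "time"

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

person = np.array([1.0, 0.1, 0.2, 0.0])
time = np.array([0.1, 1.0, 0.1, 0.8])

print(cosine(tall_postdoc, person), cosine(tall_postdoc, time))
print(cosine(long_postdoc, person), cosine(long_postdoc, time))
```

Under these invented values, tall postdoc is closer to person and long postdoc is closer to time, while both keep a large value on the research dimension.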
Distributional semantics aims to induce the meaning of language from unlabelled text. Traditional approaches to distributional semantics have represented semantics in vector spaces (Baroni et al., 2013). Words are assigned vectors based on collocations in large corpora, and these vectors are then composed into vectors representing longer utterances. However, so far there is relatively limited empirical evidence that composed vectors provide useful representations for whole sentences, and it is unclear how to represent logical operators (such as universal quantifiers) in a vector space. While future breakthroughs may overcome these limitations, there are already well-developed solutions in the formal semantics literature using logical representations. On the other hand, standard formal semantic approaches such as Bos (2008) have
This work has intentionally left the data as raw as possible, in order to keep the noise present in the models at a realistic level. The combination of Machine Learning and Distributional Semantics advocated here suggests a very promising perspective: transformation functions corresponding to different syntactic relations could be learned from suitably processed corpora and then combined to model larger, more complex structures, probably also recursive phenomena. It remains to be shown whether this approach can model the symbolic, logic-inspired kind of compositionality that is common in Formal Semantics; being inherently based on functional items, it is at present very difficult and computationally intensive to attain, but hopefully this will change in the near future.
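As a rough sketch of how such transformation functions can be learned from corpora, the snippet below estimates a matrix for one syntactic relation by least-squares regression from pairs of corpus-derived vectors, following the general recipe of lexical-function models. The data here is randomly generated stand-in data, and the use of ridge regression is just one reasonable choice, not the specific procedure of this work.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Toy sketch: learn a transformation matrix for one relation (say, an
# adjective modifying a noun) from (noun vector, phrase vector) pairs.
# In practice both sets of vectors would come from a processed corpus.
rng = np.random.default_rng(0)
dim = 10
nouns = rng.normal(size=(50, dim))           # noun vectors
true_matrix = rng.normal(size=(dim, dim))    # unknown transformation
phrases = nouns @ true_matrix.T + 0.01 * rng.normal(size=(50, dim))

# Ridge regression estimates the transformation matrix row by row.
model = Ridge(alpha=1.0, fit_intercept=False).fit(nouns, phrases)
learned_matrix = model.coef_                 # shape (dim, dim)

# Applying the learned function to an unseen noun vector:
new_noun = rng.normal(size=dim)
predicted_phrase = learned_matrix @ new_noun
```

Learned matrices of this kind could then, in principle, be composed to model larger structures, which is the perspective the passage above points to.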
Vector space models have become popular in distributional semantics, despite the challenges they face in capturing various semantic phenomena. We propose a novel probabilistic framework which draws on both formal semantics and recent advances in machine learning. In particular, we separate predicates from the entities they refer to, allowing us to perform Bayesian inference based on logical forms. We describe an implementation of this framework using a combination of Restricted Boltzmann Machines and feedforward neural networks. Finally, we demonstrate the feasibility of this approach by training it on a parsed corpus and evaluating it on established similarity datasets.
Distributional semantics has revolutionised computational semantics by representing the meaning of linguistic expressions as vectors that capture their co-occurrence patterns in large corpora (Turney et al., 2010; Erk, 2012). This strategy has been shown to be very successful for modelling word meaning, and it has recently been expanded to capture the meaning of phrases and even sentences in a compositional fashion (Baroni and Zamparelli, 2010; Mitchell and Lapata, 2010; Grefenstette and Sadrzadeh, 2011; Socher et al., 2012). Distributional semantic models are often presented as a robust alternative for representing meaning, compared to the symbolic and logic-based approaches of formal semantics, thanks to their flexible representations and their data-driven nature. However, current models fail to account for aspects of meaning that are central in formal semantics, such as the relation between linguistic expressions and their referents or the truth conditions of sentences. In this position paper we focus on one of the main limitations of current distributional approaches, namely,
However, historically, there has been no obvious way to extract the meaning of linguistic utterances beyond words. Unlike formal semantics, distributional semantics lacks a framework to construct the meaning of phrases and sentences based on their components. The question of assessing meaning similarity above the word level within the distributional paradigm has received a lot of attention in recent years (Mitchell and Lapata, 2008, 2010; Zanzotto et al., 2010; Guevara, 2010; Baroni and Zamparelli, 2010; Coecke et al., 2010). A number of compositional frameworks have been proposed in the literature, each of them defining operations to combine word vectors into representations for more complex structures such as phrases. However, these works pay more attention to simple phrases, e.g., adjective-noun constructions, with limited experiments at the sentence level. This thesis focuses on how to obtain vectors for sentences using distributional methods, as sentences are the basic language structure that humans use to interact with each other. Moreover, it looks at how information such as syntactic structure can be incorporated into the process of building sentence representations.
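Two of the simplest composition operations proposed in this literature are vector addition and element-wise multiplication (Mitchell and Lapata, 2010). The sketch below illustrates them on invented toy vectors; it is only meant to show the shape of these operations, not any particular model from the frameworks cited above.

```python
import numpy as np

# Minimal sketch of two simple composition functions from
# Mitchell and Lapata (2010). Toy vectors with invented values.
old = np.array([0.2, 0.7, 0.1, 0.5])
dog = np.array([0.6, 0.3, 0.8, 0.1])

additive = old + dog            # p = u + v
multiplicative = old * dog      # p_i = u_i * v_i

# A weighted additive variant, with alpha and beta tuned on data:
alpha, beta = 0.4, 0.6
weighted = alpha * old + beta * dog
```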
Any NLP system that does semantic processing relies on the assumption of semantic compositionality: the meaning of a phrase is determined by the meanings of its parts and their combination. For this, it is necessary to have automatic methods that are capable of reproducing the compositionality of language. Recent years have seen a renaissance of interest in distributional semantics. While distributional methods in semantics have proven to be very efficient in tackling a wide range of tasks in natural language processing, e.g., document retrieval, clustering and classification, question answering, query expansion, word similarity, synonym extraction, relation extraction, and many others, they are still strongly limited by being inherently word-based. The main hurdle for vector space models to progress further is the ability to handle compositionality.
We define a set S of all aspects as the set of pairs (s, σ), where s is the syntactic type of an aspect (for example, in the Lambek calculus) and σ is the semantics of the aspect (for example, described in the lambda calculus). We can extend S by defining a product on such pairs, reducing each element to a normal form. This defines a semigroup: the Lambek calculus can be described in terms of a residuated lattice, which is a partially ordered semigroup (Lambek, 1958), and the lambda calculus is equivalent to a Cartesian closed category under β-equivalence (Lambek, 1985), which can be considered a semigroup with additional structure.
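The sketch below is a deliberately simplified illustration of the pair construction: it represents an aspect as a (syntactic type, semantics) pair, with types written in a toy slash notation and semantics as Python values or callables, and it implements only forward application. It does not implement the full normal-form product or the Lambek/lambda machinery described above; those elements are the construction's actual content, and the code only shows the shape of the pairs being combined.

```python
from dataclasses import dataclass
from typing import Any

# Simplified sketch of aspects as (syntactic type, semantics) pairs.
# Types use a toy categorial notation: "S/NP" takes an NP and yields an S.
@dataclass
class Aspect:
    syn: str    # syntactic type, e.g. "NP" or "S/NP"
    sem: Any    # semantics: a value or a callable

def forward_apply(f: Aspect, x: Aspect) -> Aspect:
    """Combine an aspect of type A/B with one of type B, giving type A."""
    result_type, arg_type = f.syn.split("/", 1)
    assert arg_type == x.syn, "syntactic types do not match"
    return Aspect(result_type, f.sem(x.sem))

# Toy example: a predicate applied to its argument.
sleeps = Aspect("S/NP", lambda subj: f"sleep({subj})")
john = Aspect("NP", "john")
sentence = forward_apply(sleeps, john)
print(sentence.syn, sentence.sem)   # -> S sleep(john)
```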
Distributional semantics creates vector-space representations that capture many forms of semantic similarity, but their relation to semantic entailment has been less clear. We propose a vector-space model which provides a formal foundation for a distributional semantics of entailment. Using a mean-field approximation, we develop approximate inference procedures and entailment operators over vectors of probabilities of features being known (versus unknown). We use this framework to reinterpret an existing distributional-semantic model (Word2Vec) as approximating an entailment-based model of the distributions of words in contexts, thereby predicting lexical entailment relations. In both unsupervised and semi-supervised experiments on hyponymy detection, we get substantial improvements over previous results.
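The abstract does not spell out the entailment operators themselves. As a rough illustration of the general idea of scoring entailment from vectors of feature probabilities, the sketch below uses a simple feature-inclusion score under an independence assumption, with invented values; it is not the paper's mean-field operator.

```python
import numpy as np

# Rough illustration (not the paper's operator): entries are probabilities
# that a semantic feature is "known" for a word. x entails y to the extent
# that every feature known for y is also known for x.
def entails(x, y):
    # For each feature: P(known in x, or not required by y),
    # multiplied over features under a (strong) independence assumption.
    return float(np.prod(1.0 - y * (1.0 - x)))

# Invented feature probabilities; features: [living, barks, meows]
animal = np.array([0.9, 0.1, 0.0])
dog = np.array([0.9, 0.9, 0.0])

print(entails(dog, animal))   # high: dog -> animal
print(entails(animal, dog))   # lower: animal does not entail dog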
use the Scale-Invariant Feature Transform (SIFT) to depict the keypoints in terms of a 128-dimensional real-valued descriptor vector. Color-version SIFT descriptors are extracted on a regular grid with five-pixel spacing, at four multiple scales (10, 15, 20, 25 pixel radii), zeroing the low-contrast ones. We chose SIFT for its invariance to image scale, orientation, noise and distortion, and its partial invariance to illumination changes. To map the descriptors to visual words, we cluster the keypoints in their 128-dimensional space using the K-means clustering algorithm, and encode each keypoint by the index of the cluster (visual word) to which it belongs. We varied the number of visual words between 250 and 2000 in steps of 250. We then computed a one-level 4x4 pyramid of spatial histograms (Grauman and Darrell, 2005), consequently increasing the feature dimensions 16 times, for a number that varies between 4K and 32K, in steps of 4K. From the point of view of our distributional semantic model construction, the important point to keep in mind is that standard parameter choices such as the ones we adopted lead to distributional vectors with 4K, 8K, ..., 32K dimensions, where a higher number of features corresponds, roughly, to a more granular analysis of an image. We used the VLFeat implementation for the entire pipeline (Vedaldi and Fulkerson, 2008). See the references in Section 2.2 above for technical details.
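The following is a simplified sketch of the visual-word step described above, using scikit-learn's K-means on random stand-ins for the SIFT descriptors rather than the actual VLFeat pipeline; the descriptor values and image are invented, and the spatial pyramid is only indicated in a comment.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-ins for 128-dimensional SIFT descriptors pooled over the collection.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(5000, 128))

n_visual_words = 250   # the passage varies this between 250 and 2000
kmeans = KMeans(n_clusters=n_visual_words, n_init=10, random_state=0)
kmeans.fit(descriptors)

# Encode one image: map each keypoint descriptor to its nearest cluster
# (visual word), then build a bag-of-visual-words histogram.
image_descriptors = rng.normal(size=(300, 128))
words = kmeans.predict(image_descriptors)
histogram = np.bincount(words, minlength=n_visual_words)

# A 4x4 spatial pyramid would repeat this per grid cell and concatenate
# the 16 cell histograms, giving a 16 * n_visual_words dimensional vector.
```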
The original aim of distributional semantics is to model the similarity of the meaning—the semantics—of words. The basic assumption underlying this approach is that the semantic similarity of two words is a function of their contexts. In other words, the meaning of words can be inferred from the frequencies of the words they immediately occur with, and this can happen in such a way that the degree of similarity of those meanings can be measured. Semantic similarity is a key concept in the modeling of language and thus in computational linguistics. It is crucial in a variety of linguistic applications influencing our everyday life, such as search engines.
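A minimal sketch of this assumption, with invented co-occurrence counts: each word is represented by how often it occurs with a handful of context words, and meaning similarity is estimated as the cosine between these count vectors.

```python
import numpy as np

# Toy co-occurrence counts (invented) over four context words.
contexts = ["drink", "bark", "purr", "pet"]
counts = {
    "dog": np.array([10.0, 50.0, 1.0, 40.0]),
    "cat": np.array([12.0, 2.0, 45.0, 42.0]),
    "tea": np.array([60.0, 0.0, 0.0, 1.0]),
}

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(counts["dog"], counts["cat"]))  # relatively high
print(cosine(counts["dog"], counts["tea"]))  # relatively low
```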
We explore the impact of adding distributional knowledge to a state-of-the-art coreference resolution system. By integrating features based on word and context expansions from a distributional thesaurus (DT), automatically mined IS-A relationships and shallow syntactic clues into the Berkeley system (Durrett and Klein, 2013), we are able to increase its F1 score on bridging mentions, i.e. coreferent mentions with non-identical heads, by 8.29 points. Our semantic features improve over the Web-based features of Bansal and Klein (2012). Since bridging mentions are a hard but infrequent class of coreference, this leads to only small improvements in the overall system.
The most general representation of a distributional model takes the form of a sparse matrix, with entries specified as a triplet of row label (target term), column label (feature term) and co-occurrence frequency (cf. left panel of Fig. 1). The wordspace package creates DSM objects from such triplet representations, which can easily be imported into R from a wide range of file and database formats. Ready-made import functions are provided for TAB-delimited text files (as used by DISSECT), which may be compressed to save disk space, and for term-document models from the text-mining framework tm for R.
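To illustrate the triplet representation itself (in Python/SciPy rather than the R wordspace package, and with invented counts), the sketch below assembles (target, feature, frequency) triplets into a sparse term-by-feature matrix.

```python
from scipy.sparse import coo_matrix

# Invented (target term, feature term, co-occurrence frequency) triplets.
triplets = [
    ("dog", "bark", 50), ("dog", "pet", 40),
    ("cat", "purr", 45), ("cat", "pet", 42),
]

targets = sorted({t for t, _, _ in triplets})
features = sorted({f for _, f, _ in triplets})
row = [targets.index(t) for t, _, _ in triplets]
col = [features.index(f) for _, f, _ in triplets]
val = [v for _, _, v in triplets]

# Sparse target-by-feature matrix built from the triplets.
dsm = coo_matrix((val, (row, col)), shape=(len(targets), len(features)))
print(targets, features)
print(dsm.toarray())
```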
Distributional studies of compositionality differ in what they actually try to model. Of most relevance here are composition models that try to model human judgements about XNs with the help of the vectors of their constituents and some compositionality function. Mitchell and Lapata (2010), for example, try to model human responses to a compound noun similarity task. Marelli et al. (2012) investigate the relation between distribution-based semantic transparency measures of compounds and constituent frequency effect in lexical decision latencies. Reddy et al. (2011) is a very good example where compositionality clearly corresponds to semantic transparency. While the term ‘semantic transparency’ does not occur in the paper, Reddy et al. (2011, 211) adapt the following definition of compound compositionality proposed in Bannard et al. (2003, 66): “[. . . ] the overall semantics of the MWE [multi word expression] can be composed from the simplex semantics of its parts, as described (explicitly or implicitly) in a finite lexicon.” This is reminiscent of Plag’s definition of semantic transparency, and the link to semantic transparency becomes even clearer when looking at their operationalisation of the term. For the purposes of their paper, compositionality is equated with literality, and the aim of their models is to predict human ratings of compound literality. The compound literality ratings were elicited by asking the subjects to give a score ranging from 0 to 5 for how literal the phrase XY is, with a score of 5 indicating ‘to be understood very literally’, and a score of 0 indicating ‘not to be understood literally at all’. Since we use their data for the models presented here, we will simply adopt their view and treat their literality ratings as compositionality or, in our terms, semantic transparency measures.
Syntax-based vector spaces are used widely in lexical semantics and are more versatile than word-based spaces (Baroni and Lenci, 2010). However, they are also sparse, with resulting reliability and coverage problems. We address this problem by derivational smoothing, which uses knowledge about derivationally related words (oldish → old) to improve semantic similarity estimates. We develop a set of derivational smoothing methods and evaluate them on two lexical semantics tasks in German. Even for models built from very large corpora, simple derivational smoothing can improve coverage considerably.
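The snippet below sketches one possible smoothing scheme of this kind, with invented vectors and an invented derivational family; the paper's actual methods may combine the vectors differently, so this is only meant to show the basic idea of mixing a sparse word's vector with those of its derivationally related words.

```python
import numpy as np

# Invented syntax-based vectors; "oldish" is rare, hence sparse and noisy.
vectors = {
    "oldish": np.array([0.0, 1.0, 0.0, 2.0]),
    "old":    np.array([5.0, 9.0, 1.0, 7.0]),
    "older":  np.array([4.0, 8.0, 0.0, 6.0]),
}
# Hypothetical derivational family lookup (in practice from a morphology resource).
family = {"oldish": ["old", "older"]}

def smooth(word, lam=0.5):
    # Mix the word's own vector with the mean of its derivational relatives.
    related = np.mean([vectors[w] for w in family[word]], axis=0)
    return lam * vectors[word] + (1 - lam) * related

print(smooth("oldish"))
```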
Clearly, any compositional semantics must somehow account for this [missing text], as such sentences are quite common and are not at all exotic, farfetched, or contrived. Traditionally, linguists and semanticists have dealt with such sentences by investigating various phenomena such as metaphor (2a), metonymy (2b), textual entailment (2c), nominal compounds (2d), lexical ambiguity (2e), and quantifier scope ambiguity (2f), to name a few. However, and although they seem to have a common denominator, it is somewhat surprising that in looking at the literature one finds that these phenomena have been studied quite independently; to the point where there is very little, if any, that seems to be common between the various proposals that are often suggested.
Existing logical systems can hide presuppositions, which we do not see just because they are so successful. First-order predicate logic is a good example. It has done amazingly well as a focus for making modern logic more rigorous, and as a 'laboratory' where the most important metatheorems of the field were developed. But as a paradigm for reasoning, it also has some features which one might question. In particular, stepping back a little, its very uniformity seems suspect. It would really be most surprising if one single linguistic formula representation could efficiently serve the purposes of such different activities as interpretation and deduction. Cognitive psychology suggests that memory and performance of humans over time require an interplay of different successive representations for the same information. Therefore, no such uniformity need be assumed for actual cognition, and a much more complex modular architecture might be needed, where semantics and deduction do not work on the same representations, and where information is passed between modules. And the same moral applies elsewhere. With information processing by machines, modularity is at the heart of performance. And to mention one further source, issues of architecture become unavoidable when we ponder the logical meso- and macro-structures of discourse and argumentation.
The obvious criticism is that the Gricean condition introduces inessential complexity, even if the term “inform” is inappropriate without it. To remedy this, in this paper, we will simply drop the GC part of the FP (on the grounds that it is really a social protocol for human communication which need not be presumed in the context of any other interaction protocol), but retain the SC part as a simpler FIPA-like pre-condition, and allow the FIPA RE as a trustworthiness assumption (see section 4.5). Now, it turns out that the remaining FIPA ACL messages that we use can be re-expressed as special cases of the inform act in context. This was known when the standard was prepared, but, in some cases, was obscured and made erroneous by the inessential complexity we have sought to remove. For example, a propose message, sent by a potential CNP contractor to the manager, is an inform with the propositional content being that if the sender believes that the receiver intends the action to be done by the sender, then the sender (will) intend this (too). So a sincere accept or reject requires belief by the manager in the propositional content of such a proposal, and becomes an inform message with content expressing the manager’s intention, so that the contractor can discharge the conditional “promise”. There are other deeper issues with the FIPA semantics. The semantic conditions as expressed are sender-oriented and there
This is useful if we want to automatically manipulate the information which is represented by means of an SSN. For example, we can formally define whether some piece of information is already implicit in some other piece of information; in other words, we can define a notion of logical consequence. Related to this is the possibility to use the semantics in order to test the consistency of the information conveyed by an SSN. For that purpose, we can do so-called model checking: an SSN is consistent if we can construct a model – that is, a logically possible state of the world – in which the SSN is true.
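As a generic illustration of consistency testing via model checking (not tied to the SSN formalism or its actual semantics), the sketch below checks whether a set of propositional formulas has a model by brute-force search over truth assignments.

```python
from itertools import product

# A set of formulas is consistent if some truth assignment
# (a "logically possible state of the world") makes them all true.
def consistent(formulas, atoms):
    for values in product([False, True], repeat=len(atoms)):
        model = dict(zip(atoms, values))
        if all(f(model) for f in formulas):
            return True, model
    return False, None

# "it rains" and "if it rains, the street is wet"
formulas = [
    lambda m: m["rain"],
    lambda m: (not m["rain"]) or m["wet"],
]
print(consistent(formulas, ["rain", "wet"]))                              # consistent
print(consistent(formulas + [lambda m: not m["wet"]], ["rain", "wet"]))   # inconsistent
```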
This dissertation addresses the formal semantics of Handel-C: a C-based language with true parallelism and priority-based channel communication, which can be compiled to hardware. It describes an implementation in the Haskell functional programming language of a denotational semantics for Handel-C, as described in (Butterfield & Woodcock, 2005a). In particular, the Typed Assertion Trace model is used, and difficulties in creating a concrete implementation of the abstract model are discussed. An existing toolset supporting an operational semantics for the language is renovated, in part to support arbitrary semantic “modes,” and to add support for the denotational semantics using this feature. A comparison module is written to compare the traces of two programs in any semantic mode supported by the simulator.