Distributional Semantics

Separating Disambiguation from Composition in Distributional Semantics

Although there might not exist such a thing as the best evaluation method for compositional-distributional semantics, it is safe to assume that a phrase similarity task avoids many of the pitfalls of tasks such as the one of Section 6.1. Given pairs of short phrases, the goal is to assess the similarity of the phrases by constructing composite vectors for them and computing their distance. No assumptions about disambiguation abilities regarding a specific word (e.g. the verb) are made here; the only criterion is to what extent the composite vector representing the meaning of a phrase is similar or dissimilar to the vector of another phrase. From this perspective, this task seems the ideal choice for evaluating a model aiming to provide appropriate phrasal semantics. The scores given by the models are compared to those of human evaluators using Spearman’s ρ.
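For concreteness, the sketch below (not part of the paper) walks through such an evaluation: phrase vectors are built with a simple additive composition function as a stand-in for whatever model is being tested, phrase pairs are scored with cosine similarity, and the model scores are correlated with hypothetical human ratings via Spearman's ρ. All vectors and ratings are toy data.

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def compose(word_vectors, phrase):
    # Additive composition: one simple stand-in for a compositional model.
    return np.sum([word_vectors[w] for w in phrase.split()], axis=0)

# Toy word vectors and phrase pairs with hypothetical human similarity ratings.
rng = np.random.default_rng(0)
vocab = ["buy", "purchase", "house", "sell", "car"]
word_vectors = {w: rng.normal(size=50) for w in vocab}
pairs = [("buy house", "purchase house"), ("buy house", "sell car"), ("sell car", "sell house")]
human_scores = [6.5, 2.0, 3.5]   # e.g. ratings on a 1-7 scale

model_scores = [cosine(compose(word_vectors, a), compose(word_vectors, b))
                for a, b in pairs]
rho, _ = spearmanr(model_scores, human_scores)
print("Spearman's rho:", rho)
```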

A Vector Space for Distributional Semantics for Entailment

Distributional semantics creates vector-space representations that capture many forms of semantic similarity, but their relation to semantic entailment has been less clear. We propose a vector-space model which provides a formal foundation for a distributional semantics of entailment. Using a mean-field approximation, we develop approximate inference procedures and entailment operators over vectors of probabilities of features being known (versus unknown). We use this framework to reinterpret an existing distributional-semantic model (Word2Vec) as approximating an entailment-based model of the distributions of words in contexts, thereby predicting lexical entailment relations. In both unsupervised and semi-supervised experiments on hyponymy detection, we get substantial improvements over previous results.

Predictability of Distributional Semantics in Derivational Word Formation

Compositional models of distributional semantics, or CDSMs (Mitchell and Lapata, 2010; Erk and Padó, 2008; Baroni et al., 2014; Coecke et al., 2010), have established themselves as a standard tool in computational semantics. Building on traditional distributional semantic models for individual words (Turney and Pantel, 2010), they are generally applied to compositionally compute phrase meaning by defining combination operations on the meanings of the phrase’s constituents. CDSMs have also been co-opted by the deep learning community for tasks including sentiment analysis (Socher et al., 2013) and machine translation (Hermann and Blunsom, 2014). A more recent development is the use of CDSMs to model meaning-related phenomena above and below syntactic structure; here, the term “composition” is used more generally to apply to processes of meaning combination from multiple linguistic units. Above the sentence level, such models attempt to predict the unfolding of discourse (Kiros et al., 2015). Below the word level, CDSMs have been applied to model word formation processes like compounding (church + tower → church tower) and (morphological) derivation (favor + able → favorable) (Lazaridou et al., 2013). More concretely, given a distributional representation of a base and a derivation pattern (typically an affix), the task of the CDSM is to predict a distributional representation of the derived word, without being provided with any additional information. Interest in the use of CDSMs in this context comes from the observation that derived words are often less frequent than their bases (Hay, 2003), and in the extreme case even completely novel; consequently, distributional evidence is often unreliable and sometimes unavailable. This is confirmed by Luong et al. (2013), who compare the performance of different types of word embeddings on a word similarity task and achieve poorer performance on data sets containing rarer and more complex words. Due to the Zipfian distribution of word frequencies, there are many more rare than frequent word types in a corpus, which increases the need for methods able to model derived words.
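Concretely, one common way to frame this prediction task (a generic sketch, not necessarily the model used in this work) is to treat each derivation pattern as a linear map from base vectors to derived-word vectors and estimate it by least squares over observed (base, derived) pairs; the example below uses random toy vectors in place of corpus-derived embeddings.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 50
n_pairs = 200  # observed (base, derived) training pairs for one affix, e.g. "-able"

# Toy stand-ins for corpus-derived embeddings of bases and their derived forms.
base_vecs = rng.normal(size=(n_pairs, dim))
true_map = rng.normal(size=(dim, dim)) / np.sqrt(dim)
derived_vecs = base_vecs @ true_map + 0.1 * rng.normal(size=(n_pairs, dim))

# Learn the affix as a linear map W minimising ||base @ W - derived||^2.
W, *_ = np.linalg.lstsq(base_vecs, derived_vecs, rcond=None)

# Predict a representation for an unseen (possibly novel) derived word.
favor = rng.normal(size=dim)          # hypothetical vector for "favor"
favorable_pred = favor @ W            # predicted vector for "favorable"
print(favorable_pred.shape)
```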

Entailment above the word level in distributional semantics

Distributional semantics (DS) approximates linguistic meaning with vectors summarizing the contexts where expressions occur. The success of DS in lexical semantics has validated the hypothesis that semantically similar expressions occur in similar contexts (Landauer and Dumais, 1997; Lund and Burgess, 1996; Sahlgren, 2006; Schütze, 1997; Turney and Pantel, 2010). Formal semantics (FS) represents linguistic meanings as symbolic formulas and assembles them via composition rules. FS has successfully modeled quantification and captured inferential relations between phrases and between sentences (Montague, 1970; Thomason, 1974; Heim and Kratzer, 1998). The strengths of DS and FS have been complementary to date: On one hand, DS has induced large-scale semantic representations from corpora, but it has been largely limited to the…

Semantic transparency: challenges for distributional semantics

Vecchi et al. (2011) use distributional semantics to characterise semantic deviance in ANs. Unattested ANs were rated by two of the authors using a 3-point scale (deviant, intermediate or acceptable), where the two endpoints marked ‘semantically highly anomalous, regardless of effort’ vs. ‘completely acceptable’. Only those items with inter-rater agreement on ‘deviant’ or ‘acceptable’ were included in the test set. They investigated the ability of three measures to distinguish between deviant and non-deviant ANs: length of the AN vectors, cosine similarity between vectors for AN and N, and the average cosine with the top 10 nearest neighbours (density). Of these three indices, only the first and the last yield significant results for AN classification. The authors hypothesize that a wide angle between N and AN might not be a measure of deviance, but rather a common feature of a number of types of non-deviant ANs, among them metaphorical constructions. If they are correct, it should be possible to use cosine similarity on acceptable AN combinations in order to identify shifts.
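The three indices are simple to compute once AN and N vectors are available; the sketch below illustrates them on toy vectors (vector length, cosine between AN and N, and density as the mean cosine with the ten nearest neighbours), without reproducing the authors' semantic space.

```python
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

rng = np.random.default_rng(2)
dim, vocab_size = 50, 1000
space = rng.normal(size=(vocab_size, dim))   # toy semantic space
an_vec = rng.normal(size=dim)                # vector for an AN combination
n_vec = rng.normal(size=dim)                 # vector for its head noun

length = np.linalg.norm(an_vec)                    # index 1: vector length
cos_an_n = cosine(an_vec, n_vec)                   # index 2: cosine(AN, N)
sims = np.array([cosine(an_vec, w) for w in space])
density = np.sort(sims)[-10:].mean()               # index 3: mean cosine of top-10 neighbours

print(f"length={length:.3f}  cos(AN,N)={cos_an_n:.3f}  density={density:.3f}")
```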

Linear Compositional Distributional Semantics and Structural Kernels

In this paper, we want to start the analysis of models for compositional distributional semantics with respect to the similarity measure. We focus on linear CDS models. We believe that this simple analysis of the properties of the similarity can help to better investigate new CDS models. We show that, looking at CDS models from this point of view, these models are strictly related to convolution kernels (Haussler, 1999), e.g., tree kernels (Collins and Duffy, 2002). We will then examine how the distributed tree kernels (Zanzotto and Dell’Arciprete, 2012) are an interesting result for drawing a stronger link between CDS models and convolution kernels.

Computing Semantic Compositionality in Distributional Semantics

This article introduces and evaluates an approach to semantic compositionality in computational linguistics based on the combination of Distributional Semantics and supervised Machine Learning. In brief, distributional semantic spaces containing representations for complex constructions such as Adjective-Noun and Verb-Noun pairs, as well as for their constituent parts, are built. These representations are then used as feature vectors in a supervised learning model using multivariate multiple regression. In particular, the distributional semantic representations of the constituents are used to predict those of the complex structures. This approach outperforms the rivals in a series of experiments with Adjective-Noun pairs extracted from the BNC. In a second experimental setting based on Verb-Noun pairs, a comparatively much lower performance was obtained by all the models; however, the proposed approach gives the best results in combination with a Random Indexing semantic space.
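A minimal sketch of this idea, assuming toy adjective and noun vectors rather than the article's actual spaces: the constituents' vectors are concatenated as features and a multivariate multiple regression is fit to predict the corpus-observed vectors of the complex constructions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
dim, n_pairs = 50, 500

# Toy constituent vectors and corpus-observed vectors for the AN constructions.
adj_vecs = rng.normal(size=(n_pairs, dim))
noun_vecs = rng.normal(size=(n_pairs, dim))
observed_an = 0.4 * adj_vecs + 0.6 * noun_vecs + 0.05 * rng.normal(size=(n_pairs, dim))

# Features: concatenation of the two constituent vectors.
X = np.hstack([adj_vecs, noun_vecs])
model = LinearRegression().fit(X, observed_an)     # multivariate multiple regression

# Predict a representation for an unseen Adjective-Noun pair.
new_pair = np.hstack([rng.normal(size=dim), rng.normal(size=dim)])
predicted_an = model.predict(new_pair.reshape(1, -1))
print(predicted_an.shape)
```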

Derivational Smoothing for Syntactic Distributional Semantics

Distributional semantics (Turney and Pantel, 2010) builds on the assumption that the semantic similarity of words is strongly correlated to the overlap between their linguistic contexts. This hypothesis can be used to construct context vectors for words directly from large text corpora in an unsupervised manner. Such vector spaces have been applied successfully to many problems in NLP (see Turney and Pantel (2010) or Erk (2012) for current overviews).
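As an illustration of the unsupervised construction step, the sketch below builds raw co-occurrence count vectors from a toy corpus with a symmetric context window; real models additionally apply weighting schemes (e.g. PPMI) and dimensionality reduction, which are omitted here.

```python
from collections import Counter, defaultdict

corpus = [
    "the boat sails on the sea",
    "the ship sails on the ocean",
    "fish swim in the sea",
    "fish swim in the ocean",
]
window = 2
cooc = defaultdict(Counter)

for sentence in corpus:
    tokens = sentence.split()
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                cooc[w][tokens[j]] += 1   # count neighbours within the window

# The context vector of a word is its row of co-occurrence counts.
print(dict(cooc["sea"]))
print(dict(cooc["ocean"]))
```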

Exploitation of Coreference in Distributional Semantics

The original aim of distributional semantics is to model the similarity of the meaning—the semantics—of words. The basic assumption underlying this approach is that the semantic similarity of two words is a function of their contexts. In other words, the meaning of a word can be inferred from the frequencies of the words it immediately occurs with, in such a way that the degree of similarity of those meanings can be measured. Semantic similarity is a key concept in the modeling of language and thus in computational linguistics. It is crucial in a variety of linguistic applications that influence our everyday life, such as search engines.

Proceedings of the Workshop on Distributional Semantics and Compositionality

Any NLP system that does semantic processing relies on the assumption of semantic compositionality: the meaning of a phrase is determined by the meanings of its parts and their combination. For this, it is necessary to have automatic methods capable of reproducing the compositionality of language. Recent years have seen a renaissance of interest in distributional semantics. While distributional methods in semantics have proven to be very efficient in tackling a wide range of tasks in natural language processing, e.g., document retrieval, clustering and classification, question answering, query expansion, word similarity, synonym extraction, relation extraction, and many others, they are still strongly limited by being inherently word-based. The main hurdle to further progress for vector space models is the ability to handle compositionality.

Formal Distributional Semantics: Introduction to the Special Issue

In recent years, distributional models have been extended to handle the semantic composition of words into phrases and longer constituents (Baroni and Zamparelli 2010; Mitchell and Lapata 2010; Socher et al. 2012), building on work that the Cognitive Science community had started earlier on (Foltz 1996; Kintsch 2001, among others). Although these models still do not account for the full range of composition phenomena that have been examined in Formal Semantics, they do encode relevant semantic information, as shown by their success in demanding semantic tasks such as predicting sentence similarity (Marelli et al. 2014). Compositional Distributional Semantics allows us to model semantic phenomena that are very challenging for Formal Semantics and more generally symbolic approaches, especially concerning content words. Consider polysemy: In the first three sentences in Figure 1(a), postdoc refers to human beings, whereas in the fourth it refers to an event. Composing postdoc with an adjective such as tall will highlight the human-related information in the noun vector, bringing it closer to person, whereas composing it with long will highlight its eventive dimensions, bringing it closer to time (Baroni and Zamparelli 2010, Boleda et al. 2013 as well as Asher et al. and Weir et al., this issue); crucially, in both cases, the information relating to research activities will be preserved. Note that in this way DS can account for polysemy without sense enumeration (Kilgarriff 1992; Pustejovsky 1995).

Distributional Semantics for Resolving Bridging Mentions

This intuition is backed by a manual analysis we conducted on 100 random errors not made by the baseline. When examining each of the system’s antecedent decisions and their weights, we found that 23% of the wrong links were chosen because of distributional semantics features. The majority of these semantic errors were triggered by the PRIOR feature, whereas only one of them could be ascribed to the IS-A feature. Here, the recall-oriented clustering of IS-As in the DT (Gliozzo et al., 2013) produced an incorrect hypernymic relation between Chaidamun Basin and the country.

On the difficulty of a distributional semantics of spoken language

In the case of spoken language, unsupervised methods usually focus on discovering relatively low-level constructs such as phoneme inventories or word-like units. This is mainly due to the fact that the key insight from distributional semantics that “you shall know a word by the company it keeps” (Firth, 1957) is hopelessly confounded in the case of spoken language. In text, two words are considered semantically similar if they co-occur with similar neighbors. However, speech segments which occur in the same utterance or situation often have many other features in addition to similar meaning, such as being uttered by the same speaker or accompanied by similar ambient noise. In this study we show that if we can abstract away from speaker and background noise, we can effectively capture semantic characteristics of spoken utterances in an unsupervised way. We present SegMatch, a model trained to match segments of the same utterance. SegMatch utterance encodings are compared to those in Audio2Vec, which is trained to decode the context that surrounds an utterance. To investigate whether our representations capture semantics, we evaluate on speech and vision datasets where photographic images are paired with spoken descriptions. Our experiments show that for a single synthetic voice, a simple model trained only on image captions can capture pairwise similarities that correlate with those in the visual space.
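Very roughly, and not as the authors' actual training setup, the matching idea can be pictured with a margin-based loss that rewards encodings of segments from the same utterance for being closer than encodings of segments from different utterances; the function and vectors below are hypothetical placeholders.

```python
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def matching_loss(anchor, positive, negative, margin=0.2):
    # Hypothetical margin loss: a segment encoding should be closer to another
    # segment of the same utterance (positive) than to one from a different
    # utterance (negative).
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))

rng = np.random.default_rng(4)
anchor, positive = rng.normal(size=64), rng.normal(size=64)   # same utterance
negative = rng.normal(size=64)                                # different utterance
print(matching_loss(anchor, positive, negative))
```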

Alternative measures of word relatedness in distributional semantics

… word relatedness by four different distances. We tested this method on the standard WS-353 test, obtaining the co-occurrence frequencies from the Wacky corpus. The Spearman correlation with human-given scores is around the baseline for vector space models, so there is hope for improvement. The method is computationally less expensive. Furthermore, it provides a new framework for experimenting with distributional semantic compositionality, since our method can be extended from measuring word-word semantic relatedness to evaluating phrasal semantics. This is in fact one of the most challenging streams of research on distributional semantics: finding a principled way to account for natural language compositionality.
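Schematically, such an evaluation scores each word pair under several distance measures and correlates each measure with the human ratings using Spearman's ρ. The sketch below uses toy vectors and four common distances from SciPy, which are not necessarily the four distances proposed in the paper.

```python
import numpy as np
from scipy.spatial.distance import cosine, euclidean, cityblock, chebyshev
from scipy.stats import spearmanr

rng = np.random.default_rng(5)
vocab = ["tiger", "cat", "computer", "keyboard", "stock", "jaguar"]
vectors = {w: rng.normal(size=100) for w in vocab}   # stand-ins for corpus-derived vectors

# Toy word pairs with hypothetical human relatedness ratings (WS-353 style, 0-10).
pairs = [("tiger", "cat", 7.35), ("computer", "keyboard", 7.62), ("stock", "jaguar", 0.92)]

for name, dist in [("cosine", cosine), ("euclidean", euclidean),
                   ("cityblock", cityblock), ("chebyshev", chebyshev)]:
    # Negate distances so that larger scores mean more related.
    model_scores = [-dist(vectors[a], vectors[b]) for a, b, _ in pairs]
    rho, _ = spearmanr(model_scores, [gold for _, _, gold in pairs])
    print(f"{name}: rho={rho:.2f}")
```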

Estimating Linear Models for Compositional Distributional Semantics

Lin and Pantel (2001) propose the pattern distributional hypothesis that extends the distributional hypothesis to specific patterns, i.e. word sequences representing partial verb phrases. Distributional meaning for these patterns is derived directly by looking at their occurrences in a corpus. Due to data sparsity, patterns of different length appear with very different frequencies in the corpus, affecting their statistics detrimentally. On the other hand, compositional distributional semantics (CDS) proposes to obtain distributional meaning for sequences by composing the vectors of the words in the sequences (Mitchell and Lapata, 2008; Jones and Mewhort, 2007). This approach is fairly interesting as the distributional meaning of sequences of different length is obtained by composing distributional vectors of single words. Yet, many of these approaches have a large number of parameters that cannot be easily estimated.
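The simplest such composition operations are vector addition and pointwise multiplication (Mitchell and Lapata, 2008); the sketch below shows both applied to a toy pattern, with random vectors standing in for corpus-derived ones.

```python
import numpy as np

rng = np.random.default_rng(6)
word_vectors = {w: rng.normal(size=50) for w in ["acquire", "a", "company"]}

def compose_additive(words):
    # Sum of the word vectors.
    return np.sum([word_vectors[w] for w in words], axis=0)

def compose_multiplicative(words):
    # Pointwise (Hadamard) product of the word vectors.
    vecs = [word_vectors[w] for w in words]
    out = vecs[0].copy()
    for v in vecs[1:]:
        out *= v
    return out

phrase = ["acquire", "a", "company"]   # a pattern / partial verb phrase
print(compose_additive(phrase)[:5])
print(compose_multiplicative(phrase)[:5])
```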

Don’t Blame Distributional Semantics if it can’t do Entailment

The only truly common core among all uses of any given expression is that they are all, indeed, uses of the same expression. Hence, if expression meaning is to serve its purpose as a common core among all uses, i.e., as a context-invariant starting point of semantic/pragmatic explanations, then it must reflect all uses. As we argued in section 3, distributional semantics, conceived of as a model of expression meaning (i.e., the ‘strong’ view of Lenci 2008), embraces exactly this fact. This makes the representations of distributional semantics, but not those of formal semantics, suitable for characterizing expression meaning. By contrast, (largely) discrete notions like reference, truth and entailment are useful, at best, at the level of speaker meaning – recall that our position is that words don’t refer, speakers do (Strawson, 1950). That is, one can fruitfully conceive of a particular speaker, in some individuated context, as intending to refer to discrete things, communicating a certain determinate piece of information that can be true or false, entailing certain things and not others. This still involves considerable abstraction, as any symbolic model of a cognitive system would (Marr, 1982); e.g., speaker intentions may not always be as determinate as a symbolic model presupposes. But the amount of abstraction required, in particular the kind of determinacy of content that a symbolic model presupposes, is not as problematic in the case of speaker meaning as for expression meaning. The reason is that a model of speaker meaning needs to cover only a single usage, by a particular speaker situated in a particular context; a model of expression meaning, by contrast, needs to cover countless interactions, across many different contexts, of a whole community of speakers. The symbolic representations of formal semantics are ill-suited for the latter.

Distributional Semantics Meets Multi-Label Learning

We present a label embedding based approach to large-scale multi-label learning, drawing inspiration from ideas rooted in distributional semantics, specifically the Skip Gram Negative Sampling (SGNS) approach, widely used to learn word embeddings. Besides leading to a highly scalable model for multi-label learning, our approach highlights interesting connections between label embedding methods commonly used for multi-label learning and paragraph embedding methods commonly used for learning representations of text data. The framework easily extends to incorporating auxiliary information such as label-label correlations; this is crucial especially when many training instances are only partially annotated. To facilitate end-to-end learning, we develop a joint learning algorithm that can learn the embeddings as well as a regression model that predicts these embeddings for the new input to be annotated, via efficient gradient based methods. We demonstrate the effectiveness of our approach through an extensive set of experiments on a variety of benchmark datasets, and show that the proposed models perform favorably as compared to state-of-the-art methods for large-scale multi-label learning.
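To make the SGNS connection concrete, the sketch below computes the standard skip-gram negative-sampling loss for a single (input, positive, negatives) triple; how instances, labels and negatives are actually chosen in the proposed model is not reproduced here, and the vectors are toy placeholders.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss(input_vec, positive_vec, negative_vecs):
    # Negative log-likelihood of the SGNS objective for a single training example:
    # push the positive pair's dot product up, the negative pairs' dot products down.
    loss = -np.log(sigmoid(input_vec @ positive_vec))
    for neg in negative_vecs:
        loss -= np.log(sigmoid(-(input_vec @ neg)))
    return loss

rng = np.random.default_rng(7)
dim = 50
input_vec = rng.normal(size=dim) * 0.1           # e.g. an instance embedding
positive_vec = rng.normal(size=dim) * 0.1        # e.g. an observed label's embedding
negative_vecs = rng.normal(size=(5, dim)) * 0.1  # sampled negative labels
print(sgns_loss(input_vec, positive_vec, negative_vecs))
```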

Multimodal Distributional Semantics

Distributional semantics is the branch of computational linguistics that develops methods to approximate the meaning of words based on their distributional properties in large textual corpora. The basis of such methods relies on the distributional hypothesis: Words that occur in similar contexts are semantically similar. Although the distributional hypothesis has multiple theoretical underpinnings in psychology, linguistics, lexicography and philosophy of language [Firth, 1957; Harris, 1954; Miller and Charles, 1991; Wittgenstein, 1953], nowadays its strong influence is mainly due to its practical consequence: Harvesting meaning becomes the very straightforward operation of recording the contexts in which words occur and using their co-occurrence statistics to represent their meanings. Distributional semantic models (DSMs) are among the approaches which take full advantage of the distributional hypothesis by storing distributional information in vectors that can be used to compute the degree of semantic relatedness of two or more words in terms of geometric distance (see e.g., Clark [2013]; Turney and Pantel [2010]). For example, both sea and ocean might often appear with words such as water, boat, fish and wave and, as a result, their distributional vectors will be very close, indicating that the two words are very similar. The way in which DSMs operationalize the distributional hypothesis has led to very effective approaches in many semantics-related tasks (see Section 2.1).
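The sea/ocean illustration can be made concrete with a tiny co-occurrence table over the contexts mentioned above; the counts below are invented purely for illustration.

```python
import numpy as np

contexts = ["water", "boat", "fish", "wave", "keyboard"]
# Invented co-occurrence counts for illustration only.
vectors = {
    "sea":      np.array([120, 45, 60, 80, 1]),
    "ocean":    np.array([110, 40, 55, 90, 0]),
    "computer": np.array([2, 0, 1, 0, 95]),
}

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Geometrically close vectors indicate semantically similar words.
print("sea ~ ocean:   ", round(cosine(vectors["sea"], vectors["ocean"]), 3))
print("sea ~ computer:", round(cosine(vectors["sea"], vectors["computer"]), 3))
```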

Composition of Compound Nouns Using Distributional Semantics

The use of distributional semantics to represent the meaning of a single word has proven to be very effective, but there is still difficulty in representing the meaning of larger constituents, such as a noun phrase. In general, it is unclear how to find a representation of phrases that preserves syntactic distinctions and the relationship between a compound’s constituents. This paper is an attempt to find the best representation of nominal compounds in Spanish and English, and evaluates the performance of different compositional models by using correlations with human similarity judgments and by using compositional representations as input to an SVM classifying the semantic relation between nouns within a compound. This paper also evaluates the utility of different functions’ compositional representations, which give our model a slight advantage in accuracy over other state-of-the-art semantic relation classifiers.
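One plausible reading of the second evaluation, sketched below with random toy data rather than the paper's Spanish/English compounds: compose the constituent vectors, use the result (together with the constituents) as features, and train an SVM to predict the semantic relation. The relation inventory here is hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(8)
dim, n_compounds = 50, 300
relations = ["PART-OF", "LOCATED-IN", "MADE-OF"]   # hypothetical relation inventory

# Toy constituent vectors and relation labels for noun-noun compounds.
mod_vecs = rng.normal(size=(n_compounds, dim))     # modifier noun (e.g. "church")
head_vecs = rng.normal(size=(n_compounds, dim))    # head noun (e.g. "tower")
labels = rng.choice(relations, size=n_compounds)

# Features: additive composition concatenated with the two constituents.
X = np.hstack([mod_vecs + head_vecs, mod_vecs, head_vecs])
clf = SVC(kernel="linear").fit(X[:250], labels[:250])
print("held-out accuracy:", clf.score(X[250:], labels[250:]))
```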

Functional Distributional Semantics

Current approaches to distributional semantics generally involve representing words as points in a high-dimensional vector space. However, vectors do not provide ‘natural’ composition operations that have clear analogues with operations in formal semantics, which makes it challenging to perform inference, or capture various aspects of meaning studied by semanticists. This is true whether the vectors are constructed using a count approach (e.g. Turney and Pantel, 2010) or an embedding approach (e.g. Mikolov et al., 2013), and indeed Levy and Goldberg (2014b) showed that there are close links between them. Even the tensorial approach described by Coecke et al. (2010) and Baroni et al. (2014), which naturally captures argument structure, does not allow an obvious account of context dependence, or logical inference. In this paper, we build on insights drawn from formal semantics, and seek to learn representations…