The task of word sense disambiguation (WSD) can be regarded as one of the most important tasks for natural language processing applications, including semantic interpretation of texts, semantic web applications, paraphrasing and summarization. One issue with current word sense disambiguation methods is that the most successful techniques are supervised, which means that annotated corpora must be available to train the systems. However, this kind of data is costly to produce and cannot be created for each new domain to be disambiguated. This indicates that more effort should be put into unsupervised word sense disambiguation techniques. Furthermore, one vital issue that must generally be solved for such systems is the choice of an adequate context. Usually, this context is defined as a window of words or sentences around the word to be disambiguated. The question raised by this paper is whether defining this context using syntactic and logical features can be beneficial to WSD. This paper briefly presents a natural language processing pipeline that outputs logical representations from texts and disambiguates the logical representations using various WSD algorithms. The paper also presents the different context definitions used for WSD. Preliminary results show that logical and syntactic features can be of interest to WSD. The main contribution of this paper is the use of syntactic and semantic information for WSD in an unsupervised manner. The paper is organized as follows: first, Section 2 explains the pipeline that creates logical representations and presents the various WSD algorithms and the contexts used in this study. Section 3 presents experiments conducted over a small corpus and shows preliminary results. It also describes the results of our system on the Senseval English Lexical Sample task before drawing a conclusion.
We note that decision trees based on binary features representing the possible values of a given sequence of part-of-speech tags outperform those based on individual features. The combinations which include P1 obtain higher accuracies. In the case of the verbs and adjectives in SENSEVAL-2 and SENSEVAL-1 data, the best results are obtained using the parts of speech of words following the target word. The nouns are helped by parts of speech of words on both sides. This is in accordance with the hypothesis that verbs and adjectives have strong syntactic relations to the words immediately following them, while nouns may have strong syntactic relations on either side. However, the hard and serve data are found to be helped by features from both sides. We believe this is because of the much larger number of instances per task for the hard and serve data as compared to the adjectives and verbs in the SENSEVAL-1 and SENSEVAL-2 data. Due to the smaller amount of training data available for SENSEVAL-2 and SENSEVAL-1 words, only the most potent features help. The power of combining features is highlighted by the significant improvement of accuracies above the baseline for the line and hard data, which was not the case using individual features (Table 1).
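The setup above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the feature names (P-1, P1, P2 for the tags before and after the target word), the toy instances, and the sense labels are all invented for the example; scikit-learn's DictVectorizer produces the binary (one-hot) features over tag values that the text describes.

```python
# Sketch: decision tree over binary part-of-speech features for WSD.
# Toy data only; in the real setting each instance would come from an
# annotated occurrence of the target word.
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

# Each instance: POS tags surrounding the target word.
# P-1 = tag before the target, P1/P2 = first and second tags after it.
train = [
    ({"P-1": "DT", "P1": "IN", "P2": "DT"}, "sense1"),
    ({"P-1": "JJ", "P1": "VBD", "P2": "RB"}, "sense2"),
    ({"P-1": "DT", "P1": "IN", "P2": "NN"}, "sense1"),
    ({"P-1": "NN", "P1": "VBZ", "P2": "JJ"}, "sense2"),
]

# DictVectorizer one-hot encodes each (position, tag) pair, i.e. a
# binary feature per possible tag value at each position.
vec = DictVectorizer()
X = vec.fit_transform([feats for feats, _ in train])
y = [sense for _, sense in train]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
pred = clf.predict(vec.transform([{"P-1": "DT", "P1": "IN", "P2": "DT"}]))[0]
print(pred)  # sense1
```

The combination of positions into one feature dictionary per instance is what lets the tree test sequences of tags rather than isolated ones.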
One of the first problems encountered by any natural language processing system is that of lexical ambiguity, be it syntactic or semantic (Jurafsky and Martin, 2008). The resolution of a word's syntactic ambiguity has largely been solved in language processing by part-of-speech taggers, which predict the syntactic category of words in text with high levels of accuracy. The problem is that words often have more than one meaning, sometimes fairly similar and sometimes completely different. The meaning of a word in a particular usage can only be determined by examining its context. Word Sense Disambiguation (WSD) is the process of identifying the sense of a polysemic word. Different approaches to WSD
using the NLM-based graph over the NLM dataset. However, the NLM-based graph does not perform as well in the cross-testing scenario, i.e. when applied to the Acronym datasets. This may be due to a greater specificity of the Acronym corpus, in which the different CUIs among which the disambiguation algorithm has to choose (representing extended forms of the acronyms) correspond to more specific concepts. On the other hand, terms in the NLM-WSD corpus are much more general. Hence, it is possible that some of the target CUIs of the Acronym corpus do not even appear in the graph created from NLM-related abstracts. Also, it is likely that any graph created from a large enough set of abstracts (such as the one created with acronym-based abstracts) contains enough information about CUIs representing the general concepts of the "NLM" dataset to perform a good disambiguation. Finally, we can observe that results obtained with the joint graph improve on those obtained with the simpler graphs for all but one of the datasets. This suggests that the combined information found in the joint graph is useful to better represent the connections between concepts and hence helps to improve the overall disambiguation. We have conducted some additional experiments comparing the accuracy obtained using either the NLM-based graph or the joint graph, both built with the same number of documents, and the achieved results confirm this intuition: 75.42% accuracy for the joint graph against 69.12% for the NLM-based graph with 10,000 documents, 77.45% against 71.48% with 20,000 documents, and 77.78% against 72.93% with 30,000 documents.
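The general idea of ranking candidate concepts in a co-occurrence graph can be sketched with a personalized PageRank walk, a common choice for graph-based WSD; this is an illustration under that assumption, not the paper's exact algorithm, and every node label and edge below is invented.

```python
# Sketch of graph-based sense ranking: score each candidate concept
# (CUI) of an ambiguous term by personalized PageRank over a concept
# co-occurrence graph, seeded from the unambiguous concepts found in
# the context. All node labels and edges are invented for illustration.
def personalized_pagerank(edges, seeds, damping=0.85, iters=50):
    # Undirected adjacency list built from co-occurrence edges.
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in adj}
    rank = dict(restart)
    for _ in range(iters):
        new = {n: (1 - damping) * restart[n] for n in adj}
        for n in adj:
            share = damping * rank[n] / len(adj[n])
            for m in adj[n]:
                new[m] += share
        rank = new
    return rank

edges = [
    ("cui:cold_temp", "cui:weather"),
    ("cui:cold_temp", "cui:freezing"),
    ("cui:common_cold", "cui:virus"),
    ("cui:common_cold", "cui:fever"),
    ("cui:virus", "cui:fever"),
]

# "cold" in a context mentioning virus and fever -> infection sense.
rank = personalized_pagerank(edges, seeds={"cui:virus", "cui:fever"})
best = max(["cui:cold_temp", "cui:common_cold"],
           key=lambda c: rank.get(c, 0.0))
print(best)  # cui:common_cold
```

A candidate CUI that is missing from the graph simply scores zero, which mirrors the failure mode described above for target CUIs absent from the NLM-based graph.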
Word sense disambiguation (WSD) has long been recognized as an important field of natural language processing (NLP) and has been studied steadily since the advent of NLP in the 1950s. In spite of this long history of study, few WSD systems are used in practical NLP applications, unlike part-of-speech (POS) taggers and syntactic parsers. The reason is that most WSD studies have focused on only a small number of ambiguous words based on sense-tagged corpora. In other words, previous WSD systems disambiguate the senses of just a few words, and hence are not helpful for other NLP applications because of their low coverage.
Three parsers fulfilled all the requirements: Link Grammar (Sleator and Temperley, 1993), Minipar (Lin, 1993) and the parser of Carroll and Briscoe (2001). We installed the first two parsers and performed a set of small experiments (John Carroll helped out by running his own parser). Unfortunately, we did not have a comparative evaluation to help choose the best one. We performed a small comparative test, and all parsers looked similar. At this point we chose Minipar, mainly because it was fast, easy to install, and its output could be easily processed. The choice of parser did not condition the design of the experiments (cf. Section 7).
A detailed description of the tournament-model-based Japanese dependency parsing is found in (Iwatate et al., 2008). Iwatate's original parsing algorithm was designed for Japanese, a strictly head-final language. We adapt the algorithm to English in this shared task. The tournament model chooses the most likely candidate head of each focused word in a step-ladder tournament. For a given word, the algorithm repeatedly compares two candidate heads and finds the most plausible head over the series of matches in the tournament. In each comparison, the winner is chosen by an SVM binary classifier with a quadratic polynomial kernel. The model uses
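The step-ladder tournament itself is simple to state in code. In this minimal sketch the `prefer` comparator is a hypothetical stand-in for the SVM binary classifier described above; it is assumed to return the more plausible of two candidate heads for the focused word.

```python
# Minimal sketch of the step-ladder tournament: candidate heads are
# compared pairwise and the winner of each match advances; the winner
# of the final match is chosen as the head.
def tournament(word, candidates, prefer):
    winner = candidates[0]
    for challenger in candidates[1:]:
        # In the real model this match is decided by an SVM binary
        # classifier with a quadratic polynomial kernel.
        winner = prefer(word, winner, challenger)
    return winner

# Toy comparator, purely illustrative: prefer the candidate whose
# position is closer to the focused word.
def closer(word_idx, a, b):
    return a if abs(a - word_idx) <= abs(b - word_idx) else b

head = tournament(3, [0, 5, 2, 7], closer)
print(head)  # 2
```

The loop makes clear why the model needs only a binary classifier: selecting one head among n candidates reduces to n-1 pairwise decisions.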
Urdu is an Indo-Aryan language that inherits its vocabulary and grammatical forms from a range of languages including Arabic, Persian and South Asian languages. It is morphologically rich: some of its words (verbs and nouns) can have more than 40 forms, making it difficult to process automatically. Urdu is one of the important international languages, with more than 300 million speakers; 151 million are native speakers and the remainder second-language speakers. Urdu is the national language of Pakistan, where there are more than 11 million speakers. Other countries with a large number of Urdu speakers include India, Bangladesh, the USA, the UK, and Canada. It is also spoken globally due to the large South Asian diaspora. Despite its wide usage, Urdu is still a poorly resourced language for NLP, and efforts are being made to create Urdu computational resources.
The resulting extended WSD system performs systematically better than its unextended baseline counterpart, improving the score by about +3% for the worst extension and about +9% for the best one. This article uses WordNet as a dictionary, and evaluations are performed on two English all-words WSD tasks, because the majority of WSD research uses this language, making it easier to compare the performance and robustness of the method. However, the whole process of sense-embedding creation and Lesk extension can easily be adapted to many languages, requiring only a set of unannotated corpora and a typical dictionary, thus making it possible to create an efficient WSD system even for a poorly resourced language.
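For readers unfamiliar with the baseline being extended, a simplified Lesk scorer can be sketched in a few lines. The glosses below are paraphrased examples, not actual WordNet entries, and this sketch omits the sense-embedding comparison that the extended system adds on top.

```python
# Simplified Lesk: score each sense of a word by the word overlap
# between its dictionary gloss and the context, and pick the best.
def lesk(context_words, senses):
    # senses: mapping sense_id -> gloss string (toy glosses here).
    ctx = set(context_words)
    def overlap(gloss):
        return len(ctx & set(gloss.split()))
    return max(senses, key=lambda s: overlap(senses[s]))

senses = {
    "bank.n.01": "sloping land beside a body of water such as a river",
    "bank.n.02": "financial institution that accepts deposits",
}
best = lesk("he sat on the river bank near the water".split(), senses)
print(best)  # bank.n.01
```

Because the method needs only glosses and raw context, it carries over to any language with a typical dictionary, which is exactly the portability argument made above.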
Word Sense Disambiguation Using Statistical Models of Roget's Categories Trained on Large Corpora
Davi[.]
could refer to “Robert Redford.” We use the number of links connecting a particular string with a specific entity as a measure of the strength of association. Note that our dictionary spans not just named entities but also many general topics for which there are Wikipedia articles. Further, it transcends Wikipedia by including anchors (i) from the greater web; and (ii) to Wikipedia pages that may not (yet) exist. For the purposes of NED, it could make sense to discard all but the articles that correspond to named entities. We keep everything, however, since not all articles have a known entity type, and because we would like to construct a resource that is generally useful for disambiguating concepts. Our dictionary can disambiguate mentions directly, simply by returning the highest-scoring entry for a given string. The construction of this dictionary is explained in more detail in (Spitkovsky and Chang, 2012).
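The dictionary lookup described above amounts to a nested count table. This sketch is illustrative only: the anchor counts are invented, and real entries would be harvested from web and Wikipedia link anchors.

```python
# Sketch of the anchor-text dictionary: count how often each anchor
# string links to each target page, then disambiguate a mention by
# returning its highest-count (highest-scoring) entity.
from collections import defaultdict

anchor_counts = defaultdict(lambda: defaultdict(int))

def add_anchor(text, target):
    anchor_counts[text][target] += 1

# Hypothetical anchors harvested from links; counts are made up.
for target, n in [("Robert_Redford", 120), ("Redford,_Michigan", 7)]:
    for _ in range(n):
        add_anchor("Redford", target)

def disambiguate(mention):
    entities = anchor_counts.get(mention)
    if not entities:
        return None
    # Link frequency serves as the strength of association.
    return max(entities, key=entities.get)

resolved = disambiguate("Redford")
print(resolved)  # Robert_Redford
```

Keeping every target, not just named entities, costs nothing in this scheme: unknown or non-entity pages are just additional rows in the inner count table.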
Table 2 shows the disambiguation accuracy of the WSD systems for each of the datasets (Abbrev, NLM-WSD and MSH-WSD) and their combination (Combine). The results show that overall the supervised system obtains higher disambiguation accuracies than the unsupervised one, which is consistent with previous results, for example [4–7]. They also show that the accuracy on the Abbrev dataset is higher than on the MSH-WSD or NLM-WSD datasets. We believe this is because the Abbrev dataset contains only abbreviations, which have a more coarse-grained distinction between their senses. We also see this between the MSH-WSD and NLM-WSD datasets: NLM-WSD primarily contains terms, whereas, as mentioned above, MSH-WSD contains a mix of terms and abbreviations. This explains why the WSD systems obtain a higher disambiguation accuracy on the MSH-WSD dataset than on the NLM-WSD dataset.
ically acquired from corpora in a totally unsupervised way. Experimental results show that the use of Domain Models allows us to reduce the amount of training data, opening an interesting research direction for all those NLP tasks for which the Knowledge Acquisition Bottleneck is a crucial problem. In particular, we plan to apply the same methodology to Text Categorization, by exploiting the Domain Kernel to estimate the similarity among texts. In this implementation, our WSD system does not exploit syntactic information produced by a parser. In the future we plan to integrate such information by adding a tree kernel (i.e. a kernel function that evaluates the similarity among parse trees) to the kernel combination schema presented in this paper. Last but not least, we are going to apply our approach to developing supervised systems for all-words tasks, where the quantity of data available to train each word-expert classifier is very low.
A New Approach to Word Sense Disambiguation
Rebecca Bruce and Janyce Wiebe
The Computing Research Lab, New Mexico State Univers[.]
Error Driven Word Sense Disambiguation
Luca Dini and Vittorio Di Tomaso
Frédérique Segond
CELI Xe[.]
Our baseline consists of the predictions made by a majority class learner, which labels all examples with the predominant sense encountered in the training data. Note that the most frequent sense baseline is often difficult to surpass because many of the words exhibit a disproportionate usage of their main sense (i.e., higher than 90%), such as the noun bass or the verb approve. Although the majority vote learner provides us with a supervised baseline, it does not take into consideration any actual features of the instances. We therefore introduce a second, more informed baseline that relies on binary-weighted features extracted from the English view of the datasets, and we train a multinomial Naïve Bayes learner on this data. For every word included in our datasets, the binary-weighted Naïve Bayes learner achieves the same or higher accuracy as the most frequent sense baseline.
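The two baselines can be sketched side by side. The toy instances below are invented; in the real setting the binary-weighted features come from the English view of the datasets, while scikit-learn's `binary=True` option gives the 0/1 feature weighting described above.

```python
# Sketch of the two baselines: a most-frequent-sense learner versus a
# multinomial Naive Bayes over binary-weighted bag-of-words features.
from collections import Counter
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = [
    "caught a bass in the lake",
    "played the bass line loudly",
    "grilled bass for dinner",
    "tuned the bass guitar",
    "a bass swam near the shore",
]
senses = ["fish", "music", "fish", "music", "fish"]

# Most-frequent-sense baseline: ignores the features entirely.
mfs = Counter(senses).most_common(1)[0][0]

# binary=True clips word counts to 0/1: binary-weighted features.
vec = CountVectorizer(binary=True)
X = vec.fit_transform(texts)
nb = MultinomialNB().fit(X, senses)

pred = nb.predict(vec.transform(["caught a bass near the lake"]))[0]
print(mfs, pred)  # fish fish
```

On skewed words the two baselines often agree, which is why the informed baseline is only guaranteed to match or exceed, not beat, the most frequent sense.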
The sentences were automatically POS-tagged with the Ratnaparkhi tagger and parsed with the Collins parser. We extracted local contextual features as for WordNet sense-tagging and used the local features to train our WSD system on the coarse-grained sense-tagging task of automatically assigning PropBank frameset tags. We tested the effect of using only collocational features (“co”) for frameset tagging, as well as using only PropBank role features (“pb”) or only our original syntactic/semantic features (“synsem”) for this task, and found that the combination of collocational features with PropBank features worked best. The system has the worst performance on the word strike, which has a high number of framesets and a low number of training instances. Table 3 shows the performance of the system on different subsets of local features.
Several types of information are useful for WSD (Ide and Veronis 1998). Three major types are the grammatical characteristics of the polysemous word to be disambiguated, words that are syntactically related to the polysemous word, and words that are topically related to the polysemous word. Among these types, the use of grammatical characteristics, which are language-dependent, is not compatible with the approach using bilingual corpora. On the other hand, since a topical relation is language-independent, the use of topically related words is most compatible with the approach using bilingual corpora. Accordingly, we focused on using topically related words as clues for determining the most suitable sense of a polysemous word.