SENTIMENTAL ANALYSIS FOR POSITIVE COMPUTING

(1)

202 | P a g e

SENTIMENTAL ANALYSIS FOR POSITIVE COMPUTING

Aalekh Srivastava

¹

, Abhigyan Singh

²

1,2Computer Science and Engineering, VIT University, (India)

ABSTRACT

Sentimental Analysis which is also known as opinion mining, is the use of natural language processing, text analysis, and computational linguistics to identify and extract subjective information. This information can be extracted from various sources like critic reviews, social media, surveys. In this paper we propose the use of sentiment analysis in collaboration with entity extraction, classification and categorisation, language identification and text tokenisation in different fields from surveys, healthcare, social media, opinion formation and data management.

Keywords – entity extraction, language identification, opinion mining, sentimental analysis, text tokenization

I. INTRODUCTION

Human beings are subjective creatures who have opinions on every other thing from materialistic view to real- life view. Being able to interact with people on that level has many advantages for information systems. This advantage can be leveraged through sentimental analysis and form strong opinions and categorises data basically in any field that consists of human opinions or reviews. Sentimental analysis is based upon Natural Language Processing(NLP) where a sentence or a phrase which consists of some sentiments is taken as input.

The input is processed by using some algorithm and the result is given as whether the phrase or sentence has a positive, neutral or negative sentiment. Each sentiment topic is also assigned a specific score based on some criteria and the final score is given.

II. LITERATURE REVIEW

Sentimental analysis finds application in many areas.The majority of the applications where it is used are in opinion mining, reviewing surveys and social media. Many algorithms have been developed in order to understand opinion mining with sentimental analysis. An algorithm has been proposed for identifying the polarity of remarks given on something. The algorithm is applied on every remark given to identify its polarity.

Each remark gets an opinion value by the algorithm and if the opinion is high then the remark is taken to be positive. Lower opinion values represent negative remarks.

(2)

203 | P a g e Sentimental analysis is highly used in social media especially twitter. Generally, the sentiments are analysed by corpus based or dictionary based techniques. However, there has also been one technique proposed which makes use of both the above mentioned techniques to find the semantic orientation of the sentiments words in tweets.

This algorithmis based upon two important parts namely pre-processing of data and classification.

The algorithm comprises of the following steps as given below:

1. Retrieval of tweets

The tweets are searched on particular topic and all the concerned web pages are downloaded. Then it is extracted in the form of text line using a data mining tool.

2. Pre-processing of extracted data

The tweets extracted are raw data and using sentimental analysis does not give good quality results. So these tweets must go through pre-processing processes for obtaining best results. There are several techniques that can be used for carrying out pre-processing tasks,

i. filtering, ii. tokenisation,

iii. removal of stop words (preposition and articles).

3. Parallel processing

This includes classification and clustering of data for analysis purposes.

4. Assigning scores to the sentiments

The dictionary is used in which words are assigned a score ranging between 1(negative) and 3(positive). This determines the score of each sentiments in the data.

5. Output

Final result is given as that it tells if the sentiment is positive, neutral or negative.

In medical fields, sentimental analysis has much utility. The hospitals and nursing homes can take reviews of their hospitals, staff members, doctors, and nurses. These remarks and comments can be applied over sentimental analysis and used to find that if the overall feedback is positive of various areas in the medical field.

By this, they can improvise their facilities, and manage better staff, and doctors. This will not only help evolving the hospitals in different ways but the patients will also be benefitted by the facilities.

(3)

204 | P a g e Using sentimental analysis, the shopping websites can expand their business in a much better way. They can use the customers' feedback of different products, sellers, and also their shopping experience and apply these feedbacks over sentimental analysis. They can find out which of their products are good. Moreover, the website admin can also use the feedbacks to rate their website or app and innovate new strategies to improve it in order to achieve customer satisfaction in more advanced ways.

III. EXISTING METHODS AND ALGORITHMS

1.1 Semantic Orientation with PMI

Semantic Orientation(SO) (Hatzivassiloglou and McKeown, 2002) is basically a measure or score given to a positive or negative sentiment which is expressed by a word or phrase. The work by Turney (2002) is very effective which provides with the scoring of selected text and also it can be used with multiple word phrases.

The phrases which are derived from the given text, are assigned SO values. This SO is determined based upon the phrase's pointwise mutual information (PMI) with the words "excellent" and "poor". PMI is defined as follows :

Semantic Orientation(SO) (Hatzivassiloglou and McKeown, 2002) is basically a measure or score given to a positive or negative sentiment which is expressed by a word or phrase. The work by Turney (2002) is very effective which provides with the scoring of selected text and also it can be used with multiple word phrases.

The phrases which are derived from the given text, are assigned SO values. This SO is determined based upon the phrase's pointwise mutual information (PMI) with the words "excellent" and "poor". PMI is defined as follows :

p(w1 and w2) is the probability that w1 and w2 co-occur.

The SO for the phrase is the difference between its PMI with the word "excellent" and its PMI with the word

"poor". Then the probabilities are calculated by querying the AltaVista Advanced Search engine for counts. The search engine's "NEAR" operator, which represents the occurrences of the two queried words within ten words of each other in a text, is used to define co-occurrence. The final SO equation is :

This yields values above zero for phrases with greater PMI with the word "excellent" and below zero for greater PMI with "poor". A SO value of zero depicts a completely neutral semantic orientation.

1.2 Osgood semantic differentiation with WordNet

Kamps and Marx (2002) derived a method of using WordNet relationships to derive three values pertinent to the emotive meaning of adjectives. These three values correspond to potency(strong or weak), activity(active or

(4)

205 | P a g e passive) and evaluative(good or bad), which were introduced in Charles Osgood's Theory of Semantic Differentiation.

These values are calculated by measuring the relative minimal path length in WordNet between the adjective in question and the pair of words appropriate for the given factor.

1.3 Support Vector Machines

SVMs are a machine learning classification technique which use a function called a kernel to map a space of data points in which data is not linearly separable onto a new space in which it is, with allowances for erroneous classification.

IV. EXISTING METHODS RESULTS

The accuracy value represents the percentage of test texts which were classified correctly by the model. The results for 3-fold cross validation show how the present feature sets compare with the best performing SVM reported in Pang et al

(5)

206 | P a g e Hybrid Models consisting of SVMs, Turney, Osgood show the best accuracy percentage.

V. PROPOSED METHOD

Sentimental Analysis may be used with the data clustering algorithms in data mining. The output words of the pre-processed texts can be used to form a data set. In the data set, the words will be divided into two classes, positive and negative. The words will be allocated a score of zero or one where zero represents a negative sentiment word and one represents a positive sentiment word.

The data set would be applied over a clustering algorithm, say k-means clustering. The results of the clustering will have two clusters of different colours representing the positive and negative words. Positive words will be in one cluster and the negative will constitute the other cluster. The centroid of the two clusters can be found by the formula,

Now the input word from the users will be taken for sentimental analysis. The word will be searched between the twoclusters and the euclidean distance of the word will be found from the centroid of both clusters. The euclidean distance can be found byn bnbjlsbwkjldmer,

The euclidean distance from the centroid of negative cluster should be subtracted from that of positive centroid cluster. The final difference should be multiplied with „minus one‟ and scaled to the range of -1 to 1 and this would be the final score allotted to the word. If the final score is negative then it will definitely reflect a negative sentiment and vice-versa. Moreover, if there are more than one word in the text to be processed for the analysis, then the same method will be applied to each of those but the individual final score of all the words should be added up to give the final result.

(6)

207 | P a g e

VI. RESULTS

Input Text : I love apples

Input Text : I hate banana

(7)

208 | P a g e Input Text : I love dogs but hate cats

VII.

CONCLUSION

Sentimental analysis is one of the most interesting and demanding area of research in present scenario. Due to its' application in various fields, its' the need of the developing world to increase economy and businesses. There are many proposed systems and methodology by which sentiments can be analysed effectively. This paper presents various fields where sentimental analysis can be applied and also presents some of the best methods and algorithms used for the same.This paper also proposes a completely new method to process the words for sentimental analysis.The method performs sentimental analysis with the help of data clustering. It provides a score to the words which tell whether it is positive or negative and the magnitude of score describes the degree to which the input word is positive or negative.

REFERENCES

[1] Varsha Sahayak, Vijaya Shete, and Apashabi Pathan "Sentiment Analysis on Twitter Data" - International Journal of Innovative Research in Advanced Engineering (IJIRAE) Issue 1, Volume 2 (January 2015) [2] Shai Shalev-Shwartz, Yoram Singer, Nathan, Srebro, Andrew Cotter “Pegasos: Primal Estimated sub-

GrAdient SOlver for SVM”, 2000.

[3] Tony Mullen and Nigel Collier, "Sentiment analysis using support vector machines with diverse information sources”

[4] Gaurangi Patil, Varsha Galande, Vedant Kekan and Kalpana Dange, "Sentiment Analysis Using Support Vector Machine”

[5] Keke Cai, Scott Spangler, Ying Chen and Li Zhang, "Leveraging Sentiment Analysis for Topic Detection", 2008 IEEE/WIC/ACM.

(8)

209 | P a g e [6] Deepali Virmani, Vikrant Malhotra and Ridhi Tyagi, "Sentiment Analysis Using Collaborated Opinion

Mining”

[7] Raisa Varghese and Jayasree M, "A Survey on sentiment analysis and opinion mining", IJRET