Word Vector Representations to build an Emojifier using LSTM


1st Siddharth Bansal

Student of B-Tech Computer Science and Engineering, SRM Institute of Science and Technology

Kattankulathur, Tamil Nadu, India

Abstract—This paper uses word vectors to build a model that takes a sentence as input and finds the most appropriate emoji to use with it. The experiment applies the word vector concept from the deep learning domain of computer science. By using word vectors, the algorithm can generalize and associate words in the test set with the same emoji even if those words do not appear in the training set. This allows us to build an accurate classifier mapping sentences to emojis, even with a small training set. The experiment incorporates an LSTM in Keras, which makes the model more sophisticated and takes word ordering into account. LSTM has been studied extensively since it was introduced by Sepp Hochreiter and Jürgen Schmidhuber in their 1997 paper [1], and studies of its uses and advantages over other frameworks are still being carried out. I have performed an experiment to show the effects of LSTM over a long range in time.

Index Terms—sentiment Analysis, Neural Network, word vector, LSTM

I. INTRODUCTION

The present experiment, an emojifier using word vectors, performs sentiment analysis on any input given by the user and computationally produces the appropriate emoji. It aims to help organizations that use virtual assistance on their websites to build a deeper connection with their end customers.

Most modern organizations use a chatbot service to interact with their end customers. This experiment allows a chatbot to also produce the image or emoji that matches the feelings expressed in the input sentence. The chatbot could then have a deeper impact, and end customers would be less likely to feel that they are chatting with a robot.

Sentiment analysis has been defined as the process of determining whether the feelings, or sentiments, expressed in a sentence are good, bad or neutral. [2]

Many organizations use sentiment analysis to develop their strategies for products or brands, to understand how people react to those strategies or product launches, and to learn why consumers do not buy certain products, by collecting their opinions and reviews. [2]

2nd A. M. J. Muthu Kumaran

Assistant Professor (S.G), Computer Science and Engineering, SRM Institute of Science and Technology

Kattankulathur, Tamil Nadu, India

The basic objective of this experiment is to build an accurate classifier mapping sentences to emojis, even with a small training set.

This paper proposes using the LSTM and dropout layers in Keras to achieve better accuracy, and to determine which emojis are easier to predict. The remainder of the paper covers related work in the field, the experiment conducted, and lastly the conclusion, with acknowledgments and references.

II. RELATED WORK

A. Emojis


Since their debut on Twitter and Instagram, emojis have quickly expanded their territory from emotions to various objects (sports, foods, etc.). [3] There have been many developments in understanding emojis in general and their relation to the words in sentences. Spencer Cappallo and Stacey Svetlichnaya, in their paper [4], have discussed which emojis are hard for a computer to predict.


Fig. 2. LSTM for a Single Cell

Fig. 3. Equations of LSTM

B. Word Embeddings

In the word representation part, each input word of the sentence is transformed into a vector by looking up its word embedding. Keras has a built-in Embedding layer that maps positive integers, each corresponding to a word, to vectors. In this experiment we have used pre-trained embeddings.

C. LSTM

LSTM stands for long short-term memory. It was introduced by Sepp Hochreiter and Jürgen Schmidhuber in their paper "Long Short-Term Memory" [1]. The paper observes that "learning to store information over extended time intervals via recurrent backpropagation takes a very long time, mostly due to insufficient, decaying error back flow". An LSTM cell has three gates, namely an input gate, an output gate and a forget gate. Each gate has its own equation, as shown in Fig. 3. In this paper we have not computed those equations by hand but have used the Keras LSTM layer, which takes the input vectors and the hidden layer dimension, which is 128, and returns a batch of sequences.
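For reference, the gate equations of an LSTM cell are commonly written as below. This is the usual textbook formulation, not a transcription of Fig. 3. The symbols (weight matrices W, biases b, the sigmoid σ, the element-wise product ⊙ and the candidate state c̃) are notation assumed here for illustration.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

Here x_t is the embedding of the word at position t, and h_t is the hidden state, whose dimension is 128 in this experiment.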

D. Dropout

Dropout is a popular regularisation technique for training deep networks [5]. Here it is used to reduce overfitting in the LSTM layers. Dropout prevents co-adaptation of the hidden units inside the neural network by making the presence of the other units unreliable. In this experiment we have used the Keras Dropout layer to randomly drop hidden units from the network, with a dropout probability of 0.5.
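The effect of dropout on a vector of hidden activations can be sketched in plain Python. This is a minimal illustration of the common "inverted dropout" scheme, not the Keras implementation. The function name `dropout` and the sample activations are hypothetical.

```python
import random

def dropout(activations, rate=0.5, rng=None):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    rng = rng or random.Random()
    keep = 1.0 - rate
    # Survivors are divided by `keep` so the expected activation is unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

# Made-up hidden activations; a seeded generator keeps the run repeatable.
hidden = [0.2, -1.3, 0.7, 0.05, 1.1, -0.4]
dropped = dropout(hidden, rate=0.5, rng=random.Random(0))
```

With a rate of 0.5, each surviving unit is doubled and the others are set to zero. Dropout is applied only while training and is switched off at prediction time.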

E. Chatting bots

Chatting autobots, also referred to as chatbots, are used by most organizations today to connect with their customers, take feedback about their products, register issues when they arise, and route those issues to senior developers in the company. Chatbots have been automated to generate back channel responses to the queries registered by the user. Recently, Yoko Nishihara, Ikuta and Masaki [6] studied how to involve chatbots in discussions. I believe chatbots can be extended with appropriate emojis to interact better with the participants in those discussions.

III. EXPERIMENT

A. Overview

This experiment has two versions. The first version does not use an LSTM or any recurrent neural network. The input of the model is a string corresponding to a sentence (e.g. "I love you"). In the code, the output is a probability vector of shape (1, 5), which is then passed to an argmax step to extract the index of the most likely emoji.

In version 2 of the experiment, I use word embeddings, pass them to LSTM and dropout layers, and predict the emoji output.

At the end, we plot the confusion matrix to compare the results and draw a final conclusion.
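The argmax step on the (1, 5) probability vector can be sketched as follows. The label names and their order in `EMOJI_LABELS` are hypothetical, since this section does not list the five classes, and the probabilities are made up.

```python
# Hypothetical label set and order, assumed for illustration only.
EMOJI_LABELS = ["heart", "ball", "smile", "sad", "food"]

def predict_emoji(prob_row, labels=EMOJI_LABELS):
    """Return the index and label of the largest probability (argmax)."""
    best = max(range(len(prob_row)), key=prob_row.__getitem__)
    return best, labels[best]

# Shape (1, 5): one sentence, five emoji classes.
probs = [[0.70, 0.05, 0.15, 0.05, 0.05]]
index, label = predict_emoji(probs[0])
```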

B. Data Set

I have used two sets of comma-separated values to train and test the model. The first set, X, contains 127 strings for training, and the other set contains 56 examples for testing. We also use pre-trained 50-dimensional GloVe vectors, downloaded from the open-source Stanford NLP website. Both versions of the experiment use the same dataset, so that the results can be compared directly. There is a package in Python which allows us to access all the emojis in the

C. Emojifier Version 1

Fig. 4. Equations to train our model

2) Conversion of words to vectors using Word2Vec: Word2vec is a strategy that includes the Continuous Bag-of-Words (CBOW) model and the Skip-gram model, which use hierarchical softmax and negative sampling to give each word its vector of values. These values are then used to predict the appropriate emoji by averaging the vectors of the words in the sentence.
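The averaging step can be sketched in a few lines of Python. The function name `sentence_to_avg` and the 3-dimensional toy vectors are assumptions made for illustration. The experiment itself uses 50-dimensional pre-trained vectors.

```python
def sentence_to_avg(sentence, word_to_vec):
    """Average the embedding vectors of the known words in a sentence."""
    words = sentence.lower().split()
    vectors = [word_to_vec[w] for w in words if w in word_to_vec]
    if not vectors:
        raise ValueError("no known words in sentence")
    dim = len(vectors[0])
    # Component-wise mean over all word vectors.
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]

# Hypothetical toy embeddings, not real pre-trained values.
toy_vectors = {
    "i": [0.0, 0.0, 3.0],
    "love": [3.0, 0.0, 0.0],
    "you": [0.0, 3.0, 0.0],
}
avg = sentence_to_avg("I love you", toy_vectors)
```

The resulting vector has the same dimension as a single word vector, whatever the sentence length, so it can be fed directly to a fixed-size classifier.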

3) Training the model: We create a function that trains the model, taking the learning rate and the number of iterations as arguments, and evaluates the equations in Fig. 4.

The first equation is quite similar to that of logistic regression, which is logical since we are trying to predict an outcome. Softmax is a function that takes as input a vector of K real numbers and normalizes it into a probability distribution consisting of K probabilities. [7] Finally, we compute the loss function, in which "Yoh" refers to the one-hot vector of the label.
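The equations of Fig. 4 are not reproduced in the text, so the sketch below assumes the usual form for this kind of model: a linear layer z = W · avg + b, a softmax activation, and a cross-entropy loss against the one-hot label "Yoh". The names `W`, `b`, `forward` and `cross_entropy`, and all numbers, are hypothetical.

```python
import math

def softmax(z):
    """Normalise K real numbers into K probabilities that sum to 1."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def forward(avg, W, b):
    """z = W . avg + b followed by softmax; W has one row per emoji class."""
    z = [sum(w_k * x_k for w_k, x_k in zip(row, avg)) + b_i
         for row, b_i in zip(W, b)]
    return softmax(z)

def cross_entropy(probs, y_oh):
    """Loss = -sum_k Yoh[k] * log(a[k]) for a one-hot label vector."""
    return -sum(y * math.log(p) for y, p in zip(y_oh, probs) if y)

# Toy example with 2 features and 3 classes (all values are made up).
W = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
b = [0.0, 0.0, 0.0]
probs = forward([2.0, 0.0], W, b)
loss = cross_entropy(probs, [1, 0, 0])
```

Training would then lower this loss by gradient descent on W and b for the chosen learning rate and number of iterations.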

4) Testing: This module predicts an appropriate emoji and reports the accuracy. I call a predict function that takes any sentence from the test data set and predicts the appropriate emoji.

5) Conclusion: This version of the emojifier was not a bad prediction model, giving an accuracy of 80-90 percent, but it loses its value when predicting the negative sentiments of input sentences. From the experiment we can conclude that this algorithm does not understand word ordering, so it is not good at understanding phrases like "not happy".
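The reason an averaging model ignores word order can be checked directly: the mean of a set of vectors does not depend on the order in which they are added. The sketch below uses hypothetical 2-dimensional vectors and a hypothetical helper `average_vector`.

```python
def average_vector(sentence, word_to_vec):
    """Component-wise mean of the word vectors in a sentence."""
    vectors = [word_to_vec[w] for w in sentence.lower().split()]
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# Made-up embeddings for illustration only.
toy_vectors = {
    "i": [0.0, 1.0],
    "am": [1.0, 0.0],
    "not": [-1.0, 0.5],
    "happy": [2.0, 1.5],
}
original = average_vector("I am not happy", toy_vectors)
shuffled = average_vector("happy I am not", toy_vectors)
```

Both sentences yield the same averaged vector, so the classifier cannot tell which word "not" applies to.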

D. Emojifier Version 2

1) Overview: This version of the Emojifier uses version 1 as its baseline model for predicting the appropriate emoji. It uses an LSTM to take word ordering into account, which version 1 could not do. The model uses Keras modules for the embedding, LSTM and dropout layers. This version of the experiment also has 3 modules:

a) Embedding layer

b) Building Emojify

c) Testing

Fig. 5. test results with the first version

Fig. 6. The architecture diagram of the emojifier version 2

E. Experiment Architecture

1) Embedding layer: This function maps the words in a sentence to their pre-trained indices, which collectively form a vector. Keras offers an Embedding layer that can be used for sentiment analysis on text data. To prevent redundancy in the vectors, it requires that the input data be integer encoded, so that each word is represented by a unique integer. The function takes a sentence of words as input and assigns an index to each word. In this experiment we do not train the parameters of the embedding layer, so the values of the word embeddings are not updated.
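The integer encoding and the frozen lookup can be sketched without Keras. The helper names `sentences_to_indices` and `embed`, the tiny vocabulary, and the choice of index 0 for padding are assumptions made for illustration.

```python
def sentences_to_indices(sentences, word_to_index, max_len=10):
    """Turn each sentence into a fixed-length list of word indices, zero-padded."""
    batch = []
    for sentence in sentences:
        indices = [word_to_index[w] for w in sentence.lower().split()][:max_len]
        batch.append(indices + [0] * (max_len - len(indices)))
    return batch

def embed(indices, emb_matrix):
    """Frozen embedding lookup: each index selects one row of a fixed matrix."""
    return [emb_matrix[i] for i in indices]

# Hypothetical vocabulary; index 0 is reserved for padding.
word_to_index = {"i": 1, "love": 2, "you": 3}
emb_matrix = [[0.0, 0.0], [0.1, 0.2], [0.9, 0.4], [0.3, 0.8]]

batch = sentences_to_indices(["I love you", "love"], word_to_index, max_len=5)
vectors = embed(batch[0], emb_matrix)
```

Because the matrix is only read and never updated, the lookup behaves like an embedding layer whose parameters are not trained. A word missing from the vocabulary raises a KeyError in this sketch.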


2) Building the Emojifier version 2: In this module we create the emojifier model, which takes the input shape, the word-to-vector map and the word indices. The maximum length of any sentence is fixed at 10. In this function I have used the Keras sequence models for ease of implementing the layers used in this experiment, such as Embedding, LSTM and Dropout. Softmax is a function that takes as input a vector of K real numbers and normalizes it into a probability distribution consisting of K probabilities [7]. The function returns a model instance in Keras. In this module we also train the model by fitting the Keras model on mini-batches, which trains faster, and compute the loss at each iteration.

3) Testing: This module predicts appropriate emojis for a given sentence. After training the model, we test it on the test data set and, more importantly, check the sentiments of the sentences. It also displays the accuracy of the experiment. In this experiment the accuracy does not remain constant; it varies from 75 to 90%.

Fig. 8. Test results with emojifier version 2
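Accuracy and the confusion matrix used to compare the two versions can be computed as in the sketch below. The label lists are made up for illustration and are not results from the experiment. The function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, num_classes=5):
    """Rows are actual classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix

def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted class equals the actual class."""
    correct = sum(1 for a, p in zip(y_true, y_pred) if a == p)
    return correct / len(y_true)

# Made-up class indices, not data from the paper.
y_true = [0, 2, 3, 3, 4, 2]
y_pred = [0, 2, 2, 3, 4, 2]
cm = confusion_matrix(y_true, y_pred)
acc = accuracy(y_true, y_pred)
```

Off-diagonal entries show which emoji classes are confused with one another, which a single accuracy figure cannot reveal.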

F. Real time application


This theory was put to a real-time test by creating a command line chatbot in Python, using a dataset of sentences ranging from humor to emotions. The back channel responses generated were quite random, but the emojis at the end of the responses were appropriate, as expected, with an accuracy of 75-85%. This showed that good results are possible even with a small training set: for example, "feeling sad" shows a negative sentiment and "love" shows a heart sentiment. Because the chatbot's responses were quite random, they sometimes did not make sense for the query being asked. However, my prototype has only one purpose, to give appropriate emojis, and with a predicted accuracy of 75-85% it did show a fair sense of the correct emoji for each sentence.

Fig. 9. Chatbot back channel responses with appropriate emojis

IV. CONCLUSION

Emojifier version 2 incorporates an LSTM, which takes word ordering into account. This experiment achieved 75-90% accuracy in predicting the appropriate emojis. Compared with version 1 of the experiment, this version keeps earlier words in memory to refer to later in the sentence, so phrases like "not happy" can also be given the right emoji. The current model still is not very robust at understanding negation, but with more training examples or a different data set, this LSTM model would show better results.

When the command line chatbot was tested on several examples, the generated back channel responses were not very accurate, but the emojis attached to them were closely related to the responses that were generated. An accuracy of 75-90% was maintained even in real time.

The real-time test showed great results for the predicted emojis. It also showed that, in sentences with sentiments, the smiley emoji has a better accuracy rate than the other tested emojis, such as the heart and sad emojis.

The prototype also worked for non-sentimental sentences such as definitions and food-related queries, and it served the purpose for which it was created. Even though this is just a prototype used for short sentences, it shows great promise and many areas for research and issues to work on.


Murugan, who not only reviewed my prototype in general but also commented that it would be very helpful in day-to-day life. They used the chatbot and found it very innovative, and awarded me the best thesis award on the day.

V. FUTURE WORK

This project was built on a very small training data set and is therefore not robust. I believe this experiment has many future applications and can be deployed in any of them, for instance virtual chat assistance used by organizations to interact with their clients.

On the research side, it also shows great promise for removing the bias component in a sentence and for learning to respect the emotions of the people that a real-time application such as a chatbot interacts with. The time and space complexity can also be improved at minimal cost to the data used.

The main next step would be to test the prototype with a larger number of emojis in the training data set. The training set is undoubtedly key to any modern machine learning algorithm, but I believe that if this chatbot prototype can learn from the users it interacts with, its accuracy for those users can improve.

The one thing I would love to work on is the ability to learn a new emoji. If an organization wants its logo to become an emoji that the chatbot uses often while interacting with clients, that could be quite an achievement.

FIGURE REFERENCES

Fig. 1: Spencer Cappallo et al. New Modality: Emoji Challenges in Prediction, Anticipation, and Retrieval. In: IEEE Transactions on Multimedia 21.2 (Feb. 2019), pp. 402-415.

Fig. 2: https://medium.com/@shiyan/materials-to-understand-lstm-34387d6454c1

Fig. 3: https://i.stack.imgur.com/7cRVr.png

Fig. 4: https://github.com/Kulbear/deep-learning-coursera/tree/master/Sequence%20Models/images

Fig. 6: https://github.com/Kulbear/deep-learning-coursera/tree/master/Sequence%20Models/images

Fig. 7: https://github.com/Kulbear/deep-learning-coursera/tree/master/Sequence%20Models/images

ACKNOWLEDGMENT

It was my great honour to present my thesis on a real-life application of the emojifier in chatbots, automated chatting machines that interact with users.

I would like to express my special thanks and gratitude to everyone who helped me throughout the working and making of this project. I would like to express my gratitude to Mr. T.R Pachamuthu for allowing me to be a part of SRM Institute of Science and Technology (SRM IST), which gave me access to many articles and journals by authors from all parts of the world, through different websites, for my research.

I would also like to thank the entire department of Computer Science and Engineering (CSE) for their constant support and expert knowledge of the subject, as well as their help in publishing the paper in journals. I would like to express my gratitude to our CSE Head of Department (HOD), Dr. B. Amutha, for assigning me a project mentor and a panel to review my project, and for asking them to share their precious time for this project.

I would also like to express my gratitude to my class in-charge, Mrs. R Radha, for her constant support throughout the project. My sincere thanks also go to the CSE coordinator, Dr. Annapurani Panaiyappan .K, for giving me the chance to present my paper in the international conference of Internet of Things (ICIOT 2019) held at SRM IST, and for awarding me the best paper award in that conference.

I would like to thank the entire panel, Dr. B. Muruganantham, Dr. S Ganesh, my project mentor Dr. A.M.J Muthukumaran and the panel head Dr. A. Murugan, for their constant support and guidance towards the completion of my project. They have given me a great deal of valuable expert knowledge on the subject, which has undoubtedly raised my project to the highest level possible.

Last but not least, I would like to thank the educational platform Coursera for giving me access to video lectures by professors around the world, and the IEEE for giving access to research papers from authors all around the world.

REFERENCES

[1] S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory," Neural Computation, vol. 9, pp. 1735-1780, 1997.
