Appendix – Classifier Implementation Code

dataLoadAndProcess.py

import json
import re


def data_load(train_path, val_path, test_path):
    """
    Load the train, validation and test splits from JSON files.

    :param train_path: path to the training JSON file
    :param val_path: path to the validation JSON file
    :param test_path: path to the test JSON file
    :return train, val, test: the parsed JSON content of each split
    """
    # load raw data from the JSON files (json.load reads a file object;
    # json.loads would be needed for a string)
    with open(train_path) as train_data:
        train = json.load(train_data)
    with open(val_path) as validation_data:
        val = json.load(validation_data)
    with open(test_path) as test_data:
        test = json.load(test_data)
    return train, val, test


train, val, test = data_load('EmotionLines/EmotionPush/emotionpush_train.json',
                             'EmotionLines/EmotionPush/emotionpush_dev.json',
                             'EmotionLines/EmotionPush/emotionpush_test.json')

# extract emotion labels and utterances
x_train = []
y_train = []
x_val = []
y_val = []
x_test = []
y_test = []


def text_extraction(train, val, test):
    # each split is a list of dialogues; each dialogue is a list of
    # utterance dicts with 'utterance' and 'emotion' keys
    for ith_group in train:
        for j in ith_group:
            x_train.append(j['utterance'])
            y_train.append(j['emotion'])
    for ith_group in val:
        for j in ith_group:
            x_val.append(j['utterance'])
            y_val.append(j['emotion'])
    for ith_group in test:
        for j in ith_group:
            x_test.append(j['utterance'])
            y_test.append(j['emotion'])
    return x_train, y_train, x_val, y_val, x_test, y_test


x_train, y_train, x_val, y_val, x_test, y_test = text_extraction(train, val, test)
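The nested loops in text_extraction flatten a list of dialogues into two parallel lists, one utterance and one emotion label per turn. The sketch below reproduces that flattening on a tiny in-memory sample; the sample dialogues and the helper name flatten_dialogues are hypothetical and only mimic the 'utterance' and 'emotion' keys the appendix code reads.

```python
import json

# Hypothetical two-dialogue sample shaped like the JSON the loader reads:
# a list of dialogues, each a list of {'utterance': ..., 'emotion': ...} dicts.
SAMPLE = json.loads("""
[
  [{"utterance": "Hi there!", "emotion": "joy"},
   {"utterance": "Hello.", "emotion": "neutral"}],
  [{"utterance": "I lost my keys", "emotion": "sadness"}]
]
""")


def flatten_dialogues(dialogues):
    """Hypothetical helper: return (utterances, emotions) as two parallel lists."""
    utterances = [turn['utterance'] for dialogue in dialogues for turn in dialogue]
    emotions = [turn['emotion'] for dialogue in dialogues for turn in dialogue]
    return utterances, emotions


x, y = flatten_dialogues(SAMPLE)
print(x)  # ['Hi there!', 'Hello.', 'I lost my keys']
print(y)  # ['joy', 'neutral', 'sadness']
```

Dialogue boundaries are discarded, so every utterance is classified on its own, exactly as in the appendix code.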

def load_stopwords():
    """
    :return: list of stopwords read from the 'english' file (one word per line)
    """
    stopwords = []
    with open('english', 'r') as file:
        for line in file:
            if line.strip():  # skip blank lines
                stopwords.append(line.split()[0])
    return stopwords


# global variables
REPLACE_BY_SPACE_RE = re.compile(r'[/(){}\[\]\|@,;]')
BAD_SYMBOLS_RE = re.compile(r'[^0-9a-z #+_]')
STOPWORDS = set(load_stopwords())

# remove and replace special characters in the utterance
def text_prepare(text):
    text = text.lower()  # lowercase text
    text = REPLACE_BY_SPACE_RE.sub(' ', text)  # replace REPLACE_BY_SPACE_RE symbols by space in text
    text = BAD_SYMBOLS_RE.sub('', text)  # delete symbols which are in BAD_SYMBOLS_RE from text
    text = ' '.join([word for word in text.split() if word not in STOPWORDS])  # delete stopwords from text
    text = text.strip()
    return text
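To make the cleaning steps concrete, here is a minimal, self-contained run of the same operations (lowercasing, replacing separator symbols by spaces, dropping other symbols, then removing stopwords). The five-word stopword set is a hypothetical stand-in for the 'english' file, which is not reproduced here.

```python
import re

# Same patterns as in dataLoadAndProcess.py.
REPLACE_BY_SPACE_RE = re.compile(r'[/(){}\[\]\|@,;]')
BAD_SYMBOLS_RE = re.compile(r'[^0-9a-z #+_]')
# Hypothetical stand-in for the 'english' stopword file.
STOPWORDS = {'i', 'am', 'so', 'the', 'a'}


def text_prepare(text):
    text = text.lower()
    text = REPLACE_BY_SPACE_RE.sub(' ', text)  # '(', ')', ',' become spaces
    text = BAD_SYMBOLS_RE.sub('', text)        # '!' and ':' are dropped, '#' is kept
    return ' '.join(word for word in text.split() if word not in STOPWORDS).strip()


print(text_prepare("I am SO happy (really!) about the C# course, :)"))
# happy really about c# course
```

Keeping '#' and '+' in the allowed character set preserves tokens such as "c#" or "c++", while emoticons like ":)" disappear entirely.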

# prepare data
x_train = [text_prepare(x) for x in x_train]
x_val = [text_prepare(x) for x in x_val]
x_test = [text_prepare(x) for x in x_test]

# Dictionary of all emotions from the train corpus with their counts.
emotions_counts = dict()
for emotion in y_train:
    # y_train is the list of emotion labels, one per utterance
    if emotion in emotions_counts:
        emotions_counts[emotion] += 1
    else:
        emotions_counts[emotion] = 1

# Dictionary of all words from the train corpus with their counts.
words_counts = dict()
for item in x_train:
    # each item is a cleaned utterance string; split() without an argument
    # avoids counting empty strings when an utterance is empty after cleaning
    item_list = item.split()
    for word in item_list:
        if word in words_counts:
            words_counts[word] += 1
        else:
            words_counts[word] = 1

bagOfWords.py

from dataLoadAndProcess import words_counts, emotions_counts
from dataLoadAndProcess import x_train, x_val, x_test
from dataLoadAndProcess import y_train, y_val, y_test

from sklearn.preprocessing import LabelEncoder
import numpy as np
from scipy import sparse as sp_sparse
from sklearn.feature_extraction.text import TfidfVectorizer

DICT_SIZE = 700

# mapping from each word to its column index in the bag-of-words vector
WORDS_TO_INDEX = dict()

# # words_frequency = sorted(words_counts.items(), key=lambda x: x[1], reverse=True)[:DICT_SIZE]
# words_frequency = sorted(words_counts.items(), key=lambda x: x[1], reverse=True)
# index = 0
# for item in words_frequency:
#     WORDS_TO_INDEX[item[0]] = index
#     index += 1

def my_bag_of_words(text, words_to_index, dict_size):
    """
    text: a string
    words_to_index: dictionary mapping each word to its index in the vector
    dict_size: size of the dictionary

    return a vector which is a bag-of-words representation of 'text'
    """
    result_vector = np.zeros(dict_size)
    text = text.split()  # split the text string into a list of words
    for word in text:
        if word in words_to_index:
            result_vector[words_to_index[word]] += 1
    return result_vector
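The following sketch shows what my_bag_of_words produces and how the per-utterance vectors are stacked into one sparse matrix, as construct_bag_of_words does in application.py. The four-word dictionary and the two utterances are hypothetical.

```python
import numpy as np
from scipy import sparse as sp_sparse


def my_bag_of_words(text, words_to_index, dict_size):
    """Count how often each dictionary word occurs in a whitespace-tokenised text."""
    result_vector = np.zeros(dict_size)
    for word in text.split():
        if word in words_to_index:
            result_vector[words_to_index[word]] += 1
    return result_vector


# Hypothetical 4-word dictionary and two cleaned utterances.
words_to_index = {'happy': 0, 'sad': 1, 'really': 2, 'today': 3}
texts = ['really really happy', 'sad today not happy']

# One sparse row per utterance, stacked into an (n_texts x dict_size) matrix.
matrix = sp_sparse.vstack([sp_sparse.csr_matrix(my_bag_of_words(t, words_to_index, 4)) for t in texts])
print(matrix.toarray())
# [[1. 0. 2. 0.]
#  [1. 1. 0. 1.]]
```

Words outside the dictionary ("not" here) are silently ignored, so DICT_SIZE directly controls how much of each utterance the classifier can see.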

# x_train_mybag = sp_sparse.vstack([sp_sparse.csr_matrix(my_bag_of_words(text, WORDS_TO_INDEX, DICT_SIZE))
#                                   for text in x_train])
# x_val_mybag = sp_sparse.vstack([sp_sparse.csr_matrix(my_bag_of_words(text, WORDS_TO_INDEX, DICT_SIZE))
#                                 for text in x_val])
# x_test_mybag = sp_sparse.vstack([sp_sparse.csr_matrix(my_bag_of_words(text, WORDS_TO_INDEX, DICT_SIZE))
#                                  for text in x_test])

# encode the emotion labels as integers; the encoder is fitted on the
# training labels only and reused for the validation labels
le = LabelEncoder()
y_train = le.fit_transform(y_train)
y_val = le.transform(y_val)
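The validation labels are encoded with transform rather than a second fit_transform so that both splits share one label-to-integer mapping. LabelEncoder assigns integers in sorted label order, so refitting on a split that lacks one of the labels shifts the codes, as this sketch with hypothetical labels shows.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels; the validation split happens to contain no 'anger'.
y_train = ['joy', 'neutral', 'anger', 'joy']
y_val = ['neutral', 'joy']

le = LabelEncoder()
train_ids = le.fit_transform(y_train)  # classes_ are sorted: anger, joy, neutral
val_ids = le.transform(y_val)          # reuse the training mapping

print(le.classes_.tolist())  # ['anger', 'joy', 'neutral']
print(train_ids.tolist())    # [1, 2, 0, 1]
print(val_ids.tolist())      # [2, 1]

# Refitting on y_val alone gives a different, inconsistent mapping.
print(LabelEncoder().fit_transform(y_val).tolist())  # [1, 0]
```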

def tf_idf_features(X_train, X_val, X_test):
    """
    X_train, X_val, X_test: samples

    return TF-IDF vectorized representation of each sample and vocabulary
    """
    # Create a TF-IDF vectorizer over unigrams and bigrams, ignoring terms that occur
    # in more than 90% of the utterances or in fewer than 5 of them
    tfidf_vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_df=0.9, min_df=5, token_pattern=r'(\S+)')
    # Fit the vectorizer on the train set only, then transform the train, val and test sets
    X_train_tfidf = tfidf_vectorizer.fit_transform(X_train)
    X_val_tfidf = tfidf_vectorizer.transform(X_val)
    X_test_tfidf = tfidf_vectorizer.transform(X_test)
    return X_train_tfidf, X_val_tfidf, X_test_tfidf, tfidf_vectorizer.vocabulary_
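The effect of the vectorizer settings can be checked on a toy corpus. In this sketch the four utterances are hypothetical and min_df is lowered from 5 to 2 so that some terms survive; with ngram_range=(1, 2), a bigram such as 'happy today' enters the vocabulary only if it occurs in at least min_df training utterances, and the validation text is mapped onto the training vocabulary without refitting.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical cleaned utterances (min_df lowered to 2 for this tiny corpus).
train_texts = ['so happy today', 'happy today', 'sad today', 'very sad']
vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_df=0.9, min_df=2, token_pattern=r'(\S+)')
X_train = vectorizer.fit_transform(train_texts)
X_val = vectorizer.transform(['happy happy day'])  # 'day' is unknown and ignored

print(sorted(vectorizer.vocabulary_))  # ['happy', 'happy today', 'sad', 'today']
print(X_train.shape, X_val.shape)      # (4, 4) (1, 4)
```

'today' appears in 3 of 4 utterances, which is still below the max_df limit of 0.9 * 4 = 3.6, so it is kept.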

x_train_tfidf, x_val_tfidf, x_test_tfidf, tfidf_vocab = tf_idf_features(x_train, x_val, x_test)
tfidf_reversed_vocab = {i: word for word, i in tfidf_vocab.items()}

application.py

from bagOfWords import x_train_tfidf, x_val_tfidf
from bagOfWords import x_train, x_val, x_test
from bagOfWords import y_train, y_val
from bagOfWords import my_bag_of_words, WORDS_TO_INDEX, words_counts

from sklearn import multiclass
from sklearn import linear_model  # only needed by the commented-out alternative classifiers
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn import svm
from sklearn import metrics

from scipy import sparse as sp_sparse
import csv

def construct_bag_of_words(DICT_SIZE):
    # keep the DICT_SIZE most frequent training words and give each a column index
    words_frequency = sorted(words_counts.items(), key=lambda x: x[1], reverse=True)[:DICT_SIZE]
    WORDS_TO_INDEX.clear()  # rebuild the mapping for this dictionary size
    for index, item in enumerate(words_frequency):
        WORDS_TO_INDEX[item[0]] = index

    x_train_mybag = sp_sparse.vstack([sp_sparse.csr_matrix(my_bag_of_words(text, WORDS_TO_INDEX, DICT_SIZE))
                                      for text in x_train])
    x_val_mybag = sp_sparse.vstack([sp_sparse.csr_matrix(my_bag_of_words(text, WORDS_TO_INDEX, DICT_SIZE))
                                    for text in x_val])
    x_test_mybag = sp_sparse.vstack([sp_sparse.csr_matrix(my_bag_of_words(text, WORDS_TO_INDEX, DICT_SIZE))
                                     for text in x_test])
    return x_train_mybag, x_val_mybag, x_test_mybag
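construct_bag_of_words keeps the DICT_SIZE most frequent training words. The sketch below isolates that selection step, using a hypothetical helper top_k_index and hypothetical counts. Ties keep the insertion order of words_counts because Python's sort is stable (also with reverse=True), and a DICT_SIZE larger than the vocabulary simply leaves the remaining vector columns at zero. Because the order is deterministic, each larger size in the sweep extends the previous mapping without changing earlier indices.

```python
def top_k_index(words_counts, dict_size):
    """Hypothetical helper: map the dict_size most frequent words to indices 0..dict_size-1."""
    words_frequency = sorted(words_counts.items(), key=lambda x: x[1], reverse=True)[:dict_size]
    return {word: index for index, (word, _) in enumerate(words_frequency)}


counts = {'okay': 3, 'happy': 7, 'sad': 5, 'lol': 3}  # hypothetical word counts
small = top_k_index(counts, 2)
large = top_k_index(counts, 10)
print(small)  # {'happy': 0, 'sad': 1}
print(large)  # {'happy': 0, 'sad': 1, 'okay': 2, 'lol': 3}
```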

# Train with different classifiers
def train_classifier(x_train, y_train):
    """
    x_train, y_train: training data

    Create and fit a classifier wrapped into a multiclass strategy
    (one-vs-one for the active SVC; most commented-out alternatives use one-vs-rest).
    return: trained classifier
    """
    # LogisticRegression
    # model = multiclass.OneVsRestClassifier(linear_model.LogisticRegression(penalty='l2', C=1.0))
    # RidgeClassifier
    # model = multiclass.OneVsRestClassifier(linear_model.RidgeClassifier(alpha=1.0, fit_intercept=True, copy_X=True,
    #                                                                      max_iter=None, tol=0.001, class_weight=None,
    #                                                                      solver='auto', random_state=None))
    # LinearSVC
    # model = multiclass.OneVsRestClassifier(svm.LinearSVC(random_state=0))
    # SGDClassifier
    # model = multiclass.OneVsRestClassifier(linear_model.SGDClassifier(penalty='l2', alpha=0.0001, l1_ratio=0))
    # GaussianProcessClassifier
    # kernel = 1.0 * RBF(1.0)
    # model = multiclass.OneVsOneClassifier(GaussianProcessClassifier(kernel=kernel, random_state=0))
    # print(y_train.shape)
    # model.fit(x_train.toarray(), y_train)  # .toarray() is needed for GaussianProcessClassifier
    model = multiclass.OneVsOneClassifier(svm.SVC(gamma='auto'))
    model.fit(x_train, y_train)
    return model
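The active model wraps an RBF-kernel SVC in OneVsOneClassifier, which trains one binary classifier per pair of classes and predicts by voting; for k emotion labels that is k(k - 1)/2 estimators, and gamma='auto' sets the kernel coefficient to 1 / n_features. The toy sparse matrix below is hypothetical and only illustrates the mechanics.

```python
import numpy as np
from scipy import sparse as sp_sparse
from sklearn import multiclass, svm

# Hypothetical toy bag-of-words rows: 3 classes, 2 well-separated examples each.
X = sp_sparse.csr_matrix(np.array([
    [3, 0, 0], [2, 0, 0],
    [0, 3, 0], [0, 2, 0],
    [0, 0, 3], [0, 0, 2],
], dtype=float))
y = np.array([0, 0, 1, 1, 2, 2])

model = multiclass.OneVsOneClassifier(svm.SVC(gamma='auto'))
model.fit(X, y)  # sparse input is accepted, as with the bag-of-words matrices

print(len(model.estimators_))  # 3 pairwise classifiers: (0,1), (0,2), (1,2)
print(model.predict(X))        # [0 0 1 1 2 2]
```

With the eight EmotionPush emotion labels the same wrapper would train 28 pairwise SVCs, which is part of why the dictionary-size sweep is comparatively slow.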

def train(writer, DICT_SIZE):
    x_train_mybag, x_val_mybag, x_test_mybag = construct_bag_of_words(DICT_SIZE)
    classifier_mybag = train_classifier(x_train_mybag, y_train)
    y_val_predicted_labels_mybag = classifier_mybag.predict(x_val_mybag)
    # y_val_predicted_scores_mybag = classifier_mybag.decision_function(x_val_mybag)
    accuracy_score = metrics.accuracy_score(y_val, y_val_predicted_labels_mybag)
    f1_macro = metrics.f1_score(y_val, y_val_predicted_labels_mybag, average='macro')
    f1_micro = metrics.f1_score(y_val, y_val_predicted_labels_mybag, average='micro')
    f1_weighted = metrics.f1_score(y_val, y_val_predicted_labels_mybag, average='weighted')
    writer.writerow([DICT_SIZE, accuracy_score, f1_macro, f1_micro, f1_weighted])

    # classifier_tfidf = train_classifier(x_train_tfidf, y_train)
    # y_val_predicted_labels_tfidf = classifier_tfidf.predict(x_val_tfidf)
    # # y_val_predicted_scores_tfidf = classifier_tfidf.decision_function(x_val_tfidf)
    # accuracy_score = metrics.accuracy_score(y_val, y_val_predicted_labels_tfidf)
    # f1_macro = metrics.f1_score(y_val, y_val_predicted_labels_tfidf, average='macro')
    # f1_micro = metrics.f1_score(y_val, y_val_predicted_labels_tfidf, average='micro')
    # f1_weighted = metrics.f1_score(y_val, y_val_predicted_labels_tfidf, average='weighted')
    # writer.writerow([DICT_SIZE, accuracy_score, f1_macro, f1_micro, f1_weighted])
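The script records accuracy together with three F1 averages. On an imbalanced toy example (hypothetical labels, not EmotionPush results) they differ: macro F1 averages the per-class scores equally, weighted F1 weights them by class support, and micro F1 pools all decisions, which for single-label multiclass prediction equals accuracy.

```python
from sklearn import metrics

# Hypothetical labels: class 0 dominates and one class-1 utterance is misclassified.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1]

accuracy = metrics.accuracy_score(y_true, y_pred)
f1_macro = metrics.f1_score(y_true, y_pred, average='macro')
f1_micro = metrics.f1_score(y_true, y_pred, average='micro')
f1_weighted = metrics.f1_score(y_true, y_pred, average='weighted')

# Per-class F1: class 0 = 8/9, class 1 = 2/3.
print(f"accuracy={accuracy:.4f} macro={f1_macro:.4f} micro={f1_micro:.4f} weighted={f1_weighted:.4f}")
# accuracy=0.8333 macro=0.7778 micro=0.8333 weighted=0.8148
```

Macro F1 is the score most sensitive to errors on rare emotions, because the minority class counts as much as the majority class.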

# for basic bag of words
with open('results/emotionpush/bag-of-words-emotionpush-OvO-svc.csv', mode='w', newline='') as file:
    writer = csv.writer(file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    for size in range(100, 7100, 100):
        train(writer, size)

# # for extended BOW (TF-IDF)
# with open('results/emotionpush/bag-of-words-emotionpush-OvO-svc.csv', mode='w') as file:
#     writer = csv.writer(file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
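The results loop only works once a csv.writer is bound to the open file before train is called. The sketch below exercises the same range(100, 7100, 100) sweep against an in-memory buffer; fake_train is a hypothetical stand-in for train that writes placeholder zeros instead of real scores, and the header row is likewise an assumption, not part of the original script.

```python
import csv
import io


def fake_train(writer, dict_size):
    """Hypothetical stand-in for train(): writes one placeholder row per dictionary size."""
    writer.writerow([dict_size, 0.0, 0.0, 0.0, 0.0])


buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
writer.writerow(['dict_size', 'accuracy', 'f1_macro', 'f1_micro', 'f1_weighted'])  # assumed header
for size in range(100, 7100, 100):
    fake_train(writer, size)

rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(len(rows) - 1)            # 70 dictionary sizes
print(rows[1][0], rows[-1][0])  # 100 7000
```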
