In this chapter, we have introduced some techniques based on Markov theory to perform both sentiment classification and transfer learning in cross-domain tasks. The basic algorithm builds a Markov chain transition matrix whose states represent either terms or classes, and whose transitions depend on term co-occurrences in documents. A first variant works at the granularity of sentences rather than treating each document as a whole. A second modification affects the classification process, so that it can be driven by polarity-bearing terms.
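The construction summarised above can be illustrated with a minimal sketch. It is not the exact algorithm of this chapter: the function names, the absorbing class states, the self-loops for isolated terms, the uniform starting distribution and the fixed number of walk steps are all assumptions made for illustration. Passing sentences instead of whole documents as `units` mimics the sentence-level variant.

```python
import numpy as np

def build_transition_matrix(units, labels, vocab, classes):
    """Hypothetical sketch of a term/class Markov chain.

    units  -- token lists: whole documents, or sentences for the
              sentence-level variant
    labels -- class of each unit, or None for unlabelled (target) units
    """
    states = list(vocab) + list(classes)
    index = {s: i for i, s in enumerate(states)}
    counts = np.zeros((len(states), len(states)))
    for tokens, label in zip(units, labels):
        terms = sorted(set(tokens) & set(vocab))
        for a in terms:
            for b in terms:
                if a != b:
                    counts[index[a], index[b]] += 1.0  # term-term co-occurrence
            if label is not None:
                counts[index[a], index[label]] += 1.0  # term-class link
    for c in classes:
        counts[index[c], index[c]] = 1.0  # class states are absorbing (assumption)
    isolated = np.where(counts.sum(axis=1) == 0)[0]
    counts[isolated, isolated] = 1.0  # self-loop for terms with no links
    P = counts / counts.sum(axis=1, keepdims=True)  # row-stochastic matrix
    return P, index

def classify(tokens, P, index, classes, steps=2):
    """Start uniformly on the unit's known terms, walk `steps` transitions,
    and return the class state that collects the most probability mass."""
    term_ids = [index[t] for t in set(tokens) if t in index and t not in classes]
    if not term_ids:
        return None
    x = np.zeros(P.shape[0])
    x[term_ids] = 1.0 / len(term_ids)
    for _ in range(steps):
        x = x @ P
    return max(classes, key=lambda c: x[index[c]])

# Toy example: two labelled source units and one unlabelled target unit.
units = [["great", "fun", "plot"], ["awful", "boring", "plot"], ["great", "acting"]]
labels = ["pos", "neg", None]
vocab = ["great", "fun", "plot", "awful", "boring", "acting"]
classes = ["pos", "neg"]
P, index = build_transition_matrix(units, labels, vocab, classes)
print(classify(["fun", "acting"], P, index, classes))  # pos
```

In the toy example, "acting" never occurs in a labelled unit, yet the walk still reaches the `pos` state through its co-occurrence with "great", which is the kind of bridge that lets knowledge flow from labelled source data to unlabelled target data.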
Experiments on a common benchmark corpus show that the proposed algorithm achieves performance comparable to the state of the art in both in-domain and cross-domain sentiment classification. Also, the introduced techniques require fewer parameters to be tuned. Finally, although their computational complexity is comparable and grows quadratically with the number of features, they require far fewer terms to achieve good accuracy. These last two characteristics make our Markov-based techniques suitable for the analysis of big datasets.
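To make the quadratic growth concrete, consider a dense transition matrix P over |V| term states and |C| class states, stored in double precision (8 bytes per entry). The figures below are plain arithmetic under that storage assumption, not measurements from the experiments:

```latex
\[
  \#\text{entries}(P) = (|V| + |C|)^2 \approx |V|^2 \qquad (|C| \ll |V|)
\]
\[
  |V| = 10^4:\; 10^8 \text{ entries} \approx 800\,\text{MB}, \qquad
  |V| = 10^3:\; 10^6 \text{ entries} \approx 8\,\text{MB}
\]
```

A tenfold reduction in the number of terms therefore shrinks the matrix, and the cost of each matrix-vector step, roughly a hundredfold.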
CHAPTER 7. MARKOV TECHNIQUES FOR TRANSFER LEARNING AND SENTIMENT ANALYSIS
8
Deep methods to enhance text understanding in big dataset analysis
Traditional transfer learning techniques are not appropriate for the analysis of large datasets. Despite their good performance on small datasets, they do not scale well with dataset size, mainly because of their computational complexity and their high-dimensional text representations. In this chapter, we analyze distributed text representation methods, explaining how they improve text understanding and discussing their impact on transfer learning. We then examine another deep learning advance, namely memory-based deep neural networks, showing that some deep architectures can automatically store and preserve relevant information in memory over time. This capability makes memory-based deep neural networks well suited to both in-domain and cross-domain sentiment classification. Finally, we show how these recent deep learning advances can be combined to achieve a breakthrough in cross-domain sentiment classification.
8.1
Deep learning
Before focusing on distributed text representations and memory-based deep neural networks, we briefly introduce deep learning, explaining its general characteristics and discussing its impact on sentiment classification.
8.1.1
Overview
Conventional machine learning techniques have been applied to a wide variety of tasks, including the identification of objects in images, speech-to-text transcription, information retrieval, matching problems, and so on. However, traditional approaches have a limited ability to process raw data. For a long time, the standard way to deal with this problem was to design a feature extractor that maps raw data into a more suitable representation. This is effective but has two drawbacks: first, the feature extractor must be carefully engineered; second, domain expertise is needed to design it correctly.
Figure 8.1: A general deep learning architecture.
An alternative approach is representation learning, whose goal is to discover a suitable representation automatically from raw data. Deep learning techniques are representation learning methods with multiple levels of representation, as shown in Figure 8.1. Each layer is obtained by composing different modules, which generally apply simple functions, very often non-linear ones such as the hyperbolic tangent or the sigmoid function. Each module transforms the representation at one level into a representation at a higher, more abstract level. The layers of deep neural networks are typically trained without supervision, which is generally added only in the upper layer of the network. This leads to robust, task-independent data representations.
In short, deep architectures learn complex functions by composing simpler ones, extracting intricate structures and patterns from high-dimensional data. No feature engineering is needed, because a suitable data representation is learned automatically. Moreover, deep architectures scale well with data, because larger amounts of data lead to better representations, and they are less affected by the curse of dimensionality.
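The layered composition of Figure 8.1 can be sketched as a plain forward pass in which every module applies an affine map followed by a non-linearity, producing one representation per level. This is an illustrative sketch only: the layer sizes, the random initialization and the choice of tanh for hidden levels and sigmoid for the output are assumptions, and the unsupervised layer-wise training mentioned above is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_layers(sizes, rng):
    """One (weights, bias) pair per layer, with small random weights."""
    return [(rng.normal(0.0, 0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(x, layers, activations):
    """Apply each module in turn and keep the representation at every level."""
    representations = [x]
    for (W, b), activation in zip(layers, activations):
        x = activation(x @ W + b)  # affine map followed by a non-linearity
        representations.append(x)
    return representations

rng = np.random.default_rng(0)
layers = init_layers([20, 16, 8, 2], rng)  # raw input -> two hidden levels -> output
x = rng.random((5, 20))                    # 5 toy inputs with 20 raw features
reps = forward(x, layers, [np.tanh, np.tanh, sigmoid])
print([r.shape for r in reps])  # [(5, 20), (5, 16), (5, 8), (5, 2)]
```

Keeping every intermediate representation, rather than only the output, makes it easy to inspect or reuse the more abstract levels as features for another task.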
8.1.2
The impact on sentiment classification
Thanks to these advantages, deep learning has recently had a great impact on several research areas, such as image recognition and classification, speech processing and recognition, visual object detection, video and audio processing, natural language processing, machine translation, drug discovery and genomics [LBH15]. Here we focus only on sentiment classification; interested readers can find further details in [LBH15, B+09, GBCB16, Sch15].
Deep learning has brought dramatic improvements to sentiment classification. Socher et al. [SPW+13] introduced Recursive Neural Tensor Networks for single-sentence sentiment classification. Besides achieving high classification accuracy, these networks can capture sentiment negation in sentences thanks to their recursive structure. Dos Santos et al. [dSG14] proposed a Deep Convolutional Neural Network that jointly uses character-level, word-level and sentence-level representations to perform sentiment analysis of short texts. Kumar et al. [KIO+16] presented the Dynamic Memory Network (DMN), a neural network architecture that processes input sequences and questions, forms episodic memories, and generates relevant answers. The ability of DMNs to naturally capture position and temporality allows this architecture to achieve state-of-the-art performance in single-sentence sentiment classification on the Stanford Sentiment Treebank proposed in [SPW+13]. Tang et al. [TQL15] introduced Gated Recurrent Neural Networks to learn vector-based document representations, showing that the resulting model outperforms standard Recurrent Neural Networks in document modeling for sentiment classification. Zhang and LeCun [ZL15a] applied temporal convolutional networks to large-scale data sets, showing that they can perform well without any knowledge of words or of other syntactic or semantic structures. Wang et al. [WJL16] combined Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) for sentiment analysis of short texts, taking advantage of the coarse-grained local features generated by the CNN and the long-distance dependencies learned by the RNN. Chen et al. [CXH+16] proposed a three-step approach to learn a sentiment classifier for product reviews. First, they learned a distributed representation of each review with a one-dimensional CNN. Then, they employed an RNN with gated recurrent units to learn distributed representations of users and products. Finally, they learned a sentiment classifier from the user, product and review representations.
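Several of the models above rely on gated recurrent units ([TQL15], [CXH+16]). The following is a minimal sketch of the standard GRU update, used to fold a sequence of vectors (for instance, sentence embeddings) into a single document vector, loosely in the spirit of [TQL15]. It does not reproduce the architecture of any cited paper: the class name, dimensions, initialization, toy inputs, and the use of the final hidden state as the document representation are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class GRU:
    """Standard GRU cell applied step by step over a sequence (illustrative)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)

        def mat(rows, cols):
            return rng.normal(0.0, 0.1, size=(rows, cols))

        # Separate parameters for the update gate (z), reset gate (r)
        # and candidate state (h).
        self.W = {g: mat(input_dim, hidden_dim) for g in "zrh"}
        self.U = {g: mat(hidden_dim, hidden_dim) for g in "zrh"}
        self.b = {g: np.zeros(hidden_dim) for g in "zrh"}
        self.hidden_dim = hidden_dim

    def step(self, x, h):
        z = sigmoid(x @ self.W["z"] + h @ self.U["z"] + self.b["z"])  # update gate
        r = sigmoid(x @ self.W["r"] + h @ self.U["r"] + self.b["r"])  # reset gate
        h_tilde = np.tanh(x @ self.W["h"] + (r * h) @ self.U["h"] + self.b["h"])
        # Interpolate between the previous state and the candidate state.
        return (1.0 - z) * h + z * h_tilde

    def encode(self, sequence):
        """Run the cell over the sequence and return the last hidden state."""
        h = np.zeros(self.hidden_dim)
        for x in sequence:
            h = self.step(x, h)
        return h

rng = np.random.default_rng(1)
sentence_vectors = rng.normal(size=(4, 10))  # 4 toy sentence embeddings
gru = GRU(input_dim=10, hidden_dim=6)
doc_vector = gru.encode(sentence_vectors)    # one 6-dimensional document vector
```

The gates let the cell decide, at each step, how much of the previous state to keep, which is what allows information from early sentences to survive until the end of a long document.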
Despite the recent success of deep learning in in-domain sentiment classification tasks, few attempts have been made to apply it to cross-domain problems. Glorot et al. [GBB11] used the Stacked Denoising Autoencoder introduced in [VLL+10] to extract domain-independent features in an unsupervised fashion, which helps transfer the knowledge extracted from a source domain to a target domain. However, for computational reasons they relied only on the 5,000 most frequent terms of the vocabulary. Although this constraint is often acceptable with small or medium data sets, it can be a strong limitation in big data scenarios, where very large data sets must be analyzed.
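A minimal sketch of a single denoising autoencoder layer, the building block that [VLL+10] stacks, together with the restriction to the most frequent terms described above. This is not the configuration used in [GBB11]: tied weights, masking noise, binary bag-of-words inputs, sigmoid units, cross-entropy loss, full-batch gradient descent, all hyper-parameters and all function names are illustrative assumptions.

```python
import numpy as np
from collections import Counter

def top_k_vocabulary(docs, k):
    """Keep only the k most frequent terms."""
    counts = Counter(t for doc in docs for t in doc)
    return [t for t, _ in counts.most_common(k)]

def binary_bow(docs, vocab):
    """Binary bag-of-words matrix restricted to `vocab`."""
    index = {t: i for i, t in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for row, doc in enumerate(docs):
        for t in doc:
            if t in index:
                X[row, index[t]] = 1.0
    return X

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dae(X, hidden, noise=0.3, lr=0.5, epochs=500, seed=0):
    """One denoising autoencoder layer: tied weights, masking noise,
    cross-entropy reconstruction loss, full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0.0, 0.1, size=(d, hidden))
    b_h, b_v = np.zeros(hidden), np.zeros(d)
    losses = []
    for _ in range(epochs):
        X_tilde = X * (rng.random(X.shape) > noise)  # corrupt: zero some inputs
        H = sigmoid(X_tilde @ W + b_h)                # encode corrupted input
        R = np.clip(sigmoid(H @ W.T + b_v), 1e-7, 1 - 1e-7)  # reconstruct clean X
        losses.append(-np.mean(np.sum(X * np.log(R) + (1 - X) * np.log(1 - R), axis=1)))
        dA_v = (R - X) / n                       # grad w.r.t. decoder pre-activation
        dA_h = (dA_v @ W) * H * (1 - H)          # grad w.r.t. encoder pre-activation
        grad_W = X_tilde.T @ dA_h + dA_v.T @ H   # tied W gets both contributions
        W -= lr * grad_W
        b_h -= lr * dA_h.sum(axis=0)
        b_v -= lr * dA_v.sum(axis=0)
    return W, b_h, losses

docs = [["great", "plot", "fun"], ["boring", "plot"], ["great", "fun"],
        ["awful", "boring"], ["great", "acting"], ["awful", "plot"]]
vocab = top_k_vocabulary(docs, k=5)  # "acting" is the least frequent term and is dropped
X = binary_bow(docs, vocab)
W, b_h, losses = train_dae(X, hidden=3)
features = sigmoid(X @ W + b_h)      # hidden representation of the clean input
```

The input layer has one unit per retained term, so both the weight matrix and the per-epoch cost grow linearly with k for a fixed hidden size; this is why capping the vocabulary keeps training tractable, and also why it discards information such as the dropped term "acting".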