12.6 Experimental results

12.6.2 Performance check

The performance check aims to measure whether the compiled algorithm is as efficient as manually optimized code. For this purpose, the elapsed time between each algorithm's start and termination has been measured and compared. The measurement has been taken using the time function of the Python time module.
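A minimal sketch of such a measurement, assuming the algorithm under test is exposed as a callable. The helper name average_elapsed and the dummy workload are illustrative and do not come from the original experiments.

```python
import time


def average_elapsed(run, repetitions=3):
    """Return the mean wall-clock time (in seconds) of `run` over several repetitions."""
    timings = []
    for _ in range(repetitions):
        start = time.time()                   # timestamp at the algorithm's start
        run()                                 # hypothetical entry point of the algorithm
        timings.append(time.time() - start)   # elapsed time at termination
    return sum(timings) / len(timings)


def dummy_workload():
    # Stand-in for a k-means run; replace with the real job.
    sum(i * i for i in range(100_000))


if __name__ == "__main__":
    print(f"average elapsed time: {average_elapsed(dummy_workload):.4f} s")
```

time.time() reports wall-clock seconds, so the result includes scheduling and I/O delays as well as computation.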

CHAPTER 12. TOWARDS VENDOR-AGNOSTIC IMPLEMENTATION OF BIG DATA ANALYTICS

The performance check has been performed on large generated datasets, to measure the scalability of the compiled algorithm. The tests have been conducted on datasets with 5 features and an increasing number of instances: 10,000, 100,000 and 1 million. The number of slices used to parallelize the data is proportional to the dataset cardinality (50, 500 and 5,000 slices respectively). The dataset with 10,000 instances divided into 50 slices is referred to as the small dataset; analogously, the others are referred to as the medium and the large datasets respectively. The two versions of k-means have been tested on a cluster with an increasing number of worker nodes: 2, 4 and 6. Each node has 4 GB of RAM and 2 CPUs.
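The setup above implies a constant ratio of 200 instances per slice. The following sketch reproduces that sizing rule on synthetic data. The uniform random generator, the seed and all function names are assumptions made for illustration, since the text does not say how the datasets were generated.

```python
import random

INSTANCES_PER_SLICE = 200  # 10,000 / 50 = 100,000 / 500 = 1,000,000 / 5,000


def generate_dataset(n_instances, n_features=5, seed=0):
    """Generate `n_instances` random points with `n_features` values each."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n_features)] for _ in range(n_instances)]


def number_of_slices(n_instances):
    """Number of slices, proportional to the dataset cardinality."""
    return max(1, n_instances // INSTANCES_PER_SLICE)


def split_into_slices(data, n_slices):
    """Split `data` into `n_slices` contiguous chunks of near-equal size."""
    size, remainder = divmod(len(data), n_slices)
    slices, start = [], 0
    for i in range(n_slices):
        end = start + size + (1 if i < remainder else 0)
        slices.append(data[start:end])
        start = end
    return slices


if __name__ == "__main__":
    small = generate_dataset(10_000)
    slices = split_into_slices(small, number_of_slices(len(small)))
    print(len(slices), len(slices[0]))  # 50 slices of 200 instances each
```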

Table 12.1 shows how the elapsed time of the compiled version of k-means scales with respect to the number of computational nodes and the number of instances used. The times shown in the table are averages over 3 runs of the same experiment.

# workers    Small    Medium       Large
2            47.22    312.52    2,881.83
4            22.65    202.52    1,779.22
6            21.06    165.29    1,333.02

Table 12.1: Scalability of the compiled version of k-means with respect to the number of workers.
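The figures in Table 12.1 can be summarized as speedup and parallel efficiency. The sketch below takes the 2-worker run as the baseline, which is an assumption of this example, since no single-worker time is reported. The function names are illustrative.

```python
# Average elapsed times from Table 12.1, keyed by number of workers.
elapsed = {
    2: {"small": 47.22, "medium": 312.52, "large": 2881.83},
    4: {"small": 22.65, "medium": 202.52, "large": 1779.22},
    6: {"small": 21.06, "medium": 165.29, "large": 1333.02},
}

BASELINE_WORKERS = 2  # smallest cluster tested, used as the reference


def speedup(workers, dataset):
    """How many times faster the run is than the 2-worker baseline."""
    return elapsed[BASELINE_WORKERS][dataset] / elapsed[workers][dataset]


def efficiency(workers, dataset):
    """Speedup divided by the ideal speedup (workers / baseline workers)."""
    return speedup(workers, dataset) / (workers / BASELINE_WORKERS)


if __name__ == "__main__":
    for workers in (4, 6):
        for dataset in ("small", "medium", "large"):
            print(workers, dataset,
                  f"speedup={speedup(workers, dataset):.2f}",
                  f"efficiency={efficiency(workers, dataset):.2f}")
```

For instance, on the large dataset 6 workers give a speedup of about 2.16 against an ideal value of 3, i.e. an efficiency of about 0.72.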

Table 12.2 compares the manual and the compiled versions of k-means. Again, the average elapsed time over 3 runs has been taken.

Algorithm           Small    Medium       Large
Manual k-means      22.89    163.18    1,315.04
Compiled k-means    21.06    165.29    1,333.02

Table 12.2: Elapsed time comparison between the manual and the compiled version of k-means.

The compiled version of k-means does not add a noticeable delay compared to the manually written version: in Table 12.2 it is less than 1.5% slower on the medium and large datasets, and slightly faster on the small one. As can be seen in the experiment with 6 workers, which is the most affected by parallelization, the two algorithms scale similarly as the dataset size varies.
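The relative difference between the two versions can be checked directly from the values in Table 12.2. The sketch below is illustrative only, and the function name is an assumption.

```python
# Average elapsed times from Table 12.2.
manual = {"small": 22.89, "medium": 163.18, "large": 1315.04}
compiled = {"small": 21.06, "medium": 165.29, "large": 1333.02}


def relative_overhead(dataset):
    """Relative extra time of the compiled version; negative means it was faster."""
    return (compiled[dataset] - manual[dataset]) / manual[dataset]


for name in manual:
    print(f"{name:>6}: {relative_overhead(name):+.2%}")
```

With only 3 runs per configuration, differences of this size may well fall within run-to-run variation, which the tables do not report.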

12.7 Final remarks

This chapter has presented parallel programming primitives to define the parallelizable parts of algorithms in a uniform way, independent of existing analytic services. The rationale of this approach is to boost the development of big data analytics, fostering code reuse among different platforms and simplifying code optimization.

Programmers only have to identify parallel code fragments and encapsulate them in parallel primitives, whereas parallelization and optimization are not their responsibility. Parallelization is carried out by a compiler that is able to transform agnostically written code into a fully optimized, ready-to-deploy application tailored to a specific target platform. The compiler takes the algorithm definition as input, processes it, and produces the necessary execution and deployment skeletons, choosing a suitable parallel computational pattern and tuning the parameters of the algorithms.
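To give a flavor of what platform-agnostic code might look like, the sketch below writes one k-means iteration in terms of two primitives only. The names parallel_map and parallel_reduce are hypothetical and are not the primitives defined in this chapter. The local thread-pool backend merely stands in for whatever a compiler would generate for a real target platform.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def parallel_map(function, slices, workers=2):
    """Hypothetical primitive: apply `function` to every slice (local thread pool here)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(function, slices))


def parallel_reduce(function, partial_results):
    """Hypothetical primitive: combine the partial results pairwise."""
    return reduce(function, partial_results)


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    distances = [sum((p - c) ** 2 for p, c in zip(point, centroid))
                 for centroid in centroids]
    return distances.index(min(distances))


def kmeans_step(slices, centroids):
    """One k-means iteration written only in terms of the two primitives."""

    def partial_sums(data_slice):
        # Per-slice coordinate sums and point counts for each centroid.
        sums = [[0.0] * len(centroids[0]) for _ in centroids]
        counts = [0] * len(centroids)
        for point in data_slice:
            k = nearest(point, centroids)
            counts[k] += 1
            sums[k] = [s + p for s, p in zip(sums[k], point)]
        return sums, counts

    def merge(left, right):
        sums = [[a + b for a, b in zip(ls, rs)] for ls, rs in zip(left[0], right[0])]
        counts = [a + b for a, b in zip(left[1], right[1])]
        return sums, counts

    sums, counts = parallel_reduce(merge, parallel_map(partial_sums, slices))
    # Keep the old centroid when no point was assigned to it.
    return [[s / n for s in total] if n else old
            for total, n, old in zip(sums, counts, centroids)]
```

The algorithm body never mentions threads or a cluster API, so only the two primitives would need a platform-specific implementation.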

Part VI: Conclusion

13 Results achieved and future work

This chapter summarizes the most remarkable contributions of this thesis, and outlines ongoing and possible future work.

13.1 Methods for sentiment analysis

The first notable contribution of this thesis is the introduction of Markov chain techniques for sentiment analysis. In particular, the Markov chain introduced in Chapter 7 is able to perform both sentiment classification and transfer learning in cross-domain tasks. To the best of our knowledge, this is the first time that a Markov chain has been used as a classifier for sentiment classification rather than for a support task such as part-of-speech (POS) tagging.

Our Markov chain includes both terms and classes as states. The relationships between terms allow knowledge transfer from the source terms to the target terms, through the terms that are shared among domains. The relationships between terms and classes denote terms with a sentiment orientation that can be used for classification. Considering classes as Markov chain states is a novelty in classification tasks, where the most common approach consists in building different Markov chains (e.g. one for each category) and evaluating the probability that a test document was generated by each of them.
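The idea of a single chain whose states are both terms and classes can be illustrated with a toy example. This is not the model of Chapter 7: the edge weights (document co-occurrence counts), the row normalization, the two-step scoring rule and all names below are assumptions chosen only to show how shared terms can carry sentiment from a labeled source domain to unlabeled target terms.

```python
from collections import defaultdict
from itertools import combinations


def build_chain(labeled_docs, unlabeled_docs=()):
    """Return row-normalized transitions {term: {state: probability}}.

    States are terms (strings) and classes, the latter written as ("class", label).
    """
    weights = defaultdict(lambda: defaultdict(float))
    for tokens, label in labeled_docs:
        for term in set(tokens):
            weights[term][("class", label)] += 1.0   # term -> class edge
    all_docs = [tokens for tokens, _ in labeled_docs] + list(unlabeled_docs)
    for tokens in all_docs:
        for a, b in combinations(sorted(set(tokens)), 2):
            weights[a][b] += 1.0                     # term <-> term co-occurrence
            weights[b][a] += 1.0
    chain = {}
    for term, edges in weights.items():
        total = sum(edges.values())
        chain[term] = {state: w / total for state, w in edges.items()}
    return chain


def classify(tokens, chain, labels):
    """Pick the class reached with the highest probability within two steps."""
    scores = dict.fromkeys(labels, 0.0)
    for term in tokens:
        for state, p in chain.get(term, {}).items():
            if isinstance(state, tuple):             # direct term -> class step
                scores[state[1]] += p
            else:                                    # term -> term -> class
                for nxt, q in chain.get(state, {}).items():
                    if isinstance(nxt, tuple):
                        scores[nxt[1]] += p * q
    return max(scores, key=scores.get)


if __name__ == "__main__":
    source = [(["great", "story"], "pos"), (["boring", "story"], "neg")]
    target = [["great", "reliable"], ["boring", "noisy"]]  # unlabeled target domain
    chain = build_chain(source, target)
    print(classify(["reliable"], chain, ["pos", "neg"]))   # pos
    print(classify(["noisy"], chain, ["pos", "neg"]))      # neg
```

In this toy corpus "reliable" never appears with a label, yet it reaches the positive class through the shared term "great".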

Also, we have proposed a variant of this basic approach that copes with sentences rather than considering each document as a whole (Section 7.3.1). A second variant has also been introduced so that the classification process is driven by polarity-bearing terms, which are able to discriminate among classes (Section 7.3.2).

Apart from the sentence-based variant, both the basic approach and the polarity-driven variant achieve accuracy comparable with the best methods in the literature. Fewer parameters need to be tuned than in the state of the art, and fewer features are required to obtain such performance. These advantages make our Markov chain approach particularly appealing in big data scenarios, where scalability is as essential as effectiveness. Also, since our algorithm only relies on term co-occurrences, it can easily be applied to other languages.

The second remarkable contribution of this thesis is an investigation of deep learning for cross-domain sentiment classification. Paragraph vector, an unsupervised technique for learning distributed text representations that was not designed to perform knowledge transfer, achieved performance comparable with our Markov chain in cross-domain tasks. The major outcome is that deep approaches to learning distributed text representations are able to extract domain-independent knowledge in an unsupervised fashion, so as to bridge the inter-domain semantic gap. This result suggests that a breakthrough in ad hoc cross-domain sentiment solutions can be obtained by combining distributed text representations and transfer learning techniques. Since paragraph vector can learn fixed-length feature representations from variable-length pieces of text, and it is not affected by the curse of dimensionality, the breakthrough will also involve big data scenarios.

The suitability of distributed text representations such as paragraph vector for transfer learning is confirmed by a very simple multi-source approach, where knowledge is extracted from N heterogeneous domains and the resulting model is applied to a different target domain. Using knowledge from multiple source domains is a naïve approach to enhancing transfer learning; nevertheless, accuracy increases by about 2-3% on average, independently of the dataset size. This outcome supports our belief that the breakthrough will also involve big data contexts, where very large datasets need to be analyzed.

Memory-based deep neural networks such as the gated recurrent unit (GRU) were later added to the investigation. In cross-domain tasks, the gated recurrent unit performs poorly with small-scale data (e.g. 2,000 instances), achieves accuracy comparable with the other techniques with medium-scale data (e.g. 20,000 instances), and even outperforms them with large-scale data (e.g. 100,000 instances). The outcome suggests that the gated recurrent unit needs many instances to learn to bridge the inter-domain semantic gap. Once such instances are available, it is able to align heterogeneous domains automatically, without explicit transfer learning mechanisms. This ability is supposedly due to the GRU gates, which allow each unit to work as a memory in which relevant information can be stored and preserved through time.
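The memory role of the gates can be seen in a single GRU unit with scalar input and state. The sketch follows one common formulation of the GRU (conventions differ on which side of the interpolation the update gate multiplies). The parameter names and values are illustrative and are not taken from the experiments.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, params):
    """One step of a single GRU unit with scalar input and scalar state."""
    z = sigmoid(params["w_z"] * x + params["u_z"] * h_prev + params["b_z"])  # update gate
    r = sigmoid(params["w_r"] * x + params["u_r"] * h_prev + params["b_r"])  # reset gate
    candidate = math.tanh(params["w_h"] * x + params["u_h"] * (r * h_prev) + params["b_h"])
    # The update gate interpolates between keeping the old state and writing the candidate.
    return (1.0 - z) * h_prev + z * candidate


if __name__ == "__main__":
    # A strongly negative update-gate bias keeps z close to 0: the unit holds its state.
    hold = {"w_z": 0.0, "u_z": 0.0, "b_z": -20.0,
            "w_r": 1.0, "u_r": 1.0, "b_r": 0.0,
            "w_h": 1.0, "u_h": 1.0, "b_h": 0.0}
    h = 0.8
    for x in [0.5, -1.0, 2.0] * 10:
        h = gru_step(x, h, hold)
    print(round(h, 6))  # still 0.8 after 30 inputs
```

When the update gate is nearly closed, the stored value survives arbitrary inputs, which is the behavior the paragraph above credits for preserving relevant information through time.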

Moreover, fine-tuning of a pre-trained model on a small sample of labeled target instances has been attempted, to assess its impact on cross-domain tasks as an explicit transfer learning mechanism. Paragraph vector does not benefit from fine-tuning, since it is able to capture word semantics as well as word relationships without supervision. On the other hand, fine-tuning is beneficial to the gated recurrent unit, because it acts as a transfer learning mechanism. The fewer training examples were originally used to train the model on the source domain, the higher the impact of fine-tuning on performance. As expected, a greater amount of tuning data (e.g. 500 reviews rather than 250) leads to better performance with small-scale data. The impact of this factor decreases as the dataset cardinality grows, eventually vanishing with large-scale data.

Strengthened by these first results, recent memory-based deep neural networks have been combined with word embeddings, a de facto standard in deep learning. Among the deep memory-based methods, the gated recurrent unit and the differentiable neural computer (DNC) have been tested. While in the former the memory mechanism is part of the network structure, the latter is able to address and manage an external memory. Such an ability makes DNC one of the most innovative deep learning techniques, able to emulate reasoning and inference in natural language. Global vectors (GloVe) have been used in combination with both architectures for the initialization of their feature weights.
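Initializing feature weights from GloVe typically means building an embedding matrix with one row per vocabulary word. The sketch below assumes GloVe's plain-text format (a token followed by its vector components, separated by spaces) and uses a few in-memory lines instead of a real file. The random fallback for words missing from GloVe and all function names are assumptions, not details reported in the thesis.

```python
import random


def parse_glove(lines):
    """Parse GloVe's plain-text format: a token followed by its vector components."""
    vectors = {}
    for line in lines:
        token, *values = line.rstrip().split(" ")
        vectors[token] = [float(v) for v in values]
    return vectors


def build_embedding_matrix(vocabulary, vectors, dim, seed=0):
    """One row per vocabulary word: its GloVe vector if available, else small random values."""
    rng = random.Random(seed)
    matrix = []
    for word in vocabulary:
        if word in vectors:
            matrix.append(list(vectors[word]))
        else:
            # Assumption: out-of-vocabulary words get a small random initialization.
            matrix.append([rng.uniform(-0.05, 0.05) for _ in range(dim)])
    return matrix


if __name__ == "__main__":
    glove_lines = ["good 0.1 0.2 0.3", "bad -0.1 -0.2 -0.3"]  # toy stand-in for a GloVe file
    vectors = parse_glove(glove_lines)
    matrix = build_embedding_matrix(["good", "bad", "unseen"], vectors, dim=3)
    print(matrix[0])  # [0.1, 0.2, 0.3]
```

The resulting matrix would then be copied into the embedding layer of the network before training, with the layer either frozen or left trainable.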

Experiments on the Amazon reviews corpus have shown that the differentiable neural computer with GloVe dramatically outperforms state-of-the-art techniques for cross-domain sentiment classification. Transfer learning from a source domain to a target domain is supported by distributed word representations with small-scale datasets, and by memory mechanisms as the dataset size increases. Fine-tuning on a small sample of target instances is more useful to the gated recurrent unit than to the differentiable neural computer, as the latter is less sensitive to noise, and a few target samples are not enough to be relevant. The differentiable neural computer with GloVe feature weights achieves new state-of-the-art performance in both binary and fine-grained classification on very large datasets built on the same Amazon reviews corpus. Finally, the differentiable neural computer and the gated recurrent unit achieve performance comparable with many techniques in single-sentence in-domain sentiment classification on the Stanford sentiment treebank. Small-scale training data and the absence of a mechanism to deal with sentence syntax are probably the reasons that prevent DNC from reaching state-of-the-art performance.

In document Big Data mining and machine learning techniques applied to real world scenarios (Page 185-191)