3.4 Methodology and Results Summary
4.3.2 Three-class Sentiment Classification
In the previous experiment, we classified financial tweets into one of the two sentiment categories, positive or negative, and therefore assumed that every tweet contained an opinion. In the following experiments, we took into account the concept of the neutral zone, in order to classify tweets into three sentiment categories: positive, negative, or neutral. Our aim was to investigate whether the introduction of the neutral zone would improve the predictive capabilities of tweets. First, we experimented with the simple
version of the neutral zone, i.e., the fixed neutral zone (see Section 4.1.1.1), and then, in the second experiment, we applied the improved neutral zone, i.e., the relative neutral zone (see Section 4.1.1.2).
In the first experiment, we employed the fixed neutral zone: a tweet was classified as neutral if its distance d(x) from the SVM hyperplane fell within the boundaries of the neutral zone, that is, −t < d(x) < t. We applied the same data transformation as before (see Section 4.2.2) and performed the Granger causality test on the tweet sentiment time series and the corresponding time series of stock closing prices. We varied the size of the neutral zone (i.e., the value of t) from 0 to 1, where t = 0 corresponds to classification without the neutral zone, and calculated the p-value separately for day lags of 1, 2, and 3. The results are shown in Table 4.1. The column in which the size of the neutral zone is 0 contains the results of the Granger causality test without the neutral zone, where financial tweets were classified into one of two categories, positive or negative. The remaining columns contain p-values for the various sizes of the neutral zone. Values that fall below the significance level of 0.1 after the Bonferroni correction are marked in bold.
The highest number of significant values was obtained for the September-November time period with the border of the neutral zone at a distance of t = 0.8 or t = 1 from the SVM hyperplane. Significant results for the same period were also obtained in the two-class setting without the neutral zone (see Section 4.3.1), which may be due to more active discussion on Twitter given the high volatility of the stock price in this period. Overall, the results in Table 4.1 indicate that introducing the neutral zone improved the predictive power of the sentiment classifier, as measured by Granger causality testing.
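The fixed neutral zone rule described above can be sketched in a few lines of Python. This is only an illustration of the condition −t < d(x) < t, not the code used in the experiments. The function name `classify_fixed_zone` and the convention that a positive distance means positive sentiment are assumptions made for the example.

```python
def classify_fixed_zone(distance, t):
    """Label a tweet from its signed SVM distance d(x), given a fixed neutral zone of size t.

    Assumption: positive distances lie on the positive-sentiment side of the hyperplane.
    """
    # A tweet is neutral when its distance lies strictly inside (-t, t).
    if -t < distance < t:
        return "neutral"
    # Outside the zone, the sign of the distance decides the class.
    return "positive" if distance > 0 else "negative"


# With t = 0 the neutral zone is empty, which reproduces two-class classification.
labels = [classify_fixed_zone(d, t=0.8) for d in (-1.2, -0.3, 0.5, 0.9)]
```

Because the inequality is strict, a tweet lying exactly on the border (d(x) = t) is treated as opinionated in this sketch.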
In Appendix B, we report the Granger causality results between the daily changes of positive sentiment probability and the daily returns of the stock closing prices, obtained with the fixed neutral zone, for several other companies whose tweets we collected (Apple, Amazon, Cisco, Google, Microsoft, Netflix, and Research In Motion Limited). The results show that for several of these companies, too, the learned classifier has the potential to be useful for stock price prediction in terms of Granger causality.
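The two time series compared in these tests can be illustrated with a short Python sketch. The exact transformation is defined in Section 4.2.2 and is not reproduced here, so the formulas below are assumptions: positive sentiment probability is taken as the share of positive tweets among the opinionated (positive and negative) tweets of a day, and daily returns are taken as simple relative price changes. All function names are hypothetical.

```python
def positive_sentiment_probability(n_positive, n_negative):
    """Share of positive tweets among opinionated tweets on one day (assumed definition)."""
    total = n_positive + n_negative
    # A day with no opinionated tweets has no defined probability.
    return n_positive / total if total else None


def daily_changes(series):
    """Difference between each day's value and the previous day's value."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def daily_returns(prices):
    """Simple daily returns (p_d - p_{d-1}) / p_{d-1} of closing prices (assumed definition)."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]
```

Both derived series are one element shorter than their inputs, so they stay aligned day by day when they are passed to a causality test.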
Additionally, we explored whether there is evidence of reverse causality for Baidu, i.e., whether price movements influence the public sentiment expressed in tweets. The results showed some causality in that direction, but after adjusting the critical p-value with the Bonferroni correction, no significant results remained.
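The effect of the Bonferroni correction on the critical p-value can be sketched as follows. The correction divides the significance level by the number of tests performed. The number of tests used in the experiments is not stated in this section, so `n_tests` is left as a parameter, and the p-values in the usage line are made up purely for illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Critical p-value after the Bonferroni correction: alpha divided by the number of tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.1, n_tests=None):
    """Flag the p-values that stay below the corrected threshold.

    If n_tests is not given, the number of supplied p-values is used (an assumption).
    """
    m = len(p_values) if n_tests is None else n_tests
    threshold = bonferroni_threshold(alpha, m)
    return [p < threshold for p in p_values]


# Illustrative (made-up) p-values: with three tests the threshold drops from 0.1 to about 0.033.
flags = significant_after_bonferroni([0.02, 0.04, 0.2], alpha=0.1)
```

This shows how a result that is significant at the uncorrected level of 0.1, such as 0.04 in the made-up input, can stop being significant once the correction is applied.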
In the second experiment, we used the relative neutral zone and the calculation of classification reliability (see Section 4.1.1.2). We repeated the experiments on classifying Baidu financial tweets and calculating Granger causality between the daily changes of positive sentiment probability and the daily returns of the stock closing prices. In this setting, a tweet was classified as neutral if its classification reliability was below a given threshold. As the first step, we calculated the average distances of the positive and negative training examples from the SVM hyperplane. The average distance of the positive training examples was 1.7922 and that of the negative training examples was -1.7069. These distances indicate that the classifier was more certain when classifying positive examples, since the absolute value of the positive average distance was higher than that of the negative one. We then conducted a series of experiments by varying the reliability threshold R from 0 to 1 in steps of 0.1 and calculating Granger causality for each value. As in the previous experiments, we adjusted the critical p-value by applying the Bonferroni correction. The results are shown in Table 4.2. As can be seen from the table, the relative neutral zone further improved the results, i.e., the evidence that the positive sentiment probability Granger-causes the closing stock price for Baidu. The best results were obtained with reliability thresholds R of 0.4 and 0.5. Table 4.2 contains more significant results than Table 4.1, and for the September-November time period the p-values are lower than in the experiments with the fixed neutral zone.

Table 4.2: Statistical significance (p-values) of Granger causality between positive sentiment probability and closing stock price for Baidu for different values of the reliability threshold. Values that fall below the significance level of 0.1 after the Bonferroni correction are marked in bold.

                  Reliability threshold
Time period  Lag  0       0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9     1
9 months     1    0.587   0.762   0.446   0.423   0.791   0.356   0.488   0.590   0.523   0.904   0.552
Mar.-May     1    0.824   0.778   0.476   0.720   0.471   0.760   0.910   0.920   0.917   0.928   0.667
June-Aug.    1    0.995   0.894   0.508   0.718   0.009   0.022   0.519   0.639   0.750   0.567   0.863
Sept.-Nov.   1    0.298   0.520   0.181   0.212   0.032   0.007   0.053   0.035   0.016   0.089   0.063
9 months     2    0.594   0.249   0.200   0.012   0.007   0.003   0.163   0.717   0.735   0.829   0.168
Mar.-May     2    0.624   0.788   0.733   0.642   0.298   0.406   0.566   0.681   0.996   0.517   0.218
June-Aug.    2    0.993   0.496   0.292   0.227   0.014   0.053   0.775   0.683   0.803   0.197   0.442
Sept.-Nov.   2    0.017   0.039   0.029   0.010   <0.001  <0.001  0.001   0.023   0.017   0.075   0.119
9 months     3    0.795   0.485   0.419   0.032   0.023   0.008   0.215   0.868   0.845   0.908   0.296
Mar.-May     3    0.311   0.652   0.770   0.762   0.507   0.492   0.738   0.647   0.989   0.478   0.382
June-Aug.    3    0.915   0.400   0.337   0.354   0.031   0.030   0.389   0.821   0.648   0.149   0.530
Sept.-Nov.   3    0.026   0.056   0.058   0.004   <0.001  <0.001  0.004   0.051   0.003   0.012   0.038
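The idea behind the relative neutral zone can be illustrated with the Python sketch below. The classification reliability is defined in Section 4.1.1.2, which is not reproduced here, so the `reliability` function is only a hypothetical stand-in: it scales a tweet's distance by the average training distance of its own class and caps the result at 1. Only the two average distances are taken from the text.

```python
AVG_POS_DISTANCE = 1.7922   # average distance of positive training examples (from the text)
AVG_NEG_DISTANCE = -1.7069  # average distance of negative training examples (from the text)


def reliability(distance, avg_pos=AVG_POS_DISTANCE, avg_neg=AVG_NEG_DISTANCE):
    """Hypothetical reliability in [0, 1]: distance relative to the class average, capped at 1."""
    reference = avg_pos if distance >= 0 else avg_neg
    # Both values share the same sign, so the ratio is non-negative.
    return min(1.0, distance / reference)


def classify_relative_zone(distance, threshold):
    """Label a tweet as neutral when its reliability falls below the threshold R."""
    if reliability(distance) < threshold:
        return "neutral"
    return "positive" if distance >= 0 else "negative"


labels = [classify_relative_zone(d, threshold=0.5) for d in (-1.0, -0.5, 0.5, 1.0)]
```

Under this assumed formula, the neutral zone is narrower on the negative side, because the negative class average is smaller in absolute value. With a threshold of 0, no tweet is labelled neutral.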
In Appendix C, we report the Granger causality results between the daily changes of positive sentiment probability and the daily returns of the stock closing prices, obtained with the relative neutral zone, for the other companies whose tweets we collected (Apple, Amazon, Cisco, Google, Microsoft, Netflix, and Research In Motion Limited). Again, the results indicate that for several of these companies the learned classifier combined with the relative neutral zone has the potential to be useful for stock price prediction in terms of Granger causality.
In addition, we explored whether there is evidence of reverse causality for Baidu, i.e., whether price movements influence public sentiment. In this experiment, we obtained only one significant result: for the June-August time period, a lag of 3 days, and a reliability threshold R of 0.5.
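For reference, the two directions of testing can be written in the standard textbook form of the Granger causality test. The notation is an assumption introduced here, not taken from the thesis: x_t denotes the daily change of positive sentiment probability, y_t the daily return of the closing price, and p the lag in days (1, 2, or 3).

```latex
\begin{align}
  \text{restricted:}   \quad y_t &= a_0 + \sum_{i=1}^{p} a_i\, y_{t-i} + \varepsilon_t \\
  \text{unrestricted:} \quad y_t &= c_0 + \sum_{i=1}^{p} c_i\, y_{t-i}
                                   + \sum_{i=1}^{p} b_i\, x_{t-i} + \eta_t \\
  H_0 &: \; b_1 = b_2 = \dots = b_p = 0
\end{align}
```

A small p-value rejects H_0 and is read as evidence that sentiment Granger-causes returns. The reverse causality test swaps the roles of x_t and y_t, so that lagged returns are tested as predictors of the sentiment series.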
4.3.3 Comparison of the Developed Sentiment Classifier with the Publicly