Wordlist Stability - Preliminary Evaluation

4.2 Preliminary Evaluation

4.2.6 Wordlist Stability

A bootstrap resampling technique can be used to compute confidence intervals around the summary statistic. It is possible to sample both the words from the predefined emotion lists and the documents in the collection to generate such confidence intervals. The summary statistic can be repeatedly computed across many such samples to derive a bootstrap distribution from which statements of confidence can be inferred. Words can be bootstrapped to assess the sensitivity of the results to the choice of words. This procedure yields wider confidence intervals at times when the word frequency distribution is more concentrated over a smaller number of words.

8 In other words, the window of observations used to compute the current value is increasing over time. Hence, the initial values are computed using fewer data points than the later values.

Documents can be bootstrapped to assess the sensitivity of the measure to how the words are distributed across the document collection. This procedure yields wider confidence intervals at times when the words are less evenly distributed across the documents, for example if a small set of documents contains a large proportion of the found emotion words. Only the sensitivity of the measure to the choice of words will be considered in this section (as the RTRS document collection is very large).

From the analysis in this chapter of the sensitivity to the word lists themselves, it is now possible to be fairly confident that both the excitement and anxiety lists contain words which are good representatives of the same emotional signal (especially the anxiety list). However, word frequency distributions tend to follow power laws, as noted above. If no normalisation is performed on the frequency series of individual words before aggregating them into the final index, the exclusion of some of the most frequent words could therefore make a substantial difference. If normalisation is performed, this problem is likely to be less substantial. The extent of the problem can be estimated by drawing random samples with replacement from the word lists to generate synthetic lists and recomputing the RSS index with the various aggregation procedures to quantify this potential source of noise. This bootstrap procedure makes it possible to approximate the sampling distribution of possible RSS series and to generate confidence intervals for the index with respect to the appearance of particular words in the list.

Bootstrap distributions of correlations are estimated for the above aggregation procedures, as well as for aggregations restricted to the 50 most frequent words, for both the excitement and anxiety lists. The correlations are Pearson r scores computed between a particular bootstrap sample and the original series.
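The word-level bootstrap described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the thesis: the function names, the toy power-law data and the handling of early observations in the expanding z-score are all assumptions made for the example.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def expanding_zscore(series, min_periods=3):
    """z-score each value against all observations up to and including it.

    Assumption: values with fewer than `min_periods` observations are set to 0.
    """
    out = []
    for t in range(len(series)):
        window = series[: t + 1]
        if len(window) < min_periods:
            out.append(0.0)
            continue
        mean = sum(window) / len(window)
        var = sum((v - mean) ** 2 for v in window) / (len(window) - 1)
        out.append((series[t] - mean) / sqrt(var) if var > 0 else 0.0)
    return out


def aggregate(prepared, words):
    """Sum the per-word series; a word drawn twice is counted twice."""
    length = len(prepared[words[0]])
    return [sum(prepared[w][t] for w in words) for t in range(length)]


def bootstrap_correlations(word_series, normalise, n_boot=200, seed=0):
    """Correlate indices built from resampled word lists with the original."""
    rng = random.Random(seed)
    words = sorted(word_series)
    prepared = {
        w: expanding_zscore(s) if normalise else list(s)
        for w, s in word_series.items()
    }
    original = aggregate(prepared, words)
    correlations = []
    for _ in range(n_boot):
        sample = rng.choices(words, k=len(words))  # sampling with replacement
        correlations.append(pearson_r(aggregate(prepared, sample), original))
    return correlations


def toy_word_series(n_words=30, n_periods=120, seed=1):
    """Hypothetical data: a shared signal plus noise, scaled by 1/rank."""
    rng = random.Random(seed)
    signal = [rng.gauss(0, 1) for _ in range(n_periods)]
    return {
        f"word{rank:02d}": [
            max(0.0, (100.0 / rank) * (1 + 0.2 * s + 0.2 * rng.gauss(0, 1)))
            for s in signal
        ]
        for rank in range(1, n_words + 1)
    }


series = toy_word_series()
raw = bootstrap_correlations(series, normalise=False)
scaled = bootstrap_correlations(series, normalise=True)
print("raw    min/median:", min(raw), sorted(raw)[len(raw) // 2])
print("scaled min/median:", min(scaled), sorted(scaled)[len(scaled) // 2])
```

Each list of correlations approximates one bootstrap distribution; comparing the spread of `raw` and `scaled` gives a rough sense of how much the index depends on individual words under each aggregation procedure.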

Figure 4.2.11 displays box plots of the four bootstrap distributions generated by sampling from the excitement dictionary (or only the 50 most frequent excitement words) with replacement and computing the excitement index either by aggregating the raw frequencies or by computing the rolling z-score for each word before aggregating. Figure 4.2.12 displays the corresponding plots generated using the anxiety list. The main box starts at the first quartile and ends at the third quartile of the corresponding distribution. The central line represents the median value. The ‘whiskers’ of the plot extend up to 1.5 times the interquartile range beyond the box. Anything outside the whiskers is considered an outlier and is plotted using a small circle.
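The box plot quantities described above can be computed directly from a bootstrap distribution. The sketch below assumes the common convention that each whisker ends at the most extreme observation lying within 1.5 times the interquartile range of the box, and that quartiles are linearly interpolated; the function name and the sample values are hypothetical.

```python
from statistics import quantiles


def box_plot_summary(values, whisker_factor=1.5):
    """Quartiles, whisker ends and outliers for a box plot of `values`."""
    # "inclusive" treats the data as the full population and interpolates
    # linearly between observations.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),   # most extreme point within the fence
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical bootstrap correlations with one poorly correlated sample.
sample = [0.10, 0.80, 0.82, 0.85, 0.86, 0.88, 0.90, 0.91, 0.93]
summary = box_plot_summary(sample)
print(summary)
```

A long lower whisker or many low outliers in such a summary would indicate that some resampled word lists produce an index that departs markedly from the original series.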

Once more, note the relative stability of the anxiety list compared to the excitement list. The scaled versions, in which each word gets an equal effective contribution to the final index, are much more robust than the raw versions. Finally, the versions including only the 50 most common words are seemingly not as robust as those containing all words.

4.3 Discussion

This chapter presented the basic RSS methodology, in particular how the preselected lexicons were constructed, how they are used to produce the RSS score, and how articles are selected and aggregated into a time series of a given frequency. The second part of the chapter evaluates the methodology on several criteria. The evaluation section discusses the effects of different ways to scale the relative frequency counts of excitement and anxiety words, the potential effect of basic negation, as well as less trivial issues such as the evolution of the relative ranks and frequencies of the words in a given list and, most importantly, the extent to which the words in a list tend to capture the same emotional signal. A thorough empirical evaluation of six different emotion lexicons has been carried out on all Reuters articles published in New York and Washington over the period January 1996 through October 2014.

Different methods to scale the emotion word frequencies to account for the amount of text analysed have been evaluated and it was found that scaling by the total number of characters or words seems sensible. The potential impact of negation on the RSS index was tested and it was found that, at least when the number of documents is as large as for the US Reuters database, it has little to no impact on the changes of the final index. The performance of different word lists was compared on the criterion of how well the most frequent words of each list capture a target emotion signal, defined as the principal component of the correlation matrix of the individual word frequency series. It was found that the excitement and anxiety lists perform well (especially the anxiety list), and that there is room for improving the lists by excluding words that act as outliers (in the sense of not contributing to the principal component, or even contributing negatively) and including words that correlate with the main signal. Using the excitement and anxiety lists, a different word aggregation strategy that gives equal weight to each word was explored, thus avoiding the strong dependence of the RSS metric on the most frequent words. It was shown, using a bootstrap test, that such an aggregation procedure yields results that are more robust with respect to word choice.
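The idea of a target signal defined as the principal component of the word correlation matrix can be illustrated with a short sketch. It uses power iteration on synthetic data; the function names, the toy series and the rule of flagging words with non-positive loadings are assumptions for illustration and do not reproduce the scoring used in the evaluation.

```python
import random
from math import sqrt


def correlation_matrix(series_list):
    """Pearson correlation matrix of equally long series."""
    def standardise(s):
        mean = sum(s) / len(s)
        centred = [v - mean for v in s]
        norm = sqrt(sum(c * c for c in centred))
        return [c / norm for c in centred]

    z = [standardise(s) for s in series_list]
    return [[sum(a * b for a, b in zip(zi, zj)) for zj in z] for zi in z]


def first_principal_component(corr, n_iter=500):
    """Leading eigenvalue and eigenvector of `corr` via power iteration."""
    n = len(corr)
    # Start from equal loadings; this assumes the leading component is not
    # orthogonal to the equal-weight vector.
    v = [1.0 / sqrt(n)] * n
    for _ in range(n_iter):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cv = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(a * b for a, b in zip(v, cv))
    return eigenvalue, v


# Hypothetical word series: five words share a signal, one moves against it.
rng = random.Random(0)
signal = [rng.gauss(0, 1) for _ in range(200)]
words = {f"w{i}": [s + 0.5 * rng.gauss(0, 1) for s in signal] for i in range(5)}
words["odd_one_out"] = [-s + 0.5 * rng.gauss(0, 1) for s in signal]

names = list(words)
eigenvalue, loadings = first_principal_component(
    correlation_matrix([words[name] for name in names])
)
share = eigenvalue / len(names)  # share of total variance on the component
outliers = [name for name, load in zip(names, loadings) if load <= 0]
print("variance share:", share)
print("candidate outliers:", outliers)
```

Words with small or negative loadings are candidates for exclusion in the sense discussed above, while a large variance share suggests that the list as a whole tracks a single signal.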

Figure 4.2.11: Bootstrap distributions of correlations between series generated from excitement word samples and the original US EXC series. Box plots are shown for EXC, EXC Top 50, EXC Norm and EXC Norm Top 50; the correlation axis runs from 0.3 to 0.9.

Figure 4.2.12: Bootstrap distributions of correlations between series generated from anxiety word samples and the original US ANX series. Box plots are shown for ANX, ANX Top 50, ANX Norm and ANX Norm Top 50; the correlation axis runs from 0.5 to 1.0.

However, most of the analysis conducted relied on all information over the available period and might therefore not be suitable to apply in an ex ante test of predictability. The full excitement and anxiety word lists and the original methodology using the simple word frequency aggregation to produce the RSS series will therefore be used in the remaining chapters.

To conclude, the excitement and anxiety lexicons appear suitable to be used to make CNT operational as they are the two most consistent lists, and the anxiety list is also the list with the highest clarity score. The two lists and the RSS measure will now be used in the main experiments to follow this chapter.

5 Experiment 1: A Macro Measure of Relative Sentiment

The first hypothesis states that conviction and emotion drive investment, both in financial markets and in business, which ultimately fuels economic growth. Much investment activity must in some way manage uncertainty. CNT postulates that agents manage to act in the face of such uncertainty by gaining conviction through narratives. This hypothesis can be explored empirically using statistical techniques such as Granger-causality and regression analysis.

This chapter reports on the first experiment of the thesis using the derived relative sentiment shift methodology. The experiment explores the extent to which financial markets and the macroeconomy are driven by macro confidence in the form of aggregate relative sentiment. Before proceeding to test the hypothesis, indices of relative sentiment shifts will be compared to conventional measures of confidence and sentiment in the form of the Michigan Consumer Sentiment Index and the VIX.

5.1 Hypothesis

It is hypothesised that the relative balance between approach and avoidance (excitement and anxiety) determines action under uncertainty. As a consequence of this, the amount of investment made under uncertainty is expected to depend on RSS.

In other words, an increase in medium to long term financial and business investment made under uncertainty should follow an increase in the relative difference between excitement and anxiety among economic actors. (Short term investment is not necessarily as dependent on conviction, although investment strategies themselves likely are, including quantitative and technical strategies.) Similarly, a decrease in investment activity should follow from a decrease in aggregate relative emotion.

Given the complexity of financial markets and the economy as a whole, it is reasonable to assume that most medium to long term decisions have uncertain outcomes and should therefore be driven by conviction. This hypothesis is tested by constructing relative sentiment indices at the macro level and testing how well they predict or lead business investment (and as a consequence also GDP growth).

In document An Algorithmic Investigation of Conviction Narrative Theory: Applications in Business, Finance and Economics (Page 113-119)