2 Analysis of the State of the Art
2.6 Summary of the Research Gap
Table 1: Overview of approaches to study effects of investor sentiment on stock returns.
| Authors | Invest. sent. | Own corpus | Accuracy | Invest. blogs | Abn. ret. | Frequency | Time span | Stocks |
|---|---|---|---|---|---|---|---|---|
| News | | | | | | | | |
| Schumaker et al. (2012) | Yes | Unknown | Unknown | N/A | No | Minutes | 5 weeks | S&P 500 stocks |
| Tetlock et al. (2008) | Yes, only negative | N/A | Unknown | N/A | Yes | Daily | 1980–2004 | S&P 500 stocks |
| Antweiler & Frank (2006) | No | Yes | Unknown | N/A | Yes | Daily | 1973–2001 | Mostly large cap. stocks |
| Leinweber & Sisk (2011) | Yes | No | Unknown | N/A | Yes | Daily | 2004–2009 | S&P 1500 stocks |
| Stock message boards | | | | | | | | |
| Antweiler & Frank (2004) | Yes, only bullishness | Yes | 88.1% (on the training set) | N/A | No | Daily | Year 2000 | DJIA and DJICI stocks |
| Das & Chen (2007) | Yes | Yes | <70% | N/A | No | Daily | 2 months of 2001 | MSH stocks |
| Bollen et al. (2011) | Yes | N/A | Unknown | N/A | No | Daily | ~10 months of 2008 | DJIA stock index |
| Blogs | | | | | | | | |
| O’Hare et al. (2009) | Yes | Yes | 75.1% | Financial blogs | N/A | N/A | N/A | S&P 500 stocks |
| Fotak (2007) | Yes | N/A | N/A | Seekingalpha | Yes | Daily | 2006 | Mostly large cap. stocks |
| Chen et al. (2014) | Yes, only negative | N/A | Unknown | Seekingalpha | Yes | Daily | 2005–2012 | 7422 stocks |
| Zhang & Skiena (2010) | No | N/A | Unknown | Generic topic blogs | No | Daily | 2005–2009 | NYSE stocks |
| Gilbert & Karahalios (2010) | No, only negative | No | Unknown | Generic topic blogs | No | Daily | ~10 months of 2008 | S&P 500 index |
| This thesis | Yes | Yes | 79.2% | Seekingalpha, Blogspot | Yes | Monthly | 2007–2011 | DJIA stocks |

NOTES per column:
“Invest. sent.”: Did the study use a measure of investor sentiment as defined in Section 2.2.2?
“Own corpus”: Did the authors create their own corpus for evaluation and/or training of a classifier? “N/A”: the authors did not necessarily require a corpus. “No”: a corpus created by others was used.
“Accuracy”: Did the study evaluate the accuracy of the investor sentiment classifier?
“Invest. blogs”: Did the study investigate investment blogs? If so, from which platform?
“Abn. ret.”: Did the authors study effects on (unexpected) abnormal returns?
“Frequency”: What is the frequency of the data used in the study of effects on (various forms of) returns?
“Time span”: What is the time period of the data used in the study of effects on returns?
From the review and Table 1, the following specific gaps are identified in the literature.
Explicit Investor Sentiment Classification
With respect to blogs, two studies were based on general-topic blogs and extracted general sentiment (Gilbert & Karahalios, 2010; Zhang & Skiena, 2010). That is, the sentiment does not necessarily relate to investment topics. Fotak (2007) classified individual stock recommendations in investment blog documents. However, Fotak (2007) used data from 2006 and only 500 documents, and did not apply a large-scale investor sentiment classification to blog documents over several years. Furthermore, Chen et al. (2014) neither refer explicitly to investor sentiment nor classify it. Thus, there is a lack of long-term studies based on an automatic classifier of the sentiment orientation of investor sentiment in blog documents. Unlike prior studies, this thesis focuses on classifying investor sentiment in investment blog documents.
Corpus of Investment Blog Documents
Evaluating the accuracy of a classifier requires a corpus of manually classified documents. Unlike for movie review sentiment or subjectivity classification (e.g., Pang & Lee, 2004), there is no standard corpus for investor sentiment (in blog documents). The reason might be that creating a corpus is laborious; for example, O’Hare et al. (2009) found sentiment annotation in blog documents to be difficult. Thus, many authors choose not to create a corpus and instead use a dictionary-based approach or manual classification of investor sentiment (e.g., Chen et al., 2014; Fotak, 2007; Tetlock et al., 2008). Consequently, the classifiers on which these studies are based have rarely been evaluated, which constitutes a research gap. O’Hare et al. (2009) created one of the first financial blog document corpora with classifications of the sentiment orientation of investor sentiment; however, it is not publicly available. Consequently, a novel corpus has been designed in the scope of this thesis. As with the approaches that study investor sentiment in stock message boards (e.g., Antweiler & Frank, 2004), creating a new corpus is quite common in domains that are studied for the first time.
Accurate Classifiers
Only a few studies report the accuracy of their investor sentiment classifiers (i.e., Antweiler & Frank, 2004; Das & Chen, 2007; O’Hare et al., 2009). However, Antweiler & Frank (2004) determined accuracy erroneously (on the training set, see Table 1), and Das & Chen (2007) and O’Hare et al. (2009) did not study effects of investor sentiment on abnormal returns. Studies that use no corpus cannot measure accuracy at all (e.g., Chen et al., 2014; Tetlock et al., 2008; Zhang & Skiena, 2010). Even studies that use a corpus sometimes report only some other metric instead of accuracy (e.g., Gilbert & Karahalios, 2010). Thus, for all reviewed approaches that study effects on abnormal returns, there is a research gap concerning the suitability of the investor sentiment classification for the study. Therefore, this thesis rigorously evaluates the investor sentiment classification performance on a corpus using the standard metric of accuracy (Sokolova & Lapalme, 2009).
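As a minimal illustration of the accuracy metric, the following sketch computes the share of documents whose predicted label matches the manual annotation. The label names and the example data are hypothetical and not taken from the corpus of this thesis; the sketch assumes that the evaluated documents were held out from training, since accuracy measured on the training set tends to overstate performance.

```python
def accuracy(gold_labels, predicted_labels):
    """Share of documents whose predicted label equals the manual (gold) label."""
    if len(gold_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not gold_labels:
        raise ValueError("cannot compute accuracy on an empty corpus")
    correct = sum(1 for g, p in zip(gold_labels, predicted_labels) if g == p)
    return correct / len(gold_labels)


# Hypothetical held-out documents: manual annotations vs. classifier output.
gold = ["positive", "negative", "positive", "negative", "positive"]
predicted = ["positive", "negative", "negative", "negative", "positive"]

# Four of the five predictions match the manual annotation.
print(accuracy(gold, predicted))
```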
Regarding the design of an accurate classifier of the sentiment orientation of investor sentiment in blog documents, O’Hare et al. (2009) proposed a machine learning-based approach, which achieved about 75% accuracy. This thesis also pursues a machine learning-based approach because such approaches have been shown to be usually highly accurate (see Section 2.4.3). O’Hare et al.’s accuracy serves as the baseline accuracy because the accuracies of the other studies on blogs are either unknown (as discussed above) or presumably lower: Chen et al. (2014) and Zhang & Skiena (2010) used a dictionary-based approach for classification, which has been reported to usually achieve lower accuracies than machine learning-based approaches (see Section 2.4.1). O’Hare et al.’s baseline level of accuracy seems reasonable because investor sentiment classification in web documents is a hard problem due to ambiguity (Das & Chen, 2007). To address the research gap of designing an accurate classifier with a machine learning-based approach, this thesis uses an SVM approach (see Section 2.4.3.5). Experiments were conducted to choose the C parameter, which influences the accuracy (see Section 3.2.2). Further parameters that influence the accuracy are embodied in the document-vector transformation (see Section 2.4.3.2). The settings of these parameters were chosen based on the literature (see Section 3.2.1).
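How a C parameter might be chosen experimentally can be sketched as follows. This is only an illustration under stated assumptions and not the experimental setup of Section 3.2.2: it assumes a linear soft-margin SVM trained by plain batch subgradient descent, a single held-out validation split, and hypothetical two-dimensional document vectors (counts of bullish and bearish terms). All function names, candidate values, and data are invented for the example.

```python
def train_linear_svm(samples, c, epochs=200, learning_rate=0.01):
    """Minimize 0.5*||w||^2 + c * sum(hinge losses) by batch subgradient descent."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)   # gradient of the regularizer 0.5*||w||^2
        grad_b = 0.0
        for x, y in samples:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # margin violated: the hinge loss is active
                for j, xj in enumerate(x):
                    grad_w[j] -= c * y * xj
                grad_b -= c * y
        w = [wi - learning_rate * gi for wi, gi in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b


def predict(model, x):
    """Return +1 (positive sentiment) or -1 (negative sentiment)."""
    w, b = model
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def validation_accuracy(model, samples):
    hits = sum(1 for x, y in samples if predict(model, x) == y)
    return hits / len(samples)


def choose_c(train, validation, candidates):
    """Train one SVM per candidate C and keep the first C with the best accuracy."""
    scored = [(validation_accuracy(train_linear_svm(train, c), validation), c)
              for c in candidates]
    best_accuracy = max(acc for acc, _ in scored)
    best_c = next(c for acc, c in scored if acc == best_accuracy)
    return best_c, best_accuracy


# Hypothetical document vectors: (count of bullish terms, count of bearish terms).
train = [((3, 0), 1), ((2, 1), 1), ((0, 3), -1), ((1, 2), -1)]
validation = [((4, 1), 1), ((1, 4), -1)]
candidates = [0.01, 0.1, 1.0]

best_c, best_accuracy = choose_c(train, validation, candidates)
print(best_c, best_accuracy)
```

On realistic data, a cross-validation over several folds would usually be preferable to a single validation split, because a single split can make the chosen C depend on which documents happen to be held out.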
Investment Blog Dataset: Seekingalpha vs. Blogspot
Almost no study covers an investment blog dataset. An exception is the recent work of Chen et al. (2014), who use a Seekingalpha investment blog document dataset spanning several years. Moreover, the studies that report effects of investor sentiment from blog documents mostly refer to a single blog platform (i.e., Fotak (2007) and Chen et al. (2014) refer to the Seekingalpha blog platform, and Gilbert & Karahalios (2010) refer to the LiveJournal blog platform). Zhang & Skiena (2010) might be an exception because they use datasets from LiveJournal and Spinn3r. However, LiveJournal is not investment specific, and Spinn3r consists of blog documents from many sources (Spinn3r, 2015) that are not transparent. Thus, it is not possible to trace any effects back to a certain blog platform. In contrast, this thesis studies effects of investor sentiment from investment blog documents from two specific blog platforms: (1) Seekingalpha, and (2) Blogspot. This thesis is one of the first to study effects of investment blog documents from Blogspot. Furthermore, this thesis compares the effects (their magnitudes and statistical significance) related to the two blog platforms.
Effects on Abnormal Returns
Prior studies report effects of investor sentiment from textual content on various forms of returns (i.e., total returns of stock indexes, total returns of stocks, unexpected abnormal returns of stocks, and abnormal returns of stock portfolios). There is substantial evidence for unexpected abnormal returns of individual stocks on time horizons of up to 60 days (e.g., Leinweber & Sisk, 2011). This evidence indicates price drift, that is, investor sentiment takes time to be incorporated into prices, and it corroborates predictions of behavioral finance theory (see Section 2.2). There is also some evidence for unexpected abnormal returns of stocks related to Seekingalpha investment blog documents containing long and short stock recommendations (Fotak, 2007). However, Fotak (2007) did not study portfolio-level effects of investor sentiment on abnormal returns in various market phases. Chen et al. (2014) provide some evidence in this direction, without explicitly relating it to investor sentiment. Two studies found effects of sentiment from blog documents on stock indexes (Gilbert & Karahalios, 2010; Zhang & Skiena, 2010) and thus support the proposed study of effects of investor sentiment from blog documents. In contrast to these studies, however, this thesis studies effects on abnormal returns of stock portfolios. It also considers transaction costs, unlike most prior studies; the exceptions are Tetlock et al. (2008) and Leinweber & Sisk (2011).
Monthly Frequency
All reviewed studies of effects on (abnormal) returns focus on effects based on daily (or higher) frequency data. That is, investor sentiment is typically aggregated into a daily score and effects on (abnormal) returns are studied on the following day(s). Thus, there is a research gap with respect to higher aggregates and longer-term effects. Unlike previous studies, this thesis aggregates investor sentiment into a monthly investor sentiment index and studies effects on abnormal returns on the portfolio level at monthly frequency. The monthly aggregate accumulates investor sentiment over a longer time period and over a larger number of investors, which potentially reduces noise. Based on predictions of behavioral finance theory (see Section 2.2), the monthly aggregates of investor sentiment are assumed to have long-term effects (i.e., at least one month into the future). The monthly frequency is also common in mutual fund performance evaluation (e.g., Carhart, 1997). Thus, such effects would be relevant for exploitation in a fund context.
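The aggregation step can be illustrated with a short sketch. It assumes, purely for illustration, that each document carries a publication date and a document-level sentiment of +1 (positive) or -1 (negative), and that the monthly index is the plain average of these values; the actual index definition of this thesis may differ. The function name and the example documents are hypothetical.

```python
from collections import defaultdict
from datetime import date


def monthly_sentiment_index(documents):
    """Average document-level sentiment (+1 or -1) per calendar month.

    `documents` is an iterable of (publication_date, sentiment) pairs.
    Returns a dict that maps (year, month) to the mean sentiment of that month.
    """
    by_month = defaultdict(list)
    for published, sentiment in documents:
        by_month[(published.year, published.month)].append(sentiment)
    return {month: sum(values) / len(values)
            for month, values in sorted(by_month.items())}


# Hypothetical classified blog documents.
docs = [
    (date(2008, 1, 3), +1),
    (date(2008, 1, 17), -1),
    (date(2008, 1, 29), +1),
    (date(2008, 2, 5), -1),
    (date(2008, 2, 20), -1),
]

index = monthly_sentiment_index(docs)
for month, value in index.items():
    print(month, round(value, 3))
```

In a study of effects at monthly frequency, the index value of one month would then be related to abnormal returns of the following month(s), so that only sentiment available before the return period is used.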
Multiyear Time Span
The time span of the datasets used in studies of news content is typically quite long and can stretch over several decades. However, when studying web information content (i.e., stock message boards, Twitter posts, and blogs), the datasets are typically much shorter, i.e., less than a year. Datasets spanning less than one year might cover only a distinct market phase such as the financial crisis year of 2008 (e.g., Bollen et al., 2011; Gilbert & Karahalios, 2010). The datasets of Zhang & Skiena (2010) cover longer time periods. However, Zhang & Skiena (2010) use several datasets for different textual sources, and those for web content (i.e., Twitter posts and blogs) cover at most three years. In contrast, the datasets used in the study of this thesis cover a five-year period that includes different market phases, which allows for a meaningful portfolio simulation to study effects on abnormal returns of a portfolio of stocks.
Specific Stocks
Some (micro) blog-related studies focus on effects of sentiment on the level of a stock index (Bollen et al., 2011; Gilbert & Karahalios, 2010). That is, they do not relate to specific, individual stocks. The study of Fotak (2007) is on the stock level but only with respect to mentions in 500 blog documents. Chen et al. (2014) and Zhang & Skiena (2010) did not restrict their datasets to specific stocks; their datasets contain documents referring to 7422 and 3238 different stocks, respectively. Thus, their daily aggregate of negative words or investor sentiment is presumably based on only a few (blog) documents for most of the stocks, because most of these stocks are simply not very well known. The low number of (blog) documents may result in a low-quality (i.e., noisy) measure. Applied to such sparsely covered stocks, the investor sentiment index proposed in this thesis would be of even lower quality because it aggregates a document-level measure of investor sentiment instead of a word-level one. Thus, this thesis uses datasets of blog documents that were specifically retrieved from Seekingalpha and Blogspot to refer to large-capitalization DJIA stocks, for which the number of blog documents on these blog platforms should be highest.
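The noise argument above can be made concrete with a small calculation. The sketch assumes, as a simplification that is not taken from this thesis, that document sentiments are independent +1/-1 labels with a fixed share of positive documents; under this assumption, the standard error of the average sentiment shrinks with the square root of the number of documents. The function name is hypothetical.

```python
import math


def standard_error_of_mean_sentiment(share_positive, n_documents):
    """Standard error of the mean of n independent +1/-1 document labels.

    A single label has variance 1 - (2p - 1)^2 = 4p(1 - p) for share p of
    positive documents; the mean of n labels has this variance divided by n.
    """
    if n_documents <= 0:
        raise ValueError("n_documents must be positive")
    variance = 4.0 * share_positive * (1.0 - share_positive)
    return math.sqrt(variance / n_documents)


# A stock with few documents vs. a widely covered stock (balanced sentiment).
for n in (4, 100):
    print(n, standard_error_of_mean_sentiment(0.5, n))
```

Under these assumptions, an aggregate based on 4 documents is five times as noisy as one based on 100 documents, which is in line with the decision to focus on stocks with many blog documents.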