jpulliza_Masters_Paper.pdf

(1)

Jonathan L. Pulliza. An Analysis of Speculative Language in SEC 10-K Filings. A Master’s Paper for the M.S. in I.S degree. May 2015 36 pages. Advisor: Stephanie W. Haas

This study applies sentiment analysis techniques to model the usage of speculation within a collection of financial documents. The model is trained on the MPQA corpus to extract features that correlate with speculative sentences and applied to a collection of SEC 10-K documents from a five year period. The documents with the highest amount of

speculation contained a different concentration of terms compared to the entire collection, and the sentences mostly consisted of explaining potential risks concerning projects, taxes, and pensions.

Headings:

Sentiment Analysis

Text Mining

Machine Learning

(2)

AN ANALYSIS OF SPECULATIVE LANGUAGE IN SEC 10-K FILINGS

by

Jonathan L. Pulliza

A Master’s paper submitted to the faculty of the School of Information and Library Science of the University of North Carolina at Chapel Hill

in partial fulfillment of the requirements for the degree of Master of Science in

Information Science.

Chapel Hill, North Carolina

May 2015

Approved by

(3)

“Even where collateral is provided, capital market disruptions, the lowered rating or insolvency of the issuer or guarantor, changes in 33the power supply market prices and other events may prevent a party from being able to meet its obligations or may degrade the value of collateral, letters of credit and guarantees, and the collateral, guarantee or other performance assurance provided may prove insufficient to protect against all losses that a party may ultimately suffer.”

"The closure of the plan to entrants after December 31, 2013 and the cessation of benefit accruals in 2023 are expected to further lessen the significance of pension costs and the criticality of the related estimates to the Company’s financial statements."

5. Discussion

The model was trained using the MPQA corpus, which had a small percentage of

speculation annotations compared to the size of the collection. It would be interesting to

(30)

between the uses of speculation intensity and underlying operations, but this study was

limited by the amount of speculation sentences available in the MPQA corpus.

Transplanting the trained model onto financial documents assumed that speculation is

used in the same way across domains and the resulting extracted sentences seem to

corroborate that theory, though a full analysis by human annotators would be necessary to

confirm.

The most discriminating features that correlated with the speculation class also

coincided with the auxiliary words highlighted by Velldal et al. [7], which mostly

consisted of modal terms such as may, could and might. The model was limited to

extracting features sentence by sentence due to the structure of the corpus. There may be

discourse-level features within each 10-K document that are strong indicators of

speculation, such as references to previous passages or position within the document.

There was a difference in term usage when comparing the top speculative

documents versus the entire collection. The increased usage of terms such as regulation

and capital point to possible themes within these documents, such as dealing with

government agencies and raising or investing funds into the firm. Deeper analysis is

required from a financial expert in order to assess whether there are real structural

challenges that are particular to the top speculative companies, but this system has at least

shown that there are discernable differences in language usage for speculative documents.

The rankings of firm filings by year show that there is some variability in language

usage across time, perhaps coinciding with the findings of Brown and Tucker that

business make a significant amount of changes to filings when there are large economic

(31)

group of firms in the top and bottom of the speculative rankings over the period. Having

a high amount of speculation may not be a negative indicator of performance but rather a

positive indicator of the quality of the document. It may be the case that all of these firms

have the same issues but a small group chose to reveal and explain their weaknesses

much more or much less compared to the rest of the industry. It is also interesting that the

2009 ranking is so different from the rest of the years, which could be a function of some

industry shift or of the economic volatility of the time. Further research could focus on

this pivotal time period, perhaps growing the corpus to include other recessions.

Future analysis could include some more structured groupings, including firms

that have recently gone through a large merger, firms with shareholder lawsuits, and

firms that have been flagged by their auditors as having poor internal controls, which

were all highlighted by the work of Loughran and McDonald [15]. It would also be

interesting to see how speculation is distributed across industries and if it correlates with

other industry measures such as competition.

6. Conclusion

This study created a model for speculative language based on the MPQA corpus

and applied the resulting model to a corpus of financial documents. The documents with

the most speculative sentences contained a different concentration of terms compared to

the complete collection on a TF.IDF basis. Specifically terms such as regulation, fund,

and supplier were ranked much higher in the documents with the highest amount of

speculative sentences. Upon further examination some of the sentences labeled as

speculative contained discussion regarding potential risks to the firm, especially

(32)

particular groups of sentences that perhaps could be overlooked within these large

documents by a human reader.

Speculation is a necessity when communicating possible future events, so there

needs to be some level of speculation present in all financial documents. Firms deciding

to overuse speculation may be cloaking risks, but firms that choose to underutilize

speculation may not be fully expressing the possible opportunities and dangers to the

firm. In this study the relative concentration of speculative sentences was taken as the

barometer for the abnormality of a document, and therefore its worthiness of further

inspection. That does not necessarily mean that any of these firms were using speculation

inappropriately. The statements within these filings are very particular to the operations

of the firm at a point in time, and one firm’s choice to use more or less speculative

language compared to the rest of the industry may be more indicative of its particular

situation than any planned manipulation of the text for the purpose of obfuscating

financial results.

Speculative sentences are often connected to some underlying risk that an event

may or may not occur, and capturing the amount of speculation in a document could be

critical to a field such as finance which is built around risk modeling. This study will

hopefully improve the visibility of this issue within financial filings. If the SEC continues

to enforce the “plain English” guidance, these document should reflect the actual

underlying sentiment of the firm’s management, and therefore serve as an important

(33)

Bibliography

[1] “SEC.gov | The Investor’s Advocate: How the SEC Protects Investors, Maintains Market Integrity, and Facilitates Capital Formation,” 2013. [Online]. Available: https://www.sec.gov/about/whatwedo.shtml#.VRWP5e7F854. [Accessed: 27-Mar-2015].

[2] “Authoritative Source of Guidance,” Federal Accounting Standards Advisory Board. [Online]. Available:

http://www.fasab.gov/accounting-standards/authoritative-source-of-gaap/#gaap. [Accessed: 27-Mar-2015].

[3] O. E. Dictionary, “‘speculation, n.’. .” .

[4] S. Aman and S. Szpakowicz, “Using Roget ’ s Thesaurus for Fine-grained Emotion Recognition,” in International Joint Conference on Natural Language Processing, 2007, pp. 312–318.

[5] V. Vincze, G. Szarvas, R. Farkas, G. Móra, and J. Csirik, “The BioScope corpus: biomedical texts annotated for uncertainty, negation and their scopes,” BMC Bioinformatics, vol. 9, no. Suppl 11, pp. S9–S9, Nov. 2008.

[6] R. Farkas, V. Vincze, G. Móra, J. Csirik, and G. Szarvas, “The CoNLL-2010 Shared Task: Learning to Detect Hedges and Their Scope in Natural Language Text,” in Proceedings of the Fourteenth Conference on Computational Natural Language Learning --- Shared Task, 2010, pp. 1–12.

[7] E. Velldal, L. Øvrelid, J. Read, and S. Oepen, “Speculation and Negation: Rules, Rankers, and the Role of Syntax,” Computational Linguistics, vol. 38, no. 2. MIT Press, Cambridge, MA, USA, pp. 369–410, Jun-2012.

[8] B. Pang, L. Lee, H. Rd, and S. Jose, “Thumbs up ? Sentiment Classification using Machine Learning Techniques,” in Proceedings of the ACL-02 conference on Empirical methods in natural language processing, 2002, vol. 10.

[9] E. Boiy and M.-F. Moens, “A machine learning approach to sentiment analysis in

(34)

[10] S. Dasgupta and V. Ng, “Mine the Easy , Classify the Hard : A Semi-Supervised Approach to Automatic Sentiment Classification,” in ACL ’09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, 2009, pp. 701– 709.

[11] “About WordNet,” Princeton University, 2010. [Online]. Available: http://wordnet.princeton.edu/. [Accessed: 29-Oct-2014].

[12] S. Kim, M. Rey, and E. Hovy, “Determining the Sentiment of Opinions,” in

COLING Conference, 2004.

[13] “SentiWordNet.” [Online]. Available: http://sentiwordnet.isti.cnr.it/. [Accessed: 29-Oct-2014].

[14] A. Aue and M. Gamon, “Customizing Sentiment Classifiers to New Domains : a Case Study,” 2005.

[15] T. I. M. Loughran and B. Mcdonald, “When Is a Liability Not a Liability ? Textual Analysis , Dictionaries , and 10-Ks,” J. Finance, vol. LXVI, no. 1, pp. 35–65, 2011.

[16] Y. Choi, Y. Kim, and S.-H. Myaeng, “Domain-specific sentiment analysis using contextual feature generation,” Proceeding 1st Int. CIKM Work. Top. Anal. mass Opin. - TSA ’09, p. 37, 2009.

[17] T. Nasukawa and J. Yi, “Sentiment analysis: Capturing favorability using natural language processing,” Proc. 2nd Int. Conf. Knowl. capture, pp. 70–77, 2003.

[18] L. Qiu, W. Zhang, C. Hu, and K. Zhao, “SELC : A Self-Supervised Model for Sentiment Classification,” in Proceedings of the 18th ACM conference on Information and knowledge management, 2009, pp. 929–936.

[19] J. Lafferty, A. McCallum, and F. C. N. Pereira, “Conditional random fields: Probabilistic models for segmenting and labeling sequence data,” ICML ’01 Proc. Eighteenth Int. Conf. Mach. Learn., pp. 282–289, 2001.

[20] I. G. Councill, R. Mcdonald, and L. Velikovich, “What ’ s Great and What ’ s Not : Learning to Classify the Scope of Negation for Improved Sentiment Analysis,” in

NeSp-NLP ’10 Proceedings of the Workshop on Negation and Speculation in Natural Language Processing, 2010, no. July, pp. 51–59.

(35)

[22] T. A. Wilson, “Fine-Grained Subjectivity and Sentiment Analysis: Recognizing the Intensity, Polarity, and Attitudes of Private States,” University of Pittsburgh, 2008.

[23] “MPQA Opinion Corpus.” [Online]. Available:

http://mpqa.cs.pitt.edu/corpora/mpqa_corpus/. [Accessed: 29-Oct-2014].

[24] C. Chen, C. Liu, Y. Chang, and H. Tsai, “Opinion Mining for Relating Subjective Expressions and Annual Earnings in US Financial Statements,” J. Inf. Sci. Eng., vol. 29, no. 3, 2013.

[25] U. States and E. Commission, “SECURITIES AND EXCHANGE COMMISSION FORM 10-K ANNUAL REPORT PURSUANT TO SECTION 13 OR 15 ( d ) OF THE SECURITIES EXCHANGE ACT OF 1934,” 2015.

[26] T. Loughran and B. McDonald, “Regulation and financial disclosure: The impact of plain English,” J. Regul. Econ., vol. 45, no. 1, pp. 94–113, Nov. 2013.

[27] F. Li, “Annual report readability, current earnings, and earnings persistence,” J. Account. Econ., vol. 45, no. 2–3, pp. 221–247, Aug. 2008.

[28] T. Loughran and B. Mcdonald, “Measuring Readability in Financial Disclosures,”

J. Finance, vol. 69, no. 4, pp. 1643–1671, Aug. 2014.

[29] R. Pozen, “Final Report of the Advisory Committee on Improvements to Financial Reporting to the United States Securities and Exchange Commission,” 2008.

[30] S. V. Brown and J. W. Tucker, “Large-Sample Evidence on Firms’ Year-over-Year MD&A Modifications,” J. Account. Res., vol. 49, no. 2, pp. 309–346, May 2011.

[31] R. Feldman, S. Govindaraj, J. Livnat, and B. Segal, “Management’s tone change, post earnings announcement drift and accruals,” Rev. Account. Stud., vol. 15, pp. 915–953, 2010.

[32] A. K. Davis and I. Tama-Sweet, “Managers’ Use of Language Across Alternative Disclosure Outlets: Earnings Press Releases versus MD&A*,” Contemp. Account. Res., vol. 29, no. 3, pp. 804–837, Sep. 2012.

[33] F. Li, R. Lundholm, and M. Minnis, “A Measure of Competition Based on 10-K Filings,” J. Account. Res., vol. 51, no. 2, p. no–no, Jan. 2013.

(36)

[35] K. W. Hanley and G. Hoberg, “The information content of IPO prospectuses,”

Rev. Financ. Stud., vol. 23, pp. 2821–2864, 2010.

[36] “SEC.gov | EDGAR Company Filings | CIK Lookup.” [Online]. Available: http://www.sec.gov/edgar/searchedgar/cik.htm#.VFbFL5DF-Cs. [Accessed: 02-Nov-2014].

[37] L. Richardson, “Beautiful soup,” Crummy Site, 2013.

jpulliza_Masters_Paper.pdf

Table of Contents

1.

Introduction

2.

Survey of Previous Work

3.

Methods

4.

Results

5.

Discussion

6.

Conclusion