Machine Learning Applications in Algorithmic Trading


Ryan Brosnahan, Ross Rothenstine

Introduction:

Computer-controlled algorithmic trading is used by financial institutions around the world and accounted for over 73% of the total trading volume in the United States in 2009. In this paper we explore and compare various techniques in algorithmic trading, with emphasis on methods that employ machine learning. We also propose a novel algorithmic trading methodology that integrates the concept of Threshold Recurrent Reinforcement Learning (TRRL), originally outlined by Maringer and Ramtohul, with lower-level nodes derived from indirect learning methods similar to recurrent neural networks. We find that as technology improves, it becomes more practical to explore high-dimensional, long-term temporal data at shorter intervals, leading to more intelligent and dynamic multilevel neural meta-networks, which are desirable for responsive threshold identification.

Algorithmic Trading

Algorithmic trading has typically centered on labor-intensive methods performed by human quantitative analysts, or quants. Quants often come from computational backgrounds such as engineering and the sciences, and apply their skills to the financial markets. They are responsible for all aspects of algorithmic trading, from data management to portfolio management. Recently, however, there has been increasing interest in using machine learning for algorithm discovery.


Data

Acquiring and sanitizing data is the first step of the algorithmic trading process. The type of data required varies significantly with the goals of the trading strategy: high-frequency trading (HFT) requires low-latency data on a few variables, whereas long-term strategies may use company financial data in their algorithms. The finest usable time granularity for price data is limited by the speed at which an exchange can receive, execute, and confirm an order, plus the travel time between the user and the exchange; most exchanges have a turnaround time of less than 3 ms. Fast data, however, comes at great cost:

Source               Cost            Frequency    Quality        Latency
Yahoo Finance        Time            >1s          Unreliable     >5s
IQ Feed              ~$100/month     Basic Tick   Reliable       <500ms
Bloomberg Data Feed  ~$1,800/month   Basic Tick   Very Reliable  <5ms
Google Finance       No longer available as of 22 October 2012

In longer-term strategies, additional data sources could include financial and accounting data, economic data, sentiment data, and others. Some suggested sources for these include Compustat, SEC Filings, Bureau of Economic Analysis, Bureau of Labor Statistics, and World Bank.

Model Development

Risk:

Each model must be evaluated critically to maximize profit while remaining within the limits of acceptable risk tolerance. The two risk performance metrics most often used are the Sharpe ratio and the Sterling ratio.

\[ S = \frac{E[R_a - R_b]}{\sqrt{\operatorname{var}[R_a - R_b]}} \]

where $R_a$ is the asset return and $R_b$ is the return of some benchmark asset, usually the S&P 500 index.

[Figure: data time granularity, from coarsest to finest: Monthly, Daily, Hourly, Minute, Tick]


[ ]

The Sharpe ratio penalizes variance of any kind, including swings in profit, whereas the Sterling ratio penalizes only periods of large loss. The literature most often uses the Sharpe ratio, and we are inclined to agree that it is the superior risk metric.
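The contrast between the two metrics can be illustrated with a minimal sketch. The Sharpe function follows the usual definition, the mean excess return over a benchmark divided by the standard deviation of that excess. The exact Sterling definition intended here is not recoverable from the text, so `drawdown_ratio` is only an assumed, simplified stand-in (mean return divided by maximum drawdown). All function names are illustrative.

```python
from statistics import mean, pstdev


def sharpe_ratio(asset_returns, benchmark_returns):
    """Mean excess return divided by the standard deviation of excess returns."""
    excess = [ra - rb for ra, rb in zip(asset_returns, benchmark_returns)]
    spread = pstdev(excess)  # population standard deviation
    if spread == 0:
        raise ValueError("excess returns have zero variance")
    return mean(excess) / spread


def max_drawdown(returns):
    """Largest peak-to-trough fractional loss of the compounded equity curve."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


def drawdown_ratio(returns):
    """Assumed Sterling-style stand-in: mean return over maximum drawdown."""
    drawdown = max_drawdown(returns)
    if drawdown == 0:
        raise ValueError("series has no drawdown")
    return mean(returns) / drawdown


# Two series with no losing periods; the second has larger swings in profit.
calm = [0.01, 0.02, 0.01, 0.02]
spiky = [0.01, 0.10, 0.01, 0.10]
benchmark = [0.0, 0.0, 0.0, 0.0]
print(sharpe_ratio(calm, benchmark), sharpe_ratio(spiky, benchmark))
```

In this toy data the second series earns more in every period yet scores a lower Sharpe ratio, because its profits vary more. A drawdown-based measure would not penalize either series, since neither ever loses value.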

Long-Term:

As demonstrated by famous value investors Benjamin Graham and Warren Buffett, investing in companies trading at a discount to fair value is the best strategy for maximizing long-term profit. Using the tool Portfolio123, we can quantify and backtest a strategy that emulates these great investors. The Piotroski Time Hedge model developed by accounting professor Dr. Joseph Piotroski is a value investing algorithm that has shown excellent performance in backtests.

Returns of Piotroski algorithm compared to S&P 500 SPDR ETF

The algorithm was found empirically and can be constructed using dynamic programming techniques. The simplified algorithm is as follows, where the securities are sorted stably at each step.

1. Reduce domain to OTC securities

2. Reduce domain to reasonably liquid securities

3. Reduce domain to securities with EPS above breakeven

4. Reduce domain to securities with operating income above breakeven

5. Reduce domain to securities with operating cash flow per share above breakeven

6. Reduce domain to securities that improved gross margin from the previous year

7. Reduce domain to securities with operating cash flow per share above EPS

8. Reduce domain to securities that decreased the debt to assets ratio from the previous year

9. Reduce domain to securities that increased their current ratio from the previous year

10. Reduce domain to securities that increased their asset turnover from the previous year

11. Reduce domain to securities that increased return on assets from the previous year

12. Reduce domain to securities that did not increase the number of outstanding shares from the previous year

Every four weeks the algorithm rebalances the portfolio with the k best choices.
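The screening steps listed above can be sketched as a chain of filters followed by a stable ranking. This is only an illustration under stated assumptions: the field names, the `MIN_DAILY_VOLUME` liquidity cutoff, and the caller-supplied `score` function are hypothetical, since the text specifies neither a liquidity threshold nor the criterion that defines the k best choices. "Above breakeven" is read here as "greater than zero".

```python
MIN_DAILY_VOLUME = 10_000  # hypothetical liquidity cutoff; the text gives none

# One predicate per numbered step; all field names are illustrative.
SCREENS = [
    lambda s: s["otc"],                                        # 1
    lambda s: s["avg_daily_volume"] >= MIN_DAILY_VOLUME,       # 2
    lambda s: s["eps"] > 0,                                    # 3
    lambda s: s["operating_income"] > 0,                       # 4
    lambda s: s["ocf_per_share"] > 0,                          # 5
    lambda s: s["gross_margin"] > s["gross_margin_prev"],      # 6
    lambda s: s["ocf_per_share"] > s["eps"],                   # 7
    lambda s: s["debt_to_assets"] < s["debt_to_assets_prev"],  # 8
    lambda s: s["current_ratio"] > s["current_ratio_prev"],    # 9
    lambda s: s["asset_turnover"] > s["asset_turnover_prev"],  # 10
    lambda s: s["roa"] > s["roa_prev"],                        # 11
    lambda s: s["shares_out"] <= s["shares_out_prev"],         # 12
]


def select_portfolio(universe, k, score):
    """Apply every screen in order, then return the k top-scoring survivors."""
    survivors = list(universe)
    for keep in SCREENS:
        survivors = [s for s in survivors if keep(s)]
    # sorted() is stable, so securities with equal scores keep their input order.
    ranked = sorted(survivors, key=score, reverse=True)
    return ranked[:k]
```

A rebalance would call `select_portfolio(universe, k, score)` once every four weeks with fresh fundamentals, for example with `score=lambda s: s["roa"]` as one possible ranking.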

The algorithm boasts a 28% annualized return compared to 6.22% for the S&P 500 over a 10-year period, with a Sharpe ratio of 1.07 compared to the S&P 500’s 0.11. The algorithm is based loosely on the idea of reversion to fair valuation. Because of the longer timespan, data such as the real underlying value of the security and other economic news and forecasts are the major driving forces of price.

Although these numbers are impressive, it is hubris to expect that this algorithm will perform equally well into the future.

Short-term:

Previously, quants manually explored and optimized trading models, but there is an emerging trend of employing reinforcement learning methods to create trading algorithms. Popular direct methods include:

Q-Learning – Uses an action-value function to discover the maximum-utility action at a given state, and follows a greedy policy thereafter. A recent variation, delayed Q-learning, promotes further exploration of the state space during runtime by applying probably approximately correct (PAC) learning bounds to the Markov decision process; this forced exploration reduces error from misclassification.

Temporal Difference Learning – Combines ideas from Monte Carlo methods and dynamic programming. Like Monte Carlo methods, it learns from sampled experience; like dynamic programming, it bootstraps, so when there is a temporal gap between the current time and the point at which the final outcome is known, TD fills the gap by updating its estimates from other current estimates.

Advantage Updating – At each state x the algorithm determines V(x), the total discounted return expected when starting at x and performing optimal actions. For each state x and action u, the advantage A(x, u) is stored, representing the degree to which the expected total discounted reinforcement is increased by performing action u (followed by optimal actions thereafter) relative to the action currently considered best. Because this data is stored, there is no need for continuous recalculation of higher-order partial derivatives with respect to a state or action for optimization purposes. Advantage updating (AU) performs significantly better than Q-learning on continuous-time and stochastic systems, which makes it well suited to HFT securities price data. (Baird)
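As a concrete reference point for the methods above, the following is a minimal sketch of plain tabular Q-learning with an epsilon-greedy policy. It is not delayed Q-learning or advantage updating. The state and action labels, and the values of `alpha`, `gamma`, and `epsilon`, are illustrative assumptions. The `td_error` term is the temporal-difference quantity that bootstraps from the current estimate of the next state's value.

```python
import random
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.95):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


def choose_action(q, state, actions, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


# Toy usage: unseen (state, action) pairs default to a value of 0.0.
q = defaultdict(float)
actions = ["long", "flat", "short"]
q_update(q, "uptrend", "long", 1.0, "uptrend", actions)
print(choose_action(q, "uptrend", actions, epsilon=0.0))
```

With `epsilon=0.0` the policy is purely greedy, which matches the "greedy policy thereafter" behavior described for Q-learning. A positive `epsilon` trades some immediate reward for exploration of the state space.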

The disadvantages of pure direct methods are the training time required and the calculation of partial derivatives for gradient ascent at every state, a process that is computationally expensive and can converge to a local maximum even when superior policies exist.

Our Approach:

We propose a two-level approach: the bottom level is a set of model-based neural network nodes, and the top level is a direct reinforcement learning algorithm that applies recurrent reinforcement learning to data derived from the nodes. We hypothesize that the lower-level models will aid in reducing the dimensionality of the system, reducing the computational complexity for the top-level direct methods.

The top-level learner, the brain of the system, uses techniques similar to delayed Q-learning, but takes its input data from the models in the lower level. The brain provides feedback to the nodes, shutting off those that do not aid in the forecast. This will permit the use of more data, with the expectation that dimensionality will be reduced by the direct learning in the brain.
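The structure of the two-level design can be sketched as follows. Since the framework is still under construction, everything here is a hypothetical illustration rather than the project's implementation: the two toy nodes, the fixed weights, the `prune` threshold rule, and the recurrent tanh combination (a form commonly used in recurrent reinforcement learning traders) are all assumptions, and no learning step is shown.

```python
import math


def momentum_node(prices):
    """Hypothetical lower-level node: sign of the latest price change."""
    change = prices[-1] - prices[-2]
    return (change > 0) - (change < 0)


def mean_reversion_node(prices):
    """Hypothetical lower-level node: bet against deviation from the window mean."""
    average = sum(prices) / len(prices)
    return math.tanh((average - prices[-1]) / average)


class Brain:
    """Hypothetical top level: gated, recurrent combination of node signals."""

    def __init__(self, nodes, weights, recurrent_weight=0.0):
        self.nodes = nodes
        self.weights = list(weights)
        self.recurrent_weight = recurrent_weight
        self.active = [True] * len(nodes)
        self.position = 0.0  # previous output, fed back on the next step

    def prune(self, min_weight):
        """Feedback to the nodes: switch off those with negligible weight."""
        self.active = [abs(w) >= min_weight for w in self.weights]

    def step(self, prices):
        """Return a position in (-1, 1): short below zero, long above."""
        total = self.recurrent_weight * self.position
        for node, weight, on in zip(self.nodes, self.weights, self.active):
            if on:  # switched-off nodes are never evaluated
                total += weight * node(prices)
        self.position = math.tanh(total)
        return self.position


brain = Brain([momentum_node, mean_reversion_node], weights=[1.0, 0.01])
brain.prune(min_weight=0.1)  # the second node is shut off
print(brain.step([100.0, 101.0, 103.0]))
```

Skipping switched-off nodes entirely is the point of the gating: the top level only pays the cost of the lower-level models it still finds useful, which is one way the hoped-for reduction in dimensionality could show up in practice.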


The project is still in development and we are still constructing a proper model framework. The development process of this project is very front-heavy, with significant progress required before any useful data can be obtained.

Future

Ryan hopes to continue exploring this project in the future, finishing development and testing the multilevel brain-node network.