Evaluation Criteria - A Novel Algorithmic Trading Framework

This section addresses how the trading systems, and this project as a whole, are evaluated. Evaluation is a much-discussed issue in the field of artificial intelligence (AI). Cohen and Howe claim that many papers do an inadequate job of evaluating the results of their AI systems [46].


The systems described in this thesis have an inherent black-box nature. This means it is practically impossible to understand the rationale behind why a system delivers a particular result, which complicates the evaluation. It is therefore important to evaluate against clear relative benchmarks and to avoid pitfalls that would invalidate the results, such as overfitting or data leakage. To this end, Cohen and Howe's five-stage process for evaluating empirical AI systems is used [46]:

1. Find a task and a view of how to accomplish it
2. Refine the view to a specific method
3. Develop a program implementing the method
4. Design experiments to test the program
5. Run experiments

Ideally, this development process should be run as iterative cycles. However, because of the formal structure linking this thesis to the previous project, the process has been divided into two phases. The identification of task and method was mostly completed in the previous project. This project, the main thesis, largely consists of the implementation and, naturally, some redesign. Still, the system is built to be adaptable. Many different methods can therefore be tested, and a broader range of experiments can be run without redesigning the whole system.

The task is to beat the market index. In more quantifiable terms, this entails yielding statistically significantly larger returns. In addition, the goal is to do so consistently, so the variance of these returns is also important. The general hypothesis is that this can be achieved by the use of time series analysis and biologically inspired regression techniques. A set of specific systems is defined in the trading systems section 4.1. Instead of creating a program to test only one method, a framework has been built that can test all these methods and supports rapid further adjustments.
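The phrase "statistically significantly larger returns" can be made concrete with a one-sided test on daily excess returns. The sketch below is an illustration only. The thesis does not name a test. The choice of a t statistic with a normal approximation to its distribution is an assumption, and it is reasonable only for long samples of daily data. The helper name `excess_return_test` is also hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def excess_return_test(portfolio_returns, benchmark_returns):
    """One-sided test of H0: mean daily excess return <= 0.

    Hypothetical helper. Returns (t_stat, p_value). The p-value uses a
    standard normal approximation to the t distribution, which is only
    reasonable when the sample spans hundreds of trading days.
    """
    if len(portfolio_returns) != len(benchmark_returns):
        raise ValueError("return series must be aligned and of equal length")
    excess = [p - m for p, m in zip(portfolio_returns, benchmark_returns)]
    n = len(excess)
    if n < 2:
        raise ValueError("need at least two observations")
    spread = stdev(excess)  # sample standard deviation of excess returns
    if spread == 0:
        raise ValueError("excess returns have zero variance")
    t_stat = mean(excess) / (spread / sqrt(n))
    p_value = 1.0 - NormalDist().cdf(t_stat)  # upper-tail probability
    return t_stat, p_value
```

A small p-value, for example below 0.05, would support rejecting the hypothesis that a system does no better than the benchmark. Daily returns are often autocorrelated and heavy-tailed, so a more careful study might prefer a bootstrap or an autocorrelation-robust standard error.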

5.2.1 Evaluation

The system described in this paper is a practical work of engineering. The aim of this project is not to evaluate some AI technique. The system is designed with the goal of generating a good return on investment, and any evaluation of the AI techniques it employs is made with that aim in mind. The system uses several complementary techniques in a competitive environment. This is meant to improve the system in its ultimate goal of generating a higher return than the rest of the market, not to test which technique is best. The main evaluation efforts will thus go to measuring different configurations of the system against standardized portfolios, such as indexes and conventional portfolio-theoretical constructs. This paper is based on the hypothesis that neural networks and evolution in combination can be used to build effective portfolios, but there is no specific design in question. The experiments are thus exploratory in nature. Their aim is to identify good portfolio-building systems that have these techniques as central components. The focus is entirely on stock market performance, so stock market data is the only data sample the system uses.

Component Evaluation

The system consists of many different, largely independent nodes that are used to build modules. As these modules are independent, they can be tested as separate entities much more easily than the complete system. The functionality of modules such as data input, output and evaluation is common to all trading systems. These modules are tested for correctness independently of other components. They are deterministic and simple to test, and need no further discussion. All components are individually tested before they are joined into larger, more complex systems.
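As an illustration of testing a deterministic node in isolation, the sketch below pairs a hypothetical price-to-returns node with pytest-style test functions. The tests check the node against hand-computed values. The node `simple_returns` and its tests are assumptions made for this example, not components taken from the framework.

```python
def simple_returns(prices):
    """Hypothetical deterministic node: daily simple returns from a price series."""
    if len(prices) < 2:
        raise ValueError("need at least two prices")
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]


def test_simple_returns_known_values():
    # 100 -> 110 is +10 %, 110 -> 99 is -10 %. Compare with a tolerance
    # because 1.1 and 0.9 are not exact in binary floating point.
    result = simple_returns([100.0, 110.0, 99.0])
    assert len(result) == 2
    assert abs(result[0] - 0.10) < 1e-12
    assert abs(result[1] + 0.10) < 1e-12


def test_simple_returns_rejects_short_input():
    try:
        simple_returns([100.0])
    except ValueError:
        return
    raise AssertionError("expected ValueError for a single price")
```

The node is a pure function of its input, so the tests need no market data or mocks. They can be collected by pytest or simply called directly.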

Portfolio Evaluation

The time series predictions can be tested on historical data, and the prediction error is easily calculated. The portfolio agents are harder to evaluate, as there is no universally accepted optimal portfolio. The essential parameters are risk and return, and the difficulty lies in how to evaluate risk. A common approach is to use historical returns and variance to classify the quality of a portfolio, and then to compare these with an appropriate index. In this project, a uniform buy-and-hold portfolio is used as the index, or benchmark, against which results are compared. Specifically, the portfolios are evaluated on:

• The mean excess daily return relative to the benchmark, $\bar{r}_p - \bar{r}_m$

• The variance relative to the benchmark, $\sigma_p^2 / \sigma_m^2$

• The excess cumulative return compared with the benchmark, $R_p - R_m$

Here $p$ denotes the portfolio, $m$ the benchmark, $\bar{r}$ the mean daily return, $\sigma^2$ the variance of daily returns, and $R$ the cumulative return over the period in question. As this project studies a real-life application, the excess cumulative return is weighted most heavily, since it shows the gains that could actually be achieved.
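The following is a minimal sketch of the three relative measures. It assumes aligned daily simple-return series for the portfolio and the benchmark, and computes $R$ by compounding the daily returns. The function names are illustrative. Using population variance is also a choice made here. Both variances share the same normalization, so the ratio $\sigma_p^2/\sigma_m^2$ does not depend on it.

```python
from math import prod
from statistics import mean, pvariance


def cumulative_return(daily_returns):
    """R = prod(1 + r_t) - 1 over the whole evaluation period."""
    return prod(1.0 + r for r in daily_returns) - 1.0


def relative_metrics(portfolio_returns, benchmark_returns):
    """Hypothetical helper returning the three benchmark-relative measures."""
    if len(portfolio_returns) != len(benchmark_returns):
        raise ValueError("return series must be aligned and of equal length")
    return {
        # mean(r_p) - mean(r_m)
        "mean_excess_daily_return": mean(portfolio_returns) - mean(benchmark_returns),
        # sigma_p^2 / sigma_m^2
        "variance_ratio": pvariance(portfolio_returns) / pvariance(benchmark_returns),
        # R_p - R_m
        "excess_cumulative_return": cumulative_return(portfolio_returns)
        - cumulative_return(benchmark_returns),
    }
```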

Over a longer time period the cumulative return will reflect both mean returns and variance.
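A standard second-order approximation, not taken from the thesis, makes the link between variance and cumulative return explicit. It is valid when the daily returns $r_t$ are small. Write $\bar{r}$ and $\sigma^2$ for the mean and population variance of the $T$ daily returns:

$$
\ln(1 + R) = \sum_{t=1}^{T} \ln(1 + r_t) \approx \sum_{t=1}^{T}\left(r_t - \tfrac{1}{2} r_t^2\right) = T\left(\bar{r} - \tfrac{1}{2}\left(\sigma^2 + \bar{r}^2\right)\right) \approx T\left(\bar{r} - \tfrac{1}{2}\sigma^2\right)
$$

For example, a return of $+10\%$ followed by $-10\%$ has $\bar{r} = 0$ and $\sigma^2 = 0.01$, yet $R = 1.10 \times 0.90 - 1 = -0.01$. The approximation gives $\ln(1+R) \approx 2(0 - 0.005) = -0.01$, close to the exact $\ln 0.99 \approx -0.01005$. Two portfolios with the same mean daily return can therefore end with different cumulative returns, and the one with the higher variance ends lower.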


System Evaluation

The eventual aim is to test the set of trading systems presented in section 4.1.

The trading systems are in essence just portfolio producers and can be evaluated as described above, that is, by the portfolios they hold throughout the experiments. The benchmark they are compared against is a uniform buy-and-hold strategy. This 1/N benchmark is constructed with the system itself rather than taken from the existing market index, because doing so eliminates a potential source of bias. Any biases introduced by the system will then also occur in the benchmark. Only differences relative to the benchmark are evaluated. This removes the market's returns and risk and leaves only the difference the system has managed to produce.
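The following sketches how the 1/N benchmark could be generated from the same price data the systems see. It assumes an equal initial allocation to each of the $N$ assets and no later rebalancing. This is one reading of "uniform buy-and-hold", since the thesis does not spell out the mechanics. Prices are given as one row per trading day, and the function names are hypothetical.

```python
def uniform_buy_and_hold_returns(prices):
    """Daily returns of a 1/N buy-and-hold portfolio.

    prices[t][i] is the price of asset i on day t. Capital is split equally
    on day 0 and never rebalanced, so weights drift with relative performance.
    """
    if len(prices) < 2:
        raise ValueError("need prices for at least two days")
    initial = prices[0]
    n_assets = len(initial)
    if n_assets == 0 or any(len(day) != n_assets for day in prices):
        raise ValueError("every day must list a price for each asset")
    # Portfolio value per unit of starting capital: mean of price relatives.
    values = [sum(p / p0 for p, p0 in zip(day, initial)) / n_assets for day in prices]
    return [v1 / v0 - 1.0 for v0, v1 in zip(values, values[1:])]


def excess_returns(portfolio_returns, benchmark_returns):
    """Day-by-day difference to the benchmark; only this difference is evaluated."""
    if len(portfolio_returns) != len(benchmark_returns):
        raise ValueError("return series must be aligned and of equal length")
    return [p - b for p, b in zip(portfolio_returns, benchmark_returns)]
```

With prices `[[10, 20], [11, 20], [11, 22]]`, the benchmark returns are 5% and about 4.76%. A daily rebalanced 1/N portfolio would instead return 5% on both days, so the choice of variant matters when replicating the benchmark.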
