1.3 Model Estimation
1.3.2 Approach Based on the Model Performance Measurement Principle
In this section, we discuss how we might develop a model estimation principle around the model performance measurement principle of Section 1.2. At first blush, it might seem natural for an investor to choose the model that maximizes the utility-based performance measures, discussed in Section 1.2, on the data available for building the model (the training data). However, it can be shown that this course of action would lead to the selection of the empirical measure (the frequency distribution of the training data), which is, for many interesting applications,13 a very poor model indeed if we want our model to generalize well on out-of-sample data; we illustrate this idea in Example 1.3 (see Section 1.6).
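As a small numerical illustration of this point, the sketch below makes assumptions that are not stated in this section: a log-utility investor, for whom the model-dependent part of the utility realized on outcome y is log q(y), and a made-up three-outcome dataset. The empirical measure scores best on the training data, yet fails badly on a test set containing an outcome that never occurred in training. The add-one smoothed model is only a hypothetical comparison model.

```python
import math
from collections import Counter

def avg_log_utility(model, data):
    """Average of log model[y] over the data (model-dependent part of log utility)."""
    total = 0.0
    for y in data:
        if model[y] == 0.0:
            return -math.inf  # nothing was bet on an outcome that occurred
        total += math.log(model[y])
    return total / len(data)

outcomes = ["a", "b", "c"]
train = ["a", "a", "b", "a", "b", "a"]  # hypothetical training data; "c" never occurs
test = ["a", "b", "c", "a"]             # hypothetical out-of-sample data

counts = Counter(train)
# Empirical measure: relative frequencies in the training data.
empirical = {y: counts[y] / len(train) for y in outcomes}
# Hypothetical comparison model: add-one smoothing of the same counts.
smoothed = {y: (counts[y] + 1) / (len(train) + len(outcomes)) for y in outcomes}

print("in-sample:    ", avg_log_utility(empirical, train), avg_log_utility(smoothed, train))
print("out-of-sample:", avg_log_utility(empirical, test), avg_log_utility(smoothed, test))
```

In this toy case the empirical measure assigns zero probability to the unseen outcome, so its out-of-sample performance is minus infinity, while the smoothed model remains finite.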
Though it is, generally speaking, unwise to build a model that adheres too strictly to the individual outcomes that determine the empirical measure, the observed data contain valuable statistical information that can be used for the purpose of model estimation. We incorporate statistical information from the data into a model via data-consistency constraints, expressed in terms of features, as described in Section 1.3.1.1.
12 Depending on the exact choice of the data-consistency constraints, the objective function of this search may contain an additional regularization term. We shall elaborate on this in Chapters 9 and 10.
13 For some simple applications, for example a biased coin toss with many observations, the empirical probabilities may serve well as a model. For other applications, for example, conditional probability problems where there are several real-valued explanatory variables and few observations, the empirical distribution will, generally speaking, generalize poorly out-of-sample.
Introduction 13
1.3.2.1 Robust Outperformance Principle
Armed with the notions of features and data-consistency constraints, we return to our model estimation problem. The empirical measure typically does not generalize well because it is all too precisely attuned to the observed data. We seek a model that is consistent with the observed data, in the sense of conforming to the data-consistency constraints, yet is not too precisely attuned to the data. The question is, which data-consistent measure should we select? We want to select a model that will perform well (in the sense of the model performance measurement principle of Section 1.2), no matter which data-consistent measure might govern a potential out-of-sample test set. To address this question, we consider the following game against nature14 (which we assume is adversarial) that occurs in a market setting.
A game against “nature”
Let $Q$ denote the set of all probability measures, $K$ denote the set of data-consistent probability measures, and $U^*_q$ denote the (random) utility that is realized when allocating (so as to maximize expected utility) under the measure $q$ in this market setting.15
(i) (Our move) We choose a model, $q \in Q$; then,
(ii) (Nature’s move) given our choice of a model, and, as a consequence, the allocations we would make, “nature” cruelly inflicts on us the worst (in the sense of the model performance measurement principle of Section 1.2) possible data-consistent measure; that is, “nature” chooses the measure
\[ p^* = \arg\min_{p \in K} E_p[U^*_q]. \tag{1.12} \]
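Nature's move (1.12) can be sketched numerically for a fixed model q. The setting below is an assumption made only for illustration, since the market setting is not specified here: three outcomes y in {0, 1, 2}, hypothetical odds, a log-utility investor who bets the fraction q(y) of wealth on outcome y, and a single data-consistency constraint E_p[y] = c. Because E_p[U*_q] is linear in p and this particular K is a line segment, the minimum is attained at an endpoint of K.

```python
import math

ODDS = (3.0, 3.0, 3.0)  # hypothetical payoff per unit bet on each outcome

def realized_utility(q, y):
    """Assumed U*_q(y): log wealth after betting the fraction q[y] on outcome y."""
    return math.log(q[y] * ODDS[y]) if q[y] > 0 else -math.inf

def expected_utility(p, q):
    """E_p[U*_q]; outcomes with zero probability under p contribute nothing."""
    return sum(p_y * realized_utility(q, y) for y, p_y in enumerate(p) if p_y > 0)

def endpoints_of_K(c):
    """Endpoints of K = {p on {0,1,2} : E_p[y] = c}, written as p = (1-c+t, c-2t, t)."""
    t_lo, t_hi = max(0.0, c - 1.0), c / 2.0
    return [(1.0 - c + t, c - 2.0 * t, t) for t in (t_lo, t_hi)]

def natures_move(q, c):
    """Worst data-consistent measure for the model q, as in (1.12)."""
    return min(endpoints_of_K(c), key=lambda p: expected_utility(p, q))

c = 1.2  # hypothetical sample mean of y in the training data
for q in [(0.2, 0.6, 0.2), (0.24, 0.32, 0.44)]:
    p_worst = natures_move(q, c)
    print(q, "->", p_worst, expected_utility(p_worst, q))
```

In this toy setting, the sharply peaked first model has a negative worst-case expected utility, whereas the flatter second model keeps a positive expected utility against every measure in K.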
If we want to perform as well as possible in this game, we will seek the solution of
\[ q^* = \arg\max_{q \in Q} \min_{p \in K} E_p[U^*_q]. \tag{1.13} \]
By solving (1.13), we estimate a measure that (as we shall see later) conforms to the data-consistency constraints, and is robust, in the sense that the expected utility that we can derive from it will be attained, or surpassed, no matter which data-consistent measure “nature” chooses. The resulting estimate therefore, in particular, avoids being too precisely attuned to the individual observations in the training dataset, thereby mitigating overfitting.16
14 This game is a special case of a game in Grünwald and Dawid (2004), which was preceded by the “log loss game” of Good (1952).
15 We note that we are speaking informally here, since we have not specified the market setting or how to calculate $U^*_q$. We shall discuss these issues more precisely in the remainder of the book.
16 This strategy does not guarantee a cure to overfitting, though! If there are too many data-consistency constraints, or the data-consistency constraints are not chosen wisely, problems
14 Utility-Based Learning from Data
This game can be further enriched by introducing a rival, who allocates according to the measure $q^0 \in Q$.17 In this case, we would seek the solution according to the robust outperformance principle:
Robust Outperformance Principle
We seek
\[ q^* = \arg\max_{q \in Q} \min_{p \in K} E_p[U^*_q - U^*_{q^0}]. \tag{1.14} \]
Estimating $q^*$ would allow us to maximize the worst-case outperformance over our competitor (who allocates according to the measure $q^0 \in Q$), in the presence of a “nature” that conforms to the data-consistency constraints and tries to minimize our outperformance (in the sense of the model performance measurement principle of Section 1.2) over our rival.
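To see what the objective in (1.14) looks like in a concrete case, suppose, as an assumption made only for illustration (the market setting is not specified in this section), that the market is a horse race with odds $O_y$ on outcome $y$ and that the investor has logarithmic utility and unit initial wealth. It is a standard result that an investor who believes $q$ then bets the fraction $q_y$ of wealth on outcome $y$, so the odds cancel from the outperformance:

```latex
U^*_q(y) = \log\left(q_y O_y\right),
\qquad
E_p\left[U^*_q - U^*_{q^0}\right]
  = \sum_y p_y \log\frac{q_y}{q^0_y}
  = D(p \,\|\, q^0) - D(p \,\|\, q),
```

where $D(p \,\|\, q) = \sum_y p_y \log(p_y / q_y)$ denotes the relative entropy, and the last equality holds wherever both relative entropies are finite. In this special case, the worst-case outperformance depends on the rival only through $q^0$, not through the odds.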
Jaynes (2003), page 431, has pointed out that “this criterion concentrates attention on the worst possible case regardless of the probability of occurrence of this case, and it is thus in a sense too conservative.” In our view, this may be so, given a fixed collection of features. However, by enriching the collection of features, it is always possible to go too far in the other direction, overly constraining the set of measures consistent with the data, and estimating a model that is too aggressive. We shall have more to say about ways to attempt to tune (optimally) the extent to which the data are consistent with the model in Section 1.3.5 and Chapter 10.
We note that this formulation has been cast entirely in the language of utility theory. The model that is produced is therefore specifically tailored to the risk preferences of the model user with utility function U. We also note that we have not made use of the concept of a “true” measure in this formulation.
1.3.2.2 Minimum Market Exploitability Principle
As we shall see in Chapter 10, under certain technical conditions, it is possible to reverse the order of the max and min in the robust outperformance principle. Moreover, as we shall see in Chapter 10, subject to regularity conditions, by solving the resulting minimax problem, we obtain the solution to the maxmin problem (1.14) arising from the robust outperformance principle. By reversing the order of the max and min in (1.14), we obtain the minimum market exploitability principle:
Minimum Market Exploitability Principle
can arise. We shall discuss these issues, and countermeasures that can be taken to further protect against overfitting, at greater length below in this introduction, as well as in Chapters 9 and 10.
17 Later, we shall see that this rival’s allocation measure $q^0$ can be identified with the prior measure in an MRE problem.
We seek
\[ p^* = \arg\min_{p \in K} \max_{q \in Q} E_p[U^*_q - U^*_{q^0}]. \tag{1.15} \]
Here,
\[ E_p[U^*_q - U^*_{q^0}] \tag{1.16} \]
can be interpreted as the gain in expected utility, for an investor who allocates according to the model $q$, rather than $q^0$, when the “true” measure is $p$. Under the minimum market exploitability principle, we seek the data-consistent measure, $p$, that minimizes the maximum gain in expected utility over an investor who uses the model $q^0$. On a little reflection, this principle can be seen to be consistent with a desire to avoid overfitting. The intuition here is that the data-consistency constraints completely reflect the characteristics of the model that we want to incorporate, and that we want to avoid introducing additional (spurious) characteristics. Any additional characteristics (beyond the data-consistency constraints) could be exploited by an investor; so, to avoid introducing additional such characteristics, we minimize the exploitability of the market by an investor, given the data-consistency constraints.
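A minimal numerical sketch of (1.15) follows, under assumptions that are not part of this section: three outcomes y in {0, 1, 2}, a single data-consistency constraint E_p[y] = C, and a log-utility investor in a horse race market, for whom the gain (1.16) equals the sum over y of p(y) log(q(y)/q0(y)). In that case the inner maximum over q is attained at q = p, so (1.15) becomes a one-dimensional convex minimization over K. The rival measure Q0 and the value of C are made up.

```python
import math

Q0 = (0.5, 0.3, 0.2)  # hypothetical rival measure q^0
C = 1.2               # hypothetical sample mean defining K = {p : E_p[y] = C}

def point_of_K(t):
    """Parametrize K by t = p(2): p = (1 - C + t, C - 2t, t)."""
    return (1.0 - C + t, C - 2.0 * t, t)

def gain(p, q):
    """E_p[U*_q - U*_q0] = sum_y p_y log(q_y / q0_y) under the log-utility assumption."""
    return sum(p_y * math.log(q_y / q0_y) for p_y, q_y, q0_y in zip(p, q, Q0) if p_y > 0)

def max_gain(p):
    """Maximum of gain(p, q) over q; attained at q = p (relative entropy of p w.r.t. q0)."""
    return gain(p, p)

# Outer minimization in (1.15): ternary search on the convex map t -> max_gain(point_of_K(t)).
t_min, t_max = max(0.0, C - 1.0), C / 2.0
lo, hi = t_min, t_max
for _ in range(200):
    m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
    if max_gain(point_of_K(m1)) < max_gain(point_of_K(m2)):
        hi = m2
    else:
        lo = m1
p_star = point_of_K((lo + hi) / 2.0)
minimax_value = max_gain(p_star)

# Worst case over K for the model q = p_star. The gain is linear in p,
# so it suffices to check the two endpoints of the segment K.
endpoints = [point_of_K(t_min), point_of_K(t_max)]
worst_case_for_p_star = min(gain(p, p_star) for p in endpoints)

print(p_star, minimax_value, worst_case_for_p_star)
```

Since a max-min value can never exceed the corresponding min-max value, the numerical agreement of the two printed values indicates that, in this toy case, the model q = p_star also attains the maximum in (1.14).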
Fortunately, as we shall see in Chapter 10, the minimum market exploitability principle leads to a convex optimization problem with an associated dual problem that can be solved robustly via efficient numerical techniques. Moreover, as we shall also see in Chapter 10, this dual problem can be interpreted as a utility maximization problem over a parametric family.
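For the log-utility special case (an assumption made here for illustration; the general treatment is deferred to Chapter 10), the dual takes a familiar form. With a single feature f(y) = y, the parametric family is the exponential family q_beta(y) proportional to q0(y) exp(beta y), and the dual objective is the average gain in log utility, over the training data, of an investor who uses q_beta rather than q0. This objective is concave in beta, so a bisection on its derivative suffices. The data, the rival measure, and all function names below are hypothetical.

```python
import math

Q0 = (0.5, 0.3, 0.2)     # hypothetical rival (prior) measure q^0 on outcomes 0, 1, 2
DATA = [0, 1, 1, 2, 2]   # hypothetical training observations

def model(beta):
    """Member of the parametric family: q_beta(y) proportional to q0(y) * exp(beta * y)."""
    weights = [q0_y * math.exp(beta * y) for y, q0_y in enumerate(Q0)]
    z = sum(weights)
    return [w / z for w in weights]

def dual_objective(beta):
    """Average over the training data of log(q_beta(y) / q0(y))."""
    q = model(beta)
    return sum(math.log(q[y] / Q0[y]) for y in DATA) / len(DATA)

def dual_gradient(beta):
    """Derivative of the dual objective: sample mean of y minus the mean of y under q_beta."""
    q = model(beta)
    return sum(DATA) / len(DATA) - sum(y * q_y for y, q_y in enumerate(q))

# The objective is concave, so its derivative is decreasing in beta; bisect for the root.
lo, hi = -20.0, 20.0
for _ in range(100):
    mid = (lo + hi) / 2.0
    if dual_gradient(mid) > 0:
        lo = mid
    else:
        hi = mid
beta_star = (lo + hi) / 2.0
q_star = model(beta_star)

print(beta_star, q_star, dual_objective(beta_star))
```

At the optimum, the mean of y under q_star matches the sample mean, so q_star satisfies the data-consistency constraint, and in this special case the maximized dual objective equals the relative entropy of q_star with respect to q0.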
By virtue of their equivalence, both the minimum market exploitability principle and the robust outperformance principle lead us down the same path; both lead to a tractable approach to estimate statistical models tailor-made to the risk preferences of the end user.