### Munich Personal RePEc Archive

**Disparity, Shortfall, and Twice-Endogenous HARA Utility**

### Walker, Todd and Haley, M. Ryan and McGee, M. Kevin

### Indiana University

### 6 September 2009

### Online at https://mpra.ub.uni-muenchen.de/17139/

### Disparity, Shortfall, and Twice-Endogenous HARA Utility

M. Ryan Haley, M. Kevin McGee, and Todd B. Walker

September 6, 2009

Abstract

We demonstrate that shortfall-minimizing portfolio selection based on the Cressie-Read family of divergence measures maps to the HARA family. This means that all HARA utility functions can be interpreted as “endogenous” in the sense described in Stutzer (2003), and that traditional HARA expected utility maximization has an analog to the behavioral notion that an investor seeks to organize their selection of assets to minimize the probability of realizing a return below some pre-determined target or benchmark rate. We show that not only do risk aversion parameters arise endogenously, given the choice set, but that the type of risk aversion, relative or absolute, is also determined endogenously. We also connect this approach to portfolio selection to some topics in behavioral economics.

Keywords: Entropy, Measure Change, Cressie-Read, Endogenous Utility, Benchmark

### 1 Introduction

The purpose of this short paper is to expose an extremely interesting relationship between Hyperbolic Absolute Risk Aversion (HARA) utility, disparity minimization, and shortfall. Specifically, we show that the entire family of HARA utility functions has a minimum-divergence, shortfall-based representation, which means that HARA utility can be understood through the simple notion that the decision maker seeks the allocation that minimizes the probability of realizing an outcome below some pre-determined reference level. This result bridges the behavioral notion of shortfall minimization, first espoused by Roy (1952), with the now-familiar expected utility idea in a broad way. Specifically, we extend the endogenous utility arguments of Stutzer (2000, 2003), showing that his findings are special cases of a much more expansive relationship between shortfall, disparity, and conventional expected utility.

### 2 Background

We begin with a brief, but by no means exhaustive, overview of three seemingly unrelated topics and then demonstrate their connections in the following sections.

2.1 Shortfall

The notion of portfolio selection by way of shortfall minimization is frequently associated with the Safety First (SF) principle, as developed in the celebrated paper by Roy (1952). The behavioral notion is simple: select the mix of assets that minimizes the probability of realizing a return below some pre-determined, disaster-level return. Roy’s presentation laid bare how the seemingly simplistic notion of arranging one’s assets to avoid a sub-target return actually mapped to a rich decision environment that very closely paralleled the famed mean-variance model developed in Markowitz (1952). In fact, Markowitz (1999) referred to Roy’s shortfall-based idea as a “tremendous contribution” and writes (p. 5): “On the basis of Markowitz (1952), I am often called the father of modern portfolio theory (MPT), but Roy (1952) can claim an equal share of this honor.”

The evolution of MPT has included numerous extensions to Roy’s work, and the core notion of avoiding shortfall has since been extended beyond portfolio theory to other areas of choice under uncertainty. Some examples, among many others, are the semi-variance methods of Markowitz (1970) and the lower partial-moment methods of Bawa (1976, 1978). Related to these applications is the Sortino ratio (see, for example, Sortino and Price, 1994), which is a version of the Sharpe ratio (SR) (see, for example, Sharpe, 1994) that penalizes only downside risk. Value-at-Risk builds on the shortfall notion when quantifying how exposed a financial position is to a given percentage loss; see, for example, Jorion (2000). The notion of Loss Aversion (see, for example, Rabin 1998) from behavioral economics also has ties to shortfall.

2.2 Entropy, Economics and Disparity

A close connection between method-of-moments estimators and disparity measures has been emphasized in the econometrics literature: Kitamura and Stutzer (1997) explored the relationship between entropy and GMM estimation, and Imbens et al. (1998) compare an exponential tilting estimator with other divergence measures.¹ Micro theorists have recently shown a strong connection between information theory and expected utility; see, for example, Candeal et al. (2001). In macro and finance settings, Robertson et al. (2005) show how entropy can be used to impose moment restrictions on macroeconomic models to improve forecasting, and Stutzer (1996) demonstrates how an information theoretic approach can be used to price options.

The general spirit of this paper is to emphasize that the relative entropy approach taken in the papers cited above is a special case of a much broader divergence-based approach. This relationship is well known in other contexts (see, for example, Baggerly (1998), Qin and Lawless (1994), and Basu et al. (2004)), but can be applied more generally to many of the economic environments described above. Within this general spirit we expose a particularly interesting connection between two specific “endogenous” expected utility functions described in Stutzer (2000, 2003) and a likewise endogenized version of the entire HARA family of expected utility functions.

2.3 HARA Expected Utility and Endogenous Utility

Expected utility functions with the HARA property are commonly used in a wide variety of economic models. As is well known, HARA is a relatively general family of utility functions, including the family of Constant Relative Risk Aversion (CRRA) utility functions. Because of its broad application, deeper insights into its properties are of interest.

A topic of recent research occurring at the confluence of utility theory and behavioral economics is the notion of “endogenous” utility; see, for example, Stutzer (2000, 2003) or Perets and Yashiv (2005). In these models the utility function, or the parameters thereof, arise within the economic model, in sharp contrast to traditional expected utility, wherein the utility function and its parameters are taken to be exogenous. For example, Stutzer (2003) demonstrates that maximization of the decay rate of the shortfall probability in a portfolio selection environment can be interpreted, by way of Large Deviations theory, as a form of power utility that exhibits constant relative risk aversion. Furthermore, the coefficient of relative risk aversion is a choice variable, so that it is not independent of the investment opportunity set, in contrast with standard models of expected-utility-based portfolio selection.

The notion of endogenous utility raises many questions. Among them: Is Stutzer’s finding an anomaly, or an indication of a much broader relationship between shortfall minimization, entropy, and utility theory? At first glance, it would seem that the simple notion of meeting or not meeting a target would have little in common with the more sophisticated decision process implied by HARA utility maximization. However, we prove that there does exist a comprehensive relationship between the notion of shortfall minimization and HARA utility maximization. Specifically, we show that posing the shortfall-minimization portfolio selection problem in terms of the Cressie-Read family maps directly to the family of HARA utility functions. Therefore, HARA utility functions do in fact have a shortfall-based analog. This relationship also implies that HARA utility functions have endogenous utility analogs. Thus, in thinking of HARA through this surprising and alternative lens, we can understand how the notion of goal attainment/failure (see Fishburn 1977) relates to the traditional HARA interpretation and the choice set.

### 3 Shortfall and Exponential Tilting

Despite the early utility connections made to shortfall, the connection between shortfall-based rules and traditional expected utility theory has not been fully exposed.² Stutzer (2000) began this bridging by relating his Portfolio Performance Index (PPI) to the familiar negative exponential utility model, wherein the coefficient of absolute risk aversion enters the decision problem as a choice variable. Stutzer’s more general 2003 paper, built on the geometric mean, related the familiar log-optimal utility function to an endogenized CRRA utility function; this version is referred to as the decay-rate maximizing criterion (DRMC). Haley and Whiteman (2008) connect their Generalized Safety First (GSF) rule to Stutzer’s PPI, and highlight that both methods can be posed as moment-constrained Kullback-Leibler (KL) optimization problems in iid sample. Their posing of the problem is the basis of our method for bridging shortfall, disparity, and HARA utility.

Suppose an investor desires the portfolio that minimizes the probability of realizing a return below some self-selected (or imposed) target or benchmark rate of return, denoted $d$. Falling below this level results in a shortfall, with $d$ being the point of shortfall. Let the portion of initial wealth $W_0$ allocated to asset $i$ be denoted $w_i$, and collect these in the vector $w = (w_1, \ldots, w_J)$, where $J$ indicates the number of admissible assets. Further assume that the returns of each asset are random variables, denoted $R_i$.

The $T$-sample PPI is

$$\max_{\theta,\,w}\; d\theta - \log\left\{\frac{1}{T}\sum_{t=1}^{T}\exp[\theta R_t(w)]\right\},$$

where

$$R_t(w) = \sum_{j=1}^{J} w_j R_{jt}$$

denotes the time-$t$ portfolio return, and where the negative of the choice variable $\theta \in \Re_-$ can be interpreted as an endogenous constant absolute risk aversion parameter or a Lagrange multiplier.³ Haley and Whiteman (2008) pose this objective function, in iid sample, directly

as an equivalent d-mean constrained minimum-disparity optimization problem:

$$\max_{\theta,\,w}\;\sum_{t=1}^{T}\pi_t(\theta,w)\log\left[\frac{\pi_t(\theta,w)}{u}\right] - \theta\left[\sum_{t=1}^{T}\pi_t(\theta,w)R_t(w) - d\right],$$

where

$$\pi_t(\theta,w) = \frac{\exp[\theta R_t(w)]}{\sum_{s=1}^{T}\exp[\theta R_s(w)]},$$

again subject to the usual wealth exhaustion constraint. Here, $\theta$ retains its endogenous utility parameter interpretation, but is transparently the Lagrange multiplier on the twisted moment restriction.⁴ Note that the $\pi_t(\theta,w)$ are the exponential tilting weights that appear in the KL divergence.

² Some utility connections to SF have been advanced; see, for example, Pyle and Turnovsky (1970) and Levy and Sarnat (1972).
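As a concrete illustration, the tilting weights and the KL divergence they induce can be computed in a few lines. This is a minimal sketch with simulated returns, not the authors’ code; the return vector and the choice $\theta = -5$ are our own illustrative assumptions:

```python
import numpy as np

def tilt_weights(theta, Rt):
    # pi_t = exp(theta * R_t) / sum_s exp(theta * R_s)
    x = theta * Rt
    x = x - x.max()          # stabilize the exponentials before normalizing
    e = np.exp(x)
    return e / e.sum()

def kl_divergence(pi, u):
    # sum_t pi_t * log(pi_t / u_t): KL divergence of the tilted
    # measure pi from the uniform empirical measure u
    return float(np.sum(pi * np.log(pi / u)))

T = 100
rng = np.random.default_rng(1)
Rt = rng.normal(0.01, 0.03, size=T)   # hypothetical portfolio returns
u = np.full(T, 1.0 / T)               # uniform initial measure

pi = tilt_weights(-5.0, Rt)           # negative theta downweights high returns
tilted_mean = float(pi @ Rt)          # lies below the sample mean
kl = kl_divergence(pi, u)             # strictly positive since theta != 0
```

Negative $\theta$ shifts probability mass toward low-return periods, which is exactly the direction of the twisted mean restriction discussed above.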

The portfolio selection process works by finding the portfolio weights $w$ that correspond to the largest of the KL minimum divergences, which are themselves governed by the choice of $\theta$, the exponential tilting parameter. The portfolio return distribution that requires the most re-weighting will be the one that is, in theory, least likely to result in a shortfall return. We say “in theory” because, as Haley and Whiteman emphasize, the PPI is actually an upper bound on the shortfall probability; the method thus minimizes the upper bound, not the actual shortfall probability itself. In fact, unless the data-generating process follows certain probability laws (e.g., Gaussian), the portfolio that minimizes the upper probability bound may not be exactly the same as the portfolio that minimizes the actual shortfall probability; see Haley and Whiteman (2008) for a counter-example.
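The PPI objective above is straightforward to evaluate and maximize over a grid of negative $\theta$ for a fixed $w$; the sketch below uses simulated data, and the returns, weights, and grid are our own illustrative assumptions rather than anything from the paper:

```python
import numpy as np

def ppi(theta, w, R, d):
    # T-sample PPI: d*theta - log( (1/T) * sum_t exp(theta * R_t(w)) )
    Rt = R @ w                      # time-t portfolio returns
    return d * theta - np.log(np.mean(np.exp(theta * Rt)))

rng = np.random.default_rng(0)
R = rng.normal(0.01, 0.05, size=(250, 3))   # T=250 periods, J=3 assets
w = np.array([0.5, 0.3, 0.2])               # wealth-exhausting weights
d = 0.0                                     # benchmark return

thetas = np.linspace(-50.0, -0.01, 1000)    # theta restricted to be negative
vals = np.array([ppi(th, w, R, d) for th in thetas])
theta_star = float(thetas[vals.argmax()])   # maximizing tilting parameter
```

In a full implementation the outer search would also run over $w$; here the grid over $\theta$ alone shows how the endogenous risk-aversion parameter emerges from the data rather than being specified a priori.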

In finite-sample applications, the KL-based methods can be easily adapted as an alternative to direct evaluation of the empirical shortfall probability

$$\frac{1}{T}\sum_{t=1}^{T} H[R_t(w)],$$

where

$$H[R_t(w)] = \begin{cases} 0 & \text{if } R_t(w) > d, \\ \tfrac{1}{2} & \text{if } R_t(w) = d, \\ 1 & \text{if } R_t(w) < d \end{cases}$$

is the complementary Heaviside function. While direct computation of this expression delivers the true (in-sample) shortfall-minimizing portfolio, solving the problem in this way is computationally expensive, and is generally infeasible for practical $J$. Thus, most shortfall rules are actually based on approximations to the actual shortfall probability. The methods of Roy and Stutzer, for example, both operate by embedding the portfolio selection process in a bound on the shortfall probability.
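Evaluating the empirical shortfall probability for a single candidate portfolio is trivial; it is the search over weight vectors that is expensive. A minimal sketch (the return vector is our own toy example):

```python
import numpy as np

def empirical_shortfall(Rt, d):
    # (1/T) * sum_t H[R_t(w)], with the complementary Heaviside convention:
    # H = 1 below d, 1/2 exactly at d, 0 above d
    H = np.where(Rt < d, 1.0, np.where(Rt == d, 0.5, 0.0))
    return float(H.mean())

returns = np.array([0.02, -0.01, 0.00, 0.03, -0.02])
p = empirical_shortfall(returns, d=0.0)   # (1 + 1 + 0.5) / 5 = 0.5
```

The non-smooth step function is also what makes this objective hard to optimize directly, which is why the bound-based rules discussed above replace it with a smooth surrogate.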

The KL-based methods are connected directly to a bound (induced by Chernoff’s theorem in iid sample) on the shortfall probability. The family of disparity-based selection rules we propose herein does not correspond directly to the shortfall bound, but rather to the optimization process (the tilting to achieve a distribution with mean equal to $d$) embedded in the KL-based rules. Thus, our family of portfolio rules is related to the notion of shortfall minimization in the latter sense. However, as we will demonstrate below, the shortfall notion is well represented in the operating details of our method.

### 4 The Cressie-Read Family of Portfolio Selection Criteria

Developing a generalized alternative, which nests the KL-based rules, is the focus of this section. Our approach involves the Cressie-Read (CR) divergence family; see Cressie and Read (1984) or Baggerly (1998).⁵

⁴ The DRMC formulation from Stutzer (2003) is similar, though focused instead on the decay rate of the portfolio time-average associated with terminal wealth $W_T = W_0\prod_{t=1}^{T} R_t(w)$.

The CR divergence between the observed ($\tau_t$) and tilted ($\hat{\tau}_t$) measures is defined by

$$\mathrm{CR}_t(\hat{\tau}_t, \tau_t; \lambda) = \frac{2}{\lambda(1+\lambda)}\,\tau_t\left[\left(\frac{\tau_t}{\hat{\tau}_t}\right)^{\lambda} - 1\right],$$

for a fixed scalar parameter $\lambda \in \Re$. We use the CR divergence because it generalizes many well-known divergence measures. For example, $\lambda = -2$ yields the Neyman-modified $\chi^2$ divergence, $\lambda = 1$ gives Pearson’s $\chi^2$, and $\lambda = -1/2$ is the Freeman-Tukey measure. Two frequently encountered limiting cases are the empirical likelihood measure ($\lambda \to 0$) and the KL measure ($\lambda \to -1$).
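A direct implementation makes the limiting cases easy to check numerically. The measures below are our own toy values; note that with the $2/[\lambda(1+\lambda)]$ normalization the $\lambda \to -1$ limit recovers twice the KL divergence of the tilted measure from the observed one:

```python
import numpy as np

def cr_divergence(tau_hat, tau, lam):
    # sum_t 2/(lam*(1+lam)) * tau_t * ((tau_t / tau_hat_t)**lam - 1)
    c = 2.0 / (lam * (1.0 + lam))
    return float(np.sum(c * tau * ((tau / tau_hat) ** lam - 1.0)))

tau = np.full(5, 0.2)                             # uniform observed measure
tau_hat = np.array([0.1, 0.15, 0.2, 0.25, 0.3])   # a tilted measure

# lam near -1 approaches twice the KL divergence of tau_hat from tau
kl = float(np.sum(tau_hat * np.log(tau_hat / tau)))
near_kl = cr_divergence(tau_hat, tau, -1.0 + 1e-6)
```

The divergence vanishes when the two measures coincide and is positive otherwise, as the factored form in the next display makes explicit.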

Factoring the CR function according to Basu and Lindsay (1994) provides additional insight:

$$\sum_{t=1}^{T}\mathrm{CR}_t(\hat{\tau}_t,\tau_t;\lambda) = \frac{2}{\lambda(\lambda+1)}\sum_{t=1}^{T}\left\{\tau_t\left[\left(\frac{\tau_t}{\hat{\tau}_t}\right)^{\lambda}-1\right] + \lambda(\hat{\tau}_t-\tau_t)\right\} = 2\sum_{t=1}^{T}\hat{\tau}_t\,D(\delta_t;\lambda),$$

where

$$D(\delta;\lambda) = \frac{(\delta+1)^{\lambda+1}-(\delta+1)}{\lambda(\lambda+1)} - \frac{\delta}{\lambda+1}, \qquad \delta_t = \frac{\tau_t}{\hat{\tau}_t}-1.$$

Thus, the CR divergence may be interpreted as a weighted function ($D$) of disparity measures ($\delta$) between the actual and tilted probability measures. The function $D(\cdot)$ is non-negative, defined on $[-1,\infty)$, and equals zero if and only if the disparity between the two measures is also zero (i.e., $\tau_t = \hat{\tau}_t$ $\forall t$). In Figure 1 we plot the CR disparity measure for $\lambda \in \{-2, -1/2, 0, 1/2, 2\}$.⁶

Embedding the portfolio selection rule (i.e., an optimization over $w$) into the CR function culminates in Definition 1.

Definition 1. Let the relevant measure of disparity be governed by the CR power divergence. Then, for portfolio return $R_t(w)$, benchmark return $d$, Lagrange multipliers $\theta$ and $\phi$, initial measure $u = u_t = 1/T$ for all $t$, and tilted weights $\hat{\tau}_1, \ldots, \hat{\tau}_T$, the CR optimal portfolio is determined by

$$\max_{w,\theta,\phi}\;\min_{\hat{\tau}}\;\sum_{t=1}^{T}\frac{u_t}{\lambda(\lambda+1)}\left[\left(\frac{\hat{\tau}_t}{u_t}\right)^{-\lambda}-1\right] + \theta\left[\sum_{t=1}^{T}\hat{\tau}_t R_t(w) - d\right] + \phi\left[\sum_{t=1}^{T}\hat{\tau}_t - 1\right],$$

subject to the usual wealth exhaustion constraint. This can be simplified by solving the interior minimization problem, which gives

$$\hat{\tau}_t(w,\theta,\phi) = \frac{u_t}{\big[(\lambda+1)\{\theta[R_t(w)-d]+\phi\}\big]^{\frac{1}{\lambda+1}}}.$$

Back-substituting gives

$$\max_{w,\theta,\phi}\;\sum_{t=1}^{T}\frac{u_t\big[(\lambda+1)\{\theta[R_t(w)-d]+\phi\}\big]^{\frac{\lambda}{\lambda+1}}}{\lambda} - \phi - \frac{1}{\lambda(\lambda+1)},$$

subject to wealth exhaustion.

Figure 1: Disparity measures $D(\delta;\lambda)$ plotted against $\delta$ for $\lambda = -2, -1/2, 0, 1/2, 2$. [Figure omitted.]

Proposition 1. The CR portfolio problem has a unique solution.

Proof. See appendix.

The inner workings of all members of the CR family of rules parallel the intuition of the KL-based rules: find the portfolio with the largest of the CR minimum disparities. As before, the portfolio that requires the most re-weighting to achieve the tilted mean restriction is the portfolio that, intuitively, is least likely to deliver a return below the target rate $d$.
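The interior solution of Definition 1 is cheap to evaluate for any candidate $(w, \theta, \phi)$. A minimal sketch (the parameter values are our own illustrative choices), including a check that $\lambda = 0$ with $\phi = 1$ reproduces the empirical-likelihood weights $u/(1+\theta r_t)$:

```python
import numpy as np

def cr_weights(theta, phi, rt, lam):
    # tau_hat_t = u_t / [ (lam+1) * (theta*r_t + phi) ]^(1/(lam+1)),
    # valid where (lam+1)*(theta*r_t + phi) > 0
    T = len(rt)
    u = np.full(T, 1.0 / T)
    base = (lam + 1.0) * (theta * rt + phi)
    return u / base ** (1.0 / (lam + 1.0))

rt = np.array([0.02, -0.01, 0.01, 0.00])   # returns net of the target d
w_el = cr_weights(theta=2.0, phi=1.0, rt=rt, lam=0.0)        # EL special case
w_pearson = cr_weights(theta=2.0, phi=1.0, rt=rt, lam=1.0)   # Pearson chi^2 case
```

Note that for arbitrary $(\theta, \phi)$ these weights need not sum to one; it is the first-order condition (A4a) in the appendix that forces them to do so at an optimum.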

4.1 Implementation

When using the CR family, the researcher must specify $d$ and $\lambda$ prior to conducting any analysis. Selecting $d$ is often straightforward, far more so than the prospect of parameterizing a utility function, especially for the end-user of portfolio rules (e.g., financial planners, fund managers). As Roy (1952, pg. 433) reminds us:

In calling in a utility function to our aid, an appearance of generality is achieved at the cost of a loss of practical significance and applicability in our results. A man who seeks advice about his actions will not be grateful for the suggestion that he maximize expected utility.

Such an investor would likely find it easier to pin-point a shortfall level than a utility parameter for any given decision.⁷

Benchmark or target rates are readily available to investors, and have long been considered within modern portfolio theory. The Sharpe portfolio builds its excess return by subtracting out the risk-free rate; the term “differential return” or “information ratio” is used if something other than the risk-free rate serves as the benchmark. Mutual fund managers today face considerable criticism if they fail to meet or beat the sector-, cap-, or style-specific index that is most comparable to the stated objectives of their fund.

A value for $\lambda$ must be specified so the function can produce a result, i.e., a set of asset weights $w$ and Lagrange multipliers $\theta$ and $\phi$. This partially amounts to specifying a specific HARA utility function, but is less restrictive than traditional HARA implementation because the remaining parameters are identified by the endogenous Lagrange multipliers. The details of this connection are the focus of the next section.

### 5 CR and HARA

Our core contribution rests in exposing the relationship between our general CR-based family of minimum-disparity portfolio selection rules and the widely used HARA utility family. To see the connection, rewrite the CR objective function from Definition 1 as

$$\psi(\theta,\phi) = \frac{1}{\lambda T}\sum_{t=1}^{T}\big[(\lambda+1)(\theta r_t+\phi)\big]^{\frac{\lambda}{\lambda+1}} - \phi,$$

where $r_t \equiv R_t(w) - d$. Now let $\beta \equiv \lambda/(\lambda+1)$, which implies that $1/\lambda = (1-\beta)/\beta$ and $(\lambda+1) = 1/(1-\beta)$. Also, let $\eta \equiv (\lambda+1)\phi$, so $\phi = \eta(1-\beta)$. Using these notational substitutions yields

$$\frac{1}{T}\sum_{t=1}^{T}\frac{1-\beta}{\beta}\left[\frac{\theta r_t}{1-\beta}+\eta\right]^{\beta} - \eta(1-\beta);$$

i.e., maximizing $\psi(\cdot)$ is equivalent to maximizing a time-averaged HARA utility function. The HARA parameter $\beta$ is plainly pinned down by the choice of the CR parameter $\lambda$. The other two HARA parameters, $\theta$ and $\eta$, are pinned down by the CR parameters $\theta$ and $\phi$. While $\lambda$ must be set exogenously, $\theta$ and $\phi$ are decision variables within the CR objective function, and are thus endogenous in the same sense as in Stutzer (2000, 2003). These values are likewise identified by the user’s choice of $d$. The CR formulation exposes how the HARA parameters $\theta$ and $\eta$ are actually composed of the CR Lagrange multipliers on the twisted mean restriction, the constraint that the twisted weights must sum to one, and the choice of the tilting measure (encapsulated by $\lambda$).
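The reparameterization can be verified numerically: for any admissible $(\lambda, \theta, \phi)$ the CR form of $\psi$ and the HARA form agree to floating-point precision. The specific numbers below are our own test values:

```python
import numpy as np

rng = np.random.default_rng(2)
rt = rng.normal(0.01, 0.02, size=50)   # returns net of the benchmark d
lam, theta, phi = 0.5, 3.0, 1.2        # chosen so (lam+1)*(theta*rt+phi) > 0

# CR form: (1/(lam*T)) * sum_t [(lam+1)(theta*r_t + phi)]^(lam/(lam+1)) - phi
psi_cr = np.mean(((lam + 1.0) * (theta * rt + phi)) ** (lam / (lam + 1.0))) / lam - phi

# HARA form under beta = lam/(lam+1) and eta = (lam+1)*phi
beta = lam / (lam + 1.0)
eta = (lam + 1.0) * phi
psi_hara = (
    np.mean((1.0 - beta) / beta * (theta * rt / (1.0 - beta) + eta) ** beta)
    - eta * (1.0 - beta)
)
```

The two expressions are algebraically identical term by term, so any discrepancy is pure floating-point noise.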

The endogenous analog to HARA utility is “twice endogenous” in the sense that, for various values of $\theta$ and $\phi$ that may be achieved by the optimization, CRRA or CARA may obtain. For example, if $\phi = 0$ arose endogenously, then $\eta$ would also equal zero, and the HARA expression would reduce to the familiar CRRA utility model; this would be true for any values of $\theta$ and $\lambda$. Thus, the type of risk aversion arises endogenously, as do the risk aversion parameter values.

⁷ We would conjecture, based on the content of Rabin (1998) and like-minded research, that the sensibility

Regarding $\lambda$ and the type of absolute risk aversion: the Arrow-Pratt measure of absolute risk aversion is

$$\frac{\theta}{(\lambda+1)(\theta r_t+\phi)},$$

the derivative of which is

$$-(\lambda+1)\left[\frac{\theta}{(\lambda+1)(\theta r_t+\phi)}\right]^2.$$

Hence we have increasing (constant) (decreasing) absolute risk aversion if $\lambda$ is less than (equal to) (greater than) $-1$; Table 1 summarizes some of these relationships. Note that the case $\lambda = -1$ corresponds to the endogenous negative exponential utility model derived in Stutzer (2000).

Table 1: Five CR Rules and their HARA Analogs

| $\lambda$ | Disparity Name | $\beta$ | Utility Name | Type of Absolute Risk Aversion |
|---|---|---|---|---|
| 1 | Pearson’s $\chi^2$ | 0.5 | — | decreasing |
| 0 | Empirical Likelihood | 0 | — | decreasing |
| −0.5 | Freeman-Tukey | −1 | — | decreasing |
| −1 | Kullback-Leibler | $\beta \to \infty$ | Exponential | constant |
| −2 | Euclidean | 2 | Quadratic | increasing |

5.1 Numerical Considerations

Because the calculation of the CR portfolio involves tilting from one distribution to another, concepts from importance sampling can be used to gather additional insight into the portfolio ranking process. As in our case here, importance sampling involves evaluating the usefulness of various “importance densities” (or kernels, where applicable) in estimating moments of a different density, often called the “target” density. How well these target moments are estimated depends critically on the importance density. If the importance density is poor, the moment estimates will likewise be poor.

Haley and Whiteman (2008), using Geweke (1989), adapt this intuition to the KL-based portfolio selection process. The result is a way to evaluate the reliability of the portfolio selection process. In this case, the “importance density” is the empirical distribution of portfolio returns induced by a given vector of asset weights $w$. The “target density” here is the tilted distribution, meaned at $d$. The moment of interest, therefore, is the twisted mean. Is it being estimated reliably in the portfolio selection process? As Haley and Whiteman demonstrate, answering this question amounts to computing the Relative Numerical Efficiency (RNE; Geweke, 1989) value of the divergence. This RNE value, generally falling in the unit interval, should be as close to one as possible. Geweke (1989) states that the effective sample size being used to estimate the target moment, the twisted mean in our case, equals the original sample size $T$ times the RNE value.
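We do not reproduce Haley and Whiteman’s exact RNE statistic here, but the familiar normalized-weight effective-sample-size proxy from the importance-sampling literature (our own choice of diagnostic, not a formula from the paper) conveys the same intuition:

```python
import numpy as np

def rne_proxy(pi):
    # Effective sample size from normalized weights: ESS = 1 / sum_t pi_t^2;
    # RNE = ESS / T. Uniform weights give RNE = 1, while weights concentrated
    # on a single observation drive RNE toward 1/T.
    pi = np.asarray(pi, dtype=float)
    ess = 1.0 / np.sum(pi ** 2)
    return float(ess / len(pi))

uniform = np.full(10, 0.1)
spiky = np.array([0.91] + [0.01] * 9)
rne_u = rne_proxy(uniform)   # close to 1: all observations contribute
rne_s = rne_proxy(spiky)     # well below 1: few observations dominate
```

A low RNE flags a tilted distribution that leans on only a handful of sample points, exactly the situation in which the twisted mean is estimated unreliably.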

The key insight into this reliability rating process surrounds the fact that, for each portfolio, the twisted mean must itself be estimated from the data; estimation error means that the process can be muddied, i.e., the portfolio that is determined to be optimal may in fact not be optimal because it may not have been sorted properly from its competitors. This is analogous to sampling error, and in fact the RNE, as developed below, permits the CR user to obtain a non-bootstrapped, and therefore computationally inexpensive, standard error.

Because Haley and Whiteman (2008) offer an extended presentation of these concepts, we leave aside the details and instead describe how to construct the RNE for the CR rule: simply replace the KL weights used in Haley and Whiteman (2008) with the more general CR weights

$$\hat{\tau}_t(w,\theta,\phi) = \frac{u_t}{\big[(\lambda+1)\{\theta[R_t(w)-d]+\phi\}\big]^{\frac{1}{\lambda+1}}},$$

of which, of course, the KL weights are a special case.

As noted above, the CR parameter $\lambda$ must be pre-specified (as must the benchmark rate $d$), while the remaining HARA parameter values are determined endogenously. However, we conjecture that it may also be possible to endogenize the choice of $\lambda$ by augmenting the CR portfolio selection rule with an RNE-maximization component. The implications would be twofold: 1) the user would only need to (exogenously) state a benchmark value (or reference level, in the parlance of loss aversion), and 2) the “best” tilting function (in the sense that it maximizes RNE) would precipitate from the portfolio optimization process. We leave a full investigation of this possibility to future research.

### 6 Conclusions

We have proposed a new family of disparity-based shortfall-minimizing portfolio selection rules, which we have related to the familiar HARA family of expected utility functions. In this capacity, our work extends the endogenous utility interpretation from the KL case found in Stutzer (2000, 2003) to the entire CR family. This permits the HARA family to be interpreted as a family of minimum-disparity estimation problems built on the behavioral hypothesis that investors (in our immediate context) seek to minimize the probability of realizing a return below some pre-determined target or benchmark rate. This application of disparity minimization forms an interesting bridge between the seemingly simplistic and time-honored notion of shortfall minimization (and related ideas such as loss aversion) and the formally structured expected utility approach to decision under uncertainty.

### Appendix

Collected herein is a detailed proof of the primary proposition. We construct the proof as five subproofs, after first setting out a foundation and introducing a more compact notation.

### Proof of Proposition 1

Recall the objective function:

$$\max_{w,\theta,\phi}\;\min_{\hat{\tau}}\;\sum_{t=1}^{T}\frac{u}{\lambda(\lambda+1)}\left[\left(\frac{\hat{\tau}_t}{u}\right)^{-\lambda}-1\right] + \theta\sum_{t=1}^{T}\hat{\tau}_t r_t + \phi\left[\sum_{t=1}^{T}\hat{\tau}_t-1\right] \tag{A1}$$

where $r_t \equiv R_t(w) - d$ (i.e., the return net of the target $d$) and where $u = u_t = 1/T$ $\forall t$. For notational parsimony we have suppressed $r_t$’s dependence on $w$.

Taking first-order conditions of the interior minimization gives

$$\mathcal{L}_{\hat{\tau}_t} = -\frac{1}{\lambda+1}\left(\frac{\hat{\tau}_t}{u}\right)^{-(\lambda+1)} + \theta r_t + \phi = 0,$$

which implies that

$$\hat{\tau}_t(w,\theta,\phi) = \frac{u}{\big[(\lambda+1)(\theta r_t+\phi)\big]^{\frac{1}{\lambda+1}}}. \tag{A2}$$

Back-substituting gives

$$\max_{w,\theta,\phi}\;\sum_{t=1}^{T}\frac{u\big[(\lambda+1)(\theta r_t+\phi)\big]^{\frac{\lambda}{\lambda+1}}}{\lambda} - \phi - \frac{1}{\lambda(\lambda+1)}$$

subject to wealth exhaustion or, equivalently,

$$\max_{w,\theta,\phi}\;\sum_{t=1}^{T}\frac{u\big[(\lambda+1)(\theta r_t+\phi)\big]^{\frac{\lambda}{\lambda+1}}}{\lambda} - \phi - \nu\left[\sum_{j=1}^{J}w_j-1\right],$$

where $\nu$ is the Lagrange multiplier for the (explicit) wealth exhaustion constraint.⁸

Because $\theta r_t = \theta\sum_j w_j r_{jt}$, we can define $\gamma_j \equiv \theta w_j$ such that $\theta r_t = \sum_j \gamma_j r_{jt}$ and $\sum_j \gamma_j = \theta$, which means that the wealth exhaustion constraint can be folded into the objective function, thus reducing the maximization problem to

$$\max_{\phi,\gamma}\;\psi(\phi,\gamma) \equiv \sum_{t=1}^{T}\frac{u\big[(\lambda+1)\big(\sum_j \gamma_j r_{jt}+\phi\big)\big]^{\frac{\lambda}{\lambda+1}}}{\lambda} - \phi. \tag{A3}$$

The first-order conditions for the resulting maximization problem are

$$\psi_\phi = \sum_{t=1}^{T}u\big[(\lambda+1)(\theta r_t+\phi)\big]^{\frac{-1}{\lambda+1}} - 1 = 0 \tag{A4a}$$

$$\psi_{\gamma_j} = \sum_{t=1}^{T}u\,r_{jt}\big[(\lambda+1)(\theta r_t+\phi)\big]^{\frac{-1}{\lambda+1}} = 0 \tag{A4b}$$

Equation (A4a) implies that $\sum_t \hat{\tau}_t = 1$, while (A4b) states that $\sum_t \hat{\tau}_t r_{jt} = 0$; i.e., the tilting weights will be chosen so that the twisted distributions of each of the assets will be meaned at $d$.

⁸ Assuming an interior solution, the second-order condition for the interior minimization produces a

To assess the second-order conditions of (A3), let

$$m_t \equiv -u\big[(\lambda+1)(\theta r_t+\phi)\big]^{\frac{-(\lambda+2)}{\lambda+1}},$$

and write the Hessian as

$$\begin{bmatrix}
\sum_t m_t & \sum_t m_t r_{1t} & \sum_t m_t r_{2t} & \cdots & \sum_t m_t r_{Jt} \\
\sum_t m_t r_{1t} & \sum_t m_t r_{1t}^2 & \sum_t m_t r_{1t}r_{2t} & \cdots & \sum_t m_t r_{1t}r_{Jt} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\sum_t m_t r_{Jt} & \sum_t m_t r_{1t}r_{Jt} & \sum_t m_t r_{2t}r_{Jt} & \cdots & \sum_t m_t r_{Jt}^2
\end{bmatrix}.$$

The Hessian principal minors are all weighted sums of squares:

$$H_1 = \sum_{t=1}^{T} m_t, \qquad H_2 = \sum_{t=1}^{T-1}\sum_{s=t+1}^{T} m_t m_s (r_{1t}-r_{1s})^2,$$

$$H_3 = \sum_{t=1}^{T-2}\sum_{s=t+1}^{T-1}\sum_{q=s+1}^{T} m_t m_s m_q\big[r_{2t}(r_{1s}-r_{1q}) + r_{2s}(r_{1q}-r_{1t}) + r_{2q}(r_{1t}-r_{1s})\big]^2,$$

and so forth. If all the $m_t$ are negative then the Hessian is negative definite. However, from the definition of $m_t$, $m_t$ is negative if $(\lambda+1)(\theta r_t+\phi)$ is positive, which is also a necessary condition for $\hat{\tau}_t$ to be positive. Since at least some of the $r_{jt}$ must be negative for the portfolio decision to be nontrivial, we must have

$$\text{for } \lambda > -1:\; (\theta r_t+\phi) > 0; \qquad \text{for } \lambda < -1:\; (\theta r_t+\phi) < 0. \tag{A5}$$

If (A5) is met, any solution we have will be unique. We verify these conditions using the following three cases.⁹

1. The “upper” case: $\lambda > 0$, which implies that $0 < \frac{\lambda}{\lambda+1} < 1$. To explore the impact of choosing the weight $\gamma_j$ for any asset $j$, suppose that $\phi$ and the other $\gamma_{k\neq j}$ have been chosen, and denote $\phi_t^* = \big(\phi + \sum_{k\neq j}\gamma_k r_{kt}\big)$ for each time period $t$. Also denote by $r_{j\max}$ and $r_{j\min}$ the rates of return for asset $j$ farthest above and below their respective $\phi_t^*$. From (A3), $\psi(\phi,\gamma)$ will be real-valued if all the $\big(\sum_j \gamma_j r_{jt}+\phi\big)$ are nonnegative, which in turn requires

$$\frac{-\phi_t^*}{r_{j\max}} \leq \gamma_j \leq \frac{-\phi_s^*}{r_{j\min}}. \tag{A6}$$

⁹ For completeness, we include cases four and five, though they have already been discovered elsewhere,

If (A6) is satisfied for all $J$ assets, then $(\theta r_t+\phi)$ will be positive, which by (A5) assures concavity. Within the range (A6), all of the $(\gamma_j r_{jt}+\phi_t^*)$ are either monotonically increasing or decreasing concave functions of $\gamma_j$, depending on the sign of $r_{jt}$. At the two boundaries, $(\gamma_j r_{j\min}+\phi_t^*)/\lambda$ or $(\gamma_j r_{j\max}+\phi_s^*)/\lambda$ will equal zero. Since presumably the majority of the $r_{jt}$ are positive, we might expect $\psi(\phi,\gamma)$ to be increasing over most of the range. However, since from (A4b) the slope of $\psi(\phi,\gamma)$ approaches positive and negative infinity at the lower and upper limits of the range, we must have an interior maximum.

2. The “central” case: $-1 < \lambda < 0$, which implies that $\frac{\lambda}{\lambda+1} < 0$. Using the same notation as the previous case, note that since the exponent in (A3) will be negative, $\psi(\phi,\gamma)$ will be discontinuous at every value of $\gamma_j = -\phi_t^*/r_{jt}$. The only range over which all the values of $(\gamma_j r_{jt}+\phi_t^*)$ will be positive lies between the points of discontinuity for $r_{j\min}$ and $r_{j\max}$. Then a sufficient condition for all the $\hat{\tau}_t$ to be positive and, by (A5), for $\psi(\cdot)$ to be concave is

$$\frac{-\phi_t^*}{r_{j\max}} < \gamma_j < \frac{-\phi_s^*}{r_{j\min}} \tag{A7}$$

for each asset.

As $\gamma_j$ approaches either boundary, either $(\gamma_j r_{j\min}+\phi_t^*)/\lambda$ or $(\gamma_j r_{j\max}+\phi_s^*)/\lambda$ will approach negative infinity. Otherwise the $(\gamma_j r_{jt}+\phi_t^*)$ are monotonically increasing or decreasing functions of $\gamma_j$, depending on the sign of $r_{jt}$, and the sum $\psi(\phi,\gamma)$ will have a unique interior maximum.

3. The “lower” case: $\lambda < -1$, which implies that $\frac{\lambda}{\lambda+1} > 1$. Again using the same notation, since $(\lambda+1)$ is negative, our sufficient condition for concavity requires $(\theta r_t+\phi) \leq 0$, or each $(\gamma_j r_{jt}+\phi_t^*) \leq 0$, giving the range for $\gamma_j$:

$$\frac{-\phi_t^*}{r_{j\min}} \leq \gamma_j \leq \frac{-\phi_s^*}{r_{j\max}}.$$

At the boundaries, $\psi_{\gamma_j}$ in (A4b) is zero for $r_{j\min}$ or $r_{j\max}$; since its slope for the other $r_{jt}$ will be positive or negative depending on the sign of $r_{jt}$, and the magnitudes of those slopes depend on the particular $r_{jt}$ values, there is no guarantee that the summed slope will be positive at the lower boundary or negative at the upper boundary. In short, we cannot be sure an interior maximum exists. However, for values of $\lambda$ that satisfy $\lambda = -2k/(2k-1)$, so that $\lambda/(\lambda+1) = 2k$ where $k$ is any positive integer, $\big[(\lambda+1)(\gamma_j r_{jt}+\phi_t^*)\big]^{2k}/\lambda$ is everywhere concave, and hence $\psi(\phi,\gamma)$ is everywhere concave, and a unique maximum exists.

4. Case four (Kullback-Leibler, the GSF rule): For portfolio return $R(w)$, portfolio weights $w$, target rate $d$, and parameter $\theta \in \Re_-$, GSF is concave in $\theta$ and $w$. To see this, let $\lambda(\theta,w) \equiv \log E\{\exp[\theta R(w)]\}$. For $\alpha, \beta \in [0,1]$ such that $\alpha+\beta = 1$ and $\theta_1, \theta_2 \in \Re_-$,

$$E\{\exp[(\alpha\theta_1+\beta\theta_2)R(w)]\} = E\{\exp[\alpha R(w)\theta_1]\exp[\beta R(w)\theta_2]\},$$

and Hölder’s inequality implies that

$$E\{\exp[\alpha R(w)\theta_1]\exp[\beta R(w)\theta_2]\} \leq E\{\exp[R(w)\theta_1]\}^{\alpha}\,E\{\exp[R(w)\theta_2]\}^{\beta}.$$

The monotonicity of $\log(\cdot)$ then implies that

$$\lambda(\alpha\theta_1+\beta\theta_2,w) \leq \alpha\lambda(\theta_1,w) + \beta\lambda(\theta_2,w),$$

which immediately implies that $\lambda(\theta,w)$ is convex in $\theta$ and that $[d\theta - \lambda(\theta,w)]$ is concave in $\theta$.

Regarding $w$: for $\alpha, \beta \in [0,1]$ such that $\alpha+\beta = 1$ and $w_a, w_b \in \Re^N$,

$$E\{\exp[\theta R(\alpha w_a+\beta w_b)]\} = E\{\exp[\alpha\theta R(w_a)]\exp[\beta\theta R(w_b)]\},$$

which uses the fact that $R(\alpha w_a+\beta w_b)$ equals $[\alpha R(w_a)+\beta R(w_b)]$. Then Hölder’s inequality implies that

$$E\{\exp[\alpha\theta R(w_a)]\exp[\beta\theta R(w_b)]\} \leq E\{\exp[\theta R(w_a)]\}^{\alpha}\,E\{\exp[\theta R(w_b)]\}^{\beta}.$$

The monotonicity of $\log(\cdot)$ then implies that

$$\lambda(\theta, \alpha w_a+\beta w_b) \leq \alpha\lambda(\theta,w_a) + \beta\lambda(\theta,w_b),$$

which immediately implies that $\lambda(\theta,w)$ is convex in $w$ and that $[d\theta - \lambda(\theta,w)]$ is concave in $w$.

5. Case five (Empirical Likelihood): $\lambda = 0$. This proof appears in Haley and McGee (2009), but we include it here for completeness. Under EL divergence the twisting loss function that needs to be minimized is

$$\sum_{t=1}^{T} u\log(u/\rho_t).$$

Therefore, we seek the portfolio weights $w_i$ and the twisted probabilities $\rho_t$ that solve

$$\max_{w_i}\;\min_{\rho_t,\theta,\phi}\;\sum_{t=1}^{T} u\log(u/\rho_t) + \theta\sum_{t=1}^{T}\rho_t r_t + \phi\left[\sum_{t=1}^{T}\rho_t - 1\right] \tag{A8}$$

subject to wealth exhaustion. Beginning with the interior minimization problem, the first-order conditions are

$$\mathcal{L}_{\rho_t} = -(u/\rho_t) + \theta r_t + \phi = 0,$$

$$\mathcal{L}_{\theta} = \sum_{t=1}^{T}\rho_t r_t = 0,$$

$$\mathcal{L}_{\phi} = \sum_{t=1}^{T}\rho_t - 1 = 0.$$
The conditions with respect to the $\rho_t$ can be rewritten $\theta\rho_t r_t + \phi\rho_t = u$. Summing over the $T$ periods gives

$$\theta\sum_{t=1}^{T}\rho_t r_t + \phi\sum_{t=1}^{T}\rho_t = \sum_{t=1}^{T}u = 1. \tag{A9}$$

The other two first-order conditions imply $\sum_{t=1}^{T}\rho_t r_t = 0$ and $\sum_{t=1}^{T}\rho_t = 1$, so (A9) reduces to $\phi = 1$, giving

$$\rho_t = u/(1+\theta r_t). \tag{A10}$$

The second-order conditions for the interior minimization problem produce the bordered Hessian

$$|\bar{H}| = \begin{vmatrix}
0 & 0 & 1 & \cdots & 1 \\
0 & 0 & r_1 & \cdots & r_T \\
1 & r_1 & u/\rho_1^2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & r_T & 0 & \cdots & u/\rho_T^2
\end{vmatrix} = \sum_{t=1}^{T}\sum_{s=t+1}^{T}(r_t-r_s)^2\prod_{n\neq s,t}\frac{u}{\rho_n^2},$$

which is strictly positive, as are all the relevant principal minors, ensuring a unique minimum. Substituting (A10) for $\rho_t$ into (A8), the outer maximization problem becomes

max

θ,wi

T

X

t=1

ulog 1 +θrt) (A11)
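The passage from (A8) to (A11) relies on $u/\rho_t = 1+\theta r_t$, so the divergence term $u\log(u/\rho_t)$ equals $u\log(1+\theta r_t)$ term by term, while the constraint terms vanish at the solution. A minimal sketch with hypothetical returns and an arbitrary admissible $\theta$ confirms the identity:

```python
import math

# Hypothetical excess returns r_t and a theta with 1 + theta*r_t > 0 for all t.
r = [0.08, -0.03, 0.05, -0.06, 0.04]
u = 1.0 / len(r)
theta = 2.0

rho = [u / (1.0 + theta * rt) for rt in r]               # (A10)
lhs = sum(u * math.log(u / p) for p in rho)              # divergence term in (A8)
rhs = sum(u * math.log(1.0 + theta * rt) for rt in r)    # objective in (A11)
assert abs(lhs - rhs) < 1e-12
```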

The derivative with respect to $\theta$ again produces $E(r_t) = 0$, or $E(R_t) = d$. To characterize the solution for $\theta$, note first from (A10) that for all the $\rho_t$ to be positive, we must have $(1+\theta r_t) > 0$ for all $t$. This condition is also required by (A11), for the logarithm to be defined. Denoting $r_{\min}$ and $r_{\max}$ as the most extreme values of $r_t$ (for any given set of weights $w_i$), where $r_{\min} < 0 < r_{\max}$, the requirement $(1+\theta r_t) > 0$ for all $t$ implies
\[
-\frac{1}{r_{\max}} < \theta < -\frac{1}{r_{\min}},
\]
that is, $\theta$ must fall in an interval bounded around zero.10

To further limit the value of $\theta$, let $g(\theta)$ be the partial derivative of (A11) with respect to $\theta$:
\[
g(\theta) = \sum_{t=1}^{T}\frac{r_t}{1+\theta r_t},
\]
with the solution for $\theta$ at $g(\theta) = 0$. Clearly, $g'(\theta) = -\sum_{t=1}^{T} r_t^2/(1+\theta r_t)^2$ is negative. Since $g(0) = \sum_{t=1}^{T} r_t = (\mu - d) > 0$, $\theta$ has a unique positive solution:
\[
0 < \theta < -\frac{1}{r_{\min}}.
\]

Rather than maximize (A11) with respect to the $w_i$ directly, note that $\theta r_t = \sum_{i=1}^{N}\theta w_i r_{it} \equiv \sum_{i=1}^{N}\gamma_i r_{it}$, where $\gamma_i \equiv \theta w_i$ and $\sum_{i=1}^{N}\gamma_i = \theta$. Replacing the $N+1$ choice variables $\theta$ and the $w_i$ in (A11) with the $N$ variables $\gamma_i$ makes the wealth exhaustion constraint redundant, so (A11) reduces to
\[
\max_{\gamma_i}\; \sum_{t=1}^{T} u\log\left(1+\sum_{i=1}^{N}\gamma_i r_{it}\right).
\]

The first-order conditions are
\[
L_{\gamma_i} = \sum_{t=1}^{T}\frac{u\, r_{it}}{1+\theta r_t} = 0,
\]

which again requires that, under the twisted probabilities, the mean return for each of the $N$ assets equals $d$. The Hessian of the second partial derivatives is
\[
|H| = \begin{vmatrix}
-\sum_{t=1}^{T}\frac{u r_{1t}^2}{(1+\theta r_t)^2} & -\sum_{t=1}^{T}\frac{u r_{1t} r_{2t}}{(1+\theta r_t)^2} & \cdots & -\sum_{t=1}^{T}\frac{u r_{1t} r_{Nt}}{(1+\theta r_t)^2} \\
-\sum_{t=1}^{T}\frac{u r_{1t} r_{2t}}{(1+\theta r_t)^2} & -\sum_{t=1}^{T}\frac{u r_{2t}^2}{(1+\theta r_t)^2} & \cdots & -\sum_{t=1}^{T}\frac{u r_{2t} r_{Nt}}{(1+\theta r_t)^2} \\
\vdots & \vdots & \ddots & \vdots \\
-\sum_{t=1}^{T}\frac{u r_{1t} r_{Nt}}{(1+\theta r_t)^2} & -\sum_{t=1}^{T}\frac{u r_{2t} r_{Nt}}{(1+\theta r_t)^2} & \cdots & -\sum_{t=1}^{T}\frac{u r_{Nt}^2}{(1+\theta r_t)^2}
\end{vmatrix}.
\]
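Each entry of $|H|$ is minus a weighted Gram-matrix element, so the Hessian is negative semidefinite (negative definite when the asset return columns are linearly independent), confirming a maximum. A small two-asset sketch with hypothetical return data checks the sign conditions on the leading principal minors at $\gamma = 0$:

```python
# Hypothetical return data r_it (T = 4 periods, N = 2 assets).
r_it = [[0.09, -0.02], [-0.04, 0.01], [0.06, -0.03], [-0.05, 0.02]]
u = 1.0 / len(r_it)
gamma = [0.0, 0.0]  # evaluate at gamma = 0, where 1 + sum_i gamma_i*r_it = 1

def H(i, j):
    # H_ij = -sum_t u * r_it * r_jt / (1 + sum_i gamma_i * r_it)^2
    total = 0.0
    for row in r_it:
        denom = (1.0 + sum(gm * rv for gm, rv in zip(gamma, row))) ** 2
        total -= u * row[i] * row[j] / denom
    return total

h11, h12, h22 = H(0, 0), H(0, 1), H(1, 1)
assert h11 < 0                     # first leading principal minor is negative
assert h11 * h22 - h12 * h12 > 0   # second leading principal minor is positive
```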

### References

[1] Baggerly, K. 1998. “Empirical Likelihood as a Goodness-of-Fit Measure,” Biometrika 85: 535.

[2] Basu, A., Park, C., Lindsay, B. G., and Li, H. 2004. “Some Variants of Minimum Disparity Estimation,” Computational Statistics and Data Analysis 45: 741–763.

[3] Bawa, V. 1976. “Admissible Portfolios for All Individuals,” Journal of Finance 31: 1169–1183.

[4] Bawa, V. 1978. “Safety-First, Stochastic Dominance, and Optimal Portfolio Choice,” Journal of Financial and Quantitative Analysis 13: 255–271.

[5] Benartzi, S. and R. Thaler. 1995. “Myopic Loss Aversion and the Equity Premium Puzzle,” Quarterly Journal of Economics 110: 73–92.

[6] Candeal, J., De Miguel, J., Induráin, E., and G. Mehta. 2001. “Utility and Entropy,” Economic Theory 17: 233–238.

[7] Cressie, N. and T. Read. 1984. “Multinomial Goodness-of-Fit Tests,” Journal of the Royal Statistical Society, Series B (Methodological) 46: 440–464.

[8] Fishburn, P. 1977. “Mean-Risk Analysis with Risk Associated with Below-Target Returns,” American Economic Review 67: 116–126.

[9] Geweke, J. 1989. “Bayesian Inference in Econometric Models Using Monte Carlo Integration,” Econometrica 57: 1317–1339.

[10] Golan, A. and E. Maasoumi. 2008. “Information Theoretic and Entropy Methods: An Overview,” Econometric Reviews 27: 317–328.

[11] Granger, C., Maasoumi, E., and J. Racine. 2004. “A Dependence Metric for Possible Nonlinear Processes,” Journal of Time Series Analysis 25: 649–669.

[12] Haley, M. and C. Whiteman. 2008. “Generalized Safety First and a New Twist on Portfolio Performance,” Econometric Reviews 27: 457–483.

[13] Haley, M. and M. K. McGee. 2009. “KLICing There and Back Again: Portfolio Selection Using the Empirical Likelihood Divergence and Hellinger Distance,” manuscript, University of Wisconsin Oshkosh.

[14] Imbens, G. W., R. H. Spady, and P. Johnson. 1998. “Information Theoretic Approaches to Inference in Moment Condition Models,” Econometrica 333–357.

[15] Jorion, P. 2000. Value at Risk: The New Benchmark for Managing Financial Risk, second edition. McGraw-Hill.

[16] Kitamura, Y. and M. Stutzer. 1997. “An Information-Theoretic Alternative to Generalized Method of Moments Estimation,” Econometrica 65: 861–874.

[18] Levy, H. and M. Sarnat. 1972. “Safety First – An Expected Utility Principle,” Journal of Financial and Quantitative Analysis 7: 1829–1834.

[19] Markowitz, H. 1952. “Portfolio Selection,” Journal of Finance 7: 77–91.

[20] Markowitz, H. 1970. Portfolio Selection: Efficient Diversification of Investments, second edition. New Haven: Yale University Press.

[21] Markowitz, H. 1999. “The Early History of Portfolio Theory: 1600–1960,” Financial Analysts Journal 55: 5–15.

[22] Perets, G. and E. Yashiv. 2005. “Deriving HARA Utility ‘Endogenously’: Invariance Restrictions in Economic Optimization,” Universite de Lyon: working paper.

[23] Pyle, D. and S. Turnovsky. 1970. “Safety-First and Expected Utility Maximization in Mean-Standard Deviation Portfolio Analysis,” Review of Economics and Statistics 52: 75–81.

[24] Qin, J. and J. Lawless. 1994. “Empirical Likelihood and General Estimating Equations,” The Annals of Statistics 22: 300–325.

[25] Rabin, M. 1998. “Psychology and Economics,” Journal of Economic Literature 36: 11–46.

[26] Robertson, J., Tallman, E., and C. Whiteman. 2005. “Forecasting Using Relative Entropy,” Journal of Money, Credit, and Banking 37: 383–401.

[27] Rockafellar, R. 1970. Convex Analysis, Princeton: Princeton University Press.

[28] Roy, A. 1952. “Safety First and the Holding of Assets,” Econometrica 20: 431–449.

[29] Sharpe, W. 1994. “The Sharpe Ratio,” Journal of Portfolio Management Fall: 49–58.

[30] Sortino, F. and L. Price. 1994. “Performance Measurement in a Downside Risk Framework,” Journal of Investing Fall: 59–65.

[31] Stutzer, M. 1996. “A Simple Nonparametric Approach to Derivative Security Valuation,” Journal of Finance 51: 1633–1652.

[32] Stutzer, M. 2000. “A Portfolio Performance Index,” Financial Analysts Journal May/June: 52–61.