• No results found

Convex Optimization Applications

N/A
N/A
Protected

Academic year: 2021

Share "Convex Optimization Applications"

Copied!
51
0
0

Loading.... (view fulltext now)

Full text

(1)

Convex Optimization Applications

Stephen Boyd Steven Diamond Junzi Zhang Akshay Agrawal

EE & CS Departments Stanford University

1

(2)

Outline

Portfolio Optimization Worst-Case Risk Analysis Optimal Advertising Regression Variations Model Fitting

2

(3)

Outline

Portfolio Optimization Worst-Case Risk Analysis Optimal Advertising Regression Variations Model Fitting

Portfolio Optimization 3

(4)

Portfolio allocation vector

I

invest fraction w

i

in asset i , i = 1, . . . , n

I

w ∈ R

n

is portfolio allocation vector

I 1T

w = 1

I

w

i

< 0 means a short position in asset i

(borrow shares and sell now; must replace later)

I

w ≥ 0 is a long only portfolio

I

kw k

1

= 1

T

w

+

+ 1

T

w

is leverage (many other definitions used . . . )

Portfolio Optimization 4

(5)

Asset returns

I

investments held for one period

I

initial prices p

i

> 0; end of period prices p

i+

> 0

I

asset (fractional) returns r

i

= (p

i+

− p

i

)/p

i I

portfolio (fractional) return R = r

T

w

I

common model: r is a random variable, with mean E r = µ, covariance E(r − µ)(r − µ)

T

= Σ

I

so R is a RV with E R = µ

T

w , var(R) = w

T

Σw

I E R is (mean) return of portfolio

I var(R) is risk of portfolio

(risk also sometimes given as std(R) =

pvar(R))

I

two objectives: high return, low risk

Portfolio Optimization 5

(6)

Classical (Markowitz) portfolio optimization

maximize µ

T

w − γw

T

Σw subject to

1T

w = 1, w ∈ W

I

variable w ∈ R

n

I

W is set of allowed portfolios

I

common case: W = R

n+

(long only portfolio)

I

γ > 0 is the risk aversion parameter

I

µ

T

w − γw

T

Σw is risk-adjusted return

I

varying γ gives optimal risk-return trade-off

I

can also fix return and minimize risk, etc.

Portfolio Optimization 6

(7)

Example

optimal risk-return trade-off for 10 assets, long only portfolio

Portfolio Optimization 7

(8)

Example

return distributions for two risk aversion values

Portfolio Optimization 8

(9)

Portfolio constraints

I

W = R

n

(simple analytical solution)

I

leverage limit: kw k

1

≤ L

max

I

market neutral: m

T

Σw = 0

I mi is capitalization of asset i

I M = mTr is market return

I mTΣw = cov(M, R)

i.e., market neutral portfolio return is uncorrelated with market return

Portfolio Optimization 9

(10)

Example

optimal risk-return trade-off curves for leverage limits 1, 2, 4

Portfolio Optimization 10

(11)

Example

three portfolios with w

T

Σw = 2, leverage limits L = 1, 2, 4

Portfolio Optimization 11

(12)

Variations

I

require µ

T

w ≥ R

min

, minimize w

T

Σw or kΣ

1/2

w k

2 I

include (broker) cost of short positions,

s

T

(w )

, s ≥ 0

I

include transaction cost (from previous portfolio w

prev

), κ

T

|w − w

prev

|

η

, κ ≥ 0

common models: η = 1, 3/2, 2

Portfolio Optimization 12

(13)

Factor covariance model

Σ = F ˜ ΣF

T

+ D

I

F ∈ R

n×k

, k  n is factor loading matrix

I

k is number of factors (or sectors), typically 10s

I

F

ij

is loading of asset i to factor j

I

D is diagonal matrix; D

ii

> 0 is idiosyncratic risk

I

Σ > 0 is the factor covariance matrix ˜

I

F

T

w ∈ R

k

gives portfolio factor exposures

I

portfolio is factor j neutral if (F

T

w )

j

= 0

Portfolio Optimization 13

(14)

Portfolio optimization with factor covariance model

maximize µ

T

w − γ



f

T

Σf + w ˜

T

Dw



subject to

1T

w = 1, f = F

T

w

w ∈ W, f ∈ F

I

variables w ∈ R

n

(allocations), f ∈ R

k

(factor exposures)

I

F gives factor exposure constraints

I

computational advantage: O(nk

2

) vs. O(n

3

)

Portfolio Optimization 14

(15)

Example

I

50 factors, 3000 assets

I

leverage limit = 2

I

solve with covariance given as

I single matrix

I factor model

I

CVXPY/OSQP single thread time

covariance solve time single matrix 173.30 sec factor model 0.85 sec

Portfolio Optimization 15

(16)

Outline

Portfolio Optimization Worst-Case Risk Analysis Optimal Advertising Regression Variations Model Fitting

Worst-Case Risk Analysis 16

(17)

Covariance uncertainty

I

single period Markowitz portfolio allocation problem

I

we have fixed portfolio allocation w ∈ R

n

I

return covariance Σ not known, but we believe Σ ∈ S

I

S is convex set of possible covariance matrices

I

risk is w

T

Σw , a linear function of Σ

Worst-Case Risk Analysis 17

(18)

Worst-case risk analysis

I

what is the worst (maximum) risk, over all possible covariance matrices?

I

worst-case risk analysis problem:

maximize w

T

Σw

subject to Σ ∈ S, Σ  0 with variable Σ

I

. . . a convex problem with variable Σ

I

if the worst-case risk is not too bad, you can worry less

I

if not, you’ll confront your worst nightmare

Worst-Case Risk Analysis 18

(19)

Example

I

w = (−0.01, 0.13, 0.18, 0.88, −0.18)

I

optimized for Σ

nom

, return 0.1, leverage limit 2

I

S = {Σ

nom

+ ∆ : |∆

ii

| = 0, |∆

ij

| ≤ 0.2},

Σ

nom

=

0.58 0.2 0.57 −0.02 0.43

0.2 0.36 0.24 0 0.38

0.57 0.24 0.57 −0.01 0.47

−0.02 0 −0.01 0.05 0.08 0.43 0.38 0.47 0.08 0.92

Worst-Case Risk Analysis 19

(20)

Example

I

nominal risk = 0.168

I

worst case risk = 0.422

worst case ∆ =

0 0.04 −0.2 −0. 0.2

0.04 0 0.2 0.09 −0.2

−0.2 0.2 0 0.12 −0.2

−0. 0.09 0.12 0 −0.18

0.2 −0.2 −0.2 −0.18 0

Worst-Case Risk Analysis 20

(21)

Outline

Portfolio Optimization Worst-Case Risk Analysis Optimal Advertising Regression Variations Model Fitting

Optimal Advertising 21

(22)

Ad display

I

m advertisers/ads, i = 1, . . . , m

I

n time slots, t = 1, . . . , n

I

T

t

is total traffic in time slot t

I

D

it

≥ 0 is number of ad i displayed in period t

I P

i

D

it

≤ T

t

I

contracted minimum total displays:

Pt

D

it

≥ c

i

I

goal: choose D

it

Optimal Advertising 22

(23)

Clicks and revenue

I

C

it

is number of clicks on ad i in period t

I

click model: C

it

= P

it

D

it

, P

it

∈ [0, 1]

I

payment: R

i

> 0 per click for ad i , up to budget B

i

I

ad revenue

S

i

= min

(

R

iX

t

C

it

, B

i )

. . . a concave function of D

Optimal Advertising 23

(24)

Ad optimization

I

choose displays to maximize revenue:

maximize

Pi

S

i

subject to D ≥ 0, D

T1 ≤ T ,

D1 ≥ c

I

variable is D ∈ R

m×n

I

data are T , c, R, B, P

Optimal Advertising 24

(25)

Example

I

24 hourly periods, 5 ads (A–E)

I

total traffic:

Optimal Advertising 25

(26)

Example

I

ad data:

Ad A B C D E

c

i

61000 80000 61000 23000 64000

R

i

0.15 1.18 0.57 2.08 2.43

B

i

25000 12000 12000 11000 17000

Optimal Advertising 26

(27)

Example P

it

Optimal Advertising 27

(28)

Example optimal D

it

Optimal Advertising 28

(29)

Example

ad revenue

Ad A B C D E

c

i

61000 80000 61000 23000 64000

R

i

0.15 1.18 0.57 2.08 2.43

B

i

25000 12000 12000 11000 17000

P

t

D

it

61000 80000 148116 23000 167323 S

i

182 12000 12000 11000 7760

Optimal Advertising 29

(30)

Outline

Portfolio Optimization Worst-Case Risk Analysis Optimal Advertising Regression Variations Model Fitting

Regression Variations 30

(31)

Standard regression

I

given data (x

i

, y

i

) ∈ R

n

× R, i = 1, . . . , m

I

fit linear (affine) model ˆ y

i

= β

T

x

i

− v , β ∈ R

n

, v ∈ R

I

residuals are r

i

= ˆ y

i

− y

i

I

least-squares: choose β, v to minimize kr k

22

=

Pi

r

i2

I

mean of optimal residuals is zero

I

can add (Tychonov) regularization: with λ > 0, minimize kr k

22

+ λkβk

22

Regression Variations 31

(32)

Robust (Huber) regression

I

replace square with Huber function

φ(u) =

(

u

2

|u| ≤ M

2Mu − M

2

|u| > M M > 0 is the Huber threshold

I

same as least-squares for small residuals, but allows (some) large residuals

Regression Variations 32

(33)

Example

I

m = 450 measurements, n = 300 regressors

I

choose β

true

; x

i

∼ N (0, I)

I

set y

i

= (β

true

)

T

x

i

+ 

i

, 

i

∼ N (0, 1)

I

with probability p, replace y

i

with −y

i

I

data has fraction p of (non-obvious) wrong measurements

I

distribution of ‘good’ and ‘bad’ y

i

are the same

I

try to recover β

true

∈ R

n

from measurements y ∈ R

m

I

‘prescient’ version: we know which measurements are wrong

Regression Variations 33

(34)

Example

50 problem instances, p varying from 0 to 0.15

Regression Variations 34

(35)

Example

Regression Variations 35

(36)

Quantile regression

I

tilted `

1

penalty: for τ ∈ (0, 1),

φ(u) = τ (u)

+

+ (1 − τ )(u)

= (1/2)|u| + (τ − 1/2)u

I

quantile regression: choose β, v to minimize

Pi

φ(r

i

)

I

τ = 0.5: equal penalty for over- and under-estimating

I

τ = 0.1: 9× more penalty for under-estimating

I

τ = 0.9: 9× more penalty for over-estimating

Regression Variations 36

(37)

Quantile regression

I

for r

i

6= 0,

Pi

φ(r

i

)

∂v = τ |{i : r

i

> 0}| − (1 − τ ) |{i : r

i

< 0}|

I

(roughly speaking) for optimal v we have τ |{i : r

i

> 0}| = (1 − τ ) |{i : r

i

< 0}|

I

and so for optimal v , τ m = |{i : r

i

< 0}|

I

τ -quantile of optimal residuals is zero

I

hence the name quantile regression

Regression Variations 37

(38)

Example

I

time series x

t

, t = 0, 1, 2, . . .

I

auto-regressive predictor:

ˆ

x

t+1

= β

T

(x

t

, . . . , x

t−M

) − v

I

M = 10 is memory of predictor

I

use quantile regression for τ = 0.1, 0.5, 0.9

I

at each time t, gives three one-step-ahead predictions:

ˆ

x

t+10.1

, x ˆ

t+10.5

, x ˆ

t+10.9

Regression Variations 38

(39)

Example time series x

t

Regression Variations 39

(40)

Example

x

t

and predictions ˆ x

t+10.1

, ˆ x

t+10.5

, ˆ x

t+10.9

(training set, t = 0, . . . , 399)

Regression Variations 40

(41)

Example

x

t

and predictions ˆ x

t+10.1

, ˆ x

t+10.5

, ˆ x

t+10.9

(test set, t = 400, . . . , 449)

Regression Variations 41

(42)

Example

residual distributions for τ = 0.9, 0.5, and 0.1 (training set)

Regression Variations 42

(43)

Example

residual distributions for τ = 0.9, 0.5, and 0.1 (test set)

Regression Variations 43

(44)

Outline

Portfolio Optimization Worst-Case Risk Analysis Optimal Advertising Regression Variations Model Fitting

Model Fitting 44

(45)

Data model

I

given data (x

i

, y

i

) ∈ X × Y, i = 1, . . . , m

I

for X = R

n

, x is feature vector

I

for Y = R, y is (real) outcome or label

I

for Y = {−1, 1}, y is (boolean) outcome

I

find model or predictor ψ : X → Y so that ψ(x ) ≈ y for data (x , y ) that you haven’t seen

I

for Y = R, ψ is a regression model

I

for Y = {−1, 1}, ψ is a classifier

I

we choose ψ based on observed data, prior knowledge

Model Fitting 45

(46)

Loss minimization model

I

data model parametrized by θ ∈ R

n

I

loss function L : X × Y × R

n

→ R

I

L(x

i

, y

i

, θ) is loss (miss-fit) for data point (x

i

, y

i

), using model parameter θ

I

choose θ; then model is

ψ(x ) = argmin

y

L(x , y , θ)

Model Fitting 46

(47)

Model fitting via regularized loss minimization

I

regularization r : R

n

→ R ∪ {∞}

I

r (θ) measures model complexity, enforces constraints, or represents prior

I

choose θ by minimizing regularized loss (1/m)

X

i

L(x

i

, y

i

, θ) + r (θ)

I

for many useful cases, this is a convex problem

I

model is ψ(x ) = argmin

y

L(x , y , θ)

Model Fitting 47

(48)

Examples

model L(x , y , θ) ψ(x ) r (θ)

least-squares

T

x − y )

2

θ

T

x 0

ridge regression

T

x − y )

2

θ

T

x λkθk

22

lasso

T

x − y )

2

θ

T

x λkθk

1

logistic classifier log(1 + exp(−y θ

T

x )) sign(θ

T

x ) 0 SVM (1 − y θ

T

x )

+

sign(θ

T

x ) λkθk

22

I

λ > 0 scales regularization

I

all lead to convex fitting problems

Model Fitting 48

(49)

Example

I

original (boolean) features z ∈ {0, 1}

10

I

(boolean) outcome y ∈ {−1, 1}

I

new feature vector x ∈ {0, 1}

55

contains all products z

i

z

j

(co-occurence of pairs of original features)

I

use logistic loss, `

1

regularizer

I

training data has m = 200 examples; test on 100 examples

Model Fitting 49

(50)

Example

Model Fitting 50

(51)

Example

selected features z

i

z

j

, λ = 0.01

Model Fitting 51

References

Related documents

a church-based debt advice centre; running financial education, money management and ‘cooking on a budget’ courses (e.g. in schools, or for people in the wider community);

Convex sets, (Univariate) Functions, Optimization problem Unconstrained Optimization: Analysis and Algorithms Constrained Optimization: Analysis and Algorithms

 It was observed from literature that glass additives can be used for sintering of barium titanate and modified barium titanate at low temperature and to get

From the above table it is clear that the 35% of respondent prefer big bazaar for the price of the product, 23.5% prefer big bazaar for the availability of all product under one

Normal metabolic effects of thyroid hormone on different tissues: Liver: increase glycolysis, cholesterol synthesis, and conversion of cholesterol into bile salts Adipose: amplifies

These results suggest a re-evaluation of the importance of generator wholesale market power in vertically integrated electricity industries, and of measures to improve retail

No matter how well you plan and execute the sample collection process, there is likely to be some error in the results. •   Sampling error : the difference

Where duty payment is applicable, permit applicant should pay the duty at any branch of Bank of China (Hong Kong) Limited, Nanyang Commercial Bank Limited or Chiyu Banking