Introduction: Mathematical optimization

(1)

Introduction: Mathematical optimization

Motivating Example Applications

Least-squares(LS) and linear programming(LP)-Very common place

Course goals and topics Nonlinear optimization

Brief history of convex optimization

(2)

Almost Every Problem can be posed as an Optimization Problem

Given a setC⊆ ℜn _{find the ellipsoidE⊆ ℜ}n _{that is of smallest volume suchthatC⊆E.} Hint: First work out the problem in lower dimensions

C

x in C is a vector of size n

x = [x1,x2...xn]

Constraint: x1^2/a1^2 + x2^2/a2^2

+ ....xn^2/an^2 <= 1

a2

a1

(3)

Almost Every Problem can be posed as an Optimization Problem

Given a setC⊆ ℜ n _{find the ellipsoidE⊆ ℜ}n _{that is of smallest volume suchthatC⊆E.} Hint: First work out the problem in lower dimensions

SphereS r ⊆ ℜn _{centered at0is expressedas:}

Sr = { ||x||_2 <= r}

2-norm is the square root of sum of squares

of the individual components of x

(4)

Almost Every Problem can be posed as an Optimization Problem

u

A'

Our basic ellipsoid

had A' = diagonal

SphereS r⊆ ℜn centered at0is expressed as:S r ={u∈ ℜ n|∥u∥2≤r}

EllipsoidE⊆ ℜ n _{is expressedas:}

Av + b

A'u + b'

(5)

Almost Every Problem can be posed as an Optimization Problem

EllipsoidE⊆ ℜ n _{is expressedas:} n E= v∈ ℜ|Av+b∈S{ ₁} { n 2 = v∈ ℜ |∥Av+b∥ ≤ } n ++

1 . Here,A∈S , that is,Ais

ann×n(strictly) positive definitematrix. The optimization problem willbe:

1) A is an nxn matrix

(Sphere and Ellipsoid are both

in R^n)

2) This brings an additional

constraint that A is symmetic,

and it is positive deﬁnite

Prof. Ganesh Ramakrishnan (IIT Bombay) Introduction to Convex Optimization : CS709 July 17,2018 4 / 42

3)That is, A has positive eigen

values..

4) The positive eigenvalues will

correspond to scaling of the axis

and corresponding eigenvectors

to the new axes

(6)

Almost Every Problem can be posed as an Optimization Problem

EllipsoidE⊆ ℜ n _{is expressedas:} n

E= v∈ ℜ|Av+b∈S{ ₁}_{= v∈ ℜ|∥Av+b∥}{ n

2≤1 . Here,A} ∈S ++n , that is,Ais

ann×n(strictly) positive definite matrix. The optimization problem willbe:

minimize

[a11,a12..,ann,b1,..bn]

det(A−1₎

subjecttov T_{Av>0,∀v̸= 0}

_{A is positive deﬁnite}

(7)

Almost Every Problem can be posed as an Optimization Problem (contd.)

Given a polygonP⊆ ℜ n _{find the ellipsoidE⊆ ℜ}n _{that is of smallest volume suchthat}

et v1,v 2, ...vp be the corners of thepolygonP P ⊆E.

L

The optimization problem willbe: minimize [a11,a12..,ann,b1,..bn] subject to−v i det(A−1₎ T_{Av>0,∀v̸= 0} ∥Av +b∥ 2≤1,i∈{1..p}

Introduction to Convex Optimization: CS709 July 17, 2018 5 / 42

v1

v2

v3

v4

v5

Given that the speciﬁed set S is indeed a polygon, is this problem

with a simpliﬁed set of constraints equivalent to the original problem?

(8)

Almost Every Problem can be posed as an Optimization Problem (contd.)

Given a polygonP⊆ ℜ n _{find the ellipsoidE⊆ ℜ}n _{that is of smallest volume suchthat} P⊆E.

Let v1,v 2, ...vp be the corners of the polygonP

The optimization problem willbe: minimize [a11,a12..,ann,b1,..bn] subject to−v det(A−1₎ T_{Av>0,∀v̸= 0} ∥Avi +b∥ 2≤1,i∈{1..p}

How would you pose an optimization problem to find the ellipsoidE ′ _{of largest volume that}

(9)

So Again: Mathematical optimization

fi :ℜ n→ℜ,i= 1, ...,m: constraintfunctions

mini_xmize f0(x)

subject tofi(x)≤b i,i= 1, . . . ,m. x= (x 1, ...,x n): optimizationvariables

optimal solutionx ∗ _{has smallest value off}₀_{among all vectors that satisfy theconstraints}

(10)

Examples

portfolio optimization

variables: amounts invested in differentassets

(11)

Examples

Data fitting- Machine learning

variables: modelparameters

constraints: prior information, parameter limits objective: measure of misfit or prediction error

(12)

(13)

Problem in Perspective

1,c2, . . . ,ck}

Given data pointsx i,i= 1,2, . . . ,m

Possible class choices:c 1,c 2, . . . ,c k

Wish to generate amapping/classifier

f:x→{ c

To get classlabelsy1,y2, . . . ,ym

(14)

Problem in Perspective

In general, series ofmappings

x−→y −→z −→{c ,c , . . .,f(·) g(·) h(·) ₁ ₂ c }_k

(15)

Problem in Perspective

In general, series ofmappings

x−→y −→f(·) g(·) z −→{c ,h(·) ₁c , . . . ,c }₂ _k

wherey,zare in some latentspace.

(16)

Perceptron Classifier

Consider abinary classificationproblem:f(x)∈{−1,+1}

w^Tx + b

w

Objective: Learn a linear classifier With linearlyseparability

(17)

Perceptron Classifier

Objective: Learn alinear classifier

With linearly separability, any finite time learningalgorithm?

(18)

Perceptron Classifier

Desirable: Any new input pattern similar to a seen patternisclassifiedcorrectly

Linear Classification?

w⊤_{ϕ(x) +b≥0for +ve points (y= +1)}

w⊤_{ϕ(x) +b<0for -ve points (y= -1)}

(19)

(20)

Perceptron Update Rule: Error Perspective

Unsigned distance from hyperplane:ywT_(ϕ(x))

wTϕ(x) = 0_b ϕ(x0)

ϕ(x)

D

w

(21)

Perceptron Update Rule: Error Minimization

Perceptron update tries to minimize the errorfunction

E= negative of sum of unsigned distances over misclassified examples = sum of

misclassificationcosts _∑

E=− ywT_ϕ(x)

(x,y)∈M

whereM⊆Dis the set of misclassifiedexamples.

(22)

Machine Learning as Optimization

w_b∗ ₌argmin L(w) +Ω(∥w∥2)

w (1)

0-1 Loss:

L(w) = ∑_(x,y)δ (y̸=w T_ϕ(x)) ₍₂₎

Minimizing the 0-1 Loss is NP-hard. We therefore look forsurrogates. Perceptron:A Non-convexSurrogate

L(w) =− ∑(x,y)∈MywTϕ(x) (3)

(23)

Convex Surrogates for 0-1 Loss in Classification

w_b∗ ₌argmin L(w) +Ω(∥w∥2) w (4) Logistic Regression: L(w) =− 1∑m m i=1 y w ϕ((i) T (i) ( ₍ T

x )−log 1 +exp w ϕ x( (i)) )

)

(5)

Sigmoidal Neural Net:

1 L(w) =− _m m K ∑ ∑ _(i) ( _L _(i) ) y_k log σk x + ( ) ( (i) k (

1−y log 1−σ kxL (i)

) ( ) )

(6)

i=1 k=1

(24)

More Generally..

xrepresents some action suchas

▶ portfolio decisions to be made ▶resources to beallocated ▶schedule to becreated ▶vehicle/airline deflections

Constraints impose conditions on outcome basedon

▶performancerequirements ▶manufacturingprocess

Objectivef 0(x)might correspond to one of the following and should be desirably small ▶total cost

▶risk

(25)

Solving optimization problems

General optimization problems very difficult tosolve

methods involve some compromise, e.g., very long computation time, or not always finding thesolution

Exceptions:certain problem classes can be solved efficiently and reliably least-squaresproblems

linear programming problems convex optimization problems

(26)

Least-squares

mini_xmize∥ Ax−b∥ 2

2

solving least-squares problems

analytical solution: x∗ _{= (A}T_A)−1_AT_b

reliable and efficient algorithms andsoftware

computation time proportional to n2_{k (A∈R}k×n_{); less if structured a}

maturetechnology using least-squares

least-squares problems are easy to recognize

a few standard techniques increase flexibility (e.g., including weights, adding regularization terms)

(27)

Linear programming

mini_xmize cT_x T i

subject toa x≥b i,i= 1, . . . ,m.

solving linear programs

no analytical formula forsolution

reliable and efficient algorithms andsoftware

computation time proportional to n2_{m if m≥n; less with structure a}

maturetechnology using linear programs

not as easy to recognize as least-squaresproblems

a few standard tricks used to convert problems into linear programs (e.g., problems involving l1- or l∞-norms, piecewise-linearfunctions)

[OPTIONAL]

(28)

Convex optimization problem

mini_xmize f0(x)

subject tofi(x)≤b i,i= 1, . . . ,m.

objective and constraint functions areconvex:

fi(αx1 +βx 2)≤αf i(x1) +βfi(x2)

(29)

Example: m lamps illuminating n(small, flat) patches

k

intensity Ik at patch k depends linearly on lamp powerspj: n

∑

I = a p ,kj jakj=rkj−2max{cosθ ,0}kj

j

j=1

problem:Provided the fixed locations(a kj’s), achieve desired illumination Ides with bounded

lamp powers

mini_pmize maxk=1,..,n |log(I k)−log(Ides)|

subject to0≤p j≤p max,j= 1, . . . ,m.

[OPTIONAL]

(30)

Course goals and topics

Goals

recognize/formulate problems (such as the illumination problem) as convex optimization problem

develop code for problems of moderate size (1000 lamps, 5000patches)

characterize optimal solution (optimal power distribution), give limits of performance,etc Topics

Convex sets, (Univariate) Functions, Optimization problem Unconstrained Optimization: Analysis and Algorithms Constrained Optimization: Analysis and Algorithms Optimization Algorithms for MachineLearning

(31)

Grading and Audit

Grading

Quizzes and Assignments: 15% Midsem:25%

Endsem: 45% Project:15% Audit requirement

Quizzes and Assignments and Project