Parallel ADMM for robust quadratic optimal resource allocation problems

(1)


Zawar Qureshi, Sebastian East, Mark Cannon

University of Oxford

July 10, 2019

(2)

Optimize the power delivered by the i.c. engine and electric motor while meeting driver demand

[Figure: supervisory control loop with blocks for driver demand, supervisory controller and vehicle; signals: actuator setpoints, power, and SoC, torque, speed.]

Minimize fuel consumption over a future horizon, given:
- limits on energy capacity (battery SoC, fuel), power flows and torques

(3)

Piecewise quadratic maps fitted to the engine fuel map and the electrical loss map

[Figure: power flows. Storage (fuel, battery) supplies P_fuel and P_bat to the conversion stage (i.c. engine, motor-generator). The outputs P_eng and P_mot are summed (Σ) to give P_out.]

[Figure: engine fuel map. Fuel power P_f (kW), with BSFC (g/kWh) contours, vs. engine speed ω_eng (rad/s) and engine torque T_eng (Nm).]

[Figure: electrical losses. Battery power P_bat (kW) and loss P_loss (kW) vs. motor speed ω_mot (rad/s) and motor torque T_mot (Nm).]
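The fitting step can be sketched as a least-squares quadratic fit on each segment of a one-dimensional slice of a map, for example loss as a function of power at a fixed speed. This is a minimal sketch, not the authors' fitting procedure. The 1-D slicing, the breakpoints and the synthetic data are assumptions, and `fit_piecewise_quadratic` / `eval_piecewise` are hypothetical names.

```python
import numpy as np

def fit_piecewise_quadratic(x, y, breakpoints):
    """Least-squares quadratic on each segment [lo, hi] between consecutive breakpoints.

    Returns a list of (lo, hi, coeffs), with coeffs ordered (a2, a1, a0) as np.polyfit returns them.
    """
    pieces = []
    for lo, hi in zip(breakpoints[:-1], breakpoints[1:]):
        mask = (x >= lo) & (x <= hi)
        pieces.append((lo, hi, np.polyfit(x[mask], y[mask], deg=2)))
    return pieces

def eval_piecewise(pieces, x):
    """Evaluate the fitted map; points outside every segment are returned as NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    for lo, hi, coeffs in pieces:
        mask = (x >= lo) & (x <= hi)
        out[mask] = np.polyval(coeffs, x[mask])
    return out

# Hypothetical 1-D loss curve with a kink at 0 (synthetic data, not the maps on this slide)
p = np.linspace(-100.0, 100.0, 201)
loss = np.where(p < 0, 0.01 * p**2 - 0.1 * p + 1.0, 0.02 * p**2 + 0.05 * p + 1.0)
pieces = fit_piecewise_quadratic(p, loss, breakpoints=[-100.0, 0.0, 100.0])
```

Each fitted piece supplies one set of quadratic coefficients of the form assumed later for f and g.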

(5)

This paper:

- Stochastic demand & robust optimization
- Multiple resources
- Quadratic losses & quadratic costs
- Parallel ADMM implementation

[Figure: n sources (storage), each followed by a loss block (conversion). The outputs are summed (Σ) to give the total output.]

(6)

[Figure: block diagrams built up from one resource to n resources. Each allocation $x_k^{(i)}$ drives a loss map $g_k^{(i)}(\cdot)$ and a cost map $f_k^{(i)}(\cdot)$. Accumulators $1/(z-1)$ give the cumulative resource use $\sum_k g_k^{(i)}(x_k^{(i)})$ of each resource and the total cost $\sum_{i,k} f_k^{(i)}(x_k^{(i)})$. The allocations are summed to give the output $\sum_i x_k^{(i)}$.]

(11)

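If the demand sequence $\{y_1, \dots, y_n\}$ is known:

Optimal allocation with no uncertainty

$$
\begin{aligned}
\min_{x_k^{(i)} \in [\underline{x}_k^{(i)},\, \bar{x}_k^{(i)}]} \quad & \sum_{i,k} f_k^{(i)}(x_k^{(i)}) && \leftarrow \text{total cost} \\
\text{subject to} \quad & \sum_i x_k^{(i)} \ge y_k \quad \forall k && \leftarrow \text{demand} \\
& \sum_k g_k^{(i)}(x_k^{(i)}) \le c^{(i)} \quad \forall i && \leftarrow \text{resource capacity}
\end{aligned}
$$

Assumption: $f_k^{(i)}(\cdot)$ and $g_k^{(i)}(\cdot)$ are convex and quadratic:

$$
f_k^{(i)}(x) = \alpha_{k,2}^{(i)} x^2 + \alpha_{k,1}^{(i)} x + \alpha_{k,0}^{(i)}, \qquad
g_k^{(i)}(x) = \beta_{k,2}^{(i)} x^2 + \beta_{k,1}^{(i)} x + \beta_{k,0}^{(i)}
$$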

(13)

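Unknown demand sequence, with samples $y^{(j)} = \{y_1^{(j)}, \dots, y_n^{(j)}\}$, $j = 1, \dots, q$

Robust optimal allocation

$$
\begin{aligned}
\min_{x_k^{(i,j)} \in [\underline{x}_k^{(i)},\, \bar{x}_k^{(i)}],\; x_1^{(i)}} \quad & \frac{1}{q} \sum_{i,j,k} f_k^{(i,j)}(x_k^{(i,j)}) \\
\text{subject to} \quad & \sum_i x_k^{(i,j)} \ge y_k^{(j)} \quad \forall k, j \\
& \sum_k g_k^{(i,j)}(x_k^{(i,j)}) \le c^{(i)} \quad \forall i, j \\
& x_1^{(i,j)} = x_1^{(i)} \quad \forall i, j && \leftarrow \text{common 1st decision}
\end{aligned}
$$

Assumption: the samples $y^{(1)}, \dots, y^{(q)}$ are i.i.d.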
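For a given candidate solution, the robust problem can be checked numerically. The sketch below evaluates the sample-average cost and tests every constraint, including the common first decision. Function names are hypothetical. Arrays are indexed as x[i, j, k] (resource, scenario, time step), 0-based, so k = 0 is the first decision. The scenario generator at the end assumes Gaussian perturbations of a nominal demand; this is not the perturbation model used in the paper.

```python
import numpy as np

def quad(coeffs, x):
    """Evaluate c2*x**2 + c1*x + c0 elementwise; coeffs has a trailing axis (c2, c1, c0)."""
    return coeffs[..., 0] * x**2 + coeffs[..., 1] * x + coeffs[..., 2]

def robust_check(x, alpha, beta, Y, c, x_lo, x_hi, tol=1e-9):
    """Sample-average cost and feasibility of x for the robust allocation problem.

    x: (m, q, n) allocations; alpha, beta: (m, q, n, 3) coefficients of f and g;
    Y: (q, n) demand samples; c: (m,) capacities; x_lo, x_hi: (m, n) bounds.
    """
    q = x.shape[1]
    cost = quad(alpha, x).sum() / q                                   # (1/q) sum_{i,j,k} f
    in_bounds = np.all((x >= x_lo[:, None, :] - tol) & (x <= x_hi[:, None, :] + tol))
    demand_met = np.all(x.sum(axis=0) >= Y - tol)                     # for all k, j
    within_capacity = np.all(quad(beta, x).sum(axis=2) <= c[:, None] + tol)  # for all i, j
    common_first = np.allclose(x[:, :, 0], x[:, :1, 0])               # x_1^(i,j) = x_1^(i)
    return cost, bool(in_bounds and demand_met and within_capacity and common_first)

# Hypothetical i.i.d. demand scenarios: nominal profile plus Gaussian noise
rng = np.random.default_rng(0)
y_nominal = np.array([2.0, 3.0, 2.5, 1.0])
Y = y_nominal + 0.1 * rng.standard_normal((5, y_nominal.size))
```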

(14)

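Equivalent problem:

$$
\begin{aligned}
\min_{x_k^{(i,j)},\, x_1^{(i)},\, z_k^{(i,j)},\, s_k^{(j)},\, t^{(i,j)}} \quad & \sum_{i,j,k} \Big[ \tfrac{1}{q} f_k^{(i,j)}(x_k^{(i,j)}) + I_{[\underline{x}_k^{(i)},\, \bar{x}_k^{(i)}]}(x_k^{(i,j)}) \Big] + \sum_{j,k} I_{\ge 0}(s_k^{(j)}) + \sum_{i,j} I_{\le c^{(i)}}(t^{(i,j)}) \\
\text{subject to} \quad & g_k^{(i,j)}(x_k^{(i,j)}) = z_k^{(i,j)} \quad \forall i, j, k \\
& \sum_i x_k^{(i,j)} - y_k^{(j)} = s_k^{(j)} \quad \forall j, k \\
& \sum_k z_k^{(i,j)} = t^{(i,j)} \quad \forall i, j \\
& x_1^{(i,j)} = x_1^{(i)} \quad \forall i, j
\end{aligned}
$$

where $I_S$ is the indicator function of the set $S$:

$$
I_S(x) = \begin{cases} 0 & x \in S \\ +\infty & x \notin S \end{cases}
$$

The capacity constraints are split into two parts:
- separable nonlinearities: $z_k^{(i,j)} = g_k^{(i,j)}(x_k^{(i,j)})$
- linear constraints: $\sum_k z_k^{(i,j)} = t^{(i,j)}$, with $t^{(i,j)} \le c^{(i)}$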
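Each indicator function enters the ADMM iteration only through a proximal step, and that step is a projection. The minimizer of $I_S(u) + \tfrac{\rho}{2}(u - v)^2$ is $\Pi_S(v)$, the point of $S$ closest to $v$. The sketch below implements the three projections needed here and checks the identity by brute force on a grid. All function names are hypothetical.

```python
import math

def indicator(v, contains):
    """I_S(v): 0 if v is in S, +inf otherwise; `contains` is a membership test for S."""
    return 0.0 if contains(v) else math.inf

def proj_interval(v, lo, hi):
    """Projection onto [lo, hi], used for the bounds on x_k^(i,j)."""
    return min(max(v, lo), hi)

def proj_le(v, c):
    """Projection onto {t : t <= c}, used for the capacity variable t^(i,j)."""
    return min(v, c)

def proj_ge0(v):
    """Projection onto {s : s >= 0}, used for the demand slack s_k^(j)."""
    return max(v, 0.0)

def prox_by_grid(v, contains, rho, grid):
    """Brute-force argmin over a grid of I_S(u) + (rho/2) * (u - v)**2, for checking only."""
    return min(grid, key=lambda u: indicator(u, contains) + 0.5 * rho * (u - v) ** 2)

grid = [i / 100 for i in range(-500, 501)]
print(prox_by_grid(3.7, lambda u: u <= 2.0, 1.0, grid), proj_le(3.7, 2.0))
```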

(17)

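Augmented Lagrangian:

$$
\begin{aligned}
L = {} & \sum_{i,j,k} \tfrac{1}{q} f_k^{(i,j)}(x_k^{(i,j)})
+ \sum_{i,j,k} \Big[ I_{[\underline{x}_k^{(i)},\, \bar{x}_k^{(i)}]}(x_k^{(i,j)}) + \tfrac{\rho_1}{2} \big( z_k^{(i,j)} - g_k^{(i,j)}(x_k^{(i,j)}) + \lambda_k^{(i,j)} \big)^2 \Big] \\
& + \sum_{i,j} \Big[ I_{\le c^{(i)}}(t^{(i,j)}) + \tfrac{\rho_2}{2} \Big( t^{(i,j)} - \sum_k z_k^{(i,j)} + p^{(i,j)} \Big)^2 \Big] \\
& + \sum_{j,k} \Big[ I_{\ge 0}(s_k^{(j)}) + \tfrac{\rho_3}{2} \Big( s_k^{(j)} - \sum_i x_k^{(i,j)} + y_k^{(j)} + \mu_k^{(j)} \Big)^2 \Big]
+ \sum_{i,j} \tfrac{\rho_4}{2} \big( x_1^{(i)} - x_1^{(i,j)} + \nu^{(i,j)} \big)^2
\end{aligned}
$$

- $\lambda_k^{(i,j)}, \mu_k^{(j)}, \nu^{(i,j)}, p^{(i,j)}$: Lagrange multipliers
- $\rho_1, \rho_2, \rho_3, \rho_4$: multiplier update step size parameters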
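Fix every variable except one allocation $x = x_k^{(i,j)}$. The terms of $L$ that depend on $x$ then form a quartic, because the quadratic $g_k^{(i,j)}$ appears squared. The sketch below does the coefficient bookkeeping under two assumptions that the slides do not state:
- the other resources' allocations in the demand term are held at their current values and folded into the constant `e`;
- the $\rho_4$ term, which appears only for $k = 1$ and would add another quadratic, is omitted.

The function name and argument layout are hypothetical. The leading coefficient $A = \tfrac{\rho_1}{2}\beta_{k,2}^2$ is positive only when $\beta_{k,2}^{(i,j)} \neq 0$.

```python
def x_update_quartic(alpha, beta, w, e, q, rho1, rho3):
    """Coefficients (A, B, C, D) with phi(x) - phi(0) = A x^4 + B x^3 + C x^2 + D x, where

        phi(x) = f(x)/q + rho1/2 * (w - g(x))**2 + rho3/2 * (x - e)**2,
        f(x) = a2 x^2 + a1 x + a0,   g(x) = b2 x^2 + b1 x + b0.

    alpha = (a2, a1, a0), beta = (b2, b1, b0); w = z + lambda and
    e = s + y + mu - (sum of the other resources' allocations) are held fixed.
    """
    a2, a1, _ = alpha
    b2, b1, b0 = beta
    u = w - b0                        # w - g(x) = u - b1 x - b2 x^2
    A = 0.5 * rho1 * b2**2
    B = rho1 * b1 * b2
    C = a2 / q + 0.5 * rho1 * (b1**2 - 2 * u * b2) + 0.5 * rho3
    D = a1 / q - rho1 * u * b1 - rho3 * e
    return A, B, C, D
```

The minimizer of this quartic is then projected onto $[\underline{x}_k^{(i)}, \bar{x}_k^{(i)}]$, as in the primal update.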

(19)

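ADMM iteration: primal update

$$
\begin{aligned}
x_k^{(i,j)} &\leftarrow \arg\min_{x_k^{(i,j)}} L = \Pi_{[\underline{x}_k^{(i)},\, \bar{x}_k^{(i)}]} \big\{ \text{minimizer of a quartic polynomial in } x_k^{(i,j)} \big\} \\
z_k^{(i,j)} &\leftarrow \arg\min_{z_k^{(i,j)}} L = g_k^{(i,j)}(x_k^{(i,j)}) - \lambda_k^{(i,j)} + \frac{\rho_2}{\rho_1 + n\rho_2} \Big[ t^{(i,j)} + p^{(i,j)} - \sum_k \big( g_k^{(i,j)}(x_k^{(i,j)}) - \lambda_k^{(i,j)} \big) \Big] \\
x_1^{(i)} &\leftarrow \arg\min_{x_1^{(i)}} L = \frac{1}{q} \sum_j \big( x_1^{(i,j)} - \nu^{(i,j)} \big) \\
t^{(i,j)} &\leftarrow \arg\min_{t^{(i,j)}} L = \Pi_{\le c^{(i)}} \Big\{ \sum_k z_k^{(i,j)} - p^{(i,j)} \Big\} \\
s_k^{(j)} &\leftarrow \arg\min_{s_k^{(j)}} L = \Pi_{\ge 0} \Big\{ \sum_i x_k^{(i,j)} - y_k^{(j)} - \mu_k^{(j)} \Big\}
\end{aligned}
$$

where $\Pi_S$ denotes projection onto $S$. Some of these updates can be implemented in parallel; the others are partially parallelizable.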
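The closed-form updates for $z$, $x_1$, $t$ and $s$ are elementwise operations or sums over a single index. The NumPy sketch below implements them with hypothetical function names and arrays indexed [i, j, k] (resource, scenario, time step). The $z$ formula is checked by confirming that the gradient of the $z$-dependent terms of $L$ vanishes at the returned point.

```python
import numpy as np

def z_update(g_val, lam, t, p, rho1, rho2):
    """z_k <- a_k + rho2/(rho1 + n rho2) * (t + p - sum_k a_k), with a_k = g_k(x_k) - lambda_k.

    g_val, lam: (..., n); t, p: (...), one (i, j) pair per leading index.
    """
    a = g_val - lam
    n = a.shape[-1]
    corr = np.asarray(rho2 / (rho1 + n * rho2) * (t + p - a.sum(axis=-1)))
    return a + corr[..., None]

def x1_update(x1_scen, nu):
    """x_1^(i) <- (1/q) sum_j (x_1^(i,j) - nu^(i,j)); inputs have shape (m, q)."""
    return (x1_scen - nu).mean(axis=1)

def t_update(z, p, c):
    """t^(i,j) <- min(sum_k z_k^(i,j) - p^(i,j), c^(i)); z: (m, q, n), p: (m, q), c: (m,)."""
    return np.minimum(z.sum(axis=2) - p, c[:, None])

def s_update(x, y, mu):
    """s_k^(j) <- max(sum_i x_k^(i,j) - y_k^(j) - mu_k^(j), 0); x: (m, q, n), y, mu: (q, n)."""
    return np.maximum(x.sum(axis=0) - y - mu, 0.0)
```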

(21)

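ADMM iteration: dual update

$$
\begin{aligned}
\mu_k^{(j)} &\leftarrow \mu_k^{(j)} + s_k^{(j)} - \sum_i x_k^{(i,j)} + y_k^{(j)} \\
\lambda_k^{(i,j)} &\leftarrow \lambda_k^{(i,j)} + z_k^{(i,j)} - g_k^{(i,j)}(x_k^{(i,j)}) \\
\nu^{(i,j)} &\leftarrow \nu^{(i,j)} + x_1^{(i)} - x_1^{(i,j)} \\
p^{(i,j)} &\leftarrow p^{(i,j)} + t^{(i,j)} - \sum_k z_k^{(i,j)}
\end{aligned}
$$

Some of these updates can be implemented in parallel; the others are partially parallelizable.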
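In this scaled form, each dual increment equals the current violation of the corresponding constraint, so the same arrays can also serve as a convergence measure. The NumPy sketch below uses the same [i, j, k] layout and a hypothetical function name. Returning the largest violation as a stopping quantity is an assumption, since the slides do not give the stopping rule.

```python
import numpy as np

def dual_update(x, z, g_val, x1, s, t, y, mu, lam, nu, p):
    """One scaled dual update.

    Shapes: x, z, g_val, lam: (m, q, n); s, y, mu: (q, n); t, nu, p: (m, q); x1: (m,).
    Returns the updated (mu, lam, nu, p) and the largest absolute constraint violation.
    """
    r_mu = s - x.sum(axis=0) + y          # demand:     s = sum_i x - y
    r_lam = z - g_val                     # loss:       z = g(x)
    r_nu = x1[:, None] - x[:, :, 0]       # first step: x_1^(i,j) = x_1^(i)
    r_p = t - z.sum(axis=2)               # capacity:   t = sum_k z
    res = max(float(np.abs(r).max()) for r in (r_mu, r_lam, r_nu, r_p))
    return mu + r_mu, lam + r_lam, nu + r_nu, p + r_p, res
```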

(22)

CUDA heterogeneous programming model

CUDA kernels run in parallel on the GPU

Threads execute the same instructions simultaneously on different data

[Figure: execution timeline alternating between serial code on the CPU and parallel kernels (kernel 0, kernel 1) on the GPU.]

(23)

[Figure: CPU with main memory; GPU with device memory, L1 cache, control logic and many GPU cores.]

e.g. the Nvidia GTX 1060 3GB GPU has 1152 cores and up to 18432 threads
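The two figures follow from the card's multiprocessor layout. This breakdown assumes the GTX 1060 3GB configuration of 9 streaming multiprocessors with 128 CUDA cores each, and the compute capability 6.1 limit of 2048 resident threads per multiprocessor. Neither figure appears on the slide.

$$
9 \times 128 = 1152 \ \text{cores}, \qquad 9 \times 2048 = 18432 \ \text{resident threads} \quad (16 \text{ per core})
$$

So "up to 18432 threads" counts threads that can be resident on the GPU at the same time, not the total number a kernel launch may request.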

(24)

- Problem: $x^* = \arg\min_x \, A x^4 + B x^3 + C x^2 + D x$
- Fast algebraic solution based on Vieta’s and Cardano’s methods:

Input: coefficients A, B, C, D
  b ← 3B/(4A), c ← C/(2A), d ← D/(4A)
  Q ← c/3 − b²/9, R ← bc/6 − b³/27 − d/2, Δ ← Q³ + R²
  if Δ > 0 then
    x* ← (R + √Δ)^{1/3} + (R − √Δ)^{1/3} − b/3
  else if Q = R = 0 then
    x* ← −b/3
  else
    θ ← cos⁻¹(R / |Q|^{3/2})
    x_a ← 2|Q|^{1/2} cos(θ/3) − b/3
    x_b ← 2|Q|^{1/2} cos(θ/3 + 2π/3) − b/3
    x_c ← 2|Q|^{1/2} cos(θ/3 + 4π/3) − b/3
    (x_1, x_2, x_3) ← sort(x_a, x_b, x_c)
    δf ← ¼(x_1⁴ − x_3⁴) + (b/3)(x_1³ − x_3³) + (c/2)(x_1² − x_3²) + d(x_1 − x_3)
    if δf > 0 then x* ← x_3 else x* ← x_1
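The algorithm can be transcribed directly into Python as a CPU reference for checking it. This sketch is not the CUDA implementation from the paper's repository, and the function names are hypothetical. It assumes $A > 0$, so the quartic is bounded below, and adds two guards that the pseudocode omits:
- a real cube root that handles negative arguments;
- clamping of the arccos argument against rounding error.

```python
import math

def cbrt(v):
    """Real cube root, valid for negative arguments."""
    return math.copysign(abs(v) ** (1.0 / 3.0), v)

def quartic_argmin(A, B, C, D):
    """Global minimizer of A x^4 + B x^3 + C x^2 + D x, assuming A > 0.

    Solves the stationarity condition 4A x^3 + 3B x^2 + 2C x + D = 0, i.e.
    x^3 + b x^2 + c x + d = 0 with b = 3B/4A, c = C/2A, d = D/4A.
    """
    b, c, d = 3 * B / (4 * A), C / (2 * A), D / (4 * A)
    Q = c / 3 - b * b / 9
    R = b * c / 6 - b**3 / 27 - d / 2
    disc = Q**3 + R * R
    if disc > 0:                          # one real stationary point (Cardano)
        sq = math.sqrt(disc)
        return cbrt(R + sq) + cbrt(R - sq) - b / 3
    if Q == 0 and R == 0:                 # triple root
        return -b / 3
    # Three real stationary points (trigonometric form); Q < 0 here.
    theta = math.acos(max(-1.0, min(1.0, R / (-Q) ** 1.5)))
    r = 2 * math.sqrt(-Q)
    x1, _, x3 = sorted(r * math.cos(theta / 3 + m * 2 * math.pi / 3) - b / 3 for m in range(3))
    # The outer roots are the two local minima; df = (f(x1) - f(x3)) / (4A).
    df = (x1**4 - x3**4) / 4 + b * (x1**3 - x3**3) / 3 + c * (x1**2 - x3**2) / 2 + d * (x1 - x3)
    return x3 if df > 0 else x1
```

For example, `quartic_argmin(1.0, 0.0, -3.0, 1.0)` selects the left-hand local minimum, near x = −1.30.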

(25)

Computation time for minimization of N quartics with random coefficients (CPU speed optimizations via compiler flags /Ox and /Od)

[Figure: computation time (s, 10⁻⁵ to 10²) vs. number of equations N (10² to 10⁶), log-log axes. Series: CPU single-core with optimisations off, CPU multi-core with optimisations on, GPU.]
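Every equation is solved independently, so the algorithm vectorizes directly. This is the data-parallel structure that lets a GPU assign one thread per quartic. The NumPy sketch below only mimics that structure on the CPU and does not reproduce the timings above. Both branches are evaluated for every element and combined with `np.where`. It assumes A > 0, and the function name is hypothetical.

```python
import numpy as np

def quartic_argmin_batch(A, B, C, D):
    """Elementwise minimizer of A x^4 + B x^3 + C x^2 + D x for arrays of coefficients (A > 0)."""
    A, B, C, D = (np.asarray(v, dtype=float) for v in (A, B, C, D))
    b, c, d = 3 * B / (4 * A), C / (2 * A), D / (4 * A)
    Q = c / 3 - b**2 / 9
    R = b * c / 6 - b**3 / 27 - d / 2
    disc = Q**3 + R**2
    # Cardano branch, used where disc > 0 (sqrt of clipped disc avoids NaNs elsewhere)
    sq = np.sqrt(np.maximum(disc, 0.0))
    x_card = np.cbrt(R + sq) + np.cbrt(R - sq) - b / 3
    # Trigonometric branch, used where disc <= 0 and Q < 0 (Q >= 0 replaced by 1 to avoid 0/0)
    Qn = np.where(Q < 0, -Q, 1.0)
    theta = np.arccos(np.clip(R / Qn**1.5, -1.0, 1.0))
    r = 2 * np.sqrt(Qn)
    k = np.arange(3).reshape((3,) + (1,) * theta.ndim)
    roots = np.sort(r * np.cos(theta / 3 + 2 * np.pi * k / 3) - b / 3, axis=0)
    x1, x3 = roots[0], roots[2]
    df = (x1**4 - x3**4) / 4 + b * (x1**3 - x3**3) / 3 + c * (x1**2 - x3**2) / 2 + d * (x1 - x3)
    x_trig = np.where(df > 0, x3, x1)
    # Select per element: Cardano, triple root, or trigonometric solution
    return np.where(disc > 0, x_card, np.where((Q == 0) & (R == 0), -b / 3, x_trig))
```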

(26)

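- $y_k^{(j)}$: samples of stochastic driver power demand
- $x_k^{(1,j)}$: drive power from the i.c. engine
- $x_k^{(2,j)}$: drive power from the electric motor
- Objective: minimize fuel consumption ($f_k^{(2,j)} = 0 \ \forall j, k$)
- Constraints: battery capacity and power flows ($g_k^{(1,j)} = 0 \ \forall j, k$)

Robust optimization problem

$$
\begin{aligned}
\min_{x_k^{(i,j)} \in [\underline{x}_k^{(i)},\, \bar{x}_k^{(i)}],\; x_1^{(i)}} \quad & \frac{1}{q} \sum_{j=1}^{q} \sum_{k=1}^{n} f_k^{(1,j)}(x_k^{(1,j)}) \\
\text{subject to} \quad & x_k^{(1,j)} + x_k^{(2,j)} \ge y_k^{(j)}, \quad j \in \{1, \dots, q\},\ k \in \{1, \dots, n\} \\
& \sum_k g_k^{(2,j)}(x_k^{(2,j)}) \le \Delta E, \quad j \in \{1, \dots, q\} \\
& x_1^{(i,j)} = x_1^{(i)}, \quad i = 1, 2,\ j \in \{1, \dots, q\}
\end{aligned}
$$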

(27)

Optimal predicted battery state of charge profiles with 100 power demand scenarios generated from random perturbations of the FTP-75 drive cycle

(29)

Average computation times

[Figure: average computation time (s, 10⁻³ to 10², log scale) and number of iterations (0 to 300) vs. number of demand scenarios q (10⁰ to 10², log scale). Series: CPU runtime, GPU runtime, number of iterations.]

- The parallel implementation is between 10× and 20× faster than the serial implementation
- 0.25 s for 100 scenarios (0.6 s for 200 scenarios) is acceptable with a sampling period $T_\text{samp} = 1$ s

(30)

Contributions
- ADMM algorithm for robust quadratic resource allocation problems
- Parallel implementation on GPU, coded in CUDA

Observations
- The choice of operator splitting is important for parallel implementation
- The robust optimal energy management problem is solvable in real time using cheap, non-specialized hardware
- State-of-the-art low-cost parallel processing hardware is evolving fast

Code: https://github.com/qureshizawar/CUDA-quartic-solver


Questions?
