
Distributed Discrete-time Optimization with Coupling Constraints Based on Dual Proximal Gradient Method in Multi-agent Networks


Academic year: 2021



Jianzheng Wang, Guoqiang Hu, Senior Member, IEEE

Abstract—In this paper, we aim to solve a distributed optimization problem with coupling constraints based on the proximal gradient method in a multi-agent network, where the cost function of the agents is composed of smooth and possibly non-smooth parts. To solve this problem, we resort to the dual problem by deriving the Fenchel conjugate, resulting in a consensus-based constrained optimization problem. Then, we propose a fully distributed dual proximal gradient algorithm, where the agents make decisions only with local parameters and the information of immediate neighbours. Moreover, provided that the non-smooth parts in the primal cost functions have some simple structures, we only need to update dual variables by some simple operations, and the overall computational complexity can be reduced. An analytical convergence rate of the proposed algorithm is derived and the efficacy is numerically verified by a social welfare optimization problem in the electricity market.

Index Terms—Multi-agent network; proximal gradient method; distributed optimization; Fenchel conjugate; dual problem.

I. INTRODUCTION

A. Background and Motivation

Decentralized optimization has become an active topic in recent years for solving various engineering problems, such as detection and localization in sensor networks [1], machine learning and regression problems [2], and economic dispatch in power systems [3]. As a typical optimization architecture, each agent maintains an individual cost function and the global optimal solution can be attained with multiple rounds of communication and decision-making. In this paper, we focus on a class of composite optimization problems, where the cost functions are composed of smooth (differentiable) and possibly non-smooth (non-differentiable) parts, which are often discussed in various fields, such as resource allocation problems [4], Lasso regressions [5], and support vector machines [6]. To solve these problems, widely discussed techniques include the alternating direction method of multipliers [7], primal-dual subgradient methods [8], and proximal gradient methods [9].

Most existing works on decentralized optimization assume that the agents are fully connected to ensure the correctness of the optimization results, which limits their usage in large-scale distributed networks [10, 11]. To overcome this issue, a valid alternative is applying graph theory to model the communication links, leading to the distributed setup where the agents only communicate with their immediate neighbours [12]. However, with the increasing demand for computational efficiency in various fields, more exploration of algorithm development for distributed optimization problems (DOPs) is needed [13]. Noticing that proximal gradient methods are usually numerically more stable than their subgradient-based counterparts in composite optimization [14], in this work, we aim to develop an efficient distributed optimization algorithm based on the proximal gradient method.

Jianzheng Wang and Guoqiang Hu are with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, 639798 e-mail: ([email protected], [email protected]).

B. Literature Review

Fruitful distributed algorithms for solving DOPs can be found in the existing works. To adapt to large-scale distributed networks, consensus based DOPs without coupling constraint were studied in [15–19], where the agents make decisions with local variables and certain agreement on the optimal solution is achieved only through local communications.

Alternatively, we focus on optimizing a class of composite DOPs subject to coupling affine constraints via dual proximal gradient method. In this work, dual proximal gradient method corresponds to the proximal gradient method applied to the dual as also discussed in [4, 20, 21], where, however, no coupling constraint is considered. To the best knowledge of the authors, this work incorporates dual proximal gradient method in distributed setups with general coupling affine constraints for the first time, which enriches the existing algorithms for constrained DOPs.

To develop a fully distributed algorithm for the problem of interest, we propose a distributed dual proximal gradient (DDPG) algorithm. To highlight the new features and advantages of this work, the comparisons with some state-of-the-art works with similar problem setups are listed as follows.

• One distinct feature of the proposed DDPG algorithm is that, by resorting to the dual problem, we only need to update the dual variables by some simple operations, e.g., basic proximal mappings and simple iterations, provided that the proximal mapping of the non-smooth parts in the primal cost functions can be explicitly derived. This is more efficient than the existing distributed algorithms with possibly costly computations of the primal variables or other auxiliary variables, e.g., those provided in [22–31].¹

arXiv:2108.10652v1 [math.OC] 24 Aug 2021

• In [22, 24–31], some common fixed or varying global step-sizes are required. By contrast, the proposed fully distributed DDPG algorithm allows for heterogeneous step-sizes determined by local information, e.g., private objective functions and local parameters in the global constraints, which provides more flexibility for the initialization process and is more adaptive to large-scale distributed networks.

• The consensus-based distributed optimization algorithms studied in [23, 26, 27, 31] require the updating of some weighted running averages of variables or gradients, which increases the computational complexity and requires more memory capacity for the auxiliary variables.

• An explicit convergence rate is derived for the proposed DDPG algorithm, which is not provided in [24, 26–28, 30]. In addition, the algorithms in [22, 24–28, 30, 31] assume some compact local constraints to ensure the convergence of the algorithms. By contrast, this work focuses on dual sequences without a boundedness requirement on the primal variables.

The contributions of this work are summarized as follows.

• We consider a class of composite DOPs with local convex and coupling affine constraints. A DDPG algorithm is proposed by deriving the dual problem based on the Fenchel conjugate, where the optimal solution can be attained when the agents execute updates only with the dual information of immediate neighbours and locally determined step-sizes, leading to a fully distributed computation environment.

• Different from the existing research works with similar problem setups, the proposed DDPG algorithm only requires the update of dual variables by some simple operations if the non-smooth parts of the objective functions are simple-structured, which can reduce the overall computational complexity. In addition, the proposed DDPG algorithm requires only some widely used assumptions on the primal problems, and an explicit convergence rate is provided.

C. Paper Structure and Notations

The remainder of this paper is organized as follows. Section II provides some fundamental definitions and mathematical properties employed by this work. Section III formulates the optimization problem of interest and introduces the assumptions. In Section IV, the dual problem is formulated and the DDPG algorithm is proposed. The convergence analysis is conducted in Section V. The efficacy of the proposed DDPG algorithm is demonstrated in Section VI with a social welfare optimization problem in the electricity market. Section VII concludes this paper.

¹ For the DOPs with smooth cost functions, some existing works on dual algorithms, e.g., [32], can avoid the update of primal variables. However, directly extending their results to non-smooth cases can be costly in the sense that the computation of the gradient of the formulated dual function requires an additional nontrivial optimization process. Therefore, the contribution to computational efficiency of this work is established for possibly non-smooth cost functions.

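$\mathbb{N}$ and $\mathbb{N}_+$ denote the non-negative and positive integer spaces, respectively. Let $|\mathcal{A}|$ be the size of set $\mathcal{A}$. $\mathbb{R}^n_+$ denotes the $n$-dimensional Euclidean space with only non-negative real elements. Operator $(\cdot)^\top$ represents the transpose of a matrix. $\mathcal{A}_1 \times \mathcal{A}_2$ denotes the Cartesian product of sets $\mathcal{A}_1$ and $\mathcal{A}_2$. $\mathrm{relint}\,\mathcal{A}$ represents the relative interior of set $\mathcal{A}$. $\|\cdot\|$ and $\|\cdot\|_1$ refer to the $l_2$- and $l_1$-norms, respectively. Define $\|u\|^2_X = u^\top X u$ with $X$ a square matrix. $\otimes$ is the Kronecker product. $I_n$ is an $n$-dimensional identity matrix and $O_{n \times m}$ is an $(n \times m)$-dimensional zero matrix. $\mathbf{1}_n$ and $\mathbf{0}_n$ denote the $n$-dimensional column vectors with all elements 1 and 0, respectively. Define

$$
D^m_{\mathcal{A}}[u_n] = \begin{bmatrix} u_1 I_m & & O \\ & \ddots & \\ O & & u_{|\mathcal{A}|} I_m \end{bmatrix} \in \mathbb{R}^{m|\mathcal{A}| \times m|\mathcal{A}|}. \tag{1}
$$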

II. PRELIMINARIES

Some frequently used definitions and relevant properties of graph theory, proximal mapping, and Fenchel conjugate are provided in this section.

A. Graph Theory

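Define an undirected graph $\mathcal{G} = \{\mathcal{V}, \mathcal{E}\}$ for a multi-agent network, where $\mathcal{V} = \{1, 2, ..., N\}$ is the set of vertices and $\mathcal{E} \subseteq \{(i, j) \mid i, j \in \mathcal{V} \text{ and } i \neq j\}$ is the set of edges with $(i, j) \in \mathcal{E}$ unordered. $\mathcal{G}$ is connected if any two distinct vertices are linked by at least one path. $\mathcal{V}_i = \{j \mid (i, j) \in \mathcal{E}\}$ is the neighbour set of agent $i$. Let $L \in \mathbb{R}^{N \times N}$ be the Laplacian matrix of $\mathcal{G}$. Then, the $(i, j)$th element of $L$, denoted by $d_{ij}$, satisfies $d_{ij} = -1$ if $(i, j) \in \mathcal{E}$, $d_{ij} = 0$ if $(i, j) \notin \mathcal{E}$ and $i \neq j$, and $d_{ii} = |\mathcal{V}_i|$ [33].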
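The Laplacian definition above can be illustrated with a minimal sketch. The function name `laplacian` and the 3-agent path graph are hypothetical choices made only for this example; they are not part of the paper.

```python
def laplacian(num_agents, edges):
    """Laplacian of an undirected graph with vertices 1..num_agents."""
    L = [[0] * num_agents for _ in range(num_agents)]
    for i, j in edges:
        if i == j:
            raise ValueError("self-loops are excluded from the edge set")
        if L[i - 1][j - 1] == 0:  # ignore a repeated (unordered) edge
            L[i - 1][j - 1] = L[j - 1][i - 1] = -1
            L[i - 1][i - 1] += 1  # d_ii counts the neighbours of agent i
            L[j - 1][j - 1] += 1
    return L

# Hypothetical path graph 1 - 2 - 3.
L_path = laplacian(3, [(1, 2), (2, 3)])
print(L_path)
```

Every row of a Laplacian built this way sums to zero, which is the property that later lets a consensus constraint be written as a single linear equation.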

B. Proximal Mapping

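A proximal mapping of a proper, convex, and closed function $\psi : \mathbb{R}^n \to (-\infty, +\infty]$ is defined by $\mathrm{prox}^{\alpha}_{\psi}[v] = \arg\min_u \big(\psi(u) + \frac{1}{2\alpha}\|u - v\|^2\big)$, $\alpha > 0$, $v \in \mathbb{R}^n$.² A generalized version of the proximal mapping can be defined as

$$
\mathrm{prox}^{X}_{\psi}[v] = \arg\min_u \Big(\psi(u) + \frac{1}{2}\|u - v\|^2_{X^{-1}}\Big), \tag{2}
$$

with $X \in \mathbb{R}^{n \times n}$ a positive definite matrix [20].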
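As a hedged illustration of definition (2), the sketch below evaluates the generalized proximal mapping of the $l_1$-norm under the assumption that $X$ is diagonal, in which case the minimization separates per coordinate into scalar soft thresholding. The helper names `soft_threshold` and `prox_l1` are hypothetical.

```python
def soft_threshold(v, alpha):
    """Closed-form prox of alpha * |.| at the scalar v."""
    if v > alpha:
        return v - alpha
    if v < -alpha:
        return v + alpha
    return 0.0

def prox_l1(v, x_diag):
    """Generalized prox of the l1-norm for X = diag(x_diag), x_diag > 0.

    Coordinate k solves min_u |u| + (u - v_k)**2 / (2 * x_k),
    whose minimizer is soft_threshold(v_k, x_k).
    """
    return [soft_threshold(vk, xk) for vk, xk in zip(v, x_diag)]

u = prox_l1([3.0, -0.5, 1.0], [1.0, 1.0, 0.25])
print(u)
```

With a common entry $\alpha$ on the diagonal this reduces to the scalar-step mapping $\mathrm{prox}^{\alpha}_{\psi}$ defined first.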

C. Fenchel Conjugate

Let $\psi : \mathbb{R}^n \to (-\infty, +\infty]$ be a proper function. Then, the Fenchel conjugate of $\psi$ is defined by $\psi^{\diamond}(v) = \sup_u \{v^\top u - \psi(u)\}$, which is convex [34, Sec. 3.3].

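Lemma 1. (Extended Moreau Decomposition [35, Thm. 6.45]) Let $\psi : \mathbb{R}^n \to (-\infty, +\infty]$ be a proper, convex, and closed function and $\psi^{\diamond}$ be its Fenchel conjugate. Then,

$$
v = \alpha\, \mathrm{prox}^{\frac{1}{\alpha}}_{\psi^{\diamond}}\Big[\frac{v}{\alpha}\Big] + \mathrm{prox}^{\alpha}_{\psi}[v], \tag{3}
$$

$\forall v \in \mathbb{R}^n$, $\alpha > 0$.

² The proximal mapping can be equivalently written as $\mathrm{prox}_{\alpha\psi}$, as in some other works.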
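Identity (3) can be checked numerically on a simple case. The sketch below assumes $\psi = |\cdot|$ on the real line, whose conjugate is the indicator of $[-1, 1]$, so the proximal mapping of $\psi^{\diamond}$ is a projection onto that interval for any step. The function names are hypothetical.

```python
def soft_threshold(v, alpha):
    """prox of alpha * |.| at the scalar v."""
    return max(abs(v) - alpha, 0.0) * (1.0 if v >= 0 else -1.0)

def project_unit_interval(v):
    """prox of the indicator of [-1, 1], i.e. a projection."""
    return min(max(v, -1.0), 1.0)

def moreau_rhs(v, alpha):
    """Right-hand side of (3) for psi = |.|."""
    return alpha * project_unit_interval(v / alpha) + soft_threshold(v, alpha)

# Residuals of identity (3) at a few test points with alpha = 0.5.
residuals = [abs(moreau_rhs(v, 0.5) - v) for v in (-3.0, -0.2, 0.0, 0.7, 5.0)]
print(max(residuals))
```

The same identity is what later allows a proximal step on a conjugate function to be computed from the proximal mapping of the original function.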

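Lemma 2. [20, Lemma V.7] Let $\psi : \mathbb{R}^n \to (-\infty, +\infty]$ be a proper, closed, $\sigma$-strongly convex function and $\psi^{\diamond}$ be its Fenchel conjugate, $\sigma > 0$. Then,

$$
\arg\max_u \big(v^\top u - \psi(u)\big) = \nabla_v \psi^{\diamond}(v) \tag{4}
$$

and $\nabla_v \psi^{\diamond}(v)$ is $\frac{1}{\sigma}$-Lipschitz continuous.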
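A small numerical check of Lemma 2 is sketched below for an assumed scalar quadratic $\psi(u) = \frac{\sigma}{2}u^2 + bu$, for which the maximizer in (4) is $(v - b)/\sigma$. The function names and the chosen numbers are illustrative only.

```python
def conj_grad(v, sigma, b):
    """argmax_u (v*u - psi(u)) for psi(u) = sigma/2 * u**2 + b*u."""
    return (v - b) / sigma

def conj(v, sigma, b):
    """Fenchel conjugate of psi, evaluated at its maximizer."""
    u = conj_grad(v, sigma, b)
    return v * u - (0.5 * sigma * u * u + b * u)

sigma, b = 4.0, 1.0
v, w = 3.0, -2.0
eps = 1e-6
# Central finite difference of the conjugate, compared with the argmax in (4).
fd = (conj(v + eps, sigma, b) - conj(v - eps, sigma, b)) / (2 * eps)
grad_gap = abs(fd - conj_grad(v, sigma, b))
# Lipschitz ratio |grad(v) - grad(w)| / |v - w|, which should not exceed 1/sigma.
ratio = abs(conj_grad(v, sigma, b) - conj_grad(w, sigma, b)) / abs(v - w)
print(grad_gap, ratio)
```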

III. PROBLEM FORMULATION

The problem formulation and relevant assumptions are provided as follows.

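Let $F(x) = \sum_{i \in \mathcal{V}} F_i(x_i)$ be the global cost function of a multi-agent network $\mathcal{G} = \{\mathcal{V}, \mathcal{E}\}$, $x_i \in \mathbb{R}^M$, $x = [x_1^\top, ..., x_N^\top]^\top \in \mathbb{R}^{NM}$. Agent $i$ maintains the private cost function $F_i(x_i) = f_i(x_i) + g_i(x_i)$. Let $\mathcal{X}_i \subseteq \mathbb{R}^M$ be the feasible region of $x_i$. Then, the feasible region of $x$ can be defined by $\mathcal{X} = \mathcal{X}_1 \times \mathcal{X}_2 \times ... \times \mathcal{X}_N \subseteq \mathbb{R}^{NM}$. Then, an affine-constrained optimization problem of $\mathcal{G}$ can be given by

$$
\text{(P1)} \quad \min_{x \in \mathcal{X}} \sum_{i \in \mathcal{V}} F_i(x_i) \quad \text{subject to } Ax = b,
$$

with $A \in \mathbb{R}^{B \times NM}$ and $b \in \mathbb{R}^B$, which is equivalent to

$$
\text{(P2)} \quad \min_{x} \sum_{i \in \mathcal{V}} \big(F_i(x_i) + \mathbb{I}_{\mathcal{X}_i}(x_i)\big) \quad \text{subject to } Ax = b,
$$

with

$$
\mathbb{I}_{\mathcal{X}_i}(x_i) = \begin{cases} 0, & \text{if } x_i \in \mathcal{X}_i, \\ +\infty, & \text{otherwise} \end{cases}
$$

[36].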

Remark 1. Note that for an inequality constraint $Ax \preceq b$, one can formulate an equality constraint $Ax + y = b$ with $y \in \mathbb{R}^B_+$ being a slack variable. Then, the inequality-constrained problem can be equivalently written as

$$
\text{(P1+)} \quad \min_{x \in \mathcal{X},\, y \in \mathbb{R}^B_+} \sum_{i \in \mathcal{V}} F_i(x_i) \quad \text{subject to } Ax + y = b.
$$

To realize decentralized computations, $y$ can be decomposed and assigned to the agents. Hence, the structure of Problem (P1+) complies with that of Problem (P1).

Assumption 1. $\mathcal{G}$ is connected and undirected.

Assumption 2. $f_i : \mathbb{R}^M \to (-\infty, +\infty]$ and $g_i : \mathbb{R}^M \to (-\infty, +\infty]$ are both proper, convex, and closed extended real-valued functions. In addition, $f_i$ is differentiable and $\sigma_i$-strongly convex, $\sigma_i > 0$, $i \in \mathcal{V}$.

Assumptions similar to Assumption 2 can be found in [4, 20, 37–41].

Assumption 3. $\mathcal{X}_i$ is non-empty, convex and closed, $i \in \mathcal{V}$; there exists an $\breve{x} \in \mathrm{relint}\,\mathcal{X}$ such that $A\breve{x} = b$.

Remark 2. By Assumption 3, $\mathbb{I}_{\mathcal{X}_i}$ is proper, convex, and closed [36], which complies with the assumption on $g_i$. Therefore, it is also feasible to omit the discussion on $\mathbb{I}_{\mathcal{X}_i}$ in Problem (P2) as in [4, 20, 41]. In this work, we highlight the existence of $\mathcal{X}_i$ for more detailed discussions.

IV. DISTRIBUTED DUAL PROXIMAL GRADIENT ALGORITHM DEVELOPMENT

In this section, we propose a fully distributed DDPG algorithm for solving the problem of interest.

A. Dual Problem

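The dual problem of Problem (P2) is formulated in this subsection. By decoupling the objective function of Problem (P2), we have

$$
\text{(P3)} \quad \min_{x, z} \sum_{i \in \mathcal{V}} \big(f_i(x_i) + (g_i + \mathbb{I}_{\mathcal{X}_i})(z_i)\big) \quad \text{subject to } Ax = b, \; x_i = z_i, \; \forall i \in \mathcal{V},
$$

where $z = [z_1^\top, ..., z_N^\top]^\top \in \mathbb{R}^{NM}$ with $z_i \in \mathbb{R}^M$ a slack vector. The Lagrangian function of Problem (P3) can be given by

$$
\begin{aligned}
\mathcal{L}(x, z, \theta, \mu) &= \sum_{i \in \mathcal{V}} \big(f_i(x_i) + (g_i + \mathbb{I}_{\mathcal{X}_i})(z_i) + \mu_i^\top (x_i - z_i)\big) + \theta^\top (Ax - b) \\
&= \sum_{i \in \mathcal{V}} \big(f_i(x_i) + x_i^\top (A_i^\top \theta + \mu_i) + (g_i + \mathbb{I}_{\mathcal{X}_i})(z_i) - z_i^\top \mu_i\big) - b^\top \theta,
\end{aligned} \tag{5}
$$

where $\mu_i \in \mathbb{R}^M$ and $\theta \in \mathbb{R}^B$ are the Lagrangian multiplier vectors associated with constraints $x_i = z_i$ and $Ax = b$, respectively. $\mu = [\mu_1^\top, ..., \mu_N^\top]^\top \in \mathbb{R}^{NM}$. $A_i \in \mathbb{R}^{B \times M}$ is the $i$th column sub-block of $A$ with $A = [A_1, ..., A_i, ..., A_N]$.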

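Then, the dual function can be obtained by minimizing $\mathcal{L}(x, z, \theta, \mu)$ with respect to $(x, z)$, which is

$$
\begin{aligned}
D(\theta, \mu) &= \min_{x, z} \sum_{i \in \mathcal{V}} \big(f_i(x_i) + x_i^\top (A_i^\top \theta + \mu_i) + (g_i + \mathbb{I}_{\mathcal{X}_i})(z_i) - z_i^\top \mu_i\big) - b^\top \theta \\
&= \min_{x, z} \sum_{i \in \mathcal{V}} \big(f_i(x_i) - x_i^\top H_i \lambda_i + (g_i + \mathbb{I}_{\mathcal{X}_i})(z_i) - z_i^\top F \lambda_i - \kappa_i E \lambda_i\big) \\
&= \sum_{i \in \mathcal{V}} \big(-f_i^{\diamond}(H_i \lambda_i) - \kappa_i E \lambda_i - (g_i + \mathbb{I}_{\mathcal{X}_i})^{\diamond}(F \lambda_i)\big),
\end{aligned} \tag{6}
$$

where $H_i = [-A_i^\top, -I_M] \in \mathbb{R}^{M \times (M+B)}$, $\lambda_i = [\theta^\top, \mu_i^\top]^\top \in \mathbb{R}^{M+B}$, $F = [O_{M \times B}, I_M] \in \mathbb{R}^{M \times (M+B)}$, $E = [b^\top, \mathbf{0}_M^\top] \in \mathbb{R}^{1 \times (M+B)}$, and $\sum_{i \in \mathcal{V}} \kappa_i = 1$. The last equality in (6) employs the definition of the Fenchel conjugate, and $(g_i + \mathbb{I}_{\mathcal{X}_i})^{\diamond}$ denotes the Fenchel conjugate of $g_i + \mathbb{I}_{\mathcal{X}_i}$. Hence, the dual problem of Problem (P3) can be formulated as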

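$$
\text{(P4)} \quad \min_{\lambda} \; P(\lambda) + R(\lambda),
$$

where $\lambda = [\lambda_1^\top, ..., \lambda_N^\top]^\top \in \mathbb{R}^{NB+NM}$, $P(\lambda) = \sum_{i \in \mathcal{V}} \big(f_i^{\diamond}(H_i \lambda_i) + \kappa_i E \lambda_i\big)$ and $R(\lambda) = \sum_{i \in \mathcal{V}} (g_i + \mathbb{I}_{\mathcal{X}_i})^{\diamond}(F \lambda_i)$.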
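As a hedged illustration of how the conjugate term in $P(\lambda)$ can become explicit, consider an assumed scalar instance ($M = 1$) with a quadratic smooth part. The coefficients $a_i$ and $b_i$ are hypothetical and not taken from the paper; the conjugate is written $f_i^{\diamond}$ here.

```latex
f_i(x_i) = a_i x_i^2 + b_i x_i, \quad a_i > 0 \quad (\text{so } \sigma_i = 2 a_i),
\\
f_i^{\diamond}(y) = \sup_{x_i} \{ y x_i - a_i x_i^2 - b_i x_i \} = \frac{(y - b_i)^2}{4 a_i},
\qquad
\nabla f_i^{\diamond}(y) = \frac{y - b_i}{2 a_i},
\\
f_i^{\diamond}(H_i \lambda_i) = \frac{(A_i^{\top} \theta + \mu_i + b_i)^2}{4 a_i}
\quad \text{since } H_i \lambda_i = -A_i^{\top} \theta - \mu_i .
```

In this instance the gradient of the smooth dual term is available in closed form, which is the situation in which the dual update is cheapest.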

B. Distributed Dual Proximal Gradient Algorithm

In this subsection, we aim to solve Problem (P4) in a distributed manner based on proximal gradient method.

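In Problem (P4), the variables of $f_i^{\diamond}(H_i \lambda_i)$ are coupled in terms of the common component $\theta$ in $\lambda_i$, but those of $(g_i + \mathbb{I}_{\mathcal{X}_i})^{\diamond}(F \lambda_i)$ are decoupled since $F \lambda_i = \mu_i$. In the following, with a slight abuse of notation, we redefine $\lambda_i = [\theta_i^\top, \mu_i^\top]^\top \in \mathbb{R}^{M+B}$, where $\theta_i$ is the local estimate of the common $\theta$. Then, Problem (P4) can be equivalently rewritten as

$$
\text{(P5)} \quad \min_{\lambda} \; P(\lambda) + R(\lambda) \quad \text{subject to } K \lambda_j = K \lambda_l, \; \forall (j, l) \in \mathcal{E}, \tag{7}
$$

where $K = [I_B, O_{B \times M}]$. Constraint (7) ensures the partial consistency among $\lambda_i$ in terms of component $\theta_i$, i.e., $\theta_i = K \lambda_i$. Let $\lambda^* = [(\lambda_1^*)^\top, ..., (\lambda_N^*)^\top]^\top$ be the optimal solution to Problem (P5) with $\lambda_i^* = [(\theta_i^*)^\top, (\mu_i^*)^\top]^\top$.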

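In the following, we assume that the range of $\theta_i^*$ is estimable, i.e., $\theta_i^* \in \mathcal{S}_i$ with $\mathcal{S}_i \subset \mathbb{R}^B$ being the estimated non-empty, convex and compact zone. For the convenience of the following discussion, we define $\Gamma = \max_{i \in \mathcal{V}} \sup_{\theta_i \in \mathcal{S}_i} \|\theta_i\|$.

Note that considering constraint $\theta_i \in \mathcal{S}_i$ is equivalent to accommodating an indicator function $\mathbb{I}_{\mathcal{S}_i}(\theta_i)$ into the non-smooth part [36]. Then, Problem (P5) can be modified into

$$
\text{(P6)} \quad \min_{\lambda} \; \Phi(\lambda) \quad \text{subject to } K \lambda_j = K \lambda_l, \; \forall (j, l) \in \mathcal{E}, \tag{8}
$$

where $\Phi(\lambda) = P(\lambda) + Q(\lambda)$, $P(\lambda) = \sum_{i \in \mathcal{V}} p_i(\lambda_i)$, $Q(\lambda) = \sum_{i \in \mathcal{V}} q_i(\lambda_i)$, $p_i(\lambda_i) = f_i^{\diamond}(H_i \lambda_i) + \kappa_i E \lambda_i$, and $q_i(\lambda_i) = (g_i + \mathbb{I}_{\mathcal{X}_i})^{\diamond}(\mu_i) + \mathbb{I}_{\mathcal{S}_i}(\theta_i) = (g_i + \mathbb{I}_{\mathcal{X}_i})^{\diamond}(F \lambda_i) + \mathbb{I}_{\mathcal{S}_i}(K \lambda_i)$. Note that (8) can be represented by a compact equation $\mathbf{M}\lambda = \mathbf{0}$, where $\mathbf{M} = L \otimes K \in \mathbb{R}^{NB \times (NB+NM)}$. It can be checked that $\mathbf{M}\lambda = \mathbf{L}\hat{\theta}$, where $\mathbf{L} = L \otimes I_B \in \mathbb{R}^{NB \times NB}$ is an augmented Laplacian matrix of $\mathcal{G}$ and $\hat{\theta} = [\theta_1^\top, ..., \theta_N^\top]^\top \in \mathbb{R}^{NB}$.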
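The compact form of the consensus constraint can be checked on a toy instance. The sketch below assumes $N = 3$ agents on a path graph, $B = 1$ and $M = 2$, and uses plain-Python helpers `kron` and `matvec` written only for this example; the numbers in `lam` are arbitrary.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def matvec(A, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]

L = [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]   # Laplacian of the path 1 - 2 - 3
K = [[1, 0, 0]]                             # K = [I_B, O_{B x M}] with B = 1, M = 2
M_mat = kron(L, K)                          # (N*B) x (N*(B+M)) matrix
L_aug = kron(L, [[1]])                      # augmented Laplacian, (N*B) x (N*B)

# Each block of lam is [theta_i, mu_i]; only the theta_i entries matter here.
lam = [0.5, 9.0, 9.0, -1.0, 7.0, 7.0, 2.0, 5.0, 5.0]
theta_hat = lam[0::3]                       # stacked local estimates theta_i
lhs = matvec(M_mat, lam)
rhs = matvec(L_aug, theta_hat)
print(lhs, rhs)
```

Both products coincide because $K$ discards the $\mu_i$ part of every block, so the constraint only forces agreement on the local copies $\theta_i$.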

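Then, the Lagrangian function of Problem (P6) can be given by

$$
\mathcal{L}(\lambda, \xi) = P(\lambda) + Q(\lambda) + \xi^\top \mathbf{M} \lambda, \tag{9}
$$

where $\xi = [\xi_1^\top, ..., \xi_N^\top]^\top \in \mathbb{R}^{NB}$ is the collection of Lagrangian multipliers. Let $\mathcal{C}$ be the set of the saddle points of $\mathcal{L}(\lambda, \xi)$. Then, any saddle point $(\lambda^*, \xi^*) \in \mathcal{C}$ satisfies [42]

$$
\mathcal{L}(\lambda, \xi^*) \geq \mathcal{L}(\lambda^*, \xi^*) \geq \mathcal{L}(\lambda^*, \xi), \tag{10}
$$

$\forall \lambda \in \mathbb{R}^{NB+NM}$, $\forall \xi \in \mathbb{R}^{NB}$. We aim to seek a saddle point of $\mathcal{L}(\lambda, \xi)$, which can be characterized by the Karush-Kuhn-Tucker (KKT) conditions [43]

$$
\mathbf{0} \in \nabla_{\lambda} P(\lambda^*) + \partial_{\lambda} Q(\lambda^*) + \mathbf{M}^\top \xi^*, \tag{11}
$$

$$
\mathbf{M} \lambda^* = \mathbf{0}. \tag{12}
$$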

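Based on the previous discussion, the DDPG algorithm for solving Problem (P6) is designed as

$$
\lambda(t+1) = \mathrm{prox}^{D^{M+B}_{\mathcal{V}}[c_l]}_{Q}\Big[\lambda(t) - D^{M+B}_{\mathcal{V}}[c_l]\big(\nabla_{\lambda} P(\lambda(t)) + \mathbf{M}^\top \xi(t)\big)\Big], \tag{13}
$$

$$
\xi(t+1) = \xi(t) + D^{B}_{\mathcal{V}}[\gamma_l]\, \mathbf{M} \lambda(t+1), \tag{14}
$$

which means

$$
\begin{aligned}
\lambda_i(t+1) &= \mathrm{prox}^{c_i}_{q_i}\Big[\lambda_i(t) - c_i\big(\nabla_{\lambda_i} p_i(\lambda_i(t)) + \mathbf{M}_i^\top \xi(t)\big)\Big] \\
&= \mathrm{prox}^{c_i}_{q_i}\Big[\lambda_i(t) - c_i\Big(\nabla_{\lambda_i} p_i(\lambda_i(t)) + \sum_{j \in \mathcal{V}_i} K^\top (\xi_i(t) - \xi_j(t))\Big)\Big],
\end{aligned} \tag{15}
$$

$$
\begin{aligned}
\xi_i(t+1) &= \xi_i(t) + \gamma_i \mathbf{M}^{\sharp}_i \lambda(t+1) \\
&= \xi_i(t) + \gamma_i \sum_{j \in \mathcal{V}_i} K\big(\lambda_i(t+1) - \lambda_j(t+1)\big),
\end{aligned} \tag{16}
$$

due to the separability of $P$ and $Q$, $\forall i \in \mathcal{V}$, $t \in \mathbb{N}$. $\mathbf{M}_i \in \mathbb{R}^{NB \times (B+M)}$ and $\mathbf{M}^{\sharp}_i \in \mathbb{R}^{B \times (NB+NM)}$ are the $i$th column and $i$th row sub-blocks of $\mathbf{M}$, respectively, i.e., $\mathbf{M} = [\mathbf{M}_1, ..., \mathbf{M}_i, ..., \mathbf{M}_N] = [(\mathbf{M}^{\sharp}_1)^\top, ..., (\mathbf{M}^{\sharp}_i)^\top, ..., (\mathbf{M}^{\sharp}_N)^\top]^\top$. $c_i, \gamma_i > 0$ are step-sizes.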

Remark 3. The estimated $\mathcal{S}_i$ enables the range of $\theta_i$ to be bounded, which, as we will see later, facilitates the convergence analysis of the DDPG algorithm. A similar treatment can be found in [24]. In practice, the estimation of $\mathcal{S}_i$ relies on experience with the specific problem. For example, in some social welfare optimization problems in the electricity market, the optimal dual variables can be the settled energy prices [44], whose range can be easily estimated from historical prices.

Remark 4. From (15) and (16), it can be seen that each agent only needs the information of its neighbours and updates with a locally determined step-size ($K$ contains the dimension information of primal and dual variables without other shared global information), which results in a fully distributed computation fashion of the DDPG algorithm.

The detailed computation procedure of DDPG algorithm is stated in Algorithm 1.

Algorithm 1 Distributed Dual Proximal Gradient Algorithm

1: Initialize $\lambda(0)$, $\xi(0)$. Determine step-sizes $c_i, \gamma_i > 0$, $\forall i \in \mathcal{V}$.
2: for $t = 0, 1, 2, ...$ do
3:   for $i = 1, 2, ..., N$ do (in parallel)
4:     Update $\lambda_i$ by (15).
5:     Update $\xi_i$ by (16).
6:   end for
7:   Obtain an output $(\lambda^{\mathrm{out}}, \xi^{\mathrm{out}})$ under a certain convergence criterion.
8: end for
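One pass of the inner loop of Algorithm 1 can be sketched as follows. This is not the authors' implementation: it assumes the scalar case $M = B = 1$, quadratic $f_i(x) = a x^2 + b x$ with $a > 0$, $g_i = 0$, interval sets $\mathcal{X}_i$ and $\mathcal{S}_i$, and it uses the split form (17)-(20) of update (15) that the paper derives for Case 1 in the next subsection. The function `ddpg_step`, its argument layout, and the two-agent data are hypothetical.

```python
def clip(v, lo, hi):
    """Euclidean projection of a scalar onto [lo, hi]."""
    return min(max(v, lo), hi)

def ddpg_step(theta, mu, xi, agents, neighbours, kappa_b, c, gamma):
    """One synchronous pass of (17)-(20) followed by (16), for M = B = 1.

    agents[i] = (a, b, A, x_lo, x_hi, s_lo, s_hi) is an assumed local model:
    f_i(x) = a*x**2 + b*x, g_i = 0, X_i = [x_lo, x_hi], S_i = [s_lo, s_hi],
    and A is agent i's coefficient in the coupling constraint.
    kappa_b[i] is kappa_i times the right-hand side of the constraint.
    """
    n = len(agents)
    new_theta, new_mu = [0.0] * n, [0.0] * n
    for i, (a, b, A, x_lo, x_hi, s_lo, s_hi) in enumerate(agents):
        # Gradient of the conjugate of f_i at H_i*lambda_i = -A*theta_i - mu_i.
        x = (-A * theta[i] - mu[i] - b) / (2.0 * a)
        grad_theta = -A * x + kappa_b[i]
        grad_mu = -x
        disagreement = sum(xi[i] - xi[j] for j in neighbours[i])
        varrho = theta[i] - c[i] * (grad_theta + disagreement)   # (17)
        rho = mu[i] - c[i] * grad_mu                             # (18)
        new_theta[i] = clip(varrho, s_lo, s_hi)                  # (19)
        # (20): Moreau decomposition; the prox of the indicator of X_i
        # is a projection onto X_i.
        new_mu[i] = rho - c[i] * clip(rho / c[i], x_lo, x_hi)
    new_xi = [xi[i] + gamma[i] * sum(new_theta[i] - new_theta[j]
                                     for j in neighbours[i])
              for i in range(n)]                                 # (16)
    return new_theta, new_mu, new_xi

# Hypothetical two-agent instance: min x1^2 + x2^2 - 4*x2  s.t.  x1 - x2 = 0.
agents = [(1.0, 0.0, 1.0, 0.0, 10.0, -10.0, 10.0),
          (1.0, -4.0, -1.0, 0.0, 10.0, -10.0, 10.0)]
neighbours = [[1], [0]]
theta1, mu1, xi1 = ddpg_step([0.0, 0.0], [0.0, 0.0], [0.0, 0.0], agents,
                             neighbours, [0.0, 0.0], [0.1, 0.1], [0.1, 0.1])
print(theta1, mu1, xi1)
```

Calling `ddpg_step` repeatedly with its own outputs plays the role of the outer loop. Each agent touches only its own data and the multipliers and estimates of its neighbours, which mirrors the locality noted in Remark 4.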

C. Computational Complexity of DDPG Algorithm

To apply (15), one needs to compute (i) $\nabla p_i$ and (ii) the proximal mapping of $q_i$, $i \in \mathcal{V}$. For (i), $\nabla p_i$ can be efficiently obtained given that $f_i$ is simple-structured and, consequently, $\nabla f_i^{\diamond}$ can be analytically derived, e.g., $f_i$ is a quadratic function [36, Sec. 3.3.1]. For (ii), some feasible methods for different cases are introduced as follows.

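1) Case 1: If the proximal mapping of $g_i + \mathbb{I}_{\mathcal{X}_i}$ can be easily obtained³, by $q_i(\lambda_i) = (g_i + \mathbb{I}_{\mathcal{X}_i})^{\diamond}(\mu_i) + \mathbb{I}_{\mathcal{S}_i}(\theta_i)$, we have $\mathrm{prox}^{c_i}_{q_i} = \mathrm{prox}^{c_i}_{\mathbb{I}_{\mathcal{S}_i}} \times \mathrm{prox}^{c_i}_{(g_i + \mathbb{I}_{\mathcal{X}_i})^{\diamond}}$ [35, Thm. 6.6], where $\mathrm{prox}^{c_i}_{\mathbb{I}_{\mathcal{S}_i}}$ is a Euclidean projection onto $\mathcal{S}_i$ [9, Sec. 1.2] and $\mathrm{prox}^{c_i}_{(g_i + \mathbb{I}_{\mathcal{X}_i})^{\diamond}}$ can be obtained by calculating $\mathrm{prox}^{1/c_i}_{g_i + \mathbb{I}_{\mathcal{X}_i}}$ with Lemma 1. Then, (15) can be modified into

$$
\varrho_i(t) = \theta_i(t) - c_i\Big(\nabla_{\theta_i} p_i(\lambda_i(t)) + \sum_{j \in \mathcal{V}_i} (\xi_i(t) - \xi_j(t))\Big), \tag{17}
$$

$$
\rho_i(t) = \mu_i(t) - c_i \nabla_{\mu_i} p_i(\lambda_i(t)), \tag{18}
$$

$$
\theta_i(t+1) = \mathrm{prox}^{c_i}_{\mathbb{I}_{\mathcal{S}_i}}[\varrho_i(t)] = \Pi_{\mathcal{S}_i}[\varrho_i(t)], \tag{19}
$$

$$
\mu_i(t+1) = \mathrm{prox}^{c_i}_{q_{1,i}}[\rho_i(t)] = \rho_i(t) - c_i\, \mathrm{prox}^{\frac{1}{c_i}}_{q^{\diamond}_{1,i}}\Big[\frac{\rho_i(t)}{c_i}\Big], \tag{20}
$$

where $q_{1,i} = (g_i + \mathbb{I}_{\mathcal{X}_i})^{\diamond}$, $q^{\diamond}_{1,i} = (g_i + \mathbb{I}_{\mathcal{X}_i})^{\diamond\diamond} = g_i + \mathbb{I}_{\mathcal{X}_i}$ due to the convexity and lower semi-continuity of $g_i + \mathbb{I}_{\mathcal{X}_i}$, $(g_i + \mathbb{I}_{\mathcal{X}_i})^{\diamond\diamond}$ is the biconjugate of $g_i + \mathbb{I}_{\mathcal{X}_i}$ [36, Sec. 3.3.2], and $\Pi_{\mathcal{S}_i}[\cdot]$ is a Euclidean projection onto $\mathcal{S}_i$.⁴ Essentially, (17)-(20) are obtained by decomposing $\lambda_i(t+1)$ and using the above-mentioned properties. With the above arrangement, the calculation of the proximal mapping of $(g_i + \mathbb{I}_{\mathcal{X}_i})^{\diamond}$ can be avoided as shown in (20), leading to the reduction of the computational complexity if the proximal mapping of $g_i + \mathbb{I}_{\mathcal{X}_i}$ is easier to obtain. For instance, in a Lasso problem with penalty $g_i(x_i) = \|x_i\|_1$ and $\mathcal{X}_i = \mathbb{R}^M$, the proximal mapping of the $l_1$-norm is a soft thresholding operator with an analytical solution [35, Sec. 6.3].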

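2) Case 2: Take advantage of the structure of $g_i$ in some specific problems. For example, consider a regularization problem where the penalty is a Euclidean $e$-norm: $g_i(x_i) = \|x_i\|_e$, $\mathcal{X}_i = \mathbb{R}^M$. Then, we have

$$
\begin{aligned}
q_i(\lambda_i) &= g_i^{\diamond}(\mu_i) + \mathbb{I}_{\mathcal{S}_i}(\theta_i) \\
&= \mathbb{I}_{\mathcal{W}_i}(\mu_i) + \mathbb{I}_{\mathcal{S}_i}(\theta_i) \\
&= \begin{cases} 0, & \text{if } \mu_i \in \mathcal{W}_i \text{ and } \theta_i \in \mathcal{S}_i, \\ +\infty, & \text{otherwise}, \end{cases} \\
&= \begin{cases} 0, & \text{if } \lambda_i \in \mathcal{Y}_i, \\ +\infty, & \text{otherwise}, \end{cases} \\
&= \mathbb{I}_{\mathcal{Y}_i}(\lambda_i),
\end{aligned} \tag{21}
$$

where $\mathcal{W}_i = \{v \in \mathbb{R}^M \mid \|v\|_{e}^{\diamond} \leq 1\}$ (convex zone) with $\|\cdot\|_{e}^{\diamond}$ the dual norm of $\|\cdot\|_e$, and $\mathcal{Y}_i = \mathcal{S}_i \times \mathcal{W}_i$ (convex zone). The second equality holds by computing the conjugate of a norm [36, Sec. 3.3.1]. Then, the proximal mapping of $q_i$ is a Euclidean projection onto $\mathcal{Y}_i$ [9, Sec. 1.2].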
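The projection in Case 2 can be sketched for one concrete choice. The example below assumes $e = 2$, so the dual norm is again the $l_2$-norm and $\mathcal{W}_i$ is the unit $l_2$ ball, and it assumes $\mathcal{S}_i$ is a box with common bounds. The function `project_Y` and its arguments are hypothetical.

```python
import math

def project_Y(theta, mu, s_lo, s_hi):
    """Euclidean projection of (theta, mu) onto Y_i = S_i x W_i.

    Assumes S_i is the box [s_lo, s_hi]^B and W_i is the unit l2 ball,
    so the two parts can be projected independently.
    """
    theta_p = [min(max(t, s_lo), s_hi) for t in theta]
    norm = math.sqrt(sum(m * m for m in mu))
    scale = 1.0 if norm <= 1.0 else 1.0 / norm
    return theta_p, [m * scale for m in mu]

theta_p, mu_p = project_Y([12.0, -3.0], [3.0, 4.0], -10.0, 10.0)
print(theta_p, mu_p)
```

Because $\mathcal{Y}_i$ is a Cartesian product, the projection splits into one projection per factor, which is why the update stays cheap in this case.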

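3) Case 3: If $q_i$ has a certain complicated structure, as a general method, we can construct a strongly convex non-smooth $g_i$ (e.g., shift a strongly convex component of the smooth part to $g_i$). Then, rewrite (15) by the definition of the proximal mapping, which gives

$$
\lambda_i(t+1) = \arg\min_{\lambda_i} \Big(q_i(\lambda_i) + \frac{1}{2c_i}\Big\|\lambda_i - \lambda_i(t) + c_i\Big(\nabla_{\lambda_i} p_i(\lambda_i(t)) + \sum_{j \in \mathcal{V}_i} K^\top (\xi_i(t) - \xi_j(t))\Big)\Big\|^2\Big). \tag{22}
$$

To solve (22), one can utilize a subgradient descent method by computing

$$
\begin{aligned}
\partial_{\lambda_i} q_i(\lambda_i) &= \nabla_{\lambda_i} (g_i + \mathbb{I}_{\mathcal{X}_i})^{\diamond}(F\lambda_i) + \partial_{\lambda_i} \mathbb{I}_{\mathcal{S}_i}(K\lambda_i) \\
&= F^\top \nabla_{F\lambda_i} (g_i + \mathbb{I}_{\mathcal{X}_i})^{\diamond}(F\lambda_i) + K^\top \partial_{K\lambda_i} \mathbb{I}_{\mathcal{S}_i}(K\lambda_i),
\end{aligned} \tag{23}
$$

where $\nabla_{F\lambda_i} (g_i + \mathbb{I}_{\mathcal{X}_i})^{\diamond}(F\lambda_i) = \arg\max_u \big((F\lambda_i)^\top u - (g_i + \mathbb{I}_{\mathcal{X}_i})(u)\big)$ by Lemma 2.

³ This assumption is based on $g_i + \mathbb{I}_{\mathcal{X}_i}$ having a certain simple structure, which is often the basic assumption in the works on proximal gradient methods. See some frequently used formulas in [35, Sec. 6.3] and applications in [9, Sec. 7].

⁴ $\varrho_i$ and $\rho_i$ are intermediate variables. (17) and (18) can be included in (19) and (20), respectively, to generate a one-step formula for $\theta_i$ and $\mu_i$.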

In Cases 1 and 2, each agent only needs to update $\lambda_i$ and $\xi_i$ with basic proximal mappings and simple iterations without any other costly computation on primal variables or other auxiliary variables as discussed in [22–31], which reduces the computational complexity. In Case 3, the updating of $\lambda_i$ requires an inner-loop optimization process to compute the subgradient of $q_i$, which can be completed only with local information.

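Remark 5. (Extension of assumption on $f_i$) In the case that the structure of $f_i$ is complicated (it can be non-smooth but still strongly convex), (15) can also be implemented by computing

$$
\begin{aligned}
\nabla_{\lambda_i} p_i(\lambda_i) &= \nabla_{\lambda_i} f_i^{\diamond}(H_i\lambda_i) + \kappa_i E^\top \\
&= H_i^\top \nabla_{H_i\lambda_i} f_i^{\diamond}(H_i\lambda_i) + \kappa_i E^\top,
\end{aligned} \tag{24}
$$

where $\nabla_{H_i\lambda_i} f_i^{\diamond}(H_i\lambda_i) = \arg\max_u \big((H_i\lambda_i)^\top u - f_i(u)\big)$ by Lemma 2. However, similar to Case 3, (24) requires a higher computational complexity since an inner-loop optimization process for computing $\nabla f_i^{\diamond}$ is involved.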

V. CONVERGENCE ANALYSIS

The convergence analysis of the proposed DDPG algorithm is conducted in this section.

Lemma 3. With Assumption 2, the Lipschitz constant of $\nabla_{\lambda} P(\lambda)$ is given by $h = \sqrt{\sum_{i \in \mathcal{V}} h_i^2}$, where $h_i = \frac{\|H_i\|^2}{\sigma_i}$. See the proof in Appendix A.
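The constants in Lemma 3 are cheap to evaluate locally. The sketch below assumes $M = 1$, so that $H_i = [-A_i^\top, -1]$ is a row vector whose spectral norm equals its Euclidean norm; the function `local_lipschitz` and the sample values of $A_i$ and $\sigma_i$ are hypothetical. The last line lists step-size ceilings under the assumption that the condition in Theorem 1 reads $c_i \leq 1/h_i$.

```python
import math

def local_lipschitz(A_col, sigma):
    """h_i = ||H_i||^2 / sigma_i for M = 1, with H_i = [-A_i^T, -1]."""
    H = [-a for a in A_col] + [-1.0]
    return sum(h * h for h in H) / sigma

# Two hypothetical agents with B = 1: (A_i, sigma_i) = (1, 2) and (-1, 4).
h_local = [local_lipschitz([1.0], 2.0), local_lipschitz([-1.0], 4.0)]
h_global = math.sqrt(sum(h * h for h in h_local))
c_max = [1.0 / h for h in h_local]   # assumed ceilings c_i <= 1 / h_i
print(h_local, h_global, c_max)
```

Each $h_i$ depends only on agent $i$'s own constraint block and strong convexity modulus, which is what allows the step-sizes to be chosen without global coordination.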

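Lemma 4. Suppose that Assumptions 1-3 hold. Based on Algorithm 1, for any $(\lambda^*, \xi^*) \in \mathcal{C}$ and $t \in \mathbb{N}$, we have

$$
\Phi(\lambda(t+1)) - \Phi(\lambda^*) + (\xi^*)^\top \mathbf{M}\lambda(t+1) \in [0, \Psi_t],
$$

where

$$
\begin{aligned}
\Psi_t ={} & \|\lambda^* - \lambda(t)\|^2_{D^{M+B}_{\mathcal{V}}[\frac{1}{2c_l}]} - \|\lambda^* - \lambda(t+1)\|^2_{D^{M+B}_{\mathcal{V}}[\frac{1}{2c_l}]} \\
& + \|\xi^* - \xi(t)\|^2_{D^{B}_{\mathcal{V}}[\frac{1}{2\gamma_l}]} - \|\xi^* - \xi(t+1)\|^2_{D^{B}_{\mathcal{V}}[\frac{1}{2\gamma_l}]} \\
& - \|\lambda(t) - \lambda(t+1)\|^2_{D^{M+B}_{\mathcal{V}}[\frac{1}{2c_l} - \frac{h_l}{2}]} - \|\xi(t) - \xi(t+1)\|^2_{D^{B}_{\mathcal{V}}[\frac{1}{2\gamma_l}]} \\
& + \|\mathbf{M}\lambda(t+1)\|^2_{D^{B}_{\mathcal{V}}[\gamma_l]}.
\end{aligned} \tag{25}
$$

See the proof in Appendix B.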

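Theorem 1. Suppose that Assumptions 1-3 hold. Let $c_i \leq \frac{1}{h_i}$, $\forall i \in \mathcal{V}$. By Algorithm 1, for any $(\lambda^*, \xi^*) \in \mathcal{C}$, we have

$$
|\Phi(\bar{\lambda}(T+1)) - \Phi(\lambda^*)| \leq \frac{\Theta(c_1, ..., c_N, \gamma_1, ..., \gamma_N)}{T+1} + O(\gamma_{\max}), \tag{26}
$$

$$
\|\xi^*\| \|\mathbf{M}\bar{\lambda}(T+1)\| \leq \frac{\Theta(c_1, ..., c_N, \gamma_1, ..., \gamma_N)}{T+1} + O(\gamma_{\max}), \tag{27}
$$

where $\Theta(c_1, ..., c_N, \gamma_1, ..., \gamma_N) = \|\lambda^* - \lambda(0)\|^2_{D^{M+B}_{\mathcal{V}}[\frac{1}{2c_l}]} + \|\xi^*\|^2_{D^{B}_{\mathcal{V}}[\frac{4}{\gamma_l}]} + \|\xi(0)\|^2_{D^{B}_{\mathcal{V}}[\frac{1}{\gamma_l}]}$, $\gamma_{\max} = \max_{l \in \mathcal{V}} \gamma_l$, $\bar{\lambda}(T+1) = \frac{1}{T+1}\sum_{t=0}^{T} \lambda(t+1)$, $O(\gamma_{\max}) = \gamma_{\max} N \|\mathbf{L}\|^2 \Gamma^2$, $T \in \mathbb{N}_+$. See the proof in Appendix C.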

Remark 6. $O(\gamma_{\max})$ characterizes the upper bound of the stationary error of the algorithm. A larger estimated zone $\mathcal{S}_i$ may lead to a larger $\Gamma$, leading to a larger stationary error bound. Meanwhile, a smaller $\gamma_{\max}$ can reduce the stationary error bound but may sacrifice the convergence speed.

VI. NUMERICAL RESULT

In this section, we demonstrate the performance of the DDPG algorithm by solving a social welfare optimization problem in an electricity market.

A. Simulation Setup

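Define $\mathcal{V}_{\mathrm{UC}}$ and $\mathcal{V}_{\mathrm{user}}$ as the sets of utility companies (UCs) and users, respectively. Define $x = [x^{\mathrm{UC}}_1, ..., x^{\mathrm{UC}}_{|\mathcal{V}_{\mathrm{UC}}|}, x^{\mathrm{user}}_1, ..., x^{\mathrm{user}}_{|\mathcal{V}_{\mathrm{user}}|}]^\top$, where $x^{\mathrm{UC}}_i$ is the energy generation quantity of UC $i$ and $x^{\mathrm{user}}_j$ is the demand of user $j$. $\phi_i(x^{\mathrm{UC}}_i)$ and $\omega_j(x^{\mathrm{user}}_j)$ are the cost function of UC $i$ and the utility function of user $j$, respectively, $i \in \mathcal{V}_{\mathrm{UC}}$, $j \in \mathcal{V}_{\mathrm{user}}$. Then, the social welfare optimization problem of the market can be formulated as

$$
\text{(P7)} \quad \min_{x} \sum_{i \in \mathcal{V}_{\mathrm{UC}}} \phi_i(x^{\mathrm{UC}}_i) - \sum_{j \in \mathcal{V}_{\mathrm{user}}} \omega_j(x^{\mathrm{user}}_j)
$$

$$
\text{subject to} \quad \sum_{i \in \mathcal{V}_{\mathrm{UC}}} x^{\mathrm{UC}}_i = \sum_{j \in \mathcal{V}_{\mathrm{user}}} x^{\mathrm{user}}_j, \tag{28}
$$

$$
x^{\mathrm{UC}}_i \in \mathcal{X}^{\mathrm{UC}}_i, \; \forall i \in \mathcal{V}_{\mathrm{UC}}, \tag{29}
$$

$$
x^{\mathrm{user}}_j \in \mathcal{X}^{\mathrm{user}}_j, \; \forall j \in \mathcal{V}_{\mathrm{user}}, \tag{30}
$$

where

$$
\phi_i(x^{\mathrm{UC}}_i) = \delta_i (x^{\mathrm{UC}}_i)^2 + \vartheta_i x^{\mathrm{UC}}_i + \beta_i, \tag{31}
$$

$$
\omega_j(x^{\mathrm{user}}_j) = \begin{cases} \tau_j x^{\mathrm{user}}_j - \pi_j (x^{\mathrm{user}}_j)^2, & x^{\mathrm{user}}_j \leq \frac{\tau_j}{2\pi_j}, \\ \frac{\tau_j^2}{4\pi_j}, & x^{\mathrm{user}}_j > \frac{\tau_j}{2\pi_j}, \end{cases} \tag{32}
$$

with $\delta_i, \vartheta_i, \beta_i, \tau_j, \pi_j$ being parameters, whose values are set in Table I [45], $\forall i \in \mathcal{V}_{\mathrm{UC}}$, $\forall j \in \mathcal{V}_{\mathrm{user}}$. (28) is the supply-demand balance constraint. $\mathcal{X}^{\mathrm{UC}}_i = [0, x^{\mathrm{UC}}_{i,\max}]$ and $\mathcal{X}^{\mathrm{user}}_j = [0, x^{\mathrm{user}}_{j,\max}]$ with $x^{\mathrm{UC}}_{i,\max}, x^{\mathrm{user}}_{j,\max} > 0$. Define $A = [\mathbf{1}^\top_{|\mathcal{V}_{\mathrm{UC}}|}, -\mathbf{1}^\top_{|\mathcal{V}_{\mathrm{user}}|}]$. Then, (28) is equivalent to $Ax = 0$.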

Fig. 1. Communication topology of the market (nodes: UC 1, UC 2, user 1, user 2, user 3).

TABLE I
PARAMETERS OF UCS AND ENERGY USERS

        UCs                                                    Users
i/j     $\delta_i$   $\vartheta_i$   $\beta_i$   $x^{\mathrm{UC}}_{i,\max}$     $\tau_j$   $\pi_j$    $x^{\mathrm{user}}_{j,\max}$
1       0.0031       8.71            0           150                            17.17      0.0935     91.79
2       0.0074       3.53            0           150                            12.28      0.0417     147.29
3       -            -               -           -                              18.42      0.1007     91.41

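Similar to the derivation procedure of (5), the Lagrangian function of Problem (P7) can be obtained as

$$
\begin{aligned}
\mathcal{L}(x, z, \theta, \mu) ={} & \sum_{i \in \mathcal{V}_{\mathrm{UC}}} \big(\phi_i(x^{\mathrm{UC}}_i) + \mathbb{I}_{\mathcal{X}^{\mathrm{UC}}_i}(z^{\mathrm{UC}}_i)\big) + \sum_{j \in \mathcal{V}_{\mathrm{user}}} \big(-\omega_j(x^{\mathrm{user}}_j) + \mathbb{I}_{\mathcal{X}^{\mathrm{user}}_j}(z^{\mathrm{user}}_j)\big) + \theta A x \\
& + \sum_{i \in \mathcal{V}_{\mathrm{UC}}} \mu^{\mathrm{UC}}_i (x^{\mathrm{UC}}_i - z^{\mathrm{UC}}_i) + \sum_{j \in \mathcal{V}_{\mathrm{user}}} \mu^{\mathrm{user}}_j (x^{\mathrm{user}}_j - z^{\mathrm{user}}_j),
\end{aligned} \tag{33}
$$

where $z = [z^{\mathrm{UC}}_1, ..., z^{\mathrm{UC}}_{|\mathcal{V}_{\mathrm{UC}}|}, z^{\mathrm{user}}_1, ..., z^{\mathrm{user}}_{|\mathcal{V}_{\mathrm{user}}|}]^\top$ is a slack vector, and $\theta$ and $\mu = [\mu^{\mathrm{UC}}_1, ..., \mu^{\mathrm{UC}}_{|\mathcal{V}_{\mathrm{UC}}|}, \mu^{\mathrm{user}}_1, ..., \mu^{\mathrm{user}}_{|\mathcal{V}_{\mathrm{user}}|}]^\top$ are dual vectors. Define $\hat{\theta} = [\theta^{\mathrm{UC}}_1, ..., \theta^{\mathrm{UC}}_{|\mathcal{V}_{\mathrm{UC}}|}, \theta^{\mathrm{user}}_1, ..., \theta^{\mathrm{user}}_{|\mathcal{V}_{\mathrm{user}}|}]^\top$, which contains the local estimates of $\theta$. Let $\xi = [\xi^{\mathrm{UC}}_1, ..., \xi^{\mathrm{UC}}_{|\mathcal{V}_{\mathrm{UC}}|}, \xi^{\mathrm{user}}_1, ..., \xi^{\mathrm{user}}_{|\mathcal{V}_{\mathrm{user}}|}]^\top$ be the Lagrangian multiplier vector associated with the constraint on $\hat{\theta}$ as indicated in (9). With some direct calculations, the optimal solution to Problem (P7) is $x^* = [0, 150, 48.5, 50.2, 51.3]^\top$.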

B. Simulation Result

To demonstrate the performance of Algorithm 1, we consider the communication topology shown in Fig. 1. The simulation results are shown in Figs. 2 to 4. Fig. 2 shows the dynamics of dual variables $\hat{\theta}$ and $\mu$. It can be seen that all the elements in $\hat{\theta}$ converge to $\theta^* = -8.1$ while $\mu$ converges to $\mu^* = [-0.61, 2.34, 0, 0, 0]^\top$. One can check that the optimal solution at the saddle point of $\mathcal{L}$ is $x^* = \arg\min_x \mathcal{L}(x, z, \theta^*, \mu^*) = [0, 150, 48.5, 50.2, 51.3]^\top$,⁵ which means the lower bound of $x^{\mathrm{UC}}_1$ and the upper bound of $x^{\mathrm{UC}}_2$ are activated, respectively, while other variables reach interior optimal solutions. Fig. 3 depicts the dynamics of $\xi$. Fig. 4 shows that the value of the dual function $\Phi(\lambda)$ (as defined in Problem (P6)) decreases to around 756.53.

⁵ The minimization with respect to $x$ is independent of $z$ since $x$ and $z$ are decoupled in (33).

Fig. 2. Dynamics of $\hat{\theta}$ and $\mu$. (Horizontal axis: iterations, 0 to 15000; vertical axis: value; curves for UC 1, UC 2, user 1, user 2, and user 3.)

Fig. 3. Dynamics of $\xi$. (Horizontal axis: iterations, 0 to 15000; vertical axis: value; curves for UC 1, UC 2, user 1, user 2, and user 3.)

Fig. 4. Dynamics of $\Phi(\lambda)$. (Horizontal axis: iterations, 0 to 15000; vertical axis: value.)
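The reported primal solution can be cross-checked against the reported dual values with a short calculation. The sketch below takes the parameters from Table I and the rounded values $\theta = -8.1$ and $\mu = [-0.61, 2.34, 0, 0, 0]$ from the text, and assumes every user operates on the quadratic branch of (32). Because the dual values are rounded to two decimals, only approximate agreement should be expected; the variable names are hypothetical.

```python
# (delta_i, vartheta_i) for the UCs and (tau_j, pi_j) for the users, from Table I.
ucs = [(0.0031, 8.71), (0.0074, 3.53)]
users = [(17.17, 0.0935), (12.28, 0.0417), (18.42, 0.1007)]
theta = -8.1                          # rounded value reported in the text
mu = [-0.61, 2.34, 0.0, 0.0, 0.0]     # rounded values reported in the text

# Stationarity of (33) in x with A = [1, 1, -1, -1, -1]:
#   UC i:   2*delta*x + vartheta + theta + mu = 0
#   user j: 2*pi*x - tau - theta + mu = 0   (quadratic branch assumed)
x_uc = [-(vth + theta + m) / (2 * d) for (d, vth), m in zip(ucs, mu[:2])]
x_user = [(tau + theta - m) / (2 * p) for (tau, p), m in zip(users, mu[2:])]
x_from_dual = x_uc + x_user
imbalance = sum(x_uc) - sum(x_user)
print(x_from_dual, imbalance)
```

The small remaining gaps, including the supply-demand imbalance, are attributable to the two-decimal rounding of the dual values rather than to the method itself.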

VII. CONCLUSION

In this work, we considered solving a composite DOP with both local convex and coupling affine constraints. A fully distributed DDPG algorithm was proposed for solving this problem by resorting to the dual problem. As a distinct feature compared with the existing research works with similar problem setups, we showed that if the non-smooth parts of the objective functions have some simple structures, one only needs to update dual variables by some simple operations, leading to the reduction of overall computational complexity.

APPENDIX

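A. Proof of Lemma 3

By Lemma 2, $\nabla f_i^{\diamond}$ is $\frac{1}{\sigma_i}$-Lipschitz continuous, which means

$$
\begin{aligned}
\|\nabla_v f_i^{\diamond}(H_i v) - \nabla_u f_i^{\diamond}(H_i u)\| &= \|H_i^\top \nabla_{H_i v} f_i^{\diamond}(H_i v) - H_i^\top \nabla_{H_i u} f_i^{\diamond}(H_i u)\| \\
&\leq \|H_i^\top\| \|\nabla_{H_i v} f_i^{\diamond}(H_i v) - \nabla_{H_i u} f_i^{\diamond}(H_i u)\| \\
&\leq \frac{\|H_i^\top\|}{\sigma_i} \|H_i v - H_i u\| \\
&\leq \frac{\|H_i\|^2}{\sigma_i} \|v - u\| = h_i \|v - u\|,
\end{aligned} \tag{34}
$$

$\forall v, u \in \mathbb{R}^{M+B}$, which means $\nabla_{\lambda_i} f_i^{\diamond}(H_i\lambda_i)$ is $h_i$-Lipschitz continuous and, therefore, $\nabla_{\lambda_i} p_i(\lambda_i) = \nabla_{\lambda_i} f_i^{\diamond}(H_i\lambda_i) + \kappa_i E^\top$ is also $h_i$-Lipschitz continuous.

On the other hand, due to the separability of $P(\lambda)$, $\nabla_{\lambda} P(\lambda)$ can be decoupled with respect to each $\lambda_i$, i.e.,

$$
\nabla_{\lambda} P(\lambda) = \begin{bmatrix} \nabla_{\lambda_1} p_1(\lambda_1) \\ \vdots \\ \nabla_{\lambda_N} p_N(\lambda_N) \end{bmatrix}. \tag{35}
$$

By using the Euclidean $l_2$-norm, for any stacked vectors $v, u$ with blocks $v_i, u_i \in \mathbb{R}^{M+B}$, we have $\|\nabla P(v) - \nabla P(u)\|^2 = \sum_{i \in \mathcal{V}} \|\nabla p_i(v_i) - \nabla p_i(u_i)\|^2 \leq \sum_{i \in \mathcal{V}} h_i^2 \|v_i - u_i\|^2 \leq \big(\sum_{i \in \mathcal{V}} h_i^2\big) \|v - u\|^2$. Hence, the Lipschitz constant of $\nabla_{\lambda} P(\lambda)$ can be obtained as $h$.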

B. Proof of Lemma 4

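By the first-order optimality condition of (13) in terms of (2), we have

$$
\begin{aligned}
\mathbf{0} \in{} & \partial_{\lambda} Q(\lambda(t+1)) + D^{M+B}_{\mathcal{V}}\big[\tfrac{1}{c_l}\big](\lambda(t+1) - \lambda(t)) + \nabla_{\lambda} P(\lambda(t)) + \mathbf{M}^\top \xi(t) \\
={} & \partial_{\lambda} Q(\lambda(t+1)) - D^{M+B}_{\mathcal{V}}\big[\tfrac{1}{c_l}\big](\lambda(t) - \lambda(t+1)) + \nabla_{\lambda} P(\lambda(t)) + \mathbf{M}^\top \xi(t+1) \\
& - \mathbf{M}^\top D^{B}_{\mathcal{V}}[\gamma_l] \mathbf{M} \lambda(t+1),
\end{aligned} \tag{36}
$$

where $D^{M+B}_{\mathcal{V}}[\frac{1}{c_l}] = (D^{M+B}_{\mathcal{V}}[c_l])^{-1}$. From the convexity of $Q(\lambda)$, we have

$$
\begin{aligned}
Q(\lambda) - Q(\lambda(t+1)) \geq{} & (\lambda - \lambda(t+1))^\top D^{M+B}_{\mathcal{V}}\big[\tfrac{1}{c_l}\big](\lambda(t) - \lambda(t+1)) \\
& - (\lambda - \lambda(t+1))^\top \nabla_{\lambda} P(\lambda(t)) \\
& - (\lambda - \lambda(t+1))^\top \mathbf{M}^\top \xi(t+1) \\
& + (\lambda - \lambda(t+1))^\top \mathbf{M}^\top D^{B}_{\mathcal{V}}[\gamma_l] \mathbf{M} \lambda(t+1).
\end{aligned} \tag{37}
$$

From the convexity and $h_i$-Lipschitz continuous differentiability of $p_i$, we have

$$
\begin{aligned}
(\lambda - \lambda(t+1))^\top \nabla_{\lambda} P(\lambda(t)) ={} & \sum_{i \in \mathcal{V}} (\lambda_i - \lambda_i(t))^\top \nabla_{\lambda_i} p_i(\lambda_i(t)) + \sum_{i \in \mathcal{V}} (\lambda_i(t) - \lambda_i(t+1))^\top \nabla_{\lambda_i} p_i(\lambda_i(t)) \\
\leq{} & \sum_{i \in \mathcal{V}} \big(p_i(\lambda_i) - p_i(\lambda_i(t))\big) + \sum_{i \in \mathcal{V}} \big(p_i(\lambda_i(t)) - p_i(\lambda_i(t+1))\big) + \|\lambda(t) - \lambda(t+1)\|^2_{D^{M+B}_{\mathcal{V}}[\frac{h_l}{2}]} \\
={} & P(\lambda) - P(\lambda(t+1)) + \|\lambda(t) - \lambda(t+1)\|^2_{D^{M+B}_{\mathcal{V}}[\frac{h_l}{2}]}.
\end{aligned} \tag{38}
$$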

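By (14), we have

$$
\mathbf{0} = D^{B}_{\mathcal{V}}\big[\tfrac{1}{\gamma_l}\big](\xi(t) - \xi(t+1)) + \mathbf{M}\lambda(t+1), \tag{39}
$$

where $D^{B}_{\mathcal{V}}[\frac{1}{\gamma_l}] = (D^{B}_{\mathcal{V}}[\gamma_l])^{-1}$. Therefore, by multiplying both sides of (39) by $(\xi - \xi(t+1))^\top$, we have

$$
(\xi - \xi(t+1))^\top D^{B}_{\mathcal{V}}\big[\tfrac{1}{\gamma_l}\big](\xi(t) - \xi(t+1)) + (\xi - \xi(t+1))^\top \mathbf{M}\lambda(t+1) = 0. \tag{40}
$$

By adding (37) and (38) together, we have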

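$$
\begin{aligned}
\Phi(\lambda(t+1)) - \Phi(\lambda) \leq{} & -(\lambda - \lambda(t+1))^\top D^{M+B}_{\mathcal{V}}\big[\tfrac{1}{c_l}\big](\lambda(t) - \lambda(t+1)) + (\lambda - \lambda(t+1))^\top \mathbf{M}^\top \xi(t+1) \\
& + \|\lambda(t) - \lambda(t+1)\|^2_{D^{M+B}_{\mathcal{V}}[\frac{h_l}{2}]} - (\lambda - \lambda(t+1))^\top \mathbf{M}^\top D^{B}_{\mathcal{V}}[\gamma_l] \mathbf{M}\lambda(t+1) \\
={} & -(\lambda - \lambda(t+1))^\top D^{M+B}_{\mathcal{V}}\big[\tfrac{1}{c_l}\big](\lambda(t) - \lambda(t+1)) - (\xi - \xi(t+1))^\top D^{B}_{\mathcal{V}}\big[\tfrac{1}{\gamma_l}\big](\xi(t) - \xi(t+1)) \\
& - (\xi - \xi(t+1))^\top \mathbf{M}\lambda(t+1) + (\xi(t+1))^\top \mathbf{M}\lambda - (\xi(t+1))^\top \mathbf{M}\lambda(t+1) \\
& + \|\lambda(t) - \lambda(t+1)\|^2_{D^{M+B}_{\mathcal{V}}[\frac{h_l}{2}]} - (\lambda - \lambda(t+1))^\top \mathbf{M}^\top D^{B}_{\mathcal{V}}[\gamma_l] \mathbf{M}\lambda(t+1) \\
={} & \|\lambda - \lambda(t)\|^2_{D^{M+B}_{\mathcal{V}}[\frac{1}{2c_l}]} - \|\lambda - \lambda(t+1)\|^2_{D^{M+B}_{\mathcal{V}}[\frac{1}{2c_l}]} - \|\lambda(t) - \lambda(t+1)\|^2_{D^{M+B}_{\mathcal{V}}[\frac{1}{2c_l}]} \\
& + \|\xi - \xi(t)\|^2_{D^{B}_{\mathcal{V}}[\frac{1}{2\gamma_l}]} - \|\xi - \xi(t+1)\|^2_{D^{B}_{\mathcal{V}}[\frac{1}{2\gamma_l}]} - \|\xi(t) - \xi(t+1)\|^2_{D^{B}_{\mathcal{V}}[\frac{1}{2\gamma_l}]} \\
& + (\xi(t+1))^\top \mathbf{M}\lambda - \xi^\top \mathbf{M}\lambda(t+1) + \|\lambda(t) - \lambda(t+1)\|^2_{D^{M+B}_{\mathcal{V}}[\frac{h_l}{2}]} \\
& - (\lambda - \lambda(t+1))^\top \mathbf{M}^\top D^{B}_{\mathcal{V}}[\gamma_l] \mathbf{M}\lambda(t+1) \\
={} & \|\lambda - \lambda(t)\|^2_{D^{M+B}_{\mathcal{V}}[\frac{1}{2c_l}]} - \|\lambda - \lambda(t+1)\|^2_{D^{M+B}_{\mathcal{V}}[\frac{1}{2c_l}]} - \|\lambda(t) - \lambda(t+1)\|^2_{D^{M+B}_{\mathcal{V}}[\frac{1}{2c_l} - \frac{h_l}{2}]} \\
& + \|\xi - \xi(t)\|^2_{D^{B}_{\mathcal{V}}[\frac{1}{2\gamma_l}]} - \|\xi - \xi(t+1)\|^2_{D^{B}_{\mathcal{V}}[\frac{1}{2\gamma_l}]} - \|\xi(t) - \xi(t+1)\|^2_{D^{B}_{\mathcal{V}}[\frac{1}{2\gamma_l}]} \\
& + (\xi(t+1))^\top \mathbf{M}\lambda - \xi^\top \mathbf{M}\lambda(t+1) + \|\mathbf{M}\lambda(t+1)\|^2_{D^{B}_{\mathcal{V}}[\gamma_l]} - (\mathbf{M}\lambda)^\top D^{B}_{\mathcal{V}}[\gamma_l] \mathbf{M}\lambda(t+1),
\end{aligned} \tag{41}
$$

where we use (40) in the first equality and the second equality holds with $v^\top u = \frac{1}{2}(\|v\|^2 + \|u\|^2 - \|v - u\|^2)$, $\forall v, u \in \mathbb{R}^{MN+BN}$.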

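Let $\xi = \xi^*$ and $\lambda = \lambda^*$ and rearrange (41); then we have

$$
\begin{aligned}
\Phi(\lambda(t+1)) - \Phi(\lambda^*) + (\xi^*)^\top \mathbf{M}\lambda(t+1) \leq{} & \|\lambda^* - \lambda(t)\|^2_{D^{M+B}_{\mathcal{V}}[\frac{1}{2c_l}]} - \|\lambda^* - \lambda(t+1)\|^2_{D^{M+B}_{\mathcal{V}}[\frac{1}{2c_l}]} \\
& + \|\xi^* - \xi(t)\|^2_{D^{B}_{\mathcal{V}}[\frac{1}{2\gamma_l}]} - \|\xi^* - \xi(t+1)\|^2_{D^{B}_{\mathcal{V}}[\frac{1}{2\gamma_l}]} \\
& - \|\lambda(t) - \lambda(t+1)\|^2_{D^{M+B}_{\mathcal{V}}[\frac{1}{2c_l} - \frac{h_l}{2}]} - \|\xi(t) - \xi(t+1)\|^2_{D^{B}_{\mathcal{V}}[\frac{1}{2\gamma_l}]} \\
& + \|\mathbf{M}\lambda(t+1)\|^2_{D^{B}_{\mathcal{V}}[\gamma_l]},
\end{aligned} \tag{42}
$$

where KKT condition (12) is used. By combining (9), (10) and (12), $\forall \lambda \in \mathbb{R}^{NM+NB}$, we have

$$
\Phi(\lambda) - \Phi(\lambda^*) + (\xi^*)^\top \mathbf{M}\lambda \geq 0. \tag{43}
$$

Based on (42) and (43), the proof is completed.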

C. Proof of Theorem 1

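Note that (41) holds for all $\lambda \in \mathbb{R}^{NB+NM}$ and $\xi \in \mathbb{R}^{NB}$. The proof is conducted by discussing the following two scenarios.

1) Scenario 1: If $\mathbf{M}\bar{\lambda}(T+1) \neq \mathbf{0}$, by letting $\lambda = \lambda^*$ and $\xi = 2\|\xi^*\| \frac{\mathbf{M}\bar{\lambda}(T+1)}{\|\mathbf{M}\bar{\lambda}(T+1)\|}$ in (41), we have

$$
\begin{aligned}
& \Phi(\lambda(t+1)) - \Phi(\lambda^*) + 2\|\xi^*\| \frac{(\mathbf{M}\bar{\lambda}(T+1))^\top}{\|\mathbf{M}\bar{\lambda}(T+1)\|} \mathbf{M}\lambda(t+1) \\
& \leq \|\lambda^* - \lambda(t)\|^2_{D^{M+B}_{\mathcal{V}}[\frac{1}{2c_l}]} - \|\lambda^* - \lambda(t+1)\|^2_{D^{M+B}_{\mathcal{V}}[\frac{1}{2c_l}]} \\
& \quad + \Big\|2\|\xi^*\| \frac{\mathbf{M}\bar{\lambda}(T+1)}{\|\mathbf{M}\bar{\lambda}(T+1)\|} - \xi(t)\Big\|^2_{D^{B}_{\mathcal{V}}[\frac{1}{2\gamma_l}]} - \Big\|2\|\xi^*\| \frac{\mathbf{M}\bar{\lambda}(T+1)}{\|\mathbf{M}\bar{\lambda}(T+1)\|} - \xi(t+1)\Big\|^2_{D^{B}_{\mathcal{V}}[\frac{1}{2\gamma_l}]} \\
& \quad + \|\mathbf{M}\lambda(t+1)\|^2_{D^{B}_{\mathcal{V}}[\gamma_l]},
\end{aligned} \tag{44}
$$

where $c_i \leq \frac{1}{h_i}$ is considered. Summing up (44) over $t = 0, 1, ..., T$ gives

$$
\begin{aligned}
& (T+1)\big(\Phi(\bar{\lambda}(T+1)) - \Phi(\lambda^*) + 2\|\xi^*\| \|\mathbf{M}\bar{\lambda}(T+1)\|\big) \\
& \leq \sum_{t=0}^{T} \big(\Phi(\lambda(t+1)) - \Phi(\lambda^*) + 2\|\xi^*\| \|\mathbf{M}\bar{\lambda}(T+1)\|\big) \\
& \leq \Big\|2\|\xi^*\| \frac{\mathbf{M}\bar{\lambda}(T+1)}{\|\mathbf{M}\bar{\lambda}(T+1)\|} - \xi(0)\Big\|^2_{D^{B}_{\mathcal{V}}[\frac{1}{2\gamma_l}]} + \|\lambda^* - \lambda(0)\|^2_{D^{M+B}_{\mathcal{V}}[\frac{1}{2c_l}]} + \sum_{t=0}^{T} \|\mathbf{M}\lambda(t+1)\|^2_{D^{B}_{\mathcal{V}}[\gamma_l]}
\end{aligned}
$$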
