arXiv: v2 [math.OC] 19 Jul 2021


Distributed Grid Optimization via Distributed Dual Subgradient Methods with Averaging

Subhonmesh Bose

Hoa Dinh Nguyen

Haitian Liu

Ye Guo

Thinh T. Doan

Carolyn L. Beck

Abstract—A collection of optimization problems central to power system operation requires distributed solution architectures to avoid the need for aggregation of all information at a central location. In this paper, we study distributed dual subgradient methods to solve three such optimization problems, namely, tie-line scheduling in multi-area power systems, coordination of distributed energy resources in radial distribution networks, and joint dispatch of transmission and distribution assets. With suitable relaxations or approximations of the power flow equations, all three problems can be reduced to a multi-agent constrained convex optimization problem. We apply a constant step-size dual subgradient method with averaging to these problems. For this algorithm, we provide a convergence guarantee that is shown to be order-optimal. We illustrate its application on the grid optimization problems.

Index Terms—Distributed optimization, power grid

I. INTRODUCTION

We consider optimization problems that arise in power system operation, where collecting all data at a central location to solve the problem is not an option. Barriers to such data aggregation arise when no single entity has jurisdiction over all data sources, or when speed is of the essence. In such applications, one must rely on distributed solution architectures. The distributed computational paradigm advocates local computation by ‘agents’ who interact over a network and exchange intermediate variables with other agents across the edges of that network to solve the optimization problem. An extensive literature has emerged on distributed optimization; its application to power system operation has also grown substantially, as our literature survey will indicate. In this paper, we consider three different grid optimization problems. In each problem setting, we adopt a different power flow model, a different notion of agents and a different notion of the network over which these agents interact. Despite these differences, we cast all three problems as examples of a generic constrained convex optimization problem, which facilitates unified algorithm development and analysis.

We consider three different grid optimization problems:

S. Bose and C.L. Beck are with the University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. H.D. Nguyen is with the International Institute for Carbon-Neutral Energy Research (WPI-I²CNER) and Institute of Mathematics for Industry (IMI), Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan. H. Liu and Y. Guo are with the Tsinghua-Berkeley Shenzhen Institute, Shenzhen, Guangdong 518055, China. T.T. Doan is with Virginia Tech, Blacksburg, VA 24060 USA. E-mails: boses@illinois.edu, guo-ye@sz.tsinghua.edu.cn. This project was partially supported by grants from the Power Systems Engineering Research Center (PSERC), JSPS Kakenhi Grant Number JP19K15013, and the National Science Foundation of China under Grant 51977115.

• Multi-area optimal power flow problem (P1): This problem seeks to dispatch resources over an interconnected transmission network, parts of which are controlled by different system operators. A distributed solution architecture allows these system operators to jointly solve an optimal power flow problem without sharing all relevant data from within their footprints with one another.

• Coordination of distributed energy resources (DERs) in distribution grids (P2): This problem optimizes the real and reactive power outputs of DERs at the grid-edge to minimize the cost (dis-utility) of such a dispatch, possibly together with a network-wide objective such as frequency regulation. A distributed algorithm allows quick updates of the optimization variables without the need to communicate with a central coordinator across the distribution grid.

• Transmission and distribution (T&D) grid coordination (P3): This optimization problem seeks to dispatch assets across the transmission and distribution grids without the need to collect all information from the grid-edge and the bulk power system at one location. The distributed solution architecture alleviates the transmission system operator’s lack of visibility into utility-managed distribution networks.

Optimization over power grids is typically nonconvex. Non-convexity arises due to the nature of Kirchhoff’s laws [1]. In this paper, we convexify the problems of interest by considering power flow models that are suited to each problem. For P1, we consider a linear power flow model for transmission networks (see [2]), motivated by the fact that multi-area coordination achieved through wholesale market environments often makes use of such models. For P2, we consider a second-order cone programming (SOCP) based relaxation of the power flow equations in the distribution grids. SOCP-based relaxations of power flow equations in distribution grids have been thoroughly studied; empirical evidence suggests that such relaxations are often tight (see [3], [4]). For P3, we consider two different power flow models for the transmission and the distribution grids. For transmission, we choose a semidefinite programming (SDP) based relaxation of power flow equations, given its popularity in the literature as a means to solve the AC optimal power flow problem [5]–[7]. For the distribution grids, we consider a linear distribution power flow model from [8] that is derived to preserve the key features of low/medium voltage distribution grids.

We view these distinct problems in power system operation through a unifying lens that allows us to study algorithm design within a single framework. While many techniques apply to these problems, in this paper we study distributed dual subgradient methods with averaging, addressed recently in [9]. At its core, this algorithm relies on dual decomposition, which starts by separating the Lagrangian into agent-wise Lagrangian functions that each agent optimizes, given a dual iterate (see classical texts such as [10]–[12]). Such an update rule requires a central coordinator to manage the dual iterates. Distributed dual subgradient methods instead maintain local copies of such multipliers and run a consensus-based distributed dual ascent on these local multipliers. Approximate primal solutions can be recovered from these dual solutions as in [13], building on techniques in [14]–[16]; asymptotic guarantees on the recovered primal sequences are known. In this paper, we adopt the variant of such a dual subgradient method analyzed recently in [9], which generalizes the centralized counterpart in [17]. For the fully distributed algorithm, the authors of [9] establish an $O(\log T/\sqrt{T})$ bound on a metric that combines sub-optimality and constraint violation, while they argue a lower bound of order $1/\sqrt{T}$. We provide an alternate analysis that closes the gap between these bounds, i.e., we sharpen the convergence rate to $O(1/\sqrt{T})$. We do so via a constant step-size algorithm, as opposed to the decaying step-sizes adopted in [9]. This choice is motivated by known pitfalls of diminishing step-sizes in practical applications (see [13]).

Overall, we present a unified framework to study three different grid optimization problems, provide a sharper convergence rate for a recently studied algorithm, and share results from numerical experiments with the same algorithm on the three applications. The paper is organized as follows. In Section II, we present a generic constrained multi-agent convex optimization problem P and our convergence result. Then, we cast the application problems P1, P2 and P3 as instances of P in Sections III, IV and V, respectively. In each section, we describe prior art for the problem, identify the problem as an instance of P, and document numerical results of running the algorithm from Section II.

Closest in spirit to this work is the survey in [18], which provides an extensive review of distributed optimization techniques for optimal power flow (OPF) problems with various power flow models. The techniques reviewed include augmented Lagrangian decomposition, Karush-Kuhn-Tucker conditions-based approaches, gradient dynamics and dynamic programming methods, with applications to voltage/frequency control. In contrast, we study an algorithm that appeared after the survey was published, provide a sharper convergence guarantee for it, and apply it to specific grid optimization problems with appropriate power flow models.

II. THE DUAL SUBGRADIENT METHOD WITH AVERAGING

In this section, we present a fully distributed algorithm to solve a convex multi-agent optimization problem of the form

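$$
\begin{aligned}
\mathcal{P}:\quad \underset{x_1,\ldots,x_N}{\text{minimize}}\quad & \sum_{j=1}^{N} f_j(x_j), && \text{(1a)}\\
\text{subject to}\quad & \sum_{j=1}^{N} g_j^{E}(x_j) = 0, && \text{(1b)}\\
& \sum_{j=1}^{N} g_j^{I}(x_j) \le 0, && \text{(1c)}\\
& x_j \in X_j \subseteq \mathbb{R}^{n_j},\quad j = 1,\ldots,N. && \text{(1d)}
\end{aligned}
$$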

The $N$ agents communicate only across edges of an undirected graph $G(N, E)$. In Sections III, IV and V, we cast P1, P2 and P3 as examples of P. Here, we focus on an algorithmic architecture based on a dual subgradient method to solve P and on its convergence result.

To lighten notation, let $g_j$ collect both $g_j^E$ and $g_j^I$, with the understanding that the first $M_E$ constraints encode equalities and the last $M_I$ encode inequalities.

The algorithm relies on Lagrangian duality theory associated with P. We begin by defining the Lagrangian function

$$L(x, z) := \sum_{j=1}^{N} \left[ f_j(x_j) + z^\top g_j(x_j) \right], \tag{2}$$

$$x^\top = (x_1^\top, \ldots, x_N^\top) \in X := X_1 \times \cdots \times X_N, \qquad z \in Z := \mathbb{R}^{M_E} \times \mathbb{R}^{M_I}_{+}. \tag{3}$$

Then, P can be written as a min-max problem with optimal value $P^\star$, given by

$$P^\star = \min_{x \in X} \max_{z \in Z} L(x, z). \tag{4}$$

Let $X^\star$ denote the set of optimizers of P. Associated with P is its dual problem, given by

$$P_D^\star = \max_{z \in Z} \min_{x \in X} L(x, z). \tag{5}$$

Let $Z^\star$ denote the set of optimizers of the dual problem. Weak duality implies that $P^\star \ge P_D^\star$. We say strong duality holds if the inequality is met with equality. Moreover, $(x^\star, z^\star) \in X \times Z$ is a saddle point of $L$ if

$$L(x^\star, z) \le L(x^\star, z^\star) \le L(x, z^\star) \tag{6}$$

for all $x \in X$, $z \in Z$. The well-known saddle point theorem (see [19, Theorem 2.156]) states that, when strong duality holds, the set $X^\star \times Z^\star$ of primal-dual optimizers coincides with the set of saddle points of $L$.

Assumption 1. The functions $f_j$, $g_j^I$ are convex and $g_j^E$ is affine over the compact convex set $X_j$ for each $j = 1, \ldots, N$. The set of saddle points of $L$ is nonempty and bounded.

Assumption 1 ensures that strong duality holds for P, i.e., $P^\star = P_D^\star$, and that the set of primal-dual optimizers is nonempty.

Saddle-points exist under standard constraint qualifications such as Slater’s condition, e.g., see [19, Theorem 2.165].

Dual decomposition techniques for distributed optimization rely on the observation that the dual function separates into agent-wise optimization problems, given a multiplier z as

$$\min_{x \in X} L(x, z) = \sum_{j=1}^{N} \underbrace{\min_{x_j \in X_j} L_j(x_j, z)}_{=:\, D_j(z)}, \tag{7}$$

where $L_j(x_j, z) := f_j(x_j) + z^\top g_j(x_j)$. If the agents can perform these agent-wise minimizations, then a distributed projected subgradient ascent algorithm can solve the dual problem (e.g., see [10]). Per Danskin’s theorem, a subgradient of $D_j$ at $z$ is readily obtained from the agent-wise minimization: the subdifferential of the concave function $D_j$ at $z$ is given by

$$\partial D_j(z) = \operatorname{conv}\left\{ g_j(x_j) \,:\, x_j \in X_j^\star(z) \right\}. \tag{8}$$

Here, “conv” computes the convex hull of its argument and $X_j^\star(z)$ is the set of minimizers of $L_j(\cdot, z)$ over $X_j$. The minimization problem is well-defined, given that the $X_j$’s are compact. Running such an algorithm, however, requires a central coordinator to compute the $z$-update and broadcast the results to all agents. Albeit simpler than aggregating all problem data at a single location, the need for such coordination is a downside of the classical dual decomposition method. To avoid coordination for the dual update, one can instead create local copies of $z$ among all agents and enforce equality among these local estimates in the dual problem as

$$\max_{z_1, \ldots, z_N \in Z}\ \sum_{j=1}^{N} D_j(z_j), \quad \text{subject to } z_j = z_k,\ \ j, k = 1, \ldots, N, \tag{9}$$

where $z_j$ is agent $j$’s local copy of $z$. One can run a distributed projected subgradient ascent as in [13] to solve (9). The primal iterates obtained from agent-wise minimization of the $L_j$’s at the dual iterates may fail to collectively satisfy the constraints of P. Primal averaging schemes have been studied in [13]; limit points of such recovered primal solutions are known to satisfy the constraints. One can judiciously maintain these local copies only among a subset of the agents to relieve the communication burden (see [20]).
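To make (7) and (8) concrete, the Python sketch below evaluates the local dual function and a Danskin subgradient for a hypothetical scalar agent that is not taken from the paper: cost $f_j(x_j) = \tfrac12 a_j x_j^2$ on $X_j = [\ell_j, u_j]$ and coupling function $g_j(x_j) = d_j - x_j$ (local demand minus local generation). The function names are our own. Because the minimizer is unique here, $D_j$ is differentiable, so a central finite difference of $D_j$ should agree with $g_j$ evaluated at the minimizer.

```python
"""Sketch: agent-wise Lagrangian minimization and a Danskin subgradient.

Hypothetical toy agent (an assumption, not from the paper):
    f_j(x) = 0.5 * a * x**2 on X_j = [lo, hi],   g_j(x) = d - x.
"""


def local_argmin(z, a, lo, hi):
    """Minimizer of L_j(x, z) = 0.5*a*x**2 + z*(d - x) over [lo, hi].

    L_j is a convex quadratic in x with unconstrained minimizer z / a, so
    clipping that point to the interval gives the constrained minimizer.
    The constant z*d does not affect the minimizer.
    """
    return min(max(z / a, lo), hi)


def local_dual(z, a, d, lo, hi):
    """D_j(z) = min over X_j of L_j(x, z), evaluated at the minimizer."""
    x = local_argmin(z, a, lo, hi)
    return 0.5 * a * x ** 2 + z * (d - x)


def danskin_subgradient(z, a, d, lo, hi):
    """g_j at a minimizer of L_j(., z): an element of the set in (8)."""
    return d - local_argmin(z, a, lo, hi)


if __name__ == "__main__":
    a, d, lo, hi, z, h = 2.0, 1.0, 0.0, 5.0, 1.0, 1e-4
    finite_diff = (local_dual(z + h, a, d, lo, hi)
                   - local_dual(z - h, a, d, lo, hi)) / (2 * h)
    print(danskin_subgradient(z, a, d, lo, hi), finite_diff)
```

When the price $z$ is large enough to saturate generation at the upper bound, the same function returns $d_j - u_j$, which is again $g_j$ at the (clipped) minimizer.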

Recently, a dual subgradient algorithm was proposed in [17] that leverages an estimation sequence technique to provide guarantees on sub-optimality and infeasibility at the last iterate. This algorithm does not treat ergodic means simply as outputs of a dual subgradient calculation, but rather uses these means as the primal-dual iterates that drive the algorithm. We utilize the fully distributed variant of the algorithm that is proposed and analyzed in [9]. To present the algorithm, let $W \in \mathbb{R}^{N \times N}$ be a doubly stochastic, irreducible and aperiodic weighting matrix that follows the sparsity pattern of $G$, i.e.,

$$W_{j,k} \neq 0 \iff (j, k) \in E. \tag{10}$$

Then, the distributed projected dual subgradient method with averaging is given by Algorithm 1, where $x_j$, $X_j$ are primal sequences and $z_j$, $Z_j$ are dual sequences. The updates comprise minimization of the local Lagrangian (i.e., evaluation of the local dual function) in step 3, averaging of these primal minimizers in step 4, a consensus step followed by a local subgradient-based dual update in step 5, and an ergodic mean computation for the projected dual variable in step 6 with step-size $\eta$. Here, $\pi_Z$ projects its argument onto $Z$.

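Algorithm 1: Distributed dual subgradient with averaging to solve P.
1: Choose $z_j(1) = 0$, $Z_j(0) = 0$, $x_j(0) \in X_j$ and $\eta = \eta_0/\sqrt{T}$.
2: for $t = 1, \ldots, T$ do
3: $\quad X_j(t) \leftarrow \operatorname{argmin}_{x_j \in X_j} L_j(x_j, z_j(t))$.
4: $\quad x_j(t) \leftarrow \frac{t-1}{t}\, x_j(t-1) + \frac{1}{t}\, X_j(t)$.
5: $\quad Z_j(t) \leftarrow \sum_{k=1}^{N} W_{jk} Z_k(t-1) + t\, g_j(x_j(t)) - (t-1)\, g_j(x_j(t-1))$.
6: $\quad z_j(t+1) \leftarrow \frac{t}{t+1}\, z_j(t) + \frac{1}{t+1}\, \pi_Z[\eta Z_j(t)]$.
7: end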
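The listing above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the implementation used for the experiments: every agent is a hypothetical scalar agent with $f_j(x_j) = \tfrac12 a_j x_j^2$ on $[\ell_j, u_j]$, and the only coupling is the inequality $\sum_j (d_j - x_j) \le 0$, so that $M_E = 0$, $M_I = 1$, $Z = \mathbb{R}_+$ and $\pi_Z[v] = \max\{v, 0\}$. The function names and the weighting matrix in the usage example are hypothetical. For $a = (1, 2, 4)$ and $d_j = 1$, the optimal dispatch is $x^\star = (12/7, 6/7, 3/7)$ with multiplier $z^\star = 12/7$, which gives a reference point for the output.

```python
import math


def local_argmin(z, a, lo, hi):
    """Step 3: argmin of 0.5*a*x**2 + z*(d - x) over [lo, hi] (clip z / a)."""
    return min(max(z / a, lo), hi)


def algorithm1(a, d, W, eta0, T, lo=0.0, hi=5.0):
    """Sketch of Algorithm 1 for the toy problem
        minimize sum_j 0.5*a[j]*x_j**2  s.t.  sum_j (d[j] - x_j) <= 0,  x_j in [lo, hi].
    W must be doubly stochastic and follow the sparsity pattern of the graph.
    Returns the averaged primal iterates x_j(T) and the local multipliers z_j(T + 1).
    """
    N = len(a)
    eta = eta0 / math.sqrt(T)                 # constant step-size eta0 / sqrt(T)
    z = [0.0] * N                             # z_j(1) = 0
    Z = [0.0] * N                             # Z_j(0) = 0
    x = [lo] * N                              # x_j(0) in X_j
    g_prev = [d[j] - x[j] for j in range(N)]  # g_j(x_j(0)), weighted by t - 1 = 0 at t = 1
    for t in range(1, T + 1):
        # Step 3: each agent minimizes its local Lagrangian at its own z_j(t).
        X = [local_argmin(z[j], a[j], lo, hi) for j in range(N)]
        # Step 4: running average x_j(t) of the local minimizers.
        x = [((t - 1) * x[j] + X[j]) / t for j in range(N)]
        g_now = [d[j] - x[j] for j in range(N)]
        # Step 5: mix neighbors' Z_k(t - 1), add t*g_j(x_j(t)) - (t - 1)*g_j(x_j(t - 1)).
        # The right-hand side reads the previous list Z, which is rebound only afterwards.
        Z = [sum(W[j][k] * Z[k] for k in range(N)) + t * g_now[j] - (t - 1) * g_prev[j]
             for j in range(N)]
        # Step 6: ergodic mean of the projected dual estimate pi_Z[eta * Z_j(t)].
        z = [(t * z[j] + max(eta * Z[j], 0.0)) / (t + 1) for j in range(N)]
        g_prev = g_now
    return x, z


if __name__ == "__main__":
    a, d = [1.0, 2.0, 4.0], [1.0, 1.0, 1.0]
    # Hypothetical weights on the path graph 1 - 2 - 3 (symmetric, doubly stochastic).
    W = [[2 / 3, 1 / 3, 0.0], [1 / 3, 1 / 3, 1 / 3], [0.0, 1 / 3, 2 / 3]]
    x, z = algorithm1(a, d, W, eta0=2.0, T=40000)
    print("x(T):", [round(v, 3) for v in x])  # compare with (12/7, 6/7, 3/7)
    print("z_j: ", [round(v, 3) for v in z])  # compare with 12/7
```

Note that step 5 uses the averaged primal iterates $x_j(t)$, not the raw minimizers $X_j(t)$, so the averaging feeds back into the dual dynamics; with a finite $T$, the local multipliers agree only up to a consensus error that shrinks with $\eta$.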

To study convergence properties of this algorithm, consider the metric introduced in [17] and used in [9], given by

$$V_T(x(T), \bar z(T)) := \sum_{j=1}^{N} f_j(x_j(T)) - \sum_{j=1}^{N} D_j(\bar z(T)) + \frac{\eta T}{2N} \left\| \pi_Z\Big[ \sum_{j=1}^{N} g_j(x_j(T)) \Big] \right\|^2, \tag{11}$$

where $\bar z(T) := \frac{1}{N} \sum_{j=1}^{N} z_j(T)$. The difference of the first two sums measures the gap between the primal objective at $x(T) \in X$ and the dual function evaluated at $\bar z(T) \in Z$. The last summand is a measure of the constraint violation at $x(T)$. We sharpen the bound of [9, Theorem 2] in the next result. The proof is deferred to the appendix to maintain continuity of exposition.
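For the same hypothetical toy problem used in our sketches (costs $\tfrac12 a_j x_j^2$ on $[\ell_j, u_j]$, single coupling inequality $\sum_j (d_j - x_j) \le 0$, so that $Z = \mathbb{R}_+$), the metric in (11) can be evaluated as in the sketch below; the function names and the closed-form local dual are our assumptions. At a primal-dual optimal pair both the gap and the penalty vanish, so the metric is zero, while at an infeasible point the penalty term dominates.

```python
def local_dual(z, a, d, lo=0.0, hi=5.0):
    """D_j(z) for the toy agent: minimize 0.5*a*x**2 + z*(d - x) over [lo, hi]."""
    x = min(max(z / a, lo), hi)
    return 0.5 * a * x ** 2 + z * (d - x)


def metric_vt(x, z_local, a, d, eta, T):
    """V_T in (11): primal objective minus the dual function at the averaged
    multiplier, plus (eta*T / 2N) times the squared projected violation."""
    N = len(a)
    z_bar = sum(z_local) / N                                   # average of local copies
    primal = sum(0.5 * a[j] * x[j] ** 2 for j in range(N))
    dual = sum(local_dual(z_bar, a[j], d[j]) for j in range(N))
    violation = max(sum(d[j] - x[j] for j in range(N)), 0.0)  # pi_Z with Z = R_+
    return primal - dual + eta * T / (2 * N) * violation ** 2


if __name__ == "__main__":
    a, d = [1.0, 2.0, 4.0], [1.0, 1.0, 1.0]
    x_star, z_star = [12 / 7, 6 / 7, 3 / 7], 12 / 7
    print(metric_vt(x_star, [z_star] * 3, a, d, eta=0.01, T=100))  # ~ 0
    print(metric_vt([0.0] * 3, [0.0] * 3, a, d, eta=0.01, T=100))  # penalty only
```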

Theorem 1. Suppose Assumption 1 holds. Iterates generated by Algorithm 1 with $\eta = \eta_0/\sqrt{T}$ over $t = 1, \ldots, T$, where $\eta_0 > 0$ is a constant, satisfy

$$V_T(x(T), \bar z(T)) \le \frac{1}{\sqrt{T}} \left( \frac{C_0}{1 - \sigma_2(W)} + C_1 \right), \qquad V_T(x(T), \bar z(T)) \ge P^\star - \sum_{j=1}^{N} D_j(\bar z(T)) - \frac{C_2}{\sqrt{T}}, \tag{12}$$

where the $C$’s are positive constants that do not depend on $G$ or $T$, and $\sigma_2(W)$ is the second largest singular value of $W$.

Our upper bound in this result sharpens the conclusion of [9, Theorem 2], while the lower bound is identical. The result implies that the metric in (11) indeed converges at a rate of 1/√T . Our proof of the bounds largely mirrors that of [9, Theorem 2], but deviates from the reliance on results from [21] that incur the log T factor. Instead, we use an argument inspired by the proof of [22, Theorem 2].

We briefly remark on the implications of Theorem 1 for the sub-optimality of $x(T)$ and for the constraint violation, separately. Denote the right-hand side of the upper bound in (12) by $C'/\sqrt{T}$. Then, we infer

$$\sum_{j=1}^{N} f_j(x_j(T)) - P^\star \le \sum_{j=1}^{N} f_j(x_j(T)) - \sum_{j=1}^{N} D_j(\bar z(T)) \le \frac{C'}{\sqrt{T}}, \tag{13}$$

since $P^\star$ dominates the dual function and the constraint-violation term of $V_T$ in (11) is non-negative. Also, combining the two inequalities in (12), we get

$$P^\star - \sum_{j=1}^{N} D_j(\bar z(T)) \le \frac{C' + C_2}{\sqrt{T}}. \tag{14}$$


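Bounding the constraint violation alone using Theorem 1 proves more challenging. The difficulty stems from the fact that, unless $x(T)$ is feasible in P, the primal-dual gap can assume negative values. However, this gap is bounded below. Since $\sum_j D_j(\bar z(T)) \le P^\star \le \max_{x \in X} \sum_j f_j(x_j)$, we obtain

$$\sum_{j=1}^{N} f_j(x_j(T)) - \sum_{j=1}^{N} D_j(\bar z(T)) \ge \min_{x \in X} \sum_{j=1}^{N} f_j(x_j) - \max_{x \in X} \sum_{j=1}^{N} f_j(x_j) =: -D_f. \tag{15}$$

The constant $D_f \ge 0$ is finite, owing to the compactness of $X$. Then, the upper bound in (12), together with (15) and $\eta T = \eta_0 \sqrt{T}$, implies

$$\frac{\eta_0}{2N} \left\| \pi_Z\Big[ \sum_{j=1}^{N} g_j(x_j(T)) \Big] \right\|^2 \le \frac{D_f}{\sqrt{T}} + \frac{C'}{T}. \tag{16}$$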

Taking square roots, the constraint violation $\|\pi_Z[\sum_j g_j(x_j(T))]\|$ is at most $\sqrt{2N/\eta_0}\,\big(D_f T^{-1/2} + C' T^{-1}\big)^{1/2}$, which suggests a worst-case $O(T^{-1/4})$ decay in constraint violation. This estimate is overly conservative, as our numerical experiments will reveal. We remark that better finite-time guarantees are known for vanilla dual subgradient methods, e.g., in [14], [25]: with a constant step-size of $\eta_0/\sqrt{T}$, they yield an $O(1/\sqrt{T})$ convergence of the ergodic mean of the primal iterates. Also, with non-summable but square-summable decaying step-sizes, it can be established that classic dual subgradient methods converge to a single dual optimizer (not just to the optimal set), even in distributed settings, e.g., see [15]. To the best of our knowledge, such guarantees are still missing for the proposed algorithm.

III. TIE-LINE SCHEDULING VIA MULTI-AREA OPTIMAL POWER FLOW PROBLEM

In this section and the next two, we present three different examples of grid optimization problems that lend themselves to distributed optimization paradigms. For each problem, we review existing literature, formulate the problem as an example of P and present results of running Algorithm 1 on it.

We first present our results on P1, the tie-line scheduling problem. Tie-lines are transmission lines that interconnect the footprints of different system operators; henceforth, we refer to these footprints as areas. Transfer capabilities of tie-lines between neighboring areas can often meet a significant portion of the demand within an area, e.g., the tie-line capacity of 1800 MW between the areas controlled by NYISO and ISO-NE equals 10% and 12% of their respective total demands. Ideally, one would solve a joint OPF problem over assets within all interconnected areas to compute the optimal tie-line schedules. However, aggregating information at a central location to run such an optimization problem remains untenable due to technical and legal barriers. Tie-line scheduling therefore requires a distributed algorithmic paradigm. There is significant prior work on solution architectures for multi-area OPF problems, dating back to [26]. Including [26], Lagrangian relaxation based techniques have been employed in [27]–[30], where coupling constraints between areas are included in the costs and the Lagrange multipliers or the multiplier sensitivities associated with such coupling constraints are exchanged between regions. Subsequently, the authors of [31] developed a hierarchical decomposition method that seeks to solve the necessary optimality conditions for a jointly optimal dispatch. The authors of [32] explored a marginal equivalent decomposition that requires operators to share parts of costs and constraints iteratively. The algorithm in [33] leveraged a generalization of the Benders decomposition method. More recently, algorithms in [34], [35] have utilized properties of multi-parametric programming to design critical region projection algorithms to solve the tie-line scheduling problem. In this work, we utilize Algorithm 1 to solve the tie-line scheduling problem that we present next.

We adopt a linear power flow model, as prescribed by the popular DC approximation, in which the power injections across buses are linear functions of the voltage phase angles. Denote by $\theta_j \in \mathbb{R}^{n_j}$ and $\bar\theta_j \in \mathbb{R}^{\bar n_j}$ the voltage phase angles at the internal and boundary buses in each area $j$, respectively. The interconnection among areas is given by the undirected graph $G(N, E)$. The multi-area OPF problem is then given by

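$$
\begin{aligned}
\mathcal{P}_1:\quad \text{minimize}\quad & \sum_{j=1}^{N} c_j^\top p_j^G, \\
\text{subject to}\quad & \underline{p}_j^G \le p_j^G \le \overline{p}_j^G, && \text{(17a)}\\
& B_{j,j}\theta_j + B_{j,\bar j}\bar\theta_j = p_j^G - p_j^D, && \text{(17b)}\\
& B_{\bar j,j}\theta_j + B_{\bar j,\bar j}\bar\theta_j + \sum_{k \sim j} B_{\bar j,\bar k}\bar\theta_k = 0, && \text{(17c)}\\
& H_j\theta_j + \bar H_j\bar\theta_j \le L_j, && \text{(17d)}\\
& H_{j,k}\bar\theta_j + H_{k,j}\bar\theta_k \le L_{jk}, && \text{(17e)}\\
& j = 1, \ldots, N,\quad k \sim j \text{ in } G.
\end{aligned}
$$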

Power procurement costs modeled in the $c_j$’s typically reflect supply offers of generators in the electricity market that the system operator (SO) within each area orchestrates. Here, $L_j$ and $L_{jk}$ denote the vectors of line capacities within area $j$ and of the tie-lines connecting areas $j$ and $k$, respectively. Thus, (17a)–(17d) encode the generation capacity constraints, power balance and transmission line constraints within each area, while (17e) enforces the transmission capacity limits of the tie-lines connecting the areas. To cast (17) as an instance of P, define

$$x_j = \left( \theta_j^\top,\ \bar\theta_j^\top,\ [p_j^G]^\top \right)^\top, \qquad X_j = \{ x_j \mid \text{(17a), (17b), (17d)} \}, \qquad f_j(x_j) = c_j^\top p_j^G.$$

Then, (17c) becomes an example of (1b), while (17e) defines an example of (1c).

Consider the three-area power system shown in Figure 1, which comprises three IEEE 118-bus systems stitched together with 6 tie-lines. The three systems were modified as delineated in Appendix B1. We applied Algorithm 1 to a reformulation of P1 as an instance of P with a flat start ($z_j(1) = 0$, $Z_j(0) = 0$, $j = 1, \ldots, N$) and step-size $\eta = \eta_0/\sqrt{T}$, where $\eta_0 = 10^2$ and $T = 10^6$. The results are portrayed in the left of Figure 2. We chose $W$ based on the transition probabilities of a Markov chain in the Metropolis-Hastings algorithm (see [36, Sec. 2.5]). Here, $P^\star$ was computed by solving P1 as a linear program.
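One common Metropolis-Hastings-type construction of such a weighting matrix from an undirected graph is sketched below. We assume the rule $W_{jk} = 1/(1 + \max\{\deg_j, \deg_k\})$ for $(j, k) \in E$, with the remaining mass placed on the diagonal; whether this matches the exact variant in [36, Sec. 2.5] used for the experiments is an assumption, and the function name is hypothetical. The resulting $W$ is symmetric with unit row sums, hence doubly stochastic; its off-diagonal sparsity pattern follows (10); its diagonal is positive, which makes it aperiodic; and it is irreducible whenever $G$ is connected.

```python
def metropolis_hastings_weights(n, edges):
    """Symmetric, doubly stochastic weights on an undirected graph.

    edges: list of (j, k) pairs with j != k, zero-based, each listed once.
    Off-diagonal: W[j][k] = 1 / (1 + max(deg[j], deg[k])) for (j, k) in edges.
    Diagonal: W[j][j] = 1 - (sum of off-diagonal entries in row j), which is
    at least 1 / (1 + deg[j]) > 0.
    """
    deg = [0] * n
    for j, k in edges:
        deg[j] += 1
        deg[k] += 1
    W = [[0.0] * n for _ in range(n)]
    for j, k in edges:
        w = 1.0 / (1 + max(deg[j], deg[k]))
        W[j][k] = w
        W[k][j] = w
    for j in range(n):
        W[j][j] = 1.0 - sum(W[j][k] for k in range(n) if k != j)
    return W


if __name__ == "__main__":
    # Hypothetical example: a triangle of areas (0-1, 1-2, 0-2) plus a
    # fourth node attached to node 2.
    W = metropolis_hastings_weights(4, [(0, 1), (1, 2), (0, 2), (2, 3)])
    for row in W:
        print([round(v, 3) for v in row])
```

Each agent can compute its own row from its degree and its neighbors’ degrees alone, which is what makes this kind of rule attractive in a distributed setting.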



Fig. 1. The three-area network for multi-area optimal power flow simulations, obtained by joining three IEEE 118-bus systems.

Fig. 2. Performance of Algorithm 1 (left) and Algorithm 2 (right) on P1 for the network in Figure 1.

Fig. 3. Performance of Algorithm 2 with primal averaging (left) and the impact of step-size on Algorithms 1 and 2 with primal averaging (right).

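Algorithm 2: Distributed dual subgradient to solve P.
1: Choose $z_j(1) = 0$ and $\eta = \eta_0/\sqrt{T}$.
2: for $t = 1, \ldots, T$ do
3: $\quad x_j(t) \leftarrow \operatorname{argmin}_{x_j \in X_j} L_j(x_j, z_j(t))$.
4: $\quad z_j(t+1) \leftarrow \sum_{k=1}^{N} W_{jk}\, \pi_Z[z_k(t) + \eta\, g_k(x_k(t))]$.
5: end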

We compared Algorithm 1 with the classical dual subgradient method in Algorithm 2 (the projection and the consensus operations in step 4 are sometimes reversed, e.g., in [13]). The progress of Algorithm 2 with the same step-size used for Algorithm 1 is shown in the right of Figure 2. Note that Algorithm 1 leads to a much smoother progress of $\sum_{j=1}^{N} f_j(x_j(t))$ than Algorithm 2. Classical dual subgradient with primal averaging via $\hat x_j(t) := \frac{1}{t} \sum_{r=1}^{t} x_j(r)$ for each $j = 1, \ldots, N$ can prevent this “flutter” (see [25, Section 4]), as the left plot in Figure 3 reveals. While step 4 of Algorithm 1 executes a similar averaging operation, this averaging step cannot be viewed as an output of the iteration dynamics, as is the case for Algorithm 2 with averaging: the averaged primal iterates of Algorithm 1 feed back into its dual updates. As a result, the last iterate of Algorithm 1 moves smoothly, as opposed to that of Algorithm 2. Such an update is useful in applications that require iterates to be directly implemented as control actions and where the dual subgradient is only available at the current iterate (see [17] for a detailed discussion).
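A minimal Python sketch of Algorithm 2 together with the primal averaging $\hat x_j(t)$ is given below, again on our hypothetical toy problem (costs $\tfrac12 a_j x_j^2$ on $[\ell_j, u_j]$, single coupling inequality $\sum_j (d_j - x_j) \le 0$, so $Z = \mathbb{R}_+$); the function names are ours. It returns both the last iterate and the running average, and computes the relative optimality and constraint violation defined in the next paragraph. With strictly convex costs, as here, the minimizer in step 3 varies continuously with $z_j(t)$, so the last iterate is already fairly smooth; the flutter discussed above is typical of linear costs such as those in P1, where the minimizer in step 3 can jump between vertices of $X_j$.

```python
import math


def local_argmin(z, a, lo, hi):
    """Step 3: argmin of 0.5*a*x**2 + z*(d - x) over [lo, hi] (clip z / a)."""
    return min(max(z / a, lo), hi)


def algorithm2_with_averaging(a, d, W, eta0, T, lo=0.0, hi=5.0):
    """Sketch of Algorithm 2 plus primal averaging x_hat_j(t) = (1/t) sum_r x_j(r)."""
    N = len(a)
    eta = eta0 / math.sqrt(T)
    z = [0.0] * N                                   # z_j(1) = 0
    x = [0.0] * N
    x_hat = [0.0] * N
    for t in range(1, T + 1):
        # Step 3: local Lagrangian minimization at the current multiplier.
        x = [local_argmin(z[j], a[j], lo, hi) for j in range(N)]
        # Step 4: projected ascent on each z_k, then consensus with weights W.
        y = [max(z[k] + eta * (d[k] - x[k]), 0.0) for k in range(N)]
        z = [sum(W[j][k] * y[k] for k in range(N)) for j in range(N)]
        # Primal averaging, written recursively.
        x_hat = [x_hat[j] + (x[j] - x_hat[j]) / t for j in range(N)]
    return x, x_hat, z


def relative_optimality(x, a, p_star):
    return (sum(0.5 * a[j] * x[j] ** 2 for j in range(len(a))) - p_star) / p_star


def constraint_violation(x, d):
    return max(sum(d[j] - x[j] for j in range(len(d))), 0.0)  # norm of pi_Z[.], Z = R_+


if __name__ == "__main__":
    a, d = [1.0, 2.0, 4.0], [1.0, 1.0, 1.0]
    W = [[2 / 3, 1 / 3, 0.0], [1 / 3, 1 / 3, 1 / 3], [0.0, 1 / 3, 2 / 3]]
    x_last, x_hat, z = algorithm2_with_averaging(a, d, W, eta0=2.0, T=40000)
    p_star = 18 / 7                                 # optimal value of the toy problem
    print(relative_optimality(x_hat, a, p_star), constraint_violation(x_hat, d))
```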

In the right of Figure 3, we compare the impact of step-size on the performance of Algorithms 1 and 2 with primal averaging after $T = 10^6$ iterations. Here, relative optimality measures $\big( \sum_{j=1}^{N} f_j(x_j(T)) - P^\star \big) / P^\star$ and constraint violation measures $\big\| \pi_Z\big[ \sum_{j=1}^{N} g_j(x_j(T)) \big] \big\|$ for Algorithm 1, and the same quantities at the $\hat x_j(T)$’s for Algorithm 2. Empirically, the constraint violation for Algorithm 1 appears similar to that for Algorithm 2 with primal averaging, which is known to decay as $O(T^{-1/2})$; this is much better than the rate suggested by (16) for Algorithm 1.

IV. DER COORDINATION IN DISTRIBUTION NETWORKS

Our next application problem is the coordination of DERs such as thermostatically controlled loads, electric vehicles, distributed rooftop solar, etc. Such DERs are increasingly being adopted in low and medium voltage distribution grids. A careful coordination of such resources can provide valuable grid services to both the distribution and the transmission networks. There is a long literature on DER coordination to fulfill a variety of objectives, ranging from tracking a regulation signal at the T&D interface to volt/VAR control within the distribution grid; see [37]–[39] for examples. A variety of techniques have been used to tackle the nonconvexity of power flow equations in these papers, e.g., the authors of [38] optimize over an inner approximation of the feasible sets, while the authors of [39] adopt a linearized distribution power flow model from [40], [41]. Data-driven variants of such algorithms have also been studied, e.g., see [42], [43]; there, data from an actively managed distribution grid supplements an incomplete or inaccurate network model.

System conditions in the distribution grid can change quite fast. Various ways of tackling fast uncertain dynamics have been proposed. One line of work on DER coordination solves optimization problems in quick successions to deal with such changes, e.g., in [44]. In another line of work, the authors explicitly model the uncertainties and optimize against them, e.g., via chance-constraints in [45] and through an optimal control formulation with robust constraint enforcement in [46]. In what follows, we adopt an optimization framework for DER coordination, aligned more with [44]. We acknowledge that a stochastic control formulation is perhaps more suitable.

Consider a balanced three-phase radial distribution network on $N$ buses described by the graph $G(N, E)$. Let the first bus be the T&D interface. Associate directions to the edges in $E$ arbitrarily to obtain a directed graph $\vec{G}(N, \vec{E})$, where $j \to k \in \vec{E}$ denotes a directed edge from bus $j$ to bus $k$ in $\vec{G}$. At each bus $j$, consider a dispatchable asset capable of injecting real and reactive powers $p_j^G$ and $q_j^G$, respectively. Let $c_j(p_j^G, q_j^G)$ denote the cost of procuring power from dispatchable generation. The power injection capabilities of this asset at bus $j$ are limited as $\underline{p}_j^G \le p_j^G \le \overline{p}_j^G$, along with either $\underline{q}_j^G \le q_j^G \le \overline{q}_j^G$ or $(p_j^G)^2 + (q_j^G)^2 \le (s_j^G)^2$, henceforth denoted as $(p_j^G, q_j^G) \in S_j$. Such models encompass photovoltaic and energy storage systems, water pumps, commercial HVAC systems, etc. At each bus $j$, also assume nominal real and reactive power demands $p_j^D$ and $q_j^D$.

We need additional notation to describe the DER coordination problem. Associate with bus $j$ the squared voltage magnitude $w_j$. Let $P_{j,k}$ and $Q_{j,k}$ denote the real and reactive power flows from bus $j$ to bus $k$ for $j \to k$ in $\vec{G}$. Denote by $\ell_{j,k}$ the squared magnitude of the current flowing from bus $j$ to bus $k$. Let $r_{j,k}$ and $x_{j,k}$ denote the resistance and reactance of the line $j \to k$. The DER coordination problem with a second-order conic convex relaxation of the power flow equations in the radial distribution network can be formulated as

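$$
\begin{aligned}
\mathcal{P}_2:\quad \text{minimize}\quad & \sum_{j=1}^{N} c_j(p_j^G, q_j^G), && \text{(18a)}\\
\text{subject to}\quad & (p_j^G, q_j^G) \in S_j, && \text{(18b)}\\
& p_j^G - p_j^D = \sum_{k:\, j \to k} P_{j,k} - \sum_{k:\, k \to j} \left( P_{k,j} - r_{k,j}\ell_{k,j} \right), && \text{(18c)}\\
& q_j^G - q_j^D = \sum_{k:\, j \to k} Q_{j,k} - \sum_{k:\, k \to j} \left( Q_{k,j} - x_{k,j}\ell_{k,j} \right), && \text{(18d)}\\
& w_k = w_j - 2\left( r_{j,k}P_{j,k} + x_{j,k}Q_{j,k} \right) + \left( r_{j,k}^2 + x_{j,k}^2 \right)\ell_{j,k}, && \text{(18e)}\\
& \ell_{j,k} \le L_{j,k},\quad \underline{w}_j \le w_j \le \overline{w}_j, && \text{(18f)}\\
& \ell_{j,k}\, w_j \ge P_{j,k}^2 + Q_{j,k}^2, && \text{(18g)}\\
& j = 1, \ldots, N,\quad j \to k \in \vec{G}.
\end{aligned}
$$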

The last inequality can be written as a second-order cone constraint, making (18) a second-order cone program (SOCP). With this inequality replaced by an equality, (18) becomes the DER coordination problem with the nonconvex AC power flow equations; the inequality thus results in a relaxation that enlarges the feasible set. See [3], [4] for sufficient conditions under which the inequality is met with equality at optimality of (18). Even when such conditions are not satisfied, the relaxation is often exact in practice, e.g., see [47].
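For completeness, the standard rewriting of (18g) as a (rotated) second-order cone constraint follows from the identity $(\ell + w)^2 - (\ell - w)^2 = 4\ell w$. For each line $j \to k$,

$$
\ell_{j,k}\, w_j \ge P_{j,k}^2 + Q_{j,k}^2,\quad \ell_{j,k} \ge 0,\ w_j \ge 0
\iff
\left\| \begin{pmatrix} 2P_{j,k} \\ 2Q_{j,k} \\ \ell_{j,k} - w_j \end{pmatrix} \right\|_2 \le \ell_{j,k} + w_j .
$$

Squaring the right-hand inequality, whose right-hand side must then be non-negative, and cancelling terms recovers $\ell_{j,k} w_j \ge P_{j,k}^2 + Q_{j,k}^2$; conversely, $\ell_{j,k} w_j \ge 0$ together with $\ell_{j,k} + w_j \ge 0$ forces both squared magnitudes to be non-negative.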

To cast P2 as an instance of P, we first denote the out-neighbors of $j$ in $\vec{G}$ by $k_1, \ldots, k_J$ and identify

$$x_j := \left( p_j^G, q_j^G, w_j, P_{j,k_1}, \ldots, P_{j,k_J}, Q_{j,k_1}, \ldots, Q_{j,k_J}, \ell_{j,k_1}, \ldots, \ell_{j,k_J} \right)^\top, \qquad X_j := \{ x_j \mid \text{(18b), (18f), (18g)} \}, \qquad f_j(x_j) = c_j(p_j^G, q_j^G).$$

Then, it is straightforward to write (18c), (18d) and (18e) as examples of (1b). This formulation does not require inequality constraints of the form (1c).

We ran Algorithm 1 on P2 over a modified IEEE 4-bus radial distribution network (see Appendix B2 for details).


Fig. 4. (a) A 4-bus radial network. (b) Progress of the objective function at the last iterate of Algorithm 1 on P2 for the network in Figure 4a.

To illustrate DER coordination under time-varying distribution grid conditions, we simulated a case where the real and reactive power demands were changed every $10^6$ iterations, as prescribed in Appendix B2, with step-size $\eta = 0.1$. Algorithm 1 is restarted after every change, using the last primal-dual iterate at the point of change as the initial point. As Figure 4b illustrates, Algorithm 1 can track the optimal cost in the changing problem environment.


Fig. 5. The IEEE 15-bus test feeder subdivided into 2, 4, 8 and 12 groups.

Fig. 6. Evolution of the objective function value of Algorithm 1 on the IEEE 15-bus test system with varying degrees of decentralization (based on groupings of buses per Figure 5).


V. T&D COORDINATION

Transmission SOs typically do not have visibility into distribution grids. Thus, they cannot directly harness the flexibility offered by DERs connected to the distribution networks. Even if SOs gain such visibility, current bottlenecks in wholesale market clearing software make it unlikely that all T&D assets can be dispatched jointly. Distributed algorithms are therefore natural candidates for T&D coordination. Who might represent the distribution grid and its capabilities in the wholesale market process? Distribution utility companies have been largely responsible for procuring power from the wholesale markets and supplying it to end-use customers connected to the distribution grid. The evolution of the utility business is being actively debated, e.g., see [48]. Some advocate the creation of a retail market, very much along the lines of a wholesale market, facilitated by either a utility or an independent distribution system operator. Others advocate third-party retail aggregators to represent DERs in the wholesale market. Algorithmic needs for dispatching DERs together with transmission assets will largely depend upon how the regulatory structure evolves. We set aside regulatory debates and focus on an algorithmic solution that allows a transmission SO and a collection of DER aggregators at the T&D interface to compute an optimal dispatch for all T&D assets without having to aggregate all information at a central location. That is, we assume that each aggregator directly controls the dispatchable DERs and knows the network parameters of the distribution grid it manages. Our setup is similar to those in [49]–[52], which discuss a variety of decomposition techniques for T&D coordination that differ in the representation of the distribution grids at the transmission level.
In what follows, we assume a semidefinite relaxation of power flow equations for the transmission network and a linear distribution flow model for the distribution grids in the T&D coordination problem P3, and cast it as an example of P.

To formulate the joint dispatch problem of all T&D assets, we require three different graphs. The first among these is the transmission network, modeled as an undirected graph $G^{\text{tran}}$ on $n^{\text{tran}}$ transmission buses. The second set of graphs are the distribution grids that connect to the transmission network at their points of common coupling, i.e., the $n^{\text{tran}}$ transmission buses. We model the distribution grid connected to transmission bus $\ell$ as an undirected graph $G_\ell^{\text{dist}}$ on $n_\ell^{\text{dist}} + 1$ distribution buses, where the first bus of $G_\ell^{\text{dist}}$ coincides with bus $\ell$ in $G^{\text{tran}}$. Finally, we consider an undirected star graph $G$ on $N = n^{\text{tran}} + 1$ nodes with the aggregators $A_1, \ldots, A_{n^{\text{tran}}}$ as the satellite nodes and the SO (the $N$-th node) at the center. Let $V \in \mathbb{C}^{n^{\text{tran}}}$ denote the vector of nodal voltage phasors at the transmission buses, where $\mathbb{C}$ is the set of complex numbers. We formulate the engineering constraints of the grid using the positive semidefinite matrix $W := VV^H \in \mathbb{C}^{n^{\text{tran}} \times n^{\text{tran}}}$. To describe these constraints, let $y_{\ell,k} = y_{k,\ell}$ denote the admittance of the transmission line joining buses $\ell$ and $k$ in $G^{\text{tran}}$, and let $y_{\ell,\ell}$ denote the shunt admittance at bus $\ell$. Then, define $\Phi_{\ell,k}$ and $\Psi_{\ell,k}$ as the $n^{\text{tran}} \times n^{\text{tran}}$ Hermitian matrices whose only nonzero entries are

$$[\Phi_{\ell,k}]_{\ell,\ell} := \tfrac{1}{2}\left( y_{\ell,k} + y_{\ell,k}^H \right), \qquad [\Phi_{\ell,k}]_{\ell,k} = [\Phi_{\ell,k}]_{k,\ell}^H := -\tfrac{1}{2}\, y_{\ell,k},$$

$$[\Psi_{\ell,k}]_{\ell,\ell} := \tfrac{1}{2i}\left( y_{\ell,k}^H - y_{\ell,k} \right), \qquad [\Psi_{\ell,k}]_{\ell,k} = [\Psi_{\ell,k}]_{k,\ell}^H := \tfrac{1}{2i}\, y_{\ell,k}.$$

In addition, we define the ntran× ntran _{Hermitian matrices}

Φ`:= 1 2 y`,`+ y H `,` 1`1H` + X k∼` Φ`,k, Ψ`:= 1 2i y H `,`− y`,` 1`1H` + X k∼` Ψ`,k,

where $\mathbf{1}$ is a vector of all ones of appropriate size and $\mathbf{1}_\ell$ is a vector of all zeros except at the $\ell$-th position, where it is unity. This notation allows us to describe the apparent power flow from bus $\ell$ to bus $k$ as ${\rm Tr}(\Phi_{\ell,k}W) + i\,{\rm Tr}(\Psi_{\ell,k}W)$, the apparent power injection at bus $\ell$ as ${\rm Tr}(\Phi_\ell W) + i\,{\rm Tr}(\Psi_\ell W)$, and the squared voltage magnitude at bus $\ell$ as ${\rm Tr}(\mathbf{1}_\ell\mathbf{1}_\ell^H W)$. At each transmission bus $\ell$, let a generator supply apparent power $P^G_\ell + iQ^G_\ell$ with procurement cost described by $C_\ell$.
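As a quick sanity check of this trace notation, the sketch below builds $\Phi_{\ell,k}$ and $\Psi_{\ell,k}$ exactly as defined above for one line of a hypothetical 3-bus network and compares ${\rm Tr}(\Phi_{\ell,k}W) + i\,{\rm Tr}(\Psi_{\ell,k}W)$ with the branch flow $V_\ell \overline{I_{\ell,k}}$ computed directly. It assumes the series current $I_{\ell,k} = y_{\ell,k}(V_\ell - V_k)$ (no line charging); the voltages, the admittance value, and the helper name `branch_matrices` are illustrative assumptions, and bus indices are 0-based.

```python
import numpy as np

def branch_matrices(n, l, k, y_lk):
    """Hermitian Phi_{l,k}, Psi_{l,k} for the line (l, k); 0-based bus indices."""
    Phi = np.zeros((n, n), dtype=complex)
    Psi = np.zeros((n, n), dtype=complex)
    Phi[l, l] = 0.5 * (y_lk + np.conj(y_lk))
    Phi[l, k] = -0.5 * y_lk
    Phi[k, l] = np.conj(Phi[l, k])              # Hermitian counterpart
    Psi[l, l] = (np.conj(y_lk) - y_lk) / 2j
    Psi[l, k] = y_lk / 2j
    Psi[k, l] = np.conj(Psi[l, k])
    return Phi, Psi

rng = np.random.default_rng(0)
n = 3
# Hypothetical voltage phasors near 1 p.u. and the lifted matrix W = V V^H
V = rng.normal(1.0, 0.05, n) * np.exp(1j * rng.normal(0.0, 0.1, n))
W = np.outer(V, V.conj())

y01 = 2.0 - 8.0j                                # hypothetical series admittance of line (0, 1)
Phi, Psi = branch_matrices(n, 0, 1, y01)
S_lifted = np.trace(Phi @ W) + 1j * np.trace(Psi @ W)
S_direct = V[0] * np.conj(y01 * (V[0] - V[1]))  # V_l * conj(I_lk)

e0 = np.eye(n)[:, 0]                            # indicator vector 1_l for l = 0
vmag2 = np.trace(np.outer(e0, e0) @ W).real     # squared voltage magnitude at bus 0

print(np.isclose(S_lifted, S_direct), np.isclose(vmag2, abs(V[0]) ** 2))  # True True
```

Because $W$ absorbs all products $V_\ell \overline{V_k}$, every flow, injection, and voltage quantity becomes linear in $W$, which is what makes the semidefinite relaxation $W \succeq 0$ convex.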

Let each transmission bus $\ell$ be the first bus of an $(n^{\rm dist}+1)$-bus distribution network $\mathcal{G}^{\rm dist}_\ell$. Let $p_\ell + iq_\ell \in \mathbb{C}^{n^{\rm dist}}$ denote the vector of net power injections across the distribution network, save the first bus. Further, let $c_\ell$ describe the cost of procuring the power injection $p_\ell + iq_\ell \in \mathbb{C}^{n^{\rm dist}}$. Also, let $w_\ell \in \mathbb{R}^{n^{\rm dist}}$ denote the vector of squared voltage magnitudes across the same set of buses. We adopt the popular LinDistFlow model to sidestep the nonconvexity of the power flow equations in the distribution grid. Let $\widetilde{M} \in \mathbb{R}^{(n^{\rm dist}+1)\times n^{\rm dist}}$ be the node-to-edge incidence matrix of $\mathcal{G}^{\rm dist}_\ell$, and remove the first row of $\widetilde{M}$ to obtain the reduced incidence matrix $M$. Then, under the LinDistFlow model, the squared voltage magnitudes are related to the power injections as

$$w_\ell = \rho_\ell p_\ell + \chi_\ell q_\ell + W_{\ell,\ell}\mathbf{1},$$

where $\rho_\ell$ and $\chi_\ell$ are $n^{\rm dist}\times n^{\rm dist}$ matrices defined as $\rho_\ell := 2M^{-\intercal}{\rm diag}(r_\ell)M^{-1}$ and $\chi_\ell := 2M^{-\intercal}{\rm diag}(x_\ell)M^{-1}$, and $r_\ell$ / $x_\ell$ collect the resistances/reactances of the $n^{\rm dist}$ distribution lines.
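The matrices $\rho_\ell$ and $\chi_\ell$ can be formed directly from the reduced incidence matrix. The sketch below, for a hypothetical 4-bus feeder with bus 0 as the substation, builds the node-to-edge incidence matrix (called `M_full` in the code), drops its first row, forms $\rho_\ell$ and $\chi_\ell$, and cross-checks $w_\ell$ against the familiar branch-by-branch LinDistFlow recursion $w_{\rm child} = w_{\rm parent} + 2(r P^{\rm down} + x Q^{\rm down})$, where $P^{\rm down}, Q^{\rm down}$ are the net injections downstream of a line. The line data, injections, the convention that line $e$ feeds bus $e+1$, and the function name are our assumptions.

```python
import numpy as np

def lindistflow_sensitivities(parent, r, x):
    """rho and chi for a radial feeder. Bus 0 is the substation; line e joins
    parent[e] to bus e + 1, and parents are numbered before their children."""
    n = len(parent)
    M_full = np.zeros((n + 1, n))               # node-to-edge incidence matrix
    for e, p in enumerate(parent):
        M_full[p, e] = 1.0
        M_full[e + 1, e] = -1.0
    M = M_full[1:, :]                           # reduced incidence matrix (drop row 0)
    M_inv = np.linalg.inv(M)
    rho = 2.0 * M_inv.T @ np.diag(r) @ M_inv
    chi = 2.0 * M_inv.T @ np.diag(x) @ M_inv
    return rho, chi

parent = [0, 1, 1]                              # bus 1 off the substation; buses 2, 3 off bus 1
r = np.array([0.01, 0.02, 0.03])
x = np.array([0.02, 0.01, 0.04])
p = np.array([-0.5, 0.2, -0.3])                 # net injections (negative = load), p.u.
q = np.array([-0.1, 0.05, -0.2])
w0 = 1.0                                        # squared voltage magnitude at the substation

rho, chi = lindistflow_sensitivities(parent, r, x)
w = rho @ p + chi @ q + w0 * np.ones(3)

# Cross-check: accumulate downstream injections, then walk the tree from the root
P_down, Q_down = p.copy(), q.copy()
for e in reversed(range(len(parent))):
    if parent[e] > 0:                           # bus parent[e] is fed by line parent[e] - 1
        P_down[parent[e] - 1] += P_down[e]
        Q_down[parent[e] - 1] += Q_down[e]
w_rec = np.zeros(len(parent) + 1)
w_rec[0] = w0
for e, par in enumerate(parent):
    w_rec[e + 1] = w_rec[par] + 2.0 * (r[e] * P_down[e] + x[e] * Q_down[e])

print(np.allclose(w, w_rec[1:]))                # True
```

The matrix form is what makes (19h) a linear constraint in $(p_\ell, q_\ell, W_{\ell,\ell})$; the orientation chosen for each line does not matter, since flipping a column sign of $M$ cancels in $M^{-\intercal}{\rm diag}(r_\ell)M^{-1}$.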

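The optimal joint dispatch over all T&D assets is given by

$$\begin{aligned}
\text{P3}: \ \text{minimize} \quad & \sum_{\ell=1}^{n^{\rm tran}} C_\ell(P^G_\ell, Q^G_\ell) + \sum_{\ell=1}^{n^{\rm tran}} c_\ell(p^G_\ell, q^G_\ell), \\
\text{subject to} \quad & (P^G_\ell, Q^G_\ell) \in \mathbb{S}^{\rm tran}_\ell, && \text{(19a)} \\
& (p^G_\ell, q^G_\ell) \in \mathbb{S}^{\rm dist}_\ell, && \text{(19b)} \\
& P^G_\ell + \mathbf{1}^\intercal p^G_\ell - p^D_\ell = {\rm Tr}(\Phi_\ell W), && \text{(19c)} \\
& Q^G_\ell + \mathbf{1}^\intercal q^G_\ell - q^D_\ell = {\rm Tr}(\Psi_\ell W), && \text{(19d)} \\
& {\rm Tr}(\Phi_{\ell,\ell'} W) \le f_{\ell,\ell'}, && \text{(19e)} \\
& \underline{w}_\ell \le W_{\ell,\ell} \le \overline{w}_\ell, && \text{(19f)} \\
& W \succeq 0, && \text{(19g)} \\
& \underline{w}_\ell \le \rho_\ell p_\ell + \chi_\ell q_\ell + W_{\ell,\ell}\mathbf{1} \le \overline{w}_\ell, && \text{(19h)}
\end{aligned}$$

for $\ell = 1, \ldots, n^{\rm tran}$, $\ell' \sim \ell$.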


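Recall that $\mathcal{G}$ for the T&D coordination problem is a graph on $N = n^{\rm tran} + 1$ nodes, where the first $n^{\rm tran}$ nodes are transmission buses and the last node represents the SO. Define

$$x_\ell := \left(p_\ell^\intercal, q_\ell^\intercal, W_{\ell,\ell}\right)^\intercal, \quad \mathbb{X}_\ell := \{x_\ell \mid \text{(19b)}\}, \quad f_\ell = c_\ell(p^G_\ell, q^G_\ell)$$

for $\ell = 1, \ldots, n^{\rm tran}$. Collect the real and reactive power generations across the transmission grid in the vectors $P^G$, $Q^G$, respectively. Then, define

$$x_N := \left([P^G]^\intercal, [Q^G]^\intercal, {\rm vec}(\Re\{W\})^\intercal, {\rm vec}(\Im\{W\})^\intercal\right)^\intercal, \quad \mathbb{X}_N := \{x_N \mid \text{(19a), (19e), (19f), (19g)}\},$$

$$f_N(x_N) = \sum_{\ell=1}^{n^{\rm tran}} C_\ell(P^G_\ell, Q^G_\ell).$$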

The constraint (19g) can be written in terms of $x_N$ as

$$\begin{pmatrix} \Re\{W\} & \Im\{W\} \\ -\Im\{W\} & \Re\{W\} \end{pmatrix} \succeq 0,$$

and (19c)–(19d) as examples of (1b) using ${\rm Tr}(\phi W) = {\rm vec}(\Re\{\phi\})^\intercal{\rm vec}(\Re\{W\}) + {\rm vec}(\Im\{\phi\})^\intercal{\rm vec}(\Im\{W\})$ for a Hermitian matrix $\phi$. Constraints (19h) are examples of inequality constraints in (1c).
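Both real-valued rewritings can be checked numerically. The sketch below uses a random Hermitian positive semidefinite $W$ and a random Hermitian $\phi$ (synthetic test data, not from the paper) to confirm that the real block matrix for (19g) is symmetric with nonnegative eigenvalues (each eigenvalue of $W$ appears twice) and that ${\rm Tr}(\phi W)$ equals the inner product of the real and imaginary parts. The helper name `real_embedding` is illustrative, and any consistent vectorization order works for the trace identity.

```python
import numpy as np

def real_embedding(W):
    """Real 2n x 2n matrix [[Re W, Im W], [-Im W, Re W]] used to express W >= 0."""
    return np.block([[W.real, W.imag], [-W.imag, W.real]])

rng = np.random.default_rng(1)
n = 4
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
W = A @ A.conj().T                              # Hermitian positive semidefinite
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
phi = B + B.conj().T                            # Hermitian, playing the role of Phi_l or Psi_l

emb = real_embedding(W)
is_symmetric = np.allclose(emb, emb.T)
min_eig = np.linalg.eigvalsh(emb).min()         # eigenvalues of W, each repeated twice

# Tr(phi W) as a linear function of vec(Re W) and vec(Im W)
lhs = np.trace(phi @ W).real
rhs = phi.real.ravel() @ W.real.ravel() + phi.imag.ravel() @ W.imag.ravel()
print(is_symmetric, min_eig >= -1e-9, np.isclose(lhs, rhs))  # True True True
```

This is why the SO's block $x_N$ can carry ${\rm vec}(\Re\{W\})$ and ${\rm vec}(\Im\{W\})$ as real decision variables without losing the semidefinite structure.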


Fig. 7. The 204-bus network for T&D simulations, obtained by joining the IEEE 6-bus transmission network with six IEEE 33-bus distribution networks.

Fig. 8. Progress of Algorithm 1 on P3 using LinDistFlow and SOCP relaxation of power flow equations for the distribution grid.

We report numerical results on a 204-bus T&D system that comprises the IEEE 6-bus transmission network joined with six IEEE 33-bus distribution systems (see Figure 7 and Appendix B3 for details). We applied Algorithm 1 to a reformulation of P3 as an instance of P with a flat start ($z_j(1) = 0$, $Z_j(0) = 0$, $j = 1, \ldots, N$) and step size $\eta = \eta_0/\sqrt{T}$, where $\eta_0 = 10^2$ and $T = 10^6$. The agents of P3 (the six aggregators and the SO) communicate over a 7-node star graph $\mathcal{G}$ with the SO at the center. Convergence results are shown in Figure 8. To illustrate the flexibility of our modeling framework, we also simulated P3 with the LinDistFlow model replaced by SOCP relaxations of the power flow equations for the distribution grid, as in Section IV. The convergence of Algorithm 1 with this power flow model, shown in Figure 8, is similar to that with the LinDistFlow model. The optimal cost, however, is higher by 4.04%, since the SOCP relaxation accounts for distribution losses that the LinDistFlow model ignores.
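Algorithm 1 itself is stated earlier in the paper and is not reproduced in this section. As a hedged, illustrative sketch only, the code below implements one update order that we infer from relations used in the appendix: the dual averaging (25), the primal running average $x_j(T) = \frac{1}{T}\sum_t X_j(t)$, the centroid identity (32), and the consensus recursion (40). It may differ from Algorithm 1 in details. The toy problem (three agents with quadratic costs, box constraints, and a single supply-demand coupling constraint), the mixing matrix `W_mix` standing in for the doubly stochastic $W$ in (40), the choice $\eta_0 = 5$, and the function name are all our assumptions, not the paper's data.

```python
import numpy as np

def dual_subgradient_averaging(a, b, d, lo, hi, W_mix, eta, T):
    """Hypothetical sketch of a distributed dual subgradient method with averaging.

    Agent j owns f_j(x) = a_j x^2 / 2 + b_j x on [lo, hi] and the coupling
    constraint sum_j g_j(x_j) <= 0 with g_j(x) = d_j - x (supply covers demand).
    """
    N = len(a)
    Z = np.zeros(N)          # Z_j(t): mixed running constraint signal
    z = np.zeros(N)          # z_j(t): averaged projected multipliers
    x = np.zeros(N)          # x_j(t): averaged primal minimizers
    tg_prev = np.zeros(N)    # (t - 1) g_j(x_j(t - 1)); zero at t = 1
    for t in range(1, T + 1):
        # t z_j(t) = (t - 1) z_j(t - 1) + pi_Z[eta Z_j(t - 1)], cf. (25), with Z = R_+
        z = ((t - 1) * z + np.maximum(eta * Z, 0.0)) / t
        # X_j(t) minimizes the local Lagrangian f_j(x) + z_j(t) g_j(x) over [lo, hi]
        X = np.clip((z - b) / a, lo, hi)
        # x_j(t) = ((t - 1) x_j(t - 1) + X_j(t)) / t
        x = ((t - 1) * x + X) / t
        # consensus on Z plus the local increment t g_j(x_j(t)) - (t - 1) g_j(x_j(t - 1))
        tg = t * (d - x)
        Z = W_mix @ Z + tg - tg_prev
        tg_prev = tg
    return x, z

a = np.array([1.0, 2.0, 4.0])
b = np.ones(3)
d = np.full(3, 2.0)
W_mix = np.array([[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]])
T = 10_000
x, z = dual_subgradient_averaging(a, b, d, 0.0, 10.0, W_mix, eta=5.0 / np.sqrt(T), T=T)
cost = np.sum(0.5 * a * x**2 + b * x)
print(x, x.sum(), cost, z)
```

For these toy data the centralized optimum can be computed by hand: all generators are interior, so $x^\star_j = (z^\star - 1)/a_j$ with $z^\star = 31/7$, giving $x^\star = (24/7, 12/7, 6/7)$ and cost $114/7 \approx 16.29$. The averaged iterates should land near these values, with a small residual constraint violation of order $1/\sqrt{T}$; because the mean of the $Z_j$'s evolves as in (32), $\sum_j x_j(T)$ falls short of the total demand by exactly $N\bar{Z}(T)/T$.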

REFERENCES

[1] M. B. Cain, R. P. O’Neill, A. Castillo et al., “History of optimal power flow and formulations,” Federal Energy Regulatory Commission, vol. 1, pp. 1–36, 2012.

[2] B. Stott, J. Jardim, and O. Alsaç, “DC power flow revisited,” IEEE Trans. on Power Systems, vol. 24, no. 3, pp. 1290–1300, 2009.

[3] M. Farivar and S. H. Low, “Branch flow model: Relaxations and convexification (parts I, II),” IEEE Trans. on Power Systems, vol. 28, no. 3, pp. 2554–2572, 2013.

[4] L. Gan, N. Li, U. Topcu, and S. H. Low, “Exact convex relaxation for optimal power flow in distribution networks,” IEEE Trans. on Automatic Control, vol. 60, no. 1, pp. 351–352, 2015.

[5] J. Lavaei and S. H. Low, “Zero duality gap in optimal power flow problem,” IEEE Trans. on Power Systems, vol. 27, no. 1, pp. 92–107, 2011.

[6] B. Zhang and D. Tse, “Geometry of injection regions of power networks,” IEEE Trans. on Power Systems, vol. 28, no. 2, pp. 788–797, 2012.

[7] S. Bose, D. F. Gayme, K. M. Chandy, and S. H. Low, “Quadratically constrained quadratic programs on acyclic graphs with application to power flow,” IEEE Trans. on Control of Network Systems, vol. 2, no. 3, pp. 278–287, 2015.

[8] M. E. Baran and F. F. Wu, “Optimal capacitor placement on radial distribution systems,” IEEE Trans. on Power Delivery, vol. 4, no. 1, pp. 725–734, 1989.

[9] S. Liang, L. Y. Wang, and G. Yin, “Distributed dual subgradient algorithms with iterate-averaging feedback for convex optimization with coupled constraints,” IEEE Trans. on Cybernetics, 2019.

[10] S. Boyd, L. Xiao, A. Mutapcic, and J. Mattingley, “Notes on decomposition methods,” Notes for EE364B, Stanford University, vol. 635, pp. 1–36, 2007.

[11] B. Polyak, “Introduction to optimization,” Translations Series in Mathematics and Engineering. New York: Optimization Software Inc. Publications Division, 1987.

[12] D. Bertsekas and J. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, 1989.

[13] A. Simonetto and H. Jamali-Rad, “Primal recovery from consensus-based dual decomposition for distributed convex optimization,” Journal of Optimization Theory and Applications, vol. 168, no. 1, pp. 172–197, 2016.

[14] A. Nedić and A. Ozdaglar, “Distributed subgradient methods for multi-agent optimization,” IEEE Trans. on Automatic Control, vol. 54, no. 1, pp. 48–61, 2009.

[15] E. Gustavsson, M. Patriksson, and A.-B. Strömberg, “Primal convergence from dual subgradient methods for convex optimization,” Mathematical Programming, vol. 150, no. 2, pp. 365–390, 2015.

[16] J. Ma et al., “Recovery of primal solution in dual subgradient schemes,” Ph.D. dissertation, Massachusetts Institute of Technology, 2007.

[17] Y. Nesterov and V. Shikhman, “Dual subgradient method with averaging for optimal resource allocation,” European Journal of Operational Research, vol. 270, no. 3, pp. 907–916, 2018.

[18] D. K. Molzahn, F. Dörfler, H. Sandberg, S. H. Low, S. Chakrabarti, R. Baldick, and J. Lavaei, “A survey of distributed optimization and control algorithms for electric power systems,” IEEE Trans. on Smart Grid, vol. 8, no. 6, pp. 2941–2962, 2017.

[19] J. F. Bonnans and A. Shapiro, Perturbation analysis of optimization problems. Springer Science & Business Media, 2013.

[20] H. Kao and V. Subramanian, “Convergence rate analysis for distributed optimization with localization,” in 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, 2019, pp. 384–390.

[21] J. C. Duchi, A. Agarwal, and M. J. Wainwright, “Dual averaging for distributed optimization: Convergence analysis and network scaling,” IEEE Trans. on Automatic Control, vol. 57, no. 3, pp. 592–606, 2011.

[22] T. T. Doan, S. Bose, D. H. Nguyen, and C. L. Beck, “Convergence of the iterates in mirror descent methods,” IEEE Control Systems Letters, vol. 3, no. 1, pp. 114–119, 2018.

[23] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course. Norwell, MA: Kluwer Academic Publishers, 2004.

[24] G. Qu and N. Li, “Accelerated distributed Nesterov gradient descent,”


[25] A. Nedić and A. Ozdaglar, “Approximate primal solutions and rate analysis for dual subgradient methods,” SIAM Journal on Optimization, vol. 19, no. 4, pp. 1757–1780, 2009.

[26] B. H. Kim and R. Baldick, “Coarse-grained distributed optimal power flow,” IEEE Trans. on Power Systems, vol. 12, no. 2, pp. 932–939, May 1997.

[27] X. Lai, L. Xie, Q. Xia, H. Zhong, and C. Kang, “Decentralized multi-area economic dispatch via dynamic multiplier-based Lagrangian relaxation,” IEEE Trans. on Power Systems, vol. 30, no. 6, pp. 3225– 3233, Nov 2015.

[28] X. Wang, Y. H. Song, and Q. Lu, “Lagrangian decomposition approach to active power congestion management across interconnected regions,” IEE Proceedings - Generation, Transmission and Distribution, vol. 148, no. 5, pp. 497–503, Sep 2001.

[29] R. Baldick, B. H. Kim, C. Chase, and Y. Luo, “A fast distributed implementation of optimal power flow,” IEEE Trans. on Power Systems, vol. 14, no. 3, pp. 858–864, Aug 1999.

[30] A. J. Conejo and J. A. Aguado, “Multi-area coordinated decentralized DC optimal power flow,” IEEE Trans. on Power Systems, vol. 13, no. 4, pp. 1272–1278, Nov 1998.

[31] R. Baldick and D. Chatterjee, “Coordinated dispatch of regional transmission organizations: Theory and example,” Computers & Operations Research, vol. 41, pp. 319–332, 2014.

[32] F. Zhao, E. Litvinov, and T. Zheng, “A marginal equivalent decomposition method and its application to multi-area optimal power flow problems,” IEEE Trans. on Power Systems, vol. 29, no. 1, pp. 53–61, Jan 2014.

[33] Z. Li, W. Wu, B. Zhang, and B. Wang, “Decentralized multi-area dynamic economic dispatch using modified generalized Benders decomposition,” IEEE Trans. on Power Systems, vol. 31, no. 1, pp. 526–538, Jan 2016.

[34] Y. Guo, S. Bose, Q. Xia, and L. Tong, “On robust tie-line scheduling in multi-area power systems,” IEEE Trans. on Power Systems, vol. 33, no. 4, pp. 4144–4154, 2018.

[35] Y. Guo, L. Tong, W. Wu, B. Zhang, and H. Sun, “Coordinated multi-area economic dispatch via critical region projection,” IEEE Trans. on Power Systems, vol. PP, no. 99, 2017.

[36] G. Notarstefano, I. Notarnicola, and A. Camisa, “Distributed Optimization for Smart Cyber-Physical Networks,” Foundations and Trends® in Systems and Control, vol. 7, no. 3, pp. 253–383, 2020.

[37] A. D. Dominguez-Garcia and C. N. Hadjicostis, “Coordination of distributed energy resources for provision of ancillary services: Architectures and algorithms,” Encyclopedia of Systems and Control, 2014.

[38] D. Fooladivanda, M. Zholbaryssov, and A. D. Dominguez-Garcia, “Control of networked distributed energy resources in grid-connected ac microgrids,” IEEE Trans. on Control of Network Systems, 2017.

[39] E. Dall’Anese, S. Guggilam, A. Simonetto, Y. C. Chen, and S. V. Dhople, “Optimal regulation of virtual power plants,” IEEE Trans. on Power Systems, vol. 33, no. 2, pp. 1868–1881, 2018.

[40] M. E. Baran and F. F. Wu, “Optimal sizing of capacitors placed on a radial distribution system,” IEEE Trans. on Power Systems, vol. 4, no. 1, pp. 735–743, 1989.

[41] ——, “Network reconfiguration in distribution systems for loss reduction and load balancing,” IEEE Trans. on Power Systems, vol. 4, no. 2, pp. 1401–1407, 1989.

[42] H. Xu, A. D. Domínguez-García, V. V. Veeravalli, and P. W. Sauer, “Data-driven voltage regulation in radial power distribution systems,” IEEE Trans. on Power Systems, vol. 35, no. 3, pp. 2133–2143, 2019.

[43] H. Xu, A. D. Domínguez-García, and P. W. Sauer, “Data-driven coordination of distributed energy resources for active power provision,” IEEE Trans. on Power Systems, vol. 34, no. 4, pp. 3047–3058, 2019.

[44] X. Zhou, E. Dall’Anese, L. Chen, and A. Simonetto, “An incentive-based online optimization framework for distribution grids,” IEEE Trans. on Automatic Control, 2017.

[45] E. Dall’Anese, K. Baker, and T. Summer, “Chance-constrained AC optimal power flow for distribution systems with renewables,” IEEE Trans. on Power Systems, vol. 32, no. 5, pp. 3427–3438, 2017.

[46] W. Lin and E. Bitar, “Decentralized stochastic control of distributed energy resources,” IEEE Trans. on Power Systems, 2017.

[47] S. H. Low, “Convex relaxation of optimal power flow—part II: Exactness,” IEEE Trans. on Control of Network Systems, vol. 1, no. 2, pp. 177–189, 2014.

[48] P. De Martini, L. Kristov, and L. Schwartz, “Distribution systems in a high distributed energy resources future,” Lawrence Berkeley National Lab.(LBNL), Berkeley, CA (United States), Tech. Rep., 2015.

[49] A. Kargarian and Y. Fu, “System of systems based security-constrained unit commitment incorporating active distribution grids,” IEEE Trans. on Power Systems, vol. 29, no. 5, pp. 2489–2498, 2014.

[50] Z. Li, Q. Guo, H. Sun, and J. Wang, “Coordinated economic dispatch of coupled transmission and distribution systems using heterogeneous decomposition,” IEEE Trans. on Power Systems, vol. 31, no. 6, pp. 4817–4830, 2016.

[51] ——, “Coordinated transmission and distribution AC optimal power flow,” IEEE Trans. on Smart Grid, vol. 9, no. 2, pp. 1228–1240, 2018.

[52] Z. Yuan and M. Hesamzadeh, “Hierarchical coordination of TSO-DSO economic dispatch considering large-scale integration of distributed energy resources,” Applied Energy, vol. 195, pp. 600–615, 2017.

[53] R. Horn and C. Johnson, Matrix Analysis. Cambridge, U.K.: Cambridge Univ. Press, 1985.

[54] R. Zimmerman, C. Murillo-Sánchez, and R. Thomas, “Matpower: Steady-state operations, planning, and analysis tools for power systems research and education,” IEEE Trans. on Power Systems, vol. 26, pp. 12–19, 2011.

Subhonmesh Bose is an Assistant Professor in the Dept. of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign. He was an Atkinson Postdoctoral Fellow in Sustainability at Cornell University and received his M.S. and Ph.D. degrees in Electrical Engineering from the California Institute of Technology in 2012 and 2014, respectively. He received his B.Tech. degree from the Indian Institute of Technology Kanpur in 2009. Dr. Bose’s research focuses on algorithms and market design for the power grid, optimization theory and networked control. He received the NSF CAREER Award in 2021.

Dinh Hoa Nguyen received a Ph.D. degree from The University of Tokyo in 2014. Currently, he is an Assistant Professor at Kyushu University, Japan. His research interests include integration of renewable and distributed energy resources, smart grid, consumer-centric energy systems and markets, grid resiliency and security, artificial intelligence, distributed optimization, and multi-agent systems.

Haitian Liu received his B.E. from Huazhong University of Science and Technology, Wuhan, China, in 2019. He is currently pursuing his Ph.D. degree at the Tsinghua-Berkeley Shenzhen Institute, Shenzhen, China. His research interests include optimization for multi-area power systems.


Carolyn L. Beck received her B.S. from Calif. Polytechnic State Univ., Pomona, CA, her M.S. from Carnegie Mellon, Pittsburgh, PA, and her Ph.D. from the California Institute of Technology, Pasadena, CA, all in electrical engineering. Prior to completing her Ph.D., she worked at Hewlett-Packard in Silicon Valley for four years, designing hardware and software for measurement instruments. She is currently a Professor and Associate Head in the Department of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana-Champaign. She has held visiting positions at the Royal Institute of Technology (KTH) in Stockholm, Stanford University in Palo Alto and Lund University, Lund, Sweden. Her research interests include control of networked systems, epidemic processes, mathematical systems theory, clustering and aggregation methods.

APPENDIX

A. Proof of Theorem 1

Since the $f_j$'s and $g_j$'s are convex (and hence, continuous) and $\mathbb{X}_j$ is compact for each $j$, there exist positive constants $D_X$, $D_g$, $L_g$ such that

$$\|x_j - x_j'\| \le D_X, \quad \|g_j(x_j)\| \le D_g, \quad \|g_j(x_j) - g_j(x_j')\| \le L_g\|x_j - x_j'\| \tag{20}$$

for all $x_j, x_j' \in \mathbb{X}_j$. Also, define $D_Z := L_g D_X + D_g$.

1) Upper bounding $V_T$: Using this notation, we derive the upper bound on $V_T$ in four steps:

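(a) We bound the optimality gap as

$$\sum_{j=1}^N\left[f_j(x_j(T)) - D_j(z(T))\right] \le \frac{2D_g}{T}\sum_{j=1}^N\sum_{t=1}^T \eta\left\|Z_j(t-1) - \bar{Z}(t-1)\right\| - \frac{\eta}{T}\sum_{j=1}^N\sum_{t=1}^T g_j(X_j(t))^\intercal\pi_{\mathbb{Z}}\bar{Z}(t-1), \tag{21}$$

where we use the notation $\bar{Z}(t) := \frac{1}{N}\sum_{j=1}^N Z_j(t)$.

(b) Then, we bound the constraint violation as

$$\frac{T}{2N}\left\|\pi_{\mathbb{Z}}\left[\sum_{j=1}^N g_j(x_j(T))\right]\right\|^2 \le \frac{1}{T}\sum_{t=1}^T\sum_{j=1}^N g_j(X_j(t))^\intercal\pi_{\mathbb{Z}}\bar{Z}(t-1) + \frac{N}{2}D_Z^2. \tag{22}$$

(c) We prove that the $Z_j$'s remain close to their centroid as

$$\sum_{j=1}^N\left\|Z_j(t) - \bar{Z}(t)\right\|_2 \le N^{3/2}D_Z\left(1 - \sigma_2(W)\right)^{-1}. \tag{23}$$

(d) Steps (a), (b), (c) are combined to prove the result.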

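• Step (a). Bounding the duality gap: Note that

$$\begin{aligned}
\sum_{j=1}^N\left[f_j(x_j(T)) - D_j(z(T))\right] &= \sum_{j=1}^N\left[f_j(x_j(T)) - D_j(z_j(T))\right] + \sum_{j=1}^N\left[D_j(z_j(T)) - D_j(z(T))\right] \\
&\le \sum_{j=1}^N \underbrace{\left[\frac{1}{T}\sum_{t=1}^T f_j(X_j(t)) - D_j(z_j(T))\right]}_{:=A_j} + D_g\sum_{j=1}^N \underbrace{\left\|z_j(T) - z(T)\right\|}_{:=B_j}.
\end{aligned} \tag{24}$$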

The last line follows from three observations: $f_j$ is convex, $x_j(T) = \frac{1}{T}\sum_{t=1}^T X_j(t)$, and $D_j$ is $D_g$-Lipschitz. In the rest of step (a), we individually bound $A_j$ and $B_j$.

To obtain a bound on $A_j$, note that

$$t z_j(t) - (t-1) z_j(t-1) = \pi_{\mathbb{Z}}[\eta Z_j(t-1)], \tag{25}$$

which then implies

$$\begin{aligned}
t L_j(X_j(t), z_j(t)) &= L_j\big(X_j(t), t z_j(t) - (t-1) z_j(t-1)\big) + (t-1) L_j(X_j(t), z_j(t-1)) \\
&\ge L_j\big(X_j(t), \pi_{\mathbb{Z}}[\eta Z_j(t-1)]\big) + (t-1) L_j(X_j(t-1), z_j(t-1)).
\end{aligned} \tag{26}$$

The first line follows from elementary algebra, while the second line requires the definition of $Z_j$ and the fact that $X_j(t-1)$ minimizes $L_j(\cdot, z_j(t-1))$ over $\mathbb{X}_j$. Iterating the above inequality, we obtain

$$T D_j(z_j(T)) = T L_j(X_j(T), z_j(T)) \ge \sum_{t=1}^T L_j\big(X_j(t), \pi_{\mathbb{Z}}[\eta Z_j(t-1)]\big). \tag{27}$$

The above relation bounds $A_j$ from above as


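To bound $B_j$, we use the definition of $z_j(t)$ to infer

$$z_j(T) = \frac{1}{T}\sum_{t=1}^T \pi_{\mathbb{Z}}[\eta Z_j(t-1)], \tag{30}$$

which in turn implies

$$B_j \le \frac{1}{T}\sum_{t=1}^T\left\|\pi_{\mathbb{Z}}[\eta Z_j(t-1)] - \pi_{\mathbb{Z}}[\eta\bar{Z}(t-1)]\right\|. \tag{31}$$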

Using the bounds of (29) and (31) in (24) and appealing to the non-expansive nature of the projection operator yields (21), completing step (a) of the proof.
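The identity (30) is the recursion (25) unrolled, and the final step relies on the projection being non-expansive. A small numerical sketch of both facts follows, assuming for illustration that every coupling constraint is an inequality, so that $\pi_{\mathbb{Z}}$ is the componentwise projection onto the nonnegative orthant; the history of $Z_j$ values is synthetic.

```python
import numpy as np

def proj(v):
    """Projection onto Z = R^m_+ (assumed: all coupling constraints are inequalities)."""
    return np.maximum(v, 0.0)

rng = np.random.default_rng(2)
eta, T, m = 0.1, 50, 3
Z_hist = rng.normal(size=(T, m))                # stand-in for Z_j(0), ..., Z_j(T - 1)

z = np.zeros(m)
for t in range(1, T + 1):
    z = ((t - 1) * z + proj(eta * Z_hist[t - 1])) / t   # recursion (25)

z_avg = proj(eta * Z_hist).mean(axis=0)         # closed form (30)

# Non-expansiveness of the projection, used to pass from (31) to (21)
U1, U2 = rng.normal(size=(1000, m)), rng.normal(size=(1000, m))
gaps = np.linalg.norm(proj(U1) - proj(U2), axis=1) - np.linalg.norm(U1 - U2, axis=1)
print(np.allclose(z, z_avg), gaps.max() <= 1e-12)        # True True
```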

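• Step (b). Bounding the constraint violation: From the $Z$-update, we obtain

$$\bar{Z}(t) = \frac{t}{N}\sum_{j=1}^N g_j(x_j(t)), \tag{32}$$

which proves useful in bounding the constraint violation as

$$\begin{aligned}
\frac{T^2}{N^2}\left\|\pi_{\mathbb{Z}}\left[\sum_{j=1}^N g_j(x_j(T))\right]\right\|^2 &= \left\|\pi_{\mathbb{Z}}\bar{Z}(T)\right\|^2 = \sum_{t=1}^T\left(\left\|\pi_{\mathbb{Z}}\bar{Z}(t)\right\|^2 - \left\|\pi_{\mathbb{Z}}\bar{Z}(t-1)\right\|^2\right) \\
&\le 2\sum_{t=1}^T \underbrace{\left[\pi_{\mathbb{Z}}\bar{Z}(t-1)\right]^\intercal\left[\bar{Z}(t) - \bar{Z}(t-1)\right]}_{:=E(t)} + \sum_{t=1}^T \underbrace{\left\|\bar{Z}(t) - \bar{Z}(t-1)\right\|^2}_{:=F(t)}.
\end{aligned} \tag{33}$$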

The inequality follows from the fact that for any two scalars $a, b$, we have $a^2 - b^2 = 2b(a-b) + (a-b)^2$ and

$$\left(\pi_{\mathbb{R}_+}[a]\right)^2 - \left(\pi_{\mathbb{R}_+}[b]\right)^2 \le 2\pi_{\mathbb{R}_+}[b](a-b) + (a-b)^2. \tag{34}$$

We separately bound $E(t)$ and $F(t)$. For the former, we use the convexity of $g_j$ and the $x$-update to infer

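$$\begin{aligned}
\bar{Z}(t) - \bar{Z}(t-1) &= \frac{t}{N}\sum_{j=1}^N g_j(x_j(t)) - \frac{t-1}{N}\sum_{j=1}^N g_j(x_j(t-1)) \\
&= \frac{t}{N}\sum_{j=1}^N g_j\left(\frac{t-1}{t}x_j(t-1) + \frac{1}{t}X_j(t)\right) - \frac{t-1}{N}\sum_{j=1}^N g_j(x_j(t-1)) \\
&\le \frac{1}{N}\sum_{j=1}^N g_j(X_j(t)).
\end{aligned} \tag{35}$$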

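Note that if an entry of $g$ encodes an equality constraint, the linearity of that constraint ensures that the corresponding entry of the above relation holds with equality. Since the entries of $\pi_{\mathbb{Z}}\bar{Z}(t-1)$ associated with inequality constraints are nonnegative, we obtain

$$E(t) \le \frac{1}{N}\sum_{j=1}^N g_j(X_j(t))^\intercal\pi_{\mathbb{Z}}\bar{Z}(t-1). \tag{36}$$

To bound $F(t)$, we use the first line of (35) and the bounded/Lipschitz nature of $g_j$ on $\mathbb{X}_j$ to get

$$\begin{aligned}
\left\|\bar{Z}(t) - \bar{Z}(t-1)\right\| &\le \frac{L_g}{N}\sum_{j=1}^N (t-1)\left\|x_j(t) - x_j(t-1)\right\| + D_g \\
&= \frac{L_g}{N}\sum_{j=1}^N\left\|X_j(t) - x_j(t)\right\| + D_g \le L_g D_X + D_g = D_Z.
\end{aligned} \tag{37}$$

Substituting the bounds on $E(t)$ and $F(t)$ into (33) gives the required bound on the constraint violation in (22), completing the proof of step (b).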
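Two elementary facts carry step (b): the scalar inequality (34) and the averaging identity $(t-1)\left(x_j(t) - x_j(t-1)\right) = X_j(t) - x_j(t)$ behind the equality in (37). The sketch below checks both on synthetic random data; `pos` is our helper for $\pi_{\mathbb{R}_+}$.

```python
import numpy as np

rng = np.random.default_rng(3)

def pos(s):
    """Projection onto R_+."""
    return max(s, 0.0)

# (34): pi[a]^2 - pi[b]^2 <= 2 pi[b] (a - b) + (a - b)^2 for all scalars a, b
pairs = rng.normal(scale=3.0, size=(10_000, 2))
ineq_ok = all(
    pos(a) ** 2 - pos(b) ** 2 <= 2 * pos(b) * (a - b) + (a - b) ** 2 + 1e-9
    for a, b in pairs
)

# Identity used in (37): with x(t) = ((t - 1) x(t - 1) + X(t)) / t,
# (t - 1) (x(t) - x(t - 1)) equals X(t) - x(t).
X = rng.normal(size=(20, 2))
x_prev = np.zeros(2)
avg_ok = True
for t in range(1, 21):
    x_t = ((t - 1) * x_prev + X[t - 1]) / t
    avg_ok &= np.allclose((t - 1) * (x_t - x_prev), X[t - 1] - x_t)
    x_prev = x_t

print(ineq_ok, avg_ok)   # True True
```

For (34), the case $b > 0$ reduces to $\pi_{\mathbb{R}_+}[a]^2 \le a^2$, and the case $b \le 0$ to $\pi_{\mathbb{R}_+}[a]^2 \le (a-b)^2$, which is why the check never fails.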

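• Step (c): Bounding the deviation of the $Z_j$'s from their centroid: Consider $\zeta(t) \in \mathbb{R}^{N\times M}$, given by

$$\zeta(t)^\intercal = \left(Z_1(t) \mid \cdots \mid Z_N(t)\right), \tag{38}$$

and define $P := I - \frac{1}{N}\mathbf{1}\mathbf{1}^\intercal$, where $\mathbf{1} \in \mathbb{R}^N$ is a vector of all ones and $I \in \mathbb{R}^{N\times N}$ is the identity matrix. Using this notation, we deduce

$$\sum_{j=1}^N\left\|Z_j(t) - \bar{Z}(t)\right\|_2 \le \sqrt{N}\left\|P\zeta(t)\right\|_F \le N\left\|P\zeta(t)\right\|_2, \tag{39}$$

where $\|\cdot\|_F$ denotes the Frobenius norm of a matrix. Then, the $Z$-updates can be written as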

$$\zeta(t+1) = W\zeta(t) + \phi(t), \quad \zeta(0) = 0, \tag{40}$$

with $\phi(t) \in \mathbb{R}^{N\times M}$; an analysis similar to (37) shows that each row of $\phi(t)$ has a 2-norm bounded above by $D_Z$, implying

$$\|\phi(t)\|_2 \le \sqrt{N}D_Z. \tag{41}$$

Using (40), we then obtain

$$\left\|P\zeta(t+1)\right\|_2 = \left\|P\left(W\zeta(t) + \phi(t)\right)\right\|_2 \le \left\|WP\zeta(t)\right\|_2 + \left\|P\phi(t)\right\|_2, \tag{42}$$

utilizing the fact that $W$ and $P$ commute (because $W\mathbf{1} = \mathbf{1}$ and $\mathbf{1}^\intercal W = \mathbf{1}^\intercal$). To bound the first term in (42), note that $W$ is doubly stochastic, so the Perron–Frobenius theorem [53, Theorem 8.4.4] implies that its eigenvalue with the largest absolute value is unity, with $\mathbf{1}$ as the corresponding eigenvector. Moreover, $\mathbf{1}^\intercal P = 0$, so every column of $P\zeta(t)$ is orthogonal to this eigenvector. Using the Courant–Fischer theorem [53, Theorem 4.2.11], we then obtain

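$$\left\|WP\zeta(t)\right\|_2 \le \sigma_2(W)\left\|P\zeta(t)\right\|_2, \tag{43}$$

where $\sigma_2(W)$ is the second largest singular value of $W$. Since $W$ is irreducible and aperiodic, $\sigma_2(W) \in (0, 1)$. We bound the second term in (42) as

$$\left\|P\phi(t)\right\|_2 \le \underbrace{\|P\|_2}_{=1}\,\|\phi(t)\|_2 \le \sqrt{N}D_Z, \tag{44}$$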

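because the 2-norm is sub-multiplicative. Substituting the bounds (43) and (44) into (42) yields

$$\left\|P\zeta(t+1)\right\|_2 \le \sigma_2(W)\left\|P\zeta(t)\right\|_2 + \sqrt{N}D_Z. \tag{45}$$

Iterating the above inequality gives

$$\left\|P\zeta(t)\right\|_2 \le \sqrt{N}D_Z\sum_{\ell=0}^{t-1}\left[\sigma_2(W)\right]^{t-\ell-1} \le \sqrt{N}D_Z\left(1 - \sigma_2(W)\right)^{-1}. \tag{46}$$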
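The contraction argument in (42)–(46) can be exercised numerically. As an assumption not taken from the paper, the sketch uses Metropolis weights on a 7-node star graph (the topology of the T&D example), which give a symmetric doubly stochastic $W$ with $\sigma_2(W) = 6/7$, and draws each row of $\phi(t)$ with 2-norm exactly $D_Z = 1$; it then checks (43) and the bound (46) along the recursion (40).

```python
import numpy as np

# Hypothetical mixing matrix: Metropolis weights on a 7-node star whose center
# (index 6) plays the role of the SO; W is symmetric and doubly stochastic.
N, m, D_Z = 7, 3, 1.0
W = np.zeros((N, N))
for leaf in range(N - 1):
    W[leaf, N - 1] = W[N - 1, leaf] = 1.0 / 7.0   # 1 / (1 + max(deg_i, deg_j)) = 1/7
    W[leaf, leaf] = 6.0 / 7.0
W[N - 1, N - 1] = 1.0 - W[N - 1, : N - 1].sum()

sigma2 = np.linalg.svd(W, compute_uv=False)[1]    # second largest singular value
P = np.eye(N) - np.ones((N, N)) / N
bound = np.sqrt(N) * D_Z / (1.0 - sigma2)          # right-hand side of (46)

rng = np.random.default_rng(4)
zeta = np.zeros((N, m))
contraction_ok = bound_ok = True
for t in range(500):
    phi = rng.normal(size=(N, m))
    phi *= D_Z / np.linalg.norm(phi, axis=1, keepdims=True)   # every row has 2-norm D_Z
    # (43): W shrinks the disagreement component by at least a factor sigma2
    contraction_ok &= bool(np.linalg.norm(W @ P @ zeta, 2)
                           <= sigma2 * np.linalg.norm(P @ zeta, 2) + 1e-9)
    zeta = W @ zeta + phi                                     # recursion (40)
    bound_ok &= bool(np.linalg.norm(P @ zeta, 2) <= bound + 1e-9)

print(sigma2, contraction_ok, bound_ok)
```

The factor $(1 - \sigma_2(W))^{-1}$ in (23) and (47) is what ties the convergence constant to the communication graph: a poorly connected graph has $\sigma_2(W)$ close to one and hence a larger bound.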


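• Step (d). Combining steps (a), (b), (c) to derive the result: Note that (21) and (22) together with the definition of $V_T$ give

$$\begin{aligned}
V_T(x(T), z(T)) &= \sum_{j=1}^N\left[f_j(x_j(T)) - D_j(z(T))\right] + \frac{\eta T}{2N}\left\|\pi_{\mathbb{Z}}\left[\sum_{j=1}^N g_j(x_j(T))\right]\right\|^2 \\
&\le \frac{2D_g}{T}\sum_{j=1}^N\sum_{t=1}^T \eta\left\|Z_j(t-1) - \bar{Z}(t-1)\right\| + \frac{\eta N}{2}D_Z^2 \\
&\le \frac{2\eta D_g N^{3/2} D_Z}{1 - \sigma_2(W)} + \frac{\eta N}{2}D_Z^2,
\end{aligned} \tag{47}$$

where the second inequality follows from (23). Using $\eta = \eta_0/\sqrt{T}$, we then obtain the upper bound in (12).

2) Lower bounding $V_T$: By the saddle-point property of a primal-dual optimizer $(x^\star, z^\star)$ of P, we get

$$\begin{aligned}
P^\star = L(x^\star, z^\star) \le L(x(T), z^\star) &= \sum_{j=1}^N f_j(x_j(T)) + z^{\star\intercal}\sum_{j=1}^N g_j(x_j(T)) \\
&\le \sum_{j=1}^N f_j(x_j(T)) + z^{\star\intercal}\sum_{j=1}^N \pi_{\mathbb{Z}}[g_j(x_j(T))].
\end{aligned} \tag{48}$$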

Applying Young’s inequality to the last summand on the right-hand side of the above relation, we further get

$$P^\star \le \sum_{j=1}^N f_j(x_j(T)) + \frac{N}{2\eta T}\|z^\star\|^2 + \frac{\eta T}{2N}\left\|\sum_{j=1}^N \pi_{\mathbb{Z}}[g_j(x_j(T))]\right\|^2. \tag{49}$$

Subtracting $\sum_{j=1}^N D_j(z(T))$ from both sides and using $\eta = \eta_0/\sqrt{T}$ yields the desired lower bound on $V_T$ in (12).
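For completeness, the Young's-inequality step between (48) and (49) follows from expanding a nonnegative square. Writing $a := \sum_{j=1}^N \pi_{\mathbb{Z}}[g_j(x_j(T))]$ (our shorthand, not notation from the paper):

```latex
0 \le \left\| \sqrt{\tfrac{N}{\eta T}}\, z^\star - \sqrt{\tfrac{\eta T}{N}}\, a \right\|^2
  = \frac{N}{\eta T}\|z^\star\|^2 - 2\, z^{\star\intercal} a + \frac{\eta T}{N}\|a\|^2
\quad\Longrightarrow\quad
z^{\star\intercal} a \le \frac{N}{2\eta T}\|z^\star\|^2 + \frac{\eta T}{2N}\|a\|^2 .
```

Adding $\sum_{j=1}^N f_j(x_j(T))$ to both sides and chaining with (48) gives (49); the weights $N/(\eta T)$ and $\eta T/N$ are chosen so that the last term matches the scaling of the constraint-violation term in $V_T$.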

B. Simulation data for Sections III, IV and V

Network data were obtained from MATPOWER 7.1 [54].

1) Data for solving P1: The multi-area power system considered in Section III is illustrated in Figure 1. The 118-bus networks were modified as follows. Tie-line capacities were set to 100 MW and their reactances were set to 0.25 p.u. Capacities of transmission lines internal to each area were set to 100 MW. All loads and generators at boundary buses were removed. Quadratic cost coefficients were neglected, and the linear cost coefficients $c_j$ of the generators were perturbed to $\tilde{c}_j := c_j \circ (0.99 + 0.02\xi_j)$ for $j = 1, \ldots, N$, where the entries of $\xi_j$ are independent $\mathcal{N}(0, 1)$ (standard normal) variables. All phase angles were restricted to $[-\pi/6, \pi/6]$.
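The perturbation $\tilde{c}_j = c_j \circ (0.99 + 0.02\xi_j)$ is an elementwise (Hadamard) product. A minimal sketch, with made-up coefficients for three generators and an arbitrary seed (neither taken from the paper's data); `perturb_costs` is a hypothetical helper name:

```python
import numpy as np

def perturb_costs(c, rng):
    """Return c_tilde = c ∘ (0.99 + 0.02 xi) with i.i.d. standard normal entries in xi."""
    xi = rng.standard_normal(c.shape)
    return c * (0.99 + 0.02 * xi)               # elementwise (Hadamard) product

c = np.array([20.0, 35.0, 40.0])                # hypothetical linear cost coefficients, $/MWh
c_tilde = perturb_costs(c, np.random.default_rng(5))
print(c_tilde)
```

Passing a seeded generator keeps the perturbed instance reproducible across runs; a small perturbation of this kind is a common way to break ties among generators with identical linear costs.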

2) Data for solving P2: The 4-bus network considered in Section IV, shown in Figure 4a, is modified from the IEEE 4-bus network as follows. The branch joining buses 1 and 4 was altered to connect buses 3 and 4. We enforced squared current flows as $\ell_{j,k} \in [0, 200]$ A², and real and reactive branch power flows as $P_{j,k} \in [-1, 1]$ MW and $Q_{j,k} \in [-1, 1]$ MVAR, respectively. DER generators were added at buses 2, 3 and 4. Bus 1 defined the T&D interface. Generation capacities were fixed to [0, 1] MW and [−1, 1] MVAR. Generation costs were $\alpha^p_j (p^G_j)^2 + \beta^p_j p^G_j + \alpha^q_j (q^G_j)^2$ with coefficients in Table I.

TABLE I
COST COEFFICIENTS OF THE 4-BUS NETWORK FOR P2.

Bus              1     2     3     4
α^p [$/MW²]      0     6     7     8
β^p [$/MW]       30    19    18    17
α^q [$/MVAR²]    5     5.1   5.2   5.3

For the IEEE 15-bus system shown in Figure 5, we modified the branch flow limits to mirror those for the 4-bus system. We added 7 distributed generators at buses 5, 7, 8, 10, 13, 14 and 15, where bus 1 is the T&D interface, all with capacities [0, 0.2] MW and [−0.2, 0.2] MVAR. Generation costs were similar to those of the 4-bus network, with coefficients in Table II.

TABLE II

Bus              1     5     7     8     10    13    14    15
α^p [$/MW²]      0     25    23    21    19    17    15    13
β^p [$/MW]       50    41    42    43    44    45    46    47
α^q [$/MVAR²]    25    24    23    22    21    20    19    18

We randomized the real and reactive power demands at each change point by scaling each (real/reactive) load by $[\omega + (\omega' - \omega)\xi]$, where $\xi \sim \mathcal{N}(0, 1)$. The parameters $(\omega, \omega')$ were varied at the change points in the sequence (0.70, 1.30), (0.80, 1.20), (0.85, 1.15), (0.75, 1.20), (0.95, 1.05). The experiment was initialized with the default loads from MATPOWER.

3) Data for solving P3: In Section V, for the 204-bus system in Figure 7, the 6-bus transmission network was modified as follows. All branch capacities were set to 5 MW. All real and reactive generation capacities were set to [0, 5] MW and [−5, 5] MVAR, respectively. We considered $P^D_\ell + jQ^D_\ell = (4 + j4)$ MVA at each bus $\ell = 1, \ldots, 6$. Generation costs were similar to those of the 4-bus network, with coefficients in Table III.

TABLE III

Bus              1     2     3     4     5     6
α^p [$/MW²]      8.7   5.9   6.8   7.2   4.2   3.5
β^p [$/MW]       11    12    13    14    15    16
α^q [$/MVAR²]    3.2   3.5   2.3   1.8   1.5   1.7

For all 33-bus distribution networks, all branch capacities were set to 4 MW. Four DER generators were added at buses 18, 22, 25 and 33. Bus 1 is the T&D interface. Again, we considered generation costs as for P2 but with coefficients $\alpha^p_\ell = 5 \circ (0.9 + 0.1\xi_\ell)$, $\beta^p_\ell = 20 \circ (0.9 + 0.1\xi'_\ell)$ and $\alpha^q_\ell = 3 \circ (0.9 + 0.1\xi''_\ell)$ for $\ell = 1, \ldots, n^{\rm tran}$, where all entries of $\xi_\ell, \xi'_\ell, \xi''_\ell$ are drawn from $\mathcal{N}(0, 1)$. Real and reactive