
CHAPTER 3 : APPLICATION OF SDD SOLVERS FOR LARGE SCALE OPTIMIZATION

3.1.6 Experiments

This section provides empirical validation of our distributed Newton method. We performed two sets of experiments on three families of network topologies: 1) random (both small and large), 2) bar-bell, and 3) bar-star graphs. Using different topologies allows us to better understand the effect of good and bad mixing times on the performance of our technique.

To compare against the state of the art, we benchmarked the proposed algorithm against six solvers (counting the two ADD splittings separately): 1) Distributed SDD-Newton (Tutunov et al. (2016)), 2) Augmented Lagrangian for Distributed Optimization (ADAL) (Chatzipanagiotis et al. (2015)), 3) Accelerated Dual Descent (ADD) with two different splittings (Zargham et al. (2013)), 4) dual sub-gradients, and 5) the fully distributed algorithms for convex optimization (FDA) (Mosk-Aoyama et al. (2007)).

In all experiments, we used $\Phi_e(x(e)) = \exp(x(e)) + \exp(-x(e))$ to represent the cost function on the edges of the network. The flow vectors, $b$, were chosen so that the first component was 1 and the last was -1, with all others being 0.
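As a minimal sketch of this setup, the edge cost and the flow vector can be written in a few lines of Python. The function names are hypothetical, and the first and second derivatives are not given in the text; they follow from elementary calculus and are included only because a Newton-type method would need them.

```python
import math


def edge_cost(x_e):
    """Edge cost Phi_e(x(e)) = exp(x(e)) + exp(-x(e)) from the text."""
    return math.exp(x_e) + math.exp(-x_e)


def edge_cost_grad(x_e):
    """First derivative of the edge cost (derived here, not quoted from the source)."""
    return math.exp(x_e) - math.exp(-x_e)


def edge_cost_hess(x_e):
    """Second derivative; it equals the cost itself and is always at least 2."""
    return math.exp(x_e) + math.exp(-x_e)


def flow_vector(num_nodes):
    """Vector b with first component 1, last component -1, and zeros elsewhere."""
    b = [0.0] * num_nodes
    b[0] = 1.0
    b[-1] = -1.0
    return b


b = flow_vector(20)  # 20 nodes, as in the small random graphs
```

Because the second derivative never drops below 2, each edge cost is strongly convex, which is consistent with using Newton-type updates on this objective.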

Feasibility & Objective Value Results

In this section, we report the performance of all algorithms on various network topologies. The parameters for each topology are detailed below:

1. Small Random Graphs: We refer to a 20-node 60-edge network as a small random one. Here, edges were generated uniformly at random. Typically, condition numbers for these networks ranged between 8 and 15. For ease of exposition, small random networks are referred to as "sRandom Graph" in Figure 9.

2. Large Random Graphs: We refer to an 80-node 200-edge network as a large random one. Again, edges were generated uniformly at random. Condition numbers for such networks varied between 19 and 32. In Figure 10, we refer to large random networks as "lRandom Graph".

3. Bar-Bell Graphs: A bar-bell graph is a network consisting of two cliques connected by a line graph. In Figure 11 we considered a bar-bell network with 30 nodes. In this network, the condition number can reach values on the order of hundreds.

4. Bar-Star Graphs: A bar-star graph is a network with two star-shapes connected by a line graph. Here, the condition number can also reach high orders. In Figure 12, we refer to Bar-Star networks as "Bar-Star Graph".
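The topologies listed above can be sketched as plain edge lists. This is an illustrative construction only: the function names, the node numbering, and the way nodes are split between the cliques (or stars) and the connecting line are assumptions, since the text gives only the total node count for the bar-bell network. Note also that sampling edges uniformly at random does not by itself guarantee a connected graph.

```python
import random
from itertools import combinations


def random_graph_edges(num_nodes, num_edges, seed=None):
    """Sample `num_edges` distinct undirected edges uniformly at random."""
    all_pairs = list(combinations(range(num_nodes), 2))
    return random.Random(seed).sample(all_pairs, num_edges)


def bar_bell_edges(clique_size, path_nodes):
    """Two cliques joined by a line graph with `path_nodes` interior nodes."""
    left = list(range(clique_size))
    middle = list(range(clique_size, clique_size + path_nodes))
    right = list(range(clique_size + path_nodes, 2 * clique_size + path_nodes))
    edges = list(combinations(left, 2)) + list(combinations(right, 2))
    # The line runs from the last left-clique node to the first right-clique node.
    chain = [left[-1]] + middle + [right[0]]
    edges += list(zip(chain[:-1], chain[1:]))
    return edges


def bar_star_edges(leaves, path_nodes):
    """Two stars whose hubs are joined by a line graph with `path_nodes` interior nodes."""
    hub_a = 0
    leaves_a = range(1, leaves + 1)
    middle = list(range(leaves + 1, leaves + 1 + path_nodes))
    hub_b = leaves + 1 + path_nodes
    leaves_b = range(hub_b + 1, hub_b + 1 + leaves)
    edges = [(hub_a, v) for v in leaves_a] + [(hub_b, v) for v in leaves_b]
    chain = [hub_a] + middle + [hub_b]
    edges += list(zip(chain[:-1], chain[1:]))
    return edges


small_random = random_graph_edges(20, 60, seed=0)   # 20 nodes, 60 edges
large_random = random_graph_edges(80, 200, seed=0)  # 80 nodes, 200 edges
bar_bell = bar_bell_edges(10, 10)                   # 30 nodes; the 10/10/10 split is assumed
```

Long connecting lines and dense end clusters are what make the bar-bell and bar-star graphs poorly conditioned relative to the random graphs, which is why they serve as the bad mixing time cases.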

A gradient threshold of $10^{-5}$ was used to assess convergence. For the solver in Algorithm 5, the length of the chain $d$ was chosen according to

$$d = \left\lceil \log\left( 2 \ln\left( \frac{\sqrt[3]{2}}{\sqrt[3]{2} - 1} \right) \kappa(L_G) \right) \right\rceil$$

and for the solver in Algorithm 6 the degree of the Chebyshev polynomial was set to

$$k = \left\lceil \frac{1}{2}\left( \sqrt{\kappa(L_G)} + 1 \right) \ln\frac{2}{\epsilon} \right\rceil$$

where $\kappa(L_G)$ is the condition number of the graph $G$ and $\epsilon = 10^{-3}$ is the accuracy in approximating the Newton direction. In our experiments, we relaxed the step-size choice to a fixed constant. We tried the values 0.1, 0.2, and 0.4 and observed performance similar to that reported in Figures 9, 10, 11, and 12. Step-sizes for all other algorithms were chosen as suggested in the corresponding papers.
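To get a feel for how these two solver parameters scale with the condition number, the sketch below evaluates them numerically. It reads the two expressions as $d = \lceil \log( 2 \ln( \sqrt[3]{2} / (\sqrt[3]{2} - 1) ) \kappa(L_G) ) \rceil$ and $k = \lceil \frac{1}{2} ( \sqrt{\kappa(L_G)} + 1 ) \ln(2/\epsilon) \rceil$. The function names are hypothetical, and taking the outer logarithm in $d$ to base 2 is an assumption, exposed as a parameter because the text does not state the base.

```python
import math


def chain_length(kappa, log_base=2.0):
    """Chain length d for a graph with condition number `kappa`.

    The base of the outer logarithm is an assumption (default 2).
    """
    cbrt2 = 2.0 ** (1.0 / 3.0)
    inner = 2.0 * math.log(cbrt2 / (cbrt2 - 1.0)) * kappa
    return math.ceil(math.log(inner, log_base))


def chebyshev_degree(kappa, eps=1e-3):
    """Chebyshev polynomial degree k for condition number `kappa` and accuracy `eps`."""
    return math.ceil(0.5 * (math.sqrt(kappa) + 1.0) * math.log(2.0 / eps))


# Condition numbers in the ranges reported for the four topologies.
for kappa in (8, 15, 32, 300):
    print(kappa, chain_length(kappa), chebyshev_degree(kappa))
```

Under this reading, the chain length grows logarithmically in the condition number while the Chebyshev degree grows with its square root, so both stay modest even when the condition number is in the hundreds.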

We assessed the performance of all methods using two evaluation criteria: 1) the feasibility error $\|Ax_k - b\|_2$, with $k$ being the iteration count, and 2) the objective value $f(x_k) = \sum_e \Phi_e(x(e))$. Results on the four topologies are reported in Figures 9, 10, 11, and 12. We first observe that our proposed method outperforms the others on both evaluation criteria. On small random graphs, for instance, our distributed Newton method reaches a feasibility error of $10^{-6}$ in about $10^{2.6}$ iterations, compared to $10^{2.8}$ for the second best, ADAL. The same holds on the other topologies. For example, on bar-bell graphs we reached a low feasibility error in about $10^{1.7}$ iterations, compared to $10^{2.5}$ for ADAL. Although ADAL's performance is comparable to ours, it is worth noting that ADAL does not adhere to the distributed framework we detailed before, because it needs global information to compute dual updates.$^{1}$
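The two evaluation criteria can be computed directly from an edge list without forming the matrix $A$ explicitly. The sketch below assumes that $A$ is the node-edge incidence matrix with entry +1 where an edge leaves a node and -1 where it enters; this sign convention and the function names are assumptions, not taken from the text.

```python
import math


def feasibility_error(edges, x, b):
    """Return ||A x - b||_2 for directed edges (tail, head) carrying flows x."""
    residual = [-b_i for b_i in b]
    for (tail, head), flow in zip(edges, x):
        residual[tail] += flow  # flow leaving the tail node
        residual[head] -= flow  # flow entering the head node
    return math.sqrt(sum(r * r for r in residual))


def objective(x):
    """Return f(x), the sum over edges of exp(x(e)) + exp(-x(e))."""
    return sum(math.exp(x_e) + math.exp(-x_e) for x_e in x)


# Tiny example: a 3-node line 0 -> 1 -> 2 that must carry one unit of flow.
edges = [(0, 1), (1, 2)]
b = [1.0, 0.0, -1.0]
feasible_flow = [1.0, 1.0]
zero_flow = [0.0, 0.0]

err_feasible = feasibility_error(edges, feasible_flow, b)
err_zero = feasibility_error(edges, zero_flow, b)
```

In this example the zero flow minimizes the objective but violates the flow constraints, while the unit flow is feasible at a higher cost, which is why both criteria are tracked together.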

Figure 9: Experimental Results for Small Random Graph

Figure 11: Experimental Results for Bar-Bell Graph

Figure 13: Experimental results: convergence, communication overhead, accuracy effect

Beyond these results, it is natural to ask whether our algorithm is capable of retrieving the exact optimal flow $x^*$. We computed $x^*$ using the centralized Newton method on the large random network of 80 nodes and 200 edges. We then traced $\|x_k - x^*\|_2$ for all algorithms.

Results reported in Figure 13 (a) show that our techniques drive this norm to 0 after $\sim 10^2$ iterations, compared to more than $10^3$ iterations for the other methods.

Communication Cost

One might argue that the improvements we achieved above come at the cost of high communication overhead between the processors of the network. This can be true, since at every iteration our fully distributed solvers require $O\left(\kappa(L_G) \log \frac{1}{\epsilon}\right)$ and $O(\sqrt{\,\cdots\,})$ [...], with $\kappa(L_G)$ being the condition number of the graph. To better understand this phenomenon, we conducted an experiment on a random graph of 20 nodes and 60 edges generated uniformly at random. We measured the local communication exchange between processors as a function of the feasibility error, which varied from $10^{-1}$ to $10^{-4}$. These results are shown in Figure 13 (b) and (c). First, it is clear that all algorithms are relatively comparable at low accuracy demands. As these demands increase, so does the communication cost for all approaches. The growth for our methods, however, is slower than that of the others, which can become exponential for ADD and sub-gradients.
