A.5 Proofs of the results in Section 4.4
A.5.5 Proof of Claim 4.4.1
For Algorithm 1 with constant stepsize $\eta < \bar\eta_1$ (defined in (A.170)), since the objective
value $\tilde F(x_t)$ is decreasing, we have $\tilde F(x_t) \le \tilde F(x_0)$. By Proposition A.5.1, this implies
that the algorithm generates a sequence in $K_1 \cap K_2$. By Claim 4.1.1 and the fact that
$K_2 = \Gamma(\beta_T)$ (see the definition of $K_2$ in (4.20) and the definition of $\Gamma(\cdot)$ in (4.10)),
$\nabla\tilde F$ is Lipschitz continuous with Lipschitz constant $L(\beta_T)$ over the set $K_2$. According
to [86, Proposition 1.2.3], each limit point of the sequence generated by Algorithm 1
with constant stepsize $\eta < \bar\eta_1 \le 2/L(\beta_T)$, where the last inequality follows from (A.170), is a stationary point of problem (P1).
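As a hedged illustration of the constant-stepsize result invoked from [86, Proposition 1.2.3], the Python sketch below runs gradient descent with a fixed stepsize $\eta < 2/L$ on a hypothetical two-variable quadratic whose gradient is Lipschitz with constant $L = 4$. The objective, starting point, and the choice $\eta = 0.9 \cdot 2/L$ are assumptions made for illustration only; they are unrelated to $\tilde F$ or to the constant $\bar\eta_1$. The run records the objective values so one can check that they are nonincreasing and that the gradient becomes negligible.

```python
# Gradient descent with a constant stepsize eta < 2/L on a toy objective.
# The objective below is a hypothetical stand-in, NOT problem (P1).

A_DIAG = (1.0, 4.0)  # f(x) = 0.5 * (x1^2 + 4 * x2^2); grad f is L-Lipschitz with L = 4


def f(x):
    return 0.5 * sum(a * xi * xi for a, xi in zip(A_DIAG, x))


def grad(x):
    return [a * xi for a, xi in zip(A_DIAG, x)]


def gradient_descent_constant(f, grad, x0, eta, iters):
    """Run x_t = x_{t-1} - eta * grad(x_{t-1}) and record f at every iterate."""
    x = list(x0)
    values = [f(x)]
    for _ in range(iters):
        g = grad(x)
        x = [xi - eta * gi for xi, gi in zip(x, g)]
        values.append(f(x))
    return x, values


L = max(A_DIAG)
eta = 0.9 * 2.0 / L  # 0.45, strictly below 2/L = 0.5
x_final, values = gradient_descent_constant(f, grad, [3.0, -2.0], eta, 200)
print(x_final, values[-1])
```

With $\eta < 2/L$ the descent lemma gives $f(x_t) \le f(x_{t-1}) - \eta(1 - L\eta/2)\|\nabla f(x_{t-1})\|^2$, which is why the recorded values never increase in this example.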
We then consider Algorithm 1 with stepsize chosen by the restricted Armijo rule. The proof of [86, Proposition 1.2.1] for the standard Armijo rule cannot be applied directly, and some extra effort is needed. For the restricted Armijo rule, the procedure of picking the stepsize $\eta_k$ can be viewed as a two-phase approach. Here $x_k(\eta) \triangleq x_{k-1} - \eta\nabla\tilde F(x_{k-1})$ denotes the trial point obtained from $x_{k-1}$ with stepsize $\eta$. In the first phase, we
find the smallest nonnegative integer such that the distance requirement is fulfilled, i.e.

$$ i_1 \triangleq \min\{\, i \in \mathbb{Z}_+ \mid d(x_k(\xi^i s_0), x_0) \le \tfrac{5}{6}\delta \,\}, \tag{A.180} $$

where $\mathbb{Z}_+$ denotes the set of nonnegative integers, and let $\bar s_k = \xi^{i_1} s_0$. Since
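$$ d(x_k(0), x_0) = d(x_{k-1}, x_0) \le \tfrac{2}{3}\delta \tag{A.181} $$

(according to Proposition 4.4.1 and Claim 4.4.3), and since $x_k(\xi^i s_0) \to x_{k-1}$ as $i \to \infty$ (as $0 < \xi < 1$), such an integer $i_1$ must exist. In the second phase, we find the smallest nonnegative integer such that the reduction requirement is fulfilled, i.e.

$$ i_2 \triangleq \min\{\, i \in \mathbb{Z}_+ \mid \tilde F(x_k(\xi^i \bar s_k)) \le \tilde F(x_{k-1}) - \sigma\xi^i \bar s_k \|\nabla\tilde F(x_{k-1})\|_F^2 \,\}, \tag{A.182} $$

and let $\eta_k = \xi^{i_2}\bar s_k = \xi^{i_1+i_2} s_0$.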
Note that the second phase follows the same procedure as the standard Armijo rule (see (1.11) of [86]). Hence the difference between the standard Armijo rule and the restricted Armijo rule can be summarized as follows: in each iteration, the former starts from a fixed initial stepsize $s$, while the latter starts from a varying initial stepsize $\bar s_k$.
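The two-phase procedure (A.180) to (A.182) can be sketched in Python as follows. This is a minimal, non-authoritative sketch under stated assumptions: the iterate is a flat list of floats, $d(\cdot,\cdot)$ is taken to be the Euclidean (Frobenius) distance, and the function name `restricted_armijo`, the safeguard `max_backtracks`, and the toy quadratic in the usage lines are hypothetical choices for illustration, not part of Algorithm 1 as stated.

```python
import math


def restricted_armijo(f, grad, x_prev, x0, s0, xi, sigma, delta, max_backtracks=100):
    """Two-phase stepsize selection modeled on (A.180) and (A.182).

    Assumes 0 < xi < 1, 0 < sigma < 1, and that d(., .) is the Euclidean
    (Frobenius) distance. Returns (eta_k, i1, i2).
    """
    g = grad(x_prev)
    g_sq = sum(gi * gi for gi in g)  # ||grad f(x_{k-1})||^2

    def trial(eta):
        # x_k(eta) = x_{k-1} - eta * grad f(x_{k-1})
        return [xp - eta * gi for xp, gi in zip(x_prev, g)]

    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    # Phase 1, cf. (A.180): shrink s0 until the trial point lies within 5*delta/6 of x0.
    for i1 in range(max_backtracks):
        s_bar = xi ** i1 * s0
        if dist(trial(s_bar), x0) <= 5.0 * delta / 6.0:
            break
    else:
        raise RuntimeError("distance requirement not met within max_backtracks")

    # Phase 2, cf. (A.182): standard Armijo backtracking started from s_bar instead of s0.
    f_prev = f(x_prev)
    for i2 in range(max_backtracks):
        eta = xi ** i2 * s_bar
        if f(trial(eta)) <= f_prev - sigma * eta * g_sq:
            return eta, i1, i2
    raise RuntimeError("sufficient decrease not met within max_backtracks")


# Usage on a hypothetical quadratic f(x) = 0.5 * (x1^2 + 4 * x2^2), with x_{k-1} = x0.
def f(x):
    return 0.5 * (x[0] ** 2 + 4.0 * x[1] ** 2)


def grad(x):
    return [x[0], 4.0 * x[1]]


x0 = [3.0, -2.0]
result = restricted_armijo(f, grad, x_prev=x0, x0=x0, s0=1.0, xi=0.5, sigma=0.1, delta=1.0)
print(result)  # → (0.0625, 4, 0)
```

In this toy run the distance requirement alone drives the stepsize down to $\bar s_k = 0.0625$ and the reduction test then accepts it immediately. With a very large $\delta$ (e.g. `delta=100.0`) phase 1 accepts $s_0$ at once and phase 2 reduces to the standard Armijo rule, returning `(0.25, 0, 2)`.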
We note that the proof of [86, Proposition 1.2.1] does not require the initial stepsizes to be constant, but only the following property: if the final stepsize $\eta_k$ goes to zero along a subsequence $k \in K$, then for large enough $k \in K$ the initial stepsize must be reduced at least once (see the remark after (1.17) in [86]). This property also holds when the initial stepsize is (asymptotically) bounded below by a positive constant. In the following, we will prove that for the restricted Armijo rule the initial stepsize $\bar s_k$ is (asymptotically) bounded below in this sense,
and then show how to apply the proof of [86, Proposition 1.2.1] to the restricted Armijo rule.
We first prove that the sequence $\{\bar s_k\}$ is (asymptotically) bounded below by a positive constant, i.e.

$$ \liminf_{k\to\infty} \bar s_k > 0. \tag{A.183} $$
Assume, to the contrary, that $\liminf_{k\to\infty} \bar s_k = 0$, i.e. there exists a subsequence $\{\bar s_k\}_{k\in K}$
that converges to zero. Since $s_0$ is a fixed scalar, we can assume $\bar s_k < s_0$ for all $k \in K$; thus
the corresponding $i_1 > 0$ for all $k \in K$. By the definition of $i_1$ in (A.180), we know that
$i_1 - 1$ does not satisfy the distance requirement; in other words, we have
$d(x_k(\xi^{-1}\bar s_k), x_0) > \tfrac{5}{6}\delta$. Denoting $g_{k-1} \triangleq \nabla\tilde F(x_{k-1})$, the above relation becomes

$$ \tfrac{5}{6}\delta < d(x_{k-1} - \xi^{-1}\bar s_k g_{k-1}, x_0) \le d(x_{k-1}, x_0) + \xi^{-1}\bar s_k \|g_{k-1}\|_F \overset{\text{(A.181)}}{\le} \tfrac{2}{3}\delta + \xi^{-1}\bar s_k \|g_{k-1}\|_F, $$

implying $\tfrac{1}{6}\xi\delta \le \bar s_k \|g_{k-1}\|_F$.
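Since $\tfrac{1}{6}\xi\delta$ is a positive constant and $\{\bar s_k\}_{k\in K}$ converges to zero, the above relation implies
$\|\nabla\tilde F(x_{k-1})\|_F \ge \xi\delta/(6\bar s_k) \to \infty$ as $k \to \infty$, $k \in K$. However, $\|\nabla\tilde F(x_{k-1})\|_F$ is bounded above by a universal constant when $\|x_{k-1}\|_F \le \beta_T$ (note
that $\|x_{k-1}\|_F \le \beta_T$ holds due to Proposition 4.4.1 and Claim 4.4.3), which is a contradiction. Therefore, (A.183) is proved.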
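The same chain of inequalities also yields an explicit lower bound, which may make (A.183) more concrete. In the worked restatement below, $G > 0$ is a symbol introduced here (it does not appear in the original argument) for the universal constant with $\|\nabla\tilde F(x)\|_F \le G$ whenever $\|x\|_F \le \beta_T$; the bound applies to every $k$, not only along the subsequence $K$.

```latex
\begin{aligned}
i_1 = 0 &\;\Longrightarrow\; \bar s_k = s_0, \\
i_1 > 0 &\;\Longrightarrow\; \tfrac{1}{6}\xi\delta \le \bar s_k \|g_{k-1}\|_F \le \bar s_k G
          \;\Longrightarrow\; \bar s_k \ge \frac{\xi\delta}{6G}, \\
\text{hence}\quad \bar s_k &\ge \min\Big\{ s_0,\ \frac{\xi\delta}{6G} \Big\} > 0
          \quad \text{for all } k,
\end{aligned}
```

which in particular implies $\liminf_{k\to\infty} \bar s_k > 0$.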
Now we prove that each limit point of the sequence $\{x_k\}$ generated by Algorithm 1
with the restricted Armijo rule is a stationary point. Assume, to the contrary, that there exists a limit point $\bar x$ with $\nabla\tilde F(\bar x) \ne 0$, and suppose the subsequence $\{x_k\}_{k\in K}$ converges to $\bar x$. By
the same argument as that for [86, Proposition 1.2.1], we can prove that the subsequence of final stepsizes satisfies $\{\eta_k\}_{k\in K} \to 0$ (see the inequality before (1.17) in [86]). Since $\{\bar s_k\}$ is
(asymptotically) bounded below by a positive constant by (A.183), we must have $\bar s_k > \eta_k$ for all $k \in K$, $k \ge \bar k$, for large
enough $\bar k$. Thus the corresponding $i_2 > 0$ for all $k \in K$, $k \ge \bar k$. By the definition of $i_2$ in
(A.182), we know that $i_2 - 1$ does not satisfy the reduction requirement; in other words,
we have $\tilde F(x_k(\eta_k\xi^{-1})) > \tilde F(x_{k-1}) - \sigma\eta_k\xi^{-1}\|\nabla\tilde F(x_{k-1})\|_F^2$, or equivalently,
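$$ \tilde F(x_{k-1}) - \tilde F\big(x_{k-1} - \eta_k\xi^{-1}\nabla\tilde F(x_{k-1})\big) < \sigma\eta_k\xi^{-1}\|\nabla\tilde F(x_{k-1})\|_F^2, \quad \forall\, k \in K,\ k \ge \bar k. $$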
This relation is the same as (1.17) in [86] (except that (1.17) in [86] considers a more general descent direction), and the rest of the proof is the same as in [86] and is omitted here.
For Algorithm 1 with stepsize chosen by the restricted line search rule, since this rule “gives larger reduction in cost at each iteration” than the restricted Armijo rule, it “inherits the convergence properties” of the restricted Armijo rule (as remarked in the last paragraph of the proof of [86, Proposition 1.2.1]). The rigorous proof is similar to that in the second-to-last paragraph of the proof of [86, Proposition 1.2.1] and is omitted here.
Algorithm 2 is a two-block BCD method to solve problem (P1). According to [98, Corollary 2], each limit point of the sequence generated by Algorithm 2 is a stationary point of problem (P1).
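To illustrate what a two-block BCD iteration looks like, the Python sketch below alternates exact minimization over two blocks $u$ and $v$ of a hypothetical rank-1 matrix-completion objective with a small ridge term. This objective, the data `M`, and the helper names are assumptions for illustration; the sketch is not Algorithm 2 and does not reproduce the objective of (P1). The ridge term ($\lambda > 0$) makes each block subproblem strongly convex, so each block update has a unique closed-form minimizer.

```python
# Two-block BCD (alternating exact minimization) on a hypothetical rank-1
# matrix-completion objective; this is NOT problem (P1) or Algorithm 2 itself:
#   f(u, v) = 0.5 * sum_{(i,j) in Omega} (M_ij - u_i * v_j)^2 + 0.5 * lam * (|u|^2 + |v|^2)


def objective(M, u, v, lam):
    fit = sum((m - u[i] * v[j]) ** 2 for (i, j), m in M.items())
    return 0.5 * fit + 0.5 * lam * (sum(a * a for a in u) + sum(b * b for b in v))


def block_min(M, other, n, lam):
    """Exact minimizer over x in R^n of 0.5*sum (m - x_i*other_j)^2 + 0.5*lam*|x|^2.

    With lam > 0 the subproblem is strongly convex, and its unique solution is
    x_i = sum_j m_ij * other_j / (lam + sum_j other_j^2) over observed (i, j).
    """
    num = [0.0] * n
    den = [lam] * n
    for (i, j), m in M.items():
        num[i] += m * other[j]
        den[i] += other[j] * other[j]
    return [a / b for a, b in zip(num, den)]


def transpose(M):
    return {(j, i): m for (i, j), m in M.items()}


def two_block_bcd(M, u, v, lam, iters):
    Mt = transpose(M)
    values = [objective(M, u, v, lam)]
    for _ in range(iters):
        u = block_min(M, v, len(u), lam)   # minimize over u with v fixed
        v = block_min(Mt, u, len(v), lam)  # minimize over v with u fixed
        values.append(objective(M, u, v, lam))
    return u, v, values


# Observed entries of a 2x2 matrix (entry (1, 1) is missing).
M = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0}
u, v, values = two_block_bcd(M, [1.0, 1.0], [1.0, 1.0], lam=0.01, iters=50)
print(values[0], values[-1])
```

Because each block update is an exact minimization, the objective values are nonincreasing, and after each $v$-update the partial gradient with respect to $v$ is zero up to rounding.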
Algorithm 3 belongs to the class of BSUM methods [82]. According to Proposition A.5.1, the level set $X_0 = \{x \mid \tilde F(x) \le \tilde F(x_0)\}$ is a subset of the bounded set $K_1 \cap K_2$,
thus $X_0$ is bounded. Moreover, $X_0$ is closed (as a sublevel set of the continuous function $\tilde F$), thus $X_0$ is compact. It is easy to verify that the objective function of each subproblem in Algorithm 3 is a convex tight upper bound of $\tilde F(x)$ (more precisely, it satisfies Assumption 2 in [82]). It is also clear that the objective function of each subproblem is strongly convex, thus each subproblem
of Algorithm 3 has a unique solution. Based on these facts, it follows from [82, Theorem 2] that each limit point of the sequence generated by Algorithm 3 is a stationary point.

Algorithm 4 is an SGD method (more precisely, an incremental gradient method) with a specific stepsize rule. According to (A.175) and (A.177) in Appendix A.5.4, Algorithm 4 can be viewed as an approximate gradient descent method with bounded error. By [99, Proposition 1], each limit point of the sequence generated by Algorithm 4 is a stationary point.
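The "approximate gradient descent" view of an incremental gradient method can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the finite-sum objective $f(x) = \sum_i \tfrac12 (x - a_i)^2$, the data `A`, and the stepsize rule $\eta_r = 1/(r+2)$ (held fixed within each pass) are hypothetical and do not reproduce Algorithm 4 or its stepsize rule. One cyclic pass with stepsize $\eta$ differs from a single full gradient step with stepsize $\eta$ by an error of order $\eta^2$, so with diminishing stepsizes the iterates approach the unique stationary point $\mathrm{mean}(a)$.

```python
# Cyclic incremental gradient method with a diminishing stepsize on a
# hypothetical finite-sum objective f(x) = sum_i 0.5 * (x - a_i)^2, whose unique
# stationary point is mean(a). This is an illustration, NOT Algorithm 4 itself.

A = [1.0, 2.0, 3.0, 6.0]  # hypothetical data defining the components f_i


def grad_component(x, a_i):
    return x - a_i  # gradient of f_i(x) = 0.5 * (x - a_i)^2


def incremental_gradient(x, epochs):
    for r in range(epochs):
        eta = 1.0 / (r + 2)  # diminishing stepsize, held fixed within one pass
        for a_i in A:        # one pass = one step per component, in a fixed order
            x = x - eta * grad_component(x, a_i)
    return x


x_final = incremental_gradient(10.0, epochs=2000)
print(x_final, sum(A) / len(A))
```

With a constant stepsize the cyclic iterates would settle at a point biased away from $\mathrm{mean}(a)$ by an amount proportional to $\eta$; letting $\eta_r \to 0$ removes this bias, which mirrors the role of the bounded, vanishing error in the approximate-gradient argument.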