• No results found

“A Study on Conjugate Gradient Stopping Criteria in Truncated Newton Methods for Linear Classification”

N/A
N/A
Protected

Academic year: 2021

Share "“A Study on Conjugate Gradient Stopping Criteria in Truncated Newton Methods for Linear Classification”"

Copied!
11
0
0

Loading.... (view fulltext now)

Full text

(1)

Supplementary Materials for

“A Study on Conjugate Gradient Stopping Criteria in Truncated Newton Methods for Linear Classification”

For experiments we use two-class classification shown in Table I. Except for yahookr and yahoojp, others are available from LIBSVM Data Sets (2007). Table I shows the statistics of each data set used in our experiments.

Data sets #instances #features density log 2 (C best )

LR L2

HIGGS 11,000,000 28 92.1057% -6 -12

w8a 49,749 300 3.8834% 8 2

rcv1 20,242 47,236 0.1568% 6 0

real-sim 72,309 20,958 0.2448% 3 -1

news20 19,996 1,355,191 0.0336% 9 3

url 2,396,130 3,231,962 0.0036% -7 -10 yahoojp 176,203 832,026 0.0160% 3 -1 yahookr 460,554 3,052,939 0.0111% 6 1 webspam 350,000 16,609,143 0.0220% 2 -3 kdda 8,407,752 20,216,831 0.0001% -3 -5 kddb 19,264,097 29,890,095 0.0001% -1 -4 criteo 45,840,617 1,000,000 0.0039% -15 -12 kdd12 149,639,105 54,686,452 0.00002% -4 -11

Table I: Data statistics. The density is the average number of non-zeros per instance. C best is the regularization parameter selected by cross validation.

I. Additional Experimental Results

In this section we present complete results for linear classification problems with either

• LR-loss or

• L2-loss.

References

LIBSVM Data Sets, 2007. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/

datasets.

c

.

(2)

0.0 0.2 0.4 0.6 0.8 1.0

CG iterations 1e2

10 −7 10 −6 10 −5 10 −4 10 −3 10 −2

(f − f * )/ f *

ResConst01 ResConst09 QuadConst01 QuadConst09

(a) HIGGS

0 2 4 6

CG iterations 1e2

10 −7 10 −6 10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResConst01 ResConst09 QuadConst01 QuadConst09

(b) w8a

0.00 0.25 0.50 0.75 1.00 1.25 1.50

CG iterations 1e2

10 −6 10 −5 10 −4 10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResConst01 ResConst09 QuadConst01 QuadConst09

(c) new20

0.0 0.2 0.4 0.6 0.8 1.0

CG iterations 1e2

10 −5 10 −4 10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResConst01 ResConst09 QuadConst01 QuadConst09

(d ) rcv1

0.0 0.2 0.4 0.6 0.8

CG iterations 1e2

10 −6 10 −4 10 −2 10 0

(f − f * )/ f *

ResConst01 ResConst09 QuadConst01 QuadConst09

(e) real-sim

0.00 0.25 0.50 0.75 1.00 1.25 1.50

CG iterations 1e2

10 −5 10 −4 10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResConst01 ResConst09 QuadConst01 QuadConst09

(f ) url

0.0 0.5 1.0 1.5 2.0

CG iterations 1e2

10 −6 10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResConst01 ResConst09 QuadConst01 QuadConst09

(g) yahoojp

0.0 0.2 0.4 0.6 0.8

CG iterations 1e3

10 −6 10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResConst01 ResConst09 QuadConst01 QuadConst09

(h) yahookr

0.0 0.5 1.0 1.5

CG iterations 1e2

10 −5 10 −4 10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResConst01 ResConst09 QuadConst01 QuadConst09

(i ) webspam

0.0 0.5 1.0 1.5 2.0

CG iterations 1e3

10 −7 10 −6 10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResConst01 ResConst09 QuadConst01 QuadConst09

(j ) kdda

0 1 2 3 4 5

CG iterations 1e3

10 −7 10 −6 10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResConst01 ResConst09 QuadConst01 QuadConst09

(k ) kddb

0.0 0.2 0.4 0.6 0.8

CG iterations

1e2

10 −6 10 −5 10 −4 10 −3 10 −2 10 −1

(f − f * )/ f *

ResConst01 ResConst09 QuadConst01 QuadConst09

(l ) criteo

0 1 2 3 4 5

CG iterations 1e2

10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResConst01 ResConst09 QuadConst01 QuadConst09

(m) kdd12

Figure i: A comparison of different constant thresholds in the inner stopping condition.

We show the convergence of a truncated Newton method for logistic regression. The x-axis

shows the cumulative number of CG iterations. Loss=LR and C = C best in each sub-figure.

(3)

0.0 0.5 1.0 1.5 2.0

CG iterations 1e2

10 −9 10 −7 10 −5 10 −3 10 −1

(f − f * )/ f *

ResConst01 ResConst09 QuadConst01 QuadConst09

(a) HIGGS

0.0 0.5 1.0 1.5

CG iterations 1e3

10 −6 10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResConst01 ResConst09 QuadConst01 QuadConst09

(b) w8a

0.0 0.5 1.0 1.5 2.0 2.5

CG iterations 1e2

10 −3 10 −2 10 −1 10 0 10 1 10 2

(f − f * )/ f *

ResConst01 ResConst09 QuadConst01 QuadConst09

(c) new20

0.0 0.2 0.4 0.6 0.8 1.0

CG iterations 1e3

10 −3 10 −2 10 −1 10 0 10 1 10 2

(f − f * )/ f *

ResConst01 ResConst09 QuadConst01 QuadConst09

(d ) rcv1

0 1 2 3

CG iterations 1e2

10 −4 10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResConst01 ResConst09 QuadConst01 QuadConst09

(e) real-sim

0.0 0.2 0.4 0.6 0.8 1.0

CG iterations 1e3

10 −4 10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResConst01 ResConst09 QuadConst01 QuadConst09

(f ) url

0.0 0.5 1.0 1.5

CG iterations 1e3

10 −6 10 −5 10 −4 10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResConst01 ResConst09 QuadConst01 QuadConst09

(g) yahoojp

0 1 2 3

CG iterations 1e4

10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResConst01 ResConst09 QuadConst01 QuadConst09

(h) yahookr

0 2 4 6

CG iterations 1e2

10 −4 10 −3 10 −2 10 −1 10 0 10 1 10 2

(f − f * )/ f *

ResConst01 ResConst09 QuadConst01 QuadConst09

(i ) webspam

0 1 2 3 4

CG iterations 1e4

10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResConst01 ResConst09 QuadConst01 QuadConst09

(j ) kdda

0 2 4 6

CG iterations 1e4

10 −4 10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResConst01 ResConst09 QuadConst01 QuadConst09

(k ) kddb

0 1 2 3 4 5

CG iterations

1e2

10 −6 10 −5 10 −4 10 −3 10 −2 10 −1

(f − f * )/ f *

ResConst01 ResConst09 QuadConst01 QuadConst09

(l ) criteo

0.0 0.2 0.4 0.6 0.8 1.0

CG iterations 1e4

10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResConst01 ResConst09 QuadConst01 QuadConst09

(m) kdd12

Figure ii: A comparison of different constant thresholds in the inner stopping condition.

We show the convergence of a truncated Newton method for logistic regression. The x-

axis shows the cumulative number of CG iterations. Loss=LR and C = 100C best in each

sub-figure.

(4)

0.0 0.2 0.4 0.6 0.8 1.0

CG iterations 1e2

10 7 10 6 10 5 10 4 10 3 10 2

(f f * )/ f *

ResConst09 ResAdaL2 QuadConst09 QuadAdaL2

(a) HIGGS

0 1 2 3 4 5

CG iterations 1e2

10 −7 10 −6 10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResConst09 ResAdaL2 QuadConst09 QuadAdaL2

(b) w8a

0.0 0.2 0.4 0.6 0.8 1.0

CG iterations 1e2

10 −6 10 −5 10 −4 10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResConst09 ResAdaL2 QuadConst09 QuadAdaL2

(c) new20

0 2 4 6

CG iterations 1e1

10 −5 10 −4 10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResConst09 ResAdaL2 QuadConst09 QuadAdaL2

(d ) rcv1

0 2 4 6

CG iterations 1e1

10 −6 10 −4 10 −2 10 0

(f − f * )/ f *

ResConst09 ResAdaL2 QuadConst09 QuadAdaL2

(e) real-sim

0.0 0.5 1.0 1.5

CG iterations 1e2

10 −5 10 −4 10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResConst09 ResAdaL2 QuadConst09 QuadAdaL2

(f ) url

0.0 0.5 1.0 1.5 2.0

CG iterations 1e2

10 −6 10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResConst09 ResAdaL2 QuadConst09 QuadAdaL2

(g) yahoojp

0.0 0.2 0.4 0.6 0.8 1.0

CG iterations 1e3

10 −6 10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResConst09 ResAdaL2 QuadConst09 QuadAdaL2

(h) yahookr

0.0 0.5 1.0 1.5

CG iterations 1e2

10 −5 10 −4 10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResConst09 ResAdaL2 QuadConst09 QuadAdaL2

(i ) webspam

0.0 0.5 1.0 1.5

CG iterations 1e3

10 −7 10 −6 10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResConst09 ResAdaL2 QuadConst09 QuadAdaL2

(j ) kdda

0 1 2 3 4

CG iterations 1e3

10 −7 10 −6 10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResConst09 ResAdaL2 QuadConst09 QuadAdaL2

(k ) kddb

0.0 0.2 0.4 0.6 0.8

CG iterations

1e2

10 −6 10 −5 10 −4 10 −3 10 −2 10 −1

(f − f * )/ f *

ResConst09 ResAdaL2 QuadConst09 QuadAdaL2

(l ) criteo

0 1 2 3 4 5

CG iterations 1e2

10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResConst09 ResAdaL2 QuadConst09 QuadAdaL2

(m) kdd12

Figure iii: A comparison between adaptive and constant forcing sequences. We show the

convergence of a truncated Newton method for logistic regression. The x-axis shows the

cumulative number of CG iterations. Loss=LR and C = C best in each sub-figure.

(5)

0.0 0.5 1.0 1.5

CG iterations 1e2

10 −9 10 −7 10 −5 10 −3 10 −1

(f − f * )/ f *

ResConst09 ResAdaL2 QuadConst09 QuadAdaL2

(a) HIGGS

0.0 0.2 0.4 0.6 0.8

CG iterations 1e3

10 −6 10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResConst09 ResAdaL2 QuadConst09 QuadAdaL2

(b) w8a

0.0 0.5 1.0 1.5 2.0

CG iterations 1e2

10 −3 10 −2 10 −1 10 0 10 1 10 2

(f − f * )/ f *

ResConst09 ResAdaL2 QuadConst09 QuadAdaL2

(c) new20

0.0 0.5 1.0 1.5 2.0

CG iterations 1e2

10 −3 10 −2 10 −1 10 0 10 1 10 2

(f − f * )/ f *

ResConst09 ResAdaL2 QuadConst09 QuadAdaL2

(d ) rcv1

0 1 2 3

CG iterations 1e2

10 −4 10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResConst09 ResAdaL2 QuadConst09 QuadAdaL2

(e) real-sim

0.0 0.2 0.4 0.6 0.8 1.0

CG iterations 1e3

10 −4 10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResConst09 ResAdaL2 QuadConst09 QuadAdaL2

(f ) url

0.00 0.25 0.50 0.75 1.00 1.25 1.50

CG iterations 1e3

10 −6 10 −5 10 −4 10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResConst09 ResAdaL2 QuadConst09 QuadAdaL2

(g) yahoojp

0.0 0.2 0.4 0.6 0.8 1.0

CG iterations 1e4

10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResConst09 ResAdaL2 QuadConst09 QuadAdaL2

(h) yahookr

0 1 2 3 4 5

CG iterations 1e2

10 −4 10 −3 10 −2 10 −1 10 0 10 1 10 2

(f − f * )/ f *

ResConst09 ResAdaL2 QuadConst09 QuadAdaL2

(i ) webspam

0.00 0.25 0.50 0.75 1.00 1.25 1.50

CG iterations 1e4

10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResConst09 ResAdaL2 QuadConst09 QuadAdaL2

(j ) kdda

0 1 2 3 4

CG iterations 1e4

10 −4 10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResConst09 ResAdaL2 QuadConst09 QuadAdaL2

(k ) kddb

0 1 2 3 4 5 6

CG iterations

1e2

10 −6 10 −5 10 −4 10 −3 10 −2 10 −1

(f − f * )/ f *

ResConst09 ResAdaL2 QuadConst09 QuadAdaL2

(l ) criteo

0 1 2 3 4 5

CG iterations 1e3

10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResConst09 ResAdaL2 QuadConst09 QuadAdaL2

(m) kdd12

Figure iv: A comparison between adaptive and constant forcing sequences. We show the

convergence of a truncated Newton method for logistic regression. The x-axis shows the

cumulative number of CG iterations. Loss=LR and C = 100C best in each sub-figure.

(6)

0.0 0.2 0.4 0.6 0.8 1.0

CG iterations 1e2

10 7 10 6 10 5 10 4 10 3 10 2

(f f * )/ f *

ResAdaL1 ResAdaL2 QuadAdaL2 QuadAdaL1 TR

(a) HIGGS

0 1 2 3 4 5 6

CG iterations 1e2

10 −7 10 −6 10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResAdaL1 ResAdaL2 QuadAdaL2 QuadAdaL1 TR

(b) w8a

0.00 0.25 0.50 0.75 1.00 1.25 1.50

CG iterations 1e2

10 −6 10 −5 10 −4 10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResAdaL1 ResAdaL2 QuadAdaL2 QuadAdaL1 TR

(c) new20

0.0 0.2 0.4 0.6 0.8

CG iterations 1e2

10 −5 10 −4 10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResAdaL1 ResAdaL2 QuadAdaL2 QuadAdaL1 TR

(d ) rcv1

0.0 0.2 0.4 0.6 0.8

CG iterations 1e2

10 −6 10 −4 10 −2 10 0

(f − f * )/ f *

ResAdaL1 ResAdaL2 QuadAdaL2 QuadAdaL1 TR

(e) real-sim

0.0 0.5 1.0 1.5

CG iterations 1e2

10 −5 10 −4 10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResAdaL1 ResAdaL2 QuadAdaL2 QuadAdaL1 TR

(f ) url

0.0 0.5 1.0 1.5 2.0

CG iterations 1e2

10 −6 10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResAdaL1 ResAdaL2 QuadAdaL2 QuadAdaL1 TR

(g) yahoojp

0.0 0.2 0.4 0.6 0.8 1.0

CG iterations 1e3

10 −6 10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResAdaL1 ResAdaL2 QuadAdaL2 QuadAdaL1 TR

(h) yahookr

0.0 0.5 1.0 1.5

CG iterations 1e2

10 −5 10 −4 10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResAdaL1 ResAdaL2 QuadAdaL2 QuadAdaL1 TR

(i ) webspam

0.0 0.5 1.0 1.5

CG iterations 1e3

10 −7 10 −6 10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResAdaL1 ResAdaL2 QuadAdaL2 QuadAdaL1 TR

(j ) kdda

0 1 2 3 4

CG iterations 1e3

10 −7 10 −6 10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResAdaL1 ResAdaL2 QuadAdaL2 QuadAdaL1 TR

(k ) kddb

0.0 0.2 0.4 0.6 0.8 1.0 1.2

CG iterations

1e2

10 −6 10 −5 10 −4 10 −3 10 −2 10 −1

(f − f * )/ f *

ResAdaL1 ResAdaL2 QuadAdaL2 QuadAdaL1 TR

(l ) criteo

0 2 4 6

CG iterations 1e2

10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResAdaL1 ResAdaL2 QuadAdaL2 QuadAdaL1 TR

(m) kdd12

Figure v: An investigation on the robustness of adaptive rules and a comparison with

the trust region approach. We show the convergence of a truncated Newton method for

logistic regression. The x-axis shows the cumulative number of CG iterations. Loss=LR

and C = C best in each sub-figure.

(7)

0.0 0.2 0.4 0.6 0.8 1.0 1.2

CG iterations 1e2

10 −9 10 −7 10 −5 10 −3 10 −1

(f − f * )/ f *

ResAdaL1 ResAdaL2 Q adAdaL2 Q adAdaL1 TR

(a) HIGGS

0.0 0.2 0.4 0.6 0.8 1.0 1.2

CG iterations 1e3

10 −6 10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResAdaL1 ResAdaL2 QuadAdaL2 QuadAdaL1 TR

(b) w8a

0.0 0.5 1.0 1.5 2.0 2.5

CG iterations 1e2

10 −3 10 −2 10 −1 10 0 10 1 10 2

(f − f * )/ f *

ResAdaL1 ResAdaL2 QuadAdaL2 QuadAdaL1 TR

(c) new20

0 1 2 3

CG iterations 1e2

10 −3 10 −2 10 −1 10 0 10 1 10 2

(f − f * )/ f *

ResAdaL1 ResAdaL2 QuadAdaL2 QuadAdaL1 TR

(d ) rcv1

0 1 2 3 4

CG iterations 1e2

10 −4 10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResAdaL1 ResAdaL2 QuadAdaL2 QuadAdaL1 TR

(e) real-sim

0.0 0.2 0.4 0.6 0.8 1.0

CG iterations 1e3

10 −4 10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResAdaL1 ResAdaL2 QuadAdaL2 QuadAdaL1 TR

(f ) url

0.00 0.25 0.50 0.75 1.00 1.25 1.50

CG iterations 1e3

10 −6 10 −5 10 −4 10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResAdaL1 ResAdaL2 QuadAdaL2 QuadAdaL1 TR

(g) yahoojp

0.0 0.2 0.4 0.6 0.8 1.0 1.2

CG iterations 1e4

10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResAdaL1 ResAdaL2 QuadAdaL2 QuadAdaL1 TR

(h) yahookr

0 2 4 6

CG iterations 1e2

10 −4 10 −3 10 −2 10 −1 10 0 10 1 10 2

(f − f * )/ f *

ResAdaL1 ResAdaL2 QuadAdaL2 QuadAdaL1 TR

(i ) webspam

0.0 0.5 1.0 1.5 2.0

CG iterations 1e4

10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResAdaL1 ResAdaL2 QuadAdaL2 QuadAdaL1 TR

(j ) kdda

0 2 4 6

CG iterations 1e4

10 −4 10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResAdaL1 ResAdaL2 QuadAdaL2 QuadAdaL1 TR

(k ) kddb

0.0 0.2 0.4 0.6 0.8

CG iterations

1e3

10 −6 10 −5 10 −4 10 −3 10 −2 10 −1

(f − f * )/ f *

ResAdaL1 ResAdaL2 QuadAdaL2 QuadAdaL1 TR

(l ) criteo

0.0 0.2 0.4 0.6 0.8

CG iterations 1e4

10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResAdaL1 ResAdaL2 QuadAdaL2 QuadAdaL1 TR

(m) kdd12

Figure vi: An investigation on the robustness of adaptive rules and a comparison with

the trust region approach. We show the convergence of a truncated Newton method for

logistic regression. The x-axis shows the cumulative number of CG iterations. Loss=LR

and C = 100C best in each sub-figure.

(8)

0.0 0.2 0.4 0.6 0.8 1.0

CG iterations 1e2

10 −1 10 0

C osi ne

QuadAdaL2 ResConst09 ResAdaL2

(a) HIGGS

0.0 0.5 1.0 1.5 2.0 2.5

CG iterations 1e2

10 −1 10 0

C osi ne

QuadAdaL2 ResConst09 ResAdaL2

(b) w8a

0 2 4 6

CG iterations 1e1

10 −1 10 0

C osi ne

QuadAdaL2 ResConst09 ResAdaL2

(c) new20

0 1 2 3 4

CG iterations 1e1

10 −1 10 0

C osi ne

QuadAdaL2 ResConst09 ResAdaL2

(d ) rcv1

0 1 2 3 4 5

CG iterations 1e1

10 −1 10 0

C osi ne

QuadAdaL2 ResConst09 ResAdaL2

(e) real-sim

0.0 0.2 0.4 0.6 0.8 1.0 1.2

CG iterations 1e2

10 −1 10 0

C osi ne

QuadAdaL2 ResConst09 ResAdaL2

(f ) url

0.00 0.25 0.50 0.75 1.00 1.25 1.50

CG iterations 1e2

10 −1 10 0

C osi ne

QuadAdaL2 ResConst09 ResAdaL2

(g) yahoojp

0 2 4 6

CG iterations 1e2

10 −1 10 0

C osi ne

QuadAdaL2 ResConst09 ResAdaL2

(h) yahookr

0.0 0.2 0.4 0.6 0.8 1.0 1.2

CG iterations 1e2

10 −1 10 0

C osi ne

QuadAdaL2 ResConst09 ResAdaL2

(i ) webspam

0.0 0.2 0.4 0.6 0.8 1.0 1.2

CG iterations 1e3

10 −1 10 0

C osi ne

QuadAdaL2 ResConst09 ResAdaL2

(j ) kdda

0.0 0.5 1.0 1.5 2.0 2.5 3.0

CG iterations 1e3

10 −1 10 0

C osi ne

QuadAdaL2 ResConst09 ResAdaL2

(k ) kddb

0 2 4 6

CG iterations

1e1

10−1 100

C osi ne

QuadAdaL2 ResConst09 ResAdaL2

(l ) criteo

0.0 0.5 1.0 1.5 2.0 2.5 3.0

CG iterations 1e2

10 −1 10 0

C osi ne

QuadAdaL2 ResConst09 ResAdaL2

(m) kdd12

Figure vii: Cosines between the anti-gradient −∇f (w w w k ) and the resulting direction s s s k ,

using different truncation rules. Iterations in which the cosine is below the 0.1 threshold

are marked with a 6. The x-axis shows the cumulative number of CG iterations. Loss=LR

and C = C best in each sub-figure.

(9)

0.0 0.2 0.4 0.6 0.8 1.0

CG iterations 1e2

10 −1 10 0

C osi ne

QuadAdaL2 ResConst09 ResAdaL2

(a) HIGGS

0 1 2 3

CG iterations 1e2

10 −1 10 0

C osi ne

QuadAdaL2 ResConst09 ResAdaL2

(b) w8a

0.0 0.2 0.4 0.6 0.8

CG iterations 1e2

10 −1 10 0

C osi ne

QuadAdaL2 ResConst09 ResAdaL2

(c) new20

0.0 0.2 0.4 0.6 0.8

CG iterations 1e2

10 −1 10 0

C osi ne

QuadAdaL2 ResConst09 ResAdaL2

(d ) rcv1

0.0 0.5 1.0 1.5 2.0

CG iterations 1e2

10 −1 10 0

C osi ne

QuadAdaL2 ResConst09 ResAdaL2

(e) real-sim

0.0 0.2 0.4 0.6 0.8

CG iterations 1e3

10 −1 10 0

C osi ne

QuadAdaL2 ResConst09 ResAdaL2

(f ) url

0.0 0.2 0.4 0.6 0.8 1.0 1.2

CG iterations 1e3

10 −1 10 0

C osi ne

QuadAdaL2 ResConst09 ResAdaL2

(g) yahoojp

0.0 0.2 0.4 0.6 0.8 1.0

CG iterations 1e4

10 −1 10 0

C osi ne

QuadAdaL2 ResConst09 ResAdaL2

(h) yahookr

0 1 2 3

CG iterations 1e2

10 −1 10 0

C osi ne

QuadAdaL2 ResConst09 ResAdaL2

(i ) webspam

0.00 0.25 0.50 0.75 1.00 1.25 1.50

CG iterations 1e4

10 −1 10 0

C osi ne

QuadAdaL2 ResConst09 ResAdaL2

(j ) kdda

0.0 0.5 1.0 1.5

CG iterations 1e4

10 −1 10 0

C osi ne

QuadAdaL2 ResConst09 ResAdaL2

(k ) kddb

0 1 2 3 4

CG iterations

1e2

10−1 100

C osi ne

QuadAdaL2 ResConst09 ResAdaL2

(l ) criteo

0 1 2 3 4

CG iterations 1e3

10 −1 10 0

C osi ne

QuadAdaL2 ResConst09 ResAdaL2

(m) kdd12

Figure viii: Cosines between the anti-gradient −∇f (w w w k ) and the resulting direction s s s k ,

using different truncation rules. Iterations in which the cosine is below the 0.1 threshold

are marked with a 6. The x-axis shows the cumulative number of CG iterations. Loss=LR

and C = 100C best in each sub-figure.

(10)

0.0 0.2 0.4 0.6 0.8

CG iterations 1e2

10 −5 10 −4 10 −3 10 −2 10 −1

(f − f * )/ f *

ResConst09_l2 ResAdaL2_l2 Q adConst09_l2 Q adAdaL2_l2 ResConst01_l2

(a) HIGGS

0 1 2 3 4 5

CG iterations 1e2

10 −6 10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResConst09_l2 ResAdaL2_l2 QuadConst09_l2 QuadAdaL2_l2 ResConst01_l2

(b) w8a

0.0 0.5 1.0 1.5 2.0

CG iterations 1e2

10 6 10 5 10 4 10 3 10 2 10 1 10 0 10 1

(f f * )/ f *

ResConst09_l2 ResAdaL2_l2 QuadConst09_l2 QuadAdaL2_l2 ResConst01_l2

(c) new20

0.0 0.2 0.4 0.6 0.8

CG i era ions 1e2

10 −7 10 −5 10 −3 10 −1 10 1

(f − f * )/ f *

ResCons 09_l2 ResAdaL2_l2 QuadCons 09_l2 QuadAdaL2_l2 ResCons 01_l2

(d ) rcv1

0.0 0.2 0.4 0.6 0.8

CG iterations 1e2

10 −6 10 −4 10 −2 10 0

(f − f * )/ f *

ResConst09_l2 ResAdaL2_l2 QuadConst09_l2 QuadAdaL2_l2 ResConst01_l2

(e) real-sim

0.0 0.5 1.0 1.5 2.0 2.5 3.0

CG iterations 1e2

10 −5 10 −4 10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResConst09_l2 ResAdaL2_l2 QuadConst09_l2 QuadAdaL2_l2 ResConst01_l2

(f ) url

0.0 0.2 0.4 0.6 0.8 1.0

CG i era ions 1e3

10 −7 10 −6 10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResCons 09_l2 ResAdaL2_l2 QuadCons 09_l2 QuadAdaL2_l2 ResCons 01_l2

(g) yahoojp

0.0 0.2 0.4 0.6 0.8

CG iterations 1e3

10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResConst09_l2 ResAdaL2_l2 Q adConst09_l2 Q adAdaL2_l2 ResConst01_l2

(h) yahookr

0.0 0.5 1.0 1.5 2.0

CG iterations 1e2

10 6 10 5 10 4 10 3 10 2 10 1 10 0 10 1

(f f * )/ f *

ResConst09_l2 ResAdaL2_l2 QuadConst09_l2 QuadAdaL2_l2 ResConst01_l2

(i ) webspam

0 1 2 3 4

CG iterations 1e3

10 −6 10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResConst09_l2 ResAdaL2_l2 QuadConst09_l2 QuadAdaL2_l2 ResConst01_l2

(j ) kdda

0 1 2 3 4 5

CG iterations 1e3

10 −6 10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResConst09_l2 ResAdaL2_l2 QuadConst09_l2 QuadAdaL2_l2 ResConst01_l2

(k ) kddb

0 2 4 6

CG iterations

1e2

10 1

9 × 100

(f − f * )/ f *

ResConst09_l2 ResAdaL2_l2 QuadConst09_l2 QuadAdaL2_l2 ResConst01_l2

(l ) criteo

0 1 2 3 4

CG iterations 1e2

10 −8 10 −6 10 −4 10 −2 10 0

(f − f * )/ f *

ResConst09_l2 ResAdaL2_l2 QuadConst09_l2 QuadAdaL2_l2 ResConst01_l2

(m) kdd12

Figure ix: A comparison between adaptive and constant forcing sequences. We show the

convergence of a truncated Newton method for linear SVM. The x-axis shows the cumulative

number of CG iterations. Loss=L2 and C = C best in each sub-figure.

(11)

0.0 0.2 0.4 0.6 0.8

CG iterations 1e2

10 −5 10 −4 10 −3 10 −2 10 −1

(f − f * )/ f *

ResConst09_l2 ResAdaL2_l2 Q adConst09_l2 Q adAdaL2_l2 ResConst01_l2

(a) HIGGS

0.0 0.5 1.0 1.5 2.0

CG iterations 1e3

10 −8 10 −6 10 −4 10 −2 10 0

(f − f * )/ f *

ResConst09_l2 ResAdaL2_l2 Q adConst09_l2 Q adAdaL2_l2 ResConst01_l2

(b) w8a

0.0 0.2 0.4 0.6 0.8 1.0

CG iterations 1e4

10 4 10 3 10 2 10 1 10 0 10 1 10 2

(f f * )/ f *

ResConst09_l2 ResAdaL2_l2 QuadConst09_l2 QuadAdaL2_l2 ResConst01_l2

(c) new20

0.00 0.25 0.50 0.75 1.00 1.25

CG iterations 1e3

10 4 10 3 10 2 10 1 10 0 10 1 10 2

(f f * )/ f *

ResConst09_l2 ResAdaL2_l2 QuadConst09_l2 QuadAdaL2_l2 ResConst01_l2

(d ) rcv1

0 2 4 6

CG iterations 1e2

10 −5 10 −3 10 −1 10 1

(f − f * )/ f *

ResConst09_l2 ResAdaL2_l2 QuadConst09_l2 QuadAdaL2_l2 ResConst01_l2

(e) real-sim

0.0 0.5 1.0 1.5 2.0

CG iterations 1e3

10 −5 10 −4 10 −3 10 −2 10 −1 10 0 10 1 10 2

(f − f * )/ f *

ResConst09_l2 ResAdaL2_l2 QuadConst09_l2 QuadAdaL2_l2 ResConst01_l2

(f ) url

0.0 0.5 1.0 1.5 2.0

CG iterations 1e3

10 6 10 5 10 4 10 3 10 2 10 1 10 0 10 1

(f f * )/ f *

ResConst09_l2 ResAdaL2_l2 QuadConst09_l2 QuadAdaL2_l2 ResConst01_l2

(g) yahoojp

0 1 2 3 4 5

CG iterations 1e3

10 −4 10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResConst09_l2 ResAdaL2_l2 QuadConst09_l2 QuadAdaL2_l2 ResConst01_l2

(h) yahookr

0 1 2 3 4 5 6

CG iterations 1e2

10 −3 10 −2 10 −1 10 0 10 1 10 2

(f − f * )/ f *

ResConst09_l2 ResAdaL2_l2 QuadConst09_l2 QuadAdaL2_l2 ResConst01_l2

(i ) webspam

0 1 2 3 4

CG iterations 1e4

10 −5 10 −4 10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResConst09_l2 ResAdaL2_l2 QuadConst09_l2 QuadAdaL2_l2 ResConst01_l2

(j ) kdda

0 1 2 3 4

CG iterations 1e4

10 −3 10 −2 10 −1 10 0 10 1

(f − f * )/ f *

ResConst09_l2 ResAdaL2_l2 QuadConst09_l2 QuadAdaL2_l2 ResConst01_l2

(k ) kddb

0.0 0.5 1.0 1.5 2.0

CG iterations

1e3

10 1

(f − f * )/ f *

ResConst09_l2 ResAdaL2_l2 QuadConst09_l2 QuadAdaL2_l2 ResConst01_l2

(l ) criteo

0.0 0.5 1.0 1.5

CG iterations 1e3

10 −7 10 −6 10 −5 10 −4 10 −3 10 −2 10 −1 10 0

(f − f * )/ f *

ResConst09_l2 ResAdaL2_l2 Q adConst09_l2 Q adAdaL2_l2 ResConst01_l2

(m) kdd12

Figure x: A comparison between adaptive and constant forcing sequences. We show the

convergence of a truncated Newton method for linear SVM. The x-axis shows the cumulative

number of CG iterations. Loss=L2 and C = 100C best in each sub-figure.

References

Related documents

5 An interesting question is whether think tanks are more than the sum of their individual parts in two senses: first, whether the repu- tation effect of a think tank overall might

c) Below is a list of module (by default) that can be accessed by Public User.. Bhd Function List Product Description 4-6 24 January 2017–Version 1.3.1 Overview, Continued

To exit out of the humidifier settings without saving any changes, select the red X by using the Left and Right buttons to highlight it, then press the Center button.. If the

We restrict tax collections to be ane functions of labor income. Agents dier in their pro- ductivities and wealth. They trade a single security whose payo possibly depends on

Breanna Williams, for the Master of Music degree in Vocal Performance, presented on June 15, 2020, at Southern Illinois University Carbondale.. TITLE: SCHOLARLY PROGRAM NOTES ON

Duplin County Schools’ Beginning Teacher Support Program has an exciting partnership with UNCW, designed to promote teacher leadership and teacher efficacy. Our program is proud

The diploma completion program must employ Kansas licensed teachers to provide instruction and/or Kansas licensed virtual course monitors to provide oversight of students. If

By applying the distributed concept to the existing ATM system, the performance is increased in following ways. ƒ Interconnection among existing databases: Client information