
Chapter 6 Computational Complexity

Another important component of a learning problem is its computational complexity. As computing capabilities increase, we expect to be able to solve harder learning problems. However, if the computational requirements grow too quickly with the size of the problem, the size of solvable problems will remain quite restricted. One of the main aims of computational learning theory is the study of the maximum size of learning problems which can be solved using a reasonable amount of computation (Valiant 1984). We delineate the boundary of what is feasibly learnable by requiring the computational requirements of learnable problems to be polynomial in $1/\epsilon$, $1/\delta$ and the relevant complexity parameters. Classes of functions for which this can be done are said to be efficiently learnable.

In this chapter, we study the computational requirements of agnostically learning single hidden layer neural networks. We first relate the computational complexity of agnostically learning the basis function class to the computational complexity of learning the class of single hidden layer neural networks. We do this via an iterative approximation result. It shows that if, at each step, we add to the current convex combination the function from the class that minimises the distance to the target function, then we achieve good convergence to the best approximation in the convex hull of the function class, even when the target function is not in the convex hull. The iterative approximation result is described in Section 6.1. We then show in Section 6.2 how the iterative approximation result can be used to show that if a basis function class is efficiently agnostically learnable, then the convex hull of the function class is also efficiently agnostically learnable. Since a basis function class is contained in its convex hull, this means that the convex hull of a function class (and the class of single hidden layer neural networks with those basis functions) is efficiently agnostically learnable if and only if the basis function class is efficiently agnostically learnable.

Learning $\{0,1\}$-valued functions with $\{0,1\}$-valued targets is widely studied in computational learning theory. We call the proper agnostic version of this problem proper agnostic PAC learning (Kearns et al. 1994). In Section 6.3, we show how a proper efficient agnostic PAC learning algorithm for a basis function class $\mathcal{G}$ can be used to efficiently agnostically learn single hidden layer neural networks (with real-valued outputs) with hidden units from $\mathcal{G}$.

In Section 6.4, we show that agnostically learning some classes of single hidden layer neural networks (including networks with linear threshold hidden units) is likely to be computationally difficult. We do this by showing that an algorithm for agnostically learning the network can be used for PAC learning polynomial sized DNF formulae. Whether the class of polynomial sized DNF formulae is PAC learnable has been an open problem in computational learning theory since it was first posed by Valiant (1984). It is generally believed that polynomial sized DNF is not efficiently learnable (Jerrum 1994).

In view of this, we consider learning subclasses of single hidden layer neural networks. In Chapter 7, we will study functions with finite $q$-th absolute moment of the Fourier transform. In Section 6.5, we show that the class of single hidden layer neural networks with linear threshold hidden units and bounded fan-in is efficiently agnostically learnable. We end the chapter with a discussion of the results in Section 6.6.

6.1 Iterative Approximation

The iterative approximation result in this section is an extension of the results of Jones (1992) and Barron (1993). They showed that if a function is in the closure of the convex hull of a bounded set of functions in a Hilbert space, then it can be approximated by iteratively adding functions from the set such that the squared distance to the target function is of order $O(1/k)$, where $k$ is the number of functions added. We extend the result in order to allow agnostic learning. We show that even if the target function is not in the closure of the convex hull, the iterative approximation scheme will converge to the best possible approximation, with the squared distance to the target approaching the optimal squared distance at a rate of $O(1/k)$.

We now give the iterative approximation result which is the key to showing the equivalence between efficient agnostic learning of a function class and efficient agnostic learning of its convex hull.

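Theorem 6.1 Let $\mathcal{H}$ be a Hilbert space with norm $\|\cdot\|$. Let $\mathcal{G}$ be a subset of $\mathcal{H}$ with $\|g\| \le b$ for each $g \in \mathcal{G}$. Let $\mathrm{co}(\mathcal{G})$ be the convex hull of $\mathcal{G}$. For any $f \in \mathcal{H}$, let $d_f = \inf_{g' \in \mathrm{co}(\mathcal{G})} \|g' - f\|$.

Suppose that $f_1$ is chosen to satisfy

$$\|f_1 - f\|^2 \le \inf_{g \in \mathcal{G}} \|g - f\|^2 + \epsilon_1$$

and iteratively, $f_k$ is chosen to satisfy

$$\|f_k - f\|^2 \le \inf_{g \in \mathcal{G}} \|\alpha f_{k-1} + \bar{\alpha} g - f\|^2 + \epsilon_k$$

where $\alpha = 1 - 2/(k+1)$, $\bar{\alpha} = 1 - \alpha$, $c \ge b^2$, and $\epsilon_k \le 4(c - b^2)/(k+1)^2$. Then for every $k \ge 1$,

$$\|f - f_k\|^2 - d_f^2 \le \frac{4c}{k}. \tag{6.1}$$

Proof. Given $\delta > 0$, let $f^*$ be a point in the convex hull of $\mathcal{G}$ with $\|f^* - f\| \le d_f + \delta$. Thus $f^* = \sum_{i=1}^{N} \gamma_i g_i$ with $g_i \in \mathcal{G}$, $\gamma_i \ge 0$ and $\sum_{i=1}^{N} \gamma_i = 1$ for some sufficiently large $N$. Then for all $\alpha \in [0,1]$,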

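$$\begin{aligned}
\|\alpha f_{k-1} + \bar{\alpha} g - f\|^2
&= \|\alpha f_{k-1} + \bar{\alpha} g - f^* + f^* - f\|^2 \\
&= \|\alpha f_{k-1} + \bar{\alpha} g - f^*\|^2 + \|f^* - f\|^2 + 2\langle \alpha f_{k-1} + \bar{\alpha} g - f^*,\, f^* - f\rangle,
\end{aligned}$$

where $\langle \cdot, \cdot \rangle$ is the inner product in the Hilbert space $\mathcal{H}$. Thus,

$$\begin{aligned}
\|\alpha f_{k-1} + \bar{\alpha} g - f\|^2 - \|f^* - f\|^2
&= \|\alpha f_{k-1} + \bar{\alpha} g - f^*\|^2 + 2\langle \alpha f_{k-1} + \bar{\alpha} g - f^*,\, f^* - f\rangle \\
&= \|\alpha (f_{k-1} - f^*) + \bar{\alpha}(g - f^*)\|^2 + 2\langle \alpha f_{k-1} + \bar{\alpha} g - f^*,\, f^* - f\rangle \\
&= \alpha^2 \|f_{k-1} - f^*\|^2 + \bar{\alpha}^2 \|g - f^*\|^2 + 2\alpha\bar{\alpha}\langle f_{k-1} - f^*,\, g - f^*\rangle \\
&\quad + 2\langle \alpha f_{k-1} + \bar{\alpha} g - f^*,\, f^* - f\rangle.
\end{aligned}$$

Let $g$ be independently drawn from the set $\{g_1, \ldots, g_N\}$ with $P\{g = g_i\} = \gamma_i$. The average value of $\|\alpha f_{k-1} + \bar{\alpha} g - f\|^2 - \|f^* - f\|^2$ is

$$\begin{aligned}
&\sum_{i=1}^{N} \gamma_i \Big[ \alpha^2 \|f_{k-1} - f^*\|^2 + \bar{\alpha}^2 \|g_i - f^*\|^2 + 2\alpha\bar{\alpha}\langle f_{k-1} - f^*,\, g_i - f^*\rangle + 2\langle \alpha f_{k-1} + \bar{\alpha} g_i - f^*,\, f^* - f\rangle \Big] \\
&\quad = \alpha^2 \|f_{k-1} - f^*\|^2 + \bar{\alpha}^2 \sum_{i=1}^{N} \gamma_i \|g_i - f^*\|^2 + 2\alpha\bar{\alpha} \sum_{i=1}^{N} \gamma_i \langle f_{k-1} - f^*,\, g_i - f^*\rangle \\
&\qquad + 2 \sum_{i=1}^{N} \gamma_i \langle \alpha f_{k-1} + \bar{\alpha} g_i - f^*,\, f^* - f\rangle \\
&\quad = \alpha^2 \|f_{k-1} - f^*\|^2 + \bar{\alpha}^2 \left( \sum_{i=1}^{N} \gamma_i \left( \|g_i\|^2 - 2\langle g_i, f^*\rangle + \|f^*\|^2 \right) \right) + 0 \\
&\qquad + 2 \left\langle \sum_{i=1}^{N} \gamma_i \left( \alpha f_{k-1} + g_i - \alpha g_i - f^* \right),\, f^* - f \right\rangle \\
&\quad = \alpha^2 \|f_{k-1} - f^*\|^2 + \bar{\alpha}^2 \left( \sum_{i=1}^{N} \gamma_i \|g_i\|^2 - \|f^*\|^2 \right) + 2\alpha \langle f_{k-1} - f^*,\, f^* - f\rangle \\
&\quad \le \alpha^2 \|f_{k-1} - f^*\|^2 + \bar{\alpha}^2 b^2 + 2\alpha \langle f_{k-1} - f^*,\, f^* - f\rangle.
\end{aligned}$$

Since the average is bounded in this way, there must be a $g \in \{g_1, \ldots, g_N\}$ such that

$$\begin{aligned}
\|\alpha f_{k-1} + \bar{\alpha} g - f\|^2 - \|f^* - f\|^2
&\le \alpha^2 \|f_{k-1} - f^*\|^2 + 2\alpha \langle f_{k-1} - f^*,\, f^* - f\rangle + \bar{\alpha}^2 b^2 \\
&= \alpha \left[ \alpha \|f_{k-1} - f^*\|^2 + 2\langle f_{k-1} - f^*,\, f^* - f\rangle \right] + \bar{\alpha}^2 b^2 \\
&\le \alpha \left[ \|f_{k-1} - f^*\|^2 + 2\langle f_{k-1} - f^*,\, f^* - f\rangle \right] + \bar{\alpha}^2 b^2
\end{aligned} \tag{6.2}$$

since $\alpha \in [0,1]$. Noting that

$$\|f_{k-1} - f\|^2 = \|f_{k-1} - f^* + f^* - f\|^2,$$

we get

$$\|f_{k-1} - f\|^2 - \|f^* - f\|^2 = \|f_{k-1} - f^*\|^2 + 2\langle f_{k-1} - f^*,\, f^* - f\rangle.$$

Substituting into (6.2) and letting $\delta$ go to 0, we get

$$\inf_{g \in \mathcal{G}} \|\alpha f_{k-1} + \bar{\alpha} g - f\|^2 - d_f^2 \le \alpha \left[ \|f_{k-1} - f\|^2 - d_f^2 \right] + \bar{\alpha}^2 b^2.$$

Setting $k = 1$, $\alpha = 0$ and $f_0 = 0$, we see that

$$\inf_{g \in \mathcal{G}} \|g - f\|^2 - d_f^2 \le b^2.$$

Hence the theorem is true for $k = 1$. Assume as an inductive hypothesis that

$$\|f_{k-1} - f\|^2 - d_f^2 \le \frac{4c}{k-1}.$$

Then

$$\inf_{g \in \mathcal{G}} \|\alpha f_{k-1} + \bar{\alpha} g - f\|^2 - d_f^2 + \epsilon_k \le \frac{4c\,\alpha}{k-1} + \bar{\alpha}^2 b^2 + \frac{4(c - b^2)}{(k+1)^2}.$$

Letting $\alpha = 1 - 2/(k+1)$ and $\bar{\alpha} = 2/(k+1)$,

$$\begin{aligned}
\|f_k - f\|^2 - d_f^2
&\le \frac{k-1}{k+1} \cdot \frac{4c}{k-1} + \frac{4b^2}{(k+1)^2} + \frac{4(c - b^2)}{(k+1)^2} \\
&= \frac{4c}{k+1} + \frac{4c}{(k+1)^2} \\
&\le \frac{4c}{k}.
\end{aligned}$$

Then (6.1) follows directly. $\square$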
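The scheme of Theorem 6.1 can be checked numerically in a small finite-dimensional case. The following is only a sketch under stated assumptions: the Hilbert space is taken to be $\mathbb{R}^2$ with the Euclidean inner product, $\mathcal{G}$ is a hypothetical finite set of three unit vectors (so $b = 1$ and we may take $c = b^2$), the infimum over $\mathcal{G}$ is attained exactly (so $\epsilon_k = 0$), and the target lies outside the convex hull with $d_f = 2$ known by inspection. The function name `greedy_convex_approx` is illustrative and does not come from the text.

```python
import numpy as np

def greedy_convex_approx(G, f, num_steps):
    """Greedy iteration of Theorem 6.1 for a finite set G (rows) in R^d.

    Assumption: the minimum over G is found exactly, i.e. eps_k = 0.
    Returns the squared distances ||f_k - f||^2 for k = 1..num_steps.
    """
    f_prev = np.zeros_like(f)            # f_0 = 0, as in the proof
    sq_dists = []
    for k in range(1, num_steps + 1):
        alpha = 1.0 - 2.0 / (k + 1)      # alpha = 0 when k = 1
        # One candidate alpha*f_{k-1} + (1 - alpha)*g per row g of G.
        candidates = alpha * f_prev + (1.0 - alpha) * G
        errs = np.sum((candidates - f) ** 2, axis=1)
        best = int(np.argmin(errs))
        f_prev = candidates[best]
        sq_dists.append(float(errs[best]))
    return sq_dists

# Hypothetical example: three unit vectors, so b = 1 and c = b^2 = 1.
G = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, -1.0]])
f = np.array([0.3, 2.0])   # target outside co(G)
d_f_sq = 4.0               # nearest point of co(G) is (0.3, 0), at distance 2
c = 1.0

sq_dists = greedy_convex_approx(G, f, num_steps=50)
excess = [d - d_f_sq for d in sq_dists]   # ||f_k - f||^2 - d_f^2

# Compare the excess squared distance with the bound 4c/k of (6.1).
for k in (1, 2, 10, 50):
    print(k, excess[k - 1], 4.0 * c / k)
```

Each iterate is a convex combination of elements of $\mathcal{G}$, so the excess is never negative, and by (6.1) it should stay below $4c/k$ at every step.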

Recently, Koiran (1994) has independently obtained a similar result for iterative approximation when the target function is not in the convex closure of the set of functions. The bounds obtained in (Koiran 1994) involve a constant $C > \sqrt{b^2 + d_f^2}$. In comparison, our bound of $O(1/k)$ is asymptotically better than Koiran's bound. The constant in our bound is also independent of the target function, unlike the constant in his bound.

6.2 Equivalence in Efficient Learning

In this section we show that the class of single hidden layer neural networks with hidden units from an admissible class of basis functions is efficiently agnostically learnable if and only if the class of basis functions is efficiently agnostically learnable. This is done by using the iterative approximation result together with the agnostic learning algorithm for the basis function class as a subroutine to learn one hidden unit at a time.

We will need the following uniform convergence result which follows from Hoeffding's inequality (Hoeffding 1963) and the union bound.
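Before the formal statement, the way Hoeffding's inequality and the union bound combine for a finite class can be sketched numerically. The sketch assumes the standard two-sided Hoeffding bound for i.i.d. samples of a function with values in $[0, C]$, namely $P(|\hat{E} f - E f| \ge \epsilon) \le 2\exp(-2m\epsilon^2/C^2)$, and applies the union bound over a class of `num_functions` functions. It is not a reproduction of the exact statement or constants of the theorem, and the function names are illustrative.

```python
import math

def failure_probability_bound(num_functions, C, eps, m):
    """Union bound over the class of the two-sided Hoeffding tail.

    Bounds the probability that some function in a finite class of size
    num_functions, with values in [0, C], has an empirical mean over m
    i.i.d. samples that deviates from its expectation by at least eps.
    """
    return 2 * num_functions * math.exp(-2 * m * eps ** 2 / C ** 2)

def hoeffding_union_sample_size(num_functions, C, eps, delta):
    """Smallest m (up to rounding) making the bound above at most delta."""
    m = (C ** 2) * math.log(2 * num_functions / delta) / (2 * eps ** 2)
    return math.ceil(m)

# Hypothetical parameters chosen only for illustration.
m = hoeffding_union_sample_size(num_functions=1000, C=1.0, eps=0.1, delta=0.05)
print(m, failure_probability_bound(1000, 1.0, 0.1, m))
```

The sample size grows only logarithmically with the size of the class and with $1/\delta$, and quadratically with $C/\epsilon$, which is what keeps the sample requirement polynomial in the parameters of interest.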

Theorem 6.2 Let $\mathcal{F}$ be a finite set of functions on some set $Z$ with $0 \le f(z) \le C$ for all $f \in \mathcal{F}$

In document Agnostic learning and single hidden layer neural networks (Page 60-65)