1995; Hellerstein, Pilliapakkamnatt, Raghavan, Wilkins
LISAHELLERSTEIN
Department of Computer and Information Science, Polytechnic University, Brooklyn, NY, USA
132
C
Certificate Complexity and Exact Learning Problem DefinitionThis problem concerns the query complexity of proper learning in a widely studied learning model: exact learn- ing with membership and equivalence queries. Hellerstein et al. [8] showed that the number of (polynomially sized) queries required to learn a concept class in this model is closely related to the size of certain certificates associ- ated with that class. This relationship gives a combina- torial characterization of the concept classes that can be learned with polynomial query complexity. (Similar re- sults were shown by Hegedüs[7], based on the work of Moshkov [11].)
The Exact Learning Model
Concepts are functionsf :X!YwhereXis an arbitrary domain, andY=f0;1g. In exact learning, there is a hid- den conceptf from a known class of conceptsC, and the problem is to exactly identify the functionf.
Algorithms in the exact learning model obtain infor- mation about f, the target concept, by querying two or- acles, a membership oracle and an equivalence oracle. A membership oracle forf answers membership queries, which are of the form “What isf(x)?” wherex2X(point evaluation queries). The membership oracle responds with the valuef(x). An equivalence oracle forf answers equiv- alence queries, which are of the form “Ish f?” whereh
is a representation of a concept defined on the domainX. Representation his called a hypothesis. The equivalence oracle responds “yes” ifh(x) = f(x) for allx2X. Other- wise, it returns a counterexample, a valuex2Xsuch that
f(x)¤h(x).
The exact learning model is due to Angluin [2]. Angluin viewed the combination of membership and equivalence oracles as constituting a “minimally adequate teacher.” Equivalence queries can be simulated both in Valiant’s well-known PAC model, and in the on-line mistake-bound learning model.
LetRbe a set of representations of concepts, and let
CR be the associated set of concepts. For example, if R
were a set of DNF formulas, thenCRwould be the set of
Boolean functions (concepts) represented by those formu- las. An exact learning algorithm is said to learnRif, given access to membership and equivalence oracles for anyf in
CR, it ends by outputing a hypothesishthat is a represen-
tation off.
Query Complexity of Exact Learning
There are two aspects to the complexity of exact learning, query complexity and computational complexity. The re- sults of Hellerstein et al. concern query complexity.
The query complexity of an exact learning algorithm measures the number of queries it asks and the size of the hypotheses it uses in those queries (and as the final out- put). We assume that each representation classRhas an associated size function that assigns a non-negative num- ber to eachr2R. The size of a conceptcwith respect toR, denoted byjcjR, is the size of the smallest representation of cinR; ifc62cR,jcjR=1. Ideally, the query complexity of
an exact learning algorithm will be polynomial in the size of the target and other relevant parameters of the problem. Many exact learning results concern learning repre- sentations of Boolean functions. Algorithms for learning such classesRare said to have polynomial query com- plexity if the number of hypotheses used, and the size of those hypotheses, is bounded by some polynomialp(m,n), wherenis the number of variables on which the targetf
is defined, andm=jfjR. We assume that algorithms for
learning Boolean representation classes are given the value ofnas input.
Since the number and size of queries used by an al- gorithm is a lower bound on the time taken by that al- gorithm, query complexity lower bounds imply computa- tional complexity lower bounds.
Improper Learning and the Halving Algorithm An algorithm for learning a representation classRis said to be proper if all hypotheses used in its equivalence queries are fromRC, and it outputs a representation fromRC. Oth-
erwise, the algorithm is said to be improper.
WhenRC is a finite concept class, defined on a finite
domainX, a simple, generic algorithm called the halving algorithm can be used to exactly learn Rusing logjRCj
equivalence queries and no membership queries. The halv- ing algorithm is based on the following idea. For any
VRC, define the majority hypothesis MAJV to be the
concept defined onXsuch that for allx2X, MAJV(x) = 1
if g(x) = 1 for more than half the conceptsg inV, and MAJV(x) = 0 otherwise. The halving algorithm begins by
settingV=RC. It then repeats the following:
1. Ask an equivalence query with the hypothesis MAJV.
2. If the answer is yes, then output MAJV.
3. Otherwise, the answer is a counterexamplex. Remove fromVallgsuch thatg(x) = MAJV(x).
Each counterexample eliminates the majority of the el- ements currently inV, so the size ofVis reduced by a fac- tor of at least 2 with each equivalence query. It follows that the algorithm cannot ask more than log2jRCjqueries.
The halving algorithm cannot necessarily be imple- mented as a proper algorithm, since the majority hypothe- ses may not be representable inRC. Even when they are
Certificate Complexity and Exact Learning
C
133 representable inRC, the representations may be exponen-tially larger than the target concept. Proper Learning and Certificates
In the exact model, the query complexity of proper learn- ing is closely related to the size of certain certificates.
For any conceptfdefined on a domainX, a certificate thatf has propertyPis a subset SX such that for all conceptsgdefined onX, ifg(x) = f(x) for allx2X, then
g has property P. The size of the certificate Sis jSj, the number of elements in it.
We are interested in properties of the form “g is not a member of the concept class C”. To take a sim- ple example, let D be the class of constant-valued n- variable Boolean functions, i. e.Dconsists of the two func- tions f1(x1; : : : ;xn) = 1 and f2(x1; : : : ;xn) = 0. Then if g is ann-variable Boolean function that is not a mem- ber of D, a certificate that g is not in C could be just a paira2 f0;1gnandb2 f0;1gnsuch thatg(a) = 1 and
g(b) = 0.
ForCa class of concepts defined onX, define the ex- clusion dimension ofCto be the maximum, over all con- ceptsgnot inC, of the size of the smallest certificate thatg
is not inC. LetX D(C) denote the exclusion dimension of
C. In the above example,X D(C) = 2. Key Results
Theorem 1 Let R be a finite class of representations. Then there exists a proper learning algorithm in the exact model that learns C using at most X D(C) logjCj queries. Fur- ther, any such algorithm for C must make at least X D(C)
queries.
Independently, Hegedüs proved a theorem that is essen- tially identical to the above theorem. The algorithm in the theorem is a variant of the ordinary halving algorithm. As noted by Hegedüs, a similar result to Theorem 1 was proved earlier by Moshkov, and Moshkov’s techniques can be used to improve the upper bound by a factor of 1/X D(C).
An extension of the above result characterizes the rep- resentation classes that have polynomial query complex- ity. The following theorem presents the extended result as it applies to representation classes of Boolean func- tions.
Theorem 2 Let R be a class of representations of Boolean functions. Then there exists a proper exact learning algo- rithm for R with polynomial query complexity iff there ex- ists a polynomial p(m, n) such that for all m;n>0, and
all n-variable Boolean functions g, if the size of g is greater than m, then there exists a certificate of size at most p(m, n) proving thatjgjR>m.
A concept class having certificates of the type specified in this theorem is said to have polynomial certificates.
The algorithm in the above theorem does not run in polynomial time. Hellerstein et al. give a more complex al- gorithm that runs in polynomial time using a˙P
4 oracle,
providedRsatisfies certain technical conditions. Köbler and Lindner subsequently gave an algorithm using a˙P 2
oracle [10].
Theorem 2 and its generalization give a technique for proving bounds on proper learning in the exact model. Proving upper bounds on the size of the appropriate cer- tificates yields upper bounds on query complexity. Proving lower bounds on the size of appropriate certificates yields lower bounds on query complexity and hence also on time complexity. Moreover, unlike many computational hardness results in learning, computational hardness re- sults achieved in this way do not rely on any unproven complexity theoretic or cryptographic hardness assump- tions.
One of the most widely studied problems in compu- tational learning theory has been the question of whether DNF formulas can be learned in polynomial time in com- mon learning models. The following result on learning DNF formulas was proved using Theorem 2, by bounding the size of the relevant certificates.
Theorem 3 There is a proper algorithm that learns DNF formulas in the exact model with query complexity bounded above by a polynomial p(m;r;n), where m is the size of the smallest DNF representing the target function f , n is the number of variables on which f is defined, and r is the size of the smallest CNF representing f .
The size of a DNF is the number of its terms; the size of a CNF is the number of its clauses. The above theorem does not imply polynomial-time learnability of arbitrary DNF formulas, since the running time of the algorithm de- pends not just on the size of the smallest DNF representing the target, but also on the size of the smallest CNF.
Building on results of Alekhnovich et al., Feldman showed that if NP6= RP, DNF formulas cannot be learned in polynomial time in the PAC model augmented with membership queries. The same negative result then fol- lows immediately for the exact model [1,6]. Hellerstein and Raghavan used certificate size lower bounds and The- orem 1 to prove that DNF formulas cannot be learned by a proper exact algorithm with polynomial query complex-
134
C
Channel Assignment and Routing in Multi-Radio Wireless Mesh Networks ity, if the algorithm is restricted to using DNF hypothesesthat are only slightly larger than the target [9].
The main results of Hellerstein et al. apply to learn- ing with membership and equivalence queries. Hellerstein et al. also considered the model of exact learning with membership queries alone, and showed that in this model, a projection-closed Boolean function class is polynomial query learnable iff it has polynomial teaching dimension. Teaching dimension was previously defined by Goldman and Kearns. Hegedüs defined the extended teaching di- mension, and showed that all classes are polynomially query learnable with membership queries alone iff they have polynomial extended teaching dimension.
Balcázar et al. introduced the strong consistency di- mension to characterize polynomial query learnability with equivalence queries alone [5]. The abstract identi- fication dimension of Balcázar, Castro, and Guijarro is a very general measure that can be used to characterize polynomial query learnability for any set of example-based queries [4].
Applications None
Open Problems
It remains open whether DNF formulas can be learned in polynomial time in the exact model, using hypotheses that are not DNF formulas.
Feldman’s results show the computational hardness of proper learning of DNF in the exact learning model based on complexity theoretic assumptions. However, it is un- clear whether the hardness of proper learning of DNF is a result of computational barriers, or whether query com- plexity is also a barrier to efficient learning. It is still open whether the class of DNF formulas has polynomial cer- tificates; showing they do not have polynomial certificates would give a hardness result for proper learning of DNF based only on query complexity, with no complexity theo- retic assumptions (and without the hypothesis-size restric- tions used by Hellerstein and Raghavan). It is also open whether the class of Boolean decision trees has polynomial certificates.
Certificate techniques are used to prove lower bounds on learning when we restrict the type of hypotheses used by the learning algorithm. These types of results are called representation dependent, since they depend on the restriction of the representations used as hy- potheses. Although there are some techniques for prov- ing representation-independent hardness results, there is a need for more powerful techniques.
Recommended Reading
1. Alekhnovich, M., Braverman, M., Feldman, V., Klivans, A.R., Pitassi, T.: Learnability and automatizability. In: FOCS ’04 Pro- ceedings of the 45th Annual IEEE Symposium on Foundations of Computer Science (FOCS’04), pp. 621–630. IEEE Computer Society, Washington (2004)
2. Angluin, D.: Queries and concept learning. Mach. Learn.2(4), 319–342 (1987)
3. Angluin, D.: Queries Revisited. Theor. Comput. Sci.313(2), 175–194 (2004)
4. Balcázar, J.L., Castro, J., Guijarro, D.: A new abstract combina- torial dimension for exact learning via queries. J. Comput. Syst. Sci.64(1), 2–21 (2002)
5. Balcázar, J.L., Castro, J., Guijarro, D., Simon, H.-U.: The con- sistency dimension and distribution-dependent learning from queries. Theor. Comput. Sci.288(2), 197–215 (2002)
6. Feldman, V.: Hardness of approximate two-level logic mini- mization and pac learning with membership queries. In: STOC ’06 Proceedings of the 38th Annual ACM Symposium on The- ory of Computing, pp. 363–372. ACM Press, New York (2006) 7. Hegedüs, T.: Generalized teaching dimensions and the query
complexity of learning. In: COLT ’95 Proceedings of the 8th An- nual Conference on Computational Learning Theory, pp. 108– 117 (1995)
8. Hellerstein, L., Pillaipakkamnatt, K., Raghavan, V., Wilkins, D.: How many queries are needed to learn? J. ACM.43(5), 840– 862 (1996)
9. Hellerstein, L., Raghavan, V.: Exact learning of dnf formulas us- ing dnf hypotheses. J Comput. Syst. Sci.70(4), 435–470 (2005) 10. Köbler, J., Lindner, W.: Oracles in sp2are sufficient for exact
learning. Int. J. Found. Comput. Sci.11(4), 615–632 (2000) 11. Moshkov, M.Y.: Conditional tests. Probl. Kibern. (in Russian)40,
131–170 (1983)