

3.4 Learning in DL Knowledge Bases

3.4.3 Concept Induction by Refinement-Based Search

While Algorithm 1 demonstrated the basics of a generate-and-test search approach, it did not define how to generate concepts for testing. We now describe how this can be achieved through the use of refinement operators, as described in Section 3.4.2. Algorithm 2 demonstrates this using a downward refinement operator, ρ.

Algorithm 2: A basic generate-and-test method which uses a refinement operator ρ to search the space of concepts (ℒ, ⊑), and a quality function Q to assess hypothesis performance over the dataset E. A frontier list L is maintained for all candidates to be searched.

 1: L := [⊤]                     ▷ The hypothesis frontier of candidates
 2: S := ∅                       ▷ The set of solutions
 3: while length(L) > 0 do
 4:     C := pop(L)              ▷ Remove C from L
 5:     if Q(C, E) = true then
 6:         S := S ∪ {C}         ▷ Hypothesis C is a solution
 7:     else
 8:         for all C′ ∈ ρ(C) do
 9:             push(C′, L)      ▷ Add refinement C′ to L
10:         end for
11:     end if
12: end while

Algorithm 2 demonstrates how a refinement operator can be incorporated into the search to learn hypotheses. This is a general-purpose algorithm which searches a space of concepts in ℒ from the top concept ⊤ and progressively specialises expressions added to a list. If, on line 9, a candidate concept is added to the head of the list (assuming pop removes candidates from the head), the search proceeds depth-first, whereas if it is added to the tail of the list, the search proceeds breadth-first.
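The following is a minimal Python sketch of Algorithm 2, not a DL implementation. It assumes a toy hypothesis language in which a concept is a conjunction of atomic concept names (a frozenset), the empty conjunction stands in for ⊤, and the refinement operator `rho` adds one conjunct. The names `ATOMS`, `rho`, `generate_and_test` and `contains_a_and_b` are hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import Callable, Deque, FrozenSet, List, Set, Tuple

# Toy stand-in for a DL concept (an assumption of this sketch): a conjunction
# of atomic concept names. The empty conjunction plays the role of the top concept.
Concept = FrozenSet[str]
TOP: Concept = frozenset()
ATOMS = ("A", "B", "C")  # hypothetical atomic concepts


def rho(c: Concept) -> List[Concept]:
    """Toy downward refinement: specialise c by adding one more conjunct."""
    return [c | {a} for a in ATOMS if a not in c]


def generate_and_test(
    is_solution: Callable[[Concept], bool],
    depth_first: bool = False,
) -> Tuple[Set[Concept], List[Concept]]:
    """Return the solutions found and the order in which candidates were tested."""
    frontier: Deque[Concept] = deque([TOP])
    solutions: Set[Concept] = set()
    tested: List[Concept] = []
    while frontier:
        c = frontier.popleft()                       # pop from the head of the list
        tested.append(c)
        if is_solution(c):
            solutions.add(c)
        else:
            for refinement in rho(c):
                if depth_first:
                    frontier.appendleft(refinement)  # push to the head
                else:
                    frontier.append(refinement)      # push to the tail
    return solutions, tested


def contains_a_and_b(c: Concept) -> bool:
    """Hypothetical quality test: the concept must mention both A and B."""
    return {"A", "B"} <= c


bfs_solutions, bfs_order = generate_and_test(contains_a_and_b)
dfs_solutions, dfs_order = generate_and_test(contains_a_and_b, depth_first=True)
```

Both calls find the same solutions and differ only in the order in which candidates are tested. Because the sketch, like Algorithm 2, keeps no record of visited candidates, a concept reachable along several refinement paths is tested more than once.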

The structuring of the search space permits certain quality functions to have the property that if a hypothesis h does not pass a quality test, then neither will any of its (upward, downward) refinements. This property is known as (anti-)monotonicity.

Definition 3.4.9. (Monotonicity, Anti-monotonicity) For all hypotheses C, D ∈ ℒ and the set of all examples ℰ, a quality function Q is known as monotonic iff

∀E ⊆ ℰ : (C ⊑ D) ∧ Q(D, E) → Q(C, E)

and Q is known as anti-monotonic iff

∀E ⊆ ℰ : (C ⊑ D) ∧ Q(C, E) → Q(D, E)

[23].

(Anti-)monotonic quality functions are useful as they permit us to prune potentially large parts of the search space. Once a hypothesis h fails an (anti-)monotonic quality function, we can safely exclude, or prune, all (specialisations) generalisations of h from the search by not considering (downward) upward refinements of h. An example of an (anti-)monotonic quality function is relative frequency.

Definition 3.4.10. (Relative Frequency) Given a hypothesis C and a set of examples E, relative frequency is defined as relFreq(C, E) = |cover(C, E)| / |E| [23].

Example 3.4.11. Consider the quality function relFreq(C, E) ≥ t, where 0 ≤ t ≤ 1, as an anti-monotonic quality criterion for downward refinement. Consider an example where |E| = 100 and t = 0.5, and two hypotheses C, D ∈ ℒ where |cover(C, E)| = 49. Then relFreq(C, E) = 0.49 < t, so C fails the quality function as it does not cover enough examples. Every concept D ∈ ρ(C) refined down from C satisfies D ⊑ C and so can never cover more examples than C; by Definition 3.4.9, every such D also fails, so all refinements of C may be excluded from the search.
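The pruning argument of this example can be checked numerically with a small Python sketch. It assumes a toy setting in which a concept is a conjunction of atomic concept names and an example is the set of atomic concepts an individual belongs to; the dataset of 100 examples, the threshold t = 0.5 and the helper names `cover`, `rel_freq` and `passes` are hypothetical choices made for illustration.

```python
from typing import FrozenSet, List, Sequence

Concept = FrozenSet[str]   # toy concept: a conjunction of atomic concept names
Example = FrozenSet[str]   # toy example: the atomic concepts an individual belongs to


def cover(c: Concept, examples: Sequence[Example]) -> List[Example]:
    """Examples covered by c, i.e. those satisfying every conjunct of c."""
    return [e for e in examples if c <= e]


def rel_freq(c: Concept, examples: Sequence[Example]) -> float:
    """relFreq(C, E) = |cover(C, E)| / |E|."""
    return len(cover(c, examples)) / len(examples)


def passes(c: Concept, examples: Sequence[Example], t: float) -> bool:
    """Quality criterion relFreq(C, E) >= t."""
    return rel_freq(c, examples) >= t


# Hypothetical dataset of 100 examples in which the concept A covers exactly 49.
examples = (
    [frozenset({"A", "B"})] * 30
    + [frozenset({"A"})] * 19
    + [frozenset({"B"})] * 51
)
t = 0.5

c = frozenset({"A"})
d = frozenset({"A", "B"})          # a downward refinement of c (one more conjunct)

c_freq = rel_freq(c, examples)     # 49 covered out of 100
d_freq = rel_freq(d, examples)     # 30 covered out of 100
prune_refinements_of_c = not passes(c, examples, t)
```

Since adding a conjunct can only shrink the set of covered examples, `d_freq` cannot exceed `c_freq`, which is why a search may skip refining `c` once it has failed the threshold.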

By adding (anti-)monotonic quality criteria to a refinement-based search algorithm, we stand to improve the efficiency of the search by excluding hypotheses which can never be considered solutions. Unfortunately, the space of concepts may still be vast even with such pruning, so any frontier list of candidate hypotheses, such as that maintained in Algorithm 2, may still grow infeasibly large. One well-known method for dealing with this problem is simply to fix the maximum size of the frontier list, known as beam search. A beam search approximates the search over all concepts reachable by some refinement operator τ by restricting the search to within a bounded set of candidates. Typically, the restricted-size frontier (known as the beam) is only populated with new hypotheses deemed the best relative to the set of refinements of all hypotheses currently maintained in the beam. When the beam is of infinite width, beam search is equivalent to breadth-first search, or best-first search if the hypotheses are ranked within an infinitely sized beam. In order to rank hypotheses, a utility function is often used, which is also called a heuristic evaluation function.

Definition 3.4.12. (Utility Function) A utility function u : ℒ × S ↦ ℝ maps a pair (C, E), where C ∈ ℒ is a concept expression for some language ℒ and E ∈ S is a set of examples, to a real number in ℝ. A utility function represents the value of a concept in a learning problem relative to the examples it describes from E and can be used to rank concepts C, D, such that u(C, E) < u(D, E) indicates that concept D is preferred over C. Utility functions are often based on measures such as accuracy (Definition 3.3.3) or relative frequency (Definition 3.4.10).

By ordering elements of a beam relative to a utility function u, we can maintain the list of the current best n candidates of the search. In this way, u acts as a heuristic by permitting the search to proceed into parts of the space of concepts deemed most likely to contain solutions, to the exclusion of other parts. Depending on the size of the beam and the behaviour of the heuristic, a search may reach solutions faster, yet it may also exclude subsets of concepts from the space which contain the best solutions. This is why such methods are known to be approximate: they are not complete, and may inadvertently confine a search to a sub-space of concepts where the best solutions are not present, as illustrated in Figure 3.3, where a search may become trapped in sub-optimal local maxima.

A basic beam search algorithm by downward refinement which maintains a set of best hypotheses relative to a utility function u is shown as Algorithm 3. This algorithm also incorporates an anti-monotonic quality function Q to determine if a refined hypothesis should be added to the frontier, or pruned. Note that while accuracy (Definition 3.3.3) can be used to rank hypotheses as a utility function, it is neither monotonic nor anti-monotonic, so cannot be used for pruning hypotheses from a search, unlike relative frequency.
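The remark that accuracy is neither monotonic nor anti-monotonic can be illustrated with a short Python sketch. It assumes the usual notion of accuracy as the fraction of correctly classified examples (covered positives plus uncovered negatives), which may differ in detail from Definition 3.3.3, and a toy language in which a concept is a conjunction of atomic concept names. The examples and the name `accuracy` are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Concept = FrozenSet[str]                         # toy concept: a conjunction of atoms
LabelledExample = Tuple[FrozenSet[str], bool]    # (atomic concepts, is_positive)


def accuracy(c: Concept, examples: List[LabelledExample]) -> float:
    """Fraction of examples classified correctly by c:
    covered positives plus uncovered negatives, over all examples."""
    correct = sum(1 for attrs, positive in examples if (c <= attrs) == positive)
    return correct / len(examples)


# Hypothetical labelled examples.
examples: List[LabelledExample] = [
    (frozenset({"A"}), True),
    (frozenset({"A", "B"}), True),
    (frozenset({"B"}), False),
    (frozenset({"C"}), False),
]

top = frozenset()                  # the most general concept
a = frozenset({"A"})               # a specialisation of top
a_and_c = frozenset({"A", "C"})    # a specialisation of a

# Accuracy along the chain top, a, a_and_c of successive specialisations.
scores = [accuracy(c, examples) for c in (top, a, a_and_c)]
```

Along this chain the accuracy first rises and then falls. With a threshold such as 0.75, the concept A passes while both its generalisation ⊤ and its specialisation fail, so a failing hypothesis tells us nothing about its refinements in either direction.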

Algorithm 3: A basic best-first beam search with downward refinement operator ρ to search the space of concepts (ℒ, ⊑) relative to examples E, where u is a utility function ranking better hypotheses with larger values, and where Q is an anti-monotonic quality function assessing if hypotheses can be considered solutions. The maximum beam width is denoted by b_max.

 1: B := {⊤}                                ▷ The hypothesis frontier beam of search candidates
 2: S := ∅                                  ▷ The set of solutions
 3: while |B| > 0 do                        ▷ While the frontier beam is non-empty
 4:     X := ∅                              ▷ Initialise the expansion set
 5:     for all C ∈ B do
 6:         for all D ∈ ρ(C) do
 7:             if Q(D, E) = true then      ▷ Hypothesis D is a sufficient candidate
 8:                 S := S ∪ {D}            ▷ Capture solution D
 9:             else
10:                 X := X ∪ {D}            ▷ Include D in the expansion set
11:             end if
12:         end for
13:     end for
14:     B := ∅                              ▷ Reinitialise the beam
15:     while |X| > 0 and |B| < b_max do
16:         D ∈ arg max_{D′ ∈ X} u(D′)      ▷ Arbitrary best refinement
17:         X := X \ {D}
18:         B := B ∪ {D}                    ▷ Include D in the next beam
19:     end while
20: end while

Figure 3.3: A graph where the curve represents the space of all hypotheses (horizontal axis) against their performance (vertical axis). An algorithm (such as a beam search) which limits the search to the shaded region may only find hypotheses at local maximum L as being best, and will fail to locate the best solution(s) at the global maximum G.

One method of mitigating the risk that Algorithm 3 becomes trapped in a part of the search space around a local maximum is to introduce randomness into the search. One such method is known as stochastic beam search. In stochastic beam search, the reinitialisation of the next beam (Lines 14 to 19) is modified to select candidates at random with a probability proportional to a function of their utility. A common method is to use the Gibbs distribution e^(−c(D)/T) for a concept D and some value T ∈ ℝ, where c : ℒ ↦ ℝ is a cost function which may be based on a utility function u. This distribution reflects the intuition that stronger hypotheses should be selected with greater probability than weaker ones. Stochastic refinement is another approach, which incorporates such random selection directly into the behaviour of a refinement operator which refines to new candidates with certain probabilities [98].
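Algorithm 3 and its stochastic variant can be sketched in Python as follows. This is an illustration under stated assumptions rather than a definitive implementation: concepts are toy conjunctions of atomic names, the utility `u` is taken to be accuracy on a small hypothetical dataset, a solution is a concept with accuracy 1, and the Gibbs weights use the cost c(D) = −u(D), so that a candidate is drawn with probability proportional to e^(u(D)/T). All names (`top_b`, `gibbs`, `beam_search`, `EXAMPLES`) are hypothetical.

```python
import math
import random
from typing import Callable, FrozenSet, List, Set, Tuple

Concept = FrozenSet[str]          # toy concept: a conjunction of atomic concept names
TOP: Concept = frozenset()        # stands in for the top concept
ATOMS = ("A", "B", "C", "D")      # hypothetical atomic concepts

# Hypothetical labelled examples: (atomic concepts of the individual, is_positive).
EXAMPLES: List[Tuple[FrozenSet[str], bool]] = [
    (frozenset("AB"), True), (frozenset("ABC"), True),
    (frozenset("A"), False), (frozenset("BC"), False),
    (frozenset("C"), False), (frozenset("D"), False),
]


def rho(c: Concept) -> List[Concept]:
    """Toy downward refinement: add one more conjunct."""
    return [c | {a} for a in ATOMS if a not in c]


def u(c: Concept) -> float:
    """Utility: accuracy of c on EXAMPLES (an assumed choice of measure)."""
    return sum((c <= attrs) == pos for attrs, pos in EXAMPLES) / len(EXAMPLES)


def is_solution(c: Concept) -> bool:
    """Assumed quality test: c classifies every example correctly."""
    return u(c) >= 1.0


Selector = Callable[[Set[Concept]], Set[Concept]]


def top_b(b_max: int) -> Selector:
    """Deterministic reinitialisation (Lines 14 to 19): keep the b_max best by utility."""
    def select(expansion: Set[Concept]) -> Set[Concept]:
        # Ties are broken by the sorted atom names so that runs are repeatable.
        ranked = sorted(expansion, key=lambda d: (u(d), sorted(d)), reverse=True)
        return set(ranked[:b_max])
    return select


def gibbs(b_max: int, temperature: float, rng: random.Random) -> Selector:
    """Stochastic reinitialisation: draw without replacement with probability
    proportional to exp(-c(D)/T), taking the cost c(D) = -u(D)."""
    def select(expansion: Set[Concept]) -> Set[Concept]:
        pool = sorted(expansion, key=sorted)   # fixed order for a seeded rng
        beam: Set[Concept] = set()
        while pool and len(beam) < b_max:
            best = max(u(d) for d in pool)
            # Shifting by the best utility leaves the proportions unchanged
            # and keeps exp() from overflowing at low temperatures.
            weights = [math.exp((u(d) - best) / temperature) for d in pool]
            chosen = rng.choices(pool, weights=weights, k=1)[0]
            pool.remove(chosen)
            beam.add(chosen)
        return beam
    return select


def beam_search(select: Selector) -> Set[Concept]:
    beam: Set[Concept] = {TOP}
    solutions: Set[Concept] = set()
    while beam:
        expansion: Set[Concept] = set()
        for c in beam:
            for d in rho(c):
                if is_solution(d):
                    solutions.add(d)       # capture solution d
                else:
                    expansion.add(d)       # candidate for the next beam
        beam = select(expansion)
    return solutions


greedy_solutions = beam_search(top_b(b_max=2))
stochastic_solutions = beam_search(
    gibbs(b_max=2, temperature=0.1, rng=random.Random(0))
)
```

Passing the reinitialisation step as a function keeps the outer loop identical for both variants. A lower temperature makes the stochastic selection behave more like the deterministic one, while a higher temperature spreads the beam over weaker candidates; in this toy space the stochastic search may miss the single solution if neither A nor B is drawn into the first beam.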