Position C - The Application of Classical Conditioning to the Machine Learning of a Commonsense

To explain the reasoning for this position, a number of measures for the worth of a particular framework are presented first, followed by a discussion of a selection of frameworks in terms of those measures. The discussion is exhaustive in neither the number of frameworks covered nor the depth to which each is examined, so it cannot argue with complete authority that predicate² logic is the best; hence this is merely a position taken by this thesis rather than any stronger sort of assertion. As with the other positions, different weightings of the advantages and disadvantages could lead others to take a different viewpoint, and the reasoning below merely reflects the position of this thesis. Note that, to keep the length of this section in line with its relevance, the descriptions and definitions of each of the frameworks are assumed to be known to the reader.

² In this thesis, predicate logic primarily refers to first-order logic. However, as this thesis does not make use of the quantification elements of first-order logic, the more general term “predicate logic” has been used. In terms of position C, it is believed that quantification will ultimately be needed even though it is not used in this thesis.

There are four measures this thesis has used to determine the worth of a particular framework:

1. Expressiveness. This refers to what limitations (if any) the framework has in expressing a model of an environment.

2. Computational efficiency. This refers to how much memory a model of a given fidelity takes up when expressed in the framework, and how quickly a model’s predictions can be retrieved, added, removed or modified.

3. Inferential capacity. This is how easily the framework can be used to extrapolate from the environmental model that has been provided explicitly.

4. Human readability. This is how easy it is for a human to understand how the model reflects its environment and to modify the model manually.

Four frameworks are discussed: predicate logic, neural networks, automata and Markov models. Each of these frameworks is, in at least some form, Turing-complete and so in theory is able to classify any input that a human can classify, assuming the Church-Turing thesis is correct. This means that any of the frameworks is able, in theory, to express any model of the environment that a human can.

This may not be the case in a practical sense however.

Predicate logic is arguably the most expressive in practical terms of the four frameworks, as it is able to explicitly express all parts of an environmental model, or theory, as such models are called in the literature (where the term model instead refers to a structure that satisfies a theory). Theories created within predicate logic do vary in efficiency, depending on the theory’s complexity, but for real-world cases their efficiency is arguably better than that of any of the other frameworks discussed. Predicate logic allows for considerable inferential capacity through its rules of deduction. This system of knowledge representation also allows a trained human to both read and write knowledge in this format with ease, due to the high degree of modularity in how the knowledge is represented. Alonso (2002) suggested a further benefit of predicate logic, arguing that because predicate logic is easily understood by humans, agents based on it can be safer than agents based on other forms of knowledge representation.
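The modularity and inferential capacity described above can be illustrated with a minimal sketch. It assumes a quantifier-free rule format (a set of premises implying one conclusion) and a simple forward-chaining loop. The fact and rule names are hypothetical and are not taken from this thesis.

```python
# Minimal forward-chaining sketch over quantifier-free rules.
# All fact and rule names below are hypothetical, chosen only for illustration.

def forward_chain(facts, rules):
    """Return every fact derivable from `facts` using `rules`.

    Each rule is a pair (premises, conclusion): if every premise is known,
    the conclusion becomes known.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

facts = {"ball_released", "ball_above_floor"}
rules = [
    (frozenset({"ball_released", "ball_above_floor"}), "ball_falls"),
]
derived = forward_chain(facts, rules)

# Adding knowledge is modular: one rule is appended and no existing rule changes.
rules_extended = rules + [(frozenset({"ball_falls"}), "ball_hits_floor")]
derived_extended = forward_chain(facts, rules_extended)
```

In this sketch the new conclusion only becomes derivable once the extra rule is appended, while the original rule list is left untouched.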

In contrast to the other three formalisms, it is less widely known that some types of neural networks are Turing-complete (McCulloch & Pitts, 1943; Hyötyniemi, 1996). This fact is of limited use in practice, however, as even simple Turing machines require a complex network. As a practical framework, neural networks are able to represent concepts that involve continuums in an efficient manner, by encoding real-valued inputs as input nodes. Neural networks are, however, less efficient at representing binary relationships between inputs (i.e. relationships that either exist or do not exist), as the only way to represent a relationship between a pair of inputs is by applying a high or low weighting to every possible pair of inputs to indicate whether a particular pair has that property. Neural networks are also relatively poor at allowing knowledge to be created within their structure, as concepts can only be input into the system as examples rather than as general rules. The number of examples needed for the system to correctly identify a particular concept could be very high. Once a network is trained on a concept, it is able to identify unseen examples of that concept fairly robustly, and so does demonstrate a reasonable level of inference. Neural networks are poor in terms of human readability and manual modification, since individual neuron weights affect the classifications in non-linear ways that are hard for a human to predict.
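The cost of the pairwise weighting described above can be made concrete with a small counting sketch. This is not a neural network. It only assumes, as a simplification, one weight per ordered pair of distinct inputs, and compares that with listing the related pairs as explicit facts. The inputs and the relation are hypothetical.

```python
# Counting sketch: one high/low weight per ordered pair of inputs,
# compared with storing only the pairs for which the relation holds.
# Inputs and relation are hypothetical, for illustration only.

def dense_pair_weights(inputs, related, high=1.0, low=0.0):
    """One weight per ordered pair of distinct inputs, high if the pair is related."""
    return {
        (a, b): (high if (a, b) in related else low)
        for a in inputs
        for b in inputs
        if a != b
    }

inputs = ["cup", "table", "floor", "ball"]
related = {("cup", "table"), ("ball", "floor")}  # hypothetical "is on" facts
weights = dense_pair_weights(inputs, related)
print(len(weights), "pair weights versus", len(related), "explicit facts")
```

With four inputs the dense encoding already needs twelve weights to carry two facts, and the gap grows quadratically with the number of inputs.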

While automata with the addition of a reversible tape for memory are by definition Turing-complete, the expressiveness of automata is severely curtailed in practice. This is because of the massive proliferation of states that automata require to approximate any continuum or reflect any uncertainty. Automata are nevertheless a very good tool for matching specific patterns, and so were a viable candidate for this thesis. For concrete pattern matching, automata are also the most efficient of all the systems described, since they need to record only one state plus any stack or tape memory, and the time taken to process a single input is constant. Changing the knowledge reflected in an automaton is more efficient than for a neural network, because it can be done without a large set of examples of the change. It is less efficient than for predicate logic, however, because new knowledge necessitates at least some change in the existing structure, whereas predicate logic only requires an extra rule to be added to the existing theory. Due to their discrete nature, automata cannot easily represent any form of knowledge best encoded as a real-valued property. This implies that it is inefficient to create automata that recognise patterns in which a wide range of input values can cause a transition to the same state. Automata have a low inferential capacity because they have no mechanism for extrapolating from existing encoded knowledge. The human readability of automata is better than that of neural networks, but is far behind that of predicate logic.
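Both properties discussed above, constant work per input and state proliferation for continuous values, can be sketched briefly. The automaton below is a hypothetical finite automaton over the symbols "a" and "b" that accepts any string containing "ab". The state count helper assumes one state per discretised value, which is only one possible way of approximating a continuum.

```python
# Hypothetical deterministic finite automaton accepting strings that contain "ab".
TRANSITIONS = {
    ("start", "a"): "seen_a",
    ("start", "b"): "start",
    ("seen_a", "a"): "seen_a",
    ("seen_a", "b"): "matched",
    ("matched", "a"): "matched",
    ("matched", "b"): "matched",
}
ACCEPTING = {"matched"}

def accepts(symbols):
    """Run the automaton; only the current state is stored between inputs."""
    state = "start"
    for symbol in symbols:
        state = TRANSITIONS[(state, symbol)]  # one table lookup per input symbol
    return state in ACCEPTING

def states_for_continuum(low, high, resolution):
    """States needed if each discretised value in [low, high] gets its own state.

    This one-state-per-value scheme is an assumption made for illustration.
    """
    return int(round((high - low) / resolution)) + 1
```

Under that assumption, covering the interval from 0 to 1 at a resolution of 0.01 already takes 101 states for a single real-valued property.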

Markov models share a number of similarities with automata in terms of knowledge representation, such as the need to change existing structures to accommodate new knowledge, though for some kinds of knowledge (such as transition probabilities) the change is minimal. One main difference, due to the probabilistic transitions between states, is that at least some knowledge that is best encoded as a real-valued property can feasibly be represented by the system. The inferential capacity is also greatly improved: for example, the likelihood of a particular pattern occurring can be inferred. The human readability of such a system is, again, similar to that of automata.
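The inference mentioned above, the likelihood of a particular pattern occurring, can be sketched for the simplest case of a first-order Markov chain with fully observed states. The two states and all probabilities are hypothetical values chosen for illustration.

```python
# Hypothetical two-state first-order Markov chain; all numbers are illustrative.
INITIAL = {"still": 0.6, "moving": 0.4}
TRANSITION = {
    "still": {"still": 0.7, "moving": 0.3},
    "moving": {"still": 0.2, "moving": 0.8},
}

def sequence_probability(states):
    """Probability of observing `states` in order under the chain above."""
    if not states:
        return 1.0
    probability = INITIAL[states[0]]
    for previous, current in zip(states, states[1:]):
        probability *= TRANSITION[previous][current]
    return probability

pattern = ["still", "moving", "moving"]
likelihood = sequence_probability(pattern)  # 0.6 * 0.3 * 0.8
```

Updating a single transition probability changes only one entry of the table, which reflects the minimal structural change noted above.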
