2.2 Experience-based Learning
2.2.4 Cognitive Approaches
This section reviews two approaches of a more cognitive nature that stem from Artificial Intelligence (AI). While still based on the agent’s own experience, they provide mechanisms that make the agent aware of different conditions in the environment.
CLARION The cognitive architecture CLARION (e.g. Sun and Slusarz 2005) was designed to capture implicit and explicit learning processes in humans. The main assumption is that there are two different levels of learning: A subsymbolic ‘bottom’ level and a symbolic ‘top’ level. The ‘bottom’ level represents low-skill, often repetitive tasks for which learning proceeds in a trial-and-error fashion. Knowledge on this level is typically not accessible, and it is difficult to express such skills with language. On the symbolic level, knowledge is directly accessible and can be expressed with language. This level typically represents more complex knowledge. It can be acquired by experience, but also by explicit teaching.
The input state is made up of a number of dimensions, and each dimension may specify a number of possible values or value ranges. Action selection takes place either by RL on the bottom level or by firing production rules on the top level. Which level is used is determined stochastically. After the action has been performed, the top and bottom levels are updated with the feedback received from the environment.
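The stochastic choice between the two levels can be sketched as follows. This is a minimal illustration under stated assumptions, not CLARION's actual implementation: the function name `choose_action`, the parameter `p_top`, the dictionary standing in for the bottom-level network, and the greedy choice over Q-values are all hypothetical.

```python
import random

def choose_action(state, rules, q_values, actions, p_top=0.5, rng=random):
    """Pick an action from the top (rules) or bottom (Q-values) level.

    state    -- tuple of input-dimension values
    rules    -- dict mapping a state tuple to an action (assumed rule format)
    q_values -- dict mapping (state, action) to an estimated value;
                a stand-in for the bottom-level neural net
    p_top    -- assumed probability of consulting the top level
    """
    use_top = rng.random() < p_top
    if use_top and state in rules:
        # A production rule fires on the symbolic level.
        return rules[state], "top"
    # Otherwise use the bottom level: greedy choice over Q-values.
    best = max(actions, key=lambda a: q_values.get((state, a), 0.0))
    return best, "bottom"
```

With `p_top=1.0` the agent always consults its rules when one matches; with `p_top=0.0` it relies on the Q-values alone.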
At the bottom level, the RL mechanism is implemented with a neural net. The input layer consists of the values of the input state. Three intermediate layers are used to compute Q-values (allowing memory of action sequences), while the fourth layer chooses an action according to standard reinforcement learning (similar to equation (2.10)).
At the top level, rule conditions are constructed from the input dimensions, and rule consequents from the actions available to the agent. For compatibility with the bottom level, the rules are implemented as a network. Rule extraction, specialisation and generalisation are determined by feedback from the subsymbolic level. If no rule matches the current state and the action performed well according to some performance criterion, a new rule is created with the current state as its condition and the action performed by the bottom level as its consequent. If rules matching the current condition exist and the action was successful, the matching rules are replaced by a generalised version, obtained by adding another input value to the condition. The covered rules are deactivated, but may be reactivated if specialisation is applied to the new rule at a later stage. Conversely, specialisation means the removal of an input value from the condition and is triggered when an action was not successful in the specified condition. Deactivated rules are reactivated if the specialised rule no longer covers them. An information gain measure that estimates the performance of rules under different conditions serves as the success criterion.
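The three rule operations can be illustrated with a small sketch. It assumes a hypothetical representation in which a condition maps each input dimension to the set of values it accepts; the function names are invented for the example, and the information gain criterion that decides when each operation is applied is left out.

```python
def rule_matches(condition, state):
    """A condition maps each dimension to the set of values it accepts."""
    return all(state[dim] in allowed for dim, allowed in condition.items())

def extract_rule(state, action):
    """Extraction: the new rule covers exactly the current state."""
    return {"condition": {dim: {val} for dim, val in state.items()},
            "action": action}

def generalise(rule, dim, value):
    """Generalisation: a copy that additionally accepts `value` on `dim`."""
    cond = {d: set(v) for d, v in rule["condition"].items()}
    cond[dim].add(value)
    return {"condition": cond, "action": rule["action"]}

def specialise(rule, dim, value):
    """Specialisation: a copy that no longer accepts `value` on `dim`."""
    cond = {d: set(v) for d, v in rule["condition"].items()}
    cond[dim].discard(value)
    return {"condition": cond, "action": rule["action"]}

# Usage: extract a rule, widen it, then narrow it again.
state = {"food": "low", "group": "big"}
rule = extract_rule(state, "contribute")
wider = generalise(rule, "food", "high")
narrower = specialise(wider, "food", "high")
```

Because both operations return copies, the original rule stays available and could be reactivated later, in the spirit of the deactivation mechanism described above.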
This model is applied in Sun and Naveh (2007) to a ‘stone-age economics’ simulation in which agents belonging to a group collect and contribute food. An agent might cheat and not contribute, which is punished with some probability. The authors show that their adaptive agents are able to reproduce results obtained earlier with the same model under more deterministic strategies (Cecconi and Parisi 1998). They also investigate the properties of the emergent survival strategies. For example, it turns out that relying more strongly on the top level enhances performance, and that higher probabilities of rule generalisation are beneficial only when less importance rests with the bottom layer.
Learning Classifier Systems Learning Classifier Systems (LCS) also aim at the extraction of rules. The basic idea is to start with a set of initial rules (classifiers) and to evolve this set over time by applying mechanisms that modify, delete and add rules. Whereas earlier LCS, as introduced by Holland (1975), relied mostly on the Genetic Algorithms paradigm, newer versions have more in common with RL approaches and so have also been described as generalised RL (Sigaud and Wilson 2007).
An LCS consists of a population of classifiers. A classifier contains a condition part, an action part, and an estimate of the expected reward. Typically, the condition part consists of the three basic tests 0 (property does not exist), 1 (property exists) and #, where # represents a generalisation and matches either 0 or 1. A classifier has one action as its consequent, but typically several classifiers match a condition in the environment and hence compete with each other. The action to be executed is then selected according to some RL mechanism (e.g., the ϵ-greedy policy, which selects the best-performing action with probability 1 − ϵ and tries a random action with probability ϵ, 0 < ϵ < 1). Many LCS use a Genetic Algorithm to create new rules by selecting and recombining the fittest classifiers from the population (where fitness is, e.g., the expected reward received from the environment). A covering operator is called whenever the set of matching classifiers is empty. The operator adds to the population a classifier that matches the current situation and has a randomly chosen action. Sophisticated systems may limit the population size and add corresponding eviction and generalisation procedures.
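Matching with the # wildcard, the covering operator and ϵ-greedy selection can be sketched as follows. This is an illustrative fragment rather than any particular LCS implementation: the dictionary layout of a classifier and the function names are assumptions, and the Genetic Algorithm step is omitted.

```python
import random

def condition_matches(condition, state):
    """'#' in the condition matches either bit of the state string."""
    return all(c == "#" or c == s for c, s in zip(condition, state))

def match_set(population, state, actions, rng=random):
    """Return the matching classifiers; apply covering if none match."""
    matched = [cl for cl in population
               if condition_matches(cl["condition"], state)]
    if not matched:
        # Covering: add a classifier for the current state
        # with a randomly chosen action.
        new = {"condition": state, "action": rng.choice(actions), "reward": 0.0}
        population.append(new)
        matched = [new]
    return matched

def select_action(matched, actions, epsilon=0.1, rng=random):
    """Epsilon-greedy: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    # Exploit: action of the matching classifier with the highest
    # expected reward.
    return max(matched, key=lambda cl: cl["reward"])["action"]
```

For the state `"110"`, a classifier with condition `"1#0"` matches, whereas one with condition `"0##"` does not.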
Newer families of classifier systems, like anticipation-based classifier sys- tems (ACS, Butz (e.g 2002)), do not rely on evolutionary methods. They extend the classifier representation with the description of the next state and
build a model of transitions. A specialisation mechanism is applied when the classifier oscillates between correct and incorrect predictions, indicating that a splitting of the condition might improve the match. Generalisation is based on complex algorithms that estimate whether generalisation will re- sult in an improvement (see also Sigaud and Wilson (2007) for an overview of LCS).
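The idea of a classifier that also predicts the next state can be sketched as follows. The sketch assumes a hypothetical classifier layout with an `effect` string in which # means that the attribute stays unchanged; the `oscillates` check over a fixed window is a simplified stand-in for the actual specialisation trigger, not the ACS algorithm itself.

```python
def anticipate(classifier, state):
    """Apply the effect part: '#' leaves the attribute unchanged."""
    return "".join(s if e == "#" else e
                   for e, s in zip(classifier["effect"], state))

def record_outcome(classifier, state, next_state):
    """Store whether the anticipated next state was actually observed."""
    correct = anticipate(classifier, state) == next_state
    classifier["history"].append(correct)
    return correct

def oscillates(classifier, window=4):
    """Assumed heuristic: the recent window holds both correct and
    incorrect predictions, hinting that the condition is too general."""
    recent = classifier["history"][-window:]
    return True in recent and False in recent

# Usage: a classifier predicting that the second attribute becomes 1.
cl = {"condition": "1#", "action": "move", "effect": "#1", "history": []}
record_outcome(cl, "10", "11")   # prediction holds
record_outcome(cl, "11", "10")   # prediction fails
```

After these two observations `oscillates(cl)` is true, which in this sketch would mark the classifier as a candidate for specialisation.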
Applications in Economics have usually used Holland-type classifiers. Markets of different kinds have been modelled using LCS, for example, the market for electricity (Bagnall and Smith 2005), for fish (Kirman and Vriend 2001b), or stock markets (e.g. LeBaron et al 1999).
In Bagnall and Smith (2005), the UK electricity market is modelled. The model contains a number of electricity-generating agents. Each agent must submit an offer bid each day for the amount of electricity it wants to produce. The strategies are determined by three factors: capacity constraints, demand, and capacity premiums (for particular time slots in a trading period). From these factors, a 10-bit state vector is constructed that denotes different demand, constraint and premium situations. The model is used to examine various scenarios; for example, the authors reproduce actually observed bidding behaviour.
The model of Kirman and Vriend (2001b) represents a wholesale fish market in which buyers and sellers are matched. Buyers resell the fish, and their payoff is given by the difference between the fixed price they receive and the price they pay. Analogously, sellers’ profit is determined by the difference between the selling price and their costs. Classifiers are used for several decisions, such as deciding stock levels or buying and selling prices. Furthermore, buyers may become loyal by choosing to return to a seller; sellers remember their customers and may reward loyalty by lowering their ask price. It turns out that loyalty develops as buyers and sellers simultaneously realise its benefits: returning customers allow a seller to plan stock better and provide a continuous flow of profit, for which lower prices are accepted; because of these lower prices, customers learn to return.
The stock market model of LeBaron et al (1999) aims to reproduce actual stock market behaviour in an artificial stock market. In the market, there are trader agents whose task is to make forecasts about the future price of assets. The expected price enters their demand functions, which then determine the amount of assets to purchase. The agents base their forecasts on hypotheses or candidate rules, of which each agent maintains 100. These rules map conditions of the environment into forecasts. The state vector is 12 bits long. The conditions are given by dividend/price ratios and comparisons between the current price and average prices, which describe the value of an asset given the market conditions. LeBaron et al (1999) are able to reproduce features of price time series taken from real markets.
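How a condition-forecast rule might feed into a demand function can be sketched as follows. The details are assumptions made for illustration and are not taken from the description above: the linear forecast with parameters `a` and `b`, the choice of the matching rule with the lowest recorded error, and the specific demand formula with its `interest`, `risk` and `variance` parameters are all hypothetical.

```python
def bits_match(condition, state):
    """'#' in the condition matches either bit of the 12-bit state."""
    return all(c == "#" or c == s for c, s in zip(condition, state))

def forecast(rules, state, price, dividend):
    """Forecast next period's price plus dividend with the matching
    rule that has the lowest recorded forecast error (assumed choice)."""
    active = [r for r in rules if bits_match(r["condition"], state)]
    if not active:
        return None
    best = min(active, key=lambda r: r["error"])
    # Assumed linear forecast in the current price and dividend.
    return best["a"] * (price + dividend) + best["b"]

def demand(expected_value, price, interest=0.25, risk=0.5, variance=1.0):
    """Assumed demand: expected excess return over holding cash,
    scaled down by risk aversion and forecast variance."""
    return (expected_value - price * (1.0 + interest)) / (risk * variance)

# Usage with two hypothetical rules and a 12-bit market state.
rules = [
    {"condition": "1###########", "a": 1.0, "b": 2.0, "error": 0.5},
    {"condition": "10##########", "a": 0.9, "b": 0.0, "error": 2.0},
]
state = "101010101010"
expected = forecast(rules, state, price=10.0, dividend=1.0)
shares = demand(expected, price=10.0)
```

Both rules match the state here, and the first one is used because its recorded error is lower.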
Summarising, LCS are a way to represent learning where the environment is dynamic and it is unclear which of the possible rules are best for the agent’s performance. They are, in principle, a directed search among candidate rules: starting from a large set of possible rules, those that perform best in the agent’s environment are selected. Weaknesses of LCS have been addressed in the newer approaches, for example by modelling state transitions. However, the mechanisms that decide when to apply generalisation and specialisation are complex. In this sense, LCS can become relatively ‘heavy’ models of mental processes. It has been suggested that using simpler RL methods is sometimes easier and more tractable (e.g. Holland et al 2000).