Attention Module - Cognitive Model Architecture

5.2 Cognitive Model Architecture

5.2.5 Attention Module

Returning to the conceptual model, it is clear that perception is a key gateway within the model. The standard perception model handles awareness: a listing of entities and affordances. An agent’s stress level, physiology, emotions, and learning all affect this vital process. By default, a PMFServ agent perceives all the entities and affordances in its environment and evaluates them. The attention model places a filter over this process, limiting the number of entities and actions that can be evaluated. It also calculates an attentional salience factor. This salience function calculates the level of attention focused on another agent performing an action that has certain results. This salience determines the probability that an event in the environment will be noticed. A noticed action can generate emotions and allow an agent to learn a new affordance.

Table 5.6: Theories Implemented in Attention Model

Theory              Source                      Implementation
Affordances         J. J. Gibson (1986)         Composite
Novelty             James (1890)                Direct
Repeated Exposures  Ray and Sawyer (1971)       Emergent
Selection           Simons and Chabris (1999)   Direct
Motivation          Fazio et al. (1994)         Composite (Partial)
Salience            Treisman and Gelade (1980)  Composite (Partial)

The attention model is a composite of smaller models implementing the subcomponents of salience. The constituent attention theories for this model are displayed in Table 5.6. Salience determines the likelihood of observing an event, relative to other events occurring simultaneously. Submodels for attention calculate the motivation, novelty, and selection factors for an event. Each of these models is implemented only to handle attention to semantics: it is assumed that the physical properties (syntax) are equally noticeable. Models for signal quality, duration, and frequency are not implemented at this time.

Novelty Model

Novelty is a theoretical construct that indicates how “new” a stimulus appears (James, 1890). Novelty and familiarity would seem to have an inverse connection with respect to exposures (Johnston, Hawley, Plewe, Elliott, & DeWitt, 1990). To harness this, the novelty model accesses a record of the number of exposures for each action and agent over time and calculates a novelty factor based on the level of familiarity. The novelty model accomplishes this by reading from the memory model, which has functions to count the number of exposures and to calculate a familiarity value. This familiarity value is explained in the section on memory models, Section 5.2.3. For any given event, the novelty is calculated as the root mean square (RMS) of the unfamiliarity (one minus the familiarity) of the actor of the event and of the action of the event. The novelty calculation for an event is shown in Equation 5.5, where $f_{Actor}$ is the familiarity of the event’s actor and $f_{Action}$ is the familiarity of the event’s action according to the memory model.

$$
\mathrm{Novelty}(Event) = \sqrt{0.5\left((1 - f_{Actor})^2 + (1 - f_{Action})^2\right)} \tag{5.5}
$$

This representation was chosen because it allows a high degree of novelty if either component is novel. This dynamic was chosen because it allows representation of processes such as dishabituation, where adding an additional stimulus can restore responding to a habituated (familiar) stimulus. In this context, the response of interest is active attention. This implementation allows a return to novelty when a highly familiar person suddenly engages in a totally new action. Conversely, if a straight average were used, then a completely familiar person could be at most 50% novel. Alternatively, taking the maximum novelty component would go too far in the opposite direction: giving no additional novelty to a new person doing a new action as opposed to a new person doing an old action. While a root mean square may not be the best representation for combining these terms, it parsimoniously represents these important dynamics within the simulation.
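The trade-off between the three combination rules can be checked numerically. The following is a minimal Python sketch of Equation 5.5 alongside the two rejected alternatives (straight average and maximum); the function names are illustrative assumptions and are not taken from PMFServ.

```python
import math


def novelty_rms(f_actor: float, f_action: float) -> float:
    """Equation 5.5: RMS of the actor and action unfamiliarity (1 - familiarity)."""
    return math.sqrt(0.5 * ((1.0 - f_actor) ** 2 + (1.0 - f_action) ** 2))


def novelty_mean(f_actor: float, f_action: float) -> float:
    """Rejected alternative: straight average of the two unfamiliarity values."""
    return 0.5 * ((1.0 - f_actor) + (1.0 - f_action))


def novelty_max(f_actor: float, f_action: float) -> float:
    """Rejected alternative: the larger of the two unfamiliarity values."""
    return max(1.0 - f_actor, 1.0 - f_action)


if __name__ == "__main__":
    # (actor familiarity, action familiarity) for three illustrative cases.
    cases = {
        "familiar actor, new action": (1.0, 0.0),
        "new actor, old action": (0.0, 1.0),
        "new actor, new action": (0.0, 0.0),
    }
    for label, (fa, fx) in cases.items():
        print(label, novelty_rms(fa, fx), novelty_mean(fa, fx), novelty_max(fa, fx))
```

For a fully familiar actor performing a completely new action, the RMS form gives about 0.71, the average gives 0.5, and the maximum gives 1.0. Only the RMS form both exceeds 0.5 in that case and still distinguishes a new actor performing a new action (1.0) from a new actor performing an old one (about 0.71).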

Repeated Exposures Model

Numerous studies have shown the cumulative impact of multiple exposures and repetition on the likelihood of attention and the impact of persuasive messages (Ray et al., 1971; Ray & Sawyer, 1971). From Ray and Sawyer (1971), it can be observed that across experiments the recall probability of a message tends to have its highest increase with the first exposure. The next 5 exposures to an advertisement have less impact and tend to have either equal impact (linear curve) or decreasing impact (sigmoidal). Further exposures tend either to result in nearly full recall (hitting the upper bound) or to contribute minimally to recall. The Ebbinghaus (1913) learning curve takes on a sigmoid-type function, so this is assumed to be the family of curves that repetition takes on with respect to recall (due to some combination of attention and learning). The persuasive impact of messages is a more complicated issue because it appears to be a function of the persuasiveness of the message. Some messages appear to have little impact, regardless of the number of exposures, while others increase as a function of exposures. This seems to indicate that the impact of repeated exposures is dictated by processing of the content, and not necessarily by familiarity with the message.

While these represent an increased cumulative impact, empirical studies do not indicate repeated exposure effects that cannot be explained by other cognitive components. The persuasion of a message appears to be largely dictated by its content and processing, while learning is modeled by other parts of the agent cognitive model. As such, no explicit repetition model was implemented, since its key dynamics are present in the memory model and the novelty model. The memory model captures a record of attended and stored exposures for each agent. The novelty model provides a decreasing impact for each additional exposure, capturing one typical dynamic of repetition on learning. Through these dynamics, the effects of repetition should emerge: greater total familiarity with the presented message and decreased impact of additional exposures.

Selective Attention Model

Selective attention is a construct that refers to the additional probability of perceiving events performed on an object that an agent actively perceives, as opposed to other peripheral events (Simons & Chabris, 1999). Selective attention is implemented by having agents keep a record of the objects and agents they are actively attending to at the current time. PMFServ agents are able to actively take actions on other agents, including actions of active perception (watching). As such, the selective attention model records all entities that an agent is currently acting upon. This means that selective attention is focused on any targets being watched or acted upon by an agent. This allows agents to choose who will be the target of their selective attention, as is observed in the cocktail party effect (Cherry, 1953).
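As a minimal sketch of this bookkeeping, the Python function below assumes the set of currently targeted entities is available and splits attention evenly across it, as Equation 5.6 specifies. The function name and the use of strings as entity identifiers are illustrative assumptions, not part of PMFServ.

```python
from typing import Hashable, Iterable


def selective_attention(entity: Hashable, targeted: Iterable[Hashable]) -> float:
    """Equation 5.6: 1/N if the entity is among the N targeted entities, else 0."""
    targets = set(targeted)
    if entity in targets:
        return 1.0 / len(targets)
    return 0.0


if __name__ == "__main__":
    # An agent watching a single target focuses all selective attention on it.
    print(selective_attention("guard_1", {"guard_1"}))
    # With two simultaneous targets the attention is split evenly.
    print(selective_attention("guard_1", {"guard_1", "prisoner_3"}))
    # Entities that are not targeted receive no selective attention.
    print(selective_attention("prisoner_7", {"guard_1", "prisoner_3"}))
```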

$$
\mathrm{SelectiveAttention}(x) =
\begin{cases}
\dfrac{1}{N} & \text{if } x \in X_{Targeted} \\
0 & \text{if } x \notin X_{Targeted}
\end{cases}
\tag{5.6}
$$

If an agent is allowed to engage in multiple actions simultaneously, their total selective attention is spread evenly across those objects. Equation 5.6 displays the selective attention focusing calculation, where $X_{Targeted}$ represents the set of all entities targeted by an agent’s actions, $N$ is the number of entities in $X_{Targeted}$, and $x$ is some entity from the simulation. At present, no mechanism exists to preferentially apply selective attention to certain agents or objects. In the simulated scenarios explored later, agents are only able to engage in one action at a time, so selective attention will always be fully focused on one entity.

Motivated Attention Models

Motivated attention is a construct that refers to the additional attention given to events that correspond with the needs, wants, and other motivations of an agent (Fazio et al., 1994). Motivation is the most complex submodel of salience. It calculates a motivation factor based upon the characteristics of the action as compared to the agent’s current state. Motivation has two components in this implementation: outcomes (central) and social (peripheral). The outcomes from the action can be motivating, such as seeing someone eat when you are hungry. The social component would be the motivation to watch someone eat because you enjoy their company. Outcome motivation is calculated as a congruence between the agent’s current needs on their GSP (goals, standards, and preferences) and the activations from performing the action. The social components use the social influence terms discussed earlier (conformity, similarity, valence, authority, in-group, reference group). All factors of motivation are taken as having an independent impact, following the design decision to keep the model simple where empirical interactions are unknown.

The central motivational cues are handled by allowing agents to analyze the outcomes of events which have occurred. As noted earlier in Section 5.2, agents evaluate their potential actions based upon “activations” that determine the attractiveness of that action, as mediated by their values and beliefs. To calculate a factor for motivated attention, an agent processes an event that results from some other agent’s action. In processing this event, the agent calculates the subjective emotional utility for themselves had they been the actor in that event and the outcomes were the same. For example, if agent B is eating a sandwich, the motivational salience for agent A is a function of the subjective benefit (or harm) to agent A of eating a sandwich. This motivated attention does not consider whether the action or the outcomes of the observed action are possible for the observer.

Equation 5.7 displays the central motivated attention calculation for an agent observing a given event (note: the ‘sgn’ symbol represents the sign function, producing -1 for negative values and 1 otherwise). $SEU_{Event}$ represents the subjective expected utility of activations that the perceiving agent would receive had they been the actor in that event and the outcomes were the same. Two adjustments are made to the raw utility value in order to calculate the motivated attention factor. One adjustment rescales the value from the range [-1,1] to fit into [0,1].

$$
\mathrm{MotivatedAttention}(Event) = 0.5 \cdot \left(1 + \mathrm{sgn}(SEU_{Event}) \cdot |SEU_{Event}|^{0.25}\right) \tag{5.7}
$$

The second adjustment takes the fourth root of the absolute SEU value. This factor was introduced during model calibration due to the very small range over which SEU can realistically operate within PMFServ. An SEU of 1.0 would indicate that an agent went from a completely neutral state to a state of full satisfaction of all its goals, standards, and preferences. In practice, such a huge swing would almost never be observed. This calibration tweak was introduced to spread the range of motivated attention so that smaller changes in SEU would still have some impact on the motivation to pay attention to an event. Rescaling was necessary since in experimental studies, even modest changes in motivation such as hunger resulted in significant changes in attention (Fazio et al., 1994). A linear weight was not acceptable, since this would lead to clipping the range of SEU for the purposes of motivation (high motivation and very high motivation would have the same impact). As such, a calibration exponent was calculated from the Stanford Prison scenario which allowed the maximum possible utility changes to span a range between [0.15, 0.85] for the central motivated attention factor. Unfortunately, since motivation does not have a standardized unit or scale, there was no way to calibrate this parameter in a more methodical manner. For a follow-up model, this would be an area that would benefit from additional empirical data.
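The effect of the fourth-root rescaling is easiest to see with a few numbers. Below is a minimal Python sketch of Equation 5.7; the function names and the range check are illustrative assumptions rather than the PMFServ implementation.

```python
def sgn(value: float) -> float:
    """Sign function as defined for Equation 5.7: -1 for negatives, 1 otherwise."""
    return -1.0 if value < 0 else 1.0


def motivated_attention(seu: float, exponent: float = 0.25) -> float:
    """Equation 5.7: map an SEU in [-1, 1] to a motivated attention factor in [0, 1]."""
    if not -1.0 <= seu <= 1.0:
        raise ValueError("SEU is expected to lie in [-1, 1]")
    return 0.5 * (1.0 + sgn(seu) * abs(seu) ** exponent)


if __name__ == "__main__":
    # Small utilities are spread out by the fourth root instead of staying near 0.5.
    for seu in (-0.2401, -0.01, 0.0, 0.0016, 0.01, 0.2401, 1.0):
        print(f"SEU={seu:+.4f} -> {motivated_attention(seu):.3f}")
```

Under Equation 5.7 as written, a factor of 0.85 corresponds to an SEU of about 0.24 (since 0.7 to the fourth power is 0.2401), and 0.15 to about -0.24. A small utility such as 0.0016 still moves the factor from 0.5 to 0.6, whereas a plain linear rescaling, 0.5(1 + SEU), would leave it at roughly 0.5008.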

Attentional Salience

Salience is used to calculate the probability that an action receives enough attention to be processed cognitively. This is accomplished by first calculating a salience for each event occurring during a time step. An additional salience term represents inattention salience: the salience of background events not simulated that might be attended to instead of the simulated events. This vector of saliences is normalized to form a probability vector, from which a finite number of events are chosen. Each event is chosen without replacement, except for inattention, which always remains an option. The probability distribution for choosing an event to attend is shown in Equation 5.8, where $E$ is the set of all simultaneously observable events, $E_{Att}$ is the set of already attended events, $s_e$ is the salience of an individual event $e$, and $s_I$ is the inattention salience.

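$$
P[e = \mathrm{Attended}] =
\begin{cases}
\dfrac{s_e}{s_I + \sum_{e' \in E \setminus E_{Att}} s_{e'}} & \text{if } e \in (E \setminus E_{Att}) \\[2ex]
\dfrac{s_I}{s_I + \sum_{e' \in E} s_{e'}} & \text{No Event Attended} \\[2ex]
0 & \text{if } e \in E_{Att}
\end{cases}
\tag{5.8}
$$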
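A short worked example may help fix the notation of Equation 5.8. The salience values below are hypothetical and chosen only so that the arithmetic is easy to follow. Take two observable events with saliences 0.6 and 0.3, an inattention salience of 0.1, and no events attended yet:

```latex
% Hypothetical values: s_{e_1} = 0.6,\ s_{e_2} = 0.3,\ s_I = 0.1,\ E_{Att} = \emptyset
\begin{align*}
P[e_1 = \mathrm{Attended}] &= \frac{0.6}{0.1 + 0.6 + 0.3} = 0.6 \\
P[e_2 = \mathrm{Attended}] &= \frac{0.3}{0.1 + 0.6 + 0.3} = 0.3 \\
P[\text{No Event Attended}] &= \frac{0.1}{0.1 + 0.6 + 0.3} = 0.1
\end{align*}
% If e_1 is attended on the first draw, the first case of Equation 5.8 gives, for the second draw:
\begin{align*}
P[e_2 = \mathrm{Attended}] &= \frac{0.3}{0.1 + 0.3} = 0.75
\end{align*}
```

On the first draw the three probabilities sum to one. Once an event has been attended, it drops out of the denominator of the first case, so the remaining events become proportionally more likely to be chosen.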

The algorithm for drawing the set of attended events is displayed in Algorithm 5.6, where $N$ is the maximum number of simultaneous events attended, $E$ is the set of all simultaneously observable events, and $X(E, E_{Att})$ is a random variable with a distribution defined by Equation 5.8. The output of this algorithm is $E_{Att}$, the total set of attended events. If the inattention term is selected, it is ignored and one fewer event in total will be attended. This attention algorithm is effectively an iterated drawing from the yet-unattended events, with a constant probability of no event being attended. This corresponds loosely to a series of winner-take-all competitions for attention between events, a process which has some support in neurological research (Lee et al., 1999). The attended events are then processed by the learning model, which can learn new affordances.

Figure 5.6: Attention Algorithm

E_Att = { }
for i = 1 to N do
    ATTENDED_EVENT = X(E, E_Att)
    if ATTENDED_EVENT != No Event Attended then
        E_Att = E_Att ∪ {ATTENDED_EVENT}
    end if
end for
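The drawing procedure can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the PMFServ implementation: the function name, the dictionary of event saliences, and the sentinel object are hypothetical. On each draw it normalizes over the inattention term plus the not-yet-attended events, following the first case of Equation 5.8. How the source reconciles this with the inattention case, which is written with a sum over all of E, is not specified here, so that choice is an assumption.

```python
import random
from typing import Dict, Hashable, List, Optional

# Sentinel standing in for the "No Event Attended" outcome (hypothetical name).
NO_EVENT = object()


def draw_attended_events(
    saliences: Dict[Hashable, float],
    inattention_salience: float,
    max_attended: int,
    rng: Optional[random.Random] = None,
) -> List[Hashable]:
    """Draw up to max_attended events without replacement, weighted by salience."""
    rng = rng or random.Random()
    attended: List[Hashable] = []
    for _ in range(max_attended):
        remaining = [e for e in saliences if e not in attended]
        options = remaining + [NO_EVENT]
        weights = [saliences[e] for e in remaining] + [inattention_salience]
        if sum(weights) <= 0:
            break  # nothing left that could be attended
        choice = rng.choices(options, weights=weights, k=1)[0]
        # An inattention draw uses up one attention slot and adds nothing.
        if choice is not NO_EVENT:
            attended.append(choice)
    return attended


if __name__ == "__main__":
    events = {"eat_sandwich": 0.6, "open_door": 0.3, "wave": 0.1}
    print(draw_attended_events(events, inattention_salience=0.2, max_attended=2))
```

Because inattention is never removed from the option list, an agent can attend fewer than the maximum number of events even when many events are available.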

While the parameters used to calculate attentional salience and their basic curves are known, no data exists to define their relative strengths or appropriate combination. To accommodate this uncertainty, multiple classes of functions with different weight parameters are available within the model. By examining the studies that define these parameters as impacting recall of events and/or messages, a linear weight was estimated for each component, representing the slope of change between the high condition and the low condition in the experiment. For example, if the high authority condition resulted in a 0.3 increase in probability of recall, this was chosen as the linear weight. Alternatively, for those factors which do have experimentally derived curves (conformity), the curve slope was used instead. All factors were normalized to fit the range [0,1].

Attentional salience is calculated as a function of attention and social influence terms previously defined. These factors are novelty, centrally motivated attention, selective attention, transferability, authority influence, conformity influence, similarity influence, valence influence, ingroup influence, and reference group influence. Each parameter is combined using a linear weight that determines its contribution to the total salience for an event. As such, the attentional salience for an event e is determined by a function as shown in Equation 5.9. The w factors represent the weight given to each factor. This form of equation was chosen as it was the simplest possible combination that would capture the information operationalized from the social science findings and theories.

$$
\begin{aligned}
s_e = \mathrm{Salience}(e) = {} & w_0 \cdot \mathrm{Novelty}(e) + w_1 \cdot \mathrm{MotivatedAttention}(e) \\
& + w_2 \cdot \mathrm{SelectiveAttention}(e) + w_3 \cdot \mathrm{Transferability}(e) \\
& + w_4 \cdot \mathrm{Authority}(e) + w_5 \cdot \mathrm{Conformity}(e) + w_6 \cdot \mathrm{Similarity}(e) \\
& + w_7 \cdot \mathrm{Valence}(e) + w_8 \cdot \mathrm{InGroup}(e) + w_9 \cdot \mathrm{ReferenceGroup}(e)
\end{aligned}
\tag{5.9}
$$

Table 5.7 notes the weights for each factor, as well as the source used to help initialize these weights. The “Process” column in Table 5.7 indicates whether the component is Central (depends on the specific event), Peripheral (depends on more general context), or Mixed (combination of both).

Table 5.7: Event Salience Component Weights

Component             Assumed Weight  Source                           Process
Authority             0.33            Mantell (1971)                   Peripheral
Conformity            0.34            Tanford and Penrod (1984)        Peripheral
In-Group              0.30            Tajfel (1982)                    Peripheral
Motivation (central)  0.47            Roskos-Ewoldsen and Fazio (1992) Central
Novelty               0.21            Johnston et al. (1990)           Mixed
Reference Group       0.30            Kameda et al. (1997)             Peripheral
Selective Attention   0.32            Simons and Chabris (1999)        Mixed
Similarity            0.47            Platow et al. (2005)             Peripheral
Transferability       0.10            Bandura (1986)                   Central
Valence/Halo          0.38            Hilmert et al. (2006)            Peripheral
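Equation 5.9 with the Table 5.7 weights can be sketched in Python as a plain weighted sum. The dictionary keys and function name are illustrative assumptions. The factor values are assumed to have already been computed and normalized to [0,1] by the submodels described above.

```python
from typing import Dict, Mapping

# Assumed weights copied from Table 5.7; the key names are hypothetical.
SALIENCE_WEIGHTS: Dict[str, float] = {
    "authority": 0.33,
    "conformity": 0.34,
    "in_group": 0.30,
    "motivated_attention": 0.47,
    "novelty": 0.21,
    "reference_group": 0.30,
    "selective_attention": 0.32,
    "similarity": 0.47,
    "transferability": 0.10,
    "valence": 0.38,
}


def salience(
    factors: Mapping[str, float],
    weights: Mapping[str, float] = SALIENCE_WEIGHTS,
) -> float:
    """Equation 5.9: linear weighted sum of factor values; missing factors count as 0."""
    unknown = set(factors) - set(weights)
    if unknown:
        raise KeyError(f"unknown salience factors: {sorted(unknown)}")
    return sum(w * factors.get(name, 0.0) for name, w in weights.items())


if __name__ == "__main__":
    # A fully novel event by a highly similar actor whom the observer is watching.
    event_factors = {"novelty": 1.0, "similarity": 0.8, "selective_attention": 1.0}
    print(round(salience(event_factors), 3))
```

With these weights the sum is not bounded by 1: an event scoring 1.0 on every factor would have a salience of 3.22. This does not matter for event selection, because Equation 5.8 normalizes the saliences before drawing.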

Each of these weights was inferred from examining the related paper, as noted in Table 5.7. The weights are intended as a “best guess” estimate of the importance of each factor with respect to social learning, due to their observed effect on attention, perception, or retention. First, the input and output variables of interest were determined. Second, the form of the empirical relationship was determined, to the level of the paper’s presentation (e.g., correlation, slope, or function). Third, the amount that the input could affect the output was estimated, if known. Last, each relationship was normalized so that the input variable ranged between 0 and 1. From these, the salience weights were defined. More information on how these weights were initialized is given in Appendix F. These weights are not intended to be taken as reliable estimates of the relative importance of factors, but were estimated to try to capture major differences in importance between factors.

The limitations of this approach are significant but unavoidable. Firstly, the experiments which show that these factors are important do not generally establish minimum or maximum values for their inputs. Even at the theoretical level, it is difficult to establish criteria for what constitutes the maximal or minimal level of authority that a person is perceived to have. Secondly, there is no assurance that these factors work linearly or independently. While this attempt at a linear approximation was workable for this research, a better functional combination could be necessary for more in-depth study.

Despite the limitations and caveats of the attentional salience calculation approach, it incorporates the directionality and known functional characteristics of the underlying empirical studies. This provides some insight into how various factors may interact and produces some interesting results that will be noted in Section 7.

Additionally, social learning of affordances is straightforward using this cognitive framework. It requires three conditions: an affordance available to all agents, a set of agents aware of the affordance, and a set of agents unaware of the affordance. When agents choose to perform an action, the OODA loop for each observer evaluates if social learning of the affordance is appropriate. Any affordance in PMFServ can be treated as a meme using this system, without any changes to the affordance.

In document Modeling Memes: A Memetic View of Affordance Learning (Page 99-106)