Entity/Network Layer - Cognitive trait model for adaptive learning environments : a thesis pres

{l0 ... ln}

{h0 ... hn}

Figure 5-3: Two-layered view of dichotomic property of node and entity

The attributes in all nodes N can be split into two groups: L and H. L and H each represent one of the dichotomic properties of the entity, for example L for low working memory capacity and H for high working memory capacity. It is therefore a 2-layered (Node Layer + Entity Layer) representation of an entity which is dichotomic in nature (see Figure 5-3).

Activation of any attribute is independent of the other attributes, including the other attribute in the same node. Only activated attributes affect the result of an execution of the MPN. Feedback at the end of an execution is used to change the percentage values of the attributes. A node is labelled activated in an execution if at least one of its attributes (i.e. the h or l of the node) is activated. For each of the groups, L and H, the weight of each activated node multiplied by the node's attribute (in percentage) for that group is summed by a method called Inclusion Resolution, which will be explained in detail later in this chapter. The group with the higher sum is labelled the winning group, and the group with the lower sum is labelled the losing group. The percentage values of the attributes in the winning group are increased, and those of the losing group are reduced. By this method, the portrayal can gradually move towards an accurate representation of the entity.
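The feedback step described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the class name `Node`, the function names, the fixed `step` of 5 percentage points, and the choice to update only activated attributes are all assumptions made for illustration, and the sums here are plain weighted sums without the Inclusion Resolution correction.

```python
from dataclasses import dataclass

@dataclass
class Node:
    weight: float              # influence of the node on the entity
    low: float                 # percentage value of the L attribute (0-100)
    high: float                # percentage value of the H attribute (0-100)
    low_active: bool = False   # was the L attribute activated in this execution?
    high_active: bool = False  # was the H attribute activated in this execution?

def group_sums(nodes):
    """Sum weight * attribute percentage over activated attributes, per group."""
    low_sum = sum(n.weight * n.low for n in nodes if n.low_active)
    high_sum = sum(n.weight * n.high for n in nodes if n.high_active)
    return low_sum, high_sum

def apply_feedback(nodes, step=5.0):
    """Raise activated attributes of the winning group, lower those of the losing group."""
    low_sum, high_sum = group_sums(nodes)
    if low_sum == high_sum:
        return None  # assumption: a tie changes nothing
    winner = "L" if low_sum > high_sum else "H"
    for n in nodes:
        if n.low_active:
            delta = step if winner == "L" else -step
            n.low = min(100.0, max(0.0, n.low + delta))
        if n.high_active:
            delta = step if winner == "H" else -step
            n.high = min(100.0, max(0.0, n.high + delta))
    return winner

# Hypothetical execution: one node signals H, a lighter node signals L.
nodes = [Node(1.0, 50.0, 50.0, high_active=True),
         Node(0.5, 50.0, 50.0, low_active=True)]
winner = apply_feedback(nodes)
```

In this hypothetical run the H group has the higher sum, so the activated H attribute is raised and the activated L attribute is lowered.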

The dichotomic nature of nodes is an important characteristic of the network. As the two attributes in one node represent opposite natures of the node, a contradiction occurs if both attributes of the node are activated in an execution. For example, if one part of a student's behaviour log indicates that the student is able to perform simultaneous tasks (an indicator of high working memory capacity) and another part of the behaviour log indicates a lack of ability to perform simultaneous tasks (an indicator of low working memory capacity), this is a contradiction.

A contradiction implies that the underlying theory represented by the node is not a suitable portrayal of the entity for the particular student. That is, the theoretical background of using "the ability to perform simultaneous tasks" to infer working memory capacity is not very suitable for the particular student. This does not imply that the theory is wrong, but it indicates that there are certain aspects of this student that make the theory not applicable. This is the individual difference that we have discussed in chapter 3. A contradiction therefore means that the node's influence (its weight) on the overall value of the entity should be decreased. This is based on the principle of non-compromise (Minsky, 1986), and is therefore named the non-compromising rule.

Multiple Portrayal Network

The non-compromising rule overrides the attributes' rights to either increase or decrease their values from being in a winning or losing group. The weight of a node labelled contradictory is decreased (see Figure 5-4). The non-compromising rule enables the MPN to gradually adapt itself by removing non-representative nodes of the entity and to move towards a more accurate portrayal of the entity. In the application of MPN in the cognitive trait model, the non-compromising rule allows the MPN to evolve into an individualised network representation for only the student it represents. This individualisation is of particular importance because the aggregated effect caused by summing of bias in inferential statistics employed by different cognitive researchers has not yet been thoroughly clarified (Lin, Kinshuk & Patel, 2003). It is therefore an important requirement for the CTM to evolve in order to represent each different individual.
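As a rough illustration of the non-compromising rule, the following Python sketch lowers the weight of every node whose two attributes were both activated in one execution. The function name, the dictionary layout, the node names, and the multiplicative `decay` factor of 0.8 are hypothetical; the text above states only that the weight of a contradictory node is decreased, not by how much.

```python
def apply_non_compromising_rule(weights, activations, decay=0.8):
    """Return updated weights and the set of contradictory nodes.

    weights:     dict mapping node name -> current weight
    activations: dict mapping node name -> (low_active, high_active)
    decay:       assumed multiplicative reduction for a contradictory node
    """
    new_weights = dict(weights)
    contradictory = set()
    for name, (low_active, high_active) in activations.items():
        if low_active and high_active:
            # Both opposite attributes fired: the node contradicts itself.
            contradictory.add(name)
            new_weights[name] = weights[name] * decay
    return new_weights, contradictory

# Hypothetical execution with two nodes.
weights = {"node_a": 1.0, "node_b": 1.0}
activations = {"node_a": (True, True), "node_b": (False, True)}
new_weights, contradictory = apply_non_compromising_rule(weights, activations)
```

Nodes returned in `contradictory` would be skipped by the winning/losing attribute update, since the rule overrides it.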

In the context of the cognitive trait model, an MPN starts up by incorporating all the different portrayals and allows the learner's actions to gradually guide the MPN's evolution. The inappropriate portrayals are automatically faded out (represented by reduced node size in Figure 5-4) and only the representative ones remain.

Figure 5-4: Tilt of scale shows the changed percentages of attributes, and different size of circle shows the changed weights of nodes

In order to create an accurate representation of an entity, an important question needs to be considered: if different nodes represent different portrayals of an entity, could they overlap with each other, and what kinds of relationship are possible among them?

5.3.1 Types of Relationships between Nodes

There are several possible relationships between any two of the nodes. The first one is inclusion: one of the two nodes completely includes the other node (Figure 5-5a). This is an extension of the overlap: node A completely overlaps with node B, but the reverse is not true.


Figure 5-5: Different relationships of nodes: (a) inclusion, (b) overlap, (c) equivalent, and (d) independent

Two nodes of an MPN could also overlap with each other (Figure 5-5b). The overlap is the correlated occurrence of the two MOTs. In the view of Minsky (1986), each mental agent in the higher level of a hierarchy may share some lower level agencies with other higher level agents. An agent in Minsky's (1986) view is a set of mental functions to complete a task: the GET agent gets an object; the FIND agent finds an object; and both the GET and FIND agents need to use the SEE agent to see objects in order to do their tasks. Similarly, any two nodes in an MPN, representing two different MOTs of the same cognitive trait, might share some common elements.

Figure 5-5c shows two nodes that are equivalent to each other. The equivalency is defined by their identical occurrence, i.e. the co-occurrence rate is 100%. This is most likely due to 1) different terminologies being used to mean the same construct, or 2) researchers in different contexts interpreting the same construct differently.

However, any two nodes could also exist independently of each other (see Figure 5-5d). It is unlikely that this type of relationship holds for all the nodes, because different nodes are different aspects of the same entity, and the underlying core of the entity is very likely to bring some correlation to its nodes. It is nevertheless theoretically possible that two nodes are totally independent of each other.
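The four relationships can be told apart from co-occurrence counts. The sketch below is one possible reading, not a procedure given in the thesis: it assumes that for two nodes A and B we know how often only A occurred, only B occurred, and both occurred together. The function and parameter names are hypothetical.

```python
def classify_relationship(only_a, only_b, both):
    """Classify the relationship of two nodes from occurrence counts.

    only_a: executions in which A occurred without B
    only_b: executions in which B occurred without A
    both:   executions in which A and B occurred together
    """
    if both == 0:
        return "independent"   # never co-occur (Figure 5-5d)
    if only_a == 0 and only_b == 0:
        return "equivalent"    # co-occurrence rate is 100% (Figure 5-5c)
    if only_a == 0 or only_b == 0:
        return "inclusion"     # one node lies entirely inside the other (Figure 5-5a)
    return "overlap"           # partial sharing (Figure 5-5b)

kinds = [classify_relationship(0, 6, 4),
         classify_relationship(3, 6, 4),
         classify_relationship(0, 0, 4),
         classify_relationship(3, 6, 0)]
```

Treating a zero co-occurrence count as independence is a simplification; statistical independence in the strict sense would need a test on the counts rather than a zero check.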

5.3.2 Inclusion Resolution

Except for the independent relationship, all the other relationships may create misrepresentations of the entity if there is no mechanism to resolve the relationships between nodes. Inclusion of one node, say node A, in another, say node B, would cause a bias: the weight of node A is doubly represented. If node B includes node A, then the activation of node A must also activate node B, and giving credit to the weight of node A is therefore unnecessary. Overlap and equivalency of nodes give rise to the same problem. Therefore, an Inclusion Resolution mechanism is proposed to solve this problem.

The Inclusion Resolution mechanism works by first identifying the Included Node and the Inclusive Node. In Figure 5-6, the Included Node is graphically represented as the smaller region (node A), whereas the Inclusive Node is the larger region (node B). Mathematically speaking, the Included Node is the one whose conditional probability, with regard to the other, is larger, and vice versa for the Inclusive Node. P(A|B), read as the probability of A given B, is 2 / (2 + 5) = 2 / 7, and P(B|A) is 2 / (2 + 8) = 2 / 10 = 1 / 5. P(A|B) is greater than P(B|A), and thus A is the Included Node and B is the Inclusive Node.
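The comparison above can be reproduced with exact fractions. The short Python sketch below only restates the arithmetic quoted for Figure 5-6; the function name `identify_nodes` and the handling of a tie (returning no distinction) are assumptions added for illustration.

```python
from fractions import Fraction

def identify_nodes(p_a_given_b, p_b_given_a):
    """Return (included, inclusive); the larger value marks the Included Node."""
    if p_a_given_b == p_b_given_a:
        return (None, None)  # assumption: equal values give no distinction
    return ("A", "B") if p_a_given_b > p_b_given_a else ("B", "A")

# Values quoted in the text for Figure 5-6.
p_a_given_b = Fraction(2, 2 + 5)   # 2/7
p_b_given_a = Fraction(2, 2 + 8)   # 2/10, reduced to 1/5

included, inclusive = identify_nodes(p_a_given_b, p_b_given_a)
```

Using `Fraction` avoids rounding, so 2/10 is reduced to 1/5 exactly as in the text.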

Figure 5-6: Node A is Included Node and B is Inclusive Node

The weight of the Included Node should have the included region removed from it. The included region is 2 in Figure 5-6, and thus the proportion by which A should be reduced is 2 / 5. The distinction between the Included and the Inclusive Node is vital, as it determines which node should be reduced. But why should the Inclusive Node not be reduced instead? Figure 5-7a and Figure 5-7b show the reason.

Figure 5-7 (panels 7a and 7b): Demonstration of why the weight of included node should be reduced

In Figure 5-7a, there are two nodes (nodes A and B) and three regions (regions a, b, and c). Judging from the graphical size, node A is the Included Node and node B is the Inclusive Node. Nodes A and B overlap in region c, and because node A is the Included Node, region c should be taken out of the total region of node A, leaving node B unaffected. Thus, the total weight of node A is now a / (a + c).

Figure 5-7b shows node A completely included in node B. Region a has now disappeared, so the weight of node A is 0 / (0 + c), which is equal to 0. This complies with the previous discussion that a completely Included Node should not be given any credit.
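The adjustment a / (a + c) described for Figure 5-7 can be written as a small helper. This is a sketch under stated assumptions: the function name is hypothetical, and the region sizes in the usage lines are made-up numbers, not values taken from the figures.

```python
def included_node_weight(exclusive, shared):
    """Weight left to the Included Node after removing the shared region.

    exclusive: size of the region belonging only to the Included Node (region a)
    shared:    size of the region it shares with the Inclusive Node (region c)
    """
    total = exclusive + shared
    if total == 0:
        raise ValueError("the node has no region at all")
    return exclusive / total   # a / (a + c)

partial = included_node_weight(3, 1)    # partial overlap, as in Figure 5-7a
complete = included_node_weight(0, 4)   # complete inclusion, as in Figure 5-7b
```

With complete inclusion the exclusive region is empty, so the helper returns 0 and the Included Node receives no credit, while the Inclusive Node is left untouched.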

5.4 Application of Multiple-Portrayal Network in Cognitive Trait Model

As discussed previously, different researchers look at cognitive traits differently. This is analogous to different observers perceiving different portrayals of an object from different angles. In the detailed discussion of cognitive traits in chapters 6, 7, and 8, there are a number of manifestations of trait (MOTs) for each cognitive trait. MOTs come from different theoretical perspectives on the cognitive traits, similar to different portrayals of the same object.

On close scrutiny, each of the MOTs holds a certain degree of truth about the corresponding cognitive trait. Taking any one of them and discarding the rest means taking a great risk of losing the accuracy desired. Given the acknowledged difficulty of understanding the mysteriousness of the mind (Blackmore, 2003, p.36), no single MOT could provide a complete and all-embracing description of how working memory works. MPN provides a viable solution to this difficulty.

In MPN, an entity/network represents a cognitive trait, such as working memory capacity, and each of the MOTs is represented by a node in the entity. The two attributes of a node represent low and high working memory capacity respectively. When a cognitive trait model is first initiated, it could contain several instances of MPN, each representing one cognitive trait. The MPN for a cognitive trait has the same number of nodes as the trait has MOTs. For example, if there are three MOTs for working memory capacity, the MPN instance for working memory capacity will be initiated with three nodes like Figure 5-1. After several learning sessions, the MPN will be changed according to the learner's behaviours; for example, it could become like Figure 5-3. The tilt in the scales shows the student's tendency towards having either low or high working memory capacity, and the decreased size of nodes indicates that contradictions were detected in the nodes. A contradiction implies that the theories (about working memory) that the particular node stands for do not represent the working memory of the student that well.
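The initial structure described above, one MPN per cognitive trait and one node per MOT, might be laid out as follows. This Python sketch is illustrative only: the class names, the MOT labels, and the starting values (weight 1.0 and 50% for each attribute) are assumptions, since the text does not state the initial values.

```python
from dataclasses import dataclass, field

@dataclass
class MotNode:
    mot: str              # label of the manifestation of trait this node stands for
    weight: float = 1.0   # assumed initial weight
    low: float = 50.0     # assumed initial percentage of the L attribute
    high: float = 50.0    # assumed initial percentage of the H attribute

@dataclass
class Mpn:
    trait: str
    nodes: list = field(default_factory=list)

def init_mpn(trait, mots):
    """Create an MPN with one node per MOT of the trait."""
    return Mpn(trait, [MotNode(m) for m in mots])

# A cognitive trait model holding one MPN instance per trait (placeholder MOT labels).
ctm = {
    "working memory capacity": init_mpn(
        "working memory capacity", ["MOT 1", "MOT 2", "MOT 3"]
    ),
}
```

Starting both attributes at the same value reflects a scale that has not yet tilted; later sessions would move the percentages and weights away from these defaults.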

The ability to handle unclear and complex relationships between nodes is of particular importance to the cognitive trait model. This ability is termed Complex-Relationship (CR) Manageability in this study. Researchers have examined the same entity, that is, a cognitive trait, and there is no proof that their theories do not overlap or relate to each other in other ways. If all the different theories are indeed unrelated to each other, the Inclusion Resolution mechanism can still accommodate them as having independent relationships. Otherwise, the proposed Inclusion Resolution mechanism has the potential to handle the aggregation of MOTs nonlinearly and achieve the desired CR-Manageability.

5.5 Multiple-Portrayal Network as Machine Learning

The idea of MPN was inspired by machine learning. Its weighted node structure is similar to that of an artificial neural network (ANN) (e.g. Bar-Yam, 1997). However, there are significant differences between MPN and ANN. In this section, a brief conceptual introduction to ANN is presented. MPN is then compared to ANN. Finally, the discussion turns to the categories of machine learning that MPN fits into, in order to shed some light on where MPN is situated in the larger context of machine learning.

5.5.1 Artificial Neural Network

An artificial neural network (ANN) is an information processing system that comprises multiple interconnected processing nodes (Stergiou & Siganos, 1996). The nodes in an ANN are often called neurons because of the analogy between ANN and the human brain. The neurons in an ANN are weighted, i.e. each has a value, called a weight, attached to it. The weight of a neuron acts as a modifier of the output of the neuron. For example, if a neuron has a weight equal to 0.5, then whatever the neuron outputs will be multiplied by 0.5. The modification of the neurons' weights accounts for the ANN's learning.

Similar to a human brain, an ANN learns by examples. It can learn, for example, how to mimic an OR function by using the data in Table 5-1:

Input A    Input B    Output
0          0          0
0          1          1
1          0          1
1          1          1

Table 5-1: Truth Table of OR

In each training session, each pair of Input A and Input B in Table 5-1 is used as input to the ANN, and the Output is used as a learning guide: it guides the ANN to modify the weights on its neurons. Using a simple ANN with 2 input neurons and 1 output neuron, Smith (1996) showed that the ANN learned the OR function after 8 training sessions.
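To make the idea of learning from Table 5-1 concrete, here is a minimal Python sketch of a single threshold unit trained with the classic perceptron update rule. It is not the network used by Smith (1996), and no claim is made about matching the number of training sessions reported there; the zero starting weights, the unit learning rate, and the cap of 20 sessions are assumptions.

```python
# Truth table of OR from Table 5-1: ((Input A, Input B), Output)
OR_TABLE = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum of the inputs plus the bias is positive."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0

def train(table, max_sessions=20):
    """Repeat passes over the table until a full pass makes no mistakes."""
    weights, bias = [0, 0], 0
    for _ in range(max_sessions):
        mistakes = 0
        for inputs, target in table:
            error = target - predict(weights, bias, inputs)
            if error != 0:
                mistakes += 1
                # Perceptron rule: move the weights towards the target output.
                weights = [w + error * x for w, x in zip(weights, inputs)]
                bias += error
        if mistakes == 0:
            break
    return weights, bias

weights, bias = train(OR_TABLE)
outputs = [predict(weights, bias, inputs) for inputs, _ in OR_TABLE]
```

The Output column plays the role of the learning guide: each mismatch between it and the unit's prediction nudges the weights, and training stops once every row of the table is reproduced.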

Smith (1996) pointed out that ANN is particularly useful where:

1. an algorithmic solution cannot be formulated;

2. lots of example data can be gathered; and

3. structure of the data needs to be extracted.

ANN is often applied to solve complex problems where algorithmic solutions cannot be formulated and the hidden structure of the data needs to be extracted (Smith, 1996). Pattern recognition (handwriting, voice, or face) is an area where ANN is extensively employed (e.g. Pentland & Choudhury, 2000; Wu et al., 1991).

5.5.2 Multiple Portrayal Network and Artificial Neural Network

Both multiple portrayal network (MPN) and artificial neural network (ANN) use weighted nodes. Both types of network are self-adaptive: they can both learn from experience (data). However, there exist three major differences between MPN and ANN, namely training, connections of nodes, and types of end representation.

1. Training: An ANN needs to be trained using example data before it can be used, whereas an MPN is used and trained at the same time. At first glance, it may appear that the approach of ANN (training before use) is more reliable. But the reliability of MPN is derived from the fact that the essences of what its nodes represent (the different perspectives on cognitive traits) are supposedly already scientifically validated by the researchers who put forth their theories.

2. Connections of nodes: Initially, there are predefined and existing connections between nodes (neurons) in an ANN. These connections might be strengthened or weakened during the ANN's learning process. There are, however, no existing connections between nodes in an MPN. Only when a node co-occurs with other nodes are the co-occurrences used to build the connections between nodes.

3. Types of end representation: An ANN aims to represent a function which takes inputs and generates outputs. For example, an OR function takes 2 inputs to generate 1 output. An MPN aims to aggregate different perspectives on an entity to give an accurate representation of the entity. For example, an MPN for the working memory capacity of an individual aims to represent the working memory of the individual, taking into account different perspectives/theories of working memory.

This comparison is not meant to see which of MPN or ANN is better. It is used to point out some differentiating features of the two.
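The second difference, connections that start empty and are built only from co-occurrence, could be sketched as below. This is an illustration rather than the thesis's procedure: it assumes each execution is logged as the set of node names that were activated together, and the function name and node names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def build_connections(executions):
    """Count how often each pair of nodes was activated in the same execution.

    executions: iterable of sets of node names activated together
    returns:    Counter mapping (name_1, name_2) -> co-occurrence count
    """
    connections = Counter()   # no connections exist before any execution
    for activated in executions:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(activated), 2):
            connections[pair] += 1
    return connections

# Hypothetical log of three executions.
log = [{"a", "b"}, {"a", "b", "c"}, {"c"}]
connections = build_connections(log)
```

Pairs that never co-occur are simply absent from the counter, which mirrors the statement that an MPN has no connections until co-occurrences are observed.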

5.5.3 Machine Learning and MPN

In terms of machine learning, the mechanism of MPN can be categorised under both reinforcement learning (Ghahramani, 2004; Sutton and Barto, 1999) and learning-to-learn (Baxter, 2000). In reinforcement learning, the environment gives reward or punishment to actions taken by a mechanism. The goal of the learning is to maximize reward or minimize punishment (Sutton and Barto, 1999). In terms of MPN in CTM, a decision is made based on the weights of activated nodes, and reward or punishment is likewise given only to activated nodes.

In learning-to-learn, a mechanism learns its own inductive bias, which refers to the additional assumptions that the mechanism uses, based on its previous experience, to predict correct outputs for situations that have not been encountered so far. Since it is not necessary to train an MPN before using it, a certain risk has to be acknowledged: the predictive power of MPN relies solely on the inductive bias. Typically, the inductive bias is supplied manually through the skills and insights of an expert (Baxter, 2000). However, this risk is mitigated in MPN (as used in CTM) because the various research results and theories about human cognition used in this study are presupposed to have been validated.

MPN resembles what has been described as a learning-to-learn mechanism (Baxter, 2000), but without the requirement of pre-training, which is typical in learning-to-learn mechanisms. MPN also differs from learning-to-learn in that its inductive bias is theory-originated instead of human-originated. Future research could be conducted to examine MPN's relationship with other learning-to-learn mechanisms.

Similar to the use of conditional probability in the Inclusion Resolution mechanism, a Bayesian network also uses conditional probability to perform inferences (MacKay, 2003). Theoretically speaking, there should not be a great amount of dependency in terms of depth, for example A being dependent on B, which is in turn dependent on C, and so on. The reason is that the studies of a cognitive trait are about the same cognitive trait, e.g. working memory; they are not about sub-elements, sub-sub-elements, and sub-sub-sub-elements of working memory, which would result in dependency in depth. A Bayesian network is suitable for studying dependency in depth (MacKay, 2003). We therefore do not think that a Bayesian network is needed in this study.

5.6 Summary and Discussion

Multiple portrayal network (MPN) is a network structure suitable for representation of an entity that has multiple portrayals. Such entities are quite common in cognitive science in which different researchers examined a cognitive entity, for example working memory, from different perspectives and came up with different theories for the entity. Each of the portrayals of the entity usually does not provide a consensus and complete model for the entity; relationships between portrayals are often unclear or unknown. MPN provides an Inclusion Resolution mechanism to circumvent this issue and allows nonlinear aggregation of portrayals.

By using the non-compromising rule, MPN in the cognitive trait model allows the learner's behaviours to shape the MPN and to determine which theories are good representatives of the learner's working memory and which are not. In other words,

In document Cognitive trait model for adaptive learning environments : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Information System [i e Systems], Massey University, Palmerston North, New Zealand (Page 63-71)