2.2 Behaviour Analysis Methods
2.2.3 Clique Behaviour Analysis
While the term “clique” has a number of definitions [52], it generally refers to a small cluster of individuals who frequently communicate with each other [18]. Clique behaviour analysis is defined here as the analysis of small clusters of e-mail accounts in order to examine their group interaction behaviour. At this level of analysis, the focus is on understanding the group communication behaviour common to several e-mail accounts, rather than the behaviour of individual e-mail accounts. The diagram in Figure 2.12 illustrates the idea of clique behaviour analysis.
[Figure 2.12 image not reproduced. Labels: e-mail accounts A1 to A13, where Ai = e-mail account i (i = 1, 2, 3, 4, ...); the area of analysis encloses a cluster of e-mail accounts that frequently communicate with each other.]
Figure 2.12: Diagram illustrating the idea of clique behaviour analysis.
In the work by [2, 18, 42], a method called “User Cliques” is proposed for examining an individual’s e-mail archive for unusual clique behaviour. The basis for the “User Cliques” method is that an individual may often send e-mail messages to different groups of recipients, based on how those recipients relate to the individual. For example, the individual may send one set of messages to friends, another to work colleagues, and another to relatives. In most cases, the individual is unlikely to distribute the same set of e-mail messages to different groups of recipients (e.g. the individual does not send the same messages to work colleagues and relatives). Based on this, [2, 18, 42] assume that different social cliques can be found by examining how an individual sends e-mail messages to certain sets of recipients.
To detect unusual clique behaviour, [2, 18, 42] search for e-mail traffic that violates an individual’s typical clique e-mailing behaviour. The individual’s normal clique e-mailing behaviour is first established by examining the recipient list of each e-mail message sent from the individual’s e-mail account (i.e. taken from the TO, CC, and BCC fields). The recipient lists found in these messages are then used to build models of the different cliques that the individual sends messages to. The individual’s cliques are profiled using a rule defined by [2]: a set of recipients forms a separate clique only if it is not subsumed by (i.e. is not a subset of) another set of recipients. For example, for an e-mail account U with the recipient sets {A, B}, {A, B, C, D}, and {D, E, F}, the first set is subsumed by the second set, so there are two cliques: Clique1 = {A, B, C, D} and Clique2 = {D, E, F}. Figure 2.13 shows a visual representation of how these cliques are profiled, which is based on the clique diagrams originally drawn by [2].
[Figure 2.13 image not reproduced. Labels: e-mail account U and recipients A to F; Item Set {A, B}, Item Set {A, B, C, D}, Item Set {D, E, F}; Clique1 (U with A, B, C, D); Clique2 (U with D, E, F).]
Figure 2.13: Diagram of clique profiling, based on the clique diagrams originally drawn by [2].
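The profiling step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by [2]: the function names `recipient_set` and `profile_cliques` are hypothetical, addresses are lower-cased before comparison (an assumption made here for matching), and messages with no recipients are ignored. Header parsing relies on the `email` package from the Python standard library.

```python
from __future__ import annotations

from email import message_from_string
from email.utils import getaddresses


def recipient_set(raw_message: str) -> frozenset[str]:
    """Collect the addresses found in the To, Cc and Bcc headers of one message."""
    msg = message_from_string(raw_message)
    fields = []
    for header in ("To", "Cc", "Bcc"):
        fields.extend(msg.get_all(header, []))
    # Assumption: addresses are compared case-insensitively.
    return frozenset(addr.lower() for _name, addr in getaddresses(fields) if addr)


def profile_cliques(recipient_sets) -> list[frozenset[str]]:
    """Keep only the recipient sets that are not subsumed by another set."""
    # Assumption: empty recipient lists carry no clique information.
    unique = {frozenset(s) for s in recipient_sets if s}
    # `s < other` is a proper-subset test, so identical sets do not
    # eliminate each other (they were already merged by the set above).
    maximal = [s for s in unique if not any(s < other for other in unique)]
    # Sort only to make the output order deterministic.
    return sorted(maximal, key=sorted)


if __name__ == "__main__":
    # The worked example from the text: recipient sets seen for account U.
    sets_for_u = [{"A", "B"}, {"A", "B", "C", "D"}, {"D", "E", "F"}]
    print([sorted(c) for c in profile_cliques(sets_for_u)])
    # prints [['A', 'B', 'C', 'D'], ['D', 'E', 'F']]
```

The pairwise subset test is quadratic in the number of distinct recipient sets, which is likely acceptable for a single user's archive but is a design choice of this sketch rather than something specified in the source.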
After establishing the individual’s normal cliques, the “User Cliques” method detects unusual e-mail traffic behaviour by scanning new or recent outgoing e-mail messages for recipient lists that violate the previously profiled cliques. If an outgoing e-mail message has a recipient list that is not a subset of any previously profiled clique, then that e-mail message is deemed to have caused a violation. This information can then be used to alert the user/analyst that the individual has started communicating with new sets of recipients.
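The violation test stated above reduces to a subset check against each profiled clique. The following sketch assumes the cliques are already available as sets of recipient identifiers; the function names `violates_profile` and `scan_outgoing` and the message identifiers are hypothetical and are not taken from [2].

```python
def violates_profile(recipients, cliques):
    """Return True if the recipient list is not a subset of any profiled clique."""
    recipients = frozenset(recipients)
    return not any(recipients <= clique for clique in cliques)


def scan_outgoing(messages, cliques):
    """Return the ids of outgoing messages whose recipient list causes a violation.

    `messages` is an iterable of (message_id, recipients) pairs.
    """
    return [msg_id for msg_id, recipients in messages
            if violates_profile(recipients, cliques)]


if __name__ == "__main__":
    # Cliques from the worked example for account U.
    cliques = [frozenset("ABCD"), frozenset("DEF")]
    outgoing = [
        ("m1", {"A", "B"}),  # subset of Clique1: no violation
        ("m2", {"D", "E"}),  # subset of Clique2: no violation
        ("m3", {"A", "E"}),  # known recipients, but spans both cliques
        ("m4", {"A", "G"}),  # includes a recipient never seen before
    ]
    print(scan_outgoing(outgoing, cliques))
    # prints ['m3', 'm4']
```

Note that message m3 is flagged even though both of its recipients are already known: under the subset rule, a new combination of existing recipients counts as a violation just as a new recipient does.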
Although the “User Cliques” method proposed by [2, 18, 42] is able to analyse the cliques present in an individual’s e-mail archive, it is limited in what it can analyse. A major problem is that the method only works well when the individual under analysis rarely or only occasionally communicates with new recipients. When the individual is a new e-mail account user, or constantly communicates with new recipients, [2] noted that too many alarms may be generated. In such cases the “User Cliques” method may be rendered useless, since a large number of alarms is not necessarily useful to the user/analyst.
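The alarm-flood concern can be made concrete with a small sketch. The `alarm_rate` function below is a hypothetical helper introduced only for illustration (it is not a metric defined by [2]); it applies the same subset rule and reports the fraction of outgoing recipient lists that would raise an alarm.

```python
def alarm_rate(outgoing, cliques):
    """Fraction of outgoing recipient lists that are not a subset of any clique."""
    outgoing = [frozenset(r) for r in outgoing]
    if not outgoing:
        return 0.0
    alarms = sum(1 for r in outgoing if not any(r <= c for c in cliques))
    return alarms / len(outgoing)


if __name__ == "__main__":
    outgoing = [{"A"}, {"B", "C"}, {"D", "E"}]
    # A new account has no profiled cliques, so every message raises an alarm.
    print(alarm_rate(outgoing, []))  # prints 1.0
    # With an established profile, only the unfamiliar recipient list is flagged.
    print(alarm_rate(outgoing, [frozenset("ABCD")]) < 0.5)  # prints True
```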
Another limitation of the “User Cliques” method is that it does not examine variations in the behaviour of the existing cliques. In [2], the method has only been shown to be useful for detecting the appearance of new cliques. It does not provide any information about the dynamics of the existing cliques, or about whether the members of the originally profiled cliques continue to communicate with each other. Information on variations in the traffic behaviour of existing cliques may be useful for understanding the level of activity exhibited by particular clusters of e-mail accounts.