2.3 Related Work
2.3.2 Fraud Detection Based on Very Low Prior Knowledge
The goal of [Kim et al., 2003b] is to detect abusive use of a retail transaction processing system by employees. To our knowledge, it is one of only two academic publications on analytic internal fraud detection at the employee level. The authors argue that, because the retail sector often lacks sufficient expertise about potential and actual frauds, an anomaly detection approach to fraud has to be employed. The idea is to find temporal association rules between transactions that are uncommon but not extremely rare. These association rules are then used to create detectors, and experts decide which detectors are valuable. Valuable detectors are cloned and mutated to broaden their scope and make the search more “fuzzy”. This approach is inspired by the human immune system, using positive and negative selection [Kim et al., 2003b]. Although the vision was to build a ready-to-use product, the project turned out to be too ambitious and, as stated in the final report [Kim et al., 2003a], did not progress beyond the prototype stage. Nevertheless, extensive expert resources appear to have been available for evaluation, which is desirable for an anomaly detection approach. The evaluation results fell short of expectations: the anomalies that were detected and fully investigated turned out to be legitimate behavior.
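To make the notion of rules that are “uncommon but not extremely rare” concrete, the following Python sketch keeps only temporal patterns whose support falls inside a band. It is a simplification under our own assumptions, not the method of [Kim et al., 2003b]: a temporal rule is reduced to an ordered event pair (X followed by Y within `window` subsequent events), support is the fraction of transaction sequences containing that pair, and the function names, the event data, and the thresholds `lo` and `hi` are hypothetical. Detector creation, expert selection, and cloning and mutation are not shown.

```python
from collections import Counter


def temporal_pairs(sequence, window):
    """Ordered event pairs (x, y) where y occurs within `window` events after x."""
    pairs = set()
    for i, x in enumerate(sequence):
        for y in sequence[i + 1 : i + 1 + window]:
            pairs.add((x, y))
    return pairs


def banded_patterns(sequences, window, lo, hi):
    """Return {pattern: support} for patterns that are uncommon but not extremely rare.

    Support is the fraction of sequences containing the pattern at least once;
    only patterns with lo <= support <= hi are kept (hypothetical thresholds).
    """
    counts = Counter()
    for seq in sequences:
        counts.update(temporal_pairs(seq, window))
    n = len(sequences)
    return {p: c / n for p, c in counts.items() if lo <= c / n <= hi}


# Hypothetical point-of-sale event sequences, one per cashier shift.
shifts = [
    ["sale", "sale"],
    ["sale", "sale"],
    ["sale", "sale"],
    ["sale", "void", "no_sale"],
    ["sale", "void"],
]
print(banded_patterns(shifts, window=1, lo=0.3, hi=0.5))  # {('sale', 'void'): 0.4}
```

Here the frequent pair ("sale", "sale") is discarded as ordinary and the single occurrence of ("void", "no_sale") as too rare, leaving ("sale", "void") as a candidate from which a detector could be built.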
A solution that is occasionally used in anti-money laundering is Peer Group Analysis [Weston et al., 2008]. The idea is to define, for each target object, a peer group containing the objects most similar to it. Peer groups can, for instance, be defined using clustering algorithms or business knowledge. The objects and the peer group summaries (which describe the “average” behavior within a peer group) are monitored over time. If an object starts to behave substantially differently from the average behavior of its peer group, it is considered suspicious. Changes that affect the whole peer group (e.g. due to changes in the market situation) are masked, which avoids false alerts. The authors argue that the distinguishing feature of peer group analysis is its focus on local patterns rather than global models: the unusualness of an object is measured not against the whole population but against “similar” objects. The choice of an adequate similarity definition is therefore crucial for this method. The approach also rests on important assumptions that are not stated explicitly in the publication. In particular, the authors assume that a change in behavior is a valuable indicator of fraudulent behavior, which may not always hold. We will discuss this in greater detail when introducing our unifying framework in section 2.4. In a recent publication [Weston et al., 2008], an in-depth discussion of credit card fraud detection using peer group analysis is provided.
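The following Python sketch illustrates the core mechanism under our own assumptions: peer groups are taken to be the k nearest neighbours (Euclidean distance) in a reference window, which is only one of the possible similarity definitions mentioned above, and the suspiciousness of a target at time t is its deviation from the peer group mean, standardized by the peers' standard deviation. The function names, the data, and k are hypothetical, and this is not necessarily the exact statistic used in [Weston et al., 2008].

```python
import math


def peer_groups(profiles, k):
    """Assign each object its k nearest neighbours (Euclidean) in a reference window."""
    groups = {}
    for a, pa in profiles.items():
        ranked = sorted((math.dist(pa, pb), b) for b, pb in profiles.items() if b != a)
        groups[a] = [b for _, b in ranked[:k]]
    return groups


def peer_deviation(series, groups, target, t):
    """Standardized deviation of the target's value at time t from its peer group mean."""
    peers = [series[p][t] for p in groups[target]]
    mean = sum(peers) / len(peers)
    var = sum((x - mean) ** 2 for x in peers) / (len(peers) - 1)  # needs k >= 2
    sd = max(math.sqrt(var), 1e-9)  # guard against identical peers
    return (series[target][t] - mean) / sd


# Hypothetical activity per account over three periods: at t=1 a market-wide
# shift doubles everyone's activity, at t=2 only account "a" changes.
series = {
    "a": [1.0, 2.0, 5.0],
    "b": [1.1, 2.2, 2.1],
    "c": [0.9, 1.8, 1.9],
    "d": [1.0, 2.1, 2.0],
    "e": [10.0, 10.0, 10.0],
}
reference = {acc: values[:1] for acc, values in series.items()}  # period t=0
groups = peer_groups(reference, k=3)
for t in (1, 2):
    print(t, round(peer_deviation(series, groups, "a", t), 2))
```

The market-wide shift at t = 1 moves the peer group along with account "a", so its deviation stays small, whereas the isolated jump at t = 2 yields a large deviation. This is the masking of group-wide changes described above.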
A similar solution to the problem of low prior knowledge⁹ in a domain where fraud detection had not been attempted before is proposed in [Major and Riedinger, 2002]. This approach uses an “operations cycle” and a “development cycle” to detect fraud in health care claims. First, a Peer Group Analysis variant is used to find health care providers that “stand out from the mainstream”; these are then presented to a security unit. In the development cycle, rules are to be induced from the experts' analysis of the outliers. As in [Kim et al., 2003b], these rules are proposed to be cloned and mutated. Details of the development cycle process are not given. The system is intended to “address a class of identification problems that are more likely to be encountered in business than in science or engineering” and is therefore mostly application-oriented. The question arises whether outlier detection is a sensible discriminator for fraud identification. We will return to this issue when we discuss applicability below.
An alternative approach to the problem of missing identification knowledge is proposed in [Brockett et al., 2002]. Unlike most of the related work, it is not based on outlier detection; instead, it tries to achieve a classification (or at least a fraud suspiciousness ranking) without given class labels. Experts are required to examine each attribute used for prediction and rank the corresponding attribute values by likelihood of suspicion. Attributes are assumed to be ordinal. Based on this ranking, the attribute values are transformed into numerical RIDIT-Scores¹⁰ [Brockett et al., 2002]. In contrast to the obvious solution of simply assigning integers (for example, the rank) to the possible attribute values, RIDIT-Scores do not presuppose equal interval spacing and, in addition, are able to reflect the “abnormality” of an attribute value (which reflects the concept of entropy used, e.g., in decision tree algorithms). Summing the RIDIT-Scores of all attribute values of a given instance yields an overall score that can be used for classification or ranking. Evidently, this approach requires experts to supply a certain amount of model knowledge for the attribute value ranking. It could therefore be argued that it falls into the category of approaches with given model knowledge.
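A minimal Python sketch of the scoring step follows. It assumes the signed RIDIT variant B_i = Σ_{j<i} p_j − Σ_{j>i} p_j, which we believe corresponds to the one used in [Brockett et al., 2002] and which equals 2R_i − 1 for Bross's ridit R_i = Σ_{j<i} p_j + p_i/2. Categories are listed by the expert from least to most suspicious, proportions p_j are estimated from the data, and the instance score is the plain sum described above. The function names and the example attributes are hypothetical.

```python
from collections import Counter


def ridit_table(values, ranking):
    """Signed RIDIT score per category, in [-1, 1].

    ranking: categories ordered by the expert from least to most suspicious.
    values: observed values of this attribute across all instances.
    Score of category i = P(less suspicious category) - P(more suspicious category).
    """
    counts = Counter(values)
    n = len(values)
    props = [counts[c] / n for c in ranking]
    return {c: sum(props[:i]) - sum(props[i + 1 :]) for i, c in enumerate(ranking)}


def suspiciousness(instances, rankings):
    """Sum the RIDIT scores of all expert-ranked attributes for each instance."""
    tables = {
        attr: ridit_table([x[attr] for x in instances], ranking)
        for attr, ranking in rankings.items()
    }
    return [sum(tables[attr][x[attr]] for attr in rankings) for x in instances]


# Hypothetical insurance claims with two expert-ranked binary attributes.
claims = [
    {"police_report": "yes", "lawyer": "no"},
    {"police_report": "yes", "lawyer": "no"},
    {"police_report": "yes", "lawyer": "yes"},
    {"police_report": "no", "lawyer": "yes"},
]
rankings = {"police_report": ["yes", "no"], "lawyer": ["no", "yes"]}
print(suspiciousness(claims, rankings))  # [-0.75, -0.75, 0.25, 1.25]
```

The rare, suspicious value "no police report" receives 0.75, while the common value "yes" only moves the score by −0.25; this asymmetry is how the scores reflect the abnormality of an attribute value rather than just its rank.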
⁹ The authors Major and Riedinger use the term “fragmentary, microlevel knowledge”.
¹⁰ This term was introduced by [Bross, 1958] and denotes a scoring model for ranking attribute values according to an underlying latent variable (in this case, fraud likelihood).
An approach based on a first-order Markov chain for phone fraud detection is discussed in [Hollmén and Tresp, 1998]. The generative model consists of two hidden binary variables, representing whether an account is currently victimized by a fraudster and whether that fraudster is currently committing fraud, respectively. The observed binary variable represents whether the mobile phone is currently in use. The experiment was initially run in an entirely unsupervised manner, but due to poor performance, the available information about which accounts had been victimized by fraud was used for parameter estimation. Further examples that make use of outlier detection and behavior change analysis are [Yamanishi et al., 2000] and [Burge and Shawe-Taylor, 2001]. In [Xu et al., 2006], a method for monitoring behavior to detect online attacks is proposed. As in many other approaches, behavior profiling is followed by behavior monitoring, here at both the individual and the system level. The authors argue that system-level monitoring may help to detect system flaws that are exploited by many users (for example, obtaining game points without playing in an online game) but that may go undetected at the individual level because the impact is too low.
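To illustrate how a generative model of the kind described by Hollmén and Tresp can be used for detection, the following Python sketch performs forward filtering (predict, then update) over the joint hidden state. It assumes that the pair of hidden variables evolves as a first-order Markov chain, that a fraudster can only be active on a victimized account, and that victimization is absorbing; all probabilities and names are invented for illustration and do not come from [Hollmén and Tresp, 1998], whose model structure and parameters may differ.

```python
# Joint hidden state (V, F): V = account victimized, F = fraudster currently active.
# Assumption: F can only be 1 when V is 1, so the state (0, 1) is omitted.
STATES = [(0, 0), (1, 0), (1, 1)]

# Hypothetical TRANS[i][j] = P(state j at t+1 | state i at t).
# Victimization is treated as absorbing: once V = 1, it stays 1.
TRANS = [
    [0.99, 0.01, 0.00],
    [0.00, 0.70, 0.30],
    [0.00, 0.40, 0.60],
]
# Hypothetical P(phone in use = 1 | state): heavy usage while the fraudster is active.
P_USE = [0.1, 0.1, 0.9]
PRIOR = [1.0, 0.0, 0.0]


def victimization_probability(observations):
    """Forward filtering: P(V_t = 1 | usage observations up to t) for each t."""
    belief = PRIOR[:]
    result = []
    for used in observations:
        # Predict: propagate the belief one step through the Markov chain.
        predicted = [
            sum(belief[i] * TRANS[i][j] for i in range(len(STATES)))
            for j in range(len(STATES))
        ]
        # Update: weight by the likelihood of the observed usage and renormalize.
        weighted = [
            p * (P_USE[j] if used else 1.0 - P_USE[j])
            for j, p in enumerate(predicted)
        ]
        total = sum(weighted)
        belief = [w / total for w in weighted]
        # Marginalize over F to get the probability that the account is victimized.
        result.append(sum(b for b, (v, _) in zip(belief, STATES) if v == 1))
    return result


usage = [0] * 10 + [1] * 10  # quiet period followed by a burst of calls
probs = victimization_probability(usage)
print(round(probs[0], 3), round(probs[-1], 3))
```

With these made-up parameters, the quiet periods keep the estimated victimization probability low, while the subsequent burst of usage drives it toward one. In a real system, such parameters would have to be estimated from data, which, as noted above, eventually required labels of victimized accounts.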