Fraud Detection Based on Prior Identification Knowledge

2.3 Related Work

2.3.3 Fraud Detection Based on Prior Identification Knowledge

The availability of identification knowledge maps directly to the use of supervised data mining algorithms, which require the corresponding labels. As expected, most of the related work starting from this situation makes use of supervised algorithms. In the field of management fraud, [Fanning and Cogger, 1998] uses a neural network (AutoNet) to learn a model from constructed training data. Informal identification knowledge in the form of SEC Enforcement Releases (footnote 11) is converted to labels for training. The entity attributes are descriptions of the company and its management structure; this is one of very few examples in fraud detection where the focus lies not on events but on objects.

A considerable amount of research in the fraud detection area is based on ready-to-use data with labeled examples. The focus lies on the design of the data mining method, while application issues are completely ignored. In this setting, evaluation is straightforward. A common approach is the use of multiple classification models, which are combined using meta-learning (footnote 12). [Phua et al., 2004] uses this approach to detect illegitimate car insurance claims. An explicit cost model, which considers the average cost of an investigation and the average cost per claim, is used for evaluation. Performance is measured in cost savings, which is attractive but may be infeasible for most real-world evaluations, as the actual costs may be very hard to determine. Meta-learning is also proposed to address the problems of massive data, highly skewed data, and variable classification costs [Chan et al., 1999; Stolfo et al., 1997; Stolfo et al., 1998]. The idea of combining local fraud classifiers, each computed in a different financial institution, into a global detector using meta-learning is introduced in [Prodromidis and Stolfo, 1999]. This should allow companies to share knowledge about fraud without exchanging sensitive data and allow for a global detector which “will be able to pick up patterns of fraud that are not detectable at the local level [...]”. The idea seems attractive because, e.g., in money laundering detection, the limited local views of single financial institutions form a crucial limitation. However, it remains unclear whether models that are unspecific enough to be exchanged without giving away sensitive data can be combined into a more expressive meta-classifier. In particular, without revealing customer information, the activities of a person in different financial institutions cannot be mapped to each other. Another issue is differences in the schema definitions of the databases, which lead to incompatible classifiers. [Prodromidis and Stolfo, 1999] describes a technique called “bridging” to overcome this problem, which is essentially a combination of well-known data preprocessing steps. This approach is reported to be successful in an experimental setup using credit card fraud data from two participating banks.

11 U.S. Securities and Exchange Commission Enforcement Releases, http://www.sec.gov/divisions/enforce/friactions.shtml
12 Meta-learning denotes the learning on the basis of results from previously applied learners. An extensive survey can
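The cost-savings style of evaluation mentioned above for [Phua et al., 2004] can be illustrated with a minimal sketch. It assumes a simplified cost model in which every alert triggers one investigation at average cost and every undetected fraudulent claim is paid out at average claim cost; the exact formula in the cited work may differ. The names `Outcome`, `model_cost`, and `cost_savings`, as well as all numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    hits: int          # fraudulent claims flagged by the classifier
    false_alarms: int  # legitimate claims flagged by the classifier
    misses: int        # fraudulent claims that were not flagged


def model_cost(o, cost_investigation, cost_claim):
    # Assumption: each alert is investigated, each missed fraud is paid out.
    return (o.hits + o.false_alarms) * cost_investigation + o.misses * cost_claim


def cost_savings(o, cost_investigation, cost_claim):
    # Baseline without any detection: every fraudulent claim is paid out.
    baseline = (o.hits + o.misses) * cost_claim
    return baseline - model_cost(o, cost_investigation, cost_claim)


# Hypothetical numbers, chosen only for illustration.
outcome = Outcome(hits=60, false_alarms=200, misses=40)
savings = cost_savings(outcome, cost_investigation=200.0, cost_claim=2500.0)
print(savings)
```

The sketch also shows why this measure is hard to apply in practice: the result depends entirely on the two average cost figures, which are rarely known with confidence.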

[Maes et al., 1993] is an example of a straightforward application of existing data mining algorithms to an “ideal” data set: it applies Bayesian and neural networks to credit card fraud detection data. Unfortunately, details about the features used are not given. Bayesian network models are used in [Ezawa and Norton, 1995] to predict uncollectible debt.

[Fawcett et al., 1997] and [Fawcett and Provost, 1996] report no success with standard machine learning techniques for phone cloning fraud detection. Two problems are identified. First, a call that is unusual for one customer can be typical for another. Context information to account for this is not directly available in this case, so it is derived from historical data specific to each account. This leads to the detection of changes in behavior rather than absolute indicators of fraud. Second, fraud identification on the basis of individual calls is assumed to produce an unacceptable true positive/false positive ratio. The single-event perspective is therefore replaced by an aggregated-event or single-object perspective, “smoothing out the variation and watching for coarser-grained changes that have better predictive power”. The proposed approach generates fraud rules for each separate account using a beam search. In a rule selection step, the most general rules are selected. These general rules are then turned into account-specific profiling monitors, that is, the rules are customized to each account. If a monitor registers a deviation from the modeled normal behavior, it generates an alert. In a final step, the evidence from the monitors is combined into an overall score. The evidence combination weights and the threshold above which an alarm is issued are learned using a standard learning algorithm. This approach is similar to peer group analysis [Weston et al., 2008] in two respects: it also uses local instead of global models, and it searches for behavior changes.
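The step from general rules to account-specific profiling monitors and a combined score can be sketched as follows. This is a minimal illustration and not the authors' implementation: each monitor is assumed to measure how far today's aggregated value lies above the account's own historical level, and the weights and threshold are fixed by hand here, whereas the cited work learns them. All names and numbers are hypothetical.

```python
from statistics import mean, pstdev


def make_monitor(history):
    """Build an account-specific monitor from the account's own daily history.

    `history` holds one aggregated value per past day, e.g. the number of
    calls that matched a general fraud rule. The monitor returns how many
    standard deviations today's value lies above the account's usual level.
    """
    mu = mean(history)
    sigma = pstdev(history) or 1.0  # fall back to 1.0 for a flat history

    def monitor(today):
        return max(0.0, (today - mu) / sigma)

    return monitor


WEIGHTS = [0.7, 0.3]  # assumption: fixed here, learned in the cited work
THRESHOLD = 3.0       # assumption: fixed here, learned in the cited work


def raise_alarm(monitors, today_values):
    # Combine the evidence of all monitors into one weighted score.
    outputs = [m(v) for m, v in zip(monitors, today_values)]
    score = sum(w * x for w, x in zip(WEIGHTS, outputs))
    return score >= THRESHOLD


# Two accounts, identical activity today: 10 night calls, 3 international minutes.
account_a = [make_monitor([0, 2, 1, 1, 0, 2]), make_monitor([0, 0, 5, 0, 5, 2])]
account_b = [make_monitor([9, 11, 10, 10, 9, 11]), make_monitor([30, 25, 35, 30, 28, 32])]
alarm_a = raise_alarm(account_a, [10, 3])
alarm_b = raise_alarm(account_b, [10, 3])
print(alarm_a, alarm_b)
```

The same activity raises an alarm for the first account and not for the second, which mirrors the observation that a call pattern unusual for one customer can be typical for another.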

[Cahill et al., 2002] extend this approach by changing the focus from a time-driven model (account summaries on a daily basis) to an event-driven model, by weighting recent calls more heavily, and by using adaptive, updatable user profiles they call “account signatures”. Account signatures are based on estimates of multivariate probability distributions that describe which call features are likely for the account and which are not. The authors state that “fraud typically results in unusual account activity”, so this probability distribution is the “right background to judge fraud against”. In addition to the account signature, a “fraud signature” is used, which requires labeled data. A more general view of the field of activity monitoring and the corresponding evaluation issues is given in [Fawcett and Provost, 1999].
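An event-driven, recency-weighted signature of the kind described above can be sketched for a single categorical call feature. The sketch assumes an exponentially weighted update with a hypothetical weight `w`, and it assumes that a call is scored by the log-likelihood ratio of the fraud signature to the account signature; the source only states that both signatures are used, so this scoring rule is one plausible reading. The duration buckets and all probabilities are invented for illustration.

```python
import math


def update_signature(signature, observed_bucket, w=0.1):
    """Event-driven update: blend the old signature with the newest call.

    The influence of older calls decays geometrically, so recent calls
    weigh more heavily. The result still sums to 1.
    """
    return {
        bucket: (1.0 - w) * p + w * (1.0 if bucket == observed_bucket else 0.0)
        for bucket, p in signature.items()
    }


def call_score(account_sig, fraud_sig, observed_bucket):
    # Positive when the call fits the fraud signature better than the
    # account's own signature (assumed scoring rule, see lead-in).
    return math.log(fraud_sig[observed_bucket] / account_sig[observed_bucket])


# Hypothetical signatures over call-duration buckets.
account_sig = {"short": 0.70, "medium": 0.25, "long": 0.05}
fraud_sig = {"short": 0.20, "medium": 0.30, "long": 0.50}

score_long = call_score(account_sig, fraud_sig, "long")
score_short = call_score(account_sig, fraud_sig, "short")
updated_sig = update_signature(account_sig, "long")
print(score_long, score_short, updated_sig)
```

Because the signature is updated after every call, a legitimate change in customer behavior is gradually absorbed, while a sudden burst of atypical calls still stands out against the current signature.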

A completely different approach is used in [Burge and Shawe-Taylor, 2001], which proposes an expert system using simple fuzzy logic-based rules. The fuzzy rules account for the subjective nature, and therefore the ambiguity, of the parameters. The system is designed for the detection of fraudulent insurance claims but is only tested on hypothetical data.
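How a fuzzy rule handles ambiguous parameters can be shown with a small generic sketch. It is not taken from the cited system: the rule, the membership functions, and the breakpoints are all hypothetical, and the fuzzy AND is modeled with the common minimum operator.

```python
def ramp_up(x, start, full):
    """Membership rising linearly from 0 at `start` to 1 at `full`."""
    if x <= start:
        return 0.0
    if x >= full:
        return 1.0
    return (x - start) / (full - start)


def suspicion(claim_amount, days_since_policy_start):
    # Hypothetical rule: IF amount is HIGH AND policy is NEW THEN SUSPICIOUS.
    amount_high = ramp_up(claim_amount, 5_000, 20_000)
    policy_new = 1.0 - ramp_up(days_since_policy_start, 30, 180)
    return min(amount_high, policy_new)  # fuzzy AND as minimum


print(suspicion(12_500, 30))   # moderately high amount on a brand-new policy
print(suspicion(25_000, 365))  # very high amount on a long-standing policy
```

Instead of a hard cutoff such as "amount above 10,000", each parameter contributes a degree of membership between 0 and 1, so borderline cases receive intermediate suspicion values.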

The recent survey of data mining-based financial fraud detection research [Yue et al., 2007] reports that the majority of publications rely on supervised methods such as regression and neural networks. This is not surprising, as available labeled data makes it very convenient to generate research results and publications quickly.