A Review on Prevention and Detection the Financial Fraud Statement in Government Sector using the Data Mining Framework

(Paper ID: 10ET3011201405)

Harshil Gandhi, P.G. student, Computer Engineering, IIET, Dharmaj, Gujarat, India (g.harshil@yahoo.in)

Pragnesh Patel, Asst. Prof., Computer Engineering, IIET, Dharmaj, Gujarat, India (pra2690@gmail.com)

Abstract: Many data mining techniques are being developed for detecting financial fraud [7]. When the internal auditing system fails in a large organization, or even in the government or public sector, the chances of fraud increase. Data mining techniques provide valuable aid in fraud detection when some important data are missing. This paper presents a comprehensive review of the literature on the application of data mining techniques to the detection of financial fraud and proposes a framework for data mining based smart meter fraud detection. Different classification techniques such as Naïve Bayes, decision trees, and support vector machines are used to detect financial fraud. The adoption of smart meters may bring new privacy concerns to the general public. Given that metering data of individual homes and factories is accumulated every 15 minutes, it is possible to infer the pattern of electricity consumption of individual users. To protect the privacy of users in a completely decentralized setting (i.e., individuals do not communicate with one another), the novel approach allows individual meters to report the true electricity consumption reading with a pre-determined probability. Load serving entities (LSE) can reconstruct the total electricity consumption of a region or a district through an inference algorithm, but their ability to identify individual users’ energy consumption patterns is significantly reduced. Using simulated data, we verify the feasibility of the proposed method and demonstrate performance advantages over existing approaches.

Keywords: smart metering, Data privacy, Distribution water/electricity.

I. INTRODUCTION

With profound changes of the electric power industry towards a smarter grid in support of sustainable energy utilization, many utility companies are in the process of replacing conventional metering devices with smart meters [15]. Smart meters make it possible to provide near real-time price incentives to customers, which could potentially reduce the need for expensive peak capacity and energy. The successful adoption of smart metering and pricing could offer many benefits, including reduction in wholesale prices [11], enhanced reliability [16], and environmental improvement [17]. However, the massive deployment of smart meters also raises a series of concerns, for example: (1) depending on the utility, the “gap” between the operational benefits and infrastructural investment is large [17]; and (2) the fear of loss of privacy (i.e., “spy at home”) may arise in ordinary customers. The first problem might be offset by benefits in the long run, but the second problem is becoming increasingly challenging [18] [19]. It would be possible for utilities to infer the type of appliances individual customers are using every 15 minutes (e.g., when you are using your computer and when your garage door is activated). The compromise of customers’ privacy would be significant if they are left unprotected.

User privacy has been an important issue in various applications involving information exchange, data sharing, and medical data dissemination. Many previous studies have been conducted in a centralized environment, where owners have the ability to adjust the data in a global manner. Oftentimes, privacy is protected by suppression, generalization, and randomization to ensure properties such as k-anonymity, l-diversity, or t-closeness. Most of these techniques intend to hide the identities of individuals in a crowd of others so that no single identity can be uniquely distinguished. Recently, some researchers have also considered privacy protection in a distributed environment, e.g., making peer-to-peer communication accountable without losing privacy and preserving location privacy in distributed environments.

Unfortunately, none of these models is suited to the privacy problems of smart electricity meters, which are set up in a decentralized environment. Individual smart meters report their readings to the load serving entity (LSE), but they do not communicate with one another. It remains an open question how to preserve the ability of the LSE to compute an approximation of the current electricity consumption (e.g., for dynamic pricing) while protecting the privacy of individual users. Efthymiou and Kalogridis suggested an approach to protecting smart metering data via anonymization, but their approach requires the participation of a third party.

Similarly, Quinn suggested that metering data can be aggregated and encrypted so that an individual’s information is anonymous to roughly the scale of a city block. Recently, Kalogridis et al. introduced a new approach to privacy protection of smart meters through undetectable appliance load signatures, which uses a rechargeable battery to moderate the home’s load signature in order to hide appliance usage information. All these techniques, however, require a significant effort in technology development, standards, policy, and regulatory activities, which are not yet available. In this paper, a privacy protection solution based on the existing infrastructure and technology is proposed. Most directly related to our research is a recent paper by Bohli et al., which added noise drawn from a probability distribution to each smart meter reading to prevent the adversary from guessing the patterns of energy consumption correctly. There are several issues with that approach:

(1) A substantial number of smart meters is required to ensure an accurate aggregated reading and to protect the privacy of individual customers;

(2) It is easy to recover true readings because the noise added to each smart meter follows the same distribution;

(3) Approximately half of the smart meters report negative readings because their added noise was set to cover the 50% confidence interval of the typical household power consumption, rendering erroneous outputs.

II. FRAUD DETECTION

Fraud refers to obtaining goods, services, or money by illegal means. Fraud involves events with criminal motives that are, for the most part, difficult to identify. It is one of the biggest threats to business and commercial establishments today [12].

A) Types of Fraud

The literature identifies various types of fraud, such as credit card fraud, telecommunication fraud, theft/counterfeit fraud, application fraud, behavioral fraud, and water and electricity fraud [12][13].

1) Credit Card Fraud: Credit card fraud is divided into two types: offline fraud and online fraud. Offline fraud is committed by using a stolen physical card at a call center or any other place. Online fraud is committed via the internet, phone, shopping, the web, or in the absence of the card holder [10].

2) Telecommunication Fraud: This is the use of telecommunication services to commit other forms of fraud. Consumers, businesses, and communication service providers are the victims.

3) Theft Fraud/Counterfeit Fraud: In this section, we focus on theft and counterfeit fraud, which are related to one another.

Theft fraud refers to using a card that is not yours. As soon as the owner gives some feedback and contacts the bank, the bank will take measures to check the thief as early as possible. Likewise, counterfeit fraud occurs when the credit card is used remotely, where only the credit card details are needed.

4) Application Fraud: When someone applies for a credit card with false information, this is termed application fraud. For detecting application fraud, two different situations have to be classified. When applications come from the same user with the same details, they are called duplicates, and when

5) Behavioral Fraud: Behavioral fraud occurs when sales are made on a cardholder-present basis and the details of legitimate cards have been obtained on a fraudulent basis.

6) Water Consumption Fraud: Water consumer dishonesty is a problem faced worldwide by all water and power utilities that are managed by a financial billing system. Finding efficient measurements for detecting fraudulent electricity consumption has been an active research area in recent years. The work in [9] presents a new model for Non-Technical Loss (NTL) detection in water consumption utilities using data mining techniques.

7) Electricity Fraud: Electricity fraud can be detected with an unsupervised artificial neural network called SOM (Self-Organizing Map), which identifies the consumption profile historically registered for a consumer and compares it with present behavior to reveal possible fraud [4].

III. LITERATURE REVIEW

It is necessary to identify the various methodologies that could be used for the detection of fraud, and the objective of this portion of the literature review is to identify these techniques. In this section, a graphical conceptual framework is proposed for the available literature on the applications of data mining techniques to financial accounting fraud detection. The classification framework, which is shown in Fig. 1, is based on a literature review of existing knowledge on the nature of data mining research.

Fig. 1 Various methods to detect fraud

Many data mining techniques are now used to detect fraud; they are listed below.

1) Neural networks

2) Decision tree

3) Support vector machine

4) KNN technique

5) Logistic regression

6) Genetic algorithms

7) Outlier detection

1) Neural Network:

A neural network is trained on the last one or two years of data about the particular pattern of credit card use by a particular consumer. As shown in Fig. 2, the neural network is trained on information in various categories about the card holder: one category may cover the occupation and income of the card holder, while another covers large purchases, including the number of large purchases, their frequency, and the locations where they take place within a fixed time period. When a credit card is being used by an unauthorized user, the neural network based fraud detection system checks the pattern used by the fraudster against the pattern of the original card holder on which the neural network has been trained; if the patterns match, the neural network declares the transaction OK [14].

Fig. 2 Layer of Neural Network in Credit Card

2) Decision Tree

The decision tree is a kind of inductive learning method for which a variety of algorithms have been developed, such as CART, ASSISTANT, ID3, and C4.5. Among them, the ID3 algorithm, proposed by J. Ross Quinlan in 1986, has developed particularly rapidly and is widely applied, which has greatly promoted the use of decision tree algorithms. ID3 uses information gain from information theory as the measure for selecting important attributes: it takes the attribute with the greatest information gain as the root node of the decision tree, establishes a branch for each value of that node, and then runs the same algorithm recursively on all of the branches [1].
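The attribute selection step of ID3 can be illustrated with a minimal sketch. The toy transactions, the attribute names (`amount`, `location`) and the helper names below are assumptions made for illustration only; they are not taken from [1].

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    remainder = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels, attributes):
    """ID3 places the attribute with the greatest information gain at the node."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))

# Hypothetical toy transactions (illustrative only).
rows = [
    {"amount": "high", "location": "home"},
    {"amount": "high", "location": "abroad"},
    {"amount": "low", "location": "home"},
    {"amount": "low", "location": "abroad"},
]
labels = ["fraud", "fraud", "ok", "ok"]

root = best_attribute(rows, labels, ["amount", "location"])
```

In this toy data `amount` separates the two classes perfectly while `location` carries no information, so `amount` would become the root node; a full ID3 implementation would then repeat the same selection on each branch.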

3) Support Vector Machine

SVMs are supervised learning methods that generate input-output mapping functions from a set of labeled training data.

“The mapping function can be either a classification function (used to categorize the input data) or a regression function (used to estimation of the desired output)” [9]. For classification, “nonlinear kernel functions are often used to transform the input data (inherently representing highly complex nonlinear relationships) to a high dimensional feature space in which the input data becomes more separable (i.e., linearly separable) compared to the original input space. Then, the maximum-margin hyperplanes are constructed to optimally separate the classes in the training data” [9]. SVMs can be used for prediction as well as classification. They have been applied to a number of areas, including handwritten digit recognition, object recognition, and speaker identification, as well as benchmark time-series prediction tests [9].
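The effect of mapping data to a higher-dimensional feature space can be shown with a small sketch. This is not an SVM trainer: it only demonstrates, on made-up one-dimensional data, that an explicit quadratic feature map (a stand-in for what a kernel does implicitly) can turn a non-separable problem into a linearly separable one. The data and function names are assumptions for illustration.

```python
def feature_map(x):
    """Explicit quadratic feature map phi(x) = (x, x^2)."""
    return (x, x * x)

def separable_by_threshold(values, labels):
    """True if a single threshold on a 1-D value puts each class on its own side."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == -1]
    return max(pos) < min(neg) or max(neg) < min(pos)

# Hypothetical data: class +1 lies at the extremes, class -1 near zero.
xs = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
ys = [1, 1, -1, -1, -1, 1, 1]

# In the original input space no single threshold separates the classes.
in_input_space = separable_by_threshold(xs, ys)

# On the x^2 coordinate of the feature space a single threshold suffices,
# which corresponds to a linear separator in the (x, x^2) plane.
in_feature_space = separable_by_threshold([feature_map(x)[1] for x in xs], ys)
```

An SVM would go one step further and choose, among all separators in the feature space, the one with the largest margin to the nearest training points.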

4) KNN Technique

The k-nearest neighbor (KNN) algorithm is among the simplest of all machine learning algorithms: an object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors. Nearest-neighbor classifiers are based on learning by analogy, that is, by comparing a given test tuple with training tuples that are similar to it. The training tuples are described by n attributes. Each tuple represents a point in an n-dimensional space. In this way, all of the training tuples are stored in an n-dimensional pattern space.
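A minimal sketch of the majority-vote rule follows. It assumes Euclidean distance and uses made-up two-attribute tuples; the labels `normal` and `fraud` and the function name `knn_classify` are illustrative choices, not part of the reviewed work.

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k=3):
    """Assign the query to the class most common among its k nearest training tuples."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training tuples in a 2-dimensional pattern space.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["normal", "normal", "normal", "fraud", "fraud", "fraud"]

near_first_cluster = knn_classify(train_points, train_labels, (1.1, 1.0))
near_second_cluster = knn_classify(train_points, train_labels, (5.1, 5.0))
```

Because every training tuple is kept and compared at query time, the cost of one classification grows with the size of the training set; in practice the attributes are usually scaled first so that no single one dominates the distance.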

5) Logistic Regression

Data mining tasks increasingly rely on statistical models, including discriminant analysis, regression analysis, and multiple logistic regression. Logistic regression (LR) is useful for situations in which we want to predict the presence or absence of a characteristic or outcome based on the values of a set of predictor variables. It is similar to a linear regression model but is suited to models where the dependent variable is dichotomous. Logistic regression coefficients can be used to estimate odds ratios for each of the independent variables in the model, and the method is applicable to a broader range of research situations than discriminant analysis [12].
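The link between logistic regression coefficients and odds ratios can be sketched as follows. The intercept and coefficient values are hypothetical, chosen only to make the example concrete; they are not fitted to any data from the reviewed papers.

```python
import math

def predict_probability(intercept, coefficients, features):
    """P(outcome = 1 | features) under a logistic regression model."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))

def odds_ratios(coefficients):
    """exp(b) is the multiplicative change in the odds per unit increase of a predictor."""
    return [math.exp(b) for b in coefficients]

def odds(probability):
    """Convert a probability into odds."""
    return probability / (1.0 - probability)

# Hypothetical fitted model with two predictor variables.
intercept = -2.0
coefficients = [0.7, 1.1]

p_base = predict_probability(intercept, coefficients, [0.0, 0.0])
p_plus_one = predict_probability(intercept, coefficients, [1.0, 0.0])

# Raising the first predictor by one unit multiplies the odds by exp(0.7).
observed_ratio = odds(p_plus_one) / odds(p_base)
```

The ratio `observed_ratio` agrees with `odds_ratios(coefficients)[0]`, which is why the coefficients of a fitted model are commonly reported as odds ratios.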

6) Genetic algorithms

In a genetic algorithm, the procedure is repeated until a pre-specified number of generations has passed, and the best solution found is returned. It is a parametric procedure, and its parameters need to be tuned to the problem undertaken to get better performance. The list of these parameters and their settings is needed to generate the fraudulent transactions. Such parameters are needed to compute the critical values for the credit card (CC) usage frequency count, CC usage location, CC overdraft, current bank balance, average daily spending, etc., as shown in Fig. 3 [10].
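The generational loop described above can be sketched generically. The example below is not the system of [10]: it evolves bit strings against a toy fitness function (the number of 1-bits), and the operators (tournament selection, one-point crossover, bit-flip mutation) and all parameter values are assumptions chosen for illustration.

```python
import random

def evolve(fitness, genome_length, population_size=20, generations=30,
           mutation_rate=0.05, seed=0):
    """Run a basic generational GA on bit strings and return the best genome seen."""
    rng = random.Random(seed)

    def tournament(population):
        # Selection: keep the fitter of two randomly drawn individuals.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    best = max(population, key=fitness)

    for _ in range(generations):  # stop after a pre-specified number of generations
        children = []
        while len(children) < population_size:
            mother, father = tournament(population), tournament(population)
            cut = rng.randint(1, genome_length - 1)  # one-point crossover
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        best = max(population + [best], key=fitness)  # remember the best so far
    return best

# Toy fitness (an assumption for illustration): count the 1-bits in the genome.
best = evolve(fitness=sum, genome_length=16)
```

In a fraud detection setting the genome would instead encode the tunable parameters or thresholds, and the fitness function would score how well they separate known fraudulent from legitimate transactions.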

Fig. 3 System Design of Genetic Algorithm

Table 1 Comparison of different approaches used for privacy preservation

7) Outlier Detection

Outliers are a basic form of non-standard observation that can be used for fraud detection. An observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism is known as an outlier.
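One simple way to operationalize this definition is to model normal behavior by the mean and standard deviation of historical values and flag new observations that lie far from it. The sketch below assumes a three-standard-deviation threshold and made-up consumption figures; both are illustrative choices rather than values from the reviewed literature.

```python
import statistics

def find_outliers(baseline, observations, threshold=3.0):
    """Return observations more than `threshold` standard deviations from the baseline mean."""
    mean = statistics.mean(baseline)
    spread = statistics.stdev(baseline)  # sample standard deviation
    return [x for x in observations if abs(x - mean) > threshold * spread]

# Hypothetical monthly consumption history for one customer.
history = [30.0, 32.0, 29.0, 31.0, 30.0, 28.0, 31.0, 29.0]
new_readings = [30.5, 4.0, 29.0]

suspicious = find_outliers(history, new_readings)
```

Here only the reading of 4.0 is flagged, since it falls far below the customer's usual range. No labeled fraud cases are needed, but a flagged reading is only a candidate for inspection, not proof of fraud.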

An unsupervised learning approach is employed by this model.

Generally, the result of unsupervised learning is a new explanation or representation of the observed data, which then leads to improved future decisions. Unsupervised methods do not need prior knowledge of fraudulent and non-fraudulent transactions in a historical database; instead, they detect changes in behavior and/or unusual transactions. These methods model a baseline distribution that represents normal behavior and then detect observations that deviate from this norm. In supervised methods, on the other hand, models are trained to discriminate between fraudulent and non-fraudulent transactions so that new observations can be assigned to classes.

IV. PROBLEMS REGARDING FRAUD DETECTION

Water and electricity supply organizations face a major problem in distribution: it is important to balance expenses with income so that service can be delivered equitably and according to the demand of citizens. The irregularities that upset this balance are known as non-technical losses (NTLs). NTLs originating from electricity theft and other customer malfeasance are a problem in the electricity supply industry [11][18]. NTL is a problem in the water supply industry too, because water and electricity distribution systems both depend on meter technology and the load profiling concept. NTLs include the following activities:

1) Losses due to faulty meters and equipment.

2) Tampering with meters so that meters record low rates of consumption.

3) Stealing by bypassing the meter or otherwise making illegal connections.

4) Arranging false readings by bribing meter readers.

5) Arranging billing irregularities with the help of internal employees by means of such subterfuges as making out lower bills, adjusting the decimal point position on the bills, or just ignoring unpaid bills.

6) Poor revenue collection techniques

V. CONCLUSION

Fraud has become a more and more serious problem in recent years. To improve customer risk management in an automatic and effective way, building an accurate and easy-to-handle risk monitoring system is one of the key tasks for the customer. One aim of this study is to identify the model that best identifies fraud cases. There are many ways of detecting fraud. If one of these algorithms, or a combination of them, is applied to a bank credit card fraud detection system, the probability of fraudulent transactions can be predicted soon after the use of water or electricity by the customer. This paper contributes towards effective ways of detecting water and electricity fraud.

REFERENCES

[1] Kaiqi Zou, Wenming Sun, Hongzhi Yu and Fengxin Liu, “ID3 Decision Tree in Fraud Detection Application,” International Conference on Computer Science and Electronics Engineering, 2012.

[2] Liang Lei, “Card Fraud Detection by Inductive Learning and Evolutionary Algorithm,” Sixth International Conference on Genetic and Evolutionary Computing, 2012.

[3] Koosha Golmohammadi and Osmar R. Zaiane, “Data Mining Applications for Fraud Detection in Securities Market,” European Intelligence and Security Informatics Conference, 2012.

[4] José E. Cabral, João O. P. Pinto, Evandro M. Martins and Alexandra M. A. C. Pinto, “Fraud Detection in High Voltage Electricity Consumers Using Data Mining,” 2008.

[5] Charles Francis, Noah Pepper and Homer Strong, “Using Support Vector Machines to Detect Medical Fraud and Abuse,” Annual International Conference of the IEEE EMBS, Boston, 2011.

[6] Aijun Yang and Ping Song, “Application of Data Mining Technology in Online Audit,” International Conference on Computer Science and Service System, 2012.

[7] Anuj Sharma and Prabin Kumar Panigrahi, “A Review of Financial Accounting Fraud Detection Based on Data Mining Techniques,” International Journal of Computer Applications, 2012.

[8] G. Kesavaraj and S. Sukumaran, “A Study on Classification Techniques in Data Mining,” 2013.

[9] Eyad H. Humaid and Tawfeeg Barhoum, “Water Consumption Financial Fraud Detection: A Model Based on Rule Induction,” Palestinian International Conference on Information and Communication Technology, 2013.

[10] K. Ramakalyani and D. Umadevi, “Fraud Detection of Credit Card Payment System by Genetic Algorithm,” International Journal of Scientific & Engineering Research, Volume 3, Issue 7, July 2012.

[11] Anuj Sharma and Prabin Kumar Panigrahi, “A Review of Financial Accounting Fraud Detection Based on Data Mining Techniques,” International Journal of Computer Applications, 2012.

[12] Khyati Chaudhary, Jyoti Yadav and Bhawna Mallick, “A Review of Fraud Detection Techniques: Credit Card,” International Journal of Computer Applications, 2012.

[13] Renu and Suman, “Analysis on Credit Card Fraud Detection Methods,” International Journal of Computer Trends and Technology (IJCTT), Volume 8, Number 1, Feb 2014.

[14] Raghavendra Patidar and Lokesh Sharma, “Credit Card Fraud Detection Using Neural Network,” International Journal of Soft Computing and Engineering, 2011.

[15] Andrei Sorin Sabau, “Survey of Clustering Based Financial Fraud Detection Research,” 2012.

[16] Leman Akoglu, Rishi Chandy and Christos Faloutsos, “Opinion Fraud Detection in Online Reviews by Network Effects,” Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media, 2013.

[17] IAIS, “Application Paper on Deterring, Preventing, Detecting, Reporting and Remedying Fraud in Insurance,” approved on 28 September 2011.