EFFICIENT DETECTION AND PREVENTION OF DATA LEAKAGE IN DTE AND DCE


A.N. VINODHINI, PG Scholar, Dr. N.G.P. Institute of Technology, Coimbatore-641665, Tamil Nadu, India.

Dr. S. AYYASAMY, Professor, Dr. N.G.P. Institute of Technology, Coimbatore-641665, Tamil Nadu, India.

ABSTRACT

Leakage of sensitive data from data terminal equipment (DTE) and data circuit-terminating equipment (DCE) is a serious threat to organizational security. Data loss may occur when files lack proper encryption, so organizations need tools to store and transfer sensitive data safely. However, detecting and preventing the leakage of sensitive information in DTE and DCE is challenging. Nowadays automation products are increasingly common, and so is the demand for them, yet existing systems still have many security drawbacks. The most important factor is computerizing all the data in a centralized server and taking backups of old records.

The main objective of this project is to secure communication among the production unit, administration unit, sales unit, quality check unit and storehouse unit, and to avoid the data leakage problem. A data distributor has given sensitive data to a set of supposedly trusted agents (third parties). Some of the data have been leaked and found in an unauthorized place (e.g., on the web or on somebody's laptop). The distributor must assess the likelihood that the leaked data came from one or more agents, as opposed to having been gathered by other independent means. We therefore design the system so that, when the data reach any destination, whether an agent or an unauthorized party, the data together with the receiver's IP address are sent back to the distributor. If the distributor receives an IP address other than an agent's address, it concludes that the data have been leaked. Fake objects are also inserted along with the data. From the data received, the distributor compares against its database and identifies which agent leaked the data by calculating a guilt probability for each agent. In addition, IP synchronization is introduced to help prevent data leakage. The overall goals of this project are to reduce manual work, increase processing speed and ensure the reliability of data.

Keywords : Leakage detection, sensitive data, guilt model, IP synchronization, Fake object, access control

INTRODUCTION

In the course of doing business, sensitive data must sometimes be handed over to supposedly trusted third parties. For example, a hospital may give patient records to researchers who will devise new treatments. Similarly, a company may have partnerships with other companies that require sharing customer data. Another enterprise may outsource its data processing, so data must be given to various other companies. We call the owner of the data the distributor and the supposedly trusted third parties the agents. Our goal is to detect when the distributor's sensitive data have been leaked by agents and, if possible, to identify the agent that leaked the data.

We consider applications where the original sensitive data cannot be perturbed. Perturbation is a very useful technique in which the data are modified and made “less sensitive” before being handed to agents. For example, one can add random noise to certain attributes, or one can replace exact values by ranges. However, in some cases it is important not to alter the original distributor's data. For example, if an outsourcer is doing our payroll, he must have the exact salaries and customer bank account numbers. If medical researchers will be treating patients (as opposed to simply computing statistics), they may need accurate data for the patients.

Traditionally, leakage detection is handled by watermarking, i.e., a unique code is embedded in each distributed copy. If that copy is later discovered in the hands of an unauthorized party, the leaker can be identified. Watermarks can be very useful in some cases but, again, they involve some modification of the original data. Furthermore, watermarks can sometimes be destroyed if the data recipient is malicious.
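To make the contrast concrete, the two perturbations mentioned above (adding random noise and replacing exact values by ranges) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not part of the proposed system: the function names `add_noise` and `to_range`, the salary value and the noise scale are all hypothetical. After either transformation the outsourcer no longer has the exact salary, which is why perturbation does not suit applications such as payroll.

```python
import random


def add_noise(value: float, scale: float, rng: random.Random) -> float:
    """Perturb a numeric attribute by adding uniform noise drawn from [-scale, scale]."""
    return value + rng.uniform(-scale, scale)


def to_range(value: int, width: int) -> tuple[int, int]:
    """Replace an exact value by the half-open range [low, low + width) that contains it."""
    low = (value // width) * width
    return (low, low + width)


# Hypothetical salary record, for illustration only.
salary = 52_300
rng = random.Random(7)  # fixed seed so the noisy value is reproducible

print(add_noise(salary, 500.0, rng))  # some value within 500 of 52300
print(to_range(salary, 10_000))       # (50000, 60000)
```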

RELATED WORKS

Wright et al. [1] focus on the use of traffic morphing techniques to thwart traffic classifiers that use features based on packet sizes. As an example of the usage of their technique, consider a general web page classifier that uses packet sizes to determine the identity of web pages. If a user connects to www.webmd.com to search for medical information over an encrypted connection where packet sizes are not padded, the classifier can examine these sizes and determine that the user has indeed gone to www.webmd.com. Their approach allows the user (with the cooperation of the web server or a proxy) to morph her download so that it appears to the classifier as a different web page (e.g., www.espn.com) [1]. Iacovazzi and Baiocchi [2] obtain masking by means of padding and fragmentation: they formally define the ideal target of masking, cast masking as a statistical optimization problem that minimizes the required overhead, and find the optimal solution for two application types [2]. The authors of [3] perform an extensive experimental evaluation on the … tolerance of their techniques; their results under various data-leak scenarios and setups show that the method supports accurate detection with a very small number of false alarms, even when the presentation of the data has been transformed [3]. The design and development of techniques for securely outsourcing various functionalities to untrusted servers are receiving growing attention in the research community. The rapid growth in the availability of cloud services makes such services attractive for clients with limited computing or storage resources who are unwilling or unable to procure and maintain their own computing infrastructure [4].
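The masking-by-padding-and-fragmentation idea attributed to [2] above can be illustrated with a deliberately simple scheme: split packets larger than an MTU and pad each remaining piece up to a multiple of a fixed block size. This Python sketch is an assumption-laden illustration, not the optimized morphing of [1] or the optimal masking of [2]; `mask_packet_sizes`, the 256-byte block and the 1500-byte MTU are hypothetical choices.

```python
def mask_packet_sizes(sizes: list[int], block: int, mtu: int) -> list[int]:
    """Fragment packets above `mtu` and pad each piece up to a multiple of `block` bytes.

    A fixed-bucket scheme for illustration; it makes no attempt to minimize overhead.
    """
    masked: list[int] = []
    for size in sizes:
        # Fragment: emit full mtu-sized pieces until the remainder fits.
        while size > mtu:
            masked.append(mtu)
            size -= mtu
        # Pad: round the remainder up to the next multiple of `block` (capped at mtu).
        padded = -(-size // block) * block
        masked.append(min(padded, mtu))
    return masked


print(mask_packet_sizes([40, 1500, 3100], block=256, mtu=1500))
# [256, 1500, 1500, 1500, 256]
```

Many distinct packet sizes collapse onto a few observable values, which hides information from a size-based classifier; the price is the extra padding bytes, which is exactly the overhead that [2] formulates as an optimization problem.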

FAKE OBJECT

The distributor may be able to add fake objects to the distributed data in order to improve his effectiveness in detecting guilty agents. However, fake objects may impact the correctness of what agents do, so they may not always be allowable. The idea of perturbing data to detect leakage is not new, e.g., [1]. However, in most cases, individual objects are perturbed, e.g., by adding random noise to sensitive salaries or adding a watermark to an image. In our case, we perturb the set of distributor objects by adding fake elements. In some applications, fake objects may cause fewer problems than perturbing real objects. For example, say that the distributed data objects are medical records and the agents are hospitals. In this case, even small modifications to the records of actual patients may be undesirable. However, the addition of some fake medical records may be acceptable, since no patient matches these records, and hence no one will ever be treated based on fake records.

Our use of fake objects is inspired by the use of “trace” records in mailing lists. In this case, company A sells to company B a mailing list to be used once (e.g., to send advertisements). Company A adds trace records that contain addresses owned by company A. Thus, each time company B uses the purchased mailing list, A receives copies of the mailing. These records are a type of fake object that helps identify improper use of data. The distributor creates fake objects and adds them to the data that he distributes to agents. Fake objects must be created carefully so that agents cannot distinguish them from real objects.
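The trace-record idea can be sketched as follows: every agent receives the real records plus one fake record reserved for that agent, so a fake record found in leaked data points to the agent that received it. This is a minimal Python sketch under stated assumptions; `allocate_with_fakes`, `trace_leak` and all the records are hypothetical, and producing fake records that agents cannot distinguish from real ones is left to the caller.

```python
import random


def allocate_with_fakes(real, agents, fake_pool, rng):
    """Give each agent the real records plus one fake record reserved for that agent.

    Returns (datasets, owner): the shuffled record list per agent, and a map
    from each fake record to the agent that received it.
    """
    if len(fake_pool) < len(agents):
        raise ValueError("need at least one distinct fake record per agent")
    datasets, owner = {}, {}
    for agent, fake in zip(agents, fake_pool):
        records = list(real) + [fake]
        rng.shuffle(records)  # the fake must not sit at a predictable position
        datasets[agent] = records
        owner[fake] = agent
    return datasets, owner


def trace_leak(leaked, owner):
    """Return, sorted, the agents whose fake records appear in the leaked data."""
    return sorted({owner[r] for r in leaked if r in owner})


# Hypothetical customer records; the fake ones must look just as plausible.
real = [("Asha R", "asha.r@example.com"), ("Ravi S", "ravi.s@example.com")]
fakes = [("Meena K", "meena.k@example.com"), ("Arun P", "arun.p@example.com")]
datasets, owner = allocate_with_fakes(real, ["agent1", "agent2"], fakes, random.Random(0))

print(trace_leak(datasets["agent2"], owner))  # ['agent2']
```

If the leaked data contain only real records, `trace_leak` returns an empty list: the fake objects provide evidence only when they leak along with the real data.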

IP SYNCHRONIZATION

In the notation used here, an IP specification has three or four octets (parts). No octet can have a number above 255; all the octets must be numbers between 0 and 255. For example, 209.20.5 or 216.222.8.131. The former, with three octets, covers the entire IP range from 209.20.5.0 to 209.20.5.255; the latter (216.222.8.131) is a complete four-octet address, i.e., a single address within an IP range. IP addresses are “Internet Protocol” addresses, that is, numeric addresses assigned to servers and users connected to the Internet.

Some Internet services use the source address of the client's computer as a form of authentication. These systems keep track of the IP address that an end user used the last time that user accessed the site and try to determine whether the user is legitimate. When the same user accesses the site from a different source IP address, the site asks for further authentication to revalidate the client's computer. The theory is that a user's computer at its usual location has a somewhat persistent IP address, whereas a user with a new address may be mobile or using a less secure wireless medium and therefore requires further authentication. This technique is used by some banking sites, some online gaming sites, and Gmail. Similarly, many organizations have firewall policies with objects named like “Admin Laptop” that hold the single IP address of that computer. Thus, once an IP address has been synchronized with the network, the application in the network can be accessed only from that particular IP address.
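The three- and four-octet notation, access restricted to a synchronized address, and the abstract's check that a receiver's reported IP address belongs to an agent can all be sketched with Python's standard `ipaddress` module. The function names, the per-user `synced` table and the agent address list are hypothetical; this illustrates the rules stated above and is not the project's implementation.

```python
import ipaddress


def parse_ip_spec(spec: str) -> ipaddress.IPv4Network:
    """Convert '209.20.5' (a whole range) or '216.222.8.131' (one address) to a network."""
    octets = spec.split(".")
    if len(octets) not in (3, 4) or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError(f"expected three or four octets between 0 and 255: {spec!r}")
    suffix = ".0/24" if len(octets) == 3 else "/32"
    return ipaddress.IPv4Network(spec + suffix)


def is_access_allowed(user: str, client_ip: str, synced: dict[str, str]) -> bool:
    """Allow a user only from the single IP address synchronized for that user."""
    bound = synced.get(user)
    return bound is not None and ipaddress.ip_address(client_ip) == ipaddress.ip_address(bound)


def leak_suspected(receiver_ip: str, agent_specs: list[str]) -> bool:
    """True if the reported receiver address lies outside every agent's address or range."""
    addr = ipaddress.ip_address(receiver_ip)
    return not any(addr in parse_ip_spec(spec) for spec in agent_specs)


agents = ["209.20.5", "216.222.8.131"]
print(parse_ip_spec("209.20.5"))              # 209.20.5.0/24
print(leak_suspected("209.20.5.77", agents))  # False: inside an agent's range
print(leak_suspected("10.0.0.9", agents))     # True: unknown receiver
print(is_access_allowed("admin", "216.222.8.131", {"admin": "216.222.8.131"}))  # True
```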

AGENT GUILT MODEL

Algorithm

Guilt Probability Calculation:

The employee with the highest probability is considered the leaker. The probability is calculated by using the fake data that are sent along with the leaked data. The admin finds out to whom the fake data belong by referring to his database; if the fake data of an employee match the admin database, that employee's probability is increased by 0.1. In other words, the leaker is identified through the fake object that was inserted into the data before distribution and therefore accompanies the data when they leak. From the data received, the admin compares against his database and determines which employee leaked the data by calculating a probability for each employee on the leaked data. The admin selects the leaked data, and the system lists which employees hold these data.

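This step is implemented by the following code, which reads the fake object recorded for the employee selected in ListBox1 and compares it with the fake object found in the leaked data (orgfake):

    data(); // helper from the original project; assumed to open the SqlConnection "con"
    string query = "select fakeobj from fakedata where empid = @empid";
    SqlCommand cmd = new SqlCommand(query, con);
    // Parameterized query: the list-box value is never concatenated into the SQL text
    cmd.Parameters.AddWithValue("@empid", ListBox1.SelectedItem.Text);
    string empfake = null;
    using (SqlDataReader rd1 = cmd.ExecuteReader())
    {
        while (rd1.Read())
        {
            empfake = rd1[0].ToString(); // fake object recorded for this employee
        }
    }
    con.Close();

    double count = 0.1; // baseline probability for the selected employee
    // Compare the fake object in the leaked data with the one in the admin database
    if (orgfake == empfake)
    {
        count += 0.1; // raise this employee's guilt probability
    }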

Finally, a chart is generated from these probability values, and the employee with the maximum probability is considered the leaker.
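The scoring rule described above (a base value of 0.1 per employee, raised by 0.1 when the leaked fake object matches the one recorded for that employee, followed by picking the maximum) can be applied to all employees at once, as in this Python sketch. Counting several leaked fake objects, the 1.0 cap and the tie-breaking rule are assumptions not stated in the text; `guilt_scores`, `likely_leaker` and the IDs are hypothetical.

```python
def guilt_scores(leaked_fakes, employee_fakes, base=0.1, step=0.1):
    """Score every employee: `base`, plus `step` per leaked fake object recorded for them.

    Scores are capped at 1.0; they rank employees but are not calibrated probabilities.
    """
    scores = {}
    for emp, fakes in employee_fakes.items():
        matches = sum(1 for f in leaked_fakes if f in fakes)
        scores[emp] = min(1.0, base + step * matches)
    return scores


def likely_leaker(scores):
    """Employee with the highest score; ties go to the alphabetically first ID."""
    return max(sorted(scores), key=lambda emp: scores[emp])


# Hypothetical fake-object table (employee ID -> fake objects given to that employee).
employee_fakes = {"E01": {"F17"}, "E02": {"F42"}, "E03": {"F99"}}
scores = guilt_scores(["F42"], employee_fakes)
print(scores)                 # {'E01': 0.1, 'E02': 0.2, 'E03': 0.1}
print(likely_leaker(scores))  # E02
```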

RESULTS AND DISCUSSION

In the implementation, we created several users as employees of the organization. Initially, all the employees work on a job assignment, and each employee must send a product creation quotation to the admin. The admin confirms one of the quotations; while confirming it, the admin adds a fake object to the primary quotation. This step is performed automatically by the algorithm. The unique fake object then circulates among the employees, so in case of leakage the admin can use the fake object to identify the leaker through the probabilistic agent guilt model.

To check whether the interactions among our model parameters match our intuition, we study two simple scenarios: the impact of the probability p and the impact of the overlap between R and S. (As in the guilt model of [14], T is the set of the distributor's objects, R is the set of objects allocated to an agent, S is the set of leaked objects, and p is the probability that the target obtained a leaked object by other means.) In each scenario we have a target that has obtained all the distributor's objects, i.e., T = S.

A data distributor has to give sensitive data to a set of third-party agents, and the distributor's data transmitted to the agents may be leaked by any of them. It is therefore necessary to identify the agents that leaked the data. To improve the chances of detecting leaked data and the guilty agents, a private object is created for each record sent to the various agents. The agent guilt model is then used to estimate, for each agent, the probability that it leaked the information. This helps the distributor stop dealing with that agent and protect the sensitive data. A further process can clear the data if the agent has sent them to an unauthorized person.
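For reference, the guilt model of Papadimitriou and Garcia-Molina [14], from which the parameters p, R and S of this section come, estimates the probability that agent U_i is guilty given the leaked set S as Pr{G_i | S} = 1 - Π over t in S ∩ R_i of (1 - (1 - p)/|V_t|), where V_t is the set of agents that received object t. It assumes that each leaked object was either obtained independently by the target with probability p or leaked by one of the agents in V_t, each equally likely, independently across objects. The Python sketch below evaluates that formula; the function name and the toy allocation are hypothetical, and it is not claimed to be the code used in this project.

```python
def guilt_probability(agent, allocations, leaked, p):
    """Pr{G_i | S} for `agent` under the guilt model of [14].

    allocations: agent -> set of objects R_i it received
    leaked:      set S of leaked objects
    p:           probability that the target obtained an object independently
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    prob_innocent = 1.0
    for t in leaked & allocations[agent]:
        holders = sum(1 for objs in allocations.values() if t in objs)  # |V_t|
        prob_innocent *= 1.0 - (1.0 - p) / holders
    return 1.0 - prob_innocent


# Hypothetical allocation: both agents hold t1, only U1 holds t2; everything leaked.
allocations = {"U1": {"t1", "t2"}, "U2": {"t1"}}
leaked = {"t1", "t2"}
for agent in allocations:
    print(agent, guilt_probability(agent, allocations, leaked, p=0.5))
# U1 0.625
# U2 0.25
```

Raising p toward 1 drives both values toward 0, since the leaked objects could then have been guessed; U1 scores higher than U2 because its set overlaps more with S, which is the effect the two scenarios above are meant to probe.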

CONCLUSION

In a perfect world, there would be no need to hand over sensitive data to agents that may unknowingly or maliciously leak it. And even if we had to hand over sensitive data, in a perfect world we could watermark each object so that we could trace its origins with absolute certainty. However, in many cases we must indeed work with agents that may not be 100 percent trusted, and we may not be certain whether a leaked object came from an agent or from some other source, since certain data cannot admit watermarks. In spite of these difficulties, we have shown that it is possible to assess the likelihood that an agent is responsible for a leak, based on the overlap of his data with the leaked data and the data of other agents, and based on the probability that objects can be “guessed” by other means.

Our model is relatively simple, but we believe that it captures the essential trade-offs. The algorithms we have presented implement a variety of data distribution strategies that can improve the distributor's chances of identifying a leaker. We have shown that distributing objects judiciously can make a significant difference in identifying guilty agents, especially in cases where there is large overlap in the data that agents must receive.

REFERENCES

[1] C.V. Wright, S.E. Coull, and F. Monrose, “Traffic Morphing: An Efficient Defense against Statistical Traffic Analysis,” Proc. 16th Network and Distributed System Security Symp. (NDSS), 2009.

[2] A. Iacovazzi and A. Baiocchi, “Optimum Packet Length Masking,” Proc. Int’l Teletraffic Congress, 2010.

[3] X. Shu and D. …, “Data Leak Detection As a Service: Challenges and Solutions,” Proc. 8th Int. Conf. Secur. Privacy Commun. Netw. (SecureComm), Padua, Italy, Sep. 2012, pp. 222–240.

[4] M. Blanton, M.J. Atallah, K.B. Frikken, and Q. Malluhi, “Secure and Efficient Outsourcing of Sequence Comparisons,” Proc. 17th Eur. Symp. Res. Comput. Secur., 2012, pp. 505–522.

[5] Y. Cui and J. Widom, “Lineage Tracing for General Data Warehouse Transformations,” The VLDB J., vol. 12, pp. 41-58, 2003.

[6] S. Czerwinski, R. Fromm, and T. Hodes, “Digital Music Distribution and Audio Watermarking,” http://www.scientificcommons.org/43025658, 2007.

[7] F. Guo, J. Wang, Z. Zhang, X. Ye, and D. Li, “An Improved Algorithm to Watermark Numeric Relational Data,” Information Security Applications, pp. 138-149, Springer, 2006.

[8] F. Hartung and B. Girod, “Watermarking of Uncompressed and Compressed Video,” Signal Processing, vol. 66, no. 3, pp. 283-301, 1998.

[9] S. Jajodia, P. Samarati, M.L. Sapino, and V.S. Subrahmanian, “Flexible Support for Multiple Access Control Policies,” ACM Trans. Database Systems, vol. 26, no. 2, pp. 214-260, 2001.

[10] Y. Li, V. Swarup, and S. Jajodia, “Fingerprinting Relational Databases: Schemes and Specialties,” IEEE Trans. Dependable and Secure Computing, vol. 2, no. 1, pp. 34-45, Jan.-Mar. 2005.

[11] B. Mungamuru and H. Garcia-Molina, “Privacy, Preservation and Performance: The 3 P’s of Distributed Data Management,” technical report, Stanford Univ., 2008.

[12] V.N. Murty, “Counting the Integer Solutions of a Linear Equation with Unit Coefficients,” Math. Magazine, vol. 54, no. 2, pp. 79-81, 1981.

[13] S.U. Nabar, B. Marthi, K. Kenthapadi, N. Mishra, and R. Motwani, “Towards Robustness in Query Auditing,” Proc. 32nd Int’l Conf. Very Large Data Bases (VLDB ’06), VLDB Endowment, pp. 151-162, 2006.

[14] P. Papadimitriou and H. Garcia-Molina, “Data Leakage Detection,” technical report, Stanford Univ., 2008.

[15] P.M. Pardalos and S.A. Vavasis, “Quadratic Programming with One …” … Global Optimization, vol. 1, no. 1, pp. 15-22, 1991.

[16] J.J.K.O. Ruanaidh, W.J. Dowling, and F.M. Boland, “Watermarking Digital Images for Copyright Protection,” IEE Proc. Vision, Signal and Image Processing, vol. 143, no. 4, pp. 250-256, 1996.

[17] R. Sion, M. Atallah, and S. Prabhakar, “Rights Protection for Relational Data,” Proc. ACM SIGMOD, pp. 98-109, 2003.

[18] L. Sweeney, “Achieving

K-Anonymity Privacy Protection Using

Generalization and Suppression,”

http://en.scientificcommons.