EFFICIENT DETECTION AND PREVENTION OF
DATA LEAKAGE IN DTE AND DCE
A.N.VINODHINI1 Dr.S.AYYASAMY2
PG Scholar Professor
Dr.N.G.P. Institute of Technology Dr.N.G.P. Institute of Technology
Coimbatore-641665 Coimbatore-641665
TamilNadu, India. TamilNadu, India
ABSTRACT
Leakage of sensitive data on DTE and DCE is a serious threat to organizational security. Data loss may occur when files are not properly encrypted, so organizations need tools to store and transfer sensitive data securely. However, detecting and preventing the leakage of sensitive information is challenging in DTE and DCE. Nowadays the number of automation products is increasing, and so is the demand for automotive products. Existing systems still have security drawbacks in their current software. The most important factors are computerizing all the data on a centralized server and taking backups of old records.
The main objective of this project is to enable secure communication between the production unit, administration unit, sales unit, quality check unit and storehouse unit, and to avoid the data leakage problem. A data distributor has given sensitive data to supposedly trusted agents (third parties). Some of the data has been leaked and found in an unauthorized place (e.g., on the web or on somebody’s laptop). The distributor must assess the likelihood that the leaked data came from one or more agents, as opposed to having been gathered by other independent means. Hence we design the system so that when the data reaches any destination, whether an agent or an unauthorized party, the data together with the IP address of the receiver is reported to the distributor. If the distributor receives an IP address other than an agent’s address, the distributor concludes that the data has been leaked. A fake object is inserted along with the data. The distributor compares the received data with its database and finds out which agent leaked the data by calculating the probability for each agent on the leaked data. IP synchronization is introduced in this project to help avoid data leakage. The main goals of this project are to reduce manual work, increase processing speed and ensure the reliability of data.
Keywords: leakage detection, sensitive data, guilt model, IP synchronization, fake object, access control
INTRODUCTION
In the course of doing business,
sometimes sensitive data must be handed
over to supposedly trusted third parties.
For example, a hospital may give patient
records to researchers who will devise new
treatments. Similarly, a company may
have partnerships with other companies
that require sharing customer data.
Another enterprise may outsource its data
processing, so data must be given to
various other companies. We call the
owner of the data the distributor and the
supposedly trusted third parties the agents.
Our goal is to detect when the distributor’s
sensitive data have been leaked by agents,
and if possible to identify the agent that
leaked the data. We consider applications
where the original sensitive data cannot be
perturbed. Perturbation is a very useful
technique where the data are modified and
made “less sensitive” before being handed
to agents. For example, one can add
random noise to certain attributes, or one
can replace exact values by ranges.
However, in some cases, it is important not
to alter the original distributor’s data. For
example, if an outsourcer is doing our
payroll, he must have the exact salary and
customer bank account numbers. If
medical researchers will be treating
patients (as opposed to simply computing
statistics), they may need accurate data for
the patients. Traditionally, leakage
detection is handled by watermarking,
e.g., a unique code is embedded in each
distributed copy. If that copy is later
discovered in the hands of an unauthorized
party, the leaker can be identified.
Watermarks can be very useful in some
cases, but again, involve some
modification of the original data.
Furthermore, watermarks can sometimes
be destroyed if the data recipient is
malicious.
RELATED WORKS
The authors of [1] focus on the use of
morphing techniques to thwart traffic
classifiers that utilize features based on
packet sizes. As an example of the
technique, consider a general web page
classifier that uses packet sizes to
determine the identity of web pages. If
the user connects to www.webmd.com to
search for medical information over an
encrypted connection where packet sizes
are not padded, the web page classifier
would examine these sizes and determine
that the user has indeed gone to
www.webmd.com. Their approach allows
the user (with the cooperation of the web
server or proxy) to morph her download
so that it appears to the classifier as a
different web page (e.g., www.espn.com)
[1]. Masking can be obtained by means of
padding and fragmenting. The authors of
[2] formally define the ideal target of
masking, and then define the masking
problem as a statistical optimization
problem that aims at minimizing the
required overhead. They find the optimal
solution of the masking problem in the
case of two application types [2]. The
authors of [3] perform extensive
experimental evaluation on the
tolerance of their techniques. Their
evaluation results under various data-leak
scenarios and setups show that the method
can support accurate detection with a very
small number of false alarms, even when
the presentation of the data has been
transformed [3]. The design and
development of secure techniques for
outsourcing various functionalities to
untrusted servers are receiving growing
attention in the research community. The
rapid growth in the availability of cloud
services makes such services attractive
for clients with limited computing or
storage resources who are unwilling or
unable to procure and maintain their own
computing infrastructure [4].
FAKE OBJECT
The distributor may be able to add
fake objects to the distributed data in order
to improve his effectiveness in detecting
guilty agents. However, fake objects may
impact the correctness of what agents do,
so they may not always be allowable. The
idea of perturbing data to detect leakage is
not new, e.g., [1]. However, in most cases,
individual objects are perturbed, e.g., by
adding random noise to sensitive salaries,
or adding a watermark to an image. In our
case, we are perturbing the set of
distributor objects by adding fake
elements. In some applications, fake
objects may cause fewer problems than
perturbing real objects. For example, say
that the distributed data objects are
medical records and the agents are
hospitals. In this case, even small
modifications to the records of actual
patients may be undesirable. However, the
addition of some fake medical records may
be acceptable, since no patient matches
these records, and hence, no one will ever
be treated based on fake records. Our use
of fake objects is inspired by the use of
“trace” records in mailing lists. In this
case, company A sells to company B a
mailing list to be used once (e.g., to send
advertisements). Company A adds trace
records that contain addresses owned by
company A. Thus, each time company B
uses the purchased mailing list, A receives
copies of the mailing. These records are a
type of fake objects that help identify
improper use of data. The distributor
creates and adds fake objects to the data
that he distributes to agents. As discussed
below, fake objects must be created
carefully so that agents cannot distinguish
them from real objects.
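
The idea of giving each agent its own fake object can be sketched as follows. This is a minimal illustration and not the system described in this paper: the record layout (an id and a name), the pool FAKE_NAMES and the function names are all assumptions made for the example. Each fake record has the same shape as a real one, and only the distributor keeps the registry that maps a fake id to the agent who received it.

```python
import secrets

# Hypothetical pool of plausible-looking names (an assumption for this sketch).
FAKE_NAMES = ["A. Kumar", "R. Devi", "S. Mani"]

def make_fake_record(used_ids):
    """Return a record shaped like a real one, with an id not used anywhere else."""
    while True:
        fake_id = 100000 + secrets.randbelow(900000)  # six-digit id, like the real ids
        if fake_id not in used_ids:
            used_ids.add(fake_id)
            return {"id": fake_id, "name": secrets.choice(FAKE_NAMES)}

def distribute(real_records, agent_ids):
    """Give every agent the real records plus one fake record unique to that agent."""
    used_ids = {r["id"] for r in real_records}
    registry = {}   # fake id -> agent that received it (kept only by the distributor)
    shipments = {}
    for agent in agent_ids:
        fake = make_fake_record(used_ids)
        registry[fake["id"]] = agent
        shipments[agent] = list(real_records) + [fake]
    return shipments, registry

def agents_implicated(leaked_records, registry):
    """Return the agents whose fake records appear in a leaked data set."""
    return {registry[r["id"]] for r in leaked_records if r["id"] in registry}
```

If a leaked data set contains a fake id from the registry, the agent that received that fake record is implicated; a leak containing only real records implicates no one by this test alone.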
IP SYNCHRONIZATION
An IPv4 address has four octets (parts);
a three-octet prefix denotes a range of
addresses. An octet cannot hold a number
above 255: all the octets are numbers
between 0 and 255. For example, 209.20.5 or
216.222.8.131. The former covers the
entire IP range
209.20.5.0-209.20.5.255. The latter (216.222.8.131) is
a single address within an IP range. IP addresses
are "Internet Protocol" addresses, or
numeric addresses assigned to servers and
users connected to the Internet. Some
Internet services use the source address of
the client's computer as a form of
authentication. These systems keep track
of the Internet Protocol (IP) address that an
end user used the last time that user
accessed the site and try to determine if the
user is legitimate. When that same user
accesses the site from a different source IP
address, the site asks for further
authentication to revalidate the client's
computer. The theory is that a user's
usual computer has a somewhat persistent
IP address; when the user appears with a
new address, the user may be mobile or
using a less secure wireless medium, so
further authentication is required. For
example, many organizations have firewall
policies with objects named like “Admin
Laptop” that hold the single IP address of
the administrator's computer. This
technique is used by some banking sites,
some online gaming sites, and Gmail. Once
an IP address has been synchronized with
the network, the application in the
network can be accessed only through that
particular IP address.
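
A check of a receiver's address against the registered addresses could look like the sketch below. It uses Python's standard ipaddress module; the helper names to_network and is_registered, and the choice to treat a three-octet entry as a /24 range, are assumptions made for this illustration and do not come from the implementation described here.

```python
import ipaddress

def to_network(spec):
    """Turn '209.20.5' into the range 209.20.5.0/24 and a full address into a /32."""
    octets = spec.split(".")
    if len(octets) == 3:
        return ipaddress.ip_network(spec + ".0/24")
    return ipaddress.ip_network(spec + "/32")

def is_registered(receiver_ip, registered_specs):
    """Return True if the receiver's address falls inside any registered entry."""
    address = ipaddress.ip_address(receiver_ip)  # raises ValueError if an octet exceeds 255
    return any(address in to_network(spec) for spec in registered_specs)
```

With the two example entries from the text, is_registered("209.20.5.77", ["209.20.5", "216.222.8.131"]) holds because the address lies in the range, while "216.222.8.132" is rejected because it matches neither entry.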
AGENT GUILT MODEL
Algorithm
Guilt Probability Calculation:
The employee with the highest
probability is considered the leaker. The
probability is calculated using the fake
object that accompanies the leaked data.
The admin finds out to whom the fake
object belongs by referring to his
database. If the fake object recorded for
an employee matches the one found in the
leaked data, that employee's probability
is increased by 0.1. In this way the fake
object, which is inserted along with the
data, is used at the time of leakage to
detect the employee (leaker). The admin
compares the received data with his
database and finds out which employee
leaked the data by calculating the
probability for each employee on the
leaked data. The admin selects the leaked
data and generates the list of employees
who hold these data.
This process is carried out by the code below:
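data(); // assumed to open the connection "con" (helper defined elsewhere)
// parameterized query, so the selected text cannot inject SQL
query = "select fakeobj from fakedata where empid = @empid";
SqlCommand cmd = new SqlCommand(query, con);
cmd.Parameters.AddWithValue("@empid", ListBox1.SelectedItem.Text);
SqlDataReader rd1 = cmd.ExecuteReader();
while (rd1.Read())
{
    empfake = rd1[0].ToString(); // reading fake object from database
}
rd1.Close();
con.Close();
double count = 0.1; // starting value of the score
if (orgfake == empfake) // comparing the fake object in the leaked data with the admin database fake object
{
    count += 0.1; // increasing the count
}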
Finally, a chart is generated from
these probability values. The employee
with the maximum probability is
considered the leaker.
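
The scoring and selection steps described above can be sketched for all employees at once. This is an illustrative sketch only: the function names and the dictionary fake_by_employee (employee id mapped to the fake object stored for that employee) are assumptions, and the values 0.1 for the starting score and 0.1 for the increment simply mirror the listing above.

```python
def guilt_scores(leaked_fake, fake_by_employee, base=0.1, step=0.1):
    """Score every employee: start at base, add step when the fake object matches."""
    scores = {}
    for employee, fake in fake_by_employee.items():
        score = base
        if fake == leaked_fake:  # fake object found in the leaked data
            score += step
        scores[employee] = score
    return scores

def most_likely_leaker(scores):
    """Return the employee with the highest score."""
    return max(scores, key=scores.get)
```

The resulting dictionary of scores is the kind of data from which the chart of probability values can be drawn.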
RESULTS AND DISCUSSION
In the implementation, we created some
users as the employees of the
organization. Initially, all the employees
work on a job assignment. Each employee
needs to send a product creation quotation
to the admin. The admin needs to confirm
one of the quotes. While confirming the
quote, the admin adds a fake object to the
primary quote. This is done automatically
by the algorithm. The unique fake object
then circulates among the employees. If a
leak occurs, the admin can use the fake
object to find the leaker easily through
the probabilistic agent guilt model.
To see how our model parameters interact
and to check whether the interactions match
our intuition, in this section we study two
simple scenarios: the impact of the
probability p and the impact of the overlap
between R and S. In each scenario we have a
target that has obtained all the
distributor’s objects, i.e., T
a data distributor has to give sensitive
data to a set of third-party agents. The
distributor's copy of the data that has
been transmitted to agents may be leaked by
any of them. It is necessary to identify the
agents that have leaked data. To improve
the chances of detecting leaked data and
the guilty agents, a private object is
created for each record that is sent to the
various agents. The agent guilt model is
used to find the probability that a given
agent has leaked information. It thus helps
the distributor to turn away from that agent
and to protect the sensitive data. A further
process can clear the data if the agent has
sent data to an unauthorized person.
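
The roles of the probability p and of the overlap between the agents' data sets R and the leaked set S can be illustrated with the guilt estimate of Papadimitriou and Garcia-Molina [14], Pr{G_i | S} = 1 - prod over t in (S ∩ R_i) of (1 - (1 - p) / |V_t|), where V_t is the set of agents holding object t and p is the probability that an object can be guessed by the target. Whether this exact formula is the one used in our implementation is an assumption of this sketch; the function and variable names are illustrative only.

```python
def guilt_probability(agent, leaked, holdings, p):
    """Estimate Pr{G_i | S} for one agent.

    leaked   -- set S of leaked objects
    holdings -- dict mapping each agent to its set R_i of received objects
    p        -- probability that the target guessed an object on its own
    """
    prob_innocent = 1.0
    for obj in leaked & holdings[agent]:
        holders = sum(1 for objs in holdings.values() if obj in objs)  # |V_t|
        prob_innocent *= 1.0 - (1.0 - p) / holders
    return 1.0 - prob_innocent
```

For example, with holdings {"A1": {"t1", "t2"}, "A2": {"t1"}}, leaked set {"t1", "t2"} and p = 0.5, the estimate is 0.625 for A1 and 0.25 for A2: the agent that alone holds a leaked object is the more likely leaker, and a larger p lowers both estimates.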
CONCLUSION
In a perfect world, there would be
no need to hand over sensitive data to
agents that may unknowingly or
maliciously leak it. And even if we had to
hand over sensitive data, in a perfect
world, we could watermark each object so
that we could trace its origins with
absolute certainty. However, in many
cases, we must indeed work with agents
that may not be 100 percent trusted, and
we may not be certain if a leaked object
came from an agent or from some other
source, since certain data cannot admit
watermarks. In spite of these difficulties,
we have shown that it is possible to assess
the likelihood that an agent is responsible
for a leak, based on the overlap of his data
with the leaked data and the data of other
agents, and based on the probability that
objects can be “guessed” by other means.
Our model is relatively simple, but we
believe that it captures the essential
trade-offs. The algorithms we have presented
implement a variety of data distribution
strategies that can improve the
distributor’s chances of identifying a
leaker. We have shown that distributing
objects judiciously can make a significant
difference in identifying guilty agents,
especially in cases where there is large
overlap in the data that agents must
receive.
REFERENCES
[1] C.V. Wright, S.E. Coull, and F. Monrose,
“Traffic Morphing: An Efficient Defense
against Statistical Traffic Analysis,” Proc.
16th Network and Distributed Security
Symp. (NDSS), 2009.
[2] A. Iacovazzi and A. Baiocchi,
“Optimum Packet Length Masking,” Proc.
Int’l Teletraffic Congress, 2010.
[3] Data Leak Detection As a Service:
Challenges and Solutions. X. Shu and D.
Proc. 8th Int. Conf. Secur. Privacy
Commun. Netw. (SecureComm), Padua,
Italy, Sep. 2012, pp. 222–240.
[4] M. Blanton, M.J. Atallah, K.B. Frikken,
and Q. Malluhi, “Secure and Efficient
Outsourcing of Sequence Comparisons,”
Proc. 17th Eur. Symp. Res. Comput. Secur.,
pp. 505-522, 2012.
[5] Y. Cui and J. Widom, “Lineage
Tracing for General Data Warehouse
Transformations,” The VLDB J., vol. 12,
pp. 41-58, 2003.
[6] S. Czerwinski, R. Fromm, and T.
Hodes, “Digital Music Distribution and
Audio Watermarking,”
http://www.scientificcommons.org/43025658,
2007.
[7] F. Guo, J. Wang, Z. Zhang, X. Ye, and
D. Li, “An Improved Algorithm to
Watermark Numeric Relational Data,”
Information Security Applications,
pp. 138-149, Springer, 2006.
[8] F. Hartung and B. Girod,
“Watermarking of Uncompressed and
Compressed Video,” Signal Processing,
vol. 66, no. 3, pp. 283-301, 1998.
[9] S. Jajodia, P. Samarati, M.L. Sapino,
and V.S. Subrahmanian, “Flexible Support
for Multiple Access Control Policies,”
ACM Trans. Database Systems, vol. 26,
no. 2, pp. 214-260, 2001.
[10] Y. Li, V. Swarup, and S. Jajodia,
“Fingerprinting Relational Databases:
Schemes and Specialties,” IEEE Trans.
Dependable and Secure Computing, vol. 2,
no. 1, pp. 34-45, Jan.-Mar. 2005.
[11] B. Mungamuru and H.
Garcia-Molina, “Privacy, Preservation and
Performance: The 3 P’s of Distributed Data
Management,” technical report,
Stanford Univ., 2008.
[12] V.N. Murty, “Counting the Integer
Solutions of a Linear Equation with Unit
Coefficients,” Math. Magazine, vol. 54,
no. 2, pp. 79-81, 1981.
[13] S.U. Nabar, B. Marthi, K.
Kenthapadi, N. Mishra, and R. Motwani,
“Towards Robustness in Query
Auditing,” Proc. 32nd Int’l Conf. Very
Large Data Bases (VLDB ’06), VLDB
Endowment, pp. 151-162, 2006.
[14] P. Papadimitriou and H.
Garcia-Molina, “Data Leakage Detection,”
technical report, Stanford Univ., 2008.
[15] P.M. Pardalos and S.A. Vavasis,
“Quadratic Programming with One
Global Optimization, vol. 1, no. 1, pp.
15-22, 1991.
[16] J.J.K.O. Ruanaidh, W.J. Dowling, and
F.M. Boland, “Watermarking Digital
Images for Copyright Protection,” IEE
Proc. Vision, Signal and Image
Processing, vol. 143, no. 4, pp. 250-256,
1996.
[17] R. Sion, M. Atallah, and S. Prabhakar,
“Rights Protection for Relational
Data,” Proc. ACM SIGMOD, pp. 98-109,
2003.
[18] L. Sweeney, “Achieving
K-Anonymity Privacy Protection Using
Generalization and Suppression,”
http://en.scientificcommons.