
International Journal of Emerging Technology and Advanced Engineering

Website: www.ijetae.com (ISSN 2250-2459,ISO 9001:2008 Certified Journal, Volume 3, Issue 6, June 2013)


A Survey of Perturbation Technique For Privacy-Preserving of Data

Lokesh Patel¹, Prof. Ravindra Gupta²
¹M. Tech, ²Assistant Professor, SSSIST, Sehore

Abstract— Privacy concerns over the ever-increasing

gathering of personal information by various institutions have led to the development of privacy-preserving data mining. The approach protects the privacy of the data by perturbing it before release. The major challenge of data perturbation is to achieve the desired balance between the level of data privacy and the level of data utility. Data privacy and data utility are commonly considered as a pair of conflicting requirements in privacy-preserving applications and mining systems. Multiplicative perturbation algorithms aim at improving data privacy while maintaining the desired level of data utility by selectively preserving the mining-task- and model-specific information during the perturbation process. A multiplicative perturbation algorithm may find multiple data transformations that preserve the required data utility, so the next major challenge is to find a good transformation that also provides a satisfactory level of privacy. We handle the problem of transforming a database to be shared into a new one that conceals private information while preserving the general patterns and trends of the original database, aiming at the advantages of both dimensions: better-protected data and relatively fast processing.

Keywords-- Privacy Preserving, Perturbation, Data mining.

I. INTRODUCTION

The data perturbation technique, first proposed in (Agrawal & Srikant, 2000), represents one common approach in privacy-preserving data mining, where the original (private) dataset is perturbed and the result is released for data analysis. Data perturbation includes a wide variety of techniques, including (but not limited to): additive, multiplicative (Kim & Winkler, 2003), matrix multiplicative, k-anonymization (Sweeney, 2002), micro-aggregation (Li & Sarkar, 2006), categorical data perturbation (Verykios, 2004b), data swapping (Fienberg & McIntyre, 2004), resampling (Liew, 1985), and data shuffling (Muralidhar & Sarathy, 2006). Here we mostly focus on the two types of data perturbation that apply to continuous data, additive and matrix multiplicative; for other data perturbation techniques the reader can refer to the related literature.

1 Additive perturbation

The additive perturbation is a technique for privacy-preserving data mining in which noise is added to the data in order to mask the attribute values of records (Agrawal & Srikant, 2000). The noise added is sufficiently large so that individual record values cannot be recovered. Therefore, techniques are designed to derive aggregate distributions from the perturbed records. Subsequently, data mining techniques can be developed in order to work with these aggregate distributions.
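
As a minimal sketch of additive perturbation (the attribute, sample size, and noise scale below are illustrative assumptions, not values from the paper), individual values are masked while zero-mean noise leaves aggregate statistics approximately intact:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical private attribute: 10,000 ages drawn uniformly from 18-79.
ages = rng.integers(18, 80, size=10_000).astype(float)

# Zero-mean Gaussian noise, large enough to mask individual record values.
noise = rng.normal(loc=0.0, scale=15.0, size=ages.size)
perturbed = ages + noise  # only this column is released

# Individual records cannot be recovered, but aggregate statistics survive:
# E[perturbed] = E[ages], since the noise is centered at zero.
print(abs(ages.mean() - perturbed.mean()))  # small, on the order of 0.15
```

The same idea underlies distribution-reconstruction methods: the miner works with the aggregate distribution of `perturbed`, never with individual rows.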

2 Matrix multiplicative perturbation

The most common method of data perturbation is that of additive perturbations. However, matrix multiplicative perturbations can also be used to good effect for privacy-preserving data mining.

If the perturbing matrix is the product of a discrete cosine transformation matrix and a truncated perturbation matrix, then the perturbation approximately preserves Euclidean distances.
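
The distance-preserving property can be sketched with the simplest matrix multiplicative perturbation, a random orthogonal (rotation) matrix, which preserves Euclidean distances exactly; the data and dimensions are assumed for illustration and the DCT-based construction above is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # hypothetical private records, 5 numeric attributes

# Random orthogonal matrix via QR decomposition of a Gaussian matrix.
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
Y = X @ Q.T  # released, perturbed data

# Orthogonal maps preserve pairwise Euclidean distances:
d_orig = np.linalg.norm(X[0] - X[7])
d_pert = np.linalg.norm(Y[0] - Y[7])
print(np.isclose(d_orig, d_pert))  # True
```

Because distances survive, distance-based mining tasks (clustering, k-NN classification) produce the same results on `Y` as on `X`.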

3 Evaluation of data perturbation technique


Reference (Liu et al., 2006) provides a detailed survey of attack techniques on data perturbation, especially additive and matrix multiplicative perturbation. These attacks offer insights into the vulnerabilities of data perturbation techniques under certain circumstances. In summary, the following information could lead to disclosure of private information from the perturbed data.

a). Attribute Correlation: Much real-world data has strongly correlated attributes, and this correlation can be used to filter out additive white noise.

b). Known Sample: Sometimes, the attacker has certain background knowledge about the data such as the p.d.f. or a collection of independent samples which may or may not overlap with the original data.

c). Known Inputs/Outputs: Sometimes, the attacker knows a small set of private data and their perturbed counterparts. This correspondence can help the attacker to estimate other private data.

d). Data Mining Results: The underlying pattern discovered by data mining also provides a certain level of knowledge which can be used to guess the private data with higher accuracy.

e). Sample Dependency: Most attacks assume the data are independent samples from some unknown distribution. This assumption may not hold for all real applications. For certain types of data, such as time series, there exists autocorrelation/dependency among the samples. How this dependency can help the attacker estimate the original data is still an open problem.
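
Attack (a) can be illustrated with a toy example: if several released attributes are strongly correlated copies of one underlying value, each masked with independent additive noise, simply averaging them shrinks the noise. The data model below is an assumption chosen to make the effect visible, not a scheme from the surveyed papers:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical signal: 5 attributes that are all copies of one underlying
# value (perfect correlation), each masked with independent unit-variance noise.
signal = rng.normal(size=10_000)
noisy = signal[:, None] + rng.normal(scale=1.0, size=(10_000, 5))

# An attacker exploiting the correlation: averaging the 5 columns reduces the
# noise variance by a factor of 5 (the noise std by a factor of sqrt(5)).
estimate = noisy.mean(axis=1)

err_single = np.std(noisy[:, 0] - signal)  # ~1.0
err_filtered = np.std(estimate - signal)   # ~1/sqrt(5), about 0.45
print(err_filtered < err_single)  # True
```

Real attacks (e.g. spectral filtering) exploit partial correlation via PCA rather than perfect copies, but the principle is the same.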

At the same time, a "privacy/accuracy" trade-off is faced by any data perturbation technique. On the one hand, the perturbation must not allow the original data records to be adequately recovered; on the other, it must allow the "patterns" in the original data to be mined.

The data perturbation technique is needed in situations where access to the original form of the data attributes is mandatory, for instance when conventional off-the-shelf data analysis techniques are to be applied. While this approach is more generic, some inaccuracy in the analysis result is to be expected.

4 Application of the data perturbation technique

The randomization method has been extended to a variety of data mining problems. Reference (Agrawal & Srikant, 2000) first discussed how to use the approach for solving the privacy-preserving classification problem. References (Zhang et al., 2005; Zhu & Liu, 2004) have also proposed a number of other techniques which seem to work well over a variety of different classifiers.


There has been research on preserving privacy for other types of data mining. For instance, reference (Evfimievski et al., 2002) proposed a solution to the privacy-preserving distributed association mining problem. The problem of association rules is especially challenging because of the discrete nature of the attributes corresponding to the presence or absence of items. To deal with this issue, the randomization technique needs to be modified slightly: instead of adding quantitative noise, random items are dropped or included with a certain probability. The perturbed transactions are then used for aggregate association rule mining. The randomization approach has also been extended to other applications, for example SVD-based collaborative filtering (Polat & Du, 2005).
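
The item-randomization idea can be sketched for a single item as randomized response: each transaction's presence bit is flipped with some probability, and an unbiased support estimate is recovered from the perturbed data. The flip probability, support level, and transaction count below are illustrative assumptions:

```python
import random

random.seed(7)
p = 0.1             # flip probability (assumed)
n = 100_000         # number of transactions (assumed)
true_support = 0.3  # true fraction of transactions containing the item (assumed)

# Simulated presence bits for one item across all transactions.
bits = [1 if random.random() < true_support else 0 for _ in range(n)]

def perturb(bit: int) -> int:
    # Flip the presence bit with probability p (randomized response).
    return bit ^ 1 if random.random() < p else bit

released = [perturb(b) for b in bits]

# The miner sees only `released`, whose expected support is
# s(1-p) + (1-s)p (about 0.34 for these numbers), and inverts it:
observed = sum(released) / n
estimated = (observed - p) / (1 - 2 * p)
print(round(estimated, 2))  # close to 0.30 = true_support
```

Individual transactions are deniable (any bit may have been flipped), yet aggregate supports, and hence association rules, remain estimable.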

II. PROPOSED WORK

I-TRBP Method

We propose a new method, Improved Translation & Rotation-Based Perturbation (I-TRBP). The method takes its reference from the Geometric Data Transformation (GDT) methods, which protect the underlying attribute values subjected to clustering by translating, rotating and normalizing the values of two attributes at a time. In our method, we develop the idea for both classification and clustering: we use translation, rotation, and a pair-wise security threshold value with a variable level of privacy. To the best of our knowledge, this combination has not been attempted before.
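
A minimal sketch of the geometric (rotation plus translation) transformation underlying I-TRBP, applied to one attribute pair. The angle, translation vector, and data are illustrative assumptions; the paper does not specify the exact I-TRBP parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
pair = rng.normal(size=(1000, 2))  # two confidential attributes (assumed data)

theta = np.pi / 6              # rotation angle: a per-pair secret (assumed)
shift = np.array([5.0, -2.0])  # translation vector: a per-pair secret (assumed)

R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
perturbed = pair @ R.T + shift  # released values for this attribute pair

# Rotation and translation are rigid motions, so all pairwise distances --
# and therefore cluster and class structure -- are preserved exactly:
d0 = np.linalg.norm(pair[0] - pair[1])
d1 = np.linalg.norm(perturbed[0] - perturbed[1])
print(np.isclose(d0, d1))  # True
```

Keeping theta and shift secret per attribute pair is what makes recovering the original values hard without the key.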

General Assumptions

This approach to distorting data points in n-dimensional space draws on the following assumptions:

The data matrix D contains confidential and non-confidential numerical attributes that must be perturbed to protect individual data values before clustering/classification.

The existence of an identifier of an object may be revealed, but it can also be anonymized by suppression. However, the values of the attributes associated with an object are private and must be protected.

The perturbation I-TRBP, when applied to a data matrix D, must preserve the clusters and classes of the different datasets.


Suppressing Identifiers. The existence of a particular object, say ID, could be suppressed or revealed depending on the application, but it should be suppressed when data is made public (e.g. census, social benefits).

Data Perturbation Technique:

Perturbation is accomplished by the alteration of an attribute value by a new value (i.e., changing a 1-value to a 0-value, or adding noise). Recent developments in information technology have made possible the collection and analysis of millions of transactions containing personal data. These data include shopping habits, criminal records, medical histories, credit records and others. This progress in the storage and analysis of data has led individuals and organizations to face the challenge of turning such data into useful information and knowledge.

The problem with data perturbation is that doing it indiscriminately can have unpredictable impacts on data mining. Vladimir Estivill-Castro and Ljiljana Brankovic explored the impact of data swapping on data mining for decision rules (combinations of attributes that are effective at predicting a target class value). Full swapping (ensuring that no original records appear in the perturbed data set) can prevent effective mining, but the authors concluded that limited swapping has a minimal impact on the results. The key for privacy preservation is that you don't know which records are correct; you simply have to assume that the data doesn't contain real values.

Randomization, adding noise to hide actual data values, works because most data mining methods construct models that generalize the data. On average, adding noise (if centered around zero) preserves the data's statistics, so we can reasonably expect that the data mining models will still be correct. Let's assume we're building a model that classifies individuals into "safe" and "unsafe" driver categories. A likely decision rule for such a model would state that drivers between the ages of 30 and 50 are likely to be safe. Now assume we add random noise to the ages to prevent discovery of an individual's driving record simply by knowing his or her age. Some safe drivers between the ages of 30 and 50 will be moved into other age brackets, and some unsafe drivers who are younger or older will be moved into the 30 to 50 bracket; but the 30 to 50 bracket will also lose unsafe drivers and gain safe drivers from other age brackets. On the whole, drivers in the 30 to 50 range will still likely be safer, even on the perturbed data.
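
The age-bracket argument can be simulated directly. The population model below (safety probabilities, age range, noise scale) is an assumption invented for illustration:

```python
import random

random.seed(11)

# Assumed ground truth: drivers aged 30-50 are safe with probability 0.8,
# everyone else with probability 0.4.
drivers = []
for _ in range(50_000):
    age = random.uniform(18, 70)
    safe = random.random() < (0.8 if 30 <= age <= 50 else 0.4)
    drivers.append((age, safe))

# Perturb: add zero-mean Gaussian noise to every age before release, so no
# individual's record can be looked up by age.
noisy = [(age + random.gauss(0, 5), safe) for age, safe in drivers]

def safe_rate(data, lo=30, hi=50):
    inside = [s for a, s in data if lo <= a <= hi]
    outside = [s for a, s in data if not (lo <= a <= hi)]
    return sum(inside) / len(inside), sum(outside) / len(outside)

print(safe_rate(drivers))  # roughly (0.80, 0.40)
print(safe_rate(noisy))    # bracket rate drops a little but stays clearly higher
```

The decision rule "30-50 is safe" survives perturbation because the noise moves roughly as many safe drivers into the bracket as out of it.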

At the same time, sharing of data is often beneficial in data mining applications. It has been proven useful to support both decision-making processes and to promote social goals.

Despite its benefits in various areas, the use of data mining techniques can also result in new threats to privacy and information security. The problem is not data mining itself, but the way data mining is done; those problems relate to privacy and data security.

1. The distribution of the data. Some approaches have been developed for centralized data, while others refer to a distributed data scenario. Distributed data scenarios can be further classified into horizontal and vertical data distribution.

2. Data modification. In general, data modification is used to modify the original values of a database that needs to be released to the public, and in this way to ensure high privacy protection. It is important that a data modification technique be in concert with the privacy policy adopted by an organization. Methods of modification include:

Perturbation, which is accomplished by the alteration of an attribute value by a new value (i.e., changing a 1-value to a 0-value, or adding noise);

Blocking, which is the replacement of an existing attribute value with a "?";

Aggregation or merging, which is the combination of several values into a coarser category;

Swapping, which refers to interchanging values of individual records; and

Sampling, which refers to releasing data for only a sample of a population.
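
The five modification methods can be demonstrated on a toy record set; all field names and values below are invented for illustration:

```python
import random

random.seed(5)
records = [
    {"age": 34, "smoker": 1, "zip": "46001"},
    {"age": 58, "smoker": 0, "zip": "46002"},
    {"age": 41, "smoker": 1, "zip": "46003"},
]

# Perturbation: alter a value (here, flip a binary attribute with prob 0.3).
perturbed = [{**r, "smoker": r["smoker"] ^ 1} if random.random() < 0.3 else dict(r)
             for r in records]

# Blocking: replace a sensitive value with "?".
blocked = [{**r, "zip": "?"} for r in records]

# Aggregation: merge values into a coarser category (10-year age bands).
aggregated = [{**r, "age": f"{r['age'] // 10 * 10}s"} for r in records]

# Swapping: interchange values of individual records.
swapped = [dict(r) for r in records]
swapped[0]["age"], swapped[1]["age"] = swapped[1]["age"], swapped[0]["age"]

# Sampling: release only a random subset of the population.
sampled = random.sample(records, k=2)

print(aggregated[0]["age"])  # "30s"
```

Each method trades utility for privacy differently: blocking destroys the attribute entirely, aggregation keeps coarse structure, and swapping preserves marginal distributions exactly.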

3. The data mining algorithm for which the data modification is taking place. This is actually something that is not known beforehand, but it facilitates the analysis and design of the data hiding algorithm. We have included the problem of hiding data for a combination of data mining algorithms in our future research agenda. For the time being, various data mining algorithms have been considered in isolation from each other. Among them, the most important ideas have been developed for classification data mining algorithms, such as decision tree inducers, association rule mining algorithms, clustering algorithms, rough sets and Bayesian networks.


5. The most important dimension: the privacy preservation technique used for the selective modification of the data. Selective modification is required in order to achieve higher utility for the modified data, given that the privacy is not jeopardized.

III. SCOPE

A closely related research area is privacy-preserving data mining [Aggarwal and Yu 2008c]. The term privacy-preserving data mining (PPDM) emerged in 2000 [Agrawal and Srikant 2000]. The initial idea of PPDM was to extend traditional data mining techniques to work with data modified to mask sensitive information. The key issues were how to modify the data and how to recover the data mining result from the modified data. The solutions were often tightly coupled with the data mining algorithms under consideration. In contrast, PPDP may not necessarily be tied to a specific data mining task, and the data mining task may be unknown at the time of data publishing; the step of reconstructing the original data distribution, which has a high computation cost, and the step of modifying the mining algorithm are no longer needed.

Privacy protection in emerging technologies. Emerging technologies, like location-based services [Atzori et al. 2007; Hengartner 2007; You et al. 2007], RFID [Wang et al. 2006], bioinformatics, and mashup web applications, enhance our quality of life. These new technologies allow corporations and individuals to have access to previously unavailable information and knowledge; however, such benefits also bring up many new privacy issues. Nowadays, once a new technology has been adopted by a small community, it can become very popular in a short period of time. A typical example is the social network application called Facebook. Since its deployment in 2004, it has acquired 70 million active users. Due to the massive number of users, the harm could be extensive if the new technology is misused. One research direction is to customize existing privacy-preserving models for emerging technologies.

IV. FUTURE RESEARCH

We have studied and analyzed the uses of privacy preserving in data mining and the different methods of privacy preserving in data mining. The key challenge lies in preventing the data miners from combining copies at different trust levels to jointly reconstruct the original data more accurately than what is allowed by the data owner. We address this challenge by properly correlating noise across copies at different trust levels.

We prove that if we design the noise covariance matrix to have the corner-wave property, then data miners will have no diversity gain in their joint reconstruction of the original data. We verify our claim and demonstrate the effectiveness of our solution through numerical evaluation. Because of the increasing ability to trace and collect large amounts of personal information, privacy preserving in data mining applications has become an important concern. Information sharing has become part of the routine activity of many individuals, companies, organizations, and government agencies. Privacy-preserving data publishing is a promising approach to information sharing while preserving individual privacy and protecting sensitive information. In this survey, we reviewed the recent developments in the field. The general objective is to transform the original data into some anonymous form to prevent the inference of its record owners' sensitive information. We presented our views on the difference between privacy-preserving data publishing and privacy-preserving data mining, and gave a list of desirable properties of a privacy-preserving data publishing method. We reviewed and compared existing methods in terms of privacy models, anonymization operations, information metrics, and anonymization algorithms. Most of these approaches assumed a single release from a single publisher, and thus only protected the data up to the first release or the first recipient.

V. CONCLUSION

We reviewed several works on more challenging publishing scenarios, including multiple release publishing, sequential release publishing, continuous data publishing, and collaborative data publishing. Privacy protection is a complex social issue, which involves policy-making, technology, psychology, and politics. Privacy protection research in computer science can provide only technical solutions to the problem. Successful application of privacy-preserving technology will rely on the cooperation of policy makers in governments and decision makers in companies and organizations. Unfortunately, while the deployment of privacy-threatening technology, such as RFID and social networks, grows quickly, the


REFERENCES

[1] Keke Chen, Ling Liu. A Survey of Multiplicative Perturbation for Privacy-Preserving Data Mining. 06/2008. DOI:10.1007/978-0-387-70992-5_7. http://www.researchgate.net/publication/226023287_A_Survey_of_Multiplicative_Perturbation_for_Privacy-Preserving_Data_Mining

[2] Wu, Chai Wah. 2005. In IEEE International Symposium on Circuits and Systems (ISCAS 2005). http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1465887&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1465887

[3] Abul, O., Bonchi, F., and Nanni, M. 2008. Never walk alone: Uncertainty for anonymity in moving objects databases. In Proceedings of the 24th IEEE International Conference on Data Engineering (ICDE). 376–385.

[4] Parth Prajapati, Prof. Mahesh Panchal. A Novel Algorithm for PPDM to Hide Association Rule. http://uniascit.in/files/documents/2012_0202010.pdf

[5] Divya Sharma. A Survey on Maintaining Privacy in Data Mining. http://www.ijert.org/browse/april-2012-edition?download=24%3Aa-survey-on-maintaining-privacy-in-data-mining

[6] Shweta Shrma, Hitesh Gupta, Priyank Jain. A Study Survey of Privacy Preserving Data Mining. http://www.ijrice.com/docs/IJRICE20120206.pdf

[7] Likun Liu, Liang Hu, Di Wang, Yanmei Huo, Lei Yang, Kexin Yang. Two Noise Addition Methods for Privacy-Preserving Data. -press.org/ijwmt/ijwmt-v2-n3/IJWMT-V2-N3-5.pdf

[8] Yaping Li, Minghua Chen, Qiwei Li, and Wei Zhang. Enabling Multilevel Trust in Privacy Preserving Data Mining. http://www.statfe.com/papers/tkde2012_mltppdm.pdf

[9] Adam, N. R. and Wortman, J. C. 1989. Security control methods for statistical databases. ACM Comput. Surv. 21, 4, 515–556.

[10] Aggarwal, C. C. and Yu, P. S. 2008a. A framework for condensation-based anonymization of string data. Data Min. Knowl. Discov. 13, 3, 251–275.

[11] Aggarwal, C. C. and Yu, P. S. 2008b. On static and dynamic methods for condensation-based privacy-preserving data mining. ACM Trans. Datab. Syst. 33, 1.

[12] Aggarwal, C. C. and Yu, P. S. 2008c. Privacy-Preserving Data Mining: Models and Algorithms. Springer, Berlin.

[13] Aggarwal, C. C. and Yu, P. S. 2007. On privacy-preservation of text and sparse binary data with sketches. In Proceedings of the SIAM International Conference on Data Mining (SDM).

[14] Aggarwal, C. C., Pei, J., and Zhang, B. 2006. On privacy preservation against adversarial data mining. In Proceedings of the 12th ACM SIGKDD. ACM, New York.

[15] Aggarwal, C. C. 2005. On k-anonymity and the curse of dimensionality. In Proceedings of the 31st Conference on Very Large Data Bases (VLDB). 901–909.

[16] Aggarwal, G., Feder, T., Kenthapadi, K., Motwani, R., Panigrahy, R., Thomas, D., and Zhu, A. 2006. Achieving anonymity via clustering. In Proceedings of the 25th ACM SIGMOD-SIGACT-SIGART PODS Conference. ACM, New York.

[17] Aggarwal, G., Feder, T., Kenthapadi, K., Motwani, R., Panigrahy, R., Thomas, D., and Zhu, A. 2005. Anonymizing tables. In Proceedings of the 10th International
