2020 4th International Conference on Modelling, Simulation and Applied Mathematics (MSAM 2020) ISBN: 978-1-60595-674-9
A Novel Method to Protect Private Information in Big Data Environment
Jian Wang
College of Computer and Information Engineering, Henan University of Economics and Law, Zhengzhou, China
*Corresponding author
Keywords: Data privacy, Privacy protocol, Block chain, Big data.
Abstract. In the big data environment, analyzing and summarizing massive data to mine potential patterns and study market operation rules can help enterprises adjust strategies, reduce risks and improve efficiency. However, with the wide application of new technology, users' sensitive information may be disclosed while these rules are mined. Because the amount of user-related data in the big data environment is large, leaking the private information it contains can cause great harm to users. To avoid the disclosure of sensitive information while users access services in the big data environment, we propose a new method to protect users' private data. The approach is based on a private matching protocol and uses blockchain technology.
Introduction
Traditional e-commerce is built on modern internet technology, so its business model and infrastructure are inherent to that technology [1]. Its rapid development has also brought uneven product quality caused by a lack of supervision, low customer satisfaction with logistics and distribution, and leakage of customer information, all of which have become bottlenecks to its development.
The network infrastructure is not sound. E-commerce is a business activity based on modern network communication technology, and the state of its information infrastructure determines how fast e-commerce can operate. Although infrastructure construction has developed greatly in recent years, information technology started late, and the resulting imbalance of network resources still restricts the development of e-commerce [2]. As e-commerce technology develops, its network bandwidth requirements have also increased greatly. Therefore, strengthening network communication infrastructure, improving network speed, extending network coverage and reducing operating costs have become the primary concerns in promoting e-commerce applications.
The payment problem is prominent. Payment is the core link of electronic transactions, and its convenience and safety are the cornerstone of the rapid development of e-commerce. Although the volume of online transactions has increased year by year, the problems of online payment security and personal account privacy remain noticeable. The reasons are as follows. Firstly, the online payment system and the traditional financial system are not yet fully integrated. Secondly, legislation lags behind, so payment disputes cannot be resolved effectively. Thirdly, the complicated operations required of users are a hidden danger to payment security. Fourthly, limited coverage cannot meet the payment needs of users in remote areas. These problems have seriously hindered the growth of the e-commerce market.
express companies, but they have completed only the most basic distribution service in the logistics system; delivery takes a long time and the channels are not smooth. Their reputation among customers is poor [4]. They cannot meet the requirements of high efficiency and low cost in e-commerce logistics and distribution.
Overview of the Approach
In cloud computing, users hope that their identity will not be disclosed while they access a cloud service; that is, they hope to access it anonymously. Current cloud technology, however, cannot meet this privacy need, so how to avoid disclosing users’ identity during access is an important open problem. Although much research focuses on security issues in the cloud, no work has carefully addressed this problem. To allay users’ concerns about the disclosure of their identity, we propose a novel approach based on anonymous access control to protect users’ identity in cloud computing. The method also uses ring signature technology.
Since its emergence, e-commerce has grown rapidly by virtue of its unique advantages and vitality. The scale of the e-commerce market has expanded quickly, and profitability has improved significantly. However, as e-commerce develops, its upstream and downstream industry links place new demands on it. Payment, payment security, logistics and distribution, after-sales service and other issues have become the main problems restricting its development, and operational obstacles in the e-commerce market are becoming prominent. Applying Internet of Things technology in the e-commerce market can help address these operational problems. The development of Internet of Things technology will have a great impact on the participants and transactions of e-commerce, and will also play an important role in business management, supplier channels, the consumer shopping experience, customer relationship management and other fields.
At present, the e-commerce market has matured, but because it evolved from the traditional retail trade model, it is still constrained by objective conditions and other factors: the overall market is not yet well developed, the level of automation is insufficient, and the network foundation is inadequate. Internet of Things technology can improve automatic processing and optimize the network. Therefore, applying Internet of Things technology in e-commerce can make its operation and management more efficient.
Private Matching Protocol
We use private matching to ensure that, while the user's data and the service provider's data are intersected, neither party can access the other's data. Without disclosing the user's sensitive information, we can judge whether the user's generalized record satisfies k-anonymity with respect to the service provider's data table. The private matching protocol proposed in this section is detailed below.
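(1) Both Record A (A, for short) and Record B (B, for short) apply a hash function $h$ to their sets $V_a$ and $V_b$:
$X_a = h(V_a)$; $X_b = h(V_b)$.
Each party randomly chooses a secret key: $e_a$ is the key for A and $e_b$ is the key for B.
(2) Both parties encrypt their hashed sets:
$Y_a = F_{e_a}(X_a) = F_{e_a}(h(V_a))$; $Y_b = F_{e_b}(X_b) = F_{e_b}(h(V_b))$.
(3) A sends its encrypted set $Y_a$ to B.
(4) (a) B ships to A its set $Y_b = F_{e_b}(h(V_b))$.
(b) B encrypts each $y \in Y_a$ with B's key $e_b$ and sends back to A the pairs $\langle y, F_{e_b}(y) \rangle = \langle F_{e_a}(h(v)), F_{e_b}(F_{e_a}(h(v))) \rangle$.
(5) A encrypts each $y \in Y_b$ with A's key $e_a$, obtaining $Z_b = F_{e_a}(y) = F_{e_a}(F_{e_b}(h(v)))$, where $v \in V_b$.
Also, from the pairs $\langle F_{e_a}(h(v)), F_{e_b}(F_{e_a}(h(v))) \rangle$ obtained in Step 4 (b) for $v \in V_a$, A creates the pairs $\langle v, F_{e_b}(F_{e_a}(h(v))) \rangle$ by replacing $F_{e_a}(h(v))$ with the corresponding $v$.
(6) For each $v \in V_a$, if $F_{e_b}(F_{e_a}(h(v))) \in Z_b$, then $v \in V_b$ and the protocol returns True; otherwise it returns False. This comparison relies on $F$ being commutative, i.e. $F_{e_a}(F_{e_b}(x)) = F_{e_b}(F_{e_a}(x))$, so that an element common to both sets yields the same doubly encrypted value on either path.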
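The steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation evaluated in this paper: the encryption function F is assumed to be modular exponentiation, which is commutative; the modulus `P`, the helper names `h`, `keygen`, `F` and `private_match`, and the use of SHA-256 are all hypothetical choices; and both parties are simulated inside a single function. The parameters are toy values and are not secure.

```python
import hashlib
import math
import secrets

# Toy modulus (the Mersenne prime 2^127 - 1). Assumption for illustration only.
P = 2**127 - 1


def h(value: str) -> int:
    """Hash a value to an integer in [1, P-1] (assumed hash function h)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def keygen() -> int:
    """Pick a random exponent coprime to P-1, so that F is injective."""
    while True:
        e = secrets.randbelow(P - 3) + 2  # e in [2, P-2]
        if math.gcd(e, P - 1) == 1:
            return e


def F(e: int, x: int) -> int:
    """Assumed commutative encryption: F_e(x) = x^e mod P."""
    return pow(x, e, P)


def private_match(v_a, v_b):
    """Return {v: True/False} for each v in v_a, True if v is also in v_b."""
    e_a, e_b = keygen(), keygen()
    # Steps 1-2: each party hashes and encrypts its own set.
    y_a = {v: F(e_a, h(v)) for v in v_a}  # A remembers which y belongs to which v
    y_b = [F(e_b, h(v)) for v in v_b]
    # Steps 3-4: A sends Y_a to B; B returns Y_b and the pairs <y, F_eb(y)>.
    pairs = {y: F(e_b, y) for y in y_a.values()}
    # Step 5: A encrypts each element of Y_b with its own key to obtain Z_b.
    z_b = {F(e_a, y) for y in y_b}
    # Step 6: v is in V_b exactly when its doubly encrypted value is in Z_b.
    return {v: pairs[y] in z_b for v, y in y_a.items()}


result = private_match(["alice", "bob", "carol"], ["bob", "dave"])
```

In this sketch only hashed and encrypted values would cross between the two parties; A learns which of its own elements are shared, but not the remaining elements of B's set.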
Using Blockchain Technology to Protect Private Data in Big Data Environment
In the big data environment, blockchain technology is used to protect users' personal privacy information in order to remove the threat of privacy disclosure posed by a trusted third party. In the blockchain, although the two sides of a transaction do not trust each other, the technologies underlying the blockchain, such as digital signatures, asymmetric encryption and consensus mechanisms, still let both sides complete the transaction without disclosing their private information.
In the big data environment, most of the services that users apply for and receive are mobile applications, delivered through the network and app clients. In these cases, a major security threat is that users must grant a set of permissions when registering. These permissions are granted indefinitely, and the applications continuously collect users' personal data without their knowledge. Users have no idea how their data is collected and accessed.
Therefore, we plan to design a system that ensures users have the right to control their personal data. Our system treats the user as the owner of the data and the service as a visitor with only authorized access rights. At any time, users can change the permission set and revoke access to previously collected data. Access control policies are stored securely on the blockchain, where only legitimate users are allowed to change them. In addition, we design a validation algorithm for legitimate users.
Figure 1. The proposed decentralized frame.
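The policy-handling idea described above can be illustrated with a minimal sketch. It assumes a toy, single-process, hash-chained log in place of a real blockchain; the class `PolicyLedger` and its methods are hypothetical names, and the string comparison `caller != owner` merely stands in for the validation algorithm for legitimate users, which a real system would implement with digital signatures.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder hash that precedes the first block


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of a block body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class PolicyLedger:
    """Toy append-only, hash-chained log of access-control policies."""
    blocks: list = field(default_factory=list)

    def set_permissions(self, caller, owner, service, permissions):
        # Assumption: comparing names stands in for verifying the owner's signature.
        if caller != owner:
            raise PermissionError("only the data owner may change a policy")
        prev = self.blocks[-1]["hash"] if self.blocks else GENESIS
        body = {"owner": owner, "service": service,
                "permissions": sorted(permissions), "prev": prev}
        self.blocks.append({**body, "hash": _digest(body)})

    def allowed(self, owner, service, permission):
        # The most recent policy for (owner, service) wins; no policy means no access.
        for block in reversed(self.blocks):
            if block["owner"] == owner and block["service"] == service:
                return permission in block["permissions"]
        return False

    def verify_chain(self):
        # Recompute every hash and check each link to detect tampering.
        prev = GENESIS
        for block in self.blocks:
            body = {k: block[k] for k in ("owner", "service", "permissions", "prev")}
            if block["prev"] != prev or block["hash"] != _digest(body):
                return False
            prev = block["hash"]
        return True


ledger = PolicyLedger()
ledger.set_permissions("alice", "alice", "maps_app", {"location"})
granted = ledger.allowed("alice", "maps_app", "location")
ledger.set_permissions("alice", "alice", "maps_app", set())  # revoke everything
revoked = not ledger.allowed("alice", "maps_app", "location")
```

Revocation is modeled as appending a new, empty permission set rather than deleting the old record, which keeps the log append-only.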
A distributed hash table (DHT) is used for the records referenced on the blockchain to ensure that the data is not tampered with, and thus to ensure data integrity. The DHT is maintained by a set of network nodes that are separate from the nodes of the blockchain. These network nodes are responsible for authorizing read and write operations.
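One way such an integrity check can work is sketched below. The keying scheme is not specified here, so the sketch assumes content addressing: each record is stored under the SHA-256 digest of its contents, and that digest is what would be recorded on the blockchain. The class `ToyDHT` is a hypothetical in-memory stand-in for a real distributed hash table.

```python
import hashlib


class ToyDHT:
    """In-memory stand-in for a DHT that stores records under their SHA-256 digest."""

    def __init__(self):
        self._store = {}

    def put(self, data: bytes) -> str:
        # The returned key is the value that would be recorded on the blockchain.
        key = hashlib.sha256(data).hexdigest()
        self._store[key] = data
        return key

    def get(self, key: str) -> bytes:
        # Re-hash on every read so that any modification of the record is detected.
        data = self._store[key]
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored data does not match its recorded digest")
        return data


dht = ToyDHT()
pointer = dht.put(b"user record: generalized attributes")
record = dht.get(pointer)
```

Because the key is derived from the data, a node that alters a stored record cannot make it pass the check without also changing the digest recorded on the chain.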
Authentication is handled through public-key encryption, and an implementation similar to DNS maps existing, widely accepted identities to individual physical addresses. A synchronization algorithm handles the data exchange between users and services and verifies authorization through the database authentication server.
Experimental Evaluation
In this section we design experiments to show the application and performance of our proposed MAGA algorithm. We use the IBM Quest Synthetic Data Generator to generate the experimental data. The experiment runs on 10 nodes. Each node is a notebook computer with an Intel Core i3 2.13 GHz CPU, 4 GB of memory and a 500 GB hard disk. The experiment compares the execution time of our proposed MAGA algorithm with that of the traditional encryption decryption algorithm (TEDA, for short). Both algorithms are implemented in the Java programming language.
In the experiment, we vary the number of nodes from 1 to 10 to observe the difference in execution time between the MAGA and TEDA algorithms. With the database T20.I5.N100K.D100K, our experimental results are shown in Fig. 2. The figure shows that as the number of nodes increases, the execution time of both algorithms decreases. With only one node, the execution times of MAGA and TEDA are basically the same. With 2 or 3 nodes, MAGA's execution time is slightly higher than TEDA's. With more than 3 nodes, MAGA's execution time is significantly lower than TEDA's, because TEDA spends a lot of time on encryption and decryption.
[Plot: Execution Time (sec), axis range 20 to 80, versus Number of Nodes (1 to 10), for MAGA and TEDA.]
Figure 2. Difference of execution time.
[Plot: Average Loss of Privacy (0 to 1) versus Number of Nodes (4, 16, 64, 256), for TEDA and MAGA.]
Figure 3. Comparison of the average privacy leakage.
Summary
In the big data environment, analyzing and summarizing massive data to mine potential patterns and study market operation rules can help enterprises adjust their strategies, reduce risks and improve efficiency. Although big data technology has developed rapidly in recent years, the problem of protecting private data in the big data environment has not been solved. Users' data is often processed on a remote, unknown server, so users worry that their private information will be leaked, which seriously hinders the widespread application of the technology. To reduce users' concerns, we have proposed a new method to prevent users' private information from being disclosed in the big data environment.
Acknowledgement
This research was supported by the Science and Technology Research Project of the Science and Technology Department of Henan Province (No. 172102210171).
References
[1] Jian Wang, Experimental Analysis for Novel K-NN Classification Algorithm in Cloud Computing, Journal of Computational Information Systems, Vol. 8, Issue 22, (2012), p. 9217-9224.
[2] W. Du, Y. Han, and S. Chen, Privacy-preserving Multivariate Statistical Analysis: Linear Regression and Classification, In Proceedings of the Fourth SIAM International Conference on Data Mining, (2004), p. 222-233.
[3] Ke Wang, Benjamin C. M. Fung, and Philip S. Yu, Template-based Privacy Preservation in Classification Problems, In ICDM, (2005), p. 466-473.