2016 International Conference on Wireless Communication and Network Engineering (WCNE 2016) ISBN: 978-1-60595-403-5
SD2DS-Based Anonymous Datastore for IoT Solutions
Adam KRECHOWICZ* and Stanisław DENIZIAK
Kielce University of Technology, Poland,
*Corresponding author
Keywords: SD2DS, Anonymity, IoT, Datastore.
Abstract. This paper presents the architecture of a cloud datastore that supports data anonymity and is especially suitable for IoT systems. Modern IoT systems gather more and more data. Storing those data can be a huge challenge because of the growing number of sensors and the increasing resolution of the sensors themselves. Cloud storage is a good solution to that problem. On the other hand, the data stored by IoT systems are often highly confidential, and in many cases a third-party company should not be involved in gathering and processing them.
Introduction
IoT systems may revolutionize not only Internet technologies but also our lives. Such systems are used not only in entertainment (like smart houses) but also in security, transportation and even medical treatment. The growing development of such systems requires more and more data to be stored and processed. A typical IoT system may grow to a huge number of sensors, and this number can increase while the system is in use. Moreover, the resolution of the sensors ought to be sufficiently high to allow better processing methods. As an example, let us consider video cameras. Such devices are used in IoT systems not only to monitor the environment but also to perform advanced processing such as object detection or face recognition. Gathering and processing images obtained by video cameras is a huge challenge in many IoT systems currently in use.
Cloud-provided datastores, like Amazon DynamoDB or Google Cloud Datastore, seem to be the most suitable solutions for storing data in IoT systems. They can store huge data sets and provide the fast access to these data that processing requires. Scalable Distributed Two-Layer Datastore (SD2DS) [1] is another datastore system, developed by us. Previous experiments showed that this datastore can seriously compete with the most popular storage systems currently used in many applications. We believe that this datastore can be used efficiently in cloud environments for storing the huge data sets required by growing IoT systems.
We agree that Cloud Computing (CC) solutions are a great way of solving the vast majority of computational problems. However, they also introduce some drawbacks. In our opinion, data security and privacy must be carefully considered before using CC solutions. In this paper we present the architecture of an SD2DS-based datastore supporting data anonymity, which conforms to the IoT concept.
Motivation and Related Work
Despite the many advantages of CC, such as the pay-per-use model, scalability and fault tolerance, such solutions are not a remedy in all cases. Many people are concerned about data security and privacy in such systems. In contrast to typical storage, users know little, sometimes nothing, about how their data are stored. In the vast majority of cases, users do not know where their data are located, who is responsible for administering the system and, most importantly, who can access those data.
unauthorized third-party. The Snowden affair [3] prompted much debate about the privacy of data located in the cloud. Additionally, all of us know of at least several incidents concerning data leaks caused by hackers. One of the most infamous incidents happened in 2014, when a large amount of private data, including nude photos, of some celebrities was leaked from their cloud storage [4].
Peer-to-peer (P2P) networks offer some of the best solutions for data privacy and anonymity. The most recognizable example is the TOR Project [5], which allows users to browse the Internet anonymously. Another example is Tribler [6], which applies TOR solutions to the BitTorrent network. However, most existing cloud datastores pay little attention to data privacy and anonymity. They are mostly designed to run in a trusted environment. Most of them store the data in a raw format that can easily be spied on.
Scalable Distributed Two-Layer Datastore
Scalable Distributed Two-Layer Datastore (SD2DS) is a very efficient datastore that can be successfully used in CC environments. The main advantages of this system are that it does not use any central element and that it scales very well. Our research [1] showed that it can outperform the most popular datastores in common use. Performance comparisons with MongoDB and MemCached are presented in Figure 1. SD2DS also proved to be efficient in IoT applications [7].
Figure 1. The performance comparison of SD2DS and other datastores.
Each data portion in our storage is identified by a unique key. Our system divides the data into two separate elements: the header and the body. These elements are stored separately in different places. The header consists of metadata that make it possible to locate the actual data portion stored in the body. The headers are stored in the first layer of the structure, called the File, while the bodies are located in the second layer, called the Store. The header is an excellent place to store additional information that extends the basic functionality of the datastore. As additional metadata we use checksums and encryption keys.
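The header/body split can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the SD2DS implementation: the class names, the `body_location` field and the single-process dictionaries are hypothetical, and SHA-1 is chosen for the checksum because the paper names it for the hash stored in the header.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Header:
    """First-layer (File) entry: metadata needed to locate and check a body."""
    key: str
    body_location: str            # hypothetical address of a Store bucket
    checksum: str                 # hex SHA-1 digest of the body
    encryption_key: bytes = b""   # optional metadata kept only in the header


@dataclass
class TwoLayerStore:
    """Toy model: `file` holds headers, `store` holds bodies."""
    file: dict = field(default_factory=dict)
    store: dict = field(default_factory=dict)

    def insert(self, key: str, data: bytes, location: str = "bucket-0") -> None:
        checksum = hashlib.sha1(data).hexdigest()
        self.file[key] = Header(key, location, checksum)
        self.store[(location, key)] = data

    def get(self, key: str) -> bytes:
        header = self.file[key]                          # first-layer lookup
        body = self.store[(header.body_location, key)]   # second-layer lookup
        if hashlib.sha1(body).hexdigest() != header.checksum:
            raise ValueError("body does not match header checksum")
        return body
```

Because the checksum lives in the header, a body that is altered in the second layer is detected on retrieval without trusting the component that stores it.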
SD2DS Provided Data Anonymity
actually connect with the cloud storage instead of the client. The proxies form a peer-to-peer network consisting of all the clients. We use a varying number of proxies so that even a proxy itself cannot determine the origin of the data. In future work we plan to introduce an Onion Routing mechanism for better protection of the origin of the data.
Figure 2. SD2DS providing anonymity architecture.
Unfortunately, using a P2P network can cause some serious problems regarding data privacy. Because a chain is only as strong as its weakest link, a single malicious or hacked peer can harm the data that it should help to protect. Many additional tasks are required to provide privacy at an acceptable level. First of all, the data are sent to and from the user through a proxy, so a broken proxy can mislead the client about the data content. It can send random data without retrieving them from the proper location, or alter them unnecessarily. This is the most basic problem, and it may be caused by faults during data transmission or even by the intentional action of an adverse peer. Good practice is to use hash codes to ensure that the received data are unchanged and exactly the same as previously stored. The hash, in the form of SHA-1, needs to be calculated before inserting the data and stored securely in the data header. When the client discovers that the received data do not match the data originally inserted, it should try to retrieve the data through a different proxy. In extreme cases, the client can still try to retrieve the data without a proxy to be sure that it receives the original data. The anonymity of the user need not be compromised, because the cloud provider sees the client as just another proxy.
More serious threats arise when an adverse proxy is used to store the data. In that scenario it could simply accept a portion of the data and not store it in the appropriate place. The user can easily be deceived into believing that their data are safely stored while they are in fact ignored by some proxies. To prevent this, the client should always check whether each part of the data has been stored properly by the receiver. If the data are correctly retrieved through a different set of proxies, the client may assume that the insertion was correct. Checking repeatedly whether the data were inserted properly may be good practice despite the high transfer it requires. In that case the client may simply ask the cloud provider for the hash code instead of the full data. The cloud storage system can calculate the hash from the stored data and send it back through the proxies to the client.
The most serious threat arises when a malicious proxy stores the data correctly but deletes them after the client has performed all of the data checks. In the simplest case, this problem can be solved by not allowing any data to be deleted from the cloud. This simple solution can be sufficient in many situations, because users generally do not want to lose any data and want to keep them permanently. Onion Routing is a very good idea in such a case, because the proxy then does not know the exact content of the packet that it routes, so the chance of incorrect behavior is greatly reduced.
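The read-side check described above can be sketched as follows. This is a minimal illustration, assuming each proxy (and the optional direct connection) is represented by a hypothetical callable that returns the body for a key; the function name `fetch_verified` is not part of SD2DS.

```python
import hashlib
from typing import Callable, Iterable, Optional

Fetch = Callable[[str], bytes]  # hypothetical: returns the body for a key


def fetch_verified(key: str, expected_sha1: str,
                   proxies: Iterable[Fetch],
                   direct: Optional[Fetch] = None) -> bytes:
    """Return the first body whose SHA-1 matches the digest kept in the header."""
    for fetch in proxies:
        data = fetch(key)
        if hashlib.sha1(data).hexdigest() == expected_sha1:
            return data          # this proxy delivered unmodified data
    if direct is not None:
        # Last resort: contact the cloud without a proxy. The provider
        # cannot tell this client apart from an ordinary proxy.
        data = direct(key)
        if hashlib.sha1(data).hexdigest() == expected_sha1:
            return data
    raise OSError(f"no route returned valid data for {key!r}")
```

A caller would pass the digest read from the locally held header, so a proxy that returns random or altered data is simply skipped.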
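The write-side check can be sketched in the same spirit. The classes below are hypothetical stand-ins, not the SD2DS interfaces: `Cloud` models a second-layer store that can report the SHA-1 of what it holds, and `Proxy` models a peer that may silently drop writes. The client inserts through one proxy and then asks for the hash through a different set of proxies, so only a digest travels back rather than the full data.

```python
import hashlib


class Cloud:
    """Hypothetical store that can report a digest of a stored body."""
    def __init__(self):
        self.bodies = {}

    def put(self, key, data):
        self.bodies[key] = data

    def sha1_of(self, key):
        data = self.bodies.get(key)
        return None if data is None else hashlib.sha1(data).hexdigest()


class Proxy:
    """Hypothetical peer; with drops_writes=True it accepts data but discards it."""
    def __init__(self, cloud, drops_writes=False):
        self.cloud = cloud
        self.drops_writes = drops_writes

    def put(self, key, data):
        if not self.drops_writes:
            self.cloud.put(key, data)

    def sha1_of(self, key):
        return self.cloud.sha1_of(key)


def insert_and_confirm(key, data, insert_via, verify_via):
    """Insert through one proxy, then confirm the digest through other proxies."""
    if not verify_via:
        raise ValueError("need at least one verifying proxy")
    insert_via.put(key, data)
    expected = hashlib.sha1(data).hexdigest()
    return all(p.sha1_of(key) == expected for p in verify_via)
```

The sketch assumes the verifying proxies are honest; in practice the check would be repeated through several independently chosen proxies, as the text suggests.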
Anonymous SD2DS for IoT
We believe that privacy of, and anonymous access to, the data gathered by IoT systems is a very important issue. We are currently developing such a system on the basis of SD2DS [7]. Our current goal is to improve the security of IoT data by introducing anonymity. The architecture of our system is presented in Figure 3. Sensors, gateways and connected applications are the typical parts of all IoT systems. Gathering data from previous points in time may be extremely important for introducing advanced functionality. Because of that, we introduce our SD2DS-based anonymous datastore, which consists of three parts. The first layer, which contains all metadata, is located on the local infrastructure. The proxy layer is a P2P system that organizes the proxies. The second layer of SD2DS is located in a cloud-based datastore and allows confidential data to be stored safely.
Figure 3. The anonymous SD2DS for IoT.
Evaluation
We evaluated our datastore in a real-world environment consisting of 46 servers for storing data. We used 128 proxies organized as a peer-to-peer network. We evaluated the performance of our store for different numbers of strips and different numbers of proxies. We used data portions of fixed sizes (1 MiB, 10 MiB, 20 MiB, 50 MiB) and measured the time needed to download the data from the datastore. The results are presented in Figure 4. As can be seen, additional proxies and partitioning the data into strips can have a positive influence on performance. In particular, a larger number of strips can improve performance because all strips can be downloaded simultaneously.
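The benefit of strips comes from fetching them concurrently. The sketch below shows the idea with a thread pool; it is an illustration only. How SD2DS actually cuts a component into strips is not stated here, so the equal-sized contiguous split and the `fetch_strip` callable are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_strips(data: bytes, n_strips: int) -> list:
    """Cut data into n_strips contiguous pieces of (nearly) equal size."""
    size = -(-len(data) // n_strips)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(n_strips)]


def download_component(fetch_strip, n_strips: int) -> bytes:
    """Fetch all strips concurrently and join them in index order.

    fetch_strip(i) is a hypothetical callable returning strip number i,
    for example by asking a randomly chosen proxy for it.
    """
    with ThreadPoolExecutor(max_workers=n_strips) as pool:
        # map() yields results in the order of the inputs, not of completion
        strips = list(pool.map(fetch_strip, range(n_strips)))
    return b"".join(strips)
```

With network-bound fetches, the total download time approaches that of the slowest strip rather than the sum of all strips.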
Figure 4. Performance in relation to the number of proxies and the number of strips.
time. Figure 5 presents the results for downloading 1 MiB components from the structure using 2 and 128 strips, respectively. We used 3 proxy hops in these tests. More and more proxies became malicious over time, as indicated by the lowest charts. The environment started with all proxies running without any failures. In both cases, failed strips appeared as soon as the first proxy became faulty. The use of checksums gave us the opportunity to detect failed strips. We adopted the strategy that, when a bad strip was detected, the client retried accessing the strip 10 times using randomly chosen proxies. After 10 failed attempts to retrieve a strip, the whole component (data element) was considered failed. As Figure 5 indicates, retrieving a valid component becomes impossible once half of all proxies become malicious.
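The retry strategy can be sketched as a small simulation. The limit of 10 attempts and the 3 proxy hops come from the text; everything else is an assumption made for illustration: proxies are modeled as booleans (True means malicious), and any malicious proxy on a chain is assumed to corrupt the strip it forwards.

```python
import hashlib
import random

MAX_ATTEMPTS = 10   # retries per strip, as stated in the text
HOPS = 3            # proxy hops per transfer, as stated in the text


def fetch_strip(strip: bytes, checksum: str, proxies: list, rng: random.Random):
    """Try up to MAX_ATTEMPTS random proxy chains; return the strip or None."""
    for _ in range(MAX_ATTEMPTS):
        chain = rng.sample(proxies, HOPS)   # HOPS distinct, randomly chosen proxies
        # Assumed fault model: one malicious hop is enough to corrupt the strip.
        received = strip + b"!" if any(chain) else strip
        if hashlib.sha1(received).hexdigest() == checksum:
            return received
    return None


def fetch_component(strips, checksums, proxies, rng):
    """Return the joined component, or None as soon as one strip fails."""
    parts = []
    for strip, checksum in zip(strips, checksums):
        part = fetch_strip(strip, checksum, proxies, rng)
        if part is None:
            return None          # one failed strip fails the whole component
        parts.append(part)
    return b"".join(parts)
```

Under this model a component with many strips is more fragile than one with few, since every strip must find at least one clean chain within its 10 attempts.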
Conclusions
Nowadays, people are more and more aware of the issues concerning privacy in the cloud. Many incidents connected with the privacy of data in the cloud have cast these technologies in a new light. We believe that our contribution can change that negative trend. As the experimental results show, this can be achieved without a serious reduction in performance.
Figure 5. The evaluation of the structure with malicious proxies.
In future work we plan to introduce Onion Routing mechanisms to the proxies to improve the privacy of the origin of a transmission. We are also working on an incentive system so that our P2P network can grow.
References
[1] A. Krechowicz, A. Chrobot, S. Deniziak and G. Lukawski, SD2DS-based datastore for large files. In: SD2DS-based datastore for large files, Springer, 2016, to appear.
[3] J. T. Richelson, The Snowden Affair. Web Resource Documents the Latest Firestorm over the National Security Agency. http://www2.gwu.edu/~nsarchiv/NSAEBB/NSAEBB436/. Accessed: 2015-03-03.
[4] A. Duke, 5 Things to know about the celebrity nude photo hacking scandal. http://edition.cnn.com/2014/09/02/showbiz/hacked-nude-photos-five-things/. Accessed: 2015-03-03.
[5] Tor. Tor project. https://www.torproject.org/. Accessed: 2015-03-03.
[6] J. A. Pouwelse, P. Garbacki, J. Wang, A. Bakker, J. Yang, A. Iosup, D. H. J. Epema, M. Reinders, M. R. Van Steen and H. J. Sips, TRIBLER: a social-based peer-to-peer system. Concurrency and Computation: Practice and Experience, 20(2):127–138, 2008.
[7] S. Deniziak, et al., A Scalable Distributed 2-layered Data Store (SD2DS) for Internet of Things (IoT) systems, Measurement Automation Monitoring, 61.7 (2015), pp. 382-384.
[8] A. Krechowicz, Scalable Distributed Two-Layer Datastore Providing Data Anonymity.