Creative Components Iowa State University Capstones, Theses and Dissertations
Fall 2018
CloudChord: A P2P Network of Clients Cloud Storage for Data Access Pattern Privacy
Xuan Lu
Iowa State University
Recommended Citation
Lu, Xuan, "CloudChord: A P2P Network of Clients Cloud Storage for Data Access Pattern Privacy" (2018). Creative Components. 84.
https://lib.dr.iastate.edu/creativecomponents/84
Creative Component Report
Abstract
Cloud storage has become increasingly popular in recent years. Users can use cloud storage services
without investing a large amount of money to set up and maintain their own storage systems. However,
cloud storage servers cannot be fully trusted. Although encrypting data provides a partial solution to
the problem, a server can still learn some information about a user from the user’s access pattern and
use it to predict the user’s activities. In this project, we aim to build a distributed system on the
users’ side that hides the users’ access patterns from the server. Our design is based on the idea
that, by mixing the accesses of a large number of active users, each individual user’s access pattern
can be buried among the collective access patterns of all the users. Specifically, we develop a P2P
network system called CloudChord that integrates the cloud storage of a set of users. With CloudChord,
the file blocks of each individual user can migrate to the cloud storage of other users in the set,
so that the user’s access pattern is hidden from the server. CloudChord is designed to run on the users’
premises to protect the users’ data privacy without the involvement of the server. CloudChord also
1. Introduction
Cloud storage services are becoming an increasingly attractive choice for users who need reliable
storage infrastructure. Google Drive [1], one of the most popular cloud storage services, provides a
large amount of cloud storage at a very competitive price. Despite the attractive price, cloud storage
servers cannot always be completely trusted, especially for the storage of very sensitive data. As a
result, increasing importance has been attached to user privacy and data security in cloud storage.
Although encrypting data provides a partial solution to the issue, a user’s access pattern may still
be exposed to the server. More specifically, a server can learn information about the data from when
and how frequently a user accesses it. The two current provable approaches to hiding a user’s access
pattern, Oblivious RAM (ORAM) and Private Information Retrieval (PIR), can address this issue. However,
these methods may be too costly for average users, and the server is required to take part in the
process of hiding the access pattern.
To address the drawbacks of the above approaches, this project aims to hide the user’s access pattern
entirely on the user’s side. We consider a scenario with a large set of active users who exhibit a
wide variety of data access patterns. By mixing the accesses of the users in the set, each individual
user’s access pattern can be buried among the collective access patterns of the users. So that
We develop a P2P system called CloudChord. It includes a scalable P2P overlay network through which a
set of users connect with each other. The network is built on Chord, a structured P2P network, and each
user’s storage account corresponds to one node in Chord. CloudChord uses the Google Drive API to manage
the connection between each user and the associated Google Drive account. In this way, CloudChord
provides a platform for each user to store its data in, and access it from, other users’ cloud storage
accounts. CloudChord keeps track of all outsourced data for each user via a secure tunnel, so the data
can be accessed efficiently when needed.
CloudChord aims to offer a high level of data security, user connectivity, fairness and randomness of
file allocation, and high availability and anonymity for the stored file blocks. Moreover, CloudChord
runs on the users’ premises and operates with the users’ permissions, which protects the users’ privacy
and data security.
The rest of this report is organized as follows. In Section 2, we introduce related work on P2P
systems and approaches to hiding access patterns. In Section 3, we present the problem statement of
the project. We present the project design and implementation in Section 4, and then report the
evaluation results in Section 5. In Section 6, we conclude this work and point out future research
2. Related Work
We survey related work in this section. It covers current approaches to hiding access patterns and
distributed peer-to-peer (P2P) systems.
2.1 Existing Approaches to Hide Access Pattern at Server Side
Oblivious RAM (ORAM) is a technique that protects a user’s access pattern privacy from a cloud storage
server assumed to be semi-honest. ORAM arranges the data blocks so that the server does not know where
each file block is stored, and defines algorithms for the user to obliviously retrieve data and to
obliviously evict retrieved data back to the server. For example, Path ORAM hides the access pattern by
changing the storage arrangement at the server side after each data retrieval [2]. One drawback of ORAM
is that, although the user knows the position map for each file block and directs the oblivious
retrieval and eviction of data, the server is required to take part in the rearrangement process.
Preventing the server from inferring the user’s access pattern while it is involved usually leads to
high communication and storage overheads.
Private information retrieval (PIR) is an approach that lets a user query server storage while hiding
the identity of the data item the user is interested in. It hides each query independently of all
previous queries. The information-theoretic PIR approach replicates the database among multiple
servers, and the user needs
user queries for both 2 servers. After receiving the results from the servers [3], the user XORs the
two results to obtain the final query result. The drawbacks of PIR are that we must assume the servers
do not collude with each other, and that each server usually needs to scan a large portion of its data
storage to answer a query.
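The two-server XOR idea described above can be illustrated with a toy sketch. It assumes a database of single bits replicated on two non-colluding servers and follows the textbook construction surveyed in [3]; it is not code from CloudChord, and the names TwoServerXorPir, answer, and retrieve are hypothetical.

```java
import java.security.SecureRandom;
import java.util.BitSet;

/** Toy two-server XOR-based PIR over a replicated database of single bits. */
final class TwoServerXorPir {

    /** Server side: XOR together the database bits selected by the query set. */
    static boolean answer(boolean[] database, BitSet query) {
        boolean acc = false;
        for (int j = query.nextSetBit(0); j >= 0; j = query.nextSetBit(j + 1)) {
            acc ^= database[j];
        }
        return acc;
    }

    /** Client side: fetch bit `index`; each query on its own is a uniformly random subset. */
    static boolean retrieve(boolean[] database, int index, SecureRandom rng) {
        int n = database.length;
        BitSet q1 = new BitSet(n);
        for (int j = 0; j < n; j++) {
            if (rng.nextBoolean()) {
                q1.set(j);
            }
        }
        BitSet q2 = (BitSet) q1.clone();
        q2.flip(index);                        // q2 differs from q1 only at `index`

        boolean a1 = answer(database, q1);     // would be sent to server 1
        boolean a2 = answer(database, q2);     // would be sent to server 2
        return a1 ^ a2;                        // every bit cancels except database[index]
    }

    public static void main(String[] args) {
        boolean[] db = {true, false, false, true, true, false, true, false};
        SecureRandom rng = new SecureRandom();
        for (int i = 0; i < db.length; i++) {
            System.out.println("bit " + i + " = " + retrieve(db, i, rng));
        }
    }
}
```

Each server has to touch every selected position to compute its answer, which illustrates the scanning cost mentioned above.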
2.2 P2P Networks
One drawback of the client-server model is that it does not scale, since the server is the bottleneck.
In a P2P system, however, the available resources grow with the number of participants. Each
participant acts as both client and server and is considered a peer. P2P systems are mainly used to
share computer resources through direct exchange between peers. Examples of P2P systems include Skype
[4], Gnutella [5, 6], Kazaa [7], and BitTorrent [8].
2.3 Chord System
Chord forms a structured P2P overlay network, since it is constructed using a deterministic procedure
[9]. Nodes are logically placed in a ring. Each node is identified by a hash value derived from its IP
address, and each data block is identified by a hash value derived from the name of the file it belongs
to and its offset in the file. Node identifiers and data block identifiers (i.e., keys) share the same
identifier space. Each data block is mapped to some node. Specifically, a block with key k is stored at
the node identified by successor(k). If a node with identifier k exists, successor(k) is that node.
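The mapping from keys to nodes can be sketched as follows. In Chord, successor(k) is the first node whose identifier is equal to or follows k on the ring, wrapping around at the end of the identifier space. The sketch assumes that all node identifiers are known locally in a sorted set, which a real Chord node does not have; the class name ChordRing is hypothetical.

```java
import java.util.TreeSet;

/** Minimal sketch of Chord's successor(k) on a ring of unsigned 32-bit identifiers. */
final class ChordRing {
    private final TreeSet<Long> nodeIds = new TreeSet<>();

    void addNode(long id) {
        nodeIds.add(id);
    }

    /** First node identifier >= key; wraps around to the smallest identifier. */
    long successor(long key) {
        if (nodeIds.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Long s = nodeIds.ceiling(key);
        return s != null ? s : nodeIds.first();
    }

    public static void main(String[] args) {
        ChordRing ring = new ChordRing();
        ring.addNode(10L);
        ring.addNode(200L);
        ring.addNode(4_000_000_000L);
        System.out.println(ring.successor(11L));             // key between two nodes
        System.out.println(ring.successor(4_000_000_001L));  // key past the last node wraps
    }
}
```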
lookup. Each node maintains a list of routing information called finger table. With the number of N
3. Problem Statement
We present the system model and design goals of this project in this section.
3.1 System Model and Assumptions
The system model is as follows. We assume that at least one user exists in the system at all times so
that the system is maintained. Ideally, there is a large number of active users in the system. Each
user has one Google Drive account, which allows the user to store data on the Google Drive server.
Users are distinguished from one another by their local socket address, i.e., IP address and port
number. The cloud storage of our system depends on Google Drive. Currently, the free storage limit of
Google Drive is 15 GB, so the limit of CloudChord for each user is also 15 GB.
3.2 Design Goals
Our project aims to build a distributed, structured P2P software system. It connects a large number of
users who own Google Drive cloud storage accounts. For each user, the system runs as a software agent
and decentralized infrastructure that connects not only the user’s own Google Drive service but other
users’ Google Drive services as well. The design goals of this project are listed as follows:
Security: The data content should be encrypted at the user side before being exported to the system, and the
Connectivity: All the users should be connected with each other. The logical structure of the system is
a Chord ring formed from a large number of nodes. Every node in the system keeps track of its
predecessor node and successor node.
Fairness: Through load balancing, the amount of data stored in each user’s cloud storage should be at a
similar level.
High Availability: Given the file name, the owner of a file can efficiently retrieve the file from the
system, since a LOOKUP through the finger table takes O(log N) time. Each user’s file should be
replicated, so that even if one of the nodes keeping file blocks of the source file crashes, the user
can still recover the source file from the duplicated file blocks.
Anonymity: A user should not know which user the data blocks stored in his cloud storage belong to. Even
if a data block belongs to the user himself, he should not realize it until he retrieves the file. This
makes it difficult for users to find out each other’s access patterns.
Low data access latency: Files are usually uploaded and downloaded among multiple users in parallel so
4. Design and Implementation
In this section, we present the design and implementation of the CloudChord system, covering both the
front-end and the back-end.
4.1 Front-end
The front-end not only provides a basic GUI for interacting with the user but also offers some basic
functions once the user joins CloudChord. First, the GUI provides menus, text fields, buttons, and
pop-up windows through which users send requests and receive the resulting messages. Users can easily
check their finger tables, the file lists of their local cloud storage, the destination nodes storing
uploaded file blocks, and so on. In addition, uploading, downloading, quitting, and load balancing can
all be initiated from the front-end, which makes the system easy to use. Second, the front-end
subsystem creates the basic directory structure and property files for each user. Whenever a node
starts or joins the system, it creates a home directory and a download directory. Three property files,
sent_file.props, cloud.props, and name.props, are also created for each user; they are required for
file manipulation. Third, the console in the front-end shows feedback on the user’s requests (such as
basic node information), on file manipulation, and on illegal operations. The GUI figure of the
4.2 Back-end
4.2.1 Structure of Chord in CloudChord
The back-end provides all the core functions of CloudChord; the overall structure is shown in the
figure below. Each user keeps three property files locally, which are required for file manipulation.
CloudChord connects each user with the user’s Google Drive account. File blocks are stored in the
users’ cloud storage. Metadata for each file block is also stored in cloud storage. On uploading, the
local node splits the file of interest into blocks and sends each file block to the corresponding cloud
storage for
blocks send back their file blocks from the cloud storage, and the requesting node then merges all the
blocks to get the original file.
Naming
In CloudChord, every node is uniquely identified by its socket address, which consists of the local
host’s IP address and a user-specified port number. Each CloudChord file identifier is based on the
hash value of the original file name. We apply consistent hashing with the cryptographic hash function
SHA-1 to the socket address to derive the identifier of each node. A customized compression function
then produces a 32-bit identifier for each node from the message digest computed by SHA-1. Since the
identifier length for
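A minimal sketch of the naming step follows. The report does not specify its customized compression function, so this example assumes one possible choice: XOR-folding the five 32-bit words of the SHA-1 digest. The class name ChordIdentifier and the sample socket address are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Derives an unsigned 32-bit ring identifier from a socket address or file name. */
final class ChordIdentifier {

    static long identifierOf(String text) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-1")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
        // Assumed compression (not taken from the report): XOR the five
        // 32-bit words of the 160-bit digest into a single 32-bit value.
        ByteBuffer words = ByteBuffer.wrap(digest);
        int folded = 0;
        while (words.remaining() >= 4) {
            folded ^= words.getInt();
        }
        return folded & 0xFFFFFFFFL;   // reinterpret the int as an unsigned value
    }

    public static void main(String[] args) {
        System.out.println(identifierOf("127.0.0.1:8001"));
    }
}
```

Because node identifiers and file identifiers come from the same function, they share one 32-bit identifier space.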
Updating Successor
For each node in CloudChord, it is critical to stay connected with its successor and predecessor. Each
node is therefore required to periodically check the online status of its successor. In this way, the
system can dynamically detect two scenarios: a new node joining CloudChord and an existing node leaving
the system. The flow chart below shows how the stabilizer works.
When a new node A joins the system, it needs to contact one existing node B. Based on node A’s
CloudChord identifier, node B first looks up A’s successor node C in its finger table. If node C is not
the immediate successor of A, other nodes are contacted until node A finds its immediate successor
node C. After that, node A connects with node C and sets node C as its successor. Once they are
connected, node C disconnects from its previous predecessor and sets node
When node A leaves the system, this can also be detected. Every node continuously checks the active
status of its successor node, so node A’s predecessor B can detect the disconnection of node A. After
that, node B looks up A’s successor node C by checking its finger table. Once they are connected,
node B sets node C as its new successor, and in parallel C sets node B as its predecessor.
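The pointer updates for joining and leaving can be sketched in memory as follows. This is a simplified illustration: a sorted map stands in for the finger-table lookup and the periodic liveness checks, there is no networking, and the names RingMembership, join, and leave are hypothetical.

```java
import java.util.TreeMap;

/** In-memory sketch of the successor/predecessor updates on join and leave. */
final class RingMembership {

    static final class Node {
        final long id;
        Node successor = this;     // a single node is its own successor
        Node predecessor = this;   // and its own predecessor
        Node(long id) { this.id = id; }
    }

    // Stand-in for the lookup that a real node performs through finger tables.
    private final TreeMap<Long, Node> nodes = new TreeMap<>();

    /** New node A finds its immediate successor C and links in between B and C. */
    Node join(long id) {
        if (nodes.containsKey(id)) {
            throw new IllegalArgumentException("identifier already in use: " + id);
        }
        Node a = new Node(id);
        if (!nodes.isEmpty()) {
            Long key = nodes.ceilingKey(id);
            Node c = nodes.get(key != null ? key : nodes.firstKey());
            Node b = c.predecessor;
            a.successor = c;
            a.predecessor = b;
            b.successor = a;
            c.predecessor = a;
        }
        nodes.put(id, a);
        return a;
    }

    /** When A leaves, its predecessor B and successor C link to each other. */
    void leave(Node a) {
        nodes.remove(a.id);
        Node b = a.predecessor;
        Node c = a.successor;
        b.successor = c;
        c.predecessor = b;
    }

    public static void main(String[] args) {
        RingMembership ring = new RingMembership();
        Node n10 = ring.join(10);
        ring.join(30);
        Node n20 = ring.join(20);
        System.out.println(n10.successor.id);  // n20 sits between n10 and n30
        ring.leave(n20);
        System.out.println(n10.successor.id);  // n10 now points past the departed node
    }
}
```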
Updating Finger Table
Each node maintains routing information for 32 other nodes in its finger table. The entries are indexed
as follows: letting n be the identifier of the node in question and i ∈ [1, 32], finger[i] records the
first node that succeeds $(n + 2^{i}) \bmod 2^{32}$. Every node periodically picks one node from the
finger table at random and checks whether it is still online. Moreover, the finger table is updated
whenever a new node joins or an existing node leaves the system. On one hand, suppose a new node A
joins CloudChord and node B is the predecessor of node A. Suppose the finger table of node C contains
node B. Node C can learn that node A has joined by finding that node B has updated its successor. On
the other hand, when node A leaves CloudChord, node B sets A’s successor as the new successor in B’s
finger table, so node C can update its finger table once node B updates its successor. Also, the
newcomer node A can keep asking its successor’s successor recursively until it fills
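The finger-table indexing can be sketched as below. The sketch follows the offsets as stated in this report, (n + 2^i) mod 2^32 for i from 1 to 32; note that the original Chord protocol uses 2^(i-1) instead. It also assumes a locally known sorted set of node identifiers in place of remote lookups, and the name FingerTableSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Builds a 32-entry finger table from a known set of node identifiers. */
final class FingerTableSketch {
    static final int BITS = 32;
    static final long RING_SIZE = 1L << BITS;

    /** Start of the i-th finger, as stated in the report: (n + 2^i) mod 2^32. */
    static long fingerStart(long n, int i) {
        return (n + (1L << i)) % RING_SIZE;
    }

    /** First node identifier >= key, wrapping around the ring. */
    static long successor(TreeSet<Long> nodeIds, long key) {
        Long s = nodeIds.ceiling(key);
        return s != null ? s : nodeIds.first();
    }

    /** Returns finger[1..32]; index 0 is unused so that indices match the text. */
    static long[] build(TreeSet<Long> nodeIds, long n) {
        long[] finger = new long[BITS + 1];
        for (int i = 1; i <= BITS; i++) {
            finger[i] = successor(nodeIds, fingerStart(n, i));
        }
        return finger;
    }

    public static void main(String[] args) {
        TreeSet<Long> ids = new TreeSet<>(Arrays.asList(100L, 1000L, 3_000_000_000L));
        long[] finger = build(ids, 100L);
        for (int i = 1; i <= BITS; i++) {
            System.out.println("finger[" + i + "] = " + finger[i]);
        }
    }
}
```

Because the offsets double with each entry, a lookup that always jumps to the farthest finger not past the key covers at least half of the remaining distance per hop, which is where the O(log N) lookup cost comes from.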
4.2.2 File Manipulation
Property File
Three property files are created for each user for file manipulation in CloudChord. Name.props is used
during the renaming process and records the source file name and its corresponding hashed file name.
Sent_file.props keeps track of each file block and the socket address of the destination node the block
is sent to. Cloud.props records all the blocks stored in the current cloud storage and its corresponding
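As an illustration of how such a key-value property file could be maintained, the sketch below uses the standard java.util.Properties class and keeps the file content in a string instead of on disk. Whether CloudChord itself uses java.util.Properties is an assumption; the class name SentFileRecord and the sample block names and addresses are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Sketch of sent_file.props bookkeeping: block name -> destination socket address. */
final class SentFileRecord {

    private static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Adds or updates one entry and returns the new property-file text. */
    static String record(String existing, String blockName, String destination) {
        Properties props = parse(existing);
        props.setProperty(blockName, destination);
        StringWriter out = new StringWriter();
        try {
            props.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    /** Returns the destination recorded for a block, or null if the block is unknown. */
    static String lookup(String text, String blockName) {
        return parse(text).getProperty(blockName);
    }

    public static void main(String[] args) {
        String text = record("", "enc_ab12.sp0", "127.0.0.1:8002");
        text = record(text, "enc_ab12.sp1", "127.0.0.1:8003");
        System.out.println(lookup(text, "enc_ab12.sp1"));
    }
}
```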
Uploading
When a user requests a file upload, the file is renamed to its SHA-1 hash value, so that the new file
name shares the same identifier space as the node identifiers. For fault tolerance, each file is
replicated before being outsourced to cloud storage, and the duplicate copy is outsourced together with
the original file. The duplicate is used if the original file is lost or corrupted. The name.props file
records the original file name and its corresponding hash name. To ensure user privacy, the original
file is encrypted with AES before being outsourced to other nodes. The encryption key is kept on the
user side. After that, the encrypted file is split into a certain number of equally sized file blocks.
Each file block is outsourced to the successor node of its identifier name. The sent_file.props file
records each split file block and the socket address of its destination successor. After a successor
node receives a file block, it uploads the block to its own cloud storage and records the file
identifier assigned by Google Drive. The cloud.props file keeps track of the key-value pair of each
split file block and its Google Drive file identifier. The flow chart for uploading is shown
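The encrypt-then-split step can be sketched as follows. The report only states that AES is used; the GCM mode, the 12-byte random IV prepended to the ciphertext, and the names UploadPreparation, encrypt, and split are assumptions made for this illustration.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Sketch of preparing a file for upload: AES encryption followed by splitting. */
final class UploadPreparation {
    static final int IV_BYTES = 12;    // assumed IV length for GCM
    static final int TAG_BITS = 128;   // assumed authentication tag length

    /** Encrypts with AES-GCM (mode assumed) and prepends the random IV. */
    static byte[] encrypt(byte[] plain, SecretKey key) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] body = cipher.doFinal(plain);
            byte[] out = new byte[iv.length + body.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(body, 0, out, iv.length, body.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES encryption failed", e);
        }
    }

    /** Splits data into blocks of blockSize bytes; the last block may be shorter. */
    static List<byte[]> split(byte[] data, int blockSize) {
        List<byte[]> blocks = new ArrayList<>();
        for (int off = 0; off < data.length; off += blockSize) {
            blocks.add(Arrays.copyOfRange(data, off, Math.min(off + blockSize, data.length)));
        }
        return blocks;
    }

    public static void main(String[] args) {
        byte[] keyBytes = new byte[16];
        new SecureRandom().nextBytes(keyBytes);
        SecretKey key = new SecretKeySpec(keyBytes, "AES");
        List<byte[]> blocks = split(encrypt(new byte[1024], key), 256);
        System.out.println(blocks.size() + " blocks ready for outsourcing");
    }
}
```

In the flow described above, each block would then be named, hashed to an identifier, and sent to the successor node of that identifier.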
Downloading
When a user requests to download a file, CloudChord first checks whether the given file name belongs to
the current user by checking whether it appears in the sent_file.props file. If it does, CloudChord
sends a “Download Signal” to all of the target socket addresses to request the file blocks. Each target
node calls the Google Drive download API to fetch the requested file block and sends it back to the
requesting node. After collecting all the required file blocks, the requesting node merges them. With
the key stored in the user directory, the node decrypts the merged file to recover the original
plaintext data. Finally, the file is renamed back to its original file name by looking up name.props
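The merge step on the requesting node can be sketched as below. It assumes that each block name ends in “.sp” followed by the block index, which is a hypothetical naming detail; decryption of the merged bytes would follow as a separate step. The class name BlockMerger is also hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of merging received file blocks back into the encrypted file. */
final class BlockMerger {

    /** Parses the index after the last ".sp" in a block name (assumed naming scheme). */
    static int indexOf(String blockName) {
        int pos = blockName.lastIndexOf(".sp");
        if (pos < 0) {
            throw new IllegalArgumentException("not a split block: " + blockName);
        }
        return Integer.parseInt(blockName.substring(pos + 3));
    }

    /** Orders blocks by index and concatenates them; fails if a block is missing. */
    static byte[] merge(Map<String, byte[]> received) {
        TreeMap<Integer, byte[]> ordered = new TreeMap<>();
        for (Map.Entry<String, byte[]> e : received.entrySet()) {
            ordered.put(indexOf(e.getKey()), e.getValue());
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int expected = 0;
        for (Map.Entry<Integer, byte[]> e : ordered.entrySet()) {
            if (e.getKey() != expected) {
                throw new IllegalStateException("missing block " + expected);
            }
            out.write(e.getValue(), 0, e.getValue().length);
            expected++;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, byte[]> blocks = new HashMap<>();
        blocks.put("enc_ab12.sp1", new byte[] {3, 4});
        blocks.put("enc_ab12.sp0", new byte[] {1, 2});
        System.out.println(merge(blocks).length + " bytes merged");
    }
}
```

Since blocks arrive from several nodes in parallel, ordering by index rather than by arrival time keeps the merged file correct.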
Quitting
When a node A leaves the Chord ring, all of the file blocks held by node A are migrated to its
successor node B. On one hand, node A removes all of the blocks from its cloud storage and deletes all
records in its cloud.props file, while the successor B uploads all the migrated blocks to its cloud
storage and updates its cloud.props file. On the other hand, node A needs to inform the owners of the
migrated blocks to update their sent_file.props with the new destination node address of the migrated
blocks. The flow chart for quitting is shown below:
4.2.3 Load Balance for User Storage
To allocate a roughly equal number of uploaded files to every user’s cloud storage, a load balancing
mechanism is implemented as follows: once the total size of the data uploaded to one node’s cloud
storage exceeds a designated load balancing limit, CloudChord chooses a certain percentage of total file
to the node quitting process; the only difference is that a certain percentage of the blocks, rather
than all of them, is chosen to migrate to the successor. The process continues until the total number
of uploaded blocks becomes smaller than the load balance limit. The flow chart for load
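The migration loop can be sketched as follows. Several details are assumptions made for this illustration and are not taken from the report: blocks are represented only by their sizes in bytes, the most recently stored blocks are migrated first, at least one block moves per round, and the loop stops once the stored size no longer exceeds the limit. The name LoadBalancerSketch is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Sketch of migrating a fixed fraction of blocks to the successor until under the limit. */
final class LoadBalancerSketch {

    static long totalBytes(Iterable<Integer> blockSizes) {
        long sum = 0;
        for (int size : blockSizes) {
            sum += size;
        }
        return sum;
    }

    /** Moves blocks from local to successor storage; returns the sizes of migrated blocks. */
    static List<Integer> rebalance(Deque<Integer> local, Deque<Integer> successor,
                                   long limitBytes, double fraction) {
        List<Integer> migrated = new ArrayList<>();
        while (!local.isEmpty() && totalBytes(local) > limitBytes) {
            // One round migrates the given fraction of the current blocks, at least one.
            int batch = Math.max(1, (int) Math.floor(local.size() * fraction));
            for (int i = 0; i < batch; i++) {
                int block = local.removeLast();
                successor.addLast(block);
                migrated.add(block);
            }
        }
        return migrated;
    }

    public static void main(String[] args) {
        Deque<Integer> local = new ArrayDeque<>();
        for (int i = 0; i < 41; i++) {
            local.add(256);                      // 41 blocks of 256 bytes each
        }
        Deque<Integer> successor = new ArrayDeque<>();
        rebalance(local, successor, 6 * 1024, 0.10);
        System.out.println(local.size() + " blocks kept, " + successor.size() + " migrated");
    }
}
```

In the real system each migrated block would also trigger the bookkeeping described for quitting: updating cloud.props on both nodes and the owners’ sent_file.props.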
5. Evaluation
In this section, we report the evaluation of the performance of CloudChord for file
uploading/downloading and file distribution with load balancing.
5.1 Uploading and Downloading
We compare the time of direct uploading/downloading with that of uploading/downloading through a
5-node CloudChord, for files with total sizes of 1 KB, 2 KB, 10 KB, 20 KB, and 40 KB.
Experiment 1
Time: 10/2/18 3:00 PM
Place: campus
Size of split block: 256B
File type: txt
Time (sec) for uploading and downloading with different file sizes
             Uploading            Downloading
File size    Direct     5 users   Direct    5 users
1 KB         5.463      5.212     1.339     1.258
2 KB         11.633     10.173    3.741     3.181
20 KB        115.325    103.767   36.099    31.483
40 KB        211.953    203.874   67.598    66.184
Experiment 2
Time: 10/2/18 8:00 PM
Place: home
Size of split block: 256B
File type: txt
Time (sec) for uploading and downloading with different file sizes
             Uploading            Downloading
File size    Direct     5 users   Direct    5 users
1 KB         5.771      5.778     1.394     1.466
2 KB         12.749     11.979    3.413     3.224
10 KB        57.933     57.347    19.995    18.621
20 KB        117.654    118.564   35.347    33.557
Experiment 3
Time: 10/12/18 3:00 PM
Place: campus
Size of split block: 256B
File type: txt
Time (sec) for uploading and downloading with different file sizes
             Uploading            Downloading
File size    Direct     5 users   Direct    5 users
1 KB         5.283      4.049     1.654     1.326
2 KB         10.554     10.896    3.509     2.976
10 KB        51.1       46.264    18.201    14.765
20 KB        105.087    93.541    38.496    32.764
40 KB        223.347    196.647   64.679    59.675
Experiment 4
Time: 10/12/18 8:00 PM
Place: home
Size of split block: 256B
Time (sec) for uploading and downloading with different file sizes
             Uploading            Downloading
File size    Direct     5 users   Direct    5 users
1 KB         5.842      5.848     1.415     1.49
2 KB         11.351     10.045    3.383     3.214
10 KB        56.302     56.686    18.479    16.05
20 KB        120.741    115.661   36.5      37.463
40 KB        223.792    212.704   66.76     65.55
Experiment 5
Time: 10/16/18 3:00 PM
Place: campus
Size of split block: 256B
File type: txt
Time (sec) for uploading and downloading with different file sizes
             Uploading            Downloading
File size    Direct     5 users   Direct    5 users
1 KB         5.689      4.932     1.564     1.579
2 KB         10.893     10.329    3.876     3.088
10 KB        53.765     48.521    17.976    15.543
20 KB        109.345    101.432   37.987    31.875
Experiment 6
Time: 10/16/18 8:00 PM
Place: home
Size of split block: 256B
File type: txt
Time (sec) for uploading and downloading with different file sizes
             Uploading            Downloading
File size    Direct     5 users   Direct    5 users
1 KB         5.964      5.787     1.904     1.865
2 KB         12.292     12.81     3.21      3.094
10 KB        57.703     52.928    16.774    16.004
20 KB        123.752    114.09    35.693    35.764
Summary of experiments
[Figure: Average time (sec) for uploading on campus with different file sizes (1 KB to 40 KB), direct vs. 5 nodes]
[Figure: Average time (sec) for downloading on campus with different file sizes (1 KB to 40 KB), direct vs. 5 nodes]
[Figure: Average time (sec) for uploading at home with different file sizes (1 KB to 40 KB), direct vs. 5 nodes]
[Figure: Average time (sec) for downloading at home with different file sizes (1 KB to 40 KB), direct vs. 5 nodes]
The results above lead to the following conclusions:
1. Downloading is faster than uploading. A possible reason is that the download bandwidth of a
network connection is usually higher than its upload bandwidth.
2. As the file size increases, uploading and downloading take longer. A possible reason is that the
round-trip time for uploading or downloading a single file block is the limiting step, so the
larger the file is, the more file blocks it has and the longer the transfer takes.
3. The 5-user CloudChord performs slightly better than direct file uploading and downloading,
especially on campus. A possible reason is the trade-off between the number of nodes and the
network traffic: although more nodes in CloudChord generate more network traffic and thus take
more time, multiple nodes also save time by running more uploads and downloads in parallel.
Since we use one local machine with different port numbers to mimic different users, our results
suggest that our system may be suited to environments with many users in a small area, such as
a campus or a neighborhood.
5.2 Load Balancing
To evaluate the performance of load balancing, we upload 100 equally sized files (256 bytes per file)
and count how the files are distributed among the nodes with and without load balancing.
Experiment 1
Time: 10/2/18 4:00 PM
Place: campus
Number of users: 5
Number of Files: 100
Size of Block: 256B
Percentage of migration: 10%
File type: txt
Distribution of 100 files among 5 users
                          User 1   User 2   User 3   User 4   User 5
Without load balancing    41       21       6        15       17
With load balancing       24       23       7        24       22
Experiment 2
Time: 10/12/18 4:00 PM
Place: campus
Number of Nodes: 5
Load Balancing upper bound: 6KB
Number of Files: 100
Size of Block: 256B
Percentage of migration: 10%
Distribution of 100 files among 5 users
                          User 1   User 2   User 3   User 4   User 5
Without load balancing    1        13       30       34       22
With load balancing       17       13       24       24       22
Experiment 3
Time: 10/16/18 4:00 PM
Place: campus
Number of Nodes: 5
Load Balancing upper bound: 6KB
Number of Files: 100
Size of Block: 256B
Percentage of migration: 10%
File type: txt
Distribution of 100 files among 5 nodes
                          Node 1   Node 2   Node 3   Node 4   Node 5
Without load balancing    30       5        28       14       23
Summary of Experiment
[Figure: Average distribution of 100 files among 5 nodes over 3 runs without load balancing (number of file blocks stored per user)]
[Figure: Average distribution of 100 files among 5 nodes over 3 runs with load balancing (number of file blocks stored per user)]
The results above lead to the following conclusions:
1. Although the SHA-1 hash function is intended to distribute file blocks evenly among the users,
without load balancing the file distribution occasionally varies considerably, even for the same
node. A possible reason is that the number of nodes is too small for the successors of the file
blocks to be evenly distributed.
2. From the standard deviations of the two graphs above, we can see that files are distributed
relatively evenly with load balancing, which is consistent with our expectation and that also partially
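The spread of the distributions can be checked with a short calculation on the per-user counts reported for Experiments 1 and 2 above. The report does not say whether it used the population or the sample standard deviation; this sketch assumes the population form, and the class name DistributionSpread is hypothetical.

```java
/** Computes the spread of per-user file counts with and without load balancing. */
final class DistributionSpread {

    /** Population standard deviation (assumed form) of the given counts. */
    static double populationStdDev(int[] counts) {
        double mean = 0;
        for (int c : counts) {
            mean += c;
        }
        mean /= counts.length;
        double sumSquares = 0;
        for (int c : counts) {
            sumSquares += (c - mean) * (c - mean);
        }
        return Math.sqrt(sumSquares / counts.length);
    }

    public static void main(String[] args) {
        int[][] without = {{41, 21, 6, 15, 17}, {1, 13, 30, 34, 22}};  // Experiments 1 and 2
        int[][] with = {{24, 23, 7, 24, 22}, {17, 13, 24, 24, 22}};
        for (int e = 0; e < without.length; e++) {
            System.out.printf("Experiment %d: without = %.2f, with = %.2f%n",
                    e + 1, populationStdDev(without[e]), populationStdDev(with[e]));
        }
    }
}
```

In both experiments the mean is 20 files per user, and the standard deviation is smaller with load balancing than without it.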
6. Conclusions and Future Work
In this project, we studied the problem of protecting each user’s access pattern from the cloud storage
server. We proposed to build a distributed Chord system on the users’ side. Based on the observation
that different users of a cloud storage service exhibit a wide variety of access patterns, mixing the
access patterns of different users lets each individual user’s access pattern be buried among the
collective access patterns of many users. We designed and implemented a P2P system called CloudChord,
which integrates the cloud storage accounts of a set of users into a collective cloud storage service,
provides a platform for each user to store and access file blocks in that collective service, and
mixes the access patterns of the users. Our evaluation results showed the effectiveness of load
balancing for the distribution of file blocks. We also found that CloudChord, with collaboration among
multiple users, performed slightly better than direct file access by each user individually.
Future work includes periodically renaming the file blocks for uploading and periodically choosing at
random one of the duplicated file block copies for downloading. With
References
[1] Google Drive API, https://developers.google.com/drive/v3/web/about-sdk.
[2] E. Stefanov, M. van Dijk, E. Shi, C. Fletcher, L. Ren, X. Yu, and S. Devadas, Path ORAM: An Extremely Simple Oblivious RAM Protocol, CCS (2013).
[3] W. Gasarch, A Survey on Private Information Retrieval, BEATCS Computational Complexity Column, 82 (2004), pp. 72.
[4] "Skype website," http://www.skype.com/.
[5] "The Gnutella protocol specification v0.4," http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf.
[6] "The Gnutella protocol specification v0.6," http://rfcgnutella.sourceforge.net/src/rfc-06-draft.html.
[7] "Kazaa website," http://www.kazaa.com/.
[8] B. Cohen, The BitTorrent protocol specification version 11031.
Appendix A: CloudChord Instructions
Install Gradle
Installation instructions and information for Gradle:
https://gradle.org/install/
In this project, we integrate Gradle into Eclipse.
The Gradle version is 4.6.
Start the Program
1. Import the project as a Gradle project first, and then follow the import instructions to
integrate the project into Eclipse.
2. Navigate to “src/main/java/frontEnd/Main.java”.
3. Right-click Main.java and choose “Run as Java Application”. The project starts and its GUI is
displayed to the user.
Join the CloudChord
In the majority of cases, a user just needs to join CloudChord through one existing node in
CloudChord. Here are the basic steps:
1. Fill in a port number ranging from 1024 to 65535 in the Join field.
2. From the “Plz Select to Join” drop-down menu, it contains the list of socket address for the all
3. After clicking the Join button, the successful join information is displayed on the console, as
shown in the following figure:
In the few cases where the user is the first to start CloudChord, he needs to fill in a port number to initiate and
Appendix B: System Architecture
Adding Account
A user should own a Google Drive account and provide the authorization information. Please follow the
steps below to set up the Google Drive account and add the authorization file to CloudChord.
1. Before using CloudChord, each user should own an enabled Google Drive account. If not, follow
the Google account sign-up instructions to set one up.
2. Open the Google Drive API Console, create or select a project, and click Continue. Then select
Credentials.
3. On the Credentials page with the subtitle “Add credentials to your project”, scroll down to the
bottom and click the Cancel button.
4. On the Credentials page, select the OAuth consent screen tab. Enter the application name if it is
not set automatically, and click the Save button.
5. Select the Create Credentials button on the Credentials tab, and choose OAuth client ID.
6. Choose Other as the application type, enter “CloudChord” or another name in the name field, and
click the Create button.
7. Click OK in the prompt dialog.
8. Click Download JSON to the right of the client ID.
9. Rename the downloaded JSON file to client_secret.json and add it to the user’s home
Node Information
The About button provides basic information about the current node. First, the socket address, IP
address, and port number are listed. Second, the node’s location in the Chord ring is given as a
percentage. Third, information about the predecessor and successor nodes is provided, which is
important for node stabilization and file manipulation.
Here is an example of the node information:
Finger Table
The finger table information can be accessed by clicking the FigureTbl button. There are 32 entries in
the finger table, since CloudChord is based on a 32-bit identifier. Each entry lists its index in the
finger list and the corresponding Chord value. The next three items describe the successor node: the
socket address of the successor node, its hash code, and its location in the CloudChord shown
Here is an example of the finger table:
Property File
Sent_file.props records all the outsourced file blocks and the location of their storage nodes. Each
entry in sent_file.props consists of the name of a file block and the socket address of its destination
node. The names of file blocks use the following notation: cp_ indicates a duplicated copy, enc_
indicates an encoded file, and .sp indicates a split part of the file. The SentInfo button initiates the sent_file.props
Name.props keeps track of each file block and its corresponding hash value, which is used in file
retrieval. The naming notation is consistent with that of sent_file.props, and each name is followed by
its hash value. The NameInfo button displays the name.props information as shown below:
As for cloud.props, Google Drive assigns each uploaded file a Google Drive identifier. Cloud.props
records the hash name of each file block and its corresponding Google Drive identifier file name stored
Uploading File
Uploading is one of the core functions of CloudChord. The algorithm for uploading a file is described
in the report; the main steps are shown below:
1. Click the SelectFile button to open the Choose File window. The path navigates directly to the
user’s home directory created by CloudChord.
2. The selected file name is shown to the right of the SelectFile button. If it is the correct file
to upload, click the Upload button. Otherwise, select the file again.
3. After the Upload button is clicked, the file is encrypted with AES, and the encryption key is
stored in the user’s home directory. The encrypted file is then split into a certain number of
file blocks, and each encrypted file block is outsourced to cloud storage.
4. The uploading information is displayed on the console. The basic information includes the
outsourced file block, the hash name of the file block, and the successor node of the file block.
Besides that, we also keep track of the total uploaded size for the current cloud storage for the
also track the uploading time to show in the console. An example of uploaded information from
the console is shown below:
Downloading File
Downloading is another core function of CloudChord. The algorithm for downloading is described in the
report; the main steps are shown below:
1. The user enters the name of the file to download. If the file was not sent by the current user,
a notice appears to warn of the illegal operation. Otherwise, the user is free to download
2. After the Download button is clicked, the current node retrieves each file block from its
successor node and automatically downloads it into the user’s Download directory. The console
will show information as follows:
3. The file blocks are merged, based on their names, into the encoded file. With the secret key
stored in the home directory, the encoded file is decoded into the plaintext source file.
Load Balancing
Load balancing keeps the files in CloudChord evenly distributed across the cloud storage of all nodes.
Load balancing is initiated by clicking the LoadBal button: once the total size of one node’s storage
exceeds the load balance limit, a certain percentage of the files in its cloud storage is migrated to
its successor node. We
Appendix C: Implementation
In this section, we present the source code for the core steps of this project.
Dependencies
In the build.gradle file, we need to add the Google Drive API.
Downloading
Set predecessor