CloudChord: A P2P Network of Clients Cloud Storage for Data Access Pattern Privacy

Creative Components Iowa State University Capstones, Theses and Dissertations

Fall 2018

Xuan Lu

Iowa State University

Follow this and additional works at: https://lib.dr.iastate.edu/creativecomponents

Part of the OS and Networks Commons, and the Systems Architecture Commons

Recommended Citation

Lu, Xuan, "CloudChord: A P2P Network of Clients Cloud Storage for Data Access Pattern Privacy" (2018). Creative Components. 84.

https://lib.dr.iastate.edu/creativecomponents/84

Creative Component Report

Abstract

Cloud storage has become increasingly popular in recent years. Users can use cloud storage services without investing a large amount of money to set up and maintain their own storage systems. However, cloud storage servers cannot be fully trusted. Although encrypting data provides a partial solution to the problem, a server may still learn information about a user from the user’s access pattern and, further, predict the user’s activities. In this project, we aim to build a distributed system on the users’ side that hides the users’ access patterns from a server. Our design is based on the idea that, by mixing the accesses of a large number of active users, each individual user’s access pattern can be buried among the collective access patterns of all the users. Specifically, we develop a P2P network system called CloudChord that integrates the cloud storage of a set of users. With CloudChord, the file blocks of each individual user can migrate to the cloud storage of other users in the set, so that the user’s access pattern is hidden from the server. CloudChord is designed to run on the users’ premises to protect the users’ data privacy without the involvement of the server. CloudChord also

1. Introduction

Cloud storage services are becoming an increasingly attractive choice for users who need reliable storage infrastructure. Google Drive [1], one of the most popular cloud storage services, provides a large amount of storage at a very competitive price. Despite this price advantage, cloud storage servers cannot always be fully trusted, especially for storing very sensitive data. As a result, user privacy and data security in cloud storage have gained increasing importance.

Although data encryption provides a partial solution to this issue, a user’s access pattern may still be exposed to the server. More specifically, a server may learn information about the data from when and how frequently a user accesses it. Two current provable approaches to hiding a user’s access pattern, Oblivious RAM (ORAM) and Private Information Retrieval (PIR), can address this issue. However, these methods may be too expensive for average users, and they require the server side to take part in hiding the access pattern.

To address the drawbacks of the above approaches, this project aims to hide the user’s access pattern entirely on the user’s side. We consider a scenario with a large set of active users who exhibit a wide variety of data access patterns. By mixing the accesses of the users in the set, each individual user’s access pattern can be buried among the collective access patterns of the users. So that

We develop a P2P system called CloudChord. It includes a scalable P2P overlay network through which a set of users connect with each other. The network is built on Chord, a structured P2P network, and each user’s storage account corresponds to one node in Chord. CloudChord uses the Google Drive API to manage the connection between each user and the associated Google Drive account. In this way, CloudChord provides a platform for each user to store and access its own data in other users’ cloud storage accounts. CloudChord keeps track of all outsourced data for each user via a secure tunnel, so that the data can be accessed efficiently when needed.

CloudChord aims to offer a high level of data security, user connectivity, fairness and randomness of file allocation, and high availability and anonymity for the stored file blocks. Moreover, CloudChord runs on the users’ premises and relies on the users’ permissions, which ensures privacy and data security for the users.

The rest of this report is organized as follows. Section 2 introduces related work on P2P systems and on approaches to hiding access patterns. Section 3 presents the problem statement of the project. Section 4 presents the design and implementation, and Section 5 reports the evaluation results. In Section 6, we conclude this work and point out future research

2. Related Work

We survey related work in this section, covering current approaches to hiding access patterns and distributed peer-to-peer (P2P) systems.

2.1 Existing Approaches to Hiding Access Patterns at the Server Side

Oblivious RAM (ORAM) is a technique for protecting a user’s access pattern privacy from a cloud storage server that is assumed to be semi-honest. ORAM arranges the data blocks so that the server does not know where each file block is stored, and it defines algorithms for the user to obliviously retrieve data and to obliviously evict retrieved data back to the server. For example, Path ORAM hides the access pattern effectively by changing the storage arrangement at the server side after each data retrieval [2]. One drawback of ORAM is that, although the user knows the position map for each file block and directs the oblivious retrieval and eviction of data, the server must take part in the rearrangement process. This usually leads to high communication and storage overheads, which are needed to prevent the server from inferring the user’s access pattern while it is involved.

Private information retrieval (PIR) is an approach that lets a user query a storage server while hiding which data item the user is interested in. It hides each query independently of all previous queries. Information-theoretic PIR replicates the database among multiple servers; in the two-server case, the user sends a query to each of the two servers. After receiving the results from the servers [3], the user XORs the two results to obtain the final query result. The drawbacks of PIR are that we must assume the servers do not collude with each other, and that the servers usually need to scan a large portion of their data storage to answer a query.
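
To make the two-server XOR construction concrete, the following Java sketch implements the textbook scheme over a toy database of single-byte records. The class and method names are illustrative only, and the sketch assumes, as information-theoretic PIR does, that the two replicated servers do not collude. Each server sees a uniformly random subset of indices, and the two subsets differ only at the requested index i, so XORing the two answers leaves exactly record i.

```java
import java.security.SecureRandom;

/** Toy two-server XOR PIR (illustrative names). Both servers hold identical replicas and do not collude. */
final class TwoServerXorPir {

    /** Server side: XOR together every record whose index is selected. */
    static byte answer(byte[] database, boolean[] selection) {
        byte acc = 0;
        for (int j = 0; j < database.length; j++) {
            if (selection[j]) {
                acc ^= database[j];
            }
        }
        return acc;
    }

    /** Client side: fetch record i; each server alone sees only a uniformly random subset. */
    static byte retrieve(byte[] server1Db, byte[] server2Db, int i, SecureRandom rng) {
        int n = server1Db.length;
        boolean[] query1 = new boolean[n];
        for (int j = 0; j < n; j++) {
            query1[j] = rng.nextBoolean(); // random subset sent to server 1
        }
        boolean[] query2 = query1.clone();
        query2[i] = !query2[i]; // same subset with index i flipped, sent to server 2
        byte r1 = answer(server1Db, query1);
        byte r2 = answer(server2Db, query2);
        return (byte) (r1 ^ r2); // every record except i appears in both answers and cancels out
    }

    public static void main(String[] args) {
        byte[] db = {10, 20, 30, 40, 50};
        System.out.println(retrieve(db, db.clone(), 3, new SecureRandom())); // prints 40
    }
}
```

Each server still XORs over roughly half of its records for every query, which illustrates the scanning cost mentioned above.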

2.2 P2P Networks

One drawback of the client-server model is that it does not scale, since the server is the bottleneck. In a P2P system, by contrast, the available resources grow as the number of participants grows. Each participant, called a peer, takes the role of both client and server. P2P systems are mainly used to share computer resources through direct exchange between peers. Examples of P2P systems include Skype [4], Gnutella [5, 6], Kazaa [7], and BitTorrent [8].

2.3 Chord System

Chord forms a structured P2P overlay network, since it is constructed using a deterministic procedure [9]. Nodes are logically placed on a ring. Each node is identified by a hash value derived from its IP address, and each data block is also identified by a hash value derived from the name of the file it belongs to and its offset in the file. Node identifiers and data block identifiers (i.e., keys) share the same identifier space. Each data block is mapped to a node: a block with key k is stored on the node identified by successor(k). If a node with identifier k exists, successor(k) is that node; otherwise, successor(k) is the first node that follows k on the ring.

lookup. Each node maintains a list of routing information called a finger table. With the number of N
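
The successor(k) rule described above can be illustrated with a small Java sketch that keeps node identifiers in a sorted map; the class name ChordRing and the example identifiers and addresses are hypothetical. A key is assigned to the first node identifier equal to or greater than it, wrapping around to the smallest identifier when the key lies past the last node. A real Chord node does not see the whole ring and instead resolves successor(k) by routing through finger tables.

```java
import java.util.TreeMap;

/** Sketch of Chord key-to-node assignment on a 32-bit identifier ring (hypothetical names). */
final class ChordRing {
    // Node identifiers (unsigned 32-bit values held in a long) mapped to node socket addresses.
    private final TreeMap<Long, String> nodes = new TreeMap<>();

    void addNode(long id, String address) {
        nodes.put(id, address);
    }

    /** successor(k): the first node whose identifier is equal to or follows k, wrapping around. */
    long successor(long key) {
        if (nodes.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Long id = nodes.ceilingKey(key);
        return (id != null) ? id : nodes.firstKey(); // past the largest identifier: wrap to the smallest
    }

    String nodeFor(long key) {
        return nodes.get(successor(key));
    }

    public static void main(String[] args) {
        ChordRing ring = new ChordRing();
        ring.addNode(100L, "127.0.0.1:5001");
        ring.addNode(2_000_000_000L, "127.0.0.1:5002");
        ring.addNode(4_000_000_000L, "127.0.0.1:5003");
        System.out.println(ring.nodeFor(100L));           // exact match: 127.0.0.1:5001
        System.out.println(ring.nodeFor(150L));           // 127.0.0.1:5002
        System.out.println(ring.nodeFor(4_100_000_000L)); // wraps around: 127.0.0.1:5001
    }
}
```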

3. Problem Statement

We present the system model and design goals of this project in this section.

3.1 System Model and Assumptions

The system model is as follows. At least one user is assumed to exist in the system in order to maintain it; ideally, there is a large number of active users. Each user has one associated Google Drive account, which allows the user to store data on the Google Drive server. Users are distinguished from one another by their local socket addresses, i.e., IP address and port number. The storage cloud of our system depends on Google Drive. At the time of writing, the free storage limit of Google Drive is 15 GB, so the CloudChord storage limit for each user is also 15 GB.

3.2 Design Goals

Our project aims to build distributed, structured P2P software that connects a large number of users who own Google Drive cloud storage. For each user, the system runs as a software agent within a decentralized infrastructure that connects not only the user’s own Google Drive but also other users’ Google Drive storage. The design goals of this project are as follows:

Security: The data content should be encrypted on the user side before being exported to the system, and the

Connectivity: All users should be connected with each other. The logical structure of the system is a Chord ring formed by a large number of nodes. Every node in the system keeps track of its predecessor node and successor node.

Fairness: Through load balancing, the amount of data stored in each user’s cloud storage should be at a similar level across users.

High Availability: Given the file name, the owner of a file can efficiently retrieve it from the system, since a finger-table LOOKUP takes O(log N) time. Each user’s file should be replicated, so that even if one of the nodes holding the blocks of the source file crashes, the user can still recover the source file from the duplicated file blocks.

Anonymity: A user should not know which user a data block stored in his cloud storage belongs to. Even if a data block belongs to the user himself, he should not be able to tell until he retrieves the file. This makes it difficult for users to discover each other’s access patterns.

Low data access latency: Files are usually uploaded and downloaded among multiple users in parallel so

4. Design and Implementation

In this section, we present the design and implementation of the CloudChord system, covering both the front-end and the back-end.

4.1 Front-end

The front-end not only provides a basic GUI for interacting with the user but also offers some basic functions once the user joins CloudChord. First, the GUI provides menus, text fields, buttons, and pop-up windows for users to send requests and receive the resulting messages. Users can easily check their finger tables, local cloud storage file lists, the destination nodes storing their uploaded file blocks, and so on. In addition, the uploading, downloading, quitting, and load balancing functions can be initiated from the front-end, which makes the system easy to use. Second, the front-end subsystem creates the basic directory structure and property files for each user. Whenever a node starts or joins the system, it creates a home directory and a download directory. Three property files, sent_file.props, cloud.props, and name.props, are also created for each user; they are required for file manipulation. Third, the console in the front-end provides feedback on the user’s requests (such as basic node information), on file manipulation, on illegal operations, and so on. The GUI figure of the

4.2 Back-end

4.2.1 Structure of Chord in CloudChord

The back-end provides all the core functions of CloudChord; its overall structure is shown in the figure below. Each user keeps three property files locally, which are required for file manipulation. CloudChord connects each user with the user’s Google Drive account. File blocks are stored in the users’ cloud storage, and the metadata for each file block is also stored in cloud storage. On uploading, the local node splits the file of interest into blocks and sends each file block to the corresponding cloud storage for

blocks send back their file blocks from the cloud storage, and the requesting node then merges all the blocks to obtain the original file.

Naming

In CloudChord, every node is uniquely identified by its socket address, which consists of the localhost IP address and a user-specified port number. Each CloudChord file identifier is based on the hash value of the original file name. We apply the cryptographic hash function SHA-1, as in consistent hashing, to obtain each node’s identifier by hashing its socket address. A customized compression function then produces a 32-bit identifier for each node from the message digest computed by SHA-1. Since the identifier length for

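The report does not specify the customized compression function, so the following Java sketch substitutes a hypothetical one: it XOR-folds the 20-byte SHA-1 digest into 4 bytes to obtain a 32-bit identifier. The same function can hash either a socket address (for a node) or an original file name (for a file). The ringPercentage helper shows one plausible way to express a position on the ring as a percentage, as the About button does; the exact formula CloudChord uses is an assumption here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Sketch: derive a 32-bit ring identifier from a string via SHA-1 (compression step is hypothetical). */
final class ChordId {
    static final long RING_SIZE = 1L << 32;

    /** Hypothetical compression: XOR-fold the 20-byte SHA-1 digest into 4 bytes. */
    static long id(String name) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-1").digest(name.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 unavailable", e); // Java platforms are required to provide SHA-1
        }
        byte[] folded = new byte[4];
        for (int i = 0; i < digest.length; i++) {
            folded[i % 4] ^= digest[i];
        }
        long value = 0;
        for (byte b : folded) {
            value = (value << 8) | (b & 0xFF); // big-endian, kept unsigned in a long
        }
        return value;
    }

    /** Assumed mapping of an identifier to its position on the ring, in percent. */
    static double ringPercentage(long id) {
        return 100.0 * id / RING_SIZE;
    }

    public static void main(String[] args) {
        long nodeId = id("127.0.0.1:5001"); // node: hash of its socket address
        long fileId = id("report.txt");     // file: hash of its original name (placeholder name)
        System.out.printf("node %d at %.2f%%, file %d%n", nodeId, ringPercentage(nodeId), fileId);
    }
}
```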

Updating Successor

For each node in CloudChord, it is critical to stay connected with its successor and predecessor. Each node is required to periodically check the online status of its successor. In this way, our system can dynamically detect two scenarios: a new node joining CloudChord and an existing node leaving the system. The flow chart below shows how the stabilizer works.

When a new node A joins the system, it needs to contact an existing node B. Based on node A’s CloudChord identifier, node B first looks up A’s successor node C in its finger table. If node C is not the immediate successor of A, other nodes are contacted until node A finds its immediate successor node C. Node A then connects with node C and sets node C as its successor. Once they are connected, node C disconnects from its previous predecessor and sets node

When node A leaves the system, the departure is also detected. Every node continuously checks the active status of its successor, so node A’s predecessor B detects the disconnection of node A. Node B then looks up A’s successor node C by checking its finger table. Once they are connected, node B sets node C as its new successor, and at the same time node C sets node B as its predecessor.
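
The join and leave procedures can be simulated in memory. In this Java sketch, object references stand in for network connections, a new node finds its successor by walking successor pointers instead of routing through finger tables, and the fingers list is filled in by hand; all class and method names are hypothetical. On leave, the predecessor’s periodic check notices that its successor is gone and adopts the nearest live entry in its finger list.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory simulation of successor/predecessor maintenance (hypothetical names). */
final class RingNode {
    final long id;
    RingNode successor = this;   // a lone node is its own successor
    RingNode predecessor = this; // and its own predecessor
    boolean alive = true;
    final List<RingNode> fingers = new ArrayList<>(); // nearest first; filled in by hand in this sketch

    RingNode(long id) {
        this.id = id;
    }

    /** True if x lies in the ring interval (from, to]; from == to covers the whole ring. */
    static boolean inHalfOpen(long x, long from, long to) {
        if (from < to) {
            return x > from && x <= to;
        }
        return x > from || x <= to; // interval wraps past zero
    }

    /** Walk successor pointers until the node responsible for key is found. */
    RingNode findSuccessor(long key) {
        RingNode n = this;
        while (!inHalfOpen(key, n.id, n.successor.id)) {
            n = n.successor;
        }
        return n.successor;
    }

    /** Join through an existing node: locate the successor C, then splice in before it. */
    void join(RingNode bootstrap) {
        RingNode c = bootstrap.findSuccessor(id);
        successor = c;
        predecessor = c.predecessor;
        c.predecessor = this;         // C drops its old predecessor and adopts the newcomer
        predecessor.successor = this; // the old predecessor now points at the newcomer
    }

    /** Periodic check: if the successor has left, adopt the nearest live finger. */
    void checkSuccessor() {
        if (successor.alive) {
            return;
        }
        for (RingNode f : fingers) {
            if (f.alive && f != this) {
                successor = f;
                f.predecessor = this; // the new successor records this node as its predecessor
                return;
            }
        }
        successor = this; // no live node known: this node is alone on the ring
        predecessor = this;
    }

    public static void main(String[] args) {
        RingNode n10 = new RingNode(10), n20 = new RingNode(20);
        RingNode n30 = new RingNode(30), n40 = new RingNode(40);
        n20.join(n10);
        n30.join(n10);
        n40.join(n20);
        System.out.println(n10.successor.id + " " + n20.successor.id + " " + n30.successor.id); // 20 30 40

        n20.fingers.add(n30);
        n20.fingers.add(n40);
        n30.alive = false;    // node 30 leaves
        n20.checkSuccessor(); // node 20 detects the departure
        System.out.println(n20.successor.id + " " + n40.predecessor.id); // 40 20
    }
}
```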

Updating Finger Table

Each node maintains routing information for 32 other nodes in its finger table. The entries of the finger table are defined as follows: letting $n$ be the identifier of the node in question and $i \in [1, 32]$, finger[i] records the first node that succeeds $(n + 2^{i}) \bmod 2^{32}$. Every node periodically picks one of the nodes in its finger table at random and checks whether it is still online. Moreover, the finger table is updated whenever a new node joins or an existing node leaves the system. On the one hand, suppose a new node A joins CloudChord, node B is the predecessor of node A, and the finger table of node C contains node B. Node C can detect that node A has joined by noticing that node B has updated its successor. On the other hand, when node A leaves CloudChord, node B sets A’s successor as the new successor in B’s finger table, so that node C can update its finger table once node B updates its successor. In addition, a newcomer node A can keep asking its successor’s successor recursively until it fills

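The finger-table definition above can be sketched in Java from a global set of node identifiers; the class name FingerTable and the example identifiers are hypothetical. This sketch indexes the 32 entries from 0 to 31 (an assumption; the text uses $i \in [1, 32]$), so the offsets $2^{i}$ run from 1 to $2^{31}$ and entry 0 is the node’s immediate successor. A real node would instead fill the table by querying other nodes, as described above.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Sketch: compute a 32-entry finger table from a global view of node identifiers (hypothetical). */
final class FingerTable {
    static final long RING_SIZE = 1L << 32;

    /** First node identifier equal to or following key on the ring. */
    static long successor(TreeSet<Long> nodeIds, long key) {
        Long id = nodeIds.ceiling(key);
        return (id != null) ? id : nodeIds.first(); // wrap around past 2^32 - 1
    }

    /** finger[i] = successor((n + 2^i) mod 2^32) for i = 0..31 (0-based indexing assumed). */
    static long[] build(long n, TreeSet<Long> nodeIds) {
        long[] finger = new long[32];
        for (int i = 0; i < 32; i++) {
            long start = (n + (1L << i)) % RING_SIZE;
            finger[i] = successor(nodeIds, start);
        }
        return finger;
    }

    public static void main(String[] args) {
        TreeSet<Long> ids = new TreeSet<>(Arrays.asList(10L, 1000L, 1L << 20, 3_000_000_000L));
        long[] finger = build(10L, ids);
        for (int i = 0; i < finger.length; i++) {
            System.out.println("finger[" + i + "] = " + finger[i]);
        }
    }
}
```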

4.2.2 File Manipulation

Property Files

Three property files are created for each user for file manipulation in CloudChord. Name.props is used during renaming; it records each source file name and its corresponding hash file name. Sent_file.props keeps track of each file block and the destination socket address the block is sent to. Cloud.props records all the blocks stored in the current cloud storage and its corresponding

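The three property files map naturally onto java.util.Properties. The following Java sketch keeps them in memory and round-trips one of them through the standard properties text format; the class name PropsSketch and all file names, block names, and identifiers in it are placeholders, and whether CloudChord itself uses java.util.Properties is an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Sketch of the three per-user property files, serialized in memory instead of on disk. */
final class PropsSketch {
    static String save(Properties props, String comment) {
        StringWriter out = new StringWriter();
        try {
            props.store(out, comment);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected when writing to a StringWriter
        }
        return out.toString();
    }

    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties name = new Properties();  // name.props: source file name -> hash file name
        Properties sent = new Properties();  // sent_file.props: file block -> destination socket address
        Properties cloud = new Properties(); // cloud.props: stored block -> Google Drive file identifier

        // All names and identifiers below are placeholders.
        name.setProperty("report.txt", "123456789");
        sent.setProperty("enc_123456789.sp0", "127.0.0.1:5002");
        cloud.setProperty("enc_123456789.sp0", "placeholder-drive-id");

        Properties reloaded = load(save(sent, "sent_file.props"));
        System.out.println(reloaded.getProperty("enc_123456789.sp0")); // 127.0.0.1:5002
    }
}
```

Properties.store escapes the colon in a socket address, and Properties.load restores it, so addresses survive the round trip unchanged.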

Uploading

When a user requests a file upload, the file is renamed after its SHA-1 hash value, so that the new file name shares the same identifier space as the node identifiers. For fault tolerance, each file is replicated before being outsourced to cloud storage, and the duplicate copy is outsourced together with the original file. The duplicate is used if the original file is lost or corrupted. The name.props file records the original file name and its corresponding hash name. To ensure user privacy, the original file is encrypted with AES before being outsourced to other nodes, and the encryption key is kept on the user side. The encrypted file is then split into a certain number of equal-size file blocks. Based on its identifier, each file block is outsourced to its successor node. The sent_file.props file records each split file block and its destination successor socket address. After a successor node receives a file block, it uploads the block to its own cloud storage service and records the file identifier that Google Drive assigns. The cloud.props file keeps track of the key-value pair of each split file block and its Google Drive file identifier. The flow chart for uploading is shown

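The report does not state which AES mode, key size, or block-naming scheme CloudChord uses. The following Java sketch therefore assumes AES-GCM with a 128-bit key and a random 12-byte IV prepended to the ciphertext, and it splits the result into fixed-size blocks of 256 bytes, the block size used in the experiments in Section 5. The class name UploadSketch is hypothetical, and hashing block names, replication, and network transfer are omitted.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Sketch of the encrypt-then-split step of uploading (cipher mode and sizes are assumptions). */
final class UploadSketch {
    static final int BLOCK_SIZE = 256; // block size used in the Section 5 experiments
    static final int IV_LEN = 12;      // standard GCM nonce length

    /** Encrypt with AES-GCM (assumed mode) and prepend the random IV to the ciphertext. */
    static byte[] encrypt(SecretKey key, byte[] plain) throws GeneralSecurityException {
        byte[] iv = new byte[IV_LEN];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ct = cipher.doFinal(plain); // ciphertext followed by a 16-byte authentication tag
        byte[] out = new byte[IV_LEN + ct.length];
        System.arraycopy(iv, 0, out, 0, IV_LEN);
        System.arraycopy(ct, 0, out, IV_LEN, ct.length);
        return out;
    }

    /** Split into equal-size blocks; only the last block may be shorter. */
    static List<byte[]> split(byte[] data, int blockSize) {
        List<byte[]> blocks = new ArrayList<>();
        for (int off = 0; off < data.length; off += blockSize) {
            blocks.add(Arrays.copyOfRange(data, off, Math.min(data.length, off + blockSize)));
        }
        return blocks;
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        SecretKey key = gen.generateKey(); // kept on the user side only
        byte[] encrypted = encrypt(key, new byte[1024]); // a 1 KB file
        List<byte[]> blocks = split(encrypted, BLOCK_SIZE);
        System.out.println(encrypted.length + " bytes -> " + blocks.size() + " blocks"); // 1052 bytes -> 5 blocks
    }
}
```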

Downloading

When a user requests to download a file, CloudChord first checks whether the current user owns the requested file by checking whether the file name appears in the sent_file.props file. If it does, CloudChord sends a “Download Signal” to all of the target socket addresses to request the file blocks. Each target node calls the Google Drive download API to download the requested file block and sends it back to the requesting node. After collecting all the required file blocks, the requesting node merges them. With the key stored in the user directory, the node decrypts the merged file to get the original plaintext. Finally, the file is renamed back to its original file name by looking up name.props

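The parallel retrieval and ordered merge can be sketched in Java with CompletableFuture. In this sketch, hypothetical in-memory maps stand in for sent_file.props and for each holder’s cloud storage, block indices stand in for block names, and the final decryption and renaming steps are omitted; the class name DownloadSketch is illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Sketch: fetch all blocks of one file in parallel, then merge them in index order. */
final class DownloadSketch {
    /**
     * sentFile: block index -> holder address (stand-in for sent_file.props).
     * storage: holder address -> (block index -> block bytes) (stand-in for each holder's cloud storage).
     */
    static byte[] download(Map<Integer, String> sentFile, Map<String, Map<Integer, byte[]>> storage) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            Map<Integer, CompletableFuture<byte[]>> pending = new TreeMap<>();
            for (Map.Entry<Integer, String> e : sentFile.entrySet()) {
                int index = e.getKey();
                String holder = e.getValue();
                // Stand-in for sending a "Download Signal" to the holder and receiving the block back.
                pending.put(index, CompletableFuture.supplyAsync(() -> storage.get(holder).get(index), pool));
            }
            ByteArrayOutputStream merged = new ByteArrayOutputStream();
            for (CompletableFuture<byte[]> f : pending.values()) { // TreeMap iterates in index order
                byte[] block = f.join();
                merged.write(block, 0, block.length);
            }
            return merged.toByteArray();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, byte[]>> storage = new HashMap<>();
        storage.put("127.0.0.1:5002", new HashMap<>());
        storage.put("127.0.0.1:5003", new HashMap<>());
        storage.get("127.0.0.1:5002").put(0, "Hel".getBytes(StandardCharsets.UTF_8));
        storage.get("127.0.0.1:5003").put(1, "lo".getBytes(StandardCharsets.UTF_8));
        Map<Integer, String> sentFile = new HashMap<>();
        sentFile.put(1, "127.0.0.1:5003");
        sentFile.put(0, "127.0.0.1:5002");
        System.out.println(new String(download(sentFile, storage), StandardCharsets.UTF_8)); // Hello
    }
}
```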

Quitting

When a node A leaves the Chord ring, all of the file blocks on node A are migrated to its successor node B. On the one hand, node A removes all of the blocks from its cloud storage and deletes all records in its cloud.props file, while the successor B uploads all of the migrated blocks to its cloud storage and updates its cloud.props file. On the other hand, node A needs to inform the owners of the migrated blocks so that they update their sent_file.props files with the new destination node address for those blocks. The flow chart for quitting is shown below:
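
The quitting procedure amounts to moving every stored-block record from the leaving node to its successor and rewriting the owners’ sent_file.props entries. In this Java sketch, hypothetical in-memory maps replace the property files and Google Drive, and each stored block is tagged with its owner’s address; how CloudChord identifies the owners to notify is not described here, so that tagging is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of a graceful leave: hand every stored block to the successor and tell each owner. */
final class QuitSketch {
    static final class Node {
        final String address;
        final Map<String, String> cloud = new HashMap<>();    // stored block -> owner address (assumed tagging)
        final Map<String, String> sentFile = new HashMap<>(); // own block -> holder address (sent_file.props)

        Node(String address) {
            this.address = address;
        }
    }

    static void quit(Node leaving, Node successor, Map<String, Node> byAddress) {
        for (Map.Entry<String, String> e : leaving.cloud.entrySet()) {
            String block = e.getKey();
            String owner = e.getValue();
            successor.cloud.put(block, owner);                           // successor uploads the block
            byAddress.get(owner).sentFile.put(block, successor.address); // owner records the new holder
        }
        leaving.cloud.clear(); // leaving node deletes its blocks and its cloud.props records
    }

    public static void main(String[] args) {
        Node a = new Node("127.0.0.1:5001");
        Node b = new Node("127.0.0.1:5002");
        Node owner = new Node("127.0.0.1:5003");
        Map<String, Node> byAddress = new HashMap<>();
        for (Node n : new Node[] {a, b, owner}) {
            byAddress.put(n.address, n);
        }
        owner.sentFile.put("enc_x.sp0", a.address); // owner's record: block held by A
        a.cloud.put("enc_x.sp0", owner.address);    // A stores the block for the owner
        quit(a, b, byAddress);
        System.out.println(owner.sentFile.get("enc_x.sp0")); // 127.0.0.1:5002
    }
}
```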

4.2.3 Load Balance for User Storage

To allocate a roughly equal number of uploaded files to every user’s cloud storage, a load balancing mechanism is implemented as follows: once the total size of the data uploaded to one node’s cloud storage exceeds a designated load balancing limit, CloudChord chooses a certain percentage of the total file

to the node quitting process; the only difference is that a certain percentage of the total blocks, rather than all of the blocks, is chosen to migrate to the successor. The process continues until the total number of uploaded blocks falls below the load balancing limit. The flow chart for load

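The following Java sketch captures the migration loop described above under stated assumptions: block sizes are tracked in bytes, each round moves the ceiling of the given percentage of the node’s current blocks (at least one block), and blocks are taken from the front of a queue, since the report does not say which blocks are chosen. The class name LoadBalanceSketch is hypothetical. The 6 KB limit and 10% migration rate in the usage example are settings listed for the experiments in Section 5.2; the sketch is not claimed to reproduce those results.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Sketch: migrate a fixed percentage of blocks to the successor until the node is under its limit. */
final class LoadBalanceSketch {
    static long total(Deque<Integer> blockSizes) {
        long sum = 0;
        for (int size : blockSizes) {
            sum += size;
        }
        return sum;
    }

    /** Returns the number of blocks moved from local to successor. */
    static int rebalance(Deque<Integer> local, Deque<Integer> successor, long limitBytes, int percent) {
        int moved = 0;
        while (total(local) > limitBytes && !local.isEmpty()) {
            // ceil(percent% of the current block count), at least one block per round
            int batch = Math.max(1, (local.size() * percent + 99) / 100);
            for (int i = 0; i < batch && !local.isEmpty(); i++) {
                successor.addLast(local.pollFirst()); // which blocks are chosen is an assumption
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        Deque<Integer> local = new ArrayDeque<>();
        Deque<Integer> successor = new ArrayDeque<>();
        for (int i = 0; i < 41; i++) {
            local.addLast(256); // 41 blocks of 256 bytes
        }
        int moved = rebalance(local, successor, 6 * 1024, 10); // 6 KB limit, 10% per round
        System.out.println(moved + " moved, " + local.size() + " kept"); // 19 moved, 22 kept
    }
}
```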

5. Evaluation

In this section, we evaluate the performance of CloudChord for file uploading and downloading and for file distribution with load balancing.

5.1 Uploading and Downloading

We compare the time for direct uploading and downloading with the time for uploading and downloading through a 5-node CloudChord, for files with total sizes of 1 KB, 2 KB, 10 KB, 20 KB, and 40 KB.

Experiment 1

Time: 10/2/18 3:00 PM

Place: campus

Size of split block: 256B

File type: txt

Time (sec) for uploading and downloading files of different sizes

Size     Upload (direct)   Upload (5 users)   Download (direct)   Download (5 users)
1 KB     5.463             5.212              1.339               1.258
2 KB     11.633            10.173             3.741               3.181
20 KB    115.325           103.767            36.099              31.483
40 KB    211.953           203.874            67.598              66.184

Experiment 2

Time: 10/2/18 8:00 PM

Place: home

File type: txt

Size     Upload (direct)   Upload (5 users)   Download (direct)   Download (5 users)
1 KB     5.771             5.778              1.394               1.466
2 KB     12.749            11.979             3.413               3.224
10 KB    57.933            57.347             19.995              18.621
20 KB    117.654           118.564            35.347              33.557

Experiment 3

Time: 10/12/18 3:00 PM

Place: campus

File type: txt

Size     Upload (direct)   Upload (5 users)   Download (direct)   Download (5 users)
1 KB     5.283             4.049              1.654               1.326
2 KB     10.554            10.896             3.509               2.976
10 KB    51.1              46.264             18.201              14.765
20 KB    105.087           93.541             38.496              32.764
40 KB    223.347           196.647            64.679              59.675

Experiment 4

Time: 10/12/18 8:00 PM

Place: home

Time (sec) for uploading and downloading files of different sizes

Size     Upload (direct)   Upload (5 users)   Download (direct)   Download (5 users)
1 KB     5.842             5.848              1.415               1.49
2 KB     11.351            10.045             3.383               3.214
10 KB    56.302            56.686             18.479              16.05
20 KB    120.741           115.661            36.5                37.463
40 KB    223.792           212.704            66.76               65.55

Experiment 5

Time: 10/16/18 3:00 PM

Place: campus

File type: txt

Size     Upload (direct)   Upload (5 users)   Download (direct)   Download (5 users)
1 KB     5.689             4.932              1.564               1.579
2 KB     10.893            10.329             3.876               3.088
10 KB    53.765            48.521             17.976              15.543
20 KB    109.345           101.432            37.987              31.875

Experiment 6

Time: 10/16/18 8:00 PM

Place: home

File type: txt

Size     Upload (direct)   Upload (5 users)   Download (direct)   Download (5 users)
1 KB     5.964             5.787              1.904               1.865
2 KB     12.292            12.81              3.21                3.094
10 KB    57.703            52.928             16.774              16.004
20 KB    123.752           114.09             35.693              35.764

Summary of experiments

Figure: Average time (sec) for uploading on campus with different file sizes (1 KB to 40 KB; direct vs. 5 nodes)

Figure: Average time (sec) for downloading on campus with different file sizes (1 KB to 40 KB; direct vs. 5 nodes)

Figure: Average time (sec) for uploading at home with different file sizes (1 KB to 40 KB; direct vs. 5 nodes)

Figure: Average time (sec) for downloading at home with different file sizes (1 KB to 40 KB; direct vs. 5 nodes)

The results above indicate the following conclusions:

1. Downloading is faster than uploading. A likely reason is that the network’s download speed is usually higher than its upload speed.

2. As the file size increases, uploading and downloading take longer, roughly in proportion to the file size. A likely reason is that the round-trip time for uploading or downloading a single file block is the limiting step: the larger the file, the more file blocks it has, and hence the longer the total time.

3. The 5-user CloudChord performs slightly better than direct uploading and downloading, especially on campus. A possible explanation is a trade-off between the number of nodes and network traffic: although more nodes in CloudChord generate more network traffic, which takes longer, multiple nodes also save time by performing more uploads and downloads in parallel. Since we used a local machine with different port numbers to mimic different users, our results suggest that the system may suit environments with many users in a small area, such as a campus or a neighborhood.

5.2 Load Balancing

To evaluate load balancing, we counted how 100 equally sized files (256 bytes each) were distributed among the users with and without load balancing.

Experiment 1

Time: 10/2/18 4:00 PM

Place: campus

Number of users: 5

Number of Files: 100

Size of Block: 256B

Percentage of migration: 10%

File type: txt

Distribution of 100 files among 5 users

Setting                 User 1   User 2   User 3   User 4   User 5
Without load balancing  41       21       6        15       17
With load balancing     24       23       7        24       22

Experiment 2

Time: 10/12/18 4:00 PM

Place: campus

Number of Nodes: 5

Load Balancing upper bound: 6KB

Number of Files: 100

Size of Block: 256B

Distribution of 100 files among 5 users

Setting                 User 1   User 2   User 3   User 4   User 5
With load balancing     17       13       24       24       22

Experiment 3

Time: 10/16/18 4:00 PM

Place: campus

Number of Nodes: 5

Load Balancing upper bound: 6KB

Number of Files: 100

Size of Block: 256B

File type: txt

Distribution of 100 files among 5 nodes

Setting                 Node 1   Node 2   Node 3   Node 4   Node 5

Summary of Experiment

Figure: Average distribution of 100 files among 5 nodes over 3 runs without load balancing (number of file blocks stored per user)

Figure: Average distribution of 100 files among 5 nodes over 3 runs with load balancing (number of file blocks stored per user)

The results above indicate the following conclusions:

1. Although SHA-1 hashing is intended to distribute file blocks evenly among the users, without load balancing the file distribution sometimes varies considerably, even for the same node. A possible reason is that there are still too few nodes for the successors of the file blocks to be evenly distributed.

2. The standard deviations in the two graphs above show that files are distributed relatively evenly with load balancing, which is consistent with our expectation and also partially

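As a worked check of the standard-deviation comparison, the following Java sketch computes the mean and the population standard deviation of the Experiment 1 counts listed above. Whether the report used the population or the sample formula is not stated, so the population formula is an assumption, and the class name DistributionStats is illustrative. For these counts, the mean is 20 files per user in both settings, and the standard deviation drops from about 11.59 files without load balancing to about 6.54 with it.

```java
/** Mean and population standard deviation of per-user file counts. */
final class DistributionStats {
    static double mean(int[] counts) {
        double sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum / counts.length;
    }

    static double populationStdDev(int[] counts) {
        double m = mean(counts);
        double squares = 0;
        for (int c : counts) {
            squares += (c - m) * (c - m);
        }
        return Math.sqrt(squares / counts.length);
    }

    public static void main(String[] args) {
        int[] unbalanced = {41, 21, 6, 15, 17}; // Experiment 1, without load balancing
        int[] balanced = {24, 23, 7, 24, 22};   // Experiment 1, with load balancing
        System.out.printf("without: %.2f, with: %.2f%n",
                populationStdDev(unbalanced), populationStdDev(balanced)); // without: 11.59, with: 6.54
    }
}
```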

6. Conclusions and Future Work

In this project, we studied the problem of protecting each user’s access pattern from the cloud storage server. We proposed building a distributed Chord system on the users’ side. Based on the observation that different users of a cloud storage service exhibit a wide variety of access patterns, mixing the access patterns of different users allows each individual user’s access pattern to be buried among the collective access patterns of many users. We designed and implemented a P2P system called CloudChord, which integrates the cloud storage accounts of a set of users into a collective cloud storage service, provides a platform for each user to store and access file blocks in that collective service, and mixes the users’ access patterns. Our evaluation results showed the effectiveness of load balancing for the distribution of file blocks. We also found that CloudChord with multiple collaborating users performs slightly better than direct file access by each user individually.

Future work includes periodically renaming file blocks on upload and periodically choosing one of the duplicated file block copies at random on download. With

References

[1] Google Drive API, https://developers.google.com/drive/v3/web/about-sdk.

[2] E. Stefanov, M. van Dijk, E. Shi, C. Fletcher, L. Ren, X. Yu, and S. Devadas, Path ORAM: An Extremely Simple Oblivious RAM Protocol, ACM CCS (2013).

[3] W. Gasarch, A Survey on Private Information Retrieval, Bulletin of the EATCS, Computational Complexity Column, 82 (2004), pp. 72.

[4] Skype website, http://www.skype.com/.

[5] The Gnutella protocol specification v0.4, http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf.

[6] The Gnutella protocol specification v0.6, http://rfc-gnutella.sourceforge.net/src/rfc-0_6-draft.html.

[7] Kazaa website, http://www.kazaa.com/.

[8] B. Cohen, The BitTorrent protocol specification, version 11031.

Appendix A: CloudChord Instructions

Install Gradle

Instructions for installing Gradle and further information are available at:

https://gradle.org/install/

In this project, we integrate Gradle into Eclipse.

The Gradle version is 4.6.

Start the Program

1. Import the project as a Gradle project, and then follow the import instructions to integrate the project into Eclipse.

2. Navigate to “src/main/java/frontEnd/Main.java”.

3. Right-click Main.java and choose “Run as Java Application”. The project starts, and its GUI is presented to the user.

Join the CloudChord

In most cases, a user only needs to join CloudChord through one existing node in CloudChord. The basic steps are:

1. Enter a port number between 1024 and 65535 in the Join field.

2. The “Plz Select to Join” drop-down menu contains the list of socket addresses for all

3. After the Join button is clicked, a message confirming the successful join is shown on the console, as in the following figure:

In the rare case that the user is the first to start CloudChord, he needs to fill in a port number to initiate and

Appendix B: System Architecture

Adding Account

A user must own a Google Drive account and provide its authorization information. Follow the steps below to set up the Google Drive account and add the authorization file to CloudChord.

1. Before using CloudChord, each user must own an enabled Google Drive account. If not, follow the Google account sign-up instructions to create one.

2. Click Google Drive API Console to create or select a project, and click Continue. Then select Credentials.

3. On the Credentials page with the subtitle “Add credentials to your project”, scroll down to the bottom and click the Cancel button.

4. On the Credentials page, select the OAuth consent screen tab. Enter the Application name if it is not set automatically, and click the Save button.

5. Click the Create Credentials button on the Credentials tab, and choose OAuth client ID.

6. Choose Other as the application type, enter “CloudChord” or another name in the name field, and click the Create button.

7. Click OK in the prompt dialog.

8. Click Download JSON to the right of the client ID.

9. Rename the downloaded JSON file to client_secret.json and add it to the user’s home

Node Information

The About button provides basic information about the current node. First, the node’s socket address, IP address, and port number are listed. Second, the node’s location in the Chord structure is given as a percentage. Third, information about the predecessor and successor nodes is provided, which is important for node stabilization and file manipulation.

Here is an example of the node information:

Finger Table

The finger table information can be accessed by clicking the FigureTbl button. The finger table has 32 entries, since CloudChord uses 32-bit identifiers. For each entry, the table lists the index in the finger list and the corresponding Chord value. The next three items describe the successor node: its socket address, its hash code, and its location in CloudChord, shown

Here is an example of the finger table:

Property Files

Sent_file.props records all the outsourced file blocks and the locations of the nodes storing them. Each entry in sent_file.props consists of the name of a file block and the socket address of its destination node. File block names use the following notation: cp_ indicates a duplicated copy, enc_ indicates an encoded (encrypted) file, and .sp indicates a split part of the file. The SentInfo button displays the sent_file.props

Name.props keeps track of each file block and its corresponding hash value, which is used in file retrieval. Its naming notation is consistent with that of sent_file.props, and each entry pairs a name with its hash value. The NameInfo button displays the name.props information, as shown below:

Because Google Drive assigns each uploaded file a Google Drive identifier, cloud.props records each file block’s hash name together with the corresponding Google Drive identifier of the file stored

Uploading File

Uploading is one of the core functions of CloudChord. The uploading algorithm is described in the report; the main steps are shown below:

1. Click the SelectFile button, and the Choose File window appears. The path navigates directly to the user’s home directory created by CloudChord.

2. The selected file name is shown to the right of the SelectFile button. If it is the correct file to upload, click the Upload button. Otherwise, select the file again.

3. After the Upload button is clicked, the file is encrypted with AES, and the encryption key is stored in the user’s home directory. The encrypted file is then split into a certain number of file blocks, and each encrypted file block is outsourced to cloud storage.

4. The uploading information is shown on the console. It includes each outsourced file block, the hash name of the file block, and the successor node for the file block. In addition, we keep track of the total uploaded size for the current cloud storage for the

also track the uploading time to show on the console. An example of the uploading information from the console is shown below:

Downloading File

Downloading is another core function of CloudChord. The downloading algorithm is described in the report; the main steps are shown below:

1. A user enters the name of the file to download. If the file was not sent by the current user, a notice appears warning of the illegal operation. Otherwise, the user is free to download

2. After the Download button is clicked, the current node retrieves each file block from the successor node that stores it, and the blocks are automatically downloaded into the user’s Download directory. The console shows the following information:

3. The file blocks are merged based on their names, which yields the encoded (encrypted) file. With the secret key stored in the home directory, the encoded file is decoded into the plaintext source file.

Load Balancing

Load balancing keeps the files in CloudChord evenly distributed across all nodes’ cloud storage. Load balancing is initiated by clicking the LoadBal button: once the total size of a node’s storage exceeds the load balance limit, a certain percentage of the files in its cloud storage is migrated to its successor node. We

Appendix C: Implementation

In this section, we present the source code for the core steps of this project.

Dependencies

In the build.gradle file, we need to add the Google Drive API as a dependency.

Downloading

Set predecessor
