Public Auditing Based on Homomorphic Hash Function in
Secure Cloud Storage
1 Shufen NIU, *2 Caifen Wang, 3 Xiaoni DU
1, College of Computer Science and Engineering, Northwest Normal University, Lanzhou, Gansu 730070, China, E-mail: {sfniu76,duxn}@nwnu.edu.cn
*2, Corresponding Author, College of Computer Science and Engineering, Northwest Normal University, Lanzhou, Gansu 730070, China, E-mail: [email protected]
3, College of Computer Science and Engineering, Northwest Normal University, Lanzhou, Gansu 730070, China
Abstract
Public verification enables a third party auditor (TPA), on behalf of the data owner, to verify the integrity of cloud storage with the data owner’s public key. In this paper, by utilizing a homomorphic vector hash function, we propose a secure cloud storage scheme supporting privacy-preserving public auditing for multiple files with different identifiers. To achieve privacy-preserving public auditing, we integrate the homomorphic linear authenticator with random vector masking to hide the linear combination of the data files. Meanwhile, the time taken to audit is not affected by the number of files. The security of the public auditing scheme relies on the hardness of the computational Diffie-Hellman and discrete logarithm problems in the random oracle model. Furthermore, theoretical analysis and experimental results demonstrate that the proposed scheme performs efficiently.
Keywords: Data Storage, Privacy-Preserving, Public Auditability, Cloud Computing, Homomorphic Hashing Function
1. Introduction
Storing data in the cloud has become a trend. An increasing number of clients store their important data on remote servers in the cloud without keeping a copy on their local computers. Sometimes the data stored in the cloud is so important that the clients must ensure it is not lost or corrupted.
For the clients, the tasks of auditing the data correctness in a cloud environment can be formidable and expensive. So the clients may resort to an independent third party auditor (TPA) to audit the outsourced data.
Because the TPA cannot retrieve the entire stored file, traditional cryptographic primitives for data security protection cannot be directly adopted [1],[2],[3]. Ateniese et al. [2] first proposed provable data possession (PDP), which allows a client to verify the integrity of her data stored on an untrusted server without retrieving the entire file. Wang et al. [4] constructed a public auditing mechanism for cloud data in which the content of a client’s private data is not disclosed to the third party auditor. Wang et al. [6] first proposed a mechanism for public auditing of data shared in the cloud by a group of clients. With ring signature-based homomorphic authenticators, the TPA can verify the integrity of shared data but is not able to reveal the identity of the signer of each block. Chen et al. [7] introduced a mechanism for auditing the correctness of data in the multi-server scenario, where the data are encoded with network coding. More recently, Cao et al. [8] constructed an LT code-based secure cloud storage mechanism.
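A homomorphic authenticator on its own, however, hands the TPA linear combinations of the data, and a TPA that collects enough of them can solve the resulting linear equations for the data itself. To overcome this drawback, a proper approach is to combine the homomorphic authenticator with random masking [4][10]. With random masking, the TPA no longer has all the necessary information to build up a correct group of linear equations and therefore cannot derive the owner’s data content.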
In this paper, we propose a new privacy-preserving public auditing mechanism for data storage in an untrusted cloud. In our approach, we utilize a homomorphic vector hash function [11] to construct homomorphic authenticators [2],[9],[12] so that the third party auditor is able to verify the integrity of the data without retrieving all of it. To achieve privacy-preserving public auditing, we propose to uniquely integrate the homomorphic linear authenticator with random vector masking to hide the linear combination of the data files. Our main contributions are the following:
1) We consider $m$ files with $m$ different identifiers. By exploiting the homomorphic hash function, an authenticator $\sigma_i$ is computed for the file $F_i = (m_{i1}, m_{i2}, \ldots, m_{in})$ $(i = 1, 2, \ldots, m)$, not for each file block $m_{ij}$. Thus for file $F_i$ we produce one authenticator $\sigma_i$, not $n$ authenticators $\sigma_{i1}, \sigma_{i2}, \ldots, \sigma_{in}$. Meanwhile, the amount of information used for verification, as well as the computation cost of the verification process, is independent of the number of files. This process can significantly reduce the communication, computation and storage costs.
2) To achieve privacy-preserving public auditing, we propose to uniquely integrate the homomorphic linear authenticator with random vector masking $w = (w_1, w_2, \ldots, w_n)$ to hide the linear combination of the data files.
3) The efficiency of the proposed algorithm is assessed through theoretical analysis and experimental simulation.
The rest of this paper is organized as follows. In Section 2, definitions and preliminaries are presented. The proposed privacy-preserving public auditing scheme based on homomorphic hashing is presented in Section 3. In Section 4, the security of the proposed scheme is analyzed. In Section 5, the complexity of the scheme is analyzed in terms of computation costs, and experimental results on the efficiency of the approach are presented. Finally, conclusions and possible research directions are presented in Section 6.
2. Definitions and Preliminaries
We consider a cloud data storage service involving three different entities, as illustrated in Fig 1: the cloud user, who has a large amount of data files to be stored in the cloud; the cloud server, which is managed by the cloud service provider to provide the data storage service and has significant storage space and computation resources; and the third party auditor (TPA), who has expertise and capabilities that cloud users do not have and is trusted to assess the reliability of the cloud storage service on behalf of the user upon request.
2.1. Auditing Model
We follow a similar definition of previously proposed schemes in the context of [1], [2], and adapt the framework for our privacy-preserving public auditing system.
Definition 1 A public auditing scheme based on homomorphic hashing consists of the following five algorithms: SetUp, SigGen, Challenge, GenProof and CheckProof.
SetUp$(1^k) \to (pk, sk)$: Given the security parameter $k$, this function generates the public key $pk$ and the secret key $sk$. $pk$ is public to everyone, while $sk$ is kept secret by the user.
SigGen$(pk, sk, F) \to \sigma_F$: Given $pk$, $sk$ and $F$, this function computes a verification signature $\sigma_F$ and makes it publicly known to everyone. This signature will be used for public verification of data integrity.
Challenge$(pk, \sigma_F) \to chal$: Using this function, the TPA generates a challenge $chal$ to request the integrity proof of the files $F$. The TPA sends $chal$ to the server.
GenProof$(pk, \sigma_F, F, chal) \to P$: Using this function, the server computes a response $P$ to the challenge $chal$. The server sends $P$ back to the TPA.
CheckProof$(pk, \sigma_F, chal, P) \to \{success, failure\}$: The TPA checks the validity of the response $P$. If it is valid, the function outputs $success$, otherwise the function outputs $failure$. The secret key $sk$ is not needed in the CheckProof function.
2.2. Security Requirements
There are two security requirements for the privacy-preserving public auditing based on homomorphic hashing scheme: security against the server with public verifiability, and privacy-preserving against third party auditor. We first give the definition of security against the server with public verifiability. In this definition, we have two entities: a challenger that stands for either the client or third party auditor, and an adversary that stands for the untrusted server.
Definition 2 (Security against the Server with Public Verifiability) We consider a game between a challenger and an adversary that has four phases: Setup, Query, Challenge and Forge.
Setup: The challenger runs the SetUp function and gets $(pk, sk)$. The challenger sends $pk$ to the adversary and keeps $sk$ secret.
Query: The adversary adaptively selects some files $F_i$ $(i = 1, 2, \ldots, m)$ and queries the verification signatures from the challenger. The challenger computes a verification signature $\sigma_i$ for each of the files and sends $\sigma_i$ to the adversary.
Challenge: The challenger generates the $chal$ for the files $F_i$ $(i = 1, 2, \ldots, m)$ and sends it to the adversary.
Forge: The adversary computes a response $P$ to prove the integrity of the requested files. If CheckProof$(pk, \sigma_i, chal, P) = success$, then the adversary has won the game.
Next, we define privacy preservation against TPA auditing, which is given in Definition 3. In this definition, we also have two entities: a challenger that stands for either the client or the server, and an adversary that stands for the TPA.
Definition 3 (Privacy-Preserving against Third Party Auditor) We say a privacy-preserving public auditing scheme based on homomorphic hashing is privacy-preserving if there exists an extraction algorithm such that, for every adversary, whenever the adversary, playing the game, outputs an admissible cheating prover $P'$ for the sampled files $F$, the extraction algorithm recovers $F$ from $P'$.
2.3. Preliminaries
In this subsection, we first introduce bilinear maps and several complexity assumptions. Then, we briefly describe several cryptographic primitives used in this paper.
Definition 4 Bilinear Map. Let $G_1$, $G_T$ be multiplicative cyclic groups of prime order $p$, and let $g_1$, $g_2$ be generators of $G_1$. A bilinear map is a map $e: G_1 \times G_1 \to G_T$ with the following properties: 1) Computability: there exists an efficiently computable algorithm for computing the map $e$. 2) Bilinearity: for all $u, v \in G_1$ and $a, b \in Z_p$, $e(u^a, v^b) = e(u, v)^{ab}$. 3) Non-degeneracy: $e(g_1, g_2) \neq 1$.
Definition 5 Discrete Logarithm (DL) Problem. For $a \in Z_p$, given $g, h = g^a \in G_1$, output $a$. The DL assumption holds in $G_1$ if no $t$-time algorithm has advantage at least $\epsilon$ in solving the discrete logarithm problem in $G_1$, which means it is computationally infeasible to solve the discrete logarithm problem in $G_1$.
Definition 6 Computational Diffie-Hellman (CDH) Problem. For $a \in Z_p$, given $g_2, g_2^a \in G_2$ and $h \in G_1$, compute $h^a \in G_1$. The co-CDH assumption holds in $G_1$ and $G_2$ if no $t$-time algorithm has advantage at least $\epsilon$ in solving the co-CDH problem in $G_1$ and $G_2$.
Definition 7 Homomorphic hash function [11]. Let $G$ be a cyclic group of prime order $p > 2^{\lambda}$ in which the discrete logarithm problem is hard, where $\lambda$ is the security parameter, and let the public parameters contain a description of $G$ and $n$ random generators $g_1, g_2, \ldots, g_n \in G \setminus \{1\}$. Then a homomorphic hash of a message $v = (v_1, v_2, \ldots, v_n) \in Z_p^n$ can be constructed as
$$H(v) \stackrel{def}{=} \prod_{j=1}^{n} g_j^{v_j}.$$
It is easy to verify that the vector hash function satisfies the following properties:
1) Homomorphism: For any two messages $m_1, m_2$ and scalars $w_1, w_2$, it holds that
$$H(w_1 m_1 + w_2 m_2) = H(m_1)^{w_1} H(m_2)^{w_2}.$$
2) Collision Resistance: There is no probabilistic polynomial-time adversary capable of forging $(m_1, m_2, m_3, w_1, w_2)$ satisfying both $m_3 \neq w_1 m_1 + w_2 m_2$ and
$$H(m_3) = H(m_1)^{w_1} H(m_2)^{w_2}.$$
Theorem 1 [11] The homomorphic hash function is secure assuming the discrete logarithm problem in $G$ is hard.
3. The Proposed Scheme
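Since the construction in this section rests on the homomorphic vector hash of Definition 7, a small numeric check of the homomorphism property may help fix ideas. The sketch below is only an illustration under stated assumptions: it works in the order-1019 subgroup of squares modulo the safe prime 2039 (toy parameters chosen here, far too small to be secure), and the helper names `random_generator`, `vector_hash` and `linear_combination` are hypothetical.

```python
import secrets

# Toy parameters (assumptions for illustration only, not secure sizes):
# Q_MODULUS = 2 * P_ORDER + 1 is a safe prime, so the squares modulo
# Q_MODULUS form a cyclic subgroup G of prime order P_ORDER.
P_ORDER = 1019
Q_MODULUS = 2039

def random_generator():
    """Return a random element of G other than 1 (each one generates G)."""
    while True:
        g = pow(secrets.randbelow(Q_MODULUS - 2) + 2, 2, Q_MODULUS)
        if g != 1:
            return g

def vector_hash(generators, vec):
    """H(v) = prod_j g_j^{v_j} in G, following Definition 7."""
    h = 1
    for g, v in zip(generators, vec):
        h = (h * pow(g, v % P_ORDER, Q_MODULUS)) % Q_MODULUS
    return h

def linear_combination(w1, m1, w2, m2):
    """Component-wise w1*m1 + w2*m2 over Z_p."""
    return [(w1 * a + w2 * b) % P_ORDER for a, b in zip(m1, m2)]

n = 4
gens = [random_generator() for _ in range(n)]
m1 = [secrets.randbelow(P_ORDER) for _ in range(n)]
m2 = [secrets.randbelow(P_ORDER) for _ in range(n)]
w1, w2 = secrets.randbelow(P_ORDER), secrets.randbelow(P_ORDER)

# Left side: hash of the combined message. Right side: combination of hashes.
lhs = vector_hash(gens, linear_combination(w1, m1, w2, m2))
rhs = (pow(vector_hash(gens, m1), w1, Q_MODULUS)
       * pow(vector_hash(gens, m2), w2, Q_MODULUS)) % Q_MODULUS
print(lhs == rhs)
```

The equality holds for every choice of messages and scalars because each generator has order $p$, so exponents can be reduced modulo $p$ without changing the result.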
In this section, we describe the proposed auditing scheme. As mentioned in Section 2, the proposed scheme has five functions: SetUp, SigGen, Challenge, GenProof and CheckProof. In our scheme, each file vector $F_i$ $(i = 1, 2, \ldots, m)$ is divided into $n$ blocks of equal length: $F_i = (m_{i1}, m_{i2}, \ldots, m_{in})$. To achieve privacy-preserving public auditing, we propose to uniquely integrate the homomorphic linear authenticator with random vector masking $w = (w_1, w_2, \ldots, w_n)$ to hide the linear combination of the data files.
Let $G_1$ and $G_T$ be multiplicative cyclic groups of prime order $p > 2^{\lambda}$, and let $e: G_1 \times G_1 \to G_T$ be a bilinear map as introduced in the preliminaries. Let $g$ be a generator of $G_1$, and let $H(\cdot)$ be a secure hash function $Z_p \to Z_p$. Our scheme is as follows:
SetUp$(1^{\lambda})$: The cloud user chooses a random signing key pair $(spk, ssk)$, a random $x \in Z_p$ and a random element $u \in G_1$, computes $v = g^x$, and randomly chooses $g_1, g_2, \ldots, g_n \in G_1 \setminus \{1\}$. The secret parameter is $sk = (x; u; ssk)$ and the public parameters are $pk = (spk; v; g; g_1, g_2, \ldots, g_n)$.
SigGen$(pk, sk, F_i)$: Given the data files $F_i = (m_{i1}, m_{i2}, \ldots, m_{in}) \in Z_p^n$ and the identifier $id_i$ of file $F_i$ $(i = 1, 2, \ldots, m)$, the user computes:
1) $H(id_1 \| id_2 \| \ldots \| id_m)$, where the $id_i$ are chosen by the user uniformly at random from $Z_p$ as the identifiers of the files $F_i$.
2) $\sigma_i = \left( \prod_{j=1}^{n} g_j^{m_{ij}} \cdot u^{id_i} \right)^x$ for each $F_i$.
Denote the set of authenticators by $\Phi = \{\sigma_i\}_{1 \le i \le m}$. To ensure the integrity of the file identifiers, one simple way is to compute the tag $t = id_1 \| id_2 \| \ldots \| id_m \| u \| ssig_{ssk}(id_1 \| id_2 \| \ldots \| id_m \| u)$. For simplicity, we assume the TPA knows the number of blocks $n$. The user then sends $F_i$ along with the verification metadata $\{\Phi, t\}$ to the server and deletes them from local storage.
Challenge:
1) With respect to the mechanism we describe in the SigGen phase, the TPA verifies the signature $ssig_{ssk}(id_1 \| id_2 \| \ldots \| id_m \| u)$ via $spk$, and quits if the verification fails. Otherwise, the TPA recovers $id_1, id_2, \ldots, id_m, u$.
2) To generate the challenge message $chal$ for the audit, the TPA picks a random $c$-element subset $I = \{s_1, s_2, \ldots, s_c\}$ of the set $[1, m]$. For each element $i \in I$, the TPA also chooses a random value $v_i$. The message $chal$ specifies the positions of the files required to be checked.
3) The TPA sends $chal = \{(i, v_i)\}_{i \in I}$ to the server.
GenProof$(pk, \sigma_F, F_i, chal)$: Upon receiving the challenge $chal = \{(i, v_i)\}_{i \in I}$, the server
1) chooses a random vector $w = (w_1, w_2, \ldots, w_n)$ and calculates
$$R = e\left( \prod_{j=1}^{n} g_j^{w_j}, v \right);$$
2) computes the linear combination of the sampled files specified in $chal$:
$$\mu = (\mu_1, \mu_2, \ldots, \mu_n)^T = \sum_{i=s_1}^{s_c} v_i F_i + w = \left( \sum_{i=s_1}^{s_c} v_i m_{i1} + w_1, \; \sum_{i=s_1}^{s_c} v_i m_{i2} + w_2, \; \ldots, \; \sum_{i=s_1}^{s_c} v_i m_{in} + w_n \right)^T;$$
3) calculates an aggregated authenticator
$$\sigma = \prod_{i=s_1}^{s_c} \sigma_i^{v_i} \in G_1.$$
The server then sends $P = (\mu, \sigma, R)$ as the response proof of storage correctness to the TPA.
CheckProof$(pk, \sigma, chal, \mu, R)$: With the response $P = (\mu, \sigma, R)$, the TPA computes 1) $H(id_1 \| id_2 \| \ldots \| id_m)$ and 2) $ID = \sum_{i=s_1}^{s_c} v_i \, id_i$, and then checks the verification equation:
$$R \cdot e(\sigma, g) \stackrel{?}{=} e\left( \prod_{j=1}^{n} g_j^{\mu_j} \cdot u^{ID}, v \right) \quad (1)$$
The correctness of the above verification equation is elaborated as follows:
$$R \cdot e(\sigma, g) = e\left( \prod_{j=1}^{n} g_j^{w_j}, v \right) \cdot e\left( \prod_{i=s_1}^{s_c} \left( \Big( \prod_{j=1}^{n} g_j^{m_{ij}} \cdot u^{id_i} \Big)^x \right)^{v_i}, g \right)$$
$$= e\left( \prod_{j=1}^{n} g_j^{w_j}, v \right) \cdot e\left( \prod_{j=1}^{n} g_j^{\sum_{i=s_1}^{s_c} v_i m_{ij}} \cdot u^{\sum_{i=s_1}^{s_c} v_i id_i}, g^x \right)$$
$$= e\left( \prod_{j=1}^{n} g_j^{\sum_{i=s_1}^{s_c} v_i m_{ij} + w_j} \cdot u^{\sum_{i=s_1}^{s_c} v_i id_i}, g^x \right)$$
$$= e\left( \prod_{j=1}^{n} g_j^{\mu_j} \cdot u^{ID}, v \right).$$
4. Security Analysis
We evaluate the security of the proposed scheme by analyzing its fulfillment of the security requirements described in Section 2.
Theorem 2: Under the CDH assumption, the proposed scheme is secure against the untrusted server.
Proof. We assume there exists an adversary (here the server is treated as the adversary) that wins the challenge and is therefore able to forge $\sigma'$ determined by the challenge. If this adversary can break the public auditing scheme, we show how to construct an algorithm A that uses it in order to break CDH. A is given as input the values $g, g^a \in G_1$ and $h \in G_1$; its goal is to output $h^a$. A simulates the public auditing environment for the adversary as follows.
Setup: A generates a random generator of $G_1$, denoted by $g$. A sends $v = g^a$ to the adversary. This means that A does not know the corresponding secret key $a$.
Hash Query: When A receives a hash query for a value $u^{id_i}$, it chooses random values $\alpha_i, \beta_i \in Z_p$, sets $u^{id_i} = g^{\alpha_i} h^{\beta_i}$ and returns it to the adversary. Since $\prod_{j=1}^{n} g_j^{m_{ij}} \in G_1$, A randomly chooses $\gamma_i \in Z_p$ and computes $\prod_{j=1}^{n} g_j^{m_{ij}} = g^{\gamma_i} (u^{id_i})^{-1}$, so that
$$\left( \prod_{j=1}^{n} g_j^{m_{ij}} \cdot u^{id_i} \right)^a = (g^a)^{\gamma_i}.$$
Challenge: A generates a challenge $chal = (i, s_1, s_2, \ldots, s_c)$ and sends it to the adversary.
Forge: The adversary computes a response $\sigma'$ to prove the integrity of the requested files, and outputs it such that
$$R \cdot e(\sigma', g) = e\left( \prod_{j=1}^{n} g_j^{\mu_j} \cdot u^{ID}, v \right).$$
From the view of A, $\prod_{j=1}^{n} g_j^{w_j} = g^{w} \in G_1$ and $\prod_{j=1}^{n} g_j^{\mu_j} = g^{t} \in G_1$, where $w, t \in Z_p$. By (1), A can output
$$h^a = \left( \sigma' \cdot v^{\, w - t - \sum_{i=s_1}^{s_c} v_i \alpha_i} \right)^{1 / \sum_{i=s_1}^{s_c} v_i \beta_i}.$$
Theorem 3: Under the DL assumption, the TPA cannot get any information about the client’s data from the server’s response $P = (\mu, \sigma, R)$ during the scheme execution. Hence, the scheme is private against the third party.
Proof. We assume there exists an adversary (here the TPA is treated as the adversary) that wins the challenge and is therefore able to recover $\mu'_j = \sum_{i=s_1}^{s_c} v_i m_{ij}$ determined by the challenge. From the view of the TPA, if the TPA can recover $\mu'$, it means the adversary can compute $R'$ such that:
$$R \cdot e(\sigma, g) = e\left( \prod_{j=1}^{n} g_j^{\mu_j} \cdot u^{ID}, v \right) \quad (2)$$
$$R' \cdot e(\sigma, g) = e\left( \prod_{j=1}^{n} g_j^{\mu_j} \cdot u^{ID}, v \right) \quad (3)$$
From (2) and (3),
$$e\left( \prod_{j=1}^{n} g_j^{w_j}, g \right) = e\left( \prod_{j=1}^{n} g_j^{w'_j}, g \right).$$
We have
$$\prod_{j=1}^{n} g_j^{w_j} = \prod_{j=1}^{n} g_j^{w'_j}.$$
According to Theorem 1, since the discrete logarithm problem in $G_1$ is hard, the adversary cannot compute the vector $w = (w_1, w_2, \ldots, w_n)$, so it cannot recover $\mu'_j = \sum_{i=s_1}^{s_c} v_i m_{ij}$.
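The role of the masking vector in Theorem 3 can be illustrated with plain modular arithmetic. The sketch below is a toy under stated assumptions: two files, two challenges on the same pair of files with fixed coefficients, a stand-in prime modulus, and hypothetical helper names. It models only the vector $\mu$ and ignores $\sigma$ and $R$, so it shows why unmasked responses leak the blocks, not why $R$ hides $w$ (that part rests on the DL assumption). It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import secrets

P = 2**61 - 1  # toy prime modulus standing in for the group order p (assumption)

def respond(files, chal, mask=None):
    """mu_j = sum_i v_i * m_ij, plus w_j when a masking vector is supplied."""
    n = len(files[0])
    mask = mask if mask is not None else [0] * n
    return [(sum(v * files[i][j] for i, v in chal) + mask[j]) % P for j in range(n)]

def solve_two_files(chal_a, mu_a, chal_b, mu_b):
    """Curious auditor: solve a 2x2 system per block position (Cramer's rule mod P)."""
    (_, a0), (_, a1) = chal_a
    (_, b0), (_, b1) = chal_b
    det_inv = pow((a0 * b1 - a1 * b0) % P, -1, P)
    file0 = [((ra * b1 - rb * a1) * det_inv) % P for ra, rb in zip(mu_a, mu_b)]
    file1 = [((a0 * rb - b0 * ra) * det_inv) % P for ra, rb in zip(mu_a, mu_b)]
    return [file0, file1]

def random_mask(n):
    """Fresh masking vector w, drawn independently for every response."""
    return [secrets.randbelow(P) for _ in range(n)]

files = [[11, 22, 33], [44, 55, 66]]   # two files of three blocks each
chal_a = [(0, 3), (1, 5)]              # (file index, coefficient v_i)
chal_b = [(0, 7), (1, 11)]

# Without masking, two responses suffice to recover every block.
unmasked = solve_two_files(chal_a, respond(files, chal_a),
                           chal_b, respond(files, chal_b))

# With a fresh mask per response, the same attack returns unrelated values.
masked = solve_two_files(chal_a, respond(files, chal_a, random_mask(3)),
                         chal_b, respond(files, chal_b, random_mask(3)))

print(unmasked == files, masked == files)
```

With masking, each response adds $n$ fresh unknowns $w_1, \ldots, w_n$, so collecting more responses never produces more equations than unknowns.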
5. Performance Analysis
In this section, we first evaluate the proposed scheme in terms of computation overheads in different phases. After that, we present the experimental results.
5.1. Comparison analysis
We assume that the system holds $m$ files with different identifiers and that each file $F_i$ $(i = 1, 2, \ldots, m)$ is divided into $n$ blocks of equal length. The TPA side has different auditing goals in the two schemes: in our scheme the TPA can choose $c$ sampled files having different identifiers for auditing, whereas in [4] the TPA audits $c$ sampled files. We therefore primarily assess the performance of the auditing schemes on the client side and the cloud server side. To simplify the comparison, we convert exponentiation operations into multiplication operations, that is, $Exp_{G_1} = \frac{|p|}{2} Mult_{G_1}$.
In the following Figures and Tables,
● $m$ denotes the number of files;
● $n$ denotes the number of file blocks;
● $c$ denotes the number of sampled files;
● $Mult$ denotes the multiplication operations;
● $Hash$ denotes the hash operations;
● $Pair$ denotes the pairing operations;
● $Add$ denotes the addition operations.
Table 1. Client Side
Our Scheme | | | | 2
( )
2
n p p
m MultHash
[4] Scheme
m n p
(|
| 1)
Mult
m Hash
Table 2. Server Side
Our Scheme ( 1) | |
( ) ( 1)
2
n c p
mc Mult m c Add Pair
[4] Scheme | | ( ( 1)) 2 mc p
m c Mult mcAdd m Hash m Pair
From Table 1 and Table 2, we can see that our algorithm has lower computation overhead than the scheme of Wang et al. [4] on both the client side and the cloud server side. Furthermore, as shown by verification equation (1), the computation cost at the TPA is independent of the number of files. On the contrary, the verification size of [4] increases linearly with the size of the files.
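The comparison above expresses exponentiations in units of multiplications. As a rough sanity check on that kind of conversion, the sketch below counts the operations performed by textbook left-to-right square-and-multiply. This is an assumption about the exponentiation method, since the paper does not say which algorithm its cost model has in mind, and the function names are hypothetical. For a random $|p|$-bit exponent about half of the bits are set, so roughly $|p|/2$ of the operations are multiplications by the base, in addition to $|p| - 1$ squarings.

```python
import secrets

def exp_cost(exponent):
    """Return (squarings, multiplications) used by left-to-right square-and-multiply."""
    bits = bin(exponent)[2:]
    squarings = len(bits) - 1            # one squaring per bit after the leading one
    multiplications = bits.count("1") - 1  # one multiply per further set bit
    return squarings, multiplications

def square_and_multiply(base, exponent, modulus):
    """Compute base**exponent mod modulus for exponent >= 1."""
    result = base % modulus
    for bit in bin(exponent)[2:][1:]:    # skip the leading 1 bit
        result = (result * result) % modulus
        if bit == "1":
            result = (result * base) % modulus
    return result

# A random 512-bit exponent with the top bit forced to 1.
exponent = secrets.randbits(512) | (1 << 511)
squarings, multiplications = exp_cost(exponent)
print(squarings, multiplications)
```

Whether squarings are counted as multiplications changes the constant in front of $|p|$ but not the linear dependence on the bit length.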
5.2. Experimental Results
In the experiment, we first measure the total computation time of our scheme for different choices of pairing parameters. Furthermore, we demonstrate the effectiveness and efficiency of our proposed mechanism in the signature phase, the auditing phase and the genproof phase, respectively. Our implementation was written in C using the Pairing-Based Cryptography Library (libpbc) [13]. The main properties of the tested parameters are summarized in Table 3.
Table 3. Main properties of tested pairing
Type    Base field (bits)    Dlog security (bits)    Embedding degree
a       512                  1024                    2
The computations are run on a PC with a 2.9 GHz CPU and 4 GB of RAM, running the Linux operating system. We utilize two elliptic curves: one is Type a with a base field size of 512 bits and embedding degree 2, the other is Type e with a base field size of 1024 bits and embedding degree 1. The security level is chosen to be $|p| = 512$ and $|p| = 1024$, respectively.
Table 4 shows the total computation time of our scheme for different choices of pairing parameters. We can see that the computation cost heavily depends on the selected type of pairing, and for each set of pairing parameters, the computation cost increases with the number of auditing files.
Table 4. The total computation time (s) of the proposed scheme under different pairing parameters (m = 600)
          c=200      c=300      c=400      c=500      c=600
Type a    129.605    130.744    131.105    131.577    131.922
Type e    292.753    295.957    296.693    302.800    304.464
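To put the trend in Table 4 into numbers, the short script below computes the relative growth of the total time between c = 200 and c = 600 for each pairing type, using only the values reported in the table. The variable and function names are hypothetical.

```python
# Values copied from Table 4 (total computation time in seconds, m = 600).
C_VALUES = [200, 300, 400, 500, 600]
TOTAL_TIME_S = {
    "Type a": [129.605, 130.744, 131.105, 131.577, 131.922],
    "Type e": [292.753, 295.957, 296.693, 302.800, 304.464],
}

def relative_growth(times):
    """Fractional increase from the first to the last measurement."""
    return (times[-1] - times[0]) / times[0]

def is_nondecreasing(times):
    """True when each measurement is at least as large as the previous one."""
    return all(a <= b for a, b in zip(times, times[1:]))

for curve, times in TOTAL_TIME_S.items():
    growth = 100 * relative_growth(times)
    print(f"{curve}: {growth:.2f}% from c={C_VALUES[0]} to c={C_VALUES[-1]}, "
          f"nondecreasing={is_nondecreasing(times)}")
```

Tripling the number of audited files raises the total time by about 1.79% for Type a and about 4.00% for Type e, while switching from Type a to Type e more than doubles it, so the choice of pairing type dominates the cost.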
The following subsections examine the signature, genproof and checkproof phases in turn.
5.2.1. Performance of the Signature phase
For the client, the computation cost of the signature mainly depends on the message size. In this experiment, we set the number of files to $m = 50$ and the dimension of the messages to $n = 5, 10, 50, 100$, and run the test with both Type a and Type e pairing parameters. As shown in Figure 2, if a larger message size is used, the total computation cost increases because of the increasing number of exponentiation operations.
Figure 2. Computation costs of the client’s signature with m = 50 and n = 5, 10, 50, 100 (x-axis: the number of file blocks; y-axis: signature time (ms); series: Type a, Type e)
5.2.2. Performance of the Genproof phase
For the Genproof phase, the efficiency of computation mainly depends on both the number of files and the message size. Thus, in this experiment, we set the length of the message to 3200 bytes, whereas the number of files $m$ is set to 50, 100, 200. As illustrated in Figure 3, the computation cost at the cloud server heavily depends on the selected type of pairing, and for each set of pairing parameters, the computation cost increases with the number of files.
Figure 3. Computation costs of the server’s genproof with m = 50, 100, 200 (x-axis: the number of auditing files; y-axis: genproof time (ms); series: Type a, Type e)
Figure 4. Computation costs of the TPA’s auditing with m = 100, 200, 300, 400, 500 (x-axis: the number of files; y-axis: auditing time (ms); series: Type a, Type e)
5.2.3. Performance of the Checkproof phase
From Figure 4, we can see that the computation cost of the Checkproof phase is independent of the number of files. The experimental results in this section show that our scheme performs better when auditing a large number of files.
6. Summary
In this paper, we propose a new public auditing scheme based on homomorphic hashing for secure cloud storage. The approach is proved to be secure against an untrusted server. It is also private against the third party. Both theoretical analysis and experimental results demonstrate that the proposed scheme is efficient in terms of computation costs. We are currently working on extending the scheme to support data dynamics.
7. Acknowledgements
The authors wish to thank the anonymous referees for their patience in reading this manuscript and their invaluable comments and suggestions. The work was supported by the National Natural Science Foundation of China under grant 61202395, 61163038.
8. References
[1] A. Juels and B. S. Kaliski Jr., "PORs: Proofs of Retrievability for Large Files", ACM Conf. on Computer and Communications Security, pp. 584-597, 2007.
[2] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D. Song, "Provable Data Possession at Untrusted Stores", ACM Conf. on Computer and Communications Security, pp. 598-609, 2007.
[3] K. D. Bowers, A. Juels, and A. Oprea, "HAIL: A High-Availability and Integrity Layer for Cloud Storage", in Proc. of ACM CCS’09, pp. 187-198, 2009.
[4] C. Wang, Q. Wang, K. Ren, and W. Lou, "Privacy-Preserving Public Auditing for Data Storage Security in Cloud Computing", in Proc. of IEEE INFOCOM 2010, pp. 525-533, 2010.
[5] G. Ateniese, R. D. Pietro, L. V. Mancini, and G. Tsudik, "Scalable and Efficient Provable Data Possession", in Proc. of SecureComm’08, pp. 1-10, 2008.
[6] B. Wang, B. Li, and H. Li, "Oruta: Privacy-Preserving Public Auditing for Shared Data in the Cloud", in Proc. of IEEE Cloud 2012, 2012.
[7] B. Chen, R. Curtmola, G. Ateniese, and R. Burns, "Remote Data Checking for Network Coding-based Distributed Storage Systems", in Proc. of ACM CCSW 2010, pp. 31-42, 2010.
[8] N. Cao, S. Yu, Z. Yang, W. Lou, and Y. T. Hou, "LT Codes-based Secure and Reliable Cloud Storage Service", in Proc. of IEEE INFOCOM 2012, pp. 693-701, 2012.
[9] Chao Lv, Hui Li, Jianfeng Ma, Ben Niu, Haiyang Jiang, "Security Analysis of a Privacy-preserving ECC-based Grouping-proof Protocol", JCIT: Journal of Convergence Information Technology, Vol. 6, No. 3, pp. 113-119, 2011.
[10] G. Ateniese, R. D. Pietro, L. V. Mancini, and G. Tsudik, "Scalable and Efficient Provable Data Possession", in Proc. of SecureComm’08, pp. 1-10, 2008.
[11] M. Krohn, M. Freedman, D. Mazieres, "On-the-fly Verification of Rateless Erasure Codes for Efficient Content Distribution", in Proc. of IEEE Symposium on Security and Privacy, pp. 226-240, 2004.
[12] C. Erway, A. Kupcu, C. Papamanthou, and R. Tamassia, "Dynamic Provable Data Possession", in Proc. of ACM CCS 2009, pp. 213-222, 2009.
[13] Zhe Jia, Lei Pang, Shoushan Luo, Yang Xin, Miao Zhang, "Research on Distributed Privacy-Preserving Data Mining", JCIT: Journal of Convergence Information Technology, Vol. 7, No. 1, pp. 356-367, 2012.