Monir Azraoui, Kaoutar Elkhiyaoui, Refik Molva, Melek Önen
Privacy and Security in Cloud Computing
CAPPRIS meeting, 21 March 2013
§ Idea: Outsourcing
Ø Huge distributed data centers
Ø Offer storage and computation
§ Benefit: Cost reduction
Ø Parallelization
Ø Maintenance, reliability
§ Main phases
Ø Data upload
Ø Computation upload (Java classes)
Ø MapReduce
Ø Result return
Cloud computing
Many large files
§ Sensitive data
Ø Companies
F Internal data
F Human resources information
Ø Governmental organizations
F Prefecture: license plates, car owners...
§ Challenge: Prying clouds
Ø Adversary = honest-but-curious cloud
Ø Data & computation privacy
Ø Do not negate the cloud's advantages
Ø Lightweight operations at the client side
Privacy in Cloud Computing
§ Proof of retrievability
§ Handling encrypted data
§ Accountability
A4Cloud EU Project
Current Research Focus
§ Proof of Retrievability
Ø Integrity
Ø Very large amounts of data
Ø Integrity proofs computed by untrusted clouds
Ø Blockless verification
§ PoR: Juels 2007
§ Provable Data Possession: Ateniese 2007
Current research focus (cont’d)
§ Handling encrypted data
Ø Prying clouds
F Data encrypted by the user before upload
Ø Very large amounts of data
F Operations in the cloud performed by the cloud provider
⇒ Solution for word search: PRISM
Current research focus (cont’d)
§ Data retention scenario
Ø Internet Service Provider retains customers’ log/access data (for 6 years…!)
Ø Example: DNS logs (time, IP, hostname)
§ Save money: Outsource to cloud
§ Challenge
Ø Protect customer Privacy against prying clouds
F Privacy: Encrypt log entries
Ø Support queries: “Has x accessed y (at time z)?”
F Word Search
Ø Efficiency: Leverage clouds’ massive parallelism
F MapReduce
Handling encrypted data - scenario
[Figure: PRISM scenario with Logs]
§ Contribution
Ø Finds files containing given words in the cloud
F Unlike server-based solutions, e.g., Boneh et al. ’04 (“PEKS”), Song et al. ’00, Popa et al. ’11 (“CryptDB”)
Ø Data privacy: the cloud cannot perform any (non-trivial) analysis of the data
Ø Computation privacy: query privacy, query unlinkability
Ø Evaluation: privacy proofs and implementation (11% overhead)
§ Main idea
Ø Word existence is transformed into PIR problems
Ø Map: evaluate the PIR problem per mapper on each InputSplit
Ø Reduce: combine the mapper outputs by simple addition
Ø User decodes output, decides existence
PRIvacy preserving Search in MapReduce
PRISM: MapReduce Overview
[Figure: the User encrypts and uploads a File to the Cloud, then sends the query Q(word) for “word” to every Mapper; each Mapper builds a “PIR matrix” from its InputSplit and evaluates Q(word) on it homomorphically (entries E(0), E(1)); the Reducer sums (∑) the mapper outputs and the Result goes back to the User.]
Idea: transform the search for “word” into a PIR query.
PRISM - Upload
§ Data privacy ⇒ stateful cipher
Ø Efficient encryption ⇒ AES
Ø Indistinguishability ⇒ AES + plaintext counter
⇒ Example:
- $K_d = \mathrm{HMAC}(K, d)$
- Initialize: $\gamma_w = 1$
- Encrypt: $E(w, \gamma_w)$, then $\gamma_w = \gamma_w + 1$
- Maintain a counter $\gamma_w$ for each $w$
$E(w) = E(w, \gamma_w)$: AES over the pairing (e.g., padding + concatenation) of the plaintext $w$ and the counter $\gamma_w$
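A minimal Python sketch of the stateful cipher above, assuming a 32-byte pairing block (the word zero-padded to 28 bytes, then a 4-byte big-endian counter). To stay within the standard library, HMAC-SHA256 truncated to 16 bytes stands in for AES; this substitute is a deterministic PRF and cannot be decrypted, so it only illustrates the counter mechanism used for search. All names (`derive_file_key`, `pair`, `prf`, `StatefulCipher`) are hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict

BLOCK = 32  # illustrative block size: 28-byte padded word + 4-byte counter


def derive_file_key(master_key, file_id):
    """K_d = HMAC(K, d): per-file key derived from the user's master key K."""
    return hmac.new(master_key, file_id, hashlib.sha256).digest()


def pair(word, counter):
    """Pairing of w and gamma_w: zero-pad the word, then concatenate the counter."""
    data = word.encode("utf-8")
    if len(data) > BLOCK - 4:
        raise ValueError("word too long for this illustrative block size")
    return data.ljust(BLOCK - 4, b"\x00") + counter.to_bytes(4, "big")


def prf(key, block):
    """Stand-in for AES (assumption): HMAC-SHA256 truncated to 16 bytes."""
    return hmac.new(key, block, hashlib.sha256).digest()[:16]


class StatefulCipher:
    """E(w) = E(w, gamma_w), with one counter gamma_w per word, starting at 1."""

    def __init__(self, file_key):
        self.key = file_key
        self.counters = defaultdict(lambda: 1)

    def encrypt(self, word):
        gamma = self.counters[word]
        self.counters[word] = gamma + 1  # gamma_w := gamma_w + 1
        return prf(self.key, pair(word, gamma))


# Usage: encrypt a few log words for a hypothetical file d
key_d = derive_file_key(b"master key K", b"dns-log-file-d")
cipher = StatefulCipher(key_d)
tokens = [cipher.encrypt(w) for w in ["example.com", "10.0.0.1", "example.com"]]
```

Repeated occurrences of a word yield unrelated-looking ciphertexts, yet the user can always recompute $E(w, 1)$ from $K_d$ and $w$, which is what PrepareQuery(w) relies on.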
§ Upload: Data ⇔ Matrix $M$
§ Query: User computes and sends $\alpha = [\alpha_1, \alpha_2, \dots, \alpha_k, \dots, \alpha_t]$
Ø $\alpha_k = b\,(1 + a_k N) \bmod p \Rightarrow E(1)$
Ø $\alpha_i = b\,(a_i N) \bmod p \Rightarrow E(0)$ for $i \neq k$
§ Process: Server computes $\beta = \alpha \cdot M$
Ø $\beta_j = \sum_i \alpha_i M_{i,j}$
Ø e.g., $\beta_2 = \alpha_1 + \alpha_2$
§ Decode: Homomorphism ⇒ privacy
Ø Unblind with $b^{-1} \bmod p$, then reduce mod $N$ to obtain $M_{k,j}$
PIR: Private Information Retrieval
Data matrix M (row i holds data block d_i):
     1  2  ...  t
1    1  1  0    1    d1 = 1101
2    0  1  0    0    d2 = 0100
...  1  0  0    0    d3 = 1000
t    1  0  1    0    d4 = 1010
User wants to retrieve some data d_k (which k?); the server should not learn what is retrieved.
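The PIR steps above (query, process, decode) can be sketched in Python as follows. The concrete parameters are assumptions chosen for illustration: $p = 2^{127} - 1$ (a Mersenne prime), $N = 2^{16}$, and 32-bit random $a_i$. Decoding is correct as long as no sum wraps around modulo $p$; here each unblinded sum is at most $t \cdot 2^{48}$, far below $p$. Function names are hypothetical, and `pow(b, -1, p)` needs Python 3.8 or later.

```python
import secrets

P = 2**127 - 1  # prime modulus p (illustrative)
N = 2**16       # N: must exceed the largest value accumulated in a retrieved cell
A_BITS = 32     # bit length of the random a_i (illustrative)


def keygen():
    """Secret blinding factor b, invertible mod p because p is prime."""
    return secrets.randbelow(P - 2) + 2


def prepare_query(t, k, b):
    """alpha_k = b(1 + a_k N) mod p is E(1); alpha_i = b(a_i N) mod p is E(0)."""
    return [b * ((1 if i == k else 0) + secrets.randbits(A_BITS) * N) % P
            for i in range(t)]


def process(alpha, m):
    """Server: beta_j = sum_i alpha_i * M[i][j] mod p, i.e. beta = alpha . M."""
    return [sum(alpha[i] * m[i][j] for i in range(len(m))) % P
            for j in range(len(m[0]))]


def decode(beta, b):
    """User: unblind with b^{-1} mod p, then reduce mod N to recover row k."""
    b_inv = pow(b, -1, P)
    return [(b_inv * x % P) % N for x in beta]


# Matrix from the slide: rows d1..d4 = 1101, 0100, 1000, 1010 (0-based indices here)
M = [[1, 1, 0, 1],
     [0, 1, 0, 0],
     [1, 0, 0, 0],
     [1, 0, 1, 0]]
b = keygen()
print(decode(process(prepare_query(4, 1, b), M), b))  # row d2: [0, 1, 0, 0]
```

After unblinding, every term $a_i N M_{i,j}$ vanishes modulo $N$, leaving only $M_{k,j}$; the random $a_i$ and the secret $b$ are what hide $k$ from the server.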
PRISM – Search: Query transformation
§ User: PrepareQuery(w)
Ø If w exists:
F w has been encrypted at least once ⇒ $E(w, 1)$ has been uploaded
Ø Compute the candidate position:
F CP: $\langle X, Y \rangle = E(w, 1)$
Ø Compute the PIR input $\alpha = [\alpha_1, \alpha_2, \dots, \alpha_k, \dots, \alpha_t]$
F $\alpha_k = b\,(1 + a_k N) \bmod p = E(1)$ for $k = X$ (here $\alpha_2 = E(1)$)
F $\alpha_i = b\,(a_i N) \bmod p = E(0)$ for $i \neq k$
Ø Send α to the cloud
⇒ Query privacy
[Figure: t × t PIR matrix (rows and columns 1, 2, ..., t) with the candidate position CP marked]
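A sketch of PrepareQuery(w) in Python. How CP is read off $E(w, 1)$ is not detailed here, so the mapping below (the first and second 8-byte halves of a 16-byte ciphertext, each reduced mod $t$) is a hypothetical choice, as are HMAC-SHA256 standing in for AES and the parameters $p$, $N$. The user keeps $\langle X, Y \rangle$ and $b$ secret and sends only α.

```python
import hashlib
import hmac
import secrets

P = 2**127 - 1  # illustrative prime modulus p
N = 2**16       # illustrative PIR parameter N


def first_ciphertext(file_key, word):
    """E(w, 1), with HMAC-SHA256 standing in for AES (assumption)."""
    block = word.encode("utf-8").ljust(28, b"\x00") + (1).to_bytes(4, "big")
    return hmac.new(file_key, block, hashlib.sha256).digest()[:16]


def candidate_position(c, t):
    """Hypothetical CP = <X, Y>: the two 8-byte halves of the ciphertext, mod t."""
    return int.from_bytes(c[:8], "big") % t, int.from_bytes(c[8:], "big") % t


def prepare_query(file_key, word, t, b):
    """PrepareQuery(w): alpha_X = E(1), every other alpha_i = E(0); CP stays local."""
    x, y = candidate_position(first_ciphertext(file_key, word), t)
    alpha = [b * ((1 if i == x else 0) + secrets.randbits(32) * N) % P
             for i in range(t)]
    return alpha, (x, y)
```

Because every $a_i$ is drawn afresh, two queries for the same word produce different α vectors even though they target the same CP, which illustrates query unlinkability.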
PRISM-Search: Map & Reduce
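§ Map: PIR matrix construction (PIR matrix $M \neq$ data)
Ø Matrix initialization to 0
Ø For each ciphertext $C_i$ with $\lfloor H(C_i, j) \rfloor_1 = 1$: compute $CP_i = \langle X_i, Y_i \rangle = C_i$ and set $M_{X_i, Y_i} = 1$
§ Map: Process query: column sums
Ø For each column j, sum over all rows i:
F Compute $\sigma_j = \sum_i \alpha_i M_{i,j}$
• $\sigma_1 = \alpha_3 + \alpha_4 = E(0)$
• $\sigma_2 = \alpha_2 + \alpha_4 = E(1)$
§ Map: both steps repeated q times
Ø Send q vectors σ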
§ Reduce:
Ø Aggregation = addition
Ø Homomorphism ⇒ correctness
[Figure: ciphertexts C1, C2, C3, C4 from the InputSplit; the PIR matrix (rows and columns 1, 2, ..., t) starts at all 0 and entries are set to 1 at candidate positions; final matrix:]
PIR  1  2  ...  t
1    0  0  1    1
2    0  1  0    1
...  1  0  1    1
t    1  1  0    1
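A simplified single-process sketch of one PRISM round in Python, under several assumptions: the $j$ in $\lfloor H(C_i, j) \rfloor_1$ is read as the round number, $H$ is SHA-256, the CP mapping is a hypothetical split of the ciphertext bytes, ciphertexts are random 16-byte strings, and $t = 4$. Each mapper builds its 0/1 PIR matrix and returns the column sums; the reducer adds the mappers' vectors mod $p$, so after unblinding, $D(\sigma_Y)$ counts the mappers whose matrix has a 1 at $\langle X, Y \rangle$ (hence $N$ must exceed the number of mappers).

```python
import hashlib
import secrets

P = 2**127 - 1  # illustrative prime modulus p
N = 2**16       # illustrative N; must exceed the number of mappers
T = 4           # illustrative matrix dimension t


def hash_bit(c, j):
    """First bit of H(C_i, j), with SHA-256 as H and j read as the round number."""
    return hashlib.sha256(c + j.to_bytes(4, "big")).digest()[0] >> 7


def candidate_position(c):
    """Hypothetical CP_i = <X_i, Y_i> taken from the ciphertext bytes."""
    return int.from_bytes(c[:8], "big") % T, int.from_bytes(c[8:16], "big") % T


def map_task(input_split, alpha, j):
    """One mapper, one round: build the 0/1 PIR matrix, then return its column sums."""
    m = [[0] * T for _ in range(T)]  # matrix initialization to 0
    for c in input_split:
        if hash_bit(c, j) == 1:
            x, y = candidate_position(c)
            m[x][y] = 1
    # sigma_col = sum over rows i of alpha_i * M[i][col], mod p
    return [sum(alpha[i] * m[i][col] for i in range(T)) % P for col in range(T)]


def reduce_task(sigma_vectors):
    """Reducer: homomorphic aggregation is plain addition mod p."""
    return [sum(column) % P for column in zip(*sigma_vectors)]


def prepare_query(x, b):
    """alpha_x = b(1 + a_x N) mod p is E(1); every other alpha_i = b(a_i N) mod p is E(0)."""
    return [b * ((1 if i == x else 0) + secrets.randbits(32) * N) % P for i in range(T)]


def decode(value, b):
    """Unblind with b^{-1} mod p, then reduce mod N."""
    return (pow(b, -1, P) * value % P) % N
```

Adding encrypted vectors in the reducer adds the hidden matrix entries, which is why the slide can say that homomorphism gives correctness.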
PRISM – Result analysis
§ Receive t sums
Ø Decrypt $\sigma_Y$
§ Decision
Ø $D(\sigma_Y) = 0$ and $h(C_i) = 1$ ⇒ contradiction: w cannot be in the file
Ø Otherwise w might be in the file: false positives (collisions)
§ Run q > 1 rounds of PRISM
Ø Depending on t, q, ..., tailor the false-positive probability
Ø Result: if no contradiction occurs in any of the q rounds, w is in the file with high probability
[Figure: decrypted column sums 0 1 0 1]
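The decision rule over q rounds fits in a few lines. This sketch assumes each round reports the pair $(D(\sigma_Y), h)$, where $h$ is the hash bit of the queried ciphertext in that round; a round with $h = 0$ constrains nothing. The false-positive helper assumes a hypothetical model of independent rounds, each letting an absent word through with probability $f$; the value of $f$, which depends on t and the other parameters, is left as an input. Both function names are illustrative.

```python
def might_contain(rounds):
    """Return False as soon as one round contradicts the presence of w.

    rounds: iterable of (decrypted_sigma_y, hash_bit) pairs, one per PRISM round.
    """
    for decrypted, h in rounds:
        if decrypted == 0 and h == 1:
            return False  # contradiction: w cannot be in the file
    return True  # no contradiction: w is in the file with high probability


def false_positive_after(q, f):
    """Probability that an absent word survives q independent rounds (assumed model)."""
    return f ** q


print(might_contain([(1, 1), (0, 0), (2, 1)]))  # True
print(might_contain([(1, 1), (0, 1)]))          # False
print(false_positive_after(3, 0.5))             # 0.125
```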
Overview: Privacy Properties
§ Encryption of w using a “stateful cipher”
Ø Idea: instead of encrypting w alone, encrypt w together with a counter $\gamma_w$
Ø $C := E(w, \gamma_w)$, then $\gamma_w := \gamma_w + 1$ for each occurrence of w
Ø Initialize $\gamma_w$ to 1; search for the ciphertext $E(w, 1)$
§ PIR scheme (computation of the P-values, i.e., the α values)
Ø Query for row k (the candidate position, derived from w)
Ø $P_k := b \cdot (1 + a_k \cdot N) \bmod p \Rightarrow E(1)$
Ø $P_{i \neq k} := b \cdot a_i \cdot N \bmod p \Rightarrow E(0)$
Notation: $a_i$ is a random number; $b$, $N$, $p$ are system parameters.
§ We formally prove IND-CPA security under the “pseudorandom permutation” assumption