Dr. Uwe Röhm School of Information Technologies
INFO5011 – Advanced Topics in IT:
Cloud Computing
Week 12: Cloud Computing Security and Data Privacy
INFO5011 "Cloud Computing" - 2011 (U. Röhm and Y. Zhou) 01-2
Outline
! Cloud Computing Security
! Data Privacy in the Cloud
! Some Proposed Techniques
! Data Location Control
Cloud Computing Security
! Policies, technologies, and controls
! to protect the data, applications, and infrastructure of cloud computing
INFO5011 "Cloud Computing" - 2011 (U. Röhm and Y. Zhou) 01-3
Cloud Computing Security Risks
A Gartner Report from July 2008 lists 7 specific risks:
! Privileged user access
! Who has access (network & physically) to the machines in the cloud?
! Regulatory compliance
! “Customers are ultimately responsible for the security and integrity of
their own data, even when it is held by a service provider.”
! Data Location
! By law, some data must (or must not) be stored in certain locations
! Data Segregation
! When data is stored in shared services alongside the data of others
! Recovery
! Investigative support
! Long-term viability
General Approaches
! Policies & Procedures at the Cloud Provider
! E.g. only authorized persons have physical access to data centers
! E.g. how are deletes handled? Google: references to the data are deleted and the data itself gets 'eventually' overwritten by other data; old disks are wiped
! SLAs
! Authentication & Authorization
! Both for end-users and internal services
! Encryption
! Secure Network Connections
! Infrastructure Services
INFO5011 "Cloud Computing" - 2011 (U. Röhm and Y. Zhou) 01-5
Public vs. Private Cloud
! Public Cloud
! Shared servers and services provided by a different company
! Multi-tenancy; virtual machines
! Private Cloud
! On- or off-premise data centers which host cloud services under the
full control of an enterprise
! VPN tunnel to ‘cloud’ data center
! Partition of data center isolated from public cloud
! E.g. Amazon Virtual Private Cloud (VPC),
or Microsoft Private Cloud Solution
Public Cloud: Multi-tenant Shared Env.
INFO5011 "Cloud Computing" - 2011 (U. Röhm and Y. Zhou) 01-7
Source: Google White Paper
Amazon Virtual Private Cloud
INFO5011 "Cloud Computing" - 2011 (U. Röhm and Y. Zhou) 01-8
What do Cloud Providers do?
! Google’s Security White Paper:
! Discloses its general internal policies & procedures => Trust
! General Company Policies
! Code of Conduct; Employees Hiring Policy
! Information Security Procedures
! E.g. Information Access Controls, Audits, Monitoring, etc.
! Data Deletion & Media Disposal
! Cf. next slide
! Operational Security
! Virus Checking; Malware Detection etc.
! Network Security
! Monitoring and Incident Management
INFO5011 "Cloud Computing" - 2011 (U. Röhm and Y. Zhou) 01-9
Example: Data Deletion
! Google Security White Paper, pages 5 and 6:
Information Privacy
"Privacy is the interest that individuals have in sustaining a 'personal space', free from interference by other people and organisations."
"Information privacy is the interest an individual has in controlling, or at least significantly influencing, the handling of data about themselves."
(Clarke, 2006)
A. Quah: Cloud Data Privacy Survey (2010)
INFO5011 "Cloud Computing" - 2011 (U. Röhm and Y. Zhou) 01-12
Sample of 153 Australian Internet users:
90.6% of the respondents agreed that companies need to inform customers if they store and process personal customer information in the cloud.
Privacy Act 1988 (Cth)
! Protects personal information
! This means any information from which an individual is 'reasonably identifiable'
! Applies to:
! Federal and ACT Government agencies
! All health service providers
! Private sector organisations with an annual turnover of more than AU$3 million
! Regulated by the Office of the Privacy Commissioner
National Privacy Principles
Businesses covered by the Privacy Act have to comply with the National Privacy Principles (NPPs).
! NPP 1 – collection
! NPP 2 – use and disclosure
! NPP 3 – data quality
! NPP 4 – data security
! NPP 5 – openness
! NPP 6 – access and correction
! NPP 7 – identifiers
! NPP 8 – anonymity
! NPP 9 – transborder flow of data
NPP 9: Transborder Dataflow
! In the context of cloud computing, NPP 9 on transborder data flow is the most relevant
! Ultimately, data that is sent to the cloud exists on physical servers in
data centres operated by the cloud provider.
! NPP 9 limits the circumstances in which an organisation
may transfer information about an individual to an entity in a foreign country. It requires
! the consent of the individual, or
! the fulfilment of a contract with the individual.
! Otherwise, steps need to be taken to ensure that the recipient
overseas will treat the personal data in substantially the same way as would be required under Australian law.
INFO5011 "Cloud Computing" - 2011 (U. Röhm and Y. Zhou) 01-15
Data Centre Locations
Compare this with Cloud Provider Policies
INFO5011 "Cloud Computing" - 2011 (U. Röhm and Y. Zhou) 01-17
! Note: The USA Patriot Act allows law enforcement agencies to compel
cloud providers to turn over records and data about individual customers.
! The non-disclosure nature of such orders means that the data subject may never be notified that the government had obtained their records.
End-User - Provider Asymmetry
! How can end-users gather any information about a potential
privacy breach at a cloud service provider?
! Both technically (this goes back to Gartner's risk of missing investigative support for forensics)
! But also legally (cf. non-disclosure nature of US Patriot Act)
! Auditability?
! Even if an individual were able to gather sufficient
information about a privacy violation, it is difficult, slow and expensive to pursue action in the foreign jurisdictions where the breach occurred.
! This is why the location of the data centre is a major
concern when it comes to cloud computing.
WHERE IN THE WORLD IS MY DATA?
Sudarshan Kadambi, Jianjun Chen, Brian F. Cooper, David Lomax, Raghu Ramakrishnan, Adam Silberstein, E. Tam, Hector Garcia-Molina
Yahoo! Research and Stanford University
VLDB 2011
(following slides based on the authors' own slides)
Problem Description
Figure 1: Globally replicated database that asynchronously propagates updates to remote datacenters.
[…] east coast of the U.S., and in France. For clarity in our discussion, we will focus on a single table containing records, but our techniques generalize directly to multiple tables or other data models. Each replica location stores a full or partial copy of the table.
Because of the high latency for communicating between datacenters, replication is typically done asynchronously. Usually, writes are persisted at one or more local servers and acknowledged to the applications (e.g., made 1-safe). Later, updates are sent to other replica locations. An example of this architecture is shown in Figure 1. As the figure shows, we can think of the system as having two distinct components: a database system, which manages reads and writes of data records, and a replication system, which manages replication of updates between replica locations. In real systems these components might be on the same server (as in MySQL replication [3]) or different servers (as in PNUTS [9]). The replication system must ensure reliable delivery of updates to remote datacenters despite failures. Individual servers might fail (and even lose data), but local or remote copies can be used for recovery.
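To make this two-component view concrete, here is a minimal sketch (invented class and function names, not the PNUTS implementation): the database component applies and acknowledges the write locally, while the replication component only queues it for later delivery to the remote sites.

from collections import deque

class LocalDatabase:
    """Stand-in for the per-datacenter database component (illustrative only)."""
    def __init__(self):
        self.records = {}

    def apply(self, key, value):
        self.records[key] = value

class ReplicationSystem:
    """Stand-in for the replication component: queues updates for
    asynchronous, reliable delivery to the remote replica locations."""
    def __init__(self, remote_sites):
        self.outbox = {site: deque() for site in remote_sites}

    def enqueue(self, key, value):
        for queue in self.outbox.values():
            queue.append((key, value))   # shipped later by a background sender

def write(db, repl, key, value):
    db.apply(key, value)       # persist locally first (1-safe)
    repl.enqueue(key, value)   # remote propagation happens asynchronously
    return "ack"               # the application is acknowledged before remote delivery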
In each location, a given record exists either as a full replica or as a stub. A full replica is a normal copy of the record, possibly enhanced with metadata to support selective replication, such as a list of other full replicas. A stub contains only the record's primary key and metadata, but no data values. Note that we do not consider selective replication at the field or column level in this paper.
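A rough sketch of the two record forms (the field names are invented for illustration; the paper does not prescribe this layout):

from dataclasses import dataclass

@dataclass
class FullReplica:
    """A normal copy of the record plus selective-replication metadata."""
    primary_key: str
    data: dict                 # the actual field values
    full_replica_sites: list   # e.g. ["us-west", "france"]

@dataclass
class Stub:
    """Only the primary key and metadata; no data values are stored."""
    primary_key: str
    full_replica_sites: list   # used to forward reads to a full replica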
3.1.1 Handling reads and writes
We assume that there is a master copy of each record where updates are applied before being propagated to replicas. In PNUTS, the master copy for different records might be in different datacenters: Alice's master copy might be in France while Bob's master copy might be in India. The result is a per-record consistency model, where replicas might lag the master by one or more versions, but will always eventually receive all updates (and apply them in the same order). No locking or commit protocol is needed since transactions are per-record; for more details see [9]. Our techniques extend to systems that do not have a master and allow updates to be applied anywhere (e.g. [12, 18], and as we discuss in Section A.7, a mode of PNUTS that supports eventual consistency).
When a record is inserted, the master copy decides where the full replicas of the record are to exist, and sends full replicas and stubs to the appropriate locations. When a record is updated, the master applies the update and then sends the updated data only to the locations that contain a full replica. This is where the resource savings of selective replication come from, since bandwidth and disk I/Os are only necessary for full replica locations. If a record is updated in a non-master region, the update has to be forwarded to the master, but this is because of the mastership scheme, not specifically selective replication. When a record is deleted, a message is sent to all replicas (full and stub) to notify them to delete the data.
A record may be read from any location. If the local database contains a full replica, it serves the request. Otherwise, the database reads the list of full replica locations from the stub and forwards the request to one of them (preferably the one with lowest network delay). This is the main penalty for selective replication: some reads that would have been served locally if all data were replicated everywhere now need to be forwarded, with an attendant increase in response latency (and some cross-datacenter bandwidth cost). As we increase the number of full replicas, there are fewer forwarded reads but also more cost to propagate updates.
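The read path just described can be sketched as follows, reusing the FullReplica/Stub classes from the sketch above; forward_to and the latency table are assumptions introduced here, not part of the paper.

def network_delay(site):
    """Placeholder latency table (ms); a real system would measure this."""
    return {"us-west": 5, "us-east": 40, "france": 120}.get(site, 999)

def read(local_copy, forward_to):
    """Serve a read locally if possible, otherwise forward it.

    local_copy -- the FullReplica or Stub held in this datacenter
    forward_to -- callable(site, primary_key) doing the remote read (assumed)
    """
    if isinstance(local_copy, FullReplica):
        return local_copy.data                          # served locally
    # Stub: forward to a full replica, preferably the closest one
    site = min(local_copy.full_replica_sites, key=network_delay)
    return forward_to(site, local_copy.primary_key)     # forwarded read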
It may be necessary to change the set of full replicas if, for example, the access pattern changes. In this case we might promote some stubs to full replicas and demote some full replicas to stubs. In our mechanism, each location requests promotion or demotion for records based on local access patterns, but the master decides whether to grant the request. This allows the master to enforce constraints like a minimum number of copies (see Section 4.1). If the master decides to convert a replica, it notifies all regions of the new list of full replicas for this record to ensure that reads can be properly forwarded. Additionally, if promoting a stub, the record data must be sent to the location with the new full replica.
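The grant/deny logic at the master could look roughly like this; a sketch only, with invented helper callbacks notify_all and send_data standing in for the messaging the paper describes.

def handle_promotion_request(replica_lists, key, site, notify_all, send_data):
    """Master-side handling of a promotion request (simplified sketch).

    replica_lists -- dict: primary_key -> list of full-replica sites (master state)
    notify_all    -- callable(key, sites): broadcast the new full-replica list
    send_data     -- callable(key, site): ship the record data to the new replica
    """
    sites = replica_lists[key]
    if site not in sites:                 # grant: promote the stub
        sites.append(site)
        notify_all(key, sites)            # so forwarded reads stay correct
        send_data(key, site)              # the new full replica needs the data

def handle_demotion_request(replica_lists, key, site, notify_all, min_copies=2):
    """Grant a demotion only if the minimum number of copies is preserved."""
    sites = replica_lists[key]
    if site in sites and len(sites) > min_copies:
        sites.remove(site)
        notify_all(key, sites)
    # otherwise the request is denied to keep at least min_copies full replicas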
3.2 Optimization problem
Inter-datacenter bandwidth can be extremely expensive, especially for datacenters with limited backbone connectivity. Therefore, we optimize system cost by minimizing bandwidth used. Other costs, such as server cost, are also important. However, minimizing bandwidth usage means avoiding sending traffic to some datacenters, which will also reduce the number of servers needed in that datacenter. Thus, bandwidth is a useful proxy for total system cost.
Inter-datacenter bandwidth for replication consists of:
• Replication bandwidth: The bandwidth required to send
updates between datacenters.
• Forwarding bandwidth: The bandwidth required to forward read requests to remote datacenters because the local replica contains a stub.
We want to minimize the sum of replication bandwidth and
forwarding bandwidth.
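These two components can be estimated per record for any candidate placement. The sketch below is an illustrative back-of-the-envelope formula with invented parameter names, not the paper's cost model; it is the quantity the optimization tries to minimize.

def bandwidth_cost(full_sites, all_sites, reads_per_site, updates,
                   record_size, request_size):
    """Estimated inter-datacenter bandwidth for one record and one placement.

    full_sites     -- set of sites holding a full replica (includes the master)
    all_sites      -- every datacenter
    reads_per_site -- dict: site -> number of reads issued locally
    updates        -- number of updates applied at the master
    record_size    -- bytes shipped per replicated update
    request_size   -- bytes per forwarded read (request + response)
    """
    # Replication bandwidth: each update is shipped to every other full replica.
    replication = updates * (len(full_sites) - 1) * record_size
    # Forwarding bandwidth: reads at stub-only sites are forwarded elsewhere.
    forwarding = sum(reads_per_site.get(s, 0)
                     for s in all_sites if s not in full_sites) * request_size
    return replication + forwarding

Enumerating candidate placements and picking the cheapest one that still satisfies the policy and latency constraints gives the constrained optimization described next.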
Additionally, two types of constraints must be enforced. First, policy constraints specify where data must or cannot be replicated, often for legal reasons. Policy constraints may also specify a minimum number of full replicas to ensure data availability. Second, latency constraints might specify that the majority of users experience good response time. It is convenient to express this constraint by specifying the fraction of total global reads (e.g., 95%) that must be served by a local, full replica. Satisfying these constraints may mean making more full replicas, or making full replicas in different locations, than would result from simply trying to minimize bandwidth cost.
Then, we can define our optimization problem as follows:
Definition 1 (Constrained selective replication problem). Given the following constraints: […]
! Distributed database, with replicas kept in sync via an asynchronous replication mechanism
! Inter-datacenter communication is quite costly and has high latency
! Scenario: a social networking application that uses this distributed database
! Users typically show some locality, which should direct the placement of each record's replicas
Criteria to Replicate a Given Record
! Goal: A replication placement algorithm providing
! low bandwidth (keeping transfer costs minimal)
! low latency
! Dynamic Factors
! How often is the record read vs. updated?
! Latency of forwarded reads.
! Static Factors
! Legal Constraints
! Critical data items such as billing records might have additional
replication requirements.
Policy Constraints
• Mechanism to enforce replica placement
  – based on legal dictates, availability needs and other application requirements
  – part of the schema definition done by a developer
• Example:
[CONSTRAINT I]
IF   TABLE_NAME = "Users"
THEN SET 'MIN_COPIES' = 2
     SET 'INCL_LIST'  = 'USWest'
CONSTRAINT_PRI = 0
Policy Constraints (cont’d)
• Example 2:
[CONSTRAINT II]
IF   TABLE_NAME = "Users" AND
     FIELD_STR('home_location') = 'France'
THEN SET 'MIN_COPIES' = 3 AND
     SET 'EXCL_LIST'  = 'Asia'
CONSTRAINT_PRI = 1
Policy Constraints (cont’d)
• Constraints can be layered
  – the highest-priority constraint overwrites previous settings
  – evaluated per-setting (see the sketch after this example)
• Example 3:
[CONSTRAINT III]
IF   TABLE_NAME = "Users" AND
     FIELD_STR('home_location') = 'India'
THEN SET 'INCL_LIST' = 'India'
CONSTRAINT_PRI = 2
This would result in Indian records having 2 copies (from Constraint I), one of which must be in India (Constraint III).
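A minimal sketch of this per-setting, priority-ordered evaluation. It is a simplification, not the actual constraint language shown above: each constraint is reduced to a match predicate plus the settings it contributes.

def effective_settings(record, constraints):
    """Apply matching constraints in priority order, per setting.

    A higher-priority constraint overwrites only the settings it defines.
    """
    result = {}
    for c in sorted(constraints, key=lambda c: c["pri"]):
        if c["match"](record):
            result.update(c["set"])      # later (higher priority) wins per key
    return result

# The three example constraints above, expressed in this simplified form:
constraints = [
    {"pri": 0, "match": lambda r: True,
     "set": {"MIN_COPIES": 2, "INCL_LIST": ["USWest"]}},
    {"pri": 1, "match": lambda r: r.get("home_location") == "France",
     "set": {"MIN_COPIES": 3, "EXCL_LIST": ["Asia"]}},
    {"pri": 2, "match": lambda r: r.get("home_location") == "India",
     "set": {"INCL_LIST": ["India"]}},
]

print(effective_settings({"home_location": "India"}, constraints))
# -> {'MIN_COPIES': 2, 'INCL_LIST': ['India']}  (2 copies, one of them in India)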
Architecture
• PNUTS
• Asynchronous, primary-copy replication
  – the appendix briefly discusses update-everywhere too
• Timeline consistency
  – updates are applied in the same order as on the master copy
• Per-record 'transactions'
• Replicate everywhere - with a twist:
  – with selective replication, some replicas have a full copy of the record, others only have stubs
  – each stub has the primary key and additional metadata, such as the list of replicas that have a full copy of the record
  – a read for a record at a replica that contains a stub results in a forwarded read
Constraint Enforcement
• Constraints are validated when they are supplied
  – the system does not allow constraints to be changed after data is inserted
• Every record has a dedicated master, which makes an initial placement decision when the record is inserted (see the sketch after this list)
  – replica(R) and stub(R) are published to the remote locations in a single transaction
• Updates only go to full replicas (asynchronously)
• Deletes are executed everywhere
• If access patterns change, full copies can migrate (promotions/demotions)
  – stub-to-full-replica conversions and vice versa, based on access patterns
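As an illustration only, the master's initial placement decision might combine the effective constraint settings with expected access locality along these lines; this is an invented function, not the paper's actual placement policy.

def initial_placement(candidate_sites, expected_reads, settings):
    """Pick the full-replica sites for a newly inserted record.

    candidate_sites -- all datacenters, e.g. ["USWest", "USEast", "India", "Asia"]
    expected_reads  -- dict: site -> anticipated read rate
    settings        -- e.g. {"MIN_COPIES": 2, "INCL_LIST": [...], "EXCL_LIST": [...]}
    """
    excluded = set(settings.get("EXCL_LIST", []))
    placement = [s for s in settings.get("INCL_LIST", []) if s not in excluded]
    # Fill up to MIN_COPIES with the remaining sites that read the record most.
    remaining = sorted((s for s in candidate_sites
                        if s not in excluded and s not in placement),
                       key=lambda s: expected_reads.get(s, 0), reverse=True)
    while len(placement) < settings.get("MIN_COPIES", 1) and remaining:
        placement.append(remaining.pop(0))
    return placement            # every other site just receives a stub

print(initial_placement(["USWest", "USEast", "India", "Asia"],
                        {"India": 120, "USWest": 5},
                        {"MIN_COPIES": 2, "INCL_LIST": ["India"], "EXCL_LIST": []}))
# -> ['India', 'USWest']  (one copy in India as required, one where reads are next highest)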
Dynamic Placement
• Dynamic demotion/promotion of stubs
  – reading a stub: the stub is promoted to a full replica
  – update on a full replica: if it has not been read within the retention interval, it is demoted to a stub
CryptDB [CIDR2011]
! Motivation:
! Data is stored in hosted environments which are not under direct
control of an organisation anymore
! Hence attacks are not only possible from outside the infrastructure,
but also from inside
! Basically: the platform cannot be trusted anymore
! Goal: information-centric security
! ‘self-defending’ data
! Encrypted
! Associated meta-data about access policy
! How to do this?
!"#$%&'(%#)*+',-'./"0$(1'
'
./"0$(1'2/,0,#34'
'
! !"#$"#%&'(&)*+*&,-&*-&./0&)*+*1*!"& ! 23,!&4567&8*8"$&8$"!"-+!&4$%8+69:&8$';,)"!&8$';*1<"&8$,;*#%& =>*$*-+""!&?,+3'>+&3*;,-=&+'&+$>!+&+3"&69@.&!"$;"$&'$&+3"&69A!& ?3'&B*,-+*,-&*-)&+>-"&+3"&69@.& & ! C"-"$*<&A88$'*#3D& ! )*+*&,-&*-&"-#$%8+")&('$B*+:&*-)&"E"#>+"&./0&F>"$,"!&';"$&"-#$%8+")&)*+*& ?,+3'>+&3*;,-=&*##"!!&+'&+3"&)"#$%8G'-&H"%!I& ! *)J>!+*1<"&F>"$%K1*!")&"-#$%8G'-&",-&*-&'-,'-&'(&&"-#$%8G'-!:&($'B& ?"*H"$&('$B!&'(&"-#$%8G'-&+'&!+$'-="$&('$B!&'(&"-#$%8G'-&+3*+&$";"*<&-'& ,-('$B*G'-&Adjustable query-based encryption
# .+*$+&'>+&+3"&)*+*1*!"&?,+3&+3"&B'!+&!"#>$"&"-#$%8G'-&!#3"B"& #A)J>!+&"-#$%8G'-&)%-*B,#*<<%&
#.+$,8&'L&<";"<!&'(&+3"&'-,'-!D
&
A-%&;*<>"&MN5O& .PA74Q&6P2& 7O6& A-%&;*<>"& NRPKMN5O&NRP& 7O6& ,-+&;*<>"&QN@&CryptDB: A Practical Encrypted Relational DBMS&
N-,'-&S& N-,'-&T& N-,'-&U&
Different Techniques
• randomized encryption (RND) -> maximum security
• deterministic encryption (DET) -> weaker privacy, however
allows server to check for equality
• order-preserving encryption (OPE) -> even more relaxed, in that it enables inequality checks and sorting operations (toy sketch below)
• homomorphic encryption (HOM) -> enables operations over encrypted data, such as additions, etc.
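To illustrate what OPE buys the server, here is a toy order-preserving mapping over a small integer domain. It is purely illustrative and bears no relation to real OPE constructions; the point is only that ciphertext comparisons give the same answer as plaintext comparisons.

import hashlib
import random

def ope_table(key: bytes, domain: int = 1_000):
    """Toy OPE: assign each plaintext 0..domain-1 a strictly increasing code."""
    rng = random.Random(hashlib.sha256(key).digest())
    code, table = 0, {}
    for value in range(domain):
        code += rng.randint(1, 1_000)   # random positive gaps preserve order
        table[value] = code
    return table

enc = ope_table(b"demo-key")
assert (enc[100] < enc[250]) == (100 < 250)   # server-side '<' works on ciphertexts
# e.g. WHERE salary > 100 can be answered on encrypted data as: ciphertext > enc[100]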
;&04%&%+$3<,+'
V$'-+"-)& W-B'),X")& 69@.& 4$%8+69&RY& +*1<"!& 4$%8+69&W6V!& Z>!"$K)"X-")& (>-#G'-![& ."$;"$& />"$%& 7"!><+!& P-#$%8+")&/>"$%& P-#$%8+")&7"!><+!& ./0&5-+"$(*#"& #O'*-="&+'&+3"&69@.&
#.3'><)&?'$H&'-&B'!+&./0&69@.&
=>3&04%'
.P0P42&\&V7N@&"B8&]QP7P&!*<*$%&^&
100000
UPDATE table1 SET col3onion1 = DecryptRND(key, col3onion1)
A-%&;*<>"&MN5O&
.PA74Q& 6P2& ?@('
SELECT * FROM table1 WHERE col3onion1 = x5a8c34
(=6' "B8D&
$*-H& -*B"& !*<*$%&
Current Progress
• various components of Relational Cloud have been built; the authors are in the process of integrating them into a single coherent system, prior to offering it as a service on a public cloud
• implemented the distributed transaction coordinator along
with the routing, partitioning, replication, and CryptDB components
• developed a placement and migration engine that monitors
database server statistics, OS statistics, and hardware loads, and uses historic statistics to predict the combined load placed by multiple workloads.