Dr. Uwe Röhm School of Information Technologies
INFO5011 – Advanced Topics in IT:
Cloud Computing
Week 12: Cloud Computing Security and Data Privacy
INFO5011 "Cloud Computing" - 2011 (U. Röhm and Y. Zhou) 01-2
Outline
! Cloud Computing Security
! Data Privacy in the Cloud
! Some Proposed Techniques
! Data Location Control
Cloud Computing Security
! Policies, technologies, and controls
! to protect the data, applications, and infrastructure of cloud computing
INFO5011 "Cloud Computing" - 2011 (U. Röhm and Y. Zhou) 01-3
Cloud Computing Security Risks
A Gartner Report from July 2008 lists 7 specific risks:
! Privileged user access
! Who has access (network & physically) to the machines in the cloud?
! Regulatory compliance
! “Customers are ultimately responsible for the security and integrity of
their own data, even when it is held by a service provider.”
! Data Location
! By law, some data must (or must not) be stored in certain locations
! Data Segregation
! When data is stored in shared services alongside the data of others
! Recovery
! Investigative support
! Long-term viability
General Approaches
! Policies & Procedures at the Cloud Provider
! E.g. only authorized persons have physical access to data centers
! E.g. how are deletes handled? Google: references to the data are deleted and the data itself gets 'eventually' overwritten by other data; old disks are wiped
! SLAs
! Authentication & Authorization
! Both for end-users and internal services
! Encryption
! Secure Network Connections
! Infrastructure Services
INFO5011 "Cloud Computing" - 2011 (U. Röhm and Y. Zhou) 01-5
Public vs. Private Cloud
! Public Cloud
! Shared servers and services provided by a different company
! Multi-tenancy; virtual machines
! Private Cloud
! On- or off-premise data centers which host cloud services under the
full control of an enterprise
! VPN tunnel to ‘cloud’ data center
! Partition of data center isolated from public cloud
! E.g. Amazon Virtual Private Cloud (VPC),
or Microsoft Private Cloud Solution
Public Cloud: Multi-tenant Shared Env.
INFO5011 "Cloud Computing" - 2011 (U. Röhm and Y. Zhou) 01-7
Source: Google White Paper
Amazon Virtual Private Cloud
INFO5011 "Cloud Computing" - 2011 (U. Röhm and Y. Zhou) 01-8
What do Cloud Providers do?
! Google’s Security White Paper:
! Discloses its general internal policies & procedures => Trust
! General Company Policies
! Code of Conduct; Employees Hiring Policy
! Information Security Procedures
! E.g. Information Access Controls, Audits, Monitoring, etc.
! Data Deletion & Media Disposal
! Cf. next slide
! Operational Security
! Virus Checking; Malware Detection etc.
! Network Security
! Monitoring and Incident Management
INFO5011 "Cloud Computing" - 2011 (U. Röhm and Y. Zhou) 01-9
Example: Data Deletion
! Google Security White Paper, pages 5 and 6:
Information Privacy
"Privacy is the interest that individuals have in sustaining a 'personal space', free from interference by other people and organisations."
"Information privacy is the interest an individual has in controlling, or at least significantly influencing, the handling of data about themselves."
(Clarke, 2006)
A. Quah: Cloud Data Privacy Survey (2010)
INFO5011 "Cloud Computing" - 2011 (U. Röhm and Y. Zhou) 01-12
Sample of 153 Australian Internet users:
90.6% of the respondents agreed that companies need to inform customers if they store and process personal customer information in the cloud.
Privacy Act 1988 (Cth)
! Protects personal information
! This means any information from which an individual is 'reasonably identifiable'
! Applies to:
! Federal and ACT Government agencies
! All health service providers
! Private sector organisations with an annual turnover of more than AU$3 million
! Regulated by the Office of the Privacy Commissioner
National Privacy Principles
Businesses covered by the Privacy Act have to comply with the National Privacy Principles (NPPs).
! NPP 1 – collection
! NPP 2 – use and disclosure
! NPP 3 – data quality
! NPP 4 – data security
! NPP 5 – openness
! NPP 6 – access and correction
! NPP 7 – identifiers
! NPP 8 – anonymity
! NPP 9 – transborder flow of data
NPP 9: Transborder Dataflow
! In the context of cloud computing, NPP 9 on transborder data flow is the most relevant
! Ultimately, data that is sent to the cloud exists on physical servers in
data centres operated by the cloud provider.
! NPP 9 limits the circumstances in which an organisation
may transfer information about an individual to an entity in a foreign country. It requires
! the consent of the individual, or
! the fulfilment of a contract with the individual.
! Otherwise, steps need to be taken to ensure that the recipient
overseas will treat the personal data in substantially the same way as would be required under Australian law.
INFO5011 "Cloud Computing" - 2011 (U. Röhm and Y. Zhou) 01-15
Data Centre Locations
Compare this with Cloud Provider Policies
INFO5011 "Cloud Computing" - 2011 (U. Röhm and Y. Zhou) 01-17
! Note: The USA Patriot Act allows law enforcement agencies to compel
cloud providers to turn over records and data about individual customers.
! The non-disclosure nature of such orders means that the data subject may never be notified that the government had obtained their records.
End-User - Provider Asymmetry
! How can end-users gather any information about a potential
privacy breach at a cloud service provider?
! Both technically (this goes back to Gartner's risk of missing investigative support for forensics)
! But also legally (cf. non-disclosure nature of US Patriot Act)
! Auditability?
! Even if an individual were able to gather sufficient
information about a privacy violation, it is difficult, slow and expensive to pursue action in the foreign jurisdictions where the breach occurred.
! This is why the location of the data centre is a major
concern when it comes to cloud computing.
WHERE IN THE WORLD IS MY DATA?
Sudarshan Kadambi, Jianjun Chen, Brian F. Cooper, David Lomax, Raghu Ramakrishnan, Adam Silberstein, E. Tam, Hector Garcia-Molina
Yahoo! Research and Stanford University
VLDB 2011
(following slides based on the authors' own slides)
Problem Description
Figure 1: Globally replicated database that asynchronously propagates updates to remote datacenters.
[…] east coast of the U.S., and in France. For clarity in our discussion, we will focus on a single table containing records, but our techniques generalize directly to multiple tables or other data models. Each replica location stores a full or partial copy of the table.
Because of the high latency for communicating between datacenters, replication is typically done asynchronously. Usually, writes are persisted at one or more local servers and acknowledged to the applications (e.g., made 1-safe). Later, updates are sent to other replica locations. An example of this architecture is shown in Figure 1. As the figure shows, we can think of the system as having two distinct components: a database system, which manages reads and writes of data records, and a replication system, which manages replication of updates between replica locations. In real systems these components might be on the same server (as in MySQL replication [3]) or different servers (as in PNUTS [9]). The replication system must ensure reliable delivery of updates to remote datacenters despite failures. Individual servers might fail (and even lose data), but local or remote copies can be used for recovery.
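To make this two-component view concrete, here is a minimal sketch (invented class and function names, not the PNUTS implementation): the database component applies and acknowledges the write locally, while the replication component only queues it for later delivery to the remote sites.

from collections import deque

class LocalDatabase:
    """Stand-in for the per-datacenter database component (illustrative only)."""
    def __init__(self):
        self.records = {}

    def apply(self, key, value):
        self.records[key] = value

class ReplicationSystem:
    """Stand-in for the replication component: queues updates for
    asynchronous, reliable delivery to the remote replica locations."""
    def __init__(self, remote_sites):
        self.outbox = {site: deque() for site in remote_sites}

    def enqueue(self, key, value):
        for queue in self.outbox.values():
            queue.append((key, value))   # shipped later by a background sender

def write(db, repl, key, value):
    db.apply(key, value)       # persist locally first (1-safe)
    repl.enqueue(key, value)   # remote propagation happens asynchronously
    return "ack"               # the application is acknowledged before remote delivery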
In each location, a given record exists either as a full replica or as a stub. A full replica is a normal copy of the record, possibly enhanced with metadata to support selective replication, such as a list of other full replicas. A stub contains only the record's primary key and metadata, but no data values. Note that we do not consider selective replication at the field or column level in this paper.
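A rough sketch of the two record forms (the field names are invented for illustration; the paper does not prescribe this layout):

from dataclasses import dataclass

@dataclass
class FullReplica:
    """A normal copy of the record plus selective-replication metadata."""
    primary_key: str
    data: dict                 # the actual field values
    full_replica_sites: list   # e.g. ["us-west", "france"]

@dataclass
class Stub:
    """Only the primary key and metadata; no data values are stored."""
    primary_key: str
    full_replica_sites: list   # used to forward reads to a full replica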
3.1.1 Handling reads and writes
We assume that there is a master copy of each record where updates are applied before being propagated to replicas. In PNUTS, the master copy for different records might be in different datacenters: Alice's master copy might be in France while Bob's master copy might be in India. The result is a per-record consistency model, where replicas might lag the master by one or more versions, but will always eventually receive all updates (and apply them in the same order). No locking or commit protocol is needed since transactions are per-record; for more details see [9]. Our techniques extend to systems that do not have a master and allow updates to be applied anywhere (e.g. [12, 18], and as we discuss in Section A.7, a mode of PNUTS that supports eventual consistency).
When a record is inserted, the master copy decides where the full replicas of the record are to exist, and sends full replicas and stubs to the appropriate locations. When a record is updated, the master applies the update and then sends the updated data only to the locations that contain a full replica. This is where the resource savings of selective replication come from, since bandwidth and disk I/Os are only necessary for full replica locations. If a record is updated in a non-master region, the update has to be forwarded to the master, but this is because of the mastership scheme, not specifically selective replication. When a record is deleted, a message is sent to all replicas (full and stub) to notify them to delete the data.
A record may be read from any location. If the local database contains a full replica, it serves the request. Otherwise, the database reads the list of full replica locations from the stub and forwards the request to one of them (preferably the one with lowest network delay). This is the main penalty for selective replication: some reads that would have been served locally if all data were replicated everywhere now need to be forwarded, with an attendant increase in response latency (and some cross-datacenter bandwidth cost). As we increase the number of full replicas, there are fewer forwarded reads but also more cost to propagate updates.
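The read path just described can be sketched as follows, reusing the FullReplica/Stub classes from the sketch above; forward_to and the latency table are assumptions introduced here, not part of the paper.

def network_delay(site):
    """Placeholder latency table (ms); a real system would measure this."""
    return {"us-west": 5, "us-east": 40, "france": 120}.get(site, 999)

def read(local_copy, forward_to):
    """Serve a read locally if possible, otherwise forward it.

    local_copy -- the FullReplica or Stub held in this datacenter
    forward_to -- callable(site, primary_key) doing the remote read (assumed)
    """
    if isinstance(local_copy, FullReplica):
        return local_copy.data                          # served locally
    # Stub: forward to a full replica, preferably the closest one
    site = min(local_copy.full_replica_sites, key=network_delay)
    return forward_to(site, local_copy.primary_key)     # forwarded read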
It may be necessary to change the set of full replicas if, for example, the access pattern changes. In this case we might promote some stubs to full replicas and demote some full replicas to stubs. In our mechanism, each location requests promotion or demotion for records based on local access patterns, but the master decides whether to grant the request. This allows the master to enforce constraints like a minimum number of copies (see Section 4.1). If the master decides to convert a replica, it notifies all regions of the new list of full replicas for this record to ensure that reads can be properly forwarded. Additionally, if promoting a stub, the record data must be sent to the location with the new full replica.
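The grant/deny logic at the master could look roughly like this; a sketch only, with invented helper callbacks notify_all and send_data standing in for the messaging the paper describes.

def handle_promotion_request(replica_lists, key, site, notify_all, send_data):
    """Master-side handling of a promotion request (simplified sketch).

    replica_lists -- dict: primary_key -> list of full-replica sites (master state)
    notify_all    -- callable(key, sites): broadcast the new full-replica list
    send_data     -- callable(key, site): ship the record data to the new replica
    """
    sites = replica_lists[key]
    if site not in sites:                 # grant: promote the stub
        sites.append(site)
        notify_all(key, sites)            # so forwarded reads stay correct
        send_data(key, site)              # the new full replica needs the data

def handle_demotion_request(replica_lists, key, site, notify_all, min_copies=2):
    """Grant a demotion only if the minimum number of copies is preserved."""
    sites = replica_lists[key]
    if site in sites and len(sites) > min_copies:
        sites.remove(site)
        notify_all(key, sites)
    # otherwise the request is denied to keep at least min_copies full replicas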
3.2 Optimization problem
Inter-datacenter bandwidth can be extremely expensive, especially for datacenters with limited backbone connectivity. Therefore, we optimize system cost by minimizing bandwidth used. Other costs, such as server cost, are also important. However, minimizing bandwidth usage means avoiding sending traffic to some datacenters, which will also reduce the number of servers needed in that datacenter. Thus, bandwidth is a useful proxy for total system cost.
Inter-datacenter bandwidth for replication consists of:
• Replication bandwidth: The bandwidth required to send
updates between datacenters.
• Forwarding bandwidth: The bandwidth required to forward read requests to remote datacenters because the local replica contains a stub.
We want to minimize the sum of replication bandwidth and
forwarding bandwidth.
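These two components can be estimated per record for any candidate placement. The sketch below is an illustrative back-of-the-envelope formula with invented parameter names, not the paper's cost model; it is the quantity the optimization tries to minimize.

def bandwidth_cost(full_sites, all_sites, reads_per_site, updates,
                   record_size, request_size):
    """Estimated inter-datacenter bandwidth for one record and one placement.

    full_sites     -- set of sites holding a full replica (includes the master)
    all_sites      -- every datacenter
    reads_per_site -- dict: site -> number of reads issued locally
    updates        -- number of updates applied at the master
    record_size    -- bytes shipped per replicated update
    request_size   -- bytes per forwarded read (request + response)
    """
    # Replication bandwidth: each update is shipped to every other full replica.
    replication = updates * (len(full_sites) - 1) * record_size
    # Forwarding bandwidth: reads at stub-only sites are forwarded elsewhere.
    forwarding = sum(reads_per_site.get(s, 0)
                     for s in all_sites if s not in full_sites) * request_size
    return replication + forwarding

Enumerating candidate placements and picking the cheapest one that still satisfies the policy and latency constraints gives the constrained optimization described next.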
Additionally, two types of constraints must be enforced. First, policy constraints specify where data must or cannot be replicated, often for legal reasons. Policy constraints may also specify a minimum number of full replicas to ensure data availability. Second, latency constraints might specify that the majority of users experience good response time. It is convenient to express this constraint by specifying the fraction of total global reads (e.g., 95%) that must be served by a local, full replica. Satisfying these constraints may mean making more full replicas, or making full replicas in different locations, than would result from simply trying to minimize bandwidth cost.
Then, we can define our optimization problem as follows:
Definition 1 (Constrained selective replication problem). Given the following constraints: […]
! Distributed database, with replicas kept in sync via an asynchronous replication mechanism
! Inter-datacenter communication is quite costly and has high latency
! Scenario: a social networking application that uses this distributed database
! Users typically show some locality, which should direct the placement of each record's replicas
Criteria to Replicate a Given Record
! Goal: A replication placement algorithm providing
! low bandwidth (keeping transfer costs minimal)
! low latency
! Dynamic Factors
! How often is the record read vs. updated?
! Latency of forwarded reads.
! Static Factors
! Legal Constraints
! Critical data items such as billing records might have additional
replication requirements.
Policy Constraints
• Mechanism to enforce replica placement
  – based on legal dictates, availability needs and other application requirements
  – part of the schema definition done by a developer
• Example:
[CONSTRAINT I]
IF   TABLE_NAME = "Users"
THEN SET 'MIN_COPIES' = 2
     SET 'INCL_LIST'  = 'USWest'
CONSTRAINT_PRI = 0
Policy Constraints (cont’d)
• Example 2:
[CONSTRAINT II]
IF   TABLE_NAME = "Users" AND
     FIELD_STR('home_location') = 'France'
THEN SET 'MIN_COPIES' = 3 AND
     SET 'EXCL_LIST'  = 'Asia'
CONSTRAINT_PRI = 1
Policy Constraints (cont’d)
• Constraints can be layered
  – the highest-priority constraint overwrites previous settings
  – evaluated per-setting (see the sketch after this example)
• Example 3:
[CONSTRAINT III]
IF   TABLE_NAME = "Users" AND
     FIELD_STR('home_location') = 'India'
THEN SET 'INCL_LIST' = 'India'
CONSTRAINT_PRI = 2
This would result in Indian records having 2 copies (from Constraint I), one of which must be in India (Constraint III).
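A minimal sketch of this per-setting, priority-ordered evaluation. It is a simplification, not the actual constraint language shown above: each constraint is reduced to a match predicate plus the settings it contributes.

def effective_settings(record, constraints):
    """Apply matching constraints in priority order, per setting.

    A higher-priority constraint overwrites only the settings it defines.
    """
    result = {}
    for c in sorted(constraints, key=lambda c: c["pri"]):
        if c["match"](record):
            result.update(c["set"])      # later (higher priority) wins per key
    return result

# The three example constraints above, expressed in this simplified form:
constraints = [
    {"pri": 0, "match": lambda r: True,
     "set": {"MIN_COPIES": 2, "INCL_LIST": ["USWest"]}},
    {"pri": 1, "match": lambda r: r.get("home_location") == "France",
     "set": {"MIN_COPIES": 3, "EXCL_LIST": ["Asia"]}},
    {"pri": 2, "match": lambda r: r.get("home_location") == "India",
     "set": {"INCL_LIST": ["India"]}},
]

print(effective_settings({"home_location": "India"}, constraints))
# -> {'MIN_COPIES': 2, 'INCL_LIST': ['India']}  (2 copies, one of them in India)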
Architecture
• PNUTS
• Asynchronous, primary-copy replication
  – the appendix briefly discusses update-everywhere too
• Timeline consistency
  – updates are applied in the same order as on the master copy
• Per-record 'transactions'
• Replicate everywhere - with a twist:
  – with selective replication, some replicas have a full copy of the record, others only have stubs
  – each stub has the primary key and additional metadata, such as the list of replicas that have a full copy of the record
  – a read for a record at a replica that contains a stub results in a forwarded read
Constraint Enforcement
• Constraints are validated when they are supplied
  – the system does not allow constraints to be changed after data is inserted
• Every record has a dedicated master, which makes an initial placement decision when the record is inserted (see the sketch after this list)
  – replica(R) and stub(R) are published to the remote locations in a single transaction
• Updates only go to full replicas (asynchronously)
• Deletes are executed everywhere
• If access patterns change, full copies can migrate (promotions/demotions)
  – stub-to-full-replica conversions and vice versa, based on access patterns
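As an illustration only, the master's initial placement decision might combine the effective constraint settings with expected access locality along these lines; this is an invented function, not the paper's actual placement policy.

def initial_placement(candidate_sites, expected_reads, settings):
    """Pick the full-replica sites for a newly inserted record.

    candidate_sites -- all datacenters, e.g. ["USWest", "USEast", "India", "Asia"]
    expected_reads  -- dict: site -> anticipated read rate
    settings        -- e.g. {"MIN_COPIES": 2, "INCL_LIST": [...], "EXCL_LIST": [...]}
    """
    excluded = set(settings.get("EXCL_LIST", []))
    placement = [s for s in settings.get("INCL_LIST", []) if s not in excluded]
    # Fill up to MIN_COPIES with the remaining sites that read the record most.
    remaining = sorted((s for s in candidate_sites
                        if s not in excluded and s not in placement),
                       key=lambda s: expected_reads.get(s, 0), reverse=True)
    while len(placement) < settings.get("MIN_COPIES", 1) and remaining:
        placement.append(remaining.pop(0))
    return placement            # every other site just receives a stub

print(initial_placement(["USWest", "USEast", "India", "Asia"],
                        {"India": 120, "USWest": 5},
                        {"MIN_COPIES": 2, "INCL_LIST": ["India"], "EXCL_LIST": []}))
# -> ['India', 'USWest']  (one copy in India as required, one where reads are next highest)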
Dynamic Placement
• Dynamic demotion/promotion of stubs
  – reading a stub: the stub is promoted to a full replica
  – update on a full replica: if it has not been read within the retention interval, it is demoted to a stub
CryptDB [CIDR2011]
! Motivation:
! Data is stored in hosted environments which are not under direct
control of an organisation anymore
! Hence attacks are not only possible from outside the infrastructure,
but also from inside
! Basically: the platform cannot be trusted anymore
! Goal: information-centric security
! ‘self-defending’ data
! Encrypted
! Associated meta-data about access policy
! How to do this?
!"#$%&'(%#)*+',-'./"0$(1'
'
./"0$(1'2/,0,#34'
'
! !"#$"#%&'(&)*+*&,-&*-&./0&)*+*1*!"& ! 23,!&4567&8*8"$&8$"!"-+!&4$%8+69:&8$';,)"!&8$';*1<"&8$,;*#%& =>*$*-+""!&?,+3'>+&3*;,-=&+'&+$>!+&+3"&69@.&!"$;"$&'$&+3"&69A!& ?3'&B*,-+*,-&*-)&+>-"&+3"&69@.& & ! C"-"$*<&A88$'*#3D& ! )*+*&,-&*-&"-#$%8+")&('$B*+:&*-)&"E"#>+"&./0&F>"$,"!&';"$&"-#$%8+")&)*+*& ?,+3'>+&3*;,-=&*##"!!&+'&+3"&)"#$%8G'-&H"%!I& ! *)J>!+*1<"&F>"$%K1*!")&"-#$%8G'-&",-&*-&'-,'-&'(&&"-#$%8G'-!:&($'B& ?"*H"$&('$B!&'(&"-#$%8G'-&+'&!+$'-="$&('$B!&'(&"-#$%8G'-&+3*+&$";"*<&-'& ,-('$B*G'-&Adjustable query-based encryption
# .+*$+&'>+&+3"&)*+*1*!"&?,+3&+3"&B'!+&!"#>$"&"-#$%8G'-&!#3"B"& #A)J>!+&"-#$%8G'-&)%-*B,#*<<%&
#.+$,8&'L&<";"<!&'(&+3"&'-,'-!D
&
A-%&;*<>"&MN5O& .PA74Q&6P2& 7O6& A-%&;*<>"& NRPKMN5O&NRP& 7O6& ,-+&;*<>"&QN@&CryptDB: A Practical Encrypted Relational DBMS&
N-,'-&S& N-,'-&T& N-,'-&U&
Different Techniques
• randomized encryption (RND) -> maximum security
• deterministic encryption (DET) -> weaker privacy, however
allows server to check for equality
• order-preserving encryption (OPE) -> even more relaxed, in that it enables inequality checks and sorting operations (toy sketch below)
• homomorphic encryption (HOM) -> enables operations over encrypted data, such as additions, etc.
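To illustrate what OPE buys the server, here is a toy order-preserving mapping over a small integer domain. It is purely illustrative and bears no relation to real OPE constructions; the point is only that ciphertext comparisons give the same answer as plaintext comparisons.

import hashlib
import random

def ope_table(key: bytes, domain: int = 1_000):
    """Toy OPE: assign each plaintext 0..domain-1 a strictly increasing code."""
    rng = random.Random(hashlib.sha256(key).digest())
    code, table = 0, {}
    for value in range(domain):
        code += rng.randint(1, 1_000)   # random positive gaps preserve order
        table[value] = code
    return table

enc = ope_table(b"demo-key")
assert (enc[100] < enc[250]) == (100 < 250)   # server-side '<' works on ciphertexts
# e.g. WHERE salary > 100 can be answered on encrypted data as: ciphertext > enc[100]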
;&04%&%+$3<,+'
V$'-+"-)& W-B'),X")& 69@.& 4$%8+69&RY& +*1<"!& 4$%8+69&W6V!& Z>!"$K)"X-")& (>-#G'-![& ."$;"$& />"$%& 7"!><+!& P-#$%8+")&/>"$%& P-#$%8+")&7"!><+!& ./0&5-+"$(*#"& #O'*-="&+'&+3"&69@.&
#.3'><)&?'$H&'-&B'!+&./0&69@.&
=>3&04%'
.P0P42&\&V7N@&"B8&]QP7P&!*<*$%&^&
100000
UPDATE table1 SET col3onion1 = DecryptRND(key, col3onion1)
A-%&;*<>"&MN5O&
.PA74Q& 6P2& ?@('
SELECT * FROM table1 WHERE col3onion1 = x5a8c34
(=6' "B8D&
$*-H& -*B"& !*<*$%&
Current Progress
• various components of Relational Cloud have been built; the authors are in the process of integrating them into a single coherent system, prior to offering it as a service on a public cloud
• implemented the distributed transaction coordinator along
with the routing, partitioning, replication, and CryptDB components
• developed a placement and migration engine that monitors
database server statistics, OS statistics, and hardware loads, and uses historic statistics to predict the combined load placed by multiple workloads.