Project acronym: NEBULA
Project name: A novel vocational training programme on cloud computing skills
Project code: 540226-LLP-1-2013-1-GR-LEONARDO-LMP
Document Information
Document ID name: Nebula_WP4_D4.3.1_Learning_Material_and_Content_2015_30_04
Document title: Nebula VET program learning material and content
Type: Slides
Date of delivery: 30/04/2015
Work package: WP4
Activity: D.4.3.1
Dissemination level: Public
Document History
Versions | Date | Changes | Type of change | Delivered by
Version 1.0 | 15/04/2015 | Initial document | - | UCBL and INSA of Lyon
Version 2.0 | 26/06/2015 | Edition | Modifications according to feedback provided by partners | UCBL and INSA of Lyon
Acknowledgement
The persons of UCBL in charge of producing the course are Parisa Ghodous, Catarina Ferreira Da Silva, Jean Patrick Gelas and Mahmoud Barhamgi. The persons from UCBL involved in preparation, translation and review are Hind Benfenatki, Gavin Kemp and Olivier Georgeon.
The persons of “INSA of Lyon” in charge of producing the course are Frédérique Biennier and Nabila Benharkat. The persons from INSA of Lyon involved in preparation, translation and review are Francis Ouedraogo and Youakim Badr.
Disclaimer
The information in this document is subject to change without notice. All rights reserved.
The course is the property of UCBL and INSA of Lyon. No copying or distributing, in any form or by any means, is allowed without the prior written agreement of the owner of the property rights. This publication reflects the views only of the author, and the Commission cannot be held responsible for any use, which
Module 3 objectives
The aim of this module is to provide the student
with the capabilities to analyse the risks and
legal implications associated with the migration
process, assessing their influence on data,
processes, and applications
---Note: due to intellectual property reasons, the logotype of
UCBL must remain in all utilisation of this course content,
as well as the note “copyright DUNOD” mentioned in some
slides with figures.
Risk, security, and legal analysis for
migration to cloud
According to you, how should you take
care of Privacy?
• Can you define personal data / private life?
• How do you define privacy?
• Do you know some techniques to provide privacy?
• Do you know how your personal data are protected in
the Cloud?
• Do you know the legal framework you deal with regarding
privacy?
According to you, how can you assess
the risks associated with Cloud
migration?
• In this part you will
– Learn basic principles to define private life and
define privacy requirements
– Get information to identify how personal data can
be protected
– Learn basic elements on legal frameworks
– Identify privacy challenges to take into account
while migrating to the Cloud
PART 3 OVERVIEW
1. Privacy
2. Actors involved in the cloud computing model
3. Solutions
Cloud Computing
Cloud computing has proven to be a successful paradigm that largely simplifies the deployment of data storage and computation capabilities for enterprises. It provides interesting characteristics:
• Flexible pay-per-use pricing model: users pay only for what they consume
• No upfront cost for the (hardware/software) resources consumed
• Scalable (and seemingly unlimited) storage and computation resources
• No need to manage the allocated resources
Cloud Computing, the ugly face ...
However, these benefits come at a high price:
• Users and enterprises lose control over the systems that manage their data and applications
• Users don't know where their data is stored and who can access it
• Cloud providers (and their staff, e.g., database and network administrators, software developers, etc.) may:
‒ View the user’s sensitive information
‒ Process the users’ data for various reasons, e.g., sending targeted advertisements, snooping on people, selling the data to interested parties
Privacy concerns through an example
A Conference Management System (CMS):
A typical CMS can do the following tasks:
• Distributes the papers to the program committee (PC) members, based on their preferences and conflicts of interest
• Organizes the collection and the distribution of reviews and discussion
• Ranks papers according to their scores
• Sends out reminder emails, as well as notifications of acceptance or rejection
• Produces reports such as lists of sub-reviewers, acceptance statistics and the conference program
Privacy concerns through an example
A Cloud-based CMS has the following advantages (for the conference chair):
• The conference chair does not need to install and host a Web server or install CMS software on it. He/she only needs to create the conference account “in the cloud”
• The whole business of managing the server (including backups and security) is done by someone else, who gains economies of scale
• Accounts for authors and PC members already exist, and do not need to be managed on a per-conference basis
Privacy concerns through an example
• Data is stored indefinitely, and reviewers are spared the necessity of keeping copies of their own reviews
• The system can help complete forms such as the PC member invitation form and the paper submission form by suggesting likely colleagues based on past collaboration history
• For all of these reasons, cloud-based CMSs such as EasyChair and EDAS are an immense contribution to the academic community
Privacy concerns through an example
Data Privacy Concerns: Accidental or deliberate disclosure
• The Cloud-based CMS administrators become custodians of a huge quantity of data about the submission and reviewing behaviour of researchers, aggregated across multiple conferences
• This data could be deliberately or accidentally disclosed, with unwelcome consequences:
– Reviewer anonymity could be compromised, as well as the confidentiality of PC discussions
– The acceptance success records of researchers could be identified over a period of years
Privacy concerns through an example
Data Privacy Concerns: Accidental or deliberate disclosure
• The aggregated reviewing profile (fair/unfair, thorough/scant, harsh/undiscerning, prompt/late, etc.) of researchers could be disclosed
• The data could be abused by:
– Hiring or promotions committees
– Funding and award committees
– Researchers choosing collaborators and associates
– ...
• The mere existence of the data makes the system administrators vulnerable to bribery and coercion
Privacy concerns through an example
Data Privacy Concerns: Accidental or deliberate disclosure
• The problem of data privacy existed before the emergence of Cloud Computing, but the Cloud “magnifies” it:
– Before the cloud: a data privacy breach concerned one conference
– With the Cloud: a data privacy breach can concern thousands of conferences over decades, presenting tremendous opportunities for abuse if the data gets into the wrong hands
Privacy concerns through an example
Data Privacy Concerns: beneficial data mining
The data could also be exploited for some beneficial purposes:
• Fraud and unwanted behaviour detection and prevention
– Researchers who systematically and unfairly accept each other’s papers, or rivals who systematically reject each other’s papers
– Reviewers who reject a paper and later submit a paper with similar ideas to another conference
• Undesirable submission patterns and behaviours by individual researchers:
– Parallel or serial submissions of the same paper
Privacy concerns through an example
Data Privacy Concerns: beneficial data mining
The data could also be exploited for some beneficial purposes:
• The data could be used to understand the way conferences are administered
• ACM and IEEE could use the data to construct quality metrics for the conferences
• How much “new blood” is entering the community
• How a conference changed over its different editions
• The types of authors who submit to the conference
• This raises important questions as to who is allowed to mine the data and for what purposes
Actors involved in the cloud computing
model
Data owners: The natural or legal persons to whom the data belongs
• Reviewers and authors in a CMS
• Patients in an Electronic Medical Record
Privacy concerns:
• The main concern of data owners is to protect their data and identities against all unauthorized access or use
• They may also have privacy preferences that must be respected, e.g., a patient may allow only his primary physician to access his medical information while he is under treatment, and may refuse the use of his data for research purposes, etc.
Actors involved in the cloud computing
model
Data users: The people who query the data for various reasons
• Physicians who consult the medical data of their patients for treatment
• Medical researchers who access the patients' data to study the side effects of a given medicine, etc.
Privacy concerns
• The main concern of data users is to protect their queries and identities
• Example: a researcher who is trying to discover the side effects of a given medicine may require his identity and queries (about his current research) to be protected (to keep his research secret from his peers), etc.
Actors involved in the cloud computing
model
Service or cloud providers: they include all IT staff required to run and manage
the cloud
• Network administrators
• Database administrators
• Software developers
• Management and technical staff
Current solutions for data privacy
• Current solutions for data owners
• Current solutions for data users
Current privacy solutions for data owners
There exist two categories of solutions
• Encryption-based solutions: protect privacy against cloud providers
• Privacy-aware access control solutions: protect privacy against data users
Encryption-based solutions
Encryption is a simple and promising solution to protect the confidentiality of data from the cloud provider
Idea:
• Data is encrypted before it is stored in the cloud
• Malicious cloud insiders cannot view the private sensitive data
Limitations:
• Encryption limits the cloud’s ability to process the users' queries on their behalf
Encryption-based solutions
Different techniques have been proposed to allow the cloud to process queries on encrypted data, without decrypting it:
• Data partitioning techniques
• Order preserving encryption techniques
• Searchable encryption techniques
Data partitioning techniques
• Data elements are organized into groups, called Buckets
• Each Bucket has boundaries and a Tag
• All data elements inside a bucket are associated with the same tag
• Example: the Bucket B1 includes the employees whose age is in [18, 30]
• The client (i.e., data owner) should store the boundaries of all buckets
Query model:
• The client should:
– Determine the buckets that intersect with his query (based on the boundaries)
– Retrieve the buckets from the cloud server
– Decrypt the data of the retrieved buckets, and remove the data elements that don't satisfy the query
Data partitioning techniques
Limitations:
• There is a trade-off between the privacy protection ensured and the performance:
– Large buckets offer better protection, but involve increased computation overhead for the client
– Small buckets offer poorer protection, but less computation on the client side
• The client overhead is not negligible
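The bucket scheme and its query model can be sketched in a few lines of Python. This is a minimal illustration only, under stated assumptions: the names `Bucket`, `BUCKETS`, `upload` and `range_query` are hypothetical, the bucket boundaries beyond B1 = [18, 30] are invented for the example, and `toy_encrypt` is a placeholder (repeating-key XOR) that is NOT secure and merely stands in for a real cipher.

```python
import json
from dataclasses import dataclass

KEY = b"demo-key"  # hypothetical client-side secret, for illustration only

def toy_encrypt(data: bytes) -> bytes:
    """Placeholder cipher (repeating-key XOR). NOT secure."""
    return bytes(b ^ KEY[i % len(KEY)] for i, b in enumerate(data))

toy_decrypt = toy_encrypt  # XOR with the same key is its own inverse

@dataclass
class Bucket:
    tag: str   # opaque tag, the only thing the cloud sees
    low: int   # inclusive lower boundary, kept on the client
    high: int  # inclusive upper boundary, kept on the client

# Client-side index: the boundaries of all buckets (assumed values).
BUCKETS = [Bucket("B1", 18, 30), Bucket("B2", 31, 45), Bucket("B3", 46, 65)]

def tag_for(age: int) -> str:
    return next(b.tag for b in BUCKETS if b.low <= age <= b.high)

def upload(employees):
    """Build what the cloud stores: tag -> list of encrypted records."""
    cloud = {}
    for emp in employees:
        blob = toy_encrypt(json.dumps(emp).encode())
        cloud.setdefault(tag_for(emp["age"]), []).append(blob)
    return cloud

def range_query(cloud, lo, hi):
    """Client-side evaluation of: lo <= age <= hi."""
    # 1. Determine the buckets whose boundaries intersect the query.
    tags = [b.tag for b in BUCKETS if b.low <= hi and b.high >= lo]
    # 2. Retrieve those buckets from the cloud (here, a dict lookup).
    blobs = [blob for t in tags for blob in cloud.get(t, [])]
    # 3. Decrypt, then remove the elements that don't satisfy the query.
    rows = [json.loads(toy_decrypt(blob)) for blob in blobs]
    return [r for r in rows if lo <= r["age"] <= hi]

employees = [
    {"name": "Ann", "age": 25},
    {"name": "Bob", "age": 29},
    {"name": "Eve", "age": 40},
    {"name": "Dan", "age": 52},
]
cloud = upload(employees)
result = range_query(cloud, 28, 41)
```

Here the query touches buckets B1 and B2, so the client downloads and decrypts three records and then discards Ann (age 25) as a false positive. Wider buckets would hide more from the cloud but increase this client-side filtering work.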
Order preserving encryption techniques
• The encryption scheme preserves the order relation between the original values and their encrypted values
Example: if a > b, then e(a) > e(b)
• The cloud can compute simple queries involving simple operations on encrypted data, e.g., MAX, MIN, COUNT, >, <, =
• Access control model cannot be implemented on the server side
Limitations
• Malicious cloud providers may progressively learn the mapping between real and encrypted values
Homomorphic encryption techniques
• The homomorphic encryption scheme makes it possible to answer all types of queries on encrypted data
• Based on the idea that all query operations can be implemented through two operations: addition and multiplication (i.e., all query operators can be translated using these two operations)
Limitations
• Impractical: for example, the computation of a simple query may take years
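To give a feel for what "computing on encrypted data" means, the sketch below implements a toy version of the Paillier cryptosystem in Python. Note the assumptions: Paillier is only additively homomorphic (it supports addition of plaintexts, not multiplication), so it is a simpler relative of the fully homomorphic schemes described above, not an instance of them. The primes are deliberately tiny and the function names are hypothetical; this is not usable for real protection.

```python
import math
import random

# Toy parameters: real deployments use primes of 1024 bits or more.
P, Q = 1009, 1013
N = P * Q
N2 = N * N
G = N + 1                                          # common simple choice of generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)                               # modular inverse (Python 3.8+)

def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N with fresh randomness r."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (pow(G, m, N2) * pow(r, N, N2)) % N2

def decrypt(c: int) -> int:
    """Recover the plaintext using the private values LAM and MU."""
    x = pow(c, LAM, N2)
    return ((x - 1) // N * MU) % N

def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod N)."""
    return (c1 * c2) % N2

# The "cloud" adds two encrypted salaries without ever seeing them.
c_total = add_encrypted(encrypt(1200), encrypt(1800))
total = decrypt(c_total)
```

Only the holder of the private key can read `total`; the party that computed `c_total` worked on ciphertexts alone. Supporting both addition and multiplication on ciphertexts is what makes fully homomorphic schemes so much more expensive.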
Privacy-aware access control solutions
• The objective of these solutions is to protect the privacy of data against data users
• Most of these solutions are based on RBAC (Role Based Access Control) models
Rules: <Recipient, Data Item, Purpose, Conditions>
• Recipient: the entity requesting the data
• Data item: the requested data
• Purpose: the objective for which the data is requested
• Conditions: the set of conditions that should be met
Examples:
• A physician may access the Lab Tests of a patient in case of emergency
• A physician may access the personal information of a patient if the latter agrees
Privacy-aware access control solutions
Pros and Cons:
• These solutions provide fine-grained access control (at the attribute level)
• Offer different levels of accuracy for a data item
• Extensible, simple and easy to implement and use
• Not always doable when the data stored on the cloud is encrypted
• Most of these solutions assume the cloud to be a trusted entity (that can verify the privacy-aware access rules)
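The rule structure <Recipient, Data Item, Purpose, Conditions> and the two physician examples can be sketched as follows. This is a minimal illustration in Python, not a reference implementation of any RBAC product: the names `Rule`, `RULES` and `is_allowed`, the context keys `emergency` and `patient_consent`, and the purpose value `"treatment"` are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Condition = Callable[[dict], bool]

@dataclass(frozen=True)
class Rule:
    recipient: str                           # role of the entity requesting the data
    data_item: str                           # the requested data
    purpose: str                             # objective for which the data is requested
    conditions: Tuple[Condition, ...] = ()   # all must hold for access to be granted

# The two examples, with an assumed purpose of "treatment".
RULES = [
    Rule("physician", "lab_tests", "treatment",
         (lambda ctx: ctx.get("emergency") is True,)),
    Rule("physician", "personal_info", "treatment",
         (lambda ctx: ctx.get("patient_consent") is True,)),
]

def is_allowed(recipient: str, data_item: str, purpose: str, ctx: dict) -> bool:
    """Grant access only if some rule matches and all its conditions hold."""
    return any(
        r.recipient == recipient
        and r.data_item == data_item
        and r.purpose == purpose
        and all(cond(ctx) for cond in r.conditions)
        for r in RULES
    )

emergency_access = is_allowed("physician", "lab_tests", "treatment", {"emergency": True})
routine_access = is_allowed("physician", "lab_tests", "treatment", {})
research_access = is_allowed("researcher", "lab_tests", "research", {"emergency": True})
```

Requests are denied by default: anything not covered by a matching rule is refused. Note that whoever evaluates `is_allowed` must see both the rules and the request in clear, which is why such schemes usually assume a trusted cloud.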
Current privacy solutions for data users
There are different types of data that may be considered as
privacy sensitive by the cloud users:
• Query-related information
– Example: A scientific researcher may not want to disclose his queries, to protect his ongoing inventions
• Identity
– Example: An HIV patient may want to keep his identity private when he asks questions about HIV symptoms
• Contextual information
Query related solutions
PIR-based solutions:
• Most of the current solutions are based on PIR (Private Information Retrieval) protocols
• The idea behind PIR protocols is to execute a query on an untrusted server without letting the server learn anything about the executed queries or their results
• PIR protocols are cryptographic (i.e., queries and their answers are ciphertexts)
• Different protocols exist for one or several servers
Limitations:
• PIR-based solutions are very time-consuming, and thus impractical (a query may take years to be answered)
Query related solutions
Plan-based solutions:
• These works are motivated by the observation that different plans for the same query reveal different information to the server about the user's intention (behind the query)
• These techniques modify mature query optimization techniques to produce privacy-aware query plans that satisfy the users' constraints and preferences
• Examples of users' constraints and preferences:
– Enforcing specific value constraints (e.g., name = John Doe) on a specific trusted server
– Using a specific copy of a relation from a specific server
Limitations:
• This type of solution requires users to have knowledge about the servers involved in resolving their queries
Identity related solutions
Most of the solutions are based on Digital Identity Management (DIM) systems
• DIMs allow users to authenticate to cloud service providers without releasing their identities
• A typical DIM involves the following entities:
– Cloud service providers (CSPs)
– Identity providers (IdPs): assign identity attributes to users
– Registrars: verify the identity attributes given by an IdP to users, then issue a certificate to the user
– Users: a user can authenticate himself to CSPs using the certificate and gain access to authorized services
Identity related solutions
• Using a DIM system, the user can choose and manage the identity attributes that he wishes to use
• Examples of DIM systems: Metasystem and CardSpace from Microsoft
Limitations:
• Different Cloud services may require different DIMs, which poses interoperability issues between DIMs
General conclusions
• Privacy protection is still a real challenge for the adoption of Cloud Computing in privacy-critical and sensitive domains, e.g., healthcare, finance, military, etc.
• The existing solutions for privacy preservation are still unsatisfactory for both data owners and consumers, but research is making constant progress
• The most effective solutions today rely on trust, auditing and traceability:
– Trust: Cloud service providers should be selected based on how trusted they are
– Trust is computed from past interactions with the cloud provider, aggregated across a good number of users
General conclusions
• Auditing: mechanisms are needed to monitor the different operations within a cloud to detect and prevent suspicious queries and data accesses
• Traceability: mechanisms are needed to track down the origins of the different operations within a cloud (e.g., who accessed a given data item, and for what purposes, etc.)
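As one possible illustration of the auditing and tracking mechanisms mentioned above, the sketch below keeps an append-only access log that records who accessed which data item and for what purpose. It is a minimal sketch in Python under assumptions of its own: the class `AuditLog` and its methods are hypothetical, and chaining each entry to the previous one with a SHA-256 hash (so that later tampering is detectable) is a design choice added here, not something prescribed by the course.

```python
import hashlib
import json
from datetime import datetime, timezone

class AuditLog:
    """Append-only access log; each entry embeds the hash of the previous one."""

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(body: dict) -> str:
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def record(self, user: str, data_item: str, purpose: str) -> None:
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "data_item": data_item,
            "purpose": purpose,
            "prev": prev,
        }
        self.entries.append({**body, "hash": self._digest(body)})

    def who_accessed(self, data_item: str):
        """Answer: who accessed a given data item, and for what purpose?"""
        return [(e["user"], e["purpose"]) for e in self.entries
                if e["data_item"] == data_item]

    def verify(self) -> bool:
        """Check that no entry was altered or removed after being recorded."""
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if e["prev"] != prev or e["hash"] != self._digest(body):
                return False
            prev = e["hash"]
        return True

log = AuditLog()
log.record("dr_smith", "patient42/lab_tests", "treatment")
log.record("db_admin", "patient42/lab_tests", "maintenance")
accesses = log.who_accessed("patient42/lab_tests")
intact = log.verify()
```

A real deployment would also need the log to be stored out of reach of the administrators it monitors; the hash chain only makes modifications detectable, it does not prevent them.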
Legal constraints
• Risk management
– Due to their consequences...
– Technologies
– Confidentiality
– Tracking / monitoring
• Encryption
– Allowed keys and algorithms
– Communication
– Encrypted data storage
– Different legal constraints depending on the countries
• Encryption vs scrambled data
• Data privacy
– Personal data
Data privacy
• Personal data
– User related information
• Name, addresses, competencies, phone number, email…
• IP address, computer name, visited URLs, geolocation…
– Activity related information
• Log files, Access control system logs…
• « Physical » control (access badges, video…)
• Private life violation
– Personal data collection
• User must be informed
• Personal (and private) data / files left on computers
Big brother is watching you…. Private
life protection (1/2)
• Different legal contexts
– Market based regulation: US
• Federal Trade Commission
• Improper (unfair) sites won’t be visited
• Hyper-protection for minors
– Legal specification: EU
• World wide protection for Europeans
Big brother is watching you…. Private
life protection (2/2)
• Major stakes
– Worldwide exchanges over the Internet
– Legal framework?
– Common principles
• Fair and unfair practices
Cloud impact on Privacy (1)
• Rise of actors
– Personal data are totally distributed
– Responsibilities management
– Difficult to get a consistent view
• Multiple policies
– Depending on actors
– Shared infrastructure / services
– New protection needs
• Charter analysis
• Threats related to the provider
– Difficulties due to the extra-territoriality of the cloud
Cloud impact on Privacy (2)
• Your personal data means money for service providers…
– To use a service
• “Pay by providing some data”
– Data quality
– Trust level associated with the provider
• Personal data are necessary to achieve the operation supported by the service
– Provider charter
– Risk related to linked processes / linked data
– To make a service profitable
• Economy of personal data
– Anonymised or not
– Pricing for addresses, mail, email, phone numbers…
– Integrated in the service economic / pricing model
Legal constraints...
• Integration of constraints related to data protection act
– Tracking risk
– Privacy at work
• Access control
• Proxy configuration / deconfiguration
• Emails can be confidential
• Legal precedent
– Hyperprotection for people on the EU side
– Hyperprotection for minors in the US
– Risk for companies only if the privacy violation is involved in / is used to justify a penalty
– Usage charter
Private life, privacy and web sites (1)
• Trails / data left on the Internet
– Name and address of the computer
– Computer’s parameters
– Cookies
– Visited pages
• Information collection
– Identity
– Address (mail and email)
– Phone numbers
– (Pub) quiz
– Consideration of
• Goods
Private life, privacy and web sites (2)
• Practices that may be « unclear »…
– Sell “Clients / prospects files”
• May be forbidden in EU depending on the way the file has been made / on what the file contains
• Problems due to data exportation
– Safe Harbour
– Advertisement on data collection
• E.g. Microsoft
– Linked processes on the collected data
• Traces and workflow recognition
• Customer profile identification
Fair and unfair (1)
• Fair practices
– Transparency
• Personal data collection
• Processes involving personal data
– Absolute need
• Cyber-control used in conjunction with other security tools
– Equity
• Goals associated to personal data processing
– Proportionality
Fair and unfair (2)
• Unfair practices
– Data collection…
– Shelf life of the collected data
– Linked processes
– Sale of personal data
• Consequences
– Users must be informed
– Legal notice
– Secured storage and secured processes involving personal data
– Adapted analysis / mining processes on personal data
• Anonymisation
Data privacy at work
• Private life at work
– Private files
– Mail protection also applies to emails tagged as “private”
– Users authentication
• Login/password
• Physical key protection while implementing a PKI
• Biometrics-based authentication
• Activity related control
– Reporting files
• Survivable systems
• Usage of resources, productivity measures...
– Activity reporting
• Re-building Workflow process