Cloud Computing: What a Project Manager Needs to Know

Academic year: 2021

(1)

Cloud Computing: What a Project Manager Needs to Know

Dr. Patrick D. Allen, PMP

(2)

Purpose

Provide Project Managers with the very basics of the three primary types of Clouds and Cloud Computing, and the questions they should ask when Clouds and their project intersect

(3)

Overview

“Storage Clouds”

“Computing as a Service” Clouds

    Questions PMs should ask

“Data-Focused” Clouds

    Relational Databases vs Clouds

    Map-Reduce and Accumulo examples

    Questions PMs should ask

General Cloud questions PMs should ask

(4)

What’s a Cloud?

Three primary definitions of Clouds presented today:

1. Storage Cloud (just stores data; provides storage space)

2. Compute-power as a Service (VMs)

    Infrastructure as a Service or

    Platforms as a Service or

    Software as a Service

3. A Data-focused Cloud that also runs on VMs

    E.g. Hadoop Distributed File System and data processing

PMs need to make sure everyone understands which type is being discussed

If you think you’re discussing a different one, confusion

(5)

First Type: Storage Cloud

Just gives you a place to store electronic data

Music, photos, scanned documents, back-ups

Can’t run any calculations or run programs on it

Can’t do Big Data calculations on it

Many Cloud Service Providers offer storage as one of their options; others specialize in just storage

Internet Service Providers, such as Comcast, also provide Cloud Storage

Online gaming (like Steam) allows storing saved games on clouds

(6)

2nd Type of Cloud: Computing as a Service

Instead of using your own computers, you use a Third-Party’s computers at another location (e.g., AWS’s EC2)

Usually all same hardware with a variety of Virtual Machine (VM) configurations to meet customer needs

When hardware dies, it is seamlessly replaced

All hardware and infrastructure and physical security headaches are the responsibility of the Third Party

You’re responsible for secure comms to and from the data stores and the security on the machines you use

You only pay for what you use (memory, computing power or number of virtual machines used)

Great for surge-type activities, such as the census that’s run every ten years, or new venture start-ups

(7)

2nd Type: Questions PMs Should Ask – 1

What’s the cost per unit of data stored (cents per gigabyte)?

What’s the cost for the number of VMs used?

How secure or private is my data when I store it on a third-party platform?

What security or privacy guarantees are provided?

Will the PII be adequately protected?

Can I test Cloud security before I put real data there?

Am I starting a new business with limited investment?

Would a Cloud be useful for my Continuity of Operations (COOP) plans?

It depends. Do your employees already regularly perform remote operations like teleworking?

Do you have a re-routing plan to get them to the Cloud?
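The two cost questions above (cents per gigabyte stored, cost per VM used) can be combined into a rough monthly estimate. The sketch below is only an illustration: the rates, the `CloudRates` and `monthly_cost` names, and the workload figures are all hypothetical and are not taken from any provider's price list.

```python
from dataclasses import dataclass


@dataclass
class CloudRates:
    """Hypothetical unit prices; replace with the provider's published rates."""
    cents_per_gb_month: float   # storage price, cents per gigabyte per month
    dollars_per_vm_hour: float  # compute price, dollars per VM per hour


def monthly_cost(rates, stored_gb, vm_count, hours_per_vm):
    """Estimate one month's bill in dollars: storage plus compute."""
    storage = stored_gb * rates.cents_per_gb_month / 100.0
    compute = vm_count * hours_per_vm * rates.dollars_per_vm_hour
    return storage + compute


# Made-up rates and workloads, chosen only to contrast a quiet month
# with a surge month under pay-for-what-you-use pricing.
rates = CloudRates(cents_per_gb_month=2.5, dollars_per_vm_hour=0.10)
steady = monthly_cost(rates, stored_gb=500, vm_count=2, hours_per_vm=720)
surge = monthly_cost(rates, stored_gb=5000, vm_count=50, hours_per_vm=100)

print(f"steady month: ${steady:.2f}")
print(f"surge month:  ${surge:.2f}")
```

Running the same estimate for a quiet month and a surge month is one way to compare pay-per-use pricing against owning hardware sized for the peak.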

(8)

2nd Type: Questions PMs Should Ask – 2

Can you store classified data on a cloud?

If a properly secured government-accredited private cloud, maybe

If you are planning to use a Third-Party service, maybe

As a minimum, use a virtual private cloud (e.g., AWS VPC)

And located entirely in the U.S. (not distributed worldwide)

Probably need to limit access to selected personnel at the service provider site (like no foreign access in US Gov Cloud)

US-Gov-only Cloud important for data under export control

Need your security department’s approval, which includes reviewing your plan and vetting the provider

Probably need to do penetration testing before use, including checks for exposure to “side channel attacks”

Not sure if this is yet being used for more than unclassified but sensitive data

For either case, always get a cyber security expert to prepare a risk assessment, and for classified data, a proper accreditation

(9)

Process for Approval for U & SBU Data

FedRAMP is a new standardized approach to security assessment, authorization and security monitoring for cloud-based products and services

FedRAMP is mandatory for federal agency cloud deployments and service models at the low and moderate risk impact levels

Ref: http://www.gsa.gov/portal/category/102371

Ref: The Business Monthly, Aug 2012, by Gloria Larkin, “Cybersecurity and FedRAMP: A Mandatory Combination”

(10)

3rd Type: Data-Focused Cloud – Definitions

Huge Data: Petabytes or larger amounts of data

HDFS is the Hadoop Distributed File System (more on this later)

Relational Database: Think rows and columns, densely populated (like a spreadsheet)

Structured non-relational databases: Cloud-based structured data technologies like Accumulo and HBase running on HDFS

Can be densely or sparsely populated

Tend to use flexible labels of length three to six (more later)

Many different types of data that may have some overlapping elements, but not the same across all types of data

If put into rows and columns it would be a huge table only sparsely populated

(11)

Relational Database Example

Name               Address          Age    Height
John Smith         Washington DC    35     5’10”
Jane Doe           Baltimore        29     5’8”
Fred Flintstone    Rockville        55     4’10”
Tony D. Tiger      Battle Creek     67     6’2”
Elmer Fudd         DeForest         60     4’6”
Peter Parker       New York         28     5’5”
Bruce Wayne        Gotham           36     6’1”
Roger Rabbit       Fantasyland      41     4’0”
Peter Rabbit       Rural Address    118    1’1”
White Rabbit       Wonderland       135    1’11”

Find the Names of those of Age >25 but <60, and > 5’ tall
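The query on this slide is the kind of well-defined question a relational database answers directly. As a minimal sketch, the table can be loaded into an in-memory SQLite database using Python's standard `sqlite3` module. The table name `people`, the column names, and the conversion of heights to inches are assumptions made here for illustration and do not appear in the slides.

```python
import sqlite3

# Rows from the slide. Heights are converted to inches (an assumption made
# here so that the "> 5' tall" comparison is numeric).
PEOPLE = [
    ("John Smith", "Washington DC", 35, 70),
    ("Jane Doe", "Baltimore", 29, 68),
    ("Fred Flintstone", "Rockville", 55, 58),
    ("Tony D. Tiger", "Battle Creek", 67, 74),
    ("Elmer Fudd", "DeForest", 60, 54),
    ("Peter Parker", "New York", 28, 65),
    ("Bruce Wayne", "Gotham", 36, 73),
    ("Roger Rabbit", "Fantasyland", 41, 48),
    ("Peter Rabbit", "Rural Address", 118, 13),
    ("White Rabbit", "Wonderland", 135, 23),
]


def find_names(rows):
    """Return names with 25 < age < 60 and height over 5 feet (60 inches)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people "
        "(name TEXT, address TEXT, age INTEGER, height_in INTEGER)"
    )
    conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", rows)
    cursor = conn.execute(
        "SELECT name FROM people "
        "WHERE age > 25 AND age < 60 AND height_in > 60 "
        "ORDER BY rowid"
    )
    names = [row[0] for row in cursor]
    conn.close()
    return names


matches = find_names(PEOPLE)
print(matches)
```

Every row has a value in every column, which is why a single `WHERE` clause is enough here.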

(12)

Sparse Data Example

Name            Address          Age    Height
John Smith      Washington DC    35     5’10”
Jane Doe        Baltimore        29     5’8”
Peter Parker    New York         28     5’5”
Bruce Wayne     Gotham           36     6’1”

Other data sources shown: Medical Records, Drivers Licenses, Facebook, Dating Service

Names shown against those sources: John Smith, Peter Parker, Bruce Wayne

(13)

Accumulo Data Example

ID     Col. Family    Col. Qualifier    Time          Security    Value
001    Personal       Name              31 Apr ‘12    PII         John Smith
001    Personal       Age               31 Apr ‘12    PII         35
001    Personal       Height            31 Apr ‘12    PII         5’ 10”
001    Address        City              31 Apr ‘12    PII         Wash DC
001    Address        Street            31 Apr ‘12    PII         K Street
001    Address        Number            31 Apr ‘12    PII         810
002    Personal       Name              31 Apr ‘12    PII         Peter Parker
002    Personal       Age               31 Apr ‘12    PII         28
002    Personal       Height            31 Apr ‘12    PII         5’ 5”
002    Address        City              31 Apr ‘12    PII         New York
002    Address        Street            31 Apr ‘12    PII         72nd Street
002    Address        Number            31 Apr ‘12    PII         145
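To make the layout of this example concrete, the sketch below models each cell as a key (ID, column family, column qualifier, time, security label) mapped to a value, and shows a scan that hides cells whose security label the reader is not authorized to see. This is a plain-Python illustration of the data model only, not the Accumulo API: the `Key` type and `scan` function are hypothetical names, and real visibility labels can be richer than the single-label check assumed here.

```python
from typing import NamedTuple


class Key(NamedTuple):
    """One cell's key, mirroring the columns of the slide's table."""
    row_id: str
    family: str
    qualifier: str
    timestamp: str   # kept as the literal string shown on the slide
    visibility: str  # single security label (a simplifying assumption)


STAMP = "31 Apr '12"  # timestamp text as it appears on the slide

TABLE = {
    Key("001", "Personal", "Name", STAMP, "PII"): "John Smith",
    Key("001", "Personal", "Age", STAMP, "PII"): "35",
    Key("001", "Personal", "Height", STAMP, "PII"): "5' 10\"",
    Key("001", "Address", "City", STAMP, "PII"): "Wash DC",
    Key("001", "Address", "Street", STAMP, "PII"): "K Street",
    Key("001", "Address", "Number", STAMP, "PII"): "810",
    Key("002", "Personal", "Name", STAMP, "PII"): "Peter Parker",
    Key("002", "Personal", "Age", STAMP, "PII"): "28",
    Key("002", "Personal", "Height", STAMP, "PII"): "5' 5\"",
    Key("002", "Address", "City", STAMP, "PII"): "New York",
    Key("002", "Address", "Street", STAMP, "PII"): "72nd Street",
    Key("002", "Address", "Number", STAMP, "PII"): "145",
}


def scan(table, row_id, authorizations):
    """Return {(family, qualifier): value} for one ID, skipping any cell
    whose security label is not in the caller's authorizations."""
    result = {}
    for key in sorted(table):  # keys are visited in sorted order
        if key.row_id == row_id and key.visibility in authorizations:
            result[(key.family, key.qualifier)] = table[key]
    return result


print(scan(TABLE, "001", {"PII"}))
print(scan(TABLE, "001", set()))  # no authorizations: nothing is returned
```

Because each cell is stored separately, a record only takes space for the fields it actually has, which suits the sparse data described earlier.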

(14)

3rd Type: Data-Focused Cloud

Also runs on a VM farm, but uses a “Hadoop” or “Sector” file management system (Hadoop is most widely used)

What does the Hadoop Distributed File System (HDFS) do for you?

Lets you store huge amounts of non-relational data

Automatically parallelizes the computations

Automatically sorts results of “map” step

Handles all of the overhead associated with storing, locating and processing your data

Allows for Map-Reduce programs and Direct Access Table-based searches using Hadoop to be run

Can find relationships not easily visible in unstructured

(15)

3rd Type: Map-Reduce Program Example

Find the number of people per household in census data

Input: Distributed Databases of Household (HH) Census Data

Map: Count members of HH

    HH001, 3
    HH002, 6
    HH003, 4
    HH004, 3

Hadoop Auto Sorts

Reduce: Add # HH w/ N members, N = 1 to 25

    1, 3.5 M
    2, 9.6 M
    3, 6.8 M
    4, 5.3 M

Labels shown in the figure: “Key = HH Size, Value = #” and “Key = #, Value = Total”
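The flow in this example (map, automatic sort, reduce) can be imitated on one machine in a few lines. The sketch below is not Hadoop code; it only mimics the three steps in plain Python. The function names are hypothetical, the four household sizes come from the slide, and the member names are placeholders.

```python
from collections import defaultdict


def map_household(record):
    """Map step: emit (household size, 1) for one household record."""
    _hh_id, members = record
    yield (len(members), 1)


def shuffle_and_sort(pairs):
    """Stand-in for the sort Hadoop does automatically between map and
    reduce: group the emitted values by key, ordered by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_counts(size, ones):
    """Reduce step: total number of households with this many members."""
    return (size, sum(ones))


def households_by_size(records):
    intermediate = [pair for rec in records for pair in map_household(rec)]
    return [reduce_counts(k, vs) for k, vs in shuffle_and_sort(intermediate)]


def placeholder_members(n):
    """Placeholder names; only the count matters for this example."""
    return [f"person{i}" for i in range(n)]


# Household sizes from the slide: HH001=3, HH002=6, HH003=4, HH004=3.
records = [
    ("HH001", placeholder_members(3)),
    ("HH002", placeholder_members(6)),
    ("HH003", placeholder_members(4)),
    ("HH004", placeholder_members(3)),
]

totals = households_by_size(records)
print(totals)
```

Each call to `map_household` looks at a single record and nothing else, which is what allows the real map step to run in parallel across many machines.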

(16)

3rd Type: Map-Reduce Pros and Cons

Map-Reduce programs are a good fit when:

You have huge data sets

Your data can't be managed in a relational database

You are not sure what types of queries you will want to run

You want to summarize the results of independent processes that can be applied to data in parallel

Map-Reduce programs are not a good fit:

If you can answer your questions with an existing relational database in a reasonable amount of time, why bother with the overhead of a cloud?

If your data can fit within a relational database, AND

If the queries you plan to run are fairly well-defined THEN

(17)

3rd Type: Questions PMs Should Ask – 1

Do I even need to use a Cloud?

If you have well-structured, reasonable amounts of data, stick with a relational database UNLESS you just want the compute power on demand (2nd Type of Cloud presented)

If it is required by external authorities (like a customer), yes

Do I have a lot of "surge" events, where I only need to store and process large amounts of data periodically?

Then using a cloud makes sense

Do I need to know how to write a Map-Reduce program or an Accumulo Table to use a Cloud?

No, you can use pre-defined programs, OR you need someone who knows how to write new ones for you

Do I need to know how to design a Map-Reduce program?

No, but it helps, so you can ask for realistic output from the Cloud and really leverage the Cloud to solve your data problems

(18)

3rd Type: Questions PMs Should Ask – 2

Do I have access to an existing Cloud I could use?

If it meets your requirements, third-party Clouds work

Check the “fine print” on the guarantees, and whether the recourse offered under a guarantee is enough to cover the cost of a failure

Have a security expert do a risk assessment before committing

Do I need to build my own instead?

If you have security, privacy or proprietary needs not met by an existing Cloud, you might want to build your own

Consider the ongoing maintenance costs (may be primary rationale for moving to a Cloud)

(19)

General Cloud Questions for PMs

Where is the Cloud located? Can it be restricted to U.S.?

Who gets access to it?

How are the communications to/from the cloud secured?

How does it ingest its data?

How does it store its data?

How do they secure your data at rest?

How does it delete its data? Can you test that it’s gone?

Does it keep your data separate from other people's data?

Do you need/want a virtual private cloud instead?

How often is the hardware upgraded?

How many versions of VMs can you choose from?

(20)

Summary Observations

Cloud computing is here to stay

Many more projects in the future will encounter Clouds in some way that will impact the project

Need to be aware of the strengths and limitations of Clouds and whether they are appropriate for your project

You may not have a choice whether or not to use a Cloud

This briefing listed some of the basic questions you should ask as appropriate to your project

Hopefully some of the mystery (and hype) of the Cloud has been dispelled by this talk

It is useful to be able to design a Map-Reduce program so your expectations of the output are realistic

Always do a cyber risk assessment on a Cloud you plan

(21)

Contact Info

Dr. Patrick D. Allen

Johns Hopkins University Applied Physics Lab

11100 Johns Hopkins Road

MS 21-N246

Laurel, MD 20723-6099

443-778-9915 (voice)

443-778-3838 (fax)

(22)

Back-up: Terminology Relationship

                          GOOGLE                      APACHE
Map Reduce Environment    Map Reduce                  Hadoop (Map Reduce)
Structured Data           Big Table                   Accumulo
File System               Google File System (GFS)    Hadoop Distributed File System (HDFS)

(23)

Back-up: Sample Map Reduce Program

Map algorithm

    Map (key: sourceURL, value: text) {
        for each (targetURL in text)
            EmitIntermediate (targetURL, sourceURL);
    }

Reduce algorithm

    Reduce (key: targetURL, values: sourceURLs) {
        sourceList[] = empty;
        for each (u in sourceURLs)
            add u to sourceList[];
        Emit (targetURL, sourceList[]);
    }
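The pseudocode above builds, for every target URL, the list of pages that link to it. A runnable single-machine sketch in Python follows. It is an illustration under stated assumptions rather than Hadoop code: the regular expression used to find links, the function names, and the three sample pages on `example.org` are all hypothetical.

```python
import re
from collections import defaultdict

# Simplifying assumption: a link is any whitespace-delimited http(s) URL.
LINK = re.compile(r"https?://\S+")


def map_links(source_url, text):
    """Map: emit (targetURL, sourceURL) for every link found in the text."""
    for target_url in LINK.findall(text):
        yield (target_url, source_url)


def reduce_links(target_url, source_urls):
    """Reduce: collect the sources that point at one target."""
    return (target_url, sorted(set(source_urls)))


def reverse_link_graph(pages):
    """Run map, group by target (the sort/shuffle step), then reduce."""
    grouped = defaultdict(list)
    for source_url, text in pages.items():
        for target_url, src in map_links(source_url, text):
            grouped[target_url].append(src)
    return dict(reduce_links(t, s) for t, s in sorted(grouped.items()))


# Hypothetical pages: each key is a source URL, each value is its text.
pages = {
    "http://example.org/1": "see http://example.org/a and http://example.org/b",
    "http://example.org/2": "see http://example.org/a and http://example.org/c",
    "http://example.org/3": (
        "see http://example.org/b http://example.org/c http://example.org/d"
    ),
}

links = reverse_link_graph(pages)
for target, sources in links.items():
    print(target, "<-", sources)
```

As in the pseudocode, the map step only ever reads one page at a time, and the reduce step only ever sees the sources for one target.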

(24)

Back-up: Map Reduce Example 2

Map step (inputs: Doc 1, Doc 2, Doc 10^9)

    Find targets for source 1
    Find targets for source 2
    Find targets for source 10^9

Intermediate targetURL – sourceURL pairs

    targetURL a – URL1
    targetURL b – URL1
    targetURL a – URL2
    targetURL c – URL2
    targetURL b – URL10^9
    targetURL c – URL10^9
    targetURL d – URL10^9

Pairs after sorting by targetURL

    targetURL a – URL1
    targetURL a – URL2
    targetURL b – URL1
    targetURL b – URL10^9
    targetURL c – URL2
    targetURL c – URL10^9
    targetURL d – URL10^9

Reduce step

    Create list for targetURL a
    Create list for targetURL b
    Create list for targetURL c
    Create list for targetURL d

Output: sorted targetURL – sourceURL list
