Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu


Academic year: 2021


(1)

Lecture 4

Introduction to Hadoop & GAE

Cloud Application Development

(SE808, School of Software, Sun Yat-Sen University)

Yabo (Arber) Xu

(2)

Outline

•  Introduction to Hadoop
    •  The Hadoop ecosystem
    •  Related projects
    •  How to start
•  Introduction to GAE
    •  What is GAE
    •  Overview of runtime environment
    •  Scalable services
    •  Advantages and limitations
    •  Billing and free quotas
    •  Demo, and how to start

(3)

Hadoop Stack & Google’s Equivalents

Google       Hadoop              Role
MapReduce    Hadoop MapReduce    Programming framework
GFS          HDFS                Distributed file system
BigTable     HBase               Distributed column database
Sawzall      Pig / Hive          High-level language
Chubby       ZooKeeper           Distributed consensus engine

(4)

Pig

•  Data-flow oriented language
    •  “Pig Latin”
    •  Data types include sets, associative arrays, and tuples
    •  High-level language for routing data; allows easy integration of Java for complex tasks
•  Developed at Yahoo!

(5)

Hive

•  SQL-based data warehousing app
    •  Feature set is similar to Pig
    •  Language is more strictly SQL-esque
•  Supports SELECT, JOIN, GROUP BY, etc.
•  Features for analyzing very large data sets
    •  Partition columns
    •  Sampling
    •  Buckets
•  Developed at Facebook
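
Partition columns and buckets both reduce how much data a query has to read. The sketch below imitates the idea in plain Python rather than Hive: rows are grouped by a partition column, then hashed into a fixed number of buckets so that a sample can be taken by reading a single bucket. The row fields (`user_id`, `country`), the bucket count, and the use of CRC32 as the hash are assumptions for illustration only; Hive uses its own hash function and on-disk layout.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # assumed bucket count


def bucket_of(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a key to a bucket with a stable hash (CRC32 stands in for Hive's hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def layout(rows, partition_col, bucket_col):
    """Group rows as {partition value: {bucket number: [rows]}}."""
    table = defaultdict(lambda: defaultdict(list))
    for row in rows:
        table[row[partition_col]][bucket_of(row[bucket_col])].append(row)
    return table


def sample_bucket(table, partition_value, bucket):
    """Read one bucket of one partition instead of scanning the whole table."""
    return table.get(partition_value, {}).get(bucket, [])


# Hypothetical rows: 20 users split across two countries.
rows = [
    {"user_id": f"u{i}", "country": "CN" if i % 2 else "US"}
    for i in range(20)
]
table = layout(rows, partition_col="country", bucket_col="user_id")
sample = sample_bucket(table, "CN", 0)
print(len(sample), "of", len(rows), "rows read")
```

A query restricted to one country touches only that partition, and a sample touches only one bucket inside it, which is the point of both features on very large tables.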

(6)

HBase

•  Row/column store
•  Billions of rows × millions of columns
•  Column-oriented – nulls are free
•  Untyped – stores bytes
•  Constrained access model
    •  (key, value) lookup
    •  Limited transactions (single row only)

(7)

HBase – Data Model

[Figure: data schema and disk storage]
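
The logical data model can be pictured as a sorted, nested map: row key, then column (family:qualifier), then timestamp, then an untyped byte value. The following is a minimal in-memory sketch of that picture, not the HBase client API; the class name `TinyTable` and the column names are hypothetical. It shows key lookup, versioned cells, free nulls (an absent cell stores nothing), and a scan over a range of rows.

```python
from bisect import bisect_left, insort


class TinyTable:
    """In-memory stand-in for the logical model: row -> column -> timestamp -> bytes."""

    def __init__(self):
        self._rows = {}          # row key -> {column -> {timestamp -> value}}
        self._sorted_keys = []   # row keys kept sorted to support range scans

    def put(self, row: bytes, column: bytes, value: bytes, ts: int) -> None:
        if row not in self._rows:
            self._rows[row] = {}
            insort(self._sorted_keys, row)
        self._rows[row].setdefault(column, {})[ts] = value

    def get(self, row: bytes, column: bytes):
        """Return the newest version of a cell, or None if the cell was never written."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start: bytes, stop: bytes):
        """Yield (row, columns) for start <= row < stop, in row-key order."""
        i = bisect_left(self._sorted_keys, start)
        while i < len(self._sorted_keys) and self._sorted_keys[i] < stop:
            key = self._sorted_keys[i]
            yield key, self._rows[key]
            i += 1


t = TinyTable()
t.put(b"row1", b"info:name", b"alice", ts=1)
t.put(b"row1", b"info:name", b"alicia", ts=2)             # newer version of the same cell
t.put(b"row2", b"info:email", b"bob@example.com", ts=1)   # row2 stores nothing for info:name

print(t.get(b"row1", b"info:name"))                 # newest version wins
print(t.get(b"row2", b"info:name"))                 # absent cell
print([row for row, _ in t.scan(b"row1", b"row2")])  # half-open row range
```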

(8)

HBase – Design & Features

Design similar to GFS

•  Name node → Master server
•  Data node → Region server, organized in columns and cells

Features

•  Fault tolerant, with automatic load balancing
•  Fast access to cells and fast scans over ranges of rows
•  More flexible schema than a traditional database
•  Less transaction support and weaker consistency guarantees than a traditional database

(9)

HBase as a MapReduce Input

•  Each row is an input record to MapReduce
•  MapReduce jobs can sort/search/index/query data in bulk

*If you are interested in knowing more about HBase, you may take a look at Cloudera’s training video on HBase.

(10)

ZooKeeper

•  Distributed consensus engine
•  Provides well-defined concurrent access semantics:
    •  Leader election
    •  Service discovery
    •  Distributed locking / mutual exclusion
    •  Message board / mailboxes
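
Leader election is commonly built on sequence-numbered, session-bound nodes: each candidate registers under a shared parent, and the candidate with the lowest live sequence number leads. The sketch below simulates that recipe entirely in memory, with no networking and no ZooKeeper client; the class `ElectionBoard` and the server names are hypothetical.

```python
import itertools


class ElectionBoard:
    """In-memory stand-in for a parent node holding ephemeral sequential children."""

    def __init__(self):
        self._next_seq = itertools.count()
        self._members = {}  # sequence number -> candidate name

    def join(self, name: str) -> int:
        """Register a candidate; comparable to creating an ephemeral sequential node."""
        seq = next(self._next_seq)
        self._members[seq] = name
        return seq

    def leave(self, seq: int) -> None:
        """Drop a candidate; comparable to its session expiring and the node vanishing."""
        self._members.pop(seq, None)

    def leader(self):
        """The candidate holding the lowest live sequence number leads."""
        if not self._members:
            return None
        return self._members[min(self._members)]


board = ElectionBoard()
a = board.join("server-a")
b = board.join("server-b")
c = board.join("server-c")
print(board.leader())   # server-a holds the lowest sequence number
board.leave(a)          # the leader fails
print(board.leader())   # server-b takes over
```

The same ordering idea underlies distributed locking: whoever holds the lowest sequence number owns the lock, and the next in line takes over when it is released.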

(11)

Pipes, Streaming

•  Multi-language connector libraries for MapReduce
    •  Pipes: write native-code MapReduce in C++
    •  Streaming: write MapReduce passes in arbitrary scripting languages
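
With Streaming, a mapper and a reducer are ordinary programs that read lines from standard input and write tab-separated key/value lines to standard output; the framework sorts the mapper output by key before the reducer sees it. The sketch below expresses a word count in that style. To keep it runnable without a cluster, the functions take iterables of lines, and `run_locally` is a hypothetical stand-in for the framework's sort step.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts for each key; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Stand-in for the framework: map, sort by key, then reduce."""
    return list(reducer(sorted(mapper(lines))))


print(run_locally(["Hadoop and GAE", "hadoop streaming"]))
# ['and\t1', 'gae\t1', 'hadoop\t2', 'streaming\t1']
```

In an actual streaming job each function would sit in its own script, reading `sys.stdin` and printing its output, with the scripts passed to the Hadoop streaming jar through its `-mapper` and `-reducer` options.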

(12)

Hadoop related projects

•  Avro – data serialization system
•  Chukwa – Hadoop log aggregation
•  Scribe – more general log aggregation
•  Mahout – machine learning library
•  Cassandra – column store database on a P2P backend

(13)

Hadoop Status

•  Still under active development
•  Current stable release: 0.20.2 (per the official Hadoop website)
•  Other well-maintained distributions
    •  Cloudera’s CDH2
    •  Yahoo’s distribution: Hadoop 0.20.10
•  Supported platforms
    •  Linux as the production platform; Win32 as a development platform
•  Getting started (also the Lab 1 task)
    •  Download a stable Hadoop release
    •  Set up a single-node Hadoop installation
    •  Try out the HDFS operations
    •  Read the WordCount example code and run your first MR job on Hadoop

(14)

Introduction to Google App Engine (GAE)

•  IaaS: Infrastructure as a Service
•  PaaS: Platform as a Service
•  SaaS: Software as a Service

#1: The technology that drives PaaS

#2: Application development on top of a PaaS platform

(15)

What is Google App Engine

•  A PaaS platform for hosting web applications in Google-managed data centers.
•  Released in April 2008 with Python support.
•  Java support added in May 2009.

Google App Engine + Java Language + Google Web Toolkit = Google App Engine for Java

(16)

A Traditional Scalable Website

(17)

A GAE Scalable Website

(18)

GAE Advantages

•  Easy to use, scale and manage

•  Run your application on Google’s infrastructure

•  Forget the worries of managing your own servers

•  Focus on developing features for your web application and let Google manage the rest

•  No server restarts, no network issues

(19)


GAE Architecture

(20)

GAE Java Runtime Environment

•  Java 6 VM

•  Servlet 2.5 Container

•  HTTP Session support (must be enabled explicitly)

•  JDO/JPA for Datastore API

•  JSR 107 for Memcache API

•  javax.mail for Mail API

•  java.net.URLConnection for URLFetch API

http://code.google.com/appengine/docs/java/runtime.html

(21)

Java Standards on GAE

http://code.google.com/appengine/docs/java/runtime.html

(22)

Datastore API

•  Stores and manipulates application data

•  Based on Bigtable

•  Not a relational database

•  GQL (Google Query Language)

•  Accessed through JDO/JPA

http://code.google.com/appengine/docs/java/datastore/

(23)


Memcache

•  Faster than the Datastore
•  Stored in memory rather than on disk
•  Arbitrary key-value pair mapping
•  Implements the JCache interface
•  1MB limit per entry
•  Free quota: 8.6M/day, 800 requests/sec

http://code.google.com/appengine/docs/java/memcache/

(24)

Users & Authentication

•  @gmail.com address
•  Apps for Domain
•  Admin Privileges

(25)


URLFetch

•  Loads external URLs
•  Asynchronous support
•  HTTP/HTTPS
•  Maximum response time: 10 seconds
•  Maximum response size: 1MB

http://code.google.com/appengine/docs/java/urlfetch/
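
The two limits above translate naturally into a bounded fetch: give up if the remote side is too slow, and refuse bodies above a size cap. The sketch below applies both with Python's standard `urllib.request` rather than the GAE URLFetch API; the function name `fetch_limited` is hypothetical. Note that the `timeout` argument applies to each blocking network operation, so it only approximates a total deadline.

```python
import urllib.request

DEADLINE_SECONDS = 10              # maximum response time quoted above
MAX_RESPONSE_BYTES = 1024 * 1024   # maximum response size quoted above


def fetch_limited(url: str, deadline: float = DEADLINE_SECONDS,
                  max_bytes: int = MAX_RESPONSE_BYTES) -> bytes:
    """Fetch a URL, failing if the body is larger than max_bytes."""
    with urllib.request.urlopen(url, timeout=deadline) as response:
        body = response.read(max_bytes + 1)  # read one extra byte to detect overflow
    if len(body) > max_bytes:
        raise ValueError(f"response exceeds {max_bytes} bytes")
    return body


# A data: URL keeps the example offline; an http(s) URL is used the same way.
print(fetch_limited("data:text/plain,hello"))   # b'hello'
```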

(26)

Even More…

•  Datastore – database storage and operations

•  Memcache API – high performance in-memory key-value cache

•  User Accounts – using Google accounts for authentication

•  URLFetch – invoking external URLs

•  Mail – sending mail from your application

•  XMPP – sending/receiving XMPP-compatible instant messages

•  Task Queues – for invoking background processes

•  Images – for image manipulation

•  Cron Jobs – tasks scheduled to run at defined times

http://code.google.com/appengine/docs/java/apis.html

(27)

Who is using GAE?

http://code.google.com/appengine/casestudies.html

(28)

GAE Demo

Demo site: http://shen-ma.appspot.com/

(29)

How Do You Start

•  The best way to learn is by practice!
•  Follow GAE’s “Getting Started: Java” guide and have your first application online within 2 hours (also a Lab 1 task)
•  Eclipse is recommended as the development IDE; GAE offers a very nice plugin
•  Other GAE examples are available on our course website

(30)

Intro done, ready to get your hands dirty!
