Modernize your data warehouse

Academic year: 2021

(1)

Isabel Huerga Ayza

Senior Developer Advocate

(2)

Agenda

• Cloud data warehouses

(3)

Benefits of a cloud data warehouse

No infrastructure costs and pay-as-you-go

Increases in productivity

Scale, elasticity, and flexibility

Get insights

(4)
(5)

Needed a new platform to support general operations and new analytics service

Moved to AWS, using a wide range of services

Can scale system faster in response to unanticipated spikes in traffic

Receives query results in seconds compared to 30 minutes under old system

Obtains deeper insights into billions of data points, using information to deliver better services

"Using the new AWS tools, we can extract much finer-grained data points based on millions of donations and billions of visits, and then use that information to provide a better platform for our visitors."

Richard Atkinson, Chief Information Officer, JustGiving

JustGiving is a major online platform that supports charitable giving. The organization is based in London.

(6)

Amazon Redshift Benefits

Massively parallel processing

Columnar data storage

Result caching

Usage-based pricing

Predictable costs

Integrated catalog & security

Exabyte data lake querying

AWS-grade security

Certifications such as SOC, PCI DSS, ISO, FedRAMP, HIPAA

Easy to provision & manage

Automated administrative tasks

Virtually unlimited
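The "Columnar data storage" item above can be illustrated with a small sketch. This is a toy model rather than Redshift's actual storage format, and the sample rows and column names are made up for illustration. It only shows why an aggregate over one column touches less data when values are stored column by column.

```python
from typing import Dict, List

# Hypothetical sample rows; the column names are illustrative only.
rows = [
    {"donation_id": 1, "charity": "A", "amount": 10.0},
    {"donation_id": 2, "charity": "B", "amount": 25.0},
    {"donation_id": 3, "charity": "A", "amount": 5.0},
]


def to_columns(records: List[dict]) -> Dict[str, list]:
    """Pivot row-oriented records into one list per column."""
    columns: Dict[str, list] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row-oriented layout: every full record is visited to read one field.
total_row_store = sum(record["amount"] for record in rows)

# Column-oriented layout: only the "amount" column is read.
total_column_store = sum(columns["amount"])
```

Both totals are the same. The difference is how much data has to be read to compute them, which matters for analytic queries that scan a few columns of very wide tables.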

(7)

Amazon Redshift enables you to have a lake house approach

Customers moving to data lake architectures

Data warehouse (business data)

Data lake (event data)

(8)

Amazon Redshift federated query

Queries on Amazon RDS and Amazon Aurora PostgreSQL databases

Analytics on live data without data movement

Unified analytics across data warehouse, data lake & operational databases

Flexible and easy way to ingest data

Performant and secure access to data
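As a sketch of what setting up a federated query might look like, the helper below assembles a `CREATE EXTERNAL SCHEMA ... FROM POSTGRES` statement. The statement shape follows my reading of the Redshift SQL reference and should be checked against the current documentation. The function name and every value passed to it (schema names, host, ARNs) are hypothetical placeholders.

```python
def federated_schema_sql(
    local_schema: str,
    database: str,
    remote_schema: str,
    host: str,
    port: int,
    iam_role_arn: str,
    secret_arn: str,
) -> str:
    """Build the DDL that maps a PostgreSQL schema into Redshift.

    Values are interpolated directly, so they are assumed to come from
    trusted configuration, not from user input.
    """
    return (
        f"CREATE EXTERNAL SCHEMA {local_schema}\n"
        f"FROM POSTGRES\n"
        f"DATABASE '{database}' SCHEMA '{remote_schema}'\n"
        f"URI '{host}' PORT {port}\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"SECRET_ARN '{secret_arn}';"
    )


# Placeholder values for illustration only.
sql = federated_schema_sql(
    local_schema="ops",
    database="orders_db",
    remote_schema="public",
    host="example-aurora-host.example.com",
    port=5432,
    iam_role_arn="arn:aws:iam::123456789012:role/example-role",
    secret_arn="arn:aws:secretsmanager:us-east-1:123456789012:secret:example",
)
```

Once such a schema exists, tables in the operational database can be referenced as `ops.<table>` in ordinary Redshift SQL and joined with local tables, which is what "analytics on live data without data movement" refers to.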

(9)

Redshift Cluster Architecture

• Leader node

  • SQL endpoint

  • Stores metadata

  • Coordinates parallel SQL processing & ML optimizations

  • Leader node is free with 2+ nodes

• Compute nodes

  • Local, columnar storage

  • Executes queries in parallel

  • Load, unload, backup, restore from S3

• Amazon Redshift Spectrum nodes

  • Execute queries directly against data lake

[Diagram: SQL clients / BI tools connect over JDBC/ODBC to the leader node, which sits in front of compute nodes 1 to N. The compute nodes load, unload, back up, and restore against Amazon S3. Amazon Redshift Spectrum nodes load and query data in Amazon S3.]
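The leader/compute split described on this slide can be sketched as a toy scatter-gather aggregation. This is only an illustration of the idea, not Redshift's execution engine. The hash-based distribution, the thread pool standing in for compute nodes, and all function names are assumptions made for the example.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[str, float]


def slice_for(key: str, num_slices: int) -> int:
    """Stable hash so the same key always lands on the same slice."""
    return zlib.crc32(key.encode("utf-8")) % num_slices


def distribute(rows: List[Row], num_slices: int) -> List[List[Row]]:
    """Spread rows across slices by hashing the distribution key."""
    slices: List[List[Row]] = [[] for _ in range(num_slices)]
    for key, value in rows:
        slices[slice_for(key, num_slices)].append((key, value))
    return slices


def partial_sum(slice_rows: List[Row]) -> Dict[str, float]:
    """Work done independently on one slice: a local SUM ... GROUP BY."""
    totals: Dict[str, float] = defaultdict(float)
    for key, value in slice_rows:
        totals[key] += value
    return dict(totals)


def leader_sum_by_key(rows: List[Row], num_slices: int = 4) -> Dict[str, float]:
    """Leader role: fan the work out, then merge the partial results."""
    slices = distribute(rows, num_slices)
    with ThreadPoolExecutor(max_workers=num_slices) as pool:
        partials = list(pool.map(partial_sum, slices))
    merged: Dict[str, float] = defaultdict(float)
    for partial in partials:
        for key, value in partial.items():
            merged[key] += value
    return dict(merged)


result = leader_sum_by_key([("a", 1.0), ("b", 2.0), ("a", 3.0), ("c", 4.0)])
```

Because rows with the same key hash to the same slice, each partial result is already complete for its keys, and the merge step on the leader stays cheap.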

(10)

Amazon Redshift analytics

RA3 (new): Solid-state disks + Amazon S3, Amazon Redshift Managed Storage (RMS)

Dense compute (DC2): Solid-state disks

Dense storage (DS2): Magnetic disks

Instance type   Disk type   Size              Memory   # CPUs   # Slices
RA3 4xlarge     RMS         Scales to 64 TB   96 GB    12       4
RA3 16xlarge    RMS         Scales to 64 TB   384 GB   48       16
DC2 large       SSD         160 GB            16 GB    2        2
DC2 8xlarge     SSD         2.56 TB           244 GB   32       16
DS2 xlarge      Magnetic    2 TB              32 GB    4        2
DS2 8xlarge     Magnetic    16 TB             244 GB   36       16

A Redshift cluster can have up to 128 ds2.8xlarge or ra3.16xlarge nodes (i.e., 2 PB or 8 PB of local or managed storage, respectively) and can support exabytes of data with its Redshift Lakehouse feature
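The 2 PB and 8 PB figures follow from the per-node sizes in the node table and the 128-node limit. A minimal sketch of the arithmetic, using only numbers quoted on the slides (the function and constant names are mine):

```python
MAX_NODES = 128  # per-cluster node limit quoted on the slide

# Per-node storage in TB, taken from the node table.
STORAGE_TB_PER_NODE = {"ds2.8xlarge": 16, "ra3.16xlarge": 64}


def cluster_storage_tb(node_type: str, nodes: int = MAX_NODES) -> int:
    """Total cluster storage in TB for a given node type and count."""
    return STORAGE_TB_PER_NODE[node_type] * nodes


ds2_tb = cluster_storage_tb("ds2.8xlarge")   # 128 * 16 TB
ra3_tb = cluster_storage_tb("ra3.16xlarge")  # 128 * 64 TB

ds2_pb_binary = ds2_tb / 1024   # PB when 1 PB = 1024 TB
ra3_pb_binary = ra3_tb / 1024   # PB when 1 PB = 1024 TB
ra3_pb_decimal = ra3_tb / 1000  # PB when 1 PB = 1000 TB
```

With 1 PB taken as 1024 TB the totals are exactly 2 PB and 8 PB. With 1 PB taken as 1000 TB the RA3 total is 8.192 PB, which appears to be where the "8.2 PB" figure on the managed storage slide comes from.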

(11)

Evolving Architecture

Amazon Redshift Managed Storage

• Pay separately for storage and compute

• Large, high-speed SSD-backed cache

• Automatic scaling (up to 64 TB/instance)

• Supports up to 8.2 PB of cluster storage

[Diagram: SQL clients / BI tools connect over JDBC/ODBC to the leader node and compute nodes, backed by Amazon Redshift Managed Storage (exabyte-scale object storage).]

(12)

Local storage has enabled the fastest cloud-based data warehouses

Shared storage enables flexibility at the cost of performance

What if we could get the benefits of both without a network performance penalty?

[Diagram: four compute nodes]

(13)

[Diagram: Redshift clusters (compute clusters), AQUA nodes, and Amazon Redshift Managed Storage.]

New distributed & hardware-accelerated processing layer

With AQUA, Amazon Redshift is up to 10x faster than any other cloud data warehouse, at no extra cost

AQUA nodes use custom AWS-designed analytics processors to make operations (compression, encryption, filtering, and aggregations) faster than on traditional CPUs

Available in preview with RA3

No code changes required

(14)

Node Scaling

Modify node type, number of nodes, or both

Execute immediately or on a schedule
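A resize of this kind can also be requested programmatically. The sketch below builds the arguments for the boto3 Redshift client's `resize_cluster` call. The parameter names reflect my understanding of that API and should be verified against the current boto3 reference. The helper names and the cluster identifier are hypothetical, and the scheduling option is not covered.

```python
from typing import Any, Dict, Optional


def build_resize_request(
    cluster_id: str,
    node_type: Optional[str] = None,
    number_of_nodes: Optional[int] = None,
    classic: bool = False,
) -> Dict[str, Any]:
    """Assemble keyword arguments for a resize: node type, count, or both."""
    if node_type is None and number_of_nodes is None:
        raise ValueError("Specify node_type, number_of_nodes, or both")
    request: Dict[str, Any] = {"ClusterIdentifier": cluster_id, "Classic": classic}
    if node_type is not None:
        request["NodeType"] = node_type
    if number_of_nodes is not None:
        request["NumberOfNodes"] = number_of_nodes
    return request


def resize_now(request: Dict[str, Any]) -> Dict[str, Any]:
    """Send the request immediately; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without the AWS SDK

    client = boto3.client("redshift")
    return client.resize_cluster(**request)


# Hypothetical cluster name and target configuration.
request = build_resize_request(
    "example-cluster", node_type="ra3.4xlarge", number_of_nodes=4
)
```

Keeping the request builder separate from the network call makes the intended change easy to inspect or log before anything is sent to AWS.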

(15)

Amazon Redshift concurrency scaling

Scale out to multiple Amazon Redshift clusters from a single endpoint in seconds

Support virtually unlimited concurrent users and queries while maintaining SLAs

Per-second billing for additional clusters used

One hour of free usage per day (free for 97% of clusters)
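To make the billing bullets concrete, here is a simplified estimate of billable concurrency scaling time. It assumes the free hour applies per day and does not carry over between days, which is a simplification of how credits may actually accrue. The per-second price is a hypothetical input, not a published rate.

```python
from typing import Iterable

FREE_SECONDS_PER_DAY = 3600  # the one free hour per day from the slide


def billable_seconds(daily_usage_seconds: Iterable[int]) -> int:
    """Seconds charged after subtracting the daily free hour (no carry-over)."""
    return sum(max(0, used - FREE_SECONDS_PER_DAY) for used in daily_usage_seconds)


def estimated_charge(daily_usage_seconds: Iterable[int], price_per_second: float) -> float:
    """Per-second billing: billable seconds times a hypothetical rate."""
    return billable_seconds(daily_usage_seconds) * price_per_second


# Three days of usage: 20 minutes, 60 minutes, and 90 minutes.
usage = [1200, 3600, 5400]
charged_seconds = billable_seconds(usage)
```

Only the third day exceeds the free hour, so only its extra 30 minutes would be billed under these assumptions.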

(16)

[Diagram: Amazon Redshift cluster with a leader node, compute nodes, and a workload manager. Concurrency query slots = Auto. Queues: BI/Analytics (Priority: Normal), ETL (Priority: High), Data science (Priority: Low).]

Auto WLM: Dynamically manages concurrency and memory to optimize throughput and performance

Auto WLM priorities: Influence workload performance, with intelligent algorithms to keep low-priority queries running

SQA (short query acceleration): Prioritizes selected short queries in a dedicated space

QMR (query monitoring rules): Define actions based on thresholds

Efficient sharing of cluster between users & business groups
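The idea behind query monitoring rules, "define actions based on thresholds", can be sketched as a small rule evaluator. This is a toy model, not the WLM configuration format. The class and function names are mine, the metric names and thresholds are illustrative, and the "most severe action wins" ordering reflects my understanding of the documented behaviour.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

OPERATORS = {">": operator.gt, "<": operator.lt, "=": operator.eq}
SEVERITY = ["log", "hop", "abort"]  # least to most severe


@dataclass(frozen=True)
class Rule:
    name: str
    predicates: Tuple[Tuple[str, str, float], ...]  # (metric, operator, threshold)
    action: str  # one of SEVERITY


def matches(rule: Rule, metrics: Dict[str, float]) -> bool:
    """A rule triggers only when every one of its predicates holds."""
    return all(
        metric in metrics and OPERATORS[op](metrics[metric], threshold)
        for metric, op, threshold in rule.predicates
    )


def most_severe_action(rules: List[Rule], metrics: Dict[str, float]) -> Optional[str]:
    """Return the most severe action among triggered rules, or None."""
    triggered = [rule.action for rule in rules if matches(rule, metrics)]
    if not triggered:
        return None
    return max(triggered, key=SEVERITY.index)


# Illustrative rules; metric names and thresholds are assumptions.
rules = [
    Rule("long_running", (("query_execution_time", ">", 60),), "log"),
    Rule(
        "huge_scan",
        (("query_execution_time", ">", 60), ("scan_row_count", ">", 1_000_000)),
        "abort",
    ),
]

action = most_severe_action(
    rules, {"query_execution_time": 120, "scan_row_count": 5_000_000}
)
```

A query that only runs long would be logged, while one that also scans an unusually large number of rows would be aborted.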

(17)

Machine learning based automatic optimizations

Automates table maintenance

Optimizes for peak performance as data and workloads scale

Leverages machine learning

(18)
(19)

AWS migration tooling

AWS Schema Conversion Tool (AWS SCT) converts your commercial database and data warehouse schemas to open-source engines or AWS native services, such as Amazon Aurora and Amazon Redshift

AWS Database Migration Service (AWS DMS) easily and

(20)

AWS SCT

Features

Create assessment reports for homogeneous/heterogeneous migrations

Convert database schema

Convert data warehouse schema

Convert embedded application code

Code browser that highlights places where manual edits are required

Secure connections to your databases with SSL

Service substitutions/ETL modernization to AWS Glue

Migrate data to data warehouses using SCT data extractors

The AWS SCT helps automate database schema and code conversion tasks when migrating from source to target database engines

[Diagram: Source DB, AWS SCT, Target DB]

(21)

AWS SCT data extractors

Extract data from your data warehouse and migrate to Amazon Redshift

• Extracts data through local migration agents

• Data is optimized for Amazon Redshift and saved in local files

(22)

AWS DMS

Migrating databases to AWS

Migrate between on-premises and AWS

Migrate between databases

Automated schema conversion

(23)
(24)

Thank you!

Isabel Huerga Ayza
