• No results found

Design Patterns for Large Scale Data Movement. Aaron Lee

N/A
N/A
Protected

Academic year: 2021

Share "Design Patterns for Large Scale Data Movement. Aaron Lee"

Copied!
25
0
0

Loading.... (view fulltext now)

Full text

(1)

Aaron Lee

[email protected]

Design Patterns for

Large Scale Data

(2)

Data Movement Patterns

o

The right solution depends on

the problem you’re solving

Update rates?

Fan-in or fan-out?

Payload size?

Guarantee required?

Real-time or intermittent?

Any weird networks?

Acceptable latency?

(3)

Latency Required

o

Some not sensitive at all

‐ Batch updates

o

Seconds often good enough

‐ Database sync ‐ User interfaces

o

Others measure in

milli- or micro-seconds

‐ Algo trading ‐ Industrial controls Required Latency Not Critical Low as Possible

(4)

Network Distance

o

Co-location for max speed

‐ Minimize speed of light

o

LAN for many apps

‐ 10GigE networks

o

Long distance WAN

‐ Expensive, limited pipes

‐ Creates mismatches with

other networks

Network Distance

Co-location Global

(5)

Number of Messages

o

Few

‐ Batch updates ‐ Simple applications

o

Moderate

‐ Risk management ‐ Order routing

o

Insane

‐ Market data

‐ Click stream analysis

Number of Messages OMG

(6)

Degree of Distribution

o

Point-to-point

o

Fan-out (many subs)

o

Fan-in (many pubs)

o

Mesh

‐ Synching data between

many endpoints Degree of Distribution Millions of Endpoints 1:1

(7)

Message Size

o

Small

‐ Status updates, activity logging events

o

Medium

‐ Orders, product BOMs

o

Large

‐ Batch updates, media files, product catalogs

o

Very different stresses on

system based on message

size and frequency.

Size of Messages Huge Small

(8)

Importance of Delivery Guarantee

o

“Best effort” fine for

some scenarios

o

Others require “once

and only once”

o

Sequence matters for some

o

Some demand failsafe

even in DR scenarios

Delivery Guarantee Importance Very Not

(9)

Other Considerations

o

Message

‐ Format ‐ Protocol ‐ Structured/Unstructured

o

Network

‐ Availability ‐ RTT ‐ Bandwidth cost

o

Robustness

‐ Archival ‐ Caching ‐ Acceptable MTBF ‐ HA switchover times ‐ DR requirements

(10)

Combination of Factors Yields Design Patterns

o

Some attributes tend to

correlate

‐ # of messages and degree of distribution

o

Others usually contradict

‐ Network distance and latency

‐ Guarantee and latency

o

Tradeoffs and creative

solutions

Number of Messages Network Distance Required Latency Delivery Guarantee Importance Degree of Distribution

(11)

Identifying Patterns in Real-World Use Cases

Examples in this section:

Trade Order Flow

Manufacturing Data Sync

Oil and Gas Monitoring

Real Time Sports Betting

Use cases unique,

(12)

Order Flow

o Latency matters, but

not every microsecond

o Usually localized o Continuous, high-rate message flow o Mid-sized messages (1-2Kb) o Messages absolutely must be guaranteed Number of Messages Network Distance Required Latency Delivery Guarantee Importance Degree of Distribution

(13)

Order Flow;

Architecture

Smart Order Router Back Office Applications Management & Monitoring Client

Gateways GatewaysExchange

Disaster Recovery Site Exchanges Clients Slow Subscribers

Message Bus TimeReal Sync

(14)

Order Flow;

Similar Use Cases

o Credit card processing

‐ Long-distance WANs

‐ latency in hundreds of milliseconds

Need a way to correlate which use case is which color on the chart.

Number of Messages Network Distance Required Latency Delivery Guarantee Importance Degree of Distribution o E-commerce ‐ Higher volumes

‐ Higher guarantee required

o Logistics scheduling

‐ Less latency sensitive

(15)

Manufacturing Data Sync

o Geographically distributed

o 100% delivery guarantee

required

o Data rate is use case specific –

will assume lots of medium (< 5K) messages.

o Number of endpoints use case

specific, assume 10

manufacturing locations

Build from the background image on prior slide

Number of Messages Network Distance Required Latency Delivery Guarantee Importance Degree of Distribution Size of Messages

(16)

Manufacturing Data Sync;

Architecture

Applications & Databases Maximizing Bandwidth Fanout at Edge Smart Buffering

(17)

Manufacturing Data Sync;

Similar Use Cases

o Real Time Risk Management

‐ Smaller messages

‐ Latency more important

Number of Messages Network Distance Required Latency Delivery Guarantee Importance Degree of Distribution Size of Messages

o Retail Global Inventory

‐ Messages can be larger

‐ Distribution can be more

o Real Time Financials

‐ Messages larger

‐ Distribution less

(18)

Oil & Gas Pipeline Monitoring

o Wifi, Satellite, proprietary and

other unreliable networks

o Degree of distribution off the

charts. In this case, fan-in.

o Messages usually pretty small,

unless batch

o Latency unimportant

o Level of guarantee use case

specific, assume status

messages (ie. guarantee not

Number of Messages Network Distance Required Latency Delivery Guarantee Importance Degree of Distribution

(19)

Oil & Gas Pipeline Monitoring;

Architecture

Pipeline Sensors Collection Caches Wireless Analytics Engines

Big Data & Databases Unreliable Networks Big Data Loading Message Bus Real Time vs. Delayed Analytics

(20)

Oil & Gas Pipeline Monitoring;

Similar Use Cases

o Smart Grid

‐ Small messages

‐ Massive distribution Number ofMessages

Network Distance Required Latency Delivery Guarantee Importance Degree of Distribution o Transportation Monitoring ‐ Fewer endpoints ‐ Bigger messages

o Retail Point of Sale

‐ More predictable networks

(21)

Real-Time Sports Betting

o Huge message volumes

(in this case fan-out)

o Low level of guarantee for any

one outbound message

o High level of guarantee for

inbound messages

o Tiny messages

o Network is the internet +

mobile carriers

o Latency (beyond network

latency) is important Number of Messages Network Distance Required Latency Delivery Guarantee Importance Degree of Distribution Size of Messages

(22)

Real-Time Sports Betting;

Architecture

Highlight the degree of fan out, connection counts, event logging, real time

analysis for odds adjustment

Mobile Customers Web Customers Streaming Odds Data Clickstream & Marketing Security & Fraud Customer & Betting Apps Odds & Analytics Data Streaming Huge Connection Counts Low Latency Big Data Message Bus

(23)

Real-Time Sports Betting;

Similar Use Cases

o Mobile Social Updates

‐ Latency less important

‐ Distribution far greater Number ofMessages Network Distance Required Latency Delivery Guarantee Importance Degree of Distribution Size of Messages

o Real Time Travel Alerting

‐ Each message more important

‐ Volumes much lower

o Market Data Distribution

‐ Latency even more important

‐ Volumes often much higher

(24)

Number of Messages Network Distance Required Latency Delivery Guarantee Importance Degree of Distribution

(25)

Summary

References

Related documents

 In 2014, the average Medicaid fee-for-service reimbursement rate was 40.7 percent of commercial dental insurance charges for adult dental care services in states that provide

DSBV-broadcast, allows each process to broadcast a binary value, and obtain a set of values such that (1) no value broadcast only by Byzantine processes can belong to the set

When the correct pass code is entered the &#34;Children&#34; mode screen appears.. The following

• Android Auto text message read back may be altered by the phone carrier when the text message is too long.. o Verizon: Text messages are

 SMS (Short Message Service): A text message sent to consumers containing relevant marketing messages.. o Positives: Ability to reach consumers

approach to deal with a long-term problem, and may prematurely lock countries into investments that would inhibit the development of the next generation of technologies that

The chlorine free radical then react with stratospheric ozone to form chlorine monoxide radicals and molecular oxygen.. Reaction of chlorine monoxide radical with atomic

• List the publications required for prudent navigation in the local area including the following ASA minimum requirements:.. o Large scale charts of the area and Chart Number 1 o