
(1)

Getting “Real”: Real-Time Data Integration Patterns and Architectures

Nelson Petracek

Senior Director, Enterprise Technology Architecture, Informatica

Digital Government Institute’s Enterprise Architecture Conference, May 1, 2014, Washington, DC

(2)
(3)

User Expectations

• More agility
• Right time
• Proactive vs. reactive
• Instant
• Trust
• Self-service
• Fresh information
• All data, one place
• Immediate response times
• 100% uptime

(4)

Representative Use Cases

Sensor Monitoring

Customer Interaction

Security

(5)

It is no longer sufficient to view information “after the fact”.

Business demands information sooner and with more accuracy in order to meet competitive and regulatory requirements.

Business needs to respond to “threats” and “opportunities” sooner:

• Reduce decision latency.
• Proactive alerts and notifications.
• Improve TTA (time to answer).

(6)

Traditional Data Management Approaches

Store → Analyze → Act (Data Integration → EDW → BI)

Valuable for:
• Reporting
• Historical activity
• Strategic analysis

(7)

The Challenges with Traditional Approaches

Store → Analyze → Act

Takes too long to deliver what is needed.

Lots of “wait” and “waste” in the process.

No common, trusted data access.

Information is missing or is stale / delayed.

Too much “decision latency”.

(8)

Next Generation Data Integration

(9)

A Shift in Thinking is Needed…

Need to shift from building large, monolithic applications to smaller sets of distributed “micro-applications” based on the principles of “Reactive Applications”*.

Resilient

Scalable

Event Driven

Responsive

Move away from a “store first” approach; provide the ability to process event data as it arrives.

Focus on hybrid architectures that facilitate both batch and real-time processing.

(10)

Reactive Applications: Characteristics

Resilient: Able to recover at all levels.

• Utilize fine-grained resilience at the component level.

• “Bulkhead pattern”.
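The “bulkhead pattern” isolates resources per component so that one slow or failing dependency cannot exhaust capacity that other components share. A minimal Python sketch, assuming a hypothetical `Bulkhead` class that caps concurrent calls to one dependency with a non-blocking semaphore and rejects excess calls instead of queuing them (the dependency names are illustrative):

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class BulkheadFullError(RuntimeError):
    """Raised when a compartment has no free capacity."""


class Bulkhead:
    """Hypothetical bulkhead: caps concurrent calls to one dependency."""

    def __init__(self, name: str, max_concurrent: int) -> None:
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn: Callable[[], T]) -> T:
        # Fail fast instead of blocking, so callers of a saturated
        # dependency do not tie up threads needed by other components.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError(f"bulkhead '{self.name}' is full")
        try:
            return fn()
        finally:
            self._slots.release()


# One compartment per downstream dependency (names are illustrative).
bulkheads = {
    "billing": Bulkhead("billing", max_concurrent=2),
    "sensors": Bulkhead("sensors", max_concurrent=8),
}
```

Usage: `bulkheads["billing"].call(lambda: "ok")` returns `"ok"` while a slot is free; once two billing calls are in flight, a third raises `BulkheadFullError` immediately, and calls to `"sensors"` are unaffected.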

Scalable: Avoid contention on shared resources.

• Scale out or up as needed (without rewrites).

• Maintain the programming model as the system is scaled.

Event-Driven: Systems communicate via events.

• Loosely coupled, asynchronous; Amdahl’s Law.

• Efficient use of resources.
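Amdahl’s Law is the usual argument behind this point: if only a fraction p of the work can proceed in parallel, the serialized remainder (for example, time spent blocked on shared resources) bounds the speedup achievable on N processors. The standard formula, with an illustrative value of p chosen for the example:

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}},
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p}

% Illustrative values: p = 0.95, N = 16
S(16) = \frac{1}{0.05 + 0.95/16} = \frac{1}{0.109375} \approx 9.14,
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{0.05} = 20
```

Even with 95% of the work parallelizable, no number of nodes yields more than a 20× speedup, which is why loosely coupled, asynchronous designs aim to shrink the serialized portion rather than only adding hardware.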

Responsive: Honor response-time guarantees regardless of load.

• Provide users with a rich, interactive experience.

• Observable models, event streams, stateful clients.
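One common way to honor a response-time guarantee under load is to bound each call with a deadline and fall back to a degraded answer (for example, a cached value) when the deadline passes. A minimal asyncio sketch; `fetch_live_quote`, the 0.05 s budget, and the cached fallback are illustrative assumptions:

```python
import asyncio


async def fetch_live_quote(delay: float) -> str:
    """Stand-in for a slow downstream call (hypothetical)."""
    await asyncio.sleep(delay)
    return "live"


async def respond_within(budget_s: float, delay: float, fallback: str) -> str:
    # Cancel the slow call once the budget is spent and answer with a
    # degraded but timely result instead of making the user wait.
    try:
        return await asyncio.wait_for(fetch_live_quote(delay), timeout=budget_s)
    except asyncio.TimeoutError:
        return fallback


if __name__ == "__main__":
    print(asyncio.run(respond_within(0.05, delay=0.0, fallback="cached")))  # live
    print(asyncio.run(respond_within(0.05, delay=1.0, fallback="cached")))  # cached
```

The design choice here is to trade freshness for latency: the user always gets an answer within the budget, and the fallback makes the degradation explicit rather than hidden.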

(11)

Sample Architectural Approach: Reactive Applications

(Diagram components, grouped by function:)
• Data sources: operational data (field devices, applications, clickstream, IoT, logs, etc.); various source applications / technologies
• Data stores and consumers: event-based applications, data warehouse, Hadoop / NoSQL, analytics
• Streaming collection: Vibe Data Stream
• Data integration: PowerCenter
• Event processing / streaming analytics: RulePoint
• CDC / data access: CDC, PowerExchange (PWX)
• Real-time stream transport / delivery: Ultra Messaging
• Stream transformation: B2B Data Transformation

(12)

OI System

Proactive rather than reactive actions.

Allows end users to define conditions and rules through self-service capabilities.

Users are “pushed” the information they need, when they need it, in the system where they need it.

(Diagram: Events and Data → OI System → Alerts → Action.)
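The flow on this slide (events and data in, alerts out, with conditions defined by end users) can be sketched as a tiny rule evaluator. This is an illustrative Python sketch, not RulePoint’s API: rules are assumed to be simple user-supplied field/operator/threshold triples, and each matching event produces an alert addressed to whoever should be notified.

```python
import operator
from dataclasses import dataclass
from typing import Any, Dict, List

# Operators an end user may pick from a self-service UI (assumed set).
OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


@dataclass(frozen=True)
class Rule:
    name: str
    field: str
    op: str
    threshold: Any
    notify: str  # where the alert should be pushed, e.g. a team or system


@dataclass(frozen=True)
class Alert:
    rule: str
    notify: str
    event: Dict[str, Any]


def evaluate(event: Dict[str, Any], rules: List[Rule]) -> List[Alert]:
    """Return an alert for every rule the incoming event satisfies."""
    alerts = []
    for rule in rules:
        value = event.get(rule.field)
        if value is not None and OPS[rule.op](value, rule.threshold):
            alerts.append(Alert(rule.name, rule.notify, event))
    return alerts


rules = [
    Rule("overheat", "temp_c", ">", 90, notify="plant-ops"),
    Rule("pressure-drop", "psi", "<", 30, notify="maintenance"),
]
```

Usage: `evaluate({"sensor": "pump-7", "temp_c": 95, "psi": 42}, rules)` returns one alert, for `overheat`, addressed to `plant-ops`. Because rules are data rather than code, end users can add or change them without redeploying the system.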

(13)

Sample “Big Data” Reference Architectures

(Diagrams with a highlighted “Real-Time” component.)

* Source: http://hortonworks.com/hdp/
* Source: http://www.cloudera.com/content/cloudera/en/products-and-services/

(14)

Hybrid Architecture: Batch Plus Real-Time

“Big Data Supply Chain”

Data Sources (Devices, Apps, Clickstream, IoT, logs, etc.)

Batch: Historical Batch Computation
• Map / Reduce, YARN
• Data analytics
• Long-term persistence, high latency
• e.g. Purchase history analysis.

Real-Time: Distributed Real-Time Computation
• Continuous computations
• Streaming analytics / event processing
• Incremental, low latency
• e.g. Sensor / infrastructure monitoring.

Data Targets (Dashboards, BI, …)
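The difference between the two paths can be seen on a single metric. In this illustrative Python sketch (the purchase amounts are made up), the batch path recomputes an average over the full history each time it runs, while the real-time path maintains the same result incrementally with constant work per event:

```python
from typing import Iterable


def batch_average(history: Iterable[float]) -> float:
    """Batch path: rescan all persisted data (high latency, full recompute)."""
    values = list(history)
    return sum(values) / len(values) if values else 0.0


class RunningAverage:
    """Real-time path: update state per event (low latency, incremental)."""

    def __init__(self) -> None:
        self.count = 0
        self.total = 0.0

    def update(self, value: float) -> float:
        self.count += 1
        self.total += value
        return self.total / self.count


purchases = [20.0, 35.0, 5.0, 40.0]
live = RunningAverage()
for amount in purchases:
    current = live.update(amount)  # available immediately after each event

print(current, batch_average(purchases))  # 25.0 25.0
```

Both paths agree on the answer; what differs is when it is available and how much work each new event costs.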

(15)

Stream Collection

Separate from “batch” or “bulk” data loading.

Involves the collection of event data (“streams”) as they occur, from various endpoints, systems, and people.

Multiple options available:

“Micro-batch” or near real-time data integration.

Data integration hub pattern.

Real-time collection.

Data replication, etc.

There are a number of factors to consider when determining the right pattern to use.
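Of these options, “micro-batch” collection is the easiest to show in a few lines: events are buffered and handed off as a small batch when either a size or an age limit is reached. A minimal Python sketch; `max_size`, `max_age_s`, and the `flush` callback are hypothetical parameters, and because the age check runs only when a new event arrives, a production collector would also flush on a timer:

```python
import time
from typing import Any, Callable, List, Optional


class MicroBatcher:
    """Buffer events and hand them off in small batches (hypothetical)."""

    def __init__(self, flush: Callable[[List[Any]], None],
                 max_size: int = 100, max_age_s: float = 1.0) -> None:
        self._flush = flush
        self.max_size = max_size
        self.max_age_s = max_age_s
        self._buffer: List[Any] = []
        self._first_at: Optional[float] = None

    def add(self, event: Any, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        if not self._buffer:
            self._first_at = now
        self._buffer.append(event)
        # Flush on size, or when the oldest buffered event is too old.
        if (len(self._buffer) >= self.max_size
                or now - self._first_at >= self.max_age_s):
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._flush(batch)


batches: List[List[Any]] = []
mb = MicroBatcher(batches.append, max_size=3, max_age_s=5.0)
for i, t in enumerate([0.0, 0.1, 0.2, 0.3, 9.0]):
    mb.add(i, now=t)
mb.flush()
print(batches)  # [[0, 1, 2], [3, 4]]
```

The first batch is cut by size; the second is cut by age, since event 3 had waited more than five seconds when event 4 arrived. Smaller limits move this pattern closer to true real-time collection at the cost of more, smaller deliveries.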

(16)

Stream Collection: Replication

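(Diagram: a Source System and a Target System, each with a Server Manager, administered from a web (http://) Console. Extract performs high-speed extraction on the source; Apply performs high-speed parallel apply on the target via SQL Apply, Merge Apply, and Audit Apply. Intermediate files, checkpoints, and committed checkpoints sit between the two.)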

Utilize replication beyond simply “copying” data from one data store to another.

Event-enable back-end data stores.

Non-intrusively detect changes in data and publish those changes to one or more targets.

Real-time delivery of the latest data changes to target systems.
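The replication flow in the diagram (extract changes, apply them, then commit a checkpoint) can be illustrated with an in-memory change log. This Python sketch is a simplification under stated assumptions, not the API of PowerExchange CDC or any other product: changes are assumed to arrive as ordered log entries with a sequence number, each target applies inserts, updates, and deletes, and the checkpoint advances only after every target has applied a change, so a restart resumes where it left off.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Change:
    seq: int          # position in the source's change log
    op: str           # "insert", "update" or "delete"
    key: str
    row: Any = None


class Replicator:
    """Hypothetical log-based replicator with a committed checkpoint."""

    def __init__(self, subscribers: List[Callable[[Change], None]]) -> None:
        self.checkpoint = 0              # last sequence number fully applied
        self.subscribers = subscribers   # one or more targets

    def poll(self, change_log: List[Change]) -> int:
        applied = 0
        for change in change_log:
            if change.seq <= self.checkpoint:
                continue                 # already delivered before a restart
            for deliver in self.subscribers:
                deliver(change)
            self.checkpoint = change.seq # commit only after every target applied it
            applied += 1
        return applied


target: Dict[str, Any] = {}


def apply_to_target(change: Change) -> None:
    if change.op == "delete":
        target.pop(change.key, None)
    else:                                # insert and update are both upserts here
        target[change.key] = change.row


log = [Change(1, "insert", "c1", {"tier": "gold"}),
       Change(2, "update", "c1", {"tier": "platinum"}),
       Change(3, "insert", "c2", {"tier": "silver"}),
       Change(4, "delete", "c2")]
rep = Replicator([apply_to_target])
print(rep.poll(log), target)  # 4 {'c1': {'tier': 'platinum'}}
print(rep.poll(log))          # 0
```

Reading an ordered change log, rather than querying source tables, is what keeps detection non-intrusive; the second poll delivers nothing because the checkpoint already covers every change.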

(17)

Stream Collection: Data Integration Hub Pattern

Eliminate point-to-point collection / delivery interfaces.

Provide a location-independent mechanism for data producers (and consumers) to “talk” to one another.

“Publish and Subscribe”

Manage data delivery impedance mismatches.

Provide self-service capabilities.

Centralize data quality, masking, transformation logic.
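The publish-and-subscribe core of the hub pattern can be sketched in a few lines: producers publish to a named topic without knowing who consumes it, and consumers subscribe without knowing who produces. In this illustrative Python sketch, the `DataHub` class, the topic name, and the per-topic transform (standing in for centralized masking logic) are assumptions; the hub also retains published records so a late subscriber can catch up on data pushed earlier.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Handler = Callable[[Any], None]


class DataHub:
    """Hypothetical in-process hub: topics decouple producers from consumers."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)
        self._retained: Dict[str, List[Any]] = defaultdict(list)
        self._transforms: Dict[str, Callable[[Any], Any]] = {}

    def set_transform(self, topic: str, fn: Callable[[Any], Any]) -> None:
        # Centralized logic (e.g. masking) applied once, for every consumer.
        self._transforms[topic] = fn

    def publish(self, topic: str, record: Any) -> None:
        fn = self._transforms.get(topic)
        record = fn(record) if fn else record
        self._retained[topic].append(record)
        for handler in self._subscribers[topic]:
            handler(record)

    def subscribe(self, topic: str, handler: Handler,
                  replay: bool = False) -> None:
        if replay:                       # catch up on previously published data
            for record in self._retained[topic]:
                handler(record)
        self._subscribers[topic].append(handler)


hub = DataHub()
hub.set_transform("customers", lambda r: {**r, "ssn": "***-**-" + r["ssn"][-4:]})
hub.publish("customers", {"id": 1, "ssn": "123-45-6789"})

crm: List[Any] = []
hub.subscribe("customers", crm.append, replay=True)
hub.publish("customers", {"id": 2, "ssn": "987-65-4321"})
print(crm)
```

The CRM consumer receives both records, masked, even though it subscribed after the first was published; adding another consumer requires no change to any producer, which is how the hub replaces point-to-point interfaces.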

(18)
(19)

Stream Collection: Distributed Agents

Distribute collection across thousands of endpoints.

Perform filtering, transformation, etc. “close to the source”.

Focus on daemon-less or broker-less designs for improved performance and scalability.

Provide varying qualities of service (streaming, guaranteed, etc.).

Allow for dynamic configuration.

(Diagram: Sources → Stream Nodes → Targets.)
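Filtering “close to the source” means an agent discards or reshapes events before they cross the network. An illustrative Python sketch of such an agent; the `LEVEL component message` log format, the `ERROR`/`WARN` filter, and the field names are assumptions made for the example:

```python
import json
from typing import Dict, Iterable, Iterator, Optional


def parse(line: str) -> Optional[Dict[str, str]]:
    """Turn 'LEVEL component message' into a record; drop malformed lines."""
    parts = line.strip().split(" ", 2)
    if len(parts) < 3:
        return None
    level, component, message = parts
    return {"level": level, "component": component, "message": message}


def edge_agent(lines: Iterable[str], host: str) -> Iterator[str]:
    """Filter and transform at the edge, forwarding only what targets need."""
    for line in lines:
        record = parse(line)
        if record is None or record["level"] not in {"ERROR", "WARN"}:
            continue                      # filtered locally, never transmitted
        record["host"] = host             # enrich with local context
        yield json.dumps(record, sort_keys=True)


raw = ["INFO web request served",
       "ERROR db connection refused",
       "garbage",
       "WARN cache hit ratio low"]
for payload in edge_agent(raw, host="edge-01"):
    print(payload)
```

Only two of the four lines leave the host; across thousands of endpoints, this kind of local reduction is what keeps central collection tractable.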

(20)

Stream Collection: Distributed Agents with Collectors

(Diagram: Agents perform edge data filtering and processing; streaming data collection and data transfer flow through Local Hubs, Regional Hubs, and a Central Hub; data is delivered to Event Processing, Data Integration, HDFS, EDW, and Real-Time Actions.)
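A tiered layout lets each level reduce data before forwarding it. As a hedged illustration, the Python sketch below assumes each agent pre-aggregates event counts per type and each hub merges the partial results of its children; the hub names and topology are made up:

```python
from collections import Counter
from typing import Iterable, List, Union


def agent_summary(events: Iterable[str]) -> Counter:
    """Edge agent: collapse raw events into per-type counts."""
    return Counter(events)


class Hub:
    """A collector that merges summaries from agents or lower-tier hubs."""

    def __init__(self, name: str, children: List[Union["Hub", Counter]]) -> None:
        self.name = name
        self.children = children

    def summary(self) -> Counter:
        total: Counter = Counter()
        for child in self.children:
            total.update(child.summary() if isinstance(child, Hub) else child)
        return total


a1 = agent_summary(["login", "login", "error"])
a2 = agent_summary(["error"])
a3 = agent_summary(["login", "purchase"])
local_east = Hub("local-east", [a1, a2])
local_west = Hub("local-west", [a3])
regional = Hub("regional", [local_east, local_west])
central = Hub("central", [regional])
print(central.summary())
```

Counting is used here because partial counts merge by simple addition; metrics that do not merge this way (such as exact medians) need different summaries at the edge.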

(21)

Event Streaming Analytics

Execute logic against real-time streams.

Utilize streaming language constructs.

Logic may be executed at a point in time or over time.

Temporal reasoning.
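Logic executed “over time” usually means a window over event time. A minimal Python sketch of a sliding time window, assuming events arrive in timestamp order and using a hypothetical rule (fire when more than `limit` failed logins for one user fall within `window_s` seconds); late or out-of-order events are not handled:

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple


class SlidingWindowCounter:
    """Count events per key within the last `window_s` seconds."""

    def __init__(self, window_s: float, limit: int) -> None:
        self.window_s = window_s
        self.limit = limit
        self._times: Dict[str, Deque[float]] = defaultdict(deque)

    def observe(self, key: str, ts: float) -> bool:
        times = self._times[key]
        times.append(ts)
        # Evict events that have slid out of the window (ts - window_s, ts].
        while times and times[0] <= ts - self.window_s:
            times.popleft()
        return len(times) > self.limit   # True means the pattern fired


detector = SlidingWindowCounter(window_s=60.0, limit=3)
failed_logins: List[Tuple[str, float]] = [
    ("alice", 0.0), ("alice", 10.0), ("bob", 15.0),
    ("alice", 20.0), ("alice", 30.0), ("alice", 95.0),
]
fired = []
for user, ts in failed_logins:
    if detector.observe(user, ts):
        fired.append((user, ts))
print(fired)  # [('alice', 30.0)]
```

The fourth failure for `alice` within 60 seconds fires the rule; by t = 95 the earlier failures have slid out of the window, so the fifth does not.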

Join or merge multiple streams together for real-time pattern recognition, correlation, etc. across data sources.

Timely and contextual.

Augment real-time streams with historical context.
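Augmenting a stream with historical context is typically a lookup join between each live event and stored reference data. An illustrative Python sketch; the `customer_history` table, its fields, and the “unusual order” rule are assumptions for the example, standing in for data that would come from a warehouse or Hadoop store:

```python
from typing import Any, Dict, Iterable, Iterator

# Historical context, e.g. loaded periodically from the warehouse (assumed).
customer_history: Dict[str, Dict[str, Any]] = {
    "c1": {"avg_order": 40.0, "segment": "loyal"},
    "c2": {"avg_order": 15.0, "segment": "new"},
}


def enrich(events: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Join each live event with its customer's history as it arrives."""
    for event in events:
        context = customer_history.get(event["customer"], {})
        avg = context.get("avg_order")
        yield {
            **event,
            "segment": context.get("segment", "unknown"),
            # Contextual signal: order far above this customer's norm.
            "unusual": avg is not None and event["amount"] > 3 * avg,
        }


live = [{"customer": "c1", "amount": 55.0},
        {"customer": "c2", "amount": 60.0},
        {"customer": "c9", "amount": 10.0}]
for row in enrich(live):
    print(row)
```

The same 60.0 order would be ordinary for some customers; it is flagged for `c2` only because of that customer’s history, which is the “timely and contextual” combination the slide describes.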

Distributed Real-Time Computation

(22)

Event Delivery

Data Integration Hub
• Allow data consumers to “subscribe” to data previously pushed to the hub.
• Batch + near real-time feed.

Data Integration
• Feed content into back-end systems through application interfaces.
• Batch + near real-time feed.

Streaming Delivery
• Push content to end applications, dashboards, etc.
• Content may consist of derived or raw events.
• Near real-time + real-time feed.

(23)

Lambda Architecture

* Source: http://jameskinley.tumblr.com/post/37398560534/the-lambda-architecture-principles-for-architecting

Data is distributed to both a “Batch Layer” and a “Speed Layer” for processing.

The Batch Layer manages the append-only master set of raw data.

The “Serving Layer” indexes batch views for low-latency queries.

The “Speed Layer” covers recent data not yet in the Batch Layer.

Queries merge results from the batch and real-time views.
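The query-time merge can be sketched with page-view counts. In this illustrative Python sketch (the events and cutoff are made up), the batch view holds counts computed from the master data set up to the last batch run, the speed layer counts only events that arrived after that cutoff, and a query adds the two:

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[str, int]  # (page, timestamp)


def batch_view(master: Iterable[Event], cutoff: int) -> Counter:
    """Batch layer: recompute from the append-only master data set."""
    return Counter(page for page, ts in master if ts < cutoff)


class SpeedLayer:
    """Speed layer: incrementally count events the batch view has not seen."""

    def __init__(self, cutoff: int) -> None:
        self.cutoff = cutoff
        self.view: Counter = Counter()

    def on_event(self, page: str, ts: int) -> None:
        if ts >= self.cutoff:
            self.view[page] += 1


def query(page: str, batch: Counter, speed: SpeedLayer) -> int:
    # Serving-layer lookup merged with the real-time view.
    return batch[page] + speed.view[page]


master = [("/home", 1), ("/home", 2), ("/pricing", 3), ("/home", 7)]
batch = batch_view(master, cutoff=5)   # last batch run covered ts < 5
speed = SpeedLayer(cutoff=5)
for page, ts in master:
    speed.on_event(page, ts)           # every event also reaches the speed layer
print(query("/home", batch, speed))    # 3
```

When the next batch run covers the newer events, the cutoff moves forward and the speed layer’s older counts can be discarded, which keeps the real-time view small.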

(24)
(25)
(26)

Questions?

www.operationalintelligence.me
