Taming the Elephant with Big Data Management. Deep Dive

(1)

Taming the Elephant with Big Data Management

(2)

Big Data Management

(3)

Safe Harbor

The information being provided today is for informational purposes only. The development, release and timing of any Informatica product or functionality described today remain at the sole discretion of Informatica and should not be relied upon in making a purchasing decision. Statements made today are based on currently available information, which is subject to change. Such statements should not be relied upon as a representation, warranty or commitment to deliver specific products or functionality in the future.

(4)

Overview of Data Integration Solutions

Traditional Workloads (PowerCenter)
- Data Warehousing
- Agile BI
- Real-time DI
- Data Migration
- Apps Integration (on-prem)

Next-Gen Workloads (Big Data Management)
- DW Offloading / Optimization
- Data Lakes
- Big Data Analytics
- NoSQL Integration

Cloud & SaaS Workloads (Cloud Data Integration)
- Apps Integration (Hybrid)
- Cloud & Hybrid DI
- DW & Analytics (Cloud DBs)

(5)

Informatica’s Big Data Journey – 2012

2012 – 1st release of Informatica Big Data Edition

1st data integration platform to:
- Natively execute on Hadoop
- Support MapReduce
- Support HDFS/Hive/HBase
- Profile natively on Hadoop

Hadoop 1.0
- MapReduce: Processing & Resource Management
- HDFS: Distributed Storage
(6)

Informatica’s Big Data Journey – 2016

[Architecture diagram. Labels: Informatica Big Data Management; Smart Executor; INFA Engine (Blaze); Spark Core; Tez; Hive on Tez; Hive on Spark; Hive on MapReduce; MapReduce; YARN; HDFS]

- Polyglot computing: MapReduce, Blaze, Tez, Spark
- Multi-distribution support, both on-prem and cloud
- End-to-end big data management

(7)

Big data modes of execution

Run on Informatica node(s)
- Connect to Hadoop sources/targets

Run on Hadoop cluster
- Connect to Hadoop sources/targets
- Connect to non-Hadoop sources/targets

(8)

Why Informatica BDM?

[Diagram. Labels: Business logic; Solution: Informatica mappings; Informatica Big Data Management; Polyglot computing: Informatica Native, SQL Pushdown, Hadoop Pushdown (MapReduce, Tez, Spark, Blaze)]

(9)

Big Data Challenges

36% Obtaining skills and capabilities needed
→ Mapping-based development → PC reuse → SQL to mapping

33% Security, privacy & data quality
→ Kerberos support → Sentry / Ranger support → Data masking, OS profiles → DQ, profiling on Hadoop

26% Integrating multiple data sources
→ PowerExchange → Data Processor → SQOOP

26% Integrating big data technology with existing infrastructure
→ On-prem distro support → Cloud distro support

Source: Gartner

(10)

3 pillars of Informatica Big Data Management

Data Integration
Data Quality & Governance
Data Security

Single, Comprehensive and Integrated Platform for End-to-End Big Data Management

(11)

Universal connectivity

100+ pre-built parsers
200+ pre-built connectors
Out-of-the-box business rules and data standardization

WebSphere MQ, JMS, MSMQ, SAP NetWeaver XI
JD Edwards, Lotus Notes, Oracle E-Business, PeopleSoft
Oracle, DB2 UDB, DB2/400, SQL Server, Sybase
ADABAS, Datacom, DB2, IDMS, IMS
Word, Excel, PDF, StarOffice, WordPerfect, Email (POP, IMAP), HTTP
Informix, Teradata, Netezza, ODBC, JDBC
VSAM, C-ISAM, binary flat files, tape formats…
Web Services, TIBCO, webMethods
Flat files, ASCII reports, HTML, RPG, ANSI, LDAP
EDI-X12, EDI-Fact, RosettaNet, HL7, HIPAA
XML, LegalXML, IFX, cXML
AST, FIX, SWIFT, Cargo IMP, MVR
Salesforce CRM, Force.com, RightNow, NetSuite, ADP, Hewitt, SAP By Design, Oracle OnDemand
Facebook, Twitter, LinkedIn, Kapow
Pivotal, Vertica, Netezza, Teradata, Aster

(12)

Pre-Built Parsers for Industry Standards

Data storage & transport formats: XML, JSON, Parquet, Avro
Industry standard formats: Financial Services, Healthcare, EDI
Organizational formats: Delimited files, PDF, Word, Excel

[Diagram labels: Hadoop Cluster, Informatica IDE]

(13)

SQOOP

- JDBC-based universal connectivity to many sources
- No need to install database clients on the Hadoop cluster to read / write data
- Seamless integration into Informatica mappings
- Integration at both connection and data object level
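
For readers who have not used Sqoop directly, the JDBC-only approach above can be pictured with the standalone Sqoop 1 command line. The following is a minimal sketch, not how BDM invokes Sqoop internally (the slides do not show that). The helper name `build_sqoop_import`, the JDBC URL, the credentials path and the table names are all hypothetical; only the Sqoop flags themselves are standard.

```python
import shlex


def build_sqoop_import(jdbc_url, username, password_file, table, hive_table, mappers=4):
    """Return the argv list for a Sqoop 1 import that lands a table in Hive.

    Only a JDBC URL is needed to reach the source database; no vendor
    client tools appear anywhere in the command.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--hive-import",
        "--hive-table", hive_table,
        "--num-mappers", str(mappers),
    ]


# Hypothetical values, for illustration only.
cmd = build_sqoop_import(
    jdbc_url="jdbc:oracle:thin:@//db.example.com:1521/ORCL",
    username="etl_user",
    password_file="/user/etl_user/.oracle.pw",
    table="STAGED_FLIGHTS",
    hive_table="flights_raw",
)
print(shlex.join(cmd))
```

The command is only printed here; running it would require a Hadoop cluster with Sqoop and the matching JDBC driver available.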

(14)

Profiling on Hadoop

Analyst
- Statistics to identify anomalies
- Value & pattern analysis
- Drill-down analysis
- Multi-tenancy

(15)

Data Quality on Hadoop

Data Quality

Address validation

Parse

Match

Standardize

(16)

Security has many aspects

Layers: Infrastructure, Data, Application

Aspects: Authentication, Authorization, Auditing, Monitoring, Encryption, Data Masking+, Multi-tenancy+

http://blogs.informatica.com/2015/07/24/bigdatasecurity-2/
(17)

Authentication: Kerberos

Industry standard authentication for Hadoop clusters

Informatica BDM supports:
- Kerberos authentication in INFA domains
- Connecting to Kerberos-enabled Hadoop clusters

360° support:
- Client & Server
- Metadata access & Data access

(18)

Blaze Security Integration – Ranger/Sentry

[Diagram. Labels: Informatica node; Hadoop Cluster; Blaze Runtime; Blaze Container; Mapping at runtime (in-memory): Source, Transforms, Target; Optimizer call; Ranger/Sentry; Blaze Executor; HDFS Service / Hive Server 2; HDFS data files]

(19)

Informatica Monitoring


(20)

Informatica Monitoring


(21)

Informatica Monitoring


(22)

Data Masking

- Mask sensitive data while ingesting and processing
- Supports Persistent Data Masking
- 16 different techniques supported, including:
  - SSN
  - Credit Card
  - First & Last names, Emails

Polyglot engine:
- Supported in Native mode
- Supported in Hive mode
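
To make the idea of masking fields during ingestion concrete, here is a minimal sketch of three format-preserving masks. It illustrates the general technique only and does not reproduce any of the 16 Informatica masking techniques; the function names, masking rules and sample record are assumptions made for this example.

```python
import re


def mask_ssn(ssn: str) -> str:
    """Mask a US SSN, keeping only the last four digits."""
    digits = re.sub(r"\D", "", ssn)
    if len(digits) != 9:
        raise ValueError("expected a 9-digit SSN")
    return "XXX-XX-" + digits[-4:]


def mask_credit_card(number: str) -> str:
    """Replace all but the last four digits of a card number with '*'."""
    digits = re.sub(r"\D", "", number)
    if not 13 <= len(digits) <= 19:
        raise ValueError("expected 13 to 19 digits")
    return "*" * (len(digits) - 4) + digits[-4:]


def mask_email(email: str) -> str:
    """Keep the first character of the local part and the full domain."""
    local, sep, domain = email.partition("@")
    if not (sep and local and domain):
        raise ValueError("expected an address of the form local@domain")
    return local[0] + "***@" + domain


# Hypothetical record, masked before it is written to the target.
record = {
    "ssn": "123-45-6789",
    "card": "4111 1111 1111 1111",
    "email": "jane.doe@example.com",
}
masked = {
    "ssn": mask_ssn(record["ssn"]),
    "card": mask_credit_card(record["card"]),
    "email": mask_email(record["email"]),
}
print(masked)
```

Applying the masks before the write step means the unmasked values never reach the target, which is the point of masking at ingestion time.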
(23)

Multi-tenancy

Application Binding
- Bind multiple Informatica users to one or more system accounts
- System accounts can be OS / Hadoop accounts
- Primarily used in batch use cases and mappings

User Binding
- Also known as pass-through security
- Bind individual Informatica users to their corresponding OS / Hadoop accounts
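
The difference between the two binding styles can be sketched as two lookup tables. This is a conceptual illustration only: the user names, account names and the `resolve_run_as` helper are hypothetical, and the slides do not describe how Informatica stores or resolves bindings.

```python
# Application binding: several Informatica users share a system account.
APPLICATION_BINDING = {
    "alice": "svc_etl",
    "bob": "svc_etl",
    "carol": "svc_reporting",
}

# User binding (pass-through): each user maps to their own OS / Hadoop account.
USER_BINDING = {
    "alice": "alice_hdp",
    "bob": "bob_hdp",
}


def resolve_run_as(informatica_user: str, mode: str) -> str:
    """Return the OS / Hadoop account a job runs as under the given binding mode."""
    if mode == "application":
        table = APPLICATION_BINDING
    elif mode == "user":
        table = USER_BINDING
    else:
        raise ValueError(f"unknown binding mode: {mode!r}")
    try:
        return table[informatica_user]
    except KeyError:
        raise PermissionError(
            f"no {mode} binding for user {informatica_user!r}"
        ) from None


print(resolve_run_as("alice", "application"))
print(resolve_run_as("alice", "user"))
```

Under application binding, cluster-side audit logs see only the shared account; under user binding they see the individual's account, which is why pass-through suits interactive work and shared accounts suit batch jobs.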
(24)

3 pillars of Informatica Big Data Management

Data Integration
- SQOOP
- Blaze
- DI on Spark

Data Quality & Governance
- SQOOP for Profiling
- Blaze for Profiling
- JDBC for reference data*

Data Security
- Kerberos
- Sentry / Ranger

Single, Comprehensive and Integrated Platform for End-to-End Big Data Management

(25)

Deep Dive

(26)

DEMO – Use case

Scenario:

INFA Air receives information from multiple airports on the expected and actual schedules of various flights. It needs to consolidate this information into a Hadoop environment to perform analytics such as flight-on-time analysis.

Challenges:

- Data is collected in various formats and at various intervals: some airports provide flat files, and some stage their data in an Oracle table
- All of this data is ingested into a Hive table for cleansing and analysis
- The data from the Hive table is subsequently sent to an alerting system that sends individual alerts to travelers

(27)

Lab environment

[Diagram. Private Network containing: Hadoop Cluster (Hadoop Node 1, Hadoop Node 2), Informatica Server, Informatica Client]

(28)

Login credentials

Machine                     Host name             Username        Password
Hadoop Node 1               psvrl65iw2016hdp001   iw2016          iw2016
Hadoop Node 2               psvrl65iw2016hdp002   iw2016          iw2016
INFA Server                 psvrl65iw2016i1001    iw2016          iw2016
INFA Client                 psvw7iw2016i1001      Administrator   iw2016
Administrator, Monitoring                         Administrator   Administrator

Lab access:

https://informatica.instructorled.training

Access code: 34762748

xx

(29)
(30)
(31)

Lab 1 – High-speed ingestion in pushdown mode

- Read from flat file
- Read from Oracle
- Union the data
- Write to Hive
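
The four Lab 1 steps can be walked through in plain Python before building the mapping. This is a sketch of the data flow only, under loud assumptions: an in-memory SQLite database stands in for both the Oracle staging table and the Hive target, and the column names, table names and sample flights are invented for illustration. In the lab itself the logic is an Informatica mapping pushed down to the Hadoop cluster.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file feed from one airport.
FLAT_FILE = """flight,airport,scheduled,actual
IA100,SFO,2016-05-23T08:00,2016-05-23T08:20
IA200,JFK,2016-05-23T09:30,2016-05-23T09:30
"""

COLUMNS = ("flight", "airport", "scheduled", "actual")


def read_flat_file(text):
    """Step 1: read the flat file into tuples in a fixed column order."""
    reader = csv.DictReader(io.StringIO(text))
    return [tuple(row[c] for c in COLUMNS) for row in reader]


def read_staging_table(conn):
    """Step 2: read the staged rows (SQLite standing in for Oracle)."""
    cur = conn.execute(
        "SELECT flight, airport, scheduled, actual FROM staged_flights"
    )
    return cur.fetchall()


def write_target(conn, rows):
    """Step 4: write the combined rows (SQLite standing in for Hive)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS flights_all "
        "(flight TEXT, airport TEXT, scheduled TEXT, actual TEXT)"
    )
    conn.executemany("INSERT INTO flights_all VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staged_flights "
    "(flight TEXT, airport TEXT, scheduled TEXT, actual TEXT)"
)
conn.execute(
    "INSERT INTO staged_flights VALUES "
    "('IA300', 'ORD', '2016-05-23T10:00', '2016-05-23T10:45')"
)

# Step 3: union the two sources (UNION ALL semantics, duplicates kept).
unioned = read_flat_file(FLAT_FILE) + read_staging_table(conn)
write_target(conn, unioned)

row_count = conn.execute("SELECT COUNT(*) FROM flights_all").fetchone()[0]
print(row_count)
```

The union only works because both readers return the same columns in the same order, which is the same alignment the Union transformation needs from its input groups.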

Lab 2 – Extraction with schema-on-read

- Read from Hive
- Write to flat file
- Dynamically update the schema
- Use Blaze
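
One reading of "dynamically update the schema" is that the extract discovers its columns at read time instead of hard-coding them, so a new source column flows through without changes to the job. The sketch below illustrates that idea only; it is an assumption about the lab's intent, with SQLite standing in for Hive and with hypothetical table and column names. It does not use Blaze or any Informatica API.

```python
import csv
import io
import sqlite3


def extract_to_csv(conn, table):
    """Dump a table to CSV text, taking the header from the live schema.

    `table` is interpolated into the SQL, so it must be a trusted identifier.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]  # schema discovered at read time
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(cur)
    return out.getvalue()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (flight TEXT, delay_min INTEGER)")
conn.execute("INSERT INTO flights VALUES ('IA100', 20)")
first = extract_to_csv(conn, "flights")

# The source gains a column; the extract code is not touched.
conn.execute("ALTER TABLE flights ADD COLUMN gate TEXT")
conn.execute("INSERT INTO flights VALUES ('IA300', 45, 'B7')")
second = extract_to_csv(conn, "flights")

print(first)
print(second)
```

Rows written before the column existed come out with an empty value in the new field, so downstream consumers of the flat file need to tolerate missing values.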
(32)

Questions…?

(33)

Informatica User Groups are a great way for you to invest in your professional development and learn about new Informatica offerings. Local Chapter Leaders manage each IUG online and via in-person meetings.

- Network and socialize
- Find and share content, best practices & tips
- Learn about the latest technologies and solutions from Informatica
- Discover how colleagues and peers use Informatica

https://network.informatica.com/welcome/

LEARN MORE AT IW16: Go to the Solutions Expo Informatica Pavilion / Ecosystem & Innovation Area:

- Talk to regional user group leaders
- Learn about meeting plans
- Join your regional user group

When:
- Monday 6:00pm – 8:30pm
- Tuesday 10:45am – 2:15pm
- Wednesday 10:30am – 1:45pm

Where: Moscone West Hall Level One
