Managing Data in Motion

Data Integration Best Practice Techniques and Technologies

April Reeve

ELSEVIER

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Morgan Kaufmann is an imprint of Elsevier

Contents

Foreword
Acknowledgements
Biography
Introduction

PART 1 INTRODUCTION TO DATA INTEGRATION

Chapter 1 The Importance of Data Integration
The natural complexity of data interfaces
The rise of purchased vendor packages
Key enablement of big data and virtualization

Chapter 2 What Is Data Integration?
Data in motion
Integrating into a common format—transforming data
Migrating data from one system to another
Moving data around the organization
Pulling information from unstructured data
Moving process to data

Chapter 3 Types and Complexity of Data Integration

The differences and similarities in managing data in motion and persistent data
Batch data integration
Real-time data integration
Big data integration
Data virtualization

Chapter 4 The Process of Data Integration Development
The data integration development life cycle
Inclusion of business knowledge and expertise

PART 2 BATCH DATA INTEGRATION

Chapter 5 Introduction to Batch Data Integration
What is batch data integration?
Batch data integration life cycle

Chapter 6 Extract, Transform, and Load
What is ETL?
Profiling
Extract
Staging
Access layers
Transform
Simple mapping
Lookups
Aggregation and normalization
Calculation
Load

Chapter 7 Data Warehousing
What is data warehousing?
Layers in an enterprise data warehouse architecture
Operational application layer
External data
Data staging areas coming into a data warehouse
Data warehouse data structure
Staging from data warehouse to data mart or business intelligence
Business intelligence layer
Types of data to load in a data warehouse
Master data in a data warehouse
Balance and snapshot data in a data warehouse
Transactional data in a data warehouse
Events
Reconciliation
Interview with an expert: Krish Krishnan on data warehousing and data integration

Chapter 8 Data Conversion
What is data conversion?
Data conversion life cycle
Data conversion analysis
Best practice data loading
Improving source data quality
Mapping to target
Configuration data
Testing and dependencies
Private data
Proving
Environments

Chapter 9 Data Archiving
What is data archiving?
Selecting data to archive
Can the archived data be retrieved?
Conforming data structures in the archiving environment
Flexible data structures
Interview with an expert: John Anderson on data archiving and data integration

Chapter 10 Batch Data Integration Architecture and Metadata
What is batch data integration architecture?
Profiling tool
Modeling tool
Metadata repository
Data movement
Transformation
Scheduling
Interview with an expert: Adrienne Tannenbaum on metadata and data integration

PART 3 REAL-TIME DATA INTEGRATION

Chapter 11 Introduction to Real-Time Data Integration
Why real-time data integration?
Why two sets of technologies?

Chapter 12 Data Integration Patterns
Interaction patterns
Loose coupling
Hub and spoke
Synchronous and asynchronous interaction
Request and reply
Publish and subscribe
Two-phase commit
Integrating interaction types

Chapter 13 Core Real-Time Data Integration Technologies
Confusing terminology
Enterprise service bus (ESB)
Interview with an expert: David S. Linthicum on ESB and data integration
Service-oriented architecture (SOA)
Extensible markup language (XML)
Interview with an expert: M. David Allen on XML and data integration
Data replication and change data capture
Enterprise application integration (EAI)
Enterprise information integration (EII)

Chapter 14 Data Integration Modeling
Canonical modeling
Interview with an expert: Dagna Gaythorpe on canonical modeling and data integration
Message modeling

Chapter 15 Master Data Management
Introduction to master data management
Reasons for a master data management solution
Purchased packages and master data
Reference data
Masters and slaves
External data
Master data management functionality
Types of master data management solutions—registry and data hub

Chapter 16 Data Warehousing with Real-Time Updates
Corporate information factory
Operational data store
Master data moving to the data warehouse
Interview with an expert: Krish Krishnan on real-time data warehousing updates

Chapter 17 Real-Time Data Integration Architecture and Metadata
What is real-time data integration metadata?
Modeling
Profiling
Metadata repository
Enterprise service bus—data transformation and orchestration
Technical mediation
Business content
Data movement and middleware
External interaction

PART 4 BIG, CLOUD, VIRTUAL DATA

Chapter 18 Introduction to Big Data Integration
Data integration and unstructured data
Big data, cloud data, and data virtualization

Chapter 19 Cloud Architecture and Data Integration
Why is data integration important in the cloud?
Public cloud
Cloud security
Cloud latency
Cloud redundancy

Chapter 20 Data Virtualization
A technology whose time has come
Business uses of data virtualization
Business intelligence solutions
Integrating different types of data
Quickly add or prototype adding data to a data warehouse
Present physically disparate data together
Leverage various data and models triggering transactions
Data virtualization architecture
Sources and adapters
Mappings and models and views
Transformation and presentation

Chapter 21 Big Data Integration
What is big data?
Big data dimension—volume
Massive parallel processing—moving process to data
Hadoop and MapReduce
Integrating with external data
Visualization
Big data dimension—variety
Types of data
Integrating different types of data
Interview with an expert: William McKnight on Hadoop and data integration
Big data dimension—velocity
Streaming data
Sensor and GPS data
Social media data
Traditional big data use cases
More big data use cases
Health care
Logistics
National security
Leveraging the power of big data—real-time decision support
Triggering action
Speed of data retrieval from memory versus disk
From data analytics to models, from streaming data to decisions
Big data architecture
Operational systems and data sources
Intermediate data hubs
Business intelligence tools
Data virtualization server
Batch and real-time data integration tools
Analytic sandbox
Risk response systems/recommendation engines
Interview with an expert: John Haddad on Big Data and data integration

Chapter 22 Conclusion to Managing Data in Motion
Data integration architecture
Why data integration architecture?
Data integration life cycle and expertise
Security and privacy
Data integration engines
Operational continuity
ETL engine
Enterprise service bus
Data virtualization server
Data movement
Data integration hubs
Master data
Data warehouse and operational data store
Enterprise content management
Data archive
Metadata management
Data discovery
Data profiling
Data modeling
Data flow modeling
Metadata repository
The end

References
Index