The Data Warehouse Challenge

Taming Data Chaos

Michael H. Brackett

WILEY COMPUTER PUBLISHING

John Wiley & Sons, Inc.

New York • Chichester • Brisbane • Toronto • Singapore

Contents

About the Author v Foreword by William H. Inmon vii Acknowledgments ix Preface xi

Chapter 1 Data Crisis 1

Information Demand 2 Dynamic Environment 2 Business Changes 3 Business Information Demand 4 Data Situation 4 Disparate Data 5 Disparate Data Cycle 7 Data Dilemma 8 Technology Trends 9 Client/Server Architecture 10 Data Warehouse Systems 10 Geographic Information Systems 11 Other Trends 12 Metadata Demand 13 Summary 14 Questions 15

Chapter 2 Data Challenge 17

The Realities 18 Basic Problem 18 Data Awareness 18 Data Understanding 19 Data Variability 20 Data Redundancy 21 Data Access 22 Tools 23 Standards 24 Hidden Resource 25 Disparate Data Shock 25 Meeting the Challenge 26 Data Resource Initiative 26 Data Resource Strategies 27 Identify Data 27 Understand Data 27 Integrate Data 28 Aggregate Data 28 Deploy Data 28 Opportunity for Change 29 Approaches 29 Justification 30 Summary 32 Questions 33

Chapter 3 Data Vision 35

Integrated Data Resource 36 Principles 36 Subject-Oriented 37 Business Survival-Oriented 38 Real World Perspective 38 Robust Resource 40 Sharable Resource 41 Development 42 A Formal Data Resource 43 Data Resource Library 44 Information Engineering Support 44 Data 46 Data Engineering 48 Summary 49 Questions 50

Chapter 4 Data Architecture 51

Formal Architecture 52 Information Technology Infrastructure 52 Data Resource Framework 55 Data Architecture 55 Common Data Architecture 57 Formal Approach 60 Data Architecture Perspective 60 Data Model Perspective 61 Data Unit Perspective 62 Objects and Events 62 Features 63 Existences and Occurrences 64 Coded Data Values 65 Data Megatype Perspective 65 Summary 67 Questions 68

Chapter 5 Data Description 69

Data Names 70 Data Naming Conventions 71 Data Naming Taxonomy 72 A Structural Taxonomy 72 Original Taxonomy Components 74 Enhanced Taxonomy Components 75 Data Naming Vocabulary 76 Aligning Naming Conventions 78 Forming Data Names 78 Data Site Names 79 Data Occurrence Selections Names 80 Data Subject Names 80 Data Code Set Names 81 Data Characteristic Names 82 Data Characteristic Variation Names 86 Data Characteristic Substitution Names 89 Data Code Names 90 Data Version Name 91 Data Name Abbreviations 92 Short Data Names 93 Defining Data 94 Data Definition Criteria 94 Data Definition Common Words 98 Summary 98 Questions 99

Chapter 6 Data Structure 101

Data Structure Concept 102 Common Data Structure 102 Data Sets 103 Data Relations 104 Common Notation 104 Data Relation Types 105 Data Relation Diagrams 108 Entity Relation Diagrams 108 Subject Relation Diagrams 112 File Relation Diagrams 114 Multiple Perspectives 115 Data Subject Hierarchy 116 Presenting Ideas 119 Data Keys 123 Primary Keys 123 Multiple Primary Keys 125 Primary Key Intelligence 126 Dual Primary Keys 128 Foreign Keys 128 Subject Structure Chart 129 Coded Data 131 Code Tables 131 Data Code Set 134 Coded Data Trends 134 Data Group Trends 135 Data Classification 135 Data Classification Scheme 136 Data Themes 139 Data Segments 139 Data Clusters 140 Summary 141 Questions 142

Chapter 7 Data Quality 143

Disparate Data Quality 144 Data Integrity 145 Data Value Integrity 146 Conditional Data Value Integrity 147 Data Domains 150 Default Data Values 151 Data Structure Integrity 152 Conditional Data Structure Integrity 154 Referential Integrity 156 Data Retention Integrity 157 Data Derivation Integrity 158 Derived Data 158 Redundant Data 163 Replicated Data 164 Data Accuracy 164 Scope 165 Data Currentness 165 Data Lineage and Heritage 167 Temporal Data 170 Data Versions 172 Multiple Source Updates 173 Proactive and Retroactive Updates 174 Data Completeness 175 Managing Data Quality 176 Data Quality Improvement 177 Data Quality Criteria 177 Data Quality Techniques 178 Data Quality Process 179 Realizing Disparate Data Quality 179 Understanding Existing Data Quality 179 Determine Desired Data Quality 179 Adjusting Data Quality 179 Tracking Data Quality 180 Summary 181 Questions 183

Chapter 8 Metadata 185

Metadata Situation 186 Disparate Metadata 186 Disparate Metadata Cycle 187 Metadata Dilemma 188 Metadata Shock 188 A New Perspective 189 Metadata Types 189 Common Metadata 191 Metadata Warehouse 193 Metadata Warehouse Concept 194

Metadata Warehouse Architecture 195 Metadata Warehouse Components 195 Data Naming Lexicon 197 Data Dictionary 199 Data Structure 202 Data Integrity 203 Data Thesaurus 205 Data Glossary 208 Data Product Reference 209 Data Directory 211 Data Translation Schemes 212 Data Clearinghouse 213 Managing Metadata 216 Metadata Quality 216 Metadata Versions 218 Summary 220 Questions 221

Chapter 9 Data Refining 223

Data Refining Concept 224 Data Refining Approach 224 Data Product Concept 225 Data Product Names 227 Data Naming Taxonomy 227 Data Products 228 Data Product Groups 228 Data Product Units 229 Data Product Codes 230 Data Product Definitions 231 Data Product Structure 232 File Relation Diagram 232 File Structure Chart 233 Entity Relation Diagram 234 Entity Structure Chart 235 Data Product Quality 236 Data Product Integrity 236 Data Product Accuracy 237 Data Cross-Reference 238 Data Cross-Reference Approach 239 Data Product Group 240 Data Product Unit 240 Data Product Code 244 Data Product Inventory 246

Data Variability 247 Primary Key Variability 247 Data Subject Variability 247 Data Characteristic Variability 247 Data Code Value Variability 249 Official Data Variations 251 Official Primary Key 252 Official Data Characteristic Variations 252 Official Data Domains 254 Official Data Codes 254 Data Translation Schemes 255 Data Characteristic Translation 255 Data Code Translation 257 Disparate Data Integration 258 Integration Scope 258 Official Data Source 259 Integration Table 260 Physical Integration 261 Summary 262 Questions 263

Chapter 10 Evaluational Data 265

Data Warehouse System Concept 266 Decision Support 266 Data Resource Support 267 Data Warehouse System Definition 268 Dual Database Concept 269 A New Perspective 270 Evaluation Data 270 Data Architecture 272 Data Dimensions 273 Evaluation Data Perspective 274 Evaluation Data Description 274 Data Subjects 275 Data Subject Names 276 Data Characteristic Names 277 Data Selection 278 Data Versions 279 Data Definitions 279 Evaluation Data Structure 280 Primary Keys 280 Subject Relation Diagram 281 Summary Data Subject Matrices 283

Evaluation Data Integrity 285 Data Relations 285 Data Normalization 286 Data Summarization 288 Data Summarization Levels 290 Maintaining Evaluation Data 291 Data Addition 292 Data Removal 293 Data Rederivation 295 Data Version 296 Data Perspectives 297 Metadata 298 Data Exploration and Mining 301 Summary 302 Questions 303

Chapter 11 Data Transformation 305

Data Transformation Concept 306 Data Transformation Perspective 306 Data Transformation Routes 310 Data Transformation Matrix 311 Data Transformation Steps 311 Identify Target Data 312 Identify Source Data 313 Extract Source Data 314 Reconstruct Historical Data 315 Translate Data 316 Recast Data 317 Restructure Data 319 Summarize Data 320 Load Data 321 Review Data 321 Summary 322 Questions 323

Chapter 12 Spatial Data 325

A Data Perspective 326 Decision Support 326 Data Situation 327 Common Data Architecture 328 Spatial Data Definitions 329

Spatial Data Description 331 Data Layers 331 Spatial Data Layer Names 335 Spatial Data Definition 338 Spatial Data Structure 339 Data Relations 339 Primary Keys 342 Spatial Data Quality 344 Datums 344 Linear Referencing Systems 345 Linear Addressing Systems 347 Geographic Areas 348 Linear Object Segmentation 349 Metadata 350 Managing Spatial Data 351 Spatial Data Tiers 351 Spatial Data Themes 353 Seen Areas 354 Duplicate Data Layers 355 Data Layer Extents 356 Time-Variant Spatial Data 356 Data Layer Aggregation 357 Three-Dimensional Aggregation 360 Spatial Data Scale 361 Integrating Tabular and Spatial Data 362 Spatial Data Referencing 363 Descriptive Spatial Referencing 364 Nondescriptive Georeferencing 366 Indirect Spatial Referencing 367 Summary 369 Questions 370

Chapter 13 Distributing Data 373

Data Distribution Concept 374 Data Distribution 374 Data Distribution Dilemma 375 Common Data Architecture 376 Official Data 377 Replicating Data 378 Distributed Data Description 379 Distributed Data Names 379 Distributed Data Definitions 381

Distributed Data Structure 381 Logical Data Structure 382 Distributed Data Structure 382 Physical Data Structure 384 Distributed Data Diagram 386 Data Partitioning 389 Data Subject Partitioning 390 Data Occurrence Partitioning 391 Data Characteristic Partitioning 392 Dual Data Partitioning 393 Distributing Data 393 Data Distribution Driver 394 Distributing Tabular Data 394 Distributing Evaluational Data 395 Distributing Spatial Data 396 Distributing Metadata 397 Data Marts 398 Redistributing Data 399 Distributed Data Quality 400 Data Origination 401 Data Tracking 401 Data Concurrency 403 Distributed Data Quality Principles 405 Summary 406 Questions 407

Chapter 14 Common Data Model 409

The Data Schema Concept 410 Two-Schema Concept 410 Three-Schema Concept 411 Four-Schema Concept 412 Five-Schema Concept 414 Abstract Schema Concept 415 Framework for Information Systems 416 Five-Schema and the Framework 417 Common Data Modeling 418 Data Modeling Perceptions 419 Data Modeling Problems 420 Common Data Architecture 422 Common Data Modeling Concept 424 Forward Data Modeling 424 Reverse Data Modeling 426 Vertical Data Modeling 427

Common Data Modeling Method 428 Basic Data Modeling Components 428 An Integrated Data Resource 430 Modeling Logical Schema 431 Developing New Data 431 Refining Disparate Data 432 Developing Evaluational Data 433 Distributing Data 433 Changing Operating Environments 434 Integrating Data 435 Data Model Interfaces 436 Data Subject Hierarchies 437 Common Person 439 Grouped Code Tables 441 Archive and History Data 442 Summary 444 Questions 446

Chapter 15 Resolving the Dilemma 447

Data Issues 448 Increasing Data Disparity 448 Knowledge Loss 449 Millennium Data Problem 450 Client Data Access 451 Acquired Applications 453 Conflicting Data Standards 454 Standards and Guidelines 455 Rapid Development 456 Multiple Common Data Architectures 457 Legacy Systems 457 Stabilizing Variables 458 Business Improvement 460 Resolution Initiative 461 Recognition 461 Vision 462 Orientation 463 Strategy 465 Evaluation 466 Summary 466 Questions 468

Glossary 469

Appendix A Common Words 523

Common Data Site Words 523 Common Data Subject Words 523 Common Data Characteristic Words 525 Common Data Characteristic Variation Words 528 Common Data Version Words 529 Common Data Definition Words 529

Appendix B Short Data Names 531

Parent Elimination Notation 531 Subordinate Inclusion Notation 532 Subordinate Substitution Notation 532 Parent Substitution Notation 533 Summary Data Subject Notation 533 Program Name Notation 533

Appendix C Data Definition Examples 535

Data Sites 535 Data Occurrence Groups 535 Data Subjects 536 Data Characteristics 537 Data Characteristic Variations 538 Data Codes 539 Data Versions 539

Appendix D Metadata Explanation 541

Appendix E Cross-Reference Example 545

Original Data Definitions 545 Data Quality Information 545 Cross-References 551 Cross-References by Common Data Name 551 Cross-References by Product Data Name 552 Subject Relation Diagram 553 Data Definitions 553 Geospatial Dataset 554 Geospatial Dataset Attribute Accuracy 554 Geospatial Dataset Horizontal Accuracy 554 Geospatial Dataset Process 555 Geospatial Dataset Source 555 Geospatial Dataset Vertical Accuracy 556

Appendix F Evaluation Data Example 557

Operational Subject Relation Diagram 558 Evaluation Subject Relation Diagram 559 Primary Key Matrix 560 Data Characteristic Matrix 562

Bibliography 565

Index 567
