The Data Reservoir as an enabler of
differentiating Analytics initiatives
3rd March 2015
Mandy Chessell CBE FREng CEng FBCS
Distinguished Engineer, Master Inventor
Chief Architect, Information Solutions
Agenda
Changing landscapes
Analytics Lifecycles
Data reservoir overview
Questions
“Looks like you’ve got all the data – what’s the holdup?”
CHANGING INFORMATION
Knowing your customers enables you to serve them better …
Behavioral data
- Orders
- Transactions
- Payment history
- Usage history
Descriptive data
- Attributes
- Characteristics
- Relationships
- Self-declared info
- (Geo)demographics
Attitudinal data
- Opinions
- Preferences
- Needs and Desires
Interaction data
- Email / chat transcripts
- Call center notes
- Web Click-streams
- In person dialogues
Who?
What?
Why?
How?
High-value, dynamic - source of competitive differentiation
Knowing your customers enables you to serve them better … (continued)
[Figure: the same four data types, overlaid with the labels Master Data Information; Analysis of Channel Interaction Methods; Analysis of Feedback and Interaction Content; ?]
The broadening scope of analytics
[Figure, built up over four slides. Components: Applications; SOA; Operational Data Store; Data Warehouse; Reporting; Data Marts; Pattern Discovery for Analytics; Master Data Management Hub; Hadoop; SAND BOXES (Analyze Values, Search For Data).]
Hadoop provides cheap storage and processing to increase the amount of data, and the types of data, that can be processed in a cost-effective manner.
New inputs: Sensors and …; Customer Conversations, Web, Social Media, Log files, …
Data blues & skills issues
A disproportionate share of the time spent on an analytics project goes to data preparation: acquiring, preparing, formatting and normalizing the data.
Business scenarios we see
Subject matter experts want access to their organization’s data to explore the content,
select, control, annotate and access information using their terminology with an
underpinning of protection and governance.
Data Scientists seeking data for new
analytics models.
Marketeer seeking data for new
campaigns.
Fraud investigator seeking data to
understand the details of suspicious
activity.
• Day-to-day activity.
• Requiring ad hoc access to a wide variety of data sources.
• Supporting analysis and decision making.
• Using the subject matter experts’ terminology.
The interesting dilemma …
A man goes into a jewellers and buys an expensive watch …
• Is it fraud – in which case the bank must stop it? (Threat)
• Is it money-laundering – in which case the bank must report it? (Obligation)
• Does he have an expensive trophy partner – in which case perhaps he would be interested in a loan? (Opportunity)
• Has he just won the lottery – should the bank improve the services offered? (Opportunity)
The same event is of interest to different departments.
There is major overlap in the data required to answer each question.
It may not be possible to determine the answer with just the information in the channel: previous or subsequent activity is also required.
Application Groupings
Characterised by:
Availability
Data requirements
Performance
Skills
Rate of change / Stability
Systems of Engagement
Systems of Record
Systems of Insight
A growing demand …
Business Teams want
• Open access to more information
• More powerful analysis and visualization tools
IT Teams are
• Concerned about cost.
THE DATA RESERVOIR
How do we access information?
Access in place
• Up-to-date information
• Cost-effective
• Slower access path: remote access, reformatting
Make a local copy
• Specially formatted for use case
• Local access
• Local control
• Local cost
• Potentially stale values
How much information? How rapidly is it changing? How frequently is it accessed? How much transformation is required to consume the information? When is the information available? Who owns the information? How easily can it be changed?
How does the data reservoir support analytics development?
[Figure, built up over four slides. Labels: Advertise, Discover, Provision, Explore, Deploy; Data Reservoir; Catalog; Sandbox; steps numbered 1 to 6.]
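The advertise, discover and provision steps named on this slide can be pictured with a small in-memory sketch. It is illustrative only: the `Catalog` and `CatalogEntry` classes, their methods and the sample entries are hypothetical and do not represent any product API; a real catalog would be a governed metadata repository.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One advertised information source (hypothetical shape)."""
    name: str
    description: str
    tags: set[str] = field(default_factory=set)


class Catalog:
    """Toy in-memory catalog used only to illustrate the three steps."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def advertise(self, entry: CatalogEntry) -> None:
        # Advertise: an information owner registers a source in the catalog.
        self._entries[entry.name] = entry

    def discover(self, keyword: str) -> list[str]:
        # Discover: match the keyword against names, descriptions and tags.
        kw = keyword.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if kw in e.name.lower()
            or kw in e.description.lower()
            or kw in {t.lower() for t in e.tags}
        )

    def provision(self, names: list[str]) -> dict[str, CatalogEntry]:
        # Provision: hand the selected entries to a sandbox for exploration.
        return {n: self._entries[n] for n in names}


catalog = Catalog()
catalog.advertise(CatalogEntry("payments", "Card payment history", {"behavioral"}))
catalog.advertise(CatalogEntry("call_notes", "Call center notes", {"interaction"}))
sandbox = catalog.provision(catalog.discover("payment"))
```

Here `sandbox` ends up holding only the `payments` entry, which stands in for the data a data scientist would then explore.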
Active decision making in real-time
1. An activity occurs that calls for a decision.
2. The context from the activity is passed to the decision process.
3. The decision process augments the context with stored information and runs the decision model.
4. One or more actions are recommended to the activity.
[Figure labels: Context, Decision, Action, Feedback, Information; Facts, Recent Events, Options; Decision Input, Actions and Outcomes; numbered steps 2, 3 and 5.]
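The four steps on this slide can be sketched as a single function call. This is a minimal sketch under stated assumptions: the `decide` function, the `watch_purchase_model` rule, the field names and the thresholds are all hypothetical, chosen only to echo the expensive-watch example earlier in the deck.

```python
from typing import Any, Callable

Context = dict[str, Any]


def decide(
    context: Context,
    stored_information: dict[str, Context],
    decision_model: Callable[[Context], list[str]],
) -> list[str]:
    """Augment the activity context with stored facts, then run the model."""
    customer = str(context["customer_id"])
    # Stored information is added first so values from the live activity win.
    augmented = {**stored_information.get(customer, {}), **context}
    return decision_model(augmented)


def watch_purchase_model(ctx: Context) -> list[str]:
    # Hypothetical rule: flag purchases far above the customer's typical spend.
    typical = ctx.get("typical_spend")
    if typical is None:
        return ["review_manually"]
    if ctx["amount"] > 10 * typical:
        return ["refer_to_fraud_team"]
    return ["no_action"]


stored = {"c1": {"typical_spend": 100.0}}
actions = decide({"customer_id": "c1", "amount": 5000.0}, stored, watch_purchase_model)
```

The point of the sketch is the split of responsibilities: the activity supplies only its own context, while the decision process owns the lookup of stored information and the choice of model.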
How does the data reservoir support data distribution?
[Figure labels: Data Reservoir; Catalog; Provision (1); Distribute (2); Access (3).]
Big Data Lakes or Swamps?
As we collect data
• Can we preserve clarity?
• Do we know what we are collecting?
• Can we find the data we need?
Are we creating a data swamp?
How do we build trust in big data?
"The need for increased agility and accessibility for data analysis is the primary
driver for data lakes," said Andrew White, vice president and distinguished
analyst at Gartner. "Nevertheless, while it is certainly true that data lakes can
provide value to various parts of the organization, the proposition of enterprise
wide data management has yet to be realized."
The Data Reservoir
Information Management and Governance Fabric
Data Reservoir Services
Data reservoir connects to many types of systems
[Architecture diagram. Systems around the data reservoir: Line of Business Applications; Consumers of Insight; Analytics Tools; Decision Model Management; Data Reservoir Operations; Enterprise IT (System of Record Applications, Systems of Engagement, Enterprise Service Bus); Other Systems Of Insight; Other Data; New Sources (Third Party Feeds, Third Party APIs, Internal Sources). Flows: Information Service Calls, Search Requests, Report Requests, Data Access, Data Deposit, Curation Interaction, Management, Deploy Decision Models, Deploy Real-time Decision Models, Events to Evaluate, Notifications, Data In, Data Out. Uses: Simple, ad hoc Discovery and Analysis; Reporting; Analytical Insight Applications.]
Data reservoir supports real-time and batched ingestion of data
[Same architecture diagram, highlighting the ingestion styles: MANUAL REQUEST, INFORMATION SERVICE CALL, CHANGE DATA CAPTURE, SCHEDULED EXTRACT.]
Data refineries provide data movement, preparation, governance
[Same architecture diagram, adding Enterprise IT Interaction (Service Interfaces, Data Ingestion, Publishing Feeds, Continuous Analytics with STREAMING ANALYTICS and EVENT CORRELATION), Understand Information Sources, and the Data Reservoir Repositories.]
Big data needs a variety of repositories for cost, access and performance reasons
[Same architecture diagram, showing the Data Reservoir Repositories. Descriptive Data: INFORMATION VIEWS, CATALOG. Shared Operational Data: ASSET HUB, ACTIVITY HUB, CODE HUB, CONTENT HUB. Other repository labels: Deposited Data, Historical Data, AUDIT DATA, OPERATIONAL HISTORY, SEARCH INDEX. Published: SAND …, OBJECT CACHE. View-based Interaction. Annotations: All types of data; System-level Data (Pre-Archive); Master and Reference.]
Like a well-run library, the data reservoir has a catalog
[Same architecture diagram, adding the catalog. New labels: Catalog Interfaces; Information Curator; Governance, Risk and Compliance Team; Advertise Information Source; Understand Information Sources; Understand Compliance; Compliance Report. Data Reservoir Repositories: Harvested Data, INFORMATION WAREHOUSE; Descriptive Data (INFORMATION VIEWS, CATALOG); Shared Operational Data (ASSET HUB, ACTIVITY HUB, CODE HUB, CONTENT HUB); Deposited Data, Historical Data, DEEP DATA, AUDIT DATA, OPERATIONAL HISTORY, SEARCH INDEX, OFFLINE ARCHIVE. Published: SAND BOXES, REPORTING DATA MARTS, OBJECT CACHE.]
Differing user perspectives
Information Governance Catalogue
• Search for, locate and download data and related artifacts.
• Provision Sand Boxes.
• Add additional insight into data sources through automated analysis.
• Define governance policies, rules and classifications.
• Monitor compliance.
• View lineage (business and technical) and perform impact analysis.
[Figure labels: Sand Box; Data Reservoir.]
Information governance provides the mechanism for building trust
[Same architecture diagram, highlighting Information Integration & Governance: INFORMATION BROKER, OPERATIONAL GOVERNANCE HUB, CODE HUB, WORKFLOW, STAGING AREAS, MONITOR, GUARDS.]
Information governance delivers …
Information Governance
Compliance
Policy Administration
Policy Enforcement
Policy Monitoring
Policy Implementation
Standards
Protection
Lifecycle
Quality
Information Values Quality
Information Dependencies
Information Requirements
Information Supply Chain Integrity
Information Identification
Information Retention
Information Usage
Information Privacy
Information Architecture
Information Disposal
Policy
Three interlocking lifecycles of information governance
[Figure labels: Policy, Operations, Development, Metadata.]
Classification Schemes
Classification is at the heart of information governance. It characterizes the type, value and cost of information, or the mechanisms that manage it. The design of the classification schemes is key to controlling the cost and effectiveness of the information governance program.
• Business Classifications
  • Business classifications characterize information from a business perspective. They capture its value, how it is used, and the impact to the business if it is misused.
• Resource Classifications
  • Resource classifications characterize the capability of the IT infrastructure that supports the management of information. A resource's capability is partly due to its innate functions and partly controlled by the way it has been configured.
• Activity Classifications
  • Activity classifications help to characterize procedures, actions and automated processes.
• Semantic Classification
Policy support inside the Information Governance Catalogue
[Figure: metadata model. Entities: Principle, Policy, Implications; Classification; Governance Rule; Governance Rule Implementations; Metadata Description; Modelled Metadata Asset; Information Asset. Relationship labels: Governs; Describes; Classified by; Implemented by; Deployed to, Executed by, Monitored by, Actioned by.]
Data Reservoir
[Same architecture diagram, annotated with the points where governance rules are applied.]
Governance Rules
Defined for each classification for each situation
Callouts on the diagram: Personal information masked here (twice); Sensitive information masked here.
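One way to read "defined for each classification for each situation" is as a lookup keyed on both values. The sketch below is an assumption-laden illustration, not the product's mechanism: the field classifications, the situations (`analytics`, `stewardship`), the actions and the `mask_record` helper are all hypothetical.

```python
import hashlib
from typing import Any

# Hypothetical classification of fields; unknown fields are treated as sensitive.
FIELD_CLASSIFICATION = {
    "name": "personal",
    "email": "personal",
    "salary": "sensitive",
    "country": "public",
}

# Hypothetical governance rules: one action per (classification, situation).
RULES = {
    ("personal", "analytics"): "hash",
    ("sensitive", "analytics"): "redact",
    ("personal", "stewardship"): "keep",
}


def mask_record(record: dict[str, Any], situation: str) -> dict[str, Any]:
    """Apply the rule for each field's classification in the given situation."""
    masked: dict[str, Any] = {}
    for field_name, value in record.items():
        classification = FIELD_CLASSIFICATION.get(field_name, "sensitive")
        # With no explicit rule, only public data passes through unchanged.
        default = "keep" if classification == "public" else "redact"
        action = RULES.get((classification, situation), default)
        if action == "keep":
            masked[field_name] = value
        elif action == "hash":
            digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
            masked[field_name] = digest[:12]
        else:
            masked[field_name] = "***"
    return masked


record = {"name": "Ann", "salary": 50000, "country": "UK", "notes": "free text"}
for_analytics = mask_record(record, "analytics")
```

Hashing rather than redacting the personal fields is a design choice in this sketch: it keeps values joinable across data sets without exposing them. Unclassified fields such as `notes` are redacted so that the rules fail closed.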
Integrated Metadata
Data Lineage (Traceability)
• Where does this data come from?
• Why is this data incorrect?
• Why is this data incomplete?
• Can I trust this value?
Impact Analysis
• Where is this element used?
• What happens if I change this?
Optimization
• Where is the redundancy?
• How can I make this run more efficiently?
Understanding
• What does this mean?
• How is this used?
Control
• Why is this parameter set to this value?
• Who made this change?
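The lineage and impact analysis questions above are two traversals of the same graph of data flows: lineage walks upstream, impact analysis walks downstream. The following sketch assumes the metadata is available as simple source-to-target pairs; the `LineageGraph` class and the element names are hypothetical.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Directed graph of data flows between information elements."""

    def __init__(self) -> None:
        self._downstream: dict[str, set[str]] = defaultdict(set)
        self._upstream: dict[str, set[str]] = defaultdict(set)

    def add_flow(self, source: str, target: str) -> None:
        # Record that data moves from source to target.
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    @staticmethod
    def _reach(start: str, edges: dict[str, set[str]]) -> set[str]:
        # Breadth-first search collecting every element reachable from start.
        seen: set[str] = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def lineage(self, element: str) -> set[str]:
        """Where does this data come from?"""
        return self._reach(element, self._upstream)

    def impact(self, element: str) -> set[str]:
        """Where is this element used?"""
        return self._reach(element, self._downstream)


graph = LineageGraph()
graph.add_flow("crm.customer", "asset_hub.customer")
graph.add_flow("asset_hub.customer", "mart.churn_report")
graph.add_flow("billing.invoice", "mart.churn_report")
```

With these flows, `graph.lineage("mart.churn_report")` returns the three upstream elements and `graph.impact("crm.customer")` returns the hub and the report that depend on it.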
The Information Governance Ecosystem
Information Governance is built on metadata management,
Policy and Standards
Information Refineries
Exception Management
Reporting and Audit
Information Curation
Information is delivered in appropriate forms for consumers
[Same architecture diagram, highlighting how consumers reach the data: Catalog Interfaces; Raw Data Interaction (SAND BOXES); View-based Interaction (Access and Feedback); Published (SAND BOXES, REPORTING DATA MARTS, OBJECT CACHE); Data Reservoir Repositories.]
Information Virtualization hides the complexity of the information landscape
[Figure labels: Search and View Values; Add Insight; Create APIs; Browse Sources; Provision; Define Views; Analyze Values; Information Virtualization.]
Building a data reservoir
The data reservoir needs governance and change management to ensure that information is protected and managed efficiently.
The first step in creating the reservoir is to establish the information integration and governance components, the staging areas for integration, the catalog and the common data standards.
The build-out of the reservoir then proceeds iteratively, based on the following processes:
• Governance of a data reservoir subject area.
• Managing an information source.
• Managing an information view.
• Enabling analytics.
• Maintaining the data reservoir infrastructure.
Data reservoir logical architecture
[Full architecture diagram combining the previous views: the surrounding systems and flows; Enterprise IT Interaction (Service Interfaces, Publishing Feeds, Continuous Analytics with STREAMING ANALYTICS and EVENT CORRELATION); Catalog Interfaces; Raw Data Interaction; View-based Interaction; Published stores; the Data Reservoir Repositories; and Information Integration & Governance (INFORMATION BROKER, OPERATIONAL GOVERNANCE HUB, CODE HUB, WORKFLOW).]
The data reservoir
As organizations experiment with analytics they discover:
• Creating new analytics requires access to historical data from many systems.
• This data includes valuable and sensitive data that is core to the organization’s operation.
• Hadoop is a flexible platform for storing many types of data but is not necessarily fast enough for the production deployment of some analytics. Data needs to be reformatted and copied onto a specialist analytics platform such as Netezza.
A data reservoir provides:
• Single extraction of data from operational systems and distribution to multiple analytics platforms.
• Cataloguing and governance of the data in the analytics platforms.
• Simple interfaces for the line of business to access the information they need.
Data Reservoir = Efficient Management, Governance, Protection and Access.
Data Reservoir
Information Management and Governance Fabric
Data Reservoir Services
PRODUCT MAPPINGS
Systems interfacing with the Data Reservoir
System/Subsystem Name | Description
Mobile and other channels | These are operational applications that support the interaction with people such as customers, suppliers and employees. The data reservoir may supply key data values and analytical insight to a high-speed cache for these applications to improve the performance of simple lookups. The data reservoir is able to refresh this cache after an outage.
System of Record Applications | These are operational applications that are driving an organization’s daily business. They supply information to the data reservoir that describes this daily operation and its associated master data. They receive analytical insights and other derived information such as micro-segmentation and alerts.
New Sources | New sources describe information outside of the business data managed by the system of record applications. This may be log files from customer interactions, or information from third parties such as social media services and data providers.
Other Data Lakes | This data reservoir may be exchanging information with other data lakes, swamps or reservoirs, whether owned by this organization, part of a cloud deployment or owned by an external party.
Decision Model Management | Decision model management describes the systems used by data scientists and business analysts as they configure analytics models and rules to execute inside the data reservoir. This is where the advanced analytics and data mining are managed from. The team needs access to samples of the data, formatted for analysis tools, with sufficient performance capacity to handle intense, lumpy workloads from the mining and testing processes.
Information Curator | An individual or group of people in the organization who have information sources to share.
Data Reservoir Services Components Summary
Component | Description | Product | Pattern
Data Ingestion | Data ingestion is where data from the information sources is loaded into the data reservoir. This data is treated as reference data (read only) by the processes in the data reservoir. The data ingestion component is responsible for validating the incoming data, transforming relevant structured data to the data reservoir format and routing it to the appropriate data reservoir repositories. | InfoSphere Information Server | Information Broker
Publishing Feeds | Publishing feeds is responsible for distributing data from the data reservoir repositories to systems outside of the reservoir. This includes other data reservoirs and the operational systems of record. | InfoSphere Information Server | Information Broker
Real-time Interfaces | Real-time interfaces (a) provide services to access data in the data reservoir repositories and (b) provide real-time interfaces for querying data outside of the reservoir. These interfaces may be services or SQL style interfaces. | InfoSphere MDM, Information Server (Information Services Director) | Information Service
Real-time Analytics | Real-time analytics provides complex event processing and real-time analytics based on the activity within the organization, and externally. | InfoSphere Streams | Streaming Analytics Node
Raw Data Interaction | Raw data interaction provides access to most of the data (security permitting) in the data reservoir for advanced analytics. It is responsible for masking sensitive personal information where appropriate. | InfoSphere Information Server; GaianDB/InfoSphere Federation Server; InfoSphere Big Insights | Information Provisioning
Catalog Interfaces | The catalog interfaces provide information about the data in the data reservoir. This includes details of the information collections (both repositories and views), the meaning and types of information stored and the profile of the information values within each information collection. | InfoSphere Information Server (InfoSphere Governance Catalog) | Information Identification
View-based Interaction | Provides access to data in the data reservoir (subject to security permissions) for line of business teams that wish to perform ad hoc queries, search, simple analytics and data exploration. The structure of this information has been simplified and it is labeled using business relevant terminology. | InfoSphere Information Server; GaianDB/InfoSphere Federation Server; InfoSphere Data Explorer | Information Service; Information Provisioning; Search Node
Reporting Data Marts | The reporting data marts provide departmental/subject oriented data marts targeted at line of business reports. | |
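The Data Ingestion row above names three responsibilities: validating incoming data, transforming it to the reservoir format and routing it to the appropriate repository. A minimal sketch of that sequence follows. The record layout, the `ROUTES` mapping from record kind to repository and the fallback to a deep data store are assumptions made for illustration, not a description of how InfoSphere Information Server behaves.

```python
from typing import Any

REQUIRED_FIELDS = {"source", "kind", "payload"}

# Hypothetical routing table from record kind to target repository.
ROUTES = {
    "master": "asset_hub",
    "activity": "activity_hub",
    "history": "operational_history",
}


def ingest(record: dict[str, Any], repositories: dict[str, list[dict[str, Any]]]) -> str:
    """Validate, transform and route one incoming record; return the target name."""
    # Validate: reject records that lack the fields needed for routing.
    missing = REQUIRED_FIELDS - set(record)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # Transform: normalise attribute names to a common lower-case format.
    payload = {str(key).lower(): value for key, value in record["payload"].items()}
    # Route: unrecognised kinds land in deep data for later analysis.
    target = ROUTES.get(record["kind"], "deep_data")
    repositories.setdefault(target, []).append({"source": record["source"], **payload})
    return target


repositories: dict[str, list[dict[str, Any]]] = {}
first_target = ingest(
    {"source": "crm", "kind": "master", "payload": {"Name": "Ann"}},
    repositories,
)
```

Landing unrecognised records in a catch-all store rather than rejecting them mirrors the Deep Data idea of mapping data to structures after it is stored, though a real deployment would choose that policy deliberately.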
Data Reservoir Repositories – Harvested Data, Descriptive Data and Deposited Data
Type | Name | Description | Product | Pattern
Harvested Data | Operational History | A repository providing a historical record of the data from the systems of record. | Database | Operational Status Node
Harvested Data | Information Warehouse | A repository optimized for high speed analytics. This data is structured and contains a correlated and consolidated collection of information. | PureData for Analytics; Industry Models | Information Warehouse
Harvested Data | Deep Data | A repository holding a copy of most of the data in the data reservoir. It provides a place where raw data can be landed for analysis. The data may be annotated, linked and consolidated in deep data. Data may be mapped to data structures after it is stored, so effort is spent as needed rather than at the time of storing. This repository is designed for flexibility, supporting both high volumes and a wide variety of data. | InfoSphere Big Insights; Industry Models | Map-Reduce Node
Harvested Data | Audit Data | A repository used to keep a record of the activity in the data reservoir. It is used for auditing the use of data and who is accessing it, when and for what purpose. | InfoSphere Big Insights | Information Event Node
Descriptive Data | Catalog | A repository and applications for managing the catalog of information stored in the data reservoir. | InfoSphere Information Server; Industry Models | Information Identification
Descriptive Data | Information Views | Definitions of simplified subsets of information stored in the data reservoir repositories. These views are created with the information consumer in mind. | Relational Database; InfoSphere MDM; InfoSphere | Virtual Information Collection
Data Reservoir Repositories – Shared operational data
Type | Name | Description | Product | Pattern
Shared Operational Data | Asset Hub | A repository for slowly changing operational master data (information assets) such as customer profiles, product definitions and contracts. This repository provides authoritative operational master data for the real-time interfaces, real-time analytics and for data validation in data ingestion. It is a reference repository of the operational MDM systems but may also be extended with new attributes that are maintained by the reservoir. When this hub is taking data from more than one operational system, there may also be additional quality and deduplication processes running that will improve the data. These changes are published from the asset hub for distribution both inside and outside the reservoir. | InfoSphere MDM Advanced Edition | Information Asset and Information Asset Hub
Shared Operational Data | Activity Hub | A repository for storing recent activity related to a master entity. This repository is needed to support the real-time interfaces and real-time analytics. It may be loaded through the data ingestion process and through the real-time interfaces. However, many of its values will have been derived from analytics running inside the data reservoir. | InfoSphere MDM Custom Domain Hub; Industry Models | Information Activity and Information Activity Hub
Shared Operational Data | Code Hub | A repository of common code tables and mappings used for joining information sources to create information views. | InfoSphere Reference Data Management Hub (RDM) | Information Code and Information Code Hub
Shared Operational Data | Content Hub | A repository of documents, media files and other content that has been managed under a content management repository and is classified with relevant metadata to understand its content and status. | |
Information integration and governance components
Name | Description | Product | Pattern
Information Broker | The runtime server environment for running the integration processes (such as the information deployment process) that move data in and out of the data reservoir and amongst the components within the reservoir. | InfoSphere Information Server | Information Broker
Code Hub | A repository managing the code tables used in the internal management of the data reservoir. | InfoSphere Reference Data Management Hub (RDM) | Information Code and Information Code Hub
Staging Areas | A server supporting the staging areas used to move information around the data reservoir. | Database or InfoSphere Big Insights or WebSphere MQ | Staging Area
Operational Governance Hub | A repository and applications for managing the information flow and information governance within the data reservoir. This information node supports the metadata services. | InfoSphere Information Server | Governance Node
Monitor | A mechanism to monitor the overall function and responsiveness of the data reservoir to assure consistent working. | InfoSphere Information Server | Information Probe and Information Monitoring
Workflow | A server running stewardship processes that coordinate the work of individuals responsible for fixing any problems with the data in the data reservoir. | WebSphere Business Process Manager | Agile Information Process and
This component provides the control of the information movement and consumption within the data reservoir (more details follow …)
REFERENCE MATERIAL
Information Architecture for a New Era of Computing
A high-level description of the Big Data and Analytics
Taking the Journey to IBM Cognitive Systems
Describes how an organization should prepare for cognitive computing.
Includes an example roadmap of solutions to develop key skills and capabilities.
Next Best Action Redguide
The NBA Redguide is a customer guide to the solution.
It is suitable for C-suite executives.
It explains the value of the solution.
It describes the solution’s architecture using the same diagrams as we have just covered.
It also has examples of case studies from different industries.
Ethics for Big Data and Analytics
Context – For what purpose was the data originally surrendered? For what purpose is the data now being used? How far removed from the original context is its new use?
Consent & Choice – What are the choices given to an affected party? Do they know they are making a choice? Do they really understand what they are agreeing to? Do they really have an opportunity to decline? What alternatives are offered?
Reasonable – Is the depth and breadth of the data used and the relationships derived reasonable for the application it is used for?
Substantiated – Are the sources of data used appropriate, authoritative, complete and timely for the application?
Owned – Who owns the resulting insight? What are their responsibilities towards it in terms of its protection and the obligation to act?
Fair – How equitable are the results of the application to all parties? Is everyone properly compensated?
Considered – What are the consequences of the data collection and analysis?
Access – What access to data is given to the data subject?
Accountable – How are mistakes and unintended consequences detected and repaired? Can the interested parties check the results
http://www.ibmbigdatahub.com/whitepaper/ethics-big-data-and-analytics
Staying Ahead in the Cyber Security Game
http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=WH&infotype=SA&appname=SWGE_TI_SE_USEN&htmlfid=TIL14103USEN&attachment=TIL14103USEN.PDF#loaded
Industry Models and Big Data
Whitepaper on the use of our industry models with big
Roles within the Reservoir
Governor: appoint an individual to coordinate the definition of policies related to information governance and their implementation.
Information Steward: appoint an individual to coordinate the manual activity necessary to monitor and verify that an information collection is meeting agreed quality levels. Create user interfaces and access rights to involve this individual in information quality processes such as the exception management process.
Quality Analyst: appoint an individual to monitor and analyze the state of the information flowing through the information supply chain.
Integration Developer: appoint an individual to maintain the data movement functionality in, around and out of the data reservoir.
Infrastructure Operator: appoint an individual responsible for starting, maintaining, and monitoring the systems that support the information supply chain.