Business Analytics – The Key
Element in Enterprise Big Data
Initiatives
Mike Biere
Rocket Software
March 10, 2014 1:30pm – 2:30pm
Session Number (15314)
Grand ballroom Salon G
Who wants to define Big Data?
Are you tired of hearing about Big Data?
Roughly from Wikipedia: Data whose magnitude cannot be handled by
conventional tools.
The impression you may have from many articles: Big Data exists only in
the context of Hadoop
The emerging consensus is that Big Data is any data source that fits the
magnitude criteria where magnitude means Volume, Velocity, Variety, and
Complexity
We define Big Data as both Hadoop and non-Hadoop Big Data.
Big Data … what? What?
Big Data … big deal?
• Big Data means numerous things to many people, but at the heart of it is the need to coalesce and examine a massive range of information.
• In order to do so, the common approach is to find a business analytics tool to process the widest range of sources possible:
  • RDBMS
  • OLAP
  • Text
  • Hadoop etc.
  • Other …
• If you cannot process the data, all you have done is increase your storage requirements and end user frustration.
Challenges
• IT
  • What sources to address
  • What BIA technologies support them
  • How much ‘stuff’ will we need to support BD? Staff, HW, SW, consulting?
• Business users
  • What out of this tsunami of data makes any sense for me to try to access?
  • What are the business drivers for my efforts?
  • What is the ROI?
• Solution providers – that would be me
  • How on earth can we maintain what we have and yet deliver more feature/function in the BD space?
  • What do our customers believe to be the analytic requirements for their BD efforts?
  • How much BD are we going to have to analyze and, if huge, can we do this efficiently?
Hadoop
Hadoop grew out of work begun in 2002 to deal with storage and retrieval and was developed heavily at Yahoo. It grew to include an implementation of Google’s MapReduce process.
Hadoop has evolved since then and is used in many organizations, with a big focus on unstructured, non-RDBMS data.
http://wiki.apache.org/hadoop/PoweredBy
Hadoop handles High Velocity, High Volume, Large Variety, and Complexity
Hive
“Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language …” Hive was invented at Facebook.
http://hive.apache.org/
IBM InfoSphere BigInsights
IBM InfoSphere BigInsights brings the power of Hadoop to the enterprise. BigInsights is the IBM implementation of Hadoop.
http://www-01.ibm.com/software/data/infosphere/biginsights/
Most BIA vendors have heavily concentrated here
Big Data challenges for a BIA provider
• We cannot ignore existing data access and analyses (e.g. DB2, Oracle, IMS, OLAP, text, etc.).
• As technologies emerge, new solutions often offer a BIA UI that addresses mostly what they are currently doing and sometimes ignores the existing BIA realm.
• The data layer topology is critical: all data must “look the same” to an end user.
• Metadata layers are critical to both IT and the end user to mask, change, and present the information uniformly.
• Federation of data is essential if and when it makes sense.
• Caching data from all sources must be an option.
• Refreshing vs cache
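The caching and "refreshing vs cache" points can be sketched in a few lines of Python. This is only an illustration of the trade-off, not how QMF or any other BIA product implements it; the class name CachedSource, the max_age_seconds parameter, and the injected clock are all hypothetical.

```python
import time
from typing import Callable, List, Optional, Tuple

Row = Tuple


class CachedSource:
    """Serve rows from a cache and go back to the source only when asked or when stale."""

    def __init__(self, fetch: Callable[[], List[Row]], max_age_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch              # hypothetical callable that reads the real source
        self._max_age = max_age_seconds  # how long cached rows are considered fresh
        self._clock = clock              # injectable so the sketch can be tested offline
        self._rows: Optional[List[Row]] = None
        self._loaded_at = 0.0
        self.fetch_count = 0             # number of trips to the underlying source

    def rows(self, refresh: bool = False) -> List[Row]:
        stale = self._rows is None or (self._clock() - self._loaded_at) > self._max_age
        if refresh or stale:
            self._rows = self._fetch()
            self._loaded_at = self._clock()
            self.fetch_count += 1
        return self._rows
```

A report that tolerates slightly old data calls rows(); a user who needs current figures calls rows(refresh=True) and pays for the trip to the source.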
Rocket’s commitment to Big Data
follows closely the IBM path as well as
external influencers
• Hadoop and more Hadoop
• Keeping an eye on what’s next
• Ensuring our data and metadata structures are malleable and flexible
• Continued work on structured and unstructured (non-Hadoop-ish) sources
• Enhancing data federation capabilities
• Maximizing efficiencies and reduction of overhead as the data volumes continue to accelerate
• Let’s look at what we’ve done with the QMF family for Big Data
Once the data is returned to QMF, it can be used in any QMF report or visualization.
QMF functions with Hadoop and
non-Hadoop Big Data
QMF 11 supports Hive data sources, and the
underlying Hadoop data
QMF 11 supports IBM InfoSphere
BigInsights with direct access using the
BigSQL driver as an alternative to Hive
DB2 11 supports IBM InfoSphere
BigInsights with direct access using the
HDFS_READ table function
QMF 11 with the DB2 Analytics Accelerator
DB2 for z/OS holds data of high Volume, requires queries of great Complexity, and is one of the Variety of data sources in the Big Data world. With the addition of the DB2 Analytics Accelerator for z/OS, the information can be delivered with Big Data Velocity.
QMF 11 provides some control with the DB2
Analytics Accelerator
Semi-colon separated SQL statements allow the user to guide the DB2
Analytics Accelerator for any given query.
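As a hedged illustration of what such a semicolon-separated statement pair might look like: DB2 for z/OS has a CURRENT QUERY ACCELERATION special register, and the sketch below assumes that is the statement placed ahead of the query. The slide does not show the actual statements, so treat that as an assumption. The helper name with_acceleration and the table SALES_HISTORY are hypothetical.

```python
# Values accepted by the DB2 for z/OS CURRENT QUERY ACCELERATION special register.
ACCELERATION_MODES = {"NONE", "ENABLE", "ENABLE WITH FAILBACK", "ELIGIBLE", "ALL"}


def with_acceleration(query: str, mode: str = "ENABLE WITH FAILBACK") -> str:
    """Prefix a query with a SET statement, separated by a semicolon (hypothetical helper)."""
    mode = mode.upper()
    if mode not in ACCELERATION_MODES:
        raise ValueError(f"unknown acceleration mode: {mode}")
    body = query.rstrip().rstrip(";")  # avoid a doubled semicolon
    return f"SET CURRENT QUERY ACCELERATION = {mode};\n{body};"


# SALES_HISTORY is a made-up table used only for illustration.
print(with_acceleration(
    "SELECT REGION, SUM(AMOUNT) FROM SALES_HISTORY GROUP BY REGION", "ALL"))
```

The first statement sets the routing preference and the second is the query itself, so the choice can differ from one query to the next.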
IBM IMS is clearly to be counted among the Variety of data sources in an enterprise
Big Data environment. http://www-01.ibm.com/software/data/ims
QMF 11 support for IMS and VSAM* files
IBM DB2 with BLU Acceleration
One of IBM’s newest entries in the Big Data world is DB2 with BLU Acceleration. A simple upgrade of advanced editions of DB2 for Linux 10 or DB2 for AIX 10 to version 10.5 brings software acceleration.
JavaScript calls to HTTP URLs that supply data.
A non-SQL capability: a JavaScript-defined table can be created and will reside in the QMF table lists. When queried, it launches a request and returns data in a typical result set. This data is available for any QMF report or visualization.
To understand how it works, look at this DB2 query. It creates a table, populates it with a select statement, queries the table, and then drops the table.
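The create, populate, query, drop sequence described here can be sketched as follows. This is a stand-in, not the query from the slide: it uses Python's built-in sqlite3 module rather than DB2, and the table names orders and region_totals and the sample rows are made up for illustration.

```python
import sqlite3


def summarize_with_temp_table(conn: sqlite3.Connection):
    """Create a work table, fill it from a SELECT, read it back, then drop it."""
    conn.execute("CREATE TABLE region_totals (region TEXT, total REAL)")
    conn.execute(
        "INSERT INTO region_totals "
        "SELECT region, SUM(amount) FROM orders GROUP BY region"
    )
    rows = conn.execute(
        "SELECT region, total FROM region_totals ORDER BY region"
    ).fetchall()
    conn.execute("DROP TABLE region_totals")
    return rows


# Illustrative in-memory source data (hypothetical values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("EAST", 120.0), ("WEST", 80.0), ("EAST", 40.0)],
)
print(summarize_with_temp_table(conn))
```

The caller only ever sees the result set; the work table exists just long enough to produce it.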
A JavaScript table is stored as a definition
of the table…
… and a request for data to populate and
optionally clean out old data…
It appears to the business user as just another
table.
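The same idea, a stored definition plus a data request that runs at query time, can be sketched in Python. This is an analogy only and does not reproduce QMF's JavaScript table API. The class UrlBackedTable, the fetch_json callable (standing in for an HTTP call), and the quotes table with its sample payload are all hypothetical.

```python
import json
import sqlite3
from typing import Callable, List, Tuple


class UrlBackedTable:
    """A table definition plus a request that populates it whenever it is queried."""

    def __init__(self, conn: sqlite3.Connection, name: str,
                 columns: List[Tuple[str, str]], fetch_json: Callable[[], str],
                 clear_before_load: bool = True) -> None:
        self.conn = conn
        self.name = name
        self.columns = columns            # the stored definition: (column, type) pairs
        self.fetch_json = fetch_json      # stands in for the HTTP request for data
        self.clear_before_load = clear_before_load
        cols = ", ".join(f"{col} {typ}" for col, typ in columns)
        conn.execute(f"CREATE TABLE IF NOT EXISTS {name} ({cols})")

    def query(self, sql: str):
        if self.clear_before_load:        # optionally clean out old data
            self.conn.execute(f"DELETE FROM {self.name}")
        records = json.loads(self.fetch_json())
        names = [col for col, _ in self.columns]
        marks = ", ".join("?" for _ in names)
        self.conn.executemany(
            f"INSERT INTO {self.name} VALUES ({marks})",
            [tuple(rec[n] for n in names) for rec in records],
        )
        return self.conn.execute(sql).fetchall()


# Made-up payload in place of a live HTTP response.
sample = '[{"symbol": "AAA", "price": 1.5}, {"symbol": "BBB", "price": 2.5}]'
conn = sqlite3.connect(":memory:")
quotes = UrlBackedTable(conn, "quotes", [("symbol", "TEXT"), ("price", "REAL")],
                        lambda: sample)
print(quotes.query("SELECT symbol, price FROM quotes ORDER BY symbol"))
```

From the caller's side the request is invisible: the query is ordinary SQL against what looks like an ordinary table.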
On the horizon: IBM announces collaboration
between DB2 and MongoDB
http://event.on24.com/eventRegistration/EventLobbyServlet?target=lobby.jsp&eventid=644972&sessionid=1&key=DB062AB924C12 67CB960F5AB827DA366&eventuserid=85776810
QMF and Big Data Summary
Hadoop            Structured              New
IBM BigInsights   DB2 for z/OS            JavaScript tables
Hive/Hadoop       Analytics Accelerator
                  IMS, VSAM
                  DB2 BLU
                  OLAP
                  Other
So … does this align with your Big Data initiative? What other sources do you need?