© COPYRIGHT 2015 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Data Big and Small: How Publishers Gain Value from Data in the Future
Agenda
Introduction: De Gruyter, Newbooks & MarkLogic
Big Data Definition
Data Big and Small
Publishers' perspective: Challenge, Approach & Solution
De Gruyter Use Case: Internal Data Analysis
Innovative, agile responses to the challenges of tomorrow
Only a holistic view of publishing, and a strategy built on it, will satisfy customers' new demands
The very early days of Big Data
Big Data = One C and Three V's
Volume
Variety
Velocity
Complexity
How did we get here?
The world as it was…
Data: Regular & Tabular
Compute & Storage: Slow & Expensive
EMP
EMPNO  ENAME   DEPTNO
7782   CLARK   10
7934   MILLER  10
7876   ADAMS   20
7902   FORD    20
7900   SMITH   30
We had an elegant model that met our needs…
DEPT
DEPTNO  DNAME
10      ACCOUNTING
20      RESEARCH
30      SALES
40      SHIPPING
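A minimal sketch of the relational model shown in these two tables, assuming Python's built-in sqlite3 module: EMP and DEPT are linked by the shared DEPTNO key, and a join recovers each employee's department name. The lowercase table names and the in-memory database are only illustrative.

```python
import sqlite3

# In-memory database holding the two example tables from the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT);
CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT,
                  deptno INTEGER REFERENCES dept(deptno));
""")
conn.executemany(
    "INSERT INTO dept VALUES (?, ?)",
    [(10, "ACCOUNTING"), (20, "RESEARCH"), (30, "SALES"), (40, "SHIPPING")],
)
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(7782, "CLARK", 10), (7934, "MILLER", 10), (7876, "ADAMS", 20),
     (7902, "FORD", 20), (7900, "SMITH", 30)],
)

# Join each employee to its department name via the shared DEPTNO key.
rows = conn.execute("""
SELECT e.ename, d.dname
FROM emp e JOIN dept d ON e.deptno = d.deptno
ORDER BY e.empno
""").fetchall()

for ename, dname in rows:
    print(ename, dname)
```

This regular, tabular shape is exactly what the relational model handles well; the following slides argue that most of today's data no longer looks like this.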
Fast forward to today…
33% on innovation & growth; 67% on maintenance (keeping the lights on)
We end up with the wrong technology for the job
When all you have is a hammer, everything looks like a nail…
The Three V’s of Big Data
VOLUME
VARIETY
VELOCITY
IT faces the challenge of leveraging both heterogeneous and unstructured data:
Sources: OLTP, Warehouse, Data Marts, Reference Data, Archives
12% structured, 88% unstructured
The Three V’s of Big Data
Volume: more of the same things
Variety: lots of different things (sources, questions, formats, shape)
The result…
The Big and Small Data Opportunity:
Is it possible to use all your data cost-effectively and realign for the future?
CHALLENGE, APPROACH & SOLUTION
Information Continuum
RDBMS, Free text, Relational, Emails, Documents, XML, Metadata / ONIX, Content, Geospatial, Graph, Search Engine (axis: volume of information)
Today's Data Landscape
Challenge: Things publishers need to deal with
Different file formats and schemas (XML, CSV, Excel, binaries)
Different information transfer technologies (REST/SOAP, MBS, file exports/imports, file transfer protocols)
Growing data volumes = scalability
Current situation: acquire knowledge of your data streams
Desired situation: data-driven decision making
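One way to approach the first challenge is to map each incoming format onto a common record shape before any analysis. A minimal sketch using only Python's standard library, assuming hypothetical CSV and XML feeds; the `title_id` and `title` field names and the sample values are invented for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample inputs: the same kind of record delivered in two formats.
CSV_FEED = "title_id,title\nA-001,Title A\n"
XML_FEED = """<records>
  <record><title_id>B-002</title_id><title>Title B</title></record>
</records>"""

def from_csv(text):
    """Parse CSV rows into plain dicts keyed by column name, tagged with their source."""
    return [dict(row, source="csv") for row in csv.DictReader(io.StringIO(text))]

def from_xml(text):
    """Parse <record> elements into the same dict shape that from_csv produces."""
    root = ET.fromstring(text)
    return [
        {"title_id": rec.findtext("title_id"), "title": rec.findtext("title"), "source": "xml"}
        for rec in root.iter("record")
    ]

# Both feeds now share one record shape and can be stored or queried together.
records = from_csv(CSV_FEED) + from_xml(XML_FEED)
for r in records:
    print(r)
```

Each new format then needs only one small adapter function rather than another dedicated system.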
Information and System Flow Chart
Approach: What do publishers need?
More systems to cover all our requirements
= increasing costs, maintenance, support
= further interfaces, mid-/long-term integration
= inflexible, not agile
OR
One target system that allows interdisciplinary reporting and offers NoSQL technologies?
MarkLogic Enterprise NoSQL Database: Search, Database, Semantics
Solution: NoSQL Database
- don't worry about your data
- different applications for different target groups
- XML-oriented searches, queries and indexes
- manage large volumes
De Gruyter uses MarkLogic for:
- De Gruyter Online platform
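To illustrate the idea behind XML-oriented indexes and queries, here is a plain-Python sketch using `xml.etree.ElementTree`. It is not MarkLogic's API or its indexing implementation; it only shows the general technique of indexing element values so that a query by element and value becomes a lookup. The document URIs and contents are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical documents, keyed by a URI-like identifier.
DOCS = {
    "/books/a.xml": "<book><title>Data Basics</title><subject>Linguistics</subject></book>",
    "/books/b.xml": "<book><title>Graph Theory</title><subject>Mathematics</subject></book>",
    "/journals/c.xml": "<article><title>Corpus Study</title><subject>Linguistics</subject></article>",
}

def build_index(docs):
    """Map (element name, text value) to the set of document URIs containing it."""
    index = defaultdict(set)
    for uri, xml_text in docs.items():
        for elem in ET.fromstring(xml_text).iter():
            text = (elem.text or "").strip()
            if text:  # skip container elements that carry no text of their own
                index[(elem.tag, text)].add(uri)
    return index

index = build_index(DOCS)

# Element-value query: every document, book or article, whose <subject> is "Linguistics".
print(sorted(index[("subject", "Linguistics")]))
```

Because the index is keyed by element name rather than by table, books and journal articles with different structures can be queried together.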
Use Case
Correlation of usage and sales (1/2)
Two exemplary questions:
• What is being used but not sold?
• What is being sold but not used?
=> early insight into customers' behavior; allows the business to react accordingly and address customers
Increasing the attractiveness of a publication requires a better understanding of the connection between sales and usage
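Once usage and sales are keyed by the same title identifier, both questions reduce to simple set comparisons. A minimal sketch, assuming hypothetical per-title counts that have already been joined on a shared key (T1 to T5 are placeholder IDs, not real titles):

```python
# Hypothetical per-title counts; in practice these would come from the
# usage statistics and the sales figures described in this use case.
usage = {"T1": 120, "T2": 0, "T3": 45, "T4": 8}
sales = {"T1": 10, "T2": 3, "T3": 0, "T5": 2}

def used_not_sold(usage, sales):
    """Titles with usage > 0 but no recorded sales (missing counts as 0)."""
    return sorted(t for t, n in usage.items() if n > 0 and sales.get(t, 0) == 0)

def sold_not_used(usage, sales):
    """Titles with sales > 0 but no recorded usage (missing counts as 0)."""
    return sorted(t for t, n in sales.items() if n > 0 and usage.get(t, 0) == 0)

print(used_not_sold(usage, sales))  # ['T3', 'T4']
print(sold_not_used(usage, sales))  # ['T2', 'T5']
```

Treating a missing key as zero matters here: a title that appears in only one source is exactly the kind of gap these questions are meant to surface.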
Use Case
Correlation of usage and sales (2/2)
Source 1: Usage statistics of De Gruyter Online
Overview of database, book (chapter) and journal (article) usage
Source 2: Sales figures from the data warehouse
Overall sales statistics: webshop, mail/telephone orders (customer service)
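Before usage and sales can be compared, the two sources have to be brought to the same grain. A minimal sketch, assuming hypothetical row layouts in which usage is recorded per chapter or article with a `parent` title, and sales per title and channel; all field names and values are invented for illustration.

```python
from collections import Counter

# Hypothetical raw rows. Usage is recorded per item (a book chapter or a
# journal article) and rolled up to its parent title; sales come from several channels.
usage_rows = [
    {"item": "B1-ch1", "parent": "B1", "type": "chapter", "count": 30},
    {"item": "B1-ch2", "parent": "B1", "type": "chapter", "count": 12},
    {"item": "J1-a7", "parent": "J1", "type": "article", "count": 50},
]
sales_rows = [
    {"title": "B1", "channel": "webshop", "units": 4},
    {"title": "B1", "channel": "customer_service", "units": 1},
    {"title": "J1", "channel": "webshop", "units": 0},
]

def total_usage(rows):
    """Sum item-level usage per parent title."""
    totals = Counter()
    for r in rows:
        totals[r["parent"]] += r["count"]
    return totals

def total_sales(rows):
    """Sum units per title across all sales channels."""
    totals = Counter()
    for r in rows:
        totals[r["title"]] += r["units"]
    return totals

usage, sales = total_usage(usage_rows), total_sales(sales_rows)

# One (usage, sales) pair per title, covering titles found in either source.
combined = {t: (usage.get(t, 0), sales.get(t, 0)) for t in sorted(set(usage) | set(sales))}
print(combined)  # {'B1': (42, 5), 'J1': (50, 0)}
```

In this sample, J1 is heavily used but has no sales, which is the "used but not sold" case from the previous slide.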
Group Question
Think about a use case that would let you break up data silos and run ad-hoc analyses of heterogeneous data types from various sources that could be useful for your business:
- What kind of data?
- What data sources?
- Which questions currently take a lot of effort to answer?
- Which decision-making process can this data / answer support?
- What new insights can be derived from this?