Data Big and Small: How Publishers Gain Value from Data in the Future

(1)

© COPYRIGHT 2015 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.


(2)

Agenda



- Introduction: De Gruyter, Newbooks & MarkLogic
- Big Data Definition
- Data Big and Small
- Publishers' Perspective: Challenge, Approach & Solution
- De Gruyter Use Case: Internal Data Analysis

(3)

Innovative, agile responses to the challenges of tomorrow

(4)

Only a holistic view of publishing and a subsequent strategy will satisfy customers’ new demands

(5)

(6)

(7)

The very early days of Big Data

(8)

Big Data = One C and Three V's

Volume

Variety

Velocity

Complexity

(9)

How did we get here?

(10)

The world as it was…

Data: Regular & Tabular

Compute & Storage: Slow & Expensive

(11)

We had an elegant model that met our needs…

EMP
EMPNO   ENAME    DEPTNO
7782    CLARK    10
7934    MILLER   10
7876    ADAMS    20
7902    FORD     20
7900    SMITH    30

DEPT
DEPTNO  DNAME
10      ACCOUNTING
20      RESEARCH
30      SALES
40      SHIPPING
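
The EMP and DEPT tables share the DEPTNO column, which is what lets the relational model answer a question such as "which department does each employee work in?" with a join. The following is a minimal sketch using Python's built-in sqlite3 module and the rows shown on the slide. The lowercase table and column names and the two helper function names are illustrative choices, not part of the original deck.

```python
import sqlite3

# In-memory database holding the two tables from the slide.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE emp (
        empno  INTEGER PRIMARY KEY,
        ename  TEXT,
        deptno INTEGER REFERENCES dept(deptno)
    );
    """
)
conn.executemany(
    "INSERT INTO dept VALUES (?, ?)",
    [(10, "ACCOUNTING"), (20, "RESEARCH"), (30, "SALES"), (40, "SHIPPING")],
)
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        (7782, "CLARK", 10),
        (7934, "MILLER", 10),
        (7876, "ADAMS", 20),
        (7902, "FORD", 20),
        (7900, "SMITH", 30),
    ],
)


def employees_with_department(connection):
    """Return (ename, dname) pairs by joining emp to dept on deptno."""
    query = """
        SELECT e.ename, d.dname
        FROM emp AS e
        JOIN dept AS d ON d.deptno = e.deptno
        ORDER BY e.empno
    """
    return connection.execute(query).fetchall()


def empty_departments(connection):
    """Return names of departments that have no employee rows."""
    query = """
        SELECT d.dname
        FROM dept AS d
        LEFT JOIN emp AS e ON e.deptno = d.deptno
        WHERE e.empno IS NULL
        ORDER BY d.deptno
    """
    return [row[0] for row in connection.execute(query)]


rows = employees_with_department(conn)
for ename, dname in rows:
    print(f"{ename:<8}{dname}")
print("No employees:", empty_departments(conn))
```

This works well as long as every record fits the same fixed set of columns, which is the "regular and tabular" situation the slide describes.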

(12)

Fast forward to today…

33% on Innovation & Growth
67% on Maintenance (keeping the lights on)

(13)

We end up with the wrong technology for the job

When all you have is a hammer, everything looks like a nail…

(14)

The Three V’s of Big Data

VOLUME

VARIETY

VELOCITY

(15)

IT faces the challenge of leveraging both heterogeneous and unstructured data

Diagram labels: OLTP, Warehouse, Data Marts, Reference Data, ? (axes: VOLUME, VARIETY)

(17)

Variety

More of the same things



Lots of different things

SOURCES

QUESTIONS

FORMATS

SHAPE

(18)

The result…

(19)

(20)

The Big and Small Data Opportunity:
Is it possible to utilize all your data in a cost-effective way and realign for the future?

(21)

CHALLENGE, APPROACH & SOLUTION

(22)

Information Continuum

Diagram labels: RDBMS, Free text, Relational, PDF, Emails, Documents, XML, Metadata / Onix, Content, Geospatial, Graph, Search Engine (axis: Volume of Information)

Today's Data Landscape

(23)

Challenge: Things publishers need to deal with



- Different file formats and schemas (XML, CSV, Excel, binaries)
- Different information transfer technologies (REST/SOAP, MBS, file exports/imports, file transfer protocols)
- Growing data volumes = scalability

Current situation: acquire knowledge of your data streams

Desired situation: data-driven decision making
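
To make the file-format challenge concrete, the following is a minimal sketch of reading a CSV feed and an XML feed into one common record shape so they can be analyzed together. It uses only the Python standard library. The feed contents, field names (item_id, title, units, downloads) and function names are hypothetical and chosen purely for illustration; they do not come from any actual publisher system.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample feeds; real feeds would have their own schemas.
CSV_FEED = "item_id,title,units\nid-001,Example Title A,3\n"
XML_FEED = (
    "<records>"
    "<record><item_id>id-002</item_id><title>Example Title B</title>"
    "<downloads>12</downloads></record>"
    "</records>"
)


def records_from_csv(text):
    """Map each CSV row to the common record shape."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "item_id": row["item_id"],
            "title": row["title"],
            "source": "csv",
            "count": int(row["units"]),
        }
        for row in reader
    ]


def records_from_xml(text):
    """Map each <record> element to the common record shape."""
    root = ET.fromstring(text)
    return [
        {
            "item_id": rec.findtext("item_id"),
            "title": rec.findtext("title"),
            "source": "xml",
            "count": int(rec.findtext("downloads")),
        }
        for rec in root.iter("record")
    ]


records = records_from_csv(CSV_FEED) + records_from_xml(XML_FEED)
for record in records:
    print(record)
```

Every additional format or schema needs its own mapping function of this kind, which is one reason the number of interfaces grows with the number of systems.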

(24)

Information and System Flow Chart

(25)

Approach: What do publishers need?

More systems to cover all our requirements
= increasing costs, maintenance, support
= further interfaces, mid-/long-term integration
= inflexible, not agile

OR

One target system that allows interdisciplinary reporting and offers NoSQL technologies?

(26)

(27)

MarkLogic Enterprise NoSQL Database

SEARCH

DATABASE

Semantics

(28)

Solution: NoSQL Database

- don't worry about your data

- different applications for different target groups

- XML-oriented searches, queries and indexes

- manage large volumes

De Gruyter uses MarkLogic for:

- De Gruyter Online platform

(29)

Use Case

Correlation of usage and sales (1/2)

Two example questions
• What is being used but not sold?
• What is being sold but not used?

=> early insight into customers' behavior; allows the business to react accordingly and address customers

Increasing the attractiveness of a publication requires a better understanding of the connection between sales and usage

(30)

Use Case

Correlation of usage and sales (2/2)

Source 1: Usage statistics of De Gruyter Online
gives an overview of database, book (chapter) and journal (article) usage

Source 2: Sales figures from the data warehouse
contains overall sales statistics: webshop, mail/telephone orders (customer service)
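
Once usage and sales figures are keyed by the same title identifier, the two example questions (used but not sold, sold but not used) reduce to set differences. The following is a minimal sketch under that assumption. The title identifiers, the counts and the function name are invented for illustration and are not De Gruyter data or part of any MarkLogic API.

```python
def compare_usage_and_sales(usage, sales):
    """Split titles into used-but-not-sold, sold-but-not-used, and both.

    usage and sales map a title identifier to a count; a title with a
    count of zero is treated the same as a title that is missing.
    """
    used = {title for title, count in usage.items() if count > 0}
    sold = {title for title, count in sales.items() if count > 0}
    return {
        "used_not_sold": sorted(used - sold),
        "sold_not_used": sorted(sold - used),
        "used_and_sold": sorted(used & sold),
    }


# Hypothetical figures: usage from the online platform, sales from the warehouse.
usage = {"title-A": 120, "title-B": 0, "title-C": 45}
sales = {"title-A": 10, "title-B": 7, "title-D": 0}

result = compare_usage_and_sales(usage, sales)
for group, titles in result.items():
    print(group, titles)
```

In practice the harder part is usually agreeing on the shared identifier across the two sources, since usage may be recorded per chapter or article while sales are recorded per product.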

(31)

Group Question

Think about a "use case" that will allow you to break up data silos to do ad hoc analysis of heterogeneous data types coming from various sources and that can be useful for your business

- What kind of data?
- What data sources?
- Questions that take a lot of effort to find answers to?
- Which decision-making process can this data / answer support?
- What new insights can be derived from this?

(32)