
(1)

designing and exploiting BIG DATA PLATFORM

(2)


Purpose: Capturing all business data and powering ‘smart applications’ that turn data into profits


(3)

Smart Applications turn data into profits by delivering…

…the most relevant LOYALTY offers for EACH CLIENT…

…the best PRODUCT ASSORTMENT for each store all the time…

…the right amount of INVENTORY in each store every day…

…the right check-out COUPON for each client in real time…

…the BEST PRICE for each product at all times…

…the most RELEVANT 1-TO-1 COMMUNICATION at all times…

…the most engaging MOBILE CONTENT or COUPON for each client…

…the most RELEVANT WEBSHOP CONTENT for each client…

BIG DATA PLATFORM

(4)

Why BIG DATA PLATFORM?

BIG DATA PLATFORM compared with SQL (status quo):

Cost: 1/100th of SQL’s cost

Capacity: Never lose or dump data again

Speed: Decrease latency from hours to seconds

Capability: Massively increase the depth of information captured

Profitability: Generate > 10€ of profits for every 1€ of cost

(5)

Acting as a bridge between Transactions and Decisions

Supply Chain, Marketing, Operations

Transactional world: … all relevant data sources…

Data: Unlimited capacity, Event Sourcing, Agility & Resilience, Very low cost

Decisional world: Smart applications exploiting the data. Making operational decisions. High degree of automation. High ROI. Strong profitability.

(6)

Removing the performance limitations of SQL

Unlimited storage: Close to unlimited, affordable storage

True availability: Stable and high performance no matter how much data is stored and how many applications are pulling data

Preserving the full history of your data: Record every change to your data and enable much smarter decisions (Event sourcing)

High agility: Rapid iterations over the data

Simplicity: Data is documented and truly easy to consume

Serviceability: Emphasis on self-service patterns while avoiding table clutter and cryptic fields

Rapid & simple implementation: Scoping and implementation in weeks

(7)

Smart Apps take operational decisions and create ROI

The purpose of the Data Platform is to serve “smart apps”: applications that deliver intelligence over the data made available.

Examples:

Choosing the right amount of inventory for each reference in each store.

Choosing the right price for each reference in each store.

Choosing the ‘right’ coupon for each loyalty card holder.

….

Each question is addressed by its own dedicated “smart app”. These apps can be developed internally or by 3rd parties. The Data Platform is built to offer maximal agility in the process: ‘plugging’ an app on top should be a matter of hours.

The goal for these smart apps is to directly generate operational decisions (e.g. adjusting a price, launching a re-order, choosing a coupon) that are later plugged directly into the transactional system. Smart apps that operate over the Data Platform should not be confused with Business Intelligence.

(8)

All experts in the company are empowered to initiate and use smart apps

The number of people or teams that are involved with these smart apps can be very large:

The Data Platform makes it very easy to consume clean and practical data.

Data access performance is both very stable and very high.

Expertise that exists among all the teams of the retailer can be preserved and leveraged.

The Data Platform is precisely designed not to be the bottleneck of single data initiatives. The whole setup emphasizes self-service. After the initial coaching, “consumer” teams are able to work autonomously (without the Data Platform team).

(9)

The Pitfalls of classic data approaches

‘True’ granular intelligence requires the full depth and history of data for really smart decisions.

Classic approaches in commerce fail mainly for 3 reasons:

Lack of agility (speed): Being “data smart” implies being able to iterate rapidly (weekly or even daily) over the data. Classic approaches are too slow: the world changes faster than results get delivered.

Ongoing loss of information: A lot of data is lost in subtle yet critical ways due to the performance limits of classic solutions (e.g. historical inventory levels, sales history beyond x months, etc.)

Lack of affordable scalability: Classic solutions limit the amount of data that can be stored and processed, while being extremely expensive. “Unlimited” scalability should require neither expensive hardware nor expensive teams to run.

(10)

A Custom Design ensures 100% fit and max performance

Custom Design: Why not use a big framework/application for all clients?

90% of the development effort is attributable to the creation of connectors to the various relevant data sources. This is always required.

No packaged framework is ever a perfect fit for your business. The result is unnecessary complexity and an impedance mismatch: teams end up trying to recycle features that were never intended for that exact use.

A customization from the bottom up increases performance and ‘fit’ compared to a pre-packaged approach.

Example: By adopting a domain-specific data format, it is possible to store and process 1 year of checkout receipts for a network of 1000 stores on a smartphone.

See http://w3.lokad.com/receiptstream
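To illustrate why a domain-specific format can be so compact, here is a minimal sketch in Python. The field layout, the field names and the sample values are assumptions made for illustration only; this is not the ReceiptStream format. It packs one receipt line into a fixed 16-byte binary record and compares its size with the same line written as JSON.

```python
import json
import struct

# Hypothetical fixed-width layout for one receipt line (an assumption, not ReceiptStream):
# store id (uint16), unix timestamp (uint32), product id (uint32),
# quantity (uint16), unit price in cents (uint32), little-endian, no padding.
RECEIPT_LINE = struct.Struct("<HIIHI")


def encode_line(store_id, timestamp, product_id, quantity, price_cents):
    """Pack one receipt line into a fixed-size binary record."""
    return RECEIPT_LINE.pack(store_id, timestamp, product_id, quantity, price_cents)


def decode_line(blob):
    """Unpack a binary record back into a tuple of integers."""
    return RECEIPT_LINE.unpack(blob)


# Made-up sample line, for illustration only.
line = (412, 1356998400, 9000017, 3, 249)
binary = encode_line(*line)
as_json = json.dumps(
    {"store_id": 412, "timestamp": 1356998400, "product_id": 9000017,
     "quantity": 3, "price_cents": 249}
).encode("utf-8")

print(len(binary), "bytes in binary versus", len(as_json), "bytes as JSON")
```

A real format would likely go further (for example by grouping lines per receipt or compressing repeated values), but the fixed-width record already shows where the savings come from.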

(11)

Event Sourcing (I): Capture Much More…

SQL implies adopting tabular storage and its subtle yet very strong limitations.

In SQL, only the present state of each data field is preserved; the history of each data point is lost.

When using Event Sourcing, all tables are replaced by a single (potentially very long) list of events. The whole path that has led to the present situation is recorded. Each event can capture the intent behind each data change.

Example Stock-On-Hand History: In most ERPs, each SKU is stored in a table that maps the SKU ID to its stock on hand. However, the history that has led to this stock-on-hand situation is lost. Many important questions therefore cannot be addressed:

What was the stock on hand at any point in time?

What was the list of stock corrections applied to the SKU over time?

How many units have been discarded because they reached the expiration date?

What was the true service level of a product over time?

Example Price Change: Instead of just entering a new price X for a SKU/location, it becomes possible to capture “Price moved to X because competitor just lowered to Y”. This type of information is hugely valuable to build smart apps and create ROI.
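The stock-on-hand questions on this slide can be sketched with a small append-only event log in Python. The class name, field names and sample events are hypothetical, and the day index stands in for a real timestamp; the point is only that replaying the log answers questions a single "current stock" field cannot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StockEvent:
    day: int        # day index, stands in for a real timestamp
    sku: str
    delta: int      # signed change in units on hand
    reason: str     # the intent: "delivery", "sale", "correction", "expired", ...


# An append-only log: events are only ever added, never updated or deleted.
# Sample data is made up for illustration.
LOG = [
    StockEvent(1, "SKU-1", +20, "delivery"),
    StockEvent(2, "SKU-1", -5, "sale"),
    StockEvent(3, "SKU-1", -2, "expired"),
    StockEvent(4, "SKU-1", +1, "correction"),
    StockEvent(5, "SKU-1", -4, "sale"),
]


def stock_on_hand(log, sku, day):
    """Replay the log up to and including `day` to get the stock level on that day."""
    return sum(e.delta for e in log if e.sku == sku and e.day <= day)


def events_with_reason(log, sku, reason):
    """List every event of one kind, e.g. all stock corrections for a SKU."""
    return [e for e in log if e.sku == sku and e.reason == reason]


def units_expired(log, sku):
    """Total units discarded because they reached their expiration date."""
    return -sum(e.delta for e in events_with_reason(log, sku, "expired"))


print(stock_on_hand(LOG, "SKU-1", day=3))   # stock level at the end of day 3
print(events_with_reason(LOG, "SKU-1", "correction"))
print(units_expired(LOG, "SKU-1"))
```

The `reason` field is where the intent behind a change would be recorded; a price-change event could carry a free-text reason in the same way.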

(12)

… at a fraction of the latency and 1/100th of the price

SQL is by design very slow at querying large datasets. A relational database trying to answer a SQL query in real time has no other option but to iterate sequentially over a massive chunk of data. It simply cannot be made fast.

The event sourcing approach consumes the events as they come in, so the result is always up to date and latency is extremely low. See the ‘Loyalty data’ example at the end of the presentation.

Scalability & Cost: Storing billions of events is simple and extremely well suited to cloud storage. In practice, the monthly storage cost is approximately 1/100th of the cost of SQL.

Storage cost example: 1 TB of data costs 100€/month instead of 10,000€/month in SQL storage.

(13)

Cloud Computing is a Must-Have for the Data Platform

While the Data Platform is custom software, we strongly suggest adopting a public cloud, as it is an essential ingredient both to massively lower the costs and to massively increase the agility of the project.

Availability and auto-scaling

The cloud offers two properties that are simply invaluable for a Data Platform:

1. Auto-scaling: The infrastructure adapts (*) to the workload pressure. Performance stays the same whether it is the first day of the month or the middle of the night.

2. High availability and fault tolerance: The cloud allocates computing resources and abstracts away hardware failures. No need to worry any more about failing hard drives and the myriad of similar hardware glitches that happen all the time.

(*) However, auto-scaling works only if the software architecture has been natively designed for the cloud. Achieving this is exactly one of the aspects covered by Lokad.

Unprecedented TCO

Public clouds (Windows Azure in particular) offer an unprecedented total cost of ownership to access computing resources:

Ballpark: 100€/month per TB of storage (internal initiatives cannot come close to the economies of scale of a public cloud)

(14)

Automation is a requirement for cost efficiency & scalability

When numbers are read by people, they are very expensive. In retail, any software that produces numbers that are expected to be read by people is fundamentally non-scalable.

The goal of the “smart” apps is to generate operational decisions (e.g. moving a price down) that are fed directly into the transactional systems. Smart apps that operate over the Data Platform should not be confused with Business Intelligence.

The reliability of smart app outputs is ensured by the Data Platform

ERP systems are likely to expect very reliable data sources. The Data Platform offers a way to collect the results from the smart apps and to expose them. This introduces reliability even when the underlying smart app’s analysis is not reliable.

By doing so, the Data Platform makes those results suitable for production use through the existing transactional systems.
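One way such a reliability layer could work is a "last known good" store: the platform keeps the most recent validated result and continues to serve it when a smart app crashes or returns an implausible value. The Python sketch below is an assumption about the mechanism, not a description of the actual platform; the class name, the validation rule and the sample values are hypothetical.

```python
class ResultStore:
    """Keeps the last validated result per key and serves it to consumers."""

    def __init__(self, validate):
        self._validate = validate   # callable returning True for acceptable results
        self._results = {}

    def publish(self, key, compute):
        """Run a smart app computation; keep its result only if it validates."""
        try:
            candidate = compute()
        except Exception:
            return False            # smart app crashed: previous result stays in place
        if not self._validate(candidate):
            return False            # implausible output: previous result stays in place
        self._results[key] = candidate
        return True

    def get(self, key):
        """What the transactional system reads: the last good result, or None."""
        return self._results.get(key)


def plausible_price(price):
    # Hypothetical sanity rule: a price must be a positive number below 10,000.
    return isinstance(price, (int, float)) and 0 < price < 10_000


store = ResultStore(plausible_price)
store.publish("SKU-1/price", lambda: 2.49)     # accepted
store.publish("SKU-1/price", lambda: -1.0)     # rejected, 2.49 is kept
store.publish("SKU-1/price", lambda: 1 / 0)    # crashes, 2.49 is kept
print(store.get("SKU-1/price"))
```

With this design the ERP only ever reads from the store, so an unreliable analysis degrades to a stale but valid decision instead of a broken one.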

The Data Platform serves as a dedicated abstraction layer that helps retail experts focus on their core domain instead of IT technicalities.

(15)

The projections are made accessible through an API (application programming interface).

We favor one very specific flavor of API: the REST API.

There are many practical benefits of having APIs:

The technology behind the API (i.e. the Data Platform itself) can be radically different from the technology powering the smart apps. Retail is vast; “one size fits all” is not a reasonable position for any relatively large retailer.

It creates overall “access” patterns that are much easier to document and much easier to consume.

It allows very specific access rules to be tuned as needed, which can be extremely valuable when plugging 3rd-party companies into the Data Platform.
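A minimal sketch of what such a REST endpoint with per-consumer access rules might look like, using only the Python standard library. The URL scheme (`/clients/<id>`), the `X-Api-Key` header, the key names and the projection fields are all assumptions chosen for illustration, not the platform's actual API.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical projection, assumed to be kept up to date elsewhere by the platform.
CLIENT_PROFILES = {
    "C-1001": {"purchases_3y": 42, "avg_basket_eur": 31.5},
}

# Hypothetical access rules: which fields each API key is allowed to read.
ACCESS_RULES = {
    "internal-key": {"purchases_3y", "avg_basket_eur"},
    "partner-key": {"purchases_3y"},
}


def get_profile(path, api_key):
    """Resolve GET /clients/<id> into an (HTTP status, JSON-ready body) pair."""
    allowed = ACCESS_RULES.get(api_key)
    if allowed is None:
        return 401, {"error": "unknown API key"}
    parts = path.strip("/").split("/")
    if len(parts) != 2 or parts[0] != "clients":
        return 404, {"error": "unknown resource"}
    profile = CLIENT_PROFILES.get(parts[1])
    if profile is None:
        return 404, {"error": "unknown client"}
    # Only expose the fields this consumer is entitled to see.
    return 200, {k: v for k, v in profile.items() if k in allowed}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = get_profile(self.path, self.headers.get("X-Api-Key"))
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Start the HTTP server on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling `serve()` starts the server locally. Keeping the routing and access logic in the plain function `get_profile` means the rules can be checked without any network, and the consumer never needs to know how the projection is stored.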

(16)

Application Example: Loyalty Data Storage

Challenge: Maintaining a projection of all loyalty card holders with half a dozen dimensions such as:

The number of purchases in the last 3 years

The average basket size

Demographics

…

SQL: The above projection is nearly impossible to run as a SQL query. The query has to iterate over every single transaction of the last 3 years, which proves extremely time-intensive. (*)

Cloud Data Platform: Retrieving such a “projection” can be done in seconds.

(*) A possible work-around is a SQL table dedicated to the “client” profiles, with nightly batches that update this table with the data of the day. However:

It is convoluted and requires a database expert to devise a “strategy” to deal with the problem.

It creates confusion between tables containing input data and intermediate computations.

It creates data duplication and amplifies the overall scalability problems.
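As a rough illustration of how a projection can stay up to date by consuming purchase events as they arrive, here is a sketch in Python. The class and field names are hypothetical, days are plain integers standing in for dates, and events are assumed to arrive in chronological order per client.

```python
from collections import defaultdict, deque

WINDOW_DAYS = 3 * 365  # "last 3 years", approximated in days


class LoyaltyProjection:
    """Per-client purchase count and average basket over a sliding window."""

    def __init__(self):
        self._purchases = defaultdict(deque)   # client -> deque of (day, amount)
        self._totals = defaultdict(float)      # client -> sum of amounts in window

    def _evict(self, client_id, today):
        """Drop purchases that have fallen out of the window."""
        window = self._purchases[client_id]
        while window and window[0][0] <= today - WINDOW_DAYS:
            _, amount = window.popleft()
            self._totals[client_id] -= amount

    def on_purchase(self, client_id, day, amount):
        """Consume one purchase event; events are assumed to arrive in day order."""
        self._purchases[client_id].append((day, amount))
        self._totals[client_id] += amount
        self._evict(client_id, day)

    def profile(self, client_id, today):
        """Read the projection without rescanning the full transaction history."""
        self._evict(client_id, today)
        count = len(self._purchases[client_id])
        average = self._totals[client_id] / count if count else 0.0
        return {"purchases_3y": count, "avg_basket": average}


projection = LoyaltyProjection()
projection.on_purchase("C-1", day=0, amount=20.0)
projection.on_purchase("C-1", day=400, amount=40.0)
print(projection.profile("C-1", today=500))
```

Each event updates the client's running totals, so reading a profile only touches that client's entries that have aged out of the window, rather than iterating over three years of transactions.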

(17)

Application Example: Inventory Tracking

Status Quo: In SQL, only the present state of each data field is preserved; the history of each data point is lost.

The problem: Knowing what was on hand at any point in time is very valuable for many decisions and smart analytics, e.g.:

Correction of electronic inventory records: increased accuracy

Out-of-shelf monitoring: increased accuracy

Inventory optimization: measure ‘true’ availability and performance

SQL: SQL storage cannot preserve the history of on-hand inventory levels over time.

Cloud Data Platform: Full history of on-hand levels available. Used for out-of-shelf monitoring, for the automatic correction of records and for tracking inventory performance.
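With the full history available, a measure such as ‘true’ availability becomes a short computation. The Python sketch below, under the assumption that stock changes are stored as time-ordered (time, delta) pairs, computes the fraction of a period during which a SKU was actually in stock. The function name and the sample data are illustrative only.

```python
def availability(events, start, end, initial_stock=0):
    """Fraction of the period [start, end) during which stock on hand was > 0.

    `events` is a list of (time, delta) pairs sorted by time, with start <= time < end.
    """
    stock = initial_stock
    in_stock_time = 0
    cursor = start
    for time, delta in events:
        if stock > 0:
            in_stock_time += time - cursor   # stock level held since the previous event
        stock += delta
        cursor = time
    if stock > 0:
        in_stock_time += end - cursor        # tail of the period after the last event
    return in_stock_time / (end - start)


# Hours since the start of the day; made-up data for one SKU in one store.
events = [(0, +10), (6, -10), (18, +5)]
print(availability(events, start=0, end=24))
```

A table that only stores the current stock level cannot support this calculation, because the durations spent at zero stock are exactly the information that gets overwritten.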

(18)

Application Example: Receipt and Coupon Storage

Status Quo: Limitations on storage capacity cause the loss of data history. Limitations on latency make reasonable queries impossible.

Examples:

Receipts can only be recorded for a few months; history is truncated

Querying receipts is time-consuming, expensive and limited

Loyalty coupons cannot be accessed or created ‘in real time’

Cloud Data Platform: The platform allows the efficient storage of all data and its full history, including ‘events’. The data is accessible from all parts of the company, even from mobile devices. Low latency makes the data usable for smart apps that provide value to staff and customers.

(19)

Application Example: eCommerce ‘Data Hub’

Status Quo: eCommerce managers know the value of data for their business. Massive amounts of data are generated each day. Personalization is the next challenge. However, ambition is far ahead of the status quo.

Storage capacity: large amounts of data generated by web analytics and operations

Latency: ‘near real time’ access often required

Accessibility

‘Smart’ exploration (smart applications)

Cloud Data Platform: The platform allows the efficient storage of all data and its full history, including ‘events’ from all relevant sources. The data is accessible from all parts of the company and all applications. Low latency provides ‘real time’ capabilities. Smart applications on top increase profitability. Examples:

More relevant product suggestions/more relevant personalization of the webshop

The best price for each product at any point in time

(20)

Required Investment < 100k€

50k€ for a scoping mission

Scope the usages that would benefit from the Data Platform.

Clarify the vision about the data: how it should be collected, structured and exposed.

Set up the proper collaborative tools, development tools and processes to carry on with this Data Platform initiative.

If required: Help hire the 1 or 2 developers that will be needed internally to run the Data Platform.

20k€ for drafting a prototype Data Platform

Goal: Set up a minimal project that could be extended to your technical teams, with the architecture and design patterns in place to kickstart the project.

Plugging in two identified data sources.

Setting up an event storage over Windows Azure.

Setting up sample projections.

Setting up sample APIs.

20k€ for drafting a prototype smart app

Goal: Illustrate that ROI can be generated through the Data Platform.

Devise a statistical analysis (prototype) to address an existing problem for the retailer.

Lokad can help you roll out your own BIG DATA PLATFORM, and plug in and/or build smart apps that generate direct ROI.

(21)

Thank you!

Contact:

Joannes Vermorel, Founder, +33 1 75 57 47 63, [email protected]

Head Office: 10 rue P. de Champaigne, 75013 Paris, France

Matthias Steinberg, CEO, +49 176 3491 6256, [email protected]

German Office: Wöhlertstrass 12/13, 10115 Berlin, Germany
