Using Data Mining and Machine Learning in Retail

Academic year: 2021

(1)

Using Data Mining and Machine Learning in Retail

(2)

The Challenge

• Shortened processing windows
• Escalating costs
• Hitting scalability ceilings
• Demanding business requirements
• ETL complexity
• Latency in data
• Tight IT budgets
• Growing data volumes

Over a Century of Innovation

A Fortune 100 company, nearly $40 billion in annual revenue.

The nation’s fourth largest broad line retailer with almost 2,500 full-line and specialty retail stores in the US and Canada.

A front runner in Big Data efforts, including driving personalized marketing and generating savings from legacy migration.

Running one of the biggest rewards programs that captures and analyzes a very large number of customer

(3)

Big Data can no longer be defined by the amount of data alone, but by the type, speed, and storage capacity needed to compute and analyze that data.

(4)

We are creating so much data, so quickly, that 90% of the data in the world today has been created in the last 2 years.

(5)

With traditional computer processing, it can be difficult to compute everything because of limits on storage space, processing time, and cost.

This typically leads to incomplete computations, data latency, and an overall lack of quality analysis.

Hadoop brings horizontal scalability, extremely large storage capacity, and fast parallel data processing.

(6)

Apache Hadoop is a framework which:

Runs applications on a large cluster built of commodity hardware.

Provides reliability and data motion to applications.

Implements a computational paradigm named MapReduce.

• Applications are divided into small fragments of work for execution or re-execution on any node in the cluster.

Provides a distributed file system (HDFS) that stores data on compute nodes, resulting in high aggregate bandwidth across the cluster.

Both MapReduce and HDFS automatically handle node failures.
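The MapReduce paradigm named on this slide can be illustrated with a small single-process sketch. This is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample transactions are assumptions made for illustration. It only shows how work is split into fragments, mapped to key/value pairs, grouped by key, and reduced.

```python
from collections import defaultdict

def map_phase(fragment):
    """Map: emit an (item, 1) pair for every item in one fragment of input."""
    return [(item, 1) for line in fragment for item in line.split()]

def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)

def run_job(fragments):
    pairs = []
    # On a real cluster each fragment could be processed (or re-processed
    # after a failure) on a different node; here they run one after another.
    for fragment in fragments:
        pairs.extend(map_phase(fragment))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Hypothetical transaction lines, split into two fragments of work.
fragments = [
    ["milk bread", "bread eggs"],
    ["milk eggs", "bread"],
]
item_counts = run_job(fragments)
print(item_counts)
```

Because each fragment is mapped independently, a failed fragment can be rerun without repeating the rest of the job, which is the property the slide refers to as re-execution.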

(7)

Scalability: Hadoop is “horizontally scalable.”

• Stores and processes petabytes of data by adding hardware.

Economical: Uses commodity hardware.

Efficient: Processes data in parallel across the nodes of the cluster.

Reliability: Data is replicated three times by default, in different locations; failed tasks are rerun.

Storage space & capacity: Central repository; keep everything forever.

(8)

How can I better manage my inventory?

How can I better understand my customers’ buying habits?

How can I detect fraudulent activity?

How can I create better targeted interaction with my customer?

How do I get customers to purchase more products?

(9)
(10)

Top Apache Foundation software project

Uses Scalable Machine Learning algorithms

Collection of pre-built data-mining libraries

Primary focus on collaborative filtering, clustering & classification

Houses a Java-based math library for common math operations

Uses MapReduce paradigm

(11)
(12)

Clustering

 

Recommendation Systems

Market Basket Analysis

(13)

A process of grouping similar things so that each item is placed with the items it most closely resembles.
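One common way to group like items is k-means. The slides do not say which clustering algorithm was used, so the sketch below is only an assumed example: the `kmeans` function, the customer features (store visits per month, average basket spend), and the starting centroids are all hypothetical.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of the points assigned to it."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster))
            if cluster else centroids[i]   # keep an empty cluster's centroid
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (store visits per month, average basket spend)
customers = [(1, 20), (2, 25), (1, 30), (10, 200), (12, 220), (11, 210)]
centroids, clusters = kmeans(customers, [customers[0], customers[3]])
print(clusters)
```

In practice the features would be scaled first so that one measure (here, spend) does not dominate the distance.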

(14)

Why use clustering?

To better understand a customer’s buying behavior

To develop targeted marketing campaigns

To understand interest, motivation, and lifestyle, in order to move merchandise in and out of stores more effectively

(15)

An information filtering system that is used to predict a user’s rating or preference, typically using a collaborative, content-based, or hybrid approach to recommendations.

(16)

Framework that filters and recommends items based on user behavior, preferences, and activities, and on their similarities to others.

Recommenders:

• User based
• Item based

Online and offline support

Can utilize Hadoop

Uses numerous similarity measurements, such as Cosine, LLR, Tanimoto, Pearson, and more.
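Three of the similarity measurements named above can be written out in a few lines. These are textbook definitions in plain Python, not the framework's own implementation; the function names are assumptions, and LLR (log-likelihood ratio) is left out to keep the sketch short.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def tanimoto(items_a, items_b):
    """Tanimoto (Jaccard) coefficient: shared items over all items."""
    a, b = set(items_a), set(items_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pearson(a, b):
    """Pearson correlation between two equally long rating vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom else 0.0

# Hypothetical users: purchase sets for Tanimoto, ratings for the others.
print(tanimoto({"shoes", "socks", "hat"}, {"socks", "hat", "scarf"}))
print(cosine([5, 3, 0], [4, 2, 1]))
print(pearson([5, 3, 1], [4, 2, 1]))
```

Tanimoto only needs to know which items each user touched, while cosine and Pearson use the rating values, so the choice depends on whether explicit ratings are available.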

(17)

Content-Based Filtering

Looks at the item and the user’s preferences, and provides a recommendation.

• Allows for highly precise recommendations.

• Has difficulty making recommendations over cross-sections of service when used for cross-selling.

[Figure: content-based filtering. Users’ ratings and the content used in the past (X, Y, Z) form a user profile of feature values; each candidate content item (A, B, C) has a content profile of feature values; matching content with similar feature values is recommended.]
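The matching step in content-based filtering can be sketched as follows. This is an assumed, minimal version: the user profile is taken to be the average of the feature values of content used in the past, and candidates are ranked by cosine similarity to it. The catalog, feature names, and function names are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_profile(history, catalog):
    """User profile: average feature values of content used in the past."""
    vectors = [catalog[item] for item in history]
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def recommend(history, catalog, top_n=2):
    """Rank unseen items by similarity of their content profile to the user profile."""
    profile = build_profile(history, catalog)
    candidates = [item for item in catalog if item not in history]
    ranked = sorted(candidates,
                    key=lambda item: cosine(profile, catalog[item]),
                    reverse=True)
    return ranked[:top_n]

# Hypothetical content profiles; features are [sports, kitchen, electronics].
catalog = {
    "running shoes":   [1.0, 0.0, 0.0],
    "fitness tracker": [0.7, 0.0, 0.7],
    "blender":         [0.0, 1.0, 0.2],
    "yoga mat":        [1.0, 0.0, 0.0],
    "headphones":      [0.1, 0.0, 1.0],
}
history = ["running shoes", "fitness tracker"]
print(recommend(history, catalog))
```

The blender never ranks highly for this user because nothing in the history has kitchen features, which illustrates why this approach struggles with cross-selling into unrelated categories.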

(18)

A model used to describe a common form of many-to-many relationship between two kinds of objects.

Items: anything that is purchased

Basket: a set of items

The number of items in a basket is typically small, and the number of baskets is typically large

(19)

A list of purchasers

• Additional “purchaser” data can be useful (but is not needed)

A list of transactions

Seek to identify purchasing patterns

• What items are normally purchased together?
• What is the purchasing sequence?
• Is there a seasonality effect to purchasing?

Categorize buying behavior

Translate buying behavior into actionable insight

• Targeted promotions
• Inventory placement
• Store layout
• Cross-selling

(20)

Any set of items that appears regularly within multiple baskets

Originally used to analyze a physical “supermarket basket”

Best used to link pairs of items that are commonly bought together but often have no obvious relationship to each other

Example: Diapers & Beer

A major store chain discovered that diapers and beer were regularly appearing in baskets together. The theory was that if you bought diapers, you were likely to have a baby at home; with a baby at home, you were less likely to go to a bar to drink and more likely to have a beer at home.
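Finding pairs such as diapers and beer comes down to counting how often items share a basket. The sketch below is an assumed illustration using the standard support, confidence, and lift measures; the `pair_stats` function, the threshold, and the five baskets are made up and are not the retailer's data.

```python
from collections import Counter
from itertools import combinations

def pair_stats(baskets, min_support=0.4):
    """Return support, confidence, and lift for item pairs that appear
    together in at least min_support of all baskets."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair
        for basket in baskets
        for pair in combinations(sorted(set(basket)), 2)
    )
    stats = {}
    for (a, b), count in pair_counts.items():
        support = count / n                      # share of baskets with both items
        if support < min_support:
            continue
        stats[(a, b)] = {
            "support": support,
            "confidence_a_to_b": count / item_counts[a],   # P(b | a)
            "lift": support / ((item_counts[a] / n) * (item_counts[b] / n)),
        }
    return stats

# Hypothetical baskets, for illustration only.
baskets = [
    ["diapers", "beer", "milk"],
    ["diapers", "beer"],
    ["diapers", "bread"],
    ["beer", "chips"],
    ["milk", "bread"],
]
stats = pair_stats(baskets)
print(stats)
```

A lift above 1 means the two items appear together more often than they would if they were bought independently, which is the signal behind the diapers and beer story.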

(21)

Retail Stores

Showroom floor planning

Catalog layout

Cross-selling

Fraud Analysis

(22)

Big Data Stack

[Figure: Big Data stack. Layers: Integration, Security, Storage, Computation/Access, Semantic, Consumption. Components: Data Governance & Integration (ETL/ELT); Security; Metadata; Storage (HDFS, on-premises and cloud); NoSQL DB; Hive/Pig; Advanced Query; Data Analytics; Data Mining; Data Visualization & Reporting. Frequency: Real-Time, Streaming, Time series, On demand.]

(23)

[Figure: Distribution across the Security, Storage/NoSQL DB, Computation/Access, Semantic, and Consumption layers.]

(24)
