

PARC and SAP Co-innovation: High-performance Graph Analytics for Big Data Powered by SAP HANA

Harnessing the combined power of SAP HANA and PARC’s HiperGraph graph analytics technology for real-time insights

Surendra Reddy (PARC), Cirrus Shakeri (SAP), Heinz Ulrich Roggenkemper (SAP), Hartmut Vogler (SAP), and Jens Doerpmund (SAP)


Table of Contents

Executive Overview

Introduction to Graph Analytics and PARC’s Big Data Research

Real-Time Marketing and Big Data

How SAP HANA and PARC HiperGraph Disrupt the Way Business Insights are Delivered to Users

Real-world Case Study: Major Retailer Data with HiperGraph and HANA


Executive Overview

Graph analytics is a crucial element in extracting insights from Big Data because it helps discover hidden relationships by connecting the dots. A graph, that is, a network of nodes and the relationships between them, treats the links between objects as equally important as the objects themselves. Social networks and supply chains are obvious examples, but a graph can represent any network of objects, such as customers, products, purchase orders, customer support calls, and product inventory. HiperGraph, PARC’s breakthrough Big Data technology, is a high-performance graph analytics engine. Through a four-month research project with SAP, we added HiperGraph’s analytics to SAP HANA to demonstrate a live, real-time marketing insights use case.

Graph reasoning technologies place relational data in the context of the wider web of relationships around it, and they go beyond simple reporting and dashboards. This creates opportunities to experiment rapidly, gain new insights, and identify root causes. The demonstrated technology match between HANA and HiperGraph has great disruptive potential, especially in identifying key patterns within datasets (e.g., via clustering).

With HANA and HiperGraph we can finally deliver on the promise of a closed feedback loop in the enterprise, where transactions are analyzed and acted on in real time. The intelligence implicit in large volumes of structured and unstructured data, drawn from a variety of sources inside or outside the enterprise, can be delivered to users in the form of smart business applications. Our customer (an online retailer) required real-time response from its Big Data system. We concluded that existing commercial and open source algorithms either could not respond in real time or could not scale to the required data volumes. PARC’s graph reasoning, versatile goal-directed clustering, egocentric recommendation, and real-time recommendation algorithms, combined with the power of HANA in-memory technologies, far exceeded expectations.

Brand managers can use this solution to automatically find clusters of customers with similar purchases, clusters of products that are frequently bought together, clusters of products that tend to be purchased on sale vs. those that are purchased at full price, and so on, and act on these insights during the customer’s shopping experience.
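The following minimal Python sketch makes the idea of "clusters of products that are frequently bought together" concrete. It makes two assumptions. First, purchase data is represented as a list of baskets, where each basket is a list of product names; the names are hypothetical. Second, it uses plain connected components over a thresholded co-purchase graph as a simple stand-in. This is not PARC’s goal-directed clustering, whose internals the paper does not describe.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_purchase_graph(baskets):
    """Count how often each unordered pair of products appears in the same basket."""
    weights = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair is counted once per basket, in canonical order
        for a, b in combinations(sorted(set(basket)), 2):
            weights[(a, b)] += 1
    return weights


def product_clusters(weights, min_count):
    """Group products linked by edges seen at least `min_count` times (connected components)."""
    adjacency = defaultdict(set)
    for (a, b), w in weights.items():
        if w >= min_count:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, clusters = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Iterative depth-first search collects one component
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        clusters.append(sorted(component))
    return clusters


# Hypothetical toy baskets
baskets = [
    ["pasta", "sauce", "parmesan"],
    ["pasta", "sauce"],
    ["diapers", "wipes"],
    ["diapers", "wipes", "sauce"],
    ["pasta", "parmesan"],
]
print(product_clusters(co_purchase_graph(baskets), min_count=2))
# → [['diapers', 'wipes'], ['parmesan', 'pasta', 'sauce']]
```

Raising `min_count` keeps only stronger co-purchase links, which splits the catalog into tighter clusters. A production system would need a more robust clustering method than connected components.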

There is a great opportunity for businesses to gain value by combining HANA in-memory technology with HiperGraph reasoning, recommendation, matrix factorization, egocentric collaborative filtering, and versatile goal-directed clustering. With SAP and PARC co-innovation in Big Data analytics, we can now reduce or eliminate the need for complex extract, transform, and load (ETL) processes. We can also speed up clustering and let business users explore data clusters directly. We are democratizing data science for all business users in the enterprise.

Introduction to Graph Analytics and PARC’s Big Data Research

During PARC’s multi-year research effort on graph-based reasoning and graph analytics, we found broad applications across industries, well beyond the popular uses of mining Twitter associations or Facebook friend networks. We realized that most applications today handle data that is inherently associative and increasingly graph-oriented.

Enterprise datasets are typically high-dimensional, with a rich web of relationships that requires highly scalable machine learning and reasoning algorithms for advanced analytics. PARC has developed HiperGraph, a set of high-performance algorithms for analyzing large graphs in real time. PARC spent six months exploring Hadoop + Hive, native Map/Reduce, R/MR, and Mahout under different execution environments (multi-core, multi-threaded, and parallel computation). It then found the optimal solution by integrating our reasoning and insight discovery algorithms with SAP HANA.

Real-Time Marketing and Big Data

Technology Drivers

The primary challenge is finding the right tools to reveal insights in datasets that can sometimes take days to process. Organizations need both powerful hardware and sophisticated software to produce actionable insights; otherwise, the analysis may hold little value.


Business Drivers

With the proliferation of digital channels like web, social, and mobile, today’s consumers have more power and choice than ever before. Multi-channel campaign marketing is becoming essential for brands that want to reach and engage these tech-savvy customers and respond to them with relevant offers and campaigns in near real time. A real-time recommendation engine or service is one of many real-time response approaches. Traditional batch-processing recommendation engines are a good start but are not sufficient.

To make real-time marketing work, marketing managers should be able to discover and contextualize marketing insights in near real time and craft personalized one-on-one messages for the targeted audience. Real-time marketing is best viewed as a business process. An operational team stands ready to react and engage consumers with messages relevant to current events, sports, television, or sometimes even natural disasters. The benefits of being a relevant and timely brand can be powerful. However, they require execution at scale and delivery in a unique but consistent way that optimizes the customer experience.

From PARC’s experience working with customers for the past 18 months, we realized that the following four pillars form a strong foundation for the success of real-time marketing efforts:

• Speed and Agility: Achieving real-time insights depends on understanding a customer’s context at the very “moment of truth” of engagement. This also needs to be an automated and dynamic process.

• Personalization: Every message or touch point delivered by a brand should matter, providing a unique and memorable experience for each consumer, including personalized messages for anonymous prospects and known customers. Marketers need an approach that responds quickly to current events. It must also focus on the individual consumer and coordinate a consistent experience across multiple channels.

• Scalability: The real-time insights process needs to sift automatically through large volumes of high-dimensional data for every individual consumer. Every consumer has their own tastes, interests, needs, and behaviors, and this data is not frozen in time either. As customers engage with brands in different channels, provide commentary, and share and exchange information, this context should be absorbed and used immediately to enhance the customer experience at every touch point.

• Cross-Channel Optimization: The ability to carry contextual information with consumers wherever they go, across channels, ensuring that information is reconciled and channel conflict is avoided.

How SAP HANA and PARC HiperGraph Disrupt the Way Business Insights are Delivered to Users

SAP HANA is a fast, massively parallel, ACID-compliant database platform for both analytical and transactional data processing. Both transactions and analytics are supported within the in-memory columnar engine, and all data processing and calculations take place in memory. HANA provides business, predictive, and advanced analytics libraries (e.g., a rules engine, text processing, and spatial analytics) that can be called from within a rich procedural language. What is unique about HANA is that it lets customers perform complex analytical processing directly on top of online transaction processing (OLTP) data structures. This eliminates redundant data storage and batch-processed reporting. With HANA Live, customers have access to a large number of non-materialized business views for real-time reporting and application development.

HANA’s real-time response, combined with PARC’s fast HiperGraph reasoning algorithms, helped us generate qualitatively superior output (i.e., clusters with higher modularity and rapid discovery of hidden patterns and insights). The match between HiperGraph and HANA’s analytics is unique in turning the speed of computation into new ways of solving problems. For example, we can:

• Simulate the spread of diseases
• Optimize when and where vaccinations should be done
• Analyze viral marketing
• Detect the next best action
• Optimize supply chains with up-to-the-second transactions
• Detect fraud from input data in real time
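The paper uses modularity as one measure of cluster quality. As an illustration only, the sketch below computes the standard Newman-Girvan modularity for an undirected, unweighted graph without self-loops. Let m be the number of edges, L_c the number of edges inside cluster c, and d_c the total degree of c’s nodes. Modularity is then the sum over clusters of L_c/m − (d_c/2m)². The function name, the edge-list input, and the toy graph are assumptions made for this example. They are not HiperGraph’s interface or PARC’s evaluation code.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, assumed to contain no self-loops or duplicates.
    community: dict mapping each node to its cluster label.
    Q = sum over clusters c of [ L_c / m - (d_c / (2m))^2 ].
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both endpoints in cluster c
    degree_sum = defaultdict(int)  # d_c: total degree of the nodes in cluster c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy graph: two triangles joined by a single bridge edge (2, 3)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
single = {n: "A" for n in range(6)}
print(round(modularity(edges, split), 3))  # → 0.357
print(modularity(edges, single))           # → 0.0
```

Splitting the graph into its two natural triangles scores 5/14 (about 0.357), while lumping every node into one cluster scores 0. A higher modularity therefore indicates clusters that are more densely connected internally than a random wiring with the same degrees would predict.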


This powerful combination of HANA and HiperGraph delivers three building blocks that disrupt the way business insights are delivered to business users:

#1 – Real-Time Data: All enterprise data sources, both at rest and in motion, live in an environment that puts enterprise data at users’ fingertips. No heavy ETL tools are needed. Access to data is as simple as point and click, and where data lives is not a concern. Users can simply define what data is relevant to their function, whether OLTP, online analytical processing (OLAP), or knowledge bases. HANA plus HiperGraph fuses the relevant features and builds the best models through machine learning. It can also provide the knobs and controls to drive objectives from these insights and act on them.

#2 – Automated Domain Specific Models: Manual generation of models doesn’t scale. Domain-specific models built around use cases and business problems can empower business analysts to rapidly explore, discover new insights, and act on them. The system should automatically generate and select models. It should also provide knobs and controls that let business analysts apply those models to years of historical data and/or a live stream of real-time data. Data should be configured seamlessly into a format that the organization’s algorithms can consume, and those algorithms should execute on infrastructure in a parallel, distributed, highly scalable way.

#3 – Actionable Insights: Knowledge of how to apply analytics to data through application business rules to produce a positive business impact. Producing insights and reports is not sufficient. Business analysts should be able to fuse insights and business rules in a way that business users can actually consume and act on. Even more exciting is the ability to deploy insights operationally through an application. Such an application would leverage individual domain expertise and an understanding of the business logic associated with the targeted use case.

Real-world Case Study: Major Retailer Data with HiperGraph and HANA

In a recent project for a large retail customer, PARC deployed HiperGraph and egocentric collaborative filtering with HANA to help shape a new approach to real-time Big Data analytics. Egocentric collaborative filtering makes predictions for a single individual based on the behaviors of others who share at least one commonality with that individual.
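The following minimal Python sketch illustrates the egocentric idea defined above. It assumes purchases are held as a mapping from a hypothetical user ID to a set of item IDs. Only users who share at least one item with the target belong to the target’s ego network. Each of them votes for items the target has not bought yet, and each vote is weighted by the size of the overlap. The data layout and the overlap weighting are illustrative choices, not the scoring PARC used in the project.

```python
from collections import Counter


def egocentric_recommendations(purchases, user, top_n=3):
    """Recommend items to `user` from the purchases of users sharing at least one item.

    purchases: dict mapping user id -> set of purchased item ids (hypothetical schema).
    Each neighbor votes for items the target has not bought, weighted by how many
    items the neighbor shares with the target.
    """
    mine = purchases[user]
    scores = Counter()
    for other, items in purchases.items():
        if other == user:
            continue
        overlap = len(mine & items)
        if overlap == 0:
            continue  # no commonality: not part of this user's ego network
        for item in items - mine:
            scores[item] += overlap
    # Highest score first; break ties alphabetically for a stable result
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical toy purchase histories
purchases = {
    "alice": {"tent", "stove"},
    "bob": {"tent", "stove", "lantern"},
    "carol": {"tent", "sleeping_bag"},
    "dave": {"novel", "bookmark"},
}
print(egocentric_recommendations(purchases, "alice"))  # → ['lantern', 'sleeping_bag']
```

Because `dave` shares nothing with `alice`, his books never enter her recommendations. Restricting the computation to one user’s ego network in this way keeps each request small, which helps in a real-time setting.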

The primary focus of the project was to improve and streamline contextual recommendations. Our goal was to learn from disparate data sources, including Internet usage histories, third-party CRM data, and click patterns. These would be used to improve product layout and recommendations and to create a smarter user experience based on algorithmic deductive reasoning and machine learning.

The dataset represented 50 million customers, 3 million products, 371 million e-commerce transactions, and 6.5 billion clickstreams: a massive number of data points and relationships to model. The real challenge is delivering tools that enable users to explore and discover new insights, and then translate those insights into actions.


Upon completion, brand and CRM managers were able to:

• Gather data from many external sources (including news) to gain insight into their risk position

• Engage customers in interactive/personalized conversations (real-time)

• Provide a consistent, cross-channel experience including real-time touch points like web and mobile

• Understand and respond to critical moments in the customer sales cycle (in the moment)

• Model and adjust campaigns based on customer real-time activities

We validated the premise that an entire real-time dataset, such as one generated by the retail industry, can be processed in real time to allow for immediate insights and influence.

Next Steps and the Innovation Edge of HANA and HiperGraph

The demonstrated technology match between HANA and HiperGraph has great disruptive potential, especially in identifying key clusters within datasets. PARC has demonstrated using HANA to hold and extract arbitrary datasets, while goal-directed clustering running in remote accelerators identifies clusters within those datasets. This innovation turns the fast in-memory computations of HANA and HiperGraph into business applications that are qualitatively different from, or superior to, the current generation of applications.

Right now SAP and PARC are entering a new phase of our partnership in order to bring this co-innovation to the market. In the coming months, we will provide more details on the technology and product roadmap.

Stay tuned! Visit www.parc.com/services/focus-area/bigdata or follow us @SAPInMemory and @PARCinc for updates.

3333 Coyote Hill Road

Palo Alto, California 94304 USA +1 650 812 4000

engage@parc.com | www.parc.com © Palo Alto Research Center Incorporated

PARC, a Xerox company, is in The Business of Breakthroughs®. Practicing open innovation, we provide custom R&D services, technology, expertise, best practices, and IP to Fortune 500 and Global 1000 companies, startups, and government agency partners. We create new business options, accelerate time to market, augment internal capabilities, and

SAP Labs, Bay Area 3410 Hillview Avenue Palo Alto, CA 94304 USA +1-650-849-4000

www.sap.com

As market leader in enterprise application software, SAP (NYSE: SAP) helps companies of all sizes and industries run better. From back office to boardroom, warehouse to storefront, desktop to mobile device – SAP empowers people and organizations to work together more efficiently and use business insight more effectively to stay ahead of the competition. SAP applications and services enable more than 251,000 customers to
