Solving big data problems in real-time with CEP and Dashboards -
patterns and tips
Kevin Wilson, Karl Kwong
• Big data is a reality and organizations must embrace it to
succeed
• Complex Event Processing (CEP) technology gives us a new
way to look at big data – real-time micro-trending
• CEP supports data processing patterns that are very useful
but difficult to implement in a traditional database model
• Leveraging big data in real-time will change the way
organizations run
IDC predicts the “digital universe” of data will grow to 2.7 zettabytes by the end of 2012, up 48%
Recent Explosion of Data
Average human hair is about 70 µm in diameter – very small
Let’s say 1 byte of data = 1 human hair
2.7 zettabytes worth of hair side-by-side:
• would circle the Earth about 4.7 billion times
• would go to the Sun and back about 630 thousand times
Putting it into Perspective
Enterprises
• Web - weblogs, click stream events, web transactions
• ERP - B2B transactions, B2C transactions
• Contact Center – Emails, telephony
Industries
• Telecom – call detail records (CDR)
• Utilities – smart meters
• Manufacturing - equipment health
Common Characteristics
• High volume and velocity
• Streaming sources
• Operational in nature
Demands a new way to look at this data!
In-memory Analytical Appliances – SAP HANA
• Load data into large main memory
• Store data in an optimized format
• Large set aggregations (few million rows) can be done in seconds
• Data analysis tools can interface with the appliance to provide typical data analysis
Two Common Approaches to Big Data
Map Reduce and Distributed File Systems - Hadoop
• Takes advantage of distributed processing to transform data
• Aggregation and transformation of extremely large sets (multi-billion rows) can be done in hours
• Data can then be fed into more traditional data
What is addressed by in-memory and map reduce?
• Volume of data • Processing time • Analysis
What do we gain from big data today?
• Higher resolution (more records) can be used for analysis
• Trending can be done over longer periods
So what is missing?
• Focused on historical analysis
• Insights more suitable for strategic and tactical decisions
• Need a way to cope with big data and answer what
is happening right now!
Analytical Trending
• Examples:
• Quarterly sales performance
• Annual customer satisfaction
• Monthly branch queue time
• Typical Aggregation:
• Years • Quarters • Months • Weeks
• Support strategic and tactical decisions
• Strategic investments
• Compensation and rewards
• Weekly staffing
• Corporate performance
Different Way to Trend
Real-time Micro Trending
• Examples:
• Max wait time for agent
• Banner ad click rate
• Failed inspection rate
• Typical Aggregation:
• Days • Hours • Mins
• Rolling or sliding window
• Support operational or time-sensitive decisions
[Figure: three charts of the same data averaged over 10-minute, 5-minute and 1-minute periods]
Traditional analysis relies on landmark aggregation periods
• The longer the period, the less up to date the result
• Resolution is reduced
• Does not reflect what is happening right now
Increasing aggregation frequencies
• Reduces latencies
• Increases resolution
• Still doesn’t tell us what is happening right now
At some point, shortening the aggregation period breaks down
• Too short an aggregation period exposes noise in the data
• Visibility of the general data movement is lost
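The trade-off between long and short landmark periods can be illustrated with a minimal Python sketch. The function name `landmark_averages` and the sample readings are hypothetical and chosen only for illustration: a flat baseline with one spike is smoothed away by 10-minute buckets but fully exposed by 1-minute buckets.

```python
from collections import defaultdict


def landmark_averages(samples, period):
    """Average (minute, value) samples over fixed, non-overlapping periods.

    Returns a dict mapping each period's start minute to its average.
    """
    buckets = defaultdict(list)
    for minute, value in samples:
        start = (minute // period) * period  # landmark boundary for this sample
        buckets[start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# Hypothetical per-minute readings: a baseline of 3.0 with one spike at minute 13.
samples = [(m, 3.0) for m in range(20)]
samples[13] = (13, 13.0)

ten_min = landmark_averages(samples, 10)  # smooth, but only updated every 10 minutes
one_min = landmark_averages(samples, 1)   # up to date, but the spike dominates
```

With 10-minute buckets the spike only nudges one average, and that average is not available until the period ends; with 1-minute buckets the spike is visible immediately but the general movement is harder to read.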
Is there a way to hide noise and lower latency?
Rolling or Sliding Window Aggregation
Sliding window approach
• Aggregate over a logical time/event window
• Computation is done continuously
• Filter out noise
• Take into account the most recent data
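The sliding window approach can be sketched in a few lines of Python. This is an illustrative sketch, not how any particular CEP engine implements it: the class name `SlidingAverage` is hypothetical, timestamps are assumed to arrive in non-decreasing order, and the window is assumed to be time-based.

```python
from collections import deque


class SlidingAverage:
    """Rolling average over the last `window` time units of events."""

    def __init__(self, window):
        self.window = window
        self.events = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        """Add an event, expire old ones, and return the current average."""
        self.events.append((timestamp, value))
        self.total += value
        # Drop events that have slid out of the window. The event just added
        # is always inside the window, so the deque never becomes empty here.
        while self.events[0][0] <= timestamp - self.window:
            _, old_value = self.events.popleft()
            self.total -= old_value
        return self.total / len(self.events)


avg = SlidingAverage(window=5)
readings = [avg.add(t, v) for t, v in [(0, 2.0), (1, 4.0), (2, 6.0), (6, 8.0)]]
```

Every incoming event produces a fresh average, so the result is always current, while the window length controls how much noise is smoothed out.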
New Class of Software is Needed
[Diagram labels: Streams; Data Warehouse; Data Mart; SOA / ESB; Apps; Performance Management; Complex Event Processing (CEP); Semantic Layer; Context; History; KPIs / Goals; Alerts / Notifications; Visualize; Analyze; “What is happening now?”; “What has happened?”]
SAP Sybase Event Stream Processor
• Unlimited number of input streams
• Input events in native formats
• Incoming data is processed as it arrives, according to the business logic defined using high level authoring tools
• Stream output to apps, dashboards
• Range of built-in adapters for out-of-the-box connectivity
• Java, C++ and .NET APIs for custom integration
[Diagram labels: Input Streams (Market Events, Transactions, Process Events); Reference Data; Studio (Authoring); SAP Sybase Event Stream Processor; Dashboards; Applications; Sybase IQ]
CCL - primary method to interact with SAP Sybase ESP
• Extension to Structured Query Language (SQL)
• Added keywords for defining and manipulating time
windows and related operations
• CCL allows continuous processing of high volumes of
streaming data
Continuous Computing Language (CCL)
Insert Into StreamSummary
Select Max(Price) as High, Min(Price) as Low,
First(Price) as Open, Last(Price) as Close
From StreamFeed Keep Every 1 minute
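For readers less familiar with CCL, the per-minute summary in the statement above can be approximated in Python. This is a sketch under stated assumptions, not ESP's implementation: it assumes `Keep Every 1 minute` restarts the window at each minute boundary, that ticks arrive in time order, and the function name `minute_bars` and the `(timestamp_seconds, price)` tick layout are hypothetical.

```python
def minute_bars(ticks):
    """Summarize (timestamp_seconds, price) ticks into one bar per minute.

    Returns a dict mapping minute index to open/high/low/close prices.
    """
    bars = {}
    for ts, price in ticks:
        minute = int(ts // 60)
        bar = bars.get(minute)
        if bar is None:
            # First tick of the minute sets all four fields.
            bars[minute] = {"open": price, "high": price, "low": price, "close": price}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price  # last tick seen so far in this minute
    return bars


# Hypothetical price ticks spanning two minutes.
ticks = [(0, 10.0), (20, 12.5), (45, 9.5), (61, 11.0), (90, 11.5)]
bars = minute_bars(ticks)
```

The four lines of CCL replace this bookkeeping, and the engine keeps the result up to date as events arrive rather than recomputing over a stored batch.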
CCL enables some powerful window-based data processing concepts beyond continuous metrics
• Occurrence detection
• Detect 1-N occurrences of a condition over a time period
• Useful in fraud detection and intrusion detection
• Example: detect excessive use of a smart cash card over a short period of time
• Absence detection
• Test for absence of a certain event over a given period
• Useful in transportation and logistics scenarios
• Example: matching order, packing and shipping records over a set SLA period; the absence of an event triggers an alert
• Threshold crossing
• Detect when a value crosses a predefined threshold
• Support up, down or dual-direction threshold violation
• Use multiple thresholds to create complex alarm conditions
• Example: combine multiple thresholds such as wait time, drop rate and skill set to set off a critical alert to reallocate or call in additional agents
• Condition-based stream splitting
• State management
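Two of the patterns listed above, absence detection and threshold crossing, can be sketched in plain Python. This is an illustrative batch version with hypothetical names (`missing_followups`, `threshold_crossings`) and a simplified two-step order flow ("ordered", then "shipped"); a CEP engine would evaluate the same logic continuously as events arrive.

```python
def missing_followups(events, sla, now):
    """Absence detection: return ids of orders not shipped within `sla`.

    `events` holds (timestamp, order_id, kind) tuples, where kind is
    "ordered" or "shipped". An unshipped order is flagged once `now`
    is past its deadline; a late shipment is flagged as well.
    """
    ordered_at, shipped_at = {}, {}
    for ts, order_id, kind in events:
        if kind == "ordered":
            ordered_at[order_id] = ts
        elif kind == "shipped":
            shipped_at[order_id] = ts
    late = []
    for order_id, start in ordered_at.items():
        deadline = start + sla
        shipped = shipped_at.get(order_id)
        if shipped is None:
            if now > deadline:
                late.append(order_id)
        elif shipped > deadline:
            late.append(order_id)
    return sorted(late)


def threshold_crossings(values, threshold):
    """Threshold crossing: return (index, "up" or "down") for each crossing."""
    crossings = []
    for i in range(1, len(values)):
        prev, cur = values[i - 1], values[i]
        if prev <= threshold < cur:
            crossings.append((i, "up"))
        elif prev > threshold >= cur:
            crossings.append((i, "down"))
    return crossings
```

Reporting the direction of each crossing lets a caller react to upward violations only, downward only, or both, and several such detectors can be combined to build a compound alarm condition.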
Power Dashboards with New Insights
Real-time dashboards allow users to:
• Assess current environment quickly
• Provide quick summary of situation
• Only see what’s relevant and important for the job
• Comprehend severity of situation (or opportunity)
• Show “current” information vs. “projected” or “historic” data
• Reflect impact across activities or processes or
• Project status (red/yellow/green)
• Show appropriate time window & appropriate detail of what is being measured
• Act in time
• Display prominent but relevant alerts
• Point to specific actions
Some Questions Answered by Micro-trending Big Data
Spot emerging threats or opportunities
before it’s too late
React to changing conditions sooner
Make decisions based on more timely
information
Financial firm: “I want to track the current value and net gain of all my positions, and monitor my aggregate
exposures in real-time”
eCommerce: “I want to customize offers based on current behavior to improve conversion rates”
Telecom provider: “I want to alert Customer Service when an individual customer has just experienced their
4th dropped call in 2 hours”
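The telecom request is an instance of occurrence detection over a sliding window. A minimal Python sketch follows, assuming timestamps in seconds that arrive in order per customer; the class name `DroppedCallMonitor` and its method are hypothetical and only illustrate the idea.

```python
from collections import defaultdict, deque


class DroppedCallMonitor:
    """Flag a customer who reaches `limit` dropped calls within `window` seconds."""

    def __init__(self, limit=4, window=2 * 60 * 60):
        self.limit = limit
        self.window = window
        self.drops = defaultdict(deque)  # customer id -> timestamps of recent drops

    def on_dropped_call(self, customer, timestamp):
        """Record a dropped call; return True if an alert should be raised."""
        recent = self.drops[customer]
        recent.append(timestamp)
        # Forget drops older than the window. The drop just recorded is
        # always inside the window, so the deque never becomes empty here.
        while recent[0] <= timestamp - self.window:
            recent.popleft()
        # Alerts on the 4th drop and on any further drop while the window
        # still holds at least `limit` drops.
        return len(recent) >= self.limit


monitor = DroppedCallMonitor()
alerts = [monitor.on_dropped_call("alice", t) for t in (0, 1800, 3600, 5400)]
```

State is kept per customer, so one customer's dropped calls never count toward another's alert.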
Relevance in Every Industry
Financial
• Capital markets
• Banking fraud prevention
Telecommunications
• Operations monitoring
• Mediation
• Proactive churn management
Utilities
• Smart grid applications
• Demand management
Transportation
• Location-based monitoring
• Customer satisfaction / loyalty
Retail / consumer product goods
• Real-time click stream analysis
• Customer sentiment analysis
• Supply chain management
Hospitality / Service
• On-line gaming
• Customer experience and loyalty
Healthcare
• Healthcare (e-care, asset tracking)
Public Sector