Transactions & Interactions
The Correlation of Structured and Unstructured Data
Shaun Connolly, Hortonworks
Big Data Has Reached Every Market
Source: McKinsey & Company report. Big data: The next frontier for innovation, competition, and productivity. May 2011.
Digital data is personal, everywhere, increasingly accessible, and will continue to grow exponentially
What is Apache Hadoop?
• Set of open source projects
• Transforms commodity hardware into a service that:
  – Stores petabytes of data reliably
  – Allows huge distributed computations
• Solution for big data
  – Handles the high volume, velocity & variety of data
• Key attributes:
  – Redundant and reliable (no data loss)
  – Extremely powerful
  – Batch processing centric
  – Makes distributed apps easy to program
  – Runs on commodity hardware
One of the best examples of open source driving innovation
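To make "easy to program distributed apps" concrete, here is a minimal single-process sketch of the MapReduce programming model that Hadoop's batch processing is built around. It does not use Hadoop's Java API. The `run_job` driver is a hypothetical local stand-in for the work the framework does on a cluster: running map tasks in parallel and shuffling intermediate pairs between machines. The developer writes only the `mapper` and `reducer`.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """User-written map function: emit (word, 1) for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """User-written reduce function: sum all counts emitted for one word."""
    return word, sum(counts)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Hypothetical local driver standing in for the Hadoop framework.

    On a real cluster, map tasks run in parallel next to the data blocks,
    and the shuffle moves intermediate pairs across the network.
    """
    # Map phase: apply the mapper to every input record.
    intermediate: List[Tuple[str, int]] = []
    for line in lines:
        intermediate.extend(mapper(line))

    # Shuffle phase: group intermediate values by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: one reducer call per distinct key.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    docs = ["big data has reached every market", "data is everywhere"]
    print(run_job(docs))
```

Because the mapper and reducer never touch shared state, the framework is free to spread them across thousands of machines and rerun any task that fails. This is where the redundancy and batch orientation listed above come from.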
Yahoo!, Apache Hadoop & Hortonworks
http://www.wired.com/wiredenterprise/2011/10/how-yahoo-spawned-hadoop
Hadoop at Yahoo!
40K+ servers, 170 PB storage, 5M+ monthly jobs, 1000+ active users
2006: Yahoo! embraced Apache Hadoop, an open source platform, to crunch epic amounts of data using an army of dirt-cheap servers
Yahoo! spun off 22+ engineers into Hortonworks, a company focused on advancing open source Apache Hadoop for the broader market
Hadoop Core, Pig, Hive, HBase, HCatalog, ZooKeeper
Challenge:
• Integrating, managing, and supporting changes across the wide range of open source projects that power the Hadoop platform, each with its own release schedule, versions, and dependencies
• Time-intensive, complex, and expensive
Solution: Hortonworks Data Platform
• Integrated certified platform distributions
• Extensive QA process
• Industry-leading support with clear service levels for updates and patches
• Continuity via a multi-year Support and Maintenance Policy
Hortonworks Data Platform
Fully Supported Integrated Platform
Advancing Hadoop for Broader Market
Architecting the Future of Big Data
Hortonworks Focus
Transform Apache Hadoop into a complete Data Platform with the data, application, and operational services that enable a vibrant ecosystem driving the next wave of business innovation and productivity
Hortonworks Data Platform: Operations, Platform APIs, Administration APIs
Hortonworks Data Platform (Operations, Platform APIs, Administration APIs), surrounded by:
• Replication, DR
• Retention, ILM
• ETL (basic & advanced)
• Integration (msg bus, …)
• Datastore federation
• SQL, NewSQL, NoSQL
• Tools, languages
• Algorithms, data science
• Search
• Analytics, EDW
• SaaS, packaged & custom apps
• BI, reporting, visualization
Big Data Value Creation Opportunities
Financial Services
• Detect/prevent fraud
• Model and manage risk
• Improve debt recovery rates
• Personalize banking/insurance products
Healthcare
• Remote patient monitoring
• Predictive modeling for new drugs
• Personalized medicine
• Optimal treatment pathways
Retail
• In-store behavior analysis
• Cross-selling, recommendation engines
• Optimize pricing, placement, design
• Optimize inventory and distribution
Web / Social / Mobile
• Sentiment analysis
• Web log, image, and video analysis
• Location-based marketing
• Price comparison services
Manufacturing
• Design to value
• Improve service via product sensor data
• Crowd-sourcing
• “Digital factory” for lean manufacturing
Government
• Detect/prevent fraud
• Segment populations, customize action
• Support open data initiatives
• Cyber-security
© 2011 Datameer, Inc. All rights reserved. Page 9
Datameer
• Business Intelligence Platform on Hadoop
• Established in 2009 by Hadoop and enterprise software veterans
• Offices in Silicon Valley, New York, and Germany
• Funded by Kleiner Perkins Caufield & Byers and Redpoint Ventures
Leading Financial Institution
• Log file analytics across disparate systems
• Ensures smooth operation during “mini-crises”
• Also provides visibility into visitors’ click paths
• Thousands of operational servers, 10+ formats
• Hadoop as the long-term data store: 200+ TB
• Using Datameer for ingestion, analysis, and data quality metrics
• Datameer unifies JSON, XML, IIS, Apache, and proprietary log formats into a company standard
• Datameer integrated with Active Directory
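Unifying several log formats into one company standard follows a common pattern: each source format gets its own small parser, and each parser emits the same record shape. The slides do not describe how Datameer does this internally (its ingestion is configured through the product), so the following is only a hypothetical Python sketch of that pattern. The `LogRecord` fields, the assumed JSON keys (`time`, `ip`, `url`, `code`), and the `normalize` function are illustrative assumptions. Only JSON and Apache Common Log Format lines are handled.

```python
import json
import re
from datetime import datetime, timezone
from typing import Optional, TypedDict


class LogRecord(TypedDict):
    """Hypothetical company-standard record; the field names are illustrative."""
    timestamp: str  # ISO 8601, normalized to UTC
    client_ip: str
    path: str
    status: int
    source_format: str


# Apache Common Log Format, e.g.
# 127.0.0.1 - - [10/Oct/2011:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
APACHE_CLF = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"\S+ (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def _to_utc_iso(ts: datetime) -> str:
    # Assumption: timestamps without a zone are already UTC.
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc).isoformat()


def from_apache(line: str) -> Optional[LogRecord]:
    m = APACHE_CLF.match(line)
    if m is None:
        return None
    try:
        ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
    except ValueError:
        return None
    return LogRecord(timestamp=_to_utc_iso(ts), client_ip=m["ip"],
                     path=m["path"], status=int(m["status"]),
                     source_format="apache")


def from_json(line: str) -> Optional[LogRecord]:
    # Assumed producer schema: {"time": ISO 8601, "ip": ..., "url": ..., "code": ...}
    try:
        doc = json.loads(line)
        return LogRecord(timestamp=_to_utc_iso(datetime.fromisoformat(doc["time"])),
                         client_ip=str(doc["ip"]), path=str(doc["url"]),
                         status=int(doc["code"]), source_format="json")
    except (ValueError, KeyError, TypeError):
        return None


def normalize(line: str) -> Optional[LogRecord]:
    """Dispatch on a cheap format sniff; returns None for unrecognized lines."""
    return from_json(line) if line.lstrip().startswith("{") else from_apache(line)
```

Supporting another format (XML, IIS, a proprietary layout) would mean adding one more parser that returns `LogRecord`. Everything downstream, such as data quality metrics, only ever sees the standard record.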
Major Social Gaming Pioneer
• Top-tier Facebook-based gaming company
• Offering: strategy games to millions of monthly users
• Looking to aggregate different data sources, such as game play and Twitter sentiment
• Datameer aggregates logs from hundreds of game servers for gameplay behavioral analytics and drives existing dashboards
• Hourly/daily/weekly metrics available to game producers, sliced across 10+ dimensions, establishing high-value cohorts to optimize games and cross-sell/up-sell
• Close monitoring of the performance of their
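Hourly, daily, and weekly metrics "sliced across dimensions" amount to counting events per (time bucket, dimension value) pair. The slides do not say how Datameer computes these, so the following is only a minimal Python sketch of that aggregation. The event fields `ts`, `country`, and `game` are hypothetical, and weeks are assumed to start on Monday.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Any, Iterable, Mapping


def bucket(ts: datetime, grain: str) -> datetime:
    """Floor a timestamp to the start of its hour, day, or week (weeks start Monday)."""
    if grain == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if grain == "day":
        return day
    if grain == "week":
        return day - timedelta(days=day.weekday())
    raise ValueError(f"unknown grain: {grain!r}")


def rollup(events: Iterable[Mapping[str, Any]], grain: str, dimension: str) -> Counter:
    """Count events per (time bucket, value of one dimension)."""
    return Counter((bucket(e["ts"], grain), e[dimension]) for e in events)


if __name__ == "__main__":
    # Hypothetical gameplay events; real events would come from the game-server logs.
    events = [
        {"ts": datetime(2011, 11, 14, 9, 5), "country": "US", "game": "A"},
        {"ts": datetime(2011, 11, 14, 9, 40), "country": "US", "game": "B"},
        {"ts": datetime(2011, 11, 16, 10, 1), "country": "DE", "game": "A"},
    ]
    for dim in ("country", "game"):  # one rollup per slicing dimension
        print(dim, rollup(events, "hour", dim))
```

Slicing across 10+ dimensions repeats the same rollup once per dimension, or once per combination of dimensions when cohorts are defined by several attributes together.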
Seamless Data Integration!
• Wizard-based integration
• Structured, semi-structured, and unstructured data
• No complex mappings/schemas
• Connector plug-in API
Powerful Analytics!
• Interactive spreadsheet UI
• Cleansing, transformation, analysis
• Over 180 built-in analytic functions
• Macros and function plug-in API
Self-Service Dashboards!
• Drag and drop
• Powerful visualizations
• Mashup anything
• Integration into existing portals
Enterprise Integration
Datameer Analytics Solution
Use Case Example
For more information: www.datameer.com or www.hortonworks.com
Live Hortonworks Webinars: What’s in Store for Hadoop.Next
Sign up at: www.hortonworks.com/webinars
Live Datameer Webinars: Datameer Analytics Solution “Below Decks” with Datameer
Sign up at: http://datameer.com/news-events/events.html
For information on Datasift, please go to www.datasift.com.