© 2013 Berlin Big Data Center • All Rights Reserved © Volker Markl
Presentation to the European Competitiveness Council on March 3rd, 2015 © Volker Markl
Towards a Thriving Data Economy:
Open Data, Big Data, and Data Ecosystems
Volker Markl
volker.markl@tu-berlin.de
dima.tu-berlin.de | dfki.de/web/research/iam/ | bbdc.berlin
Based on my 2014 Vision Paper
“On Declarative Data Analysis and Data Independence in the Big Data Era”,
PVLDB 7(13): 1730-1733
More and more data is available to science
and businesses!
Drivers:
Cloud Computing
Internet of Services
Internet of Things
Cyber Physical Systems
Underlying Trends:
Connectivity
Collaboration
Computer Generated Data
video streams, web archives, sensor data, audio streams, RFID data, simulation data
Data & Analysis: Increasingly Complex!
Data:
Volume: data volume too large
Velocity: data rate too fast
Variability: data too heterogeneous
Veracity: data too uncertain
Analysis:
Reporting: aggregation, selection
Ad-Hoc Queries: SQL, XQuery
ETL/Integration: Map/Reduce
Data Mining: MATLAB, R, Python
Predictive/Prescriptive: MATLAB, R, Python
Figure labels: ML, DM; scalability, algorithms
Data-driven applications …
lifecycle management
home automation
health
water management
market research
transportation
energy management
information marketplaces
… will revolutionize decision-making in business and the sciences!
… have great economic potential!
Opportunities in Individual Sectors
Sectors/Domains | Big Data Value | Source
Public Administration | EUR 150 billion to EUR 300 billion in new value (considering EU 23 larger governments) | OECD, 2013
Healthcare & Social Care | EUR 90 billion considering only the reduction of national healthcare expenditure in the EU | McKinsey Global Institute, 2011
Utilities | Reduce CO2 emissions by more than 2 gigatonnes, equivalent to EUR 79 billion (global figure) | OECD, 2013
Transport and Logistics | USD 500 billion in value worldwide in the form of time and fuel savings, or 380 megatonnes of CO2 emissions saved | OECD, 2013
Retail & Trade | 60% potential increase in retailers’ operating margins possible with Big Data | McKinsey Global Institute, 2011
Geospatial | USD 800 billion in revenue to service providers and value to consumer and business end users | McKinsey Global Institute, 2011
Applications & Services | USD 51 billion worldwide directly associated with the Big Data market (services and applications) |
Several European companies, and in particular research institutions and startups,
have created interesting technologies and services along the data value chain.
However, in both business and science, data use is handled in a fragmented way.
SMEs in particular lack the skills to capitalize on data assets in order to improve
their competitiveness.
Actors along the data value chain should cooperate and form the basis of a strong
and vibrant data-driven ecosystem to maximise big data value creation.
Data Value Chains will succeed only when individual links
operate with needed capabilities
Social & Economic Benefits
Application
Data
Science
Control Flow, Iterative Algorithms, Error Estimation, Active Sampling, Sketches, Curse of Dimensionality, Decoupling, Convergence, Monte Carlo, Mathematical Programming, Linear Algebra, Stochastic Gradient Descent,
Regression, Statistics, Hashing, Parallelization, Query Optimization, Fault Tolerance, Relational Algebra / SQL, Scalability,
Data Analysis Language, Compiler, Memory Management, Memory Hierarchy, Data Flow, Hardware Adaptation, Indexing, Resource Management, NF2/XQuery, Data Warehouse/OLAP
“Data Scientist” – “Jack of All Trades!”
Domain Expertise (e.g., Industry 4.0, Medicine, Physics, Engineering, Energy, Logistics)
Data Science Requires Systems Programming!
R/Matlab: 3 million users
Hadoop: 100,000 users
Data Analysis, Statistics, Algebra, Optimization, Machine Learning, NLP, Signal Processing, Image Analysis, Audio and Video Analysis, Information Integration, Information Extraction, Data Value Chain, Data Analysis Process, Predictive Analytics, Indexing, Parallelization, Communication, Memory Management, Query Optimization, Efficient Algorithms, Resource Management, Fault Tolerance, Numerical Stability
People with Big Data Analytics Skills
We cannot address the complexity of Data Science merely by teaching it. We need
new technologies to empower more people to conduct deep analysis on big data!
Deep Analysis of “Big Data” is Key to Competitiveness!
Axes: Small Data vs. Big Data (3V); Simple Analysis vs. Deep Analytics
The established vendors and existing products are falling short of the needs;
new technologies, systems, platforms, and services for deep analytics are emerging.
The cards are dealt anew!
IBM BigInsights
Apache Flink
Many new companies and products are emerging to enable deep big data analysis;
strong European contenders include Apache Flink, SAP HANA, Parstream, and Exasol.
Axes: Small Data, Simple Analysis; Big Data (3V), Deep Analytics
The Five Dimensions of the Data Economy
Legal Dimension: Ownership, Copyright/IPR, Liability, Insolvency, Privacy
Social Dimension: User Behaviour, Societal Impact, Collaboration
Economic Dimension: Business Models, Benchmarking, Open Source & Open Data, Deployment Models, Information Pricing, Information Marketplaces
Technology Dimension: Scalable Data Processing, Data Management, Signal Processing, Statistics/ML, Linguistics/Text & Speech, Novel Computer Architectures, HCI/Visualization
Application Dimension: Competitive Intelligence, Industry 4.0/IoT, Energy, Healthcare, Transportation, Digital Humanities
Systems, Frameworks, Skills, Best-Practices, Tools
PPP: Uniting the Actors
• Main industry drivers: ATOS (ES), Engineering (IT), DFKI (DE),
  Fraunhofer (DE), Nokia Networks and Solutions (FI), Orange (FR),
  SAP (DE), SIEMENS (DE), Software AG (DE), Thales (FR), TIE
  Kinetix (NL)
• Have worked on a Strategic Research & Innovation Agenda (SRIA)
  for the period 2016 – 2020 (with regular updates while the PPP
  is running)
• Lighthouse Projects (e.g., on health, logistics, energy)
• Innovation spaces will offer secure environments for experimenting
  with both private and open data; will also act as business incubators
  and hubs for the development of skills, competence and best
Call to Action: “Data Ecosystem for Europe”
■ Educate Data Scientists to Create the Required Talent
□ Information Literacy
□ T-shaped Students (computer science/data management and mathematics/data
analysis skills, combined with application, legal, and social skills)
□ Enhance the e-competencies framework with data skills and job profiles
■ Research Data Analytics Technologies, Systems and Platforms
□ Simplified programming, large-scale data management, and novel hardware
□ Scalable machine learning, statistical methods, and mathematical programming
□ Information marketplaces, large-scale data stream processing and visual analytics
■ Innovate to Maintain Competitiveness
□ Create networks of national centers of excellence in big and open data
□ Provide data, processing and analytics capabilities through information marketplaces
□ Demonstrate flagship use-cases to raise awareness & solve real-world problems
□ Startups are key innovation drivers in this field – promote startups in the area of data
analytics technologies, information marketplaces, and applications
□ Raise awareness of data value and analysis value in enterprises and governments
(Chief Data Scientist) and transfer technologies to enterprises, in particular SMEs
□ Determine legal frameworks and business models
□ Create a data ecosystem