Big Data and Smart Government
Institute of Public Administration Australia, Nov 20, 2014
Ramayya Krishnan
W.W. Cooper and Ruth F. Cooper Professor of Information Systems, H. John Heinz III College, Carnegie Mellon University
Seizing the Data Revolution
• Data Tsunami: Explosive Growth in Size, Complexity, and Data Rates
– Enabled by mobile phones, social media, email, videos, images, click streams, Internet transactions … and sensors everywhere!
– Opportunity to integrate with and leverage existing legacy data sources
– HOWEVER, what matters is not whether data is big or small, but whether it has context and relevance to the task or decision at hand.
• The Age of Data: From Data to Knowledge to Action
– Widespread use of data to create actionable information leads to timely and more informed decisions and actions.
– Fundamentally, this is about evidence-based management and policy making
Imagine …
• By coupling roadway sensors, traffic cameras, and individuals’ GPS devices, we can reduce traffic congestion and generate significant savings in time and fuel costs.
• By accurately predicting natural disasters such as hurricanes and tornadoes, we can employ life-saving and preventative measures that mitigate their potential impact.
• By integrating emerging technologies, such as MOOCs and inverted classrooms, with knowledge from research about how people learn, we can transform formal and informal education.
• By mining data from electronic health records and through experiments, we can develop a causal understanding of cost-efficient and personalized practice guidelines associated with the best health outcomes.
Source: Sajal Das, Keith Marzullo
Smart Sensing, Reasoning and Decision
[Figure: An agent reasons over percepts (sensors) and issues actions (controllers). Sensing spans personal, public, social, people-centric, and environment sensing, drawing on pervasive computing and social informatics. Applications include smart health care and emergency response; for situation awareness, humans as sensors feed multi-modal data streams through a sense, identify, assess, intervene, evaluate cycle. Photo credit: US Geological Survey.]
Mobile Devices & Cellular Networks are Pervasive
The number of mobile-connected devices will soon exceed the number of people on earth.
Mobile data traffic will grow at a compound annual growth rate (CAGR) of 66 percent from 2012 to 2017, reaching 11.2 exabytes per month by 2017.
Slide Credit: Intel Corporation
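The two figures on this slide fix a starting point: if traffic reaches 11.2 exabytes per month in 2017 after five years of 66 percent compound growth, then V(2017) = V(2012) × (1 + 0.66)^5, so V(2012) = 11.2 / 1.66^5, roughly 0.89 exabytes per month. The slide does not state the 2012 level; this is derived arithmetic, not a sourced number, and the function names below are illustrative only.

```python
def implied_base(final_value: float, cagr: float, years: int) -> float:
    """Starting value implied by an end value and a compound annual growth rate."""
    return final_value / (1.0 + cagr) ** years


def project(base: float, cagr: float, years: int) -> list[float]:
    """Year-by-year values growing at a constant compound rate (year 0 first)."""
    return [base * (1.0 + cagr) ** t for t in range(years + 1)]


# Slide figures: 66% CAGR over 2012-2017, ending at 11.2 exabytes per month.
base_2012 = implied_base(11.2, 0.66, 5)
for year, eb in zip(range(2012, 2018), project(base_2012, 0.66, 5)):
    print(year, f"{eb:.2f} EB/month")
```

Note how back-loaded compound growth is: most of the absolute increase arrives in the last two years of the window.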
“Legacy” Data
• Statistical Agencies
– Surveys vs. social media?
– Surveys remain very relevant; social media is a complement, not a substitute!
• Public Health
• Education
• Business data
• Adoption of Evidence-based Management and Policy Making is
Characterizing “Data”
[Figure: Defining big data. Image: http://www.intergen.co.nz/Global/Images/BlogImages/2013/Defining-big-data.png]
The Data and Analytics Value Chain
• Network and Access Infrastructure: Devices and data rates
– Mobile phones are the dominant device, but we are on the cusp of embedded sensors for a variety of applications
• Secure Data Storage and Compute Infrastructure
– Role of cloud services
– Regional vs. national strategies
• Data Governance and Sharing Infrastructure
– Role of Open Data initiatives from the public sector
– Role of “exchanges” (e.g., health information exchanges with private and public data in the US)
– Data sharing standards
– Data for the “public good” (see Orange’s Data for Development)
• Data Privacy Policy and its interaction with business models
CMU SURTRAC System
SURTRAC: Scalable, Real-Time Adaptive Signal Control for Urban Road Networks
Traffic 21 and The Robotics Institute, Carnegie Mellon University
1. Video cameras measure traffic conditions.
2. The system optimizes the phase schedule at the intersection and sends commands to the control box (controller).
3. The schedule is communicated to downstream intersections to indicate what is coming.
4. The scheduling cycle is repeated every few seconds.
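The four-step cycle above can be illustrated with a toy planner for a single intersection. This is a hedged sketch, not SURTRAC's actual optimization: approaching traffic is represented as clusters of vehicles (as if counted by the cameras), every possible service order is tried by brute force, and the order with the least total delay wins. The class and function names (`Cluster`, `evaluate`, `plan`) and the timing constants are hypothetical.

```python
from dataclasses import dataclass
from itertools import permutations

# Hypothetical timing parameters, not SURTRAC's actual values.
HEADWAY_S = 2.0  # seconds to discharge one vehicle through the stop line
SWITCH_S = 4.0   # lost time (yellow + all-red) when the signal changes phase


@dataclass(frozen=True)
class Cluster:
    phase: str      # signal phase that serves this cluster, e.g. "NS" or "EW"
    vehicles: int   # vehicles counted by the cameras (step 1)
    arrival: float  # seconds from now until the cluster reaches the stop line


def evaluate(order, current_phase):
    """Return (total delay in vehicle-seconds, departures) for one service order.

    Simplification: every vehicle in a cluster is charged the same wait,
    namely service start time minus cluster arrival time.
    """
    t, phase, delay, departures = 0.0, current_phase, 0.0, []
    for c in order:
        if c.phase != phase:        # pay the switching cost on a phase change
            t += SWITCH_S
            phase = c.phase
        start = max(t, c.arrival)   # a cluster cannot be served before it arrives
        delay += c.vehicles * (start - c.arrival)
        t = start + c.vehicles * HEADWAY_S
        departures.append((c.phase, c.vehicles, t))  # time the cluster clears
    return delay, departures


def plan(clusters, current_phase):
    """Step 2: choose the service order with the least total delay (brute force)."""
    best = min(permutations(clusters), key=lambda o: evaluate(o, current_phase)[0])
    return evaluate(best, current_phase)


clusters = [Cluster("NS", 6, 0.0), Cluster("EW", 3, 5.0), Cluster("NS", 2, 20.0)]
delay, departures = plan(clusters, current_phase="NS")
print(delay)       # 45.0
print(departures)  # step 3: what would be sent to downstream intersections
# Step 4: a live controller would re-run plan() every few seconds with fresh counts.
```

Brute force grows factorially with the number of clusters, which is why a real-time system needs a far more efficient search; the sketch only shows the shape of the sense, optimize, communicate, repeat loop.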
Surtrac Pilot – Results and Status
Penn Circle Field Test (June 2012), % improvement:
Period     Travel Time   # of Stops   Wait Time   Emissions
AM rush    30.11%        29.14%       47.78%      23.83%
Mid day    32.83%        52.58%       49.82%      29.00%
PM rush    22.65%         8.89%       35.60%      18.41%
Evening    17.52%        34.97%       27.56%      14.01%
Overall    25.79%        31.34%       40.64%      21.48%

Bakery Square Expansion (Nov 2013), % improvement:
Period     Travel Time   # of Stops   Wait Time   Emissions
AM rush    17.02%        33.81%       32.76%      16.21%
Mid day    21.35%        37.23%       38.09%      17.62%
PM rush    28.61%        44.87%       46.40%      24.77%
Overall    24.07%        40.39%       41.54%      20.67%
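The tables do not spell out how “% improvement” is computed. Assuming it is the usual relative reduction against the pre-deployment baseline, improvement = (before − after) / before × 100, the figures can be turned back into concrete quantities. The 600-second baseline trip below is hypothetical, chosen only to illustrate the arithmetic.

```python
def percent_improvement(before: float, after: float) -> float:
    """Relative reduction from a baseline, in percent (positive = better)."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (before - after) / before * 100.0


def after_from_improvement(before: float, improvement_pct: float) -> float:
    """Invert the formula: the post-deployment value implied by an improvement."""
    return before * (1.0 - improvement_pct / 100.0)


# Hypothetical 10-minute (600 s) baseline trip, with the Penn Circle
# overall travel-time improvement of 25.79% applied to it.
trip_after = after_from_improvement(600.0, 25.79)
print(round(trip_after, 1))  # 445.3
```

Under this assumption, a 10-minute trip would shrink to about 7.4 minutes.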
Crime prediction in Chicago
Since 2009, we have been working with the Chicago Police Department (CPD) to predict and prevent emerging clusters of violent crime.
Our new crime prediction methods have been incorporated into our CrimeScan software, which CPD has used operationally to deploy patrols.
From the Chicago Sun-Times, February 22, 2011:
“It was a bit like ‘Minority Report,’ the 2002 movie that featured genetically altered humans with special powers to predict crime. The CPD’s new crime-forecasting unit was analyzing 911 calls and produced an intelligence report predicting a shooting would happen soon on a particular block on the South Side. Three minutes later, it did…”
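The slides do not describe how CrimeScan works internally. As a hedged illustration of what “detecting an emerging cluster” can mean, here is a minimal expectation-based Poisson scan over a grid: each candidate square region’s recent count of a leading indicator (for example, 911 calls) is compared with its historical baseline b, scored with the log-likelihood ratio c·log(c/b) + b − c when c > b, and the highest-scoring region is flagged. The grid, counts, baselines, and function names are hypothetical, and this is a textbook scan statistic, not CPD’s or CrimeScan’s production method.

```python
import math
from itertools import product


def ebp_score(count: float, baseline: float) -> float:
    """Expectation-based Poisson log-likelihood ratio (0 when count <= baseline)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    if count <= baseline:
        return 0.0
    return count * math.log(count / baseline) + baseline - count


def scan(counts, baselines, max_side=2):
    """Score every square region up to max_side cells wide; return the best one.

    Returns (score, (row, col, side)), or (0.0, None) if nothing exceeds baseline.
    """
    rows, cols = len(counts), len(counts[0])
    best = (0.0, None)
    for side in range(1, max_side + 1):
        for r, c in product(range(rows - side + 1), range(cols - side + 1)):
            cells = [(i, j) for i in range(r, r + side) for j in range(c, c + side)]
            cnt = sum(counts[i][j] for i, j in cells)
            base = sum(baselines[i][j] for i, j in cells)
            score = ebp_score(cnt, base)
            if score > best[0]:
                best = (score, (r, c, side))
    return best


# Hypothetical 3x3 grid: this week's 911 calls vs. historical weekly averages.
counts = [[2, 1, 0], [1, 9, 7], [0, 2, 1]]
baselines = [[2, 1, 1], [1, 2, 2], [1, 1, 1]]
score, region = scan(counts, baselines)
print(region, round(score, 2))
```

Here the bottom-right 2×2 block (19 calls against an expected 6) outscores any single cell, which is the point of scanning regions rather than individual cells: several moderately elevated neighbors can signal a cluster that no single cell reveals.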
Data Management Technology
New and Existing Methods
• Statistics
• Machine Learning
• Optimization
Scalable Computation
• Many important applications must process large streams of live data and provide results in near real time:
– Social network trends
– Website statistics
– Intrusion detection systems
– etc.
• Require large clusters to handle workloads
• Require latencies of a few seconds
• Exploit data parallelism or graph parallelism, depending on the task
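A minimal sketch of the data-parallel pattern described above, assuming a “trending hashtags” workload: events are hash-partitioned by key so each key lives on exactly one shard, each shard keeps counts over a sliding time window, and the per-shard top-k lists are merged. The names `SlidingWindowCounter`, `partition`, and `global_top` are hypothetical; in a real deployment the shards would run on separate cluster nodes under a stream-processing framework rather than in one Python loop.

```python
import zlib
from collections import Counter, deque


class SlidingWindowCounter:
    """Counts keys (e.g. hashtags) seen in the last window_s seconds."""

    def __init__(self, window_s: float):
        self.window_s = window_s
        self.events = deque()   # (timestamp, key), oldest first
        self.counts = Counter()

    def add(self, ts: float, key: str) -> None:
        """Record one event; assumes events arrive in timestamp order."""
        self.events.append((ts, key))
        self.counts[key] += 1
        self.evict(ts)

    def evict(self, now: float) -> None:
        # Drop events that have fallen out of the window (ts <= now - window_s).
        while self.events and self.events[0][0] <= now - self.window_s:
            _, old = self.events.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def top(self, k: int, now: float):
        self.evict(now)
        return self.counts.most_common(k)


def partition(events, n_workers):
    """Data parallelism: route each key to exactly one shard via a stable hash."""
    shards = [[] for _ in range(n_workers)]
    for ts, key in events:
        shards[zlib.crc32(key.encode()) % n_workers].append((ts, key))
    return shards


def global_top(events, n_workers, window_s, k):
    now = max(ts for ts, _ in events)
    counters = [SlidingWindowCounter(window_s) for _ in range(n_workers)]
    for counter, shard in zip(counters, partition(events, n_workers)):
        for ts, key in shard:   # each shard could run on a separate node
            counter.add(ts, key)
    # Keys never span shards, so merging per-shard top-k lists is exact.
    merged = [kc for counter in counters for kc in counter.top(k, now)]
    return sorted(merged, key=lambda kc: kc[1], reverse=True)[:k]


events = [(0, "#flood"), (1, "#traffic"), (2, "#flood"), (5, "#flood"),
          (11, "#traffic"), (12, "#traffic"), (13, "#vote")]
print(global_top(events, n_workers=3, window_s=10, k=1))  # [('#traffic', 2)]
```

Partitioning by key is what makes the merge cheap: because no key is split across shards, no shard ever needs another shard’s partial counts.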
Frameworks and Services
• GraphLab (CMU), AMPLab (Berkeley)
– Computational frameworks
• Amazon (EC), EMC, Oracle,…
– Storage and computation via the private market
• Bundled and specialized offerings
Technology alone will not solve all of society’s challenges.
Must consider economic, social, and cultural barriers to the adoption and use of solutions.
“Lean” Innovation
• Policy Innovation
– Key enabler of a number of ICT applications
– Example: mobile money, balancing KYC (know-your-customer) requirements with enabling airtime agents to handle both cash-in and cash-out
• Business Model Innovation
– Incentivizing individual engagement through compensation models
Policy Considerations
• Be Problem Driven: Support both bottom-up and strategic planning for problems and initiatives that are likely to benefit from data analytics, and enable their use of these technologies
– Disaster preparedness, intelligent water management, smart retail, education…
• Leverage Existing Investments and Nurture New Data Sources:
– Statistical agencies, public health, call detail records (CDRs) from telcos, social media data
– Public-private partnerships
• Provide incentives for good data governance and stewardship: User protection via privacy and security technologies. Need to create policies that will enable data flow!
• Education and Workforce Development: Develop, recruit, and grow the skills needed to fuel and support the data-driven economy.
– Partnerships with universities and the private sector to train human capital
• Make enabling infrastructure accessible and affordable:
– Mobile broadband: reach and pricing
– Cloud infrastructure and services
Education and Workforce Development
“Data Science: The Sexiest Job of the 21st Century”
Credits