03: What You Need to Know About Big Data: Understanding and Better Utilizing Data Analytics
Trainer(s):
Mike Holland, NYU Center for Urban Science and Progress, http://cusp.nyu.edu
Timothy Savage, NYU Center for Urban Science and Progress, http://cusp.nyu.edu
Alan Mitchell, KPMG, www.kpmg.com
Stephen C. Beatty, KPMG, www.kpmg.com
Mike Holland and Tim Savage, March 7, 2015
Applied Sciences NYC
“Applied Sciences NYC is the City’s unparalleled opportunity to build or expand world-class applied sciences and engineering campuses in New York City. We are seeking to dramatically expand our capacity in the applied sciences to maintain our global competitiveness and create jobs. These campuses would not only enrich the City’s existing research capabilities, but also lead to innovative ideas that can be commercialized, catalyzing hundreds of spinoff companies and increasing the probability that the next high growth company – a Google, Amazon, or Facebook – will emerge in New York City.”
New York City Economic Development Corporation
The NYU-led Center for Urban Science and Progress, a multi-sector research and education collaborative, was announced on April 23, 2012.
Big Cities + Big Data
• Informatics capabilities are exploding
– Storage, transmission, analysis
• Proliferation of static and mobile sensors
• Internet of things
Global network traffic, 30% CAGR
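A 30% compound annual growth rate (CAGR) means volume is multiplied by 1.3 every year. The short Python sketch below assumes the rate stays constant and shows what that implies over five years and how long traffic takes to double.

```python
import math


def growth_factor(cagr: float, years: float) -> float:
    """Total multiplicative growth after `years` at a constant annual rate."""
    return (1.0 + cagr) ** years


def doubling_time(cagr: float) -> float:
    """Years needed to double at a constant annual rate: ln(2) / ln(1 + r)."""
    return math.log(2.0) / math.log1p(cagr)


if __name__ == "__main__":
    rate = 0.30  # the 30% CAGR quoted on the slide
    print(f"5-year growth factor: {growth_factor(rate, 5):.2f}x")  # 3.71x
    print(f"Doubling time: {doubling_time(rate):.2f} years")  # 2.64 years
```

At 30% a year, traffic roughly doubles every two and a half years.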
• The world is urbanizing
• Cities are the loci of consumption, economic activity, and innovation
Cities are the cause of our problems
GRADUATE PROGRAMS IN APPLIED URBAN SCIENCE AND INFORMATICS
Degree: Master of Science
Length: One year, 3 semesters (full-time)
Class size: Approx. 60 students
Projects for the City & State
• City Lights
• Building Informatics
• Urban Soundscape
• Neuroeconomics of Decision Making
• Economic Mapping
• Greener Greater Buildings Plan
• MTA Bus Driver Optimization
• MTA Origin/Destination Study
• New York City Police Department 911/311
• Trash Informatics
• Parks Attendance & Utilization
• Property Ownership Records Assessment
• School Property Use Assessment
• Taxi Visualization
• Transit Operations
Properly acquired, integrated, and analyzed, data can:
• Take government beyond imperfect understanding
  – Better (and more efficient) operations, better planning, better policy
• Improve governance and citizen engagement
• Enable the private sector to develop new services for citizens, governments, and firms
• Enable a revolution in the social sciences
Urban Data Sources: Acquire, Integrate, Use
Environment
Meteorology, pollution, noise, flora, fauna
People
Relationships, location, economic/communications activities, health, nutrition, opinions, …
Infrastructure
Condition, operations
Novel Technologies
• Visible, infrared and spectral imagery
• RADAR, LIDAR
• Gravity and magnetic
• Seismic, acoustic
• Ionizing radiation, biological, chemical
• …
Sensors
• Personal (location, activity, physiological)
• Fixed in situ sensors
• Crowd sourcing (mobile phones, …)
• Choke points (people, vehicles)
Organic Data Flows
• Administrative records (census, permits, …)
• Transactions (sales, communications, …)
• Operational (traffic, transit, utilities, health system, …)
• Social media (Twitter, Facebook, blogs, …)
Privacy, Big Data, and the Public Good: Frameworks for Engagement
The book identifies ways in which vast new sets of data on human beings can be collected, integrated, and analyzed to improve urban systems and quality of life while protecting confidentiality. Sponsored by CUSP, the American Statistical Association, its Privacy and Confidentiality subcommittee, and the Research Data Centre of the German Federal Employment Agency.
Editors: Julia Lane, American Institutes for Research; Victoria Stodden, Columbia; Stefan Bender, The German Federal Employment Agency; Helen Nissenbaum, NYU
Chapter Authors
Alessandro Acquisti, Carnegie Mellon University; Cynthia Dwork, Microsoft; Peter Elias, University of Warwick; Robert Goerge, UChicago; Alan Karr, National Institute of Statistical Sciences and Jerry Reiter, Duke University; Steve Koonin and Michael Holland, CUSP; Frauke Kreuter, U-MD and Roger Peng, Johns Hopkins; Carl Landwehr, George Washington University; Helen Nissenbaum and Solon Barocas, NYU; Paul Ohm, Colorado; Alexander Pentland, et al., MIT; Katherine Strandburg, NYU; Victoria Stodden, Columbia; John Wilbanks, Sage Bionetworks/Kauffman Foundation.
Visit dataprivacybook.org.
Overview
• Yellow cab data for 2009-2013 comprise almost 800 million trips, which are nearly impossible to manage, explore, visualize, and analyze with existing tools
Objective & Goal
• Build scalable, usable tools that can be used by experts and non-experts
• Work with relevant city agencies on development & deployment of the technology
Status
• Initial deployment of TaxiVis at NYC Taxi & Limousine Commission and Department of Transportation
Freire, Silva, Vo, et al.
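Working with hundreds of millions of trip records generally means streaming the data rather than loading it all into memory. The Python sketch below illustrates that idea only and is not how TaxiVis works: it counts pickups per hour of day, reading one CSV row at a time. The column name `pickup_datetime` and the timestamp format are assumptions; the TLC's published files have changed schema over the years.

```python
import csv
import io
from collections import Counter
from datetime import datetime
from typing import TextIO

# Hypothetical column name and timestamp format (real TLC schemas vary by year).
PICKUP_COL = "pickup_datetime"
TIME_FMT = "%Y-%m-%d %H:%M:%S"


def trips_per_hour(stream: TextIO) -> Counter:
    """Count pickups per hour of day, one row at a time.

    Memory use is bounded by the 24 hourly counters, not by the number of
    trips, so the same loop works on a file with hundreds of millions of rows.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(stream):
        ts = datetime.strptime(row[PICKUP_COL], TIME_FMT)
        counts[ts.hour] += 1
    return counts


if __name__ == "__main__":
    sample = io.StringIO(
        "pickup_datetime,fare_amount\n"
        "2013-01-01 00:15:00,7.5\n"
        "2013-01-01 00:40:00,12.0\n"
        "2013-01-01 08:05:00,9.0\n"
    )
    print(trips_per_hour(sample))  # Counter({0: 2, 8: 1})
```

In practice the `io.StringIO` sample would be replaced by an open file handle on a full trip file.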
Analysis of Taxis as Sensors for Manhattan
Taxis are sensors that can provide unprecedented insight into city life: economic activity, human behavior, mobility patterns, …
• April 2011: Taxi drivers petition the TLC for higher fares to compensate for rising gasoline prices
• August 2011: Hurricane Irene
• October 2012: Hurricane Sandy
Urban Observatory
PERSISTENT and SYNOPTIC ANALYTICS for URBAN SCIENCE
Photo by Tyrone Turner/National Geographic
Other synoptic modalities: Hyperspectral, RADAR, LIDAR, Gravity, Magnetic, …
Manhattan in the Thermal IR
199 Water Street
Built 1993; 998,000 sq ft; energy sources: electricity, natural gas, steam
Plumes of Opportunity
Background subtraction:
• registration to reference image
• form 10 absolute difference images from surrounding frames
• construct the minimum difference image pixel by pixel
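The background-subtraction steps above can be sketched in Python with NumPy. This is one reading of the slide, not the authors' pipeline: it assumes the frames are already registered to the reference image (registration is not shown), interprets "10 surrounding frames" as 5 on each side of the target, and uses the hypothetical function name `min_difference_image`.

```python
import numpy as np


def min_difference_image(frames: np.ndarray, t: int, n_neighbors: int = 10) -> np.ndarray:
    """Background-subtract frame `t` of a registered image stack.

    frames: array of shape (T, H, W), assumed already registered to a common
        reference image (the registration step is not shown).
    Returns, pixel by pixel, the minimum of |frame_t - frame_k| over up to
    n_neighbors surrounding frames (half before t, half after t).
    """
    half = n_neighbors // 2
    neighbors = [k for k in range(t - half, t + half + 1)
                 if k != t and 0 <= k < frames.shape[0]]
    if not neighbors:
        raise ValueError("need at least one neighboring frame")
    target = frames[t].astype(float)
    diffs = np.stack([np.abs(target - frames[k].astype(float)) for k in neighbors])
    # Features shared with any neighboring frame cancel; ones unique to frame t survive.
    return diffs.min(axis=0)


if __name__ == "__main__":
    stack = np.ones((11, 4, 4))  # static background, 11 frames of 4x4 pixels
    stack[5, 1, 1] = 3.0         # transient brightening present only in frame 5
    out = min_difference_image(stack, t=5)
    print(out[1, 1], out[0, 0])  # 2.0 0.0
```

Taking the minimum rather than the mean of the differences means a pixel only stands out if it differs from every neighboring frame, which suppresses the static city background.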
Plume identification and tracking:
• denoise the background-subtracted image
• identify excess/deficit in luminosity space
• cross-check object location in color space
• localization and probability-weighted tracking of centroids
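A simplified sketch of the first plume-identification steps follows, again in Python with NumPy and taking a background-subtracted image as input. It substitutes a 3x3 mean filter for whatever denoiser the authors use, flags excess and deficit pixels with a k-sigma threshold (k = 3 is an assumption), and computes a luminosity-weighted centroid for a single plume. The color-space cross-check and the probability-weighted tracking across frames are not shown.

```python
from typing import Optional

import numpy as np


def box_denoise(img: np.ndarray) -> np.ndarray:
    """3x3 mean filter with edge replication; a stand-in for the real denoiser."""
    h, w = img.shape
    padded = np.pad(img.astype(float), 1, mode="edge")
    acc = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / 9.0


def excess_deficit_masks(img: np.ndarray, k: float = 3.0):
    """Flag pixels more than k standard deviations above (excess) or below (deficit) the mean."""
    mu, sigma = img.mean(), img.std()
    return img > mu + k * sigma, img < mu - k * sigma


def weighted_centroid(img: np.ndarray, mask: np.ndarray) -> Optional[tuple[float, float]]:
    """Luminosity-weighted (row, col) centroid of the masked pixels, or None if empty."""
    weights = np.where(mask, np.abs(img), 0.0)
    total = weights.sum()
    if total == 0:
        return None
    rows, cols = np.indices(img.shape)
    return float((rows * weights).sum() / total), float((cols * weights).sum() / total)


if __name__ == "__main__":
    frame = np.zeros((20, 20))
    frame[5:8, 10:13] = 10.0  # a 3x3 "plume" centered at (6, 11)
    smooth = box_denoise(frame)
    excess, deficit = excess_deficit_masks(smooth)
    print(weighted_centroid(smooth, excess))  # roughly (6.0, 11.0)
```

Tracking would then link these centroids from frame to frame; how the original work weights those links by probability is not described on the slide.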
Upcoming use cases:
• plume rate
• urban winds
• carbon vs. steam emissions
• TOO (triggered) observations
[Figure panels: raw image; background subtracted]
Street Environment: Attention, Distraction, and Interaction Dynamics
Source: Dobler, et al.
Federal Open Data Policies
http://nys-its.github.io/open-data-handbook/OpenDataHandbook.pdf
http://catalog.data.gov/dataset?organization_type=City+Government#topic=cities_navigation
Source: Barbosa, Luciano, et al. "Structured open urban data: understanding the landscape." Big Data 2.3 (2014): 144-154.
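Catalogs such as catalog.data.gov run on CKAN, whose Action API exposes a `package_search` endpoint for programmatic discovery. The Python sketch below builds a search URL and parses a response; it makes no network request, and the sample JSON is a made-up, trimmed example of the response shape rather than real catalog output.

```python
import json
from urllib.parse import urlencode

# CKAN Action API search endpoint on catalog.data.gov (not called in this sketch).
BASE = "https://catalog.data.gov/api/3/action/package_search"


def search_url(query: str, rows: int = 10) -> str:
    """Build a CKAN package_search URL for a free-text query."""
    return f"{BASE}?{urlencode({'q': query, 'rows': rows})}"


def dataset_titles(response_text: str) -> list[str]:
    """Extract dataset titles from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN request failed")
    return [pkg["title"] for pkg in payload["result"]["results"]]


if __name__ == "__main__":
    print(search_url("taxi trips", rows=5))
    # https://catalog.data.gov/api/3/action/package_search?q=taxi+trips&rows=5
    # Hypothetical, trimmed response used only to exercise the parser.
    sample = '{"success": true, "result": {"count": 1, "results": [{"title": "Example dataset"}]}}'
    print(dataset_titles(sample))  # ['Example dataset']
```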
Cities and States with Chief Data Officers
Blue signifies a state-level officer, green signifies a local-level officer, and yellow signifies an officer in education.
Source: Steve Towns, "Which States and Cities Have Chief Data Officers?", govtech.com, June 13, 2014
Open Data Can Lead to Open Innovation
• A consortium of public-sector transit agencies, commercial firms, nonprofits, academic researchers, and interested individuals
• Real-time arrival predictions
• 94% of riders reported increased or greatly increased satisfaction with public transit
• Significant decrease in actual wait time per user, and an even greater decrease in perceived wait time
• 78% of riders reported increased walking, a significant public health benefit
http://onebusaway.org/
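Real-time arrival predictions can be illustrated with the simplest possible rule: assume a bus keeps its current delay for the rest of its trip. This Python sketch shows only that idea and is not OneBusAway's actual prediction method; the `StopTime` type and the stop IDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StopTime:
    stop_id: str      # hypothetical stop identifier
    scheduled_s: int  # scheduled arrival, seconds after midnight


def predict_arrivals(schedule: list[StopTime], last_idx: int, observed_s: int) -> dict[str, int]:
    """Predict downstream arrivals by carrying the bus's current delay forward.

    last_idx: index in `schedule` of the stop the bus most recently reached.
    observed_s: actual arrival time at that stop (seconds after midnight).
    """
    delay = observed_s - schedule[last_idx].scheduled_s
    return {st.stop_id: st.scheduled_s + delay for st in schedule[last_idx + 1:]}


if __name__ == "__main__":
    trip = [StopTime("A", 8 * 3600), StopTime("B", 8 * 3600 + 300), StopTime("C", 8 * 3600 + 600)]
    # The bus reached stop A two minutes late.
    print(predict_arrivals(trip, 0, 8 * 3600 + 120))  # {'B': 29220, 'C': 29520}
```

Even this crude rule beats a static timetable whenever delays persist along a route, which is one reason published real-time feeds can shorten actual waits.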
Core City Services Include…
• Human Services ($826B): Health, Education, Social Services
• General Government ($245B): Planning, Public Buildings, Financial Admin, Community Development
• Safety ($180B): Emergency Mgmt, Courts, Jails, Police, Fire
• Streets ($397B): Sanitation, Utilities, Parks, Roads
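The service-area totals can be turned into shares of local spending with a few lines of Python. The sketch assumes the dollar figures pair with the service areas as grouped above and simply divides each by their sum.

```python
# Local government expenditure by service area, billions of USD (figures from the slide).
SPENDING = {
    "Human Services": 826,
    "Streets": 397,
    "General Government": 245,
    "Safety": 180,
}


def shares(spending: dict[str, float]) -> dict[str, float]:
    """Each area's fraction of total spending."""
    total = sum(spending.values())
    return {area: amount / total for area, amount in spending.items()}


if __name__ == "__main__":
    print(f"Total: ${sum(SPENDING.values())}B")  # Total: $1648B
    for area, share in shares(SPENDING).items():
        print(f"{area:<20} {share:6.1%}")
```

Human Services alone accounts for about 50.1% of the total.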
We need to understand:
• How does data flow within agencies?
• How interoperable can data be?
• What data can be shared, and how is it shared to support delivery of city services?
Local Gov’t. Expenditures: U.S. Census Bureau, 2012 Census of Governments: Surveys of State
Tools
• Data acquisition and synthesis
• Exploration and data “mining”
• Formulation of meaningful policy questions
This picture merges an image captured from video, a 3-D LIDAR map of NYC, the PLUTO (Primary Land Use Tax Lot Output) database, and Local Law 84 (LL84) energy benchmarking data.
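Merging city datasets like these depends on a shared key. PLUTO and LL84 benchmarking data both identify tax lots by BBL, NYC's 10-digit borough-block-lot number (1 borough digit, 5 block digits, 4 lot digits). The Python sketch below joins two small record lists on that key and derives energy use per square foot; the records and field names are hypothetical and do not match the real dataset schemas.

```python
def make_bbl(borough: int, block: int, lot: int) -> str:
    """Compose NYC's 10-digit BBL: 1 borough digit, 5 block digits, 4 lot digits."""
    return f"{borough:01d}{block:05d}{lot:04d}"


# Hypothetical records; field names are illustrative, not the real PLUTO/LL84 schemas.
pluto = [
    {"bbl": make_bbl(1, 12345, 6), "year_built": 1990, "bldg_area_sqft": 500_000},
    {"bbl": make_bbl(3, 42, 7), "year_built": 1931, "bldg_area_sqft": 80_000},
]
ll84 = [
    {"bbl": "1123450006", "site_energy_kbtu": 40_000_000},
]


def join_on_bbl(left: list[dict], right: list[dict]) -> list[dict]:
    """Inner join two lists of records on their 'bbl' field."""
    by_bbl = {rec["bbl"]: rec for rec in right}
    return [{**rec, **by_bbl[rec["bbl"]]} for rec in left if rec["bbl"] in by_bbl]


if __name__ == "__main__":
    for row in join_on_bbl(pluto, ll84):
        eui = row["site_energy_kbtu"] / row["bldg_area_sqft"]
        print(row["bbl"], f"{eui:.1f} kBtu/sq ft")  # 1123450006 80.0 kBtu/sq ft
```

Zero-padding matters: a BBL stored as an integer or without padding will silently fail to match.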
TaxiVis: Interactive Visual Exploration of NYC Taxi Records
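At its core, interactive exploration of taxi records answers spatio-temporal queries: which trips started inside a region during a time window? This Python sketch answers that with a linear scan over hypothetical records and illustrative coordinates. It is not TaxiVis's implementation; at the scale of hundreds of millions of trips a tool would need a spatio-temporal index rather than a scan.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Trip:
    pickup_time: datetime
    lat: float
    lon: float


@dataclass(frozen=True)
class BBox:
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.min_lat <= lat <= self.max_lat and self.min_lon <= lon <= self.max_lon


def query(trips: list[Trip], region: BBox, start: datetime, end: datetime) -> list[Trip]:
    """Trips picked up inside `region` during the half-open interval [start, end)."""
    return [t for t in trips if start <= t.pickup_time < end and region.contains(t.lat, t.lon)]


if __name__ == "__main__":
    trips = [
        Trip(datetime(2012, 10, 29, 18, 0), 40.758, -73.985),   # inside box and window
        Trip(datetime(2012, 10, 29, 18, 30), 40.700, -74.010),  # outside the box
        Trip(datetime(2012, 10, 30, 9, 0), 40.760, -73.990),    # outside the window
    ]
    midtown = BBox(40.75, -73.995, 40.77, -73.98)  # illustrative coordinates
    print(len(query(trips, midtown, datetime(2012, 10, 29), datetime(2012, 10, 30))))  # 1
```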
Uses of Data Analytics
• Regulatory compliance
• Targeted enforcement
• Improved understanding of municipal
Some Examples
Apartment Fires in the Bronx and Brooklyn
• More than 20,000 complaints per year about unsafe illegal conversions
  – Department of Buildings: 200 building inspectors for 900,000 buildings, relying on expert judgment to prioritize inspections
  – Historically, only 8% of inspections found serious violations
• Strongest predictors of an unsafe illegal conversion
  – Whether the building is current on its property taxes (data held by the Department of Finance)
  – Whether banks have filed any mortgage foreclosures (data held by the Office of Court Administration)
• Teaming fire marshals up with building inspectors
  – Firefighters are 15 times more likely to die responding to a fire in an illegal conversion than to other fires
  – Vacate orders jumped to more than 70%
Source: Mike Flowers, “Beyond Open Data: The Data-Driven City,” in Beyond Transparency: Open Data and the Future of Civic Innovation, Brett Goldstein and Lauren Dyson, Eds.; San Francisco, CA:
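The prioritization idea above, ranking complaints by data-driven risk signals instead of expert judgment alone, can be sketched as a simple weighted score. In this Python sketch the two predictors come from the slide, but the weights, field names, and building IDs are hypothetical and are not the city's actual model.

```python
from dataclasses import dataclass


@dataclass
class Complaint:
    building_id: str         # hypothetical building identifier
    tax_delinquent: bool     # behind on property taxes (Department of Finance data)
    foreclosure_filed: bool  # mortgage foreclosure filed (Office of Court Administration data)


# Hypothetical weights for illustration only; not the city's model.
WEIGHTS = {"tax_delinquent": 2.0, "foreclosure_filed": 1.5}


def risk_score(c: Complaint) -> float:
    """Weighted sum of the two risk signals (True counts as 1, False as 0)."""
    return (WEIGHTS["tax_delinquent"] * c.tax_delinquent
            + WEIGHTS["foreclosure_filed"] * c.foreclosure_filed)


def prioritize(complaints: list[Complaint], capacity: int) -> list[Complaint]:
    """Return the `capacity` highest-risk complaints to inspect first."""
    return sorted(complaints, key=risk_score, reverse=True)[:capacity]


if __name__ == "__main__":
    queue = [
        Complaint("A", True, True),
        Complaint("B", False, False),
        Complaint("C", True, False),
        Complaint("D", False, True),
    ]
    print([c.building_id for c in prioritize(queue, capacity=2)])  # ['A', 'C']
```

With a fixed inspection capacity, the gain comes from sending scarce inspectors to the complaints most likely to reveal a serious violation.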
cusp.nyu.edu
NYUCUSP
@NYU_CUSP