NATIONAL STATISTICAL COORDINATION BOARD
12th National Convention on Statistics
October 1-2, 2013, EDSA Shangri-La Hotel, Mandaluyong City
BIG DATA:
Big Opportunity or
Big Threat for Official Statistics?*
Jose Ramon G. Albert, Ph.D.
Secretary General, NSCB Email: jrg.albert@nscb.gov.ph
Outline of the Presentation
I. Introduction: Importance of Official Statistics in Public Policy
II. Big Data is Here!!!
III. Big Data: Big News or Big Mess?
IV. Some Final Words on Big Data and Official Statistics
I. Introduction
Importance for managing economies more effectively
• Inputs to monitor national development plans, roadmaps, targets, international commitments (MDGs, post-MDG agenda)
• “You can’t manage well what you don’t measure”
Credibility: integrity, independence and professionalism
• UN Fundamental Principles of Official Statistics
Critics find official statistics not sufficient: call for “data revolution”
I. Introduction
Tried and Tested Methods for Collecting Data
BIG in volume
• surveys and censuses, administrative reporting systems
but not big in frequency (i.e. velocity)
• Despite ICT tools, still not fast enough (due to costs, human resources, processes for reporting, including attention to precision and accuracy).
Electronic devices (mobile phones, smart
phones, tablets, laptops), social media, “google”,
sensors, tracking devices (GPS)
• 2.5 quintillion (2.5 × 10^18) bytes of data created per day in 2012
• More and more internet subscribers !!! (In PH, 36% in 2012 from 2% in 2000)
• More and more mobile subscribers !!! (102 per 100 persons in 2012 in PH)
II. BIG DATA is here !!!
[Figure: Mobile-cellular telephone subscriptions per 100 inhabitants, 2000-2012, for Cambodia, Indonesia, Lao P.D.R., Malaysia, Myanmar, Philippines, Singapore, Thailand, Timor-Leste and Viet Nam]
[Figure: Percentage of individuals using the Internet, 2000-2012, for the same countries]
Age of gadgets, social media and sensors
Increasing Public Need for “Knowing in (Real) Time”
Health Surveillance: Google Flu Trends (J. Ginsberg et al., Nature, 2009)
II. BIG DATA is here !!!
Google Dengue Trends
II. BIG DATA is here !!!
Beyond Health: Google Predicting the Present
II. BIG DATA is here !!!
Predicting the Present with Google Trends (Choi & Varian, April 2009)
Monitoring Inflation and Traffic: UN Global Pulse reports successes at its Pulse Laboratory in Jakarta in relating mentions of “rice” on Twitter to the actual price of rice (Letouze, 2012)
II. BIG DATA is here !!!
DATA REVOLUTION in PH
NSO using Google Maps to help re-design master sample of household surveys
NSCB extensively using web for dissemination (online articles, Facebook, Twitter, livestream)
DOST’s Project Nationwide Operational Assessment of Hazards (NOAH) helps government minimize climate disaster risks
• 676 deaths in CDO due to Sendong in 2011
• 1 death in CDO due to Pablo in 2012
II. BIG DATA is here !!!
Importance of Information in Planning and
Programming, especially Mitigating Risks from
New Threats to Development (such as Impact of
Climate Change on Climate Disasters)
III. Big Data: Big News or Big Mess?
Official Statistics vs. Big Data
1. Official Statistics: Structured and planned product
   Big Data: Largely unstructured, unfiltered “data exhaust”, i.e., by-product of digital products (transactions, web, social media)
2. Official Statistics: Methodological and clear concepts
   Big Data: Poor analytics
3. Official Statistics: Regulated
   Big Data: Unregulated
4. Official Statistics: Macro-level but typically based on high volume primary data
   Big Data: Micro-level huge volume with high velocity (or frequency) and variety
5. Official Statistics: High cost
   Big Data: Generally little, or no cost
6. Official Statistics: Centralized; point in time
A. Privacy Issues
• Much of the Big Data being generated includes personal information. Precise, geo-location-based information pushes the boundary of confidentiality/privacy.
Amazon, Visa, Mastercard watching our shopping preferences
Google watching our browsing habits
Twitter watching what’s on our minds
Facebook watching various info, including our social relationships
Mobile providers watching whom we talk to, what we say to them, and even who is nearby
III. Big Data: Big News or Big Mess?
A. Privacy Issues
• Examples of Breach of Confidentiality/Privacy
In 1943, the US Census Bureau gave the US government block-level addresses (although not street names and numbers) of Japanese-Americans, which led to their imprisonment during the US-Japan war
Netherlands’ civil records used by Nazis to round up Jews
Census data were used by BPS Statistics Indonesia to assist government in coming up with list of “poor” households
A. Privacy Issues
• “Notice and Consent” of Users
Can users give “informed consent” to an unknown use?
When Google Flu Trends was developed, did Google have to contact all its users for approval to use old search queries for this project?
Should users be asked to agree to any possible future use of their data?
Other ways to protect privacy, but imperfect:
Opting out (but this can leave a trace)
Anonymization (but “re-identification” still possible)
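To illustrate why removing names does not by itself prevent re-identification, here is a minimal sketch of a linkage attack. All records, field names and the choice of quasi-identifiers (postcode, birth year, sex) are hypothetical and invented for illustration; they are not drawn from any dataset mentioned in this presentation.

```python
from collections import Counter, defaultdict

# Hypothetical "anonymized" release: names removed, quasi-identifiers kept.
anonymized = [
    {"postcode": "1550", "birth_year": 1975, "sex": "F", "diagnosis": "dengue"},
    {"postcode": "1550", "birth_year": 1982, "sex": "M", "diagnosis": "flu"},
    {"postcode": "9000", "birth_year": 1982, "sex": "M", "diagnosis": "flu"},
    {"postcode": "9000", "birth_year": 1982, "sex": "M", "diagnosis": "asthma"},
]

# Hypothetical public list that carries names plus the same three fields.
public = [
    {"name": "Person A", "postcode": "1550", "birth_year": 1975, "sex": "F"},
    {"name": "Person B", "postcode": "1550", "birth_year": 1982, "sex": "M"},
    {"name": "Person C", "postcode": "9000", "birth_year": 1982, "sex": "M"},
]

QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def key(record):
    """Combination of quasi-identifier values used to link the two lists."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymized_records, public_records):
    """Return {name: diagnosis} for people whose key is unique in both lists."""
    groups = defaultdict(list)
    for rec in anonymized_records:
        groups[key(rec)].append(rec)
    public_counts = Counter(key(p) for p in public_records)

    matches = {}
    for person in public_records:
        candidates = groups.get(key(person), [])
        # Only an unambiguous one-to-one link counts as a re-identification.
        if len(candidates) == 1 and public_counts[key(person)] == 1:
            matches[person["name"]] = candidates[0]["diagnosis"]
    return matches


reidentified = reidentify(anonymized, public)
print(reidentified)
```

In this toy data, two of the three named people link to exactly one "anonymized" record, while the third is protected only because another record happens to share the same combination of attributes.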
B. Big Bias
• Gains in velocity (and cost) come at the expense of precision and accuracy, i.e. Big Data may not be completely accurate, but is thought of as “good enough.”
• But how good is “good enough?”
• Recent work suggests that Google Flu Trends over-estimated flu levels (11% of the US public this flu season, almost double the CDC’s estimate of about 6%).
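As a quick check of the "almost double" claim, the two flu estimates quoted above (11% versus about 6%) can be compared directly. This is only a sketch of the arithmetic; the variable names are illustrative, and the CDC figure is approximate in the source.

```python
# Figures quoted on this slide, in percent of the US public.
search_based_estimate = 11.0  # search-query-based estimate
cdc_estimate = 6.0            # CDC estimate ("about 6%", so approximate)

# How many times larger the search-based estimate is.
ratio = search_based_estimate / cdc_estimate

# Over-estimation relative to the CDC figure.
relative_overestimate = (search_based_estimate - cdc_estimate) / cdc_estimate

print(f"ratio: {ratio:.2f}")
print(f"relative over-estimate: {relative_overestimate:.0%}")
```

A ratio of roughly 1.8 means the search-based figure exceeds the CDC figure by a little over 80%, which is consistent with "almost double."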
III. Big Data: Big News or Big Mess?
B. Big Bias
• A study of Twitter and Foursquare data before, during and in the aftermath of Hurricane Sandy (Grinberg, et al., 2013) revealed:
Grocery shopping peaked the night before the storm.
Nightlife picked up the day after.
The greatest number of tweets about Hurricane Sandy came from Manhattan. (This creates the illusion that Manhattan was the hardest hit in the US. It wasn’t!)
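One way to see how raw tweet counts can mislead is to compare them with a rate per active user. The sketch below uses made-up numbers and hypothetical area names; none of the figures come from the Hurricane Sandy study, and normalizing by users addresses only one of several possible sources of bias.

```python
# Hypothetical figures for illustration only; not from the Hurricane Sandy study.
areas = {
    "dense_urban_area": {"tweets": 50_000, "active_users": 1_000_000},
    "coastal_town": {"tweets": 4_000, "active_users": 20_000},
}


def rank_by(metric):
    """Return area names sorted from highest to lowest value of the metric."""
    return sorted(areas, key=metric, reverse=True)


# Ranking by raw volume favours the area with the most users.
by_raw_count = rank_by(lambda name: areas[name]["tweets"])

# Ranking by tweets per active user adjusts for how many people tweet there.
by_rate = rank_by(lambda name: areas[name]["tweets"] / areas[name]["active_users"])

print("by raw count:", by_raw_count)
print("by tweets per user:", by_rate)
```

With these invented numbers the dense urban area leads on raw volume, yet the smaller town has the higher rate per user, so the two rankings disagree about where the storm drew the most attention.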
B. Big Bias
“[Big Data] is sometimes seen as a cure-all, as computers were in the 1970s. Chris Anderson… wrote in 2008 that the sheer volume of data would obviate the need for theory, and even the scientific method….
[T]hese views are badly mistaken. The numbers have no way of speaking for themselves. We speak for them…
If the quantity of information is increasing by 2.5 quintillion bytes per day, the amount of useful information almost certainly isn’t. Most of it is just noise, and the noise is increasing faster than the signal.” – Nate Silver, The Signal and the Noise
III. Big Data: Big News or Big Mess?
C. Predictive Analytics Gone Wild
• Perilously Predicting Future Crime and Punishing Future Criminals, à la the movie Minority Report
Parole boards in US using “predictions” from data analysis for parole decisions
City of Memphis, Tennessee uses Blue CRUSH (Crime Reduction Utilizing Statistical History) to concentrate police resources in a specific area at a specific time. (Crime fell by a quarter since CRUSH’s inception in 2006, but was that due to CRUSH???)
US Dept of Homeland Security uses FAST (Future Attribute Screening Technology) to identify potential terrorists (Reportedly 70% accurate???)
IV. Final Words
1. Big Data is Here to Stay
2. The End of Official Statistics? Hardly! BUT…
3. Possible Ways Forward
Need for Legal Protocols and Institutional Arrangements for Access/Ownership
• Public-Private Partnerships / Investments on Data
Addressing Privacy Issues with Big Data
Investments for Capacity Building in the PSS and Partners to Harness Big Data
• Official Statistics community identifying “signals” within “noise”; certifying quality; deciphering truth from falsehood
Big Data and Official Statistics:
Partners in Enabling Public to
“Know in (Real) Time”