
NATIONAL STATISTICAL COORDINATION BOARD


12th National Convention on Statistics

October 1-2, 2013, EDSA Shangri-La Hotel, Mandaluyong City

BIG DATA: Big Opportunity or Big Threat for Official Statistics?*

Jose Ramon G. Albert, Ph.D.

Secretary General, NSCB
Email: jrg.albert@nscb.gov.ph


Outline of the Presentation

I. Introduction: Importance of Official Statistics in Public Policy

II. Big Data is Here!!!

III. Big Data: Big News or Big Mess?

IV. Some Final Words on Big Data and Official Statistics


I. Introduction

Importance for managing economies more effectively

• Inputs to monitor national development plans, roadmaps, targets and international commitments (MDGs, post-MDG agenda)

• “You can’t manage well what you don’t measure”

Credibility: integrity, independence and professionalism

• UN Fundamental Principles of Official Statistics

Critics find official statistics insufficient and call for a “data revolution”


Tried and Tested Methods for Collecting Data

BIG in volume

• surveys and censuses, administrative reporting systems

but not big in frequency (i.e., velocity)

• Despite ICT tools, still not fast enough (due to costs, human resources and reporting processes, including attention to precision and accuracy).


II. BIG DATA is here !!!

Electronic devices (mobile phones, smartphones, tablets, laptops), social media, Google, sensors, tracking devices (GPS)

• 2.5 quintillion (2.5 × 10^18) bytes of data created per day in 2012

• More and more internet users!!! (In PH, 36% of individuals in 2012, up from 2% in 2000)

• More and more mobile subscribers!!! (102 per 100 persons in PH in 2012)

[Charts, 2000–2012, for Cambodia, Indonesia, Lao P.D.R., Malaysia, Myanmar, Philippines, Singapore, Thailand, Timor-Leste and Viet Nam: (a) Mobile-cellular telephone subscriptions per 100 inhabitants; (b) Percentage of individuals using the Internet]
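To give a rough sense of scale, the minimal Python sketch below converts the figure of 2.5 quintillion bytes per day into exabytes per day, terabytes per second and exabytes per year. It assumes decimal (SI) prefixes (1 EB = 10^18 bytes, 1 TB = 10^12 bytes) and takes the daily figure as given.

```python
# Scale of the "2.5 quintillion bytes per day" figure (2012), in decimal SI units.
BYTES_PER_DAY = 2.5e18          # 2.5 quintillion = 2.5 x 10^18 bytes
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day
BYTES_PER_EB = 1e18             # assumption: 1 exabyte = 10^18 bytes
BYTES_PER_TB = 1e12             # assumption: 1 terabyte = 10^12 bytes

exabytes_per_day = BYTES_PER_DAY / BYTES_PER_EB
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / BYTES_PER_TB
exabytes_per_year = exabytes_per_day * 365  # ignores leap years

print(f"{exabytes_per_day:.1f} EB per day")
print(f"{terabytes_per_second:.1f} TB per second")
print(f"{exabytes_per_year:.1f} EB per year")
```

Under these assumptions the figure works out to roughly 29 TB of new data every second.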


Age of gadgets, social media and sensors

Increasing Public Need for “Knowing in (Real) Time”

Health Surveillance: Google Flu Trends (J. Ginsberg et al., Nature, 2009)


Google Dengue Trends


Beyond Health: Google Predicting the Present


Predicting the Present with Google Trends (Choi & Varian, April 2009)


Monitoring Inflation and Traffic: UN Global Pulse reports successes in its Pulse Laboratory in Jakarta in relating mentions of “rice” on Twitter to the actual price of rice (Letouzé, 2012)
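Relating a social-media signal to an official price series typically starts with a correlation. A minimal sketch of that kind of comparison, assuming two aligned monthly series, is the Pearson correlation below. The numbers are invented for illustration (they are not Pulse Lab or actual price data), and the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# HYPOTHETICAL monthly series, for illustration only:
rice_tweets = [120, 135, 150, 160, 190, 210]      # tweets mentioning "rice"
rice_price = [9.8, 10.0, 10.3, 10.4, 10.9, 11.2]  # price per kg, arbitrary units

r = pearson_r(rice_tweets, rice_price)
print(f"r = {r:.3f}")
```

A strong correlation on made-up data shows only that the two series move together; it does not by itself show that tweet volume would keep tracking prices, which is where the bias concerns in Section III come in.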


DATA REVOLUTION in PH

NSO using Google Maps to help redesign the master sample for household surveys

NSCB extensively using the web for dissemination (online articles, Facebook, Twitter, livestream)

DOST’s Project NOAH (Nationwide Operational Assessment of Hazards) helps government minimize climate disaster risks

• 676 deaths in CDO (Cagayan de Oro) due to Sendong in 2011

• 1 death in CDO due to Pablo in 2012


Importance of Information in Planning and Programming, especially Mitigating Risks from New Threats to Development (such as Impact of Climate Change on Climate Disasters)


III. Big Data: Big News or Big Mess?

Official Statistics | Big Data
1. Structured and planned product | 1. Largely unstructured, unfiltered “data exhaust”, i.e., by-product of digital products (transactions, web, social media)
2. Clear methodology and concepts | 2. Poor analytics
3. Regulated | 3. Unregulated
4. Macro-level, but typically based on high-volume primary data | 4. Micro-level; huge volume with high velocity (or frequency) and variety
5. High cost | 5. Generally little or no cost
6. Centralized; point in time |


A. Privacy Issues

• Much of the Big Data being generated includes personal information. Precise, geo-location-based information pushes the boundaries of confidentiality/privacy.

- Amazon, Visa, Mastercard watching our shopping preferences

- Google watching our browsing habits

- Twitter watching what’s on our minds

- Facebook watching various info, including our social relationships

- Mobile providers watching whom we talk to, what we say to them, and even who is nearby


• Examples of Breaches of Confidentiality/Privacy

- In 1943, the US Census Bureau gave the US government block addresses (although not street names and numbers) of Japanese-Americans, which led to their imprisonment during the US-Japan war

- Dutch civil records were used by the Nazis to round up Jews

- Census data were used by BPS Statistics Indonesia to assist the government in drawing up a list of “poor” households


• “Notice and Consent” of Users

- Can users give “informed consent” to an unknown use?

- When Google Flu Trends was developed, did Google have to contact all its users for approval to use old search queries for this project?

- Should users be asked to agree to any possible future use of their data?

- Other ways to protect privacy exist, but they are imperfect:
  - Opting out (but this can leave a trace)
  - Anonymization (but “re-identification” is still possible)
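Why anonymization may not prevent re-identification can be seen with a k-anonymity check, a standard disclosure-control measure (not one named on the slide). If a combination of quasi-identifiers such as birth year, sex and location is unique in a dataset, anyone who already knows those facts about a person can single out that person's record even after names are removed. The minimal sketch below assumes hypothetical records and field names; `equivalence_class_sizes` and `unique_records` are illustrative helpers, not part of any library.

```python
from collections import Counter

# HYPOTHETICAL "anonymized" records: names removed, quasi-identifiers kept.
records = [
    {"birth_year": 1980, "sex": "F", "barangay": "A", "income_decile": 3},
    {"birth_year": 1980, "sex": "F", "barangay": "A", "income_decile": 7},
    {"birth_year": 1975, "sex": "M", "barangay": "B", "income_decile": 5},
    {"birth_year": 1990, "sex": "F", "barangay": "C", "income_decile": 9},
    {"birth_year": 1975, "sex": "M", "barangay": "B", "income_decile": 2},
]

QUASI_IDENTIFIERS = ("birth_year", "sex", "barangay")


def equivalence_class_sizes(rows, keys):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def unique_records(rows, keys):
    """Return records whose quasi-identifier combination occurs only once (k = 1)."""
    sizes = equivalence_class_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


sizes = equivalence_class_sizes(records, QUASI_IDENTIFIERS)
k = min(sizes.values())  # the dataset is k-anonymous for this k
at_risk = unique_records(records, QUASI_IDENTIFIERS)
print(f"k = {k}; {len(at_risk)} record(s) unique on {QUASI_IDENTIFIERS}")
```

Here the 1990-born woman in barangay C is the only record with her combination, so her income decile is exposed to anyone who knows those three facts about her.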


B. Big Bias

• Gains in velocity (and cost) come at the expense of precision and accuracy, i.e., Big Data may not be completely accurate, but it is thought of as “good enough.”

• But how good is “good enough”?

• Recent work suggests that Google Flu Trends overestimated flu levels: it put them at 11% of the US public in the 2012–2013 flu season, almost double the CDC’s estimate of about 6%.
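“Almost double” can be made precise. This minimal sketch computes the ratio and the relative overestimate from the two approximate percentages quoted above; the helper name `overestimate` is hypothetical.

```python
def overestimate(estimate_pct, benchmark_pct):
    """Return (ratio, relative error) of an estimate against a benchmark."""
    ratio = estimate_pct / benchmark_pct
    relative_error = (estimate_pct - benchmark_pct) / benchmark_pct
    return ratio, relative_error


# Approximate figures from the slide: Google Flu Trends ~11%, CDC ~6%.
ratio, rel_err = overestimate(11.0, 6.0)
print(f"ratio = {ratio:.2f}, overestimate = {rel_err:.0%}")
```

With 11% against 6%, the Google Flu Trends figure is about 1.83 times the CDC's, an overestimate of roughly 83%; since both inputs are approximate, so are these results.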


• A study of Twitter and Foursquare data before, during and in the aftermath of Hurricane Sandy (Grinberg et al., 2013) revealed:

- Grocery shopping peaked the night before the storm

- Nightlife picked up the day after

- The greatest number of tweets about Hurricane Sandy came from Manhattan. (This creates the illusion that Manhattan was the hardest-hit area in the US. It wasn’t!)


"[Big Data] is sometimes seen as a cure-all, as computers were in the 1970s. Chris Anderson… wrote in 2008 that the sheer volume of data would obviate the need for theory, and even

the scientific method….

[T]hese views are badly mistaken. The numbers have no way of speaking for themselves. We speak for them….

If the quantity of information is increasing by 2.5 quintillion bytes per day, the amount of useful information almost certainly isn't. Most of it is just noise, and the noise is increasing faster than the signal.” – Nate Silver, The Signal and the Noise


C. Predictive Analytics Gone Wild

• Perilously Predicting Future Crime and Punishing Future Criminals, à la the movie Minority Report

- Parole boards in the US using “predictions” from data analysis in parole decisions

- The City of Memphis, Tennessee uses Blue CRUSH (Crime Reduction Utilizing Statistical History) to concentrate police resources in a specific area at a specific time. (Crime fell by a quarter after CRUSH's inception in 2006, but was that due to CRUSH???)

- The US Department of Homeland Security uses FAST (Future Attribute Screening Technology) to identify potential terrorists (reportedly 70% accurate???)
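A figure like “70% accurate” says little about how many flagged people are real threats when real threats are rare (the base-rate problem). The minimal sketch below applies Bayes' rule under two assumptions that are not from the source: that “70% accurate” means both sensitivity and specificity equal 0.70, and a hypothetical prevalence of one potential terrorist per million people screened. The helper name `positive_predictive_value` is illustrative.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(actual positive | flagged), by Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# ASSUMPTIONS for illustration only (not from the source):
prevalence = 1e-6                  # hypothetical: 1 target per million screened
sensitivity = specificity = 0.70   # one reading of "70% accurate"

ppv = positive_predictive_value(prevalence, sensitivity, specificity)
screened = 1_000_000
# Expected number flagged = true positives + false positives.
flagged = screened * (sensitivity * prevalence + (1 - specificity) * (1 - prevalence))
print(f"PPV = {ppv:.2e}; about {flagged:,.0f} flagged per {screened:,} screened")
```

Under these assumptions roughly 300,000 people would be flagged per million screened, fewer than one of whom would be an actual target.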


IV. Final Words

1. Big Data is Here to Stay

2. The End of Official Statistics? Hardly! BUT…

3. Possible Ways Forward

Need for Legal Protocols and Institutional Arrangements for Access/Ownership

Public-Private Partnerships / Investments in Data

Addressing Privacy Issues with Big Data

Investments in Capacity Building in the PSS (Philippine Statistical System) and Partners to Harness Big Data

Official Statistics community identifying “signals” within “noise”; certifying quality; distinguishing truth from falsehood


Big Data and Official Statistics: Partners in Enabling the Public to “Know in (Real) Time”

/NSCBPhilippines @NSCBPhilippines http://www.nscb.gov.ph info@nscb.gov.ph /NSCBInfo NSCBinfo@gmail.com

