
USE OF GEOSPATIAL AND WEB DATA FOR OECD STATISTICS




(1)


CCSA SPECIAL SESSION ON SHOWCASING BIG DATA

1 OCTOBER 2015

Paul Schreyer

(2)
(3)

OECD:

Facilitator of discussion on new data sources for NSOs

OECD’s own use of new data sources

From Big Data to Smart Data

Not every New data source is Big

(4)

Business value analysis: why are we working on this?

More granularity or coverage of existing data (e.g. spatial disaggregation)

New output (e.g. measuring trust, inequalities)

Greater timeliness – nowcasting

Increased impact – analysis supporting OECD mission, possibility to link areas

Increased responsiveness – capacity to address new topics quickly, respond to what-if questions

(5)

Business process analysis: Necessary capabilities

Capacity to identify, evaluate and access new data sources

Command of methodology

Proven quality and metadata frameworks

Suitable IT infrastructures

Established legal and ethical frameworks

Skills and training capacity

(6)

Approaches: web crawling and web scraping; content analysis; mobility studies; sensor and geospatial data

Examples:

* Online real estate prices (OECD GOV)

* Measuring trade restrictiveness by scraping and analysing trade laws (OECD TAD)

* African Economic Outlook (AEO): Civil tensions and political governance indicators (OECD DEV)

* Big Data Measures of Human Well-Being – Evidence from US Google Index (OECD STD)

* Measure transport reliability from geolocalisation logs (ITF)

* Air quality and land cover data (OECD GOV)

* Enriching the metropolitan database using geo-spatial data (OECD GOV)

* PIAAC log file data (OECD EDU)

(7)

EXAMPLE 1 ENVIRONMENTAL INDICATORS

Using geospatial data (satellite data)

(8)

Average population exposure to air pollution (PM2.5)

Where air pollution is above recommended levels

Where improvements in air quality have happened

Linking air pollution to health

(9)

Source: Raster (satellite observations)

• Raster: van Donkelaar et al. (2014)
• Resolution: ~10 km2

Ground-based stations

Advantages:
• Direct measures
• Offer regular levels of air pollution over time
• More pollutants are available

Disadvantages:
• Low coverage in developing countries
• Uneven coverage within and across countries
• PM2.5 concentration rarely monitored
• Site selection, measurement techniques, and reporting methods differ across regions and countries

Satellite observations

Advantages:
• Global coverage
• Consistent method to compute air pollution in cities, regions and countries
• Consistent time-series data, spanning more than a decade

Disadvantages:
• Modelled data
• Satellite observations are less precise for bright surfaces (snow or desert)
• Current data are on a multi-year average; evaluation of short-term events often unavailable

(10)

1. The satellite-based values of air pollution are multiplied by the population living in the area (using a 1 km2 resolution grid)

2. The exposure to air pollution in a region is given by the sum of the population-weighted values of PM2.5 in the 1 km2 grid cells falling within the boundaries of the region

3. Finally, dividing this aggregated value by the total population in the region, we obtain the average exposure to PM2.5 concentration in a region
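The three steps amount to a population-weighted mean: exposure = sum(PM2.5 x population) / sum(population), taken over the grid cells inside the region. A minimal sketch in Python follows. The function name, the nested-list grid layout and the sample values are assumptions made for illustration only; the slides do not describe the actual OECD tooling or data format.

```python
def average_exposure(pm25, population, in_region):
    """Population-weighted mean PM2.5 over the grid cells inside a region.

    All three arguments are grids (lists of rows) of the same shape:
    pm25 holds concentrations, population holds inhabitants per cell,
    and in_region marks the cells that fall within the region boundary.
    """
    weighted_sum = 0.0
    total_population = 0.0
    for pm_row, pop_row, mask_row in zip(pm25, population, in_region):
        for pm, pop, inside in zip(pm_row, pop_row, mask_row):
            if not inside:
                continue
            # Steps 1 and 2: weight each cell by its population and sum.
            weighted_sum += pm * pop
            total_population += pop
    if total_population == 0:
        return float("nan")  # no inhabitants: exposure is undefined
    # Step 3: divide by the region's total population.
    return weighted_sum / total_population


# Hypothetical 2x2 grid; the bottom-right cell lies outside the region.
pm25 = [[10.0, 20.0], [30.0, 40.0]]
population = [[100, 300], [0, 600]]
in_region = [[True, True], [True, False]]
exposure = average_exposure(pm25, population, in_region)
print(exposure)  # 17.5
```

In this toy grid the unpopulated cell contributes nothing, so the result is (10 x 100 + 20 x 300) / 400 = 17.5.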

(11)

Levels and trends in OECD cities

• 68% of the urban population in OECD countries (376 million people) are exposed to pollution above the WHO’s recommended levels.

• OECD estimates show wide variation in PM2.5 exposure levels across cities within countries, the largest in Mexico, Italy, Japan and Korea

[Figure: PM2.5 exposure by country (number of cities in parentheses), showing metropolitan minimum, country average and metropolitan maximum; axis ticks from -10 to 40. Countries shown: Mexico (33), Italy (11), Japan (36), Korea (10), France (15), United States (70), Poland (8), Spain (8), Germany (24), Sweden (3), United Kingdom (15), Czech Republic (3), Chile (3), Switzerland (3), Canada (9), Netherlands (5), Portugal (2), Greece (2), Belgium (4), Austria (3), Hungary (1), Slovak Republic (1), Slovenia (1), Denmark (1), Finland (1), Estonia (1), Norway (1), Ireland (1).]

(12)

Other example: raster sources used for land cover

• Europe: Corine land cover; resolution 25 metres; years 2000-06; classification of urban land: 44 land cover classes
• USA: National land cover dataset (NLCD); resolution 30 metres; years 2001-06; classification of urban land: 21 land cover classes
• Japan: Japan National Land Service Information data; resolution 100 metres; years 1997-2006; classification of urban land: 11 land cover classes
• World: MODIS 500 Map of Global Urban Extent; resolution 500 metres; year 2008; classification of urban land: 17 land cover classes

(13)

…feeds into the OECD Regional Well-Being Database

Links: Regional Well-Being database; Regional Well-Being web tool

(14)

EXAMPLE 2 TRADE POLICY ANALYSIS

Using qualitative data from government websites

(15)

Basic idea

Traditionally:

• Policy questionnaires to countries

• ‘Manual’ screening of government websites

New:

• Machine-based monitoring of government websites

• Automatic check for changes or addition of rules and regulations

Test case: qualitative information for the OECD’s trade restrictiveness information and index

(16)

Text comparison - Initial discovery

• Run a text comparison between the original document and the new updated document

• Detect and flag specific paragraphs changed or updated inside long documents
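One way to flag changed paragraphs is a sequence diff at paragraph level. The sketch below uses Python's standard `difflib` module. It assumes paragraphs are separated by blank lines, and the sample "articles" are invented for illustration; the slides do not say which comparison tool the OECD pilot used.

```python
import difflib


def changed_paragraphs(old_text, new_text):
    """Return (tag, old_paragraphs, new_paragraphs) for every changed span.

    Paragraphs are assumed to be separated by blank lines. The tag is one of
    'replace', 'delete' or 'insert', as reported by difflib.
    """
    old_pars = [p.strip() for p in old_text.split("\n\n") if p.strip()]
    new_pars = [p.strip() for p in new_text.split("\n\n") if p.strip()]
    matcher = difflib.SequenceMatcher(a=old_pars, b=new_pars, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, old_pars[i1:i2], new_pars[j1:j2]))
    return changes


# Hypothetical regulation text, before and after an update.
old_doc = (
    "Article 1. Foreign equity is capped at 49%.\n\n"
    "Article 2. Licences are renewed yearly."
)
new_doc = (
    "Article 1. Foreign equity is capped at 25%.\n\n"
    "Article 2. Licences are renewed yearly.\n\n"
    "Article 3. A local director is required."
)
changes = changed_paragraphs(old_doc, new_doc)
for tag, before, after in changes:
    print(tag, before, after)
```

Here the first article is reported as replaced and the third as inserted, while the unchanged second article is skipped.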

Text comparison - Advanced discovery

• Changes in rules and regulations can also happen through new pages

• Use ‘big data’ techniques to compare in-house structured information to the universe of laws and regulations in a given country.

• Work on text definitions similar to the original ones to help identify potentially relevant documents.
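Finding documents that resemble an existing definition can be sketched as a similarity ranking. The example below scores candidates by cosine similarity of word counts, which is one simple choice of similarity coefficient and not necessarily the one used in the pilot. The reference text, document names and contents are hypothetical.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lower-cased word frequencies of a text."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine similarity of two texts' word-count vectors (0.0 to 1.0)."""
    a, b = word_counts(text_a), word_counts(text_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(reference, candidates):
    """Sort candidate documents by similarity to the reference, best first."""
    scored = [(cosine_similarity(reference, text), name)
              for name, text in candidates.items()]
    return sorted(scored, reverse=True)


# Hypothetical in-house definition and newly discovered documents.
reference = "foreign equity restrictions in telecommunications services"
candidates = {
    "decree_a": "new restrictions on foreign equity in telecommunications",
    "decree_b": "fishing quotas for the coming season",
}
ranking = rank_candidates(reference, candidates)
print(ranking[0][1])  # decree_a
```

A production version would likely need language-specific tokenisation, stop-word handling and term weighting; plain word counts are used here only to keep the sketch short.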

(17)

• Web-crawling: scripts to systematically scan governmental websites where regulations can be found (federal, provincial, regional, etc.).

• Web-scraping: scripts to extract the relevant information in documents, possibly based on articles and paragraphs (text analysis).

• Document conversion: most laws and regulations are in PDF, and possibly in other formats, that would need to be converted to text documents to run text analysis.

• Text comparison: tools and dictionaries to compare the text of updated documents with the original text and to calculate similarity coefficients with other documents, in a variety of languages, with the option to also use proximity of similar words.
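The crawling and scraping steps can be illustrated on a single page with Python's standard library. In this sketch the links are what a crawler would follow next and the paragraphs are what a scraper would pass on to text analysis. The class name, the URLs and the sample HTML are hypothetical. A real crawler would also need to fetch pages over the network, respect each site's access rules, and convert PDF documents to text, none of which is shown here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageExtractor(HTMLParser):
    """Collect outgoing links and paragraph text from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []        # absolute URLs a crawler could visit next
        self.paragraphs = []   # cleaned text of each <p> element
        self._in_paragraph = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))
        elif tag == "p":
            self._in_paragraph = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_paragraph:
            # Collapse runs of whitespace left over from the markup.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self._buffer.append(data)


# Hypothetical government page.
page = """<html><body>
<p>Article 1. Foreign equity is capped at <b>49%</b>.</p>
<a href="/decrees/2015-10.pdf">Decree 2015-10</a>
<a href="https://other.example/law.html">Related law</a>
</body></html>"""

extractor = PageExtractor("https://gov.example/trade/index.html")
extractor.feed(page)
print(extractor.links)
print(extractor.paragraphs)
```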

(18)

Promising results on French legal texts (Legifrance)

(19)

Significant potential

Use cases and pilots provide really important reality checks

Smart data and multiple sources, not necessarily big data

Initiatives have sprung up in many parts of the OECD

Need to be accompanied by an overall strategy, which is being developed at the OECD

(20)
