IPUMS-International: Harmonizing Big Data for Smart Research

(1)

IPUMS-International:

Harmonizing Big Data for Smart Research

Patricia Kelly Hall

University of Minnesota

Presented at the Microdata Computation Centre (MiCoCe) Workshop

April 29, 2014, Nuremberg, Germany

(2)

IPUMS: Big Data for Smart Research

IPUMS-International Overview

Hazards of “Big Data” research

IPUMS harmonization principles / process

Data & tools for smart research

IF TIME: User statistics for Europe

(3)

IPUMS: Big Data for Smart Research

IPUMS-International Overview

IPUMS harmonization principles / process

Data & tools for smart research

User statistics for Europe

(4)

international.ipums.org

Source data for IPUMS-International are generously provided by participating National Statistical Offices

(5)
(6)

IPUMSI Microdata Availability

(7)
(8)
(9)

IPUMS: Big Data for Smart Research

IPUMS-International Overview

Hazards of “Big Data” research

IPUMS harmonization principles / process

Data & tools for smart research

User statistics for Europe

(17)

Hazards of using Big Data

Big data

= many different types of data

= different population coverage

= different data producers

= different places and times

= different variable names

= different levels of documentation

Big data born of rapid change in technology

Technology ≠ substitute for understanding

(23)

Overcoming the Hazards of using Big Data

Harmonization of data is essential

- universe, variable meaning, codes

Harmonization of microdata AND metadata are both necessary

Tools to facilitate access to metadata

(24)

IPUMS: Big Data for Smart Research

IPUMS-International Overview

Hazards of “Big Data” research

IPUMS harmonization principles / process

Data & tools for smart research

User statistics for Europe

(25)

[Workflow diagram with parallel DATA and METADATA tracks. Steps as labeled on the slide:]

Data files; Data dictionary; Enumeration forms; Enum. instructions; Census/sample design

Reformat data; Donation; Draw sample; Confidentiality A

IPUMS data dictionary; Images to editable files; Translate docs to English

Create source variables; Confidentiality B

Verify data

Tag enumeration text; Document source variables

Harmonize codes; Variable programming; Family pointers

GIS boundary files

Variable descriptions; Sample documentation

(26)

IPUMS follows international standards on microdata confidentiality

EUROSTAT statistical confidentiality standards (Thorogood, 1999): basic framework for IPUMS-International protocols

ECE cites IPUMS as “best practice”: “Entrusting census microdata and metadata for timely integration and dissemination via the IPUMS-EurAsia and IECM initiatives, 2010-2014,” ECE/CES/GE.41/2009/23

Dennis Trewin, ISI Special Task Force on Statistical Confidentiality & Microdata Access, describes IPUMS-International as “...a best practice for a data repository of international statistical data.”

(27)

Suppress low-level geographic identifiers (usually < 20,000 persons).

Swap a small percentage of cases between geographic areas.

For recent censuses: recode cells representing very small numbers of persons in the population.

Suppress categories or entire variables as requested by the NSO.
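
The first two measures on this slide can be illustrated with a minimal sketch. It assumes a toy record layout (a `geo` key per record), a hypothetical swap rate of 1%, and a hypothetical `populations` lookup; only the 20,000-person threshold comes from the slide. It is not the actual IPUMS-International procedure.

```python
import random

THRESHOLD = 20_000  # persons; the threshold quoted on the slide


def suppress_small_geographies(records, populations, threshold=THRESHOLD):
    """Blank the geographic code of records in units below the threshold.

    `populations` maps each geographic code to its total population
    (hypothetical input for illustration).
    """
    out = []
    for rec in records:
        rec = dict(rec)  # copy so the input is left untouched
        if populations.get(rec["geo"], 0) < threshold:
            rec["geo"] = None  # identifier suppressed
        out.append(rec)
    return out


def swap_geographies(records, rate=0.01, seed=0):
    """Swap geographic codes between randomly chosen pairs of records.

    `rate` (share of records involved in a swap) is an assumed value.
    """
    rng = random.Random(seed)
    out = [dict(rec) for rec in records]
    n_swap = int(len(out) * rate) // 2 * 2  # force an even number of records
    chosen = rng.sample(range(len(out)), n_swap)
    for i, j in zip(chosen[0::2], chosen[1::2]):
        out[i]["geo"], out[j]["geo"] = out[j]["geo"], out[i]["geo"]
    return out
```

Swapping leaves the number of records per geographic area unchanged, so area-level counts are preserved while individual records become harder to locate.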

(28)

IPUMS: Big Data for Smart Research

IPUMS-International Overview

Hazards of “Big Data” research

IPUMS harmonization principles / process

Data & tools for smart research

User statistics for Europe

(29)

IPUMS value-added features facilitate research

Available free on-line, easy to use, saves time

Harmonized variables with consistent coding

Time-series potential for most countries

Consistent geographic regions

Constructed family interrelationship variables

Comprehensive interactive online documentation

Pooled data from customizable extract system

SPSS, Stata or SAS syntax files

Great user support

(30)
(31)

IPUMS Customizable Data Extract System

[Diagram: variables such as Home Ownership, Relation to Head, Age, Marital Status, and Occupation feed into a data extract]

(32)

Pooled Data Extracts

1. Select samples: Argentina 2001 (3.6 million), Chile 2002 (1.5 million), Cuba 2002 (1.1 million)

2. Select variables: Water supply, Sex, Education

3. Submit extract

Extract Engine

Result: 1 dataset, 3 censuses, 4 variables (sample, water, sex, education), 6.2 million records, harmonized codes
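
The pooling step on this slide can be sketched as follows. The source codings, the recode table `SOURCE_TO_HARMONIZED`, and the function names are all hypothetical and chosen only to show the idea of recoding country-specific values into one shared scheme before stacking the records; this is not the IPUMS extract engine.

```python
# Hypothetical source codings: each census codes sex differently.
SOURCE_TO_HARMONIZED = {
    "argentina2001": {"sex": {"V": 1, "M": 2}},
    "chile2002": {"sex": {"1": 1, "2": 2}},
    "cuba2002": {"sex": {"H": 1, "F": 2}},
}


def harmonize(sample, record):
    """Recode one source record into the shared coding scheme."""
    recodes = SOURCE_TO_HARMONIZED[sample]
    out = {"sample": sample}
    for var, value in record.items():
        # Values with no recode entry pass through unchanged.
        out[var] = recodes.get(var, {}).get(value, value)
    return out


def pooled_extract(samples, variables):
    """Pool the selected variables from several samples into one dataset."""
    pooled = []
    for sample, records in samples.items():
        for record in records:
            selected = {v: record[v] for v in variables}
            pooled.append(harmonize(sample, selected))
    return pooled


samples = {
    "argentina2001": [{"sex": "V", "age": 31}, {"sex": "M", "age": 8}],
    "chile2002": [{"sex": "2", "age": 54}],
    "cuba2002": [{"sex": "H", "age": 17}],
}
extract = pooled_extract(samples, ["sex"])
```

Each pooled record carries a `sample` identifier alongside the harmonized values, which is why the slide counts four variables for three selected ones.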

(33)

Selected United Nations MDG indicators using IPUMS-I data

Goal 1. Eradicate extreme poverty and hunger
  Indicator (non-official): Youth unemployment rate, aged 15-24, each sex and total
  Target 1.B. Achieve full and productive employment and decent work for all, including women and young people.

Goal 2. Achieve universal primary education
  Indicator 2.1: Net enrollment ratio in primary education
  Indicator 2.2: Proportion of pupils starting grade 1 who reach grade 5
  Indicator 2.3: Literacy rate of 15-24 year-olds
  Target 2. Ensure that, by 2015, children everywhere, boys and girls alike, will be able to complete a full course of primary schooling

Goal 3. Promote gender equality and empower women
  Indicator 3.1A: Ratio of girls to boys in primary, secondary, and tertiary education
  Indicator 3.1B: Ratio of literate women to men, 15-24 years old
  Indicator 3.2: Share of women in wage employment in the non-agricultural sector
  Target 3. Eliminate gender disparity in primary and secondary education preferably by 2005, and in all levels of education no later than 2015

Goal 5. Improve maternal health
  Indicator 5.4: Adolescent birth rate
  Target 5B. Achieve, by 2015, universal access to reproductive health.

Goal 7. Ensure environmental sustainability
  Indicator (non-official): Proportion of the population using solid fuels
  Target 7A. Integrate the principles of sustainable development into country policies and programs and reverse the loss of environmental resources
  Indicator 7.8: Proportion of population with sustainable access to an improved water source, urban and rural
  Indicator 7.9: Proportion of population with access to improved sanitation facility, urban and rural
  Target 7C. Halve, by 2015, the proportion of people without sustainable access to safe drinking water and basic sanitation
  Indicator (non-official): Proportion of households with access to secure tenure
  Target 7D. By 2020, to have achieved a significant improvement in the lives of at least 100 million slum dwellers

Goal 8. Develop a global partnership for development
  Indicator 8.14/8.15: Telephone lines and cellular subscribers per 100 population
  Indicator 8.16: Internet users and personal computers per 100 population
  Target 8F. In cooperation with the private sector, make available the benefits of new technologies, especially information and communications

(34)

Goal 2: Achieve universal primary education

2.3 Literacy rate of 15-24 year-olds: “...percentage of the population 15–24 years old who can both read and write with understanding a short simple statement on everyday life.” (United Nations, 2003)

IPUMS-I operationalization:

IPUMS-I Integrated variables used: AGE & LIT

Formula:

$$\frac{\text{Literate persons } (\mathrm{LIT} = 2) \text{ and ages 15-24 } (\mathrm{AGE} \ge 15 \;\&\; \mathrm{AGE} \le 24)}{\text{Persons ages 15-24 } (\mathrm{AGE} \ge 15 \;\&\; \mathrm{AGE} \le 24)}$$
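
As a minimal sketch, the formula on this slide could be computed from person records as below. The function name, the dict-based record layout, and the optional `weight_key` parameter are assumptions for illustration; the ratio is multiplied by 100 to match the "percentage" wording of the UN definition.

```python
def youth_literacy_rate(persons, weight_key=None):
    """Percent literate among persons aged 15-24.

    Each person is a dict with integer `AGE` and `LIT` values; LIT == 2
    marks a literate person, following the formula on the slide.
    `weight_key` (optional) names a person-weight field; it is an
    assumption added for illustration.
    """
    literate = 0.0
    total = 0.0
    for p in persons:
        if 15 <= p["AGE"] <= 24:
            w = p[weight_key] if weight_key else 1.0
            total += w
            if p["LIT"] == 2:
                literate += w
    if total == 0:
        return None  # no persons in the age range
    return 100.0 * literate / total
```

Persons aged 15-24 whose literacy is unknown stay in the denominator here, as in the formula; an analyst might reasonably choose to exclude them instead.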

(35)

Developing Countries: Literacy rates (ages 15-24) in IPUMS samples

Source: Cuesta and Lovatón (2013)

Data Source: Original census data provided by national statistical offices of partner countries; data harmonized and distributed by IPUMS-International.

(36)

Source: Cuesta and Lovatón (2013)

Data Source: Original census data provided by IBGE, national statistical office of Brazil; data harmonized and distributed by IPUMS-International.

[Figure panels: Census 1991, Census 2000, Census 2010]

(37)

IPUMS: Big Data for Smart Research

IPUMS-International Overview

Hazards of “Big Data” research

IPUMS harmonization principles / process

Data & tools for smart research

User statistics for Europe

(38)
(39)
(40)
(41)

IPUMS User Statistics:

Registered IPUMS users by country of data requested

In the World (n=5,299)

Rank  Country          Users
1     Mexico           1,481
2     Brazil           1,269
3     United States    1,258
4     Colombia           944
5     Argentina          831
6     Chile              746
7     France             740
8     South Africa       737
9     China              720
10    Kenya              704
11    Spain              702
12    Canada             701

In Europe (n=1,673)

Rank  Country          Users
1     France             740
2     Spain              702
3     Greece             561
4     United Kingdom     554
5     Portugal           540
6     Austria            515
7     Romania            495
8     Hungary            443
9     Italy              417
10    Netherlands        391
11    Switzerland        347
12    Belarus            322

(42)

IPUMS User Statistics:

Registered IPUMS users by user country of residence

In the World (n=5,299)

Rank  Country          Users
1     United States    3,256
2     Spain              402
3     France             371
4     United Kingdom     264
5     Germany            230
6     Italy              215
7     Brazil             159
8     Switzerland        149
9     Austria            147
10    Netherlands        133
11    Canada             116
12    China              113

In Europe (n=2,290)

Rank  Country          Users
1     Spain              402
2     France             371
3     United Kingdom     264
4     Germany            230
5     Italy              215
6     Switzerland        149
7     Austria            147
8     Netherlands        133
9     Belgium             59
10    Romania             47
11    Hungary             46
12    Denmark             32

(45)

IPUMS User Statistics:

Top-ranked Institutions by number of users

In the World

1   University of Minnesota (201)
2   United Nations
3   Univ. Autonoma de Barcelona
4   The World Bank
5   University of Chicago
6   Harvard University
7   Panteion Univ. of Athens
8   University of Michigan
9   University of Washington
10  Arizona State University
11  Columbia University
12  Stanford University

In Europe

1   Univ. Auton. de Barcelona (164)
2   Panteion Univ. of Athens
3   INED - France
4   Vienna Institute of Demography
5   Paris School of Economics
6   Universite de Strasbourg
7   City University London
8   MPID - Germany
9   University of Oxford
10  University of Vienna
11  Bocconi University
11  University of Tubingen
13  University of Essex
13  University of Groningen
15  University of Geneva (28)

(46)

IPUMS User Statistics:

No. of extracts of European census samples by year of extract

Year    Europe   France   Ireland   Turkey
2003        84       84
2004        59       59
2005       101      101
2006       174       91
2007       774      123
2008     1,419      183
2009     1,794      197
2010     2,365      269
2011     4,169      562       186
2012     4,932      523       332      179
2013     4,339      367       278      298
Total   20,210    2,559       796      477

(47)

IPUMS User Statistics:

Number of extract requests that include European samples by country

In Europe (n=20,210)

Rank  Country          Requests
1     France              2,717
2     Spain               2,305
3     Greece              1,925
4     Romania             1,680
5     Portugal            1,673
6     Austria             1,622
7     United Kingdom      1,559
8     Hungary             1,311
9     Italy               1,183
10    Switzerland         1,101
11    Netherlands         1,097
12    Belarus             1,032
13    Ireland               918
14    Slovenia              813
15    Germany               809
16    Turkey                654

(48)

IPUMS: Big Data for Smart Research

IPUMS-International Overview

Hazards of “Big Data” research

IPUMS harmonization principles / process

Data & tools for smart research

User statistics for Europe

(49)

Data Tabulator

Fast online analysis of sample data

Pooled samples

(across time and/or country)

International.ipums.org

Click on Analyze Data Online

(50)

Local Value-Added Web Portals

IECM

Barcelona, Spain

http://www.iecm-project.org/

African Integrated Census Microdata

Addis Ababa, Ethiopia

http://ecastats.uneca.org/aicmd/

Economic Research Forum

Cairo, Egypt

Under construction at

http://www.erf.org.eg

(51)

Geography Enhancements /

GIS Boundary Files

Better geographic documentation

Pooling adjacent units

(20,000 person threshold)

Creating integrated geo-statistical units

Collecting boundaries

- Digital and paper maps

Digitizing and code matching to census units

Return all creations to statistical office in
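
One way to picture "pooling adjacent units" against the 20,000-person threshold is the greedy sketch below. It assumes the units arrive as an ordered list in which neighbours in the list are geographically adjacent, which is a strong simplification; the function name and the merging rule are hypothetical and do not describe the procedure IPUMS actually uses.

```python
def pool_adjacent_units(units, threshold=20_000):
    """Greedily merge consecutive units until each group meets the threshold.

    `units` is a list of (name, population) pairs ordered so that neighbours
    in the list are geographically adjacent (a simplifying assumption).
    Returns a list of (member_names, total_population) groups.
    """
    groups = []
    names, total = [], 0
    for name, pop in units:
        names.append(name)
        total += pop
        if total >= threshold:
            groups.append((names, total))
            names, total = [], 0
    if names:  # leftover units still below the threshold
        if groups:
            # Fold the remainder into the last completed group.
            last_names, last_total = groups.pop()
            groups.append((last_names + names, last_total + total))
        else:
            groups.append((names, total))  # whole input is below threshold
    return groups
```

A real implementation would work from boundary files and true adjacency rather than list order.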

(52)

IRDC

IPUMS-International Remote Data Center

Controlled access to anonymized high precision (full count, long-form, etc.) census microdata and metadata

Certified facilities / electronically-secured terminals are used to analyze IRDC data

Servers maintained by MPC

Data are not downloadable; tables or other results undergo review prior to release

(53)
(54)
(55)

Terra Populus Data Domains

Population: Microdata (individuals and households), Areal data

Environment: Land cover, Land use, Climate

(56)

Thank You!

Patricia Kelly Hall
