• No results found

ACCESS METHODS FOR UNITED STATES MICRODATA

N/A
N/A
Protected

Academic year: 2021

Share "ACCESS METHODS FOR UNITED STATES MICRODATA"

Copied!
21
0
0

Loading.... (view fulltext now)

Full text

(1)

ACCESS METHODS FOR UNITED

STATES MICRODATA

Daniel Weinberg, US Census Bureau

John Abowd, US Census Bureau and Cornell U Sandra Rowland, US Census Bureau (retired)

Philip Steel, US Census Bureau Laura Zayatz, US Census Bureau

August 20, 2007

This presentation is intended to inform interested parties of ongoing research and to encourage discussion of work in progress. The views expressed on methodological issues are those of the presenter and not necessarily those of

(2)

Traditional Methods of Data Access

1. Tabulations

ƒ Confidentiality of respondents

protected by limiting the number of cells relative to the number of

observations

ƒ May use complementary

(3)

Traditional Methods of Data Access

2. Public-Use Microdata Samples

ƒ Confidentiality of respondents

protected by omitting some

information and modifying some of the remaining information

ƒ Methods include top- and

bottom-coding, re-categorization, noise

infusion, swapping, and geographic aggregation

(4)

Four recent data access approaches

ƒ Licensing – providing restricted data directly

to individuals or organizations under a confidentiality protection agreement.

ƒ Research data centers – statistical enclaves for

research purposes.

ƒ Remote access – submission of analysis

requests (typically computer programs)

ƒ Synthetic data – data that mirrors the

properties of the collected data yet fully protects the confidential data provided by respondents.

(5)

Licensing

Aspects of the license document :

ƒ Defines the information subject to the

agreement;

ƒ Specifies the individuals who may have

access to subject data;

ƒ Describes limitations of disclosure and

clearance procedures;

ƒ Lists administrative requirements;

ƒ Requires that copies of publications based on

(6)

Licensing, continued

ƒ Requires the organization to contact the

sponsoring agency in case of (suspected) breaches of security;

ƒ Requires the organization to agree to

unannounced and unscheduled inspections; and

ƒ Reviews the security requirements for the

maintenance of, and access to, subject data, and describes penalties for violations.

(7)

Licensing, continued

Examples of U.S. uses:

ƒ National Center for Education Statistics

ƒ Division of Science Resource Statistics

ƒ Bureau of Labor Statistics

(8)

Research Data Centers (Data Enclaves)

ƒ The nine Census Bureau facilities are

partnerships with academic and non-profit organizations (one at HQ).

ƒ Staffed by a Census Bureau employee.

ƒ Meet all physical and computer security

requirements.

ƒ Host Census Bureau and other federal

agency data.

ƒ Researchers become “Special Sworn

(9)

Research Data Centers, continued

Goals:

ƒ Increase the utility and quality of

Census Bureau data products;

ƒ Encourages knowledgeable

researchers to become familiar with an agency’s data products and data collection methods;

ƒ Research can address important

(10)

Research Data Centers, continued

Goals, continued:

ƒ Improves the quality of data

collection and processing practices by subjecting current methods to testing through additional uses;

ƒ Allows for data linking not possible

with aggregates that leverage the value of existing data;

(11)

Census RDC Projects Must Meet

Five Standards

ƒ Benefit to Census Bureau programs

(13 criteria used)

ƒ Scientific merit

ƒ Requires non-public data

ƒ Must be feasible

ƒ No risk of disclosure

(12)

Other Research Data Centers

Examples:

ƒ Bureau of Labor Statistics HQ

ƒ National Center for Health Statistics

HQ

ƒ National Institute of Child Health and

Human Development (3 locations)

ƒ National Opinion Research Center (2

(13)

Remote Access

ƒ Data files usually edited in advance to

reduce the possibility of disclosure.

ƒ Employ automated and manual filters

that block certain kinds of queries and results.

ƒ Must be monitored automatically

and/or manually for disclosure avoidance. A difficult issue is

(14)

Remote Access, continued

Methodologies

ƒ “Remote job execution systems” - an

email interface that allows users to

send programs; processing is usually done in batch mode.

ƒ Web interface with custom-built or

(15)

Remote Access, continued

Examples

ƒ

Luxembourg Income Study

ƒ

Canada, Denmark, the

Netherlands, Sweden, Australia

ƒ

In the U.S.: National Centers for

Education and Health Statistics,

Census Bureau

(16)

Synthetic Microdata

ƒ Has a relatively long history (e.g., U.S.

1990 decennial census, 1989 U.S. Survey of Consumer Finances)

ƒ “Fully synthetic data” - posterior

predictive distribution from a model-based data analysis [Rubin, Fienberg]

ƒ “Partially synthetic data” - synthesizing

either the sensitive values or the

(17)

Synthetic Microdata, continued

Standards to meet:

ƒ

Protect confidentiality at least as

well as other methods.

ƒ

Provide inferences that are

consistent with the inferences an

analyst would have made from the

original data (analytical validity).

(18)

Synthetic Microdata, continued

Major Census Bureau uses (1):

ƒ Survey of Income and Program

Participation linked to longitudinal social security benefit histories

and longitudinal employee-employer earnings reports.

ƒ Cleared for release

ƒ More tests of analytical validity of

(19)

Synthetic Microdata, continued

Major Census Bureau uses (2):

ƒ

Longitudinally integrated

employer-employee data from

the Longitudinal

Employer-Household Dynamics (LEHD)

program in “OnTheMap” Internet

application tabulating

(20)

Synthetic Microdata, continued

Major Census Bureau uses (3,4):

ƒ

Longitudinal Business Database

ƒ “Beta” version released for testing

ƒ

American Community Survey

(21)

Concluding Comment

ƒ Threats to public microdata release

are increasing.

ƒ Statistical agencies must respond to

this threat in order to meet the needs of their users.

ƒ Statistical agencies have responded

- through licensing, research enclaves, remote access, and

References

Related documents

Riders are recommended to bear left at this point and then turn right at the junction with the main road into the centre of the Mortara and the end of the section VF sign,

water above acceptable limits in 10 to 100 years from now. Outside the former Soviet Union, direct and indirect contamination of lakes has caused and is still causing many

Autism, however, is a lifelong condition, with support still required into adulthood, and the type of supports needed may change across the autistic person’s life course

We conducted a cross-sectional survey study of junior and senior students from three US health professions at one institution to compare properties of the RIPLS and the IEPS,

(50us) 0BH Moves the cursor horizontal position to 00H, the vertical positioning is dependent on the currently selected font, allowing for immediate character writing in the

The solution advanced in this paper is to use Open Sound Control messages and native Max/MSP patches and externals to implement objects for dynamic,

Every new type of breathing apparatus and any apparatus which has been modified in any way that may affect the test results, shall be tested in accordance with this NORSOK standard.

In connection with signing a class-wide settlement term sheet with counsel for Plaintiff Sabrina Cardenas, Sony extended the warranties on the KDS- 50A2020, KDS-55A2020,