David S. Fearon
JHU Data Management Services
Jennifer Darragh
Sheridan Library GIS & Data Services Johns Hopkins University
IASSIST ‘15 June 5, 2015
Training for de-identifying human
subjects data for sharing:
a viable library service
©2015 JHU Data Management Services.
Collaborating on a training and
service area
Data Management Planning & consulting
Archiving and Sharing Research Data
Data Management Training Workshops
GIS AND DATA SERVICES
GIS training & software
Statistics software support
Subscription data services
Data discovery
JHU Data Management Services
Data Management Training Topics
Preparing Data Management Plans
Data Management Best Practices
Sharing Data in Spreadsheets
Preparing Data for Archiving
Removing Human Subject Identifiers
• Our most popular training topic
• An underserved topic in JHU’s research support
• Offered a new support service area
[Diagram: research data life cycle, with stages idea / proposal, acquire data, data management during the project, dissemination, archiving, preservation, and data re-use / discovery; service labels: Data Management Planning, Preparing Data for Sharing & Archiving]
“Concierge” service for the
JHU Research Data Life Cycle
Manage Data in JHU Data Archive
IRB(s); Data/GIS Librarians; Subject Librarians; Research Admin; Central IT; IT within Schools; Biostats Center; Data manager group; HPCs; Tech Transfer; Research Conduct; General Counsel; Archives; Institutional Repository
Institutes with focus on security, clinical data, etc. Institutional
Contexts for developing data de-Identification training
Sharing and archiving human subjects data: a role for Research Data Services
IRB: plans for protecting subjects and data acquisition
Storage/backup; Access/security; Data Organization
Researcher must mitigate disclosure risk during research
Johns Hopkins Data Archive
Protecting identifiers is critical for serving
open access data
Ultimately the researcher's
responsibility, but the Archive can provide guidance
https://archive.data.jhu.edu
Compliance with US Funder
Open Access Policies
• Broader emphasis on data sharing
• Encourages efforts at removing identifiers for public access
• NIH policy changes have big impact at Johns Hopkins
Trained by the ICPSR
• 4-day course topics:
– Locating and protecting identifiers in data
– Assessing disclosure risk in shared files
– Techniques for mitigating disclosure risk in public-use files
• ICPSR gave permission to adapt training
• Filtered material into a 1-hour session
• Drew upon additional literature
• Shifting focus from disclosure mitigation by
Developing the training
• Vetted training material with JHU IRB offices
• Developed a supplementary handout
https://jh.box.com/De-IdentificationTips
Disclaimer: We are providing advice; IRB is the final authority on this subject
Removing Identifiers from Human Subject Data
Protecting identifiers in field
Locating identifiers
IRB & consent forms
Removing or masking identifiers
Sharing publicly available datasets
• How to locate & protect personal identifiers
• How to prepare de-identified datasets for sharing
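As a rough illustration of removing or masking identifiers in tabular data, the following Python sketch drops direct-identifier columns and swaps the subject ID for a keyed hash. It is not the method taught in the ICPSR course or the JHU session; the column names, the `deidentify` and `pseudonym` helpers, and the 12-character truncation are all assumptions made for the example, and it does nothing about indirect identifiers such as rare combinations of age, location, and occupation.

```python
import hashlib
import hmac

# Hypothetical column names, chosen only for illustration.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonym(subject_id: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 hash of the subject ID, truncated to 12 hex chars."""
    digest = hmac.new(secret_key, subject_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def deidentify(rows, secret_key, id_column="subject_id"):
    """Drop direct-identifier columns and replace the ID with a pseudonym.

    `rows` is a list of dicts, one per subject record. The input is not
    modified; a new list of cleaned dicts is returned.
    """
    cleaned = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
        kept[id_column] = pseudonym(str(row[id_column]), secret_key)
        cleaned.append(kept)
    return cleaned


if __name__ == "__main__":
    sample = [
        {"subject_id": "A1", "name": "Jane Doe", "email": "jane@example.org", "age": 34},
    ]
    for record in deidentify(sample, secret_key=b"keep-this-key-private"):
        print(record)
```

Using a keyed hash rather than a plain hash means the same subject gets the same pseudonym across files, while someone without the key cannot recompute it from a guessed ID. The key would need to be stored separately from the shared data.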
Audience response to the training
• First session in March 2013
• 12 sessions, ~250 attendees to date
• 2 campus venues at JHU
– Social sciences at JHU Arts & Sciences
– Health Sciences at Schools of Medicine, Nursing, Public Health
• Reaching graduate students, research
Requests for department presentations were an incentive for customized content
Radiology - Removing identifiers from medical images
Education - FERPA guidelines
Expanding disclosure protection support as an area of data services
Interior of George Peabody Library
Contexts:
• Funder Open Access policies for data sharing
• University's push for better data management, compliance and privacy protection
• JHU Data Archive's responsibility to researchers depositing open access data.
• De-identification software tool support
Applications to Assist in De-identifying Human Subjects Research Data
Categories: Unstructured Text; Data in Digital Images; Tabular or Otherwise Structured Data
DICOMCleaner
Freeware? YES
Intended Purpose? Medical images in DICOM (Digital Imaging and Communications in Medicine) format
Specific Data Input Format? DICOM format
Skill Needed: technically proficient; requires time investment to learn
Latest Date on Website: 2015
Support? No explicit support
• Consultation service: guidance to researchers preparing and sharing public access data
• Identifier disclosure analysis as part of JHU's archiving service
Recommendations for adding identifier
disclosure mitigation services
to your research support
Worth considering: expands visibility, relevance, and campus partnerships for research support services
Talk to IRB and compliance offices about gaps in
support, scoped to data services (e.g., data sharing, preservation, security)
Start small with website resources, build materials for training.
If operating a data archive or institutional repository (especially self-deposit), be aware of disclosure risk and consider basic content screening.
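One possible form of basic content screening is a pattern scan over deposited tabular files. The Python sketch below is only an assumption about what such a check might look like, not a description of the JHU Data Archive's process: the `screen_csv` function and the three regular expressions are hypothetical, the patterns are US-centric, and a scan like this flags candidates for human review rather than proving a file is free of identifiers.

```python
import csv
import io
import re

# Illustrative patterns only; a real screening list would be broader.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "us_phone_like": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def screen_csv(text):
    """Scan CSV text and return (line number, column name, pattern name) hits.

    Line numbers count the header as line 1, so the first data row is line 2.
    """
    hits = []
    reader = csv.DictReader(io.StringIO(text))
    for line_number, row in enumerate(reader, start=2):
        for column, value in row.items():
            # Skip missing cells and overflow cells that DictReader groups in a list.
            if not isinstance(value, str):
                continue
            for label, pattern in PATTERNS.items():
                if pattern.search(value):
                    hits.append((line_number, column, label))
    return hits


if __name__ == "__main__":
    sample = (
        "id,notes\n"
        "1,contact jane@example.org\n"
        "2,ssn 123-45-6789\n"
        "3,nothing here\n"
    )
    for hit in screen_csv(sample):
        print(hit)
```

A repository could run a check of this kind at deposit time and route any hits to a curator, who then follows up with the depositing researcher.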
©2015 JHU Data Management Services. Not for distribution or repurposing without permission
Questions?
Dave Fearon: [email protected]
http://dmp.data.jhu.edu/