(1)

FOT-Net Final Event

First FOT-Net Data Workshop

Amsterdam, 18-19 March 2014

Research Data Exchange (RDE) and Safety Pilot Model Deployment Data

Dale Thompson

(2)

Outline

 The Research Data Exchange (RDE)

Mission

Structure

Statistics and Usage

 Featured Data Environment: Safety Pilot Model Deployment (SPMD)

Overview of the SPMD

Hosting the SPMD Data

 Future Data Environment - SHRP 2 Naturalistic Driving Study (NDS)

Overview of SHRP2 NDS

(3)

The RDE: a central transportation data repository for researchers and application developers

 The Data Capture and Management program's mission is to provide a variety of data-related services that support the development, testing, and demonstration of multi-modal transportation mobility applications

 The RDE is a transportation data sharing system that promotes sharing of archived and real-time data from multiple sources and multiple modes

 The RDE provides the ability for users to download data and appropriate documentation, create research projects and collaborate with other

(4)

The RDE employs the concept of a Data Environment to structure the various data sets

 RDE organizes data using a data environment / data set / data file hierarchy

 A Data Environment is a collection of data sets which were obtained under the same test / experiment

 Data Sets represent a logical arrangement of files that convey a central concept or idea about an aspect of a data collection exercise

 Data Sets contain Data Files, which are archived collections of data elements and can be text, zip, binary, or other file types
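
The data environment / data set / data file hierarchy described above can be pictured with a minimal sketch. The class names, fields, and example values below are hypothetical and chosen for illustration only; they are not the RDE's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataFile:
    """An archived collection of data elements (text, zip, binary, ...)."""
    name: str
    file_type: str
    size_bytes: int


@dataclass
class DataSet:
    """A logical arrangement of files about one aspect of a collection exercise."""
    name: str
    files: List[DataFile] = field(default_factory=list)

    def total_bytes(self) -> int:
        # Sum the sizes of all files in this data set.
        return sum(f.size_bytes for f in self.files)


@dataclass
class DataEnvironment:
    """A collection of data sets obtained under the same test or experiment."""
    name: str
    data_sets: List[DataSet] = field(default_factory=list)

    def file_count(self) -> int:
        # Count files across every data set in the environment.
        return sum(len(ds.files) for ds in self.data_sets)


# Hypothetical example values, not real RDE content.
env = DataEnvironment(
    name="Example Environment",
    data_sets=[
        DataSet("trajectories", [
            DataFile("part1.zip", "zip", 1_000),
            DataFile("part2.zip", "zip", 2_000),
        ]),
        DataSet("weather", [DataFile("obs.txt", "text", 500)]),
    ],
)
```

Each level only holds references to the level below it, which mirrors the way a user would browse from an environment to a data set to an individual file.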

(5)

The RDE currently houses 11 Data Environments with data from different locations throughout the US

 States from which data have been collected include California, Florida, Michigan, Minnesota, Oregon,

Virginia, and Washington

 The number of data sets per environment ranges from 2 to 37

 And the number of data files per data set ranges from a few to well over 100

(6)

The Safety Pilot Model Deployment is a recently added data environment to the RDE

 SPMD is an exploration of the real-world effectiveness of connected vehicle safety applications in multi-modal driving conditions

 A one-day sample of SPMD data is captured in this environment

□ This provides users with a snapshot of the output from the implementation of connected vehicle technology

 This environment contains

□ 5 data sets

□ mobility data elements collected from approximately 3000 vehicles

□ weather and infrastructure related data elements

(7)

Hyper-accurate, hyper-frequent data posed a series of challenges in uploading the SPMD data

 Some of the challenges faced in making the SPMD data publicly available include:

□ Data governance

□ Distribution rights

□ Personally identifiable information

□ Size of the data sets and data files

 Understanding the data governance structure amongst involved entities is integral to acquiring data for distribution

 Two of the more directed constraints for data distribution are the inclusion of data that may compromise the initial goal of the exercise, and the

(8)

PII had to be removed from the SPMD data while maintaining meaningfulness of the data

 To protect participants' identities, the RDE team removed all data elements containing PII from the data files

 Data elements that could be paired with other publicly available data were also deleted

 Vehicle trajectories, with points collected at 10 Hz, revealed the identity of participants, therefore

□ Sanitization algorithms were developed to truncate trajectories to mask trip origins and destinations

□ The algorithms were also applied to dependent / related data elements

[Figure: complete trajectories compared with truncated trajectories]
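
The slides do not say how the sanitization algorithms truncate a trajectory. As one possible illustration, the sketch below drops every leading point within a fixed radius of the trip origin and every trailing point within the same radius of the destination, then filters dependent records to the time window that remains. The function names, the `radius_m` parameter, the planar coordinates, and the sample data are all assumptions made for this example; they are not the RDE team's method.

```python
import math
from typing import List, Tuple

# (time_s, x_m, y_m) in a local planar frame; an assumed representation.
Point = Tuple[float, float, float]
Record = Tuple[float, float]  # (time_s, value) for a dependent data element


def truncate_trajectory(points: List[Point], radius_m: float) -> List[Point]:
    """Drop leading points near the origin and trailing points near the destination."""
    if not points:
        return []
    _, ox, oy = points[0]
    _, dx, dy = points[-1]

    # Advance past every leading point closer than radius_m to the origin.
    start = 0
    while start < len(points) and math.hypot(
        points[start][1] - ox, points[start][2] - oy
    ) < radius_m:
        start += 1

    # Walk back past every trailing point closer than radius_m to the destination.
    end = len(points)
    while end > start and math.hypot(
        points[end - 1][1] - dx, points[end - 1][2] - dy
    ) < radius_m:
        end -= 1

    return points[start:end]


def filter_dependent(records: List[Record], kept: List[Point]) -> List[Record]:
    """Keep only dependent records whose timestamps fall inside the kept window."""
    if not kept:
        return []
    t0, t1 = kept[0][0], kept[-1][0]
    return [r for r in records if t0 <= r[0] <= t1]


# Hypothetical 100-second trip sampled at 10 Hz, moving 1 m per sample along x.
track = [(i / 10.0, float(i), 0.0) for i in range(1001)]
kept = truncate_trajectory(track, radius_m=100.0)

# Hypothetical speed readings recorded alongside the trajectory.
speeds = [(0.0, 9.8), (10.0, 10.1), (50.0, 10.0), (95.0, 9.9)]
kept_speeds = filter_dependent(speeds, kept)
```

A trip that never leaves the radius around its origin is removed entirely, which is the conservative outcome for very short trips. A fixed radius is only one design choice; a real sanitizer would also need to consider how recognizable the remaining path is.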

(9)

Connected vehicle data is an emerging area, subject to “Big Data” opportunities and challenges

 The SPMD data environment was structured in 5 data sets, with a total sanitized volume of approximately 24 GB (largest file ~ 10GB) for a 24-hr period

 The original un-sanitized data set was approximately 50GB

 The challenge with working with such large data sets is two-fold

□ Extracting and sanitizing the data is computationally expensive

□ (Large) files had to be carefully broken into more manageable segments for easy download
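
Breaking a large file into download-sized segments can be sketched as follows. This is a minimal byte-level split under assumed names (`split_file`, `segment_bytes`, the `.partNNNN` suffix); the slides do not describe the RDE team's actual procedure, and a record-oriented file would more likely be split on record boundaries.

```python
import os
import tempfile
from typing import List


def split_file(path: str, segment_bytes: int, out_dir: str) -> List[str]:
    """Split the file at path into numbered segments of at most segment_bytes each."""
    if segment_bytes <= 0:
        raise ValueError("segment_bytes must be positive")

    parts: List[str] = []
    base = os.path.basename(path)
    with open(path, "rb") as src:
        index = 0
        while True:
            # Reads one whole segment into memory; fine for a sketch,
            # but a production tool would copy in smaller buffers.
            chunk = src.read(segment_bytes)
            if not chunk:
                break
            part_path = os.path.join(out_dir, f"{base}.part{index:04d}")
            with open(part_path, "wb") as dst:
                dst.write(chunk)
            parts.append(part_path)
            index += 1
    return parts


# Hypothetical demonstration on a small temporary file.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "sample.bin")
    with open(src_path, "wb") as f:
        f.write(b"x" * 2500)
    parts = split_file(src_path, 1000, tmp)
    sizes = [os.path.getsize(p) for p in parts]
    # Concatenating the segments in order restores the original bytes.
    rebuilt = b"".join(open(p, "rb").read() for p in parts)
```

Zero-padded segment numbers keep the parts in the right order when sorted by name, so a user can reassemble them by simple concatenation.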

(10)

The RDE team will continue to post additional data sets while leveraging efforts of similar data sharing entities

 Data sets being pursued for RDE hosting include data from:

□ Dynamic Mobility Applications

□ Applications for the Environment: Real Time Information Synthesis (AERIS)

□ Road Weather Management Program

 Entities the RDE team is looking to partner with, not only to share data but also to share strategies and insights on distributing data:

□ FOT-Net Data

(11)

The RDE team will be adding the SHRP2 Naturalistic Driving Study (NDS) data in the coming months

 Designed to investigate ordinary driving under real-world conditions, with the aim of learning about driver decisions

 Wide-spread demographics of the study’s 3100 participants

 Two year timeframe for extensive data collection

 Wide-spread geography of test sites around the US: Tampa, FL;

Bloomington, IN; Durham, NC; Buffalo, NY; State College, PA;

(12)

There is a wide variety of data available from the study

 Driver Assessment Data: visual perception, medical history, reaction time, driving knowledge, etc.

 Vehicle Data: vehicle make and model, and how vehicle is equipped (with sensors, for example)

 Driving Data: Video images from various perspectives in vehicle, vehicle kinematics, and others such as seat belt use, steering wheel angle, alcohol presence, radar to identify external near field objects

 Crash Data: interview Q&As, police crash reports

 Roadway Data: roadway geometry, speed limit signs, intersection location and

characteristics, etc. (these data are obtained in an effort separate from the collection of the driving data)

(13)

In making NDS data accessible via the RDE, the procedure followed will be informed by that of SPMD

 The challenges faced when distributing SPMD data will also arise when distributing NDS data

 These challenges include:

□ Data governance

□ Distribution rights

□ Personally identifiable information

□ Size of the data sets and data files

 The RDE team will apply lessons learned from posting the SPMD data to the RDE, while remaining cognizant of the nuances of the NDS data that will lead to changes in the developed approach

(14)

RDE Policy Issues

 The RDE is a public-facing, research resource that hosts large volumes of potentially sensitive data from multiple sources

 It required development of policies and procedures in a number of areas typical of other websites:

□ Authorities and membership management

□ Accessibility

□ Terms of use

 To create the RDE, the team also confronted a range of unique policy issues in these areas:

□ Data ownership

□ Data security

(15)

Data Ownership

 Issue: The RDE may host data that has been provided by different sources:

□ Federal contractors

□ State and other public agencies

□ Universities

□ Private individuals or businesses

 Relevant RDE Goal: Foster and support research in transportation operations by a wide variety of stakeholders

 Challenge: Balance rights of various providers (with different institutional structures and needs) against needs for wide access and use

 Response:

□ Sign agreements with each data contributor

□ Offer RDE content to the public under open source license that requires attribution (Creative Commons Attribution-ShareAlike 3.0 Unported)

(16)

Data Security

 Issue: The RDE contains several terabytes of data, scaling up to petabytes.

 Relevant RDE Goals:

□ Offer reliable and cost-effective access to huge data sets

□ Comply with Federal Information Security Management Act (FISMA)

 Challenge: Develop a business model for on-site Departmental hosting or certify external server host

 Response:

□ Launch version 1.0 on website contractor servers

▪ Enforce security training

▪ (Insert additional info on IndraSoft certification here)

□ Transition to FedRAMP-certified cloud-based host (Amazon Web Services or similar)

(17)

Data Privacy

 Issue: The RDE contains GPS traces from vehicles.

 Relevant RDE Goals:

□ Provide maximum research value from available data

□ Protect identity of vehicle users

 Challenge: Develop an approach to reliably de-identify GPS traces

 Response:

□ Launch with GPS traces only from public agency vehicles on agency business

□ Develop processes for:

▪ GPS trace de-identification by minimal truncation

▪ Validation of the de-identification methods

(18)

Near-Term Steps: Data Federation

 Issue: Data federation entails providing access through the RDE to data sets not owned or managed by the RDE Team

 Relevant RDE Goals:

□ Protect data rights of providers

□ Protect privacy of vehicle users in data sets

□ Ensure overall system security

 Challenge: Develop a flexible system of agreements that can be instituted between the RDE Team and federated sites

(19)

Questions

Dale Thompson

USDOT

ITS Joint Program Office

[email protected]

202-493-0259
