
FOT-Net Data Stakeholder Meeting on Open Data and Data Re-use in Horizon 2020

Data Sharing in the US

Ram Kandarpa, Booz Allen Hamilton (under contract to USDOT)

March 10, 2015, Brussels, Belgium

(2)

Topics

Challenge 1: A common platform for data sharing

Challenge 2: Protection of Privacy

Challenge 3: Big Data and Cloud Analytics

Challenge 4: Engagement with RDE Users

(3)

Challenge 1: A common platform for data sharing

An accessible common platform for systematically sharing Field Operational Test (FOT) data is essential for continuous and efficient use of the FOT data in research and development of the multiple applications enabled by such data.

[Figure labels: Data Sharing Platform; Application; Data Capture; Information; Raw Data]

(4)

A platform is needed for collecting and sharing connected vehicle and infrastructure data for research and applications development

[Figure: Real-time Data Capture and Management. Labels: Data Environments; Transit Data; Truck Data; Mobile Devices; Connected Vehicle Applications; Reduce Speed 35 MPH; Weather Application; Transit Signal Priority; Fleet Management/Dynamic Route Guidance]

(5)

The Research Data Exchange (RDE) is the US DOT’s primary repository of publicly available FOT CV-related data

Purpose

– To provide a variety of data-related services that support the development, testing, and demonstration of multi-modal transportation mobility, weather, and environmental applications.

Objectives

– Enables systematic data capture from connected vehicles, mobile devices, and infrastructure

– Provides high quality and well-documented data sets

– Integrates data from multiple sources into data environments

US DOT Program Owner

– The Data Capture and Management (DCM) program within the Intelligent Transportation Systems Joint Program Office (ITS-JPO)

(6)

The RDE currently hosts a mix of data from probes, connected vehicles, infrastructure and contextual sources from nearly a dozen FOTs and demos

Probe Message Data. Actual and simulated vehicle trajectories and probe snapshot messages in SAE J2735 format from tests conducted at the Connected Vehicle Test Bed in Novi, MI in 2008, 2009, and 2010.

Vehicle and Roadside Device Data. Integrated multimodal data from vehicles and roadside sensors from four sites (Seattle, Portland, Pasadena, and San Diego). Data includes light and transit vehicles, incidents, weather, freeway and arterial travel times, and traffic signal data.

Connected Maintenance Vehicles. Real-time streaming and archived onboard (GPS/AVL) data from wirelessly-connected snowplows and maintenance trucks operated by Minnesota DOT.

Basic Safety Messages (BSM) - Orlando. BSM data collected every 0.1 second from transit vehicles at the 2011 World Congress Demonstration in Orlando, FL.

BSM Data - Leesburg. BSM data collected every 0.1 second from a device in a vehicle in the vicinity of Leesburg, VA.
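
To give a feel for what a 0.1-second collection interval means in record counts, the short Python sketch below turns that interval into a message count. Only the 0.1-second interval comes from the slide; the function name, the durations, and the number of devices are illustrative assumptions.

```python
# Only the 0.1 s interval is taken from the slide; everything else is illustrative.
BSM_INTERVAL_S = 0.1


def bsm_count(duration_s: float, devices: int = 1,
              interval_s: float = BSM_INTERVAL_S) -> int:
    """Number of BSMs logged by `devices` devices over `duration_s` seconds."""
    per_device = round(duration_s / interval_s)
    return per_device * devices


# One device logging for one hour (hypothetical duration).
one_hour_single_device = bsm_count(3600)
# Three devices logging for one minute (hypothetical fleet size).
one_minute_three_devices = bsm_count(60, devices=3)
```

At this rate a single device produces 36,000 messages per hour of logging, which is why record counts grow quickly even for short demonstrations.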

(7)

The US DOT plans to expand the offerings of data available on the RDE over the next several years as “CV Pilot” deployments begin

Near Term Additions:

– Additional Data from Safety Pilot Deployment

– Data from 2014 ITS World Congress

Queue Length Data and CV Data

Weather Data Demonstration

Future Additions:

– Dynamic Mobility Applications (DMA) Prototypes

– CV Pilot Deployments

– Operational Data Environments

Links to additional connected vehicle related data

(8)

The US DOT is developing guidance and requirements for systematic provision of data from FOTs to the RDE

Guidance is being developed in a user-friendly question-and-answer (Q&A) format

Guidance will be made available online in the near-term

Major topics covered in the guidance include:

Requirements for providing data to the RDE

(9)

Within the CV FOT guidance, requirements are identified in support of making data available for research uses

Data Requirements: These specify the minimum requirements that the US DOT places on data made available on the RDE.

These requirements are described for each stage of a typical FOT:

 *Note: this is not a comprehensive list; it is for illustrative purposes.

Stage of a Typical CV FOT: Requirements to Consider*

Conceptualization

• Assignment of appropriate FOT POC

Partnership Formation

• Data ownership, permits, share-ability under the Open Data License

Design and Development

• Consideration of PII within the data

• Usage of non-proprietary data formats

Implementation and Operation

• Proper metadata documentation

• Adherence to data quality levels

Evaluation

• Logically structured data files

(10)

Also within the CV FOT guidance, potential data-related issues are identified to assist the FOT conductor

Data Issues related to CV FOTs: The guidance will describe data-related considerations that an FOT conductor/agency may encounter throughout the various stages of the FOT.

These issues are described for each stage of a typical FOT:

*Note: this is not a comprehensive list; it is for illustrative purposes.

Stage of a Typical CV FOT: Potential Issues to Consider*

Conceptualization

• Test goals and objectives

Partnership Formation

• Data collection approach and plan

Design and Development

• Performance measures

Implementation and Operation

• Acquisition of data

(12)

Challenge 2: Protection of Privacy

Protecting the privacy of the individuals partaking in the FOTs is paramount if data from the FOT are to be shared openly or outside of the designated parties.

De-identifying the data and/or seeking written permissions from the test participants are two possible methods for protecting privacy.

(13)

The Detroit City Data Environment features connected vehicle data and video data from a queue length estimation experiment

This demonstration was conducted in downtown Detroit, which is part of the Southeast Michigan Test Bed

Nine (9) vehicles traversed a predefined path that included 12 instrumented intersections

Collected data included:

– Vehicle kinematic data (lat, long, speed, acceleration, etc.)

– Intersection data (signal phase and timing, geo-spatial elements)

– Traveler information message (roadway advisories)

– Queue length (collected by field observer)

– Sample video recording of the demonstration (primarily to support verification of queue length estimates)

(14)

For the Detroit City Data Environment, steps were taken to ensure privacy of individuals within video data

Privacy Issues:

Vehicle data contained no PII due to the constraints of the demonstration and queue length estimation experiment

However, the video recordings contained data that presented privacy concerns

– The audio associated with each video contained conversations between the field observers

– While video resolution did not allow the recognition of license plates, it did facilitate the recognition of some pedestrians as they walked along sidewalks and at crosswalks*

– To remedy this, the videos were processed to remove all audio.

– A filter was added to further degrade the video resolution, making pedestrian features less distinguishable while still allowing users to recognize queue length
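
The slide does not say which filter was used to degrade the video. As one possible illustration, the Python sketch below pixelates a single grayscale frame by block averaging, which removes fine detail (such as pedestrian features) while keeping coarse shapes (such as a line of vehicles). The `Frame` type, the `pixelate` function, and the block size are assumptions made for this example, not the US DOT's actual processing.

```python
from typing import List

# Hypothetical frame representation: rows of 0-255 grayscale pixel values.
Frame = List[List[int]]


def pixelate(frame: Frame, block: int) -> Frame:
    """Replace every block x block tile with the tile's mean intensity."""
    height, width = len(frame), len(frame[0])
    out = [row[:] for row in frame]  # copy so the input frame is untouched
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            total = sum(frame[r][c] for r in rows for c in cols)
            mean = total // (len(rows) * len(cols))
            for r in rows:
                for c in cols:
                    out[r][c] = mean
    return out


# Tiny made-up frame: a 2x2 tile collapses to its average value.
sample = [[0, 100], [100, 200]]
blurred = pixelate(sample, block=2)
```

A larger `block` hides more detail; the trade-off described on the slide is choosing a level that obscures individuals but still shows where the queue ends.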

(15)

The Safety Pilot Model Deployment (SPMD) is a naturalistic driving study primarily to evaluate the efficacy of V2V technologies

SPMD is an exploration of the real-world effectiveness of connected vehicle safety applications in multi-modal driving conditions

This study included approximately 3000 drivers, conducting their day-to-day activities in instrumented vehicles

The hyper-frequent and hyper-local data collected by these vehicles provide tremendous research value but at the same time pose a privacy threat

Privacy concerns arise from the ability to use vehicle position data to identify home, work, child care facilities, etc.

These data may be considered PII and then be used to uncover additional PII

(16)

Before data can be distributed to the public, PII-related data must be removed while maintaining the usefulness of the data

The RDE will host two samples of the SPMD data, a 1-day sample and a 60-day sample, and two different sanitization strategies are needed to rid the data of PII

For the 1-day sample, the sanitization algorithm centered on identifying drivers’ origins and destinations, and truncating the trajectories accordingly

Once origins and destinations had been identified, a series of measures was applied to best mask those locations

The algorithms were also applied to dependent/related data elements to further eliminate the possibility of uncovering PII

[Figure: Complete Trajectories]
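
The slides do not give the details of the sanitization algorithm. The Python sketch below shows only the general idea of truncating a trajectory around its origin and destination, under simplifying assumptions: the origin and destination are taken to be the first and last points of a trip, and every point within a fixed radius of either is dropped. The `Point` layout, the function names, and the radius are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical record layout: (timestamp_s, lat_deg, lon_deg).
Point = Tuple[float, float, float]


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/lon points."""
    radius = 6_371_000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def truncate_trip(trip: List[Point], radius_m: float) -> List[Point]:
    """Drop every point within radius_m of the trip's first or last point."""
    if not trip:
        return []
    _, o_lat, o_lon = trip[0]
    _, d_lat, d_lon = trip[-1]
    return [
        p for p in trip
        if haversine_m(p[1], p[2], o_lat, o_lon) > radius_m
        and haversine_m(p[1], p[2], d_lat, d_lon) > radius_m
    ]


# Made-up trip heading north in steps of 0.001 degrees (about 111 m each).
trip = [(float(i), i * 0.001, 0.0) for i in range(11)]
kept = truncate_trip(trip, radius_m=250.0)
kept_times = [p[0] for p in kept]
```

A fixed radius is the simplest possible masking rule; the slide's mention of "a series of measures" suggests the actual procedure was more elaborate.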

(17)


For the 60-day sample, a more involved algorithm was developed, building on what was previously developed

The updated algorithm focused on more nuanced driver behaviors, beyond what is typical when classifying origins and destinations

This focus was primarily employed to further mask PII which may be obtained through observing driving patterns over time

After applying this algorithm, the output is again a series of truncated trajectories while still maintaining, as best as possible, the usefulness of the data

[Figure: Complete Trajectories]
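
The slides do not describe how driving patterns over time were detected. As a hedged illustration of why a multi-day sample needs extra care, the Python sketch below flags grid cells that appear as trip endpoints on several distinct days, since such recurring locations (for example home or work) could be re-identified even when each single trip looks harmless. The grid-cell approach, the `Endpoint` layout, and the thresholds are assumptions for this example only.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Cell = Tuple[int, int]
# Hypothetical record layout: (day_index, lat_deg, lon_deg) of a trip start or end.
Endpoint = Tuple[int, float, float]


def to_cell(lat: float, lon: float, cell_deg: float) -> Cell:
    """Snap a coordinate to a square grid cell of side cell_deg degrees."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def recurring_cells(endpoints: Iterable[Endpoint], cell_deg: float,
                    min_days: int) -> Set[Cell]:
    """Return cells that contain trip endpoints on at least min_days distinct days."""
    days_seen: Dict[Cell, Set[int]] = defaultdict(set)
    for day, lat, lon in endpoints:
        days_seen[to_cell(lat, lon, cell_deg)].add(day)
    return {cell for cell, days in days_seen.items() if len(days) >= min_days}


# Made-up endpoints: three days ending near the same spot, plus one unrelated stop.
endpoints = [
    (0, 10.0005, 20.0005),
    (1, 10.0006, 20.0004),
    (2, 10.0004, 20.0006),
    (1, 10.0501, 20.0501),
]
sensitive = recurring_cells(endpoints, cell_deg=0.001, min_days=3)
```

Cells flagged this way could then be masked with a larger truncation radius than one-off stops, which is one way the output could remain "a series of truncated trajectories".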

(18)

Video data showing a test participant will soon be made available on the RDE, as part of a road weather warning system demo

A planned data environment for the RDE contains field-simulated road weather data collected during a demonstration at the 2014 ITS World Congress in Detroit

Participants were driven in a specially instrumented demo van which did a short loop around the Belle Isle test track while collecting data from multiple onboard sensors during simulated road weather events

A video camera inside the van collected video footage of the onboard warnings (generated by the simulated weather events) and the host who narrated the events for the participants

The host has granted written consent to the US DOT to allow the video (which includes her likeness) to be made publicly available

(19)

The US DOT has alternative means available for retaining and sharing data that have not been stripped of personally identifiable information

For Connected Vehicle FOT data, the US DOT is using its Saxton Transportation Operations Laboratory at the Turner-Fairbank Highway Research Center (TFHRC) as a secure repository, with access granted to researchers and interested parties on a limited basis

For data from the Naturalistic Driving Study (collected under the Strategic Highway Research Program 2), the US DOT has established the ‘Safety Data Enclave’ at the TFHRC to provide comprehensive data sets on a limited basis to researchers and other interested parties

Some data may lose much of its value if steps are taken to eliminate all personally identifiable information.

There may be sufficient grounds to retain unaltered FOT data for later usage in a controlled (i.e., non-public) setting

(21)

Challenge 3: Big Data and Cloud Analytics

As connected vehicle data are generated in ever greater quantities during FOTs, it is clear new methods beyond archiving and downloading will be needed to effectively share data. Efforts are now underway to migrate the data on the RDE to a cloud-based storage environment with analytical tools co-located.

(22)

The current RDE architecture cannot sustain very large datasets

The paradigm of downloading data files for subsequent local analysis is not sustainable for significant-sized data files

– Two months of the SPMD data exceeds two terabytes (2TB). The time to download this data makes the existing approach unworkable.

– Even if a user were to succeed in downloading a large file, analyzing the data in a meaningful way would require significant time and processing resources
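
The download-time concern can be made concrete with simple arithmetic. The Python sketch below estimates how long a 2 TB transfer takes at a sustained link speed. The 2 TB figure is from the slide; the link speeds are illustrative assumptions, and real transfers are usually slower than the nominal rate.

```python
def download_hours(size_bytes: float, link_mbps: float) -> float:
    """Hours needed to transfer size_bytes at a sustained link_mbps (megabits/s)."""
    bits = size_bytes * 8
    seconds = bits / (link_mbps * 1_000_000)
    return seconds / 3600


TWO_TB = 2 * 10**12  # "two terabytes" from the slide, in decimal bytes

# Hypothetical sustained link speeds in Mbit/s.
hours_at_100 = download_hours(TWO_TB, 100)
hours_at_10 = download_hours(TWO_TB, 10)
```

Even at a sustained 100 Mbit/s the transfer takes about 44 hours, and at 10 Mbit/s it takes about 18.5 days, before any analysis has started.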

The obvious solution is to provide a cloud-based RDE architecture, and avoid file downloads by supporting analytical tools that are co-located with the data

– Users could perform their analyses in the cloud environment, or

– Users could filter the large datasets into smaller, more manageable files that could be downloaded for further analysis

Open source analytical tools, such as ‘R’, can easily be made available in cloud computing environments, and allow for complex data queries and processing operations
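
The "filter, then download" option can be sketched as a streaming filter that runs next to the data and writes only the rows a user asks for. The slide names R as an example tool; the sketch below uses Python purely for illustration. The column names (`lat`, `lon`), the bounding-box filter, and the `filter_bbox` function are assumptions, not part of the RDE.

```python
import csv
import io
from typing import TextIO


def filter_bbox(src: TextIO, dst: TextIO, south: float, north: float,
                west: float, east: float) -> int:
    """Copy rows whose lat/lon fall inside the box from src to dst.

    Rows are processed one at a time, so memory use stays flat regardless
    of the size of the source file. Returns the number of rows kept.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in reader:
        lat, lon = float(row["lat"]), float(row["lon"])
        if south <= lat <= north and west <= lon <= east:
            writer.writerow(row)
            kept += 1
    return kept


# Made-up in-memory data standing in for a large hosted file.
source = io.StringIO("time,lat,lon,speed\n0,1.0,1.0,5\n1,5.0,5.0,6\n2,1.5,1.5,7\n")
subset = io.StringIO()
rows_kept = filter_bbox(source, subset, south=0, north=2, west=0, east=2)
subset_lines = subset.getvalue().splitlines()
```

Only the filtered subset would then need to leave the cloud environment, which is the point of co-locating the tools with the data.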

(23)

A cloud-based RDE solution will address size and processing constraints

Cloud-based resources are “elastic”

– Users can adjust the storage and processing resources they need in real time

A user can expand storage and CPU power to query/integrate large data sets

The user can then “turn off” the resources when they are not needed

– US DOT is investigating various economic models of how to store the data and how users would access the data

For example, one approach would require a user to establish a cloud-provider “account” in order to conduct their desired analyses

Cloud-based resources would be secure

– Any cloud RDE solution would require the Federal Risk and Authorization Management Program (FedRAMP) approval

The RDE team is in the process of developing detailed access requirements and migration plans for a cloud-based solution, targeted for the Version 3 Release later this year

(25)

Challenge 4: Engagement with RDE Users

It is important to engage with the RDE users to educate them on the contents and capabilities of the RDE, and to seek their feedback to continuously improve the RDE to meet evolving needs of the research and development community.

(26)

The US DOT has hosted RDE and DCM program workshops to personally engage with targeted users

The US DOT’s Data Capture and Management (DCM) Program held two all-day workshops on March 26-27, 2014 to engage key Program stakeholders in a dialogue regarding the following topics:

– RDE’s features/capabilities

– RDE content (e.g., data quality and availability of metadata)

– Data policy

– Data management practices

– Stakeholder engagement

– Long-term visioning for the DCM Program and the RDE

The workshops consisted of a series of short presentations from the DCM Program’s federal and contractor support staff, supplemented by a variety of break-out group brainstorming discussions and other facilitated exercises

Notes and recommendations from the workshops were consolidated into a report and used to inform future DCM Program priorities and activities

(27)

Measuring usage of the RDE is done through a variety of means

The following four-tiered approach has been applied to measure and evaluate ongoing RDE usage:

– Stakeholder Interviews: Formal, direct outreach and discussion with key RDE stakeholders, leveraging standardized interview protocols

– Surveys: Written data gathering efforts delivered via the RDE website and Survey Monkey to select individual stakeholders

– Website Assessment: Statistical review of the RDE website analytics, focusing on quantitative data such as number of user logins and downloads

– Data Analysis: Review of RDE data to identify gaps, inconsistencies, and trends in stakeholder data needs

The findings from each of these measurement approaches are shared with DCM Program leadership in the form of an executive dashboard and quarterly reports

(28)

Quantitative analyses are performed based on usage of the RDE, serving as an indication for engagement of the user community

The following represent some of the quantitative RDE analytics that are captured and reported

(29)

Other planned or ongoing means of user outreach

Ongoing feedback is currently being solicited via the RDE Feedback Form, available to all registered RDE account holders through the RDE website under the “About” tab

The next RDE User Satisfaction survey will be deployed in the winter 2015 timeframe

Additional comments, questions, suggestions, and concerns can be submitted to

(30)

Sharing and re-using FOT data involves closely related and highly collaborative efforts

[Figure: Data Sharing and Re-Use. Labels: Guidance and Requirements; Collection and Management; Uploading and; Measurement and Evaluation]

(31)

To enhance and expand data re-use, the DCM Program welcomes your ideas and suggestions

The DCM Program has historically been charged with providing research-ready CV data for purposes of research and application development.

As CV technology moves from the R&D stage into a deployment and operations stage over the next 5 years, the DCM program will evolve into the broader ‘Connected Data Systems’ (CDS) program

– Focus will be on ways of obtaining and providing data to real-time connected vehicle applications that support traffic management and operations.

Your input is welcome regarding how the CDS program can best make data available for both research and operational purposes.
