P1752 Metadata Subgroup Group Meeting

(1)

Sponsored by IEEE Engineering in Medicine & Biology (EMB) Standards Committee


5 November 2019 Teleconference

(2)

Members/Attendance

Subgroup chair: Ida Sim, Open mHealth / UCSF
Subgroup secretary: Anand Nandugudi, U Memphis

Call out your name in the following order if you’re here (so we can get familiar with your voice):

• Pradeep Balachandran
• Jakob Bardram
• Daniela Brunner
• Simona Carini
• Paul Harris
• Shivayogi Hiremath
• Sean McConnell
• Leonard Njeru Njiru
• Henry Ogoe
• Paul Petronelli
• Udi Rubin
• Anna T
• Vishnu Ravi

(3)

(4)

Metadata Landscape

[Figure: Metadata Landscape diagram. Recoverable labels:
• Datapoint structure: { Header/min metadata } { Body: datapoint }
• Data sources: data stream(s), firmware/algorithm; IoT smart thermometer, IoT motion sensor, smartphone, wrist sensor, ingestible sensor, patch
• Systems: SMART-on-FHIR environment, mDATA app, Electronic Health Record, Digital Health Software Tool, Clinical Trial Management System
• Cloud or Edge: runtime acquisition and privacy metadata
• Documents: data and metadata standards, Software Datasheet, Dataset & ML Details, Product Datasheet, Hardware Datasheet (UDI), Personal Datasheet, Privacy Policy, Study Protocol, Data Sharing Policy]

(5)

Action Items

(6)

Action Items from October 15

• Globally unique DatapointID – Sean, Jakob

• Review next iteration of data absence – Ida/Anand

• Deferred for now
  • Review handling of privacy metadata
  • Use AMA Blood Pressure use case as driving example after metadata minimum stabilized

(7)

(8)

Datapoint versus Datapoint series: UUIDs?

• Schema can be used for instances of arrays of observations (i.e., a series), not only a single datapoint
• Metadata must be identical for every datapoint in the series
• Assign 128-bit UUID under RFC 4122 namespace?
  • Collision risk is negligible: for random (version 4) UUIDs, there is roughly a 1 in a billion chance of any duplicate after generating about 103 trillion ids
• Is a UUID assigned to the Datapoint or to each observation in the Datapoint series?
• JSON arrays are ordered

[Diagram labels: Metadata, Schema, Datapoint; Metadata, Schema, Datapoint series]
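As a minimal sketch of the two options in the question above, the snippet below uses Python's standard `uuid` module. It assigns one random (version 4) UUID to the series and, optionally, derives a name-based (version 5) UUID for each observation from the series id and the observation's array position. The helper names and the idea of deriving observation ids this way are illustrative assumptions, not a decision of the subgroup.

```python
import uuid


def new_series_id() -> uuid.UUID:
    """Random (version 4) UUID, as defined in RFC 4122."""
    return uuid.uuid4()


def observation_id(series_id: uuid.UUID, index: int) -> uuid.UUID:
    """Name-based (version 5) UUID derived from the series id and the
    observation's position. Hypothetical scheme: it relies on the JSON
    array order being stable."""
    return uuid.uuid5(series_id, str(index))


series_id = new_series_id()
header = {"id": str(series_id)}  # option 1: one id per datapoint (series)
obs_ids = [str(observation_id(series_id, i)) for i in range(3)]  # option 2

print(header["id"])
print(obs_ids)
```

Deriving observation ids from the series id keeps them reproducible without storing them, at the cost of breaking if observations are reordered or removed from the array.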

(9)

(10)

Data Absence

• Absence is no data value of any sort available even though a value was expected, e.g.,
  • regular sampling of a data series
  • data collection was intended (e.g., per protocol)
• If data of any sort is available, then data is not absent
  • e.g., person did not respond to EMA when triggered: the response is absent, but there is a time stamp of initiation so the EMA can be labeled “missed”
  • data may be of poor quality, “poor” being with respect to a specific use
    • Insufficient data
    • Unusable data, e.g.,
      • Motion above a threshold
      • Sensor not worn
  • data may be obfuscated for privacy or other reasons and may or may not be labelled as such

(11)

Reasons for Data Absence

• Potential reasons include
  • Device failure (battery, software/hardware failure)
  • Buffer full if onboard storage
  • Break in the communication link
  • Privacy enabled for a time period (the only reason we may know for certain)
  • Deliberate tampering
• Example value set
  • See FHIR Observation Resource dataAbsentReason value set

(12)

Four Scenarios

Datapoint series is acquired on a regular basis

1. Algorithm or system that uses the data could infer that data is absent if sampling rate is known (see later examples)

Datapoint series is sampled on an irregular basis

2. With known initiation (e.g., trigger, per protocol)
  • Datapoint will not have an observation value, but will have a time stamp of initiation and possibly a label (e.g., “missed”)
3. Without known initiation (i.e., ad hoc data)
  • No datapoint at all. Absence discoverable only by interrogating system.
4. Datapoint has/had observation value but has been deemed “unusable” or “private” somewhere along processing chain. Value may be reported as “no value”, etc.

(13)

Proposal for Metadata Minimum

• Objective is to support identification of absent data
  • Able only to support identification of absence of regularly acquired data
    • via acquisition_rate in Acquisition metadata, used only for datapoint series
  • Absence of irregularly acquired data is discoverable through system interrogation
• Will not represent absent data, i.e., we will not have datapoints with metadata but no data
• Will not capture reason for data absence or removal of data due to lack of usability
  • Non-trivial to diagnose, no one taxonomy meets all needs
  • If reason was due to privacy withhold, knowing that is a privacy leak

(14)

Data Absence – cStress Example

Datapoint series without sampling/acquisition rate

{
  "header": {
    "id": "123e4567-e89b-12d3-a456-426655440000",
    [...]
  },
  "body": {
    "stress_values": [
      {
        "probability": 0.75,
        "effective_time_frame": {
          "time_interval": {
            "start_date_time": "2019-08-01T07:00:00Z",
            "end_date_time": "2019-08-01T07:00:59Z"
          }
        }
      },
      {
        "probability": 0.85,
        "effective_time_frame": {
          "time_interval": {
            "start_date_time": "2019-08-01T07:01:00Z",
            "end_date_time": "2019-08-01T07:01:59Z"
          }
        }
      },
      {
        "probability": 0.80,
        "effective_time_frame": {
          "time_interval": {
            "start_date_time": "2019-08-01T07:03:00Z",
            "end_date_time": "2019-08-01T07:03:59Z"
          }
        }
      }
    ]
  }
}

•  Explicit about start time

•  Explicit about duration of effective time

•  But can’t represent that a datapoint in a series should be there but isn’t

The value for "2019-08-01T07:02:00Z" to "2019-08-01T07:02:59Z" is missing.

(15)

Data Absence – cStress Example

With "sampling/acquisition rate" and offset

{
  "header": {
    "id": "123e4567-e89b-12d3-a456-426655440000",
    "acquisition_rate": {
      "value": 1/60,
      "unit": "hz"
    },
    [...]
  },
  "body": {
    "stress_values": [
      {
        "offset": 0,
        "probability": 0.75
      },
      {
        "offset": 1,
        "probability": 0.85
      },
      {
        "offset": 3,
        "probability": 0.80
      }
    ],
    "effective_time_frame": {
      "time_interval": {
        "start_date_time": "2019-08-01T07:00:00Z",
        "end_date_time": "2019-08-01T07:03:59Z"
      }
    }
  }
}

•  Offsets imply exactly regular data acquisition, which may not be the case

•  Doesn’t say that each stress value is effective over 1 minute

•  Effective time frame of the whole series can be difficult to define (e.g., what if the last value is absent? What if half of the values are absent?)

The value for "2019-08-01T07:02:00Z" to "2019-08-01T07:02:59Z" is missing.
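To make the offset-based variant concrete, here is a minimal sketch of how a consumer might recover the missing slot from the offsets, the series-level effective time frame, and the acquisition rate. It assumes exactly regular acquisition (the limitation noted above) and that offsets count whole acquisition periods from the series start; `parse_utc` and `missing_slots` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone
from fractions import Fraction

FMT = "%Y-%m-%dT%H:%M:%SZ"


def parse_utc(text: str) -> datetime:
    """Parse a UTC timestamp such as 2019-08-01T07:00:00Z."""
    return datetime.strptime(text, FMT).replace(tzinfo=timezone.utc)


def missing_slots(offsets, start, end, rate_hz):
    """Return (offset, start_date_time) for each expected slot with no value."""
    period = timedelta(seconds=float(1 / Fraction(rate_hz)))
    expected = (end - start) // period + 1  # slots covered by the time frame
    present = set(offsets)
    return [
        (i, (start + i * period).strftime(FMT))
        for i in range(expected)
        if i not in present
    ]


gaps = missing_slots(
    offsets=[0, 1, 3],
    start=parse_utc("2019-08-01T07:00:00Z"),
    end=parse_utc("2019-08-01T07:03:59Z"),
    rate_hz=Fraction(1, 60),  # one value per minute
)
print(gaps)  # [(2, '2019-08-01T07:02:00Z')]
```

The result depends entirely on the series-level time frame being correct, which is the weakness called out in the bullets above: if the last value is absent, the end time itself becomes ambiguous.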

(16)

Data Absence – Proposed Approach

With sampling/acquisition rate and no offsets

{
  "header": {
    "id": "123e4567-e89b-12d3-a456-426655440000",
    "acquisition_rate": {
      "value": 1/60,
      "unit": "hz"
    },
    [...]
  },
  "body": {
    "stress_values": [
      {
        "probability": 0.75,
        "start_date_time": "2019-08-01T07:00:00Z",
        "duration": {
          "value": 1,
          "unit": "min"
        }
      },
      {
        "probability": 0.85,
        "start_date_time": "2019-08-01T07:01:00Z",
        "duration": {
          "value": 1,
          "unit": "min"
        }
      },
      {
        "probability": 0.80,
        "start_date_time": "2019-08-01T07:03:00Z",
        "duration": {
          "value": 1,
          "unit": "min"
        }
      }
    ]
  }
}

•  Explicit about start time

•  Explicit about duration of effective time

    •  Can be represented using duration or start and end times

•  Given the expected acquisition rate, it can be inferred that a value is missing

The value for "2019-08-01T07:02:00Z" to "2019-08-01T07:02:59Z" is missing.
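The inference described in the last bullet can be sketched as follows. This is a minimal illustration, not part of the proposal: it walks consecutive observations, and whenever the next start time is later than expected by more than half a period, it reports the intervening slots as missing. The half-period tolerance and the function names are assumptions.

```python
from datetime import datetime, timedelta, timezone
from fractions import Fraction

FMT = "%Y-%m-%dT%H:%M:%SZ"


def parse_utc(text: str) -> datetime:
    """Parse a UTC timestamp such as 2019-08-01T07:00:00Z."""
    return datetime.strptime(text, FMT).replace(tzinfo=timezone.utc)


def infer_missing(stress_values, rate_hz):
    """Return start times of slots expected between observations but absent."""
    period = timedelta(seconds=float(1 / Fraction(rate_hz)))
    tolerance = period / 2  # assumed slack for slightly irregular acquisition
    starts = sorted(parse_utc(v["start_date_time"]) for v in stress_values)
    missing = []
    for earlier, later in zip(starts, starts[1:]):
        expected = earlier + period
        while later - expected > tolerance:
            missing.append(expected.strftime(FMT))
            expected += period
    return missing


stress_values = [
    {"probability": 0.75, "start_date_time": "2019-08-01T07:00:00Z"},
    {"probability": 0.85, "start_date_time": "2019-08-01T07:01:00Z"},
    {"probability": 0.80, "start_date_time": "2019-08-01T07:03:00Z"},
]
missing = infer_missing(stress_values, Fraction(1, 60))
print(missing)  # ['2019-08-01T07:02:00Z']
```

A gap can only be inferred between two observed values; absence before the first or after the last observation in the series is not visible from the datapoint alone.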

(17)

Minimum Metadata: Proposal

(18)

Metadata Elements: Datapoint

| Needs | Property (bold = required) | Example |
|---|---|---|
| Which datapoint is this? | UUID (datapoint, datapoint series?) | Generate using RFC 4122 approach |
| What does this value represent? | schema ID and schema metadata | Pointer to the stress datapoint schema |

(19)

Metadata Elements: Acquisition

| Needs | Properties (bold = required) | Example |
|---|---|---|
| When was this datapoint first created at the (sensor) source? Recorded or packaged time. | source_creation_datetime: date-time schema represents a point in time (ISO 8601). Timezone is UTC unless otherwise specified | 2019-08-01T07:01:00Z |
| Was the datapoint sensed or self-reported? | modality | sensed |

If data was acquired with a periodic

(20)

Metadata Elements: Source

| Needs | Properties (bold = required) | Example |
|---|---|---|
| What firmware/algorithm? What hardware? What app/product? Which person? Which study? | Pointer(s) to Software Datasheet, Hardware Datasheet (UDI), Product Datasheet, Personal Datasheet (User ID), Study Datasheet (Study ID) | Datasheet type {software, hardware, product, personal, study}; Pointer: PURL etc. |
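To show how the Datapoint, Acquisition, and Source elements listed above might fit together, here is a rough sketch that assembles a header as a Python dictionary. The key names `schema_id`, `source`, `datasheet_type`, and `pointer`, the modality value `self-reported`, and the example.org URLs are all illustrative assumptions; the draft schema had not been written at the time of this meeting.

```python
import uuid
from datetime import datetime, timezone

# Assumed value sets, taken from the wording of the slides.
MODALITIES = {"sensed", "self-reported"}
DATASHEET_TYPES = {"software", "hardware", "product", "personal", "study"}


def build_header(schema_id, created, modality, sources, acquisition_rate=None):
    """Assemble a hypothetical minimum-metadata header.

    `created` must be a timezone-aware datetime; it is written out in UTC.
    """
    if modality not in MODALITIES:
        raise ValueError(f"unknown modality: {modality}")
    for source in sources:
        if source["datasheet_type"] not in DATASHEET_TYPES:
            raise ValueError(f"unknown datasheet type: {source['datasheet_type']}")
    header = {
        "id": str(uuid.uuid4()),  # RFC 4122 version 4 UUID
        "schema_id": schema_id,
        "source_creation_datetime": created.astimezone(timezone.utc).strftime(
            "%Y-%m-%dT%H:%M:%SZ"
        ),
        "modality": modality,
        "source": list(sources),
    }
    if acquisition_rate is not None:  # only meaningful for datapoint series
        header["acquisition_rate"] = acquisition_rate
    return header


header = build_header(
    schema_id="https://example.org/schemas/stress",  # placeholder pointer
    created=datetime(2019, 8, 1, 7, 1, tzinfo=timezone.utc),
    modality="sensed",
    sources=[{"datasheet_type": "hardware", "pointer": "https://example.org/udi/123"}],
    acquisition_rate={"value": 1 / 60, "unit": "hz"},
)
print(header["source_creation_datetime"])  # 2019-08-01T07:01:00Z
```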

(21)

(22)

Outstanding Items

• Datapoint UUID – Sean, Jakob

• Source_creation_datetime – Paul P

• Draft metadata schema and other examples

• AMA Blood Pressure use case to test

(23)

(24)

Upcoming Meetings

• Metadata WG

(25)

P1752 Metadata Subgroup Group Meeting

Sponsored by IEEE Engineering in Medicine & Biology (EMB) Standards Committee