Sponsored by IEEE Engineering in Medicine &
Biology (EMB) Standards Committee
P1752 Metadata Subgroup Group Meeting
5 November 2019 Teleconference
Members/Attendance
Subgroup chair: Ida Sim, Open mHealth / UCSF
Subgroup secretary: Anand Nandugudi, U Memphis
Call out your name in the following order if you’re here (so we can get familiar with your voice):
Pradeep Balachandran, Jakob Bardram, Daniela Brunner, Simona Carini, Paul Harris, Shivayogi Hiremath, Sean McConnell, Leonard Njeru Njiru, Henry Ogoe, Paul Petronelli, Udi Rubin, Anna T, Vishnu Ravi
Metadata Landscape
[Figure: Metadata Landscape. Diagram labels: { Header/min metadata } { Body: datapoint }; Data stream(s); Firmware/algorithm; SMART-on-FHIR environment; mDATA app; Electronic Health Record; Digital Health Software Tool; IoT Smart Thermometer; IoT Motion Sensor; Smartphone; Wrist sensor; Ingestible Sensor; Patch; data and metadata standards; Software Datasheet; Dataset & ML Details; Cloud or Edge: Runtime acquisition and privacy metadata; Clinical Trial Management System; Product Datasheet; Privacy Policy; Hardware Datasheet; UDI; Personal Datasheet; Study Protocol; Data Sharing Policy]
Action Items
Action Items from October 15
• Globally unique DatapointID – Sean, Jakob
• Review next iteration of data absence – Ida/Anand
• Deferred for now
  • Review handling of privacy metadata
  • Use AMA Blood Pressure use case as driving example after metadata minimum stabilized
Datapoint versus Datapoint series: UUIDs?
• Schema can be used for instances of arrays of observations (i.e., a series), not only a single datapoint
• Metadata must be identical for every data point in the series
• Assign 128-bit UUID under RFC 4122 namespace?
  • error rate ~1 in a billion for non-unique ids
• Is a UUID assigned to the Datapoint or each observation in the Datapoint series?
[Figure: Datapoint (Metadata, Schema) versus Datapoint series (Metadata, Schema). Note: JSON arrays are ordered.]
Data Absence
• Absence is no data value of any sort available even though a value was expected, e.g.,
  • regular sampling of a data series
  • data collection was intended (e.g., per protocol)
• If data of any sort is available, then data is not absent
  • e.g., person did not respond to EMA when triggered: the response is absent, but there is a time stamp of initiation so the EMA can be labeled “missed”
  • data may be of poor quality, “poor” being with respect to a specific use
    • Insufficient data
    • Unusable data, e.g.,
      • Motion above a threshold
      • Sensor not worn
  • data may be obfuscated for privacy or other reasons and may or may not be labelled as such
Reasons for Data Absence
• Potential reasons include
  • Device failure (battery, software/hardware failure)
  • Buffer full if onboard storage
  • Break in the communication link
  • Privacy enabled for a time period – only reason we may know for certain
  • Deliberate tampering
• Example value set
  • See FHIR Observation Resource dataAbsentReason value set
Four Scenarios
Datapoint series is acquired on a regular basis
1. Algorithm or system that uses the data could infer that data is absent if sampling rate is known (see later examples)
Datapoint series is sampled on an irregular basis
2. With known initiation (e.g., trigger, per protocol)
  • datapoint will not have an observation value, but will have a time stamp of initiation + possibly a label (e.g., “missed”)
3. Without known initiation (i.e., ad hoc data)
  • No datapoint at all. Absence discoverable only by interrogating system.
4. Datapoint has/had observation value but has been deemed “unusable” or “private” somewhere along processing chain. Value may be reported as “no value”, etc.
Proposal for Metadata Minimum
• Objective is to support identification of absent data
  • Able only to support identification of absence of regularly acquired data
    • via acquisition_rate in Acquisition metadata, used only for datapoint series
  • Absence of irregularly acquired data is discoverable through system interrogation
• Will not represent absent data, i.e., we will not have datapoints with metadata but no data
• Will not capture reason for data absence or removal of data due to lack of usability
  • Non-trivial to diagnose, no one taxonomy meets all needs
  • If reason was due to privacy withhold, knowing that is a privacy leak
Data Absence – cStress Example
Datapoint series without sampling/acquisition rate

{
  "header": {
    "id": "123e4567-e89b-12d3-a456-426655440000",
    [...]
  },
  "body": {
    "stress_values": [
      {
        "probability": 0.75,
        "effective_time_frame": {
          "time_interval": {
            "start_date_time": "2019-08-01T07:00:00Z",
            "end_date_time": "2019-08-01T07:00:59Z"
          }
        }
      },
      {
        "probability": 0.85,
        "effective_time_frame": {
          "time_interval": {
            "start_date_time": "2019-08-01T07:01:00Z",
            "end_date_time": "2019-08-01T07:01:59Z"
          }
        }
      },
      {
        "probability": 0.80,
        "effective_time_frame": {
          "time_interval": {
            "start_date_time": "2019-08-01T07:03:00Z",
            "end_date_time": "2019-08-01T07:03:59Z"
          }
        }
      }
    ]
  }
}

• Explicit about start time
• Explicit about duration of effective time
• But can’t represent that a datapoint in a series should be there but isn’t
Callout: “2019-08-01T07:02:00Z” to “2019-08-01T07:02:59Z” is missing
Data Absence – cStress Example
With “sampling/acquisition rate” and offset

{
  "header": {
    "id": "123e4567-e89b-12d3-a456-426655440000",
    "acquisition_rate": {
      "value": 1/60,
      "unit": "hz"
    },
    [...]
  },
  "body": {
    "stress_values": [
      {
        "offset": 0,
        "probability": 0.75
      },
      {
        "offset": 1,
        "probability": 0.85
      },
      {
        "offset": 3,
        "probability": 0.80
      }
    ],
    "effective_time_frame": {
      "time_interval": {
        "start_date_time": "2019-08-01T07:00:00Z",
        "end_date_time": "2019-08-01T07:03:59Z"
      }
    }
  }
}

• Offsets imply exactly regular data acquisition, which may not be the case
• Doesn’t say that each stress value is effective over 1 minute
• Effective time frame of the whole series can be difficult to define (e.g., what if the last value is absent? What if half of the values are absent?)
Callout: “2019-08-01T07:02:00Z” to “2019-08-01T07:02:59Z” is missing
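The offset representation can be read as: timestamp = series start + offset / acquisition rate. The following is a minimal Python sketch of that reading, assuming exactly regular acquisition (which, as noted on the slide, may not hold). The function names and the expected count of 4 are illustrative assumptions, not part of the proposal. The rate is held as a Fraction because 1/60 as written on the slide is not a JSON number literal.

```python
from datetime import datetime, timedelta, timezone
from fractions import Fraction


def offsets_to_times(start, rate_hz, offsets):
    """Map integer offsets to timestamps: start + offset * (1 / rate)."""
    period = timedelta(seconds=float(1 / rate_hz))
    return [start + k * period for k in offsets]


def missing_offsets(offsets, expected_count):
    """Return offsets in 0..expected_count-1 that have no observation."""
    return sorted(set(range(expected_count)) - set(offsets))


# Values taken from the slide; expected_count=4 is an assumption derived
# from the 07:00:00 to 07:03:59 effective time frame at one value per minute.
start = datetime(2019, 8, 1, 7, 0, 0, tzinfo=timezone.utc)
rate = Fraction(1, 60)  # acquisition_rate of 1/60 Hz
offsets = [0, 1, 3]

for t in offsets_to_times(start, rate, offsets):
    print(t.isoformat())
print(missing_offsets(offsets, expected_count=4))
```

The sketch depends on knowing how many values were expected, which is exactly the difficulty raised about defining the effective time frame of the whole series.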
Data Absence – Proposed Approach
With sampling/acquisition rate and no offsets

{
  "header": {
    "id": "123e4567-e89b-12d3-a456-426655440000",
    "acquisition_rate": {
      "value": 1/60,
      "unit": "hz"
    },
    [...]
  },
  "body": {
    "stress_values": [
      {
        "probability": 0.75,
        "start_date_time": "2019-08-01T07:00:00Z",
        "duration": {
          "value": 1,
          "unit": "min"
        }
      },
      {
        "probability": 0.85,
        "start_date_time": "2019-08-01T07:01:00Z",
        "duration": {
          "value": 1,
          "unit": "min"
        }
      },
      {
        "probability": 0.80,
        "start_date_time": "2019-08-01T07:03:00Z",
        "duration": {
          "value": 1,
          "unit": "min"
        }
      }
    ]
  }
}

• Explicit about start time
• Explicit about duration of effective time
  • can be represented using duration or start and end times
• Given the expected acquisition rate, it can be inferred that a value is missing
Callout: “2019-08-01T07:02:00Z” to “2019-08-01T07:02:59Z” is missing
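The inference in the last bullet could be sketched as follows: walk the observed start times in order and report every expected slot that falls strictly between two consecutive observations. This is a hedged illustration only. The names `parse_utc` and `find_missing_starts` are hypothetical, the rate is passed as a Fraction, and the sketch assumes exactly regular acquisition with no jitter tolerance. It cannot detect absence before the first or after the last observed value.

```python
from datetime import datetime, timedelta
from fractions import Fraction


def parse_utc(ts):
    """Parse an ISO 8601 timestamp with a trailing 'Z' as an aware datetime."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def find_missing_starts(start_times, rate_hz):
    """Return expected start times that lie between consecutive observations."""
    period = timedelta(seconds=float(1 / rate_hz))
    times = sorted(parse_utc(t) for t in start_times)
    missing = []
    for prev, nxt in zip(times, times[1:]):
        expected = prev + period
        while expected < nxt:
            missing.append(expected)
            expected += period
    return missing


# start_date_time values from the proposed-approach example
observed = [
    "2019-08-01T07:00:00Z",
    "2019-08-01T07:01:00Z",
    "2019-08-01T07:03:00Z",
]
for gap in find_missing_starts(observed, Fraction(1, 60)):
    print("missing value starting at", gap.isoformat())
```

A consumer that needs to tolerate irregular timing would have to compare against a tolerance window instead of exact slot boundaries; that choice is outside what the slides specify.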
Minimum Metadata: Proposal
Metadata Elements: Datapoint
Needs | Property (bold = required) | Example
Which datapoint is this? | UUID (datapoint, datapoint series?) | Generate using RFC 4122 approach
What does this value represent? | schema ID and schema metadata | Pointer to the stress datapoint schema
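As an illustration of the "Generate using RFC 4122 approach" entry, the sketch below uses Python's standard `uuid` module. It assumes a random (version 4) UUID and a single id in the header of the datapoint or datapoint series; the slides specify neither the UUID version nor whether each observation in a series gets its own id. The helper name `new_datapoint_id` is hypothetical.

```python
import uuid


def new_datapoint_id():
    """Return a random (version 4) RFC 4122 UUID as a 36-character string."""
    return str(uuid.uuid4())


# Hypothetical header: one id for the whole datapoint (or datapoint series).
header = {"id": new_datapoint_id()}

# Confirm that the generated id is a version 4 UUID in the RFC 4122 variant.
parsed = uuid.UUID(header["id"])
print(header["id"], parsed.version, parsed.variant == uuid.RFC_4122)
```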
Metadata Elements: Acquisition
Needs | Properties (bold = required) | Example
When was this datapoint first created at the (sensor) source? Recorded or packaged time. | source_creation_datetime (date-time schema represents a point in time, ISO 8601; timezone is UTC unless otherwise specified) | 2019-08-01T07:01:00Z
Was the datapoint sensed or self-reported? | modality | sensed
If data was acquired with a periodic