(1)

PhUSE Metadata Management Project

Metadata, Study data standards, Master data, terminology and interoperability definitions

Mitra Rocca, FDA

Marcelina Hungria, DIcore Group

 

 

 

 

(2)

Table of Contents

PhUSE CSS – Emerging Technology (ET) Working group

Metadata Management Project

Metadata Definitions

Metadata Implementation

Definitions

Metadata and study data standards

Master data

Controlled terminology

Interoperability

Pooling, aggregation, integration

Lessons learned

(3)

PhUSE CSS Emerging Technology

The FDA/PhUSE Computational Science Symposium (CSS) is a collaborative effort between industry and the FDA to work on implementation of data standards.

In 2013 a new working group was established, focusing on the following emerging technologies:

•  Semantic technology (now in a dedicated working group)

•  Metadata management

•  Cloud computing

•  Big data

(4)

Metadata Management Project Goals

Changing landscape: need for a concept-based Metadata Repository (MDR) from protocol to data submission

(5)

Project Team Deliverables

Metadata Definitions Document

http://www.PhUSEwiki.org/wiki/index.php?title=Metadata_Management

Comments to FDA Guidances

(6)

Metadata Definitions Soup ….

(7)

Metadata Definitions

1 METADATA MANAGEMENT
1.1 Metadata
1.2 Structural metadata
1.3 Descriptive metadata
1.4 Study Instance Metadata
1.5 Metadata repository
1.6 Metadata registry
1.7 Data element
1.8 Attribute
1.9 Class
1.10 Data type
1.11 Value level metadata

2 CONTROLLED TERMINOLOGY, CODE SYSTEMS & VALUE SETS
2.1 Controlled Terminology/controlled vocabulary
2.2 Code system
2.3 Dictionary
2.4 Concept
2.5 Code
2.6 Code list
2.7 Value set

3 MASTER DATA MANAGEMENT
3.1 Master Data
3.2 (Master) Reference Data
3.3 Master Data Management

4 INTEROPERABILITY
Categorization of Interoperability (by HL7)
4.1 Technical interoperability (“machine interoperability”)
4.2 Semantic interoperability
4.3 Process Interoperability

5 DATA AGGREGATION, INTEGRATION, POOLING
5.1 Data pooling
5.2 Data integration
5.3 Data aggregation

(8)

Approach

Definitions

Lessons Learned

 


(10)

Synonym

Reference Data Management; MDM

Definition & source

[Gartner – Magic Quadrant for Master Data Management of Customer Data Solution] http://www.gartner.com/technology/reprints.do?id=1-1CK9UDO&ct=121019&st=sb

MDM is a technology-enabled discipline in which business and IT work together to ensure the uniformity, accuracy, stewardship, semantic consistency and accountability of the enterprise's official, shared master data assets.

[Source: Master Data Management]

Master Data Management (MDM) is the collective application of governance, business processes, policies, standards and tools to facilitate consistency in data definition.

The idea of Master Data focuses on providing unobstructed access to a consistent representation of shared information [Source: SAS White Paper on Supporting Your Information Strategy with a Phased Approach to Master Data Management]

Description

Master Data Management (MDM) comprises a set of processes and tools that consistently define and manage the master data and master reference data of an enterprise, which are fundamental to the company’s business operations.

MDM has the objective of providing processes & tools for collecting, aggregating, matching, consolidating, quality-assuring, persisting and distributing such data throughout an organization to ensure consistency and control in the ongoing maintenance and application use of this information.

There are different models for master data management; the two main extremes are:

•  Centralized model, where all data are managed within a central data store and pushed to the different applications within an organization.

•  Decentralized model (registry), where the master data are managed within each application but then reconciled through a registry system to federate them.

Example

Specific products from vendors such as INFORMATICA, IBM, Software AG, …

Recommended definition

Set of processes and tools needed for the deployment of master data and master reference data within an organization.

Approach    

(11)

Metadata

[Figure: metadata hierarchy at three levels]

(Organization/Enterprise Level) Metadata: Structural Metadata; Descriptive Metadata (Semantic Descriptive Metadata, Process Descriptive Metadata)

(Study Level) Study Metadata: Study Structural Metadata; Study Descriptive Metadata (Semantic Descriptive Metadata, Process Descriptive Metadata)

(Drug Level) Drug Metadata: Drug Structural Metadata; Drug Descriptive Metadata (Semantic Descriptive Metadata, Process Descriptive Metadata)

Figure annotations:

Subset of CDISC CDASH, SDTM standard (based on company best practice)

Subset of IDMP standard + CDISC (CDASH, SDTM for a compound)

(12)
(13)

 

“Codelist”  

 

How Controlled Vocabularies are described and used

[Figure, inspired by Julie James, BlueWave Informatics. Elements shown: Concepts; Concept Representation; Codes; Code Systems; Designations; Concept Identifiers; Value Sets; ISO 21090 Datatypes (the CD, Concept Descriptor); Value Set Definition; Value Set Versioning; Code System Versioning]

Example designations: Female; female; F (Primary)

Value Set & Code with CDISC example: C16576 for F

In define.xml (machine processable):

Code System (CodeList Context): nciExtCodeID (not directly processable; URI instead)

Value Set (CodeList) CUI for SEX: C66731

Code CUI for Designation F (Female): C16576

Controlled Terminology: C16576 + F
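The relationship between a value set, its codes and their designations can be sketched in a few lines of Python. This is only an illustration: the class names, field names and the "NCI Thesaurus" label for the code system are assumptions made for this example, and only the identifiers C66731 (SEX) and C16576 (F, Female) are taken from the slide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Code:
    """A concept identifier plus its designations within a code system."""
    concept_id: str        # e.g. "C16576" (from the slide)
    primary: str           # primary designation, e.g. "F"
    synonyms: tuple = ()   # other designations, e.g. ("Female", "female")


@dataclass
class ValueSet:
    """A named selection of codes allowed for one variable (a codelist)."""
    concept_id: str        # e.g. "C66731" for SEX (from the slide)
    name: str
    code_system: str       # hypothetical label for the source code system
    codes: dict = field(default_factory=dict)

    def add(self, code: Code) -> None:
        # Index each code by its primary designation (the submitted value).
        self.codes[code.primary] = code

    def lookup(self, submitted_value: str) -> Code:
        """Return the code for a submitted value, or raise KeyError."""
        return self.codes[submitted_value]


sex = ValueSet(concept_id="C66731", name="SEX", code_system="NCI Thesaurus")
sex.add(Code("C16576", "F", ("Female", "female")))

female = sex.lookup("F")
print(female.concept_id, female.primary, female.synonyms)
```

Indexing by the primary designation mirrors the "C16576 + F" pairing: the submitted value is the designation, while the concept identifier carries the meaning.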

(14)
(15)

Data Pooling, Integration, Aggregation

[Figure: Dataset 1, Dataset 2 and Dataset 3 passing through three stages]

POOLING: Storing data together without changing the datasets

INTEGRATION: Transformation, mapping or harmonization of data (ETL process)

AGGREGATION: Additional grouping or derivation of data
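The three terms can be contrasted with a minimal Python sketch. Everything in it is hypothetical: the three tiny datasets, the `SEX_MAP` harmonization table and the function names were invented for illustration and do not come from the project's definitions document.

```python
from collections import Counter

# Three hypothetical source datasets with inconsistent coding of sex.
dataset_1 = [{"study": "S1", "subject": "001", "sex": "Female"}]
dataset_2 = [{"study": "S2", "subject": "001", "sex": "F"}]
dataset_3 = [{"study": "S3", "subject": "001", "sex": "male"}]

# Hypothetical harmonization map to a single submission value.
SEX_MAP = {"female": "F", "f": "F", "male": "M", "m": "M"}


def pool(*datasets):
    """Pooling: store the datasets together without changing them."""
    return {f"dataset_{i}": list(rows) for i, rows in enumerate(datasets, start=1)}


def integrate(pooled):
    """Integration: map every record to one harmonized representation."""
    integrated = []
    for rows in pooled.values():
        for row in rows:
            integrated.append({**row, "sex": SEX_MAP[row["sex"].lower()]})
    return integrated


def aggregate(integrated):
    """Aggregation: derive grouped values from the integrated data."""
    return Counter(row["sex"] for row in integrated)


pooled = pool(dataset_1, dataset_2, dataset_3)
integrated = integrate(pooled)
counts = aggregate(integrated)
print(counts)
```

Note that pooling leaves the source values untouched, so the inconsistent codings only disappear at the integration step.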

(16)

Lessons learned (compiled from different team members)

Efficient data integration and compliance with regulatory standards do not start after pooling (retroactive approach); they start with the protocol (proactive approach).

A proactive approach is based on two components:

o  Definition of Master Data (Drug Products, Studies, Sites, Investigators, ..) and associated descriptive metadata

o  Definition of study structural metadata, aka study-specific data standards, as a subset of the enterprise-wide variables and value sets contained in a Metadata repository (MDR)
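The idea of study structural metadata being a subset of the MDR can be expressed as a simple check, sketched below in Python. The MDR content, the variable names chosen and the function `check_study_standard` are hypothetical and only illustrate the subset relationship; a real MDR holds far richer metadata.

```python
# Hypothetical enterprise-wide MDR content: variable name -> allowed value set.
MDR_VARIABLES = {
    "SEX": {"F", "M", "U"},
    "AGEU": {"YEARS", "MONTHS"},
    "RACE": None,  # None means no value set restriction in this sketch
}


def check_study_standard(study_variables, mdr=MDR_VARIABLES):
    """Return a list of problems; an empty list means the study definition
    is a valid subset of the MDR variables and value sets."""
    problems = []
    for name, values in study_variables.items():
        if name not in mdr:
            problems.append(f"{name}: not defined in the MDR")
            continue
        allowed = mdr[name]
        if allowed is not None and values is not None and not values <= allowed:
            extra = sorted(values - allowed)
            problems.append(f"{name}: values outside MDR value set: {extra}")
    return problems


# A study that only narrows the enterprise standard passes the check.
study_ok = {"SEX": {"F", "M"}, "AGEU": {"YEARS"}}
# A study that invents a value and a variable does not.
study_bad = {"SEX": {"F", "X"}, "HEIGHT": None}

print(check_study_standard(study_ok))
print(check_study_standard(study_bad))
```

Running such a check at study setup rather than at submission is one way to read the proactive approach described above.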

To be manageable, variables in an MDR need to be grouped in semantically

(17)

© 2014 PAREXEL INTERNATIONAL CORP. / 17 CONFIDENTIAL

CHANGING LANDSCAPE: Enforcing data standards from protocol onwards

Retro-active approach from paper protocol

•  Different interpretations of same protocol

•  Limited standards

•  Time to build integrated SDTM data sets

Pro-active approach with structural metadata

•  One single interpretation of protocol

•  Increased efficiency, consistency & quality through standards

•  Reduced time for integration and secondary data use

(18)


CONCEPT BASED MDR: Protocol is not about variables but about concepts

[Figure: Annotated eCRF for Patient Demography compared with an SDTM data set (SAS), which has different variable names and a different structure than the eCRF]

(19)


CONCEPT BASED MDR: CDASH/SDTM can be organized by CRC

Concept | CDASH Question | CDASH Variable | SDTM Variable
Subject | What is the sex of the subject? | SEX | SEX
Subject | What is the subject’s date of birth? | BRTHDAT or BRTHYR, BRTHMO, BRTHDY | BRTHDTC
Subject | What is the ethnicity of the subject? | ETHNIC | ETHNIC
Subject | What is the race of the subject? | RACE | RACE
Subject | What is the subject’s age? What are the age units used? | AGE, AGEU | AGE, AGEU

Column groups in the figure: eCRF content description (CDASH question and variable), mapping, SDTM
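One way to picture a concept-based grouping is as a small data structure that links collection variables to submission variables. The Python sketch below transcribes the Subject rows from the slide; the structure `SUBJECT_CONCEPT` and the helpers `sdtm_targets` and `derive_brthdtc` are hypothetical names, and the date derivation assumes that numeric year, month and day are all present.

```python
# Concept "Subject": CDASH collection variables linked to SDTM variables,
# transcribed from the slide. The structure itself is an assumption.
SUBJECT_CONCEPT = [
    {"cdash": ["SEX"], "sdtm": ["SEX"]},
    {"cdash": ["BRTHDAT"], "sdtm": ["BRTHDTC"]},
    {"cdash": ["BRTHYR", "BRTHMO", "BRTHDY"], "sdtm": ["BRTHDTC"]},
    {"cdash": ["ETHNIC"], "sdtm": ["ETHNIC"]},
    {"cdash": ["RACE"], "sdtm": ["RACE"]},
    {"cdash": ["AGE"], "sdtm": ["AGE"]},
    {"cdash": ["AGEU"], "sdtm": ["AGEU"]},
]


def sdtm_targets(cdash_variable):
    """Return the SDTM variables linked to one CDASH variable."""
    targets = []
    for link in SUBJECT_CONCEPT:
        if cdash_variable in link["cdash"]:
            targets.extend(t for t in link["sdtm"] if t not in targets)
    return targets


def derive_brthdtc(record):
    """Build an ISO 8601 date string from the collected birth date parts.

    Assumes numeric year, month and day are all present in the record.
    """
    year = int(record["BRTHYR"])
    month = int(record["BRTHMO"])
    day = int(record["BRTHDY"])
    return f"{year:04d}-{month:02d}-{day:02d}"


print(sdtm_targets("BRTHMO"))
print(derive_brthdtc({"BRTHYR": 1980, "BRTHMO": 3, "BRTHDY": 7}))
```

Keeping the links at the concept level means the same definition can drive both eCRF design and the later mapping to SDTM.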

(20)

Conclusions

Let us speak the same language

We need to change the way we consider compliance to data standards and data integration:

o  From a retroactive way (building define.xml at submission)

o  To a proactive approach (study data standards defined at study setup)

We need new tools to manage metadata: Concept based MDR

o  Grouping variables into semantically meaningful concepts (following industry wide patterns)

o  Linking data sources (e.g., CDASH based collection) to data submission (SDTM) variables

o  Linking with controlled terminology

(21)

Metadata Definition Project Participants

Isabelle de Zegher (co-chair), Parexel
Mitra Rocca (co-chair), FDA
Marcelina Hungria (co-chair), DIcore Group
Julie James, BlueWave Informatics
Tim Church, Torch
Yun Oldshue, Takeda
Praveen Garg, ICON
Kenneth Stoltzfus, Accenture
Gregory Steffens, Novartis
John Leveille, d-Wise
Aimee Basile, Celgene

(22)

 

 

Mitra Rocca
Senior Medical Informatician
Office of Translational Sciences
CDER, FDA

Marcelina Hungria
Clinical Data Standards & Integration Consultant / Owner
DIcore Group, LLC

 
