Contribution to the Open data strategy in the Wheat Initiative
Esther Dzalé Yeumo Kaboré
2
Context
Coordinate worldwide research efforts in the fields of wheat genetics, genomics, physiology, breeding and agronomy. (Sept. 2011)
Share relevant agricultural data available from G-8 countries with African partners and … develop options for the establishment of a global platform to make reliable agricultural and related information available to African partners. (April 2013)
seeks to support global efforts to make agricultural and nutritionally relevant data available, accessible, and usable for unrestricted use worldwide. (Oct. 2013)
3
Some societal challenges:
Feed the world
Climate change
Sustainable agriculture
Health and nutrition
They imply dealing with data:
Data driven science
Big data
Data management, sharing, and re-use
From different points of view:
Politics
Technology (IT)
Scientific disciplines
Intellectual property, ethics
Economics
4
Status
Recognized and endorsed by the Research Data Alliance (RDA)
Part of the Wheat Initiative Information System project
Focus:
The WG aims to provide a common framework for describing, representing, linking and publishing Wheat data according to open standards.
The WG will focus first on the following data types: SNPs, genomic annotations, phenotypes, genetic maps, physical maps, germplasm and expression data.
Issues
Build on existing work
Keep coherent with existing projects and initiatives as much as possible
Adoption
5
5-star Linked Open Research Data
* Publish your data on the Web at a stable URI (in any structured format) under an open license
** Use non-proprietary formats (e.g., CSV instead of Excel)
*** Document your data: provide human-readable documentation (the research context, data collection methods, data preparation, etc.) and basic metadata (creator, publisher, date of creation, last modification, version number, etc.)
**** When using an in-house vocabulary, make it available at a stable URI, both as a formal file and as human-readable documentation, using content negotiation
***** Link to others by re-using existing vocabularies to name things and their relationships rather than re-inventing them. Link out explicitly to external data sources.
Make it easy to find and access
Make it easy to re-use
Put it in context
Make it easy to understand
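To illustrate stars 1, 2, 3 and 5, the minimal sketch below turns a small CSV table into N-Triples: each record gets its own stable URI, the dataset carries basic Dublin Core metadata and an explicit open license, and an optional column links out to an external source. The base URI http://example.org/wheat/, the column names (id, name, see_also) and the sample values are hypothetical placeholders; CC BY 4.0 is used only as one example of an open license. The Dublin Core terms and RDF Schema namespaces are the real ones.

```python
import csv
import io

# Hypothetical base URI; a real dataset would use a stable, resolvable domain.
BASE = "http://example.org/wheat/"
DCT = "http://purl.org/dc/terms/"               # Dublin Core terms namespace
RDFS = "http://www.w3.org/2000/01/rdf-schema#"  # RDF Schema namespace
# CC BY 4.0, used here only as one example of an open license.
LICENSE = "https://creativecommons.org/licenses/by/4.0/"


def iri(value):
    """Wrap a URI in angle brackets for N-Triples."""
    return f"<{value}>"


def literal(value):
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def triple(s, p, o):
    return f"{s} {p} {o} ."


def csv_to_ntriples(csv_text, dataset_id, metadata):
    dataset = iri(BASE + "dataset/" + dataset_id)
    lines = []
    # Star 3: basic dataset metadata expressed with Dublin Core terms.
    for prop, value in metadata.items():
        lines.append(triple(dataset, iri(DCT + prop), literal(value)))
    # Star 1: state the open license explicitly.
    lines.append(triple(dataset, iri(DCT + "license"), iri(LICENSE)))
    # Star 2: read a non-proprietary format (CSV).
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Star 1: each record gets its own stable URI under the base.
        subject = iri(BASE + "accession/" + row["id"])
        lines.append(triple(subject, iri(DCT + "isPartOf"), dataset))
        lines.append(triple(subject, iri(RDFS + "label"), literal(row["name"])))
        # Star 5: link out to an external source rather than copying it.
        if row.get("see_also"):
            lines.append(triple(subject, iri(RDFS + "seeAlso"), iri(row["see_also"])))
    return "\n".join(lines)


# Hypothetical sample data.
sample = """id,name,see_also
acc001,Example accession A,http://example.org/external/acc001
acc002,Example accession B,
"""
print(csv_to_ntriples(sample, "demo", {"creator": "Example Lab", "created": "2014-01-01"}))
```

N-Triples is chosen here only because it can be written with the standard library; any RDF serialization would satisfy the same stars. Star 4 (publishing an in-house vocabulary) is not covered by this sketch.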
6
Our work plan
What | Who
A survey to inventory the assets | The Wheat community
Analyse the results and discuss their consequences | The work group + Wheat experts
Produce a report | The work group (+ adoption groups for the review)
Write a cookbook to provide Wheat data managers with guidelines | The work group (+ adoption groups for the review)
Identify data interoperability use cases | The work group + adoption groups
A library of linked vocabularies and ontologies | The work group (+ adoption groups for the review)
7
Data formats
Inventory
SNPs
Genomic annotations
Phenotypes
Genetic maps
Physical maps
8
Vocabularies and ontologies
Gene Ontology
Sequence Ontology
Plant Ontology/Anatomy
Plant Ontology/development stage
Trait Ontology
Crop Ontology (Wheat Trait Ontology)
Project-specific trait ontologies (Drops, agronomics)
Others?
9
Metadata standards
Darwin Core
Dublin Core
MAGE
MINSEQE (Minimum Information about a high-throughput SeQuencing Experiment)
Others?
Practices
Data storage?
Data policy?
Guidelines for data management?
10
Report of the survey
A cookbook
What kinds of entities and relationships are involved in describing and accessing Wheat data => ontologies
What properties should be considered for publishing meaningful, useful LOD-ready Wheat data
Which controlled vocabulary terms are appropriate for a given property when producing LOD-ready Wheat data
A library of linked vocabularies and ontologies
A prototype that demonstrates the gain in interoperability
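The cookbook questions (which entities, which properties, which controlled vocabulary terms) lend themselves to a checklist that a data manager could run before publishing. The sketch below assumes a hypothetical publishing profile: the entity type, the required properties and the term identifiers (TRAIT:0001, TRAIT:0002) are placeholders, not cookbook content. In practice the allowed terms would come from ontologies such as the Trait Ontology or the Crop Ontology.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """A hypothetical publishing profile for one entity type."""
    entity_type: str
    required: set
    # property name -> set of allowed controlled-vocabulary term identifiers
    controlled: dict = field(default_factory=dict)


def check_record(record, profile):
    """Return a list of human-readable problems (empty if the record passes)."""
    problems = []
    if record.get("type") != profile.entity_type:
        problems.append(f"expected type {profile.entity_type!r}, got {record.get('type')!r}")
    # Which properties must be present for the record to be meaningful.
    for prop in sorted(profile.required):
        if not record.get(prop):
            problems.append(f"missing required property {prop!r}")
    # Which controlled vocabulary terms are acceptable for a given property.
    for prop, allowed in profile.controlled.items():
        value = record.get(prop)
        if value is not None and value not in allowed:
            problems.append(f"{prop!r} uses {value!r}, which is not in the controlled vocabulary")
    return problems


# Hypothetical profile and term identifiers, for illustration only.
phenotype = Profile(
    entity_type="Phenotype",
    required={"id", "trait", "germplasm"},
    controlled={"trait": {"TRAIT:0001", "TRAIT:0002"}},
)
print(check_record({"type": "Phenotype", "id": "p1", "trait": "TRAIT:0001",
                    "germplasm": "acc001"}, phenotype))  # -> []
```

Returning a list of problems rather than raising on the first one lets a data manager fix all issues in a record in a single pass.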
11
Configuration
What data?
Where does the data come from?
How are the data sources connected?
How will the data be integrated?
To which ontologies/models will the newly imported data be matched?
Data import using flexible connectors for CSV, XML, SQL, SPARQL, RDF, etc.
Abstraction to a semantic environment using advanced data/ontology selectors
COEUS: rapid building of knowledge bases
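The configuration questions above can be pictured as a small pipeline: source-specific connectors yield uniform records, and a declarative mapping assigns each record to an ontology class and each field to a property. This is a generic sketch; it does not reproduce COEUS's actual configuration format or API. The connector functions, the source and mapping dictionaries and the ex: class and property names are all hypothetical, and only CSV and XML connectors are shown, using the Python standard library.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_connector(text):
    """Yield one dict per CSV row."""
    yield from csv.DictReader(io.StringIO(text))


def xml_connector(text, record_tag):
    """Yield one dict per <record_tag> element, using child tags as keys."""
    root = ET.fromstring(text)
    for element in root.iter(record_tag):
        yield {child.tag: (child.text or "").strip() for child in element}


CONNECTORS = {"csv": csv_connector, "xml": xml_connector}


def integrate(sources, mapping):
    """Run each configured source and map its fields onto ontology terms.

    Returns (subject, predicate, object) tuples; all names are hypothetical.
    """
    triples = []
    for source in sources:
        connector = CONNECTORS[source["format"]]       # how the source is connected
        rules = mapping[source["name"]]                # which ontology terms to match
        for record in connector(source["data"], **source.get("options", {})):
            subject = rules["id_prefix"] + record[rules["id_field"]]
            triples.append((subject, "rdf:type", rules["class"]))
            for field_name, predicate in rules["fields"].items():
                if record.get(field_name):
                    triples.append((subject, predicate, record[field_name]))
    return triples


# Hypothetical configuration: two sources with different formats and field names.
sources = [
    {"name": "lab_csv", "format": "csv", "data": "code,height\nA1,85\n"},
    {"name": "partner_xml", "format": "xml", "options": {"record_tag": "plant"},
     "data": "<plants><plant><ref>B7</ref><height>92</height></plant></plants>"},
]
mapping = {
    "lab_csv": {"id_prefix": "ex:plant/", "id_field": "code", "class": "ex:Plant",
                "fields": {"height": "ex:plantHeight"}},
    "partner_xml": {"id_prefix": "ex:plant/", "id_field": "ref", "class": "ex:Plant",
                    "fields": {"height": "ex:plantHeight"}},
}
for t in integrate(sources, mapping):
    print(t)
```

Both sources end up described with the same class and property even though their native field names differ (code vs. ref), which is the point of matching imported data to a shared ontology.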
12
A data integration tool
Developed by the Information Sciences Institute (ISI), University of Southern California
Quick and easy data integration from a variety of data sources, including databases, spreadsheets, delimited text files, XML, JSON, KML and Web APIs.
Karma learns to recognize the mapping of data to ontology classes.
Many demos and use cases are available on the Karma website:
http://www.isi.edu/integration/karma/
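The idea that a tool can learn mappings from the ones a user has already confirmed can be conveyed with a toy example. The sketch below remembers confirmed (column header, ontology class) pairs and suggests a class for a new header by word overlap. It is not Karma's actual learning algorithm, and the class names (ex:PlantHeight, ex:GrainYield) are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokens(header):
    """Split a column header such as 'plantHeight_cm' into lowercase words."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", header)
    return [t for t in re.split(r"[^A-Za-z0-9]+", spaced.lower()) if t]


class MappingSuggester:
    """Toy stand-in for learned column-to-class mapping (not Karma's method)."""

    def __init__(self):
        # word -> Counter of ontology classes it has been mapped to
        self.evidence = defaultdict(Counter)

    def confirm(self, header, ontology_class):
        """Record a mapping that a user has validated."""
        for word in tokens(header):
            self.evidence[word][ontology_class] += 1

    def suggest(self, header):
        """Return the best-supported class for a new header, or None."""
        scores = Counter()
        for word in tokens(header):
            scores.update(self.evidence.get(word, Counter()))
        return scores.most_common(1)[0][0] if scores else None


s = MappingSuggester()
s.confirm("plant_height", "ex:PlantHeight")
s.confirm("grain yield", "ex:GrainYield")
print(s.suggest("PlantHeightCm"))  # -> ex:PlantHeight
```

Each confirmation adds evidence, so suggestions improve as more datasets are mapped; a real tool would also use the data values themselves, not just the headers.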
13
Be an adoption group
Provide data interoperability use cases
Review the outputs and give feedback
Utilize the cookbook
Initial Adopters of Working Group Deliverables:
The International Wheat Initiative
The French National Institute for Agricultural Research (INRA)
The Food and Agriculture Organization of the United Nations (FAO)
The International Maize and Wheat Improvement Center (CIMMYT)