Background
• IBDB v1 was based on ICIS
• The Data Management System (DMS) is the component of ICIS that manages
Background
The functions of the DMS are to:
1. store and manage documented and structured data from genetic resource, variety evaluation and crop improvement studies,
2. link data to specialised data sources such as GMS, soil and climate databases and
3. facilitate enquiries, searches and data
Background
What goes in the DMS?
- raw observed data, derived data, means data and summary statistics.
- ideally, any data that are routinely
The Effect of Fertilizer N on the yield of rice
Year: 2000
Season: Dry Season

Rep  Main Plot  Sub Plot  Variety  Fert  Yield
1    1          1         B        20    10.3
1    1          2         B        50    12.7
1    2          1         C        50    18.4
1    2          2         C        20    13.7
1    3          1         A        20    12.6
1    3          2         A        50    16.7
2    1          1         B        50    19.2
2    1          2         B        20    12.3
2    2          1         A        20    17.1
2    2          2         A        50    14.1
2    3          1         C        50    16.3
2    3          2         C        20    12.2

Filename: N2000DS.xls
ICIS DMS Schema
Study, Factor, Level, Variate, Observation Unit

DATA MODEL
Factor: Fert
  PROPERTY: NITROGEN FERTILIZER
  SCALE: kg/ha
  METHOD: Total Application
Variate: Yield
  PROPERTY: GRAIN YIELD
  SCALE: t/ha
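
As a rough illustration of the data model above, the columns of N2000DS.xls can be described as factors and variates, each tied to a property, scale and method. This is only a sketch: the class and field names are hypothetical and are not the ICIS DMS table or column names. Only the Fert and Yield descriptions come from the slide. The method for Yield is left empty because the slide does not give one, and treating Rep, Main Plot, Sub Plot and Variety as factors is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Descriptor:
    """One study column described by property, scale and method (hypothetical class)."""
    name: str
    kind: str                      # "factor" or "variate"
    prop: Optional[str] = None
    scale: Optional[str] = None
    method: Optional[str] = None


# Only Fert and Yield are described on the slide; the other columns are
# assumed to be factors and their descriptions are left blank.
DESCRIPTORS = [
    Descriptor("Rep", "factor"),
    Descriptor("Main Plot", "factor"),
    Descriptor("Sub Plot", "factor"),
    Descriptor("Variety", "factor"),
    Descriptor("Fert", "factor", "NITROGEN FERTILIZER", "kg/ha", "Total Application"),
    Descriptor("Yield", "variate", "GRAIN YIELD", "t/ha"),
]

# First two rows of the spreadsheet shown above.
ROWS = [
    {"Rep": 1, "Main Plot": 1, "Sub Plot": 1, "Variety": "B", "Fert": 20, "Yield": 10.3},
    {"Rep": 1, "Main Plot": 1, "Sub Plot": 2, "Variety": "B", "Fert": 50, "Yield": 12.7},
]


def split_row(row):
    """Split one observation unit into factor levels and variate data values."""
    kinds = {d.name: d.kind for d in DESCRIPTORS}
    levels = {k: v for k, v in row.items() if kinds[k] == "factor"}
    data = {k: v for k, v in row.items() if kinds[k] == "variate"}
    return levels, data


levels, data = split_row(ROWS[0])
print("levels:", levels)
print("data:", data)
```

Each spreadsheet row corresponds to one observation unit: the factor levels label the unit and the variate values are the data recorded on it.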
ICIS DMS Schema
Study, Factor, Level, Variate, Observation Unit, Datum
PROPERTY, SCALE
ITERATIONS …
• DMS 5.xx
• DMS 6.0
• IBDB v1 DMS
CONCEPTS
• The STUDY INFORMATION component records global contextual information about the experiment.
• The TRIAL ENVIRONMENT component manages all data values describing the environments observed in the study, including georeference information, place names, growing environments, and overall management practices (non-treatment factors).
• The GERMPLASM ENTRY component manages all label values describing the germplasm entries in the experiment, including local and global identifiers, names, sources and roles (check or test lines) of the entries.
CONCEPTS
• The TRIAL DESIGN component manages the treatment and
sampling design and structure of the datasets in the study.
• The OBSERVATION component manages the values of the
variates for each dataset.
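
The five components can be pictured as buckets that each variable of a study is routed into. The sketch below is an illustration only: the `COMPONENT_OF` mapping and the `group_by_component` helper are hypothetical, and the assignment of the example variables (taken from the fertilizer study shown earlier) to components is an assumption, not something the slides specify.

```python
from collections import defaultdict

# Hypothetical assignment of variables to the five components.
COMPONENT_OF = {
    "Title": "STUDY INFORMATION",
    "Year": "TRIAL ENVIRONMENT",
    "Season": "TRIAL ENVIRONMENT",
    "Variety": "GERMPLASM ENTRY",
    "Rep": "TRIAL DESIGN",
    "Main Plot": "TRIAL DESIGN",
    "Sub Plot": "TRIAL DESIGN",
    "Fert": "TRIAL DESIGN",
    "Yield": "OBSERVATION",
}


def group_by_component(record):
    """Group the values of one flat record under the component that manages them."""
    grouped = defaultdict(dict)
    for variable, value in record.items():
        grouped[COMPONENT_OF[variable]][variable] = value
    return dict(grouped)


record = {
    "Title": "The Effect of Fertilizer N on the yield of rice",
    "Year": 2000,
    "Season": "Dry Season",
    "Variety": "B",
    "Rep": 1,
    "Main Plot": 1,
    "Sub Plot": 1,
    "Fert": 20,
    "Yield": 10.3,
}

grouped = group_by_component(record)
for component, values in grouped.items():
    print(component, values)
```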
CONCEPTS
Overview of the ONTOLOGY MANAGEMENT SYSTEM
The Crop Ontology contains several controlled vocabularies (CVs):
• The PROPERTIES CV has terms describing the design and treatment factors applied in phenotyping experiments and the traits being measured in them.
• The METHODS CV has terms describing the protocols by which those properties are applied or measured in phenotyping experiments.
• The SCALES CV has terms describing the scales or units in which the values of the properties are recorded.
• The STANDARD VARIABLES CV has terms defined by combinations of one property term, one method term and one scale term which
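
The composition of a standard variable from one property, one method and one scale term can be sketched as follows. The `Term` and `StandardVariable` classes are hypothetical and do not reflect the actual Crop Ontology or IBDB storage; the example terms are taken from the fertilizer data model shown earlier, and using "Fert" as the standard variable name is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A term belonging to one controlled vocabulary (hypothetical class)."""
    cv: str
    name: str


@dataclass(frozen=True)
class StandardVariable:
    """Exactly one property term, one method term and one scale term."""
    name: str
    prop: Term
    method: Term
    scale: Term

    def __post_init__(self):
        # Check that each term comes from the vocabulary its role requires.
        expected = (
            (self.prop, "PROPERTIES"),
            (self.method, "METHODS"),
            (self.scale, "SCALES"),
        )
        for term, cv in expected:
            if term.cv != cv:
                raise ValueError(f"{term.name!r} is not a term of the {cv} CV")


fert = StandardVariable(
    name="Fert",
    prop=Term("PROPERTIES", "NITROGEN FERTILIZER"),
    method=Term("METHODS", "Total Application"),
    scale=Term("SCALES", "kg/ha"),
)
print(fert.name, "=", fert.prop.name, "/", fert.method.name, "/", fert.scale.name)
```

Checking the vocabulary of each term at construction time is one simple way to keep a scale term from being used where a property is expected.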
CONCEPTS
• Chado is one of the relational database schemas used in GMOD
• Its tables are grouped into modules
• Chado has been designed to be extensible
• Chado is also ontology-aware. One could state this even more forcefully: Chado depends on ontologies.
Source: http://gmod.org/wiki/Overview
CONCEPTS
CHADO Modules
Audit - for database audits
Companalysis - for data from computational analysis
Contact - for people, groups, and organizations
Controlled Vocabulary (cv) - for controlled vocabularies and ontologies
Expression - for summaries of RNA and protein expression
General - for identifiers
Genetic - for genetic data and genotypes
CONCEPTS
CHADO Modules
Mage - for microarray data
Map - for maps without sequence
Organism - for taxonomic data
Phenotype - for phenotypic data
Phylogeny - for organisms and phylogenetic trees
Publication (pub) - for publications and references
Sequence - for sequences and sequence features
CONCEPTS
CHADO Modules
Controlled Vocabulary (cv) - for controlled vocabularies and ontologies
Phenotype - for phenotypic data
CONCEPTS
CHADO Natural Diversity Modules
• The Chado Natural Diversity Module is an extension to the Chado schema to better support natural diversity (phenotypic) data.
• Under development
CHADO ND MODULE
ND_geolocation
ND_experiment
NEW DMS SCHEMA
CHADO MODULES: Controlled Vocabulary (cv), Phenotype, Stock
CHADO ND MODULE tables: ND_geolocation
CONCEPTS
Key elements of the Logical Data Model for Phenotyping Data
DIFFERENCES BETWEEN IBDBV1 AND IBDBV2
DIFFERENCES
• User-defined factors, which were central to IBDB v1, have been dropped. In effect, each IBDB v2 study has four factors.
• The old constraint that the factor label had to be maximally discriminatory has been implemented slightly differently, by assuming or imposing an ID in each factor: study id, trial instance, germplasm entry, and field plot.
• Values of variables are stored in their respective components instead of just being split into label levels and data values.
• effect and representation are now rolled into the
concept of dataset and managed through the