DMS_New_Schema.pdf

(1)

(2)

Background

• IBDB v1 was based on ICIS

• The Data Management System (DMS) is the component of ICIS that manages

(3)

Background

The functions of the DMS are to:

1. store and manage documented and structured data from genetic resource, variety evaluation and crop improvement studies,

2. link data to specialised data sources such as GMS, soil and climate databases, and

3. facilitate enquiries, searches and data

(4)

Background

What goes in the DMS?

- raw observed data, derived data, means data and summary statistics.

- ideally, any data that are routinely

(5)

The Effect of Fertilizer N on the yield of rice

Year: 2000

Season: Dry Season

Rep  Main Plot  Sub Plot  Variety  Fert  Yield
1    1          1         B        20    10.3
1    1          2         B        50    12.7
1    2          1         C        50    18.4
1    2          2         C        20    13.7
1    3          1         A        20    12.6
1    3          2         A        50    16.7
2    1          1         B        50    19.2
2    1          2         B        20    12.3
2    2          1         A        20    17.1
2    2          2         A        50    14.1
2    3          1         C        50    16.3
2    3          2         C        20    12.2

Filename: N2000DS.xls

(6)

ICIS DMS Schema

[Diagram of schema entities: Study, Factor, Level, Variate, Observation Unit]

(7)

DATA MODEL

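[Diagram: a factor and a variate, each described by a PROPERTY, a SCALE and a METHOD]

Factor: Fert (PROPERTY: NITROGEN FERTILIZER; SCALE: kg/ha; METHOD: Total Application)

Variate: Yield (PROPERTY: GRAIN YIELD; SCALE: t/ha)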

(8)

ICIS DMS Schema

[Diagram of schema entities: Study, Factor, Level, Variate, Observation Unit, Datum; annotated with PROPERTY and SCALE]
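As a rough illustration of how the example file N2000DS.xls could map onto these entities, the sketch below loads its first three rows into plain Python structures. The class names, the in-memory layout, and the choice to treat Rep, Main Plot, Sub Plot and Variety as factors alongside Fert are assumptions made for illustration only; they are not the ICIS table definitions.

```python
from dataclasses import dataclass, field


@dataclass
class ObservationUnit:
    """One row of the data file (hypothetical representation)."""
    levels: dict  # factor name -> level value
    data: dict    # variate name -> datum


@dataclass
class Study:
    """A study with named factors and variates (hypothetical representation)."""
    name: str
    factors: list   # column names treated as factors (assumption)
    variates: list  # column names treated as variates
    units: list = field(default_factory=list)

    def add_row(self, row: dict) -> None:
        # Split one row into factor levels and observed data.
        levels = {f: row[f] for f in self.factors}
        data = {v: row[v] for v in self.variates}
        self.units.append(ObservationUnit(levels, data))


COLUMNS = ["Rep", "Main Plot", "Sub Plot", "Variety", "Fert", "Yield"]
ROWS = [  # first three rows of the example table
    (1, 1, 1, "B", 20, 10.3),
    (1, 1, 2, "B", 50, 12.7),
    (1, 2, 1, "C", 50, 18.4),
]

study = Study(name="N2000DS", factors=COLUMNS[:5], variates=["Yield"])
for values in ROWS:
    study.add_row(dict(zip(COLUMNS, values)))

# Distinct levels of the Fert factor seen in the loaded rows.
fert_levels = sorted({u.levels["Fert"] for u in study.units})
```

Each observation unit carries one level per factor and one datum per variate, which is the split the entity names suggest.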

(9)

ITERATIONS …

• DMS 5.xx

• DMS 6.0

• IBDB v1 DMS

(10)

CONCEPTS

• The STUDY INFORMATION component records global contextual information about the experiment.

• The TRIAL ENVIRONMENT component manages all data values describing the environments observed in the study, including georeference information, place names, growing environments, and overall management practices (non-treatment factors).

• The GERMPLASM ENTRY component manages all label values describing the germplasm entries in the experiment, including local and global identifiers, names, sources and roles (check or test lines) of the entries.

(11)

CONCEPTS

• The TRIAL DESIGN component manages the treatment and sampling design and structure of the datasets in the study.

• The OBSERVATION component manages the values of the variates for each dataset.

(12)

CONCEPTS

(13)

CONCEPTS

Overview of the ONTOLOGY MANAGEMENT SYSTEM

The Crop Ontology contains several controlled vocabularies (CV):

• The PROPERTIES CV has terms describing the design and treatment factors applied in phenotyping experiments and the traits being measured in them.

• The METHODS CV has terms describing the protocols by which those properties are applied or measured in phenotyping experiments.

• The SCALES CV has terms describing the scales or units in which the values of the properties are recorded.

• The STANDARD VARIABLES CV has terms defined by combinations of one property term, one method term and one scale term which
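The idea of a standard variable as one property term, one method term and one scale term can be sketched as follows. The vocabularies here are toy sets filled with the terms from the fertilizer example earlier in the deck, the pairing of "Total Application" with Fert is read from that slide and is an assumption, and `StandardVariable` and `make_standard_variable` are hypothetical names, not part of the Crop Ontology or any IBDB API.

```python
from dataclasses import dataclass

# Toy controlled vocabularies (assumption: terms taken from the earlier
# fertilizer example, not from the actual Crop Ontology).
PROPERTIES = {"NITROGEN FERTILIZER", "GRAIN YIELD"}
METHODS = {"Total Application"}
SCALES = {"kg/ha", "t/ha"}


@dataclass(frozen=True)
class StandardVariable:
    """Hypothetical record: one property, one method, one scale."""
    name: str
    prop: str
    method: str
    scale: str


def make_standard_variable(name, prop, method, scale):
    """Build a standard variable, rejecting terms missing from their CV."""
    checks = (
        (prop, PROPERTIES, "property"),
        (method, METHODS, "method"),
        (scale, SCALES, "scale"),
    )
    for term, vocabulary, label in checks:
        if term not in vocabulary:
            raise ValueError(f"unknown {label} term: {term!r}")
    return StandardVariable(name, prop, method, scale)


fert = make_standard_variable(
    "Fert", "NITROGEN FERTILIZER", "Total Application", "kg/ha"
)
```

Validating each term against its own vocabulary keeps free-text labels out of variable definitions, which is the main point of using controlled vocabularies.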

(14)

CONCEPTS

• Chado is one of the relational databases that are used in GMOD

• Its tables are broken down into groups called modules

• Chado has been designed to allow extensibility

• Chado is also ontology-aware. One could state this even more forcefully: Chado depends on ontologies.

Source: http://gmod.org/wiki/Overview

(15)

CONCEPTS

CHADO Modules

Audit - for database audits

Companalysis - for data from computational analysis

Contact - for people, groups, and organizations

Controlled Vocabulary (cv) - for controlled vocabularies and ontologies

Expression - for summaries of RNA and protein expression

General - for identifiers

Genetic - for genetic data and genotypes

(16)

CONCEPTS

CHADO Modules

Mage - for microarray data

Map - for maps without sequence

Organism - for taxonomic data

Phenotype - for phenotypic data

Phylogeny - for organisms and phylogenetic trees

Publication (pub) - for publications and references

Sequence - for sequences and sequence features

(17)

CONCEPTS

CHADO Modules

Controlled Vocabulary (cv) - for controlled vocabularies and ontologies

Phenotype - for phenotypic data

(18)

CONCEPTS

CHADO Natural Diversity Modules

• The Chado Natural Diversity Module is an extension to the Chado schema to better support natural diversity data (phenotypic).

• Under development

(19)

CHADO ND MODULE

ND_geolocation

ND_experiment

(20)

NEW DMS SCHEMA

CHADO MODULES

Controlled Vocabulary (cv)

Phenotype

Stock

CHADO ND MODULE tables

ND_geolocation
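To give a feel for how the cv module and the ND tables relate, here is a heavily simplified sketch using SQLite from Python. Only a handful of columns are shown, the real Chado tables carry more columns and constraints, and the inserted rows ('experiment types', 'field trial', 'example site') are placeholder values invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified subset of columns; not the full Chado DDL.
conn.executescript("""
CREATE TABLE cv (
    cv_id INTEGER PRIMARY KEY,
    name  TEXT NOT NULL UNIQUE
);
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    cv_id     INTEGER NOT NULL REFERENCES cv(cv_id),
    name      TEXT NOT NULL
);
CREATE TABLE nd_geolocation (
    nd_geolocation_id INTEGER PRIMARY KEY,
    description       TEXT
);
CREATE TABLE nd_experiment (
    nd_experiment_id  INTEGER PRIMARY KEY,
    nd_geolocation_id INTEGER NOT NULL
        REFERENCES nd_geolocation(nd_geolocation_id),
    type_id           INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
""")

# Placeholder rows, invented for illustration.
conn.execute("INSERT INTO cv (cv_id, name) VALUES (1, 'experiment types')")
conn.execute(
    "INSERT INTO cvterm (cvterm_id, cv_id, name) VALUES (1, 1, 'field trial')"
)
conn.execute(
    "INSERT INTO nd_geolocation (nd_geolocation_id, description) "
    "VALUES (1, 'example site')"
)
conn.execute(
    "INSERT INTO nd_experiment (nd_experiment_id, nd_geolocation_id, type_id) "
    "VALUES (1, 1, 1)"
)

# An experiment resolves its location and its type through foreign keys.
row = conn.execute("""
    SELECT g.description, t.name
    FROM nd_experiment e
    JOIN nd_geolocation g ON g.nd_geolocation_id = e.nd_geolocation_id
    JOIN cvterm t ON t.cvterm_id = e.type_id
""").fetchone()
```

The experiment's type is a controlled-vocabulary term rather than free text, which is one concrete sense in which the schema depends on ontologies.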

(21)

CONCEPTS

Key elements of the Logical Data Model for Phenotyping Data

(22)

DIFFERENCES BETWEEN IBDBV1 AND IBDBV2

DIFFERENCES

• User-defined factors, which were central to IBDB v1, have been dropped. In effect, each IBDB v2 study has four factors.

• The old constraint that the factor label had to be maximally discriminatory has been implemented slightly differently, by assuming or imposing an ID in each factor: study id, trial instance, germplasm entry, and field plot.

• Values of variables are stored in their respective components instead of just being split into label levels and data values.

• Effect and representation are now rolled into the concept of dataset and managed through the
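One possible reading of the four imposed IDs is that together they identify where an observed value belongs. The sketch below follows that reading; `ObservationKey`, `record` and the dictionary store are hypothetical names, the ID values are placeholders, and none of this is taken from the IBDB v2 schema itself.

```python
from typing import NamedTuple


class ObservationKey(NamedTuple):
    """Hypothetical composite key built from the four factor IDs."""
    study_id: int
    trial_instance: int
    germplasm_entry: int
    field_plot: int


# In-memory store: key -> {variable name: value} (illustrative only).
observations = {}


def record(key, variable, value):
    """Attach a value to the observation identified by the four IDs."""
    observations.setdefault(key, {})[variable] = value


# Placeholder IDs; the yield value is the first row of the example table.
key = ObservationKey(study_id=1, trial_instance=1, germplasm_entry=2, field_plot=1)
record(key, "Yield", 10.3)
```

Because the key is a fixed tuple of four IDs, every study shares the same addressing scheme, in contrast to the user-defined factors of IBDB v1.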