DATA MANAGEMENT PLAN IN THE REAL LIFE… SCIENCES
Yvan Le Bras Cyril Monjeaud Olivier Collin Jacques Nicolas
Context
Kahn. On the future of genomic data. Science (2011) vol. 331 (6018) pp. 728-9
• Now : Genomics (Next Generation Sequencing)
• Now : Proteomics
• Next : Bio-imaging
• Digital data: huge amount, heterogeneous
Critical situation for some laboratories
Context
• Exchange from one domain to another
• From ICT / IT to scientific domains
• Between scientific domains
E-BIOGENOUEST
From the Biogenouest project to the first French e-Science center : CeSGO
E-Biogenouest
• Started in May 2012 for 3 years
• Funded by Brittany and Pays de la Loire
• E-science initiative for the Biogenouest network
• Test an e-Science approach
• Roadmap preparation
An innovative VRE concept
• More than 120 scientists trained!
• More than 200 users!
• 1669 meetings ;)
• Mission interdisciplinarité CNRS, PIA, IFB, France Génomique, Rapsodyn, Citizen science, UEB C@mpus, CPER, FRM, INCa, H2020
• Health, IT, Agro, Environment
• 7 submitted publications
VRE: a tool for e-Science application
Virtual Research Environment: web portal, data, software, processing resources, user community, collaboration
An innovative VRE approach
• Research Lifecycle
• Open source solutions
• Don’t reinvent the wheel
http://www.jisc.ac.uk/whatwedo/campaigns/res3/jischelp.aspx#simulate
Pool resources
Win-win
Community
HUBzero
Galaxy EMME
Continuum
Continuum data management & analysis
HUBzero : Scientific collaborative platform
eBGO HUB
HUBzero to share knowledge and manage groups and projects
Information
218 users, 111 projects, 53 groups, 729 resources
> 400 unique users per month …
Purdue University
ISAtools : Experimental data management
EMME
ISAtools suite to store data & metadata
Features
- based on biomedical ontologies
- bridge between existing biomedical standards
- formatting for publication submission
- Pydio to upload data
- biological investigation repository (data + metadata)
Oxford eResearch Centre
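The "data + metadata" repository idea above can be illustrated with a minimal sketch of the Investigation / Study / Assay hierarchy that the ISA model is built on. The class and field names below are illustrative assumptions for this slide, not the ISAtools API and not the EMME schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    """One measurement performed within a study (hypothetical fields)."""
    measurement_type: str
    technology_type: str
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    """A unit of research grouping one or more assays."""
    identifier: str
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """Top-level record: the project context plus its studies."""
    identifier: str
    title: str
    studies: List[Study] = field(default_factory=list)

    def data_files(self) -> List[str]:
        # Flatten every data file referenced by every assay of every study.
        return [f for s in self.studies for a in s.assays for f in a.data_files]


# Illustrative record; identifiers and file names are made up.
example = Investigation(
    identifier="INV-1",
    title="Demo investigation",
    studies=[
        Study(
            identifier="STU-1",
            title="Demo study",
            assays=[
                Assay(
                    measurement_type="transcription profiling",
                    technology_type="nucleotide sequencing",
                    data_files=["sample_a.fastq", "sample_b.fastq"],
                )
            ],
        )
    ],
)
```

Keeping the data file names inside the metadata record is what lets a repository answer "which files belong to this investigation?" without inspecting the storage layer.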
Galaxy : Data analysis web platform
GALAXY by GenOuest
To analyse & share data as processes and tools
Information
34917 jobs, 150 users
More than 800 tools
Share
- data
- histories
- workflows
- tools
Penn State University
Pydio : File sharing platform
Pydio by GenOuest
To store & share data as links
Information
- Galaxy workspace
- EMME workspace
- INCa workspace
Share
- data via URI
- control
- safety
- privacy
Abstrium SAS
• For society
• Open Science and open data
• For end-user scientific communities
• Data management plan
• Preserve, access, share & visualise (data & analytics processes)
• Help for project management …
• For ICT
• Facilitate the use of tools Research Service
• Accelerate the switch from development to production
• Optimise infrastructures use (storage, computing & network…)
• Infrastructure for data, infrastructure of data
DMP ON THE LINE
Data storage
Metadata management
Data analysis
Metadata repository
CeSGO & DMP
• Administrative data
• Project name
• Project description
• Name / ID of the principal investigator
• Funding agency
• DMP version
• Data policy applied
• Responsibilities and resources
• Data collection / creation
• Dataset description
• Protocol
• Method
• Equipment
• Quality assurance applied
• Documentation and metadata
• Bii repository
CeSGO & DMP
• Data storage, backup and security
• CeSGO datacenter for the duration of the project (max : 5 years)
• Ethics and legal framework
• Protection of sensitive or personal data
• CC version 4.0
• Data sharing
• Open or restricted access
• Delay : at most 3 years after collection
• Repositories (GEO, Genbank, SRA, Uniprot, PRIDE, …)
• Tools needed to reuse / validate the data
• Data paper
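The DMP outline on the two "CeSGO & DMP" slides can be read as a structured record with two time limits: storage for the project duration capped at 5 years, and sharing within 3 years of collection. The sketch below is one possible encoding, assuming hypothetical class, field and function names; it is not a CeSGO tool or an official DMP schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple

MAX_STORAGE_YEARS = 5        # slide: datacenter for the project duration, max 5 years
MAX_SHARING_DELAY_YEARS = 3  # slide: sharing at most 3 years after collection


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 Feb to 28 Feb when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


@dataclass
class DataManagementPlan:
    # Administrative data
    project_name: str
    principal_investigator: str
    funding_agency: str
    dmp_version: str
    # Sharing and legal framework
    licence: str = "CC version 4.0"  # the slide does not say which CC variant
    open_access: bool = True         # False means restricted access
    repositories: Tuple[str, ...] = ()

    def sharing_deadline(self, collected_on: date) -> date:
        # Latest date by which data collected on `collected_on` should be shared.
        return add_years(collected_on, MAX_SHARING_DELAY_YEARS)

    def storage_end(self, project_start: date, project_years: int) -> date:
        # Storage covers the project duration, capped at MAX_STORAGE_YEARS.
        return add_years(project_start, min(project_years, MAX_STORAGE_YEARS))


# Illustrative plan; all values are made up.
plan = DataManagementPlan(
    project_name="Demo project",
    principal_investigator="PI-0001",
    funding_agency="Demo funder",
    dmp_version="1.0",
    repositories=("GEO", "SRA"),
)
```

Encoding the two limits as named constants keeps the policy visible in one place, so a change of policy is a one-line change rather than a search through the code.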
CESGO: 5 GOALS
CeSGO : Western France e-Science
Data management
Life sciences protocols
metadata
CeSGO : Western France e-Science
• New VREs!
• Connected using semantic web approaches
• Thanks to DOI attribution
CeSGO : Western France e-Science
Reproducibility
cloud
docker
Galaxy
versioning
CeSGO : Western France e-Science
Accessibility
wiki
Public resources
Analytics processes
Publications
Experiments
Thank you for your attention
eBGO HUB (collaboration) http://www.e-biogenouest.org/
Scitizen portal (citizen science) http://scitizen.genouest.org
EMME portal (data management) http://emme.genouest.org/
Galaxy instance (data analysis) http://galaxy.genouest.org/
GO4Bioinformatics (education) https://www.e-biogenouest.org/einfrastructure/education