Academic year: 2020


(1)

Grassroots: An Infrastructure for sharing Services & Data

Simon Tyrrell, Xingdong Bian and Robert P. Davey

Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, UK

(2)

Background

Grassroots is part of the Wheat Information System (WheatIS) and is also funded by Designing Future Wheat (DFW). Its aim is to build a system that responds to the needs of the international wheat community:

● Promotion of an open-access model for data exchange

● Reliance on a distributed system

● Facilitate sharing data and tools

● Promotion of the visibility of each participating platform to contribute to their sustainability.

(3)

Challenges

● Geographic disparity

○ Researchers spread out across the world

● Code reusability

○ Each set of developers re-inventing the wheel

● Data interoperability

○ Different custom formats for storing data

● Service interoperability

○ Connecting similar services

(4)

Goals

An infrastructure for a distributed set of servers to transparently share:

● Data

○ Well-described

○ Can be reused and shared.

● Services

○ Can be added to analysis tools and pipelines

○ Integration

(5)

Typical Web Server-Client Interaction

● User requests data from a web server such as static html pages...

● … or dynamically-generated content such as BLAST or text search results via a Web Service.

[Diagram: Client, Server and Data, with HTML as the labelled response]
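The two kinds of response described on this slide can be sketched in a few lines of Python. This is a toy illustration only and not how Apache or Grassroots work internally: the paths, the page content and the `text_search` helper are all hypothetical, and no real network traffic is involved.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical static content, keyed by request path.
STATIC_PAGES = {"/index.html": "<html><body>Welcome</body></html>"}


def text_search(query):
    """Stand-in for a text search web service over a tiny made-up corpus."""
    documents = ["wheat genome", "yellow rust survey", "wheat BLAST database"]
    return [doc for doc in documents if query.lower() in doc.lower()]


def handle_request(url):
    """Return (status, html) for a static page or a dynamically generated result."""
    parsed = urlparse(url)
    if parsed.path in STATIC_PAGES:
        # Static case: the stored page is returned unchanged.
        return 200, STATIC_PAGES[parsed.path]
    if parsed.path == "/search":
        # Dynamic case: the response is built from the query at request time.
        query = parse_qs(parsed.query).get("q", [""])[0]
        items = "".join(f"<li>{hit}</li>" for hit in text_search(query))
        return 200, f"<ul>{items}</ul>"
    return 404, "<html><body>Not found</body></html>"
```

Calling `handle_request("/index.html")` returns the stored page, while `handle_request("/search?q=wheat")` builds its HTML from the search hits.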

(6)

Grassroots Server Infrastructure - Apache

Apache httpd Web Server

Apache httpd is the most commonly used Web Server

● Open source

● Very configurable

● Robust

● Widely supported

● Easily extensible by adding functionality as modules, e.g.

○ SSL for secure connections

(7)

Server Infrastructure - Grassroots

[Diagram: Apache httpd Web Server, Grassroots Apache module, Grassroots libraries]

● Grassroots Apache module acts as a bridge between Apache and the Grassroots Infrastructure.

● A set of cross-platform libraries that can be used with the Apache web server via a Grassroots module, including

○ Networking code to access code and services across the web

○ Server and Service management tools

○ Standardising access to/from our web services and their parameters

○ Read and write data from different resources e.g.

■ iRODS

■ Local files

■ Dropbox

■ Google Drive

(8)

Server Infrastructure - Heavyweight Services

[Diagram: Apache httpd Web Server, Grassroots Apache module, Grassroots libraries, Heavyweight services]

Grassroots Heavyweight Services

● Programmer-level tools that conform to the Grassroots Services API, which is a strict set of standards to access underlying tools and data

○ BLAST

○ iRODS Search

(9)

Server Infrastructure - Lightweight Services

[Diagram: Apache httpd Web Server, Grassroots Apache module, Grassroots libraries, Lightweight services, Heavyweight services]

Grassroots Lightweight Services

● Structured text files

● Scripts that use Grassroots libraries to access information from other web services e.g.

(10)

● Platform and programming language independent

○ Use any architecture that can produce and consume Grassroots information

○ Clear and easy JSON schema

(11)

JSON

● JSON is an open text format that describes objects using key–value pairs.

● Standard format for transferring data across the Web.

● Can be used by many different programming languages and applications.

{

"services": [{
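To illustrate the key-value structure, here is a minimal Python sketch that parses and re-serialises a small JSON message. Only the top-level `"services"` key comes from the slide. The `"name"` and `"parameters"` fields and their values are hypothetical and are not taken from the Grassroots schema.

```python
import json

# Hypothetical message: only the "services" key appears on the slide.
message = """
{
  "services": [
    {
      "name": "BLAST",
      "parameters": {"query": "ACGT", "database": "Database A"}
    }
  ]
}
"""

# Parse the text into ordinary Python dicts and lists.
parsed = json.loads(message)

# Key-value access works the same way in most languages with a JSON library.
service_names = [service["name"] for service in parsed["services"]]

# Serialise back to text, e.g. to send to another server.
round_trip = json.dumps(parsed, sort_keys=True)
```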

(12)

● Platform and programming language independent

○ Use any architecture that can produce and consume Grassroots information

○ Clear and easy JSON schema

● Distributed information exchange

○ Built upon interconnected web servers and services

○ Requires production and consumption of standardised information

○ Communicate through standardised REST API
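As a rough sketch of what exchanging standardised information over a REST API could look like from a client, the Python below builds (but does not send) an HTTP POST request with a JSON body. The server URL, the payload layout and the `build_service_request` helper are assumptions made for illustration. The actual Grassroots endpoints and message schema are not shown in these slides.

```python
import json
import urllib.request


def build_service_request(server_url, service_name, parameters):
    """Build a POST request carrying a JSON description of the job to run.

    The payload layout here is hypothetical, not the Grassroots schema.
    """
    payload = {"services": [{"name": service_name, "parameters": parameters}]}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        server_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL; nothing is sent over the network in this sketch.
request = build_service_request(
    "https://example.org/grassroots",
    "BLAST",
    {"query": "ACGT"},
)
```

Because every server would accept the same message format, the same request body could in principle be sent unchanged to any participating site.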

(13)

Run a Service

BLAST

(14)

Run the same Service on another Server

[Diagram: BLAST services at the Earlham Institute and the University of Bristol, with Database A and Database B]

(15)

Issues

● Manually having to access each Service individually

● Collation of results

● Human error

○ Not running each service with the same parameters

○ Mistakes when putting the results together
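Both sources of human error can be avoided by scripting the process. The Python sketch below passes one shared parameter set to every service and collates the results in code. The per-site functions, the `score` field and the result layout are hypothetical stand-ins, not real Grassroots calls.

```python
def run_everywhere(services, parameters):
    """Run every service with identical parameters and collate the hits.

    `services` maps a site name to a callable taking a parameter dict and
    returning a list of hit dicts. Both are stand-ins for real remote calls.
    """
    collated = []
    for site, run in services.items():
        # A copy of the same parameters goes to every site, so settings cannot drift.
        for hit in run(dict(parameters)):
            collated.append({"site": site, **hit})
    # Merge into one ranked list instead of combining results by hand.
    return sorted(collated, key=lambda hit: hit["score"], reverse=True)


# Hypothetical stand-ins for two remote BLAST services.
def site_a(params):
    return [{"id": "hit1", "score": 90, "evalue_cutoff": params["evalue"]}]


def site_b(params):
    return [{"id": "hit2", "score": 95, "evalue_cutoff": params["evalue"]}]


results = run_everywhere({"Site A": site_a, "Site B": site_b}, {"evalue": 1e-5})
```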

(16)

Distributed Services

[Diagram: two BLAST services, Database A and Database B, Earlham Institute]

(17)

Different Server, Same List of Services

[Diagram: two BLAST services, Database A and Database B, Earlham Institute]

(18)

Duplicated Services...

[Diagram: two BLAST services, Database A and Database B, Earlham Institute]

(19)

… Get Amalgamated

[Diagram: two BLAST services, Database A and Database B, Earlham Institute]

(20)

Duplicated Services - Under the hood

[Diagram: two BLAST services, Database A and Database B, Earlham Institute]
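One way to picture the amalgamation of duplicated services is as a merge keyed on service name. The Python sketch below is an assumption about the general idea, not the actual Grassroots mechanism. The record layout with `name` and `databases` is hypothetical, as is the assignment of databases to servers.

```python
def amalgamate(service_lists):
    """Merge services that share a name across servers into a single entry.

    `service_lists` maps a server name to a list of service records.
    """
    merged = {}
    for server, services in service_lists.items():
        for service in services:
            entry = merged.setdefault(
                service["name"],
                {"name": service["name"], "databases": [], "servers": []},
            )
            entry["servers"].append(server)
            for database in service["databases"]:
                # Keep each database once, in first-seen order.
                if database not in entry["databases"]:
                    entry["databases"].append(database)
    return list(merged.values())


# Hypothetical layout: which database lives on which server is assumed here.
combined = amalgamate({
    "Server 1": [{"name": "BLAST", "databases": ["Database A"]}],
    "Server 2": [{"name": "BLAST", "databases": ["Database B"]}],
})
```

The user then sees a single BLAST entry offering both databases, while the list of servers records where each request has to be sent.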

(21)

● Platform and programming language independent

○ Use any architecture that can produce and consume Grassroots information

○ Clear and easy JSON schema

● Distributed information exchange

○ Built upon interconnected web servers and services

○ Requires production and consumption of standardised information

○ Communicate through standardised REST API

● Run computational tasks through local/HPC services

● Semantic metadata support

○ Ontologies / controlled vocabularies

○ Data description consistency

(22)

Example Ontology data

"@context" : "http://schema.org",
"Date collected" : { "@type" : "Date", "date" : "2013-05-16" },
"Name/Collector" : { "@type" : "Person", "name" : "Lemmy" },
"Company" : { "@type" : "Organization", "name" : "FooBar Inc" },
"location" : {
"location" : {
"north_east_bound" : { "@type" : "GeoCoordinates", "latitude" : 53.0703866, "longitude" : -0.5396723 },
"south_west_bound" : { "@type" : "GeoCoordinates", "latitude" : 53.0551367, "longitude" : -0.5623362 }
},
"Address" : { "@type" : "PostalAddress", "postalCode" : "LN5 0QG", "addressLocality" : "Welbourn", "addressRegion" : "Lincolnshire", "addressCountry" : "GB" }

(23)

Example Ontology data

"@context" : "http://schema.org",
"Date collected" : { "@type" : "Date", "date" : "2013-05-16" },
"Name/Collector" : { "@type" : "Person", "name" : "Lemmy" },
"Company" : { "@type" : "Organization", "name" : "FooBar Inc" },
"location" : {
"location" : {
"north_east_bound" : { "@type" : "GeoCoordinates", "latitude" : 53.0703866, "longitude" : -0.5396723 },
"south_west_bound" : { "@type" : "GeoCoordinates", "latitude" : 53.05513670000001, "longitude" : -0.5623362 }
},
"Address" : { "@type" : "PostalAddress", "postalCode" : "LN5 0QG", "addressLocality" : "Welbourn", "addressRegion" : "Lincolnshire", "addressCountry" : "GB" }
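Typed fields like these can be consumed generically by software. The Python sketch below uses a cut-down record with values copied from the slide and shows two things a consumer might do: list the schema.org types in use, and test whether a point falls inside the bounding box. The nesting of the `location` object is truncated in the slide text, so the flat layout here is an assumption. The helper functions are hypothetical.

```python
import json

# Values copied from the slide; the flat layout is an assumption.
record = json.loads("""
{
  "@context": "http://schema.org",
  "Date collected": {"@type": "Date", "date": "2013-05-16"},
  "north_east_bound": {"@type": "GeoCoordinates",
                       "latitude": 53.0703866, "longitude": -0.5396723},
  "south_west_bound": {"@type": "GeoCoordinates",
                       "latitude": 53.0551367, "longitude": -0.5623362}
}
""")


def types_used(node):
    """Recursively collect every "@type" value found in a parsed record."""
    found = set()
    if isinstance(node, dict):
        if "@type" in node:
            found.add(node["@type"])
        for value in node.values():
            found |= types_used(value)
    elif isinstance(node, list):
        for item in node:
            found |= types_used(item)
    return found


def in_bounds(rec, latitude, longitude):
    """Check whether a point lies inside the record's bounding box."""
    south_west, north_east = rec["south_west_bound"], rec["north_east_bound"]
    return (south_west["latitude"] <= latitude <= north_east["latitude"]
            and south_west["longitude"] <= longitude <= north_east["longitude"])
```

Because the types come from a shared vocabulary, the same `types_used` and `in_bounds` logic would apply to records from any site that describes its data the same way.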

(24)

Grassroots architecture

● Platform and programming language independent

○ Use any architecture that can produce and consume Grassroots messages

○ Clear and easy JSON schema

● Distributed information exchange

○ Built upon interconnected web servers and services

○ Requires production and consumption of standardised information

○ Communicate through standardised REST API

● Run computational tasks through local/HPC services

● Semantic metadata support

○ Ontologies / controlled vocabularies

○ Data description consistency

● Extensible

○ Adding and integrating services

○ Programming a service conforming to JSON schema API
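To give a feel for what conforming to a JSON schema API means for a new service, the Python sketch below checks a service description against a small set of required fields. It is a hand-rolled stand-in for real schema validation. The field names `name`, `description` and `parameters` are hypothetical and are not taken from the Grassroots schema.

```python
# Hypothetical required fields and their expected Python types.
REQUIRED_FIELDS = {"name": str, "description": str, "parameters": list}


def validation_errors(service):
    """Return a list of problems; an empty list means the description conforms."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in service:
            errors.append(f"missing field: {field}")
        elif not isinstance(service[field], expected):
            errors.append(f"wrong type for {field}: expected {expected.__name__}")
    return errors


good = {"name": "BLAST", "description": "Sequence search", "parameters": []}
bad = {"name": "BLAST", "parameters": "query"}

good_errors = validation_errors(good)
bad_errors = validation_errors(bad)
```

A service whose description passes such a check can be listed and called by any client that understands the shared format, whatever language the service itself is written in.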

(25)

Field Pathogenomics Service

● Collaboration with Diane Saunders at the John Innes Centre

● An intuitive and informative breeder’s tool available at https://grassroots.tools/yellowrust-map/

● Tracking yellow rust pathogen spread over time and location

● Genotype data

○ Sequencing of infected varieties gives pathogen and host information

● Phenotype data

(26)
(27)
(28)
(29)

● Basic Local Alignment Search Tool (BLAST) finds regions of similarity between biological sequences.

● Utilises EI’s high performance computing cluster to run jobs

● Enables high-throughput BLAST searches

● Across multiple databases

● Across multiple sites

○ University of Bristol

● Since 12 Nov 2015 launch

○ 9000+ page visits

○ More than 14000 jobs

● Dynamic web front-end

● Semantically marked-up output

(30)

Acknowledgements

● University of Bristol

○ Paul Wilkinson

○ Mark Winfield

○ Keith Edwards

○ Gary Barker

○ CerealsDB Team

● INRA-URGI (France)

○ Michael Alaux

○ Raphael Flores

○ Hadi Quesneville

● John Innes Centre

○ Ricardo Ramirez Gonzalez

● Earlham Institute

○ Ksenia Krasileva

○ Diane Saunders

○ Matt Clark

○ Erik van den Bergh

○ Toni Etuk

○ Felix Shaw

○ Alice Minotto

○ Jon Wright

○ Paul Bailey

○ Bernardo Clavijo

○ Luis Yanes

(31)

Availability
