
(1)

iDigBio Technology, Cloud and Appliances

Jose Fortes (on behalf of the iDigBio IT team)

iDigBio External Advisory Board Meeting 2012 (Project Year 1)

(2)

Advanced Computing and Information Systems laboratory

CI Stakeholders

Diagram: iDigBio at the center, surrounded by five stakeholder groups: Domain Data Producers, Infrastructure Providers, Domain Service Providers, Domain Data Consumers, and National/Global Data Aggregators.

Entities shown (TCNs and iPlant appear in more than one group): Museums, Amazon WS, Google, Microsoft Azure, DataONE, TCNs, Collectors, GBIF, ALA, Researchers, Amazon Turk, Georeferencing, Imaging services, Data quality, Mapping, EOL, Government, Translation, OCR, BISON, NESCent, Data Conservancy, iPlant, Teachers, Citizens.
(3)


Stakeholders APIs

Diagram: the five stakeholder groups and entities from the CI Stakeholders slide, annotated with what each exchanges with iDigBio through APIs.

Exchange labels shown: Domain data, Appliances, BLOBs, Updates, Notification, Query results, Customer Requests, Processed data, Domain-level data, Usage track.

(4)


Interface Model for iDigBio and TCNs

Infrastructure Providers, National/Global Data Aggregators, Domain Service Providers, Domain Data Consumers

Diagram: iDigBio and its resources, connected to TCNs and the stakeholder groups above through standard interfaces.

Standards and protocols shown: TDWG, XMPP, OCCIWG, REST, WS, WS-I, TAPIR, HTTP, SQL, UTF-8, RDF, XML, X.509, OpenID, SAML, TCP, JPEG2000, ODBC.

Resources and services shown: Virtual Appliances, Machines, Storage, Networking, Learning Modules, Archiving, Data Collections, Structured Data Services, Wiki, Workshop Resources, Workflow Engines, Taxonomic Validation, Data Conversion, Geographical Mapping, Collaboration Tools, Non-structured Data Services.

External parties shown: TCNs, National History Museums, Google App Engine, XSEDE, Microsoft Live, Amazon EC2/S3, Applied Innovations, Microsoft Azure, Google Apps, Federal Collections.

(5)


Building the iDigBio Cloud

Cloud-based strategy

o Providing useful services/APIs (programmatic and web-based)
o Federated scalable object storage and information processing
o Digitization-oriented virtual appliances

Reliance on standards, proven solutions and sustainable software

Continuous consultation with stakeholders

(6)


Unique UF+FSU record

Track record of building cyberinfrastructure

PUNCH and In-VIGO

o Nanohub, Netcare, In-VIGOBlast …

Morphbank AFRESH

o Telecenter
o Archer

(7)


Keeping our eyes on the ball

Common/frequent needs: archival storage, server hosting, feedback on the data, data-intensive transformations …

10-year tsunami of requirements: from being on Facebook to multilingual search-and-compute across multiple data sets…

(8)


Evolution of iDigBio capabilities

Diagram: capabilities plotted against time, with milestones at Q3/2012, Q3/2013, Q3/2014 and Q3/2015.

Capabilities along the time axis:
o Data ingestion
o Data access, provision and visualization
o Provide and enable data feedback
o Data linking and federation
o Process and visualize integrated data

Spanning the whole timeline:
o Increasing storage and server hosting in support of the above
o Increasing number of appliances in support of the above
o Web site for interaction with public, community, education and above

(9)


• Textual data
o JSON document database
o Data ingestion via DwC-A files
o Get / Set API

• Image data
o Internet-accessible object storage
o Upload appliance
o Limited access to low-level APIs

Diagram components: Textual Data (Riak), Image Data (Swift), API Gateway, Internet access.
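The textual-data path on this slide (DwC-A ingestion into a JSON document database with a Get / Set API) can be sketched roughly as follows. This is a minimal illustration, not the iDigBio implementation: `DocumentStore`, `ingest_dwca`, and the sample field names are hypothetical, the store is an in-memory stand-in for the document database, and the sketch assumes a tab-delimited core file named `occurrence.txt` with a header row instead of reading the archive's `meta.xml` descriptor.

```python
import csv
import io
import json
import zipfile


class DocumentStore:
    """In-memory stand-in for a JSON document database (hypothetical)."""

    def __init__(self):
        self._docs = {}

    def set(self, key, record):
        # Keep the record as serialized JSON, as a document database would.
        self._docs[key] = json.dumps(record, sort_keys=True)

    def get(self, key):
        return json.loads(self._docs[key])


def ingest_dwca(archive_bytes, store, core_name="occurrence.txt", id_field="id"):
    """Load every row of the archive's core file into the store; return the row count."""
    count = 0
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        with archive.open(core_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.DictReader(text, delimiter="\t"):
                store.set(row[id_field], row)
                count += 1
    return count


def build_sample_archive():
    """Build a tiny archive in memory so the sketch runs without external files."""
    core = "id\tscientificName\tcountry\n1\tQuercus alba\tUS\n2\tPinus taeda\tUS\n"
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("occurrence.txt", core)
    return buffer.getvalue()


if __name__ == "__main__":
    store = DocumentStore()
    print(ingest_dwca(build_sample_archive(), store))
    print(store.get("1")["scientificName"])
```

A production ingester would read `meta.xml` to find the core file, its delimiter and its column-to-term mapping; the fixed file name here only keeps the example short.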

(10)


• Textual Data
o JSON document database
o Data ingestion via DwC-A files
o Rich RESTful API

• Image Data
o Web-accessible object storage
o Upload appliance
o Fully abstracted storage

• Indexing and Search
o Extract EXIF data from images
o Limited but useful set of indexes
o Intuitive search UI
o Search available via API

• Portal
o Consumes and interfaces text, image and search APIs (minimal server-side code)
o Web-based mapping: client-side JavaScript limits the usable record count to about 50k records at a time

Diagram components: Textual Data (Riak), Image Data (Swift), API Gateway, Internet access, Filter Set, Query interface, EXIF extraction, iDigBio Portal.
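To make the "limited but useful set of indexes" idea concrete, the sketch below indexes only a few chosen fields and caps the size of each result set. Everything in it is an assumption for illustration: the class name `FieldIndex`, the two indexed fields, and the default cap of 50,000 keys, which simply mirrors the mapping limit mentioned above. It does not describe the actual iDigBio search service.

```python
from collections import defaultdict

# Hypothetical choice of indexed fields; the slides do not name them.
INDEXED_FIELDS = ("scientificName", "country")


class FieldIndex:
    """Exact-match inverted index over a small, fixed set of record fields."""

    def __init__(self, fields=INDEXED_FIELDS):
        self.fields = fields
        self._index = {field: defaultdict(set) for field in fields}

    def add(self, key, record):
        # Index only the chosen fields; everything else stays unindexed.
        for field in self.fields:
            value = record.get(field)
            if value:
                self._index[field][value.lower()].add(key)

    def search(self, limit=50_000, **criteria):
        """Return sorted keys matching all criteria, capped at `limit` keys."""
        result = None
        for field, value in criteria.items():
            if field not in self._index:
                raise KeyError(f"field not indexed: {field}")
            matches = self._index[field].get(value.lower(), set())
            result = matches if result is None else result & matches
        return sorted(result or [])[:limit]


if __name__ == "__main__":
    index = FieldIndex()
    index.add("a", {"scientificName": "Quercus alba", "country": "US"})
    index.add("b", {"scientificName": "Quercus alba", "country": "MX"})
    print(index.search(scientificName="quercus alba", country="US"))
```

Querying a field outside the indexed set raises an error rather than scanning every record, which is one way to keep a small index set predictable in cost.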

(11)
(12)


Virtual Appliances in iDigBio

Packaging of software and dependencies in virtual machines

End user/desktop (e.g. VMware, Virtualbox)

Infrastructure-as-a-Service clouds (e.g. OpenStack)

Enhance user experience, facilitate integration with cloud

Image ingestion appliances (short term)

Batch upload of images from a local storage to cloud

Generate GUID/URLs for later processing

Reliable transfers using cloud APIs (e.g. Swift/iDigBio)

Post-processing appliances (OCR tools; end-user or batch)

Geo-referencing appliances (training/verification)
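One way an ingestion appliance might make transfers "reliable" is to retry failed uploads with exponential backoff and accept a transfer only when the store reports the same checksum as the local data. The sketch below assumes exactly that; `upload_with_retry`, `TransientUploadError`, and the uploader callback are hypothetical names, and the checksum handshake is an assumption rather than a description of the Swift or iDigBio APIs.

```python
import hashlib
import time


class TransientUploadError(Exception):
    """Raised by an uploader when a transfer fails but may be retried."""


def upload_with_retry(upload, name, data, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call upload(name, data) until it returns the SHA-256 digest of `data`.

    `upload` is any callable that sends the bytes and returns the digest the
    remote store computed (hypothetical contract for this sketch).
    """
    expected = hashlib.sha256(data).hexdigest()
    for attempt in range(attempts):
        try:
            if upload(name, data) == expected:
                return expected
        except TransientUploadError:
            pass  # fall through to the backoff and try again
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # exponential backoff between tries
    raise RuntimeError(f"upload of {name} failed after {attempts} attempts")


if __name__ == "__main__":
    failures = {"left": 2}

    def flaky_upload(name, data):
        # Simulated uploader: fails twice, then succeeds.
        if failures["left"] > 0:
            failures["left"] -= 1
            raise TransientUploadError(name)
        return hashlib.sha256(data).hexdigest()

    print(upload_with_retry(flaky_upload, "1/100.tif", b"pixels", sleep=lambda s: None))
```

Passing `sleep` as a parameter keeps the backoff testable without real delays.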

(13)


Archer cyber-infrastructure

Hundreds of distributed compute/router nodes; 24/7 operation, 650+ cores

Custom appliance image for computer architecture community

Job scheduling across participating institutions

(14)


Now: appliance proposal process

By users/developers through the iDigBio Web portal

Requirements: demonstrated usage/buy-in, software license, documentation, etc.

Queue of appliances for integration

iDigBio will prioritize and work with developers

Leverage expertise in appliance development

Focus on images that users can download and run on VMware or VirtualBox

(15)


Short term

Diagram components: images captured (e.g. HD/flash media), file interface, ingestion appliance with web-based UI (web server) and cloud client, batch upload through cloud APIs, iDigBio object storage cloud (Swift).

Example local files: /images/1/100.tif, /1/101.tif, /2/200.tif, …

Example mapping after upload: /1/100.tif → GUID1, /1/101.tif → GUID2

Facilitate data ingestion, interface with iDigBio
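The path-to-GUID mapping in the diagram could be produced along the following lines. This is only a sketch under stated assumptions: `build_manifest` and `IMAGE_EXTENSIONS` are hypothetical names, GUIDs are taken to be random UUIDs, and the upload step itself is left out.

```python
import os
import tempfile
import uuid

# Hypothetical set of extensions treated as images.
IMAGE_EXTENSIONS = (".tif", ".tiff", ".jpg", ".jpeg")


def build_manifest(root):
    """Map each image path (relative to `root`, '/'-separated) to a new GUID."""
    manifest = {}
    for folder, _dirs, files in os.walk(root):
        for name in sorted(files):
            if name.lower().endswith(IMAGE_EXTENSIONS):
                relative = os.path.relpath(os.path.join(folder, name), root)
                manifest[relative.replace(os.sep, "/")] = str(uuid.uuid4())
    return manifest


if __name__ == "__main__":
    # Recreate the layout from the slide in a temporary directory.
    with tempfile.TemporaryDirectory() as root:
        for rel in ("1/100.tif", "1/101.tif", "2/200.tif"):
            path = os.path.join(root, *rel.split("/"))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            open(path, "wb").close()
        for rel, guid in sorted(build_manifest(root).items()):
            print(rel, guid)
```

Writing the manifest to disk before uploading would let an interrupted batch resume with the same identifiers, which fits the "generate GUID/URLs for later processing" goal.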

(16)


Medium-term – “Marketplace”

Diagram components: iDigBio Portal, Users/Developers, Community appliances, End users, iDigBio Personnel, iDigBio appliances, Proposals.
(17)


Long-term – information processing

Diagram components: iDigBio Portal, Users/Developers, Community appliances, End users, iDigBio.

(18)


Summary

iDigBio cloud

Service-oriented standards-based cyberinfrastructure focused on the ADBC community needs

Scalable data management and information processing using standard interfaces, data formats, protocols, tools

Toolboxes as appliances

Evolving collection of community-selected tools

Built-in interfaces for effortless iDigBio integration

Embedded best practices and standards in biocollections work

Software re-use when open-source, well maintained, manageable, sustainable and efficient to re-purpose

Feedback and suggestions welcome

(19)


Acknowledgments

National Science Foundation

Judith Skog and Anne Maglia

iDigBio team at University of Florida and Florida State University

(20)


Extras

(21)


iDigBio IT Vision

Cyberinfrastructure to enable the collaborative creation, integration and management of digitized biocollections, and their use in scientific research, education and outreach

Visible as a collection of persistent Internet-accessible services, data and resources

o For biocollection “producers”
o For biocollection “consumers”
o For biocollection service providers
o For cyberinfrastructure providers
o For national/global data aggregators
