The development of the MIRRI ICT infrastructure for microbial resources

Academic year: 2021

(1)

The development of the MIRRI ICT infrastructure for microbial resources

Paolo Romano, Boyke Bunk, Anna Klindworth, David Smith, Alexander Vasilenko, Frank Oliver Glöckner and Vincent Robert

(2)

A common situation …

MS-Access

MySQL

(3)
(4)

Outlook

1. Management system for curators

2. Publication of data for third parties

3. Interoperability

(5)

1. Management system for curators

A. MANAGE COLLECTION’S DATA USING WEB BASED APPLICATIONS

Pros:

• Accessibility to databases from anywhere

• Accessibility to databases using any devices

• Possibly easy to use for basic operations

• Maintenance is easy for IT departments since the software is centrally installed and maintained

• No need for installations on curators’, researchers’ or technicians’ devices (desktop, laptop, tablet, smartphone, etc.) since access is done using browsers

• The same software might be used for the management and the publication of data

Cons:

• Development costs are usually higher

• Development can be significantly more complex because all browsers and their versions must be supported

• Some advanced or even basic functionalities might be much more difficult or impossible to program

• Rich interfaces or memory-demanding operations might be impossible

• The interface can be much slower than that of desktop applications

• Interactions with other software might be more difficult or impossible

• Software maintenance might be more intensive, to keep the application working properly with new browser versions

• Security issues are more complex to handle with web apps than with desktop applications, since the application is potentially accessible from any device by anyone

(6)

1. Management system for curators

B. MANAGE COLLECTION’S DATA USING DESKTOP APPLICATIONS

Pros:

• Rich software interface

• Easy to use

• Fast response to user’s commands

• Memory demanding or interface rich operations can easily be performed (to the technical limits of the OS, computer, etc., of course)

• Relatively easy to develop (for basic functionalities at least)

• Interactions with other software can be easy to establish. Pipelines can be created, and import-export functionalities are easy to implement or to use

• Data access security can easily be ensured

Cons:

• Installation can be problematic (different operating system (OS) versions, missing DLLs, etc.)

• Desktop applications (DA) are usually made for one OS (Windows, Mac or Linux) and won’t work with the others.

• When the software is installed on several computers, updates and upgrades must be re-installed everywhere, which makes bug fixes and new versions harder to roll out

• DA are usually not accessible from a remote computer or device

• For software with limited installation options (a fixed number of licenses), DA might become expensive and/or difficult to update or upgrade

(7)

1. Management system for curators

C. CREATE MANAGEMENT SOFTWARE USING IN-HOUSE RESOURCES

Pros:

• Tailor-made application fitting the needs of the curators perfectly (at design time at least)

• Fast response when implementing new features and fixing bugs

• This solution can be quite cheap if the software remains simple

• Feasible if a strong and stable team of developers is available

Cons:

• Curators or researchers are rarely good software designers or programmers, which makes the resulting solution hard to use, maintain and develop further

• Professional developers are rarely available in culture collections (CC) because they are expensive.

• Good developers tend to leave the CC for better-paid positions, leaving the software unmaintained and hard for newly recruited developers to take over.

• This option can be extremely expensive when the desired functionalities are complex and extensive.

• Most in-house solutions are not (easily, at least) scalable (adding, modifying or removing tables, fields, operations, etc.), and a redesign or complete rewrite of the software is often needed. This leads to an unstable interface for the end-users, which is a key issue.

• Development takes a long time before the software is usable and stable, especially for single developers or small teams.

• Many software packages have been abandoned after a few months or years because they were too slow, difficult to use, user-unfriendly, buggy or unstable. This is a common situation in a CC.

“If you think that professionals are expensive, wait until you work with amateurs …” Red Adair

(8)

1. Management system for curators

E. USE EXISTING OPEN-SOURCE OR FREE SOFTWARE

Pros:

• Large offer

• Very good and advanced software available

• Free of charge

• Unlimited use

• Good for collections with strong IT support and software developers

• Extensions possible by local developers (not always)

Cons:

• No complete solution for culture collections available

• Creation of pipelines is needed and can be difficult to achieve in a user-friendly way

• Using open-source software is far from easy, and in practice it may be impossible to get into the code of others. Access to code can be an illusion

• Support might be a serious issue in case of problems … and there are always problems with any software …

(9)

1. Management system for curators

D. USE EXISTING COMMERCIAL SOFTWARE

Pros:

• Large offer

• Very good and advanced software available

• Support available

• Custom developments can be made by professionals

• Complete or near-complete solutions are available for culture collections, so why reinvent the wheel?

Cons:

• Few complete solutions for culture collections available

• Some solutions are not extensible/flexible or adapted to all collections

• Pure software companies have little biological background, which makes communication difficult

• Costs associated with software can be significant

• Maintenance costs should remain under control

“If you think that professionals are expensive, wait until you work with amateurs …” Red Adair

But

(10)

1. Management system for curators

F. DATABASE TYPES

(11)

1. Management system for curators

F. DATABASE TYPES

Good:

• Relational databases: MySQL, PostgreSQL, MSSQL, Oracle

• Document-based or other advanced databases: MongoDB, Vertica, etc.

• All data should be in databases

Not good:

• Proprietary databases

• Catalogs on paper

• Word

• Excel

• MS-Access

• Filemaker Pro
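
As an illustration of moving data out of a spreadsheet and into a relational database, here is a minimal sketch in Python. It uses the standard-library sqlite3 module only as a convenient stand-in for the engines listed on the slide; SQLite itself, the table layout and the column names are assumptions made for this example, not part of the presentation.

```python
import csv
import io
import sqlite3

# Hypothetical spreadsheet export; the column names are illustrative only.
CSV_EXPORT = """strain_id,species,isolation_year
S-001,Saccharomyces cerevisiae,1998
S-002,Aspergillus niger,2005
"""


def load_strains(conn: sqlite3.Connection, csv_text: str) -> int:
    """Create a strains table, load rows from CSV text, return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS strains ("
        "strain_id TEXT PRIMARY KEY, "
        "species TEXT NOT NULL, "
        "isolation_year INTEGER)"
    )
    rows = [
        (r["strain_id"], r["species"], int(r["isolation_year"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO strains VALUES (?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_strains(conn, CSV_EXPORT)

# Once the data is relational, it can be filtered with plain SQL.
species = [
    row[0]
    for row in conn.execute(
        "SELECT species FROM strains WHERE isolation_year > 2000"
    )
]
```

The primary key and the typed columns are what a spreadsheet cannot enforce: a duplicate strain identifier or a non-numeric year is rejected at load time instead of silently entering the catalog.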

(12)

1. Management system for curators

G. DATABASE ACCESS & BACKUP

Good:

• Backup 2x/day

• Sharding, which is the process of storing data records across multiple servers

• Live replication

• Databases should be physically close to the application, especially for large data exchanges or sequence alignments (for example)

Not good:

• No backup
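
The sharding idea mentioned above can be sketched as a function that maps each record key to one of several servers. This is a minimal illustration in Python, assuming simple hash-modulo placement; the server names and key format are hypothetical, and production database engines ship their own sharding mechanisms.

```python
import hashlib

# Hypothetical server names; a real deployment would use actual hosts.
SHARDS = ["db-server-0", "db-server-1", "db-server-2"]


def shard_for(record_key: str, shards=SHARDS) -> str:
    """Pick a shard deterministically from a stable hash of the record key."""
    digest = hashlib.sha256(record_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def distribute(keys):
    """Group record keys by the shard they are assigned to."""
    placement = {name: [] for name in SHARDS}
    for key in keys:
        placement[shard_for(key)].append(key)
    return placement


placement = distribute(f"STRAIN-{i:04d}" for i in range(100))
```

A stable hash (rather than Python's built-in hash(), which is randomized per process for strings) keeps the placement identical across runs. Note that modulo placement moves most keys when the number of shards changes; schemes such as consistent hashing exist to limit that.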

(13)

1. Management system for curators

H. INSTALLATION OF SOFTWARE, VERSIONING AND INFORMATION TECHNOLOGY (IT) RESOURCE NEEDS

Good:

• No installation, or a simple or minimal one (true for web apps, less so for desktop apps if installed on all computers)

• Hosted solutions are super easy for both IT and users

Not good:

• Very complex installations or parameter settings

• Some LIMS software can be extremely hard, slow or expensive to set up

• Client-server apps are more difficult to maintain if installed on all computers, and updates can be challenging

• IT costs can be high: salaries; servers, hardware, firewalls, SAN, etc.

(14)

1. Management system for curators

I. HOSTED SOLUTIONS

Pros:

• No installation

• Super easy for both IT and users

• Available anywhere, anytime on any device (computer, smartphone, tablet, etc)

• Fast and reliable if backed by a good IT infrastructure and delivered using Citrix

• Easy maintenance of software/databases

• No need to buy hardware (server, SAN, firewalls, etc)

• No need to buy and maintain expensive and sophisticated software for the management and the monitoring of the system (VMWare vSphere, for example)

• No need to hire IT staff

• Continuous monitoring and support

• Given the number of services provided, hosted solutions are often much cheaper than running a complete infrastructure in house

• Management of CC software and associated database can directly be connected to the website used for publication of CC data

Cons:

• Requires recurrent payments (monthly or annual), which means that these costs must be part of the annual budget of the CC

• Access to the database engine might not be possible (only backups of the databases could be requested from time to time)

(15)

1. Management system for curators

J. MOST WANTED FUNCTIONALITIES

We love:

• Collection maintenance

• Strain distribution

• Research

• Screening

• Dynamic system (curators/researchers can change the system without the need for IT or developers)

• Advanced security and access management

• Tracking of database modifications by each user

• Ability to import and export data as text, images, DNA trace files, microplate reader data, MS-Excel, HTML, XML, FASTA, NCBI and more

• Linking or export of data to other websites such as GBIF, StrainInfo, NCBI, etc.

• Ability to create custom layouts such as invoices, catalogs, sample labels

• Strain stock management

• Customer information management
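
To make the export requirement concrete, here is a minimal Python sketch of writing sequence records in FASTA format (a ">" header line followed by the sequence wrapped to a fixed width). The record tuple layout, the identifiers and the 60-column width are assumptions chosen for this example.

```python
def to_fasta(records, width=60):
    """Serialize (identifier, description, sequence) tuples as FASTA text."""
    lines = []
    for identifier, description, sequence in records:
        # Header line: ">" + identifier, optionally followed by a description.
        lines.append(f">{identifier} {description}".rstrip())
        # Sequence lines, wrapped to the requested width.
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


# Hypothetical record: an 80-base sequence for an illustrative strain ID.
fasta = to_fasta([("S-001", "ITS region", "ACGT" * 20)])
fasta_lines = fasta.splitlines()
```

With an 80-base sequence and the default width, the output has one header line, one full 60-base line and one 20-base line.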

(16)

1. Management system for curators

J. MOST WANTED FUNCTIONALITIES

We love:

• LIMS module to manage and track DNA sequencing projects, including revival of strains from collection stocks, DNA prep, PCR, gels, viewing, aligning and editing DNA sequences, and depositing consensus DNA sequences into the database and online catalog

• Scripting and debugger tools to automate routine tasks and extend the functionalities of the software

• Integration of scripts within existing menus of the software

• Reporting functions that allow export of data in many formats, including tab delimited, text, MS-Word, MS-Excel, HTML, FASTA, NCBI, etc.

• Integrated content management system for the administration of CC websites and associated communication devices

• Polyphasic identification and classification, to identify and classify strains based on a custom weighted combination of DNA sequence, physiological, morphological and other species determinations
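
The "custom weighted combination" behind polyphasic identification can be illustrated with a weighted mean of per-criterion similarities. This is only a sketch under stated assumptions: the criterion names, weights and similarity values are invented for the example, and real systems may combine evidence in other ways.

```python
def polyphasic_similarity(similarities: dict, weights: dict) -> float:
    """Weighted mean of per-criterion similarities, each expected in [0, 1]."""
    shared = [name for name in similarities if name in weights]
    total_weight = sum(weights[name] for name in shared)
    if total_weight <= 0:
        raise ValueError("no weighted criteria in common")
    return sum(similarities[n] * weights[n] for n in shared) / total_weight


# Hypothetical weights: DNA sequence counts three times as much as the others.
weights = {"dna_sequence": 3.0, "physiology": 1.0, "morphology": 1.0}

# Hypothetical similarities of an unknown strain to two reference strains.
candidates = {
    "reference A": {"dna_sequence": 0.99, "physiology": 0.70, "morphology": 0.80},
    "reference B": {"dna_sequence": 0.90, "physiology": 0.95, "morphology": 0.90},
}

# Rank the references from most to least similar.
ranked = sorted(
    candidates,
    key=lambda name: polyphasic_similarity(candidates[name], weights),
    reverse=True,
)
```

Here reference A scores 0.894 and reference B scores 0.91, so B ranks first even though A has the closer DNA sequence; changing the weights changes the ranking, which is the point of letting curators choose them.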

(17)

1. Management system for curators

J. MOST WANTED FUNCTIONALITIES

We love:

• Pairwise DNA sequence alignment

• Multiple DNA sequence alignment

• Storage of data in many formats, including text, dates, calculations, literature references, DNA sequence trace files, electrophoresis gel photos, GPS coordinates, microplate reader data (96 or 384 wells), and photos. Data types can thus include morphological, physiological, molecular, chemical, ecological, geographic, and literature reference data

• DNA gel analysis

• Cell size determination

• Import, management, analysis and export of spectral data such as MALDI-TOF or other systems

• Generation of dynamic geographic distribution maps using Google Maps
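
Pairwise sequence alignment, the first item above, can be sketched with the classic Needleman-Wunsch global alignment algorithm. The slides do not name an algorithm, so this choice and the simple scoring scheme (match +1, mismatch -1, gap -1) are assumptions for illustration; production tools use more elaborate scoring.

```python
def needleman_wunsch(a: str, b: str, match=1, mismatch=-1, gap=-1):
    """Global pairwise alignment; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def substitution(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic-programming table.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + substitution(i, j),  # align two bases
                score[i - 1][j] + gap,                     # gap in b
                score[i][j - 1] + gap,                     # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + substitution(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


best_score, aligned_a, aligned_b = needleman_wunsch("ACGT", "AGT")
```

For "ACGT" against "AGT" the best alignment has three matches and one gap, giving a score of 2 with the gap placed opposite the "C".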

(18)

2. Publication of data for third parties

Curators want:

• Direct access to published data

• Easy/live release of new strains and associated data

• Ability to restrict the data access of Internet users/clients if needed

• Easy/live adaptation of webpages and website content

• Websites should be seen as a way to communicate with clients and end-users. This could be done through simple webpages, forums and news systems

• Ability to change the look and some functionalities of the website on the fly, without the intervention of website developers

• Allow deposit forms to be filled in by the depositors of strains, so that nobody has to re-type all data manually

• Allow clients to easily select strains to be ordered via a cart system

• Know pending orders, payments and data associated with any client

• Allow end-users to search the databases according to the specificities of the collection

• Allow third parties to take advantage of the CC’s data to increase traffic to their websites. This can be done via friendly URLs and simple or advanced web services

(19)

2. Publication of data for third parties

Clients want:

• Easy searching system on as many features as possible

• Simple cart system allowing easy (de-)selection of strains to be ordered

• Not having to retype all personal or institutional information each time they order strains

• Fast and easy communication with curators or sales departments of the CC

• Frequently asked questions (FAQ) section answering most of their questions

(20)

2. Publication of data for third parties

End-users want:

• Easy searching system on as many features as possible

• Advanced query system that combines simple queries into complex ones using AND, OR and NOT operators (including brackets to group conditions)

• Easy copy-pasting of data

• Easy export of selected data, manually or via software (web services)

• Pairwise alignments of DNA or protein sequences against reference databases

• Polyphasic identifications and/or classifications against reference databases

• MLST (or similar methods) allowing identification or typing of strains
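
The advanced query system described above (AND, OR, NOT and brackets) can be sketched as a small recursive-descent parser that turns a query string into a filter over records. The query syntax (field=value conditions without spaces), the field names and the sample records are all assumptions made for this illustration.

```python
import re

TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse(query: str):
    """Parse e.g. 'a=b AND (c=d OR NOT e=f)' into a predicate over dicts."""
    tokens = TOKEN.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of query")
        pos += 1
        return tok

    def expr():  # expr := term ("OR" term)*
        left = term()
        while peek() == "OR":
            take()
            right = term()
            left = (lambda l, r: lambda rec: l(rec) or r(rec))(left, right)
        return left

    def term():  # term := factor ("AND" factor)*
        left = factor()
        while peek() == "AND":
            take()
            right = factor()
            left = (lambda l, r: lambda rec: l(rec) and r(rec))(left, right)
        return left

    def factor():  # factor := "NOT" factor | "(" expr ")" | field=value
        tok = take()
        if tok == "NOT":
            inner = factor()
            return lambda rec: not inner(rec)
        if tok == "(":
            inner = expr()
            if take() != ")":
                raise ValueError("missing closing bracket")
            return inner
        field, sep, value = tok.partition("=")
        if not sep:
            raise ValueError(f"expected field=value, got {tok!r}")
        return lambda rec: str(rec.get(field, "")).lower() == value.lower()

    predicate = expr()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return predicate


# Hypothetical catalog records.
records = [
    {"id": 1, "genus": "Aspergillus", "country": "Italy"},
    {"id": 2, "genus": "Aspergillus", "country": "Germany"},
    {"id": 3, "genus": "Penicillium", "country": "Italy"},
]
matches = parse("genus=Aspergillus AND NOT (country=Germany OR country=France)")
hits = [rec["id"] for rec in records if matches(rec)]
```

The grammar gives NOT the highest precedence, then AND, then OR, with brackets overriding both, which matches the usual reading of such queries. In a real catalog the parsed tree would more likely be translated into a database query than evaluated record by record.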

(21)

3. Interoperability

DATA STANDARDS AND PROTOCOLS

BioSharing (http://biosharing.org/)

Biodiversity Information Standards (TDWG; http://www.tdwg.org/)

Genomic Standards Consortium (GSC; http://en.wikipedia.org/wiki/Genomic_Standards_Consortium)

etc.

LINKS TO EXISTING RESOURCES

STRAININFO

WDCM

TAXONOMIC DATABASES (MYCOBANK, DSMZ, ETC)

GBIF

INSDC (NCBI, EMBL, DDBJ, ETC)

BOLD

LIFEWATCH, BIOVEL, VIBRANT, LIFELINK, ELIXIR, Q-BANK, ETC

MANY MORE

(22)

Work in progress

We need your help, opinions, suggestions and criticism

Contact us : [email protected]
