• No results found

A Web Service Approach for Distributed Access to Methods, Data and Models (I Don’t Care Where My Data and Methods Are)

N/A
N/A
Protected

Academic year: 2020

Share "A Web Service Approach for Distributed Access to Methods, Data and Models (I Don’t Care Where My Data and Methods Are)"

Copied!
27
0
0

Loading.... (view fulltext now)

Full text

(1)

A Web-Service Approach for

Distributed Access to Methods, Data

and Models

Rajarshi Guha Geoffrey Fox Kevin E. Gilbert

Marlon Pierce David Wild

Overview

Pub3D

Model Exchange

A Web-Service Approach for Distributed

Access to Methods, Data and Models

(I Don’t Care Where My Data and Methods Are)

Rajarshi Guha Geoffrey Fox Kevin E. Gilbert Marlon Pierce David Wild

School of Informatics Indiana University

(2)

A Web-Service Approach for

Distributed Access to Methods, Data

and Models

Rajarshi Guha Geoffrey Fox Kevin E. Gilbert

Marlon Pierce David Wild

Overview

Pub3D

Model Exchange

Outline

Overview

Pub3D

(3)

A Web-Service Approach for

Distributed Access to Methods, Data

and Models

Rajarshi Guha Geoffrey Fox Kevin E. Gilbert

Marlon Pierce David Wild

Overview

Pub3D

Model Exchange

Local versus Distributed Resources

I Access arbitrary resources (methods, data, applications)

(4)

A Web-Service Approach for

Distributed Access to Methods, Data

and Models

Rajarshi Guha Geoffrey Fox Kevin E. Gilbert

Marlon Pierce David Wild

Overview

Pub3D

Model Exchange

Combining Data & Functionality

I But what do we mean by “combining” all these resources?

I Different levels of complexity

I Keep track of new additions to a database via an RSS

feed

I Use Yahoo Pipes to combine and manipulate output

easily

I Write full fledged programs in your language of choice

(5)

A Web-Service Approach for

Distributed Access to Methods, Data

and Models

Rajarshi Guha Geoffrey Fox Kevin E. Gilbert

Marlon Pierce David Wild

Overview

Pub3D

Model Exchange

Web Services - What’s Available

Cheminformatics functionality

I Molecular descriptors

I Similarity (2D, 3D)

I Format conversion

I Depictions

I 3D coordinate generation

I Summarized on ChemBioGrid

Dong, X. et al,J. Chem. Inf. Model.,2007,47, 1303–1307

(6)

A Web-Service Approach for

Distributed Access to Methods, Data

and Models

Rajarshi Guha Geoffrey Fox Kevin E. Gilbert

Marlon Pierce David Wild

Overview

Pub3D

Model Exchange

Web Services for Modeling

I Computational web services can be viewed as wrappers around the actual program

I Since predictive models are a common feature in cheminformatics we’d like to support them as well

I This leads to a number of requirements

I Ability to develop models

I Store (deploy) models

I Use the models via the web service infrastructure

I We provide a computational infrastructure based on R which supports

I Feature selection

I Model development

I Model deployment

I Arbitrary R code

(7)

A Web-Service Approach for

Distributed Access to Methods, Data

and Models

Rajarshi Guha Geoffrey Fox Kevin E. Gilbert

Marlon Pierce David Wild

Overview

Pub3D

Model Exchange

What’s Available?

I Regression (OLS, CNN, RF)

I Classification (LDA)

I Clustering (k-means)

I Feature selection (stepwise and exhaustive)

I Automated model generation

I Load X, Y data, build linear and non-linear models with

optional LOO CV

I Deployed models

I Random forests for 60 NCI DTP cell lines

I Cytotoxicity

I Ames mutagenicity

(8)

A Web-Service Approach for

Distributed Access to Methods, Data

and Models

Rajarshi Guha Geoffrey Fox Kevin E. Gilbert

Marlon Pierce David Wild

Overview

Pub3D

Model Exchange

Web Services - What’s Available

Data sources

I We maintain a number of databases

I PubChem mirror (mainly for local research)

I Pub3D

I Directly accessible via queries in SQL

I We also encapsulate specific queries as web service calls

(9)

A Web-Service Approach for

Distributed Access to Methods, Data

and Models

Rajarshi Guha Geoffrey Fox Kevin E. Gilbert

Marlon Pierce David Wild

Overview

Pub3D

Model Exchange

REST Frontends to SOAP Services

I REST is a network architecture that avoids complex message formats

I No extra libraries required

I Access services using URL’s

I Much simpler interface compared to SOAP

I We’ve been putting REST interfaces onto a number of our SOAP services

I Current REST services include

I Database (3D structure, PubChem mirror)

I Molecular descriptors

(10)

A Web-Service Approach for

Distributed Access to Methods, Data

and Models

Rajarshi Guha Geoffrey Fox Kevin E. Gilbert

Marlon Pierce David Wild

Overview

Pub3D

Model Exchange

REST 3D Structure Service

I To get a 3D structure for a PubChem CID

(11)

A Web-Service Approach for

Distributed Access to Methods, Data

and Models

Rajarshi Guha Geoffrey Fox Kevin E. Gilbert

Marlon Pierce David Wild

Overview

Pub3D

Model Exchange

REST Depiction Service

I To get a 2D depiction for arbitrary SMILES

(12)

A Web-Service Approach for

Distributed Access to Methods, Data

and Models

Rajarshi Guha Geoffrey Fox Kevin E. Gilbert

Marlon Pierce David Wild

Overview

Pub3D

Model Exchange

REST Descriptor Service

I To get descriptors for an arbitrary SMILES string

(13)

A Web-Service Approach for

Distributed Access to Methods, Data

and Models

Rajarshi Guha Geoffrey Fox Kevin E. Gilbert

Marlon Pierce David Wild

Overview

Pub3D

Model Exchange

Pub3D

I A 3D structure database derived from PubChem

I Current version contains a single 3D structure for 17M compounds

I Structures obtained using MMFF94

I Not the lowest energy conformer

I Structures can be retrieved in SD format by CID using a web page or web service interfaces

I We also store distance moment shape descriptors allowing us to perform shape similarity searches

I http://www.chembiogrid.org/cheminfo/p3d/

(14)

A Web-Service Approach for Distributed Access to Methods, Data and Models Rajarshi Guha Geoffrey Fox Kevin E. Gilbert

Marlon Pierce David Wild Overview Pub3D Model Exchange

Pub3D Performance

I Shape searches can be as fast as 5s, for reasonably large result sets

I Fast enough for us to explore the “density of space” around a given query compound

Query Times for 10000 Random CIDs (R = 0.4)

Time (sec)

Number of Queries

0 20 40 60 80

0 500 1000 1500 2000 2500 3000 3500

0.0 0.1 0.2 0.3 0.4 0.5 0.6

0 20000 60000 100000 140000 Radius

Nearest Neighbor Count

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

1 0.91 0.83 0.77 0.71 0.67 0.62

Similarity Levitra (110634) Diazepam (3016) Taxol (36314) Didanosine (50599) Calcascorbin (6247)

(15)

A Web-Service Approach for

Distributed Access to Methods, Data

and Models

Rajarshi Guha Geoffrey Fox Kevin E. Gilbert

Marlon Pierce David Wild

Overview

Pub3D

Model Exchange

Pub3D & Clustering

I Indexing gives good performance

I As index size increases, performance degrades

I Could add more RAM

I Clustering the database

allows us to scale to

(16)

A Web-Service Approach for

Distributed Access to Methods, Data

and Models

Rajarshi Guha Geoffrey Fox Kevin E. Gilbert

Marlon Pierce David Wild

Overview

Pub3D

Model Exchange

Pub3D & Clustering

I Indexing gives good performance

I As index size increases, performance degrades

I Could add more RAM

I Clustering the database allows us to scale to

(17)

A Web-Service Approach for

Distributed Access to Methods, Data

and Models

Rajarshi Guha Geoffrey Fox Kevin E. Gilbert

Marlon Pierce David Wild

Overview

Pub3D

Model Exchange

Pub3D and Conformers

I Single, arbitrary conformers aren’t too useful

I We have currently generated conformers for a subset of PubChem (4 to 10 heavy atoms)

I 3 Kcal energy window

I 243,892 compounds

I 2M conformers

I Clustering will be vital when conformers are considered

I Allows us to handle arbitrary numbers of conformers

I May need to consider some sort of compression

(18)

A Web-Service Approach for

Distributed Access to Methods, Data

and Models

Rajarshi Guha Geoffrey Fox Kevin E. Gilbert

Marlon Pierce David Wild

Overview

Pub3D

Model Exchange

Model Exchange

I The literature contains many published models

I No way to utilize them unless we manually rebuild them

I In many cases we will not have access to descriptors

I Difficult to search for models specifically

I We should be able to do the following

I Search for models

I Exchange them

I Execute them

I Is this a “format” issue?

(19)

A Web-Service Approach for

Distributed Access to Methods, Data

and Models

Rajarshi Guha Geoffrey Fox Kevin E. Gilbert

Marlon Pierce David Wild

Overview

Pub3D

Model Exchange

Model Exchange

I The literature contains many published models

I No way to utilize them unless we manually rebuild them

I In many cases we will not have access to descriptors

I Difficult to search for models specifically

I We should be able to do the following

I Search for models

I Exchange them

I Execute them

I Is this a “format” issue?

(20)

A Web-Service Approach for

Distributed Access to Methods, Data

and Models

Rajarshi Guha Geoffrey Fox Kevin E. Gilbert

Marlon Pierce David Wild

Overview

Pub3D

Model Exchange

PMML for Model Exchange

I Predictive ModellingMarkup Language

I A standard supported by the Data Modeling Group

I Allows you to serialize predictive models to XML

I Linear, logistic regression

I Tree models

I Neural networks, Na¨ıve Bayes models

I Association models

I Ensemble models (random forests, arbitrary ensembles)

I Supported by a number of platforms

I IBM, Salford Systems

I SAS, SPSS

(21)

A Web-Service Approach for

Distributed Access to Methods, Data

and Models

Rajarshi Guha Geoffrey Fox Kevin E. Gilbert

Marlon Pierce David Wild

Overview

Pub3D

Model Exchange

PMML for Model Exchange

I Since it’s XML, it’s extensible

I PMML allows us to specify the model itself

I But we need to add extra information to make a model truly searchable

I Provenance (who made it, when was it made, . . . )

I Property (what is being modeled)

(22)

A Web-Service Approach for

Distributed Access to Methods, Data

and Models

Rajarshi Guha Geoffrey Fox Kevin E. Gilbert

Marlon Pierce David Wild

Overview

Pub3D

Model Exchange

PMML for Model Exchange

Provenance

I Easily solved using Dublin Core

Property

I Could be addressed using keywords

I Could reuse pre-existing ontologies

Requirements

I This is tricky

I Need to identify what descriptors were used

I What software, version

(23)

A Web-Service Approach for

Distributed Access to Methods, Data

and Models

Rajarshi Guha Geoffrey Fox Kevin E. Gilbert

Marlon Pierce David Wild

Overview

Pub3D

Model Exchange

PMML for Model Exchange

Provenance

I Easily solved using Dublin Core

Property

I Could be addressed using keywords

I Could reuse pre-existing ontologies

Requirements

I This is tricky

I Need to identify what descriptors were used

I What software, version

(24)

A Web-Service Approach for

Distributed Access to Methods, Data

and Models

Rajarshi Guha Geoffrey Fox Kevin E. Gilbert

Marlon Pierce David Wild

Overview

Pub3D

Model Exchange

PMML for Model Exchange

Provenance

I Easily solved using Dublin Core

Property

I Could be addressed using keywords

I Could reuse pre-existing ontologies

Requirements

I This is tricky

I Need to identify what descriptors were used

I What software, version

(25)

A Web-Service Approach for

Distributed Access to Methods, Data

and Models

Rajarshi Guha Geoffrey Fox Kevin E. Gilbert

Marlon Pierce David Wild

Overview

Pub3D

Model Exchange

Summary

I Pub3D is a shape searchable version of PubChem

I Conformers will have to be considered for it to be useful

I Are searches meaningful? Benchmarks required

I Web services provide one approach to model deployment

I We should be able to search for models explicitly

I PMML is one extensible solution the addresses model

exchange

(26)

A Web-Service Approach for

Distributed Access to Methods, Data

and Models

Rajarshi Guha Geoffrey Fox Kevin E. Gilbert

Marlon Pierce David Wild

Overview

Pub3D

Model Exchange

Summary

I Pub3D is a shape searchable version of PubChem

I Conformers will have to be considered for it to be useful

I Are searches meaningful? Benchmarks required

I Web services provide one approach to model deployment

I We should be able to search for models explicitly

I PMML is one extensible solution the addresses model exchange

(27)

A Web-Service Approach for

Distributed Access to Methods, Data

and Models

Rajarshi Guha Geoffrey Fox Kevin E. Gilbert

Marlon Pierce David Wild

Overview

Pub3D

Model Exchange

Acknowledgments

I Kangseok Kim

References

Related documents

violations of the Conflict of Interest Code at the request of another Member or by Resolution of the House or on his own initiative. Germany has had a Code of Conduct for members

The two most common methods used to capture leads are lead retrieval systems offered by an organizer, 74 percent, or a paper-based lead form approach, 59 percent.. Custom

Unlike standard bunching approaches in which the notched incentive is the same for all individuals (for example a tax rate notch as in Kleven & Waseem 2013 ), in our setting

Genre: Action, Drama Director: Aku Louhimies Production company: Solar Films Stage of project: In development Looking for: Pre-sales, co-producers, distributors, financing.

As we have seen, despite the increasing emphasis on the importance of the business model to an organization’s success, there has been a lack of consensus regarding its definition

necessarily  (solely)  due  to  the  inflammation  of  the  rheumatic  disease  but  can  also  be . influenced  by  other  factors.  ESR,  for  instance,  can  be 

Re: Federal-State Joint Board on Universal Service; TracFone Wireless, Inc.; Petitions for Designation as an Eligible Telecommunications Carrier in the States of New York, Florida,

Average market prices per trading period by policy treatment for student and professional subjects 60 70 80 90 100 1 5 10 15 20 1 5 10 15 20 Average Price (Tokens) No