
Automation and semantics: the CombeChem experience


Academic year: 2020


Automation and Semantics: The CombeChem Experience

Jeremy Frey

Informatics & Data Visualisation

Intech Centre, October 2004

Talk: Workflow

Introduction to e-Science & the CombeChem Project

Smart Labs

Semantics & Databases

Publication@Source

e-Science

‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’

‘e-Science will change the dynamic of the way science is undertaken.’

John Taylor, DG of UK OST

‘[The Grid] intends to make access to computing power, scientific data repositories and experimental facilities as easy as the Web makes access to information.’

Tony Blair, 2002

What is the web?

Publication@Source

Trace all the way back from publication to the original data: provenance

CombeChem

Who needs provenance?

Bush, Blair & Hutton 2004

The CombeChem Project

The exponential world of combinatorial synthesis and high throughput analysis meets the exponentially growing power of computing

Automation, Semantics & the Grid: End to End linking of data and information

In chemistry this can be a very long chain

The CombeChem Project

Collect data with regard to how it could eventually be used

Make sure the metadata is of high quality

Record properly at source

The Chemistry Lab

People & Machines working together

People

Chemistry (Southampton & Bristol)

Mike Hursthouse, Chris Frampton, Jon Essex, Jeremy Frey, Guy Orpen, Stephan Christensen, Thomas Gelbrich, Sam Peppe, Hongchen Fu, Graham Tizard, Suzanna Ward, Lefteris Danos, Jamie Robinson, Kieron Taylor

National Crystallography Service (NCS)

Simon Coles, Mark Light, Ann Bingham

Electronics and Computer Science (Southampton)

Dave De Roure, Luc Moreau, Mike Luck, Hugo Mills, Graham Smith, Simon Miles, Nicky Harding, Gareth Hughes, monica Schraefel, Terry Payne

IT Innovation (Southampton)

Mike Surridge, Ken Meacham, Steve Taylor, Daren Marvin

Statistics (Southampton)

Alan Welsh, Sue Lewis, Ralph Manson, Dave Woods

CombeChem Partners

[Diagram] Bristol (Chemistry); Southampton (ECS, Stats, Chemistry, Combi Centre); NCS; IUPAC; RSC; IBM; CCDC; Pfizer; IT Innovation; GSK; AZ

[Diagram] Labels: Literature; Analysis; Statistics; Plan; Access to data; Experiments; Smart Labs; High Throughput measurement; Dissemination; E-Bank; Data; Design (statistics)

Smart Laboratory

[Diagram] Labels: Goal; Plan & COSHH; Literature; Synthesis; Analysis; Digital Model; Information Integration; Report; Knowledge

Not just one laboratory but many co-laboratories working together

Chemists and programming

Many Chemists think that they can program

What about that! His brain still uses Perl scripts

e-Workflow

Some Chemists can

Plans

Continuum of plan type:

Small set of fixed plans

Variable plans, written by chemist (difficult!)

Ad-hoc, implied by process execution

Diagram labels: NCS, Tea, SHG

A chemistry lab is a hostile environment without much room to maneuver

What can be captured automatically with sensors?

What must rely on manual annotation?

The fume cupboard

The chemist

Critical data entry

Industrial support

October 2004

October 2004 Jeremy FreyJeremy Frey IntechIntech Informatics Informatics

Getting real

Getting real

Functional prototype for in-lab, real use testing

Functional prototype for in-lab, real use testing


very precise scales - but not connected to any recording device


Much more automation in modern chemistry

“That is so cool Dave, you only need a palm pilot”

Getting not just the what and how, but the why

Data model

Process record

Provenance record

Measurements

Processes

Annotations

Service invocations

Secure time-stamps

etc.

(Diagram axis label: Increasing detail)

Plan

Intended actions: guide to chemist, or [later] workflow

Review over Tea

We ran through our lo-fi prototypes with chemists by running the tea experiment

They knew what was going on and could comment on veracity, features, process

Extensions:

Ray Cooke: Scrolling through lab books

Will Davies: Automating TLC plate capture for record and annotation

[Architecture diagram] Labels: User; Services; Results; Data; Semantic Web Links; Data access via Semantic Web; Semantic Data; Provenance Data; Middleware (SOAP); Data storage via Jena

Databases

Databases will become the key method of handling all data

Metadata must be generated at inception and added as data traverses the workflow

Version control, audit and backup handled at the database level.
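
The point about metadata being generated at inception and then added to at each workflow step can be sketched as follows. This is a minimal illustration only, not the CombeChem implementation: the `DataItem` class, the step names, and the `balance-1` instrument label are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DataItem:
    """A value plus the metadata that travels with it (hypothetical structure)."""
    value: float
    units: str
    metadata: list = field(default_factory=list)

    def annotate(self, step: str, **details) -> "DataItem":
        # Append-only: earlier entries are never edited, which keeps an audit trail.
        entry = {"step": step, "at": datetime.now(timezone.utc).isoformat(), **details}
        self.metadata.append(entry)
        return self


def weigh(value_g: float, instrument: str) -> DataItem:
    # Metadata is generated at inception, when the value is first recorded.
    return DataItem(value_g, "g").annotate("weigh", instrument=instrument)


def round_for_report(item: DataItem, places: int) -> DataItem:
    # A later workflow step copies the history and adds its own entry to it.
    rounded = DataItem(round(item.value, places), item.units, list(item.metadata))
    return rounded.annotate("round_for_report", places=places, source_value=item.value)


sample = weigh(2.0719, instrument="balance-1")  # "balance-1" is a made-up label
reported = round_for_report(sample, 2)
print(reported.value, [m["step"] for m in reported.metadata])
```

The derived value carries the full history of how it was produced, while the original item is left unchanged.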

Databases - Our experience

What do you do when the actual users keep changing their minds?

Is a traditional relational database suitable?

Danger of reinforcing scientific bias against relational databases for laboratory data.

Property in RDF

<c:OrganicMolecule rdf:about="file:///storage/ba8efc2ce0edada69d63b02d1b8630c6.rdf">
  <c:has-inchi>1.12Beta/C12H13NO2/c1-2-15-8-9-5-6-11(14)12-10(9)4-3-7-13-12/h1H3,2H2,3-7H,8H2,14H</c:has-inchi>
  <c:has-cas>22049-19-0</c:has-cas>
  <c:has-empirical-formula>C12H13NO2</c:has-empirical-formula>
  <c:has-stereocentres>0</c:has-stereocentres>
  <c:has-property>
    <c:MeltingPoint>
      <c:has-information>
        <c:Information>
          <c:has-value>150</c:has-value>
          <c:has-uncertainty>
            <c:Range>
              <c:has-value>16</c:has-value>
            </c:Range>
          </c:has-uncertainty>
        </c:Information>
      </c:has-information>
    </c:MeltingPoint>
  </c:has-property>
</c:OrganicMolecule>
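
To show how such a record might be read back, here is a small sketch using Python's standard `xml.etree.ElementTree`. The slide does not show the namespace declarations, so the `c:` namespace URI `http://example.org/chem#` is a placeholder assumption, as is the enclosing `rdf:RDF` wrapper; the record is abbreviated to the fields used.

```python
import xml.etree.ElementTree as ET

# Assumption: the real "c:" namespace URI is not given on the slide;
# "http://example.org/chem#" is a placeholder.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "c": "http://example.org/chem#",
}

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                  xmlns:c="http://example.org/chem#">
  <c:OrganicMolecule rdf:about="file:///storage/ba8efc2ce0edada69d63b02d1b8630c6.rdf">
    <c:has-cas>22049-19-0</c:has-cas>
    <c:has-property>
      <c:MeltingPoint>
        <c:has-information>
          <c:Information>
            <c:has-value>150</c:has-value>
            <c:has-uncertainty>
              <c:Range><c:has-value>16</c:has-value></c:Range>
            </c:has-uncertainty>
          </c:Information>
        </c:has-information>
      </c:MeltingPoint>
    </c:has-property>
  </c:OrganicMolecule>
</rdf:RDF>"""


def melting_point(xml_text):
    """Return (CAS number, melting point value, uncertainty range) from the record."""
    root = ET.fromstring(xml_text)
    mol = root.find("c:OrganicMolecule", NS)
    info = mol.find("c:has-property/c:MeltingPoint/c:has-information/c:Information", NS)
    # Direct child only: the nested has-value inside c:Range is the uncertainty.
    value = float(info.find("c:has-value", NS).text)
    spread = float(info.find("c:has-uncertainty/c:Range/c:has-value", NS).text)
    return mol.findtext("c:has-cas", namespaces=NS), value, spread


print(melting_point(DOC))
```

A plain XML parser is enough for a fixed layout like this one; a dedicated RDF toolkit would be the more robust choice when the same triples can be serialised in several ways.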

Schema

<rdfs:Class rdf:about="&c;OrganicMolecule">
  <rdfs:label>Organic Molecule</rdfs:label>
  <rdfs:subClassOf rdf:resource="&c;Molecule" />
</rdfs:Class>

<rdfs:Class rdf:about="&c;PhysicalProperty">
  <rdfs:label>Property</rdfs:label>
</rdfs:Class>

<rdfs:Class rdf:about="&c;PartitionCoefficient">
  <rdfs:label>Partition Coefficient</rdfs:label>
  <rdfs:subClassOf rdf:resource="&c;PhysicalProperty" />
  <rdfs:description>Ratio of substance dissolved in octan-1-ol and water</rdfs:description>
</rdfs:Class>

This turns out to be a very flexible approach
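
One way to see where the flexibility comes from: the class hierarchy is itself data, so a new kind of property is one more link rather than a table redesign. The sketch below models only the `subClassOf` links from the schema above as a Python dictionary; the helper names are hypothetical, and the `MeltingPoint` link is an assumption, since its class definition is not shown on the slide.

```python
# Subclass links taken from the schema on the slide (child -> parent).
SUBCLASS_OF = {
    "OrganicMolecule": "Molecule",
    "PartitionCoefficient": "PhysicalProperty",
}


def ancestors(cls):
    """Walk the subClassOf chain upwards and return every superclass in order."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_a(cls, target):
    """True if cls is target or inherits from it through subClassOf links."""
    return cls == target or target in ancestors(cls)


# When users change their minds, a new property type is a single added link.
# Assumption: MeltingPoint as a PhysicalProperty is not in the slide's schema.
SUBCLASS_OF["MeltingPoint"] = "PhysicalProperty"

print(is_a("PartitionCoefficient", "PhysicalProperty"))
print(ancestors("OrganicMolecule"))
```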


[Diagram: experiment plan with recorded measurements and annotations]

Plan steps:

Dissolve 4-fluorinated biphenyl in butanone

Add K2CO3 powder

Heat at reflux for 1.5 hours

Recorded measurements:

Sample of 4-fluorinated biphenyl: Weigh, 0.9031 grammes

Butanone: Measure, 40 ml

Sample of K2CO3 powder: Weigh, 2.0719 g

Annotations (text):

Butanone dried via silica column and measured into 100 ml RB flask. Used 1 ml extra solvent to wash out container.

Started reflux at 13:30. (Had to change heater stirrer.) Only reflux for 45 min, next step 14:15.

Ingredient List:

Fluorinated biphenyl 0.9 g

Br11OCB 1.59 g

Potassium Carbonate 2.07 g

Butanone 40 ml


Lessons

We need two related ontologies:

Plan: what is going to be done

Record: what was done

These are not necessarily the same thing

Steps are added/repeated during the experiment
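
The difference between plan and record can be made concrete with the reflux example from the earlier slide, where the plan called for 1.5 hours at reflux but the record notes only 45 minutes and an unplanned change of heater stirrer. The sketch below is an illustration under assumptions: the `Step` class and `compare` function are hypothetical, and steps are matched naively by action name in order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str
    detail: str = ""


# What was going to be done.
plan = [
    Step("Dissolve", "4-fluorinated biphenyl in butanone"),
    Step("Add", "K2CO3 powder"),
    Step("Reflux", "1.5 hours"),
]

# What was done, including an unplanned annotation and a shortened reflux.
record = [
    Step("Dissolve", "4-fluorinated biphenyl in butanone"),
    Step("Add", "K2CO3 powder"),
    Step("Annotate", "Had to change heater stirrer"),
    Step("Reflux", "45 min"),
]


def compare(plan, record):
    """Classify each recorded step against the plan, matching action names in order."""
    planned = list(plan)
    report = []
    for done in record:
        if planned and planned[0].action == done.action:
            expected = planned.pop(0)
            status = "as planned" if expected.detail == done.detail else "changed"
        else:
            status = "added"
        report.append((done.action, status))
    # Anything left in the plan was never carried out.
    report.extend((p.action, "not done") for p in planned)
    return report


for action, status in compare(plan, record):
    print(f"{action}: {status}")
```

Keeping the two lists separate, rather than overwriting the plan with what happened, is what preserves the deviations.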

No, the computers are up, we’re down

Experiments on the Grid

National Crystallography

Grid Service


Security

and trust

for

The Grid Zone

Security is fundamental

Who is using our experiments

Insulate them from each other and from the rest of our institution

Process & Role based security

Use DMZ

This combination creates a Grid Zone

NCS Grid Service Architecture


Dissemination & Publication

A different approach is required to provide data to the community

The grid provides the necessary medium

What & how do we want to make available

Journals: Publication @ source

[Diagram] Labels: Journal; Materials; Database; Multimedia; Laboratory Data; Paper

SVG active graphics

Entire E-Science Cycle

Encompassing experimentation, analysis, publication, research, learning

[Diagram] Labels: Grid; E-Scientists; Institutional Archive; Local Web; Publisher Holdings; Digital Library; Graduate Students

The need for xtl-Prints

100’s of structures

How do we disseminate?

National Crystallography Service

CombiChem

CombeChem

The need for xtl-Prints

DATA

PUBLICATION

Grid

Semantic (Pervasive) Grid

e-worries

WSRF

GTi

Must ensure this is not a problem for applications

Making sure other people can re-use your data easily and with confidence

Even when there is a huge amount of it!


Web sites?

www.combechem.org

www.smarttea.org

www.soton.ac.uk/~xservice

ecrystals.chem.soton.ac.uk

Changing the way we work

[Diagram] Labels: Data Provenance; Quantum Mechanical Analysis; Properties Prediction; Data Mining, QSAR, etc.; Design of Experiment; E-Lab: Combinatorial Synthesis; E-Lab: Properties Measurement; E-Lab: X-Ray Crystallography; Laboratory Processes; Structures DB; Properties DB; Data Streaming; Authorship/Submission; Visualisation; Agent Assistant
