Automation and Semantics: The CombeChem Experience
Jeremy Frey
Informatics & Data Visualisation
Intech Centre, Oct 2004
Talk: Workflow
Introduction to e-Science & the CombeChem Project
Smart Labs
Semantics & Databases
Publication@Source
e-Science
‘e-Science is about global collaboration in key areas of
science, and the next generation of infrastructure that will
enable it.’
‘e-Science will change the dynamic of the way science is
undertaken.’
John Taylor, DG of UK OST
‘[The Grid] intends to make access to computing power,
scientific data repositories and experimental facilities as
easy as the Web makes access to information.’
Tony Blair, 2002
What is the web?
Publication@Source
trace all the way back from publication to the original data –
provenance
CombeChem
Who needs provenance?
Bush, Blair & Hutton 2004
The CombeChem Project
The exponential world of combinatorial synthesis and high throughput analysis meets the exponentially growing power of computing
Automation, Semantics & the Grid: End to End linking of data and information
In chemistry this can be a very long chain
The CombeChem Project
Collect data with regard to how it could eventually be used
Make sure the metadata is of high quality
Record properly at source
The Chemistry Lab
People & Machines working together
People
Chemistry (Southampton & Bristol)
Mike Hursthouse, Chris Frampton, Jon Essex, Jeremy Frey, Guy
Orpen, Stephan Christensen, Thomas Gelbrich, Sam Peppe,
Hongchen Fu, Graham Tizard, Suzanna Ward, Lefteris Danos, Jamie
Robinson, Kieron Taylor
National Crystallography Service (NCS)
Simon Coles, Mark Light, Ann Bingham
Electronics and Computer Science (Southampton)
Dave De Roure, Luc Moreau, Mike Luck, Hugo Mills, Graham Smith,
Simon Miles, Nicky Harding, Gareth Hughes, monica Schraefel, Terry
Payne
IT Innovation (Southampton)
Mike Surridge, Ken Meacham, Steve Taylor, Daren Marvin
Statistics (Southampton)
Alan Welsh, Sue Lewis, Ralph Manson, Dave Woods
CombeChem Partners
•Bristol
•Chemistry
•ECS
•Stats
•Chemistry
•Combi Centre
•Southampton
•NCS
•IUPAC
•RSC
•IBM
•CCDC
•Pfizer
•IT Innovation
•GSK
•AZ
Literature
Analysis
Statistics
Plan
Access to data
Experiments
Smart Labs
High Throughput measurement
Dissemination
E-Bank
Data
Design (statistics)
Plan & COSHH
Digital Model
Information
Integration
Report
Knowledge
Goal
Literature
Synthesis
not just one laboratory
but many co-laboratories
working together
Analysis
Smart Laboratory
Chemists and programming
Many Chemists think that they can program
What about that! His brain still uses Perl scripts
e-Workflow
Some Chemists can
Plans
Small set of fixed plans
Variable plans, written by chemist (difficult!)
Ad-hoc, implied by process execution
NCS
Tea
SHG
Continuum of plan type
A chemistry lab is a hostile environment
without much room to maneuver
what can be captured automatically with sensors?
what must rely on manual annotation?
The fume cupboard
The chemist
critical data entry
Industrial support
Getting real
Functional prototype for in-lab, real-use testing
very precise scales - but not connected to any recording device
Much more automation in modern chemistry
“That is so cool Dave, you only need a palm pilot”
Getting not just the what and how, but the why
Data model
Process record
Provenance record
Measurements
Processes
Annotations
Service invocations
Secure time-stamps
etc…
Increasing detail
Plan
Intended actions:
guide to chemist,
or [later] workflow
Review over Tea
We ran through our lo-fi prototypes with chemists by running the tea experiment
They knew what was going on and could comment on veracity, features, process
Extensions:
Ray Cooke
Scrolling through lab books
Will Davies
Automating TLC plate capture for record and annotation
Services
Results Data
Semantic Web Links
User
Data access via Semantic Web
Semantic Data
Provenance Data
Middleware (SOAP)
Data storage via Jena
Databases
Databases will become the key method of handling all data
Metadata must be generated at inception and added as data traverses the workflow
Version control, audit and backup handled at the database level.
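The second point can be sketched in a few lines of Python: metadata is created together with the value and every later workflow step appends to it. This is an illustration only; the names `DataItem`, `add_metadata`, the stage names and the instrument label are hypothetical and are not taken from the CombeChem software.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DataItem:
    """A measured value that carries its metadata history with it."""
    value: float
    units: str
    metadata: list = field(default_factory=list)

    def add_metadata(self, stage: str, **details) -> None:
        # Append, never overwrite: earlier entries remain as an audit trail.
        self.metadata.append({
            "stage": stage,
            "recorded_at": datetime.now(timezone.utc).isoformat(),
            **details,
        })


def weigh(value_g: float, operator: str) -> DataItem:
    # Metadata is generated at inception, together with the value itself.
    item = DataItem(value=value_g, units="g")
    item.add_metadata("weigh", operator=operator, instrument="balance-1")
    return item


def analyse(item: DataItem) -> DataItem:
    # Each later workflow stage adds to the same history.
    item.add_metadata("analyse", method="rounding to 2 d.p.")
    return item


item = analyse(weigh(2.0719, operator="chemist-A"))
stages = [entry["stage"] for entry in item.metadata]
print(stages)
```

Because entries are only ever appended, the list doubles as a simple audit record of which stage touched the value and when.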
Databases - Our experience
What do you do when the actual users keep changing their mind?
Is a traditional relational database suitable?
Danger of reinforcing scientific bias against relational databases for laboratory data.
Property in RDF
<c:OrganicMolecule rdf:about="file:///storage/ba8efc2ce0edada69d63b02d1b8630c6.rdf">
  <c:has-inchi>1.12Beta/C12H13NO2/c1-2-15-8-9-5-6-11(14)12-10(9)4-3-7-13-12/h1H3,2H2,3-7H,8H2,14H</c:has-inchi>
  <c:has-cas>22049-19-0</c:has-cas>
  <c:has-empirical-formula>C12H13NO2</c:has-empirical-formula>
  <c:has-stereocentres>0</c:has-stereocentres>
  <c:has-property>
    <c:MeltingPoint>
      <c:has-information>
        <c:Information>
          <c:has-value>150</c:has-value>
          <c:has-uncertainty>
            <c:Range>
              <c:has-value>16</c:has-value>
            </c:Range>
          </c:has-uncertainty>
        </c:Information>
      </c:has-information>
    </c:MeltingPoint>
  </c:has-property>
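The fragment above is shown without its namespace declarations or closing tag, so it cannot be parsed as it stands. As a rough sketch of how such a record might be read back, the Python below wraps a shortened copy in an `rdf:RDF` element and extracts the melting point and its range with the standard library. The namespace URI bound to `c:` is a made-up placeholder; the slide does not give the real one, and the units of the values are not stated either.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace for the "c:" prefix; the slide does not show it.
C_NS = "http://example.org/chem#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:c="{C_NS}">
  <c:OrganicMolecule rdf:about="file:///storage/ba8efc2ce0edada69d63b02d1b8630c6.rdf">
    <c:has-cas>22049-19-0</c:has-cas>
    <c:has-property>
      <c:MeltingPoint>
        <c:has-information>
          <c:Information>
            <c:has-value>150</c:has-value>
            <c:has-uncertainty>
              <c:Range><c:has-value>16</c:has-value></c:Range>
            </c:has-uncertainty>
          </c:Information>
        </c:has-information>
      </c:MeltingPoint>
    </c:has-property>
  </c:OrganicMolecule>
</rdf:RDF>"""

ns = {"c": C_NS, "rdf": RDF_NS}
root = ET.fromstring(DOC)
molecule = root.find("c:OrganicMolecule", ns)
cas = molecule.findtext("c:has-cas", namespaces=ns)

# Walk down the nested property/information structure.
info = molecule.find(
    "c:has-property/c:MeltingPoint/c:has-information/c:Information", ns
)
value = float(info.findtext("c:has-value", namespaces=ns))
spread = float(info.findtext("c:has-uncertainty/c:Range/c:has-value", namespaces=ns))
print(cas, value, spread)
```

A dedicated RDF library would treat the document as a graph of triples rather than a tree; plain XML parsing is used here only to keep the sketch dependency-free.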
Schema
<rdfs:Class rdf:about="&c;OrganicMolecule">
  <rdfs:label>Organic Molecule</rdfs:label>
  <rdfs:subClassOf rdf:resource="&c;Molecule" />
</rdfs:Class>
<rdfs:Class rdf:about="&c;PhysicalProperty">
  <rdfs:label>Property</rdfs:label>
</rdfs:Class>
<rdfs:Class rdf:about="&c;PartitionCoefficient">
  <rdfs:label>Partition Coefficient</rdfs:label>
  <rdfs:subClassOf rdf:resource="&c;PhysicalProperty" />
  <rdfs:description>Ratio of substance dissolved in octan-1-ol and water</rdfs:description>
</rdfs:Class>
This turns out to be a very flexible approach
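One way to see where the flexibility comes from: in a triple-based store, a new kind of property is one more statement rather than a change to a table layout. The toy store below is an assumption-laden sketch in plain Python, not the Jena-backed store used in the project. The class names come from the schema excerpt, except `MeltingPoint` as a subclass of `PhysicalProperty`, which is assumed here because the excerpt does not show that declaration.

```python
# Triples are plain (subject, predicate, object) tuples.
triples = {
    ("OrganicMolecule", "subClassOf", "Molecule"),
    ("PartitionCoefficient", "subClassOf", "PhysicalProperty"),
}


def superclasses(cls, store):
    """Return every class reachable from cls via subClassOf links."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in store:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def is_a(cls, ancestor, store):
    """True if cls is the ancestor itself or one of its subclasses."""
    return cls == ancestor or ancestor in superclasses(cls, store)


# A new kind of property is one extra statement, not a table redesign.
# (Assumed declaration; not shown in the schema excerpt.)
triples.add(("MeltingPoint", "subClassOf", "PhysicalProperty"))

print(is_a("MeltingPoint", "PhysicalProperty", triples))
print(is_a("OrganicMolecule", "PhysicalProperty", triples))
```

Existing data and queries are untouched by the added statement, which is the property that matters when users keep changing their minds about what to record.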
Sample of 4-fluorinated biphenyl: Weigh, 0.9031 grammes
Butanone: Measure, 40 ml
Sample of K2CO3 Powder: Weigh, 2.0719 g
Steps: Add, Add, Reflux
Dissolve 4-fluorinated biphenyl in butanone
Add K2CO3 powder
Heat at reflux for 1.5 hours
Annotate (text): Butanone dried via silica column and measured into 100ml RB flask. Used 1ml extra solvent to wash out container.
Annotate (text): Started reflux at 13.30. (Had to change heater stirrer) Only reflux for 45min, next step 14:15.
Ingredient List
Fluorinated biphenyl 0.9 g
Br11OCB 1.59 g
Potassium Carbonate 2.07 g
Butanone 40 ml
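The ingredient list is a summary of the recorded measurements, so in principle it can be generated rather than typed. The sketch below shows one way that might look. Pairing each weight with its material is inferred from the list itself, the two-decimal rounding rule is an assumption, and Br11OCB is left out because its raw measurement does not appear in the record shown.

```python
# Measurements as recorded on the slide: (material, value, units).
measurements = [
    ("Fluorinated biphenyl", 0.9031, "g"),
    ("Potassium Carbonate", 2.0719, "g"),
    ("Butanone", 40, "ml"),
]


def ingredient_list(records, places=2):
    """Summarise raw measurements as 'name amount units' lines."""
    lines = []
    for name, value, units in records:
        # ':g' drops trailing zeros, so 0.90 prints as 0.9.
        lines.append(f"{name} {round(value, places):g} {units}")
    return lines


summary = ingredient_list(measurements)
for line in summary:
    print(line)
```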
Lessons
That we need two related ontologies
Plan – things that are going to be done
Record – what was done
Not necessarily the same thing
Steps are added/repeated during the experiment
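A small Python sketch of why the two need to be kept apart: comparing a plan with a record shows which steps were added or repeated. The plan uses the Add, Add, Reflux steps from the example experiment; the record is hypothetical and only illustrates the comparison.

```python
from collections import Counter

# Planned steps, taken from the example plan in the slides.
plan = ["Add", "Add", "Reflux"]

# Hypothetical record: one unplanned step added and one step repeated.
record = ["Add", "Add", "Change heater stirrer", "Reflux", "Reflux"]


def compare(plan_steps, record_steps):
    """Report steps done more often than planned, and planned steps not done."""
    planned, done = Counter(plan_steps), Counter(record_steps)
    extra = done - planned      # added or repeated during the experiment
    missing = planned - done    # planned but never recorded
    return dict(extra), dict(missing)


extra, missing = compare(plan, record)
print(extra)
print(missing)
```

Counting steps ignores their order and their parameters (for example a 45 minute reflux against a planned 1.5 hours), so a real comparison would need the richer structure that the two ontologies provide.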
No, the computers are up, we’re down
Experiments on the Grid
National Crystallography
Grid Service
Security and trust for
The “Grid Zone”
Security is fundamental
Who is using our experiments?
Insulate them from each other and from the rest of our institution
Process & Role based security
Use DMZ
This combination creates a “Grid Zone”
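As an illustration of role-based checks combined with insulating users from each other, the Python sketch below allows an action only if the role permits it and, for per-user data, only if the requester owns the sample. The role names, action names and the `may` function are all hypothetical; the talk does not describe how the NCS service implements its checks.

```python
# Hypothetical roles and process stages; the talk does not list them.
ALLOWED = {
    ("submitter", "submit_sample"),
    ("submitter", "view_own_results"),
    ("crystallographer", "run_experiment"),
    ("crystallographer", "release_results"),
}


def may(role: str, action: str, owner: str, user: str) -> bool:
    """Allow an action only if the role permits it and, for per-user
    data, the requester owns the sample."""
    if (role, action) not in ALLOWED:
        return False
    if action == "view_own_results":
        return owner == user  # users are insulated from each other
    return True


print(may("submitter", "view_own_results", owner="alice", user="alice"))
print(may("submitter", "view_own_results", owner="alice", user="bob"))
print(may("submitter", "run_experiment", owner="alice", user="alice"))
```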
NCS Grid Service Architecture
Dissemination & Publication
A different approach is required to provide data to the community
The grid provides the necessary medium
What & how do we want to make available?
Journals: Publication @ source
Journal
Materials
Database
Multimedia
Laboratory Data
Paper
SVG active graphics
Grid
E-Scientists
Entire E-Science Cycle
Encompassing experimentation, analysis, publication, research, learning
Institutional Archive
Local Web
Publisher Holdings
Digital Library
E-Scientists
Graduate Students
The need for xtl-Prints
100’s of structures
How do we disseminate?
National Crystallography Service
CombiChem
CombeChem
The need for xtl-Prints
DATA
PUBLICATION
Grid
Semantic (Pervasive) Grid
e-worries
WSRF
GTi
Must ensure this is not a problem for applications
Making sure other people can re-use your data easily and with confidence
Even when there is a huge amount of it!
Web sites?
www.combechem.org
www.smarttea.org
www.soton.ac.uk/~xservice
ecrystals.chem.soton.ac.uk
Changing the way we work
Data Provenance
Quantum Mechanical Analysis
Properties Prediction
Data Mining, QSAR, etc
Design of Experiment
E-Lab: Combinatorial Synthesis
E-Lab: Properties Measurement
E-Lab: X-Ray Crystallography
Laboratory Processes
Structures DB
Properties DB
Data Streaming
Authorship/Submission
Visualisation
Agent Assistant