• No results found

The curation of laboratory experimental data as part of the overall data lifecycle

N/A
N/A
Protected

Academic year: 2020

Share "The curation of laboratory experimental data as part of the overall data lifecycle"

Copied!
37
0
0

Loading.... (view fulltext now)

Full text

(1)

The

The

curation

curation

of laboratory experimental

of laboratory experimental

data as part of the overall data

data as part of the overall data

lifecycle

lifecycle

Jeremy

Jeremy

G.Frey

G.Frey

School of Chemistry, University of

School of Chemistry, University of

Southampton, UK

Southampton, UK

21 Nov 2006

21 Nov 2006

DCC Conference, Glasgow

(2)

21 Nov 2006

21 Nov 2006 Jeremy G. FreyJeremy G. Frey DCC Conference 2006

If you do things right at the

start then all the following

processes are much easier!

Exponentially growing

(3)

The

The

Comb

Comb

e

e

Chem

Chem

Project

Project

End to End linking of data and

End to End linking of data and

information

information

Publication@Source

Publication@Source

So collect data with regard to how it

So collect data with regard to how it

could eventually be used

could eventually be used

Make sure the metadata is of high quality

Make sure the metadata is of high quality

Record properly at source in Digital Form

Record properly at source in Digital Form

The Chemistry Lab

The Chemistry Lab

(4)

21 Nov 2006

21 Nov 2006 Jeremy G. FreyJeremy G. Frey DCC Conference 2006

Combechem

Smart Lab

R4L

e-Bank

E-Malaria

Instruments on the Grid

(5)

Plan &

COSHH

Digital Model

Information

Integration

Report

Knowledge

Goal

Literature

Synthesis

not just one laboratory

but many co-laboratories

working together

Analysis

Smart Laboratory

Smart Storage

Smart Dissemination

Smart HCI

The concept of Publication @ Source

The concept of Publication @ Source

(6)

21 Nov 2006

21 Nov 2006 Jeremy G. FreyJeremy G. Frey DCC Conference 2006

If only I knew exactly

how she did this

experiments

I know all this supplementary

information could be useful but

will people really remember the

format? Is it worth all the

hassle?

I wish I could get

the numbers from

this graph - the pdf

is not much use.

I wish I had

recorded things at

the start the way I

do now…..

(7)

First, they do an online search

Need to make

the data

available

Need to be

able to find it

(8)

21 Nov 2006

21 Nov 2006 Jeremy G. FreyJeremy G. Frey DCC Conference 2006

I am sure we collected that information a

few years ago…

The details should be in her thesis…..

Can you read what he says here….?

Can you find the file of data

that were used to make the

plot?

(9)

What are the people up to?

What are the people up to?

Capture Data and Context

Capture Data and Context

People

People

Process

Process

(10)

21 Nov 2006

21 Nov 2006 Jeremy G. FreyJeremy G. Frey DCC Conference 2006

Permanent,

documented

and primary

record of

laboratory

(11)

Observations

are

never

collected on note pads,

filter paper or other

temporary paper for

later transfer into a

notebook

(12)

21 Nov 2006

21 Nov 2006 Jeremy G. FreyJeremy G. Frey DCC Conference 2006

COSHH

COSHH

L

L

everage off things we already

everage off things we already

have to do

have to do

We have a cunning

We have a cunning

plan

(13)

1 1 2 2 1 3 1 4 Sample of

4-flourinated biphenyl

Add Reflux Cool

Butanone Sample of

K2CO3 Powder Weigh grammes 0.9031 Measure 40 ml Add Weigh 2.0719 g text 3 5 Add g Sample of Br11OCB 2 6 Reflux 2 7 Cool Water Measure 30 ml 9 Liquid-liquid extraction DCM Measure

3 of 40 ml

10 Dry MgSO4 11 Filter (Buchner) 12 Remove Solvent by Rotary Evaporation 13 Fuse Silica 14 Column Chromatography Ether/ Petrol Ratio

Butanone dried via silica column and measured into 100ml RB flask. Used 1ml extra solvent to wash out

container.

Started reflux at 13.30. (Had to change heater stirrer) Only reflux

for 45min, next step 14:15. Inorganics dissolve 2 layers. Added brine

~20ml.

Organics are yellow solution

Washed MgSO4 with DCM ~ 50ml

Measure

excess

Observation Types

weight - grammes

measure - ml, drops

annotate - text

temperature - K, °C

Key

Process

Input

Literal

Observation

Add Reflux Cool Add

Add Reflux Cool Dry Filter Remove Solvent by Rotary Evaporation Fuse Column Chromatography Dissolve 4-flourinated biphenyl in butanone Add K2CO3

powder Heat at refluxfor 1.5 hours Cool and addBr11OCB Heat atreflux until completion

Cool and add

water (30ml) Combine organics,dry over MgSO4 & filter Remove solvent in vacuo Liquid-liquid extraction Extract with DCM (3x40ml)

Fuse compound to silica & column in ether/petrol

4 8 Add Add text Annotate Annotate text Weigh Annotate g Annotate Annotate text text Future Questions

Whether to have many subclasses of processes or fewer with annotations How to depict destructive processes

How to depict taking lots of samples

What is the observation/process boundary? e.g. MRI scan

1.5918

Combechem 30 January 2004 gvh, hrm, gms

Ingredient List

Fluorinated biphenyl 0.9 g Br11OCB 1.59 g Potassium Carbonate 2.07 g Butanone 40 ml

(14)

21 Nov 2006

21 Nov 2006 Jeremy G. FreyJeremy G. Frey DCC Conference 2006

1 1 2 2 1 3

Sample of 4-flourinated

biphenyl

Add Reflux

Butanone Sample of

K2CO3 Powder Weigh grammes 0.9031 Measure 40 ml Add Weigh 2.0719 g text Butanone dried via silica column and

measured into 100ml RB flask. Used 1ml extra solvent to wash out

container.

Started reflux at 13.30. (Had to change heater stirrer) Only reflux

for 45min, next step 14:15.

Add Reflux Add

Dissolve

4-flourinated

biphenyl in

butanone

Add K2CO3

powder

Heat at reflux

for 1.5 hours

text

Annotate

Annotate

Ingredient List

Fluorinated biphenyl 0.9 g

Br11OCB 1.59 g

Potassium Carbonate 2.07 g

(15)

Pub-Sub systems provide the flexible & extensible approach

to distribution of real time laboratory monitoring & archiving

(16)

21 Nov 2006

21 Nov 2006 Jeremy G. FreyJeremy G. Frey DCC Conference 2006

But what

about the

laboratory

environment?

(17)

Semantic

Semantic

DataGrid

DataGrid

CombeChem

CombeChem

used, tested &

used, tested &

strained the Semantic Web

strained the Semantic Web

for

for

Enhanced (annotated)

Enhanced (annotated)

DataGrid

DataGrid

over multiple diverse

over multiple diverse

stores

stores

Storage of Provenance

Storage of Provenance

Information

Information

Some Data Storage

Some Data Storage

Annotated multimedia streams

Annotated multimedia streams

(18)

21 Nov 2006

21 Nov 2006 Jeremy G. FreyJeremy G. Frey DCC Conference 2006

Laboratory

Laboratory

Blogs

Blogs

Laboratory notebook is a

Laboratory notebook is a

Blog

Blog

Encourage and facilitate collaboration

Encourage and facilitate collaboration

Need a data repository behind the

Need a data repository behind the

B

B

log

log

R4L

R4L

E-Bank

E-Bank

Flexible

Flexible

Service oriented approach

Service oriented approach

being developed

being developed

(19)

Instrument Blog

(20)

21 Nov 2006

21 Nov 2006 Jeremy G. FreyJeremy G. Frey DCC Conference 2006

The ‘Scientific Blog’ is being tried in an

(21)

Format Issues

– everyday and

for the long

(22)

21 Nov 2006

21 Nov 2006 Jeremy G. FreyJeremy G. Frey DCC Conference 2006

Note the use of

“YouTube”

(23)

Record the ‘Scientific Conversation’ –

this part of the record often exists

only in the ‘grey literature’

CoAKTing

(24)

21 Nov 2006

21 Nov 2006 Jeremy G. FreyJeremy G. Frey DCC Conference 2006

Laboratory

Laboratory

IRs

IRs

and Information

and Information

Management

(25)
(26)

21 Nov 2006

21 Nov 2006 Jeremy G. FreyJeremy G. Frey DCC Conference 2006

Validation

Validation

Increasing the value of data

Increasing the value of data

How to bring all the necessary information

How to bring all the necessary information

together to enable appropriate validation

together to enable appropriate validation

Increasingly difficult & expensive to

Increasingly difficult & expensive to

achieve

achieve

Need provenance and context

Need provenance and context

Essential step otherwise just a collection

Essential step otherwise just a collection

of items

(27)

Why?

Why?

Publishing Data and Information

Publishing Data and Information

Loss

(28)

21 Nov 2006

21 Nov 2006 Jeremy G. FreyJeremy G. Frey DCC Conference 2006

SVG “active”

graphics

Link to data, follow links

back to the raw data

archive

Link to simulation, full

simulation data archived

in BioSimGrid

R4L

(29)

Access to information requires

Access to information requires

crossing administrative domains

crossing administrative domains

Researcher

National

Archive

Research

Group

Institution

International

Database

Research

(30)

21 Nov 2006

21 Nov 2006 Jeremy G. FreyJeremy G. Frey DCC Conference 2006

Subversive

and furtive

sharing &

exploitation

of data in

virtual

space

Data

CAS

RDF

OAI

Taxi

E-user

(31)
(32)

21 Nov 2006

21 Nov 2006 Jeremy G. FreyJeremy G. Frey DCC Conference 2006

Metadata Lifecycle

Metadata Lifecycle

Creation and maintenance of metadata

Creation and maintenance of metadata

Need a metadata infrastructure as well as

Need a metadata infrastructure as well as

a data infrastructure

a data infrastructure

Capture process as well as results

Capture process as well as results

Automatic metadata generation when

Automatic metadata generation when

possible

possible

(33)

Plans

Plans

Plans are useful

Plans are useful

This is the wa

This is the wa

y

y

things are supposed to be

things are supposed to be

done

done

The

The

Plan

Plan

p

p

rovide

rovide

s

s

a

a

digital context so

digital context so

increases the value of planning

increases the value of planning

Key to our

Key to our

Smart Lab

Smart Lab

approach

approach

.

.

(34)

21 Nov 2006

21 Nov 2006 Jeremy G. FreyJeremy G. Frey DCC Conference 2006

Who is responsible

Who is responsible

C

C

ontext is crucial for

ontext is crucial for

curation

curation

every person, on each step of the process

every person, on each step of the process

of converting data to knowledge

of converting data to knowledge

N

N

eed to consider the future access to this

eed to consider the future access to this

information by themselves and others.

(35)

Information

Providers

Information

Consumers

(36)

21 Nov 2006

21 Nov 2006 Jeremy G. FreyJeremy G. Frey DCC Conference 2006

All I am saying is that now is the time to

develop the technology to deflect an asteroid

(37)

PEOPLE

PEOPLE

Southampton

Southampton

ECS,

ECS,

MATHS & CHEMISTRY

MATHS & CHEMISTRY

IT-INNOVATION

IT-INNOVATION

BRISTOL

BRISTOL

UKOLN

UKOLN

CCLRC

CCLRC

INDIANA

INDIANA

SYDNEY

SYDNEY

MANCHESTER

MANCHESTER

EPRSC e-Science &

EPRSC e-Science &

Chemistry Programmes

Chemistry Programmes

JISC

JISC

e-Infrastructre

e-Infrastructre

DTI

DTI

See web site for full

See web site for full

details and links

details and links

References

Related documents

The uniaxial compressive strengths and tensile strengths of individual shale samples after four hours exposure to water, 2.85x10 -3 M cationic surfactant

In addition, the findings further reveal that none of the studies investigate the mediating effect of the nature of innovation (incremental and radical) on the relationship

Hence, although glycine and D-serine bind to delta2 and reduce the spontaneous current through delta2-lurcher channels (Naur et al., 2007), neither substance evokes current responses

The specific questions posed to industry can be classified under the headings of i) quantitative shipping data; ii) public health incident data; iii) current

FICO is the borrower’s credit quality score at application; LTV is the loan- to-value ratio at application; OPTION is the value of the borrower’s prepayment option reflecting

Abstract In this paper the well-known minimax theorems of Wald, Ville and Von Neumann are generalized under weaker topological conditions on the payoff function ƒ and/or extended

For this, OpenFPM provides an encapsulated serialization/de-serialization sub-system that is used by each processor locally in order to serialize the local pieces of a distributed

The company selected SolidWorks mechanical design software for the Little Benthic Vehicle project because of its ease of use, ability to model organic shapes and surfaces,