Problems in research data archiving

Iver Lauermann

Outline

Why archiving of research data? Archiving vs. storage?

Archiving policy at HZB

Recent history of archiving at HZB

Discussion: what, who, when, where

Data formats for archiving

Obstacles to archiving in everyday work

Examples for current data storage at HZB

– Electronic lab-book

– Scanning of paper lab-book

– Storage of large amounts of data

Conclusions

What is archiving of data?

Archiving

Secure, long-term storage of data

secure: self-burned CDs or DVDs are not durable

long-term: electronic data formats change quickly and hardware fails with time, so formats need to be regularly updated and adapted

Requires metadata, i.e. a description of the stored data that allows (expert) users to interpret the archived data
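One way to make such a description concrete is a small metadata file stored next to each data set. The following is only a sketch: the field names, the `DatasetDescription` class and the choice of JSON are assumptions made for illustration, not a schema prescribed at HZB.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical minimum set of fields; no schema is prescribed in the slides.
REQUIRED_FIELDS = ("title", "creator", "date", "method", "file_format", "description")


@dataclass
class DatasetDescription:
    """Description stored alongside a data set so that it can be interpreted later."""
    title: str
    creator: str
    date: str              # ISO 8601 date, e.g. "2011-05-23"
    method: str            # measurement or preparation method
    file_format: str       # how to read the data files
    description: str       # free-text explanation for expert users
    publication: str = ""  # optional reference to a related publication


def missing_fields(desc: DatasetDescription) -> list:
    """Return the names of required fields that were left empty."""
    values = asdict(desc)
    return [name for name in REQUIRED_FIELDS if not values[name].strip()]


def to_sidecar_json(desc: DatasetDescription) -> str:
    """Serialize the description as human-readable JSON text."""
    return json.dumps(asdict(desc), indent=2, ensure_ascii=False)
```

A plain-text format such as JSON is used here because it stays readable even if the software that wrote it is no longer available.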

Good reasons to archive data

Makes later use of data by yourself or others possible!

usually only a fraction of the available data is published

new knowledge makes later use worthwhile

for open access to data, archiving is a necessary condition

Preservation of evidence for data from a publication

In many alleged cases of scientific misconduct the original data had disappeared

Archiving policy at HZB

“Any scientific (raw) data on which a publication is based is documented and securely stored, for at least 10 years, at the organisational (department, institute) unit of its origin”

(from “Regeln zur Sicherung guter wissenschaftlicher Praxis und zum Verfahren bei wissenschaftlichem Fehlverhalten”, 14.06.2002, based on the “Proposals for Safeguarding Good Scientific Practice”, January 1998, by the Deutsche Forschungsgemeinschaft (DFG))

Bad examples: the “NASA effect”

Data from the Viking space probe were partly useless in 1985 after only 9 years of storage. Data from 30 years of space exploration were stored on 1.2 million magnetic tapes by NASA in 1976, but in the mid-nineties many were useless.¹

“Last month, researchers working out of an abandoned McDonald’s restaurant on the grounds of NASA Ames Research Center recovered data collected by NASA’s Nimbus II satellite on 23 September 1966 … The LOIRP² team obtained $750,000 from NASA and private enterprise and enlisted the assistance of a retired Ampex engineer. They cleaned, rebuilt, and reassembled one drive, then designed and built equipment to convert the analog signals into an exact 16-bit digital copy.”³

¹ Hilmar Schmundt: Im Dschungel der Formate. In: Der Spiegel 26/2000. URL:

² Lunar Orbiter Image Recovery Project

³ SCIENCE Magazine, 12 March 2010, Vol. 327, p. 1322

History at HZB

Initiated by Wolfgang Fritsch (HMI library), 2 working groups on archiving (structure of matter department and energy department) were established early in 2007

In 2007/2008 2 papers with recommendations were written

July 2008: 2-day workshop on archiving at HMI with participation of BESSY and external speakers

2008: final paper with recommendations

December 2010: announcement of archive server by IT department

However, no “true archiving” but secure, long-term storage

Currently, a working group at HZB is assessing foreseeable demand, cost and technical solutions (hardware, software)

Recommendations: what, who, when, where, how

What: initially only data on which a publication is based has to be archived, but in the long run all research data should be archived

Who: head of organizational unit (i.e. institute director) is responsible but each scientist should be educated and be responsible for archiving

When: on a regular basis; after a publication has appeared

Where: central storage and support by IT is necessary

How: As easy for users as possible, as flexible as possible, including current methods

Since Dec. 2010 “official” archive server at HZB

Data are stored on magnetic tape and cannot be accessed unless removed from the archive

Path \\home\NB-Archiv\dauer\ordnername can be addressed by every internal HZB computer and filled with data

Content of folders is copied to several magnetic tapes when no more changes are detected, then removed from the hard drive

Archiving over a period of 5 or 10 years

Archives are checked once a year to ensure they remain readable

Only members of the respective department can access data

(from “FM-D info, Neues und Wissenswertes aus der zentralen Datenverarbeitung”, May 2011)
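Two steps of this procedure, detecting that a folder no longer changes and checking later that the stored files are still readable, can be illustrated with file modification times and checksums. This is a minimal sketch under stated assumptions, not a description of how the HZB archive server works: the function names, the use of SHA-256 and the "quiet period" criterion are all hypothetical.

```python
import hashlib
import time
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder) -> dict:
    """Map each file (path relative to the folder) to its checksum."""
    folder = Path(folder)
    return {
        p.relative_to(folder).as_posix(): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def is_quiet(folder, quiet_seconds: float, now: float = None) -> bool:
    """True if the folder has files and none was modified within quiet_seconds."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(folder).rglob("*") if p.is_file()]
    return bool(mtimes) and (now - max(mtimes)) >= quiet_seconds


def verify(folder, manifest: dict) -> list:
    """Return the files from the manifest that are missing or have changed."""
    current = build_manifest(folder)
    return sorted(name for name in manifest if current.get(name) != manifest[name])
```

A manifest written at archiving time and re-checked with `verify` during the yearly inspection would reveal files that have become unreadable or altered; an empty list means every recorded file still matches.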

Current: archiving of publication data

Mid to long term: archiving of all data

Disadvantage:

Archiving only publication-relevant data is extra “overhead” because it is not integrated into a normal workflow of acquiring and automatically archiving data

However, it is debatable whether complete archiving of all data and metadata is practical and desirable

“The existing rules have been amended, late in 2010, with regulations about orphaned data or rather orphaned-to-be data, i.e. data from scientific organisational units which are being discontinued, in case there is no other scientific organisational unit which can take care of such data. In such case data should be documented and taken care of

– by the library for data in paper form

– by IT services for digital data”

(HZB Intranet Bibliothek, 23 May 2011)

Problems with archiving in the energy department

Experiments are manifold and complex, i.e. multi-step preparation of materials and thin films (absorbers, buffer and window layers), characterization of materials and devices by a multitude of methods

Data are generated in many different electronic formats or on paper

[Figure: workflow diagram with the stages sample preparation (in-situ measurements), processing (contacts, ARC ...), solar cell, test structure, basic research, quality check (I-V, IPCE ...), planning/simulation, external samples]

Problems with archiving in the energy department

Research is largely carried out by graduate students; high turnover makes archiving difficult but especially necessary

No common format or specific rules; quality of archived data depends on individuals

Extra workload for scientists, who often don’t see the necessity

Archiving of raw data without a link to a publication is often regarded as neither necessary nor feasible

Examples of metadata formats

Scanned lab book for spectroscopic metadata

BENSC instrumentation

E-Hall: E1, E2, E3, E4, E5, E6, E7, E9, E10

V-Hall: V1, V3, V6, V7, V12

NLH2: V5, V15, V16

Sample environment

Experiment control: CARESS, MAD, IGOR

Raw data, log files (automatic copy)

Central IT

SAN file saving: experiment, year, month, run number

BADGE portal for BENSC users, Oracle database, web portal: current state, proposal number, etc. (metadata)

NEXUS data format: raw data + complete experiment description (experiment-specific data, metadata)

Metadata server (planned)
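The file-saving scheme named in the diagram (experiment, year, month, run number) can be sketched as a function that builds a predictable storage path. The directory layout, the `run_` prefix, the zero padding and the `.dat` suffix are assumptions chosen for illustration; the actual layout on the BENSC SAN is not given in the slides.

```python
from pathlib import PurePosixPath


def run_path(root: str, experiment: str, year: int, month: int,
             run_number: int, suffix: str = ".dat") -> PurePosixPath:
    """Build a storage path of the form root/experiment/YYYY/MM/run_NNNNNN.suffix."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1..12, got {month}")
    if run_number < 0:
        raise ValueError(f"run_number must be non-negative, got {run_number}")
    filename = f"run_{run_number:06d}{suffix}"
    return PurePosixPath(root) / experiment / f"{year:04d}" / f"{month:02d}" / filename
```

Zero-padded numbers keep directory listings in chronological order, which makes a later search by date or run number straightforward.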

Current status at HZB

A 3-year project was launched to develop real archiving

Define future data volume (currently a few TB/week)

A small number of experiments (tomography) produce most of the data

Automatic storage of all user data from large-scale instruments (BER and BESSY II) is debated
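To get a feeling for the resulting storage demand, the weekly volume can be projected over a retention period. The numbers in the usage note are assumptions for illustration only: "a few TB/week" is taken as 3 TB/week, the retention period as 10 years, and two tape copies are assumed.

```python
def archive_volume_tb(tb_per_week: float, years: float,
                      copies: int = 1, weeks_per_year: int = 52) -> float:
    """Total archive volume in TB for a constant weekly data rate."""
    return tb_per_week * weeks_per_year * years * copies
```

Under these assumed values, `archive_volume_tb(3, 10, copies=2)` gives 3120 TB, i.e. about 3 PB, before any growth in the data rate is taken into account.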

Conclusions

Data archiving is necessary and compulsory at HZB

Centralized storage space for paper-based and electronic data is available

So far archiving is mainly done in the departments on local computers

In most departments only publication-related data is archived

Quality of archiving varies widely and depends on individuals

Very few common standards, formats and rules established

Improvement by education and better support necessary

A 3-year project was launched to develop real archiving

Acknowledgements

Wolfgang Fritsch (formerly HZB library)

Jens Uwe Hoffmann (structure of matter department)

Marcus Kühbacher (structure of matter department)

Christof Rethfeldt (structure of matter department)

Andreas Schöpke (energy department)

Klaus Schwarzburg (energy department)

Dietmar Herrendörfer (information technology)

Christian Jung (Scientific-Technical Infrastructure)
