Workshop:
Planning Ingest and
Research Data Management
Jens Ludwig
[email protected]-goettingen.de
Goettingen State and University Library
Hamburg, 19th October 2011
Overview
Motivation
nestor "Into the Archive" Ingest Guide
WissGrid Research Data Management Blueprint
Motivation
Two major planning instruments in Germany address, in one way or another, "the first phase":
• The nestor ingest guide and
• the WissGrid research data management blueprint
And a couple of other guides, checklists, etc. have been produced. But how do they compare to each other?
Is not the variety of guides already confusing? Do we need a guide for the guides?
1. We will introduce both guides. This is hopefully interesting in itself.
2. We want to discuss with you the relation of these guides. What use
Relevance of Ingest
Beagrie et al. 2008 and others: Ingest is the biggest cost factor, usually up to 50%
It is the first step and therefore of strategic importance
It is nearly always the first question interested people ask
"Garbage in – garbage out": Ingest is also the biggest risk factor; it is crucial for quality
Complexity of Ingest
Is it possible to have a single ingest standard for long-term preservation?
The variety of data sources is nearly endless and keeps changing (intellectual, technical and organisational context).
You have to address most questions already at the time of the ingest. The usage contexts are nearly unforeseeable.
The main tasks are organisational, not technical.
A single standard for an interface or procedure? No. Alternatives?
nestor Ingest Guide
nestor working group for long-term preservation standards
Streamline the ingest by providing an introduction and common working basis
– for archives and information producers/providers
– for planning the ingest
"Into the Archive – A guide for the information transfer"
English version: urn:nbn:de:0008-20080710002
Design Principles
Do not use the OAIS terminology, but be compatible with it
Define ingest as a transfer of responsibility, not as a technical transfer
Organize the process in manageable parts
Sacrifice comprehensiveness (good enough instead of everything)
Define aims instead of strict sequences and instructions
Structure
1. Objects
a. Selection of information to be archived (intellectual and technical)
b. Metadata
c. Significant properties
2. Processes
a. Transfer packages
b. Validation
c. Transfer
3. Management
a. Laws and contracts
b. Ingest agreement and documentation
1. Objects
a. Select information to be archived
i. Select intellectual entities
ii. Analyse export form
iii. Agree necessary adaptations
b. Select metadata
i. Define required metadata
ii. Clarify responsibility for providing the metadata
c. Significant properties
i. Define the significant properties
ii. Compile the significant properties
2. Processes
a. Transfer packages
i. Assignment of the information objects to transfer packages
ii. Reconstructability of information objects
iii. Identification of the transfer packages
b. Validation
i. Definition of validations
ii. Required degree of fulfilment and consequences of non-fulfilment
iii. Persons involved and tools
iv. Schedule
c. Transfer of data
i. Legal/contractual framework
ii. Technical and organisational possibilities
iii. Definition and test of transfer work stages
3. Management
a. Laws and contracts
i. Identification of corporate bodies under public law and agents
ii. Definition of relations between producer and archive
iii. Obligations concerning archive materials should be known
iv. Copyright ascertained
v. Regulation of copyright
vi. Warranty and liability
b. Ingest agreement and documentation
i. Binding documentation of decisions
ii. Reporting of ingest processes
c. Areas of management
i. Quality
ii. Safety
iii. Processes
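Because the guide defines aims rather than a fixed sequence, its outline can be tracked as a plain checklist while producer and archive negotiate an ingest. The following is a minimal sketch in Python, not part of the guide itself: the section titles are taken from the outline above, while the `IngestPlan` class and its method names are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, field

# Section titles follow the guide's outline; everything else here
# (class name, method names, the "agreed" flag) is an illustrative assumption.
GUIDE_OUTLINE = {
    "1. Objects": [
        "a. Selection of information to be archived",
        "b. Metadata",
        "c. Significant properties",
    ],
    "2. Processes": [
        "a. Transfer packages",
        "b. Validation",
        "c. Transfer of data",
    ],
    "3. Management": [
        "a. Laws and contracts",
        "b. Ingest agreement and documentation",
        "c. Areas of management",
    ],
}


@dataclass
class IngestPlan:
    """Tracks which outline items producer and archive have agreed on."""

    agreed: set = field(default_factory=set)

    def mark_agreed(self, section: str, item: str) -> None:
        # Reject items that are not part of the outline.
        if item not in GUIDE_OUTLINE.get(section, []):
            raise KeyError(f"unknown outline item: {section} / {item}")
        self.agreed.add((section, item))

    def open_items(self) -> list:
        # Items are returned in outline order, but any order of work is allowed.
        return [
            (section, item)
            for section, items in GUIDE_OUTLINE.items()
            for item in items
            if (section, item) not in self.agreed
        ]
```

A plan would start with all nine items open; calling `plan.mark_agreed("2. Processes", "b. Validation")` removes that item from `plan.open_items()`, regardless of which items were settled before it.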
Application
E.g. for each preservation service level of the TextGrid Repository (Digital Humanities Data):
Ingest Guide (e.g.) / Service Level | Bitstream Preservation | Content Preservation | Data Curation
1.a.i) Analyse export form | all kinds of files | only supported formats | content quality sufficient?
1.b.i) Define required metadata | administrative metadata | (more) technical metadata | content/intellectual metadata
2.a.i) Mapping of objects to transfer packages | use templates for the TextGrid object model or METS package description from the submitter
Comparison
WissGrid Research Data Checklist | nestor Ingest Guide
Planning and Creation | –
Selection and Appraisal | 1a) selection of information
Ingest | All sections, esp. 2) processes
Storage and Infrastructure | Partially in 1a) analyse export form and 2c) tech. options for transfer
Preservation Planning and Action | 1c) significant properties and 3a) obligations concerning archive material
Access and Reuse | –
Management, Organisation, Policies | 3c) management (processes) and 3a) laws and contracts
Costs and Staff | 3c) management (costs and risks)
Law and Ethics | 3a) laws and contracts
Metadata | 1b) selection of metadata
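The comparison above can also be written down as a small lookup, which makes the gaps explicit. This is only a sketch: the section codes are copied from the comparison, "All sections" is expanded to the nine lettered sections of the guide as an assumption, and the function names are hypothetical.

```python
# WissGrid checklist topic -> nestor Ingest Guide sections that touch it.
# Codes are copied from the comparison; an empty list corresponds to "–".
ALL_SECTIONS = ["1a", "1b", "1c", "2a", "2b", "2c", "3a", "3b", "3c"]

COVERAGE = {
    "Planning and Creation": [],
    "Selection and Appraisal": ["1a"],
    "Ingest": ALL_SECTIONS,  # assumption: "all sections" means all nine
    "Storage and Infrastructure": ["1a", "2c"],  # only partially covered
    "Preservation Planning and Action": ["1c", "3a"],
    "Access and Reuse": [],
    "Management, Organisation, Policies": ["3c", "3a"],
    "Costs and Staff": ["3c"],
    "Law and Ethics": ["3a"],
    "Metadata": ["1b"],
}


def uncovered_topics() -> list:
    """Checklist topics with no counterpart in the ingest guide."""
    return [topic for topic, sections in COVERAGE.items() if not sections]


def topics_for(section: str) -> list:
    """Checklist topics that a given ingest guide section contributes to."""
    return [topic for topic, sections in COVERAGE.items() if section in sections]
```

Here `uncovered_topics()` yields the two topics marked "–" in the comparison, which are the phases before and after the archive takes responsibility.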
Topics for Discussion 1
Although the nestor Ingest Guide focuses on only one topic of the lifecycle, it already covers many points of the WissGrid checklist.
Could the ingest guide be of use for research data? What is missing?
Hypothesis:
• Classic memory institutions are (usually and currently) too focused on the end products to curate research data.
• Therefore their instruments are not well suited for the earlier phases of the research life cycle.
• The planning instruments address different parts of the life cycle and different dissemination levels.
Could the checklist and the lifecycle model be of use for classical memory institutions?
[Figure: domain model with Private Domain, Group Domain (data management), Persistent Domain (extended data management) and Access/Re-Use (access restrictions), annotated with the questions "nestor Ingest Guide?" and "WissGrid Research Data Management Checklist?"]