
1.3 Research methodology

1.3.1 Methodology for developing the DePICT model

The DePICT model was developed as original research while the author was working on the Planets project⁴ (Farquhar, Hockx-Yu, 2007), the SCAPE project⁵ (Edelstein et al., 2011; SCAPE, nd) and the TIMBUS project⁶ (Edelstein et al., 2011; TIMBUS, nd). These large-scale EU co-funded projects presented an ideal testbed for examining the concepts, properties and requirements applied in digital preservation methodologies and tools, and for investigating their information needs and information exchange. The three projects had different foci: interacting preservation services and tools covering the whole of the business cycle; scalable solutions for large collections or for collections consisting of large, complex or heterogeneous objects; and preservation of processes and third-party dependencies with some specialisation on legal issues affecting digital preservation. Being able to study the field from these different perspectives enriched the model and ensured thorough coverage. At the same time the author was employed by the British Library and the Digital Preservation Coalition, which permitted ready access to content-owning experts and practitioners who were willing to test the model and to be interviewed about their collections, their decision-making approaches and the constraints applying to their digital preservation practice. It also permitted access to large-scale digital collections, which made it possible to understand the properties of a large variety of content types. Finally, it permitted an appreciation of real-life business processes and pragmatic business needs. The author also served on the PREMIS⁷ Editorial Committee, which strives to provide a data dictionary as a de facto metadata standard with which the digital preservation community can capture its digital preservation metadata needs.

4 Planets, a four-year project co-funded by the European Union 2006 – 2010 to address core digital preservation challenges. www.planets-project.eu/

5 SCAPE, a three-and-a-half-year project co-funded by the European Union 2011 – 2014 to address core digital preservation challenges. www.scape-project.eu/

6 TIMBUS, a three-year project co-funded by the European Union 2011 – 2014 to address digital preservation of business processes. www.timbusproject.net/

Intimate familiarity with the dictionary made the author aware of shortcomings of the current solution. It also enabled her to influence changes to the de facto standard and to interact closely with the user community to understand user needs in practice.

A successful model

• can capture all of the information that needs to be captured to support the functionality required;

• is easily maintained and easily understood by its users by virtue of being slim and tidy, avoiding unnecessary detail and avoiding multiple possible implementations for identical problems;

• encourages interoperability through its clarity of intentions. Different users find it easy to come up with similar implementations for similar problems;

• permits solutions that are natural to the domain and does not require contortions when it is applied;

• is flexible and general enough to accommodate different uses;

• is extensible to increase the level of detail to one that is appropriate to the individual tasks.

The DePICT model’s goal is to cover all core digital preservation functions without limiting itself to particular sub-domains or implementation techniques and technologies.

7 PREMIS, the de facto standard on digital preservation metadata. A data dictionary with associated

In order to develop a successful conceptual model for a domain, it is necessary to have a comprehensive understanding of it. To gain this understanding for the digital preservation domain, a large number of information sources was analysed, as summarised in Table 4. Each information source provided one iteration in the improvement of the model. Each source was studied in detail, and its concepts, properties and vocabulary were extracted. The concepts were categorised and related, and model requirements were identified. These requirements were then compared to the previous iteration's model, and where gaps became apparent, model elements were added, removed, combined, refined or restructured. The requirements were also compared against existing conceptualisations of the domain in order to discover where those conceptualisations do not currently meet user requirements. The results of this gap analysis are reported in chapter 2.

Wherever possible, the DePICT model was aligned with existing models; where this was not possible, DePICT extended the coverage of existing models. This process was continued until the model reached a stable state in which the analysis of new information sources no longer resulted in modifications.
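The iterate-until-stable procedure described above can be illustrated with a minimal sketch. It is not the author's implementation: the `Model` class, the `refine` and `develop` functions, the example concept names and the `patience` threshold (how many consecutive sources must leave the model unchanged before it counts as stable) are all assumptions made for illustration. The sketch also models only the addition of concepts, whereas the actual process also removed, combined, refined and restructured model elements.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """Toy conceptual model: nothing more than a set of concept names."""
    concepts: set[str] = field(default_factory=set)


def refine(model: Model, extracted: set[str]) -> bool:
    """Add concepts found in one information source.

    Returns True if the model was modified, False if the source
    contributed nothing new.
    """
    missing = extracted - model.concepts
    model.concepts |= missing
    return bool(missing)


def develop(sources: list[set[str]], patience: int = 2) -> tuple[Model, int]:
    """Process sources in order, one iteration per source.

    Stops once `patience` consecutive sources cause no modification
    (a hypothetical stand-in for the 'stable state' criterion).
    Returns the model and the number of sources examined.
    """
    model = Model()
    unchanged = 0
    examined = 0
    for extracted in sources:
        examined += 1
        if refine(model, extracted):
            unchanged = 0  # the model changed, so it is not yet stable
        else:
            unchanged += 1
            if unchanged >= patience:
                break
    return model, examined


# Hypothetical concept sets extracted from five information sources.
EXAMPLE_SOURCES = [
    {"object", "agent"},
    {"object", "event"},
    {"agent", "event"},
    {"object"},
    {"rights"},
]

if __name__ == "__main__":
    final_model, examined = develop(EXAMPLE_SOURCES)
    print(sorted(final_model.concepts), examined)
```

With these example inputs the loop stops before the last source is examined, which shows a limitation of any such stopping rule: stability is judged only against the sources analysed so far, so the breadth and ordering of the sources matter.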

Chapter 5 on validation and valuation of the model describes in detail how this methodology was implemented.

Once the stable state was reached, a final, formal conceptual model expressed in UML was created from the collected model requirements. A corresponding machine-interpretable model was implemented as an XML schema.

The analysis of information sources in the DePICT context allowed for an original interpretation of some particularly interesting issues that had previously been raised in the digital preservation community, but could now be analysed in depth because the entities in question were soundly embedded in a coherent framework. These issues are:

• the mismatch and the relationship between the properties that can be extracted from preservation objects and the properties that stakeholders use to express their preservation requirements (section 3.1.4.2);

• the role of significant properties (significance constraints) in digital preservation and the relationship between significance constraints and the representation information of the OAIS framework (OAIS, 2002) (section 3.2.3.5).

In a final validation step, the model was used to contribute to the improvement of the PREMIS de facto standard. Again, this is discussed in depth in chapter 5 on validation and valuation.

Table 4: Research methodology approaches

Top-down approaches: Model requirements, refinement and validation

• Create a preliminary model from first principles: what scope, context, and functions in digital preservation should be addressed, and what concepts should be present to support them.

• Analyse the literature for theoretical descriptions of digital preservation conceptual models.

• Analyse the literature for abstract definitions of preservation policies and preservation strategies.

Bottom-up approaches: Model requirements, refinement and validation

• Analyse actual preservation policy and strategy documents drawn from various institution types for their content. They capture many of the concepts that are seen to be important by decision makers.

• Interview decision makers to determine factors that influence their preservation decisions.

• Compile a list of example constraints found in policy and strategy documents and mentioned in expert interviews.

• Study the broad array of preservation services implemented by the Planets project (Farquhar, Hockx-Yu, 2007; Planets, nd). Analyse which information on which concepts is used and produced by them. Perform a gap analysis of which of their aspects are not supported by existing conceptual models.

• Study the interaction of the preservation services implemented by the Planets project.

• Study the constraints expressed in the use cases collected through the Plato preservation planning tool.

• Apply the conceptual model during the design phase of the metadata management component for the British Library’s Digital Library System.

• Study the functional models for digital preservation in OAIS and Planets.

• Learn from existing models, such as PREMIS (2012) and the other work described in the related research chapter 2.

• Engage with the PREMIS user community to determine unmet needs.

• Develop concrete change proposals to the PREMIS data dictionary to test for practical implementability of DePICT ideas.

• Examine how the model fits with the ISO 31000 standards for risk management.

Gap Analysis

• Contrast the requirements and the resulting model with existing models, such as PREMIS (2012) and the other work described in the related research section.

Synthesis

• At each step:

o Extract relevant concepts, properties, relationships and requirements from the information gained;

o Refine and validate the most current model with the newly found information;

o Align as much as possible with existing models; extend existing models when necessary;

o Update the gap analysis to show where existing models do not meet user requirements.

• Create a final, formal conceptual model in UML.

• Design a corresponding machine-interpretable model (e.g. an XML schema).

Valuation

• Prepare in-depth analyses of particularly relevant issues.

• Contribute to the improvement of the PREMIS de facto standard.

In document DePICT : a conceptual model for digital preservation (Page 42-46)