Chapter 3: Data curation: A sustainability practice
3.7 Significant data curation models and frameworks
3.7.4 Digital Curation Centre curation lifecycle model
The Digital Curation Centre (DCC) has developed a curation model which illustrates the stages required for successful data curation. Like the data lifecycle (see Figure 3.3), this model is continuous, that is, it repeats itself in an ordered sequence. The DCC lifecycle model incorporates the functions of the holistic approach to data curation. It can be used as a generic guideline within an organisation to ensure that the necessary steps for effective data curation are completed in an orderly manner. The DCC lifecycle model also aids in defining roles and responsibilities, and in building a framework of standards and technology to be implemented (Digital Curation Centre, 2008a).
The DCC curation lifecycle model complements, rather than restates, a number of standards and can be used in conjunction with other related reference models, standards and frameworks. Standards such as the OAIS reference model (ISO 14721:2003) and other standards of the International Organization for Standardization (ISO) can be used to enhance the identification of processes, workflow design and other management tasks within the DCC curation lifecycle model (Higgins, 2008:135).
Figure 3.12: DCC curation lifecycle model (Higgins, 2008:136)
This lifecycle model comprises three groups of actions, namely full lifecycle actions, sequential actions and occasional actions. As illustrated in Figure 3.12 (above), full lifecycle actions are shown inside the sequential actions (as discussed further in Section 3.7.4.1). These activities can take place at any time during the digital curation lifecycle. Sequential actions are steps that are carried out in order to ensure that the best practices of data curation are applied; the sequence may be repeated depending on the data and the curation process. Occasional actions are actions which interrupt or reorder the sequential actions as a result of a decision (Digital Curation Centre, 2008a).
The three groups of actions of the DCC curation lifecycle model as illustrated in Figure 3.12 are discussed in the following sub-sections with the purpose of clarifying the actions required to curate data.
3.7.4.1 Full lifecycle actions
Description and representation information (see inner ring surrounding the data, Figure 3.12) involves assigning administrative, descriptive, technical, structural and preservation metadata. Representation information is also assigned so that the digital data and its associated metadata can be understood and rendered, enabling data to be used and reused. Preservation planning (ring surrounding description and representation information) involves planning for preservation throughout the data curation lifecycle and includes plans for the management and administration of data. Community watch and participation (ring surrounding preservation planning) involves monitoring appropriate communities and engaging in the development of shared standards and tools. Data curation and preservation actions (ring surrounding community watch and participation) require curators to be aware of, and to carry out, the management and administrative actions planned to promote curation and preservation (Digital Curation Centre, 2008a).
3.7.4.2 Sequential actions
Sequential actions in the DCC curation lifecycle model (Figure 3.12) form the basis for active data curation and guide the data curation process in an orderly fashion. Higgins (2008:138) identifies the following as part of the sequential actions in the DCC lifecycle model:
Conceptualise: Plan the creation of data, and include methods of capture and storage options
Create and receive: Create data together with its metadata, so that metadata is preserved from the time of creation, or receive data from data creators, archives, repositories and data centres
Appraise and select: Data for long-term curation and preservation is selected in accordance with policies and guidelines
Ingest: Requires the transfer of data to an archive, repository, data centre or other custodians, while adhering to guidelines, policies and other legal requirements
Preservation: Actions undertaken to ensure long-term preservation of the authoritative nature of data; these actions should ensure that data remains authentic, reliable and usable while maintaining its integrity
Store: Data is stored in a secure manner, while maintaining the relevant data standards
Access, use and reuse: Ensure access to data for designated users is maintained
Transform: Creation of new data from the original data, through migration into a different format or through creation of subsets that yield newly derived results
These sequential actions are at the heart of data curation activities, and similar actions can be found in the OAIS functional model (see Section 3.7.2). The actions follow one another in a logical order, hence the name sequential actions. Data curation requires such logical processing to maintain and add value to data effectively.
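The ordered, repeating nature of the sequential actions can be illustrated with a minimal Python sketch. It is not part of the DCC model or of Higgins (2008); the names `SequentialAction` and `next_action` are hypothetical, and the wrap-around from Transform back to Conceptualise is an assumption made only to illustrate that the sequence repeats.

```python
from enum import Enum


class SequentialAction(Enum):
    """The eight sequential actions, in the order listed in Section 3.7.4.2."""
    CONCEPTUALISE = "Conceptualise"
    CREATE_AND_RECEIVE = "Create and receive"
    APPRAISE_AND_SELECT = "Appraise and select"
    INGEST = "Ingest"
    PRESERVATION = "Preservation"
    STORE = "Store"
    ACCESS_USE_AND_REUSE = "Access, use and reuse"
    TRANSFORM = "Transform"


def next_action(current: SequentialAction) -> SequentialAction:
    """Return the action that follows `current`.

    Assumption for illustration: after the last action the cycle
    restarts at the first one.
    """
    ordered = list(SequentialAction)  # Enum members iterate in definition order
    position = ordered.index(current)
    return ordered[(position + 1) % len(ordered)]
```

Calling `next_action(SequentialAction.INGEST)` yields `SequentialAction.PRESERVATION`, so a curation workflow built on such a structure would always know which step follows the current one.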
3.7.4.3 Occasional actions
Three actions make up this third category of the DCC curation lifecycle model, located outside the sequential actions: dispose, reappraise and migrate (see Figure 3.12). Data that has not been selected for long-term curation and preservation needs to be disposed of in accordance with policies and requirements. Data which fails the validation procedures must be reappraised, that is, returned and reintegrated into the cycle for reselection. Finally, data needs to be migrated into different formats to accord with the storage environment (Higgins, 2008:138).
Occasional actions are prompted only under certain circumstances, but they are equally important for the long-term survival of data. Policies are necessary for the efficient operation of a data storage facility, and data must conform to the agreed policies. Some data may need to be reappraised when it does not comply with the requirements and policies of the storage facility.
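How a decision can trigger an occasional action may be sketched as follows. This is a hedged illustration only: the `DataItem` fields and the `occasional_action` function are hypothetical, the order in which the conditions are checked is an assumption, and the sketch assumes, following Higgins (2008), that disposal applies to data not selected for long-term curation and preservation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataItem:
    # Hypothetical flags, used only for this illustration.
    selected_for_preservation: bool
    passed_validation: bool
    format_supported: bool


def occasional_action(item: DataItem) -> Optional[str]:
    """Return the occasional action an item triggers, or None if the
    sequential actions can simply continue."""
    if not item.passed_validation:
        return "reappraise"  # returned to the cycle for reselection
    if not item.selected_for_preservation:
        return "dispose"  # disposed of in accordance with policy
    if not item.format_supported:
        return "migrate"  # moved to a format that suits the storage environment
    return None
```

An item that is selected, validated and held in a supported format triggers no occasional action, which reflects the point that these actions arise only under certain circumstances.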
The DCC lifecycle model offers an outline of the actions needed to preserve data. These actions are compatible with the OAIS functional model and other frameworks, and the model is fairly universal in its approach. The DCC thus offers a unique, active lifecycle approach to data curation.