Suitable file formats for transfer of digital records to The National Archives


The National Archives

September 2011

© Crown copyright 2011

You may re-use this information (excluding logos) free of charge in any format or medium, under the terms of the Open Government Licence. To view this licence, visit www.nationalarchives.gov.uk/doc/open-government-licence/ or e-mail: psi@nationalarchives.gsi.gov.uk.

Any enquiries regarding the content of this publication can be sent to DigitalPreservation@nationalarchives.gsi.gov.uk.

Where we have identified any third party copyright information you will need to obtain permission from the copyright holders concerned.

1. Introduction

1.1 What is the purpose of this guidance?

This guidance sets out the range of digital file formats that The National Archives can currently sustain over the long term. We are able to accept these formats from public record bodies, preserve the information they contain and provide access to future policy-makers and researchers.

Following this guidance will help you comply with The National Archives' requirements for transfer of digital records. It will also help you ensure that the information which supports your own business requirements remains complete and available to you for as long as you need it.

The National Archives does not stipulate which file formats you should use for the creation of public records. This decision must be made on the basis of your own business requirements; however, we strongly recommend that requirements for sustaining access to your records are considered as an integral part of evaluating file formats for use in your organisation. If you have selected digital records for transfer to The National Archives that are not in a format we can currently accept, please talk to us before you take any other action (see section 4 below for a discussion of the available options).

This guidance applies to the transfer of digital records and information to The National Archives. It excludes the transfer of government websites and datasets as part of the UK government web archive. This activity is covered by separate guidance.

1.2 Who is this guidance for?

This guidance is primarily aimed at information managers and IT managers who need to assess the use of file formats in different business situations across their organisation.

2. What is a file format?

A file format is a method of storing digital information in a computer file, allowing its later use by computer systems or people. There are thousands of different file formats for different kinds of digital content and there may be several different versions of the 'same' file format.

A file format is often confused with the software most commonly used to create or use it. For example, we talk about 'Microsoft Word' files, or 'Acrobat PDF' files. Despite these naming conventions, in principle a file format is not bound to any particular software – even if this is sometimes the case in practice.

A file format is like a language which is only spoken by certain pieces of software. In general, the greater the number of languages, and the fewer speakers there are for each, the harder it is to maintain the ability to read and understand them over long periods of time.

3. Choosing file formats

Before you select file formats for use within your organisation, you will need to understand your requirements:

• Business requirements, for example: your need to record, read, work with, share or publish information now and in the future

• Technical requirements, for example: compatibility with your technical environment; your ability to provide supporting storage and infrastructure services

• Legislative and policy requirements, for example: the Public Records Act, S46 Code of Practice [2], Government IT strategy [3], the Data Protection Act, Public sector transparency agenda

Having understood your requirements, you should evaluate your file formats in this context. We suggest that you use the following criteria for your assessment:

• Capability – how well the format meets your requirements

• Resilience – how resilient the format is to time

Bear in mind that you may require access to your digital records over very long periods of time. If your records are selected for permanent preservation, they will also need to meet The National Archives' technical criteria for transfer.

• Flexibility – how well the format can adapt to changing requirements

Consider the ability of your formats to adapt to your changing business requirements, and to changes in the technical and legislative environment in which you operate. For example, the Freedom of Information Act and the Transparency agenda have both introduced new requirements for maintaining and providing access to public sector information.

The National Archives' guidance on Evaluating File Formats provides a full discussion of evaluating file formats against your business and technical requirements.

This guidance will focus on choosing file formats that support long-term preservation; however, the same recommendations will help you to maintain the continuity of your digital information for business use in the short and medium term.

[2] Lord Chancellor's Code of Practice on the management of records issued under section 46 of the Freedom of Information Act 2000

[3] The Government IT strategy, published by Cabinet Office in March 2011, states: “The Government believes that citizens should be able to read government documents with the standardised document format reader of their choice. The first wave of compulsory open standards will determine, through open consultation, the relevant open standard for all government documents.” www.cabinetoffice.gov.uk/sites/default/files/resources/uk-government-government-ict-strategy_0.pdf

4. File formats for long-term preservation

The digital environment is continuously evolving: new formats appear, older formats undergo revisions or version enhancements, and supporting devices and infrastructure can change beyond recognition in a matter of a few years. While some formats have stood the test of time, others have become increasingly difficult and costly to maintain. In general, the formats that have proved difficult to sustain are those that were not widely adopted, were poorly supported, evolved rapidly or were heavily platform restricted.

The sheer number of different formats in existence makes it economically unsustainable for The National Archives to commit to managing and maintaining access to all formats in perpetuity. While the individual bytes of a digital file can be preserved relatively easily, the ability to access and understand the information contained in those bytes will be lost if the required software and technical environment are no longer available. The challenge increases where the file format specification is not available – either because it is proprietary or was never formally defined.

The level of resource that would be needed to provide long-term access to an ever-expanding list of file formats prohibits this being a viable option. The only reasonable alternative is for The National Archives to limit the range of formats we can accept and maintain. This strategy enables us to continue to deliver our core task of providing access to the digital public record, whilst operating within our budgetary constraints.

The National Archives has identified a list of file formats that are currently within our technical and budgetary capacity to receive, maintain and make available. When using this list to inform your choice of file formats for use in your organisation, you should note the following:

• The list will grow as new formats emerge, our understanding of existing formats develops, and the available tools and technologies improve.

• Although unlikely, formats may be removed from the list as new risks are discovered or formats are superseded.

• If you have records in formats that are not on the list, you should not convert the files for the purpose of transfer to The National Archives: your format may be in the process of being added to the list; the new format you select may be at risk of being removed from the list; the process of conversion may place the authenticity of the records in question unless appropriate procedures are followed. In this situation, please contact us to discuss the options before you take any action.

• If you have formats that you believe should be added to the list, please contact us.

The list of file formats that The National Archives is currently able to accept is given in section 6 below. This list is reviewed regularly. Please refer to The National Archives website to make sure you have the latest version of this document before making any decisions based on this list.

5. Transferring digital records to The National Archives

When appraising digital records prior to transfer, you should establish the range of file formats contained within the selected records. You will need to send us this information at an early stage of the digital transfer process. File format identification tools such as DROID [6] can assist you in this. In addition to identifying the base file format, DROID will distinguish between different versions of the format; for example, it can distinguish fifteen different format versions which all use the extension .doc.
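The point that a file extension does not determine the format can be illustrated with a minimal sketch that inspects a file's leading bytes instead of its name. This is not how DROID works internally and it is no substitute for it: the signature table and the function name `sniff_format` are assumptions made for illustration, and the sketch recognises only four coarse container types with no version information.

```python
# Leading-byte ("magic number") signatures for a few common formats.
# Illustrative assumption only: a real identification tool such as DROID
# uses a far larger signature registry and also reports format versions.
SIGNATURES = [
    (b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", "ole2"),  # legacy Office container
    (b"PK\x03\x04", "zip"),                          # ZIP-based container
    (b"%PDF-", "pdf"),
    (b"{\\rtf", "rtf"),
]


def sniff_format(path):
    """Return a coarse format label based on the first bytes of the file."""
    with open(path, "rb") as handle:
        head = handle.read(16)
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return "unknown"
```

For example, a file named `report.doc` whose content begins with `{\rtf` would be labelled `rtf` by this sketch, even though its extension suggests a Word binary document.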

If records which have been selected for preservation include files in formats which do not appear on our list of suitable formats for transfer, you should discuss this with us at the earliest opportunity. Formats that are not listed will need to be assessed on an individual basis to determine whether they can be transferred, or whether you (or your sponsoring department) will need to preserve them until acceptable arrangements can be made for transfer, accessioning and access at The National Archives. Factors such as the format, version, content of the records and number of files will influence this discussion.

Regardless of format, The National Archives will not accept records if there is anything that would prevent full and safe access to the data, for example:

• computer viruses, worms or associated malware

• full or partial encryption of the data, including DRM

• full or partial password protection of the data

• full or partial corruption

All digital records selected for transfer should be checked for these issues alongside the sensitivity review process and we will require confirmation from the transferring department that this has been done.
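Part of such a check can be automated for formats that are packaged as ZIP containers (Office Open XML and OpenDocument files). The sketch below is an assumption about how a department might do this and is not a procedure specified by The National Archives; the names `ZIP_BASED_EXTENSIONS` and `check_zip_container` are hypothetical. It covers only structural damage and ZIP-level encryption, and it does not detect malware, DRM or other kinds of protection.

```python
import zipfile
from pathlib import Path

# Extensions from the section 6 list whose formats are ZIP packages.
# Hypothetical helper set, introduced here for illustration.
ZIP_BASED_EXTENSIONS = {".docx", ".xlsx", ".pptx", ".odt", ".ods", ".odp"}


def check_zip_container(path):
    """Return 'skipped', 'ok', 'not-a-zip', 'unreadable', 'encrypted'
    or 'damaged:<member name>' for the file at `path`."""
    if Path(path).suffix.lower() not in ZIP_BASED_EXTENSIONS:
        return "skipped"  # not a ZIP-based format, nothing to test here
    if not zipfile.is_zipfile(path):
        return "not-a-zip"  # corrupt, mislabelled, or wrapped by protection
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() reads every member and verifies its CRC;
            # it returns the first bad member name, or None.
            bad_member = archive.testzip()
    except zipfile.BadZipFile:
        return "unreadable"
    except RuntimeError:
        # zipfile raises RuntimeError when a member needs a password.
        return "encrypted"
    return "ok" if bad_member is None else f"damaged:{bad_member}"
```

Any result other than `ok` or `skipped` is best treated as a prompt for manual investigation rather than as a verdict on the record.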

6. Acceptable File Formats for Transfer: version 1.0 September 2011

The list that follows does not detail different versions of individual file formats; however, in certain cases, the particular version of a format may affect our ability to accept the files. We will require a comprehensive list of file formats, including version information, from you at an early stage of the digital transfer process.

6.1 Document and Text Formats

Open Document Text                     .odt
Open Office Write                      .sxw
Microsoft Rich Text Format             .rtf
Plain Text                             .txt
Portable Document Formats              .pdf
HTML                                   .htm .html
eXtensible Mark-up language            .xml
Microsoft Word, Works and Wordpad      .docx .doc .wps .wpd
Word Perfect                           .wpd

[6] DROID is a free and open source file format identification tool, developed by The National Archives (www.nationalarchives.gov.uk/droid).

6.2 Spreadsheet Formats

Comma Separated Values                 .csv
OpenDocument Spreadsheet               .ods
OpenOffice Calc                        .sxc
Lotus 1-2-3 Worksheet                  .wks
Lotus Symphony Spreadsheet             .wrk .wr1
Microsoft Excel                        .xlsx .xls
Microsoft Works Spreadsheet            .wks

6.3 Presentation Formats

OpenDocument Presentation              .odp
OpenOffice Impress                     .sxi
Microsoft PowerPoint                   .pptx .ppt
Harvard Graphics Presentation          .shw

6.4 Graphic Formats

JPEG 2000                              .jp2
JPEG                                   .jpg
Tagged image file format               .tif
OpenOffice Draw                        .sxd
OpenDocument Graphics                  .odp
Portable Network Graphics              .png
Graphics Interchange Format            .gif
Windows Bitmap                         .bmp
Encapsulated Postscript                .eps
Corel Draw                             .cdr
Adobe digital negative                 .dng
Microsoft Visio Drawing                .vsd

6.5 Audio Formats

Waveform Audio Format                  .wav
Broadcast Wave Format                  .wav
Ogg Vorbis                             .ogg .oga
Windows Media Audio                    .wma
MIDI Audio                             .mid
MPEG 1/2 Audio Layer 3                 .mp3
MPEG-4                                 .mp4

6.6 Video Formats

Motion JPEG 2000                       .mj2
Ogg Theora                             .ogg .ogv
Windows Media Video                    .wmv
Macromedia Director                    .dir
Macromedia Flash                       .swf
Quicktime                              .mov
MPEG-4                                 .mp4

6.7 Email Formats

Internet Message Format                .eml
Microsoft Outlook Personal Folders     .pst
Microsoft Outlook Email Message        .msg

6.8 Other Formats

Microsoft Project .mpp
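
As a first pass before running a proper identification tool, the extensions in the list above can be encoded as a lookup and compared against a folder of selected records. The sketch below is an illustration under stated assumptions: the extension set is transcribed from version 1.0 of this list, the function names are hypothetical, and an extension match says nothing about the actual format or its version, so it cannot replace the format list with version information that is required for transfer.

```python
from collections import Counter
from pathlib import Path

# Extensions transcribed from sections 6.1 to 6.8 (version 1.0, September 2011).
ACCEPTED_EXTENSIONS = {
    ".odt", ".sxw", ".rtf", ".txt", ".pdf", ".htm", ".html", ".xml",
    ".docx", ".doc", ".wps", ".wpd",                                   # 6.1
    ".csv", ".ods", ".sxc", ".wks", ".wrk", ".wr1", ".xlsx", ".xls",   # 6.2
    ".odp", ".sxi", ".pptx", ".ppt", ".shw",                           # 6.3
    ".jp2", ".jpg", ".tif", ".sxd", ".png", ".gif", ".bmp", ".eps",
    ".cdr", ".dng", ".vsd",                                            # 6.4
    ".wav", ".ogg", ".oga", ".wma", ".mid", ".mp3", ".mp4",            # 6.5
    ".mj2", ".ogv", ".wmv", ".dir", ".swf", ".mov",                    # 6.6
    ".eml", ".pst", ".msg",                                            # 6.7
    ".mpp",                                                            # 6.8
}


def extension_inventory(root):
    """Count files under `root` by lower-cased extension ('' if none)."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower()] += 1
    return counts


def unlisted_extensions(counts):
    """Return the sorted extensions in `counts` that are not on the list."""
    return sorted(ext for ext in counts if ext not in ACCEPTED_EXTENSIONS)
```

Extensions returned by `unlisted_extensions` mark files to raise with The National Archives; they are not grounds for converting the files.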

7. Further reading

Digital Continuity service

For information and guidance on maintaining the usability of digital records over time.

www.nationalarchives.gov.uk/digitalcontinuity

Digital Transfer

Guidance for departments preparing and transferring digital records to The National Archives.

www.nationalarchives.gov.uk/documents/information-management/digital-transfer-guidance.pdf

DROID

Free, open source, file profiling and characterisation tool developed by The National Archives.

www.nationalarchives.gov.uk/droid

Sustainable digital preservation

Report of the Blue Ribbon Task Force on Sustainable Digital Preservation and Access.

www.jisc.ac.uk/publications/reports/2010/blueribbontaskforcefinalreport.aspx
