

7. THE HARVESTING MODEL AND SERVICES BASED UPON IT

7.5 The near future

When considering the near future of services, there are some observations that can be made with (relative) confidence.

The ePrints UK, Portal-in-a-Browser and Google models are not mutually exclusive. ePrints UK have a system in place that could and should be launched, as a simple service, very quickly. Work on developing OAI, SRW/SRU and RSS interfaces to add to the service could continue in the meantime. The Google model may offer an alternative point of entry to scholarly resources that is likely to prove popular with many users.

ePrints UK discard the harvested full-text documents once the full text has been analysed and the metadata have been updated. However, in a fully integrated service the harvested full text could not only support added-value services but also serve as the basis for preservation. Rights issues must be considered before this latter opportunity is taken; the specification of rights over OAI-based resources is currently being addressed by the OAI-Rights Working Group. In the short term, the rights statements most likely to be applied to resources such as e-prints are Creative Commons licences. Given the nature of these licences, it is unlikely that rights statements will indicate whether ‘storage of a resource at a service provider for the purposes of preservation’ is explicitly allowed or forbidden. Nevertheless, provided that the conditions and restrictions explicitly stated in the licence are respected, maintaining a copy of the full text purely for the purposes of preservation should be permissible. (In fact, most Creative Commons licences allow distribution, but it may be more appropriate for a preservation service to direct download requests to the original data provider rather than fulfil them from the service provider’s copy.) Where the legal position is in doubt, individual data providers should be asked to agree that a centralised service may maintain and preserve their resources in this way.

Recommendations and guidelines already made by ePrints UK should be pursued and promoted, and the findings of RoMEO, SHERPA and other projects should be incorporated into services as they develop.

With regard to poor-quality metadata, once metadata have been harvested into a national store, the records can be examined in any number of ways. For example, the metadata can be used to carry out automated surveys of:

• Subject – which schemes and keyword terms are used, to assess the challenges of achieving a harmonised scheme (or schemes)

• Format – in what formats are digital objects stored? Build a practical ‘real world’ list of formats that should commonly be supported, and identify risks to preservation likely to be caused by proprietary or obscure formats

• Type – how have archives categorised their digital objects by type?

• Identifier – how many resources have been assigned persistent identifiers, and how many have not? Long-term access to resources without persistent identifiers is at risk
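A survey of this kind needs little more than counting. The sketch below assumes, hypothetically, that harvested records have been flattened into Python dictionaries keyed by simple Dublin Core element names (`subject`, `format`, `type`, `identifier`), each holding a list of values. The rule for what counts as a persistent identifier (DOI, handle or URN prefixes) is likewise an illustrative assumption, not a policy set out in this report.

```python
from collections import Counter

# Illustrative assumption: identifiers with these prefixes are treated as
# persistent (DOIs, handles, URNs); plain repository URLs are not.
PID_PREFIXES = ("doi:", "info:doi/", "https://doi.org/", "hdl:",
                "http://hdl.handle.net/", "urn:")


def is_persistent(identifier):
    """Return True if the identifier looks like a persistent identifier."""
    return identifier.strip().lower().startswith(PID_PREFIXES)


def survey(records):
    """Tally subject terms, formats and types across harvested records,
    and count how many records carry at least one persistent identifier."""
    subjects, formats, types = Counter(), Counter(), Counter()
    with_pid = 0
    for rec in records:
        # Normalise case and whitespace so 'Physics' and 'physics ' count together
        subjects.update(s.strip().lower() for s in rec.get("subject", []))
        formats.update(f.strip().lower() for f in rec.get("format", []))
        types.update(t.strip().lower() for t in rec.get("type", []))
        if any(is_persistent(i) for i in rec.get("identifier", [])):
            with_pid += 1
    return {"subjects": subjects, "formats": formats, "types": types,
            "with_pid": with_pid, "without_pid": len(records) - with_pid}


# Hypothetical sample of two harvested records
records = [
    {"subject": ["Physics"], "format": ["application/pdf"], "type": ["Text"],
     "identifier": ["http://eprints.example.ac.uk/12/"]},
    {"subject": ["physics", "Optics"], "format": ["application/pdf", "text/html"],
     "type": ["Article"], "identifier": ["doi:10.1234/abc"]},
]
report = survey(records)
print(report["formats"].most_common())
```

The format counter is the raw material for the ‘real world’ format list, and the two identifier counts give a rough measure of how many resources lack persistent identifiers.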

The results of such surveys can then inform further work on improving the quality of exposed metadata, which is key to providing effective browsing and advanced search facilities.

One issue that needs further attention is the identification of duplicate resources. At the time of writing, duplication of resources is scarcely an issue compared with the need to populate repositories in the first place. However, it is an issue that will need to be resolved in practice as repository populations grow.

There are several types of apparent duplicates:

• Different formats of a resource

• Mirror copies of a resource in different locations

• Creation of duplicate records or submissions of duplicate resources within a repository

• Records from multiple data providers identifying a single resource in a single location

As discussed in section A.3.4.1, records relating to revisions and formats of a resource should be grouped together, either as loosely bound separate records or as a single structured record. The same approach would also be appropriate for mirror copies.
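One way to picture a ‘single structured record’ is a work-level record that lists each manifestation (format and location) of the resource. The sketch below assumes, hypothetically, that the individual records have already been linked by a shared work key (`work` below); establishing that link is the harder problem and is not shown.

```python
from collections import defaultdict


def group_manifestations(records):
    """Collapse records that share a (hypothetical) 'work' key into one
    structured record listing each distinct format/location pair."""
    works = defaultdict(lambda: {"title": None, "manifestations": []})
    for rec in records:
        work = works[rec["work"]]
        # Keep the first title seen for the work
        if work["title"] is None:
            work["title"] = rec["title"]
        manifestation = (rec["format"], rec["location"])
        # Skip exact repeats of the same format at the same location
        if manifestation not in work["manifestations"]:
            work["manifestations"].append(manifestation)
    return dict(works)


# Hypothetical records: PDF and HTML at the home archive, plus a mirrored PDF
records = [
    {"work": "w1", "title": "On harvesting", "format": "application/pdf",
     "location": "http://eprints.example.ac.uk/1/paper.pdf"},
    {"work": "w1", "title": "On harvesting", "format": "text/html",
     "location": "http://eprints.example.ac.uk/1/"},
    {"work": "w1", "title": "On harvesting", "format": "application/pdf",
     "location": "http://mirror.example.org/1/paper.pdf"},
]
grouped = group_manifestations(records)
```

Formats and mirror copies end up side by side as manifestations of one work, which is the grouping the text recommends.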

Duplicate records and duplicate submissions within a repository are problems that should be addressed locally, through:

• Action by repository administrators

• Addition or enhancement of duplicate-checking algorithms within the repository software
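The simplest duplicate check that repository software could apply at submission time is a checksum comparison, which catches byte-identical files only. The sketch below uses SHA-256 from Python's standard `hashlib`; the in-memory `Repository` class is a hypothetical stand-in for real repository software.

```python
import hashlib


class Repository:
    """Toy in-memory repository that refuses byte-identical submissions."""

    def __init__(self):
        self._by_digest = {}  # SHA-256 hex digest -> existing item id
        self._next_id = 1

    def submit(self, content: bytes):
        """Return (item_id, created). created is False for an exact duplicate."""
        digest = hashlib.sha256(content).hexdigest()
        if digest in self._by_digest:
            # Exact duplicate: point to the existing item instead of adding one
            return self._by_digest[digest], False
        item_id = self._next_id
        self._next_id += 1
        self._by_digest[digest] = item_id
        return item_id, True


repo = Repository()
first = repo.submit(b"%PDF-1.4 example paper")
again = repo.submit(b"%PDF-1.4 example paper")
other = repo.submit(b"%PDF-1.4 a different paper")
```

A revised version or a different format of the same paper has a different checksum, so this check complements, rather than replaces, metadata-based matching.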

For records from multiple data providers that identify a single resource:

• Creating algorithms that use ‘fuzzy’ matching to identify duplicates is not a new challenge

• The multiple records should be amalgamated into a single record

• Where some data providers have enhanced the records they expose, amalgamation represents an opportunity to enhance the metadata rather than a crisis
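As an illustration of ‘fuzzy’ matching followed by amalgamation, the sketch below compares normalised titles with `difflib.SequenceMatcher` from the Python standard library, and additionally requires the same year and at least one shared creator surname. The 0.9 threshold, the record layout and the merge rule (keep every distinct value) are assumptions chosen for the example, not tested settings.

```python
import re
from difflib import SequenceMatcher


def normalise(title):
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", title.lower()).split())


def surnames(record):
    """Assume creators are written 'Surname, Forename' and keep the surname."""
    return {c.split(",")[0].strip().lower() for c in record.get("creator", [])}


def likely_same(a, b, threshold=0.9):
    """Heuristic duplicate test: same year, similar title, shared surname."""
    if a.get("date", "")[:4] != b.get("date", "")[:4]:
        return False
    ratio = SequenceMatcher(None, normalise(a["title"]), normalise(b["title"])).ratio()
    return ratio >= threshold and bool(surnames(a) & surnames(b))


def as_list(value):
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def amalgamate(a, b):
    """Merge two records: single values from the first record win,
    multi-valued fields keep every distinct value in order."""
    merged = {}
    for key in list(a) + [k for k in b if k not in a]:
        va, vb = a.get(key), b.get(key)
        if isinstance(va, list) or isinstance(vb, list):
            merged[key] = list(dict.fromkeys(as_list(va) + as_list(vb)))
        else:
            merged[key] = va if va is not None else vb
    return merged


# Hypothetical records for the same paper from two data providers
r1 = {"title": "Harvesting e-prints: a UK perspective", "creator": ["Smith, J."],
      "date": "2004", "subject": ["OAI"]}
r2 = {"title": "Harvesting E-Prints - A UK Perspective",
      "creator": ["Smith, John", "Jones, A."], "date": "2004-06",
      "subject": ["OAI", "metadata"]}
merged = amalgamate(r1, r2) if likely_same(r1, r2) else None
```

In the merged record ‘Smith, J.’ and ‘Smith, John’ survive as separate values; reconciling such name forms is where a provider's enhanced record could improve the amalgamated metadata.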

As experience with DSpace has shown, it is possible to create and expose functional OpenURLs. While service providers can generate OpenURLs from harvested metadata, in the longer term it would be better if the creation and exposure of OpenURLs were carried out by the data providers.
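For a journal article, an OpenURL 1.0 (ANSI/NISO Z39.88-2004) link in key/encoded-value form can be built from a handful of metadata fields. The sketch below uses Python's standard `urllib.parse.urlencode`; the resolver address and the record's field names are hypothetical, and only a few of the journal-format keys the standard defines are covered.

```python
from urllib.parse import urlencode

RESOLVER = "https://resolver.example.ac.uk/openurl"  # hypothetical link resolver

# Hypothetical record field -> OpenURL journal-format key
FIELD_KEYS = {"title": "rft.atitle", "journal": "rft.jtitle", "date": "rft.date",
              "volume": "rft.volume", "spage": "rft.spage", "issn": "rft.issn"}


def make_openurl(record, resolver=RESOLVER):
    """Build an OpenURL 1.0 KEV link for a journal article record."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    for field, key in FIELD_KEYS.items():
        if record.get(field):
            params.append((key, record[field]))
    for creator in record.get("creators", []):
        params.append(("rft.au", creator))  # one rft.au per author
    if record.get("doi"):
        params.append(("rft_id", "info:doi/" + record["doi"]))
    # urlencode percent-encodes values (':' -> %3A, '/' -> %2F, space -> '+')
    return resolver + "?" + urlencode(params)


url = make_openurl({"title": "Harvesting the harvesters", "journal": "Example Journal",
                    "date": "2004", "volume": "40", "spage": "1",
                    "creators": ["Smith, J."], "doi": "10.1234/xyz"})
```

Whether the data provider or the service provider generates such links, the resolver only needs the encoded metadata, which is why the quality of exposed metadata matters here too.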

As the discussion of existing service providers earlier in this document revealed, features deemed both desirable and achievable include personalisation, annotation, alerting and linking to related documents in search results.

The Open Access movement and the OAI are both rooted in cooperative and collaborative philosophies. One certainty, therefore, is that cooperation and collaboration will be key to the development of a future Information Environment.

In particular, archive software developers from different projects need to get ‘round the table’ and work together. While archive administrators can contribute to improvements within their own domains, major steps forward across the board are more likely if fundamental improvements are built into the archive software packages themselves. Agreement on the labelling of resource types, a common set of high-level subject terms, the introduction of standard collection descriptions and the adoption of web services that enhance metadata at the time of submission would all be helpful, whether adopted individually or together.
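Agreement on resource-type labels could be supported by something as simple as a shared mapping applied when an item is submitted. The vocabulary below is invented purely for illustration (no agreed list exists in this report), and the hook name `enhance_at_submission` is hypothetical.

```python
# Hypothetical shared vocabulary: local labels -> agreed high-level type
COMMON_TYPES = {
    "article": "JournalArticle",
    "journal article": "JournalArticle",
    "paper": "JournalArticle",
    "conference item": "ConferencePaper",
    "conference paper": "ConferencePaper",
    "thesis": "Thesis",
    "phd thesis": "Thesis",
    "preprint": "Preprint",
}


def normalise_type(local_label):
    """Map a repository's local type label onto the shared vocabulary.
    Hyphens, underscores, case and extra spaces are ignored; unknown
    labels return None so they can be flagged for review."""
    key = " ".join(local_label.replace("-", " ").replace("_", " ").lower().split())
    return COMMON_TYPES.get(key)


def enhance_at_submission(record):
    """Hypothetical submission hook: add the common type and a review flag."""
    enhanced = dict(record)
    enhanced["common_type"] = normalise_type(record.get("type", ""))
    enhanced["needs_review"] = enhanced["common_type"] is None
    return enhanced


example = enhance_at_submission({"title": "On harvesting", "type": "conference_item"})
```

Building the mapping into the software, rather than leaving each administrator to apply it, is the kind of across-the-board improvement the paragraph argues for.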

Future developments in any proposed services will be both evolutionary and revolutionary. While evolutionary change can be predicted, at least to some extent, revolutionary change presents an unknown: who knows what paradigm shifts may occur in the next three, five or ten years that will render much of our current thinking invalid?

Despite these reservations, the next section makes a few tentative observations on ‘what might be’ at some time in the future.

In document Delivery, Management and Access Model for E prints and Open Access Journals within Further and Higher Education (Page 66-69)