UK Web archive user statistics

Part II: User access

2. UK Web archive user statistics

Public knowledge of the existence of web archives appears to be very low, and knowledge of the UKWAC and UK Web Archive collections lower yet. This is consistent with the status of, and relatively low publicity for the archives, which characterizes the reduced use of Web archiving collections. It is reasonable to assume that the use of Web archiving will increase dramatically when Websites start to be harvested on a larger scale (i.e. if the need for archiving permissions is removed) and Web archiving initiatives become better publicised. It is important to stress that, because much of the use of Web page archives is likely to be in the distant future, the majority of prospective users of such archives have probably not yet been born. As pointed out before, Web archiving programmes are still relatively young, and because of being on a pioneering phase, issues of public access seem to attract little interest from the part of Web archiving institutions. However, the relatively low usage of the archives today must not be taken as evidence that Web archiving is not valuable and that it is not going to be more popular in the future.

The analysis of Web access statistics is notoriously complex, depending as it does on many definitions and on data gathered automatically by Web analytics software.

This software often makes decisions which are not controlled by humans, and which cannot consistently measure all attributes of Web archive use. Accordingly, these figures must be seen as approximate. Traffic measurement is complicated further by a unique feature of The National Archives’ collection, namely that visitors to government Websites can be redirected automatically to archived copies of pages of documents in certain situations (if the necessary software has been installed on the Website hosting server). In effect, these visitors become users of the National Archive Web collection, without having made a conscious request to do so, by redirection from a live site where the page no longer exists. Though the archived pages are labelled as such, users may also not realise that they are accessing documents from a Web archive collection. Figures for redirected access are shown separately in the table below, in the column related to ‘automated

redirection’ traffic, providing statistics on Web archive usage in the UK for the first semester of 2009:

Figure 2: Traffic statistics for the UK Web archive and The National Archive

(The National Archives, 2009)

Unfortunately UK Web Archiving institutions do not provide statistics on the number of visitors accessing the collection on a periodical basis. The latest published figures showing public visits to the collections were gathered between February and June 2009, the period when UK the Web archived launched its beta site after the dissolution of the UK Web Archive Consortium. Figures for the collections of The UK Web Archive and The National Archives are shown above, indicating the statistics of automated redirection from live Websites to archived content in the National Archives Web collection.

An early study carried out by UKWAC partners in September and October 2005 by the Digital Preservation Consortium reported about 8,000 unique visits per month.

The above statistics for 2009 represent about 26,500 visitors per month excluding automated redirection visits (i.e. 180 + 704 per day for 30 days per month), indicating a growth of over 300% in under four years. From this we can conclude that traffic volumes are growing, but remain low for a national Internet resource.

These results, however, seems to be consistent with the relative youth of the web archive collections and the limited publicity they have received.

44 2.1 Other Archives

Arguably the most relevant comparator for UK Web archiving institutions is PANDORA, the Australian national Web archive programme, relevant because of its greater wealth of content. The statistics available for user traffic in PANDORA for June 2009, the last period reported for UK Web archive access, shows that the site had approximately 1,000 visitors on a daily basis (excluding robots software), accessing an average of 3.63 pages per day. It is interesting to note that month used here. Apart from making available user statistics on a monthly basis, PANDORA also publishes separate lists of popular search terms. A brief examination shows that among the meaningful search terms input by users, information of local importance figures highly. Examples of the top key terms used by visitors include “Tatiana Grigorieva” (a Russian athlete who took Australian citizenship in 1997), “first families”, “Australian citizenship” and “2000 Olympics.” It is fair to point out that a relatively high proportion of the search terms reflect curiosity about PANDORA itself rather than genuine research on its contents;

roughly half of the top 40 search terms, accounting for one quarter of all search terms, include various combinations of PANDORA, Web archive, etc (The National Library of Australia, 2009). Another interesting information provided by PANDORA is the breaking down of the total number of visitor attributed to different countries.

The UK appears in the statistics provided for June 2009 as the fifth largest source of visits after Australia, Japan, France, and the Netherlands. The US Internet Archive does not publicly report user’s statistics.

Unfortunately, public information on the number of visitors for Web archiving collections in the English speaking countries is not provided by their respective institution or consortia, except for PANDORA which offers comprehensive data on the number of visitors, the countries from where the platform is accessed and the

main search terms adopt by users. By withholding statistic data on use and access, many Web archiving institutions are missing the opportunity to offer evidence to their respective governments on the relevance of Web archive for the general community failing, therefore, to obtaining more visibility and public support for their projects. By the same token, they will also fail to provide relevant information for researchers (historians, sociologists, informational professionals, etc) interested in analysing, now and in the future, the development of Web archiving initiatives.

In document Web Archiving in the UK: Current Developments and Reflections for the Future (Page 42-45)