
Digital Image Archive of Medieval Music Phase 2: Technical Appendix


Academic year: 2021


Paul Vetch, DDH

Overview

Technical research and development for this phase of the DIAMM project has afforded an opportunity to review and respond to the many years of feedback gathered from end users of the resource, which has been online in a substantially unchanged form (particularly in terms of its functionality) since 2004. One of the core aims of this phase was to investigate ways to make the DIAMM archive and its metadata more accessible. In the past the metadata has been only fractionally represented online, and the website has structured the contents of the database around archives and manuscripts. This made it difficult to use effectively for users unfamiliar either with DIAMM's scope of coverage or with the distribution of music manuscript holdings across the world. For a user with a palaeographical interest in the archive, for example, DIAMM has been difficult to approach. The new version of DIAMM has been developed specifically to offer users both the flexibility to construct different views of the data according to their research needs and the ability to interrogate the database from a number of different disciplinary perspectives.

Development work for this phase of DIAMM was very substantially delayed at DDH. Early staffing problems meant that work began much later than planned. As is to be expected on a project of this complexity, numerous issues arose once work began, and these made it impossible to squeeze the same amount of development effort into a shorter period of time. DDH has since changed its working and management practices to avoid such problems in the future. The additional time spent on the project, whilst by no means desirable, has at least allowed for a much longer period of user engagement and testing, and the feedback we have received is reflected in our work. In addition, we can be confident that the finished product is technically as up to date as it could be.

1. Technical Infrastructure

The underlying database and infrastructure requirements of the DIAMM project have to date presented a number of challenges and idiosyncrasies, which this phase of the project has allowed us to address.

• The DIAMM master database (MDB) has been (and will continue to be) a working environment, and as such it contains a significant body of procedural material and metadata not intended for public display. The master database therefore needed to be able to continue to exist and develop somewhat independently of the website. The master database is a standalone FileMaker Pro system and needed to remain in this format.

• The process of updating the DIAMM website has, to date, involved significant manual intervention by staff at both Oxford and DDH, using a number of different technologies, and was previously possible only on a quarterly or occasional basis.

• DIAMM has previously been structured around the concept of 'Source' records (i.e. data has in the past been recorded and related at the level of a complete manuscript), and this is how the previous DIAMM website exposed it. For this phase of the project, the database structure was substantially altered to model 'Items' within sources; that is, instances of works of music within source manuscripts. This represents a radical change in the database: it was primarily structured around physical objects and is now conceptually structured around works of music. However, it also makes the DIAMM database model more generic, and potentially better suited to storing manuscript materials of any kind.


The low-level technical infrastructure work has had three key aims. The first was to plan and implement a new mechanism for harvesting data from the master database for publication on the DIAMM website. The second was to design and develop a completely new database model for the DIAMM web application, built around the need to accommodate data at 'Item' level and to expose a much greater subset of the data within the MDB. The third was to develop a streamlined workflow for the delivery and preparation of images.

1.1 Database Modelling

The substantial redesign of the DIAMM MDB necessitated a matching redesign of the surrogate database held at DDH for the web application. This redesign went some way beyond the simple addition of new fields and tables, although the new database naturally contains a significantly greater body of data, arranged across many more fields and tables, than did the previous version.

As a result, an early decision was taken to build the new DIAMM web application from the ground up. It uses a surrogate database model built from scratch so that the Item-level data can be properly managed, and so that some of the more complex codicological scenarios and relationships inherent to the DIAMM MDB model can be represented. Accordingly, the surrogate database now handles two such cases by means of a new grouping structure known as a 'Set'. The first is discrete manuscript parts which were subsequently bound together. The second is the virtual aggregation of items which logically belong with one another but which have been separated at some point in the past, as in the case of some of the Part Books. At present there are seven distinct types of set record, but the model has been designed to allow data to be grouped either arbitrarily or by other criteria if required in the future. The same concept has also been used in the development of the new annotation functionality.
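The 'Set' grouping described above can be sketched as a record that holds references to existing source records. This is a minimal illustration and not the DIAMM schema. The class and field names are hypothetical. The appendix says there are seven set types but names only two scenarios, so only two illustrative types are shown.

```python
from dataclasses import dataclass, field
from enum import Enum


class SetType(Enum):
    # Hypothetical labels: only two of the seven set types are illustrated.
    BOUND_TOGETHER = "parts bound together"
    PART_BOOKS = "separated part books"


@dataclass
class Source:
    source_id: int
    shelfmark: str


@dataclass
class SourceSet:
    set_id: int
    set_type: SetType
    members: list[Source] = field(default_factory=list)

    def add(self, source: Source) -> None:
        # Membership is stored on the set, not on the source, so one source
        # can belong to several sets without changing its own record.
        if source not in self.members:
            self.members.append(source)

    def shelfmarks(self) -> list[str]:
        return [s.shelfmark for s in self.members]
```

With this design, a physically dispersed set of part books can be treated as one virtual object without altering the individual source records.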

1.2 Database Interoperability and Updating

As noted, the existing DIAMM MDB is a vitally important instrument. This is not only because of the archive data it contains, but also because it sustains the project by generating reports and custom views of the data that allow management oversight. The changes needed to model item-level data were made during the first year of the project, and the basic structure of the database is now expected to remain unchanged. At table and record level, however, it will continue to need to be extended and adapted to accommodate new data as the project grows.

Historically, data has passed from the MDB to the website by means of a scripted process which dumps data from the master database into a binary dump file. This file was then transferred digitally to DDH, where an upload script was invoked to import the data first to a staging environment and ultimately to the live website. A complication with this approach has always been that the DIAMM system allows user-contributed content, which is stored only in the surrogate database at DDH. Any changes to record IDs and similar identifiers must therefore be accommodated too, because user annotations might refer to them implicitly or explicitly.
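One way to accommodate changed record IDs after an import is to re-point stored annotations through an old-to-new ID map. The following is a minimal sketch of that idea, assuming the import step can produce such a map and a list of retired IDs. `remap_annotations`, `id_map` and `retired_ids` are hypothetical names, not part of the DIAMM code.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    annotation_id: int
    record_id: int  # ID of the MDB-derived record the user annotated
    text: str


def remap_annotations(annotations, id_map, retired_ids):
    """Re-point user annotations after an import changes record IDs.

    id_map maps old record IDs to new ones; retired_ids holds IDs that
    disappeared with no successor. Returns (updated, orphaned) lists.
    """
    updated, orphaned = [], []
    for ann in annotations:
        if ann.record_id in id_map:
            ann.record_id = id_map[ann.record_id]
            updated.append(ann)
        elif ann.record_id in retired_ids:
            # Flag for manual review rather than silently deleting
            # user-contributed content.
            orphaned.append(ann)
        else:
            updated.append(ann)  # record ID unchanged by this import
    return updated, orphaned
```

Keeping orphaned annotations separate, rather than dropping them, reflects the fact that user content exists only in the surrogate database and cannot be regenerated from the MDB.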

Initially we had planned to implement further scripting to allow this process to be semi-automated. However, a decision was taken to explore a new FileMaker feature known as 'External SQL Sources' (ESS), which allows a direct low-level connection to be made from one database to another. Implementation of this system proved highly problematic and time-consuming. This was owing both to the complexity of the new DIAMM 'Item'-level database structure and to some shortcomings of early versions of the ESS functionality. We have nevertheless implemented a direct connection between the DIAMM MDB and the surrogate web application database at DDH. The connection is designed to include only those fields which the web application requires and, crucially, it can be invoked by the DIAMM team in Oxford. Once the process of 'pushing' data begins, a number of validation steps take place before the new data becomes visible on the new DIAMM staging server.
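The appendix does not specify which validation steps run during a push. The sketch below shows the kind of checks such a step might perform before data reaches staging:

- rejecting fields outside the published whitelist, since the MDB holds material not intended for public display;
- rejecting rows with missing keys or duplicate IDs;
- rejecting items that point to unknown sources.

The field names and the `validate_push` function are hypothetical.

```python
# Hypothetical whitelist of fields the web application is allowed to receive.
ALLOWED_FIELDS = {"item_id", "source_id", "title", "composer"}
REQUIRED_FIELDS = {"item_id", "source_id"}


def validate_push(items, known_source_ids):
    """Return a list of error strings; an empty list means the batch may go to staging."""
    errors = []
    seen = set()
    for row in items:
        extra = set(row) - ALLOWED_FIELDS
        if extra:
            # Internal or procedural fields must never reach the public site.
            errors.append(f"item {row.get('item_id')}: unexpected fields {sorted(extra)}")
        missing = REQUIRED_FIELDS - set(row)
        if missing:
            errors.append(f"item {row.get('item_id')}: missing {sorted(missing)}")
            continue
        if row["item_id"] in seen:
            errors.append(f"item {row['item_id']}: duplicate ID")
        seen.add(row["item_id"])
        if row["source_id"] not in known_source_ids:
            errors.append(f"item {row['item_id']}: unknown source {row['source_id']}")
    return errors
```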


1.3 Image workflow

The workflow for image management was also unsatisfactory. The previous version of DIAMM involved manual pre-processing of surrogate images, derived from master TIFFs held in Oxford, into both thumbnail and zoomable formats. Zoomable images were delivered using the proprietary Zoomify format, which slices images into many thousands of tiles for streaming to a Flash-based client. This process was error-prone for two reasons. First, it relied on an elaborate file-naming scheme to relate surrogate images to the corresponding database records. Second, digital transfer of large numbers of images (particularly the sets of Zoomify tiles) proved impractical, so images had to be sent to DDH in batches on physical storage media.
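To see why Zoomify output is awkward to transfer, it helps to count the tiles in one image's pyramid. This sketch assumes the common Zoomify conventions: 256-pixel tiles, levels that halve in size (rounding down) until the image fits in one tile, and at most 256 tiles per `TileGroupN` folder, numbered from the smallest level upwards. Exact rounding may differ between Zoomify converters.

```python
import math

TILE = 256  # conventional Zoomify tile size in pixels


def zoomify_levels(width, height, tile=TILE):
    """(width, height) of each pyramid level, smallest first.

    Assumes each level halves the previous one (rounding down) until the
    whole image fits in a single tile.
    """
    levels = [(width, height)]
    while width > tile or height > tile:
        width, height = width // 2, height // 2
        levels.append((width, height))
    return list(reversed(levels))


def zoomify_tile_count(width, height, tile=TILE):
    # Each level is covered by a grid of tiles; partial tiles still count.
    return sum(math.ceil(w / tile) * math.ceil(h / tile)
               for w, h in zoomify_levels(width, height, tile))


def tile_group(global_index):
    # Tiles are numbered across all levels; each TileGroup folder holds 256.
    return global_index // 256
```

Under these assumptions, a single 10,000 × 8,000 pixel scan produces 1,709 separate JPEG files spread over seven `TileGroup` folders. An archive of many such scans quickly reaches file counts that are impractical to move one by one.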

During the no-cost extension period of the project we have worked to prepare the DIAMM website for a transition to the exclusive use of JPEG2000 for the storage and delivery of images. The DIAMM application server now includes a local installation of the IIPImage image server,1 compiled with the Kakadu JPEG2000 library.2 The 25,000 or so images which DIAMM has permission to display online have been converted from the original TIFFs into JPEG2000 format. In combination with the local image server, they are used to generate all surrogates, including thumbnails and zoomable images. This approach greatly simplifies the workflow, because a single file is now all that needs to be uploaded when a new image is accessioned. It is also more future-proof and flexible. The JP2 images can serve the existing outputs on the website and could also support different uses in the future, for example larger static images.
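The "single file, many surrogates" point can be illustrated with IIP protocol requests, where a thumbnail and a zoom tile are both derived on demand from the same JP2.

- `FIF` names the source image.
- `WID` scales the whole image to a width.
- `CVT=jpeg` returns the result as one JPEG.
- `JTL=r,t` returns tile `t` at resolution level `r`.

The server URL and file path below are hypothetical, and the sketch covers only these few commands.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; iipsrv is commonly deployed as a FastCGI script.
IIP_BASE = "https://images.example.org/fcgi-bin/iipsrv.fcgi"


def thumbnail_url(jp2_path, width=150):
    # Scale the whole JP2 to `width` pixels and return it as a JPEG.
    # The output command (CVT) is placed last, after the size parameter.
    query = urlencode([("FIF", jp2_path), ("WID", width), ("CVT", "jpeg")], safe="/,")
    return f"{IIP_BASE}?{query}"


def tile_url(jp2_path, resolution, tile_index):
    # Request one JPEG tile for a zoomable viewer from the same JP2 file.
    query = urlencode([("FIF", jp2_path), ("JTL", f"{resolution},{tile_index}")], safe="/,")
    return f"{IIP_BASE}?{query}"
```

Because every surrogate is a URL rather than a stored file, accessioning a new image means uploading one JP2. No thumbnail or tile files need to be prepared and transferred.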

The image accession process currently requires JP2 images to be generated before files are uploaded. However, a new image processing system will be available to all DDH projects during the first quarter of 2012. It will allow images to be uploaded in their native format and will then automate their processing and storage.

In parallel with the switch to JPEG2000, a new image viewer has been developed to support delivery of zoomable images; this is described in Section 3 below.

1.4 Server infrastructure

The DIAMM application is relatively demanding on server hardware, in part because of the size of the underlying database and the complexity of the queries needed to power the browsing and searching functionality. Faceted navigation in particular is well known for being expensive in these terms. During development, the performance of the application was a major problem. Considerable time was spent optimising the application to run more efficiently (as noted in the technical appendix to our second no-cost extension request), but with only partial success, and this was one of the factors that hampered development progress. However, DDH's server infrastructure was completely replaced in November 2011. DIAMM was one of the first projects migrated to the new systems, and performance is now completely satisfactory.

2. Web Application Redevelopment and Database Delivery

Web application development has focussed both on accessibility and on modernising DIAMM's public-facing functionality and user interface. Functional enhancements have included the design and implementation of faceted navigation and the provision of enhanced end-user curation tools. More generally, they have aimed to expose as much of the underlying data to end users as possible.

1 http://iipimage.sourceforge.net/
2 http://www.kakadusoftware.com/


2.1 Item Level Data and the DIAMM ‘Record’ concept

The fundamental change to the database model (from 'Source' to 'Item' level) meant that a quite different approach had to be taken to the display of data from the database. Item-level records in DIAMM are largely 'synthetic': they inherit much of their meaning from their context and related records, and in many cases contain little data themselves. Equally, items can in theory attach to more than one source record. Likewise, the new web application gives access to 'synthetic' data for other low-level records. Viewing information about a Composer, for example, displays not only basic metadata but also that composer's relationships to other records, such as sources or items.

2.2 Faceted Navigation & Record Display

A key design goal for the new DIAMM web application was not only to give users access to more of the DIAMM metadata, but also to make that data more approachable and findable.

Design planning and discussions with the DIAMM team suggested that the database model was well suited to exposure via a faceted navigation system. Such a system would still allow the previous navigation method (i.e. navigation by source: Country, City, and Archive), but would also make a great deal of other musicological and codicological data available for use as search criteria. The faceted navigation model allows users to combine search terms freely, with results displayed in real time. As the user filters the available records, search criteria which no longer apply are hidden. This allows the user to narrow the result set down to a manageable number, or to explore patterns which emerge when certain criteria are added or removed. In a true faceted navigation system, search criteria can also be removed in any order.
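The behaviour described above can be sketched in a few lines over in-memory records:

- results narrow as filters are combined;
- facet values that would yield no results disappear;
- any filter can be removed in any order.

This is only an illustration. The DIAMM implementation uses Java and Hibernate against a relational database, where each facet count typically becomes a grouped query, which is one reason faceting is expensive. Field names and records here are toy data.

```python
from collections import Counter

# Toy item-level records; field names are hypothetical.
RECORDS = [
    {"composer": "Machaut", "genre": "Ballade", "country": "France"},
    {"composer": "Machaut", "genre": "Motet", "country": "France"},
    {"composer": "Dufay", "genre": "Motet", "country": "Italy"},
    {"composer": "Dunstaple", "genre": "Motet", "country": "England"},
]


def apply_filters(records, filters):
    """Keep records matching every selected facet value (filters: facet -> value)."""
    return [r for r in records if all(r.get(f) == v for f, v in filters.items())]


def facet_counts(records, filters, facets):
    """Count each facet value that still yields results under the current filters.

    Values with no matching records are simply absent, which is what lets
    the interface hide criteria that no longer apply.
    """
    matching = apply_filters(records, filters)
    return {facet: Counter(r[facet] for r in matching if facet in r) for facet in facets}
```

Because the selected filters are an unordered mapping, removing any one of them and recomputing gives the correct result regardless of the order in which filters were added.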

In order to develop the new faceted navigation system, the new database model outlined above was expressed in UML. This model was then used, together with functional specification documentation, to develop a new layer of Hibernate data access objects to support querying of the new database. A new suite of Java objects was also developed, building on work done for the much smaller and simpler 'British Printed Images to 1700' project (http://www.bpi1700.org.uk). These objects allowed the data to be presented across the following facets for searching and browsing purposes:

Primary Facets:

• Source

• Composer

• Genre

• Free Text (not global, but across a set of indexed fields)

Secondary Facets:

• Provenance

• Person

• Languages

• Date

• Clef

Wherever possible, the data for each facet is presented both in autocomplete format (better suited to ‘professional’ users who need quick access to the data, or those who have a sense of the coverage of the database) and in a fully browsable format, allowing a user to explore the full coverage of the system.
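Autocomplete over a facet's values amounts to a prefix lookup. One simple approach, sketched here, keeps the values sorted and uses binary search to find the first candidate. This is an assumption about technique, not a description of DIAMM's implementation, and the `Autocomplete` class is hypothetical.

```python
import bisect


class Autocomplete:
    """Prefix lookup over one facet's values, kept sorted for binary search."""

    def __init__(self, values):
        # Compare case-insensitively but return the original spelling.
        self._entries = sorted((v.casefold(), v) for v in set(values))
        self._keys = [k for k, _ in self._entries]

    def suggest(self, prefix, limit=10):
        key = prefix.casefold()
        # All matches form a contiguous run starting at the insertion point.
        start = bisect.bisect_left(self._keys, key)
        out = []
        for k, original in self._entries[start:]:
            if not k.startswith(key) or len(out) == limit:
                break
            out.append(original)
        return out
```

For example, `Autocomplete(["Machaut", "Magister Piero", "Dufay", "Landini"]).suggest("ma")` returns `["Machaut", "Magister Piero"]`.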


By default, results are shown as items listed within sources, although users may group their results by Composer, Provenance, Genre and Century if they wish. Users can also choose to filter out item records without available images, and to change the number of records retrieved at one time.
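The result display options above (grouping, image-only filtering, page size) can be sketched as one small function. In this sketch, items with no value for the grouping field fall into an explicit bucket rather than causing an error. That is one way to tolerate sources not yet catalogued at item level. The field names and the `group_results` function are hypothetical.

```python
from collections import defaultdict
from itertools import islice


def group_results(items, group_by="source", images_only=False, page_size=20):
    """Group item hits for display; field names are hypothetical."""
    if images_only:
        items = [i for i in items if i.get("has_images")]
    groups = defaultdict(list)
    # Only the first `page_size` hits are retrieved and grouped.
    for item in islice(items, page_size):
        # Missing grouping values go to an explicit bucket instead of failing.
        groups[item.get(group_by) or "Unspecified"].append(item["title"])
    return dict(groups)
```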

Development of the faceted navigation for DIAMM was complicated both by the complexity of the database structure and the fact that ‘Item’ level data is not yet available for all of the sources in the database. The new web application is designed to tolerate this.

2.3 Traditional Searching

For performance and usability reasons, the number of criteria made available in a faceted navigation context needs to be limited. The facets were therefore chosen for their general applicability across a large percentage of the dataset. A larger set of fields is available for querying via the enhanced 'Advanced Search', a powerful traditional search tool. It supports everything from very simple (single-criterion) searches up to complex Boolean searches with up to five sets of criteria, each specified from any one of 23 different fields. The form has been designed to be as approachable as possible. By default the user is presented with only a single fieldset, and can add more to construct a more elaborate search.
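The appendix does not say how the Advanced Search combines its criteria sets or in what order. The sketch below makes one simple assumption: criteria are evaluated left to right, each joined to the running result by AND, OR or NOT, with no operator precedence. The five-criteria cap is enforced as described above. The operator names and the `matches` function are hypothetical.

```python
MAX_CRITERIA = 5

# Hypothetical match operators; both compare case-insensitively.
OPS = {
    "contains": lambda value, term: term.casefold() in str(value).casefold(),
    "equals": lambda value, term: str(value).casefold() == term.casefold(),
}


def matches(record, criteria):
    """Evaluate (joiner, field, op, term) criteria left to right.

    The first joiner is ignored; later ones combine with the running result.
    """
    if not 1 <= len(criteria) <= MAX_CRITERIA:
        raise ValueError(f"expected 1 to {MAX_CRITERIA} criteria")
    result = None
    for joiner, field, op, term in criteria:
        hit = field in record and OPS[op](record[field], term)
        if result is None:
            result = hit
        elif joiner == "AND":
            result = result and hit
        elif joiner == "OR":
            result = result or hit
        elif joiner == "NOT":
            result = result and not hit
        else:
            raise ValueError(f"unknown joiner {joiner!r}")
    return result
```

Left-to-right evaluation keeps the form simple to explain to users. The trade-off is that `A OR B AND C` is read as `(A OR B) AND C` rather than with conventional precedence.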

Most fields are delivered with autocomplete functionality, which is designed to guide users towards more useful searches by steering them away from strings of text that do not occur in the database. Users may, however, ignore the suggestions and enter free text if they prefer.

As with the faceted browse, users can choose to return only results that contain images, and to list results in Item or Source format.

2.4 Bibliography Search

A separate search facility for the Bibliography has been newly developed. Bibliography entries are cross-linked to the records which cite them and vice versa, so this information is now fully integrated into the DIAMM web application.
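Cross-linking in both directions can be sketched as a pair of indexes that are always updated together. This keeps "records citing this entry" and "entries cited by this record" consistent. The `CitationIndex` class and the ID strings are hypothetical.

```python
from collections import defaultdict


class CitationIndex:
    """Two-way index between bibliography entries and the records citing them."""

    def __init__(self):
        self._cited_by = defaultdict(set)  # bibliography ID -> record IDs
        self._cites = defaultdict(set)     # record ID -> bibliography IDs

    def link(self, record_id, bib_id):
        # Writing both directions in one call keeps the two views in step.
        self._cited_by[bib_id].add(record_id)
        self._cites[record_id].add(bib_id)

    def records_citing(self, bib_id):
        return sorted(self._cited_by.get(bib_id, ()))

    def bibliography_for(self, record_id):
        return sorted(self._cites.get(record_id, ()))
```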

2.5 End User Functionality

The previous version of DIAMM allowed users, when in the Image Viewer, to add private and public annotations, and also offered a separate transcription function. The new system greatly extends these capabilities and integrates them more seamlessly into the web application as a whole. Users may still create and edit annotations or transcriptions when in image view, now via a 'just in time' editing system which is easier to use. In addition, it is now possible to create custom 'collections' of records which can be saved and associated with a user's account for later use. This functionality is somewhat analogous to a lightbox on a stock photography website or a recipe binder on a food website. Users are able to collect a variety of different objects together, so that a collection can contain images, source records, item records, etc. This functionality has been deliberately designed to allow for further experimentation and extension.
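A collection that mixes images, source records and item records can be modelled as an ordered list of typed references rather than copies. Saved collections then always reflect the current state of the records they point to. The following is a minimal sketch; the `Ref` and `Collection` names and the set of kinds are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Literal

# Hypothetical object kinds, following the examples given in the text.
Kind = Literal["image", "source", "item"]


@dataclass(frozen=True)
class Ref:
    kind: Kind
    record_id: int


@dataclass
class Collection:
    owner: str
    name: str
    entries: list[Ref] = field(default_factory=list)

    def add(self, kind: Kind, record_id: int) -> None:
        ref = Ref(kind, record_id)
        if ref not in self.entries:  # keep insertion order, skip duplicates
            self.entries.append(ref)

    def of_kind(self, kind: Kind) -> list[int]:
        return [r.record_id for r in self.entries if r.kind == kind]
```

Adding a new kind of collectable object only means extending `Kind`, which suits a feature intended for further experimentation.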

3. Image Delivery

As noted above, the image workflow for DIAMM has changed, and this has necessitated a different mechanism for zoomable image delivery. The IIPImage software package we are using can be partnered with a JavaScript client called 'MooViewer',3 which works conceptually in much the same way as Zoomify. The viewer requests image tiles from the IIPImage server, which generates them dynamically and streams them to the client browser, where the viewer reassembles the image for display.
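Before requesting anything, a tiled viewer of this kind has to work out which tiles overlap the visible area at the current zoom level. This sketch shows that calculation for a regular grid of square tiles. It assumes 256-pixel tiles and a viewport given in pixel coordinates of the current level. The `visible_tiles` function is hypothetical and not taken from MooViewer.

```python
import math


def visible_tiles(view_x, view_y, view_w, view_h, level_w, level_h, tile=256):
    """(col, row) pairs of tiles overlapping a viewport at one zoom level.

    Coordinates are pixels of that level; the viewport is clamped to the
    image so no request is made for tiles that do not exist.
    """
    x0, y0 = max(0, view_x), max(0, view_y)
    x1, y1 = min(level_w, view_x + view_w), min(level_h, view_y + view_h)
    if x0 >= x1 or y0 >= y1:
        return []  # viewport lies entirely outside the image
    cols = range(x0 // tile, math.ceil(x1 / tile))
    rows = range(y0 // tile, math.ceil(y1 / tile))
    return [(c, r) for r in rows for c in cols]
```

For a 1000 × 800 level viewed from the top-left corner through a 300 × 300 window, only four tiles are needed: `[(0, 0), (1, 0), (0, 1), (1, 1)]`.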

The MooViewer client is written using a JavaScript framework which is no longer current, so we decided to rewrite it in JQuery.4 This was partly to make it more sustainable, but also to add specific functionality that makes the viewer more generally useful and allows its graceful introduction to DIAMM (and other projects). In particular, we have introduced support for four different zoomable image formats: IIPImage, Djatoka, DeepZoom, and Zoomify. For this reason our viewer is called JQuery OmniViewer. It is available as an Open Source tool, with documentation, in the Assembla code repository.5 As with MooViewer, OmniViewer supports touch-screen devices such as the iPad and iPhone. It also supports synchronised scrolling and panning across multiple images (implemented in DIAMM), and templating.

OmniViewer's support for multiple zoomable image formats allows it to be easily introduced to existing projects, including those where it is not practical to set up an image server. In particular, we have designed the viewer to provide an upgrade path for users who have made a substantial investment in Zoomify images, who wish to use a non-Flash client, but who do not have the resources to reprocess their images. This functionality has allowed us to introduce the viewer smoothly for DIAMM, initially using the existing Zoomify tiles during the migration to the new image format.
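Multi-format support of this kind is essentially an adapter: the viewer asks for a tile by level, column and row, and a per-format function turns that into a URL. The upgrade path then reduces to choosing which source to use for each image. The sketch below covers the IIP, Zoomify and Deep Zoom tile patterns only (Djatoka is omitted). It assumes `level` is already in each format's own numbering and that Deep Zoom tiles are JPEGs. The base URLs, `TileSource`, `tile_url` and `pick_source` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TileSource:
    fmt: str       # "iip", "zoomify" or "deepzoom"
    base: str      # hypothetical base URL for this image in that format
    cols: int = 1  # tiles per row at the requested level (used by IIP only)


def tile_url(src, level, col, row, tile_group=0):
    """Translate one (level, col, row) request into a format-specific URL."""
    if src.fmt == "iip":
        # IIP JTL numbers tiles left to right, top to bottom within a level.
        return f"{src.base}&JTL={level},{row * src.cols + col}"
    if src.fmt == "zoomify":
        # The TileGroup folder depends on the tile's position in the whole
        # pyramid, so the caller supplies it.
        return f"{src.base}/TileGroup{tile_group}/{level}-{col}-{row}.jpg"
    if src.fmt == "deepzoom":
        return f"{src.base}_files/{level}/{col}_{row}.jpg"
    raise ValueError(f"unsupported format {src.fmt!r}")


def pick_source(available):
    """Prefer JPEG2000 served via IIP; fall back to legacy Zoomify tiles."""
    for fmt in ("iip", "zoomify", "deepzoom"):
        if fmt in available:
            return available[fmt]
    raise LookupError("no supported tile source")
```

During a migration, images not yet converted simply lack an `"iip"` entry, so the viewer keeps serving their existing Zoomify tiles without any change to the client code.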

4. Project Research & Development Team

• John Bradley (Consultant)

• Bea Caballero (UI Developer)

• Elliott Hall (Lead Application Developer)

• John Lee (Application Developer)

• David Little (Lead UI Developer)

• Matteo Romanello (Client-side Developer)

• Dr Charlotte Tupman (Project Analysis & Coordination)

• Paul Vetch (Project Manager)

• Raffaele Viglianti (Application Developer)

• Tim Watts (Systems Manager)

4 http://jquery.com/

5 http://www.assembla.com/spaces/jquery_omniviewer. OmniViewer will shortly be added to DDH's Github
