Network Activity D Developing and Maintaining Databases

Network Activity D

-

Developing and Maintaining Databases

Report D3.2.2

User Interface implementation

Patricia KELBERT MNHN Paris – BGBM Berlin

Illustration Index

Index Page

Extract of the advanced search page for metadata

Extract of the metadata browser page

Advanced search panel

Fields list available for the advanced search

Query fields for the advanced search

Query field for the simple search

Preliminary results for the search “Malus*” (extract)

Refining query with a selection of countries

Preliminary results after refining the query with a selection of countries

Summary box

Extract of the available results page

Confirmation page for downloading a selection of full details

Detailed results page for the unit Malus sylvestris 100108842

Detailed results from the cache because of an unavailability of the original provider

Selected units (shopping basket approach)

Preferences page

1 Introduction

The goal of NA-D 3.2.2 is to implement the User Interface (UI) for the new BioCASE Portal. The implementation has to be based on the architecture study, which covers support for the BioCASE, DiGIR and TAPIR protocols, cache query mechanisms and language module support. There must also be a clear separation between the unit-level query system and the metadata-level system, which has to be re-used.

The previous report described the work done on the UI design and architecture. In brief, for the web user interface we opted for the TAO design [1], implemented only through CSS files. The tasks that remained to be done were:

- developing the web pages with the Python object-oriented web development framework CherryPy

- developing methods to query the databases

- developing methods to handle the results from the Cache Databases and from the original providers (display and download)

- adapting an existing application to translate templates into different languages

2 Material and methods

2.1 Material

The programming language used here is Python [2] (version 2.4). Python is a dynamic, interpreted, object-oriented programming language, developed as an open source project.

The code has been written within Eclipse, an open source application framework usually used to build software. The version used here is Eclipse SDK 3.1.2 [3], and the plug-in PyDev [4] (version 1.0.3) has been added to Eclipse in order to develop our Python project.

The whole code has been implemented within the Python web development framework CherryPy [5] (version 2.2.1), combined with Kid [6] (version 0.9.2) templates.

To deploy the new BioCASE portal, the Apache HTTP Server 2.0.55 [7] has been chosen, together with the servlet container Tomcat [8]. Connectors such as mod-python2.4 [9] and jakarta-tomcat-connectors-1.2.15 [10] make it possible to run CherryPy and Tomcat behind the Apache HTTP Server.

[1] TAO design snapshot: http://ww2.biocase.org/synth-gui/GUI-DESIGN/TAO/

[2] Python: http://www.python.org/

[3] Eclipse Project: http://www.eclipse.org/

[4] PyDev Project: http://pydev.sourceforge.net/

[5] CherryPy: http://www.cherrypy.org

[6] KID: http://kid.lesscode.org/

[7] Apache HTTP Server: http://httpd.apache.org/

[8] Apache Tomcat: http://tomcat.apache.org/

2.2 Methods

2.2.1 Programming

2.2.1.a Templates

A template engine (or template parser or template processor) is software that processes input text (the template) to produce one or more output texts on a website.

The processing itself generally functions by replacing specific sequences of text with data provided by a model or resulting from more complex operations. It separates code (here Python) from HTML.

CherryPy can work with several templating packages, such as Kid.

...

tmpl = kid.Template(name=tmpl_name, baseurl=self.baseurl, title=title,
                    body=body, trail=trail, navlist=navlist, **kwargs)

return tmpl.serialize(output='html')

...

Table 1. Extract from the Python code

...

...

Table 2. Extract from the template code

...

...

Table 3. Extract from the generated HTML code, with self.baseurl = “/synth-ui”
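The separation between Python code and HTML can be illustrated independently of Kid. The following is only a minimal sketch that uses the standard-library string.Template as a stand-in (Kid templates are XML documents with their own syntax); the template text, the placeholder names and the function render_page are hypothetical and do not come from the portal's code.

```python
from string import Template

# Hypothetical page template: the markup lives apart from the Python code.
PAGE_TEMPLATE = Template(
    '<html><head><title>$title</title></head>'
    '<body><a href="$baseurl/index">Home</a>$body</body></html>'
)


def render_page(baseurl, title, body):
    """Fill the placeholders of the template with values computed in Python."""
    return PAGE_TEMPLATE.substitute(baseurl=baseurl, title=title, body=body)


html = render_page("/synth-ui", "Index Page", "<p>Welcome</p>")
```

Changing the value passed as baseurl is enough to relocate every link generated from the template, without touching the markup.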

2.2.2 Search

Given the SYNTHESYS cache database and the system of original providers, the search for full unit details has to be done in at least 2 steps:

1. connection and query to the SYNTHESYS Cache Database

2. connection and query to the original provider

2.2.2.a Connection and query to the Cache Database

The SYNTHESYS Cache is a MySQL database.

The connection to the database is established with the MySQLdb module:

self.connection = MySQLdb.connect(db=dbName, user=user, passwd=passwd, host=host, port=port)

Table 4. Connection to the MySQL-DB

The queries are built according to the parameters requested. That is to say, not all data are retrieved from the DB each time, but only those the user asked for: tables are not joined if the information contained in their columns is not relevant to the current query. This saves time and resources.
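The idea of joining a table only when the query needs it can be sketched as follows. The table names, column names and the function build_query are hypothetical, since the report does not describe the cache schema; only the principle is taken from the text. The %s placeholders follow the parameter style used by MySQLdb.

```python
# Hypothetical schema: a central "unit" table plus optional lookup tables.
# Each searchable field maps to (column, join clause needed to reach it).
FIELD_SOURCES = {
    "taxon": ("unit.taxon_name", None),
    "country": ("country.name", "JOIN country ON country.id = unit.country_id"),
    "institution": (
        "institution.code",
        "JOIN institution ON institution.id = unit.institution_id",
    ),
}


def build_query(criteria):
    """Return (sql, params), joining only the tables the criteria refer to."""
    joins, conditions, params = [], [], []
    for field in sorted(criteria):
        column, join = FIELD_SOURCES[field]
        if join is not None:
            joins.append(join)
        conditions.append(column + " LIKE %s")
        params.append(criteria[field])
    sql = "SELECT unit.id FROM unit"
    if joins:
        sql += " " + " ".join(joins)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


sql, params = build_query({"taxon": "Malus%", "country": "France"})
```

In this example the institution table is never joined, because no criterion refers to it.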

The connection properties are set as variables: connecting to another database does not require a modification of the Python code, only a modification of a plain-text configuration file.
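A configuration-driven connection could look like the sketch below. The file layout, the section name cache_db and all values are assumptions made for the example (the report does not show the configuration file), and the standard configparser module of current Python versions is used.

```python
import configparser

# Hypothetical content of the text configuration file.
CONFIG_TEXT = """
[cache_db]
host = localhost
port = 3306
db = synthesys_cache
user = portal
passwd = secret
"""


def load_connection_settings(text, section="cache_db"):
    """Parse the configuration text into keyword arguments for a DB connection."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    settings = dict(parser[section])
    settings["port"] = int(settings["port"])  # the port must be an integer
    return settings


settings = load_connection_settings(CONFIG_TEXT)
# The result could then be passed on as MySQLdb.connect(**settings).
```

Pointing the portal at another database then only means editing the values in the configuration text.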

2.2.2.b Connection and query to the original provider

The connections and queries to the original providers are performed through the wrapper software. The request (in XML format) sent to the provider depends on the provider's protocol (BioCASE, DiGIR or TAPIR). This part of the code is based on existing code developed for the bps2 query tool.

The connection and query to the original provider require:

- the protocol's name

  - BioCASe

  - DiGIR

  - TAPIR

- the schema with its concepts, for example:

  - ABCD 2.06

    - unitID: "/DataSets/DataSet/Units/Unit/UnitID"

    - collectionID: "/DataSets/DataSet/Units/Unit/SourceID"

    - institutionID: "/DataSets/DataSet/Units/Unit/SourceInstitutionID"

  - darwin2

    - unitID: "CatalogNumber"

    - collectionID: "CollectionCode"

    - institutionID: "InstitutionCode"

- the resource's URL
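The schema-dependent concepts listed above lend themselves to a simple lookup table. The sketch below only shows how the same triple ID could be expressed with the concepts of either schema; the function triple_id_filter and the sample collection and institution codes are hypothetical, and the construction of the actual protocol request is not shown.

```python
# Concept paths per schema, taken from the list above.
CONCEPTS = {
    "ABCD2.06": {
        "unitID": "/DataSets/DataSet/Units/Unit/UnitID",
        "collectionID": "/DataSets/DataSet/Units/Unit/SourceID",
        "institutionID": "/DataSets/DataSet/Units/Unit/SourceInstitutionID",
    },
    "darwin2": {
        "unitID": "CatalogNumber",
        "collectionID": "CollectionCode",
        "institutionID": "InstitutionCode",
    },
}


def triple_id_filter(schema, unit_id, collection_id, institution_id):
    """Pair each schema-specific concept with the value to look up."""
    concepts = CONCEPTS[schema]
    return [
        (concepts["unitID"], unit_id),
        (concepts["collectionID"], collection_id),
        (concepts["institutionID"], institution_id),
    ]


# Hypothetical collection and institution codes.
filters = triple_id_filter("darwin2", "100108842", "COLL", "INST")
```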

2.2.3 Multithreading

A process is a program that is currently executing. Every process has at least one thread running within it. A thread is a program's path of execution. If only one thread were available, a program would be able to do only one thing at a time. Threads enhance performance and functionality by allowing a program to perform multiple tasks concurrently. To handle many different connections at the same time, a thread pool is very useful. The thread pool is a queue of idle threads. Thread pooling allows a thread to be assigned to a task and, when the task is completed, to be recycled for use in another task. Because threads in the pool are already up and running, response time is usually reduced. The number of threads in the pool can be capped at an upper limit to prevent a sudden overloading of the application.
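A bounded thread pool of this kind can be sketched with the concurrent.futures module of current Python versions (this module did not exist in Python 2.4, so the portal necessarily uses another implementation). The function query_provider, the provider addresses and the limit of 4 threads are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_THREADS = 4  # upper limit chosen for the example


def query_provider(url):
    """Placeholder for a network request; here it only echoes the URL."""
    return "response from " + url


# Hypothetical provider addresses.
urls = ["http://provider-%d.example.org" % i for i in range(10)]

# The pool keeps at most MAX_THREADS worker threads and reuses them for each task.
with ThreadPoolExecutor(max_workers=MAX_THREADS) as pool:
    responses = list(pool.map(query_provider, urls))
```

Ten tasks are processed by at most four threads, and the responses come back in the order of the submitted URLs.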

In order to prevent multiple simultaneous connections to the SYNTHESYS cache database, a bounded semaphore is used to lock access to the database. This lock is acquired by a thread (which can come from the thread pool) and released once the query has been executed.
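The locking scheme can be sketched with threading.BoundedSemaphore from the standard library. The function run_query stands in for a real cache query, and the counters exist only to make the effect of the lock observable; none of these names come from the portal's code.

```python
import threading

# At most one thread may use the cache database at a time.
db_lock = threading.BoundedSemaphore(1)
active = 0       # number of threads currently inside the guarded section
max_active = 0   # highest value of `active` observed
results = []


def run_query(name):
    """Placeholder for a cache query, executed while holding the semaphore."""
    global active, max_active
    with db_lock:                     # acquired here ...
        active += 1
        max_active = max(max_active, active)
        results.append("rows for " + name)
        active -= 1                   # ... and released when the block ends


threads = [threading.Thread(target=run_query, args=("query %d" % i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A bounded semaphore additionally raises an error if it is released more often than it was acquired, which helps to detect locking bugs.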

3 Results

3.1 UI

The new user interface for the BioCASE Portal 2 can be found at: http://search.biocase.org/synth-ui. The menu provides fast access to:

1. different search types (Unit, Metadata, Map (under development) and Itineraries (under development))

2. the registry (providers, institutions and collections)

3. related information (about the project, acknowledgments ...)

Illustration 1: Index Page

The Index page provides direct access to the unit-level simple search and to the metadata-level simple search.

The search on metadata is redirected to the old portal, which has been re-used and re-designed with the new TAO style.

Illustration 2: Extract of the advanced search page for metadata

The three kinds of metadata search therefore remain available:

- Basic Search, now renamed Simple Search

- Advanced Search

- Browser

Illustration 3: Extract of the metadata browser page

The innovation in Portal V2 concerns the unit-level search.

3.2 Search for units

Two kinds of name-based search are available:

- the simple search, where the user only has to enter a taxonomic name

- the advanced search, where the user can refine the query from the start by selecting fields.

3.2.1 Advanced Search

The advanced search can be used to do a specific query: the user can fill selected fields and then perform a precise search.

Illustration 4: Advanced search panel

The following filters can be applied:

- only data with multimedia content

- the record basis has to be an observation, a specimen or any

Clicking on “Choose field” opens the list of fields the user can pick from:

Illustration 5: Fields list available for the advanced search

For example, if the user is looking for “Malus*” gathered or observed in “France” or in “Germany”, the interface will look like this:

Illustration 6: Query fields for the advanced search

Note: different fields are linked with an AND operator, whereas comma-separated values within a field are linked with an OR operator.
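The combination rule stated in the note can be expressed in a few lines. This is a sketch of the logic only, with exact, case-insensitive matching and hypothetical field names and records; the portal itself evaluates the query in the database.

```python
def matches(record, query):
    """True if every queried field matches one of its comma-separated values."""
    for field, text in query.items():
        alternatives = [value.strip().lower() for value in text.split(",")]
        if record.get(field, "").lower() not in alternatives:   # OR within a field
            return False                                          # AND across fields
    return True


# Hypothetical records and query.
query = {"country": "France, Germany", "genus": "Malus"}
in_france = {"genus": "Malus", "country": "France"}
in_spain = {"genus": "Malus", "country": "Spain"}
```

Here the record from France satisfies the query, whereas the record from Spain fails on the country field even though its genus matches.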

3.2.2 Simple Search

The simple search only involves a taxonomic name. The asterisk (*) can be used as a wild card.

Illustration 7: Query field for the simple search

By clicking on “Search”, the user is redirected to the results pages.
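The wildcard behaviour of the simple search can be sketched as a translation of the search term into a regular expression. How the portal actually evaluates the asterisk is not described in this report, so the function wildcard_to_regex, the case-sensitive matching and the sample names are assumptions.

```python
import re


def wildcard_to_regex(pattern):
    """Compile a search term in which '*' stands for any run of characters."""
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("^" + ".*".join(parts) + "$")


# Hypothetical list of taxonomic names.
names = ["Malus sylvestris", "Malus domestica", "Pyrus communis"]
regex = wildcard_to_regex("Malus*")
hits = [name for name in names if regex.match(name)]
```

Escaping the literal parts of the term ensures that characters such as "." in a name are not interpreted as regular-expression operators.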

3.3 Results

The results are displayed in at least 3 steps:

1. the preliminary results: the list of taxonomic names that match the user's query

2. the available results: the list of units corresponding to the user's selection (no more than 10 000 units)

3. the detailed results: the full details from the original provider if no error occurred, or the details available in the cache if the original provider was not available

3.3.1 Preliminary Results

This first result page makes it possible to refine the current query.

The page is divided into two main parts:

- a summary at the top

- the results, formatted as a table, in the rest of the page

Illustration 8: Preliminary results for the search “Malus*” (extract)

3.3.1.a Viewing the results

The search returned 99 taxonomic names matching “Malus*” (i.e. starting with “Malus”), corresponding to 24883 units. 4785 of them are described as Observation in the database and 24 of them as Specimens.

The higher taxa, genera, families, common names, countries (of gathering/observation), collection IDs, institution IDs or record bases of the current query can be viewed by clicking on the magnifying glass or on the column's name.

Clicking on the calculator will count and display the number of distinct names for the selected category. It is also possible to download the list of names (for a category) by clicking on the download box. A text file is created on the server and is sent to the user.

Only 10 000 results, counted in units, can be displayed and handled in the next step: the user might have to refine the query to limit the number of results or to search only for specific data.

3.3.1.b Refining the query

The user can refine the query by ticking check boxes within the table's columns and clicking on the “Refine query” button. This action always displays the list of scientific names that match the new query.

To refine the query for the units of Malus* gathered or observed in Austria, Bulgaria, Denmark, Finland, France or Germany, the user has to check the corresponding boxes.

Illustration 9: Refining query with a selection of countries

When the user clicks on “Refine query”, the results page is reloaded:

Illustration 10: Preliminary results after refining the query with a selection of countries.

The number of units decreased: only 4854 units of Malus* with our selection of gathering/observation countries exist in the database.

By clicking on the calculators, we get the number of distinct names for each category:

Illustration 11: Summary box

As there are fewer than 10 000 units, we can select the 12 scientific names and see the corresponding results (“Select All” and then “See Results”). A new page is loaded, with more details coming from the cache.

Note: it would also have been possible to select our 6 countries and click directly on “See Results”.

3.3.2 Available Results

Illustration 12: Extract of the available results page

There is a summary box again, with the query executed, the number of hits returned, the number of data containing multimedia content and a link to the map illustration of these units.

The headers of the table (here TaxonName, Country, CollectionID and InstitutionID) can be selected on the Preferences page.

There are 3 different ways to access the full details for units:

z see the details for a single unit

z see the details for a group of units

z download the details for a selection of units

3.3.2.a See details for a single unit

To see the details from the original provider for a single unit, the user only needs to click on the taxonomic name of interest. Each name is linked to the triplet unitID-collectionID-institutionID. This action loads a new page, called “Detailed Results”.

3.3.2.b See details for a selection of units

The user can select some units by ticking the check boxes on the left, and then click on “» » See the details for the selected units”: the detailed results page is loaded and a kind of shopping basket is added.

3.3.2.c Download details for a selection of units

The user can select some units by ticking the check boxes on the left, and then click on “» » Download the details for the selected units”. A confirmation page is displayed:

Illustration 13: Confirmation page for downloading a selection of full details

The user can cancel or confirm the choice. As the data are retrieved in a background task, which can take some time, it is possible to receive an email when the job is done. A control is used to prevent abuse by robots (here the user has to copy the text contained in the picture, Abies alba, into the text field).

Note: giving an email address is optional.

Once the form is validated, the ZIP file is created and filled with the non-formatted detailed data (i.e. pure XML).
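Packing the raw XML documents into a ZIP file can be sketched with the standard zipfile module. The function build_archive, the file names and the XML snippets are hypothetical; the archive is built in memory here only to keep the example self-contained.

```python
import io
import zipfile


def build_archive(documents):
    """Pack {file name: XML text} into an in-memory ZIP archive, returned as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, xml_text in sorted(documents.items()):
            archive.writestr(name, xml_text)
    return buffer.getvalue()


# Hypothetical unit documents.
payload = build_archive({"unit_1.xml": "<Unit>1</Unit>", "unit_2.xml": "<Unit>2</Unit>"})
names = zipfile.ZipFile(io.BytesIO(payload)).namelist()
```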

3.3.3 Detailed Results

If the original provider is available and returns a well-formed document with the details, these data are displayed.

Illustration 14: Detailed results page for the unit Malus sylvestris 100108842.

If an error occurred with the original provider (unavailable, document not well formed, time out, no data corresponding to the triple ID...), all the data stored in the cache database are displayed, with a notice stating that they do not come from the original provider.
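The fallback from the original provider to the cache can be sketched as follows. The exception class ProviderError, the function names and the sample data are hypothetical; the sketch only shows that the origin of the data is returned together with the details, so that the page can display the notice.

```python
class ProviderError(Exception):
    """Raised when the original provider cannot deliver usable details."""


def fetch_details(triple_id, query_provider, cache):
    """Return (details, source): provider data if possible, else cached data."""
    try:
        return query_provider(triple_id), "provider"
    except ProviderError:
        # Fall back to the cache and flag the origin so the page can show a notice.
        return cache[triple_id], "cache"


def unavailable_provider(triple_id):
    """Simulate a provider that does not answer in time."""
    raise ProviderError("time out")


# Hypothetical triple ID (unitID, collectionID, institutionID) and cache content.
triple = ("100108842", "COLL", "INST")
cache = {triple: {"name": "Malus sylvestris"}}
details, source = fetch_details(triple, unavailable_provider, cache)
```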

Illustration 15: Detailed results from the cache because of unavailability of the original provider

With the shopping-basket approach, the user does not need to go back to the previous page: clicking on a name stored in the selection list is enough to see the details for another unit:

Illustration 16: Selected units (shopping basket approach)

3.4 Internationalisation

The portal can currently be used in 3 languages:

- English

- German

The templates are generated once, and the application calls the template associated with the current language.

The different templates are generated by an Ant application, usually used to translate the ABCD and DarwinCore XSLT into different languages.

This translation tool is based on a text file with keyword = value pairs and a build.xml that does the job. For the moment, the translated files have to be copied manually into the correct folder.
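The principle of a keyword = value language file can be illustrated in Python. This is not the Ant build itself, whose build.xml is not reproduced in this report; the language file content, the $keyword placeholder syntax and the function names are assumptions made for the example.

```python
from string import Template

# Hypothetical language file in "keyword = value" form.
GERMAN = """
search = Suche
results = Ergebnisse
"""


def parse_language_file(text):
    """Read 'keyword = value' lines into a dictionary, skipping blank lines."""
    entries = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            entries[key.strip()] = value.strip()
    return entries


def translate(template_text, entries):
    """Replace each $keyword in the template with its translated value."""
    return Template(template_text).substitute(entries)


page = translate("<h1>$search</h1><h2>$results</h2>", parse_language_file(GERMAN))
```

Generating one translated template per language in advance, as described above, means that no substitution has to be done while a page is being served.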

The portal's language can be changed by going to the Preferences page, selecting a language from the list and confirming this choice.

Illustration 17: Preferences page

3.4.1 Session storage

3.4.1.a Cookies

The user's preferences are stored in a cookie. On the next visit, the portal will be displayed in the language already stored.

The cookies are handled through CherryPy and are associated with unique session identification numbers.

Note: the user's browser has to accept cookies.

3.4.1.b Other session preferences

Some other preferences can be stored in the session's cookie:

- the default grouping: which column has to be highlighted and displayed first on the Preliminary Results page

- the default unit search: which kind of unit search has to be loaded (either the Simple Search or the Advanced Search)

- the default metadata search: which kind of metadata search has to be loaded (either the Browser, the Simple Search or the Advanced Search)

- the default result display: which columns have to be displayed on the Available Results page. The taxonomic name is always displayed, as it links the unit's triple ID with its name.

- the number of results per page: the number of units that can be displayed per page for the Available Results. By default, it is set to 20.
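Reading such preferences back from a cookie can be sketched with the standard http.cookies module, used here as a stand-in for CherryPy's own cookie handling. The cookie names, the default language code and the function read_preferences are hypothetical; only the default of 20 results per page comes from the list above.

```python
from http.cookies import SimpleCookie

# Hypothetical cookie names and defaults.
DEFAULTS = {"language": "en", "results_per_page": "20"}


def read_preferences(cookie_header):
    """Merge preferences found in a Cookie header with the defaults."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    prefs = dict(DEFAULTS)
    for name in DEFAULTS:
        if name in cookie:
            prefs[name] = cookie[name].value
    return prefs


prefs = read_preferences("language=de")
```

Preferences that are absent from the cookie keep their default value, so a first-time visitor sees the portal with the default settings.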

4 Conclusion

The goal of this four-month task was to develop the new portal for BioCASE, which had to be based on the architecture study, covering support for the BioCASE, DiGIR and TAPIR protocols, cache query mechanisms and language module support. The unit-level query system and the metadata-level system, which had to be re-used, had to be clearly separated.

This portal also had to be developed in a generic way, in order to be easily re-used for other Search Portals based on the same database structure.

4.1 Portability & adaptability: GBIF-DE Botany

The portal has been adapted for the new GBIF-DE Botany portal: for the design, only the CSS files have been modified. The database structure for the cache is identical to the SYNTHESYS cache database, except for the table concerning the countries (the cache is filtered for specimens and observations coming from Germany). Accordingly, some parts of the code have been commented out (made invisible).

4.2 Thesaurus

TOQE is an acronym for “Thesaurus Optimized Query Enhancer”. TOQE is a service that may be integrated into search engines or all kinds of query-processing pipelines. Depending on the desired mode of operation, it expands the query with related thesaurus terms. TOQE itself is implemented as a Web Service: a client interacts with TOQE, which then queries a database backend and returns the requested data. TOQE thus acts as a wrapper for a thesaurus database.

This web-service has been implemented by Niels Hoffmann, and will be linked with the new portal.

4.3 Feedback

A bug tracker has been installed by the Edinburgh group. The portal's users will be invited to add comments and report bugs on this website: http://193.62.154.87/BioCase

4.4 To do

Some improvements are already planned, such as:

- auto-completion of the search fields: as the user types, the application will offer suggestions for the countries, taxonomic names ..., thanks to AJAX technologies

- the selection of check boxes across several results pages

- sorting the column contents of the results pages in alphabetical order (ascending or descending)

- the possibility to download generic data for queries that return more than 10 000 units
