
Chapter 8 Evaluation

8.1 Usage, access, query, and session statistics

8.1.1 Changes in the search engine usage on the PRPZ‐WebSite – A log file analysis

In Chapter 5 we conducted an initial analysis of the IR situation in the investigated departments. Among other things, in 2007 we investigated the usage of search engines linked from the local web site of Pharmaceutical Research in Penzberg (Fig. 5‐4). Since then, the access profile has changed significantly (Fig. 8‐1). First of all, the number of search engines linked on the PRPZ‐WebSite rose from six in 2007 to eight by mid‐2008. The additionally linked search engines are the prototype YASA, the online dictionary Leo (English‐German), and a search engine targeting the Diagnostics division of Roche. The tool “Google search appliance” is no longer listed, as its license was discontinued after the introduction of YASA.

Google’s web search is still the most frequently accessed tool in the investigated departments, followed by the in‐house telephone book. While access to the telephone book remained about the same, access to Google dropped by approximately 10%. The most likely reason is that the dictionary Leo is now directly accessible from the PRPZ‐WebSite. Previously, workers used to reach the dictionary by submitting the navigational query “leo” to Google, as could be observed from the logs in 2008. We assume that a further reason is YASA’s integration of DMOZ, which covers parts of the Internet. The usage of the search engines targeting the pharmaceutical and the diagnostics intranet web pages has not changed significantly. Similarly, access to Wikipedia remained about the same. The usage of PubMed, however, grew slightly. The most significant change occurred for the search engine targeting the PRPZ‐Share. In the past, Google’s Search Appliance had a share of only 0.3%. YASA, which has replaced the Google Search Appliance, has in contrast reached a share of 11.9%, a significant increase.

Fig. 8‐1: Usage of search engines linked from the PRPZ‐WebSite. Data logged over a period of 6 months (Jan ‘09 – June ‘09).

The first version of the prototype YASA was introduced in January 2008. At that time YASA’s functionality was very limited: search was only possible on the PRPZ‐Share, and adaptation was based merely on the searcher’s departmental background. In March 2008, we introduced the first prototype to approximately 40 scientists from Pharmaceutical Research in Penzberg. As a result, usage grew significantly in that month (Fig. 8‐2). In the subsequent months more and more functionality was integrated. Most notably, in July 2008 we expanded the coverage of the PRPZ‐Share to the patents department, and in 2009 we indexed all of the PRPZ‐Share including secured folders. In January 2009 an increase in the usage of YASA could be observed. This is possibly due to the inclusion of the secured folders, which had not been searchable before. In summary, the usage of YASA grew steadily as more principles were implemented and more workers heard about the tool.

Fig. 8‐2: Number of queries transmitted to YASA per month. Data logged over a period of 18 months (Jan ‘08 – June ‘09).

The query‐per‐hour distribution (Fig. 8‐3) follows the typical pattern of a workday. At about 8 o’clock usage rises sharply; it shows a local minimum at noon due to lunch and then declines as the day goes on. The peak usage of about 10 queries per hour is reached at 11 o’clock.

Fig. 8‐3: Average number of queries per hour. Data logged over a period of 18 months (Jan ‘09 – June ‘09).
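The log format behind Fig. 8‐3 is not described in this chapter, so the following is only a minimal sketch of how such an hourly profile could be derived. It assumes a hypothetical list of query timestamps in ISO 8601 format; the function name `average_queries_per_hour` and the sample data are illustrative and not taken from YASA.

```python
from collections import Counter
from datetime import datetime


def average_queries_per_hour(timestamps):
    """Return {hour: average number of queries per logged day} for hours 0-23.

    `timestamps` is an iterable of ISO 8601 strings (an assumed log format).
    The average is taken over the distinct calendar days present in the log.
    """
    parsed = [datetime.fromisoformat(ts) for ts in timestamps]
    if not parsed:
        return {hour: 0.0 for hour in range(24)}
    days = {dt.date() for dt in parsed}
    per_hour = Counter(dt.hour for dt in parsed)
    return {hour: per_hour[hour] / len(days) for hour in range(24)}


# Hypothetical sample: two logged days, three queries in total.
sample = [
    "2009-01-12T08:15:00",
    "2009-01-12T11:02:00",
    "2009-01-13T11:40:00",
]
profile = average_queries_per_hour(sample)
peak_hour = max(profile, key=profile.get)
print(peak_hour, profile[peak_hour])
```

Note that this sketch divides by the number of days that appear in the log. Days without any query would not be counted, so for a sparse log the divisor would have to be the length of the logging period instead.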

8.1.2 Access of sources within YASA

In order to gather information about access to sources within YASA, we traced source access over a period of three months in 2009 (Fig. 8‐4). The vast majority of queries (91.3%) target the PRPZ‐Share. The sources PRPZ‐WebSite and PubMed have similar shares of 2.4% and 2.7%, respectively. DMOZ and Wikipedia are barely queried, as reflected by their access frequency of about 0.4%. The database sector has a share of 2.9%. Here, most requests target the telephone book (71.8%), an observation similar to that in Fig. 8‐1. The software application repository is the second most frequently accessed database with a share of 18.1%. The databases Phasis (Pharmaceutics animal study information system), TheraPS (workflow database), HWI (hardware inventory), and Plasmid are barely accessed by means of YASA. Their low usage was expected. The telephone database and the application database contain information which is freely accessible to all searchers and relevant to most employees. In contrast, access to the other databases is often restricted, and they contain very specific information which is relevant only to a minority of the staff. Hence, the observed access distribution is in line with these expectations.

Fig. 8‐4: Query distribution on sources within YASA: a) overview, b) databases. Data logged over a period of 3 months (Apr ’09 – June ’09).
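The percentages in Fig. 8‐4 are given on two levels: the overview relates each source to all queries, whereas the database figures (e.g. 71.8% for the telephone book) appear to be relative to the database sector only. The sketch below illustrates this two-level computation. The record layout, the helper `percentage_shares`, and all counts are hypothetical and chosen purely for illustration; they are not the logged YASA data.

```python
from collections import Counter


def percentage_shares(labels):
    """Return {label: share in percent, rounded to one decimal place}."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100.0 * n / total, 1) for label, n in counts.items()}


# Hypothetical log records as (source, database) pairs; database is None
# for non-database sources. The counts are made up for illustration.
records = (
    [("PRPZ-Share", None)] * 18
    + [("Databases", "Telephone Book")] * 1
    + [("Databases", "Applications")] * 1
)

# Level 1: share of each source among all queries.
overview = percentage_shares(source for source, _ in records)

# Level 2: share of each database among database queries only.
databases = percentage_shares(
    db for source, db in records if source == "Databases"
)

print(overview)
print(databases)
```

Under this reading, a database's share of all queries would be the product of the two levels, for example the sector share multiplied by the database's share within the sector.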

In addition to the queries we also logged the clicks, i.e. the document access frequency, for some sources (Fig. 8‐5). We excluded databases from the click statistics because only few databases offer clickable links. Even though the database “Applications” usually offers a clickable link to the intranet homepage of the listed application, some entries have no associated homepage. For the databases “Telephone Book”, “Phasis”, “TheraPS”, “HWI”, and “Plasmid” no clickable links are available at all; instead, all relevant information is displayed with the result item. A hyperlink that opens the selected item in the context of its application would be a convenient feature, especially for databases closely tied to applications such as “Phasis”, “TheraPS”, and “Plasmid”. However, the respective applications lack this linkage functionality.

The statistics for the selected sources (i.e. the non‐database sources) show that most document requests (89.3%) target the PRPZ‐Share. The remaining 10.7% are divided among Tagged Documents, DMOZ, PubMed, Wikipedia, and PRPZ‐WebSite. The source “Tagged Documents” leads this minority with a frequency of 6.2%. PubMed and PRPZ‐WebSite follow with an access frequency of about 2% each. The least targeted items are results from Wikipedia (0.3%) and DMOZ (0.1%).

Fig. 8‐5: Click distribution of sources within YASA. Data logged over a period of 6 months (Jan ’09 – June ’09).